Carina Negreanu

ORCID: 0000-0003-2130-7223
Research Areas
  • Topic Modeling
  • Spreadsheets and End-User Computing
  • Natural Language Processing Techniques
  • Statistics Education and Methodologies
  • Software Engineering Research
  • Data Quality and Management
  • Data Visualization and Analytics
  • Time Series Analysis and Forecasting
  • Explainable Artificial Intelligence (XAI)
  • Scientific Computing and Data Management
  • Advanced Database Systems and Queries
  • Machine Learning and Data Classification
  • Semantic Web and Ontologies
  • Algorithms and Data Compression
  • Educational Games and Gamification
  • Recommender Systems and Techniques
  • Multimodal Machine Learning Applications
  • Mobile Crowdsensing and Crowdsourcing
  • Neural Networks and Applications
  • Simulation Techniques and Applications
  • Advanced Text Analysis Techniques
  • Cloud Computing and Resource Management
  • Text Readability and Simplification
  • Cloud Data Security Solutions
  • Artificial Intelligence in Games

Microsoft Research (United Kingdom)
2020-2024

University of Toronto
2024

University of California, San Diego
2024

University College London
2023-2024

University of Cambridge
2023-2024

Microsoft (United States)
2023

Carnegie Mellon University
2023

Microsoft (Belgium)
2023

Code-generating large language models translate natural language into code. However, only a small portion of the infinite space of naturalistic utterances is effective at guiding code generation. For non-expert end-user programmers, learning this is the challenge of abstraction matching. We examine this challenge in the specific context of data analysis in spreadsheets, in a system that maps the user's query to Python code using the Codex generator, executes the code, and shows the result. We propose grounded abstraction matching, which bridges the gap by translating the code back into a systematic...

10.1145/3544548.3580817 preprint EN 2023-04-19

Large language models, such as OpenAI's Codex and DeepMind's AlphaCode, can generate code to solve a variety of problems expressed in natural language. This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot. In this paper, we explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance. We draw upon publicly available experience reports of LLM-assisted...

10.48550/arxiv.2208.06213 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01

LLM-powered tools like ChatGPT Data Analysis have the potential to help users tackle the challenging task of data analysis programming, which requires expertise in data processing, programming, and statistics. However, our formative study (n=15) uncovered serious challenges in verifying AI-generated results and steering the AI (i.e., guiding the system to produce the desired output). We developed two contrasting approaches to address these challenges. The first (Stepwise) decomposes the problem into step-by-step subgoals with pairs of editable...

10.1145/3654777.3676345 article EN 2024-10-11

Generative AI tools can help users with many tasks. One such task is data analysis, which is notoriously challenging for non-expert end-users due to its expertise requirements, and where AI holds much potential, such as finding relevant data sources, proposing analysis strategies, and writing analysis code. To understand how data analysis workflows can be assisted or impaired by generative AI, we conducted a study (n=15) using Bing Chat via participatory prompting. Participatory prompting is a recently developed methodology in which researchers...

10.1145/3663384.3663389 preprint EN 2024-06-22

String data is common in real-world datasets: 67.6% of values in a sample of 1.8 million real Excel spreadsheets from the web were represented as text. Automatically cleaning such string data can have a significant impact on users. Previous approaches are limited to error detection, require that the user provide annotations, examples, or constraints to fix the errors, and focus independently on syntactic errors or semantic errors in strings, but ignore that strings often contain both syntactic and semantic substrings. We introduce DataVinci, a fully...

10.1145/3709677 article EN other-oa Proceedings of the ACM on Management of Data 2025-02-10

Code-generating large language models (LLMs) are transforming programming. Their capability to generate multi-step solutions provides even non-programmers a mechanism to harness the power of coding. Non-programmers often use spreadsheets to manage tabular data, as they offer an intuitive understanding of data manipulation and formula outcomes. Considering that LLMs can generate complex, potentially incorrect code, our focus is on enabling user trust in the accuracy of LLM-generated code. We present ColDeco, the first...

10.1109/vl-hcc57772.2023.00017 article EN 2023-10-03

Spreadsheets are widely used for table manipulation and presentation. Stylistic formatting of these tables is an important property for presentation and analysis. As a result, popular spreadsheet software, such as Excel, supports automatically formatting tables based on data-dependent rules. Unfortunately, writing these rules can be challenging for users, as it requires knowledge of the underlying rule language and data logic. We present Cornet, a system that tackles the novel problem of learning formatting rules from user-provided formatted cells. Cornet takes inspiration...

10.14778/3603581.3603600 article EN Proceedings of the VLDB Endowment 2023-06-01

Imagine a developer who can only change their last line of code. How often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and...

10.18653/v1/2023.emnlp-main.716 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

The following topics are dealt with: computer science education; programming; software tools; computer aided instruction; software engineering; interactive systems; learning (artificial intelligence); data analysis; text analysis; groupware.

10.1109/vl/hcc50065.2020.9127254 article EN 2020-07-16

Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users, as it requires understanding and implementing the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are...

10.14778/3632093.3632111 article EN Proceedings of the VLDB Endowment 2023-11-01

Imagine a developer who can only change their last line of code. How often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python,...

10.48550/arxiv.2310.17680 preprint EN cc-by arXiv (Cornell University) 2023-01-01

LLM-powered tools like ChatGPT Data Analysis have the potential to help users tackle the challenging task of data analysis programming, which requires expertise in data processing, programming, and statistics. However, our formative study (n=15) uncovered serious challenges in verifying AI-generated results and steering the AI (i.e., guiding the system to produce the desired output). We developed two contrasting approaches to address these challenges. The first (Stepwise) decomposes the problem into step-by-step subgoals with pairs of editable...

10.1145/3654777.3676345 preprint EN arXiv (Cornell University) 2024-07-02

Users are increasingly being warned to check AI-generated content for correctness. Still, as LLMs (and other generative models) generate more complex output, such as summaries, tables, or code, it becomes harder for the user to audit or evaluate the output for quality or correctness. Hence, we are seeing the emergence of tool-assisted experiences that help the user double-check a piece of AI-generated content. We refer to these as co-audit tools. Co-audit tools complement prompt engineering techniques: one helps the user construct the input prompt, while the other helps them check the output response. As a specific...

10.48550/arxiv.2310.01297 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Spreadsheets are widely used for table manipulation and presentation. Stylistic formatting of these tables is an important property for both presentation and analysis. As a result, popular spreadsheet software, such as Excel, supports automatically formatting tables based on data-dependent rules. Unfortunately, writing these rules can be challenging for users, as it requires knowledge of the underlying rule language and data logic. We present CORNET, a system that tackles the novel problem of learning formatting rules from user examples in the form of formatted cells. CORNET takes...

10.48550/arxiv.2208.06032 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Row completion is the task of augmenting a given table of text and numbers with additional, relevant rows. The task divides into two steps: subject suggestion, the task of populating the main column; and gap filling, the task of populating the remaining columns. We present state-of-the-art results for subject suggestion and gap filling measured on a standard benchmark (WikiTables).

10.1145/3487553.3524923 article EN Companion Proceedings of the Web Conference 2022 2022-04-25

Data management and analysis tasks are often carried out using spreadsheet software. A popular feature in most spreadsheet platforms is the ability to define data-dependent formatting rules. These rules can express actions such as "color red all entries in a column that are negative" or "bold all rows not containing error or failure." Unfortunately, users who want to exercise this functionality need to manually write these conditional formatting (CF) rules. We introduce CORNET, a system that automatically learns such rules from user examples. CORNET takes...

10.48550/arxiv.2308.07357 preprint EN cc-by arXiv (Cornell University) 2023-01-01

String data is common in real-world datasets: 67.6% of values in a sample of 1.8 million real Excel spreadsheets from the web were represented as text. Systems that successfully clean such string data can have a significant impact on users. While prior work has explored errors in string data, proposed approaches have often been limited to error detection or require the user to provide annotations, examples, or constraints to fix the errors. Furthermore, these systems have focused independently on syntactic errors or semantic errors in strings, but ignore that strings...

10.48550/arxiv.2308.10922 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Data management and analysis tasks are often carried out using spreadsheet software. A popular feature in most spreadsheet platforms is the ability to define data-dependent formatting rules. These rules can express actions such as "color red all entries in a column that are negative" or "bold all rows not containing error or failure". Unfortunately, users who want to exercise this functionality need to manually write these conditional formatting (CF) rules. We introduce Cornet, a system that automatically learns such rules from user examples. Cornet takes...

10.14778/3611540.3611620 article EN Proceedings of the VLDB Endowment 2023-08-01

With the evolution of Large Language Models (LLMs), we can solve increasingly complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel-specific tasks provided via natural language user instructions. To do so, we introduce a new large-scale benchmark, InstructExcel, created by leveraging the 'Automate' feature in Excel to automatically generate OfficeScripts from users' actions....

10.48550/arxiv.2310.14495 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users, as it requires them to understand and implement the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are...

10.48550/arxiv.2310.17306 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Justin Payan, Swaroop Mishra, Mukul Singh, Carina Negreanu, Christian Poelitz, Chitta Baral, Subhro Roy, Rasika Chakravarthy, Benjamin Van Durme, Elnaz Nouri. Findings of the Association for Computational Linguistics: EMNLP 2023.

10.18653/v1/2023.findings-emnlp.265 article EN cc-by 2023-01-01