NFDI4DS | UHH-SEMS - Publication Details

Carina Negreanu

ORCID: 0000-0003-2130-7223


About

Contact & Profiles

OpenAlex ID: A5082623545

Research Areas

Topic Modeling
Spreadsheets and End-User Computing
Natural Language Processing Techniques
Statistics Education and Methodologies
Software Engineering Research
Data Quality and Management
Data Visualization and Analytics
Time Series Analysis and Forecasting
Explainable Artificial Intelligence (XAI)
Scientific Computing and Data Management
Advanced Database Systems and Queries
Machine Learning and Data Classification
Semantic Web and Ontologies
Algorithms and Data Compression
Educational Games and Gamification
Recommender Systems and Techniques
Multimodal Machine Learning Applications
Mobile Crowdsensing and Crowdsourcing
Neural Networks and Applications
Simulation Techniques and Applications
Advanced Text Analysis Techniques
Cloud Computing and Resource Management
Text Readability and Simplification
Cloud Data Security Solutions
Artificial Intelligence in Games

Microsoft Research (United Kingdom)
2020-2024

University of Toronto
2024

University of California, San Diego
2024

University College London
2023-2024

University of Cambridge
2023-2024

Microsoft (United States)
2023

Carnegie Mellon University
2023

Microsoft (Belgium)
2023

“What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models

OPENALEX - Publications

Michael Xieyang Liu, Advait Sarkar, Carina Negreanu, Benjamin G. Zorn, Jack M. Williams and 2 more

Code-generating large language models translate natural language into code. However, only a small portion of the infinite space of naturalistic utterances is effective at guiding code generation. For non-expert end-user programmers, learning this is the challenge of abstraction matching. We examine this challenge in the specific context of data analysis in spreadsheets, in a system that maps the user's natural language query to Python code using the Codex generator, executes the code, and shows the result. We propose grounded abstraction matching, which bridges the abstraction gap by translating the code back into a systematic...

10.1145/3544548.3580817 preprint EN 2023-04-19

What is it like to program with artificial intelligence?

OPENALEX - Publications

Advait Sarkar, Andrew Gordon, Carina Negreanu, Christian Poelitz, Sruti Srinivasa Ragavan and 1 more

Large language models, such as OpenAI's Codex and DeepMind's AlphaCode, can generate code to solve a variety of problems expressed in natural language. This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot. In this paper, we explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance. We draw upon publicly available experience reports of LLM-assisted...

10.48550/arxiv.2208.06213 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01

Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition

OPENALEX - Publications

Majeed Kazemitabaar, Jack M. Williams, Ian Drosos, Tovi Grossman, Austin Z. Henley and 2 more

LLM-powered tools like ChatGPT Data Analysis have the potential to help users tackle the challenging task of data analysis programming, which requires expertise in data processing, programming, and statistics. However, our formative study (n=15) uncovered serious challenges in verifying AI-generated results and steering the AI (i.e., guiding the AI system to produce the desired output). We developed two contrasting approaches to address these challenges. The first (Stepwise) decomposes the problem into step-by-step subgoals with pairs of editable...

10.1145/3654777.3676345 article EN 2024-10-11

"It's like a rubber duck that talks back": Understanding Generative AI-Assisted Data Analysis Workflows through a Participatory Prompting Study

OPENALEX - Publications

Ian Drosos, Advait Sarkar, X. X. Xu, Carina Negreanu, Sean Rintel and 1 more

Generative AI tools can help users with many tasks. One such task is data analysis, which is notoriously challenging for non-expert end-users due to its expertise requirements, and where AI holds much potential, such as finding relevant sources, proposing analysis strategies, and writing code. To understand how data analysis workflows can be assisted or impaired by generative AI, we conducted a study (n=15) using Bing Chat via participatory prompting. Participatory prompting is a recently developed methodology in which researchers...

10.1145/3663384.3663389 preprint EN 2024-06-22

DataVinci: Learning Syntactic and Semantic String Repairs

OPENALEX - Publications

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu and 2 more

String data is common in real-world datasets: 67.6% of values in a sample of 1.8 million real Excel spreadsheets from the web were represented as text. Automatically cleaning such string data can have a significant impact on users. Previous approaches are limited to error detection, require that the user provides annotations, examples, or constraints to fix the errors, and focus independently on syntactic errors or semantic errors in strings, but ignore that strings often contain both syntactic and semantic substrings. We introduce DataVinci, a fully...

10.1145/3709677 article EN other-oa Proceedings of the ACM on Management of Data 2025-02-10

COLDECO: An End User Spreadsheet Inspection Tool for AI-Generated Code

OPENALEX - Publications

Kasra Ferdowsi, Jack M. Williams, Ian Drosos, Andrew D. Gordon, Carina Negreanu and 3 more

Code-generating large language models (LLMs) are transforming programming. Their capability to generate multi-step solutions provides even non-programmers a mechanism to harness the power of coding. Non-programmers often use spreadsheets to manage tabular data, as they offer an intuitive understanding of data manipulation and formula outcomes. Considering that LLMs can generate complex, potentially incorrect code, our focus is on enabling user trust in the accuracy of LLM-generated code. We present ColDeco, the first...

10.1109/vl-hcc57772.2023.00017 article EN 2023-10-03

Cornet: Learning Table Formatting Rules By Example

OPENALEX - Publications

Mukul Singh, José Cambronero Sánchez, Sumit Gulwani, Vu Le, Carina Negreanu and 2 more

Spreadsheets are widely used for table manipulation and presentation. Stylistic formatting of these tables is an important property for presentation and analysis. As a result, popular spreadsheet software, such as Excel, supports automatically formatting tables based on rules. Unfortunately, writing such rules can be challenging for users, as it requires knowledge of the underlying rule language and data logic. We present Cornet, a system that tackles the novel problem of learning formatting rules from user-provided formatted cells. Cornet takes inspiration...

10.14778/3603581.3603600 article EN Proceedings of the VLDB Endowment 2023-06-01

CodeFusion: A Pre-trained Diffusion Model for Code Generation

OPENALEX - Publications

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu and 1 more

Imagine a developer who can only change their last line of code, how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and...

10.18653/v1/2023.emnlp-main.716 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Understanding and Inferring Units in Spreadsheets

OPENALEX - Publications

Jack M. Williams, Carina Negreanu, Andrew D. Gordon, Advait Sarkar

10.1109/vl/hcc50065.2020.9127254 article EN 2020-07-16

FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language

OPENALEX - Publications

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu and 3 more

Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users, as it requires understanding and implementing the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are...

10.14778/3632093.3632111 article EN Proceedings of the VLDB Endowment 2023-11-01

LinkingPark: An automatic semantic table interpretation system

OPENALEX - Publications

Shuang Chen, Alperen Karaoglu, Carina Negreanu, Tingting Ma, Jin-Ge Yao and 4 more

10.1016/j.websem.2022.100733 article EN Journal of Web Semantics 2022-06-16

CodeFusion: A Pre-trained Diffusion Model for Code Generation

OPENALEX - Publications

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu and 1 more

Imagine a developer who can only change their last line of code, how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python,...

10.48550/arxiv.2310.17680 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition

OPENALEX - Publications

Majeed Kazemitabaar, Jack M. Williams, Ian Drosos, Tovi Grossman, Austin Z. Henley and 2 more

LLM-powered tools like ChatGPT Data Analysis have the potential to help users tackle the challenging task of data analysis programming, which requires expertise in data processing, programming, and statistics. However, our formative study (n=15) uncovered serious challenges in verifying AI-generated results and steering the AI (i.e., guiding the AI system to produce the desired output). We developed two contrasting approaches to address these challenges. The first (Stepwise) decomposes the problem into step-by-step subgoals with pairs of editable...

10.1145/3654777.3676345 preprint EN arXiv (Cornell University) 2024-07-02

Co-audit: tools to help humans double-check AI-generated content

OPENALEX - Publications

Andrew D. Gordon, Carina Negreanu, José Cambronero, Rasika Chakravarthy, Ian Drosos and 7 more

Users are increasingly being warned to check AI-generated content for correctness. Still, as LLMs (and other generative models) generate more complex output, such as summaries, tables, or code, it becomes harder for the user to audit or evaluate the output for quality or correctness. Hence, we are seeing the emergence of tool-assisted experiences to help the user double-check a piece of AI-generated content. We refer to these as co-audit tools. Co-audit tools complement prompt engineering techniques: one helps the user construct the input prompt, while the other helps them check the output response. As a specific...

10.48550/arxiv.2310.01297 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Solving Data-centric Tasks using Large Language Models

OPENALEX - Publications

Shraddha Barke, Christian Poelitz, Carina Negreanu, Benjamin G. Zorn, José Cambronero and 8 more

10.18653/v1/2024.findings-naacl.41 article EN Findings of the Association for Computational Linguistics: NAACL 2024 2024-01-01

CORNET: Learning Table Formatting Rules By Example

OPENALEX - Publications

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu and 2 more

Spreadsheets are widely used for table manipulation and presentation. Stylistic formatting of these tables is an important property for both presentation and analysis. As a result, popular spreadsheet software, such as Excel, supports automatically formatting tables based on rules. Unfortunately, writing such rules can be challenging for users, as it requires knowledge of the underlying rule language and data logic. We present CORNET, a system that tackles the novel problem of learning formatting rules from user examples in the form of formatted cells. CORNET takes...

10.48550/arxiv.2208.06032 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Rows from Many Sources: Enriching row completions from Wikidata with a pre-trained Language Model

OPENALEX - Publications

Carina Negreanu, Alperen Karaoglu, Jack M. Williams, Shuang Chen, Daniel Fabián and 2 more

Row completion is the task of augmenting a given table of text and numbers with additional, relevant rows. The task divides into two steps: subject suggestion, the task of populating the main column; and gap filling, the task of populating the remaining columns. We present state-of-the-art results for subject suggestion and gap filling measured on a standard benchmark (WikiTables).

10.1145/3487553.3524923 article EN Companion Proceedings of the Web Conference 2022 2022-04-25

Demonstration of CORNET: A System For Learning Spreadsheet Formatting Rules By Example

OPENALEX - Publications

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu and 1 more

Data management and analysis tasks are often carried out using spreadsheet software. A popular feature in most spreadsheet platforms is the ability to define data-dependent formatting rules. These rules can express actions such as "color red all entries in a column that are negative" or "bold all rows not containing error or failure." Unfortunately, users who want to exercise this functionality need to manually write these conditional formatting (CF) rules. We introduce CORNET, a system that automatically learns such rules from user examples. CORNET takes...

10.48550/arxiv.2308.07357 preprint EN cc-by arXiv (Cornell University) 2023-01-01

DataVinci: Learning Syntactic and Semantic String Repairs

OPENALEX - Publications

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu and 1 more

String data is common in real-world datasets: 67.6% of values in a sample of 1.8 million real Excel spreadsheets from the web were represented as text. Systems that successfully clean such string data can have a significant impact on users. While prior work has explored errors in string data, proposed approaches have often been limited to error detection or require that the user provide annotations, examples, or constraints to fix the errors. Furthermore, these systems have focused independently on syntactic errors or semantic errors in strings, but ignore that strings...

10.48550/arxiv.2308.10922 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Cornet: Learning Spreadsheet Formatting Rules by Example

OPENALEX - Publications

Mukul Singh, José Cambronero Sánchez, Sumit Gulwani, Vu Le, Carina Negreanu and 1 more

Data management and analysis tasks are often carried out using spreadsheet software. A popular feature in most spreadsheet platforms is the ability to define data-dependent formatting rules. These rules can express actions such as "color red all entries in a column that are negative" or "bold all rows not containing error or failure". Unfortunately, users who want to exercise this functionality need to manually write these conditional formatting (CF) rules. We introduce Cornet, a system that automatically learns such rules from user examples. Cornet takes...

10.14778/3611540.3611620 article EN Proceedings of the VLDB Endowment 2023-08-01

InstructExcel: A Benchmark for Natural Language Instruction in Excel

OPENALEX - Publications

Justin Payan, Swaroop Mishra, Mukul Singh, Carina Negreanu, Christian Poelitz and 5 more

With the evolution of Large Language Models (LLMs) we can solve increasingly more complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel-specific tasks provided via natural language user instructions. To do so, we introduce a new large-scale benchmark, InstructExcel, created by leveraging the 'Automate' feature in Excel to automatically generate OfficeScripts from users' actions...

10.48550/arxiv.2310.14495 preprint EN other-oa arXiv (Cornell University) 2023-01-01

FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language

OPENALEX - Publications

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu and 3 more

Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users, as it requires them to understand and implement the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are...

10.48550/arxiv.2310.17306 preprint EN cc-by arXiv (Cornell University) 2023-01-01

InstructExcel: A Benchmark for Natural Language Instruction in Excel

OPENALEX - Publications

Justin Payan, Swaroop Mishra, Mukul Singh, Carina Negreanu, Christian Poelitz and 5 more

Justin Payan, Swaroop Mishra, Mukul Singh, Carina Negreanu, Christian Poelitz, Chitta Baral, Subhro Roy, Rasika Chakravarthy, Benjamin Van Durme, Elnaz Nouri. Findings of the Association for Computational Linguistics: EMNLP 2023.

10.18653/v1/2023.findings-emnlp.265 article EN cc-by 2023-01-01
