- Hate Speech and Cyberbullying Detection
- Topic Modeling
- Logic, programming, and type systems
- Wikis in Education and Collaboration
- Natural Language Processing Techniques
- Software Engineering Research
- Cancer-related gene regulation
- Logic, Reasoning, and Knowledge
- Advanced Database Systems and Queries
- Recommender Systems and Techniques
- Formal Methods in Verification
- Computability, Logic, AI Algorithms
- Explainable Artificial Intelligence (XAI)
- Multimodal Machine Learning Applications
- Parallel Computing and Optimization Techniques
- Adversarial Robustness in Machine Learning
- Software Testing and Debugging Techniques
- Advanced Malware Detection Techniques
- Mathematics, Computing, and Information Processing
- Sentiment Analysis and Opinion Mining
- Artificial Intelligence in Games
- Text Readability and Simplification
- Ethics and Social Impacts of AI
- Spam and Phishing Detection
- Distributed and Parallel Computing Systems
- Google (United States), 2010-2024
- University of Bologna, 2023
- Athens University of Economics and Business, 2019-2023
- Télécom Paris, 2021
- Stockholm University, 2021
- University of Oxford, 2020
- University of South Carolina, 2020
- Cornell University, 2018
- University of Edinburgh, 2003-2018
- Wikimedia Foundation, 2018
We describe novel computational techniques for constructing induction rules for deductive synthesis proofs. Deductive synthesis holds out the promise of automated construction of correct computer programs from specifications of their desired behaviour. Synthesis of programs with iteration or recursion requires inductive proof, but standard techniques for constructing appropriate induction rules are restricted to recycling the recursive structure of the specifications. What is needed is a rule that can introduce novel recursive structures. We show that a combination of rippling and the use of meta-variables as...
We introduce and illustrate a new approach to measuring and mitigating unintended bias in machine learning models. Our definition of unintended bias is parameterized by a test set and a subset of input features. We illustrate how this can be used to evaluate text classifiers using a synthetic test set and a public corpus of comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how imbalances in training data can lead to unintended bias in the resulting models, and therefore potentially unfair applications. We use a set of common demographic identity terms as the features on which we...
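As a rough illustration of the synthetic, identity-term-based evaluation described above, the sketch below scores templated sentences that differ only in an identity term and compares per-term mean scores. The templates, terms, and placeholder scorer are all assumptions for illustration, not the paper's actual test set or model.

```python
# Sketch of bias measurement over a templated synthetic test set.
# The scorer is a stand-in; in practice it is the classifier under evaluation.
import random

TEMPLATES = ["I am a {} person.", "Being {} is normal.", "{} people are everywhere."]
IDENTITY_TERMS = ["gay", "straight", "muslim", "christian", "tall"]  # illustrative only

def score_toxicity(text: str) -> float:
    """Placeholder for the classifier under evaluation; returns a score in [0, 1]."""
    random.seed(hash(text) % (2 ** 32))
    return random.random()

def per_term_mean_scores():
    means = {}
    for term in IDENTITY_TERMS:
        scores = [score_toxicity(t.format(term)) for t in TEMPLATES]
        means[term] = sum(scores) / len(scores)
    return means

if __name__ == "__main__":
    means = per_term_mean_scores()
    overall = sum(means.values()) / len(means)
    for term, m in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
        # Large gaps on otherwise-neutral templates are one symptom of
        # unintended identity-term bias.
        print(f"{term:10s} mean={m:.3f} gap={m - overall:+.3f}")
```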
The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks online at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We show an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high quality...
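One step implied above is collapsing several crowd-worker judgments per comment into an aggregate label that a model can be trained and evaluated against. A toy sketch with made-up annotation records:

```python
# Aggregate per-comment crowd judgments into soft "attack" labels.
from collections import defaultdict

annotations = [  # (comment_id, worker_id, judged_as_attack)
    ("c1", "w1", 1), ("c1", "w2", 1), ("c1", "w3", 0),
    ("c2", "w1", 0), ("c2", "w4", 0),
]

votes = defaultdict(list)
for comment_id, _worker_id, is_attack in annotations:
    votes[comment_id].append(is_attack)

# Fraction of workers who flagged each comment; a classifier can then be
# judged by how many aggregated crowd-workers its scores approximate.
soft_labels = {cid: round(sum(v) / len(v), 2) for cid, v in votes.items()}
print(soft_labels)  # {'c1': 0.67, 'c2': 0.0}
```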
Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use...
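A minimal sketch of the kind of threshold-agnostic subgroup metrics described above (subgroup AUC, BPSN, and BNSP), assuming binary toxicity labels, model scores, and a boolean subgroup-membership array; all data below is toy data.

```python
# Threshold-agnostic subgroup bias metrics over toy labels and scores.
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(labels, scores, in_group):
    # AUC restricted to examples that mention the subgroup.
    return roc_auc_score(labels[in_group], scores[in_group])

def bpsn_auc(labels, scores, in_group):
    # Background Positive, Subgroup Negative.
    mask = (in_group & (labels == 0)) | (~in_group & (labels == 1))
    return roc_auc_score(labels[mask], scores[mask])

def bnsp_auc(labels, scores, in_group):
    # Background Negative, Subgroup Positive.
    mask = (in_group & (labels == 1)) | (~in_group & (labels == 0))
    return roc_auc_score(labels[mask], scores[mask])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=1000)                      # 0 = non-toxic, 1 = toxic
    scores = np.clip(labels * 0.6 + rng.normal(0.2, 0.2, 1000), 0, 1)
    in_group = rng.random(1000) < 0.2                            # toy subgroup membership
    print("subgroup AUC:", round(subgroup_auc(labels, scores, in_group), 3))
    print("BPSN AUC:    ", round(bpsn_auc(labels, scores, in_group), 3))
    print("BNSP AUC:    ", round(bnsp_auc(labels, scores, in_group), 3))
```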
Justine Zhang, Jonathan Chang, Cristian Danescu-Niculescu-Mizil, Lucas Dixon, Yiqing Hua, Dario Taraborelli, Nithum Thain. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018.
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two model sizes (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of the safety and responsibility aspects of the models,...
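A short sketch of loading and sampling from one of the released Gemma checkpoints via Hugging Face transformers; it assumes the `google/gemma-2b` checkpoint id, that the model license has been accepted, and that you are authenticated with the Hub.

```python
# Load a Gemma checkpoint and generate a short completion (sketch only;
# requires transformers, an accepted model license, and Hub authentication).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # the 2B pretrained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a haiku about open models.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```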
Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, modern dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences, in comparison to state-of-the-art collaborative filtering (CF) methods. To support this investigation,...
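A hedged sketch of the prompting setup described above: a prompt that combines an item history with a free-text preference. The prompt wording is an assumption, and `call_llm` is a placeholder for whichever LLM client is actually used.

```python
# Build a recommendation prompt from item-based and language-based preferences.
def build_prompt(liked_items, language_preference, n=5):
    items = "\n".join(f"- {title}" for title in liked_items)
    return (
        "A user liked the following items:\n"
        f"{items}\n"
        f"They also said: \"{language_preference}\"\n"
        f"Recommend {n} new items they might like, one per line."
    )

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in your LLM client here.")

if __name__ == "__main__":
    prompt = build_prompt(
        ["The Matrix", "Blade Runner", "Arrival"],
        "I want something slower-paced with a strong emotional core.",
    )
    print(prompt)  # in a real setting, pass `prompt` to call_llm(...)
```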
Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring...
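A toy numpy sketch of the general mechanism behind sparse activation: a router picks the top-2 experts per token, so only a fraction of the parameters are used for each input. This illustrates the idea only and is not the GLaM implementation.

```python
# Toy sparsely activated mixture-of-experts layer with top-2 gating.
import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts, top_k = 16, 8, 2

# Each expert is a small linear map; the router scores experts per token.
experts = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(num_experts)]
router = rng.normal(size=(d_model, num_experts)) * 0.1

def moe_layer(x):
    logits = x @ router                    # router scores, shape (num_experts,)
    top = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    # Only the selected experts run, so compute scales with top_k, not num_experts.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,)
```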
Moderation is crucial to promoting healthy online discussions. Although several 'toxicity' detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments may be judged independently. We investigate this assumption by focusing on two questions: (a) does context affect human judgement, and (b) does conditioning on context improve the performance of toxicity detection systems? We experiment with Wikipedia conversations, limiting the notion of context to the previous post in the thread and the discussion title. We find...
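A minimal sketch of the conditioning idea above: the target comment is scored together with the discussion title and the previous post. The classifier and the separator are stand-ins, not the systems evaluated in the paper.

```python
# Score a comment with and without its conversational context (sketch).
def classify(text: str) -> float:
    """Placeholder toxicity classifier returning a score in [0, 1]."""
    return min(1.0, 0.1 + 0.2 * text.lower().count("idiot"))

def score_with_context(title, parent, target, sep=" [SEP] "):
    # One simple way to condition on context: concatenate the discussion
    # title, the previous post, and the target comment into a single input.
    return classify(sep.join([title, parent, target]))

print(classify("You are an idiot and should be blocked."))
print(score_with_context(
    "Talk page: Edit dispute",
    "Please stop reverting my edits.",
    "You are an idiot and should be blocked.",
))
```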
Discussing things you care about can be difficult, especially via online platforms, where sharing your opinion leaves you open to the real and immediate threats of abuse and harassment. Due to these threats, people stop expressing themselves and give up on seeking different opinions. Recent research efforts focus on examining the strengths and weaknesses (e.g. potential unintended biases) of using machine learning as a support tool to facilitate a safe space for discussions; for example, through detecting various types of negative...
This paper presents the application of two strong baseline systems for toxicity detection and evaluates their performance in identifying and categorizing offensive language in social media. PERSPECTIVE is an API that serves multiple machine learning models for the improvement of conversations online, as well as a system trained on a wide variety of comments from platforms across the Internet. BERT is a recently popular language representation model, fine-tuned per task and achieving state-of-the-art results on many NLP tasks. One baseline performed better than the other in detecting...
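A hedged sketch of querying the Perspective API for a toxicity score over REST, roughly following its public quickstart. The endpoint, request fields, and response path are recalled from that documentation and should be checked against the current version; a real API key is required.

```python
# Request a TOXICITY score from the Perspective API (sketch; verify fields
# against the current Perspective API documentation before relying on this).
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text: str) -> float:
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=body, timeout=10)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

if __name__ == "__main__":
    # Requires a real API key and network access.
    print(toxicity_score("You are a wonderful person."))
```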
One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks. In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. As opposed to detecting undesirable behavior after the fact, this approach aims to enable early, actionable prediction at a time when the conversation might still be salvaged. To this end, we develop a framework for capturing pragmatic devices---such as politeness strategies and rhetorical prompts---used to start a conversation,...
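A toy sketch of the overall setup: features extracted from the opening comments of a conversation feed a classifier that predicts later derailment. The features here are crude n-gram proxies and the data is made up; they stand in for the paper's politeness and rhetorical features.

```python
# Predict conversation derailment from the first comment (toy example).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

first_comments = [
    "Could you please explain this revert? Thanks!",
    "Why did you remove my edit?? This is ridiculous.",
    "I appreciate the sources you added, nice work.",
    "Stop pushing your agenda on this page.",
]
derailed = [0, 1, 0, 1]  # 1 = the conversation later turned into a personal attack

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(first_comments, derailed)
print(model.predict(["Please stop reverting, this is getting absurd."]))
```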
Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. This is prohibitively time-consuming for human moderators to do, and computational approaches are still nascent. This work focuses on models that can suggest rephrasings of toxic comments in a more civil manner. Inspired by recent...
We introduce the Constructive Comments Corpus (C3), comprised of 12,000 annotated news comments, intended to help build new tools for online communities to improve the quality of their discussions. We define constructive comments as high-quality comments that make a contribution to the conversation. We explain the crowd worker annotation scheme and define a taxonomy of subcharacteristics of constructiveness. The resulting dataset is evaluated using measurements of inter-annotator agreement, expert assessment of a sample, and by constructiveness...
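One of the agreement measurements mentioned above can be computed, for example, as Cohen's kappa between two annotators' constructive / not-constructive judgments; the labels below are toy data, not the corpus annotations.

```python
# Inter-annotator agreement via Cohen's kappa on toy binary labels.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 1, 0, 1, 0, 0, 1, 0]  # 1 = constructive
annotator_b = [1, 0, 0, 1, 0, 1, 1, 0]
print(round(cohen_kappa_score(annotator_a, annotator_b), 3))
```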
String diagrams are a powerful tool for reasoning about physical processes, logic circuits, tensor networks and many other compositional structures. The distinguishing feature of these diagrams is that edges need not be connected to vertices at both ends, and these unconnected ends can be interpreted as the inputs and outputs of the diagram. In this paper, we give a concrete construction for string diagrams using a special kind of typed graph called an open-graph. While the category of open-graphs is not itself adhesive, we introduce the notion of a selective adhesive...
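A toy Python encoding of the "open" idea above: edges may have a missing endpoint, and dangling ends are read as the diagram's inputs and outputs. The names and representation are illustrative only, not the paper's categorical construction.

```python
# Minimal open-graph-style data structure: edges with optional endpoints.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    src: Optional[str]  # None = dangling end, i.e. a diagram input
    dst: Optional[str]  # None = dangling end, i.e. a diagram output

@dataclass
class OpenGraph:
    vertices: set
    edges: list

    def inputs(self):
        return [e for e in self.edges if e.src is None]

    def outputs(self):
        return [e for e in self.edges if e.dst is None]

# A single vertex f with one input wire and two output wires.
g = OpenGraph({"f"}, [Edge(None, "f"), Edge("f", None), Edge("f", None)])
print(len(g.inputs()), len(g.outputs()))  # 1 2
```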
Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.
Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results of this approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation. The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model, and how the responses from the two models are qualitatively different. We...
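A minimal sketch of the kind of aggregation such a tool visualizes: pairwise side-by-side judgments rolled up into per-category win rates. The judgment records, field names, and categories are made up for illustration.

```python
# Aggregate side-by-side judgments into per-category win rates for one model.
from collections import defaultdict

judgments = [
    {"category": "coding", "winner": "model_a"},
    {"category": "coding", "winner": "model_b"},
    {"category": "reasoning", "winner": "model_a"},
    {"category": "reasoning", "winner": "tie"},
]

def win_rates(records, model="model_a"):
    totals, wins = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["category"]] += 1
        if r["winner"] == model:
            wins[r["category"]] += 1
    return {category: wins[category] / totals[category] for category in totals}

print(win_rates(judgments))  # {'coding': 0.5, 'reasoning': 0.5}
```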
Ilan Price, Jordan Gifford-Moore, Jory Flemming, Saul Musker, Maayan Roichman, Guillaume Sylvain, Nithum Thain, Lucas Dixon, Jeffrey Sorensen. Proceedings of the Fourth Workshop on Online Abuse and Harms. 2020.