Mark Díaz

ORCID: 0000-0003-0167-9839
Research Areas
  • Topic Modeling
  • Hate Speech and Cyberbullying Detection
  • Ethics and Social Impacts of AI
  • Misinformation and Its Impacts
  • Mobile Crowdsensing and Crowdsourcing
  • Natural Language Processing Techniques
  • Innovative Human-Technology Interaction
  • Insurance, Mortality, Demography, Risk Management
  • Sentiment Analysis and Opinion Mining
  • Multimodal Machine Learning Applications
  • Privacy-Preserving Technologies in Data
  • Social Media and Politics
  • Health, Environment, Cognitive Aging
  • Swearing, Euphemism, Multilingualism
  • Interdisciplinary Research and Collaboration
  • Semantic Web and Ontologies
  • Mental Health via Writing
  • Innovative Teaching Methodologies in Social Sciences
  • Speech and Dialogue Systems
  • Biomedical and Engineering Education
  • Psychology of Moral and Emotional Judgment
  • Smart Cities and Technologies
  • Team Dynamics and Performance
  • Healthcare Cost, Quality, Practices
  • Scientific Computing and Data Management

Stanford University
2024

Google (United States)
2021-2024

Google (United Kingdom)
2024

DeepMind (United Kingdom)
2024

Carnegie Mellon University
2024

University of California, San Diego
2024

Chicago Arts Partnerships in Education
2022

Southern California University for Professional Studies
2021

University of Southern California
2021

Northwestern University
2017-2020

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM. We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate...

10.48550/arxiv.2204.02311 preprint EN cc-by arXiv (Cornell University) 2022-01-01

We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvements on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge,...

10.48550/arxiv.2201.08239 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Majority voting and averaging are common approaches used to resolve annotator disagreements and derive single ground truth labels from multiple annotations. However, annotators may systematically disagree with one another, often reflecting their individual biases and values, especially in the case of subjective tasks such as detecting affect, aggression, and hate speech. Annotator disagreements may capture important nuances in such tasks that are often ignored while aggregating annotations to a single ground truth. In order to address this, we investigate...

10.1162/tacl_a_00449 article EN cc-by Transactions of the Association for Computational Linguistics 2022-01-01

We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also...

10.48550/arxiv.2305.10403 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Participatory approaches to artificial intelligence (AI) and machine learning (ML) are gaining momentum: the increased attention comes partly with the view that participation opens the gateway to an inclusive, equitable, robust, responsible and trustworthy AI. Among other benefits, participatory approaches are essential to understanding and adequately representing the needs, desires and perspectives of historically marginalized communities. However, there currently exists a lack of clarity on what meaningful participation entails and what it is expected to do. In...

10.1145/3551624.3555290 preprint EN 2022-10-06

Computational approaches to text analysis are useful in understanding aspects of online interaction, such as opinions and subjectivity in text. Yet, recent studies have identified various forms of bias in language-based models, raising concerns about the risk of propagating social biases against certain groups based on sociodemographic factors (e.g., gender, race, geography). In this study, we contribute a systematic examination of the application of language models to study discourse on aging. We analyze the treatment...

10.1145/3173574.3173986 article EN 2018-04-20

A common practice in building NLP datasets, especially using crowd-sourced annotations, involves obtaining multiple annotator judgements on the same data instances, which are then flattened to produce a single “ground truth” label or score, through majority voting, averaging, or adjudication. While these approaches may be appropriate in certain annotation tasks, such aggregations overlook the socially constructed nature of human perceptions that annotations for relatively more subjective tasks are meant...

10.18653/v1/2021.law-1.14 article EN cc-by 2021-01-01

Human annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights, and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators' lived experiences can impact their annotations, and (2)...

10.1145/3531146.3534647 article EN 2022 ACM Conference on Fairness, Accountability, and Transparency 2022-06-20

Human participants play a central role in the development of modern artificial intelligence (AI) technology, in psychological science, and in user research. Recent advances in generative AI have attracted growing interest to the possibility of replacing human participants in these domains with AI surrogates. We survey several such "substitution proposals" to better understand the arguments for and against substituting human participants with AI. Our scoping review indicates that the recent wave of these proposals is motivated by goals such as reducing the costs of research and development work...

10.1145/3613904.3642703 article EN cc-by-sa 2024-05-11

Cities are increasingly integrating sensing and information and communication technologies to improve municipal services, civic engagement, and quality of life for residents. Although these technologies have the potential to affect economic, social, and environmental factors, there has been less focus on residents of lower income communities' involvement in technology design. Based on two public forums held in underserved communities, we describe residents' perceptions of their communities and the challenges that limit technologies'...

10.1145/3359225 article EN Proceedings of the ACM on Human-Computer Interaction 2019-11-07

Ageism is a pervasive, and often invisible, form of discrimination. Though it can affect people of all ages, older adults in particular face age-related stereotypes and bias in their everyday lives. In this paper, we describe the ways in which bloggers articulate a collective narrative on ageism as it appears in their lives, develop a community with anti-ageist interests, and discuss strategies to navigate and change societal views and institutions. Bloggers criticize stereotypical notions that focus exclusively on losses that occur with age...

10.1145/2998181.2998275 article EN 2017-02-14

Recent studies have identified various forms of bias in language-based models, raising concerns about the risk of propagating social biases against certain groups based on sociodemographic factors (e.g., gender, race, geography). In this study, we analyze the treatment of age-related terms across 15 sentiment analysis models and 10 widely-used GloVe word embeddings and attempt to alleviate bias through a method of processing model training data. Our results show significant age bias is encoded in the outputs of many sentiment analysis algorithms...

10.24963/ijcai.2019/852 article EN 2019-07-28

How do historically marginalized narratives spread on social media platforms? Developing research in collaboration with intersectional artists and community, or what we call “platforming intersectionality,” can reveal the promise and limitations of social media for bridging disparate, segregated communities, or what we call “networked solidarity.” Using case studies indie TV series about show that intersectionality corporate platforms, but causes are largely visible outside both online and offline. Basic conditions for spreading...

10.1177/2056305120933301 article EN cc-by-nc Social Media + Society 2020-07-01

Human annotations play a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into building ML datasets have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights, and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators' lived experiences can impact their annotations, and (2)...

10.48550/arxiv.2112.04554 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Machine learning approaches often require training and evaluation datasets with a clear separation between positive and negative examples. This risks simplifying and even obscuring the inherent subjectivity present in many tasks. Preserving such variance in content and diversity in datasets is often expensive and laborious. This is especially troubling when building safety datasets for conversational AI systems, as safety is both socially and culturally situated. To demonstrate this crucial aspect of conversational AI safety, and to facilitate in-depth model performance analyses,...

10.48550/arxiv.2306.11247 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Recent years have seen substantial investments in AI-based tools designed to detect offensive language at scale, aiming to moderate social media platforms, and to ensure safety of conversational AI technologies such as ChatGPT and Bard. These efforts largely treat this task as a technical endeavor, relying on data annotated for offensiveness by a global crowd workforce, without considering crowd workers' socio-cultural backgrounds or the values their perceptions reflect. Existing research that examines...

10.1145/3630106.3659021 article EN other-oa 2024 ACM Conference on Fairness, Accountability, and Transparency 2024-06-03

With the rise of generative AI (GenAI), there has been an increased need for participation by large and diverse user bases in AI evaluation and auditing. GenAI developers are increasingly adopting crowdsourcing approaches to test and audit their AI products and services. However, it remains an open question how to design and deploy responsible and effective crowdsourcing pipelines for AI auditing and evaluation. This workshop aims to take a step towards bridging this gap. Our interdisciplinary team of organizers will work with workshop participants to explore...

10.1609/hcomp.v12i1.31609 article EN Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 2024-10-14

Machine translation (MT) is now widely and freely available, and has the potential to greatly improve cross-lingual communication. In order to use MT reliably and safely, end users must be able to assess the quality of system outputs and determine how much they can rely on them to guide their decisions and actions. However, it can be difficult for users to detect and recover from mistranslations due to limited language skills. In this work we collected 19 MT-mediated role-play conversations in housing and employment scenarios, and conducted in-depth...

10.1145/3531146.3534638 article EN 2022 ACM Conference on Fairness, Accountability, and Transparency 2022-06-20

Cities are increasingly integrating urban technologies into their infrastructures to improve municipal services, civic engagement, and quality of life for residents. Research suggests that technologies implemented in communities can worsen existing inequalities, yet there is little understanding of what underserved residents think about technologies or how they engage with cities on technology policies and practices. Based on two forums held in underserved communities, we found that residents are motivated to participate in city planning because they believe technology impacts the...

10.1145/3170427.3188583 article EN 2018-04-20

Tasks such as toxicity detection, hate speech detection, and online harassment detection have been developed for identifying interactions involving offensive speech. In this work we articulate the need for a relational understanding of offensiveness to help distinguish denotative offensive speech from offensive speech serving as a mechanism through which marginalized communities resist oppressive social norms. Using examples from the queer community, we argue that evaluations of offensive speech must focus on the impacts of language use. We call this the cynic perspective– or a characteristic...

10.18653/v1/2022.woah-1.18 article EN cc-by 2022-01-01

The Walk Score is a patented algorithm for measuring the walkability of a given geographic area. In addition to its use in real estate, the accompanying API is used in a range of research in public health and urban development. This study explores how neighborhood residents differently understand the notion of walkability as well as the extent to which their personal definitions of walkability are reflected in the Walk Score's underlying algorithm. We find that, while the Walk Score generally aligns with residents' priorities around walkability, there are significant subjective aspects that...

10.1145/3359228 article EN Proceedings of the ACM on Human-Computer Interaction 2019-11-07

In this paper, we present findings from a semi-experimental exploration of rater diversity and its influence on safety annotations of conversations generated by humans talking to a generative AI-chat bot. We find significant differences in judgments produced by raters from different geographic regions and annotation platforms, and correlate these perspectives with demographic sub-groups. Our work helps define best practices in model development -- specifically human evaluation of generative models -- on the backdrop of growing...

10.48550/arxiv.2301.09406 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Human participants play a central role in the development of modern artificial intelligence (AI) technology, in psychological science, and in user research. Recent advances in generative AI have attracted growing interest to the possibility of replacing human participants in these domains with AI surrogates. We survey several such "substitution proposals" to better understand the arguments for and against substituting human participants with AI. Our scoping review indicates that the recent wave of these proposals is motivated by goals such as reducing the costs of research and development work...

10.48550/arxiv.2401.08572 preprint EN cc-by arXiv (Cornell University) 2024-01-01