Robert West

ORCID: 0000-0002-3984-1232
Research Areas
  • Topic Modeling
  • Wikis in Education and Collaboration
  • Natural Language Processing Techniques
  • Misinformation and Its Impacts
  • Synthesis and characterization of novel inorganic/organometallic compounds
  • Social Media and Politics
  • Hate Speech and Cyberbullying Detection
  • Organometallic Complex Synthesis and Catalysis
  • Complex Network Analysis Techniques
  • Advanced Text Analysis Techniques
  • Media Influence and Politics
  • Opinion Dynamics and Social Influence
  • Sentiment Analysis and Opinion Mining
  • Digital Marketing and Social Media
  • Multimodal Machine Learning Applications
  • Open Source Software Innovations
  • Privacy-Preserving Technologies in Data
  • Spam and Phishing Detection
  • Molecular Junctions and Nanostructures
  • Digital Games and Media
  • Organoboron and organosilicon chemistry
  • Cancer-related gene regulation
  • Mobile Crowdsensing and Crowdsourcing
  • Web Data Mining and Analysis
  • Software Engineering Research

École Polytechnique Fédérale de Lausanne
2017-2025

Swiss Data Science Center
2024

ETH Zurich
2021-2023

University of Cambridge
2021-2023

University of Chicago
2023

Institute of Software
2022

Chinese Academy of Sciences
2022

Vrije Universiteit Amsterdam
2022

Laboratoire d'Informatique Fondamentale de Lille
2021-2022

University of Florida
2022

Non-profits, as well as the media, have hypothesized the existence of a radicalization pipeline on YouTube, claiming that users systematically progress towards more extreme content on the platform. Yet, there is to date no substantial quantitative evidence of this alleged pipeline. To close this gap, we conduct a large-scale audit of user radicalization on YouTube. We analyze 330,925 videos posted on 349 channels, which we broadly classified into four types: Media, the Alt-lite, the Intellectual Dark Web (I.D.W.), and the Alt-right. According...

10.1145/3351095.3372879 article EN 2020-01-23

Vibrant online communities are in constant flux. As members join and depart, the interactional norms evolve, stimulating further changes to the membership and its social dynamics. Linguistic change, in the sense of innovation that becomes accepted as the norm, is essential to this dynamic process: it both facilitates individual expression and fosters the emergence of a collective identity.

10.1145/2488388.2488416 article EN 2013-05-13

Wikipedia is a major source of information for many people. However, false information on Wikipedia raises concerns about its credibility. One way in which false information may be presented on Wikipedia is in the form of hoax articles, i.e., articles containing fabricated facts about nonexistent entities or events. In this paper we study false information on Wikipedia by focusing on the hoax articles that have been created throughout its history. We make several contributions. First, we assess the real-world impact of hoax articles by measuring how long they survive before being debunked, how many pageviews they receive, and how heavily they are referred to...

10.1145/2872427.2883085 article EN 2016-04-11

Over the past few years, massive amounts of world knowledge have been accumulated in publicly available knowledge bases, such as Freebase, NELL, and YAGO. Yet despite their seemingly huge size, these knowledge bases are greatly incomplete. For example, over 70% of people included in Freebase have no known place of birth, and 99% have no known ethnicity. In this paper, we propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way. In particular, for each entity attribute, we learn the best set of queries to ask,...

10.1145/2566486.2568032 article EN 2014-04-07

Person-to-person evaluations are prevalent in all kinds of discourse and important for establishing reputations, building social bonds, and shaping public opinion. Such evaluations can be analyzed separately using signed social networks and textual sentiment analysis, but this misses the rich interactions between language and social context. To capture such interactions, we develop a model that predicts individual A’s opinion of individual B by synthesizing information from the signed social network in which A and B are embedded with sentiment analysis of the evaluative texts relating A to...

10.1162/tacl_a_00184 article EN cc-by Transactions of the Association for Computational Linguistics 2014-12-01

Large language models (LLMs) are remarkable data annotators. They can be used to generate high-fidelity supervised training data, as well as survey and experimental data. With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLMs, as crowd workers have financial incentives to use LLMs to increase their productivity and income. To investigate this...

10.48550/arxiv.2306.07899 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Can large language models (LLMs) create tailor-made, convincing arguments to promote false or misleading narratives online? Early work has found that LLMs can generate content perceived on par with, or even more persuasive than, human-written messages. However, there is still limited evidence regarding LLMs' persuasive capabilities in direct conversations with humans, the scenario in which these models are usually deployed. In this pre-registered study, we analyze the power of AI-driven...

10.21203/rs.3.rs-4429707/v1 preprint EN cc-by Research Square (Research Square) 2024-06-05

Navigating information spaces is an essential part of our everyday lives, and in order to design efficient and user-friendly information systems, it is important to understand how humans navigate and find the information they are looking for. We perform a large-scale study of human wayfinding, in which, given a network of links between the concepts of Wikipedia, people play a game of finding a short path from a given start to a given target concept by following hyperlinks. What distinguishes our setup from other studies of human Web-browsing behavior is that in our case people navigate a graph of connections between concepts,...

10.1145/2187836.2187920 article EN 2012-04-16

Wikipedia is one of the most popular sites on the Web, with millions of users relying on it to satisfy a broad range of information needs every day. Although it is crucial to understand what exactly these needs are in order to be able to meet them, little is currently known about why users visit Wikipedia. The goal of this paper is to fill this gap by combining a survey of Wikipedia readers with a log-based analysis of user activity. Based on an initial series of user surveys, we build a taxonomy of Wikipedia use cases along several dimensions, capturing users' motivations to visit Wikipedia, the depth...

10.1145/3038912.3052716 preprint EN 2017-04-03

It is urgent to understand how to effectively communicate public health messages during the COVID-19 pandemic. Previous work has focused on how to formulate messages in terms of style and content, rather than on who should send them. In particular, little is known about the impact of spokesperson selection on message propagation in times of crisis. We report on the effectiveness of different public figures at promoting social distancing among 12,194 respondents from six countries that were severely affected by the COVID-19 pandemic at the time of data collection. Across...

10.1371/journal.pone.0245100 article EN cc-by PLoS ONE 2021-02-03

Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermediate reasoning steps while interacting with a critic model that provides automated feedback on the reasoning. Specifically, the critic provides structured feedback that the reasoning LM uses to iteratively improve...

10.48550/arxiv.2304.01904 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Large language models (LLMs) have great potential for synthetic data generation. This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, by generating plausible input text for a target output structure. Leveraging this asymmetry in task difficulty makes it possible to produce large-scale, high-quality data for complex tasks. We demonstrate the effectiveness of this approach on closed...

10.18653/v1/2023.emnlp-main.96 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

In recent years, critics of online platforms have raised concerns about the ability of recommendation algorithms to amplify problematic content, with potentially radicalizing consequences. However, attempts to evaluate the effect of recommenders have suffered from a lack of appropriate counterfactuals (what a user would have viewed in the absence of algorithmic recommendations) and hence cannot disentangle the effects of the algorithm from a user’s intentions. Here we propose a method that we call “counterfactual bots” to causally estimate the role...

10.1073/pnas.2313377121 article EN Proceedings of the National Academy of Sciences 2024-02-13

Nutrition is a key factor in people's overall health. Hence, understanding the nature and dynamics of population-wide dietary preferences over time and space can be valuable in public health. To date, studies have leveraged small samples of participants via food intake logs or treatment data. We propose a complementary source of population data on nutrition obtained via Web logs. Our main contribution is a spatiotemporal analysis of population-wide dietary preferences through the lens of logs gathered by a widely distributed Web-browser add-on, using the access volume of recipes...

10.1145/2488388.2488510 article EN 2013-05-13

Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual textual similarity. In this paper, we concern ourselves with reference-free machine translation (MT) evaluation where we directly compare source texts to (sometimes low-quality) system translations, which represents a natural adversarial setup for multilingual encoders. Reference-free evaluation holds the promise of web-scale comparison of MT systems. We systematically...

10.18653/v1/2020.acl-main.151 article EN cc-by 2020-01-01

Researchers have suggested that "the Manosphere," a conglomerate of men-centered online communities, may serve as a gateway to far right movements. In that context, this paper quantitatively studies the migratory patterns between a variety of groups within the Manosphere and the Alt-right, a loosely connected far right movement that has been particularly active in mainstream social networks. Our analysis leverages over 300 million comments spread through Reddit (in 115 subreddits) and YouTube (in 526 channels) to investigate whether...

10.1145/3447535.3462504 article EN 2021-06-21

People regularly face tasks that can be understood as navigation in information networks, where the goal is to find a path between two given nodes. In many such situations, the navigator only gets local access to the node currently under inspection and its immediate neighbors. This lack of global information about the network notwithstanding, humans tend to be good at finding short paths, despite the fact that real-world networks are typically very large. One potential reason for this could be that humans possess vast amounts of background knowledge...

10.1609/icwsm.v6i1.14238 article EN Proceedings of the International AAAI Conference on Web and Social Media 2021-08-03

Political polarization appears to be on the rise, as measured by voting behavior, general affect towards opposing partisans and their parties, and contents posted and consumed online. Research over the years has focused on the role of the Web as a driver of polarization. In order to further our understanding of the factors behind online polarization, in the present work we collect and analyze Web browsing histories of tens of thousands of users alongside careful measurements of the time spent browsing various news sources. We show that online news consumption follows a polarized...

10.1609/icwsm.v15i1.18049 article EN Proceedings of the International AAAI Conference on Web and Social Media 2021-05-22

Martin Josifoski, Nicola De Cao, Maxime Peyrard, Fabio Petroni, Robert West. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.

10.18653/v1/2022.naacl-main.342 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01

Online social media platforms use automated moderation systems to remove or reduce the visibility of rule-breaking content. While previous work has documented the importance of manual content moderation, the effects of automated content moderation remain largely unknown. Here, in a large study of Facebook comments (n = 412M), we used a fuzzy regression discontinuity design to measure the impact of automated content moderation on subsequent rule-breaking behavior (number of comments hidden/deleted) and engagement (number of additional comments posted). We found that comment deletion decreased subsequent rule-breaking behavior in shorter threads (20 or fewer...

10.1145/3543507.3583275 article EN Proceedings of the ACM Web Conference 2023 2023-04-26

Generative language models (LMs) have become omnipresent across data science. For a wide variety of tasks, inputs can be phrased as natural language prompts for an LM, from whose output the solution can then be extracted. LM performance has consistently been increasing with model size, but so has the monetary cost of querying the ever larger models. Importantly, however, not all inputs are equally hard: some require larger LMs for obtaining a satisfactory solution, whereas for others smaller LMs suffice. Based on this fact, we design a framework...

10.1145/3616855.3635825 preprint EN 2024-03-04