NFDI4DS | UHH-SEMS - Publication Details

Human language reveals a universal positivity bias

OPENALEX - Publications

Peter Sheridan Dodds, Eric M. Clark, Suma Desu, Morgan R. Frank, Andrew J. Reagan, and 9 more

Significance The most commonly used words of 24 corpora across 10 diverse human languages exhibit a clear positive bias, a big data confirmation of the Pollyanna hypothesis. The study’s findings are based on 5 million individual scores and pave the way for the development of powerful language-based tools for measuring emotion.

10.1073/pnas.1411678112 article EN Proceedings of the National Academy of Sciences 2015-02-09

Sifting robotic from organic text: A natural language approach for detecting automation on Twitter

Eric M. Clark, Jake Ryland Williams, Chris Jones, Richard A. Galbraith, Christopher M. Danforth, and 1 more

10.1016/j.jocs.2015.11.002 article EN Journal of Computational Science 2015-11-19

BuzzFace: A News Veracity Dataset with Facebook User Commentary and Egos

Giovanni C. Santia, Jake Ryland Williams

Veracity assessment of news and social bot detection have become two of the most pressing issues for social media platforms, yet current gold-standard data are limited. This paper presents a leap forward in the development of a sizeable and feature rich gold-standard dataset. The dataset was built by using a collection of news items posted to Facebook by nine news outlets during September 2016, which were annotated for veracity by BuzzFeed. These articles were refined beyond binary annotation to the four categories: mostly true, mostly false, mixture of true and false, and no factual...

10.1609/icwsm.v12i1.14985 article EN Proceedings of the International AAAI Conference on Web and Social Media 2018-06-15

Vaporous Marketing: Uncovering Pervasive Electronic Cigarette Advertisements on Twitter

Eric M. Clark, Chris Jones, Jake Ryland Williams, Allison N. Kurti, Mitchell C. Norotsky, and 2 more

Twitter has become the "wild-west" of marketing and promotional strategies for advertisement agencies. Electronic cigarettes have been heavily marketed across Twitter feeds, offering discounts, "kid-friendly" flavors, algorithmically generated false testimonials, and free samples. All electronic cigarette keyword related tweets from a 10% sample of Twitter spanning January 2012 through December 2014 (approximately 850,000 total tweets) were identified and categorized as Automated or Organic by combining a keyword classification...

10.1371/journal.pone.0157304 article EN cc-by PLoS ONE 2016-07-13

Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs

Andrew J. Reagan, Christopher M. Danforth, Brian F. Tivnan, Jake Ryland Williams, Peter Sheridan Dodds

The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, an extraordinary capacity which has profound implications for our understanding of human behavior. Given the growing assortment of sentiment-measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both their classification accuracy and their ability to provide richer understanding of texts. Here, we perform detailed, quantitative tests and qualitative assessments of 6...

10.1140/epjds/s13688-017-0121-9 article EN cc-by EPJ Data Science 2017-10-30

Zipf’s law holds for phrases, not words

Jake Ryland Williams, Paul R. Lessard, Suma Desu, Eric M. Clark, James P. Bagrow, and 2 more

With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning,...

10.1038/srep12209 article EN cc-by Scientific Reports 2015-08-11

The Reliability and Construct Validity of the Functional Threshold Power Test in Recreational Cyclists

Lewis A. Gough, Jake Ryland Williams, GM Downes, Savannah Sturridge, Ashley Warner, and 3 more

Background/objectives: The functional threshold power (FTP) 20-min test (FTP20) is popular amongst cyclists and coaches due to the theory that it can predict the power output that can be sustained for 60-mins. However, little is known in terms of the reliability and validity of this construct, therefore the aim of this study was to assess the reliability of the FTP20 and the construct validity of 60-min power. Methods: Twenty-two male trained cyclists (age = 32 ± 10 years, body mass (BM) = 77.2 ± 6.8 kg, maximal oxygen uptake (V̇O2max) = 59.4 ± 5.6 ml.kg-1.min-1 BM) completed four trials...

10.20944/preprints202502.0505.v1 preprint EN 2025-02-07

Detecting Social Bots on Facebook in an Information Veracity Context

Giovanni C. Santia, Munif Ishad Mujib, Jake Ryland Williams

Misleading information is nothing new, yet its impacts seem only to grow. We investigate this phenomenon in the context of social bots. Social bots are software agents that mimic humans. They are intended to interact with humans while supporting specific agendas. This work explores the effect of social bots on the spread of misinformation on Facebook during the Fall of 2016 and prototypes a tool for their detection. Using a dataset of about two million user comments discussing the posts of public pages for nine verified news outlets, we first...

10.1609/icwsm.v13i01.3244 article EN Proceedings of the International AAAI Conference on Web and Social Media 2019-07-06

Text mixing shapes the anatomy of rank-frequency distributions

Jake Ryland Williams, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this "law" of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora since the late 1990s have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and noncore lexica. Here we present and defend an...

10.1103/physreve.91.052811 article EN Physical Review E 2015-05-20

The Lexicocalorimeter: Gauging public health through caloric input and output on social media

Sharon E. Alajajian, Jake Ryland Williams, Andrew J. Reagan, Stephen Alajajian, Morgan R. Frank, and 4 more

We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the "caloric content" of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food and activity related phrases, and respectively assigning them with sourced estimates of caloric intake and expenditure. We show that for Twitter, our naive measures of "caloric input", "caloric output", and the ratio of these measures are all strong correlates with health and well-being measures for the contiguous United States. Our caloric balance measure in many cases...

10.1371/journal.pone.0168893 article EN cc-by PLoS ONE 2017-02-10

Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs

Andrew J. Reagan, Brian F. Tivnan, Jake Ryland Williams, Christopher M. Danforth, Peter Sheridan Dodds

The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, bearing profound implications for our understanding of human behavior. Given the growing assortment of sentiment measuring instruments, comparisons between them are evidently required. Here, we perform detailed tests of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that a dictionary-based method will only perform both reliably and meaningfully if (1)...

10.48550/arxiv.1512.00531 preprint EN cc-by-nc-sa arXiv (Cornell University) 2015-01-01

An Evaluation of Generative Pre-Training Model-based Therapy Chatbot for Caregivers

Lu Wang, Munif Ishad Mujib, Jake Ryland Williams, George Demiris, Jina Huh

With the advent of off-the-shelf intelligent home products and broader internet adoption, researchers increasingly explore smart computing applications that provide easier access to health and wellness resources. AI-based systems like chatbots have the potential to provide services that could provide mental health support. However, existing therapy chatbots are often retrieval-based, requiring users to respond with a constrained set of answers, which may not be appropriate given that such pre-determined inquiries may not reflect each patient's unique...

10.48550/arxiv.2107.13115 preprint EN other-oa arXiv (Cornell University) 2021-01-01

A scalable machine learning approach for measuring violent and peaceful forms of political protest participation with social media data

Lefteris Jason Anastasopoulos, Jake Ryland Williams

In this paper, we introduce a scalable machine learning approach accompanied by open-source software for identifying violent and peaceful forms of political protest participation using social media data. While violent protests are statistically rare events, they often shape public perceptions of political and social movements. This is, in part, due to the extensive and disproportionate media coverage which violent protest participation receives relative to peaceful protest participation. In the past, when a small number of media conglomerates served as the primary information source for learning about political and social movements,...

10.1371/journal.pone.0212834 article EN cc-by PLoS ONE 2019-03-19

Motivational Interviewing Conversational Agent for Parents as Proxies for Their Children in Healthy Eating: Development and User Testing

Diva Smriti, Tsui‐Sui Annie Kao, Rahil Rathod, Ji Youn Shin, Wei Peng, and 4 more

Increased adoption of off-the-shelf conversational agents (CAs) brings opportunities to integrate therapeutic interventions. Motivational Interviewing (MI) can then be integrated with CAs for cost-effective access to it. MI can be especially beneficial for parents who often have low motivation because of limited time and resources to eat healthy together with their children. We developed a Motivational Interviewing Conversational Agent (MICA) to improve healthy eating in parents who serve as a proxy for health behavior change in their children. Proxy relationships involve a person...

10.2196/38908 article EN cc-by JMIR Human Factors 2022-08-07

Simon's fundamental rich-get-richer model entails a dominant first-mover advantage

Peter Sheridan Dodds, David Rushing Dewhurst, Fletcher F. Hazlehurst, Colin M. Van Oort, Lewis Mitchell, and 3 more

Herbert Simon's classic rich-get-richer model is one of the simplest empirically supported mechanisms capable of generating heavy-tail size distributions for complex systems. Simon argued analytically that a population of flavored elements growing by either adding a novel element or randomly replicating an existing one would afford a distribution of group sizes with a power-law tail. Here, we show that, in fact, Simon's model does not produce a simple power law size distribution as the initial element has a dominant first-mover advantage, and will be...

10.1103/physreve.95.052301 article EN Physical Review E 2017-05-01

NALM-based, phase-preserving 2R regenerator of high-duty-cycle pulses

Taras I. Lakoba, Jake Ryland Williams, Michael Vasilyev

We explore the potential of a nonlinear amplifying loop mirror (NALM)-based phase-preserving 2R (reamplification and reshaping) regenerator for simultaneous regeneration of multiple wavelength-division-multiplexed (WDM) channels. While not considering multi-channel propagation, we address two issues of the NALM that appear to us as major obstacles in adopting it for realistic WDM applications: a high operating power and the detrimental effect of non-small (33% - 50%) pulse duty cycles. After thorough optimization,...

10.1364/oe.19.023017 article EN cc-by Optics Express 2011-10-28

Abstract 5419: HDAC1 regulates RUNX1 activity in inv(16) acute myeloid leukemia

Lisa Richter, Yiqian Wang, Michelle Becker, Jake Ryland Williams, R. Katherine Hyde

Abstract RUNX1 and CBFβ form a transcription factor dimer that regulates normal hematopoiesis and leukemogenesis. Inversion of chromosome 16 (inv(16)) is one of the most common mutations in acute myeloid leukemia (AML), fusing CBFβ with the gene encoding smooth muscle myosin heavy chain (MYH11). The fusion protein encoded by CBFB-MYH11 (CM) retains the ability to bind RUNX1, and together they cause changes in gene expression leading to leukemia. Recently, we found that Histone Deacetylase 1 (HDAC1) is part of the RUNX1:CM complex, and all three proteins...

10.1158/1538-7445.am2018-5419 article EN Cancer Research 2018-07-01

The Earth Is Flat and the Sun Is Not a Star: The Susceptibility of GPT-2 to Universal Adversarial Triggers

Hunter Scott Heidenreich, Jake Ryland Williams

This work considers universal adversarial triggers, a method of adversarially disrupting natural language models, and questions if it is possible to use such triggers to affect both the topic and stance of conditional text generation models. In considering four "controversial" topics, this work demonstrates success at identifying triggers that cause the GPT-2 model to produce text about targeted topics as well as influence the stance the text takes towards the topic. We show that, while the more fringe topics are more challenging to identify triggers for, they do appear...

10.1145/3461702.3462578 article EN 2021-07-21

Identifying missing dictionary entries with frequency-conserving context models

Jake Ryland Williams, Eric M. Clark, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility...

10.1103/physreve.92.042808 article EN Physical Review E 2015-10-12

Zipf's law is a consequence of coherent language production

Jake Ryland Williams, James P. Bagrow, Andrew J. Reagan, Sharon E. Alajajian, Christopher M. Danforth, and 1 more

The task of text segmentation may be undertaken at many levels in text analysis: paragraphs, sentences, words, or even letters. Here, we focus on a relatively fine scale of segmentation, hypothesizing it to be in accord with a stochastic model of language generation, as the smallest scale where independent units of meaning are produced. Our goals in this letter include the development of methods for the segmentation of these minimal independent units, which produce feature-representations of texts that align with the independence assumption of the bag-of-terms model, commonly...

10.48550/arxiv.1601.07969 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Low-power, phase-preserving 2R amplitude regenerator

Taras I. Lakoba, Jake Ryland Williams, Michael Vasilyev

10.1016/j.optcom.2011.09.027 article EN Optics Communications 2011-09-29

The Lexicocalorimeter: Gauging public health through caloric input and output on social media

Sharon E. Alajajian, Jake Ryland Williams, Andrew J. Reagan, Stephen Alajajian, Morgan R. Frank, and 4 more

We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the "caloric content" of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food and activity related phrases, and respectively assigning them with sourced estimates of caloric intake and expenditure. We show that for Twitter, our naive measures of "caloric input", "caloric output", and the ratio of these measures are all strong correlates with health and well-being measures for the contiguous United States. Our caloric balance measure in many cases...

10.48550/arxiv.1507.05098 preprint EN other-oa arXiv (Cornell University) 2015-01-01