Jake Ryland Williams

ORCID: 0000-0002-7050-8403
Research Areas
  • Natural Language Processing Techniques
  • Authorship Attribution and Profiling
  • Misinformation and Its Impacts
  • Topic Modeling
  • Advanced Text Analysis Techniques
  • Opinion Dynamics and Social Influence
  • Complex Network Analysis Techniques
  • Social Media and Politics
  • Spam and Phishing Detection
  • Language and cultural evolution
  • Innovative Human-Technology Interaction
  • Hate Speech and Cyberbullying Detection
  • Digital Mental Health Interventions
  • Advanced Malware Detection Techniques
  • Mobile Health and mHealth Applications
  • Media Studies and Communication
  • Computational and Text Analysis Methods
  • Neural Networks and Applications
  • Speech Recognition and Synthesis
  • Sentiment Analysis and Opinion Mining
  • Lexicography and Language Studies
  • Fractal and DNA sequence analysis
  • Optical Network Technologies
  • Advanced Photonic Communication Systems
  • Text and Document Classification Technologies

Drexel University
2016-2023

University of Nebraska Medical Center
2018

University of Vermont
2011-2017

University of California, Berkeley
2016-2017

Significance: The most commonly used words of 24 corpora across 10 diverse human languages exhibit a clear positive bias, a big data confirmation of the Pollyanna hypothesis. The study’s findings are based on 5 million individual scores and pave the way for the development of powerful language-based tools for measuring emotion.

10.1073/pnas.1411678112 article EN Proceedings of the National Academy of Sciences 2015-02-09

Veracity assessment of news and social bot detection have become two of the most pressing issues for social media platforms, yet current gold-standard data are limited. This paper presents a leap forward in the development of a sizeable and feature rich dataset. The dataset was built by using a collection of news items posted to Facebook by nine news outlets during September 2016, which were annotated for veracity by BuzzFeed. These articles were refined beyond binary annotation to the four categories: mostly true, mostly false, mixture of true and false, and no factual...

10.1609/icwsm.v12i1.14985 article EN Proceedings of the International AAAI Conference on Web and Social Media 2018-06-15

Twitter has become the "wild-west" of marketing and promotional strategies for advertisement agencies. Electronic cigarettes have been heavily marketed across Twitter feeds, offering discounts, "kid-friendly" flavors, algorithmically generated false testimonials, and free samples. All electronic cigarette keyword related tweets from a 10% sample of Twitter spanning January 2012 through December 2014 (approximately 850,000 total tweets) were identified and categorized as Automated or Organic by combining classification...

10.1371/journal.pone.0157304 article EN cc-by PLoS ONE 2016-07-13

The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, an extraordinary capacity which has profound implications for our understanding of human behavior. Given the growing assortment of sentiment-measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both their classification accuracy and their ability to provide richer understanding of texts. Here, we perform detailed, quantitative tests and qualitative assessments of 6...

10.1140/epjds/s13688-017-0121-9 article EN cc-by EPJ Data Science 2017-10-30

With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning,...

10.1038/srep12209 article EN cc-by Scientific Reports 2015-08-11

Background/objectives: The functional threshold power (FTP) 20-min test (FTP20) is popular amongst cyclists and coaches due to the theory that it can predict the power output that can be sustained for 60 min. However, little is known in terms of the reliability and validity of this construct, therefore the aim of this study was to assess the reliability of the FTP20 and its construct validity for 60-min power. Methods: Twenty-two male trained cyclists (age = 32 ± 10 years, body mass (BM) = 77.2 ± 6.8 kg, maximal oxygen uptake (V̇O2max) = 59.4 ± 5.6 ml.kg-1.min-1 BM) completed four trials...

10.20944/preprints202502.0505.v1 preprint EN 2025-02-07

Misleading information is nothing new, yet its impacts seem only to grow. We investigate this phenomenon in the context of social bots. Social bots are software agents that mimic humans. They are intended to interact with humans while supporting specific agendas. This work explores the effect of social bots on the spread of misinformation on Facebook during the Fall of 2016 and prototypes a tool for their detection. Using a dataset of about two million user comments discussing the posts of public pages for nine verified news outlets, we first...

10.1609/icwsm.v13i01.3244 article EN Proceedings of the International AAAI Conference on Web and Social Media 2019-07-06

Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this "law" of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora since the late 1990s have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and noncore lexica. Here we present and defend an...

10.1103/physreve.91.052811 article EN Physical Review E 2015-05-20
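
The rank-frequency relation stated in the abstract above can be illustrated with a minimal sketch. This is not the paper's method: the function names `rank_frequency` and `zipf_prediction` are hypothetical, and the exponent of 1 is simply the idealized value implied by "inversely proportional to its rank".

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent token."""
    counts = Counter(tokens)
    ranked = sorted(counts.values(), reverse=True)
    return list(enumerate(ranked, start=1))


def zipf_prediction(top_frequency, rank, exponent=1.0):
    """Idealized Zipf estimate: frequency falls off as 1 / rank**exponent.

    The default exponent of 1.0 is an assumption matching the textbook
    statement of the law, not a value fitted to any corpus.
    """
    return top_frequency / rank ** exponent


if __name__ == "__main__":
    text = "the cat and the dog and the bird".split()
    for rank, freq in rank_frequency(text):
        print(rank, freq, zipf_prediction(3, rank))
```

Comparing the observed frequencies with the idealized estimate on a large corpus is one way to see where the scaling breaks that the abstract mentions; the toy sentence here is far too short for that.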

We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the "caloric content" of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food and activity related phrases, and respectively assigning them with sourced estimates of caloric intake and expenditure. We show that for Twitter, our naive measures of "caloric input", "caloric output", and the ratio of these measures are all strong correlates with health and well-being measures for the contiguous United States. Our caloric balance measure in many cases...

10.1371/journal.pone.0168893 article EN cc-by PLoS ONE 2017-02-10

The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, bearing profound implications for our understanding of human behavior. Given the growing assortment of sentiment measuring instruments, comparisons between them are evidently required. Here, we perform detailed tests of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that a dictionary-based method will only perform both reliably and meaningfully if (1)...

10.48550/arxiv.1512.00531 preprint EN cc-by-nc-sa arXiv (Cornell University) 2015-01-01

With the advent of off-the-shelf intelligent home products and broader internet adoption, researchers increasingly explore smart computing applications that provide easier access to health and wellness resources. AI-based systems like chatbots have the potential to provide services that could provide mental health support. However, existing therapy chatbots are often retrieval-based, requiring users to respond with a constrained set of answers, which may not be appropriate given that such pre-determined inquiries may not reflect each patient's unique...

10.48550/arxiv.2107.13115 preprint EN other-oa arXiv (Cornell University) 2021-01-01

In this paper, we introduce a scalable machine learning approach accompanied by open-source software for identifying violent and peaceful forms of political protest participation using social media data. While violent protests are statistically rare events, they often shape public perceptions of political and social movements. This is, in part, due to the extensive and disproportionate media coverage which violent protest participation receives relative to peaceful protest participation. In the past, when a small number of media conglomerates served as the primary information source for learning about political and social movements,...

10.1371/journal.pone.0212834 article EN cc-by PLoS ONE 2019-03-19

Increased adoption of off-the-shelf conversational agents (CAs) brings opportunities to integrate therapeutic interventions. Motivational Interviewing (MI) can then be integrated with CAs for cost-effective access to it. MI can be especially beneficial for parents who often have low motivation because of limited time and resources to eat healthy together with their children. We developed a Motivational Interviewing Conversational Agent (MICA) to improve healthy eating in parents who serve as a proxy for health behavior change in their children. Proxy relationships involve a person...

10.2196/38908 article EN cc-by JMIR Human Factors 2022-08-07

Herbert Simon's classic rich-get-richer model is one of the simplest empirically supported mechanisms capable of generating heavy-tail size distributions for complex systems. Simon argued analytically that a population of flavored elements growing by either adding a novel element or randomly replicating an existing one would afford a distribution of group sizes with a power-law tail. Here, we show that, in fact, Simon's model does not produce a simple power law size distribution as the initial element has a dominant first-mover advantage, and will be...

10.1103/physreve.95.052301 article EN Physical Review E 2017-05-01
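
The growth rule described in the abstract above (add a novel element, or replicate a randomly chosen existing one) can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the authors' code: the function name `simon_model`, the parameter `innovation_rate`, and the choice to start from a single element are all hypothetical.

```python
import random
from collections import Counter


def simon_model(steps, innovation_rate, seed=None):
    """Simulate a rich-get-richer growth process and return group sizes.

    At each step, with probability `innovation_rate` a new group is created
    with one element; otherwise an existing element is chosen uniformly at
    random and replicated, so larger groups grow proportionally faster.
    Starting from one element in group 0 is an assumption of this sketch.
    """
    rng = random.Random(seed)
    elements = [0]   # group label of every element in the population
    next_group = 1   # label to assign to the next novel group
    for _ in range(steps):
        if rng.random() < innovation_rate:
            elements.append(next_group)
            next_group += 1
        else:
            elements.append(rng.choice(elements))
    return Counter(elements)


if __name__ == "__main__":
    sizes = simon_model(steps=10_000, innovation_rate=0.1, seed=42)
    # Print the five largest groups as (group label, size) pairs.
    print(sizes.most_common(5))
```

Storing one label per element makes the uniform choice over elements equivalent to choosing a group with probability proportional to its size, which keeps the sketch short at the cost of memory linear in the population.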

We explore the potential of a nonlinear amplifying loop mirror (NALM)-based phase-preserving 2R (reamplification and reshaping) regenerator for simultaneous regeneration of multiple wavelength-division-multiplexed (WDM) channels. While not considering multi-channel propagation, we address two issues of the NALM that appear to us as major obstacles in adopting it for realistic WDM applications: a high operating power and the detrimental effect of non-small (33% - 50%) pulse duty cycles. After thorough optimization,...

10.1364/oe.19.023017 article EN cc-by Optics Express 2011-10-28

Abstract: RUNX1 and CBFβ form a transcription factor dimer that regulates normal hematopoiesis and leukemogenesis. Inversion of chromosome 16 (inv(16)) is one of the most common mutations in acute myeloid leukemia (AML), fusing CBFB with the gene encoding smooth muscle myosin heavy chain (MYH11). The fusion protein encoded by CBFB-MYH11 (CM) retains the ability to bind RUNX1, and together they cause changes in gene expression leading to leukemia. Recently, we found that Histone Deacetylase 1 (HDAC1) is part of the RUNX1:CM complex, and that all three proteins...

10.1158/1538-7445.am2018-5419 article EN Cancer Research 2018-07-01

This work considers universal adversarial triggers, a method of adversarially disrupting natural language models, and questions if it is possible to use such triggers to affect both the topic and stance of conditional text generation models. In considering four "controversial" topics, this work demonstrates success at identifying triggers that cause the GPT-2 model to produce text about targeted topics as well as influence the stance the text takes towards the topic. We show that, while the more fringe topics are more challenging to identify triggers for, they do appear...

10.1145/3461702.3462578 article EN 2021-07-21

In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility...

10.1103/physreve.92.042808 article EN Physical Review E 2015-10-12

The task of text segmentation may be undertaken at many levels in text analysis: paragraphs, sentences, words, or even letters. Here, we focus on a relatively fine scale of segmentation, hypothesizing it to be in accord with a stochastic model of language generation, as the smallest scale where independent units of meaning are produced. Our goals in this letter include the development of methods for the segmentation of these minimal independent units, which produce feature-representations of texts that align with the independence assumption of the bag-of-terms model, commonly...

10.48550/arxiv.1601.07969 preprint EN other-oa arXiv (Cornell University) 2016-01-01

We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the "caloric content" of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food and activity related phrases, and respectively assigning them with sourced estimates of caloric intake and expenditure. We show that for Twitter, our naive measures of "caloric input", "caloric output", and the ratio of these measures are all strong correlates with health and well-being measures for the contiguous United States. Our caloric balance measure in many cases...

10.48550/arxiv.1507.05098 preprint EN other-oa arXiv (Cornell University) 2015-01-01