Matthew Matero

ORCID: 0000-0002-8407-4298
Research Areas
  • Mental Health via Writing
  • Topic Modeling
  • Machine Learning in Healthcare
  • Sentiment Analysis and Opinion Mining
  • Authorship Attribution and Profiling
  • Misinformation and Its Impacts
  • Natural Language Processing Techniques
  • Artificial Intelligence in Healthcare
  • Speech Recognition and Synthesis
  • Mental Health Research Topics
  • Substance Abuse Treatment and Outcomes
  • Mental Health Treatment and Access
  • Suicide and Self-Harm Studies
  • Educational Tools and Methods
  • Opioid Use Disorder Treatment

Stony Brook University
2020-2023

Mental health predictive systems typically model language as if from a single context (e.g. Twitter posts, status updates, or forum posts) and often limit analysis to a single level (either the message level or the user level). Here, we bring these pieces together to explore the use of open-vocabulary features (BERT embeddings, topics) and theoretical features (emotional expression lexica, personality) for the task of suicide risk assessment on support forums (the CLPsych-2019 Shared Task). We used dual context-based approaches (modeling...

10.18653/v1/w19-3005 article EN 2019-01-01
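As a rough illustration of combining the two feature families above, one can concatenate an open-vocabulary embedding with lexicon-derived scores before classification. This is a hypothetical sketch, not the authors' actual pipeline; the dimensions and feature names are assumptions.

```python
import numpy as np

# Hypothetical sketch of feature fusion: concatenate a dense
# open-vocabulary representation (e.g., a mean-pooled 768-d BERT-style
# user embedding) with theoretical features (e.g., emotion-lexicon
# scores) into a single vector per user for a downstream classifier.
def fuse_features(embedding, lexicon_scores):
    """Return one flat feature vector: [embedding | lexicon scores]."""
    return np.concatenate([np.asarray(embedding, dtype=float),
                           np.asarray(lexicon_scores, dtype=float)])

user_embedding = np.zeros(768)       # placeholder 768-d embedding
lexicon = [0.12, 0.05, 0.33]         # placeholder lexicon scores
fused = fuse_features(user_embedding, lexicon)
print(fused.shape)                   # (771,)
```

The fused vector can then feed any standard classifier; the point is only that dense and theory-driven features live side by side in one representation.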

In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden-state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample...

10.18653/v1/2021.naacl-main.357 article EN cc-by Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021-01-01
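The dimension-reduction setting above can be sketched with plain PCA computed via SVD. This is a minimal illustration of reducing 768-dimensional embeddings when observations are scarce, not the paper's exact method or settings.

```python
import numpy as np

# Minimal PCA sketch (assumed setup, not the paper's pipeline):
# project n x 768 transformer embeddings onto the top-k principal
# components when n is small relative to 768.
def pca_reduce(X, k):
    """Project rows of X (n x d) onto the top-k principal components."""
    Xc = X - X.mean(axis=0, keepdims=True)         # center each feature
    # SVD of the centered matrix; rows of Vt are principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                            # n x k component scores

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 768))   # e.g., 100 users, 768-d embeddings
Z = pca_reduce(X, 64)
print(Z.shape)                    # (100, 64)
```

Factorization and auto-encoder variants mentioned in the abstract would slot in at the same point, replacing the SVD projection.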

Targeting of location-specific aid for the U.S. opioid epidemic is difficult due to our inability to accurately predict changes in mortality across heterogeneous communities. AI-based language analyses, having recently shown promise in cross-sectional (between-community) well-being assessments, may offer a way to more accurately predict longitudinal community-level overdose mortality. Here, we develop and evaluate TrOP (Transformer for Opioid Prediction), a model for community-specific trend projection that uses social media...

10.1038/s41746-023-00776-0 article EN cc-by npj Digital Medicine 2023-03-08

Much of natural language processing is focused on leveraging large-capacity language models, typically trained over single messages with the task of predicting one or more tokens. However, modeling human language at higher levels of context (i.e., sequences of messages) is under-explored. In stance detection and other social media tasks where the goal is to predict an attribute of a message, we have contextual data that is loosely semantically connected by authorship. Here, we introduce the Message-Level Transformer (MeLT) – a hierarchical...

10.18653/v1/2021.findings-emnlp.253 preprint EN cc-by 2021-01-01

Assessing risk for excessive alcohol use is important for applications ranging from recruitment into research studies to targeted public health messaging. Social media language provides an ecologically embedded source of information for assessing individuals who may be at risk for harmful drinking. Using data collected on 3664 respondents from the general population, we examine how accurately language used on social media classifies individuals as at-risk for alcohol problems based on Alcohol Use Disorder Identification Test-Consumption (AUDIT-C) score benchmarks. We...

10.1111/acer.14807 article EN Alcoholism Clinical and Experimental Research 2022-05-01
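For context, AUDIT-C benchmark classification of the kind referenced above reduces to a simple threshold rule on the screening score. The cutoffs below are the commonly cited at-risk values (4 or more for men, 3 or more for women) and may differ from the exact benchmarks used in the study.

```python
# Illustrative AUDIT-C threshold sketch (commonly cited cutoffs,
# not necessarily the study's exact benchmarks): the AUDIT-C score
# ranges 0-12, and at-risk drinking is flagged at a sex-specific cutoff.
def at_risk(audit_c_score, sex):
    """Return True if the AUDIT-C score meets the at-risk cutoff."""
    cutoff = 4 if sex == "male" else 3
    return audit_c_score >= cutoff

print(at_risk(5, "male"))    # True
print(at_risk(2, "female"))  # False
```

The language-based classifier described in the abstract would be evaluated against labels produced by a rule of this shape.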

Natural language is generated by people, yet traditional language modeling views words or documents as if generated independently. Here, we propose human language modeling (HuLM), a hierarchical extension to the language modeling problem whereby a human level exists to connect sequences of documents (e.g. social media messages) and to capture the notion that human language is moderated by changing human states. We introduce HaRT, a large-scale transformer model for solving HuLM, pre-trained on approximately 100,000 users, and demonstrate its effectiveness in terms of both language modeling (perplexity) and fine-tuning on 4...

10.18653/v1/2022.findings-acl.52 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2022 2022-01-01

Human natural language is expressed at a specific point in time while human emotions change over time. While much work has established a strong link between language use and emotional states, few have attempted to model how those states change over time. Here, we introduce the task of...

10.18653/v1/2020.coling-main.261 article EN cc-by Proceedings of the 28th International Conference on Computational Linguistics 2020-01-01

Swanie Juhng, Matthew Matero, Vasudha Varadarajan, Johannes Eichstaedt, Adithya V Ganesan, H. Andrew Schwartz. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2023.

10.18653/v1/2023.acl-short.128 article EN cc-by 2023-01-01

Social science NLP tasks, such as emotion or humor detection, are required to capture the semantics along with the implicit pragmatics from text, often with limited amounts of training data. Instruction tuning has been shown to improve many capabilities of large language models (LLMs), such as commonsense reasoning, reading comprehension, and computer programming. However, little is known about the effectiveness of instruction tuning on the social domain, where implicit pragmatic cues need to be captured. We explore the use of instruction tuning for social science NLP tasks and introduce...

10.48550/arxiv.2402.01980 preprint EN arXiv (Cornell University) 2024-02-02

Many recent works in natural language processing have demonstrated the ability to assess aspects of mental health from personal discourse. At the same time, pre-trained contextual word embedding models have grown to dominate much of NLP, but little is known empirically about how best to apply them for mental health assessment. Using degree of depression as a case study, we do an empirical analysis of which off-the-shelf language model, individual layers, and combinations of layers seem most promising when applied to human-level tasks. Notably,...

10.18653/v1/2022.wassa-1.9 article EN cc-by 2022-01-01

Recent works have demonstrated the ability to assess aspects of mental health from personal discourse. At the same time, pre-trained contextual word embedding models have grown to dominate much of NLP, but little is known empirically about how best to apply them for mental health assessment. Using degree of depression as a case study, we do an empirical analysis of which off-the-shelf language model, individual layers, and combinations of layers seem most promising when applied to human-level tasks. Notably, we find RoBERTa effective and,...

10.48550/arxiv.2112.13795 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Natural language is generated by people, yet traditional language modeling views words or documents as if generated independently. Here, we propose human language modeling (HuLM), a hierarchical extension to the language modeling problem whereby a human level exists to connect sequences of documents (e.g. social media messages) and to capture the notion that human language is moderated by changing human states. We introduce HaRT, a large-scale transformer model for the HuLM task, pre-trained on approximately 100,000 users, and demonstrate its effectiveness in terms of both language modeling (perplexity) and fine-tuning on 4...

10.48550/arxiv.2205.05128 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Adithya V Ganesan, Vasudha Varadarajan, Juhi Mittal, Shashanka Subrahmanya, Matthew Matero, Nikita Soni, Sharath Chandra Guntuku, Johannes Eichstaedt, H. Andrew Schwartz. Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology. 2022.

10.18653/v1/2022.clpsych-1.25 article EN cc-by 2022-01-01

Much of natural language processing is focused on leveraging large-capacity language models, typically trained over single messages with the task of predicting one or more tokens. However, modeling human language at higher levels of context (i.e., sequences of messages) is under-explored. In stance detection and other social media tasks where the goal is to predict an attribute of a message, we have contextual data that is loosely semantically connected by authorship. Here, we introduce the Message-Level Transformer (MeLT) -- a hierarchical...

10.48550/arxiv.2109.08113 preprint EN other-oa arXiv (Cornell University) 2021-01-01

In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden-state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample...

10.48550/arxiv.2105.03484 preprint EN other-oa arXiv (Cornell University) 2021-01-01