- Topic Modeling
- Natural Language Processing Techniques
- Mental Health Research Topics
- Health, Environment, Cognitive Aging
- Biomedical Text Mining and Ontologies
- Machine Learning in Healthcare
- Meta-analysis and systematic reviews
- Explainable Artificial Intelligence (XAI)
- Advanced Text Analysis Techniques
- Text Readability and Simplification
- Machine Learning and Data Classification
- Machine Learning and Algorithms
- Artificial Intelligence in Healthcare and Education
- Computational and Text Analysis Methods
- Text and Document Classification Technologies
- Data Quality and Management
- Scientific Computing and Data Management
- Semantic Web and Ontologies
- Mobile Crowdsensing and Crowdsourcing
- Sentiment Analysis and Opinion Mining
- Imbalanced Data Classification Techniques
- Data Stream Mining Techniques
- Artificial Intelligence in Healthcare
- Software Engineering Research
- Mental Health via Writing
Northeastern University
2015-2024
Universidad del Noreste
2016-2023
IT University of Copenhagen
2023
Tokyo Institute of Technology
2023
Administration for Community Living
2023
American Jewish Committee
2023
John Brown University
2013-2023
Carnegie Mellon University
2023
University of Massachusetts Amherst
2023
Accenture (Switzerland)
2023
The R environment provides a natural platform for developing new statistical methods due to the mathematical expressiveness of the language, the large number of existing libraries, and the active developer community. One drawback to R, however, is the learning curve; programming is a deterrent to non-technical users, who typically prefer graphical user interfaces (GUIs) to command line environments. Thus, while statisticians develop new methods in R, practitioners are often behind in terms of the statistical techniques they use as they rely on GUI applications....
Convolutional Neural Networks (CNNs) have recently achieved remarkably strong performance on the practically important task of sentence classification (kim 2014, kalchbrenner 2014, johnson 2014). However, these models require practitioners to specify an exact model architecture and set accompanying hyperparameters, including the filter region size, regularization parameters, and so on. It is currently unknown how sensitive model performance is to changes in these configurations for the task of sentence classification. We thus conduct a sensitivity...
Meta-analysis is increasingly used as a key source of evidence synthesis to inform clinical practice. The theory and statistical foundations of meta-analysis continually evolve, providing solutions to many new and challenging problems. In practice, most meta-analyses are performed in general statistical packages or dedicated meta-analysis programs. Herein, we introduce Meta-Analyst, a novel, powerful, intuitive, and free meta-analysis program for the meta-analysis of a variety of problems. Meta-Analyst is implemented in C# atop the Microsoft .NET framework, and features a graphical user...
Sarthak Jain, Byron C. Wallace. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
Attention mechanisms have seen wide adoption in neural NLP models. In addition to improving predictive performance, these are often touted as affording transparency: models equipped with attention provide a distribution over attended-to input units, and this is often presented (at least implicitly) as communicating the relative importance of inputs. However, it is unclear what relationship exists between attention weights and model outputs. In this work, we perform extensive experiments across a variety of NLP tasks that aim to assess...
Medical researchers looking for evidence pertinent to a specific clinical question must navigate an increasingly voluminous corpus of published literature. This data deluge has motivated the development of machine learning and data mining technologies to facilitate efficient biomedical research. Despite the obvious labor-saving potential of these technologies and the concomitant academic interest therein, however, adoption of machine learning techniques by medical researchers has been relatively sluggish. One explanation for this is that while many machine learning methods have...
Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, Byron C. Wallace. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
Summary: Meta‐analysis and meta‐regression are statistical methods for synthesizing and modelling the results of different studies, and are critical research synthesis tools in ecology and evolutionary biology (E&E). However, many E&E researchers carry out meta‐analyses using software that is limited in its functionality and is not easily updatable. It is likely that these software limitations have slowed the uptake of new methods in E&E and limited the scope and quality of inferences from research syntheses. We developed OpenMEE: Open Meta‐analyst for Ecology and Evolution to address...
Machine learning (ML) algorithms have proven highly accurate for identifying Randomized Controlled Trials (RCTs) but are not used much in practice, in part because the best way to make use of the technology in a typical workflow is unclear. In this work, we evaluate ML models for RCT classification (support vector machines, convolutional neural networks, and ensemble approaches). We trained and optimized support vector machine and convolutional neural network models on the titles and abstracts of the Cochrane Crowd RCT set. We evaluated the models on an external dataset (Clinical...
Systematic reviews address a specific clinical question by unbiasedly assessing and analyzing the pertinent literature. Citation screening is a time-consuming and critical step in systematic reviews. Typically, reviewers must evaluate thousands of citations to identify articles eligible for a given review. We explore the application of machine learning techniques to semi-automate citation screening, thereby reducing reviewers' workload. We present a novel online classification strategy to automatically discriminate...
We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have achieved this by way of laborious feature engineering. By contrast, we propose to automatically learn and then exploit user embeddings, to be used in concert...
We present a new Convolutional Neural Network (CNN) model for text classification that jointly exploits labels on documents and their constituent sentences. Specifically, we consider scenarios in which annotators explicitly mark sentences (or snippets) that support their overall document categorization, i.e., they provide rationales. Our model exploits such supervision via a hierarchical approach in which each document is represented by a linear combination of the vector representations of its component sentences. We propose a sentence-level...
Objective: To develop and evaluate RobotReviewer, a machine learning (ML) system that automatically assesses bias in clinical trials. From a (PDF-formatted) trial report, the system should determine risks of bias for the domains defined by the Cochrane Risk of Bias (RoB) tool, and extract supporting text for these judgments. Methods: We algorithmically annotated 12,808 trial PDFs using data from the Cochrane Database of Systematic Reviews (CDSR). Trials were labeled as being at low or high/unclear risk of bias for each domain, and sentences were labeled as being informative...
We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the 'PICO' elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying...
In many settings it is important for one to be able to understand why a model made a particular prediction. In NLP this often entails extracting snippets of an input text ‘responsible for’ the corresponding model output; when such a snippet comprises tokens that indeed informed the model’s prediction, it is a faithful explanation. In some settings, faithfulness may be critical to ensure transparency. Lei et al. (2016) proposed a model to produce faithful rationales for neural text classification by defining independent snippet extraction and prediction modules....
Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 large) than considered in prior...
Class imbalance (i.e., scenarios in which classes are unequally represented in the training data) occurs in many real-world learning tasks. Yet despite its practical importance, there is no established theory of class imbalance, and existing methods for handling it are therefore not well motivated. In this work, we approach the problem of imbalance from a probabilistic perspective, and from this vantage identify dataset characteristics (such as dimensionality, sparsity, etc.) that exacerbate the problem. Motivated by this theory, we advocate...
Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers...
Automatically detecting verbal irony (roughly, sarcasm) is a challenging task because ironists say something other than, and often opposite to, what they actually mean. Discerning ironic intent exclusively from the words and syntax comprising texts (e.g., tweets, forum posts) is therefore not always possible: additional contextual information about the speaker and/or the topic at hand is often necessary. We introduce a new corpus that provides empirical evidence for this claim. We show that annotators frequently require...
A recent "third wave" of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has thus been created. In this paper, we survey the current...
We propose a new active learning (AL) method for text classification with convolutional neural networks (CNNs). In AL, one selects the instances to be manually labeled with the aim of maximizing model performance with minimal effort. Neural models capitalize on word embeddings as representations (features), tuning these to the task at hand. We argue that AL strategies for multi-layered neural models should focus on selecting instances that most affect the embedding space (i.e., induce discriminative word representations). This is in contrast to traditional...