Vivek Khetan

ORCID: 0000-0002-4394-4859
Research Areas
  • Topic Modeling
  • Biomedical Text Mining and Ontologies
  • Natural Language Processing Techniques
  • Data Quality and Management
  • Advanced Text Analysis Techniques
  • Advanced Graph Neural Networks
  • Computational and Text Analysis Methods
  • Bayesian Modeling and Causal Inference
  • Machine Learning in Healthcare
  • Sentiment Analysis and Opinion Mining
  • Information Retrieval and Search Behavior
  • Multimodal Machine Learning Applications
  • Social Media in Health Education
  • Explainable Artificial Intelligence (XAI)
  • Membrane Separation Technologies
  • Web Data Mining and Analysis
  • Advanced Image and Video Retrieval Techniques
  • Intravenous Infusion Technology and Safety
  • Machine Learning and Data Classification
  • Genomics and Chromatin Dynamics
  • Semantic Web and Ontologies
  • Image Retrieval and Classification Techniques
  • Bacterial Genetics and Biotechnology
  • RNA Research and Splicing
  • Electrohydrodynamics and Fluid Dynamics

Accenture (United States)
2021-2024

Accenture (Switzerland)
2022-2023

The University of Texas at Austin
2017-2019

Indian Institute of Technology Kharagpur
2010

A recent "third wave" of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has been created. In this paper, we survey the current...

10.1007/s10791-017-9321-y article EN cc-by Information Retrieval 2017-11-10

Understanding causal narratives communicated in clinical notes can help make strides towards personalized healthcare. Extracted causal information from clinical notes can be combined with structured EHR data such as patients' demographics, diagnoses, and medications. This will enhance healthcare providers' ability to identify aspects of a patient's story communicated in the clinical notes and help make more informed decisions. In this work, we propose annotation guidelines, develop an annotated corpus, and provide baseline scores to identify types and direction of causal relations between...

10.18653/v1/2022.findings-acl.63 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2022 2022-01-01

Background: Intervening in and preventing diabetes distress requires an understanding of its causes and, in particular, from a patient’s perspective. Social media data provide direct access to how patients see and understand their disease and consequently show the causes of diabetes distress. Objective: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect relationships in patient-reported diabetes-related tweets and provide a methodology to better understand the opinions, feelings, and observations shared within the diabetes online...

10.2196/37201 article EN cc-by JMIR Medical Informatics 2022-07-19

Identification of medical claims from user-generated text data is an onerous but essential step for various tasks, including content moderation and hypothesis generation. SemEval-2023 Task 8 is an effort towards building those capabilities and motivating further research in this direction. This paper summarizes the details and results of the shared task at SemEval-2023, which involved identifying causal claims and extracting related Populations, Interventions, and Outcomes ("PIO") frames from social media (Reddit) text. The shared task comprised two subtasks:...

10.18653/v1/2023.semeval-1.311 article EN cc-by Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) 2023-01-01

We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. Annotations include demarcations of spans corresponding to medical claims, personal experiences, and questions. We collect additional granular annotations on identified claims. Specifically, we mark snippets that describe patient Populations, Interventions, and Outcomes (PIO elements) within these. Using this corpus, we introduce the task of retrieving trustworthy...

10.18653/v1/2023.findings-eacl.61 article EN cc-by 2023-01-01

We motivate and introduce CHARD: Clinical Health-Aware Reasoning across Dimensions, to investigate the capability of text generation models to act as implicit clinical knowledge bases and generate free-flow textual explanations about various health-related conditions across several dimensions. We collect and present an associated dataset, CHARDat, consisting of 52 health conditions across three dimensions. We conduct extensive experiments using BART and T5 along with data augmentation, and perform automatic, human, and qualitative analyses. We show that while...

10.18653/v1/2023.eacl-main.24 article EN cc-by 2023-01-01

The growing quantity and complexity of data pose challenges for humans to consume information and respond in a timely manner. For businesses in domains with rapidly changing rules and regulations, failure to identify changes can be costly. In contrast to expert analysis or the development of domain-specific ontology and taxonomies, we use a task-based approach for fulfilling specific needs within a new domain. Specifically, we propose to extract information from incoming instance data. A pipeline constructed of state-of-the-art NLP technologies,...

10.48550/arxiv.2104.08936 preprint EN other-oa arXiv (Cornell University) 2021-01-01

This position paper proposes a systematic approach towards developing framework to help select the most effective embedding models for natural language processing (NLP) tasks, addressing challenge posed by proliferation of both proprietary and open-source encoder models.

10.48550/arxiv.2404.00458 preprint EN arXiv (Cornell University) 2024-03-30

10.18653/v1/2024.emnlp-main.1132 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

Large-scale sequence-to-sequence models have been shown to be adept at both multiple-choice and open-domain commonsense reasoning tasks. However, the current formulations do not provide the ability to control the various attributes of the reasoning chain. To enable better controllability, we propose to study commonsense reasoning as a template filling task (TemplateCSR), where language models fill reasoning templates with given constraints as control factors. As an approach to TemplateCSR, we (i) propose a dataset of template-expansion pairs for the healthcare and well-being domain and (ii) introduce ITO,...

10.18653/v1/2023.findings-ijcnlp.23 article EN cc-by 2023-01-01

Large-scale sequence-to-sequence models have been shown to be adept at both multiple-choice and open-domain commonsense reasoning tasks. However, the current systems do not provide the ability to control the various attributes of the reasoning chain. To enable better controllability, we propose to study commonsense reasoning as a template filling task (TemplateCSR), where language models fill reasoning templates with given constraints as control factors. As an approach to TemplateCSR, we (i) propose a dataset of template-expansion pairs and (ii) introduce POTTER, a pretrained model using...

10.48550/arxiv.2111.00539 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Because researchers typically do not have the time or space to present more than a few evaluation metrics in any published study, it can be difficult to assess the relative effectiveness of prior methods for unreported metrics when baselining a new method or conducting a systematic meta-review. While sharing of study data would help alleviate this, recent attempts to encourage consistent sharing have been largely unsuccessful. Instead, we propose to enable relative comparisons with prior work across arbitrary metrics by predicting unreported metrics given one or more reported...

10.48550/arxiv.1802.00323 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Causality understanding between events is a critical natural language processing task that is helpful in many areas, including health care, business risk management, and finance. On close examination, one can find a huge amount of textual content, both in the form of formal documents and arising from social media like Twitter, dedicated to communicating and exploring various types of causality in the real world. Recognizing these "Cause-Effect" relationships continues to remain a challenge simply because it is often expressed...

10.48550/arxiv.2012.05453 preprint EN cc-by-nc-nd arXiv (Cornell University) 2020-01-01

We motivate and introduce CHARD: Clinical Health-Aware Reasoning across Dimensions, to investigate the capability of text generation models to act as implicit clinical knowledge bases and generate free-flow textual explanations about various health-related conditions across several dimensions. We collect and present an associated dataset, CHARDat, consisting of 52 health conditions across three dimensions. We conduct extensive experiments using BART and T5 along with data augmentation, and perform automatic, human, and qualitative analyses. We show that while...

10.48550/arxiv.2210.04191 preprint EN other-oa arXiv (Cornell University) 2022-01-01

We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. Annotations include demarcations of spans corresponding to medical claims, personal experiences, and questions. We collect additional granular annotations on identified claims. Specifically, we mark snippets that describe patient Populations, Interventions, and Outcomes (PIO elements) within these. Using this corpus, we introduce the task of retrieving trustworthy...

10.48550/arxiv.2210.06331 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Objective: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect associations in patient-reported, diabetes-related tweets and provide a tool to better understand the opinions, feelings, and observations shared within the diabetes online community from a causality perspective. Materials and Methods: More than 30 million English diabetes-related tweets were collected between April 2017 and January 2021. Deep learning and natural language processing methods were applied to focus on tweets with personal and emotional content. A...

10.48550/arxiv.2111.01225 preprint EN cc-by-nc-sa arXiv (Cornell University) 2021-01-01

Recent advances have led to the availability of many pre-trained language models (PLMs); however, a question that remains is how much data is truly needed to fine-tune PLMs for downstream tasks. In this work, we introduce DEFT, a data-efficient fine-tuning framework that leverages unsupervised core-set selection to minimize the amount of data needed to fine-tune PLMs for downstream tasks. We demonstrate the efficacy of our DEFT framework in the context of text-editing LMs, and compare to the state-of-the-art text-editing model, CoEDIT. Our quantitative and qualitative results demonstrate that DEFT models are just as accurate as CoEDIT...

10.48550/arxiv.2310.16776 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Gas sparging is one of the techniques used to control concentration polarization during ultrafiltration. In this work, the effects of gas sparging in the stratified flow regime were investigated in gel layer controlling cross flow ultrafiltration in a rectangular channel. A synthetic solution of pectin was used as the gel forming solute. The liquid and gas flow rates were selected such that the stratified flow regime was prevalent. A mass transfer model was developed for this system to quantify the effects of gas sparging on the mass transfer coefficient (Sherwood number). The results were compared with the case of no gas sparging. Gas sparging led to an increase in the mass transfer coefficient by about 23%...

10.12989/mwt.2012.3.3.151 article EN Membrane Water Treatment 2012-07-25

Background: TATA Binding Protein (TBP) is required for transcription initiation by all three eukaryotic RNA polymerases. It participates in transcriptional initiation at the majority of eukaryotic gene promoters, either by direct association to the TATA box upstream of the transcription start site or by indirectly localizing to the promoter through other proteins. TBP exists in solution in a dimeric form but binds to DNA as a monomer. Here, we present the first mathematical model for auto-catalytic TBP expression and use it to study the role of dimerization in maintaining the steady...

10.1186/1745-6150-5-50 article EN cc-by Biology Direct 2010-08-05

Background: Intervening in and preventing diabetes distress requires an understanding of its causes and, in particular, from a patient’s perspective. Social media data provide direct access to how patients see and understand their disease and consequently show the causes of diabetes distress. Objective: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect relationships in patient-reported diabetes-related tweets and provide a methodology to better understand the opinions,...

10.2196/preprints.37201 preprint EN 2022-02-10