- Topic Modeling
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Gene expression and cancer classification
- Genomics and Chromatin Dynamics
- Academic integrity and plagiarism
- Mobile Health and mHealth Applications
- Cancer-related molecular mechanisms research
- Context-Aware Activity Recognition Systems
- Human Mobility and Location-Based Analysis
- Music and Audio Processing
- Misinformation and Its Impacts
- Algorithms and Data Compression
- Data-Driven Disease Surveillance
- Speech and dialogue systems
- Single-cell and spatial transcriptomics
- Text and Document Classification Technologies
- Biomedical Text Mining and Ontologies
- Authorship Attribution and Profiling
- Genomic variations and chromosomal abnormalities
- Public Relations and Crisis Communication
- Physical Activity and Health
- Respiratory viral infections research
- Text Readability and Simplification
- Customer Service Quality and Loyalty
University of Virginia
2019-2024
Office of Public Health Genomics
2021
Engineering Systems (United States)
2019-2020
University of Tehran
2018-2019
K.N.Toosi University of Technology
2013
As available genomic interval data increase in scale, we require fast systems to search them. A common approach is simple string matching to compare a search term to metadata, but this is limited by incomplete or inaccurate annotations. An alternative is to search directly through region overlap analysis, but this leads to challenges like sparsity, high dimensionality, and computational expense. We need novel methods to quickly and flexibly query large, messy databases. Here, we develop a search system using representation learning. We train numerical...
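The abstract contrasts metadata string matching with direct region overlap analysis. The sketch below illustrates only the latter, naive baseline, not the representation-learning system the abstract describes. It assumes half-open `(chromosome, start, end)` intervals, and the function names and toy region sets are hypothetical.

```python
from typing import Dict, List, Tuple

# A genomic region: (chromosome, start, end), treated as a half-open interval [start, end).
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals lie on the same chromosome and intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_score(query: List[Region], target: List[Region]) -> int:
    """Number of query regions that overlap at least one target region.

    This naive version compares every pair (O(len(query) * len(target))),
    which hints at why overlap-based search becomes expensive at scale.
    """
    return sum(any(overlaps(q, t) for t in target) for q in query)


def search_by_overlap(query: List[Region], database: Dict[str, List[Region]]) -> List[Tuple[str, int]]:
    """Rank region sets by overlap score, highest first (ties broken by name)."""
    scores = [(name, overlap_score(query, regions)) for name, regions in database.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical toy database; set names and coordinates are invented for illustration.
database = {
    "setA": [("chr1", 100, 200), ("chr2", 50, 80)],
    "setB": [("chr1", 150, 300)],
    "setC": [("chr3", 0, 10)],
}
query = [("chr1", 120, 160), ("chr2", 60, 70)]
print(search_by_overlap(query, database))  # [('setA', 2), ('setB', 1), ('setC', 0)]
```

Each region set is scored by how many query regions it touches, so no metadata is needed; the trade-off is the pairwise cost and the sparse, high-dimensional comparison space the abstract mentions.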
Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) are now widely available. One major computational challenge is dealing with high dimensionality and inherent sparsity, which is typically addressed by producing lower-dimensional representations of single cells for downstream clustering tasks. Current approaches produce such individual cell embeddings directly through a one-step learning process. Here, we propose an alternative approach building...
Genomic region sets summarize functional genomics data and define locations of interest in the genome, such as regulatory regions or transcription factor binding sites. The number of publicly available region sets has increased dramatically, leading to challenges in analysis.
Representation learning models have become a mainstay of modern genomics. These models are trained to yield vector representations, or embeddings, of various biological entities, such as cells, genes, individuals, or genomic regions. Recent applications of unsupervised embedding approaches have been shown to learn relationships among genomic regions that define functional elements in a genome. Unsupervised representation learning is free from the supervision of curated metadata and can condense the rich knowledge in publicly available data...
During a disease outbreak, timely non-medical interventions are critical in preventing the disease from growing into an epidemic and ultimately a pandemic. However, taking quick measures requires the capability to detect early warning signs of the outbreak. This work collects Twitter posts surrounding the 2020 COVID-19 pandemic that express the most common symptoms, including cough and fever, and are geolocated in the United States. Through examining the variation in activity at the state level, we observed a temporal lag between rises in the number...
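The abstract reports a temporal lag between rises in one state-level series and another; the second series is cut off in the text. One common way to estimate such a lag, assumed here rather than taken from the paper, is to shift one daily series against the other and pick the shift with the highest Pearson correlation. The series names and values below are hypothetical.

```python
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def best_lag(leading: Sequence[float], lagging: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Find the shift (in days) at which `leading` best correlates with `lagging`.

    A lag of k compares leading[t] with lagging[t + k], so a positive best lag
    suggests that rises in `leading` precede rises in `lagging`.
    """
    if len(leading) != len(lagging):
        raise ValueError("series must have the same length")
    best_k, best_r = 0, float("-inf")
    for k in range(max_lag + 1):
        x = leading[: len(leading) - k]
        y = lagging[k:]
        if len(x) < 3:  # too few overlapping points for a meaningful correlation
            break
        r = pearson(x, y)
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


# Hypothetical daily counts: the second series is the first shifted by 3 days.
symptom_tweets = [0, 1, 3, 7, 12, 8, 4, 2, 1, 0, 0, 0]
later_series = [0, 0, 0] + symptom_tweets[:-3]
lag, r = best_lag(symptom_tweets, later_series, max_lag=5)
print(lag)  # 3
```

Real data would need smoothing and per-state normalization before such a comparison; the sketch only shows the lag search itself.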
Motivation: Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is now widely available. One major computational challenge is dealing with high dimensionality and inherent sparsity, which is typically addressed by producing lower-dimensional representations of single cells for downstream clustering tasks. Current approaches produce such individual cell embeddings directly through a one-step learning process. Here, we propose an alternative approach...
Due to high competition in today's business and the need for satisfactory communication with customers, companies understand the inevitable necessity to focus not only on preventing customer churn but also on predicting their needs and providing the best services to them. The purpose of this article is to predict the future services needed by wireless users using data mining techniques. For this purpose, the customer database of an ISP in Shiraz, which logs the usage of internet connections, is utilized. Since the service has three main factors that define it (Time,...
Most past research in the area of serious games for simulation has focused on games with constrained, multiple-choice-based dialogue systems. Recent advancements in natural language processing make free-input, text-classification-based systems more feasible, but an effective framework for collecting training data for such systems has not yet been developed. This paper presents methods for collecting training data and generating such a system. Various crowdsourcing prompt types are presented. A binary category system, which increases the fidelity of labeling to...
Readily available, trustworthy, and usable medical information is vital to promoting global health. Cochrane is a non-profit organization that conducts and publishes systematic reviews of research findings. Over 3000 Cochrane Reviews are presently used as evidence in Wikipedia articles. Currently, Cochrane's researchers manually search Wikipedia pages related to medicine in order to identify articles that can be improved with this evidence. Our aim is to streamline this process by applying existing document similarity and retrieval methods...
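The abstract does not say which existing similarity and retrieval methods were applied. As an assumed, classic example, the sketch below ranks candidate pages against a query text using TF-IDF weights and cosine similarity. The function names, the smoothed-IDF formula, and the toy texts are illustrative choices, not the paper's setup.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Raw term frequency times smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_pages(query: str, pages: List[str]) -> List[Tuple[int, float]]:
    """Return (page index, cosine score) pairs, most similar first."""
    vectors = tfidf_vectors(pages + [query])  # the query shares the IDF statistics
    q = vectors[-1]
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors[:-1])]
    return sorted(scores, key=lambda s: -s[1])


# Hypothetical candidate pages and query text.
pages = [
    "aspirin reduces fever and pain",
    "the history of the roman empire",
    "fever treatment with paracetamol",
]
print(rank_pages("treatment of fever", pages)[0][0])  # 2
```

In a setting like the one described, the query could be a review's text and the candidates could be medicine-related pages, with the top-ranked pages passed on for manual checking.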
Existing simulations designed for cultural and interpersonal skill training rely on pre-defined responses with a menu option selection interface. Using a multiple-choice interface and restricting trainees' responses may limit their ability to apply the lessons in real-life situations. These systems also use a simplistic evaluation model, where selected options are marked as either correct or incorrect. This model does not capture sufficient information that could drive an adaptive feedback mechanism to improve awareness. This paper...
Finding a highly informative, low-dimensional representation for texts, specifically long texts, is one of the main challenges in efficient information storage and retrieval. This representation should capture the semantic and syntactic information of the text while retaining relevance for large-scale similarity search. We propose the utilization of Rhetorical Structure Theory (RST) to consider the text structure in the representation. In addition, to embed the document in a distributed representation, we use a Siamese neural network to jointly learn document representations. Our model consists...
Long text representation for natural language processing tasks has captured researchers' attention recently. Beyond the sentence level, finding a good representation often falls back on bag of words, which loses sequence order. Indeed, text is not patterned in a haphazard way; rather, in a coherent document there exist systematic connections between sentences. Rhetorical structure theory models these connections in a tree format. This tree represents text spans and their relations. The importance of each span is distinguished by a hierarchy type named nucleus and satellite. In...
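To make the nucleus/satellite idea concrete, the sketch below stores an RST tree and collects the spans "promoted" to the root by following only nucleus children, in the spirit of Marcu's promotion sets. This is one common way to use nuclearity and is an assumption here, not necessarily how the paper uses it. Storing the relation label on the child node is a simplification, and the class, function, and example sentences are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSTNode:
    """A node in a simplified RST tree.

    nuclearity: "N" (nucleus) or "S" (satellite) relative to its parent; the root uses "N".
    relation:   relation label to the sibling nucleus, e.g. "Elaboration" (simplification).
    text:       set only for leaves (elementary discourse units).
    """
    nuclearity: str
    relation: Optional[str] = None
    text: Optional[str] = None
    children: List["RSTNode"] = field(default_factory=list)


def promotion_set(node: RSTNode) -> List[str]:
    """Leaf texts promoted to this node: a leaf promotes itself, and an internal
    node promotes the union of its nucleus children's promotion sets."""
    if not node.children:
        return [node.text] if node.text is not None else []
    promoted: List[str] = []
    for child in node.children:
        if child.nuclearity == "N":
            promoted.extend(promotion_set(child))
    return promoted


# Hypothetical two-level tree: a nucleus span with an elaborating satellite,
# plus a background satellite attached at the root.
tree = RSTNode("N", children=[
    RSTNode("N", children=[
        RSTNode("N", text="The method embeds long documents."),
        RSTNode("S", relation="Elaboration", text="It uses a Siamese network."),
    ]),
    RSTNode("S", relation="Background", text="Bag of words loses word order."),
])
print(promotion_set(tree))  # ['The method embeds long documents.']
```

A multinuclear relation (for example, two nucleus children joined by "Joint") promotes both children, so the function returns every nucleus-reachable leaf rather than a single one.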
Activity recognition using built-in sensors in smart and wearable devices provides great opportunities to understand and detect human behavior in the wild, and gives a more holistic view of individuals' health and well-being. Numerous computational methods have been applied to sensor streams to recognize different daily activities. However, most are unable to capture the layers of activities concealed in human behavior. Also, the performance of these models starts to decrease as the number of activities increases. This research aims at building a hierarchical...
Due to the increasing amount of data on the internet, finding a highly informative, low-dimensional representation for text is one of the main challenges in efficient natural language processing tasks, including classification. This representation should capture semantic information while retaining the texts' relevance level. The document-level approach maps documents with similar topics to nearby regions of the vector space. To obtain representations for large texts, we propose the utilization of deep Siamese neural networks. To embed the documents in a distributed representation, we use...
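The defining feature of a Siamese network is that both inputs pass through the same encoder with shared weights, and a pairwise loss compares the two outputs. The abstract does not name the loss, so the sketch below assumes a standard contrastive loss and a single tanh layer as a stand-in encoder. It shows only the forward pass and the loss (no training), and the weights and input vectors are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def encode(x: Sequence[float], weights: Matrix) -> List[float]:
    """Shared encoder: one linear layer followed by tanh.
    Both branches of the Siamese network call this with the SAME weights."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(x1: Sequence[float], x2: Sequence[float], similar: bool,
                     weights: Matrix, margin: float = 1.0) -> float:
    """Pull similar pairs together; push dissimilar pairs at least `margin` apart."""
    d = euclidean(encode(x1, weights), encode(x2, weights))
    return d ** 2 if similar else max(0.0, margin - d) ** 2


# Hypothetical 3-term bag-of-words inputs and a 2-dimensional embedding.
weights = [[0.5, -0.2, 0.1], [0.3, 0.3, -0.4]]
doc_a = [1.0, 0.0, 2.0]
doc_b = [0.0, 3.0, 1.0]
print(contrastive_loss(doc_a, doc_a, True, weights))   # 0.0
print(contrastive_loss(doc_a, doc_a, False, weights))  # 1.0
```

Because the encoder is shared, training on labeled pairs moves same-topic documents toward each other in one common embedding space, which is what makes the learned vectors usable for similarity search or classification.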
Motivation: As available genomic interval data increases in scale, we require fast systems to search it. A common approach is simple string matching to compare a search term to metadata, but this is limited by incomplete or inaccurate annotations. An alternative is to search directly through region overlap analysis, but these approaches lead to challenges like sparsity, high dimensionality, and computational expense. We need novel methods to quickly and flexibly query large, messy databases. Results: Here, we develop a system using...