- Natural Language Processing Techniques
- Biomedical Text Mining and Ontologies
- Topic Modeling
- Software Engineering Research
- Semantic Web and Ontologies
- Semigroups and Automata Theory
- Bioinformatics and Genomic Networks
- Speech and Dialogue Systems
- Software Testing and Debugging Techniques
- Genomics and Phylogenetic Studies
- Software Reliability and Analysis Research
- Algorithms and Data Compression
- Syntax, Semantics, Linguistic Variation
- Machine Learning in Bioinformatics
- Advanced Software Engineering Methodologies
- Genetics, Bioinformatics, and Biomedical Research
- Logic, Programming, and Type Systems
- Machine Learning and Algorithms
- Advanced Malware Detection Techniques
- Logic, Reasoning, and Knowledge
- Software Engineering Techniques and Practices
- Advanced Text Analysis Techniques
- Text Readability and Simplification
- Molecular Biology Techniques and Applications
- Computational Drug Discovery Methods
University of Delaware
2015-2025
University Ucinf
2006-2017
Georgetown University
2007-2014
University Hospital Heidelberg
2014
Georgetown University Medical Center
2014
Princeton University
2014
Heidelberg University
2014
European Molecular Biology Laboratory
2014
Mississippi State University
2012
Tilburg University
2001
Studies have shown that good comments can help programmers quickly understand what a method does, aiding program comprehension and software maintenance. Unfortunately, few projects adequately comment the code. One way to overcome the lack of human-written summary comments, and guard against obsolete comments, is to automatically generate them. In this paper, we present a novel technique to automatically generate descriptive summary comments for Java methods. Given the signature and body of a method, our automatic comment generator identifies the content for the summary and generates natural language...
Most software engineering tasks require developers to understand parts of the source code. When faced with unfamiliar code, developers often rely on (internal or external) documentation to gain an overall understanding of the code and determine whether it is relevant for the current task. Unfortunately, the documentation is often absent or outdated. This paper presents a technique to automatically generate human readable summaries for Java classes, assuming no documentation exists. The summaries allow developers to understand the main goal and structure of the class. The summaries focus on the content and responsibilities of the classes, rather than their...
We consider the structural descriptions produced by various grammatical formalisms in terms of the complexity of the paths and the relationship between paths in the sets of structural descriptions that each system can generate. In considering the relationship between formalisms, we show that it is useful to abstract away from the details of the formalism, and examine the nature of their derivation process as reflected by properties of their derivation trees. We find that several of the formalisms considered can be seen as being closely related since they have derivation tree sets with the same structure as those produced by Context-Free Grammars. On the basis of this observation, we describe a...
Most current software systems contain undocumented high-level ideas implemented across multiple files and modules. When developers perform program maintenance tasks, they often waste time and effort locating and understanding these scattered concerns. We have developed a semi-automated concern location and comprehension tool, Find-Concept, designed to reduce the time developers spend on maintenance tasks and to increase their confidence in the results of these tasks. Find-Concept is effective because it searches a unique natural language-based...
As software systems continue to grow and evolve, locating code for maintenance and reuse tasks becomes increasingly difficult. Existing static code search techniques using natural language queries provide little support to help developers determine whether search results are relevant, and few recommend alternative words to help developers reformulate poor queries. In this paper, we present a novel approach that automatically extracts natural language phrases from source code identifiers and categorizes the phrases and search results in a hierarchy. Our contextual search approach allows developers to explore the word...
One approach to easing program comprehension is to reduce the amount of code that a developer has to read. Describing the high level abstract algorithmic actions associated with code fragments using succinct natural language phrases potentially enables a newcomer to focus on fewer and more abstract concepts when trying to understand a given method. Unfortunately, such descriptions are typically missing because it is tedious to create them manually.
Automated software engineering tools (e.g., program search, concern location, code reuse, quality assessment, etc.) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers requires splitting identifiers into their constituent words. Unlike natural languages, where space and punctuation are used to delineate words, identifiers cannot contain spaces. One common way to split identifiers is to follow programming language naming conventions. For example, Java programmers often use camel case,...
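The convention-based splitting mentioned in this abstract can be sketched as follows. This is a minimal illustration of splitting on underscores, digits, and case changes only; it is not the mining-based technique the paper itself proposes, and the function name `split_identifier` is a hypothetical choice made for this example.

```python
import re

# Hypothetical helper (not from the paper): split an identifier by naming
# conventions only. The pattern matches, in order of preference:
#   1. a run of capitals not followed by a lowercase letter (an acronym),
#   2. an optionally capitalized lowercase word,
#   3. a run of digits.
# Underscores and other separators are simply skipped.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier):
    """Return the lowercase constituent words of an identifier."""
    return [match.group(0).lower() for match in _TOKEN.finditer(identifier)]


if __name__ == "__main__":
    for name in ("getHTTPResponse", "max_buffer_size2", "ASTVisitor"):
        print(name, "->", split_identifier(name))
```

A purely convention-based splitter like this one cannot separate same-case concatenations such as `hashtable`, because they contain no case change or separator to split on.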
Protein post-translational modifications (PTMs) play a pivotal role in numerous biological processes by modulating regulation of protein function. We have developed iPTMnet (http://proteininformationresource.org/iPTMnet) for PTM knowledge discovery, employing an integrative bioinformatics approach that combines text mining, data mining, and ontological representation to capture rich PTM information, including PTM enzyme-substrate-site relationships, PTM-specific protein-protein interactions (PPIs) and PTM conservation...
Completing software maintenance and evolution tasks for today's large, complex software systems can be difficult, often requiring considerable time to understand the system well enough to make correct changes. Despite evidence that successful programmers use program structure as well as identifier names to explore software, most existing program exploration techniques use either structural or lexical identifier information. By using only one type of information, automated tools ignore valuable clues about a developer's intentions -...
An important part of the leading comments for a method are the comments for the formal parameters of the method. According to the Java documentation writing guidelines, developers should write a summary of the method's actions followed by comments for each parameter. In this paper, we describe a novel technique to automatically generate descriptive comments for parameters of Java methods. Such generated comments can help alleviate the lack of developer written parameter comments. In addition, they can help a programmer in ensuring that a parameter comment is current with the code. We present heuristics to provide high-level...
When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations may never occur with the expanded word, or occur more often in the code. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and therefore may miss meaningful pieces of code or relationships between software artifacts. In this paper, we present an automated approach to mining abbreviation expansions from source code to enhance software maintenance tools that utilize natural language information. Our...
Many software development and maintenance tools involve matching between natural language words in different software artifacts (e.g., traceability) or between queries submitted by a user and software artifacts (e.g., code search). Because different people likely created the queries and the various artifacts, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms are particularly useful to overcome the mismatch in vocabularies, as well as other word relations that indicate semantic similarity. However, experience shows...
Heat stress triggers an evolutionarily conserved set of responses in cells. The transcriptome responds to hyperthermia by altering expression of genes to adapt the cell or organism to survive the heat challenge. RNA-seq technology allows rapid identification of environmentally responsive genes on a large scale. In this study, we have used RNA-seq to identify heat stress responsive genes in the chicken male white leghorn hepatocellular (LMH) cell line. The transcripts of 812 genes were responsive to heat stress (p < 0.01) with 235 genes upregulated and 577 downregulated following 2.5 h of heat stress. Among the genes whose...
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns...
MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts, with F-scores close to 0.90 on the three different types of relations. We...
Recently, automatically extracting biomedical relations has been a significant subject in biomedical research due to the rapid growth of biomedical literature. Since their adaptation to the biomedical domain, transformer-based BERT models have produced leading results on many biomedical natural language processing tasks. In this work, we will explore approaches to improve the BERT model for relation extraction tasks in both the pre-training and fine-tuning stages of its applications. In the pre-training stage, we add another level of BERT adaptation on sub-domain data to bridge the gap between domain knowledge...
Motivation: A large volume of experimental data on protein phosphorylation is buried in the fast-growing PubMed literature. While of great value, such information is limited in databases owing to the laborious process of literature-based curation. Computational literature mining holds promise to facilitate database...
We have embedded Tree Adjoining Grammars (TAG) in a feature structure based unification system. The resulting system, Feature Structure based Tree Adjoining Grammars (FTAG), captures the principle of factoring dependencies and recursion, fundamental to TAGs. We show that FTAG has an enhanced descriptive capacity compared to the TAG formalism. We consider some restricted versions of this system and some possible linguistic stipulations that can be made. We briefly describe a calculus to represent the structures used by this system, extending on the work of Rounds, and Kasper [Rounds et...
In this paper we present a polynomial time parsing algorithm for Combinatory Categorial Grammar. The recognition phase extends the CKY algorithm for CFG. The process of generating a representation of the parse trees has two phases. Initially, a shared forest is built that encodes the set of all derivation trees for the input string. This shared forest is then pruned to remove the spurious ambiguity.
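For orientation, the baseline CKY recognizer for context-free grammars that this abstract says is extended can be sketched as below. This is only the textbook algorithm for a grammar in Chomsky normal form, not the paper's Combinatory Categorial Grammar extension, and it omits the shared forest construction and pruning. The function name, the grammar encoding, and the toy grammar are all assumptions made for this illustration.

```python
from itertools import product


def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` is derivable from `start`.

    Assumed encoding (hypothetical, for this sketch only):
      lexical: dict mapping a word to the set of nonterminals A with A -> word
      binary:  dict mapping a pair (B, C) to the set of nonterminals A
               with A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds every nonterminal that derives words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexical.get(word, ()))
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((left, right), set())
    return start in chart[0][n]


if __name__ == "__main__":
    # Toy grammar, invented for the example.
    lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
    binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
    print(cky_recognize("she eats fish".split(), lexical, binary))
    print(cky_recognize("eats she fish".split(), lexical, binary))
```

The three nested loops over span length, start position, and split point give the cubic dependence on sentence length that makes CKY a polynomial time procedure for context-free grammars.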
DTG are designed to share some of the advantages of TAG while overcoming some of its limitations. DTG involve two composition operations called subsertion and sister-adjunction. The most distinctive feature of DTG is that, unlike TAG, there is complete uniformity in the way that the two DTG operations relate lexical items: subsertion always corresponds to complementation and sister-adjunction to modification. Furthermore, DTG, unlike TAG, can provide a uniform analysis for wh-movement in English and Kashmiri, despite the fact that the wh element in Kashmiri appears in sentence-second position, and not...