- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- Machine Learning in Bioinformatics
- Protein Structure and Dynamics
- Algorithms and Data Compression
- Advanced Proteomics Techniques and Applications
- Genomics and Chromatin Dynamics
- Bioinformatics and Genomic Networks
- Gene expression and cancer classification
- Genetics, Bioinformatics, and Biomedical Research
- Molecular Biology Techniques and Applications
- Bayesian Methods and Mixture Models
- Glycosylation and Glycoproteins Research
- Enzyme Structure and Function
- Bacterial Genetics and Biotechnology
- Biomedical Text Mining and Ontologies
- Microbial Metabolic Engineering and Bioproduction
- Genomics and Rare Diseases
- RNA modifications and cancer
- RNA Research and Splicing
- Evolution and Paleontology Studies
- DNA and Biological Computing
- Fractal and DNA sequence analysis
- Bacteriophages and microbial interactions
- Computational Drug Discovery Methods
National Center for Biotechnology Information
2010-2021
National Institutes of Health
2010-2021
Vanderbilt University
2014
Center for Human Genetics
2014
Center for Information Technology
2011
Rockefeller University
1986-2008
Florida Atlantic University
2003
United States National Library of Medicine
1990-2002
Duke University Hospital
2000
Duke Medical Center
2000
Recent studies suggest that one or more genes on chromosome 5q21 are important for the development of colorectal cancers, particularly those associated with familial adenomatous polyposis (FAP). To facilitate the identification of genes from this locus, a portion of the region that is tightly linked to FAP was cloned. Six contiguous stretches of sequence (contigs) containing approximately 5.5 Mb of DNA were isolated. Subclones from these contigs were used to identify and position six genes, all of which were expressed in normal colonic...
A wealth of protein and DNA sequence data is being generated by genome projects and other sequencing efforts. A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and biological properties. A mathematical definition of this "local alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on...
The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which were then sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All sequences...
An unusual pattern in a nucleic acid or protein sequence, or a region of strong similarity shared by two or more sequences, may have biological significance. It is therefore desirable to know whether such a pattern can have arisen simply by chance. To identify interesting patterns, appropriate scoring values can be assigned to the individual residues of a single sequence or to sets of residues when several sequences are compared. For single sequences, such scores can reflect biophysical properties such as charge, volume, hydrophobicity, or secondary structure potential; for multiple sequences, they...
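The question of whether a high-scoring pattern could arise by chance is commonly framed with Karlin-Altschul statistics. The following is a minimal sketch under that assumption, not the paper's own code. It solves for the scale parameter λ as the positive root of Σ p(s)·exp(λs) = 1 and then evaluates the expected number of chance hits E = K·m·n·exp(-λS). The function names, the bisection approach, and the value of K passed in are illustrative assumptions.

```python
import math

def karlin_altschul_lambda(score_probs, iterations=200):
    """Solve sum(p * exp(lam * s)) == 1 for lam > 0 by bisection.

    score_probs maps each possible per-position score to its probability
    under the random model (assumed input format).
    """
    expected = sum(s * p for s, p in score_probs.items())
    if expected >= 0 or max(score_probs) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam):
        return sum(p * math.exp(lam * s) for s, p in score_probs.items()) - 1.0

    # f is negative just above 0 and eventually positive; find an upper bracket.
    hi = 1.0
    while f(hi) <= 0:
        hi *= 2.0
    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def expected_chance_hits(score, m, n, lam, K):
    """E-value for a segment score, given sequence lengths m and n.

    K is treated here as a supplied constant (assumption); computing it
    is more involved and is not attempted in this sketch.
    """
    return K * m * n * math.exp(-lam * score)
```

For a toy DNA scoring scheme of +1 for a match and -1 for a mismatch with uniform base frequencies, the match probability is 0.25 and the equation reduces to a quadratic whose positive root gives λ = ln 3.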
Multiple sequence alignment can be a useful technique for studying molecular evolution, as well as for analyzing relationships between structure or function and primary sequence. We have developed for this purpose an interactive program, MACAW (Multiple Alignment Construction and Analysis Workbench), that allows the user to construct multiple alignments by locating, analyzing, editing, and combining “blocks” of aligned sequence segments. MACAW incorporates several novel features. (1) Regions of local similarity are...
BLAST is a commonly used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its...
Computer analysis of a conserved domain, BRCT, first described at the carboxyl terminus of the breast cancer protein BRCA1, a p53 binding protein (53BP1), and the yeast cell cycle checkpoint protein RAD9, revealed a large superfamily of domains that occur predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA damage. The BRCT domain consists of ~95 amino acid residues and occurs as a tandem repeat at the carboxyl terminus of numerous proteins, but has also been observed as a tandem repeat at the amino terminus or as a single copy. The BRCT superfamily presently includes ~40 nonorthologous proteins, namely,...
TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option...
Multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence-structure relationships. Until recently, it has been impractical to apply dynamic programming, the most widely accepted method for producing pairwise alignments, to comparisons of more than three sequences. We describe the design and application of a tool for multiple alignment of amino acid sequences that implements a new algorithm that greatly reduces the computational demands of dynamic programming. This tool is able to align in reasonable time...
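To make the dynamic programming baseline concrete, here is a minimal sketch of pairwise global alignment scoring (Needleman-Wunsch style). The match, mismatch, and linear gap values are illustrative assumptions, and this is not the multiple-alignment algorithm the abstract describes.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b with linear gap costs.

    Keeps only two rows of the (len(a)+1) x (len(b)+1) table, since each
    cell depends only on its left, upper, and upper-left neighbours.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

For two sequences the table has (m+1)(n+1) cells. Extending the same recurrence to k sequences requires a k-dimensional table whose size is the product of the sequence lengths, which is why a direct application becomes impractical beyond a few sequences.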
Protein database searches frequently can reveal biologically significant sequence relationships useful in understanding structure and function. Weak but meaningful sequence patterns can be obscured, however, by other similarities due only to chance. By searching a database for multiple as opposed to pairwise alignments, distant relationships are much more easily distinguished from background noise. Recent statistical results permit the power of this approach to be analyzed. Given a typical query sequence, an algorithm described here...
We have constructed a public gene expression data repository and online data access and analysis WWW and FTP sites for serial analysis of gene expression (SAGE) data. The WWW and FTP components of this resource, SAGEmap, are located at http://www.ncbi.nlm.nih.gov/sage and ftp://ncbi.nlm.nih.gov/pub/sage, respectively. We herein describe SAGE data submission procedures, the construction and characteristics of SAGE tags to gene assignments, the derivation and use of a novel statistical test designed specifically for differential-type analyses of SAGE data, and the organization and use of this resource.
Score-based measures of molecular-sequence features provide versatile aids for the study of proteins and DNA. They are used by many sequence database search programs, as well as for identifying distinctive properties of single sequences. For any such measure, it is important to know what can be expected to occur purely by chance. The statistical distribution of high-scoring segments has been described elsewhere. However, molecular sequences will frequently yield several high-scoring segments for which some combined assessment is in order....
Motivation: Many studies have shown that database searches using position-specific score matrices (PSSMs) or profiles as queries are more effective at identifying distant protein relationships than are searches that use simple sequences as queries. One popular program for constructing a PSSM and comparing it with a database of sequences is Position-Specific Iterated BLAST (PSI-BLAST). Results: This paper describes a new software package, IMPALA, designed for the complementary procedure of comparing a single query sequence with a database of PSI-BLAST-generated...
We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number...
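One round of scanning with a position-dependent weight matrix can be sketched as follows. This is a simplified illustration, not the published procedure. It assumes log-odds scores against a uniform background, a flat pseudocount, and a hand-picked cutoff, whereas the abstract derives the cutoff from the expected score distribution under a random model. All names are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def build_weight_matrix(segments, pseudocount=1.0):
    """Build per-position log2-odds scores from equal-length aligned segments."""
    background = 1.0 / len(ALPHABET)  # uniform background (assumption)
    matrix = []
    for pos in range(len(segments[0])):
        counts = {aa: pseudocount for aa in ALPHABET}
        for seg in segments:
            counts[seg[pos]] += 1.0  # residues outside ALPHABET raise KeyError
        total = sum(counts.values())
        matrix.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in ALPHABET}
        )
    return matrix

def score_segment(matrix, segment):
    """Sum the per-position scores for one candidate segment."""
    return sum(column[aa] for column, aa in zip(matrix, segment))

def scan(matrix, sequence, cutoff):
    """Return (start, window, score) for every window scoring >= cutoff."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_segment(matrix, window)
        if score >= cutoff:
            hits.append((start, window, score))
    return hits
```

In an iterative setting, the windows returned by `scan` would be added to the segment set, the matrix rebuilt with `build_weight_matrix`, and the scan repeated until no new segments pass the cutoff.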