Stephen F. Altschul

ORCID: 0000-0003-2120-9631

Research Areas
  • Genomics and Phylogenetic Studies
  • RNA and protein synthesis mechanisms
  • Machine Learning in Bioinformatics
  • Protein Structure and Dynamics
  • Algorithms and Data Compression
  • Advanced Proteomics Techniques and Applications
  • Genomics and Chromatin Dynamics
  • Bioinformatics and Genomic Networks
  • Gene expression and cancer classification
  • Genetics, Bioinformatics, and Biomedical Research
  • Molecular Biology Techniques and Applications
  • Bayesian Methods and Mixture Models
  • Glycosylation and Glycoproteins Research
  • Enzyme Structure and Function
  • Bacterial Genetics and Biotechnology
  • Biomedical Text Mining and Ontologies
  • Microbial Metabolic Engineering and Bioproduction
  • Genomics and Rare Diseases
  • RNA modifications and cancer
  • RNA Research and Splicing
  • Evolution and Paleontology Studies
  • DNA and Biological Computing
  • Fractal and DNA sequence analysis
  • Bacteriophages and microbial interactions
  • Computational Drug Discovery Methods

Affiliations

National Center for Biotechnology Information
2010-2021

National Institutes of Health
2010-2021

Vanderbilt University
2014

Center for Human Genetics
2014

Center for Information Technology
2011

Rockefeller University
1986-2008

Florida Atlantic University
2003

United States National Library of Medicine
1990-2002

Duke University Hospital
2000

Duke Medical Center
2000

Publications

10.1006/jmbi.1990.9999 article EN Journal of Molecular Biology 1990-10-05

Recent studies suggest that one or more genes on chromosome 5q21 are important for the development of colorectal cancers, particularly those associated with familial adenomatous polyposis (FAP). To facilitate the identification of genes from this locus, a portion of the region that is tightly linked to FAP was cloned. Six contiguous stretches of sequence (contigs) containing approximately 5.5 Mb of DNA were isolated. Subclones from these contigs were used to identify and position six genes, all of which were expressed in normal colonic...
10.1126/science.1651562 article EN Science 1991-08-09

A wealth of protein and DNA sequence data is being generated by genome projects and other sequencing efforts. A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and biological properties. A mathematical definition of this "local alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on...
10.1126/science.8211139 article EN Science 1993-10-08

The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multi-institutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which were then sequenced to high accuracy. The MGC has currently verified a full nonredundant set of >9,000 human and >6,000 mouse genes. Candidate clones for an additional 7,800 human and 3,500 mouse genes have also been identified. All sequences...
10.1073/pnas.242603899 article EN Proceedings of the National Academy of Sciences 2002-12-11

An unusual pattern in a nucleic acid or protein sequence, or a region of strong similarity shared by two or more sequences, may have biological significance. It is therefore desirable to know whether such a pattern can have arisen simply by chance. To identify interesting sequence patterns, appropriate scoring values can be assigned to the individual residues of a single sequence or to sets of residues when several sequences are compared. For single sequences, such scores can reflect biophysical properties such as charge, volume, hydrophobicity, or secondary structure potential; for multiple sequences, they...
10.1073/pnas.87.6.2264 article EN Proceedings of the National Academy of Sciences 1990-03-01
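The statistics of such scoring schemes depend on a scale parameter λ. As an illustration only, the sketch below solves the standard Karlin-Altschul equation, sum over residue pairs of p_a p_b exp(λ s_ab) = 1, for its unique positive root by bisection. It assumes residues drawn independently with background frequencies p_a that sum to 1, a negative expected score, and at least one positive score. The function name `karlin_lambda` is hypothetical.

```python
import math


def karlin_lambda(scores, probs, tol=1e-12):
    """Return the positive root lambda of sum_ab p_a p_b exp(lambda * s_ab) = 1.

    scores: dict mapping residue pair (a, b) -> substitution score s_ab
    probs:  dict mapping residue a -> background frequency p_a (summing to 1)
    """
    def f(lam):
        return sum(probs[a] * probs[b] * math.exp(lam * s)
                   for (a, b), s in scores.items()) - 1.0

    expected = sum(probs[a] * probs[b] * s for (a, b), s in scores.items())
    if expected >= 0 or max(scores.values()) <= 0:
        raise ValueError("need a negative expected score and some positive score")

    # f(0) = 0 and f is convex with f'(0) < 0, so f < 0 on (0, lambda)
    # and f > 0 beyond it; grow hi until it lies past the root.
    lo, hi = 0.0, 1.0
    while f(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    dna = "ACGT"
    scores = {(a, b): (1 if a == b else -1) for a in dna for b in dna}
    probs = {a: 0.25 for a in dna}
    print(karlin_lambda(scores, probs))  # close to math.log(3)
```

Take a DNA scheme that scores +1 for a match and -1 for a mismatch, with uniform base frequencies. The equation reduces to 0.25 e^λ + 0.75 e^(-λ) = 1, whose positive root is λ = ln 3.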

Multiple sequence alignment can be a useful technique for studying molecular evolution, as well as for analyzing relationships between structure or function and primary sequence. For this purpose we have developed an interactive program, MACAW (Multiple Alignment Construction and Analysis Workbench), that allows the user to construct multiple alignments by locating, analyzing, editing, and combining “blocks” of aligned sequence segments. MACAW incorporates several novel features. (1) Regions of local similarity are...
10.1002/prot.340090304 article EN Proteins Structure Function and Bioinformatics 1991-03-01

BLAST is a commonly used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-Specific Iterated BLAST (PSI-BLAST) iteratively searches a database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from the query with information derived from a library of short profiles to achieve better homology detection than PSI-BLAST, which builds its...
10.1186/1745-6150-7-12 article EN cc-by Biology Direct 2012-01-01
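The abstract describes building a PSSM from one search round's matches for use in the next. The sketch below shows only the basic log-odds idea, under simplifying assumptions:

- the segments are already aligned without gaps;
- every sequence counts equally;
- pseudocounts are spread in proportion to background frequencies.

PSI-BLAST itself also weights sequences and derives pseudocounts from a substitution matrix, so this is not its actual procedure. The names `build_pssm` and `score_segment` are illustrative.

```python
import math


def build_pssm(segments, background, pseudocount=1.0):
    """Build a log-odds PSSM (in bits) from equal-length, gap-free segments.

    Column frequencies are q_a = (c_a + beta * p_a) / (N + beta), where c_a is
    the count of residue a in the column, N the number of segments, p_a the
    background frequency, and beta the total pseudocount weight.
    """
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("segments must all have the same length")
    n = len(segments)
    pssm = []
    for i in range(length):
        column = [s[i] for s in segments]
        row = {}
        for a, p in background.items():
            q = (column.count(a) + pseudocount * p) / (n + pseudocount)
            row[a] = math.log2(q / p)  # positive when a is enriched in this column
        pssm.append(row)
    return pssm


def score_segment(pssm, segment):
    """Sum the PSSM scores of a segment aligned to the full profile."""
    return sum(row[a] for row, a in zip(pssm, segment))


if __name__ == "__main__":
    background = {a: 0.25 for a in "ACGT"}
    pssm = build_pssm(["ACGT", "ACGA", "ACCT"], background)
    print(score_segment(pssm, "ACGT"), score_segment(pssm, "TTTT"))
```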

Computer analysis of a conserved domain, BRCT, first described at the carboxyl terminus of the breast cancer protein BRCA1, the p53-binding protein 53BP1, and the yeast cell cycle checkpoint protein RAD9, revealed a large superfamily of domains that occur predominantly in proteins involved in functions responsive to DNA damage. The BRCT domain consists of ~95 amino acid residues and occurs as a tandem repeat at the carboxyl terminus of numerous proteins, but has also been observed as a single copy. The BRCT superfamily presently includes ~40 nonorthologous proteins, namely,...
10.1096/fasebj.11.1.9034168 article EN The FASEB Journal 1997-01-01

10.1016/s0076-6879(96)66029-7 article EN Methods in enzymology on CD-ROM/Methods in enzymology 1996-01-01

10.1016/0022-2836(91)90193-a article EN Journal of Molecular Biology 1991-06-01

TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option...
10.1186/1741-7007-4-41 article EN cc-by BMC Biology 2006-12-01
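TBLASTN compares a protein query with a nucleotide database translated in all six reading frames. The sketch below illustrates only that translation step. It produces the three forward frames and the three reverse-complement frames using the standard genetic code (NCBI translation table 1).

It does not reproduce TBLASTN's implementation or its composition-based statistics. The frame labels and function names are hypothetical, and codons containing ambiguous bases are rendered as `X`.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translate(dna):
    """Return the translations of frames +1..+3 and -1..-3 ('*' marks a stop)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCATTGTAATGGGCCGC").items():
        print(frame, protein)
```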

Multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence-structure relationships. Until recently, it has been impractical to apply dynamic programming, the most widely accepted method for producing pairwise alignments, to comparisons of more than three sequences. We describe the design and application of a tool for the multiple alignment of amino acid sequences that implements a new algorithm that greatly reduces the computational demands of dynamic programming. This tool is able to align in reasonable time...
10.1073/pnas.86.12.4412 article EN Proceedings of the National Academy of Sciences 1989-06-01
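To see why dynamic programming for multiple alignment quickly becomes expensive, the sketch below computes the optimal sum-of-pairs score of three sequences with a full three-dimensional table. The table's size grows as the product of the sequence lengths.

The sketch makes these assumptions:

- gap costs are linear;
- the example scores are arbitrary (match 1, mismatch -1, gap -2);
- a gap aligned with a gap scores zero;
- only the score is returned, with no traceback.

This is the brute-force baseline, not the reduced-computation algorithm the abstract describes.

```python
import itertools


def sp_column_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column; '-' marks a gap."""
    total = 0
    for x, y in itertools.combinations(column, 2):
        if x == "-" and y == "-":
            continue  # a gap aligned with a gap contributes nothing
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def align3_score(a, b, c, **scores):
    """Optimal sum-of-pairs score of three sequences by 3-D dynamic programming.

    D[i][j][k] is the best score for aligning a[:i], b[:j] and c[:k]; each cell
    looks back along the 7 moves that consume a residue from a nonempty subset
    of the sequences, so the work grows with len(a) * len(b) * len(c).
    """
    neg_inf = float("-inf")
    D = [[[neg_inf] * (len(c) + 1) for _ in range(len(b) + 1)]
         for _ in range(len(a) + 1)]
    D[0][0][0] = 0
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            for k in range(len(c) + 1):
                if i == j == k == 0:
                    continue
                best = neg_inf
                for di, dj, dk in itertools.product((0, 1), repeat=3):
                    if (di, dj, dk) == (0, 0, 0) or di > i or dj > j or dk > k:
                        continue
                    column = (a[i - 1] if di else "-",
                              b[j - 1] if dj else "-",
                              c[k - 1] if dk else "-")
                    prev = D[i - di][j - dj][k - dk]
                    best = max(best, prev + sp_column_score(column, **scores))
                D[i][j][k] = best
    return D[len(a)][len(b)][len(c)]


if __name__ == "__main__":
    print(align3_score("HEAGAWGHEE", "PAWHEAE", "HEAWE"))
```

Adding a fourth sequence would need a four-dimensional table with 15 moves per cell. That growth is why a method that prunes the table matters.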

Protein database searches can frequently reveal biologically significant sequence relationships useful in understanding structure and function. Weak but meaningful patterns can be obscured, however, by other similarities due only to chance. By searching a database for multiple as opposed to pairwise alignments, distant relationships are much more easily distinguished from background noise. Recent statistical results permit the power of this approach to be analyzed. Given a typical query sequence, an algorithm described here...
10.1073/pnas.87.14.5509 article EN Proceedings of the National Academy of Sciences 1990-07-01

We have constructed a public gene expression data repository and online access and analysis sites (WWW and FTP) for serial analysis of gene expression (SAGE) data. The components of this resource, SAGEmap, are located at http://www.ncbi.nlm.nih.gov/sage and ftp://ncbi.nlm.nih.gov/pub/sage, respectively. We herein describe SAGE data submission procedures, the construction and characteristics of SAGE tag-to-gene assignments, the derivation and use of a novel statistical test designed specifically for differential-type analyses of SAGE data, and the organization of the resource.
10.1101/gr.10.7.1051 article EN cc-by-nc Genome Research 2000-07-01

Score-based measures of molecular-sequence features provide versatile aids for the study of proteins and DNA. They are used by many sequence database search programs, as well as for identifying distinctive properties of single sequences. For any such measure, it is important to know what can be expected to occur purely by chance. The statistical distribution of high-scoring segments has been described elsewhere. However, molecular sequences will frequently yield several high-scoring segments, which require some combined assessment in order...
10.1073/pnas.90.12.5873 article EN Proceedings of the National Academy of Sciences 1993-06-15
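Under the Karlin-Altschul model, consider chance segments scoring at least S when sequences of lengths m and n are compared. Their number is approximately Poisson-distributed with mean E = K m n exp(-λS). The sketch below computes E and the probability of seeing at least r such segments by chance.

λ and K must come from the scoring system; the values in the example are placeholders. The sketch does not implement the combined assessment of several segments that the abstract goes on to discuss.

```python
import math


def expected_segments(score, m, n, lam, K):
    """E = K * m * n * exp(-lam * score): expected chance segments scoring >= score."""
    return K * m * n * math.exp(-lam * score)


def prob_at_least(r, expected):
    """P(at least r chance segments) when their count is Poisson(expected)."""
    below = sum(math.exp(-expected) * expected ** i / math.factorial(i)
                for i in range(r))
    return 1.0 - below


if __name__ == "__main__":
    # Placeholder parameters, not values for any real scoring matrix.
    E = expected_segments(score=60, m=300, n=1_000_000, lam=0.3, K=0.1)
    print(E, prob_at_least(1, E), prob_at_least(2, E))
```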

Motivation: Many studies have shown that database searches using position-specific score matrices (PSSMs) or profiles as queries are more effective at identifying distant protein relationships than searches using simple sequences as queries. One popular program for constructing a PSSM and comparing it with a database of sequences is Position-Specific Iterated BLAST (PSI-BLAST). Results: This paper describes a new software package, IMPALA, designed for the complementary procedure of comparing a single query sequence with a database of PSI-BLAST-generated...
10.1093/bioinformatics/15.12.1000 article EN Bioinformatics 1999-12-01
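The complementary procedure compares a single query sequence with a collection of PSSMs. The sketch below is a simplified illustration of that comparison. It finds the best ungapped local score of a query against each profile by running a maximum-subarray scan along every diagonal, then ranks the profiles. Gaps and score statistics are deliberately omitted, so this is not IMPALA's scoring. The profile names and scores in the example are made up.

```python
def best_ungapped_score(pssm, query):
    """Best ungapped local score of `query` against a PSSM.

    pssm: list of dicts (one per profile column) mapping residue -> score.
    Every diagonal is scanned with Kadane's maximum-subarray recurrence.
    """
    best = 0
    n_cols, n_res = len(pssm), len(query)
    for offset in range(-(n_res - 1), n_cols):  # offset = column - query index
        run = 0
        for j, residue in enumerate(query):
            i = j + offset
            if 0 <= i < n_cols:
                run = max(0, run + pssm[i][residue])
                best = max(best, run)
    return best


def rank_profiles(query, profiles):
    """Return (score, name) pairs for a dict of named PSSMs, best first."""
    return sorted(((best_ungapped_score(p, query), name)
                   for name, p in profiles.items()), reverse=True)


if __name__ == "__main__":
    profiles = {
        "profile_1": [{"A": 2, "C": -1}, {"A": -1, "C": 3}],
        "profile_2": [{"A": -1, "C": 2}, {"A": 1, "C": -2}],
    }
    print(rank_profiles("AAC", profiles))
```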

We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or a group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned segments. For each iteration, the expected distribution of scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number...
10.1073/pnas.91.25.12091 article EN Proceedings of the National Academy of Sciences 1994-12-06
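The abstract sets the inclusion cutoff from the expected distribution of matrix scores under a random model. One simple way to obtain such a distribution is to convolve per-column score distributions, assuming integer-valued matrix scores and a segment whose residues are drawn independently from background frequencies.

The chosen cutoff is the lowest score whose expected number of chance hits stays within `max_expected`. The count is taken over a hypothetical `n_segments` scanned positions. This is an illustration under these assumptions, not the procedure from the paper.

```python
from collections import defaultdict


def score_distribution(int_pssm, background):
    """Exact distribution of an ungapped matrix score for a random segment.

    int_pssm:   list of dicts (one per column) mapping residue -> integer score
    background: dict mapping residue -> probability of drawing it
    Returns a dict mapping total score -> probability.
    """
    dist = {0: 1.0}
    for row in int_pssm:
        nxt = defaultdict(float)
        for total, p_total in dist.items():
            for residue, p_res in background.items():
                nxt[total + row[residue]] += p_total * p_res
        dist = dict(nxt)
    return dist


def cutoff_for_expected_hits(int_pssm, background, n_segments, max_expected):
    """Lowest score s with n_segments * P(score >= s) <= max_expected, else None."""
    dist = score_distribution(int_pssm, background)
    tail = 0.0
    cutoff = None
    for s in sorted(dist, reverse=True):
        tail += dist[s]  # tail is now P(score >= s)
        if n_segments * tail > max_expected:
            break
        cutoff = s
    return cutoff


if __name__ == "__main__":
    column = {"A": 3, "C": -1, "G": -1, "T": -2}
    bg = {a: 0.25 for a in "ACGT"}
    print(cutoff_for_expected_hits([column] * 6, bg, 100_000, 1.0))
```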

10.1007/bf02462326 article EN Bulletin of Mathematical Biology 1986-09-01