James Cuff

ORCID: 0000-0003-0570-4106
Research Areas
  • Genomics and Phylogenetic Studies
  • RNA and protein synthesis mechanisms
  • Chromosomal and Genetic Variations
  • Scientific Computing and Data Management
  • Genomics and Chromatin Dynamics
  • Enzyme Structure and Function
  • Protein Structure and Dynamics
  • Genetic Mapping and Diversity in Plants and Animals
  • Genetics, Bioinformatics, and Biomedical Research
  • Glycosylation and Glycoproteins Research
  • Software-Defined Networks and 5G
  • Network Security and Intrusion Detection
  • Machine Learning in Bioinformatics
  • IoT and Edge/Fog Computing
  • Cloud Computing and Resource Management
  • Smart Grid Security and Resilience
  • RNA Research and Splicing
  • Model-Driven Software Engineering Techniques
  • Research Data Management Practices
  • Animal Genetics and Reproduction
  • Distributed and Parallel Computing Systems
  • Algorithms and Data Compression
  • Bioinformatics and Genomic Networks
  • Cloud Data Security Solutions
  • Epigenetics and DNA Methylation

University of Liverpool
2020-2023

Carnegie Mellon University
2020-2023

IBM (United Kingdom)
2020-2023

Qinetiq (United Kingdom)
2020-2023

University of Toronto
2020-2023

IBM Research - Thomas J. Watson Research Center
2020-2023

Boston University
2020

Harvard University
2011-2018

Harvard University Press
2007-2017

Baylor Genetics
2007

R Waterston, Kerstin Lindblad‐Toh, Ewan Birney, Jane Rogers, Josep F. Abril, Pankaj Agarwal, Richa Agarwala, Rachel Ainscough, Marina Alexandersson, Peter An, Stylianos E. Antonarakis, Jonathan Wood, Robert Baertsch, J. Bailey, K. F. Barlow, Stephan Beck, E. Berry, Bruce W. Birren, Toby Bloom, Peer Bork, Marc Botcherby, Nicolas Bray, Michael R. Brent, Daniel G. Brown, S.D.M. Brown, Carol J. Bult, John H. Burton, Jonathan A. Butler, R. Duncan Campbell, Piero Carninci, Simon Cawley, Francesca Chiaromonte, Asif Chinwalla, Deanna M. Church, Michèle Clamp, Christopher Clee, Francis S. Collins, Lisa L. Cook, Richard R. Copley, Alan Coulson, Olivier Couronne, James Cuff, Val Curwen, Tim Cutts, Mark Daly, Robert David, J. Davies, Kimberly D. Delehaunty, Justin Deri, Emmanouil T. Dermitzakis, Colin N. Dewey, Nicholas J. Dickens, Mark Diekhans, Sheila Dodge, Inna Dubchak, Diane M. Dunn, Sean R. Eddy, Laura Elnitski, Richard D. Emes, Pallavi Eswara, Eduardo Eyras, Adam L. Felsenfeld, Ginger Fewell, Paul Flicek, Karen Foley, Wayne N. Frankel, Lucinda A. Fulton, Robert S. Fulton, Terrence S. Furey, Diane Gage, Richard A. Gibbs, Gustavo Glusman, Sante Gnerre, Nick Goldman, Leo Goodstadt, Darren Grafham, Tina Graves, Eric D. Green, Simon G. Gregory, Roderic Guigó, Mark S. Guyer, Ross C. Hardison, David Haussler, Yoshihide Hayashizaki, LaDeana W. Hillier, Angie S. Hinrichs, Wratko Hlavina, Timothy R. Holzer, Fan Hsu, Axin Hua, Tim Hubbard, Adrienne Hunt, Ian J. Jackson, David B. Jaffe, L. Steven Johnson, Matthew C. Jones, Thomas A. Jones, Ann Joy, Michael Kamal, Elinor K. Karlsson

10.1038/nature01262 article EN Nature 2002-12-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and the associated requirements from sequence...

10.1093/nar/30.1.38 article EN Nucleic Acids Research 2002-01-01

Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is known that automatic multiple sequence alignments can often be improved by manual editing. Therefore, tools are needed to view and edit multiple sequence alignments. Due to growth in the sequence databases, multiple sequence alignments can often be large and difficult to view efficiently. The Jalview Java alignment editor is presented here, which enables fast viewing and editing of large multiple sequence alignments.

10.1093/bioinformatics/btg430 article EN Bioinformatics 2004-01-22

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions...

10.1038/nature10530 article EN cc-by-nc-sa Nature 2011-10-01

An interactive protein secondary structure prediction Internet server is presented. The server allows a single sequence or multiple alignment to be submitted, and returns predictions from six secondary structure prediction algorithms that exploit evolutionary information from multiple sequences. A consensus prediction is also returned which improves the average Q3 accuracy of prediction by 1% to 72.9%. The server simplifies the use of current prediction algorithms and allows conservation patterns important to structure and function to be identified. Availability: http://barton.ebi.ac.uk/servers/jpred.html Contact: geoff@ebi.ac.uk

10.1093/bioinformatics/14.10.892 article EN Bioinformatics 1998-01-01

The effect of training a neural network secondary structure prediction algorithm with different types of multiple sequence alignment profiles derived from the same sequences is shown to provide a range of accuracy from 70.5% to 76.4%. The best accuracy of 76.4% (standard deviation 8.4%) is 3.1% (Q(3)) and 4.4% (SOV2) better than the PHD algorithm run on the same set of 406 sequence non-redundant proteins that were not used to train either method. Residues predicted by the new method with a confidence value of 5 or greater have an average Q(3) accuracy of 84%, and cover 68% of the residues...

10.1002/1097-0134(20000815)40:3<502::aid-prot170>3.0.co;2-q article EN Proteins Structure Function and Bioinformatics 2000-01-01

A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for a combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments, gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396-protein set. The secondary structure definition method DSSP defines 8...

10.1002/(sici)1097-0134(19990301)34:4<508::aid-prot10>3.0.co;2-4 article EN Proteins Structure Function and Bioinformatics 1999-03-01
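
The consensus prediction and Q3 accuracy mentioned in the abstracts above can be illustrated with a minimal Python sketch. It is not the published method: the function names `consensus` and `q3`, the three-state alphabet (H, E, C), the fall-back to coil on ties, and the toy input strings are all assumptions made for illustration. Q3 is taken here in its usual sense, the percentage of residues whose three-state assignment matches the observed one.

```python
from collections import Counter


def consensus(predictions):
    """Per-residue majority vote over equal-length three-state strings.

    Assumption: on a tie the residue falls back to coil ("C"); the
    published servers may resolve ties differently.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        winners = [state for state, n in counts.items() if n == top]
        result.append(winners[0] if len(winners) == 1 else "C")
    return "".join(result)


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical predictions from four methods for a four-residue fragment.
methods = ["HHEC", "HHCC", "HECC", "EECC"]
combined = consensus(methods)
score = q3(combined, "HHCC")
```

With these toy inputs the second residue is a two-against-two tie, so it falls back to coil, and the combined string matches three of the four observed states.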

Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation:...

10.1073/pnas.0709013104 article EN Proceedings of the National Academy of Sciences 2007-11-27

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy)...

10.1101/gr.6034307 article EN cc-by-nc Genome Research 2007-06-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage...

10.1093/nar/gkh038 article EN Nucleic Acids Research 2003-12-17

Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. The project currently annotates 10 complete genomes. This makes very large demands on compute resources, due to the vast number of sequence comparisons that need to be executed. To circumvent the financial outlay often associated with classical supercomputing environments, farms of multiple, lower-cost machines have now become the norm and have been deployed successfully with this project. The architecture and design...

10.1101/gr.1866304 article EN cc-by-nc Genome Research 2004-05-01

We describe a detailed solution for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations.

10.1093/jamia/ocx104 article EN cc-by-nc Journal of the American Medical Informatics Association 2017-08-31

Objective: We describe use cases and an institutional reference architecture for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations. Materials and Methods: High-end networking, packet filter firewalls, network intrusion detection systems. Results: We describe a “Medical Science DMZ” concept as an option for secure, high-volume transport of large, sensitive data sets between research institutions...

10.1093/jamia/ocw032 article EN cc-by-nc Journal of the American Medical Informatics Association 2016-05-02

This article describes the lessons learned, challenges faced, and innovations made in designing and implementing a high-availability private cloud for research computing.

10.1109/mc.2017.182 article EN Computer 2017-01-01

An automatic sequence searching method (ProtEST) is described which constructs multiple protein sequence alignments from protein sequences and translated expressed sequence tags (ESTs). ProtEST is more effective than a simple TBLASTN search of the query against the EST database, as the sequences are automatically clustered, assembled, made non-redundant, checked for sequence errors, translated into protein and then aligned and displayed. A ProtEST search found a translated, error- and length-corrected EST sequence for > 58% of sequences when single sequences from 1407 Pfam-A seed alignments were used as the probe. The average family size of the resulting...

10.1093/bioinformatics/16.2.111 article EN Bioinformatics 2000-02-01

It is a pleasure and an honour to welcome you to the first edition of Applied AI Letters. Getting to this point has been a combination of many people's hard work and we are very excited to move into the next stage, sharing our vision for Applied AI Letters with you. When we consider the lifecycle of a successful idea, we can identify some unique stages. We have put these together in Figure 1. Initially, a challenge should be identified, and it is often (although not always) the case that there is an idea of the impact solving the challenge can have. If this ignites the scientific spirit,...

10.1002/ail2.8 article EN cc-by Applied AI Letters 2020-08-04