- Genomics and Phylogenetic Studies
- RNA and Protein Synthesis Mechanisms
- Chromosomal and Genetic Variations
- Scientific Computing and Data Management
- Genomics and Chromatin Dynamics
- Enzyme Structure and Function
- Protein Structure and Dynamics
- Genetic Mapping and Diversity in Plants and Animals
- Genetics, Bioinformatics, and Biomedical Research
- Glycosylation and Glycoproteins Research
- Software-Defined Networks and 5G
- Network Security and Intrusion Detection
- Machine Learning in Bioinformatics
- IoT and Edge/Fog Computing
- Cloud Computing and Resource Management
- Smart Grid Security and Resilience
- RNA Research and Splicing
- Model-Driven Software Engineering Techniques
- Research Data Management Practices
- Animal Genetics and Reproduction
- Distributed and Parallel Computing Systems
- Algorithms and Data Compression
- Bioinformatics and Genomic Networks
- Cloud Data Security Solutions
- Epigenetics and DNA Methylation
University of Liverpool (2020-2023)
Carnegie Mellon University (2020-2023)
IBM (United Kingdom) (2020-2023)
Qinetiq (United Kingdom) (2020-2023)
University of Toronto (2020-2023)
IBM Research - Thomas J. Watson Research Center (2020-2023)
Boston University (2020)
Harvard University (2011-2018)
Harvard University Press (2007-2017)
Baylor Genetics (2007)
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence...
Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is known that automatic multiple alignments can often be improved by manual editing. Therefore, tools are needed to view and edit multiple sequence alignments. Due to growth in the sequence databases, multiple sequence alignments can often be large and difficult to view efficiently. The Jalview Java alignment editor is presented here, which enables fast viewing and editing...
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions...
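A figure like "constrained elements covering ∼4.2% of the genome" is a coverage statistic. It is the total length of the elements, with overlaps counted once, divided by genome length. The sketch below shows that arithmetic under a few assumptions: elements are half-open (start, end) coordinates on a single sequence, and the function name and toy inputs are illustrative only, not taken from the paper.

```python
def merged_coverage(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of a genome covered by half-open [start, end) intervals, overlaps counted once."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: bank the run and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length
```

For example, `merged_coverage([(0, 10), (5, 20), (30, 40)], 100)` merges the first two elements into [0, 20) and returns 0.3, not the 0.35 a naive sum of lengths would give.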
An interactive protein secondary structure prediction Internet server is presented. The server allows a single sequence or multiple alignment to be submitted, and returns predictions from six secondary structure prediction algorithms that exploit evolutionary information from multiple sequences. A consensus prediction is also returned, which improves the average Q3 accuracy of prediction by 1% to 72.9%. The server simplifies the use of current prediction algorithms and allows conservation patterns important to structure and function to be identified. Availability: http://barton.ebi.ac.uk/servers/jpred.html (contact: geoff@ebi.ac.uk)
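One way to picture a consensus of several predictors is a per-residue majority vote. The sketch below makes two assumptions: each method returns a three-state string (H helix, E strand, C coil) of equal length, and ties fall back to coil under a hypothetical rule. The server's actual combination rule may differ.

```python
from collections import Counter

def consensus_prediction(predictions: list[str], tie_break: str = "C") -> str:
    """Per-residue majority vote over equal-length H/E/C secondary structure strings."""
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    consensus = []
    for column in zip(*predictions):  # one column = all methods' states for one residue
        counts = Counter(column).most_common()
        top_count = counts[0][1]
        winners = [state for state, n in counts if n == top_count]
        # Hypothetical rule: a tie falls back to coil rather than picking arbitrarily.
        consensus.append(winners[0] if len(winners) == 1 else tie_break)
    return "".join(consensus)
```

With three methods, `consensus_prediction(["HHHC", "HHEC", "HEEC"])` gives `"HHEC"`. An odd number of voters avoids most ties. With an even number, the tie rule matters.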
The effect of training a neural network secondary structure prediction algorithm with different types of multiple sequence alignment profiles derived from the same sequences is shown to provide a range of accuracy from 70.5% to 76.4%. The best accuracy, 76.4% (standard deviation 8.4%), is 3.1% (Q3) and 4.4% (SOV2) better than the PHD algorithm run on the same set of 406 non-redundant proteins that were not used to train either method. Residues predicted by the new method with a confidence value of 5 or greater have an average Q3 accuracy of 84% and cover 68% of the residues....
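The last result pairs two numbers: accuracy restricted to confidently predicted residues, and the fraction of all residues that those represent. The sketch below shows that bookkeeping, assuming an integer confidence score per residue. The abstract only mentions values of 5 or greater, so the 0-9 scores in the example are an assumption.

```python
def accuracy_and_coverage(predicted: str, observed: str,
                          confidence: list[int], threshold: int) -> tuple[float, float]:
    """Accuracy and coverage over residues whose confidence is >= threshold."""
    if not (len(predicted) == len(observed) == len(confidence)):
        raise ValueError("inputs must be the same length")
    selected = [i for i, c in enumerate(confidence) if c >= threshold]
    if not selected:
        return 0.0, 0.0
    correct = sum(predicted[i] == observed[i] for i in selected)
    accuracy = correct / len(selected)        # how often the confident calls are right
    coverage = len(selected) / len(predicted)  # how many residues get a confident call
    return accuracy, coverage
```

For `predicted="HHEEC"`, `observed="HHECC"`, confidences `[9, 7, 3, 6, 5]` and threshold 5, four residues qualify, three are correct, and the function returns `(0.75, 0.8)`. Raising the threshold typically trades coverage for accuracy.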
A new dataset of 396 protein domains is developed and used to evaluate the performance of the secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for a combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments, gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396-protein set. The secondary structure definition method DSSP defines 8...
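Q3 is the percentage of residues assigned the correct one of three states (helix, strand, coil). One natural reading of a "maximum theoretical Q3 for a combination" is the score an oracle would reach by always choosing a method that was right at each residue. The sketch below computes both numbers, assuming three-state strings already reduced from DSSP's eight states. The function names are illustrative.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state (H/E/C) prediction matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty strings")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

def combined_upper_bound(predictions: list[str], observed: str) -> float:
    """Q3 of an oracle combination: a residue counts if at least one method got it right."""
    if not observed or not predictions or any(len(p) != len(observed) for p in predictions):
        raise ValueError("need non-empty predictions matching the observed length")
    hits = sum(any(p[i] == observed[i] for p in predictions) for i in range(len(observed)))
    return 100.0 * hits / len(observed)
```

Here each method scores 75.0 on its own for observed `"HHEC"` with predictions `"HHCC"` and `"HEEC"`. Their errors fall on different residues, so the oracle bound is 100.0. A real consensus rule can only approach that bound.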
Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation:...
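An ORF (open reading frame) here is a stretch that starts at an ATG codon and runs to the first in-frame stop codon. The sketch below scans a transcript's forward strand for ORFs, assuming the standard start and stop codons. The `min_codons` cutoff of 100 is an assumption, not a value from the abstract. This shows only ORF detection, not the conservation analysis the abstract discusses.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) of ATG..stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the stop too.
    Only the first ATG before each stop is used, so nested ORFs are not reported,
    and reading frames with no closing stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA letters
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `find_orfs("ccATGAAATAGcc", min_codons=2)` returns `[(2, 11)]`. Short ORFs like this arise by chance in any long transcript, which is why length alone says little about whether an ORF encodes a protein.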
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy)....
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities range from sequence analysis to data storage...
Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. The project currently annotates 10 complete genomes. This makes very large demands on compute resources, due to the vast number of sequence comparisons that need to be executed. To circumvent the financial outlay often associated with classical supercomputing environments, farms of multiple, lower-cost machines have now become the norm and have been deployed successfully on this project. The architecture and design...
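A farm of lower-cost machines suits this workload because individual sequence comparisons are independent. They can be cut into many small jobs and run on whichever node is free. The sketch below shows only the partitioning step, assuming the database has been pre-split into chunks. The names and batch size are hypothetical. A real farm would hand these batches to a batch scheduler rather than keep them in a list.

```python
from itertools import product

def make_jobs(queries: list[str], db_chunks: list[str],
              batch_size: int) -> list[list[tuple[str, str]]]:
    """Split every query-by-database-chunk comparison into batches, one batch per farm job."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Each (query, chunk) pair is an independent comparison that any node can run.
    pairs = list(product(queries, db_chunks))
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
```

Three queries against two database chunks give six comparisons. With `batch_size=4`, these become two jobs of four and two pairs. Batch size trades scheduling overhead against how evenly work spreads across nodes.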
We describe a detailed solution for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations.
Objective: We describe use cases and an institutional reference architecture for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations. Materials and Methods: High-end networking, packet filter firewalls, and intrusion detection systems. Results: We describe a “Medical Science DMZ” concept as an option for secure, high-volume transport of large, sensitive data sets between research institutions...
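A packet filter firewall, as listed among the methods, decides on each packet from header fields alone. The sketch below models that decision as a first-match rule list with a default deny, using Python's standard `ipaddress` module. The rule fields, the addresses (drawn from documentation ranges) and the port are illustrative assumptions, not the paper's configuration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    action: str          # "allow" or "deny"
    src: str             # source network, CIDR notation
    dst: str             # destination network, CIDR notation
    port: Optional[int]  # destination port; None matches any port

def evaluate(rules: list[Rule], src_ip: str, dst_ip: str, port: int) -> str:
    """Return the action of the first matching rule, or "deny" if none match."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and (rule.port is None or rule.port == port)):
            return rule.action
    return "deny"  # default deny: traffic no rule allows is dropped

# Hypothetical policy: one partner network may reach one data transfer host on one port.
RULES = [Rule("allow", "192.0.2.0/24", "198.51.100.10/32", 443)]
```

Stateless matching like this is cheap enough to keep up with high-rate flows. Deeper inspection is then left to the intrusion detection systems running alongside.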
This article describes the lessons learned, challenges faced, and innovations made in designing and implementing a high-availability private cloud for research computing.
An automatic sequence searching method (ProtEST) is described which constructs multiple protein alignments from protein sequences and translated expressed sequence tags (ESTs). ProtEST is more effective than a simple TBLASTN search of the query against the EST database, as the sequences are automatically clustered, assembled, made non-redundant, checked for errors, translated into protein sequences, and then aligned and displayed. A translated, error- and length-corrected EST was found for > 58% of queries when single sequences from the 1407 Pfam-A seed alignments were used as the probe. The average family size resulting...
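At its core, translating ESTs into protein is codon-by-codon translation. Because an EST's strand and reading frame are not known in advance, a simple approach is to translate all six frames. The sketch below does this with the standard genetic code. It does not attempt the error and length correction described above, and the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate_frame(seq: str, frame: int = 0) -> str:
    """Translate one reading frame (0, 1 or 2); codons with ambiguous bases become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))

def six_frame(seq: str) -> list[str]:
    """Three forward-strand translations followed by three reverse-complement translations."""
    dna = seq.upper().replace("U", "T")
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate_frame(dna, f) for f in range(3)] + \
           [translate_frame(rev, f) for f in range(3)]
```

For example, `translate_frame("ATGGCCTAA")` gives `"MA*"`. Frame 0 of the reverse strand (`TTAGGCCAT`) gives `"LGH"`. Sequencing errors in ESTs cause frameshifts that break such translations partway, which is why a correction step is needed before alignment.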
It is a pleasure and an honour to welcome you to the first edition of Applied AI Letters. Getting to this point has been a combination of many people's hard work, and we are very excited to move into the next stage, sharing our vision for Applied AI Letters with you. When we consider the lifecycle of a successful idea, we can identify some unique stages. We have put these together in Figure 1. Initially, a challenge should be identified, and it is often (although not always) the case that there is an idea of the impact that solving it can have. If the idea ignites the scientific spirit,...