- Research Data Management Practices
- Bioinformatics and Genomic Networks
- Gene expression and cancer classification
- Scientific Computing and Data Management
- Biomedical Text Mining and Ontologies
- Computational Physics and Python Applications
- Distributed and Parallel Computing Systems
- Computational Drug Discovery Methods
- Big Data Technologies and Applications
- Image Processing and 3D Reconstruction
- AI in cancer detection
- Artificial Intelligence in Healthcare
- Cloud Computing and Resource Management
- Genetics, Bioinformatics, and Biomedical Research
- Molecular Biology Techniques and Applications
- Single-cell and spatial transcriptomics
- Engineering Applied Research
- Genomics and Phylogenetic Studies
- Energy and Environmental Systems
Imperial College London
2014-2023
Wellcome Trust
2010-2012
European Bioinformatics Institute
2010-2012
Wellcome Sanger Institute
2012
Cancer Research UK
2010
University of Cambridge
2008
ArrayExpress (http://www.ebi.ac.uk/arrayexpress) consists of three components: the Repository, a public archive of functional genomics experiments and supporting data; the Warehouse, a database of gene expression profiles and other bio-measurements; and the Atlas, a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200 000 assays, and doubles in size every 15 months. The majority of the data are array based, but other data types are included, most...
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in...
The ArrayExpress Archive (http://www.ebi.ac.uk/arrayexpress) is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted by users and imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology enabled interfaces include queries based on technology and sample...
The notion that data should be Findable, Accessible, Interoperable and Reusable, according to the FAIR Principles, has become a global norm for good data stewardship and a prerequisite for reproducibility. Nowadays, FAIR guides data policy actions and professional practices in the public and private sectors. Despite such global endorsements, however, the FAIR Principles are aspirational, remaining elusive at best, and intimidating at worst. To address the lack of practical guidance, and help with capability gaps, we developed the FAIR Cookbook, an open,...
The Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive of Functional Genomics Data. A simple interface allows the user to query for differential gene expression either (i) by gene names or attributes...
The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness of both existing and future clinical and molecular datasets. We validated the framework in collaboration with several major public-private partnership projects, demonstrating and delivering improvements across all aspects of...
High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records, queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve...
High-throughput molecular profiling data has been used to improve clinical decision making by stratifying subjects based on their molecular profiles. Unsupervised clustering algorithms can be used for stratification purposes. However, the current speed of the clustering algorithms cannot meet the requirement of large-scale molecular data due to poor performance of the correlation matrix calculation. With high-throughput sequencing technologies promising to produce even larger datasets per subject, we expect the performance of the state-of-the-art statistical algorithms to be further impacted unless...
Biomedical informatics has traditionally adopted a linear view of the informatics process (collect, store and analyse) in translational medicine (TM) studies, focusing primarily on the challenges in data integration and analysis. However, a data management challenge presents itself with the new lifecycle view of data emphasized by the recent calls for data re-use, long term data preservation, and data sharing. There is currently a lack of dedicated infrastructure focused on the 'manageability' of the data lifecycle in TM research between data collection and analysis. Current community efforts towards establishing...
Microarray data from cell lines of Non-Small Cell Lung Carcinoma (NSCLC) can be used to look for differences in gene expression between the cell lines derived from different tumour samples, and to investigate if these cell lines cluster into distinct groups. Dividing the cell lines into classes can help to improve diagnosis and the development of screens for new drug candidates. The microarray data is first subjected to quality control analysis and then subsequently normalised using three alternate methods to reduce the chances of differences being artefacts resulting from the normalisation process....
Drug target identification, being the first phase in drug discovery, is becoming an overly time consuming process and in many cases produces inefficient results due to the failure of conventional approaches to investigate large scale data. The main goal of this work is to identify drug targets, where there are genes or proteins associated with specific diseases. With the help of Microarray technology, the relationship between biological entities such as protein-protein, gene-gene and related chemical compounds are used as a means...
Translational biomedical research has become a science driven by big data. Improving patient care by developing personalized therapies and new drugs depends increasingly on an organization's ability to rapidly and intelligently leverage complex molecular and clinical data from a variety of large-scale internal and external, partner and public, data sources. As analysing these large-scale datasets becomes computationally expensive, it is of paramount importance to enable researchers to seamlessly scale up their computation platform while being...