µProteInS—a proteogenomics pipeline for finding novel bacterial microproteins encoded by small ORFs
0301 basic medicine
Open Reading Frames
03 medical and health sciences
Bacteria
Genomics
Software
Proteogenomics
DOI:
10.1093/bioinformatics/btac115
Publication Date:
2022-02-18T20:11:43Z
AUTHORS (8)
ABSTRACT
Summary
Genome annotation pipelines traditionally exclude open reading frames (ORFs) shorter than 100 codons to avoid false identifications. However, studies have shown that these short ORFs may encode functional microproteins with meaningful biological roles. We developed µProteInS, a proteogenomics pipeline that combines genomics, transcriptomics and proteomics to identify novel microproteins in bacteria. Our pipeline employs a model to filter out low-confidence spectra, avoiding the need to manually inspect mass spectrometry data. It also overcomes the shortcomings of traditional approaches, which usually exclude overlapping genes, leaderless transcripts and non-conserved sequences. These characteristics are common among small ORFs (smORFs) and hamper their identification.

Availability and implementation
µProteInS is implemented in Python 3.8 within an Ubuntu 20.04 environment. It is open-source software distributed under the GNU General Public License v3 and is available as a command-line tool. It can be downloaded at https://github.com/Eduardo-vsouza/uproteins and either installed from source or executed as a Docker image.

Supplementary information
Supplementary data are available at Bioinformatics online.
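To make the 100-codon cutoff concrete, the following is a minimal sketch of a six-frame ORF scan that keeps only ORFs below that length. It is an illustration, not code from µProteInS: the function names, the choice of ATG/GTG/TTG as start codons, and the rule of taking the first in-frame start before each stop are all assumptions. As the abstract describes, the actual pipeline also relies on transcriptomic and proteomic evidence, which this sketch does not model.

```python
# Illustrative sketch only; names and rules here are assumptions,
# not the µProteInS implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons
MAX_SMORF_CODONS = 100  # length cutoff mentioned in the abstract

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq):
    """Yield (strand, frame, start, end, n_codons) for each ORF.

    An ORF runs from the first in-frame start codon to the next in-frame
    stop codon. n_codons counts the start codon but not the stop codon.
    Coordinates on the "-" strand refer to the reverse-complemented
    sequence, not to the original one.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    yield strand, frame, start, i + 3, (i - start) // 3
                    start = None


def find_smorfs(seq, max_codons=MAX_SMORF_CODONS):
    """Return only the ORFs shorter than max_codons."""
    return [orf for orf in find_orfs(seq) if orf[4] < max_codons]
```

Calling `find_smorfs` on a genome string returns exactly the candidates that a conventional length filter would discard, which is the population a smORF-oriented pipeline has to evaluate with additional evidence.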
REFERENCES (10)
CITATIONS (7)