A deep proteome and transcriptome abundance atlas of 29 healthy human tissues

Proteome
DOI: 10.15252/msb.20188503 Publication Date: 2019-02-18T16:25:05Z
ABSTRACT
Article. 18 February 2019. Open Access. Transparent process.

Dongxue Wang1,‡ (orcid.org/0000-0002-4402-0690), Basak Eraslan2,3,‡, Thomas Wieland4, Björn Hallström5, Thomas Hopf4, Daniel Paul Zolg1, Jana Zecha1, Anna Asplund6, Li-hua Li1, Chen Meng1, Martin Frejno1 (orcid.org/0000-0002-6651-1773), Tobias Schmidt1, Karsten Schnatbaum7, Mathias Wilhelm1, Frederik Ponten6 (orcid.org/0000-0003-0703-3940), Mathias Uhlen5, Julien Gagneur*,2 (orcid.org/0000-0002-8924-8365), Hannes Hahne*,4 (orcid.org/0000-0003-3601-0051), Bernhard Kuster*,1,8 (orcid.org/0000-0002-9094-1677)

1 Chair of Proteomics and Bioanalytics, Technische Universität München, Freising, Germany
2 Computational Biology, Department of Informatics, Technical University of Munich, Garching bei München, Germany
3 Department of Biochemistry, Quantitative Biosciences, Gene Center, Ludwig Maximilian Universität
4 OmicScouts GmbH
5 Science for Life Laboratory, KTH - Royal Institute of Technology, Stockholm, Sweden
6 Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
7 JPT Peptide Technologies, Berlin, Germany
8 Center for Integrated Protein Science Munich (CIPSM)

‡These authors contributed equally to this work
*Corresponding author. Tel: +49 89 289 19411

Molecular Systems Biology (2019) 15: e8503
https://doi.org/10.15252/msb.20188503
See also: B Eraslan et al (February 2019)
Abstract

Genome-, transcriptome- and proteome-wide measurements provide insights into how biological systems are regulated. However, fundamental aspects relating to which human proteins exist, where they are expressed and in which quantities are not fully understood. Therefore, we generated a quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project representing human genes by 18,072 transcripts and 13,640 proteins including 37 without prior protein-level evidence. The analysis revealed that hundreds of proteins, particularly in testis, could not be detected even for highly expressed mRNAs, that few proteins show tissue-specific expression, that strong differences between mRNA and protein quantities within and across tissues exist and that protein expression is often more stable across tissues than that of transcripts. Only 238 of 9,848 amino acid variants found by exome sequencing could be confidently detected at the protein level showing that proteogenomics remains challenging, needs better computational methods and requires rigorous validation. Many uses of this resource can be envisaged including the study of gene/protein expression regulation and biomarker specificity evaluation.

Synopsis

Proteome and transcriptome quantification of 29 healthy human tissues reveals which proteins exist, where they are expressed and in which approximate quantities. Tissue-specific expression is rare and is a quantitative rather than a qualitative characteristic. The study presents the most comprehensive proteome and transcriptome atlas to date, covering 29 healthy human tissues. Evidence is provided for 15,257 protein isoforms and for missing proteins. Proteogenomics is still challenging and requires validation with synthetic peptides.

Introduction

Delineating the factors that govern protein expression and activity in cells is among the most fundamental research topics in biology. Although the number of potential protein-coding genes in the human genome is stabilizing at about 20,000, high-quality evidence for their physical existence has not yet been found for all of them, and intense efforts are ongoing to identify these currently ~13% "missing proteins" (Omenn et al, 2017).
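While it is also generally accepted that protein levels vary greatly across different cell types, tissues and body fluids (Kim et al, 2014; Wilhelm et al, 2014), this has not been analysed systematically for many tissues. Furthermore, it is not very clear how anabolic and catabolic processes are coordinated to give rise to the vast range of protein levels. Messenger RNA levels are important determinants of protein abundance (Vogel et al, 2010; Schwanhäusser et al, 2011), and extensive mRNA expression maps of human tissues and cell types have been used as proxies for estimating protein abundance (GTEx Consortium, 2013; Uhlén et al, 2015; Thul et al, 2017). However, other studies have highlighted the much higher dynamic range of protein compared to transcript abundance, as well as the poor correlation between the two, suggesting that further and possibly diverse regulatory elements play important roles (Schwanhäusser et al, 2011; Liu et al, 2016; Franks et al, 2017). Decades of careful research have identified numerous factors affecting translation or protein stability, such as codon usage, start codon context and secondary structure, to name a few. However, most of these studies focussed on single factors, were performed in model organisms or distinct cell types, and were not broad in coverage. Broader scale analyses have recently become possible owing to advances in profiling technologies, but these are mostly focussed on (disease) tissue and are not cell-type resolved (Zhang et al, 2014; Mertins et al, 2016). To the best of our knowledge, there is no broad-scale integrative analysis of human transcriptomes and proteomes that would enable explaining the experimentally observed differences between mRNA and protein expression. The purpose of this study was to generate molecular data that facilitate the study of protein expression control in humans. To this end, we analysed 29 major histologically healthy human tissues from the Human Protein Atlas (HPA) project (Uhlén et al, 2015) to provide a baseline map of protein expression in the human body. As shown below and in Eraslan et al (2019), the resource can be used in many ways to explore this fundamentally important topic, and many further uses can be envisaged. All data are available in ArrayExpress (Kolesnikov et al, 2015) and proteomeXchange (Vizcaíno et al, 2014).

Results and Discussion

Comprehensive transcriptomic and proteomic analysis of 29 human tissues

We analysed tissue specimens from 29 human organs by label-free proteomics and RNA-Seq (Fig 1A; see Appendix Figs S1–S6 for an assessment of data quality). Tissues were collected by the HPA project (Fagerberg et al, 2014), and adjacent cryosections were used for the (allele-specific) transcriptome and proteome analysis. RNA-Seq quantified a total of 18,072 transcripts, with an average of 12,262 (± 1,007 standard deviation, SD) per tissue (Fig 1B) when using a cut-off of 1 fragment per kilobase million (FPKM; Uhlén et al, 2015). Proteomic analysis by mass spectrometry resulted in the identification and intensity-based absolute quantification (iBAQ; Schwanhäusser et al, 2011) of 15,210 protein groups, with an average of 11,005 (± 680) per tissue at a false discovery rate (FDR) of < 1% at the protein, peptide and peptide-spectrum match (PSM) level (Fig EV1A). This is based on 277,698 non-redundant tryptic peptides and, on average, 10,541 (± 512) genes per tissue, covering, on average, 86% of the transcripts detected in every tissue.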
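The per-tissue figures quoted above (an average of 12,262 ± 1,007 SD transcripts at a cut-off of 1 FPKM) amount to thresholding an expression matrix and summarising the counts across tissues. The following is a minimal sketch of that bookkeeping and not the authors' pipeline. The function names, the toy matrix, the use of `>=` at the cut-off and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def detected_per_tissue(fpkm, cutoff=1.0):
    """Count, for every tissue, the genes whose FPKM reaches the cut-off.

    fpkm: dict mapping gene -> dict mapping tissue -> FPKM value.
    Whether the original analysis used >= or > is an assumption here.
    """
    counts = {}
    for by_tissue in fpkm.values():
        for tissue, value in by_tissue.items():
            counts.setdefault(tissue, 0)
            if value >= cutoff:
                counts[tissue] += 1
    return counts


def detected_in_any_tissue(fpkm, cutoff=1.0):
    """Number of genes that reach the cut-off in at least one tissue."""
    return sum(
        1 for by_tissue in fpkm.values()
        if any(value >= cutoff for value in by_tissue.values())
    )


def summarise(counts):
    """Mean and sample standard deviation of the per-tissue counts."""
    values = list(counts.values())
    return mean(values), stdev(values)


# Hypothetical toy matrix: three genes measured in two tissues.
toy = {
    "GENE_A": {"liver": 5.0, "brain": 0.2},
    "GENE_B": {"liver": 1.0, "brain": 3.0},
    "GENE_C": {"liver": 0.0, "brain": 0.9},
}

per_tissue = detected_per_tissue(toy)
average, spread = summarise(per_tissue)
total = detected_in_any_tissue(toy)
```

On real data the same three calls would yield the total number of detected genes and the "average (± SD) per tissue" summary in the form used in the text.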
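While the number of proteins identified is smaller than in (community-based) resources such as ProteomicsDB (Schmidt et al, 2018) and neXtProt (Gaudet et al, 2017; coverage of 15,721 and 17,470 genes, respectively), this study provides the most consistent collection of the deepest tissue proteomes analysed to date. It also includes 37 proteins (each represented by at least one unique peptide) that were not covered by protein-level evidence in neXtProt (release 2018-01-17; Table EV1). These were validated using synthetic peptides (see the PRIDE submission for mirror spectra). Eighteen have antibody staining data in the current HPA release, including signal in the same tissue as detected by MS. This corroborates the detection of the new proteins by an independent method. Eight meet the guidelines of the Human Proteome Project (HPP), which require ≥ 2 unique peptides, each ≥ 9 amino acids in length (Deutsch et al, 2016). We note that the HPP guidelines use reasonable but ad hoc criteria that are likely too conservative and therefore discriminate against genuine cases. Comparing spectra of endogenous and synthetic peptides is a more objective criterion, which is why we added mirror plots for all evaluated cases (Zolg et al, 2017). The "new" proteins are about a factor of 10 below the median protein abundance (iBAQ on log10 scale, 7.4 versus 8.3), which explains why they may have been missed before. Interestingly, 15 were found in fallopian tube, an organ that has not been extensively profiled by proteomics.

Figure 1. Body map of tissues and donors. Number of transcripts and proteins per tissue; the colouring of the bars indicates the fractions expressed everywhere or enriched in certain tissues (see the text for the full classification). Abundance distribution (grey); fraction shown in blue and orange. Relative numbers of selected functional classes in the categories of panel (B).

Figure EV1. Further characterization at the transcript and protein level. Note the high number in testis. Abundance distribution in brain (grey); proteins shown in orange are elevated in brain. Clustering of gene ontology terms (biological process) that are divergent between tissues. Boxes show examples of GO terms for four tissues (appendix, brain, heart and testis).

Overall, 13,413 genes were detected at both the transcript and protein levels, and the detected proteins spanned almost the entire range of transcript abundance, again indicating substantial coverage (Fig 1C). For some highly expressed mRNAs (i.e. above mean abundance), no protein was detected. About 1/3 of these (478 of 1,408) were found in testis (Fig EV1B). These "missing" proteins were statistically significantly related to spermatogenesis (clusterProfiler; n = 82 genes; BH-adjusted P = 8 × 10−14). rich known long time exploited for, e.g., cloning cDNAs, apparent absence so surprising. due to, (11,024 genes) obvious technical (such inefficient extraction membrane difficulties identifying small proteins) prevent 300 antibodies (according HPA) 200 ascribed function.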
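The spermatogenesis enrichment above is reported with a BH-adjusted P value. As a reminder of what that adjustment does, here is a minimal sketch of the Benjamini-Hochberg procedure in plain Python. It is illustrative only: the text attributes the actual computation to clusterProfiler, and the function name and the toy P values below are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjustment of a list of raw P values.

    Returns adjusted values in the same order as the input. Each raw
    P value is scaled by m / rank (m = number of tests, rank = its
    1-based position when sorted in ascending order), and a running
    minimum taken from the largest rank downwards keeps the adjusted
    values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P values for four tested gene sets.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
```

An adjusted value can never be smaller than the raw value it came from, which is why a BH-adjusted P as small as the one quoted above remains strong evidence after correcting for the many GO terms tested.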
inability detect despite poses questions. For example, rapidly degraded implying specialized (and perhaps transient) functions sperm functionality? Are stabilized response egg fertilization? lower end (less abundance) overrepresented G-protein-coupled receptor (n 173; 8.3 10−50), ion channels 109; 7 10−10) cytokine-related biology 76; 6 10−9). simply spectrometric limit or, described times, difficult extract presence multi-pass transmembrane domains giving if any MS-compatible after digestion. profile, applied scheme (2015, 2016) previously developed stratifies five "tissue-enriched" (fivefold above tissue), "group enriched" group 2–7 tissues), "enhanced" "expressed all" (expressed tissues) "mixed" (which do categories). large represented tissues: 37% (6,725) 39% (5,400) 43% (7,866) 53% (7,244) showed ("tissue-enriched", "group-enriched" "tissue-enhanced"). 0.73% (on average) 0.65% tissue-enriched profile. Two notable exceptions exhibit percentage line recent GTEx projects 2013). tissue-restricted tended slightly EV1C). 1,270 1,998 study, HPA. In common 775 lending support spectrometry-based presented here. addition, compared targeted MS (PRM) acquired 52 Edfors (2016) overlapped S7–S9). Incidentally, Edfors' had three First, myoglobin (MB) confirmed PRM Second, PDK1 (3-phosphoinositide-dependent kinase-1) heart-enriched this. immunohistochemistry (IHC) stains conclude broad overstaining specificity. third example CANT1 (soluble calcium-activated nucleotidase 1) prostate-enriched protein. Again, measurement IHC. global trends distributions mirrored interesting detail 1D, EV4). while disease-associated followed drug targets general GPCRs particular speaking notion make ubiquitously (Hao Tatonetti, context, point out value can, quickly examine profile target interest, help understand adverse clinical effects off-target mechanisms action drugs. 
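The classification scheme referred to above sorts each gene into "tissue-enriched", "group enriched", "enhanced", "expressed in all" or "mixed" according to its abundance profile across tissues. A minimal sketch of such a rule set is given below. The fivefold factor and the group size of 2–7 tissues come from the text. The exact comparisons (top tissue against the second-highest tissue, group mean against the highest remaining tissue, top tissue against the mean of all other tissues), the order in which the rules are tried, the detection threshold and the function name are assumptions for illustration and may differ from the HPA implementation.

```python
from statistics import mean


def classify_expression(values, fold=5.0, detect=0.0):
    """Assign one gene to an expression category.

    values: dict mapping tissue -> abundance on a linear scale
            (e.g. FPKM or iBAQ); at least two tissues are required.
    fold:   enrichment factor (fivefold in the text).
    detect: abundance above which a gene counts as detected (assumed).
    """
    if len(values) < 2:
        raise ValueError("need at least two tissues")
    ranked = sorted(values.values(), reverse=True)
    top = ranked[0]
    if top <= detect:
        return "not detected"
    # Tissue-enriched: one tissue at least `fold` above every other tissue.
    if top >= fold * ranked[1]:
        return "tissue-enriched"
    # Group enriched: a group of 2-7 tissues whose mean is at least
    # `fold` above the highest tissue outside the group.
    for size in range(2, min(7, len(ranked) - 1) + 1):
        if mean(ranked[:size]) >= fold * ranked[size]:
            return "group-enriched"
    # Enhanced: top tissue at least `fold` above the mean of the others.
    if top >= fold * mean(ranked[1:]):
        return "tissue-enhanced"
    if all(value > detect for value in ranked):
        return "expressed in all"
    return "mixed"


# Hypothetical profiles.
enriched = classify_expression({"testis": 50, "brain": 5, "liver": 2})
grouped = classify_expression({"a": 100, "b": 90, "c": 1, "d": 1})
```

Applying the same function to the transcript and the protein profile of a gene gives the two labels that are compared when asking whether tissue specificity agrees between the mRNA and protein level.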
instance, phenylalanine hydroxylase (PAH) pan-HDAC inhibitor panobinostat (Becher Our shows PAH abundantly liver kidney) site hydroxylation (Matthews, 2007), exerts detrimental effects, i.e. leading decreased tyrosine eventually hypothyroidism affected patients. contrast, essential (Blomen Hart mitochondrial majority central maintaining cellular homeostasis. Despite detail, dataset confirms, level, there core set genes/proteins individual strongly characterized categorical (Geiger evident divergently enrichment specialization respective EV1D, EV3). relationship studied over past years continues debate various correlations computed interpreted artefacts meaning (Liu Fortelny beyond scope attempt reconcile views, should bring clarity. following, confine basic points nonetheless deem important. orders magnitude eight 2A; Fig corresponding plot copy essentially characteristics; EV5). difference alone (at part) overall LC-MS/MS. because limited "sequencing capacity" spectrometry. Thus, detecting low-abundance molecules will harder, wider sampling depth is. (paired-end) 18 M reads Those distributed 4 inevitable bias abundant only ~76,000 ~284,000 tandem (peptide spectrum matches; PSMs) result, easier 2. Analysis Distribution scale) exceeds (FPKM scale; S10 numbers). Protein-to-mRNA slope regression high-abundance copies mRNAs. Ranked heart. 70% tissue, represent 20% shared 100 Regardless rarely 20%. Correlation protein-to-RNA (in resulting 90% positive correlations. next marked. Examples (SYK, left panel) (EIF4A3, right protein/RNA ratios former express SYK, EIF4A3 appears similar noted before, implies synthesis role determining Vogel Marcotte, 2012). Similarly, produced molecule larger high- transcripts, quadratic (slope 2.6 2B 1.8 2.7 EV2A; S11). obs