Quantification and discovery of sequence determinants of protein‐per‐mRNA amount in 29 human tissues
Sequence (biology)
DOI:
10.15252/msb.20188513
Publication Date:
2019-02-18T16:25:34Z
AUTHORS (13)
ABSTRACT
Article, 18 February 2019, Open Access, Transparent process

Quantification and discovery of sequence determinants of protein-per-mRNA amount in 29 human tissues

Basak Eraslan (1,2,‡), Dongxue Wang (3,‡; orcid.org/0000-0002-4402-0690), Mirjana Gusic (4,5), Holger Prokisch (4,5; orcid.org/0000-0003-2379-6286), Björn M Hallström (6), Mathias Uhlén (6; orcid.org/0000-0002-4858-8056), Anna Asplund (7), Frederik Pontén (7; orcid.org/0000-0003-0703-3940), Thomas Wieland (3), Thomas Hopf (3), Hannes Hahne (*,8; orcid.org/0000-0003-3601-0051), Bernhard Kuster (*,3,9; orcid.org/0000-0002-9094-1677), Julien Gagneur (*,1; orcid.org/0000-0002-8924-8365)

1 Computational Biology, Department of Informatics, Technical University of Munich, Garching, Germany
2 Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Munich, Germany
3 Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
4 Institute of Human Genetics, Technical University of Munich, Munich, Germany
5 Institute of Human Genetics, Helmholtz Zentrum München, Neuherberg, Germany
6 Science for Life Laboratory, KTH - Royal Institute of Technology, Stockholm, Sweden
7 Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
8 OmicScouts GmbH, Freising, Germany
9 Center For Integrated Protein Science Munich (CIPSM), Munich, Germany

‡ These authors contributed equally to this work
* Corresponding authors. Tel: +49 8161 9762892 (H Hahne); Tel: +49 8161 71 5696 (B Kuster); Tel: +49 89 289 19411 (J Gagneur)

Molecular Systems Biology (2019) 15: e8513
https://doi.org/10.15252/msb.20188513
See also: D Wang et al (February 2019)
Abstract

Despite their importance in determining protein abundance, a comprehensive catalogue of sequence features controlling protein-to-mRNA (PTR) ratios and a quantification of their effects are still lacking. Here, we quantified PTR ratios for 11,575 proteins across 29 human tissues using matched transcriptomes and proteomes. We estimated by regression the contribution of known sequence determinants of protein synthesis and degradation in addition to 45 mRNA and 3 protein sequence motifs that we found by association testing. While PTR ratios span more than 2 orders of magnitude, our integrative model predicts PTR ratios at a median precision of 3.2-fold. A reporter assay provided functional support for two novel UTR motifs, and an immobilized mRNA affinity competition-binding assay identified motif-specific bound proteins for one motif. Moreover, our integrative model led to a new metric of codon optimality that captures the effects of codon frequency on protein synthesis and degradation. Altogether, this study shows that a large fraction of PTR ratio variation in human tissues can be predicted from sequence, and it identifies many new candidate post-transcriptional regulatory elements.

Synopsis

Protein-to-mRNA Sequence-based predictions tissue-specific reveal elements yield metrics optimality. sequence-based genes Reporter assays provide proteome-wide adaptation index (PTR-AI), optimality,

Introduction

Unraveling how gene regulation is encoded genomes central delineating programs understanding predispositions diseases. Although transcript abundance major determinant substantial deviations between levels expression exist (Liu al, 2016). These include much larger dynamic range abundances (García-Martínez 2007; Lackner Schwanhäusser 2011; Wilhelm 2014; Csárdi 2015) poor mRNA–protein correlations important classes cell types (Fortelny 2017; Franks 2017). emphasized non-steady-state conditions driven gene-specific rates (Peshkin 2015; Jovanovic Therefore, consider number molecules per molecule when studying code.
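As an illustration of the two quantities the abstract relies on, the sketch below computes a log10 protein-to-mRNA (PTR) ratio and a median fold error between observed and predicted log10 PTR ratios. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `log10_ptr` and `median_fold_error` and all toy numbers are hypothetical, and reading "median precision" as the median of 10^|observed − predicted| is an assumption, since the Materials and Methods are not part of this extract.

```python
import math
from statistics import median


def log10_ptr(protein, mrna):
    """Return log10(protein / mRNA), or None if a value is missing or non-positive.

    Hypothetical helper: the input units are arbitrary abundance estimates.
    """
    if protein is None or mrna is None or protein <= 0 or mrna <= 0:
        return None
    return math.log10(protein) - math.log10(mrna)


def median_fold_error(observed_log10, predicted_log10):
    """Median of 10**|observed - predicted| over pairs where both values exist.

    Assumption: this is one plausible reading of a "median precision of
    x-fold"; the article's exact definition is not shown in this extract.
    """
    errors = [
        10 ** abs(obs - pred)
        for obs, pred in zip(observed_log10, predicted_log10)
        if obs is not None and pred is not None
    ]
    return median(errors)


# Toy (protein, mRNA) abundances for four genes in one tissue; values are invented.
toy_pairs = [(1000.0, 10.0), (100.0, 10.0), (50.0, 5.0), (None, 2.0)]
observed = [log10_ptr(p, m) for p, m in toy_pairs]

# Invented model predictions on the log10 scale.
predicted = [1.5, 1.0, 1.5, 1.0]

print(median_fold_error(observed, predicted))
```

Working on the log10 scale makes an over-prediction and an under-prediction by the same factor count equally, which is why the error is expressed as a fold change rather than an absolute difference.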
Decades single-gene studies have revealed numerous affecting initiation, elongation, termination translation as well Eukaryotic canonically initiated after ribosome, which scanning 5′ cap, recognizes start codon. Start codons secondary structures interfere with ribosome (Kozak, 1984; Kudla 2009). Also, context plays role recognition 1986). The elongation rate determined decoding each coding (Sorensen 1989; Gardin Hanson Coller, 2018). It understood low some tRNAs leads longer time cognate (Varenne 1984), turn lead repressed initiation consistent traffic jam (reviewed However, estimates times cells overall highly debated (Plotkin Kudla, Quax Secondary structure chemical properties nascent peptide chain further modulate (Qu Artieri Fraser, Sabi Tuller, Dao Duc Song, Translation triggered stop its recognition, whereby non-favorable sequences translational read-through (Bonetti 1995; McCaughan Poole Tate 1996). Furthermore, RNA binding (RBPs) microRNAs (miRNAs) recruited mRNAs sequence-specific sites regulate various steps (Baek 2008; Selbach Guo 2010; Gerstberger Hudson Ortlund, Cottrell not only predicting miRNAs RBPs difficult, but few these events understood. Complementary translation, also abundance. Degrons signals acquired or inherent (Geffen first discovered degron was N-terminal amino acid (Bachmair exact mechanism debated, recent data yeast indicating general hydrophobicity region stability (Kats Further protein-encoded degrons several linear structural (Ravid Hochstrasser, Geffen 2016; Maurer 2016), phosphorylated recognized ubiquitin ligases (Mészáros contribute produce. neither nor they quantitatively abundances. To address questions line, Vogel colleagues (Vogel 2010) performed multivariate analysis predict features. This seminal based transcriptome proteome single type, Daoy medulloblastoma cells. Whether conclusions drawn generalized genome-wide other remains open question. 
transcriptomics proteomics technologies were sensitive quantitative today, leaving reliable 476 protein-coding analysis. among most abundant proteins, therefore leading possibly strong biases. focused refrained discovering exploited (Fig 1A, (PTR ratios) sequence. interpret findings related (Radhakrishnan Green, degradation, included half-life measurements (Tani 2012; Schueler Schwalb profiling 17 independent (Dana O'Connor 2016) immortal primary lines (Zecha 2018; Mathieson Fig 1A). considered candidates UTR, 3′ means systematic modeled effect ratio, measure compared existing metrics. Our all individual relative error Finally, providing initial experimental results assess relevance potentially Figure 1. Variation Overview datasets analyzed study. considering dataset (Wang 2019). interpreted respect occupancy datasets, reflecting datasets. Solid represent dependencies basic kinetic model. Dashed line represents coupling Proportion variance (Materials Methods) explained (y-axis) against proportion same tissue (x-axis) tissues. gray identity y = x. Ad (Adrenal), Ap (Appendices), Br (Brain), Co (Colon), Du (Duodenum), En (Endometrium), Es (Esophagus), FT (Fallopian tube), Fa (Fat), GB (Gall bladder), He (Heart), Ki (Kidney), Li (Liver), Lu (Lung), Ly (Lymphnode), Ov (Ovary), Pa (Pancreas), Pl (Placenta), Pr (Prostate), Re (Rectum), SG (Salivary gland), SI (Small intestine), SM (Smooth muscle), Sp (Spleen), St (Stomach), Te (Testis), Th (Thyroid), (Tonsil), UB (Urinary bladder). Same (B) levels, i.e., log-ratios level Distribution standard deviation (log10) housekeeping (left) (right). varies significantly less (Wilcoxon test). Shown quartiles (boxes horizontal lines) furthest points within 1.5 interquartile lower upper (whiskers). (right) 15 latent factors fitted joint optimization likelihood both modalities (Argelaguet factor Multi-Omics Factor Analysis (MOFA) Factors active capture shared covariation tissues, signal specific modality. 
Results Matched transcriptomic proteomic Using label-free RNA-Seq, profiled proteomes adjacent cryo-sections histologically healthy specimens collected Atlas project (Fagerberg 2014) facilitate analysis, every isoform because there little evidence widespread multiple isoforms avoid practical difficulties calling quantifying consistently levels. small (10% 13,664 detected least tissue). 5,636 (43%) (out 12,978 expressed [FPKM > 1] tissue; Materials Methods, Appendix S1). 4,303 (34%) had perfect match RNA-Seq-defined proteomics-defined 12,920 measurements). remaining genes, mismatches varying yet unmatched ones almost (Appendix S2). Since restricted counts level, defined largest average isoform. used compute Methods). RNA-Seq subtracting length sequencing-depth-normalized intronic exonic coverages Subtracting coverage slightly improved sample S3), better reflects concentration mature mRNAs, exposed machinery. technical replicates summarized value. Requiring 10 reads kilobase pair correlation likely values associated measurement error. Lastly, transcripts 6 nt make sure could computed. quantifications (Tables EV1, EV2, EV3 EV4), where 7,972 (69%, minimum 7,300 maximum 8,869) tissue. How explain adjusting has been over last years (Maier 2009; Lundberg Edfors Fortelny In tissue, 1B, x-axis, ranged 20% (ovary) 39% (liver). observed proportions profiles (between 41% pancreas 56% liver, y-axis, P < 10−132 reasons increase twofold. Biologically, conceivable co-expression patterns predictive functionally co-regulated (Franks Technically, may robust nature measures observations (2015) de-noising budding enhance different relevant comparison EV1A B). variations log-ratio similarly level. 0% 43% (brain) suggesting 1C). earlier analyses More interestingly, 7% (colon) 51% 1C, significant 10−19, except pancreas), again indicates regulation. Evidence co-regulation corroborated set enrichment analyses. Among Atlas, abundantly general, fairly similar 1D).
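The passage above reports, per tissue, a proportion of protein-level variance explained by mRNA levels, and, per gene, the spread of PTR log-ratios across tissues. The following sketch shows one simple way such summaries could be computed. It rests on assumptions: the squared Pearson correlation on log10-scale values stands in for "proportion of variance explained" (the article's exact estimator is given in its Materials and Methods, which are not reproduced here), and the function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def variance_explained(x, y):
    """Squared Pearson correlation between x and y, skipping pairs with None.

    Assumption: used here as a stand-in for the proportion of variance in y
    (e.g. log10 protein level) explained by x (e.g. log10 mRNA level).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy ** 2 / (sxx * syy)


def across_tissue_sd(log10_ptr_by_tissue):
    """Sample standard deviation of one gene's log10 PTR ratios across tissues.

    Returns None when fewer than two tissues have a measurement.
    """
    values = [v for v in log10_ptr_by_tissue if v is not None]
    return stdev(values) if len(values) >= 2 else None


# Invented log10 mRNA and protein levels for three genes in a single tissue.
log_mrna = [1.0, 2.0, 3.0]
log_protein = [1.0, 3.0, 2.0]
print(variance_explained(log_mrna, log_protein))

# Invented log10 PTR ratios of one gene in three tissues (one missing).
print(across_tissue_sd([1.0, 2.0, None]))
```

A gene with a small across-tissue standard deviation has a PTR ratio that is roughly constant between tissues, which is the property the text attributes to housekeeping genes.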
Gene (FDR 0.1) DAVID (Huang 2009a,b) cellular complex assembly, negative metabolic process, cytoplasmic transport biological processes enriched EV1C). localized certain components such chaperonin-containing T-complex, whole membrane, cytoskeleton EV1D). contrast, strongly point toward cell-specific biology cilium organization, glycolipid biosynthetic single–multicellular organism inflammatory response EV1E) localizations extracellular space, intrinsic component secretory vesicles granules EV1F). EV1. 4,506 measured fold-change Values shown (A) centered ontology terms (Biological Process) whose decile ((0.08,0.22] log10) 9,665 five (Cellular Component) ((0.7, 1.73] next 3,753 valid (standard greater threefold), 31 displayed positive 569 measures. 2018) showed explaining 60% across-tissue able 35% 1E). either failed find 1F). Together, suggest Tissue specificity investigated translation. Overall, 1,233 out inspected 1,542 manually curated (2014). Of these, 825 S4A). According scores 135 being (Table EV5) dataset, observation majority ubiquitously typically higher (Vaquerizas Kechavarzi Janga, 2014). spermatogenesis, multi-organism reproductive DNA modification, meiotic nuclear division germ plasm, pole granule 0.1; S4B C). Sequence identify quantify derived alone. includes mRNA-encoded termination, 2A B, Table EV6). GC content order de novo through testing fold-changes median,