Samuel S. Kim

ORCID: 0000-0003-0491-0784
Research Areas
  • Genetic Associations and Epidemiology
  • Bioinformatics and Genomic Networks
  • RNA Research and Splicing
  • Kruppel-like factors research
  • Genetic Mapping and Diversity in Plants and Animals
  • Genomics and Chromatin Dynamics
  • Genetic Syndromes and Imprinting
  • Epigenetics and DNA Methylation
  • Single-cell and spatial transcriptomics
  • RNA and protein synthesis mechanisms
  • Genomics and Rare Diseases
  • Genetic and phenotypic traits in livestock
  • Privacy-Preserving Technologies in Data
  • Gene expression and cancer classification
  • Machine Learning in Bioinformatics
  • RNA modifications and cancer
  • Fibroblast Growth Factor Research
  • Genetics and Neurodevelopmental Disorders
  • Photoreceptor and optogenetics research
  • Genetic and Kidney Cyst Diseases
  • Cryptography and Data Security
  • 3D Printing in Biomedical Research
  • Natural Language Processing Techniques
  • Cancer-related gene regulation
  • Mast cells and histamine

Massachusetts Institute of Technology
2017-2024

Harvard University
2018-2023

Broad Institute
2019-2022

Emory University
2010-2011

Drexel University
1988

Abstract Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection...

10.1038/s41467-021-21286-1 article EN cc-by Nature Communications 2021-02-17

Abstract Prioritizing disease-critical cell types by integrating genome-wide association studies (GWAS) with functional data is a fundamental goal. Single-cell chromatin accessibility (scATAC-seq) and gene expression (scRNA-seq) have characterized cell types at high resolution, and studies integrating GWAS with scRNA-seq have shown promise, but studies integrating GWAS with scATAC-seq have been limited. Here, we identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases/traits (average N = 298 K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types. We identified...

10.1038/s41467-024-44742-0 article EN cc-by Nature Communications 2024-01-17

Abstract Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a new method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, which includes coding, conserved, regulatory and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving...

10.1101/375337 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2018-07-24

Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from...

10.1016/j.ajhg.2019.03.020 article EN publisher-specific-oa The American Journal of Human Genetics 2019-05-01

We assess contributions to autoimmune disease of genes whose regulation is driven by enhancer regions (enhancer-related) and genes that regulate other genes in trans (candidate master-regulator). We link these genes to SNPs using several SNP-to-gene (S2G) strategies and apply heritability analyses to draw three conclusions about 11 autoimmune/blood-related diseases/traits. First, characterizations of enhancer-related genes using functional genomics data are informative for autoimmune disease heritability after conditioning on a broad set of regulatory annotations....

10.1016/j.xgen.2022.100145 article EN cc-by-nc-nd Cell Genomics 2022-07-01

Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses...

10.1038/s41467-020-18515-4 article EN cc-by Nature Communications 2020-09-17

Abstract Despite considerable progress on pathogenicity scores prioritizing variants for Mendelian disease, little is known about the utility of these scores for common disease. Here, we assess the informativeness of Mendelian disease-derived pathogenicity scores for common disease and improve upon existing scores. We first apply stratified linkage disequilibrium (LD) score regression to evaluate published Mendelian disease-derived pathogenicity scores across 41 diseases and complex traits (average N = 320K). Several resulting annotations are informative even after conditioning on a broad set of functional...

10.1038/s41467-020-20087-2 article EN cc-by Nature Communications 2020-12-07

Prioritizing disease-critical cell types by integrating genome-wide association studies (GWAS) with functional data is a fundamental goal. Single-cell chromatin accessibility (scATAC-seq) and gene expression (scRNA-seq) have characterized cell types at high resolution, and early work on integrating GWAS with scRNA-seq has shown promise, but work on integrating GWAS with scATAC-seq has been limited. Here, we identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases and traits (average N=298K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types. We identified...

10.1101/2021.05.20.445067 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2021-05-21

Abstract Many diseases and complex traits exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We developed a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and applied S-LDXR to genome-wide association summary statistics for 31 diseases and complex traits in East Asians (EAS) and Europeans (EUR) (average N EAS = 90K, N EUR = 267K) with an average trans-ethnic genetic correlation of 0.85 (s.e. 0.01). We determined that squared trans-ethnic genetic correlation was 0.82× (s.e. 0.01) smaller...

10.1101/803452 preprint EN cc-by-nc bioRxiv (Cold Spring Harbor Laboratory) 2019-10-15

The prodigious growth of digital health data has precipitated a mounting interest in harnessing machine learning methodologies, such as natural language processing (NLP), to scrutinize medical records, clinical notes, and other text-based health information. Although NLP techniques have exhibited substantial potential in augmenting patient care and informing clinical decision-making, data privacy and adherence to regulations persist as critical concerns. Federated learning (FL) emerges as a viable solution, empowering multiple organizations...

10.1109/icdcs57875.2023.00115 article EN 2023-07-01

Abstract The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70...

10.21203/rs.3.rs-3707248/v1 preprint EN cc-by Research Square (Research Square) 2023-12-15

Abstract Despite considerable progress on pathogenicity scores prioritizing both coding and noncoding variants for Mendelian disease, little is known about the utility of these pathogenicity scores for common disease. Here, we sought to assess the informativeness of Mendelian disease-derived pathogenicity scores for common disease, and to improve upon existing scores. We first applied stratified LD score regression to annotations defined by top variants from published Mendelian disease-derived pathogenicity scores across 41 independent diseases and complex traits (average N = 320K). Several of the resulting annotations were informative even...

10.1101/2020.01.02.890657 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2020-01-03

Abstract Gene regulation is known to play a fundamental role in human disease, but mechanisms of regulation vary greatly across genes. Here, we explore the contributions to disease of two types of genes: genes whose regulation is driven by enhancer regions as opposed to promoter regions (enhancer-related) and genes that regulate other genes in trans (candidate master-regulator). We link these genes to SNPs using a comprehensive set of SNP-to-gene (S2G) strategies and apply stratified LD score regression to the resulting SNP annotations to draw three main conclusions about 11...

10.1101/2020.09.02.279059 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2020-09-03

Abstract Deep learning models have achieved great success in predicting genome-wide regulatory effects from DNA sequence, but recent work has reported that SNP annotations derived from these predictions contribute limited unique information for human complex disease. Here, we explore three integrative approaches to improve the disease informativeness of allelic-effect annotations (predicted difference between reference and variant alleles) constructed using several previously trained deep learning models: DeepSEA,...

10.1101/2020.09.08.288563 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2020-09-09

Chart corpora, which comprise data visualizations and their semantic labels, are crucial for advancing visualization research. However, the labels in most existing chart corpora are high-level (e.g., chart types), hindering their utility for broader interactive applications like chart reuse, animation, and accessibility. In this paper, we contribute VisAnatomy, a chart corpus containing 942 real-world SVG charts produced by over 50 tools, encompassing 40 chart types and featuring structural and stylistic design variations. Each chart is augmented...

10.48550/arxiv.2410.12268 preprint EN arXiv (Cornell University) 2024-10-16

The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK...

10.1101/2023.12.04.23299391 preprint EN cc-by-nd medRxiv (Cold Spring Harbor Laboratory) 2023-12-04

Abstract Deep learning models have shown great promise in predicting genome-wide regulatory effects from DNA sequence, but their informativeness for human complex diseases and traits is not fully understood. Here, we evaluate the disease informativeness of allelic-effect annotations (absolute value of the predicted difference between reference and variant alleles) constructed using two previously trained deep learning models, DeepSEA and Basenji. We apply stratified LD score regression (S-LDSC) to 41 independent diseases and complex traits (average N=320K)...

10.1101/784439 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2019-09-26