Yannick Marcon

ORCID: 0000-0003-0138-2023
Research Areas
  • Health, Environment, Cognitive Aging
  • Data Analysis with R
  • Data Quality and Management
  • Nutritional Studies and Diet
  • Advanced Causal Inference Techniques
  • Genetic Associations and Epidemiology
  • Health disparities and outcomes
  • Research Data Management Practices
  • Scientific Computing and Data Management
  • Landslides and related hazards
  • Birth, Development, and Health
  • Sensor Technology and Measurement Systems
  • Data-Driven Disease Surveillance
  • Gene expression and cancer classification
  • Data Mining Algorithms and Applications
  • Groundwater flow and contamination studies
  • Big Data Technologies and Applications
  • Statistical Methods and Inference
  • Bioinformatics and Genomic Networks
  • Hydrological Forecasting Using AI
  • Distributed and Parallel Computing Systems
  • Ethics in Clinical Research
  • Machine Learning in Healthcare
  • Energy Efficiency and Management
  • Hydrology and Watershed Management Studies

Epigénétique et Destin Cellulaire
2022-2024

McGill University Health Centre
2013-2018

Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK's proposed 'care.data' initiative, and these issues reflect societal and professional concerns about privacy,...

10.1093/ije/dyu188 article EN cc-by-nc International Journal of Epidemiology 2014-09-27

Individual-level data pooling of large population-based studies across research centres in international projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for harmonization, database integration and federated analyses. Eight studies in six countries were recruited to participate in the project. Through workshops, teleconferences and electronic...

10.1186/1742-7622-10-12 article EN cc-by Emerging Themes in Epidemiology 2013-11-21

Improving the dissemination of information on existing epidemiological studies and facilitating the interoperability of study databases are essential to maximizing the use of resources and accelerating improvements in health. To address this, Maelstrom Research proposes Opal and Mica, two inter-operable open-source software packages providing out-of-the-box solutions for data management, harmonization and dissemination. Opal and Mica are standalone but inter-operable web applications written in Java, JavaScript and PHP. They provide services...

10.1093/ije/dyx180 article EN cc-by-nc International Journal of Epidemiology 2017-08-09

Combined analysis of multiple, large datasets is a common objective in the health- and biosciences. Existing methods tend to require researchers to physically bring data together in one place or to follow an analysis plan and share results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that reduce the challenges of these methods. These include ethico-legal constraints, which limit researchers' ability to bring data together, and the analytical inflexibility associated with conventional approaches to sharing results. The key feature from...

10.1371/journal.pcbi.1008880 article EN cc-by PLoS Computational Biology 2021-03-30

Background: The lack of accessible and structured documentation creates major barriers for investigators interested in understanding, properly interpreting and analyzing cohort data and biological samples. Providing the scientific community with open information is essential to optimize the usage of these resources. A cataloguing toolkit is proposed by Maelstrom Research to answer these needs and support the creation of comprehensive and user-friendly study- and network-specific web-based metadata catalogues. Methods: Development was...

10.1371/journal.pone.0200926 article EN cc-by PLoS ONE 2018-07-24

Existing individual-level human data cover large populations on many dimensions such as lifestyle, demography, laboratory measures, clinical parameters, etc. Recent years have seen investments in catalogues to FAIRify data descriptions and capitalise on this great promise, i.e. to make catalogue contents more Findable, Accessible, Interoperable and Reusable. However, their valuable diversity has also created heterogeneity, which poses challenges to optimally exploiting their richness. In this opinion review, we analyse for...

10.1055/s-0042-1742522 article EN cc-by-nc-nd Yearbook of Medical Informatics 2022-08-01

The importance of maintaining data privacy and complying with regulatory requirements is highlighted especially when sharing omic data between different research centers. This challenge is even more pronounced in the scenario where a multi-center effort for collaborative omics studies is necessary. OmicSHIELD is introduced as an open-source tool aimed at overcoming these challenges by enabling privacy-protected federated analysis of sensitive data. In order to ensure this, multiple security mechanisms have...

10.1371/journal.pcbi.1012626 article EN cc-by PLoS Computational Biology 2024-12-09

Summary: Extensive human health data from cohort studies, national registries, and biobanks can reveal lifecourse risk factors impacting health. Combining these sources offers increased statistical power, rare outcome detection, replication of findings, and extended study periods. Traditionally, this required transfer to a central location or separate partner analyses with pooled summary statistics, posing ethical, legal, and time constraints. Federated analysis, which involves remote...

10.1093/bioinformatics/btae726 article EN cc-by Bioinformatics 2024-12-02

In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since the data can often not be combined into a single storage solution, there would be substantial utility in an MTL application for geographically distributed data sources. Here, we describe the development of 'dsMTL', a computational framework...

10.1093/bioinformatics/btac616 article EN cc-by Bioinformatics 2022-09-07

Multitask learning allows the simultaneous learning of multiple 'communicating' algorithms. It is increasingly adopted for biomedical applications, such as the modeling of disease progression. As data protection regulations limit data sharing for such analyses, an implementation of multitask learning on geographically distributed data sources would be highly desirable. Here, we describe the development of dsMTL, a computational framework for privacy-preserving, multi-task machine learning that includes three supervised and one unsupervised dsMTL...

10.1101/2021.08.26.457778 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-08-28

The project introduces the new R package, rechaRge, dedicated to open-source groundwater recharge (GWR) models. The goal is to facilitate the simulation of GWR estimates for researchers, professionals, and stakeholders, both hydrogeologists and non-hydrogeologists, by providing all the tools for state-of-the-art modelling and the available models in a single package. The package includes functions for data preparation (utility functions), automatic calibration, sensitivity analysis, uncertainty integrated...

10.5194/egusphere-egu24-16210 preprint EN 2024-03-09

Summary. Extensive human health data from cohort studies, national registries, and biobanks can reveal lifecourse risk factors impacting health. Combining these sources offers increased statistical power, rare outcome detection, replication of findings, and extended study periods. Traditionally, this required transfer to a central location or separate partner analyses with pooled summary statistics, posing ethical, legal, and time constraints. Federated analysis, which involves remote analysis without...

10.31219/osf.io/xc86p preprint EN 2024-10-28

Motivation: The validity of epidemiologic findings can be increased using triangulation, i.e. comparison across contexts, and by having sufficiently large amounts of relevant data to analyse. However, access to data is often constrained by practical considerations and ethico-legal and governance restrictions. Gaining such access can be time-consuming due to the requirements associated with requests to institutions in different jurisdictions. Results: DataSHIELD is a software solution that enables remote analysis without the need for...

10.1093/bioadv/vbaf046 article EN cc-by Bioinformatics Advances 2024-12-26

Motivation: DataSHIELD is an open-source software infrastructure enabling the analysis of data distributed across multiple databases (federated data) without leaking individuals' information (non-disclosive). It has applications in many scientific domains, ranging from biosciences to social sciences and including high-throughput genomic studies. R is the language used to interact with (and build) DataSHIELD. This creates difficulties for researchers who do not have experience writing R code or...

10.1093/ije/dyac201 article EN cc-by-nc-nd International Journal of Epidemiology 2022-10-27

Data Report article. Front. Public Health, 03 October 2022. Sec. Life-Course Epidemiology and Social Inequalities in Health, Volume 10 - 2022 | https://doi.org/10.3389/fpubh.2022.964086

10.3389/fpubh.2022.964086 article EN cc-by Frontiers in Public Health 2022-10-03