Joseph Enguehard

ORCID: 0000-0002-1648-3356
Research Areas
  • Topic Modeling
  • Explainable Artificial Intelligence (XAI)
  • Natural Language Processing Techniques
  • 3D Shape Modeling and Analysis
  • COVID-19 diagnosis using AI
  • Machine Learning in Healthcare
  • Time Series Analysis and Forecasting
  • Anomaly Detection Techniques and Applications
  • Data Visualization and Analytics
  • Language and cultural evolution
  • Radiomics and Machine Learning in Medical Imaging
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Lung Cancer Diagnosis and Treatment

Boston Children's Hospital
2021

Harvard University
2021

Babylon Health
2020

Télécom Paris
2019

Deep neural networks usually require large labeled datasets to construct accurate models; however, in many real-world scenarios, such as medical image segmentation, labeling data is a time-consuming and costly task that requires human expertise. Semi-supervised methods address this issue by making use of a small labeled dataset and a larger set of unlabeled data. In this article, we present a flexible framework for semi-supervised learning that combines the power of supervised methods that learn feature representations using...

10.1109/access.2019.2891970 article EN cc-by-nc-nd IEEE Access 2019-01-01

Integrated Gradients (IG), a widely used axiomatic path-based attribution method, assigns importance scores to input features by integrating model gradients along a straight path from a baseline to the input. While effective in some cases, we show that straight paths can lead to flawed attributions. In this paper, we identify the cause of these misattributions and propose an alternative approach that treats the input space as a Riemannian manifold, computing attributions by integrating gradients along geodesics. We call this method Geodesic Integrated Gradients (GIG). To approximate...

10.48550/arxiv.2502.12108 preprint EN arXiv (Cornell University) 2025-02-17
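
The straight-path integration that the abstract above attributes to standard IG can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover the geodesic variant; the toy model, its analytic gradient, and all function names (`integrated_gradients`, `toy_model`, `toy_grad`) are assumptions introduced purely for illustration. The integral is approximated with a midpoint Riemann sum.

```python
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 200,
) -> List[float]:
    """Approximate IG along the straight line from `baseline` to `x`."""
    dim = len(x)
    avg_grad = [0.0] * dim
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i in range(dim):
            avg_grad[i] += grad[i] / steps
    # Scale the path-averaged gradient by the input-baseline difference.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


# Hypothetical toy model f(v) = v0 * v1 + v2^2 with its analytic gradient.
def toy_model(v: Sequence[float]) -> float:
    return v[0] * v[1] + v[2] ** 2


def toy_grad(v: Sequence[float]) -> List[float]:
    return [v[1], v[0], 2.0 * v[2]]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_grad, x, baseline)
# Completeness: attributions should sum to f(x) - f(baseline).
gap = sum(attributions) - (toy_model(x) - toy_model(baseline))
```

For this toy model the gradient is linear along the path, so the midpoint sum is exact up to floating-point error and `gap` is close to zero. For a real network the gradient would come from automatic differentiation, and the number of steps would need tuning.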

Lung cancer is by far the leading cause of cancer death in the US. Recent studies have demonstrated the effectiveness of screening using low dose CT (LDCT) in reducing lung cancer related mortality. While lung nodules are detected with a high rate of sensitivity, this exam has a low specificity rate and it is still difficult to separate benign and malignant lesions. The ISBI 2018 Lung Nodule Malignancy Prediction Challenge, developed by a team from the Quantitative Imaging Network of the National Cancer Institute, was focused on the prediction of lung nodule malignancy from two...

10.1109/tmi.2021.3097665 article EN IEEE Transactions on Medical Imaging 2021-07-26

Several explanation methods such as Integrated Gradients (IG) can be characterised as path-based methods, as they rely on a straight line between the data and an uninformative baseline. However, when applied to language models, these methods produce a path for each word of a sentence simultaneously, which could lead to creating sentences from interpolated words either having no clear meaning, or having a significantly different meaning compared to the original sentence. In order to keep the meaning of these sentences as close as possible to the original one, we propose Sequential...

10.18653/v1/2023.findings-acl.477 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2023 2023-01-01

The modelling of Electronic Health Records (EHRs) has the potential to drive more efficient allocation of healthcare resources, enabling early intervention strategies and advancing personalised healthcare. However, EHRs are challenging to model due to their realisation as noisy, multi-modal data occurring at irregular time intervals. To address their temporal nature, we treat EHRs as samples generated by a Temporal Point Process (TPP), enabling us to model what happened in an event together with when it happened in a principled way. We gather and propose...

10.48550/arxiv.2007.13794 preprint EN other-oa arXiv (Cornell University) 2020-01-01
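
To make the idea of jointly modelling what happened and when more concrete, here is a minimal sketch of the log-likelihood of a marked temporal point process. It assumes the simplest possible case, a constant event rate with marks drawn independently from a fixed categorical distribution, which is far simpler than anything the paper above is likely to use. The function name, the event labels, and all parameter values are hypothetical.

```python
import math
from typing import Dict, Sequence


def marked_poisson_log_likelihood(
    times: Sequence[float],
    marks: Sequence[str],
    rate: float,
    mark_probs: Dict[str, float],
    horizon: float,
) -> float:
    """Log-likelihood of marked events on [0, horizon] under a constant rate.

    Each event contributes log(rate) for *when* it happened and
    log(mark_probs[mark]) for *what* happened; the term -rate * horizon
    accounts for the time during which no event occurred.
    """
    if len(times) != len(marks):
        raise ValueError("times and marks must have the same length")
    log_lik = -rate * horizon
    for t, mark in zip(times, marks):
        if not 0.0 <= t <= horizon:
            raise ValueError("event time outside the observation window")
        log_lik += math.log(rate) + math.log(mark_probs[mark])
    return log_lik


# Hypothetical record: three events at irregular times, two event types.
event_times = [0.5, 1.2, 3.0]
event_marks = ["visit", "test", "visit"]
log_lik = marked_poisson_log_likelihood(
    event_times,
    event_marks,
    rate=1.0,
    mark_probs={"visit": 0.5, "test": 0.5},
    horizon=4.0,
)
```

A learned model would replace the constant rate and fixed mark probabilities with functions of the event history, but the likelihood keeps the same two-part structure.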

We introduce $\texttt{time_interpret}$, a library designed as an extension of Captum, with a specific focus on temporal data. As such, this library implements several feature attribution methods that can be used to explain predictions made by any PyTorch model. $\texttt{time_interpret}$ also provides several synthetic and real world time series datasets, various PyTorch models, as well as a set of methods to evaluate feature attributions. Moreover, while being primarily developed to explain predictions based on temporal data, some of its components have a different application,...

10.48550/arxiv.2306.02968 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The choice of sentence encoder architecture reflects assumptions about how a sentence's meaning is composed from its constituent words. We examine the contribution of these architectures by holding them randomly initialised and fixed, effectively treating them as hand-crafted language priors, and evaluating the resulting sentence encoders on downstream language tasks. We find that even when encoders are presented with additional information that can be used to solve tasks, the corresponding priors do not leverage this information, except in an...

10.48550/arxiv.1910.03492 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Explaining predictions based on multivariate time series data carries the additional difficulty of handling not only multiple features, but also time dependencies. It matters not only what happened, but also when, and the same feature could have a very different impact on a prediction depending on this time information. Previous work has used perturbation-based saliency methods to tackle this issue, perturbing an input using a trainable mask to discover which features at which times are driving the predictions. However, these methods introduce fixed...

10.48550/arxiv.2305.18840 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Several explanation methods such as Integrated Gradients (IG) can be characterised as path-based methods, as they rely on a straight line between the data and an uninformative baseline. However, when applied to language models, these methods produce a path for each word of a sentence simultaneously, which could lead to creating sentences from interpolated words either having no clear meaning, or having a significantly different meaning compared to the original sentence. In order to keep the meaning of these sentences as close as possible to the original one, we propose Sequential...

10.48550/arxiv.2305.15853 preprint EN cc-by arXiv (Cornell University) 2023-01-01