Steve Nyemba

ORCID: 0009-0009-5377-0037
Research Areas
  • Machine Learning in Healthcare
  • Electronic Health Records Systems
  • Artificial Intelligence in Healthcare
  • Artificial Intelligence in Healthcare and Education
  • AI in cancer detection
  • Ethics in Clinical Research
  • Privacy-Preserving Technologies in Data
  • Data Quality and Management
  • Network Security and Intrusion Detection
  • Scientific Computing and Data Management
  • Information and Cyber Security
  • Complex Network Analysis Techniques
  • Software Engineering Research
  • Privacy, Security, and Data Protection
  • Emergency and Acute Care Studies
  • Autopsy Techniques and Outcomes
  • Generative Adversarial Networks and Image Synthesis
  • Research Data Management Practices
  • Chronic Disease Management Strategies
  • Data-Driven Disease Surveillance
  • Digital Imaging in Medicine
  • Semantic Web and Ontologies
  • COVID-19 diagnosis using AI
  • User Authentication and Security Systems
  • Trade Secret Protection Methods

Vanderbilt University Medical Center
2019-2024

Vanderbilt University
2011-2020

Collaborative information systems (CISs) are deployed within a diverse array of environments that manage sensitive information. Current security mechanisms detect insider threats, but they are ill-suited to monitor systems in which users function in dynamic teams. In this paper, we introduce the community anomaly detection system (CADS), an unsupervised learning framework to detect insider threats based on the access logs of collaborative environments. The framework is based on the observation that typical CIS users tend to form community structures based on the subjects accessed (e.g.,...

10.1109/tdsc.2012.11 article EN IEEE Transactions on Dependable and Secure Computing 2012-01-17

10.1016/j.jbi.2011.01.007 article EN publisher-specific-oa Journal of Biomedical Informatics 2011-01-27

Collaborative information systems (CIS) enable users to coordinate efficiently over shared tasks in complex distributed environments. For flexibility, they provide users with broad access privileges, which, as a side effect, leave such systems vulnerable to various attacks. Some of the more damaging malicious activities stem from internal misuse, where users are authorized to access system resources. A promising class of insider threat detection models for CIS focuses on mining access patterns from audit logs; however, current models are limited...

10.1186/2190-8532-1-5 article EN cc-by Security Informatics 2012-02-27

Sharing electronic health records (EHRs) on a large scale may lead to privacy intrusions. Recent research has shown that risks may be mitigated by simulating EHRs through generative adversarial network (GAN) frameworks. Yet the methods developed to date are limited because they 1) focus on generating data of a single type (e.g., diagnosis codes), neglecting other data types (e.g., demographics, procedures or vital signs) and 2) do not represent constraints between features. In this paper, we introduce a method...

10.48550/arxiv.2003.07904 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Objective: The All of Us Research Program makes individual-level data available to researchers while protecting the participants’ privacy. This article describes the protections embedded in the multistep access process, with a particular focus on how the data was transformed to meet generally accepted re-identification risk levels. Methods: At the time of the study, the resource consisted of 329 084 participants. Systematic amendments were applied to the data to mitigate re-identification risk (eg, generalization of geographic regions, suppression of public events,...

10.1093/jamia/ocad021 article EN Journal of the American Medical Informatics Association 2023-02-21

Collaborative information systems (CIS) enable users to coordinate efficiently over shared tasks. They are often deployed in complex dynamic systems that provide users with broad access privileges, but also leave the system vulnerable to various attacks. Techniques to detect threats originating from beyond the system are relatively mature, but methods to detect insider threats are still evolving. A promising class of insider threat detection models for CIS focus on the communities that manifest between users based on the usage of common subjects in the system. However, current methods detect only when a...

10.1109/isi.2011.5984061 article EN 2011-07-01

Re-identification risk methods for biomedical data often assume a worst case, in which attackers know all identifiable features (eg, age and race) about a subject. Yet, worst-case adversarial modeling can overestimate risk and induce heavy editing of shared data. The objective of this study is to introduce a framework for assessing the risk considering the attacker's resources and capabilities. We integrate 3 established risk measures (ie, prosecutor, journalist, and marketer risks) and compute re-identification probabilities for data subjects....

10.1093/jamia/ocaa327 article EN Journal of the American Medical Informatics Association 2020-12-09

Synthetic electronic health record (EHR) data generation has been increasingly recognized as an important solution to expand the accessibility and maximize the value of private health data on a large scale. Recent advances in machine learning have facilitated more accurate modeling for complex and high-dimensional data, thereby greatly enhancing the data quality of synthetic EHR data. Among various approaches, generative adversarial networks (GANs) have become the main technical path in the literature due to their ability to capture the statistical...

10.2196/52615 article EN cc-by JMIR AI 2024-03-07

Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus. We modeled a scenario where an institution (the defender) externally shared an 800-note corpus of actual outpatient clinical encounter notes from...

10.1093/jamia/ocz114 article EN Journal of the American Medical Informatics Association 2019-06-13

Objective: Deep learning models for clinical event forecasting (CEF) based on a patient’s medical history have improved significantly over the past decade. However, their transition into practice has been limited, particularly for diseases with very low prevalence. In this paper, we introduce CEF-CL, a novel method based on contrastive learning to forecast in the face of a limited number of positive training instances. Materials and Methods: CEF-CL consists of two primary components: (1) unsupervised contrastive learning for patient...

10.1093/jamia/ocac086 article EN Journal of the American Medical Informatics Association 2022-05-31

Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII. Using 2000 representative clinical documents from 2 healthcare settings (4000 total), we used a novel...

10.1093/jamia/ocaa095 article EN Journal of the American Medical Informatics Association 2020-05-26

Deep learning architectures have an extremely high capacity for modeling complex data in a wide variety of domains. However, these architectures have been limited in their ability to support complex prediction problems using insurance claims data, such as readmission at 30 days, mainly due to a data sparsity issue. Consequently, classical machine learning methods, especially those that embed domain knowledge in handcrafted features, are often on par with, and sometimes outperform, deep learning approaches. In this paper, we illustrate how the...

10.48550/arxiv.2104.04377 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Synthetic electronic health record (EHR) data generation has been increasingly recognized as an important solution to expand the accessibility and maximize the value of private health data on a large scale. Recent advances in machine learning have facilitated more accurate modeling for complex and high-dimensional data, thereby greatly enhancing the data quality of synthetic EHR data. Among various approaches, generative adversarial networks (GANs) have become the main technical path in the literature due...

10.2196/preprints.52615 preprint EN 2023-09-10

Artificial intelligence, and particularly machine learning (ML), is increasingly developed and deployed to support healthcare in a variety of settings. However, clinical decision support (CDS) technologies based on ML need to be portable if they are to be adopted on a broad scale. In this respect, models developed at one institution should be reusable at another. Yet there are numerous examples of portability failure, particularly due to naive application of ML models. Portability failure can lead to suboptimal care and medical errors, which ultimately could prevent...

10.48550/arxiv.2207.02445 preprint EN cc-by arXiv (Cornell University) 2022-01-01