- Biomedical Text Mining and Ontologies
- Topic Modeling
- Software System Performance and Reliability
- Machine Learning in Healthcare
- Software Engineering Techniques and Practices
- Artificial Intelligence in Healthcare and Education
- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Natural Language Processing Techniques
- Software Engineering Research
- Cloud Computing and Resource Management
- Artificial Intelligence in Healthcare
- Advanced Software Engineering Methodologies
- Interconnection Networks and Systems
- Software Testing and Debugging Techniques
- Semantic Web and Ontologies
- Pharmacovigilance and Adverse Drug Reactions
- Service-Oriented Architecture and Web Services
- Advanced Database Systems and Queries
- Food Security and Health in Diverse Populations
- Information Technology Governance and Strategy
- Misinformation and Its Impacts
- Model-Driven Software Engineering Techniques
- Advanced Computational Techniques and Applications
- Privacy-Preserving Technologies in Data
John Snow (United States)
2020-2024
Hebrew University of Jerusalem
1999-2009
United States Air Force
2006
Medical artificial intelligence (AI) has tremendous potential to advance healthcare by supporting and contributing to the evidence-based practice of medicine, personalizing patient treatment, reducing costs, and improving provider experience. Unlocking this potential requires systematic, quantitative evaluation of the performance of medical AI models on large-scale, heterogeneous data capturing diverse populations. Here, to meet this need, we introduce MedPerf, an open platform for benchmarking in the medical domain...
Distributed memory parallel systems such as the IBM SP2 execute jobs using variable partitioning. Scheduling in FCFS order leads to severe fragmentation and utilization loss, which led to the development of backfilling schedulers such as EASY. This paper presents a scheduler that improves on EASY in two ways: it supports both user-selected and administrative priorities, and it guarantees a bounded wait time for all jobs. The scheduler gives each waiting job a slack, which determines how long it may have to wait before running: 'important' and 'heavy'...
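The scheduler above builds on EASY backfilling. As a hedged illustration of that baseline only (not the paper's slack-based scheduler, whose details are truncated here), the EASY admission test can be sketched as follows. The `Job` class, the `(end_time, nodes)` representation of running jobs, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    nodes: int      # number of processors requested
    runtime: float  # user-supplied runtime estimate


def shadow_time(head: Job, free_nodes: int,
                running: List[Tuple[float, int]], now: float) -> Tuple[float, int]:
    """Return (time the head job can start, nodes left over at that time)."""
    if free_nodes >= head.nodes:
        return now, free_nodes - head.nodes
    avail = free_nodes
    # running: (expected end time, nodes held) for each executing job
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            return end, avail - head.nodes
    raise ValueError("head job requests more nodes than the machine has")


def can_backfill(job: Job, head: Job, free_nodes: int,
                 running: List[Tuple[float, int]], now: float) -> bool:
    """EASY rule: start `job` ahead of `head` only if it cannot delay head's reservation."""
    if job.nodes > free_nodes:
        return False
    shadow, extra = shadow_time(head, free_nodes, running, now)
    # Either it ends before the reservation, or it only uses nodes the head job will not need.
    return now + job.runtime <= shadow or job.nodes <= extra
```

For example, with 2 free nodes, one running job holding 4 nodes until t=10, and a head job needing 5 nodes, a 2-node job estimated at 5 time units may backfill, while a 2-node job estimated at 20 units may not, because it would still hold nodes the head job needs at t=10.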
Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant and accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192 languages. It supports nearly all NLP tasks, and its modules can be used seamlessly in a cluster. Downloaded 2.7 million times and experiencing 9x growth since January 2020, it is used by 54% of healthcare organizations as the world's most widely used NLP library in the enterprise.
We introduce an agile, production-grade clinical and biomedical named entity recognition (NER) algorithm based on a modified BiLSTM-CNN-Char deep learning (DL) architecture built on top of Apache Spark. Our NER implementation establishes new state-of-the-art accuracy on 7 of 8 well-known benchmarks and 3 concept extraction challenges: 2010 i2b2/VA concept extraction, 2014 n2c2 de-identification, and 2018 medication extraction. Moreover, models trained using this implementation outperform the commercial solutions AWS Medical Comprehend and Google...
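Whatever the network architecture, a token-level NER tagger like the one described above needs a step that turns per-token labels into entity spans. A minimal sketch, assuming the common BIO tagging scheme (an assumption; the abstract does not state the scheme) and hypothetical labels such as DRUG, DOSAGE and FREQ:

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (label, text) entity chunks."""
    entities: List[Tuple[str, str]] = []
    label: Optional[str] = None
    start = 0

    def flush(end: int) -> None:
        # Emit the currently open chunk, if any, covering tokens[start:end].
        if label is not None:
            entities.append((label, " ".join(tokens[start:end])))

    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new chunk starts; an I- tag with no matching open chunk is treated as B-.
            flush(i)
            label, start = tag[2:], i
        elif tag == "O":
            flush(i)
            label = None
        # Otherwise: an I- tag continuing the open chunk, nothing to do yet.
    flush(len(tags))
    return entities
```

Treating an orphan `I-` tag as the start of a new chunk is a lenient design choice; a strict decoder could drop it instead.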
The use of natural language processing (NLP) models, including the more recent large language models (LLMs), in real-world applications has obtained relevant success in past years. To measure the performance of these systems, traditional metrics such as accuracy, precision, recall, and F1-score are used. Although performance in those terms is important, it is often necessary to carry out a holistic evaluation that considers other aspects such as robustness, bias, toxicity, fairness, safety, efficiency, clinical relevance, security, representation,...
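For reference, the traditional metrics listed above can be computed from confusion-matrix counts. This is the standard textbook definition, not anything specific to the paper, and the function name is illustrative:

```python
from typing import Tuple


def classification_metrics(tp: int, fp: int, fn: int, tn: int = 0) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

None of the other aspects the abstract names (robustness, bias, toxicity, and so on) can be read off these four counts, which is the motivation for a broader evaluation.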
Agile software development in general, and Extreme Programming (XP) in particular, promote radical changes in how organizations traditionally work. We present and analyze new data from a real, large-scale agile project to develop a business-critical enterprise information system for the Israeli Air Force (IAF). Our results offer evidence that testing practices actually work, dramatically improving quality and productivity. We describe the organization's successful guidelines in four key areas: test design activity...
BACKGROUND: The proliferation of both general-purpose and healthcare-specific Large Language Models (LLMs) has intensified the challenge of effectively evaluating and comparing them. Data contamination plagues the validity of public benchmarks, self-preference distorts LLM-as-a-judge approaches, and there is a gap between the tasks used to test models and those in clinical practice. OBJECTIVE: In response, we propose CLEVER: a methodology for blind, randomized, preference-based...
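The CLEVER methodology is truncated here, but the general idea of a blind, randomized, preference-based comparison can be sketched as below. This is an assumption-laden illustration, not the paper's protocol: the `judge` callable stands in for a human reviewer, and all names are hypothetical.

```python
import random
from typing import Callable, Dict, List

Model = Callable[[str], str]
# A judge sees a prompt and two anonymous answers, and returns 0 (first) or 1 (second).
Judge = Callable[[str, str, str], int]


def blind_pairwise_eval(prompts: List[str], model_a: Model, model_b: Model,
                        judge: Judge, seed: int = 0) -> Dict[str, float]:
    """Return each model's win rate under blinded, randomly ordered pairwise comparison."""
    rng = random.Random(seed)
    wins = {"model_a": 0, "model_b": 0}
    for prompt in prompts:
        pair = [("model_a", model_a(prompt)), ("model_b", model_b(prompt))]
        rng.shuffle(pair)  # randomize presentation order to avoid systematic position bias
        choice = judge(prompt, pair[0][1], pair[1][1])  # judge never sees model names
        wins[pair[choice][0]] += 1  # unblind only after the judgment is recorded
    n = max(len(prompts), 1)
    return {name: count / n for name, count in wins.items()}
```

Keeping model identities out of the judge's view is what addresses the self-preference problem mentioned in the background.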
Effective governance of agile software teams is challenging but required to enable wide adoption of agile methodologies, in particular for large-scale projects. In this paper we apply a full lifecycle governance model to agile projects, focused on the iteration level. The concept is demonstrated via a case study of a large-scale, enterprise-critical project that implemented agile practices. We analyze three events, including the metrics that triggered each event, the decisions taken, and the follow-up to ensure resolution. We conclude that iterations can be naturally...
It is a significant challenge to implement and research agile software development methods in organizations such as the army. Since the army differs from industry and academia, data gathered in the army and its continuous analysis may enrich the community's knowledge about agile methods. This work describes research, conducted during an entire release, about one team at the Israeli Air Force that works according to Extreme Programming. The establishment of this investigation in the first release is part of a long-term process, which started last year,...
The scheduler is a key component in determining the overall performance of a parallel computer, and as we show here, the schedulers in wide use today exhibit large unexplained gaps during their operation. Also, different scheduling algorithms often vary in the gaps they show, suggesting that choosing the correct scheduler for each time frame can improve performance. We present two adaptive algorithms that achieve this: one chooses by recent past performance, and the other by the average degree of parallelism, which is shown to be correlated to algorithmic...
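The first adaptive idea, choosing by recent past performance, can be sketched as a selector over a sliding window. Assumptions, none taken from the abstract: every candidate's score for a past frame is available (for instance by simulating each scheduler on the recent workload), lower scores are better, and the class name and window size are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Deque, Dict, Iterable


class RecentPerformanceSelector:
    """Pick the scheduler whose mean score over the last `window` frames is lowest."""

    def __init__(self, names: Iterable[str], window: int = 3):
        # One bounded history per candidate; old frames fall off automatically.
        self.history: Dict[str, Deque[float]] = {n: deque(maxlen=window) for n in names}

    def record(self, name: str, score: float) -> None:
        # score: e.g. mean slowdown in the frame had this scheduler been used (lower is better)
        self.history[name].append(score)

    def choose(self) -> str:
        observed = {n: mean(h) for n, h in self.history.items() if h}
        if not observed:
            return next(iter(self.history))  # no data yet: fall back to the first scheduler
        return min(observed, key=observed.get)
```

A short window reacts quickly to workload changes but is noisier; a longer window is more stable but slower to switch.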
This paper analyzes the reflections of an agile team developing a large-scale project in an industry setting. The team uses the iteration summary meeting practice, which includes four elements: the customer's summary, a formal presentation of the system, a review of metrics, and reflection. The technique for the entire meeting, and for the reflection element in particular, is described, and empirical evidence is given to show that it is assessed as highly effective, achieving its intended goals and increasing satisfaction. Further, the proposed practice supports...
Learning useful and predictable features from past workloads and exploiting them well is a major source of improvement in many operating system problems. We review known parallel workload features and argue that the correct approach for future on-line algorithm design is user- and session-based modeling, instead of analyzing jobs directly as is done today. We then provide statistically sound answers to two basic questions: which user and session features are central enough to be potentially useful, answered using...
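One common way to operationalize session-based modeling is to cut each user's stream of job submissions into sessions at long inactivity gaps. A minimal sketch, assuming a fixed gap threshold (its value, and the function name, are illustrative and not taken from the paper):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_sessions(submissions: Iterable[Tuple[str, float]],
                   max_gap: float) -> Dict[str, List[List[float]]]:
    """Group (user, submit_time) records into per-user sessions.

    A new session starts whenever the gap since the same user's previous
    submission exceeds `max_gap` (the think-time threshold).
    """
    by_user: Dict[str, List[float]] = defaultdict(list)
    for user, t in submissions:
        by_user[user].append(t)
    sessions: Dict[str, List[List[float]]] = {}
    for user, times in by_user.items():
        user_sessions: List[List[float]] = []
        for t in sorted(times):
            if user_sessions and t - user_sessions[-1][-1] <= max_gap:
                user_sessions[-1].append(t)  # still within the current session
            else:
                user_sessions.append([t])    # gap too long: open a new session
        sessions[user] = user_sessions
    return sessions
```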
We present a multivariate analysis technique called Co-Plot that is especially suitable for few samples of many variables. Co-Plot embeds the multidimensional data in two dimensions in a way that allows key variables to be identified, and relations between both variables and observations to be analyzed together. When Co-Plot is applied to workloads on parallel supercomputers, we find stable perpendicular axes of highly correlated variables, one representing individual job attributes and the other multijob attributes. The different workloads, on the other hand, are...
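A rough NumPy sketch of the two stages described above: embed the observations in two dimensions from city-block distances between standardized rows, then place one arrow per variable in the direction along which the embedded points correlate best with that variable. This is an approximation under stated assumptions: it substitutes classical (metric) MDS for the nonmetric scaling step of the published method, assumes no constant columns, and the function name is hypothetical.

```python
import numpy as np


def coplot(X):
    """Embed observations (rows) in 2-D and fit one arrow per variable (column).

    Returns (coords, arrows): coords has shape (n, 2); arrows is a list of
    (unit_direction, correlation) pairs, one per variable.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize; assumes no constant column
    # City-block distances between observations.
    D = np.abs(Z[:, None, :] - Z[None, :, :]).sum(axis=-1)
    # Classical (metric) MDS via double centering, standing in for nonmetric scaling.
    n = len(Z)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:2]
    coords = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
    C = coords - coords.mean(axis=0)
    arrows = []
    for j in range(Z.shape[1]):
        y = Z[:, j]
        # The projection direction maximizing correlation with y is proportional
        # to the least-squares coefficients of y regressed on the 2-D coordinates.
        beta, *_ = np.linalg.lstsq(C, y - y.mean(), rcond=None)
        direction = beta / np.linalg.norm(beta)
        corr = float(np.corrcoef(C @ direction, y)[0, 1])
        arrows.append((direction, corr))
    return coords, arrows
```

Variables whose arrows point the same way are positively correlated across observations; perpendicular arrows, as in the two axes the abstract reports, indicate roughly uncorrelated groups.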
As Electronic Health Records (EHR) become ubiquitous in healthcare systems worldwide, including in Arabic-speaking countries, the dual imperative of safeguarding patient privacy and leveraging data for research and quality improvement grows. This paper presents a first-of-its-kind automated de-identification pipeline for medical text specifically tailored to the Arabic language. The pipeline includes accurate Named Entity Recognition (NER) for identifying personal information; obfuscation models to replace sensitive entities...
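The obfuscation stage can be illustrated by a simple rule-based surrogate substitution that runs after NER has produced character spans. This is a sketch under assumptions, not the paper's obfuscation models: spans are taken as given and non-overlapping, the surrogate lists are hypothetical, and the example uses English text for readability (Python string slicing works the same way on Arabic text).

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def obfuscate(text: str, spans: List[Span], surrogates: Dict[str, List[str]]) -> str:
    """Replace each detected entity with a consistent surrogate, or a <LABEL> mask."""
    spans = sorted(spans)  # assumes spans do not overlap
    mapping: Dict[Tuple[str, str], str] = {}
    used: Dict[str, int] = {}
    # Pass 1 (left to right): the same original string always gets the same surrogate.
    for start, end, label in spans:
        key = (label, text[start:end])
        if key not in mapping:
            pool = surrogates.get(label, [])
            i = used.get(label, 0)
            # Wraps around if the pool is exhausted, so distinct entities may then share a surrogate.
            mapping[key] = pool[i % len(pool)] if pool else f"<{label}>"
            used[label] = i + 1
    # Pass 2 (right to left): offsets of spans not yet replaced stay valid while splicing.
    for start, end, label in reversed(spans):
        text = text[:start] + mapping[(label, text[start:end])] + text[end:]
    return text
```

Keeping replacements consistent within a document preserves who-did-what relations for downstream research while hiding the real identities.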
As a global pandemic, the COVID-19 outbreak received wide media attention. In this study, we analyze news publications from CNN and The Guardian, two of the world's most influential media organizations. The dataset includes more than 36,000 articles, analyzed using the clinical and biomedical Natural Language Processing (NLP) models from the Spark NLP for Healthcare library, which enables a deeper analysis of medical concepts than previously achieved. The analysis covers key entities and phrases, observed biases, and change over time in news coverage by...
We present a new automated software acceptance test framework. The framework is novel in supporting the entire test lifecycle and all QA activities, including test maintenance over multiple versions, interaction with programmers and business analysts, traceability to specifications, multi-user test cases, and more. This enables a significant increase in productivity and product quality. We compare our framework with other available tools, products and frameworks, and describe several patterns and anti-patterns for implementing a successful acceptance testing solution.
The surging amount of biomedical literature and digital clinical records presents a growing need for text mining techniques that can not only identify but also semantically relate entities in unstructured data. In this paper we propose a text mining framework comprising Named Entity Recognition (NER) and Relation Extraction (RE) models, which expands on previous work in three main ways. First, we introduce two new RE model architectures: an accuracy-optimized one based on BioBERT and a speed-optimized one utilizing crafted...
Agile software development methods mainly aim at increasing quality by fostering customer collaboration and performing exhaustive testing. The introduction of Extreme Programming (XP), the most common agile method, into an organization is accompanied by conceptual and organizational changes. These changes range from daily-life changes (e.g., sitting together and maintaining an informative project environment) and continue at the management level, for example meeting and listening to the customer during the whole process, and the team concept, which means that...
Following the global COVID-19 pandemic, the number of scientific papers studying the virus has grown massively, leading to increased interest in automated literature review. We present a clinical text mining system that improves on previous efforts in three ways. First, it can recognize over 100 different entity types, including social determinants of health, anatomy, risk factors, and adverse events, in addition to other commonly used biomedical entities. Second, the processing pipeline includes assertion status...
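The abstract is cut off before describing its assertion status component. To illustrate what assertion status means, here is a deliberately crude, NegEx-style window rule. It is a swapped-in technique, not the system's model, and the cue list, window size and function name are assumptions.

```python
import re
from typing import Sequence

# Hypothetical cue list; real assertion models cover many more cues and statuses.
NEGATION_CUES: Sequence[str] = ("no", "not", "denies", "without", "negative for")


def assertion_status(sentence: str, entity: str, window: int = 5) -> str:
    """Return 'absent' if a negation cue occurs within `window` words before the entity."""
    idx = sentence.lower().find(entity.lower())
    if idx < 0:
        raise ValueError(f"entity {entity!r} not found in sentence")
    # Up to `window` words immediately preceding the entity, punctuation stripped.
    preceding = re.findall(r"[a-z]+", sentence[:idx].lower())[-window:]
    context = " " + " ".join(preceding) + " "
    if any(f" {cue} " in context for cue in NEGATION_CUES):
        return "absent"
    return "present"
```

A rule like this ignores scope terminators such as "but" and statuses such as hypothetical or family history, which is why learned assertion models are used in practice.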
Social determinants of health (SDoH) significantly influence health outcomes, accounting for nearly 40% of such outcomes globally. These determinants, pivotal in understanding health disparities, are insufficiently documented in clinical settings and academic narratives. To address this gap, we examined case reports from PubMed (1975–2022) to identify mentions of six specific SDoH, employing a pre-trained named-entity recognition (NER) model from Spark natural language processing (NLP). Multivariate logistic...