Stephen T. Casper

ORCID: 0000-0003-2915-8592
Research Areas
  • Neurology and Historical Studies
  • Historical Psychiatry and Medical Practices
  • History of Medical Practice
  • Adversarial Robustness in Machine Learning
  • History of Science and Medicine
  • History of Medicine Studies
  • Explainable Artificial Intelligence (XAI)
  • Mental Health and Psychiatry
  • Canadian Identity and History
  • Neural Networks and Applications
  • Historical Gender and Feminism Studies
  • Medical History and Innovations
  • Traumatic Brain Injury Research
  • Topic Modeling
  • Ethics and Social Impacts of AI
  • Empathy and Medical Education
  • Machine Learning and Data Classification
  • Neuroethics, Human Enhancement, Biomedical Innovations
  • Natural Language Processing Techniques
  • Historical and Cultural Archaeology Studies
  • Diverse Historical and Scientific Studies
  • Australian Indigenous Culture and History
  • Anomaly Detection Techniques and Applications
  • Fault Detection and Control Systems
  • Artificial Intelligence in Healthcare and Education

Clarkson University
2014-2023

Icahn School of Medicine at Mount Sinai
2021

Australian National University
2021

University of Colorado Anschutz Medical Campus
2021

Muhlenberg College
2021

Royal Prince Alfred Hospital
2021

Harvard University
2020-2021

Harvard University Press
2021

Center for Pain and the Brain
2021

Boston Children's Hospital
2021

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure...

10.48550/arxiv.2307.15217 preprint EN cc-by arXiv (Cornell University) 2023-01-01
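
For readers unfamiliar with the pipeline this survey critiques, the sketch below illustrates RLHF's first stage: fitting a reward model to pairwise human preferences with a Bradley-Terry loss, which then guides policy optimization (omitted here). The toy architecture, data, and names (RewardModel, preference_loss) are illustrative assumptions, not taken from the paper.

# Minimal sketch of RLHF's reward-modeling stage (illustrative, not the paper's method).
# A reward model is fit to pairwise human preferences with a Bradley-Terry loss;
# the learned reward then guides policy optimization (e.g., PPO), omitted here.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy stand-in for an LLM with a scalar reward head."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU())
        self.head = nn.Linear(128, 1)  # scalar reward per response

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x)).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

if __name__ == "__main__":
    model = RewardModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Hypothetical batch: feature vectors for human-preferred vs. dispreferred responses.
    chosen, rejected = torch.randn(32, 64), torch.randn(32, 64)
    for _ in range(100):
        loss = preference_loss(model(chosen), model(rejected))
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"final preference loss: {loss.item():.4f}")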

The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify problems, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus...

10.1109/satml54575.2023.00039 article EN 2023-02-01

The risks posed by Artificial Intelligence (AI) are of considerable concern to academics, auditors, policymakers, AI companies, and the public. However, a lack of shared understanding of AI risks can impede our ability to comprehensively discuss, research, and react to them. This paper addresses this gap by creating an AI Risk Repository to serve as a common frame of reference. This comprises a living database of 777 risks extracted from 43 taxonomies, which can be filtered based on two overarching taxonomies and easily accessed, modified, and updated via...

10.70777/agi.v1i1.10881 article EN 2024-09-23

In late October 1977, Ali Maow Maalin was admitted to hospital in Merca, a port town in southern Somalia, where he would unhappily come to occupy a place in history as the last known case of smallpox. While Maalin, unlike most victims of this dreadful disease, survived, his infection could only have occurred through transmission from another human being, and thus represents a remarkable bookend to the long and deadly history of the pox's person-to-person transmission. From 1967, D. A. Henderson headed up the World Health Organization's...

10.1093/jhmas/jrq076 article EN Journal of the History of Medicine and Allied Sciences 2010-12-01

This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.

10.48550/arxiv.2404.09932 preprint EN arXiv (Cornell University) 2024-04-15

As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research, and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which have been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify...

10.48550/arxiv.2501.04952 preprint EN arXiv (Cornell University) 2025-01-08

Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over AI system behavior and to shed light on exciting questions about the nature of intelligence. Despite recent progress toward these goals, there are many open problems in the field that require solutions before practical benefits can be realized: Our methods require both conceptual...

10.48550/arxiv.2501.16496 preprint EN arXiv (Cornell University) 2025-01-27

The first International AI Safety Report comprehensively synthesizes the current evidence on the capabilities, risks, and safety of advanced AI systems. The report was mandated by the nations attending the AI Safety Summit in Bletchley, UK. Thirty nations, the UN, the OECD, and the EU each nominated a representative to the report's Expert Advisory Panel. A total of 100 experts contributed, representing diverse perspectives and disciplines. Led by the report's Chair, these independent experts collectively had full discretion over the report's content.

10.48550/arxiv.2501.17805 preprint EN arXiv (Cornell University) 2025-01-29

Leading AI developers and startups are increasingly deploying agentic systems that can plan and execute complex tasks with limited human involvement. However, there is currently no structured framework for documenting the technical components, intended uses, and safety features of agentic systems. To fill this gap, we introduce the AI Agent Index, the first public database to document information about deployed agentic systems. For each system that meets the criteria for inclusion in the index, we document the system's components (e.g., base model, reasoning...

10.48550/arxiv.2502.01635 preprint EN arXiv (Cornell University) 2025-02-03

Evaluations of large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks. Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system. However, a fundamental limitation of this approach is that the harmfulness of the behaviors identified during any particular evaluation can only lower bound a model's worst-possible-case behavior. As a complementary method for eliciting harmful behaviors, we propose evaluating LLMs...

10.48550/arxiv.2502.05209 preprint EN arXiv (Cornell University) 2025-02-03

Nations across the world are working to govern AI. However, from a technical perspective, there is uncertainty and disagreement on the best way to do this. Meanwhile, recent debates over AI regulation have led to calls for "evidence-based policy" which emphasize holding regulatory action to a high evidentiary standard. Evidence is of irreplaceable value to policymaking. However, holding regulatory action to too high an evidentiary standard can lead to systematic neglect of certain risks. In historical policy debates (e.g., over tobacco ca. 1965 and fossil fuels ca. 1985), such rhetoric also...

10.48550/arxiv.2502.09618 preprint EN arXiv (Cornell University) 2025-02-13

Misaligned research objectives have considerably hindered progress in adversarial robustness over the past decade. For instance, an extensive focus on optimizing target metrics, while neglecting rigorous standardized evaluation, has led researchers to pursue ad-hoc heuristic defenses that were seemingly effective. Yet, most of these were exposed as flawed by subsequent evaluations, ultimately contributing little measurable progress to the field. In this position paper, we illustrate how current large language models...

10.48550/arxiv.2502.11910 preprint EN arXiv (Cornell University) 2025-02-17

We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities, while maintaining the integrity of essential knowledge generation and not affecting causally unrelated information. We envision LLM unlearning becoming a pivotal element in the life-cycle management of LLMs, potentially standing as an essential foundation for developing generative AI that is not only...

10.48550/arxiv.2402.08787 preprint EN arXiv (Cornell University) 2024-02-13
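
As context for the kind of method the paper examines, the sketch below shows one common unlearning baseline from the literature: gradient ascent on a "forget" set balanced against ordinary training on a "retain" set. The toy model, data, and the alpha trade-off are illustrative assumptions; the paper surveys and rethinks such approaches rather than prescribing this one.

# Illustrative sketch of a common unlearning baseline (not the paper's specific method):
# gradient ascent on a "forget" set, balanced by standard training on a "retain" set.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))  # toy classifier
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

# Hypothetical data: examples whose influence should be removed vs. preserved.
forget_x, forget_y = torch.randn(64, 16), torch.randint(0, 8, (64,))
retain_x, retain_y = torch.randn(256, 16), torch.randint(0, 8, (256,))

alpha = 0.5  # assumed trade-off between forgetting and retaining
for step in range(200):
    loss_forget = ce(model(forget_x), forget_y)   # want this to INCREASE
    loss_retain = ce(model(retain_x), retain_y)   # want this to stay low
    loss = -alpha * loss_forget + loss_retain     # ascent on forget, descent on retain
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"forget loss (higher = more forgotten): {loss_forget.item():.3f}")
print(f"retain loss (lower = knowledge kept):  {loss_retain.item():.3f}")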

Objective: To review the intellectual history of concussion from the mid-19th century to the opening decade of the 21st century. Background: Head injuries (HI) and their acute and long-term effects have been investigated for centuries, with major reviews of the topic appearing by 1870. Thus, while it has long been acknowledged that chronic traumatic encephalopathy was first described by Harrison Martland in 1928, an examination of research up to Martland's seminal report places his studies in a deeper historical context. This makes...

10.1111/head.13288 article EN Headache The Journal of Head and Face Pain 2018-03-14

The risks posed by Artificial Intelligence (AI) are of considerable concern to academics, auditors, policymakers, AI companies, and the public. However, a lack of shared understanding of AI risks can impede our ability to comprehensively discuss, research, and react to them. This paper addresses this gap by creating an AI Risk Repository to serve as a common frame of reference. This comprises a living database of 777 risks extracted from 43 taxonomies, which can be filtered based on two overarching taxonomies and easily accessed, modified, and updated via...

10.48550/arxiv.2408.12622 preprint EN arXiv (Cornell University) 2024-08-14

Large language models (LLMs) can often be made to behave in undesirable ways that they are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a wide variety of 'jailbreaking' techniques to elicit harmful text from models that were fine-tuned to be harmless. Recent work on red-teaming, model editing, and interpretability suggests that this challenge stems from how (adversarial) fine-tuning largely serves to suppress rather than remove undesirable capabilities from LLMs. Prior work has introduced latent adversarial training...

10.48550/arxiv.2407.15549 preprint EN arXiv (Cornell University) 2024-07-22
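
To make the idea concrete, the sketch below shows latent adversarial training on a toy classifier: an inner PGD loop perturbs hidden activations rather than inputs, and the outer loop trains the model to behave well under the worst-case latent perturbation. The architecture, data, and hyperparameters (eps, pgd_steps, pgd_lr) are assumptions for demonstration, not the paper's setup.

# Illustrative sketch of latent adversarial training (LAT) on a toy classifier:
# adversarial perturbations are applied to hidden activations rather than inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())   # input -> latent
head = nn.Linear(32, 4)                                  # latent -> logits
params = list(encoder.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
ce = nn.CrossEntropyLoss()

x, y = torch.randn(128, 16), torch.randint(0, 4, (128,))  # hypothetical batch
eps, pgd_steps, pgd_lr = 0.5, 5, 0.1                      # assumed hyperparameters

for step in range(200):
    h = encoder(x)
    # Inner loop: PGD in latent space, finding a perturbation that maximizes the loss.
    delta = torch.zeros_like(h, requires_grad=True)
    for _ in range(pgd_steps):
        adv_loss = ce(head(h.detach() + delta), y)
        grad, = torch.autograd.grad(adv_loss, delta)
        delta = (delta + pgd_lr * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    # Outer loop: train the model under the worst-case latent perturbation found.
    loss = ce(head(h + delta.detach()), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"robust training loss: {loss.item():.3f}")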

The attitudes that characterize the contemporary “neuro-turn” were strikingly commonplace as part of the self-fashioning of social identity in the biographies and personal papers of past neurologists and neuroscientists. Indeed, one fundamental connection between nineteenth- and twentieth-century neurology and neuroscience appears to be the value workers in both domains attach to the idea of integration, a vision of a neural science and medicine that connected reductionist work to broader inquiries about mind, brain, and human nature, and in so doing supposedly...

10.1086/675554 article EN Isis 2014-03-01

Five international consensus statements on concussion in sports have been published. This commentary argues that there is a strong need for a new approach to them that foregrounds public health expertise and patient-centered guidance. Doing so will help players, parents, and practitioners keep perspective about these potentially life-altering injuries, especially when they recur.

10.1017/jme.2021.56 article EN The Journal of Law Medicine & Ethics 2021-01-01

Deploying large language models (LMs) can pose hazards from harmful outputs such as toxic or false text. Prior work has introduced automated tools that elicit harmful outputs to identify these risks. While this is a valuable step toward securing models, these approaches rely on a pre-existing way to efficiently classify undesirable outputs. Using a pre-existing classifier does not allow for red-teaming to be tailored to the target model. Furthermore, when failures can be easily classified in advance, red-teaming has limited marginal value because problems...

10.48550/arxiv.2306.09442 preprint EN cc-by arXiv (Cornell University) 2023-01-01
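
The sketch below illustrates the classifier-based baseline the abstract contrasts itself with: sample candidate prompts, score the target model's outputs with a pre-existing harmfulness classifier, and keep the prompts that score worst. The target_model and harm_classifier functions are toy stubs and the prompt pool is hypothetical; the paper's point is that relying on such a pre-existing classifier limits how well red-teaming can be tailored to the target model.

# Minimal sketch of classifier-based automated red-teaming (the baseline approach).
# The target model and classifier below are toy stubs, not real systems.
import random

random.seed(0)

def target_model(prompt: str) -> str:
    """Stub standing in for the LLM under test."""
    return f"response to: {prompt}"

def harm_classifier(text: str) -> float:
    """Stub scorer in [0, 1]; a real pipeline would use a trained harmfulness classifier."""
    return random.random()

candidate_prompts = [f"candidate prompt #{i}" for i in range(500)]  # hypothetical pool

# Score every candidate by how harmful the target model's response looks to the classifier.
scored = [(harm_classifier(target_model(p)), p) for p in candidate_prompts]
scored.sort(reverse=True)
red_team_set = [p for score, p in scored[:10]]  # prompts most likely to elicit failures

for p in red_team_set:
    print(p)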

The Minnesota Multiphasic Personality Inventory (MMPI) was developed at the University of Minnesota, Minneapolis, in the 1930s and 1940s. It became a highly successful and controversial psychometric tool. In professional terms, tools such as the MMPI transformed psychology and psychiatry. Psychometric instruments thus readily fit into the developmental history of psychology, psychiatry, and neurology; they were a significant part of the narrative of those fields' advances in understanding, intervening in, and treating people with mental...

10.1017/s0269889714000337 article EN Science in Context 2015-02-09

AI systems sometimes exhibit harmful unintended behaviors post-deployment. This is often despite extensive diagnostics and debugging by developers. Minimizing risks from models is challenging because the attack surface is so large. It is not tractable to exhaustively search for inputs that may cause a model to fail. Red-teaming and adversarial training (AT) are commonly used to make models more robust. However, they have not been sufficient to avoid many real-world failure modes that differ from the ones adversarially trained on. In this...

10.48550/arxiv.2403.05030 preprint EN arXiv (Cornell University) 2024-03-07

This is the interim publication of the first International Scientific Report on the Safety of Advanced AI. The report synthesises the scientific understanding of general-purpose AI -- AI that can perform a wide variety of tasks -- with a focus on understanding and managing its risks. A diverse group of 75 experts contributed to this report, including an international Expert Advisory Panel nominated by 30 countries, the EU, and the UN. Led by the Chair, these independent experts collectively had full discretion over the report's content.

10.48550/arxiv.2412.05282 preprint EN arXiv (Cornell University) 2024-11-05