Zac Hatfield-Dodds
- Topic Modeling
- Natural Language Processing Techniques
- Software Testing and Debugging Techniques
- Service-Oriented Architecture and Web Services
- Explainable Artificial Intelligence (XAI)
- Scientific Computing and Data Management
- Software Engineering Research
- Software System Performance and Reliability
- Software Reliability and Analysis Research
- Computational Physics and Python Applications
- Advanced Graph Neural Networks
- Text Readability and Simplification
- Reinforcement Learning in Robotics
- Neural Networks and Applications
- Multimodal Machine Learning Applications
- Ethics and Social Impacts of AI
- Human-Automation Interaction and Safety
- Astronomy and Astrophysical Research
- Particle Detector Development and Performance
- Gamma-ray bursts and supernovae
- Machine Learning in Bioinformatics
- Occupational Health and Safety Research
- Machine Learning and ELM
- Multi-Agent Systems and Negotiation
- Speech and dialogue systems
Australian National University
2019-2024
The Astropy Project supports and fosters the development of open-source and openly developed Python packages that provide commonly needed functionality to the astronomical community. A key element of the Project is the core package astropy, which serves as the foundation for more specialized projects and packages. In this article, we summarize key features in the recent major release, version 5.0, and provide updates on the Project. We then discuss supporting a broader ecosystem of interoperable packages, including connections with several...
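As a minimal illustration (not drawn from the paper itself) of the kind of commonly needed functionality the core package provides, units and coordinate transformations can be combined in a few lines:

    from astropy import units as u
    from astropy.coordinates import SkyCoord

    # Attach physical units to quantities and convert between them.
    speed = (3 * u.km / u.s).to(u.m / u.s)

    # Represent a sky position in ICRS and transform it to Galactic coordinates.
    m31 = SkyCoord(ra=10.684 * u.deg, dec=41.269 * u.deg, frame="icrs")
    print(speed, m31.galactic)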
We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations, and is fully compatible with training for specialized skills such as python coding and summarization. We explore an iterated online mode of training, where RL policies are updated on a weekly cadence with fresh data, efficiently improving our datasets and models. Finally, we investigate the robustness...
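As a rough sketch of the preference-modeling step in this kind of pipeline (a generic Bradley-Terry-style pairwise objective, not code from the paper), the preference model is trained to score the human-preferred response above the rejected one:

    import torch.nn.functional as F

    def preference_loss(reward_chosen, reward_rejected):
        # Pairwise comparison loss: push the scalar reward of the preferred
        # response above that of the rejected response.
        return -F.logsigmoid(reward_chosen - reward_rejected).mean()

The learned preference model then provides the reward signal that the RL policy is finetuned against.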
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune...
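A schematic of the supervised critique-and-revision phase might look like the sketch below; `generate` and the prompt templates are placeholders, not the paper's actual prompts:

    def critique_and_revise(generate, prompt, principles):
        # `generate(text)` stands in for sampling a completion from the model.
        response = generate(prompt)
        for principle in principles:
            critique = generate(
                f"{prompt}\n\nResponse: {response}\n\n"
                f"Critique the response according to this principle: {principle}"
            )
            response = generate(
                f"{prompt}\n\nResponse: {response}\n\nCritique: {critique}\n\n"
                "Rewrite the response to address the critique."
            )
        return prompt, response  # a (prompt, revised response) finetuning pair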
Large-scale pre-training has recently emerged as a technique for creating capable, general purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many others. In this paper, we highlight a counterintuitive property of such models and discuss the policy implications of this property. Namely, these models have an unusual combination of predictable loss on a broad training distribution (as embodied in their "scaling laws"), and unpredictable specific capabilities, inputs, and outputs. We believe that the high-level...
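The "predictable loss" half of that combination is usually expressed as a smooth power-law fit over model size; a toy sketch with illustrative numbers (not data from the paper) is:

    import numpy as np
    from scipy.optimize import curve_fit

    def scaling_law(n, a, alpha, c):
        # L(N) ~ a * N**(-alpha) + c : average loss falls smoothly with scale.
        return a * n ** (-alpha) + c

    n_params = np.array([1e6, 1e7, 1e8, 1e9])   # illustrative model sizes
    losses = np.array([4.2, 3.6, 3.1, 2.7])     # illustrative eval losses
    (a, alpha, c), _ = curve_fit(scaling_law, n_params, losses, p0=[20.0, 0.1, 1.0])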
Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Benjamin Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael...
Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest, and harmless. As an initial foray in this direction we study simple baseline techniques and evaluations, such as prompting. We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models. Next we investigate scaling trends for several training...
"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for hypothesis induction might constitute the mechanism majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing indices). We find develop precisely same point as sudden sharp increase in-context learning ability, visible bump training loss. six complementary...
We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use...
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture opinions on issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned...
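A plausible instantiation of such a metric (not necessarily the paper's exact choice) is one minus the Jensen-Shannon distance between the model's answer distribution and the human response distribution for a given country:

    from scipy.spatial.distance import jensenshannon

    def opinion_similarity(model_probs, human_probs):
        # Both arguments are probability distributions over a question's
        # answer options; higher values mean closer agreement.
        return 1.0 - jensenshannon(model_probs, human_probs)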
Property-based testing is a style of testing popularised by the QuickCheck family of libraries, first in Haskell (Claessen & Hughes, 2000) and later in Erlang (Arts, Johansson, & Wiger, 2006), which integrates generated test cases into existing software testing workflows: instead of tests that provide examples of a single concrete behaviour, developers specify properties that hold for a wide range of inputs, which the library then attempts to refute by generating counterexamples. For a general introduction to property-based testing, see (MacIver, 2019). Hypothesis is a mature and widely...
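A minimal Hypothesis property, illustrating the style described above:

    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(xs):
        # A property over every list of integers, rather than a handful of
        # hand-picked examples; Hypothesis searches for a counterexample.
        assert sorted(sorted(xs)) == sorted(xs)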
Neural networks often pack many unrelated concepts into a single neuron - a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging. This paper provides a toy model where polysemanticity can be fully understood, arising as a result of models storing additional sparse features in "superposition." We demonstrate the existence of a phase change, a surprising connection to the geometry of uniform polytopes, and evidence of a link to adversarial examples. We also discuss potential...
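A minimal sketch of the kind of toy autoencoder studied (sizes are illustrative): sparse features are squeezed through a small bottleneck and reconstructed through the transpose of the same weight matrix.

    import torch

    class ToyModel(torch.nn.Module):
        def __init__(self, n_features=20, n_hidden=5):
            super().__init__()
            self.W = torch.nn.Parameter(torch.randn(n_hidden, n_features) * 0.1)
            self.b = torch.nn.Parameter(torch.zeros(n_features))

        def forward(self, x):
            # x: (batch, n_features) of mostly-zero (sparse) feature activations.
            h = x @ self.W.T                        # compress into the bottleneck
            return torch.relu(h @ self.W + self.b)  # reconstruct the features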
Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. To understand if human preferences...
Recent large language models have been trained on vast datasets, but also often on repeated data, either intentionally for the purpose of upweighting higher quality data, or unintentionally because data deduplication is not perfect and the model is exposed to repeated data at the sentence, paragraph, or document level. Some works have reported substantial negative performance effects of this repeated data. In this paper we attempt to study repeated data systematically and to understand its effects mechanistically. To do this, we train a family of models where most of the data is unique but a small fraction of it is repeated many...
Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on how it can be studied empirically. We first present an experimental design centered on tasks for which human specialists succeed but...
Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model's predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes...
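One such intervention, sketched below with a placeholder `ask_model` function and prompt format (hypothetical, not the paper's exact setup), truncates the chain of thought and re-queries the model to see whether the final answer changes:

    def answer_after_truncation(ask_model, question, cot_steps, fraction):
        # Keep only the first `fraction` of the reasoning steps, then ask for
        # a final answer; comparing this with the original answer measures how
        # strongly the model conditions on its stated reasoning.
        kept = cot_steps[: int(len(cot_steps) * fraction)]
        prompt = question + "\n" + "\n".join(kept) + "\nTherefore, the answer is"
        return ask_model(prompt)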
We present Schemathesis, a tool for finding semantic errors and crashes in OpenAPI or GraphQL web APIs through property-based testing. Our evaluation, comprising thirty independent runs of eight tools against sixteen containerized open-source services, shows that Schemathesis wildly outperforms all previous tools. It is the only tool to find defects in four targets, finds 1.4× to 4.5× more unique defects than the respectively second-best tool in each remaining target, and can handle two-thirds of our target services without a fatal internal...
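A minimal pytest-style usage sketch; the schema URL is a placeholder and the exact loader name varies across Schemathesis versions:

    import schemathesis

    # Load the service's OpenAPI schema (placeholder URL).
    schema = schemathesis.from_uri("http://localhost:8080/openapi.json")

    @schema.parametrize()
    def test_api(case):
        # Sends a schema-derived request and checks the response for crashes
        # and conformance problems.
        case.call_and_validate()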
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate...
As large language models (LLMs) perform more difficult tasks, it becomes harder to verify the correctness and safety of their behavior. One approach to help with this issue is to prompt LLMs to externalize their reasoning, e.g., by having them generate step-by-step reasoning as they answer a question (Chain-of-Thought; CoT). The reasoning may enable us to check the process that models use to perform tasks. However, this approach relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case. To improve over the faithfulness of CoT reasoning, we have...
Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expression of such behaviors. The success of simple principles motivates us to ask: can models learn general ethical behaviors from a single principle? To test this,...
Where traditional example-based tests check software using manually specified input-output pairs, property-based tests exploit a general description of valid inputs and program behaviour to automatically search for falsifying examples. Given that Python has excellent testing tools, such tests are often easier to work with, and routinely find serious bugs that all other techniques have missed.
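For example, a single round-trip property covers the whole input space that a handful of example-based tests would only sample:

    import json
    from hypothesis import given, strategies as st

    @given(st.dictionaries(st.text(), st.integers()))
    def test_json_round_trip(d):
        # Serialising and deserialising any str->int mapping should give back
        # the original value.
        assert json.loads(json.dumps(d)) == d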