Dafna Shahaf

ORCID: 0000-0003-3261-0818
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Advanced Text Analysis Techniques
  • Data Visualization and Analytics
  • Multimodal Machine Learning Applications
  • Semantic Web and Ontologies
  • Web Data Mining and Analysis
  • Software Engineering Research
  • Mobile Crowdsensing and Crowdsourcing
  • Logic, Reasoning, and Knowledge
  • Humor Studies and Applications
  • Data Management and Algorithms
  • Sentiment Analysis and Opinion Mining
  • Open Source Software Innovations
  • Machine Learning and Algorithms
  • Artificial Intelligence in Games
  • Rough Sets and Fuzzy Logic
  • Advanced Optical Network Technologies
  • Explainable Artificial Intelligence (XAI)
  • Bayesian Modeling and Causal Inference
  • Language, Metaphor, and Cognition
  • Information Retrieval and Search Behavior
  • Video Analysis and Summarization
  • Innovative Human-Technology Interaction
  • Multi-Agent Systems and Negotiation

Hebrew University of Jerusalem
2016-2024

Stanford University
2013-2022

Tel Aviv University
2022

Bar-Ilan University
2022

Allen Institute
2022

Allen Institute for Artificial Intelligence
2020

University of California, Berkeley
2020

Microsoft (United States)
2015

Stanford Medicine
2013

Carnegie Mellon University
2009-2012

Recent advances in LLMs have led to an abundance of evaluation benchmarks, which typically rely on a single instruction template per task. We create a large-scale collection of instruction paraphrases and comprehensively analyze the brittleness introduced by single-prompt evaluations across 6.5M instances, involving 20 different LLMs and 39 tasks from 3 benchmarks. We find that different instruction templates lead to very different performance, both absolute and relative. Instead, we propose a set of diverse metrics on multiple instruction paraphrases, specifically...

10.1162/tacl_a_00681 article EN cc-by Transactions of the Association for Computational Linguistics 2024-01-01

The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today's society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture.

10.1145/1835804.1835884 article EN 2010-07-25

Recently, much attention has been devoted to the question of whether/when traditional network protocol design, which relies on the application of algorithmic insights by human experts, can be replaced by a data-driven (i.e., machine learning) approach. We explore this question in the context of the arguably most fundamental networking task: routing. Can ideas and techniques from machine learning (ML) be leveraged to automatically generate "good" routing configurations? We focus on the classical setting of intradomain traffic engineering. We observe...

10.1145/3152434.3152441 article EN 2017-11-27

In recent years, the blogosphere has experienced a substantial increase in the number of posts published daily, forcing users to cope with information overload. The task of guiding users through this flood of information has thus become critical. To address this issue, we present a principled approach for picking a set of posts that best covers the important stories in the blogosphere.

10.1145/1557019.1557056 article EN 2009-06-28
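
As a rough illustration of the coverage idea in the abstract above, the sketch below greedily picks the posts that add the most not-yet-covered stories. The greedy heuristic, the `pick_posts` name, the input format, and the toy data are all assumptions made for illustration; the abstract does not state the paper's actual objective or algorithm.

```python
def pick_posts(post_stories, budget):
    """Greedily choose up to `budget` posts that together cover many stories.

    post_stories: dict mapping a post id to the set of story ids it mentions
    (a hypothetical input format, not taken from the paper).
    """
    covered = set()
    chosen = []
    remaining = dict(post_stories)
    for _ in range(budget):
        # Pick the post that adds the most uncovered stories;
        # ties go to the alphabetically smallest post id.
        best = max(
            sorted(remaining),
            key=lambda p: len(remaining[p] - covered),
            default=None,
        )
        if best is None or not (remaining[best] - covered):
            break  # no remaining post adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Toy data: three posts mentioning overlapping stories.
posts = {
    "a": {"election", "economy"},
    "b": {"economy"},
    "c": {"sports", "election", "weather"},
}
chosen, covered = pick_posts(posts, budget=2)
```

With a budget of two, the sketch first takes the post that mentions the most stories and then the post that contributes the one story still missing.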

When information is abundant, it becomes increasingly difficult to fit nuggets of knowledge into a single coherent picture. Complex stories spaghetti into branches, side stories, and intertwining narratives. In order to explore these stories, one needs a map to navigate unfamiliar territory. We propose a methodology for creating structured summaries of information, which we call metro maps. Our proposed algorithm generates a concise structured set of documents maximizing coverage of salient pieces of information. Most importantly, metro maps...

10.1145/2187836.2187957 article EN 2012-04-16

In an era of information overload, many people struggle to make sense of complex stories, such as presidential elections or economic reforms. We propose a methodology for creating structured summaries of information, which we call zoomable metro maps. Just as cartographic maps have been relied upon for centuries to help us understand our surroundings, metro maps can help us understand the information landscape.

10.1145/2487575.2487690 article EN 2013-08-11

As the number of scientific publications soars, even the most enthusiastic reader can have trouble staying on top of the evolving literature. It is easy to focus on a narrow aspect of one's field and lose track of the big picture. Information overload is indeed a major challenge for scientists today, and is especially daunting for new investigators attempting to master a discipline and for scientists who seek to cross disciplinary borders. In this paper, we propose metrics of influence, coverage and connectivity for scientific literature. We use these metrics to create structured summaries...

10.1145/2339530.2339706 article EN 2012-08-12

Scientific discoveries are often driven by finding analogies in distant domains, but the growing number of papers makes it difficult to find relevant ideas in a single discipline, let alone distant analogies in other domains. To provide computational support for finding analogies across domains, we introduce SOLVENT, a mixed-initiative system where humans annotate aspects of research papers that denote their background (the high-level problems being addressed), purpose (the specific problems being addressed), mechanism (how they achieved their purpose), and findings (what they learned/achieved),...

10.1145/3274300 article EN Proceedings of the ACM on Human-Computer Interaction 2018-11-01

Analogy, the ability to find and apply deep structural patterns across domains, has been fundamental to human innovation in science and technology. Today there is a growing opportunity to accelerate innovation by moving analogy out of a single person’s mind and distributing it across many information processors, both human and machine. Doing so has the potential to overcome cognitive fixation, scale to large idea repositories, and support complex problems with multiple constraints. Here we lay out a perspective on the future of scalable analogical innovation and first steps...

10.1073/pnas.1807185116 article EN Proceedings of the National Academy of Sciences 2019-02-04

We discuss challenges and opportunities for developing generalized task markets where human and machine intelligence are enlisted to solve problems, based on a consideration of the competencies, availabilities, and pricing of different problem-solving resources. The approach couples human computation with machine learning and planning, and is aimed at optimizing the flow of subtasks to people and to computational problem solvers. We illustrate the key ideas in the context of Lingua Mechanica, a project focused on harnessing human and machine translation skills to perform translation among...

10.1609/aaai.v24i1.7652 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2010-07-04

Humor is an integral aspect of the human experience. Motivated by the prospect of creating computational models of humor, we study the influence of the language of cartoon captions on the perceived humorousness of the cartoons. Our studies are based on a large corpus of crowdsourced cartoon captions that were submitted to a contest hosted by the New Yorker. Having access to thousands of captions submitted for the same image allows us to analyze the breadth of responses of people to the same visual stimulus.

10.1145/2783258.2783388 article EN 2015-08-07

The availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery by providing people with inspiration from solutions to analogous problems. However, finding useful analogies in these large, messy, real-world repositories remains a persistent challenge for either human or automated methods. Previous approaches include costly hand-created databases that have high relational structure (e.g., predicate calculus representations) but are very sparse....

10.1145/3097983.3098038 article EN 2017-08-04

Analogies have been central to creative problem-solving throughout the history of science and technology. As the number of scientific articles continues to increase exponentially, there is a growing opportunity for finding diverse solutions to existing problems. However, realizing this potential requires the development of a means for searching through a large corpus that goes beyond surface matches and simple keywords. Here we contribute the first end-to-end system for analogical search on scientific articles and evaluate its effectiveness with...

10.1145/3530013 article EN ACM Transactions on Computer-Human Interaction 2022-06-08

Finding analogical inspirations in distant domains is a powerful way of solving problems. However, as the number of inspirations that could be matched and the dimensions on which that matching could occur grow, it becomes challenging for designers to find inspirations relevant to their needs. Furthermore, designers are often interested in exploring specific aspects of a product; for example, one designer might be interested in improving the brewing capability of an outdoor coffee maker, while another might wish to optimize for portability. In this paper we introduce a novel system for targeting...

10.1145/3173574.3173695 article EN 2018-04-19

A taxonomy of the methods used to obtain quality datasets enhances existing resources.

10.1145/3551635 article EN Communications of the ACM 2023-01-20

Finding information is becoming a major part of our daily life. Entire sectors, from Web users to scientists and intelligence analysts, are increasingly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture. In this article, we investigate methods for automatically connecting the dots, providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system finds a coherent...

10.1145/2086737.2086744 article EN ACM Transactions on Knowledge Discovery from Data 2012-01-31

A metro map can tell a story, as well as provide good directions.

10.1145/2735624 article EN Communications of the ACM 2015-10-23

Significance: We present a tractable algorithm that provides a near-optimal solution to the crawling problem, a fundamental challenge at the heart of web search: Given a large quantity of distributed and dynamic web content, what pages do we choose to update in a local cache with the goal of serving up-to-date pages to client requests? Solving this optimization problem requires identifying the best set of pages to refresh given popularity rates and change rates, an intractable problem in the general case. To overcome this intractability, we show that the optimal randomized strategy...

10.1073/pnas.1801519115 article EN cc-by-nc-nd Proceedings of the National Academy of Sciences 2018-07-23

While natural language understanding (NLU) is advancing rapidly, today’s technology differs from human-like language understanding in fundamental ways, notably in its inferior efficiency, interpretability, and generalization. This work proposes an approach to representation and learning based on the tenets of embodied cognitive linguistics (ECL). According to ECL, natural language is inherently executable (like programming languages), driven by mental simulation and metaphoric mappings over hierarchical compositions of structures and schemata learned...

10.18653/v1/2020.acl-main.559 article EN cc-by 2020-01-01

Conversational Agents (CAs) such as Apple's Siri and Amazon's Alexa are well-suited for task-oriented interactions ("Call Jason"), but other interaction types are often beyond their capabilities. One notable example is playful requests: for example, people ask CAs personal questions ("What's your favorite color?") or joke with them, sometimes at their expense ("Find Nemo"). Failing to recognize playfulness causes user dissatisfaction and abandonment, destroying the precious rapport with the CA.

10.1145/3491101.3519870 article EN CHI Conference on Human Factors in Computing Systems Extended Abstracts 2022-04-27

Recent advances in large language models (LLMs) have led to the development of various evaluation benchmarks. These benchmarks typically rely on a single instruction template for evaluating all LLMs on a specific task. In this paper, we comprehensively analyze the brittleness of results obtained via single-prompt evaluations across 6.5M instances, involving 20 different LLMs and 39 tasks from 3 benchmarks. To improve robustness of the analysis, we propose to evaluate LLMs with a set of diverse prompts instead. We discuss tailored evaluation metrics for specific use...

10.48550/arxiv.2401.00595 preprint EN other-oa arXiv (Cornell University) 2024-01-01
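
To make the single-prompt brittleness described in the abstract above concrete, here is a minimal sketch that compares a ranking based on one instruction template with a ranking based on the average over several paraphrases. The model names, the accuracy values, and the choice of the mean as the aggregate are assumptions for illustration only; the truncated abstract does not spell out the paper's tailored metrics.

```python
def mean_score(scores):
    """Average accuracy of one model over all prompt paraphrases."""
    return sum(scores) / len(scores)


def rank_models(scores_by_model, score_fn):
    """Return model names ordered from best to worst under `score_fn`."""
    return sorted(
        scores_by_model,
        key=lambda m: score_fn(scores_by_model[m]),
        reverse=True,
    )


# Toy accuracies (made up) for two hypothetical models on three
# paraphrases of the same task instruction.
scores = {
    "model_x": [0.80, 0.50, 0.50],
    "model_y": [0.70, 0.65, 0.60],
}

# Ranking if only the first template had been used for evaluation.
single_prompt_ranking = rank_models(scores, lambda s: s[0])

# Ranking when every paraphrase contributes to the score.
multi_prompt_ranking = rank_models(scores, mean_score)
```

In this toy data the two rankings disagree, which is the kind of relative-performance flip that a single template can hide.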