NFDI4DS | UHH-SEMS - Publication Details

Marti A. Hearst

ORCID: 0000-0002-4346-1603

About

Contact & Profiles

A5019933387

Research Areas

Topic Modeling
Natural Language Processing Techniques
Advanced Text Analysis Techniques
Information Retrieval and Search Behavior
Data Visualization and Analytics
Semantic Web and Ontologies
Biomedical Text Mining and Ontologies
Web Data Mining and Analysis
Speech and dialogue systems
Text Readability and Simplification
Usability and User Interface Design
Software Engineering Research
Data Management and Algorithms
Video Analysis and Summarization
Advanced Database Systems and Queries
Expert finding and Q&A systems
Multimedia Communication and Technology
Online Learning and Analytics
Text and Document Classification Technologies
Wikis in Education and Collaboration
Digital Humanities and Scholarship
Image Retrieval and Classification Techniques
Data Mining Algorithms and Applications
Big Data and Business Intelligence
Interactive and Immersive Displays

University of California, Berkeley
2015-2024

Berkeley College
2014-2024

Allen Institute
2020-2023

University of Washington
2020-2023

Northwestern University
2019-2023

Massachusetts Institute of Technology
2023

University of Pennsylvania
2022

Microsoft Research (United Kingdom)
2021

University of Minnesota
2021

Seoul National University
2021

Support vector machines

OPENALEX - Publications

Marti A. Hearst, Susan Dumais, E. Osuna, John Platt, Bernhard Schölkopf

My first exposure to Support Vector Machines came this spring when I heard Sue Dumais present impressive results on text categorization using this analysis technique. This issue's collection of essays should help familiarize our readers with this interesting new racehorse in the Machine Learning stable. Bernhard Schölkopf, in an introductory overview, points out that a particular advantage of SVMs over other learning algorithms is that it can be analyzed theoretically using concepts from computational learning theory, and at...

10.1109/5254.708428 article EN IEEE Intelligent Systems and their Applications 1998-07-01

Automatic acquisition of hyponyms from large text corpora

Marti A. Hearst

We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidance of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also be acquirable in this way. A subset of the acquisition algorithm is implemented and the results are used to augment...

10.3115/992133.992154 article EN 1992-01-01
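
As a rough illustration of the lexico-syntactic patterns mentioned in the abstract above, the sketch below matches a single pattern of the form "X such as A, B and C" with a regular expression. The choice of pattern, the name `find_hyponyms`, and the use of single words in place of noun phrases are assumptions made for this example; it is not the paper's implementation.

```python
import re

# Separators allowed between listed items (assumed for this sketch).
SEP = r"(?:, and |, or |, | and | or )"

# One illustrative pattern: "<hypernym> such as <item>, <item> and <item>".
# Single words stand in for noun phrases, which is a strong simplification.
PATTERN = re.compile(rf"(\w+),? such as (\w+(?:{SEP}\w+)*)")


def find_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs found by the single pattern above."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        hypernym = match.group(1)
        for item in re.split(SEP, match.group(2)):
            pairs.append((item, hypernym))
    return pairs


pairs = find_hyponyms("He plays instruments such as guitars, drums and pianos.")
```

Because the pattern works on raw words, it will over-capture in sentences where the word after "and" or "or" is not part of the list; a fuller treatment would operate on parsed noun phrases.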

Why phishing works

Rachna Dhamija, J. D. Tygar, Marti A. Hearst

To build systems shielding users from fraudulent (or phishing) websites, designers need to know which attack strategies work and why. This paper provides the first empirical evidence about which malicious strategies are successful at deceiving general users. We first analyzed a large set of captured phishing attacks and developed a set of hypotheses about why these strategies might work. We then assessed these hypotheses with a usability study in which 22 participants were shown 20 web sites and asked to determine which ones were fraudulent. We found that 23% of the participants did not look at browser-based cues...

10.1145/1124772.1124861 article EN 2006-04-22

Faceted metadata for image search and browsing

Ka-Ping Yee, Kirsten Swearingen, Kevin Li, Marti A. Hearst

There are currently two dominant interface types for searching and browsing large image collections: keyword-based search, and searching by overall similarity to sample images. We present an alternative based on enabling users to navigate along conceptual dimensions that describe the images. The interface makes use of hierarchical faceted metadata and dynamically generated query previews. A usability study, in which 32 art history students explored a collection of 35,000 fine arts images, compares this approach to a standard image search...

10.1145/642611.642681 article EN 2003-04-05

Untangling text data mining

Marti A. Hearst

The possibilities for data mining from large text collections are virtually untapped. Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically. Perhaps for this reason, there has been little work in text data mining to date, and most people who have talked about it have either conflated it with information access or have not made use of text directly to discover heretofore unknown information.

10.3115/1034678.1034679 article EN 1999-01-01

Reexamining the cluster hypothesis

Marti A. Hearst, Jan Pedersen

Reexamining the cluster hypothesis: scatter/gather on retrieval results. Marti A. Hearst (Xerox Palo Alto Research Center, 3333 Coyote Hill Rd, Palo Alto, CA) and Jan O. Pedersen. SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, August 1996, pages 76–84.

10.1145/243199.243216 article EN Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '96 1996-01-01

TileBars

Marti A. Hearst

The field of information retrieval has traditionally focused on textbases consisting of titles and abstracts. As a consequence, many underlying assumptions must be altered for retrieval from full-length text collections. This paper argues for making use of text structure when retrieving from full text documents, and presents a visualization paradigm, called TileBars, that demonstrates the usefulness of explicit term distribution information in Boolean-type queries. TileBars simultaneously and compactly indicate relative document length, query...

10.1145/223904.223912 article EN 1995-01-01

Multi-paragraph segmentation of expository text

Marti A. Hearst

This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of thirteen lengthy texts.

10.3115/981732.981734 article EN 1994-01-01
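
To make the idea of using lexical frequency and distribution to find subtopic boundaries more concrete, here is a heavily simplified sketch: it scores each gap between sentences by the cosine similarity of the word counts on either side, so that a low score hints at a topic shift. The function names, the whitespace tokenizer, and the `block_size` parameter are assumptions for this example; the published algorithm includes further steps (such as smoothing and depth scoring) that are not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count dictionaries."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, block_size=2):
    """Score every gap between adjacent sentences.

    Each score compares the words in up to `block_size` sentences
    before the gap with those in up to `block_size` sentences after it.
    """
    tokens = [s.lower().split() for s in sentences]
    scores = []
    for gap in range(1, len(tokens)):
        left = Counter(w for s in tokens[max(0, gap - block_size):gap] for w in s)
        right = Counter(w for s in tokens[gap:gap + block_size] for w in s)
        scores.append(cosine(left, right))
    return scores


scores = gap_similarities(["cats purr", "cats sleep", "stocks fell", "stocks rose"])
likely_boundary = scores.index(min(scores)) + 1  # boundary falls before this sentence index
```

In this toy input the lowest-scoring gap lies between the second and third sentences, where the vocabulary changes completely.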

A simple algorithm for identifying abbreviation definitions in biomedical text

Ariel Schwartz, Marti A. Hearst

10.1142/9789812776303_0042 article EN Biocomputing 2002-12-01

A Critique and Improvement of an Evaluation Metric for Text Segmentation

L. A. Pevzner, Marti A. Hearst

The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, overpenalizes near misses, and is affected by variation in segment size distribution. We propose a simple modification to the Pk metric that remedies these problems. This new metric, called WindowDiff, moves a fixed-sized window across...

10.1162/089120102317341756 article EN Computational Linguistics 2002-03-01
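
The windowed comparison described in the abstract above can be sketched as follows: slide a window of fixed size over the reference and hypothesized segmentations and count the positions where the two disagree on the number of boundaries inside the window. The encoding used here (one 0/1 flag per gap between adjacent units) and the function name `window_diff` are assumptions for this example; published implementations differ in how they encode boundaries and index windows.

```python
def window_diff(reference, hypothesis, k):
    """Fraction of windows whose boundary counts differ.

    `reference` and `hypothesis` are equal-length sequences with one
    entry per gap between adjacent text units: 1 marks a boundary,
    0 marks no boundary. `k` is the window size measured in gaps.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of gaps")
    windows = len(reference) - k + 1
    if k < 1 or windows < 1:
        raise ValueError("k must be between 1 and the number of gaps")
    mismatches = sum(
        1
        for i in range(windows)
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
    )
    return mismatches / windows


reference = [0, 0, 1, 0, 0, 0, 1, 0]
near_miss = [0, 0, 0, 1, 0, 0, 1, 0]  # first boundary placed one gap late
score = window_diff(reference, near_miss, k=2)
```

A score of 0 means every window agrees; a score of 1 means every window disagrees. In the example, the near miss affects only the windows that contain one of the two boundary positions but not the other.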

Multi-Paragraph Segmentation of Expository Text

Marti A. Hearst

10.48550/arxiv.cmp-lg/9406037 preprint EN other-oa arXiv (Cornell University) 1994-01-01

Clustering versus faceted categories for information exploration

Marti A. Hearst

Clustering versus faceted categories for information exploration. Marti A. Hearst (University of California, Berkeley). Communications of the ACM, Volume 49, Issue 4, April 2006, pp. 59–61. Published 1 April 2006.

10.1145/1121949.1121983 article EN Communications of the ACM 2006-04-01

SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

Philippe Laban, Tobias Schnabel, Paul N. Bennett, Marti A. Hearst

In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suffered from a mismatch in input granularity between NLI datasets (sentence-level), and inconsistency detection (document level). We provide a highly effective and light-weight method called SummaCConv that enables...

10.1162/tacl_a_00453 article EN cc-by Transactions of the Association for Computational Linguistics 2022-01-01

Subtopic structuring for full-length document access

Marti A. Hearst, Christian Plaunt

We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why...

10.1145/160688.160695 article EN Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '93 1993-01-01

Finding the flow in web site search

Marti A. Hearst, Ame Elliott, Jennifer English, Rashmi Sinha, Kirsten Swearingen, and 1 more

Designing a search system and interface may best be served (and executed) by scrutinizing usability studies.

10.1145/567498.567525 article EN Communications of the ACM 2002-09-01

Classifying semantic relations in bioscience texts

Barbara Rosario, Marti A. Hearst

A crucial step toward the goal of automatic extraction of propositional information from natural language text is the identification of semantic relations between constituents in sentences. We examine the problem of distinguishing among seven relation types that can occur between the entities "treatment" and "disease" in bioscience text, and the problem of identifying such entities. We compare five generative graphical models and a neural network, using lexical, syntactic, and semantic features, finding that the latter help achieve high classification accuracy.

10.3115/1218955.1219010 article EN 2004-01-01

Animated exploration of dynamic graphs with radial layout

Ka-Ping Yee, Danyel Fisher, Rachna Dhamija, Marti A. Hearst

We describe a new animation technique for supporting interactive exploration of a graph. We use the well-known radial tree layout method, in which the view is determined by the selection of a focus node. Our main contribution is a method for animating the transition to a new layout when a new focus node is selected. In order to keep the transition easy to follow, the animation linearly interpolates the polar coordinates of the nodes, while enforcing ordering and orientation constraints. We apply this technique to visualizations of social networks and of the Gnutella file-sharing network, and discuss the results from our informal...

10.1109/infvis.2001.963279 article EN 2005-08-29
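
The interpolation step mentioned in the abstract above can be illustrated with a minimal sketch: a node's radius and angle are each blended linearly between the old and new layouts, then converted to screen coordinates for drawing. The function names and the (radius, angle) tuple representation are assumptions for this example, and the ordering and orientation constraints the abstract refers to are not implemented here.

```python
import math


def interpolate_polar(start, end, t):
    """Linearly blend two (radius, angle) positions for 0 <= t <= 1."""
    r0, theta0 = start
    r1, theta1 = end
    radius = (1 - t) * r0 + t * r1
    angle = (1 - t) * theta0 + t * theta1
    return radius, angle


def to_cartesian(radius, angle):
    """Convert polar coordinates (angle in radians) to x, y."""
    return radius * math.cos(angle), radius * math.sin(angle)


# Halfway through a transition from (r=1, angle=0) to (r=3, angle=pi).
midpoint = interpolate_polar((1.0, 0.0), (3.0, math.pi), 0.5)
x, y = to_cartesian(*midpoint)
```

Interpolating in polar rather than Cartesian coordinates makes nodes sweep around the center along arcs instead of cutting straight across the layout.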

Empirically validated web page design metrics

Melody Y. Ivory, Rashmi Sinha, Marti A. Hearst

A quantitative analysis of a large collection of expert-rated web sites reveals that page-level metrics can accurately predict if a site will be highly rated. The analysis also provides empirical evidence that important metrics, including page composition, page formatting, and overall page characteristics, differ among web site categories such as education, community, living, and finance. These results provide an empirical foundation for web site design guidelines and also suggest which metrics can be most important for evaluation via user studies.

10.1145/365024.365035 article EN 2001-03-01

Cat-a-Cone

Marti A. Hearst, Chandu Karadi

Cat-a-Cone: an interactive interface for specifying searches and viewing retrieval results using a large category hierarchy. Marti A. Hearst (Xerox Palo Alto Research Center, 3333 Coyote Hill Rd, Palo Alto, CA) and Chandu Karadi (School of Medicine, M121, Stanford University, Stanford, CA). SIGIR '97: Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval, July 1997, Pages...

10.1145/258525.258582 article EN Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '97 1997-01-01
