Marti A. Hearst

ORCID: 0000-0002-4346-1603
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Advanced Text Analysis Techniques
  • Information Retrieval and Search Behavior
  • Data Visualization and Analytics
  • Semantic Web and Ontologies
  • Biomedical Text Mining and Ontologies
  • Web Data Mining and Analysis
  • Speech and dialogue systems
  • Text Readability and Simplification
  • Usability and User Interface Design
  • Software Engineering Research
  • Data Management and Algorithms
  • Video Analysis and Summarization
  • Advanced Database Systems and Queries
  • Expert finding and Q&A systems
  • Multimedia Communication and Technology
  • Online Learning and Analytics
  • Text and Document Classification Technologies
  • Wikis in Education and Collaboration
  • Digital Humanities and Scholarship
  • Image Retrieval and Classification Techniques
  • Data Mining Algorithms and Applications
  • Big Data and Business Intelligence
  • Interactive and Immersive Displays

University of California, Berkeley
2015-2024

Berkeley College
2014-2024

Allen Institute
2020-2023

University of Washington
2020-2023

Northwestern University
2019-2023

Massachusetts Institute of Technology
2023

University of Pennsylvania
2022

Microsoft Research (United Kingdom)
2021

University of Minnesota
2021

Seoul National University
2021

My first exposure to Support Vector Machines came this spring when I heard Sue Dumais present impressive results on text categorization using this analysis technique. This issue's collection of essays should help familiarize our readers with this interesting new racehorse in the Machine Learning stable. Bernhard Scholkopf, in an introductory overview, points out that a particular advantage of SVMs over other learning algorithms is that they can be analyzed theoretically using concepts from computational learning theory, and at...

10.1109/5254.708428 article EN IEEE Intelligent Systems and their Applications 1998-07-01

We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidance of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also be acquirable in this way. A subset of the acquisition algorithm is implemented and the results are used to augment...

10.3115/992133.992154 article EN 1992-01-01
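
As a rough illustration of the lexico-syntactic patterns described in the abstract above, the sketch below matches one such pattern ("X such as A, B and C") with a regular expression. It is a minimal sketch, not the paper's implementation: the function name `extract_hyponyms` is hypothetical, and a single word stands in for a full noun phrase, which a real system would find with a parser or chunker.

```python
import re

# Assumption: one word token approximates a noun phrase.
# Pattern: HYPERNYM such as HYPONYM {, HYPONYM}* {and|or} HYPONYM
SUCH_AS = re.compile(
    r"(\w+),? such as ((?:\w+(?:, and |, or |, | and | or ))*\w+)"
)

def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumerated list on commas and on "and" / "or".
        hyponyms = re.split(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", match.group(2))
        pairs.extend((h, hypernym) for h in hyponyms if h)
    return pairs

print(extract_hyponyms("injuries such as bruises, wounds and fractures"))
```

A single-word approximation over-generates on longer sentences (for example, it will keep consuming words joined by "and"), which is one reason noun-phrase chunking matters in practice.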

To build systems shielding users from fraudulent (or phishing) websites, designers need to know which attack strategies work and why. This paper provides the first empirical evidence about which malicious strategies are successful at deceiving general users. We first analyzed a large set of captured phishing attacks and developed a set of hypotheses about why these strategies might work. We then assessed these hypotheses with a usability study in which 22 participants were shown 20 web sites and asked to determine which ones were fraudulent. We found that 23% of the participants did not look at browser-based cues...

10.1145/1124772.1124861 article EN 2006-04-22

There are currently two dominant interface types for searching and browsing large image collections: keyword-based search, and searching by overall similarity to sample images. We present an alternative based on enabling users to navigate along conceptual dimensions that describe the images. The interface makes use of hierarchical faceted metadata and dynamically generated query previews. A usability study, in which 32 art history students explored a collection of 35,000 fine arts images, compares this approach to a standard image search...

10.1145/642611.642681 article EN 2003-04-05

The possibilities for data mining from large text collections are virtually untapped. Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically. Perhaps for this reason, there has been little work in text data mining to date, and most people who have talked about it have either conflated it with information access or have not made use of text directly to discover heretofore unknown information.

10.3115/1034678.1034679 article EN 1999-01-01

Reexamining the cluster hypothesis: scatter/gather on retrieval results. Authors: Marti A. Hearst (Xerox Palo Alto Research Center, 3333 Coyote Hill Rd, Palo Alto, CA), Jan O. Pedersen. SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, August 1996, pages 76–84.

10.1145/243199.243216 article EN Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '96 1996-01-01

The field of information retrieval has traditionally focused on textbases consisting of titles and abstracts. As a consequence, many underlying assumptions must be altered for retrieval from full-length text collections. This paper argues for making use of text structure when retrieving from full text documents, and presents a visualization paradigm, called TileBars, that demonstrates the usefulness of explicit term distribution information in Boolean-type queries. TileBars simultaneously and compactly indicate relative document length, query...

10.1145/223904.223912 article EN 1995-01-01

This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of thirteen lengthy texts.

10.3115/981732.981734 article EN 1994-01-01
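
The abstract above says the algorithm relies on lexical frequency and distribution information. One common way to illustrate that idea is to compare the vocabulary of adjacent blocks of text and treat low-similarity gaps as candidate subtopic boundaries. The sketch below does only that. It is an assumption-laden illustration rather than the published algorithm: the helper names, the whitespace tokenizer, the sentence-level units, and the `block_size` default are all hypothetical choices.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def gap_scores(sentences, block_size=2):
    """Score each gap between sentences by the lexical similarity of the
    blocks on either side; low scores suggest a subtopic shift."""
    counts = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for gap in range(1, len(counts)):
        left = sum(counts[max(0, gap - block_size):gap], Counter())
        right = sum(counts[gap:gap + block_size], Counter())
        scores.append(cosine(left, right))
    return scores

sentences = ["cats purr", "cats sleep", "stocks fell", "stocks rose"]
scores = gap_scores(sentences)
print(scores.index(min(scores)))  # index of the gap with the lowest similarity
```

In this toy input the lowest-scoring gap falls between the second and third sentences, where the vocabulary changes completely.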

The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, overpenalizes near misses, and is affected by variation in segment size distribution. We propose a simple modification to the Pk metric that remedies these problems. This new metric, called WindowDiff, moves a fixed-sized window across...

10.1162/089120102317341756 article EN Computational Linguistics 2002-03-01
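
The sliding-window idea behind WindowDiff can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's reference code: segmentations are encoded as strings of "0" and "1" (where "1" marks a boundary after that unit), the function name `window_diff` is hypothetical, and the normalization uses the number of window positions, a convention that differs slightly between implementations.

```python
def window_diff(reference, hypothesis, k):
    """Fraction of length-k windows in which the two segmentations
    contain a different number of boundaries (0.0 is a perfect match)."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must have the same length")
    n = len(reference)
    if not 0 < k <= n:
        raise ValueError("window size k must be between 1 and the sequence length")
    positions = n - k + 1
    errors = 0
    for i in range(positions):
        ref_boundaries = reference[i:i + k].count("1")
        hyp_boundaries = hypothesis[i:i + k].count("1")
        if ref_boundaries != hyp_boundaries:
            errors += 1
    return errors / positions

print(window_diff("0100", "0100", 2))
print(window_diff("0100", "0010", 2))
```

A window size of about half the average reference segment length is a common choice; here `k` is left to the caller.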

This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of thirteen lengthy texts.

10.48550/arxiv.cmp-lg/9406037 preprint EN other-oa arXiv (Cornell University) 1994-01-01

Clustering versus faceted categories for information exploration. Author: Marti A. Hearst (University of California, Berkeley). Communications of the ACM, Volume 49, Issue 4, April 2006, pp. 59–61.

10.1145/1121949.1121983 article EN Communications of the ACM 2006-04-01

In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suffered from a mismatch in input granularity between NLI datasets (sentence-level) and inconsistency detection (document-level). We provide a highly effective and light-weight method called SummaCConv that enables...

10.1162/tacl_a_00453 article EN cc-by Transactions of the Association for Computational Linguistics 2022-01-01

We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why...

10.1145/160688.160695 article EN Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '93 1993-01-01

Designing a search system and interface may best be served (and executed) by scrutinizing usability studies.

10.1145/567498.567525 article EN Communications of the ACM 2002-09-01

A crucial step toward the goal of automatic extraction of propositional information from natural language text is the identification of semantic relations between constituents in sentences. We examine the problem of distinguishing among seven relation types that can occur between the entities "treatment" and "disease" in bioscience text, and the problem of identifying such entities. We compare five generative graphical models and a neural network, using lexical, syntactic, and semantic features, finding that the latter help achieve high classification accuracy.

10.3115/1218955.1219010 article EN 2004-01-01

We describe a new animation technique for supporting interactive exploration of a graph. We use the well-known radial tree layout method, in which the view is determined by the selection of a focus node. Our main contribution is a method for animating the transition to a new layout when a new focus node is selected. In order to keep the transition easy to follow, the animation linearly interpolates the polar coordinates of the nodes, while enforcing ordering and orientation constraints. We apply this technique to visualizations of social networks and of the Gnutella file-sharing network, and discuss the results from our informal...

10.1109/infvis.2001.963279 article EN 2005-08-29
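
The core step named in the abstract above, linear interpolation of polar coordinates, can be sketched as follows. This is a minimal sketch only: the function names are hypothetical, angles are assumed to be in radians, and the ordering and orientation constraints mentioned in the abstract are not implemented here.

```python
import math

def interpolate_polar(start, end, t):
    """Linearly interpolate a node's (radius, angle) between two layouts.
    t runs from 0.0 (old layout) to 1.0 (new layout)."""
    r0, theta0 = start
    r1, theta1 = end
    radius = (1 - t) * r0 + t * r1
    theta = (1 - t) * theta0 + t * theta1
    return radius, theta

def to_cartesian(radius, theta):
    """Convert polar coordinates to screen-space x, y for drawing."""
    return radius * math.cos(theta), radius * math.sin(theta)

# Halfway through a transition from ring 1 at angle 0 to ring 3 at angle pi.
radius, theta = interpolate_polar((1.0, 0.0), (3.0, math.pi), 0.5)
print(to_cartesian(radius, theta))
```

Interpolating in polar rather than Cartesian space makes nodes sweep along arcs around the center instead of cutting straight across the layout.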

A quantitative analysis of a large collection of expert-rated web sites reveals that page-level metrics can accurately predict if a site will be highly rated. The analysis also provides empirical evidence that important metrics, including page composition, page formatting, and overall page characteristics, differ among web site categories such as education, community, living, and finance. These results provide an empirical foundation for web site design guidelines and also suggest which metrics can be most important for evaluation via user studies.

10.1145/365024.365035 article EN 2001-03-01

Cat-a-Cone: an interactive interface for specifying searches and viewing retrieval results using a large category hierarchy. Authors: Marti A. Hearst (Xerox Palo Alto Research Center, 3333 Coyote Hill Rd, Palo Alto, CA), Chandu Karadi (School of Medicine, M121, Stanford University, Stanford, CA). SIGIR '97: Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval, July 1997.

10.1145/258525.258582 article EN Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '97 1997-01-01