NFDI4DS | UHH-SEMS - Publication Details

Alistair Moffat

ORCID: 0000-0002-6638-0232

About

Contact & Profiles

OpenAlex ID: A5081861848

Research Areas

Algorithms and Data Compression
Information Retrieval and Search Behavior
Web Data Mining and Analysis
Data Management and Algorithms
Advanced Database Systems and Queries
Topic Modeling
Advanced Data Storage Technologies
Semantic Web and Ontologies
Data Quality and Management
Expert Finding and Q&A Systems
Natural Language Processing Techniques
Cellular Automata and Applications
Error Correcting Code Techniques
Advanced Data Compression Techniques
Advanced Text Analysis Techniques
DNA and Biological Computing
Mobile Crowdsensing and Crowdsourcing
Advanced Image and Video Retrieval Techniques
Recommender Systems and Techniques
Network Packet Processing and Optimization
Semigroups and Automata Theory
Business Process Modeling and Analysis
Genomics and Phylogenetic Studies
Data Mining Algorithms and Applications
Advanced Wireless Communication Techniques

The University of Melbourne
2016-2025

Queensland University of Technology
2022

The University of Queensland
2022

Parks Victoria
2017

Google (United States)
2014

Nokia (United Kingdom)
2010

Data61
2006-2008

University of Canterbury
1983-2005

University of Waikato
1995

RMIT University
1995

A similarity measure for indefinite rankings

OPENALEX - Publications

William Webber Alistair Moffat Justin Zobel

Ranked lists are encountered in research and daily life, and it is often of interest to compare these lists, even when they are incomplete or have only some members in common. An example is document rankings returned for the same query by different search engines. A measure of the similarity between incomplete rankings should handle nonconjointness, weight high ranks more heavily than low, and be monotonic with increasing depth of evaluation; but no measure satisfying all these criteria currently exists. In this article, we propose a new measure having these qualities,...

10.1145/1852102.1852106 article EN ACM Transactions on Information Systems 2010-11-01

Rank-biased precision for measurement of retrieval effectiveness

OPENALEX - Publications

Alistair Moffat Justin Zobel

A range of methods for measuring the effectiveness of information retrieval systems has been proposed. These are typically intended to provide a quantitative single-value summary of a document ranking relative to a query. However, many of these measures have failings. For example, recall is not well founded as a measure of satisfaction, since the user of an actual system cannot judge recall. Average precision is derived from recall, and suffers from the same problem. In addition, average precision lacks key stability properties that are needed...

10.1145/1416950.1416952 article EN ACM Transactions on Information Systems 2008-12-01

Managing Gigabytes: Compressing and Indexing Documents and Images

OPENALEX - Publications

Ian H. Witten Alistair Moffat Tim Bell

10.1109/tit.1995.476344 article EN IEEE Transactions on Information Theory 1995-11-01

Arithmetic coding revisited

OPENALEX - Publications

Alistair Moffat Radford M. Neal Ian H. Witten

Over the last decade, arithmetic coding has emerged as an important compression tool. It is now the method of choice for adaptive coding on multisymbol alphabets because of its speed, low storage requirements, and effectiveness of compression. This article describes a new implementation of arithmetic coding that incorporates several improvements over a widely used earlier version by Witten, Neal, and Cleary, which has become a de facto standard. These improvements include fewer multiplicative operations, greatly extended range of alphabet sizes and symbol...

10.1145/290159.290162 article EN ACM Transactions on Information Systems 1998-07-01

Implementing the PPM data compression scheme

OPENALEX - Publications

Alistair Moffat

The prediction by partial matching (PPM) data compression algorithm developed by J. Cleary and I. Witten (1984) is capable of very high compression rates, encoding English text in as little as 2.2 b/character. It is shown that the estimates made by Cleary and Witten of the resources required to implement the scheme can be revised to allow for a tractable and useful implementation. In particular, a variant is described that encodes and decodes at over 4 kB/s on a small workstation and operates within a few hundred kilobytes of data space, but still obtains compression of about 2.4 b/character...

10.1109/26.61469 article EN IEEE Transactions on Communications 1990-01-01

Exploring the similarity space

OPENALEX - Publications

Justin Zobel Alistair Moffat

Ranked queries are used to locate relevant documents in text databases. In a ranked query a list of terms is specified, then the documents that most closely match the query are returned, in decreasing order of similarity, as answers. Crucial to the efficacy of ranked querying is the use of a similarity heuristic, a mechanism that assigns a numeric score indicating how closely a document and a query match. In this note we explore and categorise a range of similarity heuristics described in the literature. We have implemented all of these measures in a structured way, and have carried out retrieval experiments with...

10.1145/281250.281256 article EN ACM SIGIR Forum 1998-04-01

Self-indexing inverted files for fast text retrieval

OPENALEX - Publications

Alistair Moffat Justin Zobel

Query-processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term. Retrieval time for inverted lists can be greatly reduced by the use of compression, but this adds to the CPU time required. Here we show that the CPU component of query response time for conjunctive Boolean queries and for informal ranked queries can be similarly reduced, at little cost in terms of storage, by the inclusion of an internal index in each compressed inverted list. This method has been applied in a retrieval system for a collection of nearly two million short documents....

10.1145/237496.237497 article EN ACM Transactions on Information Systems 1996-10-01

Inverted files versus signature files for text indexing

OPENALEX - Publications

Justin Zobel Alistair Moffat Kotagiri Ramamohanarao

Two well-known indexing methods are inverted files and signature files. We have undertaken a detailed comparison of these two approaches in the context of text indexing, paying particular attention to query evaluation speed and space requirements. We have examined their relative performance using both experimentation and a refined approach to modeling of signature files, and demonstrate that inverted files are distinctly superior to signature files. Not only can inverted files be used to evaluate typical queries in less time than can signature files, but inverted files require less space and provide greater functionality. Our results also...

10.1145/296854.277632 article EN ACM Transactions on Database Systems 1998-12-01

Inverted Index Compression Using Word-Aligned Binary Codes

OPENALEX - Publications

Vo Ngoc Anh Alistair Moffat

10.1023/b:inrt.0000048490.99518.5c article EN Information Retrieval 2004-11-18

Off-line dictionary-based compression

OPENALEX - Publications

Niklas Larsson Alistair Moffat

Dictionary-based modeling is a mechanism used in many practical compression schemes. In most implementations of dictionary-based compression the encoder operates on-line, incrementally inferring its dictionary of available phrases from previous parts of the message. An alternative approach is to use the full message to infer a complete dictionary in advance, and include an explicit representation of the dictionary as part of the compressed message. In this investigation, we develop a compression scheme that is a combination of a simple but powerful phrase derivation method and a compact dictionary encoding. The...

10.1109/5.892708 article EN Proceedings of the IEEE 2000-11-01

Offline dictionary-based compression

OPENALEX - Publications

Niklas Larsson Alistair Moffat

Dictionary-based modelling is the mechanism used in many practical compression schemes. We use the full message (or a large block of it) to infer a complete dictionary in advance, and include an explicit representation of the dictionary as part of the compressed message. Intuitively, the advantage of this offline approach is that with the benefit of having access to all of the message, it should be possible to optimize the choice of phrases so as to maximize compression performance. Indeed, we demonstrate that very good compression can be attained by an offline method without compromising the fast decoding...

10.1109/dcc.1999.755679 article EN 1999-01-01

Improvements that don't add up

OPENALEX - Publications

Timothy G. Armstrong Alistair Moffat William Webber Justin Zobel

The existence and use of standard test collections in information retrieval experimentation allows results to be compared between research groups and over time. Such comparisons, however, are rarely made. Most researchers only report results from their own experiments, a practice that allows lack of overall improvement to go unnoticed. In this paper, we analyze results achieved on the TREC Ad-Hoc, Web, Terabyte, and Robust collections as reported in SIGIR (1998-2008) and CIKM (2004-2008). Dozens of individual published experiments report effectiveness...

10.1145/1645953.1646031 article EN 2009-11-02

Vector-space ranking with effective early termination

OPENALEX - Publications

Vo Ngoc Anh Owen de Kretser Alistair Moffat

Considerable research effort has been invested in improving the effectiveness of information retrieval systems. Techniques such as relevance feedback, thesaural expansion, and pivoting all provide better quality responses to queries when tested in standard evaluation frameworks. But such enhancements can add to the cost of evaluating queries. In this paper we consider the pragmatic issue of how to improve the cost-effectiveness of searching. We describe a new inverted file structure using quantized weights that provides...

10.1145/383952.383957 article EN 2001-09-01

Similarity measures for tracking information flow

OPENALEX - Publications

Donald Metzler Yaniv Bernstein W. Bruce Croft Alistair Moffat Justin Zobel

Text similarity spans a spectrum, with broad topical similarity near one extreme and document identity at the other. Intermediate levels of similarity (resulting from summarization, paraphrasing, copying, and stronger forms of topical relevance) are useful for applications such as information flow analysis and question-answering tasks. In this paper, we explore mechanisms for measuring such intermediate kinds of similarity, focusing on the task of identifying where a particular piece of information originated. We consider both sentence-to-sentence...

10.1145/1099554.1099695 article EN 2005-10-31

Word‐based text compression

OPENALEX - Publications

Alistair Moffat

The development of efficient algorithms to support arithmetic coding has meant that powerful models of text can now be used for data compression. Here the implementation of models based on recognizing and recording words is considered. Move-to-the-front and several variable-order Markov models have been tested with a number of different data structures, and first the decisions that went into the implementations are discussed and then experimental results are given that show English text being represented in under 2.2 bits per character. Moreover...

10.1002/spe.4380190207 article EN Software Practice and Experience 1989-02-01

Frontiers, challenges, and opportunities for information retrieval

OPENALEX - Publications

James Allan Bruce Croft Alistair Moffat Mark Sanderson

During a three-day workshop in February 2012, 45 Information Retrieval researchers met to discuss long-range challenges and opportunities within the field. The result of the workshop is a diverse set of research directions, project ideas, and challenge areas. This report describes the workshop format, provides summaries of broad themes that emerged, includes brief descriptions of all the ideas, and provides detailed discussion of six proposals that were voted "most interesting" by the participants. Key themes include the need to: move beyond ranked lists of documents to support...

10.1145/2215676.2215678 article EN ACM SIGIR Forum 2012-05-20

Efficient set intersection for inverted indexing

OPENALEX - Publications

J. Shane Culpepper Alistair Moffat

Conjunctive Boolean queries are a key component of modern information retrieval systems, especially when Web-scale repositories are being searched. A conjunctive query q is equivalent to a |q|-way intersection over ordered sets of integers, where each set represents the documents containing one of the terms, and each integer in each set is an ordinal document identifier. As is the case with many computing applications, there is tension between the way in which the data is represented, and the ways in which it is to be manipulated. In particular, the sets representing the index data for...

10.1145/1877766.1877767 article EN ACM Transactions on Information Systems 2010-12-01

Pruned query evaluation using pre-computed impacts

OPENALEX - Publications

Vo Ngoc Anh Alistair Moffat

Exhaustive evaluation of ranked queries can be expensive, particularly when only a small subset of the overall ranking is required, or when queries contain common terms. This concern gives rise to techniques for dynamic query pruning, that is, methods for eliminating redundant parts of the usual exhaustive evaluation, yet still generating a demonstrably "good enough" set of answers to the query. In this work we propose new pruning methods that make use of impact-sorted indexes. Compared to exhaustive evaluation, the new methods reduce the amount of computation performed, reduce the amount of memory required...

10.1145/1148170.1148235 article EN 2006-08-06

Can machine translation systems be evaluated by the crowd alone

OPENALEX - Publications

Yvette Graham Timothy Baldwin Alistair Moffat Justin Zobel

Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own individual assessment strategy....

10.1017/s1351324915000339 article EN Natural Language Engineering 2015-09-15
