- Algorithms and Data Compression
- Information Retrieval and Search Behavior
- Web Data Mining and Analysis
- Data Management and Algorithms
- Advanced Database Systems and Queries
- Topic Modeling
- Advanced Data Storage Technologies
- Semantic Web and Ontologies
- Data Quality and Management
- Expert finding and Q&A systems
- Natural Language Processing Techniques
- Cellular Automata and Applications
- Error Correcting Code Techniques
- Advanced Data Compression Techniques
- Advanced Text Analysis Techniques
- DNA and Biological Computing
- Mobile Crowdsensing and Crowdsourcing
- Advanced Image and Video Retrieval Techniques
- Recommender Systems and Techniques
- Network Packet Processing and Optimization
- semigroups and automata theory
- Business Process Modeling and Analysis
- Genomics and Phylogenetic Studies
- Data Mining Algorithms and Applications
- Advanced Wireless Communication Techniques
The University of Melbourne
2016-2025
Queensland University of Technology
2022
The University of Queensland
2022
Parks Victoria
2017
Google (United States)
2014
Nokia (United Kingdom)
2010
Data61
2006-2008
University of Canterbury
1983-2005
University of Waikato
1995
RMIT University
1995
Ranked lists are encountered in research and daily life, and it is often of interest to compare these lists, even when they are incomplete or have only some members in common. An example is document rankings returned for the same query by different search engines. A measure of the similarity between incomplete rankings should handle nonconjointness, weight high ranks more heavily than low, and be monotonic with increasing depth of evaluation; but no measure satisfying all these criteria currently exists. In this article, we propose a new measure having these qualities,...
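The three criteria listed above (nonconjoint lists, heavier weight on high ranks, monotonicity with depth) can be illustrated with a small sketch. The truncated abstract does not give the proposed measure's definition, so the formula below is an assumption chosen for illustration: a geometrically weighted average of prefix overlap, where the function name `top_weighted_overlap` and the persistence parameter `p` are hypothetical.

```python
def top_weighted_overlap(a, b, p=0.9, depth=None):
    """Score the similarity of two rankings, weighting early ranks most.

    Assumed form (not taken from the abstract): the sum over depths d of
    (1 - p) * p**(d - 1) * |a[:d] & b[:d]| / d. Each list is assumed to
    contain no repeated items.
    """
    if depth is None:
        depth = min(len(a), len(b))
    seen_a, seen_b = set(), set()
    overlap = 0          # size of the intersection of the two prefixes
    total = 0.0
    for d in range(1, depth + 1):
        x, y = a[d - 1], b[d - 1]
        if x == y:
            overlap += 1
        else:
            if x in seen_b:
                overlap += 1
            if y in seen_a:
                overlap += 1
        seen_a.add(x)
        seen_b.add(y)
        # Every term is non-negative, so the score never falls as depth grows.
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total


base = ["a", "b", "c", "d"]
swap_top = ["b", "a", "c", "d"]      # disagreement at the top of the list
swap_bottom = ["a", "b", "d", "c"]   # disagreement at the bottom of the list
print(top_weighted_overlap(base, swap_top), top_weighted_overlap(base, swap_bottom))
```

Under these assumptions a swap near the top of the list lowers the score more than the same swap near the bottom, and items that appear in only one list simply contribute no overlap.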
A range of methods for measuring the effectiveness of information retrieval systems has been proposed. These are typically intended to provide a quantitative single-value summary of a document ranking relative to a query. However, many of these measures have failings. For example, recall is not well founded as a measure of satisfaction, since the user of an actual system cannot judge recall. Average precision is derived from recall, and suffers from the same problem. In addition, average precision lacks key stability properties that are needed...
Over the last decade, arithmetic coding has emerged as an important compression tool. It is now the method of choice for adaptive coding on multisymbol alphabets because of its speed, low storage requirements, and effectiveness of compression. This article describes a new implementation of arithmetic coding that incorporates several improvements over a widely used earlier version by Witten, Neal, and Cleary, which has become a de facto standard. These improvements include fewer multiplicative operations, greatly extended range of alphabet sizes and symbol...
The prediction by partial matching (PPM) data compression algorithm developed by J. Cleary and I. Witten (1984) is capable of very high compression rates, encoding English text in as little as 2.2 b/character. It is shown that the estimates made of the resources required to implement the scheme can be revised to allow for a tractable and useful implementation. In particular, a variant is described that encodes and decodes at over 4 kB/s on a small workstation and operates within a few hundred kilobytes of data space, but still obtains compression of about 2.4 b/character...
Ranked queries are used to locate relevant documents in text databases. In a ranked query a list of terms is specified, then the documents that most closely match the query are returned, in decreasing order of similarity, as answers. Crucial to the efficacy of ranked querying is the use of a similarity heuristic, a mechanism that assigns a numeric score indicating how closely a document and the query match. In this note we explore and categorise a range of similarity heuristics described in the literature. We have implemented all of these measures in a structured way, and have carried out retrieval experiments with...
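The ranked-query process described above (score every document against the query terms, then return documents in decreasing order of similarity) can be sketched as follows. The abstract does not say which heuristics were studied, so the logarithmic TF-IDF weighting with document-length normalisation used here is an assumption, and `rank_documents` is a hypothetical name.

```python
import math
from collections import Counter


def rank_documents(query_terms, documents, top_k=3):
    """Return (doc_id, score) pairs in decreasing order of similarity.

    The weighting below is one common similarity heuristic, picked for
    illustration only; it is not claimed to be a heuristic from the note.
    """
    n = len(documents)
    doc_terms = [Counter(doc.lower().split()) for doc in documents]
    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tf in doc_terms:
        df.update(tf.keys())
    scored = []
    for doc_id, tf in enumerate(doc_terms):
        score = sum(
            (1 + math.log(tf[t])) * math.log(1 + n / df[t])
            for t in query_terms
            if t in tf
        )
        if score > 0:
            # Normalise by document length so long documents are not favoured.
            length = math.sqrt(sum((1 + math.log(f)) ** 2 for f in tf.values()))
            scored.append((score / length, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


docs = [
    "compression of text",
    "text retrieval and text indexing",
    "image retrieval",
]
for doc_id, score in rank_documents(["text", "retrieval"], docs):
    print(doc_id, round(score, 3))
```

Swapping in a different term weight or normalisation changes only the two expressions inside the loop, which is one way to compare heuristics within a single structure.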
Query-processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term. Retrieval time for inverted lists can be greatly reduced by the use of compression, but this adds to the CPU time required. Here we show that the CPU component of query response time for conjunctive Boolean queries and for informal ranked queries can be similarly reduced, at little cost in terms of storage, by the inclusion of an internal index in each compressed inverted list. This method has been applied in a retrieval system for a collection of nearly two million short documents....
Two well-known indexing methods are inverted files and signature files. We have undertaken a detailed comparison of these two approaches in the context of text indexing, paying particular attention to query evaluation speed and space requirements. We have examined their relative performance using both experimentation and a refined approach to modeling of signature files, and demonstrate that inverted files are distinctly superior to signature files. Not only can inverted files be used to evaluate typical queries in less time than can signature files, but inverted files require less space and provide greater functionality. Our results also...
Dictionary-based modeling is a mechanism used in many practical compression schemes. In most implementations of dictionary-based compression the encoder operates on-line, incrementally inferring its dictionary of available phrases from previous parts of the message. An alternative approach is to use the full message to infer a complete dictionary in advance, and include an explicit representation of the dictionary as part of the compressed message. In this investigation, we develop a compression scheme that is a combination of a simple but powerful phrase derivation method and a compact dictionary encoding. The...
Dictionary-based modelling is the mechanism used in many practical compression schemes. We use the full message (or a large block of it) to infer a complete dictionary in advance, and include an explicit representation of the dictionary as part of the compressed message. Intuitively, the advantage of this offline approach is that with the benefit of having access to all of the message, it should be possible to optimize the choice of phrases so as to maximize compression performance. Indeed, we demonstrate that very good compression can be attained by an offline method without compromising the fast decoding...
The existence and use of standard test collections in information retrieval experimentation allows results to be compared between research groups and over time. Such comparisons, however, are rarely made. Most researchers only report results from their own experiments, a practice that allows lack of overall improvement to go unnoticed. In this paper, we analyze results achieved on the TREC Ad-Hoc, Web, Terabyte, and Robust collections as reported in SIGIR (1998-2008) and CIKM (2004-2008). Dozens of individual published experiments report effectiveness...
Considerable research effort has been invested in improving the effectiveness of information retrieval systems. Techniques such as relevance feedback, thesaural expansion, and pivoting all provide better quality responses to queries when tested in standard evaluation frameworks. But such enhancements can add to the cost of evaluating queries. In this paper we consider the pragmatic issue of how to improve the cost-effectiveness of searching. We describe a new inverted file structure using quantized weights that provides...
Text similarity spans a spectrum, with broad topical similarity near one extreme and document identity at the other. Intermediate levels of similarity (resulting from summarization, paraphrasing, copying, and stronger forms of topical relevance) are useful for applications such as information flow analysis and question-answering tasks. In this paper, we explore mechanisms for measuring such intermediate kinds of similarity, focusing on the task of identifying where a particular piece of information originated. We consider both sentence-to-sentence...
The development of efficient algorithms to support arithmetic coding has meant that powerful models of text can now be used for data compression. Here the implementation of models based on recognizing and recording words is considered. Move-to-the-front and several variable-order Markov models have been tested with a number of different data structures, and first the decisions that went into the implementations are discussed and then experimental results are given that show English text being represented in under 2.2 bits per character. Moreover...
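As a rough illustration of the move-to-the-front idea applied to words, the sketch below replaces each previously seen word by its position in a recency list and emits novel words literally. It assumes a plain Python list for the recency structure and omits the arithmetic coding stage entirely; the token format and the function names `mtf_encode` and `mtf_decode` are hypothetical, and none of this is claimed to match the data structures tested in the paper.

```python
def mtf_encode(words):
    """Encode words as recency ranks (int) or literal words (str)."""
    recent = []   # most recently used word is at index 0
    tokens = []
    for word in words:
        if word in recent:
            rank = recent.index(word)
            tokens.append(rank)
            recent.pop(rank)
        else:
            tokens.append(word)   # novel word, sent as a literal
        recent.insert(0, word)    # move (or add) the word to the front
    return tokens


def mtf_decode(tokens):
    """Invert mtf_encode by maintaining the same recency list."""
    recent = []
    words = []
    for token in tokens:
        if isinstance(token, int):
            word = recent.pop(token)
        else:
            word = token
        recent.insert(0, word)
        words.append(word)
    return words


text = "to be or not to be".split()
encoded = mtf_encode(text)
print(encoded)
print(mtf_decode(encoded) == text)
```

Recently used words receive small ranks, so a later coding stage can assign them short codes; the linear list scan here is for clarity rather than speed.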
During a three-day workshop in February 2012, 45 Information Retrieval researchers met to discuss long-range challenges and opportunities within the field. The result of the workshop is a diverse set of research directions, project ideas, and challenge areas. This report describes the workshop format, provides summaries of broad themes that emerged, includes brief descriptions of all the ideas, and provides detailed discussion of six proposals that were voted "most interesting" by the participants. Key themes include the need to: move beyond ranked lists of documents to support...
Conjunctive Boolean queries are a key component of modern information retrieval systems, especially when Web-scale repositories are being searched. A conjunctive query q is equivalent to a |q|-way intersection over ordered sets of integers, where each set represents the documents containing one of the terms, and each integer in each set is an ordinal document identifier. As is the case with many computing applications, there is tension between the way in which the data is represented, and the ways in which it is to be manipulated. In particular, the sets representing the index data for...
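The equivalence stated above, a conjunctive query as a |q|-way intersection over ordered sets of document identifiers, can be sketched as below. This is a minimal illustration that assumes uncompressed, sorted Python lists and a smallest-list-first strategy with binary search; it is not claimed to be one of the representations or algorithms studied in the paper, and the function names are hypothetical.

```python
from bisect import bisect_left


def intersect_two(short, long):
    """Intersect two sorted lists of document identifiers."""
    result = []
    pos = 0
    for doc_id in short:
        # Search only to the right of the previous match position.
        pos = bisect_left(long, doc_id, pos)
        if pos == len(long):
            break
        if long[pos] == doc_id:
            result.append(doc_id)
    return result


def conjunctive_query(postings):
    """Return the documents containing every term (|q|-way intersection)."""
    if not postings:
        return []
    # Start from the shortest list so the candidate set stays small.
    ordered = sorted(postings, key=len)
    candidates = list(ordered[0])
    for plist in ordered[1:]:
        candidates = intersect_two(candidates, plist)
        if not candidates:
            break
    return candidates


lists = [[1, 3, 5, 7, 9], [3, 4, 5, 9, 12], [0, 3, 9, 20]]
print(conjunctive_query(lists))
```

Binary search needs random access to each list, which is where the tension between compact representation and fast manipulation arises: a compressed list usually has to be decoded sequentially.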
Exhaustive evaluation of ranked queries can be expensive, particularly when only a small subset of the overall ranking is required, or when queries contain common terms. This concern gives rise to techniques for dynamic query pruning, that is, methods for eliminating redundant parts of the usual exhaustive evaluation, yet still generating a demonstrably "good enough" set of answers to the query. In this work we propose new pruning methods that make use of impact-sorted indexes. Compared to exhaustive evaluation, the new methods reduce the amount of computation performed, the memory required...
Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own individual assessment strategy....