- Web Data Mining and Analysis
- Data Management and Algorithms
- Algorithms and Data Compression
- Advanced Database Systems and Queries
- Complex Network Analysis Techniques
- Information Retrieval and Search Behavior
- Semantic Web and Ontologies
- Caching and Content Delivery
- Recommender Systems and Techniques
- Web Visibility and Informetrics
- Data Mining Algorithms and Applications
- Natural Language Processing Techniques
- Semigroups and Automata Theory
- Network Packet Processing and Optimization
- Reading and Literacy Development
- Topic Modeling
- Ethics and Social Impacts of AI
- DNA and Biological Computing
- Optimization and Search Problems
- Advanced Text Analysis Techniques
- Misinformation and Its Impacts
- Text Readability and Simplification
- Advanced Image and Video Retrieval Techniques
- Data Visualization and Analytics
- Data Quality and Management
Universitat Pompeu Fabra
2015-2025
Silicon Valley University
2018-2025
Northeastern University
2018-2025
University of Chile
2006-2024
Universidad del Noreste
2019-2024
Ospedale Policlinico San Martino
2023
University of California, San Francisco
2023
Consorci Institut D'Investigacions Biomediques August Pi I Sunyer
2023
Intel (United States)
2023
Eastern University
2021-2022
Contents: Preface; Acknowledgements; 1 Introduction; 2 User Interfaces for Search, by Marti Hearst; 3 Modeling; 4 Retrieval Evaluation; 5 Relevance Feedback and Query Expansion; 6 Documents: Languages & Properties, with Gonzalo Navarro and Nivio Ziviani; 7 Queries: Languages & Properties, with Gonzalo Navarro; 8 Text Classification, with Marcos Gonçalves; 9 Indexing and Searching, with Gonzalo Navarro; 10 Parallel and Distributed IR, with Eric Brown; 11 Web Retrieval, with Yoelle Maarek; 12 Web Crawling, with Carlos Castillo; 13 Structured Text Retrieval, with Mounia Lalmas; 14 Multimedia Information Retrieval, by Dulce Ponceleón and Malcolm Slaney; 15...
A new approach to text searching. Authors: Ricardo Baeza-Yates (Universidad de Chile, Depto. de Ciencias de la Computacion, Blanco Encalada 2120, Santiago, Chile) and Gaston H. Gonnet (Informatik, Swiss Technological Institute in Zurich, Switzerland). Communications of the ACM, Volume 35, Issue 10, Oct. 1992, pp. 74–82. https://doi.org/10.1145/135239.135243. Published: 01 October 1992.
Bias in Web data and use taints the algorithms behind Web-based applications, delivering equally biased results.
In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed after them. We first propose a novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph. We then analyze the graphs produced by our query log, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs...
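To make the idea of representing queries through the query-click bipartite graph concrete, here is a minimal sketch. It assumes that each query is a vector of click counts over URLs and that two queries are linked when the cosine similarity of their vectors is positive; the abstract does not state the weighting or the similarity measure, so both are illustrative assumptions, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def query_vectors(clicks):
    """Map each query to a sparse vector {url: click count} (assumed weighting)."""
    vectors = defaultdict(lambda: defaultdict(int))
    for query, url in clicks:
        vectors[query][url] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[url] for url, weight in u.items() if url in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def query_graph(clicks, threshold=0.0):
    """Return {(query_a, query_b): similarity} for pairs above the threshold."""
    vectors = query_vectors(clicks)
    queries = sorted(vectors)
    edges = {}
    for i, a in enumerate(queries):
        for b in queries[i + 1:]:
            similarity = cosine(vectors[a], vectors[b])
            if similarity > threshold:
                edges[(a, b)] = similarity
    return edges


# Hypothetical click log: (query, clicked URL) pairs.
sample_clicks = [
    ("jaguar car", "jaguar.example"),
    ("jaguar price", "jaguar.example"),
    ("jaguar cat", "zoo.example"),
]
sample_edges = query_graph(sample_clicks)
```

In this toy log the two queries that led to the same URL end up connected, while the query about the animal stays isolated. The pairwise loop is quadratic in the number of queries, so a log of millions of queries would need an inverted index from URLs to queries instead.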
In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n >> k candidates, maximizing utility (i.e., select the "best" candidates) subject to group fairness criteria. Our ranked group fairness definition extends group fairness using the standard notion of protected groups and is based on ensuring that the proportion of protected candidates in every prefix of the top-k ranking remains statistically above or indistinguishable from a given minimum. Utility is operationalized in two ways: (i) every candidate included...
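The prefix condition described above can be sketched as follows. This is a minimal illustration that assumes "statistically above or indistinguishable from a given minimum" is tested with a one-sided binomial test: for a prefix of length i, minimum proportion p and significance level alpha, the required number of protected candidates is the smallest t whose binomial CDF exceeds alpha. The abstract does not spell out the test, so the choice of test, the function names and the parameters `p` and `alpha` are assumptions.

```python
from math import comb


def binom_cdf(t, n, p):
    """P[X <= t] for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(t + 1))


def min_protected(prefix_len, p, alpha):
    """Smallest count of protected candidates not rejected at level alpha."""
    t = 0
    while binom_cdf(t, prefix_len, p) <= alpha:
        t += 1
    return t


def min_protected_table(k, p, alpha):
    """Required protected counts for every prefix length 1..k."""
    return [min_protected(i, p, alpha) for i in range(1, k + 1)]


def is_fair(ranking, p, alpha):
    """Check every prefix of a ranking given as booleans (True = protected)."""
    seen = 0
    for i, is_protected in enumerate(ranking, start=1):
        seen += is_protected
        if seen < min_protected(i, p, alpha):
            return False
    return True


table = min_protected_table(10, 0.5, 0.1)
```

With `p = 0.5` and `alpha = 0.1`, the first three positions carry no requirement, and at least one protected candidate is needed by position four. The sketch only verifies a given ranking; it does not construct one or optimize utility.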
We present a fast compression technique for natural language texts. The novelties are that (1) decompression of arbitrary portions of the text can be done very efficiently, (2) exact search for words and phrases can be done on the compressed text directly, using any known sequential pattern-matching algorithm, and (3) word-based approximate and extended search can also be done efficiently without any decoding. The compression scheme uses a semistatic word-based model and a Huffman code where the coding alphabet is byte-oriented rather than bit-oriented. We compress typical English texts to...
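As a rough illustration of a word-based Huffman code over a byte alphabet, the sketch below builds a plain radix-ary Huffman tree whose symbols are words and whose codewords are byte sequences. It is not presented as the paper's exact coding scheme: it omits the handling of separators and any provision for searching the compressed text directly, and all function names and the `radix` parameter are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(words, radix=256):
    """Assign each distinct word a byte-string codeword via radix-ary Huffman."""
    tie = count()  # unique tie-breaker so payloads are never compared
    heap = [(freq, next(tie), word) for word, freq in Counter(words).items()]
    # Pad with zero-frequency dummies so every merge takes exactly `radix` nodes.
    while len(heap) > 1 and (len(heap) - 1) % (radix - 1) != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(min(radix, len(heap)))]
        total = sum(child[0] for child in children)
        heapq.heappush(heap, (total, next(tie), [c[2] for c in children]))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):  # internal node: one byte per branch
            for byte, child in enumerate(node):
                walk(child, prefix + bytes([byte]))
        elif node is not None:  # leaf holding a real word (dummies are None)
            codes[node] = prefix or b"\x00"

    walk(heap[0][2], b"")
    return codes


def encode(words, codes):
    """Concatenate the codewords of a word sequence."""
    return b"".join(codes[word] for word in words)


def decode(data, codes):
    """Decode byte by byte; works because the code is prefix-free."""
    inverse = {code: word for word, code in codes.items()}
    out, current = [], b""
    for byte in data:
        current += bytes([byte])
        if current in inverse:
            out.append(inverse[current])
            current = b""
    return out


text = "to be or not to be that is the question".split()
small_codes = build_codes(text, radix=4)  # small radix to force multi-byte codes
byte_codes = build_codes(text)            # radix 256: one byte per word here
```

Because codewords are whole bytes, decoding needs no bit manipulation, which is the practical appeal of a byte-oriented alphabet. The model is semistatic in the sense that frequencies are gathered in a first pass over the text before any word is encoded.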
Prunus persica has been proposed as a genomic model for deciduous trees and the Rosaceae family. Optimized protocols for RNA isolation are necessary to further advance studies in this species such that functional genomics analyses may be performed. Here we present an optimized protocol to rapidly and efficiently purify high quality total RNA from peach fruits (Prunus persica). Isolating high-quality RNA from fruit tissue is often difficult due to large quantities of polysaccharides and polyphenolic compounds that accumulate...
In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes...
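The static vs. dynamic distinction can be illustrated with a small simulation. The sketch assumes a static cache filled once with the most frequent queries of a training log and a dynamic cache that uses LRU eviction; LRU is chosen only as a familiar example of a dynamic policy, and the function names and the toy logs are hypothetical. It covers caching of query results only, not posting lists.

```python
from collections import Counter, OrderedDict


def static_hit_rate(train_log, test_log, capacity):
    """Hit rate of a cache fixed to the `capacity` most frequent training queries."""
    cached = {query for query, _ in Counter(train_log).most_common(capacity)}
    return sum(query in cached for query in test_log) / len(test_log)


def lru_hit_rate(test_log, capacity):
    """Hit rate of a dynamic cache that evicts the least recently used query."""
    cache = OrderedDict()
    hits = 0
    for query in test_log:
        if query in cache:
            hits += 1
            cache.move_to_end(query)       # mark as most recently used
        else:
            cache[query] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # drop the least recently used entry
    return hits / len(test_log)


# Toy logs: each item stands for one submitted query.
train = ["a", "a", "a", "b", "b", "c"]
test = ["a", "b", "a", "c", "a", "d"]
static_rate = static_hit_rate(train, test, capacity=2)
dynamic_rate = lru_hit_rate(test, capacity=2)
```

On these toy logs the static cache hits on four of six queries and the LRU cache on two of six, but the outcome depends entirely on how stable query popularity is between the training and test periods, so no general conclusion should be drawn from it.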
Time is an important dimension of any information space and can be very useful in information retrieval. Current information retrieval systems and applications do not take advantage of all the time information available in the content of documents to provide better search results and user experience. In this paper we show some areas that can benefit from exploiting such temporal information.
Given the large number of installed apps and the limited screen size of mobile devices, it is often tedious for users to search for the app they want to use. Although some mobile OSs provide categorization schemes that enhance the visibility of useful apps among those installed, the emerging category of homescreen apps aims to take one step further by automatically organizing the installed apps in a more intelligent and personalized way. In this paper, we study how to improve homescreen apps' usage experience through a prediction mechanism that allows showing the user which app she is going to use in the immediate...
Despite some key problems, big data could fundamentally change scientific research methodology and how businesses develop products provide services.
Around 10% of the people have dyslexia, a neurological disability that impairs a person's ability to read and write. There is evidence that the presentation of the text has a significant effect on a text's accessibility for people with dyslexia. However, to the best of our knowledge, there are no experiments that objectively measure the impact of the font type on reading performance. In this paper, we present the first experiment that uses eye-tracking to measure the effect of font type on reading speed. Using a within-subject design, 48 subjects with dyslexia read 12 texts with 12 different fonts. Sans serif, monospaced...
Time is an important dimension of any information space and can be very useful in information retrieval, in particular for the clustering and exploration of search results. Search result clustering is a feature integrated in some of today's search engines, allowing users to further explore search results. However, only little work has been done on exploiting temporal information embedded in documents for the presentation, clustering, and exploration of search results along well-defined timelines. In this paper, we present an add-on to traditional information retrieval applications in which we exploit various temporal information associated with documents to cluster...