- Advanced Database Systems and Queries
- Data Management and Algorithms
- Semantic Web and Ontologies
- Recommender Systems and Techniques
- Web Data Mining and Analysis
- Data Quality and Management
- Topic Modeling
- Scientific Computing and Data Management
- Mobile Crowdsensing and Crowdsourcing
- Data Mining Algorithms and Applications
- Big Data and Business Intelligence
- Natural Language Processing Techniques
- Advanced Bandit Algorithms Research
- Expert Finding and Q&A Systems
- Data Stream Mining Techniques
- Advanced Graph Neural Networks
- Ethics and Social Impacts of AI
- Complex Network Analysis Techniques
- Peer-to-Peer Network Technologies
- Open Education and E-Learning
- Privacy-Preserving Technologies in Data
- Multimedia Communication and Technology
- Information Retrieval and Search Behavior
- Advanced Text Analysis Techniques
- Constraint Satisfaction and Optimization
Athena Research and Innovation Center In Information Communication & Knowledge Technologies
2017-2024
Hewlett-Packard (United States)
2010-2021
Arizona State University
2016
Utah State University
2016
University of Massachusetts Amherst
2016
IBM (United States)
2012-2013
IBM Research - Almaden
2010-2012
Stanford University
2007-2011
National and Kapodistrian University of Athens
2004-2011
University of Westminster
2009
Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year we have gathered what we believe to be the largest dataset from a social bookmarking site yet analyzed by academic researchers. Our dataset represents about forty million bookmarks from the social bookmarking site del.icio.us. We contribute a characterization of posts to del.icio.us: how many bookmarks exist (about 115 million), how fast it is growing, and...
To bridge the gap between users and data, numerous text-to-SQL systems have been developed that allow users to pose natural language questions over relational databases. Recently, novel text-to-SQL systems are adopting deep learning methods with very promising results. At the same time, several challenges remain open, making this area an active and flourishing field of research and development. To make real progress in building text-to-SQL systems, we need to de-mystify what has been done, understand how and when each approach can be used, and,...
In recent years, social Web sites have become important components of the Web. With their success, however, has come a growing influx of spam. If left unchecked, spam threatens to undermine resource sharing, interactivity, and openness. This article surveys three categories of potential countermeasures: those based on detection, demotion, and prevention. Although many of these countermeasures have been proposed before for email and Web spam, the authors find that their applicability to social Web sites differs.
Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process involves computing the similarities between pairs of records, which can be very expensive for large datasets. Various blocking techniques can be used to enhance the performance of ER by dividing the records into blocks in multiple ways and only comparing records within the same block. However, most blocking techniques process blocks separately and do not exploit the results of other blocks. In this paper, we propose an iterative blocking framework where the ER results of blocks are reflected...
Recommendation systems have become very popular, but most recommendation methods are 'hard-wired' into the system, making experimentation with and implementation of new recommendation paradigms cumbersome. In this paper, we propose FlexRecs, a framework that decouples the definition of a recommendation process from its execution and supports flexible recommendations over structured data. In FlexRecs, a recommendation approach can be defined declaratively as a high-level parameterized workflow comprising traditional relational operators and new operators that generate or combine...
Preferences have been traditionally studied in philosophy, psychology, and economics and applied to decision making problems. Recently, they have attracted the attention of researchers in other fields, such as databases, where they capture soft criteria for queries. Databases bring a whole fresh perspective to the study of preferences, both computational and representational. From a representational perspective, the central question is how we can effectively represent preferences and incorporate them in database querying. From a computational perspective, we can look at...
Entity Resolution is an inherently quadratic task that typically scales to large data collections through blocking. In the context of highly heterogeneous information spaces, blocking methods rely on redundancy in order to ensure high effectiveness at the cost of lower efficiency (i.e., more comparisons). This effect is partially ameliorated by coarse-grained block processing techniques that discard entire blocks either a-priori or during the resolution process. In this paper, we introduce meta-blocking as a generic...
We increasingly depend on a variety of data-driven algorithmic systems to assist us in many aspects of life. Search engines and recommender systems, among others, are used as sources of information and to help us in making all sorts of decisions, from selecting restaurants and books to choosing friends and careers. This has given rise to important concerns regarding the fairness of such systems. In this work, we aim at presenting a toolkit of definitions, models and methods used for ensuring fairness in rankings and recommendations. Our objectives are threefold:...
As information becomes available in increasing amounts to a wide spectrum of users, the need for a shift towards a more user-centered information access paradigm arises. We develop a personalization framework for database systems based on user profiles and identify the basic architectural modules required to support it. We define a preference model that assigns to each atomic query condition a personal degree of interest and provide a mechanism to compute the degree of interest in any complex query condition based on the degrees of interest in the constituent atomic ones. Preferences are stored in user profiles. At query time,...
Tagging systems allow users to interactively annotate a pool of shared resources using descriptive tags. As tagging systems are gaining in popularity, they become more susceptible to tag spam: misleading tags that are generated in order to increase the visibility of some resources or simply to confuse users. We introduce a framework for modeling tagging systems and user tagging behavior. We also describe a method for ranking documents matching a tag based on taggers' reliability. Using our framework, we study the behavior of existing approaches under malicious attacks and the impact...
Query personalization is the process of dynamically enhancing a query with related user preferences stored in a user profile with the aim of providing personalized answers. The underlying idea is that different users may find different things relevant to a search due to different preferences. Essential ingredients of query personalization are: (a) a model for representing and storing preferences in user profiles, and (b) algorithms for the generation of personalized answers using stored preferences. Modeling the plethora of preference types is a challenge. In this paper, we present a preference model that combines expressivity and concision. In addition, we provide efficient...
Keyword searches are attractive because they facilitate users searching structured databases. On the other hand, tag clouds are popular for navigation and visualization purposes over unstructured data because they can highlight the most significant concepts and hidden relationships in the underlying content dynamically. In this paper, we propose coupling the flexibility of keyword searches over structured data with the summarization and navigation capabilities of tag clouds to help users access a database. We propose using clouds over structured data (data clouds) to summarize the results of keyword searches and to guide users to refine their searches. The cloud...
Many applications offer a form-based environment for naïve users for accessing databases without being familiar with the database schema or a structured query language. User interactions are translated to structured queries and executed. However, as a user is unlikely to know the underlying semantic connections among the fields presented in a form, it is often useful to provide her with a textual explanation of the query. In this paper, we take a graph-based approach to the query translation problem. We represent various forms of structured queries as directed graphs and we annotate...
We examine the creation of a tag cloud for exploring and understanding a set of objects (e.g., web pages, documents). In the first part of our work, we present a formal system model for reasoning about tag clouds. We then present metrics that capture the structural properties of a tag cloud, and we briefly present a set of tag selection algorithms that are used in current sites (e.g., del.icio.us, Flickr, Technorati) or that have been described in recent work. In order to evaluate the results of these algorithms, we devise a novel synthetic user model. This user model is specifically tailored for tag cloud evaluation...
Entity Resolution constitutes a core task for data integration that, due to its quadratic complexity, typically scales to large datasets through blocking methods. These can be configured in two ways. The schema-based configuration relies on schema information in order to select signatures of high distinctiveness and low noise, while the schema-agnostic one treats every token from all attribute values as a signature. The latter approach has significant potential, as it requires no fine-tuning by human experts...
Entity Resolution matches mentions of the same entity. Being an expensive task for large data, its performance can be improved by blocking, i.e., grouping similar entities and comparing only entities in the same group. Blocking improves the run-time of Entity Resolution, but it still involves unnecessary comparisons that limit its performance. Meta-blocking is the process of restructuring a block collection in order to prune such comparisons. Existing unsupervised meta-blocking methods use simple pruning rules, which offer a rather...
Tagging systems allow users to interactively annotate a pool of shared resources using descriptive strings called tags. Tags are used to guide users to interesting resources and help them build communities that share their expertise and resources. As tagging systems are gaining in popularity, they become more susceptible to tag spam: misleading tags that are generated in order to increase the visibility of some resources or simply to confuse users. Our goal is to understand this problem better. In particular, we are interested in answers to questions such as: How many malicious...
How to address user information needs amidst a preponderance of data.
Data is a prevalent part of every business and scientific domain, but its explosive volume and increasing complexity make data querying challenging even for experts. For this reason, numerous text-to-SQL systems have been developed that enable querying relational databases using natural language. The recent advances in deep neural networks, along with the creation of two large datasets specifically made for training text-to-SQL systems, paved the path for a novel and very promising research area. The purpose of this tutorial is to dive into this area, covering...
Widespread use of database systems in modern society has brought the need to provide inexperienced users with the ability to easily search a database with no specific knowledge of a query language. Several recent research efforts have focused on supporting keyword-based searches over relational databases. This paper presents an alternative proposal and introduces the idea of précis queries. These are free-form queries whose answer (a précis) is a synthesis of results, containing not only information directly related to the query selections...