- Data Management and Algorithms
- Advanced Database Systems and Queries
- Peer-to-Peer Network Technologies
- Caching and Content Delivery
- Advanced Data Storage Technologies
- Data Mining Algorithms and Applications
- Distributed systems and fault tolerance
- Advanced Image and Video Retrieval Techniques
- Cloud Computing and Resource Management
- Algorithms and Data Compression
- Graph Theory and Algorithms
- Human Mobility and Location-Based Analysis
- Distributed and Parallel Computing Systems
- Cryptography and Data Security
- Privacy-Preserving Technologies in Data
- Image Retrieval and Classification Techniques
- Mobile Agent-Based Network Management
- Geographic Information Systems Studies
- Complex Network Analysis Techniques
- Optimization and Search Problems
- Web Data Mining and Analysis
- Recommender Systems and Techniques
- Semantic Web and Ontologies
- Internet Traffic Analysis and Secure E-voting
- Data Stream Mining Techniques
University of Manchester
2023-2025
National University of Singapore
2015-2024
Singapore Management University
2018
Duke-NUS Medical School
2016
Universiti Tunku Abdul Rahman
2016
UNSW Sydney
2012
University of California, Santa Barbara
2011
Singapore-MIT Alliance for Research and Technology
2005-2011
Singapore General Hospital
1987-2009
University of Michigan
2006-2007
Mobile devices equipped with positioning capabilities (e.g., GPS) can ask location-dependent queries to Location Based Services (LBS). To protect privacy, the user location must not be disclosed. Existing solutions utilize a trusted anonymizer between the users and the LBS. This approach has several drawbacks: (i) All users must trust the third-party anonymizer, which is a single point of attack. (ii) A large number of cooperating, trustworthy users is needed. (iii) Privacy is guaranteed only for a single snapshot of user locations; users are not protected...
Blockchain technologies are taking the world by storm. Public blockchains, such as Bitcoin and Ethereum, enable secure peer-to-peer applications like crypto-currency or smart contracts. Their security and performance are well studied. This paper concerns recent private blockchain systems designed with stronger security (trust) assumption and performance requirement. These systems target and aim to disrupt applications which have so far been implemented on top of database systems, for example banking, finance and trading applications. Multiple platforms...
In this article, we present an efficient B+-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on a space- or data-partitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed into a single dimensional value based on their similarity with respect to the reference point. This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference points adapts the index...
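The mapping described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a sorted Python list stands in for the B+-tree, each point is assigned to its nearest reference point (one possible data-partitioning strategy), and the class name `IDistanceSketch`, the constant `c`, and the radius-doubling loop in `knn` are assumptions made for the example. The key `i * c + dist(p, ref_i)` keeps each partition in its own key range as long as `c` exceeds every point-to-reference distance.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


class IDistanceSketch:
    """Toy one-dimensional mapping in the spirit of iDistance (hypothetical names)."""

    def __init__(self, points: List[Point], refs: List[Point], c: float):
        # Assumption: c is larger than any point-to-reference distance,
        # so the key ranges of different partitions never overlap.
        self.refs = [tuple(r) for r in refs]
        self.c = c
        entries: List[Tuple[float, tuple]] = []
        for p in points:
            # Assign the point to the partition of its nearest reference point.
            i = min(range(len(self.refs)), key=lambda j: math.dist(p, self.refs[j]))
            entries.append((i * c + math.dist(p, self.refs[i]), tuple(p)))
        entries.sort()  # sorted list used as a stand-in for a B+-tree
        self.keys = [key for key, _ in entries]
        self.points = [p for _, p in entries]

    def range_search(self, q: Point, r: float) -> List[tuple]:
        """Return all indexed points within distance r of q."""
        hits = []
        for i, ref in enumerate(self.refs):
            d = math.dist(q, ref)
            # By the triangle inequality, a match in partition i has a
            # distance to ref_i that lies in [d - r, d + r].
            start = bisect.bisect_left(self.keys, i * self.c + max(d - r, 0.0))
            if d + r >= self.c:
                # Clip at the partition boundary so the next partition is not scanned.
                end = bisect.bisect_left(self.keys, (i + 1) * self.c)
            else:
                end = bisect.bisect_right(self.keys, i * self.c + d + r)
            for p in self.points[start:end]:
                if math.dist(q, p) <= r:  # filter out false positives
                    hits.append(p)
        return hits

    def knn(self, q: Point, k: int, r0: float = 1.0) -> List[tuple]:
        """k nearest neighbours found by enlarging the search radius."""
        k = min(k, len(self.points))
        if k == 0:
            return []
        r = r0
        while True:
            hits = self.range_search(q, r)
            if len(hits) >= k:
                # Every point outside the radius is farther than these hits.
                return sorted(hits, key=lambda p: math.dist(q, p))[:k]
            r *= 2  # simple doubling; chosen here only for brevity


data = [(1, 1), (2, 2), (9, 9), (8, 7), (5, 5), (1, 9)]
index = IDistanceSketch(data, refs=[(0, 0), (10, 10)], c=100.0)
nearest = index.knn((7, 8), k=2)
```

Re-running the range search from scratch after each radius increase keeps the sketch short; an index built for production would more plausibly extend the previous one-dimensional scan outward instead of repeating it.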
Influence Maximization (IM), which selects a set of k users (called seed set) from a social network to maximize the expected number of influenced users (called influence spread), is a key algorithmic problem in social influence analysis. Due to its immense application potential and enormous technical challenges, IM has been extensively studied in the past decade. In this paper, we survey and synthesize a wide spectrum of existing studies on IM from an algorithmic perspective, with a special focus on the following key aspects: (1) a review of well-accepted diffusion models that capture...
Given a d-dimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exist any point that can dominate it. Skyline queries, which return skyline points, are useful in many decision making applications. Unfortunately, as the number of dimensions increases, the chance of one point dominating another point is very low. As such, the number of skyline points becomes too numerous to offer any interesting insights. To find more important and meaningful skyline points in high dimensional space, we propose a new concept,...
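The dominance and skyline definitions stated in the abstract above can be written down directly. The sketch below covers only those two definitions, not the new concept the abstract goes on to propose; it assumes smaller values are better in every dimension, and the function names are chosen for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in at least one.

    Assumption for this example: smaller values are better.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Quadratic-time skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


hotels = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]  # e.g. (price, distance)
result = skyline(hotels)
```

Here `(3, 3)` and `(4, 4)` are dominated by `(2, 2)`, while the other three points are mutually incomparable. With more dimensions, fewer pairs satisfy `dominates`, which is the effect the abstract describes: the skyline grows until it is no longer selective.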
Growing main memory capacity has fueled the development of in-memory big data management and processing. By eliminating the disk I/O bottleneck, it is now possible to support interactive data analytics. However, in-memory systems are much more sensitive to other sources of overhead that do not matter in traditional I/O-bounded disk-based systems. Some issues such as fault-tolerance and consistency are also more challenging to handle in an in-memory environment. We are witnessing a revolution in the design of database systems that exploits main memory as its data storage layer. Many of these...
Influence maximization, whose objective is to select k users (called seeds) from a social network such that the number of users influenced by the seeds (called influence spread) is maximized, has attracted significant attention due to its widespread applications, such as viral marketing and rumor control. However, in real-world social networks, users have their own interests (which can be represented as topics) and are more likely to be influenced by their friends (or friends' friends) with similar topics. We can increase the influence spread by taking topics into consideration. To address...
Blockchain technologies are taking the world by storm. Public blockchains, such as Bitcoin and Ethereum, enable secure peer-to-peer applications like crypto-currency or smart contracts. Their security and performance are well studied. This paper concerns recent private blockchain systems designed with stronger security (trust) assumption and performance requirement. These systems target and aim to disrupt applications which have so far been implemented on top of database systems, for example banking and finance applications. Multiple platforms...
Crowdsourcing is widely accepted as a means for resolving tasks that machines are not good at. Unfortunately, crowdsourcing may yield relatively low-quality results if there is no proper quality control. Although previous studies attempt to eliminate "bad" workers by using qualification tests, the accuracies estimated from qualifications may not be accurate, because workers have diverse accuracies across tasks. Thus, the quality of the results could be further improved by selectively assigning tasks to the workers who are well acquainted with the tasks. To this end, we propose an adaptive...
We present the design and evaluation of PeerDB, a peer-to-peer (P2P) distributed data sharing system. PeerDB distinguishes itself from existing P2P systems in several ways. First, it is a full-fledged data management system that supports fine-grain content-based searching. Second, it facilitates sharing of data without a shared schema. Third, it combines the power of mobile agents into P2P systems to perform operations at peers' sites. Fourth, the PeerDB network is self-configurable, i.e., a node can dynamically optimize the set of peers that it can communicate directly with...
In data publishing, the owner delegates the role of satisfying user queries to a third-party publisher. As the publisher may be untrusted or susceptible to attacks, it could produce incorrect query results. In this paper, we introduce a scheme for users to verify that their query results are complete (i.e., no qualifying tuples are omitted) and authentic (i.e., all the result values originated from the owner). The scheme supports range selection on key and non-key attributes, project as well as join queries on relational databases. Moreover, the proposed scheme complies...
Edge computing pushes application logic and the underlying data to the edge of the network, with the aim of improving availability and scalability. As the edge servers are not necessarily secure, there must be provisions for validating their outputs. This paper proposes a mechanism that creates a verification object (VO) for checking the integrity of each query result produced by an edge server - that values in the result tuples are not tampered with, and that no spurious tuples are introduced. The primary advantages of our proposed mechanism are that the VO is independent of the database size, and that relational...
Although influence maximization, which selects a set of users in a social network to maximize the expected number of users influenced by the selected users (called influence spread), has been extensively studied, existing works neglected the fact that the location information can play an important role in influence maximization. Many real-world applications such as location-aware word-of-mouth marketing have a location-aware requirement. In this paper we study the location-aware influence maximization problem. One big challenge is to develop an efficient scheme that offers wide influence spread. To address this challenge, we propose two greedy algorithms...
Advertising in social networks has become a multi-billion-dollar industry. A main challenge is to identify key influencers who can effectively contribute to the dissemination of information. Although the influence maximization problem, which finds a seed set of k most influential users based on certain propagation models, has been well studied, it is not target-aware and cannot be directly applied to online advertising. In this paper, we propose a new problem, named Keyword-Based Targeted Influence Maximization (KB-TIM),...
Information networks, such as social media and email networks, often contain sensitive information. Releasing such network data could seriously jeopardize individual privacy. Therefore, we need to sanitize network data before the release. In this paper, we present a novel data sanitization solution that infers a network's structure in a differentially private manner. We observe that, by estimating the connection probabilities between vertices instead of considering the observed edges directly, the noise scale enforced by differential privacy can...
In recent decades, we have witnessed the rapidly growing popularity of location-based systems. Three types of location-based queries on road networks, the single-pair shortest path query, the k nearest neighbor (kNN) query, and the keyword-based kNN query, are widely used in location-based systems. Inspired by the R-tree, we propose a height-balanced and scalable index, namely G-tree, to efficiently support these queries. The space complexity of G-tree is O(|V|log|V|) where |V| is the number of vertices in the road network. Unlike previous works that support these queries separately, G-tree supports all these queries within one...
Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date tweets such that their locations are close to a user-specified location and their texts are interesting to users. For example, a user may want to be updated with tweets near her home on the topic "food poisoning vomiting." We consider the Temporal Spatial-Keyword Top-k Subscription (TaSK) query. Given a TaSK query, we continuously maintain...
Deep learning has recently become very popular on account of its incredible success in many complex data-driven applications, including image classification and speech recognition. The database community has worked on data-driven applications for many years, and therefore should be playing a lead role in supporting this new wave. However, databases and deep learning are different in terms of both techniques and applications. In this paper, we discuss research problems at the intersection of the two fields. In particular, we discuss possible improvements...
In this big data era, huge amounts of spatial documents have been generated every day through various location-based services. Top-k spatial keyword search is an important approach to exploring useful information from a spatial database. It retrieves k documents based on a ranking function that takes into account both textual relevance (similarity between the query and document keywords) and spatial relevance (distance between the query and document locations). Various hybrid indexes have been proposed in recent years which mainly combine the R-tree and the inverted index so that spatial pruning and textual pruning can be executed...
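The ranking function mentioned in the abstract above combines a textual and a spatial component. The abstract does not give its exact form, so the sketch below assumes one common shape: a weighted sum controlled by a hypothetical parameter `alpha`, Jaccard similarity for the text part, and distance normalised by an assumed `max_dist` for the spatial part. It scans all documents, whereas the hybrid indexes the abstract refers to exist precisely to avoid such a scan.

```python
import heapq
import math
from typing import List, Set, Tuple

Doc = Tuple[str, Tuple[float, float], Set[str]]  # (id, location, keywords)


def text_relevance(query_kw: Set[str], doc_kw: Set[str]) -> float:
    """Jaccard similarity between query and document keywords (assumed measure)."""
    union = query_kw | doc_kw
    return len(query_kw & doc_kw) / len(union) if union else 0.0


def spatial_relevance(q_loc, d_loc, max_dist: float) -> float:
    """1.0 at the query location, falling linearly to 0.0 at max_dist or beyond."""
    return 1.0 - min(math.dist(q_loc, d_loc), max_dist) / max_dist


def top_k(docs: List[Doc], q_loc, q_kw: Set[str], k: int,
          alpha: float = 0.5, max_dist: float = 10.0) -> List[Doc]:
    """Brute-force top-k by a weighted sum of textual and spatial relevance."""
    def score(doc: Doc) -> float:
        _, loc, kw = doc
        return (alpha * text_relevance(q_kw, kw)
                + (1 - alpha) * spatial_relevance(q_loc, loc, max_dist))
    return heapq.nlargest(k, docs, key=score)


docs = [
    ("a", (0, 0), {"pizza", "cafe"}),
    ("b", (1, 1), {"pizza"}),
    ("c", (9, 9), {"pizza"}),
    ("d", (0, 1), {"books"}),
]
best = [doc_id for doc_id, _, _ in top_k(docs, (0, 0), {"pizza"}, k=2)]
```

Document `b` outranks `a` because its exact keyword match outweighs being slightly farther away, while `c` matches the keyword but is too distant to reach the top two.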
Deep learning has shown outstanding performance in various machine learning tasks. However, the deep complex model structure and massive training data make it expensive to train. In this paper, we present a distributed deep learning system, called SINGA, for training big models over large datasets. An intuitive programming model based on the layer abstraction is provided, which supports a variety of popular deep learning models. The SINGA architecture supports both synchronous and asynchronous training frameworks. Hybrid training frameworks can also be customized to achieve good...
We introduce ChronoStream, a distributed system specifically designed for elastic stateful stream computation in the cloud. ChronoStream treats internal state as a first-class citizen and aims at providing flexible elastic support in both vertical and horizontal dimensions to cope with workload fluctuation and dynamic resource reclamation. With a clear separation between application-level parallelism and OS-level execution concurrency, ChronoStream enables transparent scaling and failure recovery by eliminating any network I/O...