- Privacy-Preserving Technologies in Data
- Data Management and Algorithms
- Advanced Database Systems and Queries
- Cryptography and Data Security
- Data Mining Algorithms and Applications
- Cloud Computing and Resource Management
- Stochastic Gradient Optimization Techniques
- Peer-to-Peer Network Technologies
- Rough Sets and Fuzzy Logic
- Mobile Crowdsensing and Crowdsourcing
- Data Quality and Management
- Data Stream Mining Techniques
- Caching and Content Delivery
- Advanced Image and Video Retrieval Techniques
- Human Mobility and Location-Based Analysis
- Network Security and Intrusion Detection
- Advanced Graph Neural Networks
- Recommender Systems and Techniques
- Bayesian Modeling and Causal Inference
- Topic Modeling
- Privacy, Security, and Data Protection
- Constraint Satisfaction and Optimization
- Web Data Mining and Analysis
- Advanced Data Storage Technologies
- Natural Language Processing Techniques
Henan University of Economics and Law
2007-2024
Tianjin University
2024
Zhejiang Gongshang University
2023
Tsinghua University
2023
Durham University
2022
Communication University of China
2020-2021
Guangdong University of Technology
2015-2020
State Grid Corporation of China (China)
2020
Henan Institute of Science and Technology
2020
China Agricultural University
2019
Given a d-dimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exist any point that can dominate it. Skyline queries, which return skyline points, are useful in many decision making applications. Unfortunately, as the number of dimensions increases, the chance of one point dominating another point is very low. As such, the skyline points become too numerous to offer any interesting insights. To find more important and meaningful skyline points in high dimensional space, we propose a new concept,...
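The dominance and skyline definitions in this abstract can be illustrated with a small sketch. It is not the method proposed in the paper (the new concept is cut off in the snippet); it is a naive quadratic skyline computation, and it assumes that smaller values are better in every dimension. The names `dominates` and `skyline` are chosen here for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p must be no worse
    than q everywhere and strictly better in at least one dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    hotels = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]  # (price, distance)
    print(skyline(hotels))
```

With more dimensions, fewer pairs satisfy `dominates`, so the list returned by `skyline` tends to grow, which is the problem the abstract describes.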
ε-differential privacy is the state-of-the-art model for releasing sensitive information while protecting privacy. Numerous methods have been proposed to enforce ε-differential privacy in various analytical tasks, e.g., regression analysis. Existing solutions for regression analysis, however, are either limited to non-standard types of regression or unable to produce accurate regression results. Motivated by this, we propose the Functional Mechanism, a differentially private method designed for a large class of optimization-based analyses. The main idea is to enforce ε-differential privacy by perturbing...
Event-based social networks (EBSNs), such as Meetup and Plancast, which offer platforms for users to plan, arrange, and publish events, have gained increasing popularity and rapid growth. EBSNs capture not only the online social relationship, but also the offline interactions from events. They contain rich heterogeneous information, including multiple types of entities, such as users, events, groups and tags, and their interaction relations. Three recommendation tasks, namely recommending groups to users, recommending tags to groups, and recommending events to users, have been explored in three...
Histograms are the workhorse of data mining and analysis. This paper considers the problem of publishing histograms under differential privacy, one of the strongest privacy models. Existing differentially private histogram publication schemes have shown that clustering (or grouping) is a promising idea to improve the accuracy of sanitized histograms. However, none of them fully exploits the benefit of clustering. In this paper, we introduce a new clustering framework. It features a sophisticated evaluation of the trade-off between...
The outbreak of a novel coronavirus (COVID-19) generated an outbreak of public opinions in the Chinese Sina-microblog. To help in designing effective communication strategies during a major public health emergency, we propose a multiple-information susceptible-discussing-immune (M-SDI) model in order to understand the patterns of key information propagation on social networks. We develop the M-SDI model, based on the public discussion quantity, and take into account the behavior that users may re-enter another related topic or Weibo after discussing...
Data privacy has been an important research topic in the security, theory and database communities in the last few decades. However, many existing studies have restrictive assumptions regarding the adversary's prior knowledge, meaning that they preserve individuals' privacy only when the adversary has rather limited background information about the sensitive data, or only uses certain kinds of attacks. Recently, differential privacy has emerged as a new paradigm for privacy protection with very conservative assumptions about the adversary's prior knowledge. Since its proposal, differential privacy had...
Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works by injecting random noise into each query result, such that it is provably hard for the adversary to infer the presence or absence of any individual record from the published noisy results. The main objective in differentially private query processing is to maximize the accuracy of the query results, while satisfying the privacy guarantees. Previous work, notably the matrix mechanism [16], has suggested that processing a batch of correlated queries as a whole can...
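The noise-injection idea described in this abstract can be sketched with the standard Laplace mechanism applied to a single count query. This is a textbook baseline, not the batch-processing technique the paper goes on to propose, and the function names and the `rng` parameter are assumptions made for illustration. A count query has sensitivity 1, so the noise scale is `1 / epsilon`.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(
    records: Iterable[T],
    predicate: Callable[[T], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Answer a count query with Laplace noise calibrated to epsilon.

    Adding or removing one record changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29, 48]
    print(noisy_count(ages, lambda a: a >= 40, epsilon=0.5))
```

A smaller `epsilon` gives stronger privacy but a larger noise scale, which is the accuracy trade-off the abstract refers to.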
In a data stream management system (DSMS), users register continuous queries, and receive result updates as data arrive and expire. We focus on applications with real-time constraints, in which the user must receive each result update within a given period after the update occurs. To handle fast data, the DSMS is commonly placed on top of a cloud infrastructure. Because stream properties such as arrival rates can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time response. It is essential, for the existing...
Heterogeneous networks refer to the networks comprising multiple types of entities as well as their interaction relationships. They arise in a great variety of domains, for example, event-based social networks Meetup and Plancast, and DBLP. Recommendation is a useful task in these heterogeneous network systems. Although many recommendation algorithms are proposed for heterogeneous data, none of them is able to explicitly model the influence strength between different types of entities, which is useful not only for achieving higher recommendation accuracy but also for better understanding the role...
The vibration sensor is becoming an essential part of the Internet of Things (IoT), fueled by the quickly evolving technology improving the measurement accuracy and lowering the hardware cost. Vibration sensors physically attach to the core equipment in control and manufacturing systems, e.g., motors and tubes, providing key insight into the running status of these devices. Massive readings from the vibration sensors, however, pose new technical challenges to the analytical system, due to the non-continuous sampling strategy for energy saving, as well...
ε-differential privacy is rapidly emerging as the state-of-the-art scheme for protecting individuals' privacy in published analysis results over sensitive data. The main idea is to perform random perturbations on the analysis results, such that any individual's presence in the data has negligible impact on the randomized results. This paper focuses on analysis tasks that involve model fitting, i.e., finding the parameters of a statistical model that best fit the dataset. For such tasks, the quality of the differentially private results depends upon both the effectiveness of the model fitting...
ε-differential privacy is the state-of-the-art model for releasing sensitive information while protecting privacy. Numerous methods have been proposed to enforce ε-differential privacy in various analytical tasks, e.g., regression analysis. Existing solutions for regression analysis, however, are either limited to non-standard types of regression or unable to produce accurate regression results. Motivated by this, we propose the Functional Mechanism, a differentially private method designed for a large class of optimization-based analyses. The main...
In a stream data analytics system, input data arrive continuously and trigger the processing and updating of analytics results. We focus on applications with real-time constraints, in which any data unit must be completely processed within a given time duration. To handle fast data, it is common to place the stream data analytics system on top of a cloud infrastructure. Because stream properties, such as arrival rates, can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time responses. It is essential, for existing systems or...
A huge amount of data with both space and text information, e.g., geo-tagged tweets, is flooding the Internet. Such a spatio-textual data stream contains valuable information for millions of users with various interests in different keywords and locations. Publish/subscribe systems enable efficient and effective information distribution by allowing users to register continuous queries with both spatial and textual constraints. However, the explosive growth of data scale and user base has posed challenges to the existing centralized publish/subscribe systems for spatiotextual...
Existing work in the skyline literature focuses on optimizing the processing cost. This paper aims at minimization of the communication overhead in client-server architectures, where a server continuously maintains the skyline of dynamic objects. Our first contribution is a Filter method that avoids transmission of updates from objects that cannot influence the skyline. Specifically, each object is assigned a filter so that it needs to issue an update to the server only if it violates its filter. Filter achieves significant savings over the naive approach of transmitting...
Differential privacy provides the first theoretical foundation with provable privacy guarantee against adversaries with arbitrary prior knowledge. The main idea to achieve differential privacy is to inject random noise into statistical query results. Besides correctness, the most important goal in the design of a differentially private mechanism is to reduce the effect of random noise, ensuring that the noisy results can still be useful. This paper proposes the compressive mechanism, a novel solution on the basis of state-of-the-art compression...
Key-based workload partitioning is a common strategy used in parallel stream processing engines, enabling effective key-value tuple distribution over worker threads in a logical operator. It is likely to generate poor balancing performance when workload variance occurs on the incoming data stream. This paper presents a new key-based workload partitioning framework, with practical algorithms to support dynamic workload assignment for stateful operators. The framework combines hash-based and explicit key-based routing strategies for workload distribution, which...
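The combination of hash-based and explicit routing mentioned in this abstract can be pictured with a minimal sketch: keys go to a worker by hash unless an explicit routing-table entry overrides the hash. The snippet is truncated before the paper's algorithms are described, so the class `KeyRouter`, its methods, and the use of CRC32 as the hash are all assumptions made for illustration, not the paper's design.

```python
import zlib
from typing import Dict


class KeyRouter:
    """Route keys to workers by hash, with explicit per-key overrides."""

    def __init__(self, num_workers: int) -> None:
        if num_workers <= 0:
            raise ValueError("num_workers must be positive")
        self.num_workers = num_workers
        # Explicit routing table: only keys moved off their hash worker.
        self.explicit: Dict[str, int] = {}

    def hash_worker(self, key: str) -> int:
        """Default placement: a stable hash of the key modulo worker count."""
        return zlib.crc32(key.encode("utf-8")) % self.num_workers

    def route(self, key: str) -> int:
        """Explicit entries take precedence over the hash placement."""
        return self.explicit.get(key, self.hash_worker(key))

    def reassign(self, key: str, worker: int) -> None:
        """Move a key to another worker, e.g. to relieve an overloaded one."""
        if not 0 <= worker < self.num_workers:
            raise ValueError("worker index out of range")
        if worker == self.hash_worker(key):
            # Back on its hash worker: no table entry is needed.
            self.explicit.pop(key, None)
        else:
            self.explicit[key] = worker


if __name__ == "__main__":
    router = KeyRouter(num_workers=4)
    home = router.route("hot-key")
    router.reassign("hot-key", (home + 1) % 4)
    print(home, router.route("hot-key"), len(router.explicit))
```

Keeping only the overridden keys in the table keeps routing state small. For stateful operators, a real system would also have to migrate the state of a reassigned key, which this sketch leaves out.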
The skyline of a d-dimensional dataset consists of all points not dominated by others. The incorporation of the skyline operator into practical database systems necessitates an efficient and effective cardinality estimation module. However, existing theoretical work on this problem is limited to the case where all d dimensions are independent of each other, which rarely holds for real datasets. The state of the art Log Sampling (LS) technique simply applies the theoretical results to non-independent data anyway, sometimes leading to large estimation errors. To...
Collaborative filtering (CF) systems exploit previous ratings and similarity in user behavior to recommend the top-k objects/records which are potentially most interesting to the user, assuming a single score per object. However, in various applications, a record (e.g., hotel) may be rated on several attributes (value, service, etc.), in which case simply returning the ones with the highest overall scores fails to capture the individual attribute characteristics and to accommodate different selection criteria. In order to enhance the flexibility...