Yanhao Wang

ORCID: 0000-0002-7661-3917
Research Areas
  • Data Management and Algorithms
  • Complex Network Analysis Techniques
  • Privacy-Preserving Technologies in Data
  • Recommender Systems and Techniques
  • Optimization and Search Problems
  • Advanced Graph Neural Networks
  • Complexity and Algorithms in Graphs
  • Advanced Malware Detection Techniques
  • Internet Traffic Analysis and Secure E-voting
  • Data Mining Algorithms and Applications
  • Advanced Image and Video Retrieval Techniques
  • Caching and Content Delivery
  • Opinion Dynamics and Social Influence
  • Spam and Phishing Detection
  • Data Quality and Management
  • Peer-to-Peer Network Technologies
  • Adversarial Robustness in Machine Learning
  • Human Mobility and Location-Based Analysis
  • Video Surveillance and Tracking Methods
  • Network Security and Intrusion Detection
  • Multi-Criteria Decision Making
  • Anomaly Detection Techniques and Applications
  • Consumer Market Behavior and Pricing
  • Software Engineering Research
  • Topic Modeling

East China Normal University
2022-2025

Zhejiang Sci-Tech University
2025

Jilin Medical University
2025

Jilin University
2025

National Chung Cheng University
2023-2024

South China University of Technology
2024

Tsinghua University
2024

Indiana University Bloomington
2020-2023

Hunan University of Science and Technology
2022-2023

Beijing Institute of Technology
2023

Influence Maximization (IM), which selects a set of k users (called the seed set) from a social network to maximize the expected number of influenced users (called the influence spread), is a key algorithmic problem in social influence analysis. Due to its immense application potential and enormous technical challenges, IM has been extensively studied in the past decade. In this paper, we survey and synthesize a wide spectrum of existing studies on IM from an algorithmic perspective, with a special focus on the following aspects: (1) a review of well-accepted diffusion models that capture...

10.1109/tkde.2018.2807843 article EN IEEE Transactions on Knowledge and Data Engineering 2018-02-22

Despite its benefits for children's skill development and parent-child bonding, many parents do not often engage in interactive storytelling by having story-related dialogues with their child, due to limited availability or challenges in coming up with appropriate questions. While recent advances have made AI generation of questions from stories possible, the fully-automated approach excludes parent involvement, disregards educational goals, and underoptimizes for child engagement. Informed by need-finding interviews...

10.1145/3491102.3517479 article EN CHI Conference on Human Factors in Computing Systems 2022-04-28

Influence maximization (IM), which selects a set of k users (called seeds) to maximize the influence spread over a social network, is a fundamental problem in a wide range of applications such as viral marketing and network monitoring. Existing IM solutions fail to consider the highly dynamic nature of social influence, which results in either poor seed qualities or long processing time when the network evolves. To address this problem, we define a novel IM query named Stream Influence Maximization (SIM) on social streams. Technically, SIM adopts the sliding...

10.14778/3067421.3067429 article EN Proceedings of the VLDB Endowment 2017-03-01

Traffic classification is a critical task in network security and management. Recent research has demonstrated the effectiveness of deep learning-based traffic classification methods. However, the following limitations remain: (1) the traffic representation is simply generated from raw packet bytes, resulting in the absence of important information; (2) the model structure of directly applying deep learning algorithms does not take traffic characteristics into account; (3) scenario-specific classifier training usually requires labor-intensive...

10.1609/aaai.v37i4.25674 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations which may steer her towards more and more radicalized content, eventually being trapped in what we call a "radicalization pathway". In this paper, we study the problem of mitigating radicalization pathways using a graph-based approach. Specifically, we model the set of recommendations of a "what-to-watch-next" recommender as a d-regular directed graph where...

10.1145/3485447.3512143 article EN Proceedings of the ACM Web Conference 2022 2022-04-25

Pattern images input to the knitting CAD system have a large number of different colors, and it is necessary to reduce the number of colors in the pattern image by merging similar colors through color-separation algorithms. However, current pattern images usually have the problem of image degradation, which seriously affects the accuracy of the color-separation algorithm. In addition, the traditional color-separation algorithm needs to rely on the manual setting of clustering parameters, which is very time-consuming and laborious. To solve these problems, in this paper, we propose a density peak based self-organizing mapping (SOM)...

10.1109/access.2025.3526233 article EN cc-by IEEE Access 2025-01-01

In recent years, there has been an increasing interest in image anonymization, particularly focusing on the de-identification of faces and individuals. However, for self-driving applications, merely de-identifying faces and individuals might not provide sufficient privacy protection, since street views like vehicles and buildings can still disclose locations, trajectories, and other sensitive information. Therefore, it remains crucial to extend anonymization techniques to street view images to fully preserve the privacy of users,...

10.48550/arxiv.2501.09393 preprint EN arXiv (Cornell University) 2025-01-16

Large language models (LLMs) have recently demonstrated exceptional capabilities across a variety of linguistic tasks, including question answering (QA). However, it remains challenging to assess their performance in astronomical QA due to the lack of comprehensive benchmark datasets. To bridge this gap, we construct Astro-QA, the first benchmark dataset specifically for QA in astronomy. The dataset contains a collection of 3,082 questions of six types in both English and Chinese, along with standard (reference) answers and related...

10.1038/s41597-025-04613-9 article EN cc-by Scientific Data 2025-03-18

In this paper, the development law of the residual deformation of coal gangue subgrade filler is analyzed through large-scale triaxial tests, and a residual deformation model of coal gangue composed mainly of sandstone and limestone is established. The purpose is to provide a research basis for the applicability of coal gangue as subgrade filler. The results show that the residual deformation of coal gangue filler increases first and then tends to be constant under cyclic load with multiple vibration times. It is found that the Shenzhujiang model cannot accurately predict the deformation law, and a corresponding modification is made for the coal gangue filling body. Finally, according to the calculation of the grey...

10.1038/s41598-023-35199-0 article EN cc-by Scientific Reports 2023-05-21

In machine learning (ML) problems, it is widely believed that more training samples lead to improved predictive accuracy but incur higher computational costs. Consequently, achieving better data efficiency, that is, the trade-off between the size of the training set and the accuracy of the output model, becomes a key problem in ML applications. In this research, we systematically investigate the data efficiency of Univariate Time Series Anomaly Detection (UTS-AD) models. We first experimentally examine the performance of nine popular UTS-AD algorithms as...

10.1186/s40537-024-00940-7 article EN cc-by Journal Of Big Data 2024-06-11

A great variety of complex systems, ranging from user interactions in communication networks to transactions in financial markets, can be modeled as temporal graphs, which consist of a set of vertices and a series of timestamped and directed edges. Temporal motifs in temporal graphs are generalized from subgraph patterns in static graphs and take into account edge orderings and durations in addition to structures. Counting the number of occurrences of temporal motifs is a fundamental problem for temporal network analysis. However, existing methods either cannot support temporal motifs or suffer...

10.1145/3340531.3411862 article EN 2020-10-19

Mobile device identification techniques can be applied to secure authentication, and will be of particular importance for the security of mobile networks, such as avoiding spoofing attacks. For Android devices, explicit identifiers, e.g., the Android ID, are used to uniquely identify a device. However, the permissions required to gain these identifiers could cause permission abuse and leakage of user privacy. To address these issues, we use a combination of implicit identifiers that cannot identify a device individually. We first investigate 38 implicit identifiers that can be acquired without...

10.1109/access.2016.2626395 article EN cc-by-nc-nd IEEE Access 2016-01-01

We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint $k$. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional fairness constraint that limits selection to a given number of items from each group. We then propose...

10.1145/3442381.3449799 preprint EN 2021-04-19

Adversarial attacks on graphs have attracted considerable research interest. Existing works assume the attacker is either (partly) aware of the victim model, or able to send queries to it. These assumptions are, however, unrealistic. To bridge the gap between theoretical graph attacks and real-world scenarios, in this work, we propose a novel and more realistic setting: the strict black-box graph attack, in which the attacker has no knowledge about the victim model at all and is not allowed to send any queries. To design such an attack strategy, we first propose a generic graph filter...

10.1609/aaai.v36i4.20350 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Finding a small set of representative tuples from a large database is an important functionality for supporting multi-criteria decision making. Top-$k$ queries and skyline queries are two widely studied queries to fulfill this task. However, both of them have some limitations: a top-$k$ query requires the user to provide her utility functions for finding the tuples with the highest scores as the result; a skyline query does not need any user-specified utility function but cannot control the result...

10.1109/tkde.2022.3166835 article EN IEEE Transactions on Knowledge and Data Engineering 2022-04-12

Identifying anonymity services from network traffic is a crucial task for network management and security. Currently, some works based on deep learning have achieved excellent performance for traffic analysis, especially those based on flow sequence (FS), which utilizes the information and features of the traffic flow. However, these models still face a serious challenge because they lack a mechanism to take into account the relationships between flows, resulting in mistakenly recognizing irrelevant flows in FS as clues for identifying traffic. In this...

10.1109/iwqos54832.2022.9812882 article EN 2022-06-10

Selecting a small set of representatives from a large database is important in many applications such as multi-criteria decision making, web search, and recommendation. The $k$-regret minimizing set ($k$-RMS) problem was recently proposed for representative tuple discovery. Specifically, for a large database $P$ of tuples with multiple numerical attributes, the $k$-RMS problem returns a size-$r$ subset $Q$ of $P$ such that, for any possible ranking function, the score of the top-ranked tuple in $Q$ is not much worse than the score of the $k$-th-ranked tuple in $P$. Although the $k$-RMS problem has...

10.1109/icde51399.2021.00144 article EN 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021-04-01

Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set $X$ of $n$ elements, it asks to select a subset $S$ of $k \ll n$ elements with maximum diversity, as quantified by the...

10.1109/icde53745.2022.00008 article EN 2022 IEEE 38th International Conference on Data Engineering (ICDE) 2022-05-01

Diversity maximization is a fundamental problem with broad applications in data summarization, web search, and recommender systems. Given a set X of n elements, the problem asks for a subset S of k≪n elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we study the diversity maximization problem with fairness constraints in streaming and sliding-window models. Specifically, we focus on the max-min diversity maximization problem, which selects a subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming...

10.3390/e25071066 article EN cc-by Entropy 2023-07-14

As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, the state-of-the-art text sanitization mechanisms based on a relaxed notion of metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good privacy-utility trade-offs. To address these limitations, we propose a novel Customized Text (CusText) sanitization mechanism based on the original...

10.18653/v1/2023.findings-acl.355 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2023 2023-01-01