- Complex Network Analysis Techniques
- Advanced Graph Neural Networks
- Opinion Dynamics and Social Influence
- Topic Modeling
- Caching and Content Delivery
- Spam and Phishing Detection
- Graph Theory and Algorithms
- Privacy-Preserving Technologies in Data
- Data Stream Mining Techniques
- Peer-to-Peer Network Technologies
- Network Security and Intrusion Detection
- Internet Traffic Analysis and Secure E-voting
- Data Quality and Management
- Speech and Dialogue Systems
- Natural Language Processing Techniques
- Human Mobility and Location-Based Analysis
- Cryptography and Data Security
- Artificial Intelligence in Law
- AI in Service Interactions
- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Data Management and Algorithms
- Bayesian Modeling and Causal Inference
- Anomaly Detection Techniques and Applications
- Bioinformatics and Genomic Networks
Xi'an Jiaotong University
2014-2025
Shanghai University of Engineering Science
2023
King Abdullah University of Science and Technology
2018-2019
Chinese University of Hong Kong
2017-2019
Legal Judgment Prediction (LJP) is the task of automatically predicting a law case's judgment results given a text describing the case's facts, which has great prospects in judicial assistance systems and handy services for the public. In practice, confusing charges are often presented, because law cases applicable to similar law articles are easily misjudged. To address this issue, existing work relies heavily on domain experts, which hinders its application in different law systems. In this paper, we present an end-to-end model, LADAN,...
Visual question answering requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods often tend to memorize biases present in the training data rather than learning proper behaviors, such as grounding images before predicting answers. Therefore, these methods usually achieve high in-distribution but poor out-of-distribution performance. In recent years, various datasets and debiasing methods have been proposed...
Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and Online Social Networks (OSNs). Nowadays, the massive size of some critical networks (often stored in already overloaded relational databases) effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling methods to estimate subgraph statistics from few queried...
Counting 3-, 4-, and 5-node graphlets in graphs is important for graph mining applications such as discovering abnormal/evolution patterns in social and biology networks. In addition, it has recently been widely used for computing similarities between graphs and for graph classification applications such as protein function prediction and malware detection. However, it is challenging to compute these graphlet counts for a large graph or a large set of graphs due to the combinatorial nature of the problem. Despite recent efforts in counting 3-node and 4-node graphlets, little attention has been paid...
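To make the counting task above concrete, here is a minimal sketch of exact counting for the two connected 3-node graphlets (the open wedge and the triangle) in a small undirected graph. It is a brute-force illustration only, not the method of the paper; the function name `count_3node_graphlets` and the edge-list input format are assumptions made for this example.

```python
from itertools import combinations


def count_3node_graphlets(edges):
    """Exactly count induced wedges (open 2-paths) and triangles.

    `edges` is an iterable of (u, v) pairs of an undirected simple graph.
    Hypothetical helper for illustration; not taken from the paper.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    centered_paths = 0   # 2-paths counted at their center vertex
    closed_corners = 0   # each triangle is seen once per corner (3 times)
    for v, nbrs in adj.items():
        d = len(nbrs)
        centered_paths += d * (d - 1) // 2
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed_corners += 1

    triangles = closed_corners // 3
    # Every triangle contributes three closed 2-paths; the rest are open wedges.
    wedges = centered_paths - 3 * triangles
    return {"wedge": wedges, "triangle": triangles}


# A triangle 1-2-3 with a pendant vertex 4 attached to vertex 3.
print(count_3node_graphlets([(1, 2), (2, 3), (1, 3), (3, 4)]))
# → {'wedge': 2, 'triangle': 1}
```

The per-vertex loop over neighbor pairs is quadratic in the degree, which hints at why exact counting becomes expensive for larger graphlets and large graphs.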
Legal case retrieval aims to automatically scour comparable legal cases based on a given query, which is crucial for offering relevant precedents to support the judgment in intelligent legal systems. Due to similar goals, it is often associated with the case matching task. To address them, a daunting challenge is assessing the uniquely defined legal-rational similarity within the judicial domain, which distinctly deviates from the semantic similarities in general text retrieval. Past works either tagged domain-specific factors or...
Understanding mobile data traffic and forecasting its future trend is beneficial to wireless carriers and service providers who need to perform resource allocation and energy saving management. However, predicting traffic accurately at large scale and fine granularity is particularly challenging due to the following two factors: the spatial correlations between network units (i.e., a cell tower or an access point) introduced by users' arbitrary movements, and the time-evolving nature of user movements, which frequently changes with time. In...
Graphs are widely used to represent the relations among entities. When one owns the complete data, an entire graph can be easily built, and therefore performing analysis on the graph is straightforward. However, in many scenarios, it is impractical to centralize the data due to privacy concerns. An organization or party only keeps a part of the whole graph data, i.e., graph data is isolated from different parties. Recently, Federated Learning (FL) has been proposed to solve the data isolation issue, mainly for Euclidean data. It is still a challenge to apply FL to graph data because...
Predicting interactions between structured entities lies at the core of numerous tasks such as drug regimen and new material design. In recent years, graph neural networks have become attractive. They represent structured entities as graphs and then extract features from each individual graph using graph convolution operations. However, these methods have some limitations: i) their networks only extract features from a fixed-size subgraph structure (i.e., a fixed-size receptive field) of each node, and ignore features in substructures of different sizes, and ii) features are extracted by considering each entity...
Characterizing motif (i.e., locally connected subgraph patterns) statistics is important for understanding complex networks such as online social networks and communication networks. Previous work made the strong assumption that the graph topology of interest is known in advance. In practice, sometimes researchers have to deal with the situation where the graph topology is unknown because it is expensive to collect and store all topological and meta information. Hence, typically what is available to researchers is only a snapshot of the graph, i.e., a subgraph of the graph....
Bipartite graphs widely exist in real-world scenarios and model binary relations like host-website, author-paper, and user-product. In bipartite graphs, a butterfly (i.e., a $2 \times 2$ bi-clique) is the smallest non-trivial cohesive structure and plays an important role in applications such as anomaly detection. Considerable efforts focus on counting butterflies in static bipartite graphs. However, they suffer from high time and space complexity when...
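As an illustration of the butterfly definition above, the following sketch counts butterflies exactly in a small static bipartite graph: any two left-side vertices that share c right-side neighbors form C(c, 2) butterflies. This is a simple baseline written for this example, not the approach of the paper; the function name `count_butterflies`, the (left, right) edge format, and the assumption that left-side labels are sortable are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def count_butterflies(edges):
    """Exactly count 2x2 bi-cliques in a bipartite graph.

    `edges` is an iterable of (left, right) pairs; left labels must be
    mutually comparable so that each pair gets one canonical order.
    Hypothetical helper for illustration; not taken from the paper.
    """
    left_of = defaultdict(set)  # right vertex -> set of its left neighbors
    for left, right in edges:
        left_of[right].add(left)

    # For every pair of left vertices, count the right neighbors they share.
    common = defaultdict(int)
    for lefts in left_of.values():
        for pair in combinations(sorted(lefts), 2):
            common[pair] += 1

    # Choosing any 2 of the c shared right neighbors yields one butterfly.
    return sum(comb(c, 2) for c in common.values())


# Two users who both bought the same two products form one butterfly.
print(count_butterflies([("u1", "p1"), ("u1", "p2"), ("u2", "p1"), ("u2", "p2")]))
# → 1
```

The table of left-vertex pairs can grow quadratically with vertex degree, which is one way to see the time and space cost that the abstract attributes to static counting.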
Follower networks such as Twitter and Digg are becoming a popular form of social information networks. This paper seeks to gain insights into how they evolve and the relationship between their structure and their ability to spread information. By studying the Douban follower network, which is a popular online social network in China, we provide some evidence showing its suitability for information spreading. For example, it exhibits an unbalanced bow-tie structure with a large out-component, which indicates that the majority of users can spread information widely; the effective diameter...
Despite recent efforts to characterize online social network (OSN) structures and activities, user behavior across different OSNs has received little attention. Yet such information could provide insight into issues relating to personal privacy protection. For instance, many Foursquare users reveal their Facebook and Twitter accounts to the public. The authors' in-depth measurement study examines users' activities and settings across Facebook, Twitter, and Foursquare. Results show that users' activities are highly correlated among...
Calculating the number of distinct values (i.e., NDV) in a column of a big table is costly yet fundamental to a variety of database applications such as data compression and data profiling. To reduce the high time and space cost, sketch methods (e.g., HyperLogLog) have been proposed, which estimate the NDV from a constructed compact summary of the values. However, these methods fail or are costly to manage in fully-dynamic scenarios where values are often inserted into and deleted from the table. To solve this issue, we propose a novel method, ...
The host connection degree distribution (HCDD) is an important metric for network security monitoring. However, it is difficult to accurately obtain the HCDD in real time for high-speed links with a massive amount of traffic data. In this paper, we propose a new sketch method to build a probabilistic summary of a host's flows using a uniform Flajolet-Martin sketch combined with a small bitmap. To study its performance in comparison with previous sampling and sketch methods, we present a general model that encompasses all these methods. With...
The unbiasedness of online product ratings, an important property to ensure that users' ratings indeed reflect their true evaluations of products, is vital both in shaping consumer purchase decisions and in providing reliable recommendations. Recent experimental studies showed that distortions from historical ratings would ruin the unbiasedness of subsequent ratings. How to "discover" the distortions of each single rating (or at the micro-level), and perform "debiasing operations" in real rating systems are the main objectives of this work.
Many real-world datasets are given in the format of data streams, and processing these data streams is fundamental for many applications such as anomaly detection. In this paper, we study the problem of computing item frequencies, finding top-k hot items, and detecting heavy changes. However, widely used sketches cost large memory usage and their performance is easily affected by the unbalanced distribution of data streams. To solve this issue, a novel method Cold Filter (CF) is proposed to split cold items and hot items and use a separate structure...
Random walk-based graph sampling methods have become increasingly popular and important for characterizing large-scale complex networks. While powerful, they are known to exhibit problems when the graph is loosely connected, which slows down the convergence of a random walk and can result in poor estimation accuracy. In this work, we observe that many graphs under study, called target graphs, usually do not exist in isolation. In many situations, a target graph is often related to an auxiliary graph and an affiliation graph, and the target graph becomes better connected...
Background: ADAMTS1 and ADAMTS8 are proteases involved in extracellular matrix proteolysis and antiangiogenesis, but little is known about their expression and function in cerebral ischemia. We investigated the changes in their expression in a rat model of permanent middle cerebral artery occlusion (pMCAO). The expressions of glyceraldehyde-3-phosphate dehydrogenase (GAPDH), β-actin, cyclophilin, and RPL13A were examined in order to validate appropriate housekeeping genes for long duration after inducing cerebral ischemia. Methods: Male Sprague–Dawley rats...
As an important metric in graphs, group closeness centrality measures how close a group of vertices is to all other vertices in a graph, and it is used in numerous graph applications such as measuring the dominance and influence of a node group over the graph. However, when a large-scale graph contains hundreds of millions of nodes/edges which cannot reside entirely in a computer's main memory, measuring and maximizing group closeness become challenging tasks. In this paper, we present a systematic solution for efficiently calculating and maximizing the group closeness for disk-resident graphs. Our solution first leverages...
Large language models often necessitate grounding on external knowledge to generate faithful and reliable answers. Yet even with the correct groundings in the reference, they can ignore them and rely on wrong groundings or their inherent biases to hallucinate when users, being largely unaware of the specifics of the stored information, pose questions that might not directly correlate with the retrieved groundings. In this work, we formulate this knowledge alignment problem and introduce MixAlign, a framework that interacts with both the human user and the knowledge base to obtain...