- Complex Network Analysis Techniques
- Topic Modeling
- Anomaly Detection Techniques and Applications
- Advanced Graph Neural Networks
- Advanced Text Analysis Techniques
- Opinion Dynamics and Social Influence
- Text and Document Classification Technologies
- Time Series Analysis and Forecasting
- Sentiment Analysis and Opinion Mining
- Graph Theory and Algorithms
- Spam and Phishing Detection
- Recommender Systems and Techniques
- Tensor decomposition and applications
- Natural Language Processing Techniques
- Crime, Illicit Activities, and Governance
- Network Security and Intrusion Detection
- Web Data Mining and Analysis
- Semantic Web and Ontologies
- Internet Traffic Analysis and Secure E-voting
- Data Management and Algorithms
- Particle accelerators and beam dynamics
- Data Visualization and Analytics
- Face and Expression Recognition
- Data-Driven Disease Surveillance
- Imbalanced Data Classification Techniques
Institute of Computing Technology
2015-2024
Chinese Academy of Sciences
2015-2024
University of Chinese Academy of Sciences
2017-2024
Chinese Academy of Medical Sciences & Peking Union Medical College
2024
University of Notre Dame
2024
Institute of High Energy Physics
2018-2023
Dongguan University of Technology
2022
Hohai University
2021
Center for Excellence in Brain Science and Intelligence Technology
2020
China Spallation Neutron Source
2018
Given a large-scale rhythmic time series containing mostly normal data segments (or "beats"), can we learn how to detect anomalous beats in an effective yet efficient way, for example from electrocardiogram (ECG) readings? Existing approaches either require excessively high amounts of labeled and balanced data for classification, or rely on less regularized reconstructions, resulting in lower accuracy in anomaly detection. Therefore, we propose BeatGAN, an unsupervised anomaly detection algorithm for time series data. BeatGAN...
A water-soluble fluorescent sensor, 1, based on the quinoline platform, demonstrates femtomolar sensitivity for zinc ion with a 14-fold enhanced quantum yield upon chelation to zinc ion, and also exhibits high selectivity for zinc ion over other physiologically relevant divalent metals in the presence of EDTA. The X-ray crystal structure of the zinc complex reveals that an acetic carboxylic group participates in the coordination, which significantly enhances the affinity of 1 for zinc ion.
Sentiment classification is a topic-sensitive task, i.e., a classifier trained on one topic will perform worse on another. This is especially a problem for tweet sentiment analysis. Since the topics on Twitter are very diverse, it is impossible to train a universal classifier for all topics. Moreover, compared to product reviews, Twitter lacks data labeling and a rating mechanism to acquire sentiment labels. The extremely sparse text of tweets also brings down the performance of a sentiment classifier. In this paper, we propose a semi-supervised topic-adaptive sentiment classification (TASC)...
The ability to model and predict resharing cascades is crucial to understanding information propagation and to launching viral marketing campaigns. Conventional methods for cascade prediction depend heavily on the hypothesis of diffusion models, e.g., the independent cascade model and the linear threshold model. Recently, researchers have attempted to circumvent this problem using sequential models (e.g., recurrent neural networks, RNNs) that do not require knowing the underlying diffusion model. Existing sequential models employ a chain structure to capture memory...
Given a graph of the money transfers between accounts of a bank, how can we detect money laundering? Money laundering refers to criminals using the bank's services to move massive amounts of illegal money to untraceable destination accounts, in order to inject their illegal money into the legitimate financial system. Existing fraud detection approaches focus on dense subgraph detection, without considering the fact that money laundering involves high-volume flows of funds through chains of bank accounts, thereby decreasing their detection accuracy. Instead, we propose to model the transactions...
Time series data naturally exist in many domains, including medical data analysis, infrastructure sensor monitoring, and motion tracking. However, only a very small portion of anomalous time series can be observed, compared to the whole data. Most existing approaches are based on a supervised classification model requiring representative labels for the anomaly class(es), which is challenging in real-world problems. So can we learn how to detect anomalous time ticks in an effective yet efficient way, given mostly normal time series data? Therefore,...
Nowadays, short texts are very prevalent in various web applications, such as microblogs and instant messages. The severe sparsity of short texts hinders existing topic models from learning reliable topics. In this paper, we propose a novel way to tackle this problem. The key idea is to learn topics by exploring term correlation data, rather than the high-dimensional and sparse term occurrence information in documents. Such term correlation data is less sparse and more stable as the collection size increases, and can well capture the information necessary for topic learning. To obtain topics from the term correlation data, we first...
Sentiment classification is an important problem in tweet mining. The Twitter service lacks labeled data and a rating mechanism for generating them. Moreover, topics on Twitter are more diverse, while sentiment classifiers are always dedicated to a specific domain or topic. Thus it is a challenge to make sentiment classifiers adaptive to diverse topics without sufficient labeled data. Therefore we formally propose an adaptive multiclass SVM model which transfers an initial common sentiment classifier to a topic-adaptive one. To tackle the tweet sparsity, non-text features are explored...
As online fraudsters invest more resources, including purchasing large pools of fake user accounts and dedicated IPs, fraudulent attacks become less obvious and their detection becomes increasingly challenging. Existing approaches such as average degree maximization suffer from the bias of including more nodes than necessary, resulting in lower accuracy and an increased need for manual verification. Hence, we propose HoloScope, which introduces a novel metric "contrast suspiciousness" integrating information from graph...
Probabilistic topic models have been widely used for sentiment analysis. However, most existing topic methods only model the sentiment text, but do not consider the user, who expresses the sentiment, and the item, which the sentiment is expressed on. Since different users may use different sentiment expressions for different items, we argue that it is better to incorporate the user and item information into the topic model for sentiment analysis. In this paper, we propose a new Supervised User-Item based Topic model, called the SUIT model. It can simultaneously utilize the textual topic and latent user-item factors. Our proposed method uses...
Non-negative matrix factorization (NMF) has been successfully applied in document clustering. However, experiments on short texts, such as microblogs, Q&A documents and news titles, suggest unsatisfactory performance of NMF. A major reason is that the traditional term weighting schemes, like binary weight and tfidf, cannot well capture the terms' discriminative power and importance in short texts, due to the sparsity of the data. To tackle this problem, we propose a novel term weighting scheme for NMF, derived from the Normalized Cut (Ncut)...
Given a stream of money transactions between accounts in a bank, how can we accurately detect money laundering agent accounts and suspected behaviors in real time? Money laundering agents try to hide the origin of illegally obtained money by dispersing it into multiple small transactions, and evade detection by smart strategies. Therefore, it is challenging to accurately catch such fraudsters in an unsupervised manner. Existing approaches do not consider the characteristics of those agent accounts and are not suitable for streaming settings. Therefore, we propose MonLAD and MonLAD-W to detect such agents in a transaction stream by keeping track of their residuals...
Modeling and predicting retweeting dynamics in social media has important implications for an array of applications. Existing models either fail to model the triggering effect of retweeting dynamics, e.g., the model based on the reinforced Poisson process, or are hard to train using only the retweeting dynamics of an individual tweet, e.g., the model based on the self-exciting Hawkes process. In this paper, motivated by the observation that each retweeting dynamics is generally dominated by a handful of key nodes that separately trigger a high number of retweets, we propose a mixture process to predict retweeting dynamics, with each subprocess...
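To make the "self-exciting" idea in the abstract above concrete, here is a minimal sketch of the conditional intensity of a textbook Hawkes process with an exponential kernel. It is not the mixture process the paper proposes; the function name and the parameters `mu`, `alpha`, and `beta` are assumptions chosen for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Textbook Hawkes intensity with an exponential kernel (illustrative only).

    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))
    mu, alpha, beta are hypothetical values, not taken from the paper.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return mu + excitation


# Hypothetical retweet timestamps (arbitrary time units).
retweet_times = [0.0, 0.5, 0.6, 3.0]
rate_early = hawkes_intensity(1.0, retweet_times)   # shortly after a burst
rate_late = hawkes_intensity(10.0, retweet_times)   # long after the last retweet
```

Each past retweet temporarily raises the expected rate of further retweets, and the boost decays over time, so `rate_early` is well above `rate_late`, which has almost returned to the base rate `mu`.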
In the smart power grid, short-term load forecasting (STLF) is a crucial step in scheduling and planning for future load, so as to improve the reliability, cost, and emissions of the power grid. Different from traditional time series forecasting, STLF is a more challenging task, because of the complex demand for active and reactive power from numerous categories of electrical loads and the effects of the environment. Therefore, we propose NeuCast, a seasonal neural forecast method, which dynamically models the various loads as co-evolving time series in a hidden space, as well as extra weather conditions,...
Predicting cascade dynamics has important implications for understanding information propagation and launching viral marketing. Previous works mainly adopt a pair-wise manner, modeling the propagation probability between pairs of users using n^2 independent parameters for n users. Consequently, these models suffer from a severe overfitting problem, especially for pairs of users without direct interactions, limiting their prediction accuracy. Here we propose to model the cascade dynamics by learning two low-dimensional user-specific vectors from observed...
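The parameter-count argument in the abstract above can be illustrated with a short sketch. It compares the n^2 parameters of a pair-wise model with the 2 * n * d parameters needed when each user has two d-dimensional vectors. The abstract does not say how the two vectors are combined, so scoring a pair as the sigmoid of a dot product is an assumption made here, as are all names and sizes.

```python
import math
import random


def pairwise_param_count(n):
    """One independent parameter per ordered pair of users."""
    return n * n


def embedding_param_count(n, d):
    """Two d-dimensional vectors per user."""
    return 2 * n * d


def make_vectors(n, d, seed=0):
    """Random placeholder vectors; a real model would learn them from cascades."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]


def propagation_score(sender_vec, receiver_vec):
    """Assumed scoring rule: sigmoid of the dot product of the two vectors."""
    dot = sum(a * b for a, b in zip(sender_vec, receiver_vec))
    return 1.0 / (1.0 + math.exp(-dot))


n_users, dim = 1000, 16  # hypothetical sizes
sender_vectors = make_vectors(n_users, dim, seed=0)
receiver_vectors = make_vectors(n_users, dim, seed=1)
score = propagation_score(sender_vectors[3], receiver_vectors[42])
```

With these sizes the pair-wise model needs 1,000,000 parameters and the vector model 32,000. Any pair of users gets a score even if the pair never interacted, which is the situation the abstract identifies as most prone to overfitting.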
Sentiment classification on tweet events has attracted more interest in recent years. The large tweet stream stops people from reading the whole classified list to understand the insights. We employ a co-training framework in the proposed algorithm. Features are split into text view features and non-text view features. Two Random Forest (RF) classifiers are trained with the common labeled data on the two views of features separately. Then for each specific event, they collaboratively and periodically train together to boost the classification performance. At last, we...
Aerobic glycolysis is involved in the pathogenesis of pulmonary hypertension (PH). The mechanisms by which glycolysis is increased and how it contributes to pulmonary vascular remodelling are not yet fully understood. In this study, we demonstrated that elevated lipocalin-2 (LCN2) in PH significantly enhances aerobic glycolysis in human pulmonary artery smooth muscle cells (PASMCs) by up-regulating LDHA expression. Knockout of Lcn2 or heterozygous deficiency in mice inhibits the progression of hypoxic PH. Our study reveals that LCN2 stimulates LDHA expression...
Since the concept of aromaticity was first introduced in transition metal complexes, metals have become a crucial component for modulating aromaticity, leading to a variety of structural frameworks. Initial studies...
Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the top singular vectors of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed to compute PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or data generated in a streaming fashion. Experiments with synthetic and real data validate the algorithm's accuracy, which...
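As a rough illustration of what "one pass over the data" means in the abstract above, the sketch below uses a simpler deterministic baseline, not the randomized algorithm of the paper: it accumulates the small Gram matrix X^T X while streaming rows once, then runs power iteration on that matrix to get the top right singular vector. It is only practical when the number of columns is small. All function names and the toy data are hypothetical.

```python
import math


def gram_single_pass(rows, dim):
    """Accumulate G = X^T X while reading each row of X exactly once."""
    gram = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        for i in range(dim):
            ri = row[i]
            for j in range(dim):
                gram[i][j] += ri * row[j]
    return gram


def top_eigenvector(matrix, iterations=200):
    """Power iteration: the dominant eigenvector of X^T X is the top right
    singular vector of X."""
    dim = len(matrix)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


def stream_rows():
    """Toy stream: points along direction (3, 4) with a small alternating
    offset along the orthogonal direction (4, -3)."""
    for k in range(1, 51):
        sign = (-1) ** k
        yield [3.0 * k + 0.04 * sign, 4.0 * k - 0.03 * sign]


gram = gram_single_pass(stream_rows(), dim=2)
direction = top_eigenvector(gram)
```

The rows are consumed from a generator and never stored, so memory use depends only on the number of columns. On this toy stream the recovered unit vector is close to (0.6, 0.8), the direction along which the data varies most.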
How can we detect fraud in a big graph with rich properties, as online fraudsters invest more resources, including purchasing large pools of fake user accounts and dedicated IPs, to hide their fraudulent attacks? To achieve robustness, existing approaches detect dense sub-graphs as suspicious patterns in an unsupervised way, such as average degree maximization. However, they suffer from the bias of including more nodes than necessary, resulting in lower accuracy and an increased need for manual verification. Therefore, we propose...
Given a retinal image, can we automatically determine whether it is of high quality (suitable for medical diagnosis)? Can we also explain our decision, pinpointing the region or regions that led to our decision? Images from human retinas are vital for the diagnosis of multiple health issues, like hypertension, diabetes, and Alzheimer's; low quality images may force the patient to come back again for a second scanning, wasting time and possibly delaying treatment. However, existing retinal image quality assessment methods are either black boxes without...