- Complex Network Analysis Techniques
- Point processes and geometric inequalities
- Recommender Systems and Techniques
- Advanced Graph Neural Networks
- Topic Modeling
- Text and Document Classification Technologies
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Face and Expression Recognition
- Matrix Theory and Algorithms
- Reinforcement Learning in Robotics
- Opinion Dynamics and Social Influence
- Information Retrieval and Search Behavior
- Diffusion and Search Dynamics
- Sparse and Compressive Sensing Techniques
- Model Reduction and Neural Networks
- Web Data Mining and Analysis
- Human Mobility and Location-Based Analysis
- Bayesian Methods and Mixture Models
- Generative Adversarial Networks and Image Synthesis
- Image Retrieval and Classification Techniques
- Advanced Bandit Algorithms Research
- Advanced Clustering Algorithms Research
- Neural Networks and Applications
- Graph Theory and Algorithms
- Chinese University of Hong Kong, Shenzhen: 2019-2025
- Georgia Institute of Technology: 2013-2023
- Southwest Jiaotong University: 2022-2023
- East China Normal University: 2014-2022
- Shenzhen Research Institute of Big Data: 2020-2022
- Chinese University of Hong Kong: 2019-2021
- Shanghai Jiao Tong University: 2019-2020
- Institute of Art: 2020
- University of California, Los Angeles: 2019
- Amazon (United States): 2018
We present a new algorithm for manifold learning and nonlinear dimensionality reduction. Based on a set of unorganized data points sampled with noise from a parameterized manifold, the local geometry of the manifold is learned by constructing an approximation of the tangent space at each data point, and those tangent spaces are then aligned to give the global coordinates of the data points with respect to the underlying manifold. We also present an error analysis of our algorithm showing that reconstruction errors can be quite small in some cases. We illustrate our algorithm using curves and surfaces both...
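The alignment idea can be illustrated with a small NumPy sketch of local tangent space alignment. It is a simplified illustration, not the authors' implementation: neighborhoods are brute-force k-nearest-neighbor sets, each local tangent space comes from an SVD of the centered neighborhood, and the global coordinates are read off the eigenvectors of the accumulated alignment matrix that follow the constant one. The function name `ltsa`, the arc test data and `n_neighbors=8` are illustrative assumptions.

```python
import numpy as np

def ltsa(X, n_neighbors=8, d=1):
    """Illustrative LTSA sketch: X is (n_points, ambient_dim); returns (n_points, d)."""
    n = X.shape[0]
    # Brute-force squared Euclidean distances (only suitable for small n).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    B = np.zeros((n, n))  # global alignment matrix
    for i in range(n):
        idx = np.argsort(sq_dist[i])[:n_neighbors]   # k nearest points, including i itself
        local = X[idx] - X[idx].mean(axis=0)         # center the neighborhood
        # Leading left singular vectors span the local tangent coordinates.
        U, _, _ = np.linalg.svd(local, full_matrices=False)
        G = np.hstack([np.full((n_neighbors, 1), 1.0 / np.sqrt(n_neighbors)), U[:, :d]])
        # Accumulate the local alignment error operator I - G G^T.
        B[np.ix_(idx, idx)] += np.eye(n_neighbors) - G @ G.T
    # The constant vector has eigenvalue 0; the next d eigenvectors give the coordinates.
    _, vecs = np.linalg.eigh(B)
    return vecs[:, 1:d + 1]

# Hypothetical example: noise-free points on a circular arc parameterized by t.
t = np.linspace(0.0, 3.0, 60)
X = np.column_stack([np.cos(t), np.sin(t)])
emb = ltsa(X, n_neighbors=8, d=1)[:, 0]
```

On this arc the recovered 1-D coordinate should be close to an affine function of `t`; the dense distance and alignment matrices make the sketch O(n²) in memory, so it only suits small examples.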
An important application of graph partitioning is data clustering using a graph model: the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all the information necessary for clustering. In this paper, we propose a new graph partitioning algorithm with an objective function that follows the min-max clustering principle. The relaxed version of the optimization of the min-max cut objective leads to the Fiedler vector in spectral graph partitioning. Theoretical analyses of min-max cut indicate that it leads to balanced partitions, and lower bounds are derived. The min-max cut algorithm is tested on newsgroup...
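A small sketch of the spectral side of this idea: evaluate a min-max cut objective and split a graph by the sign of a Fiedler vector. It assumes the objective has the form Mcut(A, B) = cut(A, B)/W(A, A) + cut(A, B)/W(B, B) and that the relaxation is the generalized eigenproblem (D - W)q = λDq, a standard formulation in spectral partitioning; the paper's exact relaxation, any refinement step, and the helper names `min_max_cut` and `fiedler_split` are assumptions, not taken from the source.

```python
import numpy as np

def min_max_cut(W, in_a):
    """Assumed objective: cut(A,B)/W(A,A) + cut(A,B)/W(B,B) for a boolean mask of A."""
    a, b = in_a, ~in_a
    cut = W[np.ix_(a, b)].sum()
    return cut / W[np.ix_(a, a)].sum() + cut / W[np.ix_(b, b)].sum()

def fiedler_split(W):
    """Split by the sign of the second eigenvector of (D - W) q = lam * D q."""
    d = W.sum(axis=1)
    d_is = 1.0 / np.sqrt(d)
    # I - D^{-1/2} W D^{-1/2} has the same eigenvalues as the generalized problem.
    L_sym = np.eye(len(d)) - d_is[:, None] * W * d_is[None, :]
    _, vecs = np.linalg.eigh(L_sym)      # ascending eigenvalues
    q = d_is * vecs[:, 1]                # map back: q = D^{-1/2} z
    return q >= 0

# Hypothetical graph: two dense 4-node clusters joined by weak 0.01 edges.
W = np.full((8, 8), 0.01)
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
in_a = fiedler_split(W)
```

The sign split recovers the two dense clusters, and because the objective divides the cut by each side's internal weight, very small or sparse sides are penalized, which is the balancing effect the abstract refers to.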
Principal component analysis (PCA) minimizes the sum of squared errors (L2-norm) and is sensitive to the presence of outliers. We propose a rotational invariant L1-norm PCA (R1-PCA). R1-PCA is similar to PCA in that (1) it has a unique global solution, (2) the solutions are the principal eigenvectors of a robust covariance matrix (re-weighted to soften the effects of outliers), and (3) the solution is rotational invariant. These properties are not shared by L1-norm PCA. A new subspace iteration algorithm is given to compute R1-PCA efficiently. Experiments on several real-life datasets...
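A minimal way to see the re-weighting idea, assuming centered data: minimize the sum of unsquared residual norms (the rotational invariant R1 loss) by iteratively re-weighted least squares, where each step takes the principal eigenvectors of a covariance matrix whose samples are weighted by 1/residual. This plain majorize-minimize scheme is chosen for illustration; the paper's own weighting and subspace iteration algorithm may differ, and `r1_pca_irls`, `top_k` and `r1_objective` are hypothetical names.

```python
import numpy as np

def top_k(C, k):
    """Principal k eigenvectors of a symmetric matrix C."""
    _, vecs = np.linalg.eigh(C)            # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

def r1_objective(X, U):
    """Sum of unsquared Euclidean residual norms (rotational invariant L1 loss)."""
    return np.linalg.norm(X - X @ U @ U.T, axis=1).sum()

def r1_pca_irls(X, k, n_iter=30, eps=1e-8):
    """X: (n_samples, dim), assumed already centered. Returns an orthonormal (dim, k) basis."""
    U = top_k(X.T @ X, k)                  # start from ordinary PCA
    for _ in range(n_iter):
        r = np.linalg.norm(X - X @ U @ U.T, axis=1)
        w = 1.0 / np.maximum(r, eps)       # large residual (outlier) -> small weight
        C = (X * w[:, None]).T @ X         # re-weighted covariance matrix
        U = top_k(C, k)                    # its principal eigenvectors
    return U

# Hypothetical data: points along `main` with small noise, plus one outlier along `perp`.
main = np.array([1.0, 1.0]) / np.sqrt(2)
perp = np.array([1.0, -1.0]) / np.sqrt(2)
t = np.linspace(-3, 3, 20)
noise = 0.05 * (-1.0) ** np.arange(20)
X = np.vstack([np.outer(t, main) + np.outer(noise, perp), 5.0 * perp])
U_pca = top_k(X.T @ X, 1)
U_r1 = r1_pca_irls(X, 1)
```

Each iteration cannot increase the R1 loss (a standard majorize-minimize argument), but like other robust M-estimators this scheme can stall at a poor solution if the PCA starting point is already dominated by outliers.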
Sensor positioning is a fundamental and crucial issue for sensor network operation and management. In the paper, we first study some situations where most existing sensor positioning methods tend to fail to perform well, an example being when the topology of a sensor network is anisotropic. Then, we explore the idea of using dimensionality reduction techniques to estimate sensors' coordinates in two (or three) dimensional space, and we propose a distributed sensor positioning method based on the multidimensional scaling technique to deal with these challenging conditions....
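The multidimensional scaling building block can be sketched as centralized classical MDS: double-center the squared distance matrix and take its top eigenvectors. This is not the paper's distributed method, which the excerpt does not detail; the sketch assumes every pairwise distance is known, whereas real sensors typically measure only distances to nearby neighbors. The function names and the 3 x 3 grid layout are hypothetical.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Recover coordinates (up to rotation, reflection, translation) from distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]          # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

def pairwise(P):
    """Euclidean distance matrix between the rows of P."""
    return np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))

# Hypothetical sensor layout: a 3 x 3 grid with unit spacing.
sensors = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
D = pairwise(sensors)
est = classical_mds(D, dim=2)
```

Coordinates are recovered only up to a rigid motion, so a few anchor nodes with known positions are needed in practice to fix absolute locations.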
Nonlinear manifold learning from unorganized data points is a very challenging unsupervised learning and data visualization problem with a great variety of applications. In this paper we present a new algorithm for nonlinear dimension reduction. Based on a set of unorganized data points sampled with noise from the manifold, we represent the local geometry of the manifold using tangent spaces learned by fitting an affine subspace in a neighborhood of each data point. Those tangent spaces are aligned to give the internal global coordinates of the data points with respect to the underlying manifold by way of a partial eigendecomposition of the neighborhood connection...
User preferences are usually dynamic in real-world recommender systems, and a user's historical behavior records may not be equally important when predicting his/her future interests. Existing recommendation algorithms (including both shallow and deep approaches) usually embed a user's historical records into a single latent vector/representation, which may have lost the per-item or feature-level correlations between a user's historical records and future interests. In this paper, we aim to express, store, and manipulate users' historical records in a more explicit, dynamic, and effective manner. To do so,...
Targeting interest to match a user with services (e.g. news, products, games, advertisements) and predicting friendship to build connections among users are two fundamental tasks for social network systems. In this paper, we show that the information contained in interest networks (i.e. user-service interactions) and friendship networks (i.e. user-user connections) is highly correlated and mutually helpful. We propose a framework that exploits homophily to establish an integrated network linking a user to interested services and connecting different users with common interests, upon...
Due to name abbreviations, identical names, misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper investigates two supervised learning approaches to disambiguate authors in citations. One approach uses the naive Bayes probability model, a generative model; the other uses Support Vector...
Large-scale datasets possessing clean label annotations are crucial for training Convolutional Neural Networks (CNNs). However, labeling large-scale data can be very costly and error-prone, and even high-quality datasets are likely to contain noisy (incorrect) labels. Existing works usually employ a closed-set assumption, whereby the samples associated with noisy labels possess a true class contained within the set of known classes in the training data. However, such an assumption is too restrictive for many applications, since samples associated with noisy labels might in fact possess a true class that...
Recent work has shown that optical flow estimation can be formulated as a supervised learning problem. Moreover, convolutional networks have been successfully applied to this task. However, supervised flow learning is obfuscated by the shortage of labeled training data. As a consequence, existing methods have turned to large synthetic datasets for easily computer-generated ground truth. In this work, we explore whether a deep network for flow estimation can be trained without supervision. Using image warping by the estimated flow, we devise a simple yet effective unsupervised...
Community Question Answering has emerged as a popular and effective paradigm for a wide range of information needs. For example, to find out an obscure piece of trivia, it is now possible and even very effective to post a question on a community QA site such as Yahoo! Answers and to rely on other users to provide answers, often within minutes. The importance of such community QA sites is magnified as they create archives of millions of questions and hundreds of millions of answers, many of which are invaluable for the needs of other searchers. However, to make this immense body of knowledge accessible, answer...
A key challenge in recommender system research is how to effectively profile new users, a problem generally known as cold-start recommendation. Recently the idea of progressively querying user responses through an initial interview process has been proposed as a useful preference elicitation strategy. In this paper, we present functional matrix factorization (fMF), a novel cold-start recommendation method that solves the problem of initial interview construction within the context of learning user and item profiles. Specifically, fMF constructs a decision...
Dynamic treatment recommendation systems based on large-scale electronic health records (EHRs) have become key to improving practical clinical outcomes. Prior relevant studies recommend treatments using either supervised learning (e.g. matching the indicator signal that denotes doctor prescriptions) or reinforcement learning (e.g. maximizing the evaluation signal that indicates cumulative reward from survival rates). However, none of these studies has considered combining the benefits of supervised learning and reinforcement learning. In this paper, we propose...
Fashion recommendation has attracted increasing attention from both industry and academic communities. This paper proposes a novel neural architecture for fashion recommendation based on both image region-level features and user review information. Our basic intuition is that, for a fashion image, not all regions are equally important to users, i.e., people usually care about only a few parts of the image. To model this human sense, we learn an attention model over many pre-segmented image regions, which lets us understand which regions a user is really interested in...
Event sequences, asynchronously generated with random timestamps, are ubiquitous across applications. The precise and arbitrary timestamps can carry important clues about the underlying dynamics, and they make event data fundamentally different from time-series data, where the series is indexed with a fixed and equal time interval. One expressive mathematical tool for modeling events is the point process. The intensity functions of many point processes involve two components: the background and the effect of the history. Due to its inherent spontaneousness, the background can be...
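One classic point process with the two components named above is the Hawkes process with an exponential kernel, in which a constant background rate `mu` is added to the excitation contributed by past events. The plain-Python sketch below evaluates that intensity and the corresponding log-likelihood on [0, T]. The kernel choice and the parameter names `mu`, `alpha` and `beta` are assumptions for illustration; the excerpt does not say how the paper itself models each component.

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """Assumed exponential-kernel Hawkes intensity:
    lambda(t) = mu + sum over t_i < t of alpha * beta * exp(-beta * (t - t_i))."""
    excitation = sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in history if ti < t)
    return mu + excitation  # background + effect of the history

def hawkes_log_likelihood(events, T, mu, alpha, beta):
    """Log-likelihood of sorted event times on [0, T]:
    sum of log lambda(t_i) minus the integral of lambda over [0, T]."""
    log_term = sum(math.log(hawkes_intensity(ti, events, mu, alpha, beta)) for ti in events)
    # Closed-form compensator (integrated intensity) for the exponential kernel.
    compensator = mu * T + alpha * sum(1.0 - math.exp(-beta * (T - ti)) for ti in events)
    return log_term - compensator
```

With this parameterization `alpha` is the expected number of events directly triggered by each event, so `alpha < 1` is the usual condition for a stationary process.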
Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, we propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. We show that an approximate solution to the minimization problem...
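A sketch of a spectral bipartite split, assuming the relaxed problem is solved with the second singular vectors of the degree-normalized weight matrix D1^{-1/2} A D2^{-1/2}, as in related spectral co-clustering work. The excerpt is cut off before describing the paper's exact solution, so treat `bipartite_spectral_split` and the toy term-document matrix as illustrative.

```python
import numpy as np

def bipartite_spectral_split(A):
    """Split rows and columns of a non-negative weight matrix A into two co-clusters."""
    d_rows = A.sum(axis=1)
    d_cols = A.sum(axis=0)
    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}.
    An = A / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]
    U, s, Vt = np.linalg.svd(An)           # singular values in descending order
    # The top singular pair (value 1) is trivial; the second pair carries the split.
    x = U[:, 1] / np.sqrt(d_rows)
    y = Vt[1, :] / np.sqrt(d_cols)
    return x >= 0, y >= 0

# Hypothetical data: rows are terms, columns are documents.
A = np.array([[3.0, 2.0, 0.1, 0.0],
              [2.0, 3.0, 0.0, 0.1],
              [0.1, 0.0, 3.0, 2.0],
              [0.0, 0.1, 2.0, 3.0]])
row_side, col_side = bipartite_spectral_split(A)
```

Rows and columns that land on the same side form one co-cluster, here a pair of terms together with the two documents that use them most.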
An author may have multiple names and multiple authors may share the same name simply due to name abbreviations, identical names, or misspellings in publications or bibliographies [1]. This can produce name ambiguity, which can affect the performance of document retrieval, web search, and database integration, and may cause improper attribution of credit. Proposed here is an unsupervised learning approach using K-way spectral clustering that disambiguates authors in citations. The approach utilizes three types of citation attributes: co-author names, paper titles,...
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a support vector machine classification-based method for metadata extraction from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the predicted class...
A novel method for simultaneous keyphrase extraction and generic text summarization is proposed by modeling text documents as weighted undirected bipartite graphs. Spectral graph clustering algorithms are used for partitioning sentences of the documents into topical groups, with sentence link priors being exploited to enhance clustering quality. Within each topical group, saliency scores for keyphrases and sentences are generated based on a mutual reinforcement principle. The keyphrases and sentences are then ranked according to their saliency scores and selected for inclusion in the top keyphrase list and summaries...
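The mutual reinforcement step can be pictured as a HITS-style iteration on a term-by-sentence weight matrix: a term is salient if it appears in salient sentences, and a sentence is salient if it contains salient terms. This sketch assumes that formulation, whose fixed point is the leading singular vector pair of the matrix; the function name, the toy matrix and the iteration count are illustrative, and the paper's weighting of terms and sentences is not reproduced.

```python
import numpy as np

def mutual_reinforcement(W, n_iter=100):
    """W[i, j] >= 0 is the (hypothetical) weight of term i in sentence j.

    Term saliency u and sentence saliency v reinforce each other:
    u is proportional to W v, and v is proportional to W^T u.
    """
    u = np.ones(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)   # sentence scores from term scores
        u = W @ v
        u /= np.linalg.norm(u)   # term scores from sentence scores
    return u, v

# Hypothetical 4-term x 3-sentence weight matrix.
W = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
term_scores, sentence_scores = mutual_reinforcement(W)
```

This is power iteration on W Wᵀ, so the scores converge to the leading left and right singular vectors of W, and ranking terms or sentences by these scores gives the top keyphrases and summary sentences.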
Recent graph-theoretic approaches have demonstrated remarkable successes for ranking networked entities, but most of their applications are limited to homogeneous networks such as the network of citations between publications. This paper proposes a novel method for co-ranking authors and publications using several networks: the social network connecting the authors, the citation network connecting the publications, as well as the authorship network that ties the previous two together. The new co-ranking framework is based on coupling random walks that separately rank authors and documents...
We propose a novel approach to sufficient dimension reduction in regression, based on estimating contour directions of small variation in the response. These directions span the orthogonal complement of the minimal space relevant for the regression and can be extracted according to two measures of variation in the response, leading to the simple and general contour regression (SCR and GCR) methodology. In comparison with existing techniques, this contour-based methodology guarantees exhaustive estimation of the central subspace under ellipticity of the predictor distribution and mild...