- Optimization and Search Problems
- Complexity and Algorithms in Graphs
- Computational Geometry and Mesh Generation
- Advanced Bandit Algorithms Research
- Facility Location and Emergency Management
- Auction Theory and Applications
- Data Management and Algorithms
- Machine Learning and Algorithms
- Reinforcement Learning in Robotics
- Topic Modeling
- Complex Network Analysis Techniques
- Advanced Graph Neural Networks
- Scheduling and Optimization Algorithms
- Machine Learning and Data Classification
- Stochastic Gradient Optimization Techniques
- Vehicle Routing Optimization Methods
- Advanced Image and Video Retrieval Techniques
- Face and Expression Recognition
- Sparse and Compressive Sensing Techniques
- Lymphoma Diagnosis and Treatment
- Viral-associated cancers and disorders
- Point processes and geometric inequalities
- Markov Chains and Monte Carlo Methods
- Algorithms and Data Compression
- Advanced Clustering Algorithms Research
Tsinghua University
2016-2025
Chongqing Cancer Hospital
2022-2025
Chongqing University
2007-2025
Beijing Institute of Technology
2024
Ministry of Education of the People's Republic of China
2016-2022
Binghamton University
2022
Institute of Acoustics
2021
University of Chinese Academy of Sciences
2021
Shanxi University of Traditional Chinese Medicine
2020
Chinese Academy of Sciences
2013-2017
Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size...
Speech synthesis (text to speech, TTS) and recognition (automatic speech recognition, ASR) are important speech tasks, and require a large amount of text and speech pairs for model training. However, there are more than 6,000 languages in the world and most lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages. In this paper, we develop LRSpeech, a TTS and ASR system under the extremely low-resource setting, which can support rare languages with low data cost. LRSpeech consists of three key techniques: 1)...
We address the problem of approximate nearest neighbor (ANN) search for visual descriptor indexing. Most spatial partition trees, such as KD trees, VP trees, and so on, follow the hierarchical binary space partitioning framework. The key effort is to design different partition functions (hyperplane or hypersphere) to divide the points so that 1) the data points can be well grouped to support effective NN candidate location and 2) the partition functions can be quickly evaluated to support efficient NN candidate location. We propose a trinary-projection direction-based partition function. The trinary-projection direction is defined as a combination of a few...
Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries were proposed in the literature: top-k queries and skyline queries. A top-k query requires users to specify their utility functions beforehand and then returns k tuples to the users. A skyline query does not require any utility function from users, but it puts no control on the number of tuples returned to users. Recently, the k-regret query has received attention from the community because its output size is controllable, and it thus avoids those deficiencies of top-k and skyline queries. Specifically, that...
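The contrast between the two classic queries described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the linear utility function, the "larger is better" convention, and the `hotels` data are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def top_k(points: Sequence[Point], weights: Sequence[float], k: int) -> List[Point]:
    """Top-k query: needs a utility function (assumed linear here) and returns exactly k tuples."""
    def utility(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))
    return sorted(points, key=utility, reverse=True)[:k]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is at least as good everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Skyline query: needs no utility function, but the output size is not controlled."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-attribute tuples, e.g. (value for money, rating), both scaled to [0, 1].
hotels = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.8), (0.3, 0.3), (0.1, 0.95)]

top2 = top_k(hotels, weights=(0.5, 0.5), k=2)
sky = skyline(hotels)
print(top2)  # two tuples, chosen by the user-supplied weights
print(sky)   # every non-dominated tuple, however many there are
```

On this toy data the top-k query returns two tuples while the skyline returns four of the five, which is the output-size problem the abstract refers to.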
Gradient Boosted Decision Trees (GBDT) is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in some popular open-source toolkits including XGBoost, LightGBM, and CatBoost. In this paper, we show that both the accuracy and efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise...
While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them difficult for real-world deployment. Therefore, model compression is necessary to reduce the computation and memory cost of pre-trained models. In this work, we aim to compress BERT and address the following two challenging practical issues: (1) The compression algorithm should be able to output multiple compressed models with different sizes...
Tabular data on the Web has become a rich source of structured data that is useful for ordinary users to explore. Due to its potential, tables on the Web have recently attracted a number of studies with the goals of understanding the semantics of those tables and providing effective search and exploration mechanisms over them. An important part of table understanding is column concept determination, i.e., identifying the most appropriate concepts associated with the columns of the tables. The problem becomes especially challenging with the availability of increasingly rich knowledge bases...
Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes, and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and time consuming. Recently, researchers have tried to use deep learning algorithms to exploit the landscape of the loss function of the training problem of interest, and learn how to optimize over it in an automatic way. In this paper, we propose a new learning-to-learn model and some useful and practical...
With the wide use of mobile devices, predicting the destination of moving vehicles has become an increasingly important problem for location-based recommendation systems and destination-based advertising. Most existing approaches are based on various Markov chain models, in which the historical trajectories are used to train the model and the top-k most probable destinations are returned. We identify certain limitations of the previous approaches. Instead, we propose a new data-driven framework, called DestPre, which is not based on a probabilistic...
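For orientation, the kind of Markov chain baseline the abstract describes (not DestPre itself) might look like the sketch below. Everything concrete here is an assumption made for illustration: trajectories are lists of cell identifiers, the chain is first order, and a trip's destination is the cell from which the chain moves to a hypothetical `END` state.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

END = "<end>"  # hypothetical absorbing state marking the end of a trip


def fit_transitions(trajectories: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities from historical trajectories."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:] + [END]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}


def top_k_destinations(model: Dict[str, Dict[str, float]], current: str,
                       k: int, max_steps: int = 50) -> List[Tuple[str, float]]:
    """Propagate probability mass from the current cell and rank the cells where trips end."""
    mass: Dict[str, float] = {current: 1.0}
    dest: Dict[str, float] = defaultdict(float)
    for _ in range(max_steps):  # bounded so that cycles in the chain cannot loop forever
        nxt: Dict[str, float] = defaultdict(float)
        for cell, p in mass.items():
            for b, q in model.get(cell, {}).items():
                if b == END:
                    dest[cell] += p * q
                else:
                    nxt[b] += p * q
        mass = nxt
        if not mass:
            break
    return sorted(dest.items(), key=lambda kv: kv[1], reverse=True)[:k]


history = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "C"], ["E", "B", "D"]]
model = fit_transitions(history)
ranked = top_k_destinations(model, current="A", k=2)
print(ranked)
```

Note that the first-order chain forgets where the trip started once it reaches cell `B`, so cell `D` receives probability even though no historical trip from `A` ended there.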
Graph contrastive learning (GCL) has attracted a surge of attention due to its superior performance for learning node/graph representations without labels. However, in practice, the underlying class distribution of unlabeled nodes for the given graph is usually imbalanced. This highly imbalanced class distribution inevitably deteriorates the quality of learned node representations in GCL. Indeed, we empirically find that most state-of-the-art GCL methods cannot obtain discriminative representations and exhibit poor performance on imbalanced node classification. Motivated by this observation,...
Motivated by issues of saving energy in data centers, we define a collection of new problems referred to as machine activation problems. The central framework we introduce considers a collection of m machines (unrelated or related), with each machine i having an activation cost a_i. There is also a collection of n jobs that need to be performed, and p_{i,j} is the processing time of job j on machine i. Standard scheduling models assume that the set of machines is fixed and all machines are available. However, in our setting, there is an activation cost budget A: we would like to select a subset S of the machines to activate with total cost a(S) ≤ A and find a schedule for...
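The setting above can be made concrete with a tiny brute-force solver. This is only a sketch under stated assumptions: the abstract is cut off before it names the scheduling objective, so minimizing the makespan (the largest machine load) is assumed here, and the function name and instance data are hypothetical. Exhaustive search is exponential and is meant for toy instances only.

```python
from itertools import combinations, product
from typing import List, Optional, Tuple

Result = Tuple[float, Tuple[int, ...], Tuple[int, ...]]


def best_activation(activation_cost: List[float],
                    proc_time: List[List[float]],
                    budget: float) -> Optional[Result]:
    """Try every affordable machine subset S and every job assignment onto S.

    proc_time[i][j] is the processing time of job j on machine i.
    Returns (makespan, S, assignment), or None if no machine is affordable.
    The makespan objective is an assumption, not taken from the abstract.
    """
    m, n = len(activation_cost), len(proc_time[0])
    best: Optional[Result] = None
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            if sum(activation_cost[i] for i in subset) > budget:
                continue  # a(S) exceeds the activation budget A
            for assignment in product(subset, repeat=n):
                load = {i: 0.0 for i in subset}
                for j, i in enumerate(assignment):
                    load[i] += proc_time[i][j]
                makespan = max(load.values())
                if best is None or makespan < best[0]:
                    best = (makespan, subset, assignment)
    return best


# Hypothetical instance: one fast but expensive machine, two slow cheap ones, four jobs.
a = [3, 2, 2]
p = [[3, 3, 3, 3],
     [4, 4, 4, 4],
     [4, 4, 4, 4]]

tight = best_activation(a, p, budget=4)
loose = best_activation(a, p, budget=7)
print(tight)
print(loose)
```

With the tight budget the two cheap machines beat the single fast one, and raising the budget so that all three can be activated lowers the makespan further, which shows how the budget and the schedule interact.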
Objective: To explore the clinical characteristics, treatment, and prognosis of patients with newly diagnosed plasmablastic lymphoma (PBL). Methods: The data of 13 PBL patients admitted to Chongqing University Cancer Hospital from January 2013 to June 2024 were retrospectively analyzed. Survival analysis was performed using the Kaplan-Meier survival curve and Log-rank test. Univariate and multivariate Cox regression model analyses were used for analyzing prognostic factors....
Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the top singular vectors of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed to compute PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or data generated in a streaming fashion. Experiments with synthetic and real data validate the algorithm's accuracy, which...