NFDI4DS | UHH-SEMS - Publication Details

Jian Li

ORCID: 0000-0003-4650-3925

About

Contact & Profiles

A5101938872

Research Areas

Optimization and Search Problems
Complexity and Algorithms in Graphs
Computational Geometry and Mesh Generation
Advanced Bandit Algorithms Research
Facility Location and Emergency Management
Auction Theory and Applications
Data Management and Algorithms
Machine Learning and Algorithms
Reinforcement Learning in Robotics
Topic Modeling
Complex Network Analysis Techniques
Advanced Graph Neural Networks
Scheduling and Optimization Algorithms
Machine Learning and Data Classification
Stochastic Gradient Optimization Techniques
Vehicle Routing Optimization Methods
Advanced Image and Video Retrieval Techniques
Face and Expression Recognition
Sparse and Compressive Sensing Techniques
Lymphoma Diagnosis and Treatment
Viral-associated cancers and disorders
Point processes and geometric inequalities
Markov Chains and Monte Carlo Methods
Algorithms and Data Compression
Advanced Clustering Algorithms Research

Tsinghua University
2016-2025

Chongqing Cancer Hospital
2022-2025

Chongqing University
2007-2025

Beijing Institute of Technology
2024

Ministry of Education of the People's Republic of China
2016-2022

Binghamton University
2022

Institute of Acoustics
2021

University of Chinese Academy of Sciences
2021

Shanxi University of Traditional Chinese Medicine
2020

Chinese Academy of Sciences
2013-2017

Network Embedding as Matrix Factorization

OPENALEX - Publications

Jiezhong Qiu Yuxiao Dong Hao Ma Jian Li Kuansan Wang and 1 more

Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size...

10.1145/3159652.3159706 preprint EN 2018-02-02

When LP Is the Cure for Your Matching Woes: Improved Bounds for Stochastic Matchings

OPENALEX - Publications

Nikhil Bansal Anupam Gupta Jian Li Julián Mestre Viswanath Nagarajan and 1 more

10.1007/s00453-011-9511-8 article EN Algorithmica 2011-04-08

LRSpeech

OPENALEX - Publications

Jin Xu Xu Tan Yi Ren Tao Qin Jian Li and 2 more

Speech synthesis (text to speech, TTS) and recognition (automatic speech recognition, ASR) are important speech tasks, and require a large amount of text and speech pairs for model training. However, there are more than 6,000 languages in the world and most of them lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages. In this paper, we develop LRSpeech, a TTS and ASR system under the extremely low-resource setting, which can support rare languages with low data cost. LRSpeech consists of three key techniques: 1)...

10.1145/3394486.3403331 article EN 2020-08-20

Trinary-Projection Trees for Approximate Nearest Neighbor Search

OPENALEX - Publications

Jingdong Wang Naiyan Wang You Jia Jian Li Gang Zeng and 2 more

We address the problem of approximate nearest neighbor (ANN) search for visual descriptor indexing. Most spatial partition trees, such as KD trees, VP trees, and so on, follow the hierarchical binary space partitioning framework. The key effort is to design different partition functions (hyperplane or hypersphere) to divide the points so that 1) the data points can be well grouped to support effective NN candidate location and 2) the partition functions can be quickly evaluated to support efficient NN candidate location. We design a trinary-projection direction-based partition function. The trinary-projection direction is defined as a combination of a few...

10.1109/tpami.2013.125 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2013-06-28

Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality

OPENALEX - Publications

Min Xie Raymond Chi-Wing Wong Jian Li Cheng Long Ashwin Lall

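Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries were proposed in the literature: top-k queries and skyline queries. A top-k query requires users to specify their utility functions beforehand and then returns k tuples to the users. A skyline query does not require any utility function from users, but it puts no control on the number of tuples returned to users. Recently, a k-regret query was proposed and received attention from the community because it does not require any utility function from users and the output size is controllable, and thus it avoids those deficiencies of top-k queries and skyline queries. Specifically, it returns k tuples that...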

10.1145/3183713.3196903 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

Gradient Boosting with Piece-Wise Linear Regression Trees

OPENALEX - Publications

Yu Shi Jian Li Zhize Li

Gradient Boosted Decision Trees (GBDT) is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in some popular open sourced toolkits including XGBoost, LightGBM and CatBoost. In this paper, we show that both the accuracy and efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise...

10.24963/ijcai.2019/476 article EN 2019-07-28

NAS-BERT

OPENALEX - Publications

Jin Xu Xu Tan Renqian Luo Kaitao Song Jian Li and 2 more

While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them difficult for real-world deployment. Therefore, model compression is necessary to reduce the computation and memory cost of pre-trained models. In this work, we aim to compress BERT and address the following two challenging practical issues: (1) The compression algorithm should be able to output multiple compressed models with different sizes...

10.1145/3447548.3467262 preprint EN 2021-08-13

Scalable column concept determination for web tables using large knowledge bases

OPENALEX - Publications

Dong Deng Yu Jiang Guoliang Li Jian Li Cong Yu

Tabular data on the Web has become a rich source of structured data that is useful for ordinary users to explore. Due to its potential, tables on the Web have recently attracted a number of studies with the goals of understanding the semantics of those Web tables and providing effective search and exploration mechanisms over them. An important part of table understanding and search is column concept determination, i.e., identifying the most appropriate concepts associated with the columns of the tables. The problem becomes especially challenging with the availability of increasingly rich knowledge bases...

10.14778/2536258.2536271 article EN Proceedings of the VLDB Endowment 2013-08-01

Learning Gradient Descent: Better Generalization and Longer Horizons

OPENALEX - Publications

Kaifeng Lv Shunhua Jiang Jian Li

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and time consuming. Recently, researchers have tried to use deep learning algorithms to exploit the landscape of the loss function of the training problem of interest, and learn how to optimize over it in an automatic way. In this paper, we propose a new learning-to-learn model and some useful and practical...

10.48550/arxiv.1703.03633 preprint EN other-oa arXiv (Cornell University) 2017-01-01

DESTPRE

OPENALEX - Publications

Mengwen Xu Dong Wang Jian Li

With the wide use of mobile devices, predicting the destination of moving vehicles has become an increasingly important problem for location based recommendation systems and destination-based advertising. Most existing approaches are based on various Markov chain models, in which the historical trajectories are used to train the model and the top-k most probable destinations are returned. We identify certain limitations of the previous approaches. Instead, we propose a new data-driven framework, called DestPre, which is not based on a probabilistic...

10.1145/2971648.2971664 article EN 2016-09-09

ImGCL: Revisiting Graph Contrastive Learning on Imbalanced Node Classification

OPENALEX - Publications

Liang Zeng Lanqing Li Ziqi Gao Peilin Zhao Jian Li

Graph contrastive learning (GCL) has attracted a surge of attention due to its superior performance for learning node/graph representations without labels. However, in practice, the underlying class distribution of unlabeled nodes for the given graph is usually imbalanced. This highly imbalanced class distribution inevitably deteriorates the quality of learned node representations in GCL. Indeed, we empirically find that most state-of-the-art GCL methods cannot obtain discriminative representations and exhibit poor performance on imbalanced node classification. Motivated by this observation,...

10.1609/aaai.v37i9.26319 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Energy efficient scheduling via partial shutdown

OPENALEX - Publications

Samir Khuller Jian Li Barna Saha

Motivated by issues of saving energy in data centers, we define a collection of new problems referred to as "machine activation" problems. The central framework we introduce considers a collection of m machines (unrelated or related), with each machine i having an activation cost of a_i. There is also a collection of n jobs that need to be performed, and p_{i,j} is the processing time of job j on machine i. Standard scheduling models assume that the set of machines is fixed and all machines are available. However, in our setting, there is an activation cost budget of A: we would like to select a subset S of the machines to activate with total cost a(S) ≤ A and find a schedule for...

10.5555/1873601.1873711 article EN 2010-01-17

Clinical characteristics and prognosis analysis of 13 cases of newly diagnosed plasmablastic lymphoma

OPENALEX - Publications

Bingling Guo Zhixiong Yang Chaoyu Wang Chongling Hu Jun Li and 13 more

Objective: To explore the clinical characteristics, treatment, and prognosis of patients with newly diagnosed plasmablastic lymphoma (PBL). Methods: The clinical data of 13 PBL patients admitted to Chongqing University Cancer Hospital from January 2013 to June 2024 were retrospectively analyzed. Survival analysis was performed using the Kaplan-Meier survival curve and the Log-rank test. Univariate and multivariate Cox regression model analyses were used for analyzing prognostic factors....

10.21203/rs.3.rs-5917684/v1 preprint EN cc-by Research Square (Research Square) 2025-02-13

Single-Pass PCA of Large High-Dimensional Data

OPENALEX - Publications

Wenjian Yu Yu Gu Jian Li Shenghua Liu Yaohang Li

Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the top singular vectors of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed to compute PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or data generated in a streaming fashion. Experiments with synthetic and real data validate the algorithm's accuracy, which...

10.24963/ijcai.2017/468 article EN 2017-07-28
