Conghui Tan

ORCID: 0000-0003-3993-4751
Research Areas
  • Sparse and Compressive Sensing Techniques
  • Privacy-Preserving Technologies in Data
  • Stochastic Gradient Optimization Techniques
  • Music and Audio Processing
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Tensor decomposition and applications
  • Face and Expression Recognition
  • Mobile Crowdsensing and Crowdsourcing
  • Markov Chains and Monte Carlo Methods
  • Advanced Optimization Algorithms Research
  • Recommender Systems and Techniques
  • Advanced Bandit Algorithms Research
  • Optimization and Variational Analysis
  • Topic Modeling
  • Statistical Methods and Inference
  • Hydrological Forecasting Using AI
  • Metaheuristic Optimization Algorithms Research
  • Data Quality and Management
  • Acoustic Wave Phenomena Research
  • Complexity and Algorithms in Graphs
  • Digital Media Forensic Detection
  • Water resources management and optimization
  • Artificial Intelligence in Healthcare and Education

West Bengal Electronics Industry Development Corporation Limited (India)
2019-2020

Chinese University of Hong Kong
2016-2020

ITT Technical Institute
2005

Federated learning has become increasingly important for modern machine learning, especially for data privacy-sensitive scenarios. Existing federated learning mostly adopts the central server-based architecture or centralized architecture. However, in many social network scenarios, centralized federated learning is not applicable (e.g., a central agent or server connecting all users may not exist, or the communication cost to the central server is not affordable). In this paper, we consider a generic setting: 1) the central server may not exist, and 2) the social network is unidirectional or of single-sided trust (i.e., user A trusts user B but...

10.48550/arxiv.1910.04956 preprint EN cc-by arXiv (Cornell University) 2019-01-01

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply to stochastic optimization algorithms, the common practice in SGD is either to use a diminishing step size, or to tune a fixed step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant, the stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB...

10.48550/arxiv.1605.04131 preprint EN other-oa arXiv (Cornell University) 2016-01-01
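
The Barzilai-Borwein step size named in the abstract above can be illustrated with a minimal sketch. This is not the SGD-BB or SVRG-BB algorithm from the paper: it applies the classical BB formula, step = (s·s)/(s·y) with s the change in iterate and y the change in gradient, to plain deterministic gradient descent on a diagonal quadratic. The function names `bb_step` and `minimize_quadratic`, the test problem, and the default parameters are all assumptions made for illustration.

```python
def bb_step(x_prev, x_curr, g_prev, g_curr):
    """Classical Barzilai-Borwein step size: (s . s) / (s . y)."""
    s = [a - b for a, b in zip(x_curr, x_prev)]  # change in iterate
    y = [a - b for a, b in zip(g_curr, g_prev)]  # change in gradient
    sy = sum(si * yi for si, yi in zip(s, y))
    if sy <= 0.0:
        raise ValueError("non-positive curvature: BB step undefined")
    return sum(si * si for si in s) / sy


def grad(x, diag):
    """Gradient of the hypothetical test function f(x) = 0.5 * sum(diag[i] * x[i]**2)."""
    return [d * xi for d, xi in zip(diag, x)]


def minimize_quadratic(diag, x0, first_step=0.05, iters=100, tol=1e-12):
    """Gradient descent where every step after the first uses the BB step size."""
    x_prev = list(x0)
    g_prev = grad(x_prev, diag)
    # The BB formula needs two iterates, so the first step uses a fixed size.
    x = [xi - first_step * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(iters):
        g = grad(x, diag)
        if all(abs(gi) < tol for gi in g):
            break  # gradient is numerically zero
        eta = bb_step(x_prev, x, g_prev, g)
        x_prev, g_prev = x, g
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    print(minimize_quadratic([1.0, 10.0], [1.0, 1.0]))
```

No step size is tuned by hand after the first iteration; the curvature estimate from successive iterates sets it, which is the property the abstract describes carrying over to the stochastic setting.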

In order to mine latent semantics from text data, word embedding and topic modeling are two major methodologies in industry. From a pragmatic perspective, each of these lines of semantic models faces increasing challenges in real-life applications. However, modern text mining tasks typically require a panoramic view of the semantics. Hence, discovering heterogeneous semantics (e.g., types of topics) is critical for the performance of these tasks, and it is necessary to design a model that meets this demand. Furthermore, with the arrival of big data...

10.1109/tkde.2021.3077025 article EN IEEE Transactions on Knowledge and Data Engineering 2021-01-01

Nonnegative matrix factorization (NMF) has been successfully applied in several data mining tasks. Recently, there is an increasing interest in the acceleration of NMF, due to its high cost on large matrices. On the other hand, the privacy issue of NMF over federated data is worthy of attention, since NMF is prevalently applied in image and text analysis, which may involve leveraging private data (e.g., medical records) across parties (e.g., hospitals). In this paper, we study...

10.1109/tkde.2020.2985964 article EN IEEE Transactions on Knowledge and Data Engineering 2020-04-10

Automatic Speech Recognition (ASR) is playing a vital role in a wide range of real-world applications. However, commercial ASR solutions are typically “one-size-fits-all” products, and clients are inevitably faced with the risk of severe performance degradation in field tests. Meanwhile, with new data regulations such as the European Union’s General Data Protection Regulation (GDPR) coming into force, ASR vendors, which traditionally utilize speech training data in a centralized approach, are becoming increasingly helpless to solve...

10.1145/3447687 article EN ACM Transactions on Intelligent Systems and Technology 2021-05-05

With the popularity of video/audio streaming applications in recent years, the wide spread of Autonomous Sensory Meridian Response (ASMR) erotica content is becoming a serious issue in social networks. Due to the subtle nature of ASMR and its relative rareness in real scenarios, detecting ASMR erotica content is a challenging task. In this article, we propose a novel neural framework for ASMR erotica moderation. The proposed framework consists of a pipeline of strategies to tackle the challenges unique to ASMR erotica content, such as data scarcity and imbalanced data. Based on...

10.1109/tkde.2023.3283501 article EN IEEE Transactions on Knowledge and Data Engineering 2023-06-07

Due to the rising awareness of privacy protection and the voluminous scale of speech data, it is becoming infeasible for Automatic Speech Recognition (ASR) system developers to train the acoustic model with complete data as before. In this paper, we propose a novel Divide-and-Merge paradigm to solve salient problems plaguing the ASR field. In the Divide phase, multiple acoustic models are trained based upon different subsets of the data, while in the Merge phase two algorithms are utilized to generate a high-quality acoustic model from those trained on the subsets. We first Genetic...

10.24963/ijcai.2020/513 article EN 2020-07-01

This paper is aimed at demonstrating a genetic algorithm method and applying it to predict the water quality of a reservoir on Taiwan island using remote sensing data. Genetic algorithms will be combined with an operation tree (GAOT) to find relationships between input and output data. A fittest function type will be obtained automatically from this method. The advantages of GA are global optimization, nonlinearity, flexibility and parallelism. In the current case study, GAOT is used to construct the relationship between algae concentration and Landsat...

10.7763/ijmo.2017.v7.566 article EN International Journal of Modeling and Optimization 2017-04-01

An Experimental Learning Element (ELE) for learning and recognizing sequential patterns is being developed as an adaptable pattern classifier of a larger system. Once external patterns are converted into a linear sequence of named objects, the ELE can build models that associate input object sequences with expected output state sequences. The ELE has been successfully demonstrated in recognizing hand-printed characters. This paper describes the ELE and compares its performance with a Dynamic Time Warp (DTW) based speech recognition system...

10.1109/icassp.1985.1168282 article EN 2005-03-23

Nonnegative matrix factorization (NMF) has been successfully applied in different fields, such as text mining, image processing, and video analysis. NMF is the problem of determining two nonnegative low rank matrices U and V, for a given input matrix M, such that M ≈ UVᵀ. There is an increasing interest in parallel and distributed NMF algorithms, due to the high cost of centralized NMF on large matrices. In this paper, we propose a distributed sketched alternating nonnegative least squares (DSANLS) framework for NMF, which utilizes a matrix sketching technique to reduce...

10.1145/3159652.3159662 article EN 2018-02-02
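
The NMF problem stated in the abstract above (find nonnegative U and V with M ≈ UVᵀ) can be sketched in a few lines. The example below does not implement DSANLS, sketching, or any distributed scheme; it swaps in the classical multiplicative update rule for the Frobenius-norm objective, run centrally in pure Python. The helper names (`matmul`, `transpose`, `frob_error`, `nmf`), the random initialization, and the small `eps` guard against division by zero are assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Dense matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(M, U, V):
    """Frobenius norm of M - U V^T."""
    R = matmul(U, transpose(V))
    return sum((m - r) ** 2 for mr, rr in zip(M, R) for m, r in zip(mr, rr)) ** 0.5


def nmf(M, rank, iters=200, seed=0, eps=1e-12):
    """Multiplicative updates for M ~= U V^T with U (n x rank), V (d x rank) >= 0."""
    rng = random.Random(seed)
    n, d = len(M), len(M[0])
    # Strictly positive start so the multiplicative updates are well defined.
    U = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    V = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(d)]
    Mt = transpose(M)
    for _ in range(iters):
        # U <- U * (M V) / (U V^T V), elementwise
        num = matmul(M, V)
        den = matmul(U, matmul(transpose(V), V))
        U = [[u * a / (b + eps) for u, a, b in zip(ur, ar, br)]
             for ur, ar, br in zip(U, num, den)]
        # V <- V * (M^T U) / (V U^T U), elementwise
        num = matmul(Mt, U)
        den = matmul(V, matmul(transpose(U), U))
        V = [[v * a / (b + eps) for v, a, b in zip(vr, ar, br)]
             for vr, ar, br in zip(V, num, den)]
    return U, V


if __name__ == "__main__":
    M = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # exactly rank 1 and nonnegative
    U, V = nmf(M, rank=1)
    print(frob_error(M, U, V))
```

Each update multiplies by a ratio of nonnegative quantities, so U and V stay nonnegative without an explicit projection. Every iteration touches the full matrix M, which is the cost on large matrices that motivates the parallel and distributed approaches the abstract refers to.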

Dual averaging-type methods are widely used in industrial machine learning applications due to their ability to promote solution structure (e.g., sparsity) efficiently. In this paper, we propose a novel accelerated dual-averaging primal-dual algorithm for minimizing a composite convex function. We also derive a stochastic version of the proposed method which solves empirical risk minimization, and its advantages in handling sparse data are demonstrated both theoretically and empirically.

10.1080/10556788.2020.1713779 article EN Optimization methods & software 2020-01-20

Due to the rising awareness of privacy protection and the voluminous scale of speech data, it is becoming infeasible for Automatic Speech Recognition (ASR) system developers to train the acoustic model with complete data as before. For example, the data may be owned by different curators, who are not allowed to share it with others. In this paper, we propose a novel paradigm to solve salient problems plaguing the ASR field. In the first stage, multiple acoustic models are trained based upon different subsets of the data, while in the second phase, two algorithms are utilized to generate...

10.48550/arxiv.2410.15620 preprint EN arXiv (Cornell University) 2024-10-20

The regularized empirical risk minimization problem with a linear predictor appears frequently in machine learning. In this paper, we propose a new stochastic primal-dual method to solve this class of problems. Different from existing methods, our proposed methods only require O(1) operations in each iteration. We also develop a variance-reduction variant of the algorithm that converges linearly. Numerical experiments suggest that our methods are faster than existing ones such as proximal SGD, SVRG and SAGA on high-dimensional problems.

10.48550/arxiv.1811.01182 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Since data regulations such as the European Union's General Data Protection Regulation (GDPR) have taken effect, the traditional two-step Automatic Speech Recognition (ASR) optimization strategy (i.e., training a one-size-fits-all model with the vendor's centralized data and fine-tuning it with clients' private data) has become infeasible. To meet these privacy requirements, we have proposed TFE, a novel GDPR-compliant ASR ecosystem, to incorporate transfer learning, federated learning, and evolutionary learning towards...

10.1145/3503161.3547731 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10