- Topic Modeling
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Machine Learning and Data Classification
- Web Data Mining and Analysis
- Anomaly Detection Techniques and Applications
- Complex Network Analysis Techniques
- Text and Document Classification Technologies
- Advanced Graph Neural Networks
- Higher Education and Teaching Methods
- Semantic Web and Ontologies
- Bayesian Modeling and Causal Inference
- Domain Adaptation and Few-Shot Learning
- Machine Learning and Algorithms
- Educational Technology and Assessment
- Advanced Computational Techniques and Applications
- Energy, Environment, Economic Growth
- Complex Systems and Time Series Analysis
- Speech Recognition and Synthesis
- Artificial Intelligence in Games
- Recommender Systems and Techniques
- Sports Analytics and Performance
- Explainable Artificial Intelligence (XAI)
- Data Mining Algorithms and Applications
- Data Management and Algorithms
- Tencent (China): 2019-2025
- University of Electronic Science and Technology of China: 2016-2025
- University College London: 2010-2025
- Arizona State University: 2002-2024
- Shenyang Agricultural University: 2023-2024
- Beihang University: 2009-2024
- Hunan Normal University: 2024
- China University of Petroleum, East China: 2022-2024
- Southwestern University of Finance and Economics: 2022-2023
- Yangtze University: 2023
We introduce Texygen, a benchmarking platform to support research on open-domain text generation models. Texygen has not only implemented a majority of text generation models, but also covered a set of metrics that evaluate the diversity, the quality and the consistency of the generated texts. The platform could help standardize research on text generation and facilitate the sharing of fine-tuned open-source implementations among researchers for their work. As a consequence, this would help in improving the reproducibility and reliability of future research work in text generation.
Automatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, by combining with policy gradient, Generative Adversarial Nets (GANs) that use a discriminative model to guide the training of the generative model as a reinforcement learning policy have shown promising results in text generation. However, the scalar guiding signal is only available after the entire text has been generated and lacks intermediate information about text structure during...
In the last decade, Dynamic Time Warping (DTW) has emerged as the distance measure of choice for virtually all time series data mining applications. This is the result of significant progress in improving DTW's efficiency, and of multiple empirical studies showing that DTW-based classifiers at least equal the accuracy of their rivals across dozens of datasets. Thus far, most research has considered only the one-dimensional case, with practitioners generalizing to the multi-dimensional case in one of two ways. In general, it appears...
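A minimal sketch of classic DTW and of two multi-dimensional generalizations often described in the literature: warping each dimension independently and summing the costs, versus warping all dimensions together along one shared path. The truncated abstract does not say which "two ways" it means, so treating these two as the variants is an assumption, as are the squared-difference point distance and the function names.

```python
import math

def dtw(a, b, dist):
    """Classic dynamic-programming DTW (no warping window) between sequences a and b."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def sq_dist(x, y):
    """Squared difference between two scalars."""
    return (x - y) ** 2

def point_dist(p, q):
    """Squared Euclidean distance between two multi-dimensional points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def dtw_independent(A, B):
    """Warp each dimension separately and sum the per-dimension DTW costs.
    A and B are lists of dimensions, each a list of floats."""
    return sum(dtw(a, b, sq_dist) for a, b in zip(A, B))

def dtw_dependent(A, B):
    """Warp all dimensions together along a single path over multi-dimensional points."""
    return dtw(list(zip(*A)), list(zip(*B)), point_dist)
```

For A = [[0, 1, 0, 0], [0, 0, 1, 0]] and B = [[0, 0, 1, 0], [0, 1, 0, 0]], `dtw_independent` returns 0.0 because each dimension can shift its peak freely, while `dtw_dependent` returns a positive cost because one shared path cannot align both peaks in reversed order. This is why the choice between the two generalizations matters.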
The rapid development of the digital economy provides an unprecedented opportunity for China to achieve carbon neutrality by 2060. While previous studies have explored the relationship between the digital economy, technologies, and energy, its impact on emissions has not received sufficient attention in the literature. Meanwhile, although cities are the basic units of emission reduction policies, few studies have been conducted at the city level in China. This study investigates the spatial correlation and spillover effects across 248 prefecture-level cities from 2011 to 2019....
In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for optimisers. Based on these findings, we propose a Heteroscedastic Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits an exact marginal log-likelihood, and is robust to the values of learned parameters. We demonstrate HEBO’s...
Automatic Text Summarization (ATS), utilizing Natural Language Processing (NLP) algorithms, aims to create concise and accurate summaries, thereby significantly reducing the human effort required in processing large volumes of text. ATS has drawn considerable interest in both academic and industrial circles. Many studies have been conducted in the past to survey ATS methods; however, they generally lack practicality for real-world implementations, as they often categorize previous methods from a theoretical...
Interpretations of TF-IDF are based on binary independence retrieval, Poisson, information theory, and language modelling. This paper contributes a review of existing interpretations, and then TF-IDF is systematically related to the probabilities P(q|d) and P(d|q). Two approaches are explored: a space of independent terms and a space of disjoint terms. For independent terms, an "extreme" query/non-query term assumption uncovers TF-IDF, and an analogy of P(d|q) and the probabilistic odds O(r|d, q) mirrors relevance feedback. The relationship between...
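For readers unfamiliar with the quantity being interpreted, a minimal sketch of one common TF-IDF variant: raw term count times log(N / df), with natural log, scoring a query as the sum of its terms' weights in a document. The abstract covers several interpretations and variants; this particular choice, and the function names, are assumptions for illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per document
    weights = []
    for d in docs:
        tf = Counter(d)    # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

def score(query, doc_weights):
    """Retrieval score of a document: sum of TF-IDF weights of the query terms it contains."""
    return sum(doc_weights.get(t, 0.0) for t in query)
```

A term that occurs in every document gets IDF log(1) = 0, so it contributes nothing to any score; rarer terms dominate ranking.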
Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the costs of the inherited iterative sampling process have hindered their application to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions with diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is...
Designing waveforms with a Constant Modulus Constraint (CMC) to achieve desirable Slow-Time Ambiguity Function (STAF) characteristics is significantly important in radar technology. The problem is NP-hard due to its non-convex quartic objective function and the CMC constraint. Existing methods typically involve model-based approaches with relaxation and data-driven Deep Neural Network (DNN) methods, which face the challenge of data limitation. We observe that the Complex Circle Manifold (CCM) naturally satisfies...
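The complex circle manifold is the set of complex vectors whose entries all have modulus 1, so every point on it already satisfies a constant modulus constraint (assuming the modulus is normalized to 1). A minimal sketch of the two generic operations Riemannian optimization on this manifold typically uses, tangent-space projection and retraction, follows; the abstract is truncated before the authors' method, so this is not their algorithm, and the names are hypothetical.

```python
def retract(x):
    """Map each entry back onto the unit circle: x_i / |x_i| (entries must be nonzero)."""
    return [xi / abs(xi) for xi in x]

def tangent_project(x, g):
    """Project an ambient (Euclidean) gradient g onto the tangent space at x.
    For unit-modulus x, the tangent space is {v : Re(conj(x_i) * v_i) = 0 for all i}."""
    return [gi - (gi * xi.conjugate()).real * xi for xi, gi in zip(x, g)]

def riemannian_step(x, egrad, step):
    """One gradient-descent step that stays on the manifold.
    The tangent direction is orthogonal to x_i, so |x_i - step * r_i| >= 1 and
    the retraction never divides by zero."""
    rg = tangent_project(x, egrad)
    return retract([xi - step * ri for xi, ri in zip(x, rg)])
```

Because each iterate is retracted onto the manifold, the constant modulus constraint holds exactly at every step without relaxation.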
Accurate prediction of future blood glucose (BG) levels can effectively improve BG management for people living with type 1 or 2 diabetes, thereby reducing complications and improving quality of life. The state of the art has been achieved by leveraging advanced deep learning methods to model multimodal data, i.e., sensor data and self-reported event data organized as multi-variate time series (MTS). However, these methods are mostly regarded as "black boxes" and are not entirely trusted by clinicians and patients. In this paper,...
This study addresses the problem of surrounding rock control as the 52,102 working face passes through the roof fall area of the return air roadway in the Lijiahao coal mine. Through on-site investigations, numerical simulations, and engineering practice, we analyzed the characteristics and causes of this problem along the rib adjacent to the goaf. Based on the Mohr-Coulomb criterion, a model was established, identifying the influencing factors and leading to an early intervention pressure relief technology centered on "proactive avoidance." We determined the starting position...
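For reference, the textbook Mohr-Coulomb criterion that the model builds on can be written in shear-normal form or, equivalently, in principal-stress form (compression positive), where c is cohesion, φ is the internal friction angle, σ_n is the normal stress on the failure plane, and σ_1, σ_3 are the major and minor principal stresses. The truncated abstract does not say which form or parameters the authors' model uses.

```latex
\tau_f = c + \sigma_n \tan\varphi
\quad\text{equivalently}\quad
\sigma_1 = \frac{1 + \sin\varphi}{1 - \sin\varphi}\,\sigma_3 + \frac{2c\cos\varphi}{1 - \sin\varphi}
```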
In this work, we explore the potential of large language models (LLMs) for generating functional test scripts, which necessitates understanding the dynamically evolving code structure of the target software. To achieve this, we propose a case-based reasoning (CBR) system utilizing a 4R cycle (i.e., retrieve, reuse, revise, and retain), which maintains and leverages a case bank of test intent descriptions and corresponding test scripts to facilitate test script generation by LLMs. To improve user experience further, we introduce Re4, an optimization...
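The 4R cycle described above can be sketched as follows. The word-overlap (Jaccard) similarity, the class and function names, and the `generate` and `revise` callables (standing in for an LLM call and a fix-up step) are all hypothetical; the abstract does not describe the system's retrieval method or prompts, and Re4 is not modeled here.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    intent: str   # natural-language test intent description
    script: str   # corresponding test script

def _jaccard(a, b):
    """Word-overlap similarity between two descriptions (a stand-in retrieval metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

@dataclass
class CaseBank:
    cases: list = field(default_factory=list)

    def retrieve(self, intent, k=1):
        """Retrieve: the k stored cases whose intents are most similar to the query."""
        return sorted(self.cases, key=lambda c: _jaccard(intent, c.intent), reverse=True)[:k]

    def retain(self, case):
        """Retain: store a finished case so later queries can reuse it."""
        self.cases.append(case)

def solve(bank, intent, generate, revise):
    similar = bank.retrieve(intent, k=2)  # Retrieve similar past cases
    draft = generate(intent, similar)     # Reuse: e.g., prompt an LLM with the cases as examples
    final = revise(draft)                 # Revise: e.g., repair the draft after running it
    bank.retain(Case(intent, final))      # Retain the revised case in the bank
    return final
```

Retaining each revised script means the case bank grows with the target software, which is how a CBR loop can track code that keeps evolving.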
Personalized diagnosis and therapy require monitoring patient activity using various body sensors. Sensor data generated during personalized exercises or tasks may be too specific or inadequate to be evaluated with supervised methods such as classification. We propose multidimensional motif (MDM) discovery as a means for monitoring, since motifs can capture repeating patterns across multiple dimensions of the data and can serve as conformance indicators. Previous studies pertaining to mining MDMs have proposed...
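To make "motif" concrete, here is a naive brute-force sketch under one common definition: the pair of non-overlapping length-m windows with the smallest z-normalized Euclidean distance, summed over all dimensions. Practical MDM methods are far faster and may select subsets of dimensions; the truncated abstract does not describe the authors' algorithm, so this definition and the names are assumptions.

```python
import math

def znorm(xs):
    """Z-normalize a window so motifs match on shape, not offset or scale."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return [0.0] * len(xs) if sd == 0 else [(x - mu) / sd for x in xs]

def multidim_motif(series, m):
    """series: list of equal-length dimensions. Returns (i, j, dist) for the closest
    pair of non-overlapping length-m windows, distance summed over dimensions."""
    n = len(series[0])
    best = (None, None, math.inf)
    for i in range(n - m + 1):
        for j in range(i + m, n - m + 1):  # j >= i + m skips trivial overlapping matches
            d = 0.0
            for dim in series:
                a, b = znorm(dim[i:i + m]), znorm(dim[j:j + m])
                d += math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            if d < best[2]:
                best = (i, j, d)
    return best
```

The cost is quadratic in series length, which is acceptable for illustration but is exactly the bottleneck that dedicated MDM discovery algorithms aim to avoid.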
We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional, structured input spaces. By adapting ideas from deep metric learning, we use label guidance from the blackbox function to structure the VAE latent space, facilitating the Gaussian process fit and yielding improved BO performance. Importantly for such problem settings, our method operates in semi-supervised regimes where only a few labelled data points are available. We run experiments on three...
Diffusion probabilistic models (DPMs) and their extensions have emerged as competitive generative models, yet confront challenges of efficient sampling. We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can be trained with a novel modeling objective. We show that the new surrogate objective can achieve a lower bound of the log marginal likelihood tighter than a conventional surrogate. We also find that BDDM allows inheriting pre-trained...
An effective system for monitoring, reporting and verification (MRV) is the cornerstone of any carbon emissions trading market. This paper analyses existing MRV frameworks in China, including those under the seven emissions trading pilot schemes, to identify four key challenges in the establishment of an MRV system for China's forthcoming national market: (1) ambiguity in the legal status of relevant policies and regulations, (2) unclear requirements on the content of monitoring plans, (3) a lack of consistency and harmonization in accounting guidelines, (4) information...