Jun Wang

ORCID: 0000-0002-9515-076X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Advanced Text Analysis Techniques
  • Machine Learning and Data Classification
  • Web Data Mining and Analysis
  • Anomaly Detection Techniques and Applications
  • Complex Network Analysis Techniques
  • Text and Document Classification Technologies
  • Advanced Graph Neural Networks
  • Higher Education and Teaching Methods
  • Semantic Web and Ontologies
  • Bayesian Modeling and Causal Inference
  • Domain Adaptation and Few-Shot Learning
  • Machine Learning and Algorithms
  • Educational Technology and Assessment
  • Advanced Computational Techniques and Applications
  • Energy, Environment, Economic Growth
  • Complex Systems and Time Series Analysis
  • Speech Recognition and Synthesis
  • Artificial Intelligence in Games
  • Recommender Systems and Techniques
  • Sports Analytics and Performance
  • Explainable Artificial Intelligence (XAI)
  • Data Mining Algorithms and Applications
  • Data Management and Algorithms

Tencent (China)
2019-2025

University of Electronic Science and Technology of China
2016-2025

University College London
2010-2025

Arizona State University
2002-2024

Shenyang Agricultural University
2023-2024

Beihang University
2009-2024

Hunan Normal University
2024

China University of Petroleum, East China
2022-2024

Southwestern University of Finance and Economics
2022-2023

Yangtze University
2023

We introduce Texygen, a benchmarking platform to support research on open-domain text generation models. Texygen has not only implemented a majority of text generation models, but also covered a set of metrics that evaluate the diversity, the quality and the consistency of the generated texts. The Texygen platform could help standardize the research on text generation and facilitate the sharing of fine-tuned open-source implementations among researchers for their work. As a consequence, this would help in improving the reproducibility and reliability of future research work in text generation.

10.48550/arxiv.1802.01886 preprint EN cc-by arXiv (Cornell University) 2018-01-01

10.1007/s11356-022-20957-w article EN Environmental Science and Pollution Research 2022-05-27

Automatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, by combining with policy gradient, Generative Adversarial Nets (GAN) that use a discriminative model to guide the training of the generative model as a reinforcement learning policy have shown promising results in text generation. However, the scalar guiding signal is only available after the entire text has been generated and lacks intermediate information about text structure during...

10.48550/arxiv.1709.08624 preprint EN cc-by arXiv (Cornell University) 2017-01-01

In the last decade, Dynamic Time Warping (DTW) has emerged as the distance measure of choice for virtually all time series data mining applications. This is the result of significant progress in improving DTW's efficiency, and multiple empirical studies showing that DTW-based classifiers at least equal the accuracy of their rivals across dozens of datasets. Thus far, most research has considered only the one-dimensional case, with practitioners generalizing to the multi-dimensional case in one of two ways. In general, it appears...

10.1137/1.9781611974010.33 article EN 2015-06-30

The rapid development of the digital economy provides an unprecedented opportunity for China to achieve carbon neutrality by 2060. While previous studies have explored the relationship between the digital economy, digital technologies, and energy, the impact of the digital economy on carbon emissions has not received sufficient attention in the literature. Meanwhile, although cities are the basic units of carbon emission reduction policies, few studies have been conducted at the city level in China. This study investigates the spatial correlation and spillover effects of the digital economy on carbon emissions in 248 prefecture-level cities in China from 2011 to 2019....

10.1016/j.cjpre.2022.06.001 article EN cc-by-nc-nd Chinese Journal of Population Resources and Environment 2022-06-01

In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO’s...

10.1613/jair.1.13643 article EN cc-by Journal of Artificial Intelligence Research 2022-07-11

Automatic Text Summarization (ATS), utilizing Natural Language Processing (NLP) algorithms, aims to create concise and accurate summaries, thereby significantly reducing the human effort required in processing large volumes of text. ATS has drawn considerable interest in both academic and industrial circles. Many studies have been conducted in the past to survey ATS methods; however, they generally lack practicality for real-world implementations, as they often categorize previous methods from a theoretical...

10.48550/arxiv.2403.02901 preprint EN arXiv (Cornell University) 2024-03-05

Interpretations of TF-IDF are based on binary independence retrieval, Poisson, information theory, and language modelling. This paper contributes a review of existing interpretations, and then, TF-IDF is systematically related to the probabilities P(q|d) and P(d|q). Two approaches are explored: the space of independent, and the space of disjoint terms. For independent terms, an "extreme" query/non-query term assumption uncovers TF-IDF, and an analogy of P(d|q) and the probabilistic odds O(r|d, q) mirrors relevance feedback. For disjoint terms, the relationship between...

10.1145/1390334.1390409 article EN 2008-07-20

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the inherited iterative sampling process costs hindered their applications to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is...

10.48550/arxiv.2204.09934 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Designing waveforms with a Constant Modulus Constraint (CMC) to achieve desirable Slow-Time Ambiguity Function (STAF) characteristics is significantly important in radar technology. The problem is NP-hard, due to its non-convex quartic objective function and CMC constraint. Existing methods typically involve model-based approaches with relaxation and data-driven Deep Neural Networks (DNNs) methods, which face the challenge of data limitation. We observe that the Complex Circle Manifold (CCM) naturally satisfies...

10.3390/rs17010173 article EN cc-by Remote Sensing 2025-01-06

Accurate prediction of future blood glucose (BG) levels can effectively improve BG management for people living with type 1 or 2 diabetes, thereby reducing complications and improving quality of life. The state of the art has been achieved by leveraging advanced deep learning methods to model multimodal data, i.e., sensor data and self-reported event data, organized as multi-variate time series (MTS). However, these methods are mostly regarded as "black boxes" and not entirely trusted by clinicians and patients. In this paper,...

10.1016/j.neunet.2025.107229 article EN cc-by Neural Networks 2025-02-05

10.1109/icassp49660.2025.10889904 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

This study addresses the problem of surrounding rock control while the 52102 working face passes through the roof fall area of the return air roadway in the Lijiahao coal mine. Through on-site investigations, numerical simulations, and engineering practices, we analyzed the characteristics and causes of the roof fall along the rib of the goaf. Based on the Mohr-Coulomb criterion, a model was established, identifying the influencing factors and proposing an early intervention pressure relief technology centered on "proactive avoidance." Determined starting position...

10.1038/s41598-025-94191-y article EN cc-by-nc-nd Scientific Reports 2025-03-28

In this work, we explore the potential of large language models (LLMs) for generating functional test scripts, which necessitates understanding the dynamically evolving code structure of the target software. To achieve this, we propose a case-based reasoning (CBR) system utilizing a 4R cycle (i.e., retrieve, reuse, revise, and retain), which maintains and leverages a case bank of test intent descriptions and corresponding test scripts to facilitate LLMs for test script generation. To improve user experience further, we introduce Re4, an optimization...

10.48550/arxiv.2503.20576 preprint EN arXiv (Cornell University) 2025-03-26

Personalized diagnosis and therapy requires monitoring patient activity using various body sensors. Sensor data generated during personalized exercises or tasks may be too specific or inadequate to be evaluated using supervised methods such as classification. We propose multidimensional motif (MDM) discovery as a means for patient activity monitoring, since such motifs can capture repeating patterns across multiple dimensions of the data, and can serve as conformance indicators. Previous studies pertaining to mining MDMs have proposed...

10.1109/jstsp.2016.2543679 article EN publisher-specific-oa IEEE Journal of Selected Topics in Signal Processing 2016-03-17

We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional and structured input spaces. By adapting ideas from deep metric learning, we use label guidance from the blackbox function to structure the VAE latent space, facilitating the Gaussian process fit and yielding improved BO performance. Importantly for BO problem settings, our method operates in semi-supervised regimes where only few labelled data points are available. We run experiments on three...

10.48550/arxiv.2106.03609 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Diffusion probabilistic models (DPMs) and their extensions have emerged as competitive generative models yet confront challenges of efficient sampling. We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can train with a novel bilateral modeling objective. We show that the new surrogate objective can achieve a lower bound of the log marginal likelihood tighter than a conventional surrogate. We also find that BDDM allows inheriting pre-trained...

10.48550/arxiv.2203.13508 preprint EN other-oa arXiv (Cornell University) 2022-01-01

An effective system for monitoring, reporting and verification (MRV) is the cornerstone of any carbon emissions trading market. This paper analyses existing MRV frameworks in China, including MRV under the seven carbon emission trading pilot schemes, to identify four key challenges for the establishment of an MRV system in China's forthcoming national carbon market: (1) ambiguity in the legal status of relevant policies and regulations, (2) unclear requirements for the content of monitoring plans, (3) lack of consistency and harmonization of accounting guidelines, and (4) information...

10.1080/14693062.2018.1454882 article EN cc-by Climate Policy 2018-03-29