Zhiqing Sun

ORCID: 0000-0003-1933-496X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Advanced Graph Neural Networks
  • Energy Load and Power Forecasting
  • Speech Recognition and Synthesis
  • Smart Grid Energy Management
  • Explainable Artificial Intelligence (XAI)
  • Optimal Power Flow Distribution
  • Multimodal Machine Learning Applications
  • Electricity Theft Detection Techniques
  • Machine Learning and Data Classification
  • Smart Grid and Power Systems
  • Advanced Neural Network Applications
  • Smart Grid Security and Resilience
  • Advanced Text Analysis Techniques
  • Information Retrieval and Search Behavior
  • Microgrid Control and Optimization
  • Electric Vehicles and Infrastructure
  • Advanced Data and IoT Technologies
  • Power Systems and Technologies
  • Integrated Energy Systems Optimization
  • Data Quality and Management
  • Reinforcement Learning in Robotics
  • Energy Efficiency and Management
  • Technology and Security Systems

State Grid Corporation of China (China)
2023-2024

Carnegie Mellon University
2019-2024

Shanghai Electric (China)
2021-2024

Radar (United States)
2023

Technion – Israel Institute of Technology
2022

Shanghai Institute of Technology
2022

Zhejiang University
2021

Southwest Jiaotong University
2019

Peking University
2018-2019

China Academy of Engineering Physics
2019

We study the problem of learning representations of entities and relations in knowledge graphs for predicting missing links. The success of such a task heavily relies on the ability of modeling and inferring the patterns of (or between) the relations. In this paper, we present a new approach for knowledge graph embedding called RotatE, which is able to model and infer various relation patterns including: symmetry/antisymmetry, inversion, and composition. Specifically, the RotatE model defines each relation as a rotation from the source entity to the target entity in the complex vector space....

10.48550/arxiv.1902.10197 preprint EN other-oa arXiv (Cornell University) 2019-01-01
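The rotation idea can be sketched in a few lines: treat the relation as a vector of phases, multiply the head embedding element-wise by the corresponding unit-modulus complex numbers, and score by negative distance to the tail. This is an illustrative toy, not the paper's training code; the embedding dimension and the L1 distance here are arbitrary choices.

```python
import numpy as np

def rotate_score(head, relation_phase, tail):
    """RotatE-style score (sketch): rotate the head embedding by the
    relation's phases in complex space, then measure distance to the
    tail. A higher (less negative) score means a more plausible triple."""
    rotation = np.exp(1j * relation_phase)        # unit-modulus complex rotation
    return -np.linalg.norm(head * rotation - tail, ord=1)

dim = 8
rng = np.random.default_rng(0)
h = rng.normal(size=dim) + 1j * rng.normal(size=dim)
phase = rng.uniform(-np.pi, np.pi, size=dim)
t_true = h * np.exp(1j * phase)                   # tail exactly matching the rotation
t_rand = rng.normal(size=dim) + 1j * rng.normal(size=dim)

assert rotate_score(h, phase, t_true) > rotate_score(h, phase, t_rand)
```

Because a rotation composed with another rotation is itself a rotation, and a rotation's inverse is a rotation, this parameterization can represent the composition and inversion patterns the abstract mentions.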

Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to various downstream NLP tasks via simple fine-tuning. Basically,...

10.18653/v1/2020.acl-main.195 preprint EN cc-by 2020-01-01
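One ingredient of compressing a large BERT into a small student is layer-wise knowledge transfer: matching the student's hidden states to the teacher's. A minimal sketch of such a feature-map transfer loss, with shapes and layer mapping as illustrative assumptions rather than MobileBERT's actual code:

```python
import numpy as np

def feature_map_transfer_loss(student_fm, teacher_fm):
    """Layer-wise feature-map transfer (sketch): mean-squared error between
    a student layer's hidden states and the corresponding teacher layer's.
    One illustrative piece of teacher-to-student distillation; the layer
    mapping and shapes here are assumptions, not MobileBERT's recipe."""
    return float(np.mean((student_fm - teacher_fm) ** 2))

teacher = np.ones((4, 16))           # toy hidden states: (seq_len, hidden)
student = np.ones((4, 16)) * 0.5
print(feature_map_transfer_loss(student, teacher))  # 0.25
```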

DETR is a recently proposed Transformer-based method which views object detection as a set prediction problem and achieves state-of-the-art performance but demands extra-long training time to converge. In this paper, we investigate the causes of the optimization difficulty in the training of DETR. Our examinations reveal several factors contributing to the slow convergence of DETR, primarily issues with the Hungarian loss and the Transformer cross-attention mechanism. To overcome these issues we propose two solutions, namely, TSP-FCOS...

10.1109/iccv48922.2021.00359 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Stephen Bach, Victor Sanh, Zheng Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry, Zaid Alyafeai, Manan Dey, Andrea Santilli, Zhiqing Sun, Srulik Ben-David, Canwen Xu, Gunjan Chhablani, Han Wang, Jason Fries, Maged Al-Shaibani, Shanya Sharma, Urmish Thakker, Khalid Almubarak, Xiangru Tang, Dragomir Radev, Mike Tian-jian Jiang, Alexander Rush. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System...

10.18653/v1/2022.acl-demo.9 article EN cc-by 2022-01-01

Zhengbao Jiang, Frank Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, Graham Neubig. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

10.18653/v1/2023.emnlp-main.495 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues of quality, reliability, diversity, self-consistency, and undesirable biases. To address these...

10.48550/arxiv.2305.03047 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Knowledge Graph Completion (KGC) aims at automatically predicting missing links for large-scale knowledge graphs. A vast number of state-of-the-art KGC techniques have been published at top conferences in several research fields, including data mining, machine learning, and natural language processing. However, we notice that several recent papers report very high performance, which largely outperforms previous state-of-the-art methods. In this paper, we find that this can be attributed to the inappropriate evaluation protocol used...

10.18653/v1/2020.acl-main.489 preprint EN cc-by 2020-01-01
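The evaluation-protocol issue can be made concrete with a toy ranking function. Suppose a degenerate model assigns the same score to every candidate entity: where the true entity is placed among the tied candidates then determines the reported rank. A sketch (not the paper's code) comparing three tie-handling conventions:

```python
import numpy as np

def ranks(scores, true_idx):
    """Rank of the true entity under three tie-handling protocols.
    With many tied scores, placing the true entity first ('top')
    inflates performance; the expected rank ('random') is fairer."""
    s = np.asarray(scores, dtype=float)
    t = s[true_idx]
    better = int(np.sum(s > t))          # strictly higher-scored candidates
    ties = int(np.sum(s == t)) - 1       # other candidates tied with the truth
    return {"top": better + 1,           # true entity first among ties
            "bottom": better + ties + 1, # true entity last among ties
            "random": better + ties / 2 + 1}  # expected rank over tie orders

scores = np.zeros(100)                   # degenerate model: all scores tied
print(ranks(scores, true_idx=7))         # top=1, bottom=100, random=50.5
```

Under the "top" convention this constant model looks perfect; under the expected rank it looks (correctly) no better than chance.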

The Transformer architecture is widely used in natural language processing. Despite its success, the design principle of the Transformer remains elusive. In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system. In particular, how words in a sentence are abstracted into contexts by passing through the layers can be interpreted as approximating multiple...

10.48550/arxiv.1906.02762 preprint EN other-oa arXiv (Cornell University) 2019-01-01
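The correspondence the abstract alludes to can be written compactly: a residual layer update is one explicit Euler step of an underlying ODE (notation here is illustrative, not the paper's exact formulation):

```latex
% Residual layer as an explicit Euler step with step size \Delta t = 1:
x_{l+1} = x_l + F(x_l, \theta_l)
\quad\Longleftrightarrow\quad
x(t + \Delta t) = x(t) + \Delta t \, F(x(t), t),
% i.e. a first-order discretization of the continuous dynamics
\frac{\mathrm{d}x(t)}{\mathrm{d}t} = F(x(t), t).
```

Stacking layers then corresponds to integrating these dynamics over time, which is what lets the paper read attention and feed-forward sublayers as terms of a convection-diffusion system.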

In an increasingly open electricity market environment, short-term load forecasting (STLF) can help the power grid operate safely and stably, reduce resource waste and dispatching costs, and provide technical support for demand-side response. Recently, with the rapid development of demand-side response, accurate load forecasting can better incentivize regional prosumer groups. Traditional machine learning and statistics-based time series prediction methods fail to consider the non-linear relationships between various input features,...

10.1109/access.2021.3051337 article EN cc-by IEEE Access 2021-01-01

Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to various downstream NLP tasks via simple fine-tuning. Basically,...

10.48550/arxiv.2004.02984 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Autoregressive sequence models achieve state-of-the-art performance in domains like machine translation. However, due to the autoregressive factorization nature, these models suffer from heavy latency during inference. Recently, non-autoregressive sequence models were proposed to reduce the inference time. However, these models assume that the decoding process of each token is conditionally independent of others. Such a generation process sometimes makes the output sentence inconsistent, and thus the learned non-autoregressive models could only achieve inferior accuracy compared to their autoregressive counterparts....

10.48550/arxiv.1910.11555 preprint EN other-oa arXiv (Cornell University) 2019-01-01
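The latency/consistency trade-off the abstract describes can be sketched as two decoding loops (toy stand-ins, not the paper's models): autoregressive decoding needs one sequential step per token, while non-autoregressive decoding predicts every position independently in a single parallel step.

```python
def ar_decode(step_fn, length, bos=0):
    """Autoregressive decoding (sketch): each token is predicted from the
    full prefix of previously generated tokens, so generating `length`
    tokens takes `length` sequential steps."""
    out = [bos]
    for _ in range(length):
        out.append(step_fn(tuple(out)))   # conditions on everything so far
    return out[1:]

def nar_decode(pos_fn, length):
    """Non-autoregressive decoding (sketch): every position is predicted
    independently, in one parallel step; tokens cannot see each other,
    which is the source of the inconsistency the abstract mentions."""
    return [pos_fn(i) for i in range(length)]

# toy "models": the AR one sees the prefix, the NAR one only its position
print(ar_decode(lambda prefix: len(prefix), 3))   # [1, 2, 3]
print(nar_decode(lambda i: i + 1, 3))             # [1, 2, 3]
```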

We propose a new paradigm to help Large Language Models (LLMs) generate more accurate factual knowledge without retrieving from an external corpus, called RECITation-augmented gEneration (RECITE). Different from retrieval-augmented language models that retrieve relevant documents before generating the outputs, given an input, RECITE first recites one or several relevant passages from the LLMs' own memory via sampling, and then produces the final answers. We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks. Specifically,...

10.48550/arxiv.2210.01296 preprint EN cc-by arXiv (Cornell University) 2022-01-01
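The recite-then-answer scheme is essentially two chained completion calls. A minimal sketch, where `generate` and `toy_llm` are hypothetical stand-ins for an LLM call, and the prompt templates are illustrative assumptions rather than the paper's prompts:

```python
def recite_then_answer(question, generate):
    """RECITE-style two-step inference (sketch): first sample a relevant
    passage from the model's own memory, then answer conditioned on it.
    `generate` is any text-completion function (a stand-in for an LLM)."""
    passage = generate(f"Recite a passage relevant to the question.\n"
                       f"Question: {question}\nPassage:")
    answer = generate(f"Passage: {passage}\nQuestion: {question}\nAnswer:")
    return answer

def toy_llm(prompt):
    """Toy stand-in for an LLM, keyed on the prompt shape."""
    if prompt.startswith("Recite"):
        return "Paris has been the capital of France since 508 AD."
    return "Paris"

print(recite_then_answer("What is the capital of France?", toy_llm))  # Paris
```

The point of the first call is to surface memorized evidence in-context before answering, mimicking retrieval without an external corpus.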

Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed...

10.48550/arxiv.2405.00675 preprint EN arXiv (Cornell University) 2024-05-01

In this paper, we develop an optimal dispatching model of a Smart Home Energy Management System (SHEMS) with distributed energy resources (DERs) and intelligent domestic appliances. In order to achieve multi-objective optimization between saving money and living comfortably, we investigate the mathematical models of various components and come up with a new concept of "load value" as a quantitative measure of users' comfort. We then set control strategies of demand response to adjust the parameters according to the load characteristics in the system. Applying...

10.1109/isgt-asia.2012.6303266 article EN 2012-05-01

Autoregressive (AR) models have been the dominant approach to conditional sequence generation, but suffer from the issue of high inference latency. Non-autoregressive (NAR) models were recently proposed to reduce the latency by generating all output tokens in parallel, but could only achieve inferior accuracy compared to their autoregressive counterparts, primarily due to a difficulty in dealing with multi-modality in sequence generation. This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified...

10.48550/arxiv.2006.16378 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Keyphrase extraction from documents is useful to a variety of applications such as information retrieval and document summarization. This paper presents an end-to-end method called DivGraphPointer for extracting a set of diversified keyphrases from a document. DivGraphPointer combines the advantages of traditional graph-based ranking methods and recent neural network-based approaches. Specifically, given a document, a word graph is constructed based on word proximity and encoded with graph convolutional networks, which effectively capture...

10.1145/3331184.3331219 preprint EN 2019-07-18

Neural network-based Combinatorial Optimization (CO) methods have shown promising results in solving various NP-complete (NPC) problems without relying on hand-crafted domain knowledge. This paper broadens the current scope of neural solvers for NPC problems by introducing a new graph-based diffusion framework, namely DIFUSCO. Our framework casts NPC problems as discrete {0, 1}-vector optimization problems and leverages graph-based denoising diffusion models to generate high-quality solutions. We investigate two types of diffusion models with Gaussian and Bernoulli...

10.48550/arxiv.2302.08224 preprint EN cc-by arXiv (Cornell University) 2023-01-01
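The Bernoulli variant's forward (noising) process on a {0, 1} solution vector is easy to sketch: each bit is flipped independently with some probability, and the learned model is trained to run this process in reverse, denoising random bits into a feasible solution. An illustrative toy, not DIFUSCO's actual noise schedule:

```python
import numpy as np

def bernoulli_corrupt(x, flip_prob, rng):
    """One forward noising step of a Bernoulli discrete diffusion (sketch):
    each bit of the {0,1} solution vector is flipped independently with
    probability flip_prob; at flip_prob = 0.5 all structure is destroyed."""
    flips = rng.random(x.shape) < flip_prob
    return np.where(flips, 1 - x, x)

rng = np.random.default_rng(0)
tour_bits = np.array([0, 1, 1, 0, 1])          # toy edge-indicator vector
print(bernoulli_corrupt(tour_bits, 0.5, rng))  # partially flipped bits
```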

Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values. However, RLHF relies on a reward model that is trained with a limited amount of human preference data, which could lead to inaccurate predictions. As a result, RLHF may produce outputs that are misaligned with human values. To mitigate this issue, we contribute a reward ensemble method that allows the reward model to make more accurate predictions. As using an ensemble of large language model-based reward models can be computationally and resource-expensive, we explore efficient ensemble methods including...

10.48550/arxiv.2401.16635 preprint EN arXiv (Cornell University) 2024-01-29
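The core mechanism can be sketched independently of how the ensemble members are trained: score a candidate output with several reward models and aggregate conservatively, so that disagreement between members is penalized instead of exploited. Both aggregation rules below are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def ensemble_reward(scores):
    """Aggregate scores from an ensemble of reward models (sketch).
    The mean is the naive estimate; a lower-confidence bound
    (mean minus std) penalizes ensemble disagreement, a common way to
    avoid over-optimizing a reward the models do not agree on."""
    s = np.asarray(scores, dtype=float)
    return {"mean": s.mean(), "lcb": s.mean() - s.std()}

print(ensemble_reward([1.0, 1.0, 1.0]))  # full agreement: lcb == mean
print(ensemble_reward([0.0, 2.0]))       # disagreement lowers the lcb
```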

As a new generation of transportation, electric vehicles play an important role in achieving carbon-peak targets. The development of electric vehicles needs the support of a charging network, and improper planning of charging stations will result in a waste of resources. In order to expand the charging network and give full play to the low-carbon and efficient characteristics of electric vehicles, this paper proposes a charging station planning method that considers carbon emission trends. The method combines long short-term memory (LSTM) with the stochastic impacts by regression on population, affluence, and technology...

10.3389/fenrg.2024.1359824 article EN cc-by Frontiers in Energy Research 2024-04-24