Peng Li

ORCID: 0000-0003-1374-5979
Research Areas
  • Topic Modeling
  • Advanced Computational Techniques and Applications
  • Advanced Text Analysis Techniques
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • Web Data Mining and Analysis
  • Educational Technology and Assessment
  • Complex Network Analysis Techniques
  • Speech and Audio Processing
  • Multimodal Machine Learning Applications
  • Advanced Database Systems and Queries
  • Domain Adaptation and Few-Shot Learning
  • Music and Audio Processing
  • Data Management and Algorithms
  • Text and Document Classification Technologies
  • Machine Learning and Algorithms
  • 3D Modeling in Geospatial Applications
  • Intelligent Tutoring Systems and Adaptive Learning
  • Sentiment Analysis and Opinion Mining
  • Recommender Systems and Techniques
  • Advanced Clustering Algorithms Research
  • Misinformation and Its Impacts
  • Caching and Content Delivery
  • Experimental Learning in Engineering
  • Wireless Signal Modulation Classification

Tsinghua University
2009-2024

The Synergetic Innovation Center for Advanced Materials
2023-2024

Beijing Academy of Artificial Intelligence
2023-2024

Shanghai Artificial Intelligence Laboratory
2023-2024

Nanjing University of Information Science and Technology
2024

East Carolina University
2024

Beijing Institute of Technology
2022

Harbin Institute of Technology
2022

Shenyang Jianzhu University
2011-2020

Ostwestfalen-Lippe University of Applied Sciences and Arts
2018

Keyphrases are widely used as a brief summary of documents. Since manual assignment is time-consuming, various unsupervised ranking methods based on importance scores are proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the document, but also have good coverage of the document. Based on this observation, we propose an unsupervised method for keyphrase extraction. Firstly, the method finds exemplar terms by leveraging clustering techniques, which guarantees the document to be semantically covered by these...

10.3115/1699510.1699544 article EN 2009-01-01

We revamped hands-on labs in a cyber security course to be aligned with our college's new initiatives to increase accessibility utilizing "ed-tech" (cloud services, etc.) and the use of learning management systems for real-time assessment of student intervention needs, program outcomes, and continuous improvement. The cyber security field is evolving fast. The current labs and lab environment were outdated. In this project, the virtual labs are updated or recreated. The labs used to employ an old environment consisting of three individual virtual machines. The new environment is composed...

10.18260/1-2--41300 article EN 2024-02-06

Black-box prompt tuning employs derivative-free optimization algorithms to learn prompts within low-dimensional subspaces rather than back-propagating through the network of Large Language Models (LLMs). Recent studies reveal that black-box prompt tuning lacks versatility across tasks and LLMs, which we believe is related to the suboptimal choice of subspaces. In this paper, we introduce Black-box prompt tuning with Subspace Learning (BSL) to enhance the versatility of black-box prompt tuning. Based on the assumption that nearly optimal prompts for similar tasks reside in a common subspace, we propose...

10.1109/taslp.2024.3407519 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

Why can pre-trained language models (PLMs) learn universal representations and effectively adapt to broad NLP tasks differing a lot superficially? In this work, we empirically find evidence indicating that the adaptations of PLMs to various few-shot tasks can be reparameterized as optimizing only a few free parameters in a unified low-dimensional intrinsic task subspace, which may help us understand why...

10.1109/taslp.2024.3430545 article EN cc-by-nc-nd IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

Conversational search allows a user to interact with a search system in multiple turns. A query is strongly dependent on the conversation context. An effective way to improve retrieval effectiveness is to expand the current query with historical queries. However, not all the previous queries are related to, and useful for expanding, the current query. In this paper, we propose a new method to select relevant historical queries that are useful for the current query. To cope with the lack of labeled training data, we use a pseudo-labeling approach to annotate useful historical queries based on their impact on the retrieval results. The pseudo-labeled data are used...

10.1145/3580305.3599411 article EN Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023-08-04

Multiple entities in a document generally exhibit complex inter-sentence relations, and cannot be well handled by existing relation extraction (RE) methods that typically focus on extracting intra-sentence relations for single entity pairs. In order to accelerate the research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features: (1) DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text; (2) DocRED requires reading multiple...

10.48550/arxiv.1906.06127 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Recent clustering-based anomaly detection technologies classify new observations in different ways, e.g. using probability distributions, cluster centers, or the whole data points. Some of these suffer from a high false classification rate, while others require high computational resources. In this paper, we propose a geometric approach to anomaly detection, in which the boundaries of clusters are utilized instead. To identify the boundaries, an algorithm for generating n-dimensional non-convex hulls has been developed. The...

10.1109/iecon.2018.8592906 article EN IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society 2018-10-01

Monaural speech segregation is a very challenging problem which has been studied by many researchers. In this paper, we focus on voiced speech segregation. Different strategies are used to segregate resolved and unresolved harmonics respectively. For resolved harmonics, the "harmonicity" principle is used; for unresolved harmonics, a novel mechanism based on the "minimum amplitude" principle is employed. Amplitude modulation rate is extracted by the "enhanced" autocorrelation function of the envelope, which is more robust than the previous method. An elaborate rule is also introduced to determine the...

10.1109/icassp.2009.4960670 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2009-04-01

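The dynamical correlation between quantum entanglement and intramolecular energy in realistic molecular vibrations is explored using the Lie algebraic approach. The explicit expression of entanglement measurement can be achieved using algebraic operations. The common and different characteristics of dynamical entanglement in different molecular vibrations are also provided. The dynamical study of quantum entanglement and intramolecular energy in small molecular vibrations can be helpful for controlling the entanglement and further understanding the intramolecular dynamics.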

10.1088/1674-1056/23/7/073301 article EN Chinese Physics B 2014-07-01

Nowadays, the challenge of learning from large-scale and imbalanced data sets has attracted a great deal of attention from both industry and academia, and it is also deemed to be an important task for fraud detection in telecommunication, finance, and online commerce. In general, it's almost impossible to train a classification model on the complete data set, especially in the era of big data, due to the space-time complexity. Thus, how to sample a training data set from the original large-scale data set that can provide a more accurate prediction result has become a focal...

10.1109/icsssm.2017.7996301 article EN International Conference on Service Systems and Service Management 2017-06-01

The existing methods for the construction of the Voronoi Diagram and the nearest neighbor query have several disadvantages. In view of the disadvantages, new methods were studied in detail. To construct the Voronoi diagram effectively, the datasets were grouped in advance, then the divide-and-conquer method combined with the increment method was used to generate the Delaunay triangulation. By generating the Voronoi diagram from the Delaunay triangulation, the Creat_Vor algorithm was proposed. In order to effectively update the nearest neighbors of given points, the methods based on spatial grids were studied. VGride_NN and VGride_BNN were put...

10.1109/ifost.2014.6991116 article EN 2014-10-01

Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC). However, existing GCL methods are generally transferred from other fields like CV or NLP, whose underlying working mechanism remains underexplored. In this work, we first deeply probe the working mechanism of GCL in SSNC, and find that the promotion brought by GCL is severely unevenly distributed: the improvement mainly comes from subgraphs with less annotated information, which is fundamentally different...

10.24963/ijcai.2022/395 article EN Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022-07-01

In addressing the pivotal role of translating natural language queries into SQL commands, we propose a suite of compact, fine-tuned models and self-refine mechanisms to democratize data access and analysis for non-expert users, mitigating risks associated with closed-source Large Language Models. Specifically, we constructed a dataset of over 20K samples for Text-to-SQL, as well as a preference dataset, to improve the efficiency in the domain of SQL generation. To further ensure code validity, a code corrector was integrated into the model. Our...

10.48550/arxiv.2409.15985 preprint EN arXiv (Cornell University) 2024-09-24

Black-box prompt tuning employs derivative-free optimization algorithms to learn prompts within low-dimensional subspaces rather than back-propagating through the network of Large Language Models (LLMs). Recent studies reveal that black-box prompt tuning lacks versatility across tasks and LLMs, which we believe is related to the suboptimal choice of subspaces. In this paper, we introduce Black-box prompt tuning with Subspace Learning (BSL) to enhance the versatility of black-box prompt tuning. Based on the assumption that nearly optimal prompts for similar tasks reside in a common subspace, we propose...

10.48550/arxiv.2305.03518 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Knowledge distillation (KD) is a widely used method for transferring knowledge from large teacher models to computationally efficient student models. Unfortunately, the computational cost of KD becomes unaffordable as pre-trained language models (PLMs) grow larger. Computing the KD loss on only part of the training set is a promising way to accelerate KD. However, existing works heuristically leverage only one static data selection strategy during the KD process, demonstrating inconsistent improvements across different scenarios....

10.1016/j.aiopen.2023.08.005 article EN cc-by-nc-nd AI Open 2023-01-01

Continual learning aims to avoid catastrophic forgetting and effectively leverage learned experiences to master new knowledge. Existing gradient projection approaches impose hard constraints on the optimization space for new tasks to minimize interference, which simultaneously hinders forward knowledge transfer. To address this issue, recent methods reuse frozen parameters with a growing network, resulting in high computational costs. Thus, it remains a challenge whether we can improve forward knowledge transfer using...

10.1016/j.aiopen.2023.08.010 article EN cc-by-nc-nd AI Open 2023-01-01

Tool learning aims to extend the capabilities of large language models (LLMs) with external tools. A major challenge in tool learning is how to support a large number of tools, including unseen tools. To address this challenge, previous studies have proposed retrieving suitable tools for the LLM based on the user query. However, previously proposed methods do not consider the differences between seen and unseen tools, nor do they take the hierarchy of the tool library into account, which may lead to suboptimal performance for tool retrieval. Therefore, to address the aforementioned issues, we...

10.48550/arxiv.2403.06551 preprint EN arXiv (Cornell University) 2024-03-11