Shengyu Mao

ORCID: 0009-0006-0030-8314
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Semantic Web and Ontologies
  • Advanced Graph Neural Networks
  • Bayesian Modeling and Causal Inference
  • Advanced Text Analysis Techniques
  • Explainable Artificial Intelligence (XAI)
  • Multimodal Machine Learning Applications
  • Magnetic Properties and Applications
  • Law, AI, and Intellectual Property
  • Electric Motor Design and Analysis
  • Library Science and Information Systems
  • Induction Heating and Inverter Technology
  • Artificial Intelligence in Law
  • Mental Health via Writing
  • Sentiment Analysis and Opinion Mining
  • Online Learning and Analytics

Zhejiang University
2023-2024

Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many...

10.48550/arxiv.2401.01286 preprint EN other-oa arXiv (Cornell University) 2024-01-01
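For orientation, the knowledge-editing setting described above treats an update as a small, targeted request applied to an already-trained model rather than retraining it. The sketch below is a minimal illustration of what such an edit request might look like; the KnowledgeEdit fields and the apply_edit hook are hypothetical names for this example, not the survey's actual interface.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeEdit:
    """One targeted factual update request (illustrative schema only)."""
    subject: str      # entity whose stored fact is outdated
    relation: str     # relation being corrected
    old_object: str   # what the model currently answers
    new_object: str   # what the edited model should answer

def apply_edit(model, edit: KnowledgeEdit):
    """Hypothetical editor hook: a concrete method would locate and modify
    the parameters (or attach an adapter) responsible for this fact."""
    raise NotImplementedError

# Example request: update a single outdated fact without retraining.
edit = KnowledgeEdit(
    subject="Lionel Messi",
    relation="plays for",
    old_object="Paris Saint-Germain",
    new_object="Inter Miami",
)
```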

With the development of pre-trained language models, many prompt-based approaches to data-efficient knowledge graph construction have been proposed and have achieved impressive performance. However, existing prompt-based learning methods for knowledge graph construction are still susceptible to several potential limitations: (i) the semantic gap between natural language and structured output with a pre-defined schema, which means the model cannot fully exploit the constrained templates; (ii) representation learning with locally individual instances limits the performance given...

10.1145/3539618.3591763 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18
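As a rough illustration of the kind of schema-constrained prompting this line of work builds on, the snippet below fills a fixed template with the relations allowed by a pre-defined schema before asking a language model to emit triples. The SCHEMA, TEMPLATE, and fill_prompt names are assumptions made for this sketch, not the paper's actual prompts.

```python
# Hypothetical schema-constrained prompt for triple extraction.
SCHEMA = {
    "person": ["works_for", "born_in"],
    "organization": ["located_in"],
}

TEMPLATE = (
    "Extract (head, relation, tail) triples from the sentence.\n"
    "Allowed relations: {relations}\n"
    "Sentence: {sentence}\n"
    "Triples:"
)

def fill_prompt(sentence: str) -> str:
    # Constrain the output space by listing only schema-approved relations.
    relations = ", ".join(r for rels in SCHEMA.values() for r in rels)
    return TEMPLATE.format(relations=relations, sentence=sentence)

print(fill_prompt("Alice works for Acme Corp."))
```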

As Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques have evolved, query rewriting has been widely incorporated into RAG systems for downstream tasks like open-domain QA. Many works have attempted to utilize small models with reinforcement learning rather than costly LLMs to improve query rewriting. However, current methods require annotations (e.g., labeled relevant documents or answers) or predesigned rewards for feedback, which lack generalization and fail to provide signals tailored to query rewriting. In...

10.48550/arxiv.2405.14431 preprint EN arXiv (Cornell University) 2024-05-23
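A minimal sketch of where query rewriting sits in a RAG pipeline, assuming a small rewriter model placed in front of an off-the-shelf retriever and reader: the rewrite, retrieve, and answer functions below are placeholders for this illustration, not the paper's implementation.

```python
def rewrite(query: str) -> str:
    """A small rewriter model would go here (e.g., trained with RL feedback)."""
    return query  # identity placeholder

def retrieve(query: str, k: int = 5) -> list[str]:
    """Any retriever (BM25, dense) returning top-k passages."""
    return []

def answer(question: str, passages: list[str]) -> str:
    """Reader LLM conditioned on the retrieved passages."""
    return ""

def rag_qa(question: str) -> str:
    rewritten = rewrite(question)      # the step query-rewriting work targets
    passages = retrieve(rewritten)     # retrieval runs on the rewritten query
    return answer(question, passages)  # the original question is answered
```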

Previous studies have revealed that vanilla pre-trained language models (PLMs) lack the capacity to handle knowledge-intensive NLP tasks alone; thus, several works have attempted to integrate external knowledge into PLMs. However, despite the promising outcome, we empirically observe that PLMs may have already encoded rich knowledge in their parameters but fail to fully utilize it when applied to such tasks. In this paper, we propose a new paradigm dubbed Knowledge Rumination to help the model utilize related latent knowledge without retrieving it from...

10.48550/arxiv.2305.08732 preprint EN other-oa arXiv (Cornell University) 2023-01-01
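The core idea of asking a model to surface its own latent knowledge before answering, instead of consulting an external corpus, can be sketched as a two-step prompt. The prompts and the generate stand-in below are illustrative assumptions, not the paper's exact templates.

```python
def generate(prompt: str) -> str:
    """Stand-in for any PLM/LLM text-generation call."""
    return ""

def recall_then_answer(question: str) -> str:
    # Step 1: prompt the model to surface background knowledge it already holds.
    recalled = generate(f"As far as I know, regarding the question '{question}':")
    # Step 2: answer conditioned on that recalled knowledge, no external retrieval.
    return generate(f"Background: {recalled}\nQuestion: {question}\nAnswer:")
```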

Event-centric structured prediction involves predicting structured outputs of events. In most NLP cases, event structures are complex with manifold dependency, and it is challenging to effectively represent these complicated structures. To address these issues, we propose Structured Prediction with Energy-based Event-Centric Hyperspheres (SPEECH). SPEECH models the dependency among event components with energy-based modeling and represents event classes with simple but effective hyperspheres. Experiments on two unified-annotated event datasets indicate that...

10.18653/v1/2023.acl-long.21 article EN cc-by 2023-01-01
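To make the hypersphere intuition concrete, the toy snippet below represents each event class by a centre vector and a radius and scores an instance embedding by its signed distance to the sphere surface; this is a generic illustration of hypersphere class representation, not the SPEECH energy function or training objective.

```python
import numpy as np

# Each event class is a hypersphere: a centre embedding plus a radius.
centers = {"attack": np.array([0.9, 0.1]), "meet": np.array([0.1, 0.8])}
radii = {"attack": 0.30, "meet": 0.25}

def sphere_score(x: np.ndarray, cls: str) -> float:
    # Signed distance to the sphere surface: negative means x lies inside.
    return float(np.linalg.norm(x - centers[cls]) - radii[cls])

x = np.array([0.85, 0.15])                      # instance embedding
pred = min(centers, key=lambda c: sphere_score(x, c))
print(pred)  # "attack" for this toy embedding
```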

This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). The task seeks to adjust the models' responses to opinion-related questions on specified topics, since an individual's personality often manifests in the form of their expressed opinions, thereby showcasing different personality traits. Specifically, we construct a new benchmark dataset, PersonalityEdit, to address this task. Drawing on theory from Social Psychology, we isolate three representative traits, namely Neuroticism, Extraversion, and...

10.48550/arxiv.2310.02168 preprint EN other-oa arXiv (Cornell University) 2023-01-01
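As a purely hypothetical picture of what an item in such a personality-editing benchmark could look like, the record below pairs a topic-grounded opinion question with a target trait; the field names are assumptions for illustration, not the PersonalityEdit schema.

```python
# Hypothetical benchmark item for personality editing (field names assumed).
example = {
    "topic": "remote work",
    "question": "What is your opinion of fully remote teams?",
    "target_trait": "Extraversion",   # trait the edited model should express
    "pre_edit_response": "...",       # behaviour of the unedited model
    "post_edit_response": "...",      # behaviour expected after editing
}
```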

Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations merely explore instance-level editing, while whether LLMs possess the capability to modify concepts remains unclear. This paper pioneers the investigation of conceptual knowledge editing for LLMs by constructing a novel benchmark dataset, ConceptEdit, and establishing a suite of new metrics for evaluation. The experimental results reveal that, although existing methods can efficiently modify concept-level...

10.48550/arxiv.2403.06259 preprint EN arXiv (Cornell University) 2024-03-10
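The distinction between instance-level and concept-level editing can be illustrated with two hypothetical edit requests: one rewrites a single fact about one entity, the other rewrites a concept's definition so that all instances under it are expected to change consistently. The dictionaries below are illustrative examples only, not the ConceptEdit data format.

```python
# Instance-level edit: one fact about one entity changes
# (a counterfactual example of the kind used in editing work).
instance_level_edit = {
    "subject": "Eiffel Tower",
    "relation": "located_in",
    "new_object": "Rome",
}

# Concept-level edit: the definition of a concept changes, and every
# instance classified under it should then behave consistently.
concept_level_edit = {
    "concept": "smartphone",
    "new_definition": "a handheld device combining telephony with "
                      "general-purpose computing",
}
```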

With the development of pre-trained language models, many prompt-based approaches to data-efficient knowledge graph construction have been proposed and have achieved impressive performance. However, existing prompt-based learning methods for knowledge graph construction are still susceptible to several potential limitations: (i) the semantic gap between natural language and structured output with a pre-defined schema, which means the model cannot fully exploit the constrained templates; (ii) representation learning with locally individual instances limits the performance given...

10.48550/arxiv.2210.10709 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Event-centric structured prediction involves predicting structured outputs of events. In most NLP cases, event structures are complex with manifold dependency, and it is challenging to effectively represent these complicated structures. To address these issues, we propose Structured Prediction with Energy-based Event-Centric Hyperspheres (SPEECH). SPEECH models the dependency among event components with energy-based modeling and represents event classes with simple but effective hyperspheres. Experiments on two unified-annotated event datasets indicate that...

10.48550/arxiv.2305.13617 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

Previous studies have revealed that vanilla pre-trained language models (PLMs) lack the capacity to handle knowledge-intensive NLP tasks alone; thus, several works have attempted to integrate external knowledge into PLMs. However, despite the promising outcome, we empirically observe that PLMs may have already encoded rich knowledge in their parameters but fail to fully utilize it when applied to such tasks. In this paper, we propose a new paradigm dubbed Knowledge Rumination to help the model utilize related latent knowledge without retrieving it from the corpus...

10.18653/v1/2023.emnlp-main.206 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01