Yunqing Xia

ORCID: 0009-0005-8608-574X
Research Areas
  • Topic Modeling
  • Advanced Text Analysis Techniques
  • Text and Document Classification Technologies
  • Sentiment Analysis and Opinion Mining
  • Web Data Mining and Analysis
  • Natural Language Processing Techniques
  • Spam and Phishing Detection
  • Music and Audio Processing
  • Semantic Web and Ontologies
  • Speech Recognition and Synthesis
  • Data Management and Algorithms
  • Data Mining Algorithms and Applications
  • Complex Network Analysis Techniques
  • Biomedical Text Mining and Ontologies
  • Speech and Audio Processing
  • Authorship Attribution and Profiling
  • Algorithms and Data Compression
  • Service-Oriented Architecture and Web Services
  • Recommender Systems and Techniques
  • Information Retrieval and Search Behavior
  • Cybercrime and Law Enforcement Studies
  • Speech and Dialogue Systems
  • Geological Modeling and Analysis
  • Computational and Text Analysis Methods
  • Imbalanced Data Classification Techniques

Xiangtan Electric Manufacturing Group (China)
2024

Microsoft Research Asia (China)
2015-2023

Daqing Oilfield General Hospital
2022

Microsoft (United States)
2020

Search
2020

Tsinghua University
2007-2015

Peking University
2015

HKUST Shenzhen Research Institute
2015

University of Hong Kong
2015

Beijing University of Posts and Telecommunications
2012

The Web holds valuable, vast, and unstructured information about public opinion. Here, the history, current use, and future of opinion mining and sentiment analysis are discussed, along with relevant techniques and tools.

10.1109/mis.2013.30 article EN IEEE Intelligent Systems 2013-03-01

In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake reviews (i.e., spam) are not practical due to the problem of information overload. However, the design and development of automated methods is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by science...

10.1145/2070710.2070716 article EN ACM Transactions on Management Information Systems 2011-12-01

In product reviews, it is observed that the distribution of polarity ratings over reviews written by different users or evaluated based on different products is often skewed in the real world. As such, incorporating user and product information would be helpful for the task of sentiment classification of reviews. However, existing approaches ignored the temporal nature of reviews posted by the same user or evaluated on the same product. We argue that the temporal relations of reviews might be potentially useful for learning user and product embedding and thus propose employing a sequence model to embed these temporal relations into user and product representations...

10.1109/mci.2016.2572539 article EN IEEE Computational Intelligence Magazine 2016-07-18

There has been a rapid growth in the number of cybercrimes that cause tremendous financial loss to organizations. Recent studies reveal that cybercriminals tend to collaborate or even transact cyber-attack tools via "dark markets" established in online social media. Accordingly, it presents unprecedented opportunities for researchers to tap into these underground cybercriminal communities to develop better insights about collaborative cybercrime activities so as to combat the ever increasing number of cybercrimes....

10.1109/mci.2013.2291689 article EN IEEE Computational Intelligence Magazine 2014-01-24

Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure the semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback...

10.1609/aaai.v29i1.9506 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2015-02-19

Multi-document Summarization (MDS) is of great value to many real world applications. Many scoring models are proposed to select appropriate sentences from documents to form the summary, in which clustering-based methods are popular. In this work, we propose a unified sentence scoring model which measures representativeness and diversity at the same time. Experimental results on DUC04 demonstrate that our MDS method outperforms the best existing methods, and it yields close results compared to the state-of-the-art generic MDS methods. Advantages...

10.3115/v1/n15-1136 article EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01
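
The abstract above describes scoring sentences for representativeness and diversity at the same time, but the truncated text does not give the model itself. As a rough illustration only, the sketch below uses a different, well-known stand-in: a greedy, MMR-style selection over bag-of-words vectors. The function names, the `trade_off` parameter, and the centroid-based notion of representativeness are all assumptions made for this example, not the paper's method.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(sentences, k, trade_off=0.5):
    """Greedily pick up to k sentences (hypothetical helper, MMR-style).

    Representativeness: similarity to the word-count centroid of all sentences.
    Diversity: penalty equal to the highest similarity to an already chosen sentence.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            representativeness = cosine(vectors[i], centroid)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return trade_off * representativeness - (1 - trade_off) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


summary = select_sentences(
    [
        "the cat sat on the mat",
        "the cat sat on the mat",
        "dogs bark loudly at night",
    ],
    k=2,
)
```

With the default `trade_off`, the exact duplicate is penalised enough that the second pick is the unrelated sentence; a larger `trade_off` favours representativeness and can let near-duplicates back in.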

The articles in this special section focus on computational intelligence for big social data analytics. In the eras of connectedness and colonization, people are becoming increasingly enthusiastic about interacting, sharing, and collaborating through online collaborative media. In recent years, this collective intelligence has spread to many different areas, with particular focus on fields related to everyday life such as commerce, tourism, education, and health, causing the size of the Social Web to expand exponentially. The distillation of knowledge...

10.1109/mci.2016.2572481 article EN IEEE Computational Intelligence Magazine 2016-07-18

Lyric-based song sentiment classification seeks to assign songs appropriate sentiment labels such as light-hearted and heavy-hearted. Four problems render the vector space model (VSM)-based text classification approach ineffective: 1) Many words within song lyrics actually contribute little to sentiment; 2) Nouns and verbs used to express sentiment are ambiguous; 3) Negations and modifiers around the sentiment keywords make particular contributions to sentiment; 4) Song lyric is usually very short. To address these problems, the sentiment vector space model (s-VSM) is proposed to represent the song lyric document. The...

10.3115/1557690.1557725 article EN 2008-01-01

Stance classification aims at identifying, in the text, the attitude toward the given targets as favorable, negative, or unrelated. In existing models for stance classification, only textual representation is leveraged, while commonsense knowledge is ignored. In order to better incorporate commonsense knowledge into stance classification, we propose a novel model named commonsense knowledge enhanced memory network, which jointly represents the textual and commonsense knowledge of the target and text. The textual module of our model treats the text as memory vectors, and uses an attention mechanism to embody the important parts. For the commonsense knowledge module, we leverage entity...

10.1109/mis.2020.2983497 article EN IEEE Intelligent Systems 2020-07-01

In order to solve the problem of frequency modulation power deviation caused by the randomness and fluctuation of wind power outputs, a method of auxiliary frequency modulation capacity allocation based on the data decomposition of a “flywheel + lithium battery” hybrid-energy storage system was proposed. Firstly, the power deviation caused by the uncertainty of wind power is decomposed by the successive variational mode decomposition (SVMD) method, and the mode function is segmented and reconstructed by high and low frequencies. Secondly, a mathematical model is established to maximize the economic benefit of energy storage considering the frequency modulation mileage, quantum...

10.3390/en17174391 article EN cc-by Energies 2024-09-02

This paper presents an opinion analysis system based on linguistic knowledge which is acquired from small-scale annotated text and raw topic-relevant Web pages. Based on the observation of the corpus, some word-, collocation- and sentence-level features for opinion analysis are discovered. Supervised and unsupervised learning techniques are developed to learn these features from the annotated text and the raw relevant Web pages, respectively. These features are then incorporated into a classifier based on support vector machine (SVM) to identify opinionated sentences and determine their polarities....

10.1109/wiiat.2008.388 article EN 2008-12-01

Unlike most online social networks where explicit links among individual users are defined, the relations among commercial entities (e.g. firms) may not be explicitly declared in commercial Web sites. One main contribution of this article is the development of a novel computational model for the discovery of the latent relations among commercial entities from online financial news. More specifically, a CRF model which can exploit both structural and contextual features is applied to commercial entity recognition. In addition, a point-wise mutual information (PMI)-based unsupervised learning...

10.1080/17517575.2011.621093 article EN Enterprise Information Systems 2011-09-27
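
The abstract above mentions a point-wise mutual information (PMI)-based unsupervised step but is cut off before describing it. As a minimal sketch of PMI itself, assuming entities have already been recognised and that co-occurrence is counted per news document (both assumptions for this example, as is the function name `pmi_scores`), the score for a pair is log2 of p(a, b) / (p(a) p(b)):

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return PMI for every entity pair that co-occurs in at least one document.

    `documents` is a list of entity lists, one per document (hypothetical input
    format). Probabilities are estimated as document frequencies.
    """
    n_docs = len(documents)
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in documents:
        unique = sorted(set(entities))  # count each entity once per document
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n_docs
        p_a = entity_counts[a] / n_docs
        p_b = entity_counts[b] / n_docs
        scores[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return scores


scores = pmi_scores([["A", "B"], ["A", "B"], ["C"], ["A", "C"]])
```

A positive score means the pair co-occurs more often than independence would predict, a negative score less often; pairs that never co-occur are simply absent from the result.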

Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length. In this work, we propose a constraint-aware and ranking-distilled token pruning method ToP, which selectively removes unnecessary tokens as the input sequence passes through layers, allowing the model to improve online inference speed while preserving accuracy. ToP overcomes the limitation of inaccurate token importance ranking in the conventional...

10.1145/3580305.3599284 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023-08-04
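
To make the general idea of token pruning concrete, the sketch below drops the lowest-ranked tokens at a single layer while keeping the survivors in their original order. It is a generic top-k illustration under stated assumptions, not ToP itself: the importance scores are taken as given, and the `prune_tokens` name and `keep_ratio` parameter are hypothetical.

```python
import math


def prune_tokens(tokens, importance, keep_ratio):
    """Keep the highest-scoring fraction of tokens, preserving sequence order.

    `importance` holds one score per token (assumed to come from the model,
    for example from attention weights); `keep_ratio` is in (0, 1].
    """
    if len(tokens) != len(importance):
        raise ValueError("tokens and importance must have the same length")
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")

    keep = max(1, math.ceil(len(tokens) * keep_ratio))
    # Rank positions by score, then restore the original order of the survivors.
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    kept_positions = sorted(ranked[:keep])
    return [tokens[i] for i in kept_positions]


pruned = prune_tokens(
    ["[CLS]", "the", "movie", "was", "great", "[SEP]"],
    [0.9, 0.05, 0.6, 0.1, 0.8, 0.7],
    keep_ratio=0.5,
)
```

Because later layers then see a shorter sequence, the cost of each remaining layer falls; the quality of the result depends entirely on how reliable the importance ranking is, which is the limitation the abstract says ToP targets.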

Although deep pre-trained language models have shown promising benefit in a large set of industrial scenarios, including Click-Through-Rate (CTR) prediction, how to integrate pre-trained language models that handle only textual signals into a prediction pipeline with non-textual features is challenging.

10.1145/3580305.3599780 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023-08-04

Stock market reports in on-line news are widely used by amateurs to make quick investment decisions. Financial analysts often give opinions about trends of stock markets based on past and present economic event indicators. These opinions commonly appear in text form and are abundant over the Internet. It is tedious and time consuming for users to browse through such reports manually, let alone understand the embedded opinions. To overcome this shortcoming, automatic trend prediction methods have been proposed. Under conventional...

10.1142/s1793840608001949 article EN International Journal of Computer Processing Of Languages 2008-12-01

Ke Xu, Yunqing Xia, Chin-Hui Lee. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.

10.3115/v1/p15-1089 article EN cc-by 2015-01-01

Research on temporal information extraction was regarded as a subtask of named entity recognition in the 1990s. To date, the scope of this research has broadened, ranging from temporal expression extraction and annotation to temporal reasoning and understanding. This area is now a hot NLP research topic, and the results are applicable to question answering, information extraction, text summarization, etc. This paper presents the past, present and future development of temporal information extraction.

10.1142/s0219427905001225 article EN International Journal of Computer Processing Of Languages 2005-06-01

Chatting is a popular communication media on the Internet via ICQ, chat rooms, etc. Chat language is different from natural language due to its anomalous and dynamic natures, which renders conventional NLP tools inapplicable. The dynamic problem is enormously troublesome because it makes a static chat language corpus outdated quickly in representing contemporary chat language. To address the dynamic problem, we propose phonetic mapping models to present mappings between chat terms and standard words via phonetic transcription, i.e. Chinese Pinyin in our case. Different...

10.3115/1220175.1220300 article EN 2006-01-01