- Topic Modeling
- Advanced Text Analysis Techniques
- Text and Document Classification Technologies
- Sentiment Analysis and Opinion Mining
- Web Data Mining and Analysis
- Natural Language Processing Techniques
- Spam and Phishing Detection
- Music and Audio Processing
- Semantic Web and Ontologies
- Speech Recognition and Synthesis
- Data Management and Algorithms
- Data Mining Algorithms and Applications
- Complex Network Analysis Techniques
- Biomedical Text Mining and Ontologies
- Speech and Audio Processing
- Authorship Attribution and Profiling
- Algorithms and Data Compression
- Service-Oriented Architecture and Web Services
- Recommender Systems and Techniques
- Information Retrieval and Search Behavior
- Cybercrime and Law Enforcement Studies
- Speech and Dialogue Systems
- Geological Modeling and Analysis
- Computational and Text Analysis Methods
- Imbalanced Data Classification Techniques
Xiangtan Electric Manufacturing Group (China)
2024
Microsoft Research Asia (China)
2015-2023
Daqing Oilfield General Hospital
2022
Microsoft (United States)
2020
Search
2020
Tsinghua University
2007-2015
Peking University
2015
HKUST Shenzhen Research Institute
2015
University of Hong Kong
2015
Beijing University of Posts and Telecommunications
2012
The Web holds valuable, vast, and unstructured information about public opinion. Here, the history, current use, and future of opinion mining and sentiment analysis are discussed, along with relevant techniques and tools.
In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake (i.e., spam) reviews are not practical due to the problem of information overload. However, the design and development of automated methods for detecting fake reviews is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by science...
In product reviews, it is observed that the distribution of polarity ratings over reviews written by different users or evaluated based on different products is often skewed in the real world. As such, incorporating user and product information would be helpful for the task of sentiment classification of reviews. However, existing approaches ignored the temporal nature of reviews posted by the same user or evaluated on the same product. We argue that the temporal relations of reviews might be potentially useful for learning user and product embeddings, and thus propose employing a sequence model to embed these temporal relations into user and product representations...
There has been a rapid growth in the number of cybercrimes that cause tremendous financial loss to organizations. Recent studies reveal that cybercriminals tend to collaborate or even transact cyber-attack tools via "dark markets" established in online social media. Accordingly, this presents unprecedented opportunities for researchers to tap into these underground cybercriminal communities and develop better insights about collaborative cybercrime activities, so as to combat the ever-increasing number of cybercrimes...
Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC used explicit semantic analysis of Wikipedia content to measure the semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback...
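As a rough illustration of nearest-neighbour dataless classification, the sketch below assumes each document and each label description has already been mapped to a sparse concept vector (as explicit semantic analysis over Wikipedia would produce); the concept names and weights are hypothetical toy data, and `dataless_classify` is an illustrative helper, not the method from the paper. The test document takes the label whose concept vector is most similar by cosine.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def dataless_classify(doc_vec: dict[str, float], label_vecs: dict[str, dict[str, float]]) -> str:
    """Assign the label whose concept vector is the nearest neighbour of the document."""
    return max(label_vecs, key=lambda label: cosine(doc_vec, label_vecs[label]))


# Hypothetical concept vectors; a real system would derive them from ESA over Wikipedia.
label_vecs = {
    "sports": {"Football": 0.9, "Olympic_Games": 0.7},
    "finance": {"Stock_market": 0.8, "Bank": 0.6},
}
doc_vec = {"Stock_market": 0.5, "Bank": 0.2, "Football": 0.1}
print(dataless_classify(doc_vec, label_vecs))  # finance
```

No labeled training documents are involved: only the label descriptions need concept vectors, which is what makes the approach "dataless".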
Multi-document summarization (MDS) is of great value to many real-world applications. Many scoring models have been proposed to select appropriate sentences from documents to form the summary, among which clustering-based methods are popular. In this work, we propose a unified sentence scoring model which measures representativeness and diversity at the same time. Experimental results on DUC04 demonstrate that our MDS method outperforms the best existing methods, and it yields results close to those of state-of-the-art generic MDS methods. Advantages...
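To make the idea of scoring sentences for representativeness and diversity at once concrete, here is a minimal sketch using Maximal Marginal Relevance (MMR)-style greedy selection. This is a swapped-in, well-known technique, not the paper's unified model: representativeness is assumed to be a sentence's mean bag-of-words cosine similarity to all other sentences, and diversity is a penalty for similarity to sentences already chosen. The trade-off weight `lam` is a hypothetical parameter.

```python
import math
from collections import Counter


def cos_sim(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_sentences(sentences: list[str], k: int, lam: float = 0.5) -> list[int]:
    """Greedy MMR-style selection balancing representativeness and diversity."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    # Representativeness: mean similarity of each sentence to all the others.
    rep = [
        sum(cos_sim(vecs[i], vecs[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    selected: list[int] = []

    def score(i: int) -> float:
        # Diversity: penalize similarity to the closest already-selected sentence.
        redundancy = max((cos_sim(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * rep[i] - (1 - lam) * redundancy

    while len(selected) < min(k, n):
        remaining = [i for i in range(n) if i not in selected]
        selected.append(max(remaining, key=score))
    return selected


print(select_sentences(["a b", "a b", "a c"], k=2))  # [0, 2]
```

In the toy call, the exact duplicate of the first pick is skipped in favour of a less redundant sentence, which is the diversity term at work.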
The articles in this special section focus on computational intelligence for big social data analytics. In the eras of connectedness and colonization, people are becoming increasingly enthusiastic about interacting, sharing, and collaborating through online collaborative media. In recent years, this collective intelligence has spread to many different areas, with particular focus on fields related to everyday life such as commerce, tourism, education, and health, causing the size of the Social Web to expand exponentially. The distillation of knowledge...
Lyric-based song sentiment classification seeks to assign songs appropriate sentiment labels such as light-hearted and heavy-hearted. Four problems render the vector space model (VSM)-based text classification approach ineffective: 1) many words within song lyrics actually contribute little to sentiment; 2) nouns and verbs used to express sentiment are ambiguous; 3) negations and modifiers around the sentiment keywords make particular contributions to sentiment; 4) song lyrics are usually very short. To address these problems, the sentiment vector space model (s-VSM) is proposed to represent song lyric documents. The...
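A minimal sketch of a sentiment-oriented lyric representation in the spirit of s-VSM follows. It assumes a hypothetical toy lexicon (not the paper's resources) and addresses only problems 1) and 3): non-sentiment words are dropped as dimensions, and a negation word flips the polarity of the token immediately after it. Ambiguity of nouns and verbs, modifiers, and short-text sparsity are not handled, and punctuation is not stripped.

```python
# Hypothetical toy lexicon: word -> polarity weight (not taken from the paper).
SENTIMENT_LEXICON = {
    "happy": 1.0, "love": 1.0, "sunshine": 0.5,
    "sad": -1.0, "tears": -0.5, "alone": -0.5,
}
NEGATIONS = {"not", "never", "no"}


def sentiment_vector(lyric: str) -> dict[str, float]:
    """Keep only sentiment words as dimensions; a negation flips the next token's weight."""
    vec: dict[str, float] = {}
    negate = False
    for token in lyric.lower().split():
        if token in NEGATIONS:
            negate = True
            continue
        if token in SENTIMENT_LEXICON:
            weight = SENTIMENT_LEXICON[token]
            vec[token] = vec.get(token, 0.0) + (-weight if negate else weight)
        # Negation scope is only the token immediately after the negator.
        negate = False
    return vec


def classify(lyric: str) -> str:
    """Sum the signed weights: positive means light-hearted, otherwise heavy-hearted."""
    score = sum(sentiment_vector(lyric).values())
    return "light-hearted" if score > 0 else "heavy-hearted"


print(sentiment_vector("I am not happy and I cry tears"))  # {'happy': -1.0, 'tears': -0.5}
print(classify("love and sunshine"))  # light-hearted
```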
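Stance classification aims at identifying, in the text, the attitude toward the given targets as favorable, negative, or unrelated. In existing models for stance classification, only the textual representation is leveraged, while commonsense knowledge is ignored. In order to better incorporate commonsense knowledge into stance classification, we propose a novel model named commonsense knowledge enhanced memory network, which jointly represents the textual and commonsense knowledge representations of the given target and text. The textual memory module in our model treats the textual representation as memory vectors, and uses an attention mechanism to embody the important parts. For the commonsense knowledge memory module, we leverage entity...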
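The attention step over memory vectors can be illustrated with plain dot-product attention. In this sketch the query stands in for a hypothetical target embedding and the memory rows for hypothetical word vectors of the text; it is a generic softmax attention, not the paper's exact memory-network architecture.

```python
import math


def attend(query: list[float], memory: list[list[float]]) -> tuple[list[float], list[float]]:
    """Dot-product attention of a target query over memory vectors."""
    scores = [sum(q * m for q, m in zip(query, vec)) for vec in memory]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of memory vectors: the attended representation.
    output = [sum(w * vec[d] for w, vec in zip(weights, memory)) for d in range(len(memory[0]))]
    return weights, output


weights, output = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print([round(w, 3) for w in weights])  # [0.731, 0.269]
```

The memory vector most aligned with the query receives the largest weight, so it dominates the attended output.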
In order to solve the problem of frequency modulation power deviation caused by the randomness and fluctuation of wind power outputs, a method of auxiliary frequency modulation capacity allocation based on data decomposition for a “flywheel + lithium battery” hybrid energy storage system was proposed. Firstly, the uncertain wind power output is decomposed by the successive variational mode decomposition (SVMD) method, and the resulting mode functions are segmented and reconstructed into high- and low-frequency components. Secondly, a mathematical model was established to maximize the economic benefit of energy storage, considering frequency modulation mileage, quantum...
This paper presents an opinion analysis system based on linguistic knowledge which is acquired from small-scale annotated text and raw topic-relevant Web pages. Based on observation of the corpus, some word-, collocation-, and sentence-level features for opinion analysis are discovered. Supervised and unsupervised learning techniques are developed to learn these features from the annotated text and the raw topic-relevant Web pages, respectively. These features are then incorporated into a classifier based on a support vector machine (SVM) to identify opinionated sentences and determine their polarities...
Unlike most online social networks, where explicit links among individual users are defined, the relations among commercial entities (e.g., firms) may not be explicitly declared on Web sites. One main contribution of this article is the development of a novel computational model for the discovery of latent relations among commercial entities from financial news. More specifically, a CRF model which can exploit both structural and contextual features is applied to commercial entity recognition. In addition, a point-wise mutual information (PMI)-based unsupervised learning...
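Point-wise mutual information itself is a standard quantity, $\mathrm{PMI}(a,b) = \log \frac{P(a,b)}{P(a)\,P(b)}$. The sketch below assumes each news article has already been reduced to the set of firm names recognized in it (the recognition step, done with a CRF in the work above, is not shown) and estimates the probabilities as article frequencies. The firm names are hypothetical, and how the paper turns PMI scores into relations is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents: list[set[str]]) -> dict[tuple[str, str], float]:
    """PMI for every entity pair that co-occurs in at least one document."""
    n = len(documents)
    single: Counter = Counter()
    pair: Counter = Counter()
    for ents in documents:
        single.update(ents)
        pair.update(combinations(sorted(ents), 2))
    scores = {}
    for (a, b), c_ab in pair.items():
        # PMI = log( P(a,b) / (P(a) P(b)) ), probabilities estimated as document frequencies.
        scores[(a, b)] = math.log((c_ab / n) / ((single[a] / n) * (single[b] / n)))
    return scores


# Hypothetical entity sets extracted from four news articles.
docs = [{"FirmA", "FirmB"}, {"FirmA", "FirmB"}, {"FirmC"}, {"FirmA"}]
print(round(pmi_scores(docs)[("FirmA", "FirmB")], 4))  # 0.2877
```

A positive score means the two firms are mentioned together more often than their individual frequencies would predict, which is the signal an unsupervised relation-discovery step can build on.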
Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length. In this work, we propose a constraint-aware and ranking-distilled token pruning method, ToP, which selectively removes unnecessary tokens as the input sequence passes through the layers, allowing the model to improve online inference speed while preserving accuracy. ToP overcomes the limitation of inaccurate token importance ranking in the conventional...
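The basic mechanics of token pruning can be sketched as follows: score each token, keep the top fraction, and preserve the surviving tokens' original order. Here the importance score is the mean attention a token receives across heads and query positions, a common heuristic swapped in for illustration; ToP itself learns its ranking through distillation under a constraint, which is not reproduced. The `keep_ratio` and `keep_cls` parameters are hypothetical, and always keeping position 0 assumes a BERT-style `[CLS]` token used for classification.

```python
def attention_importance(attn: list[list[list[float]]]) -> list[float]:
    """attn[h][q][k] is the attention weight from query q to key k in head h.
    A token's importance is the mean attention it receives over heads and queries."""
    heads, n = len(attn), len(attn[0])
    return [
        sum(attn[h][q][k] for h in range(heads) for q in range(n)) / (heads * n)
        for k in range(n)
    ]


def prune_tokens(tokens: list[str], importance: list[float],
                 keep_ratio: float, keep_cls: bool = True) -> list[str]:
    """Keep the top-ranked tokens, preserving their original order."""
    n = len(tokens)
    budget = max(1, round(n * keep_ratio))
    fixed = [0] if keep_cls else []  # optionally always keep position 0 ([CLS])
    candidates = [i for i in range(n) if i not in fixed]
    ranked = sorted(candidates, key=lambda i: importance[i], reverse=True)
    kept = sorted(fixed + ranked[: max(budget - len(fixed), 0)])
    return [tokens[i] for i in kept]


tokens = ["[CLS]", "the", "movie", "was", "great"]
importance = [0.1, 0.05, 0.3, 0.05, 0.5]
print(prune_tokens(tokens, importance, keep_ratio=0.6))  # ['[CLS]', 'movie', 'great']
```

Applying such a step between layers shortens the sequence that later layers process, which is where the inference savings come from.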
Although deep pre-trained language models have shown promising benefits in a large set of industrial scenarios, including Click-Through-Rate (CTR) prediction, how to integrate pre-trained language models that handle only textual signals into a prediction pipeline with non-textual features is challenging.
Stock market reports in online news are widely used by amateur investors to make quick investment decisions. Financial analysts often give opinions about trends of stock markets based on past and present economic events and indicators. These opinions commonly appear in text form and are abundant over the Internet. It is tedious and time-consuming for users to browse through such reports manually, let alone understand the embedded opinions. To overcome this shortcoming, automatic trend prediction methods have been proposed. Under conventional...
Ke Xu, Yunqing Xia, Chin-Hui Lee. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.
Research on temporal information extraction was regarded as a subtask of named entity recognition in the 1990s. To date, the scope of this research has broadened, ranging from temporal expression extraction and annotation to temporal reasoning and understanding. This area is now a hot NLP topic, and its results are applicable to question answering, information extraction, text summarization, etc. This paper presents the past, present, and future development of temporal information extraction.
Chatting is a popular communication medium on the Internet via ICQ, chat rooms, etc. Chat language is different from natural language due to its anomalous and dynamic nature, which renders conventional NLP tools inapplicable. The dynamic problem is enormously troublesome because it makes a static chat language corpus outdated quickly in representing contemporary chat language. To address the dynamic problem, we propose phonetic mapping models to represent mappings between chat terms and standard words via phonetic transcription, i.e., Chinese Pinyin in our case. Different...
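One way to picture normalization via phonetic mappings is sketched below. It assumes a chat term is already transcribed into Pinyin syllables, and that syllable-level mappings with probabilities have been learned beforehand; both the `SYLLABLE_MAP` table and the `STANDARD_WORDS` lexicon are hypothetical toy data, and the scoring (product of syllable mapping probabilities, restricted to words in the lexicon) is an illustrative assumption rather than the paper's model.

```python
from itertools import product
from typing import Optional

# Hypothetical learned mappings: chat syllable -> {standard syllable: probability}.
SYLLABLE_MAP = {
    "fan": {"huan": 0.8, "fan": 0.2},
    "mu": {"mei": 0.7, "mu": 0.3},
}
# Hypothetical standard-word lexicon indexed by Pinyin syllables.
STANDARD_WORDS = {
    ("xi", "huan"): "喜欢",
    ("xi", "fan"): "稀饭",
    ("mei", "you"): "没有",
}


def normalize(chat_pinyin: tuple[str, ...]) -> Optional[str]:
    """Return the most probable standard word reachable via syllable mappings."""
    # Unmapped syllables map to themselves with probability 1.
    options = [SYLLABLE_MAP.get(s, {s: 1.0}).items() for s in chat_pinyin]
    best_word, best_p = None, 0.0
    for combo in product(*options):
        syllables = tuple(syl for syl, _ in combo)
        p = 1.0
        for _, prob in combo:
            p *= prob
        word = STANDARD_WORDS.get(syllables)
        if word is not None and p > best_p:
            best_word, best_p = word, p
    return best_word


print(normalize(("xi", "fan")))  # 喜欢
print(normalize(("mu", "you")))  # 没有
```

Because the mappings operate on syllables rather than whole terms, a newly coined chat term can still be normalized if its syllables follow known phonetic patterns, which is the appeal of phonetic mapping for the dynamic problem.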