- Topic Modeling
- Text and Document Classification Technologies
- Natural Language Processing Techniques
- Complex Network Analysis Techniques
- Web Data Mining and Analysis
- Advanced Text Analysis Techniques
- Spam and Phishing Detection
- Recommender Systems and Techniques
- Face and Expression Recognition
- Caching and Content Delivery
- Image Retrieval and Classification Techniques
- Advanced Graph Neural Networks
- Sentiment Analysis and Opinion Mining
- Neural Networks and Applications
- Multimodal Machine Learning Applications
- Cloud Computing and Resource Management
- Photonic and Optical Devices
- Peer-to-Peer Network Technologies
- Bioinformatics and Genomic Networks
- Machine Learning in Bioinformatics
- Opinion Dynamics and Social Influence
- Human Mobility and Location-Based Analysis
- Semiconductor Lasers and Optical Devices
- Algorithms and Data Compression
- Supply Chain and Inventory Management
University of Electronic Science and Technology of China
2013-2024
Henan University
2012-2024
Tencent (China)
2024
Linköping University
2024
University of Shanghai for Science and Technology
2024
University of Chinese Academy of Sciences
2023
First Affiliated Hospital of Xiamen University
2023
Chinese Academy of Sciences
2023
Chongqing University of Posts and Telecommunications
2023
State Nuclear Power Technology Company (China)
2023
Memory-based approaches for collaborative filtering identify the similarity between two users by comparing their ratings on a set of items. In the past, the memory-based approach has been shown to suffer from two fundamental problems: data sparsity and difficulty in scalability. Alternatively, the model-based approach has been proposed to alleviate these problems, but this approach tends to limit the range of users. In this paper, we present a novel approach that combines the advantages of these two approaches by introducing a smoothing-based method. In our approach, clusters generated from the training...
Understanding the intent behind a user's query can help a search engine automatically route the query to some corresponding vertical search engines to obtain particularly relevant contents, thus greatly improving user satisfaction. There are three major challenges to the query intent classification problem: (1) Intent representation; (2) Domain coverage and (3) Semantic interpretation. Current approaches to predict the user's intent mainly utilize machine learning techniques. However, it is difficult and often requires many human efforts to meet all these challenges by...
Very large-scale classification taxonomies typically have hundreds of thousands of categories, deep hierarchies, and skewed category distribution over documents. However, it is still an open question whether the state-of-the-art technologies in automated text categorization can scale to (and perform well on) such large taxonomies. In this paper, we report the first evaluation of Support Vector Machines (SVMs) in web-page classification over the full taxonomy of the Yahoo! categories. Our accomplishments include: 1) a data analysis on...
Dimensionality reduction is an essential data preprocessing technique for large-scale and streaming data classification tasks. It can be used to improve both the efficiency and the effectiveness of classifiers. Traditional dimensionality reduction approaches fall into two categories: feature extraction and feature selection. Techniques in the feature extraction category are typically more effective than those in the feature selection category. However, they may break down when processing large-scale data sets or data streams due to their high computational complexities. Similarly,...
Labeling text data is quite time-consuming but essential for automatic text classification. Especially, manually creating multiple labels for each document may become impractical when a very large amount of data is needed for training multi-label text classifiers. To minimize the human-labeling efforts, we propose a novel multi-label active learning approach which can reduce the required labeled data without sacrificing the classification accuracy. Traditional active learning algorithms can only handle single-label problems, that is, each data is restricted to have one label....
To help users quickly understand the major opinions from massive online reviews, it is important to automatically reveal the latent structure of the aspects, sentiment polarities, and the association between them. However, there is little work available to do this effectively. In this paper, we propose a hierarchical aspect sentiment model (HASM) to discover a hierarchical structure of aspect-based sentiments from unlabeled online reviews. In HASM, the whole structure is a tree. Each node itself is a two-level tree, whose root represents an aspect and the children represent the sentiment polarities associated with it. Each aspect or...
Proto-oncogene non-receptor tyrosine protein kinase c-Src has been involved in the development, progression and metastasis of a variety of human cancers. This kinase contains two self-binding peptide (SBP) sites, separately between the SH3 domain and the polyproline-II (PPII) helix and between the SH2 domain and the C-terminal phosphorylatable tail (CTPT), which are potential targets of anticancer drugs to regulate the kinase activity. Here, we described an integrated protocol to systematically investigate the structural basis, energetic property and dynamics behaviour...
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training...
The peptide quantitative structure–activity relationship (QSAR), also known as the quantitative sequence–activity model (QSAM), has attracted much attention in the bio- and chemoinformatics communities and is a well developed computational peptidology strategy to statistically correlate the sequence/structure and activity/property relationships of functional peptides. Amino acid descriptors (AADs) are one of the most widely used methods to characterize peptide structures by decomposing the peptide into its residue building blocks sequentially...
The CDKN2A (cyclin dependent kinase inhibitor 2A/multiple tumor suppressor 1) gene, also known as the P16 gene, encodes multiple tumor suppressor 1 (MTS1), which belongs to the INK4 family. In tumor tissue, CDKN2A has a high expression level compared with normal tissue and reflects prognosis in tumor patients. Our research targeted the analysis of CDKN2A expression in 33 tumors and clinical parameters, patient prognosis and tumor immunity roles. CDKN2A expression was significantly correlated with the tumor mutation burden (TMB) in 10 tumors and with MSI (microsatellite instability) in tumors. CDKN2A expression was associated with infiltrating lymphocyte...
Text message stream is a newly emerging type of Web data which is produced in enormous quantities with the popularity of Instant Messaging and Internet Relay Chat. It is beneficial for detecting the threads contained in the text stream for various applications, including information retrieval, expert recognition and even crime prevention. Despite its importance, not much research has been conducted so far on this problem due to the characteristics of the data, in which the messages are usually very short and incomplete. In this paper, we present a stringent...
Text categorization is an important research area in many Information Retrieval (IR) applications. To save the storage space and computation time in text categorization, efficient and effective algorithms for reducing the data before analysis are highly desired. Traditional techniques for this purpose can generally be classified into feature extraction and feature selection. Because of efficiency, the latter is more suitable for text data such as web documents. However, popular feature selection techniques such as Information Gain (IG) and χ2-test (CHI) are all greedy in nature and thus...
Src homology 3 (SH3) domains are small protein modules involved in the regulation of important cellular pathways such as proliferation and migration, which canonically prefer to recognize and interact with proline-rich peptide ligands with a class I or class II motif. Previously, we identified two self-binding peptides (SBPs) in human c-Src tyrosine kinase, of which the first SBP (fSBP) segment (248SKPQTQGLAK257) fulfills an intramolecular interaction with the kinase SH3 domain to regulate kinase function. The fSBP (and its equivalents in other Src family...
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an etiological agent of the current rapidly growing outbreak of coronavirus disease (COVID-19), which is straining health systems around the world. Disrupting the intermolecular association of the SARS-CoV-2 spike glycoprotein (S protein) with its cell surface receptor human angiotensin-converting enzyme 2 (hACE2) has been recognized as a promising therapeutic strategy against COVID-19. The typical peptide-mediated interaction, where hACE2 adopts an α1-helix, can...
Identifying communities in complex networks is an effective means for analyzing complex systems, with applications in diverse areas such as social science, engineering, biology and medicine. Finding communities of nodes and finding communities of links are two popular schemes for network analysis. These schemes, however, have inherent drawbacks and are inadequate to capture complex organizational structures in real networks. We introduce a new scheme and approach for identifying a mixture of node and link communities, called hybrid node-link communities. A central piece...
An important problem in analyzing complex networks is the discovery of modular or community structures embedded in the networks. Although promising for identifying network communities, the popular stochastic models often do not preserve node degrees, thus reducing their representation power and applicability to real-world networks. Here we address this critical problem. Instead of using a blockmodel, we adopted a random-graph null model to faithfully capture community structures by preserving the expected node degrees. The new model, learned...
Dimension reduction for large-scale text data is attracting much attention nowadays due to the rapid growth of the World Wide Web. We can categorize those popular dimension reduction algorithms into two groups: feature extraction and feature selection algorithms. In the former, new features are combined from their original features through algebraic transformation. Though many of them have been validated to be effective, these algorithms are typically associated with high computational overhead, making them difficult to be applied on real-world text data....
Latent semantic indexing (LSI) is a successful technology in information retrieval (IR) which attempts to explore the latent semantics implied by a query or a document through representing them in a dimension-reduced space. However, LSI is not optimal for document categorization tasks because it aims to find the most representative features for document representation rather than the most discriminative ones. In this paper, we propose supervised LSI (SLSI), which selects the most discriminative basis vectors using the training data iteratively. The extracted vectors are then used...
The multiresolution technique is one of the most important techniques for image segmentation. Wavelet transformation is a pixel-based multiresolution method and is widely used in image segmentation approaches, but it suffers from a deficiency in modeling the macrotexture pattern of a given image. In order to overcome such a problem, this letter extends the multiresolution technique from the pixel level to the region level and proposes a new model by incorporating the multiregion-resolution technique into the Markov random field model. Experiments are conducted using synthetic-aperture-radar data and remote sensing...
This paper deals with the problem of jointly mining topics, sentiments, and the association between them from online reviews in an unsupervised way. Previous methods often treat a sentiment as a special topic and assume a word is generated from a flat mixture of topics and sentiments, where the discriminative performance of sentiment analysis is not satisfactory. A key reason is that providing rich priors on the polarity of a word is difficult, as the polarity depends on the topic. To solve this problem, we propose a novel model. We decompose the generative process of a word's polarity into a two-level hierarchy: the first level...