Aixin Sun

ORCID: 0000-0003-0764-4258
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Recommender Systems and Techniques
  • Text and Document Classification Technologies
  • Advanced Text Analysis Techniques
  • Web Data Mining and Analysis
  • Multimodal Machine Learning Applications
  • Complex Network Analysis Techniques
  • Image Retrieval and Classification Techniques
  • Advanced Graph Neural Networks
  • Advanced Image and Video Retrieval Techniques
  • Sentiment Analysis and Opinion Mining
  • Data Management and Algorithms
  • Advanced Bandit Algorithms Research
  • Human Mobility and Location-Based Analysis
  • Spam and Phishing Detection
  • Semantic Web and Ontologies
  • Video Analysis and Summarization
  • Geographic Information Systems Studies
  • Caching and Content Delivery
  • Biomedical Text Mining and Ontologies
  • Expert Finding and Q&A Systems
  • Text Readability and Simplification
  • Wikis in Education and Collaboration
  • Software Engineering Research

Nanyang Technological University
2016-2025

Nankai University
2024

Nanchang University
2024

Nanyang Polytechnic
2023

Macquarie University
2023

Zhengzhou University
2019

National Institute of Education
2010

UNSW Sydney
2005-2006

Named entity recognition (NER) is the task of identifying mentions of rigid designators in text that belong to predefined semantic types such as person, location, and organization. NER serves as the foundation for many natural language applications, such as question answering, summarization, and machine translation. Early NER systems achieved good performance at the cost of human engineering in designing domain-specific features and rules. In recent years, deep learning, empowered by continuous...

10.1109/tkde.2020.2981314 article EN IEEE Transactions on Knowledge and Data Engineering 2020-03-17

The availability of user check-in data in large volume from the rapidly growing location-based social networks (LBSNs) enables many important location-aware services for users. Point-of-interest (POI) recommendation is one such service, which recommends places that users have not visited before. Several techniques have recently been proposed for this service. However, no existing work has considered temporal information for POI recommendations in LBSNs. We believe that time plays an important role in POI recommendations because most users tend...

10.1145/2484028.2484030 article EN Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 2013-07-28

Geographical characteristics derived from historical check-in data have been reported to be effective in improving location recommendation accuracy. However, previous studies mainly exploit geographical characteristics from a user's perspective, via modeling the geographical distribution of each individual user's check-ins. In this paper, we are interested in exploiting geographical characteristics from a location perspective, by modeling the geographical neighborhood of a location. The neighborhood is modeled at two levels: the instance-level neighborhood defined by a few nearest neighbors of the location, and the region-level neighborhood for the geographical region where the location exists. We propose a novel...

10.1145/2661829.2662002 article EN 2014-11-03

Event detection from tweets is an important task for understanding the current events/topics attracting a large number of common users. However, the unique characteristics of tweets (e.g. short and noisy content, diverse and fast-changing topics, and large data volume) make event detection a challenging task. Most existing techniques proposed for well-written documents (e.g. news articles) cannot be directly adopted. In this paper, we propose a segment-based event detection system for tweets, called Twevent. Twevent first detects bursty tweet segments as event segments and then...

10.1145/2396761.2396785 article EN 2012-10-29

For many applications that require semantic understanding of short texts, inferring discriminative and coherent latent topics from short texts is a critical and fundamental task. Conventional topic models largely rely on word co-occurrences to derive topics from a collection of documents. However, due to the length of each document, short texts are much more sparse in terms of word co-occurrences. Data sparsity therefore becomes a bottleneck for conventional topic models to achieve good results on short texts. On the other hand, when a human being interprets a piece of text, not...

10.1145/2911451.2911499 article EN Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval 2016-07-07

Many private and/or public organizations have been reported to create and monitor targeted Twitter streams to collect and understand users' opinions about the organizations. A targeted Twitter stream is usually constructed by filtering tweets with user-defined selection criteria, e.g. tweets published by users from a selected region, or tweets that match one or more predefined keywords. The targeted stream is then monitored to collect and understand users' opinions about the organizations. There is an emerging need for early crisis detection and response with such a target stream. Such applications require a good named entity...

10.1145/2348283.2348380 article EN Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval 2012-08-12

The availability of user check-in data in large volume from the rapidly growing location-based social networks (LBSNs) enables a number of important location-aware services. Point-of-interest (POI) recommendation is one such service, which is to recommend POIs that users have not visited before. It has been observed that: (i) users tend to visit nearby places, and (ii) users tend to visit different places in different time slots, and in the same time slot, users tend to periodically visit the same places. For example, users usually visit a restaurant during lunch hours, and visit a pub at night. In this...

10.1145/2661829.2661983 article EN 2014-11-03

Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The...

10.1002/asi.22844 article EN Journal of the American Society for Information Science and Technology 2013-05-08

Micro-blogging services, such as Twitter, and location-based social network applications have generated short text messages associated with geographic information, posting time, and user ids. The availability of such data received from users offers a good opportunity to study the user's spatial-temporal behavior and preference. In this paper, we propose a probabilistic model W4 (short for Who+Where+When+What) to exploit such data to discover individual users' mobility behaviors from spatial, temporal and activity aspects. To the best...

10.1145/2487575.2487576 article EN 2013-08-11

Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the world-wide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years....

10.1109/tkde.2018.2807840 article EN IEEE Transactions on Knowledge and Data Engineering 2018-02-20

Given an untrimmed video and a text query, natural language video localization (NLVL) is to locate a matching span from the video that semantically corresponds to the query. Existing solutions formulate NLVL either as a ranking task and apply a multimodal matching architecture, or as a regression task to directly regress the target video span. In this work, we address the NLVL task with a span-based QA approach by treating the input video as a text passage. We propose a video span localizing network (VSLNet), on top of the standard span-based QA framework, to address NLVL. The proposed VSLNet tackles the differences between NLVL and span-based QA through...

10.18653/v1/2020.acl-main.585 article EN cc-by 2020-01-01

In this paper, we propose RNN-Capsule, a capsule model based on Recurrent Neural Network (RNN) for sentiment analysis. For a given problem, one capsule is built for each sentiment category, e.g., 'positive' and 'negative'. Each capsule has an attribute, a state, and three modules: a representation module, a probability module, and a reconstruction module. The attribute of a capsule is the assigned sentiment category. Given an instance encoded in hidden vectors by a typical RNN, the representation module builds the capsule representation by the attention mechanism. Based on the capsule representation, the probability module computes the capsule's state probability. A...

10.1145/3178876.3186015 article EN 2018-01-01

In a large recommender system, the products (or items) could be in many different categories or domains. Given two relevant domains (e.g., Book and Movie), users may have interactions with items in one domain but not in the other domain. To the latter, these users are considered as cold-start users. How to effectively transfer users' preferences based on their interactions from one domain to the other relevant domain is the key issue in cross-domain recommendation. Inspired by the advances made in review-based recommendation, we propose to model user preference transfer at...

10.1145/3397271.3401169 preprint EN 2020-07-25

Collaborative filtering (CF) is widely used to learn informative latent representations of users and items from observed interactions. Existing CF-based methods commonly adopt negative sampling to discriminate different items. Training with negative sampling on large datasets is computationally expensive. Further, negative items should be carefully sampled under the defined distribution, in order to avoid selecting an observed positive item in the training dataset. Unavoidably, some negative items sampled from the training dataset could be positive in the test set. In this paper, we propose a...

10.1145/3591469 article EN ACM Transactions on Recommender Systems 2023-04-20

Large Language Models (LLMs) have made remarkable strides in various tasks. Whether LLMs are competitive few-shot solvers for information extraction (IE) tasks, however, remains an open problem. In this work, we aim to provide a thorough answer to this question. Through extensive experiments on nine datasets across four IE tasks, we demonstrate that current advanced LLMs consistently exhibit inferior performance, higher latency, and increased budget requirements compared to fine-tuned SLMs under most settings....

10.18653/v1/2023.findings-emnlp.710 article EN cc-by 2023-01-01

Hierarchical classification refers to the assignment of one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categories. As the standard performance measures assume independence between categories, they have not considered documents incorrectly classified into categories that are similar to or not far from the correct ones in the category tree. We therefore propose category-similarity...

10.1109/icdm.2001.989560 article EN 2002-11-14

Wikipedia has grown to be the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in the collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors...

10.1145/1321440.1321476 article EN 2007-11-06

10.1016/j.eswa.2007.10.042 article EN Expert Systems with Applications 2007-11-20

In web classification, web pages from one or more web sites are assigned to pre-defined categories according to their content. Since web pages are more than just plain text documents, web classification methods have to consider using other context features of web pages, such as hyperlinks and HTML tags. In this paper, we propose the use of Support Vector Machine (SVM) classifiers to classify web pages using both their text and context feature sets. We experimented with our web classification method on the WebKB data set. Compared with the earlier Foil-Pilfs method on the same data set, our method has been shown to perform very well. We also show that...

10.1145/584931.584952 article EN 2002-11-08

Trust between a pair of users is an important piece of information for users in an online community (such as electronic commerce websites and product review websites) where users may rely on trust information to make decisions. In this paper, we address the problem of predicting whether a user trusts another user. Most prior work infers unknown trust ratings from known trust ratings. The effectiveness of this approach depends on the connectivity of the known web of trust and can be quite poor when the connectivity is very sparse, which is often the case in an online community. We therefore propose a classification...

10.1145/1386790.1386838 article EN 2008-07-08

Many applications require semantic understanding of short texts, and inferring discriminative and coherent latent topics is a critical and fundamental task in these applications. Conventional topic models largely rely on word co-occurrences to derive topics from a collection of documents. However, due to the length of each document, short texts are much more sparse in terms of word co-occurrences. Recent studies show that the Dirichlet Multinomial Mixture (DMM) model is effective for topic inference over short texts by assuming that each piece of short text is generated by a single...

10.1145/3091108 article EN ACM Transactions on Information Systems 2017-08-21