NFDI4DS | UHH-SEMS - Publication Details

Yangqiu Song

ORCID: 0000-0002-7818-6090

About

Contact & Profiles

A5020880385

Research Areas

Topic Modeling
Natural Language Processing Techniques
Advanced Graph Neural Networks
Text and Document Classification Technologies
Advanced Text Analysis Techniques
Semantic Web and Ontologies
Sentiment Analysis and Opinion Mining
Complex Network Analysis Techniques
Multimodal Machine Learning Applications
Recommender Systems and Techniques
Data Quality and Management
Speech and dialogue systems
Face and Expression Recognition
Domain Adaptation and Few-Shot Learning
Bayesian Modeling and Causal Inference
Privacy-Preserving Technologies in Data
Image Retrieval and Classification Techniques
Advanced Image and Video Retrieval Techniques
Text Readability and Simplification
Human Pose and Action Recognition
Video Analysis and Summarization
Explainable Artificial Intelligence (XAI)
Data Visualization and Analytics
Hate Speech and Cyberbullying Detection
Advanced Clustering Algorithms Research

Tsinghua University
2006-2024

University of Hong Kong
2013-2024

Peng Cheng Laboratory
2020-2024

Hong Kong University of Science and Technology
2013-2024

Zhejiang University of Finance and Economics
2024

Bar-Ilan University
2023

Tencent (China)
2019-2022

Association for Computing Machinery
2019-2021

West Virginia University
2015-2018

Peking University
2017

Parallel Spectral Clustering in Distributed Systems

OPENALEX - Publications

Wen-Yen Chen Yangqiu Song Hongjie Bai Chih‐Jen Lin Edward Yi Chang

Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of the data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach, sparsifying the matrix, with another, the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors...

10.1109/tpami.2010.88 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2010-04-09
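
The sparsification idea mentioned in the abstract above (keeping only nearest-neighbor entries of the similarity matrix) can be sketched roughly as follows. This is a minimal single-machine illustration, not the authors' parallel implementation: the function name `sparsify_similarity`, the Gaussian kernel, the `sigma` parameter, and the symmetrization rule are all assumptions made for the example.

```python
import math


def sparsify_similarity(points, t, sigma=1.0):
    """Build a sparse Gaussian similarity matrix (hypothetical helper).

    Only the entries linking each point to its t nearest neighbors are kept.
    An entry (i, j) is stored when j is among the t nearest neighbors of i
    or i is among those of j, so the result is symmetric (an assumption).
    The matrix is returned as a dict of dicts: sparse[i][j] = similarity.
    """
    n = len(points)
    sparse = {i: {} for i in range(n)}
    for i in range(n):
        # Squared Euclidean distance from point i to every other point.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        ]
        dists.sort()
        for d2, j in dists[:t]:
            s = math.exp(-d2 / (2.0 * sigma ** 2))
            sparse[i][j] = s
            sparse[j][i] = s  # keep the matrix symmetric
    return sparse


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
sparse = sparsify_similarity(points, t=1)
```

With `t=1`, each point keeps only its closest neighbor, so the two well-separated pairs never share an entry and the dense 4x4 matrix shrinks to four stored values.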

Meta-Graph Based Recommendation Fusion over Heterogeneous Information Networks

OPENALEX - Publications

Huan Zhao Quanming Yao Jianda Li Yangqiu Song Dik Lun Lee

A Heterogeneous Information Network (HIN) is a natural and general representation of data in modern large commercial recommender systems, which involve heterogeneous types of data. HIN-based recommenders face two problems: how to represent the high-level semantics of recommendations and how to fuse the heterogeneous information to make recommendations. In this paper, we solve the two problems by first introducing the concept of meta-graph to HIN-based recommendation, and then solving the information fusion problem with a "matrix factorization (MF) + factorization machine (FM)"...

10.1145/3097983.3098063 article EN 2017-08-04

Large-Scale Hierarchical Text Classification with Recursively Regularized Deep Graph-CNN

OPENALEX - Publications

Hao Peng Jianxin Li Yu He Yao‐Peng Liu Mengjiao Bao and 3 more

Text classification to a hierarchical taxonomy of topics is a common and practical problem. Traditional approaches simply use bag-of-words and have achieved good results. However, when there are a lot of labels with different topical granularities, bag-of-words representation may not be enough. Deep learning models have been proven effective at automatically learning different levels of representations for image data. It is interesting to study what is the best way to represent texts. In this paper, we propose a graph-CNN based deep learning model to first convert...

10.1145/3178876.3186005 article EN 2018-01-01

TextFlow: Towards Better Understanding of Evolving Topics in Text

OPENALEX - Publications

Weiwei Cui Shi‐Xia Liu Li Tan Conglei Shi Yangqiu Song and 3 more

Understanding how topics evolve in text data is an important and challenging task. Although much work has been devoted to topic analysis, the study of topic evolution has largely been limited to individual topics. In this paper, we introduce TextFlow, a seamless integration of visualization and topic mining techniques, for analyzing various evolution patterns that emerge from multiple topics. We first extend an existing analysis technique to extract three-level features: the topic evolution trend, the critical event, and the keyword correlation. Then a coherent visualization that consists of three...

10.1109/tvcg.2011.239 article EN IEEE Transactions on Visualization and Computer Graphics 2011-11-04

Multilingual and Multi-Aspect Hate Speech Analysis

OPENALEX - Publications

Nedjma Ousidhoum Zizheng Lin Hongming Zhang Yangqiu Song Dit‐Yan Yeung

Nedjma Ousidhoum, Zizheng Lin, Hongming Zhang, Yangqiu Song, Dit-Yan Yeung. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1474 article EN cc-by 2019-01-01

HinDroid

OPENALEX - Publications

Shifu Hou Yanfang Ye Yangqiu Song Melih Abdulhayoglu

With the explosive growth of Android malware and due to the severity of its damage to smartphone users, the detection of Android malware has become increasingly important in cybersecurity. The increasing sophistication of Android malware calls for new defensive techniques that are capable against novel threats and harder to evade. In this paper, to detect Android malware, instead of using Application Programming Interface (API) calls only, we further analyze the different relationships between them and create higher-level semantics, which require more effort for attackers to evade...

10.1145/3097983.3098026 article EN 2017-08-04

Scalable Multiplex Network Embedding

OPENALEX - Publications

Hongming Zhang Liwei Qiu Lingling Yi Yangqiu Song

Network embedding has been proven to be helpful for many real-world problems. In this paper, we present a scalable multiplex network embedding model to represent information of multi-type relations in a unified embedding space. To combine information of different types of relations while maintaining their distinctive properties, for each node, we propose one high-dimensional common embedding and a lower-dimensional additional embedding for each type of relation. Then multiple relations can be learned jointly based on a unified network embedding model. We conduct experiments on two tasks: link prediction and node classification...

10.24963/ijcai.2018/428 article EN 2018-07-01

Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

OPENALEX - Publications

Tong Che Yanran Li Ruixiang Zhang R Devon Hjelm Wenjie Li and 2 more

Despite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted. The fundamental reason is the difficulty of back-propagation through discrete random variables combined with the inherent instability of the GAN training objective. To address these problems, we propose Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. Instead of directly optimizing the GAN objective, we derive a novel and...

10.48550/arxiv.1702.07983 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Understanding Hidden Memories of Recurrent Neural Networks

OPENALEX - Publications

Ming Yao Shaozu Cao Ruixiang Zhang Zhen Li Yuanzhe Chen and 2 more

Recurrent neural networks (RNNs) have been successfully applied to various natural language processing (NLP) tasks and achieved better results than conventional methods. However, the lack of understanding of the mechanisms behind their effectiveness limits further improvements on their architectures. In this paper, we present a visual analytics method for comparing RNN models for NLP tasks. We propose a technique to explain the function of individual hidden state units based on their expected response to input texts. We then...

10.1109/vast.2017.8585721 article EN 2017-10-01

Multi-step Jailbreaking Privacy Attacks on ChatGPT

OPENALEX - Publications

Haoran Li Dadi Guo Wei Fan Mingshi Xu Jie Huang and 2 more

With the rapid progress of large language models (LLMs), many downstream NLP tasks can be well solved given appropriate prompts. Though model developers and researchers work hard on dialog safety to avoid generating harmful content from LLMs, it is still challenging to steer AI-generated content (AIGC) for the human good. As powerful LLMs are devouring existing text data from various domains (e.g., GPT-3 is trained on 45TB of texts), it is natural to doubt whether private information is included in the training data and what privacy threats...

10.18653/v1/2023.findings-emnlp.272 article EN cc-by 2023-01-01

Short text conceptualization using a probabilistic knowledgebase

OPENALEX - Publications

Yangqiu Song Haixun Wang Zhongyuan Wang Hongsong Li Weizhu Chen

Most text mining tasks, including clustering and topic detection, are based on statistical methods that treat text as bags of words. Semantics in the text is largely ignored in the mining process, and mining results often have low interpretability. One particular challenge faced by such approaches lies in short text understanding, as short texts lack enough content from which statistical conclusions can be drawn easily. In this paper, we improve text understanding by using a probabilistic knowledgebase that is as rich as our mental world in terms of the concepts (of worldly facts) it...

10.5591/978-1-57735-516-8/ijcai11-388 article EN International Joint Conference on Artificial Intelligence 2011-07-16

TIARA

OPENALEX - Publications

Furu Wei Shi‐Xia Liu Yangqiu Song Shimei Pan Michelle X. Zhou and 4 more

In this paper, we present a novel exploratory visual analytic system called TIARA (Text Insight via Automated Responsive Analytics), which combines text analytics and interactive visualization to help users explore and analyze large collections of text. Given a collection of documents, TIARA first uses topic analysis techniques to summarize the documents into a set of topics, each of which is represented by a set of keywords. In addition to extracting topics, TIARA derives time-sensitive keywords to depict the content evolution of each topic over time. To understand...

10.1145/1835804.1835827 article EN 2010-07-25

Semi-supervised Multi-label Learning by Solving a Sylvester Equation

OPENALEX - Publications

Gang Chen Yangqiu Song Fei Wang Changshui Zhang

Proceedings of the 2008 SIAM International Conference on Data Mining (SDM), pp. 410-419. Multi-label learning refers to problems where an instance can be assigned to more than one category....

10.1137/1.9781611972788.37 article EN 2008-04-24

Transitive Transfer Learning

OPENALEX - Publications

Ben Tan Yangqiu Song Erheng Zhong Qiang Yang

Transfer learning, which leverages knowledge from source domains to enhance learning ability in a target domain, has been proven effective in various applications. One major limitation of transfer learning is that the source and target domains should be directly related. If there is little overlap between the two domains, performing knowledge transfer between these domains will not be effective. Inspired by human transitive inference and learning ability, whereby two seemingly unrelated concepts can be connected by a string of intermediate bridges using auxiliary concepts, in this paper we study...

10.1145/2783258.2783295 article EN 2015-08-07

A unified framework for semi-supervised dimensionality reduction

OPENALEX - Publications

Yangqiu Song Feiping Nie Changshui Zhang Shiming Xiang

10.1016/j.patcog.2008.01.001 article EN Pattern Recognition 2008-01-15

Automatic taxonomy construction from keywords

OPENALEX - Publications

Xueqing Liu Yangqiu Song Shi‐Xia Liu Haixun Wang

Taxonomies, especially the ones in specific domains, are becoming indispensable to a growing number of applications. State-of-the-art approaches assume there exists a text corpus to accurately characterize the domain of interest, and that a taxonomy can be derived from the text corpus using information extraction techniques. In reality, neither assumption is valid, especially for highly focused or fast-changing domains. In this paper, we study a challenging problem: deriving a taxonomy from a set of keyword phrases. A solution can benefit many real life...

10.1145/2339530.2339754 article EN 2012-08-12

TIARA

OPENALEX - Publications

Shi‐Xia Liu Michelle X. Zhou Shimei Pan Yangqiu Song Weihong Qian and 2 more

We are building an interactive visual text analysis tool that aids users in analyzing large collections of text. Unlike existing work in visual text analytics, which focuses either on developing sophisticated text analytic techniques or inventing novel text visualization metaphors, ours tightly integrates state-of-the-art text analytics with interactive visualization to maximize the value of both. In this article, we present our work from two aspects. We first introduce an enhanced, LDA-based topic analysis technique that automatically derives a set of topics to summarize...

10.1145/2089094.2089101 article EN ACM Transactions on Intelligent Systems and Technology 2012-02-01

Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision

OPENALEX - Publications

Hongliang Dai Yangqiu Song

Lack of labeled training data is a major bottleneck for neural network based aspect and opinion term extraction on product reviews. To alleviate this problem, we first propose an algorithm to automatically mine extraction rules from existing training examples based on dependency parsing results. The mined rules are then applied to label a large amount of auxiliary data. Finally, we study training procedures to train a neural model which can learn from both the data automatically labeled by the rules and a small amount of data accurately annotated by humans. Experimental results show that although the mined rules themselves do not perform...

10.18653/v1/p19-1520 preprint EN cc-by 2019-01-01

KBQA

OPENALEX - Publications

Wanyun Cui Yanghua Xiao Haixun Wang Yangqiu Song Seung-won Hwang and 1 more

Question answering (QA) has become a popular way for humans to access billion-scale knowledge bases. Unlike web search, QA over a knowledge base gives out accurate and concise results, provided that natural language questions can be understood and mapped precisely to structured queries over the knowledge base. The challenge, however, is that a human can ask one question in many different ways. Previous approaches have natural limits due to their representations: rule based approaches only understand a small set of "canned" questions, while keyword based or synonym...

10.14778/3055540.3055549 article EN Proceedings of the VLDB Endowment 2017-01-01

Event Detection and Co-reference with Minimal Supervision

OPENALEX - Publications

Haoruo Peng Yangqiu Song Dan Roth

10.18653/v1/d16-1038 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

On Dataless Hierarchical Text Classification

OPENALEX - Publications

Yangqiu Song Dan Roth

In this paper, we systematically study the problem of dataless hierarchical text classification. Unlike standard text classification schemes that rely on supervised training, dataless classification depends on understanding the labels of the sought after categories and requires no labeled data. Given a collection of text documents and a set of labels, we show that the labels can be used to accurately categorize the documents. This is done by embedding both labels and documents in a semantic space that allows one to compute meaningful semantic similarity between a document and a potential label. We show that this scheme can be used to support...

10.1609/aaai.v28i1.8938 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2014-06-21
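
The core idea in the abstract above, embedding labels and documents in one space and picking the most similar label, can be illustrated with a toy sketch. The word-count "embedding" below is only a stand-in assumed for this example, not the semantic representation used in the paper, and the names `embed`, `cosine`, `classify`, and the label descriptions are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(document, label_descriptions):
    """Return the label whose description is most similar to the document.

    No labeled training documents are used; only the label descriptions.
    """
    doc_vec = embed(document)
    return max(
        label_descriptions,
        key=lambda label: cosine(doc_vec, embed(label_descriptions[label])),
    )


labels = {
    "sports": "football basketball game team",
    "finance": "stock market bank money",
}
predicted = classify("the team won the football game", labels)
```

Swapping `embed` for a richer semantic representation would leave `classify` unchanged, which is the appeal of the label-similarity formulation.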

ASER: A Large-scale Eventuality Knowledge Graph

OPENALEX - Publications

Hongming Zhang Xin Liu Haojie Pan Yangqiu Song Cane Wing-ki Leung

Understanding human’s language requires complex world knowledge. However, existing large-scale knowledge graphs mainly focus on knowledge about entities while ignoring knowledge about activities, states, or events, which are used to describe how entities or things act in the real world. To fill this gap, we develop ASER (activities, states, events, and their relations), a large-scale eventuality knowledge graph extracted from more than 11-billion-token unstructured textual data. ASER contains 15 relation types belonging to five categories, 194-million unique eventualities,...

10.1145/3366423.3380107 article EN 2020-04-20

HeteSpaceyWalk

OPENALEX - Publications

Yu He Yangqiu Song Jianxin Li Cheng Ji Jian Peng and 1 more

Heterogeneous information network (HIN) embedding has gained increasing interest recently. However, current random-walk based HIN embedding methods have paid little attention to the higher-order Markov chain nature of meta-path guided random walks, especially the stationarity issue. In this paper, we systematically formalize the meta-path guided random walk as a higher-order Markov chain process, and present a heterogeneous personalized spacey random walk to efficiently and effectively attain the expected stationary distribution among nodes. Then we propose a generalized scalable...

10.1145/3357384.3358061 article EN 2019-11-03
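
For readers unfamiliar with the meta-path guided random walks that the abstract above builds on, a plain version can be sketched as below. This shows only the basic guided walk, not the spacey random walk proposed in the paper; the function name `meta_path_walk`, the adjacency-dict graph format, and the toy author/paper graph are assumptions for illustration.

```python
import random


def meta_path_walk(neighbors, node_type, start, meta_path, length, rng=None):
    """Random walk whose node types follow a cyclic meta-path (sketch).

    neighbors: dict mapping each node to a list of adjacent nodes.
    node_type: dict mapping each node to its type label.
    meta_path: list of types whose first and last entries match,
               for example ["A", "P", "A"] for author-paper-author.
    """
    rng = rng or random.Random()
    if node_type[start] != meta_path[0]:
        raise ValueError("start node type must match the first meta-path type")
    walk = [start]
    # The last type repeats the first, so the pattern has this period.
    period = len(meta_path) - 1
    while len(walk) < length:
        next_type = meta_path[len(walk) % period]
        candidates = [v for v in neighbors[walk[-1]] if node_type[v] == next_type]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


# Toy graph (hypothetical): two authors who share one paper.
neighbors = {"a1": ["p1"], "a2": ["p1"], "p1": ["a1", "a2"]}
node_type = {"a1": "A", "a2": "A", "p1": "P"}
walk = meta_path_walk(neighbors, node_type, "a1", ["A", "P", "A"], 5, random.Random(0))
```

Every walk on this toy graph alternates author and paper nodes; which author is chosen at each step depends on the random generator.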

Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components

OPENALEX - Publications

Jingxin Yu Xun Jian Xin Hao Yangqiu Song

Word embeddings have attracted much attention recently. Different from alphabetic writing systems, Chinese characters are often composed of subcharacter components which are also semantically informative. In this work, we propose an approach to jointly embed Chinese words as well as their characters and fine-grained subcharacter components. We use three likelihoods to evaluate whether the context words, characters, and components can predict the current target word, and collected 13,253 subcharacter components to demonstrate that the existing approaches of decomposing Chinese characters are not enough. Evaluation on...

10.18653/v1/d17-1027 article EN cc-by Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing 2017-01-01