Qi He

ORCID: 0000-0001-5257-6843
Research Areas
  • Web Data Mining and Analysis
  • Complex Network Analysis Techniques
  • Topic Modeling
  • Recommender Systems and Techniques
  • Advanced Text Analysis Techniques
  • Semantic Web and Ontologies
  • Text and Document Classification Technologies
  • Advanced Graph Neural Networks
  • Peer-to-Peer Network Technologies
  • Advanced Database Systems and Queries
  • Data Quality and Management
  • Information Retrieval and Search Behavior
  • Multimodal Machine Learning Applications
  • Opinion Dynamics and Social Influence
  • Cryptography and Data Security
  • Data Stream Mining Techniques
  • Distributed systems and fault tolerance
  • Expert finding and Q&A systems
  • Human Mobility and Location-Based Analysis
  • Caching and Content Delivery
  • Biomedical Text Mining and Ontologies
  • Wikis in Education and Collaboration
  • Fault Detection and Control Systems
  • Sentiment Analysis and Opinion Mining
  • Spam and Phishing Detection

Bengbu Medical College
2025

Amazon (United States)
2024

Search
2024

LinkedIn (United States)
2013-2023

Beijing University of Technology
2021-2022

Institute of Electronics
2021-2022

Jimei University
2021

District of Columbia Water and Sewer Authority
2018

Qilu Normal University
2018

East China Normal University
2018

This paper focuses on the problem of identifying influential users of micro-blogging services. Twitter, one of the most notable micro-blogging services, employs a social-networking model called "following", in which each user can choose who she wants to "follow" to receive tweets from without requiring the latter to give permission first. In a dataset prepared for this study, it is observed that (1) 72.4% of the users in Twitter follow more than 80% of their followers, and (2) 80.5% of the users have 80% of the users they are following follow them back. Our study reveals that the presence...

10.1145/1718487.1718520 article EN 2010-02-04

Query suggestion plays an important role in improving the usability of search engines. Although some recently proposed methods can make meaningful query suggestions by mining query patterns from search logs, none of them are context-aware - they do not take into account the immediately preceding queries as context in query suggestion. In this paper, we propose a novel context-aware query suggestion approach which is in two steps. In the offline model-learning step, to address data sparseness, queries are summarized into concepts by clustering a click-through bipartite. Then, session...

10.1145/1401890.1401995 article EN 2008-08-24

When you write papers, how many times do you want to make some citations at a place but are not sure which papers to cite? Do you wish to have a recommendation system which can recommend a small number of good candidates for every place that you want to make some citations? In this paper, we present our initiative of building a context-aware citation recommendation system. High quality citation recommendation is challenging: not only should the citations recommended be relevant to the paper under composition, but they should also match the local contexts of the places where citations are made. Moreover, it is far from trivial to model how the topic of the whole paper and the contexts of the citation places should affect...

10.1145/1772690.1772734 article EN 2010-04-26

Newly emerged event-based online social services, such as Meetup and Plancast, have experienced increased popularity and rapid growth. From these services, we observed a new type of social network - the event-based social network (EBSN). An EBSN does not only contain online social interactions as in other conventional online social networks, but also includes valuable offline social interactions captured in offline activities. By analyzing real data collected from Meetup, we investigated EBSN properties and discovered many unique and interesting characteristics, such as heavy-tailed degree distributions and strong locality of social interactions.

10.1145/2339530.2339693 article EN 2012-08-12

Many private and/or public organizations have been reported to create and monitor targeted Twitter streams to collect and understand users' opinions about the organizations. A targeted Twitter stream is usually constructed by filtering tweets with user-defined selection criteria, e.g. tweets published by users from a selected region, or tweets that match one or more predefined keywords. The targeted Twitter stream is then monitored to collect and understand users' opinions about the organizations. There is an emerging need for early crisis detection and response with such a target stream. Such applications require a good named entity...

10.1145/2348283.2348380 article EN Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval 2012-08-12

We consider the problem of analyzing word trajectories in both time and frequency domains, with the specific goal of identifying important and less-reported, periodic and aperiodic words. A set of words with identical trends can be grouped together to reconstruct an event in a completely un-supervised manner. The document frequency of each word across time is treated like a time series, where each element is the document frequency - inverse document frequency (DFIDF) score at one time point. In this paper, we 1) first applied spectral analysis to categorize features for different event characteristics:...

10.1145/1277741.1277779 article EN 2007-07-23

Understanding how topics in scientific literature evolve is an interesting and important problem. Previous work simply models each paper as a bag of words and also considers the impact of authors. However, the impact of one document on another as captured by citations, one important inherent element in scientific literature, has not been considered. In this paper, we address the problem of understanding topic evolution by leveraging citations, and develop citation-aware approaches. We propose an iterative topic evolution learning framework by adapting the Latent Dirichlet Allocation model to...

10.1145/1645953.1646076 article EN 2009-11-02

Web query recommendation has long been considered a key feature of search engines. Building a good Web query recommendation system, however, is very difficult due to the fundamental challenge of predicting users' search intent, especially given the limited user context information. In this paper, we propose a novel "sequential query prediction" approach that tries to grasp a user's search intent based on his/her past query sequence and its resemblance to historical query sequence models mined from massive search engine logs. Different query sequence models were examined, including the naive variable length...

10.1109/icde.2009.71 article EN Proceedings - International Conference on Data Engineering 2009-03-01

Automatic recommendation of citations for a manuscript is highly valuable for scholarly activities since it can substantially improve the efficiency and quality of literature search. The prior techniques placed a considerable burden on users, who were required to provide a representative bibliography or to mark passages where citations are needed. In this paper we present a system that considerably reduces this burden: a user simply inputs a query manuscript (without a bibliography) and our system automatically finds locations where citations are needed. We show that naïve...

10.1145/1935826.1935926 article EN 2011-02-01

Point-of-Interest (POI) recommendation is a new type of recommendation task that comes along with the prevalence of location-based social networks in recent years. Compared with traditional tasks, it focuses more on personalized, context-aware recommendation results to provide a better user experience. To address this new challenge, we propose a Collaborative Filtering method based on Non-negative Tensor Factorization, a generalization of the Matrix Factorization approach that exploits a high-order tensor instead of the traditional User-Location matrix to model...

10.1145/2766462.2767794 article EN Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval 2015-08-04

Text representation plays a crucial role in classical text mining, where the primary focus was on static text. Nevertheless, well-studied static text representations including TFIDF are not optimized for non-stationary streams of information such as news, discussion board messages, and blogs. We therefore introduce a new temporal representation for text streams based on bursty features. Our bursty text representation differs significantly from traditional schemes in that it 1) dynamically represents documents over time, 2) amplifies a feature in proportion to its...

10.1137/1.9781611972771.50 article EN 2007-04-26

Twitter has attracted millions of users to share and disseminate the most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted...

10.1109/tkde.2014.2327042 article EN IEEE Transactions on Knowledge and Data Engineering 2014-05-30

Social networks mediate not only the relations between entities, but also the patterns of information propagation among them and their communication behavior. In this paper, we extensively study the temporal annotations (e.g., time stamps and duration) of historical communications in social networks and propose two novel tools -- communication motifs and maximum-flow communication motifs -- for characterizing the patterns of information propagation in social networks. Using these motifs, we verify the following hypotheses in social network communication: 1) the functional behavioral patterns of information propagation within both social networks are stable over time; 2) synchronous...

10.1145/1871437.1871694 article EN 2010-10-26

Topic detection (TD) is a fundamental research issue in the Topic Detection and Tracking (TDT) community with practical implications; TD helps analysts to separate the wheat from the chaff among the thousands of incoming news streams. In this paper, we propose a simple and effective topic detection model called the temporal Discriminative Probabilistic Model (DPM), which is shown to be theoretically equivalent to the classic vector space model with feature selection and temporally discriminative weights. We compare DPM to its various probabilistic cousins,...

10.1109/tpami.2009.203 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2010-01-12

With increased globalization and labor mobility, human resource reallocation across firms, industries and regions has become the new norm in labor markets. The emergence of massive digital traces of such mobility offers a unique opportunity to understand labor mobility at an unprecedented scale and granularity. While most studies on labor mobility have largely focused on characterizing macro-level (e.g., region or company) or micro-level (e.g., employee) patterns, the problem of how to accurately predict an employee's next career move (which company with what job...

10.1145/3041021.3054200 article EN 2017-01-01

This study thoroughly assesses the factors affecting posttraumatic stress disorder (PTSD) in patients with breast cancer from a multidimensional perspective according to the theory of unpleasant symptoms (TOUS). Additionally, it develops a nomogram prediction model tailored for this group. A cross-sectional analysis involving seven major hospitals in northern Anhui Province was performed to collect data from 1135 breast cancer survivors. The self-reported Posttraumatic Stress Disorder Checklist-Civilian Version...

10.1038/s41598-025-92137-y article EN cc-by-nc-nd Scientific Reports 2025-03-17

Graphs have been a ubiquitous and essential data representation to model real world objects and their relationships. Today, large amounts of graph data have been generated by various applications. Graph summarization techniques are crucial in uncovering useful insights about the patterns hidden in the underlying data. However, all existing works in graph summarization are single-process solutions, and as a result cannot scale to large graphs. In this paper, we introduce three distributed graph summarization algorithms to address this problem. Experimental results show that the proposed algorithms can...

10.1145/2661829.2661862 article EN 2014-11-03

LinkedIn dynamically delivers update activities from a user's interpersonal network to more than 300 million members in the personalized feed that ranks activities according to their "relevance" to the user. This paper discloses the implementation details behind this personalized feed system at LinkedIn which cannot be found in related work, and addresses the scalability and data sparsity challenges for deploying the system online. More specifically, we focus on the personalization models by generating three kinds of affinity scores: Viewer-ActivityType Affinity,...

10.1145/2783258.2788614 article EN 2015-08-07

Specialists who analyze online news have a hard time separating the wheat from the chaff. Moreover, automatic data-mining techniques like clustering of news streams into topical groups can fully recover the underlying true class labels of data if and only if all classes are well separated. In reality, especially for news streams, this is clearly not the case. The question to ask is thus this: if we cannot recover the full C classes by clustering, what is the largest K < C clusters we can find that best resemble K classes? Using the intuition that bursty topics are more...

10.1109/icdm.2007.17 article EN 2007-10-01

Users on an online social network site generate a large number of heterogeneous activities, ranging from connecting with other users, to sharing content, to updating their profiles. The set of activities within a user's social neighborhood forms a stream of updates for the user's consumption. In this paper, we report our experience with the problem of ranking activities in the LinkedIn homepage feed. In particular, we provide a taxonomy of social network activities and describe a system architecture (with key components open-sourced) that supports fast iteration in model development,...

10.1145/2623330.2623362 article EN 2014-08-22

In this paper, we introduce "task trail" to understand user search behaviors. We define a task to be an atomic user information need, whereas a task trail represents all user activities within that particular task, such as query reformulations and URL clicks. Previously, web search logs have been studied mainly at the session or query level, where users may submit several queries within one task and handle several tasks within one session. Although previous studies have addressed the problem of task identification, little is known about the advantage of using task over session or query for...

10.1109/tkde.2014.2316794 article EN IEEE Transactions on Knowledge and Data Engineering 2014-04-11

Online social networks such as Facebook and LinkedIn have been an integrated part of everyday life. To improve the user experience and power the products around the social network, Knowledge Graphs (KG) are used as a standard way to extract and organize the knowledge in social networks. This tutorial focuses on how to build KGs for social networks by developing deep NLP models and holistically optimizing the KGs and the social network. Building a KG for social networks poses two challenges: 1) the input data for each member in the social network is noisy, implicit and multilingual, so a deep understanding of the input data is needed; 2) influence...

10.1145/3366424.3383112 article EN Companion Proceedings of the Web Conference 2020 2020-04-20

At LinkedIn, we want to create economic opportunity for everyone in the global workforce. To make this happen, LinkedIn offers a reactive Job Search system, and a proactive Jobs You May Be Interested In (JYMBII) system, to match the best candidates with their dream jobs. One of the most challenging tasks in developing these systems is to properly extract important skill entities from job postings and then target members with matched attributes. In this work, we show that the commonly used text-based salience and market-agnostic skill extraction...

10.1145/3394486.3403338 article EN 2020-08-20

Twitter has attracted hundreds of millions of users to share and disseminate the most up-to-date information. However, the noisy and short nature of tweets makes many applications in information retrieval (IR) and natural language processing (NLP) challenging. Recently, segment-based tweet representation has demonstrated effectiveness in named entity recognition (NER) and event detection from tweet streams. To split tweets into meaningful phrases or segments, the previous work is purely based on external knowledge bases, which ignores...

10.1145/2484028.2484044 article EN Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 2013-07-28