Qiaozhu Mei

ORCID: 0000-0002-8640-1942
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Complex Network Analysis Techniques
  • Biomedical Text Mining and Ontologies
  • Advanced Text Analysis Techniques
  • Advanced Graph Neural Networks
  • Web Data Mining and Analysis
  • Sentiment Analysis and Opinion Mining
  • Information Retrieval and Search Behavior
  • Green IT and Sustainability
  • Semantic Web and Ontologies
  • Misinformation and Its Impacts
  • Opinion Dynamics and Social Influence
  • Digital Communication and Language
  • Data Mining Algorithms and Applications
  • Adversarial Robustness in Machine Learning
  • Microfinance and Financial Inclusion
  • Caching and Content Delivery
  • Multimodal Machine Learning Applications
  • FinTech, Crowdfunding, Digital Finance
  • Digital Marketing and Social Media
  • Machine Learning and Algorithms
  • Recommender Systems and Techniques
  • Data Quality and Management
  • Machine Learning in Materials Science

University of Michigan–Ann Arbor
2015-2024

Google (United States)
2024

Michigan United
2022-2023

State Street (United States)
2014-2017

University of Illinois Urbana-Champaign
2005-2009

Vanderbilt University
2004

Peking University
2003

This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real-world information networks, which usually contain millions of nodes. In this paper, we propose a novel network embedding method called "LINE," which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that...

10.1145/2736277.2741093 preprint EN 2015-05-18

In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent topical facets in a Weblog collection, the subtopics in the results of an ad hoc query, and their associated sentiments. It could also provide general sentiment models that are applicable to any topics. With a specifically designed HMM structure, the sentiment models and topic models estimated with TSM can be utilized to extract...

10.1145/1242572.1242596 article EN 2007-05-08

Many previous techniques identify trending topics in social media, even topics that are not pre-defined. We present a technique to identify trending rumors, which we define as topics that include disputed factual claims. Putting aside any attempt to assess whether the rumors are true or false, it is valuable to identify trending rumors as early as possible. It is extremely difficult to accurately classify whether every individual post is or is not making a disputed factual claim. We are able to identify trending rumors by recasting the problem as finding entire clusters of posts whose topic is a disputed factual claim.

10.1145/2736277.2741637 article EN 2015-05-18

Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text information collected over time. Since most text information bears some time stamps, TTM has many applications in multiple domains, such as summarizing events in news articles and revealing research trends in scientific literature. In this paper, we study a particular TTM task: discovering and summarizing the evolutionary patterns of themes in a text stream. We define this new text mining problem and present general probabilistic methods for solving this problem through (1) discovering latent themes from text; (2) constructing...

10.1145/1081870.1081895 article EN 2005-08-21

Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, compared to sophisticated deep learning architectures such as convolutional neural networks, these methods usually yield inferior results when applied to particular machine learning tasks. One possible reason is that these text embedding methods learn the representation of text in a fully unsupervised way, without leveraging the labeled information available for the task....

10.1145/2783258.2783307 preprint EN 2015-08-07

This paper describes the University of Michigan's nine-year experience in developing and using a full-text search engine designed to facilitate information retrieval (IR) from narrative documents stored in electronic health records (EHRs). The system, called the Electronic Medical Record Search Engine (EMERSE), functions similarly to Google but is equipped with special functionalities for handling challenges unique to retrieving information from medical text. Key features that distinguish EMERSE from general-purpose search engines are...

10.1016/j.jbi.2015.05.003 article EN cc-by Journal of Biomedical Informatics 2015-05-13

In this paper, we formally define the problem of topic modeling with network structure (TMN). We propose a novel solution to this problem, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data. The proposed method bridges topic modeling and social network analysis, and leverages the power of both statistical topic models and discrete regularization. The output of this model well summarizes topics in text, maps a topic on the network, and discovers topical communities. With concrete selection of a topic model and a graph-based regularizer, our model can be applied to text mining problems such as...

10.1145/1367497.1367512 article EN 2008-04-21

Multinomial distributions over words are frequently used to model topics in text collections. A common, major challenge in applying all such topic models to any text mining problem is to label a multinomial topic model accurately so that a user can interpret the discovered topic. So far, such labels have been generated manually in a subjective way. In this paper, we propose probabilistic approaches to automatically labeling multinomial topic models in an objective way. We cast this labeling problem as an optimization problem involving minimizing Kullback-Leibler divergence between word distributions and...

10.1145/1281192.1281246 article EN 2007-08-12

Mining subtopics from weblogs and analyzing their spatiotemporal patterns have applications in multiple domains. In this paper, we define the novel problem of mining spatiotemporal theme patterns from weblogs and propose a novel probabilistic approach to model the subtopic themes and spatiotemporal theme patterns simultaneously. The proposed model discovers spatiotemporal theme patterns by (1) extracting common themes from weblogs; (2) generating theme life cycles for each given location; and (3) generating theme snapshots for each given time period. Evolution of patterns can be discovered by comparative analysis of theme life cycles and theme snapshots. Experiments on three different data sets show that...

10.1145/1135777.1135857 article EN 2006-05-23

Generating alternative queries, also known as query suggestion, has long been proved useful to help a user explore and express his information need. In many scenarios, such suggestions can be generated from a large-scale graph of queries and other accessory information, such as the clickthrough. However, how to generate suggestions while ensuring their semantic consistency with the original query remains a challenging problem.

10.1145/1458082.1458145 article EN 2008-10-26

Information cascades, effectively facilitated by most social network platforms, are recognized as a major factor in almost every social success and disaster in these networks. Can cascades be predicted? While many believe that they are inherently unpredictable, recent work has shown that some key properties of information cascades, such as size, growth, and shape, can be predicted by a machine learning algorithm that combines many features. These predictors all depend on a bag of hand-crafted features to represent the cascade network and the global network structures....

10.1145/3038912.3052643 article EN 2017-04-03

Researchers and social observers have both believed that hashtags, as a new type of organizational objects of information, play a dual role in online microblogging communities (e.g., Twitter). On one hand, a hashtag serves as a bookmark of content, which links tweets with similar topics; on the other hand, a hashtag serves as the symbol of a community membership, which bridges a virtual community of users. Are real users aware of this dual role of hashtags? Is the dual role affecting their behavior of adopting a hashtag? Is hashtag adoption predictable? We take the initiative to investigate and quantify the effects...

10.1145/2187836.2187872 article EN 2012-04-16

We study the problem of visualizing large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space. Much success has been reported recently by techniques that first compute a similarity structure of the data points and then project them into a low-dimensional space with the structure preserved. These two steps suffer from considerable computational costs, preventing state-of-the-art methods such as t-SNE from scaling to large-scale and high-dimensional data (e.g., millions of data points and hundreds of dimensions). We propose LargeVis, a technique that first constructs an accurately approximated...

10.1145/2872427.2883041 preprint EN 2016-04-11

Emojis have been widely used to simplify emotional expression and enrich user experience. As an interesting practice of ubiquitous computing, emojis are adopted by Internet users from many different countries, on many devices (particularly popular on smartphones), and in many applications. The "ubiquitous" usage of emojis enables us to study and compare user behaviors and preferences across countries and cultures. We present an analysis of how smartphone users use emojis based on a very large data set collected from a popular emoji keyboard. The data set contains a complete month of emoji usage of 3.88...

10.1145/2971648.2971724 article EN 2016-09-09

We administer a Turing test to AI chatbots. We examine how chatbots behave in a suite of classic behavioral games that are designed to elicit characteristics such as trust, fairness, risk-aversion, cooperation, etc., as well as how they respond to a traditional Big-5 psychological survey that measures personality traits. ChatGPT-4 exhibits behavioral and personality traits that are statistically indistinguishable from a random human from tens of thousands of human subjects from more than 50 countries. Chatbots also modify their behavior based on previous experience and contexts...

10.1073/pnas.2313925121 article EN cc-by-nc-nd Proceedings of the National Academy of Sciences 2024-02-22

Information networks are widely used to characterize the relationships between data items such as text documents. Many important retrieval and mining tasks rely on ranking the data items based on their centrality or prestige in the network. Beyond prestige, diversity has been recognized as a crucial objective in ranking, aiming at providing a non-redundant and high-coverage piece of information in the top-ranked results. Nevertheless, existing network-based ranking approaches either disregard the concern of diversity, or handle it with non-optimized...

10.1145/1835804.1835931 article EN 2010-07-25

Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing (PLSI), which maximizes the joint probability of documents and terms in the corpus. The major disadvantage of PLSI is that it estimates the probability distribution of each document on the hidden topics independently, and the number of parameters in the model grows linearly with the size of the corpus, which leads to serious problems with overfitting. Latent Dirichlet Allocation (LDA) is proposed to overcome this problem by treating the probability distribution of each document over topics as a hidden random variable. Both of these...

10.1145/1458082.1458202 article EN 2008-10-26

User-generated information in online communities has been characterized by the mixture of a text stream and a network structure, both changing over time. A good example is a web-blogging community with the daily blog posts and a social network of bloggers.

10.1145/1835804.1835922 article EN 2010-07-25

Social tagging is becoming increasingly popular in many Web 2.0 applications where users can annotate resources (e.g., Web pages) with arbitrary keywords (i.e., tags). A tag recommendation module can assist users in the tagging process by suggesting relevant tags to them. It can also be directly used to expand the set of tags annotating a resource. The benefits are twofold: improving user experience and enriching the index of resources. However, the former one is not emphasized in previous studies, though a lot of work has reported that different users may...

10.1145/1571941.1572034 article EN 2009-07-19

We present the problem of click-through prediction for advertising in the Twitter timeline, which displays a stream of Tweets from accounts a user chooses to follow. Traditional computational advertising usually appears in two forms: sponsored search that places ads onto the search result page when a query is issued to a search engine, and contextual advertising that places ads onto a regular, static Web page. Compared with these two paradigms, placing ads into a Tweet stream is particularly challenging given the nature of the data stream: the context into which an ad can be placed updates dynamically and never replicates....

10.1145/2783258.2788582 article EN 2015-08-07

Topic modeling has been proved to be an effective method for exploratory text mining. It is a common assumption of most topic models that a document is generated from a mixture of topics. In real-world scenarios, individual documents usually concentrate on several salient topics instead of covering a wide variety of topics. A real topic also adopts a narrow range of terms instead of a wide coverage of the vocabulary. Understanding this sparsity of information is especially important for analyzing user-generated Web content and social media, which are...

10.1145/2566486.2567980 article EN 2014-04-07