NFDI4DS | UHH-SEMS - Publication Details

Jie Tang

ORCID: 0000-0003-3487-4593

About

Contact & Profiles

OpenAlex ID: A5044791875

Research Areas

Topic Modeling
Complex Network Analysis Techniques
Advanced Graph Neural Networks
Natural Language Processing Techniques
Recommender Systems and Techniques
Opinion Dynamics and Social Influence
Semantic Web and Ontologies
Web Data Mining and Analysis
Expert finding and Q&A systems
Advanced Text Analysis Techniques
Data Quality and Management
Text and Document Classification Technologies
Multimodal Machine Learning Applications
X-ray Diffraction in Crystallography
Crystallization and Solubility Studies
Domain Adaptation and Few-Shot Learning
Service-Oriented Architecture and Web Services
Mobile Crowdsensing and Crowdsourcing
Biomedical Text Mining and Ontologies
Human Mobility and Location-Based Analysis
Data Mining Algorithms and Applications
Spam and Phishing Detection
Caching and Content Delivery
Online Learning and Analytics
Sentiment Analysis and Opinion Mining

Affiliations

Tsinghua University (2016-2025)
Sichuan University of Science and Engineering (2025)
Hubei University of Technology (2024)
Xuzhou Construction Machinery Group (China) (2024)
Nanjing Tech University (2024)
Hunan University of Traditional Chinese Medicine (2018-2024)
Southwest Forestry University (2024)
Central Conservatory of Music (2024)
University of Science and Technology of China (2024)
Institute of Psychology, Chinese Academy of Sciences (2021-2023)

ArnetMiner

OPENALEX - Publications

Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and 1 more

This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose...

10.1145/1401890.1402008 article EN 2008-08-24

Evaluating Large Language Models Trained on Code

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, and 53 more

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult...

10.48550/arxiv.2107.03374 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Dota 2 with Large Scale Deep Reinforcement Learning

Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemyslaw Debiak, and 20 more

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools...

10.48550/arxiv.1912.06680 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Self-supervised Learning: Generative or Contrastive

Xiao Liu, Fanjin Zhang, Zhenyu Hou, Mian Li, Zhaoyu Wang, and 2 more

Deep supervised learning has achieved great success in the last decade. However, its deficiencies of dependence on manual labels and vulnerability to attacks have driven people to explore a better solution. As an alternative, self-supervised learning attracts many researchers for its soaring performance on representation learning in the last several years. Self-supervised representation learning leverages input data itself as supervision and benefits almost all types of downstream tasks. In this survey, we take a look into new self-supervised learning methods for representation in computer vision, natural...

10.1109/tkde.2021.3090866 article EN IEEE Transactions on Knowledge and Data Engineering 2021-01-01

Pre-trained models: Past, present and future

Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, and 19 more

Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental...

10.1016/j.aiopen.2021.08.002 article EN cc-by-nc-nd AI Open 2021-01-01

GCC

Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, and 3 more

Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification. However, prior arts on graph representation learning focus on domain-specific problems and train a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data. Inspired by the recent advances in pre-training from natural language processing and computer vision, we design...

10.1145/3394486.3403168 preprint EN 2020-08-20

Network Embedding as Matrix Factorization

Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and 1 more

Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size...

10.1145/3159652.3159706 preprint EN 2018-02-02

DeepInf

Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, and 1 more

Social and information networking activities such as those on Facebook, Twitter, WeChat, and Weibo have become an indispensable part of our everyday life, where we can easily access friends' behaviors and are in turn influenced by them. Consequently, effective social influence prediction for each user is critical for a variety of applications such as online recommendation and advertising.

10.1145/3219819.3220077 preprint EN 2018-07-19

Inferring Social Status and Rich Club Effects in Enterprise Communication Networks

Yuxiao Dong, Jie Tang, Nitesh V. Chawla, Tiancheng Lou, Yang Yang, and 1 more

Social status, defined as the relative rank or position that an individual holds in a social hierarchy, is known to be among the most important motivating forces in social behaviors. In this paper, we consider the notion of status from the perspective of the title held by a person in an enterprise. We study the intersection of social status and social networks, and whether enterprise communication logs can help reveal how social interactions manifest themselves in networks. To that end, we use two datasets with three communication channels (voice call, short message, and email) to demonstrate...

10.1371/journal.pone.0119446 article EN cc-by PLoS ONE 2015-03-30

Representation Learning for Attributed Multiplex Heterogeneous Network

Yukuo Cen, Xu Zou, Jianwei Zhang, Hongxia Yang, Jingren Zhou, and 1 more

Network embedding (or graph embedding) has been widely used in many real-world applications. However, existing methods mainly focus on networks with single-typed nodes/edges and cannot scale well to handle large networks. Many real-world networks consist of billions of nodes and edges of multiple types, and each node is associated with different attributes. In this paper, we formalize the problem of embedding learning for the Attributed Multiplex Heterogeneous Network and propose a unified framework to address this problem. The framework supports both transductive and inductive...

10.1145/3292500.3330964 preprint EN 2019-07-25

User-level sentiment analysis incorporating social networks

Chenhao Tan, Lillian Lee, Jie Tang, Long Jiang, Ming Zhou, and 1 more

We show that information about social relationships can be used to improve user-level sentiment analysis. The main motivation behind our approach is that users that are somehow "connected" may be more likely to hold similar opinions; therefore, relationship information can complement what we can extract about a user's viewpoints from their utterances. Employing Twitter as a source for our experimental data, and working within a semi-supervised framework, we propose models induced either from the follower/followee network or from the network formed by users referring...

10.1145/2020408.2020614 preprint EN 2011-08-21

Parameter-efficient fine-tuning of large-scale pre-trained language models

Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, and 15 more

With the prevalence of pre-trained language models (PLMs) and the pre-training–fine-tuning paradigm, it has been continuously shown that larger models tend to yield better performance. However, as PLMs scale up, fine-tuning and storing all the parameters is prohibitively costly and eventually becomes practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, which optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting down...

10.1038/s42256-023-00626-4 article EN cc-by Nature Machine Intelligence 2023-03-02

Inferring social ties across heterogenous networks

Jie Tang, Tiancheng Lou, Jon Kleinberg

It is well known that different types of social ties have essentially different influence on people. However, users in online social networks rarely categorize their contacts into "family", "colleagues", or "classmates". While a bulk of research has focused on inferring particular types of relationships in a specific social network, few publications systematically study the generalization of the problem of inferring social ties over multiple heterogeneous networks. In this work, we develop a framework for classifying the type of social relationships by learning across heterogeneous networks. The framework incorporates social theories...

10.1145/2124295.2124382 article EN 2012-02-08

GraphMAE: Self-Supervised Masked Graph Autoencoders

Zhenyu Hou, Xiao Liu, Yukuo Cen, Yuxiao Dong, Hongxia Yang, and 2 more

Self-supervised learning (SSL) has been extensively explored in recent years. Particularly, generative SSL has seen emerging success in natural language processing and other fields, such as the wide adoption of BERT and GPT. Despite this, contrastive learning, which heavily relies on structural data augmentation and complicated training strategies, has been the dominant approach in graph SSL, while the progress of generative SSL on graphs, especially graph autoencoders (GAEs), has thus far not reached the potential promised in other fields. In this paper, we...

10.1145/3534678.3539321 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022-08-12

Cross-domain collaboration recommendation

Jie Tang, Sen Wu, Jimeng Sun, Hang Su

Interdisciplinary collaborations have generated a huge impact on society. However, it is often hard for researchers to establish such cross-domain collaborations. What are the patterns of cross-domain collaborations? How do those collaborations form? Can we predict this type...

10.1145/2339530.2339730 article EN 2012-08-12

COSNET

Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, Philip S. Yu

More often than not, people are active in more than one social network. Identifying users from multiple heterogeneous social networks and integrating the different networks is a fundamental issue in many applications. The existing methods tackle this problem by estimating pairwise similarity between users in two networks. However, those methods suffer from potential inconsistency of matchings...

10.1145/2783258.2783268 article EN 2015-08-07

CogView: Mastering Text-to-Image Generation via Transformers

Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, and 6 more

Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with a VQ-VAE tokenizer, to advance this problem. We also demonstrate the finetuning strategies for various downstream tasks, e.g. style learning, super-resolution, text-image ranking, and fashion design, and methods to stabilize pretraining, e.g. eliminating NaN losses. CogView achieves the state-of-the-art FID on...

10.48550/arxiv.2105.13290 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Understanding retweeting behaviors in social networks

Zi Yang, Jingyi Guo, Keke Cai, Jie Tang, Juanzi Li, and 2 more

Retweeting is an important action (behavior) on Twitter, indicating the behavior of users re-posting microblogs of their friends. While much work has been conducted on mining the textual content users generate or on analyzing social network structure, few publications systematically study the underlying mechanism of retweeting behaviors. In this paper, we perform an interesting analysis of the retweeting problem on Twitter. We have found that almost 25.5% of the tweets posted by users are actually retweeted from friends' blog spaces. Our investigation...

10.1145/1871437.1871691 article EN 2010-10-26

Mining topic-level influence in heterogeneous networks

Lu Liu, Jie Tang, Jiawei Han, Meng Jiang, Shiqiang Yang

Influence is a complex and subtle force that governs the dynamics of social networks as well as the behaviors of involved users. Understanding influence can benefit various applications such as viral marketing, recommendation, and information retrieval. However, most existing works on social influence analysis have focused on verifying the existence of influence. Few works systematically investigate how to mine the strength of direct and indirect influence between nodes in heterogeneous networks.

10.1145/1871437.1871467 article EN 2010-10-26

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, and 2 more

Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training. However, in the context of NLU, prior work reveals that prompt tuning does not perform well for normal-sized pretrained models. We also find that existing methods of prompt tuning cannot handle hard sequence labeling tasks, indicating a lack of universality. We present a novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU...

10.48550/arxiv.2110.07602 preprint EN cc-by arXiv (Cornell University) 2021-01-01

GPT understands, too

Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, and 2 more

Prompting a pretrained language model with natural language patterns has been proved effective for natural language understanding (NLU). However, our preliminary study reveals that manual discrete prompts often lead to unstable performance; e.g., changing a single word in the prompt might result in a substantial performance drop. We propose a novel method, P-Tuning, that employs trainable continuous prompt embeddings in concatenation with discrete prompts. Empirically, P-Tuning not only stabilizes training by minimizing the gap between various discrete prompts, but also improves...

10.1016/j.aiopen.2023.08.012 article EN cc-by AI Open 2023-08-26

A Unified Probabilistic Framework for Name Disambiguation in Digital Library

Jie Tang, A.C.M. Fong, Bo Wang, Jing Zhang

Despite years of research, the name ambiguity problem remains largely unresolved. Outstanding issues include how to capture all information for name disambiguation in a unified approach, and how to determine the number of people K in the disambiguation process. In this paper, we formalize the problem in a unified probabilistic framework, which incorporates both attributes and relationships. Specifically, we define a disambiguation objective function for the problem and propose a two-step parameter estimation algorithm. We also investigate a dynamic approach for estimating the number of people K. Experiments show that our...

10.1109/tkde.2011.13 article EN IEEE Transactions on Knowledge and Data Engineering 2011-01-07

Inferring user demographics and social strategies in mobile social networks

Yuxiao Dong, Yang Yang, Jie Tang, Yang Yang, Nitesh V. Chawla

Demographics are widely used in marketing to characterize different types of customers. However, in practice, demographic information such as age, gender, and location is usually unavailable due to privacy and other reasons. In this paper, we aim to harness the power of big data to automatically infer users' demographics based on their daily mobile communication patterns. Our study is based on a real-world large mobile network of more than 7,000,000 users and over 1,000,000,000 communication records (CALL and SMS). We discover several interesting social...

10.1145/2623330.2623703 article EN 2014-08-22
