NFDI4DS | UHH-SEMS - Publication Details

Community Detection in Attributed Graphs: An Embedding Approach

OPENALEX - Publications

Li Ye Chaofeng Sha Xin Huang Yanchun Zhang

Community detection is a fundamental and widely-studied problem that finds all densely-connected groups of nodes and well separates them from others in graphs. With the proliferation of rich information available for entities in real-world networks, it is useful to discover communities in attributed graphs where nodes tend to have attributes. However, most existing community detection methods directly utilize the original network topology, leading to poor results due to ignoring inherent community structures. In this paper, we propose a novel...

10.1609/aaai.v32i1.11274 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2018-04-25

DeepTraLog

OPENALEX - Publications

Chenxi Zhang Xin Peng Chaofeng Sha Ke Zhang Zhenqing Fu and 3 more

A microservice system in industry is usually a large-scale distributed system consisting of dozens to thousands of services running on different machines. An anomaly in the system can often be reflected in traces and logs, which record inter-service interactions and intra-service behaviors, respectively. Existing trace anomaly detection approaches treat a trace as a sequence of service invocations. They ignore the complex structure of a trace brought by its invocation hierarchy and parallel/asynchronous invocations. On the other hand, existing log anomaly detection approaches treat a log as a sequence of events and cannot handle logs that...

10.1145/3510003.3510180 article EN Proceedings of the 44th International Conference on Software Engineering 2022-05-21

MAF: A General Matching and Alignment Framework for Multimodal Named Entity Recognition

OPENALEX - Publications

Bo Xu Shizhou Huang Chaofeng Sha Hongya Wang

In this paper, we study multimodal named entity recognition (MNER) in social media posts. Existing works mainly focus on using a cross-modal attention mechanism to combine text representation with image representation. However, they still suffer from two weaknesses: (1) the current methods are based on a strong assumption that each text and its accompanying image are matched, and that the image can be used to help identify named entities in the text. This assumption is not always true in real scenarios, and it may reduce the effect of the MNER model; (2) the current methods fail to construct a consistent...

10.1145/3488560.3498475 article EN Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining 2022-02-11

Evaluating Large Language Models in Class-Level Code Generation

OPENALEX - Publications

Xueying Du Mingwei Liu Kaixin Wang Hanlin Wang Junwei Liu and 5 more

Recently, many large language models (LLMs) have been proposed, showing advanced proficiency in code generation. Meanwhile, many efforts have been dedicated to evaluating LLMs on code generation benchmarks such as HumanEval. Although being very helpful for comparing different LLMs, existing evaluation focuses on a simple code generation scenario (i.e., function-level or statement-level code generation), which mainly asks LLMs to generate one single code unit (e.g., a function or a statement) for the given natural language description. Such evaluation focuses on generating independent and...

10.1145/3597503.3639219 article EN 2024-04-12

Dynamically maintaining frequent items over a data stream

OPENALEX - Publications

Cheqing Jin Weining Qian Chaofeng Sha Jeffrey Xu Yu Aoying Zhou

It is challenging to maintain frequent items over a data stream, with a small bounded memory, in a dynamic environment where both insertion and deletion of items are allowed. In this paper, we propose a new algorithm, called hCount, which can handle both insertion and deletion of items with much less memory space than the best reported algorithm. Our algorithm is also superior in terms of precision, recall and processing time. In addition, our approach does not require prior knowledge of the size of the range for a data stream and can handle range extension dynamically. Given a little...

10.1145/956863.956918 article EN 2003-11-03
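The abstract does not spell out how hCount counts items. As a rough illustration of the problem setting only, the sketch below implements a Count-Min-style hashed counter array, a well-known technique for approximate counting when a stream contains both insertions and deletions. Treating it as similar in spirit to hCount is an assumption; the class name `HashedCounterSketch`, its parameters, and the candidate-list interface are hypothetical and not taken from the paper.

```python
import random
from typing import Iterable, List


class HashedCounterSketch:
    """Count-Min-style sketch: `depth` rows of `width` counters (hypothetical helper).

    Supports insertions (+1) and deletions (-1), assuming every deletion
    matches an earlier insertion (the strict turnstile model).
    """

    PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used by the hash family

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One (a, b) pair per row for h(x) = ((a * x + b) mod p) mod width.
        self.params = [
            (rng.randrange(1, self.PRIME), rng.randrange(0, self.PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # number of items currently in the stream

    def _bucket(self, row: int, item: int) -> int:
        a, b = self.params[row]
        return ((a * item + b) % self.PRIME) % self.width

    def update(self, item: int, delta: int) -> None:
        """delta = +1 for an insertion, -1 for a deletion."""
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += delta
        self.total += delta

    def estimate(self, item: int) -> int:
        """Minimum over rows; never below the true count under strict turnstile."""
        return min(self.table[row][self._bucket(row, item)] for row in range(self.depth))

    def frequent(self, candidates: Iterable[int], threshold: float) -> List[int]:
        """Candidates whose estimated count exceeds threshold * total."""
        return [x for x in candidates if self.estimate(x) > threshold * self.total]


if __name__ == "__main__":
    sketch = HashedCounterSketch(width=64, depth=4, seed=1)
    for item in [1] * 50 + [2] * 30 + list(range(100, 120)):
        sketch.update(item, +1)
    for _ in range(40):
        sketch.update(1, -1)  # item 1 is mostly deleted again
    print(sketch.total, sketch.estimate(2), sketch.frequent([1, 2], 0.3))
```

Each row can only overestimate an item's count through hash collisions, so taking the minimum across rows keeps the estimate close to the true count, while memory stays at `width * depth` counters no matter how many distinct items the stream contains.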

ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation

OPENALEX - Publications

Xueying Du Mingwei Liu Kaixin Wang Hanlin Wang Junwei Liu and 5 more

In this work, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e. class-level code generation. We manually construct a benchmark, ClassEval, of 100 class-level Python code generation tasks with approximately 500 person-hours. Based on it, we then perform a study of 11 state-of-the-art LLMs on class-level code generation. Based on our results, we have the following main findings. First, we find that all existing LLMs show much worse performance on class-level code generation compared to standalone method-level code generation benchmarks like HumanEval; and the method-level coding ability cannot equivalently reflect...

10.48550/arxiv.2308.01861 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

Distributed Data Stream Clustering: A Fast EM-based Approach

OPENALEX - Publications

Aoying Zhou Feng Cao Ying Yan Chaofeng Sha Xiaofeng He

Clustering data streams has been attracting a lot of research efforts recently. However, this problem has not received enough consideration when the data streams are generated in a distributed fashion, whereas such a scenario is very common in real life applications. There exist constraining factors in clustering data streams in a distributed environment: the data records are noisy or incomplete due to the unreliable distributed system; the system needs to process a huge volume of data on-line; and communication is potentially a bottleneck of the system. All these factors pose a great challenge for clustering distributed data streams. In...

10.1109/icde.2007.367919 article EN 2007-04-01

Task-Oriented ML/DL Library Recommendation Based on a Knowledge Graph

OPENALEX - Publications

Mingwei Liu Chengyuan Zhao Xin Peng S. Suihuai Yu Haofen Wang and 1 more

AI applications often use ML/DL (Machine Learning/Deep Learning) models to implement specific AI tasks. As application developers usually are not AI experts, they often choose to integrate existing implementations of ML/DL models as libraries for their AI tasks. As an active research area, AI attracts many researchers and produces a lot of papers every year. Many of the papers propose ML/DL models for specific tasks and provide their implementations. However, it is not easy for developers to find ML/DL libraries that are suitable for their tasks. The challenges lie in not only the fast development of AI application domains and techniques, but also the lack of detailed...

10.1109/tse.2023.3285280 article EN IEEE Transactions on Software Engineering 2023-06-13

TraceCRL: contrastive representation learning for microservice trace analysis

OPENALEX - Publications

Chenxi Zhang Xin Peng Tong Zhou Chaofeng Sha Zhenghui Yan and 2 more

Due to the large amount and high complexity of trace data, microservice trace analysis tasks such as anomaly detection, fault diagnosis, and tail-based sampling widely adopt machine learning technology. These trace analysis approaches usually use a preprocessing step to map structured features of traces to vector representations in an ad-hoc way. Therefore, they may lose important information such as topological dependencies between service operations. In this paper, we propose TraceCRL, a trace representation learning approach based on contrastive...

10.1145/3540250.3549146 article EN Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2022-11-07

PUTraceAD: Trace Anomaly Detection with Partial Labels based on GNN and PU Learning

OPENALEX - Publications

Ke Zhang Chenxi Zhang Xin Peng Chaofeng Sha

Distributed tracing has been an important part of microservice infrastructure, and learning-based trace analysis has been used to detect anomalies in microservice systems. Existing trace anomaly detection approaches either assume that trace patterns can be learned from normal execution traces or rely on fault injection to produce labeled traces (i.e., normal/anomalous ones). However, in practice it is often difficult to ensure that the normal execution traces do not involve anomalous traces, or to obtain a large variety of anomalous traces through fault injection. In this paper, we propose PUTraceAD,...

10.1109/issre55969.2022.00032 article EN 2022-10-01

Local Weighted Matrix Factorization for Top-n Recommendation with Implicit Feedback

OPENALEX - Publications

Keqiang Wang Peng Hong-wei Yuanyuan Jin Chaofeng Sha Xiaoling Wang

Item recommendation helps people discover items they are potentially interested in among large numbers of items. One of the most common applications is to recommend top-n items based on implicit feedback datasets (e.g., listening history, watching history or visiting history). In this paper, we assume that the implicit feedback matrix has a local property, where the original matrix is not globally low rank but some sub-matrices are low rank. We propose Local Weighted Matrix Factorization (LWMF) for top-n recommendation by employing a kernel function to intensify the local property and the weight...

10.1007/s41019-017-0032-6 article EN cc-by Data Science and Engineering 2016-12-01

An Empirical Study of Parameter-Efficient Fine-Tuning Methods for Pre-Trained Code Models

OPENALEX - Publications

Jiaxing Liu Chaofeng Sha Xin Peng

Pre-trained code models (e.g. CodeBERT and CodeT5) have demonstrated their code intelligence in various software engineering tasks, such as code summarization. Full fine-tuning has become the typical approach to adapting these models to downstream tasks. However, full fine-tuning of these large models can be computationally expensive and memory-intensive, particularly when training for multiple tasks. To alleviate this issue, several parameter-efficient fine-tuning methods (e.g. Adapter and LoRA) have been proposed that only train a small number of additional parameters, while...

10.1109/ase56229.2023.00125 article EN 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2023-09-11
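To give a sense of what training "a small number of additional parameters" means for LoRA, the sketch below compares trainable-parameter counts for a single weight matrix and implements the low-rank update y = Wx + (alpha / r) * B(Ax). It is a generic illustration of LoRA under assumed sizes (a 768 x 768 layer, rank 8), not the configuration studied in the paper; all helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the dense d_out x d_in weight itself is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a LoRA update B @ A (A: rank x d_in, B: d_out x rank)."""
    return rank * d_in + d_out * rank


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float, rank: int) -> Vector:
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y0 + scale * dy for y0, dy in zip(base, delta)]


if __name__ == "__main__":
    d, r = 768, 8  # assumed layer size and LoRA rank, for illustration only
    full = full_finetune_params(d, d)
    lora = lora_params(d, d, r)
    print(full, lora, f"{lora / full:.2%}")  # 589824 12288 2.08%
```

Because B is initialized to zeros in LoRA, the adapted layer initially computes exactly Wx, and fine-tuning only has to learn the low-rank correction.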

Exploiting shopping and reviewing behavior to re-score online evaluations

OPENALEX - Publications

Rong Zhang Chaofeng Sha Minqi Zhou Aoying Zhou

Analysis of product reviews has attracted great attention from both academia and industry. Generally, the evaluation scores of reviews are used to generate the average scores of products and shops for future potential users. However, in the real world, there is an inconsistency problem between evaluation scores and review content, and some customers do not give out fair reviews. In this work, we focus on detecting the credibility of reviews by analyzing online shopping and reviewing behaviors, and then re-score products and shops. To this end, we evaluate our algorithm based on a data set from Taobao, the biggest...

10.1145/2187980.2188171 article EN 2012-04-16

Reinforcement Learning Based Sparse Black-box Adversarial Attack on Video Recognition Models

OPENALEX - Publications

Zeyuan Wang Chaofeng Sha Su Yang

We explore the black-box adversarial attack on video recognition models. Attacks are only performed on selected key regions and key frames to reduce the high computation cost of searching adversarial perturbations on a video due to its high dimensionality. To select key frames, one way is to use heuristic algorithms to evaluate the importance of each frame and choose the essential ones. However, this is time-inefficient because of sorting and searching. In order to speed up the attack process, we propose a reinforcement learning based frame selection strategy. Specifically, the agent explores...

10.24963/ijcai.2021/435 preprint EN 2021-08-01

On Calibration of Pre-trained Code Models

OPENALEX - Publications

Zhenhao Zhou Chaofeng Sha Xin Peng

Pre-trained code models have achieved notable success in the field of Software Engineering (SE). However, existing studies have predominantly focused on improving model performance, with limited attention given to other critical aspects such as model calibration. Model calibration, which refers to the accurate estimation of predictive uncertainty, is a vital consideration in practical applications. Therefore, in order to advance the understanding of model calibration in SE, we conduct a comprehensive investigation into the calibration of pre-trained code models in this...

10.1145/3597503.3639126 article EN 2024-04-12
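The abstract defines calibration as the accurate estimation of predictive uncertainty. A common way to quantify miscalibration is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between accuracy and mean confidence per bin, weighted by bin size: ECE = sum over bins b of (|B_b| / N) * |acc(B_b) - conf(B_b)|. The truncated abstract does not say which metrics the study uses, so the sketch below is a generic illustration with an assumed 10 equal-width bins; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """ECE with equal-width confidence bins over [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to its bin; 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece
```

For example, a model that reports 90% confidence on four predictions but gets only three right has an ECE of about 0.15, while one reporting 75% confidence on the same outcomes scores 0.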

Dynamically maintaining frequent items over a data stream

OPENALEX - Publications

Cheqing Jin Weining Qian Chaofeng Sha Jeffrey Xu Yu Aoying Zhou

It is challenging to maintain frequent items over a data stream, with a small bounded memory, in a dynamic environment where both insertion and deletion of items are allowed. In this paper, we propose a new algorithm, called hCount, which can handle both insertion and deletion of items with much less memory space than the best reported algorithm. Our algorithm is also superior in terms of precision, recall and processing time. In addition, our approach does not require prior knowledge of the size of the range for a data stream and can handle range extension dynamically. Given a little...

10.1145/956915.956918 article EN 2003-01-01

Nash equilibria in parallel downloading with multiple clients

OPENALEX - Publications

Jiantao Song Chaofeng Sha Hong Zhu

Recently, the scheme of parallel downloading has been proposed as a novel approach to expedite the reception of a large file from the Internet. Experiments with a single client have shown that a client can improve its performance significantly by using this scheme. Simulations and experiments with multiple clients have been conducted in [Gkantsidis, C et al., (2003), Koo, S et al., (2003)] to investigate the impact this technique might have on the network if it is widely adopted. In contrast to the methodology used in [Gkantsidis, C et al., (2003), Koo, S et al., (2003)], we formulate parallel downloading as a noncooperative game. Within...

10.1109/icdcs.2004.1281572 article EN 2004-01-01

Feature Selection Based on a New Dependency Measure

OPENALEX - Publications

Chaofeng Sha Xipeng Qiu Aoying Zhou

Feature selection is a process commonly used in machine learning, wherein a subset of the features available from the data is selected for application of a learning algorithm. Feature selection is effective in reducing dimensionality, removing irrelevant data, and increasing learning accuracy and efficiency. In this paper, we propose a new information distance to measure the relevancy between two features. Unlike previous feature selection works, our proposed information distance satisfies the triangle inequality. We use this distance, InfoDist, in feature selection, and experimental results show that it achieves better performance.

10.1109/fskd.2008.515 article EN 2008-10-01
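The abstract does not give the formula for InfoDist, only that it measures feature relevancy and satisfies the triangle inequality. One well-known information-theoretic distance with that property is the variation of information, d(X, Y) = H(X, Y) - I(X; Y) = 2H(X, Y) - H(X) - H(Y). The sketch below computes it for discrete features from paired samples and ranks features by their distance to a class label; whether InfoDist coincides with this measure is an open assumption, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(samples: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in bits."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def variation_of_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """d(X, Y) = 2 H(X, Y) - H(X) - H(Y); 0 when X and Y determine each other."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return 2 * h_xy - entropy(x) - entropy(y)


def rank_features(
    features: Dict[str, Sequence[Hashable]], target: Sequence[Hashable]
) -> List[str]:
    """Order feature names by distance to the class label (closest = most relevant)."""
    return sorted(features, key=lambda name: variation_of_information(features[name], target))
```

Because this distance is a true metric, two features that are each close to a third feature are guaranteed to be reasonably close to each other as well, which is the kind of guarantee the triangle inequality in the abstract provides.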

EASC: An exception-aware semantic compression framework for real-world knowledge graphs

OPENALEX - Publications

Sihang Jiang Feng Jian-chuan Chao Wang Jingping Liu Zhuozhi Xiong and 4 more

10.1016/j.knosys.2023.110900 article EN Knowledge-Based Systems 2023-08-11