- Data Management and Algorithms
- Recommender Systems and Techniques
- Topic Modeling
- Software Engineering Research
- Advanced Graph Neural Networks
- Machine Learning and Data Classification
- Web Data Mining and Analysis
- Advanced Database Systems and Queries
- Privacy-Preserving Technologies in Data
- Advanced Image and Video Retrieval Techniques
- Sentiment Analysis and Opinion Mining
- Face and Expression Recognition
- Data Mining Algorithms and Applications
- Data Quality and Management
- Anomaly Detection Techniques and Applications
- Software System Performance and Reliability
- Imbalanced Data Classification Techniques
- Network Security and Intrusion Detection
- Caching and Content Delivery
- Semantic Web and Ontologies
- Software Testing and Debugging Techniques
- Rough Sets and Fuzzy Logic
- Complex Network Analysis Techniques
- Advanced Bandit Algorithms Research
- Text and Document Classification Technologies
Fudan University
2014-2024
East China Normal University
2013
Shanghai University of Electric Power
2011
Community detection is a fundamental and widely studied problem that finds all densely connected groups of nodes and separates them well from the rest of the graph. With the proliferation of rich information available for entities in real-world networks, it is useful to discover communities in attributed graphs, where nodes tend to have attributes. However, most existing community detection methods directly utilize the original network topology, which leads to poor results because they ignore inherent structures. In this paper, we propose a novel...
A microservice system in industry is usually a large-scale distributed system consisting of dozens to thousands of services running on different machines. An anomaly of the system can often be reflected in traces and logs, which record inter-service interactions and intra-service behaviors, respectively. Existing trace anomaly detection approaches treat a trace as a sequence of service invocations. They ignore the complex structure of a trace brought by its invocation hierarchy and parallel/asynchronous invocations. On the other hand, existing log anomaly detection approaches treat a log as a sequence of events and cannot handle microservice logs that...
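To make the structural point concrete, here is a minimal sketch that rebuilds the invocation tree of one trace from parent links and compares it with the flattened sequence view. The `Span` record and its fields (`span_id`, `parent_id`, `service`, `start_ms`) are hypothetical illustrations, not the schema of any particular tracing system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical minimal span record; real tracing systems carry many more fields.
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    start_ms: int


def build_invocation_tree(spans):
    """Return (root span, mapping span_id -> child spans sorted by start time)."""
    children = defaultdict(list)
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            children[s.parent_id].append(s)
    for kids in children.values():
        kids.sort(key=lambda s: s.start_ms)
    return root, children


def as_sequence(spans):
    # The flattened view used by sequence-based detectors: services ordered by start time.
    return [s.service for s in sorted(spans, key=lambda s: s.start_ms)]


def depth(span, children):
    # Depth of the invocation hierarchy below (and including) this span.
    return 1 + max((depth(c, children) for c in children.get(span.span_id, [])), default=0)
```

For spans `gateway` (root, 0 ms), `orders` and `users` (both children of `gateway`, 5 ms) and `db` (child of `orders`, 7 ms), the tree has depth 3 and shows two sibling calls that may run in parallel, while `as_sequence` returns `['gateway', 'orders', 'users', 'db']`, which no longer records that `db` was called by `orders`.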
In this paper, we study multimodal named entity recognition (MNER) in social media posts. Existing works mainly focus on using a cross-modal attention mechanism to combine text representation with image representation. However, they still suffer from two weaknesses: (1) the current methods are based on a strong assumption that each text and its accompanying image are matched, and that the image can be used to help identify named entities in the text. This assumption is not always true in real scenarios, and it may reduce the effectiveness of the MNER model; (2) they fail to construct consistent...
Recently, many large language models (LLMs) have been proposed, showing advanced proficiency in code generation. Meanwhile, many efforts have been dedicated to evaluating LLMs on code generation benchmarks such as HumanEval. Although these benchmarks are very helpful for comparing different LLMs, existing evaluation focuses on a simple code generation scenario (i.e., function-level or statement-level code generation), which mainly asks LLMs to generate one single code unit (e.g., a function or a statement) for the given natural language description. Such evaluation focuses on generating independent and...
It is a challenge to maintain frequent items over a data stream with a small bounded memory in a dynamic environment where both insertion and deletion of items are allowed. In this paper, we propose a novel algorithm, called hCount, which can handle both insertion and deletion with much less memory space than the best reported algorithm. Our algorithm is also superior in terms of precision, recall, and processing time. In addition, our approach does not require prior knowledge of the size of the range, which allows for dynamic extension. Given little...
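hCount's internals are not described in this excerpt. As a generic illustration of how a fixed-size array of hashed counters can absorb both insertions and deletions, here is a minimal Count-Min-style sketch, a well-known technique used here as a stand-in rather than as hCount itself. Assuming no item's net count ever goes negative, the minimum over rows never underestimates an item's true count. The class name, the default `width`/`depth`, and the salted BLAKE2 row hashing are our own illustrative choices.

```python
import hashlib


class CountMinSketch:
    """Hashed counter array that supports insertions (+delta) and deletions (-delta)."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash per row, derived by salting the item with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item: str, delta: int = 1) -> None:
        # delta > 0 records insertions, delta < 0 records deletions.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += delta

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the smallest one is the tightest estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

    def frequent(self, candidates, threshold: int):
        # The sketch stores no item keys, so the caller supplies the candidate range to scan.
        return [c for c in candidates if self.estimate(c) >= threshold]
```

Usage: after `update` calls for `a`, `b`, `a`, `c`, `a` and one `update("a", -1)`, `estimate("a")` is never less than 2, and memory stays at `width * depth` counters however many distinct items arrive.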
In this work, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e., class-level code generation. We manually construct a benchmark, ClassEval, of 100 class-level Python code generation tasks with approximately 500 person-hours of effort. Based on it, we then perform a study of 11 state-of-the-art LLMs on class-level code generation. Based on our results, we have the following main findings. First, we find that all existing LLMs show much worse performance on class-level code generation than on standalone method-level code generation benchmarks like HumanEval; and method-level coding ability cannot equivalently reflect...
Clustering data streams has been attracting a lot of research effort recently. However, this problem has not received enough consideration when the data streams are generated in a distributed fashion, even though such a scenario is very common in real-life applications. There are several constraining factors in clustering data streams in a distributed environment: the data records are noisy or incomplete due to the unreliable distributed system; the system needs to process a huge volume of data online; and communication is potentially the bottleneck of the system. All these factors pose a great challenge for clustering distributed data streams. In...
AI applications often use ML/DL (Machine Learning/Deep Learning) models to implement specific tasks. As application developers usually are not ML/DL experts, they often choose to integrate existing implementations of ML/DL models as libraries for their AI applications. ML/DL is an active research area that attracts many researchers and produces a lot of papers every year. Many of these papers propose ML/DL models for specific tasks and provide implementations. However, it is not easy to find implementations that are suitable for a given task. The challenges lie not only in the fast development of AI application domains and ML/DL techniques, but also in the lack of detailed...
Due to the large amount and high complexity of trace data, microservice trace analysis tasks such as anomaly detection, fault diagnosis, and tail-based sampling widely adopt machine learning technology. These approaches usually use a preprocessing step to map structured features of traces to vector representations in an ad-hoc way. Therefore, they may lose important information, such as topological dependencies between service operations. In this paper, we propose TraceCRL, a trace representation learning approach based on contrastive...
Distributed tracing has been an important part of microservice infrastructure, and learning-based trace analysis has been used to detect anomalies in microservice systems. Existing anomaly detection approaches either assume that patterns can be learned from normal execution or rely on fault injection to produce labeled traces (i.e., normal/anomalous ones). However, in practice it is often difficult to ensure that normal execution does not involve anomalous traces, or to obtain a large variety of labeled traces through fault injection. In this paper, we propose PUTraceAD,...
Item recommendation helps people discover potentially interesting items among large numbers of items. One of the most common applications is to recommend top-n items on implicit feedback datasets (e.g., listening history, watching history, or visiting history). In this paper, we assume that the implicit feedback matrix has a local property, where the original matrix is not globally low rank but some sub-matrices are low rank. We propose Local Weighted Matrix Factorization (LWMF) for top-n recommendation, which employs a kernel function to intensify the local property and weight...
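The excerpt does not give LWMF's kernel, distance, or loss, so the following is only a minimal sketch of the general idea under our own assumptions: an Epanechnikov-style kernel over cosine distance to one anchor user and one anchor item (in the style of local low-rank matrix approximation), and a kernel-weighted squared loss over every cell, treating unobserved implicit feedback as 0. All function names and hyperparameters are hypothetical.

```python
import math
import random


def epanechnikov(d: float, h: float) -> float:
    # Kernel weight in [0, 1]: 1 at distance 0, falling to 0 at bandwidth h and beyond.
    return max(0.0, 1.0 - (d / h) ** 2)


def cosine_distance(a, b) -> float:
    dot_ab = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot_ab / (na * nb)


def dot(u, v) -> float:
    return sum(x * y for x, y in zip(u, v))


def local_weights(R, anchor_user, anchor_item, h=1.0):
    """Weight each user (row) and item (column) by kernel similarity to an anchor."""
    cols = [list(col) for col in zip(*R)]
    wu = [epanechnikov(cosine_distance(row, R[anchor_user]), h) for row in R]
    wv = [epanechnikov(cosine_distance(col, cols[anchor_item]), h) for col in cols]
    return wu, wv


def weighted_squared_error(R, U, V, wu, wv) -> float:
    return sum(wu[i] * wv[j] * (R[i][j] - dot(U[i], V[j])) ** 2
               for i in range(len(R)) for j in range(len(R[0])))


def local_weighted_mf(R, wu, wv, rank=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Fit factors U, V by SGD on the kernel-weighted loss; zero-weight cells are skipped."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in R]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in R[0]]
    for _ in range(epochs):
        for i, row in enumerate(R):
            for j, r in enumerate(row):
                w = wu[i] * wv[j]
                if w == 0.0:
                    continue  # cells far from the anchor do not affect this local model
                err = r - dot(U[i], V[j])
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    U[i][k] += lr * (w * err * v - reg * u)
                    V[j][k] += lr * (w * err * u - reg * v)
    return U, V
```

For top-n recommendation, one would then rank a user's unobserved items by `dot(U[i], V[j])`. A full local method would typically fit several such models around different anchors and combine them.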
Pre-trained code models (e.g., CodeBERT and CodeT5) have demonstrated their code intelligence in various software engineering tasks, such as code summarization. Full fine-tuning has become the typical approach to adapting these models to downstream tasks. However, full fine-tuning of these large models can be computationally expensive and memory-intensive, particularly when training for multiple tasks. To alleviate this issue, several parameter-efficient fine-tuning methods (e.g., Adapter and LoRA) have been proposed that only train a small number of additional parameters, while...
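LoRA, one of the methods named above, freezes a pre-trained weight matrix W and learns a low-rank update scaled by alpha / r, so the layer computes W x + (alpha / r) B A x. Zero-initializing B makes the adapted layer start out identical to the original one. The minimal pure-Python sketch below shows a single adapted linear layer; the class name, rank, alpha, and initialization scale are our illustrative choices, not settings from the paper.

```python
import random


def matvec(M, x):
    # Plain matrix-vector product for a list-of-rows matrix.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen: not updated during fine-tuning
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # r x d_in
        self.B = [[0.0] * rank for _ in range(d_out)]  # d_out x r, zero init so the update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        # Only A and B would receive gradients; W stays fixed.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a 64 x 64 layer with rank 4, only 4 x 64 + 64 x 4 = 512 parameters are trainable instead of 4,096, which is the kind of saving that makes fine-tuning for multiple tasks cheaper.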
Analysis of product reviews has attracted great attention from both academia and industry. Generally, the evaluation scores of reviews are used to generate the average scores of products and shops for future potential users. However, in the real world, there is an inconsistency problem between the evaluation scores and the review content, and some customers do not give fair reviews. In this work, we focus on detecting the credibility of customers by analyzing their online shopping behaviors, and then re-score the products and shops. In the end, we evaluate our algorithm on a data set from Taobao, the biggest...
We explore the black-box adversarial attack on video recognition models. Attacks are only performed on selected key regions and key frames to reduce the high computation cost of searching for adversarial perturbations on a video, which is due to its high dimensionality. To select key frames, one way is to use heuristic algorithms to evaluate the importance of each frame and choose the essential ones. However, this is time-inefficient because of the sorting and searching involved. In order to speed up the attack process, we propose a reinforcement learning based frame selection strategy. Specifically, the agent explores...
Pre-trained code models have achieved notable success in the field of Software Engineering (SE). However, existing studies have predominantly focused on improving model performance, with limited attention given to other critical aspects such as model calibration. Model calibration, which refers to the accurate estimation of predictive uncertainty, is a vital consideration in practical applications. Therefore, in order to advance the understanding of model calibration in SE, we conduct a comprehensive investigation into the calibration of pre-trained code models in this...
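One widely used way to quantify miscalibration is expected calibration error (ECE). It bins predictions by confidence and averages, weighted by bin size, the gap between each bin's accuracy and its mean confidence: ECE = sum over bins b of (|B_b| / n) * |acc(B_b) - conf(B_b)|. The excerpt does not say which metric or binning the study uses, so this is a minimal sketch with equal-width bins only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, y in b if y) / len(b)
        ece += len(b) / n * abs(accuracy - avg_conf)
    return ece
```

A model that is 80% confident and right 8 times out of 10 has an ECE of about 0, whereas one that is 90% confident but always wrong has an ECE of about 0.9.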
Recently, the scheme of parallel downloading has been proposed as a novel approach to expedite the reception of a large file from the Internet. Experiments with a single client have shown that the client can improve its performance significantly by using this scheme. Simulations and experiments with multiple clients have been conducted in [Gkantsidis, C et al., (2003), Koo, S (2003)] to investigate the impact this technique might have on the network if it is widely adopted. In contrast to the methodology used in [Gkantsidis, C et al., (2003), Koo, S (2003)], we formulate parallel downloading as a noncooperative game. Within...
Feature selection is a process commonly used in machine learning, wherein a subset of the features available from the data is selected for application of a learning algorithm. It is effective in reducing dimensionality, removing irrelevant data, and increasing learning accuracy and efficiency. In this paper, we propose a new information distance, InfoDist, to measure the relevancy of two features. Unlike previous feature selection works, our proposed distance satisfies the triangle inequality. We use InfoDist for feature selection, and experimental results show that it has better performance.
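The excerpt does not define InfoDist. As an illustration of an information-theoretic distance that does satisfy the triangle inequality, here is the variation of information, VI(X, Y) = H(X, Y) - I(X; Y) = 2 H(X, Y) - H(X) - H(Y), computed from empirical counts of two discrete features. It is a well-known stand-in, not the paper's InfoDist, and the ranking helper is a hypothetical relevance criterion.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Empirical Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def variation_of_information(x, y) -> float:
    """VI(X, Y) = 2 H(X, Y) - H(X) - H(Y); it is 0 iff each feature determines the other."""
    if len(x) != len(y):
        raise ValueError("features must have the same length")
    return 2 * entropy(list(zip(x, y))) - entropy(x) - entropy(y)


def rank_features_by_distance(features, label):
    # Illustrative criterion: a smaller distance to the class label means a more relevant feature.
    return sorted(features, key=lambda name: variation_of_information(features[name], label))
```

Because VI is a true metric, distances between features can be reasoned about geometrically: if feature A is close to the label and feature B is close to A, the triangle inequality bounds how far B can be from the label.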