- Advanced Neural Network Applications
- Anomaly Detection Techniques and Applications
- Advanced Image and Video Retrieval Techniques
- Adversarial Robustness in Machine Learning
- Video Surveillance and Tracking Methods
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Advanced Vision and Imaging
- Topic Modeling
- Speech Recognition and Synthesis
- Neural Networks and Applications
- Advanced Malware Detection Techniques
- Complex Network Analysis Techniques
- Image Enhancement Techniques
- Grey System Theory Applications
- Advanced Numerical Methods in Computational Mathematics
- CCD and CMOS Imaging Sensors
- Machine Learning and Data Classification
- Music and Audio Processing
- Generative Adversarial Networks and Image Synthesis
- Internet Traffic Analysis and Secure E-voting
- Meteorological Phenomena and Simulations
- Advanced Graph Neural Networks
- Natural Language Processing Techniques
Harbin Institute of Technology
2022-2024
Griffith University
2022-2024
Alibaba Group (United States)
2020-2024
Alibaba Group (China)
2017-2024
Liaoning Meteorological Bureau
2019-2024
China Meteorological Administration
2024
University of Illinois Urbana-Champaign
2018-2023
Beijing University of Posts and Telecommunications
2009-2023
Stockholm University
2023
Shenzhen University
2018-2019
We propose a network for Congested Scene Recognition called CSRNet to provide a data-driven and deep learning method that can understand highly congested scenes and perform accurate count estimation as well as present high-quality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN for the back-end, which uses dilated kernels to deliver larger reception fields and to replace pooling operations. CSRNet is an easy-trained model because of its pure...
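To illustrate the claim above that dilated kernels deliver larger reception fields without pooling, here is a minimal Python sketch that computes the receptive field of a stack of stride-1 convolutions. The function names and layer configurations are hypothetical and are not taken from the CSRNet paper. The only assumption is the standard effective-kernel-size formula k + (k - 1)(d - 1) for kernel size k and dilation d.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input pixels) covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers: list[tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs, first layer first.
    With stride 1 everywhere, each layer widens the field by (k_eff - 1).
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Hypothetical configurations: three 3x3 layers, plain vs. dilation rate 2.
plain = [(3, 1)] * 3
dilated = [(3, 2)] * 3

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 13
```

In this sketch the dilated stack covers a 13-pixel span with the same number of weights as the plain stack, which covers 7, and no downsampling is involved.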
With the increase in software vulnerabilities that cause significant economic and social losses, automatic vulnerability detection has become essential in software development and maintenance. Recently, large language models (LLMs) have received considerable attention due to their stunning intelligence, and some studies consider using ChatGPT for vulnerability detection. However, they do not fully consider the characteristics of LLMs, since their designed questions are simple without a prompt design tailored for vulnerability detection. This paper launches a study on...
This paper is a survey on the application of artificial neural networks in forecasting financial market prices. The objective of this paper is to appraise the potential of using neural networks to predict the financial system, as it is reflected in many relevant articles. It will provide some guidelines and references for research and implementation. The paper begins with an introduction to the theory of neural networks. Subsequently, it focuses on the forecast of stock prices and option pricing based on the non-linear ANN model. It proceeded with a presentation of predicting exchange rates. It then reviewed...
Few-shot image classification aims to classify unseen classes with limited labelled samples. Recent works benefit from the meta-learning process with episodic tasks and can quickly adapt to new classes from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta-learning becomes an essential component and can largely affect the performance in practice. To this end, most existing methods highly rely on the efficient embedding network. Due to the limited labelled data, the scale of the embedding network is constrained under a supervised learning (SL) manner, which becomes a bottleneck...
High quality AI solutions require joint optimization of AI algorithms and their hardware implementations. In this work, we are the first to propose a fully simultaneous, Efficient Differentiable DNN (deep neural network) architecture and implementation co-search (EDD) methodology. We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality. The formulation is differentiable with respect to the fused variables, so that gradient descent can be applied...
The inference process in Large Language Models (LLMs) is often limited due to the absence of parallelism in the auto-regressive decoding process, resulting in most operations being restricted by the memory bandwidth of accelerators. While methods such as speculative decoding have been suggested to address this issue, their implementation is impeded by the challenges associated with acquiring and maintaining a separate draft model. In this paper, we present Medusa, an efficient method that augments LLM inference by adding extra decoding heads to predict...
Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity. Recently, most works focus on synthesizing independent images; for real-world applications, however, it is common and necessary to generate a series of coherent images for story-telling. In this work, we mainly focus on story visualization and continuation tasks and propose AR-LDM, a latent diffusion model auto-regressively conditioned on history captions and generated images. Moreover, AR-LDM can generalize to new characters through...
Object detection and tracking are challenging tasks for resource-constrained embedded systems. While these tasks are among the most compute-intensive tasks from the artificial intelligence domain, they are only allowed to use limited computation and memory resources on embedded devices. Meanwhile, such resource-constrained implementations are often required to satisfy additional demanding requirements such as real-time response, high-throughput performance, and reliable inference accuracy. To overcome these challenges, we propose SkyNet, a hardware-efficient neural...
Recently, we have seen a rapid development of Deep Neural Network (DNN) based visual tracking solutions. Some trackers combine the DNN-based solutions with Discriminative Correlation Filters (DCF) to extract semantic features and successfully deliver state-of-the-art tracking accuracy. However, these solutions are highly compute-intensive and require long processing time, resulting in unsecured real-time performance. To deliver both high accuracy and reliable real-time performance, we propose a novel tracker called...
Partial person re-identification (ReID) aims to solve the problem of image spatial misalignment due to occlusions or out-of-views. Despite significant progress through the introduction of additional information, such as human pose landmarks, mask maps, and spatial information, partial person ReID remains challenging due to noisy keypoints and impressionable pedestrian representations. To address these issues, we propose a unified attribute-guided collaborative learning scheme for partial person ReID. Specifically, we introduce an adaptive threshold-guided...
By analyzing the product structure of the extrusion valve and the process problems existing in injection molding, and selecting the injection molding parameters of polystyrene, the mold of this kind of plastic parts is determined to be one mold with four cavities. According to the average shrinkage rate, the size was calculated, and the conical fine striping mechanism and the arc side core-pulling mechanism were designed in the mold. The design and machining of the forming parts, the pouring system and other structures of the mold are described in detail, and the working process of the mold is introduced. The mold has a successful trial, flexible...
In the context of the clean and low-carbon transformation of power systems, addressing the challenge of day-ahead electricity market price prediction issues triggered by the strong stochastic volatility of power supply output due to high-penetration renewable energy integration, as well as problems such as limited dataset scales and short prediction cycles in test sets associated with existing methods, this paper introduced an innovative prediction approach based on multi-modal feature fusion and a BiGRUSA-ResSE-KAN deep learning model. data...
Video moderation, which refers to removing deviant or explicit content from e-commerce livestreams, has become prevalent owing to the social and engaging features. However, this task is tedious and time consuming due to the difficulties associated with watching and reviewing multimodal video content, including video frames and audio clips. To ensure effective video moderation, we propose VideoModerator, a risk-aware framework that seamlessly integrates human knowledge with machine insights. This framework incorporates a set of advanced machine learning models...
Convolutional models have been widely used in multiple domains. However, most existing models only use local convolution, making the model unable to handle long-range dependency efficiently. Attention overcomes this problem by aggregating global information but also makes the computational complexity quadratic to the sequence length. Recently, Gu et al. [2021] proposed a model called S4 inspired by the state space model. S4 can be efficiently implemented as a global convolutional model whose kernel size equals the input sequence length. S4 can model much longer sequences...
In this paper, we propose a joint generative and contrastive representation learning method (GeCo) for anomalous sound detection (ASD). GeCo exploits a Predictive AutoEncoder (PAE) equipped with self-attention as a generative model to perform frame-level prediction. The output of the PAE, together with the original normal samples, is used for supervised contrastive representative learning in a multi-task framework. Besides the cross-entropy loss between classes, a contrastive loss is used to separate the PAE output and the original samples within each class. GeCo aims to better capture context information among...
Recent work has demonstrated that neural networks are vulnerable to adversarial examples. To escape from the predicament, many works try to harden the model in various ways, among which adversarial training is an effective way that learns a robust feature representation so as to resist adversarial attacks. Meanwhile, self-supervised learning aims to learn a robust and semantic embedding from the data itself. With these views, we introduce self-supervised learning to defend against adversarial examples in this paper. Specifically, the self-supervised representation coupled with k-Nearest Neighbour is proposed for classification. To further...
In the real world, a desirable Visual Question Answering model is expected to provide correct answers to new questions and images in a continual setting (recognized as CL-VQA). However, existing works formulate CL-VQA from a vision-only or language-only perspective, and straightforwardly apply uni-modal continual learning (CL) strategies to this multi-modal task, which is improper and suboptimal. On the one hand, such a partial formulation may result in limited evaluations. On the other hand, neglecting the interactions between modalities will...
The task of Language-Based Image Editing (LBIE) aims at generating a target image by editing the source image based on the given language description. The main challenge of LBIE is to disentangle the semantics in image and text and then combine them to generate realistic images. Therefore, the editing performance is heavily dependent on the learned representation. In this work, a conditional generative adversarial network (cGAN) is utilized for LBIE. We find that existing conditioning methods in cGAN lack representation power as they cannot learn...
Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy. To overcome these challenges, we propose SkyNet, an extremely lightweight DNN with 12 convolutional (Conv) layers and only 1.82 megabyte (MB) of parameters following a bottom-up DNN design approach. SkyNet is demonstrated in the 56th IEEE/ACM...
Words are treated as atomic units in natural language processing tasks, and it is a fundamental step to represent them as vectors for supporting subsequent computations. GloVe is a widely used machine learning model to train word vectors. Generally, a large corpus and high computation resources are required to train high-quality word vectors using GloVe, making it difficult for users to train their own word vectors by themselves. A choice nowadays is to outsource the training process to the cloud. However, coming with such cloud-based services are serious privacy concerns, which...