- Advanced Neural Network Applications
- Advanced Image Fusion Techniques
- Adversarial Robustness in Machine Learning
- Image and Signal Denoising Methods
- Advanced Malware Detection Techniques
- Domain Adaptation and Few-Shot Learning
- E-commerce and Technology Innovations
- Generative Adversarial Networks and Image Synthesis
- Machine Learning and Data Classification
- Anomaly Detection Techniques and Applications
- Image and Video Quality Assessment
- Advanced Image Processing Techniques
- Network Security and Intrusion Detection
- Natural Language Processing Techniques
- Stochastic Gradient Optimization Techniques
- Face and Expression Recognition
- Face Recognition and Analysis
- Music and Audio Processing
- Speech Recognition and Synthesis
- Speech and Audio Processing
- Biometric Identification and Security
- Machine Learning and Algorithms
- Advanced Computing and Algorithms
- Usability and User Interface Design
- Simulation and Modeling Applications
Nvidia (United States)
2024
Sichuan University
2024
China Jiliang University
2024
Wuhan Institute of Technology
2024
Nanchang University
2024
Nanjing University of Science and Technology
2022
University of Central Florida
2018-2021
Tsinghua University
2021
University of Notre Dame
2020
KLA (United States)
2020
Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are the default because of their stability. We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization...
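The AdaBelief update described in the abstract above can be sketched as follows. This is a minimal single-step illustration (not the authors' reference implementation): the second moment tracks the squared deviation of the gradient from its exponential moving average — the "belief" in the gradient — rather than the raw squared gradient as in Adam.

```python
import numpy as np

def adabelief_step(theta, grad, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaBelief-style parameter update (illustrative sketch).

    theta: parameters, grad: gradient at theta,
    m, s: first/second moment state, t: step count (1-based).
    """
    m = beta1 * m + (1 - beta1) * grad
    # Key difference from Adam: variance of grad around its EMA, not grad**2.
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2
    # Standard bias correction, as in Adam.
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
    return theta, m, s
```

When the observed gradient is close to its predicted mean, s is small and the effective step is large; when the gradient is noisy, s grows and the step shrinks, which is the intuition behind the "fast convergence plus good generalization" claim.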
The recent success of deep neural networks is powered in part by large-scale well-labeled training data. However, it is a daunting task to laboriously annotate an ImageNet-like dataset. On the contrary, it is fairly convenient, fast, and cheap to collect images from the Web along with their noisy labels. This signifies the need for alternative approaches using such data. Existing methods tackling this problem either try to identify and correct wrong labels, or reweigh the data terms in the loss function according to inferred noise rates. Both...
This paper reviews the NTIRE 2019 challenge on real image denoising with a focus on the proposed methods and their results. The challenge has two tracks for quantitatively evaluating performance in (1) the Bayer-pattern raw-RGB and (2) the standard RGB (sRGB) color spaces, which had 216 and 220 registered participants, respectively. A total of 15 teams, proposing 17 methods, competed in the final phase of the challenge. The methods proposed by the teams represent the current state-of-the-art in targeting real noisy images.
In this paper, we present new data pre-processing and augmentation techniques for DNN-based raw image denoising. Compared with traditional RGB denoising, performing this task on direct camera sensor readings presents challenges such as how to effectively handle various Bayer patterns from different data sources, and subsequently how to perform valid data augmentation on raw images. To address the first problem, we propose a Bayer pattern unification (BayerUnify) method to unify different Bayer patterns. This allows us to fully utilize a heterogeneous dataset to train...
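One way to unify Bayer patterns, as described above, is to crop boundary rows/columns so every input mosaic exposes the same 2x2 color layout at its top-left corner. The sketch below is an assumption-laden illustration — the paper's exact BayerUnify procedure (which also uses padding for some phases) is not reproduced here.

```python
import numpy as np

# (row, col) offset of the R sample within the 2x2 block for each pattern.
OFFSETS = {"RGGB": (0, 0), "GRBG": (0, 1), "GBRG": (1, 0), "BGGR": (1, 1)}

def unify_bayer(raw, pattern, target="RGGB"):
    """Crop a raw mosaic (H, W) so it reads as `target` pattern.

    Illustrative crop-based unification; trims trailing rows/cols
    to keep dimensions even so the mosaic stays phase-aligned.
    """
    dy, dx = OFFSETS[pattern]
    ty, tx = OFFSETS[target]
    y, x = (dy - ty) % 2, (dx - tx) % 2
    h, w = raw.shape
    return raw[y:h - ((h - y) % 2), x:w - ((w - x) % 2)]
```

For example, a GRBG mosaic is shifted by one column so its new top-left 2x2 block reads R, G / G, B, letting heterogeneous captures be mixed in one training set.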
Question answering systems are rapidly advancing, but their opaque nature may impact user trust. We explored trust through an anti-monitoring framework, where trust is predicted to be correlated with the presence of citations and inversely related to checking citations. We tested this hypothesis in a live question-answering experiment that presented text responses generated using a commercial Chatbot along with varying numbers of citations (zero, one, or five), both relevant and random, and recorded whether participants checked the citations and their self-reported trust in...
Speaker diarization, which is to find the speech segments of specific speakers, has been widely used in human-centered applications such as video conferences or human-computer interaction systems. In this paper, we propose a self-supervised audio-video synchronization learning method to address the problem of speaker diarization without massive labeling effort. We improve previous approaches by introducing two new loss functions: the dynamic triplet loss and the multinomial loss. We test them on a real-world system...
As deep neural networks (DNNs) have become increasingly important and popular, the robustness of DNNs is key to the safety of both the Internet and the physical world. Unfortunately, some recent studies show that adversarial examples, which are hard to distinguish from real examples, can easily fool DNNs and manipulate their predictions. Upon observing that adversarial examples are mostly generated by gradient-based methods, in this paper, we first propose to use a simple yet very effective non-differentiable hybrid model that combines DNNs and random forests,...
In this paper, we present a new inpainting framework for recovering missing regions of video frames. Compared with image inpainting, performing this task on video presents challenges such as how to preserve temporal consistency and spatial details, as well as how to handle arbitrary input size and length fast and efficiently. Towards this end, we propose a novel deep learning architecture which incorporates ConvLSTM and optical flow for modeling the spatial-temporal consistency in videos. It also saves much computational resource such that our method can...
This presents a significant challenge for detecting and combating malicious software. Users often grant software permissions unknowingly, exposing their devices to risks such as unauthorized access, file manipulation, and malware propagation. Traditional detection algorithms relying on limited permission-based strategies fall short in addressing this issue. To overcome this, we propose PVitNet (Network based on Pyramid Feature processing and Vision Transformer), an Android malware detection method. It incorporates...
It is important to guarantee that images circulating in the mobile cloud of a smart city are not obscured by fog. Based on near-infrared light's ability to penetrate fog, this paper puts forward a 2-step real-time defog model for cameras: first, computing the infrared - blue light intensity difference factor and using the dark channel prior to estimate the haze distribution map; then, fusing near-infrared and visible information, and adopting downsampling, a fast fuzzy algorithm, and guided filtering to remove artificial effects and improve...
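The dark channel prior mentioned above is a standard haze-estimation step (He et al.): in haze-free outdoor patches, at least one color channel is near zero, so the patch-wise channel minimum approximates the haze map. A minimal sketch, independent of this paper's specific fusion pipeline:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel of img (H, W, 3), values in [0, 1].

    Per pixel: minimum over color channels, then minimum over a
    patch x patch neighborhood (edge-padded). Bright dark-channel
    values indicate haze; near-zero values indicate clear regions.
    """
    mins = img.min(axis=2)                      # channel-wise minimum
    pad = patch // 2
    padded = np.pad(mins, pad, mode="edge")
    out = np.empty_like(mins)
    h, w = mins.shape
    for i in range(h):
        for j in range(w):                      # local spatial minimum
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

In a defog pipeline the normalized dark channel serves as the haze distribution map used to weight the fusion of near-infrared and visible information.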
In order to assess the quality of experience (QoE) of HTTP video streaming, a model with three levels of service quality: network QoS, application QoS, and QoE, is employed in this paper. We mainly study the effects of pause position, and therefore propose two new performance metrics: the location of each pause and the time interval between pauses. We first focus on the buffer behaviors of the player and correlate them with the metrics based on an analytical mathematical model. Then subjective tests and experiments are carried out to study how the metrics affect QoE, and a Back Propagation Neural Net (BPNN)...
Compilation errors happen during the debugging process of novice students. Compiler error messages help novices to localize and remove errors, but these messages are difficult for some of them to understand. Some computing education researchers have analyzed the compiler errors generated by novices' attempts to compile their programs. However, some important questions remain open. For example, existing compilation error categories cannot cover all programs due to simple static analysis and program repair patterns. And prediction models classifying...
In traditional non-invasive load monitoring (NILM) algorithms, identification accuracy is enhanced by increasing the network scale while sacrificing calculation speed, which restricts the efficiency of identification. In this study, a multi-feature (active/reactive power and current peak-to-peak value) fusion algorithm is proposed, which can achieve high accuracy with a smaller network while maintaining speed. The feature amplitudes of loads are transformed into the values of red-green-blue (RGB) color channels by coding and then fused with the V - I trajectory...
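The coding step described above — mapping three load features onto RGB channel values — can be sketched as a min-max normalization. The value ranges and rounding here are illustrative assumptions; the paper's exact coding scheme is not reproduced in this abstract.

```python
import numpy as np

def features_to_rgb(active_p, reactive_q, i_pp, ranges):
    """Map (active power, reactive power, current peak-to-peak)
    to 8-bit R, G, B values via per-feature min-max coding.

    ranges: list of assumed (lo, hi) bounds, one per feature.
    Values outside the bounds are clipped before scaling.
    """
    feats = (active_p, reactive_q, i_pp)
    rgb = []
    for v, (lo, hi) in zip(feats, ranges):
        rgb.append(int(round(255 * (np.clip(v, lo, hi) - lo) / (hi - lo))))
    return tuple(rgb)
```

Each appliance's feature triple then becomes a color that can be painted onto its V - I trajectory image, giving the downstream classifier a compact multi-feature representation.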
The user's Quality of Experience (QoE) is an assessment of the human experience. It is influenced not only by Quality of Service (QoS) parameters but also by the user's preference for video content. This article studies content preference and how QoE changes on account of these biases. Through two experiments, it is concluded that the higher the preference score is, the less the user-rated MOS reduces when the resolution decreases. That is, the more a user likes the content, the more tolerant they will be to quality reduction of videos. Then a network resource management strategy is proposed based...
In the past few years, with continuous breakthroughs of technology in various fields, artificial intelligence has been considered a revolutionary technology. One of its most important and useful applications is face detection. The outbreak of COVID-19 promoted the development of noncontact identity authentication systems, and face detection is also one of the key techniques in this kind of system. However, current real-time methods are computationally expensive, which hinders the application of face recognition. To address this issue, we propose a verification...
Transfer learning has become the de facto practice to reuse a deep neural network (DNN) that is pre-trained with abundant training data in a source task to improve model performance on target tasks with smaller-scale data. In this paper, we first investigate the correlation between DNNs' pre-training performance and their transfer results on downstream tasks. We find that high pre-training performance does not necessarily imply high transferability. We then propose a metric, named Fréchet Pre-train Distance, to estimate the transferability of a network. By applying...
With the development of mobile applications into a part of modern life, user usage behavior data can well reflect the attribute characteristics of users, and for many downstream applications, including advertising and recommendations, it can provide effective support. To provide users with customized services and optimize their experience, industry scholars have been exploring feasible solutions. However, automatic user modeling based on app usage faces unique challenges, such as (1) poor generalization performance on a single task, and (2) uneven...
We introduce GenUSD, an end-to-end text-to-scene generation framework that transforms natural language queries into realistic 3D scenes, including objects and layouts. The process involves two main steps: 1) A Large Language Model (LLM) generates a scene layout hierarchically. It first proposes a high-level plan to decompose the scene into multiple functionally and spatially distinct subscenes. Then, for each subscene, the LLM generates objects with detailed positions, poses, sizes, and descriptions. To manage complex object...
Human-computer interaction (HCI) mainly studies the information exchange between users and systems; it includes two parts: user-to-system and system-to-user exchange. HCI technology, as an interface between humans and computers and as a human-centered methodology to guide system development, plays a very important role in the development of both. The opportunities for human-computer interaction are enormous. Advances toward more practical and natural interfaces will be invaluable and will profoundly change our daily...