- Advanced Neural Network Applications
- Botanical Research and Chemistry
- Chemical synthesis and alkaloids
- Bioactive natural compounds
- Domain Adaptation and Few-Shot Learning
- Natural product bioactivities and synthesis
- Analytical Chemistry and Chromatography
- Topic Modeling
- Adversarial Robustness in Machine Learning
- Advanced Image and Video Retrieval Techniques
- Natural Language Processing Techniques
- Bauxite Residue and Utilization
- Video Surveillance and Tracking Methods
- Visual Attention and Saliency Detection
- Medicinal Plant Pharmacodynamics Research
- Traditional Chinese Medicine Analysis
- Electrokinetic Soil Remediation Techniques
- Adsorption and biosorption for pollutant removal
- Nanoparticle-Based Drug Delivery
- Plant-based Medicinal Research
- Expert finding and Q&A systems
- Simulation Techniques and Applications
- Microfluidic and Capillary Electrophoresis Applications
- Brain Tumor Detection and Classification
- Graphene and Nanomaterials Applications
Zhejiang Gongshang University
2023-2025
Taiyuan University of Technology
2022-2024
University of California, Santa Barbara
2022
Beijing University of Posts and Telecommunications
2019-2021
Shanxi University
2019
Yankton Rural Area Health Education Center
2010
East China University of Science and Technology
2009-2010
Beijing University of Chinese Medicine
2000-2001
Hoshi University
1999-2000
Beijing Hospital of Traditional Chinese Medicine
1997
Due to their complex attention mechanisms and model design, most existing vision Transformers (ViTs) cannot perform as efficiently as convolutional neural networks (CNNs) in realistic industrial deployment scenarios, e.g., TensorRT and CoreML. This poses a distinct challenge: Can a visual neural network be designed to infer as fast as CNNs and perform as powerfully as ViTs? Recent works have tried to design CNN-Transformer hybrid architectures to address this issue, yet the overall performance of these works is far from satisfactory. To end these,...
We discover that common diffusion noise schedules do not enforce the last timestep to have zero signal-to-noise ratio (SNR), and some implementations of diffusion samplers do not start from the last timestep. Such designs are flawed and do not reflect the fact that the model is given pure Gaussian noise at inference, creating a discrepancy between training and inference. We show that the flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to only generate images with medium brightness and prevents it from generating very bright and dark...
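The fix implied by this abstract is to rescale the noise schedule so that the cumulative signal coefficient reaches exactly zero at the final timestep, making the terminal SNR zero. The sketch below illustrates that rescaling on a standard linear beta schedule; the function name and schedule parameters are illustrative, not taken from the paper's code.

```python
import numpy as np

def rescale_zero_terminal_snr(betas):
    """Rescale a beta schedule so the final timestep has exactly zero SNR.

    Shifts sqrt(alpha_bar) so its terminal value is 0, then rescales so the
    initial value is preserved, and converts back to betas. A minimal sketch
    of the rescaling idea described in the abstract.
    """
    alphas = 1.0 - betas
    alphas_bar = np.cumprod(alphas)
    sqrt_ab = np.sqrt(alphas_bar)

    sqrt_ab_0 = sqrt_ab[0]   # initial value, to be preserved
    sqrt_ab_T = sqrt_ab[-1]  # terminal value, to be driven to zero

    # Shift so the terminal value becomes 0, rescale so the first value stays.
    sqrt_ab = (sqrt_ab - sqrt_ab_T) * sqrt_ab_0 / (sqrt_ab_0 - sqrt_ab_T)

    # Convert the rescaled alpha_bar back into per-step alphas, then betas.
    alphas_bar = sqrt_ab ** 2
    alphas = alphas_bar[1:] / alphas_bar[:-1]
    alphas = np.concatenate([alphas_bar[:1], alphas])
    return 1.0 - alphas

# Illustrative linear schedule (1000 steps); after rescaling, the terminal
# cumulative signal is exactly zero, i.e. the last timestep is pure noise.
betas = np.linspace(1e-4, 0.02, 1000)
new_betas = rescale_zero_terminal_snr(betas)
```

With the original schedule, `np.cumprod(1 - betas)[-1]` is small but nonzero, so training never shows the model pure noise; after rescaling it is exactly zero, matching the inference-time input distribution.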
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large models in two commonly used configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed...
Channel pruning can significantly accelerate and compress deep neural networks. Many channel pruning works utilize structured sparsity regularization to zero out all the weights in some channels and automatically obtain a structure-sparse network in the training stage. However, these methods apply the regularization on each layer separately, where correlations between consecutive layers are omitted. In this paper, we first combine one out-channel in the current layer and the corresponding in-channel in the next layer as a group, namely out-in-channel. Our proposed...
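The grouping described above can be sketched as a group-lasso regularizer: for each channel index c, the weights of out-channel c in the current layer and in-channel c in the next layer form one group, so the sparsity penalty pushes both sides to zero jointly. This is a minimal illustration of the grouping only, not the authors' full training objective; names and shapes are illustrative.

```python
import numpy as np

def out_in_channel_group_norms(w_cur, w_next):
    """L2 norm of each out-in-channel group.

    w_cur:  weights of the current conv layer, shape (out, in, kH, kW)
    w_next: weights of the next conv layer,    shape (out, in, kH, kW)
    Group c = {out-channel c of w_cur} + {in-channel c of w_next}.
    """
    assert w_cur.shape[0] == w_next.shape[1]
    norms = []
    for c in range(w_cur.shape[0]):
        group = np.concatenate([w_cur[c].ravel(), w_next[:, c].ravel()])
        norms.append(np.linalg.norm(group))
    return np.array(norms)

def group_sparsity_penalty(w_cur, w_next, lam=1e-4):
    # Sum of group L2 norms: the structured-sparsity term added to the loss.
    return lam * out_in_channel_group_norms(w_cur, w_next).sum()

# Illustrative check: if channel 0 is zeroed on both sides, its group norm
# is exactly zero and the whole out-in-channel can be pruned.
rng = np.random.default_rng(0)
w_cur = rng.normal(size=(4, 3, 3, 3))    # current layer: 4 out-channels
w_next = rng.normal(size=(8, 4, 3, 3))   # next layer: 4 in-channels
w_cur[0] = 0.0
w_next[:, 0] = 0.0
norms = out_in_channel_group_norms(w_cur, w_next)
```

Pruning a whole group removes a channel's output and its consumption downstream at once, which is what makes the resulting sparsity structurally exploitable.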
In this study, the adsorption of heavy metal ions (Pb(II), Cu(II) and Cd(II)) from water by peanut shells (PS), sawdust (S) and commercial activated carbon (AC) was comparatively studied. Thus, the relationship between different parameters and ion removal rates was investigated. The adsorption capacity of the three adsorbents increased with an increase in temperature, pH value, contact time, adsorbent dosage and ion concentration; however, it decreased with increasing particle size. All adsorption processes are better described by the Langmuir isotherm or...
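The Langmuir isotherm mentioned here relates the equilibrium adsorption capacity q to the equilibrium concentration C as q = q_max·K·C / (1 + K·C), saturating toward the monolayer capacity q_max. The sketch below evaluates this model; the parameter values are purely illustrative, not fitted values from the study.

```python
import numpy as np

def langmuir(C, q_max, K):
    """Langmuir isotherm: q = q_max * K * C / (1 + K * C).

    C:     equilibrium ion concentration (mg/L)
    q_max: maximum (monolayer) adsorption capacity (mg/g)
    K:     Langmuir equilibrium constant (L/mg)
    """
    return q_max * K * C / (1.0 + K * C)

# Illustrative parameters (not data from the study): capacity rises with
# concentration and saturates toward q_max.
C = np.array([1.0, 10.0, 100.0, 1000.0])
q = langmuir(C, q_max=30.0, K=0.05)
```

In practice the fitted q_max values are what allow the three adsorbents (PS, S, AC) to be ranked against each other on a common scale.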
Recent advancements in diffusion models have unlocked unprecedented abilities in visual creation. However, current text-to-video generation models struggle with the trade-off among movement range, action coherence and object consistency. To mitigate this issue, we present a controllable text-to-video (T2V) model, called Control-A-Video, capable of maintaining consistency while allowing customizable video synthesis. Based on a pre-trained conditional text-to-image (T2I) model, our model aims to generate videos conditioned on a sequence...
We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, it demonstrates...
Recent advancements in text-to-image models and corresponding personalization technologies enable individuals to generate high-quality imaginative images. However, these models often suffer from limitations when generating images with resolutions outside of their trained domain. To overcome this limitation, we present the resolution adapter \textbf{(ResAdapter)}, a domain-consistent adapter designed for diffusion models to generate images with unrestricted resolutions and aspect ratios. Unlike other multi-resolution generation methods that process static...
Camera scene detection is among the most popular computer vision problems on smartphones. While many custom solutions were developed for this task by phone vendors, none of the designed models were available publicly up until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions that can demonstrate real-time performance on smartphones and IoT platforms. For this, the participants were provided with the large-scale CamSDD dataset...
In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters. However, conventional MoE architectures like GShard, which activate the top-$K$ out of $N$ experts, face challenges in ensuring expert specialization, i.e., that each expert acquires non-overlapping and focused knowledge. In response, we propose the DeepSeekMoE architecture towards ultimate expert specialization. It involves two principal strategies: (1) finely segmenting the experts...
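The GShard-style routing referenced above can be sketched as a top-K gate: each token is sent to the K experts with the highest router logits, with the softmax weights renormalized over the selected experts. Under fine-grained segmentation (the first strategy named in the abstract), each expert is split into m smaller ones, so N and K both scale by m at roughly constant activated compute. A minimal sketch with illustrative names, not the DeepSeekMoE implementation.

```python
import numpy as np

def topk_gate(logits, k):
    """Top-K MoE gating: pick the k highest-scoring experts per token and
    renormalize their softmax weights over the selected set.

    logits: router scores, shape (tokens, num_experts)
    Returns (indices of chosen experts, their gating weights).
    """
    topk_idx = np.argsort(logits, axis=-1)[:, -k:]            # (tokens, k)
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    # Stable softmax restricted to the selected experts.
    weights = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return topk_idx, weights

# One token, N=4 experts, K=2: the two highest-logit experts are selected
# and their gating weights sum to 1.
logits = np.array([[0.1, 2.0, -1.0, 0.5]])
idx, w = topk_gate(logits, k=2)
```

Fine-grained segmentation keeps this routine unchanged but raises the number of (smaller) routable experts, giving the router many more possible expert combinations per token.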
We revisit the existing excellent Transformers from the perspective of practical application. Most of them are not even as efficient as the basic ResNet series and deviate from the realistic deployment scenario. It may be due to the current criteria for measuring computation efficiency, such as FLOPs or parameters, being one-sided, sub-optimal, and hardware-insensitive. Thus, this paper directly treats the TensorRT latency on specific hardware as an efficiency metric, which provides more comprehensive feedback involving computational...
Recently, Transformer networks have achieved impressive results on a variety of vision tasks. However, most of them are computationally expensive and not suitable for real-world mobile applications. In this work, we present the Mobile Convolutional Vision Transformer (MoCoViT), which improves performance and efficiency by introducing transformer blocks into convolutional networks to leverage the benefits of both architectures. Different from recent works on vision transformers, the mobile transformer block in MoCoViT is carefully designed for mobile devices and is very lightweight,...