Mingjie Wang

ORCID: 0000-0002-7346-1110
Research Areas
  • Video Surveillance and Tracking Methods
  • Anomaly Detection Techniques and Applications
  • Advanced Neural Network Applications
  • Human Pose and Action Recognition
  • Image Enhancement Techniques
  • Advanced biosensing and bioanalysis techniques
  • Advanced Image Processing Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Fire Detection and Safety Systems
  • Graphene and Nanomaterials Applications
  • Advanced Vision and Imaging
  • RNA Interference and Gene Delivery
  • Domain Adaptation and Few-Shot Learning
  • Computer Graphics and Visualization Techniques
  • 3D Shape Modeling and Analysis
  • 3D Surveying and Cultural Heritage
  • Speech and Audio Processing
  • Advanced Image and Video Retrieval Techniques
  • Face recognition and analysis
  • Emotion and Mood Recognition
  • Gait Recognition and Analysis
  • Machine Learning and ELM
  • Carbon and Quantum Dots Applications
  • Sports, Gender, and Society
  • Human Mobility and Location-Based Analysis

Wuhan Polytechnic University
2023-2024

Zhejiang Sci-Tech University
2023-2024

China People's Public Security University
2024

Ruijin Hospital
2024

Shanghai Jiao Tong University
2024

Dalian Neusoft University of Information
2024

Jilin University
2024

Second Affiliated Hospital of Jilin University
2024

Fudan University
2024

Shanghai Medical College of Fudan University
2024

State-of-the-art approaches for crowd counting resort to deep neural networks to predict density maps. However, counting people in congested scenes remains a challenging task because the presence of drastic scale variation, inconsistency, and complex background can seriously degrade their accuracy. To battle the ingrained issue of accuracy degradation, in this paper, we propose a novel and powerful network called Scale Tree Network (STNet) for accurate counting. STNet consists of two key components: Scale-Tree Diversity...

10.1109/tmm.2022.3142398 article EN IEEE Transactions on Multimedia 2022-01-13

Crowd counting has attracted increasing attention in recent years due to its challenges and wide societal applications. Despite persevering efforts made by the research community, most existing methods require a large amount of location-level annotations. Collecting this type of fine-granularity supervisory signal is extremely time-consuming and labour-intensive, thereby hindering the generalization of these location-adherent models. To shun this drawback, several pioneering studies open promising...

10.1109/wacv56688.2023.00025 article EN 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023-01-01

The tumor immune microenvironment (TIME) can limit the effectiveness and often leads to significant side effects of conventional cancer therapies. Consequently, there is a growing interest in identifying novel targets to enhance the efficacy of targeted therapy. More research indicates that tumor-associated macrophages (TAMs), originating from peripheral blood monocytes generated from bone marrow myeloid progenitor cells, play a crucial role in the tumor microenvironment (TME) and are closely associated with resistance to traditional Lipid...

10.1016/j.intimp.2024.112319 article EN cc-by-nc-nd International Immunopharmacology 2024-05-26

The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video. Despite moderate improvements in current approaches, they commonly require high-quality homologous data sources of videos and audios, and thus fail to leverage heterogeneous data sufficiently. In practice, it may be intractable to collect perfect data in some cases, for example, audio-corrupted or picture-blurry videos. To explore this kind of data to support high-fidelity dubbing,...

10.1145/3474085.3475318 preprint EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

In this paper, we present a novel Hierarchically-fused Generative Adversarial Network (HfGAN) for synthesizing realistic images from text descriptions. While existing approaches on this topic have achieved impressive success, to generate 256×256 images from captions, they commonly resort to a coarse-to-fine scheme and associate multiple discriminators in different stages of the networks. Such a strategy is both inefficient and prone to artifacts. Motivated by the above findings, we propose an end-to-end network that can...

10.1109/crv.2019.00018 article EN 2019-05-01

10.1007/s11042-017-5258-9 article EN Multimedia Tools and Applications 2017-10-03

10.1016/j.bspc.2024.106591 article EN Biomedical Signal Processing and Control 2024-06-29

Crowd counting from unconstrained and congested scenes is an important task in computer vision. Its main difficulties stem from large scale/density variation and proneness to over-fitting. This paper presents a novel end-to-end stochastic multi-scale aggregation network (SMANet) which carefully addresses these issues. Specifically, general features are first extracted by the front-end subnetwork and then fed into the back-end subnetwork, which consists of a stochastic multi-scale aggregation module, a density map generator, and a global prior encoder. The impels...

10.1109/icassp40776.2020.9054238 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

10.1016/j.jvcir.2020.102861 article EN Journal of Visual Communication and Image Representation 2020-07-25

Recently, Convolutional Neural Networks (CNNs) have obtained huge success in numerous vision tasks. In particular, DenseNets have demonstrated that feature reuse via dense skip connections can effectively alleviate the difficulty of training very deep networks, and that reusing features generated by the initial layers in all subsequent layers has a strong impact on performance. To feed even richer information into the network, a novel adaptive Multi-scale Aggregation module is presented in this paper. Composed for multi-scale...

10.1109/wacv.2019.00040 preprint EN 2019-01-01

A robust solution for semi-dense stereo matching is presented. It utilizes two CNN models for computing matching cost and performing confidence-based filtering, respectively. Compared to existing CNN-based cost generation approaches, our method feeds additional global information into the network so that the learned model can better handle challenging cases, such as lighting changes and lack of textures. Through utilizing non-parametric transforms, our method is also more self-reliant than most existing methods, which rely highly on adjustment...

10.1109/wacv.2019.00174 article EN 2019-01-01

The ever-increasing demands for intuitive interactions in Virtual Reality have triggered a boom in the realm of Facial Expression Recognition (FER). To address the limitations of existing approaches (e.g., narrow receptive fields and homogenous supervisory signals) and further cement the capacity of FER tools, a novel multifarious supervision-steering Transformer for FER in the wild is proposed in this paper. Referred to as FER-former, our approach features multi-granularity embedding integration, hybrid self-attention scheme,...

10.48550/arxiv.2303.12997 preprint EN other-oa arXiv (Cornell University) 2023-01-01