Ran Xu

ORCID: 0000-0003-2913-9420
Research Areas
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Anomaly Detection Techniques and Applications
  • Music and Audio Processing
  • Video Analysis and Summarization
  • Visual Attention and Saliency Detection
  • Domain Adaptation and Few-Shot Learning
  • Sparse and Compressive Sensing Techniques
  • IoT and Edge/Fog Computing
  • CCD and CMOS Imaging Sensors
  • Energy Efficient Wireless Sensor Networks
  • Advanced Image and Video Retrieval Techniques
  • Speech Recognition and Synthesis
  • Face and Expression Recognition
  • Advanced Measurement and Detection Methods
  • Speech and Audio Processing
  • Natural Language Processing Techniques
  • Adversarial Robustness in Machine Learning
  • Advanced Image Fusion Techniques
  • Context-Aware Activity Recognition Systems
  • Image Processing Techniques and Applications
  • Privacy-Preserving Technologies in Data
  • Advanced Algorithms and Applications

Salesforce (United States)
2020-2024

Tianjin University
2009-2024

Hangzhou Dianzi University
2023-2024

Chongqing University of Posts and Telecommunications
2024

Southwest University
2021-2022

Purdue University West Lafayette
2018-2022

Beijing Electronic Science and Technology Institute
2020-2021

Tsinghua University
2018

University at Buffalo, State University of New York
2012-2017

Buffalo State University
2015-2016

Recently, joint video-language modeling has been attracting more and more attention. However, most existing approaches focus on exploring the language model upon a fixed visual model. In this paper, we propose a unified framework that jointly models video and the corresponding text sentences. The framework consists of three parts: a compositional semantics language model, a deep video model, and a joint embedding model. In our language model, a dependency-tree structure embeds a sentence into a continuous vector space, which preserves visually grounded meanings and word order. We leverage...

10.1609/aaai.v29i1.9512 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2015-02-19

The recognition capabilities of current state-of-the-art 3D models are limited by datasets with a small number of annotated data and a pre-defined set of categories. In its 2D counterpart, recent advances have shown that similar problems can be significantly alleviated by employing knowledge from other modalities, such as language. Inspired by this, leveraging multimodal information for the 3D modality could be promising to improve 3D understanding under the restricted data regime, but this line of research is not well...

10.1109/cvpr52729.2023.00120 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion...

10.48550/arxiv.2401.05561 preprint EN cc-by-nc-sa arXiv (Cornell University) 2024-01-01

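In this paper, we focus on semi-supervised object detection to boost the performance of proposal-based object detectors (a.k.a. two-stage object detectors) by training on both labeled and unlabeled data. However, it is non-trivial to train object detectors on unlabeled data due to the unavailability of ground truth labels. To address this problem, we present a proposal learning approach to learn proposal features and predictions from both labeled and unlabeled data. The approach consists of a self-supervised proposal learning module and a consistency-based proposal learning module. In the self-supervised proposal learning module, we present a proposal location loss and a contrastive loss to learn context-aware and noise-robust proposal features, respectively....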

10.1109/wacv48630.2021.00234 article EN 2021-01-01

Current contrastive learning frameworks focus on leveraging a single supervisory signal to learn representations, which limits the efficacy on unseen data and downstream tasks. In this paper, we present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes. We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint. The loss function is data driven and automatically adapts to arbitrary multi-label structures....

10.1109/cvpr52688.2022.01616 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Detailed analysis of human action, such as action classification, detection and localization, has received increasing attention from the community; datasets like JHMDB have made it plausible to conduct studies analyzing the impact that such deeper information has on the greater action understanding problem. However, detailed automatic segmentation of human action has comparatively been unexplored. In this paper, we take a step in this direction and propose a hierarchical MRF model to bridge low-level video fragments with high-level human motion...

10.1109/cvpr.2015.7299000 article EN 2015-06-01

Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications. Previous methods rely on tedious annotations of temporal action boundaries for training, which hinders the scalability of online action detection systems. We propose WOAD, a weakly supervised framework that can be trained using only video-class labels. WOAD contains two jointly-trained modules, i.e., a temporal proposal generator (TPG) and an online action recognizer (OAR). Supervised by video-class labels, TPG works offline and targets at...

10.1109/cvpr46437.2021.00195 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Triboelectric nanogenerators, as devices that convert mechanical energy into electrical energy, can respond to external pressure stimuli. However, most triboelectric pressure sensors can only perform measurements in a narrow range, which limits their application in multiple scenarios. Here, we propose a wide-range triboelectric pressure sensor based on the difference in Young's modulus of materials and a double-sandwich-structure design. We analyzed the effect of the structural angle at the material surface on the sensor performance and obtained an optimal combination...

10.1021/acsaelm.2c00681 article EN ACS Applied Electronic Materials 2022-07-28

Action analysis in image and video has been attracting more and more attention in computer vision. Recognizing specific actions in video clips has been the main focus. We move in a new, more general direction in this paper and ask the critical fundamental question: what is action, how is action different from motion, and in a given image or video where is the action? We study the philosophical and visual characteristics of action, which lead us to define actionness: intentional bodily movement of biological agents (people, animals). To solve the general problem, we propose the lattice conditional ordinal...

10.1109/cvpr.2014.101 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01

The field of IoT has blossomed and is positively influencing many application domains. In this article, we bring out the unique challenges this field poses to research in computer systems and networking. The unique challenges arise from the unique characteristics of IoT systems, such as the diversity of application domains where they are used and the increasingly demanding protocols they are being called upon to run (such as video and LIDAR processing) on constrained resources (on-node and network). We show how these open challenges can benefit from foundations laid in other areas, such as fifth-generation network cellular...

10.1109/jiot.2020.3007690 article EN publisher-specific-oa IEEE Internet of Things Journal 2020-07-08

Metric learning makes it plausible to learn semantically meaningful distances for complex distributions of data using label or pairwise constraint information. However, to date, most metric learning methods are based on a single Mahalanobis metric, which cannot handle heterogeneous data well. Those that learn multiple metrics throughout the feature space have demonstrated superior accuracy, but at a severe cost to computational efficiency. Here, we adopt a new angle on the metric learning problem and learn a single metric that is able to implicitly adapt its distance...

10.1145/2339530.2339680 article EN 2012-08-12
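
As a point of reference for the single Mahalanobis metric that the abstract above contrasts against, the distance between two points is sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M. The following is a minimal sketch of that baseline only, not the method of the paper; the function name and the example matrices are hypothetical and chosen purely for illustration.

```python
import math


def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    # Quadratic form d^T M d, accumulated row by row.
    q = sum(d[i] * sum(M[i][j] * d[j] for j in range(n)) for i in range(n))
    return math.sqrt(q)


# Hypothetical metrics: the identity recovers Euclidean distance,
# while `stretched` weights the first feature four times as heavily.
identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]

x, y = [1.0, 2.0], [4.0, 6.0]
print(mahalanobis(x, y, identity))   # plain Euclidean distance
print(mahalanobis(x, y, stretched))  # same points under the stretched metric
```

Because M is fixed, the same weighting applies everywhere in the feature space, which is the limitation on heterogeneous data that the abstract points to.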

Advanced video analytic systems, including scene classification and object detection, have seen widespread success in various domains such as smart cities and autonomous systems. With an evolution of heterogeneous client devices, there is incentive to move these heavy video analytics workloads from the cloud to mobile devices for low latency and real-time processing and to preserve user privacy. However, most video analytic systems are heavyweight and are trained offline with some pre-defined latency or accuracy requirements. This makes them...

10.1145/3384419.3431159 article EN 2020-11-16

In response to the increasing safety concerns posed by low, slow, and small unmanned aerial vehicles (UAVs), the use of flexible nets for interception emerges as a promising solution due to its high tolerance, minimal requirements, and cost-effectiveness. To enhance the effectiveness of the net capture system for these types of UAVs, an optimization of the system’s parameters is conducted. A dynamic model is developed, and the deployment process is simulated and analyzed through a combination of ABAQUS 2022/Explicit and MATLAB R2020b software. The...

10.3390/drones9030190 article EN cc-by Drones 2025-03-04

Forward-looking correlated imaging plays an increasingly important role in modern radar systems. It overcomes the disadvantages of traditional side or squint synthetic aperture radar (SAR), which is dependent on specific relative motion between the radar and the target scene. A new microwave forward-looking 3-D imaging method based on a random radiation field combined with sparse reconstruction is proposed in this article. Firstly, phased array radar (PAR) is adopted to form different antenna patterns. Then, based on compressed sensing (CS) theory,...

10.1109/tgrs.2020.3047018 article EN IEEE Transactions on Geoscience and Remote Sensing 2021-02-09

This paper proposes a new learning method, which integrates feature selection with classifier construction for human detection via solving three optimization models. Firstly, the method trains a series of weak-classifiers by the proposed L1-norm Minimization Learning (LML) and min-max penalty function models. Secondly, the method selects the weak-classifiers using an integer optimization model to construct a strong classifier. The minimization models aim to find the minimal VC-dimension for the weak and strong classifiers, respectively. Finally, the method constructs a cascade of LML (CLML) to reach...

10.1109/cvpr.2010.5540224 article EN 2010-06-01

With the increase in the number of cores in modern architectures, the need for co-locating multiple workloads has become crucial for improving the overall compute utilization. However, co-locating multiple workloads on the same server is often avoided to protect the performance of latency sensitive (LS) workloads from the contentions created by other co-located workloads on shared resources, such as cache and memory bandwidth.

10.1145/3274808.3274820 article EN 2018-11-26

An adaptive video object detection system selects different execution paths at runtime, based on video content and available resources, so as to maximize accuracy under a target latency objective (e.g., 30 frames per second). Such a system is well suited to mobile devices with limited computing resources, often running multiple contending applications. Existing solutions suffer from two major drawbacks. First, collecting feature values to decide on an execution branch is expensive. Second, there is a switching overhead for transitioning...

10.1145/3492321.3519577 article EN 2022-03-28
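
The selection rule described in the abstract above (maximize accuracy under a target latency objective) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual scheduler: the `Branch` record, the `pick_branch` function, and all latency and accuracy figures are hypothetical, and the sketch ignores the feature-collection cost and switching overhead that the abstract names as the real difficulties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """One execution path with hypothetical predicted cost and quality."""
    name: str
    predicted_latency_ms: float
    predicted_accuracy: float


def pick_branch(branches, budget_ms):
    """Return the most accurate branch whose predicted latency fits the budget."""
    feasible = [b for b in branches if b.predicted_latency_ms <= budget_ms]
    if not feasible:
        # Nothing fits the budget: fall back to the fastest branch.
        return min(branches, key=lambda b: b.predicted_latency_ms)
    return max(feasible, key=lambda b: b.predicted_accuracy)


# Hypothetical branches; the numbers are made up for illustration.
branches = [
    Branch("small", 12.0, 0.55),
    Branch("medium", 28.0, 0.68),
    Branch("large", 60.0, 0.75),
]

budget_ms = 1000.0 / 30  # per-frame budget at 30 frames per second
print(pick_branch(branches, budget_ms).name)
```

With a per-frame budget of about 33 ms, only the two cheaper branches are feasible, so the rule picks the more accurate of those two.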