Shaoyu Chen

ORCID: 0000-0002-1856-6294
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Autonomous Vehicle Technology and Safety
  • Robotics and Sensor-Based Localization
  • Data Management and Algorithms
  • Data Visualization and Analytics
  • Video Surveillance and Tracking Methods
  • Advanced Vision and Imaging
  • Human Pose and Action Recognition
  • Augmented Reality Applications
  • Multimodal Machine Learning Applications
  • Artificial Intelligence in Games
  • Image and Video Quality Assessment
  • Peer-to-Peer Network Technologies
  • Distributed and Parallel Computing Systems
  • Visual Attention and Saliency Detection
  • Geochemistry and Geologic Mapping
  • 3D Shape Modeling and Analysis
  • Transportation and Mobility Innovations
  • Robotic Path Planning Algorithms
  • Human Motion and Animation
  • Infrared Target Detection Methodologies
  • Digital Games and Media
  • Text Readability and Simplification
  • Digital Media Forensic Detection

New York University
2021-2025

Shanghai Electric (China)
2023-2024

Huazhong University of Science and Technology
2021-2024

China University of Geosciences (Beijing)
2024

City University of New York
2023

Fujian Business University
2023

Horizon Robotics (China)
2022

Georgia Institute of Technology
2017-2021

National Central University
2020

Tianjin Research Institute of Electric Science (China)
2020

Instance segmentation on point clouds is a fundamental task in 3D scene perception. In this work, we propose a concise clustering-based framework named HAIS, which makes full use of the spatial relation of points and point sets. Considering that clustering-based methods may result in over-segmentation or under-segmentation, we introduce hierarchical aggregation to progressively generate instance proposals, i.e., point aggregation for preliminarily clustering points into sets, and set aggregation for generating complete instances from sets. Once the complete 3D instances are obtained, a sub-network of intra-instance...

10.1109/iccv48922.2021.01518 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
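
The point-aggregation step can be sketched as radius-based grouping. This is a minimal illustration, not the HAIS implementation: it assumes points have already been shifted toward their predicted instance centers, uses a hypothetical fixed `radius`, and groups points of the same semantic class by breadth-first search. The later set-aggregation and intra-instance stages are not shown.

```python
import math
from collections import deque


def point_aggregation(shifted_points, labels, radius):
    """Group points into preliminary sets (hypothetical simplification of HAIS).

    shifted_points: list of (x, y, z) tuples, assumed already shifted toward
                    their predicted instance centers.
    labels:         per-point semantic class ids.
    radius:         hypothetical fixed grouping distance.
    Returns a list of sets, each a sorted list of point indices.
    """
    n = len(shifted_points)
    visited = [False] * n
    point_sets = []
    for seed in range(n):
        if visited[seed]:
            continue
        # Breadth-first search over same-class neighbours within `radius`.
        visited[seed] = True
        queue = deque([seed])
        members = []
        while queue:
            i = queue.popleft()
            members.append(i)
            for j in range(n):
                if (not visited[j] and labels[j] == labels[i]
                        and math.dist(shifted_points[i], shifted_points[j]) <= radius):
                    visited[j] = True
                    queue.append(j)
        point_sets.append(sorted(members))
    return point_sets


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.05, 5.0, 5.0), (0.05, 0.0, 0.0)]
labels = [1, 1, 1, 1, 2]
print(point_aggregation(points, labels, radius=0.2))  # [[0, 1], [2, 3], [4]]
```

The O(N²) neighbour scan keeps the sketch short; a practical pipeline would use a spatial index such as a voxel hash or k-d tree.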

In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. Previously, most instance segmentation methods heavily rely on object detection and perform mask prediction based on bounding boxes or dense centers. In contrast, we propose a sparse set of instance activation maps, as a new object representation, to highlight informative regions for each foreground object. Then instance-level features are obtained by aggregating features according to the highlighted regions for recognition and segmentation. Moreover,...

10.1109/cvpr52688.2022.00439 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
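
The aggregation step can be illustrated as a weighted average of pixel features, one average per activation map. This is a sketch under assumptions: the feature map is flattened to `H*W` rows of `C` channels, each activation map holds `H*W` logits passed through a sigmoid, and each instance feature is normalized by the sum of its activation weights. The function names are hypothetical, and plain lists stand in for tensors.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def aggregate_instance_features(feature_map, activation_logits):
    """Pool pixel features into one feature vector per instance activation map.

    feature_map:       H*W pixels, each a list of C channel values.
    activation_logits: N maps, each a list of H*W logits.
    Returns N lists of C values: sum_p a_i(p) * x(p) / sum_p a_i(p).
    """
    channels = len(feature_map[0])
    instance_features = []
    for logits in activation_logits:
        weights = [sigmoid(v) for v in logits]
        total = sum(weights)  # strictly positive, since sigmoid > 0
        instance_features.append([
            sum(w * pixel[c] for w, pixel in zip(weights, feature_map)) / total
            for c in range(channels)
        ])
    return instance_features


features = [[1.0, 0.0], [0.0, 1.0]]  # 2 pixels, 2 channels
maps = [[0.0, 0.0], [10.0, -10.0]]   # a uniform map and a map focused on pixel 0
print(aggregate_instance_features(features, maps))
```

A uniform map averages all pixels equally, while a sharply peaked map returns essentially the feature of the pixel it highlights.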

Autonomous driving requires a comprehensive understanding of the surrounding environment for reliable trajectory planning. Previous works rely on dense rasterized scene representation (e.g., agent occupancy and semantic map) to perform planning, which is computationally intensive and misses the instance-level structure information. In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation. The proposed vectorized paradigm has two significant advantages. On one...

10.1109/iccv51070.2023.00766 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Machine learning advances have afforded an increase in algorithms capable of creating art, music, stories, games, and more. However, it is not yet well-understood how machine learning algorithms might best collaborate with people to support creative expression. To investigate how practicing designers perceive the role of AI in the creative process, we developed a game level design tool for Super Mario Bros.-style games with a built-in AI level designer. In this paper we discuss our design of the Morai Maker intelligent tool through two mixed-methods studies with a total of over...

10.1145/3290605.3300854 preprint EN 2019-04-29

Labeling objects with pixel-wise segmentation requires a huge amount of human labor compared to bounding boxes. Most existing methods for weakly supervised instance segmentation focus on designing heuristic losses with priors from bounding boxes. However, we find that box-supervised methods can produce some fine segmentation masks, and we wonder whether the detectors could learn from these fine masks while ignoring low-quality masks. To answer this question, we present BoxTeacher, an efficient and end-to-end training framework for high-performance weakly supervised instance segmentation, which leverages...

10.1109/cvpr52729.2023.00307 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

High-definition (HD) map provides abundant and precise environmental information of the driving scene, serving as a fundamental and indispensable component for planning in an autonomous driving system. We present MapTR, a structured end-to-end Transformer for efficient online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query...

10.48550/arxiv.2208.14437 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01
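
The idea of permutation-equivalent modeling can be sketched as taking the minimum matching cost over every point ordering that describes the same map element. In this minimal sketch, a polyline has two equivalent orderings (forward and reversed), and a polygon, assumed to be stored without repeating its first point, has 2N (every cyclic shift in both directions). The function names and the plain L1 point cost are illustrative assumptions, not MapTR's actual loss.

```python
def equivalent_permutations(points, closed):
    """List every ordering of `points` that describes the same map element."""
    if not closed:
        # Polyline: the same shape read forwards or backwards.
        return [list(points), list(reversed(points))]
    # Polygon: any starting vertex, in either traversal direction.
    n = len(points)
    perms = []
    for seq in (list(points), list(reversed(points))):
        for shift in range(n):
            perms.append(seq[shift:] + seq[:shift])
    return perms


def l1_cost(pred, target):
    """Sum of absolute coordinate differences between matched points."""
    return sum(abs(p - q) for a, b in zip(pred, target) for p, q in zip(a, b))


def permutation_equivalent_cost(pred, gt, closed):
    """Cost against the best-aligned equivalent ordering of the ground truth."""
    return min(l1_cost(pred, perm) for perm in equivalent_permutations(gt, closed))


lane = [(0, 0), (1, 0), (2, 0)]
print(permutation_equivalent_cost([(2, 0), (1, 0), (0, 0)], lane, closed=False))  # 0
```

Because the prediction is compared against the best-aligned ordering, a correctly shaped prediction is not penalized merely for choosing a different start point or direction.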

3D detection based on a surround-view camera system is a critical technique in autopilot. In this work, we present Polar Parametrization for 3D detection, which reformulates position parametrization, velocity decomposition, perception range, label assignment and loss function in the polar coordinate system. Polar Parametrization establishes explicit associations between image patterns and prediction targets, exploiting the view symmetry of surround-view cameras as an inductive bias to ease optimization and boost performance. Based on Polar Parametrization,...

10.48550/arxiv.2206.10965 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01
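
As a rough illustration of reformulating targets in polar coordinates, the sketch below converts an ego-frame bird's-eye-view position and velocity into range, azimuth, and radial/tangential velocity components, and back. This is a generic Cartesian-to-polar conversion under the assumption that the ego vehicle sits at the origin; it is not the paper's exact parametrization, label assignment, or loss.

```python
import math


def to_polar(x, y, vx, vy):
    """Return (range, azimuth, radial velocity, tangential velocity).

    Assumes (x, y) and (vx, vy) are expressed in an ego-centred BEV frame
    with the ego vehicle at the origin.
    """
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    # Project the velocity onto the unit radial and tangential directions.
    v_radial = vx * math.cos(theta) + vy * math.sin(theta)
    v_tangential = -vx * math.sin(theta) + vy * math.cos(theta)
    return r, theta, v_radial, v_tangential


def from_polar(r, theta, v_radial, v_tangential):
    """Invert to_polar back into Cartesian position and velocity."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    vx = v_radial * math.cos(theta) - v_tangential * math.sin(theta)
    vy = v_radial * math.sin(theta) + v_tangential * math.cos(theta)
    return x, y, vx, vy
```

For example, an object moving directly away from the ego vehicle has a purely radial velocity whatever its azimuth, so `to_polar(3.0, 4.0, 3.0, 4.0)` gives a range of 5, a radial speed of 5, and a tangential speed of (numerically) zero.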

Learning Bird's Eye View (BEV) representation from surrounding-view cameras is of great importance for autonomous driving. In this work, we propose a Geometry-guided Kernel Transformer (GKT), a novel 2D-to-BEV representation learning mechanism. GKT leverages the geometric priors to guide the transformer to focus on discriminative regions and unfolds kernel features to generate the BEV representation. For fast inference, we further introduce a look-up table (LUT) indexing method to get rid of the camera's calibrated parameters at...

10.48550/arxiv.2206.04584 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01
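
A look-up table of this kind can be sketched as follows: camera calibration is used once, offline, to map each BEV reference point to an image pixel, and inference then only indexes into that table. This is a simplified, assumption-laden illustration: it uses a hypothetical 3x4 projection matrix and a single camera, and it samples one pixel per BEV point, whereas GKT as described unfolds kernel features around the projected location.

```python
import math


def build_lut(bev_points, projection, width, height):
    """Precompute the flat pixel index that each BEV reference point projects to.

    projection: hypothetical 3x4 camera matrix (intrinsics times extrinsics).
    Runs once, offline; entries are None when a point lands outside the image
    or behind the camera.
    """
    lut = []
    for x, y, z in bev_points:
        u_h, v_h, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in projection)
        if w <= 0:
            lut.append(None)
            continue
        u, v = math.floor(u_h / w), math.floor(v_h / w)
        lut.append(v * width + u if 0 <= u < width and 0 <= v < height else None)
    return lut


def gather_bev_features(image_features, lut, fill=0.0):
    """At inference time, read features through the table; no calibration needed."""
    return [image_features[i] if i is not None else fill for i in lut]
```

The design point is that the projection cost is paid once per camera setup; at run time, building the BEV features is a plain gather over precomputed indices.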

Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To...

10.48550/arxiv.2502.13144 preprint EN arXiv (Cornell University) 2025-02-18

The concept of an AI assistant for task guidance is rapidly shifting from a science fiction staple to an impending reality. Such a system is inherently complex, requiring models for perceptual grounding, attention, and reasoning, an intuitive interface that adapts to the performer's needs, and the orchestration of data streams from many sensors. Moreover, all data acquired by the system must be readily available for post-hoc analysis to enable developers to understand performer behavior and quickly detect failures. We introduce TIM, the first end-to-end...

10.1109/mcg.2025.3549696 article EN IEEE Computer Graphics and Applications 2025-01-01

Exploring large virtual environments, such as cities, is a central task in several domains, such as gaming and urban planning. VR systems can greatly help this task by providing an immersive experience; however, a common issue with viewing and navigating a city in the traditional sense is that users can obtain either a local or a global view, but not both at the same time, requiring them to continuously switch between perspectives, losing context and distracting them from their analysis. In this paper, our goal is to allow users to navigate to points of...

10.1109/tvcg.2021.3099012 article EN IEEE Transactions on Visualization and Computer Graphics 2021-07-26

Media streaming, with an edge-cloud setting, has been adopted for a variety of applications such as entertainment, visualization, and design. Unlike video/audio streaming, where the content is usually consumed passively, virtual reality applications require 3D assets stored on the edge to facilitate frequent edge-side interactions such as object manipulation and viewpoint movement. Compared to audio and video streaming, 3D asset streaming often requires larger data sizes yet lower latency to ensure sufficient rendering quality, resolution, perceptual...

10.1109/tvcg.2022.3150522 article EN IEEE Transactions on Visualization and Computer Graphics 2022-02-12

The concept of augmented reality (AR) assistants has captured the human imagination for decades, becoming a staple of modern science fiction. To pursue this goal, it is necessary to develop artificial intelligence (AI)-based methods that simultaneously perceive the 3D environment, reason about physical tasks, and model the performer, all in real-time. Within this framework, a wide variety of sensors are needed to generate data across different modalities, such as audio, video, depth, speech, and time-of-flight...

10.1109/tvcg.2023.3327396 article EN IEEE Transactions on Visualization and Computer Graphics 2023-01-01

Text presented in augmented reality provides in-situ, real-time information for users. However, this content can be challenging to apprehend quickly when engaging in cognitively demanding AR tasks, especially when it is presented on a head-mounted display. We propose ARTiST, an automatic text simplification system that uses a few-shot prompt and GPT-3 models to specifically optimize the text length and semantic content for augmented reality. Developed out of a formative study that included seven users and three experts, our system combines a customized error...

10.1145/3613904.3642772 article EN 2024-05-11

In-vehicle automated safety features aim to increase safety; however, they are not always perfect. When these systems fail, they can leave the driver unprepared to recover quickly and safely. Reliability displays, informing the driver of the system's confidence in itself, could help keep drivers aware of the automation's status when failures occur. This study proposed two metrics for displaying this information to the driver: automation reliability (AR), a system-centric metric, and required driver engagement (RDE), a human-centric metric. Visual...

10.1145/3122986.3123007 article EN 2017-09-24

Small object detection requires the detection head to scan a large number of positions on image feature maps, which is extremely hard for computation- and energy-efficient lightweight generic detectors. To accurately detect small objects with limited computation, we propose a two-stage lightweight detection framework with extremely low computation complexity, termed TinyDet. It enables high-resolution feature maps for dense anchoring to better cover small objects, proposes a sparsely-connected convolution for computation reduction, enhances the early stage features in the backbone,...

10.48550/arxiv.2304.03428 preprint EN cc-by arXiv (Cornell University) 2023-01-01

With the explosive growth in demand for lithium (Li) resources, the Mufushan area has become a hotspot for Li deposit exploration in China in recent years. Geochemical maps and geochemical anomaly maps are basic tools for mineral resource exploration. A fixed-value method to contour the geochemical map is presented here, in which Li concentrations are divided into 19 levels based on 18 fixed values, ranging from 5 μg/g (corresponding to the detection limit) to 1858 μg/g (corresponding to the cut-off grade of the hard-rock type), and illustrated with six color tones corresponding to areas ranging from low background to high anomaly,...

10.3390/app14051978 article EN cc-by Applied Sciences 2024-02-28
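
The fixed-value classification itself reduces to a sorted-threshold lookup. In the sketch below, only the two end values (5 and 1858 μg/g) and the count of 18 thresholds come from the abstract; the intermediate thresholds are hypothetical, spaced geometrically purely for illustration, and the mapping of levels to the six color tones is not reproduced.

```python
import bisect


def hypothetical_thresholds(low=5.0, high=1858.0, count=18):
    """HYPOTHETICAL: geometric spacing between the published end values."""
    ratio = high / low
    return [round(low * ratio ** (k / (count - 1)), 1) for k in range(count)]


def li_level(concentration, thresholds):
    """Map a Li concentration (μg/g) to a level from 1 to len(thresholds) + 1.

    Values below the first threshold fall in level 1; values at or above the
    last threshold fall in the top level.
    """
    return bisect.bisect_right(thresholds, concentration) + 1


levels = hypothetical_thresholds()
print(li_level(3.0, levels), li_level(5.0, levels), li_level(2500.0, levels))  # 1 2 19
```

Because the thresholds are fixed rather than derived from each survey's statistics, maps of different areas classified this way stay directly comparable.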

High-definition (HD) map provides abundant and precise static environmental information of the driving scene, serving as a fundamental and indispensable component for planning in an autonomous driving system. In this paper, we present Map TRansformer (MapTR), an end-to-end framework for online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process...

10.48550/arxiv.2308.05736 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01