NFDI4DS | UHH-SEMS - Publication Details

Siyuan Qi

ORCID: 0000-0002-4070-733X

About

Contact & Profiles

A5043510420

Research Areas

Human Pose and Action Recognition
Multimodal Machine Learning Applications
Advanced Vision and Imaging
Domain Adaptation and Few-Shot Learning
Advanced Image and Video Retrieval Techniques
Reinforcement Learning in Robotics
Anomaly Detection Techniques and Applications
Robotics and Sensor-Based Localization
Advanced Neural Network Applications
Video Surveillance and Tracking Methods
Topic Modeling
Robot Manipulation and Learning
3D Surveying and Cultural Heritage
Remote Sensing and LiDAR Applications
Child and Animal Learning Development
Natural Language Processing Techniques
Machine Learning and Algorithms
Image Retrieval and Classification Techniques
Hand Gesture Recognition Systems
Explainable Artificial Intelligence (XAI)
Computer Graphics and Visualization Techniques
Multi-Agent Systems and Negotiation
Molecular Biology Techniques and Applications
Cancer Cells and Metastasis
Text Readability and Simplification

Beijing Academy of Artificial Intelligence
2023-2024

Beijing Institute for General Artificial Intelligence
2023-2024

Henan Normal University
2022

Google (United States)
2020-2021

University of California, Los Angeles
2017-2020

UCLA Health
2019-2020

University of Indianapolis
2019

Indiana University – Purdue University Indianapolis
2019

Renmin University of China
2015

Human-Centric Indoor Scene Synthesis Using Stochastic Grammar

OPENALEX - Publications

Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, Song‐Chun Zhu

We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain large-scale 2D/3D image data with the perfect per-pixel ground truth. An attributed spatial And-Or graph (S-AOG) is proposed to represent indoor scenes. The S-AOG is a probabilistic grammar model, in which the terminal nodes are object entities including room, furniture, and supported objects. Human contexts as contextual relations are encoded by Markov Random Fields (MRF) on the terminal nodes. We learn the distributions from an...

10.1109/cvpr.2018.00618 preprint EN 2018-06-01

Learning Compositional Neural Information Fusion for Human Parsing

OPENALEX - Publications

Wenguan Wang, Zhijie Zhang, Siyuan Qi, Jianbing Shen, Yanwei Pang, and 1 more

This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. We formulate the approach as a neural information fusion framework. Our model assembles the information from three inference processes over the hierarchy: direct inference (directly predicting each part of a human body using image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively....

10.1109/iccv.2019.00580 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Cascaded Human-Object Interaction Recognition

OPENALEX - Publications

Tianfei Zhou, Wenguan Wang, Siyuan Qi, Haibin Ling, Jianbing Shen

Rapid progress has been witnessed for human-object interaction (HOI) recognition, but most existing models are confined to single-stage reasoning pipelines. Considering the intrinsic complexity of the task, we introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines proposals and feeds them into a recognition network. Each of the two networks is also connected to its predecessor at the previous stage, enabling cross-stage...

10.1109/cvpr42600.2020.00432 preprint EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Reasoning Visual Dialogs With Structural and Partial Observations

OPENALEX - Publications

Zilong Zheng, Wenguan Wang, Siyuan Qi, Song‐Chun Zhu

We propose a novel model to address the task of Visual Dialog which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm...

10.1109/cvpr.2019.00683 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense

OPENALEX - Publications

Yixin Chen, Siyuan Huang, Tao Yuan, Yixin Zhu, Siyuan Qi, and 1 more

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction (3D estimations of object bounding boxes, camera pose, and room layout), and (ii) 3D human pose estimation. The intuition behind it is to leverage the coupled nature of these two tasks to improve the granularity and performance of scene understanding. We exploit critical and essential connections...

10.1109/iccv.2019.00874 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

A tale of two explanations: Enhancing human trust by explaining robot behavior

OPENALEX - Publications

Mark Edmonds, Feng Gao, Hangxin Liu, Xu Xie, Siyuan Qi, and 5 more

Forms of explanation that are best suited to foster trust do not necessarily correspond to those components contributing to the best task performance.

10.1126/scirobotics.aay4663 article EN Science Robotics 2019-12-18

Cascaded Parsing of Human-Object Interaction Recognition

OPENALEX - Publications

Tianfei Zhou, Siyuan Qi, Wenguan Wang, Jianbing Shen, Song‐Chun Zhu

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images. Considering the intrinsic complexity and structural nature of the task, we introduce a cascaded parsing network (CP-HOI) for multi-stage, structured HOI understanding. At each cascade stage, an instance detection module progressively refines proposals and feeds them into an interaction reasoning module. Each of the two modules is also connected to its predecessor in the previous stage, enabling efficient cross-stage information...

10.1109/tpami.2021.3049156 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-05

Predicting Human Activities Using Stochastic Grammar

OPENALEX - Publications

Siyuan Qi, Siyuan Huang, Ping Wei, Song‐Chun Zhu

This paper presents a novel method to predict future human activities from partially observed RGB-D videos. Human activity prediction is generally difficult due to its non-Markovian property and the rich context between humans and environments. We use a stochastic grammar model to capture the compositional structure of events, integrating human actions, objects, and their affordances. We represent the event by a spatial-temporal And-Or graph (ST-AOG). The ST-AOG is composed of a temporal stochastic grammar defined on sub-activities, and spatial graphs...

10.1109/iccv.2017.132 article EN 2017-10-01

Hierarchical Human Semantic Parsing with Comprehensive Part-Relation Modeling

OPENALEX - Publications

Wenguan Wang, Tianfei Zhou, Siyuan Qi, Jianbing Shen, Song‐Chun Zhu

Modeling the human structure is central for human parsing that extracts pixel-wise semantic information from images. We start with analyzing three types of inference processes over the hierarchical structure of human bodies: direct inference (directly predicting human semantic parts using image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). We then formulate the problem as a compositional neural information fusion (CNIF) framework, which assembles the information from the three inference processes in a conditional manner, i.e., considering...

10.1109/tpami.2021.3055780 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-01

The effects of food safety issues released by we media on consumers’ awareness and purchasing behavior: A case study in China

OPENALEX - Publications

Peng Ya-la, Jiajie Li, Hui Xia, Siyuan Qi, Jianhong Li

10.1016/j.foodpol.2014.12.010 article EN Food Policy 2015-01-13

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars

OPENALEX - Publications

Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, and 3 more

10.1007/s11263-018-1103-5 article EN International Journal of Computer Vision 2018-06-30

Learning Human-Object Interactions by Graph Parsing Neural Networks

OPENALEX - Publications

Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song‐Chun Zhu

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels. Within a message passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection...

10.48550/arxiv.1808.07962 preprint EN other-oa arXiv (Cornell University) 2018-01-01

E2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

OPENALEX - Publications

Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, and 2 more

As the size of transformer-based models continues to grow, fine-tuning these large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. Parameter-efficient learning has been developed to reduce the number of tunable parameters during fine-tuning. Although these methods show promising results, there is still a significant performance gap compared to full fine-tuning. To address this challenge, we propose an Effective and Efficient Visual Prompt Tuning (E...

10.1109/iccv51070.2023.01604 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Feeling the force: Integrating force and pose for fluent discovery through imitation learning to open medicine bottles

OPENALEX - Publications

Mark Edmonds, Feng Gao, Xu Xie, Hangxin Liu, Siyuan Qi, and 3 more

Learning complex robot manipulation policies for real-world objects is challenging, often requiring significant tuning within controlled environments. In this paper, we learn a manipulation model to execute tasks with multiple stages and variable structure, which typically are not suitable for most robot manipulation approaches. The model is learned from human demonstration using a tactile glove that measures both hand pose and contact forces. The tactile glove enables observation of visually latent changes in the scene, specifically the forces imposed to unlock...

10.1109/iros.2017.8206196 article EN 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017-09-01

Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

OPENALEX - Publications

Yixin Zhu, Tao Gao, Lifeng Fan, Siyuan Huang, Mark Edmonds, and 7 more

Recent progress in deep learning is essentially based on a "big data for small tasks" paradigm, under which massive amounts of data are used to train a classifier for a single narrow task. In this paper, we call for a shift that flips this paradigm upside down. Specifically, we propose a "small data for big tasks" paradigm, wherein a single artificial intelligence (AI) system is challenged to develop "common sense," enabling it to solve a wide range of tasks with little training data. We illustrate the potential power of this new paradigm by reviewing models of common sense that synthesize...

10.1016/j.eng.2020.01.011 article EN cc-by-nc-nd Engineering 2020-02-22

Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents

OPENALEX - Publications

Yingxuan Yang, Bo Huang, Siyuan Qi, Feng Chao, Haili Hu, and 11 more

Large Language Model (LLM) agent frameworks often employ modular architectures, incorporating components such as planning, reasoning, action execution, and reflection to tackle complex tasks. However, quantifying the contribution of each module to the overall system performance remains a significant challenge, impeding optimization and interpretability. To address this, we introduce CapaBench (Capability-level Assessment Benchmark), an evaluation framework grounded in cooperative game theory's Shapley...

10.48550/arxiv.2502.00510 preprint EN arXiv (Cornell University) 2025-02-01

A Generalized Earley Parser for Human Activity Parsing and Prediction

OPENALEX - Publications

Siyuan Qi, Baoxiong Jia, Siyuan Huang, Ping Wei, Song‐Chun Zhu

Detection, parsing, and future predictions on sequence data (e.g., videos) require the algorithms to capture non-Markovian and compositional properties of high-level semantics. Context-free grammars are natural choices to capture such properties, but traditional grammar parsers (e.g., the Earley parser) only take symbolic sentences as inputs. In this paper, we generalize the Earley parser to parse sequence data which is neither segmented nor labeled. Given the output of an arbitrary probabilistic classifier, this generalized Earley parser finds the optimal segmentation...

10.1109/tpami.2020.2976971 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2020-02-28

The Tong Test: Evaluating Artificial General Intelligence Through Dynamic Embodied Physical and Social Interactions

OPENALEX - Publications

Yujia Peng, Jiaheng Han, Zhenliang Zhang, Lifeng Fan, Tengyu Liu, and 5 more

The release of the generative pre-trained transformer (GPT) series has brought artificial general intelligence (AGI) to the forefront of the artificial intelligence (AI) field once again. However, the questions of how to define and evaluate AGI remain unclear. This perspective article proposes that the evaluation of AGI should be rooted in dynamic embodied physical and social interactions (DEPSI). More specifically, we propose five critical characteristics to be considered as AGI benchmarks and suggest the Tong test as an AGI evaluation system. The Tong test describes a value- and ability-oriented...

10.1016/j.eng.2023.07.006 article EN cc-by-nc-nd Engineering 2023-08-09

Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

OPENALEX - Publications

Siyuan Huang, Siyuan Qi, Yinxue Xiao, Yixin Zhu, Ying Wu, and 1 more

Holistic 3D indoor scene understanding refers to jointly recovering the i) object bounding boxes, ii) room layout, and iii) camera pose, all in 3D. The existing methods either are ineffective or only tackle the problem partially. In this paper, we propose an end-to-end model that simultaneously solves all three tasks in real-time given a single RGB image. The essence of the proposed method is to improve the prediction by parametrizing the targets (e.g., boxes) instead of directly estimating the targets, and cooperative training...

10.48550/arxiv.1810.13049 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Intent-Aware Multi-Agent Reinforcement Learning

OPENALEX - Publications

Siyuan Qi, Song‐Chun Zhu

This paper proposes an intent-aware multi-agent planning framework as well as a learning algorithm. Under this framework, an agent plans in the goal space to maximize the expected utility. The planning process takes the belief of other agents' intents into consideration. Instead of formulating the learning problem as a partially observable Markov decision process (POMDP), we propose a simple but effective linear function approximation of the utility function. It is based on the observation that for humans, other people's intents will pose an influence on our utility for a goal. The proposed...

10.1109/icra.2018.8463211 preprint EN 2018-05-01

VRGym

OPENALEX - Publications

Xu Xie, Hangxin Liu, Zhenliang Zhang, Yuxing Qiu, Feng Gao, and 3 more

We propose VRGym, a virtual reality (VR) testbed for realistic human-robot interaction. Different from existing toolkits and VR environments, VRGym emphasizes building and training both physical and interactive agents for robotics, machine learning, and cognitive science. VRGym leverages mechanisms that can generate diverse 3D scenes with high realism through physics-based simulation. We demonstrate that VRGym is able to (i) collect human interactions and fine manipulations, (ii) accommodate various robots with a ROS bridge, (iii)...

10.1145/3321408.3322633 article EN Proceedings of the ACM Turing Celebration Conference - China 2019-05-17

Reasoning Visual Dialogs with Structural and Partial Observations

OPENALEX - Publications

Zilong Zheng, Wenguan Wang, Siyuan Qi, Song‐Chun Zhu

10.48550/arxiv.1904.05548 preprint EN other-oa arXiv (Cornell University) 2019-01-01
