Robin Courant

ORCID: 0009-0002-5329-4009
Research Areas
  • Video Analysis and Summarization
  • Multimodal Machine Learning Applications
  • Human Motion and Animation
  • Advanced Vision and Imaging
  • Humor Studies and Applications
  • Human Pose and Action Recognition
  • Music and Audio Processing
  • Subtitles and Audiovisual Media
  • Brain Tumor Detection and Classification
  • Reinforcement Learning in Robotics
  • Advanced Optical Imaging Technologies
  • 3D Shape Modeling and Analysis
  • Augmented Reality Applications
  • Computer Graphics and Visualization Techniques
  • Cell Image Analysis Techniques
  • Robotic Path Planning Algorithms
  • Video Coding and Compression Technologies
  • COVID-19 diagnosis using AI
  • Evacuation and Crowd Dynamics
  • Advanced Image Processing Techniques
  • Hand Gesture Recognition Systems
  • Advanced Image and Video Retrieval Techniques
  • Robot Manipulation and Learning

École Polytechnique
2021-2024

Laboratoire d'Informatique de l'École Polytechnique
2021-2024

Université de Rennes
2022-2023

Institut de Recherche en Informatique et Systèmes Aléatoires
2022-2023

Centre National de la Recherche Scientifique
2021-2023

Institut national de recherche en informatique et en automatique
2022-2023

Automatically understanding funny moments (i.e., the moments that make people laugh) when watching comedy is challenging, as they relate to various features, such as body language, dialogues and culture. In this paper, we propose FunnyNet-W, a model that relies on cross- and self-attention for visual, audio and text data to predict funny moments in videos. Unlike most methods that rely on ground truth data in the form of subtitles, this work exploits modalities that come naturally with videos: (a) video frames contain visual information indispensable...

10.1007/s11263-024-02000-2 article EN cc-by International Journal of Computer Vision 2024-02-23
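As a rough illustration of the cross-attention fusion described in the abstract above, the sketch below is a hypothetical NumPy toy (not the FunnyNet-W implementation): tokens from one modality act as queries that attend over key/value tokens from another modality.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: queries from one modality
    attend over keys/values from another modality."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_q, n_kv) pairwise scores
    weights = softmax(scores, axis=-1)       # each query's distribution over kv tokens
    return weights @ values                  # (n_q, d) fused features

rng = np.random.default_rng(0)
audio_tokens = rng.normal(size=(4, 16))    # e.g. audio features as queries
visual_tokens = rng.normal(size=(10, 16))  # e.g. frame features as keys/values
fused = cross_attention(audio_tokens, visual_tokens, visual_tokens)
print(fused.shape)  # (4, 16)
```

In a real multimodal model the queries, keys and values would be learned linear projections of per-modality encoder features; here they are raw vectors purely to show the data flow.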

This paper presents JAWS, an optimization-driven approach that achieves the robust transfer of visual cinematic features from a reference in-the-wild video clip to a newly generated clip. To this end, we rely on an implicit neural representation (INR) in a way that the computed clip shares the same cinematic features as the reference. We propose a general formulation of the camera optimization problem in an INR that computes extrinsic and intrinsic camera parameters as well as timing. By leveraging the differentiability of neural representations, we can back-propagate our designed losses...

10.1109/cvpr52729.2023.01624 preprint EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
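JAWS optimizes extrinsic and intrinsic camera parameters through a differentiable representation; as background for what an extrinsic matrix encodes, here is a minimal NumPy look-at construction (a standard textbook formula, not code from the paper).

```python
import numpy as np

def look_at(eye, target, up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 world-to-camera extrinsic matrix from a camera
    position ('eye') and a point it looks at ('target')."""
    fwd = target - eye
    fwd = fwd / np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, fwd)
    R = np.stack([right, true_up, -fwd])  # rows: camera axes in world frame
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = -R @ eye                   # translation brings eye to the origin
    return E

E = look_at(np.array([0.0, 0.0, 5.0]), np.zeros(3))
# The target maps onto the camera's -z axis at depth 5.
print(E @ np.array([0.0, 0.0, 0.0, 1.0]))  # -> [0, 0, -5, 1]
```

In an optimization-driven setting like the one the abstract describes, the rotation and translation entries of such a matrix become free parameters updated by back-propagated losses rather than being constructed analytically.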

Automatically understanding funny moments (i.e., the moments that make people laugh) when watching comedy is challenging, as they relate to various features, such as body language, dialogues and culture. In this paper, we propose FunnyNet-W, a model that relies on cross- and self-attention for visual, audio and text data to predict funny moments in videos. Unlike most methods that rely on ground truth data in the form of subtitles, this work exploits modalities that come naturally with videos: (a) video frames contain visual information indispensable for scene...

10.48550/arxiv.2401.04210 preprint EN cc-by arXiv (Cornell University) 2024-01-01

Stories and emotions in movies emerge through the effect of well-thought-out directing decisions, in particular camera placement and movement over time. Crafting compelling camera trajectories remains a complex iterative process, even for skilful artists. To tackle this, in this paper, we propose a dataset called the Exceptional Trajectories (E.T.), with camera trajectories along with character information and textual captions encompassing descriptions of both camera and character. To our knowledge, this is the first dataset of its kind. To show the potential applications of the E.T. dataset,...

10.48550/arxiv.2407.01516 preprint EN arXiv (Cornell University) 2024-07-01

Recent advances in text-conditioned video diffusion have greatly improved video quality. However, these methods offer limited or sometimes no control to users on camera aspects, including dynamic camera motion, zoom, distorted lens and focus shifts. These motion and optical aspects are crucial for adding controllability of cinematic elements to generation frameworks, ultimately resulting in visual content that draws focus, enhances mood, and guides emotions according to filmmakers' controls. In this paper, we aim to close the...

10.48550/arxiv.2412.14158 preprint EN arXiv (Cornell University) 2024-12-18

The artistic crafting of 3D animations by designers is a complex and iterative process. While classical animation tools have brought significant improvements in creating and manipulating shapes over time, most approaches rely on 2D input devices to create contents. With the advent of virtual reality technologies and their ability to dive users into 3D worlds and precisely track devices in 6 dimensions (position and orientation), a number of VR creative tools have emerged, such as Quill, AnimVR, Tvori, Tiltbrush or MasterPieceVR. While these provide...

10.1109/aivr56993.2022.00016 preprint EN 2022-12-01

Transformers were initially introduced for natural language processing (NLP) tasks, but they were quickly adopted by most deep learning fields, including computer vision. They measure the relationships between pairs of input tokens (words in the case of text strings, parts of images for visual Transformers), termed attention. The cost is quadratic with the number of tokens. For image classification, the most common Transformer architecture uses only the Transformer Encoder in order to transform the various input tokens. However, there are also numerous other...

10.48550/arxiv.2303.12068 preprint EN cc-by arXiv (Cornell University) 2023-01-01
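The pairwise-token attention described above can be sketched in a few lines of NumPy (a toy single-head self-attention without learned projections, not any specific paper's code); the n x n score matrix is where the quadratic cost in the number of tokens comes from.

```python
import numpy as np

def self_attention(tokens):
    """Single-head self-attention without learned projections:
    every token attends over every other token, so the score
    matrix has shape (n, n) for n input tokens."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)           # (n, n) pairwise relationships
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)       # rows sum to 1
    return weights @ tokens, scores

tokens = np.random.default_rng(1).normal(size=(6, 8))  # 6 tokens (e.g. image patches)
out, scores = self_attention(tokens)
print(out.shape, scores.shape)  # (6, 8) (6, 6)
```

A full Transformer layer adds learned query/key/value projections, multiple heads, and a feed-forward block on top of this core operation.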

Neural Radiance Fields (NeRFs) have revolutionized novel view synthesis, offering visually realistic, precise, and robust implicit scene reconstructions. While recent approaches enable NeRF editing, such as object removal, 3D shape modification, or material property manipulation, the manual annotation required prior to such edits makes the process tedious. Additionally, traditional 2D interaction tools lack an accurate sense of space, preventing precise manipulation and editing of scenes. In this paper, we introduce...

10.48550/arxiv.2309.03933 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Neural Radiance Fields (NeRFs) have revolutionized novel view synthesis, offering visually realistic, precise, and robust implicit scene reconstructions. While recent approaches enable NeRF editing, such as object removal, 3D shape modification, or material property manipulation, the manual annotation required prior to such edits makes the process tedious. Additionally, traditional 2D interaction tools lack an accurate sense of space, preventing precise manipulation and editing of scenes. In this paper, we introduce...

10.1109/iccvw60793.2023.00310 preprint EN 2023-10-02
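The implicit reconstructions these NeRF papers edit are displayed by alpha-compositing density and color samples along each camera ray; the sketch below is a minimal NumPy illustration of that volume-rendering quadrature (an educational toy, not the papers' code).

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Composite per-sample densities and RGB colors along one ray
    into a single pixel color (NeRF-style quadrature)."""
    alphas = 1.0 - np.exp(-densities * deltas)     # opacity of each segment
    trans = np.cumprod(1.0 - alphas + 1e-10)       # transmittance after each sample
    trans = np.concatenate([[1.0], trans[:-1]])    # light reaching each sample
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# One nearly opaque red sample: the ray returns (almost) pure red.
densities = np.array([50.0])
colors = np.array([[1.0, 0.0, 0.0]])
deltas = np.array([1.0])
print(render_ray(densities, colors, deltas))  # ~[1, 0, 0]
```

In an actual NeRF, `densities` and `colors` come from a neural network queried at sample positions along the ray, and the same compositing weights are what editing methods use to localize which 3D regions affect a given pixel.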