Osamu Yoshie

ORCID: 0000-0002-4192-554X
Research Areas
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Natural Language Processing Techniques
  • Domain Adaptation and Few-Shot Learning
  • Topic Modeling
  • Advanced Image and Video Retrieval Techniques
  • Speech and dialogue systems
  • Digital Games and Media
  • Face and Expression Recognition
  • Speech and Audio Processing
  • Advanced Vision and Imaging
  • Music and Audio Processing
  • Anomaly Detection Techniques and Applications
  • Semantic Web and Ontologies
  • Artificial Intelligence in Games
  • Educational Games and Gamification
  • Thermal Radiation and Cooling Technologies
  • Human Motion and Animation
  • Multimodal Machine Learning Applications
  • Speech Recognition and Synthesis
  • Gear and Bearing Dynamics Analysis
  • Neural Networks and Applications
  • Digital Transformation in Industry
  • Manufacturing Process and Optimization
  • Emotion and Mood Recognition

Waseda University
2016-2025

Division of Undergraduate Education
2022

Framework
2022

Graduate School USA
2005-2018

Fudan University
2018

Tokyo University of Science
1993-1995

The University of Tokyo
1994

Recent advances in label assignment in object detection mainly seek to independently define positive/negative training samples for each ground-truth (gt) object. In this paper, we innovatively revisit label assignment from a global perspective and propose to formulate the assigning procedure as an Optimal Transport (OT) problem, a well-studied topic in Optimization Theory. Concretely, we define the unit transportation cost between each demander (anchor) and supplier (gt) pair as the weighted summation of their classification and regression losses. After...

10.1109/cvpr46437.2021.00037 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
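
A small illustration of the OT view of label assignment: entropy-regularized OT is commonly approximated with Sinkhorn-Knopp iterations, and the sketch below assumes that solver. It treats each gt as supplying `k` units and a background supplier as taking the remaining anchors. The cost matrix, `k`, `eps`, the iteration count, and the function names `sinkhorn` and `assign` are hypothetical choices for illustration; this is not the paper's implementation, which the truncated abstract does not describe in full.

```python
import math


def sinkhorn(cost, supply, demand, eps=0.1, iters=200):
    """Approximate the entropy-regularized OT plan between suppliers (rows)
    and demanders (columns) with Sinkhorn-Knopp scaling.

    cost[i][j]: unit transport cost from supplier i to demander j.
    supply and demand must have equal totals.
    """
    n, m = len(supply), len(demand)
    # Gibbs kernel K = exp(-C / eps); a smaller eps gives a sharper plan.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows to match the supplies, then columns to match the demands.
        u = [supply[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [demand[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def assign(plan):
    """Label each anchor (column) with the supplier that sends it the most mass."""
    rows = range(len(plan))
    return [max(rows, key=lambda i: plan[i][j]) for j in range(len(plan[0]))]


if __name__ == "__main__":
    # Hypothetical costs: rows = gt 0, gt 1, background; columns = 4 anchors.
    # A real cost would be cls loss + alpha * reg loss (background: cls loss only).
    cost = [[0.1, 0.9, 3.0, 3.0],
            [3.0, 3.0, 0.1, 0.9],
            [1.0, 1.0, 1.0, 1.0]]
    k = 1                                   # units supplied per gt (hypothetical)
    num_anchors = 4
    supply = [k, k, num_anchors - 2 * k]    # background takes the remaining anchors
    plan = sinkhorn(cost, supply, [1.0] * num_anchors)
    print(assign(plan))                     # 0/1 = gt index, 2 = background
```

With these toy costs, anchors 0 and 2 should go to gt 0 and gt 1, and anchors 1 and 3 to the background. Lowering `eps` pushes the plan toward a hard assignment, at the cost of slower convergence and possible underflow in `exp`.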

Although significant progress has been made in pedestrian detection recently, detection in crowded scenes is still challenging. The heavy occlusion between pedestrians imposes great challenges on the standard Non-Maximum Suppression (NMS). A relatively low intersection over union (IoU) threshold leads to missing highly overlapped pedestrians, while a higher one brings in plenty of false positives. To avoid such a dilemma, this paper proposes a novel Representative Region NMS (R2NMS) approach leveraging the less occluded...

10.1109/cvpr42600.2020.01076 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
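
The IoU-threshold dilemma described above can be seen with plain greedy NMS. The sketch below is standard NMS; the optional `iou_boxes` argument is a hypothetical illustration of measuring overlap on less occluded regions instead of full bodies, assuming each detection also comes with such a region. The abstract is truncated before R2NMS is fully specified, so this is not its exact procedure, and all box coordinates and scores are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, thresh, iou_boxes=None):
    """Greedy NMS: visit detections by descending score and keep one only if its
    IoU with every already-kept detection is <= thresh. Returns kept indices.

    iou_boxes (hypothetical option): if given, overlap is measured on these
    boxes (e.g. less occluded regions), while the returned indices still refer
    to the full-body detections in `boxes`.
    """
    ref = boxes if iou_boxes is None else iou_boxes
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(ref[i], ref[k]) <= thresh for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Box 1 is a pedestrian largely hidden behind box 0; box 2 duplicates box 0.
    boxes = [(0, 0, 10, 30), (4, 0, 14, 30), (0, 1, 10, 31)]
    scores = [0.9, 0.8, 0.7]
    visible = [(0, 0, 10, 30), (10, 0, 14, 30), (0, 1, 10, 31)]
    print(nms(boxes, scores, 0.3))
    print(nms(boxes, scores, 0.5))
    print(nms(boxes, scores, 0.3, iou_boxes=visible))
```

Here pedestrian 1 overlaps pedestrian 0 with a full-box IoU of about 0.43. At a threshold of 0.3 it is suppressed; raising the threshold to 0.5 keeps it but, in general, also lets more duplicates through. Measuring overlap on the visible regions keeps pedestrian 1 at 0.3 while box 2 is still removed.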

Being effective and efficient is essential for an object detector in practical use. To meet these two concerns, we comprehensively evaluate a collection of existing refinements to improve the performance of PP-YOLO while keeping the inference time almost unchanged. This paper empirically analyzes their impact on the final model through an incremental ablation study. Things we tried that didn't work will also be discussed. By combining multiple refinements, we boost PP-YOLO's performance from 45.9% mAP to 49.5% mAP on COCO2017 test-dev. Since...

10.48550/arxiv.2104.10419 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Multi-label image recognition has attracted considerable research attention and achieved great success in recent years. Capturing label correlations is an effective way to advance the performance of multi-label recognition. Two types of label correlations have principally been studied, i.e., spatial and semantic correlations. However, previous methods in the literature considered only one of them. In this work, inspired by the Transformer, we propose a plug-and-play module, named Spatial and Semantic Transformers (SST),...

10.1109/tip.2022.3148867 article EN IEEE Transactions on Image Processing 2022-01-01

Label assignment has been widely studied in general object detection because of its great impact on detectors' performance. In the field of dense pedestrian detection, human bodies are often heavily entangled, making label assignment even more important. However, none of the existing methods focuses on crowd scenarios. Motivated by this, we propose Loss-aware Label Assignment (LLA) to boost the performance of detectors in crowd scenarios. Concretely, LLA first calculates classification (cls) and regression (reg) losses between each anchor and ground-truth...

10.1016/j.neucom.2021.07.094 article EN cc-by Neurocomputing 2021-08-06
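
A sketch of loss-aware assignment under stated assumptions. The abstract stops after the per-pair cls and reg losses, so the remaining steps here are assumptions made for illustration: the joint cost is `cls + lam * reg`, each gt takes its `k` lowest-cost anchors as positives, an anchor claimed by several gts goes to the one with the lower cost, and unclaimed anchors are negatives. The loss values and the name `loss_aware_assign` are hypothetical.

```python
def loss_aware_assign(cls_loss, reg_loss, k, lam=1.0):
    """cls_loss[g][a], reg_loss[g][a]: losses of anchor a w.r.t. gt g.

    Returns labels[a]: index of the assigned gt, or -1 for a negative anchor.
    """
    num_gt, num_anchors = len(cls_loss), len(cls_loss[0])
    # Joint cost: weighted sum of cls and reg losses (weighting is hypothetical).
    cost = [[cls_loss[g][a] + lam * reg_loss[g][a] for a in range(num_anchors)]
            for g in range(num_gt)]
    labels = [-1] * num_anchors
    best = [float("inf")] * num_anchors
    for g in range(num_gt):
        # The k anchors with the smallest joint cost are candidates for gt g.
        topk = sorted(range(num_anchors), key=lambda a: cost[g][a])[:k]
        for a in topk:
            # Conflict rule (assumption): keep the gt with the lower joint cost.
            if cost[g][a] < best[a]:
                best[a] = cost[g][a]
                labels[a] = g
    return labels


if __name__ == "__main__":
    # Hypothetical losses for 2 gts x 5 anchors.
    cls = [[0.1, 0.2, 0.9, 0.8, 0.7],
           [0.9, 0.05, 0.1, 0.2, 0.8]]
    reg = [[0.1, 0.1, 0.5, 0.6, 0.9],
           [0.6, 0.1, 0.1, 0.1, 0.9]]
    print(loss_aware_assign(cls, reg, k=2))
```

Anchor 1 is among the top-2 anchors for both gts; under the assumed conflict rule it goes to gt 1, whose joint cost (0.15) is lower than that of gt 0 (0.3).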

Multilayer optical film plays a significant role in a broad range of application fields. Due to the nonlinear relationship between the dispersion characteristics of materials and the actual performance parameters of thin films, it is challenging to optimize the film structure with traditional models. In this paper, we present an implementation of Deep Q-learning, which is suited to most multilayer film structures. As a concrete demonstration, we apply it to a solar absorber. The optimization program could optimize the absorber within 500 epochs (about 200 steps per epoch)...

10.1038/s41598-020-69754-w article EN cc-by Scientific Reports 2020-07-29
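
Deep Q-learning approximates the action-value function Q(s, a) with a neural network, but its learning target is the same as in tabular Q-learning. The minimal tabular sketch below shows that target and epsilon-greedy action selection. The state and action names are hypothetical placeholders (for film design, a state might encode the current layer thicknesses and an action a thickness change); the network, reward, and environment used in the paper are not reproduced.

```python
import random


def q_update(q, state, action, reward, next_state, done=False, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a').

    With done=True the episode ends, so the target is just the reward.
    """
    target = reward if done else reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise take the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(list(q[state]))
    return max(q[state], key=q[state].get)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical two-state table; a DQN would output these values instead.
    q = {"s0": {"thicker": 0.0, "thinner": 0.0},
         "s1": {"thicker": 1.0, "thinner": 2.0}}
    action = epsilon_greedy(q, "s0", epsilon=0.1, rng=rng)
    print(action, q_update(q, "s0", action, reward=1.0, next_state="s1"))
```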

The job shop scheduling problem (JSSP) is one of the well-known NP-hard combinatorial optimization problems (COPs); it aims to optimize the sequential assignment of a finite set of machines to a set of jobs while adhering to specified constraints. Conventional solution approaches, including heuristic dispatching rules and evolutionary algorithms, have been widely used to solve JSSPs. Recently, reinforcement learning (RL) has gained popularity for delivering better solution quality. In this research, we propose an end-to-end deep...

10.1002/tee.23788 article EN IEEJ Transactions on Electrical and Electronic Engineering 2023-03-24
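
Whether a schedule comes from a dispatching rule, an evolutionary algorithm, or an RL policy, it must be decoded into start times before its makespan can be evaluated. The sketch below assumes the common operation-based encoding, which the abstract does not specify: a sequence of job ids in which the k-th occurrence of job j schedules its k-th operation as early as both its job and its machine allow. The instance and the function name are hypothetical.

```python
def makespan(jobs, sequence):
    """Decode an operation-based sequence into a semi-active schedule.

    jobs[j]: list of (machine, duration) operations in job j's required order.
    sequence: job ids; the k-th occurrence of j schedules j's k-th operation.
    Returns the makespan (finish time of the last operation).
    """
    next_op = [0] * len(jobs)      # next unscheduled operation of each job
    job_ready = [0] * len(jobs)    # finish time of each job's latest operation
    machine_ready = {}             # time at which each machine becomes free
    for j in sequence:
        if next_op[j] >= len(jobs[j]):
            raise ValueError(f"job {j} appears too many times")
        machine, duration = jobs[j][next_op[j]]
        # Start as soon as both the job and the machine are free.
        start = max(job_ready[j], machine_ready.get(machine, 0))
        job_ready[j] = machine_ready[machine] = start + duration
        next_op[j] += 1
    if any(done != len(ops) for done, ops in zip(next_op, jobs)):
        raise ValueError("every operation must be scheduled exactly once")
    return max(job_ready)


if __name__ == "__main__":
    # Hypothetical 2-job, 2-machine instance: (machine, duration) per operation.
    jobs = [[(0, 3), (1, 2)],
            [(1, 2), (0, 4)]]
    print(makespan(jobs, [0, 1, 0, 1]))
    print(makespan(jobs, [0, 0, 1, 1]))
```

Both sequences schedule the same operations, but placing job 0's second operation before job 1's first one makes job 1 wait for machine 1, stretching the makespan from 7 to 11.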

Text-supervised semantic segmentation is a novel research topic that allows semantic segments to emerge through image-text contrasting. However, pioneering methods may be subject to specifically designed network architectures. This paper shows that the vanilla contrastive language-image pre-training (CLIP) model is an effective text-supervised semantic segmentor by itself. First, we reveal that vanilla CLIP is inferior at localization and segmentation due to its optimization being driven by densely aligning visual and language representations. Second, we propose the...

10.1109/cvpr52729.2023.00683 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
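
The idea of segments emerging from image-text contrasting can be illustrated by labeling each image patch with the class text it is most similar to. This sketch assumes that patch embeddings and class-text embeddings already live in a shared space (e.g. from a CLIP-style image and text encoder); the toy 3-dimensional vectors and the function names are hypothetical, and the paper's own method, truncated above, is not reproduced.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def segment(patch_feats, text_feats):
    """Assign each patch embedding the index of its most similar text embedding."""
    classes = range(len(text_feats))
    return [max(classes, key=lambda t: cosine(p, text_feats[t])) for p in patch_feats]


if __name__ == "__main__":
    # Hypothetical embeddings: 3 patches (flattened from an image grid), 2 class texts.
    patches = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.2]]
    texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(segment(patches, texts))
```

Reshaping the per-patch labels back to the patch grid gives a coarse segmentation map.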

Nowadays, automatic speech recognition (ASR) systems can achieve increasingly high accuracy rates, depending on the methodology applied and the datasets used. The accuracy decreases significantly when an ASR system is used with a non-native speaker of the language to be recognized. The main reason for this is the specific pronunciation and accent features related to the speaker's mother tongue, which influence pronunciation. At the same time, the extremely limited volume of labeled data makes it difficult to train, from the ground up,...

10.1186/s13636-021-00199-3 article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2021-02-18

The vehicle routing problem (VRP) is one of the classic combinatorial optimization problems, in which an optimal tour to visit customers is required with a minimum total cost in the presence of some constraints. Recently, VRP has been solved using deep reinforcement learning (DRL), where node sets are considered (represented) as a graph structure. Existing Transformer-based DRL solutions for VRP rely only on node information, ignoring the role of edges between nodes. In this paper, we propose an attention-based end-to-end model to solve VRP, which...

10.1002/tee.23771 article EN IEEJ Transactions on Electrical and Electronic Engineering 2023-02-13
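
For the capacitated variant, a candidate solution is a set of routes that start and end at the depot, and its cost is the total travel distance; DRL approaches for VRP commonly use the negative of such a cost as the reward. The sketch below evaluates that cost and checks the constraints. Euclidean distances, the capacity check, and all names and data are assumptions for illustration; the pairwise distances are the kind of edge information the abstract says existing Transformer-based models ignore.

```python
import math


def route_cost(depot, customers, demands, routes, capacity):
    """Total Euclidean length of capacitated VRP routes.

    Each route is a list of customer indices and implicitly starts and ends at
    the depot. Raises ValueError if a route exceeds the vehicle capacity or if
    customers are missed or visited twice.
    """
    visited = sorted(c for route in routes for c in route)
    if visited != list(range(len(customers))):
        raise ValueError("every customer must be visited exactly once")
    total = 0.0
    for route in routes:
        if sum(demands[c] for c in route) > capacity:
            raise ValueError("route exceeds vehicle capacity")
        points = [depot] + [customers[c] for c in route] + [depot]
        # Sum the lengths of consecutive legs (edges) along the tour.
        total += sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return total


if __name__ == "__main__":
    # Hypothetical instance: depot at the origin, three unit-demand customers.
    depot = (0.0, 0.0)
    customers = [(0.0, 3.0), (4.0, 3.0), (4.0, 0.0)]
    demands = [1, 1, 1]
    print(route_cost(depot, customers, demands, [[0, 1, 2]], capacity=3))
    print(route_cost(depot, customers, demands, [[0], [1, 2]], capacity=2))
```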

Livestreaming commerce is increasingly influencing the sports industry's supply chain. This study seeks to understand how service quality characteristics enhance customers' flow experiences and encourage buying behaviour. It also examines the relationship between flow experience and impulsive buying behaviour, particularly how fan identification moderates this dynamic. Data from 274 participants, who recounted their recent shopping experiences while watching livestreams on SLSPs, were analysed. Structural equation...

10.1007/s12063-024-00536-7 article EN cc-by Operations Management Research 2025-01-04

In real-world scenarios, multi-view cameras are typically employed for fine-grained manipulation tasks. Existing approaches (e.g., ACT) tend to treat multi-view features equally and directly concatenate them for policy learning. However, this introduces redundant visual information and higher computational costs, leading to ineffective manipulation. A fine-grained manipulation task tends to involve multiple stages, and the view that contributes most varies over time. In this paper, we propose a plug-and-play...

10.48550/arxiv.2502.11161 preprint EN arXiv (Cornell University) 2025-02-16

10.1109/icassp49660.2025.10888982 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Detecting human bodies in highly crowded scenes is a challenging problem. Two main reasons lead to this problem: (1) the weak visual cues of heavily occluded instances can hardly provide sufficient information for accurate detection; (2) heavily occluded instances are more easily suppressed by Non-Maximum Suppression (NMS). To address these two issues, we introduce a variant of two-stage detectors called PS-RCNN. PS-RCNN first detects slightly occluded or non-occluded objects with an R-CNN [1] module (referred to as P-RCNN), and then suppresses the...

10.1109/icme46284.2020.9102793 article EN 2022 IEEE International Conference on Multimedia and Expo (ICME) 2020-06-09

Software-defined networking is an emerging architecture that separates the control plane from the data plane. This paradigm enables flexible network resource allocation for traffic engineering, which aims to achieve better network capacity and improved delay and loss performance. Many heuristic algorithms have been developed to solve the dynamic routing problem. However, they incur a high computational time cost, which raises a crucial question: whether such an approach to this NP-complete problem is of any use in practice. This paper proposes...

10.1109/icnidc.2014.7000278 article EN 2014-09-01

To improve the optical absorptance of a solar selective absorber over a wide wavelength range, an eight-layered metal-dielectric film structure was designed using the transfer matrix method and fabricated by magnetron sputtering. The experimental results showed that the multilayered film yields a high absorptance of 98.3% and excellent spectral selectivity over a wide angular range in the 250–2000 nm radiation region, a total hemispherical emittance of 0.12 at 400 K, and nearly unchanged reflectance after heat treatment at 673 K for 48 h in vacuum,...

10.1088/2053-1591/aacdb3 article EN Materials Research Express 2018-06-20
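
The transfer matrix method mentioned above can be sketched with the characteristic-matrix formulation at normal incidence (as in Macleod's treatment of thin-film filters), using the complex index convention N = n - ik. The function returns the reflectance R and the fraction T transmitted into the substrate; for an opaque metal substrate such as a thick Cu layer, the absorptance of the structure is 1 - R. The indices and thicknesses in the example are hypothetical non-dispersive values, not the materials of the papers listed here; a real design would need measured dispersion data for each material.

```python
import cmath


def stack_rt(layers, n_substrate, wavelength, n_incident=1.0):
    """Reflectance R and transmittance T of a thin-film stack at normal incidence.

    layers: list of (N, d) ordered from the incident side, where N is the complex
    index n - ik and d the thickness (same length unit as wavelength).
    n_incident is assumed real (non-absorbing incident medium).
    """
    m11, m12, m21, m22 = 1, 0, 0, 1          # identity characteristic matrix
    for n, d in layers:
        delta = 2 * cmath.pi * n * d / wavelength      # phase thickness
        cos_d, sin_d = cmath.cos(delta), cmath.sin(delta)
        # Layer matrix [[cos, i sin / N], [i N sin, cos]]; right-multiply.
        a11, a12, a21, a22 = cos_d, 1j * sin_d / n, 1j * n * sin_d, cos_d
        m11, m12, m21, m22 = (m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
                              m21 * a11 + m22 * a21, m21 * a12 + m22 * a22)
    # [B, C] = M [1, N_substrate]
    b = m11 + m12 * n_substrate
    c = m21 + m22 * n_substrate
    denom = n_incident * b + c
    r = (n_incident * b - c) / denom
    t = 4 * n_incident * n_substrate.real / abs(denom) ** 2
    return abs(r) ** 2, t


if __name__ == "__main__":
    wavelength = 550.0                        # nm (hypothetical)
    print(stack_rt([], 1.5, wavelength))      # bare glass-like substrate
    n1 = 1.5 ** 0.5                           # ideal single-layer AR index
    print(stack_rt([(n1, wavelength / (4 * n1))], 1.5, wavelength))
```

For the bare substrate the function gives R = 0.04; a quarter-wave layer with index sqrt(1.5) brings R to essentially zero, and absorbing layers are handled by passing complex indices.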

The optical properties and thermal stability of a 6-layered metal/dielectric film structure are investigated in this work. A high average absorption of >98% is achieved in the broad spectral range of 250-1200 nm in the experimental results, in good agreement with our simulated results. The samples have a typical layered structure of SiO₂ (57.3 nm)/Ti (5.7 nm)/SiO₂ (67.1 nm)/Ti (11.6 nm)/SiO₂ (51.4 nm)/Cu (>100 nm), deposited on optically polished Si or K9-glass substrates by magnetron sputtering. The sample has an AM1.5G solar...

10.1364/oe.22.0a1843 article EN cc-by Optics Express 2014-11-13