Takahiro Mochizuki

ORCID: 0009-0009-9576-4700
Research Areas
  • Video Analysis and Summarization
  • Image Retrieval and Classification Techniques
  • Advanced Image and Video Retrieval Techniques
  • Music and Audio Processing
  • Face recognition and analysis
  • Sports Analytics and Performance
  • Human-Automation Interaction and Safety
  • Human Motion and Animation
  • Multimedia Communication and Technology
  • Teleoperation and Haptic Systems
  • Cerebrovascular and Carotid Artery Diseases
  • Cerebrospinal fluid and hydrocephalus
  • Handwritten Text Recognition Techniques
  • Visual Attention and Saliency Detection
  • Intracranial Aneurysms: Treatment and Complications
  • Multimodal Machine Learning Applications
  • Speech Recognition and Synthesis
  • Robot Manipulation and Learning
  • Image and Video Stabilization
  • Virtual Reality Applications and Impacts
  • Craniofacial Disorders and Treatments
  • Recommender Systems and Techniques
  • Pharmacy and Medical Practices
  • Ophthalmology and Eye Disorders
  • Brain Tumor Detection and Classification

NHK Spring (Japan)
2025

Tokyo Institute of Technology
2023-2024

Kanagawa Cardiovascular and Respiratory Center
2022-2023

Japan Broadcasting Corporation (Japan)
2010-2020

Nihon University
2014

10.1109/icassp49660.2025.10890127 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.5594/jmi.2024/bzsk5411 article EN SMPTE Motion Imaging Journal 2024-04-01

We propose a robust scene recognition system for baseball broadcast videos. This is based on the data-driven approach which has been successful in continuous speech recognition. It uses a multi-stream hidden Markov model to model each scene and an unsupervised adaptation method to achieve robustness against differences in environmental conditions among games. It also employs an n-gram language model to represent the contexts of scenes, together with scene length information. The proposed system was evaluated in experiments with 16 types of scenes acquired from video data...

10.1145/1282280.1282312 article EN 2007-07-09

We developed a new way of viewing TV, CurioView, which uses metadata and retrieval technology to satisfy viewers' curiosity by recommending wide-ranging video content related to what the viewer is currently watching. We describe a general and expandable architecture that is based on CurioView's functions. The architecture can be applied flexibly, not just to TVs, but also to PCs and mobile terminals. We report on fundamental testing of a prototype system using this architecture.

10.1109/isbmsb.2010.5463129 article EN 2010 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB) 2010-03-01

The rapid increase in the volume of electronically archived images and video materials has given rise to a need for new methods of image retrieval. Therefore, various approaches have been proposed. Most conventional retrieval methods are based on the content of objects or on features describing only partial properties of images. But the diverse images we must deal with have induced the necessity to retrieve images more flexibly and effectively, focusing on multiple properties. In this paper, we describe a retrieval method based on two ideas. One is adopting a fractal sequence designed...

10.3169/itej.57.719 article EN The Journal of The Institute of Image Information and Television Engineers 2003-01-01

In this paper, we investigate a scenario of one-human-multiple-robot navigation in three dimensions, and examine the impacts of VR (Virtual Reality) technology on human properties from a control-theoretic perspective. We start by reviewing a passivity-based distributed control architecture that takes complementary interactions such that motion synchronization is autonomously completed by the robot controller while the human operator is dedicated to navigation. Due to the limited capability of 3-D recognition and the dimensionality of real-time...

10.1016/j.ifacol.2023.01.100 article EN IFAC-PapersOnLine 2022-01-01

We propose a robust scene recognition framework using context information for multimedia contents. Multimedia contents consist of scene sequences that are more likely to happen compared with other sequences. We employ a statistical approach to deal with this information. We use a hidden Markov model (HMM) for each scene and an n-gram language model to represent the contexts among scenes. We evaluated the proposed method in experiments with 16 scenes from video data of 25 baseball games. The proposed method significantly improved the results over those obtained without context information.

10.1145/1178677.1178693 article EN 2006-10-26
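
The combination of per-scene model scores with an n-gram context model described in the abstract above can be sketched as a Viterbi search over scene labels. This is a toy illustration under stated assumptions, not the authors' system: the function name `decode_scenes`, the two scene labels, the bigram (2-gram) order, and all probabilities are hypothetical, and the per-segment log-likelihoods simply stand in for scores that per-scene HMMs would produce.

```python
import math

def decode_scenes(scores, bigram, initial, lm_weight=1.0):
    """Viterbi search for the most likely scene sequence.

    scores:  one dict per video segment, scene -> log-likelihood
             (hypothetical stand-in for per-scene HMM scores).
    bigram:  dict (prev_scene, scene) -> log P(scene | prev_scene).
    initial: dict scene -> log P(scene) for the first segment.
    """
    scenes = list(initial)
    best = {s: lm_weight * initial[s] + scores[0][s] for s in scenes}
    back = []
    for obs in scores[1:]:
        new_best, pointers = {}, {}
        for cur in scenes:
            # Pick the predecessor that maximizes path score plus context score.
            prev = max(scenes, key=lambda p: best[p] + lm_weight * bigram[(p, cur)])
            new_best[cur] = best[prev] + lm_weight * bigram[(prev, cur)] + obs[cur]
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final scene.
    path = [max(scenes, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

log = math.log
# Assumed toy context model: pitch and hit scenes tend to alternate.
bigram = {("pitch", "pitch"): log(0.1), ("pitch", "hit"): log(0.9),
          ("hit", "pitch"): log(0.9), ("hit", "hit"): log(0.1)}
initial = {"pitch": log(0.6), "hit": log(0.4)}
# The middle segment is ambiguous and slightly favors "pitch" on its own.
scores = [{"pitch": log(0.9), "hit": log(0.1)},
          {"pitch": log(0.55), "hit": log(0.45)},
          {"pitch": log(0.9), "hit": log(0.1)}]

without_context = [max(seg, key=seg.get) for seg in scores]
with_context = decode_scenes(scores, bigram, initial)
print(without_context)  # ['pitch', 'pitch', 'pitch']
print(with_context)     # ['pitch', 'hit', 'pitch']
```

In this toy case the context model overrides the ambiguous middle segment, which is the kind of correction a scene-level language model is meant to provide.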

Nippon Hoso Kyokai (NHK, Japan Broadcasting Corporation) has developed a new artificial intelligence (AI)-driven broadcasting technology called “Smart Production” designed to quickly and accurately gather and analyze diverse types of social information and deliver it to a wide range of viewers. Smart Production uses AI to analyze information obtained from social media and open data as well as the know-how related to program production possessed by broadcast stations. This approach makes it possible to extract events and incidents in society and present...

10.5594/jmi.2019.2959173 article EN SMPTE Motion Imaging Journal 2020-03-01

Visual-based image retrieval based on the visual similarity over the entire image is very useful when targeting various kinds of large-volume content. This method generally divides an image into grid-shaped blocks and uses similarities based on a comparison of features between corresponding block regions in two different images. However, this method sometimes fails in terms of object-conscious retrieval when the backgrounds of the images are almost the same but only the object, or the object's positions and/or sizes, are different. In this paper, we propose a new method featuring...

10.1109/acpr.2013.106 article EN 2013-11-01
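
The conventional grid-block comparison that the abstract above describes, and the way it penalizes a moved object, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the block feature (mean intensity), the similarity measure, the 4x4 toy images, and the names `block_means`, `grid_similarity`, and `make_image` are hypothetical and are not taken from the paper.

```python
def block_means(img, grid):
    """Split a 2-D grayscale image (list of rows) into grid x grid blocks
    and return the mean intensity of each block in row-major order."""
    h, w = len(img), len(img[0])
    bh, bw = h // grid, w // grid
    means = []
    for by in range(grid):
        for bx in range(grid):
            vals = [img[y][x]
                    for y in range(by * bh, (by + 1) * bh)
                    for x in range(bx * bw, (bx + 1) * bw)]
            means.append(sum(vals) / len(vals))
    return means

def grid_similarity(a, b, grid=2):
    """Compare corresponding blocks only (assumed measure: 1 minus the
    mean absolute difference of block features, scaled to [0, 1])."""
    fa, fb = block_means(a, grid), block_means(b, grid)
    diff = sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)
    return 1.0 - diff / 255.0

def make_image(obj_x, obj_y, size=4):
    """Black background with a single bright 'object' pixel."""
    img = [[0] * size for _ in range(size)]
    img[obj_y][obj_x] = 255
    return img

same = grid_similarity(make_image(0, 0), make_image(0, 0))
moved = grid_similarity(make_image(0, 0), make_image(3, 3))
print(same)   # 1.0
print(moved)  # 0.875
```

The two images in the second comparison contain the same object on the same background, yet the score drops because the object falls into a different block. That position sensitivity is the limitation the abstract points to.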

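Broadcasters increasingly need services that distribute summary videos of programs via social media and similar channels, and technology for automatically summarizing video is in demand. We developed an automatic video summarization technology for news programs and prototyped a system that supports the creation of news summary videos, designed for practical use at broadcast stations. The automatic summarization technology uses speech recognition together with a neural network trained on the image features of scenes that tend to be used in broadcast summary videos, which makes it possible to generate news summary videos that take into account both picture composition suited to broadcast video and the content of the announcer's speech. The prototype system not only automatically summarizes the news program video that a user inputs, but also implements correction functions to accommodate users' diverse preferences. Furthermore, a performance evaluation of the automatic video summarization technology and a usage trial at program production sites demonstrated the usefulness of the prototype system.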

10.3169/itej.77.262 article EN The Journal of The Institute of Image Information and Television Engineers 2023-01-01

A method has been developed for automatically classifying baseball video scenes into some events that describe their content. The scenes are patternized using a set of rectangles with image features and motion vectors. The basic unit of patternization is the shot. For the second shot of each scene, which includes significant information for event classification, a partial shot generated by dividing the shot is used as the processing unit. The training scenes are expressed as sequenced symbols based on the pattern data of shots and partial shots. “Event-unknown” baseball scenes are assigned...

10.3169/itej.61.1139 article EN The Journal of The Institute of Image Information and Television Engineers 2007-01-01

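During the Great East Japan Earthquake of 2011, a large amount of raw news footage (earthquake disaster footage) was shot. Organizing, archiving, and making use of this footage is part of a broadcaster's mission. However, because so much footage was shot in such a short period, it had not been entered into a database and could hardly be reused. Using automatic metadata assignment technology centered on object recognition, we developed a metadata supplementation system that can be used easily even in broadcasting workplaces, supporting metadata assignment work and enabling video search. This paper reports an overview of the functions and performance of the developed metadata supplementation system, and the experiments being conducted at the NHK Fukushima Broadcasting Station.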

10.3169/itej.70.j133 article JA The Journal of The Institute of Image Information and Television Engineers 2016-01-01

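With the aim of making effective use of broadcast archives, we have been researching search technology based on image analysis as an approach that does not rely on manually assigned text information. For about four months from January 2015, in cooperation with related departments, we conducted a verification experiment of a new search system using image analysis within the archive information system actually used by program producers. In addition, user evaluations and an analysis of operation logs yielded data useful for future practical deployment. This paper introduces the technologies adopted, gives an overview of the system and its functions, and describes the evaluations by actual users, the results of the operation log analysis, and the accuracy of the main search functions.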

10.3169/itej.70.j238 article EN The Journal of The Institute of Image Information and Television Engineers 2016-01-01

‘Visual‐based’ image retrieval based on the visual similarities over the entire image is one of the powerful and useful ways when targeting a large volume of content with inadequate annotation. Generally, conventional methods divide a query image and target images in the database into grid‐shaped blocks and calculate the similarity of features by comparing each corresponding block straightforwardly. However, this method sometimes fails in terms of object‐conscious retrieval when the backgrounds are almost the same but only the object is different, or the object's size...

10.1002/tee.22325 article EN IEEJ Transactions on Electrical and Electronic Engineering 2016-12-01

Recently, Head-Up Displays (HUDs) have been widely noticed as in-vehicle HMI devices. A HUD is effective in shortening cognitive time by displaying information in front of the driver. However, the driver cannot receive a sense of distance from current displays, and, depending on the display method, a HUD may induce driver distraction. The purpose of this study is to examine whether the driver receives a sense of distance by changing the focus, and whether cognitive time shortens after giving awareness.

10.1299/jsmetld.2014.23.317 article EN The Proceedings of the Transportation and Logistics Conference 2014-01-01

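This paper proposes a method for detecting caption text regions in TV program video using edge spatial patterns. An edge spatial pattern is a feature that reflects the arrangement of the lines and points that form characters, and is computed from the distribution pattern of edges around a pixel of interest. It can take into account edge crossings and non-straight edges, and is robust to variations in the background image. The proposed method scans frame images taken from program video with a scanning window and classifies the features computed from the window region by machine learning to obtain candidate caption text regions. The detected candidate regions are then screened based on edge density, region shape, and other criteria, and the bounding rectangles of the caption text regions are detected. In an evaluation experiment on about 10 hours of TV program video, the method achieved a recall of 89.9%, a precision of 88.0%, and an F-measure of 0.889, an improvement of 0.135 in F-measure over a conventional method.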

10.3169/itej.69.j197 article JA The Journal of The Institute of Image Information and Television Engineers 2015-01-01
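
The F-measure reported in the abstract above can be checked against the stated recall and precision. The sketch below assumes the usual balanced definition (the harmonic mean of precision and recall, often called F1); the abstract itself does not say which definition was used, and the function name `f_measure` is chosen here for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values taken from the abstract: precision 88.0%, recall 89.9%.
f = f_measure(precision=0.880, recall=0.899)
print(round(f, 3))  # 0.889
```

Under that assumption the computed value agrees with the reported 0.889.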