- Music and Audio Processing
- Speech and Audio Processing
- Video Analysis and Summarization
- Music Technology and Sound Studies
- Advanced Image and Video Retrieval Techniques
- Advanced Image Processing Techniques
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Generative Adversarial Networks and Image Synthesis
- Face Recognition and Analysis
- Image and Signal Denoising Methods
- Neuroscience and Music Perception
- Domain Adaptation and Few-Shot Learning
- Speech Recognition and Synthesis
- Advanced Vision and Imaging
- Diverse Musicological Studies
- Image Processing Techniques and Applications
- Gait Recognition and Analysis
- Topic Modeling
- Advanced Image Fusion Techniques
- Porphyrin and Phthalocyanine Chemistry
- Image Retrieval and Classification Techniques
- Advanced Text Analysis Techniques
- Data Management and Algorithms
Hiroshima University
2024-2025
National Institute of Informatics
2015-2024
Nanyang Technological University
2024
The Graduate University for Advanced Studies, SOKENDAI
2018-2023
RIKEN Center for Advanced Intelligence Project
2023
Mongolia International University
2023
Peking University
2010-2022
Beijing National Laboratory for Molecular Sciences
2010-2022
Nanjing University of Posts and Telecommunications
2021
Nanjing University of Science and Technology
2021
Person reidentification is a key technique for matching different persons observed in nonoverlapping camera views. Many researchers treat it as a special object-retrieval problem, where ranking optimization plays an important role. Existing methods mainly utilize the similarity relationship between the probe and gallery images to optimize the original list, but seldom consider the dissimilarity relationship. In this paper, we propose to use both cues in a framework for person reidentification. Its core idea is that true...
Real-time semantic segmentation, which can be visually understood as a pixel-level classification task on the input image, currently has broad application prospects, especially in the fast-developing fields of autonomous driving and drone navigation. However, the huge burden of calculation, together with redundant parameters, remains an obstacle to its technological development. In this article, we propose a Fast Bilateral Symmetrical Network (FBSNet) to alleviate the above challenges. Specifically, FBSNet...
Person re-identification, aiming to identify images of the same person from various cameras configured in different places, has attracted much attention in the multimedia retrieval community. In this problem, choosing a proper distance metric is a crucial aspect, and many classic methods utilize a uniform learnt metric. However, their performance is limited due to ignoring the zero-shot and fine-grained characteristics presented in real re-identification applications. In this paper, we investigate two consistencies across...
"Synergistic effect" is prevalent in natural metalloenzymes activating small molecules, and the success has inspired the development of artificial catalysts capable of unprecedented organic transformations. In this work, we found that the attractive π–π interaction between additives (as electron donors) and perfluorinated arenes (as electron acceptors) is effective in the gold hydride catalyzed activation of C–F bonds, specifically the hydrodefluorination (HDF) of perfluoroarenes by Sadighi's gold hydrides [(NHC)AuH] (NHC =...
Deep cross-modal learning has demonstrated excellent performance in multimedia retrieval, with the aim of learning joint representations between different data modalities. Unfortunately, little research focuses on correlation learning where the temporal structures of different modalities, such as audio and lyrics, should be taken into account. Stemming from the characteristic nature of music, we are motivated to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep architecture involving two-branch neural networks for...
In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photograph is very important for visual context-aware applications. Unfortunately, few efforts have paid attention to complicated real images such as photographs generated by users. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, category-based canonical correlation analysis. Given an input, the model performs: 1) exact search (find the...
In Intelligent Tutoring Systems (ITS), tracing the student's knowledge state during learning has been studied for several decades in order to provide more supportive instructions. In this paper, we propose a novel model that i) captures students' ability and dynamically assigns students into distinct groups with similar ability at regular time intervals, and ii) combines this information with a Recurrent Neural Network architecture known as Deep Knowledge Tracing. Experimental results confirm that the proposed model is significantly...
In recent years, how to strike a good trade-off between accuracy, inference speed, and model size has become the core issue for real-time semantic segmentation applications, which play a vital role in real-world scenarios such as autonomous driving systems and drones. In this study, we devise a novel lightweight network using a multi-scale context fusion (MSCFNet) scheme, which explores an asymmetric encoder-decoder architecture to alleviate these problems. More specifically, the encoder adopts some developed...
Convolutional neural network based single-image super-resolution (SISR) has made great progress in recent years. However, it is difficult to apply these methods to real-world scenarios due to their computational and memory cost. Meanwhile, how to take full advantage of the intermediate features under the constraints of limited parameters and calculations is also a huge challenge. To alleviate these issues, we propose a lightweight yet efficient Feature Distillation Interaction Weighted Network (FDIWN). Specifically, FDIWN...
Photodynamic therapy (PDT) is a non‐invasive treatment modality against a range of cancers and nonmalignant diseases; however, one must be aware of the risk of causing phototoxic reactions after treatment. We herein report a bioinspired design of next‐generation photosensitizers (PSs) that not only effectively produce ROS but also undergo fast metabolism to overcome undesirable side effects. We constructed a series of β‐pyrrolic ring‐opening seco‐chlorins, termed beidaphyrin (BP), beidapholactone (BPL), and their...
A novel class of ZnSalens (ZnL(1-10)) with lipophilic and cationic conjugates was prepared as optical probes for single- and two-photon fluorescence microscopy imaging of living cells; the probes exhibited chemo- and photostability, low cytotoxicity, and high subcellular selectivity.
Person reidentification (re-id), as an important task in video surveillance and forensics applications, has been widely studied. Previous research efforts toward solving the person re-id problem have primarily focused on constructing a robust vector description by exploiting appearance characteristics, or on learning a discriminative distance metric from labeled vectors. Based on the cognition and identification process of humans, we propose a new pattern, which transforms the feature from characteristic to...
Representation-residual-based classifiers have attracted much attention in recent years in hyperspectral image (HSI) classification. How to obtain the optimal representation coefficients for the classification task is the key problem of these methods. In this letter, a spatial-aware collaborative representation (CR) is proposed for HSI classification. In order to make full use of the spatial-spectral information, we propose a closed-form solution, in which the spatial and spectral features are both utilized to induce the distance-weighted...
Capturing videos anytime and anywhere, and then instantly sharing them online, has become a very popular activity. However, many outdoor user-generated videos (UGVs) lack a certain appeal because their soundtracks consist mostly of ambient background noise. Aimed at making UGVs more attractive, we introduce ADVISOR, a personalized video soundtrack recommendation system. We propose a fast and effective heuristic ranking approach based on heterogeneous late fusion by jointly considering three aspects: venue...
Person re-identification is widely applied in video surveillance and criminal investigation applications. To achieve better performance, an additional re-ranking step is often exploited. Related methods attempt to optimize the result according to every single query independently. However, in a practical scene, as the process goes on, other queries, in particular gradually accumulated logs, can be used to guide or regularize the current query. In this paper, we propose to use not only the query itself but also other queries and historical...
Melody generation from lyrics has been a challenging research issue in the field of artificial intelligence and music, which enables us to learn and discover latent relationships between interesting lyrics and accompanying melodies. Unfortunately, the limited availability of a paired lyrics–melody dataset with alignment information has hindered progress. To address this problem, we create a large dataset consisting of 12,197 MIDI songs, each with paired lyrics and melody, by leveraging different music sources where the relationship between syllables and music attributes is...
We report the first example of gold catalyzing C–F bond activation for perfluoroarenes in the presence of silanes. Tricoordinated gold(I) complexes supported by Xantphos‐type ligands, such as Xantphos and tBuXantphos, exhibit efficacy for the hydrodefluorination (HDF) of various types of perfluoroarenes. For [tBuXantphosAu(AuCl2)], the highest turnover number is up to 1000 for the HDF of pentafluoronitrobenzene with diphenylsilane. An examination of functional group tolerance shows the orthogonality of this catalytic protocol...
Face hallucination is a technique that reconstructs high-resolution (HR) faces from low-resolution (LR) faces by using the prior knowledge learned from HR/LR face pairs. Most state-of-the-art methods leverage position-patch priors of the human face to estimate the optimal representation coefficients for each image patch. However, they focus only on the position information and usually ignore the context information. In addition, when they are confronted with misalignment or the small sample size (SSS) problem, the performance is very poor. To this end, this paper...
Cross-modal retrieval aims to retrieve data in one modality by a query in another modality, which has been a very interesting research issue in the fields of multimedia, information retrieval, computer vision, and databases. Most existing works focus on cross-modal retrieval between text-image, text-video, and lyrics-audio. Little research addresses cross-modal retrieval between audio and video due to limited audio-video paired datasets and semantic information. The main challenge of the audio-visual cross-modal retrieval task focuses on learning joint embeddings from a shared subspace for...
Cross-resolution face recognition (CRFR), which is important in intelligent surveillance and biometric forensics, refers to the problem of matching a low-resolution (LR) probe face image against high-resolution (HR) gallery face images. Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space where the resolution discrepancy is mitigated. However, few works consider how to extract and utilize the intermediate discriminative features from the noisy LR query faces to further mitigate...