Zhongjie Ye

ORCID: 0000-0003-0306-5267
Research Areas
  • Speech and Audio Processing
  • Music and Audio Processing
  • Force Microscopy Techniques and Applications
  • Video Analysis and Summarization
  • Lipid Membrane Structure and Behavior
  • Animal Vocal Communication and Behavior
  • Handwritten Text Recognition Techniques
  • Multimodal Machine Learning Applications
  • Hearing Loss and Rehabilitation
  • Neural Networks and Applications
  • Human Pose and Action Recognition
  • Cellular Mechanics and Interactions
  • Advanced Electron Microscopy Techniques and Applications
  • Speech Recognition and Synthesis
  • RNA and protein synthesis mechanisms
  • Mechanical and Optical Resonators
  • Ion channel regulation and function

Scuola Internazionale Superiore di Studi Avanzati
2019-2024

Peking University
2022

Transmembrane protein 16F (TMEM16F) is a Ca2+-activated homodimer which functions as an ion channel and a phospholipid scramblase. Despite the availability of several TMEM16F cryogenic electron microscopy (cryo-EM) structures, the mechanism of activation and substrate translocation remains controversial, possibly due to restrictions in the accessible conformational space. In this study, we use atomic force microscopy under physiological conditions to reveal a range of structurally and mechanically diverse assemblies,...

10.1038/s41467-023-44377-7 article EN cc-by Nature Communications 2024-01-02

There is growing evidence suggesting that the mechanical properties of CNS neurons may play an important regulatory role in cellular processes. Here, we employ oscillatory optical tweezers (OOT) to exert a local indentation with forces in the range of 5-50 pN. We found that a single indentation above a threshold of 13 ± 1 pN evokes a transient intracellular calcium change, whereas repeated stimulations induce a more sustained and variable response. Importantly, neurons were able to differentiate the magnitude of the stimuli. Chemical perturbation...

10.1016/j.isci.2022.103807 article EN cc-by-nc-nd iScience 2022-01-25

Although the prototypical network (ProtoNet) has proved to be an effective method for few-shot sound event detection, two problems still exist. Firstly, the small-scaled support set is insufficient, so the class prototypes may not represent the class center accurately. Secondly, the feature extractor is task-agnostic (or class-agnostic): it is trained with base-class data and directly applied to unseen-class data. To address these issues, we present a novel mutual learning framework with transductive learning, which aims at...

10.1109/icassp43922.2022.9746042 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Automated audio captioning (AAC) has developed rapidly in recent years, involving acoustic signal processing and natural language processing to generate human-readable sentences for audio clips. The current models are generally based on the neural encoder-decoder architecture, and their decoder mainly uses acoustic information that is extracted from the CNN-based encoder. However, they have ignored semantic information that could help the AAC model to generate meaningful descriptions. This paper proposes a novel approach for automated audio captioning incorporating semantic information....

10.48550/arxiv.2110.06100 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Single-molecule force spectroscopy (SMFS) uses the cantilever tip of an atomic force microscopy (AFM) to apply a force able to unfold a single protein. The obtained force-distance curve encodes the unfolding pathway, and from its analysis it is possible to characterize the folded domains. SMFS has been mostly used to study purified proteins, in solution or reconstituted in a lipid bilayer. Here, we describe a pipeline for analyzing membrane proteins based on SMFS, which involves the isolation of the plasma membrane of cells and the harvesting of force-distance curves directly...

10.7554/elife.77427 article EN cc-by eLife 2022-09-12

Automated audio captioning (AAC) aims at generating natural language descriptions for an audio clip. Due to the difficulty and high cost of annotating audio-caption pairs, the existing dataset is of a very small scale, which leads to unsatisfactory performance of AAC models. One intuitive and effective solution is to augment the training data to boost performance instead of annotating more data. To this end, we propose an online augmentation method (FeatureCut) incorporating the encoder-decoder framework to enable the decoder to fully make use of the acoustic features in...

10.23919/apsipaasc55919.2022.9980325 article EN 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022-11-07

Although the prototypical network (ProtoNet) has proved to be an effective method for few-shot sound event detection, two problems still exist. Firstly, the small-scaled support set is insufficient, so the class prototypes may not represent the class center accurately. Secondly, the feature extractor is task-agnostic (or class-agnostic): it is trained with base-class data and directly applied to unseen-class data. To address these issues, we present a novel mutual learning framework with transductive learning, which aims at...

10.48550/arxiv.2110.04474 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Single-molecule force spectroscopy (SMFS) uses the cantilever tip of an AFM to apply a force able to unfold a single protein. The obtained force-distance curve encodes the unfolding pathway, and from its analysis it is possible to characterize the folded domains. SMFS has been mostly used to study purified proteins, in solution or reconstituted in a lipid bilayer. Here, we describe a pipeline for analyzing membrane proteins based on SMFS, that involves the isolation of the plasma membrane of cells and the harvesting of force-distance curves directly from it. We...

10.1101/732933 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2019-08-13

In this technical report, we briefly introduce the solutions of our team 'PKU-WICT-MIPL' for the PIC Makeup Temporal Video Grounding (MTVG) Challenge in ACM-MM 2022. Given an untrimmed makeup video and a step query, MTVG aims to localize the temporal moment of the target step in the video. To tackle this task, we propose a phrase relationship mining framework to exploit the localization relationship relevant to the fine-grained phrase and the whole sentence. Besides, we constrain the results of different sentence queries to not overlap with each other through dynamic programming...

10.48550/arxiv.2207.02687 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Over the past years, Printed Mathematical Expression Recognition (PMER) has progressed rapidly. However, due to the insufficient context information captured by Convolutional Neural Networks, some mathematical symbols might be incorrectly recognized or missed. To tackle this problem, in this paper, a Dual Branch transformer-based Network (DBN) is proposed to learn both local and global context information for accurate PMER. In our DBN, local and global features are extracted simultaneously, and a Context Coupling Module (CCM) is developed...

10.48550/arxiv.2312.09030 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Target sound detection (TSD) aims to detect the target sound from a mixture audio given the reference information. Previous methods use a conditional network to extract a sound-discriminative embedding from the reference audio, and then use it to detect the target sound from the mixture audio. However, the network performs much differently when using different reference audios (e.g. performs poorly for noisy and short-duration reference audios), and tends to make wrong decisions for transient events (i.e. shorter than 1 second). To overcome these problems, in this paper, we present a reference-aware and duration-robust (RaDur)...

10.21437/interspeech.2022-433 article EN Interspeech 2022 2022-09-16

Target sound detection (TSD) aims to detect the target sound from a mixture audio given the reference information. Previous methods use a conditional network to extract a sound-discriminative embedding from the reference audio, and then use it to detect the target sound from the mixture audio. However, the network performs much differently when using different reference audios (e.g. performs poorly for noisy and short-duration reference audios), and tends to make wrong decisions for transient events (i.e. shorter than 1 second). To overcome these problems, in this paper, we present a reference-aware and duration-robust network (RaDur) for TSD....

10.48550/arxiv.2204.02143 preprint EN other-oa arXiv (Cornell University) 2022-01-01