NFDI4DS | UHH-SEMS - Publication Details

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

OPENALEX - Publications

Qian Zhang Lu Han Haşim Sak Anshuman Tripathi Erik McDermott and 2 more

In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences independently. The activations from both audio and label encoders are combined with a feed-forward layer to compute a probability distribution over the label space for every combination of acoustic frame position and label history. This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding...

10.1109/icassp40776.2020.9053896 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Toward Domain-Invariant Speech Recognition via Large Scale Training

Arun Narayanan Ananya Misra Khe Chai Sim Golan Pundak Anshuman Tripathi and 4 more

Current state-of-the-art automatic speech recognition systems are trained to work in specific "domains", defined based on factors like application, sampling rate and codec. When such recognizers are used in conditions that do not match the training domain, performance drops significantly. This work explores the idea of building a single domain-invariant model for varied use-cases by combining large scale data from multiple application domains. Our final system is trained using 162,000 hours of speech. Additionally,...

10.1109/slt.2018.8639610 article EN 2018 IEEE Spoken Language Technology Workshop (SLT) 2018-12-01

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

Wei Xia Lu Han Quan Wang Anshuman Tripathi Yiling Huang and 2 more

In this paper, we present a novel speaker diarization system for streaming on-device applications. In this system, we use a transformer transducer to detect the speaker turns, represent each speaker turn by a speaker embedding, then cluster these embeddings with constraints from the detected speaker turns. Compared with conventional clustering-based diarization systems, our system largely reduces the computational cost of clustering due to the sparsity of speaker turns. Unlike other supervised speaker diarization systems which require annotations of time-stamped speaker labels for training, our system only requires including speaker turn tokens during...

10.1109/icassp43922.2022.9746531 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Temporal Modeling Using Dilated Convolution and Gating for Voice-Activity-Detection

Shuo-Yiin Chang Bo Li Gabor Simko Tara N. Sainath Anshuman Tripathi and 2 more

Voice activity detection (VAD) is the task of predicting which parts of an utterance contain speech versus background noise. It is an important first step to determine which samples to send to the decoder and when to close the microphone. The long short-term memory neural network (LSTM) is a popular architecture for sequential modeling of acoustic signals, and has been successfully used in several VAD applications. However, it has been observed that LSTMs suffer from state saturation problems when the utterance is long (i.e., for voice dictation tasks), and thus requires...

10.1109/icassp.2018.8461921 article EN 2018-04-01

Speech Recognition for Medical Conversations

Chung‐Cheng Chiu Anshuman Tripathi Katherine Chou Chris Co Navdeep Jaitly and 9 more

In this paper we document our experiences with developing speech recognition for medical transcription: a system that automatically transcribes doctor-patient conversations. Towards this goal, we built a system along two different methodological lines: a Connectionist Temporal Classification (CTC) phoneme based model and a Listen Attend and Spell (LAS) grapheme based model. To train these models we used a corpus of anonymized conversations representing approximately 14,000 hours of speech. Because of noisy transcripts and alignments in...

10.21437/interspeech.2018-40 article EN Interspeech 2018 2018-08-28

Dynamic control of torque in overmodulation and in the field weakening region

Anshuman Tripathi Ashwin M. Khambadkone Sanjib Kumar Panda

At high angular velocity, the induction motor is operated in the field weakening range due to the voltage limit of the inverter. Field oriented vector control (FOC) is unsuitable for this operation due to coupling, non-linearities, and saturation of the linear current controllers. A proposed direct torque control space vector modulation (DTC–SVM) scheme using SVM does not use coordinate transforms or current controllers to achieve DTC. Control of the stator flux vector allows dynamic change of torque in all regions, including field weakening with six-step operation. This paper...

10.1109/tpel.2006.876823 article EN IEEE Transactions on Power Electronics 2006-07-01

Monotonic Recurrent Neural Network Transducer and Decoding Strategies

Anshuman Tripathi Lu Han Haşim Sak Hagen Soltau

Recurrent Neural Network Transducer (RNNT) is an end-to-end model which transduces discrete input sequences to output sequences by learning alignments between the sequences. In speech recognition tasks we generally have a strictly monotonic alignment between time frames and the label sequence. However, the standard RNNT loss does not enforce this constraint. This can cause some anomalies in alignments, such as outputting a sequence of labels at a single time frame. There is also no bound on the decoding time steps. To address these problems,...

10.1109/asru46091.2019.9003822 article EN 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019-12-01

Ageing and Efficiency Aware Battery Dispatch for Arbitrage Markets Using Mixed Integer Linear Programming

Holger C. Hesse Volkan Kumtepeli Michael Schimpe Jorn M. Reniers David A. Howey and 3 more

To achieve maximum profit by dispatching a battery storage system in an arbitrage operation, multiple factors must be considered. While revenue from the application is determined by the time variability of the electricity cost, the profit will be lowered by costs resulting from energy efficiency losses, as well as by battery degradation. In this paper, an optimal dispatch strategy is proposed for storage systems trading on energy arbitrage markets. The dispatch is based on a computationally-efficient implementation of a mixed-integer linear programming method, with a cost function that...

10.3390/en12060999 article EN cc-by Energies 2019-03-14

Reserved-Energy-Aided Control in Solid-State Transformers

Radhika Sarda Ezequiel Rodriguez Ramos Glen G. Farivar Josep Pou Yeo Howe Li and 2 more

10.1109/tpel.2025.3547155 article EN IEEE Transactions on Power Electronics 2025-01-01

Design and analysis of an aging‐aware energy management system for islanded grids using mixed‐integer quadratic programming

Volkan Kumtepeli Yulong Zhao Maik Naumann Anshuman Tripathi Youyi Wang and 2 more

The rapid increase of renewable energy sources made coordinated control of the distributed and intermittent generation units a more demanded task. Matching demand and supply is particularly challenging in islanded microgrids. In this study, we have demonstrated a mixed-integer quadratic programming (MIQP) method to achieve efficient use of the energy sources within an islanded microgrid. A unique objective function involving fuel consumption of the diesel generator, degradation of the lithium-ion battery storage system, carbon emissions, load...

10.1002/er.4512 article EN International Journal of Energy Research 2019-06-03

End-To-End Multi-Talker Overlapping Speech Recognition

Anshuman Tripathi Lu Han Haşim Sak

In this paper we present an end-to-end speech recognition system that can recognize single-channel speech where multiple talkers can speak at the same time (overlapping speech) by using a neural network model based on the Recurrent Neural Network Transducer (RNN-T) architecture. We augment the conventional RNN-T architecture by including a masking model for separation of encoded audio features, and multiple label encoders to encode transcripts from different speakers. use L2 loss prevent align wrong speakers' audio, speaker...

10.1109/icassp40776.2020.9054328 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition

Anshuman Tripathi Jaeyoung Kim Qian Zhang Lu Han Haşim Sak

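In this paper we present a Transformer-Transducer model architecture and a training technique to unify streaming and non-streaming speech recognition models into one model. The model is composed of a stack of transformer layers for audio encoding with no lookahead or right context and an additional stack of transformer layers on top trained with variable right context. In inference time, the context length can be changed to trade off the latency and the accuracy of the model. We also show that we can run this model in a Y-model architecture, with the top layers running in parallel in low latency and high latency modes. This allows us to have streaming speech recognition results with limited latency and delayed speech recognition results with large...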

10.48550/arxiv.2010.03192 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition

Khe Chai Sim Arun Narayanan Ananya Misra Anshuman Tripathi Golan Pundak and 4 more

10.21437/interspeech.2018-2246 article EN Interspeech 2018 2018-08-28

Energy Arbitrage Optimization With Battery Storage: 3D-MILP for Electro-Thermal Performance and Semi-Empirical Aging Models

Volkan Kumtepeli Holger C. Hesse Michael Schimpe Anshuman Tripathi Youyi Wang and 1 more

Dispatch of battery storage systems for stationary grid applications is a topic of increasing interest: due to the volatility of the power system's energy supply relying on variable renewable energy sources, one foresees a rising demand and market potential for both short- and long-term fluctuation smoothing via energy storage. While the revenue attainable by arbitrage trading may yet surpass the steadily declining cost of lithium-ion battery storage systems, profitability will be constrained directly by the limited lifetime of the battery system and lowered by dissipation losses...

10.1109/access.2020.3035504 article EN cc-by IEEE Access 2020-01-01

Multilingual Speech Recognition with Self-Attention Structured Parameterization

Yun Zhu Parisa Haghani Anshuman Tripathi Bhuvana Ramabhadran Brian Farris and 7 more

10.21437/interspeech.2020-2847 article EN Interspeech 2020 2020-10-25

Accurate calculation of winding resistance and influence of interleaving to mitigate ac effect in a medium-frequency high-power transformer

Annoy Kumar Das Haonan Tian Zhongbao Wei Sriram Vaisambhayana Shuyu Cao and 2 more

A medium/high-frequency transformer is an integral part of many power conversion systems. Switching at higher frequency results in lesser volume of magnetics but induces higher winding loss density, on account of increased eddy current effects in conductors. Thus winding resistance is a key parameter to characterize the performance of a medium-frequency (MF) high-power (HP) transformer. In this paper, 10 kW, 0.5/2.5 kV, 1 kHz transformer designs are presented employing different winding dispositions (normal and interleaved) and conductor geometries...

10.1109/acept.2017.8168612 article EN 2017-10-01

State of art survey for design of medium frequency high power transformer

Sriram Vaisambhayana Catalin Gabriel Dincan Shuyu Cao Anshuman Tripathi Haonan Tian and 1 more

Medium and high frequency, high power transformers play an important role in footprint reduction along with their functions of galvanic isolation and voltage transformation in all high power converters typically used in traction systems, offshore wind plant converters, and solid state transformer based distribution system grids. This state of the art report analyses the various materials and design tradeoffs that govern the electromagnetic behavior and loss mechanisms in medium frequency high power applications. Typical winding and core geometries have been...

10.1109/acept.2016.7811550 article EN 2016-10-01

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

Qian Zhang Lu Han Haşim Sak Anshuman Tripathi Erik McDermott and 2 more

In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences independently. The activations from both audio and label encoders are combined with a feed-forward layer to compute a probability distribution over the label space for every combination of acoustic frame position and label history. This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding...

10.48550/arxiv.2002.02562 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Contrastive Siamese Network for Semi-Supervised Speech Recognition

Soheil Khorram Jaeyoung Kim Anshuman Tripathi Lu Han Qian Zhang and 1 more

This paper introduces the contrastive siamese (c-siam) network, an architecture for leveraging unlabeled acoustic data in speech recognition. c-siam is the first network that extracts high-level linguistic information from speech by matching outputs of two identical transformer encoders. It contains augmented and target branches which are trained by: (1) masking inputs and matching outputs with a contrastive loss, (2) incorporating a stop gradient operation on the target branch, (3) using an extra learnable transformation on the augmented branch, (4) introducing new temporal...

10.1109/icassp43922.2022.9747355 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Parallel Operation of Unity Power Factor Rectifier for PMSG Wind Turbine System

Md Shafquat Ullah Khan Ali I. Maswood Mohd Tariq Hossein Dehghani Tafti Anshuman Tripathi

Offshore wind power has inspired the fields of high voltage direct current (HVdc) for the advantages of power transmission in long distance. Hefty wind generators are making advanced multilevel rectifier and parallel operation of rectifiers the popular choice of research with the aim to accommodate higher power. Issues of reliability and complexity of control are associated with active power electronic devices at such high power. This paper focuses on the novelty of parallel operation of three-phase diode rectifiers, each with auxiliary bidirectional switching blocks (BSB) to improve their performance. For...

10.1109/tia.2018.2870820 article EN IEEE Transactions on Industry Applications 2018-09-17

LC-StatCom with symmetrical I-V characteristic: Total Harmonic Distortion Study

Glen G. Farivar Josep Pou Anshuman Tripathi

In this paper, the effect of capacitor voltage ripple on current quality in a cascaded H-bridge (CHB) low-capacitance static compensator (LC-StatCom) with symmetrical I-V characteristics is investigated. Total harmonic distortion of the synthesized ac current in an H-bridge converter operating with different capacitor voltage ripples is evaluated for both inductive and capacitive modes. Simulation-based analyses of a 350-VA three-cell CHB LC-StatCom system are provided to demonstrate the LC-StatCom's effectiveness to provide high current quality compared to a conventional StatCom.

10.1109/acept.2017.8168597 article EN 2017-10-01

Thermal modeling and transient behavior analysis of a medium-frequency high-power transformer

Annoy Kumar Das Zhongbao Wei Sriram Vaisambhayana Shuyu Cao Haonan Tian and 2 more

A medium/high-power conversion system, using a power electronic (PE) converter in conjunction with a medium/high-frequency transformer, has many desirable effects suitably oriented for modern power system architecture. Switching at high frequency results in lesser volume of magnetics but induces higher loss density. Thus design and characterization of a medium-frequency (MF) high-power (HP) transformer has significant ramification on its performance in application. Thermal management of an MF HP transformer is one of the key aspects...

10.1109/iecon.2017.8216372 article EN IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society 2017-10-01

Accurate calculation of leakage inductance for balanced and fractional-interleaved winding in medium-frequency high-power transformer

Annoy Kumar Das Zhongbao Wei Shuyu Cao Sriram Vaisambhayana Haonan Tian and 2 more

A medium/high-power conversion system using a power electronic (PE) converter along with a medium/high-frequency transformer offers many desirable features that are beneficial for present-day power system topologies. Leakage inductance is identified to be one of the key parameters to characterize the performances of such a medium-frequency (MF) high-power (HP) transformer. In this paper, the existing analytical method to calculate leakage inductance for concentric winding is further refined, employing the mean turn length of individual layer and...

10.1109/spec.2017.8333556 article EN 2017-12-01

Multi-area model predictive load frequency control: A decentralized approach

Volkan Kumtepeli Youyi Wang Anshuman Tripathi

Increasing power consumption requires engineers to find better control techniques to increase energy efficiency. Advancements in technology allow us to use more complex control algorithms to pursue this goal. Load frequency control (LFC) is one of the vital points in the power system and a state of the art control method must be used to ensure the power quality of the grid. In this work, a decentralized model predictive controller (MPC) with generation rate constraints is used to handle the LFC problem in a four area interconnected power system. It is seen that the MPC successfully achieved the given...

10.1109/acept.2016.7811530 article EN 2016-10-01

Multi-variable optimization methodology for medium-frequency high-power transformer design employing steepest descent method

Annoy Kumar Das Zhongbao Wei B. G. Fernandes Haonan Tian Madasamy Palavesha Thevar and 4 more

Finding a balance among multiple design objectives of a medium/high-frequency (MF/HF) high-power (HP) transformer is best addressed by employing an optimization technique. In this paper, MF HP transformer design is formulated as a multi-variable optimization problem, where efficiency, power density and temperature rise are chosen as objectives. Total loss, core volume and maximum temperature rise are modeled as the respective cost functions and amalgamated using a weighted-sum approach to derive the objective function. It is minimized employing the steepest descent method. Being gradient-based...

10.1109/apec.2018.8341259 article EN 2018 IEEE Applied Power Electronics Conference and Exposition (APEC) 2018-03-01