- Speech Recognition and Synthesis
- Multilevel Inverters and Converters
- Advanced DC-DC Converters
- Speech and Audio Processing
- Music and Audio Processing
- Microgrid Control and Optimization
- Silicon Carbide Semiconductor Technologies
- Electric Motor Design and Analysis
- Induction Heating and Inverter Technology
- Sensorless Control of Electric Motors
- Advanced Battery Technologies Research
- Topic Modeling
- Autonomous Vehicle Technology and Safety
- Magnetic Properties and Applications
- Natural Language Processing Techniques
- Electric Vehicles and Infrastructure
- Advanced Manufacturing and Logistics Optimization
- Islanding Detection in Power Systems
- Wind Turbine Control Systems
- Electric and Hybrid Vehicle Technologies
- Aerospace and Aviation Technology
- Real-time simulation and control systems
- Indoor and Outdoor Localization Technologies
- Power Systems and Renewable Energy
- Advanced Neural Network Applications
Nanyang Technological University
2016-2025
Google (United States)
2018-2024
Technological Institute of the Philippines
2021
National Institute of Technology Raipur
2021
Indian Institute of Technology Bombay
2017
Indian Institute of Technology Kharagpur
2017
National University of Singapore
2004-2006
General Electric (United States)
2006
In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences independently. The activations from both audio and label encoders are combined with a feed-forward layer to compute a probability distribution over the label space for every combination of acoustic frame position and label history. This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding...
Current state-of-the-art automatic speech recognition systems are trained to work in specific 'domains', defined based on factors like application, sampling rate and codec. When such recognizers are used in conditions that do not match the training domain, performance significantly drops. This work explores the idea of building a single domain-invariant model for varied use-cases by combining large scale training data from multiple application domains. Our final system is trained using 162,000 hours of speech. Additionally,...
In this paper, we present a novel speaker diarization system for streaming on-device applications. In this system, we use a transformer transducer to detect the speaker turns, represent each speaker turn by a speaker embedding, then cluster these embeddings with constraints from the detected speaker turns. Compared with conventional clustering-based diarization systems, our system largely reduces the computational cost of clustering due to the sparsity of speaker turns. Unlike other supervised speaker diarization systems which require annotations of time-stamped speaker labels for training, our system only requires including speaker turn tokens during...
Voice activity detection (VAD) is the task of predicting which parts of an utterance contain speech versus background noise. It is an important first step to determine which samples to send to the decoder and when to close the microphone. The long short-term memory neural network (LSTM) is a popular architecture for sequential modeling of acoustic signals, and has been successfully used in several VAD applications. However, it has been observed that LSTMs suffer from state saturation problems (i.e., for voice dictation tasks), and thus requires...
In this paper we document our experiences with developing speech recognition for medical transcription, a system that automatically transcribes doctor-patient conversations. Towards this goal, we built a system along two different methodological lines: a Connectionist Temporal Classification (CTC) phoneme based model and a Listen Attend and Spell (LAS) grapheme based model. To train these models we used a corpus of anonymized conversations representing approximately 14,000 hours of speech. Because of noisy transcripts and alignments in...
At high angular velocity, the induction motor is operated in the field weakening range due to the voltage limit of the inverter. Field oriented vector control (FOC) is unsuitable for this operation due to coupling, non-linearities, and saturation of linear current controllers. A proposed direct torque control space vector modulation (DTC–SVM) scheme using SVM does not use coordinate transforms or current controllers to achieve DTC. Control of stator flux allows dynamic change of torque in all regions, including field weakening with six-step operation. This paper...
Recurrent Neural Network Transducer (RNNT) is an end-to-end model which transduces discrete input sequences to output sequences by learning alignments between the sequences. In speech recognition tasks we generally have a strictly monotonic alignment between time frames and the label sequence. However, the standard RNNT loss does not enforce this constraint. This can cause some anomalies in alignments, such as the model outputting a sequence of labels at a single time frame. There is also no bound on the decoding time steps. To address these problems,...
To achieve maximum profit by dispatching a battery storage system in an arbitrage operation, multiple factors must be considered. While revenue from the application is determined by the time variability of the electricity cost, the profit will be lowered by costs resulting from energy efficiency losses, as well as by battery degradation. In this paper, an optimal dispatch strategy is proposed for storage systems trading on energy arbitrage markets. The dispatch is based on a computationally-efficient implementation of a mixed-integer linear programming method, with a cost function that...
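The trade-off named in this abstract (price spread against efficiency losses and degradation cost) can be illustrated with a small sketch. It is not the mixed-integer linear program of the paper: it swaps in dynamic programming over a discretized state of charge, and the function name, the one-unit-per-step rule, the prices and all parameter values are hypothetical.

```python
def best_arbitrage_profit(prices, capacity=2, efficiency=0.9, degradation_cost=1.0):
    """Best profit for a battery that moves at most one energy unit per step.

    prices: market price per energy unit at each time step (made-up data).
    capacity: number of energy units the battery can hold.
    efficiency: fraction of a stored unit that can be sold on discharge.
    degradation_cost: cost charged for every unit charged or discharged.
    """
    neg_inf = float("-inf")
    # value[soc] = best profit so far when the battery holds `soc` units.
    value = [0.0] + [neg_inf] * capacity
    for price in prices:
        nxt = [neg_inf] * (capacity + 1)
        for soc, profit in enumerate(value):
            if profit == neg_inf:
                continue  # this state of charge is not reachable yet
            # Option 1: stay idle.
            nxt[soc] = max(nxt[soc], profit)
            # Option 2: buy and store one unit.
            if soc < capacity:
                nxt[soc + 1] = max(nxt[soc + 1], profit - price - degradation_cost)
            # Option 3: sell one stored unit, losing part of it to inefficiency.
            if soc > 0:
                nxt[soc - 1] = max(
                    nxt[soc - 1], profit + efficiency * price - degradation_cost
                )
        value = nxt
    return max(value)
```

With a flat price series the function returns zero, since every round trip loses money to efficiency and degradation; a trade only happens when the spread exceeds both.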
The rapid increase of renewable energy sources has made coordinated control of the distributed and intermittent generation units a more demanded task. Matching demand and supply is particularly challenging in islanded microgrids. In this study, we have demonstrated a mixed-integer quadratic programming (MIQP) method to achieve efficient use of energy sources within an islanded microgrid. A unique objective function involving fuel consumption of the diesel generator, degradation of the lithium-ion battery storage system, carbon emissions, load...
In this paper we present an end-to-end speech recognition system that can recognize single-channel speech where multiple talkers speak at the same time (overlapping speech) by using a neural network model based on the Recurrent Neural Network Transducer (RNN-T) architecture. We augment the conventional RNN-T architecture by including a masking model for separation of encoded audio features, and multiple label encoders to encode transcripts from different speakers. We use L2 loss prevent align wrong speakers' audio, speaker...
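In this paper we present a Transformer-Transducer model architecture and a training technique to unify streaming and non-streaming speech recognition models into one model. The model is composed of a stack of transformer layers for audio encoding with no lookahead or right context and an additional stack of transformer layers on top trained with variable right context. In inference time, the context length for the variable context layers can be changed to trade off the latency and the accuracy of the model. We also show that we can run this model in a Y-model architecture with the top layers running in parallel in low latency and high latency modes. This allows us to have streaming speech recognition results with limited latency and delayed speech recognition results with large...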
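A minimal sketch of what "no lookahead" versus "variable right context" means for self-attention: each frame is allowed to attend only to frames inside a window around it. The function and parameter names are hypothetical, and the boolean-mask formulation is an assumption about one common way to limit context, not a description of the paper's implementation.

```python
def context_mask(num_frames, left_context, right_context):
    """Return mask[i][j] == True when frame i may attend to frame j.

    right_context=0 gives a streaming layer with no lookahead; a larger
    right_context trades latency for access to future frames.
    """
    return [
        [i - left_context <= j <= i + right_context for j in range(num_frames)]
        for i in range(num_frames)
    ]


# Streaming layer: one frame of history, no future frames.
streaming = context_mask(4, left_context=1, right_context=0)
# Higher-latency layer: same history plus one future frame.
delayed = context_mask(4, left_context=1, right_context=1)
```

Changing `right_context` at inference time is the knob the abstract describes for trading latency against accuracy.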
Dispatch of battery storage systems for stationary grid applications is a topic of increasing interest: due to the volatility of the power system's energy supply relying on variable renewable energy sources, one foresees a rising demand and market potential for both short- and long-term fluctuation smoothing via energy storage. While the revenue attainable via arbitrage trading may yet surpass the steadily declining cost of lithium-ion battery storage systems, profitability will be constrained directly by the limited lifetime of the battery system and lowered by dissipation losses...
A medium/high-frequency transformer is an integral part of many power conversion systems. Switching at higher frequency results in lesser volume of magnetics but induces higher winding loss density, on account of increased eddy current effects in conductors. Thus winding resistance is a key parameter to characterize the performance of a medium-frequency (MF) high-power (HP) transformer. In this paper, 10 kW, 0.5/2.5 kV, 1 kHz transformer designs are presented employing different winding dispositions (normal and interleaved) and conductor geometries...
Medium and high frequency, high power transformers play an important role in footprint reduction along with their functions of galvanic isolation and voltage transformation in all power converters typically used in traction systems, offshore wind plant converters, and solid state transformer based distribution system grids. This state-of-the-art report analyzes the various materials and design tradeoffs that govern the electromagnetic behavior and loss mechanisms in medium frequency applications. Typical winding and core geometries have been...
This paper introduces the contrastive siamese (c-siam) network, an architecture for leveraging unlabeled acoustic data in speech recognition. c-siam is the first network that extracts high-level linguistic information from speech by matching outputs of two identical transformer encoders. It contains augmented and target branches which are trained by: (1) masking inputs and matching outputs with a contrastive loss, (2) incorporating a stop gradient operation on the target branch, (3) using an extra learnable transformation on the augmented branch, (4) introducing new temporal...
Offshore wind power has inspired the field of high voltage direct current (HVdc) for its advantages in long-distance transmission. Hefty wind generators are making advanced multilevel rectifiers and parallel operation of rectifiers a popular choice of research, with the aim to accommodate higher power. Issues of reliability and complexity of control are associated with active power electronic devices at such high power. This paper focuses on the novelty of three-phase diode rectifiers, each with auxiliary bidirectional switching blocks (BSB) to improve their performance. For...
In this paper, the effect of capacitor voltage ripple on current quality in a cascaded H-bridge (CHB) low-capacitance static compensator (LC-StatCom) with symmetrical I-V characteristics is investigated. Total harmonic distortion of the synthesized ac current in an LC-StatCom converter operating with different voltage ripples is evaluated for both inductive and capacitive modes. Simulation-based analyses of a 350-VA three-cell CHB LC-StatCom system are provided to demonstrate the LC-StatCom's effectiveness in providing high current quality compared to a conventional StatCom.
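For readers unfamiliar with the metric, total harmonic distortion is the ratio of the combined RMS value of the higher harmonics to the RMS value of the fundamental. The sketch below computes it from a list of harmonic magnitudes; the function name and the sample values are hypothetical and do not come from the paper.

```python
import math


def total_harmonic_distortion(harmonic_rms):
    """THD from RMS magnitudes, where harmonic_rms[0] is the fundamental
    and the remaining entries are the 2nd, 3rd, ... harmonics."""
    fundamental = harmonic_rms[0]
    if fundamental <= 0:
        raise ValueError("fundamental magnitude must be positive")
    distortion = math.sqrt(sum(a * a for a in harmonic_rms[1:]))
    return distortion / fundamental


# Made-up current spectrum: 10 A fundamental, 0.3 A and 0.4 A harmonics.
thd = total_harmonic_distortion([10.0, 0.3, 0.4])
```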
A medium/high-power conversion system, using a power electronic (PE) converter in conjunction with a medium/high-frequency transformer, has many desirable effects suitably oriented for modern power system architecture. Switching at high frequency results in lesser volume of magnetics but induces higher loss density. Thus the design and characterization of a medium-frequency (MF) high-power (HP) transformer has significant ramifications for its performance in the application. Thermal management of the MF HP transformer is one of the key aspects...
A medium/high-power conversion system using a power electronic (PE) converter along with a medium/high-frequency transformer offers many desirable features that are beneficial for present-day power system topologies. Leakage inductance is identified to be one of the key parameters to characterize the performance of such a medium-frequency (MF) high-power (HP) transformer. In this paper, the existing analytical method to calculate the leakage inductance of a concentric winding is further refined by employing the mean turn length of each individual layer and...
Increasing power consumption requires engineers to find better control techniques to increase energy efficiency. Advancements in technology allow us to use more complex algorithms to pursue this goal. Load frequency control (LFC) is one of the vital points of the power system, and a state-of-the-art method must be used to ensure the power quality of the grid. In this work, a decentralized model predictive controller (MPC) with generation rate constraints is used to handle the LFC problem of a four-area interconnected power system. It is seen that the MPC successfully achieved the given...
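A generation rate constraint limits how fast a generating unit's output may change between control steps. The sketch below shows the constraint as a simple clamp on the requested change; an MPC would instead include it as an inequality inside its optimization, so this is only an illustration of the limit itself. The function name, units and values are hypothetical.

```python
def apply_rate_limit(previous, requested, max_rate, dt):
    """Move from `previous` towards `requested` output, changing by at most
    max_rate * dt in either direction during one control step."""
    max_step = max_rate * dt
    change = requested - previous
    limited_change = max(-max_step, min(max_step, change))
    return previous + limited_change


# Made-up per-unit values: a jump from 0.0 to 1.0 is limited to 0.1 per step.
limited = apply_rate_limit(previous=0.0, requested=1.0, max_rate=0.1, dt=1.0)
```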
Finding a balance among multiple design objectives of a medium/high-frequency (MF/HF) high-power (HP) transformer is best addressed by employing an optimization technique. In this paper, MF HP transformer design is formulated as a multi-variable problem, where efficiency, power density and temperature rise are chosen as objectives. Total loss, core volume and maximum temperature rise are modeled as the respective cost functions and amalgamated using a weighted-sum approach to derive the objective function. It is minimized using the steepest descent method. Being gradient-based...
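The weighted-sum and steepest descent steps can be sketched on a toy problem. The two quadratic cost functions below are made-up stand-ins, not the paper's loss, volume or temperature models, and the fixed step size, central-difference gradient and all names are assumptions chosen for illustration.

```python
def weighted_sum(costs, weights):
    """Combine several cost functions into one scalar objective."""
    def objective(x):
        return sum(w * c(x) for w, c in zip(weights, costs))
    return objective


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def steepest_descent(f, x0, step=0.1, iterations=500):
    """Repeatedly step against the gradient with a fixed step size."""
    x = list(x0)
    for _ in range(iterations):
        g = numerical_gradient(f, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Toy stand-ins for two competing design costs over variables x[0], x[1].
def toy_loss(x):
    return (x[0] - 1.0) ** 2 + x[1] ** 2


def toy_volume(x):
    return x[0] ** 2 + (x[1] - 2.0) ** 2


objective = weighted_sum([toy_loss, toy_volume], [0.5, 0.5])
optimum = steepest_descent(objective, [0.0, 0.0])
```

With equal weights the minimizer sits halfway between the two individual optima, at approximately (0.5, 1.0); shifting the weights moves it towards whichever cost is weighted more heavily.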