- Natural Language Processing Techniques
- Topic Modeling
- Cognitive and developmental aspects of mathematical skills
- Speech and dialogue systems
- Speech Recognition and Synthesis
- Neuroscience, Education and Cognitive Function
- Text Readability and Simplification
- Reading and Literacy Development
- Algorithms and Data Compression
- Mathematics Education and Teaching Techniques
- Semantic Web and Ontologies
- Handwritten Text Recognition Techniques
- Biomedical Text Mining and Ontologies
- Multimodal Machine Learning Applications
- Translation Studies and Practices
- Speech and Audio Processing
- Machine Learning and Algorithms
- Neuroscience and Music Perception
- Education Methods and Practices
- Creativity in Education and Neuroscience
- Music and Audio Processing
- Education, Achievement, and Giftedness
- Psychology, Coaching, and Therapy
- Visual and Cognitive Learning Processes
- Transcranial Magnetic Stimulation Studies
University of Graz
2015-2024
TU Dortmund University
2023
Leibniz Research Centre for Working Environment and Human Factors
2023
Chinese University of Hong Kong
2023
Czech Academy of Sciences, Institute of Psychology
2020
Neuroscience Institute
2020
Qatar Airways (Qatar)
2015-2018
Western University
2012-2017
Hamad bin Khalifa University
2015-2017
University College London
2017
In this paper, we describe a new model for word alignment in statistical translation and present experimental results. The idea of the model is to make the alignment probabilities dependent on the differences in the alignment positions rather than on the absolute positions. To achieve this goal, the approach uses a first-order Hidden Markov model (HMM) for the word alignment problem, as such models are used successfully in speech recognition for the time alignment problem. The difference to the time alignment HMM is that there is no monotony constraint on the possible word orderings. We describe the details of the model and test it on several bilingual corpora.
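The jump-dependent transition idea can be illustrated with a small Viterbi search. This is a minimal sketch under stated assumptions, not the paper's implementation: the toy lexicon `LEX`, the floor value in `lex_prob`, and the shape of `jump_prob` are all invented for illustration, and the initial alignment position is assumed uniform.

```python
import math

# Toy lexicon p(f | e); the values are invented for illustration.
LEX = {("das", "the"): 0.9, ("haus", "house"): 0.9,
       ("ist", "is"): 0.9, ("klein", "small"): 0.9}

def lex_prob(f, e):
    """Lexical probability with a small floor for unseen pairs (assumed)."""
    return LEX.get((f, e), 0.01)

def jump_prob(d):
    """Unnormalised weight of a jump of width d = i - i_prev.
    Depends only on the difference of positions; backward jumps are
    allowed, so there is no monotonicity constraint."""
    return 0.5 ** abs(d - 1)

def viterbi_alignment(f_words, e_words):
    """Return, for each word in f_words, the index of its aligned word in e_words."""
    I, J = len(e_words), len(f_words)

    def trans(i_prev, i):
        # Normalise over all positions reachable from i_prev.
        z = sum(jump_prob(k - i_prev) for k in range(I))
        return jump_prob(i - i_prev) / z

    # delta[j][i]: log-probability of the best path aligning f_words[j] to e_words[i]
    delta = [[-math.inf] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):
        delta[0][i] = math.log(1.0 / I) + math.log(lex_prob(f_words[0], e_words[i]))
    for j in range(1, J):
        for i in range(I):
            k_best = max(range(I),
                         key=lambda k: delta[j - 1][k] + math.log(trans(k, i)))
            delta[j][i] = (delta[j - 1][k_best] + math.log(trans(k_best, i))
                           + math.log(lex_prob(f_words[j], e_words[i])))
            back[j][i] = k_best

    # Trace back from the best final state.
    i = max(range(I), key=lambda k: delta[J - 1][k])
    path = [i]
    for j in range(J - 1, 0, -1):
        i = back[j][i]
        path.append(i)
    return path[::-1]
```

Calling `viterbi_alignment(["klein", "ist", "das", "haus"], ["the", "house", "is", "small"])` yields a non-monotone path, which a time-alignment HMM with a monotonicity constraint could not produce.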
Training word alignment models on large corpora is a very time-consuming process. This paper describes two parallel implementations of GIZA++ that accelerate this process. One of the implementations runs on computer clusters, the other on a multi-processor system using multi-threading technology. Results show a near-linear speed-up according to the number of CPUs used, and alignment quality is preserved.
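The general pattern behind such parallelisation (split the corpus, collect expected counts per shard, then merge) can be sketched as follows. This is an illustration under assumptions, not GIZA++ code: the E-step shown is a plain IBM Model 1 stand-in, and `expected_counts`, `parallel_e_step`, and the `1e-6` floor are hypothetical names and values. Python threads also do not give a real CPU speed-up for this workload because of the interpreter lock; the sketch only shows that sharded counts merge to the same result as a serial pass.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def expected_counts(shard, t):
    """IBM Model 1 E-step on one shard: fractional counts c(f, e).
    `t` maps (f, e) to a translation probability; unseen pairs get a floor."""
    counts = Counter()
    for e_sent, f_sent in shard:
        for f in f_sent:
            z = sum(t.get((f, e), 1e-6) for e in e_sent)
            for e in e_sent:
                counts[(f, e)] += t.get((f, e), 1e-6) / z
    return counts

def parallel_e_step(corpus, t, workers=2):
    """Split the corpus into `workers` shards, count in parallel, merge."""
    shards = [corpus[k::workers] for k in range(workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda shard: expected_counts(shard, t), shards):
            total.update(partial)  # Counter.update adds the partial counts
    return total
```

Because the merged counts equal the serial counts up to floating-point rounding, the subsequent M-step, and hence alignment quality, is unaffected by the sharding.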
Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Stephan Vogel. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018.
In this paper we present a recipe and language resources for training and testing Arabic speech recognition systems using the KALDI toolkit. We built a prototype broadcast news system using 200 hours of GALE data that is publicly available through LDC. We describe in detail the decisions made in building the system: using the MADA toolkit for text normalization and vowelization; why we use 36 phonemes; how we generate pronunciations; and how we build the language model. We report results using state-of-the-art modeling and decoding techniques. The scripts are released on QCRI's...
This paper describes the Arabic MGB-3 Challenge - Arabic Speech Recognition in the Wild. Unlike last year's Arabic MGB-2 Challenge, for which the recognition task was based on more than 1,200 hours of broadcast TV news recordings from Aljazeera Arabic TV programs, MGB-3 emphasises dialectal Arabic using a multi-genre collection of Egyptian YouTube videos. Seven genres were used for the data collection: comedy, cooking, family/kids, fashion, drama, sports, and science (TEDx). A total of 16 hours of videos, split evenly across the different genres, were divided into...
Functional magnetic resonance imaging (fMRI) studies investigating the neural mechanisms underlying developmental dyscalculia are scarce and results are thus far inconclusive. The main aim of the present study is to investigate the neural correlates of nonsymbolic number magnitude processing in children with and without dyscalculia. Eighteen children (9 with dyscalculia) were asked to solve a non-symbolic number magnitude comparison task (finger patterns) during brain scanning. For the spatial control task identical stimuli were employed, instructions varying only (judgment...
The way the human brain constructs representations of numerical symbols is poorly understood. While increasing evidence from neuroimaging studies has indicated that the intraparietal sulcus (IPS) becomes increasingly specialized for symbolic numerical magnitude representation over developmental time, the extent to which these changes are associated with age-related differences in symbolic numerical magnitude representation or with changes in non-numerical processes, such as response selection, remains to be uncovered. To address these outstanding questions we investigated...
The ability to process the numerical magnitude of sets of items has been characterized in many animal species. Neuroimaging data have associated this ability to represent nonsymbolic numerical magnitudes (e.g., arrays of dots) with activity in the bilateral parietal lobes. Yet the quantitative abilities of humans are not limited to processing the numerical magnitude of nonsymbolic sets. Humans have used this quantitative sense as the foundation for symbolic systems for the representation of numerical magnitude. Although numerical symbol use is widespread in human cultures, the brain regions involved in processing numerical symbols are just beginning to be understood....
In this paper we explore the challenges in crowdsourcing the task of translation over the web, in which remotely located translators work on providing translations independent of each other. We then propose a collaborative workflow for crowdsourcing translation to address some of these challenges. In our pipeline model, the translators are working in phases where output from earlier phases can be enhanced in the subsequent phases. We also highlight some of the novel contributions of the pipeline model, like assistive translation and translation synthesis, that can leverage monolingual and bilingual speakers alike. We evaluate our approach...
In this paper a robust, adaptive approach for mining parallel sentences from a bilingual comparable news collection is described. Sentence length models and lexicon-based models are combined under a maximum likelihood criterion. Specific models are proposed to handle insertions and deletions that are frequent in bilingual data collected from the web. The proposed approach is adaptive, updating the translation lexicon iteratively using the mined parallel data to get better vocabulary coverage and translation probability parameter estimation. Experiments are carried out on 10 years of the Xinhua bilingual news collection....
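Combining a length model with a lexicon-based model in the log domain might look like the sketch below. It is an assumption-laden illustration rather than the paper's model: the Gaussian length-ratio form, the parameters `mean_ratio` and `std`, the `floor` value, and the per-word normalisation in `pair_score` are all hypothetical, and the insertion/deletion models and iterative lexicon update are not covered. Both sentences are assumed non-empty.

```python
import math

def length_log_prob(len_src, len_tgt, mean_ratio=1.0, std=0.3):
    """Gaussian log-density of the target/source length ratio (assumed form)."""
    z = (len_tgt / len_src - mean_ratio) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2 * math.pi))

def lexicon_log_prob(src, tgt, lexicon, floor=1e-4):
    """Model-1 style score: each target word is explained by the
    average lexicon probability over all source words."""
    total = 0.0
    for t in tgt:
        total += math.log(sum(lexicon.get((t, s), floor) for s in src) / len(src))
    return total

def pair_score(src, tgt, lexicon):
    """Sum of the two log-scores; the lexicon term is averaged per target
    word so that long candidates are not penalised twice for length."""
    return (length_log_prob(len(src), len(tgt))
            + lexicon_log_prob(src, tgt, lexicon) / len(tgt))
```

A candidate pair would be kept as parallel when its score exceeds a threshold tuned on held-out data; the threshold itself is not specified here.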
We explore unsupervised language model adaptation techniques for Statistical Machine Translation. The hypotheses from the machine translation output are converted into queries at different levels of representation power and used to extract similar sentences from a very large monolingual text collection. Specific language models are then built from the retrieved data and interpolated with a general background model. Experiments show significant improvements when translating with these adapted language models.
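The interpolation step can be sketched with unigram models. This is a minimal illustration, assuming add-one smoothing, a shared vocabulary, and a hand-picked weight `lam`; the abstract does not state the model order or how the weight is chosen, and the function names are hypothetical.

```python
from collections import Counter

def unigram_model(sentences, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def interpolate(p_adapted, p_background, lam):
    """Linear interpolation: lam * adapted + (1 - lam) * background."""
    return {w: lam * p_adapted[w] + (1 - lam) * p_background[w]
            for w in p_background}

# Hypothetical data: sentences retrieved for one test document vs. general text.
retrieved = ["the court ruled today", "the court adjourned"]
background = ["the weather is fine", "the market fell today", "the court met"]
vocab = sorted({w for s in retrieved + background for w in s.split()})

p_adapt = unigram_model(retrieved, vocab)
p_back = unigram_model(background, vocab)
p_mix = interpolate(p_adapt, p_back, lam=0.5)
```

In this toy setup `p_mix["court"]` lies between the adapted and background estimates, shifting probability mass toward the vocabulary of the retrieved sentences while keeping the background model's coverage.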