NFDI4DS | UHH-SEMS - Publication Details

Radityo Eko Prasojo

ORCID: 0000-0002-5148-7299

Publications

About

Contact & Profiles

A5074108268

Research Areas

Topic Modeling
Natural Language Processing Techniques
Semantic Web and Ontologies
Sentiment Analysis and Opinion Mining
Advanced Text Analysis Techniques
Data Quality and Management
Speech Recognition and Synthesis
Speech and Audio Processing
Multimodal Machine Learning Applications
Software Engineering Research
Data Mining and Machine Learning Applications
Educational Technology Systems
Music and Audio Processing
Biomedical Text Mining and Ontologies
Text Readability and Simplification
Machine Learning and Data Classification
Web Data Mining and Analysis
Service-Oriented Architecture and Web Services
Advanced Neural Network Applications
Domain Adaptation and Few-Shot Learning
Text and Document Classification Technologies
Expert finding and Q&A systems
Data Stream Mining Techniques
Public Health and Nutrition
Recommender Systems and Techniques

University of Indonesia
2020-2023

Free University of Bozen-Bolzano
2014-2019

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia

OPENALEX - Publications

Alham Fikri Aji Genta Indra Winata Fajri Koto Samuel Cahyawijaya Ade Romadhony and 7 more

Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.500 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages

Genta Indra Winata Alham Fikri Aji Samuel Cahyawijaya Rahmad Mahendra Fajri Koto and 9 more

Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. 2023.

10.18653/v1/2023.eacl-main.57 article EN cc-by 2023-01-01

Indonesian first national suicide prevention strategy: key findings from the qualitative situational analysis

Sandersan Onie Ashra Vina Kezia Taufik Juneman Abraham Diana Setiyawati and 35 more

10.1016/j.lansea.2023.100245 article EN cc-by The Lancet Regional Health - Southeast Asia 2023-09-01

COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances

Haryo Akbarianto Wibowo Erland Hilman Fuadi Made Nindyatama Nityasya Radityo Eko Prasojo Alham Aji

10.18653/v1/2024.naacl-long.77 article EN 2024-01-01

StuffIE

Radityo Eko Prasojo Mouna Kacimi Werner Nutt

Recent knowledge extraction methods are moving towards ternary and higher-arity relations to capture more information about binary facts. An example is to include the time, the location, and the duration of a specific fact. These relations can be even more complex to extract in advanced domains such as news, where events typically come with different facets, including reasons, consequences, purposes, involved parties, and related events. The main challenge consists of first finding the set of facets related to each fact, and second tagging those facets with the relevant category.

10.1145/3269206.3271812 article EN 2018-10-17

Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation

Haryo Akbarianto Wibowo Tatag Aziz Prawiro Muhammad Ihsan Alham Fikri Aji Radityo Eko Prasojo and 2 more

In its daily use, the Indonesian language is riddled with informality, that is, deviations from the standard in terms of vocabulary, spelling, and word order. On the other hand, currently available Indonesian NLP models are typically developed with the standard Indonesian in mind. In this work, we address style transfer from informal to formal Indonesian as a low-resource machine translation problem. We build a new dataset of parallel sentences of informal Indonesian and its formal counterpart. We benchmark several strategies to perform style transfer from informal to formal Indonesian. We also explore augmenting the training set with artificial...

10.1109/ialp51396.2020.9310459 article EN 2020-12-04

Entity and Aspect Extraction for Organizing News Comments

Radityo Eko Prasojo Mouna Kacimi Werner Nutt

News websites give their users the opportunity to participate in discussions about published articles by writing comments. Typically, these comments are unstructured, making it hard to understand the flow of user discussions. Thus, there is a need for organizing comments to help users (1) gain more insights about news topics, and (2) have easy access to comments that trigger their interests. In this work, we address the above problem by organizing comments around the entities and the aspects they discuss. More specifically, we propose an approach for entity and aspect extraction from...

10.1145/2806416.2806576 article EN 2015-10-17

BERT Goes Brrr: A Venture Towards the Lesser Error in Classifying Medical Self-Reporters on Twitter

Alham Fikri Aji Made Nindyatama Nityasya Haryo Akbarianto Wibowo Radityo Eko Prasojo Tirana Noor Fatyanosa

Alham Fikri Aji, Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Radityo Eko Prasojo, Tirana Noor Fatyanosa. Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task. 2021.

10.18653/v1/2021.smm4h-1.9 article EN cc-by 2021-01-01

IndoCollex: A Testbed for Morphological Transformation of Indonesian Word Colloquialism

Haryo Akbarianto Wibowo Made Nindyatama Nityasya Afra Feyza Akyürek Suci Fitriany Alham Fikri Aji and 2 more

Haryo Akbarianto Wibowo, Made Nindyatama Nityasya, Afra Feyza Akyürek, Suci Fitriany, Alham Fikri Aji, Radityo Eko Prasojo, Derry Tanti Wijaya. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

10.18653/v1/2021.findings-acl.280 article EN cc-by 2021-01-01

Costs to Consider in Adopting NLP for Your Business

Made Nindyatama Nityasya Haryo Akbarianto Wibowo Radityo Eko Prasojo Alham Fikri Aji

Recent advances in Natural Language Processing (NLP) have largely pushed deep transformer-based models as the go-to state-of-the-art technique without much regard to the production and utilization cost. Companies planning to adopt these methods into their business face difficulties because of the lack of machine, data, and human resources to build them. We compare both the performance and the cost of classical learning algorithms to the latest deep learning ones in common sequence and text labeling tasks. In our industrial datasets, we find that often...

10.48550/arxiv.2012.08958 preprint EN other-oa arXiv (Cornell University) 2020-01-01

NIX-TTS: Lightweight and End-to-End Text-to-Speech Via Module-Wise Distillation

Rendi Chevi Radityo Eko Prasojo Alham Fikri Aji Andros Tjandra Sakriani Sakti

Several solutions for lightweight TTS have shown promising results. Still, they either rely on a hand-crafted design that reaches non-optimum size or use neural architecture search but often suffer from high training costs. We present Nix-TTS, a lightweight TTS achieved via knowledge distillation to a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model. Specifically, we offer module-wise distillation, enabling flexible and independent distillation to the encoder and decoder module. The resulting Nix-...

10.1109/slt54892.2023.10023322 article EN 2022 IEEE Spoken Language Technology Workshop (SLT) 2023-01-09

Investigating Text Shortening Strategy in BERT: Truncation vs Summarization

Mirza Alim Mutasodirin Radityo Eko Prasojo

The parallelism of Transformer-based models comes at the cost of their input max-length. Some studies proposed methods to overcome this limitation, but none of them reported the effectiveness of summarization as an alternative. In this study, we investigate the performance of document truncation and summarization in text classification tasks. Each of the two was investigated with several variations. This study also investigated how close their performances are to the performance of full-text. We used a dataset of summarization tasks based on Indonesian news articles (IndoSum) to do classification tests. This study shows...

10.1109/icacsis53237.2021.9631364 preprint EN 2021-10-23

Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models

Made Nindyatama Nityasya Haryo Akbarianto Wibowo Rendi Chevi Radityo Eko Prasojo Alham Fikri Aji

We perform a knowledge distillation (KD) benchmark from task-specific BERT-base teacher models to various student models: BiLSTM, CNN, BERT-Tiny, BERT-Mini, and BERT-Small. Our experiment involves 12 datasets grouped in two tasks: text classification and sequence labeling in the Indonesian language. We also compare various aspects of distillations, including the usage of word embeddings and unlabeled data augmentation. Our experiments show that, despite the rising popularity of Transformer-based models, using BiLSTM and CNN student models provide the best...

10.48550/arxiv.2201.00558 preprint EN other-oa arXiv (Cornell University) 2022-01-01

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages

Genta Indra Winata Alham Fikri Aji Samuel Cahyawijaya Rahmad Mahendra Fajri Koto and 9 more

Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing resources for languages in Indonesia. Despite being the second most linguistically diverse country, most languages in Indonesia are categorized as endangered and some are even extinct. We develop...

10.48550/arxiv.2205.15960 preprint EN cc-by-sa arXiv (Cornell University) 2022-01-01

ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair

Alham Fikri Aji Tirana Noor Fatyanosa Radityo Eko Prasojo Philip Arthur Suci Fitriany and 4 more

We release our synthetic parallel paraphrase corpus across 17 languages: Arabic, Catalan, Czech, German, English, Spanish, Estonian, French, Hindi, Indonesian, Italian, Dutch, Romanian, Russian, Swedish, Vietnamese, and Chinese. Our method relies only on monolingual data and a neural machine translation system to generate paraphrases, hence simple to apply. We generate multiple translation samples using beam search and choose the most lexically diverse pair according to their sentence BLEU. We compare our generated corpus with...

10.48550/arxiv.2205.04651 preprint EN other-oa arXiv (Cornell University) 2022-01-01

On “Scientific Debt” in NLP: A Case for More Rigour in Language Model Pre-Training Research

Made Nindyatama Nityasya Haryo Akbarianto Wibowo Alham Fikri Aji Genta Indra Winata Radityo Eko Prasojo and 2 more

Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Alham Fikri Aji, Genta Indra Winata, Radityo Eko Prasojo, Phil Blunsom, Adhiguna Kuncoro. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

10.18653/v1/2023.acl-long.477 article EN cc-by 2023-01-01

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia

Alham Fikri Aji Genta Indra Winata Fajri Koto Samuel Cahyawijaya Ade Romadhony and 7 more

NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects. Focusing on the languages spoken in Indonesia, the second most linguistically diverse and the fourth most populous nation of the world, we provide an overview of the current state of NLP research for Indonesia's 700+ languages. We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems. Finally, we provide general recommendations to help develop NLP technology not only for languages of Indonesia but also other underrepresented languages.

10.48550/arxiv.2203.13357 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Developing a Singlish Neural Language Model using ELECTRA

Galangkangin Gotera Radityo Eko Prasojo Yugo K. Isal

We develop and benchmark a Singlish pretrained neural language model. To this end, we build a novel 3 GB Singlish freetext dataset collected through various Singaporean websites. Then, we leverage ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) to train a transformer-based Singlish language model. ELECTRA is chosen due to its resource-efficiency to better ensure reproducibility. We further build two text classification datasets in Singlish: sentiment analysis and Singlish identification. We use the datasets to fine-tune our model and benchmark the results...

10.1109/icacsis56558.2022.9923521 article EN 2022-10-01
