NFDI4DS | UHH-SEMS - Publication Details

Hideaki Hata

ORCID: 0000-0003-0708-5222

About

Contact & Profiles

OpenAlex author ID: A5020711309

Research Areas

Software Engineering Research
Open Source Software Innovations
Software Reliability and Analysis Research
Software Engineering Techniques and Practices
Software System Performance and Reliability
Advanced Malware Detection Techniques
Software Testing and Debugging Techniques
Scientific Computing and Data Management
Online Learning and Analytics
Quantum Chaos and Dynamical Systems
Chaos Control and Synchronization
Wikis in Education and Collaboration
Artificial Intelligence in Healthcare and Education
COVID-19 Diagnosis Using AI
Topic Modeling
Mathematical Dynamics and Fractals
Natural Language Processing Techniques
Web Data Mining and Analysis
Auction Theory and Applications
Refrigeration and Air Conditioning Technologies
Mobile Crowdsensing and Crowdsourcing
Nonlinear Dynamics and Pattern Formation
High-Velocity Impact and Material Behavior
Game Theory and Applications
Multimodal Machine Learning Applications

Affiliations

Shinshu University
2019-2025

Ōtani University
2020-2024

Nara Institute of Science and Technology
2013-2021

University of Waterloo
2021

University College London
2021

University of London
2021

Kagoshima University
1990-2020

Shiseido Group (Japan)
2019

Mahidol University
2019

National Archives and Records Administration
2014

Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation

OPENALEX - Publications

Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti and 2 more

Pseudo-code written in natural language can aid the comprehension of source code in unfamiliar programming languages. However, the great majority of source code has no corresponding pseudo-code, because pseudo-code is redundant and laborious to create. If pseudo-code could be generated automatically and instantly from given source code, we could allow for on-demand production of pseudo-code without human effort. In this paper, we propose a method to automatically generate pseudo-code from source code, specifically adopting the statistical machine translation (SMT) framework. SMT, which was originally designed...

10.1109/ase.2015.36 article EN 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2015-11-01

Pandemic programming

Paul Ralph, Sebastian Baltes, Gianisa Adisaputri, Richard Torkar, Vladimir Kovalenko and 12 more

As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. This study investigates the effects of the pandemic on developers' wellbeing and productivity. A questionnaire survey was created mainly from existing, validated scales and translated into 12 languages. The data was analyzed using non-parametric inferential statistics and structural equation modeling. The questionnaire received 2225 usable responses from 53 countries...

10.1007/s10664-020-09875-y article EN cc-by Empirical Software Engineering 2020-09-14

Predicting Defective Lines Using a Model-Agnostic Technique

Supatsara Wattanakriengkrai, Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Hideaki Hata, Kenichi Matsumoto

Defect prediction models are proposed to help a team prioritize the areas of source code files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste unnecessary effort on the whole file while only a small fraction of its lines are defective. Indeed, we find that as little as 1-3 percent of the lines of a file are defective. Hence, in this work, we propose a novel framework (called Line-DP)...

10.1109/tse.2020.3023177 article EN cc-by IEEE Transactions on Software Engineering 2020-09-10

DevGPT: Studying Developer-ChatGPT Conversations

Tao Xiao, Christoph Treude, Hideaki Hata, Kenichi Matsumoto

This paper introduces DevGPT, a dataset curated to explore how software developers interact with ChatGPT, a prominent large language model (LLM). The dataset encompasses 29,778 prompts and responses from ChatGPT, including 19,106 code snippets, and is linked to corresponding software development artifacts such as source code, commits, issues, pull requests, discussions, and Hacker News threads. This comprehensive dataset is derived from shared ChatGPT conversations collected from GitHub and Hacker News, providing a rich resource for understanding the dynamics of...

10.1145/3643991.3648400 article EN 2024-04-15

Bug prediction based on fine-grained module histories

Hideaki Hata, Osamu Mizuno, Tohru Kikuno

There have been many bug prediction models built with historical metrics, which are mined from version histories of software modules. Many studies have reported the effectiveness of these historical metrics. As for prediction levels, most studies have targeted the package and file levels. Prediction on a fine-grained level, which represents the method level, is required because there may be interesting results compared to coarse-grained (package and file levels) prediction. These include good performance when considering quality assurance efforts, and new findings about...

10.1109/icse.2012.6227193 article EN 2012 34th International Conference on Software Engineering (ICSE) 2012-06-01

Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling

Natthakul Pingclasai, Hideaki Hata, Kenichi Matsumoto

Bug reports are widely used in several research areas such as bug prediction, bug triaging, etc. The performance of these studies relies on the information from bug reports. A previous study showed that a significant number of bug reports are actually misclassified between bugs and non-bugs. However, manually classifying bug reports is a time-consuming task. In that study, researchers spent 90 days manually classifying more than 7,000 bug reports. To tackle this problem, we propose automatic bug report classification techniques. We apply topic modeling...

10.1109/apsec.2013.105 article EN 2013-12-01

Bug prediction based on fine-grained module histories

Hideaki Hata, Osamu Mizuno, Tohru Kikuno

10.5555/2337223.2337247 article EN 2012-06-02

9.6 Million Links in Source Code Comments: Purpose, Evolution, and Decay

Hideaki Hata, Christoph Treude, Raula Gaikovina Kula, Takashi Ishio

Links are an essential feature of the World Wide Web, and source code repositories are no exception. However, despite their many undisputed benefits, links can suffer from decay, insufficient versioning, and lack of bidirectional traceability. In this paper, we investigate the role of links contained in source code comments from these perspectives. We conducted a large-scale study of around 9.6 million links to establish their prevalence, and used a mixed-methods approach to identify the links' targets, purposes, and evolutionary aspects. We found that links are prevalent...

10.1109/icse.2019.00123 preprint EN 2019-05-01

She Elicits Requirements and He Tests: Software Engineering Gender Bias in Large Language Models

Christoph Treude, Hideaki Hata

Implicit gender bias in software development is a well-documented issue, such as the association of technical roles with men. To address this bias, it is important to understand it in more detail. This study uses data mining techniques to investigate the extent to which 56 tasks related to software development, such as assigning GitHub issues and testing, are affected by implicit gender bias embedded in large language models. We systematically translated each task from English into a genderless language and back, and investigated the pronouns associated with each task. Based...

10.1109/msr59073.2023.00088 article EN 2023-05-01

Deformation and fragmentation behaviour of exploded metal cylinders and the effects of wall materials, configuration, explosive energy and initiated locations

Tetsuyuki Hiroe, Kosuke Fujiwara, Hideaki Hata, Hideaki Takahashi

10.1016/j.ijimpeng.2008.07.002 article EN International Journal of Impact Engineering 2008-07-25

Cross project defect prediction using class distribution estimation and oversampling

Nachai Limsettho, Kwabena Ebo Bennin, Jacky Keung, Hideaki Hata, Kenichi Matsumoto

10.1016/j.infsof.2018.04.001 article EN Information and Software Technology 2018-04-11

Bug or Not? Bug Report Classification Using N-Gram IDF

Pannavat Terdchanakul, Hideaki Hata, Passakorn Phannachitta, Kenichi Matsumoto

Previous studies have found that a significant number of bug reports are misclassified between bugs and nonbugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts, and these key terms can be used as the features to classify bug reports. We build classification models with logistic regression and random forest using topic...

10.1109/icsme.2017.14 article EN 2017-09-01

A Dataset of High Impact Bugs: Manually-Classified Issue Reports

Masao Ohira, Yutaro Kashiwa, Yosuke Yamatani, Hayato Yoshiyuki, Yoshiya Maeda and 5 more

The importance of supporting test and maintenance activities in software development has been increasing, since recent software systems have become large and complex. Although in the field of Mining Software Repositories (MSR) there are many promising approaches to predicting, localizing, and triaging bugs, most of them do not consider the impact of each bug on users and developers but rather treat all bugs with equal weighting, excepting a few studies on high impact bugs including security, performance, blocking, and so forth. To make...

10.1109/msr.2015.78 article EN 2015-05-01

Building Bridges across Papua New Guinea's Digital Divide in Growing the ICT Industry

Marc Cheong, Sankwi Abuzo, Hideaki Hata, Priscilla Kevin, Winifred Kula and 4 more

Papua New Guinea (PNG) is an emerging tech society with an opportunity to overcome geographic and social boundaries in order to engage with the global market. However, the current landscape, dominated by Big Tech in Silicon Valley and other multinational companies in the Global North, tends to overlook the requirements of emerging economies such as PNG. This is becoming more obvious as issues such as algorithmic bias (in product deployments) and the digital divide (as in the case of non-affordable commercial software) are affecting PNG users. The Open Source...

10.48550/arxiv.2501.09482 preprint EN arXiv (Cornell University) 2025-01-16

Developer reactions to protestware in open source software: the cases of color.js and es5.ext

Youmei Fan, Dong Wang, Supatsara Wattanakriengkrai, Hathaichanok Damrongsiri, Christoph Treude and 2 more

10.1007/s10664-024-10599-6 article EN cc-by-nc-nd Empirical Software Engineering 2025-01-18

How different are different diff algorithms in Git?

Yusuf Sulistyo Nugroho, Hideaki Hata, Kenichi Matsumoto

Automatic identification of the differences between two versions of a file is a common and basic task in several applications of mining code repositories. Git, a version control system, has a diff utility, and users can select diff algorithms ranging from the default Myers algorithm to the advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. Regarding the impact on code churn metrics in 14 Java projects, we obtained different values in 1.7% to 8.2% of commits depending on the diff algorithm. Regarding...

10.1007/s10664-019-09772-z article EN cc-by Empirical Software Engineering 2019-09-11

Learning to Generate Corrective Patches using Neural Machine Translation

Hideaki Hata, Emad Shihab, Graham Neubig

Bug fixing is generally a manually-intensive task. However, recent work has proposed the idea of automated program repair, which aims to repair (at least a subset of) bugs in different ways such as code mutation, etc. Following the same line of work as automated bug repair, in this paper we aim to leverage past fixes to propose fixes for current/future bugs. Specifically, we propose Ratchet, a corrective patch generation system using neural machine translation. By learning corresponding pre-correction and post-correction code with a sequence-to-sequence model,...

10.48550/arxiv.1812.07170 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Wait for it: identifying “On-Hold” self-admitted technical debt

Rungroj Maipradit, Christoph Treude, Hideaki Hata, Kenichi Matsumoto

Self-admitted technical debt refers to situations where a software developer knows that their current implementation is not optimal and indicates this using a source code comment. In this work, we hypothesize that it is possible to develop automated techniques to understand a subset of these comments in more detail, and to propose tool support that can help developers manage self-admitted technical debt more effectively. Based on a qualitative study of 333 comments indicating self-admitted technical debt, we first identify one particular class of debt amenable to automated management: on-hold...

10.1007/s10664-020-09854-3 article EN cc-by Empirical Software Engineering 2020-08-04

GitHub Discussions: An exploratory study of early adoption

Hideaki Hata, Nicole Novielli, Sebastian Baltes, Raula Gaikovina Kula, Christoph Treude

Discussions is a new feature of GitHub for asking questions or discussing topics outside of specific Issues or Pull Requests. Before being available to all projects in December 2020, it had been tested on selected open source software projects. To understand how developers use this novel feature, how they perceive it, and how it impacts the development processes, we conducted a mixed-methods study based on early adopters' discussions from January until July 2020. We found that: (1) errors, unexpected...

10.1007/s10664-021-10058-6 article EN cc-by Empirical Software Engineering 2021-10-22

Historage

Hideaki Hata, Osamu Mizuno, Tohru Kikuno

Software systems are changed continuously for adapting to the environment, correcting faults, improving performance, and so on. For in-depth analysis related to software evolution, it is informative to obtain histories of fine-grained source code entities. This paper presents a tool named Historage that can provide the entire histories of fine-grained entities in Java, such as methods, constructors, fields, etc. A characteristic of Historage is its ability to trace the histories of entities, including renaming changes. We applied our technique to five open...

10.1145/2024445.2024463 article EN 2011-09-05

A topological analysis of communication channels for knowledge sharing in contemporary GitHub projects

Jirateep Tantisuwankul, Yusuf Sulistyo Nugroho, Raula Gaikovina Kula, Hideaki Hata, Arnon Rungsawang and 2 more

10.1016/j.jss.2019.110416 article EN Journal of Systems and Software 2019-09-09

Sentiment Classification Using N-Gram Inverse Document Frequency and Automated Machine Learning

Rungroj Maipradit, Hideaki Hata, Kenichi Matsumoto

We propose a sentiment classification method with a general machine-learning framework. In comparisons on publicly available data sets, our method achieved the highest F1 values for positive and negative sentences on all data sets.

10.1109/ms.2019.2919573 article EN IEEE Software 2019-05-29

Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions

Tao Xiao, Hideaki Hata, Christoph Treude, Kenichi Matsumoto

GitHub's Copilot for Pull Requests (PRs) is a promising service aiming to automate various developer tasks related to PRs, such as generating summaries of changes or providing complete walkthroughs with links to the relevant code. As this innovative technology gains traction in the Open Source Software (OSS) community, it is crucial to examine its early adoption and its impact on the development process. Additionally, it offers a unique opportunity to observe how developers respond when they disagree with the generated content. In...

10.1145/3643773 article EN Proceedings of the ACM on Software Engineering 2024-07-12