Shin Hwei Tan

ORCID: 0000-0001-8633-3372
Research Areas
  • Software Engineering Research
  • Software Testing and Debugging Techniques
  • Software Reliability and Analysis Research
  • Advanced Malware Detection Techniques
  • Software System Performance and Reliability
  • Open Source Software Innovations
  • Topic Modeling
  • Software Engineering Techniques and Practices
  • Parallel Computing and Optimization Techniques
  • Security and Verification in Computing
  • Scientific Computing and Data Management
  • Advanced Computational Techniques and Applications
  • Advanced Materials Characterization Techniques
  • Spreadsheets and End-User Computing
  • Boron and Carbon Nanomaterials Research
  • Particle Detector Development and Performance
  • Hydrogen Storage and Materials
  • Advanced Data Storage Technologies
  • Privacy-Preserving Technologies in Data
  • Mechanical Failure Analysis and Simulation
  • Viral Infectious Diseases and Gene Expression in Insects
  • Particle Accelerators and Beam Dynamics
  • Speech and Dialogue Systems
  • Information and Cyber Security
  • Superconductivity in MgB2 and Alloys

Concordia University
2023-2024

Hong Kong University of Science and Technology
2023-2024

University of Hong Kong
2023-2024

Government of Canada
2023-2024

Southern University of Science and Technology
2018-2023

University of Waterloo
2023

National University of Singapore
2013-2018

University of Illinois System
2012

University of Illinois Urbana-Champaign
2008-2011

This paper presents FUNDED (Flow-sensitive vUl-Nerability coDE Detection), a novel learning framework for building vulnerability detection models. FUNDED leverages advances in graph neural networks (GNNs) to develop a graph-based method to capture and reason about a program's control, data, and call dependencies. Unlike prior work that treats a program as a sequential sequence or an untyped graph, FUNDED learns and operates on a graph representation of source code, in which individual statements are connected to other statements through...

10.1109/tifs.2020.3044773 article EN IEEE Transactions on Information Forensics and Security 2020-12-14

Large language models, such as Codex, have shown the capability to produce code for many programming tasks. However, the success rate of existing models is low, especially for complex programming tasks. One of the reasons is that language models lack awareness of program semantics, resulting in incorrect programs, or even programs which do not compile. In this paper, we systematically study whether automated program repair (APR) techniques can fix the incorrect solutions produced by language models in LeetCode contests. The goal is to study whether APR techniques can enhance the reliability of code produced by large language models. Our study revealed that: (1)...

10.1109/icse48619.2023.00128 article EN 2023-05-01

Code comments are important artifacts in software. Javadoc comments are widely used in Java for API specifications. API developers write Javadoc comments, and API users read these comments to understand the API, e.g., reading a Javadoc comment for a method instead of reading the method body. An inconsistency between the Javadoc comment and the body of a method indicates either a fault in the body or, effectively, a fault in the comment that can mislead the method callers to introduce faults in their code. We present a novel approach, called @TCOMMENT, for testing Javadoc comments, specifically method properties about null values and related exceptions. Our approach consists of two components....

10.1109/icst.2012.106 article EN 2012-04-01

Search-based program repair automatically searches for a program fix within a given repair space. This may be accomplished by retrofitting a generic search algorithm for program repair, as evidenced by the GenProg tool, or by building a customized search algorithm for program repair, as in SPR. Unfortunately, automated program repair approaches may produce patches that are rejected by programmers, because of which past works have suggested using human-written patches to produce templates to guide program repair. In this work, we take the position that we will not provide templates to guide the repair search, because that may unduly restrict the repair space and attempt to overfit the repairs into one of the provided...

10.1145/2950290.2950295 article EN 2016-11-01

Several automated program repair techniques have been proposed to reduce the time and effort spent in bug-fixing. While these repair tools are designed to be generic such that they could address many software faults, different repair tools may fix certain types of faults more effectively than other tools. Therefore, it is important to compare objectively the effectiveness of different repair tools on various fault types. However, existing benchmarks for automated program repair do not allow thorough investigation of the relationship between fault types and the effectiveness of repair tools. We present Codeflaws, a set of 3902...

10.1109/icse-c.2017.76 article EN 2017-05-01

Despite the fact that an intelligent tutoring system for programming (ITSP) education has long attracted interest, its widespread use has been hindered by the difficulty of generating personalized feedback automatically. Meanwhile, automated program repair (APR) is an emerging new technology that automatically fixes software bugs, and it has been shown that APR can fix the bugs of large real-world software. In this paper, we study the feasibility of marrying an ITSP with APR. We perform our feasibility study with four state-of-the-art APR tools (GenProg, AE, Angelix,...

10.1145/3106237.3106262 article EN 2017-08-02

Regression occurs when code changes introduce failures in previously passing test cases. As software evolves, regressions may be introduced. Fixing regression errors manually is time-consuming and error-prone. We propose an approach for automated repair of software regressions, called relifix, that considers the regression repair problem as a problem of reconciling problematic changes. Specifically, we derive a set of code transformations obtained from our manual inspection of 73 real software regressions; this set of code transformations uses syntactical information from changed...

10.5555/2818754.2818813 article EN International Conference on Software Engineering 2015-05-16

Android apps are omnipresent, and frequently suffer from crashes, leading to poor user experience and economic loss. Past work focused on automated test generation to detect crashes in Android apps. However, automated repair of crashes has not been studied. In this paper, we propose the first approach to automatically repair Android apps, specifically a technique for fixing crashes in Android apps. Unlike most test-based repair approaches, we do not need a test-suite; instead, a single failing test is meticulously analyzed for crash locations and the reasons behind these crashes. Our approach hinges on a careful...

10.1145/3180155.3180243 article EN Proceedings of the 40th International Conference on Software Engineering 2018-05-27

JavaScript (JS) is a popular, platform-independent programming language. To ensure the interoperability of JS programs across different platforms, the implementation of a JS engine should conform to the ECMAScript standard. However, doing so is challenging as there are many subtle definitions of API behaviors, and the definitions keep evolving.

10.1145/3453483.3454054 preprint EN 2021-06-18

Regression occurs when code changes introduce failures in previously passing test cases. As software evolves, regressions may be introduced. Fixing regression errors manually is time-consuming and error-prone. We propose an approach for automated repair of software regressions, called relifix, that considers the regression repair problem as a problem of reconciling problematic changes. Specifically, we derive a set of code transformations obtained from our manual inspection of 73 real software regressions; this set of code transformations uses syntactical information from changed...

10.1109/icse.2015.65 article EN 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering 2015-05-01

Refactoring is the process of restructuring existing code without changing its external behavior while improving its internal structure. Refactoring engines are integral components of modern Integrated Development Environments (IDEs) and can automate or semi-automate this process to enhance code readability, reduce complexity, and improve the maintainability of software products. Similar to traditional software systems such as compilers, refactoring engines may also contain bugs that lead to unexpected behaviors. In this paper, we propose a novel approach...

10.48550/arxiv.2501.09879 preprint EN arXiv (Cornell University) 2025-01-16

Automated program repair is the problem of finding a transformation (called a patch) of a given incorrect program that eliminates the observable failures. It has important applications such as providing debugging aids, automatically grading student assignments, and patching security vulnerabilities. A common challenge faced by existing repair techniques is scalability to large patch spaces, since there are many candidate patches that these techniques explicitly or implicitly consider. The correctness criteria for program repair are often given as a suite of tests....

10.1145/3241980 article EN ACM Transactions on Software Engineering and Methodology 2018-10-22

Intensive use of libraries in Java projects brings a potential risk of dependency conflicts, which occur when a project directly or indirectly depends on multiple versions of the same library or class. When this happens, the JVM loads one version and shadows the others. Runtime exceptions can occur when methods in the shadowed versions are referenced. Although project management tools such as Maven are able to give warnings of potential dependency conflicts when a project is built, developers often ask for crashing stack traces before examining these warnings. It motivates us to develop...

10.1109/icse.2019.00068 article EN 2019-05-01

10.1109/saner60148.2024.00035 article EN 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2024-03-12

Symbolic execution of Android applications is challenging as it involves either building a customized VM for Android or modeling the Android libraries. Since the Android Runtime evolves from one version to another, building a high-fidelity symbolic execution engine involves modeling the effect of the libraries and their evolved versions. Without simulating the behavior of the Android libraries, path divergence may occur due to constraint loss when symbolic values flow into the Android framework and these values later affect the subsequent path taken. Previous works such as JPF-Android have relied on environment modeling. In this work, we...

10.1145/3238147.3238225 article EN 2018-08-20

Automated program repair is an emerging area that attempts to patch software errors and vulnerabilities. In this article, we formulate and study a problem related to automated repair, namely automated patch transplantation. A patch for an error in a donor program is automatically adapted and inserted into a “similar” target program. We observe that despite standard procedures for vulnerability disclosures and publishing of patches, many un-patched occurrences remain in the wild. One of the main reasons is the fact that various implementations of the same functionality may exist and,...

10.1145/3412376 article EN ACM Transactions on Software Engineering and Methodology 2020-12-31

Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determine whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive. The fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability...

10.1145/3591233 article EN Proceedings of the ACM on Programming Languages 2023-06-06

Many automated test generation techniques have been proposed for finding crashes in Android apps. Despite recent advancement in these approaches, a study shows that Android app developers prefer reading test cases written in natural language. Meanwhile, there exist redundancies in bug reports (written in natural language) across different apps that have not been previously reused. We propose collaborative bug finding, a novel approach that uses bugs in other similar apps to discover bugs in the app under test. We design three settings with varying degrees of interactions...

10.1145/3377811.3380349 article EN 2020-06-27

Successful software systems continuously change their requirements and thus their code. When this happens, some existing tests get broken because they no longer reflect the intended behavior, and thus they need to be updated. Repairing tests can be time-consuming and difficult.

10.1145/1985793.1985978 article EN 2011-05-21

While CUDA has been the most popular parallel computing platform and programming model for general purpose GPU computing, CUDA synchronization poses significant challenges for programmers due to its intricate mechanism and coding practices. In this paper, we propose AuCS, the first framework to automate synchronization for CUDA kernel functions. AuCS transforms the original LLVM-level CUDA program control flow graph in a semantic-preserving manner for exploring the possible barrier function locations. Accordingly, AuCS develops mechanisms to correctly...

10.1109/ase.2019.00075 article EN 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2019-11-01

Whenever a bug or vulnerability is detected in the Linux kernel, the kernel developers will endeavour to fix it by introducing a patch into the mainline version of the Linux kernel source tree. However, many users run older "stable" versions of Linux, meaning that the patch should also be "backported" to one or more of these older kernel versions. This process is error-prone and there is usually a long delay in publishing the backported patch. Based on an empirical study, we show that around 8% of all commits submitted to Linux mainline are backported to older versions, but often more than one month elapses before...

10.1145/3460319.3464821 preprint EN 2021-07-08

Deep learning (DL) has emerged as a viable means for identifying software bugs and vulnerabilities. The success of DL relies on having a suitable representation of the problem domain. However, existing DL-based solutions for learning program representations have limitations: they either cannot capture the deep, precise program semantics or suffer from poor scalability. We present Concoction, the first DL system to learn program representations by combining static source code information and dynamic program execution traces. Concoction employs...

10.1145/3597503.3639212 article EN cc-by 2024-04-12

Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and study the concerns around code quality, security, and related issues of programmer responsibility. These are key issues for organizations while deciding on the usage of automatically generated code. We discuss how advances in software engineering such as...

10.48550/arxiv.2405.02213 preprint EN arXiv (Cornell University) 2024-05-03