- Software Engineering Research
- Topic Modeling
- Natural Language Processing Techniques
- Software Testing and Debugging Techniques
- Software Engineering Techniques and Practices
- Web Data Mining and Analysis
- Advanced Malware Detection Techniques
- Software System Performance and Reliability
- Advanced Software Engineering Methodologies
- Multi-Agent Systems and Negotiation
- Scientific Computing and Data Management
- Data Analysis with R
- Software Reliability and Analysis Research
- Service-Oriented Architecture and Web Services
- Web Application Security Vulnerabilities
- Data Visualization and Analytics
- Open Source Software Innovations
- Mobile Crowdsensing and Crowdsourcing
- Geographic Information Systems Studies
- Mobile and Web Applications
- Complex Network Analysis Techniques
- Expert Finding and Q&A Systems
- Formal Methods in Verification
- Adversarial Robustness in Machine Learning
- Auction Theory and Applications
University of British Columbia
2019-2025
Kelowna General Hospital
2020-2025
Okanagan University College
2019-2024
University of Calgary
2012-2018
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same...
A recent study by Ahmed and Devanbu reported that using a corpus of code written in multilingual datasets to fine-tune multilingual Pre-trained Language Models (PLMs) achieves higher performance as opposed to using a corpus of code written in just one programming language. However, no analysis was made with respect to fine-tuning monolingual PLMs. Furthermore, some programming languages are inherently different, and code written in one language usually cannot be interchanged with the others, i.e., Ruby and Java code possess very different structure. To better understand how monolingual and multilingual PLMs affect different programming languages, we...
Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. The LLMs are mainly utilized in the prompt-based zero/few-shot paradigm to guide the model in accomplishing the task. GPT-based models are one of the popular ones studied for tasks such as code comment generation or test generation. These tasks are 'generative' tasks. However, there is limited research on the usage of LLMs for 'non-generative' tasks such as classification using the prompt-based paradigm. In this preliminary...
Mining Software Repositories (MSR) has become a popular research area recently. MSR analyzes different sources of data, such as version control systems, code repositories, defect tracking systems, archived communication, deployment logs, and so on, to uncover interesting and actionable insights from the data for improved software development, maintenance, and evolution. This chapter provides an overview of how to conduct an MSR study, including setting up a study, formulating goals and questions, identifying repositories, extracting and cleaning...
Large language models (LLMs) have significantly improved their ability to perform tasks in the field of code generation. However, there is still a gap between LLMs being capable coders and being top-tier software engineers. The most recent trend is using LLM-based agents to iterate the code generation process. Based on the observation that top-level software engineers often ask clarifying questions to reduce ambiguity in both requirements and coding solutions, we argue that the same should be applied to LLMs for code generation tasks. For this purpose, we define...
Many recent models in software engineering introduced deep neural models based on the Transformer architecture or use transformer-based Pre-trained Language Models (PLM) trained on code. Although these models achieve state-of-the-art results in many downstream tasks such as code summarization and bug detection, they are based on the Transformer and PLM, which are mainly studied in the Natural Language Processing (NLP) field. The current studies rely on the reasoning and practices from NLP for these models in code, despite the differences between natural languages and programming languages. There...
Code comments can help in program comprehension and are considered important artifacts to help developers in software maintenance. However, the comments are mostly missing or outdated, especially in complex software projects. As a result, several automatic comment generation models are developed as a solution. The recent models explore the integration of external knowledge resources such as Unified Modeling Language class diagrams to improve the generated comments. In this paper, we propose API2Com, a model that leverages the Application Programming...
Pre-trained neural Language Models (PTLM), such as CodeBERT, are recently used in software engineering as models pre-trained on large source code corpora. Their knowledge is transferred to downstream tasks (e.g. code clone detection) via fine-tuning. In natural language processing (NLP), other alternatives for transferring the knowledge of PTLMs are explored through using adapters, compact, parameter efficient modules inserted in the layers of the PTLM. Although adapters are known to facilitate adapting to many downstream tasks compared to fine-tuning...
Analysis of mobile app reviews has shown its important role in requirement engineering, software maintenance and evolution of mobile apps. Mobile app developers check their users' reviews frequently to clarify the issues experienced by users or capture the new issues that are introduced due to a recent app update. App reviews have a dynamic nature and their discussed topics change over time. The changes in the topics among collected reviews for different versions of an app can reveal important issues about the app update. A main technique in this analysis is using topic modeling algorithms. However, app reviews are short texts and it...
Context: Technical Debt (TD) is a metaphor used to describe code that is "not quite right." Although TD studies have gained momentum, TD has yet to be studied as thoroughly in non-Object-Oriented (OO) or scientific software such as R. R is a multi-paradigm programming language, whose popularity in data science and statistical applications has amplified in recent years. Due to R's inherent ability to expand through user-contributed packages, several community-led organizations were created to organize and peer-review packages...
Developers frequently use APIs to implement certain functionalities, such as parsing Excel files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches utilize information retrieval models to search for matching API sequences given a query or use an RNN-based encoder-decoder to generate API sequences. As it stands, the first approach treats queries and API names as bags...
Self-Admitted Technical Debt (SATD) is primarily studied in Object-Oriented (OO) languages and traditionally commercial software. However, scientific software coded in dynamically-typed languages such as R differs in paradigm, and the source code comments’ semantics are different (i.e., more aligned with algorithms and statistics when compared to traditional software). Additionally, many Software Engineering topics are understudied in scientific software development, with SATD detection remaining a challenge for this domain. This gap...
This paper describes a framework using the Microsoft Kinect 2 and the Microsoft HoloLens that can assist users in analyzing complex datasets. The system allows for groups of people to view a topological map as a virtual hologram in order to assist them in understanding complex datasets. In addition, gestures that are built into the system were created with the idea of usability in mind. By allowing the user to resize, rotate and reposition the map, it opens up a much wider range of understanding of the data they have received. Custom gestures are also possible depending on the situation, such as raising or lowering the water level...
Context: Previous studies demonstrate that Machine or Deep Learning (ML/DL) models can detect Technical Debt from source code comments called Self-Admitted Technical Debt (SATD). Despite the importance of ML/DL in software development, limited studies focus on automated detection for new SATD types: Algorithm Debt (AD). AD detection is important because it helps to identify TD early, facilitating research, learning, and preventing the accumulation of issues related to model degradation and lack of scalability. Aim: Our goal is to improve the performance...
Modeling and implementing auction systems using agent technology is a common practice because agents can assume various roles and their behavior will be determined as a result of negotiation. However, emergent behavior is a hurdle. Mechanisms must be in place to make sure that the agents participating in the auction won't behave in an unintended way. Detecting emergent behaviors in the design phase rather than after deployment is more cost- and effort-efficient. Patterns of interaction, called scenarios, are basic modeling constructs for behavioral modeling of agents. In working with...
Pre-trained Programming Language Models (PPLMs) achieved many recent state-of-the-art results for code-related software engineering tasks. Though some studies use data flow or propose tree-based models that utilize the Abstract Syntax Tree (AST), most PPLMs do not fully utilize the rich syntactical information in source code. Still, the input is considered a sequence of tokens. There are two issues; the first is computational inefficiency due to the quadratic relationship between input length and attention complexity. Second, any...
This research is intended to automatically detect emergent behaviors of scenario-based Distributed Software Systems (DSS) in the design phase. The direct significance of our work is reducing the cost of verifying DSS for unexpected behavior in execution time. Existing approaches have some drawbacks which we try to cover in our work. The main contributions of this work are modeling the DSS components as a social network and not using behavioral modeling, detecting components with no emergent behavior, and investigating the interactions of instances of one type.
The verification of Distributed Software Systems (DSS) and Multi-Agent Systems (MAS) has received special attention due to the growing demand for DSS in this decade. DSS and MAS are a class of software systems in which functionality or control is distributed. This may cause components (agents) to exhibit an unexpected behavior in their runtime, which was not seen in the requirements or design. This is known as emergent behavior of components. Detecting and fixing such a problem during design costs much less than fixing it after deployment. Therefore, in this paper a new type...
The competitive market of mobile apps requires app developers to consider the users' feedback frequently. This feedback, when it comes from different resources, e.g. App Stores and Twitter, will provide a broader picture of the state of the app, as users discuss different topics on each platform. Automated tools are developed to filter the informative comments for app developers. However, to integrate the feedback from different platforms, one should evaluate the similarities and/or differences of the text from each one. Different meanings of words in various contexts make...
The paper describes a toolkit that integrates spatially-aware multi-surface systems with mixed-reality approaches to create immersive collaborative environments. The toolkit integrates multiple digital displays and Microsoft HoloLens devices with Kinects. The HoloLens devices allow several users to look at the same virtual hologram while the Kinects enable them to use body movements to interact with these holograms as well as other surfaces in the space. Effectively, the toolkit enables its users to build applications that utilize space between information. Our approach also...
Context: Mobile app reviews written by users on app stores or social media are significant resources for app developers. Analyzing app reviews has proved to be useful for many areas of software engineering (e.g., requirement engineering, testing). Automatic classification of app reviews requires extensive efforts to manually curate a labeled dataset. When the classification purpose changes (e.g. identifying bugs versus usability issues or sentiment), new datasets should be labeled, which prevents the extensibility of the developed models for new desired classes/tasks in...
Code comment generation is the task of generating a high-level natural language description for a given code snippet. API2Com is a model designed to leverage the Application Programming Interface Documentations (API Docs) as an external knowledge resource. Shahbazi et al. [1] showed that API Docs might help increase the model's performance. However, the model's performance in generating pertinent comments deteriorates due to the lengthy documentation used in the input as the number of APIs used in a method increases. In this paper, we propose and evaluate how...
In the design of distributed systems with specification languages such as message sequence charts (MSC), the communication between different component (agent) types or instances of them is defined. There are a number of methods to verify the design using scenarios of inter-component communication. Those methods usually ignore intra-component communication, i.e. communication between components of the same type. However, in large scale systems, such as e-commerce systems, there are several instances of one type that may communicate with each other, and this may violate some regulatory policies...