- Software Engineering Research
- Software Testing and Debugging Techniques
- Advanced Malware Detection Techniques
- Software Reliability and Analysis Research
- Software System Performance and Reliability
- Software Engineering Techniques and Practices
- Web Data Mining and Analysis
- Open Source Software Innovations
- Advanced Software Engineering Methodologies
- Topic Modeling
- Mobile and Web Applications
- Green IT and Sustainability
- Natural Language Processing Techniques
- Scientific Computing and Data Management
- Information and Cyber Security
- Digital and Cyber Forensics
- Mobile Crowdsensing and Crowdsourcing
- Innovative Human-Technology Interaction
- Law, AI, and Intellectual Property
- Multimedia Communication and Technology
- Security and Verification in Computing
- Web Application Security Vulnerabilities
- Logic, programming, and type systems
- Copyright and Intellectual Property
- Usability and User Interface Design
William & Mary
2015-2024
Williams (United States)
2015-2024
Williams & Associates
2022
Università della Svizzera italiana
2021
University of Sannio
2021
Polytechnique Montréal
2018
University of Hertfordshire
2013
Wayne State University
2005-2008
Wayne State College
2006
SUMMARY Feature location is the activity of identifying an initial location in the source code that implements functionality in a software system. Many feature location techniques have been introduced that automate some or all of this process, and a comprehensive overview of this large body of work would be beneficial to researchers and practitioners. This paper presents a systematic literature survey of feature location techniques. Eighty-nine articles from 25 venues have been reviewed and classified within the taxonomy in order to organize and structure existing work in the field of feature location. The paper also...
Code clone detection is an important problem for software maintenance and evolution. Many approaches consider either structure or identifiers, but none of the existing detection techniques model both sources of information. These techniques also depend on generic, handcrafted features to represent code fragments. We introduce learning-based detection techniques where everything for representing terms and fragments in source code is mined from the repository. Our code analysis supports a framework, which relies on deep learning, for automatically linking patterns at...
This paper recasts the problem of feature location in source code as a decision-making problem in the presence of uncertainty. The solution to the problem is formulated as a combination of the opinions of different experts. The experts in this work are two existing techniques for feature location: a scenario-based probabilistic ranking of events and an information-retrieval-based technique that uses latent semantic indexing. The combination of these two experts is empirically evaluated through several case studies, which use the source code of the Mozilla Web browser and the Eclipse integrated development environment....
This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a system, called SequenceR, for fixing bugs based on sequence-to-sequence learning on source code. This approach uses the copy mechanism to overcome the unlimited vocabulary problem that occurs with big code. Our system is data-driven; we train it on 35,578 samples, carefully curated from commits to open-source repositories. We evaluate it on 4,711 independent real bug fixes, as well as on the Defects4J benchmark used in program repair research. SequenceR is able...
Different studies show that programmers are more interested in finding definitions of functions and their uses than variables, statements, or arbitrary code fragments [30, 29, 31]. Therefore, programmers require support in finding relevant functions and determining how those functions are used. Unfortunately, existing code search engines do not provide enough of this support to developers, thus reducing the effectiveness of code reuse.
High cohesion is a desirable property of software as it positively impacts understanding, reuse, and maintenance. Currently proposed measures for cohesion in Object-Oriented (OO) software reflect particular interpretations of cohesion and capture different aspects of it. Existing approaches are largely based on using the structural information from the source code, such as attribute references, in methods to measure cohesion. This paper proposes a new measure for the cohesion of classes in OO software systems based on the analysis of the unstructured information embedded in the source code, such as comments and identifiers. The measure,...
Millions of open source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learning bug-fixing patches for real defects. First, we mine millions of bug-fixes from the change histories of projects hosted on GitHub in order to extract meaningful examples of such bug-fixes. Next, we abstract...
In recent years, the market of mobile software applications (apps) has maintained an impressive upward trajectory. Many small and large software development companies invest considerable resources to target available opportunities. As of today, the markets for such devices feature over 850K apps for Android and 900K for iOS. Availability, cost, functionality, and usability are just some of the factors that determine the success or lack of success of a given app. Among the other factors, reliability is an important criterion: users easily get...
Code smells are symptoms of poor design and implementation choices that may hinder code comprehension and possibly increase change- and fault-proneness. While most of the detection techniques just rely on structural information, many code smells are intrinsically characterized by how code elements change over time. In this paper, we propose Historical Information for Smell deTection (HIST), an approach exploiting change history information to detect instances of five different code smells, namely Divergent Change, Shotgun Surgery,...
Code smells represent symptoms of poor implementation choices. Previous studies found that these smells make source code more difficult to maintain, possibly also increasing its fault-proneness. There are several approaches that identify smells based on code analysis techniques. However, we observe that many code smells are intrinsically characterized by how code elements change over time. Thus, relying solely on structural information may not be sufficient to detect all the smells accurately. We propose an approach to detect five different code smells, namely...
Energy consumption of mobile applications is nowadays a hot topic, given the widespread use of mobile devices. The high demand for features and improved user experience, given the available powerful hardware, tends to increase the apps' energy consumption. However, excessive energy consumption in mobile apps could also be a consequence of energy-greedy hardware, bad programming practices, or particular API usage patterns. We present the largest to date quantitative and qualitative empirical investigation into the categories of API calls and usage patterns that, in the context of the Android development...
Technical debt is a metaphor introduced by Cunningham to indicate "not quite right code which we postpone making it right". One noticeable symptom of technical debt is represented by code smells, defined as symptoms of poor design and implementation choices. Previous studies showed the negative impact of code smells on the comprehensibility and maintainability of code. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced, what their survivability is, and how they...
Deep learning subsumes algorithms that automatically learn compositional representations. The ability of these models to generalize well has ushered in tremendous advances in many fields such as natural language processing (NLP). Recent research in the software engineering (SE) community has demonstrated the usefulness of applying NLP techniques to software corpora. Hence, we motivate deep learning for software language modeling, highlighting fundamental differences between state-of-the-practice software language models and connectionist models. Our deep learning models are applicable to source...
In past and recent years, the issues related to managing technical debt received significant attention by researchers from both industry and academia. There are several factors that contribute to technical debt. One of these is represented by code bad smells, i.e., symptoms of poor design and implementation choices. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced. To fill this gap, we conducted a large empirical study over the change history of 200...
Recent years have seen the rise of Deep Learning (DL) techniques applied to source code. Researchers have exploited DL to automate several development and maintenance tasks, such as writing commit messages, generating comments, and detecting vulnerabilities, among others. One of the long-lasting dreams of applying DL to source code is the possibility to automate non-trivial coding activities. While some steps in this direction have been taken (e.g., learning how to fix bugs), there is still a glaring lack of empirical evidence on the types of code changes that can be...
It is common practice for developers of user-facing software to transform a mock-up of a graphical user interface (GUI) into code. This process takes place both at an application's inception and in an evolutionary context as GUI changes keep pace with evolving features. Unfortunately, this practice is challenging and time-consuming. In this paper, we present an approach that automates this process by enabling accurate prototyping of GUIs via three tasks: detection, classification, and assembly. First, logical components of a GUI are detected from...
Mobile developers face unique challenges when detecting and reporting crashes in apps due to their prevailing GUI event-driven nature and additional sources of inputs (e.g., sensor readings). To support these tasks, we introduce a novel, automated approach called CRASHSCOPE. This tool explores a given Android app using systematic input generation, according to several strategies informed by static and dynamic analyses, with the intrinsic goal of triggering crashes. When a crash is detected, CRASHSCOPE...
Deep learning (DL) techniques are gaining more and more attention in the software engineering community. They have been used to support several code-related tasks, such as automatic bug fixing and code comments generation. Recent studies in the Natural Language Processing (NLP) field have shown that the Text-To-Text Transfer Transformer (T5) architecture can achieve state-of-the-art performance for a variety of NLP tasks. The basic idea behind T5 is to first pre-train a model on a large and generic dataset using...
Code review is a practice widely adopted in open source and industrial projects. Given the non-negligible cost of such a process, researchers started investigating the possibility of automating specific code review tasks. We recently proposed Deep Learning (DL) models targeting the automation of two tasks: the first model takes as input a code submitted for review and implements in it changes likely to be recommended by a reviewer; the second takes as input the submitted code and a reviewer comment posted in natural language and automatically implements the change required by the reviewer. While the preliminary...
An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL). The popularity of such techniques largely stems from their automated feature engineering capabilities, which aid in modeling software artifacts. However, due to the rapid pace at which DL techniques have been adopted, it is difficult to distill the current successes, failures, and opportunities of the current research landscape. In an effort to bring clarity to this cross-cutting area of work, from its modern...
The paper addresses the problem of concept location in source code by presenting an approach which combines formal concept analysis (FCA) and latent semantic indexing (LSI). In the proposed approach, LSI is used to map the concepts expressed in queries written by the programmer to relevant parts of the source code, presented as a ranked list of search results. Given the ranked list of source code elements, our approach selects the most relevant attributes from these documents and organizes the results in a concept lattice, generated via FCA. The approach is evaluated in a case study on concept location in the source code of Eclipse, an industrial size integrated...
The paper presents a semi-automated technique for feature location in source code. The technique is based on combining information from two different sources: an execution trace, on the one hand, and the comments and identifiers from the source code, on the other hand.
We present an empirical study to statistically analyze the equivalence of several traceability recovery methods based on Information Retrieval (IR) techniques. The analysis is based on Principal Component Analysis and on the analysis of the overlap of the set of candidate links provided by each method. The studied techniques are the Jensen-Shannon (JS) method, Vector Space Model (VSM), Latent Semantic Indexing (LSI), and Latent Dirichlet Allocation (LDA). The results show that while JS, VSM, and LSI are almost equivalent, LDA is able to capture a dimension unique to the set of techniques which...