On benchmarks of bugs for studies in automatic program repair

Delfim, Fernanda Madeiral

Please use this identifier to cite or link to this item: https://repositorio.ufu.br/handle/123456789/44574

ORCID:	http://orcid.org/0000-0003-2048-7648
Document type:	Doctoral thesis
Access type:	Open Access
Title:	On benchmarks of bugs for studies in automatic program repair
Alternate title(s):	Sobre benchmarks de bugs para estudos em reparo automático de programas (Portuguese title)
Author:	Delfim, Fernanda Madeiral
First Advisor:	Maia, Marcelo de Almeida
First member of the Committee:	Silva, Flávio de Oliveira
Second member of the Committee:	Julia, Stéphane
Third member of the Committee:	Ferrari, Fabiano Cutigi
Fourth member of the Committee:	Murta, Leonardo Gresta Paulino
Abstract:	Software systems are indispensable for everyone's daily activities nowadays. If a system behaves differently from what is expected, it is said to contain a bug. Bugs are ubiquitous, and fixing them manually is well known to be expensive. This motivates the emerging research field of automatic program repair, which aims to fix bugs without human intervention. Over the last decade, researchers have shown that automatic repair approaches have the potential to fix bugs. Despite these efforts, there is still a lack of knowledge about the real value and limitations of the proposed repair tools, because they have been evaluated in high-level, shallow ways on the same benchmark of bugs. In this thesis, we report on contributions to automatic program repair research focused on the evaluation of repair tools. First, we address the scarcity of benchmarks of bugs: we propose a new benchmark of real, reproducible bugs, named Bears, which was built with a novel approach to mining bugs from software repositories. Second, we address the lack of knowledge about benchmarks of bugs: we present a descriptive study on Bears and two other benchmarks, including analyses of different aspects of their bugs. Finally, we address the extensive use of the same benchmark of bugs to evaluate repair tools: we define the benchmark overfitting problem and investigate it through an empirical study of repair tools on Bears and the other two benchmarks. We found that most repair tools do overfit the extensively used benchmark. Our findings from both studies suggest that benchmarks of bugs complement each other and that using multiple, diverse benchmarks of bugs is key to evaluating whether the effectiveness of automatic repair tools generalizes.
Keywords:	Software bugs; Automatic program repair; Test-suite-based repair; Repair tool evaluation; Benchmarks of bugs
Area(s) of CNPq:	CNPq::Exact and Earth Sciences::Computer Science::Computing Methodology and Techniques::Software Engineering
Subject:	Computing
Language:	English
Country:	Brazil
Publisher:	Universidade Federal de Uberlândia
Program:	Graduate Program in Computer Science (Programa de Pós-graduação em Ciência da Computação)
Citation:	DELFIM, Fernanda Madeiral. On benchmarks of bugs for studies in automatic program repair. 2025. 113 leaves. Doctoral thesis (Computer Science) - Universidade Federal de Uberlândia, Uberlândia, 2025. DOI http://doi.org/10.14393/ufu.te.2024.5079.
Document identifier:	http://doi.org/10.14393/ufu.te.2024.5079
URI:	https://repositorio.ufu.br/handle/123456789/44574
Date of defense:	11-Mar-2019
Sustainable Development Goals (SDGs):	SDG::SDG 4. Quality education - Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all.
Appears in Collections:	Thesis - Computer Science

Files in This Item:

File	Description	Size	Format
BenchmarksBugsStudies.pdf	Thesis	5.73 MB	Adobe PDF


This item is licensed under a Creative Commons License