On benchmarks of bugs for studies in automatic program repair

Delfim, Fernanda Madeiral

Please use this identifier to cite or link to this item: https://repositorio.ufu.br/handle/123456789/44574

Full metadata record

DC Field	Value	Language
dc.creator	Delfim, Fernanda Madeiral	-
dc.date.accessioned	2025-01-16T14:21:31Z	-
dc.date.available	2025-01-16T14:21:31Z	-
dc.date.issued	2019-03-11	-
dc.identifier.citation	DELFIM, Fernanda Madeiral. On benchmarks of bugs for studies in automatic program repair. 2025. 113 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal de Uberlândia, Uberlândia, 2025. DOI http://doi.org/10.14393/ufu.te.2024.5079.	pt_BR
dc.identifier.uri	https://repositorio.ufu.br/handle/123456789/44574	-
dc.description.abstract	Software systems are indispensable to everyday activities. If a system behaves differently from what is expected, it is said to contain a bug. Bugs are ubiquitous, and fixing them manually is well known to be expensive. This motivates the emerging research field of automatic program repair, which aims to fix bugs without human intervention. Although researchers have shown over the last decade that automatic repair approaches have the potential to fix bugs, there is still a lack of knowledge about the real value and limitations of the proposed repair tools. This is due to the high-level, non-advanced evaluations performed on the same benchmark of bugs. In this thesis, we report on contributions to the field of automatic program repair that focus on the evaluation of repair tools. First, we address the scarcity of benchmarks of bugs. We propose a new benchmark of real, reproducible bugs, named Bears, which was built with a novel approach to mining bugs from software repositories. Second, we address the lack of knowledge about benchmarks of bugs. We present a descriptive study of Bears and two other benchmarks, including analyses of different aspects of their bugs. Finally, we address the extensive use of the same benchmark of bugs to evaluate repair tools. We define the benchmark overfitting problem and investigate it through an empirical study of repair tools on Bears and the other two benchmarks. We found that most repair tools do overfit the extensively used benchmark. Our findings from both studies suggest that the benchmarks of bugs are complementary to each other and that using multiple, diverse benchmarks of bugs is key to evaluating how well the effectiveness of automatic repair tools generalizes.	pt_BR
dc.description.sponsorship	CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior	pt_BR
dc.language	eng	pt_BR
dc.publisher	Universidade Federal de Uberlândia	pt_BR
dc.rights	Acesso Aberto	pt_BR
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Bugs de software	pt_BR
dc.subject	Software bugs	pt_BR
dc.subject	Reparo automático de programas	pt_BR
dc.subject	Automatic program repair	pt_BR
dc.subject	Reparo baseado em conjunto de testes	pt_BR
dc.subject	Test-suite-based repair	pt_BR
dc.subject	Avaliação de ferramenta de reparo	pt_BR
dc.subject	Repair tool evaluation	pt_BR
dc.subject	Benchmarks de bugs	pt_BR
dc.subject	Benchmarks of bugs	pt_BR
dc.title	On benchmarks of bugs for studies in automatic program repair	pt_BR
dc.title.alternative	Sobre benchmarks de bugs para estudos em reparo automático de programas	pt_BR
dc.type	Tese	pt_BR
dc.contributor.advisor1	Maia, Marcelo de Almeida	-
dc.contributor.advisor1Lattes	http://lattes.cnpq.br/4915659948263445	pt_BR
dc.contributor.referee1	Silva, Flávio de Oliveira	-
dc.contributor.referee1Lattes	http://lattes.cnpq.br/3190608911887258	pt_BR
dc.contributor.referee2	Julia, Stéphane	-
dc.contributor.referee2Lattes	http://lattes.cnpq.br/6736358221140969	pt_BR
dc.contributor.referee3	Ferrari, Fabiano Cutigi	-
dc.contributor.referee3Lattes	http://lattes.cnpq.br/3154345471250570	pt_BR
dc.contributor.referee4	Murta, Leonardo Gresta Paulino	-
dc.contributor.referee4Lattes	http://lattes.cnpq.br/1565296529736448	pt_BR
dc.creator.Lattes	http://lattes.cnpq.br/8246690925340020	pt_BR
dc.description.degreename	Tese (Doutorado)	pt_BR
dc.description.resumo	Sistemas de software são indispensáveis para as atividades diárias de todos hoje em dia. Se um sistema se comporta de uma maneira diferente do esperado, diz-se que ele contém um bug. Esses bugs são onipresentes, e corrigi-los manualmente é bem conhecido como uma tarefa cara. Isso motiva o campo de pesquisa emergente reparo automático de programas, que tem como objetivo corrigir bugs sem intervenção humana. Apesar do esforço que pesquisadores fizeram na última década, mostrando que abordagens de reparo automático têm o potencial de corrigir bugs, ainda há uma falta de conhecimento sobre o valor real e as limitações das ferramentas de reparo propostas. Isso se deve às avaliações de alto nível e não avançadas realizadas usando o mesmo benchmark de bugs. Nesta tese, contribuições para o campo de pesquisa de reparo automático são apresentadas, que são focadas na avaliação de ferramentas de reparo. Primeiro, o problema da falta de benchmarks de bugs é abordado. Um novo benchmark de bugs reais e reproduzíveis é proposto, chamado Bears, que foi construído com uma nova abordagem para mineração de bugs a partir de repositórios de software. Em seguida, o problema da falta de conhecimento sobre benchmarks de bugs é abordado. Um estudo descritivo sobre Bears e outros dois benchmarks de bugs é apresentado, que inclui análises sobre diferentes aspectos dos bugs. Por fim, o problema do uso extensivo do mesmo benchmark de bugs para avaliar ferramentas de reparo é abordado. O problema de benchmark overfitting é definido e investigado através de um estudo empírico sobre ferramentas de reparo usando Bears e os outros dois benchmarks. Foi descoberto que a maioria das ferramentas de reparo sofre do problema de benchmark overfitting em relação ao extensivamente usado benchmark. As descobertas a partir de ambos os estudos sugerem que os benchmarks de bugs são complementares e que o uso de múltiplos e diversos benchmarks de bugs é essencial para avaliar a generalização da eficácia das ferramentas de reparo automáticas.	pt_BR
dc.publisher.country	Brasil	pt_BR
dc.publisher.program	Programa de Pós-graduação em Ciência da Computação	pt_BR
dc.sizeorduration	113	pt_BR
dc.subject.cnpq	CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO::ENGENHARIA DE SOFTWARE	pt_BR
dc.identifier.doi	http://doi.org/10.14393/ufu.te.2024.5079	pt_BR
dc.orcid.putcode	175958013	-
dc.crossref.doibatchid	72d78f57-aed2-4ad5-bb68-92e19bc636f0	-
dc.subject.autorizado	Computação	pt_BR
dc.subject.ods	ODS::ODS 4. Educação de qualidade - Assegurar a educação inclusiva, e equitativa e de qualidade, e promover oportunidades de aprendizagem ao longo da vida para todos.	pt_BR
Appears in Collections:	TESE - Ciência da Computação

Files in This Item:

File	Description	Size	Format
BenchmarksBugsStudies.pdf	Tese	5.73 MB	Adobe PDF

This item is licensed under a Creative Commons License