A Systematic Data Collection Procedure for Software Defect Prediction
Computer Science and Information Systems, Volume 13 (2016) no. 1.

See the article record from the source Computer Science and Information Systems website

Software defect prediction research relies on data that must be collected from otherwise separate repositories. To achieve greater generalization of the results, standardized protocols for data collection and validation are necessary. This paper presents an exhaustive survey of techniques and approaches used in the data collection process. It identifies some of the issues that must be addressed to minimize dataset bias and provides a number of measures that can help researchers compare their data collection approaches and evaluate their data quality. Moreover, we present a data collection procedure that uses a bug-code linking technique based on regular expressions. A detailed comparison with a number of popular data collection approaches and their publicly available datasets, together with a root cause analysis of the inconsistencies, reveals that our procedure achieves the most favorable results. Finally, we implement our data collection procedure in a data collection tool we name the Bug-Code (BuCo) Analyzer.
Keywords: software defect prediction, data collection issues, dataset bias, bug-code linking, open-source projects
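As a rough illustration of what regular-expression-based bug-code linking can look like, the sketch below scans commit messages for bug identifiers and keeps only those that exist in the bug tracker. It is not the procedure implemented in the BuCo Analyzer: the pattern `BUG_ID_PATTERN`, the function `link_commits_to_bugs`, and the sample data are all hypothetical, and the expression actually used by the authors is not given in this record.

```python
import re

# Hypothetical pattern (an assumption, not the authors' expression):
# a keyword such as "bug", "issue" or "fix/fixes/fixed", an optional
# "#" or ":", and then a numeric identifier.
BUG_ID_PATTERN = re.compile(
    r"\b(?:bug|issue|fix(?:es|ed)?)\s*[#:]?\s*(\d+)\b",
    re.IGNORECASE,
)


def link_commits_to_bugs(commits, known_bug_ids):
    """Map each known bug ID to the commits whose message mentions it.

    commits: iterable of (commit_id, message) pairs.
    known_bug_ids: set of integer IDs exported from the bug tracker.
    """
    links = {}
    for commit_id, message in commits:
        for match in BUG_ID_PATTERN.finditer(message):
            bug_id = int(match.group(1))
            # Ignore numbers that do not correspond to a tracked bug;
            # this filters out many accidental matches.
            if bug_id not in known_bug_ids:
                continue
            linked = links.setdefault(bug_id, [])
            if commit_id not in linked:
                linked.append(commit_id)
    return links


# Illustrative data only.
sample_commits = [
    ("a1", "Fix bug 12345: NPE in parser"),
    ("b2", "Refactor build scripts"),
    ("c3", "Bug #12345 and issue 999 follow-up"),
    ("d4", "fixes 777"),
]
sample_links = link_commits_to_bugs(sample_commits, {12345, 777})
```

Checking matched numbers against the set of known bug IDs is one simple way to reduce false links. Commits that mention no identifier at all, such as `b2` here, stay unlinked, which is one of the sources of dataset bias that the abstract refers to.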
@article{CSIS_2016_13_1_a9,
     author = {Goran Mau\v{s}a and Tihana Galinac Grbac and Bojana Dalbelo Ba\v{s}i\'c},
     title = {A {Systematic} {Data} {Collection} {Procedure} for {Software} {Defect} {Prediction}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {13},
     number = {1},
     year = {2016},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2016_13_1_a9/}
}
TY  - JOUR
AU  - Goran Mauša
AU  - Tihana Galinac Grbac
AU  - Bojana Dalbelo Bašić
TI  - A Systematic Data Collection Procedure for Software Defect Prediction
JO  - Computer Science and Information Systems
PY  - 2016
VL  - 13
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CSIS_2016_13_1_a9/
ID  - CSIS_2016_13_1_a9
ER  - 
%0 Journal Article
%A Goran Mauša
%A Tihana Galinac Grbac
%A Bojana Dalbelo Bašić
%T A Systematic Data Collection Procedure for Software Defect Prediction
%J Computer Science and Information Systems
%D 2016
%V 13
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CSIS_2016_13_1_a9/
%F CSIS_2016_13_1_a9
Goran Mauša; Tihana Galinac Grbac; Bojana Dalbelo Bašić. A Systematic Data Collection Procedure for Software Defect Prediction. Computer Science and Information Systems, Volume 13 (2016) no. 1. http://geodesic.mathdoc.fr/item/CSIS_2016_13_1_a9/