Open source software development and maintenance: an exploratory analysis
Abstract
The purpose of this research was to create measures and models for the evaluation of Open Source Software (OSS) projects. An exploratory analysis of the development and maintenance processes in OSS was conducted for this purpose. Data mining and text mining techniques were used to discover knowledge from transactional datasets maintained on OSS projects. Large and comprehensive datasets were used to formulate, test and validate the models. A new multidimensional measure of OSS project performance, called project viability, was defined and validated. A theoretical and empirical measurement framework was used to evaluate the new measure. OSS project data from SourceForge.net was used to validate the new measure. Results indicated that project viability is a measure of the performance of OSS projects. Three models were then created, one for each dimension of project viability. Multiple data mining techniques were used to create the models. Variables identified from process, product, resource and end-user characteristics of the project were used. The use of new variables created through text mining improved the performance of the models. The first model was created for OSS projects in the development phase. The results indicated that end-user involvement could play a significant role in the development of OSS projects. It was also discovered that certain types of projects are more suitable for development in OSS communities. The second model was developed for OSS projects in their maintenance phase. A two-stage model for maintenance performance was selected. The results indicated that high project usage and usefulness could improve the maintenance performance of OSS projects. The third model was developed to investigate the effects of maintenance activities on the project's internal structure. Maintenance data for the Linux project was used to develop a new taxonomy for OSS maintenance patches.
These results were then used to study the effects of various types of patches on the internal structure of the software. It was found that performing proactive maintenance on the software moderates its internal structure.