Study on the relationship of training data size to error rate and the performance comparison for two decision tree algorithms

dc.creator: Zheng, Jianjun
dc.date.accessioned: 2016-11-14T23:28:00Z
dc.date.available: 2011-02-18T22:06:24Z
dc.date.available: 2016-11-14T23:28:00Z
dc.date.issued: 2004-08
dc.degree.department: Computer Science
dc.description.abstract: The decision tree model is a well-accepted and widely used classification technique in the data mining field because of its advantages of fast construction, accuracy, and understandability. A decision tree model can be induced by algorithms such as C4.5 and CART. This thesis research studies the relationship of training data size to error rate for the C4.5 and CART algorithms and also compares the performance of the two algorithms. Several conclusions are drawn from the results; for example, the widely accepted 66.7:33.3 training-to-test splitting ratio in the literature can be increased to 80:20 for large data sets with more than 1000 samples to generate more accurate decision tree models. The research also shows that the performance of C4.5 and CART is similar on small data sets but differs on large data sets; therefore, large data sets are more suitable for comparing different algorithms.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/2346/17571
dc.language.iso: eng
dc.publisher: Texas Tech University
dc.rights.availability: Unrestricted.
dc.subject: Errors -- Measurement -- Analysis
dc.subject: Decision making -- Mathematical models
dc.subject: Decision logic tables
dc.subject: Chi-square test
dc.subject: Distribution (Probability theory)
dc.subject: Decision trees
dc.subject: Entropy -- Measurement
dc.subject: Algorithms
dc.title: Study on the relationship of training data size to error rate and the performance comparison for two decision tree algorithms
dc.type: Thesis
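
To illustrate the kind of experiment the abstract describes, the sketch below trains a CART-style decision tree on a 66.7:33.3 split and then on an 80:20 split and reports the held-out error rate for each. This is not the thesis's own experimental setup: scikit-learn's DecisionTreeClassifier (an optimized CART variant), the breast-cancer sample data set, and the fixed random seed are illustrative assumptions, and scikit-learn has no C4.5 implementation, so only the CART side is sketched.

# Minimal sketch (not from the thesis): compare held-out error rates of a
# CART-style tree under two training-to-test splitting ratios.
# Assumptions: scikit-learn's DecisionTreeClassifier stands in for CART, and
# the bundled breast-cancer data set (~569 samples) stands in for the thesis data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for train_fraction in (0.667, 0.80):
    # Hold out the remainder of the data as the test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_fraction, random_state=0
    )
    tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    error_rate = 1.0 - tree.score(X_test, y_test)  # misclassification rate
    print(f"train fraction {train_fraction:.1%}: test error rate {error_rate:.3f}")

A single run like this is noisy; a fair comparison of splitting ratios or of two algorithms would average the error rate over many random splits, which is the spirit of the study summarized above.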
