Efficient implementation of induction of decision rules

Defense Date:

This thesis, titled “Efficient implementation of induction of decision rules”, presents different approaches to implementing a rule induction algorithm based on the IREP++ and RIPPER algorithms. The algorithm is used to solve classification problems. Its main principle is to create rules from data that allow a class to be assigned to an example unambiguously. The purpose of the thesis was to implement the algorithm so that it achieves high performance together with high accuracy and an effective model. Two implementations are presented, which differ in how they store the data and how they carry out the algorithm. The first one is fairly simple and uses dictionaries. One dictionary is created for every conditional variable, and each dictionary contains pairs consisting of an example index and the value of the variable for that example. Then, using all unique values of all variables, literals are created and transformed into rules. The second approach creates a map for every unique value of a variable; the map contains only the indices of the examples that have this value. The maps are then combined using conjunctions and disjunctions in order to create rules. Both approaches are implemented in Python. Both implementations are tested for model quality and runtime performance, and the results are presented in this thesis. A wide variety of data sets is used in the experiments in order to assess the performance and effectiveness of the model reliably.
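The two storage layouts described above can be illustrated with a minimal sketch. It is not the thesis code: the toy data, the function names, and the use of Python sets for the per-value maps are assumptions made only for illustration, and rule growing and pruning as done in IREP++ or RIPPER are left out entirely.

```python
from collections import defaultdict

# Hypothetical toy data: each example maps a variable name to its value.
examples = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]


def build_variable_dicts(examples):
    """First layout: one dict per variable, mapping example index -> value."""
    columns = defaultdict(dict)
    for index, example in enumerate(examples):
        for variable, value in example.items():
            columns[variable][index] = value
    return dict(columns)


def build_value_index(examples):
    """Second layout: one set of example indices per (variable, value) pair.

    Using a set here is an assumption; the abstract only speaks of a "map".
    """
    index_sets = defaultdict(set)
    for index, example in enumerate(examples):
        for variable, value in example.items():
            index_sets[(variable, value)].add(index)
    return dict(index_sets)


def covered_by_conjunction(value_index, literals):
    """Examples that satisfy every literal: intersection of the index sets."""
    sets = [value_index.get(literal, set()) for literal in literals]
    return set.intersection(*sets) if sets else set()


def covered_by_disjunction(value_index, literals):
    """Examples that satisfy at least one literal: union of the index sets."""
    return set().union(*(value_index.get(literal, set()) for literal in literals))


columns = build_variable_dicts(examples)
value_index = build_value_index(examples)

sunny_and_windy = covered_by_conjunction(
    value_index, [("outlook", "sunny"), ("windy", "yes")]
)
sunny_or_rainy = covered_by_disjunction(
    value_index, [("outlook", "sunny"), ("outlook", "rainy")]
)
```

In this sketch the first layout must scan a variable's dictionary to find the examples matching a literal, while the second layout answers the same question with set intersections and unions over precomputed index sets.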