Using constrained exhaustive search vs. greedy heuristic search for classification rule learning

Jamolbek Mattiev
PhD student, teaching assistant
University of Primorska
Slovenia
(+386)-70366331
jamolbek.mattiev@famnit.upr.si

Branko Kavšek
Assistant Professor
University of Primorska, Jožef Stefan Institute
Slovenia
(+386)-56117654
branko.kavsek@upr.si

ABSTRACT

Existing classification rule learning algorithms mainly use greedy heuristic search to find regularities (e.g., a decision tree or a set of rules) in data for classification. In recent years, extensive research has been done in the machine learning community on learning rules using exhaustive search – association rule mining. The objective there is to find all rules in the data that satisfy the user-specified minimum support and minimum confidence constraints. Although the whole set of rules may not be used directly for accurate classification, effective and efficient classifiers have been built using these so-called classification association rules.

In this paper we compare a "classical" classification rule learning algorithm that uses greedy heuristic search to produce the final classifier (a set of decision rules) with a class association rule learner that uses constrained exhaustive search to find classification rules, on a "real life" dataset.

The results show that the "classical" classification rule learning algorithm generates a more compact classifier whose individual rules are somewhat "less accurate". On the other hand, the class association rule learner produces individual classification rules that are "highly accurate", but the overall classification accuracy of the resulting classifier remains to be checked.

CCS Concepts

• Computing methodologies → Machine learning → Machine learning approaches → Rule learning
• Computing methodologies → Machine learning → Cross-validation
• Computing methodologies → Machine learning → Learning paradigms → Supervised learning → Supervised learning by classification

Keywords

Data mining, machine learning, classification rules, association rules, search, heuristics.

1. INTRODUCTION

Classification rule mining and association rule mining are two important data mining techniques. Classification rule mining aims at discovering a small set of rules in the database that form an accurate classifier. Association rule mining finds all rules in the database that satisfy some minimum support and minimum confidence constraints [2]. In association rule mining the target of mining is not predetermined, while in classification rule mining there is one and only one pre-determined target, i.e., the class or dependent attribute.

Association rule mining, one of the most important and well-researched data mining techniques, was first introduced in [1]. It studies the frequency of items occurring together in transactional databases and, based on a threshold called support, identifies the frequent itemsets. Another threshold, confidence, which is the conditional probability that an item appears in a transaction when another item appears, is used to pinpoint the association rules.
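To make the support and confidence constraints mentioned above concrete, the following minimal sketch (our illustration, not code from the paper) computes both measures over an invented toy transactional dataset; the item names and the minimum support threshold are assumptions.

from itertools import combinations

# Toy transactional dataset (invented for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Conditional probability of the consequent given the antecedent."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# Frequent 2-itemsets for a user-specified minimum support threshold.
min_support = 0.5
items = sorted(set().union(*transactions))
frequent_pairs = [set(p) for p in combinations(items, 2)
                  if support(p, transactions) >= min_support]

print(frequent_pairs)                                 # all three pairs reach support 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 0.666... (2 of the 3 "bread" transactions contain "milk")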

In classification [4], a model is developed by analyzing the training data and is then used to predict the class of objects whose class label is not known. The model is trained so that it can distinguish between the different classes in the data. The training data consists of data objects whose class labels are known in advance. Classification analysis is thus the organization of data into given classes. Also known as supervised classification, it uses the given class labels to order the objects in the data collection. Classification approaches normally use a training set in which all objects are already associated with known class labels.
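As a brief illustration of this train-then-predict setting (not part of the paper's experiments), the sketch below learns a decision tree from a few labelled objects and predicts the class of unseen ones; it assumes scikit-learn is available and uses invented attribute values and class labels.

from sklearn.tree import DecisionTreeClassifier

# Training objects described by two attributes, with known class labels.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = ["no", "no", "yes", "yes"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)          # learn a model from the training set

# Predict the class of objects whose class label is not known.
print(model.predict([[1, 0.5], [0, 0.2]]))   # expected: ['yes' 'no']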
There are many classification approaches, such as the statistical [8], divide-and-conquer [6] and covering [3] approaches. Based on these approaches, numerous algorithms have been derived, such as PART [5], Naive Bayes [8], See5 [12], C4.5 [11], Prism [3] and RIPPER [6]. However, traditional classification techniques often produce only a small subset of the possible rules and therefore usually tend to miss detailed rules that might play an important role.

The aim of the research presented in this paper is to compare a "classical" state-of-the-art classification rule learning algorithm that uses greedy heuristic search (PART [5]) with a class association rule learner that uses constrained exhaustive search (a modified version of the APRIORI algorithm [2]).
2. PRELIMINARY CONCEPTS

Let D be a dataset with n attributes {A1, A2, ..., An} and |D| records (objects), where each record has an object identifier (OID). Let C = {c1, c2, ..., ck} be a list of class labels. A specific value of an attribute Ai and of the class C is denoted by the lower-case letters aim and cj, respectively.

Definition 1. An item is described as an attribute and a specific value for that attribute, denoted by 〈(Ai, aim)〉, e.g., 〈(A1, a11)〉, 〈(A1, a12)〉, 〈(A2, a21)〉, etc.

Definition 2. An itemset is a set of items, e.g., 〈(A1, a11), (A2, a21)〉, 〈(A1, a11), (A3, a31)〉, etc.

Definition 3. Constraint_Itemsets is a user-defined set of itemsets in which each itemset is said to be a constrained itemset, e.g., Constraint_Itemsets = {〈(A1, a11), (A2, a21)〉, 〈(A3, a31)〉}.

Definition 4. A class association rule R has the form itemset → cj, where cj ∈ C is a class label.
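To illustrate Definitions 1–4, the following minimal sketch (our illustration, not the paper's implementation) encodes items, itemsets, constrained itemsets and class association rules as plain Python structures; the attribute/value names and the way the constraint is checked are assumptions.

# Definition 1: an item is an (attribute, value) pair.
item = ("A1", "a11")

# Definition 2: an itemset is a set of items.
itemset = frozenset({("A1", "a11"), ("A2", "a21")})

# Definition 3: a user-defined set of constrained itemsets.
constraint_itemsets = {
    frozenset({("A1", "a11"), ("A2", "a21")}),
    frozenset({("A3", "a31")}),
}

# Definition 4: a class association rule of the form itemset -> cj.
class_association_rule = (itemset, "c1")

def satisfies_constraints(candidate, constraint_itemsets):
    """One possible reading of the constraint: keep a candidate itemset only
    if it contains at least one of the user-defined constrained itemsets."""
    return any(constrained <= candidate for constrained in constraint_itemsets)

print(satisfies_constraints(itemset, constraint_itemsets))                      # True
print(satisfies_constraints(frozenset({("A4", "a41")}), constraint_itemsets))   # False

In a constrained exhaustive search, such a check could prune candidate itemsets before rule generation, which is where this kind of approach would differ from an unconstrained APRIORI run.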
