2. METHODS

In this section, the methods utilized in our proposed GWOTLT method are briefly presented.

2.1 Convolutional Neural Network

CNNs were first presented in the 1980s in Fukushima's paper [10]. The author proposed a deep learning approach for visual recognition, called the neocognitron, which was based on hierarchical layers trained with the stochastic gradient descent algorithm. The major breakthrough with CNNs occurred in 1998 with LeCun's proposed LeNet5 architecture [17], which is considered one of the key factors that started the enormous expansion of the deep learning field.

Initially, deep CNNs were defined as 2-dimensional constrained neural networks with alternating convolutional and subsampling (pooling) layers, fully connected at the end, combining three architectural ideas [17]:

• local receptive fields,

• shared weights, and

• spatial and temporal subsampling.

Most commonly, the convolutional layer is composed of several so-called feature maps. These feature maps are calculated with different weight vectors, which enables the extraction of multiple features from each location. The result of the convolutional calculation is obtained by convolving the feature maps of the previous layer with the convolution kernel of the current layer and applying the activation function. A subsampling or pooling layer reduces the dimension of the feature maps while preserving the important extracted features, usually by performing local averaging and subsampling. Subsampling is possible because the exact locations of the extracted features are not important as long as their approximate positions relative to one another remain the same [17].
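To make the alternation of convolutional and pooling layers concrete, the following minimal sketch builds a small LeNet-style network in Keras. It is an illustration only: the input shape, filter counts, and number of classes are assumed values, not settings taken from this paper.

# Minimal LeNet-style CNN sketch: alternating convolution and pooling layers,
# followed by fully connected layers (illustrative values only).
from tensorflow.keras import layers, models

def build_small_cnn(input_shape=(32, 32, 1), num_classes=10):
    model = models.Sequential([
        layers.Conv2D(6, kernel_size=5, activation="relu",
                      input_shape=input_shape),  # feature maps via shared weights
        layers.AveragePooling2D(pool_size=2),    # local averaging and subsampling
        layers.Conv2D(16, kernel_size=5, activation="relu"),
        layers.AveragePooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(120, activation="relu"),    # fully connected at the end
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model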
Although researchers have over the years developed various complex CNN architectures which have proven to be highly successful in large-scale image and video recognition, such as Krizhevsky's AlexNet [15], Szegedy's GoogLeNet [27] and Simonyan's VGG16 [26], the challenges regarding image and video recognition still exist. The major challenges are primarily the need for large datasets to train the CNNs and the time complexity of the training process.

2.2 Transfer Learning

One of the most popular approaches to address the time complexity of the deep CNN training process, as well as the problem of not having a large dataset, is known as transfer learning. Transfer learning can be defined as the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned. In machine learning terms, transfer learning roughly translates to transferring the weights of a deep neural network model already trained for one task to a model tackling a second, related task [13]. Based on previous work [16, 2, 25], such approaches work especially well if we have a small, insufficient dataset.

Transfer learning is most commonly used in two ways [2, 21]:

• Fine-tuning, in which the weights of the pre-trained CNN base model are preserved (frozen) in some of the layers and fine-tuned (trained) in the remaining layers of the CNN.

• CNN as a feature extractor, where the general idea is to access the features of any layer and use those encoded features to train a classifier of one's choice.

Generally, the first (top) layers of the CNN preserve more abstract, generic features applicable to other tasks, while the layers closer to the bottom provide more specific features that can benefit from fine-tuning, as they will be adjusted specifically for the targeted task. For the fine-tuning approach to transfer learning, there is no general recipe or rule for selecting which layers to tune and which ones to preserve as they are. Another challenge in utilizing the fine-tuning approach is deciding how many layers to add to the bottom of the pre-trained convolutional base, and which optimizer and learning rate to use in the fine-tuning process; a sketch of both usages is given below.
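As an illustration of the two usages, the following Keras sketch first uses a pre-trained VGG16 convolutional base as a frozen feature extractor with a small classifier added on top, and then unfreezes the last convolutional block for fine-tuning with a lower learning rate. The chosen layer split, optimizer, learning rates, and the two-class output are assumptions made for the sketch, not the settings used in GWOTLT.

# Hedged sketch of the two transfer learning usages on a VGG16 base
# (layer split, optimizer, and learning rates are illustrative assumptions).
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# 1) Frozen convolutional base used as a feature extractor.
base.trainable = False
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # small classifier added on top
    layers.Dense(2, activation="softmax"),  # assumed two-class target task
])
model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_data, epochs=5)           # trains only the new classifier

# 2) Fine-tuning: unfreeze only the last convolutional block and retrain.
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith("block5")  # earlier layers stay frozen
model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),  # low rate for fine-tuning
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_data, epochs=5)           # continues training the unfrozen layers

Because freezing is expressed per layer, the open questions noted above, where to split the base and which learning rate to use, appear explicitly as choices in such code.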
2.3 Grey Wolf Optimizer

In recent years, swarm intelligence and bio-inspired algorithms for solving optimization problems have become quite popular and have proven to be very efficient in solving real-world problems [9].

One of the most popular representatives of such optimization algorithms is the Grey Wolf Optimizer, or simply GWO [19]. The inspiration for GWO is adapted from the strict leadership hierarchy and hunting mechanisms of grey wolves (Canis lupus). The grey wolf leadership hierarchy is divided into four dominance groups, i.e. alpha, beta, delta and omega. Besides the leadership hierarchy, group hunting is also an interesting social behavior of grey wolves. As defined by the authors in [20], the main phases of grey wolf hunting are as follows [19]:

• Tracking, chasing and approaching the prey.

• Pursuing, encircling, and harassing the prey until it stops moving.

• Attacking the prey.

The GWO algorithm mathematically models the described hunting technique and social hierarchy in order to perform optimization. The basic pseudo-code of the GWO algorithm is presented in Algorithm 1.
3. PROPOSED METHOD

The basic concept of our proposed method for tuning the transfer learning approach with the GWO algorithm, named GWOTLT, is presented in Figure 1. The GWO algorithm is used to find the optimal parameters for the