Page 65 - Fister jr., Iztok, and Andrej Brodnik (eds.). StuCoSReC. Proceedings of the 2018 5th Student Computer Science Research Conference. Koper: University of Primorska Press, 2018


TRAINING

In this section, we present details of the training procedure. Our project is implemented in the Python programming language (https://www.python.org/) with the popular Keras library (https://keras.io/) for implementing neural networks. For all models, we train our networks using the back-propagation algorithm with the categorical cross-entropy loss function. For parameter optimization we use the RMSprop optimizer with learning rate η = 0.001, as recommended by the Keras library documentation. At the output layer of the neural networks we use the Softmax activation function. The Softmax activation function calculates the probability distribution over all classes/entities, and the class with the highest probability is chosen for each word. For a vector z of dimensionality D, the Softmax function is defined as [10]:

\[
\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{D} e^{z_j}} \quad \text{for } i = 1, \dots, D
\]

To regularize our model we use the dropout technique for reducing overfitting in neural networks [18]. We apply dropout with rate 0.1 on each input layer and spatial dropout with rate 0.3 after concatenating all inputs. Hidden layers contain RNN cells, which have their own dropout rate and a number of LSTM or GRU units. Hidden layers are configurable; details are described below.

Model training and model structure are completely configurable with a JSON file. An example JSON file is shown in Listing 1.

    {
        "max_epochs": 10,
        "batch_size": 32,
        "save_path": "models/gru-words-pos-chars/",
        "inputs": "words-pos-chars",
        "embeddings_trainable": false,
        "embeddings_type": "glove",
        "rnn_type": "GRU",
        "rnn_num_layers": 3,
        "rnn_bidirectional": true,
        "rnn_hidden_size": 100,
        "rnn_dropout": 0.2,
        "model_name": "GRU-example"
    }

Listing 1: Example of a model configuration in a JSON file.

Using JSON files, we defined a list of different models and ran our experiments. Model training is configurable with the number of epochs and the batch size. The model structure is configurable with input features such as word embedding, part of speech and character embedding. Hidden layers are configurable with the RNN cell type (LSTM or GRU), the number of cells in the layer, the number of hidden layers, the use of bidirectional layers and the dropout rate for each RNN layer.

4. EXPERIMENTS

In this section, we describe the dataset used for evaluation and our experiments, including graphs and results.

4.1 Data Set

We test our model on a dataset intended for named entity recognition. The dataset was introduced in 2003 at the CoNLL conference, where the shared task was language-independent named entity recognition. The dataset concentrates on four types of named entities: persons (PER), locations (LOC), organizations (ORG) and names of miscellaneous entities (MISC) that do not belong to the previous three groups. Tokens that are not part of any entity are tagged O. Each line of the dataset contains four fields: the word, part-of-speech tag, chunk tag, and named entity tag. Our focus was on the named entity tag. The format of the named entity tag is IOB (Inside, Outside, Beginning), where tokens are labeled as B-label if the token is the beginning of a named entity, I-label if it is inside a named entity but not the first token within the named entity, or O otherwise. The tagging scheme is the IOB scheme originally presented by Ramshaw et al. (1995) [14]. The dataset contains training, testing and validation data for two languages, English and German [16]. In this paper, we use the English data. The statistics of the dataset are shown in Table 1. We use the original data, without any pre-processing.

Table 1: Dataset statistics, where the columns Token and Sentence refer to the number of tokens and sentences in the CoNLL 2003 dataset.

Dataset      Token    Sentence
Training     204567   14987
Validation   51578    3466
Testing      46666    3684

4.2 Features

In this paper, we use three types of additional features to help improve our models. The most important feature is word embedding. The main problem in applying deep learning methods to natural language processing is how to convert words into numbers. Word embedding is a technique for creating a language model that maps words from the vocabulary to corresponding vectors of real numbers. We tried randomly initialized word vectors and pre-trained GloVe vectors with 100 dimensions. The results with GloVe vectors are significantly better than with random initialization.

Word embedding is able to capture syntactic and semantic information, but for a task like NER it is also very useful to have morphological and shape information about the word. Many NLP systems with additional character-level features report better results [20]. Based on the good experience reported in other articles, we apply the character-level input to one layer with LSTM cells, and we observe improvements in the results.

4.3 Results

We ran our two experiments with four different models (LSTM, GRU, BI-LSTM, BI-GRU). The first exper-

StuCoSReC Proceedings of the 2018 5th Student Computer Science Research Conference 67
Ljubljana, Slovenia, 9 October
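
To illustrate the Softmax definition given in the Training section, the following minimal sketch computes the probability distribution for one word and picks the most probable class. It uses only the Python standard library rather than Keras. The function names `softmax` and `predict_class`, the label list and the example scores are hypothetical and chosen for illustration only.

```python
import math


def softmax(z):
    """Return the Softmax probabilities for a list of real-valued scores."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(z, labels):
    """Return the label with the highest Softmax probability."""
    probs = softmax(z)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


# Hypothetical output-layer scores for one word over five illustrative labels.
labels = ["O", "PER", "LOC", "ORG", "MISC"]
scores = [0.5, 2.0, 0.1, -1.0, 0.3]
probs = softmax(scores)
predicted = predict_class(scores, labels)
```

Because the exponential is monotonic, the predicted class is simply the one with the largest raw score; the normalisation only matters when the probabilities themselves are needed, for example in the cross-entropy loss.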

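
The IOB scheme described in Section 4.1 can be made concrete with a short sketch that reads lines in the four-field format (word, part-of-speech tag, chunk tag, named entity tag) and groups tagged tokens into entities. The helper name `extract_entities` and the sample lines are made up for illustration. Real CoNLL 2003 files may differ in details, and the sketch assumes that an I- tag with no open entity of the same type starts a new entity.

```python
def extract_entities(lines):
    """Group IOB-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    words, etype = [], None

    def flush():
        # Close the entity collected so far, if there is one.
        nonlocal words, etype
        if words:
            entities.append((" ".join(words), etype))
        words, etype = [], None

    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            # Blank or malformed line: treat it as a sentence boundary.
            flush()
            continue
        word, _pos, _chunk, tag = fields
        if tag == "O":
            flush()
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B" or label != etype:
            # A B- tag, or a change of entity type, starts a new entity.
            flush()
            etype = label
        words.append(word)
    flush()
    return entities


# Made-up sample in the four-field layout described in Section 4.1.
sample = [
    "John NNP B-NP B-PER",
    "Smith NNP I-NP I-PER",
    "visited VBD B-VP O",
    "New NNP B-NP B-LOC",
    "York NNP I-NP I-LOC",
    ". . O O",
]
entities = extract_entities(sample)
```

For the sample above, the tokens tagged B-PER and I-PER are merged into a single person entity and the tokens tagged B-LOC and I-LOC into a single location entity, while the O tokens are skipped.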