Page 65 - Fister jr., Iztok, and Andrej Brodnik (eds.). StuCoSReC. Proceedings of the 2018 5th Student Computer Science Research Conference. Koper: University of Primorska Press, 2018
TRAINING

In this section, we present details about the training procedure. Our project is implemented in the Python3 programming language with the popular Keras4 library for building neural networks. For all models, we train the networks with the back-propagation algorithm and the categorical cross-entropy loss function. For parameter optimization we use the RMSprop optimizer with learning rate η = 0.001, as recommended by the Keras library documentation. At the output layer of the neural networks we use the Softmax activation function, which computes the probability distribution over all classes/entities and chooses the class with the highest probability for each word. For a vector z of dimensionality D, the Softmax function is defined as [10]:

σ(z_i) = e^{z_i} / \sum_{j=1}^{D} e^{z_j},   for i = 1, ..., D

To regularize our models we use the dropout technique, which reduces overfitting in neural networks [18]. We apply dropout with rate 0.1 on each input layer and spatial dropout with rate 0.3 after concatenating all inputs. Hidden layers contain RNN cells, each with its own dropout rate and number of LSTM or GRU units. The hidden layers are configurable; details are described below.

Model training and model structure are completely configurable through a JSON file. An example of such a file is displayed in Listing 1.

{
  "max_epochs": 10,
  "batch_size": 32,
  "save_path": "models/gru-words-pos-chars/",
  "inputs": "words-pos-chars",
  "embeddings_trainable": false,
  "embeddings_type": "glove",
  "rnn_type": "GRU",
  "rnn_num_layers": 3,
  "rnn_bidirectional": true,
  "rnn_hidden_size": 100,
  "rnn_dropout": 0.2,
  "model_name": "GRU-example"
}

Listing 1: Example of model configuration in a JSON file.

With JSON files we build a list of different models and run our experiments. Model training is configurable with the number of epochs and the batch size. The model structure is configurable with input features such as word embeddings, part-of-speech tags and character embeddings. Hidden layers are configurable with the RNN cell type (LSTM or GRU), the number of cells per layer, the number of hidden layers, the use of bidirectional layers, and the dropout rate for each RNN layer.

3 https://www.python.org/
4 https://keras.io/

4. EXPERIMENTS

In this section we describe the dataset used for evaluation and our experiments, including graphs and results.

4.1 Data Set

We test our models on a dataset intended for named entity recognition. The dataset was introduced in 2003 at the CoNLL conference, where the shared task was language-independent named entity recognition. The dataset concentrates on four types of named entities: persons (PER), locations (LOC), organizations (ORG), and names of miscellaneous entities (MISC) that do not belong to the previous three groups, plus the no-entity token (O). Each line of the dataset contains four fields: the word, part-of-speech tag, chunk tag, and named entity tag. Our focus is on the named entity tag, whose format is IOB (Inside, Outside, Beginning): a token is labeled B-label if it is the beginning of a named entity, I-label if it is inside a named entity but not the first token within it, and O otherwise. The tagging scheme is the IOB scheme originally presented by Ramshaw et al. (1995) [14]. The dataset contains training, testing and validation data for two languages, English and German [16]. In this paper, we use the English data.

The statistics of the dataset are shown in Table 1. We use the raw data, without any pre-processing.

Table 1: Dataset statistics, where the columns Token and Sentence refer to the number of tokens and sentences in the CoNLL 2003 dataset.

Dataset      Token    Sentence
Training     204567   14987
Validation   51578    3466
Testing      46666    3684

4.2 Features

In this paper, we use three types of additional features to help our models improve. The most important feature is the word embedding. The main problem of applying deep learning methods to natural language processing is how to convert words into numbers. Word embedding is a technique for creating a language model that maps words from the vocabulary to corresponding vectors of real numbers. We tried randomly initialized word vectors and pre-trained GloVe vectors with 100 dimensions. The results with GloVe vectors are significantly better than with random initialization.

Word embeddings are able to capture syntactic and semantic information, but for a task like NER it is also very useful to have morphological and shape information about the word. Many NLP systems with additional character-level features report better results [20]. Based on the good experience reported in other articles, we feed character-level input to one layer with LSTM cells, and we notice improvements in the results.

4.3 Results

We ran our two experiments with four different models (LSTM, GRU, BI-LSTM, BI-GRU). The first exper-
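As a concrete illustration, the Softmax function used at the output layer can be sketched in NumPy as follows; the shift by the maximum is a standard numerical-stability detail of our sketch, not part of the paper's definition:

```python
import numpy as np

def softmax(z):
    # Shift by the maximum for numerical stability; softmax is
    # invariant to adding a constant to every component of z.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Probability distribution over D classes for one word's scores.
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
# The predicted entity class is the one with the highest probability.
predicted = int(np.argmax(probs))
```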
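A minimal sketch of how a configuration such as Listing 1 could be read before building a model; only the key names come from the listing, while the loader and the validation step are our assumption:

```python
import json

# Key names taken from Listing 1; the loader itself is a hypothetical sketch.
REQUIRED_KEYS = {
    "max_epochs", "batch_size", "save_path", "inputs",
    "embeddings_trainable", "embeddings_type", "rnn_type",
    "rnn_num_layers", "rnn_bidirectional", "rnn_hidden_size",
    "rnn_dropout", "model_name",
}

def load_config(path):
    # Read the JSON model configuration and check that all expected
    # keys are present before any model is built from it.
    with open(path) as f:
        cfg = json.load(f)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    return cfg
```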
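To make the IOB tagging scheme from Section 4.1 concrete, a small helper that recovers entity spans from a tag sequence can be sketched as follows; the helper is our illustration, not code from the paper:

```python
def entity_spans(tags):
    """Collect (start, end, label) spans from a list of IOB tags:
    B-X begins an entity of type X, I-X continues it, O is outside.
    `end` is exclusive, as in Python slicing."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            if label is not None:
                spans.append((start, i, label))
            start, label = None, None
        elif tag.startswith("B-") or label != tag[2:]:
            # A B- tag, or an I- tag whose type differs from the
            # current entity, starts a new span.
            if label is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans

# "U.N. official Ekeus heads for Baghdad" tagged in IOB:
tags = ["B-ORG", "O", "B-PER", "O", "O", "B-LOC"]
```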
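One common way to use the pre-trained 100-dimensional GloVe vectors from Section 4.2 is to build an embedding matrix indexed by the vocabulary; the random fallback for out-of-vocabulary words is our assumption, since the paper does not state how such words are handled:

```python
import numpy as np

EMB_DIM = 100  # the paper uses 100-dimensional GloVe vectors

def build_embedding_matrix(vocab, glove, dim=EMB_DIM, seed=0):
    # vocab: list of words; glove: dict mapping word -> vector of shape (dim,).
    # The rows of the returned matrix can initialize an Embedding layer.
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab), dim))
    for i, word in enumerate(vocab):
        vec = glove.get(word.lower())
        if vec is not None:
            matrix[i] = vec
        else:
            # Assumed fallback: a small random vector for OOV words.
            matrix[i] = rng.uniform(-0.25, 0.25, dim)
    return matrix
```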
StuCoSReC Proceedings of the 2018 5th Student Computer Science Research Conference 67
Ljubljana, Slovenia, 9 October