Algorithm 1 Genetic Algorithm
1: Initialization
2: Evaluate the initial population
3: for g = 1 to G do
4:    for i = 1 to Np do
5:       Crossover
6:       Mutation
7:       Place newly generated individual in a new population
8:    end for
9:    Selection
10: end for

Figure 1: SMT system (the Moses decoder takes the source text, the models, and the models' weights as input and produces the target text).

Instead of iterating over all solutions, we choose 50 % of the population P, which will then be recombined and mutated, as seen in Algorithm 2. First we calculate the sum of all individuals' fitness values. Then we calculate value = sum * rand(), where rand() returns a random value between 0 and 1. After that, we iterate over the population P and subtract each individual's fitness from value. When value becomes lower than or equal to zero, we return the current individual.

Algorithm 2 Roulette Wheel Selection
1: Calculate value
2: for i = 1 to Np do
3:    ind = Pi
4:    value = value − f(ind)
5:    if value ≤ 0 then
6:       return ind
7:    end if
8: end for
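As an illustration, Algorithm 2 can be written in a few lines of Python. This is a minimal sketch, assuming the population is a list of individuals and f returns a non-negative fitness value; it is not the exact implementation used in our system.

import random

def roulette_wheel_selection(population, f):
    # value = sum * rand(), as in Algorithm 2
    value = sum(f(ind) for ind in population) * random.random()
    # subtract each individual's fitness until value drops to zero or below
    for ind in population:
        value -= f(ind)
        if value <= 0:
            return ind
    # fallback for floating-point rounding; in theory the loop always returns
    return population[-1]

Individuals with higher fitness cover a larger portion of the interval [0, sum) and are therefore selected with proportionally higher probability.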

5. EXPERIMENTS

In this section we describe the corpus preparation, how we built the two SMT systems, and the optimization settings. For each method (baseline, MERT, GA, GARW) we obtained a file containing weights. The two SMT systems used these weights to translate the test set. The translated sets were then evaluated using the BLEU metric, as shown in Table 1.

5.1 Corpus Preparation

In our experiments we used the English and Slovenian JRC-ACQUIS Multilingual Parallel Corpora [21]. In this corpus the sentences are aligned, which means that the first sentence of the English text is the translation of the first sentence of the Slovenian text, and so on. We split each corpus into three parts: train, tuning, and test sets. The train set was used to train our SMT systems and consisted of 6,500 sentences, or 184,072 words in the English set and 158,768 words in the Slovenian set. The tuning set was used to optimize the two SMT systems and consisted of 500 sentences, or 13,526 words in the English set and 12,128 words in the Slovenian set. The test set was used to evaluate our SMT systems and consisted of 3,000 sentences, or 86,145 words in the English set and 76,879 words in the Slovenian set.
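Because the two sides of the corpus are aligned line by line, the split can be done by slicing both sides with the same indices. The following is a minimal sketch; the file paths and the one-sentence-per-line format are assumptions for illustration only.

def split_parallel_corpus(en_path, sl_path, n_train=6500, n_tune=500, n_test=3000):
    # read both sides; each line holds one sentence, aligned by line number
    with open(en_path, encoding="utf-8") as f_en, open(sl_path, encoding="utf-8") as f_sl:
        en, sl = f_en.read().splitlines(), f_sl.read().splitlines()
    assert len(en) == len(sl), "the two sides must have the same number of sentences"
    # identical index ranges keep the sentence pairs aligned across the splits
    train = (en[:n_train], sl[:n_train])
    tune = (en[n_train:n_train + n_tune], sl[n_train:n_train + n_tune])
    test = (en[n_train + n_tune:n_train + n_tune + n_test],
            sl[n_train + n_tune:n_train + n_tune + n_test])
    return train, tune, test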
5.2 Building the SMT systems

To successfully build the SMT system, seen in Figure 1, we created models from the training set described in the previous subsection using the Moses toolkit [15]. The Moses toolkit is an open-source toolkit for SMT which contains the SMT decoder and a wide variety of tools for training, tuning, and applying the system to many translation tasks. Using the Moses toolkit we created the language and translation models. The language model was a 5-gram model with improved Kneser-Ney smoothing, built using the IRST language modeling (IRSTLM) toolkit [12]. The translation model was built using the grow-diag-final-and alignment from GIZA++ [17]. Our SMT systems were extended with four advanced models: the distortion, lexicalized reordering, word penalty, and phrase penalty models. This gave us six models.
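In a phrase-based system such as Moses, these models enter the decoder as feature functions of a log-linear model, and tuning searches for the weight vector of that combination. A minimal sketch of the resulting hypothesis score is shown below; the feature values themselves would come from the models, and the plain-list representation is an assumption for illustration.

def loglinear_score(weights, features):
    # weighted sum of the feature values (typically log-probabilities and
    # penalties) produced by the language, translation, reordering,
    # distortion, word penalty, and phrase penalty models
    return sum(w * h for w, h in zip(weights, features))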
5.3 Optimization Settings

The following settings were used for the optimization of both SMT systems: D = 14, min = -1, max = 1, Np = 15, G = 70, F = 0.5 and Cr = 0.7. The tuning process is shown in Figure 2: the input to the tuning methods is the tuning set, the models, and the models' weights, and the output is the new models' weights.
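Viewed from the outside, every tuning method in Figure 2 has the same interface. The sketch below only fixes that interface and the settings listed above; the dictionary keys and function names are illustrative, and the search itself is omitted.

# settings from Section 5.3; the key names are illustrative only
SETTINGS = {"D": 14, "min": -1.0, "max": 1.0,
            "Np": 15, "G": 70, "F": 0.5, "Cr": 0.7}

def tune(tuning_set, models, weights, settings=SETTINGS):
    # Input: the tuning set, the models, and the current models' weights.
    # Output: the new models' weights. The search loop itself (MERT, GA,
    # or GARW) is omitted in this sketch.
    new_weights = list(weights)
    return new_weights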
5.4 Results

Our experiments were based on the experiments in [3], in which the authors compared the old and new implementations of MERT to see whether they are similarly effective and also measured the computation time. In our experiments we compared the basic GA with the GA with Roulette Wheel Selection (GARW). Each optimization method (MERT, GA, GARW) was run once, as in [3], and the methods were compared to each other. All three methods were tuned on the same system using the same tuning corpus, and the BLEU scores are shown in Table 1. As we can see, the BLEU scores for GARW are better than those for the basic GA and comparable with MERT. Table 2 shows the tuning time for each method: GARW is 50 % faster than the basic GA and comparable with MERT.
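The scores in Table 1 are corpus-level BLEU scores over the translated test set. As an illustration only, and not the exact evaluation script used in our experiments, a corpus-level BLEU score can be computed with NLTK from tokenized hypotheses and references:

from nltk.translate.bleu_score import corpus_bleu

def evaluate_bleu(hypotheses, references):
    # hypotheses: list of token lists produced by the decoder
    # references: list of token lists, one reference translation per sentence
    return corpus_bleu([[ref] for ref in references], hypotheses)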

5.5 Discussion

Since the optimization is a part of the training, both are done offline, which means the tuning time does not affect the actual translation. The translation itself is done online, and the time needed to translate a text depends on the number of words, but usually one sentence is translated in 1 second. The GARW method is computationally faster because
