hods.

3. BACKGROUND
The first ideas of SMT were introduced in 1949 [24]. SMT was re-introduced in the late 1980s and early 1990s by researchers at IBM's Thomas J. Watson Research Center [6, 5, 7] and has contributed to a significant resurgence of interest in MT. Nowadays it is by far the most widely studied MT method.

The idea behind SMT comes from information theory. A text is translated according to the probability distribution p(e|f), where a string e in the target language is the translation of a string f in the source language.

One approach to modelling the probability distribution p(e|f) is to apply Bayes' theorem; the denominator p(f) is constant for a given source string and can be dropped, leaving

    p(e|f) ∝ p(f|e) · p(e)    (1)

where the translation model p(f|e) is the probability that the source string f is the translation of the target string e, and the language model p(e) is the probability of seeing the target string e. This decomposition is attractive because it splits the problem into two subproblems. Finding the best translation e* is done by searching among all possible translations E for the one with the highest probability:

    e* = arg max_{e ∈ E} p(e|f) = arg max_{e ∈ E} p(f|e) · p(e)    (2)

Performing this search efficiently is the work of an MT decoder, which uses the foreign string, heuristics and other methods to limit the search space while keeping the quality acceptable. This trade-off between quality and time can also be found in speech recognition. A text is typically translated sentence by sentence, but even this is not enough.
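For illustration, the search in Equation (2) can be sketched in Python; the explicit candidate list and the scoring functions log_tm and log_lm are hypothetical placeholders for the translation and language models, and a real decoder explores a much larger space than a fixed list:

    def best_translation(f, candidates, log_tm, log_lm):
        # e* = arg max_e p(f|e) * p(e); summing log-probabilities
        # avoids underflow from multiplying small probabilities
        return max(candidates, key=lambda e: log_tm(f, e) + log_lm(e))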
In SMT, models are based on n-grams, where an n-gram is a sequence of n items from a given sequence of text. The statistical translation models were initially word-based, but significant advances were made with the introduction of phrase-based models. Language models are typically approximated by smoothed n-gram models, and similar approaches have been applied to the translation models, although there is additional complexity due to the different sentence lengths and word orders of the languages. More advanced models therefore deal with these additional complexities. Phrase and word penalty models ensure that the translations/phrases are not too long or too short. The lexical reordering model conditions reordering on the actual phrases. By limiting the reordering, we can speed up the decoder as well as increase the translation quality.

Translation quality is considered to be the correspondence between a machine translation and a professional human (reference) translation. One of the first metrics to achieve a high correlation with human judgments is the BLEU metric, and it is the most popular metric in SMT. The BLEU metric always outputs a real-valued number between 0 and 1. This value indicates how similar the machine and reference translations are: values closer to 1 represent more similar texts; however, no machine translation will attain a score of 1.

4. GENETIC ALGORITHM
Encoding of the chromosomes is one of the first problems to solve when tackling a problem with a GA, and the encoding depends strongly on the problem. Our problem is to find the optimal model weights. In our algorithm, the vector of weights defines an individual x = (x1, x2, ..., xD), where the dimension of the problem D is the number of model weights.

The initial population consists of Np individuals and is generated randomly between the min and max values. As seen in Algorithm 1, the simplest form of GA involves three types of operators: crossover, mutation and selection.
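The encoding and initialisation can be sketched as follows (Python; the names Np, D, w_min and w_max mirror the text, and uniform sampling is an assumption, since the text only states that values are generated randomly between min and max):

    import random

    def init_population(Np, D, w_min, w_max):
        # an individual is a vector of D model weights; the population
        # holds Np such vectors, each weight drawn from [w_min, w_max]
        return [[random.uniform(w_min, w_max) for _ in range(D)]
                for _ in range(Np)]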
The crossover operator uses a crossover probability (Cr) to cross over randomly selected individuals s1 and s2 from the population and form a new individual x using the following equation:

    xj = s1j   if rand() < Cr,
         s2j   otherwise,    (3)

where j is a value between 1 and D and rand() returns a real value between 0 and 1. If no crossover was performed, the offspring x is an exact copy of one of the parents.
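A direct rendering of Equation (3) in the same sketch style (it reuses the random import from the previous sketch):

    def crossover(s1, s2, Cr):
        # Equation (3): gene j is taken from parent s1 with probability
        # Cr and from parent s2 otherwise
        return [s1[j] if random.random() < Cr else s2[j]
                for j in range(len(s1))]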
The mutation operator uses a mutation probability (F) to mutate the new offspring at each position using the following equation:

    xj = xj + rand1()   if rand2() < F,
         xj             otherwise,    (4)

where j is a value between 1 and D and rand_r(), r ∈ {1, 2}, returns a real value between 0 and 1. After the crossover and mutation, the weights can fall outside the interval [min, max].
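Equation (4) as a sketch (again reusing the random import; note that, as stated above, the mutated weights may leave [min, max], and since the text does not specify a repair step none is applied here):

    def mutate(x, F):
        # Equation (4): with probability F, shift position j by a random
        # value in [0, 1); the result may fall outside [min, max]
        return [xj + random.random() if random.random() < F else xj
                for xj in x]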
The selection operator selects solutions from the newly created population and the current population. Solutions with better fitness survive into the next generation g. The end condition is a maximum number of generations (G).
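One possible way to put the three operators together (a sketch only: it assumes a higher-is-better fitness function, which this section does not define, and picks parents uniformly at random; Section 4.1 replaces that uniform choice with roulette wheel selection):

    def evolve(fitness, Np, D, w_min, w_max, Cr, F, G):
        # run for at most G generations, keeping the Np fittest
        # solutions from the union of parents and offspring
        pop = init_population(Np, D, w_min, w_max)
        for g in range(G):
            offspring = [mutate(crossover(random.choice(pop),
                                          random.choice(pop), Cr), F)
                         for _ in range(Np)]
            pop = sorted(pop + offspring, key=fitness, reverse=True)[:Np]
        return pop[0]  # best weight vector found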
4.1 Roulette Wheel Selection
Because the basic GA selects from all solutions (individuals) in the population, we decided to use roulette wheel (RW) selection [1] to speed up the process. The basic part of the selection process is to stochastically select individuals from one generation to create the basis of the next generation. The requirement is that the fittest individuals have a greater chance of survival than weaker ones. This replicates nature, in that fitter individuals tend to have a better probability of survival and go forward to form the mating pool for the next generation. Weaker individuals are not without a chance: in nature, such individuals may carry genetic coding that proves useful to future generations.
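A common implementation of the wheel is a cumulative-sum draw, sketched below (it assumes non-negative fitness values, which the text does not state explicitly). In the evolve sketch above, random.choice(pop) would then be replaced by roulette_wheel(pop, fitness):

    def roulette_wheel(pop, fitness):
        # spin the wheel once: an individual is picked with probability
        # proportional to its (non-negative) fitness
        total = sum(fitness(ind) for ind in pop)
        pick = random.uniform(0, total)
        acc = 0.0
        for ind in pop:
            acc += fitness(ind)
            if acc >= pick:
                return ind
        return pop[-1]  # guard against floating-point rounding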
