hods.

3. BACKGROUND
The first ideas of SMT were introduced in 1949 [24]. SMT was re-introduced in the late 1980s and early 1990s by researchers at IBM's Thomas J. Watson Research Center [6, 5, 7] and has contributed to a significant resurgence of interest in MT. Nowadays it is by far the most widely studied MT method.

The idea behind SMT comes from information theory. A text is translated according to the probability distribution p(e|f), where a string e in the target language is the translation of a string f in the source language.

One approach to modelling the probability distribution p(e|f) is to apply Bayes' theorem; the denominator p(f) is constant for a given source string and can be dropped, leaving

    p(e|f) ∝ p(f|e) · p(e)    (1)

where the translation model p(f|e) is the probability that the source string f is the translation of the target string e, and the language model p(e) is the probability of seeing the target string e. This decomposition is attractive because it splits the problem into two subproblems. Finding the best translation e* is done by searching among all possible translations E for the one with the highest probability:

    e* = arg max_{e ∈ E} p(e|f) = arg max_{e ∈ E} p(f|e) · p(e)    (2)

Performing this search efficiently is the work of an MT decoder, which uses the foreign string, heuristics and other methods to limit the search space while keeping the quality acceptable. This trade-off between quality and time can also be found in speech recognition. A text is typically translated sentence by sentence, but even this is not enough.
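For illustration, the search in Equation (2) can be sketched in Python; the explicit candidate list and the scoring functions log_tm and log_lm are hypothetical placeholders for the translation and language models, and a real decoder explores a much larger space than a fixed list:

    def best_translation(f, candidates, log_tm, log_lm):
        # e* = arg max_e p(f|e) * p(e); summing log-probabilities
        # avoids underflow from multiplying small probabilities
        return max(candidates, key=lambda e: log_tm(f, e) + log_lm(e))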
In SMT, models are based on n-grams, where an n-gram is a sequence of n items from a given sequence of text. The statistical translation models were initially word-based, but significant advances were made with the introduction of phrase-based models. Language models are typically approximated by smoothed n-gram models, and similar approaches have been applied to the translation models, although there is additional complexity due to the different sentence lengths and word orders of the languages. More advanced models therefore deal with these additional complexities. Phrase and word penalty models ensure that the translations/phrases are not too long or too short. The lexical reordering model conditions reordering on the actual phrases. By limiting the reordering, we can speed up the decoder as well as increase the translation quality.

Translation quality is considered to be the correspondence between a machine translation and a professional human (reference) translation. One of the first metrics to achieve a high correlation with human judgments is the BLEU metric, and it is the most popular metric in SMT. The BLEU metric always outputs a real-valued number between 0 and 1. This value indicates how similar the machine and reference translations are: values closer to 1 represent more similar texts; however, no machine translation will attain a score of 1.

4. GENETIC ALGORITHM
Encoding of the chromosomes is one of the first problems to solve when tackling a problem with a GA, and the encoding depends strongly on the problem. Our problem is to find the optimal model weights. In our algorithm, the vector of weights defines an individual x = (x1, x2, ..., xD), where the dimension of the problem D is the number of model weights.

The initial population consists of Np individuals and is generated randomly between the min and max values. As seen in Algorithm 1, the simplest form of GA involves three types of operators: crossover, mutation and selection.
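The encoding and initialisation can be sketched as follows (Python; the names Np, D, w_min and w_max mirror the text, and uniform sampling is an assumption, since the text only states that values are generated randomly between min and max):

    import random

    def init_population(Np, D, w_min, w_max):
        # an individual is a vector of D model weights; the population
        # holds Np such vectors, each weight drawn from [w_min, w_max]
        return [[random.uniform(w_min, w_max) for _ in range(D)]
                for _ in range(Np)]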
The crossover operator uses a crossover probability (Cr) to cross over randomly selected individuals s1 and s2 from the population and form a new individual x using the following equation:

    xj = s1j   if rand() < Cr,
         s2j   otherwise,    (3)

where j is a value between 1 and D and rand() returns a real value between 0 and 1. If no crossover was performed, the offspring x is an exact copy of one of the parents.
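A direct rendering of Equation (3) in the same sketch style (it reuses the random import from the previous sketch):

    def crossover(s1, s2, Cr):
        # Equation (3): gene j is taken from parent s1 with probability
        # Cr and from parent s2 otherwise
        return [s1[j] if random.random() < Cr else s2[j]
                for j in range(len(s1))]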
The mutation operator uses a mutation probability (F) to mutate the new offspring at each position using the following equation:

    xj = xj + rand1()   if rand2() < F,
         xj             otherwise,    (4)

where j is a value between 1 and D and rand_r(), r ∈ {1, 2}, returns a real value between 0 and 1. After the crossover and mutation, the weights can fall outside the interval [min, max].
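Equation (4) as a sketch (again reusing the random import; note that, as stated above, the mutated weights may leave [min, max], and since the text does not specify a repair step none is applied here):

    def mutate(x, F):
        # Equation (4): with probability F, shift position j by a random
        # value in [0, 1); the result may fall outside [min, max]
        return [xj + random.random() if random.random() < F else xj
                for xj in x]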
The selection operator selects solutions from the newly created population and the current population. Solutions with better fitness survive into the next generation g. The end condition is a maximum number of generations (G).
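One possible way to put the three operators together (a sketch only: it assumes a higher-is-better fitness function, which this section does not define, and picks parents uniformly at random; Section 4.1 replaces that uniform choice with roulette wheel selection):

    def evolve(fitness, Np, D, w_min, w_max, Cr, F, G):
        # run for at most G generations, keeping the Np fittest
        # solutions from the union of parents and offspring
        pop = init_population(Np, D, w_min, w_max)
        for g in range(G):
            offspring = [mutate(crossover(random.choice(pop),
                                          random.choice(pop), Cr), F)
                         for _ in range(Np)]
            pop = sorted(pop + offspring, key=fitness, reverse=True)[:Np]
        return pop[0]  # best weight vector found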
4.1 Roulette Wheel Selection
Because the basic GA selects from all solutions (individuals) in the population, we decided to use roulette wheel (RW) selection [1] to speed up the process. The basic part of the selection process is to stochastically select individuals from one generation to create the basis of the next generation. The requirement is that the fittest individuals have a greater chance of survival than weaker ones. This replicates nature, in that fitter individuals tend to have a better probability of survival and go forward to form the mating pool for the next generation. Weaker individuals are not without a chance: in nature, such individuals may carry genetic coding that proves useful to future generations.
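A common implementation of the wheel is a cumulative-sum draw, sketched below (it assumes non-negative fitness values, which the text does not state explicitly). In the evolve sketch above, random.choice(pop) would then be replaced by roulette_wheel(pop, fitness):

    def roulette_wheel(pop, fitness):
        # spin the wheel once: an individual is picked with probability
        # proportional to its (non-negative) fitness
        total = sum(fitness(ind) for ind in pop)
        pick = random.uniform(0, total)
        acc = 0.0
        for ind in pop:
            acc += fitness(ind)
            if acc >= pick:
                return ind
        return pop[-1]  # guard against floating-point rounding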
