Page 7 - Krész, Miklós, and Andrej Brodnik (eds.). MATCOS-13. Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science. Koper: University of Primorska Press, 2016.
P. 7
morphism between the query and the target. This ma- Figure 4: The parent heuristic
trix is refined after each step: the algorithm checks for each
pair of i and j where Mi,j is true whether the non-mapped frequency of the different atom and bond types in the given
neighbors of the i-th atom in the query can be mapped to database. They suggested mapping the rare atoms before
a neighbor of the j-th atom in the target. If this condition the frequent ones while keeping connectivity. This new atom
does not hold, then Mi,j is set to false and a new refine phase order needs to be calculated only once before any search
is performed. against the database.
The VF2 algorithm maintains a neighbor set for each Although this method has been shown to reduce search time,
molecule (N bq, N bt) which contains those atoms that are not it has some limitations. In its original form, it is applicable
involved in the current partial mapping but have a neighbor only for connected query graphs. However, as in many real-
that is already involved. The algorithm distinguishes the world applications, the query structure may consist of mul-
following two cases. tiple components, we have extended this method to be ap-
plicable for such queries. For example, an isolated atom can
1. N bq = ∅. In this case, the next atom to be mapped be mapped to any atom of the target having the same atom
is the element of this set with the minimum index. type; therefore it is beneficial to consider isolated atoms last.
This atom can be mapped only to an atom from N bt. Furthermore, if we apply this atom ordering technique, then
Before the algorithm extends the current partial map- the calculation of parent atoms should also be adjusted.
ping by adding f (i) = j, it checks if the atom and the
bond types match and |N bq| ≤ |N bt| holds for the new 5.3 Ullmann Algorithm
neighbor sets.
We have successfully applied the aforementioned two heuris-
2. N bq = ∅. In this case, the non-mapped query atom tics in the Ullmann algorithm. They both substantially de-
with the minimum index is considered, which can be crease its running time.
mapped to any non-used atom of the target. Before
the extension of the current partial mapping, only the In addition, we could also decrease the memory requirement
atom types are checked. of the algorithm. Its straightforward implementation re-
quires O(n3) space (where n = max{|V1|, |V2|}), which can
5. HEURISTICS be reduced to O(n2) by saving only the modified matrix el-
ements at each depth instead of saving the entire boolean
In this section, we briefly introduce the heuristics we have compatibility matrix. Because every element can be set from
developed to improve upon the considered algorithms. true to false only once under a node in a backtracking tree,
at most O(n2) positions need to be saved in total.
5.1 Parent heuristic
Another improvement exploits that the compatibility matrix
Definition 3. Suppose the atoms of the query are indexed contains exactly one true value in the first d rows, where d
by positive integers and a partial mapping is given. Let denotes the current search depth. Therefore, the matrix
qi (i ∈ {1, . . . , |V1|}) be an arbitrary non-mapped atom of refinement step can be started at the (d + 1)-th row.
the query molecule. Define the parent atom of qi (denoted by
parent(qi)) as its mapped neighbor atom with the minimum 5.4 VF2
index (if it has any).
We have also devised a few improvements for the VF2 algo-
Notice that at least one atom in each component does not rithm. First, the candidate sets are not pre-calculated and
have a parent atom (i.e., the atom mapped first in its com- not stored in memory. If the next candidate is needed, it
ponent). Furthermore, note that the parent atoms are not can be calculated on the fly. Additionally, if only the first
modified by extensions of the current partial mapping. The mapping is required, then there is no need to calculate those
key observation that we exploited is that a query atom qi can candidates that might not be used in a later step.
only be mapped to those atoms of target that are adjacent
to the image of parent(qi). Search algorithms can effectively Although comparing the cardinality of neighbor sets im-
take advantage of this property in case of molecular graphs, proves the performance of the original algorithm, we found
because they are very sparse. that it becomes superfluous when the parent heuristic is also
applied. In fact, this new heuristic developed by us seems
A simple example is shown in Figure 4 to demonstrate how to be clearly superior to the original technique.
this heuristic can be applied. Suppose that the current map-
ping is f (1) = 4, f (2) = 5, and f (3) is under consideration. Apart from that, the atom reordering heuristic is also ap-
The parent atoms are indicated by arrows pointing from an
atom to its parent atom. As f (parent(3)) = 4, which has
two neighbors that are not mapped yet, namely 2 and 6, the
only possible assignments are f (3) = 2 and f (3) = 6.
5.2 Atom order in the query molecule
H. Shang et. al. introduced a new method in [4] to obtain
better performance of search algorithm. Their idea is to
rearrange the atoms of the query molecule according to the
m a t c o s -1 3 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science 7
Koper, Slovenia, 10-11 October
trix is refined after each step: the algorithm checks for each
pair of i and j where Mi,j is true whether the non-mapped frequency of the different atom and bond types in the given
neighbors of the i-th atom in the query can be mapped to database. They suggested mapping the rare atoms before
a neighbor of the j-th atom in the target. If this condition the frequent ones while keeping connectivity. This new atom
does not hold, then Mi,j is set to false and a new refine phase order needs to be calculated only once before any search
is performed. against the database.
The VF2 algorithm maintains a neighbor set for each Although this method has been shown to reduce search time,
molecule (N bq, N bt) which contains those atoms that are not it has some limitations. In its original form, it is applicable
involved in the current partial mapping but have a neighbor only for connected query graphs. However, as in many real-
that is already involved. The algorithm distinguishes the world applications, the query structure may consist of mul-
following two cases. tiple components, we have extended this method to be ap-
plicable for such queries. For example, an isolated atom can
1. N bq = ∅. In this case, the next atom to be mapped be mapped to any atom of the target having the same atom
is the element of this set with the minimum index. type; therefore it is beneficial to consider isolated atoms last.
This atom can be mapped only to an atom from N bt. Furthermore, if we apply this atom ordering technique, then
Before the algorithm extends the current partial map- the calculation of parent atoms should also be adjusted.
ping by adding f (i) = j, it checks if the atom and the
bond types match and |N bq| ≤ |N bt| holds for the new 5.3 Ullmann Algorithm
neighbor sets.
We have successfully applied the aforementioned two heuris-
2. N bq = ∅. In this case, the non-mapped query atom tics in the Ullmann algorithm. They both substantially de-
with the minimum index is considered, which can be crease its running time.
mapped to any non-used atom of the target. Before
the extension of the current partial mapping, only the In addition, we could also decrease the memory requirement
atom types are checked. of the algorithm. Its straightforward implementation re-
quires O(n3) space (where n = max{|V1|, |V2|}), which can
5. HEURISTICS be reduced to O(n2) by saving only the modified matrix el-
ements at each depth instead of saving the entire boolean
In this section, we briefly introduce the heuristics we have compatibility matrix. Because every element can be set from
developed to improve upon the considered algorithms. true to false only once under a node in a backtracking tree,
at most O(n2) positions need to be saved in total.
5.1 Parent heuristic
Another improvement exploits that the compatibility matrix
Definition 3. Suppose the atoms of the query are indexed contains exactly one true value in the first d rows, where d
by positive integers and a partial mapping is given. Let denotes the current search depth. Therefore, the matrix
qi (i ∈ {1, . . . , |V1|}) be an arbitrary non-mapped atom of refinement step can be started at the (d + 1)-th row.
the query molecule. Define the parent atom of qi (denoted by
parent(qi)) as its mapped neighbor atom with the minimum 5.4 VF2
index (if it has any).
We have also devised a few improvements for the VF2 algo-
Notice that at least one atom in each component does not rithm. First, the candidate sets are not pre-calculated and
have a parent atom (i.e., the atom mapped first in its com- not stored in memory. If the next candidate is needed, it
ponent). Furthermore, note that the parent atoms are not can be calculated on the fly. Additionally, if only the first
modified by extensions of the current partial mapping. The mapping is required, then there is no need to calculate those
key observation that we exploited is that a query atom qi can candidates that might not be used in a later step.
only be mapped to those atoms of target that are adjacent
to the image of parent(qi). Search algorithms can effectively Although comparing the cardinality of neighbor sets im-
take advantage of this property in case of molecular graphs, proves the performance of the original algorithm, we found
because they are very sparse. that it becomes superfluous when the parent heuristic is also
applied. In fact, this new heuristic developed by us seems
A simple example is shown in Figure 4 to demonstrate how to be clearly superior to the original technique.
this heuristic can be applied. Suppose that the current map-
ping is f (1) = 4, f (2) = 5, and f (3) is under consideration. Apart from that, the atom reordering heuristic is also ap-
The parent atoms are indicated by arrows pointing from an
atom to its parent atom. As f (parent(3)) = 4, which has
two neighbors that are not mapped yet, namely 2 and 6, the
only possible assignments are f (3) = 2 and f (3) = 6.
5.2 Atom order in the query molecule
H. Shang et. al. introduced a new method in [4] to obtain
better performance of search algorithm. Their idea is to
rearrange the atoms of the query molecule according to the
m a t c o s -1 3 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science 7
Koper, Slovenia, 10-11 October