Page 50 - Fister jr., Iztok, and Andrej Brodnik (eds.). StuCoSReC. Proceedings of the 2017 4th Student Computer Science Research Conference. Koper: University of Primorska Press, 2017
P. 50
pectively. Let a node u be a parent node of a node v. prefixes. When search is completed, the function removes
Suppose that a child node v is not Null and has a label cj. the branch preserving common prefixes, and returns true.
Then the path u → v represents a symbol φ−1(i) ∈ Σ with
multiplicity j. Such a transition u → v is called a path of Sub- and super-multiset existence. The submsetExis-
length 1 and is allowed if and only if a node v is not Null
and u is a parent node of a node v. tence(M, m) function checks if there exists a multiset x in
M, that satisfies the condition x ⊆ m. The function starts
We define a complete path to be the path of length σ in a with searching for an exact match x = m in M, since m ⊆ m
multiset-trie with the end points at root node (the 1st level) by definition of submultiset inclusion. If an exact match is
and LL. Thus, a multiset m is inserted into a multiset-trie not found in M, the function uses multiset-trie to find the
if and only if there exists a complete path in a multiset- closest (the largest) submultiset of m in M by decreasing
trie that corresponds to m. Note that every complete path multiplicity of elements in m. At every level the function
in a multiset-trie is unique. Therefore, the multisets that tries to proceed with the largest possible multiplicity of an
share a common prefix in a multiset-trie can have a common element that is provided by m. However, when the function
path of length at most σ − 1. Thus, any multiset m that is reaches some level where it meets a Null node and can not
composed of symbols from Σ with maximum multiplicity not go further using path provided by m, it decreases the mul-
greater than n − 1 can be represented by a complete path in tiplicity of an element that corresponds to a current level.
a multiset-trie. Thus, the function can decrease multiplicity of an element
or eventually skip it in order to find the closest x ⊆ m.
Let us have an example of a multiset-trie data structure,
where σ = 2, Σ = I = {1, 2}, n = 3 and the mapping φ is The function supermsetExistence(M, m) checks if there
an identity mapping. The figure 1 presents an example of exists supermultiset x of a given multiset m in M. By anal-
the multiset-trie, where Null children are omitted. Let a pair ogy to the function submsetExistence, the function su-
permsetExistence starts by searching for an exact match
x = m in M. If an exact match is not found in M, the func-
tion searches for the closest (the smallest) supermultiset x of
m in M, in this case, by increasing multiplicity of elements
in m.
Figure 1: Example of multiset-trie structure. Get all sub- and super-multisets. The algorithms for func-
(Li, cj) represents a node with label cj at a level Li. The pair tions getAllSubmsets and getAllSupermsets are based
(L1, cj) is equivalent to (L1, root), for every j. According to entirely on algorithms for submsetExistence and supermse-
the figure 1 we can extract inserted multisets as follows: tExistence functions that do not terminate on the first
(L1, root) → (L2, c0) → (LL, c0) ≡ {10, 20} = ∅ existing sub/supermultiset, but store the results and con-
(L1, root) → (L2, c0) → (LL, c2) ≡ {10, 22} = {2, 2} tinue procedure until all existing sub/supermultisets in M
(L1, root) → (L2, c1) → (LL, c2) ≡ {11, 22} = {1, 2, 2} are found and stored.
(L1, root) → (L2, c2) → (LL, c2) ≡ {12, 22} = {1, 1, 2, 2}
where ek represents an element e with multiplicity k. 2.3 Analysis of the multiset-trie
2.2 Operations By the design of the multiset-trie, it is easy to see that the
functions search, delete and the procedure insert have
Basic tree operations. The procedure insert(M, m) in- complexity of O(σ). Because σ is defined when the structure
is initialized and does not depend on the user input after-
serts a new instance m of type Multiset into multiset-trie M. wards, the asymptotic complexity of the functions search,
If the complete path already exists, then procedure leaves delete and the procedure insert is O(1). Nonetheless, in
the structure unchanged. Otherwise it extends partially ex- the general case the complexity is O(σ).
isting or creates a new complete path. The procedure does
not return any result. In what follows, we focus on analysis of the more involved
functions which correspond to a multiset-containment queries:
The function search(M, m) checks if the complete path submsetExistence, supermsetExistence, getAllSubm-
corresponding to a given multiset m exists in the structure sets and getAllSupermsets.
M. The function returns true if the multiset m exists in M,
and returns false otherwise. Mathematical model. We start with the basics of our math-
The function delete(M, m) searches for the complete path ematical model. Let Σ be an alphabet of cardinality σ, such
that corresponds to m in order to remove it. If the path can that Σ = {1, 2, . . . , σ}. Define N to be the power multiset
not be found, the function immediately returns false. During of Σ with respect to n, where n is the maximal degree of
search, the function collects additional data about common a node in multiset-trie. Thus, the number of multisets in a
complete multiset-trie is |N | = nσ. Let M be a collection of
multisets inserted into multiset-trie M. Our model assumes
that all the inserted multisets from M are chosen with the
same probability p from N, for some p ∈ (0, 1).
StuCoSReC Proceedings of the 2017 4th Student Computer Science Research Conference 50
Ljubljana, Slovenia, 11 October
Suppose that a child node v is not Null and has a label cj. the branch preserving common prefixes, and returns true.
Then the path u → v represents a symbol φ−1(i) ∈ Σ with
multiplicity j. Such a transition u → v is called a path of Sub- and super-multiset existence. The submsetExis-
length 1 and is allowed if and only if a node v is not Null
and u is a parent node of a node v. tence(M, m) function checks if there exists a multiset x in
M, that satisfies the condition x ⊆ m. The function starts
We define a complete path to be the path of length σ in a with searching for an exact match x = m in M, since m ⊆ m
multiset-trie with the end points at root node (the 1st level) by definition of submultiset inclusion. If an exact match is
and LL. Thus, a multiset m is inserted into a multiset-trie not found in M, the function uses multiset-trie to find the
if and only if there exists a complete path in a multiset- closest (the largest) submultiset of m in M by decreasing
trie that corresponds to m. Note that every complete path multiplicity of elements in m. At every level the function
in a multiset-trie is unique. Therefore, the multisets that tries to proceed with the largest possible multiplicity of an
share a common prefix in a multiset-trie can have a common element that is provided by m. However, when the function
path of length at most σ − 1. Thus, any multiset m that is reaches some level where it meets a Null node and can not
composed of symbols from Σ with maximum multiplicity not go further using path provided by m, it decreases the mul-
greater than n − 1 can be represented by a complete path in tiplicity of an element that corresponds to a current level.
a multiset-trie. Thus, the function can decrease multiplicity of an element
or eventually skip it in order to find the closest x ⊆ m.
Let us have an example of a multiset-trie data structure,
where σ = 2, Σ = I = {1, 2}, n = 3 and the mapping φ is The function supermsetExistence(M, m) checks if there
an identity mapping. The figure 1 presents an example of exists supermultiset x of a given multiset m in M. By anal-
the multiset-trie, where Null children are omitted. Let a pair ogy to the function submsetExistence, the function su-
permsetExistence starts by searching for an exact match
x = m in M. If an exact match is not found in M, the func-
tion searches for the closest (the smallest) supermultiset x of
m in M, in this case, by increasing multiplicity of elements
in m.
Figure 1: Example of multiset-trie structure. Get all sub- and super-multisets. The algorithms for func-
(Li, cj) represents a node with label cj at a level Li. The pair tions getAllSubmsets and getAllSupermsets are based
(L1, cj) is equivalent to (L1, root), for every j. According to entirely on algorithms for submsetExistence and supermse-
the figure 1 we can extract inserted multisets as follows: tExistence functions that do not terminate on the first
(L1, root) → (L2, c0) → (LL, c0) ≡ {10, 20} = ∅ existing sub/supermultiset, but store the results and con-
(L1, root) → (L2, c0) → (LL, c2) ≡ {10, 22} = {2, 2} tinue procedure until all existing sub/supermultisets in M
(L1, root) → (L2, c1) → (LL, c2) ≡ {11, 22} = {1, 2, 2} are found and stored.
(L1, root) → (L2, c2) → (LL, c2) ≡ {12, 22} = {1, 1, 2, 2}
where ek represents an element e with multiplicity k. 2.3 Analysis of the multiset-trie
2.2 Operations By the design of the multiset-trie, it is easy to see that the
functions search, delete and the procedure insert have
Basic tree operations. The procedure insert(M, m) in- complexity of O(σ). Because σ is defined when the structure
is initialized and does not depend on the user input after-
serts a new instance m of type Multiset into multiset-trie M. wards, the asymptotic complexity of the functions search,
If the complete path already exists, then procedure leaves delete and the procedure insert is O(1). Nonetheless, in
the structure unchanged. Otherwise it extends partially ex- the general case the complexity is O(σ).
isting or creates a new complete path. The procedure does
not return any result. In what follows, we focus on analysis of the more involved
functions which correspond to a multiset-containment queries:
The function search(M, m) checks if the complete path submsetExistence, supermsetExistence, getAllSubm-
corresponding to a given multiset m exists in the structure sets and getAllSupermsets.
M. The function returns true if the multiset m exists in M,
and returns false otherwise. Mathematical model. We start with the basics of our math-
The function delete(M, m) searches for the complete path ematical model. Let Σ be an alphabet of cardinality σ, such
that corresponds to m in order to remove it. If the path can that Σ = {1, 2, . . . , σ}. Define N to be the power multiset
not be found, the function immediately returns false. During of Σ with respect to n, where n is the maximal degree of
search, the function collects additional data about common a node in multiset-trie. Thus, the number of multisets in a
complete multiset-trie is |N | = nσ. Let M be a collection of
multisets inserted into multiset-trie M. Our model assumes
that all the inserted multisets from M are chosen with the
same probability p from N, for some p ∈ (0, 1).
StuCoSReC Proceedings of the 2017 4th Student Computer Science Research Conference 50
Ljubljana, Slovenia, 11 October