Page 51 - Fister jr., Iztok, and Andrej Brodnik (eds.). StuCoSReC. Proceedings of the 2017 4th Student Computer Science Research Conference. Koper: University of Primorska Press, 2017
P. 51
ξ1, ξ2, . . . , ξσ+1 be random variables such that ξi repre- Subset existence Superset existence
sents the number of nodes in a multiset-trie on i-th level.
350 |M| = 10000 350 |M| = 10000
For every node j on i-th level we assign a random variable |M| = 20000 |M| = 20000
ξij to be the number of its children, such that j ∈ [1, ξi]. |M| = 40000 |M| = 40000
Then ∀ i ∈ [1, σ] the following holds: 300 |M| = 80000 300 |M| = 80000
|M| = 160000
|M| = 160000
250 |M| = 320000 250 |M| = 320000

Visited nodes200 200
Visited nodes
150 150

ξi 100 100

ξi+1 = ξij . (1) 50 50

j=1 0 5 10 15 20 25 0 5 10 15 20 25
Length of input Length of input

The random variable ξij is modeled by a zero-truncated bi- Figure 2: Experiment 1, submsetExistence and su-
nomial distribution on parameters n and pi+1, where pi = permsetExistence functions.
1 − (1 − p)nσ+1−i . The distribution is zero-truncated, be-
cause any internal node has at least 1 descendant if M is 2.4 Experiments

non empty, and is binomial, since our model assumes a ran- The implementation of multiset-trie is done in the C++ pro-
gramming language and uses only the standard library of
dom input. C++14 version of the standard.

Applying the tools from Galton-Watson process [3] and us-
ing the fact that ξ1 = 1, we calculate the expected number
of nodes in a multiset-trie given parameters n, σ and p

σ+1 1 − (1 − p)nσ+1−i In our experiments we will test the following functions: subm-
1 − (1 − p)nσ setExistence, supermsetExistence, getAllSubmsets
E(|M|) = ni−1 . (2) and getAllSupermsets. Performance of the functions will
be measured by the number of visited nodes in multiset-trie
i=1 by the particular function. In particular the performance is
inversely proportional to the number of visited nodes. The
With the expected number of nodes in a multiset-trie M selected parameters of the data structure that will be varied
obtained from (2), we can now generalize the result for a in experiments are as follows:
subtree in M parametrized by an input multiset m. The
subtrees that we are interested in are the ones that contain σ – the cardinality of the alphabet Σ;
all the submultisets or all the supermultisets of m. In order
to calculate the expected cardinality of such subtrees we n – the maximal degree of a node;
need the following definition.

Definition 1. Let m = {1k1 , 2k2 , . . . , σkσ }, where eke is φ – mapping φ : Σ → {1, 2, . . . , σ};
an element e with multiplicity ke. Let M1, M2 be the subsets
of the set M, such that M1 = {x ∈ M : x ⊆ m} and M2 = d – density of a multiset-trie, where d is the function
{x ∈ M : x ⊇ m}. Define αi and βi as follows
d(|M |) = |M | .
|N |

αi = 1, i = 0 The artificially generated multisets are constructed with re-
spect to parameters Σ, σ and n. The collection M is sampled
i (kj + 1), 1≤i≤σ from N with uniform distribution which simulates a random
j=1 user input.

and

βi = 1, i = 0
. Experiments 1 and 2. The Experiments 1 and 2 demon-
ji=1(n − kj − 1), 1≤i≤σ
strate the performance of multiset-trie depending on the
The expected cardinality of a multiset-trie subtree MM1 density. In both experiments the data is artificially gener-
that contains all the multisets from the set M1 is equal to ated at random, meaning that φ has no influence on results,
and therefore is set to an identity mapping.
σ+1 1 − (1 − p)αi−1
1 − (1 − p)ασ In the first experiment multiset-trie is used for storing and
E(|MM1 |) = αi−1 . (3) retrieving sets and the parameters n and σ are set to 2 and
25 respectively. The parameter |M | varies from 10000 sets
i=1 up to 320000 sets.

The expected cardinality of a multiset-trie subtree MM2 In the second experiment the multisets are used and the
that contains all the multisets from the set M2 is equal to parameters n and σ are set to 6 and 25 respectively. The
parameter |M | varies from 40000 multisets up to 640000 sets.
σ+1 1 − (1 − p)βi−1
1 − (1 − p)βσ The performance of the ”existence” functions is different in
E(|MM2 |) = βi−1 . (4) Experiments 1 and 2. With increase of density it increases
in the first experiment (see figure 2) and decreases in the
i=1 second (see figure 3). It happens, because in the first case
the multiset-trie is dense enough, so the increase of the den-
The worst case time complexity of the multiset-trie opera- sity increases the probability of finding submultiset or su-
tions is presented in the table below. permultiset in multiset-trie, which lowers the number of vis-
ited nodes. However, in the second case the multiset-trie is
Function Worst case complexity
insert, search, delete
getAllSubmsets O(σ)
getAllSupermsets
submsetExistence O(|MM1 |)
supermsetExistence O(|MM2 |)
O(|MM1 | − |M1|)
O(|MM2 | − |M2|)

StuCoSReC Proceedings of the 2017 4th Student Computer Science Research Conference 51
Ljubljana, Slovenia, 11 October
   46   47   48   49   50   51   52   53   54   55   56