Page 52 - Fister jr., Iztok, and Andrej Brodnik (eds.). StuCoSReC. Proceedings of the 2016 3rd Student Computer Science Research Conference. Koper: University of Primorska Press, 2016
P. 52
e. There are networks that are built of simple graphs At the beginning the vertices become active independently
which contain directed edges with values. These networks of each other with their assigned probabilities. Thereafter,
are used in infection models, which have several types. All the infection process of the IC model applies with a ran-
of them have similar input structure which is built by a domly selected active set. In this process, each vertex gets a
finite iterative process. During this process there are vertex value by function w∗ : V → [0, 1] on the vertices, which are
sets that contain three type of vertices: infected, infectious, the a posteriori influence. The influence measure σ(·) which
not infected yet (susceptible). We can consider two types of is the sum of output infection value can be given:
process: progressive and non-progressive. The base of the
progressive infection models is that their vertices cannot be σ(w) = w∗(v)
recovered if they had been infected. In turn, during the
non-progressive process there is recovery. The models of v∈V
paper [6] use progressive process. The infection models can
be distinguished by the type of edge-values. The models can There are several simulation that calculating this measure.
be stochastic infection models if edge-values are probabilities We consider the Complete Simulation (CS) according to pa-
(fractions between 0 and 1). These probabilities give the so- per [2] and we prove that theory of Kempe et al. is right
called ‘effect of network’. After building a model we get for the examined GIC. A required number of simulations
an infected vertex set A and the value of infection as the can be defined to give certain measure of approximation ( )
expected value of infected vertices is σ(A). to σ(A) with a given probability (δ). The CS is a method
that gets a network G and a sample size k. The result of
2.1 Independent Cascade Model algorithm is the relative frequency of infection. Since this
method is based on frequency counting, its precision and
In this section we show a specialized infection model, the In- time complexity depends greatly on the size k.
dependent Cascade Model (IC). It is introduced in paper [4].
Starting with an initially active set of vertices A0 ⊂ V (G). Proposition 1. If the GIC starting with A is simulated
Let Ai ⊂ V (G) denote the set of vertices newly activated independently at least
in iteration i. In iteration i + 1, every vertex u ∈ Ai
has one chance to activate each of its inactive neighbors ln(2/δ)/(2wa2v 2)
v ∈ V \ ∪0≤j≤iSj according to wu,v. These are usually prob-
ability values. If the attempt is successful, then v becomes times, then the average number of activated vertices over
active in iteration i + 1. In this case the vertices are infected these simulation is a (1± )-approximation to σ(A), with
independently. The vertices can infect a susceptible one at probability at least (1 − δ), if δ, ∈ (0, 1), and wav is the
once in the current iteration. [6] average of the input weights.
The open question of the field is to compute the expected We remark that the Proposition 1. is not only true for GIC,
number of infected vertex is #P-complete. There are rele- but for the infection process in general.
vant researches for this question in papers [3] and [7]. Kempe
and his colleagues gave approximate values with arbitrary 2.3 Infection Heuristics
accuracy for the number of infected elements with simula-
tion for this problem. Unfortunately, this simulation is very Although after the model generalization, Bo´ta et al. devel-
costly. They defined the influence-maximization problem for oped simulations and heuristics which efficiency and accu-
this model as well. They looked for the vertex set for which racy were not acceptable. As a consequence, other heuris-
the expected value of the infected vertices is the highest. tics are worth to research. We looked for a unique heuristic
The optimal solution for influence maximization can be effi- which can be accelerate existing methods. In our study we
ciently approximated within a factor of (1 − 1/e − ), where used the analytic formula by Srivastrava et al. [9], which is
e is the base of the natural algorithm and is any positive usually applied in binary infection models. This formula has
real number. This is a performance guarantee slightly better been used for IC model before. We used it at first for GIC.
than 63%. They proved that applying a greedy algorithm
gives the guaranteed precision that results in good quality Let Au,t denote a probability that vertex u is infected at time
solutions in practice.
t. Furthermore let Bu,t denote another probability that ver-
2.2 Generalized Independent Cascade Model
tex u is infected until time t. According to above Au,0 =
Bo´ta et al. extended the IC and introduced the Generalized
(Independent) Cascade Model (GIC)1 [2]. The initial state w(u), so at time 0 the apriori infection will be given. In ad-
of this model does not consist of infected and uninfected ver-
tices, but introduces an a priori probability (input infection dition to Bu,t = t Au,j , therefore Au,t = Bu,t − Bu,t−1 .
weight w) of infection for each vertex. At the end of the in- j=0
fection process, each vertex receives an a posterior infection
value (output infection weight (w∗)). Let Ni(u) denote the endpoints set of incoming edges of
vertex u, then we can calculate Bu,t simply:
Formally, there is a given function wv : V → [0, 1] on the ver-
tices, which is the a priori measure of elements of network G. Bu,t := 1 − (1 − Bu,t) · (1 − p(v, u) · Av,t−1)
1Bo´ta et al. used the Generalized Cascade technical term, v∈Ni (u)
but this name is already reserved by Kempe et al. [6]
We can approximate the a posteriori infection with Bu,t be-
cause the infection process is progressive. The accuracy can
be enhanced by increasing t. The basic idea of this heuristic
is to consider independently the effect of edge probabilities.
StuCoSReC Proceedings of the 2016 3rd Student Computer Science Research Conference 52
Ljubljana, Slovenia, 12 October
which contain directed edges with values. These networks of each other with their assigned probabilities. Thereafter,
are used in infection models, which have several types. All the infection process of the IC model applies with a ran-
of them have similar input structure which is built by a domly selected active set. In this process, each vertex gets a
finite iterative process. During this process there are vertex value by function w∗ : V → [0, 1] on the vertices, which are
sets that contain three type of vertices: infected, infectious, the a posteriori influence. The influence measure σ(·) which
not infected yet (susceptible). We can consider two types of is the sum of output infection value can be given:
process: progressive and non-progressive. The base of the
progressive infection models is that their vertices cannot be σ(w) = w∗(v)
recovered if they had been infected. In turn, during the
non-progressive process there is recovery. The models of v∈V
paper [6] use progressive process. The infection models can
be distinguished by the type of edge-values. The models can There are several simulation that calculating this measure.
be stochastic infection models if edge-values are probabilities We consider the Complete Simulation (CS) according to pa-
(fractions between 0 and 1). These probabilities give the so- per [2] and we prove that theory of Kempe et al. is right
called ‘effect of network’. After building a model we get for the examined GIC. A required number of simulations
an infected vertex set A and the value of infection as the can be defined to give certain measure of approximation ( )
expected value of infected vertices is σ(A). to σ(A) with a given probability (δ). The CS is a method
that gets a network G and a sample size k. The result of
2.1 Independent Cascade Model algorithm is the relative frequency of infection. Since this
method is based on frequency counting, its precision and
In this section we show a specialized infection model, the In- time complexity depends greatly on the size k.
dependent Cascade Model (IC). It is introduced in paper [4].
Starting with an initially active set of vertices A0 ⊂ V (G). Proposition 1. If the GIC starting with A is simulated
Let Ai ⊂ V (G) denote the set of vertices newly activated independently at least
in iteration i. In iteration i + 1, every vertex u ∈ Ai
has one chance to activate each of its inactive neighbors ln(2/δ)/(2wa2v 2)
v ∈ V \ ∪0≤j≤iSj according to wu,v. These are usually prob-
ability values. If the attempt is successful, then v becomes times, then the average number of activated vertices over
active in iteration i + 1. In this case the vertices are infected these simulation is a (1± )-approximation to σ(A), with
independently. The vertices can infect a susceptible one at probability at least (1 − δ), if δ, ∈ (0, 1), and wav is the
once in the current iteration. [6] average of the input weights.
The open question of the field is to compute the expected We remark that the Proposition 1. is not only true for GIC,
number of infected vertex is #P-complete. There are rele- but for the infection process in general.
vant researches for this question in papers [3] and [7]. Kempe
and his colleagues gave approximate values with arbitrary 2.3 Infection Heuristics
accuracy for the number of infected elements with simula-
tion for this problem. Unfortunately, this simulation is very Although after the model generalization, Bo´ta et al. devel-
costly. They defined the influence-maximization problem for oped simulations and heuristics which efficiency and accu-
this model as well. They looked for the vertex set for which racy were not acceptable. As a consequence, other heuris-
the expected value of the infected vertices is the highest. tics are worth to research. We looked for a unique heuristic
The optimal solution for influence maximization can be effi- which can be accelerate existing methods. In our study we
ciently approximated within a factor of (1 − 1/e − ), where used the analytic formula by Srivastrava et al. [9], which is
e is the base of the natural algorithm and is any positive usually applied in binary infection models. This formula has
real number. This is a performance guarantee slightly better been used for IC model before. We used it at first for GIC.
than 63%. They proved that applying a greedy algorithm
gives the guaranteed precision that results in good quality Let Au,t denote a probability that vertex u is infected at time
solutions in practice.
t. Furthermore let Bu,t denote another probability that ver-
2.2 Generalized Independent Cascade Model
tex u is infected until time t. According to above Au,0 =
Bo´ta et al. extended the IC and introduced the Generalized
(Independent) Cascade Model (GIC)1 [2]. The initial state w(u), so at time 0 the apriori infection will be given. In ad-
of this model does not consist of infected and uninfected ver-
tices, but introduces an a priori probability (input infection dition to Bu,t = t Au,j , therefore Au,t = Bu,t − Bu,t−1 .
weight w) of infection for each vertex. At the end of the in- j=0
fection process, each vertex receives an a posterior infection
value (output infection weight (w∗)). Let Ni(u) denote the endpoints set of incoming edges of
vertex u, then we can calculate Bu,t simply:
Formally, there is a given function wv : V → [0, 1] on the ver-
tices, which is the a priori measure of elements of network G. Bu,t := 1 − (1 − Bu,t) · (1 − p(v, u) · Av,t−1)
1Bo´ta et al. used the Generalized Cascade technical term, v∈Ni (u)
but this name is already reserved by Kempe et al. [6]
We can approximate the a posteriori infection with Bu,t be-
cause the infection process is progressive. The accuracy can
be enhanced by increasing t. The basic idea of this heuristic
is to consider independently the effect of edge probabilities.
StuCoSReC Proceedings of the 2016 3rd Student Computer Science Research Conference 52
Ljubljana, Slovenia, 12 October