Page 195 - Weiss, Jernej, ur./ed. 2025. Glasbena interpretacija: med umetniškim in znanstvenim┊Music Interpretation: Between the Artistic and the Scientific. Koper/Ljubljana: Založba Univerze na Primorskem in Festival Ljubljana. Studia musicologica Labacensia, 8
exploring musicological discourses ...
Table 1: The dataset in numbers.
Category Count
Tokens 8,547,948
Words 6,432,984
Sentences 278,769
As Table 1 shows, the entire dataset comprises roughly 6.4 million words and roughly 8.5 million tokens. Tokens include words, punctuation, symbols, etc. Table 2 breaks the words and tokens in the dataset down by individual journal.
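The difference between tokens and words can be illustrated with a minimal Python sketch. The tokenizer used for the dataset is not named here, so the regular expression below is a hypothetical stand-in: it assumes that a token is either a run of word characters or a single punctuation mark or symbol, and that a word is any token containing a letter or digit.

```python
import re

# Hypothetical tokenization rule (an assumption, not the dataset's actual tokenizer):
# a token is a run of word characters or a single non-space, non-word symbol.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def count_tokens_and_words(text):
    """Return (number of tokens, number of words) for a piece of text."""
    tokens = TOKEN_RE.findall(text)
    # Count as words only those tokens that contain a letter or digit.
    words = [t for t in tokens if any(ch.isalnum() for ch in t)]
    return len(tokens), len(words)


sample = "Tokens include words, punctuation, symbols, etc."
n_tokens, n_words = count_tokens_and_words(sample)
print(n_tokens, n_words)
```

Under this simplified rule every comma and full stop adds to the token count but not to the word count, which is why a corpus always has at least as many tokens as words.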
Table 2: Distribution of texts per journal by words, tokens and percentage.
Journal   Tokens      Words        %
GPZ       203,834     ~153,522     2.3
DMD       2,374,001   ~1,788,041   27.1
ML        1,337,088   ~1,007,063   15.3
MZ        4,836,859   ~3,643,007   55.3
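The percentage column of Table 2 can be checked with a short Python sketch. The token counts are copied from Table 2; the function name `shares` and the choice of denominator are assumptions made for illustration. The published percentages are reproduced when each count is divided by the sum of the four per-journal token counts.

```python
# Token counts per journal, copied from Table 2.
tokens = {
    "GPZ": 203_834,
    "DMD": 2_374_001,
    "ML": 1_337_088,
    "MZ": 4_836_859,
}


def shares(counts):
    """Return each entry's share of the summed counts, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


token_shares = shares(tokens)

# Print the journals from the largest to the smallest share.
for name, pct in sorted(token_shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

Sorting by share makes the skew towards MZ visible at a glance.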
As Table 2 demonstrates, by far the largest subset of the dataset comes from Muzikološki zbornik,30 followed by De musica disserenda,31 Studia musicologica Labacensia32 and Glasbenopedagoški zbornik,33 the last of which accounts for only a minute share. The distribution between the journals is of course very important, as the present breakdown shows a significant skew towards Muzikološki zbornik. This matters for understanding and interpreting the data, because this journal, its layout, and its topics are more heavily represented than, say, those of Glasbenopedagoški zbornik. However, in such cases, we must keep the data as diverse as possible, even if the representation is unbalanced. The article counts and the distribution of volumes included in the dataset are given in Table 3.
Table 3: Distribution of files and timespans included in the dataset.
Journal            ML          GPZ         MZ          DMD
No. of articles    7           25          42          255
Volumes included   2017–2024   2010–2023   2000–2023   2005–2023
30 Hereinafter MZ.
31 Hereinafter DMD.
32 Hereinafter ML.
33 Hereinafter GPZ.