Page 195 - Weiss, Jernej, ur./ed. 2025. Glasbena interpretacija: med umetniškim in znanstvenim┊Music Interpretation: Between the Artistic and the Scientific. Koper/Ljubljana: Založba Univerze na Primorskem in Festival Ljubljana. Studia musicologica Labacensia, 8
exploring musicological discourses ...
            Table 1: The dataset in numbers.

                          Category                           Count
                           Tokens                           8,547,948
                           Words                            6,432,984
                          Sentences                          278,769

                 As Table 1 shows, the entire dataset comprises roughly 6.4 million
            words and roughly 8.5 million tokens. Tokens include words, punctuation,
            symbols, etc. The following table shows the breakdown of words and
            tokens within the dataset by journal.

            Table 2: Distribution of texts per journal by words, tokens and percentage.

                   Journal         Tokens            Words          Share (%)
                   GPZ             203,834          ~153,522           2.3
                   DMD             2,374,001       ~1,788,041          27.1
                    ML             1,337,088       ~1,007,063          15.3
                    MZ             4,836,859       ~3,643,007          55.3
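
            The percentage column of Table 2 can be checked with a few lines of
            arithmetic. The following is a minimal sketch, not the procedure used
            for the study: it assumes that each share is the journal's token count
            divided by the sum of the four token counts listed in Table 2 (the text
            does not state which total served as the denominator). The names
            TOKENS and token_shares are illustrative only.

```python
# Token counts per journal, copied from Table 2.
TOKENS = {
    "GPZ": 203_834,
    "DMD": 2_374_001,
    "ML": 1_337_088,
    "MZ": 4_836_859,
}


def token_shares(tokens):
    """Return each journal's share of all tokens in percent, rounded to one decimal.

    Assumption: the denominator is the sum of the per-journal counts passed in.
    """
    total = sum(tokens.values())
    return {name: round(100 * count / total, 1) for name, count in tokens.items()}


shares = token_shares(TOKENS)

# Print the journals from the largest to the smallest share.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {share} %")
```

            Under this assumption the computed shares agree with the values
            printed in the last column of Table 2.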

                 As Table 2 demonstrates, by far the largest subset of the dataset comes
            from Muzikološki zbornik,³⁰ followed by De musica disserenda,³¹ Studia
            musicologica Labacensia³² and Glasbenopedagoški zbornik,³³ the last of
            which accounts for only a minute share. The distribution between the
            journals is important, as the present breakdown is significantly skewed
            towards Muzikološki zbornik. This matters for understanding and
            interpreting the data, because this journal, its layout, and its topics
            are more heavily represented than, say, those of Glasbenopedagoški
            zbornik. Nevertheless, in such cases we must keep the data as diverse as
            possible, even if the representation is unbalanced. The article counts
            and the range of volumes included in the dataset are given in Table 3.

            Table 3: Distribution of files and timespans included in the dataset.

             Journal             ML           GPZ          MZ           DMD
             No. of articles     7            25           42           255
             Volumes included    2017–2024    2010–2023    2000–2023    2005–2023
            30   Hereinafter MZ.
            31   Hereinafter DMD.
            32   Hereinafter ML.
            33   Hereinafter GPZ.

