Page 76 - Fister jr., Iztok, Andrej Brodnik, Matjaž Krnc and Iztok Fister (eds.). StuCoSReC. Proceedings of the 2019 6th Student Computer Science Research Conference. Koper: University of Primorska Press, 2019
RELATED WORKS

Using clinically meaningful features, good-quality ECG measurements can be classified accurately by simply applying traditional machine learning techniques (e.g. logistic regression, random trees, support vector machines). On the other hand, real-life samples often contain so much noise and variance that they can mislead handcrafted rules, and yet state-of-the-art approaches still rely heavily on feature engineering for AF detection. Indeed, three of the four winners of the CinC Challenge 2017 combined only medically relevant feature extractors and did not incorporate any neural network-based features [7, 8, 9].
Only one of the four winning approaches fused expert features with descriptors extracted by a neural network. Hong et al. [10] proposed an algorithm using 64 features learned by deep neural architectures: a time-invariant hierarchical feature extractor network with 4 residual blocks [11] combined with a bidirectional Long Short-Term Memory network (LSTM [12]), which yields a 32-dimensional continuous descriptor, and a unidirectional LSTM, trained separately on centerwave input, which extracts 32 time-related features. Although the final classifier operated on a feature space of more than 600 dimensions, when the features were ranked by importance, 17 of the top 20 were deep-learned features and only the remaining 3 were clinically relevant or external statistical features.
At the same time, many other participants of the Challenge also used neural networks [13, 14, 15, 16] as feature detectors in addition to their traditional feature extractors. One of them was Andreotti et al. [17], who compared their feature-based classifiers to residual neural networks. They concluded that their neural networks outperformed their feature-based classifiers, demonstrating the strength of a purely neural network-based approach. Parvaneh et al. [18] improved a dense convolutional network with a signal quality index and by transforming the signal to the frequency domain. Their approach is similar to ours in that they applied a neural network to extract frequency-domain features. Xiong et al. [19] successfully tried multiple methods that we also utilize, including skip connections and a neural network trained on the spectrogram.
Method            Normal  AF    Other  Noise  Avg.
Teijeiro et al.   0.90    0.85  0.74   0.56   0.83
Datta et al.      0.92    0.82  0.75   0.52   0.83
Zabihi et al.     0.91    0.84  0.73   0.50   0.83
Hong et al.       0.91    0.81  0.75   0.57   0.83
ours              0.88    0.80  0.69   0.64   0.79

Table 1: F1 scores on the hidden test set of the CinC Challenge 2017. The winning algorithms (first 4 rows) excel at different classes, since they use different pools of features. Note that, to reduce prediction uncertainty, many participants submitted ensembles, which improve overall accuracy but do not reveal the true generalization capability of the underlying algorithm.
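The Avg. column can be checked against the per-class columns. The sketch below assumes the official Challenge scoring, in which the overall score is the mean of the Normal, AF and Other F1 scores and Noise is left out. The helper names are illustrative only. The three-class mean agrees with the reported Avg. to within 0.01 (the small gaps presumably come from rounding the per-class scores to two decimals). Averaging all four classes instead gives values around 0.75, which do not match the table.

```python
# Per-class F1 scores from Table 1: (Normal, AF, Other, Noise, reported Avg.).
TABLE_1 = {
    "Teijeiro et al.": (0.90, 0.85, 0.74, 0.56, 0.83),
    "Datta et al.": (0.92, 0.82, 0.75, 0.52, 0.83),
    "Zabihi et al.": (0.91, 0.84, 0.73, 0.50, 0.83),
    "Hong et al.": (0.91, 0.81, 0.75, 0.57, 0.83),
    "ours": (0.88, 0.80, 0.69, 0.64, 0.79),
}


def f1(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from confusion counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def challenge_score(normal: float, af: float, other: float) -> float:
    """Assumed Challenge score: mean F1 of Normal, AF and Other; Noise is excluded."""
    return (normal + af + other) / 3


def four_class_mean(normal: float, af: float, other: float, noise: float) -> float:
    """Plain mean over all four classes, for comparison only."""
    return (normal + af + other + noise) / 4


if __name__ == "__main__":
    for name, (n, a, o, z, reported) in TABLE_1.items():
        print(f"{name:16s} 3-class={challenge_score(n, a, o):.3f} "
              f"4-class={four_class_mean(n, a, o, z):.3f} reported={reported:.2f}")
```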
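3. METHODS

3.1 Extending Dataset with Alternative Annotation

We compete with human performance; however, we do not know much about it. Currently, there is no better reference than human annotation (preferably by experts). Atrial fibrillation is a human category: there is no mathematically exact definition for it, so we need humans to define its characteristics. Unfortunately, these definitions are vague and fuzzy from an algorithmic point of view, because there is always an inherent ambiguity when we try to approximate human definitions with mathematical models. To get around this problem, we created a dataset in which every sample was annotated by multiple (in our case, two) experts, which also allows us to measure the variation between the annotations.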
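The text does not say which statistic is used to quantify the variation between annotators. As one common choice (an assumption, not necessarily the authors' method), Cohen's kappa compares the observed agreement of two annotators with the agreement expected by chance. A second hypothetical helper, split_by_agreement, separates samples on which every annotation agrees from those on which at least one annotator differs.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of samples on which the two annotators chose the same class.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if both labelled independently at their own class rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same class for every sample
    return (observed - expected) / (1.0 - expected)


def split_by_agreement(*annotations: list[str]) -> tuple[list[int], list[int]]:
    """Hypothetical helper: indices of unanimous ('obvious') vs disputed ('ambiguous') samples."""
    obvious, ambiguous = [], []
    for i, labels in enumerate(zip(*annotations)):
        (obvious if len(set(labels)) == 1 else ambiguous).append(i)
    return obvious, ambiguous
```

For example, the Challenge labels and the two experts' labels can be passed together to split_by_agreement, while cohens_kappa compares any two of them at a time.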
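We asked two doctors to help us: Dr Sz. Herczeg, a PhD student in the field of cardiac arrhythmia (Expert-1), and Dr I. Osztheimer, a consultant cardiologist (Expert-2), both working at the Heart and Vascular Center of Semmelweis University in Budapest. Our goal was to examine the differences between the decisions of the Challenge experts, our doctors, and a model trained on this dataset. In this way, we aimed to approximate the accuracy of human performance. We then wanted to explore which features are the most important for our model. Finally, we made efforts to highlight these important features to help human specialists.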
To support this task, we developed a website that displays the recordings and provides a graphical user interface for annotating the currently displayed recording. By having our doctors use this website, we obtained an alternative annotation that helped us validate the dataset, i.e. identify which cases are obvious and which samples are too ambiguous for a clear diagnosis. The website picks recordings at random, sampling uniformly across the four classes.
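The class-uniform picking rule can be sketched in a few lines: choose one of the four classes with equal probability, then choose a recording within that class. The function name pick_recording and the records_by_class mapping are hypothetical, because the paper does not describe the website's actual code.

```python
import random


def pick_recording(records_by_class, rng=None):
    """Choose a class uniformly at random, then a recording uniformly within that class.

    records_by_class: hypothetical mapping such as {"Normal": [...], "AF": [...], ...}
    from class label to a list of recording IDs. Returns (label, recording_id).
    """
    if rng is None:
        rng = random.Random()
    classes = [c for c, recs in records_by_class.items() if recs]  # skip empty classes
    if not classes:
        raise ValueError("no recordings available")
    label = rng.choice(classes)
    return label, rng.choice(records_by_class[label])
```

Because the class is drawn first, a rare class is shown to the annotators about as often as a common one, regardless of how many recordings each class contains.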
3.2 Neural Network Architecture

Based on empirical evidence from the field of computer vision, we applied recently published methods such as ADAM [20], SELU [21], dilated convolutions [22] and residual blocks [23] to reduce training time and make the resulting detector more robust. This section gives a quick overview of these methods, and Appendix A gives a more detailed description and a summary of the resulting improvements.
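To make these building blocks concrete, here is a minimal NumPy sketch of SELU, a "same"-padded dilated 1D convolution, a residual block built from them, and a single ADAM update. The SELU constants and the ADAM update rule follow their standard definitions [20, 21]. The tensor shapes and the conv, SELU, conv, add, SELU ordering inside the residual block are assumptions for illustration, not the architecture detailed in Appendix A.

```python
import numpy as np

# SELU constants (alpha, lambda) of the self-normalizing activation [21].
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    # expm1 on min(x, 0) avoids overflow warnings for large positive inputs.
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * np.expm1(np.minimum(x, 0.0)))


def dilated_conv1d(x, w, b, dilation):
    """'Same'-padded dilated 1D convolution (cross-correlation, as in DL frameworks).

    x: (C_in, T) input, w: (C_out, C_in, K) kernel, b: (C_out,) bias -> (C_out, T).
    """
    k = w.shape[2]
    t = x.shape[1]
    span = dilation * (k - 1)  # receptive field of one output, minus one sample
    left = span // 2
    xp = np.pad(x, ((0, 0), (left, span - left)))  # zero padding keeps length T
    out = np.zeros((w.shape[0], t))
    for j in range(k):
        # Tap j reads the input shifted by j * dilation samples.
        out += w[:, :, j] @ xp[:, j * dilation : j * dilation + t]
    return out + b[:, None]


def residual_block(x, w1, b1, w2, b2, dilation):
    """Identity skip around conv -> SELU -> conv (the ordering is an assumption)."""
    h = selu(dilated_conv1d(x, w1, b1, dilation))
    h = dilated_conv1d(h, w2, b2, dilation)
    return selu(x + h)  # requires C_out == C_in so that x and h can be added


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update [20]; t is the 1-based step count. Returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)  # bias correction of the first moment
    v_hat = v / (1 - beta2**t)  # bias correction of the second moment
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

With dilation d and kernel size K, one convolution covers d * (K - 1) + 1 input samples. Stacking blocks with growing dilation therefore widens the temporal context without adding parameters.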
While several baseline image recognition NN architectures (such as ResNet and VGG) could be redesigned to fit the AF detection task, we instead developed domain-specific ensembles from the core building blocks of these baseline architectures. Alongside the proposed networks, we applied pre- and post-processing steps: forked feature extraction in both the temporal and the spectral domain, and merging of the encoded feature vectors from the different domains directly before the final classifier layer.
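The forked extraction and late fusion described above can be sketched as follows, assuming NumPy. The two encoders are hypothetical stand-ins for the trained temporal and spectral networks: summary statistics of the raw signal and a band-averaged log spectrogram. The weights W and b are random here but would be learned in practice. Only the overall shape of the pipeline mirrors the text: two domain-specific encoders whose outputs are concatenated right before a softmax (multinomial logistic-regression) classifier.

```python
import numpy as np

CLASSES = ("Normal", "AF", "Other", "Noise")


def log_spectrogram(signal, frame=256, hop=128):
    """Log-magnitude STFT with a Hann window, shape (n_frames, frame // 2 + 1)."""
    if len(signal) < frame:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(signal) - frame) // hop
    window = np.hanning(frame)
    frames = np.stack([signal[i * hop : i * hop + frame] * window for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def encode_temporal(signal):
    """Hypothetical stand-in for the temporal branch: 8 summary statistics."""
    diff = np.diff(signal)
    return np.array([f(s) for s in (signal, diff) for f in (np.mean, np.std, np.min, np.max)])


def encode_spectral(signal, n_bands=16):
    """Hypothetical stand-in for the spectral branch: band-averaged log spectrogram."""
    spectrum = log_spectrogram(signal).mean(axis=0)  # average over time frames
    return np.array([band.mean() for band in np.array_split(spectrum, n_bands)])


def softmax(z):
    e = np.exp(z - z.max())  # shift by the maximum for numerical stability
    return e / e.sum()


def fused_predict(signal, W, b):
    """Concatenate both branch encodings and apply a logistic-regression head.

    W: (len(CLASSES), 8 + n_bands) weights, b: (len(CLASSES),) bias; learned in practice.
    """
    z = np.concatenate([encode_temporal(signal), encode_spectral(signal)])
    return softmax(W @ z + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ecg = rng.standard_normal(3000)  # placeholder for a 10 s recording at an assumed 300 Hz
    W = 0.1 * rng.standard_normal((len(CLASSES), 8 + 16))
    probs = fused_predict(ecg, W, np.zeros(len(CLASSES)))
    print(dict(zip(CLASSES, probs.round(3))))
```

Merging only at the last layer lets each branch be developed on its own domain, while the classifier still sees both representations at once.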
Despite the moderate improvements in the temporal and spectral domains from applying the advanced building blocks (Figure 1), extending the logistic regression to multi-domain feature representations resulted in an architecture that could significantly outperform the most robust
StuCoSReC Proceedings of the 2019 6th Student Computer Science Research Conference 76
Koper, Slovenia, 10 October