4.3.1. Comparison with Relevant Previous Work
In order to better assess the contribution of the proposed approach, we first summarize in
Table 2 the classification performance of previous state-of-the-art algorithms that were presented in
Section 2. In more detail, Holmes et al. [
17] presented, in 2012, a method that differentiates blister and non-blister events with an accuracy of 89.0%. A year later, Holmes et al. [
18,
19], also developed an algorithm that recognizes blister events and breath events (with an accuracy of 92.1%) and separates inhalations from exhalations (with an accuracy of more than 90%). Later, Taylor et al. developed two main algorithms for blister detection [
26,
37] based on Quadratic Discriminant Analysis and ANN, achieving accuracies of 88.2% and 65.6%, respectively. Nousias et al. in Reference [
13] presented a comparative study of Random Forest, AdaBoost, Support Vector Machines and Gaussian Mixture Models, reaching the conclusion that RF and GMM yield a 97% to 98% classification accuracy on the examined dataset when utilizing MFCC, Spectrogram and Cepstrogram features.
Pettas et al. [
15] developed a recurrent neural network with long short-term memory (LSTM), which was tested on the same dataset as this study and with the same modeling schemes, that is,
SingleSubj,
MultiSubj and
LOSO. For the subject-specific modeling case the overall prediction accuracy was
, with higher accuracy in the prediction of breathing sounds (98%) and lower accuracy for drug administration and environmental sounds. Much higher accuracy is reported for
MultiSubj modeling, where the training samples are obtained from all subjects and shuffled across time. It yielded a drug administration prediction accuracy of
, but a lower prediction accuracy of environmental sounds (
), demonstrating a total of
accuracy over all cases. Furthermore, the
LOSO validation demonstrated results similar to those of the
SingleSubj case. The high classification accuracy obtained by LSTM-based deep neural networks is also in agreement with other studies [
13,
19]. Specifically, the recognition of breathing sounds is more accurate than that of the drug administration phase, which reaches a value of
, while the overall accuracy is
. In order to compare our approach with previous studies, we followed the same validation strategies for each different convolutional neural network architecture and summarize the comparative results in
Table 3.
From
Table 2 and
Table 3, it is apparent that the classification accuracy achieved by our approach does not exceed the performance of the relevant state-of-the-art approaches. In fact, our approach performs similarly to the methods developed by Holmes et al. [
17,
18], Taylor et al. [
24] and Pettas et al. [
15], but the approach of Nousias et al. [
13] outperforms our algorithm, mainly for the drug and environmental noise classes. However, the utilization of a CNN architecture in the time domain allows for an implicit signal representation that circumvents the need for additional feature extraction (e.g., in the spectral domain) and thereby results in significantly lower execution times. We compare the computational cost of Model 5 of our method with the Random Forest algorithm presented in Reference [
13], both executed in the same machine (Intel(R) Core(TM) i5-5250U CPU @ 2.7 GHz). The results are summarized in
Figure 6.
This figure highlights the computational speed-up of our approach compared to the time-consuming Random Forest algorithm with feature extraction. Specifically,
Figure 6 shows that classification by RF using multiple features requires more than 7 s, whereas the CNN Model 5 requires less than half a second. Finally, it is important to note that our approach is faster even when only the STFT is extracted and used as input to the Random Forest.
4.3.2. Pruning Scalar Weights
In order to evaluate the performance of this algorithm, we present the classification accuracy as well as the compression rate, when no retraining is applied, in
Table 4. The parameter
l, which determines the threshold for pruning, varies from
to
with a step of
. It is clear that when increasing the parameter
l, and consequently the threshold for pruning, the accuracy of the classifier decays. Among the five models, Models 1, 4 and 5 appear to be the most robust to these changes, because they retain their performance above
, even when the parameter
l is set to
. On the other hand, Models 3 and 4 show the worst performance, dropping below
for intermediate values of
l.
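As an illustration of this step, the sketch below zeroes out the weights whose magnitude falls below a layer-wise threshold. Taking the threshold as l times the standard deviation of the layer's weights is an assumption of this example, not necessarily the exact rule used by the algorithm.

```python
import numpy as np

def prune_scalar_weights(weights, l):
    """Zero out weights whose magnitude falls below a layer-wise threshold.

    The threshold of l times the standard deviation of the layer's weights is
    an assumption of this sketch; the text only states that l controls it.
    """
    threshold = l * np.std(weights)
    mask = np.abs(weights) >= threshold      # keep only sufficiently large weights
    return weights * mask, mask

# Example: prune a random 3 x 8 x 16 convolutional kernel with l = 0.5.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(3, 8, 16))
w_pruned, mask = prune_scalar_weights(w, l=0.5)
print("fraction of weights removed:", 1.0 - mask.mean())
```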
The results presented in
Table 5, corresponding to the approach that employs the retraining technique, show that the classifiers are able to adapt to the changes made in the previous layers, retaining their high accuracy independently of the threshold defined by
l and
. It is worth mentioning that with this approach the lowest classification accuracy is
achieved by Model 3, whereas Model 5 reaches up to
, improving its initial performance. Additionally, we are able to compress the architectures twice as much as with the previous approach, where the retraining process is not included. This occurs because retraining the network results in a larger standard deviation of the weights in each layer, while their mean value remains almost equal to zero. Thus, more weights will be zeroed out.
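A rough sketch of the retraining step is given below: the pruned positions are kept at zero by re-applying a binary mask after every optimizer update. The model, the data, the optimizer settings and the threshold rule (l times the layer-wise standard deviation) are placeholders for illustration only.

```python
import torch
import torch.nn as nn

# Minimal sketch of prune-then-retrain: pruned positions are kept at zero by
# re-applying a binary mask after every optimizer step. The model, data and
# hyper-parameters below are placeholders, not the actual setup of this work.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 4))
x, y = torch.randn(256, 64), torch.randint(0, 4, (256,))

# Build masks from an assumed threshold of l times each layer's weight std.
l = 1.0
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:                              # prune weight matrices only
        threshold = l * p.data.std()
        masks[name] = (p.data.abs() >= threshold).float()
        p.data *= masks[name]                    # initial pruning

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):                           # short retraining loop
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    with torch.no_grad():                        # keep pruned weights at zero
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```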
It should be highlighted that pruning scalar values can only reduce the memory requirements, since there are fewer non-zero weights. However, it does not perform structural pruning, meaning that there is no guarantee that the pruned parameters belong to a particular filter or neuron, and therefore it does not improve the computational time.
4.3.3. Pruning Filters in Convolutional Layers
In this section, we present the evaluation results for all five developed models after applying the filter pruning method, according to which the filters with the smallest magnitude are removed. We tested the classification accuracy of the pre-trained models for multiple combinations of pruned filters and, additionally, investigated the effect of iterative pruning and retraining. For every model we chose to leave at least one filter in each convolutional layer. Thus, for Models 1, 2, 5 the number of removed filters varies from 1 to 15, whereas for Models 3, 4 it is between 1 and 7.
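For illustration, the following sketch removes the filters with the smallest magnitude from a single convolutional layer. It assumes the kernels are stored with the filter index on the last axis and uses the L1 norm as the magnitude criterion; the corresponding adjustment of the next layer's input channels is omitted for brevity.

```python
import numpy as np

def prune_filters(conv_w, conv_b, n_prune):
    """Remove the n_prune filters with the smallest L1 magnitude.

    conv_w is assumed to be (kernel, in_channels, filters); the layout used by
    the actual framework may differ. Returns the reduced kernel, the reduced
    bias and the indices of the filters that were kept.
    """
    l1 = np.abs(conv_w).reshape(-1, conv_w.shape[-1]).sum(axis=0)  # L1 norm per filter
    keep = np.sort(np.argsort(l1)[n_prune:])                       # drop the smallest ones
    return conv_w[..., keep], conv_b[keep], keep

rng = np.random.default_rng(1)
w, b = rng.normal(size=(5, 1, 16)), rng.normal(size=16)
w_new, b_new, keep = prune_filters(w, b, n_prune=2)   # 2 of 16 filters removed
print(w_new.shape)                                    # (5, 1, 14)
```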
Table 6 and
Table 7 present the performance of the models in terms of test loss and test accuracy, as well as results for the compression and the theoretical speed up of each architecture.
In
Table 6 we observe that the classification accuracy of every model deteriorates significantly, even at low compression rates. The reason for this is that filter pruning is employed on pre-trained models and, therefore, the values of their weights are not optimal for the new, shallower architectures. In addition, Model 2 can be compressed at a larger scale than the others due to its architecture. It has the most filters in the convolutional layers and, at the same time, the smallest number of neurons in the fully connected layers, as shown in
Table 1, with the amount of parameters belonging to convolutional layers being approximately
of the total number of parameters, whereas for the other models it is
or lower. Note that even with half of the filters removed, the compression rate is low, indicating that the majority of the weights belong to the fully connected layer. On the other hand, the removal of a filter reduces the computational requirements. For example, when we prune 2 out of 16 filters from Model 1, the new, shallower architecture requires
of the initial floating point operations to perform the classification task, providing a reduction of over
. Approaching the maximum number of pruned filters (leaving only one filter in each layer), the required operations are, as expected, considerably reduced to only
of the operations required by un-pruned models.
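As a back-of-the-envelope check of the theoretical speed-up, one can simply count multiply-accumulate operations before and after pruning; the layer dimensions in the sketch below are illustrative placeholders, not those of the actual models.

```python
def conv1d_flops(input_len, kernel_size, in_ch, out_ch):
    """Approximate multiply-accumulate count of a 1-D convolution
    (stride 1, 'same' padding); biases are ignored."""
    return input_len * kernel_size * in_ch * out_ch

# Pruning filters in one layer also removes the corresponding input channels
# of the next layer, so the saving compounds across consecutive layers.
full = conv1d_flops(4000, 5, 16, 16)
pruned = conv1d_flops(4000, 5, 14, 14)   # e.g., 2 of 16 filters pruned in two adjacent layers
print(f"remaining fraction of operations: {pruned / full:.2f}")
```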
A countermeasure against the drop in classification accuracy due to filter removal is the utilization of the retraining technique, as described in
Section 3.2. The results of filter pruning method with iterative retraining are shown in
Table 7. It can be observed that the classification accuracy for all models except Models 2 and 4 remains over
, whereas for Models 2, 4 it drops to
and
when
and
filters are pruned in each layer, respectively. Thus, by applying this method we can significantly reduce the computational time without sacrificing classification performance. A characteristic example is Model 5, which reaches up to
classification accuracy, even with 13 filters pruned. For the same model, the respective performance achieved without retraining is
, while in both cases the pruned models require
of the operations needed by the initial un-pruned architecture.
4.3.4. Quantizing Only the Convolutional Layers
To evaluate the performance of the vector quantization method, we applied both scalar and product quantization to the convolutional layers, as well as to the fully connected layers of the network. This paragraph presents the classification accuracy of the developed models with respect to the compression rate and the number of required floating point operations, when the quantization methods are applied only to the convolutional layers. As mentioned earlier, both scalar and product quantization are performed along the channel dimension. We tested different combinations of the number of clusters and the value of the splitting parameter s.
In particular, for scalar quantization the number of clusters varies between 1 and 8, whereas for product quantization we tried s = 1, 2, 4 with the maximum number of clusters set to 8, 4, 2, respectively. Note that for s = 1 we essentially perform scalar quantization.
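A minimal sketch of the scalar variant is given below; it assumes that each filter, flattened over its kernel and channel dimensions, is treated as a single vector and clustered with scikit-learn's k-means, which may differ from the exact implementation used here.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_filters(conv_w, n_clusters):
    """Cluster whole filters with k-means and replace each one by its centroid,
    so the layer keeps at most n_clusters distinct filters plus an index vector.
    conv_w is assumed to be (kernel, channels, filters)."""
    n_filters = conv_w.shape[-1]
    flat = conv_w.reshape(-1, n_filters).T                 # one row per filter
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    codebook = km.cluster_centers_                         # n_clusters x filter_size
    index = km.labels_                                     # codeword used by each filter
    quantized = codebook[index].T.reshape(conv_w.shape)
    return quantized, codebook, index

rng = np.random.default_rng(2)
w = rng.normal(size=(5, 8, 16))                            # kernel x channels x 16 filters
w_q, codebook, index = quantize_filters(w, n_clusters=4)
print(len(np.unique(index)), "distinct filters remain")
```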
Table 8 shows the classification accuracy and the achieved compression, as well as the speed up in terms of flops. It is clear that by increasing the number of clusters, and therefore the number of distinct filters, the accuracy of the classifiers increases as well. This originates from the fact that with more distinct filters, more features of the input can be extracted in the convolutional layers. It should also be mentioned that the compression rate achieved by this method is lower than the rate achieved by filter pruning. This happens because an index matrix is required to map the filters in the codebook to the filters in the original structure, which increases the memory requirements.
Concerning the amount of required operations in the convolutional layers, as described before, it can be reduced with this approach by reusing pre-computed inner products. In particular, for convolutional kernels that are mapped to the same codeword, and are therefore identical after quantization, we only need to convolve one of them with the input feature map and the result is then shared. Subsequently, the biases are added and the result passes through the activation and pooling functions to produce the input feature map of the next layer. It is worth mentioning that the percentage of required operations is directly proportional to the percentage of filters that need to be stored. For example, clustering 16 filters into 4 clusters causes a corresponding reduction in the required floating point operations. Again, comparing the flops for filter pruning and scalar quantization, the former is more efficient. This is because the removal of a filter reduces the dimensions of the next layer's input feature map, which is not the case for scalar quantization.
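The reuse of shared inner products can be sketched as follows: only the unique codewords are convolved with the input, and each filter then picks up its codeword's result before adding its own bias. A single input channel, 'valid' convolution and a ReLU activation are assumed purely for brevity.

```python
import numpy as np

def conv_with_shared_filters(x, codebook, index, bias):
    """1-D convolutional layer whose filters have been clustered: only the
    unique codewords are convolved with the input and filters mapped to the
    same codeword reuse that result before their own bias is added."""
    shared = np.stack([np.convolve(x, c, mode="valid") for c in codebook])
    out = shared[index] + bias[:, None]     # gather the shared result per filter
    return np.maximum(out, 0.0)             # ReLU activation

rng = np.random.default_rng(3)
x = rng.normal(size=200)                    # single-channel input signal
codebook = rng.normal(size=(4, 5))          # 4 distinct kernels of length 5
index = rng.integers(0, 4, size=16)         # 16 filters sharing the 4 kernels
bias = rng.normal(size=16)
print(conv_with_shared_filters(x, codebook, index, bias).shape)   # (16, 196)
```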
Next, we evaluate the effect of product quantization on the performance of the models. Similarly to scalar quantization on the convolutional layers, we examine the variation of the accuracy with respect to the compression rate and the ratio of the floating point operations required by the quantized architectures to the amount of flops of the initial structure, as shown in
Table 9. The splitting parameter takes the values 1, 2, 4. As
s increases, the number of clusters in each subspace decreases, since there are fewer filters. Because both the splitting of the weight matrix of each layer and the k-means algorithm are performed along the channel axis, for s = 1 the results are identical to those of scalar quantization. Additionally, an increase in the value of s results in a slight decrease in the classification accuracy of the model. For example, Model 1 with
and
reaches an accuracy of
, whereas with
and
, a combination that produces 4 distinct filters in each layer, leads to
. This decrease indicates that, apart from how many filters we group together, it is also crucial which filters are grouped. By splitting the original space into smaller sub-spaces, we narrow the available combinations of filters and, thus, filters that differ considerably from each other may end up in the same cluster.
It is also important to note that increasing the s parameter leads to a slight decrease of the compression rate. This is because, with higher values of the splitting parameter, the lowest attainable number of distinct filters in the weight matrix increases as well. For example, for s = 1 and 4 clusters, the number of different filters is 4, but if we set s = 2 the respective amount would be 8, since we form 4 clusters in each subspace. Therefore, for the minimum number of clusters (1 cluster) and for s = 1, a single filter will be created; for s = 2, two distinct filters will be formed; and finally, for s = 4, four filters will have different weight values.
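A possible sketch of the product variant is shown below: the channel axis is split into s groups and each group of sub-filters is clustered independently, so a layer stores s small codebooks and s index vectors. The shapes, the even channel split and the use of scikit-learn's k-means are assumptions of this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def product_quantize_filters(conv_w, s, n_clusters):
    """Product quantization of conv filters: the channel axis is split into
    s sub-spaces and each sub-space is clustered with its own k-means.
    conv_w is assumed to be (kernel, channels, filters)."""
    chunks = np.array_split(conv_w, s, axis=1)            # split along channels
    quantized_chunks, codebooks, indices = [], [], []
    for chunk in chunks:
        rows = chunk.reshape(-1, chunk.shape[-1]).T       # one sub-filter per row
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(rows)
        codebooks.append(km.cluster_centers_)
        indices.append(km.labels_)
        approx = km.cluster_centers_[km.labels_].T.reshape(chunk.shape)
        quantized_chunks.append(approx)
    return np.concatenate(quantized_chunks, axis=1), codebooks, indices

rng = np.random.default_rng(4)
w = rng.normal(size=(5, 8, 16))                           # kernel x channels x filters
w_q, codebooks, indices = product_quantize_filters(w, s=2, n_clusters=4)
print(w_q.shape, [len(np.unique(i)) for i in indices])    # (5, 8, 16) [4, 4]
```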
Concerning the performance of the architectures with respect to computational complexity, we observe in
Table 9 that
of the initial flops can be avoided for Models 1, 2, 5 without any drop in classification accuracy. On the other hand, for the remaining models we save
of the initial required operations, with no drop in classification accuracy. For
, we are able to cut the majority of the operations, with Models 1, 2, 5 reaching up to
accuracy with
of the initial amount of floating point operations. However, in order to achieve a classification accuracy higher than
we can reduce the amount of the operations by half at most. At
of the initial number of flops, model 3 reaches up to
and model 4 to
.
To sum up, by quantizing the convolutional layers with the k-means algorithm, either with the scalar or the product method, we can compress the structure and at the same time speed up the production of the output feature map and, consequently, the prediction of the classifier. Between these two benefits, the computational gain is greater, since we can efficiently remove up to of the operations required initially, whereas the maximum compression rate achieved reaches up to . This result is consistent with the theory suggesting that convolutional layers are computationally expensive but do not add excessive memory requirements. Finally, for product quantization, increasing the value of the parameter s deteriorates the performance of the classifier.
4.3.5. Quantizing Only Fully Connected Layer
Similarly, in this paragraph we present the results for scalar and product quantization, in this case performed on the fully connected layers. For this approach we chose to perform quantization with k-means along the y axis of the weight matrix. In this way, we force neurons to have the same output response and, therefore, we are able to reduce the computational requirements of the layers. Subsequently, we compare the storage and computation requirements between convolutional and fully connected layers and validate that convolutions are time-consuming, whereas fully connected layers significantly increase the memory requirements. For Models 1, 2, 5 we perform scalar quantization with the number of clusters up to 128 and for Models 3, 4 up to 52. We also executed tests for and clusters up to 32, 16, 8, respectively. It is important to mention that, because we force some neurons to have the same output, we do not perform quantization at the output layer of the classifier, that is, the last fully connected layer.
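A minimal sketch of this column-wise (y axis) clustering is given below, assuming a weight matrix of shape (in_features, out_features) so that each column holds the incoming weights of one neuron; after clustering, only one inner product per cluster is needed, and each neuron gathers its cluster's result before its own bias is added. Dimensions and the use of scikit-learn's k-means are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_fc_neurons(W, n_clusters):
    """Cluster the columns of a fully connected weight matrix (one column per
    neuron) so that neurons in the same cluster share their incoming weights.
    W is assumed to be (in_features, out_features)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(W.T)
    codebook = km.cluster_centers_            # n_clusters x in_features
    index = km.labels_                        # cluster of each neuron
    W_q = codebook[index].T                   # reconstructed weight matrix
    return W_q, codebook, index

def fc_forward_shared(x, codebook, index, b):
    """Forward pass that computes one inner product per cluster and gathers
    the shared results per neuron before adding the individual biases."""
    shared = x @ codebook.T                   # batch x n_clusters
    return shared[:, index] + b               # batch x out_features

rng = np.random.default_rng(5)
W, b = rng.normal(size=(128, 64)), rng.normal(size=64)
x = rng.normal(size=(10, 128))
W_q, codebook, index = quantize_fc_neurons(W, n_clusters=16)
print(np.allclose(x @ W_q + b, fc_forward_shared(x, codebook, index, b)))  # True
```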
From the results shown in
Table 10 it is clear that Models 1, 5, which share the same structure, exhibit the same behaviour, retaining their initial performance up to a compression rate of
. Furthermore, we are able to achieve a larger compression for Models 3, 4 because both of them have the shallowest convolutional structure, with three convolutional layers of 8 filters each. This means that for these models the weights of the fully connected layers occupy a greater portion of the total number of trainable parameters. It is important to highlight that we can compress Model 4 more than six times and still achieve a high classification accuracy of up to
. Finally, for Models 2, 3 the maximum number of clusters is 52 because the last layer contains
weights and therefore there is no reason to increase it further. Also, as described above, scalar quantization does not contribute to the speed up of the classification task and this is why
Table 10 does not contain the flops that the quantized models require.
The next approach to compress and accelerate the fully connected layers is product quantization through the
k-means algorithm.
Table 11 and
Table 12 present the classification accuracy, compression rate as well as the extent of the reduction of floating point operations for different number of clusters and different values of the splitting parameter
s. In
Table 11, for Models 2 and 3 we stop at 12 clusters, because they have a fully connected layer with 16 neurons and therefore there is no point in increasing the number of clusters beyond 12. Recall that quantization is performed on the columns of the weight matrix, that is, on the output response of a layer. It should be noted that, as the number of clusters increases, the achieved compression rate is higher than the respective rate of scalar quantization. This is due to the fact that the index matrix for this approach is smaller than the index matrix for scalar quantization, containing as many entries as there are neurons in each layer. Furthermore, the difference in the compression rate between the models is due to the difference in their structure. For example, Model 4 has smaller convolutional layers than Models 1, 2, 5 and a larger fully connected layer than Model 3, resulting in a higher compression rate. Moreover, it is clear that by quantizing the fully connected layers we do not have any gain in computational cost, since the smallest ratio with an acceptable performance is
, which means that the quantized model needs to execute
of the initial amount of floating point operations. Finally,
Table 12 shows the performance of the classifiers for
and 4. In this case, it should be highlighted that increasing the value of
s, the classification accuracy of the models decreases, despite the same compression rate. For example, Model 1 achieves a classification accuracy of
with
and 4 clusters but for
and 2 clusters its accuracy drops to
.
4.3.6. Combining Filter Pruning and Quantization
Finally, we investigate the combination of the aforementioned methods by applying filter pruning to the convolutional layers and quantization to the fully connected layers. In this way, we are able to reduce the requirements of the classifier in both memory and computational power. The approach with iterative retraining is selected for filter pruning, since it yields better results than the approach without retraining. First, we perform filter pruning, in order to exploit the fact that the weights adjust to the changes, and then the quantization algorithm, either scalar or product, is executed. Below, we present the classification accuracy of the developed architectures with respect to the number of pruned feature maps in the convolutional layers and the number of clusters in the fully connected layer.
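A rough end-to-end sketch of the combined scheme is given below: filters with the smallest L1 norm are removed from a convolutional layer and the columns of the fully connected weight matrix are then clustered with k-means. The shapes, cluster counts and the storage estimate are illustrative assumptions, and the intermediate retraining step is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder layer shapes, not the actual architectures of this work.
rng = np.random.default_rng(6)
conv_w = rng.normal(size=(5, 1, 16))          # kernel x in_channels x 16 filters
fc_w   = rng.normal(size=(128, 64))           # in_features x 64 neurons

# Step 1: remove the filters with the smallest L1 norm.
n_prune = 7
l1 = np.abs(conv_w).reshape(-1, conv_w.shape[-1]).sum(axis=0)
keep = np.sort(np.argsort(l1)[n_prune:])
conv_pruned = conv_w[..., keep]

# Step 2: cluster the columns of the fully connected layer (one per neuron).
n_clusters = 8
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(fc_w.T)
fc_codebook, fc_index = km.cluster_centers_, km.labels_

# Rough storage estimate (float32 weights, one byte per index entry).
original = conv_w.size * 4 + fc_w.size * 4
compressed = conv_pruned.size * 4 + fc_codebook.size * 4 + fc_index.size
print(f"approximate compression rate: {original / compressed:.1f}x")
```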
Figure 7 shows how the classification accuracy changes as the number of pruned feature maps or the number of clusters, produced with scalar quantization, increases. It is clear that the accuracy of all models, apart from Model 2, depends mostly on the number of clusters in the fully connected part of the classifier. When we perform k-means on it with the number of clusters equal to 1, the classification accuracy drops to
(Models 1, 2, 4, 5) and
(Model 3). Model 2 reaches
or above when 14 out of 16 filters have been removed and with 8 clusters in each fully connected layer. However, when we prune 15 out of 16 filters, its accuracy drops to
without improving as the number of clusters increases. On the other hand, the rest of the models keep their classification accuracy at high levels, even when their convolutional layers are left with only one filter. The best architectures appear to be Models 1 and 5, which achieve an accuracy over
with 8 clusters and even with 15 out of 16 filters removed.
Next, we proceed to the evaluation of combining the filter pruning method with product quantization along the y axis of the weight matrix of the fully connected layer.
Figure 8 shows the classification accuracy of the developed models, versus the number of pruned feature maps and the number of clusters in each subspace, when the value of splitting parameter
s is set equal to 1. For Models 1, 4, 5 the maximum number of clusters is 32, whereas for Models 2, 3 it is 12. Again, the parameter that mostly affects the performance of the classifier is the number of clusters produced by the k-means algorithm. For a single cluster, that is, when we force all neurons to have the same output response, the classification accuracy drops dramatically to
(Models 1, 5) and
(Models 2, 3, 4). It is also clear that the highest classification accuracy can be achieved with intermediate values of the parameters. For example, Model 5 reaches up to
accuracy, which is the highest among our architectures, after pruning 7 feature maps and with 8 clusters in the fully connected layer, achieving an 8-fold compression of the initial structure. For the same level of compression, Model 1 achieves
(
), model 2
(
), model 3 reaches up to
(
) and model 4 to
(
).
Increasing the splitting parameter
s to the value of 2, we obtain the results presented in
Figure 9. Similarly to the results obtained from the previous experiments, increasing the number of clusters in each subspace of the weight matrix improves the performance of the classifier. Again, the highest classification accuracy,
, is achieved by model 5 when we prune 3 filters from the convolutional layer and quantize the neural network at the back end, with 8 clusters in each subspace, leading to a compression factor of
. However, we can compress it by a factor of 5 and at the same time achieve an accuracy up to
, which is an acceptable trade-off between accuracy and compression, by quantizing it with 4 clusters instead of 8. For this compression rate, Model 1 yields
(
), model 2 reaches up to
accuracy (
), model 3 up to
(
) and model 4 achieves an accuracy of
(
). It is important to note that, in order to achieve a compression rate of 8, we need, for Model 5, to prune 7 filters and quantize with 4 clusters, but with a drop in classification accuracy of
achieving
. This result is consistent with those presented in previous sections, where it is shown that increasing the value of parameter
s decreases the accuracy of the classifier at the same level of compression.
Finally,
Figure 10 presents the results for the classification accuracy with respect to the number of pruned feature maps and the number of clusters when the splitting parameter
s is set to 4. For this approach, the highest classification accuracy,
, is achieved by
Model 5, with the number of pruned filters set equal to five and four clusters, which results in a compression rate of 4. The following table presents the number of clusters and pruned filters for each model, for a compression rate equal to 4.
Table 13 summarizes the results from the combination of filter pruning and scalar quantization in the fully connected layers for the five proposed models at the same compression rate. It can be seen that Models 1, 4, 5 yield similar classification accuracy despite the fact that Model 4 has fewer filters, while Models 2 and 3, which contain fewer parameters in the fully connected layer, fail to reach the same level of accuracy. In other words, we can achieve high performance even with a limited number of filters, as long as we retain enough parameters in the fully connected layer. Overall, when comparing the compression techniques, Model 5 seems to achieve the best performance in most of the
MultiSubj evaluation experiments. This fact, along with its superiority in
SingleSubj and
LOSO validation schemes, indicates that Model 5 is the most preferable among the five architectures.