TRANSFER LEARNING FOR CLOUD IMAGE CLASSIFICATION

Mayank Jain¹,², Navya Jain³, Yee Hui Lee⁴, Stefan Winkler⁵, and Soumyabrata Dev¹,²

¹The ADAPT SFI Research Centre, Ireland
²School of Computer Science, University College Dublin, Ireland
³Ram Lal Anand College, University of Delhi, India
⁴School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
⁵Department of Computer Science, National University of Singapore

ABSTRACT

Cloud image classification has been extensively studied in the literature, as it has several radio-meteorological and remote sensing applications. Recently, images from ground-based sky imagers (GSIs) have been widely used because of their high temporal and spatial resolution and low infrastructure cost as compared to satellites. To classify sky/cloud images obtained from such GSIs, this paper¹ examines the application of transfer learning using the standard VGG-16 architecture. The paper further analyzes the importance of adjusting the number of neurons in the top dense layers to improve the performance of the model. The reasons for this are traced by conducting extensive experiments on multiple datasets exhibiting varied properties.

Index Terms— Cloud Image Classification, Transfer Learning, Deep Learning, CNNs, VGG-16

1. INTRODUCTION

Clouds are known to hinder the propagation of radio waves [1] and sunlight [2]. At the same time, different cloud types impact these rays in different ways. Therefore, by accurately identifying and classifying cloud types, researchers and engineers can better understand their impact on radio wave transmission and develop strategies to mitigate any potential disruptions. Similarly, solar energy systems can adjust their operations according to the cloud types that are present locally, optimizing energy production and predicting periods of reduced solar irradiance. This information is valuable in ensuring reliable solar energy generation, especially in regions heavily dependent on renewable energy sources.

Ground-based sky imagers (GSIs) are now preferred over satellites to capture cloud images at high temporal and spatial resolution in a cost-effective manner [3]. However, their drawback is the presence of noise caused by factors such as sun glare, dust particles, and rain droplets [4]. This noise, along with the diverse shapes, sizes, and textures of the clouds, complicates accurate classification [5]. The complex nature of cloud images and the ever-changing atmospheric conditions have led researchers to focus on developing precise and reliable classification algorithms [6–9].

This research was conducted with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106 P2 at the ADAPT SFI Research Centre at University College Dublin. ADAPT, the SFI Research Centre for AI-Driven Digital Content Technology, is funded by Science Foundation Ireland through the SFI Research Centres Programme. Send correspondence to S. Dev: [email protected]

¹In the spirit of reproducible research, the code to reproduce the simulations in this paper is shared at https://github.com/jain15mayank/Cloud-Classification-using-Deep-Nets

1.1. Related Work

Conventional methods for classifying sky/cloud images rely on statistically/manually identified characteristics that describe the color and texture of the image [6]. Extending this concept, Dev et al. [7] proposed a modified texton-based classification method to integrate color and texture information and reported an average accuracy of 95% on the SWIMCAT dataset of 5 classes.

Recently, deep learning techniques have been employed. In 2018, Zhang et al. [8] proposed CloudNet, which reported accuracies of 98.33% on the SWIMCAT dataset and up to 88% on the newly released CCSN dataset with 11 cloud classes. Wang et al.'s [9] CloudA architecture raised the bar with 98.47% on SWIMCAT and up to 98.83% on another private dataset. In 2022, Liu et al. [5] achieved 84.3% accuracy on the large 7-class GCD dataset using a context graph attention network, where CloudNet was claimed to achieve a mere 74.84% accuracy. Although deep learning models show potential, such varied accuracies on different datasets are primarily due to the small size of annotated datasets.

1.2. Contributions

Transfer learning (TL) has proven to be highly effective in training deep learning models on limited datasets [10], which has been the main concern here. Hence, the objective of this paper is to examine the application of TL in cloud-type recognition using GSI-obtained sky/cloud images.
The key contributions are summarized as follows:

• Significantly reduced training time with high accuracy
• Effective usage of TL to compete with the state-of-the-art
• In-depth analysis of the impact of the number of neurons in the top dense layers while doing TL
2. DATASETS

This paper uses the following three publicly available datasets:

1. Singapore Whole-sky IMaging CATegories (SWIMCAT) dataset [7]
2. Cirrus Cumulus Stratus Nimbus (CCSN) dataset [8]
3. Ground-based Cloud Dataset (GCD) [5]

SWIMCAT has a total of 784 cloud images of size 125 × 125 pixels, classified into 5 classes. An image of each class is shown in Fig. 1 along with the number of images (imgs) in that class.

[Fig. 1: Sample images from each class of the SWIMCAT dataset, with per-class counts: Clear Sky (224 imgs), Patterned (89 imgs), Thick-dark (251 imgs), Thick-white (135 imgs), Veil Clouds (85 imgs)]
CCSN is a dataset of 2543 cloud images of size 400 × 400 pixels, divided into 11 classes. While the other two datasets are composed only of sky image patches obtained from images captured by a GSI, the CCSN dataset also contains landscapes, sceneries, and both day and night images alongside sky image patches. Such a large variety of images, and multiple classes with 139–340 images per class, makes this dataset more difficult to train on.

GCD is the largest annotated dataset of sky patch images captured by a GSI. It consists of 19,000 cloud images of size 512 × 512 pixels, divided into 7 classes. One of the classes contains 'mixed' clouds with significantly fewer images and is not considered in the experiments of this paper. Some sample images from both the CCSN and GCD datasets are shown in Fig. 2.

[Fig. 2: Some sample images from the (a) CCSN and (b) GCD datasets.]

In the pre-processing stage, images from all three datasets are resized to 125 × 125 pixels and then subjected to central area extraction of 100 × 100 pixels. Traditional augmentation layers for random flipping and rotation are added to make the models more robust.
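As an illustration, the following is a minimal sketch of such a pre-processing pipeline, assuming TensorFlow/Keras. The layer choices and the rotation factor are assumptions for illustration, not the authors' released code.

```python
# Hypothetical pre-processing pipeline: resize to 125x125,
# central-crop to 100x100, then random flip/rotation augmentation.
import tensorflow as tf
from tensorflow.keras import layers

def build_preprocessing(augment: bool = True) -> tf.keras.Sequential:
    """Resize, centre-crop, and (optionally) augment input images."""
    steps = [
        layers.Resizing(125, 125),    # resize images from all datasets
        layers.CenterCrop(100, 100),  # keep the central 100x100 area
    ]
    if augment:
        steps += [
            layers.RandomFlip("horizontal_and_vertical"),
            layers.RandomRotation(0.25),  # rotation factor is a guess
        ]
    return tf.keras.Sequential(steps, name="preprocessing")
```

Such a pipeline can either be mapped over a tf.data dataset or prepended to the model itself, so that augmentation is only active during training.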
 
3. METHODOLOGY

As noted before, most sky/cloud image classification datasets have small cardinality. Hence, transfer learning (TL) is used in this paper to train a largely successful deep learning model for image classification, namely VGG-16 [11]. We use model weights pre-trained on the IMAGENET dataset [12], which is one of the biggest and most complex datasets for image classification. A model trained on a complex dataset like IMAGENET is expected to perform better on the relatively simpler task of cloud image classification. The study is divided into two parts. While the first part assesses the effectiveness and benefits of TL, the second part aims to understand the impact of the number of neurons in the top dense layers.

[Fig. 3: Network architecture based on the VGG-16 convolutional base layers used in this study. The input image (100 × 100 × 3) passes through the VGG-16 convolutional base (output 3 × 3 × 512), is flattened (4608 × 1), and feeds two fully-connected layers, hFC1 (h1 units, ReLU activation) and hFC2 (h2 units, ReLU activation), followed by a fully-connected output layer with softmax activation producing the classification vector (n × 1 for n classes). The numbers of units h1 and h2 in the hFC1 and hFC2 layers are variable; the number of units in the output layer (n) depends on the class count of the dataset.]
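A minimal sketch of this architecture, assuming TensorFlow/Keras with a frozen ImageNet-initialized base, is given below; the function name and its defaults are illustrative assumptions, not the authors' exact code.

```python
# Hypothetical builder for the Fig. 3 architecture: frozen VGG-16
# conv base, flatten, variable-width dense layers hFC1/hFC2, and an
# n-way softmax head. h2 = 0 omits the hFC2 layer entirely.
import tensorflow as tf
from tensorflow.keras import layers

def build_tl_vgg16(n_classes: int, h1: int = 512, h2: int = 0) -> tf.keras.Model:
    base = tf.keras.applications.VGG16(
        include_top=False,           # drop the original FC head
        weights="imagenet",          # transfer-learning initialization
        input_shape=(100, 100, 3),   # central crop size from Sec. 2
    )
    base.trainable = False           # conv base layers are used "as is"

    inputs = tf.keras.Input(shape=(100, 100, 3))
    x = layers.Flatten()(base(inputs))   # (3, 3, 512) -> 4608 features
    x = layers.Dense(h1, activation="relu", name="hFC1")(x)
    if h2 > 0:                           # h2 = 0 reproduces "hFC2 removed"
        x = layers.Dense(h2, activation="relu", name="hFC2")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Note that a 100 × 100 input shrinks to a 3 × 3 × 512 tensor after the five VGG-16 pooling stages, which matches the 4608-dimensional flattened vector in Fig. 3.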
For the first part, the SWIMCAT dataset was used for the experiments. Following the TL regime, the convolutional base layers of the VGG-16 architecture are used as is. In the original VGG-16 architecture, there are 2 fully-connected (FC) hidden layers of 4096 neural units each with ReLU activation. We will refer to these layers as hFC1 and hFC2, as shown in Fig. 3. These layers are followed by another FC output layer with softmax activation. However, we have observed that completely removing the hFC2 layer and reducing the number of units in the hFC1 layer can significantly improve the performance of the network. The results are compared with the state-of-the-art CloudNet [8] and CloudA [9] architectures. Furthermore, the standard VGG-16 network [11] was trained for the task with and without TL. For all experiments, cosine decay with restarts was used after setting the initial learning rate $\eta_{init} = 10^{-6}$ and the minimum learning rate $\eta_{min} = 10^{-7}$. Eq. (2) shows the computation of the learning rate $\eta$ in each epoch $ep$, with $t_{max}$ defined in Eq. (1). Finally, early stopping was used to avoid overfitting by monitoring the categorical cross-entropy loss on the validation set. The held-out test set comprises 25% of the images, selected randomly from the dataset.

$$t_{max} = 100 \times \left( 2.5^{\max(ep - 100,\, 0)} \right) \qquad (1)$$

$$\eta = \eta_{min} + (\eta_{init} - \eta_{min}) \times \frac{1 + \cos\!\left( \pi\, ep / t_{max} \right)}{2} \qquad (2)$$
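This schedule can be expressed as an epoch-wise callback. The sketch below reads Eq. (1) literally as printed and takes the constants from the text above; it is an illustration under those assumptions, not the authors' exact implementation.

```python
# Hypothetical implementation of the learning-rate schedule in
# Eqs. (1)-(2), applied once per epoch via a Keras callback.
import math
import tensorflow as tf

ETA_INIT, ETA_MIN = 1e-6, 1e-7   # eta_init and eta_min from the text

def eta(ep: int) -> float:
    """Learning rate for epoch `ep` following Eqs. (1) and (2)."""
    t_max = 100 * (2.5 ** max(ep - 100, 0))                    # Eq. (1)
    return ETA_MIN + (ETA_INIT - ETA_MIN) * \
        (1 + math.cos(math.pi * ep / t_max)) / 2               # Eq. (2)

scheduler = tf.keras.callbacks.LearningRateScheduler(lambda ep, lr: eta(ep))
```

Under this reading, the rate decays from $\eta_{init}$ to $\eta_{min}$ over the first 100 epochs; beyond epoch 100, $t_{max}$ grows, which pushes the cosine term back up and acts as the "restart".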
For the second part, experiments are performed with similar settings as before, on all three datasets, with all possible combinations of {hFC1, hFC2} where hFC1 ∈ {64, 128, 256, 512, 1024, 2048, 4096} and hFC2 ∈ {0, 64, 128, 256, 512, 1024, 2048, 4096}. Here, 0 units in hFC2 means that the layer is omitted entirely. These experiments are done to fully understand the role of the top dense layers in the case of cloud image classification using TL; a sketch of this sweep is given below.
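The grid search could be organized as follows, reusing the hypothetical helpers from the earlier sketches (build_tl_vgg16, scheduler). Here train_ds, val_ds, and test_ds stand for assumed tf.data pipelines, and the optimizer, epoch budget, and patience are guesses rather than the authors' settings.

```python
# Hypothetical grid search over the hFC1/hFC2 widths described above.
import tensorflow as tf

H1_GRID = [64, 128, 256, 512, 1024, 2048, 4096]
H2_GRID = [0, 64, 128, 256, 512, 1024, 2048, 4096]  # 0 = hFC2 omitted

results = {}
for h1 in H1_GRID:
    for h2 in H2_GRID:
        model = build_tl_vgg16(n_classes=5, h1=h1, h2=h2)  # 5 for SWIMCAT
        model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
        early_stop = tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=50, restore_best_weights=True)
        model.fit(train_ds, validation_data=val_ds, epochs=3000,
                  callbacks=[scheduler, early_stop], verbose=0)
        _, results[(h1, h2)] = model.evaluate(test_ds)  # test accuracy
```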
4. RESULTS

Table 1 shows that the transfer learning-based VGG-16 model gave results on par with or better than the state-of-the-art models with significantly fewer epochs. Furthermore, reducing the number of neurons in the hFC layers also leads to a significant improvement in accuracy.

For the second part, Fig. 4 shows the heatmap of the accuracies obtained on the test set for the SWIMCAT dataset. Lighter hues (indicating better results) can be clearly seen in the top-left half of the heatmap. This means that for the SWIMCAT dataset, adding more neurons in the hFC layers of VGG-16 could lead to overfitting. Additionally, darker hues at the very top-left corner indicate that over-reducing the number of these neurons might lead to excessive information loss.

Similarly, the heatmaps obtained on the CCSN and GCD datasets are shown in Fig. 5. Recall that the number of classes is higher in the CCSN dataset and that it consists of more complex images. Therefore, in contrast to Fig. 4, lighter hues can now be seen in Fig. 5(a) when the number of neurons in the hFC layers is higher, that is, in the bottom-right half of the heatmap. Contrary to the larger accuracy variations on SWIMCAT and CCSN, the variations for the GCD dataset upon changing the number of neurons in the hFC1 and hFC2 layers are very small. This is probably because of the large size of the GCD dataset.

Overall, it can be noted that the results for the optimized TL model are comparable to the state-of-the-art for GCD (88.3% on 6 classes by TL as compared to 84.3% on 7 classes by Liu et al. [5]); possibly worse on CCSN; but much better for SWIMCAT (99.5% by TL as compared to 98.47% by Wang et al. [9]). This shows that TL is more effective for datasets with low cardinality like SWIMCAT.

Table 1: Average time per epoch and number of epochs required during training of the different deep CNN architectures on the SWIMCAT dataset. The accuracy on the test set is reported in the last column. The prefix TL- means that transfer learning was used in that case. Values within parentheses '()' indicate the test-set accuracy claimed by the original authors.

Network Architecture                            | Avg. Training Time (/epoch) | Number of Epochs | Accuracy on Test Set
CloudNet [8]                                    | 3.4 seconds                 | 2480             | 96.97% (98.33%)
CloudA [9]                                      | 1.01 seconds                | 1000             | 97.47% (98.47%)
VGG-16 (hFC1 and hFC2 with 4096 units each)     | 1.15 seconds                | 1249             | 95.45%
TL-VGG-16 (hFC1 and hFC2 with 4096 units each)  | 1.15 seconds                | 843              | 96.46%
TL-VGG-16 (only hFC1 with 512 units)            | 1.12 seconds                | 648              | 99.49%
[Fig. 4: Heatmap showing the grid-search results (test-set accuracy) obtained on the SWIMCAT dataset when different models were trained with different combinations of neurons in the hFC1 and hFC2 layers. Rows: units in hFC2; columns: units in hFC1.]

hFC2 \ hFC1 |    64 |   128 |   256 |   512 |  1024 |  2048 |  4096
0           | 96.4% | 96.9% | 98.5% | 99.5% | 99.0% | 98.5% | 96.9%
64          | 98.0% | 97.5% | 99.0% | 98.5% | 98.0% | 95.4% | 95.9%
128         | 99.0% | 98.5% | 98.0% | 97.5% | 96.9% | 97.5% | 98.0%
256         | 98.0% | 98.5% | 97.5% | 99.0% | 97.5% | 96.9% | 96.9%
512         | 99.5% | 99.0% | 98.5% | 98.0% | 97.5% | 98.0% | 96.9%
1024        | 96.4% | 96.9% | 98.0% | 96.9% | 97.5% | 96.4% | 96.4%
2048        | 98.0% | 97.5% | 97.5% | 97.5% | 96.4% | 96.4% | 95.4%
4096        | 98.0% | 98.5% | 97.5% | 96.9% | 96.4% | 96.9% | 96.4%

[Fig. 5: Heatmaps showing the grid-search results (test-set accuracy) obtained on the (a) CCSN and (b) GCD datasets when different models were trained with different combinations of neurons in the hFC1 and hFC2 layers. Rows: units in hFC2; columns: units in hFC1.]

(a) CCSN:
hFC2 \ hFC1 |    64 |   128 |   256 |   512 |  1024 |  2048 |  4096
0           | 55.8% | 60.2% | 59.1% | 58.8% | 59.4% | 61.7% | 62.1%
64          | 58.7% | 60.1% | 59.7% | 59.0% | 61.0% | 64.1% | 64.7%
128         | 59.0% | 59.6% | 56.0% | 59.8% | 59.3% | 59.3% | 61.5%
256         | 59.4% | 59.6% | 60.7% | 60.5% | 61.2% | 57.9% | 63.3%
512         | 61.8% | 59.9% | 59.4% | 62.2% | 58.0% | 62.7% | 61.2%
1024        | 61.8% | 59.1% | 58.2% | 58.0% | 58.6% | 60.1% | 61.3%
2048        | 62.1% | 61.0% | 60.7% | 59.3% | 62.6% | 60.1% | 60.4%
4096        | 61.9% | 61.0% | 63.3% | 62.1% | 62.9% | 64.1% | 63.3%

(b) GCD:
hFC2 \ hFC1 |    64 |   128 |   256 |   512 |  1024 |  2048 |  4096
0           | 87.9% | 87.8% | 87.7% | 88.1% | 88.2% | 87.8% | 88.2%
64          | 87.9% | 87.7% | 86.7% | 88.3% | 88.1% | 87.8% | 87.9%
128         | 86.9% | 87.6% | 88.1% | 87.2% | 87.3% | 88.0% | 88.0%
256         | 88.3% | 87.7% | 86.7% | 87.2% | 87.8% | 87.6% | 87.9%
512         | 87.4% | 87.4% | 87.1% | 87.4% | 87.7% | 88.1% | 87.7%
1024        | 87.5% | 87.3% | 87.7% | 87.7% | 87.1% | 87.9% | 88.3%
2048        | 87.9% | 87.1% | 88.1% | 87.3% | 87.5% | 87.1% | 88.1%
4096        | 87.5% | 88.0% | 87.8% | 87.7% | 87.5% | 88.3% | 87.9%

5. CONCLUSION AND FUTURE WORK

This paper presents the effective use of transfer learning (TL) for the cloud image classification task. The paper notes that with TL, not only is the training time significantly reduced, but the results are also on par with or better than the state-of-the-art custom architectures for datasets with low cardinality. Additionally, the paper performs an extensive grid-search study to understand the impact of the number of neurons in the top dense layers (hFC) on the performance of TL models. The paper notes that for simpler datasets with fewer classes, fewer neurons are preferred in the hFC layers, whereas for more complex datasets with more classes, more hFC neurons produce better results. Consequently, this paper recommends tuning the number of hFC neurons in conjunction with other hyperparameters. Although this paper highlights the effectiveness of TL in cloud image classification problems with proper hyperparameter tuning, the authors would like to extend this study to other standard deep learning models apart from VGG-16. Also, it will be interesting to see if the importance of the number of hFC neurons persists in other architectures such as ResNet, Inception, and Xception.

6. REFERENCES

[1] F. Yuan, Y. H. Lee et al., "Comparison of radio-sounding profiles for cloud attenuation analysis in the tropical region," in Proc. IEEE International Symposium on Antennas and Propagation, 2014, pp. 259–260.

[2] S. Dev, F. M. Savoy et al., "Estimation of solar irradiance using ground-based whole sky imagers," in Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016, pp. 7236–7239.

[3] M. Jain, I. Gollini et al., "An extremely-low cost ground-based whole sky imager," in Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2021, pp. 8209–8212.

[4] M. Jain, N. Jain et al., "Detecting blurred ground-based sky/cloud images," in Proc. IEEE USNC-URSI Radio Science Meeting (Joint w/AP-S Symposium), 2021, pp. 62–63.

[5] S. Liu, L. Duan et al., "Ground-based remote sensing cloud classification via context graph attention network," IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–11, 2022.

[6] T. Kliangsuwan and A. Heednacram, "Feature extraction techniques for ground-based cloud type classification," Expert Systems with Applications, vol. 42, no. 21, pp. 8294–8303, 2015.

[7] S. Dev, Y. H. Lee et al., "Categorization of cloud image patches using an improved texton-based approach," in Proc. IEEE International Conference on Image Processing (ICIP), 2015, pp. 422–426.

[8] J. Zhang, P. Liu et al., "CloudNet: Ground-based cloud classification with deep convolutional neural network," Geophysical Research Letters, vol. 45, no. 16, pp. 8665–8672, 2018.

[9] M. Wang, S. Zhou et al., "CloudA: A ground-based cloud classification method with a convolutional neural network," Journal of Atmospheric and Oceanic Technology, vol. 37, no. 9, pp. 1661–1668, 2020.

[10] M. Hussain, J. J. Bird et al., "A study on CNN transfer learning for image classification," in Advances in Computational Intelligence Systems, A. Lotfi, H. Bouchachia et al., Eds. Cham: Springer International Publishing, 2019, pp. 191–202.

[11] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. International Conference on Learning Representations (ICLR), 2015.

[12] J. Deng, W. Dong et al., "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
