Transfer Learning For Cloud Image Classification
Mayank Jain1,2, Navya Jain3, Yee Hui Lee4, Stefan Winkler5, and Soumyabrata Dev1,2
1 The ADAPT SFI Research Centre, Ireland
2 School of Computer Science, University College Dublin, Ireland
3 Ram Lal Anand College, University of Delhi, India
4 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
5 Department of Computer Science, National University of Singapore
Fig. 1: Sample images from each class along with the number of images in each class of the SWIMCAT dataset (224, 89, 251, 135, and 85 images respectively).

Fig. 3: Network architecture based on the VGG-16 base convolutional layers used in this study. The number of units, i.e. h1 and h2 respectively, in the hFC1 and hFC2 layers is variable. The number of units in the output layer (n) depends on the class count of the dataset.

CCSN is a dataset of 2543 cloud images of size 400×400 pixels, which are divided into 11 classes. While the other two datasets are composed only of sky image patches obtained from the images captured by a GSI, the CCSN dataset is composed of landscapes, sceneries, both day and night images, and sky image patches. Such a large variety of images and multiple classes with only 139-340 images per class makes this dataset more difficult to train on.

GCD is the largest annotated dataset of sky patch images captured by a GSI to date. It consists of 19,000 cloud images of size 512×512 pixels, which are divided into 7 classes. One of the classes contains 'mixed' clouds with significantly fewer images and is not considered in the experiments of this paper. Some sample images from both the CCSN and GCD datasets are shown in Fig. 2.

For the first part, the SWIMCAT dataset was used for the experiments. Following the TL regime, the convolutional base layers of the VGG-16 architecture are used as is. In the original architecture of VGG-16, there are two fully connected (FC) hidden layers of 4096 neural units each, with ReLU activation. We will refer to these layers as hFC1 and hFC2, as shown in Fig. 3. These layers are followed by another FC output layer with softmax activation. However, we have observed that completely removing the hFC2 layer and reducing the number of units in the hFC1 layer can significantly improve the performance of the network. The results are compared with the state-of-the-art CloudNet [8] and CloudA [9] architectures. Furthermore, the standard VGG-16 network [11] was trained for the task with and without TL.
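For illustration, this reduced-head setup could be assembled roughly as follows. This is a minimal sketch assuming a Keras/TensorFlow implementation with ImageNet pre-trained weights [12]; the builder function, layer names, flattening step, and default arguments are our own illustrative choices and are not specified in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_tl_vgg16(num_classes, h1_units=512, h2_units=0,
                   input_shape=(125, 125, 3)):
    """Frozen VGG-16 convolutional base with a configurable FC head.

    h2_units=0 removes the hFC2 layer entirely, leaving only hFC1,
    which is the configuration that performed best on SWIMCAT.
    """
    base = tf.keras.applications.VGG16(
        include_top=False, weights="imagenet", input_shape=input_shape)
    base.trainable = False  # TL regime: the conv base is used "as is"

    model = models.Sequential([base, layers.Flatten()])
    model.add(layers.Dense(h1_units, activation="relu", name="hFC1"))
    if h2_units > 0:
        model.add(layers.Dense(h2_units, activation="relu", name="hFC2"))
    model.add(layers.Dense(num_classes, activation="softmax", name="output"))
    return model

# e.g., SWIMCAT has 5 classes; hFC1 with 512 units and no hFC2
model = build_tl_vgg16(num_classes=5, h1_units=512)
```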
For all experiments, cosine decay with restarts was used, with the initial learning rate set to $\eta_{init} = 10^{-6}$ and the minimum learning rate to $\eta_{min} = 10^{-7}$. Eq. 2 shows the computation of the learning rate $\eta$ in each epoch $ep$. Finally, early stopping was used to avoid overfitting by monitoring the categorical cross-entropy loss on the validation set. The held-out test set comprises 25% of the images, selected randomly from the dataset.

Fig. 2: Some sample images from the (a) CCSN and (b) GCD datasets.
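The corresponding training configuration could look roughly as follows. This is a sketch under our own assumptions: the optimizer, batch size, validation fraction, and early-stopping patience are illustrative and not specified in the text; `images` and `labels` are assumed to be pre-processed arrays with one-hot labels, and the learning-rate schedule of Eqs. 1-2 is sketched separately after the equations.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Hold out a randomly selected 25% of the images as the test set.
x_trainval, x_test, y_trainval, y_test = train_test_split(
    images, labels, test_size=0.25, random_state=42, stratify=labels)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-6),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Early stopping monitors the categorical cross-entropy loss on the validation set.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=50, restore_best_weights=True)

history = model.fit(x_trainval, y_trainval,
                    validation_split=0.2,        # validation fraction is an assumption
                    epochs=3000, batch_size=32,  # illustrative values
                    callbacks=[early_stop])
```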
In the pre-processing stage, images from all three datasets are resized to 125 × 125 pixels and then subjected to central

\[ t_{max} = 100 \times \left( 2.5^{\,\max(ep - 100,\; 0)} \right) \quad (1) \]

\[ \eta = \eta_{min} + (\eta_{init} - \eta_{min}) \times \frac{1 + \cos\left( \pi \frac{ep}{t_{max}} \right)}{2} \quad (2) \]
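A direct transcription of Eqs. 1 and 2 into a per-epoch Keras learning-rate callback might look like this. It is only a sketch: the function mirrors the two equations as written, with $\eta_{init} = 10^{-6}$ and $\eta_{min} = 10^{-7}$, and the overflow-safe form of the ratio is our own implementation detail.

```python
import math
import tensorflow as tf

ETA_INIT, ETA_MIN = 1e-6, 1e-7

def cosine_decay_with_restarts(ep):
    """Learning rate at epoch `ep`, transcribing Eqs. 1 and 2 as written."""
    # Eq. 1: t_max = 100 * 2.5 ** max(ep - 100, 0)
    # Eq. 2 needs the ratio ep / t_max; using a negative exponent lets a very
    # large t_max underflow to ratio = 0 instead of overflowing the float.
    ratio = (ep / 100.0) * (2.5 ** -max(ep - 100, 0))
    return ETA_MIN + (ETA_INIT - ETA_MIN) * (1 + math.cos(math.pi * ratio)) / 2

# Applied once per epoch through the standard Keras scheduler callback.
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch, lr: cosine_decay_with_restarts(epoch))
```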
For the second part, experiments are performed with similar settings as before, on all three datasets with all possible combinations of {hFC1, hFC2} with hFC1 ∈ {64,

Dense units in the hFC1 layer (columns) vs. dense units in the hFC2 layer (rows); test accuracy on the SWIMCAT dataset:

hFC2 \ hFC1 |   64  |  128  |  256  |  512  | 1024  | 2048  | 4096
          0 | 96.4% | 96.9% | 98.5% | 99.5% | 99.0% | 98.5% | 96.9%
         64 | 98.0% | 97.5% | 99.0% | 98.5% | 98.0% | 95.4% | 95.9%
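The second part thus amounts to a grid search over the two head sizes. A minimal sketch is given below, reusing the hypothetical `build_tl_vgg16`, `early_stop`, and `lr_schedule` defined above; the exact value sets, in particular for hFC2, are only partly recoverable from the text and are assumptions here.

```python
from itertools import product

# Candidate head sizes; h2 = 0 means the hFC2 layer is removed entirely.
H1_UNITS = [64, 128, 256, 512, 1024, 2048, 4096]
H2_UNITS = [0, 64, 128, 256, 512, 1024, 2048, 4096]  # assumed set

n_classes = 5  # 5 for SWIMCAT, 11 for CCSN, 6 for GCD (after dropping 'mixed')

results = {}
for h1, h2 in product(H1_UNITS, H2_UNITS):
    model = build_tl_vgg16(num_classes=n_classes, h1_units=h1, h2_units=h2)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_trainval, y_trainval, validation_split=0.2,
              epochs=3000, batch_size=32, verbose=0,
              callbacks=[early_stop, lr_schedule])
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    results[(h1, h2)] = acc  # test accuracy for this {hFC1, hFC2} combination
```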
Network Architecture | Avg. Training Time (per epoch) | Number of Epochs | Accuracy on Test Set
CloudNet [8] | 3.4 seconds | 2480 | 96.97% (98.33%)
CloudA [9] | 1.01 seconds | 1000 | 97.47% (98.47%)
VGG-16 (hFC1 and hFC2 with 4096 units each) | 1.15 seconds | 1249 | 95.45%
TL-VGG-16 (hFC1 and hFC2 with 4096 units each) | 1.15 seconds | 843 | 96.46%
TL-VGG-16 (only hFC1 with 512 units) | 1.12 seconds | 648 | 99.49%

Table 1: Average time per epoch and number of epochs required to train the different deep CNN architectures on the SWIMCAT dataset. The accuracy on the held-out test set is reported in the last column. The prefix TL- indicates that transfer learning was used. Values within parentheses '()' indicate the test-set accuracy claimed by the original authors.
Dense units in the hFC1 layer (columns) vs. dense units in the hFC2 layer (rows); test accuracy on the CCSN dataset:

hFC2 \ hFC1 |   64  |  128  |  256  |  512  | 1024  | 2048  | 4096
          0 | 55.8% | 60.2% | 59.1% | 58.8% | 59.4% | 61.7% | 62.1%
         64 | 58.7% | 60.1% | 59.7% | 59.0% | 61.0% | 64.1% | 64.7%

Dense units in the hFC1 layer (columns) vs. dense units in the hFC2 layer (rows); test accuracy on the GCD dataset:

hFC2 \ hFC1 |   64  |  128  |  256  |  512  | 1024  | 2048  | 4096
          0 | 87.9% | 87.8% | 87.7% | 88.1% | 88.2% | 87.8% | 88.2%
         64 | 87.9% | 87.7% | 86.7% | 88.3% | 88.1% | 87.8% | 87.9%
fication problems with proper hyperparameter tuning, the authors would like to extend this study to other standard deep learning models apart from VGG-16. Also, it will be interesting to see if the importance of the number of hFC neurons persists in other architectures such as ResNet, Inception, and Xception.

6. REFERENCES

[1] F. Yuan, Y. H. Lee et al., "Comparison of radio-sounding profiles for cloud attenuation analysis in the tropical region," in Proc. IEEE International Symposium on Antennas and Propagation, 2014, pp. 259–260.

[2] S. Dev, F. M. Savoy et al., "Estimation of solar irradiance using ground-based whole sky imagers," in Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016, pp. 7236–7239.

[3] M. Jain, I. Gollini et al., "An extremely-low cost ground-based whole sky imager," in Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2021, pp. 8209–8212.

[4] M. Jain, N. Jain et al., "Detecting blurred ground-based sky/cloud images," in Proc. IEEE USNC-URSI Radio Science Meeting (Joint with AP-S Symposium), 2021, pp. 62–63.

[5] S. Liu, L. Duan et al., "Ground-based remote sensing cloud classification via context graph attention network," IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–11, 2022.

[6] T. Kliangsuwan and A. Heednacram, "Feature extraction techniques for ground-based cloud type classification," Expert Systems with Applications, vol. 42, no. 21, pp. 8294–8303, 2015.

[7] S. Dev, Y. H. Lee et al., "Categorization of cloud image patches using an improved texton-based approach," in Proc. IEEE International Conference on Image Processing (ICIP), 2015, pp. 422–426.

[8] J. Zhang, P. Liu et al., "CloudNet: Ground-based cloud classification with deep convolutional neural network," Geophysical Research Letters, vol. 45, no. 16, pp. 8665–8672, 2018.

[9] M. Wang, S. Zhou et al., "CloudA: A ground-based cloud classification method with a convolutional neural network," Journal of Atmospheric and Oceanic Technology, vol. 37, no. 9, pp. 1661–1668, 2020.

[10] M. Hussain, J. J. Bird et al., "A study on CNN transfer learning for image classification," in Advances in Computational Intelligence Systems, A. Lotfi, H. Bouchachia et al., Eds. Cham: Springer International Publishing, 2019, pp. 191–202.

[11] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. International Conference on Learning Representations (ICLR), 2015.

[12] J. Deng, W. Dong et al., "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.