Mapping paddy rice by the object-based random forest method using time
series Sentinel-1/Sentinel-2 data
Rice is one of the world’s major staple foods, especially in China. In this study, we propose an object-based random forest (RF) method for paddy rice mapping using time series Sentinel-1 and Sentinel-2 data. First, the Robust Adaptive Spatial Temporal Fusion Model (RASTFM) was used to blend MODIS and Sentinel-2 data to obtain multi-temporal Sentinel-2 data. Next, the Savitzky-Golay (S-G) filter was applied to smooth the time series Sentinel-2 NDVI data, and phenological parameters were derived from the filtered NDVI time series using the threshold method. Then, the optimum feature combination for paddy mapping was selected from the Sentinel-2 MSI images, time series Sentinel-2 NDVI, phenology data and time series Sentinel-1 SAR backscattering images using the JBh distance. Finally, an object-based RF classifier was used to extract paddy rice with the optimum feature combination. The results show that the Sentinel-2 NDVI time series fused with RASTFM is highly correlated with the original Sentinel-2 image. With the optimum feature combination and the object-based RF method, the overall accuracy and Kappa coefficient of the classification results are higher than 95% and 0.93, respectively. The proposed method can provide technical support for rice mapping in areas with frequent cloudy and rainy weather.
Fig. 1. Location of the study area (Sentinel-2 image on April 18 of true color composite).
et al. 2015; Zhang et al. 2015). However, its low spatial resolution (250–500 m) may compromise detailed paddy information in regions with heavy heterogeneity. Data with a higher spatial resolution, such as Landsat and Sentinel-2 data, may increase the accuracy of paddy rice mapping (Qin et al. 2015; Dong et al. 2015), especially Sentinel-2 imagery, which has spatial resolutions of 10 m, 20 m and 60 m, a temporal resolution of 5 days and 13 bands. However, Sentinel-2 is vulnerable to rainy and cloudy weather, which makes it difficult to obtain enough clear imagery for rice paddy monitoring in tropical and subtropical areas (Whyte et al., 2018).

Fusing high-spatial/low-temporal resolution data with high-temporal/low-spatial resolution data (e.g., Landsat and MODIS) can generate high-temporal/high-spatial resolution data. Many spatial-temporal fusion models have been proposed and applied in land use/cover research, such as the spatial and temporal adaptive reflectance fusion model (STARFM), the enhanced STARFM, and the spatio-temporal integrated temperature fusion model (STITFM) (Hilker et al., 2009; Walker et al., 2012; Emelyanova et al., 2013; Jia et al., 2014). However, these fusion algorithms have some limitations (Dao and Liou, 2015; Wu et al., 2012; Wang and Atkinson, 2017). First, the complexity of land cover change in homogeneous or heterogeneous landscapes is not considered comprehensively. Land cover change includes invisible change and shape change. Second, most spatial-temporal fusion algorithms need two or more pairs of low and high spatial resolution images as prior information, which are often unavailable due to cloud contamination, sensor aging or failure, satellite mission scheduling and other issues. The robust adaptive spatial temporal fusion model (RASTFM) (Zhao et al., 2018), which fully considers the aforementioned problems, can achieve a higher prediction accuracy than traditional fusion models. In this study, we used RASTFM to generate a time series of Sentinel-2-like NDVI data.

This study developed a method for mapping paddy rice with high accuracy in subtropical areas. Although a good classification result of wetland vegetation can be obtained from time series Sentinel-2 images, time series synthetic aperture radar (SAR) data may provide higher vegetation classification accuracy, as the radar signal can penetrate clouds, rain, vegetation and soil. Many studies have demonstrated that combining optical and radar data can increase the classification accuracy in tropical areas (Jin et al., 2017; Tricht et al., 2018; Hakdaoui et al., 2019). Thus, time series Sentinel-2 and Sentinel-1 imagery were employed to map paddy rice in the Dongting Lake area of Hunan Province, China, a humid region. Field data were used to validate the accuracy of the paddy map.

2. Study area and data

2.1. Study area

The Dongting Lake wetland (latitudes 28°30′ to 29°31′ N and longitudes 111°40′ to 113°10′ E) is located in the middle reaches of the Yangtze River, on the south bank of the Jingjiang River section. The area consists mainly of low plains, with elevations below 50 m. It has a subtropical monsoon climate, with the annual average temper-
Y. Cai et al. / Advances in Space Research 64 (2019) 2233–2244 2235
ature between 15.8 °C and 17.4 °C, and an annual precipitation between 1200 mm and 1500 mm. This area is an important grain production center of China. The growing season of paddy rice is April–October (Zhang et al., 2017). Natural wetland vegetation, mainly sedge and reed, also grows here, with growing seasons of March–October and March–November, respectively.

2.2. Data and processing

The Sentinel-2 data has 13 bands, including visible, near infrared and short-wave infrared bands. The five near infrared bands (4 red edge bands and 1 NIR band) can be used for vegetation monitoring and analysis. Clear Sentinel-2 images (<2% cloud cover) over the study area (path/row: N0205_R075_T49RFM) were acquired in 2018 (Fig. 2). Terrain correction and atmospheric correction were applied to the Level-1C data using the SRTM DEM (http://) and Sen2Cor Version-2.5.5 (Luis et al., 2016; Available online: party-plugins-2/sen2cor/). The spectral bands with 10 m spatial resolution (Blue, Green, Red and NIR) were used in this study.

We selected 23 16-day composite vegetation index products (MOD13Q1) between January and December 2018, acquired from the United States Geological Survey (USGS). These images have a spatial resolution of 250 m in the sinusoidal projection. After removing the invalid values using the pixel reliability images, all of the MOD13Q1 time series were transformed into the UTM (WGS84) projection, zone 49 (North), the same as Sentinel-2 MSI (Zhang and Lin, 2018).

In addition, 29 Sentinel-1 GRD images (C-band, dual polarization with VV and VH) from 2018 with a spatial resolution of 10 m were downloaded from the European Space Agency (ESA). These images were processed by orbital correction, thermal noise removal, radiometric calibration, speckle filtering and range-Doppler terrain correction in the Sentinel Application Platform (SNAP) software provided by ESA (Schubert et al., 2015; Diñeiro et al., 2019). In this study, we used the backscattering data (σ⁰VV and σ⁰VH) to extract the wetland vegetation.

The field data used in this study were collected in May 2018. Considering the vegetation growth and distribution, we selected 186 sample sites and obtained their geographical coordinates by GPS. Using stratified sampling, we classified the vegetation in the sample sites into sedge, reed and paddy rice. Each sample site has an area of 10 m × 10 m, representing a typical vegetation community with homogeneous vegetation and land cover types. The land use maps of Hunan Province (Hunan Provincial Department of Natural Resources, 2018) and Google Earth images were also used for training sample selection.

3. Method

The proposed wetland vegetation classification process has three main steps (Fig. 3). First, the RASTFM model was employed to generate a synthetic NDVI with high spatial and temporal resolution from Sentinel-2 and MODIS images. Second, a separability analysis method was used to select the optimum feature combination from the Sentinel-2 MSI images, time series Sentinel-2 NDVI, phenology data and time series Sentinel-1 SAR data. Finally, the wetland vegetation types were extracted with the object-based RF method.

3.1. The RASTFM model for time series Sentinel-2 reflectance data

The robust adaptive spatial temporal fusion model (RASTFM) employs a pair of fine and coarse resolution images obtained on the same day, and a coarse resolution image obtained on the prediction day, to predict the fine resolution image on the prediction day. RASTFM considers both the non-shape change prediction of land cover and the shape change prediction, which is an advantage over other spatial-temporal fusion models that consider only one type of change. The RASTFM model consists of five parts: relative radiometric normalization and radiometric de-normalization, non-shape change prediction, shape change detection, shape change prediction and high-pass modulation. The specific process and implementation steps can be found in Zhao et al. (2018).

We selected the Sentinel-2 data from day of year (DOY) 108 and the MODIS data from DOY113 (the time nearest to the acquisition date of the Sentinel-2 data) as the input base pair images, and the MODIS data from DOY273 as the coarse resolution image on the prediction day to predict the Sentinel-2 data on DOY278 (Table 1).

3.2. Sentinel-2 NDVI time series reconstruction and phenology data extraction

In order to eliminate the noise in the fused time series Sentinel-2 NDVI data caused by cloud contamination and errors during prediction, the Savitzky-Golay (S-G) fil-
Fig. 2. Date of the remote sensing imagery (S-1 and S-2 represent Sentinel-1 and Sentinel-2, respectively).
ter was employed (Fig. 4). This filter can clearly describe minor changes in the study area, despite the complex crop types and fragmented plots (Jönsson and Eklundh, 2004; Kontgis et al., 2015; Zhang et al., 2017). The S-G filtering was performed with a locally adapted moving window, which uses a polynomial least-squares regression to fit the time-series data.

The phenology data were derived from the Sentinel-2 NDVI time series by the threshold method (Jönsson and Eklundh, 2004), which assumes that a phenological phenomenon occurs when the NDVI values exceed a given threshold.

3.3. Optimum feature combination selection

The optimum feature combination analysis was used to find the image combination that achieves the highest classification accuracy with the lowest data redundancy and time consumption (Dobson et al., 1992; San Miguel-Ayanz and Biging, 1997; Murakami et al., 2001; Van Niel et al., 2005). A separability analysis method (JBh) was used to select the optimum feature combination from the Sentinel-2 MSI images, time series Sentinel-2 NDVI, phenology data and time series
Fig. 4. A pixel of paddy rice of the Sentinel-2 NDVI time series before and after the S-G filtering.
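The smoothing step illustrated in Fig. 4 can be sketched in a few lines. The example below is a minimal illustration only: it uses a fixed 5-point window with a quadratic polynomial (the standard weights −3, 12, 17, 12, −3 divided by 35), whereas the study describes a locally adapted moving window. The function name and the NDVI values are hypothetical and are not taken from the study data.

```python
def savgol_5pt_quadratic(series):
    """Smooth a sequence with a fixed 5-point quadratic Savitzky-Golay filter.

    The first and last two samples are returned unchanged because a full
    5-sample window is not available at the ends of the series.
    """
    weights = (-3, 12, 17, 12, -3)  # standard 5-point quadratic S-G weights
    norm = 35
    smoothed = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(weights, window)) / norm
    return smoothed


# Hypothetical 16-day NDVI samples; the dip at index 3 mimics cloud contamination.
ndvi = [0.21, 0.24, 0.30, 0.12, 0.55, 0.68, 0.74, 0.71, 0.60, 0.42, 0.30, 0.25]
ndvi_smoothed = savgol_5pt_quadratic(ndvi)
```

A fixed window keeps the sketch short. An adaptive window, as used in the study, would shrink or widen the window depending on how well the local polynomial fits the data.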
Table 2
Phenological parameters.
Phenological parameter        Definition
Start of the season (SOS)     Time at which the left edge of the NDVI curve has increased to 20% of the seasonal amplitude, measured from the left minimum level.
End of the season (EOS)       Time at which the right edge of the NDVI curve has decreased to 20% of the seasonal amplitude, measured from the right minimum level.
Length of the season (LOS)    EOS − SOS
Max of NDVI (MON)             The largest NDVI value of the growing season.
Amplitude of NDVI (AON)       The difference between the maximum NDVI and the base level.
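The definitions in Table 2 can be turned into a short calculation on a smoothed NDVI series. The sketch below is an illustration under stated assumptions, not the processing chain used in the study: it handles a single growing season, takes the first or last sample that reaches the 20% threshold without interpolating between composites, and assumes the base level is the mean of the left and right minima. The function name and the NDVI values are hypothetical.

```python
def phenology_parameters(doys, ndvi, fraction=0.2):
    """Derive SOS, EOS, LOS, MON and AON from one single-season NDVI curve."""
    n = len(ndvi)
    peak_idx = max(range(n), key=lambda i: ndvi[i])
    peak = ndvi[peak_idx]

    # Minimum levels on each side of the seasonal peak.
    left_min = min(ndvi[:peak_idx + 1])
    right_min = min(ndvi[peak_idx:])
    left_min_idx = ndvi.index(left_min, 0, peak_idx + 1)
    right_min_idx = max(i for i in range(peak_idx, n) if ndvi[i] == right_min)

    # Thresholds at a fraction of the amplitude above each minimum.
    left_thr = left_min + fraction * (peak - left_min)
    right_thr = right_min + fraction * (peak - right_min)

    # SOS: first sample on the rising limb at or above the left threshold.
    sos_idx = next(i for i in range(left_min_idx, peak_idx + 1)
                   if ndvi[i] >= left_thr)
    # EOS: last sample on the falling limb at or above the right threshold.
    eos_idx = next(i for i in range(right_min_idx, peak_idx - 1, -1)
                   if ndvi[i] >= right_thr)

    base = (left_min + right_min) / 2.0  # assumed definition of the base level
    return {
        "SOS": doys[sos_idx],
        "EOS": doys[eos_idx],
        "LOS": doys[eos_idx] - doys[sos_idx],
        "MON": peak,
        "AON": peak - base,
    }


# Hypothetical smoothed NDVI for 23 composites at 16-day spacing (DOY 1, 17, ..., 353).
doys = list(range(1, 366, 16))
ndvi = [0.20, 0.20, 0.21, 0.22, 0.22, 0.24, 0.28, 0.35, 0.48, 0.62, 0.74, 0.80,
        0.82, 0.78, 0.66, 0.50, 0.36, 0.28, 0.24, 0.22, 0.21, 0.20, 0.20]
params = phenology_parameters(doys, ndvi)
```

Double cropping rice has two NDVI peaks, so a real implementation would first split the year into seasons and apply the same rule to each one.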
Sentinel-1 SAR backscattering images (a total of 105 features). Considering the separability of multiple classes, classes are given different weights. JBh (Eq. (1)) was used to select optimal temporal windows for multiple classes, and it gives great importance to classes with high a priori probabilities in the selection process.

J_{Bh} = \sum_{i=1}^{N} \sum_{j>i}^{N} \sqrt{p(w_i)\, p(w_j)}\; JM^{2}(i,j)    (1)

where N is the number of classes, and p(w_i) and p(w_j) are the a priori probabilities of classes i and j, respectively. They were calculated using the combination of training and validation samples. JM is the Jeffreys-Matusita distance, which is commonly used to express the separability of classes. In this section, the optimum feature combination was selected using MATLAB.

3.4. Wetland vegetation classification

accuracy (UA). We also classified the prototype objects with several other classifiers: nearest neighbor (NN), maximum likelihood classification (MLC), classification and regression tree (CART), and support vector machine (SVM). For comparison, pixel-based NN, MLC, CART, SVM and RF classifiers were also used for classification.

4. Result and analysis

4.1. The fused Sentinel-2 NDVI data

The MODIS NDVI, fused Sentinel-2 NDVI and actual Sentinel-2 NDVI of the study area on DOY278 are shown in Fig. 6. The fused NDVI image generated by RASTFM has a much higher spatial resolution and more detail than the MODIS image. The fused NDVI and actual Sentinel-2 NDVI images are visually similar, even at the junctions of complex land covers, such as along the river bench, near
Fig. 6. (a) The MODIS NDVI of DOY (Day of year) 273, (b) the fused NDVI generated by the RASTFM model of DOY 273, and (c) the actual Sentinel-2 NDVI of DOY 278.
the lake shore, and in some sparse vegetation areas. The accuracy assessment results show that the fused image derived from RASTFM has a high correlation with the actual Sentinel-2 image (R2 = 0.934, RMSE = 0.0729) (Fig. 7). Therefore, the fused NDVI images are reliable enough for the identification of paddy rice. We then used the RASTFM algorithm to predict time series Sentinel-2-like images (16 d interval and 10 m resolution).

Fig. 7. Scatterplots between the actual Sentinel-2 NDVI and the fused NDVI images of DOY 278.

4.2. The smoothed Sentinel-2 NDVI time series and phenology data

As the smoothed Sentinel-2 NDVI time series in Fig. 8 shows, forest has the most stable and longest growing season, followed by reed. The NDVI value of sedge grows fast from DOY1 to DOY97, but declines quickly during DOY113–DOY209, reaching the lowest point of the year on DOY209. The double cropping rice has two growing peaks, on DOY145 and DOY241. The single season rice grows fast before DOY209, then the NDVI plummets in the rest of the growing period. The NDVI values of reed grow relatively quickly between DOY1 and DOY113, but the growth becomes slow and stable from DOY129 to DOY257. As for forest, the NDVI value shows no obvious change. The NDVI time series of different vegetation have distinctive features, especially during critical growth stages, so they can be used to identify vegetation.

The phenological parameters of different vegetation, as well as the NDVI time series, differ during the critical growth season, especially in critical phenological stages, which have been proven useful for vegetation classification (Dymond et al., 2002; Kiptala et al., 2013; Bargiel, 2017). Five parameters were extracted to show the phenological differences among the vegetation types of the whole study area, which may increase the mapping accuracy (Fig. 9). Forest shows a long growing season (~191 days) with an early start (DOY82) and a late end (DOY273). Reed has a similar start of season (SOS) and end of season (EOS) to forest. The SOS of both single and double paddy rice is late (DOY113 and 98, respectively), but the EOS is early (DOY256 and 263, respectively). Sedge begins to grow around March (DOY 75) and finishes growing around October (DOY 274). The SOS and EOS of reed are similar to those of sedge. Forest has the largest maximum NDVI (0.96), while double cropping rice has the smallest (0.75). The NDVI values of other vegetation types are very close. In contrast, forest has the minimum seasonal NDVI amplitude, while rice and sedge have large seasonal NDVI amplitudes.

The JBh distances for different numbers of feature combinations are shown in Fig. 10. We selected 22 features as the
Fig. 9. Five phenological parameters of the study area (a) SOS, (b) EOS, (c) LOS, (d) MON and (e) AON.
optimal combination for paddy rice classification, because additional images contribute little to separability (i.e., JBh increased by less than 0.2). Considering data redundancy and computing time, the Sentinel-2 MSI images on February 12, April 18, October 07 and November 13, the fused Sentinel-2 NDVI on April 7, May 25, June 10, July 12, August 13, September 30 and October 16, the phenology data of SOS and LOS, and the Sentinel-1 backscattering data (σ⁰VV and σ⁰VH) on April 12, May 30, July 17, August 10 and September 27 were used to map the paddy rice. The JBh was 9.82 when using the optimum feature combination, and the JM distances among the vegetation types were above 1.9 (Table 3). Because the forest phenological characteristics are different from other vegetation, the JM distance between forest and other
Table 3
J-M distance among the four vegetation types on different combination of Landsat NDVI data.
Measure         Class pair                                     Value
JBh             All classes                                    9.826
J-M distance    Sedge and double cropping rice                 2.000
J-M distance    Sedge and single season rice                   2.000
J-M distance    Reed and double cropping rice                  2.000
J-M distance    Reed and single season rice                    1.989
J-M distance    Double cropping rice and single season rice    1.994
J-M distance    Reed and sedge                                 1.982
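The way pairwise J-M distances combine into a single JBh score can be illustrated with a small calculation. The sketch below assumes that Eq. (1) has the form JBh = Σ_i Σ_{j>i} sqrt(p(w_i) p(w_j)) JM²(i, j). It uses the six pairwise distances from Table 3 with equal a priori probabilities of 0.25, which is an assumption made only for illustration. It is therefore not expected to reproduce the reported JBh of 9.826, which was computed with the study's own class set and a priori probabilities. The function and variable names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def jbh_distance(priors, jm):
    """Weighted multi-class separability from pairwise J-M distances.

    priors: dict mapping class name -> a priori probability
    jm:     dict mapping frozenset({class_a, class_b}) -> J-M distance
    """
    total = 0.0
    for a, b in combinations(priors, 2):
        weight = sqrt(priors[a] * priors[b])
        total += weight * jm[frozenset((a, b))] ** 2
    return total


# Pairwise J-M distances from Table 3; the equal priors are an assumption.
priors = {"sedge": 0.25, "reed": 0.25, "DCR": 0.25, "SSR": 0.25}
jm = {
    frozenset(("sedge", "DCR")): 2.000,
    frozenset(("sedge", "SSR")): 2.000,
    frozenset(("reed", "DCR")): 2.000,
    frozenset(("reed", "SSR")): 1.989,
    frozenset(("DCR", "SSR")): 1.994,
    frozenset(("reed", "sedge")): 1.982,
}
score = jbh_distance(priors, jm)
```

In a feature selection loop, the same function would be evaluated for each candidate feature combination, and candidates would be added only while the score keeps increasing noticeably.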
Table 4
Classification accuracies of the two methods with several classifiers.
Classifier   Overall accuracy   Kappa coefficient   UA (DCR)   PA (DCR)   UA (SSR)   PA (SSR)
Pixel-based method
NN           85.48%             0.84                88.42%     86.93%     79.25%     86.48%
MLC          87.26%             0.86                88.26%     84.69%     81.33%     87.56%
CART         90.15%             0.88                90.65%     90.15%     85.74%     91.28%
SVM          89.23%             0.88                90.36%     87.24%     85.26%     90.85%
RF           90.64%             0.89                93.42%     91.63%     88.65%     94.63%
Object-based method
NN           85.24%             0.84                88.68%     86.24%     78.32%     86.52%
MLC          88.35%             0.85                88.67%     88.38%     83.52%     89.46%
CART         91.63%             0.90                92.54%     91.33%     87.45%     93.62%
SVM          89.19%             0.88                90.15%     88.26%     85.64%     90.49%
RF           95.26%             0.93                96.76%     95.58%     92.18%     96.85%
*The SSR and DCR represent single season rice and double cropping rice, respectively.
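The four measures reported in Table 4 (overall accuracy, Kappa coefficient, user's accuracy and producer's accuracy) all follow from a confusion matrix. The sketch below shows the standard definitions, assuming rows hold the reference (field) classes and columns hold the mapped classes. The matrix values are hypothetical and are not the study's validation data.

```python
def accuracy_metrics(cm):
    """Compute OA, Kappa, PA and UA from a square confusion matrix.

    cm[i][j] is the number of samples whose reference class is i and
    whose mapped (predicted) class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                              # reference totals
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # mapped totals
    diag = [cm[i][i] for i in range(k)]

    overall = sum(diag) / total
    # Chance agreement expected from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    producers = [diag[i] / row_sums[i] for i in range(k)]  # PA, per reference class
    users = [diag[j] / col_sums[j] for j in range(k)]      # UA, per mapped class
    return overall, kappa, producers, users


# Hypothetical two-class matrix: class 0 = paddy rice, class 1 = non rice.
cm = [[45, 5],
      [3, 47]]
overall, kappa, producers, users = accuracy_metrics(cm)
```

Producer's accuracy reflects omission errors (reference rice that was missed), while user's accuracy reflects commission errors (mapped rice that is not rice), which is why the table reports both for each rice type.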
Table 5
Statistically significant comparison between pixel-based and object-based methods.
Comparison methods                        McNemar’s chi-squared   p-Value
NN Pixel-based vs. NN Object-based        1.8726                  0.2583
MLC Pixel-based vs. MLC Object-based      2.3847                  0.0864
CART Pixel-based vs. CART Object-based    3.7912                  0.0632
SVM Pixel-based vs. SVM Object-based      2.9358                  0.5247
RF Pixel-based vs. RF Object-based        9.0000*                 0.0015*

Table 6
Statistically significant comparison between object-based classifiers.
Comparison methods                        McNemar’s chi-squared   p-Value
NN Object-based vs. MLC Object-based      0.2748                  0.5643
NN Object-based vs. CART Object-based     1.8852                  0.0895
NN Object-based vs. SVM Object-based      1.7561                  0.1679
NN Object-based vs. RF Object-based       6.4738*                 0.0082*
MLC Object-based vs. CART Object-based    1.8257                  0.1746
MLC Object-based vs. SVM Object-based     1.8645                  0.1680
MLC Object-based vs. RF Object-based      5.7825*                 0.0316*
CART Object-based vs. SVM Object-based    3.3533                  0.0645
CART Object-based vs. RF Object-based     2.6253                  0.0898
SVM Object-based vs. RF Object-based      7.2537*                 0.0068*
The symbol “*” in Tables 5 and 6 indicates that the difference is statistically significant.

5. Discussion

The Dongting Lake area is a plain formed by sediment deposition from the Yangtze River, and it is also an important natural wetland in China. In the past 50 years, a major part of the natural wetland has been converted to farmland (paddy fields) or other artificial land due to human activities (Wang et al., 2014; Zhang et al., 2018). Recently, with the implementation of ecological protection policies, some paddy fields have gradually reverted to natural wetlands. Thus, the fragmentation of vegetation patches has become increasingly serious. Monitoring paddy field change in the Dongting Lake area is of great significance to the regional ecological environment and its management.

Many studies have employed MODIS, Landsat and GF (16 m) data to obtain paddy information (Zhang and Lin, 2018). Paddy maps generated from Sentinel-2 and Sentinel-1 data have a higher resolution and more detailed spatial information. In particular, they can provide many details in a complex heterogeneous region with different vegetation types. On these maps, the fragmented paddy fields in the Dongting Lake area can be more accurately and easily separated from other vegetation types.

Using time series optical images is an effective way to identify paddy rice accurately in some regions. However, it is difficult to obtain enough clear optical images with
Table 7
Classification accuracies with different feature combinations.
Feature combination                               Class        Producer accuracy   User accuracy   Overall accuracy   Kappa coefficient
Only Sentinel-2 MSI                               Paddy rice   87.15%              83.52%          84.28%             0.75
                                                  Non rice     86.34%              87.36%
Sentinel-2 MSI + NDVI                             Paddy rice   88.69%              85.74%          86.53%             0.81
                                                  Non rice     90.25%              89.64%
Sentinel-2 MSI + NDVI + phenology                 Paddy rice   92.36%              88.72%          90.12%             0.88
                                                  Non rice     91.83%              91.66%
Sentinel-2 MSI + NDVI + phenology + Sentinel-1    Paddy rice   95.62%              94.89%          95.26%             0.93
                                                  Non rice     97.58%              96.37%
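The McNemar statistics used in Tables 5 and 6 to compare classifiers are computed from the validation samples on which two classifiers disagree. The sketch below shows the uncorrected form of the test with one degree of freedom. Whether the study applied a continuity correction is not stated, so this form is an assumption, and the counts are hypothetical.

```python
from math import erfc, sqrt


def mcnemar_test(b, c):
    """McNemar's chi-squared test without continuity correction.

    b: samples classified correctly by method A but wrongly by method B
    c: samples classified wrongly by method A but correctly by method B
    Returns the chi-squared statistic and its p-value (1 degree of freedom).
    """
    if b + c == 0:
        raise ValueError("the two methods never disagree; the test is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical disagreement counts between two classifiers.
chi2, p_value = mcnemar_test(12, 3)
significant = p_value < 0.05
```

Only the discordant samples enter the statistic, so two classifiers with similar overall accuracy can still differ significantly if their errors fall on different samples.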
high spatial and temporal resolutions in rainy and cloudy weather. A spatial and temporal fusion model is a feasible way to generate images with high spatial and temporal resolution, and we employed RASTFM to obtain a Sentinel-2 time series with high accuracy in this study. Although optical multispectral data can achieve high classification accuracy, they are insufficient to distinguish wetland vegetation, which is typically composed of multiple vegetation types with similar spectral characteristics. Many studies have shown that SAR data can replace optical images of the same resolution in areas with heavy cloud, mist and smoke because of their immunity to weather conditions (Erinjery et al., 2018). Recent studies have shown that combining optical and SAR data can improve the extraction accuracy of land cover information in heavily heterogeneous areas (Veloso et al., 2018; Whyte et al., 2018).

Both object-based and pixel-based methods were used to identify paddy rice in this study, and the object-based classifiers improved the classification accuracy. Paddy fields are severely fragmented due to the persistent impacts of climate change and anthropogenic factors. Recently, the

fication, the contribution of the band reflectance in Sentinel-2 and Sentinel-1 data to the FVC estimation needs further research. Additionally, RF was chosen for paddy identification. The Convolutional Neural Network (CNN), one of the most popular deep learning algorithms, has been widely used in remote sensing image classification and has achieved good results (Zhang et al., 2018). Therefore, the CNN algorithm may increase the paddy rice mapping accuracy using time series remote sensing data with high spatial resolution.

Acknowledgement

The authors would like to thank the editors and anonymous reviewers for their valuable comments, which significantly improved this manuscript. This work was supported in part by the National Natural Science Foundation of China (41901385), and in part by the China Postdoctoral Science Foundation (2019M652815).
