1 Introduction

The United Nations General Assembly (UNGA) declared 2020 the International Year of Plant Health (IYPH), later extended until 1 July 2021 (FAO 2020). Further, to spread awareness that keeping plants healthy boosts economic development, the United Nations declared 12 May the International Day of Plant Health (IDPH) (Geneva 2021; Manida 2022). Crop yields need to expand by approximately 60% by 2050 to feed the world's projected population of 10 billion. Unfortunately, as reported in 2021, pathogens damaged 30% of rice crops, 23% of maize crops, 22% of wheat crops, 21% of soybean crops, and 17% of potato crops (Ristaino et al. 2021).

Recently, Marieta Sakalian, a United Nations Environment Programme (UNEP) expert, warned, “Plant health is increasingly under threat. Climate change and human activities have degraded ecosystems, reduced biodiversity, and created new niches where pests can thrive” (FAO 2020). Plant disease is a manifestation of the visible and invisible responses of plant cells and tissues to a pathogen or environmental factor that may damage plant parts or lead to their death (Lucas 2020). Some pathogens are easily identified from the visible symptoms they produce on hosts, but others require a more elaborate plant disease detection methodology. The early detection of a crop disease, before it reaches its peak, may aid in taking the required steps to halt its transmission (Ristaino et al. 2021). Apart from timely crop disease detection, it is also important to identify the type of disease, as disease control methods differ significantly from one type to another depending on the pathogens, the hosts, and the biological and chemical factors involved.

The National Academy of Sciences (NAS) underlined the need for breakthrough technologies for earlier detection and control of crop diseases (NASEM 2019). The automated image-based identification of crop diseases using deep neural networks has gained the attention of researchers (Libo et al. 2019; Alguliyev et al. 2021; Chug et al. 2022). However, deep neural networks involve learning millions of parameters and tuning several hyper-parameters, demanding significant computation time and large amounts of training data. Therefore, the present work employs the Extreme Learning Machine (ELM), a Single hidden Layer Feed-forward Neural Network (SLFNN) (Huang et al. 2004). Another reason for using ELM is its faster convergence with just one tunable parameter (the number of hidden neurons). Identifying relevant image patterns helps differentiate among the various disease classes. For this purpose, spatial and frequency-based feature extraction approaches are applied to crop images segmented using k-means. Based on the extracted features, an ELM-based multi-class classification model is built to predict the class to which a diseased crop image belongs. The proposed framework is named Interpretable Leaf Disease Detector (I-LDD) in the present work.

1.1 Need for interpretability in leaf disease identification

Although there has been significant progress in detecting diseased leaves using machine learning approaches, the outcomes of these approaches have remained opaque to the end user. Recently, the machine learning community has developed methods to make the outcomes of Machine Learning (ML) models interpretable for the end user, thus boosting end users' trust in ML systems (Stiglic et al. 2020; Vellido 2020). In this work, Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al. 2016) is employed to explain the outcome of the proposed framework (I-LDD) to naive users.

1.2 Related research work

The reliable prediction of infection recurrence and its intensity is crucial for good crop yields (Libo et al. 2019). Moreover, an accurate and consistent crop infection assessment is essential for effective crop protection in fields. Visual assessment by human raters, microscopic examination, and molecular, serological, and microbial diagnostic procedures are common approaches for diagnosing and detecting crop diseases (Khakimov et al. 2022). Microscopic examination uses pathogen morphology to diagnose leaf diseases (Moumni et al. 2020). Pathogen isolates that vary in toxicity or are sensitive to a particular pesticide can be detected using molecular and serological approaches (Khakimov et al. 2022; Zhang et al. 2020). However, the extent of pathogenic organisms in a crop is often not directly related to the severity of the observable disease (Bock et al. 2020). Traditional visual estimation detects a disease based on various disease symptoms or apparent pathogen indicators (Zhang et al. 2020). Such visual identification of crop diseases, even if carried out by skilled professionals, is time-intensive and prone to errors and bias (Merot et al. 2020).

The applicability of the traditional methods mentioned above is restricted for several reasons—they are time-consuming, labor-intensive, involve a skilled technician, and require a research laboratory set-up. Recent research has introduced sensor-based approaches and automated image-based analysis for crop disease detection, identification, and quantification (Udutalapally et al. 2020). Farmers seek a solution that aids in identifying pathogens in plants in real time, allowing immediate intervention and preventive treatments to control the disease and minimize yield loss. Identifying the right pathogen enables farmers to save considerable money in pathogen control expenses by allowing them to localize spraying instead of preemptively spraying vast areas of agricultural fields (Klassen & Vreysen 2021).

According to the preceding discussion, image-based diagnosis, including visualization (Saleem et al. 2019), image information extraction (Xian & Ngadiran 2021), segmentation (Pallathadka et al. 2022), and classification (Alagumariappan et al. 2020), has emerged as a highly effective way of detecting crop diseases. The following works on image-based disease detection in plants have been reported in the literature.

Pallathadka et al. (2022) used resizing and histogram equalization as preprocessing steps and then segmented leaf images using the k-means segmentation method. They used the Discrete Wavelet Transform (DWT) for feature extraction and Principal Component Analysis (PCA) to reduce the feature space. The reduced features were provided to various classifiers—Support Vector Machine (SVM), Naive Bayes, and Convolutional Neural Network (CNN), achieving accuracy values of 96.2, 78.8, and 91.3%, respectively. Roy et al. (2023) proposed PCA DeepNet, leveraging a Generative Adversarial Network (GAN), PCA, a CNN, and a Faster Region-Based Convolutional Neural Network (F-RCNN) for the real-time identification of tomato leaf diseases. They used a GAN-based augmentation approach after preprocessing the leaf image data and subsequently applied PCA-based feature extraction. A customized CNN classifier was then used to classify the data, and the classified outputs were refined using the F-RCNN. Their approach outperformed other deep learning architectures, achieving an accuracy of 99.60%. Krishnan et al. (2022) carried out their study on banana leaf images, using resizing and filtering for preprocessing. Further, they segmented the leaf regions using Total Generalized Variation Fuzzy C-means clustering (TGVFCMS) and proposed a CNN-based classifier that attained an accuracy of 93.45%. Xian & Ngadiran (2021) proposed an approach using the Extreme Learning Machine (ELM) to detect disease in tomato leaf images. Preprocessing consisted of resizing and color space conversion to Hue Saturation Value (HSV), and HSV color segmentation was used to segment the region of interest (ROI). Further, features were extracted based on the HSV histogram, Haralick features, and color moments, and the extracted feature pool was fed to an ELM to identify the leaf infection, achieving an accuracy of 84.94%. Aqel et al. (2021) used 73 leaf images preprocessed by contrast enhancement, resizing to \(256 \times 256\) pixels, and smoothing via a smoothing filter. For segmentation, they used the k-means algorithm and extracted texture-based features using a Gray Level Co-occurrence Matrix (GLCM). They applied the Binary Dragonfly Algorithm (BDA) to select the optimal features, which were then fed into an ELM classifier to identify infected leaf regions with 94% accuracy. Bhatia et al. (2020) proposed another ELM-based method on an imbalanced dataset of images of powdery mildew disease in tomato plants. Several resampling procedures, including the Synthetic Minority Over-Sampling Technique (SMOTE), Random Over Sampling (ROS), Importance Sampling (IMPS), and Random Under Sampling (RUS), were used to balance the dataset before classification using ELM. The IMPS approach yielded the best results, with a classification accuracy of 88.57% and an Area Under the ROC Curve (AUC) of 89.19%. Hatuwal et al. (2020) converted the leaf images to grayscale as a preprocessing step and then used GLCM to extract texture features such as contrast, correlation, inverse difference moments, and entropy from the preprocessed images.
On providing these texture features as input to SVM, K-Nearest Neighbor (KNN), Random Forest (RF), and CNN, the CNN exhibited the best accuracy (97.89%), followed by RF (87.436%), SVM (78.61%), and KNN (76.969%). Diana Andrushia et al. (2023) proposed another disease identification mechanism using convolutional capsule networks for grape leaves. They resized the images to \(128 \times 128\) and provided them as input to a network comprising only four convolutional layers and a capsule layer, building a classifier that attained an accuracy of 99.12%. Alagumariappan et al. (2020) utilized Raspberry Pi hardware to develop a real-time decision support system with ELM. They used Hu moments to extract features related to the outline of the leaf image and GLCM to capture features related to the intensity variations of pixels in the region of interest. Further, by providing the extracted features corresponding to three disease classes to ELM, they achieved an accuracy of 95%. Panchal et al. (2019) considered only four disease classes and used resizing, denoising, transforming the Blue Green Red (BGR) color space to the HSV color space, and sharpening as preprocessing steps. Further, they used k-means segmentation to segment the ROI in the leaf images and applied GLCM to extract texture features from the ROI. When the extracted features were provided to Random Forest (RF), Decision Tree (DT), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) classifiers, they achieved accuracies of 98, 94, 92, and 90%, respectively. Kaur et al. (2018) experimented with four classes of soybean leaf images: three diseases (downy mildew, frog eye, and septoria leaf blight) and healthy leaves, from the PlantVillage dataset (Hughes et al. 2015). They used the k-means segmentation algorithm to identify the ROI and extracted features based on color moments, color correlograms, GLCM, Gabor filters, and the Discrete Wavelet Transform (DWT). On constructing three SVM models on different combinations of the extracted features, they recorded around 90% accuracy.

1.3 Research motivation and contribution

The recent works discussed above have used various feature extraction methods and proposed several machine learning-based models for successfully identifying diseases in leaf images. However, to the best of our knowledge, this is the first attempt in the domain of leaf-based crop disease detection to explain the outcome of the proposed model to end-users who are not expected to be well versed in the intricacies of machine learning. In summary,

  • An interpretable framework for image-based leaf disease detection is proposed using Extreme Learning Machine (ELM).

  • To evaluate the proposed model, 32 classes (24 disease classes and 8 healthy-leaf classes) belonging to nine different crop species are considered.

  • A combination of texture- and frequency-based features is extracted from the segmented images, and the effectiveness of this combined feature set is demonstrated against the traditional practice of extracting only one type of feature.

  • To establish the efficacy of the proposed framework—Interpretable Leaf Disease Detector, tenfold cross-validation accuracy is reported at a 95% confidence level.

  • To make the predictions made by I-LDD interpretable by the end-user, Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al. 2016) is employed to generate superpixels highlighting the crop disease-specific regions. The infected parts of the diseased leaves and the annotated leaf images generated by LIME are validated by an expert.

To increase the computational efficiency of classification in the proposed framework, the use of ELM (Huang et al. 2004) is suggested, as it offers quicker convergence with just one tunable parameter (the number of hidden neurons). The ELM algorithm has been used to address several problems in the medical domain, such as neurological disorders (Lima et al. 2022), COVID-19 detection (COV-ELM; Rajpal et al. 2022), and arrhythmia classification from ECG signals (Atal & Singh 2020). In addition, ELM and its variants have been used in areas such as 3D object recognition (Qi et al. 2021), fingerprint recognition (Alsmirat et al. 2019), face recognition (Kortli et al. 2020), and leukocyte image segmentation (Zhou et al. 2020). Further, to explain the predictions of the constructed model, LIME is used to generate superpixels and analyze their impact on the predictions.

The remainder of the paper is structured as follows: Sect. 2 presents the details of the Extreme Learning Machine used in the present work. Section 3 gives the details of the three phases involved in I-LDD. Section 4 presents the experimental results, their analysis, and the statistical significance of the results, followed by the explanations generated by LIME. Section 5 discusses the advantages and limitations of the proposed approach, along with the future scope. Finally, Sect. 6 presents the conclusion.

2 Background details

This section briefly details the ELM algorithm and the explainable AI method—LIME used in the proposed framework—I-LDD.

2.1 Extreme Learning Machine (ELM)

Extreme Learning Machine is a Single hidden Layer Feed-forward Neural Network (SLFNN) that trains much faster than conventional back-propagation-based learning and also exhibits good generalization performance (Huang et al. 2004). The input weights and the biases of the hidden layer are initialized randomly, and the output weights of the SLFNN are then computed analytically using the Moore-Penrose generalized matrix inverse.

Given a training set containing M samples, where each sample is represented by a tuple \((x_j, t_j)\) with input instance \(x_j \in \textbf{R}^{m}\) and target output \(t_j \in \textbf{R}^{n}\), and given an activation function f(x), the conventional SLFNN with \(\tilde{M}\) hidden nodes is modeled as follows:

$$\begin{aligned} \sum _{i=1}^{\tilde{M}}\beta _i f(w_i \cdot x_j + b_i ) = o_j, \quad j = 1,\ldots ,M. \end{aligned}$$
(1)

where \(w_i\) is the weight vector from the input layer to the \(i^{th}\) hidden node, \(b_i\) denotes the threshold of the \(i^{th}\) hidden node, \(\beta _i\) is the weight vector from the \(i^{th}\) hidden node to the output nodes, and \(o_j\) is the network output for the \(j^{th}\) training instance.

An SLFNN with \(\tilde{M}\) hidden nodes can approximate these M instances with zero error, i.e., \(\sum _{j=1}^{M}||o_j - t_j ||= 0\). Hence, there exist \(\beta _i\), \(w_i\), and \(b_i\) such that:

$$\begin{aligned} \sum _{i=1}^{\tilde{M}}\beta _i f(w_i \cdot x_j + b_i ) = t_j,\quad j = 1,\ldots ,M. \end{aligned}$$
(2)

The above system of M equations can be represented as follows:

$$\begin{aligned} L\beta = T \end{aligned}$$
(3)

where L is the output of the hidden layer, and its matrix form is represented below.

$$\begin{aligned} L = \left[ {\begin{array}{ccc} f(w_1 \cdot x_1 + b_1) &{} \cdots &{} f(w_{\tilde{M}} \cdot x_1 + b_{\tilde{M}} )\\ \vdots &{} \ddots &{} \vdots \\ f(w_1 \cdot x_M + b_1) &{} \cdots &{} f(w_{\tilde{M}} \cdot x_M + b_{\tilde{M}} ) \end{array} } \right] _{ M \times \tilde{M}} \end{aligned}$$
(4)
$$\begin{aligned} \beta = \left[ {\begin{array}{c} \beta _1^\intercal \\ \vdots \\ \beta _{\tilde{M}}^\intercal \end{array} } \right] _{\tilde{M} \times n} \end{aligned}$$
(5)
$$\begin{aligned} \textbf{T} = \left[ {\begin{array}{c} \textbf{t}_1^\intercal \\ \vdots \\ \textbf{t}_M^\intercal \end{array} } \right] _{M \times n} \end{aligned}$$
(6)

The above system of linear equations can be solved for \(\beta \) using the minimum-norm least-squares solution:

$$\begin{aligned} \beta = L^\dagger T \end{aligned}$$
(7)

In this equation, \(L^{\dagger }\) is the Moore-Penrose generalized inverse of \(\textbf{L}\).
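To make the training procedure above concrete, the following minimal NumPy sketch implements Eqs. (1)–(7) under stated assumptions (a sigmoid activation and randomly generated toy data); it is an illustration only, not the HPELM implementation used in the experiments reported later.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    """Train an ELM: random hidden layer, analytic output weights (Eq. 7)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # input weights w_i
    b = rng.standard_normal(n_hidden)                 # hidden-node thresholds b_i
    L = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden-layer output matrix (Eq. 4)
    beta = np.linalg.pinv(L) @ T                      # Moore-Penrose solution (Eq. 7)
    return W, b, beta

def elm_predict(X, W, b, beta):
    L = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return (L @ beta).argmax(axis=1)                  # predicted class labels

# Toy usage: 100 samples, 238 features, 32 one-hot-encoded classes (as in I-LDD).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 238))
T = np.eye(32)[rng.integers(0, 32, 100)]
W, b, beta = elm_train(X, T, n_hidden=799)
labels = elm_predict(X, W, b, beta)
```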

2.2 LIME

Local Interpretable Model-agnostic Explanations is a technique that can explain the predictions of any classifier by constructing an interpretable model locally around a prediction (Ribeiro et al. 2016). This interpretable model should be a good local approximation of the machine learning model’s predictions. Mathematically, the LIME-generated explanations can be expressed as follows:

$$\begin{aligned} \textrm{explanation}(x) = \textrm{arg} \min _{g \in G} L(f, g, \pi _x) + \Omega (g) \end{aligned}$$
(8)

In Eq. (8), x denotes an instance, and f(x) is the probability that x belongs to a class C. Further, g denotes an explanation (model) belonging to a family of plausible explanatory models G. The proximity measure \(\pi _x\) specifies the size of the neighborhood around instance x considered for the explanation. \(L(f, g, \pi _x)\) denotes the loss measuring the degree of unfaithfulness of g in approximating f. Finally, to keep \(g \in G\) simple enough to be explainable, the complexity of g, denoted by \(\Omega (g)\), must be kept as low as possible.

For images, LIME works by segmenting an image into several superpixels and creating many perturbed versions of the image by turning off subsets of these superpixels. Further, LIME assigns each perturbed image a weight based on its proximity to the original image. Finally, LIME fits a linear regression model on the weighted perturbed images to determine the superpixels that contribute most to the prediction of the underlying black-box model.
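As a hedged illustration of this procedure, the sketch below invokes the lime package's image explainer; here, predict_proba is a hypothetical stand-in for the full I-LDD pipeline (segmentation, feature extraction, ELM), and leaf_image is a placeholder array rather than a real PlantVillage image.

```python
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

def predict_proba(batch):
    # Hypothetical stand-in for the I-LDD pipeline: in practice this would
    # segment each image, extract the 238 features, and query the trained ELM.
    rng = np.random.default_rng(0)
    return rng.dirichlet(np.ones(32), size=len(batch))  # 32 class probabilities

leaf_image = np.random.default_rng(1).integers(0, 255, (224, 224, 3)).astype(np.double)

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(leaf_image, predict_proba,
                                         top_labels=1, num_samples=1000)

# Overlay the top contributing superpixels for the predicted class.
img, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                           positive_only=True, num_features=5)
overlay = mark_boundaries(img / 255.0, mask)  # superpixel boundaries for display
```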

3 Materials and methods

The proposed I-LDD framework has been implemented in Python 3.7 in the Google Colaboratory environment (NVIDIA Tesla K80 GPU with 12 GB RAM). Python modules NumPy, HPELM, Matplotlib, Scikit-Learn, and Seaborn have been used for the data pre-processing phase, model creation, and visual analysis. This section describes the dataset used in experimentation, followed by the proposed I-LDD framework.

3.1 Dataset description

The experiments carried out in the present work leveraged 14,578 images from the publicly available PlantVillage dataset (Hughes et al. 2015). The dataset included nine different crop species encompassing 24 different types of plant diseases and healthy leaves from eight species, thus resulting in 32 different classes. Table 1 lists the class of diseases along with their causal organisms, symptoms, and the sample of infected leaves.

3.2 Interpretable Leaf Disease Detector (I-LDD)

The proposed I-LDD comprises three phases. In the first phase, images are resized for uniformity and segmented to distinguish the leaves from the background (please see Sect. 3.2.1). In the second phase, textural and frequency-based features are extracted from the segmented leaf images (please see Sect. 3.2.2). Finally, the third phase uses an Extreme Learning Machine to detect plant diseases from the leaf images (Fig. 6).

3.2.1 Preprocessing

To ensure uniformity of size, images are resized to \(224 \times 224\) pixels. Subsequently, K-means clustering is used to segment each image into two regions: the leaf (the region of interest) and the background. K-means clustering is an unsupervised technique that partitions items into K (a user-defined number) clusters (MacQueen 1967). The algorithm begins by initializing the K cluster centers as randomly chosen data points and assigning each point to the closest center. It then proceeds by iteratively updating the positions of the cluster centers and reassigning the data points to the updated centers. To separate the leaves from the background, K is set to two. Figure 1 depicts the segmentation of a sample leaf image from the PlantVillage dataset into the region of interest (ROI) and the background using K-means clustering.
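A minimal sketch of this two-cluster segmentation, using scikit-learn's KMeans on raw RGB pixel values, is given below; the rule that the cluster with the higher mean green intensity is the leaf is an illustrative assumption, not necessarily the exact criterion used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_leaf(img):
    """Split an (H, W, 3) RGB image into leaf (ROI) and background with K = 2."""
    h, w, _ = img.shape
    pixels = img.reshape(-1, 3).astype(np.float32)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pixels)
    labels = labels.reshape(h, w)
    # Assumed heuristic: the cluster with the higher mean green intensity is the leaf.
    leaf_cluster = int(np.argmax([img[labels == k][:, 1].mean() for k in (0, 1)]))
    mask = labels == leaf_cluster
    roi = img * mask[..., None]      # background pixels zeroed out
    return roi, mask

# Example: roi, mask = segment_leaf(leaf_rgb) for a (224, 224, 3) uint8 array.
```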

Table 1 Each row in the table includes the name of the leaf disease, its causal organism, symptoms of the respective disease, and a sample image of a leaf affected by the specific disease (Hughes et al. 2015)

3.2.2 Feature extraction

Several research studies show that the human visual system (HVS) is highly sensitive to the structural data of an image (Tang et al. 2018). Feature extraction is used to derive relevant features from the images for the leaf disease detection task. Texture is important for identifying structure in images and for their classification (Haralick et al. 1973). In addition to texture features, frequency features also play an important role in the development of robust classifiers (Varuna Shree & Kumar 2018). Tang et al. (2018) proposed an image quality assessment combining texture and frequency domain features. Drawing inspiration from these works (Tang et al. 2018; Lacombe et al. 2020), a combination of textural and frequency-based features is used in the present work. The Discrete Wavelet Transform (DWT) and the Fast Fourier Transform (FFT) are used to extract frequency-based features. To extract the texture-based features, the Gray Level Difference Method (GLDM), Local Binary Pattern (LBP), color histogram, and Hu moments are utilized. A brief description of each method is given below.

Discrete wavelet transform (DWT) In the Discrete Wavelet Transform (DWT), features related to position and scale are extracted from an image (Oh et al. 2002). The wavelets in DWT are discretely sampled and contain data in both the spatial and frequency domains. DWT uses low-pass and high-pass filters, where low-pass filtering retains a global, rough approximation of an image, whereas high-pass filtering produces a finer, more detailed description of the original image. The approximation sub-band thus holds high-scale, low-frequency coefficients, whereas the detail sub-bands hold low-scale, high-frequency coefficients. When a 2D-DWT is performed on an image, it results in four sub-bands: HH (high-high), LH (low-high), HL (high-low), and LL (low-low), as shown in Fig. 2. A two-level wavelet decomposition of an image provides eight sub-bands: the low-frequency parts of the image are LL1 and LL2, and the high-frequency parts are HH1, HL1, LH1, HH2, HL2, and LH2. LL1 approximates the original image and can hence be decomposed further to obtain the second level.
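A two-level decomposition of this kind can be sketched with the PyWavelets package, assuming a Haar wavelet and a placeholder grayscale image; PyWavelets returns the detail coefficients as horizontal, vertical, and diagonal sub-bands, which correspond to the LH/HL/HH naming used above.

```python
import numpy as np
import pywt

gray = np.random.default_rng(0).random((224, 224))   # placeholder grayscale leaf image

# Level 1: approximation LL1 plus the three detail sub-bands.
LL1, (LH1, HL1, HH1) = pywt.dwt2(gray, 'haar')

# Level 2: decomposing LL1 yields the remaining four sub-bands.
LL2, (LH2, HL2, HH2) = pywt.dwt2(LL1, 'haar')
```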

Fig. 1
figure 1

Segmentation of a sample leaf image from the PlantVillage dataset into the region of interest (ROI) and the background using the K-means clustering algorithm

Fig. 2
figure 2

One-level decomposition DWT of the preprocessed leaf image from PlantVillage dataset

Fast Fourier transform (FFT) The Fast Fourier Transform (FFT) is an efficient algorithm for computing the Discrete Fourier Transform (DFT) (Yanikoglu & Kholmatov 2009). FFT transforms an image from the spatial domain to the frequency domain. It breaks an image down into sines and cosines with different amplitudes and phases, revealing recurrent patterns in the image. Low frequencies reflect slow changes in the image and provide its approximation, as they store most of its information. High frequencies represent sudden changes in the image and thus provide a detailed description.

The 2-D discrete Fourier transform (DFT) is given as:

$$\begin{aligned} F(u,v) = \sum _{x=0}^{M-1}\sum _{y=0}^{N-1}f(x,y)e^{-i2\pi (ux/M + vy/N)} \end{aligned}$$
(9)

In Eq. (9), f(x, y) corresponds to a digital image of size \(M \times N\), and the discrete variables u and v range over \(u= 0,1,2,\ldots ,M-1\) and \(v= 0,1,2,\ldots ,N-1\).

Figure 3 depicts the result after applying FFT to a sample leaf image.

Fig. 3
figure 3

Sample leaf image from PlantVillage dataset, FFT of the image, FFT + mask generated using a low-pass filter, and image after applying inverse FFT
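The pipeline shown in Fig. 3 (FFT, a low-pass mask, and the inverse FFT) can be sketched with NumPy as follows; the mask radius of 30 is an arbitrary illustrative choice, not a parameter taken from the paper.

```python
import numpy as np

gray = np.random.default_rng(0).random((224, 224))  # placeholder grayscale leaf image

F = np.fft.fftshift(np.fft.fft2(gray))       # spectrum with the DC component centered

# Circular low-pass mask keeping the slowly varying image content (cf. Fig. 3).
h, w = gray.shape
yy, xx = np.ogrid[:h, :w]
mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= 30 ** 2

low_pass = np.fft.ifft2(np.fft.ifftshift(F * mask)).real  # back to the spatial domain
```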

Gray Level Difference Method (GLDM) The GLDM is based on the occurrence of two pixels with a given absolute difference in gray level that are separated by a particular displacement \(\delta \) (Kim & Park 1999; Weszka et al. 1976). For any \(\delta = (\Delta x, \Delta y)\), let \(f_\delta (x,y) = \mid f(x, y) - f(x+ \Delta x, y+ \Delta y)\mid \) and \(P(i\mid \delta ) \) be the estimated probability density function given by \(P(i\mid \delta ) = Prob(f_\delta (x, y) = i) \). In this work, four values of \(\delta \) are considered: (0, l), (-l, l), (l, 0), and (-l, -l), where l is the inter-sample spacing distance.
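A sketch of estimating \(P(i\mid \delta )\) with NumPy is shown below; a simple relative-frequency histogram over the valid pixel pairs is assumed as the density estimate.

```python
import numpy as np

def gldm_pdf(gray, dx, dy, levels=256):
    """Estimate P(i | delta) = Prob(|f(x, y) - f(x + dx, y + dy)| = i)."""
    h, w = gray.shape
    # Overlapping region of the image and its displaced copy (valid pairs only).
    x0, x1 = max(0, -dx), min(w, w - dx)
    y0, y1 = max(0, -dy), min(h, h - dy)
    diff = np.abs(gray[y0:y1, x0:x1].astype(int)
                  - gray[y0 + dy:y1 + dy, x0 + dx:x1 + dx].astype(int))
    hist = np.bincount(diff.ravel(), minlength=levels)
    return hist / hist.sum()

# The four displacements used in this work, with inter-sample spacing l = 1.
img = np.random.default_rng(0).integers(0, 256, (224, 224))
l = 1
pdfs = [gldm_pdf(img, dx, dy) for dx, dy in [(0, l), (-l, l), (l, 0), (-l, -l)]]
```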

Local binary pattern (LBP) Local binary pattern (LBP) is a textural descriptor with strong discriminative power. LBP labels each pixel in an image by comparing its gray level to those of neighboring pixels and produces a binary number (Ojala et al. 2001). Within a predefined patch, a value of one is assigned to neighbors with a gray level greater than that of the center pixel; otherwise, a zero is assigned. Figure 4 depicts a sample leaf image on the left side and the corresponding LBP on the right side.
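For illustration, the classical 3 × 3 LBP of Fig. 4 can be computed with scikit-image as follows (the placeholder image stands in for a segmented leaf).

```python
import numpy as np
from skimage.feature import local_binary_pattern

gray = np.random.default_rng(0).integers(0, 256, (224, 224)).astype(np.uint8)

# 8 neighbors on a circle of radius 1, i.e., the classical 3x3 LBP operator.
lbp = local_binary_pattern(gray, P=8, R=1, method='default')
```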

Fig. 4
figure 4

Segmented leaf image on the left side and the corresponding LBP on the right side

Color histogram Given a discrete color space defined by some color axes (red, green, and blue), the color histogram is created by discretizing the image colors and counting how many times each discrete color appears in the image array (Swain & Ballard 1991). In other words, a color histogram depicts the distribution of colors in an image (Ahmad 1994). Figure 5 shows the color histogram of a sample leaf image.
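A per-channel RGB histogram of this kind might be computed as in the sketch below; the choice of 32 bins per channel is illustrative, not a setting taken from the paper.

```python
import numpy as np

rgb = np.random.default_rng(0).integers(0, 256, (224, 224, 3))  # placeholder RGB image

# Discretize each color axis into 32 bins and count pixel occurrences per bin.
hist = np.concatenate([np.histogram(rgb[..., c], bins=32, range=(0, 256))[0]
                       for c in range(3)])
hist = hist / hist.sum()   # normalize to a color distribution
```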

Fig. 5
figure 5

Input image and color histogram corresponding to a sample leaf image from PlantVillage dataset

Hu moments Hu moments are image descriptors used to characterize the shape of objects in an image (Hu 1962). The features based on Hu moments are invariant to rotation, scaling, and translation; the seventh moment is invariant in magnitude under mirroring but changes sign. Given an image I(x, y) of size \(m \times n\), equation (10) gives the 2-D moment (\(m_{p,q}\)) of order (p, q) as:

$$\begin{aligned} m_{p,q}=\sum _{x=0}^{m-1}\sum _{y=0}^{n-1} I(x,y)x^py^q \end{aligned}$$
(10)

where p denotes the order of x and q is the order of y.

Similarly, a central moment (\(\mu _{p,q})\) can be defined as (please see equation (11)):

$$\begin{aligned} \mu _{p,q}=\sum _{x=0}^{m-1}\sum _{y=0}^{n-1} I(x,y)(x - \bar{x})^{p}(y - \bar{y})^{q} \end{aligned}$$
(11)

where \(\bar{x}=\dfrac{m_{10}}{m_{00}}\) and \(\bar{y}=\dfrac{m_{01}}{m_{00}}\)

The normalized moments are given by equation (12).

$$\begin{aligned} \delta _{p,q}=\frac{\mu _{p,q}}{m_{00}^{\frac{p+q}{2}+1}} \end{aligned}$$
(12)
Fig. 6
figure 6

I-LDD framework comprises three phases: Dataset Preprocessing, Feature Extraction, and ELM-based multi-classification and LIME-based explanations

The Hu moments descriptor returns a real-valued feature vector of seven values that capture and quantify the shape of the object in an image [please see Eqs. (13)–(19)].

  • Moment 1:

    $$\begin{aligned} \textrm{hu}_{1}=\delta _{20}+\delta _{02} \end{aligned}$$
    (13)
  • Moment 2:

    $$\begin{aligned} \textrm{hu}_{2}=(\delta _{20} -\delta _{02})^2+4\delta ^2_{11} \end{aligned}$$
    (14)
  • Moment 3:

    $$\begin{aligned} \textrm{hu}_{3}=(\delta _{30} -3\delta _{12})^2+(3\delta _{21} -\delta _{03})^2 \end{aligned}$$
    (15)
  • Moment 4:

    $$\begin{aligned} \textrm{hu}_{4}=(\delta _{30}+\delta _{12})^2+(\delta _{21}+\delta _{03})^2 \end{aligned}$$
    (16)
  • Moment 5:

    $$\begin{aligned} \textrm{hu}_{5}= & {} (\delta _{30}-3\delta _{12})(\delta _{30}+\delta _{12})((\delta _{30}+\delta _{12})^2\nonumber \\{} & {} -\,3(\delta _{21}+\delta _{03})^2)+(3\delta _{21} - \delta _{03})(3(\delta _{30}\nonumber \\{} & {} +\,\delta _{12})^2-(\delta _{21}+\delta _{03})^2) \end{aligned}$$
    (17)
  • Moment 6:

    $$\begin{aligned} \textrm{hu}_{6}= & {} (\delta _{20} - \delta _{02})((\delta _{30}+\delta _{12})^2 - (\delta _{21}+\delta _{03})^2)\nonumber \\{} & {} +\,4\delta _{11}(\delta _{30}+\delta _{12})(\delta _{21}+\delta _{03}) \end{aligned}$$
    (18)
  • Moment 7:

    $$\begin{aligned} \textrm{hu}_{7}= & {} (3\delta _{21} - \delta _{03})(\delta _{30}+\delta _{12})((\delta _{30}+\delta _{12})^2\nonumber \\{} & {} -\,3(\delta _{21}+\delta _{03})^2) +(\delta _{30} - 3\delta _{12})(\delta _{21}+\delta _{03})\nonumber \\{} & {} (3(\delta _{30}+\delta _{12})^2 - (\delta _{21}+\delta _{03})^2) \end{aligned}$$
    (19)
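In practice, the seven invariants of Eqs. (13)–(19) can be obtained directly from OpenCV, as sketched below; the log-scaling step is a common convention for compressing their wide dynamic range, not something prescribed in this work.

```python
import cv2
import numpy as np

gray = np.random.default_rng(0).integers(0, 256, (224, 224)).astype(np.uint8)

moments = cv2.moments(gray)            # raw, central, and normalized moments
hu = cv2.HuMoments(moments).flatten()  # the seven invariants of Eqs. (13)-(19)

# Optional log-scaling, since the raw invariants span many orders of magnitude.
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```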

In the proposed framework—I-LDD, the features extracted using the above-mentioned methods are combined. Fourteen statistical parameters, namely area, mean, standard deviation, skewness, kurtosis, energy, entropy, maximum, minimum, mean absolute deviation, median, range, root-mean-square, and uniformity, are computed for each type of extracted feature. The first 14 features are extracted directly from the segmented leaf image in the spatial domain. In addition, computing the 14 parameters for each of the four directions of GLDM yields 56 (\(14\times 4\)) features, and 14 features are extracted for each of the remaining three texture-based methods (LBP, color histogram, and Hu moments), generating a total of 112 \((14+56+14+14+14)\) textural features. Moreover, drawing inspiration from Zargari et al. (2018), a pool of frequency-based features is generated by computing the same 14 statistical parameters for the FFT and the eight DWT sub-bands (LL1, LH1, HL1, HH1, LL2, LH2, HL2, and HH2), making a total of 126 (\(14+14\times 8\)) frequency-based features. Finally, for every leaf image, the textural feature vector of length 112 is merged with the frequency feature vector of length 126, resulting in a combined feature vector of length 238 (112 + 126), which serves as input to the ELM classifier (Huang et al. 2006).
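The sketch below computes these 14 statistical parameters for a single feature map; the definitions adopted here for area (count of nonzero elements) and uniformity (sum of squared histogram probabilities) are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def stats14(feat):
    """Fourteen statistical parameters for one extracted feature map."""
    x = np.asarray(feat, dtype=float).ravel()
    p = np.histogram(x, bins=32)[0].astype(float)
    p = p / p.sum()
    p = p[p > 0]
    return np.array([
        np.count_nonzero(x),                 # area (assumed: nonzero count)
        x.mean(), x.std(),                   # mean, standard deviation
        stats.skew(x), stats.kurtosis(x),    # skewness, kurtosis
        np.sum(x ** 2),                      # energy
        -np.sum(p * np.log2(p)),             # entropy
        x.max(), x.min(),                    # maximum, minimum
        np.mean(np.abs(x - x.mean())),       # mean absolute deviation
        np.median(x),                        # median
        x.max() - x.min(),                   # range
        np.sqrt(np.mean(x ** 2)),            # root-mean-square
        np.sum(p ** 2),                      # uniformity (assumed: sum of squared p)
    ])

# The 238-dim vector is 14 stats x 17 feature maps (8 textural + 9 frequency).
```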

4 Results and analysis

The experiments were carried out with frequency-based features alone, texture-based features alone, and a combination of the two. The boxplot in Fig. 7 shows that the combined set of 238 frequency and textural features yields the highest median accuracy (0.92), compared to frequency-based features alone (0.90) and texture-based features alone (0.81). Hence, the combined set of frequency-based and texture-based features is used for further experimentation. The number of hidden neurons (\(\tilde{M}=799\)) in the ELM was determined empirically. On comparing the performance of the proposed I-LDD model with some state-of-the-art ML algorithms, namely, Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM), the proposed model (I-LDD) outperformed the other machine learning-based models with a tenfold cross-validation accuracy of 0.9322 ± 0.0088 (see Table 2).

Fig. 7
figure 7

Boxplot for accuracy values using a feature pool of combined (frequency and textural) features, frequency features, and textural features. The median accuracy achieved by using a combined feature vector is 0.92, which is higher than the median accuracy values obtained individually by frequency and textural features

Table 2 Comparison of results against state-of-the-art machine learning algorithms on 32 classes

The performance of the proposed model (I-LDD) is also compared with the recent works of Xian & Ngadiran (2021) and Kurmi et al. (2022) (see Table 3). Whereas Xian & Ngadiran (2021) reported an accuracy of 84.94% using an ELM with 1024 neurons in the hidden layer, I-LDD achieves a significant improvement, with an accuracy of 93% using only 483 neurons. I-LDD (92.86%) also outperformed the SVM-based classifier of Kurmi et al. (2022), which yielded an accuracy of 91.89%.

Table 3 Comparison of I-LDD with two recently proposed frameworks on PlantVillage dataset

4.1 Statistical significance of I-LDD

In this section, the statistical significance of the results yielded by ELM (I-LDD) and the competing machine learning classifiers (please see Table 2) is investigated. For this purpose, the Friedman test, a popular statistical test for comparing more than two algorithms over multiple folds (Demšar 2006), is used. The accuracy values for ELM, SVM, RF, DT, KNN, and NB were computed over 30 different folds (seeds). Table 4 shows the mean ranking of the classifiers based on their observed accuracy values on these 30 folds.

Table 4 Mean ranking of classifiers based on their observed accuracy values on 30 different folds

Under the null hypothesis, which states that the performances of I-LDD (based on ELM) and the state-of-the-art algorithms are equivalent (i.e., the mean ranks of the algorithms are equal), and at the default level of significance \(\alpha = 0.05\), the Friedman test yielded a \(\chi ^{2}\) statistic of 140.286 and a p-value of \(1.55\textrm{e}{-}28\), indicating the rejection of the null hypothesis. Further, the Nemenyi test (Demšar 2006) is leveraged to examine whether I-LDD yields a competitive performance compared to the other state-of-the-art algorithms. The performance of two classifiers differs significantly if the corresponding average ranks differ by at least the critical difference (CD):

$$\begin{aligned} \textrm{CD}=q_{\alpha } \sqrt{\frac{k(k+1)}{6N}} \end{aligned}$$
(20)

For \(k=6\) (six classifiers to be compared), \(N=30\) (30 folds), and a significance level of \(\alpha = 0.05\), \(q_{\alpha }=2.85\), and CD is computed as 1.37 [using Eq. (20)]. It is to be noted that the difference in the mean rankings of ELM and SVM is 0.8, which is less than the computed CD value, indicating some degree of similarity between the results obtained by the two algorithms at the \(\alpha = 0.05\) level of significance. Conversely, the difference in the mean rankings of ELM and RF is 2.07, which is greater than the computed CD value, indicating that the two algorithms differ in accuracy at the \(\alpha = 0.05\) level of significance. Figure 8 marks the average rank of each algorithm along the axis (better, i.e., numerically lower, ranks to the left). It is evident that the proposed method, I-LDD (based on ELM), achieves the best rank in terms of accuracy.
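This test procedure can be sketched with SciPy as follows; the accuracy matrix holds placeholder values standing in for the per-fold results behind Table 4.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# acc[i, j]: accuracy of classifier j (ELM, SVM, RF, DT, KNN, NB) on fold i.
rng = np.random.default_rng(0)
acc = rng.uniform(0.75, 0.95, size=(30, 6))      # placeholder values

stat, p = friedmanchisquare(*acc.T)              # Friedman chi-square and p-value

ranks = np.apply_along_axis(rankdata, 1, -acc)   # rank 1 = best accuracy per fold
mean_ranks = ranks.mean(axis=0)

# Nemenyi critical difference, Eq. (20): q_alpha = 2.85 for k = 6 at alpha = 0.05.
k, N, q_alpha = 6, 30, 2.85
cd = q_alpha * np.sqrt(k * (k + 1) / (6 * N))    # about 1.37
# Two classifiers differ significantly iff their mean ranks differ by at least cd.
```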

Fig. 8
figure 8

Average rank of each algorithm marked along the axis (lower, i.e., better, ranks to the left). It is evident that the proposed method, I-LDD (based on ELM), achieves the best average rank in terms of accuracy

4.2 Visualization using LIME

In this work, Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al. 2016) is used to identify the regions that I-LDD found significant for predicting the leaf disease. Table 5 shows LIME-generated explanations for some sample leaf images of five disease categories, namely, grape black rot, corn (maize) northern leaf blight, tomato yellow leaf curl virus, grape esca (black measles), and apple black rot. Given an image, the first column in Table 5 denotes the predicted class, the second column describes generic symptoms corresponding to the class label, the third column shows the diseased regions manually marked by a human expert, and the fourth column shows LIME-generated heatmaps. The intensity of the blue color in a LIME-generated heatmap signifies the relative importance of the region in predicting the class label. It has been observed that, in most cases, the dark blue regions in the heatmap correspond to the disease-specific regions marked by the human expert; in some cases, however, they differed from those marked by the expert. Overall, the human expert found the LIME-generated heatmaps convincing and helpful in earning her trust in the proposed model—I-LDD.

Table 5 Each row in the table shows the predicted class label, its symptoms, diseased regions marked by a human expert, and LIME-generated heatmap

5 Discussions

Early discovery of a crop disease, before it reaches its peak, may aid in stopping its spread (Lucas 2020). In addition to timely crop disease diagnosis, identifying the disease type is crucial because disease control approaches vary depending on the pathogens, their hosts, and the biochemical processes involved. The automated image-based identification of crop diseases using deep neural networks has gained the attention of researchers (Libo et al. 2019; Alguliyev et al. 2021; Chug et al. 2022). Nevertheless, deep neural networks need to learn millions of parameters and tune numerous hyperparameters, which demands substantial computation time and training data. Therefore, the Extreme Learning Machine (ELM) (Huang et al. 2004), with just one tunable parameter (the number of hidden neurons), is employed for its faster convergence. Finding relevant image patterns assists in differentiating between disease types. The features are extracted from crop images segmented using k-means, utilizing both spatial and frequency-based feature extraction algorithms. Based on these extracted features, a multi-class classification model named I-LDD (using ELM) is developed, which predicts the appropriate class to which a diseased crop image belongs.

The following are the key advantages of the proposed method:

  • Most of the state-of-the-art methods deal with only a few leaf diseases. This paper is the first attempt to use all 32 classes present in the PlantVillage dataset, belonging to nine different crop species.

  • Using ELM leads to faster convergence than traditional deep learning and machine learning methods.

  • I-LDD is the first attempt in this domain to employ LIME (Ribeiro et al. 2016) to explain the outcome of the proposed model for detecting leaf diseases to end-users, who may not be familiar with the decision-making of machine learning algorithms.

  • LIME generates superpixels highlighting the crop disease-specific regions. The infected parts of the diseased leaves and the annotated leaf images generated by LIME have been validated by a botanist.

A limitation of I-LDD is that it requires substantial hardware resources when processing a large number of leaf images. Therefore, the present work investigates only 14,578 leaf images. As a future step, the proposed approach may be tailored to capture real-time leaf images from crop fields and automatically detect the disease in the captured images. Further, this approach can be extended to developing sensors for measuring symptoms of various plant diseases earlier, under the controlled conditions of smart urban farming systems.

6 Conclusions

Automated detection of leaf diseases is an important concern for the farming community. This paper proposed I-LDD, an ELM-based interpretable framework for detecting leaf diseases. I-LDD begins with a preprocessing step that transforms all images to the same size and segments them into the region of interest (the leaf) and the background. Next, I-LDD extracts a combined set of frequency-based and texture-based features, which serve as input to the ELM classifier. Using the publicly available PlantVillage dataset, I-LDD achieved a classification accuracy of 0.9322 ± 0.0088 at a confidence level of 95%, surpassing other competing approaches. On examining the statistical significance of the classification performance of I-LDD, it was found that I-LDD ranked higher in accuracy than the other classifiers. Further, the paper demonstrated the applicability of an explainable AI method, Local Interpretable Model-agnostic Explanations (LIME), to mark the diseased regions of the leaves.