ICT-4202, DIP Lab Manual - 8


Institute of Information Technology

Jahangirnagar University
Savar, Dhaka-1342

Lab Manual

Course Code: ICT-4202


Course Title: Digital Image Processing Lab
Lab No.: 8
Lab Title: Image Processing with Machine Learning

Prepared by

Mehrin Anannya
Assistant Professor
Institute of Information Technology
Jahangirnagar University

INTRODUCTION TO IMAGE PROCESSING USING PYTHON

Prepared using materials from the internet, especially GeeksforGeeks


Lab Title: Image Processing with Machine Learning. Objective: to introduce students to the Machine Learning techniques of Linear Regression, Multi-Linear Regression, and Logistic Regression.

Lab Contents:

● Linear Regression using Python.

● Multi-Linear Regression using Python.

● Logistic Regression using Python.

Theory with Hands-on Practice:

Introduction to Linear Regression:


Linear Regression is a statistical method used for modeling the linear relationship
between a dependent variable and one or more independent variables. In its basic
form, it assumes a straight-line relationship, aiming to find the best-fitting line
through the data by minimizing the sum of squared differences between predicted
and actual values. The model is characterized by interpretable coefficients, such as
the slope and intercept, making it valuable for understanding the impact of
individual features on the target variable. Linear Regression is widely applied in
various fields for tasks like prediction, forecasting, and uncovering underlying
patterns in data.
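To make "minimizing the sum of squared differences" concrete, the sketch below computes the least-squares slope and intercept directly from their closed-form formulas. The slope is sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2), and the intercept is mean(y) - slope * mean(x). The data are a small made-up set lying roughly on y = 2x + 1. The result is checked against NumPy's `np.polyfit`, which solves the same problem.

```python
import numpy as np

# Made-up example data that lies roughly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.9])

def fit_line(x, y):
    """Return (slope, intercept) minimizing sum((y - (slope*x + intercept))**2)."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = fit_line(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")  # slope = 1.980, intercept = 1.080

# np.polyfit with degree 1 solves the same least-squares problem
slope_np, intercept_np = np.polyfit(x, y, 1)
print(np.isclose(slope, slope_np), np.isclose(intercept, intercept_np))
```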



Linear Regression comes in two main variants: Simple Linear Regression, dealing
with a single independent variable, and Multiple Linear Regression, extending to
scenarios with multiple predictors. While the model's simplicity and
interpretability are strengths, it may not perform optimally when facing complex,
non-linear relationships or outliers. Regularization techniques like Ridge and Lasso
Regression can be employed to enhance robustness in the presence of
multicollinearity or overfitting, expanding the model's applicability across diverse
domains.
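The following is a minimal sketch of the regularized variants. It assumes scikit-learn's `Ridge` and `Lasso` estimators and uses a synthetic dataset with two nearly identical, collinear features. The `alpha` values are arbitrary illustrations, not recommendations. Plain least squares tends to split the effect between the two copies unstably. Ridge shrinks the coefficients toward zero, while Lasso can set some of them exactly to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # almost a copy of x1 -> multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)  # synthetic target

models = {
    "OLS": LinearRegression(),
    "Ridge (alpha=1.0)": Ridge(alpha=1.0),
    "Lasso (alpha=0.1)": Lasso(alpha=0.1),
}
coefs = {}
for name, model in models.items():
    model.fit(X, y)
    coefs[name] = model.coef_
    print(f"{name:18s} coefficients: {np.round(model.coef_, 3)}")
```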

Python Code for Linear Regression

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('data/cells.csv')
print(df)

plt.scatter(x="time", y="cells", data=df)
plt.xlabel("time")
plt.ylabel("cells")
plt.show()

#x is every column except 'cells' (here only 'time').
x_df = df.drop('cells', axis='columns')
#Or you can pick columns manually. Remember double brackets.
#Single brackets return a pandas Series, whereas double brackets return a pandas
#DataFrame (2-D), which is what the model expects.
#x_df=df[['time']]
print(x_df.dtypes) #Lists each column with its type; the trailing "dtype: object" describes this listing itself.
#With single brackets, df['time'] is a 1-D Series (its dtype is the column's type, e.g. float64),
#and LinearRegression.fit rejects a 1-D X.
y_df = df.cells

#Split data into training and test datasets so we can validate the model using test data.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.3, random_state=42)

#Create a model instance and train it.
from sklearn import linear_model
model = linear_model.LinearRegression() #Create an instance of the model.
model.fit(X_train, y_train) #Train the model, i.e. fit a linear model.

print(model.score(X_train, y_train)) #Prints the R^2 value on the training data, a measure of how well
#observed values are replicated by the model.

prediction_test = model.predict(X_test)
print(y_test, prediction_test)
print("Mean sq. error between y_test and predicted =", np.mean((prediction_test - y_test)**2))
# An MSE of about 8 is small compared to the average cell count of about 250.

#Residual plot: residuals should scatter randomly around zero.
plt.scatter(prediction_test, prediction_test - y_test)
plt.hlines(y=0, xmin=200, xmax=300) #xmin/xmax span the range of predicted cell counts.
plt.show()
#This plot becomes more informative with many data points.

Here's an explanation of the code:

1. Import Libraries:
- `pandas` is a data manipulation library.
- `numpy` is used for numerical operations.
- `matplotlib.pyplot` is for creating visualizations.
- `seaborn` is built on top of matplotlib and provides additional visualizations.

2. Read CSV Data:
- `pd.read_csv('data/cells.csv')`: Reads a CSV file and creates a DataFrame (`df`).
- `print(df)`: Displays the DataFrame.



3. Scatter Plot:
- `plt.scatter()`: Creates a scatter plot.
- `x="time"` and `y="cells"`: Specifies the columns for the x and y axes.
- `data=df`: Specifies the DataFrame that holds those columns.

The x values will be the time column, so we define x by dropping the cells column. Because x can hold several independent variables (see the Multi-Linear Regression section), dropping the unwanted columns is more general than picking the wanted one. The y values will be the cells column, the dependent variable we are trying to predict.

4. Define Independent and Dependent Variables:
- `.drop()`: The method used to drop specified labels from rows or columns.
- `'cells'`: The label of the column to be dropped.
- `axis='columns'`: Specifies that a column, not a row, is dropped. `axis=1` is an equivalent way of writing this.
- `x_df`: DataFrame containing independent variables.
- `y_df`: Series containing the dependent variable.
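The single- versus double-bracket distinction mentioned in the code comments can be checked directly. This sketch uses a small hypothetical DataFrame with the same column names as `data/cells.csv`; the values are made up. It shows that `df['time']` is 1-D, while `df[['time']]` and `df.drop('cells', axis='columns')` are 2-D. The 2-D shape is what scikit-learn expects for X.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for data/cells.csv (made-up values)
df = pd.DataFrame({"time": [0.0, 0.5, 1.0, 1.5, 2.0],
                   "cells": [205.0, 220.0, 232.0, 251.0, 260.0]})

series = df["time"]                          # single brackets -> Series
frame = df[["time"]]                         # double brackets -> DataFrame
dropped = df.drop("cells", axis="columns")   # also a DataFrame

print(type(series).__name__, series.ndim)    # Series 1
print(type(frame).__name__, frame.ndim)      # DataFrame 2
print(frame.equals(dropped))                 # True: both hold only 'time'

model = LinearRegression().fit(frame, df["cells"])  # 2-D X works
try:
    LinearRegression().fit(series, df["cells"])     # 1-D X is rejected
except ValueError as err:
    print("ValueError:", str(err).splitlines()[0])
```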

5. Train-Test Split:
- `train_test_split()`: Splits the data into training and testing sets. `test_size=0.3` keeps 30% of the rows for testing.
- `random_state`: Can be any integer; it is used as the seed for the random split, so the same test set is produced every time, if that is important. `random_state=None` splits the dataset differently on every run.
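You can see what `random_state` does with a quick check. Splitting the same data twice with the same integer seed gives identical test sets, while `random_state=None` usually does not. The data here is a made-up range of ten numbers. With `test_size=0.3`, scikit-learn puts 3 of the 10 rows in the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10)  # made-up data: ten row indices

_, test_a = train_test_split(data, test_size=0.3, random_state=42)
_, test_b = train_test_split(data, test_size=0.3, random_state=42)
print(test_a, test_b)                  # same seed -> same test rows
print(np.array_equal(test_a, test_b))  # True

train_c, test_c = train_test_split(data, test_size=0.3, random_state=None)
print(len(train_c), len(test_c))       # 7 3
```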

6. Linear Regression Model:
- `linear_model.LinearRegression()`: Creates an instance of the linear regression model.
- `model.fit(X_train, y_train)`: Trains the model on the training data.

7. Model Evaluation:
- `model.score(X_train, y_train)`: Returns the R^2 value on the training data (printed here), a measure of how well the model replicates the observed values.

8. Predictions and Mean Squared Error:
- `model.predict(X_test)`: Generates predictions on the test set.
- `np.mean((prediction_test - y_test)**2)`: Calculates the mean squared error between the predicted and actual values.

9. Residual Plot:
- `plt.scatter()`: Creates a scatter plot of the residuals (prediction minus actual value) against the predictions.
- `plt.hlines()`: Adds a horizontal reference line at y=0, where a perfect prediction would lie.

This code overall performs linear regression on the given data, evaluates the
model, makes predictions, and visualizes the results.
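The two evaluation numbers used above can be computed by hand. The sketch below uses made-up true and predicted values. It computes MSE as the mean of squared residuals and R^2 as 1 - SS_res / SS_tot, then checks both against scikit-learn's `mean_squared_error` and `r2_score`. For a `LinearRegression`, `model.score(X, y)` returns this same R^2 for whatever X and y are passed in.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions
y_true = np.array([210.0, 230.0, 245.0, 260.0, 280.0])
y_pred = np.array([212.0, 227.0, 246.0, 263.0, 277.0])

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)                   # mean of squared residuals
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot

print("MSE:", mse, "sklearn:", mean_squared_error(y_true, y_pred))
print("R^2:", r2, "sklearn:", r2_score(y_true, y_pred))
```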

Introduction to Multi-Linear Regression:



Multi-Linear Regression is an extension of simple linear regression, designed to handle scenarios where multiple independent variables influence a dependent variable. The model's equation, y = b0 + b1*x1 + b2*x2 + ... + bp*xp, incorporates these variables, allowing for a more realistic representation of complex relationships. Each coefficient in the equation reflects the impact of a one-unit change in the respective independent variable while holding the others constant. Model training chooses these coefficients to minimize the squared difference between predicted and actual values. This is done either directly with ordinary least squares, as scikit-learn's LinearRegression does, or iteratively with methods such as gradient descent. Multi-Linear Regression is widely applied across diverse domains, such as finance and biology, where real-world outcomes are influenced by multiple factors.
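For the model y = b0 + b1*x1 + ... + bp*xp, the coefficients can be found by solving a least-squares problem. This sketch uses NumPy's `np.linalg.lstsq` on a design matrix whose leading column of ones carries the intercept. The data are synthetic and the true coefficients are chosen arbitrarily. scikit-learn's `LinearRegression` should recover the same values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
X = rng.uniform(0, 10, size=(n, 3))          # three synthetic features
true_coef = np.array([1.5, -2.0, 0.5])       # arbitrary "true" coefficients
y = 4.0 + X @ true_coef + rng.normal(scale=0.1, size=n)

# Design matrix with a column of ones so that beta[0] is the intercept b0
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print("lstsq   intercept:", beta[0], "coefficients:", beta[1:])

model = LinearRegression().fit(X, y)
print("sklearn intercept:", model.intercept_, "coefficients:", model.coef_)
```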

It's crucial to check assumptions such as linearity and normality of residuals, and to address challenges such as multicollinearity (strongly correlated independent variables). Model evaluation metrics, including R-squared and Mean Squared Error, provide insight into the model's performance. Feature scaling does not change the predictions of ordinary least squares. It does make the coefficients comparable across features measured in different units, and it matters when the model is fitted with gradient descent or regularization. In summary, Multi-Linear Regression serves as a powerful and interpretable tool for understanding and predicting outcomes influenced by multiple interacting variables.
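Here is a sketch of the scaling point, assuming scikit-learn's `StandardScaler` and synthetic data with two features on very different scales. Standardizing the features changes the size of the least-squares coefficients but leaves the predictions unchanged. This is why scaling matters mainly for comparing coefficients and for regularized or gradient-descent fitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Two synthetic features on very different scales
X = np.column_stack([rng.uniform(0, 1, 100), rng.uniform(0, 1000, 100)])
y = 5 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(scale=0.1, size=100)

raw = LinearRegression().fit(X, y)
X_std = StandardScaler().fit_transform(X)   # zero mean, unit variance per column
std = LinearRegression().fit(X_std, y)

print("raw coefficients:         ", raw.coef_)
print("standardized coefficients:", std.coef_)  # now on comparable scales
print("same predictions:", np.allclose(raw.predict(X), std.predict(X_std)))
```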

Python Code for Multi-Linear Regression



import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('data/heart_data.csv')
print(df.head())

#Drop the 'Unnamed: 0' column (typically a row index saved into the CSV file).
df = df.drop("Unnamed: 0", axis=1)

#A few plots in Seaborn to understand the data
sns.lmplot(x='biking', y='heart.disease', data=df)
sns.lmplot(x='smoking', y='heart.disease', data=df)
plt.show()

x_df = df.drop('heart.disease', axis=1) #Independent variables: all remaining columns
y_df = df['heart.disease'] #Dependent variable

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.3, random_state=42)

from sklearn import linear_model

#Create Linear Regression object
model = linear_model.LinearRegression()

#Now let us call the fit method to train the model using the independent variables
#and the value that needs to be predicted (heart.disease).
model.fit(X_train, y_train) #Indep. variables, dep. variable to be predicted

print(model.score(X_train, y_train)) #Prints the R^2 value on the training data

prediction_test = model.predict(X_test)
print(y_test, prediction_test)
print("Mean sq. error between y_test and predicted =", np.mean((prediction_test - y_test)**2))

#Model is ready. Let us check the coefficients, stored in model.coef_:
#one coefficient per column of x_df, in the order of x_df.columns.
#The intercept is stored in model.intercept_.
print(model.coef_, model.intercept_)

#To predict heart.disease for a new sample, pass one value per column of x_df,
#in the order of x_df.columns (three values only fit if x_df has three columns).
#print(model.predict([[13, 2, 23]]))
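If `data/heart_data.csv` is not at hand, the same workflow can be tried on a small synthetic stand-in. This hypothetical version has the two feature columns used in the plots, biking and smoking. The generated numbers and the relationship between them are invented for illustration and say nothing about real heart-disease data. The sketch shows that `model.coef_` has one entry per feature column. It also shows that a new sample passed to `model.predict` needs one value per column, in the order of `x_df.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 300
# Hypothetical stand-in for heart_data.csv; the relationship below is invented
df = pd.DataFrame({"biking": rng.uniform(1, 75, n),
                   "smoking": rng.uniform(0, 30, n)})
df["heart.disease"] = (10 - 0.1 * df["biking"] + 0.2 * df["smoking"]
                       + rng.normal(scale=0.5, size=n))

x_df = df.drop("heart.disease", axis=1)   # two feature columns: biking, smoking
y_df = df["heart.disease"]
X_train, X_test, y_train, y_test = train_test_split(
    x_df, y_df, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print(dict(zip(x_df.columns, model.coef_)), model.intercept_)

prediction_test = model.predict(X_test)
print("MSE:", np.mean((prediction_test - y_test) ** 2))

# A new sample needs exactly one value per column, in x_df.columns order
new_sample = pd.DataFrame([[20.0, 10.0]], columns=x_df.columns)
print(model.predict(new_sample))
```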

This code snippet performs a linear regression analysis on a dataset related to heart
disease. Let's break down the code:



1. **Data Loading and Preprocessing:**
- `df = pd.read_csv('data/heart_data.csv')`: Reads a CSV file into a Pandas
DataFrame.
- `df = df.drop("Unnamed: 0", axis=1)`: Drops the 'Unnamed: 0' column from
the DataFrame.

2. **Data Exploration with Seaborn:**
- `sns.lmplot(x='biking', y='heart.disease', data=df)`: Creates a scatter plot with a fitted regression line to explore the relationship between the 'biking' feature and the 'heart.disease' target.
- `sns.lmplot(x='smoking', y='heart.disease', data=df)`: Creates a similar plot for the 'smoking' feature.

3. **Data Splitting:**
- `x_df = df.drop('heart.disease', axis=1)`: Separates the features (independent
variables) from the target variable.
- `y_df = df['heart.disease']`: Extracts the target variable.
- `X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.3,
random_state=42)`: Splits the data into training and testing sets.

4. **Linear Regression Model:**
- `model = linear_model.LinearRegression()`: Creates a Linear Regression model.
- `model.fit(X_train, y_train)`: Trains the model using the training data.



- `print(model.score(X_train, y_train))`: Prints the R^2 value, a measure of how
well the model fits the training data.

5. **Model Evaluation:**
- `prediction_test = model.predict(X_test)`: Makes predictions on the test set.
- `print(y_test, prediction_test)`: Prints the actual values and predicted values.
- `print("Mean sq. error between y_test and predicted =", np.mean((prediction_test - y_test)**2))`: Calculates and prints the mean squared error between the predicted and actual values. The parentheses matter: `np.mean(prediction_test - y_test)**2` would square the mean error instead of averaging the squared errors.

6. **Model Coefficients and Intercept:**
- `print(model.coef_, model.intercept_)`: Prints the coefficients (one per feature column) and the intercept of the linear regression model.

7. **Prediction:**
- The last line, `#print(model.predict([[13, 2, 23]]))`, is commented out. Uncommented, it would predict the 'heart.disease' value for a new sample with the features (13, 2, 23). The list must contain one value per feature column of `x_df`, so three values work only if `x_df` has three columns.

In summary, this code explores the relationship between certain features and heart
disease using linear regression. It trains a linear regression model on the provided
dataset, evaluates its performance, and prints the model coefficients. Additionally,
it includes a commented-out line for making a prediction using the trained model.



Introduction to Logistic Regression:
Logistic Regression is a fundamental statistical method employed for binary
classification tasks, where the objective is to predict the probability of an instance
belonging to a specific category. Utilizing the sigmoid activation function, logistic
regression models the relationship between input features and the probability of the
positive class. The training process involves optimizing parameters to minimize a
cost function, and the resulting model produces a decision boundary that separates
instances into different classes. Logistic Regression's simplicity, interpretability,
and effectiveness make it widely utilized in various domains, including medicine,
finance, marketing, and social sciences.
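Here is a small sketch of how the sigmoid turns a linear score into a probability. It assumes scikit-learn's `LogisticRegression` on a made-up one-feature dataset. For a binary model, `decision_function` returns the linear score w*x + b. Applying the sigmoid to that score reproduces the class-1 column of `predict_proba`, and `predict` corresponds to a 0.5 probability threshold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Map any real score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up 1-D data: larger x tends to mean class 1
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
z = model.decision_function(X)             # linear score w*x + b per sample
p_manual = sigmoid(z)
p_sklearn = model.predict_proba(X)[:, 1]   # probability of class 1

print(np.allclose(p_manual, p_sklearn))    # True
print((p_manual > 0.5).astype(int), model.predict(X))
```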

While logistic regression is designed for binary outcomes, it can be extended to handle multiclass classification problems. Despite its simplicity, logistic regression remains a powerful tool for scenarios where a clear understanding of the relationship between features and class probabilities is crucial. Its transparent nature makes it particularly valuable in situations where interpretability and insight into the underlying factors driving predictions are essential.
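Here is a sketch of the multiclass extension, assuming scikit-learn and its bundled Iris dataset, which has three classes. The same `LogisticRegression` class handles more than two classes. It returns one probability per class, and each row sums to 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)   # 150 samples, 3 classes (0, 1, 2)
model = LogisticRegression(max_iter=1000).fit(X, y)

proba = model.predict_proba(X[:5])
print(model.classes_)                       # [0 1 2]
print(proba.shape)                          # (5, 3): one column per class
print(np.allclose(proba.sum(axis=1), 1.0))  # each row is a probability distribution
print(model.predict(X[:5]))
```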

Python Code for Logistic Regression

import numpy as np
import cv2
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

df = pd.read_csv("data/wisconsin_breast_cancer_dataset.csv")

print(df.describe().T) #Feature ranges differ widely, so values need to be normalized before fitting.

print(df.isnull().sum()) #Number of missing values per column
#df = df.dropna()

print(df.dtypes)

#Understand the data
sns.countplot(x="diagnosis", data=df) #M - malignant, B - benign

sns.displot(df['radius_mean'], kde=False)

#'diagnosis' holds text labels, so restrict the correlation to numeric columns
#(required with pandas 2.0 or later).
print(df.corr(numeric_only=True))

corrMatrix = df.corr(numeric_only=True)
fig, ax = plt.subplots(figsize=(10,10)) # Sample figsize in inches
#sns.heatmap(df.iloc[:, 1:6:], annot=True, linewidths=.5, ax=ax)
sns.heatmap(corrMatrix, annot=False, linewidths=.5, ax=ax)

# The same plots again, each followed by plt.show() so that they are displayed
# when the code runs as a script. 'diagnosis' is the target column of this dataset.

# Countplot
sns.countplot(x="diagnosis", data=df)
plt.show()

# Distribution Plot
sns.histplot(df['radius_mean'], kde=False)
plt.show()

# Correlation Matrix and Heatmap
corrMatrix = df.corr(numeric_only=True)
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(corrMatrix, annot=False, linewidths=.5, ax=ax)
plt.show()

# Split the data into X (features) and Y (target)
Y = df["diagnosis"].values
# Define the independent variables: every column except 'diagnosis'.
X = df.drop(labels=["diagnosis"], axis=1)

from sklearn.preprocessing import MinMaxScaler
from sklearn.impute import SimpleImputer

# Impute missing values with the column mean
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)

# Scale the features to the range [0, 1] using MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_imputed)

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Split the scaled features and the target into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, Y, test_size=0.2, random_state=42)

# Fit the logistic regression model (max_iter raised from the default of 100
# to give the solver room to converge)
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Rows are true labels and columns are predicted labels, both in sorted order:
# index 0 = 'B' (benign), index 1 = 'M' (malignant).
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

# Per-class accuracy (recall), treating malignant as the positive class
print("Benign (True Negative Rate) =", cm[0, 0] / (cm[0, 0] + cm[0, 1]))
print("Malignant (True Positive Rate) =", cm[1, 1] / (cm[1, 0] + cm[1, 1]))
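The code above fits the imputer and scaler on the full dataset before splitting, so information from the test rows leaks into preprocessing. A common alternative is to put the preprocessing steps and the model in a scikit-learn `Pipeline`, so that they are fitted on the training set only. This sketch assumes scikit-learn's bundled copy of the Wisconsin breast cancer data (`load_breast_cancer`) as a stand-in for the CSV file. In this bundled copy the target is coded 0 = malignant and 1 = benign, unlike the 'M'/'B' strings in the CSV.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)   # 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Imputer and scaler are fitted on X_train only, inside the pipeline
clf = make_pipeline(SimpleImputer(strategy="mean"),
                    MinMaxScaler(),
                    LogisticRegression(max_iter=5000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)   # rows: true class, columns: predicted
print("Accuracy:", acc)
print(cm)
print("Malignant recall =", cm[0, 0] / cm[0].sum())
print("Benign recall    =", cm[1, 1] / cm[1].sum())
```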

Here is a step-by-step explanation of the code:

1. **Importing Libraries:**
- `import numpy as np`: Importing the NumPy library and renaming it as np.
- `import cv2`: Importing the OpenCV library.
- `import pandas as pd`: Importing the Pandas library.
- `from matplotlib import pyplot as plt`: Importing the pyplot module from the
Matplotlib library.
- `import seaborn as sns`: Importing the Seaborn library.

2. **Loading Data:**



- `df = pd.read_csv("data/wisconsin_breast_cancer_dataset.csv")`: Reading a
CSV file into a Pandas DataFrame.

3. **Exploring Data:**
- `print(df.describe().T)`: Printing the summary statistics of the dataset.
- `print(df.isnull().sum())`: Printing the count of missing values in each column.
- `print(df.dtypes)`: Printing the data types of each column.

4. **Data Visualization:**
- `sns.countplot(x="diagnosis", data=df)`: Creating a countplot of the 'diagnosis'
column.
- `sns.displot(df['radius_mean'], kde=False)`: Creating a distribution plot of the
'radius_mean' column.
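- `print(df.corr())`: Printing the correlation matrix of the dataset. With pandas 2.0 or later, non-numeric columns such as 'diagnosis' must be excluded, e.g. with `df.corr(numeric_only=True)`.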
- `sns.heatmap(corrMatrix, annot=False, linewidths=.5, ax=ax)`: Creating a
heatmap of the correlation matrix.

5. **Further Data Visualization:**
- Additional data visualization using `sns.countplot` and `sns.histplot` for 'diagnosis' and 'radius_mean', respectively, each followed by `plt.show()`.

6. **Data Preprocessing:**
- `Y = df["diagnosis"].values`: Extracting the target variable.
- `X = df.drop(labels=["diagnosis"], axis=1)`: Extracting the features, excluding
the 'diagnosis' column.



- `imputer = SimpleImputer(strategy='mean')`: Creating an imputer with the
mean strategy.
- `X_imputed = imputer.fit_transform(X)`: Imputing missing values in X using
the mean.
- `scaler = MinMaxScaler()`: Creating a MinMaxScaler.
- `X_scaled = scaler.fit_transform(X_imputed)`: Scaling the features using
MinMaxScaler.

7. **Data Splitting and Model Training:**
- `X_train, X_test, y_train, y_test = train_test_split(X_scaled, Y, test_size=0.2, random_state=42)`: Splitting the data into training and testing sets.
- `model = LogisticRegression(max_iter=5000)`: Creating a logistic regression
model.
- `model.fit(X_train, y_train)`: Fitting the model on the training data.

8. **Model Evaluation:**
- `y_pred = model.predict(X_test)`: Making predictions on the test set.
- `accuracy = accuracy_score(y_test, y_pred)`: Calculating the accuracy of the
model.
- `cm = confusion_matrix(y_test, y_pred)`: Creating a confusion matrix.
- Printing the confusion matrix and the per-class accuracy (recall): the true negative rate for benign cases and the true positive rate for malignant cases.
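This toy example shows which cells of the confusion matrix each rate uses. The 'B'/'M' labels are made up. Passing `labels=['B', 'M']` fixes the row and column order explicitly. With 'M' (malignant) treated as the positive class, `ravel()` unpacks the 2x2 matrix as tn, fp, fn, tp. Both rates divide by a row sum, that is, by the number of truly benign or truly malignant cases.

```python
from sklearn.metrics import confusion_matrix

# Made-up true and predicted diagnoses
y_true = ["B", "B", "B", "B", "M", "M", "M", "B", "M", "B"]
y_pred = ["B", "B", "M", "B", "M", "M", "B", "B", "M", "B"]

cm = confusion_matrix(y_true, y_pred, labels=["B", "M"])  # rows: true, columns: predicted
tn, fp, fn, tp = cm.ravel()
print(cm)                       # [[5 1]
                                #  [1 3]]
specificity = tn / (tn + fp)    # share of benign cases predicted benign
sensitivity = tp / (tp + fn)    # share of malignant cases predicted malignant
print("Specificity (TNR):", specificity)
print("Sensitivity (TPR):", sensitivity)
```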

This code is essentially a data exploration and logistic regression classification pipeline with some visualization and preprocessing steps. It loads a dataset, explores its characteristics, visualizes some aspects, preprocesses the data, and then trains and evaluates a logistic regression model for breast cancer diagnosis prediction.

Tasks:
