ICT-4202, DIP Lab Manual - 8
Jahangirnagar University
Savar, Dhaka-1342
Lab Manual
Prepared by
Mehrin Anannya
Assistant Professor
Institute of Information Technology
Jahangirnagar University
Lab Contents:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = pd.read_csv('data/cells.csv')
print(df)

plt.scatter(x="time", y="cells", data=df)
plt.show()

# Separate the features from the target column.
x_df = df.drop('cells', axis='columns')
# Or you can pick columns manually. Remember double brackets:
# a single bracket returns a Series whereas double brackets return a
# pandas DataFrame, which is what the model expects.
y_df = df[['cells']]

# Split data into training and test datasets so we can validate the model using test data.
X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.3, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
print(model.score(X_train, y_train))  # R^2 on the training data

prediction_test = model.predict(X_test)
print(y_test, prediction_test)

# Residual plot
plt.scatter(prediction_test, prediction_test - y_test)
plt.hlines(y=0, xmin=200, xmax=300)
plt.show()
1. Import Libraries:
- `pandas` is a data manipulation library.
- `numpy` is used for numerical operations.
- `matplotlib.pyplot` is for creating visualizations.
- `seaborn` is built on top of matplotlib and provides additional visualizations.
5. Train-Test Split:
- `train_test_split()`: Splits the data into training and testing sets.
- `random_state` can be any integer; it is used as a seed when the dataset is split randomly. Fixing it means we work with the same test dataset every time, which matters when results must be reproducible (see the sketch after this list).
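A small self-contained check of what `random_state` buys us: the same seed always reproduces the same split (toy arrays here, not the lab data):

from sklearn.model_selection import train_test_split
import numpy as np

data = np.arange(10).reshape(5, 2)
labels = np.arange(5)

# Two splits with the same seed pick identical test rows.
a_train, a_test, _, _ = train_test_split(data, labels, test_size=0.4, random_state=42)
b_train, b_test, _, _ = train_test_split(data, labels, test_size=0.4, random_state=42)
print((a_test == b_test).all())  # True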
7. Model Evaluation:
- `model.score(X_train, y_train)`: Returns the R^2 value, a measure of how well the model replicates the observed values; a cross-check with `r2_score` is sketched below.
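For reference, `model.score()` on a regressor and `sklearn.metrics.r2_score` compute the same quantity; a minimal sketch, assuming the fitted `model` and the split from the code above:

from sklearn.metrics import r2_score

# Both lines print the coefficient of determination R^2 on the training data.
print(model.score(X_train, y_train))
print(r2_score(y_train, model.predict(X_train)))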
9. Residual Plot:
- `plt.scatter()`: Creates a scatter plot of the residuals against the predicted values.
- `plt.hlines()`: Adds a horizontal reference line at y=0; residuals scattered evenly around this line suggest the model's errors are unbiased.
This code performs linear regression on the given data, evaluates the model, makes predictions, and visualizes the results. It is crucial to consider assumptions like the linearity of the relationship and the normality of residuals, and to address challenges like multicollinearity among the features. Model evaluation metrics, including R-squared and Mean Squared Error, provide insight into the model's performance. Proper feature scaling is also important, because the magnitudes of the coefficients are sensitive to the scales of the input variables. In summary, multi-linear regression serves as a powerful and interpretable tool for understanding and predicting outcomes influenced by multiple interacting variables. A scaling sketch follows.
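A minimal sketch of feature scaling with `StandardScaler` (an illustrative addition; the lab code above fits the model on unscaled features):

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training features only, then apply the same
# transform to the test features, so no test information leaks in.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model.fit(X_train_scaled, y_train)  # coefficients now live on comparable scales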
df = pd.read_csv('data/heart_data.csv')
print(df.head())

x_df = df.drop('heart.disease', axis=1)   # features (independent variables)
y_df = df['heart.disease']                # target variable
X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.3, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

prediction_test = model.predict(X_test)
print(y_test, prediction_test)
print("Mean sq. error between y_test and predicted =",
      np.mean((prediction_test - y_test)**2))

# All set to predict heart.disease for a new set of feature values
#print(model.predict([[13, 2, 23]]))
This code snippet performs a linear regression analysis on a dataset related to heart
disease. Let's break down the code:
3. **Data Splitting:**
- `x_df = df.drop('heart.disease', axis=1)`: Separates the features (independent
variables) from the target variable.
- `y_df = df['heart.disease']`: Extracts the target variable.
- `X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.3,
random_state=42)`: Splits the data into training and testing sets.
5. **Model Evaluation:**
- `prediction_test = model.predict(X_test)`: Makes predictions on the test set.
- `print(y_test, prediction_test)`: Prints the actual values alongside the predicted values.
- `print("Mean sq. error between y_test and predicted =", np.mean((prediction_test - y_test)**2))`: Calculates and prints the mean squared error between the predicted and actual values. Note that the squaring must happen inside `np.mean()`; `np.mean(prediction_test - y_test)**2` would square the mean residual instead, which is a different quantity. A cross-check with scikit-learn's metric is sketched below.
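As a cross-check (assuming the `y_test` and `prediction_test` from the code above), scikit-learn's built-in metric computes the same value:

from sklearn.metrics import mean_squared_error

# Equivalent to np.mean((prediction_test - y_test)**2)
print(mean_squared_error(y_test, prediction_test))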
7. **Prediction:**
- A prediction example is left commented out at the end: `#print(model.predict([[13, 2, 23]]))`. Uncommented, this line would predict the 'heart.disease' value for a given set of feature values (13, 2, 23); a version that keeps the feature names is sketched below.
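Because the model was trained on a DataFrame, passing new values in a DataFrame with the same column names avoids scikit-learn's feature-name warning. A minimal sketch, assuming `x_df` has three feature columns as the commented example implies; the numeric values are the placeholders from that comment, not measurements from the dataset:

# Reuse the training columns so the feature names line up.
new_row = pd.DataFrame([[13, 2, 23]], columns=x_df.columns)
print(model.predict(new_row))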
In summary, this code explores the relationship between certain features and heart disease using linear regression. It trains a linear regression model on the provided dataset, evaluates its performance, and reports the mean squared error. Additionally, it includes a commented-out line for making a prediction using the trained model. The fitted coefficients themselves can be inspected as sketched below.
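To inspect what the model learned (assuming the fitted `model` and the `x_df` frame from above), scikit-learn exposes the coefficients and intercept as attributes:

# One coefficient per feature column, plus a single intercept.
for name, coef in zip(x_df.columns, model.coef_):
    print(name, coef)
print("intercept =", model.intercept_)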
import numpy as np
import cv2
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

# Load the dataset (path assumed; point this at your CSV).
df = pd.read_csv('data/breast_cancer_data.csv')

print(df.describe().T)     # summary statistics
print(df.isnull().sum())   # count of missing values per column
#df = df.dropna()          # optionally drop rows with missing values
print(df.dtypes)           # data type of each column

sns.displot(df['radius_mean'], kde=False)

# Correlation matrix of the numeric columns.
corrMatrix = df.corr(numeric_only=True)
print(corrMatrix)
fig, ax = plt.subplots(figsize=(10,10)) # Sample figsize in inches
#sns.heatmap(df.iloc[:, 1:6:], annot=True, linewidths=.5, ax=ax)
sns.heatmap(corrMatrix, annot=False, linewidths=.5, ax=ax)

# Countplot
sns.countplot(x="diagnosis", data=df)
plt.show()

# Distribution Plot
sns.histplot(df['radius_mean'], kde=False)
plt.show()

# Assuming X_scaled is your scaled feature matrix, Y is your target variable,
# and cm is the confusion matrix from the evaluation step below.
# Per-class accuracy: each diagonal entry divided by its row (true-label) sum.
print("Class 0 correctly classified (True Negative Rate) =", cm[0, 0] / (cm[0, 0] + cm[0, 1]))
print("Class 1 correctly classified (True Positive Rate) =", cm[1, 1] / (cm[1, 0] + cm[1, 1]))
1. **Importing Libraries:**
- `import numpy as np`: Importing the NumPy library and renaming it as np.
- `import cv2`: Importing the OpenCV library.
- `import pandas as pd`: Importing the Pandas library.
- `from matplotlib import pyplot as plt`: Importing the pyplot module from the
Matplotlib library.
- `import seaborn as sns`: Importing the Seaborn library.
2. **Loading Data:**
- `df = pd.read_csv(...)`: Reads the CSV file into a Pandas DataFrame; the exact path depends on where the dataset is stored.
3. **Exploring Data:**
- `print(df.describe().T)`: Printing the summary statistics of the dataset.
- `print(df.isnull().sum())`: Printing the count of missing values in each column.
- `print(df.dtypes)`: Printing the data types of each column.
4. **Data Visualization:**
- `sns.countplot(x="diagnosis", data=df)`: Creating a countplot of the 'diagnosis'
column.
- `sns.displot(df['radius_mean'], kde=False)`: Creating a distribution plot of the
'radius_mean' column.
- `print(df.corr())`: Printing the correlation matrix of the dataset.
- `sns.heatmap(corrMatrix, annot=False, linewidths=.5, ax=ax)`: Creating a
heatmap of the correlation matrix.
6. **Data Preprocessing:**
- `Y = df["diagnosis"].values`: Extracting the target variable.
- `X = df.drop(labels=["diagnosis"], axis=1)`: Extracting the features, excluding the 'diagnosis' column. A sketch of encoding and scaling these follows this item.
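The `X_scaled` referenced in the code comment has to come from somewhere; a minimal sketch of the usual preprocessing, assuming B/M strings in `diagnosis` (the encoder and scaler choices are illustrative, not shown in the lab code):

from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Encode the B/M diagnosis strings as 0/1 integers.
Y = LabelEncoder().fit_transform(df["diagnosis"].values)
X = df.drop(labels=["diagnosis"], axis=1)

# Scale every feature column into [0, 1].
X_scaled = MinMaxScaler().fit_transform(X)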
8. **Model Evaluation:**
- `y_pred = model.predict(X_test)`: Making predictions on the test set.
- `accuracy = accuracy_score(y_test, y_pred)`: Calculating the accuracy of the model.
- `cm = confusion_matrix(y_test, y_pred)`: Creating a confusion matrix; its rows correspond to true labels and its columns to predicted labels.
- Printing the confusion matrix and the individual accuracy values for each class. An end-to-end sketch follows.
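The snippet never shows the classifier being created or trained; a minimal end-to-end sketch, with logistic regression chosen purely for illustration (the lab code does not name the model):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Split the preprocessed features and labels from the sketch above.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, Y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy =", accuracy_score(y_test, y_pred))

cm = confusion_matrix(y_test, y_pred)
print(cm)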
Tasks: