MACHINE LEARNING PROJECT
STUDENT GRADE ANALYSIS AND PREDICTION
DATA PREPROCESSING
This part of the code drops the column 'PlaceofBirth' from the DataFrame df, as it may not be relevant for the analysis. It then prints descriptive statistics of the dataset using the describe() function, which provides insight into its distribution.
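A minimal sketch of this step, using a tiny synthetic DataFrame in place of the real student dataset (the column names other than 'PlaceofBirth' are assumptions for illustration):

```python
import pandas as pd

# Tiny synthetic stand-in for the student dataset
df = pd.DataFrame({
    "gender": ["M", "F", "M", "F"],
    "PlaceofBirth": ["KuwaIT", "KuwaIT", "USA", "KuwaIT"],
    "raisedhands": [15, 80, 42, 70],
    "Class": ["L", "H", "M", "H"],
})

# Drop the column that is not needed for the analysis
df = df.drop("PlaceofBirth", axis=1)

# Descriptive statistics of the numeric columns
print(df.describe())
```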
DATA VISUALIZATION
This section creates count plots for several categorical variables listed in the ls list. It uses Seaborn's catplot() function to visualize the distribution of students across the different categories.
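The looped catplot() calls might look like the sketch below; the columns in ls here are assumed examples, and a synthetic DataFrame stands in for the real data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, since plots are saved/shown elsewhere
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M"],
    "StageID": ["lowerlevel", "MiddleSchool", "MiddleSchool", "HighSchool", "lowerlevel"],
})

# One count plot per categorical column
ls = ["gender", "StageID"]
for col in ls:
    g = sns.catplot(data=df, x=col, kind="count")
```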
FEATURE ENGINEERING
Here, the target variable ('Class') is separated from the
features. Categorical variables are converted into dummy
variables using one-hot encoding (pd.get_dummies()), and the
target variable is encoded using LabelEncoder().
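A sketch of the encoding step on a small synthetic frame (feature column names are illustrative; 'Class' as the target matches the text):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "gender": ["M", "F", "M", "F"],
    "Relation": ["Father", "Mum", "Father", "Mum"],
    "raisedhands": [15, 80, 42, 70],
    "Class": ["L", "H", "M", "H"],
})

# Separate the target from the features
X = df.drop("Class", axis=1)
y = df["Class"]

# One-hot encode the categorical feature columns
X = pd.get_dummies(X)

# Encode the target labels as integers (sorted alphabetically: H=0, L=1, M=2)
le = LabelEncoder()
y = le.fit_transform(y)
```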
MODELING
This section splits the data into training and testing sets using train_test_split() and standardizes the features using StandardScaler() so that all features share the same scale, fitting the scaler on the training set and then transforming both the training and testing sets accordingly.
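A minimal sketch of the split-and-scale step (array contents and split ratio are illustrative; the key point is fitting the scaler on the training set only):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(40, dtype=float).reshape(20, 2)
y = np.arange(20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training set only, then transform both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```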
FEATURE IMPORTANCE
This part of the code uses a random forest classifier to
determine the importance of each feature in predicting the
target variable. The feature_importances_ attribute of the
trained random forest model is used to get feature importance
scores. These scores are then sorted, and a bar plot is created
to visualize the importance of each feature.
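The idea can be sketched as follows on a synthetic classification problem (scores are printed here rather than bar-plotted, to keep the example short):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5,
                           n_informative=3, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# Importance scores, sorted in descending order for display
importances = rf.feature_importances_
order = np.argsort(importances)[::-1]
for i in order:
    print(f"feature {i}: {importances[i]:.3f}")
```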
DIMENSIONALITY REDUCTION
In this section, certain dimensions (features) are removed from
both the training and testing sets based on the list ls. This step
aims to reduce the dimensionality of the dataset by excluding
less relevant features.
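Dropping the same columns from both sets could look like this sketch (the indices in ls are illustrative placeholders, not the actual features removed in the project):

```python
import numpy as np

X_train = np.arange(20, dtype=float).reshape(4, 5)
X_test = np.arange(10, dtype=float).reshape(2, 5)

# Indices of the low-importance columns to drop (illustrative values)
ls = [1, 3]

# Remove the same columns from both the training and testing sets
X_train = np.delete(X_train, ls, axis=1)
X_test = np.delete(X_test, ls, axis=1)
```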
MODEL EVALUATION
This part of the code evaluates various regression models using k-fold cross-validation. Models such as Linear Regression, Lasso Regression, Elastic Net, KNN Regression, Decision Tree Regression, and Support Vector Regression are trained and evaluated with cross-validation to determine their performance.
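The evaluation loop described above can be sketched like this, using a synthetic regression dataset in place of the student data (fold count and random seeds are assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression, Lasso, ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

models = {
    "LR": LinearRegression(),
    "LASSO": Lasso(),
    "EN": ElasticNet(),
    "KNN": KNeighborsRegressor(),
    "CART": DecisionTreeRegressor(random_state=0),
    "SVR": SVR(),
}

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=kfold,
                             scoring="neg_mean_squared_error")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} ({scores.std():.3f})")
```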
MODEL COMPARISON
Similar to the previous section, this code compares the performance of the same regression models, but this time using scaled features. Each model is part of a pipeline that includes feature scaling with StandardScaler().
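Wrapping each model in a scaling pipeline might look like the sketch below (same caveats as before: synthetic data, assumed fold count and seeds):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Lasso, ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

estimators = {
    "ScaledLR": LinearRegression(),
    "ScaledLASSO": Lasso(),
    "ScaledEN": ElasticNet(),
    "ScaledKNN": KNeighborsRegressor(),
    "ScaledCART": DecisionTreeRegressor(random_state=0),
    "ScaledSVR": SVR(),
}

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = {}
for name, est in estimators.items():
    # Scaling happens inside the pipeline, so each CV fold is scaled
    # using statistics from its own training portion only
    pipe = Pipeline([("Scaler", StandardScaler()), (name, est)])
    scores = cross_val_score(pipe, X, y, cv=kfold,
                             scoring="neg_mean_squared_error")
    results[name] = scores.mean()
```

Putting the scaler inside the pipeline (rather than scaling once up front) keeps cross-validation honest: test folds never influence the scaling statistics.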
HYPERPARAMETER TUNING (LASSO ALGORITHM)
This section performs hyperparameter tuning for the Lasso regression algorithm. It uses grid search (GridSearchCV) to find the best value for the regularization parameter alpha. A grid of alpha values is defined, and the best alpha is then selected based on cross-validated mean squared error.
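A sketch of the grid search (the candidate alpha values below are illustrative, not necessarily the grid used in the project):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# Grid of candidate regularization strengths (illustrative values)
param_grid = {"alpha": [1.0, 0.1, 0.01, 0.001]}

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
grid = GridSearchCV(Lasso(), param_grid,
                    scoring="neg_mean_squared_error", cv=kfold)
grid.fit(X, y)

print("best alpha:", grid.best_params_["alpha"])
print("best CV score:", grid.best_score_)
```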
USING ENSEMBLES
Here, various ensemble methods are evaluated, including AdaBoost, Gradient Boosting, Random Forest, and Extra Trees. Each ensemble method is part of a pipeline that includes feature scaling, and cross-validated mean squared error is used to evaluate the performance of each method.
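The ensemble evaluation follows the same pattern as the earlier pipeline comparison; a sketch under the same assumptions (synthetic data, assumed seeds):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

ensembles = {
    "ScaledAB": AdaBoostRegressor(random_state=0),
    "ScaledGBM": GradientBoostingRegressor(random_state=0),
    "ScaledRF": RandomForestRegressor(random_state=0),
    "ScaledET": ExtraTreesRegressor(random_state=0),
}

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = {}
for name, est in ensembles.items():
    pipe = Pipeline([("Scaler", StandardScaler()), (name, est)])
    scores = cross_val_score(pipe, X, y, cv=kfold,
                             scoring="neg_mean_squared_error")
    results[name] = scores.mean()
```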
COMPARING ENSEMBLE ALGORITHMS
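One common way to compare the ensembles is a box plot of their cross-validation scores; the sketch below uses placeholder score lists purely to show the plotting pattern (the values are not project results):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Placeholder per-fold CV scores for each ensemble (illustrative values only)
results = {
    "AB": [-12.1, -10.5, -11.3],
    "GBM": [-3.2, -2.8, -3.0],
    "RF": [-5.0, -4.4, -4.7],
    "ET": [-4.1, -3.9, -4.0],
}

fig, ax = plt.subplots()
ax.boxplot(results.values())
ax.set_xticklabels(results.keys())
ax.set_title("Ensemble Algorithm Comparison")
fig.savefig("ensemble_comparison.png")
```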
PROJECT OBJECTIVE
PREDICTING STUDENT PERFORMANCE BASED ON DEMOGRAPHIC AND BEHAVIORAL FACTORS TO ASSIST EDUCATORS IN IDENTIFYING AT-RISK STUDENTS.
IMPACT
CONTRIBUTES TO ONGOING EFFORTS TO ENHANCE EDUCATIONAL OUTCOMES AND SUPPORT STUDENT SUCCESS THROUGH DATA-DRIVEN INSIGHTS AND INTERVENTIONS.
THANK YOU ALL FOR YOUR KIND ATTENTION