Assignment 2


Due Date: July 25, 2018 (11 AM)


Presentation: July 25, 2018
Consider the problem of “Retail Credit Scoring for Auto Finance Ltd.” (Case 5). Auto Finance Ltd., a
part of one of India’s large conglomerates, provides loans to enable cash-strapped lower-middle-class
Indian customers to buy two-wheelers. A major problem observed by the IT team of Auto
Finance Ltd. is the high default rate. Specifically, they have observed that approximately 71% of
the customers have delayed their repayments. They are therefore interested in developing a model
that will help them decide whether or not to extend a loan to a prospective customer.
Based on the attached data set “Assignment_2_data.csv” and the list of variables provided in Exhibit
2 of the case, answer the following questions.

1. Build a logistic regression model on the training data set to identify good customers and
bad customers. A good customer is one who has never delayed the payment, whereas a bad
customer is one who has delayed the payment even once. Use the variables “AGE”,
“NOOFDEPE”, “MTHINCTH”, “SALDATFR”, “TENORYR”, “DWNPMFR”, “PROFBUS”,
“QUALHSC”, “QUAL_PG”, “SEXCODE”, “FULLPDC”, “FRICODE” and “WASHCODE” as predictors
in your logistic model. Clearly interpret the output of the model.
2. Judge the performance of the model on the validation data set. Is the performance of the
model satisfactory? Consider at least two criteria.
3. Include the variable “Region” as an additional predictor in your logistic model. Note that you
have to create appropriate dummy variables for “Region”. Does the inclusion of “Region”
improve the performance of the model?
4. Suppose Auto Finance Ltd. provides loans for a 2-year period. The management of Auto
Finance Ltd. has estimated that the profit associated with a “True Positive” case is Rs. 6360.
They have also estimated that the losses associated with a “False Negative” case and
a “False Positive” case are Rs. 12500 and Rs. 6360, respectively. Based on the confusion matrix
obtained for the validation data set, calculate the total profit for the company.
5. Can you suggest an alternative model? Is the alternative model better than the logistic
regression model?
6. How will the fitted model be helpful in taking managerial decisions?
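
For question 4, the arithmetic can be sketched as below. This is only an illustration: the function name total_profit is hypothetical, the counts passed to it are made-up placeholders rather than results from the case data, and it assumes that “True Negative” cases contribute nothing, since the question assigns them no value.

```r
# Pay-offs stated in question 4 (rupees per case).
profit_tp <- 6360    # profit per "True Positive" case
loss_fn   <- 12500   # loss per "False Negative" case
loss_fp   <- 6360    # loss per "False Positive" case

# Total profit from confusion-matrix counts.
# Assumption: true negatives add nothing, because no figure is given for them.
total_profit <- function(tp, fn, fp) {
  profit_tp * tp - loss_fn * fn - loss_fp * fp
}

# Placeholder counts for illustration only; replace them with the counts
# from your own validation confusion matrix.
example_profit <- total_profit(tp = 100, fn = 20, fp = 30)
print(example_profit)
```

Which class counts as “positive” depends on how the response is coded, so check that before reading the counts off your confusion matrix.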

Hint: 1. You may start with the following code:


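# Reading the data set
# (use the file name of your own copy; the assignment text calls it "Assignment_2_data.csv")
d <- read.csv("case_data.csv", header = TRUE)
attach(d)
names(d)

# Creating a hold-out data set
train <- (d$DATASET == "BUILD")   # TRUE for rows in the training (BUILD) sample
d.test <- d[!train, ]             # all remaining rows form the hold-out set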

# Creating an array of the "DefaulterFlag" variable for the training data
# (may be required later)
DefaulterFlag_train <- d$DefaulterFlag[train]

# Creating an array of the "DefaulterFlag" variable for the hold-out data
# (may be required later)
DefaulterFlag_test <- d$DefaulterFlag[!train]

# Model
mod <- glm(DefaulterFlag ~ AGE + NOOFDEPE + MTHINCTH + SALDATFR + TENORYR + DWNPMFR +
             PROFBUS + QUALHSC + QUAL_PG + SEXCODE + FULLPDC + FRICODE + WASHCODE,
           data = d, family = binomial, subset = train)
summary(mod)
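
For question 2, one possible way to compute several performance criteria is sketched below. The helper names classification_metrics and auc_rank are hypothetical, the toy vectors are invented so that the block runs without the case data, and the sketch assumes the response is coded 0/1 with 1 as the “positive” class and a cut-off of 0.5.

```r
# Confusion matrix and basic rates from 0/1 labels and predicted probabilities.
# Assumption: 1 is the "positive" class; the 0.5 cut-off is an arbitrary default.
classification_metrics <- function(actual, prob, cutoff = 0.5) {
  predicted <- as.integer(prob >= cutoff)
  tp <- sum(predicted == 1 & actual == 1)
  tn <- sum(predicted == 0 & actual == 0)
  fp <- sum(predicted == 1 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  list(
    confusion   = table(Actual    = factor(actual,    levels = c(0, 1)),
                        Predicted = factor(predicted, levels = c(0, 1))),
    accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp)
  )
}

# Area under the ROC curve via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(actual, prob) {
  r  <- rank(prob)            # average ranks are used for tied probabilities
  n1 <- sum(actual == 1)
  n0 <- sum(actual == 0)
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy vectors for illustration only; they are not taken from the case data.
toy_actual <- c(1, 1, 1, 0, 0, 1, 0, 0)
toy_prob   <- c(0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1)
toy_metrics <- classification_metrics(toy_actual, toy_prob)
toy_auc     <- auc_rank(toy_actual, toy_prob)
print(toy_metrics$confusion)

# With the fitted model, the same helpers could be applied to the hold-out data
# (assuming DefaulterFlag is coded 0/1):
#   prob_test <- predict(mod, newdata = d.test, type = "response")
#   classification_metrics(DefaulterFlag_test, prob_test)
#   auc_rank(DefaulterFlag_test, prob_test)
```

Since about 71% of customers are reported to have delayed repayments, accuracy alone can look good for a model that predicts a single class, which is one reason to report sensitivity, specificity, or AUC alongside it.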

2. You may have to define dummy variables for “Region” as follows. Note that the reference category is all other regions. Include
the dummy variables in your model.
# Region Code
# Each dummy is 1 for the named region and 0 otherwise, stored as a factor.
# Columns are referenced through d$ so the code does not depend on attach().
d$AP2 <- as.factor(ifelse(d$Region == "AP2", 1, 0))
d$Chennai <- as.factor(ifelse(d$Region == "Chennai", 1, 0))
d$KA1 <- as.factor(ifelse(d$Region == "KA1", 1, 0))
d$KE2 <- as.factor(ifelse(d$Region == "KE2", 1, 0))
d$TN1 <- as.factor(ifelse(d$Region == "TN1", 1, 0))
