Project FRA Milestone1 JPY Nikita Chaturvedi 05.05.2022 Jupyter Notebook PDF
Project FRA Milestone1 JPY Nikita Chaturvedi 05.05.2022 Jupyter Notebook PDF
Project FRA Milestone1 JPY Nikita Chaturvedi 05.05.2022 Jupyter Notebook PDF
Problem Statement
Businesses or companies can fall prey to default if they are not able to keep up their debt obligations. Defaults
will lead to a lower credit rating for the company which in turn reduces its chances of getting credit in the future
and may have to pay higher interests on existing debts as well as any new obligations. From an investor's point
of view, he would want to invest in a company if it is capable of handling its financial obligations, can grow
quickly, and is able to manage the growth scale.
A balance sheet is a financial statement of a company that provides a snapshot of what a company owns,
owes, and the amount invested by the shareholders. Thus, it is an important tool that helps evaluate the
performance of a business.
Data that is available includes information from the financial statement of the
companies for the previous year (2015). Also, information about the Networth of
the company in the following year (2016) is provided which can be used to drive
the labeled field.
In [175]:
import warnings
warnings.filterwarnings("ignore")
In [2]:
In [3]:
Company.head(10)
Capital [Latest] [Latest] [Latest] [Latest] (
68.08 4,458.20 7,410.18 9,070.86 -1,098.88 486.86 ... -10.3 -39.74 -57.74 -57.74 -87.18
06.86 7,714.68 6,944.54 1,281.54 4,496.25 9,097.64 ... -5,279.14 -5,516.98 -7,780.25 -7,723.67 -7,961.51
23.49 2,353.88 2,326.05 1,033.69 -2,612.42 1,034.12 ... -3.33 -7.21 -48.13 -47.7 -51.58
70.83 4,675.33 5,740.90 1,084.20 1,836.23 4,685.81 ... -295.55 -400.55 -845.88 379.79 274.79
31.57 1,536.08 2,567.65 949.98 804.82 834.86 ... -395.87 -987.73 -396.67 -672.36 -1,264.22
45.45 979.13 2,664.04 920.67 263.95 705.76 ... -447.24 -596.97 -456.4 -461.06 -610.8
60.94 -613.79 597.82 1,700.27 -1,121.96 117.67 ... 1.9 -20.43 -3.58 -3.58 -25.91
47.85 86.35 1,220.83 1,329.82 -390.53 2,536.78 ... 19.23 18.18 9.76 9.76 8.71
In [4]:
Company.tail(10)
Capital
Power
3576 5455 Grid 43811.23 5,231.59 38,166.59 1,39,632.92 95,044.55 1,18,264.26 -10,923.29 12
Corpn
3577 566 Tata Steel 46637.38 971.41 66,663.89 1,01,142.12 28,198.44 42,583.38 -3,727.04 12
Sardar
3578 13569 47261.30 42,263.46 44,129.73 46,810.68 2,636.27 3,746.17 665.73 1
Sar.Narm.
3579 5554 Axis Bank 53164.91 474.1 44,676.51 4,61,977.78 4,02,200.22 4,497.01 0 3,58
HDFC
3581 4987 72677.77 501.3 62,009.42 5,90,576 4,96,009.19 8,463.30 0 4,44
Bank
3582 502 Vedanta 79162.19 296.5 34,057.87 71,906.06 37,643.79 29,848.44 2,503.86 11
3583 12002 IOCL 88134.31 2,427.95 67,969.97 1,40,686.75 55,245.01 1,21,643.45 6,376.84 89
3584 12001 NTPC 91293.70 8,245.46 81,657.35 1,73,099.14 85,995.34 1,28,477.59 11,449.79 42
Bharti
3585 15542 111729.10 1,998.70 78,270.80 1,04,241 21,569.70 1,00,084.90 -12,145.30 11
Airtel
In [5]:
erc').str.replace('/','_by_').str.replace('&','and').str.replace('[','_').str.replace
In [6]:
Company.head(10)
Out[6]:
Tata Tele.
1 21214 -3986.19 1,954.93 -2,968.08 4,458.20 7
Mah.
ABG
2 14852 -3192.58 53.84 506.86 7,714.68 6
Shipyard
Bharati
4 23505 -2967.36 50.3 -1,070.83 4,675.33 5
Defence
Hanung
6 23633 -2125.05 30.82 -1,031.57 1,536.08 2
Toys
Quadrant
8 1541 -1695.75 61.23 -1,560.94 -613.79
Tele.
10 rows × 67 columns
In [7]:
Company.info()
<class 'pandas.core.frame.DataFrame'>
In [8]:
Company.dtypes.value_counts()
Out[8]:
object 62
float64 4
int64 1
dtype: int64
In [9]:
Company.shape
print('The number of rows of the dataframe is',Company.shape[0],'.')
print('The number of columns of the dataframe is',Company.shape[1],'.')
Dropping below listed columns as we can either use the raw values or the there percentages or
ratios.Here, we are choosing to drop these raw values and keeping the percentage values:
1. Co_Name as name of the company can be identified from Company code as well.
2. Networth as ROG-Net_Worth_perc is nothing but percentage of Value of a company as on 2015 - Current
Year.
3. Capital_Employed as ROG-Capital_Employed_perc is nothing but percentage of Total amount of capital
used for the acquisition of profits by a company.
4. Gross Block as ROG-Gross_Block_perc is percentage of Total value of all of the assets that a company
owns i.e. Gross Block.
5. Gross Sales as ROG-Gross_Sales_perc is percentage of The grand total of sale transactions within the
accounting period i.e., Gross Sales.
6. Net_Sales as ROG-Net_Sales_perc is percentage of Gross sales minus returns, allowances, and discounts
i.e. Net Sales.
7. Cost_of_Production as ROG-Cost_of_Production_perc is percentage of Costs incurred by a business from
manufacturing a product or providing a service i.e. Cost_of_Production.
8. PBIDT as ROG-PBIDT_perc is percentage of Profit Before Interest, Depreciation & Taxes i.e., PBIDT.
9. PBDT as ROG-PBDT_perc is percentage of Profit Before Depreciation and Tax i.e., PBDT.
10. PBIT as ROG-PBIT_perc is percentage of Profit before interest and taxes i.e., PBIT.
11. PBT as ROG-PBT_perc is percentage of Profit before tax i.e., PBT.
localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ipynb 5/102
06/02/2022, 17:52 Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022 - Jupyter Notebook
p p g ,
12. PAT as ROG-PAT_perc is percentage of Profit After Tax i.e., PAT.
13. CP as ROG-CP_perc is percentage of Commercial paper, a short-term debt instrument to meet short-term
liabilities. i.e CP.
14. Revenue_earnings_in_forex as ROG-Revenue_earnings_in_forex_perc is percentage of Revenue earned in
foreign currency i.e.,Revenue_earnings_in_forex .
15. Revenue_expenses_in_forex as ROG-Revenue_expenses_in_forex_perc is percentage of Expenses due to
foreign currency transactions i.e., Revenue_expenses_in_forex.
16. Market_Capitalisation as ROG-Market_Capitalisation_perc is percentage of Product of the total number of
a company's outstanding shares and the current market price of one share i.e., Market_Capitalisation.
In [10]:
Company.drop(['Co_Name','Networth','Gross_Block','Gross_Sales','Net_Sales','Cost_of_
'PBIDT','PBDT','PBIT','PBT','PAT','CP','Revenue_earnings_in_forex',
'Revenue_expenses_in_forex','Market_Capitalisation','Capital_Employed']
In [11]:
Company.head()
Out[11]:
5 rows × 51 columns
In [12]:
Company.shape
print('The number of rows of the dataframe after dropping certain columns is',Compan
print('The number of columns of the dataframe after dropping certain columns is',Com
In [13]:
dups = Company.duplicated()
Company[dups]
Out[13]:
0 rows × 51 columns
In [14]:
Company.isnull().sum()
Out[14]:
Co_Code 0
Networth_Next_Year 0
Equity_Paid_Up 0
Total_Debt 0
Net_Working_Capital 0
Current_Assets 0
Current_Liabilities_and_Provisions 0
Total_Assets_by_Liabilities 0
Other_Income 0
Value_Of_Output 0
Selling_Cost 0
Adjusted_PAT 0
Capital_expenses_in_forex 0
Book_Value_Unit_Curr 0
Book_Value_Adj_Unit_Curr 4
CEPS_annualised_Unit_Curr 0
Cash_Flow_From_Operating_Activities 0
Cash_Flow_From_Investing_Activities 0
Cash_Flow_From_Financing_Activities 0
ROG_Net_Worth_perc 0
ROG_Capital_Employed_perc 0
ROG_Gross_Block_perc 0
ROG_Gross_Sales_perc 0
ROG_Net_Sales_perc 0
ROG_Cost_of_Production_perc 0
ROG_Total_Assets_perc 0
ROG_PBIDT_perc 0
ROG_PBDT_perc 0
ROG_PBIT_perc 0
ROG_PBT_perc 0
ROG_PAT_perc 0
ROG_CP_perc 0
ROG_Revenue_earnings_in_forex_perc 0
ROG_Revenue_expenses_in_forex_perc 0
ROG_Market_Capitalisation_perc 0
Current_Ratio_Latest 1
Fixed_Assets_Ratio_Latest 1
Inventory_Ratio_Latest 1
Debtors_Ratio_Latest 1
Total_Asset_Turnover_Ratio_Latest 1
Interest_Cover_Ratio_Latest 1
PBIDTM_perc_Latest 1
PBITM_perc_Latest 1
PBDTM_perc_Latest 1
CPM_perc_Latest 1
APATM_perc_Latest 1
Debtors_Velocity_Days 0
Creditors_Velocity_Days 0
Inventory_Velocity_Days 103
Value_of_Output_by_Total_Assets 0
Value_of_Output_by_Gross_Block 0
dtype: int64
In [15]:
Company.isnull().sum().sum()
print("Number of missing values in dataset is",Company.isnull().sum().sum())
In [16]:
Company.dtypes.value_counts()
Out[16]:
object 46
float64 4
int64 1
dtype: int64
In [17]:
Company.head()
Out[17]:
5 rows × 51 columns
Data Insights:
'Networth_Next_Year' is the target variable and all other are predector variables.
From data entries it can be observed that 47 columns are of Object Data which are Numerical in nature.
Hence, we will convert these object data types to numerical and then check descriptive statistics of data
(as all these value are of numerical data type).
In [18]:
0.06 14
0.01 14
0.05 15
0.02 17
0 48
Name: Net_Working_Capital, Length: 2699, dtype: int64
CURRENT_ASSETS : 2775
15,248.91 1
13.16 1
11.31 1
13.29 1
266.02 1
..
0.08 16
0.02 18
0.01 19
0.03 20
0 27
In [19]:
Company.columns
Out[19]:
'Net_Working_Capital', 'Current_Assets',
'Current_Liabilities_and_Provisions', 'Total_Assets_by_Liabilit
ies',
'Capital_expenses_in_forex', 'Book_Value_Unit_Curr',
'Book_Value_Adj_Unit_Curr', 'CEPS_annualised_Unit_Curr',
'Cash_Flow_From_Operating_Activities',
'Cash_Flow_From_Investing_Activities',
'Cash_Flow_From_Financing_Activities', 'ROG_Net_Worth_perc',
'ROG_Capital_Employed_perc', 'ROG_Gross_Block_perc',
'ROG_Gross_Sales_perc', 'ROG_Net_Sales_perc',
'ROG_Cost_of_Production_perc', 'ROG_Total_Assets_perc',
'ROG_Revenue_expenses_in_forex_perc', 'ROG_Market_Capitalisatio
n_perc',
'Current_Ratio_Latest', 'Fixed_Assets_Ratio_Latest',
'Inventory_Ratio_Latest', 'Debtors_Ratio_Latest',
'Total_Asset_Turnover_Ratio_Latest', 'Interest_Cover_Ratio_Late
st',
'Creditors_Velocity_Days', 'Inventory_Velocity_Days',
'Value_of_Output_by_Total_Assets', 'Value_of_Output_by_Gross_Bl
ock'],
dtype='object')
In [20]:
cat=[]
num=[]
for i in Company.columns:
if Company[i].dtype=="object":
cat.append(i)
else:
num.append(i)
print("Categorical Columns:",cat)
print("/")
print("Numerical Columns:",num)
In [23]:
In [24]:
feature: Book_Value_Adj_Unit_Curr
Length: 2964
feature: CEPS_annualised_Unit_Curr
Length: 1900
In [25]:
Company.info()
<class 'pandas.core.frame.DataFrame'>
In [26]:
Company.dtypes.value_counts()
Out[26]:
int16 46
float64 4
int64 1
dtype: int64
In [27]:
round(Company.describe(),2).T
Out[27]:
In [28]:
continuous=Company.dtypes[(Company.dtypes=='int64')|(Company.dtypes=='float64')|(Com
data_plot=Company[continuous]
data_plot.boxplot(figsize=(20,10));
plt.xlabel("Continuous Variables")
plt.ylabel("Density")
plt.title("Figure: Boxplot of Continuous Data")
Out[28]:
Noticeably, there are outliers present in the data set.To confirm our analysis , we will further detect
outliers and decide how these outliers should be treated.
Detecting outliers using IQR method by defining a new range, that is called a decision range, and any
data point lying outside this range is considered as an outlier. The range is as given below:
IQR = Q3 − Q1
In [29]:
Q1 = Company.quantile(0.25)
Q3 = Company.quantile(0.75)
IQR = Q3 - Q1
UL = Q3 + 1.5*IQR
LL = Q1 - 1.5*IQR
In [30]:
Out[30]:
Co_Code 291
Networth_Next_Year 676
Equity_Paid_Up 0
Total_Debt 0
Net_Working_Capital 0
Current_Assets 0
Current_Liabilities_and_Provisions 0
Total_Assets_by_Liabilities 0
Other_Income 79
Value_Of_Output 0
Selling_Cost 168
Adjusted_PAT 0
Capital_expenses_in_forex 694
Book_Value_Unit_Curr 0
Book_Value_Adj_Unit_Curr 0
CEPS_annualised_Unit_Curr 0
Cash_Flow_From_Operating_Activities 0
Cash_Flow_From_Investing_Activities 0
Cash_Flow_From_Financing_Activities 0
ROG_Net_Worth_perc 0
ROG_Capital_Employed_perc 0
ROG_Gross_Block_perc 0
ROG_Gross_Sales_perc 0
ROG_Net_Sales_perc 0
ROG_Cost_of_Production_perc 0
ROG_Total_Assets_perc 0
ROG_PBIDT_perc 0
ROG_PBDT_perc 0
ROG_PBIT_perc 0
ROG_PBT_perc 0
ROG_PAT_perc 0
ROG_CP_perc 0
ROG_Revenue_earnings_in_forex_perc 1317
ROG_Revenue_expenses_in_forex_perc 1615
ROG_Market_Capitalisation_perc 0
Current_Ratio_Latest 160
Fixed_Assets_Ratio_Latest 0
Inventory_Ratio_Latest 0
Debtors_Ratio_Latest 0
Total_Asset_Turnover_Ratio_Latest 201
Interest_Cover_Ratio_Latest 0
PBIDTM_perc_Latest 0
PBITM_perc_Latest 0
PBDTM_perc_Latest 0
CPM_perc_Latest 0
APATM_perc_Latest 0
Debtors_Velocity_Days 0
Creditors_Velocity_Days 0
Inventory_Velocity_Days 262
Value_of_Output_by_Total_Assets 150
Value_of_Output_by_Gross_Block 0
dtype: int64
In [31]:
In [32]:
Company.isnull().sum()
Out[32]:
Co_Code 291
Networth_Next_Year 676
Equity_Paid_Up 0
Total_Debt 0
Net_Working_Capital 0
Current_Assets 0
Current_Liabilities_and_Provisions 0
Total_Assets_by_Liabilities 0
Other_Income 79
Value_Of_Output 0
Selling_Cost 168
Adjusted_PAT 0
Capital_expenses_in_forex 694
Book_Value_Unit_Curr 0
Book_Value_Adj_Unit_Curr 0
CEPS_annualised_Unit_Curr 0
Cash_Flow_From_Operating_Activities 0
Cash_Flow_From_Investing_Activities 0
Cash_Flow_From_Financing_Activities 0
ROG_Net_Worth_perc 0
ROG_Capital_Employed_perc 0
ROG_Gross_Block_perc 0
ROG_Gross_Sales_perc 0
ROG_Net_Sales_perc 0
ROG_Cost_of_Production_perc 0
ROG_Total_Assets_perc 0
ROG_PBIDT_perc 0
ROG_PBDT_perc 0
ROG_PBIT_perc 0
ROG_PBT_perc 0
ROG_PAT_perc 0
ROG_CP_perc 0
ROG_Revenue_earnings_in_forex_perc 1317
ROG_Revenue_expenses_in_forex_perc 1615
ROG_Market_Capitalisation_perc 0
Current_Ratio_Latest 160
Fixed_Assets_Ratio_Latest 0
Inventory_Ratio_Latest 0
Debtors_Ratio_Latest 0
Total_Asset_Turnover_Ratio_Latest 202
Interest_Cover_Ratio_Latest 0
PBIDTM_perc_Latest 0
PBITM_perc_Latest 0
PBDTM_perc_Latest 0
CPM_perc_Latest 0
APATM_perc_Latest 0
Debtors_Velocity_Days 0
Creditors_Velocity_Days 0
Inventory_Velocity_Days 365
Value_of_Output_by_Total_Assets 150
Value_of_Output_by_Gross_Block 0
dtype: int64
In [33]:
Company.isnull().sum().sum()
print("Number of missing values after replacing outliers with Nan values is",Company
In [34]:
Company.shape
Data has very few missing or null values and roughly 1.6% of data has outliers.
Here, we are converting outliers to missing values.Hence, total number of missing values in addition to
outliers will be 5717 (Total Number of Outliers+Total Number of Missing Values).
Note: Before converting outliers to NaN values number of missing values present in the dataset was
118.
In [35]:
plt.figure(figsize = (12,8))
sns.heatmap(Company.isnull(), cbar = False, cmap = 'coolwarm', yticklabels = False)
plt.show()
Noticeable, presence of missing values in some variables can be observed.Blue color in the heatmap is
indicating occupied cells while red cuolor indicates missing values present in the data.Listing down few
observations:
Typically if missing data in columns is less then 30 % of our data and at row level data is atleast at 90%
complete, we do not drop the data.Here, we will first check completeness of data and then decide the
technique to be used to move forward.
In order to check the completeness of data at row level, we will look at total number of missing values in each
row.
Note: To find total number of missing values in each row , we will set axis as 1.
Since, it is a company and we want to quantify the data.Therefore, we are choosing to do a missing value
imputation instead of dropping these missing values.
We will try to target companies which completes atleast 90 % of the data in each row i.e. we will filter
out companies where there are atleast 5 or less missing values to identify the reliable data until this
point.
After filtering out these values shape of our data changes (before filtering; number of rows= 3586) to :
Note: We have created a temporary dataframe to filter out companies with atleast 5 missing
values.
In [36]:
In [37]:
Company_temp.shape
Out[37]:
(3569, 51)
In [38]:
Company.isnull().sum().sort_values(ascending = False)/Company.index.size
Out[38]:
ROG_Revenue_expenses_in_forex_perc 0.450363
ROG_Revenue_earnings_in_forex_perc 0.367262
Capital_expenses_in_forex 0.193530
Networth_Next_Year 0.188511
Inventory_Velocity_Days 0.101785
Co_Code 0.081149
Total_Asset_Turnover_Ratio_Latest 0.056330
Selling_Cost 0.046849
Current_Ratio_Latest 0.044618
Value_of_Output_by_Total_Assets 0.041829
Other_Income 0.022030
Cash_Flow_From_Financing_Activities 0.000000
Cash_Flow_From_Investing_Activities 0.000000
Cash_Flow_From_Operating_Activities 0.000000
Book_Value_Adj_Unit_Curr 0.000000
Book_Value_Unit_Curr 0.000000
ROG_Net_Worth_perc 0.000000
CEPS_annualised_Unit_Curr 0.000000
Adjusted_PAT 0.000000
ROG_Gross_Block_perc 0.000000
Value_Of_Output 0.000000
Total_Assets_by_Liabilities 0.000000
Current_Liabilities_and_Provisions 0.000000
Current_Assets 0.000000
Net_Working_Capital 0.000000
Total_Debt 0.000000
Equity_Paid_Up 0.000000
ROG_Capital_Employed_perc 0.000000
Value_of_Output_by_Gross_Block 0.000000
ROG_Gross_Sales_perc 0.000000
ROG_Net_Sales_perc 0.000000
Creditors_Velocity_Days 0.000000
Debtors_Velocity_Days 0.000000
APATM_perc_Latest 0.000000
CPM_perc_Latest 0.000000
PBDTM_perc_Latest 0.000000
PBITM_perc_Latest 0.000000
PBIDTM_perc_Latest 0.000000
Interest_Cover_Ratio_Latest 0.000000
Debtors_Ratio_Latest 0.000000
Inventory_Ratio_Latest 0.000000
Fixed_Assets_Ratio_Latest 0.000000
ROG_Market_Capitalisation_perc 0.000000
ROG_CP_perc 0.000000
ROG_PAT_perc 0.000000
ROG_PBT_perc 0.000000
ROG_PBIT_perc 0.000000
ROG_PBDT_perc 0.000000
ROG_PBIDT_perc 0.000000
ROG_Cost_of_Production_perc 0.000000
ROG_Total_Assets_perc 0.000000
dtype: float64
In [39]:
Company_sub1 = Company.drop(['ROG_Revenue_expenses_in_forex_perc','ROG_Revenue_earni
axis = 1)
In [40]:
Company_sub1.shape
print('The number of rows after dropping columns with more then 30% missing values i
print('The number of columns after dropping columns with more then 30% missing value
The number of rows after dropping columns with more then 30% missing v
alues is 3586 .
The number of columns after dropping columns with more then 30% missin
g values is 49 .
The missing values are of numeric nature.Hence, can be imputed using KNNImputer function from the
impute module of the sklearn. This imputer utilizes the k-Nearest Neighbors method to replace the
missing values in the datasets by finding the nearest neighbors with the Euclidean distance
matrix.
Another critical point here is that the KNN Imptuer is a distance-based imputation method and it requires us to
normalize our data. Otherwise, the different scales of our data will lead the KNN Imputer to generate biased
replacements for the missing values.Here, we will use Scikit-Learn’s Standard Scaler method which will scale
our variables to have values between 0 and 1.
Imputation is done by predicting the missing value based on values of 10 nearest neighbors of the same
variable. Such that all the missing values are replaced based on nearest neighbors value.
In [41]:
In [42]:
In [43]:
In [44]:
In [45]:
imputer = KNNImputer(n_neighbors=10)
In [46]:
In [47]:
Company_imputed.isnull().sum()
Out[47]:
Co_Code 0
Equity_Paid_Up 0
Total_Debt 0
Net_Working_Capital 0
Current_Assets 0
Current_Liabilities_and_Provisions 0
Total_Assets_by_Liabilities 0
Other_Income 0
Value_Of_Output 0
Selling_Cost 0
Adjusted_PAT 0
Capital_expenses_in_forex 0
Book_Value_Unit_Curr 0
Book_Value_Adj_Unit_Curr 0
CEPS_annualised_Unit_Curr 0
Cash_Flow_From_Operating_Activities 0
Cash_Flow_From_Investing_Activities 0
Cash_Flow_From_Financing_Activities 0
ROG_Net_Worth_perc 0
ROG_Capital_Employed_perc 0
ROG_Gross_Block_perc 0
ROG_Gross_Sales_perc 0
ROG_Net_Sales_perc 0
ROG_Cost_of_Production_perc 0
ROG_Total_Assets_perc 0
ROG_PBIDT_perc 0
ROG_PBDT_perc 0
ROG_PBIT_perc 0
ROG_PBT_perc 0
ROG_PAT_perc 0
ROG_CP_perc 0
ROG_Market_Capitalisation_perc 0
Current_Ratio_Latest 0
Fixed_Assets_Ratio_Latest 0
Inventory_Ratio_Latest 0
Debtors_Ratio_Latest 0
Total_Asset_Turnover_Ratio_Latest 0
Interest_Cover_Ratio_Latest 0
PBIDTM_perc_Latest 0
PBITM_perc_Latest 0
PBDTM_perc_Latest 0
CPM_perc_Latest 0
APATM_perc_Latest 0
Debtors_Velocity_Days 0
Creditors_Velocity_Days 0
Inventory_Velocity_Days 0
Value_of_Output_by_Total_Assets 0
Value_of_Output_by_Gross_Block 0
Networth_Next_Year 0
dtype: int64
There is no target variable defined – but since the objective is to build a model for investor to decode which
company to invest in – the variable Networth_Next_Year coud be used to transform into target variable (as
mentined in rubric as well).
We will now create a default variable that should take the below mentioned values:
of 1 when net worth next year is negative & 0 when net worth next year is positive.
If the company’s Networth_Next_Year is positive – then the company would continue to return good
investment for investor and thus could be transformed as 0 (i.e., Non-Default).
If the company’s Networth_Next_Year is negative – then the company is likely to not return a good
investment to investor and transformed as 1 (i.e.Default).
In [48]:
In [49]:
Company_imputed[['default','Networth_Next_Year']].head(10)
Out[49]:
default Networth_Next_Year
0 1 -6.218
1 1 -23.782
2 0 43.906
3 1 -23.723
4 1 -12.392
5 1 -13.211
6 1 -7.314
7 0 8.508
8 1 -27.635
9 0 35.004
In [50]:
Company_imputed['default'].value_counts()
Out[50]:
0 3225
1 361
In [51]:
Company_imputed['default'].value_counts(normalize = True)
Out[51]:
0 0.899331
1 0.100669
Noticeably, approximately 10% of the companies from the dataset are likely to default and these are the
companies in which investors should probably avoid investing in.
Univariate Analysis:
In [52]:
def univariateAnalysis_numeric(column,nbins):
print("Description of " + column)
print("-------------------------------------------------------------------------
print(Company_imputed[column].describe(),end=' ')
plt.figure()
print("Distribution of " + column)
print("-------------------------------------------------------------------------
sns.distplot(Company_imputed[column], kde=False, color='skyblue');
plt.show()
plt.figure()
print("BoxPlot of " + column)
print("-------------------------------------------------------------------------
ax = sns.boxplot(x=Company_imputed[column],color='b')
plt.show()
In [53]:
ted_imp_features=pd.DataFrame(Company_imputed,columns=['Net_Working_Capital','Book_Va
'ROG_Capital_Employed_perc','ROG_Total_Assets_perc','Current_Ratio_
'Fixed_Assets_Ratio_Latest','Inventory_Ratio_Latest','Debtors_Ratio
'Total_Asset_Turnover_Ratio_Latest','Interest_Cover_Ratio_Latest',
'ROG_Market_Capitalisation_perc', 'ROG_Cost_of_Production_perc'])
In [54]:
In [55]:
for x in Numerical_column_list:
univariateAnalysis_numeric(x,20)
pd.options.display.float_format = '{:.3f}'.format
----------------------------------------------------------------------
------
Majorty of the times i.e. 75% of the times company's net working capital, current assets, current
liabilities,total assets by liabilities, other income, value output, selling cost, adjusted PAT
,Book_Value_Unit_Curr,Book_Value_Adj._Unit_Curr,CEPS_annualised_Unit_Curr,
Cash_Flow_From_Operating_Activities, Cash_Flow_From_Investing_Activities etc is positive.
Company is currently not financing in longterm investments in forex currently. Probably company should
consider funding in longterm forex investmets to generate high revenues.
Since, companies are not investing in forex most of the values are 0.Therfore, boxplot is a line for variable
Capital_expense_in_forex.
For variable "Inventory_Velocity Days" there is just one whisker in boxplot is, due to the extreme skewness
of data and also there is no value smaller than the median.
In [56]:
Numerical_column_list = list(Company_num.columns.values)
Numerical_column_list
Out[56]:
['Net_Working_Capital',
'Book_Value_Unit_Curr',
'ROG_Net_Worth_perc',
'ROG_Capital_Employed_perc',
'ROG_Total_Assets_perc',
'Current_Ratio_Latest',
'Fixed_Assets_Ratio_Latest',
'Inventory_Ratio_Latest',
'Debtors_Ratio_Latest',
'Total_Asset_Turnover_Ratio_Latest',
'Interest_Cover_Ratio_Latest',
'ROG_Market_Capitalisation_perc',
'ROG_Cost_of_Production_perc']
Bivariate/Multivariate Analysis:
In [57]:
Out[57]:
The data has higher Non-default companies i.e., the companies whic are expected to have a postive Net
Worth next year (which is good for investors for decision making).
Some of the important parameters which are more likely to contribute to the strength of a company's balance
sheet can be evaluated by below listed parameters:
localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ip… 33/102
06/02/2022, 17:52 Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022 - Jupyter Notebook
['Net_Working_Capital','Book_Value_Unit_Curr','ROG-Net_Worth_perc','ROG-Capital_Employed_perc','ROG-
Total_Assets_perc',
'Current_Ratio[Latest]',
'Fixed_Assets_Ratio[Latest]','Inventory_Ratio[Latest]','Debtors_Ratio[Latest]',
'Total_Asset_Turnover_Ratio[Latest]','Interest_Cover_Ratio[Latest]','ROG-Market_Capitalisation_perc', 'ROG-
Cost_of_Production_perc']
1. Net Working Capital: It measures company's liquidity and short-term financial health. A company will have
negative NWC if its ratio of current assets to liabilities is less than one.
2. Book Value (Unit Curr): High book value per share (due to profits accumulated over the years) indicates a
strong company.
3. ROG-Net Worth (%) : Companies with low capital base (that don't need additional capital for growth) will
show a higher ratio.
4. ROG-Capital Employed (%): Captures the profit generated on total capital employed (including
debt).Companies with low capital base (those that don't need additional capital for growth) will display a
higher ratio.
5. ROG-Total Assets (%): Captures the net profit generated on total assets.
6. Current Ratio[Latest]: It tells how cash rich a company is. It helps us gauge the short-term financial
strength of a company.
7. Fixed Assets Ratio[Latest]:It reveals how efficient a company is at generating sales from its existing fixed
assets.
8. Inventory Ratio[Latest] : Shows how efficiently the company manages its inventory.
9. Debtors Ratio[Latest]: A high debt to equity ratio is a warning signal, especially in situations like business
downturns.
10. Total Asset Turnover Ratio[Latest] : Shows how efficiently the company manages its total assets.
11. Interest Cover Ratio[Latest]: measures a company's ability to handle its outstanding debt.
12. ROG-Market Capitalisation (%): Company's worth as determined by the stock market.
13. ROG_Cost_of_Production_perc : Product costing is the process of tracking and studying all the various
expenses that are accrued in the production and sale of a product.
In [58]:
plt.figure(figsize=(25,10))
sns.boxplot(data=Company_imputed_imp_features)
plt.xlabel("Variables")
plt.xticks(rotation=90)
plt.ylabel("Density")
plt.title('Figure:Boxplot of few important features')
Out[58]:
Insights:
In [59]:
Company_imputed_imp_features.plot.kde(figsize = (20,10),
linewidth = 4)
Out[59]:
<AxesSubplot:ylabel='Density'>
In [60]:
# Skewness of Data
Out[60]:
Current_Ratio_Latest 1.275
Total_Asset_Turnover_Ratio_Latest 1.075
Fixed_Assets_Ratio_Latest 0.889
ROG_Market_Capitalisation_perc 0.812
Interest_Cover_Ratio_Latest 0.739
Inventory_Ratio_Latest 0.405
Debtors_Ratio_Latest 0.229
Net_Working_Capital 0.175
ROG_Cost_of_Production_perc 0.115
ROG_Capital_Employed_perc 0.097
Book_Value_Unit_Curr 0.095
ROG_Total_Assets_perc 0.074
ROG_Net_Worth_perc 0.072
dtype: float64
• If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
• If the skewness is between -1 and –
0.5 or between 0.5 and 1, the data are moderately skewed.
• If the skewness is less than -1 or greater than 1,
the data are highly skewed.
In [61]:
plt.figure(figsize=(8,5))
sns.boxplot(Company_imputed["default"], Company_imputed['Current_Ratio_Latest'],data
plt.title("Figure: Plot of Default with Current_Ratio_Latest")
plt.show()
In [62]:
#boxplot_Total_Asset_Turnover_Ratio[Latest]
plt.figure(figsize=(8,5))
sns.boxplot(Company_imputed["default"], Company_imputed['Total_Asset_Turnover_Ratio_
plt.title('Figure: Boxplot of Default with Total_Asset_Turnover_Ratio[Latest]')
plt.show()
In [63]:
#boxplot_Fixed_Assets_Ratio[Latest]
plt.figure(figsize=(8,5))
sns.boxplot(Company_imputed["default"], Company_imputed['Fixed_Assets_Ratio_Latest']
plt.title('Figure: Boxplot of Default with Fixed_Assets_Ratio[Latest]')
plt.show()
In [64]:
#boxplot_ROG-Market_Capitalisation_perc
plt.figure(figsize=(8,5))
sns.boxplot(Company_imputed["default"], Company_imputed['ROG_Market_Capitalisation_p
plt.title('Figure: Boxplot of Default with ROG-Market_Capitalisation_perc')
plt.show()
In [65]:
#boxplot_Interest_Cover_Ratio[Latest]
sns.boxplot(Company_imputed["default"], Company_imputed['Interest_Cover_Ratio_Latest
plt.title('Figure: Boxplot of Default with Interest_Cover_Ratio[Latest]', fontsize=15
plt.show()
In [66]:
#boxplot_Inventory_Ratio[Latest]
sns.boxplot(Company_imputed["default"], Company_imputed['Inventory_Ratio_Latest'],da
plt.title('Figure: Boxplot of Default with Inventory_Ratio[Latest]', fontsize=15)
plt.show()
In [67]:
#boxplot_Debtors_Ratio[Latest]
sns.boxplot(Company_imputed["default"], Company_imputed['Debtors_Ratio_Latest'],data
plt.title('Figure: Boxplot of Default with Debtors_Ratio[Latest] ',fontsize=15)
plt.show()
In [68]:
#boxplot_Net_Working_Capital
sns.boxplot(Company_imputed["default"], Company_imputed['Net_Working_Capital'],data=
plt.title('Figure: Boxplot of Default with Net_Working_Capital', fontsize=15)
plt.show()
In [69]:
#boxplot_ROG-Cost_of_Production_perc
sns.boxplot(Company_imputed["default"], Company_imputed['ROG_Cost_of_Production_perc
plt.title('Figure: Boxplot of Default with ROG-Cost_of_Production_perc', fontsize=15
plt.show()
In [70]:
#boxplot_ROG-Capital_Employed_perc
sns.boxplot(Company_imputed["default"], Company_imputed['ROG_Capital_Employed_perc']
plt.title('Figure: Boxplot of Default with ROG-Capital_Employed_perc', fontsize=15)
plt.show()
In [71]:
#boxplot_ROG-Capital_Employed_perc
sns.boxplot(Company_imputed["default"], Company_imputed['Book_Value_Unit_Curr'],data
plt.title('Figure: Boxplot of Default with Book_Value_Unit_Curr ', fontsize=15)
plt.show()
In [72]:
#boxplot_ROG-ROG-Total_Assets_perc
sns.boxplot(Company_imputed["default"], Company_imputed['ROG_Total_Assets_perc'],dat
plt.title('Figure: Boxplot of Default with ROG-Total_Assets_perc ', fontsize=15)
plt.show()
In [73]:
plt.figure(figsize = (12,8))
cor_matrix = Company_imputed.drop('default', axis = 1).corr()
sns.heatmap(cor_matrix, cmap = 'plasma', vmin = -1, vmax= 1)
Out[73]:
<AxesSubplot:>
In [ ]:
Split the data into Train and Test dataset in a ratio of 67:33 with the fixed random_state as 42 to ensure
uniformity across multiple systems and stratify on default to make sure both train and test data have similar
proportion of defaulters and non-defaulters. This is done as the dataset is imbalanced and has more of non-
defaulters. Before we do the train-test split , we will first separate independent (X) and dependent (y) variables
(to perform Train-Test split) using train_test_split from sklearn.model_selection.
In [74]:
In [75]:
In [76]:
In [77]:
print('Number of rows and columns of the training set for the independent variables:
print('Number of rows and columns of the training set for the dependent variable:',y
print('Number of rows and columns of the test set for the independent variables:',X_
print('Number of rows and columns of the test set for the dependent variable:',y_tes
Number of rows and columns of the training set for the independent var
iables: (2402, 49)
Number of rows and columns of the training set for the dependent varia
ble: (2402, 1)
Number of rows and columns of the test set for the independent variabl
es: (1184, 49)
Number of rows and columns of the test set for the dependent variable:
(1184, 1)
In [78]:
X_train.head()
Out[78]:
5 rows × 49 columns
In [79]:
y_train.head()
Out[79]:
default
662 0
1373 0
3268 0
3246 0
1456 0
In [80]:
X_test.head()
Out[80]:
5 rows × 49 columns
In [81]:
y_test.head()
Out[81]:
default
3163 0
3133 0
937 0
196 1
2852 0
Here, we will use Logistic regression Model to evaluate the relationship between one dependent binary variable
and one or more independent variables.This model will help predicts the probability of occurrence of Default
using a logit function.
1. Stats Model
2. Scikit Learn
Note: Statsmodels provides a Logit() function for performing logistic regression. The Logit() function
accepts y and X as parameters and returns the Logit object. The model is then fitted to the data.The
logit function is simply the logarithm of the odds.
The equation of the Logistic Regression by which we predict the corresponding probabilities and then go on
predict a discrete target variable is
localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ip… 49/102
06/02/2022, 17:52 Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022 - Jupyter Notebook
1
y=
1+𝑒−𝑧
Note: z = 𝛽0 +∑𝑛𝑖=1 (𝛽𝑖 𝑋1 )
In [82]:
import statsmodels.api as sm
In [ ]:
Splitting arrays or matrices into random train and test subsets. Model will be fitted on train set and
predictions will be made on the test set
In [83]:
#Statsmodel requires the labelled data, therefore, concatinating the y label to the
In [84]:
Company_train.to_csv('Company_train.csv',index=False)
Company_test.to_csv('Company_test.csv',index=False)
In [85]:
Company_train["default"].value_counts()
Out[85]:
0 2176
1 226
In [86]:
Company_train.default.sum() / len(Company_train.default)
Out[86]:
0.09408825978351373
In [87]:
Company_train.columns
Out[87]:
'Current_Assets', 'Current_Liabilities_and_Provisions',
'Book_Value_Unit_Curr', 'Book_Value_Adj_Unit_Curr',
'CEPS_annualised_Unit_Curr', 'Cash_Flow_From_Operating_Activiti
es',
'Cash_Flow_From_Investing_Activities',
'Cash_Flow_From_Financing_Activities', 'ROG_Net_Worth_perc',
'ROG_Capital_Employed_perc', 'ROG_Gross_Block_perc',
'ROG_Gross_Sales_perc', 'ROG_Net_Sales_perc',
'ROG_Cost_of_Production_perc', 'ROG_Total_Assets_perc',
'Current_Ratio_Latest', 'Fixed_Assets_Ratio_Latest',
'Inventory_Ratio_Latest', 'Debtors_Ratio_Latest',
'Total_Asset_Turnover_Ratio_Latest', 'Interest_Cover_Ratio_Late
st',
'Creditors_Velocity_Days', 'Inventory_Velocity_Days',
'Value_of_Output_by_Total_Assets', 'Value_of_Output_by_Gross_Bl
ock',
'Networth_Next_Year', 'default'],
dtype='object')
In [ ]:
Model 1
Before starting model building, lets look at the problem of multicollinearity. Multicollinearity occurs when two or
more independent variables are highly correlated with one another in a regression model.
In [88]:
## Importing VIF
return(vif)
In [89]:
Out[89]:
variables VIF
22 ROG_Net_Sales_perc 19.846
21 ROG_Gross_Sales_perc 19.749
13 Book_Value_Adj_Unit_Curr 5.579
12 Book_Value_Unit_Curr 5.537
46 Value_of_Output_by_Total_Assets 4.805
36 Total_Asset_Turnover_Ratio_Latest 4.405
40 PBDTM_perc_Latest 4.187
26 ROG_PBDT_perc 4.079
28 ROG_PBT_perc 4.036
41 CPM_perc_Latest 3.932
29 ROG_PAT_perc 3.477
27 ROG_PBIT_perc 3.386
30 ROG_CP_perc 3.279
25 ROG_PBIDT_perc 3.278
47 Value_of_Output_by_Gross_Block 3.051
39 PBITM_perc_Latest 3.038
33 Fixed_Assets_Ratio_Latest 3.035
38 PBIDTM_perc_Latest 2.713
42 APATM_perc_Latest 2.679
10 Adjusted_PAT 2.471
14 CEPS_annualised_Unit_Curr 2.155
19 ROG_Capital_Employed_perc 1.923
18 ROG_Net_Worth_perc 1.837
37 Interest_Cover_Ratio_Latest 1.781
9 Selling_Cost 1.754
24 ROG_Total_Assets_perc 1.743
35 Debtors_Ratio_Latest 1.737
34 Inventory_Ratio_Latest 1.619
7 Other_Income 1.603
15 Cash_Flow_From_Operating_Activities 1.555
48 Networth_Next_Year 1.457
5 Current_Liabilities_and_Provisions 1.444
variables VIF
3 Net_Working_Capital 1.428
8 Value_Of_Output 1.388
43 Debtors_Velocity_Days 1.387
4 Current_Assets 1.377
2 Total_Debt 1.371
23 ROG_Cost_of_Production_perc 1.363
32 Current_Ratio_Latest 1.308
20 ROG_Gross_Block_perc 1.306
45 Inventory_Velocity_Days 1.304
44 Creditors_Velocity_Days 1.268
17 Cash_Flow_From_Financing_Activities 1.180
16 Cash_Flow_From_Investing_Activities 1.177
31 ROG_Market_Capitalisation_perc 1.164
0 Co_Code 1.104
6 Total_Assets_by_Liabilities 1.094
1 Equity_Paid_Up 1.060
11 Capital_expenses_in_forex nan
Here, we see that the value of VIF is high for many variables. Hence,dropping variables with VIF more than 5
(very high correlation) & build our model.
In [94]:
f_1='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total_
In [95]:
Iterations 10
In [96]:
model_1.summary()
Out[96]:
As most of the coefficients are having p values greater than 5%, those variables are highly correlated
and this can be ignored while taking only significant variables with p values < 0.05.
The elimination of these variables is done one by one, where the highest insignificant variable is
removed first from logistic model and then model performance tested again to see if other variables are
contributing significantly or not.
Variable "ROG_PBIT_perc" has the highest p-value (0.986) and is insignificant, therefore, we need to
eliminate it.
Model_2
In [97]:
_Gross_Block_perc+Inventory_Velocity_Days+Creditors_Velocity_Days+Cash_Flow_From_Fina
In [98]:
Iterations 10
In [99]:
model_2.summary()
Out[99]:
Variable "PBDTM_perc_Latest" has the highest p-value (0.937) and is insignificant, therefore, we need to
eliminate it.
Model_3
In [100]:
e_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total_Assets+Total_Asset_Turn
In [101]:
Iterations 10
In [102]:
model_3.summary()
Out[102]:
Model_4
In [103]:
f_4='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total_
In [104]:
Iterations 10
In [105]:
model_4.summary()
Out[105]:
Variable "Inventory_Velocity_Days" has the highest p-value (0.907) and is insignificant, therefore, we
need to eliminate it.
Model_5
In [106]:
f_5='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total_
In [107]:
Iterations 10
In [108]:
model_5.summary()
Out[108]:
Variable "Debtors_Velocity_Days" has the highest p-value (0.764) and is insignificant, therefore, we need
to eliminate it.
Model_6
In [109]:
atest+Selling_Cost+ROG_Total_Assets_perc+Debtors_Ratio_Latest+Inventory_Ratio_Latest+
In [110]:
Iterations 10
In [111]:
model_6.summary()
Out[111]:
Model_7
In [112]:
of_Production_perc+Current_Ratio_Latest+ROG_Gross_Block_perc+Creditors_Velocity_Days+
In [113]:
Iterations 10
In [114]:
model_7.summary()
Out[114]:
Variable "ROG_CP_perc" has the highest p-value (0.735) and is insignificant, therefore, we need to
eliminate it.
Model_8
In [115]:
s+Total_Asset_Turnover_Ratio_Latest+CPM_perc_Latest+Value_of_Output_by_Gross_Block+ F
In [116]:
Iterations 10
In [117]:
model_8.summary()
Out[117]:
Variable "ROG_Gross_Block_perc" has the highest p-value (0.720) and is insignificant, therefore, we
need to eliminate it.
Model_9
In [118]:
l+Total_Debt+ROG_Cost_of_Production_perc+Current_Ratio_Latest+Creditors_Velocity_Days
In [119]:
Iterations 10
In [120]:
model_9.summary()
Out[120]:
Model_10
In [121]:
ncome+ Net_Working_Capital+Total_Debt+ROG_Cost_of_Production_perc+Current_Ratio_Lates
In [122]:
Iterations 10
In [123]:
model_10.summary()
Out[123]:
Variable "Fixed_Assets_Ratio_Latest" has the highest p-value (0.656) and is insignificant, therefore, we
need to eliminate it.
Model_11
In [124]:
nover_Ratio_Latest+CPM_perc_Latest+Value_of_Output_by_Gross_Block+ Adjusted_PAT+ROG_C
In [125]:
Iterations 10
In [126]:
model_11.summary()
Out[126]:
Variable "Inventory_Ratio_Latest" has the highest p-value (0.528) and is insignificant, therefore, we need
to eliminate it.
Model_12
In [127]:
Interest_Cover_Ratio_Latest+Selling_Cost+ROG_Total_Assets_perc+Debtors_Ratio_Latest+O
In [128]:
Iterations 10
In [129]:
model_12.summary()
Out[129]:
Variable "Selling_Cost" has the highest p-value (0.365) and is insignificant, therefore, we need to
eliminate it.
Model_13
In [130]:
f_13='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total
In [131]:
Iterations 10
In [132]:
model_13.summary()
Out[132]:
Variable "Other_Income" has the highest p-value (0.391) and is insignificant, therefore, we need to
eliminate it.
Model_15
In [133]:
s+Total_Asset_Turnover_Ratio_Latest+CPM_perc_Latest+Value_of_Output_by_Gross_Block+ A
In [134]:
Iterations 10
In [135]:
model_15.summary()
Out[135]:
Model_16
In [136]:
'default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total_Asse
In [137]:
Iterations 10
In [138]:
model_16.summary()
Out[138]:
Variable "Creditors Velocity Days" has the highest p-value (0.360) and is insignificant, therefore, we
localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ip… 82/102
06/02/2022, 17:52 Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022 - Jupyter Notebook
_ y_ y g p ( ) g , ,
need to eliminate it.
Model_17
In [139]:
f_17='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total
In [140]:
Iterations 10
In [141]:
model_17.summary()
Out[141]:
Variable "Equity_Paid_Up" has the highest p-value (0.078) and is insignificant, therefore, we need to
eliminate it.
Model_18
In [142]:
f_18='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total
In [143]:
Iterations 10
In [144]:
model_18.summary()
Out[144]:
Variable "ROG_Net_Worth_perc" has the highest p-value (0.089) and is insignificant, therefore, we need
to eliminate it.
Model 19
localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ip… 86/102
Model_19
06/02/2022, 17:52 Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022 - Jupyter Notebook
In [145]:
t_by_Total_Assets+CPM_perc_Latest+Value_of_Output_by_Gross_Block+ Adjusted_PAT+ROG_C
In [146]:
Iterations 10
In [147]:
model_19.summary()
Out[147]:
Model_21
localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ip… 88/102
06/02/2022, 17:52 Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022 - Jupyter Notebook
In [148]:
ver_Ratio_Latest+ROG_Total_Assets_perc+Debtors_Ratio_Latest+Net_Working_Capital+Tota
In [149]:
Iterations 10
In [150]:
model_21.summary()
Out[150]:
Variable "ROG_Total_Assets_perc" has the highest p-value (0.065) and is insignificant, therefore, we
need to eliminate it.
Model_22
In [151]:
f_22='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total
In [152]:
Iterations 10
In [153]:
model_22.summary()
Out[153]:
Variable "ROG_Capital_Employed_perc" has the highest p-value (0.246) and is insignificant, therefore,
we need to eliminate it.
Model_23
In [154]:
f_23='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total
In [155]:
Iterations 10
In [156]:
model_23.summary()
Out[156]:
Model_24
In [157]:
f_24='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+CPM_perc_Latest+Value_of
In [158]:
Iterations 10
In [159]:
model_24.summary()
Out[159]:
Variable "Debtors_Ratio_Latest" has the highest p-value (0.165) and is insignificant, therefore, we need
to eliminate it.
Model_25
In [160]:
f_25='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+CPM_perc_Latest+Value_of
In [161]:
Iterations 10
In [162]:
model_25.summary()
Out[162]:
Now all the variables are significant, therefore, we don't need to eliminate any variable.Therefore, after
many such iterations below variables were removed :
1.7 Validate the Model on Test Dataset and state the performance
matrices. Also state interpretation from the model
Now we will look at the predicted probability values.
In [172]:
y_prob_pred_train = model_25.predict(Company_train)
pd.DataFrame(y_prob_pred_train).head()
Out[172]:
662 0.000
1373 0.001
3268 0.003
3246 0.002
1456 0.003
In [173]:
y_prob_pred_test = model_25.predict(Company_test)
pd.DataFrame(y_prob_pred_test).head()
...
In [174]:
y_class_pred=[]
for i in range(0,len(y_prob_pred_train)):
if np.array(y_prob_pred_train)[i]>0.5:
a=1
else:
a=0
y_class_pred.append(a)
In [178]:
sns.heatmap((metrics.confusion_matrix(Company_train['default'],y_class_pred)),annot=
,cmap='Blues');
plt.xlabel('Predicted Label');
plt.ylabel('Actual Label',rotation=90);
plt.title('Figure: Confusion Matrix of Train Data');
In [179]:
print(metrics.classification_report(Company_train['default'],y_class_pred,digits=3))
Overall 95% of correct predictions to total predictions were made by the model
In [180]:
y_prob_pred_test = model_25.predict(Company_test)
pd.DataFrame(y_prob_pred_test).head()
Out[180]:
3163 0.001
3133 0.000
937 0.159
196 0.764
2852 0.000
In [181]:
y_class_pred=[]
for i in range(0,len(y_prob_pred_test)):
if np.array(y_prob_pred_test)[i]>0.5:
a=1
else:
a=0
y_class_pred.append(a)
In [182]:
sns.heatmap((metrics.confusion_matrix(Company_test['default'],y_class_pred)),annot=T
,cmap='Blues');
plt.xlabel('Predicted Label');
plt.ylabel('Actual Label',rotation=90);
plt.title('Figure: Confusion Matrix of Test Data');
In [183]:
print(metrics.classification_report(Company_test['default'],y_class_pred,digits=3))
Overall 97% of correct predictions to total predictions were made by the model
1) Of many variables – significantly only 6 variables contribute to the company being predicted as default or not
from logistic regression point of view.
2) The model is likely to predict the 86% companies that could default correctly.
3) Which means only in 14% cases – it could happen that a company that is predicted as defaulter may not be
a defaulter but form an investor point of view – it is ok to no invest money on company that could likely not
default.
4) The precision is a bit less in this model – however still 68% times, the model will predict the defaulter
company correctly.
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]: