Project FRA Milestone1 JPY Nikita Chaturvedi 05.05.2022 Jupyter Notebook PDF

06/02/2022, 17:52 Project_FRA_Milestone1_Nikita Chaturvedi_05.05.
2022 - Jupyter Notebook
Problem Statement
Businesses or companies can fall prey to default if they are not able to keep up their debt obligations. Defaults
will lead to a lower credit rating for the company which in turn reduces its chances of getting credit in the future
and may have to pay higher interests on existing debts as well as any new obligations. From an investor's point
of view, he would want to invest in a company if it is capable of handling its financial obligations, can grow
quickly, and is able to manage the growth scale.
A balance sheet is a financial statement of a company that provides a snapshot of what a company owns,
owes, and the amount invested by the shareholders. Thus, it is an important tool that helps evaluate the
performance of a business.
Data that is available includes information from the financial statement of the
companies for the previous year (2015). Also, information about the Networth of
the company in the following year (2016) is provided which can be used to drive
the labeled field.
In [175]:
# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns # for making plots with seaborn
color = sns.color_palette()
import sklearn.metrics as metrics
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import statsmodels.formula.api as SM
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.metrics import roc_auc_score,roc_curve,classification_report,confusion_
import warnings
warnings.filterwarnings("ignore")
Data Ingestion (Read Dataset):
In [2]:
Company = pd.read_csv('FRA Milestone 1.csv')
localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ipynb 1/102

06/02/2022, 17:52 Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022 - Jupyter Notebook
In [3]:
Company.head(10)
Capital [Latest] [Latest] [Latest] [Latest] (
27.48 -1,007.24 5,936.03 474.3 -1,076.34 40.5 ... 0 0 0 0 0
68.08 4,458.20 7,410.18 9,070.86 -1,098.88 486.86 ... -10.3 -39.74 -57.74 -57.74 -87.18
06.86 7,714.68 6,944.54 1,281.54 4,496.25 9,097.64 ... -5,279.14 -5,516.98 -7,780.25 -7,723.67 -7,961.51
23.49 2,353.88 2,326.05 1,033.69 -2,612.42 1,034.12 ... -3.33 -7.21 -48.13 -47.7 -51.58
70.83 4,675.33 5,740.90 1,084.20 1,836.23 4,685.81 ... -295.55 -400.55 -845.88 379.79 274.79
19.39 -1,824.75 694.64 0.02 -1,843.74 0 ... 0 0 0 0 0
31.57 1,536.08 2,567.65 949.98 804.82 834.86 ... -395.87 -987.73 -396.67 -672.36 -1,264.22
45.45 979.13 2,664.04 920.67 263.95 705.76 ... -447.24 -596.97 -456.4 -461.06 -610.8
60.94 -613.79 597.82 1,700.27 -1,121.96 117.67 ... 1.9 -20.43 -3.58 -3.58 -25.91
47.85 86.35 1,220.83 1,329.82 -390.53 2,536.78 ... 19.23 18.18 9.76 9.76 8.71
In [4]:
Company.tail(10)
Capital
Power
3576 5455 Grid 43811.23 5,231.59 38,166.59 1,39,632.92 95,044.55 1,18,264.26 -10,923.29 12
Corpn
3577 566 Tata Steel 46637.38 971.41 66,663.89 1,01,142.12 28,198.44 42,583.38 -3,727.04 12
Sardar
3578 13569 47261.30 42,263.46 44,129.73 46,810.68 2,636.27 3,746.17 665.73 1
Sar.Narm.
3579 5554 Axis Bank 53164.91 474.1 44,676.51 4,61,977.78 4,02,200.22 4,497.01 0 3,58
3580 2806 Infosys 61082.00 574 48,068 48,098 0 12,869 28,721
HDFC
3581 4987 72677.77 501.3 62,009.42 5,90,576 4,96,009.19 8,463.30 0 4,44
Bank
3582 502 Vedanta 79162.19 296.5 34,057.87 71,906.06 37,643.79 29,848.44 2,503.86 11
3583 12002 IOCL 88134.31 2,427.95 67,969.97 1,40,686.75 55,245.01 1,21,643.45 6,376.84 89
3584 12001 NTPC 91293.70 8,245.46 81,657.35 1,73,099.14 85,995.34 1,28,477.59 11,449.79 42
Bharti
3585 15542 111729.10 1,998.70 78,270.80 1,04,241 21,569.70 1,00,084.90 -12,145.30 11
Airtel
Fixing Messy Column Names (containing spaces):
In [5]:
erc').str.replace('/','_by_').str.replace('&','and').str.replace('[','_').str.replace
Checking Top 10 Rows Again :

In [6]:
Company.head(10)
Out[6]:
Co_Code Co_Name Networth_Next_Year Equity_Paid_Up Networth Capital_Employed Tota
0 16974 Hind.Cables -8021.60 419.36 -7,027.48 -1,007.24 5
Tata Tele.
1 21214 -3986.19 1,954.93 -2,968.08 4,458.20 7
Mah.
ABG
2 14852 -3192.58 53.84 506.86 7,714.68 6
Shipyard
3 2439 GTL -3054.51 157.3 -623.49 2,353.88 2
Bharati
4 23505 -2967.36 50.3 -1,070.83 4,675.33 5
Defence
5 2484 Usha Ispat -2519.40 179.35 -2,519.39 -1,824.75
Hanung
6 23633 -2125.05 30.82 -1,031.57 1,536.08 2
Toys
7 3226 K S Oils -2100.56 45.92 -1,945.45 979.13 2
Quadrant
8 1541 -1695.75 61.23 -1,560.94 -613.79
Tele.
9 2334 ITI -1677.18 288 -1,947.85 86.35 1
10 rows × 67 columns

In [7]:
Company.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3586 entries, 0 to 3585
Data columns (total 67 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Co_Code 3586 non-null int64
1 Co_Name 3586 non-null object
2 Networth_Next_Year 3586 non-null float64
3 Equity_Paid_Up 3586 non-null object
4 Networth 3586 non-null object
5 Capital_Employed 3586 non-null object
6 Total_Debt 3586 non-null object
7 Gross_Block 3586 non-null object
8 Net_Working_Capital 3586 non-null object
9 Current_Assets 3586 non-null object
10 Current_Liabilities_and_Provisions 3586 non-null object
11 Total_Assets_by_Liabilities 3586 non-null object
12 Gross_Sales 3586 non-null object
13 Net_Sales 3586 non-null object
14 Other_Income 3586 non-null object
15 Value_Of_Output 3586 non-null object
16 Cost_of_Production 3586 non-null object
17 Selling_Cost 3586 non-null object
18 PBIDT 3586 non-null object
19 PBDT 3586 non-null object
20 PBIT 3586 non-null object
21 PBT 3586 non-null object
22 PAT 3586 non-null object
23 Adjusted_PAT 3586 non-null object
24 CP 3586 non-null object
25 Revenue_earnings_in_forex 3586 non-null object
26 Revenue_expenses_in_forex 3586 non-null object
27 Capital_expenses_in_forex 3586 non-null object
28 Book_Value_Unit_Curr 3586 non-null object
29 Book_Value_Adj_Unit_Curr 3582 non-null object
30 Market_Capitalisation 3586 non-null object
31 CEPS_annualised_Unit_Curr 3586 non-null object
32 Cash_Flow_From_Operating_Activities 3586 non-null object
33 Cash_Flow_From_Investing_Activities 3586 non-null object
34 Cash_Flow_From_Financing_Activities 3586 non-null object
35 ROG_Net_Worth_perc 3586 non-null object
36 ROG_Capital_Employed_perc 3586 non-null object
37 ROG_Gross_Block_perc 3586 non-null object
38 ROG_Gross_Sales_perc 3586 non-null object
39 ROG_Net_Sales_perc 3586 non-null object
40 ROG_Cost_of_Production_perc 3586 non-null object
41 ROG_Total_Assets_perc 3586 non-null object
42 ROG_PBIDT_perc 3586 non-null object
43 ROG_PBDT_perc 3586 non-null object
44 ROG_PBIT_perc 3586 non-null object
45 ROG_PBT_perc 3586 non-null object
46 ROG_PAT_perc 3586 non-null object
47 ROG_CP_perc 3586 non-null object
48 ROG_Revenue_earnings_in_forex_perc 3586 non-null object
49 ROG_Revenue_expenses_in_forex_perc 3586 non-null object
50 ROG_Market_Capitalisation_perc 3586 non-null object
51 Current_Ratio_Latest 3585 non-null object

52 Fixed_Assets_Ratio_Latest 3585 non-null object
53 Inventory_Ratio_Latest 3585 non-null object
54 Debtors_Ratio_Latest 3585 non-null object
55 Total_Asset_Turnover_Ratio_Latest 3585 non-null float64
56 Interest_Cover_Ratio_Latest 3585 non-null object
57 PBIDTM_perc_Latest 3585 non-null object
58 PBITM_perc_Latest 3585 non-null object
59 PBDTM_perc_Latest 3585 non-null object
60 CPM_perc_Latest 3585 non-null object
61 APATM_perc_Latest 3585 non-null object
62 Debtors_Velocity_Days 3586 non-null object
63 Creditors_Velocity_Days 3586 non-null object
64 Inventory_Velocity_Days 3483 non-null float64
65 Value_of_Output_by_Total_Assets 3586 non-null float64
66 Value_of_Output_by_Gross_Block 3586 non-null object
dtypes: float64(4), int64(1), object(62)
memory usage: 1.8+ MB
In [8]:
Company.dtypes.value_counts()
Out[8]:
object 62
float64 4
int64 1
dtype: int64
In [9]:
Company.shape
print('The number of rows of the dataframe is',Company.shape[0],'.')
print('The number of columns of the dataframe is',Company.shape[1],'.')
The number of rows of the dataframe is 3586 .
The number of columns of the dataframe is 67 .
Dropping below listed columns as we can either use the raw values or the there percentages or
ratios.Here, we are choosing to drop these raw values and keeping the percentage values:
1. Co_Name as name of the company can be identified from Company code as well.
2. Networth as ROG-Net_Worth_perc is nothing but percentage of Value of a company as on 2015 - Current
Year.
3. Capital_Employed as ROG-Capital_Employed_perc is nothing but percentage of Total amount of capital
used for the acquisition of profits by a company.
4. Gross Block as ROG-Gross_Block_perc is percentage of Total value of all of the assets that a company
owns i.e. Gross Block.
5. Gross Sales as ROG-Gross_Sales_perc is percentage of The grand total of sale transactions within the
accounting period i.e., Gross Sales.
6. Net_Sales as ROG-Net_Sales_perc is percentage of Gross sales minus returns, allowances, and discounts
i.e. Net Sales.
7. Cost_of_Production as ROG-Cost_of_Production_perc is percentage of Costs incurred by a business from
manufacturing a product or providing a service i.e. Cost_of_Production.
8. PBIDT as ROG-PBIDT_perc is percentage of Profit Before Interest, Depreciation & Taxes i.e., PBIDT.
9. PBDT as ROG-PBDT_perc is percentage of Profit Before Depreciation and Tax i.e., PBDT.
10. PBIT as ROG-PBIT_perc is percentage of Profit before interest and taxes i.e., PBIT.
11. PBT as ROG-PBT_perc is percentage of Profit before tax i.e., PBT.
p p g ,
12. PAT as ROG-PAT_perc is percentage of Profit After Tax i.e., PAT.
13. CP as ROG-CP_perc is percentage of Commercial paper, a short-term debt instrument to meet short-term
liabilities. i.e CP.
14. Revenue_earnings_in_forex as ROG-Revenue_earnings_in_forex_perc is percentage of Revenue earned in
foreign currency i.e.,Revenue_earnings_in_forex .
15. Revenue_expenses_in_forex as ROG-Revenue_expenses_in_forex_perc is percentage of Expenses due to
foreign currency transactions i.e., Revenue_expenses_in_forex.
16. Market_Capitalisation as ROG-Market_Capitalisation_perc is percentage of Product of the total number of
a company's outstanding shares and the current market price of one share i.e., Market_Capitalisation.
In [10]:
Company.drop(['Co_Name','Networth','Gross_Block','Gross_Sales','Net_Sales','Cost_of_
'PBIDT','PBDT','PBIT','PBT','PAT','CP','Revenue_earnings_in_forex',
'Revenue_expenses_in_forex','Market_Capitalisation','Capital_Employed']
In [11]:
Company.head()
Out[11]:
Co_Code Networth_Next_Year Equity_Paid_Up Total_Debt Net_Working_Capital Current_Asse
0 16974 -8021.60 419.36 5,936.03 -1,076.34 40
1 21214 -3986.19 1,954.93 7,410.18 -1,098.88 486.
2 14852 -3192.58 53.84 6,944.54 4,496.25 9,097.
3 2439 -3054.51 157.3 2,326.05 -2,612.42 1,034.
4 23505 -2967.36 50.3 5,740.90 1,836.23 4,685.
Checking Shape of Data after Dropping Columns:
In [12]:
Company.shape
print('The number of rows of the dataframe after dropping certain columns is',Compan
print('The number of columns of the dataframe after dropping certain columns is',Com
The number of rows of the dataframe after dropping certain columns is

3586 .
The number of columns of the dataframe after dropping certain columns
is 51 .
Checking Duplicated Values

In [13]:
# Check for Duplicate Values
dups = Company.duplicated()
Company[dups]
Out[13]:
Co_Code Networth_Next_Year Equity_Paid_Up Total_Debt Net_Working_Capital Current_Asset
Checking Missing or Null Values

In [14]:
Company.isnull().sum()
Out[14]:
Co_Code 0
Networth_Next_Year 0
Equity_Paid_Up 0
Total_Debt 0
Net_Working_Capital 0
Current_Assets 0
Current_Liabilities_and_Provisions 0
Total_Assets_by_Liabilities 0
Other_Income 0
Value_Of_Output 0
Selling_Cost 0
Adjusted_PAT 0
Capital_expenses_in_forex 0
Book_Value_Unit_Curr 0
Book_Value_Adj_Unit_Curr 4
CEPS_annualised_Unit_Curr 0
Cash_Flow_From_Operating_Activities 0
Cash_Flow_From_Investing_Activities 0
Cash_Flow_From_Financing_Activities 0
ROG_Net_Worth_perc 0
ROG_Capital_Employed_perc 0
ROG_Gross_Block_perc 0
ROG_Gross_Sales_perc 0
ROG_Net_Sales_perc 0
ROG_Cost_of_Production_perc 0
ROG_Total_Assets_perc 0
ROG_PBIDT_perc 0
ROG_PBDT_perc 0
ROG_PBIT_perc 0
ROG_PBT_perc 0
ROG_PAT_perc 0
ROG_CP_perc 0
ROG_Revenue_earnings_in_forex_perc 0
ROG_Revenue_expenses_in_forex_perc 0
ROG_Market_Capitalisation_perc 0
Current_Ratio_Latest 1
Fixed_Assets_Ratio_Latest 1
Inventory_Ratio_Latest 1
Debtors_Ratio_Latest 1
Total_Asset_Turnover_Ratio_Latest 1
Interest_Cover_Ratio_Latest 1
PBIDTM_perc_Latest 1
PBITM_perc_Latest 1
PBDTM_perc_Latest 1
CPM_perc_Latest 1
APATM_perc_Latest 1
Debtors_Velocity_Days 0
Creditors_Velocity_Days 0
Inventory_Velocity_Days 103
Value_of_Output_by_Total_Assets 0
Value_of_Output_by_Gross_Block 0
dtype: int64

In [15]:
Company.isnull().sum().sum()
print("Number of missing values in dataset is",Company.isnull().sum().sum())
Number of missing values in dataset is 118
In [16]:
Out[16]:
object 46
float64 4
int64 1
dtype: int64
In [17]:
Company.head()
Out[17]:
Co_Code Networth_Next_Year Equity_Paid_Up Total_Debt Net_Working_Capital Current_Asse
0 16974 -8021.60 419.36 5,936.03 -1,076.34 40
1 21214 -3986.19 1,954.93 7,410.18 -1,098.88 486.
2 14852 -3192.58 53.84 6,944.54 4,496.25 9,097.
3 2439 -3054.51 157.3 2,326.05 -2,612.42 1,034.
4 23505 -2967.36 50.3 5,740.90 1,836.23 4,685.
Data Insights:
Data Consists of both categorical and numerical variables.

After dropping mentioned columns, there are total of 3586 rows and 52 columns in the dataset.Out of 52,
47 columns are of object type, 1 column is of integer type data and remaining 4 are of float type.
Data contains 118 missing or null values.
Data does not contain any duplicated values.
Column "Networth_Next_Year" can be used to drive the labeled field of the company in the following year
(2016).Hence, we will create a "default" variable that should take:
- Value of 1 when net worth next year is negative
- Value of 0 when net worth next year is positive
'Networth_Next_Year' is the target variable and all other are predector variables.
From data entries it can be observed that 47 columns are of Object Data which are Numerical in nature.
Hence, we will convert these object data types to numerical and then check descriptive statistics of data
(as all these value are of numerical data type).

In [18]:
## Recheck the unique values

for column in Company.columns:
if Company[column].dtype == 'object':
print(column.upper(),': ',Company[column].nunique())
print(Company[column].value_counts().sort_values())
print('\n')
0.06 14
0.01 14
0.05 15
0.02 17
0 48
Name: Net_Working_Capital, Length: 2699, dtype: int64
CURRENT_ASSETS : 2775
15,248.91 1
13.16 1
11.31 1
13.29 1
266.02 1
..
0.08 16
0.02 18
0.01 19
0.03 20
0 27
localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ip… 10/102

In [19]:
Company.columns
Out[19]:
Index(['Co_Code', 'Networth_Next_Year', 'Equity_Paid_Up', 'Total_Deb

t',
'Net_Working_Capital', 'Current_Assets',
'Current_Liabilities_and_Provisions', 'Total_Assets_by_Liabilit
ies',
'Other_Income', 'Value_Of_Output', 'Selling_Cost', 'Adjusted_PA

T',
'Capital_expenses_in_forex', 'Book_Value_Unit_Curr',
'Book_Value_Adj_Unit_Curr', 'CEPS_annualised_Unit_Curr',
'Cash_Flow_From_Operating_Activities',
'Cash_Flow_From_Investing_Activities',
'Cash_Flow_From_Financing_Activities', 'ROG_Net_Worth_perc',
'ROG_Capital_Employed_perc', 'ROG_Gross_Block_perc',
'ROG_Gross_Sales_perc', 'ROG_Net_Sales_perc',
'ROG_Cost_of_Production_perc', 'ROG_Total_Assets_perc',
'ROG_PBIDT_perc', 'ROG_PBDT_perc', 'ROG_PBIT_perc', 'ROG_PBT_pe

rc',
'ROG_PAT_perc', 'ROG_CP_perc', 'ROG_Revenue_earnings_in_forex_p

erc',
'ROG_Revenue_expenses_in_forex_perc', 'ROG_Market_Capitalisatio
n_perc',
'Current_Ratio_Latest', 'Fixed_Assets_Ratio_Latest',
'Inventory_Ratio_Latest', 'Debtors_Ratio_Latest',
'Total_Asset_Turnover_Ratio_Latest', 'Interest_Cover_Ratio_Late
st',
'PBIDTM_perc_Latest', 'PBITM_perc_Latest', 'PBDTM_perc_Latest',
'CPM_perc_Latest', 'APATM_perc_Latest', 'Debtors_Velocity_Day

s',
'Creditors_Velocity_Days', 'Inventory_Velocity_Days',
'Value_of_Output_by_Total_Assets', 'Value_of_Output_by_Gross_Bl
ock'],
dtype='object')
Running a For loop to separate Categorical and Numerical Columns:

In [20]:
cat=[]
num=[]
for i in Company.columns:
if Company[i].dtype=="object":
cat.append(i)
else:
num.append(i)
print("Categorical Columns:",cat)
print("/")
print("Numerical Columns:",num)
Categorical Columns: ['Equity_Paid_Up', 'Total_Debt', 'Net_Working_Cap

ital', 'Current_Assets', 'Current_Liabilities_and_Provisions', 'Total_
Assets_by_Liabilities', 'Other_Income', 'Value_Of_Output', 'Selling_Co
st', 'Adjusted_PAT', 'Capital_expenses_in_forex', 'Book_Value_Unit_Cur
r', 'Book_Value_Adj_Unit_Curr', 'CEPS_annualised_Unit_Curr', 'Cash_Flo
w_From_Operating_Activities', 'Cash_Flow_From_Investing_Activities',
'Cash_Flow_From_Financing_Activities', 'ROG_Net_Worth_perc', 'ROG_Capi
tal_Employed_perc', 'ROG_Gross_Block_perc', 'ROG_Gross_Sales_perc', 'R
OG_Net_Sales_perc', 'ROG_Cost_of_Production_perc', 'ROG_Total_Assets_p
erc', 'ROG_PBIDT_perc', 'ROG_PBDT_perc', 'ROG_PBIT_perc', 'ROG_PBT_per
c', 'ROG_PAT_perc', 'ROG_CP_perc', 'ROG_Revenue_earnings_in_forex_per
c', 'ROG_Revenue_expenses_in_forex_perc', 'ROG_Market_Capitalisation_p
erc', 'Current_Ratio_Latest', 'Fixed_Assets_Ratio_Latest', 'Inventory_
Ratio_Latest', 'Debtors_Ratio_Latest', 'Interest_Cover_Ratio_Latest',
'PBIDTM_perc_Latest', 'PBITM_perc_Latest', 'PBDTM_perc_Latest', 'CPM_p
erc_Latest', 'APATM_perc_Latest', 'Debtors_Velocity_Days', 'Creditors_
Velocity_Days', 'Value_of_Output_by_Gross_Block']
Numerical Columns: ['Co_Code', 'Networth_Next_Year', 'Total_Asset_Turn

over_Ratio_Latest', 'Inventory_Velocity_Days', 'Value_of_Output_by_Tot
al_Assets']
In [23]:
, 'Interest_Cover_Ratio_Latest', 'PBIDTM_perc_Latest', 'PBITM_perc_Latest', 'PBDTM_p
Converting Categorical Variables to Numerical Variables:

In [24]:
for feature in Company_X:

if Company[feature].dtype == 'object':
print('\n')
print('feature:',feature)
print(pd.Categorical(Company[feature].unique()))
print(pd.Categorical(Company[feature].unique()).codes)
Company[feature] = pd.Categorical(Company[feature]).codes
feature: Book_Value_Adj_Unit_Curr
['-167.58', '-15.18', '94.14', '-39.64', '-212.89', ..., '209.35', '24

7.39', '114.87', '69.99', '195.8']
Length: 2964
Categories (2963, object): ['-0.01', '-0.02', '-0.03', '-0.05', ...,

'99.12', '99.77', '997.59', '999.22']
[ 116 102 2931 ... 705 2597 1276]
feature: CEPS_annualised_Unit_Curr
['-22.09', '-0.02', '-148.31', '-43.08', '-159.5', ..., '104.9', '41.7

5', '39.03', '17.93', '51.79']
Length: 1900
Categories (1900, object): ['-0.01', '-0.02', '-0.03', '-0.04', ...,

'94.92', '96.53', '986.67', '995.65']
[ 257 1 188 ... 1367 907 1572]
Checking Changed Dtype Information

In [25]:
Company.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3586 entries, 0 to 3585
Data columns (total 51 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Co_Code 3586 non-null int64
1 Networth_Next_Year 3586 non-null float64
2 Equity_Paid_Up 3586 non-null int16
3 Total_Debt 3586 non-null int16
4 Net_Working_Capital 3586 non-null int16
5 Current_Assets 3586 non-null int16
6 Current_Liabilities_and_Provisions 3586 non-null int16
7 Total_Assets_by_Liabilities 3586 non-null int16
8 Other_Income 3586 non-null int16
9 Value_Of_Output 3586 non-null int16
10 Selling_Cost 3586 non-null int16
11 Adjusted_PAT 3586 non-null int16
12 Capital_expenses_in_forex 3586 non-null int16
13 Book_Value_Unit_Curr 3586 non-null int16
14 Book_Value_Adj_Unit_Curr 3586 non-null int16
15 CEPS_annualised_Unit_Curr 3586 non-null int16
16 Cash_Flow_From_Operating_Activities 3586 non-null int16
17 Cash_Flow_From_Investing_Activities 3586 non-null int16
18 Cash_Flow_From_Financing_Activities 3586 non-null int16
19 ROG_Net_Worth_perc 3586 non-null int16
20 ROG_Capital_Employed_perc 3586 non-null int16
21 ROG_Gross_Block_perc 3586 non-null int16
22 ROG_Gross_Sales_perc 3586 non-null int16
23 ROG_Net_Sales_perc 3586 non-null int16
24 ROG_Cost_of_Production_perc 3586 non-null int16
25 ROG_Total_Assets_perc 3586 non-null int16
26 ROG_PBIDT_perc 3586 non-null int16
27 ROG_PBDT_perc 3586 non-null int16
28 ROG_PBIT_perc 3586 non-null int16
29 ROG_PBT_perc 3586 non-null int16
30 ROG_PAT_perc 3586 non-null int16
31 ROG_CP_perc 3586 non-null int16
32 ROG_Revenue_earnings_in_forex_perc 3586 non-null int16
33 ROG_Revenue_expenses_in_forex_perc 3586 non-null int16
34 ROG_Market_Capitalisation_perc 3586 non-null int16
35 Current_Ratio_Latest 3586 non-null int16
36 Fixed_Assets_Ratio_Latest 3586 non-null int16
37 Inventory_Ratio_Latest 3586 non-null int16
38 Debtors_Ratio_Latest 3586 non-null int16
39 Total_Asset_Turnover_Ratio_Latest 3585 non-null float64
40 Interest_Cover_Ratio_Latest 3586 non-null int16
41 PBIDTM_perc_Latest 3586 non-null int16
42 PBITM_perc_Latest 3586 non-null int16
43 PBDTM_perc_Latest 3586 non-null int16
44 CPM_perc_Latest 3586 non-null int16
45 APATM_perc_Latest 3586 non-null int16
46 Debtors_Velocity_Days 3586 non-null int16
47 Creditors_Velocity_Days 3586 non-null int16
48 Inventory_Velocity_Days 3483 non-null float64
49 Value_of_Output_by_Total_Assets 3586 non-null float64
50 Value_of_Output_by_Gross_Block 3586 non-null int16

dtypes: float64(4), int16(46), int64(1)
memory usage: 462.4 KB
In [26]:
Out[26]:
int16 46
float64 4
int64 1
dtype: int64

In [27]:
round(Company.describe(),2).T
Out[27]:
count mean std min 25% 50%
Co_Code 3586.0 16065.39 19776.82 4.00 3029.25 6077.50 24
Networth_Next_Year 3586.0 725.05 4769.68 -8021.60 3.98 19.02
Equity_Paid_Up 3586.0 963.22 604.30 0.00 399.25 1058.00
Total_Debt 3586.0 716.66 704.02 0.00 5.00 546.00
Net_Working_Capital 3586.0 1241.80 788.90 0.00 484.25 1205.50
Current_Assets 3586.0 1227.19 859.12 0.00 417.25 1193.00
Current_Liabilities_and_Provisions 3586.0 838.92 737.16 0.00 76.25 740.50
Total_Assets_by_Liabilities 3586.0 1543.59 918.59 0.00 747.00 1561.50 2
Other_Income 3586.0 237.34 320.10 0.00 10.00 53.00
Value_Of_Output 3586.0 1060.58 851.34 0.00 193.25 984.00
Selling_Cost 3586.0 218.16 326.97 0.00 0.00 16.00
Adjusted_PAT 3586.0 725.19 486.18 0.00 429.25 634.00
Capital_expenses_in_forex 3586.0 38.41 103.54 0.00 0.00 0.00
Book_Value_Unit_Curr 3586.0 1475.19 876.21 0.00 677.00 1441.50 2
Book_Value_Adj_Unit_Curr 3586.0 1439.54 859.66 -1.00 660.25 1397.50 2
CEPS_annualised_Unit_Curr 3586.0 766.75 526.91 0.00 464.00 582.00
Cash_Flow_From_Operating_Activities 3586.0 853.48 617.21 0.00 355.25 703.00
Cash_Flow_From_Investing_Activities 3586.0 830.13 534.97 0.00 271.25 1027.50
Cash_Flow_From_Financing_Activities 3586.0 926.98 562.65 0.00 425.25 1200.00
ROG_Net_Worth_perc 3586.0 1193.52 686.45 0.00 693.25 1083.50
ROG_Capital_Employed_perc 3586.0 1203.52 714.62 0.00 637.25 1114.50
ROG_Gross_Block_perc 3586.0 784.95 464.85 0.00 556.00 580.00
ROG_Gross_Sales_perc 3586.0 1283.22 734.54 0.00 747.25 1144.00
ROG_Net_Sales_perc 3586.0 1279.97 732.60 0.00 748.25 1138.50
ROG_Cost_of_Production_perc 3586.0 1291.87 730.64 0.00 740.25 1177.50
ROG_Total_Assets_perc 3586.0 1237.13 736.45 0.00 631.25 1154.00
ROG_PBIDT_perc 3586.0 1337.94 750.91 0.00 743.00 1245.00
ROG_PBDT_perc 3586.0 1345.10 752.49 0.00 745.25 1252.50
ROG_PBIT_perc 3586.0 1342.16 745.57 0.00 756.25 1247.00
ROG_PBT_perc 3586.0 1312.40 734.64 0.00 721.25 1209.50
ROG_PAT_perc 3586.0 1287.95 715.27 0.00 726.25 1180.00
ROG_CP_perc 3586.0 1331.98 748.07 0.00 739.25 1243.00
ROG_Revenue_earnings_in_forex_perc 3586.0 565.15 215.06 0.00 571.00 571.00

count mean std min 25% 50%
ROG_Revenue_expenses_in_forex_perc 3586.0 652.95 279.29 0.00 644.00 644.00
ROG_Market_Capitalisation_perc 3586.0 865.03 515.11 0.00 601.00 601.00
Current_Ratio_Latest 3586.0 249.97 249.97 -1.00 88.00 136.00
Fixed_Assets_Ratio_Latest 3586.0 328.16 352.03 -1.00 27.00 164.50
Inventory_Ratio_Latest 3586.0 514.77 504.85 -1.00 0.00 401.50
Debtors_Ratio_Latest 3586.0 574.38 491.33 -1.00 39.25 571.00
Total_Asset_Turnover_Ratio_Latest 3585.0 1.24 2.67 0.00 0.07 0.60
Interest_Cover_Ratio_Latest 3586.0 583.88 344.73 -1.00 372.00 471.00
PBIDTM_perc_Latest 3586.0 1125.01 675.97 -1.00 453.00 1059.50
PBITM_perc_Latest 3586.0 1131.02 642.01 -1.00 575.00 1078.50
PBDTM_perc_Latest 3586.0 1144.84 645.67 -1.00 619.00 1072.50
CPM_perc_Latest 3586.0 1086.45 602.02 -1.00 608.00 1016.00
APATM_perc_Latest 3586.0 1046.48 545.05 -1.00 754.00 911.50
Debtors_Velocity_Days 3586.0 249.99 194.35 0.00 60.25 255.50
Creditors_Velocity_Days 3586.0 227.90 172.04 0.00 59.00 237.00
Inventory_Velocity_Days 3483.0 79.64 137.85 -199.00 0.00 35.00
Value_of_Output_by_Total_Assets 3586.0 0.82 1.20 -0.33 0.07 0.48
Value_of_Output_by_Gross_Block 3586.0 346.93 353.00 0.00 46.00 181.50

In [28]:
continuous=Company.dtypes[(Company.dtypes=='int64')|(Company.dtypes=='float64')|(Com
data_plot=Company[continuous]
data_plot.boxplot(figsize=(20,10));
plt.xlabel("Continuous Variables")
plt.ylabel("Density")
plt.title("Figure: Boxplot of Continuous Data")
Out[28]:
Text(0.5, 1.0, 'Figure: Boxplot of Continuous Data')
Noticeably, there are outliers present in the data set.To confirm our analysis , we will further detect
outliers and decide how these outliers should be treated.
Detecting outliers using IQR method by defining a new range, that is called a decision range, and any
data point lying outside this range is considered as an outlier. The range is as given below:
IQR = Q3 − Q1
Lower Bound= Q1 - 1.5*IQR
Upper Bound=Q3 + 1.5*IQR
In [29]:
Q1 = Company.quantile(0.25)
Q3 = Company.quantile(0.75)
IQR = Q3 - Q1
UL = Q3 + 1.5*IQR
LL = Q1 - 1.5*IQR

In [30]:
((Company> UL)|(Company< LL)).sum()
Out[30]:
Co_Code 291
Equity_Paid_Up 0
Total_Debt 0
Current_Assets 0
Other_Income 79
Value_Of_Output 0
Selling_Cost 168
Adjusted_PAT 0
ROG_PBIDT_perc 0
ROG_PBDT_perc 0
ROG_PBIT_perc 0
ROG_PBT_perc 0
ROG_PAT_perc 0
ROG_CP_perc 0
PBITM_perc_Latest 0
PBDTM_perc_Latest 0
CPM_perc_Latest 0
APATM_perc_Latest 0
dtype: int64

In [31]:
# Replacing outliers to NaN Values
Company[((Company> UL) | (Company< LL))]= np.nan

In [32]:
Company.isnull().sum()
Out[32]:
Co_Code 291
Equity_Paid_Up 0
Total_Debt 0
Current_Assets 0
Other_Income 79
Value_Of_Output 0
Selling_Cost 168
Adjusted_PAT 0
ROG_PBIDT_perc 0
ROG_PBDT_perc 0
ROG_PBIT_perc 0
ROG_PBT_perc 0
ROG_PAT_perc 0
ROG_CP_perc 0
PBITM_perc_Latest 0
PBDTM_perc_Latest 0
CPM_perc_Latest 0
APATM_perc_Latest 0
dtype: int64

In [33]:
Company.isnull().sum().sum()
print("Number of missing values after replacing outliers with Nan values is",Company
Number of missing values after replacing outliers with Nan values is 5

717
In [34]:
Company.shape
print('The number of rows of the temporary dataframe created is',Company.shape[0],'

print('The number of columns of the temporary dataframe created is',Company.shape[1]
The number of rows of the temporary dataframe created is 3586 .
The number of columns of the temporary dataframe created is 51 .
Data has very few missing or null values and roughly 1.6% of data has outliers.
Here, we are converting outliers to missing values.Hence, total number of missing values in addition to
outliers will be 5717 (Total Number of Outliers+Total Number of Missing Values).
Note: Before converting outliers to NaN values number of missing values present in the dataset was
118.
1.2 Missing Value Treatment
Visualizing Missing Values:

In [35]:
plt.figure(figsize = (12,8))
sns.heatmap(Company.isnull(), cbar = False, cmap = 'coolwarm', yticklabels = False)
plt.show()

Noticeable, presence of missing values in some variables can be observed.Blue color in the heatmap is
indicating occupied cells while red cuolor indicates missing values present in the data.Listing down few
observations:
For variable "Networth_Next_Year" some values might be completely missing.

Maximum values are missing from variable "ROG-Revenue expenses in forex (%)" followed by "Revenue
expenses in forex" ( which is expected, since ROG is the percentage represtation of of revenue values).
Also, some missing values can be observed in variables "Inventory Velocity (Days)", "Debtors
Ratio[Latest]", "ROG-Market Capitalisation (%)","Captital_expenses_in_forex","Selling_cost" and
"Other_Income".
Typically if missing data in columns is less then 30 % of our data and at row level data is atleast at 90%
complete, we do not drop the data.Here, we will first check completeness of data and then decide the
technique to be used to move forward.
In order to check the completeness of data at row level, we will look at total number of missing values in each
row.
Note: To find total number of missing values in each row , we will set axis as 1.
Since, it is a company and we want to quantify the data.Therefore, we are choosing to do a missing value
imputation instead of dropping these missing values.
We will try to target companies which completes atleast 90 % of the data in each row i.e. we will filter
out companies where there are atleast 5 or less missing values to identify the reliable data until this
point.
After filtering out these values shape of our data changes (before filtering; number of rows= 3586) to :
The number of rows of the temporary dataframe created is 3569 .
The number of columns of the temporary dataframe created is 51 .
This indicates that most of our data is still available.
Note: We have created a temporary dataframe to filter out companies with atleast 5 missing
values.
In [36]:
Company_temp = Company[Company.isnull().sum(axis = 1) <= 5]
In [37]:
Company_temp.shape
Out[37]:
(3569, 51)

In [38]:
Company.isnull().sum().sort_values(ascending = False)/Company.index.size
Out[38]:
ROG_Revenue_expenses_in_forex_perc 0.450363
ROG_Revenue_earnings_in_forex_perc 0.367262
Capital_expenses_in_forex 0.193530
Networth_Next_Year 0.188511
Inventory_Velocity_Days 0.101785
Co_Code 0.081149
Total_Asset_Turnover_Ratio_Latest 0.056330
Selling_Cost 0.046849
Current_Ratio_Latest 0.044618
Value_of_Output_by_Total_Assets 0.041829
Other_Income 0.022030
Cash_Flow_From_Financing_Activities 0.000000
Cash_Flow_From_Investing_Activities 0.000000
Cash_Flow_From_Operating_Activities 0.000000
Book_Value_Adj_Unit_Curr 0.000000
Book_Value_Unit_Curr 0.000000
ROG_Net_Worth_perc 0.000000
CEPS_annualised_Unit_Curr 0.000000
Adjusted_PAT 0.000000
ROG_Gross_Block_perc 0.000000
Value_Of_Output 0.000000
Total_Assets_by_Liabilities 0.000000
Current_Liabilities_and_Provisions 0.000000
Current_Assets 0.000000
Net_Working_Capital 0.000000
Total_Debt 0.000000
Equity_Paid_Up 0.000000
ROG_Capital_Employed_perc 0.000000
Value_of_Output_by_Gross_Block 0.000000
ROG_Gross_Sales_perc 0.000000
ROG_Net_Sales_perc 0.000000
Creditors_Velocity_Days 0.000000
Debtors_Velocity_Days 0.000000
APATM_perc_Latest 0.000000
CPM_perc_Latest 0.000000
PBDTM_perc_Latest 0.000000
PBITM_perc_Latest 0.000000
PBIDTM_perc_Latest 0.000000
Interest_Cover_Ratio_Latest 0.000000
Debtors_Ratio_Latest 0.000000
Inventory_Ratio_Latest 0.000000
Fixed_Assets_Ratio_Latest 0.000000
ROG_Market_Capitalisation_perc 0.000000
ROG_CP_perc 0.000000
ROG_PAT_perc 0.000000
ROG_PBT_perc 0.000000
ROG_PBIT_perc 0.000000
ROG_PBDT_perc 0.000000
ROG_PBIDT_perc 0.000000
ROG_Cost_of_Production_perc 0.000000
ROG_Total_Assets_perc 0.000000
dtype: float64
Dropping columns with more than 30% missing values

Dropping columns with more than 30% missing values
We are sorting proportion of missing values by dividing number of missing values by number of applicable
rows.We will eliminate anything that is more then 30 %. Noticeably, "ROG-Revenue_expenses_in_forex_perc"
and "ROG-Revenue_earnings_in_forex_perc" are the only to values which are more then 30%. Therefore, we
can eliminate these values.
In [39]:
Company_sub1 = Company.drop(['ROG_Revenue_expenses_in_forex_perc','ROG_Revenue_earni
axis = 1)
In [40]:
Company_sub1.shape
print('The number of rows after dropping columns with more then 30% missing values i
print('The number of columns after dropping columns with more then 30% missing value
The number of rows after dropping columns with more then 30% missing v
alues is 3586 .
The number of columns after dropping columns with more then 30% missin
g values is 49 .
The missing values are of numeric nature.Hence, can be imputed using KNNImputer function from the
impute module of the sklearn. This imputer utilizes the k-Nearest Neighbors method to replace the
missing values in the datasets by finding the nearest neighbors with the Euclidean distance
matrix.
Another critical point here is that the KNN Imptuer is a distance-based imputation method and it requires us to
normalize our data. Otherwise, the different scales of our data will lead the KNN Imputer to generate biased
replacements for the missing values.Here, we will use Scikit-Learn’s Standard Scaler method which will scale
our variables to have values between 0 and 1.
It is recommended that data should be split to response and predictor variables
Imputation is done by predicting the missing value based on values of 10 nearest neighbors of the same
variable. Such that all the missing values are replaced based on nearest neighbors value.
Segregate the predictors and response
In [41]:
predictors = Company_sub1.drop('Networth_Next_Year', axis = 1)

response = Company_sub1['Networth_Next_Year']
Scale the predictors
In [42]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_predictors = pd.DataFrame(scaler.fit_transform(predictors), columns = predict
In [43]:
Company_sub2 = pd.concat([scaled_predictors, response], axis = 1)

Imputing the remaining missing values
In [44]:
from sklearn.impute import KNNImputer
In [45]:
imputer = KNNImputer(n_neighbors=10)
In [46]:
Company_imputed = pd.DataFrame(imputer.fit_transform(Company_sub2), columns = Compan

In [47]:
Company_imputed.isnull().sum()
Out[47]:
Co_Code 0
Equity_Paid_Up 0
Total_Debt 0
Current_Assets 0
Other_Income 0
Value_Of_Output 0
Selling_Cost 0
Adjusted_PAT 0
ROG_PBIDT_perc 0
ROG_PBDT_perc 0
ROG_PBIT_perc 0
ROG_PBT_perc 0
ROG_PAT_perc 0
ROG_CP_perc 0
PBITM_perc_Latest 0
PBDTM_perc_Latest 0
CPM_perc_Latest 0
APATM_perc_Latest 0
dtype: int64
Noticeably, missing values have been treated now.

1.3 Transform Target variable into 0 and 1
There is no target variable defined – but since the objective is to build a model for investor to decode which
company to invest in – the variable Networth_Next_Year coud be used to transform into target variable (as
mentined in rubric as well).
We will now create a default variable that should take the below mentioned values:
of 1 when net worth next year is negative & 0 when net worth next year is positive.
If the company’s Networth_Next_Year is positive – then the company would continue to return good
investment for investor and thus could be transformed as 0 (i.e., Non-Default).
If the company’s Networth_Next_Year is negative – then the company is likely to not return a good
investment to investor and transformed as 1 (i.e.Default).
Hence, creating a binary target variable using 'Networth_Next_Year'
Creating a binary target variable using 'Networth_Next_Year'
In [48]:
Company_imputed['default'] = np.where((Company_imputed['Networth_Next_Year'] > 0), 0
Checking top 10 rows
In [49]:
Company_imputed[['default','Networth_Next_Year']].head(10)
Out[49]:
default Networth_Next_Year
0 1 -6.218
1 1 -23.782
2 0 43.906
3 1 -23.723
4 1 -12.392
5 1 -13.211
6 1 -7.314
7 0 8.508
8 1 -27.635
9 0 35.004

In [50]:
Company_imputed['default'].value_counts()
Out[50]:
0 3225
1 361
Name: default, dtype: int64
Checking proportion of default
In [51]:
Company_imputed['default'].value_counts(normalize = True)
Out[51]:
0 0.899331
1 0.100669
Name: default, dtype: float64
Noticeably, approximately 10% of the companies from the dataset are likely to default and these are the
companies in which investors should probably avoid investing in.
1.4 Univariate (4 marks) & Bivariate ( 6marks) analysis with proper

interpretation. (You may choose to include only those variables
which were significant in the model building)
Univariate Analysis:
In [52]:
def univariateAnalysis_numeric(column,nbins):
print("Description of " + column)
print("-------------------------------------------------------------------------
print(Company_imputed[column].describe(),end=' ')
plt.figure()
print("Distribution of " + column)
print("-------------------------------------------------------------------------
sns.distplot(Company_imputed[column], kde=False, color='skyblue');
plt.show()
plt.figure()
print("BoxPlot of " + column)
print("-------------------------------------------------------------------------
ax = sns.boxplot(x=Company_imputed[column],color='b')
plt.show()

In [53]:
ted_imp_features=pd.DataFrame(Company_imputed,columns=['Net_Working_Capital','Book_Va
'ROG_Capital_Employed_perc','ROG_Total_Assets_perc','Current_Ratio_
'Fixed_Assets_Ratio_Latest','Inventory_Ratio_Latest','Debtors_Ratio
'Total_Asset_Turnover_Ratio_Latest','Interest_Cover_Ratio_Latest',
'ROG_Market_Capitalisation_perc', 'ROG_Cost_of_Production_perc'])
In [54]:
Company_num = Company_imputed_imp_features.select_dtypes(include = ['int64','int16',

Company_cat=Company_imputed_imp_features.select_dtypes(["object"])
Categorical_column_list=list(Company_cat.columns.values)
Numerical_column_list = list(Company_num.columns.values)
Numerical_length=len(Numerical_column_list)
Categorical_length=len(Categorical_column_list)
print("Length of Numerical columns is :",Numerical_length)
print("Length of Categorical columns is :",Categorical_length)
Length of Numerical columns is : 13
Length of Categorical columns is : 0
In [55]:
for x in Numerical_column_list:
univariateAnalysis_numeric(x,20)
pd.options.display.float_format = '{:.3f}'.format
Name: Book_Value_Unit_Curr, dtype: float64 Distribution of Book_Value_

Unit_Curr
----------------------------------------------------------------------
------
Insights from Univariate Analysis:

Noticeably,even though we have treated outliers but some of the variables still indicate the presence of
outliers.
50% of the times, Equity paid up i.e., amount that has been received by the company through the issue of
shares to the shareholders is in positive.
50 % of the times the company is in debt.

Majorty of the times i.e. 75% of the times company's net working capital, current assets, current
liabilities,total assets by liabilities, other income, value output, selling cost, adjusted PAT
,Book_Value_Unit_Curr,Book_Value_Adj._Unit_Curr,CEPS_annualised_Unit_Curr,
Cash_Flow_From_Operating_Activities, Cash_Flow_From_Investing_Activities etc is positive.
Company is currently not financing in longterm investments in forex currently. Probably company should
consider funding in longterm forex investmets to generate high revenues.
Since, companies are not investing in forex most of the values are 0.Therfore, boxplot is a line for variable
Capital_expense_in_forex.
For variable "Inventory_Velocity Days" there is just one whisker in boxplot is, due to the extreme skewness
of data and also there is no value smaller than the median.
In [56]:
Numerical_column_list = list(Company_num.columns.values)
Numerical_column_list
Out[56]:
['Net_Working_Capital',
'Book_Value_Unit_Curr',
'ROG_Net_Worth_perc',
'ROG_Capital_Employed_perc',
'ROG_Total_Assets_perc',
'Current_Ratio_Latest',
'Fixed_Assets_Ratio_Latest',
'Inventory_Ratio_Latest',
'Debtors_Ratio_Latest',
'Total_Asset_Turnover_Ratio_Latest',
'Interest_Cover_Ratio_Latest',
'ROG_Market_Capitalisation_perc',
'ROG_Cost_of_Production_perc']
Bivariate/Multivariate Analysis:
Countplot of Target Variable:

In [57]:
# EDA for categorical columns 'Holiday_package'.
sns.catplot('default', data=Company_imputed, kind='count',aspect=1.5, palette='mako'

plt.title("Figure: Countplot of Target Variable Default")
Out[57]:
Text(0.5, 1.0, 'Figure: Countplot of Target Variable Default')
The data has higher Non-default companies i.e., the companies whic are expected to have a postive Net
Worth next year (which is good for investors for decision making).
Some of the important parameters which are more likely to contribute to the strength of a company's balance
sheet can be evaluated by below listed parameters:
['Net_Working_Capital','Book_Value_Unit_Curr','ROG-Net_Worth_perc','ROG-Capital_Employed_perc','ROG-
Total_Assets_perc',
'Current_Ratio[Latest]',
'Fixed_Assets_Ratio[Latest]','Inventory_Ratio[Latest]','Debtors_Ratio[Latest]',
'Total_Asset_Turnover_Ratio[Latest]','Interest_Cover_Ratio[Latest]','ROG-Market_Capitalisation_perc', 'ROG-
Cost_of_Production_perc']
1. Net Working Capital: It measures company's liquidity and short-term financial health. A company will have
negative NWC if its ratio of current assets to liabilities is less than one.
2. Book Value (Unit Curr): High book value per share (due to profits accumulated over the years) indicates a
strong company.
3. ROG-Net Worth (%) : Companies with low capital base (that don't need additional capital for growth) will
show a higher ratio.
4. ROG-Capital Employed (%): Captures the profit generated on total capital employed (including
debt).Companies with low capital base (those that don't need additional capital for growth) will display a
higher ratio.
5. ROG-Total Assets (%): Captures the net profit generated on total assets.
6. Current Ratio[Latest]: It tells how cash rich a company is. It helps us gauge the short-term financial
strength of a company.
7. Fixed Assets Ratio[Latest]:It reveals how efficient a company is at generating sales from its existing fixed
assets.
8. Inventory Ratio[Latest] : Shows how efficiently the company manages its inventory.
9. Debtors Ratio[Latest]: A high debt to equity ratio is a warning signal, especially in situations like business
downturns.
10. Total Asset Turnover Ratio[Latest] : Shows how efficiently the company manages its total assets.
11. Interest Cover Ratio[Latest]: measures a company's ability to handle its outstanding debt.
12. ROG-Market Capitalisation (%): Company's worth as determined by the stock market.
13. ROG_Cost_of_Production_perc : Product costing is the process of tracking and studying all the various
expenses that are accrued in the production and sale of a product.

In [58]:
plt.figure(figsize=(25,10))
sns.boxplot(data=Company_imputed_imp_features)
plt.xlabel("Variables")
plt.xticks(rotation=90)
plt.ylabel("Density")
plt.title('Figure:Boxplot of few important features')
Out[58]:
Text(0.5, 1.0, 'Figure:Boxplot of few important features')
Insights:
Variable 'Current_Ratio[Latest]'and 'Total_Asset_Turnover_Ratio[Latest]' still have some extreme values.

Due to the extreme skewness of data, variable Inventory ratio doesnot have a lower whisker.
Distribution Plot of Important Features:

In [59]:
# plotting multiple density plot
Company_imputed_imp_features.plot.kde(figsize = (20,10),
linewidth = 4)
Out[59]:
<AxesSubplot:ylabel='Density'>
In [60]:
# Skewness of Data
Company_imputed_imp_features.skew(axis = 0, skipna = True).sort_values(ascending=Fal
Out[60]:
Current_Ratio_Latest 1.275
Total_Asset_Turnover_Ratio_Latest 1.075
Fixed_Assets_Ratio_Latest 0.889
ROG_Market_Capitalisation_perc 0.812
Interest_Cover_Ratio_Latest 0.739
Inventory_Ratio_Latest 0.405
Debtors_Ratio_Latest 0.229
Net_Working_Capital 0.175
ROG_Cost_of_Production_perc 0.115
ROG_Capital_Employed_perc 0.097
Book_Value_Unit_Curr 0.095
ROG_Total_Assets_perc 0.074
ROG_Net_Worth_perc 0.072
dtype: float64

Insights From Skewness and Distribution Plots of Important Features:

Since, skewness is more then 1 indicating distribution is highly skewed for variables "Current_Ratio[Latest]
" and "Total_Asset_Turnover_Ratio[Latest]".
Data is moderately skewed for variables "Fixed_Assets_Ratio[Latest]", "ROG-Market_Capitalisation_perc"
and "Interest_Cover_Ratio[Latest]".
Other variables look fairly symmetrical.
Note: The rule of thumb is:
• If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
• If the skewness is between -1 and –
0.5 or between 0.5 and 1, the data are moderately skewed.
• If the skewness is less than -1 or greater than 1,
the data are highly skewed.
In [61]:
sns.boxplot(Company_imputed["default"], Company_imputed['Current_Ratio_Latest'],data
plt.title("Figure: Plot of Default with Current_Ratio_Latest")
plt.show()

In [62]:
#boxplot_Total_Asset_Turnover_Ratio[Latest]
sns.boxplot(Company_imputed["default"], Company_imputed['Total_Asset_Turnover_Ratio_
plt.title('Figure: Boxplot of Default with Total_Asset_Turnover_Ratio[Latest]')
plt.show()

In [63]:
#boxplot_Fixed_Assets_Ratio[Latest]
sns.boxplot(Company_imputed["default"], Company_imputed['Fixed_Assets_Ratio_Latest']
plt.title('Figure: Boxplot of Default with Fixed_Assets_Ratio[Latest]')
plt.show()

In [64]:
#boxplot_ROG-Market_Capitalisation_perc
sns.boxplot(Company_imputed["default"], Company_imputed['ROG_Market_Capitalisation_p
plt.title('Figure: Boxplot of Default with ROG-Market_Capitalisation_perc')
plt.show()

In [65]:
#boxplot_Interest_Cover_Ratio[Latest]
sns.boxplot(Company_imputed["default"], Company_imputed['Interest_Cover_Ratio_Latest
plt.title('Figure: Boxplot of Default with Interest_Cover_Ratio[Latest]', fontsize=15
plt.show()

In [66]:
#boxplot_Inventory_Ratio[Latest]
sns.boxplot(Company_imputed["default"], Company_imputed['Inventory_Ratio_Latest'],da
plt.title('Figure: Boxplot of Default with Inventory_Ratio[Latest]', fontsize=15)
plt.show()
In [67]:
#boxplot_Debtors_Ratio[Latest]
sns.boxplot(Company_imputed["default"], Company_imputed['Debtors_Ratio_Latest'],data
plt.title('Figure: Boxplot of Default with Debtors_Ratio[Latest] ',fontsize=15)
plt.show()

In [68]:
#boxplot_Net_Working_Capital
sns.boxplot(Company_imputed["default"], Company_imputed['Net_Working_Capital'],data=
plt.title('Figure: Boxplot of Default with Net_Working_Capital', fontsize=15)
plt.show()
In [69]:
#boxplot_ROG-Cost_of_Production_perc
sns.boxplot(Company_imputed["default"], Company_imputed['ROG_Cost_of_Production_perc
plt.title('Figure: Boxplot of Default with ROG-Cost_of_Production_perc', fontsize=15
plt.show()

In [70]:
#boxplot_ROG-Capital_Employed_perc
sns.boxplot(Company_imputed["default"], Company_imputed['ROG_Capital_Employed_perc']
plt.title('Figure: Boxplot of Default with ROG-Capital_Employed_perc', fontsize=15)
plt.show()
In [71]:
#boxplot_ROG-Capital_Employed_perc
sns.boxplot(Company_imputed["default"], Company_imputed['Book_Value_Unit_Curr'],data
plt.title('Figure: Boxplot of Default with Book_Value_Unit_Curr ', fontsize=15)
plt.show()

In [72]:
#boxplot_ROG-ROG-Total_Assets_perc
sns.boxplot(Company_imputed["default"], Company_imputed['ROG_Total_Assets_perc'],dat
plt.title('Figure: Boxplot of Default with ROG-Total_Assets_perc ', fontsize=15)
plt.show()

In [73]:
plt.figure(figsize = (12,8))
cor_matrix = Company_imputed.drop('default', axis = 1).corr()
sns.heatmap(cor_matrix, cmap = 'plasma', vmin = -1, vmax= 1)
Out[73]:
<AxesSubplot:>

In [ ]:
1.5 Train Test Split

We will split the data into Training and Testing data set, with 70:30 proportion with the fixed random state as 1
to ensure uniformity across multiple systems.
Split the data into Train and Test dataset in a ratio of 67:33 with the fixed random_state as 42 to ensure
uniformity across multiple systems and stratify on default to make sure both train and test data have similar
proportion of defaulters and non-defaulters. This is done as the dataset is imbalanced and has more of non-
defaulters. Before we do the train-test split , we will first separate independent (X) and dependent (y) variables
(to perform Train-Test split) using train_test_split from sklearn.model_selection.
In [74]:
from sklearn.model_selection import train_test_split

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
Splitting the data into train and test sets
In [75]:
predictors = Company_imputed.drop('default', axis = 1)

response = Company_imputed[['default']]
In [76]:
X_train, X_test, y_train, y_test = train_test_split(predictors, response,

test_size = 0.33, random_state =
In [77]:
print('Number of rows and columns of the training set for the independent variables:
print('Number of rows and columns of the training set for the dependent variable:',y
print('Number of rows and columns of the test set for the independent variables:',X_
print('Number of rows and columns of the test set for the dependent variable:',y_tes
Number of rows and columns of the training set for the independent var
iables: (2402, 49)
Number of rows and columns of the training set for the dependent varia
ble: (2402, 1)
Number of rows and columns of the test set for the independent variabl
es: (1184, 49)
Number of rows and columns of the test set for the dependent variable:
(1184, 1)

In [78]:
X_train.head()
Out[78]:
Co_Code Equity_Paid_Up Total_Debt Net_Working_Capital Current_Assets Current_Liabili
662 -0.489 0.157 -1.015 -0.855 -1.153
1373 -0.453 -0.371 -1.015 -0.019 -0.230
3268 1.264 0.011 -0.785 -0.942 -1.237
3246 -0.850 0.410 -0.833 1.674 -1.273
1456 -0.034 -0.328 1.471 1.639 1.713
In [79]:
y_train.head()
Out[79]:
default
662 0
1373 0
3268 0
3246 0
1456 0
In [80]:
X_test.head()
Out[80]:
Co_Code Equity_Paid_Up Total_Debt Net_Working_Capital Current_Assets Current_Liabili
3163 -0.659 -0.626 0.449 0.710 -1.249
3133 1.463 1.598 1.082 -0.759 -0.589
937 -0.316 0.839 -1.015 0.795 0.637
196 0.968 0.278 -0.265 -1.209 1.401
2852 -0.533 0.018 1.323 1.690 -0.821

In [81]:
y_test.head()
Out[81]:
default
3163 0
3133 0
937 0
196 1
2852 0
1.6 Build Logistic Regression Model (using statsmodel library) on

most important variables on Train Dataset and choose the
optimum cutoff. Also showcase your model building
approach
Here, we will use Logistic regression Model to evaluate the relationship between one dependent binary variable
and one or more independent variables.This model will help predicts the probability of occurrence of Default
using a logit function.
Assumptions of Logistic Regression Model:
1. It assumes that there is minimal, or no multi-collinearity among the independent variables.

2. It assumes that independent variables are linearly related to log of odds.
3. It assumes a large sample for good prediction.
4. It assumes that the observations are independent of each other.
5. There are no influential values(outliers) in the continuous predictors (independent variables).
6. Logistic Regression with 2 classes that the dependent variable is binary and the ordered Logistic
Regression requires the dependent variable to be ordered.
There are two methods to solve a Logistic Regression problem:
1. Stats Model
2. Scikit Learn
Here, we will use Stats Model method by importing statsmodels.api as sm
Note: Statsmodels provides a Logit() function for performing logistic regression. The Logit() function
accepts y and X as parameters and returns the Logit object. The model is then fitted to the data.The
logit function is simply the logarithm of the odds.
logit(x) = log(x / (1 – x))
The inverse of the logit function is the sigmoid function.
The equation of the Logistic Regression by which we predict the corresponding probabilities and then go on
predict a discrete target variable is
1
y=
1+𝑒−𝑧
Note: z = 𝛽0 +∑𝑛𝑖=1 (𝛽𝑖 𝑋1 )
In [82]:
import statsmodels.api as sm
In [ ]:
Creating logistic regression equation & storing it in f_1
model = SM.logit(formula=’Dependent Variable ~ Σ Independent Variables(k)’ data = ‘Data Frame containing

the required values’).fit()
Splitting arrays or matrices into random train and test subsets. Model will be fitted on train set and
predictions will be made on the test set
In [83]:
#Statsmodel requires the labelled data, therefore, concatinating the y label to the
Company_train = pd.concat([X_train,y_train], axis=1)

Company_test = pd.concat([X_test,y_test], axis=1)
In [84]:
Company_train.to_csv('Company_train.csv',index=False)
Company_test.to_csv('Company_test.csv',index=False)
In [85]:
Company_train["default"].value_counts()
Out[85]:
0 2176
1 226
Name: default, dtype: int64
Checking if dataset is balanced
In [86]:
Company_train.default.sum() / len(Company_train.default)
Out[86]:
0.09408825978351373

In [87]:
Company_train.columns
Out[87]:
Index(['Co_Code', 'Equity_Paid_Up', 'Total_Debt', 'Net_Working_Capita

l',
'Current_Assets', 'Current_Liabilities_and_Provisions',
'Total_Assets_by_Liabilities', 'Other_Income', 'Value_Of_Outpu

t',
'Selling_Cost', 'Adjusted_PAT', 'Capital_expenses_in_forex',
'Book_Value_Unit_Curr', 'Book_Value_Adj_Unit_Curr',
'CEPS_annualised_Unit_Curr', 'Cash_Flow_From_Operating_Activiti
es',
'Cash_Flow_From_Investing_Activities',
'Cash_Flow_From_Financing_Activities', 'ROG_Net_Worth_perc',
'ROG_Capital_Employed_perc', 'ROG_Gross_Block_perc',
'ROG_Gross_Sales_perc', 'ROG_Net_Sales_perc',
'ROG_Cost_of_Production_perc', 'ROG_Total_Assets_perc',
'ROG_PBIDT_perc', 'ROG_PBDT_perc', 'ROG_PBIT_perc', 'ROG_PBT_pe

rc',
'ROG_PAT_perc', 'ROG_CP_perc', 'ROG_Market_Capitalisation_per

c',
'Current_Ratio_Latest', 'Fixed_Assets_Ratio_Latest',
'Inventory_Ratio_Latest', 'Debtors_Ratio_Latest',
'Total_Asset_Turnover_Ratio_Latest', 'Interest_Cover_Ratio_Late
st',
'PBIDTM_perc_Latest', 'PBITM_perc_Latest', 'PBDTM_perc_Latest',
'CPM_perc_Latest', 'APATM_perc_Latest', 'Debtors_Velocity_Day

s',
'Creditors_Velocity_Days', 'Inventory_Velocity_Days',
'Value_of_Output_by_Total_Assets', 'Value_of_Output_by_Gross_Bl
ock',
'Networth_Next_Year', 'default'],
dtype='object')
In [ ]:
Model 1
Before starting model building, lets look at the problem of multicollinearity. Multicollinearity occurs when two or
more independent variables are highly correlated with one another in a regression model.
In [88]:
## Importing VIF
from statsmodels.stats.outliers_influence import variance_inflation_factor

def calc_vif(X):
vif = pd.DataFrame()
vif["variables"] = X.columns
vif["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
return(vif)

In [89]:
calc_vif(X_train).sort_values(by='VIF', ascending = False)
Out[89]:
variables VIF
22 ROG_Net_Sales_perc 19.846
21 ROG_Gross_Sales_perc 19.749
13 Book_Value_Adj_Unit_Curr 5.579
12 Book_Value_Unit_Curr 5.537
46 Value_of_Output_by_Total_Assets 4.805
36 Total_Asset_Turnover_Ratio_Latest 4.405
40 PBDTM_perc_Latest 4.187
26 ROG_PBDT_perc 4.079
28 ROG_PBT_perc 4.036
41 CPM_perc_Latest 3.932
29 ROG_PAT_perc 3.477
27 ROG_PBIT_perc 3.386
30 ROG_CP_perc 3.279
25 ROG_PBIDT_perc 3.278
47 Value_of_Output_by_Gross_Block 3.051
39 PBITM_perc_Latest 3.038
33 Fixed_Assets_Ratio_Latest 3.035
38 PBIDTM_perc_Latest 2.713
42 APATM_perc_Latest 2.679
10 Adjusted_PAT 2.471
14 CEPS_annualised_Unit_Curr 2.155
19 ROG_Capital_Employed_perc 1.923
18 ROG_Net_Worth_perc 1.837
37 Interest_Cover_Ratio_Latest 1.781
9 Selling_Cost 1.754
24 ROG_Total_Assets_perc 1.743
35 Debtors_Ratio_Latest 1.737
34 Inventory_Ratio_Latest 1.619
7 Other_Income 1.603
15 Cash_Flow_From_Operating_Activities 1.555
48 Networth_Next_Year 1.457
5 Current_Liabilities_and_Provisions 1.444

variables VIF
3 Net_Working_Capital 1.428
8 Value_Of_Output 1.388
43 Debtors_Velocity_Days 1.387
4 Current_Assets 1.377
2 Total_Debt 1.371
23 ROG_Cost_of_Production_perc 1.363
32 Current_Ratio_Latest 1.308
20 ROG_Gross_Block_perc 1.306
45 Inventory_Velocity_Days 1.304
44 Creditors_Velocity_Days 1.268
17 Cash_Flow_From_Financing_Activities 1.180
16 Cash_Flow_From_Investing_Activities 1.177
31 ROG_Market_Capitalisation_perc 1.164
0 Co_Code 1.104
6 Total_Assets_by_Liabilities 1.094
1 Equity_Paid_Up 1.060
11 Capital_expenses_in_forex nan
Here, we see that the value of VIF is high for many variables. Hence,dropping variables with VIF more than 5
(very high correlation) & build our model.
In [94]:
f_1='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total_
In [95]:
model_1 = SM.logit(formula = f_1,data=Company_imputed).fit()
Optimization terminated successfully.
Current function value: 0.125498
Iterations 10
Checking the coefficients:

In [96]:
model_1.summary()
Out[96]:
Logit Regression Results
Dep. Variable: default No. Observations: 3586
Model: Logit Df Residuals: 3553
Method: MLE Df Model: 32
Date: Sun, 06 Feb 2022 Pseudo R-squ.: 0.6157
Time: 12:31:57 Log-Likelihood: -450.04
converged: True LL-Null: -1171.0
Covariance Type: nonrobust LLR p-value: 4.424e-283
coef std err z P>|z| [0.025 0.975]
Intercept -5.6654 0.271 -20.926 0.000 -6.196 -5.135
Book_Value_Adj_Unit_Curr -1.2438 0.574 -2.168 0.030 -2.368 -0.120
Book_Value_Unit_Curr -1.6603 0.583 -2.848 0.004 -2.803 -0.518
Value_of_Output_by_Total_Assets 0.3727 0.162 2.301 0.021 0.055 0.690
Total_Asset_Turnover_Ratio_Latest -0.1217 0.148 -0.823 0.411 -0.412 0.168
PBDTM_perc_Latest 0.0182 0.232 0.078 0.938 -0.437 0.473
CPM_perc_Latest -0.3498 0.230 -1.523 0.128 -0.800 0.100
ROG_PBIT_perc 0.0020 0.114 0.017 0.986 -0.221 0.225
ROG_CP_perc 0.0284 0.115 0.246 0.805 -0.198 0.255
Value_of_Output_by_Gross_Block -0.4073 0.204 -1.998 0.046 -0.807 -0.008
Fixed_Assets_Ratio_Latest -0.0874 0.198 -0.442 0.658 -0.475 0.300
Adjusted_PAT -0.5003 0.154 -3.252 0.001 -0.802 -0.199
ROG_Net_Worth_perc -0.2211 0.127 -1.745 0.081 -0.469 0.027
Interest_Cover_Ratio_Latest -0.4189 0.150 -2.791 0.005 -0.713 -0.125
Selling_Cost 0.1371 0.134 1.020 0.308 -0.126 0.401
ROG_Total_Assets_perc -0.1902 0.117 -1.620 0.105 -0.420 0.040
Debtors_Ratio_Latest -0.2212 0.120 -1.837 0.066 -0.457 0.015
Inventory_Ratio_Latest -0.0744 0.119 -0.624 0.533 -0.308 0.159
Other_Income -0.1152 0.111 -1.040 0.298 -0.332 0.102
Cash_Flow_From_Operating_Activities -0.0088 0.110 -0.080 0.936 -0.224 0.207
Net_Working_Capital -0.3268 0.101 -3.227 0.001 -0.525 -0.128
Debtors_Velocity_Days 0.0331 0.103 0.321 0.748 -0.169 0.235
Total_Debt 0.6775 0.101 6.726 0.000 0.480 0.875
ROG_Cost_of_Production_perc -0.2281 0.098 -2.330 0.020 -0.420 -0.036

Current_Ratio_Latest -0.7208 0.129 -5.587 0.000 -0.974 -0.468
ROG_Gross_Block_perc 0.0438 0.114 0.384 0.701 -0.179 0.267
Inventory_Velocity_Days -0.0121 0.102 -0.118 0.906 -0.212 0.188
Creditors_Velocity_Days 0.0952 0.095 0.999 0.318 -0.092 0.282
Cash_Flow_From_Financing_Activities -0.0284 0.093 -0.304 0.761 -0.211 0.155
ROG_Market_Capitalisation_perc -0.0354 0.095 -0.374 0.708 -0.221 0.150
Equity_Paid_Up -0.1523 0.088 -1.723 0.085 -0.325 0.021
Possibly complete quasi-separation: A fraction 0.18 of observations can be
perfectly predicted. This might indicate that there is complete
quasi-separation. In this case some parameters will not be identified.
As most of the coefficients are having p values greater than 5%, those variables are highly correlated
and this can be ignored while taking only significant variables with p values < 0.05.
The elimination of these variables is done one by one, where the highest insignificant variable is
removed first from logistic model and then model performance tested again to see if other variables are
contributing significantly or not.
Variable "ROG_PBIT_perc" has the highest p-value (0.986) and is insignificant, therefore, we need to
eliminate it.
Model_2
In [97]:
_Gross_Block_perc+Inventory_Velocity_Days+Creditors_Velocity_Days+Cash_Flow_From_Fina
In [98]:
Iterations 10

In [99]:
model_2.summary()
Out[99]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6656 0.271 -20.931 0.000 -6.196 -5.135
PBDTM_perc_Latest 0.0182 0.232 0.078 0.937 -0.437 0.473
CPM_perc_Latest -0.3502 0.229 -1.530 0.126 -0.799 0.098
ROG_CP_perc 0.0297 0.089 0.332 0.740 -0.145 0.205
Adjusted_PAT -0.5002 0.154 -3.258 0.001 -0.801 -0.199
Selling_Cost 0.1372 0.134 1.021 0.307 -0.126 0.401
Other_Income -0.1152 0.111 -1.040 0.298 -0.332 0.102
Total_Debt 0.6775 0.101 6.726 0.000 0.480 0.875

Equity_Paid_Up -0.1523 0.088 -1.724 0.085 -0.325 0.021
Variable "PBDTM_perc_Latest" has the highest p-value (0.937) and is insignificant, therefore, we need to
eliminate it.
Model_3
In [100]:
e_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total_Assets+Total_Asset_Turn
In [101]:
Iterations 10

In [102]:
model_3.summary()
Out[102]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6647 0.270 -20.948 0.000 -6.195 -5.135
CPM_perc_Latest -0.3347 0.115 -2.908 0.004 -0.560 -0.109
ROG_CP_perc 0.0295 0.089 0.331 0.741 -0.145 0.205
Adjusted_PAT -0.4995 0.153 -3.259 0.001 -0.800 -0.199
Selling_Cost 0.1366 0.134 1.018 0.309 -0.126 0.399
Other_Income -0.1153 0.111 -1.041 0.298 -0.332 0.102
Total_Debt 0.6771 0.101 6.729 0.000 0.480 0.874

Equity_Paid_Up -0.1520 0.088 -1.722 0.085 -0.325 0.021
Variable "Cash_Flow_From_Operating_Activities"has the highest p-value (0.939) and is insignificant,

therefore, we need to eliminate it.
Model_4
In [103]:
In [104]:
Iterations 10

In [105]:
model_4.summary()
Out[105]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6653 0.270 -20.954 0.000 -6.195 -5.135
CPM_perc_Latest -0.3351 0.115 -2.915 0.004 -0.560 -0.110
ROG_CP_perc 0.0298 0.089 0.334 0.738 -0.145 0.205
Adjusted_PAT -0.5011 0.152 -3.299 0.001 -0.799 -0.203
Selling_Cost 0.1358 0.134 1.015 0.310 -0.126 0.398
Other_Income -0.1169 0.109 -1.076 0.282 -0.330 0.096
Total_Debt 0.6765 0.100 6.743 0.000 0.480 0.873

Equity_Paid_Up -0.1520 0.088 -1.724 0.085 -0.325 0.021
Variable "Inventory_Velocity_Days" has the highest p-value (0.907) and is insignificant, therefore, we
need to eliminate it.
Model_5
In [106]:
In [107]:
Iterations 10

In [108]:
model_5.summary()
Out[108]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6651 0.270 -20.956 0.000 -6.195 -5.135
CPM_perc_Latest -0.3355 0.115 -2.920 0.004 -0.561 -0.110
ROG_CP_perc 0.0299 0.089 0.335 0.737 -0.145 0.205
Adjusted_PAT -0.5008 0.152 -3.297 0.001 -0.799 -0.203
Selling_Cost 0.1335 0.132 1.009 0.313 -0.126 0.393
Other_Income -0.1173 0.109 -1.080 0.280 -0.330 0.096
Total_Debt 0.6751 0.100 6.780 0.000 0.480 0.870

Equity_Paid_Up -0.1519 0.088 -1.722 0.085 -0.325 0.021
Variable "Debtors_Velocity_Days" has the highest p-value (0.764) and is insignificant, therefore, we need
to eliminate it.
Model_6
In [109]:
atest+Selling_Cost+ROG_Total_Assets_perc+Debtors_Ratio_Latest+Inventory_Ratio_Latest+
In [110]:
Iterations 10

In [111]:
model_6.summary()
Out[111]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6650 0.270 -20.958 0.000 -6.195 -5.135
CPM_perc_Latest -0.3343 0.115 -2.912 0.004 -0.559 -0.109
ROG_CP_perc 0.0288 0.089 0.323 0.747 -0.146 0.203
Adjusted_PAT -0.4992 0.152 -3.289 0.001 -0.797 -0.202
Selling_Cost 0.1320 0.132 0.998 0.318 -0.127 0.391
Other_Income -0.1157 0.108 -1.067 0.286 -0.328 0.097
Total_Debt 0.6751 0.100 6.781 0.000 0.480 0.870

Equity_Paid_Up -0.1527 0.088 -1.732 0.083 -0.325 0.020
Variable "Cash_Flow_From_Financing_Activities" has the highest p-value (0.757) and is insignificant,

Model_7
In [112]:
of_Production_perc+Current_Ratio_Latest+ROG_Gross_Block_perc+Creditors_Velocity_Days+
In [113]:
model_7= SM.logit(formula = f_7,data=Company_imputed).fit()
Iterations 10

In [114]:
model_7.summary()
Out[114]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6640 0.270 -20.966 0.000 -6.194 -5.135
CPM_perc_Latest -0.3329 0.115 -2.902 0.004 -0.558 -0.108
ROG_CP_perc 0.0301 0.089 0.338 0.735 -0.144 0.204
Adjusted_PAT -0.4971 0.152 -3.280 0.001 -0.794 -0.200
Selling_Cost 0.1297 0.132 0.982 0.326 -0.129 0.388
Other_Income -0.1151 0.108 -1.061 0.289 -0.328 0.097
Total_Debt 0.6756 0.100 6.788 0.000 0.481 0.871

Equity_Paid_Up -0.1531 0.088 -1.738 0.082 -0.326 0.020
Variable "ROG_CP_perc" has the highest p-value (0.735) and is insignificant, therefore, we need to
eliminate it.
Model_8
In [115]:
s+Total_Asset_Turnover_Ratio_Latest+CPM_perc_Latest+Value_of_Output_by_Gross_Block+ F
In [116]:
Iterations 10

In [117]:
model_8.summary()
Out[117]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6667 0.270 -20.975 0.000 -6.196 -5.137
CPM_perc_Latest -0.3306 0.114 -2.889 0.004 -0.555 -0.106
Adjusted_PAT -0.4958 0.151 -3.274 0.001 -0.793 -0.199
Selling_Cost 0.1272 0.132 0.965 0.335 -0.131 0.386
Other_Income -0.1148 0.108 -1.059 0.289 -0.327 0.098
Total_Debt 0.6774 0.099 6.816 0.000 0.483 0.872

Equity_Paid_Up -0.1542 0.088 -1.751 0.080 -0.327 0.018
Variable "ROG_Gross_Block_perc" has the highest p-value (0.720) and is insignificant, therefore, we
Model_9
In [118]:
l+Total_Debt+ROG_Cost_of_Production_perc+Current_Ratio_Latest+Creditors_Velocity_Days
In [119]:
Iterations 10

In [120]:
model_9.summary()
Out[120]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6660 0.270 -20.975 0.000 -6.195 -5.137
CPM_perc_Latest -0.3274 0.114 -2.867 0.004 -0.551 -0.104
Adjusted_PAT -0.4961 0.152 -3.274 0.001 -0.793 -0.199
Selling_Cost 0.1310 0.131 0.997 0.319 -0.127 0.389
Other_Income -0.1142 0.108 -1.054 0.292 -0.327 0.098
Total_Debt 0.6770 0.099 6.815 0.000 0.482 0.872
Cash_Flow_From_Investing_Activities 0.1864 0.095 1.961 0.050 6.43e-05 0.373
Equity_Paid_Up -0.1551 0.088 -1.762 0.078 -0.328 0.017

Variable "ROG_Market_Capitalisation_perc" has the highest p-value (0.697) and is insignificant,

Model_10
In [121]:
ncome+ Net_Working_Capital+Total_Debt+ROG_Cost_of_Production_perc+Current_Ratio_Lates
In [122]:
Iterations 10

In [123]:
model_10.summary()
Out[123]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6620 0.270 -20.985 0.000 -6.191 -5.133
CPM_perc_Latest -0.3280 0.114 -2.870 0.004 -0.552 -0.104
Adjusted_PAT -0.4969 0.151 -3.284 0.001 -0.793 -0.200
Selling_Cost 0.1248 0.131 0.956 0.339 -0.131 0.381
Other_Income -0.1151 0.108 -1.062 0.288 -0.327 0.097
Total_Debt 0.6741 0.099 6.808 0.000 0.480 0.868
Equity_Paid_Up -0.1569 0.088 -1.785 0.074 -0.329 0.015

Variable "Fixed_Assets_Ratio_Latest" has the highest p-value (0.656) and is insignificant, therefore, we
Model_11
In [124]:
nover_Ratio_Latest+CPM_perc_Latest+Value_of_Output_by_Gross_Block+ Adjusted_PAT+ROG_C
In [125]:
Iterations 10

In [126]:
model_11.summary()
Out[126]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6618 0.270 -20.970 0.000 -6.191 -5.133
CPM_perc_Latest -0.3254 0.114 -2.852 0.004 -0.549 -0.102
Adjusted_PAT -0.4960 0.151 -3.276 0.001 -0.793 -0.199
Selling_Cost 0.1241 0.131 0.950 0.342 -0.132 0.380
Other_Income -0.1147 0.108 -1.058 0.290 -0.327 0.098
Total_Debt 0.6755 0.099 6.831 0.000 0.482 0.869
Equity_Paid_Up -0.1567 0.088 -1.783 0.075 -0.329 0.016

Variable "Inventory_Ratio_Latest" has the highest p-value (0.528) and is insignificant, therefore, we need
to eliminate it.
Model_12
In [127]:
Interest_Cover_Ratio_Latest+Selling_Cost+ROG_Total_Assets_perc+Debtors_Ratio_Latest+O
In [128]:
Iterations 10

In [129]:
model_12.summary()
Out[129]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6604 0.270 -20.975 0.000 -6.189 -5.131
CPM_perc_Latest -0.3288 0.114 -2.881 0.004 -0.553 -0.105
Adjusted_PAT -0.4992 0.151 -3.299 0.001 -0.796 -0.203
Selling_Cost 0.1183 0.131 0.905 0.365 -0.138 0.374
Debtors_Ratio_Latest -0.2333 0.113 -2.069 0.039 -0.454 -0.012
Other_Income -0.1198 0.108 -1.108 0.268 -0.332 0.092
Total_Debt 0.6698 0.098 6.808 0.000 0.477 0.863
Equity_Paid_Up -0.1582 0.088 -1.802 0.072 -0.330 0.014

Variable "Selling_Cost" has the highest p-value (0.365) and is insignificant, therefore, we need to
eliminate it.
Model_13
In [130]:
f_13='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total
In [131]:
Iterations 10

In [132]:
model_13.summary()
Out[132]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6558 0.270 -20.942 0.000 -6.185 -5.126
CPM_perc_Latest -0.3304 0.114 -2.894 0.004 -0.554 -0.107
Adjusted_PAT -0.4753 0.149 -3.186 0.001 -0.768 -0.183
Other_Income -0.0877 0.102 -0.857 0.391 -0.288 0.113
Total_Debt 0.6717 0.098 6.832 0.000 0.479 0.864
Equity_Paid_Up -0.1569 0.088 -1.790 0.073 -0.329 0.015

Variable "Other_Income" has the highest p-value (0.391) and is insignificant, therefore, we need to
eliminate it.
Model_15
In [133]:
s+Total_Asset_Turnover_Ratio_Latest+CPM_perc_Latest+Value_of_Output_by_Gross_Block+ A
In [134]:
Iterations 10

In [135]:
model_15.summary()
Out[135]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6461 0.269 -20.990 0.000 -6.173 -5.119
CPM_perc_Latest -0.3289 0.114 -2.887 0.004 -0.552 -0.106
Adjusted_PAT -0.4977 0.147 -3.388 0.001 -0.786 -0.210
Total_Debt 0.6558 0.096 6.804 0.000 0.467 0.845
Equity_Paid_Up -0.1541 0.088 -1.759 0.079 -0.326 0.018

Variable "Total_Asset_Turnover_Ratio_Latest" has the highest p-value (0.379) and is insignificant,

Model_16
In [136]:
'default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+Value_of_Output_by_Total_Asse
In [137]:
Iterations 10

In [138]:
model_16.summary()
Out[138]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6564 0.269 -21.017 0.000 -6.184 -5.129
CPM_perc_Latest -0.3329 0.114 -2.927 0.003 -0.556 -0.110
Adjusted_PAT -0.5025 0.147 -3.420 0.001 -0.791 -0.215
Total_Debt 0.6586 0.096 6.843 0.000 0.470 0.847
Cash_Flow_From_Investing_Activities 0.1832 0.094 1.951 0.051 -0.001 0.367
Equity_Paid_Up -0.1526 0.087 -1.745 0.081 -0.324 0.019
Variable "Creditors Velocity Days" has the highest p-value (0.360) and is insignificant, therefore, we
_ y_ y g p ( ) g , ,
Model_17
In [139]:
In [140]:
Iterations 10

In [141]:
model_17.summary()
Out[141]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6456 0.268 -21.070 0.000 -6.171 -5.120
CPM_perc_Latest -0.3309 0.114 -2.915 0.004 -0.553 -0.108
Adjusted_PAT -0.4986 0.147 -3.389 0.001 -0.787 -0.210
Total_Debt 0.6715 0.095 7.050 0.000 0.485 0.858
Equity_Paid_Up -0.1542 0.088 -1.762 0.078 -0.326 0.017
Variable "Equity_Paid_Up" has the highest p-value (0.078) and is insignificant, therefore, we need to
eliminate it.

Model_18
In [142]:
In [143]:
Iterations 10

In [144]:
model_18.summary()
Out[144]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6166 0.266 -21.127 0.000 -6.138 -5.096
CPM_perc_Latest -0.3348 0.113 -2.958 0.003 -0.557 -0.113
Adjusted_PAT -0.4995 0.147 -3.391 0.001 -0.788 -0.211
Total_Debt 0.6591 0.095 6.973 0.000 0.474 0.844
Variable "ROG_Net_Worth_perc" has the highest p-value (0.089) and is insignificant, therefore, we need
to eliminate it.
Model 19
Model_19
In [145]:
t_by_Total_Assets+CPM_perc_Latest+Value_of_Output_by_Gross_Block+ Adjusted_PAT+ROG_C
In [146]:
Iterations 10

In [147]:
model_19.summary()
Out[147]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6353 0.267 -21.079 0.000 -6.159 -5.111
CPM_perc_Latest -0.3425 0.113 -3.041 0.002 -0.563 -0.122
Adjusted_PAT -0.5869 0.139 -4.225 0.000 -0.859 -0.315
Total_Debt 0.6640 0.094 7.052 0.000 0.479 0.849
Variable "Cash_Flow_From_Investing_Activities" has the highest p-value (0.052) and is insignificant,

Model_21
In [148]:
ver_Ratio_Latest+ROG_Total_Assets_perc+Debtors_Ratio_Latest+Net_Working_Capital+Tota
In [149]:
Iterations 10

In [150]:
model_21.summary()
Out[150]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6487 0.268 -21.056 0.000 -6.175 -5.123
CPM_perc_Latest -0.3525 0.112 -3.150 0.002 -0.572 -0.133
Adjusted_PAT -0.5701 0.138 -4.127 0.000 -0.841 -0.299
ROG_Capital_Employed_perc 0.2259 0.117 1.933 0.053 -0.003 0.455
Total_Debt 0.6544 0.094 6.978 0.000 0.471 0.838
Variable "ROG_Total_Assets_perc" has the highest p-value (0.065) and is insignificant, therefore, we
Model_22

In [151]:
In [152]:
Iterations 10

In [153]:
model_22.summary()
Out[153]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6362 0.267 -21.106 0.000 -6.160 -5.113
Value_of_Output_by_Total_Assets 0.2341 0.124 1.883 0.060 -0.010 0.478
CPM_perc_Latest -0.3627 0.111 -3.259 0.001 -0.581 -0.145
Adjusted_PAT -0.5876 0.137 -4.276 0.000 -0.857 -0.318
ROG_Capital_Employed_perc 0.1159 0.100 1.159 0.246 -0.080 0.312
Total_Debt 0.6533 0.094 6.977 0.000 0.470 0.837
Variable "ROG_Capital_Employed_perc" has the highest p-value (0.246) and is insignificant, therefore,
we need to eliminate it.
Model_23
In [154]:

In [155]:
Iterations 10
In [156]:
model_23.summary()
Out[156]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.6167 0.265 -21.160 0.000 -6.137 -5.096
Value_of_Output_by_Total_Assets 0.2354 0.124 1.896 0.058 -0.008 0.479
CPM_perc_Latest -0.3613 0.111 -3.244 0.001 -0.580 -0.143
Adjusted_PAT -0.5518 0.133 -4.136 0.000 -0.813 -0.290
Total_Debt 0.6546 0.094 6.991 0.000 0.471 0.838
Variable "Value_of_Output_by_Total_Assets" has the highest p-value (0.058) and is insignificant,


Model_24
In [157]:
f_24='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+CPM_perc_Latest+Value_of
In [158]:
Iterations 10
In [159]:
model_24.summary()
Out[159]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.5890 0.264 -21.132 0.000 -6.107 -5.071
CPM_perc_Latest -0.3612 0.111 -3.256 0.001 -0.579 -0.144
Adjusted_PAT -0.5471 0.133 -4.108 0.000 -0.808 -0.286
Total_Debt 0.6619 0.093 7.092 0.000 0.479 0.845

Variable "Debtors_Ratio_Latest" has the highest p-value (0.165) and is insignificant, therefore, we need
to eliminate it.
Model_25
In [160]:
f_25='default~Book_Value_Adj_Unit_Curr+Book_Value_Unit_Curr+CPM_perc_Latest+Value_of
In [161]:
Iterations 10

In [162]:
model_25.summary()
Out[162]:
coef std err z P>|z| [0.025 0.975]
Intercept -5.5826 0.264 -21.167 0.000 -6.099 -5.066
CPM_perc_Latest -0.3632 0.111 -3.283 0.001 -0.580 -0.146
Adjusted_PAT -0.5628 0.133 -4.238 0.000 -0.823 -0.303
Total_Debt 0.6412 0.092 6.982 0.000 0.461 0.821
Now all the variables are significant, therefore, we don't need to eliminate any variable.Therefore, after
many such iterations below variables were removed :
ROG_PBIT_perc, PBDTM_perc_Latest, Cash_Flow_From_Operating_Activities, Inventory_Velocity_Days,

Debtors_Velocity_Days, Cash_Flow_From_Financing_Activities, ROG_CP_perc, ROG_Gross_Block_perc,
ROG_Market_Capitalisation_perc, Fixed_Assets_Ratio_Latest, Inventory_Ratio_Latest, Selling_Cost,
Other_Income, Total_Asset_Turnover_Ratio_Latest, Creditors_Velocity_Days, Equity_Paid_Up,
ROG_Net_Worth_perc, Cash_Flow_From_Investing_Activities, ROG_Total_Assets_perc,
ROG_Capital_Employed_perc, Value_of_Output_by_Total_Assets, Debtors_Ratio_Latest
Variables used for Statistical Modelling are :

Book_Value_Adj_Unit_Curr, Book_Value_Unit_Curr, CPM_perc_Latest, Value_of_Output_by_Gross_Block,

Adjusted_PAT, Interest_Cover_Ratio_Latest, Net_Working_Capital, Total_Debt, ROG_Cost_of_Production_perc
and Current_Ratio_Latest.
1.7 Validate the Model on Test Dataset and state the performance
matrices. Also state interpretation from the model
Now we will look at the predicted probability values.
Prediction on the Data Model:
In [172]:
y_prob_pred_train = model_25.predict(Company_train)
pd.DataFrame(y_prob_pred_train).head()
Out[172]:
662 0.000
1373 0.001
3268 0.003
3246 0.002
1456 0.003
In [173]:
y_prob_pred_test = model_25.predict(Company_test)
pd.DataFrame(y_prob_pred_test).head()
...
Let us now see the predicted classes on Train Data.
In [174]:
y_class_pred=[]
for i in range(0,len(y_prob_pred_train)):
if np.array(y_prob_pred_train)[i]>0.5:
a=1
else:
a=0
y_class_pred.append(a)
Model Evaluation on the Training Data

Let us now check the confusion matrix and the classification report followed by the AUC and the AUC-ROC
curve.

In [178]:
sns.heatmap((metrics.confusion_matrix(Company_train['default'],y_class_pred)),annot=
,cmap='Blues');
plt.xlabel('Predicted Label');
plt.ylabel('Actual Label',rotation=90);
plt.title('Figure: Confusion Matrix of Train Data');
In [179]:
print(metrics.classification_report(Company_train['default'],y_class_pred,digits=3))
precision recall f1-score support
0 0.970 0.980 0.975 2176
1 0.785 0.712 0.747 226
accuracy 0.955 2402
macro avg 0.878 0.846 0.861 2402
weighted avg 0.953 0.955 0.954 2402
Overall 95% of correct predictions to total predictions were made by the model
92% of those defaulted were correctly identified as defaulters by the model
Now, let us see the predicted probability values on test dataset

In [180]:
y_prob_pred_test = model_25.predict(Company_test)
pd.DataFrame(y_prob_pred_test).head()
Out[180]:
3163 0.001
3133 0.000
937 0.159
196 0.764
2852 0.000
Let us now see the predicted classes on Test Data.
In [181]:
y_class_pred=[]
for i in range(0,len(y_prob_pred_test)):
if np.array(y_prob_pred_test)[i]>0.5:
a=1
else:
a=0
y_class_pred.append(a)
Model Evaluation on the Test Data

Let us now check the confusion matrix and the classification report followed by the AUC and the AUC-ROC
curve.

In [182]:
sns.heatmap((metrics.confusion_matrix(Company_test['default'],y_class_pred)),annot=T
,cmap='Blues');
plt.xlabel('Predicted Label');
plt.ylabel('Actual Label',rotation=90);
plt.title('Figure: Confusion Matrix of Test Data');
In [183]:
print(metrics.classification_report(Company_test['default'],y_class_pred,digits=3))
precision recall f1-score support
0 0.974 0.974 0.974 1049
1 0.800 0.800 0.800 135
accuracy 0.954 1184
macro avg 0.887 0.887 0.887 1184
weighted avg 0.954 0.954 0.954 1184
Overall 97% of correct predictions to total predictions were made by the model
91% of those defaulted were correctly identified as defaulters by the model
Some interpretation of the model:
1) Of many variables – significantly only 6 variables contribute to the company being predicted as default or not
from logistic regression point of view.
2) The model is likely to predict the 86% companies that could default correctly.
3) Which means only in 14% cases – it could happen that a company that is predicted as defaulter may not be
a defaulter but form an investor point of view – it is ok to no invest money on company that could likely not
default.
4) The precision is a bit less in this model – however still 68% times, the model will predict the defaulter
company correctly.
localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.i… 100/102

In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:

In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:

Project FRA Milestone1 JPY Nikita Chaturvedi 05.05.2022 Jupyter Notebook PDF

Uploaded by

Copyright:

Available Formats

Project FRA Milestone1 JPY Nikita Chaturvedi 05.05.2022 Jupyter Notebook PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Project FRA Milestone1 JPY Nikita Chaturvedi 05.05.2022 Jupyter Notebook PDF

Uploaded by

Copyright:

Available Formats

06/02/2022, 17:52 Project_FRA_Milestone1_Nikita Chaturvedi_05.05.

2022 - Jupyter Notebook

# Importing the libraries

Data Ingestion (Read Dataset):

Company = pd.read_csv('FRA Milestone 1.csv')

localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ipynb 1/102

27.48 -1,007.24 5,936.03 474.3 -1,076.34 40.5 ... 0 0 0 0 0

19.39 -1,824.75 694.64 0.02 -1,843.74 0 ... 0 0 0 0 0

3580 2806 Infosys 61082.00 574 48,068 48,098 0 12,869 28,721

Fixing Messy Column Names (containing spaces):

Checking Top 10 Rows Again :

localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ipynb 2/102

Co_Code Co_Name Networth_Next_Year Equity_Paid_Up Networth Capital_Employed Tota

0 16974 Hind.Cables -8021.60 419.36 -7,027.48 -1,007.24 5

3 2439 GTL -3054.51 157.3 -623.49 2,353.88 2

5 2484 Usha Ispat -2519.40 179.35 -2,519.39 -1,824.75

7 3226 K S Oils -2100.56 45.92 -1,945.45 979.13 2

9 2334 ITI -1677.18 288 -1,947.85 86.35 1

localhost:8888/notebooks/Downloads/Financial Risk Analytics (FRA)/Project FRA Milestone 1/Project_FRA_Milestone1_Nikita Chaturvedi_05.05.2022.ipynb 3/102

RangeIndex: 3586 entries, 0 to 3585

Data columns (total 67 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Co_Code 3586 non-null int64

1 Co_Name 3586 non-null object

2 Networth_Next_Year 3586 non-null float64

3 Equity_Paid_Up 3586 non-null object

4 Networth 3586 non-null object

5 Capital_Employed 3586 non-null object

6 Total_Debt 3586 non-null object

7 Gross_Block 3586 non-null object

8 Net_Working_Capital 3586 non-null object

9 Current_Assets 3586 non-null object

10 Current_Liabilities_and_Provisions 3586 non-null object

11 Total_Assets_by_Liabilities 3586 non-null object

12 Gross_Sales 3586 non-null object

13 Net_Sales 3586 non-null object

14 Other_Income 3586 non-null object

15 Value_Of_Output 3586 non-null object

16 Cost_of_Production 3586 non-null object

17 Selling_Cost 3586 non-null object

18 PBIDT 3586 non-null object

19 PBDT 3586 non-null object

20 PBIT 3586 non-null object

21 PBT 3586 non-null object

22 PAT 3586 non-null object

23 Adjusted_PAT 3586 non-null object

24 CP 3586 non-null object

25 Revenue_earnings_in_forex 3586 non-null object

26 Revenue_expenses_in_forex 3586 non-null object

27 Capital_expenses_in_forex 3586 non-null object

28 Book_Value_Unit_Curr 3586 non-null object

29 Book_Value_Adj_Unit_Curr 3582 non-null object

30 Market_Capitalisation 3586 non-null object

31 CEPS_annualised_Unit_Curr 3586 non-null object

32 Cash_Flow_From_Operating_Activities 3586 non-null object

33 Cash_Flow_From_Investing_Activities 3586 non-null object

34 Cash_Flow_From_Financing_Activities 3586 non-null object

35 ROG_Net_Worth_perc 3586 non-null object