Data Analytics Assignment 1


Assignment – I

BFT - 6
(Deepening Specialisation 2: Apparel Production Management)

Name of the Subject : Data Analytics & R
Name : Radhika Chandak
Subject Code : BFT603DS2
Roll No. : BFT/19/21
Subject Id : 15250
Date of Submission : 27.04.2022

Assignment:
Using data collection methods and applying principles of statistics carry out the following:
Identify problems faced in industrial engineering and collect appropriate data.
Use any one of the following methods in Principle of Forecasting:
Time Series
Solution:

We start by installing covid19.analytics, an R package that provides COVID-19 case data.

To begin, we extract the time series of the confirmed cases and then the death cases.
The code is as follows:

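# install.packages('covid19.analytics')  # run once if the package is not installed
library(covid19.analytics)

# aggregated data
ag <- covid19.data(case = 'aggregated')

# time series of confirmed cases
tsc <- covid19.data(case = 'ts-confirmed')

# summary (set graphical.output = TRUE to also draw the summary charts)
report.summary(Nentries = 10, graphical.output = FALSE)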

- The graphs and charts appear on the right side under Plots; upon zooming we observe:
● The dates range from January 2020 to April 2022, and the summary covers the top 10 countries.
● The pie chart and bar graph show the countries by confirmed cases and death cases respectively.
● While the US has the highest number of confirmed cases among these ten countries, Turkey has the lowest.
● For death cases, the US is again the highest, but France is the lowest.

TIME SERIES - CONFIRMED CASES


TIME SERIES - DEATH CASES

**** Time Series Worldwide TOTS ****
ts-confirmed    ts-deaths    ts-recovered
511748975       6228621      0
                1.22%        0%
**** Time Series Worldwide AVGS ****
ts-confirmed    ts-deaths    ts-recovered
1801933.01      21931.76     0
                1.22%        0%
**** Time Series Worldwide SDS ****
ts-confirmed    ts-deaths    ts-recovered
6617130.29      86526.01     0
                1.31%        0%
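The percentages in this output are consistent with each column being expressed relative to the confirmed count. That reading is inferred from the arithmetic and is not stated in the output itself. A quick check in R, using the worldwide totals printed above:

```r
# Worldwide totals copied from the summary output above.
confirmed <- 511748975
deaths    <- 6228621

# Deaths as a percentage of confirmed cases (assumed meaning of the 1.22% figure).
death_pct <- deaths / confirmed * 100
print(round(death_pct, 2))
```

The same ratio applied to the AVGS row (21931.76 / 1801933.01) also rounds to 1.22%.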
- Then we extract the totals per location for our country, India, and for the country with the most cases, i.e., the US.

#total per location


tots.per.location(tsc, geo.loc = c('us' ,'india'))

On running this, the function fits a linear regression model for each location.

● At the top, the y axis shows the number of cases on a log scale and the x axis shows the number of days. Each line on the plot represents a fitted linear regression model. The plot uses cumulative values, and we can see a concave pattern: an increasing trend, followed by a smaller concave segment that shows the trend slowing down.
● At the bottom we have a bar chart, with the y axis again on a log scale.
Similarly, we get the same plots for the US.

LINEAR REGRESSION MODEL - India and Us
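To illustrate why a log scale is paired with a linear regression here, the sketch below fits a straight line to the logarithm of a synthetic cumulative series. The data are hypothetical (exponential growth of 5% per day from 100 cases) and the fit uses base R's lm; it is not the model code used inside tots.per.location.

```r
# Synthetic cumulative counts (hypothetical data, not from covid19.analytics):
# exponential growth of 5% per day starting from 100 cases.
days  <- 1:60
cases <- 100 * exp(0.05 * days)

# Fit a straight line to log(cases) against day.
fit <- lm(log(cases) ~ days)

# The slope estimates the daily growth rate; the intercept estimates log(100).
slope     <- unname(coef(fit)["days"])
intercept <- unname(coef(fit)["(Intercept)"])
print(c(slope = slope, start = exp(intercept)))
```

On real cumulative data the points bend away from the fitted line once growth slows, which is the concave pattern described above.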


- Now, to see the growth rate of a specific country, we can type the following (here for India):

#growth rate
growth.rate(tsc, geo.loc = 'india')

We get two plots. The top plot has two y axes, one on a regular scale and the other on a log scale. From it we can observe that during the second lockdown the cases were increasing more rapidly than before the first lockdown.
The bottom plot shows the growth rate on a log scale.
- Now let us extract one more time series dataset, covering all the case types, and save it into a data frame named tsa.
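As a rough illustration of what a growth rate measures, the sketch below computes daily changes from a cumulative series and then the ratio of each day's change to the previous day's. Both the data and this particular definition are assumptions made for illustration; the source does not state the exact formula that growth.rate uses.

```r
# Hypothetical cumulative confirmed counts for six consecutive days.
cumulative <- c(100, 110, 130, 170, 250, 410)

# Daily changes: new cases reported on each day after the first.
daily_changes <- diff(cumulative)

# Growth rate (assumed definition): today's change divided by yesterday's.
growth_rate <- daily_changes[-1] / daily_changes[-length(daily_changes)]
print(growth_rate)
```

A ratio above 1 means the daily increase is itself increasing, and a ratio below 1 means new cases are slowing.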

tsa<-covid19.data(case = 'ts-ALL')

And then using

#TOTALS PLOT
totals.plt(tsa)

we can create an interactive plot of the time series cases.


In both the linear and the log graph, we can see that there are around 511.79 million confirmed cases, 505.520 million active cases, and so on.
- To see the Covid cases across the globe, we can use the live.map function with the data frame tsa.
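The active figure can be cross-checked against the worldwide totals reported earlier. The sketch assumes that active cases are computed as confirmed minus deaths minus recovered; the source does not state the formula, but the numbers agree with it.

```r
# Worldwide totals from the time series summary earlier in this report.
confirmed <- 511748975
deaths    <- 6228621
recovered <- 0

# Assumed definition: active = confirmed - deaths - recovered.
active <- confirmed - deaths - recovered
print(active / 1e6)  # in millions
```

Because the recovered series is 0 in this dataset, almost every confirmed case is counted as active.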

#live map
live.map(tsa)

By opening the Viewer tab and scrolling to a particular country, we can see its number of cases.
- One model that is popular among researchers working on Covid-19 data is the SIR model. It groups people into 3 categories:

● S - people who are healthy but susceptible to the disease
● I - people who are infected
● R - people who are recovered
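In its standard textbook form, the SIR model moves people from S to I and from I to R according to the equations below. The symbols N (total population), β (transmission rate) and γ (recovery rate) are introduced here for illustration; the source does not say which variant or parameter names the package uses.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \qquad S + I + R = N .
\end{aligned}
```

The three rates sum to zero, so the total population N stays constant over time.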

We use the function generate.SIR.model:

#sir model
generate.SIR.model(tsc, 'india',tot.population = 1383000000)

At the top we have two plots:

● On the left, the y axis represents the number of infected people on a regular scale and the x axis represents the number of days, for the first 25 days.
● On the right, the y axis represents the number of infected people on a log scale and the x axis represents the number of days, for the first 25 days.
● At the bottom we have the number of subjects on a log scale. The 3 lines correspond to the 3 groups of the model: blue shows susceptible people, red shows infected people, and green shows recovered people.
● We can observe that by approximately day 90 the number of infected people reaches its peak, and the number of recovered people also reaches its peak.
This is a screenshot of the code.
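To show how the three SIR curves arise, here is a minimal base R sketch that integrates the standard SIR equations with small fixed Euler steps. The function simulate_sir, the values of beta and gamma, and the single initial infection are all hypothetical choices for illustration. This is not the fitting procedure of generate.SIR.model, which estimates its parameters from the data.

```r
# Minimal SIR simulation with fixed-step Euler integration (illustrative only).
# beta = transmission rate, gamma = recovery rate; both values are hypothetical.
simulate_sir <- function(N, I0, beta, gamma, days, dt = 0.1) {
  steps_per_day <- round(1 / dt)
  S <- N - I0
  I <- I0
  R <- 0
  out <- matrix(NA_real_, nrow = days + 1, ncol = 4,
                dimnames = list(NULL, c("day", "S", "I", "R")))
  out[1, ] <- c(0, S, I, R)
  for (d in seq_len(days)) {
    for (k in seq_len(steps_per_day)) {
      new_infections <- beta * S * I / N * dt  # flow from S to I
      new_recoveries <- gamma * I * dt         # flow from I to R
      S <- S - new_infections
      I <- I + new_infections - new_recoveries
      R <- R + new_recoveries
    }
    out[d + 1, ] <- c(d, S, I, R)
  }
  as.data.frame(out)
}

# Same total population as in the generate.SIR.model call above.
sir <- simulate_sir(N = 1383000000, I0 = 1, beta = 0.4, gamma = 0.1, days = 200)

# Day on which the simulated number of infected people is largest.
peak_day <- sir$day[which.max(sir$I)]
print(peak_day)
```

Plotting S, I and R from sir against day on a log scale gives curves of the same general shape as the bottom panel described above, with the infected curve rising to a peak and then falling as people move into the recovered group.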
