Homework 10: Statistics


Part 1

1. A biologist measured the sepal lengths of two samples of the
flowers Iris Virginica and Iris Setosa. The data set is attached.
Conduct an independent-samples t-test to compare the average
sepal length of Iris Virginica and Iris Setosa. Show the relevant
output. Write your results in APA format.
IrisVirginicSetosa.xls
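
As a rough illustration of what the software computes for an independent-samples t-test, here is a minimal Python sketch using only the standard library. It assumes equal variances (the pooled, Student form; Welch's version is the usual alternative when variances differ). The function name `pooled_t_test` and the sample values are hypothetical placeholders, not the contents of IrisVirginicSetosa.xls, and Python is used here only for illustration since the assignment asks for Excel or R. The p-value still has to come from a t table or from software.

```python
import math
import statistics


def pooled_t_test(sample_a, sample_b):
    """Student's independent-samples t statistic (equal variances assumed).

    Returns (t, degrees of freedom, Cohen's d).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)

    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

    t_stat = (mean_a - mean_b) / std_err
    cohens_d = (mean_a - mean_b) / math.sqrt(pooled_var)
    return t_stat, df, cohens_d


# Placeholder values for illustration only; replace with the real data.
virginica = [6.4, 6.9, 6.1, 7.0, 6.6]
setosa = [5.0, 4.8, 5.2, 4.9, 5.1]

t_stat, df, d = pooled_t_test(virginica, setosa)
print(f"t({df}) = {t_stat:.2f}, d = {d:.2f}")
```

An APA-style sentence would then report the two group means and standard deviations together with t(df), the p-value, and an effect size such as Cohen's d.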
Part 2
• One-Way ANOVA
Use Excel or R.
S.W. Lagakos and F. Mosteller of Harvard University fed mice
different doses of red dye number 40 and recorded the time of
death in weeks. Results for female mice, dosage and time of
death are shown in the data below.

Control   Low Dosage   Medium Dosage   High Dosage
70        49           30              34
77        60           37              36
83        63           56              48
87        67           65              48
92        70           76              65
93        74           83              91
100       77           87              98
102       80           90              102
102       89           94
103                    97
96
1. Obtain summary statistics for the time of death in the four
groups. Comment on what differences you observe in the
mean survival times.
2. Obtain side-by-side boxplots of the survival times for the
four groups. Comment on your results.
3. Conduct an Analysis of Variance (ANOVA) to investigate if
the mean survival time is different across dosages. Show
the ANOVA table resulting from your analysis. Write your
conclusions in APA format. In case a post hoc analysis is
necessary, conduct it and write your results in APA format.
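
The summary statistics and the F statistic asked for above can be cross-checked by hand. The following Python sketch (standard library only; Python is an assumption, since the assignment names Excel or R) computes group sizes, means, standard deviations, and the one-way ANOVA sums of squares. The group lists follow the data table, assuming that in the short rows the missing entries belong to the groups that had already ended (Control n = 11, Low n = 9, Medium n = 10, High n = 8). The function name `one_way_anova` is hypothetical.

```python
import statistics

# Survival times in weeks; column assignment of the short rows is an assumption.
groups = {
    "Control": [70, 77, 83, 87, 92, 93, 100, 102, 102, 103, 96],
    "Low":     [49, 60, 63, 67, 70, 74, 77, 80, 89],
    "Medium":  [30, 37, 56, 65, 76, 83, 87, 90, 94, 97],
    "High":    [34, 36, 48, 48, 65, 91, 98, 102],
}


def one_way_anova(groups):
    """Return (SS_between, df_between, SS_within, df_within, F)."""
    values = [v for g in groups.values() for v in g]
    grand_mean = statistics.mean(values)

    # Between-groups: squared distance of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-groups: squared distance of each value from its own group mean.
    ss_within = sum(
        (v - statistics.mean(g)) ** 2 for g in groups.values() for v in g
    )

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_stat


for name, g in groups.items():
    print(f"{name:8s} n={len(g):2d} mean={statistics.mean(g):6.2f} "
          f"sd={statistics.stdev(g):6.2f}")

ss_b, df_b, ss_w, df_w, f_stat = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The p-value for F and any post hoc comparison (for example Tukey's HSD) are left to Excel or R, since the standard library has no F distribution.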
Part 3

For all the exercises in the attached pdf, do the following using
Excel or R:
a) Obtain a scatterplot.
b) Compute the correlation coefficient and test for its
significance. Report your results using APA style.
c) In the case where the correlation is statistically significant, fit a
linear model and report the equation. Interpret both the slope
and the R-squared of the model in terms of the problem.
HWcorrelation.pdf
HOMEWORK: CORRELATION

(Problem 1) The data below shows the reading and math scores of 12 students.
Calculate the correlation coefficient.

Reading Scores   Math Scores
1                4
1                7
2                3
3                8
3                5
4                7
5                9
6                4
6                8
7                10
8                10
8                9

∑X = 54; ∑Y = 84; ∑XY = 423; ∑X² = 314; ∑Y² = 654

(Problem 2) The data below shows the hourly earnings (tips included) of 10
employees at a bar and their attractiveness scores (0 = not at all
attractive … 10 = extremely attractive). Is there a correlation between
attractiveness and hourly earnings?

Attractiveness Score   Hourly Earnings
0                      $20
1                      24
2                      25
3                      26
4                      20
5                      30
6                      32
7                      38
8                      34
9                      40

∑X = 45; ∑Y = 289; ∑XY = 1472; ∑X² = 285; ∑Y² = 8801

(Problem 3) The data below shows the score on a promotion test given to
police officers and the number of hours studied. Calculate the correlation
coefficient.

Hours Studied   Score on promotion test
0               0
1               0
2               1
3               4
4               5
6               6
8               8
16              8

∑X = 40; ∑Y = 32; ∑XY = 262; ∑X² = 386; ∑Y² = 206

(Problem 4) The data below shows the height of students and the overall
high school average. What is the correlation coefficient?

Height in Inches   High School Average
73                 100
79                 95
62                 90
69                 80
74                 70
77                 65
81                 60
63                 40
68                 30
74                 20

∑X = 720; ∑Y = 650; ∑XY = 46,990; ∑X² = 52,210; ∑Y² = 49,150

(Problem 5) The data below shows the number of pounds overweight and hourly
wage of 10 employees working as secretaries in a law firm. Calculate the
correlation coefficient.

Pounds Overweight   Hourly Wage
50                  $12
30                  14
20                  15
20                  13
18                  15
13                  14
10                  20
4                   19
0                   22
0                   25

∑X = 165; ∑Y = 169; ∑XY = 2308; ∑X² = 4809; ∑Y² = 3025
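
Because each problem supplies the sums ∑X, ∑Y, ∑XY, ∑X², and ∑Y², the correlation coefficient, its t statistic, and the least-squares line can all be obtained from the computational formulas. The sketch below is one way to do this in Python with the standard library (Python is an assumption; the assignment names Excel or R), using the sums given for Problem 1. The function name `correlation_from_sums` is hypothetical, and the p-value for t with n - 2 degrees of freedom still has to be looked up in a table or software.

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r, its t statistic, and the regression line from summary sums."""
    sxy = n * sum_xy - sum_x * sum_y      # n*SUM(XY) - SUM(X)*SUM(Y)
    sxx = n * sum_x2 - sum_x ** 2         # n*SUM(X^2) - (SUM X)^2
    syy = n * sum_y2 - sum_y ** 2         # n*SUM(Y^2) - (SUM Y)^2

    r = sxy / math.sqrt(sxx * syy)
    # Test of H0: rho = 0, with df = n - 2.
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

    slope = sxy / sxx
    intercept = (sum_y - slope * sum_x) / n
    return {
        "r": r,
        "t": t_stat,
        "df": n - 2,
        "slope": slope,
        "intercept": intercept,
        "r_squared": r ** 2,
    }


# Problem 1: reading (X) and math (Y) scores of 12 students.
result = correlation_from_sums(
    n=12, sum_x=54, sum_y=84, sum_xy=423, sum_x2=314, sum_y2=654
)
for key, value in result.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

The same call works for Problems 2 to 5 by substituting their n and sums.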
Part 4

See attached file.
Computer Project Multiple Linear Regression .docx
Part 5

See attached pdf. Do the exercise both by using the formulas
learned and with software (Excel or R).
Part 6

6.1. A researcher collected data in a project to predict the annual
growth per acre of upland boreal forests in southern Canada.
They hypothesized that cubic foot volume growth (y) is a function
of stand basal area per acre (x1), the percentage of that basal
area in black spruce (x2), and the stand's site index for black
spruce (x3).
The observed data for cubic feet, stand basal area, percent
basal area in black spruce, and site index are shown below.

a) Obtain a scatterplot matrix and comment on the graphs.
b) Obtain the correlation matrix between the variables and
comment on it.
c) Run simple linear regression models on cubic foot volume
growth (y) vs stand basal area per acre (x1), the percentage of
that basal area in black spruce (x2), and the stand's site index for
black spruce (x3). Comment on the results (whether the models
are significant, and the interpretation of the slope, intercept,
and R-squared for each of the models).
d) Run a multiple linear regression model on cubic foot volume
growth (y) vs stand basal area per acre (x1), the percentage of
that basal area in black spruce (x2), and the stand's site index for
black spruce (x3). Comment on the overall significance of the
model, and on which variables are significant predictors and which
are not. Interpret the coefficients of the model, as well as
R-squared. If some of the variables are not significant predictors,
refit the multiple linear regression without them and again comment
on the overall significance of the model. Interpret the coefficients
of the model, as well as R-squared.
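
To show what the software does when it fits the multiple regression in part d), here is a minimal Python sketch (standard library only; Python is an assumption, since the assignment names Excel or R) that solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination and computes R-squared. The function names `fit_ols` and `r_squared` are hypothetical, and the numbers are synthetic values generated from y = 1 + 2·x1 + 3·x2, not the forest data. It does not produce standard errors or p-values, which should come from Excel or R.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    # Augmented system (X'X | X'y).
    A = [
        [sum(xi[i] * xi[j] for xi in X) for j in range(p)]
        + [sum(xi[i] * yi for xi, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def r_squared(rows, y, coefs):
    """Proportion of variance in y explained by the fitted model."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic predictors (x1, x2) and response; NOT the assignment's data.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefs = fit_ols(rows, y)
print("coefficients:", [round(b, 6) for b in coefs])
print("R-squared:", round(r_squared(rows, y, coefs), 6))
```

Refitting without a non-significant predictor amounts to dropping that column from `rows` and calling `fit_ols` again.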

6.2. You are a data analyst working for a company that is
interested in understanding the factors that influence employee
salaries. You have been provided with a dataset that contains
information on employee salaries, years of experience, education
level (0 = no Bachelor's degree, 1 = person has a Bachelor's degree
or higher), and whether the employee holds a management
position (0 = No, 1 = Yes).
You decided to perform a multiple linear regression analysis to
explore how these variables are related to employee salaries.
Below is the output of the regression analysis:
Multiple Linear Regression Output:
===========================================================================
Variable              Coefficient   Standard Error   t-statistic   P-value
===========================================================================
Intercept               30,000.00         2,500.00         12.00   <0.001***
Years_of_Experience      2,500.00           500.00          5.00   <0.01**
Education_Level          5,000.00         1,000.00          5.00   <0.01**
Management_Position     10,000.00         2,000.00          5.00   <0.01**
===========================================================================
Multiple R-squared: 0.75
Adjusted R-squared: 0.72
F-statistic: 23.45 (p-value <0.001***)
Residual Standard Error: 3,000.00
===========================================================================
Questions:
1. Overall Significance of the Model:
   a. Based on the F-statistic and its associated p-value, is the
   multiple linear regression model statistically significant?
   Explain why or why not.
2. Significance of the Variables:
   a. Are any of the independent variables (Years_of_Experience,
   Education_Level, and Management_Position) statistically
   significant predictors? Refer to the p-values associated with
   each variable and explain.
3. Interpretation of Coefficients:
   a. What is the interpretation of the coefficient for
   Years_of_Experience? How does a one-unit increase in years of
   experience affect an employee's salary while holding other
   variables constant?
   b. What is the interpretation of the coefficient for
   Education_Level? How does having a Bachelor's degree influence
   an employee's salary while holding other variables constant?
   c. What is the interpretation of the coefficient for
   Management_Position? How does holding a management position
   impact an employee's salary while holding other variables
   constant?
4. R-Squared Value:
   a. What does the R-squared value (0.75) tell us about the
   goodness of fit of the model? How well does this model explain
   the variance in employee salaries?
5. Conclusions:
   a. Based on the analysis, what conclusions can you draw about
   the factors that influence employee salaries in this dataset?
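
One way to make the coefficient interpretations concrete is to plug values into the fitted equation from the output: Salary = 30,000 + 2,500·Years_of_Experience + 5,000·Education_Level + 10,000·Management_Position. The short Python sketch below (standard library only; the function name `predict_salary` and the example employee are hypothetical) evaluates that equation and recomputes each t-statistic as coefficient divided by standard error, using only the numbers printed in the regression output.

```python
# Coefficients and standard errors copied from the regression output.
coef = {
    "Intercept": 30000.0,
    "Years_of_Experience": 2500.0,
    "Education_Level": 5000.0,
    "Management_Position": 10000.0,
}
std_err = {
    "Intercept": 2500.0,
    "Years_of_Experience": 500.0,
    "Education_Level": 1000.0,
    "Management_Position": 2000.0,
}


def predict_salary(years, bachelor, manager):
    """Predicted salary; bachelor and manager are 0/1 indicator values."""
    return (
        coef["Intercept"]
        + coef["Years_of_Experience"] * years
        + coef["Education_Level"] * bachelor
        + coef["Management_Position"] * manager
    )


# Each t-statistic is the coefficient divided by its standard error.
t_stats = {name: coef[name] / std_err[name] for name in coef}

# Hypothetical employee: 10 years, Bachelor's degree, management position.
print(predict_salary(years=10, bachelor=1, manager=1))
# Effect of one extra year with everything else held constant.
print(predict_salary(11, 1, 1) - predict_salary(10, 1, 1))
print(t_stats)
```

The difference between two predictions that vary in a single predictor equals that predictor's coefficient, which is exactly the "holding other variables constant" interpretation asked for in question 3.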