Excel Summary Doc For STA1000 Ammaar Salasa 2023
Contents
STA1000 Microsoft Excel Summary
STA1000 useful Microsoft Excel Formulae and Functions
General keyboard shortcuts
Jargon and Terminology
Notation and Conventions
Checking if a built-in function exists
Excel for STA1000
Section: Probability
Note on Conditional probability
Note on graphical representation of data
Section: Measures of location, central tendency and spread
Section: Probability Distributions
STA1000 useful Microsoft Excel Formulae and Functions
The goal of this document is to demystify Microsoft Excel for students taking
introductory statistics courses. It summarises some useful ways to use Excel and aims
to build enough confidence that Excel becomes a practical tool for your statistics work.
Note that this summary is not comprehensive: many other useful functions and formulae
exist. It also does not cover macros or VBA.
Observe that each individual rectangle is a cell; the cell highlighted in green is in row 4
and column B (cell B4). The highlighted cell contains SUM, a built-in function used to add
numbers. We will soon see examples of manual formulae. The values in column A
correspond to those in column B, since items in the same row are related: A4 gives an
English description of what is shown in B4.
Notice that although I could have typed the numbers directly into the formula in cell B3,
by listing the data as I have and referencing the cells in which they are found, I can now
change either or both numbers and the formula will adjust to the new values. This
concept, referencing cells in formulae instead of the actual values, is crucial in Excel.
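As a rough analogy (not how Excel works internally), the Python sketch below stores cell values in a dictionary and compares a "formula" that reads the cells with one that hard-codes the numbers. The cell contents are hypothetical; only the idea that B3 adds B1 and B2 comes from the examples in this document.

```python
# Hypothetical worksheet: cell address -> value (numbers are made up)
sheet = {"B1": 4, "B2": 6}

def formula_b3(cells):
    # Like typing =B1+B2 in B3: the formula reads whatever is in the cells
    return cells["B1"] + cells["B2"]

def hardcoded_b3(cells):
    # Like typing =4+6 in B3: the cells are ignored entirely
    return 4 + 6

print(formula_b3(sheet), hardcoded_b3(sheet))  # 10 10

sheet["B1"] = 20  # change the data
print(formula_b3(sheet), hardcoded_b3(sheet))  # 26 10
```

Only the version that references cells picks up the new data.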
Absolute referencing:
By default, Excel uses relative references: when a formula that references a cell is
copied, the references shift so that they keep the same position relative to the new cell
as they had relative to the original cell.
Example:
Observe that after copying the formula in cell B3 to cell C3, the referenced cells also
adjust so that C1 and C2 are added instead of B1 and B2. To prevent this, we can make
the reference to the row, the column or both absolute. Example:
Notice that I have used '$' to create an absolute reference; it 'locks the item into place'.
For the first cell, B1, no matter where I copy the formula to, the column, B, will be
unchanged, although the row will still adapt ($B1). The inverse is true of the second cell,
B2 (B$2). Writing $B$2 would lock both the column and the row.
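The copying rule can be sketched in Python. The function names (`copy_formula`, `shift_ref`) are hypothetical helpers written for this illustration; they handle only simple A1-style references such as B1, $B1, B$2 and $B$2, not sheet names, whole-column ranges or function names containing digits (such as LOG10).

```python
import re

# Matches an A1-style reference with optional $ locks, e.g. B1, $B1, B$2, $B$2
REF = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")

def col_to_num(col):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Convert a column number back to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        letters = chr(ord("A") + r) + letters
    return letters

def shift_ref(match, d_rows, d_cols):
    """Shift one reference by the copy offset, leaving any part locked with $."""
    col_lock, col, row_lock, row = match.groups()
    col_num, row_num = col_to_num(col), int(row)
    if not col_lock:
        col_num += d_cols
    if not row_lock:
        row_num += d_rows
    if col_num < 1 or row_num < 1:
        raise ValueError("reference moved off the sheet (Excel shows #REF!)")
    return f"{col_lock}{num_to_col(col_num)}{row_lock}{row_num}"

def copy_formula(formula, d_rows, d_cols):
    """Formula text after copying it d_rows down and d_cols to the right."""
    return REF.sub(lambda m: shift_ref(m, d_rows, d_cols), formula)

print(copy_formula("=B1+B2", 0, 1))    # =C1+C2  (B3 copied to C3)
print(copy_formula("=$B1+B$2", 0, 1))  # =$B1+C$2 (column of B1 locked)
print(copy_formula("=$B1+B$2", 1, 0))  # =$B2+B$2 (row of B2 locked)
```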
Array Formulae:
Since September 2018, new dynamic array formulae have been added to Microsoft 365:
when a formula returns multiple values, Excel automatically places ("spills") them into
the neighbouring cells below or to the right. In addition, these formulae are entered into
one cell and confirmed with Enter, in the same way as a normal formula.
In legacy versions of Microsoft Excel, entering an array formula required selecting all of
the output cells first and then confirming the formula with Ctrl+Shift+Enter. These
legacy array formulae are sometimes called CSE formulae.
In the example above, I have used an array formula to multiply several cells and add up
the results. Although this is a simple example, it demonstrates how useful array
formulae are for performing multi-step data manipulation in a single formula.
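Since the screenshot is not reproduced here, the following Python sketch assumes an array formula of the form =SUM(A1:A3*B1:B3) with made-up values. It shows the two steps such a formula performs: an element-by-element product (the intermediate array a dynamic-array Excel would spill if =A1:A3*B1:B3 were entered on its own) followed by a sum. Excel's SUMPRODUCT(A1:A3, B1:B3) gives the same total without an array formula.

```python
# Hypothetical values standing in for A1:A3 and B1:B3
col_a = [2, 3, 5]
col_b = [10, 20, 30]

# This sketch only accepts ranges of the same size
if len(col_a) != len(col_b):
    raise ValueError("ranges must be the same size")

# Step 1: multiply corresponding cells (the intermediate array)
products = [a * b for a, b in zip(col_a, col_b)]

# Step 2: add up the products, as SUM does around the array
total = sum(products)

print(products)  # [20, 60, 150]
print(total)     # 230
```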
Checking if a built-in function exists
It is not unusual to know what you need to do, and to be able to describe it, yet not
know how to express it in mathematical language or as an Excel function. Fortunately,
there are two ways to bridge this gap:
1. Excel’s formula finder: navigate to the Formulas tab at the top of your screen, search
in words for what you need done, then press next.
You will always be able to use this feature, even in tests and exams, but do not assume
that you will have lots of time to find the correct function; your time in an assessment is
limited.
2. Google it: although you can only do this before and after tests and exams, it is
useful when Excel's description is confusing or the search does not produce what you
need.
Excel for STA1000
Section: Probability
=PROB(x_range, prob_range, lower_limit, [upper_limit])
This function is especially useful for the tables of values given with probability mass
functions of random variables. A question will often give you a list or table of values
with their associated probabilities and ask for the probability that one or more of those
values occurs. PROB returns the probability that values in a range lie between two limits
(inclusive) or, if no upper limit is given, equal the lower limit.
Arguments
x_range – range of discrete values
prob_range – the probabilities associated with each value in x_range; they must sum
to 1
lower_limit – the lower bound of the values to test, or the only value tested
upper_limit (optional) – the upper bound of the values to test; give it to test a range of
values, and omit it to test a single value.
Example:
Observe that for the x values and the probabilities we give ranges, not individual cells.
Note also that I did a bad thing by typing a number into the formula instead of
referencing a cell.
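A minimal Python sketch of what PROB computes, assuming inclusive limits and a check that the probabilities sum to 1 (Excel reports an error value when its inputs are invalid; this sketch raises `ValueError` instead). The function name `prob` and the probability mass function are hypothetical.

```python
import math

def prob(x_range, prob_range, lower_limit, upper_limit=None):
    """P(lower_limit <= X <= upper_limit), or P(X == lower_limit) if no upper limit."""
    if len(x_range) != len(prob_range):
        raise ValueError("x_range and prob_range must be the same size")
    if any(p < 0 or p > 1 for p in prob_range):
        raise ValueError("each probability must be between 0 and 1")
    if not math.isclose(math.fsum(prob_range), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    if upper_limit is None:
        upper_limit = lower_limit  # test a single value
    return math.fsum(p for x, p in zip(x_range, prob_range)
                     if lower_limit <= x <= upper_limit)

# Hypothetical probability mass function
x_values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]

print(prob(x_values, probs, 1, 2))  # 0.5, P(1 <= X <= 2)
print(prob(x_values, probs, 3))     # 0.4, P(X = 3)
```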
=COMBIN(number, number chosen)
Gives the number of combinations without repetition (order ignored) when choosing
number chosen items from number items.
Arguments
Number – number of total options
Number chosen – number of items chosen
Same as the IntroSTAT formula n!/((n − x)! x!), where n is number and x is number chosen.
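The factorial formula can be checked in Python. The helper name `combin` is hypothetical; `math.comb` is Python's built-in equivalent (Python 3.8 or later).

```python
import math

def combin(number, number_chosen):
    """n! / ((n - x)! x!): ways to choose x items from n, no repeats, order ignored."""
    n, x = number, number_chosen
    if not 0 <= x <= n:
        raise ValueError("need 0 <= number_chosen <= number")
    return math.factorial(n) // (math.factorial(n - x) * math.factorial(x))

print(combin(5, 2))     # 10
print(math.comb(5, 2))  # 10, the built-in gives the same answer
```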
=COMBINA(number, number chosen)
Gives the number of combinations with repetition for a given number of options and
choices.
Arguments
Number – number of total options
Number chosen – number of items chosen
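Combinations with repetition follow the standard identity C(n + x − 1, x). The Python sketch below (hypothetical helper `combina`, restricted to n ≥ 1 and x ≥ 0 for simplicity) compares that formula with a brute-force count of every multiset from `itertools.combinations_with_replacement`.

```python
import math
from itertools import combinations_with_replacement

def combina(number, number_chosen):
    """Combinations with repetition: C(n + x - 1, x)."""
    n, x = number, number_chosen
    if n < 1 or x < 0:
        raise ValueError("this sketch needs number >= 1 and number_chosen >= 0")
    return math.comb(n + x - 1, x)

print(combina(4, 3))  # 20
# Brute force: list every way to choose 3 items from 4 options with repeats allowed
print(len(list(combinations_with_replacement(range(4), 3))))  # 20
```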
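=NORM.DIST(x, mean, standard_dev, cumulative)
When using this function with cumulative set to TRUE, check whether the test is one- or
two-sided. If asked for an upper-tail (one-sided) p-value, remember to subtract the
output from 1, i.e. 1 – NORM.DIST(…) = p-value; for a two-sided test, double the smaller
tail probability. NO NEED TO CONVERT TO Z OR CALCULATE A TEST STATISTIC:
NORM.DIST works directly with x, the mean and the standard deviation.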
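Python's standard library has `statistics.NormalDist`, which can mimic NORM.DIST. The sketch below (hypothetical helper `norm_dist`) uses made-up numbers: an observed sample mean of 103 under a hypothesised mean of 100, with standard error 1.5. Here standard_dev is the standard deviation of whichever distribution you are using; for a sample mean that is the standard error σ/√n.

```python
from statistics import NormalDist

def norm_dist(x, mean, standard_dev, cumulative):
    """P(X <= x) if cumulative is True, otherwise the density at x."""
    dist = NormalDist(mean, standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)

# Hypothetical numbers: observed mean 103, H0 mean 100, standard error 1.5
x, mu, se = 103, 100, 1.5
upper_tail = 1 - norm_dist(x, mu, se, True)      # one-sided (greater-than) p-value
two_sided = 2 * min(upper_tail, 1 - upper_tail)  # double the smaller tail
print(round(upper_tail, 4), round(two_sided, 4))  # 0.0228 0.0455
```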
=NORM.S.DIST(z, cumulative)
Returns the standard normal distribution (mean 0, standard deviation 1) evaluated at z.
Use it after converting to z, i.e. once you have calculated the test statistic.
Arguments
z – value at which the distribution is evaluated (test statistic/critical value)
cumulative – TRUE returns the cumulative probability P(Z ≤ z), the area to the left of z.
FALSE returns the probability density at z (the height of the curve), not a probability:
for a continuous distribution the probability of any single exact value is 0.
Same as NORM.DIST above, but it requires you to calculate the test statistic first, using
the formulae in IntroSTAT. It works like a z table, without reading values from a bad PDF
on a screen.
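To see that standardising and then using the standard normal gives the same answer as working with the original distribution, here is a Python sketch using `statistics.NormalDist()` (mean 0, standard deviation 1). The helper name `norm_s_dist` and the numbers are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def norm_s_dist(z, cumulative):
    """P(Z <= z) if cumulative is True, otherwise the density at z."""
    return STANDARD_NORMAL.cdf(z) if cumulative else STANDARD_NORMAL.pdf(z)

# Standardise first (hypothetical numbers): z = (x - mu) / sigma
x, mu, sigma = 103, 100, 1.5
z = (x - mu) / sigma
print(z)                                # 2.0
print(round(norm_s_dist(z, True), 4))   # 0.9772, as read off a z table at 2.00
print(round(norm_s_dist(0, False), 4))  # 0.3989, height of the curve at 0
```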
=NORM.S.INV(probability)
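Returns the inverse of the cumulative standard normal distribution: you have a
probability and want the corresponding z value (test statistic or critical value). For a
two-tailed test at significance level α, enter 1 − α/2 to get the upper critical value; the
lower critical value is its negative.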
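`statistics.NormalDist().inv_cdf` plays the same role in Python. The sketch below (hypothetical helper `norm_s_inv`, hypothetical α = 0.05) shows the one-tailed and two-tailed critical values.

```python
from statistics import NormalDist

def norm_s_inv(probability):
    """The z value with P(Z <= z) = probability."""
    return NormalDist().inv_cdf(probability)

alpha = 0.05  # hypothetical significance level
print(round(norm_s_inv(1 - alpha), 3))      # 1.645, one-tailed upper critical value
print(round(norm_s_inv(1 - alpha / 2), 3))  # 1.96, two-tailed critical values are +/- this
```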
=T.DIST.RT(x, deg_freedom)
Returns the right-tailed probability P(T > x) of Student's t-distribution (the
distribution used in the t-test).
Arguments
x – value at which the distribution is evaluated (critical value/test statistic)
deg_freedom – degrees of freedom
Gives the t-test equivalent of what the z table would give: the right-tail probability
P(T > x), shown as the light blue highlighted region in the figure below.
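Python's standard library has no t-distribution, so this sketch approximates T.DIST.RT by integrating the t density numerically with Simpson's rule. This is a swapped-in numerical technique for illustration, not how Excel computes it; the helper names are hypothetical. For df = 1 the exact answer at x = 1 is 0.25, and 2.228 is the familiar 2.5% critical value for df = 10.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow in the gamma functions for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_dist_rt(x, deg_freedom, steps=2000):
    """Approximate P(T > x) with Simpson's rule on the density between 0 and |x|."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    a = abs(x)
    h = a / steps
    area = t_pdf(0, deg_freedom) + t_pdf(a, deg_freedom)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 1, 4, 2, 4, ..., 4, 1
        area += weight * t_pdf(i * h, deg_freedom)
    area *= h / 3                # approximately P(0 < T < |x|)
    right_tail = 0.5 - area      # half of the area lies above 0 (symmetry)
    return right_tail if x >= 0 else 1 - right_tail

print(round(t_dist_rt(1, 1), 4))       # 0.25 (exact value for df = 1)
print(round(t_dist_rt(2.228, 10), 3))  # 0.025
```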
=T.DIST.2T(x, deg_freedom)
Returns the two-tailed probability P(|T| > x) of Student's t-distribution (the
distribution used in the t-test).
Arguments
x – value at which the distribution is evaluated (critical value/test statistic)
deg_freedom – degrees of freedom
Gives the two-tailed t-test counterpart of a two-sided p-value from the normal
distribution; for x ≥ 0 it equals twice the T.DIST.RT value. This is the red highlighted
area in the figure below.
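To make the "two tails" concrete, this Python sketch estimates P(|T| > x) by Monte Carlo simulation, building each t value as T = Z / √(V/df), where Z is standard normal and V is a sum of df squared standard normals (chi-square). Simulation is a swapped-in technique for illustration only, not Excel's method, and the helper name `t_dist_2t_sim` is hypothetical. Results are estimates, so expect small random error.

```python
import math
import random

def t_dist_2t_sim(x, deg_freedom, n_sims=100_000, seed=1):
    """Estimate P(|T| > x) by simulating t statistics."""
    if x < 0:
        raise ValueError("this sketch needs x >= 0")
    rng = random.Random(seed)  # fixed seed so results are repeatable
    hits = 0
    for _ in range(n_sims):
        z = rng.gauss(0, 1)
        v = sum(rng.gauss(0, 1) ** 2 for _ in range(deg_freedom))  # chi-square
        t = z / math.sqrt(v / deg_freedom)
        if abs(t) > x:  # count both tails
            hits += 1
    return hits / n_sims

print(t_dist_2t_sim(1, 1))       # close to 0.5 (exact value for df = 1)
print(t_dist_2t_sim(2.228, 10))  # close to 0.05, twice the right tail
```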