You have completed Data Analysis Basics!
Data isn't always distributed the way you want. In this video we'll talk about a few of the different ways we can measure the spread of our data.
We've got the extremes of our data and
we've got the middle.
0:00
But how is our data distributed?
0:03
One common way to describe the spread of
our data is to use the standard deviation
0:05
which is commonly represented
as the Greek letter sigma.
0:10
The standard deviation aims to tell us how
far away our data is from the average.
0:13
To calculate it,
0:18
we start by taking the difference
between each value and the average.
0:19
Then we square each of those values,
add them up, and
0:23
divide by the total number of values.
0:26
This gives us the standard deviation
squared which is also called the variance.
0:29
So to get the standard deviation, we just
take the square root and there we go.
0:34
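The steps just described can be sketched in a few lines of Python. This is only an illustration: the `scores` list is made-up data (not the bowling scores from the video), and `population_std` is a hypothetical helper name. It divides by the total number of values, as described above, which gives the population standard deviation.

```python
from math import sqrt

def population_std(values):
    """Return (mean, variance, standard deviation), dividing by n."""
    mean = sum(values) / len(values)
    # Step 1: difference between each value and the average, squared.
    squared_diffs = [(v - mean) ** 2 for v in values]
    # Step 2: add them up and divide by the total number of values.
    variance = sum(squared_diffs) / len(values)
    # Step 3: the square root of the variance is the standard deviation.
    return mean, variance, sqrt(variance)

# Hypothetical data, chosen so the arithmetic is easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean, variance, std = population_std(scores)

# The range "within one standard deviation of the average".
low, high = mean - std, mean + std
within_one = [s for s in scores if low <= s <= high]

print(mean, variance, std)
print(low, high, within_one)
```

With this sample data the average is 5, the variance is 4, and the standard deviation is 2, so the one-standard-deviation range runs from 3 to 7.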
We've got a standard deviation of 64.29,
so if we were to put this on a graph,
0:39
we'd put the average in the middle and
then go 64.29 above and below the average.
0:44
Then we can say that any data in this
range is within one standard deviation
0:50
of the average.
0:55
So that's a pretty big range.
0:56
Let's see what happens if instead of
a perfect game, our first bowler
0:58
bowls a 135.
1:03
Now, instead of an average of 134.5,
we've got an average of about 114 and
1:04
our standard deviation is
all the way down to just 17.
1:10
So if we make a plot of this new standard
deviation, we can see that this data
1:14
is much more clustered together than
when it included a perfect game.
1:19
Let's calculate the standard deviation for
the finishing times.
1:23
First, let's add a new label for
Standard Deviation in row nine.
1:27
And let's make it bold and
1:36
then double-click right here to
automatically set the width of the column.
1:38
Then, in the cell next to it, let's type
=STDEV and hit Enter to select a function.
1:45
Then let's paste in the range and
hit Enter again and
1:54
it looks like we've got
a Standard Deviation of about 42 minutes.
1:58
Also, if you're not seeing 42 minutes
here, you can come over here and
2:02
change the data type to Duration and
that should fix your issue.
2:07
So most racers finished within 42
minutes of the average finish time.
2:12
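One detail worth knowing: in common spreadsheet applications, STDEV computes the sample standard deviation (dividing by n - 1), while the hand calculation earlier divided by the total number of values. A rough Python equivalent, using the standard library's `statistics` module, shows both side by side. The finishing times below are hypothetical values in minutes, not the race data from the video.

```python
from datetime import timedelta
from statistics import pstdev, stdev

# Hypothetical finishing times in minutes (not the video's data).
finish_minutes = [185, 210, 222, 240, 251, 263, 290, 331]
n = len(finish_minutes)

# Sample standard deviation: divides by n - 1, like the spreadsheet STDEV.
sample_sd = stdev(finish_minutes)

# Population standard deviation: divides by n, like the hand calculation.
population_sd = pstdev(finish_minutes)

# Show the result as a duration, similar to a Duration cell format.
as_duration = timedelta(minutes=round(sample_sd))

print(f"sample: {sample_sd:.2f} min, population: {population_sd:.2f} min")
print(f"as a duration: {as_duration}")
```

The two results get closer as the number of values grows, so for a large race the choice makes little practical difference.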
But standard deviation
doesn't tell the whole story,
2:17
it only tells us how compact or
spread out our data is.
2:21
To get the rest of the picture,
we need to talk about skew.
2:25
Skew is when your data seems to
favor one side over the other.
2:29
Most of the data is either to the right or
left of the middle.
2:34
And depending on which
side has the long tail,
2:37
you would say that this data is either
skewed negatively or positively.
2:40
An easy way to remember skew
directions is to start at the peak and
2:45
draw an arrow towards the long tail.
2:49
The direction that arrow points
is how the data is skewed.
2:52
So this data has a negative skew.
2:56
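The direction of skew can also be checked numerically. The sketch below uses one common definition of skewness (the average cubed z-score); the `skewness` helper and both data sets are assumptions made up for illustration. A negative result means the long tail points left, and a positive result means it points right.

```python
from statistics import mean, pstdev

def skewness(values):
    """Average cubed z-score: negative for a long left tail."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical data: most values are high, with a long tail toward low values.
left_tailed = [1, 6, 7, 8, 8, 9, 9, 9, 10, 10]

# Mirroring the data flips the tail to the other side.
right_tailed = [11 - v for v in left_tailed]

print(f"left-tailed skew:  {skewness(left_tailed):.2f}")   # negative
print(f"right-tailed skew: {skewness(right_tailed):.2f}")  # positive
```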
On the other hand, if your data has
no skew and its mean, median, and
2:59
mode are all right in the middle,
then your data is said to have
3:04
a normal distribution which is
frequently referred to as a bell curve.
3:08
Normal distributions have many
convenient properties and
3:13
they occur fairly frequently in real life.
3:16
People's heights, test scores, and
3:19
even blood pressures are all
approximately normally distributed.
3:21
One property of normal distributions
is how many values occur within a given
3:25
standard deviation of the mean.
3:29
68% of the data should be contained
within 1 standard deviation,
3:30
95% should be contained within 2.
3:35
And if you go out to 3
standard deviations at 99.7%,
3:39
that should be pretty
much all of the data.
3:44
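These percentages can be reproduced from the normal distribution itself. The sketch below uses `NormalDist` from Python's standard `statistics` module; `share_within` is a hypothetical helper name introduced here for illustration.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Because the result depends only on how many standard deviations you go out, the same fractions apply to any normal distribution, whatever its mean and standard deviation.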
Let's see if our data is normally
distributed by seeing how
3:46
close we come to these
numbers in the next video.
3:49