You have completed Preparing Data for Analysis!
Continue cleaning the dataset with Python's Pandas library. Tackle cleaning missing data.
Binder is no longer available for this project, please follow along by downloading the project files from the downloads tab.
Hello again.
0:00
Next up,
we're going to tackle missing data.
0:01
So just like we have before,
let's label this section missing data, perfect.
0:04
Now let's run info again,
data.info(), and scroll.
0:11
And you can see we have a total of 151
0:16
entries in our table here or data frame.
0:21
But some of these columns have fewer than 151.
0:26
Those will be the empties.
0:31
So to find our empties, we're going to run
0:33
data, where data.isnull(),
which is our empties, and
0:38
we're gonna do this on .any(axis=1).
0:44
If I hit enter .any, there we go.
0:49
I was like, no, but I had a comma
instead of a period, perfect.
0:58
This returns three rows and
1:02
you can see the empty values
are these NaNs, not a number.
1:05
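The search for empties described above can be sketched as follows. This is a minimal example on a made-up three-row frame, not the course dataset; the column names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course dataset; column names are assumed.
data = pd.DataFrame(
    {
        "Name": ["Zubat", "Golbat", "Machoke"],
        "Type": ["Poison", "Poison", np.nan],
        "Height": [31.0, np.nan, 59.0],
    },
    index=[41, 42, 68],
)

# isnull() marks each empty cell as True; any(axis=1) collapses that to one
# True/False per row, True when the row has at least one empty cell.
rows_with_empties = data[data.isnull().any(axis=1)]
print(rows_with_empties)
```

Passing axis=1 is what makes the check run across each row; without it, any() would report one value per column instead.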
If I didn't know these values,
1:09
I would just leave them as is. Since we can
look up this data, let's start fixing them.
1:11
So let's do Golbat first, golbat = data.loc.
1:16
And then we can just pass in
its ID number which is 42.
1:22
And then to check we can do golbat['Name'],
1:26
oops, so you can see we get Golbat.
1:31
So we have the correct one.
1:35
So, Name is correct. We
need to do Height, and
1:37
then we need to set it equal to, and
let's go search Golbat Pokemon.
1:44
Back over to our Pokedex.
1:50
5'03", so 5 times 12 is 60, plus 3 is 63.
1:53
And I'm just gonna do an enter,
golbat['Weight']
1:59
equals 121.3. And
2:09
because the rest of those are all floats,
I'm gonna make sure I add a zero here.
2:16
And that should fix golbat.
2:20
And if I just kind of
run this one real quick,
2:23
we can see Golbat, 63, 121.3,
so I have fixed it.
2:27
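The Golbat fix can be sketched like this. The video assigns through a row variable (golbat = data.loc[42]); depending on the pandas version and its copy behavior, that kind of assignment may not write back to the original frame, so this sketch assigns directly with data.loc[row, column] instead. The helper to_inches and the column names "Height" and "Weight" are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def to_inches(feet, inches):
    """Convert a feet-and-inches height to total inches (hypothetical helper)."""
    return feet * 12 + inches


# Hypothetical one-row stand-in for the Golbat entry with its two empty cells.
data = pd.DataFrame(
    {"Name": ["Golbat"], "Height": [np.nan], "Weight": [np.nan]},
    index=[42],
)

# Assign straight into the DataFrame by row label and column name.
data.loc[42, "Height"] = float(to_inches(5, 3))  # 5'03" -> 63.0
data.loc[42, "Weight"] = 121.3

print(data.loc[42])
```

Writing the height as a float keeps it consistent with the other values in the column, which is the same reason the video adds the trailing zero.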
Awesome, I'm gonna remove this and
we're gonna do the next one.
2:31
So essentially,
copy this, we're doing something
2:35
very similar, and
next is Machoke, which is location 68.
2:40
So that's our next one in the list.
2:45
And that one's type we need to fix, so
machoke,
2:48
Type, equals, and let's go find it.
2:54
I believe I know what it is, but we'll
just go through the steps just in case.
3:00
So we have fighting, and this is a string,
so don't forget to make it a string.
3:05
And that one is fixed. Lastly,
3:12
there is Seel, which is location 87,
3:16
and we need to do the type as well.
3:21
So we need seel, Type, equals.
3:25
There's our Seel.
3:37
Type is water.
3:41
Same thing,
3:43
make sure that it's a string, and also make
sure you capitalize that first letter,
3:44
because that's how it's done
in the data set.
3:48
Shift Enter to run it and perfect.
3:55
Now I can run this one
more time down below and
3:58
we get no results because
we fixed them all, perfect.
4:03
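Filling in the two missing types and re-running the check might look like the sketch below. The frame is a made-up stand-in and the "Type" column name is an assumption; the values are capitalized strings to match the rest of the column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: two rows with an empty Type and one complete row.
data = pd.DataFrame(
    {
        "Name": ["Golbat", "Machoke", "Seel"],
        "Type": ["Poison", np.nan, np.nan],
    },
    index=[42, 68, 87],
)

# Fill each empty cell with a capitalized string, like the existing values.
data.loc[68, "Type"] = "Fighting"
data.loc[87, "Type"] = "Water"

# Re-run the empties check; an empty result means nothing is missing anymore.
remaining = data[data.isnull().any(axis=1)]
print(len(remaining))
```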
I also wanna highlight that
pandas has a dropna function.
4:09
Let's look at the docs real quick.
4:13
So dropna. I'm gonna come down here to
the examples, it's a little bit more clear.
4:16
So you can see they have here this
Alfred row, and its toy is missing, and
4:21
then the date or
the time here doesn't exist either.
4:27
So drop the rows where at
least one element is missing.
4:32
So if you do dropna because this
is missing at least one thing, and
4:35
this is missing at least one thing,
it actually deletes the entire row.
4:40
So this is another option depending on
what you need to do with your data set.
4:45
So I wanted to make sure I showed you,
with what we're doing,
4:49
we don't need to drop any full rows.
4:54
But it is an option if you have a dataset
where you have lots of empties and
4:56
you just need to delete your
rows where it has empties.
5:00
So just wanted to call that out for you.
5:04
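For reference, here is a small sketch of dropna on a frame modeled on the example the video describes (an Alfred row with a missing toy and a missing date). The exact values are assumptions and may differ from the documentation page.

```python
import numpy as np
import pandas as pd

# Small frame modeled on the example described in the video; values assumed.
df = pd.DataFrame(
    {
        "name": ["Alfred", "Batman", "Catwoman"],
        "toy": [np.nan, "Batmobile", "Bullwhip"],
        "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
    }
)

# With no arguments, dropna() removes every row that has at least one
# missing value and returns a new DataFrame; df itself is left unchanged.
cleaned = df.dropna()
print(cleaned)
```

Only the Batman row survives here, because the other two rows are each missing at least one value.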
Lastly, we need to find the unknowns,
and NaNs.
5:09
Cuz if you remember, in the spreadsheet our
5:12
NaNs were actually written like this, Nan,
instead of like this, NaN.
5:16
So we'll actually need to find them
because they came in as strings and
5:20
we'll be able to see
that in just a second.
5:24
So let's do a search first for
any unknowns and
5:26
have it spit out the index for that row.
5:30
So it looks like that.
5:33
So data, where data, and we'll do it
5:34
on Height first, so Height inches
5:38
equals Unknown, as a string.
5:45
And then we're gonna do .index because
we just want it to bring us the index.
5:49
So you can see we have one,
we have one here at 153.
5:54
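The search for the Unknown placeholder can be sketched as below. The column name "Height (in)" is an assumption, since the transcript only says "Height inches", and the two rows are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in; the placeholder makes the whole column hold strings.
data = pd.DataFrame(
    {"Name": ["Mewtwo", "Mew"], "Height (in)": ["79.0", "Unknown"]},
    index=[152, 153],
)

# Filter to rows whose height is the text "Unknown", then keep only the
# index labels so we know which rows to look up and fix.
unknown_rows = data[data["Height (in)"] == "Unknown"].index
print(list(unknown_rows))
```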
So let's go ahead and find 153,
5:58
data.loc[153].
6:03
And it looks like Mew, and yep,
we can see the Height is Unknown, so
6:06
we'll need to go in and fix that.
6:10
So let's do mew = data.loc[153], do an enter, mew,
6:12
Height inches, = and
6:18
then we can come back over here.
6:22
Search for mew.
6:27
There's Mew. Height is 1 foot 4 inches, so
6:30
that's 12 plus 4, which is 16.
6:35
Perfect, and
now we can check the other column as well.
6:40
I'm gonna copy this, we can check weight.
6:45
And looks like we don't have any there.
6:57
So then I'm just gonna use this same spot.
6:58
Let's also do type.
7:01
We have 120, so let's fix
that. data.loc[120], and
7:03
it's Goldeen, and its type is Unknown.
7:11
So let's do that now.
7:16
Equals, and
then let's go find that information.
7:28
And that's water.
7:36
Don't forget to make it a string with a capital W.
7:38
And copy this again and we could
essentially just copy paste over the top
7:44
of these but I'm listing them out just so
you have the reference.
7:49
So Type is back to returning
nothing now because we fixed the error.
7:53
Let's try weaknesses.
8:01
Nothing for weaknesses.
8:02
And just to make sure,
let's check our Name.
8:03
Looks like we do have one.
8:08
We have one at 35.
8:09
So data.loc[35].
8:11
Looks like we have Fairy, Steel, and Poison.
8:16
And if I look at 34 we have Nidoking.
8:20
If I look at 36 we have Clefable.
8:24
So these are kind of in order.
8:28
So if I search Nidoking and hit Enter,
8:31
I see it's also number 034.
8:36
If I go to Clefairy, Clefairy is 035, and
8:39
then Clefable is 036, which matches
what we got when we looked at 36.
8:43
So let's check Clefairy.
8:48
2 feet is 12 times 2, which is 24, and the weight is 16.5.
8:51
So we can actually close that one.
8:56
So we got oops,
that's not the one we want to do.
8:58
There we go, so we got 24 16.5.
9:04
Fairy, Steel, Poison.
9:05
Perfect, so this should be Clefairy.
9:09
So now we can do
9:13
clefairy = data.loc[35],
9:16
clefairy['Name'] = 'Clefairy'.
9:23
Perfect, and
then I can run this one more time
9:28
down below just to show
that now it's empty.
9:32
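The video checks each column for Unknown one at a time. A minimal sketch of the same check as a loop is shown below; the function find_placeholder, the frame, and its column names are all hypothetical and not part of the course files.

```python
import pandas as pd

# Hypothetical stand-in with one placeholder in the Name column.
data = pd.DataFrame(
    {
        "Name": ["Nidoking", "Unknown", "Clefable"],
        "Type": ["Poison", "Fairy", "Fairy"],
    },
    index=[34, 35, 36],
)


def find_placeholder(frame, placeholder):
    """Map each column name to the index labels where it equals placeholder."""
    hits = {}
    for column in frame.columns:
        labels = frame[frame[column] == placeholder].index.tolist()
        if labels:
            hits[column] = labels
    return hits


before = find_placeholder(data, "Unknown")
data.loc[35, "Name"] = "Clefairy"
after = find_placeholder(data, "Unknown")
print(before, after)
```

Columns without the placeholder are left out of the result, so an empty dictionary means every column is clean.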
Now let's check for our Nan.
9:37
Our Nan written like this, with a lowercase second n, because I believe
those might have come in as strings.
9:40
So that's something to check for because
sometimes it's a simple mistake to type it
9:46
with a lowercase n instead
of an uppercase N at the end.
9:51
So instead
of Unknown, I'm just gonna search for Nan.
9:54
And nothing. Height
9:58
inches, nothing.
10:03
Weight lbs, there's one, 148.
10:06
So let's do data.loc[148].
10:13
And that is Moltres.
10:19
You can see we have Nan.
10:20
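The reason this needs its own search is that the text "Nan" is an ordinary string, so the earlier isnull check does not flag it. A minimal sketch of the difference, using a made-up frame and an assumed column name "Weight (lbs)":

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: one real missing value and one "Nan" typed as text.
data = pd.DataFrame(
    {
        "Name": ["Golbat", "Zapdos", "Moltres"],
        "Weight (lbs)": [np.nan, "116.0", "Nan"],
    },
    index=[42, 147, 148],
)

# isnull() only catches the real missing value.
empties = data[data["Weight (lbs)"].isnull()].index.tolist()

# The text version has to be matched as a string, with the exact casing.
text_nans = data[data["Weight (lbs)"] == "Nan"].index.tolist()

print(empties, text_nans)
```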
So we can give that a fix.
10:23
And this is the Weight, lbs Equals,
10:32
let's go find our information.
10:37
And weight is 132.3.
10:45
And just to check, yep,
it was weight 132.3.
10:49
I'm gonna add that zero.
10:53
Perfect, just like we were doing before,
10:54
I'm gonna run it again
just to give it a check.
10:58
Yep, zero now. After Weight is Type.
11:01
Nothing for type and
11:06
then the last one was Weaknesses.
11:09
Now we've tackled all of our missing data.
11:14
We found everything that's empty.
11:16
We found everything that
has either Unknown or
11:18
Nan with a lowercase second n,
and fixed all that information.
11:21
Perfect, next we're gonna
tackle formatting errors.
11:26
See you in the next video.
11:30