APIs are all around us on the web. Sometimes we can use scraping techniques to interact with them in a meaningful way.
Back in the good old days of the Internet, if we wanted data, we had to view it on web pages. Now, however, many sites provide a web API that shares their data. Sometimes we can use these APIs to access information directly, without having to scrape the data. I'd recommend checking whether the site you want to scrape offers an API for the information you need. It can be a big time saver.
Let's take a look at how we can get data from the World Bank, using their API. There are many instances when using an API is great. Sometimes, though, scraping results from an API is useful as well, especially if the API documentation isn't super helpful. Let's take a brief look at one technique we can use to get and process data from an API. In this case, we'll look at the World Bank API. It's actually very well documented, which provides us with some extra knowledge as we go about trying to scrape things.
If we look here, at the Developer Information overview page, it provides information about how to get started and what the API provides. Let's look at the Country Queries section to see what information we might explore there. It looks like we could use this to get some generic information about the countries of the world. For example, if we wanted to do some high-level data exploration about income level in regions of the world, we could use this request format, look through some ISO codes, and get some information that we could explore. We won't be doing any actual exploration of data in this course, but check the teacher's notes for more information.
Let's take a look at the information we get from a country with a lot of horses, like Ethiopia. I know their ISO code is ETH, so let's put that into the request format. We can copy this, create a new tab, and drop in ETH. It looks like we're getting back the same information the documentation described, and it's in XML format.
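If you'd like to peek at that response from Python before writing the full script, a quick sketch like this works; the full api.worldbank.org host and path are my reading of the documented request format, so double-check them against the docs:

    from urllib.request import urlopen

    # Fetch the country record for Ethiopia (ISO code ETH).
    # The host and path here are an assumption based on the World Bank's
    # documented request format; adjust them if the docs show otherwise.
    response = urlopen('http://api.worldbank.org/v2/countries/ETH')
    print(response.read().decode('utf-8'))  # the raw XML we just saw in the browser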
That's great, we can handle that. We'll use Beautiful Soup to parse this XML and get the name, region, and income level. This could be used, for example, to generate a histogram chart of regions of the world and income levels. Lots of options for data visualization here.
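As a rough sketch of that idea, here's one way you might tally income levels once you've collected them; the records below are made up purely for illustration:

    from collections import Counter

    # Hypothetical (region, income level) pairs, standing in for data
    # collected from the API.
    records = [
        ('Sub-Saharan Africa', 'Low income'),
        ('Sub-Saharan Africa', 'Lower middle income'),
        ('Europe & Central Asia', 'High income'),
    ]

    # Count how many countries fall into each income level.
    counts = Counter(income for _, income in records)
    for income_level, count in counts.items():
        print(income_level, count)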
Let's go back to our code and create a new world_bank.py file. We don't need it inside the spider. In world_bank.py, we'll start with our imports. So, from urllib.request import urlopen. We're going back to Beautiful Soup, so from bs4 import BeautifulSoup, and we'll be using a CSV file of ISO codes, so we'll want to import csv as well.
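So the top of world_bank.py looks like this (Beautiful Soup needs to be installed in your environment, as it was earlier in the course):

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import csv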
Let's define a function to get the country information, get_country, and we'll pass in our country code. Just like we've done with Beautiful Soup in the past, we define our HTML string. It's urlopen, with that request format string we saw just a moment ago, worldbank.org/v2/countries/, and we'll use the string formatter to drop in country_code; let's bring this down to a new line. Next, we define our soup object. We pass in our HTML, and for our parser, since we're dealing with XML, we can use an XML parser. Scraping XML is pretty straightforward with Beautiful Soup.
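At this point the function looks roughly like this; the full api.worldbank.org host is my reading of the docs (the video only mentions worldbank.org/v2/countries/), and the 'xml' parser assumes lxml is available:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup


    def get_country(country_code):
        # Request the country record and parse the XML response.
        html = urlopen(
            'http://api.worldbank.org/v2/countries/{}'.format(country_code))
        soup = BeautifulSoup(html, 'xml')
        # Next step: pull the name, region, and income level out of the soup.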
If we look at the results we got for Ethiopia, we want to get three fields: wb:name, wb:region, and the wb:incomeLevel. Let's go ahead and define those. country_name is soup.find('wb:name'), region is soup.find('wb:region'), and income_level is soup.find('wb:incomelevel'), and that one was all lowercase. Now, let's print that information out. Here's a good example of a time when we can use the get_text method: we print the country_name with get_text, the region with get_text, and the income_level with get_text.
Now, we can loop through the ISO codes and pass them to our get_country method. So, if __name__ == '__main__':. Let's bring that up on the screen a little bit. I've included a file of ISO codes that we can open up and read, country_iso_codes.csv, so file opens that for reading. iso_codes, then, will be our csv reader over that file, and our delimiter is ",". Now, we can loop through our file and get our information: for code in iso_codes, we pass our code into our get_country method, and we want the first item from the list.
Now, we can run world_bank. And it looks like I made a mistake back up here: it wasn't all lowercase, it's actually incomeLevel. Let's try it again, and we get all of our expected data.
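Putting it all together, world_bank.py ends up looking roughly like this; the full api.worldbank.org host is my reading of the World Bank docs, the 'xml' parser assumes lxml is installed, and country_iso_codes.csv is the file of ISO codes included with the course:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import csv


    def get_country(country_code):
        # Request the country record and parse the XML response.
        html = urlopen(
            'http://api.worldbank.org/v2/countries/{}'.format(country_code))
        soup = BeautifulSoup(html, 'xml')
        # Pull out the three fields we care about. The tag name is case
        # sensitive with the XML parser: it's wb:incomeLevel.
        country_name = soup.find('wb:name')
        region = soup.find('wb:region')
        income_level = soup.find('wb:incomeLevel')
        print(country_name.get_text())
        print(region.get_text())
        print(income_level.get_text())


    if __name__ == '__main__':
        with open('country_iso_codes.csv', 'r') as file:
            iso_codes = csv.reader(file, delimiter=',')
            for code in iso_codes:
                # Each row is a list; the ISO code is the first item.
                get_country(code[0])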
Again, we could do something else here, like saving the information to a CSV file or database. Check the teacher's notes for more resources on that.
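One way that could look, as a sketch: have a version of get_country return the three values instead of printing them, then write them out with csv.writer. The rows and the output filename below are just placeholders for illustration:

    import csv

    # Hypothetical rows, as if returned by a get_country that returns
    # (name, region, income level) instead of printing them.
    rows = [
        ('Ethiopia', 'Sub-Saharan Africa', 'Low income'),
    ]

    # Write the collected values out to a CSV file (placeholder filename).
    with open('country_income_levels.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(['name', 'region', 'income_level'])
        writer.writerows(rows)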