APIs are all around us on the web. Sometimes we can use scraping techniques to interact with them in a meaningful way.
Back in the good old days of the Internet, if we wanted data, we had to view it on web pages. Now, however, many sites provide a web API that shares their data. Sometimes we can use these APIs to access information directly, without having to scrape the data. I'd recommend checking whether the site you want to scrape offers an API for the information you need. It can be a big time saver.
Let's take a look at how we can get data from the World Bank, using their API. There are many instances when using an API is great. Sometimes, though, scraping results from an API is useful as well, especially if the API documentation isn't super helpful. Let's take a brief look at one technique we can use to get and process data from an API. In this case, we'll look at the World Bank API. It's actually very well documented, which provides us with some extra knowledge as we go about trying to scrape things.
If we look here, at the Developer Information overview page, it provides information about how to get started and what the API provides. Let's look at the Country Queries section to see what information we might explore there. It looks like we could use this to get some generic information about the countries of the world. For example, if we wanted to do some high-level data exploration about income level in regions of the world, we could use this request format, look through some ISO codes, and get some information that we could explore. We won't be doing any actual exploration of data in this course, but check the teacher's notes for more information.
Let's take a look at the information we get from a country with a lot of horses, like Ethiopia. I know their ISO code is ETH, so let's put that into the request format. We can copy this, create a new tab, and drop in ETH. It looks like we're getting back the same information the documentation described, and it's in XML format.
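If you'd like to peek at that response from Python before writing the full script, a quick sketch like this works; the full api.worldbank.org host and path are my reading of the documented request format, so double-check them against the docs:

    from urllib.request import urlopen

    # Fetch the country record for Ethiopia (ISO code ETH).
    # The host and path here are an assumption based on the World Bank's
    # documented request format; adjust them if the docs show otherwise.
    response = urlopen('http://api.worldbank.org/v2/countries/ETH')
    print(response.read().decode('utf-8'))  # the raw XML we just saw in the browser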
That's great, we can handle that. We'll use Beautiful Soup to parse this XML and get the name, region, and income level. This could be used, for example, to generate a histogram chart of regions of the world and income levels. Lots of options for data visualization here.
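As a rough sketch of that idea, here's one way you might tally income levels once you've collected them; the records below are made up purely for illustration:

    from collections import Counter

    # Hypothetical (region, income level) pairs, standing in for data
    # collected from the API.
    records = [
        ('Sub-Saharan Africa', 'Low income'),
        ('Sub-Saharan Africa', 'Lower middle income'),
        ('Europe & Central Asia', 'High income'),
    ]

    # Count how many countries fall into each income level.
    counts = Counter(income for _, income in records)
    for income_level, count in counts.items():
        print(income_level, count)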
Let's go back to our code and create a new world_bank.py file. We don't need it inside the spider. In world_bank.py, we'll start with our imports. So, from urllib.request import urlopen. We're going back to Beautiful Soup, so from bs4 import BeautifulSoup, and we'll be using a CSV file of ISO codes, so we'll want to import csv as well.
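So the top of world_bank.py looks like this (Beautiful Soup needs to be installed in your environment, as it was earlier in the course):

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import csv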
Let's define a function to get the country information, get_country, and we'll pass in our country code. Just like we've done with Beautiful Soup in the past, we define our HTML string. It's urlopen, with that request format string we saw just a moment ago, worldbank.org/v2/countries/, and we'll use the string formatter to drop in country_code; let's bring this down to a new line. Next, we define our soup object. We pass in our HTML, and for our parser, since we're dealing with XML, we can use an XML parser. Scraping XML is pretty straightforward with Beautiful Soup.
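At this point the function looks roughly like this; the full api.worldbank.org host is my reading of the docs (the video only mentions worldbank.org/v2/countries/), and the 'xml' parser assumes lxml is available:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup


    def get_country(country_code):
        # Request the country record and parse the XML response.
        html = urlopen(
            'http://api.worldbank.org/v2/countries/{}'.format(country_code))
        soup = BeautifulSoup(html, 'xml')
        # Next step: pull the name, region, and income level out of the soup.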
If we look at the results we got for Ethiopia, we want to get three fields: wb:name, wb:region, and the wb:incomeLevel. Let's go ahead and define those. country_name is soup.find('wb:name'), region is soup.find('wb:region'), and income_level is soup.find('wb:incomelevel'), and that one was all lowercase. Now, let's print that information out. Here's a good example of a time when we can use the get_text method: we print the country_name with get_text, the region with get_text, and the income_level with get_text.
Now, we can loop through the ISO codes and pass them to our get_country method. So, if __name__ == '__main__':. Let's bring that up on the screen a little bit. I've included a file of ISO codes that we can open up and read, country_iso_codes.csv, so file opens that for reading. iso_codes, then, will be our csv reader over that file, and our delimiter is ",". Now, we can loop through our file and get our information: for code in iso_codes, we pass our code into our get_country method, and we want the first item from the list.
Now, we can run world_bank. And it looks like I made a mistake back up here: it wasn't all lowercase, it's actually incomeLevel. Let's try it again, and we get all of our expected data.
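Putting it all together, world_bank.py ends up looking roughly like this; the full api.worldbank.org host is my reading of the World Bank docs, the 'xml' parser assumes lxml is installed, and country_iso_codes.csv is the file of ISO codes included with the course:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import csv


    def get_country(country_code):
        # Request the country record and parse the XML response.
        html = urlopen(
            'http://api.worldbank.org/v2/countries/{}'.format(country_code))
        soup = BeautifulSoup(html, 'xml')
        # Pull out the three fields we care about. The tag name is case
        # sensitive with the XML parser: it's wb:incomeLevel.
        country_name = soup.find('wb:name')
        region = soup.find('wb:region')
        income_level = soup.find('wb:incomeLevel')
        print(country_name.get_text())
        print(region.get_text())
        print(income_level.get_text())


    if __name__ == '__main__':
        with open('country_iso_codes.csv', 'r') as file:
            iso_codes = csv.reader(file, delimiter=',')
            for code in iso_codes:
                # Each row is a list; the ISO code is the first item.
                get_country(code[0])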
Again, we could do something else here, like saving the information to a CSV file or database. Check the teacher's notes for more resources on that.
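One way that could look, as a sketch: have a version of get_country return the three values instead of printing them, then write them out with csv.writer. The rows and the output filename below are just placeholders for illustration:

    import csv

    # Hypothetical rows, as if returned by a get_country that returns
    # (name, region, income level) instead of printing them.
    rows = [
        ('Ethiopia', 'Sub-Saharan Africa', 'Low income'),
    ]

    # Write the collected values out to a CSV file (placeholder filename).
    with open('country_income_levels.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(['name', 'region', 'income_level'])
        writer.writerows(rows)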