Just because we can do something doesn't mean that we always should do it. Let's take a look at some of the responsibilities that come with the power of web scraping.
Additional Resources
Scraping Legal Cases
Now that we've seen how to do some basic
web scraping, I'd like to talk about some
0:00
other responsibilities we have
as citizens of the Internet.
0:04
As Sir Francis Bacon stated,
knowledge is power.
0:08
Having the knowledge and skills needed
to do web scraping is a powerful tool.
0:12
There is another saying
that many attribute,
0:16
ironically enough,
to the Spider-Man comics.
0:19
With great power comes
great responsibility.
0:21
Some of the responsibility
that is incumbent upon us
0:24
as web scraping developers is to know and
follow applicable laws.
0:27
These will vary from country to country.
0:31
And much of the law around
digital content ownership
0:33
is continuously being tested in courts.
0:36
As a disclaimer here,
I'm not an attorney, so
0:40
don't take the following as legal advice.
0:42
With that in mind, however, there are some
specific areas of the law we should be
0:45
aware of at a high level to make us better
citizens and keep us out of trouble.
0:50
Here are some examples of laws to consider.
In the United States, there
0:55
are three main legal claims that
can be made against web scraping.
1:00
Copyright infringement,
the Computer Fraud and Abuse Act,
1:03
which prohibits accessing a computer
without, or in excess of, authorization.
1:07
Originally designed to protect
financial and government computers,
1:12
there have been instances where other
computers and even cell phones have fallen
1:16
under the CFAA's protection due to the
nature of today's device communication.
1:21
Trespass to Chattels,
which basically means
1:26
interfering with another person's lawful
possession of movable personal property.
1:29
In the European Union there are corporate
laws to consider as well, such as Directive
1:34
96/9/EC commonly known as
the Database Directive.
1:39
In Australia, the Spam Act of 2003
prohibits certain forms of web scraping.
1:45
If you decide to produce a web
scraping utility, especially one for
1:51
profit, keep these things in mind.
1:54
Again, if you find yourself in a legally
ambiguous web scraping project,
1:56
consult with an attorney who
specializes in this area.
2:01
How can we protect ourselves and
still utilize web scraping tools?
2:05
Many sites include a robots.txt file,
2:09
where limits can be set
as to where bots can go.
2:13
This robots.txt file is
a standardized file which follows
2:16
the robots exclusion standard.
2:20
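To make that a bit more concrete, here is a minimal sketch of checking robots.txt rules from Python with the standard library's `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `example-scraper` user agent name are all made up for illustration; a real scraper would point the parser at the site's actual robots.txt instead of an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user agent name for our scraper.
USER_AGENT = "example-scraper"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Paths outside /private/ are allowed for every user agent.
print(parser.can_fetch(USER_AGENT, "https://example.com/index.html"))  # True

# Anything under /private/ is off limits to bots.
print(parser.can_fetch(USER_AGENT, "https://example.com/private/data.html"))  # False

# The site asks bots to wait this many seconds between requests.
print(parser.crawl_delay(USER_AGENT))  # 5
```

Against a live site, one option is to call `set_url()` with the address of the site's robots.txt and then `read()` before asking `can_fetch()`. Checking the file this way only tells us what the site owner has requested; it doesn't settle the legal questions on its own.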
As you might be able to imagine,
this can become a bit legally muddy:
2:22
a site stating that a human
can access certain parts but
2:26
a computer can't gets a bit
tricky from a legal standpoint.
2:29
Similarly, sites may have
a posted terms of service
2:34
that states which parts of the site,
if any, data can be collected from, or
2:37
how their data needs to be attributed.
2:41
There have been many interesting
legal cases around web scraping.
2:44
I've included some links in
the teacher's notes about a few.
2:47
Okay, we've seen the power of
scraping a basic website with Beautiful Soup.
2:51
And now we've briefly discussed some of the
responsibilities that come with these new powers.
2:55
In the next stage let's take a look at
extending our data wrangling skills
3:00
beyond a single page and
start crawling the web.
3:04