You have completed Security Literacy!
Even with secure, encrypted email and internet traffic, information about your online activity can still say a lot about you. Learn the aspects of your online presence that can identify you as uniquely as your fingerprint.
New Terms:
- Metadata -- Data about data. The additional information associated with a message or communication besides its direct content.
- IP Address -- Internet Protocol Address. A unique (enough) identifier that allows the internet to route traffic to and from the right places.
- Anonymized Dataset -- A collection of information about people where the personally identifiable information has been stripped.
- De-anonymization -- A strategy in data-mining where an anonymized dataset is cross-referenced with other available data to re-identify the sources.
In the last video,
0:00
we explored how common internet traffic
is exposed in different scenarios.
0:01
Our previous example was the fairly
innocuous movie showtimes.
0:05
But, what if it was about a health
condition, or a political organization?
0:09
Even with Google's HTTPS search,
an eavesdropper on an open
0:14
Wi-Fi network can see that someone went to
Google.com, just not the content.
0:18
It may be somewhat anonymous,
0:23
as the traffic doesn't say the name of the
person, but it is certainly not private.
0:24
In a very similar manner,
going to Google.com is fairly generic, but
0:29
what if it was plannedparenthood.org or
a substance abuse support website?
0:33
What if you were the only other
person in the coffee shop?
0:38
Metadata is essentially data about data.
0:42
In the previous video,
we showed a search for
0:45
movie showtimes from a coffee
shop in Portland, Oregon, and
0:47
the results provided by Google were for
movies in the Portland area.
0:51
Google was able to find out more
information about that search based on
0:55
metadata attached to the request.
0:59
This includes the IP address,
or internet protocol address,
1:01
which essentially lets the Wi-Fi network,
ISP, and Google know the source
1:05
of the search request, to be able to send
the results back to the right place.
1:10
Other metadata could include date,
time, location, the browser used,
1:15
the device used, its operating system,
the network used, etc.
1:20
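To make the idea concrete, here is a minimal Python sketch of which parts of a single HTTPS request different parties can typically read. The `ObservedRequest` record, its field names, and the example values are all hypothetical, and the split between what an eavesdropper and the website can see is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ObservedRequest:
    """One web request, described by its metadata (hypothetical record)."""
    source_ip: str         # where the reply must be sent
    timestamp: datetime    # when the request was made
    destination_host: str  # usually visible even with HTTPS (DNS lookup, TLS handshake)
    user_agent: str        # browser and OS string, sent inside the encrypted request
    body: bytes            # the actual content, encrypted under HTTPS


def visible_metadata(req: ObservedRequest, on_path_eavesdropper: bool) -> dict:
    """Return the metadata a given observer can typically read.

    Assumption: an eavesdropper on the Wi-Fi sees addressing and timing only,
    while the destination website also sees the request headers.
    """
    meta = {
        "source_ip": req.source_ip,
        "timestamp": req.timestamp.isoformat(),
        "destination_host": req.destination_host,
    }
    if not on_path_eavesdropper:
        meta["user_agent"] = req.user_agent
    return meta


# Example values are made up; 203.0.113.7 is a documentation-only address.
example = ObservedRequest(
    source_ip="203.0.113.7",
    timestamp=datetime(2024, 1, 1, 19, 30, tzinfo=timezone.utc),
    destination_host="www.google.com",
    user_agent="ExampleBrowser/1.0 (ExampleOS 12)",
    body=b"<encrypted search for movie showtimes>",
)

print(visible_metadata(example, on_path_eavesdropper=True))
print(visible_metadata(example, on_path_eavesdropper=False))
```

Neither view includes the search itself or a name, yet both say who was talking to whom, and when.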
It's important to consider that even
though none of this information contains
1:26
your personal information specifically,
like your name or
1:29
your home address, it really doesn't
have to in order to track you.
1:32
In fact, there's so much metadata
attached to almost all internet traffic,
1:37
it can often identify you as
uniquely as your own fingerprint.
1:42
This is another website demo that
I encourage you to try yourself,
1:46
it's called Panopticlick.
1:50
And it's from the Electronic Frontier
Foundation, a great non-profit
1:51
organization dedicated to protecting
our security and privacy rights.
1:54
Just click one button, and this site will
collect as much information from you
1:59
as it has available, which is as much
as almost any site has available.
2:02
It will detect things like the device's
operating system, the browser and
2:07
version, and
even the fonts installed on the device.
2:10
These are all meant to assist the browser
in rendering web pages, things like
2:14
showing the right screen size, whether
you're on a cell phone or a laptop.
2:18
But because these details can be
aggregated with other metadata,
2:22
including your IP address, which is
specific to certain regions of the world,
2:25
the various combinations of them all end
up being fairly statistically unique.
2:29
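A rough way to see why the combination becomes unique is to count "bits of identifying information": a value shared by one browser in N contributes log2(N) bits. The sketch below uses invented fractions and assumes the attributes are independent, which real browser attributes are not, so treat the result as an upper-bound illustration rather than a measurement.

```python
import math


def surprisal_bits(fraction_sharing: float) -> float:
    """Bits of identifying information in a value shared by this fraction of browsers."""
    return -math.log2(fraction_sharing)


# Illustrative, made-up fractions of browsers sharing each of your values.
illustrative_fractions = {
    "operating_system": 1 / 8,
    "browser_version": 1 / 16,
    "screen_size": 1 / 32,
    "installed_fonts": 1 / 1024,
}


def combined_bits(fractions: dict) -> float:
    """Total bits, assuming the attributes are independent (a simplification)."""
    return sum(surprisal_bits(f) for f in fractions.values())


def expected_matches(population: int, bits: float) -> float:
    """How many browsers in the population are expected to share the full combination."""
    return population / (2 ** bits)


total = combined_bits(illustrative_fractions)
print(f"{total:.0f} bits of identifying information")
print(expected_matches(4_194_304, total), "browser(s) expected to match")
```

With these numbers the four attributes add up to 22 bits, enough to single out one browser among roughly four million.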
Going back to the coffee shop example,
let's say you were not the only one
2:35
online while someone was sniffing, or
eavesdropping on the internet traffic.
2:39
Without this metadata associated
with everyone's traffic,
2:44
you'd be anonymous within
the coffee shop crowd.
2:47
But what if you were the only
one using a Mac laptop, or
2:50
what about a recognizable
model of Android phone?
2:53
This is effectively a form
of de-anonymization, and
2:56
it's a real threat to your
ability to stay private.
2:59
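The coffee shop scenario can be sketched as an "anonymity set": the group of devices that match what the eavesdropper observed. The device list and attribute names below are hypothetical.

```python
def anonymity_set(devices: list, **observed) -> list:
    """Return the devices whose attributes match everything the observer saw."""
    return [
        device for device in devices
        if all(device.get(key) == value for key, value in observed.items())
    ]


# Hypothetical devices currently on the coffee shop Wi-Fi.
coffee_shop = [
    {"owner": "patron A", "os": "Windows", "kind": "laptop"},
    {"owner": "patron B", "os": "Windows", "kind": "laptop"},
    {"owner": "patron C", "os": "Windows", "kind": "laptop"},
    {"owner": "patron D", "os": "macOS", "kind": "laptop"},
    {"owner": "patron E", "os": "Android", "kind": "phone"},
]

# Traffic from a Windows laptop could belong to any of three people.
print(len(anonymity_set(coffee_shop, os="Windows", kind="laptop")))

# Traffic from a Mac laptop points at exactly one person.
print(anonymity_set(coffee_shop, os="macOS", kind="laptop"))
```

The smaller the set, the less the crowd protects you; a set of one means the traffic is yours.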
Another form of de-anonymization can occur
when organizations publish datasets that
3:03
they have anonymized, by removing personal
information like name and demographics.
3:07
Unfortunately, when combined
with other public data,
3:13
these datasets can reveal
surprisingly specific information.
3:16
Examples of Anonymization Failures.
3:20
An academic paper, Simple Demographics
Often Identify People Uniquely,
3:23
showed that a birth date, gender, and zip
code are enough to identify most people.
3:28
Uber published, and since removed, a blog
post demonstrating how they could detect
3:33
when riders had had one-night stands.
3:38
An anonymized New York City taxi
dataset was cross-referenced
3:41
with publicly available photos
from news and tabloid publications
3:45
to reveal the home addresses of
celebrities, the clubs they visited,
3:49
and even which of them tipped well.
3:53
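All three examples follow the same pattern: an "anonymized" dataset is joined with outside data on fields both have in common. Here is a minimal sketch of that cross-referencing, using birth date, gender, and zip code as the shared fields. Every record, name, and field name below is fictional, and this is an illustration of the general idea, not the method used in any of the cases mentioned.

```python
from collections import defaultdict

# Fields left in the "anonymized" data that also appear in public records.
QUASI_IDENTIFIERS = ("birth_date", "gender", "zip_code")


def reidentify(anonymized_rows: list, public_rows: list) -> list:
    """Pair anonymized rows with names when exactly one public record matches."""
    index = defaultdict(list)
    for person in public_rows:
        key = tuple(person[field] for field in QUASI_IDENTIFIERS)
        index[key].append(person["name"])

    matches = []
    for row in anonymized_rows:
        key = tuple(row[field] for field in QUASI_IDENTIFIERS)
        names = index.get(key, [])
        if len(names) == 1:  # a unique match re-identifies the row
            matches.append((names[0], row["sensitive_value"]))
    return matches


# Fictional "anonymized" records: names removed, other fields kept.
anonymized = [
    {"birth_date": "1980-02-14", "gender": "F", "zip_code": "97201",
     "sensitive_value": "record 1"},
    {"birth_date": "1975-07-04", "gender": "M", "zip_code": "97202",
     "sensitive_value": "record 2"},
]

# Fictional public list, such as a roster that includes names.
public = [
    {"name": "Alex Example", "birth_date": "1980-02-14", "gender": "F",
     "zip_code": "97201"},
    {"name": "Sam Sample", "birth_date": "1975-07-04", "gender": "M",
     "zip_code": "97202"},
    {"name": "Pat Placeholder", "birth_date": "1975-07-04", "gender": "M",
     "zip_code": "97202"},
]

print(reidentify(anonymized, public))
```

Only the first record is re-identified here, because two people share the second combination. Removing names alone does not help when the remaining fields are rare enough to point at one person.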
Metadata equals surveillance,
it's that simple.
3:55
Personally, I find it pretty
convenient that my search for
3:59
showtimes, or a search for library hours
would show results local to my area.
4:03
This is a tradeoff with my security and
privacy that I accept.
4:07
Other metadata collection,
I strongly oppose.
4:12
The important thing is to understand what
metadata is, when it's being collected,
4:15
and to make that choice for yourself.
4:19