You have completed Introduction to Big Data!
When you start dealing with large datasets, the first question you should ask is “What kind of data are we dealing with?”
Terms
- GIS -- Geographic Information Systems
- Relational Database -- A database that stores data according to a predefined schema.
- SQL -- Structured Query Language
- NoSQL -- An overloaded term for non-relational databases
When you start dealing
with large datasets,
0:00
the first question you should ask is,
what kind of data are we dealing with?
0:02
Is it structured text data?
0:06
Or is it unstructured?
0:08
Is data missing?
0:09
Is it not text?
0:11
Is it video?
0:11
Or is it audio?
0:12
Does it have a location, or
geospatial information tied to it?
0:13
Are there a lot of default
values in the data?
0:17
The list goes on and on.
0:19
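A couple of these questions, like missing data and default values, can be checked with a quick profiling pass. Here is a minimal sketch, assuming records arrive as Python dictionaries; the function name profile_field, the sample records, and the choice of 0 as the default value are all hypothetical.

```python
def profile_field(records, field, default=None):
    """Count how often a field is missing or still set to its default."""
    # A value counts as missing if the key is absent or explicitly None.
    missing = sum(1 for r in records if r.get(field) is None)
    # Only count defaults when the caller says what the default looks like.
    defaults = sum(
        1 for r in records if default is not None and r.get(field) == default
    )
    return {"missing": missing, "default": defaults, "total": len(records)}


# Hypothetical sample: two records have no age, two still hold the default 0.
sample = [{"age": 0}, {"age": 34}, {}, {"age": None}, {"age": 0}]
summary = profile_field(sample, "age", default=0)
```

If a large share of a field turns out to be missing or defaulted, that is worth knowing before you pick a storage system or run any analysis on it.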
To really make sense out of the data that
you have on hand, and to begin to solve
0:21
the business problems with it, the first
step is to recognize the type of data
0:25
that you have and to put it into the
appropriate systems for storing that data.
0:30
For instance, if your data is something
like a group of customers and
0:35
their buying habits, it will probably
fit best in a relational SQL database or
0:38
a document-based NoSQL database.
0:42
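To make the relational versus document choice concrete, here is a minimal sketch of the same customer and purchases stored both ways. It uses Python's built-in sqlite3 module as a stand-in for a relational SQL database and a plain dictionary as a stand-in for a document store; the table names, column names, and sample values are assumptions made up for illustration.

```python
import json
import sqlite3

# Relational form: a predefined schema, one row per purchase.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE purchases ("
    "customer_id INTEGER REFERENCES customers(id), item TEXT, price REAL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "keyboard", 49.0), (1, "monitor", 199.0)],
)

# A SQL query joins the two tables to total spending per customer.
total_by_customer = conn.execute(
    "SELECT c.name, SUM(p.price) FROM customers c "
    "JOIN purchases p ON p.customer_id = c.id GROUP BY c.id"
).fetchall()

# Document form: the same customer as one self-contained nested record.
customer_doc = {
    "id": 1,
    "name": "Ada",
    "purchases": [
        {"item": "keyboard", "price": 49.0},
        {"item": "monitor", "price": 199.0},
    ],
}
doc_total = sum(p["price"] for p in customer_doc["purchases"])
doc_json = json.dumps(customer_doc)  # roughly what a document store would keep
```

Both forms answer the same question here. The relational version enforces a schema and joins across tables, while the document version keeps everything about one customer in a single record.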
If your data is a social network in
structure, it's dealing with how
0:45
things are interconnected,
you probably want to use a graph database.
0:48
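Interconnected data like this is often modeled as nodes and edges. The following is a small in-memory sketch, not a real graph database: the follows dictionary and the function degrees_of_separation are hypothetical names, and the breadth-first search simply illustrates the kind of question a graph database is built to answer.

```python
from collections import deque

# Hypothetical social graph: each person maps to the people they follow.
follows = {
    "ana": {"ben", "cho"},
    "ben": {"cho"},
    "cho": {"dev"},
    "dev": set(),
}


def degrees_of_separation(graph, start, target):
    """Return the fewest hops from start to target, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        person, hops = queue.popleft()
        if person == target:
            return hops
        for neighbor in graph.get(person, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


hops_ana_to_dev = degrees_of_separation(follows, "ana", "dev")
```

Answering the same question in a relational database would usually take repeated self-joins, which is one reason connection-heavy data tends to fit a graph model better.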
If you have thousands
of videos to process,
0:53
you need to know that before you try
to store it in systems not equipped for
0:55
the high levels of bandwidth needed
to transfer that data in and out.
0:59
Let's explore the major types
of data that you may encounter.
1:04
Structured data is data which is
formatted in a specific structure.
1:07
This means we can often separate
the data into fields that we can access.
1:13
Structured data makes the most sense for
1:18
a relational database where
the structure won't change very often.
1:20
Some common examples of structured
data are application logs,
1:23
customer information, and financial data.
1:27
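As a rough illustration of separating structured data into fields, here is a sketch that parses one application log line. The log format (timestamp, level, service, message, separated by spaces) and the names LogEntry and parse_log_line are assumptions for this example, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    timestamp: datetime
    level: str
    service: str
    message: str


def parse_log_line(line: str) -> LogEntry:
    """Split a log line in the assumed format into named fields."""
    # Split on the first three spaces only, so the message keeps its spaces.
    timestamp, level, service, message = line.split(" ", 3)
    return LogEntry(datetime.fromisoformat(timestamp), level, service, message)


entry = parse_log_line("2024-01-15T10:32:00 ERROR payment-service Card declined")
```

Once the line is split into fields, each one maps naturally onto a column in a relational table.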
Now, the converse of structured data is unstructured data:
1:30
this is data which cannot easily fit
into one or more defined labels.
1:34
For instance, when Twitter analyzes
tweets for malicious content,
1:38
they can never be certain of exactly
what the data represents in the tweets.
1:41
Words and content can mean very different
things, and because there is so
1:45
much data to analyze, the unstructured
nature makes processing it even harder.
1:49
Common examples of unstructured data
include social media posts, books, and
1:54
healthcare data.
1:58
One of the most common unstructured forms of data,
media such as video and audio, deserves its own type.
2:00
This kind of data requires high amounts
of bandwidth to process and store.
2:05
Compression is almost always necessary.
2:09
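A quick back-of-the-envelope calculation shows why. The sketch below assumes uncompressed 1080p video at 30 frames per second with 3 bytes per pixel, and compares it to a 5 Mbit/s stream; that target bitrate is only an illustrative assumption, not a figure from any particular service.

```python
def raw_video_bits_per_second(width, height, fps, bytes_per_pixel=3):
    """Bandwidth of uncompressed video: pixels x bytes x frames x 8 bits."""
    return width * height * bytes_per_pixel * fps * 8


raw_bps = raw_video_bits_per_second(1920, 1080, 30)

# Assumed target bitrate for a compressed stream (illustrative only).
compressed_bps = 5_000_000
compression_ratio = raw_bps / compressed_bps
```

Under these assumptions the raw stream is about 1.49 Gbit/s, so it has to shrink by a factor of roughly 300 to fit the assumed target bitrate.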
You can think of examples from
Netflix to social media posts.
2:12
Video conferencing apps also produce and
consume this type of data.
2:15
A common request is to store
additional location data or
2:19
metadata alongside other information.
2:22
When this happens,
a world of possibilities opens up.
2:25
Now it's possible to analyze everything
from simple location tracking
2:29
to how some object interacts with
the product in different places over time.
2:33
A common example of this is any
app that tracks your location,
2:38
any sort of mapping or
direction app like Waze or Google Maps.
2:42
It could also be used for analyzing
fleets of trucks for a company, or
2:46
even for the military.
2:50
Think of drones tracking
soldiers on the ground.
2:51
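One basic building block for this kind of location analysis is the distance between two coordinates. Here is a minimal sketch using the haversine formula, which treats the Earth as a sphere; the function name haversine_km is hypothetical, and real mapping systems generally use more precise models.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)
```

Summing distances like this over consecutive position readings gives the length of a tracked route, for example for a truck in a fleet.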
The Internet of Things, or IoT for
short, has created connected devices and
2:54
sensors all over the world.
2:58
These range from weather sensors for
3:00
warning of approaching tornadoes,
all the way to fitness apps.
3:01
The accelerometer in every modern
smartphone can now record activity
3:05
information and send it back to a plethora
of apps for optimizing your workouts.
3:10
Sensors can transmit all
kinds of data,
3:15
from structured text to video and audio.
3:18
In addition to those examples,
some other use cases are vehicle
3:21
communication systems, smart home
devices and traffic monitoring systems.
3:26
At the end of the day, all data on the
modern Internet is really just 0s and 1s.
3:30
The systems that we are about to discuss,
right after this quick break,
3:35
are all built for specific formats
of these bits, these 1s and 0s.
3:39
Now keep in mind, there are thousands
of solutions in the world for
3:44
solving problems dealing with big data.
3:48
But before choosing, you need to know
what types of data you are dealing with
3:50
to help you choose the right tool or
framework.
3:54
Let's dive into the major domains of
big data, starting with data storage.
3:57