
A Randy Franklin Smith whitepaper

commissioned by
1. THERE’S MORE TO BIG DATA THAN “BIG”

The “Big” in Big Data applies to much more than simply the volume of data. There is a threshold above which data
becomes truly Big Data, but that threshold is constantly moving as technology improves. With current
technologies, Big Data seems an appropriate term as one begins dealing with data-analysis scenarios that process
hundreds of terabytes. This is even more true when petabytes become the practical unit of measure. Note the
qualifying phrase, “data-analysis scenarios that process.” A physical data center that hosts an exabyte of data is
not necessarily dealing with Big Data. But if you must analyze an exabyte of data to answer a given question, then
you are far into the realm of Big Data.

The point is that large amounts of data become Big Data only when you must analyze that data as a set. If you are
simply storing 20 years’ worth of nightly system backups so that you can someday reference what a modest-sized
data set looked like 12 years ago, then you don’t have a Big Data scenario on your hands; you simply have a big
storage situation. Big Data is about the analysis of truly large sets of data. If pressed to use a single, simple metric
for Big Data, it might be most accurate to use record quantity. But as you'll see, there are more dimensions to Big
Data than either sheer volume or record quantity.

If Big Data were all about running traditional SELECT queries against bigger and bigger row quantities and sizes,
then we could simply build bigger clusters of relational databases. When you talk to data scientists about Big Data,
the primary idea that you come away with is the difference in analytical methods compared to traditional
relational-database queries. Big Data is about finding the compound relationships between many records of varied
information types. With traditional relational databases, the relationships are predefined in terms of discrete
entities with primary and foreign keys and views that join data along those linked keys. Each time you encounter a
new entity type, you must add a new table and define its relationship to all the existing tables. Such encounters are
often more complicated, requiring you to refactor a table into two or more new tables.

[Figure: three overlapping circles labeled Data Volume, Data Variety, and Data Velocity; their intersection is
labeled “Big Data Is... Data Science.”]

This is where the second of the so-called “3 Vs” of Big Data, variety, comes in. Incrementally, the next most
accurate but less simple Big Data metric would be record quantity multiplied by total record types. Relational
database models and their relevant analysis techniques rely on a finite number of entity types with known
relationships. Big Data is about putting all possibly relevant data together and finding relationships and clusters
that we didn’t know were there in the first place. Therefore, data-analysis scenarios that involve a growing and
dynamic variety of entity types can qualify as Big Data, even when dealing with a relatively small amount of data and
especially when the analysis requires techniques that are associated with the Big Data paradigm.
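
To make the schema-rigidity point concrete, the following is a minimal sketch in Python using the standard-library
sqlite3 module. The tables, columns, and the proxy-traffic example are hypothetical illustrations, not taken from the
whitepaper; the sketch only shows that relational relationships must be declared up front with keys and views, and
that each new entity type means new schema work before any analysis can begin.

import sqlite3

# Hypothetical schema for illustration only: two predefined entity types
# (hosts and logon events) linked by primary and foreign keys, plus a view
# that joins data along those linked keys.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE host  (host_id INTEGER PRIMARY KEY, hostname TEXT);
    CREATE TABLE logon (logon_id   INTEGER PRIMARY KEY,
                        host_id    INTEGER REFERENCES host(host_id),
                        account    TEXT,
                        logon_time TEXT);
    CREATE VIEW logons_by_host AS
        SELECT h.hostname, l.account, l.logon_time
        FROM logon AS l JOIN host AS h ON h.host_id = l.host_id;
""")

# Encountering a new entity type (say, proxy requests) means adding another
# table and wiring it to the existing ones before any query can use it.
con.executescript("""
    CREATE TABLE proxy_request (request_id   INTEGER PRIMARY KEY,
                                host_id      INTEGER REFERENCES host(host_id),
                                url          TEXT,
                                request_time TEXT);
""")

Nothing in this sketch is wrong for a stable, well-understood set of entity types; the argument above is that the
approach strains when the variety of entity types keeps growing and the interesting relationships are not known in
advance.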

The final measure of magnitude that helps to define Big Data is velocity, or the rate at which new data must be
stored. (For a second, more significant aspect to velocity, see section 2, “The Real-Time Requirement for BDSA.”)
Certainly, not all Big Data scenarios include high velocity. Analysis of data that is collected over a multi-decade
