Overview of Big Data
Term        Capacity       Abbreviation
Bit         0 or 1 value   b
Byte        8 bits         B
Kilobyte    1024* bytes    KB
Megabyte    1024 KB        MB
Gigabyte    1024 MB        GB
Terabyte    1024 GB        TB
Petabyte    1024 TB        PB
Exabyte     1024 PB        EB
Zettabyte   1024 EB        ZB
Yottabyte   1024 ZB        YB
* Note that because bits are binary in nature and are the basis on which all other storage
values are based, all values for data storage units are defined in terms of powers of 2. For
example, the prefix kilo typically means 1000; however, in data storage, a kilobyte = 2^10 = 1024
bytes. [2] (Table 14.1, Storage Capacity Units; p. 651)
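The powers-of-2 relationship between the units can be checked with a short Python sketch. The function names `unit_size_in_bytes` and `to_human_readable` are illustrative choices, not part of the cited source, and the sketch assumes the binary (1024-based) convention used in the table.

```python
# Abbreviations from the storage table, ordered from smallest to largest.
# Each unit is assumed to be 1024 times the previous one (binary convention).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_size_in_bytes(abbreviation: str) -> int:
    """Return the number of bytes in one unit, e.g. 'KB' -> 2**10 = 1024."""
    exponent = UNITS.index(abbreviation)  # B = 0, KB = 1, MB = 2, ...
    return 1024 ** exponent


def to_human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1024."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit available.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024
    raise AssertionError("unreachable: the loop always returns")
```

For instance, `unit_size_in_bytes("TB")` equals `2**40`, because a terabyte is four steps of 1024 above a byte.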
To manage big volumes of data, we have two options for handling additional load: [2]
- Scale up, meaning we keep the same number of systems to store and process data,
  but migrate each system to a larger system.
- Scale out, meaning we increase the number of systems, but do not migrate to larger
  systems.
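The difference between the two options can be made concrete with a small Python sketch. The `Cluster` class, its fields, and the terabyte figures are hypothetical and only serve to illustrate the idea; real capacity planning involves more than multiplying two numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical model: a number of identical systems with a fixed capacity each."""
    nodes: int                 # number of systems
    capacity_per_node_tb: int  # storage per system in terabytes (illustrative unit)

    @property
    def total_capacity_tb(self) -> int:
        return self.nodes * self.capacity_per_node_tb


def scale_up(cluster: Cluster, factor: int) -> Cluster:
    """Same number of systems, each migrated to a system `factor` times larger."""
    return Cluster(cluster.nodes, cluster.capacity_per_node_tb * factor)


def scale_out(cluster: Cluster, extra_nodes: int) -> Cluster:
    """More systems of the same size; no system is replaced by a larger one."""
    return Cluster(cluster.nodes + extra_nodes, cluster.capacity_per_node_tb)
```

Starting from `Cluster(nodes=4, capacity_per_node_tb=10)`, both `scale_up(base, 2)` and `scale_out(base, 4)` reach 80 TB in total, but the first keeps 4 systems while the second ends up with 8.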
Velocity
Velocity refers to the speed at which data is entered into a system and must be processed.
For example, Amazon captures every click of the mouse while shoppers are browsing on its
website. [2] This happens rapidly.
Velocity is important in stream processing. Think of all the data from radio-frequency
identification (RFID), global positioning system (GPS), near-field communication (NFC), and
Bluetooth sensors flooding into a system. Stream processing aims to aggregate single data points
from high-velocity data, in order to trigger a high-level event when a certain pattern is detected.
It also focuses on deciding which data to keep from a stream, since it is unfeasible to retain all
the data that is rushing in.
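As a minimal sketch of this idea, the Python generator below aggregates individual readings over a fixed-size sliding window and emits a high-level event whenever the window average exceeds a limit. The function name `high_level_events`, the averaging rule, and the parameters are assumptions made for illustration; real stream processors support far richer patterns.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def high_level_events(
    readings: Iterable[float], window_size: int, limit: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, window_mean) each time the mean of the last
    `window_size` readings is greater than `limit`."""
    # Only the most recent `window_size` readings are kept in memory;
    # older data points are discarded as new ones arrive.
    window: Deque[float] = deque(maxlen=window_size)
    for index, value in enumerate(readings):
        window.append(value)
        if len(window) == window_size:
            mean = sum(window) / window_size
            if mean > limit:
                yield index, mean
```

Because the window has a fixed maximum length, memory use stays constant however long the stream runs, which reflects the point that not all incoming data can be retained.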
Variety
Variety refers to the complexity of data formats. Big data consists of different forms of data.
For example, when a telecommunications company like Telstra records data on calls to its call
centre, this data includes both:
- structured data, which conforms to a predefined data model (e.g., your customer ID, the
  timestamp of your call, your service type), and
- unstructured data (e.g., the recording of the call, notes that the call centre operator makes
  during the call, the problem history related to your call).
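The contrast can be sketched in Python. The `CallRecord` class and its field names are hypothetical and do not describe any real call-centre system: the first three fields follow a predefined data model, while the operator's notes are free text with no fixed schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class CallRecord:
    # Structured fields: fixed names and types defined in advance.
    customer_id: int
    timestamp: datetime
    service_type: str
    # Unstructured field: free text written by the operator.
    operator_notes: str


def calls_for_service(records: List[CallRecord], service_type: str) -> List[CallRecord]:
    """Structured data can be filtered by an exact match on a known field."""
    return [r for r in records if r.service_type == service_type]


def notes_mentioning(records: List[CallRecord], keyword: str) -> List[CallRecord]:
    """Unstructured text has no fields to query, so this falls back to a
    naive case-insensitive keyword search."""
    return [r for r in records if keyword.lower() in r.operator_notes.lower()]
```

The keyword search stands in for the heavier text or audio analysis that unstructured data usually requires before it can be queried like the structured fields.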
Veracity
Veracity refers to the trustworthiness of data. The more data is collected and analysed
automatically but not captured in its entirety (due to the high volume and velocity), the higher
the uncertainty about the accuracy of data. For example, it is particularly challenging to verify
the truthfulness of posts on social media platforms, as we do not always know the posters’
backgrounds and their intentions. In fact, detecting fake reviews, fake news, and fake friends is
currently an active research area.
The four V’s as an infographic
The IBM Big Data & Analytics Hub provides an infographic which explains and gives
examples of each of the four V’s.
Other V’s
Further V’s that are often mentioned as key characteristics of big data are:
- value: how meaningful the data is
- visualisation: graphical representations to assist humans in understanding big data.
Hopefully, you now have an idea of what big data is. In the next step we will discuss where
all the data is coming from.
Your task
How would you define big data?
Share your thoughts in the comments.
References
1. Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH. Big data:
The next frontier for innovation, competition, and productivity [Internet]. McKinsey Global
Institute; 2011 [cited 2018 Oct 24]. 143 p. Available from:
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
2. EMC Education Services. Data Science and Big Data Analytics. Wiley; 2015.
3. Elmasri R, Navathe SB. Fundamentals of database systems. 7th ed. Pearson; 2017.
Goodreads. Aristotle > quotes > quotable quote [Internet]. Available
from: https://www.goodreads.com/quotes/20103-the-whole-is-greater-than-the-sum-of-its-parts