Unit 01
What is Data?
The quantities, characters, or symbols on which operations are performed by a computer,
which may be stored and transmitted in the form of electrical signals and recorded on
magnetic, optical, or mechanical recording media.
What is Big Data?
Big Data is data of very large size. The term describes a collection of data that
is huge in volume and still growing exponentially with time. In short, such data is
so large and complex that traditional data management tools cannot store or
process it efficiently.
“Extremely large data sets that may be analyzed computationally to reveal patterns,
trends and associations, especially relating to human behavior and interaction, are
known as Big Data.”
Examples of Big Data
The following are some examples of Big Data:
The New York Stock Exchange generates about one terabyte of new trade data per day.
Social Media
Statistics show that more than 500 terabytes of new data are ingested into the
databases of the social media site Facebook every day. This data is mainly generated
through photo and video uploads, message exchanges, comments, etc.
TWITTER
A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight
time. With many thousands of flights per day, the data generated reaches many petabytes.
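As a rough worked example of the jet-engine figure, the sketch below scales 10 terabytes per 30 minutes up to one day of flights. The flight count, engines per aircraft and average flight length are hypothetical values chosen only for illustration; they are not taken from these notes.

```python
# Figure from the text: one engine produces about 10 TB per 30 minutes of flight.
TB_PER_ENGINE_PER_HALF_HOUR = 10


def daily_engine_data_pb(flights, engines_per_flight, flight_hours):
    """Estimate daily engine data in petabytes, taking 1 PB = 1,024 TB."""
    half_hours = flight_hours * 2
    total_tb = (TB_PER_ENGINE_PER_HALF_HOUR * half_hours
                * engines_per_flight * flights)
    return total_tb / 1024


if __name__ == "__main__":
    # Hypothetical inputs: 5,000 flights, 2 engines each, 2 hours per flight.
    print(daily_engine_data_pb(flights=5000, engines_per_flight=2, flight_hours=2))
```

Even with these modest assumed inputs, the estimate lands in the hundreds of petabytes per day.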
Tabular Representation of various Memory Sizes
Name        Equal To           Size (in Bytes)
Bit         1 bit              1/8
Nibble      4 bits             1/2 (rare)
Byte        8 bits             1
Kilobyte    1,024 bytes        1,024
Megabyte    1,024 kilobytes    1,048,576
Gigabyte    1,024 megabytes    1,073,741,824
Terabyte    1,024 gigabytes    1,099,511,627,776
Petabyte    1,024 terabytes    1,125,899,906,842,624
Exabyte     1,024 petabytes    1,152,921,504,606,846,976
Zettabyte   1,024 exabytes     1,180,591,620,717,411,303,424
Yottabyte   1,024 zettabytes   1,208,925,819,614,629,174,706,176
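Every row of the table from Kilobyte onward is 1,024 times the row before it, so the byte counts can be reproduced with powers of 1,024. The short sketch below does this; the names UNITS and size_in_bytes are illustrative only.

```python
# Units from the table, in increasing order; each is 1,024 times the previous one.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]


def size_in_bytes(unit):
    """Return the number of bytes in one `unit`, using powers of 1,024."""
    return 1024 ** UNITS.index(unit)


if __name__ == "__main__":
    for unit in UNITS:
        # The ',' format option inserts thousands separators, as in the table.
        print(f"{unit:<10} {size_in_bytes(unit):,}")
```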
Characteristics of Big Data
3. Variety:
Variety means “different forms of data”. Nowadays, organizations, people and
systems generate huge amounts of data at a very fast rate and in many different formats.
We will discuss the different formats of data in detail soon.
4. Veracity
• Veracity means “the quality, correctness or accuracy of the captured data”.
• Of the 4 Vs, it is the most important V for any Big Data solution.
• Without correct data, there is no use in storing large amounts of data
at a fast rate and in different formats. The data should deliver
correct business value.
Types of Digital Data
1. Structured
2. Unstructured
3. Semi-structured
Structured
Any data that can be stored, accessed and processed in a fixed format is
termed 'structured' data.
Over time, computer science has achieved great success in developing
techniques for working with this kind of data (where the format is well
known in advance) and for deriving value from it.
However, issues now arise when the size of such data grows to a huge
extent, with typical sizes in the range of multiple zettabytes.
Do you know? 10^21 bytes equal 1 zettabyte; that is, one billion terabytes form a zettabyte.
Looking at these figures one can easily understand why the name Big Data is
given and imagine the challenges involved in its storage and processing.
Do you know? Data stored in a relational database management system is
one example of 'structured' data.
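To make the idea of a fixed format concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are hypothetical and serve only as an illustration of structured data in a relational database.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (name, age) VALUES (?, ?)",
    [("Asha", 29), ("Ravi", 41)],
)

# Every row has the same columns, so a query can rely on the format.
rows = conn.execute("SELECT name, age FROM employee WHERE age > 30").fetchall()
print(rows)
conn.close()
```

Because the schema is known in advance, the query can filter on the age column without inspecting each record first.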
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
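The two records above carry their field names in tags rather than in a fixed table schema, which is typical of semi-structured data. The sketch below shows one way such records might be read with Python's standard xml.etree.ElementTree module. The enclosing <recs> root element and the function name parse_records are assumptions added so that the example is a single well-formed XML document.

```python
import xml.etree.ElementTree as ET

# The two records from the text, wrapped in a hypothetical <recs> root
# element so that they form one well-formed XML document.
XML = """
<recs>
  <rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
  <rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
</recs>
"""


def parse_records(xml_text):
    """Turn each <rec> element into a dict keyed by its child tag names."""
    root = ET.fromstring(xml_text.strip())
    return [{child.tag: child.text for child in rec}
            for rec in root.findall("rec")]


records = parse_records(XML)
print(records)
```

Note that every value comes back as text; the age would need an explicit int() conversion, since the tags describe the fields but not their types.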
Big Data Analytics
• Big Data analytics is the process of collecting, organizing and analyzing large
sets of data (called Big Data) to discover patterns and other useful
information.
• Big Data analytics can help organizations to better understand the information
contained within the data and will also help identify the data that is most
important to the business and future business decisions. Analysts working with
Big Data typically want the knowledge that comes from analyzing the data.
STAGES INVOLVED IN BIG DATA ANALYTICS
High-Performance Analytics Required:
To analyze such a large volume of data, Big Data analytics is
typically performed using specialized software tools and
applications for predictive analytics, data mining, text mining,
forecasting and data optimization.
Collectively these processes are separate but highly integrated
functions of high-performance analytics.
Using Big Data tools and software enables an organization to process
extremely large volumes of data that a business has collected to
determine which data is relevant and can be analyzed to drive better
business decisions in the future.
The Challenges:
For most organizations, Big Data analysis is a challenge. Consider the
sheer volume of data and the different formats of the data (both
structured and unstructured) that are collected across the entire
organization, and the many different ways different types of data
can be combined, contrasted and analyzed to find patterns and other
useful business information.
The first challenge is in breaking down data silos to access all data
an organization stores in different places and often in different
systems.
A second challenge is in creating platforms that can pull in unstructured
data as easily as structured data.
This massive volume of data is typically so large that it's difficult to
process using traditional database and software methods.
How Big Data Analytics is Used Today:
As the technology that helps an organization to break down data silos and analyze
data improves, business can be transformed in all sorts of ways.
Today's advances in analyzing big data allow researchers to decode human DNA in
minutes, predict where terrorists plan to attack, determine which gene is most likely
to be responsible for certain diseases and, of course, which ads you are most likely to
respond to on Facebook.
Another example comes from one of the biggest mobile carriers in the world.
France's Orange launched its Data for Development project by releasing subscriber
data for customers in the Ivory Coast.
The 2.5 billion records, which were made anonymous, included details on calls and
text messages exchanged between 5 million users.
Researchers accessed the data and sent Orange proposals for how the data could serve
as the foundation for development projects to improve public health and safety.
Proposed projects included one that showed how to improve public safety by tracking
cell phone data to map where people went after emergencies; another showed how to
use cellular data for disease containment.
The Benefits of Big Data Analytics:
Enterprises are increasingly looking to find actionable insights into their data.
Many big data projects originate from the need to answer specific business
questions. With the right big data analytics platforms in place, an enterprise
can boost sales, increase efficiency, and improve operations, customer service
and risk management.
Webopedia's parent company, QuinStreet, surveyed 540 enterprise
decision-makers involved in big data purchases to learn in which business
areas companies plan to use Big Data analytics to improve operations. About half of
all respondents said they were applying big data analytics to improve customer
retention, help with product development and gain a competitive advantage.
Notably, the business area getting the most attention relates to increasing
efficiency and optimizing operations. Specifically, 62 percent of respondents
said that they use big data analytics to improve speed and reduce complexity.
Application of Big Data
Here is the list of top Big Data applications in today’s world:
Big Data in Healthcare
Big Data in Education
Big Data in E-commerce
Big Data in Media and Entertainment
Big Data in Finance
Big Data in Travel Industry
Big Data in Telecom
Big Data in Automobile
Let’s discuss the applications of Big Data in detail.
Viewers these days want content that matches their own preferences and that is
new relative to what they watched last time. Earlier, companies broadcast
ads randomly, without any kind of analysis.
But since the advent of Big Data analytics in the industry, companies know
which kinds of ads attract a customer and the most appropriate time to
broadcast them for maximum attention.
Customers are now the real heroes of the media and entertainment industry,
courtesy of Big Data and analytics.
6. Big Data in Finance
The functioning of any financial organization depends heavily on its data, and safeguarding
that data is one of the toughest challenges any financial firm faces. Data has been the second
most important commodity for them after money.
Even before Big Data gained popularity, the finance industry was already technically advanced,
and financial firms were among the earliest adopters of Big Data and analytics.
Digital banking and payments are two of the most talked-about trends, and Big Data has been
at the heart of both. Big Data now drives key areas of financial firms such as
fraud detection, risk analysis, algorithmic trading, and customer satisfaction.
This has brought much-needed fluency to their systems. Firms can now focus more on
providing better services to their customers rather than on security issues.
Big Data has given the financial system answers to its hardest challenges.
7. Big Data in Travel Industry
While Big Data has spread like wildfire and various industries have been benefiting from it,
the travel industry was a bit late to realize its worth. Better late than never, though.
A stress-free traveling experience is still a daydream for many.
The arrival of Big Data is a ray of hope that promises to remove the hindrances to a
smooth traveling experience.
See how Big Data is revolutionizing the travel and tourism sector.
Through Big Data and analytics, travel companies can now offer a more
customized traveling experience. They can understand their customers'
requirements much better.
From providing the best offers to making suggestions in real time,
Big Data is certainly a perfect guide for any traveler. Big Data is gradually taking
the window seat in the travel industry.
8. Big Data in Telecom
The telecom industry is the soul of every digital revolution that takes place around the world.
The ever-increasing popularity of smartphones has flooded the telecom industry with
massive amounts of data.
This data is like a goldmine; telecom companies just need to know how to dig it properly.
Through Big Data and analytics, companies can provide customers with smooth
connectivity, removing the network barriers that customers have to deal with.
With the help of Big Data and analytics, companies can track the areas with the lowest and
the highest network traffic and take the necessary steps to ensure hassle-free network
connectivity.
As in other industries, Big Data has helped the telecom industry understand its customers
well.
Telecom companies now provide customers with offers that are as customized as possible.
Big Data has been behind the data revolution we are currently experiencing.
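The idea of tracking the areas with the lowest and highest network traffic can be sketched in a few lines. This is only a toy illustration under stated assumptions: the area names, the traffic figures and the function name traffic_extremes are all hypothetical, and a real telecom system would work on far larger data with distributed tools.

```python
from collections import defaultdict


def traffic_extremes(samples):
    """Sum traffic per area and return (quietest_area, busiest_area)."""
    totals = defaultdict(int)
    for area, megabytes in samples:
        totals[area] += megabytes
    quietest = min(totals, key=totals.get)
    busiest = max(totals, key=totals.get)
    return quietest, busiest


# Hypothetical (area, megabytes transferred) samples, for illustration only.
samples = [("North", 120), ("South", 900), ("East", 450),
           ("North", 60), ("South", 300)]
print(traffic_extremes(samples))
```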
9. Big Data in Automobile
“A business, like an automobile, has to be driven, in order to get results.” B. C. Forbes
Big Data has now taken the wheel of the automobile industry and is driving it
smoothly toward results that were previously unimaginable.
The automobile industry is on a roll, and Big Data is its wheels; one might even say Big Data
has given it wings. Big Data has helped the automobile industry achieve things that were
beyond our imagination.
From analyzing trends to understanding supply chain management, and from taking
care of customers to turning the dream of connected cars into reality, Big Data is
well and truly driving the automobile industry.
BIG DATA ANALYTICS TOOLS