BDA Introduction
“Extremely large data sets that may be analyzed computationally to reveal patterns, trends,
and associations, especially relating to human behavior and interaction, are known as Big
Data.”
Faculty Name
Examples Of Big Data
The following are some examples of Big Data:
The New York Stock Exchange generates about one terabyte of new trade data per day.
Social Media
Statistics show that more than 500 terabytes of new data are ingested into the databases of the social
media site Facebook every day. This data is mainly generated through photo and
video uploads, message exchanges, comments, etc.
A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With
many thousands of flights per day, data generation reaches many petabytes.
Tabular Representation of various Memory Sizes
Yottabyte    1,024 zettabytes    1,208,925,819,614,629,174,706,176 bytes
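The yottabyte row follows the binary convention, in which each unit is 1,024 times the previous one. A minimal sketch that reproduces the figure and the smaller units is given here; the names `UNITS` and `bytes_in` are illustrative and not part of the original slides.

```python
# Binary storage units: each unit is 1,024 times the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`, using powers of 1,024."""
    return 1024 ** UNITS.index(unit)


# Print one row per unit, largest last.
for unit in UNITS[1:]:
    print(f"1 {unit:<10} = {bytes_in(unit):,} bytes")
```

The last printed row matches the yottabyte value in the table, since 1,024 to the power of 8 is 1,208,925,819,614,629,174,706,176.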
Characteristics Of Big Data
• The following are known as “Big Data Characteristics”.
1. Volume
2. Velocity
3. Variety
4. Veracity
5. Value
1. Volume:
Volume means “how much data is generated”. Nowadays, organizations, people, and
systems generate vast amounts of data, ranging from terabytes (TB) to petabytes (PB) to
exabytes (EB) and more.
2. Velocity:
Velocity means “how fast data is produced”. Nowadays, organizations,
people, and systems generate huge amounts of data at a very
fast rate.
3. Variety:
Variety means “the different forms of data”. Nowadays, organizations,
people, and systems generate huge amounts of data at a very fast
rate and in many different formats.
4. Veracity:
Veracity means “the quality, correctness, or accuracy of captured data”.
Of the five Vs, it is the most important for any Big Data solution, because without
correct data there is no use in storing large amounts of data at a
fast rate and in different formats. The data should deliver correct business value.
5. Value:
After taking the first four Vs into account, there is one more V, which
stands for Value. A bulk of data with no value is of no good to
the company unless it is turned into something useful.
Data in itself is of no use or importance; it needs to be converted
into information to become valuable.
Types of Digital Data
1. Structured
2. Unstructured
3. Semi-structured
Structured
Any data that can be stored, accessed, and processed in a fixed format
is termed 'structured' data.
Over time, computer science has achieved great success in
developing techniques for working with this kind of data (where the format is well
known in advance) and for deriving value from it.
However, issues now arise when the size of such data grows
to a huge extent, with typical sizes in the range of multiple zettabytes.
Do you know? 10^21 bytes equal 1 zettabyte; in other words, one billion terabytes form a zettabyte.
Looking at these figures, one can easily understand why the name Big Data is
given and imagine the challenges involved in its storage and processing.
Do you know? Data stored in a relational database management system is
one example of 'structured' data.
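As a small illustration of structured data, the sketch below stores rows in a relational table with a fixed schema and queries them with SQL. It uses Python's built-in `sqlite3` module with an in-memory database; the table name `employee`, its columns, and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

# In-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is fixed in advance: every row has the same typed columns.
conn.execute(
    "CREATE TABLE employee (name TEXT NOT NULL, sex TEXT NOT NULL, age INTEGER NOT NULL)"
)

rows = [("Prashant Rao", "Male", 35), ("Seema R.", "Female", 41)]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

# Because the format is known, SQL can filter on any column directly.
older = conn.execute(
    "SELECT name FROM employee WHERE age > ? ORDER BY name", (40,)
).fetchall()
print(older)
```

Because every row follows the same schema, the query can filter on `age` without any parsing step.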
Unstructured
Any data whose form or structure is unknown is classified as unstructured data.
In addition to its huge size, unstructured data poses multiple challenges in terms
of processing it to derive value.
A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc.
Nowadays, organizations have a wealth of data available to them, but unfortunately they
do not know how to derive value from it because the data is in its raw,
unstructured form.
• Examples of Unstructured Data
The output returned by 'Google Search'
Semi-structured
Semi-structured data can contain both forms of data.
Semi-structured data looks structured in form, but it is not actually defined
by, for example, a table definition in a relational DBMS.
An example of semi-structured data is data represented in an XML file.
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
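The two records above can be read with a standard XML parser even though no table definition exists. A minimal sketch using Python's `xml.etree.ElementTree` follows; the wrapping `<records>` root element and the helper name `parse_records` are assumptions added here, because an XML document needs a single root.

```python
import xml.etree.ElementTree as ET

# The two sample records, written without the line breaks of the slide.
RECORDS = (
    "<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>"
    "<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>"
)


def parse_records(xml_fragment: str) -> list[dict]:
    """Wrap the fragment in a root element and return one dict per <rec>."""
    root = ET.fromstring(f"<records>{xml_fragment}</records>")
    # The tags themselves carry the structure; no schema is declared anywhere.
    return [{child.tag: child.text for child in rec} for rec in root.findall("rec")]


people = parse_records(RECORDS)
for person in people:
    print(person["name"], person["sex"], int(person["age"]))
```

Note that every value arrives as text (the age is converted explicitly), which is one practical difference from a typed relational column.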
Traditional Data v/s Big Data
• Traditional data: Traditional data is the structured data that is maintained
by all types of businesses, from very small to large
organizations. In a traditional database system, a centralized database
architecture is used to store and maintain the data in a fixed format or in fields in a
file. Structured Query Language (SQL) is used for managing and accessing the
data.
• Traditional data is characterized by its high level of organization and structure,
which makes it easy to store, manage, and analyze. Traditional data analysis
techniques involve using statistical methods and visualizations to identify
patterns and trends in the data.
• Traditional data is often collected and managed by enterprise resource
planning (ERP) systems and other enterprise-level applications. This data is
critical for businesses to make informed decisions and drive performance
improvements.
06-01-2021 Faculty Name
• Big data: We can consider big data an advanced form of traditional data. Big data
deals with data sets that are too large or complex to manage in traditional
data-processing application software. It covers large volumes of structured,
semi-structured, and unstructured data. Volume, Velocity, Variety, Veracity, and
Value are the five V characteristics of big data. Big data refers not only to a large
amount of data but also to extracting meaningful information by analyzing huge
and complex data sets. Of the five Vs, the first three are the most commonly cited:
volume refers to the vast amount of data that is generated and collected;
velocity refers to the speed at which data is generated and must be processed; and
variety refers to the many different types and formats of data that must be analyzed,
including structured, semi-structured, and unstructured data.
• Due to the size and complexity of big data sets, traditional data management tools
and techniques are often inadequate for processing and analyzing the data. Big data
technologies, such as Hadoop, Spark, and NoSQL databases, have emerged to help
organizations store, manage, and analyze large volumes of data.
The main differences between traditional data and big data are as follows:
• Volume: Traditional data typically refers to small to medium-sized datasets that can be easily
stored and analyzed using traditional data processing technologies. In contrast, big data refers to
extremely large datasets that cannot be easily managed or processed using traditional
technologies.
• Variety: Traditional data is typically structured, meaning it is organized in a predefined manner
such as tables, columns, and rows. Big data, on the other hand, can be structured, unstructured,
or semi-structured, meaning it may contain text, images, videos, or other types of data.
• Velocity: Traditional data is usually static and updated on a periodic basis. In contrast, big data is
constantly changing and updated in real-time or near real-time.
• Complexity: Traditional data is relatively simple to manage and analyze. Big data, on the other
hand, is complex and requires specialized tools and techniques to manage, process, and analyze.
• Value: Traditional data typically has a lower potential value than big data because it is limited in
scope and size. Big data, on the other hand, can provide valuable insights into customer
behavior, market trends, and other business-critical information.
• Before big data technologies were introduced, data was managed with
general-purpose programming languages and basic structured query languages.
However, these were not efficient enough to handle the continuous growth
of each organization's information and data. It therefore became very important
to introduce efficient and stable technology that can handle such huge data
and meet the requirements of clients and large organizations
responsible for data production and control. "Big data technologies" is
the buzzword often used in recent times for all such needs.
• This section discusses the leading technologies that have
helped Big Data reach greater heights. Before discussing
them, let us first briefly understand what Big Data
technology is.
Some common examples that involve analytical Big Data technologies are listed
below:
Stock market data
Weather forecasting data and time series analysis
Medical health records, where doctors can personally monitor the health status of an
individual
Space mission databases, where every piece of information about a mission is
important
Data Storage
Let us first discuss leading Big Data Technologies that come under Data Storage:
•Hadoop: When it comes to handling big data, Hadoop is one of the leading
technologies that come into play. It is built around the MapReduce
programming model and is mainly used for batch processing.
The Hadoop framework was introduced to store and process data in a distributed
environment on clusters of commodity hardware, using a simple programming model.
Hadoop is also well suited for storing and analyzing data
from many machines at high speed and low cost. That is why Hadoop is
known as one of the core components of big data technologies. The Apache
Software Foundation released Hadoop 1.0 in Dec 2011. Hadoop is written in the Java
programming language.
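To make the MapReduce idea concrete, here is a minimal single-process sketch of a word count in plain Python. It only imitates the map, shuffle, and reduce phases; it does not use Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster the map calls would run in parallel on many machines.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is valuable"])
print(counts)
```

In a distributed setting each document (or block of a file) would be mapped on the machine that stores it, and only the grouped pairs would travel over the network.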
Cassandra: Cassandra is one of the leading big data technologies among the
top NoSQL databases. It is open-source, distributed, and uses a wide-column
storage model. It is freely available and provides high availability with no single
point of failure. This helps in handling data efficiently across large
clusters of commodity servers. Cassandra's essential features include fault
tolerance, scalability, MapReduce support, a distributed design, eventual
consistency, a query language (CQL), tunable consistency, and multi-datacenter
replication.
Cassandra was originally developed at Facebook for its inbox search feature and was
released as open source in 2008; it later became an Apache Software Foundation
project. It is written in the Java programming language.
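Tunable consistency can be illustrated with simple quorum arithmetic: a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below is plain Python, not the Cassandra driver API, and the function names are illustrative assumptions.

```python
def quorum(replication_factor: int) -> int:
    """Size of a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                  # quorum size for 3 replicas
print(is_strongly_consistent(q, q, rf))   # quorum reads + quorum writes
print(is_strongly_consistent(1, 1, rf))   # single-replica reads and writes
```

With three replicas, quorum reads and writes overlap on at least one replica, whereas reading and writing a single replica gives only eventual consistency in exchange for lower latency.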
Data Mining
Presto: Presto is an open-source, distributed SQL query engine developed to run interactive
analytical queries against very large data sources. The size of the data sources can vary from
gigabytes to petabytes. Presto can query data in Cassandra, Hive, relational databases,
and proprietary data storage systems.
Presto is a Java-based query engine that was originally developed at Facebook and released as
open source in 2013. Companies like Repro, Netflix, Airbnb, Facebook, and Checkr use this big data
technology.
RapidMiner: RapidMiner is data science software that offers a robust and
powerful graphical user interface to create, deliver, manage, and maintain predictive analytics.
Using RapidMiner, we can create advanced workflows, with scripting support in a variety of
programming languages.
Data Analytics
Now, let us discuss leading Big Data Technologies that come under Data Analytics:
Apache Kafka: Apache Kafka is a popular distributed streaming platform. It is
primarily known for three core capabilities: publishing and subscribing to streams of
records, storing those streams durably, and processing them as they occur. It is also
described as an asynchronous message broker that can ingest and process
real-time streaming data. In this respect the platform is similar to an enterprise messaging
system or message queue.
Kafka also provides a configurable retention period, and data is transmitted through a
producer-consumer mechanism. Kafka has received many enhancements to date, and its
ecosystem includes additional components such as Schema Registry, KTables, and KSQL.
It is written in Scala and Java, was originally developed at LinkedIn, and was released as
open source in 2011. Some top companies using the Apache Kafka platform include
Twitter, Spotify, Netflix, Yahoo, and LinkedIn.
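The producer-consumer mechanism and the retention period can be pictured with a toy append-only log. The sketch below is an in-memory illustration in plain Python; the class `TopicLog` and its methods are hypothetical and are not the Kafka client API.

```python
class TopicLog:
    """Toy append-only log (hypothetical; not the Kafka client API)."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self._records = []      # list of (offset, timestamp, value)
        self._next_offset = 0

    def produce(self, value, timestamp):
        """Append a record and return the offset it was assigned."""
        offset = self._next_offset
        self._records.append((offset, timestamp, value))
        self._next_offset += 1
        return offset

    def expire(self, now):
        """Drop records older than the retention period."""
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]

    def consume(self, from_offset):
        """Return (offset, value) pairs at or after `from_offset`."""
        return [(off, val) for off, _, val in self._records if off >= from_offset]


log = TopicLog(retention_seconds=60)
log.produce("login", timestamp=0)
log.produce("click", timestamp=30)
log.produce("logout", timestamp=90)

first_read = log.consume(from_offset=0)   # all three records are still retained
log.expire(now=100)                       # records older than t=40 are dropped
second_read = log.consume(from_offset=0)
print(first_read)
print(second_read)
```

Reading does not remove records: consumers track their own offsets, and only the retention policy deletes data. This is the main difference from a traditional message queue.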
Splunk: Splunk is a popular software platform for capturing, correlating,
and indexing real-time streaming data in searchable repositories. Splunk can also produce
graphs, alerts, summarized reports, data visualizations, and dashboards from the related
data. It is mainly beneficial for generating business insights and web analytics. Besides,
Splunk is also used for security, compliance, and application management and control.
Splunk is developed by Splunk Inc., which was founded in 2003. It is written in a combination of AJAX,
Python, C++, and XML. Companies such as Trustwave, QRadar, and 1Labs use
Splunk for their analytical and security needs.
KNIME: KNIME is used to draw visual data flows, execute specific steps, and analyze the
resulting models, results, and interactive views. It also allows us to execute all the analysis
steps together. It has an extension mechanism through which plugins can add
further features and functionality.
KNIME is based on Eclipse and written in the Java programming language. It was developed in
2008 by KNIME Company. Companies that make use of KNIME include
Harnham, Tyler, and Paloalto.
Spark: Apache Spark is one of the core big data technologies and is widely used by
top companies. Spark is known for its in-memory computing capabilities, which improve
overall processing speed. It also provides a generalized execution model to support more
applications, and it includes high-level APIs (e.g., Java, Scala, and Python) to ease the
development process.
Spark also allows users to process real-time streaming data using batching and
windowing operations. Datasets and DataFrames are built on top of RDDs (resilient
distributed datasets), the basic abstraction of Spark Core. Components such as
Spark MLlib, GraphX, and SparkR support machine learning, graph processing, and data science.
Spark is written mainly in Scala and offers APIs in Java, Scala, Python, and R. It was started in
2009 at UC Berkeley's AMPLab and later became an Apache Software Foundation project.
Companies like Amazon, ORACLE, CISCO, VerizonWireless, and Hortonworks use this big
data technology.
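The batching and windowing idea mentioned above can be sketched without Spark itself. The plain-Python example below groups timestamped events into fixed, non-overlapping (tumbling) windows and counts the events in each; the function name `tumbling_window_counts` and the sample events are assumptions for illustration, not Spark's API.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed-size window, keyed by the window's start time."""
    counts = Counter()
    for timestamp, _payload in events:
        # Integer division assigns each event to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# (timestamp in seconds, payload) pairs, as they might arrive from a stream.
events = [(1, "a"), (4, "b"), (11, "c"), (12, "d"), (19, "e"), (25, "f")]
windows = tumbling_window_counts(events, window_seconds=10)
print(windows)
```

A streaming engine applies the same grouping continuously to small batches of incoming data instead of to a finished list.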
R-Language: R is a programming language mainly used for statistical computing and
graphics. It is a free software environment used by leading data miners, practitioners, and statisticians.
The language is primarily beneficial in the development of statistical software and in data analytics.
R version 1.0.0 was released in Feb 2000. It is written mainly in C, Fortran, and R itself. Companies like
Barclays, American Express, and Bank of America use R for their data analytics needs.
Blockchain: Blockchain is a technology that can be used in several applications across different
industries, such as finance, supply chain, and manufacturing. It is primarily used in processing
operations like payments and escrow, which helps reduce the risk of fraud. Besides, it improves
overall transaction processing speed, increases financial privacy, and internationalizes
markets. It is also used to meet the needs for a shared ledger, smart contracts, privacy, and
consensus in a business network environment.
Blockchain technology was first described in 1991 by two researchers, Stuart Haber and W. Scott
Stornetta. However, blockchain had its first real-world application in Jan 2009, when Bitcoin was
launched. It is a specific type of database that can be implemented in languages such as Python, C++,
and JavaScript. ORACLE, Facebook, and MetLife are a few of the top companies using blockchain technology.
Data Visualization
Let us discuss leading Big Data Technologies that come under Data Visualization:
Tableau: Tableau is one of the fastest and most powerful data visualization tools used in the
business intelligence industry. It helps in analyzing data very quickly. Tableau helps in
creating visualizations and insights in the form of dashboards and worksheets.
Tableau is developed and maintained by Tableau Software, a company founded in 2003.
It is written using multiple languages, such as Python, C, C++, and Java. Other tools in the same
category include Cognos, Qlik, and ORACLE Hyperion.
Plotly: As the name suggests, Plotly is best suited for plotting or creating graphs and related
components quickly and efficiently. It offers libraries and APIs for
MATLAB, Python, Julia, REST, Arduino, R, Node.js, etc. It also supports interactive, styled graphs in
Jupyter Notebook and PyCharm.
Plotly was introduced in 2012 by the Plotly company. It is based on JavaScript. Paladins and Bitbank are
among the companies that make use of Plotly.
High-Performance Analytics Required:
To analyze such a large volume of data, Big Data analytics is typically performed
using specialized software tools and applications for predictive analytics, data mining,
text mining, forecasting, and data optimization.
Collectively, these processes are separate but highly integrated functions of
high-performance analytics.
Using Big Data tools and software enables an organization to process the extremely large
volumes of data that a business has collected, to determine which data is relevant and can
be analyzed to drive better business decisions in the future.
The Challenges:
For most organizations, Big Data analysis is a challenge. Consider the sheer volume
of data and the different formats of the data (both structured and unstructured)
that is collected across the entire
organization, and the many different ways different types of data can be combined,
contrasted, and analyzed to find patterns and other useful business information.
The first challenge is breaking down data silos to access all the data an
organization stores in different places and often in different systems.
A second challenge is creating platforms that can pull in unstructured data as easily
as structured data.
This volume of data is typically so large that it is difficult to process using
traditional database and software methods.
The Benefits of Big Data Analytics:
Enterprises are increasingly looking to find actionable insights into their data.
Many big data projects originate from the need to answer specific business
questions. With the right big data analytics platforms in place, an enterprise can
boost sales, increase efficiency, and improve operations, customer service and
risk management.
Webopedia parent company, QuinStreet, surveyed 540 enterprise decision-
makers involved in big data purchases to learn which business areas companies
plan to use Big Data analytics to improve operations. About half of all
respondents said they were applying big data analytics to improve customer
retention, help with product development and gain a competitive advantage.
Notably, the business area getting the most attention relates to increasing
efficiency and optimizing operations. Specifically, 62 percent of respondents said
that they use big data analytics to improve speed and reduce complexity.
How Big Data Analytics is Used Today:
As the technology that helps an organization to break down data silos and analyze
data improves, business can be transformed in all sorts of ways.
Today's advances in analyzing big data allow researchers to decode human DNA in
minutes, predict where terrorists plan to attack, determine which gene is most likely
to be responsible for certain diseases and, of course, which ads you are most likely
to respond to on Facebook.
Another example comes from one of the biggest mobile carriers in the world.
France's Orange launched its Data for Development project by releasing subscriber
data for customers in the Ivory Coast.
The 2.5 billion records, which were made anonymous, included details on calls
and text messages exchanged between 5 million users.
Researchers accessed the data and sent Orange proposals for how the data could serve
as the foundation for development projects to improve public health and safety.
Proposed projects included one that showed how to improve public safety by tracking
cell phone data to map where people went after emergencies; another showed how to
use cellular data for disease containment.
Application of Big Data
Top Big Data applications in today’s world:
Big Data in Healthcare
Big Data in Education
Big Data in E-commerce
Big Data in Media and Entertainment
Big Data in Finance
Big Data in Travel Industry
Big Data in Telecom
Big Data in Automobile
1. Big Data in Retail
Even a minute detail about any customer has now become significant for retailers. They are
now closer to their customers than they have ever been. This empowers them to provide
customers with more personalized services and to predict their demands in advance.
This helps them build a loyal customer base. Some of the biggest names in the retail
world, like Walmart, Sears Holdings, Costco, Walgreens, and many more, now have Big
Data as an integral part of their organizations.
A study by the National Retail Federation estimated that sales in November and December
are responsible for as much as 30% of annual retail sales.
2. Big Data in Healthcare
Big Data and healthcare are an ideal match. The amount of data the healthcare
industry has to deal with is enormous.
Gone are the days when healthcare practitioners were incapable of harnessing this data.
From the search for a cure for cancer to detecting Ebola and much more, Big Data has been
applied widely, and researchers have seen some life-saving outcomes through it.
Big Data and analytics have enabled practitioners to build more personalized
medications. Data analysts are harnessing this data to develop more effective
treatments. Identifying unusual patterns in the use of certain medicines to discover
more economical solutions is a common practice these days.
Smart wearables have gradually gained popularity and are the latest trend among
people of all age groups. These devices generate massive amounts of real-time data in
the form of alerts, which helps in saving people's lives.
3. Big Data in Education
When you ask people about the use of the data that an educational institute gathers, the
majority will give the same answer: the institute or the student might
need it for future reference.
Perhaps you had the same perception of this data. But the fact is, this data
holds enormous importance. Big Data is key to shaping people's future
and has the power to transform the education system for the better.
Some of the top universities are using Big Data as a tool to renovate their academic
curriculum. Additionally, universities can track student dropout rates
and take the required measures to reduce them as much as possible.
5. Big Data in Media and Entertainment
The media and entertainment industry is all about art, and employing Big Data in it is
itself an art. Art and science are often considered two completely
contrasting domains, but when employed together they make a powerful combination, and Big
Data's role in the media industry is a perfect example of it.
Viewers these days want content that matches their preferences and that is
new relative to what they saw previously. Earlier, companies
broadcast ads randomly, without any kind of analysis.
But since the advent of Big Data analytics in the industry, companies are
aware of the kinds of ads that attract a customer and the most appropriate time to
broadcast them for maximum attention.
Customers are now the real heroes of the media and entertainment industry,
courtesy of Big Data and analytics.
4. Big Data in E-commerce
One of the greatest revolutions this generation has seen is that of E-commerce. It is now
part and parcel of our daily life. Whenever we need to buy something, the first thought
that comes to mind is E-commerce. And, not surprisingly, Big Data has been the face
of it.
The fact that some of the biggest E-commerce companies in the world, such as Amazon, Flipkart,
Alibaba, and many more, now rely on Big Data and analytics is itself evidence of the
popularity Big Data has gained in recent times.
Big Data is now as important as any other asset in these organizations. Amazon, the biggest
E-commerce firm in the world and one of the pioneers of Big Data and analytics, has Big Data as
the backbone of its system. Flipkart, the biggest E-commerce firm in India, has one of the most
robust data platforms in the country.
The recommendation engine is one of the most impressive applications of Big Data.
It gives companies a 360-degree view of their customers.
Companies then suggest products to customers accordingly. Customers now experience more personalized
services than they have ever had. Big Data has completely redefined people's online shopping
experiences.
6. Big Data in Finance
The functioning of any financial organization depends heavily on its data, and safeguarding that
data is one of the toughest challenges any financial firm faces. Data has been the second most
important commodity for them after money.
Even before Big Data gained popularity, the finance industry was already a heavy user of
technology. Financial firms were also among the earliest adopters of Big Data
and analytics.
Digital banking and payments are two of the most popular buzzwords around, and Big
Data has been at the heart of both. Big Data now plays a leading role in key areas of financial
firms such as fraud detection, risk analysis, algorithmic trading, and customer satisfaction.
This has made their systems run more smoothly. They are now able to focus
more on providing better services to their customers rather than on security issues.
Big Data has given the financial system answers to some of its hardest challenges.
7. Big Data in Travel Industry
While Big Data has been spreading like wildfire and various industries have been benefiting
from it, the travel industry was a bit late to realize its worth. Better late than never, though.
Having a stress-free traveling experience is still a daydream for many.
Big Data's arrival is a ray of hope that can remove the
hindrances to a smooth traveling experience.
Through Big Data and analytics, travel companies are now able to offer a more
customized traveling experience. They are now able to understand their customers'
requirements much better.
From providing the best offers to making suggestions in real time,
Big Data is certainly a perfect guide for any traveler. Big Data is gradually taking
the window seat in the travel industry.
8. Big Data in Telecom
The telecom industry is the soul of every digital revolution that takes place around the world.
The ever-increasing popularity of smartphones has flooded the telecom industry with
massive amounts of data.
This data is like a goldmine; telecom companies just need to know how to mine it properly.
Through Big Data and analytics, companies are able to provide customers with smooth
connectivity, removing the network barriers that customers have to deal with.
With the help of Big Data and analytics, companies can now track the areas with the lowest as
well as the highest network traffic and act accordingly to ensure hassle-free network
connectivity.
As in other industries, Big Data has helped the telecom industry understand its customers
well.
Telecom companies now provide customers with offers that are as customized as possible.
Big Data has been behind the data revolution we are currently experiencing.
9. Big Data in Automobile
“A business like an automobile, has to be driven, in order to get results.” B.C. Forbes
Big Data has now taken a central role in the automobile industry and is driving it
smoothly, towards results that were previously out of reach.
The automobile industry is on a roll, and Big Data is its wheels. Big Data has
helped the automobile industry achieve things that were
beyond our imagination.
From analyzing trends to understanding supply chain management, and from taking
care of customers to turning our wildest dream of connected cars into a reality, Big Data is
well and truly driving the automobile industry forward.
Thank You!