Unit I Bda
-comprehensive one including data, data frameworks, along with the tools
and techniques used to process and analyse the data.
The History of Big Data
Although the concept of big data itself is relatively new, the origins of large data sets go back
to the 1960s and '70s when the world of data was just getting started with the first data centers
and the development of the relational database. Around 2005, people began to realize just how
much data users generated through Facebook, YouTube, and other online services. Hadoop (an
open-source framework created specifically to store and analyze big data sets) was developed
that same year. NoSQL also began to gain popularity during this time.
The development of open-source frameworks such as Hadoop (and, more recently, Spark) was
essential for the growth of big data because they make big data easier to work with and cheaper
to store. In the years since then, the volume of big data has skyrocketed. Users are still
generating huge amounts of data, but it is not just humans who are doing it.
With the advent of the Internet of Things (IoT), more objects and devices are connected to the
internet, gathering data on customer usage patterns and product performance. The emergence
of machine learning has produced still more data.
While big data has come far, its usefulness is only just beginning. Cloud computing has
expanded big data possibilities even further. The cloud offers truly elastic scalability, where
developers can simply spin up ad hoc clusters to test a subset of data.
BIG DATA ANALYTICS
Benefits of Big Data and Data Analytics
information.
—which means a completely
different approach to tackling problems.
Types of Big Data
Now that we know what big data is, let’s look at the types of big data:
a) Structured
Structured data is data that can be processed, stored, and retrieved in a fixed format. It refers
to highly organized information that can be readily stored in and accessed from a database by
simple search engine algorithms. For instance, the employee table in a company database is
structured: the employee details, their job positions, their salaries, etc., are all present in an
organized manner.
b) Unstructured
Unstructured data is data that lacks any specific form or structure. This makes it difficult
and time-consuming to process and analyze. Email is a common example of unstructured data.
c) Semi-structured
Semi-structured data is the third type of big data. It combines both of the formats mentioned
above, that is, structured and unstructured data. To be precise, it refers to data that has not
been organized into a particular repository (database) but still contains tags or other markers
that separate the individual elements within the data.
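The three types can be illustrated with a minimal Python sketch. All records, names, and addresses below are made up for illustration. The structured example stands in for the employee table mentioned above, the semi-structured example uses JSON keys as the "tags", and the unstructured example is free text from which an element has to be inferred with a pattern.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields
# (a hypothetical employee table, here held as CSV text).
structured_csv = (
    "emp_id,name,position,salary\n"
    "1,Asha,Engineer,85000\n"
    "2,Ravi,Analyst,62000\n"
)
rows = list(csv.DictReader(io.StringIO(structured_csv)))
salaries = [int(row["salary"]) for row in rows]

# Semi-structured: no fixed schema, but keys (tags) label each element.
semi_structured_json = (
    '{"emp_id": 3, "name": "Meera", "skills": ["SQL", "Spark"], '
    '"contact": {"email": "meera@example.com"}}'
)
record = json.loads(semi_structured_json)
email_from_json = record["contact"]["email"]

# Unstructured: free text with no labels; any structure must be inferred,
# here with a simple (not fully general) email pattern.
unstructured_text = "Hi team, please send the Q3 report to ravi@example.com by Friday. Thanks!"
emails_in_text = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", unstructured_text)
```

Note how the structured and semi-structured values are reached by name, while the unstructured text needs pattern matching, which is one reason it is harder to process.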
Characteristics of Big Data
Back in 2001, analyst Doug Laney (then at META Group, which Gartner later acquired) listed the
3 ‘V’s of Big Data: Variety, Velocity, and Volume.
These characteristics on their own are enough to recognize what big data is. Let’s look at them in depth:
a) Variety
Variety refers to the structured, unstructured, and semi-structured data that is gathered
from multiple sources. In the past, data was collected mainly from spreadsheets and
databases; today it comes in an array of forms such as emails, PDFs, photos, videos, audio,
social media posts, and much more.
b) Velocity
Velocity refers to the speed at which data is being created in real time. In a broader
perspective, it comprises the rate of change, the linking of incoming data sets that arrive at
varying speeds, and activity bursts.
c) Volume
Big Data implies huge volumes of data generated on a daily basis from various sources such as
social media platforms, business processes, machines, networks, and human interactions. Such
large amounts of data are stored in data warehouses.
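To make the idea of velocity and activity bursts more concrete, here is a small Python sketch. The arrival times are invented, and the rule "a burst is a one-second window with more than twice the average rate" is an arbitrary assumption for illustration, not a standard definition.

```python
from collections import Counter

# Hypothetical arrival times (in seconds) of incoming events.
event_times = [0.1, 0.4, 0.9, 1.2, 2.0, 2.1, 2.2, 2.3, 2.5, 2.8, 3.7]

# Count how many events fall into each one-second window.
events_per_second = Counter(int(t) for t in event_times)

# Average rate over the observed windows (second 0 up to the last second seen).
window_count = max(events_per_second) + 1
average_rate = len(event_times) / window_count

# Flag windows whose count exceeds twice the average (assumed threshold).
bursts = [sec for sec, n in sorted(events_per_second.items()) if n > 2 * average_rate]
```

Here the average is 2.75 events per second, and only the window starting at second 2, which holds six events, is flagged as a burst.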
The challenges in Big Data are the real implementation hurdles. They need prompt attention,
because if they are not handled the technology may fail, with unpleasant results. Big data
challenges include storing and analyzing extremely large and fast-growing data.
Big Challenges with Big Data
This article explores some of the most pressing challenges associated with Big Data and
offers potential solutions for overcoming them.
Data Volume: Managing and Storing Massive Amounts of Data
Data Variety: Handling Diverse Data Types
Data Velocity: Processing Data in Real-Time
Data Veracity: Ensuring Data Quality and Accuracy
Data Security and Privacy: Protecting Sensitive Information
Data Integration: Combining Data from Multiple Sources
Data Analytics: Extracting Valuable Insights
Data Governance: Establishing Policies and Standards
The term Big Data refers to large amounts of complex and unprocessed data. Nowadays,
companies use Big Data to make their business better informed and to support business
decisions, by enabling data scientists, analytical modelers, and other professionals to analyse
large volumes of transactional data. Big data is the valuable and powerful fuel that drives the
large IT industries of the 21st century, and it is a spreading technology used in every business
sector. In this section, we will discuss applications of Big Data.
Travel and tourism are major users of Big Data. It enables us to forecast the travel facilities
required at multiple locations, improve business through dynamic pricing, and much more.
The financial and banking sectors use big data technology extensively. Big data analytics
helps banks understand customer behaviour on the basis of investment patterns, shopping trends,
motivation to invest, and inputs obtained from personal or financial backgrounds.
Healthcare
Big data has started making a massive difference in the healthcare sector. With the help
of predictive analytics, medical professionals and healthcare personnel can provide
personalized healthcare to individual patients.
Telecommunications and the multimedia sector are among the main users of Big Data. The
volume of data they generate is measured in zettabytes, and handling data at this scale
requires big data technologies.
The government and the military also use big data technology heavily. Governments keep
records on a very large scale, and in the military a fighter plane may need to
process petabytes of data.
Government agencies use Big Data to run their operations, manage utilities, deal with
traffic jams, and counter crimes such as hacking and online fraud.
Aadhar Card: The government has a record of 1.21 billion citizens. This vast data is stored
and analyzed to find things like the number of youth in the country, and some schemes are built
to target the maximum population. Data of this size cannot be stored in a traditional database,
so it is stored and analyzed using Big Data Analytics tools.
E-commerce
E-commerce is also an application of Big Data. Maintaining relationships with customers is
essential for the e-commerce industry. With Big Data, e-commerce websites can develop
marketing ideas for selling merchandise to customers, manage transactions, and implement
better strategies and innovative ideas to improve their business.
Social Media
Social media is the largest data generator. Statistics show that 500+ terabytes of fresh data
are generated from social media daily, particularly on Facebook. The data mainly consists of
videos, photos, message exchanges, etc. A single activity on a social media site generates a
large amount of data, which is stored and processed when required. Because the stored data
runs to terabytes (TB), processing it takes a lot of time. Big Data technologies are a solution
to this problem.
-screen or on
hardcopy
* Apache Spark is a powerful open-source big data analytics tool. It offers over 80 high-level
operators that make it easy to build parallel apps. It is used at a wide range of organizations
to process large datasets.
Features:
Helps to run an application in a Hadoop cluster up to 100 times faster in memory, and ten
times faster on disk
* Plotly is an analytics tool that lets users create charts and dashboards to share online.
Features:
Eye-catching and informative graphics
Fine-grained information on data provenance
Offers unlimited public file hosting through its free community plan
* Lumify is a big data fusion, analysis, and visualization platform. It helps users to discover
connections and explore relationships in their data via a suite of analytic options.
Features:
It provides a variety of options for analyzing the links between entities on the graph
textual content, images,
and videos
* IBM SPSS Modeler is a predictive big data analytics platform. It offers predictive models
and delivers them to individuals, groups, systems, and the enterprise. It has a range of
advanced algorithms and analysis techniques.
Features:
The term NoSQL originally referred to “non-SQL” or “non-relational” databases, but the term
has since evolved to mean “not only SQL,” as NoSQL databases have expanded to include a
wide range of different database architectures and data models.
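The difference between a relational model and one of the NoSQL data models can be sketched in Python. This is only an illustration under stated assumptions: the table, field names, and records are hypothetical, and a plain dictionary stands in for a document store instead of a real NoSQL database.

```python
import json
import sqlite3

# Relational model: a fixed schema is declared up front, and every row
# must fit the same columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users (id, name, city) VALUES (?, ?, ?)", (1, "Asha", "Pune"))
relational_row = conn.execute(
    "SELECT name, city FROM users WHERE id = ?", (1,)
).fetchone()
conn.close()

# Document model (one of several NoSQL data models): each record is a
# self-describing document, and two documents need not share the same fields.
# The dict below is a stand-in for a document store, not a real database.
document_store = {
    "user:1": {"name": "Asha", "city": "Pune"},
    "user:2": {"name": "Ravi", "tags": ["premium"], "devices": [{"type": "phone"}]},
}
serialized = json.dumps(document_store["user:2"])  # documents are often kept as JSON
fields_user1 = set(document_store["user:1"])
fields_user2 = set(document_store["user:2"])
```

The second document carries nested and optional fields without any change to a schema, which is the kind of flexibility that non-relational data models trade against the guarantees of a fixed relational schema.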