Unit I - BDA
UNIT-I
Big data:
What is Big Data?
Big Data is a collection of large datasets that cannot be processed using traditional computing
techniques. For example, the volume of data that Facebook or YouTube needs to collect and manage on
a daily basis can fall under the category of Big Data. However, Big Data is not only about scale and
volume; it also involves one or more of the following aspects: Velocity, Variety, Volume, and
Complexity.
1) Variety
Variety of Big Data refers to structured, unstructured, and semi-structured data that is gathered from multiple
sources. While in the past data could only be collected from spreadsheets and databases, today data comes
in an array of forms such as emails, PDFs, photos, videos, audio, social media posts, and much more.
2) Velocity
Velocity essentially refers to the speed at which data is being created in real time. In a broader perspective, it
comprises the rate of change, the linking of incoming data sets arriving at varying speeds, and activity bursts.
3) Volume
We already know that Big Data indicates huge ‘volumes’ of data being generated on a daily basis
from various sources such as social media platforms, business processes, machines, networks, human
interactions, etc. Such a large amount of data is stored in data warehouses.
Structured
By structured data, we mean data that can be processed, stored, and retrieved in a fixed format. It
refers to highly organized information that can be readily and seamlessly stored and accessed from a
database by simple search engine algorithms. For instance, the employee table in a company database
is structured: the employee details, their job positions, their salaries, and so on are all present in an
organized manner.
Unstructured
Unstructured data refers to the data that lacks any specific form or structure whatsoever. This makes
it very difficult and time-consuming to process and analyze unstructured data. Email is an example of
unstructured data.
Semi-structured
Semi-structured data pertains to data containing both of the formats mentioned above, that is,
structured and unstructured data. To be precise, it refers to data that, although not classified
under a particular repository (database), nevertheless contains vital information or tags that segregate
individual elements within the data.
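To make the distinction concrete, here is a minimal, hypothetical illustration (in Python) of the same employee information expressed as structured, semi-structured, and unstructured data; all names and values are made up.

# Structured: a fixed-format record, like one row of an employee table.
structured_row = ("E101", "Asha Rao", "Data Engineer", 65000)

# Semi-structured: not stored in a relational table, but keys (tags)
# still segregate the individual elements within the data.
semi_structured_doc = {
    "employee_id": "E101",
    "name": "Asha Rao",
    "skills": ["Python", "Hadoop"],            # variable-length, nested field
    "contact": {"email": "asha@example.com"},  # nested sub-document
}

# Unstructured: free text with no predefined fields at all.
unstructured_note = "Asha joined in June and has been mentoring the new analytics hires."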
The process of extracting and analyzing data from extensive big data sources is complex
and can be frustrating and time-consuming. These complications can be resolved if organizations
encompass all the necessary considerations of big data, take into account relevant data sources, and deploy
them in a manner which is well tuned to their organizational goals.
Before the modern day ubiquity of online and mobile applications, databases processed
straightforward, structured data. Data models were relatively simple and described a set of relationships
between different data types in the database.
Unstructured data, in contrast, refers to data that doesn’t fit neatly into the traditional row and
column structure of relational databases. Examples of unstructured data include: emails, videos, audio files,
web pages, and social media messages. In today’s world of Big Data, most of the data that is created is
unstructured with some estimates of it being more than 95% of all data generated.
As a result, enterprises are looking to this new generation of databases, known as NoSQL, to address
unstructured data. MongoDB stands as a leader in this movement with over 10 million downloads and
hundreds of thousands of deployments. As a document database with flexible schema, MongoDB was built
specifically to handle unstructured data. MongoDB’s flexible data model allows development without a
predefined schema, which is particularly valuable when most of the data in your system is unstructured.
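As a rough sketch of what that flexible data model looks like in practice, the Python snippet below uses PyMongo to store documents with different fields in a single collection. It assumes a MongoDB server running locally and the pymongo package installed; the database, collection, and field names are hypothetical.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
collection = client["demo_db"]["messages"]

# No schema is declared up front: documents in the same collection can carry
# different fields, which suits unstructured or rapidly evolving data.
collection.insert_one({"type": "email", "subject": "Invoice", "body": "Please see the attachment."})
collection.insert_one({"type": "social_post", "text": "Loving the new release!", "likes": 42})

# Query by whatever fields a given kind of document happens to have.
for doc in collection.find({"type": "email"}):
    print(doc["subject"])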
With the 1880 census, just counting people was not enough information for the U.S. government to
work with—particular elements, such as age, sex, occupation, education level, and even the “number of
insane people in household,” had to be accounted for. That information had intrinsic value to the process,
but only if it could be tallied, tabulated, analyzed, and presented. New methods of relating the data to other
data collected came into being, such as associating occupations with geographic areas, birth rates with
education levels, and countries of origin with skill sets.
The 1880 census truly yielded a mountain of data to deal with, yet only severely limited technology
was available to do any of the analytics. The problem of Big Data could not be solved for the 1880 census,
so it took over seven years to manually tabulate and report on the data.
Data Growth
Data growing at such a quick rate makes it a challenge to find insights from it. More and more data is
generated every second, and the data that is actually relevant and useful has to be picked out for
further analysis.
Storage
Such a large amount of data is difficult for organizations to store and manage without appropriate tools and
technologies.
Syncing Across Data Sources
When organisations import data from different sources, the data from one source might not
be as up to date as the data from another source.
Security
Huge amounts of data in organisations can easily become a target for advanced persistent threats, so here lies
another challenge for organisations: keeping their data secure through proper authentication, data encryption, etc.
Unreliable Data
Big data cannot be guaranteed to be 100 percent accurate. It might contain redundant or incomplete
data, along with contradictions.
Miscellaneous Challenges
These are some other challenges that come forward while dealing with big data, like the integration of
data, skill and talent availability, solution expenses and processing a large amount of data in time and
with accuracy so that the data is available for data consumers whenever they need it.
Basis of Comparison: Small Data vs Big Data

Definition
Small Data: Data that is ‘small’ enough for human comprehension, in a volume and format that makes it accessible, informative and actionable.
Big Data: Data sets that are so large or complex that traditional data processing applications cannot deal with them.

Data Source
Small Data: Data from traditional enterprise systems such as Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM), financial data like general ledger data, and payment transaction data from websites.
Big Data: Purchase data from point-of-sale systems, clickstream data from websites, GPS stream data (mobility data sent to a server), and social media data (Facebook, Twitter).

Velocity
Small Data: Controlled and steady data flow; data accumulation is slow.
Big Data: Data can arrive at very fast speeds; enormous data can accumulate within very short periods of time.

Time Variance
Small Data: Historical data is equally valid, as it represents solid business interactions.
Big Data: In some cases, data gets old soon (e.g. fraud detection).
Investments made in an EPM (Enterprise Process Management) implementation can be very expensive, so it
is imperative that every asset is leveraged. Youngsoft’s team of experienced professionals provides a wide
range of services across the suite of EPM products, consistently delivering solutions that reduce costs,
increase profits, and improve overall efficiency.
Data Science is the combination of statistics, mathematics, programming, problem-solving, capturing data in
ingenious ways, the ability to look at things differently, and the activity of cleansing, preparing, and aligning
the data.
In simple terms, it is the umbrella of techniques used when trying to extract insights and information from
data.
Applications of Data Science
Internet Search
Search engines make use of data science algorithms to deliver the best results for search queries in a
fraction of a second.
Digital Advertisements
The entire digital marketing spectrum uses data science algorithms, from display banners to
digital billboards. This is the main reason digital ads get a higher click-through rate (CTR) than traditional
advertisements.
Recommender Systems
Recommender systems not only make it easy to find relevant products among the billions of products
available but also add a lot to the user experience. A lot of companies use these systems to promote their
products and make suggestions in accordance with the user’s demands and the relevance of the information. The
recommendations are based on the user’s previous search results, as sketched below.
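One simple way such history-based suggestions can work is to recommend products that are frequently viewed together with the ones a user has already looked at. The Python sketch below illustrates this idea with made-up interaction data; it is only an illustration, not the method any particular company uses.

from collections import Counter
from itertools import combinations

# Toy interaction history: each inner list is one user's viewed products.
histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]

# Count how often each pair of products is viewed together.
co_occurrence = Counter()
for items in histories:
    for a, b in combinations(sorted(set(items)), 2):
        co_occurrence[(a, b)] += 1

def recommend(viewed, top_n=2):
    """Score candidate products by how often they co-occur with viewed ones."""
    scores = Counter()
    for (a, b), count in co_occurrence.items():
        if a in viewed and b not in viewed:
            scores[b] += count
        elif b in viewed and a not in viewed:
            scores[a] += count
    return [item for item, _ in scores.most_common(top_n)]

print(recommend({"phone"}))  # e.g. ['case', 'charger']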
Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in
turn, leads to smarter business moves, more efficient operations, higher profits and happier customers. In his
report Big Data in Big Companies, IIA Director of Research Tom Davenport interviewed more than 50
businesses to understand how they used big data. He found they got value in the following ways:
1. Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant
cost advantages when it comes to storing large amounts of data – plus they can identify more
efficient ways of doing business.
2. Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined
with the ability to analyze new sources of data, businesses are able to analyze information
immediately – and make decisions based on what they’ve learned.
3. New products and services. With the ability to gauge customer needs and satisfaction through
analytics comes the power to give customers what they want. Davenport points out that with big data
analytics, more companies are creating new products to meet customers’ needs.
Types of data analytics
There are 4 different types of analytics.
Descriptive analytics
Descriptive analytics answers the question of what happened. Let us bring an example from ScienceSoft’s
practice: having analyzed monthly revenue and income per product group, and the total quantity of metal
parts produced per month, a manufacturer was able to answer a series of ‘what happened’ questions and
decide on focus product categories.
Descriptive analytics juggles raw data from multiple data sources to give valuable insights into the past.
However, these findings simply signal that something is wrong or right, without explaining why. For this
reason, our data consultants don’t recommend that highly data-driven companies settle for descriptive
analytics only; they would rather combine it with other types of data analytics.
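As a minimal illustration of descriptive analytics, the pandas snippet below summarises made-up sales records into monthly revenue per product group, answering a ‘what happened’ question; the figures are purely illustrative.

import pandas as pd

sales = pd.DataFrame({
    "month":         ["2023-01", "2023-01", "2023-02", "2023-02"],
    "product_group": ["bolts",   "gears",   "bolts",   "gears"],
    "revenue":       [12000,     30000,     15000,     28000],
})

# Aggregate raw records into a summary of what happened in the past.
summary = sales.groupby(["month", "product_group"])["revenue"].sum().unstack()
print(summary)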
Diagnostic analytics
At this stage, historical data can be measured against other data to answer the question of why something
happened. For example, you can check ScienceSoft’s BI demo to see how a retailer can drill the sales and
gross profit down to categories to find out why they missed their net profit target. Another flashback to our
data analytics projects: in the healthcare industry, customer segmentation coupled with several filters
(like diagnoses and prescribed medications) made it possible to identify the influence of medications.
Diagnostic analytics gives in-depth insights into a particular problem. At the same time, a company should
have detailed information at its disposal; otherwise, data collection may turn out to be time-consuming and
have to be done individually for every issue.
Predictive analytics
Predictive analytics tells what is likely to happen. It uses the findings of descriptive and diagnostic analytics
to detect clusters and exceptions, and to predict future trends, which makes it a valuable tool for forecasting.
Check ScienceSoft’s case study to get details on how advanced data analytics allowed a leading FMCG
company to predict what they could expect after changing brand positioning.
Predictive analytics belongs to the advanced analytics types and brings many advantages, such as sophisticated
analysis based on machine or deep learning and the proactive approach that predictions enable. However, our
data consultants state it clearly: a forecast is just an estimate whose accuracy highly depends on data
quality and the stability of the situation, so it requires careful treatment and continuous optimization.
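To give a flavour of forecasting, here is a deliberately simple Python sketch that fits a linear trend to twelve months of hypothetical sales and extrapolates one month ahead; real predictive analytics would typically use richer features and machine-learning or deep-learning models, and, as noted above, the result is only an estimate.

import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)   # months 1..12 as the single feature
sales = 100 + 5 * months.ravel() + np.random.default_rng(0).normal(0, 3, size=12)

model = LinearRegression().fit(months, sales)
forecast = model.predict([[13]])           # extrapolate to month 13
print(f"Forecast for next month: {forecast[0]:.1f}")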
Prescriptive analytics
The purpose of prescriptive analytics is to literally prescribe what action to take to eliminate a future
problem or take full advantage of a promising trend. An example of prescriptive analytics from our project
portfolio: a multinational company was able to identify opportunities for repeat purchases based on
customer analytics and sales history.
Prescriptive analytics uses advanced tools and technologies, like machine learning, business rules and
algorithms, which makes it sophisticated to implement and manage. Besides, this state-of-the-art type of
data analytics requires not only historical internal data but also external information due to the nature of
algorithms it’s based on. That is why, before deciding to adopt prescriptive analytics, ScienceSoft strongly
recommends weighing the required efforts against the expected added value.
As data sets are becoming bigger and more diverse, there is a big challenge to incorporate them into an
analytical platform. If this is overlooked, it will create gaps and lead to wrong messages and insights.
The analysis of data is important to make the voluminous amount of data being produced every minute
useful. With the exponential rise of data, a huge demand for Big Data scientists and Big Data analysts has
been created in the market. It is important for business organizations to hire data scientists with varied
skills, as the job of a data scientist is multidisciplinary. Another major challenge faced by businesses is
the shortage of professionals who understand Big Data analysis. There is a sharp shortage of data scientists
in comparison to the massive amount of data being produced.
It is imperative for business organizations to gain important insights from Big Data analytics, and it is also
important that only the relevant departments have access to this information. A big challenge faced by
companies in Big Data analytics is bridging this wide gap in an effective manner.
It is hardly surprising that data is growing with every passing day. This simply indicates that business
organizations need to handle a large amount of data on a daily basis. The amount and variety of data available
these days can overwhelm any data engineer, which is why it is considered vital to make data accessibility
easy and convenient for brand owners and managers.
With the rise of Big Data, new technologies and companies are being developed every day. However, a big
challenge faced by companies in Big Data analytics is to find out which technology will be best
suited to them without introducing new problems and potential risks.
Business organizations are growing at a rapid pace. As companies and large business organizations grow,
the amount of data they produce also increases. Storing this massive amount of data is
becoming a real challenge for everyone. Popular data storage options like data lakes and warehouses are
commonly used to gather and store large quantities of unstructured and structured data in their native format.
The real problem arises when a data lake or warehouse tries to combine unstructured and inconsistent data
from diverse sources: it encounters errors. Missing data, inconsistent data, logic conflicts, and duplicate
data all result in data quality challenges.
Once business enterprises discover how to use Big Data, it brings them a wide range of possibilities and
opportunities. However, it also involves potential risks when it comes to the
privacy and the security of the data. The Big Data tools used for analysis and storage utilize data from
disparate sources. This eventually leads to a high risk of exposure of the data, making it vulnerable. Thus,
the rise of voluminous amounts of data increases privacy and security concerns.
The Importance of Big Data Analytics
Driven by specialized analytics systems and software, as well as high-powered computing systems, big data
analytics offers various business benefits, including:
Big data analytics applications enable big data analysts, data scientists, predictive modelers, statisticians and
other analytics professionals to analyze growing volumes of structured transaction data, plus other forms of
data that are often left untapped by conventional BI and analytics programs. This encompasses a mix of
semi-structured and unstructured data -- for example, internet clickstream data, web server logs, social
media content, text from customer emails and survey responses, mobile phone records, and machine data
captured by sensors connected to the internet of things (IoT).
We will now discuss the terminology related to the Big Data ecosystem. This will give you a complete understanding
of Big Data and its terms.
Over time, Hadoop has become the nucleus of the Big Data ecosystem, where many new technologies have
emerged and been integrated with Hadoop. So it is important that we first understand and appreciate the
nucleus of modern Big Data architecture.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data
sets across clusters of computers, using simple programming models. It is designed to scale up from single
servers to thousands of machines, each offering local computation and storage.
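As a small taste of that programming model, the sketch below is the classic word count written in Python for Hadoop Streaming, which runs ordinary programs over distributed data via stdin/stdout; the file name and the submission command are illustrative and depend on the local installation.

# wordcount.py: run with "map" as the argument for the map phase and
# "reduce" for the reduce phase.
import sys

def mapper():
    # Emit "word<TAB>1" for every word in this node's input split.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so all counts for a word arrive together.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()

A job could then be submitted with Hadoop Streaming along the lines of:
hadoop jar hadoop-streaming.jar -input /data/in -output /data/out -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce"
(the exact jar path varies by installation).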