BDA - Introduction To Big Data Analytics Part 02

The document provides an introduction to Big Data and its analytics, detailing the types of digital data: structured, unstructured, and semi-structured. It explains the characteristics of big data, known as the three Vs: volume, variety, and velocity, and highlights the applications of big data analytics in customer segmentation, fraud detection, and risk management. Additionally, it discusses Hadoop and its components, including the Hadoop Distributed File System (HDFS) and YARN, as essential tools for processing big data.

Uploaded by

srisabarivasan

#RahoAmbitious

Big Data Analytics


Module Name: Understanding Big Data

Session Name: Introduction to Big Data Analytics

Instructor :
Types of Digital Data

Digital data is information that exists in digital form. It can be created,
processed, and stored on electronic devices. There are three main types of
digital data:
● Structured data: This type of data is organized in a fixed format,
making it easy to store, search, and analyze. Examples of structured
data include relational databases, spreadsheets, and CSV files.
● Unstructured data: This type of data is not organized in a predefined
format. It can include text documents, emails, social media posts,
images, videos, and audio files.
● Semi-structured data: This type of data has some structure, but it
does not follow a strict format. Examples of semi-structured data
include XML files, JSON files, and log files.
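
To make the three types concrete, here is a minimal Python sketch. The sample records, names, and the review text are invented for illustration and are not part of the original slides. It reads the same kind of information as CSV (structured), JSON (semi-structured), and free text (unstructured).

```python
import csv
import io
import json

# Structured: every row has the same columns in the same order.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Chennai\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: records describe themselves, but fields may differ per record.
json_text = '[{"id": 1, "name": "Asha", "tags": ["new"]}, {"id": 2, "name": "Ravi"}]'
records = json.loads(json_text)

# Unstructured: free text with no schema; any structure has to be inferred.
review = "Delivery was late, but the support team was helpful."
word_count = len(review.split())

print(rows[0]["city"])              # field looked up by column name
print(records[1].get("tags", []))   # optional field may be missing
print(word_count)                   # only crude measures come for free
```

The CSV rows can be queried by column name straight away, the JSON records need a check for missing fields, and the free text offers nothing beyond what the program works out for itself.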
Introduction to Big Data

Big data refers to datasets that are too large or complex for traditional
data processing methods. Big data is characterized by three Vs:
● Volume: The amount of data is massive.
● Variety: The data comes in many different forms, including structured,
unstructured, and semi-structured data.
● Velocity: The data is generated and collected at a high speed.
Big Data Analytics

Big data analytics is the process of collecting, storing, and analyzing big data to
extract insights and information. Big data analytics can be used for a variety of
purposes, such as:

● Customer segmentation: Businesses can use big data analytics to segment
their customers into different groups based on their demographics, interests,
and behaviors. This allows businesses to target their marketing campaigns
more effectively.
● Fraud detection: Big data analytics can be used to detect fraudulent activity,
such as credit card fraud and insurance fraud.
● Risk management: Big data analytics can be used to assess risks and identify
potential problems before they occur.
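
As a rough illustration of the fraud detection use case, the sketch below flags transactions that are far larger than a customer's typical spend. The function name `flag_unusual`, the threshold of five times the median, and the sample amounts are all assumptions made for this example. Real fraud detection systems use far richer signals and models.

```python
from statistics import median

def flag_unusual(amounts, factor=5.0):
    """Return the amounts that exceed `factor` times the median amount."""
    typical = median(amounts)
    return [a for a in amounts if a > factor * typical]

# Hypothetical transaction history for one customer.
history = [120.0, 80.0, 95.0, 110.0, 4000.0, 105.0]
suspicious = flag_unusual(history)
print(suspicious)
```

The median is used instead of the mean so that a single very large transaction does not distort the measure of what is typical.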
History of Hadoop

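Hadoop is an open-source software framework that is used for storing and
processing big data. Hadoop was created by Doug Cutting and Mike
Cafarella in the early 2000s as part of Nutch, an open-source web search
engine project. Cutting and Cafarella were looking for a way to store and
process the massive amount of web data that Nutch had to crawl and index,
and they drew on the designs Google had published for its file system and
for MapReduce. Hadoop became a separate project in 2006, when Cutting
joined Yahoo!, which went on to develop and run it at large scale.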
Hadoop is based on the MapReduce programming model, which allows for
the parallel processing of large datasets. MapReduce breaks down a large
task into smaller, more manageable tasks that can be processed by
multiple computers simultaneously. This allows Hadoop to process big
data much faster than a single machine working through the same data
sequentially.
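
The MapReduce idea can be sketched in plain Python with the classic word count example. This is a single-process illustration of the programming model, not Hadoop's actual API. The function names and the sample input are assumptions for this example, and in a real cluster each chunk would be mapped on a different machine.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Each string stands in for a chunk that a separate machine would process.
chunks = ["big data is big", "data is everywhere"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, and each call to `reduce_phase` only on its own key, both phases can be spread across many machines without coordination between the individual tasks.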
Apache Hadoop

Apache Hadoop is the open-source project, maintained by the Apache
Software Foundation, that develops Hadoop. It is a collection of software
modules that are used for storing and processing big data. Besides
MapReduce, the two main components of Apache Hadoop are:
● Hadoop Distributed File System (HDFS): HDFS is a distributed file
system that splits files into blocks and stores copies of each block on
multiple machines. This makes it scalable and fault-tolerant.
● YARN (Yet Another Resource Negotiator): YARN is a resource
management system that manages the allocation of resources (such
as CPU and memory) for processing data stored in HDFS.
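
The sketch below shows why storing data across multiple machines gives fault tolerance. It splits a file into fixed-size blocks and assigns each block to several nodes. The 128 MB block size and replication factor of 3 match HDFS defaults. The round-robin placement, the function name `place_blocks`, and the node names are simplifying assumptions; real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128   # HDFS default block size
REPLICATION = 3       # HDFS default replication factor

def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION):
    """Map each block index to the nodes holding a replica (toy round-robin).

    Assumes replication <= len(nodes) so that replicas land on distinct nodes.
    """
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

# Hypothetical four-node cluster storing a 300 MB file.
nodes = ["node1", "node2", "node3", "node4"]
layout = place_blocks(300, nodes)

# If one node fails, every block still has a replica on another node.
failed = "node2"
still_readable = all(any(n != failed for n in replicas)
                     for replicas in layout.values())
print(layout)
print(still_readable)
```

Adding nodes to the list increases capacity without changing the logic, which is the sense in which this design is scalable.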