BIG DATA ANALYTICS

Understanding Big Data: HDFS, NoSQL, Functions in R

Presented By
Sidrah Mohammadi Waris
AI&DS-III yr-II sem
22L51A7218

Introduction To HDFS
The Storage Backbone of Big Data

HDFS, or Hadoop Distributed File System, is a core component of the Hadoop ecosystem used to store large volumes of data reliably. It works by dividing files into large blocks and distributing them across several computers (DataNodes). A central server (NameNode) keeps track of where each block is stored. HDFS is designed to handle hardware failures automatically through data replication, and it supports high-throughput, batch-style processing, making it well suited to big data applications.

Hadoop Distributed File System
PURPOSE OF HDFS IN HADOOP ECOSYSTEM
HDFS stands for Hadoop Distributed File System.

It is the primary storage system of Hadoop.

It acts as a central storage layer that allows other components like MapReduce, Hive, Pig, and Spark to access data efficiently.

HDFS is designed for high throughput and reliable data storage across clusters.

It supports large file sizes, enabling efficient data processing and accessibility.

HDFS is fault-tolerant, ensuring data is safe even in the event of hardware failures.

HDFS Architecture
HDFS follows a master-slave architecture that ensures reliable, scalable, and fault-tolerant storage for big data applications.

Components

NameNode: The master node that manages metadata (file names, block locations).

DataNodes: Worker nodes where the actual data is stored.

Secondary NameNode: Periodically merges the NameNode's edit log into its metadata snapshot (checkpointing) so that the edit log does not grow without bound. It is a helper, not a standby copy of the NameNode.

HDFS stores each file by splitting it into blocks and distributing those blocks across multiple DataNodes, ensuring speed, scalability, and data safety.

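
The block-splitting and replication idea can be sketched in R as a toy model. This is only an illustration, not how HDFS is implemented: the function name `place_blocks`, the DataNode names, and the round-robin placement are assumptions made for the example (real HDFS placement is decided by the NameNode and is rack-aware). The 128 MB block size and replication factor of 3 are used here as assumed settings.

```r
# Toy model of HDFS block placement (illustrative only).
# `place_blocks` and the DataNode names are hypothetical.
place_blocks <- function(file_size_mb, block_size_mb = 128, replication = 3,
                         datanodes = c("dn1", "dn2", "dn3", "dn4")) {
  stopifnot(replication <= length(datanodes))
  # Number of blocks needed to hold the file (the last block may be partial)
  n_blocks <- ceiling(file_size_mb / block_size_mb)
  # For each block, pick `replication` distinct DataNodes in round-robin order
  lapply(seq_len(n_blocks), function(i) {
    idx <- ((i - 1 + seq_len(replication) - 1) %% length(datanodes)) + 1
    datanodes[idx]
  })
}

placement <- place_blocks(300)   # a 300 MB file
length(placement)                # 3 blocks (128 + 128 + 44 MB)
placement[[1]]                   # "dn1" "dn2" "dn3"
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves two copies of each block, which is the intuition behind HDFS fault tolerance.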
NoSQL
NoSQL databases are well suited to modern big data workloads where traditional relational (SQL) databases struggle.

NoSQL (Not Only SQL) refers to a class of non-relational databases designed to store and manage large volumes of unstructured or semi-structured data with high scalability, flexibility, and performance. Unlike traditional SQL databases, NoSQL systems support dynamic schemas and horizontal scaling, making them well suited to big data and real-time applications.

📋❌ No fixed table rules
💾 Handles huge data
⬆️ Scales out quickly
🌐📱 Well suited to modern apps
Types of NoSQL Databases

"Whether it's keys, docs, columns, or graphs, NoSQL has a type for every kind of data need."

NoSQL = freedom to store data your way.

Types:

🔑 Key-Value Pair: Stores data as a collection of key and value pairs
📄 Document Based: Stores and organizes data in flexible, JSON-like documents
📊 Column-Family: Stores data in columns rather than rows
🔗 Graph Database: Stores data using nodes and relationships

Samples of types of NoSQL databases

🔑 1. Key-Value Pair (e.g., Redis)
SET user:101 "{'name':'Laila', 'age':20}"
GET user:101
Key: user:101
Value: {'name':'Laila', 'age':20}
Output:
{'name':'Laila', 'age':20}

📄 2. Document Based (e.g., MongoDB)
db.users.insertOne({
  _id: 101,
  name: "Laila",
  age: 20,
  hobbies: ["reading", "music"]
})
Output (the stored document, as returned by db.users.findOne({ _id: 101 })):
{
  "_id": 101,
  "name": "Laila",
  "age": 20,
  "hobbies": ["reading", "music"]
}

📊 3. Column-Family (e.g., Cassandra)
INSERT INTO users (id, name, age, hobbies) VALUES (101, 'Laila', 20, ['reading', 'music']);
Output:
 id  | name  | age | hobbies
-----+-------+-----+----------------------
 101 | Laila | 20  | ['reading', 'music']

🔗 4. Graph Database (e.g., Neo4j)
CREATE (laila:Person {name: 'Laila', age: 20})
CREATE (book:Interest {type: 'Reading'})
CREATE (laila)-[:LIKES]->(book)
Output (after querying with MATCH (p:Person)-[:LIKES]->(i:Interest) RETURN p.name, i.type):
+--------+---------+
| p.name | i.type  |
+--------+---------+
| Laila  | Reading |
+--------+---------+

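
To compare the four data models side by side without installing any database, the same "Laila" record can be mimicked with plain R structures. This is a rough analogy only, with variable names (`kv`, `doc`, `cols`, `edges`) chosen for this sketch; it does not reproduce how Redis, MongoDB, Cassandra, or Neo4j store data internally.

```r
# Key-value: an environment used as a map from a key to an opaque value
kv <- new.env()
assign("user:101", "{'name':'Laila', 'age':20}", envir = kv)
get("user:101", envir = kv)

# Document: a nested list whose fields may differ from record to record
doc <- list(`_id` = 101, name = "Laila", age = 20,
            hobbies = c("reading", "music"))
doc$hobbies[2]                      # "music"

# Column-family: values grouped by column rather than by row
cols <- list(id = c(101), name = c("Laila"), age = c(20))
cols$name                           # "Laila"

# Graph: relationships kept as an edge table between nodes
edges <- data.frame(from = "Laila", rel = "LIKES", to = "Reading",
                    stringsAsFactors = FALSE)
edges$to[edges$from == "Laila" & edges$rel == "LIKES"]   # "Reading"
```

The point of the sketch is the shape of each model: a lookup by key, a self-describing record, per-column storage, and explicit relationships.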
Functions In R
A function is a set of statements organized together to perform a specific
task. R has a large number of in-built functions and the user can create their
own functions.

In R, a function is an object so the R interpreter is able to pass control to the function, along with arguments that may be necessary for the function to accomplish the actions.

The function in turn performs its task and returns control to the interpreter as well as any result which may be stored in other objects.
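
A short sketch of what "a function is an object" means in practice. The names `square` and `f` are made up for this illustration: a function can be inspected, bound to another name, and passed to another function like any other value.

```r
square <- function(x) x^2   # hypothetical example function

class(square)               # "function"
f <- square                 # bind the same function object to a second name
sapply(1:3, f)              # pass it as an argument: 1 4 9
```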

Creating A Function:
An R function is created by using the keyword function.
Syntax
function_name <- function(arg_1, arg_2, ...) {
  Function body
}

Calling A Function:
This just means executing a function by writing its name followed by its arguments in parentheses.
Syntax
function_name(argument1, argument2, ...)
Example
sqrt(49)
O/P: 7
sum(1, 2) # Calling sum function
O/P: 3

Functions are the building blocks of R programming, allowing you to write once, use many times, and stay DRY (Don't Repeat Yourself).
Function Components
Every R function is like a recipe, with ingredients (arguments), instructions (body), and a finished dish (return value).

Function Name:
This is the actual name of the function. It is stored in the R environment as an object with this name.

Arguments:
An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may contain no arguments. Arguments can also have default values.

Function Body:
The function body contains a collection of statements that defines what the function does.

Return Value:
The return value of a function is the last expression in the function body to be evaluated.
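
The four components can be seen together in one small sketch. `area_rect` is a hypothetical function written for this illustration; `formals()` and `body()` are base R functions for inspecting a function's arguments and body.

```r
# Name: area_rect. Arguments: width, height (height has a default value).
area_rect <- function(width, height = 1) {
  width * height        # body; the last evaluated expression is the return value
}

area_rect(3)            # uses the default height: 3
area_rect(3, 4)         # overrides the default: 12
names(formals(area_rect))   # "width" "height"
body(area_rect)             # prints the body expression
```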
Types Of Functions In R

1. Built-in Functions
R has many built-in functions which can be called directly in the program without defining them first. These functions come pre-defined in R, ready to use. A few of the built-in functions are as follows:

Function   | Description                | Example            | Output
-----------+----------------------------+--------------------+-------
sum()      | Adds numbers               | sum(1, 2, 3)       | [1] 6
mean()     | Calculates the average     | mean(c(2, 4))      | [1] 3
sqrt()     | Finds the square root      | sqrt(16)           | [1] 4
length()   | Returns length of a vector | length(c(1, 2, 3)) | [1] 3

Function names in R are case-sensitive: sum() works, Sum() does not.

2. User-Defined Functions
We can create user-defined functions in R. They are specific to what a user wants, and once created they can be used like the built-in functions.
Syntax:
greet <- function(name) {
  # paste0() joins without separators, so there is no space before "!"
  message <- paste0("Hello ", name, "!")
  return(message)
}
greet("Laila")
Output:
[1] "Hello Laila!"
3. Return statement
The return() function is used to return output from a function. If not explicitly used, R returns the last evaluated expression.

Example:
myfunc <- function(x) {
  return(x + 10)
}
myfunc(5)

Output:
[1] 15

4. Nested Functions
A function defined inside another function. Useful for organizing logic or keeping helper functions private.

Example:
outer_func <- function(a) {
  inner_func <- function(b) {
    return(b^2)
  }
  return(inner_func(a) + 1)
}
outer_func(3)

Output:
[1] 10

5. Function Scoping
R uses lexical scoping, which means that the value of a variable is looked up in the environment where the function was defined, not where it is called.

Example
x <- 10
scope_example <- function() {
  x <- 5
  return(x)
}
scope_example() # Returns 5 because x is 5 inside the function scope
x # Returns 10 as the global x remains unchanged

Output:
[1] 5
[1] 10
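
The example above shows a local variable shadowing a global one. A sketch that isolates the "where it was defined, not where it is called" rule might look like the following; the names `y`, `show_y`, and `caller` are made up for this illustration.

```r
y <- 1
show_y <- function() y    # defined at top level, so it looks up the global y

caller <- function() {
  y <- 99                 # local to caller(); not visible to show_y()
  show_y()
}

caller()   # 1, because show_y() resolves y where it was defined
```

Under dynamic scoping the call would have returned 99, since the lookup would follow the call chain instead of the definition site.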
6. Recursion
A recursive function is a function that calls itself. Recursion is useful for tasks that can be divided into similar subtasks, such as calculating factorials.

Example: Factorial Function

# Note: this definition masks base R's built-in factorial() in the current session
factorial <- function(n) {
  if (n <= 1) {
    return(1)               # base case stops the recursion
  } else {
    return(n * factorial(n - 1))
  }
}
factorial(5)

Output:
[1] 120

Loading an R Package
R packages contain functions, data, and code that you can use in your R scripts. You can load a package using library() or require().

You install a package once, but you must load it in every new R session where you use it.

Example:
# Install a package if not already installed
if (!require(ggplot2)) install.packages("ggplot2")

# Load the package
library(ggplot2)
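
The practical difference between the two loaders is what happens when a package is missing: require() returns FALSE with a warning, while library() stops with an error. The sketch below demonstrates this with a deliberately fake package name, `notARealPackage123`, which is assumed not to be installed.

```r
pkg <- "notARealPackage123"   # hypothetical name, assumed not to be installed

# require() returns FALSE (and warns) when the package cannot be loaded
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))

# library() raises an error instead, which tryCatch() turns into a value here
result <- tryCatch({
  library(pkg, character.only = TRUE)
  "loaded"
}, error = function(e) "error")

ok       # FALSE
result   # "error"
```

This is why require() is convenient inside an if() check, while library() is the safer choice at the top of a script where a missing package should stop execution.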
Thank You!
