Data Analytics-34-41

Uploaded by

Bhuvaneshwari M


UNIT 3

Data Manipulation
1. Data manipulation is the process of arranging a set of data to make it more
organized and easier to interpret.

2. Data manipulation is used in various industries, including accounting, finance,
computer programming, banking, sales, marketing and real estate.

3. The steps of effective data manipulation include extracting data, cleaning the
data, constructing a database, filtering information based on your requirements
and analyzing the data.
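
The steps in point 3 can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the sales data frame and its values are hypothetical, the "extract" step is replaced by typing the data in directly, and the "construct a database" step is skipped because a data frame held in memory stands in for it.

```r
# Extract: hypothetical sales records (normally read from a file or database).
sales <- data.frame(
  region = c("North", "South", "North", "East", "South"),
  amount = c(120, NA, 80, 200, 150)
)

# Clean: drop rows whose amount is missing.
clean <- sales[!is.na(sales$amount), ]

# Filter: keep only the regions of interest.
subset_ns <- clean[clean$region %in% c("North", "South"), ]

# Analyze: total amount per region.
totals <- aggregate(amount ~ region, data = subset_ns, FUN = sum)
print(totals)
```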

Slicing
Slicing is the process of extracting a subset of data from a larger dataset. In
Pandas, we can slice data using the iloc and loc indexers. iloc slices data based
on the integer position of the rows and columns, while loc slices data based on
the labels of the rows and columns.
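
Since the programs in this unit are written in R, the sketch below shows the same position-versus-label distinction with base R indexing rather than Pandas. The data frame is a hypothetical example. Note that R positions start at 1 and the range 1:2 includes both ends, whereas Pandas iloc starts at 0 and excludes the end position.

```r
df <- data.frame(
  id    = c(10, 11, 12, 13),
  name  = c("sai", "ram", "deepika", "sahithi"),
  state = c("CA", "NY", "DE", NA),
  row.names = c("r1", "r2", "r3", "r4")
)

# Position-based slice (comparable to iloc): rows 1 to 2, columns 1 and 3.
by_position <- df[1:2, c(1, 3)]

# Label-based slice (comparable to loc): rows "r2" and "r3", columns by name.
by_label <- df[c("r2", "r3"), c("id", "name")]

print(by_position)
print(by_label)
```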

Subscript and indices

In data analytics, subscripts and indices are used to represent different elements
within datasets or matrices. They play a crucial role in various mathematical and
statistical operations. Here's how they are commonly used:

Matrix Notation: Subscripts are used to denote specific elements within a matrix.
For example, in a 2x2 matrix A, the element in the first row and second column is
denoted as A₁₂.

Time Series Data: Subscripts are used to represent different time periods within
a time series. For example, in a dataset representing monthly sales, Sₜ might
represent sales in month t.

Multi-dimensional Data: For datasets with multiple dimensions or attributes,
subscripts identify both the observation and the variable. For instance, in a
dataset with variables like age, income, and education level, Aᵢⱼ represents the
value of the j-th variable (column) for the i-th observation (row).

Statistical Notation: Indices are used to distinguish parameters, groups or
categories in statistical analyses. For instance, in a regression model, β₀
represents the intercept, β₁ represents the coefficient for the first predictor
variable, and so on.

Summation and Aggregation: Indices are crucial in summation operations, such
as the sigma notation (∑). They indicate which elements are being summed or
aggregated in a dataset.

Array and Dataframe Access: In programming languages used for data
analytics, like Python or R, indices are used to access specific elements within
arrays, lists, or dataframes. For example, in Python, you might access the first
element of a list as list[0].
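
The notations above map directly onto indexing in R. The sketch below uses made-up numbers; the names A and S simply mirror the symbols used in the text. Unlike Python, R counts from 1, so the first element of a vector S is S[1].

```r
# A 2x2 matrix filled row by row; A[1, 2] is the element in row 1, column 2.
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
a12 <- A[1, 2]

# Hypothetical monthly sales S_t for months t = 1..6.
S <- c(100, 120, 90, 150, 130, 110)
s3 <- S[3]               # sales in month 3
total_q1 <- sum(S[1:3])  # summation over t = 1..3

# R indices start at 1, so this is the first element.
first <- S[1]

print(c(a12 = a12, s3 = s3, total_q1 = total_q1, first = first))
```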

Data subset
In data analytics, a "data subset" refers to a portion or segment of a larger
dataset. Creating subsets is a fundamental step in data analysis and is used for
various purposes:

Focus on Specific Variables: You might create a subset to focus on a specific
set of variables of interest, ignoring irrelevant or redundant ones.

Filtering and Sampling: Subsetting allows you to keep only the rows or
observations that meet certain criteria, or to draw a sample of them. For example,
you might want to analyze only the data related to a specific region, time period,
or customer segment.
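
Both kinds of subset can be sketched with base R. The customers data frame below is hypothetical and only serves to show the idea; subset() and sample() are standard base R functions.

```r
customers <- data.frame(
  id     = 1:5,
  region = c("North", "South", "North", "East", "South"),
  age    = c(34, 45, 29, 52, 41),
  income = c(52000, 61000, 48000, 75000, 58000)
)

# Focus on specific variables: keep only id and income.
vars_only <- customers[, c("id", "income")]

# Filtering: customers in the North region.
north <- subset(customers, region == "North")

# Rows and columns at once: id and age of customers older than 40.
over_40 <- subset(customers, age > 40, select = c(id, age))

# Sampling: 3 random rows (set.seed makes the draw repeatable).
set.seed(1)
sampled <- customers[sample(nrow(customers), 3), ]
```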

Dplyr Package
The dplyr package in the R programming language is a grammar of data
manipulation that provides a uniform set of verbs, helping to solve the most
frequent data manipulation problems.

The dplyr package makes data manipulation quicker and easier in the following
ways:

● By limiting your choices, it helps you focus on the data manipulation
problem itself.
● It provides simple "verbs", functions that each handle one common data
manipulation task, so thoughts can be translated into code faster.
● It uses efficient backends, so you spend less time waiting for the
computer.

Select Function

select() is a function from the dplyr R package that is used to select data frame
variables (columns) by name or by position. It can also rename variables while
selecting and drop variables by name. Typical uses include selecting specific
variables by name, by position, or from a list of names.

Syntax

Following is the syntax of the select() function of the dplyr package in R. It
returns an object of the same class as x (the input object).

# Syntax of select()

select(x, variables_to_select)

Program

# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  gender = c('M','M','F','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY','DE',NA),
  row.names=c('r1','r2','r3','r4')
)

# Print the data frame
df

Output

   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14    DE
r4 13 sahithi      F 1985-08-16  <NA>
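
The program above only builds and prints the data frame. A minimal sketch of the select() uses described earlier could look like the following. It assumes dplyr 1.0.0 or later is installed (needed for all_of()), leaves out the dob column for brevity, and the new column name student_id is made up for the example.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  gender = c("M", "M", "F", "F"),
  state = c("CA", "NY", "DE", NA),
  row.names = c("r1", "r2", "r3", "r4")
)

by_name     <- select(df, id, name)                  # select by name
by_position <- select(df, 2, 3)                      # select by position
from_vector <- select(df, all_of(c("id", "state")))  # names held in a character vector
renamed     <- select(df, student_id = id, name)     # rename while selecting
dropped     <- select(df, -gender)                   # drop a variable by name

print(names(renamed))
```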

Filter Function

The filter() function from the dplyr package is used to filter the rows of a data
frame in R. Note that filter() does not modify the original data frame; it returns
a new data frame that retains all rows satisfying the specified condition. Rows
for which the condition evaluates to NA are dropped.

Syntax

# Syntax of filter()

filter(x, condition,...)

Program

library(dplyr)

# sample data
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# condition: keep rows where x is below 50 and z is TRUE
filter(df, x < 50 & z == TRUE)

Output

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
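
A few further filter() patterns are sketched below. The data is the same as in the program above except that the last value of y is changed to NA, an assumption made here only to show how missing values behave.

```r
library(dplyr)

df <- data.frame(
  x = c(12, 31, 4, 66, 78),
  y = c(22.1, 44.5, 6.1, 43.1, NA),  # last value set to NA for illustration
  z = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Conditions separated by commas are combined with AND.
both <- filter(df, x < 50, z)

# Use | for OR.
either <- filter(df, x < 10 | y > 40)

# Rows where the condition is NA are dropped: y > 40 is NA for the last row.
high_y <- filter(df, y > 40)

# To keep rows with a missing y, test for NA explicitly.
missing_y <- filter(df, is.na(y))
```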

Mutate Function

We can use the mutate() function in R programming to add new variables to the
specified data frame. These new variables are computed by performing operations
on existing variables.

Before using the mutate() function, you need to install the dplyr package and load
it with library(dplyr). mutate() is a simple way to derive new columns, and it can
also be used on big datasets.

Syntax

mutate(x, expr)

Program

library(dplyr) # load the library

# Creating data frame
df <- data.frame(
  studentname = c("Student1", "Student2", "Student3", "Student4"),
  Math = c(75, 58, 93, 66),
  Eng = c(44, 89, 89, NA)
)

# Calculate the total marks (totalMarks):
# sum of marks in Maths (Math) & English (Eng)
mutate(df, totalMarks = Math + Eng)

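Output

  studentname Math Eng totalMarks
1    Student1   75  44        119
2    Student2   58  89        147
3    Student3   93  89        182
4    Student4   66  NA         NA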
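
Because Student4 has no English mark, adding Math and Eng gives NA for that row. One possible way to handle this, sketched below, is to replace the missing mark with 0 using dplyr's coalesce(). Treating a missing mark as 0 is an assumption made for the example, not a rule from the notes, and the average column is likewise added only for illustration.

```r
library(dplyr)

df <- data.frame(
  studentname = c("Student1", "Student2", "Student3", "Student4"),
  Math = c(75, 58, 93, 66),
  Eng  = c(44, 89, 89, NA)
)

result <- mutate(
  df,
  totalMarks = Math + coalesce(Eng, 0),  # missing English mark counted as 0 (assumption)
  average    = totalMarks / 2            # later expressions can use columns created earlier
)

print(result)
```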

Arrange Function

The arrange() function from the dplyr package in R is used to order (sort) the
rows of a data frame in ascending or descending order based on column values.

Syntax

# Syntax of arrange()

arrange(.data, ..., .by_group = FALSE)

Program

# Create Data Frame
df <- data.frame(
  id = c(11,22,33,44,55),
  name = c("spark","python","R","jsp","java"),
  price = c(144,NA,321,567,567),
  publish_date = as.Date(
    c("2007-06-22", "2004-02-13","2006-05-18","2010-09-02","2007-07-20"))
)

# Load dplyr library
library(dplyr)

# Using arrange in ascending order
df2 <- df %>% arrange(price)
df2

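Output

  id   name price publish_date
1 11  spark   144   2007-06-22
2 33      R   321   2006-05-18
3 44    jsp   567   2010-09-02
4 55   java   567   2007-07-20
5 22 python    NA   2004-02-13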
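
The program above sorts in ascending order only. A short sketch of descending order, using dplyr's desc() helper on the same data, is given below. The second call, which breaks ties on price by publish_date, is an extra illustration that is not part of the original program.

```r
library(dplyr)

df <- data.frame(
  id = c(11, 22, 33, 44, 55),
  name = c("spark", "python", "R", "jsp", "java"),
  price = c(144, NA, 321, 567, 567),
  publish_date = as.Date(
    c("2007-06-22", "2004-02-13", "2006-05-18", "2010-09-02", "2007-07-20"))
)

# Descending by price; rows with a missing price are still placed last.
by_price_desc <- df %>% arrange(desc(price))

# Break ties on price by publish_date (ascending).
by_price_then_date <- df %>% arrange(desc(price), publish_date)

print(by_price_desc)
print(by_price_then_date)
```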
