STA1040 MidSem Exam

Mark Bilahi M’rabu

2023-10-19

Import the Data and necessary packages

library(readxl)
install.packages('tidyverse', repos='http://cran.us.r-project.org')

## Installing package into ’C:/Users/ADMIN/AppData/Local/R/win-library/4.3’

## (as ’lib’ is unspecified)

## package ’tidyverse’ successfully unpacked and MD5 sums checked

##
## The downloaded binary packages are in
## C:\Users\ADMIN\AppData\Local\Temp\RtmpayQgvO\downloaded_packages

install.packages('finalfit', repos='http://cran.us.r-project.org')

## Installing package into ’C:/Users/ADMIN/AppData/Local/R/win-library/4.3’

## (as ’lib’ is unspecified)

## package ’finalfit’ successfully unpacked and MD5 sums checked

##
## The downloaded binary packages are in
## C:\Users\ADMIN\AppData\Local\Temp\RtmpayQgvO\downloaded_packages

install.packages('dplyr', repos='http://cran.us.r-project.org')

## Installing package into ’C:/Users/ADMIN/AppData/Local/R/win-library/4.3’

## (as ’lib’ is unspecified)

## package ’dplyr’ successfully unpacked and MD5 sums checked

## Warning: cannot remove prior installation of package ’dplyr’

## Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying

## C:\Users\ADMIN\AppData\Local\R\win-library\4.3\00LOCK\dplyr\libs\x64\dplyr.dll
## to C:\Users\ADMIN\AppData\Local\R\win-library\4.3\dplyr\libs\x64\dplyr.dll:
## Permission denied

## Warning: restored ’dplyr’

##
## The downloaded binary packages are in
## C:\Users\ADMIN\AppData\Local\Temp\RtmpayQgvO\downloaded_packages
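The "cannot remove prior installation" and "Permission denied" warnings for dplyr often appear on Windows when a package is reinstalled while its files are in use. A minimal sketch of one way to avoid reinstalling on every run is shown here. The helper names `find_missing_packages` and `install_missing_packages` are hypothetical and not part of the original script; the repository URL is the one used in the script.

```r
# Hypothetical helper: names of requested packages that are not installed.
# installed.packages() only reads the library folders; it does not load
# any package, so nothing gets locked by this check.
find_missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only the packages that are absent.
install_missing_packages <- function(pkgs,
                                     repos = "http://cran.us.r-project.org") {
  to_install <- find_missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}
```

Calling `install_missing_packages(c("readxl", "tidyverse", "finalfit", "dplyr"))` once before the `library()` calls would then skip packages that are already present.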

library(dplyr)

##
## Attaching package: ’dplyr’

## The following objects are masked from ’package:stats’:

##
## filter, lag

## The following objects are masked from ’package:base’:

##
## intersect, setdiff, setequal, union

library(tidyverse)

## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --

## v forcats 1.0.0 v readr 2.1.4
## v ggplot2 3.4.3 v stringr 1.5.0
## v lubridate 1.9.3 v tibble 3.2.1
## v purrr 1.0.2 v tidyr 1.3.0

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --

## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(finalfit)
Telecommunication_Data = read_excel("D:/Documents/School Documents/USIU/Y2/Y2S3/STA1040/Telecommunicatio
View(Telecommunication_Data)
data = data.frame(Telecommunication_Data)

Perform any 5 data manipulation or data cleaning techniques. State and describe the technique being applied. Illustrate R codes and R outputs plus interpretation

data %>% missing_plot()

[Figure: Missing values map. y-axis: the 22 variables (region through churn); x-axis: Observation (0 to 1000).]

#1. Replacing all NA values with zeros

data1 = data %>% replace(is.na(.), 0)
data1 %>% missing_plot()

[Figure: Missing values map for data1. y-axis: the 22 variables (region through churn); x-axis: Observation (0 to 1000).]

"The above code locates any and all NA values within the dataframe and replaces them with
the integer 0. This makes subsequent data analysis easier as there are no longer any conflicts
of data types or NA values interrupting calculations"

## [1] "The above code locates any and all NA values within the dataframe and replaces them with \nthe i
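Zero-filling is simple, but it changes summary statistics, because a zero counts as a real observation. The sketch below illustrates this on a small made-up data frame (`toy` and its values are assumptions, not the exam data) and restricts the replacement to numeric columns so that text columns keep their NA markers.

```r
# Made-up example data (not from the Telecommunication dataset).
toy <- data.frame(
  region = c("Zone 1", NA, "Zone 3"),
  income = c(40, NA, 75),
  stringsAsFactors = FALSE
)

# Identify numeric columns and replace NA with 0 only in those.
numeric_cols <- vapply(toy, is.numeric, logical(1))
toy_filled <- toy
toy_filled[numeric_cols] <- lapply(
  toy[numeric_cols],
  function(x) replace(x, is.na(x), 0)
)

# The mean drops once the zero is counted as an observation.
mean_ignoring_na <- mean(toy$income, na.rm = TRUE)  # 57.5
mean_zero_filled <- mean(toy_filled$income)         # about 38.3
```

Whether zero is a sensible stand-in depends on the variable; for a quantity such as income it pulls the average down.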

#2. Removing all NA values from the dataframe

data2 = na.omit(data)
data2 %>% missing_plot()

[Figure: Missing values map for data2. y-axis: the 22 variables (region through churn); x-axis: Observation (0 to 125).]

summary(data2)

##     region              tenure           age          marital
##  Length:119         Min.   : 2.00   Min.   :20.00   Length:119
##  Class :character   1st Qu.:16.00   1st Qu.:31.00   Class :character
##  Mode  :character   Median :34.00   Median :39.00   Mode  :character
##                     Mean   :34.91   Mean   :40.91
##                     3rd Qu.:51.50   3rd Qu.:49.00
##                     Max.   :72.00   Max.   :69.00
##     address          income           employ         retire
##  Min.   : 0.00   Min.   : 15.00   Min.   : 0.00   Length:119
##  1st Qu.: 4.00   1st Qu.: 37.00   1st Qu.: 3.00   Class :character
##  Median : 9.00   Median : 57.00   Median : 7.00   Mode  :character
##  Mean   :11.29   Mean   : 97.06   Mean   :10.43
##  3rd Qu.:15.50   3rd Qu.: 96.50   3rd Qu.:15.50
##  Max.   :44.00   Max.   :944.00   Max.   :39.00
##     gender              reside       tollfree            tollten
##  Length:119         Min.   :1.00   Length:119         Min.   :  23.05
##  Class :character   1st Qu.:1.00   Class :character   1st Qu.: 318.80
##  Mode  :character   Median :2.00   Mode  :character   Median : 851.70
##                     Mean   :2.42                      Mean   :1051.35
##                     3rd Qu.:3.00                      3rd Qu.:1659.45
##                     Max.   :6.00                      Max.   :4905.85
##     equipten          cardten          wireten           loglong
##  Min.   :  29.05   Min.   :   5.0   Min.   :  20.95   Min.   :0.470
##  1st Qu.: 624.75   1st Qu.: 222.5   1st Qu.: 507.65   1st Qu.:1.691
##  Median :1481.35   Median : 540.0   Median :1217.35   Median :2.116
##  Mean   :1590.61   Mean   : 750.5   Mean   :1595.79   Mean   :2.183
##  3rd Qu.:2448.28   3rd Qu.:1022.5   3rd Qu.:2405.97   3rd Qu.:2.657
##  Max.   :4167.70   Max.   :4975.0   Max.   :6444.95   Max.   :4.072
##     logtoll         logequi         logcard         logwire
##  Min.   :2.546   Min.   :3.357   Min.   :1.322   Min.   :2.874
##  1st Qu.:3.002   1st Qu.:3.669   1st Qu.:2.536   1st Qu.:3.449
##  Median :3.199   Median :3.764   Median :2.918   Median :3.696
##  Mean   :3.239   Mean   :3.793   Mean   :2.897   Mean   :3.705
##  3rd Qu.:3.493   3rd Qu.:3.935   3rd Qu.:3.178   3rd Qu.:3.930
##  Max.   :4.208   Max.   :4.353   Max.   :4.241   Max.   :4.698
##    custcat             churn
##  Length:119         Length:119
##  Class :character   Class :character
##  Mode  :character   Mode  :character
##
##
##

View(data2)
"The above code locates all rows with NA values within the data frame and removes them from the
dataframe leaving you with fewer observations than before however all of them have all the variable data

## [1] "The above code locates all rows with NA values within the data frame and removes them from the\n
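Because `na.omit()` drops a row as soon as any single column is missing, one sparse column can remove most of the data, which is consistent with only 119 rows surviving here. A small sketch on made-up values (the `toy` data frame is an assumption; the column names are borrowed from the dataset only for readability) shows how to compare what is kept with and without a sparse column.

```r
# Made-up example: 'logwire' is mostly missing, 'tenure' is mostly complete.
toy <- data.frame(
  tenure  = c(12, 30, NA, 45, 60),
  logwire = c(NA, 3.2, NA, NA, 3.9)
)

# Rows kept when every column must be complete.
rows_all_columns <- nrow(na.omit(toy))                    # 2

# complete.cases() gives the same count without copying the data.
rows_complete <- sum(complete.cases(toy))                 # 2

# Rows kept when the sparse column is left out of the check.
rows_without_sparse <- nrow(na.omit(toy["tenure"]))       # 4
```

Checking these counts before calling `na.omit()` on the full data frame makes the cost of listwise deletion visible.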

#3. Combining the 'region' and 'address' variables to create one comprehensive locator variable 'Address
data3 = data %>% replace(is.na(.), 'x')
View(data3)
data3 = data.frame(unite(data3, col = "Address", c('region', 'address'), sep = ", "))
#View(Address)
"The above code firstly creates a new dataframe where all the NA values have been replaced with the char

## [1] "The above code firstly creates a new dataframe where all the NA values have been replaced with t
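The same combination can be sketched in base R with `paste()`. The example below uses a made-up `toy` data frame (an assumption, not the exam data) and also shows a side effect worth knowing: putting a character placeholder such as 'x' into a numeric vector converts the whole vector to character.

```r
# Made-up example data with one missing value in each column.
toy <- data.frame(
  region  = c("Zone 1", NA, "Zone 3"),
  address = c(9, 4, NA),
  stringsAsFactors = FALSE
)

# A character placeholder forces a numeric vector to become character.
placeholder_class <- class(replace(toy$address, is.na(toy$address), "x"))

# Build the combined column, substituting "x" for missing parts.
toy$Address <- paste(
  ifelse(is.na(toy$region), "x", toy$region),
  ifelse(is.na(toy$address), "x", toy$address),
  sep = ", "
)

# Drop the two source columns, as unite() does by default.
toy[c("region", "address")] <- NULL
```

After this, `toy` holds a single character column `Address`, so any numeric work on the original `address` values has to happen before the merge.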

#4. Subsetting the Data to focus on a specific demographic of responses

data5 = data[data$marital == "Married", ]  # select by column name rather than by position 4
data5 = na.omit(data5)
View(data5)
"The above code creates a subset of the initial dataframe by scanning the 'marital' column of the datafr

## [1] "The above code creates a subset of the initial dataframe by scanning the ’marital’ column of the
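A detail of logical subsetting that matters here: when the condition evaluates to NA (a missing marital status), R returns a row filled with NA rather than skipping it, which is presumably why `na.omit()` follows the subset. The sketch below, on a made-up `toy` data frame (an assumption), contrasts this with `which()`, which keeps only the rows where the condition is TRUE.

```r
# Made-up example data with one missing marital status.
toy <- data.frame(
  marital = c("Married", NA, "Unmarried", "Married"),
  income  = c(55, 80, 34, 163),
  stringsAsFactors = FALSE
)

# Plain logical indexing: the NA comparison yields an all-NA row.
with_na_rows <- toy[toy$marital == "Married", ]

# which() drops NA comparisons, so only true matches remain.
married_only <- toy[which(toy$marital == "Married"), ]

nrow(with_na_rows)   # 3
nrow(married_only)   # 2
```

With `which()`, a later `na.omit()` is only needed if rows with gaps in other columns should also be excluded, which keeps the two decisions separate.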

#5. Removing a column from the Data due to high frequency of missing values / poor data quality
data %>% missing_plot()

[Figure: Missing values map. y-axis: the 22 variables (region through churn); x-axis: Observation (0 to 1000).]

na_sum = colSums(is.na(data))
print(na_sum)

##   region   tenure      age  marital  address   income   employ   retire
##        7        8        5        9        8        8        7        8
##   gender   reside tollfree  tollten equipten  cardten  wireten  loglong
##        9        9        8        8        8        7        8        7
##  logtoll  logequi  logcard  logwire  custcat    churn
##      525      614      322      704        8        9

"The 'logwire' column has the highest frequency of NA values hence will be the column to be removed"

## [1] "The ’logwire’ column has the highest frequency of NA values hence will be the column to be remove

data6 = data[ , colnames(data) != "logwire"]

colnames(data)

## [1] "region" "tenure" "age" "marital" "address" "income"

## [7] "employ" "retire" "gender" "reside" "tollfree" "tollten"
## [13] "equipten" "cardten" "wireten" "loglong" "logtoll" "logequi"
## [19] "logcard" "logwire" "custcat" "churn"

colnames(data6)

## [1] "region" "tenure" "age" "marital" "address" "income"

## [7] "employ" "retire" "gender" "reside" "tollfree" "tollten"
## [13] "equipten" "cardten" "wireten" "loglong" "logtoll" "logequi"
## [19] "logcard" "custcat" "churn"

"The above code first creates a visualisation of the location of NA values throughout the dataset so a

## [1] "The above code first creates a visualisation of the location of NA values throughout the datas
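Instead of picking the single worst column by eye, the removal can be driven by the share of missing values per column. The following is a sketch under assumptions: the `toy` data frame and the 50% cut-off are illustrative choices, not taken from the exam.

```r
# Made-up example data: 'logwire' is 75% missing, 'churn' 25%, 'tenure' 0%.
toy <- data.frame(
  tenure  = c(12, 30, 45, 60),
  logwire = c(NA, NA, NA, 3.9),
  churn   = c("No", "Yes", NA, "No"),
  stringsAsFactors = FALSE
)

# Share of missing values in each column (between 0 and 1).
na_share <- colMeans(is.na(toy))

# Assumed cut-off: drop columns with more than half their values missing.
threshold <- 0.5
toy_reduced <- toy[, na_share <= threshold, drop = FALSE]

names(toy_reduced)   # "tenure" "churn"
```

If the dataset has roughly 1000 rows, as the plot axis suggests, the same rule would flag `logequi` (614 NAs) and `logtoll` (525 NAs) as well as `logwire` (704 NAs).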

Consider any of the newly created data in 1(a) above. Describe the data being used

" Considering 'data5' from the above question; the data is a subset of the parent Telecommunication' dat
only contains entries from respondents who are married regardless of age or income level"

## [1] " Considering ’data5’ from the above question; the data is a subset of the parent Telecommunicati

Provide appropriate descriptive summaries of any two variables using the chosen newly created data. Also provide interpretation

"Using data5"

## [1] "Using data5"

#1. Descriptive summary of 'income' variable

income = data5$income
summary(income)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 15.00 34.00 55.00 88.64 80.00 591.00

#range(income)
#table(income)
histogram1 = hist(income, xlab = 'Income', ylab = '# of Respondents', main = 'Histogram of Income of Mar

[Figure: Histogram of Income of Married respondents. x-axis: Income (0 to 600); y-axis: # of Respondents (0 to 50).]

#plot(income)
boxplot1 = boxplot(income, main = 'Boxplot of Income of Married respondents', ylab = 'Income')

[Figure: Boxplot of Income of Married respondents. y-axis: Income (100 to 600).]

boxplot1

## $stats
## [,1]
## [1,] 15
## [2,] 34
## [3,] 55
## [4,] 80
## [5,] 145
##
## $n
## [1] 61
##
## $conf
## [,1]
## [1,] 45.69428
## [2,] 64.30572
##
## $out
## [1] 163 359 301 294 228 163 591 262 256 162
##
## $group
## [1] 1 1 1 1 1 1 1 1 1 1
##
## $names
## [1] ""

histogram1

## $breaks
## [1] 0 100 200 300 400 500 600
##
## $counts
## [1] 49 5 4 2 0 1
##
## $density
## [1] 0.0080327869 0.0008196721 0.0006557377 0.0003278689 0.0000000000
## [6] 0.0001639344
##
## $mids
## [1] 50 150 250 350 450 550
##
## $xname
## [1] "income"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"

table(data5$income > 300)

##
## FALSE TRUE
## 58 3

"From the above analysis it is clear that among the Married respondents, income is not normally distribu

## [1] "From the above analysis it is clear that among the Married respondents, income is not normally d
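The skew in income can be checked numerically from the figures already reported. The sketch below recomputes the usual 1.5 × IQR outlier fences from the quartiles in the `summary()` output and compares them with the points listed under `$out`; all numbers are copied from the output above, and only the variable names are new.

```r
# Values copied from the summary() and boxplot output for data5$income.
q1 <- 34
q3 <- 80
reported_mean   <- 88.64
reported_median <- 55
reported_outliers <- c(163, 359, 301, 294, 228, 163, 591, 262, 256, 162)

# Interquartile range and the 1.5 * IQR fences used by boxplot().
iqr <- q3 - q1                  # 46
upper_fence <- q3 + 1.5 * iqr   # 149
lower_fence <- q1 - 1.5 * iqr   # -35

# Every reported outlier lies above the upper fence; none can lie below
# the lower fence because the minimum income is 15.
all(reported_outliers > upper_fence)

# A mean well above the median points to a long right tail.
reported_mean > reported_median
```

The upper whisker at 145 sits just below the fence of 149, and the mean (88.64) exceeds even the third quartile (80), both of which are consistent with a right-skewed distribution.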

#2. Descriptive summary of 'gender' variable

malesum = table(data5$gender)['Male']
femalesum = table(data5$gender)['Female']
malesum

## Male
## 29

femalesum

## Female
## 32

sumsum = femalesum + malesum

femaleprop = (femalesum / sumsum) * 100
maleprop = (malesum / sumsum) * 100
femaleprop

## Female
## 52.45902

maleprop

## Male
## 47.54098

"From the above analysis it is clear to see that, among the married respondents there is very little gen

## [1] "From the above analysis it is clear to see that, among the married respondents there is very lit
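The same percentages can be obtained in one step with `prop.table()`. This sketch uses the counts reported above (32 female, 29 male) rather than the data itself; the variable names are illustrative.

```r
# Counts copied from the table() output for data5$gender.
gender_counts <- c(Female = 32, Male = 29)

# Convert counts to percentages of the total and round to two decimals.
gender_pct <- round(100 * prop.table(gender_counts), 2)
gender_pct
```

On the real data, `round(100 * prop.table(table(data5$gender)), 2)` would give both proportions at once without computing each sum separately.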
