Unit-4 Big Data Analytics Methods using R

Uploaded by Kalighat Okira
© All Rights Reserved

Unit-4

Big Data Analytics Methods using R
Contents

• Introduction to R-Attributes
• R Graphical user interfaces
• Data import and export
• Attribute and Data Types
• Descriptive Statistics
• Exploratory Data Analysis.
R Overview

• R is a comprehensive statistical and graphical programming language and a dialect of the S language:

1988 - S2: R. A. Becker, J. M. Chambers, A. Wilks
1992 - S3: J. M. Chambers, T. J. Hastie
1998 - S4: J. M. Chambers

• R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland, New Zealand, during the 1990s.
R Overview

R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
Among other things it has

• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, in particular matrices,
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display, either directly at the computer or on hardcopy, and
• a well-developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user-defined recursive functions and input and output facilities.
How to download?

• CRAN (Comprehensive R Archive Network)

https://www.r-project.org
Data Types and Objects in R

• Variables are reserved memory locations to store values, i.e., when we create a variable we allocate some memory space.

• In R, a variable is an object. An object is a data structure with attributes and methods that are applied to those attributes.

• Variables can be broadly divided into two types:

o Numerical
o Character

• Character variables are called factors and are divided into two types: a factor variable is considered nominal if it represents a name (e.g., names of persons), while ordered factors are called ordinal variables (e.g., satisfaction level of a user from extremely poor to extremely good).

• Numeric variables are either interval or ratio.


R Objects
R has five basic or “atomic” classes of objects:

• Numeric: also known as double; the default type when dealing with numbers, e.g. 1, 1.0, 42.5
• Integer: e.g. 1L, 2L, 42L
• Complex: e.g. 4 + 2i
• Logical: two possible values, TRUE and FALSE. You can also use T and F, but this is not recommended. NA is also considered logical.
• Character: e.g. "a", "Statistics", "1 plus 2."

Other objects:
• Inf is infinity; it can be positive or negative.
• NaN means "Not a Number"; it is an undefined value.
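These classes and special values can be inspected with class() and the is.* predicates; a minimal sketch:

```r
# Inspecting the class of each atomic type
class(42.5)         # "numeric"
class(1L)           # "integer"
class(4 + 2i)       # "complex"
class(TRUE)         # "logical"
class("Statistics") # "character"

# Special values: infinity and Not-a-Number
1 / 0               # Inf
-1 / 0              # -Inf
0 / 0               # NaN
is.nan(0 / 0)       # TRUE
```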
R Attributes

Attributes are metadata attached to an R object, such as names, dim (dimensions), dimnames, class, and levels. All attributes of an object can be queried with attributes(), and individual attributes can be read or set with attr().
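Attributes in R are metadata attached to an object (names, dim, class, levels, etc.); a minimal sketch using attributes() and attr():

```r
# A bare vector has no attributes
x <- 1:6
attributes(x)        # NULL

# Setting the 'dim' attribute turns x into a 2 x 3 matrix
dim(x) <- c(2, 3)
attributes(x)        # $dim : 2 3
is.matrix(x)         # TRUE

# Arbitrary attributes can be attached with attr()
attr(x, "note") <- "demo attribute"
attributes(x)$note   # "demo attribute"

# Names are attributes too
y <- c(a = 1, b = 2)
attributes(y)        # $names : "a" "b"
```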
Data Types and Objects in R
The most essential data structures used in R include:

• Vectors : A vector is an ordered collection of basic data types of a given length.

# Vector (an ordered collection of the same data type)
X = c(1, 3, 5, 7, 8)

# Printing the elements to the console
print(X)
Data Types and Objects in R

• Lists : A list is a generic object consisting of an ordered collection of objects.

# The first component is a numeric vector
# containing the employee IDs, created with c()
empId = c(1, 2, 3, 4)

# The second component is the employee names,
# a character vector
empName = c("Debi", "Sandeep", "Subham", "Shiba")

# The third component is the number of employees,
# a single numeric variable
numberOfEmp = 4

# Combine these three objects of different types into
# one list of employee details using list()
empList = list(empId, empName, numberOfEmp)
print(empList)
Data Types and Objects in R

• Dataframes : Dataframes are generic data objects of R which are used to store the
tabular data.
# A character vector of names
Name = c("Amiya", "Raj", "Asish")

# A character vector of languages
Language = c("R", "Python", "Java")

# A numeric vector of ages
Age = c(22, 25, 45)

# Create a dataframe by passing each of the vectors
# as an argument to data.frame()
df = data.frame(Name, Language, Age)
print(df)
Data Types and Objects in R

• Matrices : A matrix is a rectangular arrangement of numbers in rows and columns.

# byrow = TRUE fills the matrix row-wise;
# by default, matrices are filled column-wise
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9),
           nrow = 3, ncol = 3, byrow = TRUE)
print(A)
Data Types and Objects in R
• Arrays : Arrays are the R data objects which store the data in more than two
dimensions.

# Taking a sequence of eight elements and arranging
# them as two rectangular matrices, each with
# two rows and two columns
A = array(c(1, 2, 3, 4, 5, 6, 7, 8),
          dim = c(2, 2, 2))
print(A)
Data Types and Objects in R

• Factors : Factors are the data objects which are used to categorize the data and store
it as levels. They are useful for storing categorical data. They can store both strings and
integers.

# Creating factor using factor()

fac = factor(c("Male", "Female", "Male",
               "Male", "Female", "Male", "Female"))
print(fac)
Statistics

• Statistics is a method of interpreting, analyzing and summarizing data.

• Statistical analysis is meant to collect and study information available in large quantities.

• For example, the collection and interpretation of data about a nation, such as its economy, population, military, literacy, etc.

• Statistics is broadly categorized into two types:

o Descriptive statistics
o Inferential statistics
Descriptive Statistics
• In descriptive statistics, the data is summarized through the given observations.

• The summarization is done from a sample of the population using parameters such as the mean or standard deviation.

• Descriptive statistics is a way to organize, represent and describe a collection of data using tables, graphs, and summary measures. For example, the collection of people in a city using the internet or using television.

• Descriptive statistics is categorized into four types:

o Measure of frequency - displays the number of times a particular data value occurs.
o Measure of dispersion - range, variance and standard deviation are measures of dispersion; they identify the spread of the data.
o Measure of central tendency - the central tendencies are the mean, median and mode of the data.
o Measure of position - describes percentile and quartile ranks.
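The four categories above map directly onto base R functions; a minimal sketch on the built-in mtcars dataset:

```r
data(mtcars)

# Measure of frequency: how often each value occurs
table(mtcars$cyl)

# Measures of central tendency
mean(mtcars$mpg)
median(mtcars$mpg)

# Measures of dispersion
range(mtcars$mpg)
var(mtcars$mpg)
sd(mtcars$mpg)

# Measure of position: percentiles and quartiles
quantile(mtcars$mpg, probs = c(0.25, 0.50, 0.75))
```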
Inferential statistics
• Inferential statistics is a branch of statistics that involves using data from a sample to
make inferences about a larger population. It is concerned with making predictions,
generalizations, and conclusions about a population based on the analysis of a sample
of data.

• Inferential statistics help to draw conclusions about the population while descriptive
statistics summarizes the features of the data set.

• Inferential statistics encompasses two primary categories:

o hypothesis testing and
o regression analysis.

• It is crucial for samples used in inferential statistics to be an accurate representation of the entire population.
Exploratory Data Analysis (EDA)
• Exploratory Data Analysis (EDA) is a process of describing the data by means of
statistical and visualization techniques in order to bring important aspects of that data
into focus for further analysis.

• EDA aims to spot patterns and trends, to identify anomalies, and to test early
hypotheses.

• Exploratory data analytics often uses visual techniques, such as graphs, plots, and
other visualizations.
Exploratory Data Analysis (EDA)

Some key benefits of an EDA include:

• Spotting missing and incorrect data
• Understanding the underlying structure of your data
• Testing your hypotheses and checking assumptions
• Identifying the most important variables
• Creating the most efficient model
• Determining error margins
• Identifying the most appropriate statistical tools to help you
Exploratory Data Analysis (EDA)

Types of EDA:

• Univariate analysis: It is one of the simplest forms of data analysis. It looks at the
distribution of a single variable (or column of data) at a time. While univariate
analysis does not strictly need to be visual, it commonly uses visualizations such as
tables, pie charts, histograms, or bar charts.

• Multivariate analysis: It looks at the distribution of two or more variables and explores the relationships between them. Most multivariate analyses compare two variables at a time (bivariate analysis).
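Both kinds of analysis can be started with a few base R calls; a minimal sketch on the built-in mtcars dataset:

```r
data(mtcars)

# Univariate EDA: structure and per-variable summaries
str(mtcars)              # variable names and types
summary(mtcars$mpg)      # five-number summary plus the mean

# Spotting missing data, column by column
colSums(is.na(mtcars))

# Bivariate EDA: relationship between two numeric variables
cor(mtcars$mpg, mtcars$disp)   # a strong negative correlation
```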
Data Visualization Using R

The four most common methods of visualizing data are:

• Histograms
• Barplots
• Boxplots
• Scatterplots
Histograms:
When visualizing a single numerical variable, a histogram is a go-to tool; it can be created in R using the hist() function.

data("mtcars")

hist(mtcars$mpg)
Histograms:

hist(mtcars$mpg,
     xlab = "Miles/gallon",
     main = "Histogram of MPG (mtcars)",
     breaks = 12,
     col = "lightseagreen",
     border = "darkorange")

To know more about the arguments that hist() can take, check this link:

https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/hist.html
Barplots:
A barplot can provide a visual summary of a categorical variable, or a numeric variable with a finite number of values, like a ranking from 1 to 10. To draw a barplot we will use the cyl variable, which is the number of cylinders in the mtcars dataset.

barplot(table(mtcars$cyl))
Barplots:

barplot(table(mtcars$cyl),
        xlab = "Number of cylinders",
        ylab = "Frequency",
        main = "mtcars dataset",
        col = "lightseagreen",
        border = "darkorange")

To know more about the arguments that barplot() can take, check this link:

https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/barplot.html
Boxplots:
We can use a single boxplot as an alternative to a histogram for visualizing a single numerical variable. Let's do a boxplot for the weight (wt) column in mtcars.

boxplot(mtcars$wt)
Boxplots:

To visualize the relationship between a numerical and categorical variable, we can use a
boxplot. Here mpg is a numerical variable and Number of cylinders is categorical.

boxplot(mpg ~ cyl , data = mtcars)


Boxplots:
You can make the box plot more attractive by setting some of its parameters

boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders",
        ylab = "Miles/(US) gallon",
        main = "Number of cylinders VS Miles/(US) gallon",
        pch = 20,
        cex = 2,
        col = "lightseagreen",
        border = "red")
Scatterplots:
To visualize the relationship between two numeric variables we will use a scatterplot. This
can be done with the plot() function.

plot(mpg ~ disp, data = mtcars)
Scatterplots:

plot(mpg ~ disp, data = mtcars,
     xlab = "Displacement",
     ylab = "Miles Per Gallon",
     main = "MPG vs Displacement",
     pch = 20,
     cex = 2,
     col = "red")
Statistical methods for evaluation:

• Hypothesis Testing
• Difference of Means
• Wilcoxon Rank-Sum Test
• Type I and Type II Errors
• Power and Sample Size
• ANOVA
Hypothesis Testing
• A statistical hypothesis is an assumption made about the population from which the data for an experiment are collected. The t-test is one of the most common forms of hypothesis testing.

• It is not mandatory for this assumption to be true every time.

• Ideally, validating a hypothesis would require taking the entire population into account. However, this is not practical, so a hypothesis is validated using random samples drawn from the population.

• On the basis of the result of testing the sample data, the hypothesis is either rejected or not rejected.

• As an example, you may make the assumption that the longer it takes to develop a
product, the more successful it will be, resulting in higher sales than ever before.
Before implementing longer work hours to develop a product, hypothesis testing
ensures there’s an actual connection between the two.
Hypothesis Testing

• Statistical Hypothesis Testing can be categorized into two types as below:

o Null Hypothesis – Hypothesis testing is carried out to test the validity of a claim or assumption made about the larger population. The claim under test is known as the null hypothesis, denoted by H0.

o Alternative Hypothesis – The alternative hypothesis is the claim considered valid if the null hypothesis is false. The evidence in the trial consists of the data and the statistical computations that accompany it. The alternative hypothesis is denoted by H1 or Ha.
Hypothesis Testing
• Hypothesis testing is conducted in the following manner:

1. State the Hypotheses – state the null and alternative hypotheses.
2. Formulate an Analysis Plan – decide how the sample data will be evaluated (test statistic, significance level, decision rule).
3. Analyze Sample Data – calculate and interpret the test statistic, as described in the analysis plan.
4. Interpret Results – apply the decision rule described in the analysis plan.

• Hypothesis testing ultimately uses a p-value to weigh the strength of the evidence, i.e., what the data say about the population. The p-value ranges between 0 and 1. It can be interpreted in the following way:

o A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis,
so you reject it.
o A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail
to reject it.

• A p-value very close to the cutoff (0.05) is considered to be marginal and could go either
way.
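The decision rule above can be applied programmatically; a minimal sketch using simulated data (the mean shift of 0.5 is an arbitrary choice for illustration):

```r
set.seed(1)                     # for reproducibility
x <- rnorm(30, mean = 0.5)      # simulated sample

res <- t.test(x, mu = 0)        # H0: the true mean is 0
alpha <- 0.05                   # chosen significance level

res$p.value                     # p-value of the test
if (res$p.value <= alpha) {
  "reject H0"                   # strong evidence against H0
} else {
  "fail to reject H0"           # weak evidence against H0
}
```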
Hypothesis Testing

The two types of error that can occur from the hypothesis testing:

o Type I Error – A Type I error occurs when we reject a null hypothesis that is actually true. The probability of a Type I error is called the significance level of the test and is represented by the symbol α (alpha).

o Type II Error – Accepting (failing to reject) a false null hypothesis H0 is referred to as a Type II error. Its probability is represented by the symbol β (beta); the power of the test, 1 − β, is the probability of avoiding a Type II error.
One Sample T-Testing
• A one-sample t-test compares the mean of a random sample against a hypothesized population mean. Performing a t-test requires (approximately) normally distributed data.

• For example, this test can check whether the height of persons living in one area differs from, or is identical to, that of persons living in other areas.
help("t.test")

# Defining a sample vector of 100 standard normal draws
x <- rnorm(100)

# One-sample t-test of H0: the true mean equals 5
t.test(x, mu = 5)
Two Sample T-Testing
• In two sample T-Testing, the sample vectors are compared

# Defining two sample vectors
x <- rnorm(100)
y <- rnorm(100)

# Two-sample t-test of H0: the two means are equal
t.test(x, y)
Difference of Means

Reference:

https://stats.libretexts.org/Courses/Luther_College/Psyc_350%3ABehavioral_Statistics_(Toussaint)/08%3A_Tests_of_Means/8.03%3A_Difference_between_Two_Means
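In R, a difference of two means is typically tested with a Welch two-sample t-test, which also reports a confidence interval for the difference; a sketch with simulated data (the group means 10 and 11 and sd 2 are arbitrary illustration values):

```r
set.seed(42)                          # for reproducibility
groupA <- rnorm(50, mean = 10, sd = 2)
groupB <- rnorm(50, mean = 11, sd = 2)

# Welch two-sample t-test (does not assume equal variances)
res <- t.test(groupA, groupB)

res$estimate   # sample mean of each group
res$conf.int   # 95% CI for mean(groupA) - mean(groupB)
res$p.value    # evidence against H0: equal means
```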
Wilcoxon Test
• The Student’s t-test requires that the data follow a normal distribution, or that the sample size be large enough (usually n ≥ 30, thanks to the central limit theorem).

• The Wilcoxon test compares two groups when the normality assumption is violated.

• The Wilcoxon test is a non-parametric test, meaning that it does not rely on the data belonging to any particular parametric family of probability distributions.

• There are actually two versions of the Wilcoxon test:

o The Wilcoxon rank-sum test (also referred to as the Mann–Whitney–Wilcoxon test or Mann–Whitney U test) is performed when the samples are independent (this test is the non-parametric equivalent of the Student’s t-test for independent samples).

o The Wilcoxon signed-rank test (also sometimes referred to as the Wilcoxon test for paired samples) is performed when the samples are paired/dependent (this test is the non-parametric equivalent of the Student’s t-test for paired samples).
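A small sketch of the Wilcoxon signed-rank test, using hypothetical before/after scores from the same ten subjects:

```r
# Hypothetical paired data: scores from the same 10 subjects
before <- c(12, 15, 9, 14, 10, 17, 11, 13, 16, 8)
after  <- c(14, 17, 10, 16, 9, 19, 14, 15, 18, 10)

# paired = TRUE selects the signed-rank version
# (ties in the differences trigger a warning and a
# continuity-corrected normal approximation)
res <- wilcox.test(before, after, paired = TRUE)
res
```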
Wilcoxon rank sum test
Problem: Apply Wilcoxon rank sum test on the given data of following 24 students (12 boys
and 12 girls)
Girls 19 18 9 17 8 7 16 19 20 9 11 18
Boys 16 5 15 2 14 15 4 7 15 6 7 14

The null and alternative hypothesis of the Wilcoxon test are as follows:

o H0 : the 2 groups are equal in terms of the variable of interest


o H1: the 2 groups are different in terms of the variable of interest

Applied to our research question, we have:

o H0 : grades of girls and boys are equal


o H1 : grades of girls and boys are different
Wilcoxon rank sum test

data <- data.frame(
  Gender = as.factor(c(rep("Girl", 12), rep("Boy", 12))),
  Grade  = c(19, 18, 9, 17, 8, 7, 16, 19, 20, 9, 11, 18,
             16, 5, 15, 2, 14, 15, 4, 7, 15, 6, 7, 14))

library(ggplot2)

# Boxplot of grades by gender
ggplot(data) +
  aes(x = Gender, y = Grade) +
  geom_boxplot(fill = "#0c4c8a") +
  theme_minimal()

# Histograms of grades for each group
hist(subset(data, Gender == "Girl")$Grade,
     main = "Grades for girls",
     xlab = "Grades")

hist(subset(data, Gender == "Boy")$Grade,
     main = "Grades for boys",
     xlab = "Grades")

# Wilcoxon rank-sum test
test <- wilcox.test(data$Grade ~ data$Gender)

test
Wilcoxon rank sum test

We obtain the test statistic, the p-value, and a reminder of the hypothesis tested:

Wilcoxon rank sum test with continuity correction

data: data$Grade by data$Gender
W = 31.5, p-value = 0.02056
alternative hypothesis: true location shift is not equal to 0

The p-value is 0.02056. Therefore, at the 5% significance level, we reject the null
hypothesis and we conclude that grades are significantly different between girls and
boys.
Type I and Type II errors

• Using hypothesis testing, one can make decisions about whether the data support or refute
the predictions with null and alternative hypotheses.

• Example: a person decides to get tested for COVID-19 based on mild symptoms. There are two errors that could potentially occur:
o Type I error (false positive): the test result says the person has coronavirus, but they actually don't.
o Type II error (false negative): the test result says the person doesn't have coronavirus, but they actually do.
Type I error

• A Type I error means rejecting the null hypothesis when it’s actually true. It means
concluding that results are statistically significant when, in reality, they came about
purely by chance or because of unrelated factors.

• The risk of committing this error is the significance level (alpha or α) you choose.
That’s a value that you set at the beginning of your study to assess the statistical
probability of obtaining your results (p value).

• The significance level is usually set at 0.05 or 5%. This means that your results only
have a 5% chance of occurring, or less, if the null hypothesis is actually true.

• If the p value of your test is lower than the significance level, it means your results are
statistically significant and consistent with the alternative hypothesis. If your p value is
higher than the significance level, then your results are considered statistically non-
significant.
Type I error
• The null hypothesis distribution curve below shows the probabilities of obtaining all
possible results if the study were repeated with new samples and the null hypothesis were
true in the population.

• At the tail end, the shaded area represents alpha. It’s also called a critical region in
statistics.

• If your results fall in the critical region of this curve, they are considered statistically
significant and the null hypothesis is rejected. However, this is a false positive conclusion,
because the null hypothesis is actually true in this case!
Type II error

• A Type II error means not rejecting the null hypothesis when it’s actually false. This is
not quite the same as “accepting” the null hypothesis, because hypothesis testing can
only tell you whether to reject the null hypothesis.

• Instead, a Type II error means failing to conclude there was an effect when there
actually was. In reality, your study may not have had enough statistical power to
detect an effect of a certain size.

• Power is the extent to which a test can correctly detect a real effect when there is
one. A power level of 80% or higher is usually considered acceptable.

• The risk of a Type II error is inversely related to the statistical power of a study. The
higher the statistical power, the lower the probability of making a Type II error.
Type II error

Statistical power is determined by:

• Size of the effect: larger effects are more easily detected.
• Measurement error: systematic and random errors in recorded data reduce power.
• Sample size: larger samples reduce sampling error and increase power.
• Significance level: increasing the significance level increases power.

To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the
significance level.
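Base R's power.t.test() links effect size, sample size, significance level and power; a sketch (the effect size 0.5 and sd 1 are illustrative choices):

```r
# Power of a two-sample t-test with n = 30 per group,
# effect size (delta) 0.5, sd 1 and alpha 0.05
power.t.test(n = 30, delta = 0.5, sd = 1,
             sig.level = 0.05)$power

# Sample size per group needed to reach 80% power
# at the same effect size
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05,
             power = 0.80)$n
```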
Type II error
• The alternative hypothesis distribution curve below shows the probabilities of obtaining all
possible results if the study were repeated with new samples and the alternative
hypothesis were true in the population.

• The Type II error rate is beta (β), represented by the shaded area on the left side. The
remaining area under the curve represents statistical power, which is 1 – β.

• Increasing the statistical power of your test directly decreases the risk of making a Type II
error.
Analysis Of Variance (ANOVA)
• An ANOVA test is a statistical test used to determine if there is a statistically significant
difference between two or more categorical groups by testing for differences of means
using a variance.

• Another key part of ANOVA is that it splits the independent variable into two or more
groups.
Assumptions Of ANOVA
Here are the three important ANOVA assumptions:

o The different group samples are drawn from a normally distributed population.
o The sample or distribution has a homogeneous variance.
o Analysts draw all the data in a sample independently.

ANOVA test has other secondary assumptions as well, they are:

o The observations must be independent of each other and randomly sampled.


o There are additive effects for the factors.
o The sample size must always be greater than 10.
o The sample population must be uni-modal as well as symmetrical.
Types Of ANOVA Tests

An ANOVA test involves setting up:

• Null Hypothesis: all population means are equal.
• Alternative Hypothesis: at least one population mean is different from the others.

ANOVA tests are of two types:

• One-way ANOVA: it takes one categorical variable into consideration.
• Two-way ANOVA: it takes two categorical variables into consideration.
ANOVA Formula
Source of Variation | Sum of Squares       | Degrees of Freedom | Mean Squares        | F Value
Between Groups      | SSB = ∑ nj (X̄j – X̄)² | df1 = k – 1        | MSB = SSB / (k – 1) | F = MSB / MSE
Error               | SSE = ∑∑ (X – X̄j)²   | df2 = N – k        | MSE = SSE / (N – k) |
Total               | SST = SSB + SSE      | df3 = N – 1        |                     |

where:
N = total number of observations (total sample size)
k = number of groups
nj = sample size of the jth group
X = each individual observation in the jth group
X̄j = mean of the jth group
X̄ = overall mean
SSB = sum of squares between groups
SSE = sum of squares of errors
SST = total sum of squares = SSB + SSE
MSB = mean squares between groups
MSE = mean squares of errors
df1 = degrees of freedom between groups
df2 = degrees of freedom of errors
df3 = total degrees of freedom
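The formulas above can be verified by computing each quantity by hand and comparing against aov(); a sketch on the built-in mtcars data:

```r
data(mtcars)
y <- mtcars$disp            # response
g <- factor(mtcars$gear)    # grouping variable
k <- nlevels(g)             # number of groups
N <- length(y)              # total sample size

group_means <- tapply(y, g, mean)    # X-bar_j per group
n_j         <- tapply(y, g, length)  # n_j per group

# Between-group and error sums of squares, per the table
SSB <- sum(n_j * (group_means - mean(y))^2)
SSE <- sum((y - group_means[g])^2)

MSB <- SSB / (k - 1)
MSE <- SSE / (N - k)
F_value <- MSB / MSE

# Matches the F statistic reported by aov()
F_value
summary(aov(y ~ g))
```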
One Way ANOVA test
# Installing the package (if not already installed)
install.packages("dplyr")

# Loading the package
library(dplyr)

# Variance in mean within groups and between groups
boxplot(mtcars$disp ~ factor(mtcars$gear),
        xlab = "gear", ylab = "disp")

# Step 1: Set up the null and alternative hypotheses
# H0: mu1 = mu2 = mu3 (there is no difference between
#     average displacement for different gears)
# H1: not all means are equal

# Step 2: Calculate the test statistic using aov()
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)

# Step 3: Choose the significance level, alpha = 0.05

# Step 4: Compare the p-value with alpha and conclude:
# if p < alpha, reject the null hypothesis
One Way ANOVA test

The summary shows that the gear attribute is highly significant for displacement (denoted by three stars). The p-value is less than 0.05, which shows that gear is significantly related to displacement, so we reject the null hypothesis.
Two Way ANOVA test
# Installing the package (if not already installed)
install.packages("dplyr")

# Loading the package
library(dplyr)

# Variance in mean within groups and between groups
boxplot(mtcars$disp ~ mtcars$gear, subset = (mtcars$am == 0),
        xlab = "gear", ylab = "disp", main = "Automatic")

boxplot(mtcars$disp ~ mtcars$gear, subset = (mtcars$am == 1),
        xlab = "gear", ylab = "disp", main = "Manual")

# Step 1: Set up the null and alternative hypotheses
# H0: there is no difference between average displacement
#     for different gears
# H1: not all means are equal

# Step 2: Calculate the test statistics using aov()
mtcars_aov2 <- aov(mtcars$disp ~ factor(mtcars$gear) * factor(mtcars$am))
summary(mtcars_aov2)

# Step 3: Choose the significance level, alpha = 0.05

# Step 4: Compare each p-value with alpha and conclude:
# if p < alpha, reject the null hypothesis
Two Way ANOVA test

The summary shows that the gear attribute is highly significant for displacement (denoted by three stars), while the am attribute is not. The p-value of gear is less than 0.05, which shows that gear is significantly related to displacement. The p-value of am is greater than 0.05, so am is not significantly related to displacement.
Happy Learning
