Unit 1 Introduction
Introduction to Data
□ Data refers to raw facts, observations, or information that can be collected,
recorded, and analyzed. Data can take various forms, including numbers,
text, images, audio, and more.
□ Types of Data:
□ Quantitative Data (Numeric): Measurable and represented by numbers
(e.g., height, temperature).
□ Qualitative Data (Categorical): Descriptive and represented by
categories or labels (e.g., colors, types).
Types of Data
Qualitative or Categorical Data
Qualitative data, also known as categorical data, describes data that fits into categories. Qualitative data is not numerical. Categorical information involves categorical variables that describe features such as a person’s gender or home town. Categorical measures are defined in terms of natural-language specifications, not in terms of numbers.
Sometimes categorical data can hold numerical values, but those values have no mathematical meaning. Examples of categorical data are birthdate, favourite sport and school postcode. Here, the birthdate and school postcode contain numbers, but those numbers carry no numerical meaning (adding two postcodes, for example, gives nothing useful).
Nominal Data
□ Nominal data is a type of qualitative data that groups variables into categories. These categories are purely descriptive, have no quantitative or numeric value, and cannot be placed into any meaningful order or hierarchy.
□ It labels variables without providing a numerical value. Nominal data is also said to be measured on the nominal scale. It cannot be ordered or measured. Nominal labels are sometimes written as numbers, but those numbers remain labels only. Examples of nominal data are letters, symbols, words, gender, etc.
□ Nominal data is examined using the grouping method: the data are grouped into categories, and then the frequency or percentage of each category is calculated. Nominal data is typically represented visually with pie charts.
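The grouping method described above can be sketched in R with the base functions table() and prop.table(). The eye-colour values are hypothetical sample data chosen only for illustration.

```r
# Hypothetical nominal data: eye colours of ten survey respondents
eye_colour <- factor(c("brown", "blue", "brown", "green", "brown",
                       "blue", "brown", "green", "blue", "brown"))

# Grouping method: count how many observations fall in each category
counts <- table(eye_colour)
print(counts)

# Convert the counts into percentages of the total
percentages <- round(100 * prop.table(counts), 1)
print(percentages)

# Nominal data is often shown as a pie chart (this draws a plot,
# so it is left commented out here)
# pie(counts, main = "Eye colour")
```

The categories have no order, so the only meaningful summaries are counts and percentages.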
Ordinal Data
□ Ordinal data (an ordinal variable) is a type of data that follows a natural order. A significant feature of ordinal data is that the differences between data values are not determined: "Good" ranks above "Fair", but by an unknown amount. This kind of variable is mostly found in surveys, finance, economics, questionnaires, and so on.
□ Ordinal data is commonly represented using a bar chart. It can be investigated and interpreted with many visualisation tools. The information may also be expressed using tables in which each row shows a distinct category.
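To show the ordering that separates ordinal from nominal data, this sketch stores hypothetical survey ratings as an ordered factor. The level names are assumptions made for illustration.

```r
# Hypothetical survey responses on an ordinal scale
ratings <- factor(c("Good", "Poor", "Excellent", "Good", "Fair", "Good"),
                  levels = c("Poor", "Fair", "Good", "Excellent"),
                  ordered = TRUE)

# The order of the categories is meaningful, so comparisons work
print(ratings > "Fair")

# The highest category observed
print(max(ratings))

# Frequency of each category; the table follows the level order
rating_counts <- table(ratings)
print(rating_counts)

# Ordinal data is commonly shown as a bar chart (draws a plot)
# barplot(rating_counts)
```

Comparisons such as `>` are allowed because the levels are ordered, but R does not assign any distance between "Fair" and "Good", which matches the idea that differences between ordinal values are not determined.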
Quantitative or Numerical Data
Quantitative data, also known as numerical data, represents numerical values (i.e., how much, how often, how many). Numerical data gives information about the quantities of a specific thing. Some examples of numerical data are height, length, size, weight, and so on. Quantitative data can be classified into two types: discrete data and continuous data.
Discrete Data
Discrete data can take only distinct, separate values. It contains only a finite (or countable) number of possible values, and those values cannot be meaningfully subdivided. Here, things are counted in whole numbers.
Example: Number of students in the class
Continuous Data
Continuous data is data that is measured rather than counted. It can take infinitely many possible values within a given range.
Example: Temperature
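A small sketch of the difference in R: counts are naturally stored as integers, while measurements are stored as double-precision numbers that can take any value in a range. The class sizes and temperatures are hypothetical values.

```r
# Discrete data: number of students in each class (whole numbers only)
students <- c(32L, 28L, 35L)
print(is.integer(students))

# Continuous data: temperatures in degrees Celsius (any value in a range)
temperature <- c(21.4, 22.75, 19.9)
print(is.double(temperature))

# An average of discrete counts need not be a whole number,
# even though every individual count is
print(mean(students))
```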
Data Categorization based on Measurement:
Definition: Data categorization involves classifying data into different groups or categories
based on certain characteristics. This process helps in organizing and making sense of
diverse data types.
Methods of Data Categorization:
Nominal Categorization: Categorizing data into distinct categories without any inherent
order or ranking. Examples include colors or types of fruits.
Ordinal Categorization: Categorizing data where there is a meaningful order or
ranking. Examples include education levels or survey ratings.
Interval Categorization: Categorizing data with equal intervals between consecutive points but no true zero point. Temperature measured in Celsius is an example.
Ratio Categorization: Categorizing data with equal intervals between consecutive points and
a true zero point. Examples include height, weight, and income.
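The difference between interval and ratio scales can be checked with a little arithmetic. In this sketch (hypothetical values), a ratio of Celsius temperatures is not meaningful because 0 degrees C is not a true zero, whereas temperatures in kelvin and weights in kilograms have a true zero, so their ratios can be interpreted.

```r
# Interval scale: Celsius has equal intervals but no true zero
celsius <- c(10, 20)
print(celsius[2] - celsius[1])   # the 10-degree difference is meaningful
print(celsius[2] / celsius[1])   # 2, but 20 C is NOT "twice as hot" as 10 C

# Converting to kelvin (which has a true zero) shows the physical ratio
kelvin <- celsius + 273.15
print(kelvin[2] / kelvin[1])     # about 1.035

# Ratio scale: weight has a true zero, so ratios are meaningful
weight_kg <- c(40, 80)
print(weight_kg[2] / weight_kg[1])  # 80 kg really is twice 40 kg
```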
Types of Scale
Purpose of Data Categorization:
□ Facilitates data organization and management.
□ Aids in the analysis and interpretation of data.
□ Enables efficient communication of information.
□ Understanding data, elements, variables, and data categorization
provides the foundation for statistical analysis and data-driven
decision-making. The appropriate categorization method depends
on the nature of the data and the goals of the analysis.
Terms
Elements:
Definition: Elements are the individual entities or units within a dataset. Each
element represents a single observation or data point.
Examples:
In a dataset of student exam scores, each student's score is an element.
In a survey about favorite colors, each respondent's choice represents an element.
Variables:
Definition: Variables are characteristics or attributes that can take different values. They
are the properties of the elements being measured or observed.
Types of Variables:
Independent Variable: The variable manipulated or controlled in an experiment.
Dependent Variable: The variable being measured or observed, affected by the
independent variable.
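In R, a dataset is commonly held in a data frame, where each row is an element (one observation) and each column is a variable. The student names, study hours and scores are hypothetical; hours studied plays the role of an independent variable and the exam score the dependent variable.

```r
# Each row is one element (a student); each column is a variable
exams <- data.frame(
  student = c("Asha", "Ben", "Chen", "Dina"),
  hours_studied = c(2, 5, 1, 4),   # independent variable
  score = c(55, 80, 48, 72)        # dependent variable
)

print(nrow(exams))    # number of elements
print(names(exams))   # names of the variables

# One element: all variable values for the second student
print(exams[2, ])

# One variable: the score of every element
print(exams$score)
```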
Classification of Digital Data
Digital data can be classified based on various characteristics, including
its format, structure, and nature. Here are some common classifications
of digital data:
□ Format Based Classification
Format-Based Classification:
a. Text Data:
Consists of alphanumeric characters and is typically human-readable. Examples include documents, emails, and
web pages.
b. Numeric Data:
Comprises numerical values and is often used for quantitative analysis. Examples include spreadsheets, databases,
and numerical datasets.
c. Audio Data:
Represents sound and is stored in digital audio formats. Examples include MP3 files, WAV files, and
streaming audio.
d. Image Data:
Consists of visual information and is stored in digital image formats. Examples include JPEG images, PNG
images, and GIFs.
e. Video Data:
Represents moving images and is stored in digital video formats. Examples include MP4 videos, AVI files,
and streaming video.
f. Multimedia Data:
Combines multiple types of data, such as text, audio, images, and video.
Examples include multimedia presentations, interactive content, and multimedia websites.
Structure-Based Classification:
a. Unstructured Data:
Lacks a predefined data model and is often text-heavy.
Examples include emails, social media posts, and text documents.
b. Semi-Structured Data:
Has some level of structure but does not fit neatly into traditional relational
databases.
Examples include XML files, JSON data, and certain types of log files.
c. Structured Data:
Organized in a predefined manner, often in rows and columns.
Examples include relational databases, spreadsheets, and CSV files.
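A brief sketch of the two ends of this spectrum in R. Structured CSV text maps directly onto rows and columns with read.csv(), while unstructured text has no predefined fields and must be broken up first, here simply by splitting on spaces. The CSV contents and the sentence are hypothetical.

```r
# Structured data: CSV text with a predefined set of columns
csv_text <- "id,product,price
1,pen,1.50
2,notebook,3.25
3,stapler,7.00"
products <- read.csv(text = csv_text)
str(products)
print(products$price)

# Unstructured data: free text with no predefined fields
post <- "Loved the new notebook, but the stapler broke after a week"
words <- strsplit(post, " ")[[1]]
print(length(words))
```

Semi-structured formats such as JSON sit in between; reading them usually needs a contributed package, so they are left out of this base-R sketch.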
Nature-Based Classification:
a. Discrete Data:
Consists of separate, distinct values with no intermediate values.
Examples include whole numbers, categories, and binary data.
b. Continuous Data:
Represents a range of values and can take any value within that range.
Examples include real numbers, temperature measurements, and time.
c. Binary Data:
Consists of bits (0s and 1s) and is fundamental to all digital data.
Examples include machine code, executable files, and binary images.
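R can expose the bit pattern behind a value with the base function intToBits(), and stores raw bytes in its raw type. The sketch shows the lowest eight bits of the integer 5; note that intToBits() lists bits from least significant to most significant.

```r
# The lowest 8 bits of the integer 5, least significant bit first
bits <- as.integer(intToBits(5L))[1:8]
print(bits)            # 1 0 1 0 0 0 0 0

# Reading the bits in reverse gives the usual binary notation
print(paste(rev(bits), collapse = ""))   # "00000101"

# Raw bytes, e.g. the character codes of a short string, shown in hex
bytes <- charToRaw("Hi")
print(bytes)           # 48 69
```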
Domain-Specific Classification:
a. Scientific Data:
Data generated from scientific experiments and observations.
Examples include sensor readings, scientific simulations, and experimental results.
b. Business Data:
Data related to business processes and operations.
Examples include sales data, customer records, and financial transactions.
c. Geospatial Data:
Data associated with geographical locations.
Examples include maps, GPS data, and satellite imagery.
d. Health Data:
Data related to healthcare and medical information.
Examples include electronic health records, medical imaging data, and patient
demographics.
Big Data: Introduction
□ Big Data may well be the Next Big Thing in the IT world.
□ Big data burst upon the scene in the first decade of the 21st century.
□ Like many new information technologies, big data can bring about
dramatic cost reductions, substantial improvements in the time
required to perform a computing task, or new product and service
offerings.
What is BIG DATA?
□ ‘Big Data’ is similar to ‘small data’, but bigger in size.
□ The basic idea behind the phrase 'Big Data' is that everything we do is
increasingly leaving a digital trace (or data), which we (and others) can use
and analyse.
□ Big Data therefore refers to our ability to make use of ever-increasing volumes of data.
□ Big Data is transforming the way we do business and is affecting most other parts of our lives.
□ Big Data refers to extremely large and complex datasets that cannot be
easily processed, managed, or analyzed using traditional data
processing tools.
□ The term "Big Data" encompasses not only the volume of data but also
its velocity, variety, and, increasingly, veracity and value.
“From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days…and the pace is accelerating.”
Eric Schmidt, Executive Chairman, Google
BIG DATA Everywhere
The key characteristics of Big Data are often referred to as the "4Vs":
Volume: Big Data involves a massive amount of data. Traditional databases and
processing systems may struggle to handle the sheer volume, which can range
from terabytes to petabytes and beyond.
Velocity: Velocity refers to the speed at which data is generated, collected, and
processed. With the advent of real-time data sources such as sensors, social
media, and online transactions, data is often generated at high speeds.
Variety: Big Data comes in various formats and types, including structured,
semi-structured, and unstructured data. This diversity includes text, images,
videos, log files, social media posts, sensor data, and more.
Veracity: Veracity refers to the consistency, accuracy, quality and reliability of the data. It also covers bias, noise and abnormalities in the data. Big Data sources may have inconsistencies, errors, or incomplete information, so managing and ensuring data quality is a challenge in Big Data analytics.
The Four V’s of Big Data
Volume (Data at Rest): terabytes to exabytes of existing data to process.
Velocity (Data in Motion): streaming data, milliseconds to seconds to respond.
Variety (Data in Many Forms): structured, unstructured, text, multimedia.
Veracity (Data in Doubt): uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, model approximations.
Why Four V’s?
□ Whether data is structured or unstructured, it’s only as valuable
as the business outcomes it makes possible.
□ However, the data itself isn’t the only factor responsible for
those outcomes.
□ How you measure that data, from a business point of view, helps
you tie the value of the data to its potential and supports
decisions that lead to positive business results.
□ To get there, you need a big data analytics platform.
□ Once you have a platform that can measure along the four V’s (volume, velocity, variety, and veracity), you can then extend the outcomes of the data to impact customer acquisition, retention, upsell, cross-sell and other revenue-generating indicators.
□ You can also look at this information as a competitive strategy that
brings corresponding improvements in operational efficiency and
helps you leverage data across the enterprise for other initiatives.
The Model Has Changed…
Old Model: Few companies are generating data; all others are consuming data.
New Model: All of us are generating data, and all of us are consuming data.
What’s driving Big Data
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time processing
Data Analytics
❖ Data Science and Data Analytics are two of the most popular terms today.
❖ Data is collected in raw form and processed according to the requirements of a company.
❖ This data is then used for decision-making.
❖ This process helps businesses grow in the market.
❖ But the main question that arises here is: what is this process called? Data Analytics is the answer, and Data Analysts and Data Scientists are the ones who perform this process.
What is Data Analytics?
❖ Data or information is initially in raw format.
❖ The increase in the size of data has led to a growing need for inspecting, cleaning, transforming and modelling data to gain insights from it, in order to derive conclusions for a better decision-making process.
❖ This process is known as data analysis.
❖ The analysis is an interactive process of a person tackling a
problem, finding the data required to get an answer, analyzing
that data, and interpreting the results in order to provide a
recommendation for action.
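The steps named above (inspection, cleaning, transformation and modelling) can be sketched in a few lines of base R. The raw sales figures are hypothetical, and the "model" is deliberately simple: an average per region.

```r
# Raw data as it might arrive: numbers stored as text, with a missing value
raw <- data.frame(
  region = c("North", "South", "North", "South", "North"),
  sales  = c("120", "95", NA, "110", "130"),
  stringsAsFactors = FALSE
)

# 1. Inspection: look at the structure and count missing values
str(raw)
print(sum(is.na(raw$sales)))

# 2. Cleaning: drop rows with missing sales
clean <- raw[!is.na(raw$sales), ]

# 3. Transformation: convert the text values to numbers
clean$sales <- as.numeric(clean$sales)

# 4. Modelling / summarising: average sales per region
avg_sales <- tapply(clean$sales, clean$region, mean)
print(avg_sales)
```

In practice the analyst would then interpret these averages and recommend an action, which is the final step of the process described above.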
Why Data Analytics? The rise of big data is a significant factor
Informed Decision-Making: Data analytics helps organizations make informed and
data-driven decisions. By analyzing large sets of data, businesses can identify
trends, patterns, and correlations, enabling better decision-making processes.
Big Data vs BI
□ Big Data collectively refers to the act of generating, capturing and
usually processing enormous amounts of data on a continuing
basis.
Data Analytics vs Data Scientist
Need for Data Analytics / Data Science
History
R is a programming language and software environment for statistical analysis, graphics representation and reporting.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
R is freely available under the GNU General Public License.
R is available for various operating systems such as Linux, Windows and Mac.
The language was named R after the first letter of the first names of its two authors (Robert Gentleman and Ross Ihaka).
❖ The name is also a play on the name of the Bell Labs language S.
❖ R made its first appearance in 1993.
❖ Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code archive.
Why Learn R Programming Language
❖ With R, you can perform statistical analysis, data analysis as well as
machine learning.
❖ We can create objects, functions and packages in it.
❖ R is platform-independent and can be used across multiple operating systems.
❖ R is free owing to its open-source GNU licensing and can be installed by
anyone.
❖ R has a robust collection of graphical libraries like ggplot2, plotly and many more.
❖ R is widely used in industries such as health, finance, banking, manufacturing and many more.
❖ There are about 2 million job openings for R programmers worldwide.
❖ Companies hire R programmers for many roles like data analysts, business analysts, data visualization experts, and business intelligence experts.
Features of R
❑ As stated earlier, R is a programming language and software environment for statistical analysis, graphics representation and reporting. The following are the important features of R:
❑ R is a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
❑ R has an effective data handling and storage facility.
❑ R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
❑ R provides a large, coherent and integrated collection of tools for data analysis.
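The language features listed above (conditionals, loops, user-defined recursive functions and operators that work on whole vectors and matrices) can be seen together in a short sketch. The function name fact is a hypothetical example; base R already provides factorial().

```r
# A user-defined recursive function containing a conditional
fact <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * fact(n - 1)
}

# A loop that calls the function and writes output
for (i in 1:5) {
  cat(i, "! = ", fact(i), "\n", sep = "")
}

# Operators act on whole vectors and matrices at once
v <- c(1, 2, 3)
print(v * 10)          # 10 20 30
m <- matrix(1:4, nrow = 2)
print(m %*% m)         # true matrix product
```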
How R is better than Other Technologies
There are certain unique aspects of R programming which make it better in comparison with other technologies:
• Graphical Libraries – Libraries like ggplot2 and plotly make it easy to create appealing, well-defined plots.
• Availability / Cost – R is completely free.
• Advancement in Tool – R supports various advanced tools and features that allow you to build robust statistical models.
• Job Scenario – With the immense growth of Data Science and the rising demand for data skills, R has become one of the most in-demand programming languages today.
• Customer Service Support and Community – With R, you can enjoy strong community support.
• Portability – R is highly portable. Many different programming languages and software frameworks can easily be combined with the R environment for the best results.
Sourcing of R Script
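Sourcing means running all the commands in an R script file, which base R does with source(). To keep the sketch self-contained, it first writes a small hypothetical script to a temporary file; in practice you would pass the path of your own .R file.

```r
# Write a small script to a temporary file (stands in for your own .R file)
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "greeting <- 'Hello from the sourced script'",
  "area <- function(r) pi * r^2"
), script_path)

# Run every line of the script in the current session
source(script_path)

# Objects created by the script are now available
print(greeting)
print(area(2))
```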
RStudio
• RStudio is an Integrated Development Environment (IDE) for R.
• It facilitates extensive code editing and development, as well as various other features.
Features of RStudio
• RStudio provides various tools and features that help boost your coding productivity.
• It can also be accessed over the web and is cross-platform in nature.
• It facilitates automatic checking of updates.
• It provides support for recovery in case of file loss.
• With RStudio, you can manage the data more efficiently.
Components of RStudio
• Source – In the top left corner of the screen is the text editor, which allows you to work with source scripts. You can enter multiple lines in this source.
• Console – This is present in the bottom left corner of the main window.
• Workspace and History – In the top right corner are the R workspace and the history window. This will give you the list of all the variables and
Banking:
□ A large amount of customer data is generated every day in banks. While dealing with millions of customers on a regular basis, it becomes hard to track their mortgages.
□ Solution: R can be used to build a custom model that maintains the loans provided to every individual customer, which helps decide the amount to be paid by the customer over time.
□ Insurance:
Insurance extensively depends on forecasting. It is difficult to decide which policy to accept or reject.
□ Solution: By using the continuous credit report as input, we can create a model in R that not only assesses risk appetite but also makes a predictive forecast.
Healthcare:
□ Every year millions of people are admitted in hospital and billions
□ Given the patient history and medical history, a predictive model can
More Applications of R Programming
❖ In the finance and banking sectors, for detecting fraud, reducing customer churn rate and making future decisions.
❖ In bioinformatics, to analyze strands of genetic sequences, to perform drug discovery, and in computational neuroscience.
❖ In social media analysis, to discover potential customers in online advertising.
❖ Companies also use social media information to analyze customer sentiments for making their products better.
❖ E-commerce companies make use of R to analyze the purchases made by customers as well as their feedback.
❖ Manufacturing companies use R to analyze customer feedback.
❖ They also use it to predict future demand in order to adjust their manufacturing speeds and maximize profits.
Companies Using R
Some of the companies that are using R programming are as
follows:
• Facebook
• Google
• LinkedIn
• IBM
• Twitter
• Uber
• Airbnb
• Ford Motor company
• Microsoft
Who uses R?
1.1 Features of R
R allows branching and looping as well as modular programming using functions.
□ The nature and values of all variables and objects appear here (Console section).
□ Top-Right Section: To manage datasets and variables (Data section).
1.4 Variables in R
1. Naming Variables: A variable in R can store any R object, including an atomic vector, list, matrix, array, factor or data frame. A valid variable name consists of letters, numbers and the dot or underscore characters, and starts with a letter or with a dot that is not followed by a number.
2. Assigning Values to Variables: In R, a value can be assigned to a variable in three ways, using the =, <- or -> operators.
3. Finding Variables: To list all the variables currently available in the workspace, use the ls() function.
4. Removing Variables: A variable can be deleted using the rm() function with the variable name.
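A minimal sketch of the four operations listed above; the variable names are arbitrary examples.

```r
# Valid names use letters, numbers, dots and underscores
var.1 = c(0, 1, 2, 3)          # assignment with =
var_2 <- c("learn", "R")       # leftward assignment with <-
c(TRUE, 1) -> var.3            # rightward assignment with ->

# List the variables in the workspace
print(ls())

# Remove a variable and confirm it is gone
rm(var.3)
print(exists("var.3"))         # FALSE
```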
1.5 Input in R
□ 1.5.1 Input of Data from Terminal: The scan() function is used to take data from the user at the terminal.
□ 1.5.2 Input of Data through R Objects: There are many types of R-objects, including Vectors, Lists, Matrices, Arrays, Factors and Data Frames.
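scan() normally waits for the user to type values at the console. To keep this sketch self-contained it reads from a string through the text argument instead, which behaves as if those values had been typed; quiet = TRUE suppresses the "Read n items" message.

```r
# Read numbers as if they were typed at the terminal
nums <- scan(text = "4 8 15 16 23 42", quiet = TRUE)
print(nums)
print(sum(nums))

# Read words instead of numbers by setting what = ""
words <- scan(text = "apple banana cherry", what = "", quiet = TRUE)
print(words)

# In an interactive session, scan() with no arguments reads from the
# console until an empty line is entered:
# x <- scan()
```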
1.6 Output in R
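A short sketch of common base R output functions: print() shows an object in its standard form, cat() writes values without index labels or quotes (optionally to a file), and format() controls how numbers are displayed. The values are arbitrary examples.

```r
x <- c(3.14159, 2.71828)

# print() displays an object with its index label
print(x)

# cat() joins values and writes them as plain text
cat("Values:", x, "\n")

# format() controls the number of digits shown
formatted <- format(pi, nsmall = 2, digits = 3)
print(formatted)   # "3.14"

# cat(..., file = ) writes the output to a file instead of the console
out_file <- tempfile(fileext = ".txt")
cat("Total:", sum(x), "\n", file = out_file)
print(readLines(out_file))
```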
1.7 Inbuilt Functions in R
□ 1.7.1 Mathematical Functions: R can be used as a calculator, with the facility to use many mathematical functions. Ex: sqrt, abs, floor, ceiling etc.
□ 1.7.2 Trigonometric Functions: R provides the user an ability to compute the
result using different trigonometric functions. Ex: sin, cos, tan etc.
□ 1.7.3 Logarithmic Functions: R can compute the logarithm of a number with a specified base. Ex: log(x, base = 10) or log10(x) for base 10, and log(x) for the natural base.
□ 1.7.4 Date and Time Functions: Dates and times have special classes in R that
allow for numerical and statistical calculations.
□ 1.7.5 Sequence Function: A sequence is a set of related numbers, events, date etc.
that follow each other in a particular order. R has a number of facilities
for generating commonly used sequences of numbers.
□ 1.7.6 Repeat Function: The rep() function replicates the values in a vector. It is a very powerful feature of R that helps the user create a set of values easily.
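One example per category listed above, as a quick reference sketch; every function used is part of base R.

```r
# 1.7.1 Mathematical functions
print(sqrt(16))        # 4
print(abs(-7))         # 7
print(floor(2.7))      # 2
print(ceiling(2.1))    # 3

# 1.7.2 Trigonometric functions (angles in radians)
print(sin(pi / 2))     # 1
print(cos(0))          # 1

# 1.7.3 Logarithmic functions
print(log10(1000))       # 3, base 10
print(log(exp(2)))       # 2, natural log
print(log(8, base = 2))  # 3, any base

# 1.7.4 Date and time functions: dates support arithmetic
d <- as.Date("2024-03-01")
print(d - as.Date("2024-02-01"))   # Time difference of 29 days
print(format(d, "%d %B %Y"))       # month name depends on the locale

# 1.7.5 Sequence function
print(seq(1, 10, by = 3))          # 1 4 7 10

# 1.7.6 Repeat function
print(rep(c("a", "b"), times = 2)) # "a" "b" "a" "b"
print(rep(1:2, each = 3))          # 1 1 1 2 2 2
```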
1.7.7 Strings
□ 1. Creating a String: A string in R is written within a pair of single quotes or double quotes.
□ 2. Concatenating Strings: The paste() function concatenates several strings together. It creates a new string by joining the given strings end to end.
□ 3. Formatting of Strings: Strings can be formatted to a specific style, according to the requirement of the user, using the format() function.
□ 4. Counting the Number of Characters: The nchar() function counts the number of characters, including spaces, in a string.
□ 5. Changing Case: The toupper() and tolower() functions change the case of the characters of a string.
□ 6. Extracting Parts of a String: The substring() or substr() function extracts parts of a string depending on index positions.
□ 7. Searching for Matches: The grep() function is used to search for matches.
□ 8. Changing a String to an Expression: The eval() function evaluates an expression, not a string, so a string must first be converted into an expression with parse(text = ...).
□ 9. Splitting the Elements of a Vector: The strsplit() function splits the elements of a character vector into substrings according to the matches to the substring split within them.
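The nine operations above, applied to short example strings in one runnable sketch; every function used is base R.

```r
# 1. Creating strings with single or double quotes
s1 <- 'Hello'
s2 <- "World"

# 2. Concatenating strings (paste() separates with a space by default)
greeting <- paste(s1, s2)
print(greeting)                        # "Hello World"

# 3. Formatting: pad to a fixed width
print(format("R", width = 5))          # "R    "

# 4. Counting characters, including spaces
print(nchar(greeting))                 # 11

# 5. Changing case
print(toupper(greeting))               # "HELLO WORLD"
print(tolower(greeting))               # "hello world"

# 6. Extracting part of a string (characters 1 to 5)
print(substr(greeting, 1, 5))          # "Hello"

# 7. Searching for matches: which elements contain "an"?
fruits <- c("banana", "apple", "mango")
print(grep("an", fruits))              # 1 3

# 8. eval() needs an expression, so parse the string first
code <- "2 + 3"
print(eval(parse(text = code)))        # 5

# 9. Splitting a string into substrings
print(strsplit("a,b,c", ",")[[1]])     # "a" "b" "c"
```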
1.8 Packages in R
□ 1.8.1 Standard Packages: R packages are a collection of R functions, compiled code and sample data. They are stored under a directory called "library" in the R environment.
□ 1.8.2 Contributed Packages: There are thousands of contributed
packages for R, written by many authors. Some of these packages
implement specialized statistical methods, others give access to
data or hardware, and others are designed to complement
textbooks.
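A minimal sketch of working with packages: .libPaths() shows the library directories, library() loads an installed package, and installed.packages() lists what is available. Installing a contributed package needs an internet connection, so that call is shown commented out; ggplot2 is used only as an example name.

```r
# Directories where R looks for installed packages ("library" folders)
print(.libPaths())

# Load a standard package that ships with R
library(stats)
print("stats" %in% loadedNamespaces())

# Check whether a package is installed before using it
is_installed <- "stats" %in% rownames(installed.packages())
print(is_installed)

# Installing and loading a contributed package from CRAN
# (requires internet access)
# install.packages("ggplot2")
# library(ggplot2)
```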
R Installation
https://cran.r-project.org/bin/windows/base/
R Console Window
R Command Prompt
Once you have R environment setup, then it’s easy to start your R
This will launch R interpreter and you will get a prompt > where
https://rstudio.com/products/rstudio/download/#download
R - Data Types
❖ In contrast to other programming languages like C and Java, in R variables are not declared with a data type.
❖ Variables are assigned R-objects, and the data type of the R-object becomes the data type of the variable.
❖ There are many types of R-objects. The frequently used ones are:
✓ Vectors
✓ Lists
✓ Matrices
✓ Arrays
✓ Factors
✓ Data Frames
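Because the type comes from the object that is assigned, class() shows what kind of R-object a variable currently holds. A sketch with one example of each object listed above (the values are arbitrary):

```r
v  <- c(1.5, 2.5)                          # vector
l  <- list(1, "a", TRUE)                   # list
m  <- matrix(1:6, nrow = 2)                # matrix
a  <- array(1:24, dim = c(2, 3, 4))        # array
f  <- factor(c("low", "high", "low"))      # factor
df <- data.frame(x = 1:2, y = c("p", "q")) # data frame

print(class(v))   # "numeric"
print(class(l))   # "list"
print(class(m))   # "matrix" "array" in R 4.0 and later
print(class(a))   # "array"
print(class(f))   # "factor"
print(class(df))  # "data.frame"

# The same variable can later hold a different type of object
v <- "now a character vector"
print(class(v))   # "character"
```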
R - Functions
✓ A function is a set of statements to perform a specific task.
✓ R has a large number of built-in functions.
✓ The user can create their own functions.
Built-in Function
✓ Simple examples of built-in functions are seq(), mean(), max(), sum(x) and paste(...) etc.
✓ They are directly called by user-written programs.
# Create a sequence of numbers from 32 to 44.
print(seq(32,44))
[1] 32 33 34 35 36 37 38 39 40 41 42 43 44
# Find mean of numbers from 25 to 82.
print(mean(25:82))
[1] 53.5
# Find sum of numbers from 41 to 68.
print(sum(41:68))
[1] 1526
User-defined Function
❖ They are specific to what a user wants, and once created they can be used like the built-in functions.
# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
  for(i in 1:a) {
    b <- i^2
    print(b)
  }
}
Calling a Function
# Call the function new.function supplying 6 as an argument.
new.function(6)
Produces the following result:
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
R String Manipulation Functions
1. grep()
It is used for pattern matching: it returns the indices (or, with value = TRUE, the values) of the elements that match a pattern. Replacement is done with the related functions sub() and gsub().
grep("b+", c("abc", "bda", "ccaa", "abd"), perl=TRUE, value=TRUE)
[1] "abc" "bda" "abd"
grep("b+", c("abc", "bda", "ccaa", "abd"), perl=TRUE, value=FALSE)
[1] 1 2 4
grep("chid+", c("chidambaram", "Villupuram", "Srimushnam", "chidambaram"), perl=TRUE, value=FALSE)
[1] 1 4
grep("அ+", c("அṅпw", "øwøøw", "அddw"), perl=TRUE, value=FALSE)
2. nchar()
With the help of this function, we can count the characters in a string.
> str <- "Big Data at DataFlair"
> nchar(str)
[1] 21
3. paste()
Concatenate n number of strings using the paste() function.
> #Author DataFlair
> paste("Hadoop",
[1] Matthew scored 72.30 percent
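The output line "Matthew scored 72.30 percent" looks like the result of formatted string output. One way to produce exactly that text in base R is sprintf(); this is an assumption about how it was generated, not the slide's original code.

```r
name  <- "Matthew"
score <- 72.3

# %s inserts a string; %.2f inserts a number with two decimal places
msg <- sprintf("%s scored %.2f percent", name, score)
print(msg)   # "Matthew scored 72.30 percent"

# paste() joins the same pieces but does not control decimal places
print(paste(name, "scored", score, "percent"))   # "Matthew scored 72.3 percent"
```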
5. strsplit()
> #Author DataFlair
> str = "Splitting sentence into words"
> strsplit(str, " ")
> strsplit(str, "")
Output
[[1]]
[1] "Splitting" "sentence" "into" "words"
[[1]]
[1] "S" "p" "l" "i" "t" "t" "i" "n" "g" " " "s" "e" "n" "t" "e" "n" "c" "e" " " "i" "n" "t" "o" " " "w" "o" "r" "d" "s"
Vector
Vectors are the most basic R data objects, and there are six types of atomic vectors: logical, integer, double, complex, character and raw.
When a vector is created with c() and one of its elements is a character, the non-character values are coerced to character type.
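The six atomic types, and the coercion rule just described, can be checked with typeof(). A short sketch:

```r
print(typeof(TRUE))            # "logical"
print(typeof(12L))             # "integer"
print(typeof(12.5))            # "double"
print(typeof(3 + 2i))          # "complex"
print(typeof("R"))             # "character"
print(typeof(charToRaw("R")))  # "raw"

# One character element turns every element of the vector into character
mixed <- c("apple", 5, TRUE)
print(mixed)                   # "apple" "5" "TRUE"
print(typeof(mixed))           # "character"
```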
Accessing Vector Elements
❖ The [ ] brackets are used for indexing. Indexing starts with position 1.
❖ Giving a negative value in the index drops that element from the result.
❖ TRUE and FALSE can also be used for indexing. Numeric 0 and 1 are treated as positions, not as logical values: 0 selects nothing and 1 selects the first element.
# Accessing vector elements using position.
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)
[1] "Mon" "Tue" "Fri"
# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)
[1] "Sun" "Fri"
# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
# Accessing vector elements using 0/1 indexing.
y <- t[c(0,0,0,0,0,0,1)]
print(y)
[1] "Sun"
R - Lists
❖ Lists are R objects that can contain elements of different types, such as numbers, strings, vectors and even another list.
❖ A list is created using the list() function.
Creating a List
# Create a list containing strings, numbers, vectors and logical values.
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)
[[1]]
[1] "Red"

[[2]]
[1] "Green"

[[3]]
[1] 21 32 11

[[4]]
[1] TRUE

[[5]]
[1] 51.23

[[6]]
[1] 119.1
R - Matrices
Syntax
matrix(data, nrow, ncol, byrow, dimnames)
❖ data is the input vector which becomes the data elements of the matrix.
❖ nrow is the number of rows to be created.
❖ ncol is the number of columns to be created.
❖ byrow: if TRUE, the input vector elements are arranged by row.
❖ dimnames is the names assigned to the rows and columns.
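Matrix Example
# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)
     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14
# Define row and column names and create a named matrix P.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
# Access only the 2nd row.
print(P[2,])
col1 col2 col3
   6    7    8
# Access the element at 2nd column and 4th row.
print(P[4,2])
[1] 13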
# Example for Matrix
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), 3, 3)
B <- matrix(c(11, 12, 13, 14, 15, 16, 17, 18, 19), 3, 3)
C <- matrix(c(21, 22, 23, 24, 25, 26, 27, 28, 29), 3, 3)
A
# Check whether the variable A is a matrix or not
is.matrix(A)
# Multiplication by a scalar
s <- 3
s1 <- A*s
s1
# Matrix addition
Matadd <- A+B
Matadd
# Element-wise multiplication (true matrix multiplication is A %*% B)
Matmul <- A*B
Matmul
# Matrix transpose
tm <- t(A)
tm
# Computing column & row sums
sum(A)
colSums(A)
rowSums(A)
# Computing column & row means
mean(A)
colMeans(A)
rowMeans(A)
# Accessing a matrix element (row 2, column 3)
A[2, 3]
R - Data Frames
❑ A data frame is a table or a two-dimensional array-like structure.
❑ Each column contains values of one variable, and each row contains one set of values from each column.
Following are the characteristics of a data frame:
❑ Each column should contain the same number of data items.
# Create the data frame.
emp.data <- data.frame(
  emp_id = c(1:5),
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)
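Once a data frame exists, its columns and rows can be selected much like a matrix. A sketch using the same employee data, recreated so that the block runs on its own; the dept column and its values are hypothetical additions for illustration.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Structure: one line per column, with its type
str(emp.data)

# Extract a single column as a vector
print(emp.data$emp_name)

# First two rows, columns 1 and 3
print(emp.data[1:2, c(1, 3)])

# Rows selected by a condition: salary above 700
high_earners <- emp.data[emp.data$salary > 700, ]
print(high_earners$emp_name)   # "Ryan" "Gary"

# Add a new (hypothetical) column
emp.data$dept <- c("IT", "Operations", "IT", "HR", "Finance")
print(ncol(emp.data))          # 5
```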
Summary of Data in Data Frame
The statistical summary and nature of the data can be obtained
by applying summary() function.
# Create the data frame.
emp.data <- data.frame(
  emp_id = c(1:5),
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)
# Print the summary.
summary(emp.data)