DSR Unit 1

DATA SCIENCE USING R

VIII SEMESTER
DS-427T

Department of Computer Science and Engineering, BVCOE New Delhi
Subject: Data Science Using R, Instructor: Ms RACHNA NARULA, 8/20/2024
Introduction
⚫ R is an open-source language contributed to by developers
and programmers from all around the world.
⚫ Thanks to its platform independence, wide range of packages, and robust
graphical features, it has become a primary tool for the analytics
industry.
⚫ R has become a lingua franca of data science and statistics and is
one of the most popular analytic tools.
⚫ The number of R users is estimated at roughly 2 million.
⚫ R is part of the GNU Project.
⚫ It is written primarily in C and Fortran.
⚫ Available for various operating systems: Unix/Linux, Windows, and
Mac.
⚫ Can be downloaded and installed from http://cran.r-project.org/
History of R

Popularity of R

Some application areas

Different Job Roles
Some of the positions available to R programmers are as follows:
⚫ Data Scientist: A Data Scientist extracts data, transforms it into a
structured format, performs analysis, and forecasts future insights.
⚫ Business Analyst: A Business Analyst develops technical solutions
to the company's business problems. They seek solutions, advance
the company's efforts, and make sure the requirements of the
business are met.
⚫ Data Analyst: A Data Analyst is responsible for extracting and
analyzing data. This task requires extensive use of R's
statistical libraries to deliver accurate results so that
companies can make careful, data-driven decisions.
⚫ Data Visualization Expert: R is especially popular for its
visualization libraries. For this reason, Data Visualization
experts who program in R are in demand across industries.
⚫ Quantitative Analyst: Quantitative Analysts work in the
financial and banking industries. These industries deal with
all types of data, and R is well suited to many of their
data problems.

What R does and does not do
What R does:
o data handling and storage: numeric, textual
o matrix algebra
o hash tables and regular expressions
o high-level data analytic and statistical functions
o classes (“OO”)
o graphics
o programming language: loops, branching, subroutines
What R does not do (and the workarounds):
o is not a database, but connects to DBMSs
o has no graphical user interfaces, but connects to Java, TclTk
o language interpreter can be very slow, but allows calling your own C/C++ code
o no spreadsheet view of data, but connects to Excel/MsOffice
o no professional / commercial support

Source: dataflair.org

When to choose between R and Python?
• The choice between R and Python also depends on what you
are trying to accomplish with your code.
• If you are analyzing a dataset and presenting the
findings in a research paper, R is probably the better
choice.
• If you are writing a data analysis program that runs in a
distributed system and interacts with many other
components, Python is usually preferable.

Structured and Unstructured Data
⚫ Structured Data
⚫ Structured data is to the point, factual, and highly
organized: it follows a predefined model, such as the rows
and columns of a table. It is largely quantitative in nature,
i.e., it contains measurable values such as numbers, dates,
and times.

Unstructured Data
⚫ Unstructured data includes text documents, log files, audio files, and
image files. Some organizations have a great deal of data available but
do not know how to derive value from it, because the data is raw.

⚫ Unstructured data is data that lacks any predefined
model or format. It requires a lot of storage space, and it is
hard to keep secure. It cannot be represented in a
data model or schema, which is why managing, analyzing,
or searching unstructured data is hard. It comes in
many different formats, such as text, images, audio, and
video files. It is qualitative in nature and is sometimes
stored in a non-relational (NoSQL) database.
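The difference matters in practice because structured data can be queried as-is, while unstructured data must first be given some structure. The Python sketch below illustrates this with made-up records and a made-up sentence (all names, values, and the date pattern are hypothetical); the same idea carries over to data frames in R.

```python
import re

# Structured data: every record follows the same predefined schema
# (same field names and types), so it can be filtered directly.
structured = [
    {"name": "Asha", "age": 21, "joined": "2024-01-15"},
    {"name": "Ravi", "age": 23, "joined": "2024-02-03"},
]
older_than_22 = [rec["name"] for rec in structured if rec["age"] > 22]

# Unstructured data: free text with no schema. Before analysis we must
# impose structure, here by extracting YYYY-MM-DD dates with a regex.
unstructured = "Asha joined on 2024-01-15; Ravi signed up later, on 2024-02-03."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", unstructured)

print(older_than_22)  # ['Ravi']
print(dates)          # ['2024-01-15', '2024-02-03']
```

The regular expression only recovers what its pattern anticipates; a date written as "15 Jan 2024" would be missed, which is one reason unstructured data is harder to manage.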

Qualitative and Quantitative Data
⚫ Statistics is the subject that deals with the collection,
analysis, and presentation of data. The results of
statistical methods are used in fields such as geology,
psychology, and forecasting.

⚫ Quantitative data is numerical, countable, and measurable,
providing information on how many, how much, or how
often. Qualitative data, by contrast, is descriptive,
interpretative, and language-based, helping us understand
the reasons, processes, or contexts behind certain
behaviors.
Qualitative Data
⚫ Data collected on categorical variables is qualitative
data. Qualitative data is more descriptive and conceptual
in nature. It records the type, quality, or category of each
observation rather than a numerical amount.

⚫ Data collection focuses on which quality or category an
observation has. Qualitative data is grouped into categories
based on characteristics. The data obtained from this kind
of analysis or research is used to build theories, understand
perceptions, and develop hypotheses. It is collected from
texts, documents, transcripts, audio and video recordings, etc.
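Because qualitative values are categories rather than amounts, they are usually summarized by counting how often each category occurs. A minimal Python sketch, assuming open-ended survey answers that have already been coded into hypothetical categories:

```python
from collections import Counter

# Hypothetical survey answers, already coded into categories
# (qualitative / categorical data).
responses = ["price", "quality", "price", "service", "quality", "price"]

# Summarize by grouping and counting; arithmetic such as an average
# has no meaning for category labels.
counts = Counter(responses)
top_category, top_count = counts.most_common(1)[0]

print(dict(counts))             # {'price': 3, 'quality': 2, 'service': 1}
print(top_category, top_count)  # price 3
```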

Examples of Qualitative Data

⚫ Textual responses from open-ended survey questions
⚫ Observational notes or fieldwork observations
⚫ Interview transcripts
⚫ Photographs or videos
⚫ Personal narratives or case studies

Quantitative Data
⚫ Data collected on numerical variables is quantitative data.
Quantitative data is more objective and conclusive in nature.
It measures values and is expressed in numbers, so it can be
counted or measured; data collection focuses on "how much" of a
quantity there is. The data is extracted from experiments,
surveys, market reports, matrices, etc. Some examples of
quantitative data are:

⚫ Age, Height, Weight, etc.
⚫ Temperature
⚫ Income
⚫ Number of siblings
⚫ GPA
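Quantitative data, by contrast, supports arithmetic summaries. A short Python sketch using the standard `statistics` module on hypothetical ages:

```python
import statistics

# Hypothetical ages of five survey respondents (quantitative data).
ages = [19, 21, 21, 24, 30]

# Numeric values can be averaged, ordered, and subtracted meaningfully.
mean_age = statistics.mean(ages)      # 23
median_age = statistics.median(ages)  # 21
age_range = max(ages) - min(ages)     # 11
spread = statistics.stdev(ages)       # sample standard deviation

print(mean_age, median_age, age_range, round(spread, 2))
```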
Levels of Measurement
⚫ Levels of measurement, also called scales of measurement, tell you
how precisely variables are recorded. In scientific research, a
variable is anything that can take on different values across your
data set (e.g., height or test scores).

⚫ There are 4 levels of measurement:

⚫ Nominal: the data can only be categorized
⚫ Ordinal: the data can be categorized and ranked
⚫ Interval: the data can be categorized, ranked, and evenly spaced
⚫ Ratio: the data can be categorized, ranked, evenly spaced, and has
a natural zero.
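The difference between the interval and ratio levels is easiest to see with temperature. Celsius is an interval scale (0 °C does not mean "no temperature"), while Kelvin has a natural zero and is a ratio scale. A minimal Python sketch with hypothetical readings:

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale Celsius reading to the ratio-scale Kelvin."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales because units are evenly spaced.
diff_c = warm_c - cool_c                                        # 10.0
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)  # ~10.0

# Ratios are only meaningful on a scale with a natural zero.
naive_ratio = warm_c / cool_c                                       # 2.0, misleading
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # ~1.035

print(f"naive: {naive_ratio:.3f}, physical: {true_ratio:.3f}")
```

So 20 °C is not "twice as warm" as 10 °C; dividing Celsius values treats an interval scale as if it were a ratio scale.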

⚫ Depending on the level of measurement of a variable,
the analyses you can perform on your data may be limited.
The levels form a hierarchy of increasing complexity and
precision, from low (nominal) to high (ratio).
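One way to picture this hierarchy is as a table of summary operations in which each level inherits everything permitted at the levels below it. The Python sketch below encodes a common, simplified textbook version of that table; the exact list of operations per level is an assumption for illustration, not an exhaustive rule.

```python
from typing import List

# Levels ordered from least to most precise.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that first become meaningful at each level (simplified).
NEW_OPERATIONS = {
    "nominal": ["frequency counts", "mode"],
    "ordinal": ["median", "percentiles"],
    "interval": ["differences", "mean", "standard deviation"],
    "ratio": ["ratios"],
}

def allowed_operations(level: str) -> List[str]:
    """Return every operation meaningful at `level`, including those
    inherited from all lower levels. Raises ValueError for unknown levels."""
    position = LEVELS.index(level)
    operations: List[str] = []
    for lower_level in LEVELS[: position + 1]:
        operations.extend(NEW_OPERATIONS[lower_level])
    return operations

print(allowed_operations("ordinal"))
# ['frequency counts', 'mode', 'median', 'percentiles']
```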

The Five Steps of Data Science
⚫ Data science is the detailed study of the flow of information
from the vast amounts of data present in an organization's
repository. It involves obtaining meaningful insights from raw
and unstructured data, which is processed using analytical,
programming, and business skills. The five essential steps of
data science are as follows:
⚫ 1. Asking an interesting question
⚫ 2. Obtaining the data
⚫ 3. Exploring the data
⚫ 4. Modeling the data
⚫ 5. Communicating and visualizing the results
Ask an interesting question
⚫ This is probably my favorite step. As an entrepreneur, I
ask myself (and others) interesting questions every day. I
would treat this step as you would treat a brainstorming
session. Start writing down questions regardless of
whether or not you think the data to answer these
questions even exists.

Obtain the data
⚫ Once you have selected the question you want to focus
on, it is time to scour the world for data that might be
able to answer it. Data can come from a wide variety of
sources, so this step can be very creative!
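Whatever the source, obtained data usually has to be loaded into a program before it can be explored. A minimal Python sketch, assuming the data arrives as CSV text; the column names and values are hypothetical, and an in-memory string stands in for a downloaded file:

```python
import csv
import io

# Stand-in for a file obtained from some source (a download, an export,
# or a database dump). In practice you would open() the real file.
raw_csv = """store,month,units_sold
A,Jan,120
A,Feb,135
B,Jan,90
"""

# DictReader uses the header row as field names for each record.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# CSV fields are read as strings, so numeric columns need explicit conversion.
for row in rows:
    row["units_sold"] = int(row["units_sold"])

print(len(rows), rows[0])
```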

Explore the data
⚫ By the end of this step, the analyst has generally
spent several hours learning about the domain, using code
or other tools to manipulate and explore the data, and has
a very good sense of what the data might be telling
them.
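A typical first pass at exploration is to group the data and compute a few summary statistics per group, looking for patterns or anomalies. A minimal Python sketch on hypothetical (store, units sold) records:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cleaned records: (store, units sold).
sales = [("A", 120), ("A", 135), ("B", 90), ("B", 110), ("B", 70)]

# Group the values by store.
by_store = defaultdict(list)
for store, units in sales:
    by_store[store].append(units)

# Per-group summaries: count, mean, minimum, and maximum.
summary = {
    store: {"n": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}
    for store, values in by_store.items()
}

for store, stats in summary.items():
    print(store, stats)
```

A gap between groups (here, store A's higher mean) is the kind of observation that suggests what to model next.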

Model the data
⚫ This step involves the use of statistical and machine learning
models. In this step, we not only fit and choose models but
also implement mathematical validation metrics in order to
quantify the models and their effectiveness.
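As a small illustration of fitting a model and then quantifying it with validation metrics, the Python sketch below fits a straight line by ordinary least squares on a training split, then scores it with R² on the training points and mean squared error (MSE) on held-out points. The data and the 6/2 split are hypothetical; in practice you would typically use a library model and a larger, randomized split.

```python
# Hypothetical data: advertising spend (x) and units sold (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2]

# Hold out the last two points so the model is checked on unseen data.
x_train, y_train = x[:6], y[:6]
x_test, y_test = x[6:], y[6:]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mse(actual, predicted):
    """Mean squared error: average squared prediction error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

intercept, slope = fit_line(x_train, y_train)

def predict(xs):
    """Apply the fitted line to new inputs."""
    return [intercept + slope * xi for xi in xs]

train_r2 = r_squared(y_train, predict(x_train))
test_mse = mse(y_test, predict(x_test))
print(f"slope={slope:.3f}, intercept={intercept:.3f}, "
      f"train R^2={train_r2:.4f}, test MSE={test_mse:.4f}")
```

Scoring on points the model never saw is what turns "the line looks good" into a quantified claim about effectiveness.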

Communicate and visualize the results
⚫ This is arguably the most important step. While it might
seem obvious and simple, presenting your conclusions in a
digestible format is much more difficult than it seems. We
will look at different examples of cases in which results
were communicated poorly and cases in which they were
displayed very well.
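Even without a plotting library, the principle of a digestible presentation can be shown: a labelled bar chart is usually easier to read than a list of raw numbers. A minimal, dependency-free Python sketch that renders hypothetical results as a text bar chart; for real charts you would normally use R's graphics.

```python
def text_bar_chart(values: dict, width: int = 30) -> str:
    """Render {label: positive value} as a horizontal text bar chart,
    scaling bars so the largest value spans `width` characters."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical result to communicate: mean units sold per store.
chart = text_bar_chart({"Store A": 127.5, "Store B": 90})
print(chart)
```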
