Class One
Language.
Facilitators:
Prof. Susan Balaba Tumwebaze
Dr. Thomas Odong
Dr. Hellen Namawejje
R-programming and RStudio
• R software
• Download R
https://www.r-project.org/
or
https://cran.r-project.org/
• RStudio software
https://www.rstudio.com/products/rstudio/download/
• Select the download for your operating system
• Reproducibility
• Note: You can install R and RStudio directly by typing these URLs into Google
R and RSTUDIO workspace
How to update R and RStudio
• Updating R: R 4.3.1 is the latest R version at the time of writing
Option One
>help(solve)
>?solve
>? t.test
or
>help(t.test)
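Besides help() and ?, a few other base R functions are handy for finding your way around. The sketch below is only illustrative; the function names looked up (solve, t.test, mean) are arbitrary choices.

```r
# Show the arguments of a function without opening its help page
args(solve)

# List objects on the search path whose names match "t.test"
hits <- apropos("t.test")
print(hits)

# Run the examples that appear at the bottom of a help page
example(mean)

# To search all installed help pages by keyword, use:
# help.search("regression")   # same as ??regression
```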
Basic concepts in R
• R as a calculator
• R is case sensitive
• Factors
Data is sometimes categorized
e.g. Type of soils
(Loam, Clay, Sandy)
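The three ideas above can be tried directly at the console. This is a minimal sketch; the object names x, X and soil are made up for illustration.

```r
# R as a calculator
2 + 3 * 4     # multiplication is done before addition
sqrt(16)
10 / 4

# R is case sensitive: x and X are two different objects
x <- 5
X <- 10
x == X        # FALSE

# Factors store categorical data, e.g. type of soil
soil <- factor(c("Loam", "Clay", "Sandy", "Clay", "Loam"))
levels(soil)  # the distinct categories
table(soil)   # how many observations fall in each category
```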
DATA FRAME in R
• Data frame: represents a typical data table that researchers come up with, like a spreadsheet.
Characteristics of a Data frame
• The column names should be non-empty
• The row names should be unique
• The data stored in a data frame can be numeric, factor or character type
• Each column should contain the same number of data items
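A small data frame showing these characteristics could be built as follows. The data frame plots and its columns are hypothetical, invented only for this sketch.

```r
# Each column has a non-empty name and the same number of items (3)
plots <- data.frame(
  plot_id  = c("P1", "P2", "P3"),               # character column
  soil     = factor(c("Loam", "Clay", "Sandy")), # factor column
  yield_kg = c(12.5, 9.8, 11.2),                 # numeric column
  stringsAsFactors = FALSE
)

str(plots)            # compact overview of the structure
names(plots)          # column names
dim(plots)            # number of rows and columns
sapply(plots, class)  # type of each column
rownames(plots)       # row names are unique ("1", "2", "3")
```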
Level of measurement
Some Definitions
Variable: Gender
Attributes: Female, Male
What Is Level of Measurement?
The relationship of the values that are assigned
to the attributes for a variable
Variable: Party Affiliation
Values: 1, 2, 3
Relationship
Types of level of measurement
1. Nominal
2. Ordinal
3. Interval
4. Ratio
• Nominal: The values “name” the attribute uniquely; the name does not imply any ordering of the cases
• Ordinal: Attributes can be rank-ordered…
• Interval: The distance between attributes has meaning, e.g. temperature: the distance from 30 to 40 is the same as the distance from 70 to 80
• Ratio: An absolute zero is meaningful, e.g. number of clients in the past six months
• It is meaningful to say that “...we had twice as many clients in this period as we did in the previous six months”
Note:
Interval and Ratio are sometimes referred to as Scale measurements
Why is Level of measurement important?
• Helps you to decide what statistical analysis is appropriate on the values that were assigned
• Helps you decide how to interpret the data from that variable
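One way to see how the four levels map onto R objects is sketched below. The variable names and values are illustrative assumptions, not data from the class.

```r
# Nominal: categories with no order
gender <- factor(c("Female", "Male", "Female"))

# Ordinal: categories that can be rank-ordered
rating <- factor(c("Low", "High", "Medium"),
                 levels = c("Low", "Medium", "High"),
                 ordered = TRUE)

# Interval: differences are meaningful (temperature)
temp <- c(30, 40, 70, 80)

# Ratio: zero is meaningful, so ratios make sense (client counts)
clients <- c(previous = 10, current = 20)

is.ordered(gender)                          # FALSE: no ordering implied
rating[1] < rating[2]                       # TRUE: Low comes before High
(temp[2] - temp[1]) == (temp[4] - temp[3])  # TRUE: equal distances
clients[["current"]] / clients[["previous"]] # 2: twice as many clients
```

Comparisons with < only work on the ordered factor; applying them to gender returns NA with a warning, which reflects the point that nominal values carry no order.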
The Hierarchy of Levels
• We need to first import the data into R
• Importing data from different file types and sources using add-on packages
• Data files from other programs e.g., SAS .sas files, SPSS .sav files, or STATA .dta files etc. can be used
Importing data
• Importing the most commonly used file types of CSV and Excel files using the data.
Packages
• From Text (readr) to import CSV files
• From Excel (readxl) to import Excel files
For Windows: file paths use a single backslash “\” by default, but R interprets backslashes as escape characters, so you must replace them with forward slashes “/” or double backslashes “\\”
For Mac/Linux: file paths already use forward slashes
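The path rules can be checked in R without touching any real file. The path used here is a made-up example; the raw-string form r"(...)" assumes R 4.0.0 or later.

```r
# Writing "C:\Users\me\Documents\abc.csv" with single backslashes is an
# error, because R reads \U as the start of an escape sequence.

# Option 1: forward slashes (works on Windows, Mac and Linux)
p1 <- "C:/Users/me/Documents/abc.csv"

# Option 2: double backslashes
p2 <- "C:\\Users\\me\\Documents\\abc.csv"

# Option 3 (R >= 4.0.0): a raw string keeps backslashes as typed
p3 <- r"(C:\Users\me\Documents\abc.csv)"

identical(p2, p3)   # TRUE: both hold the same characters
cat(p2, "\n")       # prints the path with single backslashes

# file.path() builds a path from its parts using forward slashes
file.path("C:", "Users", "me", "Documents", "abc.csv")
```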
IMPORTING DATA IN R
• Using ABC dataset
• Install required packages
• Set the working directory using the setwd() command (getwd() shows the current working directory)
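The steps above can be sketched end to end as follows. Since the ABC dataset itself is not reproduced here, the sketch writes a tiny stand-in file with invented columns to a temporary folder and reads it back with base R's read.csv(); the readr and readxl calls are shown only as comments because they need the packages installed and a real file.

```r
# Install the add-on packages once (uncomment to run):
# install.packages(c("readr", "readxl"))

old_wd <- getwd()   # remember the current working directory
setwd(tempdir())    # set the working directory (a temporary folder here)
getwd()             # confirm where R is now looking for files

# Stand-in for the ABC dataset (hypothetical columns)
abc <- data.frame(id = 1:3,
                  group = c("A", "B", "C"),
                  score = c(4.5, 3.2, 5.1))
write.csv(abc, "ABC.csv", row.names = FALSE)

# Import the CSV file
abc_in <- read.csv("ABC.csv")
# With the packages installed, the equivalents would be:
# abc_in <- readr::read_csv("ABC.csv")
# abc_xl <- readxl::read_excel("ABC.xlsx")

head(abc_in)        # first rows of the imported data
str(abc_in)         # column types

setwd(old_wd)       # restore the original working directory
```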
Note: To perform the above operations we will use the dplyr and tidyverse packages. dplyr provides functions that make these operations more intuitive and the code more readable.
DM1: Creating new/adding a variable(s)
• Use the assignment operator <-
to create new variables.
e.g.
salaries$rank <- as.factor(salaries$rank)
check
class(salaries$rank)
levels(salaries$rank)
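A self-contained version of these steps is sketched below. The salaries data frame here is a small hypothetical stand-in with invented values, and salary_thousands is an illustrative new variable, not one from the class dataset.

```r
# Hypothetical stand-in for the salaries data
salaries <- data.frame(
  rank   = c("Prof", "AsstProf", "AssocProf", "Prof"),
  salary = c(120000, 80000, 95000, 130000),
  stringsAsFactors = FALSE
)

# Create a new variable with the assignment operator <-
salaries$salary_thousands <- salaries$salary / 1000

# Convert a character variable to a factor, then check the result
salaries$rank <- as.factor(salaries$rank)
class(salaries$rank)    # "factor"
levels(salaries$rank)   # the distinct ranks

head(salaries)
```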
DM6: Continued: Continuous variables
• Scale (ratio and interval): numeric/integer
check
class(salaries$salary)
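If a continuous variable was imported as text, the check above would report "character" rather than "numeric". A minimal sketch of checking and converting, using made-up values:

```r
# A salary column that was read in as text (hypothetical values)
salary_text <- c("120000", "80000", "95000")
class(salary_text)      # "character"

# Convert to numeric so it can be used in calculations
salary <- as.numeric(salary_text)
class(salary)           # "numeric"
is.numeric(salary)      # TRUE
summary(salary)         # min, quartiles, mean, max

# Whole numbers created with : or an L suffix are stored as integer
class(1:3)              # "integer"
```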
DM7: Dealing with missing values
• It might happen that your dataset is not complete.
• There are advanced ways that can be used to impute missing data.
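Before turning to imputation, it helps to see where the gaps are. The sketch below uses base R only; the data frame staff and its values are invented for illustration.

```r
# Hypothetical data with missing entries (NA)
staff <- data.frame(
  rank   = c("Prof", "AsstProf", NA),
  salary = c(100, NA, 120)
)

is.na(staff$salary)           # which salaries are missing
colSums(is.na(staff))         # number of missing values per column
complete.cases(staff)         # rows with no missing values

# Many functions return NA unless told to skip missing values
mean(staff$salary)                # NA
mean(staff$salary, na.rm = TRUE)  # mean of the observed values

# Drop every row that has at least one missing value
staff_complete <- na.omit(staff)
nrow(staff_complete)
```

Dropping rows with na.omit() is the simplest option but discards data, which is why imputation methods exist.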
prop.table() #percentages
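Because R is case sensitive, the function must be typed in lower case as prop.table(). It turns a table of counts into proportions, which can then be multiplied by 100 to give percentages. The rank vector below is a made-up example.

```r
# Hypothetical categorical variable
rank <- c("Prof", "Prof", "AsstProf", "AssocProf")

counts <- table(rank)          # frequency of each category
props  <- prop.table(counts)   # proportions that sum to 1
pct    <- round(100 * props, 1) # percentages

counts
pct
```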