R Vectors

Uploaded by

dzedziphilly
Copyright
© © All Rights Reserved

NAME REGISTRATION NUMBER

Oscar T Gotosa R2213059N

Cecilia Thauzeni R2212327B

Takudzwa Masakadza R2213087A

Gerald D Mikitai R2212196C

Munashe Chirarapasi R228422P

Ntombikhona Madhlauza R2213119C


R VECTORS
• A vector is simply a list of items that are of the same type.
• To combine elements into a vector, use the c() function, separating the elements with commas.
• For example: x = c(1, 3, 5, 7, 9)
• We can create a character vector by putting quotes (" ") around each element.
• For example: gender = c("male", "female")
• We can also create a sequence of integer values using the : operator.
• For example: 2:7
• For general sequences, use the seq() function.
• For example 1): seq(from = 1, to = 7, by = 1)
• 2): seq(from = 1, to = 7, by = 1/3)
• 3): seq(from = 1, to = 7, by = 0.25)
• To find out how many items a vector has, use the length() function.
• For example: length(x)
• To repeat values, use the rep() function.
• For example 1): rep(1, times = 10)
• 2): rep("marine", times = 5)
• We can also repeat a whole sequence or vector several times.
• For example 1): rep(1:3, times = 5)
• 2): rep(seq(from = 2, to = 5, by = 0.25), times = 5)
• 3): rep(c("m", "f"), times = 5)
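The calls on this slide can be tried together in one short session. This is a minimal sketch; the variable names ints, quarters, ones and cycle are illustrative and do not come from the slides.

```r
# Numeric vector built with c(); elements are separated by commas
x <- c(1, 3, 5, 7, 9)

# Character vector: each element is quoted
gender <- c("male", "female")

# Integer sequence with the : operator
ints <- 2:7                                # 2 3 4 5 6 7

# General sequence with seq(); 'by' sets the step size
quarters <- seq(from = 1, to = 7, by = 0.25)

# Number of elements in each vector
length(x)                                  # 5
length(quarters)                           # 25

# Repeat a single value, or a whole vector, with rep()
ones  <- rep(1, times = 10)
cycle <- rep(1:3, times = 5)               # 1 2 3 1 2 3 ...
length(cycle)                              # 15
```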
BASIC VECTOR OPERATIONS

• We can apply a value to each element of a vector using the basic math operators (+ - * /).
• For example 1): x + 10
• 2): x - 10
• 3): x * 10
• 4): x / 10
• Similarly, if two vectors have the same length, the basic operators work on them element by element.
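As a small illustration of the operations above, the sketch below reuses x from the first slide and assumes a second vector y of the same length (y is hypothetical, not from the slides).

```r
x <- c(1, 3, 5, 7, 9)
y <- c(2, 4, 6, 8, 10)   # hypothetical second vector of the same length

# A single value is applied to every element
x + 10   # 11 13 15 17 19
x - 10   # -9 -7 -5 -3 -1
x * 10   # 10 30 50 70 90
x / 10   # 0.1 0.3 0.5 0.7 0.9

# Two vectors of the same length are combined element by element
x + y    # 3 7 11 15 19
x * y    # 2 12 30 56 90
```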
VECTOR EXTRACTIONS
• We can extract a vector element using the [] operator. For example: y[3]
• A negative index takes all elements except the one named. For example, y[-3] returns everything except the third element.
• We can extract the first 3 elements using the : operator. For example: y[1:3]
• y[c(1, 5)] extracts the 1st and 5th elements, and y[-c(1, 5)] extracts all elements except the 1st and 5th.
• Lastly, y[y < 6] extracts the elements less than 6.
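The slides never define y, so the sketch below assumes a five-element vector purely to show what each extraction returns.

```r
y <- c(2, 4, 6, 8, 10)   # hypothetical values, chosen only for illustration

y[3]          # 6          third element
y[-3]         # 2 4 8 10   everything except the third element
y[1:3]        # 2 4 6      first three elements
y[c(1, 5)]    # 2 10       first and fifth elements
y[-c(1, 5)]   # 4 6 8      everything except the first and fifth
y[y < 6]      # 2 4        elements less than 6
```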
WORKING WITH MATRICES
• We can create a matrix using the matrix() function.
• For example: matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
• nrow = 3 tells R the number of rows we want.
• byrow = TRUE tells R to fill the matrix row by row (the default fills it column by column).
• For example: mat = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
• We can also use [] to extract elements from the matrix.
• For example: mat[1, 2] extracts the element in row 1, column 2.
• Or mat[c(1, 3), 2], for extracting the elements in rows 1 and 3 of column 2.
• Or mat[2, ], which extracts all the elements in row two by leaving the column index empty.
• Or mat[, 1], which extracts just column one.
• We can perform basic math operations on matrices, i.e. (+ - / *).
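A short sketch of the matrix calls above, with the printed layout shown in comments. The by_col matrix is an added illustration of the default fill order and is not part of the slides.

```r
# Fill a 3 x 3 matrix row by row
mat <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

# Without byrow = TRUE the same values are filled column by column
by_col <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
by_col[1, 2]      # 4

mat[1, 2]         # 2       row 1, column 2
mat[c(1, 3), 2]   # 2 8     rows 1 and 3 of column 2
mat[2, ]          # 4 5 6   all of row 2
mat[, 1]          # 1 4 7   all of column 1

# Arithmetic is applied element by element
plus_ten <- mat + 10
doubled  <- mat * 2
```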
DATA FRAMES
• A data frame is a combination of vectors of equal length, stored as columns and usually assigned to a variable.
CREATING A DATA FRAME
• a = sample(1:100, 15)
• b = sample(1000:2000, 15)
• c = sample(LETTERS, 15)
• # the sample() function selects random items
• df = data.frame(a, b, c)
• View(df) # shows our data frame
BASIC DATA FRAME FUNCTIONS
• head(df, 5) - selects the first 5 rows
• tail(df, 5) - selects the last 5 rows
• # if the number of rows is not specified, 6 rows are selected
• new_row = data.frame(a = 5, b = 1807, c = "H") - creates a new row (building it with c(5, 1807, "H") would turn every value into text)
• rbind(df, new_row) - adds the row to df
• df = rbind(df, new_row) - updates df
• d = sample(10000:20000, 16) - new column variable (16 values, one per row)
• cbind(df, d) - adds the column
• df = cbind(df, d) - updates df
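The data frame steps above can be run end to end as follows. This is a sketch under two assumptions that are not in the slides: a fixed seed (set.seed) so the random values are reproducible, and a one-row data frame for the new row so that the numeric columns stay numeric.

```r
set.seed(1)  # assumption: fixed seed, only to make the random samples repeatable

# Three columns of 15 random values each
df <- data.frame(a = sample(1:100, 15),
                 b = sample(1000:2000, 15),
                 c = sample(LETTERS, 15))

head(df, 5)        # first 5 rows
tail(df, 5)        # last 5 rows
nrow(head(df))     # 6, the default when no count is given

# Add a row: a one-row data frame keeps each column's type
new_row <- data.frame(a = 5, b = 1807, c = "H")
df <- rbind(df, new_row)

# Add a column: it needs one value per row (now 16)
d <- sample(10000:20000, 16)
df <- cbind(df, d)

dim(df)            # 16 4
```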
CHANGING COLUMN NAMES
• colnames(df) - returns the column names of df
• colnames(df)[1:2] = c("1st_col", "2nd_col") - changes the 1st and 2nd column names
• subset(df, `1st_col` > 70) - shows rows where the first column is greater than 70 (backticks are needed because the name starts with a digit)
• df[2:3, c(1, 4)] - displays rows 2 to 3 of columns 1 and 4; a blank index selects all the rows or all the columns
• str(df) - shows the structure and data types
• dim(df) - shows the number of rows and columns
• nrow(df) - shows the number of rows
• ncol(df) - shows the number of columns
• write.csv(df, "dataframe.csv") - saves df as a .csv file
• getwd() - shows the working directory, where files are saved
• setwd() - changes the working directory
• write.csv(df, "dataframe.csv", row.names = FALSE) - to remove row numbers
• read.csv("dataframe.csv") - reads the .csv file
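A compact sketch of the renaming, subsetting and CSV round trip described above. The data values are hypothetical, the column names first_col and second_col are used instead of names starting with a digit so that no backticks are needed, and the file is written to a temporary folder rather than the working directory.

```r
# Small hypothetical data frame
df <- data.frame(a = c(12, 85, 73, 40),
                 b = c(1500, 1807, 1200, 1999),
                 c = c("H", "K", "Q", "Z"))

# Rename the first two columns
colnames(df)[1:2] <- c("first_col", "second_col")
colnames(df)                          # "first_col" "second_col" "c"

# Rows where first_col is greater than 70
big <- subset(df, first_col > 70)     # rows 2 and 3

# Rows 2 to 3 of columns 1 and 3; a blank index means "all"
df[2:3, c(1, 3)]
df[2:3, ]

str(df)                               # structure and column types
dim(df)                               # 4 3

# Save without row numbers, then read the file back
path <- file.path(tempdir(), "dataframe.csv")
write.csv(df, path, row.names = FALSE)
df2 <- read.csv(path)
```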
DATA MANIPULATION AND ANALYSIS
•In R, there are several libraries and packages specifically designed for data manipulation tasks. These libraries provide a wide range of functions and tools to efficiently clean, transform, and analyze data. Some of the most popular, such as dplyr and readr, are introduced later.

PACKAGES IN R PROGRAMMING
•A package is an appropriate way to organize work and share it with others. Typically, a package includes code (not only R code!), documentation for the package and the functions inside, some tests to check that everything works as it should, and data sets.
PACKAGES IN R
•Packages in the R programming language are sets of R functions, compiled code, and sample data. They are stored under a directory called "library" within the R environment. R installs a group of packages during installation. When we start the R console, only these default packages are available. Other packages that are already installed must be loaded explicitly before an R program can use them.
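To see the difference between an installed package and a loaded one, the sketch below uses tools, a package that ships with R but is normally not attached when a session starts. The file name passed to file_ext() is only an example.

```r
# Packages attached in the current session appear on the search path
search()

# 'tools' is installed with R but has to be loaded before its
# functions can be called by their bare names
library(tools)
file_ext("dataframe.csv")         # "csv"

# After loading, the package shows up on the search path
"package:tools" %in% search()     # TRUE
```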

WHAT ARE REPOSITORIES?


•A repository is a place where packages are located and stored so that you can install R packages from it. Organizations and developers may keep a local repository, but repositories are typically online and accessible to everyone. Some of the most popular repositories for R packages are:
• CRAN: the Comprehensive R Archive Network (CRAN) is the official repository. It is a network of FTP and web servers maintained by the R community around the world. The R community coordinates it, and for a package to be published on CRAN it needs to pass several tests to ensure that it follows CRAN policies.
• Bioconductor: Bioconductor is a topic-specific repository, intended for open-source software for bioinformatics. Like CRAN, it has its own submission and review processes, and its community is very active, holding several conferences and meetings per year in order to maintain quality.
• GitHub: GitHub is the most popular repository for open-source projects. Its popularity comes from unlimited space for open-source projects, its integration with Git (a version control software), and the ease of sharing and collaborating with others.
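Installing from each kind of repository usually looks like the sketch below. The install commands are wrapped in a function that is defined but not called, because running them needs an internet connection; the function name install_examples and the choice of example packages (dplyr, limma) are assumptions made for illustration, and the BiocManager and remotes helper packages must themselves be installed first.

```r
# Repositories that install.packages() will use in this session
getOption("repos")

# Hypothetical helper: collects one typical install call per repository.
# It is only defined here, not run.
install_examples <- function() {
  # CRAN (the default repository)
  install.packages("dplyr")

  # Bioconductor, through the BiocManager package
  install.packages("BiocManager")
  BiocManager::install("limma")

  # GitHub, through the remotes package ("owner/repository")
  install.packages("remotes")
  remotes::install_github("tidyverse/dplyr")
}
```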
DIFFERENCE BETWEEN A PACKAGE AND A
LIBRARY
•There is often confusion between a package and a library, and people use the two terms interchangeably.
• library(): it is the command used to load a package, and the word "library" refers to the place where the package is contained, usually a folder on our computer.
• Package: it is a collection of functions bundled conveniently. The package is an appropriate way to organize work and share it with others.
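The distinction can be checked from the console. In this sketch .libPaths() lists the library folders, installed.packages() lists the packages stored in them, and library() loads one of those packages; stats is used only because it is always installed with R.

```r
# The library: one or more folders on disk that hold installed packages
.libPaths()

# The packages stored in those folders
pkgs <- rownames(installed.packages())
"stats" %in% pkgs      # TRUE

# library() loads a package from the library into the session
library(stats)
```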
1) DPLYR R LIBRARY AND PACKAGE

• To manipulate data, R users can install a package called dplyr, which provides many functions for data manipulation. To use these functions, first load the package with the line library(dplyr). Below is a list of a few data manipulation functions.
• It offers a more intuitive syntax compared to base R functions, making it easier to perform common data manipulation tasks.
Function Name    Description
filter()         Produces a subset of a data frame.
distinct()       Removes duplicate rows in a data frame.
arrange()        Reorders the rows of a data frame.
select()         Produces data in the required columns of a data frame.
rename()         Renames the variables (columns).
mutate()         Creates new variables without dropping old ones.


FILTER() METHOD
•The filter() function produces the subset of the data that satisfies the condition given in the call. In the condition, we can use comparison operators, logical operators, NA checks, and range operators.
# import dplyr package
library(dplyr)
# create a data frame
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, 19),
                    wickets = c(17, 20, NA, 5))

# fetch players who scored more
# than 100 runs
filter(stats, runs > 100)
Output:
  player runs wickets
1      B  200      20
2      C  408      NA
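The other functions from the table can be tried on the same kind of data. This is a sketch that assumes dplyr is installed (the requireNamespace() check skips the code otherwise); the duplicated row for player D and the runs_per_wicket column are additions made only to give distinct() and mutate() something to do.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  # Same idea as the stats data frame, with one duplicated row added
  stats <- data.frame(player  = c("A", "B", "C", "D", "D"),
                      runs    = c(100, 200, 408, 19, 19),
                      wickets = c(17, 20, NA, 5, 5))

  unique_stats <- distinct(stats)                    # drops the repeated D row
  by_runs      <- arrange(unique_stats, desc(runs))  # highest scorer first
  runs_only    <- select(unique_stats, player, runs) # keep two columns
  renamed      <- rename(unique_stats, name = player)
  with_ratio   <- mutate(unique_stats, runs_per_wicket = runs / wickets)

  # filter() with an NA check combined with a range condition
  no_na <- filter(unique_stats, !is.na(wickets), runs >= 19, runs <= 200)
}
```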
WHAT IS READR?
•readr is an R package designed to facilitate the process of reading rectangular data (like CSV, TSV, and other delimited files) into R. It is part of the tidyverse collection of R packages, which aims to provide a cohesive and consistent set of tools for data science. The readr package was developed by Hadley Wickham and is maintained by RStudio (now Posit).
•Key features of readr
1. Fast input: readr is built to be faster than base R functions when reading large datasets, making it a good choice for working with big data files.
2. Consistent APIs: the readr package offers a consistent and straightforward API for reading data files, which makes it easier for users to work with various file formats.
3. Type stability: readr automatically infers data types during the import process, so that the imported data is consistently and correctly typed. This can help reduce errors and inconsistencies in downstream data analysis.
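A small sketch of reading a CSV file with readr, assuming the package is installed. The file players.csv and its contents are hypothetical and are written to a temporary folder first so the example is self-contained. The second call shows how the column types can be stated explicitly instead of being inferred.

```r
if (requireNamespace("readr", quietly = TRUE)) {
  # Write a small hypothetical CSV file to a temporary folder
  path <- file.path(tempdir(), "players.csv")
  writeLines(c("player,runs,wickets",
               "A,100,17",
               "B,200,20",
               "C,408,NA"), path)

  # Column types are inferred: text for player, numbers for the rest
  players <- suppressMessages(readr::read_csv(path))
  sapply(players, class)

  # Types can also be declared up front
  players_typed <- readr::read_csv(
    path,
    col_types = readr::cols(
      player  = readr::col_character(),
      runs    = readr::col_integer(),
      wickets = readr::col_integer()
    )
  )
}
```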
TIDYVERSE
•The tidyverse is a collection of R packages designed for data science tasks, including data manipulation, exploration, and visualization. It is developed by RStudio (now Posit) and is known for its consistent and intuitive API. The tidyverse emphasizes tidy data principles, which involve organizing data in a way that makes it easy to analyze and visualize. Its tidyr package focuses on reshaping and tidying up messy datasets. It provides functions like gather() and spread() (superseded in current tidyr versions by pivot_longer() and pivot_wider()) to convert data between wide and long formats, making it easier to work with different types of datasets.
COMPONENTS OF THE TIDYVERSE
•The tidyverse includes several R packages that are commonly used in data science, such as:
• dplyr: this package provides functions for data manipulation, such as filtering, sorting, summarizing, and transforming data. It is designed to work with data frames and is known for its efficient implementation.
• tidyr: this package provides functions for tidying data sets, such as gathering and spreading variables. It helps to ensure that data is organized in a consistent and easy-to-analyze format.
• ggplot2: this package provides a powerful and flexible system for creating statistical graphics, such as scatter plots, line graphs, and bar charts. It is based on the grammar of graphics, which provides a systematic way to build complex visualizations from simple components.
• readr: this package provides functions for reading tabular data from delimited text formats, such as CSV, TSV, and fixed-width files. It is designed to be fast and efficient. (Excel files are read with a separate package, readxl.)
• tibble: this package provides a new type of data frame that is designed to be more user-friendly than the traditional R data frame. It includes features such as printing only the first few rows and columns by default, which can make it easier to work with large data sets.
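Converting between wide and long formats with tidyr can be sketched as follows, assuming tidyr (version 1.0.0 or later) is installed. The example uses pivot_longer() and pivot_wider(), the current replacements for gather() and spread(); the student scores are hypothetical.

```r
if (requireNamespace("tidyr", quietly = TRUE)) {
  # Wide format: one column per subject
  wide <- data.frame(student = c("A", "B"),
                     math    = c(70, 85),
                     science = c(65, 90))

  # Wide to long: one row per student and subject
  long <- tidyr::pivot_longer(wide,
                              cols      = c(math, science),
                              names_to  = "subject",
                              values_to = "score")

  # Long back to wide: one column per subject again
  wide_again <- tidyr::pivot_wider(long,
                                   names_from  = subject,
                                   values_from = score)
}
```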
BENEFITS OF THE TIDYVERSE
•The tidyverse offers several benefits for R users, including:
• Consistency: the tidyverse packages share a common syntax and design philosophy, making it easier to learn and use multiple packages.
• Efficiency: the tidyverse packages are designed to be fast and efficient, which can help to reduce the time required to perform data analysis tasks.
• Flexibility: the tidyverse includes a wide range of packages that can be used for different tasks, from data manipulation to visualization.
• Integration: the tidyverse packages are designed to work together seamlessly, making it easier to perform complex data analysis tasks.
•In summary, the tidyverse is a collection of R packages designed for data science tasks that emphasizes tidy data principles and provides a consistent and intuitive API. It includes several popular packages for data manipulation, exploration, and visualization, and offers several benefits for R users.

•THANK YOU!!!!
