This document provides a beginner's guide to using R and RStudio, emphasizing the importance of setting a working directory and creating R scripts. It includes instructions for generating random numbers, plotting graphs, and installing packages like ggplot2 for enhanced graphics. The author adopts a humorous tone while encouraging readers to explore R programming and its capabilities.

I wanted to do a short post on how to do something with the Search Console API with
R. Backing up a bit, I thought I’d include a short summary of how to get started
with R, and as you do, I’m now writing a separate post on how to do that. No, I
won’t back up more and explain computers or how to connect your printer.

Setup
For today’s excursion we’ll use the programming language “R”. Why do they call it
“R”? What happened to all the languages between “C” and “R”? Moving on. But why R?
You could do the same things with Python, or most programming languages, but Python
is for hipsters, and R is for real data scientists.

Today, we’ll be a data scientist, or a pirate (arrrrr!), or even a pirate data
scientist.

Pirate data scientists are the coolest. Good job on finding this post and joining
the club. You can add R programming to your resume/CV already - it’s such a weird
language that nobody will ask you to do anything in a job interview. And in real
life, you’ll just search around for code samples to copy & paste anyway. Trust me.
I got all of this to work without really understanding what I’m doing either.

Anyway, R is available for most computers. There’s a simple IDE called RStudio
which I use for R. The open source desktop edition is free, and it has an editor
built in, so you don’t need a separate one. Install it and let’s move on. You might
be prompted to install “R” as well - follow the directions for that. “R” is the
programming language, “RStudio” is an easy way of working in “R”.

Using RStudio
Obviously, you should read the 291 page documentation. Someone should (I didn’t,
maybe it’s not even 291 pages long). This is just my shortcut. It might not even
be a good shortcut, as clearly I am no authority on R. I do sometimes EAT like a
pirate though.

RStudio has a few quirks that you could get used to. The main window will look like
this initially:

RStudio main view

(It turns out, a lot of these graphics get resized when I publish, so they look a
bit terrible. Oops. But at least it’s fast.)

Setting a working directory


The only reasonable thing here will be the file explorer at the bottom right. Your
first job will be to pick or make a working directory. Navigate to the right folder
on your computer there, use the “New Folder” button to create a directory if
needed.

Now for the magic: Click “More” and “Set as Working Directory”.

If you don’t do this, all files you read & write will end up somewhere else. It’s
super-annoying, computers can be such jerks with details like this. Always remember
to set the working directory first, even if it’s currently showing that directory.
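If you'd rather type than click, the same thing can be done from the console. This is just a minimal sketch: getwd() and setwd() are standard R functions, but the folder name "r-first-steps" is made up for illustration, and I'm putting it in R's temporary directory so the example doesn't litter your disk.

```r
# Hypothetical project folder inside R's temp directory; use your own path instead
project_dir <- file.path(tempdir(), "r-first-steps")
dir.create(project_dir, showWarnings = FALSE)

# setwd() switches the working directory and returns the previous one
old_dir <- setwd(project_dir)

# getwd() shows where files will be read from and written to right now
print(getwd())

# Switch back to where we started
setwd(old_dir)
```

In a real project you would skip the last line and just stay in your project folder.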

Create a new R file


Now for the programming part. In your main menu (on the Mac, in your title bar),
select File / New File / R Script.
This will open another quadrant in your RStudio window for your R script. This will
be what things in RStudio will mostly look like.

Your first R program


Who am I to say what your first R program should do? I turned to Google, and
apparently other sites like to create a list of random numbers and plot their
distribution. With that in mind, here’s something you could try. Copy the code
below into the top left quadrant.

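# 1000 normally distributed random numbers (mean 100, sd 10), rounded down
n <- floor(rnorm(1000, 100, 10))
# count how often each whole number occurs
t <- table(n)
# draw the counts as a bar chart
barplot(t)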
Now click “Save” (the diskette-icon - who knows what disks are nowadays, wth), give
it a file name, like “test”. And now you should have something like this:

For those used to programming languages, you assign values by using “<-”. R is a
bit weird in that you can kinda assign values to functions too, but whatever floats
your boat, R. Also, you can apply functions to individual numbers, vectors, or
arrays all at once.
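To make those two oddities a bit more concrete, here's a small sketch you can paste into the console. The variable name "x" and its values are made up for illustration; floor() and names() are standard R functions.

```r
x <- c(1.2, 2.7, 3.5)   # a vector of three numbers

floor(x)                # rounds every element down: 1 2 3
x * 2                   # doubles every element: 2.4 5.4 7.0

# "Assigning to a function": this sets the names of the elements of x
names(x) <- c("a", "b", "c")
x["b"]                  # look up an element by its name: 2.7
```

No loops needed, the function just gets applied to everything in the vector.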

You don’t really have to understand the code here, but very roughly:

rnorm() creates a vector of 1000 random numbers drawn from a normal distribution
with a mean of 100 and a standard deviation of 10 (so almost all of them land
between 70 and 130, which is three standard deviations either way; math is weird
too). floor() rounds each one down to a whole number. These are now assigned to the
variable “n”.

table() counts the individual occurrences of each number and places them into the
variable “t”.

barplot() then just shows that as a graph.
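If you want to check that the numbers behave as described, here's a rough sketch. The set.seed() call and the value 42 are my additions so that you get the same "random" numbers on every run, and I've called the table "counts" instead of "t" just for readability.

```r
set.seed(42)                      # assumption: any fixed seed makes the run repeatable

n <- floor(rnorm(1000, mean = 100, sd = 10))

mean(n)                           # should land near 100 (floor() pulls it down a little)
sd(n)                             # should land near 10

counts <- table(n)                # how often each whole number occurs
head(counts)                      # peek at the first few counts

mean(n >= 70 & n <= 130)          # share of values within three standard deviations
```

Remove the set.seed() line again if you want fresh numbers every time.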

Running your R script


Clearly, you just hit the “run” or “play” button, and it’ll go, right? No. In
RStudio, “Run” only executes the current line or selection. Remember, R is for
scientists, so to run the whole script you must click the “Source” button instead.
Nobody knows how that happened, it just is.
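The “Source” button is basically a shortcut for R's source() function, which you can also call yourself. Here's a sketch under a few assumptions: I'm writing a throwaway script into R's temporary directory so the example is self-contained, and I've left out the barplot() line so it doesn't pop up a graph.

```r
# Write a tiny script to a temporary file (stand-in for your saved "test" script)
script_path <- file.path(tempdir(), "test.R")
writeLines(c(
  "n <- floor(rnorm(1000, 100, 10))",
  "counts <- table(n)"
), script_path)

# Run the whole file, top to bottom, like the Source button does
source(script_path)

# The variables created by the script are now available
length(n)
```

With your own script, you'd just pass its file name, relative to the working directory.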

Outcome
If all goes well, your RStudio UI should now look like this:

You can notice a few things here:

The console quadrant (bottom left) mentions the “source()” command. You can enter
any R command here, and it’ll be processed. This is useful for when you have no
idea what you’re doing, and need to try things out.
The file quadrant (bottom right) now shows a graph in its “Plots” tab. What the heck, huh? So cool. But also, why.
The top right quadrant shows your variables. This is kinda useful for figuring
things out.
If you run the script a few times (remember, the “Source” button - we’re data
scientists here), it’ll create new sets of random numbers and generate new graphs.
Try it out. Clicking stuff is cool, but also, to show how to deal with these graphs
we’re going to need them.

When you have multiple graphs, you can switch between graphs (“plots” in data-
scientist-eze) using the arrows:

In the same place, you can export these graphs to save them as files, or copy them
into your clipboard if you’re writing a report.
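You can also save a graph straight from a script instead of clicking “Export”. This is a minimal sketch using R's built-in pdf() device; the file name "distribution.pdf" and the sizes are made up, and I'm saving into the temporary directory to keep things tidy. The png() function works in a similar way if you prefer images.

```r
# Hypothetical output file in R's temp directory
out_file <- file.path(tempdir(), "distribution.pdf")

pdf(out_file, width = 8, height = 6)   # start sending plots to the file (size in inches)
barplot(table(floor(rnorm(1000, 100, 10))))
dev.off()                              # close the file, otherwise it stays incomplete

file.exists(out_file)
```

Don't forget dev.off(), or you'll wonder why the file is empty or won't open.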

(I really like this set of random numbers. Arrrr.)

Using packages
The default R installation doesn’t have all the cool stuff. If you use Stack
Overflow regularly to copy and paste code, I mean to learn, you’ll see mentions of
other “packages” or “libraries”. Installing these is often pain-free. You need an
internet connection though (this is kinda assumed anyway nowadays, it’s not like
we’re a pirate on a boat in the ocean, oh wait).

For R, there are always two steps involved: install the package, and then use the
library. Why they don’t call it the same thing, I don’t know. Gate-keeping by data
scientists, obviously.

Let’s try one out.

Step 1: install the package.

In the console quadrant (bottom left), copy the following and hit enter:

install.packages("ggplot2")
This will now install the ggplot2 library. This library helps to make nice
graphics. If you’re curious, there’s a big collection of R graphics that you can
use to copy & paste in your code, many of them use ggplot2.

Your console should show something like this now (the exact content will differ):

Step 2: load the library in your code.

Let’s start a new script (menu: File / New File / R Script), and use the following
code:

library(ggplot2)

ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point()
The first line (library(…)) loads the ggplot2 library. You only have to install it
once, in future scripts you can just load it like that. RStudio tries to figure out
which libraries it needs too, and helps you to remember to install them. If you
don’t install them, the script won’t work.
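If you're not sure whether a package is already installed, you can ask R. Here's a small sketch using requireNamespace(), a standard R function that returns TRUE or FALSE without loading anything into your script. The variable name "pkg" is just for illustration.

```r
pkg <- "ggplot2"

if (requireNamespace(pkg, quietly = TRUE)) {
  # Installed: report which version we have
  message(pkg, " is installed, version ", as.character(packageVersion(pkg)))
} else {
  # Not installed: remind ourselves what to type in the console
  message(pkg, " is missing; run install.packages(\"", pkg, "\") in the console")
}
```

I'm only printing a reminder here rather than installing automatically, so the script never downloads things behind your back.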

The next two lines use ggplot() to create a graph (plot). ggplot() takes the
dataset (“mpg”), the items in there you want to graph with aes(), and then adds the
type of graphic (“geom_point()”) that you want to do. ggplot() does these weird
things with just adding things together with “+” to combine them.
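To see the “+” thing in slow motion, you can build the same graph step by step. A rough sketch: the variable names ("base", "with_points", "with_labels") and the axis label texts are my own, while labs() is a regular ggplot2 function. The if() around it just skips the example when ggplot2 isn't installed.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Step 1: dataset plus the columns to map to x, y, and colour (draws nothing yet)
  base <- ggplot(mpg, aes(displ, hwy, colour = class))

  # Step 2: add a layer that draws one point per row
  with_points <- base + geom_point()

  # Step 3: add readable axis labels on top
  with_labels <- with_points +
    labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

  # Printing the object is what actually shows the graph
  print(with_labels)
}
```

Each “+” returns a new plot object, which is why you can keep stacking things on.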

You might wonder where the data used in the graphic comes from - how did we
suddenly get “MPG” data and car types? R and its packages include a number of small
datasets that you can use for trying things out. It makes it a bit easier to mess
with simple graphics before you use your real data. In this case, it’s some older
fuel economy data for popular car models: the “mpg” dataset, which comes with
ggplot2. (The similar-sounding “mtcars” dataset is a separate one built into R.) If
you spot random car-related statistics and graphics in R, now you know why.
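You can poke at these built-in datasets from the console without installing anything. A quick sketch using “mtcars”, which ships with R itself:

```r
head(mtcars)        # first few rows: one row per car model
dim(mtcars)         # 32 rows, 11 columns

# Count how many cars have each number of cylinders
table(mtcars$cyl)
```

The same head() and dim() calls are handy for getting a feel for your own data later on.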

Side-note about dplyr

Another weird setup is “dplyr”, a data-wrangling library which you might encounter
with R. I’m not covering it in this post, this is just FYI. Code that uses it tends
to be full of “%>%” (the “pipe”, which dplyr borrows from the magrittr package).
It’s basically a way of routing the output from one part into another part of code,
with the goal of making it easy to write (and hard for mortals to understand, I
guess). In R, it’ll look something like this:

firstthing() %>% secondthing()


“dplyr” is a separate library, so you install it in the same way as previously
mentioned, etc. A lot of R-related code snippets offer both dplyr and “normal” R
code variations. You can probably get around with not using it at all, but once
you’ve seen how it works, it’s not that weir … ok, it is still weird.
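To get a feel for piping without installing anything, here's a sketch that uses R's built-in pipe “|>” instead. Assumption: you're on R 4.1 or newer, where “|>” exists. For a simple chain like this one, “%>%” behaves the same once dplyr is loaded. The numbers are made up.

```r
n <- c(98.7, 101.2, 99.9)

# "Normal" R: read from the inside out
nested <- sum(floor(n))

# Piped: read from left to right, each result feeds the next function
piped <- n |> floor() |> sum()

identical(nested, piped)   # TRUE, both are 298
```

Same result either way, so pick whichever you find less weird.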

That’s mostly it
At this point, you should be ready to do things in RStudio. Remember, R is weird,
and the names of things are a bit confusing at first, so use your favorite search
engine whenever you get stuck. Regardless, I hope this helps to get you started.
