
Assignment 1 R Programming Tutorial Pdf

The assignment for week 2 is kinda tough if you have not used R before, and the video lectures do not really prepare you for it. If you have not taken the swirl tutorial, I strongly recommend that you finish it at the beginning of week 2. You also want to start working on the assignment as soon as possible.

Derek Franks wrote a great tutorial. If you follow it closely, step by step, you should have no trouble finishing some of the problems in assignment 1. Here is the link to the tutorial:

https://github.com/derekfranks/practice_assignment/blob/master/Practice_Assignment.pdf

The second challenge I had with this assignment was that I did not know how to return a data frame from a function. After experimenting a bit, I finally got it to work. Here is the code for returning a data frame from a function.

## initiate the data frame
results <- data.frame()
## loop through the files
for (i in id) {
  ## read file and get completed cases
  ## add to the data frame.
  results <- rbind(results, data.frame(id=i, nobs=completed_cases))
}
## return the data frame
return(results)
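To make the skeleton above concrete, here is a minimal runnable sketch. The function name count_complete, the zero-padded file naming (001.csv, 002.csv) and the demo data are all assumptions made for illustration, not the assignment's actual specification.

```r
# Hypothetical helper: count the complete rows in each CSV file of a directory.
# The file naming scheme (001.csv, 002.csv, ...) is an assumption of this sketch.
count_complete <- function(directory, id) {
  # start with an empty data frame and grow it by one row per file
  results <- data.frame()
  for (i in id) {
    path <- file.path(directory, sprintf("%03d.csv", i))
    monitor <- read.csv(path)
    # complete.cases() flags the rows that contain no NA values
    nobs <- sum(complete.cases(monitor))
    results <- rbind(results, data.frame(id = i, nobs = nobs))
  }
  # return the data frame
  return(results)
}

# Write two small demo files to a temporary directory so the sketch runs anywhere
demo_dir <- tempfile("monitors")
dir.create(demo_dir)
write.csv(data.frame(x = c(1, NA, 3), y = c(4, 5, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(x = c(1, 2, 3), y = c(4, 5, 6)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

counts <- count_complete(demo_dir, 1:2)
print(counts)
```

Growing a data frame with rbind() inside a loop is fine for a handful of files. For many files it gets slow, and collecting the rows in a list first is a common alternative.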

The function cor is used in one of the problems, but it’s not taught. You are supposed to figure it out by yourself. The usage is actually quite easy. Suppose you read the file and store it in a data frame called data. To calculate the correlation between column 2 and column 3, you use cor this way.

cor(data[,2], data[,3])
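One thing to watch for is missing values. The sketch below uses a small made-up data frame (the column names and numbers are assumptions, not the assignment data) to show that cor() returns NA when either column contains an NA, and how the use argument changes that.

```r
# Made-up data: column 1 is an id, columns 2 and 3 are measurements
data <- data.frame(id = 1:5,
                   a = c(1, 2, 3, 4, NA),
                   b = c(2, 4, 6, 8, 10))

# With a missing value present, cor() returns NA by default
r_default <- cor(data[, 2], data[, 3])

# use = "complete.obs" drops the rows with missing values before computing
r_complete <- cor(data[, 2], data[, 3], use = "complete.obs")

print(r_default)
print(r_complete)
```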

As a developer, you can pick up R super fast.

If you are already a developer, you don’t need to know much about a new language to be able to read and understand code snippets and write your own small scripts and programs.

In this post you will discover the basic syntax, data structures and control structures that you need to know to start reading and writing R scripts.

Let’s get started.

R Crash Course For Developers

R Syntax is Different, But The Same

The syntax in R looks confusing, but only to begin with.

R borrows ideas from the Lisp family of languages (notably Scheme) and was modeled on an even older language, S. The assignment syntax is probably the strangest thing you will see: assignment conventionally uses the arrow (<-) rather than a single equals (=).

R has all of your familiar control flow structures like if-then-else, for loops and while loops.

You can create your own functions and libraries of helper functions for your scripts.

If you have done any scripting before, like JavaScript, Python, Ruby, BASH or similar, then you will pick up R very quickly.

You Can Already Program, Just Learn the R Syntax

As a developer, you already know how to program.

You can take a problem and think up the type of procedure and data structures you need. The language you are using is just a detail. You only need to map your idea of the solution onto the specifics of the language you are using.

This is how you can get started using R very quickly.

To get started, you need to know the absolute basics. Basics such as:

  • How do we assign data to variables?
  • How do we work with different data types?
  • How do we work with the data structures for handling data?
  • How do we use the standard flow control structures?
  • How do we work with functions and third-party packages?

You learn the answers to these questions by looking at code examples. You can then:

  • Map third party code you’re reading onto those examples to better understand them.
  • Pattern the code you write from scratch on the examples.

Let’s take a quick tour of the basic syntax of R.


R Crash Course For Developers (Start Here)

In this section we will take a quick look at the basic syntax used in R.

After reading (and ideally working through) the examples in this section, you will have enough background as a developer to start reading and understanding other people’s R code.

You will also have the confidence to start writing your own small R scripts.

The examples in this section are split into the following sections:

  1. Assignment
  2. Data Structures
  3. Flow Control
  4. Functions
  5. Packages

Start the R interactive environment (type R on the command line) and let’s get started.

1. Assignment

Assignment in R is done with the arrow operator (<-).

Below are examples of assigning an integer, double, string and a boolean, and printing each out to the console in turn.

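Remember to prefer the arrow over equals (=) for assignment. A single equals does assign at the top level, but inside a function call it names an argument instead, and mixing the two up is one of the most common mistakes new R programmers make.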
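The difference between the two operators shows up inside function calls. Here is a small sketch (the variable names are made up for illustration):

```r
# At the top level both operators assign
x <- 10
y = 20

# Inside a call, = names an argument and does not touch the variable x
m1 <- mean(x = c(1, 2, 3))

# <- inside a call assigns 'values' in the calling environment, then passes it on
m2 <- mean(values <- c(4, 5, 6))

print(x)       # still the value assigned at the top
print(values)  # created as a side effect of the call above
```

Sticking to <- for assignment and = for naming arguments keeps the two roles visually distinct.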

2. Data Structures

There are four data structures that you will use the most in R:

  1. Vectors
  2. Lists
  3. Matrices
  4. Data Frames

Lists

Lists provide a group of named items, not unlike a map.

You can define a new list with the list() function. A list can be initialized with values or empty. Note that the named values in the list can be accessed using the dollar operator ($). Once referenced, they can be read or written. This is also how new items can be added to the list.
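As a rough sketch of the empty-list case, with made-up item names: items can be added by name, looked up with double square brackets when the name is held in a variable, and removed by assigning NULL.

```r
# start with an empty list and add named items one at a time
settings <- list()
settings$name <- "demo"
settings$retries <- 3

# double square brackets look an item up by name (or by position)
key <- "retries"
retries <- settings[[key]]
print(retries)

# assigning NULL removes an item; names() lists the remaining keys
settings$name <- NULL
print(names(settings))
```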

Vectors

Vectors are ordered sequences of values that all share a single type. If you mix types, R silently converts the values to a common type.

Notice that vectors are 1-indexed (indexes start at 1, not 0).

You will use the c() function a lot to concatenate variables into a vector.
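A short sketch of how c() behaves when the values do not share a type, using made-up values. Mixing a number, a string and a boolean gives a character vector, because every element of a vector must have the same type.

```r
# all numbers: the vector stays numeric
nums <- c(1, 2, 3)
print(class(nums))

# mixed types: every value is converted to a character string
mixed <- c(1, "a", TRUE)
print(class(mixed))
print(mixed)

# c() also joins existing vectors and single values together
longer <- c(nums, 4, 5)
print(longer[1])       # indexing starts at 1
print(length(longer))
```

If you need to keep values of different types side by side, use a list instead.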

Matrices

A matrix is a table of data. It has dimensions (rows and columns) and the columns can be named.

Many useful plotting functions and machine learning algorithms require the data to be provided as a matrix.

Note the syntax to index into rows [1,] and columns [,1] of a matrix.

Data Frame

Data frames are useful for actually representing tables of your data in R.

A matrix is a much simpler structure, intended for mathematical operations. A data frame is more suited to representing a table of data and is expected by modern implementations of machine learning algorithms in R.

Note that you can index into rows and columns of a data frame just like you can for a matrix. Also note that you can reference a column using its name (df$years).
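Two things you will probably want early on are filtering rows by a condition and converting a data frame to a matrix for functions that expect one. Here is a minimal sketch with a small years/scores table (the threshold of 40 is arbitrary):

```r
years <- c(1980, 1985, 1990)
scores <- c(34, 44, 83)
df <- data.frame(years, scores)

# keep only the rows where the score is above 40 (note the trailing comma)
high <- df[df$scores > 40, ]
print(high)

# convert the all-numeric data frame to a matrix; column names carry over
m <- as.matrix(df)
print(dim(m))
```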

Some other data structures you could go on to learn about are arrays and factors.

3. Flow Control

R supports all the same flow control structures that you are used to.

  1. If-Then-Else
  2. For Loop
  3. While Loop

As a developer, these are all self-explanatory.

If-Then-Else

For Loop

While Loop

4. Functions

Functions let you group code and call that code repeatedly with arguments.

The three main concerns with functions are:

  1. Calling Functions
  2. Help For Functions
  3. Writing Custom Functions

Call Functions

You have already used one function, the c() function for concatenating objects into a vector.

R has many built-in functions, and additional functions can be provided by installing and loading third-party packages.

Here is an example of using a statistical function to calculate the mean of a vector of numbers:

Help for Functions

You can get help with a function in R by using the question mark operator (?) followed by the function name.

Alternatively, you can call the help() function and pass the function name you need help with as an argument (e.g. help(mean)).

You can get example usage of a function by calling the example() function and passing the name of the function as an argument.

Custom Functions

You can define your own functions that may or may not take arguments or return a result.
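To illustrate the "may or may not" part, here is a small sketch with two made-up functions: one takes no arguments and is called only for its side effect, the other has a default argument and returns the value of its last expression without an explicit return().

```r
# takes no arguments and is called only for its side effect (a message)
greet <- function() {
  message("hello from R")
}

# 'scale' has a default value, so callers may leave it out
scaled_sum <- function(a, b, scale = 1) {
  # the value of the last expression is returned automatically
  (a + b) * scale
}

greet()
plain <- scaled_sum(1, 2)
scaled <- scaled_sum(1, 2, scale = 10)
print(plain)
print(scaled)
```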

Below is an example of a custom function to calculate and return the sum of three numbers:

5. Packages

Packages are the way that third party R code is distributed. The Comprehensive R Archive Network (CRAN) provides hosting and listing of third party R packages that you can download.

Install a Package

You can install a package hosted on CRAN by calling the install.packages() function. Depending on your setup, it may then pop up a dialog asking which mirror you would like to download the package from.

For example, here is how you can install the caret package which is very useful in machine learning:
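In a script, you may want to install a package only when it is missing and to name the mirror up front so that no dialog appears. The helper below is a hypothetical sketch (ensure_package is not a built-in function), and the default mirror URL is just one choice.

```r
# Hypothetical helper: install a package only when it is not already available.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # naming a mirror via 'repos' avoids the interactive mirror prompt
    install.packages(pkg, repos = repos)
  }
  # report whether the package can be loaded now
  requireNamespace(pkg, quietly = TRUE)
}

# 'stats' ships with R, so this call does not download anything
ok <- ensure_package("stats")
print(ok)
```

Calling ensure_package("caret") would then download caret only on the first run.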

Help For Package

A package can provide a lot of new functions. You can read up on a package on its CRAN page, but you can also get help for the package within R using the library function.

5 Things To Remember

Here are five quick tips to remember when getting started in R:

  • Assignment. R code conventionally uses the arrow operator (<-) for assignment rather than a single equals (=).
  • Case Sensitive. The R language is case sensitive, meaning that C() and c() are two different function calls.
  • Help. You can get help on any operator or function using the help() function or the ? operator, and search all of the installed documentation by keyword using the double question mark operator (??).
  • How To Quit. You can exit the R interactive environment by calling the q() function.
  • Documentation. R installs with a lot of useful documentation. You can review it in the browser by typing: help.start()

Get a Reference Book

There are many great resources online for learning more about how to use R.

I recommend grabbing a good reference text and keeping it close by. I use and recommend R in a Nutshell.

Summary

In this post you took a crash course in basic R syntax.

As a developer, you now know enough to read other people’s R scripts.

You also have the tools to start writing your own little scripts in the R interactive environment.

Next Step

Did you work through all of the examples?

  1. Start R.
  2. Work through the tutorial.
  3. Let me know how you went (leave a comment)

Do you have any questions? Is there something else you would like covered?

Leave a comment and let me know.


> # integer (the L suffix makes it an integer; a plain 23 is stored as a double)
> i <- 23L
> i
[1] 23

> # double
> d <- 2.3
> d
[1] 2.3

> # string
> s <- 'hello world'
> s
[1] "hello world"

> # boolean
> b <- TRUE
> b
[1] TRUE

# create a list of named items
a <- list(aa=1, bb=2, cc=3)
a
a$aa

# add a named item to a list
a$dd <- 4
a


> # create a vector using the c() function
> v <- c(98, 99, 100)
> v
[1]  98  99 100
> v[1:2]
[1] 98 99
> # create a vector from a range of integers
> r <- (1:10)
> r
 [1]  1  2  3  4  5  6  7  8  9 10
> r[5:10]
[1]  5  6  7  8  9 10
> # add a new item to the end of a vector
> v <- c(1, 2, 3)
> v[4] <- 4
> v
[1] 1 2 3 4

> # Create a 2-row, 3-column matrix with named headings
> data <- c(1, 2, 3, 4, 5, 6)
> headings <- list(NULL, c("a", "b", "c"))
> m <- matrix(data, nrow=2, ncol=3, byrow=TRUE, dimnames=headings)
> m
     a b c
[1,] 1 2 3
[2,] 4 5 6

> m[1,]
a b c
1 2 3

> m[,1]
[1] 1 4

# create a new data frame
years <- c(1980, 1985, 1990)
scores <- c(34, 44, 83)
df <- data.frame(years, scores)
df[,1]
df$years

# if then else
a <- 66
if (a > 55) {
  print("a is more than 55")
} else {
  print("a is less than or equal to 55")
}

[1] "a is more than 55"

# for loop
mylist <- c(55, 66, 77, 88, 99)
for (value in mylist) {
  print(value)
}

[1] 55
[1] 66
[1] 77
[1] 88
[1] 99

# while loop
a <- 100
while (a < 500) {
  a <- a + 100
}
a

[1] 500

# call function to calculate the mean of a vector of numbers
numbers <- c(1, 2, 3, 4, 5, 6)
mean(numbers)

[1] 3.5

# help with the mean() function
?mean
help(mean)

# example usage of the mean function
example(mean)

# define custom function
mysum <- function(a, b, c) {
  # named 'total' so that it does not mask the built-in sum() function
  total <- a + b + c
  return(total)
}

# call custom function
mysum(1, 2, 3)

[1] 6

# install the caret package
install.packages("caret")

# load the package
library(caret)

# help for the caret package
library(help="caret")

