Fundamentals
This page provides a very brief introduction to general R programming, and is sufficient to perform very complicated computations. It is certainly not comprehensive, however, and it is expected that one will need to supplement the information here with other material, and some healthy experimentation.
Variables and values
A value can be assigned to a variable using the assignment operator <-
. The variable is created if it doesn’t already exist.
x <- 123
Creating or assigning a variable a value does not produce any output. Providing the name of the variable as input does produce its value as output.
x
## [1] 123
One can reassign the value of x
using <-
. The type (e.g. numeric, string) of the variable need not be the same as before.
x <- "hello"
x
## [1] "hello"
One can use the operator =
instead of <-
, but this is not recommended. In RStudio Alt + - is a keyboard shortcut for writing the assignment operator.
A valid variable name consists of letters, numbers and the dot .
or underscore _
characters. Visible variable names must start with a letter.
Arithmetic
As you would expect, one can add, subtract, multiply, divide and exponentiate. There are also operators for integer division and remainder (modulus).
Operator | Meaning |
---|---|
+ | add |
- | subtract |
* | multiply |
/ | divide |
^ or ** | exponentiate |
%/% | integer division |
%% | integer modulus |
2^3
## [1] 8
7 %/% 3
## [1] 2
7 %% 3
## [1] 1
For convenience, these operators are used the same way we write them on paper. The alternative is to surround the operator with ticks.
`+`(2,3)
While this is natural, it is necessary in some circumstances to use parentheses or curly braces to ensure operations occur in the right order.
2 + 3 * 5
## [1] 17
(2 + 3) * 5
## [1] 25
{2 + 3} * 5
## [1] 25
Conditional statements
Commands can be executed conditional upon a logical value (i.e. TRUE or FALSE) by using an if/else statement.
if (TRUE) {
print("this is printed")
} else {
print("this is not printed")
}
## [1] "this is printed"
if (FALSE) {
print("this is not printed")
} else {
print("this is printed")
}
## [1] "this is printed"
It is not necessary to have an else statement.
if (TRUE) {
print("this is printed")
}
## [1] "this is printed"
if (FALSE) {
print("this is not printed")
}
One can nest if/else statements
if (FALSE) {
print("this is not printed")
} else {
if (TRUE) {
print("this is printed")
}
}
## [1] "this is printed"
if (FALSE) {
print("this is not printed")
} else if (TRUE) {
print("this is printed")
}
## [1] "this is printed"
The curly braces {
and }
are necessary when there are multiple statements to be executed in the conditional block. For example, in the following code “1” is not printed, but “2” is (the indentation is just misleading).
if (FALSE)
print(1)
print(2)
## [1] 2
The above snippet is equivalent to
if (FALSE) {
print(1)
}
print(2)
## [1] 2
It is often considered good practice to use curly braces even for single-statement blocks, as it is easy to read and can be less likely to lead to bugs on modification.
The ifelse
operator can also be useful in some situations. You can read about it after you are familiar with vectors in R.
Relational and logical operators
Conditional statements are often used when the logical value is computed dynamically.
x <- 5
if (x < 10) {
print("x is less than 10")
} else {
print("x is greater than or equal to 10")
}
## [1] "x is less than 10"
The relational operators are
Operator | Meaning |
---|---|
< | less than |
> | greater than |
<= | less than or equal to |
>= | greater than or equal to |
== | equals |
!= | not equals |
Logical operators are useful for combining logical values. These are
Operator | Meaning |
---|---|
! | not |
& | and |
| | or |
&& | short-circuit and |
|| | short-circuit or |
A short-circuit &&
or ||
evaluates only those values from left to right that are required to determine if the result is TRUE
or FALSE
. For example, FALSE && x
evaluates to FALSE
and TRUE || x
evaluates to TRUE
irrespective of the value of x
and so x
need not be evaluated.
x <- TRUE
y <- FALSE
if (!y) {
print("this is printed")
}
## [1] "this is printed"
if (x & y) {
print("this is not printed")
}
if (x | y) {
print("this is printed")
}
## [1] "this is printed"
if (x || NA) {
print("this is printed even though NA is not a logical value")
}
## [1] "this is printed even though NA is not a logical value"
Functions
Functions are incredibly useful building blocks for writing understandable programs, enabling code re-use and avoiding repetitions. The syntax for creating a function is
miles2km <- function(x) {
# 1 mile is 1.609344 km
y <- x * 1.609344
return(y)
}
What is 500 miles in kilometres?
miles2km(500)
## [1] 804.672
What is 200 miles in kilometers?
miles2km(200)
## [1] 321.8688
Once the miles2km
function has been written once, it can be used many times.
Functions can have multiple arguments.
hypotenuse <- function(a, b) {
return(sqrt(a^2 + b^2))
}
hypotenuse(3, 4)
## [1] 5
There are many functions and constants that have already been defined in R’s base package. These include the constant pi
, trigonometric functions like sin
, cos
, tan
, and functions like log
, exp
. The log
function has a default second argument.
log(100) # outputs natural log of 100
## [1] 4.60517
log(100, 10) # outputs log of 100 to base 10
## [1] 2
log(100, base = 10) # outputs log of 100 to base 10
## [1] 2
There are advanced techniques one can use to define and use functions, which you can research on your own. For example, one can call functions using named arguments, which can help avoid mistakes.
foo <- function(a, b) {
return(10*a + b)
}
foo(4, 2)
## [1] 42
foo(a=4, b=2)
## [1] 42
foo(b=2, a=4)
## [1] 42
While it is often not necessary to explicitly use return
, it is usually considered good practice. An R function will return the last evaluated statement in the function.
foo <- function(x) {
x + 1
}
foo(22)
## [1] 23
The return
command stops execution of that function and returns.
foo <- function(x) {
return(x)
print("this is not printed")
}
foo(22)
## [1] 22
Only one value can be returned by an R function. However, this value can be a vector, a list, or even another function.
Getting help
One can access R’s help documentation by opening the Help window in RStudio. One often wants to access the documentation for a specific function. One can do this by typing “?” followed by the function name in the R console.
?log
One can perform a keyword search using “??” instead of “?”.
??logarithm
Of course, R is a popular language and there are also many resources online such as webpages, blogs and forums that can help solve problems.
Vectors
We have focused on single values so far, for simplicity. In fact, a single number in R is a vector of size 1.
One can combine two vectors using the c
function.
x <- 1
y <- 2
z <- c(x, y)
z
## [1] 1 2
w <- c(z, 3)
w
## [1] 1 2 3
There are a few ways to quickly specify vectors.
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
10:1
## [1] 10 9 8 7 6 5 4 3 2 1
seq(1, 2, 0.1)
## [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
rep(42, 8)
## [1] 42 42 42 42 42 42 42 42
Many basic R functions operate element-wise on vectors.
a <- c(1, 3, 5, 7)
b <- c(2, 4, 6, 8)
a + b
## [1] 3 7 11 15
a * b
## [1] 2 12 30 56
a ^ b
## [1] 1 81 15625 5764801
log(a)
## [1] 0.000000 1.098612 1.609438 1.945910
Other functions make sense for vectors of length greater than 1.
x <- c(5, 2, 1, 6, 4, 1, 3)
sort(x)
## [1] 1 1 2 3 4 5 6
min(x)
## [1] 1
max(x)
## [1] 6
sum(x)
## [1] 22
mean(x)
## [1] 3.142857
var(x)
## [1] 3.809524
Vectors can be indexed using vectors of integers (although not necessarily with integer type).
x <- 1:6
x[3]
## [1] 3
x[c(2, 4)]
## [1] 2 4
x[c(3, 5, 3)]
## [1] 3 5 3
x[-c(2, 4)]
## [1] 1 3 5 6
They can also be indexed using vectors of logicals.
x <- 1:6
x[c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)]
## [1] 2 4
x[x %% 2 == 0]
## [1] 2 4 6
Values can be changed using assignment.
x <- 1:6
x[2] <- 42
x
## [1] 1 42 3 4 5 6
x[c(2, 3)] <- c(11, 12)
x
## [1] 1 11 12 4 5 6
Vector types
So far we have focused on variables that are classified as numeric
by R. There are 6 atomic vector types, and each vector can only have one of these types.
- logical
- integer
- numeric / double
- complex
- character
- raw
A single logical value takes either the value TRUE
or FALSE
.
An integer is specified by appending “L” to the end of the input. It is not that common to create integer vectors in R functions: most of the time one just uses numeric (i.e. double-precision floating point) vectors even when all the elements are mathematically integers.
x <- 123L
typeof(x)
## [1] "integer"
x <- 123
typeof(x)
## [1] "double"
A complex number is specified using complex
. One can call the following functions on a complex number: Re
, Im
, Mod
, Arg
, Conj
.
x <- complex(real=1, imaginary=2)
x
## [1] 1+2i
Conj(x)
## [1] 1-2i
A variable of type character can be a string, not just a single character. In particular, a single string is a vector of length 1, not a vector of characters.
x <- "hello"
typeof(x)
## [1] "character"
x <- "hello world"
length(x)
## [1] 1
print(x)
## [1] "hello world"
x <- c("hello", "world")
length(x)
## [1] 2
print(x)
## [1] "hello" "world"
We do not discuss the raw type here: it is used to store raw bytes.
Matrices
Matrices can be created using the matrix
command.
x <- matrix(1:6, 2, 3)
x
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
The dimension of a matrix is given by dim
.
dim(x)
## [1] 2 3
They can be indexed in a similar way to vectors.
x[1,3]
## [1] 5
x[2,]
## [1] 2 4 6
x[2,c(TRUE, FALSE, TRUE)]
## [1] 2 6
x[,3]
## [1] 5 6
There are many built-in R functions for matrices. One important one is %*%
for matrix multiplication (*
is element-wise multiplication).
A <- matrix(c(4, 2, 2, 4), 2, 2)
B <- matrix(c(1, 2, 3, 4), 2, 2)
A * B
## [,1] [,2]
## [1,] 4 6
## [2,] 4 16
A %*% B
## [,1] [,2]
## [1,] 8 20
## [2,] 10 22
t(B) # transpose
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
eigen(B)
## eigen() decomposition
## $values
## [1] 5.3722813 -0.3722813
##
## $vectors
## [,1] [,2]
## [1,] -0.5657675 -0.9093767
## [2,] -0.8245648 0.4159736
svd(B) # singular value decomposition
## $d
## [1] 5.4649857 0.3659662
##
## $u
## [,1] [,2]
## [1,] -0.5760484 -0.8174156
## [2,] -0.8174156 0.5760484
##
## $v
## [,1] [,2]
## [1,] -0.4045536 0.9145143
## [2,] -0.9145143 -0.4045536
chol(A) # cholesky decomposition
## [,1] [,2]
## [1,] 2 1.000000
## [2,] 0 1.732051
diag(A)
## [1] 4 4
One can add a row (resp. column) to a matrix using the rbind
(resp. cbind
) function.
A <- diag(1:3)
A
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 2 0
## [3,] 0 0 3
rbind(A, 4:6)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 2 0
## [3,] 0 0 3
## [4,] 4 5 6
cbind(A, 4:6)
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 4
## [2,] 0 2 0 5
## [3,] 0 0 3 6
An R matrix
is equivalent a 2-dimensional array
. One can create higher dimensional arrays.
v <- array(1:24, c(2,3,4))
v[1,2,3]
## [1] 15
An R vector
is similar to, but not exactly the same as, a matrix
with one column.
matrix(1:2, 2, 1)
## [,1]
## [1,] 1
## [2,] 2
1:2
## [1] 1 2
One can convert between the two using the as.vector
and as.matrix
commands.
Lists
Vectors and matrices must contain variables of the same type. Lists are more generic data structures, and are sometimes called generic vectors.
List elements can be given names, which can be used to index the list using $
. Lists can also be indexed using numbers inside [[
and ]]
. New elements can be added by name or by indexing
x <- list(numbers = 1:5, message = "hello")
x
## $numbers
## [1] 1 2 3 4 5
##
## $message
## [1] "hello"
x$numbers
## [1] 1 2 3 4 5
x$message
## [1] "hello"
x[[1]]
## [1] 1 2 3 4 5
x[[2]]
## [1] "hello"
x$even.numbers <- x$numbers[(x$numbers %% 2) == 0]
x[[4]] <- c(1.1, 2.2)
names(x)[[4]] <- "floats"
x
## $numbers
## [1] 1 2 3 4 5
##
## $message
## [1] "hello"
##
## $even.numbers
## [1] 2 4
##
## $floats
## [1] 1.1 2.2
Lists can contain lists or functions as elements, so they can be very rich data structures.
f <- list()
f$name <- "multiply"
f$arg.types <- list(x="numeric", y="numeric")
f$eval <- function(x, y) {
return(x*y)
}
f$name
## [1] "multiply"
f$arg.types
## $x
## [1] "numeric"
##
## $y
## [1] "numeric"
f$eval(6, 7)
## [1] 42
Basic scoping
R uses lexical scoping, which means that it resolves a variable name in a specific location in the code by considering where that variable is defined in the code. In particular, name resolution does not depend on the run time call stack of the program. We do not go into too many details here, except to show some examples that should help you become familiar with how scoping works in R.
A name introduced inside a function renders irrelevant any variables with the same name outside the function body.
x <- 1
foo <- function() {
x <- 2
x
}
foo()
## [1] 2
x <- 1
foo <- function(x) {
x
}
foo(2)
## [1] 2
If a name cannot be resolved in the “local scope”, R will look one level up in the code.
x <- 1
y <- 2
foo <- function() {
x <- 3
c(x, y)
}
foo()
## [1] 3 2
The scoping rule determines which y
is being referred to. The value associated with that y
does depend on the execution context. In the below example, the name is resolved to the y
in global scope, whose value changes between the two invocations of foo
.
x <- 1
y <- 2
foo <- function() {
x <- 3
c(x, y)
}
foo()
## [1] 3 2
x <- 20
y <- 10
foo()
## [1] 3 10
To see what is meant by “one level up”, the following snippet may be helpful.
x <- 1
foo <- function() {
x <- 2
bar <- function() {
x
}
bar()
}
foo()
## [1] 2
Lexical scoping is now very common, and R’s specific scoping rules are usually easy to work with.
A particular problem is when a programmer expects a variable to be defined locally within a function but it is not, and instead a global variable with the same name is used.
Curly braces {
and }
do not define a new scope in R.
Pass by value semantics
In R one can think of arguments to functions as if they are passed by value. That is, when a function is called with an argument x
it is as if x
is copied and then passed to the function. So any modifications of x
in the function body do not affect the variable x
from the caller’s perspective.
a <- c(1, 2)
foo <- function(x) {
x[1] <- 3
x
}
foo(a)
## [1] 3 2
a
## [1] 1 2
This is also true for assignments. If a variable y
is assigned to take the same value that x
takes, modifications of y
do not affect x
.
x <- c(1, 2)
y <- x
y[1] <- 3
y
## [1] 3 2
x
## [1] 1 2
In the above, we have said that it is as if the values are copied. There are performance enhancements to make this less computationally costly in some situations, and R actually uses a copy on modify strategy in the background. This is not necessary to understand to write working code, but knowledge of how R works internally can be used to improve performance.
Iteration: while and for loops
Iteration allows a program to execute a block of code many times. Iteration (or recursion) greatly increases the expressive power of a programming language.
One way to do this is using a while
loop, which executes a block of code until some condition is met.
x <- 0
while (x < 3) {
x <- x + 1
print(x)
}
## [1] 1
## [1] 2
## [1] 3
Another way is to use a for
loop, where one executes a block of code for each element in a vector.
vec <- c(1,4,9)
for (x in vec) {
sqrt.x <- sqrt(x)
print(sqrt.x)
}
## [1] 1
## [1] 2
## [1] 3
It is quite common for the vector in a for loop to be defined as 1:n
for some positive integer n
.
vec <- c(1,4,9)
n <- length(vec)
for (i in 1:n) {
print(sqrt(vec[i]))
}
## [1] 1
## [1] 2
## [1] 3
Some people prefer to use loop replacements such as apply
and lapply
. We will consider this in more detail later.
When using loops, the commands break
and next
are useful. The former immediately exits the loop, while the latter immediately starts the next iteration of the loop.
x <- 1
while (TRUE) {
x <- x+1
if (x >= 5) {
break
}
}
x
## [1] 5
for (i in 1:3) {
if (i == 2) {
next
}
print(i)
}
## [1] 1
## [1] 3
Recursion
In some cases, defining a function recursively is natural, and R supports recursive functions. A classic example is a recursive function for computing a Fibonacci number.
fib <- function(n) {
stopifnot(n %% 1 == 0 && n >= 0)
if (n == 0 || n == 1) {
return(n)
}
return(fib(n-2) + fib(n-1))
}
# compute and store the first 5 Fibonacci numbers
fib.first5 <- rep(0, 5)
for (i in 1:5) {
fib.first5[i] <- fib(i)
}
fib.first5
## [1] 1 1 2 3 5
The recursive function/algorithm fib
is very poor from a performance perspective. It is good to think about why, and how one might overcome the problem.
Errors, warnings and messages
You might have noticed the use of the stopifnot
function in fib
, which is used there to check that the input is a non-negative integer. It will raise an error if this check fails. Checks like this are useful for ensuring that functions are used only in appropriate situations.
In general, an error can be thrown using the stop
command, which will stop execution and print the error message, unless the error is caught.
stop("informative error message")
## Error: informative error message
A warning can be printed using the warning
command, and a message can be printed using the message
command. Neither warnings and messages stop execution, and the difference between them is primarily semantic: warnings are often used to indicate something unexpected has occurred, but the function was probably able to deal with it appropriately. The messages from stop
, warning
and message
are sent to standard error and not standard out, in contrast to print
.
warning("informative warning message")
## Warning: informative warning message
message("informative message")
## informative message
Errors, warnings and messages are collectively known as conditions.
One can catch a condition using tryCatch
. error
, warning
, message
and finally
are named arguments, the first three being functions that are called when the corresponding condition is raised. finally
is just a block of code.
tryCatch({
warning("be careful!")
}, error = function(e) {
paste("there was an error:", e$message)
}, warning = function(w) {
paste("there was a warning:", w$message)
}, message = function(m) {
paste("there was a message:", m$message)
}, finally = {
print("finished.")
})
## [1] "finished."
## [1] "there was a warning: be careful!"
If all you want to do is display when there is an error you can use try
, which allows execution to continue and displays the error message.
try(stop("something is wrong"))
## Error in try(stop("something is wrong")) : something is wrong
Special values
R has some special values, including NULL
, NA
, Inf
and NaN
.
NULL
is returned by functions with no return value, and can be thought of us an empty vector.
x <- NULL
x <- c(x, 1)
x <- c(x, 2)
x
## [1] 1 2
NA
is “Not Available” and is used for missing values. NA
is propagated where appropriate.
x <- c(1, NA)
y <- c(3, 4)
x + y
## [1] 4 NA
Inf
represents positive infinity. -Inf
represents minus infinity.
NaN
is “Not a Number” and is the value of, e.g., 0/0
, Inf - Inf
or Inf/Inf
.