Intro to R

R is a language and an environment that is well suited to analyzing data and creating rich graphics. To get started, make sure you have R installed on your computer. The latest version of R is available from the Comprehensive R Archive Network (CRAN).

When you start R, an interpreter window is launched. You can type commands into it, or cut and paste them from a document.

Use R as a Calculator

The easiest way to get started is to simply use R as a calculator. Type some numerical expressions into R and see what happens (or cut and paste the code below). The lines that start with "#" are comments and do not get evaluated.

# how many seconds in a day?
24*60*60

# what is 2 to the power of 16?
2^16

# what's the square root of pi? (22/7 is a rough approximation of pi)
sqrt(22/7)

# generate a series of numbers using the colon character
1:5

# what's the average value of the series of numbers above?
mean(1:5)

The basic elements of R are variables and functions. Look, you've already used two functions, sqrt() and mean()! (Bonus points: type the word pi on the command line, and you'll see that it's a predefined variable.)
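As a small sketch of the bonus exercise, the lines below print the predefined pi and compare its square root with the square root of the fraction 22/7. The comparison is only an illustration and uses nothing beyond built-in functions.

```r
# pi is a predefined variable
pi

# the square root of pi, and of the approximation 22/7
sqrt(pi)
sqrt(22/7)

# the two results differ slightly, because 22/7 is a little larger than pi
sqrt(22/7) - sqrt(pi)
```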

Variables

Variables are used to store data so we can operate on it. R has four basic kinds of variables: vector, matrix, data frame, and list. We assign data to a variable with the assignment operator "<-". An equals sign also works (=, common in other languages), but the convention in R is to use the little arrow.
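To make the four kinds concrete, here is a minimal sketch that creates one of each. The variable names and values (v, m, df, lst, z) are made up for illustration, and the c() function used here is explained further down.

```r
# a vector: one or more values of the same type
v <- c(1, 2, 3)

# a matrix: values arranged in rows and columns (here 2 rows, 3 columns)
m <- matrix(1:6, nrow = 2)

# a data frame: a table whose columns can hold different types
df <- data.frame(sample = c("a", "b"), conc = c(5, 10))

# a list: a collection whose items can each be a different kind of object
lst <- list(numbers = v, table = df)

# "=" also assigns, but "<-" is the usual convention
z = 2
```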

Let's start with a vector. It can hold one or more values.

# create x and assign it the value of 2
x <- 2

# make other variables
y <- 3.14

# you can name variables using letters, digits, periods, and underscores
temperature <- 37

temperature2 <- 98.6

# but the name cannot start with a number!
# the next line is a syntax error, so it is commented out here
# 2temperature <- 98.6

# we can assign character strings too, but they have to be enclosed in quotes
genotype <- "wt"

Now that you've created them, if you type the name of any of the variables above, the value of the variable is returned.

To see all the objects you've created so far, use the ls() function. Type it on the command line, and the objects in your environment will be listed.
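A short sketch of both ideas, using two of the variables from above. The exists() call at the end is an extra that is not covered in the text; it checks for a single object by name.

```r
# create two objects
x <- 2
genotype <- "wt"

# typing a name prints the stored value
x
genotype

# list the names of the objects in the current environment
ls()

# check whether one particular object exists
exists("x")
```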

To assign more than one value to a variable, we often have to use a concatenation function: c()

# the concentration of DNA in my std curve samples
dna <- c(5, 10, 20, 40, 80, 160, 320)

Now if you type the name of this variable, all the values will be returned. Each value can also be accessed individually using square brackets. The elements of the vector are numbered beginning with 1 (not 0, as in some other languages).

# what concentration was the 3rd sample?
dna[3]

# We can use multiple subscripts
dna[1:5]

# I just need the odd samples
dna[c(1,3,5,7)]
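Typing out every position gets tedious for longer vectors. As one possible alternative, the sketch below builds the odd positions with seq(), a built-in function that is not introduced in the text above. The name odd_positions is made up for illustration.

```r
dna <- c(5, 10, 20, 40, 80, 160, 320)

# seq() generates the positions 1, 3, 5, 7 without typing them out
odd_positions <- seq(1, 7, by = 2)
odd_positions

# use them as subscripts, just like c(1,3,5,7)
dna[odd_positions]
```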

Sometimes we don't know how long a variable is. We can use the length() function to figure that out.

length(dna)
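One common use of length() is reaching the end of a vector without counting by hand. A minimal sketch, where the name n is just an illustrative choice:

```r
dna <- c(5, 10, 20, 40, 80, 160, 320)

# store the number of elements
n <- length(dna)
n

# the last sample
dna[n]

# the last two samples
dna[(n - 1):n]
```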

If we use a vector in an expression, it gets evaluated for each element.

# I need to divide my concentrations by 2
dna/2

# I can assign the results to a new vector
dna_dilution <- dna/2
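The same element-by-element behavior applies to many built-in functions and to expressions that combine two vectors of the same length. A brief sketch, using log2() as an example function that does not appear in the text above:

```r
dna <- c(5, 10, 20, 40, 80, 160, 320)
dna_dilution <- dna / 2

# a function applied to a vector is evaluated for each element:
# how many doublings separate each sample from the first one?
log2(dna / 5)

# two vectors of the same length are combined element by element
dna - dna_dilution
```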

Vectors can have multiple values, but they must all be of the same type.

numeric vector: 23.2, 45.8, 63.7

character vector: "red", "green", "blue"

boolean vector: TRUE, FALSE, TRUE
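To see the "same type" rule in action, the sketch below checks the type of each kind of vector with class() and then tries to mix types. Note that R itself reports boolean vectors as "logical". The variable name mixed is made up for illustration.

```r
# class() reports the type of a vector
class(c(23.2, 45.8, 63.7))
class(c("red", "green", "blue"))
class(c(TRUE, FALSE, TRUE))

# mixing types does not fail: R converts every value to one common type,
# here character strings
mixed <- c(23.2, "red", TRUE)
class(mixed)
mixed
```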

Boolean vectors are an interesting concept. They allow us to test things and apply logic.

# which DNA concentrations are greater than 50?
dna > 50

# we can capture the result, and use that as an "index vector" to return the values
# that meet the criteria
iv <- dna > 50

# see which values met our criteria
dna[iv]

# we can also use a shortcut, and use the logical expression directly
dna[dna > 50]
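Boolean vectors can also be counted, located, and combined. The following sketch uses sum(), which(), and the & operator, none of which appear in the text above, so treat it as an optional extension rather than part of the original walkthrough.

```r
dna <- c(5, 10, 20, 40, 80, 160, 320)

# how many concentrations are greater than 50? (each TRUE counts as 1)
sum(dna > 50)

# at which positions are they?
which(dna > 50)

# combine two tests with & (and): values between 10 and 100
dna[dna > 10 & dna < 100]
```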

R/R Intro (last edited 2011-09-28 06:59:07 by ChrisSeidel)