Revision 18 as of 2015-04-19 19:56:14



Write to the clipboard

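Ever produce a list that you want to take somewhere outside of R without having to write it to a file? Use the clipboard!

The only annoying thing is that writeClipboard() expects a character vector, so numbers have to be converted first. Note that writeClipboard() and readClipboard() are only available in R on Windows.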

# get some random numbers
x <- runif(5)
# write them to the clipboard
writeClipboard(as.character(x))
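Because writeClipboard() only exists on Windows, here is a rough sketch of a cross-platform helper. The function name copy_to_clipboard is hypothetical, and on non-Windows systems it assumes the command-line tools pbcopy (macOS) or xclip (Linux/X11) are installed; R simply pipes the text into them.

```r
# Hypothetical helper: copy a vector to the system clipboard as lines of text.
# Assumes pbcopy (macOS) or xclip (Linux/X11) is available outside Windows.
copy_to_clipboard <- function(lines) {
  lines <- as.character(lines)              # the clipboard wants text
  sysname <- Sys.info()[["sysname"]]
  if (sysname == "Windows") {
    utils::writeClipboard(lines)            # Windows-only function in utils
  } else {
    cmd <- if (sysname == "Darwin") "pbcopy" else "xclip -selection clipboard"
    con <- pipe(cmd, "w")                   # write to the tool's standard input
    on.exit(close(con))
    writeLines(lines, con)
  }
  invisible(lines)
}

# usage: copy_to_clipboard(runif(5))
```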

To move tables of data, use the write.table function:

# a 10 x 10 matrix holding the numbers 1 to 100
mytable <- matrix(1:100, ncol=10, nrow=10)
# tab-separated, including row and column names
write.table(mytable, "clipboard", sep="\t")
# suppress row and column names
write.table(mytable, "clipboard", sep="\t", row.names=FALSE, col.names=FALSE)

To move a column of numbers from another program into R, you can use the scan() function. For instance:

# bring a column of numbers from the clipboard into a vector
x <- scan()
# hit Paste, then Return (an empty line ends input)
# or use readClipboard() (Windows only); it returns strings, so convert them
x <- as.numeric(readClipboard())

How to flatten a matrix in R (averaging replicate rows)

Suppose you have a matrix or data frame in which some of the rows hold values that need to be averaged together. There are a few ways to go about this.

Imagine you have a table of gene expression values measured under various conditions. Each row of the table represents a gene, and some rows are replicate measurements of the same gene. The gene name is found in the column "sys_id". Each column represents a different condition. To average the replicates you could do the following:

Example 1:

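# stats is attached by default; loading it explicitly does no harm
library(stats)

# average columns 2-5 within each sys_id; naming the list entry labels the grouping column
mat <- aggregate(dat[, 2:5], list(sys_id=dat$sys_id), mean)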

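Example 2: Create a factor from the column of gene names, use that factor to apply a function to the values in each column separately, then recombine the columns into a data frame or matrix.

Use tapply(X, INDEX, FUN), which applies FUN to X split by the levels of the factor INDEX:

f <- factor(dat$sys_id)
# tapply() averages one column per gene; sapply() repeats it over columns 2-5 and binds the results into a matrix
mat <- sapply(dat[, 2:5], function(col) tapply(col, f, mean))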
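A self-contained sketch that runs both examples on a small made-up table. The gene names and values are invented for illustration; only the sys_id column and the use of columns 2 to 5 come from the description above. Both approaches produce one row of averages per gene.

```r
# Toy stand-in for the expression table: sys_id plus four conditions
dat <- data.frame(
  sys_id = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  cond1  = c(1, 3, 5, 2, 4),
  cond2  = c(2, 4, 6, 8, 10),
  cond3  = c(0, 2, 1, 3, 5),
  cond4  = c(10, 20, 30, 40, 50)
)

# Example 1: aggregate() returns a data frame with one row per gene
agg <- aggregate(dat[, 2:5], by = list(sys_id = dat$sys_id), FUN = mean)

# Example 2: tapply() per column, combined by sapply() into a matrix
f <- factor(dat$sys_id)
tap <- sapply(dat[, 2:5], function(col) tapply(col, f, mean))

print(agg)
print(tap)
```

aggregate() keeps sys_id as an ordinary column, while the tapply() route gives a numeric matrix with the gene names as row names, which is the shape most heatmap functions expect.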

Colors for Heatmaps

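library(RColorBrewer)
# stretch the 10-colour red-to-blue Brewer palette to 256 shades; pass it to heatmap(..., col=hmcol)
hmcol <- colorRampPalette(brewer.pal(10, "RdBu"))(256)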
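With brewer.pal("RdBu") the ramp runs from red for the lowest values to blue for the highest; wrapping the palette in rev() flips that. As a sketch for when RColorBrewer isn't installed, a similar diverging ramp can be built with colorRampPalette() alone and handed to base R's heatmap(). The three hex colours are hand-picked for illustration, and the random matrix is made-up data.

```r
# Diverging ramp from base R only: blue (low) -> near-white -> red (high)
hmcol <- colorRampPalette(c("#2166AC", "#F7F7F7", "#B2182B"))(256)

set.seed(1)
m <- matrix(rnorm(100), nrow = 10)      # made-up data to plot

pdf(tempfile(fileext = ".pdf"))         # draw to a throwaway file, not the screen
heatmap(m, col = hmcol, scale = "none")
invisible(dev.off())
```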

Write lines to a file

# open a file connection
fout <- file("filename", "w")
# write to the connection (usually as part of a loop); writeLines() ends each line with a newline
writeLines("some data", fout)
# close the connection
close(fout)
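A self-contained sketch of the loop pattern, writing to a temporary file instead of "filename". It uses writeLines(), which terminates every line with a newline, so each iteration starts a fresh line; readLines() then reads the file back for checking.

```r
path <- tempfile(fileext = ".txt")   # stand-in for "filename"
fout <- file(path, "w")              # open once, before the loop
for (i in 1:3) {
  writeLines(paste("line", i), fout) # one line per iteration
}
close(fout)                          # flush and release the file

readLines(path)
```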

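Install packages from Bioconductor

At the time of writing, packages were installed with the biocLite.R script shown below. Since Bioconductor 3.8, biocLite() has been replaced by the BiocManager package: run install.packages("BiocManager") once, then BiocManager::install("edgeR"), which also accepts a vector of package names.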

source("http://bioconductor.org/biocLite.R")
# the packages I usually install after a new installation
biocLite("edgeR")
# you can hand it a list of packages
biocLite(c("ggplot2","GenomicRanges","GenomicFeatures", "gplots", "RColorBrewer", "topGO", "genefilter"))

List functions and objects in a package

# the package must be attached first, e.g. with library(Biostrings)
ls("package:Biostrings")

# lists functions with call sequences:
lsf.str("package:Biostrings")
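The same calls work for any attached package. A sketch using stats, which ships with R, so it runs without installing Biostrings; it also shows getNamespaceExports(), which reads a package's exported names from its namespace without attaching it to the search path.

```r
library(stats)                         # attached by default; shown for the pattern
fns <- ls("package:stats")             # every exported object in the package
head(fns)

# only functions whose names match a regex, printed with their arguments
lsf.str("package:stats", pattern = "^median")

# exported names straight from the namespace, no attaching needed
exports <- getNamespaceExports("stats")
"median" %in% exports
```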

Divide each column in a matrix by its column sum

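Use the sweep() function, which applies an operation between each row (margin 1) or column (margin 2) and a matching vector of statistics.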

# create a matrix of random numbers
m <- matrix(sample(1:9,9),nrow=3,ncol=3)

# divide each column by column sums
sweep(m,2,colSums(m),"/")

Another way to do this is to use the prop.table() function, whose margin argument is 1 for rows or 2 for columns:

prop.table(m,2)
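A quick check, as a sketch, that the two approaches agree. The seed is added only so the random matrix is reproducible; after dividing, every column should sum to 1.

```r
set.seed(42)                                   # reproducible random matrix
m <- matrix(sample(1:9, 9), nrow = 3, ncol = 3)

by_sweep <- sweep(m, 2, colSums(m), "/")       # column j divided by colSums(m)[j]
by_prop  <- prop.table(m, 2)                   # same thing, margin 2 = columns

all.equal(by_sweep, by_prop)
colSums(by_sweep)                              # each column now sums to 1
```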

Create an Empty Dataframe

Sometimes you want to create an empty data structure with predefined column types and populate it later. How do you do that in R in the absence of any data?

mydf <- data.frame(fruit=character(), quantity=numeric(), stringsAsFactors=FALSE)
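A sketch of populating the empty data frame afterwards. rbind() matches the new rows' columns to mydf by name; the fruit names and quantities are made up.

```r
mydf <- data.frame(fruit = character(), quantity = numeric(),
                   stringsAsFactors = FALSE)
str(mydf)   # 0 rows, but the column types are already fixed

# append rows one at a time
mydf <- rbind(mydf, data.frame(fruit = "apple", quantity = 3,
                               stringsAsFactors = FALSE))
mydf <- rbind(mydf, data.frame(fruit = "pear", quantity = 5,
                               stringsAsFactors = FALSE))
mydf
```

Each rbind() copies the whole data frame, so for many rows it is cheaper to collect the pieces in a list and call do.call(rbind, pieces) once.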

Load an R object and get its name

Sometimes you want to load a previously saved R object into memory, and you may not remember the name of the object:

foo <- "my important string of data"
save(foo, file="mySavedObject.RData")

If you call load("mySavedObject.RData"), the object is put into your workspace, but you may not remember what it was called, or you may have code that needs to operate on it regardless of its name. How can you refer to it (i.e. work on foo) if you don't know what it's called?

load() puts the object into your environment and returns a character vector with the names of the objects it restored, so if you assign the result of the load statement to a variable, the name of the loaded object will be in that variable:

> myObjectName <- load("mySavedObject.RData")
> ls()
[1] "foo"          "myObjectName"
> print(myObjectName)
[1] "foo"

With this method, your object AND a variable holding the name of your object are in your environment. If you had written a program that needs the value of the object but didn't know its name, you now have a way to get it:

myName <- get(myObjectName)
# you could then choose to get rid of the original object:
rm(list=myObjectName)

Now your program has a copy of the contents of foo under the name myName. Without the rm() line, your environment would hold two identical data objects: one whose name you know (myName) and one whose name you forgot (foo). The rm(list=myObjectName) call in the example above removes the original.

The steps above are often abbreviated as such:

myName <- get(load("mySavedObject.RData"))

But this leaves you with two objects: myName and the original foo.

Another way to do this is to load the object into a new environment, copy it into a variable whose name you know, and then get rid of the environment.

tmp <- new.env()
myObjectName <- load("mySavedObject.RData", tmp)
myName <- get(myObjectName, tmp)
rm(tmp)
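A self-contained round trip of the new.env() approach, as a sketch that writes to a temporary file instead of mySavedObject.RData. It assumes the file holds exactly one object; with several, load() returns several names and you would need to pick one before calling get().

```r
# create and save a throwaway object, then remove it from the workspace
path <- tempfile(fileext = ".RData")
foo <- "my important string of data"
save(foo, file = path)
rm(foo)

tmp <- new.env()
myObjectName <- load(path, envir = tmp)   # load() returns the restored names
myName <- get(myObjectName, envir = tmp)  # fetch the value from tmp, not the workspace
rm(tmp)

exists("foo")   # foo was only ever restored inside tmp
myName
```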

Now *myName* has the contents of *foo* but *foo* is no more.

If you have many objects to load this way, you could wrap the steps in a function:

getObject <- function(filename){
  tmp <- new.env()
  myObjectName <- load(filename, tmp)
  theObject <- get(myObjectName, tmp)
  rm(tmp)
  return(theObject)
}

myName <- getObject("mySavedObject.RData")

Or you can rely on function scope to accomplish the same thing: load() called inside a function restores the object into the function's local environment, the function returns its value so you can assign it to a name you know, and the local copy disappears when the function exits.

getObject <- function(filename){
  theObject <- get(load(filename))
  return(theObject)
}

myName <- getObject("mySavedObject.RData")
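If you control how the object is saved, base R's saveRDS() and readRDS() sidestep the naming problem altogether: an .rds file stores a single object without its name, and you choose the name when reading it back. This is a different technique from save()/load(); a minimal sketch using a temporary file:

```r
path <- tempfile(fileext = ".rds")
foo <- "my important string of data"
saveRDS(foo, file = path)   # stores the value only, not the name "foo"

myName <- readRDS(path)     # bind it to whatever name you like
identical(myName, foo)
```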