Revision 1 as of 2010-03-12 23:05:59

Clear message

Convert BAM to a GenomeDataObject

There must be a better way to do this, but I can't figure it out.

For ChIP SEQ analysis I usually start with Eland Export files that I read into R with ShortReads, and convert to GenomeData objects for use with the chipseq package. However, lately I've been stuck with BAM files, which are still mysterious and awkward to me.

In order to get the files into R in a form I can start analyzing, I use Rsamtools, which isn't officially out yet, but it's installed in my environment and can read BAM files. The code below basically reads in the BAM file, extracts out the chromosome, strand, and position, for every read, and builds a GenomeData object.

A genome data object is basically a list of chromosomes, whereby each chromosome has s sublist of "+" and "-" vectors of integers representing alignment positions for each strand.

library(Rsamtools)

# read in a BAM file
wt <- scanBam("mybamfile.bam")

# get chr names
bamchrs <- levels(wt[[1]][["rname"]])

# create empty list
mygd <- list()
for( i in 1:length(bamchrs) ){

  # create index vectors for positions and strands
  chriv <- wt[[1]][["rname"]] == bamchrs[i]
  plusStriv <- wt[[1]][["strand"]] == "+"
  minusStriv <- wt[[1]][["strand"]] == "-"
  # create genome data-like list
  mygd[[i]] <- list(list("+"=wt[[1]][["pos"]][chriv & plusStriv], "-"=wt[[1]][["pos"]][chriv & minusStriv]))

}

names(mygd) <- bamchrs
# convert to a GenomeData object
mygd <- GenomeData(listData = mygd)