Random sampling of fastq files
Use the ShortRead package from Bioconductor to take random samples of a fastq file. This can be useful for systematically downsampling data.
The script below reads a directory of gzipped fastq files with names based on a barcode index, takes 1 million reads at random from each file, and writes a new file to the current directory.
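library(ShortRead)

fqpath <- "/fastq/C0TD0ACXX/"
# gzipped fastq files whose names end in a barcode base (A, C, G or T)
fqfiles <- dir(path = fqpath, pattern = "[ACGT]\\.fastq\\.gz$")

for (f in fqfiles) {
  cat("sampling", f, "\n")
  # draw 1 million reads at random
  s1 <- FastqSampler(paste0(fqpath, f), 1e6)
  # drop the .gz suffix for the output name
  newName <- sub("\\.gz$", "", f)
  # compress = FALSE so the output really is plain text, matching its name
  writeFastq(yield(s1), file = newName, compress = FALSE)
  close(s1)
}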
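For systematic downsampling it often helps if the same sample can be drawn again later. A minimal sketch of one way to do that follows. It assumes that FastqSampler draws from R's random number generator, so that set.seed() makes the sample repeatable. The helper name sample_fastq and its seed argument are hypothetical and not part of ShortRead.

```r
library(ShortRead)

# Hypothetical helper (not part of ShortRead): draw n reads from one fastq
# file after seeding R's random number generator.
# Assumption: the sampler uses R's RNG, so the same seed gives the same reads.
sample_fastq <- function(path, n, seed) {
  set.seed(seed)
  sampler <- FastqSampler(path, n)
  on.exit(close(sampler))  # release the file connection when done
  yield(sampler)           # a ShortReadQ object holding the sampled reads
}
```

Inside the loop above, the call would look like sample_fastq(paste0(fqpath, f), 1e6, seed = 42), with the result passed to writeFastq as before. Recording the seed alongside the output file names makes it possible to regenerate a sample later.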