Goals

Introducing Rend-Seq data

The Li Lab generates end-enriched RNA sequencing (Rend-Seq) data that looks something like this:

image

In the bottom pane, we can see that mrp is a gene on the minus (reverse) strand of the E. coli genome and metG is a gene on the plus (forward) strand.

The top pane shows end-enriched reads from the plus strand and the middle pane shows reads from the minus strand. 5’ (often transcription start) peaks are in blue, and 3’ (often transcription stop) peaks are in red. So, for example, we can see that metG has a strong 3’ peak at the end of the transcript but a weaker 5’ peak. In contrast, mrp has 5’ and 3’ peaks of approximately equal strength.

Peak Calling

Introduction

By eye, it is easy to see end-enriched peaks in the data above. Our eyes are fairly adept at spotting peaks, but annotating every peak in a genome by hand is time-consuming (Li Lab member Mandy Levine did just this with the small T4 genome, and it took hundreds of hours).

For example, take a look at this genomic region:

image

It’s relatively easy to see real peaks between the thiM and rcnR gene bodies. However, the signals that we see through the thiM gene body don’t seem biological: those are probably sequencing artifacts. Computationally, distinguishing real peaks from such artifacts may not be trivial.
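To make the difficulty concrete, here is a minimal sketch of one possible approach: score each position by how far its read count stands above the counts in the flanking windows (a local z-score). This is an illustration only, not the Li Lab's implementation. The function name `call_peaks` and all parameters (`half_window`, `gap`, `z_threshold`, `min_count`, `sigma_floor`) and their default values are assumptions chosen for the example, and the input is assumed to be a plain list of per-position read counts for one strand and one end type.

```python
from statistics import mean, pstdev


def call_peaks(counts, half_window=50, gap=2, z_threshold=7.0,
               min_count=10, sigma_floor=1.0):
    """Return (position, z_score) pairs for positions that stand out locally.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    peaks = []
    n = len(counts)
    for i, value in enumerate(counts):
        # Skip positions with too few reads to be interesting.
        if value < min_count:
            continue
        # Background: flanking positions on both sides, leaving a small gap
        # around i so the peak's own shoulders do not inflate the background.
        left = counts[max(0, i - half_window):max(0, i - gap)]
        right = counts[min(n, i + gap + 1):min(n, i + half_window + 1)]
        background = left + right
        if len(background) < 2:
            continue
        mu = mean(background)
        # Floor the standard deviation so a perfectly flat background
        # does not cause a division by zero.
        sigma = max(pstdev(background), sigma_floor)
        z = (value - mu) / sigma
        if z >= z_threshold:
            peaks.append((i, z))
    return peaks


# Synthetic example: a quiet intergenic region with one sharp peak,
# followed by a noisy "gene body" with high but erratic counts.
counts = [2] * 100 + [50, 150] * 50 + [2] * 100
counts[50] = 500
peaks = call_peaks(counts)
```

On this synthetic input, the sharp peak at position 50 is called, while the erratic signal in the simulated gene body is not, because each high count there is surrounded by similarly high and variable counts. Real data is messier than this, so a fixed threshold like the one assumed here would likely need tuning or a more careful background model.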

Implementation

TODO