Coursework

This coursework reviews some of the key points from the Introduction to R, Plotting in R and Genomics Data courses.

All students should download the zip file for this coursework from here.

Students should include within this directory an R script containing the code that generates the required results, with each piece of code placed under the Exercise/Question number it refers to.

For example (this is not the answer):

Question 1 - Read in the differential expression table and produce a data.frame of all results. How many genes have a padj < 0.05?

myRers <- read.delim("DE_Genes/Expression.csv",sep=">",header=FALSE)
length(myRers)

For exercise 2, question 7, a figure should be included within the results directory.

For exercise 3, your GitHub ID should be included in an email to the Bioinformatics Resource Centre (brc@rockefeller.edu) by the deadline listed below.

The directory containing the R code and the image for exercise 2, question 7 should be made available to me (brc@rockefeller.edu) by the deadline below.

The deadline for the coursework is April 30th, 2018.

Exercise 1 - Working with Differential Gene expression results.

In this exercise we review results from the re-analysis of some publicly available RNA-seq data.

The data we use is from the ENCODE consortium, experiments ENCSR297UBP (GM12878 cell line) and ENCSR552EGO (HeLa cell line).

The data is part of the ENCODE project, so more information can be found at the ENCODE portal.

I have already created a table of differentially expressed genes between GM12878 and HeLa, and a table of gene expression values for all samples. Both can be found in the directory DE_Genes.

Differential Expression Table - DE_Genes/GM12878_Minus_HeLa_DEG.csv

Absolute Expression Table - DE_Genes/Expression.csv

  • Question 1 - Read in the differential expression table and produce a data.frame of all results. How many genes have a padj < 0.05?

  • Question 2 - Using the genes with a padj < 0.05, create a scatter plot (as seen below) of -log10 pvalues on the Y axis and log2FoldChange on the X axis using ggplot2.

  • Question 3 - Read in the absolute expression table, add 1 to every value in the table and make a boxplot of log10 expression values for all samples.

  • Question 4 - Now create a similar boxplot with just genes that have a padj < 0.05 and a log2FoldChange > 1.

  • Question 5 - Using the absolute expression table, identify the genes whose expression is in the top 60%. Filter the results from the differential expression table to these genes and plot the log2 basemean on the X axis and log2FoldChange on the Y axis. Highlight genes that have a padj < 0.05.
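As a general illustration of the filtering and transformation steps these questions rely on, here is a minimal sketch on made-up data. It is not the answer to any question: the data.frame `toy_de`, its gene names and all of its values are invented for the example, and the column names are only assumed to resemble those in the real table. It also shows why `length()` on a data.frame is not a count of genes, and why missing padj values need handling when counting.

```r
# Toy stand-in for a differential expression table (all values are made up).
toy_de <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(2.5, -1.2, 0.3, 1.8),
  pvalue = c(1e-6, 1e-4, 0.4, NA),
  padj = c(1e-4, 0.01, 0.6, NA)
)

# length() on a data.frame counts columns; nrow() counts rows (one per gene).
n_columns <- length(toy_de)
n_genes <- nrow(toy_de)

# A comparison against NA returns NA, so exclude NAs when counting.
n_sig <- sum(toy_de$padj < 0.05, na.rm = TRUE)

# Keep only rows with a non-missing padj below the threshold.
toy_sig <- toy_de[!is.na(toy_de$padj) & toy_de$padj < 0.05, ]

# -log10 transform of the p-values, as used for the Y axis of the scatter plot.
toy_sig$negLog10P <- -log10(toy_sig$pvalue)
```

The resulting `toy_sig` data.frame could then be passed to ggplot2 with `log2FoldChange` and `negLog10P` as the x and y aesthetics.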

Exercise 2 - Working with HOMER peak calls for H3K27Ac signal.

In this exercise we review some publicly available data.

The data we use is H3K27Ac ChIP-seq from the ENCODE consortium, experiment ENCSR863VHE, sample 1 (ENCBS844FSC) of two replicates.

The data is part of the ENCODE project, so more information can be found at the ENCODE portal.

I have already identified a set of genomic locations enriched for H3K27Ac signal using the HOMER software. You will find this in the directory HOMER_peaks.

HOMER_peaks/H3K27Ac_Limb_1.txt

  • Question 1 - Read in the H3K27Ac_Limb_1.txt file and report the number of genomic locations listed in the file.

  • Question 2 - Make a histogram of the log10 of region sizes as shown below using base graphics.

  • Question 3 - Make a density plot of the log10 of region sizes as shown below using ggplot graphics.

  • Question 4 - Make a density plot for each chromosome of the log10 of region sizes as shown below using ggplot graphics.

  • Question 5 - Make a boxplot of the log10 of findPeaks.Score for each chromosome as shown below using ggplot graphics.

  • Question 6 - Export the HOMER genomic regions as a BED3 file.

  • Question 7 - Load the generated BED3 file into IGV and compare it to the peaks in bed/narrowPeak format and the signal p-value in bigWig format found at the ENCODE portal. Capture an image and include it in the results.
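For orientation, a BED3 file is a tab-separated text file with three columns (chromosome, start, end) and no header line. The sketch below shows one way to compute region sizes and write such a file from a data.frame. It is not the answer to any question: the data.frame `toy_peaks`, its column names and its coordinates are made up, and the columns in the real HOMER file may be named differently. BED start coordinates are 0-based, so whether the start column needs shifting depends on the convention of the source file; that adjustment is deliberately left out here.

```r
# Toy stand-in for a peak table (coordinates and column names are made up).
toy_peaks <- data.frame(
  PeakID = c("peak1", "peak2", "peak3"),
  chr = c("chr1", "chr1", "chr2"),
  start = c(1000, 5000, 200),
  end = c(1500, 7000, 1200)
)

# Region size and its log10, the quantity plotted in the size questions.
toy_peaks$log10Width <- log10(toy_peaks$end - toy_peaks$start)

# Write only chromosome, start and end: tab-separated, unquoted,
# with no row names and no header line.
bed_file <- tempfile(fileext = ".bed")
write.table(
  toy_peaks[, c("chr", "start", "end")],
  file = bed_file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE,
  col.names = FALSE
)

# Read the file back to confirm it has three columns and one row per region.
bed_back <- read.delim(bed_file, header = FALSE)
```

Opening the written file in a text editor before loading it into IGV is a quick way to confirm that no header or row-name column slipped in.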

Exercise 3 - Create a GitHub account.

GitHub offers a good service for the storage, organisation and source control of our code.

Create a GitHub account for yourself by following the instructions at https://services.github.com/on-demand/intro-to-github/create-github-account.