This coursework reviews some of the key points from the Introduction to R, Plotting in R and Genomics Data courses.
All students should download the zip file for this coursework from here.
Students should include in this directory an R script containing the code that generates the required results, with each piece of code placed under the Exercise/Question number it refers to.
For example (this is not the answer):
myRers <- read.delim("DE_Genes/Expression.csv",sep=">",header=FALSE)
length(myRers)
For Exercise 2, Question 7, a figure should be included in the results directory.
For Exercise 3, your GitHub ID should be sent in an email to the Bioinformatics Resource Centre (brc@rockefeller.edu) by the deadline listed below.
The directory containing the R code and the image for Exercise 2, Question 7 should be made available to me (brc@rockefeller.edu) by the deadline below.
The deadline for the coursework is April 30th, 2018.
Exercise 1
In this exercise we review results from a re-analysis of publicly available RNA-seq data.
The data come from the ENCODE consortium, experiments ENCSR297UBP (GM12878 cell line) and ENCSR552EGO (HeLa cell line).
More information about these experiments can be found at the ENCODE portal.
I have already created a table of genes differentially expressed between GM12878 and HeLa, and a table of gene expression values for all samples. Both are in the DE_Genes directory:
Differential Expression Table - DE_Genes/GM12878_Minus_HeLa_DEG.csv
Absolute Expression Table - DE_Genes/Expression.csv
Question 1 - Read in the differential expression table and produce a data.frame of all results. How many genes have a padj < 0.05?
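One possible sketch for Question 1 is shown here. It assumes the CSV has a header row and a DESeq2-style padj column (check names() on the real table). The helper names readDETable and countSignificant are hypothetical, not part of the course material. padj can contain NA values (DESeq2, for example, sets padj to NA for genes removed by independent filtering), so NAs are excluded from the count.

```r
# Sketch for Exercise 1, Question 1 (helper names are hypothetical).
# Assumption: the CSV has a header row and a DESeq2-style "padj" column.

# Read the comma-separated differential expression table into a data.frame.
readDETable <- function(path) {
  read.csv(path, header = TRUE, stringsAsFactors = FALSE)
}

# Count genes with padj below the cutoff; NA padj values are not counted.
countSignificant <- function(deTable, cutoff = 0.05) {
  sum(deTable$padj < cutoff, na.rm = TRUE)
}

# Usage:
# deTable <- readDETable("DE_Genes/GM12878_Minus_HeLa_DEG.csv")
# countSignificant(deTable)
```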
Question 2 - Using only the genes with a padj < 0.05, create a scatter plot (as seen below) of -log10 p-values on the Y axis and log2FoldChange on the X axis using ggplot2.
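A minimal ggplot2 sketch for Question 2 follows. The question names the padj and log2FoldChange columns. The raw p-value column name, pvalue, is an assumption (DESeq2-style), as are the hypothetical helper names filterSignificant and plotSignificant.

```r
library(ggplot2)

# Hypothetical helper: keep genes with a non-missing padj below the cutoff.
filterSignificant <- function(deTable, cutoff = 0.05) {
  deTable[!is.na(deTable$padj) & deTable$padj < cutoff, ]
}

# Hypothetical helper: -log10(pvalue) on the Y axis, log2FoldChange on the X axis.
# Assumes DESeq2-style column names "pvalue" and "log2FoldChange".
plotSignificant <- function(deTable, cutoff = 0.05) {
  sig <- filterSignificant(deTable, cutoff)
  sig$minusLog10P <- -log10(sig$pvalue)
  ggplot(sig, aes(x = log2FoldChange, y = minusLog10P)) +
    geom_point() +
    labs(x = "log2FoldChange", y = "-log10(pvalue)")
}

# Usage, with deTable read in as in Question 1:
# print(plotSignificant(deTable))
```

The filtering step is kept separate from the plotting step so the set of genes being plotted can be inspected (for example with nrow()) before drawing.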
Exercise 2
In this exercise we review some publicly available ChIP-seq data.
The data are H3K27Ac ChIP-seq from the ENCODE consortium, experiment ENCSR863VHE, sample 1 (ENCBS844FSC) of two replicates.
More information about this experiment can be found at the ENCODE portal.
I have already identified a set of genomic locations enriched for H3K27Ac signal using the HOMER software; you will find them in the HOMER_peaks directory:
HOMER_peaks/H3K27Ac_Limb_1.txt
Question 1 - Read in the H3K27Ac_Limb_1.txt file and report the number of genomic locations listed in it.
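A sketch for this question follows. It assumes H3K27Ac_Limb_1.txt is a standard tab-separated HOMER peak file in which comment lines and the column header start with #, and in which the first five columns are PeakID, chr, start, end and strand. Inspect the file with readLines() first to confirm this. The helper name readHomerPeaks is hypothetical.

```r
# Sketch for Exercise 2, Question 1 (helper name is hypothetical).
# Assumption: standard HOMER peak file; lines starting with "#" (including
# the column header) are skipped, and columns 1-5 are
# PeakID, chr, start, end, strand.
readHomerPeaks <- function(path) {
  peaks <- read.delim(path, header = FALSE, comment.char = "#",
                      stringsAsFactors = FALSE)
  names(peaks)[1:5] <- c("PeakID", "chr", "start", "end", "strand")
  peaks
}

# Usage: one row per genomic location.
# peaks <- readHomerPeaks("HOMER_peaks/H3K27Ac_Limb_1.txt")
# nrow(peaks)
```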
Question 2 - Using base graphics, make a histogram of the log10 of the region sizes, as shown below.
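A base-graphics sketch for this question follows. It takes any data frame with start and end columns, such as the table read in for Question 1. It assumes inclusive, 1-based coordinates (as in HOMER peak files), so a region's size is end - start + 1. Drop the + 1 if your coordinates are half-open. The helper names regionSizes and plotSizeHistogram are hypothetical.

```r
# Sketch for Exercise 2, Question 2 (helper names are hypothetical).
# Assumption: start/end are 1-based and inclusive, as in HOMER peak files,
# so a region's size is end - start + 1.
regionSizes <- function(peaks) {
  peaks$end - peaks$start + 1
}

# Draw a base-graphics histogram of log10 region sizes; hist() returns
# its "histogram" object invisibly when it plots.
plotSizeHistogram <- function(peaks) {
  hist(log10(regionSizes(peaks)),
       main = "Distribution of region sizes",
       xlab = "log10(region size in bp)")
}

# Usage:
# plotSizeHistogram(peaks)
```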
Question 6 - Export the HOMER genomic regions as a BED3 file.
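A BED3 file has three tab-separated columns (chromosome, start, end) with 0-based, half-open coordinates. The sketch below assumes the input table has chr, start and end columns holding 1-based inclusive coordinates (HOMER style), so only the start is shifted down by one. The helper name writeBed3 and the output file name in the usage line are hypothetical.

```r
# Sketch for Exercise 2, Question 6 (helper name is hypothetical).
# Assumption: input start/end are 1-based inclusive (HOMER style); BED is
# 0-based half-open, so only the start needs shifting by one.
writeBed3 <- function(peaks, path) {
  bed <- data.frame(chr   = peaks$chr,
                    start = as.integer(peaks$start - 1),
                    end   = as.integer(peaks$end))
  # Integer columns avoid scientific notation (e.g. "1e+05") in the output.
  write.table(bed, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(bed)
}

# Usage (hypothetical output file name):
# writeBed3(peaks, "results/H3K27Ac_Limb_1.bed")
```

Writing without a header, row names or quotes keeps the file in the plain format IGV expects for BED.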
Question 7 - Load the generated BED3 file into IGV and compare it with the peaks (bed narrowPeak format) and the signal p-value track (bigWig format) available from the ENCODE portal. Capture an image and include it in the results directory.
Exercise 3
GitHub offers a good service for the storage, organisation and source control of our code.
Create a GitHub account for yourself by following the instructions at https://services.github.com/on-demand/intro-to-github/create-github-account.