Bioinformatics Lab 2
Part A - hclust
1.) Write a 2-column linkage matrix using the following hclust rules
- Label the objects A, B, C, D, E, F as negative numbers -1, -2, -3, -4, -5, -6
- Put the closest pair of objects in the first row of the matrix. The first row will be: -1, -2
- Merge AB (1) with C (-3)
- Repeat for the rest of the tree.
- Show the full merge matrix and the heights of each row in the merge matrix.
2.) Create a script with the following incomplete code. Based on your UPGMA merging calculation and your linkage matrix above, fill in the blanks in the code below to plot the tree. Show the resulting plot.
# initialize an empty list that will contain fields of our hclust object
a <- list()
# encoding rules:
# negative numbers are leaves (A,B,...,E) -> (-1,-2,...,-5)
# positive are merged clusters (defined by row number in $merge)
# each merged pair in a row
a$merge <- rbind(c(-1, -2), # +1 (A-B)
c(1,-3) , # +2 (A-B)-C
_______ , # +3 (D-E)
_______ , # +4 ((A-B)-C)-(DE)
_______) # +5 (((A-B)-C)-(DE))-F
a$height <- c(1, 2, 1.5, 3, 4) # merge heights
a$order <- 1:6 # order of leaves
a$labels <- LETTERS[1:6] # labels of leaves as Letters
class(a) <- "hclust" # force a to be hclust object
# plot the tree
# plot knows that a is an hclust object and plots accordingly
plot(a,hang=-1) # or use plot(as.dendrogram(a))
Part B - UPGMA and WGCNA
-
Add the code below to your script to read in the distance matrix distance_matrix.txt and plot the dendrogram. Show the dendrogram.
# Set the Working Directory First my.dist1 <- read.table("distance_matrix.txt",sep="\t", header=T) rownames(my.dist1) = colnames(my.dist1) my.dist1[lower.tri(my.dist1)] = t(my.dist1)[lower.tri(my.dist1)] hc1 <- hclust(as.dist(my.dist1), method="average") plot(hc1,hang=-1)
2.) View the distance matrix in Excel and paste it in the report. Use header=T.
3.) What does header=T do in this situation?
4.) Add the following code to install and run WGCNA.
# method 1
install.packages("WGCNA", dependencies=T)
source("http://bioconductor.org/biocLite.R")
biocLite(c("AnnotationDbi", "impute", "GO.db", "preprocessCore"))
biocLite("WGCNA")
install.packages("WGCNA")
# method 2 (easier method)
install.packages("BiocManager")
library(BiocManager)
BiocManager::install("WGCNA")
# load WGCNA and run on the hc1 dendrogram
cutree(hc1,k=3) # hclust cluster labels with k=3 clusters
library(WGCNA)
dynamicMods = cutreeDynamic(
dendro = hc1,
distM = my.dist1,
deepSplit = 2,
pamRespectsDendro = FALSE,
minClusterSize = 2)
names(dynamicMods) <- colnames(my.dist1)
5.) Looking at dynamicMods, what leaves are clustered together?
6.) Add the following code to show the clustered data. What do the colors at the bottom represent?
dynamicColors = labels2colors(dynamicMods) # colors might not work
table(dynamicColors)
# Plot the dendrogram and colors underneath
sizeGrWindow(8,6)
plotDendroAndColors(hc1,
dynamicColors,
"Dynamic Tree Cut",
dendroLabels = NULL,
hang = -1,
addGuide = TRUE,
#guideHang = 0.05,
main = "dynamic cut clustering")
Part C - Distance Matrix
| <> | A | B | C | D |
|---|---|---|---|---|
| A | x | 0.14 | 0.48 | 0.33 |
| B | x | 0.5 | 0.28 | |
| C | x | 0.55 | ||
| D | x |
1.) Do the UPGMA by hand, showing the new UPGMA distance matrices as you merge clusters.
2.) Create the linkage matrix as you did in A.1, and list the height of each branch. For example, the first row of the linkage matrix is -1, -2 with height .14.
3.) Mirror the code you used in A.2 to plot the dendrogram for this new matrix.
4.) Use code below to verify your results by applying hclust to the distance matrix.
# create distance matrix
my.dist2 <- matrix(data=rep(0,16),ncol=4)
my.dist2[1,2]<-.14; my.dist2[1,3]<-.48; my.dist2[1,4]<-.33;
my.dist2[2,3]<-.50; my.dist2[2,4]<-.28;
my.dist2[3,4]<-.55;
# make matrix symmetric
# make the lower triangle equal to the upper triangle
my.dist2[lower.tri(my.dist2)] = t(my.dist2)[lower.tri(my.dist2)]
diag(my.dist2)<-999 # big number to guarantee it’s not the min
# compare with hclust
test <- hclust(as.dist(my.dist2), method="average") # average linkage, UPGMA
test$merge
test$height
test$order
test$labels<-LETTERS[1:4]
plot(test,hang=-1)
Part D - Multidimensional Scaling
1.) Show the 2d MDS plot for my.dist2. How does this visualization change your perspective on how the objects are clustered?
## MDS, you might need to refresh (sweep) the plot window
locs<-cmdscale(as.dist(my.dist2), k=2) # k is number of mds components
x<-locs[,1]
y<-locs[,2]
# pch=NA hides plot symbols:
plot(x,y,main="Multi-dimensional Scaling",
xlab="MDS dimension-1", ylab="MDS dimension-2", pch=NA,asp=1)
# pch specifies plot symbol; we want null because we are using letter next
# plot the text labels instead of symbols:
text(x,y,test$labels,cex=1)