diff --git a/README.md b/README.md
index 4f4e118..2f05c2f 100644
--- a/README.md
+++ b/README.md
@@ -2,64 +2,84 @@

 ## Part A - seq function

-### a: Create a vector where the first element is 1, the last element is 33, with an increment of 2 between elements.
+### a:
+Create a vector where the first element is 1, the last element is 33, with an increment of 2 between elements.

-### b: Create a vector with 15 equally spaced elements in which the first element is 7 and the last element is 40. Hint: use ?seq for help and the option length.out option.
+### b:
+Create a vector with 15 equally spaced elements in which the first element is 7 and the last element is 40. Hint: use ?seq for help and the length.out option.

-### c: Use the sample function to create a vector with variable name my.dna that consists of 20 uniformly-random letters “A”, “C”, “G”, and “T”.
+### c:
+Use the sample function to create a vector with variable name my.dna that consists of 20 uniformly-random letters “A”, “C”, “G”, and “T”.

-### d: Use the == logic operator and other R functions on your my.dna variable to determine how many of the letters are “A”. Hint: you can use sum on a TRUE/FALSE vector or you can use the functions which and length.
+### d:
+Use the == logic operator and other R functions on your my.dna variable to determine how many of the letters are “A”. Hint: you can use sum on a TRUE/FALSE vector or you can use the functions which and length.

-### e: Confirm your answer in d with the table(my.dna). From the output of table, create a pie chart and barplot. Add x and y labels to your barplot.
+### e:
+Confirm your answer in d with table(my.dna). From the output of table, create a pie chart and barplot. Add x and y labels to your barplot.

-### f: Use the sample function with the option prob=c(.1,.4,.4,.1)to create a vector with variable name my.dna2 that consists of 20 non-uniformly random letters “A”, “C”, “G”, and “T”. Use table to show the nucleotide counts.
+### f:
+Use the sample function with the option prob=c(.1,.4,.4,.1) to create a vector with variable name my.dna2 that consists of 20 non-uniformly random letters “A”, “C”, “G”, and “T”. Use table to show the nucleotide counts.

 ## Part B: NCBI Search

-### Setup: Search NCBI (http://www.ncbi.nlm.nih.gov/) for “Alzheimer human.” This will take you to Entrez gene, which shows you the hits in the NCBI databases. Choose the top hit for Alzheimer under “Gene” information.
+### Setup:
+Search NCBI (http://www.ncbi.nlm.nih.gov/) for “Alzheimer human.” This will take you to Entrez gene, which shows you the hits in the NCBI databases. Choose the top hit for Alzheimer under “Gene” information.

-### 1. What is the name of the gene?
+### 1.
+What is the name of the gene?

-### 2. What chromosome is the gene on?
+### 2.
+What chromosome is the gene on?

-### 3. What species has the most similar gene to the human version?
+### 3.
+What species has the most similar gene to the human version?

 ## Part C: Reading fasta files, nucleotide and dinucleotide frequencies

 ### Setup:
- Install and load the seqnir library
- Download the fasta file found from Part B
- Read the fasta file in as a string
+Install and load the seqinr library
+Download the fasta file found from Part B
+Read the fasta file in as a string

-### 1. What data type is the fasta?
+### 1.
+What data type is the fasta?

-### 2. Create a function that converts the fasta string to a vector
+### 2.
+Create a function that converts the fasta string to a vector

-### 3. Using the function from C.2, how long is the sequence?
+### 3.
+Using the function from C.2, how long is the sequence?

-### 4. Show the first 20 nucleotides of the sequence
+### 4.
+Show the first 20 nucleotides of the sequence

-### 5. How many of each nucleotide are there in the sequence?
+### 5.
+How many of each nucleotide are there in the sequence?

-### 6. Create a barplot of the counts, including axes labels
+### 6.
+Create a barplot of the counts, including axis labels

-### 7. Calculate the probability of each nucleotide
+### 7.
+Calculate the probability of each nucleotide

 ## Part D: GC Content

-### 1. Add code to your R script to calculate the G+C content of the fasta vector
+### 1.
+Add code to your R script to calculate the G+C content of the fasta vector

-### 2. How many gc pairs are there?
+### 2.
+How many gc pairs are there?

-### 3. Show a barplot of all dinucleotide counts
+### 3.
+Show a barplot of all dinucleotide counts

 ## Part E: Coronavirus

 ### Setup
- Paper: https://www.ncbi.nlm.nih.gov/pubmed/32015508
- DNA/RNA: https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3?report=fasta
- Protein: https://www.ncbi.nlm.nih.gov/protein/QHD43415.1?report=fasta
+Paper: https://www.ncbi.nlm.nih.gov/pubmed/32015508
+DNA/RNA: https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3?report=fasta
+Protein: https://www.ncbi.nlm.nih.gov/protein/QHD43415.1?report=fasta

-
-### 1. Download the DNA/RNA fasta file and determine the nucleotide frequencies. Comment on how the frequencies compare with the human APOE gene.
+### 1.
+Download the DNA/RNA fasta file and determine the nucleotide frequencies. Comment on how the frequencies compare with the human APOE gene.