From 97f529ff0222538977fa76304673950e4b44b0f5 Mon Sep 17 00:00:00 2001
From: noah
Date: Tue, 30 Aug 2022 14:59:11 -0500
Subject: [PATCH] README formatting

---
 README.md | 82 ++++++++++++++++++-------------------------------------
 1 file changed, 27 insertions(+), 55 deletions(-)

diff --git a/README.md b/README.md
index 2f05c2f..94f1637 100644
--- a/README.md
+++ b/README.md
@@ -2,76 +2,48 @@
 
 ## Part A - seq function
 
-### a:
-Create a vector where the first element is 1, the last element is 33, with an increment of 2 between elements.
-
-### b:
-Create a vector with 15 equally spaced elements in which the first element is 7 and the last element is 40. Hint: use ?seq for help and the option length.out option.
-
-### c:
-Use the sample function to create a vector with variable name my.dna that consists of 20 uniformly-random letters “A”, “C”, “G”, and “T”.
-
-###d:
-Use the == logic operator and other R functions on your my.dna variable to determine how many of the letters are “A”. Hint: you can use sum on a TRUE/FALSE vector or you can use the functions which and length.
-
-### e:
-Confirm your answer in d with the table(my.dna). From the output of table, create a pie chart and barplot. Add x and y labels to your barplot.
-
-### f:
-Use the sample function with the option prob=c(.1,.4,.4,.1)to create a vector with variable name my.dna2 that consists of 20 non-uniformly random letters “A”, “C”, “G”, and “T”. Use table to show the nucleotide counts.
+a: Create a vector where the first element is 1, the last element is 33, with an increment of 2 between elements.
+b: Create a vector with 15 equally spaced elements in which the first element is 7 and the last element is 40.
+c: Use the sample function to create a vector with variable name my.dna that consists of 20 uniformly-random letters “A”, “C”, “G”, and “T”.
+d: Use the == logic operator and other R functions on your my.dna variable to determine how many of the letters are “A”. Hint: you can use sum on a TRUE/FALSE vector or you can use the functions which and length.
+e: Confirm your answer in d with the table(my.dna). From the output of table, create a pie chart and barplot. Add x and y labels to your barplot.
+f: Use the sample function with the option prob=c(.1,.4,.4,.1)to create a vector with variable name my.dna2 that consists of 20 non-uniformly random letters “A”, “C”, “G”, and “T”. Use table to show the nucleotide counts.
 
 ## Part B: NCBI Search
 
-### Setup:
+### Setup
+
 Search NCBI (http://www.ncbi.nlm.nih.gov/) for “Alzheimer human.” This will take you to Entrez gene, which shows you the hits in the NCBI databases. Choose the top hit for Alzheimer under “Gene” information.
 
-### 1.
-What is the name of the gene?
+### Evaluation
 
-### 2.
-What chromosome is the gene on?
-
-### 3.
-What species has the most similar gene to the human version?
+a: What is the name of the gene?
+b: What chromosome is the gene on?
+c: What species has the most similar gene to the human version?
 
 ## Part C: Reading fasta files, nucleotide and dinucleotide frequencies
 
-### Setup:
+### Setup
+
 Install and load the seqnir library
 Download the fasta file found from Part B
 Read the fasta file in as a string
 
-### 1.
-What data type is the fasta?
+### Evaluation
 
-### 2.
-Create a function that converts the fasta string to a vector
-
-### 3.
-Using the function from C.2, how long is the sequence?
-
-### 4.
-Show the first 20 nucleotides of the sequence
-
-### 5.
-How many of each nucleotide are there in the sequence?
-
-### 6.
-Create a barplot of the counts, including axes labels
-
-### 7.
-Calculate the probability of each nucleotide
+a: What data type is the fasta?
+b: Create a function that converts the fasta string to a vector
+c: Using the function from C.2, how long is the sequence?
+d: Show the first 20 nucleotides of the sequence
+e: How many of each nucleotide are there in the sequence?
+f: Create a barplot of the counts, including axes labels
+g: Calculate the probability of each nucleotide
 
 ## Part D: GC Content
 
-### 1.
-Add code to your R script to calculate the G+C content of the fasta vector
-
-### 2.
-How many gc pairs are there?
-
-### 3.
-Show a barplot of all dinucleotide counts
+a: Add code to your R script to calculate the G+C content of the fasta vector
+b: How many gc pairs are there?
+c: Show a barplot of all dinucleotide counts
 
 ## Part E: Coronavirus
 
@@ -80,6 +52,6 @@ Paper: https://www.ncbi.nlm.nih.gov/pubmed/32015508
 DNA/RNA: https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3?report=fasta
 Protein: https://www.ncbi.nlm.nih.gov/protein/QHD43415.1?report=fasta
 
-### 1.
-Download the DNA/RNA fasta file and determine the nucleotide frequencies. Comment on how the frequencies compare with the human APOE gene.
+### Evaluation
+a: Download the DNA/RNA fasta file and determine the nucleotide frequencies. Comment on how the frequencies compare with the human APOE gene.
 