README formatting

Noah L. Schrick 2022-08-30 14:53:29 -05:00
parent 501bdb1979
commit 35b3ef9366

@@ -2,64 +2,84 @@
## Part A - seq function
### a:
Create a vector where the first element is 1, the last element is 33, with an increment of 2 between elements.
### b:
Create a vector with 15 equally spaced elements in which the first element is 7 and the last element is 40. Hint: use ?seq for help and the length.out option.
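
As a minimal sketch of items a and b, the two `seq` calls below use the `by` and `length.out` arguments respectively. The variable names `odds` and `spaced` are illustrative choices, not names required by the assignment.

```r
# Part A.a: start at 1, stop at 33, step by 2.
odds <- seq(from = 1, to = 33, by = 2)

# Part A.b: 15 equally spaced values from 7 to 40.
# length.out fixes the number of elements; seq works out the step size.
spaced <- seq(from = 7, to = 40, length.out = 15)

print(odds)
print(spaced)
```

With `by`, the caller picks the step and `seq` determines the length; with `length.out`, it is the other way around.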
### c:
Use the sample function to create a vector with variable name my.dna that consists of 20 uniformly random letters “A”, “C”, “G”, and “T”.
### d:
Use the == logic operator and other R functions on your my.dna variable to determine how many of the letters are “A”. Hint: you can use sum on a TRUE/FALSE vector or you can use the functions which and length.
### e:
Confirm your answer in d with table(my.dna). From the output of table, create a pie chart and barplot. Add x and y labels to your barplot.
### f:
Use the sample function with the option prob=c(.1,.4,.4,.1) to create a vector with variable name my.dna2 that consists of 20 non-uniformly random letters “A”, “C”, “G”, and “T”. Use table to show the nucleotide counts.
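
One possible way to approach items c through f is sketched below. The `set.seed` call, the helper vector `nucleotides`, and the names `n.A`, `n.A.alt`, and `dna.counts` are assumptions added for illustration; only `my.dna` and `my.dna2` come from the assignment text.

```r
set.seed(42)  # assumption: fixed seed only so the sketch is reproducible
nucleotides <- c("A", "C", "G", "T")

# c: 20 letters drawn uniformly; replace = TRUE is needed because
# we draw more letters than there are distinct nucleotides.
my.dna <- sample(nucleotides, size = 20, replace = TRUE)

# d: two equivalent ways to count the "A" entries.
n.A <- sum(my.dna == "A")                 # TRUE counts as 1
n.A.alt <- length(which(my.dna == "A"))   # indices of the matches

# e: table gives the count of every letter that occurs.
dna.counts <- table(my.dna)
print(dna.counts)
pie(dna.counts)
barplot(dna.counts, xlab = "Nucleotide", ylab = "Count")

# f: non-uniform draw; prob lines up with the order of nucleotides.
my.dna2 <- sample(nucleotides, size = 20, replace = TRUE,
                  prob = c(.1, .4, .4, .1))
print(table(my.dna2))
```

Note that `table` only lists letters that actually appear, so a nucleotide with zero draws is absent from the output rather than shown with a count of 0.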
## Part B: NCBI Search
### Setup:
Search NCBI (http://www.ncbi.nlm.nih.gov/) for “Alzheimer human.” This will take you to Entrez gene, which shows you the hits in the NCBI databases. Choose the top hit for Alzheimer under “Gene” information.
### 1.
What is the name of the gene?
### 2.
What chromosome is the gene on?
### 3.
What species has the most similar gene to the human version?
## Part C: Reading fasta files, nucleotide and dinucleotide frequencies
### Setup:
Install and load the seqinr library
Download the fasta file found from Part B
Read the fasta file in as a string
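
The setup steps might look roughly like the sketch below. To stay self-contained it writes a tiny made-up FASTA file to a temporary path instead of using the real download from Part B; the file contents, `fasta.path`, and `fasta.string` are all hypothetical. It uses `seqinr::read.fasta` when the package is installed and falls back to base R otherwise.

```r
# Hypothetical toy FASTA file standing in for the Part B download.
fasta.path <- tempfile(fileext = ".fasta")
writeLines(c(">toy_sequence hypothetical example",
             "ATGGCGTTAC",
             "GGCATCGA"), fasta.path)

# install.packages("seqinr")  # one-time install, left commented out here

if (requireNamespace("seqinr", quietly = TRUE)) {
  # as.string = TRUE returns each sequence as a single string;
  # forceDNAtolower = FALSE keeps the letters in upper case.
  fasta.list <- seqinr::read.fasta(fasta.path, seqtype = "DNA",
                                   as.string = TRUE,
                                   forceDNAtolower = FALSE)
  fasta.string <- as.character(fasta.list[[1]])
} else {
  # Base R fallback: drop the ">" header line and join the rest.
  fasta.lines <- readLines(fasta.path)
  fasta.string <- paste(fasta.lines[!startsWith(fasta.lines, ">")],
                        collapse = "")
}

print(fasta.string)
```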
### 1.
What data type is the fasta?
### 2.
Create a function that converts the fasta string to a vector
### 3.
Using the function from C.2, how long is the sequence?
### 4.
Show the first 20 nucleotides of the sequence
### 5.
How many of each nucleotide are there in the sequence?
### 6.
Create a barplot of the counts, including axes labels
### 7.
Calculate the probability of each nucleotide
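
The questions above can be sketched on a short made-up sequence as follows. The toy string and the names `fasta.to.vector`, `fasta.vec`, `nuc.counts`, and `nuc.probs` are assumptions for illustration; with the real gene the same calls apply to the string read in during setup.

```r
# Hypothetical toy sequence in place of the real FASTA string.
fasta.string <- "ATGGCGTTACGGCATCGA"

# 1: inspect the data type of the string.
print(class(fasta.string))

# 2: split the string into a vector of single characters.
fasta.to.vector <- function(fasta.string) {
  strsplit(fasta.string, split = "")[[1]]
}

# 3 and 4: sequence length and the first 20 nucleotides
# (head returns fewer if the sequence is shorter than 20).
fasta.vec <- fasta.to.vector(fasta.string)
print(length(fasta.vec))
print(head(fasta.vec, 20))

# 5 and 6: counts per nucleotide and a labelled barplot.
nuc.counts <- table(fasta.vec)
print(nuc.counts)
barplot(nuc.counts, xlab = "Nucleotide", ylab = "Count")

# 7: empirical probability is count divided by sequence length.
nuc.probs <- nuc.counts / length(fasta.vec)
print(nuc.probs)
```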
## Part D: GC Content
### 1.
Add code to your R script to calculate the G+C content of the fasta vector
### 2.
How many gc pairs are there?
### 3.
Show a barplot of all dinucleotide counts
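
A base R sketch of Part D is given below, again on a made-up toy sequence. It assumes that "gc pairs" means occurrences of the overlapping dinucleotide "GC"; if the question intends something else, the counting step would change. The names `gc.content`, `dinucs`, `dinuc.counts`, and `n.gc` are illustrative.

```r
# Hypothetical toy sequence as a vector of single nucleotides.
fasta.vec <- strsplit("ATGGCGTTACGGCATCGA", split = "")[[1]]

# 1: G+C content is the fraction of positions that are G or C.
gc.content <- sum(fasta.vec %in% c("G", "C")) / length(fasta.vec)
print(gc.content)

# Overlapping dinucleotides: pair each nucleotide with its successor.
dinucs <- paste0(head(fasta.vec, -1), tail(fasta.vec, -1))

# 2: number of "GC" dinucleotides (assumed meaning of "gc pairs").
n.gc <- sum(dinucs == "GC")
print(n.gc)

# 3: counts of every dinucleotide that occurs, as a labelled barplot.
dinuc.counts <- table(dinucs)
barplot(dinuc.counts, xlab = "Dinucleotide", ylab = "Count", las = 2)
```

The seqinr package also provides `GC` and `count` functions that can be used for the same quantities; the base R version is shown here so each step is visible.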
## Part E: Coronavirus
### Setup
Paper: https://www.ncbi.nlm.nih.gov/pubmed/32015508
DNA/RNA: https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3?report=fasta
Protein: https://www.ncbi.nlm.nih.gov/protein/QHD43415.1?report=fasta
### 1.
Download the DNA/RNA fasta file and determine the nucleotide frequencies. Comment on how the frequencies compare with the human APOE gene.