README and R script skeleton

2022-08-30 14:48:14 -05:00
commit 501bdb1979
3 changed files with 107 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,4 @@
+.Rproj.user
+.Rhistory
+.RData
+.Ruserdata
--- a/README.md
+++ b/README.md
@@ -0,0 +1,65 @@
+# Bioinformatics Lab 1
+
+## Part A - seq function
+
+### a: Create a vector where the first element is 1, the last element is 33, with an increment of 2 between elements.
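A minimal base-R sketch for part a: `seq()` with its `by` argument steps from 1 to 33 in increments of 2. The variable name `part.a` is illustrative only.

```r
# Arithmetic sequence: start at 1, stop at 33, step by 2
part.a <- seq(from = 1, to = 33, by = 2)
part.a
```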
+
+### b: Create a vector with 15 equally spaced elements in which the first element is 7 and the last element is 40. Hint: use ?seq for help and the length.out option.
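A sketch for part b, assuming base R only: when `length.out` is given instead of `by`, `seq()` works out the step itself as (40 - 7) / (15 - 1). The name `part.b` is illustrative.

```r
# 15 equally spaced values from 7 to 40; seq() derives the step size
part.b <- seq(from = 7, to = 40, length.out = 15)
part.b
# Implied step: (40 - 7) / (15 - 1) = 33 / 14, roughly 2.357
```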
+
+### c: Use the sample function to create a vector with variable name my.dna that consists of 20 uniformly random letters “A”, “C”, “G”, and “T”.
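One way to do part c with base R's `sample()`. Without a `prob` argument every letter is equally likely. The `set.seed(1)` call is an added assumption that only makes the draw reproducible; any seed, or none, works.

```r
set.seed(1)  # optional: fixes the random draw so reruns give the same vector
nucleotides <- c("A", "C", "G", "T")
# replace = TRUE is needed because 20 letters are drawn from a pool of only 4
my.dna <- sample(nucleotides, size = 20, replace = TRUE)
my.dna
```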
+
+### d: Use the == logic operator and other R functions on your my.dna variable to determine how many of the letters are “A”. Hint: you can use sum on a TRUE/FALSE vector or you can use the functions which and length. 
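A sketch of both counting routes the hint mentions. It regenerates `my.dna` so the snippet runs on its own (the seed is an assumption); `count.sum` and `count.which` are illustrative names.

```r
set.seed(1)
my.dna <- sample(c("A", "C", "G", "T"), size = 20, replace = TRUE)

is.A <- my.dna == "A"               # logical vector: TRUE where the letter is "A"
count.sum <- sum(is.A)              # TRUE counts as 1, FALSE as 0
count.which <- length(which(is.A))  # positions of the TRUE entries, then count them
c(sum = count.sum, which = count.which)  # the two approaches agree
```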
+
+### e: Confirm your answer to d with table(my.dna). From the output of table, create a pie chart and a barplot. Add x and y labels to your barplot.
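A minimal sketch for part e, regenerating `my.dna` with an assumed seed so it is self-contained. `pie()` and `barplot()` both accept the table directly; the `main` titles are illustrative additions.

```r
set.seed(1)
my.dna <- sample(c("A", "C", "G", "T"), size = 20, replace = TRUE)

dna.counts <- table(my.dna)  # the "A" entry should match the count from part d
dna.counts

pie(dna.counts, main = "Nucleotide composition of my.dna")
barplot(dna.counts, xlab = "Nucleotide", ylab = "Count",
        main = "Nucleotide counts in my.dna")
```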
+
+### f: Use the sample function with the option prob=c(.1,.4,.4,.1) to create a vector with variable name my.dna2 that consists of 20 non-uniformly random letters “A”, “C”, “G”, and “T”. Use table to show the nucleotide counts.
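A sketch for part f: the `prob` weights line up with the letters in order, so C and G are drawn about four times as often as A and T. With only 20 draws the observed counts can still stray from those proportions. The seed is an assumption.

```r
set.seed(1)
# Weights follow the letter order: A = .1, C = .4, G = .4, T = .1
my.dna2 <- sample(c("A", "C", "G", "T"), size = 20, replace = TRUE,
                  prob = c(.1, .4, .4, .1))
table(my.dna2)
```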
+
+## Part B: NCBI Search
+
+### Setup: Search NCBI (http://www.ncbi.nlm.nih.gov/) for “Alzheimer human.” This will take you to Entrez gene, which shows you the hits in the NCBI databases.  Choose the top hit for Alzheimer under “Gene” information. 
+
+### 1. What is the name of the gene?
+
+### 2. What chromosome is the gene on?
+
+### 3. What species has the most similar gene to the human version?
+
+## Part C: Reading fasta files, nucleotide and dinucleotide frequencies
+
+### Setup: 
+	Install and load the seqinr library  
+	Download the fasta file for the gene found in Part B  
+	Read the fasta file in as a string
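The setup calls for seqinr, whose `read.fasta()` accepts `as.string = TRUE`. So that the example runs without the package, the sketch below is a swapped-in base-R reader instead. It writes a made-up single-record FASTA to a temporary file and reads it back. It assumes one record per file, and it lowercases the result to match seqinr's default for DNA. `read_fasta_string` and the file name in the comments are hypothetical.

```r
# Library route from the setup (requires seqinr):
#   install.packages("seqinr")
#   library(seqinr)
#   fasta <- read.fasta("sequence.fasta", as.string = TRUE)  # hypothetical file name

# Base-R sketch: a made-up one-record FASTA written to a temp file
fasta.file <- tempfile(fileext = ".fasta")
writeLines(c(">hypothetical_record example", "ATGCGT", "ACGTTA"), fasta.file)

read_fasta_string <- function(path) {
  lines <- readLines(path)
  seq.lines <- lines[!startsWith(lines, ">")]  # drop the header line(s)
  tolower(paste(seq.lines, collapse = ""))     # join wrapped lines into one lowercase string
}

fasta.string <- read_fasta_string(fasta.file)
fasta.string  # "atgcgtacgtta"
```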
+
+### 1. What data type is the fasta?
+
+### 2. Create a function that converts the fasta string to a vector
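A sketch for C.1 and C.2 using a short made-up sequence in place of the downloaded one. A plain string is a length-1 character vector. If the file is read with seqinr's `read.fasta()`, the result is instead a list with one element per record, so check `class()` on that object too. `strsplit()` does the conversion, and seqinr's `s2c()` is the packaged equivalent. `string_to_vector` is a hypothetical name.

```r
fasta.string <- "atgcgtacgtta"  # made-up stand-in for the downloaded sequence

class(fasta.string)   # "character"
typeof(fasta.string)  # "character"
length(fasta.string)  # 1: the whole sequence is a single element

# Split one string into a vector with one character per element
string_to_vector <- function(s) {
  strsplit(s, split = "")[[1]]
}

fasta.vector <- string_to_vector(fasta.string)
fasta.vector
```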
+
+### 3. Using the function from C.2, how long is the sequence?
+
+### 4. Show the first 20 nucleotides of the sequence
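A sketch for C.3 and C.4. Once the sequence is a character vector, its length is the number of nucleotides, and `head()` returns the leading elements. The 26-letter sequence here is made up.

```r
fasta.vector <- strsplit("atgcgtacgttagcattgcaggctaa", split = "")[[1]]  # made-up sequence

length(fasta.vector)                       # number of nucleotides
head(fasta.vector, 20)                     # same as fasta.vector[1:20] when length >= 20
paste(head(fasta.vector, 20), collapse = "")  # the same 20 letters as one string
```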
+
+### 5. How many of each nucleotide are there in the sequence?
+
+### 6. Create a barplot of the counts, including axes labels
+
+### 7. Calculate the probability of each nucleotide
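A sketch covering C.5 to C.7 on a made-up sequence. `table()` gives the counts, `barplot()` draws them with axis labels, and dividing by the sequence length turns the counts into relative frequencies (`prop.table()` is equivalent). The plot title and variable names are illustrative.

```r
fasta.vector <- strsplit("atgcgtacgttagcattgcaggctaa", split = "")[[1]]  # made-up sequence

nuc.counts <- table(fasta.vector)  # counts of a, c, g, t
nuc.counts

barplot(nuc.counts, xlab = "Nucleotide", ylab = "Count",
        main = "Nucleotide counts")

nuc.probs <- nuc.counts / length(fasta.vector)  # probabilities; these sum to 1
nuc.probs
```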
+
+## Part D: GC Content
+
+### 1. Add code to your R script to calculate the G+C content of the fasta vector
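G+C content is the fraction of positions that are g or c. This base-R sketch uses a made-up sequence. seqinr also ships a `GC()` function that computes this directly.

```r
fasta.vector <- strsplit("atgcgtacgttagcattgcaggctaa", split = "")[[1]]  # made-up sequence

# Fraction of positions holding g or c
gc.content <- sum(fasta.vector %in% c("g", "c")) / length(fasta.vector)
gc.content
```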
+
+### 2. How many gc pairs are there?
+
+### 3. Show a barplot of all dinucleotide counts
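A sketch for D.2 and D.3 that counts overlapping dinucleotides by pairing each position with the next one. It reads "gc pairs" as occurrences of the dinucleotide "gc", which is an interpretation. All 16 pairs are used as factor levels so absent ones appear as zero. The sequence is made up. seqinr's `count(x, wordsize = 2)` is the packaged route.

```r
fasta.vector <- strsplit("atgcgtacgttagcattgcaggctaa", split = "")[[1]]  # made-up sequence
n <- length(fasta.vector)

# Overlapping dinucleotides: position i joined with position i + 1
dinucs <- paste0(fasta.vector[-n], fasta.vector[-1])

# All 16 possible pairs (aa, ac, ..., tt) as levels so zero counts are kept
bases <- c("a", "c", "g", "t")
all.pairs <- paste0(rep(bases, each = 4), bases)
dinuc.counts <- table(factor(dinucs, levels = all.pairs))

dinuc.counts["gc"]  # number of "gc" dinucleotides
barplot(dinuc.counts, xlab = "Dinucleotide", ylab = "Count",
        main = "Dinucleotide counts", las = 2)
```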
+
+## Part E: Coronavirus
+
+### Setup  
+	Paper: https://www.ncbi.nlm.nih.gov/pubmed/32015508  
+	DNA/RNA: https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3?report=fasta  
+	Protein: https://www.ncbi.nlm.nih.gov/protein/QHD43415.1?report=fasta  
+
+
+### 1. Download the DNA/RNA fasta file and determine the nucleotide frequencies. Comment on how the frequencies compare with the human APOE gene.
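A sketch for putting the two sequences' nucleotide frequencies side by side. Both sequences below are short made-up stand-ins, not the real APOE or MN908947.3 data, so the numbers only show the layout. `nuc_freqs` is a hypothetical helper that fixes the column order a, c, g, t so the rows are directly comparable.

```r
# Made-up stand-ins for the APOE fasta (Part B) and the MN908947.3 fasta
apoe.vector  <- strsplit("atgcgtacgttagcattgcaggctaa", split = "")[[1]]
covid.vector <- strsplit("aattttgcaatgtta", split = "")[[1]]

nuc_freqs <- function(v) {
  # Relative frequency of each base in the fixed order a, c, g, t
  counts <- table(factor(v, levels = c("a", "c", "g", "t")))
  as.vector(counts) / length(v)
}

freq.compare <- rbind(APOE = nuc_freqs(apoe.vector),
                      MN908947.3 = nuc_freqs(covid.vector))
colnames(freq.compare) <- c("a", "c", "g", "t")
round(freq.compare, 3)
```

With the real files, reading across each column (for example, whether g and c are higher or lower in one sequence) gives the basis for the comparison the question asks for.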
+
--- a/Schrick-Noah_CS-6643_Lab1.R
+++ b/Schrick-Noah_CS-6643_Lab1.R
@@ -0,0 +1,38 @@
+# Lab 1 for the University of Tulsa's CS-6643 Bioinformatics Course
+# Introduction to R, Online bioinformatics resources, nucleotide frequency statistics
+# Professor: Dr. McKinney, Fall 2022
+# Noah L. Schrick - 1492657
+
+#### Part A: Seq Function
+## a
+
+## b
+
+## c
+
+## d
+
+## e
+
+## f
+
+
+#### Part B: NCBI (no supporting R code for this part)
+
+#### Part C: Reading fasta files, nucleotide and dinucleotide frequencies
+
+## Precursor: Load supporting libraries
+
+## 1
+
+## 2
+
+## 3
+
+#### Part D: GC Content
+
+## 1
+
+#### Part E: Coronavirus
+
+## 1