\documentclass{article}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{spverbatim}
\graphicspath{ {../images/} }
\usepackage[utf8]{inputenc}
\usepackage{float}
\usepackage{indentfirst}
\setlength{\parskip}{\baselineskip}%
\setlength{\belowcaptionskip}{5pt}
\title{QM 7093: Enterprise Data Systems: NoSQL with MongoDB}
\author{Noah L. Schrick}
\date{13 December, 2022}
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{MongoDB}
Because the project assignment permits alternate database approaches, MongoDB was chosen as the database implementation.
While both MongoDB and CouchDB are document-based NoSQL databases, each has different advantages. My primary research interests are in the High-Performance Computing (HPC) space, where MongoDB sees greater usage. MongoDB offers greater scalability and better performance than CouchDB, though it does not prioritize availability in its design the way CouchDB does. MongoDB is also one of the most widely used databases across all models, ranking at position 5 on \url{https://db-engines.com/en/ranking?utm_source=xp&utm_medium=blog&utm_campaign=content}.
\section{Insertions and Queries}
\subsection{Inserting Data}
Insert all records from the provided datasheet with the following properties:
\begin{itemize}
\item{Create all the records with columns StockCode, Description, Quantity, Price, Customer ID and Country (that means you should NOT include invoice and invoice Date in your columns).
}
\item{The code for records without customer ID should NOT have a customer ID column.
}
\item{Create another column ``HighDemand'' but ONLY for records with a Quantity of 12 or more (12 included). In the column put ``Yes''.
}
\end{itemize}
To reduce manual insertions and minimize the risk of human error, the xlsx datasheet was converted to a CSV file. Each text cell (\texttt{Description}, for example) was enclosed in double quotes before the conversion, and a comma was used as the delimiter. Because no \texttt{--collection} option is given, \texttt{mongoimport} names the collection after the input file, here Project\_Data. Saving the data in CSV format allows MongoDB to import it directly with \texttt{mongoimport}:
\begin{spverbatim}
mongoimport --db QM_7093_Final --headerline --file Project_Data.csv --type csv
\end{spverbatim}
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{"../images/mongoimport.png"}
\vspace*{-6mm}
\caption{Part 1.a: Importing from CSV}
\label{fig:import}
\end{figure}
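To illustrate why the text cells were quoted before conversion, the following plain JavaScript sketch splits a CSV line while honouring double quotes. It is not part of the import itself and does not reproduce how \texttt{mongoimport} parses files; the helper \texttt{splitCsvLine} and the sample row are hypothetical and were invented for this illustration.
\begin{spverbatim}
// Hypothetical helper: split one CSV line on commas, honouring double quotes.
function splitCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // doubled quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // unquoted comma ends the field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Invented sample row, not taken from the project datasheet.
const quoted = splitCsvLine('85048,"LIGHTS, SET OF 15",12,6.95,13085,United Kingdom');
const unquoted = '85048,LIGHTS, SET OF 15,12,6.95,13085,United Kingdom'.split(",");
console.log(quoted.length, unquoted.length); // prints: 6 7
\end{spverbatim}
With the description quoted, the row keeps its six fields. Without quotes, the comma inside the description produces a seventh field and shifts every later column.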
Removing Invoice and InvoiceDate from the Project\_Data collection can be performed with:
\begin{spverbatim}
db.Project_Data.updateMany({}, {$unset: { "Invoice": "", "InvoiceDate": ""}} )
\end{spverbatim}
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{"../images/prior_invoice_drop.png"}
\vspace*{-6mm}
\caption{Collection Sample Prior to Removing Invoice and Invoice Date }
\label{fig:prior_i_drop}
\end{figure}
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{"../images/after_invoice_removal.png"}
\vspace*{-6mm}
\caption{Collection Sample After Removing Invoice and Invoice Date }
\label{fig:after_i_drop}
\end{figure}
Removing the CustomerID field where it is empty can be performed with:
\begin{spverbatim}
db.Project_Data.updateMany({"CustomerID" : ""}, { $unset : {"CustomerID" : 1 } } )
\end{spverbatim}
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{"../images/empty_removal.png"}
\vspace*{-6mm}
\caption{Collection Sample After Removing Empty CustomerID fields }
\label{fig:after_ci_drop}
\end{figure}
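The HighDemand field is added only to records with a Quantity of 12 or more. The \texttt{\$\$REMOVE} variable omits the field from every other record, and \texttt{\$out} writes the result back to the Project\_Data collection: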
\begin{spverbatim}
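db.Project_Data.aggregate([
  {
    $addFields: {
      HighDemand: {
        // Quantity of 12 or more (12 included) is marked "Yes";
        // $$REMOVE leaves the field out of all other records.
        $cond: [
          { $gte: [ "$Quantity", 12 ] },
          "Yes",
          "$$REMOVE"
        ]
      }
    }
  },
  {
    // Overwrite the collection with the updated documents.
    $out: "Project_Data"
  }
]).pretty()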
\end{spverbatim}
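As a sanity check on the intended document shape, the three clean-up steps above can be mirrored in plain JavaScript on in-memory objects. This is a minimal sketch that does not touch the database; the function \texttt{cleanRecord} and the two sample records are hypothetical and were invented for this illustration.
\begin{spverbatim}
// Plain-JavaScript mirror of the three clean-up steps; no database is used.
function cleanRecord(record) {
  // Drop Invoice and InvoiceDate (mirrors the first $unset).
  const { Invoice, InvoiceDate, ...cleaned } = record;
  // Drop CustomerID when it is an empty string (mirrors the second $unset).
  if (cleaned.CustomerID === "") {
    delete cleaned.CustomerID;
  }
  // Add HighDemand only when Quantity is 12 or more (mirrors $cond with $$REMOVE).
  if (cleaned.Quantity >= 12) {
    cleaned.HighDemand = "Yes";
  }
  return cleaned;
}

// Invented sample records, not taken from the project datasheet.
const sample = [
  { Invoice: 1, StockCode: "A1", Description: "MUG", Quantity: 12,
    InvoiceDate: "2009-12-01", Price: 2.5, CustomerID: 13085,
    Country: "United Kingdom" },
  { Invoice: 2, StockCode: "B2", Description: "LAMP", Quantity: 3,
    InvoiceDate: "2009-12-01", Price: 9.95, CustomerID: "",
    Country: "France" },
];
const cleanedSample = sample.map(cleanRecord);
console.log(cleanedSample);
\end{spverbatim}
The first record keeps its CustomerID and gains HighDemand because its Quantity is exactly 12. The second record loses its empty CustomerID and receives no HighDemand field.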
\subsection{Queries}
\textbf{Question 1:} How many records have the column “HighDemand”? (Must have a code to answer this, one way to answer this is to have a code that displays all the records except those with the column HighDemand and then subtract the number from total number of records)
\begin{spverbatim}
db.Project_Data.countDocuments({"HighDemand": "Yes"})
\end{spverbatim}
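Question 1 also suggests counting by subtraction. The following plain JavaScript sketch shows, on an in-memory sample rather than the real collection, that a direct count and the subtraction approach give the same result. The sample records are invented for this illustration.
\begin{spverbatim}
// Invented, already-cleaned sample records (not project data).
const records = [
  { StockCode: "A1", Quantity: 12, Price: 2.5, HighDemand: "Yes" },
  { StockCode: "B2", Quantity: 3, Price: 9.95 },
  { StockCode: "C3", Quantity: 48, Price: 0.85, HighDemand: "Yes" },
  { StockCode: "D4", Quantity: 1, Price: 4.0 },
];

// Direct count: records that carry the HighDemand field.
const directCount = records.filter((r) => "HighDemand" in r).length;

// Subtraction: total minus the records that lack the field.
const withoutField = records.filter((r) => !("HighDemand" in r)).length;
const bySubtraction = records.length - withoutField;

console.log(directCount, bySubtraction); // prints: 2 2
\end{spverbatim}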
\textbf{Question 2:} Display the records with price more than 4 (4 excluded)
\begin{spverbatim}
db.Project_Data.find({"Price": {"$gt": 4}}).pretty()
\end{spverbatim}
\section{Metadata}
\end{document}