\documentclass{article}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{spverbatim}
\graphicspath{ {../images/} }
\usepackage[utf8]{inputenc}
\usepackage{float}
\usepackage{indentfirst}
\setlength{\parskip}{\baselineskip}%
\setlength{\belowcaptionskip}{8pt}
\title{QM 7093: Enterprise Data Systems: NoSQL with MongoDB}
\author{Noah L. Schrick}
\date{13 December, 2022}
\begin{document}
\maketitle
\tableofcontents
\newpage
\section{MongoDB}
Because the project assignment was flexible and allowed alternate database approaches, MongoDB was chosen as the database implementation.
While both MongoDB and CouchDB are document-based NoSQL databases, each has different advantages. My primary research focus and interests revolve around the High-Performance Computing (HPC) space, where MongoDB sees greater usage. MongoDB generally offers greater scalability and better performance than CouchDB, though it does not prioritize availability in its design to the extent that CouchDB does. MongoDB is also one of the most widely used databases across all models, ranking fifth on \url{https://db-engines.com/en/ranking?utm_source=xp&utm_medium=blog&utm_campaign=content}.
\section{Insertions and Queries}
\subsection{Inserting Data}
Insert all records from the provided datasheet with the following properties:
\begin{itemize}
\item{Create all the records with the columns StockCode, Description, Quantity, Price, Customer ID, and Country (that is, do NOT include the Invoice and InvoiceDate columns).}
\item{Records without a customer ID should NOT have a Customer ID column.}
\item{Create another column, ``HighDemand'', but ONLY for records with a Quantity of 12 or more. In that column, put ``Yes''.}
\end{itemize}
To reduce the number of manual insertions and minimize the risk of human error, the xlsx datasheet was converted to a CSV file. Each text cell (``Description'', for example) was enclosed in quotes before the conversion, and a comma was used as the delimiter. Saving the data in CSV format allows MongoDB to insert it easily using mongoimport. Because no collection name is given, mongoimport names the collection after the input file, here Project\_Data.
\begin{spverbatim}
mongoimport --db QM_7093_Final --headerline --file Project_Data.csv --type csv
\end{spverbatim}
This import loads every column, including the fields that are removed in the following steps. In real-world or large-dataset applications, this is not ideal: inserting data only to subsequently remove it is inefficient. Pre-processing the data to drop unwanted fields before insertion would be a better solution.
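As a rough sketch of the pre-processing idea, the JavaScript below shapes each parsed CSV row before insertion: it drops Invoice and InvoiceDate, omits CustomerID when the cell is empty, and adds HighDemand for quantities of 12 or more. The helper name prepareRecord and the two sample rows are hypothetical and used only for illustration; parsing the CSV file itself is assumed to happen elsewhere.
\begin{spverbatim}
// Hypothetical helper: shape one parsed CSV row into the document to insert.
// Field names follow the CSV header used in this report.
function prepareRecord(row) {
  // Copy every field except the two that the assignment excludes.
  const { Invoice, InvoiceDate, ...doc } = row;

  // Omit CustomerID entirely when the cell was empty.
  if (doc.CustomerID === "" || doc.CustomerID == null) {
    delete doc.CustomerID;
  }

  // Add HighDemand only for quantities of 12 or more.
  if (Number(doc.Quantity) >= 12) {
    doc.HighDemand = "Yes";
  }
  return doc;
}

// Sample rows with made-up values, for illustration only.
const rows = [
  { Invoice: "A1", StockCode: "S1", Description: "Item one", Quantity: 12,
    InvoiceDate: "2022-01-01", Price: 2.5, CustomerID: 100, Country: "USA" },
  { Invoice: "A2", StockCode: "S2", Description: "Item two", Quantity: 3,
    InvoiceDate: "2022-01-02", Price: 5.0, CustomerID: "", Country: "USA" },
];
const docs = rows.map(prepareRecord);
\end{spverbatim}
In the mongo shell, the resulting docs array could then be passed to db.Project\_Data.insertMany, so that no clean-up updates would be needed after loading.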
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{"../images/mongoimport.png"}
\vspace*{-6mm}
\caption{Part 1.a: Importing from CSV}
\label{fig:import}
\end{figure}
Removing Invoice and InvoiceDate from the Project\_Data collection can be performed with:
\begin{spverbatim}
db.Project_Data.updateMany({}, {$unset: { "Invoice": "", "InvoiceDate": ""}} )
\end{spverbatim}
A sample record prior to removing Invoice and InvoiceDate can be seen in Figure \ref{fig:prior_i_drop}, and the same record after removing the fields can be seen in Figure \ref{fig:after_i_drop}.
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{"../images/prior_invoice_drop.png"}
\vspace*{-6mm}
\caption{Collection Sample Prior to Removing Invoice and InvoiceDate}
\label{fig:prior_i_drop}
\end{figure}
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{"../images/after_invoice_removal.png"}
\vspace*{-6mm}
\caption{Collection Sample After Removing Invoice and InvoiceDate}
\label{fig:after_i_drop}
\end{figure}
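Removing the CustomerID field where it is empty can be performed with the following command; a sample of the result can be seen in Figure \ref{fig:after_ci_drop}.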
\begin{spverbatim}
db.Project_Data.updateMany({"CustomerID" : ""}, { $unset : {"CustomerID" : 1 } } )
\end{spverbatim}
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{"../images/empty_removal.png"}
\vspace*{-6mm}
\caption{Collection Sample After Removing Empty CustomerID Fields}
\label{fig:after_ci_drop}
\end{figure}
The HighDemand field can be added with an aggregation pipeline. The \$cond expression tests whether Quantity is greater than 11, which for integer quantities is equivalent to 12 or more. If the condition is met, the new field is set to ``Yes''; otherwise the expression evaluates to \$\$REMOVE, a built-in aggregation variable that causes \$addFields to omit the field from the document. By default, an aggregation only reads and does not write. Adding a \$out stage tells MongoDB to write the results to a collection, replacing any existing contents; in this case, the results are written back to the Project\_Data collection.
\begin{spverbatim}
db.Project_Data.aggregate([
  {
    $addFields: {
      HighDemand: {
        $cond: [
          { $gt: [ "$Quantity", 11 ] },
          "Yes",
          "$$REMOVE"
        ]
      }
    }
  },
  {
    $out: "Project_Data"
  }
]).pretty()
\end{spverbatim}
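A possible alternative, sketched here under the assumption that Quantity was imported as a numeric field, is to update the matching documents in place with updateMany and \$gte rather than rewriting the whole collection through \$out. This illustrates a different technique and is not the approach used for the results in this report.
\begin{spverbatim}
// Set HighDemand only on documents whose Quantity is 12 or more.
// Documents that do not match the filter are left untouched,
// so they never receive the field.
db.Project_Data.updateMany(
  { Quantity: { $gte: 12 } },
  { $set: { HighDemand: "Yes" } }
)
\end{spverbatim}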
\subsection{Queries}
\textbf{Question 1:} How many records have the column “HighDemand”? (Must have a code to answer this, one way to answer this is to have a code that displays all the records except those with the column HighDemand and then subtract the number from total number of records)
\begin{spverbatim}
db.Project_Data.count({"HighDemand": "Yes"})
\end{spverbatim}
Answer: 8.
A partial image of the records with column HighDemand can be seen in Figure \ref{fig:high-demand}.
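The same count can also be obtained by testing for the presence of the field rather than its value. The sketch below uses countDocuments with \$exists and also shows the subtraction approach suggested in the question; it assumes a mongo shell connected to the QM\_7093\_Final database, and the two expressions would be expected to agree with each other.
\begin{spverbatim}
// Count documents in which the HighDemand field is present.
db.Project_Data.countDocuments({ HighDemand: { $exists: true } })

// Subtraction approach: total minus documents without the field.
db.Project_Data.countDocuments({}) -
  db.Project_Data.countDocuments({ HighDemand: { $exists: false } })
\end{spverbatim}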
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{"../images/high_demand.png"}
\vspace*{-6mm}
\caption{Records with HighDemand}
\label{fig:high-demand}
\end{figure}
\textbf{Question 2:} Display the records with price more than 4 (4 excluded)
\begin{spverbatim}
db.Project_Data.find({"Price": {"$gt": 4}}).pretty()
\end{spverbatim}
Total number of records with a price greater than four: 6. The matching records can be seen in Figure \ref{fig:gt_four}.
\begin{figure}[h!]
\centering
\includegraphics[scale=0.5]{"../images/gt_four.png"}
\vspace*{-3mm}
\caption{Records with Price Greater than Four}
\label{fig:gt_four}
\end{figure}
\clearpage
\section{Metadata}
\begin{spverbatim}
{
"metadata": [
{"key": "InvoiceReceipt", "value": "IN-C123456.pdf"},
{"key": "File size", "value": 32764},
{"key": "MIME type", "value": "application/pdf"},
{"key": "CancellationStatus", "value": "true"},
{"key": "Author", "value": {"LName": "Schrick", "FName": "Noah"}},
{"key": "Security", "value": "false"},
{"key": "Fonts", "value": "Calibri"},
{"key": "URL", "value": ""},
{"key": "RevisionTimestamp", "value": ISODate("2022-12-08T10:01:00Z")},
{"key": "ItemCategory", "value": "furniture"},
{"key": "ItemWeight", "value": "20"},
{"key": "CustomerStanding", "value": "good"},
{"key": "PaymentMethod", "value": "Bank"},
{"key": "CreditCardVendor", "value": ""},
{"key": "Origin", "value": "USA"},
{"key": "Expedited", "value": "false"},
{"key": "HoldStatus", "value": ""},
{"key": "Wholesaler", "value": "true"}
]
}
\end{spverbatim}
\end{document}