Chapter 3 Editing

Noah L. Schrick 2022-04-03 19:02:40 -05:00
parent e6324fc2a3
commit b5e2a19428


@@ -84,7 +84,7 @@ performance benefits of memory operations, since graph computation relies less o
\cite{zhang_boosting_2017}, \cite{ainsworth_graph_2016}, \cite{berry_graph_2007}. The author of \cite{cook_rage_2018} does incorporate PostgreSQL as an initial and final storage mechanism to write the starting and resulting
graph information, but no intermediate storage is otherwise performed.
While the design decision to forgo intermediate storage maximizes performance for graph generation, it introduces a few complications. When generating large graphs, the system runs the risk
of running out of memory. Memory exhaustion typically does not occur when generation is conducted on small graphs, and is especially unlikely when relatively small graphs are generated on an HPC
system with substantial amounts of memory. However, when running on local systems or when the graph is large, memory can quickly be depleted due to state space explosion. The memory depletion is due to two primary
memory consumption points: the frontier, which contains all of the states that still need to be explored, and the graph instance, which holds all of the states and their information,
@@ -115,14 +115,14 @@ performance benefits of memory operations, since graph computation relies less o
and are then removed from memory.
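A minimal sketch of this flush-on-threshold loop is shown below. It is an illustration rather than RAGE's actual implementation: the \textit{State} layout, the footprint estimate, and the helper names are all assumptions made for the example.

\begin{verbatim}
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for RAGE's internal structures (assumed names).
struct State { std::string id; std::string payload; };

// Rough per-state footprint; the real tool would measure this precisely.
std::size_t approx_bytes(const State& s) {
    return sizeof(State) + s.id.size() + s.payload.size();
}

// Placeholder for the PostgreSQL write described above.
void flush_to_database(std::unordered_map<std::string, State>& graph) {
    // ... build and execute the INSERT queries here ...
    graph.clear();  // explored states are removed from memory after storage
}

// Generation loop: once the tracked footprint crosses the user-defined
// limit, the finished states are written out and freed.
void generate(std::deque<State> frontier, std::size_t limit_bytes) {
    std::unordered_map<std::string, State> graph;
    std::size_t used = 0;
    while (!frontier.empty()) {
        State s = std::move(frontier.front());
        frontier.pop_front();
        used += approx_bytes(s);
        // (successors of s would be pushed onto the frontier here)
        graph.emplace(s.id, std::move(s));
        if (used >= limit_bytes) {     // memory threshold reached
            flush_to_database(graph);  // intermediate storage
            used = 0;
        }
    }
    flush_to_database(graph);          // final write
}
\end{verbatim}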
However, a new issue arose with database storage. The original design was to save staging, preparation, and communication cost by writing all the data in one query (that is, writing all of the network states in one query, all the network
state items in one query, and all the edges in one query). While this was the best option in terms of performance, it was not feasible when the amount of data to store was large relative to system memory. Building the SQL query strings themselves quickly began depleting the already constrained memory for large storage
requests. As a result, the storage process would consume too much memory and crash the generator tool. To combat this, each query had to be broken up into multiple smaller queries. As previously mentioned, an extra 10\% buffer was reserved
for the storage process. SQL query strings are now built until they consume the 10\% buffer, at which point they are executed by PostgreSQL, cleared, and the query-building process resumes.
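The chunked query construction can be sketched as follows. The table name, tuple format, and \textit{execute\_sql()} stub are assumptions for illustration; the real tool would issue the corresponding calls through its PostgreSQL interface.

\begin{verbatim}
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Stand-in for the real PostgreSQL call (e.g., PQexec via libpq).
void execute_sql(const std::string& sql) {
    std::cout << "executing " << sql.size() << "-byte query\n";
}

// Append VALUES tuples to one INSERT statement until the query string
// itself fills the reserved buffer, then run it and start a fresh one.
void store_edges(const std::vector<std::string>& tuples,
                 std::size_t buffer_bytes) {
    const std::string head = "INSERT INTO edges (src, dst) VALUES ";
    std::string query = head;
    for (const std::string& t : tuples) {
        if (query.size() > head.size() &&
            query.size() + t.size() + 1 > buffer_bytes) {
            query.back() = ';';   // close the current chunk
            execute_sql(query);   // processed by PostgreSQL
            query = head;         // cleared; building resumes
        }
        query += t;               // e.g. "(1,2)"
        query += ',';
    }
    if (query.size() > head.size()) {
        query.back() = ';';       // terminate the final chunk
        execute_sql(query);
    }
}
\end{verbatim}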
\TUsubsection{Portability} \TUsubsection{Portability}
The intermediate database storage greatly increases the portability of RAGE across various systems while still allowing for performance benefits. Through a user-defined argument, users can safely set
a memory limit that leaves other processes and the host OS room to continue their workloads. While the ``total memory'' component currently utilizes the Linux \textit{sysconf()} function, this choice is not rigid and is easily adjustable. When
working on an HPC cluster, relying on this function could lead to difficulties: multiple users may be working on the same nodes, so RAGE cannot safely claim all of the system memory that \textit{sysconf()} reports. This could be prevented
by using a job scheduler argument such as Slurm's ``--exclusive'' option, but exclusive node access may not be desirable. Instead, a user could pass in the total amount of memory to use (a value that can be reused from a job scheduler's memory allocation
request option), and the intermediate database storage process would function in the same fashion.
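One possible shape for this selection logic is sketched below. The function name and the zero-means-unset convention are assumptions made for the example; \textit{sysconf(\_SC\_PHYS\_PAGES)} and \textit{sysconf(\_SC\_PAGE\_SIZE)} are the standard Linux route to total physical memory.

\begin{verbatim}
#include <cstddef>
#include <unistd.h>

// Memory budget selection: a user-supplied value (for example, reused
// from a Slurm memory request) takes precedence; otherwise fall back to
// the total physical memory reported by sysconf(). The zero-means-unset
// convention is an assumption made for this sketch.
std::size_t memory_budget(std::size_t user_bytes) {
    if (user_bytes != 0)
        return user_bytes;  // scheduler-aware value on shared nodes
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGE_SIZE);
    return static_cast<std::size_t>(pages) *
           static_cast<std::size_t>(page_size);
}
\end{verbatim}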