\RequirePackage{setspace}
\documentclass{article}
\usepackage{boxedminipage}
\usepackage{graphicx} % Images
\graphicspath{ {./images/} }
\usepackage{subcaption} % Captions on subfigures
\usepackage{algorithm} % Math and Big Oh
\usepackage[noend]{algpseudocode}
\usepackage{ifpdf} % Detect PDF or DVI mode
\usepackage{babel} % Bibliography
\usepackage{dsfont} % mathbb
\usepackage[table,xcdraw]{xcolor} % Highlighted cells for tables
\usepackage[hidelinks]{hyperref} % Clickable TOC Links
\hypersetup{
colorlinks,
citecolor=black,
filecolor=black,
linkcolor=black,
urlcolor=black
}
\usepackage[utf8]{inputenc}
\usepackage{float}
\usepackage{indentfirst}
\setlength{\parskip}{\baselineskip}
% Table of Contents/Figure Spacing
\usepackage[titles]{tocloft}
\cftsetindents{figure}{0em}{3.5em}
\cftsetindents{table}{0em}{3.5em}
\title{CS 7863: Network Theory Final Project: Compliance Graph Analysis}
\author{Noah Schrick}
\date{May 3, 2022}
\begin{document}
\maketitle
\tableofcontents
\section{Introduction}
\subsection{Attack Graphs}
To address rising computing risks and cybersecurity threats, experts employ vulnerability analysis modeling to identify weak points in a system or set of systems. One such modeling approach represents the system or set of systems graphically, with system information encoded into the nodes and edges of the graph. This approach was first utilized in the 1990s in a format called attack trees, as seen in the works of the authors of \cite{phillips_graph-based_1998} and \cite{schneier_modeling_1999}. These attack trees would later be expanded into attack graphs.
Attack graphs begin with a root node that contains all the current information of the system or set of systems. From this initial root state, all assets in the system are examined to see if any single modification can be made, where a modification is typically a change in system policy or security settings. If a modification can be made, an edge is drawn from the previous state to a new state that includes all of the previous state's information, but now reflects the change in the system. This edge is labeled to reflect which change was made to the system. This process is exhaustively repeated, where all system properties are examined, all attack options are fully enumerated, all permutations are examined, and all changes to a system are encoded into their own independent states, where these states are then individually analyzed through the process.
\subsection{Compliance Graphs}
Compliance graphs are an alternate form of attack graphs, utilized specifically for examining compliance and regulation statuses of systems. Like attack graphs, compliance graphs can also be used to determine all ways that systems may fall out of compliance or violate regulations. These graphs are notably useful for cyber-physical systems due to the increased need for compliance. As the authors of \cite{j_hale_compliance_nodate}, \cite{baloyi_guidelines_2019}, and \cite{allman_complying_2006} discuss, cyber-physical systems have seen greater usage, especially in areas such as critical infrastructure and Internet of Things.
The semantics of compliance graphs are similar to those of attack graphs, but with a few differences regarding the information at each state. While security and compliance statuses are related, the information that is analyzed in compliance graphs is focused less on certain security properties, and is expanded to also examine administrative policies and properties of systems. Since compliance and regulation are broad and can vary by industry and application, the information to analyze can range from safety regulations to maintenance compliance to any other regulatory compliance. However, the graph structure of compliance graphs is identical to that of attack graphs, where edges represent a modification to the systems, and nodes represent all current information in the system.
\subsection{Difficulties of Compliance Graph Analysis}
Analysis of directed graphs is not as simple as that of their undirected counterparts, and attack and compliance graphs are directed acyclic graphs. The primary contributor to the increased difficulty is the asymmetric adjacency matrix present in directed graphs. With undirected graphs, simplifications can be made in the analysis process both computationally and conceptually. Since the ``in'' degrees are equal to the ``out'' degrees, less work is required both for parsing the adjacency matrix and for determining the importance of nodes. The author of \cite{newman2010networks} discusses that common analysis techniques such as eigenvector centrality are often inapplicable to directed acyclic graphs. As the author of \cite{Mieghem2018DirectedGA} discusses, the difficulty of directed graphs also extends to the graph Laplacian, which is not uniquely defined for asymmetric adjacency matrices: it can be constructed so that either the row sums or the column sums equal zero, but not both. The author of \cite{Mieghem2018DirectedGA} further discusses that directed graphs can lead to complex eigenvalues and to adjacency matrices that cannot be diagonalized. These challenges require different approaches for typical clustering or centrality measures.
\section{Related Works}
The author of \cite{ming_diss} presents three centrality measures that were applied to various attack graphs. The centrality measures implemented were Katz, K-path Edge, and Adapted PageRank. Each of these centrality measures is applicable to the directed format of attack graphs, and conclusions can be drawn regarding patching schemes for preventing exploits. As an approach for avoiding complex eigenvalues, the authors of \cite{Guo2017HermitianAM} present work examining directed, undirected, and mixed graphs using their Hermitian adjacency matrices. Other works, such as that discussed by the author of \cite{Mieghem2018DirectedGA}, include mathematical manipulation of directed graph spectra (originally presented by the author of \cite{Brualdi2010SpectraOD}) with Schur's Theorem to bound eigenvalues and allow for explicit computation, which can then be used for additional analysis metrics.
\section{Experimental Networks} \label{sec:networks}
The work conducted in this approach utilized three compliance graphs, with their properties displayed in Table \ref{table:networks}. Connectivity in this table refers to the mean degree (the number of edges divided by the number of nodes), divided by the number of nodes in the network, and multiplied by 100 to express it as a percentage. Network 1 is a vehicle maintenance network. This network has one car asset that is deemed ``brand new'' and has no mileage. This network is examined at its current state, and progresses through time in time steps of 1 month, up to 12 months total. At each time step the car gains mileage and increases its age property, and is reexamined to evaluate its standing with regard to its vehicular regulatory maintenance schedule. Network 2 is an artificial company network that is attempting to maintain HIPAA compliance \cite{noauthor_health_1996}. This network examines its standing in relation to security properties that are required per HIPAA guidelines, as well as employee cooperation with training and administrative policies. This network is also progressed through time to illustrate the company's standing in relation to yearly audits and trainings that must be followed. Employees are also added to and removed from the network at set points during the time progression process. Network 3 is another artificial company network, which is attempting to maintain PCI DSS compliance \cite{PCI}. The generation of this network was static and did not progress through time. This network examined the company at its current state, and examined all changes that could occur. These changes were primarily tied to security properties such as physical break-ins on the property, firewalls being disabled, default system settings, and encryption expiration.
\begin{table}[]
\centering
\begin{tabular}{|c|c|c|c|}
\hline
\textbf{Network} & \textbf{Nodes} & \textbf{Edges} & \textbf{Connectivity (\%)} \\ \hline
Car & 2491 & 12968 & 0.209 \\ \hline
HIPAA & 2321 & 8063 & 0.150 \\ \hline
PCI DSS & 61 & 163 & 4.381 \\ \hline
\end{tabular}
\caption{Network Properties for the Three Networks Utilized}
\label{table:networks}
\end{table}
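As a worked check of the connectivity definition, let $E$ be the number of edges and $N$ the number of nodes, and take the mean degree to be $E/N$ (the reading that reproduces the values in Table \ref{table:networks}). For the car network this gives
\[
\mathrm{Connectivity} = \frac{E/N}{N} \times 100 = \frac{12968}{2491^{2}} \times 100 \approx 0.209,
\]
and the same calculation yields $8063 / 2321^{2} \times 100 \approx 0.150$ for the HIPAA network and $163 / 61^{2} \times 100 \approx 4.381$ for the PCI DSS network.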
\section{Centralities and their Applications to Compliance Graphs} \label{sec:centralities}
\subsection{Introduction}
The author of \cite{PMID:30064421} provides a survey of centrality measures and discusses how various measures have been introduced to determine node importance in networks. By determining the importance of nodes, various conclusions can be drawn regarding the network. In the case of compliance graphs, conclusions can be drawn regarding the prioritization of patching or correction schemes. If one node is known to lead to the creation of many other nodes, it may be said that a patch is imperative to prevent further opportunities for compliance violation. This work examines five centrality measures and their application to compliance graphs.
\subsection{Degree}
Degree centrality is a trivial, localized measure of node importance based on the number of edges that a node has. In an undirected graph, the degree centrality is predicated solely on the number of edges. However, in the case of a directed graph, a distinction is drawn with a degree centrality oriented on the number of edges coming into a node, and another measure focused on the number of edges leaving a node. Both of these cases provide useful information for compliance graphs. When a node has a large number of other nodes it points to, this node may be prioritized since it creates further opportunity for violation. When a node has a large number of edges pointing to it, this node may be prioritized since the probability that systems may enter this state is higher due to the increased number of ways that a system could lead to this state.
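In matrix terms, and assuming the convention that $A_{ij} = 1$ when an edge points from node \textit{i} to node \textit{j} (the opposite convention swaps the two sums), the two degree centralities of node \textit{i} in a network of \textit{n} nodes can be written as
\[
k_{i}^{\mathrm{out}} = \sum_{j=1}^{n} A_{ij}, \qquad k_{i}^{\mathrm{in}} = \sum_{j=1}^{n} A_{ji}.
\]
For a hypothetical four-node graph with edges $0 \to 1$, $0 \to 2$, $1 \to 3$, and $2 \to 3$, node 0 has $k^{\mathrm{out}} = 2$ and $k^{\mathrm{in}} = 0$, while node 3 has $k^{\mathrm{out}} = 0$ and $k^{\mathrm{in}} = 2$.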
\subsection{Betweenness}\label{sec:between}
Betweenness centrality ranks node importance based on its ability to transfer information flow in a network. For all pairs of nodes in a network, the shortest paths are determined, and a node that lies on such a shortest path is considered to have importance. The total betweenness centrality of a node is based on the number of shortest paths that pass through it. For compliance graphs, the shortest paths are useful to identify the quickest way that systems may fall out of compliance. By prioritizing the nodes that fall on the highest number of shortest paths, correction schemes can be employed to delay or prevent systems from falling out of compliance.
Betweenness centrality is given in Equation \ref{eq:between}, where \textit{i} and \textit{j} are two different, individual nodes in the network, $\sigma_{ij}$ is the total number of shortest paths from \textit{i} to \textit{j}, and $\sigma_{ij}(v)$ is the number of those shortest paths that pass through a node \textit{v}.
\begin{equation}
C_{\mathrm{B}}(v) = \sum_{i \neq j \neq v} \frac{\sigma_{ij}(v)}{\sigma_{ij}}
\label{eq:between}
\end{equation}
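As a small illustration (a hypothetical four-node graph, not one of the networks studied here), consider the directed acyclic graph with edges $0 \to 1$, $0 \to 2$, $1 \to 3$, and $2 \to 3$. The only pair of nodes joined by a shortest path with an intermediate node is $(0, 3)$, which has two shortest paths, $0 \to 1 \to 3$ and $0 \to 2 \to 3$. Applying Equation \ref{eq:between} to node 1, and skipping pairs with no connecting path, gives
\[
\frac{\sigma_{03}(1)}{\sigma_{03}} = \frac{1}{2},
\]
and node 2 receives the same value by symmetry, while nodes 0 and 3 receive 0 because they are never intermediate nodes.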
\subsection{Katz}
Katz centrality was first introduced by the author of \cite{Katz}, and measures the importance of nodes through all paths in a network. Katz centrality differs from betweenness centrality in that it is not limited to solely the shortest path between any two given nodes. The original work by the author defines Katz centrality as seen in Equation \ref{eq:Katz}, where \textit{i} and \textit{j} are nodes in the network, \textit{n} is the total number of nodes in the network, \textit{A} is the adjacency matrix, and $\alpha$ is an attenuation factor with a value between 0 and 1. An entry of \textit{A} has a value of 1 if the corresponding pair of nodes is connected by an edge, and 0 otherwise.
\begin{equation}
C_{\mathrm {Katz} }(i)=\sum _{k=1}^{\infty }\sum _{j=1}^{n}\alpha ^{k}(A^{k})_{ji}
\label{eq:Katz}
\end{equation}
Later works have expanded on the original Katz to include a $\beta$ vector that allows for additional scaling in the instance that prior knowledge of the network exists. The modified equation can be seen in Equation \ref{eq:mod_katz}.
\begin{equation}
\vec{x} = \left(I - \alpha A \right)^{-1}\vec{\beta}
\label{eq:mod_katz}
\end{equation}
For compliance graphs, Katz centrality represents the total number of paths that exist from a given node to any other downstream nodes, and is scaled based on the attenuation factor as well as the prior knowledge vector $\beta$. When the Katz centrality of a given node is high, prioritizing a correction scheme for the node would be useful to prevent opportunity of future compliance violations that may be many steps ahead, but still reachable from the current state.
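As a minimal worked example (a hypothetical three-node chain, and assuming the convention that $A_{ji} = 1$ when an edge points from node \textit{j} to node \textit{i}), consider the graph $0 \to 1 \to 2$. Because the graph is acyclic, $A^{k} = 0$ for $k \geq 3$, so the infinite sum in Equation \ref{eq:Katz} terminates:
\[
C_{\mathrm{Katz}}(0) = 0, \qquad C_{\mathrm{Katz}}(1) = \alpha, \qquad C_{\mathrm{Katz}}(2) = \alpha + \alpha^{2}.
\]
With $\alpha = 0.5$, these values are 0, 0.5, and 0.75. Under this convention the score accumulates the attenuated paths that arrive at a node; with the transposed adjacency matrix the same sum instead accumulates the paths that leave a node, so that node 0 would receive $\alpha + \alpha^{2}$.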
\subsection{K-Path Edge}
K-path edge centrality, as discussed by the authors of \cite{K_Path_Edge}, is predicated on information passing through a network as a means of generalizing K-path centrality. With K-path edge centrality, importance is based on the edges of the network. One difference from betweenness centrality is scope: as discussed in Section \ref{sec:between}, betweenness centrality is global and counts all nodes in the shortest path, whereas K-path edge centrality is localized and is constrained to \textit{K} steps from a given node. Equation \ref{eq:kpe} displays the centrality measure for K-path edge centrality, where \textit{m} is a given edge in the network, \textit{N} is the total number of nodes in the network, $\delta_{n}^{(K)}$ is the number of K-paths from node \textit{n}, and $\delta_{n}^{(K)}(m)$ is the number of K-paths from node \textit{n} that include edge \textit{m}.
\begin{equation}
L^{(K)}(m) = \sum_{n = 1}^{N}\frac{\delta_{n}^{(K)}(m)}{\delta_{n}^{(K)}}
\label{eq:kpe}
\end{equation}
For compliance graphs, K-path edge centrality is useful to identify a short chain of changes that may result in a compliance violation. If a node has a high K-path edge centrality and it is likely that the system will be put into that node, then a series of changes could occur that could then put the system in a different state. Prioritizing nodes that have a high K-path edge centrality could be useful in deterring a short chain of changes that could cripple the system further. It is also useful to prevent states where the system is near a compliance violation.
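As a small worked example of Equation \ref{eq:kpe} (a hypothetical four-node graph, and assuming that a K-path means a path of exactly \textit{K} edges and that source nodes with no K-paths contribute nothing to the sum), consider the edges $0 \to 1$, $0 \to 2$, $1 \to 3$, and $2 \to 3$ with $K = 2$. Node 0 is the only node with 2-paths, namely $0 \to 1 \to 3$ and $0 \to 2 \to 3$, so $\delta_{0}^{(2)} = 2$. Exactly one of these paths includes the edge $m = (1, 3)$, so
\[
L^{(2)}(m) = \frac{\delta_{0}^{(2)}(m)}{\delta_{0}^{(2)}} = \frac{1}{2}.
\]
Each of the other three edges lies on exactly one of the two paths and therefore receives the same value.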
\subsection{Adapted PageRank}
The original PageRank algorithm was first designed by the authors of \cite{PageRank} for the Google prototype for ranking web pages. The authors of \cite{Adapted_PageRank} later introduced an Adapted PageRank that was designed to measure both the number and quality of connections specifically for an urban network. Equation \ref{eq:PR} displays the PageRank algorithm, where $\gamma$ is a damping factor with a value between 0 and 1, \textit{n} is the total number of nodes in the network, \textit{A} is the adjacency matrix of the network, \textit{i} and \textit{j} represent the row and column of the adjacency matrix, \textit{x$_i$} is the PageRank score of node \textit{i}, and \textit{k} is the out degree, computed as a row sum. Since the Adapted PageRank algorithm measures the quality of connections, there is increased application to directed networks such as compliance graphs. As seen in Equation \ref{eq:PR}, the \textit{k$_j$} term is a penalizing factor. Importance is based on the in degree of a node, with a penalty for the out degree. If many nodes point to a given node, then that node is said to be important due to its accessibility.
\begin{equation}
x_i = \frac{1-\gamma}{n} + \gamma\sum_{j = 1}^{n}\frac{A_{ij}}{k_j}x_j
\label{eq:PR}
\end{equation}
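As a minimal worked example of Equation \ref{eq:PR} (a hypothetical three-node chain $0 \to 1 \to 2$, assuming that $A_{ij} = 1$ when an edge points from node \textit{j} to node \textit{i}, that $\gamma = 0.85$, and that a node with out degree 0 contributes nothing to the sum), each node starts from the baseline $(1-\gamma)/n = 0.05$:
\[
x_{0} = 0.05, \qquad x_{1} = 0.05 + 0.85\,\frac{x_{0}}{1} = 0.0925, \qquad x_{2} = 0.05 + 0.85\,\frac{x_{1}}{1} \approx 0.1286.
\]
The score grows along the chain because each node inherits a damped share of the score of the node that points to it.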
The Adapted PageRank algorithm includes additional data that may be present in an urban network, such as geographical position, resource availability, and proximity to facilities. This data is user-defined, and may not be present in the network. Equation \ref{eq:APC} displays the Adapted PageRank algorithm in matrix form, where \textit{D} is the user-defined data matrix, \textit{I} is the identity matrix, and $\mathds{1}$ is a column vector of ones.
\begin{equation}
(I-\gamma A D)\vec{x} = \frac{1-\gamma}{n}\mathds{1}
\label{eq:APC}
\end{equation}
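To relate Equation \ref{eq:APC} to Equation \ref{eq:PR}, note that the sum in Equation \ref{eq:PR} is the \textit{i}-th entry of a matrix-vector product. As a sketch, under the assumption that \textit{D} is chosen as the diagonal matrix of inverse out degrees (only one possible choice for the user-defined data, and defined only for nonzero $k_{j}$),
\[
\sum_{j=1}^{n} \frac{A_{ij}}{k_{j}} x_{j} = \left( A D \vec{x} \right)_{i}, \qquad D = \mathrm{diag}\left( \frac{1}{k_{1}}, \ldots, \frac{1}{k_{n}} \right),
\]
so that stacking Equation \ref{eq:PR} over all nodes gives $\vec{x} = \frac{1-\gamma}{n}\mathds{1} + \gamma A D \vec{x}$, which rearranges to Equation \ref{eq:APC}.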
For compliance graphs, the Adapted PageRank algorithm is useful for two reasons. First, it is able to include user-defined data regarding the network. This could include scaling certain nodes to have greater weight, such as those known to represent a compromised state. Second, since nodes are penalized for pointing to other nodes, this algorithm is useful for determining nodes that are likely to be visited. If a state has a greater in degree, it may need prioritization since the system has a higher likelihood of being placed in this state.
\section{Transitive Closure}
\subsection{Introduction and Application}
The transitive closure of a binary relation is the smallest transitive relation that contains it, and can be used to determine the reachability of a given network. Figure \ref{fig:TC}\footnote{Image origin can be located at: https://commons.wikimedia.org/wiki/File:Transitive-closure.svg, and this image has been licensed under the terms of the GNU Free Documentation License.} displays an example output when performing transitive closure. In the context of compliance graphs, it is useful to consider that an adversary (whether an internal or external malicious actor, poor policy execution by an organization, accidental misuse, or any other adversarial occurrence) could have no time constraints. That is, for any given state of the system or set of systems, an adversary could have ``infinite'' time to perform a series of actions. If no prior knowledge is known about the network, it can be assumed that all changes performed on the systems are equally likely. In practice, specifying a probability that a change can occur has been performed through a Markov Decision Process, such as that seen by the authors of \cite{li_combining_2019} and \cite{zeng_cyber_2017}. Under these assumptions, it is useful to then consider which nodes are important, assuming they have 1-step reachability to any downstream node they may have a transitive connection to. As a result, a transitive closure was identified for all networks described in
Section \ref{sec:networks}, and this transitive closure was then analyzed through the five centrality methods discussed in Section \ref{sec:centralities}. Results and a discussion of the results can be seen in Section \ref{sec:results}.
\begin{figure}[htp]
\includegraphics[width=\linewidth]{"./images/Transitive-closure.png"}
\vspace{.2truein} \centerline{}
\caption{Example of Transitive Closure}
\label{fig:TC}
\end{figure}
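As a minimal worked example (a hypothetical three-node chain $0 \to 1 \to 2$, assuming that $A_{ij} = 1$ when an edge points from node \textit{i} to node \textit{j}), the transitive closure $A^{+}$ can be formed as the Boolean sum of the powers of the adjacency matrix. Since $A^{3} = 0$ for this graph, only two terms are needed:
\[
A^{+} = A \lor A^{2} =
\left( \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array} \right)
\lor
\left( \begin{array}{ccc} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right)
=
\left( \begin{array}{ccc} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array} \right).
\]
The closure adds the edge $0 \to 2$, giving node 0 1-step reachability to every node downstream of it.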
\section{Dominant Tree}
\subsection{Introduction and Application}
Dominance, as initially introduced by the author of \cite{dominance} in terms of flow, describes a node that lies on every path to another node. For instance, if a node \textit{i} is a destination node, and every path to \textit{i} from a source node includes node \textit{j}, then node \textit{j} is said to dominate node \textit{i}. Figure \ref{fig:preDtree} displays an example starting network. With node 1 being the source node, it is evident that node 2 immediately dominates nodes 3, 4, 5, and 6, since every path from node 1 to those nodes must pass through node 2. By definition, each node must also dominate itself, so node 2 also dominates node 2.
Following the properties of dominance, a dominator tree can be derived. In a dominator tree, each node has as children the nodes that it immediately dominates. A node immediately dominates a given node if it strictly dominates that node, but does not strictly dominate any other node that strictly dominates that node. Figure \ref{fig:post-Dtree} displays the dominant tree of the network seen in Figure \ref{fig:preDtree}.
\begin{figure}[htp]
\includegraphics[width=\linewidth]{"./images/pre-Dtree.png"}
\vspace{.2truein} \centerline{}
\caption[]{Example Network for Illustrating Dominance \footnote{Image origin can be located at: https://commons.wikimedia.org/wiki/File:Dominator$\_$control$\_$flow$\_$graph.svg, and this image has been released into the public domain for use for any purpose, unless such conditions are required by law.}}
\label{fig:preDtree}
\end{figure}
\begin{figure}[htp]
\includegraphics[width=\linewidth]{"./images/post-Dtree.png"}
\vspace{.2truein} \centerline{}
\caption[]{Dominant Tree Derived from the Network Displayed in Figure \ref{fig:preDtree} \footnote{Image origin can be located at: https://commons.wikimedia.org/wiki/File:Dominator$\_$tree.svg, and this image has been released into the public domain for use for any purpose, unless such conditions are required by law.}}
\label{fig:post-Dtree}
\end{figure}
Dominant trees do alter the structure of compliance graphs, and lead to leaf nodes and branches that do not exist in the original network. As a result, some nodes that have directed edges to other nodes may be moved to a position where the edge no longer points to the original nodes. However, in dominant trees, all parent nodes dominate their children. In this format, the information flow is guided predominantly by the upstream nodes, and all parents in the dominant tree exist as upstream nodes in the original compliance graph. While some downstream nodes may be altered, the importance of nodes can be reexamined in the dominant tree to see how importance differs when information flow is refined. To this end, dominant trees were identified for all networks described in Section \ref{sec:networks}, and these dominant trees were then analyzed through the five centrality methods discussed in Section \ref{sec:centralities}. Results and a discussion of the results can be seen in Section \ref{sec:results}.
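The dominance relation can also be written as a set equation. As a sketch, using the standard iterative formulation (the symbols $\mathrm{Dom}$ and $\mathrm{pred}$ are notation introduced here for illustration), the dominator sets of a source node $s$ and of any other node $v$ satisfy
\[
\mathrm{Dom}(s) = \{s\}, \qquad \mathrm{Dom}(v) = \{v\} \cup \bigcap_{p \in \mathrm{pred}(v)} \mathrm{Dom}(p),
\]
where $\mathrm{pred}(v)$ is the set of nodes with an edge pointing to $v$. For a hypothetical four-node graph with source node 0 and edges $0 \to 1$, $0 \to 2$, $1 \to 3$, and $2 \to 3$, this gives $\mathrm{Dom}(1) = \{0, 1\}$, $\mathrm{Dom}(2) = \{0, 2\}$, and $\mathrm{Dom}(3) = \{3\} \cup (\{0, 1\} \cap \{0, 2\}) = \{0, 3\}$. Node 0 therefore immediately dominates nodes 1, 2, and 3, and the dominant tree contains the edge $0 \to 3$ in place of the original edges $1 \to 3$ and $2 \to 3$.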
\section{Results and Result Analysis} \label{sec:results}
\subsection{Results}
In this section, only results for the car network are displayed for brevity. These results can be seen in Tables \ref{table:car-deg} through \ref{table:car-betweenness}. For the HIPAA and PCI DSS networks, results can be seen in Appendices \ref{apx:hipaa} and \ref{apx:pci}, respectively.
\begin{table}[]
\centering
\begin{tabular}{|cc|cc|cc|}
\hline
\multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Base}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Transitive Closure}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Dominant Tree}} \\ \hline
\multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} & \multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} & \multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} \\ \hline
\multicolumn{1}{|c|}{314} & 11 & \multicolumn{1}{|c|}{0} & \multicolumn{1}{|c|}{2490} & \multicolumn{1}{|c|}{1} & \multicolumn{1}{|c|}{1246} \\ \hline
\multicolumn{1}{|c|}{346} & 10 & \multicolumn{1}{|c|}{1} & \multicolumn{1}{|c|}{2489} & \multicolumn{1}{|c|}{3} & \multicolumn{1}{|c|}{934} \\ \hline
\multicolumn{1}{|c|}{362} & 10 & \multicolumn{1}{|c|}{3} & \multicolumn{1}{|c|}{2487} & \multicolumn{1}{|c|}{7} & \multicolumn{1}{|c|}{156} \\ \hline
\multicolumn{1}{|c|}{370} & 10 & \multicolumn{1}{|c|}{7} & \multicolumn{1}{|c|}{2479} & \multicolumn{1}{|c|}{42} & \multicolumn{1}{|c|}{115} \\ \hline
\multicolumn{1}{|c|}{374} & 10 & \multicolumn{1}{|c|}{15} & \multicolumn{1}{|c|}{2463} & \multicolumn{1}{|c|}{314} & \multicolumn{1}{|c|}{31} \\ \hline
\multicolumn{1}{|c|}{376} & 10 & \multicolumn{1}{|c|}{27} & \multicolumn{1}{|c|}{2447} & \multicolumn{1}{|c|}{0} & \multicolumn{1}{|c|}{1} \\ \hline
\multicolumn{1}{|c|}{377} & 10 & \multicolumn{1}{|c|}{42} & \multicolumn{1}{|c|}{2431} & \multicolumn{1}{|c|}{15} & \multicolumn{1}{|c|}{1} \\ \hline
\multicolumn{1}{|c|}{378} & 10 & \multicolumn{1}{|c|}{60} & \multicolumn{1}{|c|}{2367} & \multicolumn{1}{|c|}{27} & \multicolumn{1}{|c|}{1} \\ \hline
\multicolumn{1}{|c|}{379} & 10 & \multicolumn{1}{|c|}{87} & \multicolumn{1}{|c|}{2303} & \multicolumn{1}{|c|}{60} & \multicolumn{1}{|c|}{1} \\ \hline
\multicolumn{1}{|c|}{380} & 10 & \multicolumn{1}{|c|}{130} & \multicolumn{1}{|c|}{2239} & \multicolumn{1}{|c|}{87} & \multicolumn{1}{|c|}{1} \\ \hline
\multicolumn{1}{|c|}{381} & 10 & \multicolumn{1}{|c|}{187} & \multicolumn{1}{|c|}{2175} & \multicolumn{1}{|c|}{130} & \multicolumn{1}{|c|}{1} \\ \hline
\multicolumn{1}{|c|}{382} & 10 & \multicolumn{1}{|c|}{250} & \multicolumn{1}{|c|}{2111} & \multicolumn{1}{|c|}{187} & \multicolumn{1}{|c|}{1} \\ \hline
\multicolumn{1}{|c|}{398} & 9 & \multicolumn{1}{|c|}{314} & \multicolumn{1}{|c|}{2047} & \multicolumn{1}{|c|}{250} & \multicolumn{1}{|c|}{1} \\ \hline
\multicolumn{1}{|c|}{406} & 9 & \multicolumn{1}{|c|}{2} & \multicolumn{1}{|c|}{1244} & \multicolumn{1}{|c|}{2} & \multicolumn{1}{|c|}{0} \\ \hline
\multicolumn{1}{|c|}{410} & 9 & \multicolumn{1}{|c|}{4} & \multicolumn{1}{|c|}{1243} & \multicolumn{1}{|c|}{4} & \multicolumn{1}{|c|}{0} \\ \hline
\end{tabular}
\caption{Top 15 Nodes with Degree Centrality}
\label{table:car-deg}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{|cc|cc|cc|}
\hline
\multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Base}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Transitive Closure}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Dominant Tree}} \\ \hline
\multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} & \multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} & \multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} \\ \hline
\multicolumn{1}{|c|}{314} & 0.002459349 & \multicolumn{1}{|c|}{0} & 74.447935 & \multicolumn{1}{|c|}{1} & 0.0542337315 \\ \hline
\multicolumn{1}{|c|}{377} & 0.001870821 & \multicolumn{1}{|c|}{1} & 67.679941 & \multicolumn{1}{|c|}{3} & 0.0385235854 \\ \hline
\multicolumn{1}{|c|}{346} & 0.001870821 & \multicolumn{1}{|c|}{3} & 60.55317 & \multicolumn{1}{|c|}{7} & 0.0066730273 \\ \hline
\multicolumn{1}{|c|}{376} & 0.001870821 & \multicolumn{1}{|c|}{7} & 51.894146 & \multicolumn{1}{|c|}{0} & 0.0058248184 \\ \hline
\multicolumn{1}{|c|}{374} & 0.001870821 & \multicolumn{1}{|c|}{15} & 43.13118 & \multicolumn{1}{|c|}{42} & 0.0050225267 \\ \hline
\multicolumn{1}{|c|}{378} & 0.001870821 & \multicolumn{1}{|c|}{27} & 35.752083 & \multicolumn{1}{|c|}{314} & 0.0016459253 \\ \hline
\multicolumn{1}{|c|}{380} & 0.001870821 & \multicolumn{1}{|c|}{42} & 29.550411 & \multicolumn{1}{|c|}{27} & 0.0009036979 \\ \hline
\multicolumn{1}{|c|}{381} & 0.001870821 & \multicolumn{1}{|c|}{60} & 22.205831 & \multicolumn{1}{|c|}{250} & 0.0005660377 \\ \hline
\multicolumn{1}{|c|}{382} & 0.001870821 & \multicolumn{1}{|c|}{87} & 16.522142 & \multicolumn{1}{|c|}{15} & 0.000491815 \\ \hline
\multicolumn{1}{|c|}{262} & 0.001870821 & \multicolumn{1}{|c|}{130} & 12.155237 & \multicolumn{1}{|c|}{187} & 0.000458049 \\ \hline
\multicolumn{1}{|c|}{370} & 0.001870821 & \multicolumn{1}{|c|}{2} & 10.714534 & \multicolumn{1}{|c|}{130} & 0.0004472501 \\ \hline
\multicolumn{1}{|c|}{379} & 0.001870821 & \multicolumn{1}{|c|}{4} & 9.740485 & \multicolumn{1}{|c|}{87} & 0.0004461702 \\ \hline
\multicolumn{1}{|c|}{418} & 0.001469376 & \multicolumn{1}{|c|}{5} & 9.740485 & \multicolumn{1}{|c|}{60} & 0.0004460622 \\ \hline
\multicolumn{1}{|c|}{459} & 0.001469376 & \multicolumn{1}{|c|}{6} & 9.740485 & \multicolumn{1}{|c|}{2} & 0.0004014452 \\ \hline
\multicolumn{1}{|c|}{467} & 0.001469376 & \multicolumn{1}{|c|}{187} & 8.82693 & \multicolumn{1}{|c|}{4} & 0.0004014452 \\ \hline
\end{tabular}
\caption{Top 15 Nodes with Katz Centrality}
\label{table:car-katz}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{|cc|cc|cc|}
\hline
\multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Base}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Transitive Closure}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Dominant Tree}} \\ \hline
\multicolumn{1}{|c|}{\textbf{Node}} & \multicolumn{1}{|c|}{\textbf{Value}} & \multicolumn{1}{|c|}{\textbf{Node}} & \multicolumn{1}{|c|}{\textbf{Value}} & \multicolumn{1}{|c|}{\textbf{Node}} & \multicolumn{1}{|c|}{\textbf{Value}} \\ \hline
\multicolumn{1}{|c|}{314} & \multicolumn{1}{|c|}{231} & \multicolumn{1}{|c|}{0} & \multicolumn{1}{|c|}{2490} & \multicolumn{1}{|c|}{1} & \multicolumn{1}{|c|}{2336} \\ \hline
\multicolumn{1}{|c|}{346} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{1} & \multicolumn{1}{|c|}{1489} & \multicolumn{1}{|c|}{0} & \multicolumn{1}{|c|}{2181} \\ \hline
\multicolumn{1}{|c|}{362} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{3} & \multicolumn{1}{|c|}{2487} & \multicolumn{1}{|c|}{3} & \multicolumn{1}{|c|}{1091} \\ \hline
\multicolumn{1}{|c|}{370} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{7} & \multicolumn{1}{|c|}{2479} & \multicolumn{1}{|c|}{7} & \multicolumn{1}{|c|}{158} \\ \hline
\multicolumn{1}{|c|}{374} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{15} & \multicolumn{1}{|c|}{2463} & \multicolumn{1}{|c|}{15} & \multicolumn{1}{|c|}{117} \\ \hline
\multicolumn{1}{|c|}{376} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{27} & \multicolumn{1}{|c|}{2447} & \multicolumn{1}{|c|}{27} & \multicolumn{1}{|c|}{117} \\ \hline
\multicolumn{1}{|c|}{377} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{42} & \multicolumn{1}{|c|}{2431} & \multicolumn{1}{|c|}{42} & \multicolumn{1}{|c|}{117} \\ \hline
\multicolumn{1}{|c|}{378} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{60} & \multicolumn{1}{|c|}{2367} & \multicolumn{1}{|c|}{187} & \multicolumn{1}{|c|}{33} \\ \hline
\multicolumn{1}{|c|}{379} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{87} & \multicolumn{1}{|c|}{2303} & \multicolumn{1}{|c|}{250} & \multicolumn{1}{|c|}{32} \\ \hline
\multicolumn{1}{|c|}{380} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{130} & \multicolumn{1}{|c|}{2239} & \multicolumn{1}{|c|}{314} & \multicolumn{1}{|c|}{31} \\ \hline
\multicolumn{1}{|c|}{381} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{187} & \multicolumn{1}{|c|}{2175} & \multicolumn{1}{|c|}{60} & \multicolumn{1}{|c|}{3} \\ \hline
\multicolumn{1}{|c|}{382} & \multicolumn{1}{|c|}{175} & \multicolumn{1}{|c|}{250} & \multicolumn{1}{|c|}{2111} & \multicolumn{1}{|c|}{86} & \multicolumn{1}{|c|}{3} \\ \hline
\multicolumn{1}{|c|}{398} & \multicolumn{1}{|c|}{129} & \multicolumn{1}{|c|}{314} & \multicolumn{1}{|c|}{2047} & \multicolumn{1}{|c|}{130} & \multicolumn{1}{|c|}{3} \\ \hline
\multicolumn{1}{|c|}{406} & \multicolumn{1}{|c|}{129} & \multicolumn{1}{|c|}{2} & \multicolumn{1}{|c|}{1244} & \multicolumn{1}{|c|}{2} & \multicolumn{1}{|c|}{0} \\ \hline
\multicolumn{1}{|c|}{410} & \multicolumn{1}{|c|}{129} & \multicolumn{1}{|c|}{4} & \multicolumn{1}{|c|}{1243} & \multicolumn{1}{|c|}{4} & \multicolumn{1}{|c|}{0} \\ \hline
\end{tabular}
\caption{Top 15 Nodes with K-path Edge Centrality}
\label{table:car-kpe}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{|cc|cc|cc|}
\hline
\multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Base}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Transitive Closure}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Dominant Tree}} \\ \hline
\multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} & \multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} & \multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} \\ \hline
\multicolumn{1}{|c|}{2490} & 0.0827 & \multicolumn{1}{|c|}{2490} & 0.1992 & \multicolumn{1}{|c|}{314} & 0.001655 \\ \hline
\multicolumn{1}{|c|}{1004} & 0.01506 & \multicolumn{1}{|c|}{2479} & 0.0158 & \multicolumn{1}{|c|}{250} & 0.001479 \\ \hline
\multicolumn{1}{|c|}{1467} & 0.00969 & \multicolumn{1}{|c|}{2480} & 0.0158 & \multicolumn{1}{|c|}{187} & 0.001272 \\ \hline
\multicolumn{1}{|c|}{2479} & 0.00948 & \multicolumn{1}{|c|}{2481} & 0.0158 & \multicolumn{1}{|c|}{130} & 0.001028 \\ \hline
\multicolumn{1}{|c|}{2480} & 0.00948 & \multicolumn{1}{|c|}{2482} & 0.0158 & \multicolumn{1}{|c|}{42} & 0.001025 \\ \hline
\multicolumn{1}{|c|}{2481} & 0.00948 & \multicolumn{1}{|c|}{2483} & 0.0158 & \multicolumn{1}{|c|}{87} & 0.00074 \\ \hline
\multicolumn{1}{|c|}{2482} & 0.00948 & \multicolumn{1}{|c|}{2484} & 0.014 & \multicolumn{1}{|c|}{27} & 0.00074 \\ \hline
\multicolumn{1}{|c|}{2483} & 0.00948 & \multicolumn{1}{|c|}{2485} & 0.014 & \multicolumn{1}{|c|}{1} & 0.00074 \\ \hline
\multicolumn{1}{|c|}{667} & 0.00919 & \multicolumn{1}{|c|}{2486} & 0.0139 & \multicolumn{1}{|c|}{378} & 0.00044 \\ \hline
\multicolumn{1}{|c|}{2484} & 0.0083 & \multicolumn{1}{|c|}{2487} & 0.0139 & \multicolumn{1}{|c|}{379} & 0.00044 \\ \hline
\multicolumn{1}{|c|}{2485} & 0.0083 & \multicolumn{1}{|c|}{2488} & 0.0139 & \multicolumn{1}{|c|}{380} & 0.00044 \\ \hline
\multicolumn{1}{|c|}{2486} & 0.0083 & \multicolumn{1}{|c|}{2489} & 0.0139 & \multicolumn{1}{|c|}{381} & 0.00044 \\ \hline
\multicolumn{1}{|c|}{2487} & 0.0083 & \multicolumn{1}{|c|}{2424} & 0.0029 & \multicolumn{1}{|c|}{382} & 0.00044 \\ \hline
\multicolumn{1}{|c|}{2488} & 0.0083 & \multicolumn{1}{|c|}{2425} & 0.0029 & \multicolumn{1}{|c|}{470} & 0.00044 \\ \hline
\multicolumn{1}{|c|}{2489} & 0.0083 & \multicolumn{1}{|c|}{2426} & 0.0029 & \multicolumn{1}{|c|}{471} & 0.00044 \\ \hline
\end{tabular}
\caption{Top 15 Nodes with PageRank Centrality}
\label{table:car-APC}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{|cc|cc|cc|}
\hline
\multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Base}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Transitive Closure}} & \multicolumn{2}{|c|}{\cellcolor[HTML]{FFFF00}\textbf{Dominant Tree}} \\ \hline
\multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} & \multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} & \multicolumn{1}{|c|}{\textbf{Node}} & \textbf{Value} \\ \hline
\multicolumn{1}{|c|}{42} & 9067.205 & \multicolumn{1}{|c|}{0} & 0 & \multicolumn{1}{|c|}{1} & 2489 \\ \hline
\multicolumn{1}{|c|}{27} & 8442.166 & \multicolumn{1}{|c|}{1} & 0 & \multicolumn{1}{|c|}{3} & 2486 \\ \hline
\multicolumn{1}{|c|}{60} & 8279.62 & \multicolumn{1}{|c|}{2} & 0 & \multicolumn{1}{|c|}{7} & 927 \\ \hline
\multicolumn{1}{|c|}{87} & 7580.359 & \multicolumn{1}{|c|}{3} & 0 & \multicolumn{1}{|c|}{42} & 906 \\ \hline
\multicolumn{1}{|c|}{15} & 7578.523 & \multicolumn{1}{|c|}{4} & 0 & \multicolumn{1}{|c|}{27} & 760 \\ \hline
\multicolumn{1}{|c|}{130} & 6868.21 & \multicolumn{1}{|c|}{5} & 0 & \multicolumn{1}{|c|}{15} & 612 \\ \hline
\multicolumn{1}{|c|}{7} & 6482.031 & \multicolumn{1}{|c|}{6} & 0 & \multicolumn{1}{|c|}{314} & 372 \\ \hline
\multicolumn{1}{|c|}{187} & 6111.862 & \multicolumn{1}{|c|}{7} & 0 & \multicolumn{1}{|c|}{250} & 352 \\ \hline
\multicolumn{1}{|c|}{50} & 5950.928 & \multicolumn{1}{|c|}{8} & 0 & \multicolumn{1}{|c|}{187} & 330 \\ \hline
\multicolumn{1}{|c|}{70} & 5822.054 & \multicolumn{1}{|c|}{9} & 0 & \multicolumn{1}{|c|}{130} & 306 \\ \hline
\multicolumn{1}{|c|}{104} & 5683.944 & \multicolumn{1}{|c|}{10} & 0 & \multicolumn{1}{|c|}{87} & 280 \\ \hline
\multicolumn{1}{|c|}{156} & 5474.525 & \multicolumn{1}{|c|}{11} & 0 & \multicolumn{1}{|c|}{60} & 252 \\ \hline
\multicolumn{1}{|c|}{1467} & 5299.985 & \multicolumn{1}{|c|}{12} & 0 & \multicolumn{1}{|c|}{0} & 0 \\ \hline
\multicolumn{1}{|c|}{250} & 5296.964 & \multicolumn{1}{|c|}{13} & 0 & \multicolumn{1}{|c|}{2} & 0 \\ \hline
\multicolumn{1}{|c|}{115} & 5196.398 & \multicolumn{1}{|c|}{14} & 0 & \multicolumn{1}{|c|}{4} & 0 \\ \hline
\end{tabular}
\caption{Top 15 Nodes with Betweenness Centrality}
\label{table:car-betweenness}
\end{table}
\subsection{Result Analysis}
When viewing the results for the car network, each centrality method unsurprisingly ranks nodes in a different order. These differences in rankings can be combined with additional metrics, such as severity, cost, or disturbance of systems, to identify the correction schemes best suited for a given network. However, the degree centrality and K-path edge centrality rankings for the top 15 nodes were identical for the car network. This also holds for the HIPAA network, as seen in Appendix \ref{apx:hipaa}, but not for the PCI DSS network. The value of \textit{k} in K-path edge centrality was set to 3. Given a \textit{k} value that is small relative to the overall size of the car and HIPAA networks, coupled with the high degree counts of the top 15 nodes ranked by degree centrality, it is likely that a high degree count correlates with a high K-path edge centrality score. This reasoning is consistent with the PCI DSS network, where the two rankings differ: that network is substantially smaller and has a greater connectivity percentage, so a \textit{k} value of 3 is no longer small relative to the network.
In the transitive closure representation of the compliance graphs, the centrality rankings vary greatly from those of the original compliance graphs. As expected, however, either the root or the leaf node has the highest centrality value. Since the root node can reach all nodes, and the leaf node can be reached by all nodes, these two nodes are expected to rank highly. What is unexpected is that the top 15 rankings are not composed solely of the 15 most upstream nodes or the 15 most downstream nodes. While rankings do tend to favor more upstream nodes for K-path edge, Katz, and degree centralities, nodes in the 100s, 200s, and 300s all make appearances. Betweenness centrality for the transitive closure representation yielded no valuable insight, since every node reachable from a given node is reachable in one step, so no shortest path passes through an intermediate node.
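
The zero betweenness values follow directly from the standard definition of betweenness centrality. As a sketch, let $\sigma_{st}$ denote the number of shortest paths from node $s$ to node $t$, and let $\sigma_{st}(v)$ denote the number of those paths that pass through node $v$ (these symbols are assumed here for illustration). Summing over the pairs for which $t$ is reachable from $s$, the betweenness of $v$ is
\[
C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}.
\]
In a transitive closure, the edge $(s,t)$ exists whenever $t$ is reachable from $s$, so the only shortest path from $s$ to $t$ is that single edge and $\sigma_{st} = 1$. This path contains no intermediate node, so $\sigma_{st}(v) = 0$ for every $v$ distinct from $s$ and $t$, and therefore
\[
C_B(v) = \sum_{s \neq v \neq t} \frac{0}{1} = 0
\]
for every node, which matches the transitive closure column of Table \ref{table:car-betweenness}.
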
For the dominant tree representation, it was initially hypothesized that nodes ranked highly by betweenness centrality or Katz centrality in the original compliance graph would closely match the dominant tree results. However, the dominant tree rankings also vary greatly from the original compliance graph's rankings. Even nodes that did not appear in the top 15 of the base compliance graph or the transitive closure representation appeared in the dominant tree results. Since the dominant tree format favors the upstream nodes due to a lesser reordering effect caused by dominance, the PageRank ordering was not dominated by downstream nodes as in the base graph. Instead, it consisted of upstream nodes, several of them in the 300s.
\section{Conclusions and Future Work}
\subsection{Conclusions}
Each centrality measure implemented in this work provides different information that is useful for identifying correction schemes based on a network science approach. The results from the centrality methods differ, and for each network the preferred rankings can be chosen based on prior knowledge of the network and the overhead of implementing correction measures. In addition, transitive closure representations and dominant trees were derived from the original compliance graphs, and unique rankings were identified. Transitive closure rankings are useful for determining which nodes are most important when an adversary is assumed to have unlimited time and resources to make changes to the original system. Dominant tree rankings are useful for determining which nodes are most important from an information flow perspective, where adversarial actions must pass through a series of nodes to reach any other node in the network. By applying correction schemes to the bottlenecks of the network, it may be possible to eliminate branches of the dominant tree entirely, leading to a removal of nodes in the original compliance graph.
\subsection{Future Work}
Based on the results of this work, there is ample room to continue investigating centrality methods for compliance graphs. With three compliance graphs generated for three different networks, along with various node importance rankings, it would be useful to artificially implement correction schemes based on the rankings to see their effects on the compliance graph. Likewise, by supplying a user-defined data matrix to centrality methods such as PageRank, further research could examine how node importance varies based on user-defined metrics. Edge weights could also be assigned to the original compliance graphs to represent the probability that a given change in the network could occur. Edge weights would be reflected in the adjacency matrices of the graphs, and centrality methods could be reexamined to determine node importance when probabilities are given. Transitive closures and dominant trees derived from the compliance graphs present a new approach for examining compliance graphs. Further research can be conducted to determine the effects of correction schemes when employed on nodes ranked highly in their respective centrality measures.
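
The edge weighting proposed above could be formalized in several ways. As one possible formulation, assumed here purely for illustration and not implemented in this work, let $E$ be the edge set of a compliance graph and let $p_{ij} \in [0,1]$ be the probability that the change represented by edge $(i,j)$ occurs. A weighted adjacency matrix $W$ could then be defined as
\[
W_{ij} = \left\{
\begin{array}{ll}
p_{ij} & \mbox{if } (i,j) \in E, \\
0 & \mbox{otherwise,}
\end{array}
\right.
\]
so that setting $p_{ij} = 1$ for every edge recovers the unweighted adjacency matrix. Weighted variants of the centrality measures would then operate on $W$ in place of the unweighted matrix.
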
\clearpage
\addcontentsline{toc}{section}{Bibliography}
\bibliography{Bibliography}
\bibliographystyle{ieeetr}
\include{Appendices}
\end{document}