Conclusions and Future Work

Noah L. Schrick 2023-04-23 17:48:28 -05:00
parent 52d960530e
commit b8fa5cfd3c
7 changed files with 87 additions and 47 deletions

@ -1,3 +1,19 @@
@inproceedings{CR-Simple,
author = {Nosayba El{-}Sayed and
Bianca Schroeder},
title = {Checkpoint/restart in practice: When 'simple is better'},
booktitle = {2014 {IEEE} International Conference on Cluster Computing, {CLUSTER}
2014, Madrid, Spain, September 22-26, 2014},
pages = {84--92},
publisher = {{IEEE} Computer Society},
year = {2014},
url = {https://doi.org/10.1109/CLUSTER.2014.6968777},
doi = {10.1109/CLUSTER.2014.6968777},
timestamp = {Thu, 23 Mar 2023 23:59:40 +0100},
biburl = {https://dblp.org/rec/conf/cluster/El-SayedS14.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@book{hursey2010coordinated,
title={Coordinated checkpoint/restart process fault tolerance for MPI applications on HPC systems},
author={Hursey, Joshua},

@ -51,6 +51,11 @@
\newlabel{sec:mem-constraint}{{\mbox {III-A}1}{2}{Memory Constraint Difficulties}{subsubsection.3.1.1}{}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {III-A}1}Memory Constraint Difficulties}{2}{subsubsection.3.1.1}\protected@file@percent }
\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {III-A}2}Implementation}{2}{subsubsection.3.1.2}\protected@file@percent }
\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {III-A}3}Portability}{3}{subsubsection.3.1.3}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {III-B}}Restarting}{3}{subsection.3.2}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {IV}Results}{3}{section.4}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {V}Conclusions and Future Work}{3}{section.5}\protected@file@percent }
\citation{CR-Simple}
\bibdata{Bibliography}
\bibcite{schneier_modeling_1999}{1}
\bibcite{j_hale_compliance_nodate}{2}
@ -63,14 +68,11 @@
\bibcite{hursey2010coordinated}{9}
\bibcite{SCR}{10}
\bibcite{dmtcp}{11}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {III-A}3}Portability}{3}{subsubsection.3.1.3}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {III-B}}Restarting}{3}{subsection.3.2}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {IV}Results}{3}{section.4}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {V}Conclusions and Future Work}{3}{section.5}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{References}{3}{section*.1}\protected@file@percent }
\bibcite{BLCR}{12}
\bibcite{cook_scalable_2016}{13}
\bibcite{li_concurrency_2019}{14}
\bibcite{li_combining_2019}{15}
\bibcite{CR-Simple}{16}
\bibstyle{ieeetr}
\@writefile{toc}{\contentsline {section}{References}{4}{section*.1}\protected@file@percent }
\gdef \@abspage@last{4}

@ -73,4 +73,10 @@ M.~Li, P.~Hawrylak, and J.~Hale, ``Combining {OpenCL} and {MPI} to support
heterogeneous computing on a cluster,'' {\em ACM International Conference
Proceeding Series}, 2019.
\bibitem{CR-Simple}
N.~El{-}Sayed and B.~Schroeder, ``Checkpoint/restart in practice: When 'simple
is better','' in {\em 2014 {IEEE} International Conference on Cluster
Computing, {CLUSTER} 2014, Madrid, Spain, September 22-26, 2014}, pp.~84--92,
{IEEE} Computer Society, 2014.
\end{thebibliography}

@ -4,45 +4,45 @@ The top-level auxiliary file: Schrick-Noah_AG-CG-CR.aux
The style file: ieeetr.bst
Database file #1: Bibliography.bib
Warning--empty journal in BLCR
You've used 15 entries,
You've used 16 entries,
1876 wiz_defined-function locations,
556 strings with 6253 characters,
and the built_in function-call counts, 3258 in all, are:
= -- 298
> -- 150
564 strings with 6509 characters,
and the built_in function-call counts, 3570 in all, are:
= -- 330
> -- 158
< -- 0
+ -- 56
- -- 41
* -- 211
:= -- 489
add.period$ -- 18
call.type$ -- 15
change.case$ -- 13
+ -- 59
- -- 43
* -- 230
:= -- 528
add.period$ -- 19
call.type$ -- 16
change.case$ -- 14
chr.to.int$ -- 0
cite$ -- 16
duplicate$ -- 168
empty$ -- 330
format.name$ -- 41
if$ -- 786
cite$ -- 17
duplicate$ -- 187
empty$ -- 364
format.name$ -- 43
if$ -- 865
int.to.chr$ -- 0
int.to.str$ -- 15
missing$ -- 13
newline$ -- 52
num.names$ -- 15
pop$ -- 69
int.to.str$ -- 16
missing$ -- 14
newline$ -- 55
num.names$ -- 16
pop$ -- 75
preamble$ -- 1
purify$ -- 0
quote$ -- 0
skip$ -- 87
skip$ -- 100
stack$ -- 0
substring$ -- 147
swap$ -- 50
substring$ -- 171
swap$ -- 57
text.length$ -- 0
text.prefix$ -- 0
top$ -- 0
type$ -- 0
warning$ -- 1
while$ -- 28
width$ -- 17
write$ -- 131
while$ -- 32
width$ -- 18
write$ -- 141
(There was 1 warning)

@ -1,4 +1,4 @@
This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023/Arch Linux) (preloaded format=pdflatex 2023.4.3) 23 APR 2023 17:00
This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023/Arch Linux) (preloaded format=pdflatex 2023.4.3) 23 APR 2023 17:48
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
@ -536,6 +536,21 @@ Underfull \hbox (badness 4660) in paragraph at lines 125--130
ess is greatly
[]
Underfull \hbox (badness 1622) in paragraph at lines 137--138
\OT1/ptm/m/n/10 function calls or snapshots that are required. The C/R
[]
[3]
Underfull \hbox (badness 2150) in paragraph at lines 139--140
\OT1/ptm/m/n/10 checkpoint times and sizes, as well as time taken to
[]
Underfull \hbox (badness 1565) in paragraph at lines 139--140
\OT1/ptm/m/n/10 settings to alter or enable, or communication strategies
[]
(./Schrick-Noah_AG-CG-CR.bbl
Underfull \hbox (badness 2351) in paragraph at lines 27--30
[]\OT1/ptm/m/n/8 S. Ainsworth and T. M. Jones, ``Graph prefetching using data
@ -552,7 +567,7 @@ Underfull \hbox (badness 5091) in paragraph at lines 54--56
[]\OT1/ptm/m/n/8 J. Ansel, K. Arya, and G. Cooperman, ``Dmtcp: Transparent
[]
[3])
)
** Conference Paper **
Before submitting the final camera ready copy, remember to:
@ -564,20 +579,18 @@ Before submitting the final camera ready copy, remember to:
uses only Type 1 fonts and that every step in the generation
process uses the appropriate paper size.
[4
] (./Schrick-Noah_AG-CG-CR.aux)
[4] (./Schrick-Noah_AG-CG-CR.aux)
Package rerunfilecheck Info: File `Schrick-Noah_AG-CG-CR.out' has not changed.
(rerunfilecheck) Checksum: CC85FF3DB94FE8393E2ED734D36908F3;1379.
)
Here is how much of TeX's memory you used:
12042 strings out of 476025
190409 string characters out of 5796533
12044 strings out of 476025
190434 string characters out of 5796533
1871388 words of memory out of 5000000
32305 multiletter control sequences out of 15000+600000
32306 multiletter control sequences out of 15000+600000
544489 words of font info for 89 fonts, out of 8000000 for 9000
1141 hyphenation exceptions out of 8191
75i,8n,76p,1314b,588s stack positions out of 5000i,500n,10000p,200000b,80000s
75i,8n,76p,1314b,453s stack positions out of 5000i,500n,10000p,200000b,80000s
</usr/share/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/share/
texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb></usr/share/texmf-dist/font
s/type1/public/amsfonts/cm/cmr7.pfb></usr/share/texmf-dist/fonts/type1/public/a
@ -585,10 +598,10 @@ msfonts/cm/cmsy10.pfb></usr/share/texmf-dist/fonts/type1/urw/times/utmb8a.pfb><
/usr/share/texmf-dist/fonts/type1/urw/times/utmbi8a.pfb></usr/share/texmf-dist/
fonts/type1/urw/times/utmr8a.pfb></usr/share/texmf-dist/fonts/type1/urw/times/u
tmri8a.pfb>
Output written on Schrick-Noah_AG-CG-CR.pdf (4 pages, 133124 bytes).
Output written on Schrick-Noah_AG-CG-CR.pdf (4 pages, 135615 bytes).
PDF statistics:
163 PDF objects out of 1000 (max. 8388607)
137 compressed objects within 2 object streams
32 named destinations out of 1000 (max. 500000)
165 PDF objects out of 1000 (max. 8388607)
139 compressed objects within 2 object streams
33 named destinations out of 1000 (max. 500000)
94 words of extra memory for PDF output out of 10000 (max. 10000000)

Binary file not shown.

@ -129,11 +129,14 @@ Previous works with RAGE have been designed around maximizing performance to lim
request option), and the checkpointing process would function in the same fashion. Since PostgreSQL is used for the checkpointing, no file system dependencies are necessary for the cluster.
\subsection{Restarting}
Restarting attack and compliance graph generation requires only a limited set of information. To restart properly, the generator first pulls the unexplored queue (``frontier'') database table into memory. Once the frontier is loaded, the generator needs to know the integer at which to begin tagging new states, which it determines by checking the ID of the latest state in the frontier. Since the instance has already been explored, it does not need to be retrieved from disk. In addition, because the frontier is a queue of unexplored nodes, no edge information exists for them yet, and none is required. At this point, the generator can pop the first node from the queue and resume the generation process.
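The restart steps described above can be sketched as follows. This is a minimal illustration, not RAGE's actual code: it uses Python's built-in \texttt{sqlite3} module as a stand-in for PostgreSQL so that it runs without a database server, and the table name \texttt{frontier}, the columns \texttt{state\_id} and \texttt{payload}, and all function names are hypothetical.

```python
import sqlite3
from collections import deque


def load_frontier(conn):
    """Pull the unexplored queue (hypothetical 'frontier' table) into memory."""
    rows = conn.execute(
        "SELECT state_id, payload FROM frontier ORDER BY state_id"
    ).fetchall()
    return deque(rows)


def next_state_id(frontier):
    """New states are tagged starting just after the latest ID in the frontier."""
    if not frontier:
        raise ValueError("frontier is empty; nothing to resume")
    return max(state_id for state_id, _ in frontier) + 1


def restart(conn):
    """Load the frontier, find the next tag, and pop the first unexplored state."""
    frontier = load_frontier(conn)
    next_id = next_state_id(frontier)
    current = frontier.popleft()  # generation resumes from this state
    return current, frontier, next_id


def demo():
    # sqlite3 in-memory database used only as a stand-in for PostgreSQL.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE frontier (state_id INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO frontier VALUES (?, ?)",
        [(41, "state-a"), (42, "state-b"), (43, "state-c")],
    )
    return restart(conn)
```

Note that neither explored states nor edges are read back in this sketch, mirroring the point that only the frontier and the next state ID are needed to resume.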
\section{Results}
\section{Conclusions and Future Work}
This work presents an application-level approach to C/R. The approach is built into RAGE itself and has no dependencies on C/R libraries. It also needs no support from the operating system, allowing for fault tolerance on HPC clusters that may not support C/R. The results highlight the minimal time required to both checkpoint and restart the generation process. Since only the necessary information is stored and retrieved, no lengthy function calls or snapshots are required. The C/R implementation also serves as a form of memory relief: large-scale attack and compliance graphs are difficult to hold entirely in memory, and this approach allows users to abstract away the memory constraint difficulties while maximizing performance.
Future work includes performance and size comparisons with available C/R libraries. This would involve implementing SCR, BLCR, and/or DMTCP and measuring their respective checkpoint times and sizes, as well as the time taken to restart. These measurements can then be compared to the checkpoint and restart times presented in this work. Future work could also include new optimization techniques: additional investigation can provide insight into reducing the runtime of database queries, PostgreSQL settings to alter or enable, and communication strategies between distributed nodes. Other work can explore alternative approaches to C/R. This work used PostgreSQL since it was already incorporated into RAGE, but filesystem-based C/R approaches can be investigated to determine possible advantages over a dedicated database. One other route for future work is identifying the checkpointing interval. El-Sayed and Schroeder~\cite{CR-Simple} discuss a multitude of options for determining an optimal interval. Various options presented in their work can be implemented, along with their closed-form solution for identifying a default checkpointing interval that can be built into RAGE.
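As an illustration of what a closed-form default interval can look like, one widely used first-order approximation is Young's formula, which sets the interval from the cost of writing a checkpoint and the mean time between failures. Whether this is the exact form that would be adopted in RAGE is an assumption, and the symbols $\tau_{\mathrm{opt}}$, $C$, and $M$ are introduced here for illustration only:
\begin{equation}
\tau_{\mathrm{opt}} \approx \sqrt{2 C M}
\end{equation}
where $C$ is the time needed to write one checkpoint and $M$ is the mean time between failures of the system. With purely hypothetical values of $C = 30$~s and $M = 24$~h $= 86400$~s, this gives $\tau_{\mathrm{opt}} \approx \sqrt{2 \cdot 30 \cdot 86400} \approx 2277$~s, or roughly 38 minutes between checkpoints. These numbers are not measurements from this work.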
%\bibliographyp
\bibliography{Bibliography}