I am building a PowerShell class-based tool that works with XML files of up to a few hundred KB each, and with as many as a hundred of these files. My validation code is extensive and time consuming, so I want to validate the XML once and, if it is valid, hash it and write that hash as an attribute on the root element. On load, my code can then hash the file again, compare the result with the stored value, and skip validation when they match. This is a pretty major speed improvement. To that end, I have this static method on my XML class.

    static [String] GetHash ([Xml.XmlDocument]$xmlDocument) {
        $stringBuilder = [Text.StringBuilder]::new()
        $hashBuilder = [Security.Cryptography.MD5CryptoServiceProvider]::new() #[Security.Cryptography.SHA384CryptoServiceProvider]
        $encoding = [Text.Encoding]::UTF8

        $xmlToHash = [Xml.XmlDocument]::new()
        $xmlToHash.AppendChild($xmlToHash.ImportNode($xmlDocument.DocumentElement, $true))
        $string = [pxXML]::ToString($xmlToHash)

        $hashIntermediate = $hashBuilder.ComputeHash($encoding.GetBytes($string))
        foreach ($intermediate in $hashIntermediate) {
            $stringBuilder.Append($intermediate.ToString('x2'))
        }

        return $stringBuilder.ToString()
    }

I also have this static ToString() method, which the hasher uses and which I also use for writing results to the console in testing.

    static [String] ToString ([Xml]$Xml) {
        $toString = [System.Collections.Generic.List[String]]::new()

        $stringWriter = [IO.StringWriter]::new()
        $xmlWriter = [Xml.XmlTextWriter]::new($stringWriter)
        $xmlWriter.Formatting = "indented"
        $xmlWriter.Indentation = 4
        $Xml.WriteTo($xmlWriter)
        $xmlWriter.Flush()
        $stringWriter.Flush()
        $toString.Add($stringWriter.ToString())
        $toString.Add("`r" * 2)

        return [string]::Join("`n", $toString)
    }

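For context, the intended workflow described at the top (hash after validation, store the hash on the root element, compare on load) could look roughly like the following. This is only a minimal sketch, not the actual pxXML code: the function names Get-XmlContentHash and Test-XmlHashCurrent and the attribute name hash are hypothetical, and it serializes with OuterXml rather than the indented ToString() above just to stay self-contained. It also assumes the stored attribute has to be excluded before hashing, since otherwise the hash would depend on itself.

```powershell
# Hypothetical helper: hash the root element, ignoring any stored 'hash' attribute.
function Get-XmlContentHash {
    param([xml]$Xml)

    # Work on a copy so the caller's document is not modified.
    $copy = [Xml.XmlDocument]::new()
    $null = $copy.AppendChild($copy.ImportNode($Xml.DocumentElement, $true))

    # Assumption: the hash is stored in an attribute named 'hash' on the root.
    $copy.DocumentElement.RemoveAttribute('hash')

    $bytes = [Text.Encoding]::UTF8.GetBytes($copy.OuterXml)
    $md5 = [Security.Cryptography.MD5]::Create()
    try {
        $hashBytes = $md5.ComputeHash($bytes)
    }
    finally {
        $md5.Dispose()
    }

    # Lowercase hex string, equivalent to appending ToString('x2') per byte.
    return [BitConverter]::ToString($hashBytes).Replace('-', '').ToLowerInvariant()
}

# Hypothetical helper: true when the stored hash matches the current content.
function Test-XmlHashCurrent {
    param([xml]$Xml)

    # GetAttribute returns an empty string when the attribute is missing.
    $stored = $Xml.DocumentElement.GetAttribute('hash')
    if ($stored -eq '') { return $false }
    return $stored -eq (Get-XmlContentHash $Xml)
}

# Usage: stamp a document after (assumed successful) validation, then re-check it.
[xml]$doc = '<root><item id="1"/></root>'
$doc.DocumentElement.SetAttribute('hash', (Get-XmlContentHash $doc))
$beforeChange = Test-XmlHashCurrent $doc    # content matches the stored hash

$doc.DocumentElement.FirstChild.SetAttribute('id', '2')
$afterChange = Test-XmlHashCurrent $doc     # content changed after hashing
```

Here $beforeChange is expected to be $true and $afterChange $false, so validation would only be repeated for the modified document.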
As you might notice from the commented part of the $hashBuilder line, I have tested this with different algorithms, and there is little difference in performance, which is a bit of a surprise. So I wonder whether I am missing something else here that is the actual bottleneck. Or is this code already reasonably close to optimal performance? Given that I am totally new to classes and to digging this deep into .NET, I suspect there is a bit to learn here beyond just optimizing this code.
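One way to check where the time actually goes would be to time each step of GetHash separately instead of the method as a whole. The following is a minimal sketch under stated assumptions: the test document is generated on the fly with made-up element names to land somewhere in the size range described, the variable names are illustrative, and the steps are inlined rather than calling the pxXML class. No timings are shown because they depend on the machine and PowerShell version.

```powershell
# Build a throwaway document (element names are made up for this test).
$sb = [Text.StringBuilder]::new('<root>')
foreach ($i in 1..5000) {
    $null = $sb.Append("<item id=`"$i`">value $i</item>")
}
$null = $sb.Append('</root>')
[xml]$doc = $sb.ToString()

$timer = [Diagnostics.Stopwatch]::StartNew()

# Step 1: copy the root element into a new document, as GetHash does.
$xmlToHash = [Xml.XmlDocument]::new()
$null = $xmlToHash.AppendChild($xmlToHash.ImportNode($doc.DocumentElement, $true))
$copyMs = $timer.Elapsed.TotalMilliseconds

# Step 2: serialize with the same XmlTextWriter settings as ToString().
$timer.Restart()
$stringWriter = [IO.StringWriter]::new()
$xmlWriter = [Xml.XmlTextWriter]::new($stringWriter)
$xmlWriter.Formatting = 'Indented'
$xmlWriter.Indentation = 4
$xmlToHash.WriteTo($xmlWriter)
$xmlWriter.Flush()
$text = $stringWriter.ToString()
$serializeMs = $timer.Elapsed.TotalMilliseconds

# Step 3: encode to bytes and compute the hash.
$timer.Restart()
$bytes = [Text.Encoding]::UTF8.GetBytes($text)
$md5 = [Security.Cryptography.MD5]::Create()
$hashBytes = $md5.ComputeHash($bytes)
$md5.Dispose()
$hashMs = $timer.Elapsed.TotalMilliseconds

# Step 4: convert the hash bytes to a hex string.
$timer.Restart()
$hex = [Text.StringBuilder]::new()
foreach ($b in $hashBytes) {
    $null = $hex.Append($b.ToString('x2'))
}
$hexMs = $timer.Elapsed.TotalMilliseconds

# Report the per-step timings side by side.
$timings = [pscustomobject]@{
    CopyMs      = $copyMs
    SerializeMs = $serializeMs
    HashMs      = $hashMs
    HexMs       = $hexMs
    Characters  = $text.Length
}
$timings
```

If the copy and serialize steps turn out to account for most of the time, that would be consistent with the choice of hash algorithm making little difference, but that is something to measure rather than assume.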