PowerShell Hashtables as an Index

Have you ever needed to grep through a ton of logs that reference AD objects, fetch those objects from AD, and build a report? No?... Well, let's just say you have. That can get pretty slow, and round-tripping to your friendly local domain controller can get out of hand. There are some easy ways to avoid making unnecessary queries; here's one of them.


function Get-CompiledObject {
    param (
        [Parameter(Mandatory=$true, ValueFromPipeline=$true, Position=0)]
        $ad_group,
        [hashtable]$dn2sid
    )
    process {
        $obj = New-Object PSObject
        $obj | Add-Member -MemberType NoteProperty -Name Name    -Value $ad_group.Name
        $obj | Add-Member -MemberType NoteProperty -Name SID     -Value $ad_group.SID.Value
        $obj | Add-Member -MemberType NoteProperty -Name Members -Value @()

        foreach ($m in $ad_group.Members) {
            # Check the in-memory index first; the member's DistinguishedName is the key.
            $member = $dn2sid[$m]

            # Only round-trip to the domain controller on a cache miss,
            # and store the result so the next group that shares this member is free.
            if ($null -eq $member) {
                $member = Get-ADObject -Filter {DistinguishedName -like $m}
                $dn2sid[$m] = $member
            }

            $obj.Members += $member.SID
        }

        $obj
    }
}

This is nothing special. Its purpose is to create an object that contains the name of a group, its SID, and the SIDs of all its members, so that we can retain the group -> member data in perpetuity. One of the arguments, $dn2sid, is a hashtable which stores AD objects, using a given object's DistinguishedName as its key. This hashtable lets us do two interesting things: look up an object in memory without filtering, and query AD only when the object isn't already cached.
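If you already know which objects the logs will reference, the same index can be warmed up front instead of lazily. A minimal sketch of that idea — the OU path and the looked-up DN here are made up for illustration:

```powershell
# Preload the index: one AD query, then O(1) lookups by DistinguishedName.
$dn2sid = @{}
Get-ADObject -Filter * -SearchBase "OU=Staff,DC=example,DC=com" |
    ForEach-Object { $dn2sid[$_.DistinguishedName] = $_ }

# Later, a lookup is a hashtable read instead of a round trip to the DC.
$member = $dn2sid["CN=Jane Doe,OU=Staff,DC=example,DC=com"]
```

Whether preloading or lazy caching wins depends on how many of the preloaded objects you actually touch; the function above uses the lazy form so it never fetches an object nobody references.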

I've been told at work not to bother optimizing anything, "there will be time for that later", but is an efficient design from the start a premature optimization? I say "No". In the example function, the dataset I tested with took 15 minutes to run without the hashtable but 3 minutes with it. In another instance I was tasked with correlating properties on objects contained in 2 large lists (6k+ items each, large for PowerShell). Naively iterating over list one and using Where-Object to find the matching object in list two took ~30 minutes. By preloading the objects into a hashtable, you don't need to filter the list searching for an object when you can pull the value from the hashtable directly; my 30-minute operation now takes 90 seconds. Speedups like those — 5x in the first case, 20x in the second — let you test the code far more often in a given amount of time. Taking the easy wins by including optimizations from the start can mean the difference between getting something done today vs. tomorrow.
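The two-list correlation follows the same pattern. A rough, self-contained sketch of both approaches — the EmployeeId join key and the sample data are made up for illustration:

```powershell
# Two lists to correlate; EmployeeId is a hypothetical join key.
$listOne = 1..5 | ForEach-Object { [pscustomobject]@{ EmployeeId = $_; Name = "User$_" } }
$listTwo = 1..5 | ForEach-Object { [pscustomobject]@{ EmployeeId = $_; Dept = "Dept$_" } }

# Slow: scan all of $listTwo for every item in $listOne -- O(n*m).
$slow = foreach ($a in $listOne) {
    $listTwo | Where-Object { $_.EmployeeId -eq $a.EmployeeId }
}

# Fast: build the index once, then every lookup is O(1).
$index = @{}
foreach ($b in $listTwo) { $index[$b.EmployeeId] = $b }

$fast = foreach ($a in $listOne) {
    $index[$a.EmployeeId]
}
```

On five items the difference is invisible; at 6k items per list the Where-Object version does roughly 36 million comparisons while the indexed version does 12k hashtable operations, which is where the 30-minutes-to-90-seconds gap comes from.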