PowerShell Where-Object queries on large datasets
Over the last few weeks I had to write several PowerShell scripts that combine information from multiple sources.
My usual approach was to run Where-Object on an array, filtering on an ID. This works, but on large datasets it is slow: a single Where-Object lookup against a dataset of 30,000 items takes between 1 and 2 seconds on average. Doing the math, 30,000 lookups at 2 seconds each add up to 60,000 seconds (roughly 16.7 hours) spent on Where-Object alone.
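To make the cost concrete, here is a minimal sketch of that lookup pattern. It is not the original script: the two arrays and their properties (ObjectId, Licensed, DisplayName) are hypothetical stand-ins for the real data sources, kept small so the sketch runs quickly. Because every iteration pipes the whole of one array through Where-Object, the total work grows with the size of one dataset multiplied by the size of the other.

```powershell
# Hypothetical sample data standing in for two real sources; not tenant data.
$sourceA = 1..200 | ForEach-Object {
    [pscustomobject]@{ ObjectId = "id-$_"; Licensed = ($_ % 2 -eq 0) }
}
$sourceB = 1..200 | ForEach-Object {
    [pscustomobject]@{ ObjectId = "id-$_"; DisplayName = "User $_" }
}

# Slow pattern: for every item in $sourceB, Where-Object scans all of $sourceA,
# so the number of comparisons is (items in B) x (items in A).
$report = foreach ($userB in $sourceB) {
    $match = $sourceA | Where-Object { $_.ObjectId -eq $userB.ObjectId }
    [pscustomobject]@{
        ObjectId    = $userB.ObjectId
        DisplayName = $userB.DisplayName
        Licensed    = $match.Licensed
    }
}
```

With 200 items on each side this is already 40,000 comparisons; at 30,000 items per side it becomes 900 million.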
When I ran the complete script, it took 7 hours in total. That was far too long just to get a report of all users in our tenant with data from Azure AD V1 and V2.
It looked something like this:
After some searching online I found a few tricks to speed it up, and the result blew my mind: a single change brought the script down from 7 hours to 7 minutes.
What I did was load all the data from Get-MsolUser into a variable and build an indexed array from it, keyed on the ObjectId. Then, inside the foreach over the Get-AzureADUser results, I could look up the matching Get-MsolUser entry directly by its ObjectId instead of searching the whole array each time. See the script below for the details.
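As an illustration of the indexing idea, here is a minimal sketch that uses a PowerShell hashtable as the index. It assumes a hashtable is an acceptable stand-in for the "indexed array" described above, and it uses synthetic objects instead of calling Get-MsolUser and Get-AzureADUser, so it runs without a tenant; the property names IsLicensed and DisplayName are hypothetical. The keys are normalized to strings as a precaution, on the assumption that the two modules may not return ObjectId as the same .NET type (a Guid versus a string), which would otherwise make hashtable lookups miss.

```powershell
# Hypothetical sample data standing in for Get-MsolUser and Get-AzureADUser output.
# ObjectId is deliberately a [guid] on one side and a [string] on the other.
$ids = 1..30000 | ForEach-Object { [guid]::NewGuid() }
$msolUsers = foreach ($i in 0..($ids.Count - 1)) {
    [pscustomobject]@{ ObjectId = $ids[$i]; IsLicensed = ($i % 2 -eq 0) }
}
$azureAdUsers = foreach ($id in $ids) {
    [pscustomobject]@{ ObjectId = $id.ToString(); DisplayName = "User $id" }
}

# Build the index once: a single pass over the data, keyed on ObjectId.
# Keys are converted to strings so a Guid and its string form hit the same entry.
$msolIndex = @{}
foreach ($user in $msolUsers) {
    $msolIndex[$user.ObjectId.ToString()] = $user
}

# Each lookup is now a hashtable access instead of a scan of the whole array.
$report = foreach ($aadUser in $azureAdUsers) {
    $msolUser = $msolIndex[$aadUser.ObjectId.ToString()]
    [pscustomobject]@{
        ObjectId    = $aadUser.ObjectId
        DisplayName = $aadUser.DisplayName
        IsLicensed  = $msolUser.IsLicensed
    }
}
```

Building the index costs one pass over the data, after which each lookup takes roughly constant time, so the total work grows with the sum of the two dataset sizes rather than their product. Group-Object with -AsHashTable -AsString can build a similar lookup table, but each value is then a collection of matching objects rather than a single object.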