By turning PowerShell code into tokens and structures, you can find errors, auto-document your code, and create powerful refactoring tools.
Colorful World of Tokens
Whenever you load PowerShell code into specialized editors, the code gets magically colored, and each color represents a given token type. The colors can help you understand how PowerShell interprets your code.
Generic editors without a built-in PowerShell engine, such as Notepad++ or VS Code, use complex regular expressions to try to identify the correct tokens. A 100% precise tokenization, however, comes directly from the PowerShell Parser and is not the result of generic RegEx rules. In this article series we’ll look at all the goodness the PowerShell Parser is willing to share with you.
At the end of today, you get a new command: Test-PSOneScript. It parses one (or thousands) of PowerShell files and always returns 100% accurate tokens in no time. It is part of our PSOneTools module, so just install the latest version to get your hands on the command, or use the source code presented later in this article.
Install-Module -Name PSOneTools -Scope CurrentUser -Force
With tokens you can do a whole bunch of interesting things, for example:
- Auto-document code and create lists of variables, commands, or method calls found in a script
- Identify syntax errors that make the parser choke
- Perform a security analysis and identify scripts using risky commands
PSParser Overview
The PSParser is the original parser built into the early versions of PowerShell. Even though it is old, it is still part of all PowerShell versions and very useful because of its simplicity. It distinguishes 20 different token types:
PS> [Enum]::GetNames([System.Management.Automation.PSTokenType]).Count
20
PS> [Enum]::GetNames([System.Management.Automation.PSTokenType]) | Sort-Object
Attribute
Command
CommandArgument
CommandParameter
Comment
GroupEnd
GroupStart
Keyword
LineContinuation
LoopLabel
Member
NewLine
Number
Operator
Position
StatementSeparator
String
Type
Unknown
Variable
When you use PSParser to tokenize PowerShell code, it reads your code character by character and groups the characters into meaningful words, the tokens. If the PSParser encounters characters it isn’t expecting, it generates syntax errors, e.g. when a string starts with a double quote but ends with a single quote.
Tokenizing PowerShell Code
Use Tokenize() to tokenize PowerShell code. Here is a simple example:
# the code that you want tokenized:
$code = {
    # this is some test code
    $service = Get-Service |
        Where-Object Status -eq Running
}
# create a variable to receive syntax errors:
$errors = $null
# tokenize PowerShell code:
$tokens = [System.Management.Automation.PSParser]::Tokenize($code, [ref]$errors)
# analyze errors:
if ($errors.Count -gt 0)
{
    # move the nested token up one level so we see all properties:
    $syntaxError = $errors | Select-Object -ExpandProperty Token -Property Message
    $syntaxError
}
else
{
    $tokens
}
Tokenize() expects the code you want to tokenize, plus an empty variable that it can fill with any syntax errors. Because $errors is empty when Tokenize() starts and gets filled while the method parses the code, it must be submitted by reference (as a memory pointer), which in PowerShell is done through [ref].
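If [ref] is new to you, here is a minimal sketch (with a made-up function name, Set-Result) showing how a callee can fill a variable owned by the caller through a reference:

```powershell
# hypothetical helper that writes into a variable owned by the caller:
function Set-Result
{
    param([ref]$Target)

    # assigning to .Value changes the caller's variable:
    $Target.Value = 42
}

$number = $null
Set-Result -Target ([ref]$number)
$number    # now contains 42
```

This is exactly what Tokenize() does with $errors: it receives a pointer to your variable and fills it while parsing.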
When Tokenize() completes, you receive all tokens as return value in $tokens, plus any syntax errors in $errors.
Looking at Tokens
This is what the first three tokens returned in $tokens look like:
PS> $tokens[0..2]
Content :
Type : NewLine
Start : 0
Length : 2
StartLine : 1
StartColumn : 1
EndLine : 2
EndColumn : 1
Content : # this is some test code
Type : Comment
Start : 4
Length : 24
StartLine : 2
StartColumn : 3
EndLine : 2
EndColumn : 27
Content :
Type : NewLine
Start : 28
Length : 2
StartLine : 2
StartColumn : 27
EndLine : 3
EndColumn : 1
Each token is represented by a PSToken object which returns the token content as string, the token type, and the exact position where the token was found.
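Since each PSToken records its exact position, you can always slice the raw token text out of the original source string. A quick self-contained sketch:

```powershell
# tokenize a tiny snippet:
$code = '$service = Get-Service'
$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize($code, [ref]$errors)

# Start and Length point back into the original source string:
$first = $tokens[0]
$first.Content                                  # variable tokens store the bare name: service
$code.Substring($first.Start, $first.Length)    # the raw source text: $service
```

Note the difference: Content holds the parsed value (the variable name without its $), while Start and Length recover the literal characters as written.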
How Syntax Errors Work
If the parser encounters unexpected characters while parsing the code, it generates a syntax error. The parser continues parsing, so there can be multiple syntax errors returned.
Let’s create a syntax error and send a string to the parser that is missing its ending quote.
Note that you cannot use a scriptblock to send faulty code to the parser: scriptblocks are smart and only accept formally correct PowerShell code. That’s why the faulty code must be submitted to the parser as a string instead.
# the code that you want tokenized:
$code = "
'Hello
"
When you run the script again, it now returns the syntax error(s):
PS> $syntaxError
Message : The string is missing the terminator: '.
Content : 'Hello
Type : Position
Start : 4
Length : 8
StartLine : 2
StartColumn : 3
EndLine : 3
EndColumn : 1
Improving Parser Error Objects
The parser emits a PSParseError object per syntax error which looks like this:
PS> $errors
Token Message
----- -------
System.Management.Automation.PSToken The string is missing the terminator: '.
Unfortunately, the token details are hidden inside the property Token. So I use a little-known trick to make all properties visible immediately: Select-Object supports the use of -Property and -ExpandProperty at the same time. I used -ExpandProperty to take the PSToken object out of Token, and -Property to attach the original property Message to the extracted token. As a result, all properties show up immediately:
PS> $errors | Select-Object -ExpandProperty Token -Property Message
Message : The string is missing the terminator: '.
Content : 'Hello
Type : Position
Start : 4
Length : 8
StartLine : 2
StartColumn : 3
EndLine : 3
EndColumn : 1
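This Select-Object trick is not specific to parser errors; it flattens any nested object. A minimal sketch with a hypothetical custom object:

```powershell
# details are buried one level deep inside the Inner property:
$outer = [PSCustomObject]@{
    Label = 'outer info'
    Inner = [PSCustomObject]@{ Name = 'nested'; Size = 123 }
}

# -ExpandProperty lifts Inner to the top level,
# -Property carries Label over onto the extracted object:
$flat = $outer | Select-Object -ExpandProperty Inner -Property Label
$flat.Name    # nested
$flat.Size    # 123
$flat.Label   # outer info
```

All three values are now reachable directly on one object, just as Message and the token details were above.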
Examining Real Scripts
To examine real file-based scripts, simply embed the logic from above inside a pipeline-aware function. Test-PSOneScript does exactly this and makes parsing PowerShell files a snap:
function Test-PSOneScript
{
  <#
      .SYNOPSIS
      Parses a PowerShell Script (*.ps1, *.psm1, *.psd1)

      .DESCRIPTION
      Invokes the simple PSParser and returns tokens and syntax errors

      .EXAMPLE
      Test-PSOneScript -Path c:\test.ps1
      Parses the content of c:\test.ps1 and returns tokens and syntax errors

      .EXAMPLE
      Get-ChildItem -Path $home -Recurse -Include *.ps1,*.psm1,*.psd1 -File |
        Test-PSOneScript |
        Out-GridView
      Parses all PowerShell files found anywhere in your user profile

      .EXAMPLE
      Get-ChildItem -Path $home -Recurse -Include *.ps1,*.psm1,*.psd1 -File |
        Test-PSOneScript |
        Where-Object Errors
      Parses all PowerShell files found anywhere in your user profile
      and returns only those files that contain syntax errors

      .LINK
      https://powershell.one
  #>

  param
  (
    # Path to PowerShell script file
    # can be a string or any object that has a "Path"
    # or "FullName" property:
    [String]
    [Parameter(Mandatory,ValueFromPipeline)]
    [Alias('FullName')]
    $Path
  )

  begin
  {
    $errors = $null
  }

  process
  {
    # create a variable to receive syntax errors:
    $errors = $null

    # tokenize PowerShell code:
    $code = Get-Content -Path $Path -Raw -Encoding Default

    # return the results as a custom object:
    [PSCustomObject]@{
      Name   = Split-Path -Path $Path -Leaf
      Path   = $Path
      Tokens = [Management.Automation.PSParser]::Tokenize($code, [ref]$errors)
      Errors = $errors | Select-Object -ExpandProperty Token -Property Message
    }
  }
}
Parsing Individual Files
To parse an individual file, simply submit its path to Test-PSOneScript. It immediately returns the tokens and any syntax errors (if present):
$Path = "C:\Users\tobia\test.ps1"
$result = Test-PSOneScript -Path $Path
Checking for Errors
Let’s start with checking whether the script file has syntax errors:
PS> $result.Errors.Count -gt 0
False
To get a list of all token types present in the script, try this (the output may vary depending on the actual code in your script file; note that Sort-Object sorts the types by their underlying enum value, not alphabetically):
PS> $result.Tokens.Type | Sort-Object -Unique
Command
CommandParameter
CommandArgument
Number
String
Variable
Member
Type
Operator
GroupStart
GroupEnd
Keyword
Comment
NewLine
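If you want counts instead of just a distinct list, Group-Object will tally the token types for you. A small self-contained sketch using a one-line snippet:

```powershell
# tokenize a tiny snippet:
$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize('$a = 1 + 2', [ref]$errors)

# count how many tokens of each type the snippet contains:
$stats = $tokens |
    Group-Object -Property Type -NoElement |
    Sort-Object -Property Count -Descending
$stats
```

For the snippet above this yields two Operator tokens (= and +), two Number tokens, and one Variable token. The same pipeline works on $result.Tokens from a real script.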
Creating a List of Used Variables
To get a list of all variables used in the script, simply filter for token type Variable (the Content of a variable token is the bare name, which is why the example prepends the $ sign):
PS> $result.Tokens |
Where-Object Type -eq Variable |
Sort-Object -Property Content -Unique |
ForEach-Object { '${0}' -f $_.Content}
$_ldaptype
$_SortedReportProp
$AD_Capabilities
$AD_CreateDiagrams
$AD_CreateDiagramSourceFiles
$AD_DomainGPOs
...
$xlEqual
$zipPackage
$ZipReport
$ZipReportName
Creating a List of Used Commands
Likewise, if you’d like to get a list of commands used by the script, filter for the appropriate token type (Command):
PS> $result.Tokens |
Where-Object Type -eq Command |
Sort-Object -Property Content -Unique |
Select-Object -ExpandProperty Content
Add-Content
Add-Member
Add-Type
Add-Zip
Append-ADUserAccountControl
ConvertTo-HashArray
ConvertTo-Html
...
Start-sleep
Test-Path
Where
write-error
Write-Output
Write-Verbose
Write-Warning
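Note that token content reflects what is literally written in the script, which is why aliases like %, select, or Where show up verbatim in the list above. If you prefer normalized command names, you could try to resolve aliases with Get-Alias; a sketch:

```powershell
# tokenize a snippet that uses aliases:
$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize('dir | % { $_.Name }', [ref]$errors)

$resolved = $tokens |
    Where-Object Type -eq Command |
    ForEach-Object {
        # resolve aliases to their real command name where possible:
        $alias = Get-Alias -Name $_.Content -ErrorAction SilentlyContinue
        if ($alias) { $alias.Definition } else { $_.Content }
    }
$resolved    # typically Get-ChildItem, ForEach-Object
```

Keep in mind this resolves aliases as defined in your current session, which may differ from the session the script was written for.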
You can even analyze the frequency of how often commands were used. This gets you the 10 most-often used commands:
PS> $result.Tokens |
Where-Object Type -eq Command |
Select-Object -ExpandProperty Content |
Group-Object -NoElement |
Sort-Object -Property Count -Descending |
Select-Object -First 10
Count Name
----- ----
51 Search-AD
49 New-Object
35 Write-Verbose
29 get-date
25 %
24 New-TimeSpan
24 Where
21 select
19 Sort-Object
17 Invoke-Method
Analyzing Use of .NET Methods
Maybe you are interested in finding out which native .NET methods the script uses. Again, it is just a matter of token filtering:
PS> $result.Tokens |
Where-Object Type -eq Member |
Select-Object -ExpandProperty Content |
Sort-Object -Unique
Accessible
ActiveSheet
Add
AdjacentSites
adminDisplayName
...
whenchanged
whencreated
Workbooks
Worksheets
At this point, you are reaching the limit of token analysis: while it is nice to get a list of method names used by a script, it is not really useful on its own. You’d need a bigger picture to know the object types the called methods belong to. All this is possible, too, but not with tokens alone. What’s required is a look at script structures that consist of multiple tokens: a case for the Abstract Syntax Tree (AST), which we shed light on in one of the next parts of this series.
Bulk-Analysis: Scanning Entire Folders
Test-PSOneScript isn’t limited to examining one file at a time. It is fully pipeline-aware and knows how to deal with files returned by Get-ChildItem.
Finding Scripts With Errors
So if you want to identify scripts with syntax errors anywhere in your script library, simply run Get-ChildItem to gather the files to be tested, and pipe them to Test-PSOneScript like this:
# get all PowerShell files from your user profile...
Get-ChildItem -Path $home -Recurse -Include *.ps1, *.psd1, *.psm1 -File |
  # ...parse them...
  Test-PSOneScript |
  # ...filter those with syntax errors...
  Where-Object Errors |
  # ...expose the errors:
  ForEach-Object {
    [PSCustomObject]@{
      Name   = $_.Name
      Error  = $_.Errors[0].Message
      Type   = $_.Errors[0].Type
      Line   = $_.Errors[0].StartLine
      Column = $_.Errors[0].StartColumn
      Path   = $_.Path
    }
  }
This will find any script with any syntax error. If you’d like to be more specific, you can filter on the error message.
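For instance, to keep only a specific class of problem, such as unterminated strings, you can match the error Message (a sketch; the exact wording may differ between PowerShell versions):

```powershell
# provoke a syntax error on purpose:
$errors = $null
$null = [System.Management.Automation.PSParser]::Tokenize("'Hello", [ref]$errors)

# keep only errors complaining about a missing string terminator:
$unterminated = $errors | Where-Object { $_.Message -like '*missing the terminator*' }
$unterminated.Count
```

In real use you’d apply the same Where-Object filter to the Errors property of the objects emitted by Test-PSOneScript.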
Identifying Risky Commands
The sky is the limit, so if you’d like to identify scripts that use risky commands such as Invoke-Expression, just adjust the filter:
$blacklist = @('Invoke-Expression', 'Stop-Computer', 'Restart-Computer')

# get all PowerShell files from your user profile...
Get-ChildItem -Path $home -Recurse -Include *.ps1, *.psd1, *.psm1 -File |
  # ...parse them...
  Test-PSOneScript |
  # ...filter those using commands in our blacklist...
  ForEach-Object {
    # get the first token that is a command and that is in our blacklist:
    $badToken = $_.Tokens.Where{$_.Type -eq 'Command'}.Where{$_.Content -in $blacklist} |
      Select-Object -First 1
    if ($badToken)
    {
      $_ | Add-Member -MemberType NoteProperty -Name BadToken -Value $badToken -PassThru
    }
  } |
  # ...and report the offenders:
  ForEach-Object {
    [PSCustomObject]@{
      Name     = $_.Name
      Offender = $_.BadToken.Content
      Line     = $_.BadToken.StartLine
      Column   = $_.BadToken.StartColumn
      Path     = $_.Path
    }
  }
What’s Next
Using the PSParser is just your first step into the wonderful world of tokens and script analysis. In the next part, we’ll take a look at the more sophisticated Parser object, which was introduced in PowerShell 3 and differentiates 150 different token kinds plus 26 token flags.
And if that’s still not enough detail, we’ll look into the Abstract Syntax Tree (AST) and how it forms meaningful structures from groups of tokens.
BTW, have you checked out PowerShell Conference EU yet? Both Call for Papers and Delegate Registration are open!