# Tokenizing PowerShell Scripts

By turning PowerShell code into tokens and structures, you can find errors, auto-document your code, and create powerful refactoring tools.

## Colorful World of Tokens

Whenever you load PowerShell code into specialized editors, the code gets magically colored, and each color represents a given token type. The colors can help you understand how PowerShell interprets your code.

Generic editors without a built-in PowerShell engine, like Notepad++ or VSCode, use complex regular expressions to try to identify the correct tokens. A 100% precise tokenization, however, comes directly from the PowerShell Parser and is not the result of generic RegEx rules. In this article series we'll look at all the goodness the PowerShell Parser is willing to share with you.

At the end of today, you get a new command: Test-PSOneScript parses one - or thousands - of PowerShell files and always returns 100% accurate tokens in no time. It is part of our PSOneTools module, so just install the latest version to get your hands on the command, or use the source code presented later in this article.

```powershell
Install-Module -Name PSOneTools -Scope CurrentUser -Force
```


With tokens you can do a whole bunch of interesting things, for example:

• Auto-document code and create lists of variables, commands, or method calls found in a script
• Identify syntax errors that make the parser choke
• Perform a security analysis and identify scripts using risky commands
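
For example, a command inventory takes only a few lines. This is a minimal sketch; the snippet being tokenized is made up for illustration:

```powershell
# a minimal sketch: tokenize a snippet and list the commands it calls
$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize('Get-Date; Stop-Computer', [ref]$errors)
$tokens |
  Where-Object { $_.Type -eq 'Command' } |
  Select-Object -ExpandProperty Content
# Get-Date
# Stop-Computer
```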

## PSParser Overview

The PSParser is the original parser built into the early versions of PowerShell. Even though it is old, it is still part of all PowerShell versions and very useful because of its simplicity. It distinguishes 20 different token types:

```powershell
PS> [Enum]::GetNames([System.Management.Automation.PSTokenType]).Count
20

PS> [Enum]::GetNames([System.Management.Automation.PSTokenType]) | Sort-Object
Attribute
Command
CommandArgument
CommandParameter
Comment
GroupEnd
GroupStart
Keyword
LineContinuation
LoopLabel
Member
NewLine
Number
Operator
Position
StatementSeparator
String
Type
Unknown
Variable
```


When you use PSParser to tokenize PowerShell code, it reads your code character by character and groups the characters into meaningful words, the tokens. If the PSParser encounters characters it isn't expecting, it generates syntax errors, for example when a string starts with a double quote but ends with a single quote.
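
To see that character-grouping in action, here is a tiny sketch that tokenizes a one-line expression and prints the type and content of each token:

```powershell
# tokenize a minimal expression and inspect the resulting tokens
$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize('$a = 1', [ref]$errors)
$tokens | ForEach-Object { '{0,-10} {1}' -f $_.Type, $_.Content }
# Variable   a
# Operator   =
# Number     1
```

Note that the Variable token's Content is the bare name without the "$" sigil.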

## Tokenizing PowerShell Code

Use Tokenize() to tokenize PowerShell code. Here is a simple example:

```powershell
# the code that you want tokenized:
$code = {
  # this is some test code
  $service = Get-Service |
    Where-Object Status -eq Running
}

# create a variable to receive syntax errors:
$errors = $null

# tokenize PowerShell code:
$tokens = [System.Management.Automation.PSParser]::Tokenize($code, [ref]$errors)

# analyze errors:
if ($errors.Count -gt 0)
{
  # move the nested token up one level so we see all properties:
  $syntaxError = $errors |
    Select-Object -ExpandProperty Token -Property Message
  $syntaxError
}
else
{
  $tokens
}
```


```powershell
PS> $tokens[0..2]

Content     :
Type        : NewLine
Start       : 0
Length      : 2
StartLine   : 1
StartColumn : 1
EndLine     : 2
EndColumn   : 1

Content     : # this is some test code
Type        : Comment
Start       : 4
Length      : 24
StartLine   : 2
StartColumn : 3
EndLine     : 2
EndColumn   : 27

Content     :
Type        : NewLine
Start       : 28
Length      : 2
StartLine   : 2
StartColumn : 27
EndLine     : 3
EndColumn   : 1
```

Each token is represented by a PSToken object which returns the token content as string, the token type, and the exact position where the token was found.

### How Syntax Errors Work

If the parser encounters unexpected characters while parsing the code, it generates a syntax error. The parser continues parsing, so there can be multiple syntax errors returned.

Let's create a syntax error and send a string to the parser that is missing its ending quote. To send faulty code to the parser, you cannot use a scriptblock, though, because scriptblocks are smart and only accept formally correct PowerShell code. That's why you have to send your faulty PowerShell code to the parser as a string instead of a scriptblock.

```powershell
# the code that you want tokenized:
$code = "
'Hello
"
```


When you run the script again, it now returns the syntax error(s):

```powershell
PS> $syntaxError

Message     : The string is missing the terminator: '.
Content     : 'Hello
Type        : Position
Start       : 4
Length      : 8
StartLine   : 2
StartColumn : 3
EndLine     : 3
EndColumn   : 1
```

#### Improving Parser Error Objects

The parser emits a PSParseError object per syntax error which looks like this:

```powershell
PS> $errors

Token                                Message
-----                                -------
System.Management.Automation.PSToken The string is missing the terminator: '.
```

Unfortunately, the token details are hidden inside the property Token. So I use a little-known trick to make all properties visible immediately.

Select-Object supports the use of -Property and -ExpandProperty at the same time. So I used -ExpandProperty to take the PSToken object out of Token, plus used -Property to attach the original property Message to the extracted token. As a result, all properties show up immediately:
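
Here is the same trick in isolation, using a made-up nested object so you can experiment without the parser:

```powershell
# hypothetical sample data: an outer object wrapping a nested 'Token' object
$outer = [PSCustomObject]@{
  Message = 'outer message'
  Token   = [PSCustomObject]@{ Content = 'inner content'; Line = 7 }
}

# -ExpandProperty unwraps the nested object,
# -Property copies 'Message' onto it as an additional NoteProperty:
$flat = $outer | Select-Object -ExpandProperty Token -Property Message
$flat.Content   # inner content
$flat.Line      # 7
$flat.Message   # outer message
```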

```powershell
PS> $errors | Select-Object -ExpandProperty Token -Property Message

Message     : The string is missing the terminator: '.
Content     : 'Hello
Type        : Position
Start       : 4
Length      : 8
StartLine   : 2
StartColumn : 3
EndLine     : 3
EndColumn   : 1
```

## Examining Real Scripts

To examine real file-based scripts, simply embed the logic from above inside a pipeline-aware function. Test-PSOneScript does exactly this and makes parsing PowerShell files a snap:

```powershell
function Test-PSOneScript
{
  <#
      .SYNOPSIS
      Parses a PowerShell Script (*.ps1, *.psm1, *.psd1)

      .DESCRIPTION
      Invokes the simple PSParser and returns tokens and syntax errors

      .EXAMPLE
      Test-PSOneScript -Path c:\test.ps1
      Parses the content of c:\test.ps1 and returns tokens and syntax errors

      .EXAMPLE
      Get-ChildItem -Path $home -Recurse -Include *.ps1,*.psm1,*.psd1 -File |
        Test-PSOneScript |
        Out-GridView

      parses all PowerShell files found anywhere in your user profile

      .EXAMPLE
      Get-ChildItem -Path $home -Recurse -Include *.ps1,*.psm1,*.psd1 -File |
        Test-PSOneScript |
        Where-Object Errors

      parses all PowerShell files found anywhere in your user profile
      and returns only those files that contain syntax errors

      .LINK
      https://powershell.one
  #>

  param
  (
    # Path to PowerShell script file
    # can be a string or any object that has a "Path"
    # or "FullName" property:
    [String]
    [Parameter(Mandatory,ValueFromPipeline)]
    [Alias('FullName')]
    $Path
  )

  begin
  {
    $errors = $null
  }
  process
  {
    # create a variable to receive syntax errors:
    $errors = $null
    # tokenize PowerShell code:
    $code = Get-Content -Path $Path -Raw -Encoding Default

    # return the results as a custom object
    [PSCustomObject]@{
      Name   = Split-Path -Path $Path -Leaf
      Path   = $Path
      Tokens = [Management.Automation.PSParser]::Tokenize($code, [ref]$errors)
      Errors = $errors | Select-Object -ExpandProperty Token -Property Message
    }
  }
}
```

### Parsing Individual Files

To parse an individual file, simply submit its path to Test-PSOneScript. It immediately returns the tokens and any syntax errors (if present):

```powershell
$Path = "C:\Users\tobia\test.ps1"
$result = Test-PSOneScript -Path $Path
```


### Checking for Errors

```powershell
PS> $result.Errors.Count -gt 0
False
```

To get a list of all token types present in the script, try this (the output may vary depending on the actual code in your script file, of course):

```powershell
PS> $result.Tokens.Type | Sort-Object -Unique
Command
CommandParameter
CommandArgument
Number
String
Variable
Member
Type
Operator
GroupStart
GroupEnd
Keyword
Comment
NewLine
```
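
Building on the same $result object from above, you can also count how often each token type occurs. This is a small sketch; the counts naturally depend on your script:

```powershell
# tally token types by frequency, most common first:
$result.Tokens |
  Group-Object -Property Type -NoElement |
  Sort-Object -Property Count -Descending
```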


### Creating a List of Used Variables

To get a list of all variables used in the script, simply filter for token type Variable:

```powershell
PS> $result.Tokens |
      Where-Object Type -eq Variable |
      Sort-Object -Property Content -Unique |
      ForEach-Object { '${0}' -f $_.Content }

$_ldaptype
$_SortedReportProp
$AD_Capabilities
$AD_CreateDiagrams
$AD_CreateDiagramSourceFiles
$AD_DomainGPOs
...
$xlEqual
$zipPackage
$ZipReport
$ZipReportName
```

### Creating a List of Used Commands

Likewise, if you'd like to get a list of commands used by the script, filter for the appropriate token type (Command):

```powershell
PS> $result.Tokens |
      Where-Object Type -eq Command |
      Sort-Object -Property Content -Unique |
      Select-Object -ExpandProperty Content

ConvertTo-HashArray
ConvertTo-Html
...
Start-sleep
Test-Path
Where
write-error
Write-Output
Write-Verbose
Write-Warning
```


You can even analyze the frequency of how often commands were used. This gets you the 10 most-often used commands:

```powershell
PS> $result.Tokens |
      Where-Object Type -eq Command |
      Select-Object -ExpandProperty Content |
      Group-Object -NoElement |
      Sort-Object -Property Count -Descending |
      Select-Object -First 10

Count Name
----- ----
   51 Search-AD
   49 New-Object
   35 Write-Verbose
   29 get-date
   25 %
   24 New-TimeSpan
   24 Where
   21 select
   19 Sort-Object
   17 Invoke-Method
```

### Analyzing Use of .NET Methods

Maybe you are interested in finding out which native .NET methods the script uses. Again, it is just a matter of token filtering:

```powershell
PS> $result.Tokens |
      Where-Object Type -eq Member |
      Select-Object -ExpandProperty Content |
      Sort-Object -Unique

Accessible
ActiveSheet
...
whenchanged
whencreated
Workbooks
Worksheets
```


At this point, you are reaching the limit of token analysis:

While it is nice to get a list of method names used by a script, on its own it is of limited use: you'd need a bigger picture to know the object types the called methods belong to. All this is possible, too, but not with tokens alone. What's required is a look at script structures that consist of multiple tokens, a case for the Abstract Syntax Tree (AST) which we shed light on in one of the next parts of this series.

## Bulk-Analysis: Scanning Entire Folders

Test-PSOneScript isn't limited to examining one file at a time. It is fully pipeline-aware and knows how to deal with files returned by Get-ChildItem.

### Finding Scripts With Errors

So if you want to identify scripts with syntax errors anywhere in your script library, simply run Get-ChildItem to gather the files to be tested, and pipe them to Test-PSOneScript like this:

```powershell
# get all PowerShell files from your user profile...
Get-ChildItem -Path $home -Recurse -Include *.ps1, *.psd1, *.psm1 -File |
  # ...parse them...
  Test-PSOneScript |
  # ...filter those with syntax errors...
  Where-Object Errors |
  # ...expose the errors:
  ForEach-Object {
    [PSCustomObject]@{
      Name   = $_.Name
      Error  = $_.Errors[0].Message
      Type   = $_.Errors[0].Type
      Line   = $_.Errors[0].StartLine
      Column = $_.Errors[0].StartColumn
      Path   = $_.Path
    }
  }
```

This will find any script with any syntax error. If you'd like to be more specific, you can filter on the error message.

### Identifying Risky Commands

The sky is the limit, so if you'd like to identify scripts that use risky commands such as Invoke-Expression, just adjust the filter:

```powershell
$blacklist = @('Invoke-Expression', 'Stop-Computer', 'Restart-Computer')

# get all PowerShell files from your user profile...
Get-ChildItem -Path $home -Recurse -Include *.ps1, *.psd1, *.psm1 -File |
  # ...parse them...
  Test-PSOneScript |
  # ...filter those using commands in our blacklist
  ForEach-Object {
    # get the first token that is a command and that is in our blacklist
    $badToken = $_.Tokens.Where{$_.Type -eq 'Command'}.Where{$_.Content -in $blacklist} |
      Select-Object -First 1

    if ($badToken)
    {
      $_ | Add-Member -MemberType NoteProperty -Name BadToken -Value $badToken -PassThru
    }
  } |
  # ...expose the errors:
  ForEach-Object {
    [PSCustomObject]@{
      Name     = $_.Name
      Offender = $_.BadToken.Content
      Line     = $_.BadToken.StartLine
      Column   = $_.BadToken.StartColumn
      Path     = $_.Path
    }
  }
```


## What’s Next

Using the PSParser is just your first step into the wonderful world of tokens and script analysis. In the next part we'll take a look at the more sophisticated Parser object which was introduced in PowerShell 3 and differentiates 150 different token kinds plus 26 token flags.
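
As a small preview (the details are covered in the next part), the modern parser lives in the System.Management.Automation.Language namespace and hands back rich Token objects alongside the AST:

```powershell
# a quick preview sketch of the modern parser (PowerShell 3+):
$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
  '$a = 1',       # code to parse
  [ref]$tokens,   # receives Token objects with detailed Kind information
  [ref]$errors    # receives ParseError objects
)
$tokens.Kind    # e.g. Variable, Equals, Number, EndOfInput
```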

And if that's still not enough detail, we look into the Abstract Syntax Tree (AST) and how it forms meaningful structures from groups of tokens.

BTW, have you checked out PowerShell Conference EU yet? Both Call for Papers and Delegate Registration are open!