Advanced Tokenizing PowerShell Scripts

The advanced PowerShell Parser turns PowerShell code into detailed tokens. Use them to auto-document, analyze or just find your scripts. You can also perfectly colorize your code.

When PowerShell surfaced in version 1, it came with a basic PSParser that can turn PowerShell Code into tokens. It soon turned out that PSParser has a few blind spots and cannot deal with nested tokens.

So in PowerShell 3, a new and more powerful Parser was introduced that breaks up PowerShell code into a much more detailed range of tokens. In this article we examine the new Parser, and you get a ready-to-use function to parse your scripts: Get-PSOneToken.

Please make sure you have installed the latest version of the module PSOneTools, or else copy and paste the source code from this article.

Install-Module -Name PSOneTools -Scope CurrentUser -Force

Overview

The PSParser we already covered knows 20 different token types. The new Parser we are covering today knows 150 different token kinds, each of which can be decorated with 26 token flags. This provides a very detailed picture, especially when it comes to nested tokens.
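To see the kinds and flags available on your own system, you can query the two enumeration types directly. This is a minimal sketch; the exact counts may vary with your PowerShell version, so no expected numbers are shown here.

```powershell
# the two enumerations used by the new Parser
$kindType = [System.Management.Automation.Language.TokenKind]
$flagType = [System.Management.Automation.Language.TokenFlags]

# list all names defined by each enumeration
$kinds = [Enum]::GetNames($kindType)
$flags = [Enum]::GetNames($flagType)

'{0} token kinds, {1} token flags' -f $kinds.Count, $flags.Count

# show the first few names of each
$kinds | Select-Object -First 5
$flags | Select-Object -First 5
```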

Nested Tokens

PowerShell can embed variables and expressions in Expandable Strings (double-quoted strings), so when you take a look at these lines, how many tokens do you see?

"Hello $env:username"
"PowerShell Version: $($host.Version)"

PSParser would see just one string token per line, plus one token for the NewLine:

$code = '"Hello $env:username"
"PowerShell Version: $($host.Version)"
'

Test-PSOneScript -Code $code | Select-Object -ExpandProperty Tokens

All nested tokens are invisible to PSParser.
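If you do not have the module installed, the same blind spot can be reproduced by calling the old PSParser directly. The following is a minimal sketch that assumes nothing beyond the built-in type System.Management.Automation.PSParser:

```powershell
$code = '"Hello $env:username"
"PowerShell Version: $($host.Version)"
'

# PSParser returns syntax errors by reference
$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize($code, [ref]$errors)

# the embedded variable and the subexpression do not show up
# as separate tokens; each string is reported as a single token
$tokens | Select-Object -Property Type, Content
```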

Using the New Parser

Because of these limitations, I added a new function to the module PSOneTools that uses the new Parser instead of the old PSParser: Get-PSOneToken. Make sure you are using the latest version of the module PSOneTools, or copy and run the source code below.

Support For Nested Tokens

The new parser supports nested tokens:

PS> # get tokens for code:
PS> $result = Get-PSOneToken -Code '"Hello $env:username"'

PS> # emit tokens:
PS> $result.Tokens

NestedTokens : {username}
Value        : Hello $env:username
Text         : "Hello $env:username"
TokenFlags   : ParseModeInvariant
Kind         : StringExpandable
HasError     : False
Extent       : "Hello $env:username"

Text       : 
TokenFlags : ParseModeInvariant
Kind       : EndOfInput
HasError   : False
Extent     : 

PS> # tokens of type "StringExpandable" have a "NestedTokens" property:
PS> $result.Tokens[0].NestedTokens

Name         : username
VariablePath : env:username
Text         : $env:username
TokenFlags   : None
Kind         : Variable
HasError     : False
Extent       : $env:username
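The same nested token can be reached without the module by calling the Parser yourself. This is only a sketch of the raw API call that Get-PSOneToken wraps; the variable names are chosen for illustration.

```powershell
$code = '"Hello $env:username"'

# tokens and syntax errors are returned by reference
$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$tokens, [ref]$errors)

# the first token is the expandable string; its NestedTokens
# property describes the variable embedded in the string
$tokens[0].Kind
$tokens[0].NestedTokens | Select-Object -Property Kind, Text
```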

Support For Syntax Errors

Get-PSOneToken and the new Parser also return potential syntax errors, just like Test-PSOneScript and the PSParser, but there are way more details:

PS> # using the new Parser with syntax error in code:
PS> $result = Get-PSOneToken -Code '"Hello'
PS> $result.Errors

Message             : The string is missing the terminator: ".
IncompleteInput     : True
ErrorId             : TerminatorExpectedAtEndOfString
File                : 
StartScriptPosition : System.Management.Automation.Language.InternalScriptPosition
EndScriptPosition   : System.Management.Automation.Language.InternalScriptPosition
StartLineNumber     : 1
StartColumnNumber   : 1
EndLineNumber       : 1
EndColumnNumber     : 7
Text                : "Hello
StartOffset         : 0
EndOffset           : 6

PS> # using the old PSParser with syntax error in code:
PS> $result = Test-PSOneScript -Code '"Hello'
PS> $result.Errors

Message     : The string is missing the terminator: ".
Content     : "Hello
Type        : Position
Start       : 0
Length      : 6
StartLine   : 1
StartColumn : 1
EndLine     : 1
EndColumn   : 7
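The extra detail makes it easy to build your own error messages. As a rough sketch (calling the Parser directly, with a message format made up for illustration), each error can be turned into a one-line summary:

```powershell
$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput('"Hello', [ref]$tokens, [ref]$errors)

# turn each ParseError into a one-line message with its position
$report = foreach ($e in $errors) {
  '{0} [{1}] at line {2}, column {3}' -f $e.Message, $e.ErrorId,
    $e.Extent.StartLineNumber, $e.Extent.StartColumnNumber
}
$report
```

IncompleteInput is useful when you build interactive tools: it indicates that the code might become valid if the user keeps typing.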

Finding All Variables in Code

Let’s start with retrieving all variables from a script. The improved detail provided by Parser over PSParser is great but also challenging because you first need to know the Kind of token and (optionally) its TokenFlags.

Investigating Tokens

To find out Kind and TokenFlags of what you are after, begin with submitting sample code that contains the expression:

PS> (Get-PSOneToken -Code '$a = 10').Tokens

Name         : a
VariablePath : a
Text         : $a
TokenFlags   : None
Kind         : Variable
HasError     : False
Extent       : $a

Text       : =
TokenFlags : AssignmentOperator
Kind       : Equals
HasError   : False
Extent     : =

...

As you see, the variable in your sample code is represented by a token of kind Variable with no special TokenFlags.

Retrieving Tokens of Kind “Variable”

Once you know the Kind of token you are after, ask Get-PSOneToken to return just the tokens of that kind:

PS> Get-PSOneToken -Code '$a = 12' -TokenKind Variable

Name         : a
VariablePath : a
Text         : $a
TokenFlags   : None
Kind         : Variable
HasError     : False
Extent       : $a

Why Finding Tokens Rocks…

As you’ll soon discover, -TokenKind sports full IntelliSense and also accepts multiple kinds. This line would find all tokens that start a PowerShell Class, PowerShell Workflow, or PowerShell Function:

Get-PSOneToken -Code 'class test {}; workflow simple {}; function abc {}' -TokenKind Command, Class, Workflow, function

If you’re wondering why finding such tokens could be useful: it lets you identify scripts that define such things, for example, or tag your scripts in some sort of inventory.

Finding Scripts that Define Classes or Workflows

Get-PSOneToken is fully pipeline-aware, so you can easily use it as a filter and find scripts that contain a given token. This example lists all scripts that define PowerShell Classes or PowerShell Workflows in your user profile (if any):

# finding all scripts that define classes or workflows:
Get-ChildItem -Path $home -Include *.ps1, *.psm1 -Recurse -ErrorAction SilentlyContinue |
    Where-Object {
        # if there is at least one of the requested token in the
        # script, let it pass:
        $_ | Get-PSOneToken -TokenKind Class, Workflow | Select-Object -First 1 
    }

Finding Operators

When you look at the variety of values you can specify for -TokenKind, you can easily find exactly the token you are after, for example an assignment operator:

PS> Get-PSOneToken -Code '$a = 100; if ($a -eq 100) { 100 }' -TokenKind Equals

Text       : =
TokenFlags : AssignmentOperator
Kind       : Equals
HasError   : False
Extent     : =

If you wanted to find the -eq comparison operator, you’d have to use the kind Ieq instead of Equals:

PS> Get-PSOneToken -Code '$a = 100; if ($a -eq 100) { 100 }' -TokenKind Ieq

Text       : -eq
TokenFlags : BinaryPrecedenceComparison, BinaryOperator
Kind       : Ieq
HasError   : False
Extent     : -eq

The Parser always reports the real operator. The operator -eq is in reality just an alias for -ieq, the case-insensitive equality operator.
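To see how the explicit case-insensitive and case-sensitive variants are reported, you can compare all three spellings side by side. This sketch calls the Parser directly so it runs without the module:

```powershell
$code = '1 -eq 1; 1 -ieq 1; 1 -ceq 1'

$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$tokens, [ref]$errors)

# keep only tokens that carry the BinaryOperator flag
$binary = [System.Management.Automation.Language.TokenFlags]::BinaryOperator
$operators = $tokens | Where-Object { $_.TokenFlags.HasFlag($binary) }

# -eq and -ieq are both reported as kind Ieq, -ceq as kind Ceq
$operators | Select-Object -Property Text, Kind
```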

Retrieving Token Groups: TokenFlags

What if you wanted to list all operators? That’s when TokenFlags are helpful because they group similar token Kinds. Have a look:

PS> (Get-PSOneToken -Code '$a = 100; $a -eq 100; $a -gt 100').Tokens | 
        Select-Object -Property Text, TokenFlags

Text                                 TokenFlags
----                                 ----------
$a                                         None
=                            AssignmentOperator
100                                        None
;                            ParseModeInvariant
$a                                         None
-eq  BinaryPrecedenceComparison, BinaryOperator
100                                        None
;                            ParseModeInvariant
$a                                         None
-gt  BinaryPrecedenceComparison, BinaryOperator
100                                        None
                             ParseModeInvariant

All comparison operators share the TokenFlag BinaryOperator, so to get all comparison operators from your code, request this TokenFlag:

PS> Get-PSOneToken -Code '$a = 100; $a -eq 100; $a -gt 100' -TokenFlag BinaryOperator

Text       : -eq
TokenFlags : BinaryPrecedenceComparison, BinaryOperator
Kind       : Ieq
HasError   : False
Extent     : -eq

Text       : -gt
TokenFlags : BinaryPrecedenceComparison, BinaryOperator
Kind       : Igt
HasError   : False
Extent     : -gt

If you wanted to get a list of all binary operators used by a script, it’s almost trivial now:

# replace with path to your file
$Path = "C:\...\file.ps1"

Get-PSOneToken -Path $Path -TokenFlag BinaryOperator |
Select-Object -ExpandProperty Text |
Sort-Object -Unique

Nested Tokens Support

Get-PSOneToken uses the new Parser, so nested tokens inside StringExpandable tokens are no longer a blind spot. They just need to be unwrapped recursively from the top-level tokens. This is done by Expand-PSOneToken:

PS> # by default, nested tokens are not returned:
PS> Get-PSOneToken -Code '"Hello $host"' -TokenKind StringExpandable | Select-Object -ExpandProperty Text

"Hello $host"

PS> # nested tokens can be unwrapped though:
PS> Get-PSOneToken -Code '"Hello $host"' -TokenKind StringExpandable | Expand-PSOneToken | Select-Object -ExpandProperty Text

"Hello $host"
$host

This functionality is already built into Get-PSOneToken when you specify the parameter -IncludeNestedToken:

PS> # get top-level token only:
PS> (Get-PSOneToken -Code '"Hello $host"').Tokens.Text

"Hello $host"

PS> # include nested tokens by specifying -IncludeNestedToken:
PS> (Get-PSOneToken -Code '"Hello $host"' -IncludeNestedToken).Tokens.Text

"Hello $host"
$host

Now you won’t miss anything. This gets you a list of all variables, including $secret used in nested expressions inside expandable strings:

$code = {
  $a = 1
  $b = 2
  "This is also used: $($secret = 100; $secret)"
}

Get-PSOneToken -ScriptBlock $code -TokenKind Variable -IncludeNestedToken |
  Select-Object -ExpandProperty Text |
  Sort-Object -Unique

Auto-Documenting Script Files

Thanks to Get-PSOneToken, it is now trivial to analyze script files and retrieve all kinds of lists and statistics.

Creating List of Used Variables

Take a look at the simple code to get a list of all variables used in a script, for example to create some automated documentation:

# replace with path to your file
$Path = "C:\...\file.ps1"

Get-PSOneToken -Path $Path -TokenKind Variable -IncludeNestedToken |
Select-Object -ExpandProperty Text |
Sort-Object -Unique

Unlike the old PSParser, the code now also picks up variables that hide inside nested tokens.

Creating a List of Used Commands

Likewise, if you’d like to get a list of commands used by the script, filter for the appropriate token kind. Commands are represented by tokens of kind Generic that carry the TokenFlag CommandName:

# replace with path to your file
$Path = "C:\...\file.ps1"

Get-PSOneToken -Path $Path -TokenKind Generic -TokenFlag CommandName -IncludeNestedToken |
Select-Object -ExpandProperty Text |
Sort-Object -Unique

You can even analyze the frequency of how often commands were used. This gets you the 10 most-often used commands:

# replace with path to your file
$Path = "C:\...\file.ps1"

Get-PSOneToken -Path $Path -TokenKind Generic -TokenFlag CommandName -IncludeNestedToken |
Select-Object -ExpandProperty Text |
Group-Object -NoElement |
Sort-Object -Property Count -Descending |
Select-Object -First 10
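If the module is not at hand, a similar count can be approximated with the raw Parser. The following is a minimal sketch under two assumptions: the sample code is passed as a string, and nested tokens are not unwrapped, so commands inside expandable strings would be missed.

```powershell
# sample code to analyze (made up for illustration)
$code = @'
Get-Process | Where-Object { $_.CPU -gt 10 } | Sort-Object CPU
Get-Process -Name pwsh
'@

$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$tokens, [ref]$errors)

# keep only tokens flagged as command names and count them by text
$commandName = [System.Management.Automation.Language.TokenFlags]::CommandName
$stats = $tokens |
  Where-Object { $_.TokenFlags.HasFlag($commandName) } |
  Group-Object -Property Text -NoElement |
  Sort-Object -Property Count -Descending

$stats
```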

Get-PSOneToken

The bulk of the work is done by Get-PSOneToken. The easiest way to get the command is to install it from the PowerShell Gallery via Install-Module:

Install-Module -Name PSOneTools -Scope CurrentUser -Force

I added this to version 1.4 of the module, so if you have installed the module previously, make sure you install the latest version or update it via Update-Module.

Implementation

Here is the source code:

function Get-PSOneToken
{
  <#
      .SYNOPSIS
      Parses a PowerShell Script (*.ps1, *.psm1, *.psd1) and returns the tokens

      .DESCRIPTION
      Invokes the advanced PowerShell Parser and returns tokens and syntax errors

      .EXAMPLE
      Get-PSOneToken -Path c:\test.ps1
      Parses the content of c:\test.ps1 and returns tokens and syntax errors

      .EXAMPLE
      Get-ChildItem -Path $home -Recurse -Include *.ps1,*.psm1,*.psd1 -File |
      Get-PSOneToken |
      Out-GridView

      parses all PowerShell files found anywhere in your user profile

      .EXAMPLE
      Get-ChildItem -Path $home -Recurse -Include *.ps1,*.psm1,*.psd1 -File |
      Get-PSOneToken |
      Where-Object Errors

      parses all PowerShell files found anywhere in your user profile
      and returns only those files that contain syntax errors

      .LINK
      https://powershell.one/powershell-internals/parsing-and-tokenization/advanced-tokenizer
      https://github.com/TobiasPSP/Modules.PSOneTools/blob/master/PSOneTools/1.4/Get-PSOneToken.ps1
  #>

  [CmdletBinding(DefaultParameterSetName='Path')]
  param
  (
    # Path to PowerShell script file
    # can be a string or any object that has a "Path" 
    # or "FullName" property:
    [String]
    [Parameter(Mandatory,ValueFromPipeline,ValueFromPipelineByPropertyName,ParameterSetName='Path')]
    [Alias('FullName')]
    $Path,
    
    # PowerShell Code as ScriptBlock
    [ScriptBlock]
    [Parameter(Mandatory,ValueFromPipeline,ParameterSetName='ScriptBlock')]
    $ScriptBlock,
    
    
    # PowerShell Code as String
    [String]
    [Parameter(Mandatory, ValueFromPipeline,ParameterSetName='Code')]
    $Code,
    
    # the kind of token requested. If neither TokenKind nor TokenFlag is requested, 
    # a full tokenization occurs
    [System.Management.Automation.Language.TokenKind[]]
    $TokenKind = $null,

    # the token flags requested. If neither TokenKind nor TokenFlag is requested, 
    # a full tokenization occurs
    [System.Management.Automation.Language.TokenFlags[]]
    $TokenFlag = $null,

    # include nested tokens that are contained inside 
    # StringExpandable tokens
    [Switch]
    $IncludeNestedToken

  )
  
  begin
  {
    # create variables to receive tokens and syntax errors:
    $errors = 
    $tokens = $null

    # return tokens only?
    # when the user submits either one of these parameters, the return value should
    # be tokens of these kinds:
    $returnTokens = ($PSBoundParameters.ContainsKey('TokenKind')) -or 
    ($PSBoundParameters.ContainsKey('TokenFlag'))
  }
  process
  {
    # if a scriptblock was submitted, convert it to string
    if ($PSCmdlet.ParameterSetName -eq 'ScriptBlock')
    {
      $Code = $ScriptBlock.ToString()
    }

    # if a path was submitted, read code from file,
    if ($PSCmdlet.ParameterSetName -eq 'Path')
    {
      $code = Get-Content -Path $Path -Raw -Encoding Default
      $name = Split-Path -Path $Path -Leaf
      $filepath = $Path

      # parse the file:
      $ast = [System.Management.Automation.Language.Parser]::ParseFile(
        $Path, 
        [ref] $tokens, 
      [ref]$errors)
    }
    else
    {
      # else the code is already present in $Code
      $name = $Code
      $filepath = ''

      # parse the string code:
      $ast = [System.Management.Automation.Language.Parser]::ParseInput(
        $Code, 
        [ref] $tokens, 
      [ref]$errors)
    }

    if ($IncludeNestedToken)
    {
      # "unwrap" nested token
      $tokens = $tokens | Expand-PSOneToken
    }

    if ($returnTokens)
    {
      # filter token and use fast scriptblock filtering instead of Where-Object:
      $tokens |
      & { process { if ($null -eq $TokenKind -or 
          $TokenKind -contains $_.Kind) 
          { $_ }
      }} |
      & { process {
          $token = $_
          if ($null -eq $TokenFlag) { $token }
          else {
            $TokenFlag | 
            Foreach-Object { 
              if ($token.TokenFlags.HasFlag($_)) 
            { $token } } | 
            Select-Object -First 1
          }
        }
      }
            
    }
    else
    {
      # return the results as a custom object
      [PSCustomObject]@{
        Name = $name
        Path = $filepath
        Tokens = $tokens
        # "move" nested "Extent" up one level 
        # so all important properties are shown immediately
        Errors = $errors | 
        Select-Object -Property Message, 
        IncompleteInput, 
        ErrorId -ExpandProperty Extent
        Ast = $ast
      }
    }  
  }
}

Tokenizing PowerShell Code

The tokenization is done by Parser, which works much like the old PSParser discussed earlier:

[System.Management.Automation.Language.Parser]::ParseFile($Path, [ref] $tokens, [ref]$errors)

[System.Management.Automation.Language.Parser]::ParseInput($Code, [ref] $tokens, [ref]$errors)

Unlike PSParser, it supports parsing string input as well as reading input from file. It returns the Abstract Syntax Tree which we discuss in the next part, plus returns tokens and syntax errors by reference.
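The two entry points can be compared side by side. This sketch writes a sample to a temporary file (the file name psone-sample.ps1 is made up for illustration) and parses the same code both ways:

```powershell
$code = '$a = 10'

# write the sample to a temporary file; WriteAllText adds no
# trailing newline, so both inputs are identical
$path = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'psone-sample.ps1'
[System.IO.File]::WriteAllText($path, $code)

# parse from file:
$fileTokens = $fileErrors = $null
$fileAst = [System.Management.Automation.Language.Parser]::ParseFile($path, [ref]$fileTokens, [ref]$fileErrors)

# parse from string:
$inputTokens = $inputErrors = $null
$inputAst = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$inputTokens, [ref]$inputErrors)

# both calls produce the same sequence of token kinds
$fileTokens.Kind -join ', '
$inputTokens.Kind -join ', '

# only the file-based parse records where the code came from
$fileAst.Extent.File
$inputAst.Extent.File

Remove-Item -Path $path
```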

Expand-PSOneToken

This function takes care of recursively expanding nested tokens and is also part of the module PSOneTools, so if you installed the module above, you already have this command.

Implementation

Most of the hard work is done by PowerShell, and Expand-PSOneToken utilizes the power of the PowerShell Parameter Binder:

function Expand-PSOneToken
{
  <#
      .SYNOPSIS
      Expands all nested tokens from a token of type "StringExpandable"

      .DESCRIPTION
      Recursively emits all tokens embedded in a token of type "StringExpandable"
      The original token is also emitted.

      .EXAMPLE
      Get-PSOneToken -Code '"Hello $host"' -TokenKind StringExpandable | Expand-PSOneToken 
      Emits all tokens, including the embedded (nested) tokens
      .LINK
      https://powershell.one/powershell-internals/parsing-and-tokenization/advanced-tokenizer
      https://github.com/TobiasPSP/Modules.PSOneTools/blob/master/PSOneTools/1.4/Expand-PSOneToken.ps1
  #>

  # use the most specific parameter as default:
  [CmdletBinding(DefaultParameterSetName='ExpandableString')]
  param
  (
    # binds a token of type "StringExpandableToken"
    [Parameter(Mandatory,ParameterSetName='ExpandableString',
    Position=0,ValueFromPipeline)]
    [Management.Automation.Language.StringExpandableToken]
    $StringExpandable,

    # binds all tokens
    [Parameter(Mandatory,ParameterSetName='Token',
    Position=0,ValueFromPipeline)]
    [Management.Automation.Language.Token]
    $Token
  )

  process
  {
    switch($PSCmdlet.ParameterSetName)
    {
      # recursively expand token of type "StringExpandable"
      'ExpandableString'  { 
        $StringExpandable 
        $StringExpandable.NestedTokens | 
          Where-Object { $_ } | 
          Expand-PSOneToken
      }
      # return regular token as-is:
      'Token'             { $Token }
      # should never occur:
      default             { Write-Warning $_ }
    }
  }
}

The function uses two ParameterSets, one for tokens of type StringExpandableToken, and one for the more generic type Token. When tokens are piped into the function, the Parameter Binder automatically picks the most appropriate parameter set.

If a StringExpandableToken is piped into Expand-PSOneToken, the token is emitted, and then the content of its property NestedTokens is piped into Expand-PSOneToken again. That takes care of all recursively nested tokens.

Be aware that NestedTokens can be empty. To avoid recursing on empty nested tokens, the code runs NestedTokens through Where-Object and uses the object itself as the filter condition. If it is $null, it gets filtered out.

If a regular Token is piped into Expand-PSOneToken, it is simply emitted without recursion.
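For comparison, the same recursion can be written with a single parameter and an explicit type check. This is an alternative sketch, not the module's implementation; the function name Expand-TokenSketch is hypothetical.

```powershell
function Expand-TokenSketch
{
  param
  (
    [Parameter(Mandatory,ValueFromPipeline)]
    [System.Management.Automation.Language.Token]
    $Token
  )

  process
  {
    # always emit the token itself
    $Token

    # recurse only into expandable strings that actually contain nested tokens
    if ($Token -is [System.Management.Automation.Language.StringExpandableToken] -and
        $Token.NestedTokens)
    {
      $Token.NestedTokens | Expand-TokenSketch
    }
  }
}

# parse sample code and unwrap all tokens
$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput('"Hello $host"', [ref]$tokens, [ref]$errors)

$expanded = $tokens | Expand-TokenSketch
$expanded | Select-Object -Property Kind, Text
```

The explicit type check is easier to follow, while the two parameter sets used by Expand-PSOneToken leave the type decision to the Parameter Binder.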

What’s Next

The new Parser provides much richer detail than the old PSParser but is still token-based. It is the Abstract Syntax Tree (AST) that takes individual tokens and forms higher structures from them. With the help of the AST, you can finally find more sophisticated information such as assignments, function definitions, and much more. That is what we’ll be looking at in the next part.
