Understanding the PowerShell Pipeline

PowerShell supports a powerful pipeline concept. Learn how the PowerShell pipeline works, and how you can pipeline-enable your own PowerShell functions.

Automation solutions rarely consist of a single command. More often, they combine a number of commands that represent the steps it takes to automate what you want.

PowerShell supports two ways of combining commands: you can either use classic variables to store the result of one command and feed it to a parameter of another, or you can use the modern streaming pipeline.

A pipeline is not the evolution of variables nor is it always the better choice. As you will see in this article, both approaches have distinct advantages and disadvantages and complement each other well.

Classic variables are similar to “downloading” data, which is fast but resource-intensive.

The pipeline works like “streaming” data: it provides first results almost instantly and minimizes resource usage, yet overall it is slower than using variables.
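To give you a first impression before we dive in, here is a minimal contrast of both approaches, using the Spooler service as an example; -WhatIf ensures nothing actually gets stopped:

# classic variables: store the result, then hand it to a parameter of the next command
$services = Get-Service -Name Spooler
Stop-Service -InputObject $services -WhatIf

# streaming pipeline: the same result without an intermediate variable
Get-Service -Name Spooler | Stop-Service -WhatIf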

Script blocks are the fundamental building blocks of pipeline-aware commands. In this article, I am focusing on how the PowerShell pipeline works and what its benefits and drawbacks are from a user’s perspective.

In an upcoming article, I’ll switch focus and look at the same topic from a developer’s perspective, showing how you can best support the PowerShell pipeline in your own functions and scripts.

Pipeline: Streaming Data

The PowerShell pipeline is a mechanism provided by the PowerShell engine to pass information from one command to another. That’s important to know: the commands that you use inside a pipeline are completely unaware that the pipeline even exists. The pipeline is a service of the engine that makes it easy to combine two or more commands:

PowerShell Pipeline

Thanks to the pipeline, commands can be combined without the need to use variables, and without manually passing values to parameters.

Many PowerShell code examples follow this scheme. Here is a PowerShell one-liner combining three commands to read the latest 20 errors and warnings from the system eventlog and output them to a text report, making sure that all text is visible and extra long text lines are wrapped instead of cut off:

Get-EventLog -LogName System -EntryType Error,Warning -Newest 20 | Format-Table -AutoSize -Wrap | Out-File -FilePath $env:temp\report.txt -Width 120

Commands Are Lone Wolves

Any command working in a pipeline is working in its own universe and knows nothing about the upstream and downstream commands in the pipeline. It receives its input via its regular parameters, just as if you had called the command stand-alone.

It is the PowerShell engine that takes the results from the upstream command and feeds them to the parameters of the following downstream command. So a pipeline is always just one option of many, and you could always rewrite a pipeline one-liner and have it use classic variables and individual command calls instead:

# get latest 20 errors and warnings from system eventlog:
$events = Get-EventLog -LogName System -EntryType Error, Warning -Newest 20

# format output to produce a table, do not cut off extra long lines but instead wrap them:
$formatting = Format-Table -AutoSize -Wrap -InputObject $events

# write output to file with a maximum width of 120 characters:
Out-File -FilePath $env:temp\report.txt -Width 120 -InputObject $formatting

# open produced output file in notepad editor:
notepad $env:temp\report.txt

As you immediately recognize, classic variables work like messengers and carry the results from one command to the next in a typical criss-cross pattern:

Variable Assignment

Parameter Binding

With classic variables, you decide which parameter of the following command receives the variable: you simply assign values manually to command parameters.

In a pipeline, it is the PowerShell engine that automatically binds the results from one command to the parameter(s) of the next. How does PowerShell know which parameter to pick?

The PowerShell parameter binder takes care of this and is probably one of the most complex parts of the PowerShell engine. It relies in part on information provided by the commands themselves. That’s why you cannot pipe data to just any command: a command needs to actively support pipeline input and declare which parameters can accept pipeline input, and when.
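If you want to watch the parameter binder do its work, PowerShell ships with Trace-Command. The following call (safe thanks to -WhatIf) prints every binding step to the host while the pipeline runs:

# trace how the string 'Spooler' gets bound to a parameter of Stop-Service:
Trace-Command -Name ParameterBinding -PSHost -Expression {
  'Spooler' | Stop-Service -WhatIf
}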

Expose Pipeline-Aware Parameters

Get-PipelineAwareParameter is a test function that exposes this parameter binding information. Specify the name of any valid command, and it returns the parameters that can accept pipeline input:

function Get-PipelineAwareParameter
{
  param
  (
    # Name of command to examine:
    [Parameter(Mandatory)]
    [string]
    $CommandName
  )
  
  # exclude common parameters from the report:
  $commonParameter = 'Verbose',
  'Debug',
  'ErrorAction',
  'WarningAction',
  'InformationAction',
  'ErrorVariable',
  'WarningVariable',
  'InformationVariable',
  'OutVariable',
  'OutBuffer',
  'PipelineVariable'
  
  # get command information:
  $commandInfo = Get-Command -Name $CommandName
  
  # count in how many parameter sets each parameter appears (parameters unique to one set identify that set):
  $hash = [System.Collections.Generic.Dictionary[[string],[int]]]::new()
  $CommandInfo.ParameterSets.Foreach{$_.Parameters.Foreach{$hash[$_.Name]++}}
  
  # look at each parameterset separately...
  $CommandInfo.ParameterSets | ForEach-Object {
    # ...list the unique parameters that are allowed in this parameterset:
    if ($_.IsDefault)
    {
      $parameters = '(default)'
    }
    else
    {
      $parameters = $_.Parameters.Name.Where{$commonParameter -notcontains $_}.Where{$hash[$_] -eq 1}.Foreach{"-$_"} -join ', '
    }
    
    # check each parameter in this parameterset...
    $_.Parameters | 
      # include only those that accept pipeline input:
      Where-Object { $_.ValueFromPipeline -or $_.ValueFromPipelineByPropertyName} |
      ForEach-Object {
        # if the parameter accepts pipeline input via object properties...
        if ($_.ValueFromPipelineByPropertyName)
        {
          # list the property names in relevant order:
          [System.Collections.Generic.List[string]]$aliases = $_.Aliases
          $aliases.Insert(0, $_.Name)
          $propertyNames = '$input.' + ($aliases -join ', $input.')
        }
        else
        {
          $propertyNames = ''
        }
        
        # return info about parameter:
        [PSCustomObject]@{
          # parameter name:
          Parameter = '-{0}' -f $_.Name
          # required data type
          Type = '[{0}]' -f $_.ParameterType
          # accepts pipeline input directly?
          ByValue = $_.ValueFromPipeline
          # reads property values?
          ByProperty = $propertyNames
          # list of parameters in this parameterset
          TriggeredBy = $parameters
        }
      }
  }
}

This is how you examine a command like Stop-Service with Get-PipelineAwareParameter:

Get-PipelineAwareParameter -CommandName Stop-Service

The result lists two pipeline-aware parameters: -InputObject and -Name:

Parameter   : -InputObject
Type        : [System.ServiceProcess.ServiceController[]]
ByValue     : True
ByProperty  : 
TriggeredBy : (default)

Parameter   : -Name
Type        : [System.String[]]
ByValue     : True
ByProperty  : $input.Name, $input.ServiceName
TriggeredBy : -Name
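
If you prefer built-in tooling, the help files expose the same binding information per parameter, provided the help content has been downloaded via Update-Help. Look for the “Accept pipeline input?” line:

# shows whether -Name accepts pipeline input, and how (ByValue, ByPropertyName):
Get-Help -Name Stop-Service -Parameter Name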

Binding Parameters Via Pipeline

Based on the information exposed by Get-PipelineAwareParameter, Stop-Service accepts pipeline input on two of its parameters:

  • -InputObject expects input of type [System.ServiceProcess.ServiceController[]] which represents one or more actual services, for example the results of Get-Service. These services can be processed directly (ByValue). This parameter belongs to the default parameter set.
  • -Name expects input of type [System.String[]] which represents one or more service names (strings). This parameter accepts both plain strings (ByValue) and input objects with the properties Name or ServiceName that contain strings.

Let’s test this and try some sample calls, feeding different information into Stop-Service. All of the below pipeline bindings work and produce the exact same result:

# using -InputObject and ServiceController objects (ByValue):
Get-Service -Name Spooler | Stop-Service -WhatIf

# using strings (ByValue):
'Spooler' | Stop-Service -WhatIf

# using objects with property "Name" (ByPropertyName)
[PSCustomObject]@{Name='Spooler'} | Stop-Service -WhatIf

# using objects with property "ServiceName" (ByPropertyName)
[PSCustomObject]@{ServiceName='Spooler'} | Stop-Service -WhatIf

Contracts: ISA and HASA

Assume for a second you want to cook apple pie. For this you obviously need apples. To enable others to provide you with apples, you can generally receive them via two routes:

  • ISA: Either you accept apples directly. That is a so-called ISA (or “is a”) contract. PowerShell calls this type of contract ByValue: you get exactly the value you are looking for. Stop-Service uses an ISA contract for the parameters -InputObject (which accepts ServiceController objects directly) and -Name (which accepts string arrays directly).
  • HASA: Of course, you can also accept an entire apple tree and pick the apples yourself. This is the so-called HASA (or “has a”) contract where you list the properties of an object that you’ll accept. PowerShell calls this type of contract ByPropertyName: you get a complex object and take the value from one of its properties. Stop-Service uses a HASA contract for its parameter -Name: it accepts any object and “picks” the service name from the properties Name or ServiceName, in this order.
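I’ll cover the developer side in the upcoming article, but here is a quick preview of how a function author declares both contracts; the function name is made up for illustration:

function Test-Contract
{
  param
  (
    # ISA/ByValue: accepts strings directly from the pipeline...
    # HASA/ByPropertyName: ...or picks the value from a "Name" property of incoming objects
    [Parameter(ValueFromPipeline,ValueFromPipelineByPropertyName)]
    [string[]]
    $Name
  )
  
  process { "received: $Name" }
}

# both calls bind 'Spooler' to -Name:
'Spooler' | Test-Contract
[PSCustomObject]@{Name='Spooler'} | Test-Contract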

Contracts Turn Commands Into Lego Pieces

The ISA and HASA pipeline contracts turn commands into flexible Lego pieces that can be combined in numerous ways as long as the results of one command meet the contracts of the following command. That’s why PowerShell is so insanely flexible, and all commands peacefully work together regardless of who created them. They all adhere to the same input contracts.

That’s coincidentally also the reason why PowerShell was initially code-named Monad: this code name is a tribute to Leibniz’ Monadology, a philosophical approach to explain the world: a Monad is a simple base unit, and Leibniz describes it as “one, has no parts and is therefore indivisible”. Anything substantial in the world is - according to this philosophy - composed of a combination of these Monads and can be recombined into other things.

In PowerShell’s ecosystem, pipeline-aware commands are “Monads”. The pipeline combines them to create something complex. The same basic commands (Monads) can produce different things when arranged differently. The Lego company picked up this concept as well.

Combining Commands Made Simple

Since Stop-Service happily accepts string arrays, you can use Get-Content (which produces string arrays) to read a text file with service names, and pipe these directly to Stop-Service:

$path = "C:\servicesToStop.txt"
Get-Content -Path $path | Where-Object { $_.Trim() } | Stop-Service -WhatIf

The example just adds Where-Object to filter out any blank lines.

Likewise, you can use the comma operator to produce string arrays yourself and pipe these to Stop-Service to stop all of them:

'wuauserv', 'spooler' | Stop-Service -WhatIf

Or you could dump the dependent services of a running service, which turn out to be rich ServiceController objects:

Get-Service -Name Winmgmt | Select-Object -ExpandProperty DependentServices
Status   Name               DisplayName                           
------   ----               -----------                           
Running  MBAMService        Malwarebytes Service                  
Stopped  NcaSvc             Network Connectivity Assistant        
Running  jhi_service        Intel(R) Dynamic Application Loader...
Running  iphlpsvc           IP Helper         

Since Stop-Service also accepts ServiceController objects, you could again pipe these dependent services directly to Stop-Service:

Get-Service -Name Winmgmt | Select-Object -ExpandProperty DependentServices | Stop-Service -WhatIf

No Mind Reader Included

The PowerShell pipeline does not come with a mind reader, and you can’t just combine commands and hope for the best. The pipeline contracts just make sure that commands can pass data to each other. This doesn’t necessarily mean that combining commands actually makes sense, even if the contract allows it.

You can for example technically pipe the results from Get-ChildItem to Stop-Service:

# works but produces errors:
Get-ChildItem -Path 'c:\windows' | Stop-Service -WhatIf

The contract defined by Stop-Service accepts objects with a property Name when this property contains strings. This is true for the file system objects provided by Get-ChildItem. Still there are of course no services with such names, so Stop-Service throws exceptions complaining it can’t find the services.

That’s just as if you feed a text file into Stop-Service using Get-Content (see above): PowerShell and the command contracts allow this; however, it is your responsibility to make sure the actual data makes sense.
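It is up to you to validate the data before it reaches the downstream command. One possible safeguard (of many) is to keep only those lines that actually correspond to an existing service:

# keep only lines that match an existing service before piping on:
Get-Content -Path "C:\servicesToStop.txt" |
  Where-Object { $_.Trim() } |
  Where-Object { Get-Service -Name $_ -ErrorAction Ignore } |
  Stop-Service -WhatIf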

Real-Time Processing

The PowerShell pipeline isn’t just a convenient way for users to take results from one command and automatically bind them to a parameter of another command. That’s fun and the basis of a lot of powerful one-liners, but the pipeline has a way more important technical feature: it is a highly efficient real-time streaming mechanism that saves resources and can speed up producing results.

As outlined in the intro, you can automate anything either by using variables or by using the pipeline. This is a choice you have, and you shouldn’t base this choice on how you feel today. There are technical reasons for when you should use variables and when a pipeline works better.

The following section helps you understand what the benefits and disadvantages are so you can make an educated choice the next time you write some PowerShell code.

Comparing To Classic Variables

When you use variables, your script strictly executes one command after the next. Not only do you have to wait for all data to be collected, you also need to keep all of that data in memory. Here is a classic example that looks for all log files in your Windows folder and displays only the files that have changed within the past 12 hours:

# get all log files
$logFiles = Get-ChildItem -Path $env:windir -Recurse -Filter *.log -Include *.log -File -ErrorAction Ignore

# filter those that have changed in the past 12 hours:
$12HoursAgo = (Get-Date).AddHours(-12)
$latestChanges = foreach($file in $logFiles)
{
  if ($file.LastWriteTime -gt $12HoursAgo)
  {
    # return file
    $file
  }
}

# display files
$latestChanges | Out-GridView -Title 'LogFiles With Recent Changes'

When you run this, you’ll notice that it takes forever. PowerShell first searches the entire Windows folder for log files, then filters them for the ones with recent changes, and only then displays them. Plus, when you keep an eye on a resource monitor, you’ll see your script burn a lot of memory because $logFiles keeps all files in memory.
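If you want to see the memory effect for yourself, here is a rough (and admittedly simplified) way to snapshot the managed memory before and after the collection step; exact numbers vary from system to system:

# rough illustration only: managed memory before and after collecting all results
$before = [GC]::GetTotalMemory($true)
$logFiles = Get-ChildItem -Path $env:windir -Recurse -Filter *.log -Include *.log -File -ErrorAction Ignore
$after = [GC]::GetTotalMemory($true)
'{0:N1} MB held after collecting into $logFiles' -f (($after - $before) / 1MB)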

Pipeline Produces Faster Results

If you do the same with the pipeline, the code is shorter, but you’ll also get results much quicker. Actually, you receive results in real-time: the moment Get-ChildItem discovers a suitable log file, it is already displayed by Out-GridView, and you can watch the file list in the grid view grow as more results come in:

# filter those that have changed in the past 12 hours:
$12HoursAgo = (Get-Date).AddHours(-12)

Get-ChildItem -Path $env:windir -Recurse -Filter *.log -Include *.log -File -ErrorAction Ignore |
  Where-Object LastWriteTime -gt $12HoursAgo |
  Out-GridView -Title 'LogFiles With Recent Changes'

Variables Are Faster Overall

Even though the pipeline appears to be faster because first results appear in real-time, if you measure the total time, variables win. The time penalty you pay for the pipeline grows exponentially with the number of passed objects:

# with a few objects, both approaches work equally well:
Get-Random -InputObject (1..49) -Count 7
1..49 | Get-Random -Count 7

# piping many objects exposes an exponential time penalty:
$manyObjects = 1..10000000
# this is still fast:
Get-Random -InputObject $manyObjects -Count 7
# very slow now:
$manyObjects | Get-Random -Count 7

There is a reason for this, and part of the time is caused by a long-standing PowerShell bug that bites both Windows PowerShell and PowerShell 7. You can read more about it, and even work around it to remove the pipeline time penalty.
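If you want to quantify the difference on your own machine, wrap both variants in Measure-Command; the exact numbers depend on your PowerShell version and hardware:

$manyObjects = 1..1000000

# variable approach:
(Measure-Command { Get-Random -InputObject $manyObjects -Count 7 }).TotalMilliseconds

# pipeline approach:
(Measure-Command { $manyObjects | Get-Random -Count 7 }).TotalMilliseconds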

Executing Many Commands At The Same Time

As you have seen with the log file example, the PowerShell pipeline manages to execute all commands inside the pipeline at the same time: while Get-ChildItem was still producing results, Out-GridView could already display some.

begin, process, end

The pipeline is not doing parallel processing, though. It just cleverly interweaves the code of all commands. That’s why pipeline-aware PowerShell commands can provide up to three different code blocks: begin, process, and end:

  • begin: This code executes once before the pipeline starts to process data. Code can be used to initialize things that only need to be set up once.
  • process: This code executes repeatedly like a loop, once per incoming pipeline object.
  • end: This code executes once after all pipeline elements have been processed. Code can be used to clean up things, e.g. delete temporary files.
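Here is a minimal sketch (the function name is made up) showing a typical division of labor between the three blocks: initialize a counter in begin, update it once per object in process, and emit a summary in end:

function Measure-PipelineInput
{
  param
  (
    [Parameter(ValueFromPipeline)]
    [object]
    $InputObject
  )
  
  begin   { $count = 0 }                   # runs once before any input arrives
  process { $count++ }                     # runs once per incoming object
  end     { "processed $count objects" }   # runs once after the last object
}

1..5 | Measure-PipelineInput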

So when you combine three commands in a pipeline…

PowerShell Pipeline

PowerShell recombines the code from all three commands and executes the code in this fashion:

PowerShell Pipeline

The process block is repeated for each object travelling through the pipeline. The illustration shows two objects travelling through the pipeline.

To better understand how the three code sections play together, use Test-Pipeline. It is a simple pipeline-aware PowerShell function that visualizes when and how often the three code sections are running:

function Test-Pipeline
{
  param
  (
    [Parameter(Mandatory)]
    [string]
    $Name,
    
    [Parameter(ValueFromPipeline)]
    [object]
    $InputObject
  )
  
  begin   { Write-Host "BEGIN   ${Name}" -ForegroundColor Green }
  process 
  { 
    Write-Host "PROCESS ${Name}: $InputObject" -ForegroundColor DarkYellow 
    # pass received object on to next command:
    $InputObject
  }
  end     { Write-Host "END     ${Name}" -ForegroundColor Red }
}

As you will discover in just a moment, the PowerShell pipeline works like a huge loop and processes only one object at a time. So when a command emits a number of results, the pipeline operator takes only the first result object and guides it through the process blocks of all commands. The next result object enters the pipeline only when the previous result object has been completely processed and returned.

This explains both why results become visible so quickly (as they are produced), and why the pipeline requires only very little memory (only one object is held in memory at any given time).

Processing One Pipeline Element

Let’s get practical: combine a number of calls to Test-Pipeline in a pipeline:

1 | Test-Pipeline -Name A | Test-Pipeline -Name B | Test-Pipeline -Name C 

The result clearly shows how PowerShell combines the code of all three commands:

BEGIN   A
BEGIN   B
BEGIN   C
PROCESS A: 1
PROCESS B: 1
PROCESS C: 1
1
END     A
END     B
END     C

  • First, the begin sections of all three commands execute.
  • Next, the process sections handle the input data, and once it has passed through all three commands, the result is returned.
  • Finally, the end sections of all three commands execute.

Processing Many Pipeline Elements

Now check out what happens when you submit more than one piece of data to the pipeline:

1..3 | Test-Pipeline -Name A | Test-Pipeline -Name B | Test-Pipeline -Name C 

This time, three objects float through the pipeline:

BEGIN   A
BEGIN   B
BEGIN   C
PROCESS A: 1
PROCESS B: 1
PROCESS C: 1
1
PROCESS A: 2
PROCESS B: 2
PROCESS C: 2
2
PROCESS A: 3
PROCESS B: 3
PROCESS C: 3
3
END     A
END     B
END     C

  • Again, begin executes only once for all three commands.

  • This time, process processes each incoming data object, so there is only one instance of a data object processed inside the pipeline at any given time. This reduces the memory footprint, and as you can see, processed results are emitted in real-time, while all commands are still busy producing and processing data.

  • At the end, end executes once for all three commands.

So a pipeline can process as many objects as you want. The process block acts like a loop and repeats for each incoming pipeline object.

Processing NULL Values

When a pipeline command emits no results, the result looks different. I am using Get-Service with a service name that does not exist to simulate a command that does not emit any results:

Get-Service -Name None -ErrorAction Ignore | Test-Pipeline -Name A | Test-Pipeline -Name B | Test-Pipeline -Name C

The result shows that this time the process block is skipped:

BEGIN   A
BEGIN   B
BEGIN   C
END     A
END     B
END     C

Which makes total sense: the begin block executes before any pipeline data is processed, so PowerShell cannot yet know whether any results will need to be processed. The process block processes each incoming object, so when no results are fed to the pipeline commands, this block never executes. The end block then executes and does all necessary cleanup.

There is a strange phenomenon that occasionally causes confusion: when you pipe $null to a command, this null value is processed by the pipeline:

$null | Test-Pipeline -Name A | Test-Pipeline -Name B | Test-Pipeline -Name C
BEGIN   A
BEGIN   B
BEGIN   C
PROCESS A: 
PROCESS B: 
PROCESS C: 
END     A
END     B
END     C

Compare this result to a slightly different call:

@() | Test-Pipeline -Name A | Test-Pipeline -Name B | Test-Pipeline -Name C 

This time, the process block is skipped. When you pipe an array, PowerShell unwraps it and feeds each array element to the commands, one at a time. An empty array has no elements, so nothing is fed into the pipeline, and process never runs.

So $null (or any other variable with a null content) really is a special case: while this variable represents nothing and can be used to identify undefined or empty values…

$undefined -eq $null

…the instance of $null itself (or any empty variable) is not nothing. Therefore, a true nothing is not carried through the pipeline, but the physical instance of $null is, even multiple times:

$null, $null, $null | Test-Pipeline -Name A | Test-Pipeline -Name B | Test-Pipeline -Name C

# assuming the three variables are undefined:
$undefined1, $undefined2, $undefined3 | Test-Pipeline -Name A | Test-Pipeline -Name B | Test-Pipeline -Name C

If you wanted to exclude empty values from passing through the pipeline, either make the pipeline-aware parameter mandatory (see the sketch below), or explicitly exclude null values:

$null | Where-Object { $_ -ne $null } | Test-Pipeline -Name A | Test-Pipeline -Name B | Test-Pipeline -Name C
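
The mandatory route looks like this; the function is a stripped-down variant of Test-Pipeline (the name is made up). Once the pipeline parameter is mandatory, PowerShell refuses to bind a null value and raises a binding error instead of running the process block:

function Test-PipelineStrict
{
  param
  (
    # Mandatory causes null pipeline input to fail parameter binding:
    [Parameter(Mandatory,ValueFromPipeline)]
    [object]
    $InputObject
  )
  
  process { "processing: $InputObject" }
}

# fails with a binding error instead of processing the null value:
$null | Test-PipelineStrict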

Using Commands Stand-Alone

Let’s finally see how Test-Pipeline behaves when you run it outside the pipeline as a stand-alone command:

Test-Pipeline -Name A -InputObject 1

Outside the pipeline, the command still works, and the three code blocks are executed in consecutive order:

BEGIN   A
PROCESS A: 1
1
END     A

Guidelines

So when should you use a PowerShell pipeline, and when would you want to use classic variables? Of course, this is to a large extent simply a matter of taste, but there are also a few non-negotiable hard facts to consider:

The Pipeline benefits are:

  • Save Resources: when you need to process a lot of complex objects, streaming them through a pipeline saves memory because only one object needs to be held in memory at any given time. This assumes of course that you did not save results in variables first: starting a pipeline with a variable instead of a command destroys this advantage because you already burned the memory (see the sketch after this list).
  • Fast User Responses: if a command takes a long time to produce the complete set of results, streaming them through a pipeline emits results in real-time as they become available. The user gets the fastest-possible feedback time. Without a pipeline, you’d have to wait for all results to be collected before you could go on and present results to the user.
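
Here is a short sketch of the first point (Save Resources); the 100 MB size filter is arbitrary:

# streaming: only one file object is in flight at any given time
Get-ChildItem -Path $env:windir -Recurse -ErrorAction Ignore |
  Where-Object Length -gt 100MB

# collecting first: the complete result set sits in memory before filtering even starts
$allFiles = Get-ChildItem -Path $env:windir -Recurse -ErrorAction Ignore
$allFiles | Where-Object Length -gt 100MB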

The Variable benefits are:

  • Reusable: Once you save data to variables, the data is reusable and can be read as often as you want. The original data that is streamed through a pipeline is not available for other purposes, and you’d have to execute the command again that produced the original data.
  • Faster Overall: If total speed matters to you, using variables and avoiding the pipeline can speed up the total execution time tremendously. This is largely due to a bug in PowerShell that can seriously slow down pipeline processing in some scenarios. You can work around this bug and make the pipeline execute just as quickly as the variable approach, though.