Make Loops Stream

Traditional looping constructs like “foreach” and “do...while” cannot stream: you have to wait until the loop has finished before you can work with any of its results. With a simple trick, you can add streaming.

By embedding traditional loops into a ScriptBlock you get real-time streaming. This way, you can process the results immediately as they become available, and add an async touch to your scripts.

Let’s go step-by-step and examine first the benefits of streaming, then look at when it makes sense to add streaming to classic loops.

Why is Streaming Important?

The PowerShell Pipeline has built-in real-time streaming, so you receive results when they are created:

Get-ChildItem -Path c:\windows -Recurse -Filter *.log -File -ErrorAction SilentlyContinue |
ForEach-Object { 'processing {0}' -f $_.FullName } |
Out-GridView

While Get-ChildItem is traversing your Windows folder, the pipeline starts emitting results as they become available. You can see this in Out-GridView: it is adding new lines as they are emitted by Get-ChildItem.

This makes a script very responsive (a user sees initial results very quickly) and saves a lot of memory (only one file object at a time needs to be accommodated in memory).
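
To make the timing visible without scanning a whole drive, here is a minimal sketch. The slow producer (a 200 ms delay per item) and the names $stopwatch and $report are assumptions for illustration only. The second ForEach-Object records when each item reaches it:

```powershell
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()

# first ForEach-Object: slow producer, emits one number every 200 ms
# second ForEach-Object: consumer, notes when each number arrived
$report = 1..5 |
    ForEach-Object { Start-Sleep -Milliseconds 200; $_ } |
    ForEach-Object {
        [PSCustomObject]@{
            Item           = $_
            ArrivedAfterMs = $stopwatch.ElapsedMilliseconds
        }
    }

$report
```

The arrival times should increase step by step rather than being identical, which indicates that each item was handed downstream as soon as it was produced.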

Classic Loops Lack Streaming

PowerShell also supports all classic loop constructs such as foreach and do...while. While they perform better overall, they do not support streaming and return their results only when everything is processed:

$files = Get-ChildItem -Path c:\windows -Recurse -Filter *.log -File -ErrorAction SilentlyContinue

$results = foreach($_ in $files)
{ 'processing {0}' -f $_.FullName }

$results | Out-GridView

Now you have to wait a long time for Get-ChildItem to produce the results. Once they are in, foreach can process the data in $files very fast, and the overall run time of this script is shorter than with the pipeline approach. However, to a user this approach appears much slower because there is a long wait until the first results appear.

You can see this in Out-GridView: it opens only after a long delay, then shows all results almost instantly.
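
The order of events can be made visible with a small log. This is only a sketch: the list $log and the numbers 1..3 stand in for real data and are not part of the original example. The loop finishes completely before the consumer sees the first item:

```powershell
$log = [System.Collections.Generic.List[string]]::new()

# classic loop: all results are collected in $results first
$results = foreach ($number in 1..3)
{
    $log.Add("produced $number")
    $number
}

# the consumer can only start after the loop has finished
$results | ForEach-Object { $log.Add("consumed $_") }

$log
# produced 1
# produced 2
# produced 3
# consumed 1
# consumed 2
# consumed 3
```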

Overview: Streaming vs. Variables

If you must use a classic loop construct like foreach or do...while (and there are good reasons for it), you normally have to save its results to a variable before you can pass them on. However, you can add streaming behavior simply by embedding the loop into a ScriptBlock. ScriptBlocks support streaming by default.

Let’s first look at why that might be a good idea.

Passing Results in Real-Time

I rewrote the code and embedded the loop in a ScriptBlock. No longer do I need to assign the results of foreach to a variable like $results. Instead, the enclosing ScriptBlock can stream the results in real-time to downstream commands such as Out-GridView:

$files = Get-ChildItem -Path c:\windows -Recurse -Filter *.log -File -ErrorAction SilentlyContinue 

& {
foreach($_ in $files)
{ 'processing {0}' -f $_.FullName }
} | Out-GridView

It works but isn’t any faster and still has a long initial delay. This comes as no surprise: it just takes so much time for Get-ChildItem to gather all the data in $files in the first place. So when foreach starts to do something (and in fact now outputs results in real-time via streaming), the initial delay has already taken place.

That’s why it is important to first understand when streaming can help you, and when you shouldn’t bother considering it:

Classic Loops Must Have All Data

There is no way to work around the fact that classic loops like foreach and do...while can only start running when they have all data already present in some variable. If you want to change that, you must rewrite your code and use the PowerShell Pipeline and ForEach-Object instead.

So classic loops only make sense when the data is already present in some variable, or is emitted directly from some command.

A “Real” Streaming Example

To see streaming in action, let’s assume the data is already present in some variable $files, and also add a Start-Sleep to the code to pretend it is doing something very expensive with the data:

# $files is supposed to be filled with data already
$result = foreach($_ in $files)
{ 
    # artificially slowing things down a bit
    'processing {0}' -f $_.FullName 
    Start-Sleep -Seconds 1
} 

$result | Out-GridView

You now have to wait a very long time for the results to appear because foreach returns its result only when it has processed all data.

“That’s not true!”, you may argue. When you remove Out-GridView from the example above, foreach does return data in real-time as it becomes available. Please look again: we are looking at returning data so that your script can do something with it. foreach returns its result only when it has completed the entire loop. What you are seeing when you remove Out-GridView is not what we are talking about here: whenever you output data directly to the console (not assigning to a variable, not piping to another command), PowerShell emits the results to the console immediately as they become available.

Enabling Streaming

Now let’s turn on streaming for the foreach loop by embedding the loop into a ScriptBlock:

& {
    # $files is supposed to be filled with data already
    foreach($_ in $files)
    { 
        # artificially slowing things down a bit
        'processing {0}' -f $_.FullName 
        Start-Sleep -Seconds 1
    }
} | Out-GridView

This time, the results are passed on to Out-GridView in real-time, without having to first collect them in a variable. The user gets first results instantaneously, and a programmer could start filtering out data immediately to conserve memory.
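
Here is a minimal sketch of what this means for the order of events and for early filtering. The list $log and the sample numbers are assumptions for illustration. In the first pipeline, producing and consuming alternate. In the second, Select-Object -First stops the loop once it has enough items:

```powershell
$log = [System.Collections.Generic.List[string]]::new()

# loop embedded in a scriptblock: each result is passed on immediately
& {
    foreach ($number in 1..3)
    {
        $log.Add("produced $number")
        $number
    }
} | ForEach-Object { $log.Add("consumed $_") }

$log
# produced 1
# consumed 1
# produced 2
# consumed 2
# produced 3
# consumed 3

# a downstream command can also end the loop early
$seen = [System.Collections.Generic.List[int]]::new()
$firstTwo = & {
    foreach ($number in 1..100)
    {
        $seen.Add($number)
        $number
    }
} | Select-Object -First 2

$seen.Count   # 2: the loop never reached item 3
```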

When Adding Streaming Makes Sense

After chewing on the code for a while, you’ll soon realize that embedding foreach inside a scriptblock isn’t very useful: you could have used a scriptblock in the first place and abandoned foreach altogether:

# $files is supposed to be filled with data already
$files | & { 
    process
    {
        # artificially slowing things down a bit
        'processing {0}' -f $_.FullName 
        Start-Sleep -Seconds 1
    }
} | Out-GridView

All you need to do is place your looping code inside a process{} block so it gets repeated for each incoming pipeline element. So where does adding streaming make sense?
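
As a side note, a scriptblock can carry begin{} and end{} blocks next to process{}. The following sketch (the counter $count and the sample numbers are made up for illustration) shows that process{} runs once per incoming element, while begin{} and end{} each run once:

```powershell
$summary = 1..3 | & {
    begin   { $count = 0 }                    # runs once, before the first element
    process { $count++; "item $_" }           # runs for every pipeline element
    end     { "processed $count items" }      # runs once, after the last element
}

$summary
# item 1
# item 2
# item 3
# processed 3 items
```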

Do...While Loops Do Matter

There is one family of loops that can’t easily be replaced with a pipeline: while and do...while. These loops are special because they decide anew before (while) or after (do...while) each iteration whether they should iterate again. This is often used for reading database records or file content until the data source reports End of File.

You can’t easily convert such loops to a pipeline because typically you don’t have data to start the pipeline with. Instead, the loop itself produces the data.

With the trick above, you can keep using while and do...while loops and just enable streaming by embedding the loop into a ScriptBlock.
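
A minimal sketch of that idea, using an in-memory queue as a stand-in for a data source that the loop drains itself. The queue, the job names, and $log are assumptions for illustration only:

```powershell
# hypothetical data source: a queue with three work items
$queue = [System.Collections.Generic.Queue[string]]::new()
foreach ($job in 'job1', 'job2', 'job3') { $queue.Enqueue($job) }

$log = [System.Collections.Generic.List[string]]::new()

& {
    do
    {
        # the loop itself produces the data
        $item = $queue.Dequeue()
        $log.Add("dequeued $item")
        $item
    } while ($queue.Count -gt 0)   # checked after each iteration
} | ForEach-Object { $log.Add("handled $_") }

$log
# dequeued job1
# handled job1
# dequeued job2
# handled job2
# dequeued job3
# handled job3
```

Because do...while checks its condition only after the first pass, this sketch assumes the queue holds at least one item.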

Here are two real-world scenarios:

Reading Large Files

Assume you want to read large log files with maximum control over the read process. Here is an example:

# take a text file to play with
# replace with path to any text file you want
# ensure the file exists:
$Path = "$pshome\types.ps1xml"

[IO.StreamReader]$reader = [System.IO.StreamReader]::new($Path)
$result = while (-not $reader.EndOfStream)
{
    # read current line
    $reader.ReadLine()

    # add artificial delay to pretend this was a HUGE file
    Start-Sleep -Milliseconds 300
}

# close and dispose the streamreader properly:
$reader.Close()
$reader.Dispose()

$result | Out-GridView

I added an artificial delay to the loop to mimic reading a really huge file. When you run the code, a StreamReader reads the file line by line. The While loop checks at the beginning of each iteration whether all lines have been read.

There are plenty of ways to read text files, and PowerShell sports its own Get-Content which is really simple to use. I chose to use a StreamReader here solely to have a use case where checking some End of File property is required.

Since While does not support streaming, all results must be stored in a variable like $result first, and the user sees the result only when all lines have been read. With large files (or thanks to the artificial delay in the example) this can take a very long time.

Adding Streaming

Let’s add real-time streaming to While by embedding the loop into a ScriptBlock:

# take a text file to play with
# replace with path to any text file you want
# ensure the file exists:
$Path = "$pshome\types.ps1xml"

[IO.StreamReader]$reader = [System.IO.StreamReader]::new($Path)

# embed loop in scriptblock:
& {
    while (-not $reader.EndOfStream)
    {
        # read current line
        $reader.ReadLine()
    
        # add artificial delay to pretend this was a HUGE file
        Start-Sleep -Milliseconds 10
    }
# process results in real-time as they become available:
} | Out-GridView

# close and dispose the streamreader properly:
$reader.Close()
$reader.Dispose()

When you run this, the results are emitted to Out-GridView immediately. There is no need anymore to store all results in a variable like $result, and a scripter could add filters like Where-Object to pick out the useful lines right away.
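
Such a filter could look like the following sketch. The temporary sample file, its content, and the variable $errorLines are assumptions made so the example is self-contained. The try/finally block is a design choice that is not part of the original example: it makes sure the reader is disposed even if the pipeline fails or is stopped early:

```powershell
# create a small sample file as a stand-in for a large log
$path = [System.IO.Path]::GetTempFileName()
'INFO start', 'ERROR disk full', 'INFO running', 'ERROR timeout' |
    Set-Content -Path $path

$reader = [System.IO.StreamReader]::new($path)
try
{
    # each line is filtered as soon as it has been read
    $errorLines = & {
        while (-not $reader.EndOfStream)
        {
            $reader.ReadLine()
        }
    } | Where-Object { $_ -like 'ERROR*' }
}
finally
{
    # runs even if the pipeline fails or is stopped early
    $reader.Dispose()
    Remove-Item -Path $path
}

$errorLines
# ERROR disk full
# ERROR timeout
```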

Reading Databases

When you execute a SQL statement on a database, you receive records one by one until the database returns EOF (End-of-File). PowerShell does not know in advance how often the loop is going to iterate, which is why a head-controlled While loop is used.

There are plenty of ways to connect to a database and read its content. I chose an approach via COM objects just to come up with an example where it is required to check an EOF (End-of-File) property in a While loop. I am sure there are smarter ways to read databases.

Querying Database via SQL

The next script connects to a local SQL Server instance and lists all databases on that instance by querying sys.databases. If you’d like to connect to a different instance or database, change $InstanceId, $Database, or $connectionString accordingly.

# define your database details:
$InstanceId = "$env:computername\FIRSTDB"
$Database = "master"
$connectionString = "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=$Database;Data Source=$InstanceId"
$sql = "select * from sys.databases"

# connect to database
$connection = New-Object -ComObject ADODB.Connection
$connection.Open($connectionString)
$rs = $connection.Execute($sql)

# loop through records
$result = while ($rs.Eof -eq $false)
{
  # turn each record into an object:
  $hash = [ordered]@{}   # ordered, so columns keep the field order
  foreach($field in $rs.Fields)
  {
    $hash[$field.Name] = $field.Value
  }
  [PSCustomObject]$hash

  $rs.MoveNext()
}

# close database
$rs.Close()
$connection.Close()

# emit results
$result | Out-GridView

Since While loops can’t stream, you get the results only when all records have been processed. For a small system view such as sys.databases, this delay does not matter, but when you are querying real tables with thousands of records, it does.

Adding Streaming

Now let’s add just a tiny bit of code to make the While loop stream. This way, you get the results from your database in real-time as they are retrieved:

# define your database details:
$InstanceId = "$env:computername\FIRSTDB"
$Database = "master"
$connectionString = "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=$Database;Data Source=$InstanceId"
$sql = "select * from sys.databases"

# connect to database
$connection = New-Object -ComObject ADODB.Connection
$connection.Open($connectionString)
$rs = $connection.Execute($sql)

# loop through records
# embed code in scriptblock
& {
    while ($rs.Eof -eq $false)
    {
      # turn each record into an object:
      $hash = [ordered]@{}   # ordered, so columns keep the field order
      foreach($field in $rs.Fields)
      {
        $hash[$field.Name] = $field.Value
      }
      [PSCustomObject]$hash

      $rs.MoveNext()
    }
# emit the results in real-time to the next command
# i.e. Out-GridView:
} | Out-GridView

# close database
$rs.Close()
$connection.Close()

By embedding the loop inside a ScriptBlock, you can stream its results directly to Out-GridView in real-time. There is no need to hog memory in $result, and the user gets immediate feedback.
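
If you have no SQL Server instance at hand, the same pattern can be tried with an in-memory table. This sketch swaps the ADODB recordset for a System.Data.DataTable and its data reader. The table content and the variable names are made up for illustration:

```powershell
# in-memory stand-in for a database table (sample data is hypothetical)
$table = [System.Data.DataTable]::new('Databases')
[void]$table.Columns.Add('name', [string])
[void]$table.Columns.Add('database_id', [int])
[void]$table.Rows.Add('master', 1)
[void]$table.Rows.Add('tempdb', 2)
[void]$table.Rows.Add('model', 3)

$reader = $table.CreateDataReader()
try
{
    $names = & {
        # Read() advances to the next record and returns $false at the end
        while ($reader.Read())
        {
            # turn each record into an object:
            $record = [ordered]@{}
            for ($i = 0; $i -lt $reader.FieldCount; $i++)
            {
                $record[$reader.GetName($i)] = $reader.GetValue($i)
            }
            [PSCustomObject]$record
        }
    } |
        Where-Object database_id -gt 1 |
        ForEach-Object -MemberName name
}
finally
{
    $reader.Dispose()
}

$names
# tempdb
# model
```

Each record is filtered while the loop is still reading, so records that are not needed never pile up in a variable.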