Reading and Writing Excel Files

Excel files are a popular data source. With the help of the free module ImportExcel, PowerShell supports reading and writing .xlsx files.

Excel files (.xlsx) are a very important data exchange format for a number of reasons:

  • Human Readable: Excel files can easily be opened and read by non-IT staff. It is trivial to browse the data or make changes.
  • Type Support: Excel files support basic data types like string, dates, and numeric values
  • Not Platform-Specific: You can exchange excel data across platforms and locales. Unlike with text-based formats, encoding and special character support are no issue

PowerShell does not come with native support for .xlsx files though. That’s why previously users resorted to exporting excel data to csv, then use Import-Csvto read the exported data into PowerShell.

This workaround produces extra work and has a number of other disadvantages. Thanks to the free module ImportExcel, going the extra route via csv is not required anymore. You now can directly read and write .xlsx data. Microsoft Office is not required.

In this article, you’ll learn how to read and write .xlsx and .xlsm files in just a line of code. Plus I provide you with Convert-XlsToXlsx, a clever function that auto-converts .xls files to .xlsx and .xlsm file types. That’s important because ImportExcel can only deal with the modern .xlsx and .xlsm file types. The older .xls excel files use a proprietary binary format that only excel knows how to read.

Convert-XlsToXlsx may be highly useful in its own right when you need to bulk-convert older excel files to modern formats.

It also illustrates how to access the excel object model, and more importantly, how to release COM objects so you don’t end up with memory leaks and ghost processes.

Adding Excel Support to PowerShell

Thanks to Doug Finke and his awesome free module ImportExcel, reading and writing .xlsx files is a snap now - no Office installation required. Simply download and install this free module from the PowerShell Gallery:

Install-Module -Name ImportExcel -Scope CurrentUser -Force

If you have Administrator privileges at hand, you might want to install the module for All Users instead. This makes sure the module is available for all users but more importantly, it makes the module available for both in Windows PowerShell and PowerShell 7.

Install-Module -Name ImportExcel -Force

When you install modules in the scope CurrentUser, modules are available only for the PowerShell edition you used to do the install, so you would have to potentially install the module twice in different locations.

Reading And Writing Excel Files

The two most important cmdlets from this module are:

  • Import-Excel: takes a path to a .xlsx file and returns all data from the default worksheet. Use the parameter -WorksheetName to specify a given worksheet. Example:

    # import excel file and show in gridview (make sure file exists!)
    $Path = "c:\path\to\some\excel.xlsx"
    Import-Excel -Path $Path | Out-GridView
    
  • Export-Excel: saves all piped data to a *.xlsx file. Use the parameter -WorksheetName to specify a given worksheet. By default, existing data on the worksheet will be overwritten. Example:

    # create am excel sheet with all local user accounts
    Get-LocalUser | Export-Excel
    

Playing With Sample Data

Let’s play with the new excel commands! Writing excel files is simple: pipe data to Export-Excel to create new excel files:

$Path = "$env:temp\listOfServices.xlsx"
Get-Service | Export-Excel -Path $Path -AutoSize -AutoFilter -FreezeTopRow -BoldTopRow -ClearSheet -WorksheetName 'List of Services' -Show

To play with Import-Excel, let’s retrieve some real-world sample data files first.

Downloading Sample Data

Finding excel sample data is easy: just google for Download Excel Sample Data to come up with urls. They come as individual files and ZIP archives. To make downloading a pleasant experience, I created a bunch of helper functions.

To download files, simply use Download-File and Download-Zip:

# use TLS1.2 with HTTPS:
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

# creates folder if it does not yet exist:
filter Assert-FolderExists
{
  $exists = Test-Path -Path $_ -PathType Container
  if (!$exists) { 
    Write-Warning "$_ did not exist. Folder created."
    $null = New-Item -Path $_ -ItemType Directory 
  }
}

# download, unblock and extract zip files
filter Download-Zip($Path)
{
  # download to temp file:
  $temp = "$env:temp\temp.zip"
  Invoke-WebRequest -Uri $_ -OutFile $temp
  # unblock:
  Unblock-File -Path $temp
  # extract archive content:
  Expand-Archive -Path $temp -DestinationPath $Path -Force
  
  # report
  $zip = [System.IO.Compression.ZipFile]::OpenRead($temp)
  $zip.Entries | ForEach-Object { Write-Warning "Download: $_" }
  $zip.Dispose()
  
  # remove temp file:
  Remove-Item -Path $temp
}

# test whether filename is valid:
function Test-ValidFileName($FileName)
{
  $FileName.IndexOfAny([System.IO.Path]::GetInvalidFileNameChars()) -eq -1
}

# download and unblock file:
filter Download-File($Path, $FileName)
{
  # does the url specify a filename?
  if ([string]::IsNullOrWhiteSpace($FileName))
  {
    # take filename from url:
    $FileName = $_.Split('/')[-1]
    # remove url parameters:
    $FileName = $FileName.Split('?')[0]
    # test for valid file name:
    $isValid = Test-ValidFileName -FileName $FileName
    if (!$isValid)
    {
      throw "Url contains no valid file name. $FileName is not valid. Use parameter -FileName to specify a valid filename."
    }
  }
  
  $filePath = Join-Path -Path $Path -ChildPath $FileName
  Invoke-WebRequest -Uri $_ -OutFile $filePath
  # unblock:
  Unblock-File -Path $Path
  
  Write-Warning "Download: $FileName"
}

# create local folder for downloaded files:
($OutPath = "$env:temp\excelsampledata") | Assert-FolderExists

# download various excel sample files:
'https://www.contextures.com/SampleData.zip' | Download-Zip -Path $OutPath
'https://go.microsoft.com/fwlink/?LinkID=521962' | Download-File -Path $OutPath -FileName financial.xlsx
'http://www.principlesofeconometrics.com/excel/theories.xls' | Download-File -Path $OutPath 
'http://www.principlesofeconometrics.com/excel/food.xls' | Download-File -Path $OutPath 
'https://www.who.int/healthinfo/statistics/whostat2005_mortality.xls?ua=1' | Download-File -Path $OutPath 
'https://www.who.int/healthinfo/statistics/whostat2005_demographics.xls?ua=1' | Download-File -Path $OutPath 

When you run this code, it downloads a bunch of excel sample files:

WARNING: Download: SampleData.xlsx
WARNING: Download: financial.xlsx
WARNING: Download: theories.xls
WARNING: Download: food.xls
WARNING: Download: whostat2005_mortality.xls
WARNING: Download: whostat2005_demographics.xls

Reading Excel Files

To read data directly from excel files, use Import-Excel. For example, to get the financial data for December only, try this:

# path with excel files
# (assuming you downloaded the sample data as instructed before)
Set-Location -Path "$env:temp\excelsampledata"

Import-Excel -Path .\financial.xlsx | Where-Object 'Month Number' -eq 12 | Out-GridView

By default, Import-Excel reads data from the first worksheet. If your file contains more than one worksheet, use the parameter -WorksheetName to specify its name.

To group the countries for December, simply use the common PowerShell pipeline cmdlets:

Obviously, you can do this with excel directly as well. This is about automation (in case you need to do these kinds of analysis regularly), and it is for PowerShell home boys who may not know how to pivot in excel but do know their tools in PowerShell.

And it is about learning: there is no better way to learn the PowerShell pipeline cmdlets!

# path with excel files
# (assuming you downloaded the sample data as instructed before)
Set-Location -Path "$env:temp\excelsampledata"

Import-Excel -Path .\financial.xlsx | Where-Object 'Month Number' -eq 12 | Group-Object -Property Country -NoElement | Sort-Object -Property Count -Descending

Here is the result:

Count Name                     
----- ----                     
   21 Germany                  
   21 United States of America 
   21 Canada                   
   21 France                   
   21 Mexico  

Accessing XLS Files

The bad news is: .xls files cannot be accessed. They use a proprietary binary format that can only be read by excel.

The good news is: provided you have excel installed, it is trivial to convert .xls files to .xlsx files. If you are really still using .xls files, you should consider this transform for good. .xls is really outdated and should no longer be used.

Converting XLS To XLSX

Above I downloaded a bunch of .xls files that can’t be processed by Import-Excel. Bummer.

Below is a function Convert-XlsToXlsx that auto-converts .xls files to .xlsx and .xlsm files, though. The script requires Microsoft Office to be installed on your box because only excel knows how to open the binary format used in .xls files:

function Convert-XlsToXlsx
{
  param
  (
    # Path to the xls file to convert:
    [Parameter(Mandatory,ValueFromPipeline,ValueFromPipelineByPropertyName)]
    [string[]]
    [Alias('FullName')]
    $Path,

    # overwrite file if it exists:
    [switch]
    $Force,
    
    # show excel window during conversion. This can be useful for diagnosis and debugging.
    [switch]
    $Visible
  )

  # do this before any file can be processed:
  begin
  {
    # load excel assembly (requires excel to be installed)
    Add-Type -AssemblyName Microsoft.Office.Interop.Excel

    # open excel in a hidden window
    $excel = New-Object -ComObject Excel.Application
    $workbooks = $excel.Workbooks
    if ($Visible) { $excel.Visible = $true }

    # disable interactive dialogs
    $excel.DisplayAlerts = $False
    $excel.WarnOnFunctionNameConflict = $False
    $excel.AskToUpdateLinks = $False

    # target file formats
    $xlsx = [Microsoft.Office.Interop.Excel.XlFileFormat]::xlOpenXMLWorkbook
    $xlsm = [Microsoft.Office.Interop.Excel.XlFileFormat]::xlOpenXMLWorkbookMacroEnabled
  }

  # do this for each file:
  process
  {
    foreach($_ in $Path)
    {
      # check for valid file extension:
      $extension = [System.Io.Path]::GetExtension($_)
      if ($extension -ne '.xls') 
      { 
        Write-Verbose "No xls file, skipping: $_"  
        continue 
      } 
  
      # open file in excel:
      $workbook = $workbooks.Open($_)
      
      # test for macros:
      if ($workbook.HasVBProject)
      {
        $extension = 'xlsm'
        $type = $xlsm
      }
      else
      {
        $extension = 'xlsx'
        $type = $xlsx      
      }
      
       # get destination path
      $outPath = [System.Io.Path]::ChangeExtension($_, $extension)

      # does it exist?
      $exists = (Test-Path -Path $outPath) -and !$Force
      if ($exists)
      {
        Write-Verbose "File exists and -Force was not specified, skipping: $_"  
        Write-Warning "File exists. Use -Force to overwrite. $_"
        continue
      }
      
      # save in new format:
      $workbook.SaveAs($outPath, $type)
      # close document
      $workbook.Close()
      # release COM objects to prevent memory leaks:
      $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($workbook)
      
      Write-Verbose "File successfully converted: '$_' -> '$outPath'"  
    }
  }
  
  # do this once all files have been processed
  end
  {
    # quit excel and clean up:
    $excel.Quit()
    
    # release COM objects to prevent memory leaks:
    $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($workbooks)
    $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel)
    $excel = $workbooks = $null
    # clean up:
    [GC]::Collect()
    [GC]::WaitForPendingFinalizers()
    Write-Verbose "Done."  
  }
}

It is beyond the scope of this article to discuss the function in detail. I’d like to point out though that the code illustrates important aspects when using COM objects in PowerShell:

When using COM objects like Excel.Application, it can be challenging to free all object references at the end. When you do this wrong, references will stay alive, and so does the excel process in memory. Of course you can always kill the process after use but this might damage excel, and next time you launch it, it starts in recovery mode.

A better approach is to make sure you are storing each object reference in a dedicated variable. Next, make sure you actively release each reference after use by calling ReleaseComObject().

When you did that right, no open reference should survive, and when you call Quit(), excel should be removed from your process list.

Now it’s trivial to convert all downloaded .xls files to the appropriate new formats:

# path with excel files. 
# assuming you created this folder and downloaded files to it:
$Path = "$env:tmp\Excelsampledata"

# get all xls files and convert them:
Get-ChildItem -Path $Path -Filter *.xls -File | Convert-XlsToXlsx -Verbose