Pete Hinchley: Read a Microsoft Excel Workbook using PowerShell and the Open XML SDK

There are a few options for reading a Microsoft Excel workbook using PowerShell, but one of the most efficient is to use the Open XML SDK.

  1. Firstly, download the Open XML SDK 2.5 from Microsoft. You only need OpenXMLSDKV25.msi.
  2. Install the SDK. The installation directory on an x64 computer will default to: C:\Program Files (x86)\Open XML SDK\V2.5
  3. Download GenericMethods.psm1 from the PSGenericMethods GitHub repository. Save the file within a directory on the same computer where you installed the Open XML SDK. e.g. C:\Scripts. This module will make it easier to work with the generic methods exposed by the SDK.
  4. Create an Excel document and add some sample data. e.g. C:\Scripts\Test.xlsx.
  5. Create a PowerShell script named Excel.ps1 with the code shown below and also save it under C:\Scripts.
  6. Now run the PowerShell script from a command prompt: powershell.exe -executionpolicy bypass -file C:\Scripts\Excel.ps1 -path "C:\Scripts\Test.xlsx"
Param ([string]$path = $(throw "-path is required."))

Import-Module "$PSScriptRoot\GenericMethods.psm1"
[System.Reflection.Assembly]::LoadFrom("C:\Program Files (x86)\Open XML SDK\V2.5\lib\DocumentFormat.OpenXml.dll") | out-null

[Reflection.Assembly]::LoadWithPartialName("DocumentFormat.OpenXml") | out-null
[Reflection.Assembly]::LoadWithPartialName("DocumentFormat.OpenXml.Packaging") | out-null
[Reflection.Assembly]::LoadWithPartialName("DocumentFormat.OpenXml.Spreadsheet") | out-null
[Reflection.Assembly]::LoadWithPartialName("OpenXmlPowerTools") | out-null

[DocumentFormat.OpenXml.Packaging.SpreadsheetDocument]$document = $null
$document = [DocumentFormat.OpenXml.Packaging.SpreadsheetDocument]::Open($path, $false)

[DocumentFormat.OpenXml.Packaging.WorkbookPart]$workBookPart = $document.WorkbookPart
$elements = $workBookPart.SharedStringTablePart.SharedStringTable.Elements

[DocumentFormat.OpenXml.Spreadsheet.Workbook]$workBook = $workBookPart.Workbook
[DocumentFormat.OpenXml.Spreadsheet.Sheets]$sheets = $workBook.Sheets

[DocumentFormat.OpenXml.Spreadsheet.Sheet]$sheet = $null

foreach ($sheet in $sheets) {
  [DocumentFormat.OpenXml.Packaging.WorksheetPart]$workSheetPart = $workBookPart.GetPartById($sheet.Id)

  $cells = Invoke-GenericMethod -InputObject $workSheetPart.Worksheet -MethodName Descendants -GenericType DocumentFormat.OpenXml.Spreadsheet.Cell

  foreach ($cell in $cells) {
    if ($cell.DataType.Value -eq "SharedString") {
      $stringTable = Invoke-GenericMethod -InputObject $workBookPart -MethodName GetPartsOfType -GenericType DocumentFormat.OpenXml.Packaging.SharedStringTablePart
      $value = $stringTable.SharedStringTable.InnerText
    } else {
      [String]$value = $cell.InnerText
    }

    "Value at {0}: {1}" -f $cell.CellReference, $value
    [DocumentFormat.OpenXml.Packaging.SharedStringTablePart]$sharedTablePart = $null
  }
}

$document.Close()

The code will iterate through the cells of the workbook (left to right, top to bottom) and output the reference and value of any non-empty cell. For example:

Value at A1: 321
Value at A2: ABC
Value at B1: XYZ