Get Excel data without Excel


In a specific scenario, I needed to be able to query an Excel file and get data from a worksheet but I didn’t (and couldn’t) have the Office suite (or even Excel only) installed on the machine.

If the Excel file is and xlsx file (the new Microsoft Open XML Format since Excel 2007), you could extract the xml content (e.g. xl\worksheets\sheet1.xml) from the archive (the xlsx file is actually a compressed archive), read it as an xmlDocument, and so on… But it is not really that straight forward… I’ll leave it for a future post.

What I ended up doing, is using the ACE.OLEDB provider to query the Excel worksheet as if it were a SQL table. The only problem is, that you can call it only from 32bit applications. Don’t worry, you don’t have to explicitly run the script from a 32bit PowerShell console or ISE. I wrapped the code that queries the Excel as a scriptblock and I invoke it as a 32bit job using the -RunAs32 switch (see Get-Help about_Jobs or Get-Help Start-Job for more details)

The complete PowerShell example function is:

function Get-ExcelData {
    [CmdletBinding(DefaultParameterSetName='Worksheet')]
    Param(
        [Parameter(Mandatory=$true, Position=0)]
        [String] $Path,

        [Parameter(Position=1, ParameterSetName='Worksheet')]
        [String] $WorksheetName = 'Sheet1',

        [Parameter(Position=1, ParameterSetName='Query')]
        [String] $Query = 'SELECT * FROM [Sheet1$]'
    )

    switch ($pscmdlet.ParameterSetName) {
        'Worksheet' {
            $Query = 'SELECT * FROM [{0}$]' -f $WorksheetName
            break
        }
        'Query' {
            # Make sure the query is in the correct syntax (e.g. 'SELECT * FROM [SheetName$]')
            $Pattern = '.*from\b\s*(?<Table>\w+).*'
            if($Query -match $Pattern) {
                $Query = $Query -replace $Matches.Table, ('[{0}$]' -f $Matches.Table)
            }
        }
    }

    # Create the scriptblock to run in a job
    $JobCode = {
        Param($Path, $Query)

        # Check if the file is XLS or XLSX 
        if ((Get-Item -Path $Path).Extension -eq 'xls') {
            $Provider = 'Microsoft.Jet.OLEDB.4.0'
            $ExtendedProperties = 'Excel 8.0;HDR=YES;IMEX=1'
        } else {
            $Provider = 'Microsoft.ACE.OLEDB.12.0'
            $ExtendedProperties = 'Excel 12.0;HDR=YES'
        }
        
        # Build the connection string and connection object
        $ConnectionString = 'Provider={0};Data Source={1};Extended Properties="{2}"' -f $Provider, $Path, $ExtendedProperties
        $Connection = New-Object System.Data.OleDb.OleDbConnection $ConnectionString

        try {
            # Open the connection to the file, and fill the datatable
            $Connection.Open()
            $Adapter = New-Object -TypeName System.Data.OleDb.OleDbDataAdapter $Query, $Connection
            $DataTable = New-Object System.Data.DataTable
            $Adapter.Fill($DataTable) | Out-Null
        }
        catch {
            # something went wrong :-(
            Write-Error $_.Exception.Message
        }
        finally {
            # Close the connection
            if ($Connection.State -eq 'Open') {
                $Connection.Close()
            }
        }

        # Return the results as an array
        return ,$DataTable
    }

    # Run the code in a 32bit job, since the provider is 32bit only
    $job = Start-Job $JobCode -RunAs32 -ArgumentList $Path, $Query
    $job | Wait-Job | Receive-Job
    Remove-Job $job
}

Then, you can use it to get the entire default worksheet (Sheet1):

Get-ExcelData -Path C:\myFiles\Users.xlsx

Or to get a specific worksheet:

Get-ExcelData -Path C:\myFiles\Users.xlsx -WorksheetName 'Sheet2'

Or by specifying a query:

Get-ExcelData -Path C:\myFiles\Users.xlsx -Query 'SELECT TOP 3 * FROM Sheet3'

Or an even more complex query:

Get-ExcelData -Path C:\myFiles\Users.xlsx -Query "SELECT GivenName, Surname, City, State FROM Sheet1 WHERE State in ('CA','WA')"

 

HTH,

/Martin.


Comments (6)

  1. Anonymous says:

    Hi Martin,

    Thanks a lot for your article! That saved me a lot of time today :)
    Just a detail : you can now invoke your code directly without job thanks to the 64bit provider (available here :
    http://www.microsoft.com/en-us/download/details.aspx?id=13255)

  2. Anwar says:

    Thanks man. Its very useful code. Appreciate ur effort.

  3. Ross says:

    Hi Martin

    code above works beautifully so thank you for this.

    how would i adapt the code to populate the sql result (assuming only one record returned) into variable/s?
    e.g. for Get-ExcelData -Path C:\myFiles\Users.xlsx -Query “SELECT GivenName, Surname, City, State FROM Sheet1 WHERE State in (‘CA’,’WA’)”

    $GivenName = GivenName
    $Surname = Surname
    $City= City
    $State =State

    Reeason being i want to use these values to execute a command

    Thanks
    Ross

    1. Martin Schvartzman says:

      Hi Ross,
      The function returns a deserialized datatable, where you can easily access it’s rows:

      $results = Get-ExcelData -Path C:\myFiles\Users.xlsx -Query “SELECT TOP 1 GivenName, SureName, City, State FROM users”
      $GivenName = $results.Rows[0].GivenName
      $SureName = $results.Rows[0].SureName
      $City = $results.Rows[0].City
      # etc…

      HTH,
      \Martin.

  4. Lorcan says:

    Hi, this is great thank you. much faster than using COM, even on large worksheets.

    I can’t seem to get it to query on headers that have spaces. Is there an easy way to do this?

    get-excel data -path C:\temp\test.xlsx -query “SELECT * from sheet1 where “Space Test” = 1″
    get-excel data -path C:\temp\test.xlsx -query “SELECT * from sheet1 where ‘Space Test’ = 1″
    etc.?

    1. skyruiss says:

      try with brackets
      e.g.
      get-excel data -path C:\temp\test.xlsx -query “SELECT * from sheet1 where [Space Test] = 1″