Friday, May 10, 2013

Extract worksheets from Excel into separate files with PowerShell

I recently had need to dust off an old VB script I'd written years ago to get worksheets out of Excel files. I've also been curious about doing more with PowerShell, and besides feeling guilty about putting a VB script into use in 2012, it seemed like a really good learning opportunity.

So why not just rewrite the script in .NET? Well, you can definitely do that; in fact, the code would look very similar. However, not everyone is a .NET developer. I wrote the original VB script on a team where we were building C++ DLLs for ETL processing; .NET wasn't part of our code base. I also think there are plenty of IT roles - DevOps, DBAs, Network Admininstrators to name a few - that might find a simple PowerShell tool like this a little easier to use and/or modify for their needs.

So that being said, just copy & paste the code below into an empty .ps1 file, and you should be good to go. To use it, simply execute the following command (should work from command-line, batch file, or managed code):

PowerShell.exe -command "C:\ScriptFile.ps1" -filepath "C:\Spreadsheet.xls" -output_type "csv"

I did run into one problem / issue while writing this script - getting it to run the first time! Thanks to this great article by Scott Hanselman, I found out that there are some very tight Windows security restrictions on PowerShell scripts - particularly the ones you didn't write yourself. After reading his article, it seemed easier for me (and for anyone who wants to use my code) to just post the source code rather than a downloadable script with certificates, at least in this instance. Maybe if I write another PowerShell article I'll give the certificate thing a go.

If you get the error message I got - "The file C:\ScriptFile.ps1 cannot be loaded. The execution of scripts is disabled on this system. Please see "Get-Help about_signing" for more details." - you can enable execution of PowerShell scripts you'vecreated by running the following command 'As Administrator':

PowerShell.exe Set-ExecutionPolicy RemoteSigned

Anyway, here's my script:


# Purpose: Extract all of the worksheets from an Excel file into separate files.

[CmdletBinding()]
Param ( 
    [Parameter(Mandatory=$true,Position=0)] 
    [string]$filepath,

    [Parameter(Mandatory=$true,Position=1)] 
    [ValidateSet("csv","txt","xls","html")] 
    [string]$output_type 
)

#-----------------------------------------------------------------------------#

# Figures out and returns the 'XlFileFormat Enumeration' ID for the specified format.
# http://msdn.microsoft.com/en-us/library/office/bb241279%28v=office.12%29.aspx 
# NOTE: The code being used for 'xls' is actually a 'text' type, but it seemed
# to work the best for splitting the worksheets into separate Excel files.

function GetOutputFileFormatID 

Param([string]$fomat_name
    $Result = 0 

    switch($fomat_name
    { 
        "csv" {$Result = 6
        "txt" {$Result = 20
        "xls" {$Result = 21
        "html" {$Result = 44
        default {$Result = 51
    } 
    
    return $Result 
}

#-----------------------------------------------------------------------------# 

$Excel = New-Object -ComObject "Excel.Application" 
$Excel.Visible = $false #Runs Excel in the background. 
$Excel.DisplayAlerts = $false #Supress alert messages. 

$Workbook = $Excel.Workbooks.open($filepath

#Loop through the Workbook and extract each Worksheet in the specified file type.  
if ($Workbook.Worksheets.Count -gt 0) { 
    write-Output "Now processing: $WorkbookName" 
    
    $FileFormat = GetOutputFileFormatID($output_type

    #Strip off the Excel extension. 
    $WorkbookName = $filepath -replace ".xlsx""" #Post 2007 extension
    $WorkbookName = $WorkbookName -replace ".xls""" #Pre 2007 extension 

    $Worksheet = $Workbook.Worksheets.item(1

    foreach($Worksheet in $Workbook.Worksheets) { 
        $ExtractedFileName = $WorkbookName + "~~" + $Worksheet.Name + "." + $output_type 

        $Worksheet.SaveAs($ExtractedFileName$FileFormat

        write-Output "Created file: $ExtractedFileName" 
    } 


#Clean up & close the main Excel objects. 
$Workbook.Close() 
$Excel.Quit() 

No comments:

Post a Comment