Hey, Scripting Guy! How can I get a list of available metadata for Microsoft Office documents?
Hey, JR. If we understand your question correctly, you’d like to know if it’s true that July 21, 2006 will mark the 500thHey, Scripting Guy! column; in addition, you’d like to know whether we are planning to do anything to mark this auspicious occasion. Well, the answer to both questions is yes.
And, yes, technically you can send us an electronic postcard. But that doesn’t sound half as much fun as getting a real postcard, if you know what we mean.
Anyway, thanks for writing, JR. Hope we were able to help.
Hmmm, now that you mention it, we guess your question did say something about metadata, didn’t it? Wonder how we could have missed that? But, what the heck: we have some time to kill while we wait for the postcards to start rolling in, so let’s see if we can show you how to get a list of the available metadata for a Microsoft Office document.
To begin with, we should note that there is actually a separate COM object (Dsofile.exe) that you can download and use to get metadata from Microsoft Office documents. The advantage of using Dsofile is that it can retrieve metadata from any Microsoft Office document. The disadvantage to Dsofile? You’ll obviously have to download and install it on every computer where you might need to run your metadata-retrieving script. Because a lot of people don’t like to download and install additional utilities, we won’t discuss Dsofile today; if you’re interested, take a look at the Tales From the Script column Dsofile: The Untold Story.
Instead, what we’re going to do today is show you how to get at metadata (also referred to as document properties) using plain old Microsoft Office. The nice thing about this, of course, is that there’s nothing to download or install: if you’ve got Microsoft Office up and running then you’ve already got everything you need. The one disadvantage: you’ll have to modify your script slightly depending on the type of document you want to examine. But don’t worry, we’ll explain how to do that. Because you asked a fairly simple question – How can I get a list of available metadata for Microsoft Office documents? – let’s start off with an equally-simple script, one that retrieves metadata for an Excel spreadsheet (C:\Scripts\Test.xls):
Set objExcel = CreateObject("Excel.Application") objExcel.Visible = True Set objWorkbook = objExcel.Workbooks.Open("C:\Scripts\Test.xls") For Each strProperty in objWorkbook.BuiltInDocumentProperties Wscript.Echo strProperty.Name Next
Yes, it is a very simple little script. We start out by creating an instance of the Excel.Application object. And you’re right: because we’re using Excel we’ve pretty much limited ourselves to retrieving information about Excel spreadsheets. What if we wanted to retrieve information about Word documents? In that case, we’d need to use the Word.Application object. OK, then how about PowerPoint presentations? That’s easy: we’d need to use the PowerPoint.Application object. Microsoft Publisher? That’s right: Publisher.Application. And so on.
After creating the Excel.Application object we use this line of code to make Excel visible onscreen:
objExcel.Visible = True
Needless to say, we don’t need to do this: the script works just fine even if you can’t see Excel onscreen. We do this primarily to help you visualize what’s going on. If you want to run the script without ever seeing the spreadsheet then simply leave out this line of code. Of course, each time you run the script you’ll also end up with an invisible copy of Excel running in the background. To combat that, just make sure you call the Quit method to dismiss Excel when the script finishes. The “invisible” version of this script looks like this:
Set objExcel = CreateObject("Excel.Application") Set objWorkbook = objExcel.Workbooks.Open("C:\Scripts\Test.xls") For Each strProperty in objWorkbook.BuiltInDocumentProperties Wscript.Echo strProperty.Name Next objExcel.Quit
After getting Excel up and running we then use this line of code to open the spreadsheet C:\Scripts\Test.xls:
Set objWorkbook = objExcel.Workbooks.Open("C:\Scripts\Test.xls")
So far so good. Now, how do we get to the metadata? Well, it turns out that Microsoft Office documents store their metadata in a collection named BuiltInDocumentProperties. To get a list of available metadata for a Microsoft Excel spreadsheet all we have to do is set up a For Each loop to walk through this collection, echoing back the Name of each item (each piece of metadata) found in the collection:
For Each strProperty in objWorkbook.BuiltInDocumentProperties Wscript.Echo strProperty.Name Next
What does that give us? That gives us something very much like this:
Title Subject Author Keywords Comments Template Last author Revision number Application name Last print date Creation date Last save time Total editing time Number of pages Number of words Number of characters Security Category Format Manager Company Number of bytes Number of lines Number of paragraphs Number of slides Number of notes Number of hidden Slides Number of multimedia clips Hyperlink base Number of characters (with spaces)
There’s your available metadata.
But wait: we aren’t done yet. After all, that script shows us all the metadata that could be configured for Test.xls. But what if we’d like to find out which metadata has been configured for Test.xls? Can we get at that information? Of course we can; this script returns metadata property names and configured values:
On Error Resume Next Set objExcel = CreateObject("Excel.Application") objExcel.Visible = True Set objWorkbook = objExcel.Workbooks.Open("C:\Scripts\Test.xls") For Each strProperty in objWorkbook.BuiltInDocumentProperties Wscript.Echo strProperty.Name & " - " & strProperty.Value Next
The only difference between this script and our first script? This line right here, where we echo back not only the item Name but also the item Value:
Wscript.Echo strProperty.Name & " - " & strProperty.Value
Oh: and we also added On Error Resume Next to the beginning of the script. That’s important: in some cases the script will crash upon encountering a property that has never been configured. To guard against that possibility, just use On Error Resume Next.
The net result will be similar to this, depending on which properties have values assigned to them and which ones do not:
Title - Metadata Test Subject - Excel Test Scripts Author - Ken Myer Keywords - testing, scripts Comments - This is a sample spreadsheet used for testing purposes. Template - Last author - Ken Myer Revision number - Application name - Microsoft Excel Creation date - 6/13/2006 8:40:17 PM Last save time - 6/13/2006 9:07:15 AM Security - 0 Category - Format - Manager - Company - Microsoft Corporation Hyperlink base -
Ah, we had a feeling you were going to ask that. Yes, you can use a script to programmatically set the value of some of these properties. Some properties – such as the document Creation date or Number of characters – are read-only, for obvious reasons; however, other properties – such as Title, Subject, and Author – can be configured using a script similar to this:
On Error Resume Next Set objExcel = CreateObject("Excel.Application") objExcel.Visible = True Set objWorkbook = objExcel.Workbooks.Open("C:\Scripts\Book1.xls") For Each strProperty in objWorkbook.BuiltInDocumentProperties If strProperty.Name = "Title" Then strProperty.Value = "Test Title" End if Next
Again, this is fairly simple. After setting up our For Each loop we use this line of code to check each document property to see if the Name of the property is equal to Title:
If strProperty.Name = "Title" Then
If it is, then we use this line of code to set the Value of the property to Test Title:
strProperty.Value = "Test Title"
Give it a try and see what happens.
Two quick notes. Yes, there might be a slightly faster way to get at the value of an individual property like Title. As usual, though, we went for the approach we thought was easier rather than the approach that might have been a tiny bit faster. Also, as you doubtless know, you can add your own, custom metadata to Microsoft Office documents. Can you get at those properties using a script? You bet: just use the CustomDocumentProperties collection instead of the BuiltInDocumentProperties collection.