Reading in the Help File with Get-Content (Part 2 of 2)

What I did in the previous blog post was create a glorified way to read in the content from an XML file, specifically one of the PowerShell help files, and turn it into plain text. Wow! Not all that useful, since I’d still have to figure out a way to search for text in the file. Instead, what I now want to talk about is how to read in the contents of an XML file while telling PowerShell to treat the object as XML. This will change how I navigate through the file.

Let’s start by creating a new object, which I’ll call $help_xml, and this time we’ll cast the object as XML. And there I go again with that word, cast. At its most basic, all this means is that I will tell PowerShell to interpret the file contents as XML and make the object an XML object. And as we saw in part 1, if I didn't do this, Get-Content would read in the XML file as plain text by default.

And here’s how I make sure Get-Content reads in the help file as XML. In an open PowerShell console, I type:

$help_xml = [xml] (Get-Content $pshome\en-us\System.Management.Automation.dll-Help.xml)

(Notice that the only difference between this object creation and the $help_plain object from part 1 is the [xml] type specification in front of the Get-Content call.)
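
To see what the cast does without touching the real help file, here is a minimal sketch that uses a tiny made-up XML string. The $sample text and the variable names $plain and $as_xml are my own stand-ins for illustration; they are not content from the help file.

```powershell
# Hypothetical stand-in for a help file; not the real file contents.
$sample = '<?xml version="1.0" encoding="utf-8"?><helpItems schema="maml"><command><title>Get-Thing</title></command></helpItems>'

# Without a cast, the text stays a plain string.
$plain = $sample
$plain.GetType().FullName      # System.String

# With the [xml] cast, PowerShell parses the text into an XML document object.
$as_xml = [xml] $sample
$as_xml.GetType().FullName     # System.Xml.XmlDocument
```

The same text goes in both times; only the type of the resulting object changes, and that is what changes how the object can be navigated.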

I get a different result from displaying the contents of this object:

PS C:\> $help_xml

xml                            helpItems
---                            ---------
version="1.0" encoding="utf-8" helpItems

Much different from a listing of the file. That’s because my new object, $help_xml, has the properties and methods of an XML object. I know from various books and web tutorials that an XML file is structured so that the information is contained in elements, and elements can carry attributes. Knowing the structure of an XML file, i.e. what the elements inside the file are, lets you pull out the information by referencing the elements by name. And when you create a PowerShell object that represents XML information, the elements and attributes in the XML show up as properties of the PowerShell object. Quite a mouthful, but it translates into being able to use PowerShell's dotted notation to find things in the object that represents the XML file read in by Get-Content.
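
Here is a small sketch of that idea, assuming a made-up XML snippet. The element and attribute names (library, book, owner, id, title) are hypothetical and chosen only to show an attribute and a child element each turning up as a property.

```powershell
# Hypothetical sample; these names are made up for illustration.
$doc = [xml] '<library owner="demo"><book id="1"><title>First</title></book></library>'

$doc.library.owner         # attribute on <library>     -> demo
$doc.library.book.id       # attribute on <book>        -> 1
$doc.library.book.title    # text of the <title> child  -> First
```

Each dot steps one level further into the document, whether the next name is a child element or an attribute.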

Going back to our example, displaying the $help_xml object listed two properties: xml and helpItems. How do I know these are properties? Well, I can use the Get-Member cmdlet (gm is its alias) to see just the properties of the $help_xml object.

PS C:\> $help_xml | gm -MemberType properties

   TypeName: System.Xml.XmlDocument

Name      MemberType Definition
----      ---------- ----------
helpItems Property   System.Xml.XmlElement helpItems {get;}
xml       Property   System.String xml {get;set;}

In the above definition the property, helpItems, is defined as an XML element. The way XML documents work is that they always start with a single top or root node and have child nodes inside that root node. (If you aren’t very familiar with XML, there are lots of tutorials on the web. A book I’ve used to learn about XML is the XML Pocket Consultant by William R. Stanek. And feel free to add other book references for XML in the comments section.)
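
As a quick check of the single-root idea, this sketch asks an XML object for its root element directly. The sample string is hypothetical; DocumentElement, Name, and ChildNodes are standard members of the .NET XML classes that PowerShell uses for these objects.

```powershell
# Hypothetical sample with one root element and two empty child elements.
$doc = [xml] '<?xml version="1.0" encoding="utf-8"?><helpItems schema="maml"><command/><command/></helpItems>'

# DocumentElement returns the root element of the document.
$doc.DocumentElement.Name               # helpItems

# The child nodes sit directly under that root.
$doc.DocumentElement.ChildNodes.Count   # 2
```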

Knowing that helpItems is the root node, I can “walk the tree” using PowerShell dotted notation.

PS C:\> $help_xml.helpItems

schema #comment                       command                 providerHelp
------ --------                       -------                 ------------
maml   { v 1.1.0.9 ,  v 1.1.0.9 ,...  {command:command, :...  {...

Now I see there are four properties of the helpItems node. I can also look at the properties using Get-Member again.

PS C:\> $help_xml.helpItems | gm -MemberType properties

   TypeName: System.Xml.XmlElement

Name         MemberType Definition
----         ---------- ----------
#comment     Property   System.Object[] #comment {get;}
command      Property   System.Object[] command {get;}
providerHelp Property   System.Object[] providerHelp {get;}
schema       Property   System.String schema {get;set;}
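
Notice that command is listed as System.Object[], an array, because the element repeats under helpItems. Here is a minimal sketch of what that means in practice, assuming a made-up document with two command elements. The title child element and the $commands and $titles variables are hypothetical; the real help file uses its own, namespaced element names.

```powershell
# Hypothetical sample: two <command> elements under a <helpItems> root.
$doc = [xml] @'
<helpItems schema="maml">
  <command><title>Get-Thing</title></command>
  <command><title>Set-Thing</title></command>
</helpItems>
'@

# Repeated child elements come back as an array, so they can be counted and indexed.
$commands = $doc.helpItems.command
$commands.Count        # 2
$commands[0].title     # Get-Thing

# Or walked one at a time, collecting a value from each element.
$titles = foreach ($c in $commands) { $c.title }
$titles                # Get-Thing, Set-Thing (one per line)
```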

I could keep walking down through these properties. And I plan to.

Specifically, the next blog posts will be about rifling through the command and providerHelp properties to ferret out the help information from the file. Once I see where the information is located I can start figuring out how to create some simple scripts to pull out the information. And then I should be able to run the script for any PowerShell help file.

In the meantime, feel free to roam around the files yourself to discover what’s there. As long as you don't write content back to the file (for example, don't do a Set-Content), you won't have any effect on the help file itself.

And if this sparks questions or comments, please do post them in the comments section of this blog.