Experimenting with PowerShell V2 scripting, variables and control structures

Last week I was testing Visual Studio 2010 to write a C# application to export all my blog posts to a file. I described that in some detail at https://blogs.technet.com/josebda/archive/2010/03/21/experimenting-with-visual-studio-2010-and-backing-up-the-entries-on-my-blog.aspx

I am performing the exact same task, but this time using PowerShell V2. The basic idea is still the same and I am still using a Browser and HTML document objects to do most of the work. This is part of a series of blog posts I am doing on PowerShell V2, focusing mostly on programming. The sample script uses many variables, objects and collections. It also uses different types of loops and conditional statements. If you're familiar with programming, it should all make sense.

The code

Let's start with the basics of creating and running the PowerShell script. To make it simple, you can use Notepad to create a file called "BlogBackup.ps1". You can then copy and paste the code shown below:

BlogBackup.ps1

$Browser = New-Object -COM "InternetExplorer.Application"
$MorePages = $True
$Page = 1
$Post = 0
$BaseURL = "https://blogs.technet.com/josebda"
$File = "./josebda.htm"

"Exporting posts from " + $BaseURL + " to a file"
"<HTML><BODY>" | Out-File $File

While ($MorePages)
{
    $URL = $BaseURL + "/default.aspx?p=" + $Page
    "Loading Page " + $Page + " (" + $URL + ")"
    $Browser.Navigate($URL)
    While ($Browser.ReadyState -ne 4) { Start-Sleep -Seconds 1 }

    "Processing Page " + $Page
    $MorePages = $False
    $Divs = $Browser.Document.getElementsByTagName("DIV")

    ForEach ($Div in $Divs)
    {
        $DivText = $Div.OuterHTML.ToString()
        If ($DivText.Length -gt 16)
        {
            If ($DivText.Substring(2,16) -eq "<DIV class=post>")
            {
                $MorePages = $True
                $Post++
                $Title = @($Div.getElementsByTagName("A"))[0].InnerHTML
                "Exporting post " + $Post + " = " + $Title
                $DivText | Out-File $File -append
            }
        }
    }

    "Processed Page " + $Page + "."
    $Page++
}

"</BODY></HTML>" | Out-File $File -append
"Processing complete!"

Execution Policy

If you simply try to run the script, you should get the following error:

PS C:\> .\blogbackup.ps1
File C:\blogbackup.ps1 cannot be loaded because the execution of scripts is disabled on this system. Please see "get-help about_signing" for more details.
At line:1 char:17
+ .\blogbackup.ps1 <<<<
    + CategoryInfo          : NotSpecified: (:) [], PSSecurityException
    + FullyQualifiedErrorId : RuntimeException

That's because the default setting for PowerShell is to restrict the execution of unsigned scripts. You can confirm that by running

PS C:\> Get-ExecutionPolicy
Restricted

You can change that policy by using the Set-ExecutionPolicy cmdlet.
To be on the safe side, we'll change it just for the current process (if you close PowerShell and open it again, the policy will be back to Restricted).

PS C:\> Set-ExecutionPolicy -scope process Unrestricted
Execution Policy Change
The execution policy helps protect you from scripts that you do not trust. Changing the execution policy might expose
you to the security risks described in the about_Execution_Policies help topic. Do you want to change the execution
policy?
[Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"): <ENTER>

PS C:\Users\josebda> Get-ExecutionPolicy
Unrestricted

With that, you should now be able to run the script just fine for this session.
If you really intend to write scripts and you want to run them securely, you should learn how to sign them.
You can start by reading https://technet.microsoft.com/en-us/magazine/2008.04.powershell.aspx.
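
As a rough sketch of what signing looks like (this assumes you already have a code-signing certificate in your personal certificate store, which is not covered in this post):

PS C:\> $Cert = @(Get-ChildItem Cert:\CurrentUser\My -CodeSigningCert)[0]   # first code-signing cert found
PS C:\> Set-AuthenticodeSignature -FilePath .\blogbackup.ps1 -Certificate $Cert
PS C:\> Get-AuthenticodeSignature .\blogbackup.ps1                          # check the signature status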

The output

Here's what the output of the script should look like:

PS C:\> .\blogbackup.ps1
Exporting posts from https://blogs.technet.com/josebda to a file
Loading Page 1 (https://blogs.technet.com/josebda/default.aspx?p=1)
Processing Page 1
Exporting post 1 = Comparing RPC, WMI and WinRM for remote server management with PowerShell V2
Exporting post 2 = Why Hyper-V VHD Files Are So Large - And How To Efficiently Copy Them
Exporting post 3 = Experimenting with PowerShell V2 Remoting
Exporting post 4 = How DFS Replication (DFS-R) secures its communication
Exporting post 5 = Experimenting with Visual Studio 2010 and backing up the entries on my blog
Exporting post 6 = Windows Storage Server 2008 and iSCSI Software Target 3.2 documentation on TechNet
Exporting post 7 = FAST'10 Technical Sessions
Exporting post 8 = Unique Document URLs in MOSS 2007 and the new Document ID feature in SharePoint 2010
Exporting post 9 = Random thoughts and links on Storage
Exporting post 10 = Presentations from Storage Developer Conference 2009 (SDC 2009) are now available for download
Exporting post 11 = Windows Server DFS Namespaces (DFS-N) Reference
Exporting post 12 = Configuring Failover Clusters with Windows Storage Server 2008
Exporting post 13 = Automatically uploading files from File Server to SharePoint using the File Classification Infrastructure (FCI)
Exporting post 14 = Six Uses for the Microsoft iSCSI Software Target
Exporting post 15 = Download for Powershell v2 for Windows 7? No need... It's already there!
Processed Page 1.
Loading Page 2 (https://blogs.technet.com/josebda/default.aspx?p=2)
Processing Page 2
Exporting post 16 = SQL Server 2008 R2 Enterprise Evaluation November CTP available for MSDN/TechNet Subscribers
Exporting post 17 = Mistakes when configuring your Hyper-V environment
Exporting post 18 = Scary SQL Server stuff: tombstones, phantoms, blobs, ghosts and zombies
Exporting post 19 = Implementing an End-User Data Centralization Solution with Folder Redirection and Offlines Files
Exporting post 20 = SharePoint 2010 beta in November. Details and documentation right now!
Exporting post 21 = File Server Capacity Tool (FSCT) 1.0 available for download

[Lots of lines excluded here for brevity]

Exporting post 296 = EAS Support in Windows / Suporte para EAS no Windows
Exporting post 297 = Project Server 2003 Training / Treinamento para Project Server 2003
Exporting post 298 = Automating Certificates / Automaçao de Certificados
Exporting post 299 = DoNotAllowXPSP2
Exporting post 300 = MOM 2005 Preview
Processed Page 20.
Loading Page 21 (https://blogs.technet.com/josebda/default.aspx?p=21)
Processing Page 21
Exporting post 301 = Security for Applications
Exporting post 302 = Good scripting book for WSH, ADSI, WMI?
Exporting post 303 = Outlook 2003 and the OAB
Exporting post 304 = We live very interesting times
Exporting post 305 = Security, VHDs e Defragmentation
Exporting post 306 = This blog thing
Processed Page 21.
Loading Page 22 (https://blogs.technet.com/josebda/default.aspx?p=22)
Processing Page 22
Processed Page 22.
Processing complete!

After it runs, you should also find a file called josebda.htm with the entire text of all the blog posts. That's what the Out-File cmdlet used in the script does.
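
If you want to see the difference between the two forms of Out-File used in the script, try this quick test (it creates a throwaway test.txt file in the current folder, used here only for illustration):

PS C:\> "first line"  | Out-File test.txt           # creates the file (or overwrites it if it exists)
PS C:\> "second line" | Out-File test.txt -append   # adds to the end of the existing file
PS C:\> Get-Content test.txt
first line
second line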

The variables

The script uses a number of variables to keep track of things. These are the items starting with a $ sign, like $Page (used to track which page we are processing), $BaseURL (the blog location), $File (the name of the output file) or $Post (the number of posts found so far). You will notice they are initialized and later updated throughout the code. One special variable called $MorePages is used to track whether we have reached the end of the blog. You see, the blog system provides a set of pages starting with 1, but I cannot tell in advance how many pages of blog posts there will be. So, I use this variable to flag when I have loaded a page with no posts in it, which indicates there are no more pages to process.
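
Just to illustrate how these variables work (this snippet is not part of the script), you can check at the prompt what type PowerShell inferred from the value you assigned:

PS C:\> $Page = 1
PS C:\> $Page.GetType().Name
Int32
PS C:\> $MorePages = $True
PS C:\> $MorePages.GetType().Name
Boolean
PS C:\> $Page++          # increments the counter, just like at the end of the main loop
PS C:\> $Page
2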

There are some variables that hold more complex information. They are actually objects. That includes $Browser (an instance of an Internet Explorer browser that is used to retrieve the pages) and $Divs (a set of HTML elements using the <DIV> tag). These objects are not just basic types, but more complex ones, which include a longer list of properties and methods. For $Browser, for instance, I use the $Browser.Navigate method (to load a specific page) and the $Browser.ReadyState property (to tell if the page is done loading). I also use $Browser.Document.getElementsByTagName (yes, that's a method of an object inside an object) to find all "DIV" tags in the resulting document. I use $Divs to store that resulting set of tags.
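
If you want to explore what the browser object offers, you can pipe it to Get-Member. This is just an exploratory snippet, not part of the backup script:

$Browser = New-Object -COM "InternetExplorer.Application"
$Browser | Get-Member -MemberType Method      # lists methods like Navigate, Quit, Refresh
$Browser | Get-Member -MemberType Property    # lists properties like ReadyState, Document
$Browser.Quit()                               # close the hidden browser instance when done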

The control structures

Several PowerShell control structures are used, like While, ForEach and If. The main loop, for instance, makes sure we keep getting more pages until we load a page with no posts in it (which leaves $MorePages set to $False), incrementing $Page at the end of each pass. A second loop uses a ForEach to inspect each of the elements returned by getElementsByTagName("DIV"). Also, two If statements check whether there are enough characters in $Div.OuterHTML to look at (we need at least 16 to have a chance of it being a post) and then whether it starts with "<DIV class=post>", which means it's one of the posts inside the page (there are usually a few posts in each page).
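
Stripped of the blog-specific details, the nesting of those control structures looks like this (a simplified sketch, not the full script):

While ($MorePages)                     # outer loop: one pass per blog page
{
    $MorePages = $False                # assume the page is empty until a post is found
    ForEach ($Div in $Divs)            # inner loop: one pass per <DIV> element
    {
        If ($DivText.Length -gt 16)    # first test: is the string long enough?
        {
            # second test and post processing go here, setting $MorePages back to $True
        }
    }
    $Page++                            # move on to the next page
}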

If you're not familiar with PowerShell expressions, you might be confused by the comparison operators. They might seem unintuitive at first, but you get used to them. In the script, I used -eq (equal), -ne (not equal) and -gt (greater than). The main While loop does not use an operator because $MorePages is already a boolean (contains either a $True or a $False value).
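
Here are a few quick examples you can type at the prompt to see how these operators behave (note that string comparisons are case-insensitive by default):

PS C:\> 5 -gt 3
True
PS C:\> "post" -eq "POST"
True
PS C:\> 4 -ne 4
False
PS C:\> $MorePages = $True
PS C:\> If ($MorePages) { "Keep going" }
Keep going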

I must admit that moving strings around like this is probably not the most efficient way to process the document, but I was shooting for simplicity, not for extreme performance. In fact, the only reason I even bothered to check the length of $Div.OuterHTML was because the .Substring(2,16) call will fail if the string is not long enough.
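
If you prefer to avoid the length check altogether, a wildcard test with -like never throws on short strings. This is just an alternative sketch, not what the script does, and it matches the marker anywhere in the string rather than at the exact offset:

# Alternative check (sketch): no separate length test needed
If ($DivText -like "*<DIV class=post>*")
{
    # ... process the post ...
}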

A couple of tricks

There are two lines in the code that are somewhat tricky and also deserve a comment.

First, there is the line saying "While ($Browser.ReadyState -ne 4) { Start-Sleep -Seconds 1 }". This is the line that waits for the document to finish loading before we go look at it. You see, the browser control is asynchronous and it will give us back control before the page is fully loaded. This line will wait for ReadyState to become 4, which means that "the control has finished loading the new document and all its contents". The statement inside the loop waits for 1 second. See https://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.readystate.aspx for details on the other states.
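
One thing to keep in mind is that this loop will wait forever if the page never finishes loading. A slightly more defensive version (just a sketch, not what the script uses, with an arbitrary 60-second limit) adds a simple timeout:

$Timeout = 60                                            # give up after roughly 60 seconds
While ($Browser.ReadyState -ne 4 -and $Timeout -gt 0)
{
    Start-Sleep -Seconds 1
    $Timeout--
}
If ($Browser.ReadyState -ne 4) { "Page did not finish loading in time" }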

Second, there is the line saying "$Title = @($Div.GetElementsByTagName("A"))[0].InnerHTML". In short, this line extracts the text inside the first "<A>" tag within the post. The $Div contains a post, which is an HTML element containing an entire "<DIV>" tag. Inside it, the first "<A>" tag contains a link to the URL of the post and the inner text of that tag is the title of the post. The statement starts by getting a list of all "<A>" tags within the $Div and then gets the first of those elements (that would be element number zero, represented by [0]). This whole section is optional, but it's interesting to show the name of the post while you are looking at the output.
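
The @() around the call deserves a note: it guarantees that the result is treated as an array, so the [0] index works even when only a single element comes back. You can see the same trick at the prompt (this example assumes the current folder contains just our one blogbackup.ps1 file):

PS C:\> $Scripts = @(Get-ChildItem *.ps1)   # @() guarantees an array, even if a single file matches
PS C:\> $Scripts.Count
1
PS C:\> $Scripts[0].Name
blogbackup.ps1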

Conclusion

This post has more of a developer flavor to it and it shows the range of tasks you can automate with PowerShell. For instance, you could create a script to get a list of all computer objects in Active Directory with the term "file" in the name, then you could use that list to find all file shares in each of those servers that do not end with $, then you could get information about used/free space on each volume used by each share and finally you could output a nice HTML table with all the results. But that would be an entirely new blog post...