Use a PowerShell Hash Table to Simplify File Backups

Doctor Scripto

Summary: Learn how to use a Windows PowerShell hash table to simplify backing up unique files.

 

Hey, Scripting Guy! Your series on hash tables has been pretty interesting. I am wondering if you have some practical uses for hash tables. Can you provide a few examples of how using a hash table would be useful?

—RE

 

Hello RE,

Microsoft Scripting Guy Ed Wilson here. This morning the Scripting Wife received an email message from one of the hotels where we had stayed during our last trip to Australia. They were advertising their winter specials. The temperature around here has hovered near 100 degrees Fahrenheit (37.7 degrees Celsius according to my Windows PowerShell conversion module). One of the things I love about Australia (besides Tim Tams, Lamingtons, great scuba diving, awesome scenery, and especially wonderful people) is the fact that when the weather is oppressively hot in the Deep South, it is winter down under. A quick flight to Brisbane and one has escaped the hot sticky weather of Charlotte, North Carolina, in July. 

RE, to answer your question directly, let me begin by detailing a scenario. Suppose there is a directory structure that contains a number of folders and files. Within any single folder, each file name is, of course, unique. However, across the subfolders, there are duplicate files. A sample file structure containing duplicate files in nested folders is shown in the following figure.

Image of sample file structure containing duplicate files in nested folders

If I need to flatten this directory structure (move it from several nested folders, some of which contain duplicate files, to a single folder with no duplicates), there are a variety of approaches I can use. One is to use the graphical user interface of the Windows Explorer tool. The problem with this approach is that it is interactive and involves a lot of clicking: "Yes, I know there is an existing file, and I want to write over that file." This makes things proceed rather slowly. In addition, I might want to have a job that I can schedule on a nightly basis for backup purposes. Writing a bit of Windows PowerShell code gives me the flexibility to solve the problem in multiple ways, and it lets me avoid multiple mouse clicks. A hash table will simplify the coding that is required.

If I use the name property (of the System.IO.FileInfo object) as the key for my hash table, and the fullname property (of the same object) as the value associated with the key, I can filter out all of the duplicate files. This filtering occurs because the keys of a hash table must be unique.
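
To see why the duplicates fall out, here is a minimal sketch of what happens when the same key is added twice. The file names are borrowed from the sample structure, and the duplicate path is hypothetical:

$h = @{}
$h.Add("testfile1.txt", "C:\hsgTest\testfile1.txt")
# The second Add with the same key throws an error, so only the first
# occurrence of each file name makes it into the hash table.
$h.Add("testfile1.txt", "C:\hsgTest\hsgtest2\testfile1.txt")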

In the command that follows, I first create an empty hash table and store it in the $hash variable. I use a semicolon so that I can continue with another command on the same line. Next, I use dir to get a directory listing of my current directory (the C:\hsgtest folder is the working directory, as indicated by the Windows PowerShell prompt). I use the Recurse switch to cause the command to work through all of the nested directories. The dir command is an alias for the Get-ChildItem cmdlet. I pipe the results to the Where-Object cmdlet (? is an alias for Where-Object). Inside the script block (the braces), I use the ! operator (which means not) to cause Where-Object to return only items that are not containers (in other words, files). I pipe the files to the ForEach-Object cmdlet (% is an alias for ForEach-Object). Inside the script block associated with the ForEach-Object cmdlet, I use the Add method to add the name property and the fullname property from each FileInfo object to the hash table. The name of each file becomes the key in the hash table, and the fullname (which is the full path to the file) becomes the value associated with that key. When the command runs, an error appears for each duplicate file name that the command attempts to add to the hash table. This is expected, and it confirms that the command is working properly. The complete command is shown here:

PS C:\hsgTest> $hash = @{}; dir -recurse | ? { !$_.psiscontainer} | % { $hash.add($_.name,$_.fullname) }
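
The errors are harmless, but if you prefer a quiet run, one variation (a sketch, not a requirement) is to test for the key before adding it, so that duplicate file names are skipped silently:

$hash = @{}
dir -Recurse | ? { !$_.PSIsContainer } | % {
    # Only add the file if its name is not already a key in the table
    if (-not $hash.ContainsKey($_.Name)) { $hash.Add($_.Name, $_.FullName) }
}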

To see the consolidated list of files, I inspect the contents of the $hash variable as shown here:

PS C:\hsgTest> $hash

 

Name                           Value

----                           -----

testfile30.txt                 C:\hsgTest\testfile30.txt

testfile27.txt                 C:\hsgTest\hsgtest2\testfile27.txt

testfile21.txt                 C:\hsgTest\hsgtest2\testfile21.txt

testfile20.txt                 C:\hsgTest\testfile20.txt

testfile25.txt                 C:\hsgTest\hsgtest2\testfile25.txt

testfile10.txt                 C:\hsgTest\testfile10.txt

testfile6.txt                  C:\hsgTest\testfile6.txt

testfile35.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile35.txt

testfile34.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile34.txt

testfile28.txt                 C:\hsgTest\testfile28.txt

testfile1.txt                  C:\hsgTest\testfile1.txt

testfile23.txt                 C:\hsgTest\hsgtest2\testfile23.txt

testfile2.txt                  C:\hsgTest\testfile2.txt

testfile29.txt                 C:\hsgTest\hsgtest2\testfile29.txt

testfile24.txt                 C:\hsgTest\testfile24.txt

testfile9.txt                  C:\hsgTest\testfile9.txt

testfile38.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile38.txt

testfile40.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile40.txt

testfile8.txt                  C:\hsgTest\testfile8.txt

testfile36.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile36.txt

testfile33.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile33.txt

testfile32.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile32.txt

testfile26.txt                 C:\hsgTest\testfile26.txt

testfile22.txt                 C:\hsgTest\testfile22.txt

testfile31.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile31.txt

testfile39.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile39.txt

testfile37.txt                 C:\hsgTest\hsgtest2\hsgTest3\testfile37.txt

testfile5.txt                  C:\hsgTest\testfile5.txt

testfile4.txt                  C:\hsgTest\testfile4.txt

testfile7.txt                  C:\hsgTest\testfile7.txt

testfile3.txt                  C:\hsgTest\testfile3.txt
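
Because the file name is the key, looking up the full path to any single file is a simple index into the hash table:

PS C:\hsgTest> $hash["testfile30.txt"]

C:\hsgTest\testfile30.txt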

Of course, I could simply use the Copy-Item cmdlet to flatten the hierarchy, but unfortunately, the nested folders still get copied. The Container switch parameter causes the Copy-Item cmdlet to duplicate the file structure, including any nested folders. By default, the Container switch parameter has a value of TRUE, and it will duplicate the existing hierarchy, as seen in the following figure.

Image of container switched parameter when set to TRUE duplicating existing hierarchy

Supplying a value of FALSE to a switch parameter requires the trick of following the parameter name with a colon and then the $false value. The resulting command is shown here:

PS C:\hsgTest> Copy-Item -Path . -Destination C:\hsgBackup -Recurse -Container:$false

All of the files are copied into the root of the destination, but the two nested folders still appear; they are empty, but present nonetheless. This is shown in the following figure.

Image of nested folders empty but still present

One of the cool things about the Copy-Item cmdlet is that it accepts an array for the Path parameter. The values property of the hash table stored in the $hash variable gives me the full path to each unique file in the directory structure, so I can copy the unique files to the hsgbackup directory by using the Copy-Item cmdlet. The subexpression, $( ), is required to force evaluation of the $hash.values property before the Copy-Item command executes.

PS C:\hsgTest> Copy-Item -Path $($hash.values) -Destination C:\hsgBackup

A quick look at the hsgbackup directory reveals that the copy proceeded as expected—no nested folders appear. The backup directory is shown in the following figure.

Image of backup directory
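
As a quick sanity check (a sketch, assuming the C:\hsgBackup destination used previously), the number of files in the backup directory should match the number of entries in the hash table:

PS C:\hsgTest> $hash.Count -eq (dir C:\hsgBackup | ? { !$_.PSIsContainer }).Count

True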

The complete command used to back up unique files from nested directories follows this paragraph. Refer to earlier portions of this article for a complete explanation of the commands and aliases. (To simplify the command syntax, I set my working directory to the directory I wanted to back up.)

$hash = @{}; dir -recurse | ? { !$_.psiscontainer} | % { $hash.add($_.name,$_.fullname) }

Copy-Item -Path $($hash.values) -Destination C:\hsgBackup
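
Because one goal mentioned earlier was a job that could run on a nightly schedule, here is a minimal sketch of the same logic wrapped in a script. The script name, parameter names, and default paths are my own illustrative assumptions, not part of the original command:

# Backup-UniqueFiles.ps1 (hypothetical script name)
param(
    $source = 'C:\hsgTest',        # tree to back up (assumed default)
    $destination = 'C:\hsgBackup'  # flat backup target (assumed default)
)
# Build the hash table of unique file names, keeping the first occurrence
$hash = @{}
Get-ChildItem -Path $source -Recurse |
    Where-Object { !$_.PSIsContainer } |
    ForEach-Object { if (-not $hash.ContainsKey($_.Name)) { $hash.Add($_.Name, $_.FullName) } }
# Create the destination if it does not exist, then copy the unique files
if (-not (Test-Path $destination)) { New-Item -Path $destination -ItemType Directory | Out-Null }
Copy-Item -Path $($hash.Values) -Destination $destination

A script saved this way could then be run from Task Scheduler on a nightly basis.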

RE, that is all there is to using hash tables in scripts. Hash Table Week will continue tomorrow when I will talk about using hash tables in conjunction with other Windows PowerShell commands.

 

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy

 

 
