Use a PowerShell Hash Table to Simplify File Backups
Summary: Learn how to use a Windows PowerShell hash table to simplify backing up unique files.
Hey, Scripting Guy! Your series on hash tables has been pretty interesting. I am wondering if you have some practical uses for hash tables. Can you provide a few examples of how using a hash table would be useful?
—RE
Hello RE,
Microsoft Scripting Guy Ed Wilson here. This morning the Scripting Wife received an email message from one of the hotels where we had stayed during our last trip to Australia. They were advertising their winter specials. The temperature around here has hovered near 100 degrees Fahrenheit (37.7 degrees Celsius according to my Windows PowerShell conversion module). One of the things I love about Australia (besides Tim Tams, Lamingtons, great scuba diving, awesome scenery, and especially wonderful people) is the fact that when the weather is oppressively hot in the Deep South, it is winter down under. A quick flight to Brisbane and one has escaped the hot sticky weather of Charlotte, North Carolina, in July.
RE, to answer your question directly, let me begin by detailing a scenario. Suppose a directory structure contains a number of folders and files. Within any single folder, each file name is, of course, unique. However, the same file names appear in more than one subfolder. A sample file structure containing duplicate files in nested folders is shown in the following figure.
If I need to flatten this directory structure (move it from several nested folders, some of which contain duplicate files, to a single folder with no duplicates), there are several approaches I can use. One is the graphical user interface of the Windows Explorer tool. The problem with this approach is that it is interactive and involves a lot of clicking through "Yes, I know there is an existing file and I want to write over that file" prompts, which makes things proceed rather slowly. In addition, I might want a job that I can schedule on a nightly basis for backup purposes. Writing a bit of Windows PowerShell code gives me the flexibility to solve the problem in multiple ways. In addition, I can avoid the multiple mouse clicks. A hash table will simplify the coding that is required.
If I use the name property (of the System.IO.FileInfo object) as the key for my hash table, and the fullname property (of the same object) as the value that is associated with the key, I can filter out all of the duplicate files. This filtering occurs because the key property of a hash table must be unique.
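This key-uniqueness behavior is easy to see in isolation. Here is a minimal sketch (the file names and paths are invented for illustration):

```powershell
# Create an empty hash table
$hash = @{}

# Adding two distinct file names succeeds
$hash.Add('report.txt', 'C:\data\report.txt')
$hash.Add('notes.txt',  'C:\data\notes.txt')

# Attempting to add a key that already exists throws an error,
# so only the first path seen for each file name is retained
$hash.Add('report.txt', 'C:\data\archive\report.txt')
```

The third Add call fails because 'report.txt' is already a key in the table; that failure is exactly what filters out the duplicate files in the scenario above.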
In the command that follows, I first create an empty hash table and store it in the $hash variable. I use a semicolon so that I can continue with another command on the same line. Next, I use dir to get a directory listing of the current directory (the C:\hsgtest folder is the working directory, as indicated by the Windows PowerShell prompt). I use the recurse switch to cause the command to work through all of the nested directories. The dir command is an alias for the Get-ChildItem cmdlet. I pipe the results to the Where-Object cmdlet (? is an alias for the Where-Object cmdlet). Inside the script block (the braces), I use the ! operator (which means not) so that Where-Object returns only items that are not containers (in other words, files). I pipe the files to the ForEach-Object cmdlet (% is an alias for the ForEach-Object cmdlet). Inside the script block associated with the ForEach-Object cmdlet, I use the add method to add the name property and the fullname property of each FileInfo object to the hash table. The name of each file becomes the key in the hash table, and the fullname (which is the full path to the file) becomes the value associated with that key. When the command runs, an error appears for each duplicate file name that the command attempts to add to the hash table. This is expected, and it is confirmation that the command works properly. The complete command is shown here:
PS C:\hsgTest> $hash = @{}; dir -recurse | ? { !$_.psiscontainer} | % { $hash.add($_.name,$_.fullname) }
To see the consolidated list of files, I inspect the contents of the $hash variable as shown here:
PS C:\hsgTest> $hash
Name                           Value
----                           -----
testfile30.txt C:\hsgTest\testfile30.txt
testfile27.txt C:\hsgTest\hsgtest2\testfile27.txt
testfile21.txt C:\hsgTest\hsgtest2\testfile21.txt
testfile20.txt C:\hsgTest\testfile20.txt
testfile25.txt C:\hsgTest\hsgtest2\testfile25.txt
testfile10.txt C:\hsgTest\testfile10.txt
testfile6.txt C:\hsgTest\testfile6.txt
testfile35.txt C:\hsgTest\hsgtest2\hsgTest3\testfile35.txt
testfile34.txt C:\hsgTest\hsgtest2\hsgTest3\testfile34.txt
testfile28.txt C:\hsgTest\testfile28.txt
testfile1.txt C:\hsgTest\testfile1.txt
testfile23.txt C:\hsgTest\hsgtest2\testfile23.txt
testfile2.txt C:\hsgTest\testfile2.txt
testfile29.txt C:\hsgTest\hsgtest2\testfile29.txt
testfile24.txt C:\hsgTest\testfile24.txt
testfile9.txt C:\hsgTest\testfile9.txt
testfile38.txt C:\hsgTest\hsgtest2\hsgTest3\testfile38.txt
testfile40.txt C:\hsgTest\hsgtest2\hsgTest3\testfile40.txt
testfile8.txt C:\hsgTest\testfile8.txt
testfile36.txt C:\hsgTest\hsgtest2\hsgTest3\testfile36.txt
testfile33.txt C:\hsgTest\hsgtest2\hsgTest3\testfile33.txt
testfile32.txt C:\hsgTest\hsgtest2\hsgTest3\testfile32.txt
testfile26.txt C:\hsgTest\testfile26.txt
testfile22.txt C:\hsgTest\testfile22.txt
testfile31.txt C:\hsgTest\hsgtest2\hsgTest3\testfile31.txt
testfile39.txt C:\hsgTest\hsgtest2\hsgTest3\testfile39.txt
testfile37.txt C:\hsgTest\hsgtest2\hsgTest3\testfile37.txt
testfile5.txt C:\hsgTest\testfile5.txt
testfile4.txt C:\hsgTest\testfile4.txt
testfile7.txt C:\hsgTest\testfile7.txt
testfile3.txt C:\hsgTest\testfile3.txt
Of course, I could simply use the Copy-Item cmdlet to flatten the hierarchy, but unfortunately the nested folders still get copied. The container switch parameter causes the Copy-Item cmdlet to duplicate the file structure, including any nested folders. By default, the container switch parameter has a value of TRUE and will duplicate the existing hierarchy, as seen in the following figure.
Supplying a value of FALSE to a switch parameter requires the trick of using a colon followed by the $false value. The resulting command is shown here:
PS C:\hsgTest> Copy-Item -Path . -Destination C:\hsgBackup -Recurse -Container:$false
All of the files are copied into the root of the destination, but the two nested folders still appear. They are empty, but still present. This is shown in the following figure.
One of the cool things about the Copy-Item cmdlet is that it accepts an array for the path parameter. Using the values property of the hash table stored in the $hash variable, I have the full path to each unique file in the directory structure. I can copy the unique files to the hsgbackup directory by using the Copy-Item cmdlet. The subexpression operator $( ) is required to force evaluation of the $hash.values property before the Copy-Item command executes.
PS C:\hsgTest> Copy-Item -Path $($hash.values) -Destination C:\hsgBackup
A quick look at the hsgbackup directory reveals that the copy proceeded as expected—no nested folders appear. The backup directory is shown in the following figure.
The complete command used to back up unique files from nested directories follows this paragraph. Refer to the earlier portions of this article for a complete explanation of the commands and aliases. (To simplify the command syntax, I set my working directory to the directory I wanted to back up.)
$hash = @{}; dir -recurse | ? { !$_.psiscontainer} | % { $hash.add($_.name,$_.fullname) }
Copy-Item -Path $($hash.values) -Destination C:\hsgBackup
RE, that is all there is to using hash tables in scripts. Hash Table Week will continue tomorrow when I will talk about using hash tables in conjunction with other Windows PowerShell commands.
I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy