Generate Random Content for SharePoint – 3


This is the third chapter of the Generate Random Content For SharePoint series, this time detailing the challenges you might face when you want to create Excel workbooks.

(If you don’t want to read through all the drama, jump directly to the Falling Action section.)

Exposition

If you've been following my previous posts, by now you should know how to generate unique, big files for upload tests, and Word documents with random words to test the indexing functionality. In this post I'll show you the challenges you might face when creating Excel files.

 

Rising Action

Act 1

An Excel table is relatively simple: column and row headers plus some numbers. Unfortunately we do not have a "Lorem ipsum" generator as in Word, but in the previous chapter we already discussed that this would not be good anyway. We can, however, reuse the methods learned in the previous post: take the header cells' content from a dictionary and use a simple random generator for the numbers.

# Generate the first line with the headers
For ($Column = 1; $Column -le $Columns; $Column++)
{
    $RandomWordLine = Get-Random -Minimum 1 -Maximum $DictionaryFileRows
    $RandomWord = $DictionaryFileContent[$RandomWordLine]
    $FileStream.Write("`t$RandomWord")
}
# And the rows with the content
For ($Row = 1; $Row -le $Rows; $Row++)
{
    $FileStream.Write("`r`n") # New line
    $RandomWordLine = Get-Random -Minimum 1 -Maximum $DictionaryFileRows
    $RandomWord = $DictionaryFileContent[$RandomWordLine]
    $FileStream.Write("$RandomWord`t") # Row first column
    For ($Column = 1; $Column -le $Columns; $Column++)
    {
        $NumberTXT = [string](Get-Random -Minimum 1 -Maximum 10000) + "`t"
        $FileStream.Write($NumberTXT) # The actual numbers in the table
    }
}

I hope you found the "`t" and "`r`n" escape sequences interesting. This is how you tell PowerShell to insert a tab and a line break. Also, because we want to use the search refiners, we update the Creator and Last Modified By fields just as we learned earlier. Cheesy easy. Yeah... Not quite.
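For anyone who wants to try the loops above in isolation, here is a minimal, self-contained sketch. The word list, the table size and the temporary file name are placeholders made up for illustration; the real script reads its words from the dictionary file. It also starts the random index at 0, on the assumption that the dictionary is a plain zero-based array.

```powershell
# Stand-in for the dictionary file content (assumption: any string array works)
$DictionaryFileContent = @('alpha', 'bravo', 'charlie', 'delta', 'echo')
$DictionaryFileRows    = $DictionaryFileContent.Count
$Columns = 3
$Rows    = 4

# Hypothetical output file in the temp folder
$TempTXT    = Join-Path ([System.IO.Path]::GetTempPath()) 'RandomTable.txt'
$FileStream = New-Object System.IO.StreamWriter($TempTXT, $false)
try
{
    # Header line: the leading tab leaves the top-left corner cell empty
    For ($Column = 1; $Column -le $Columns; $Column++)
    {
        # -Maximum is exclusive, so 0..Count covers every index of the array
        $RandomWord = $DictionaryFileContent[(Get-Random -Minimum 0 -Maximum $DictionaryFileRows)]
        $FileStream.Write("`t$RandomWord")
    }
    # Content rows: one random word as the row header, then the numbers
    For ($Row = 1; $Row -le $Rows; $Row++)
    {
        $FileStream.Write("`r`n")
        $RandomWord = $DictionaryFileContent[(Get-Random -Minimum 0 -Maximum $DictionaryFileRows)]
        $FileStream.Write($RandomWord)
        For ($Column = 1; $Column -le $Columns; $Column++)
        {
            $FileStream.Write("`t" + (Get-Random -Minimum 1 -Maximum 10000))
        }
    }
}
finally
{
    # Flush and release the file even if something goes wrong
    $FileStream.Dispose()
}
```

Writing the tab in front of each number instead of behind it avoids a trailing empty column; that is a small deviation from the loops above, not a requirement.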

Act 2

As I've said earlier, while using Office COM objects from PowerShell is possible, the performance is not the best. The same applies to Excel, of course. Imagine you have to create a table with 100 columns and 1000 rows. This is how long it would take:

              COM Object    Text Writer
Time Taken    324 s         18 s
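If you want to produce a comparable number on your own machine, Measure-Command is one way to time the text writer side. The sketch below is an assumption about how such a measurement could look, not the code behind the figures above; the file name is hypothetical and the result will differ from machine to machine.

```powershell
$Columns = 100
$Rows    = 1000
# Hypothetical output file in the temp folder
$TimingTXT = Join-Path ([System.IO.Path]::GetTempPath()) 'TimingTest.txt'

# Measure-Command returns a TimeSpan for the script block it runs
$Elapsed = Measure-Command {
    $Writer = New-Object System.IO.StreamWriter($TimingTXT, $false)
    try
    {
        For ($Row = 1; $Row -le $Rows; $Row++)
        {
            For ($Column = 1; $Column -le $Columns; $Column++)
            {
                $Writer.Write([string](Get-Random -Minimum 1 -Maximum 10000) + "`t")
            }
            $Writer.Write("`r`n")
        }
    }
    finally
    {
        $Writer.Dispose()
    }
}
Write-Output ("Text writer: {0:N1} s" -f $Elapsed.TotalSeconds)
```

The COM side can be wrapped in Measure-Command the same way, provided Excel is installed on the machine.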

Even if we have to do the conversion from TXT to XLSX, we will be well below how long it takes with the COM Object. I guess it's a no-brainer what we are going to choose. With Word documents the method was simple: we generated a long string, then pasted it into a document, saved the document as some name, and that was it. It would be good to use this with Excel, wouldn't it? Well... Yes, it would.

Climax

Act 3

There is one tiny little problem: Excel does not work the way Word does. There's no Selection object, so channeling information directly into an Excel file is not possible. Why is this a problem? Two reasons:

  • If we do not have a template Excel file, then we have to create TXT files, then convert them into XLSX. Not a big burden, but it has a little performance impact.
$ExcelWorkBook = $ExcelApplication.Workbooks.Open($TempTXT)
$ExcelWorkBook.SaveAs($ExcelFile, [Microsoft.Office.Interop.Excel.XlFileFormat]::xlWorkbookDefault)
$ExcelWorkBook.Saved = $true
$ExcelWorkBook.Close()
  • If we do have a template file, then we have to copy-paste the content into the file. Yes, copy-paste. It works, but if you want to use the machine while the files are being generated, then it might cause some problems.
$TempDocument = $ExcelApplication.Workbooks.Open($TempTXT,$null,$true)
$TempSheet = ($TempDocument.Sheets)[1]
$CopiedContent = $TempSheet.UsedRange.Copy()
$Pasted = $ExcelSheet.Range("A1").PasteSpecial()

Now there are a few things you might have noticed:

  • The [Microsoft.Office.Interop.Excel.XlFileFormat]::xlWorkbookDefault value defines the output file format, so if you want to generate different file types, you can change it to fit your needs. The full list of possible output formats is in the XlFileFormat enumeration article.
  • The ($TempDocument.Sheets)[1] reference is unlike ordinary array references: the Sheets collection is indexed from one, not from zero.
  • Last, but not least, the $TempSheet.UsedRange.Copy() call is the one that copies the content to the clipboard, and this is where you might wish to not use your computer, because you might end up losing some information.

Act 4

The rest should be pretty straightforward, as the property bag of an Excel file is the same as a Word document's. Well... It is. In a way. But not when you are doing the simple TXT to XLSX conversion. For some reason Excel decides that in this case the Creator property should be empty. Even more... Not just empty, but missing from the core.xml file altogether.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties">
<cp:lastModifiedBy>Zsolt Illes</cp:lastModifiedBy>
<dcterms:created xsi:type="dcterms:W3CDTF">2017-03-02T07:17:00Z</dcterms:created>
<dcterms:modified xsi:type="dcterms:W3CDTF">2017-03-02T07:17:00Z</dcterms:modified>
</cp:coreProperties>
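Whether a given core.xml is affected can be checked with an XPath query. A minimal sketch, assuming the XML has already been read into a string; the $CoreXMLText variable and its shortened content are made up for this example:

```powershell
# Cut-down core.xml without a dc:creator element (sample data for illustration)
$CoreXMLText = @'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/">
<cp:lastModifiedBy>Zsolt Illes</cp:lastModifiedBy>
</cp:coreProperties>
'@

$CoreXML = New-Object System.Xml.XmlDocument
$CoreXML.LoadXml($CoreXMLText)

# XPath needs the dc prefix registered before it can be used in a query
$NameSpaceManager = New-Object System.Xml.XmlNamespaceManager($CoreXML.NameTable)
$NameSpaceManager.AddNamespace('dc', 'http://purl.org/dc/elements/1.1/')

# SelectSingleNode returns $null when no matching element exists
$CreatorNode    = $CoreXML.SelectSingleNode('//dc:creator', $NameSpaceManager)
$CreatorMissing = ($null -eq $CreatorNode)
Write-Output "Creator element missing: $CreatorMissing"
```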

Of course we could set all these properties through the Excel COM object directly, but as we discussed a few times earlier, this would have a serious performance impact. This means that we have to create the element from PowerShell. It is a pretty straightforward operation:

$CreatorNameSpace = 'http://purl.org/dc/elements/1.1'
$CreatorElement = $CoreXML.CreateElement('dc','creator',$CreatorNameSpace)
$null = $CoreXML.DocumentElement.AppendChild($CreatorElement)

For reasons unknown to me PowerShell 5 creates the new element like this:

<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" />

Instead of this:

<dc:creator />

From a purely XML perspective there is little difference between the two: the first one means the same, it is just a bit noisier. Excel, however, does not like the first form, and if we leave it in core.xml like this, the Excel file becomes invalid. To work around that we have to clear the namespace declaration from the element. The only way I found to do this is to rip it off at string level:

$CreatorNameSpaceToNull = ' xmlns:dc="' + $CreatorNameSpace + '"'
$CoreXML = $CoreXML.OuterXml.Replace($CreatorNameSpaceToNull,'')

If you know a more elegant solution, please share it in the comment section.

[Edit: 2017.03.14]

I got the solution for the above from my colleague Roman Lutz.

[System.Xml.XmlNamespaceManager]$NameSpaceManager = New-Object System.Xml.XmlNamespaceManager $CoreXML.NameTable
$NameSpaceManager.AddNamespace('dc', 'http://purl.org/dc/elements/1.1/')
$DCNameSpace = $NameSpaceManager.LookupNamespace('dc')
$NewElement = $CoreXML.CreateElement('dc', 'creator', $DCNameSpace)
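A possible explanation for why this helps, offered as an assumption rather than a verified root cause: the URI handed to CreateElement has to match the one declared on the root element character for character, including the trailing slash. When it matches, .NET reuses the dc prefix that is already in scope and writes no extra declaration. The sketch below compares the two cases on a cut-down core.xml; the sample XML, the creator name and the New-CreatorXml helper are made up for illustration.

```powershell
# Hypothetical helper: appends a dc:creator element and returns the serialized XML
function New-CreatorXml
{
    param([string]$NameSpaceUri, [string]$Creator)

    $Xml = New-Object System.Xml.XmlDocument
    # Cut-down root element that declares dc WITH a trailing slash, as core.xml does
    $Xml.LoadXml('<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" />')

    $Element = $Xml.CreateElement('dc', 'creator', $NameSpaceUri)
    $Element.InnerText = $Creator
    $null = $Xml.DocumentElement.AppendChild($Element)
    return $Xml.OuterXml
}

# URI identical to the root declaration: the element is written without its own xmlns:dc
$Matching = New-CreatorXml -NameSpaceUri 'http://purl.org/dc/elements/1.1/' -Creator 'Jane Doe'

# URI without the trailing slash: a different namespace, so dc is redeclared on the element
$Mismatch = New-CreatorXml -NameSpaceUri 'http://purl.org/dc/elements/1.1' -Creator 'Jane Doe'

Write-Output $Matching
Write-Output $Mismatch
```

If the outputs differ on your machine the way the comments describe, comparing the URI in your own script with the one in core.xml is a cheap first check before reaching for string replacement.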

 

Falling Action

Act 5

We now have everything to put the solution together.

  • Dictionary for the Row- and Column headers,
  • Random generator for the numbers,
  • A list for users and dates to update the document properties,
  • A way to push all this information into an Excel file,
  • I’ve not detailed it in this article, but my solution also contains a routine to use a pre-defined Excel file template. This might come in handy if you want to fine-tune your search experience even more. (Just think about search-driven solutions, for example.)

The scripts are still available on Codeplex (link). Of course this solution still has room for improvement, but that is up to you to play with.

Dénouement

With these files in your environment you can test your search infrastructure in a near-real-life scenario. Stay tuned for the last episode of the series, where I detail the challenges of creating PowerPoint presentations.
