Exchange 2010 has an interesting, if little-known, feature based on the Windows TIFF iFilter. Not long ago, a customer asked to be able to scan as many attachment types as possible for specific keywords. The well-known filters are the Office 2010 Filter Pack, available here http://support.microsoft.com/kb/2460041/en-us, and the PDF iFilter released by Adobe, available here http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025.
In case you need a refresher on how iFilters interact with Exchange 2010, the short answer has two parts. Installed on Exchange 2010 servers with the Mailbox role, they allow Exchange Search to index the contents of the supported file types. Installed on Exchange 2010 servers with the Hub Transport role, they allow the Hub Transport agents to take action based on the contents of supported file types that are sent as attachments. A more detailed explanation can be found here: http://technet.microsoft.com/en-us/library/ee732397(v=exchg.141).aspx
This customer wanted to go beyond the mainstream file types and include images as well. The problem with images is that an image is just that: an image. Images can contain text, but that text has to be converted into something Exchange can understand before the content can be indexed or acted upon by a transport agent.
Although the Windows Server OS does not offer native OCR support for most image types, it does offer OCR support for TIFF images; that support is extended to Exchange 2010 in the form of the Windows TIFF iFilter. You can read more about the Windows TIFF iFilter here: http://technet.microsoft.com/en-us/library/dd834685.aspx. Exchange 2010 also supports the Windows TIFF iFilter after some additional configuration steps, which are documented here: http://technet.microsoft.com/en-us/library/dd744689(v=WS.10).aspx. After I performed all of the steps in the documentation, the Exchange 2010 portion was still not working for me. Additional testing and some trial and error led me to the steps below. I also created a PowerShell script to automate creating the necessary registry entries.
The only scenarios that I tested were with Windows Server 2008 R2 and Exchange 2010 SP3. It is possible these steps will also work with Windows Server 2012 R2 and Exchange 2010 or Exchange 2013. If I successfully test these steps with Exchange 2013 on Windows Server 2012 R2 then I will update this section.
Exchange 2010 Mailbox Servers vs. Hub Transport Servers
If you wish to index the contents of TIFF files and search for that content within users' mailboxes, then the implementation steps must be performed on the Mailbox servers. If the intent is to create Hub Transport rules based on *.tif and *.tiff attachments, then the implementation steps must be performed on the Hub Transport servers. You can also perform the steps on both Mailbox and Hub Transport servers if you wish to both search and control the mail routing of TIFF data.
Implementation Steps (Mailbox and Hub Transport Servers)
- Install the Windows TIFF iFilter using the following PowerShell commands:
Import-Module servermanager; Add-WindowsFeature TIFF-iFilter
- Open Group Policy and go to Computer Configuration > Administrative Templates > Windows Components > Search > OCR > "Force TIFF iFilter to perform OCR for every page in a TIFF document" and select Enabled.
- Run the following PowerShell to create the necessary registry entries or manually create them as per the documentation here. If your system drive is not C:\ you will need to modify the path in the script below to the proper drive letter before running the script.
#Creates *.tif Entry
#Creates *.tiff Entry
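- From within an elevated PowerShell window, run Stop-Service msftesql-exchange -Force (the -Force switch also stops services that depend on it, such as MSExchangeSearch)
- (Mailbox servers only) Locate the content index catalog for each database (the CatalogData folder stored alongside the database file) and delete it
- From within an elevated PowerShell window, run Start-Service msexchangesearch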
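The registry step in the list above can be sketched roughly as follows. This is not the original script; it follows the pattern Microsoft's Filter Pack registration script uses for Exchange 2010 (a CLSID key pointing at the filter DLL, plus one Filters key per extension), and the registry root shown is an assumption based on that script. The function names Get-TiffFilterRegistryPlan and Register-TiffFilter are hypothetical, and the CLSID and DLL path are deliberately left as parameters because neither value appears in this post; take them from the TIFF iFilter documentation.

```powershell
# Sketch only. -Clsid and -DllPath must come from the TIFF iFilter
# documentation; this post does not state either value.
function Get-TiffFilterRegistryPlan {
    param(
        [Parameter(Mandatory = $true)][string]$Clsid,    # filter CLSID, including braces
        [Parameter(Mandatory = $true)][string]$DllPath   # full path to the filter DLL
    )
    # Assumed Exchange 2010 (v14) search registry root.
    $root = 'HKLM:\SOFTWARE\Microsoft\ExchangeServer\v14\MSSearch'
    $plan = @()
    # CLSID key: default value is the DLL path.
    $plan += New-Object PSObject -Property @{ Parent = "$root\CLSID"; Name = $Clsid; Value = $DllPath }
    # One Filters key per extension: default value is the CLSID.
    foreach ($ext in '.tif', '.tiff') {
        $plan += New-Object PSObject -Property @{ Parent = "$root\Filters"; Name = $ext; Value = $Clsid }
    }
    return $plan
}

function Register-TiffFilter {
    param(
        [Parameter(Mandatory = $true)][string]$Clsid,
        [Parameter(Mandatory = $true)][string]$DllPath
    )
    # Create each key with its default value (run elevated on the Exchange server).
    foreach ($entry in Get-TiffFilterRegistryPlan -Clsid $Clsid -DllPath $DllPath) {
        New-Item -Path $entry.Parent -Name $entry.Name -Value $entry.Value -Type String -Force | Out-Null
    }
    # Extra values on the CLSID key, as in the Filter Pack registration pattern.
    $clsidKey = "HKLM:\SOFTWARE\Microsoft\ExchangeServer\v14\MSSearch\CLSID\$Clsid"
    New-ItemProperty -Path $clsidKey -Name 'ThreadingModel' -Value 'Both' -PropertyType String -Force | Out-Null
    New-ItemProperty -Path $clsidKey -Name 'Flags' -Value 1 -PropertyType DWord -Force | Out-Null
}
```

Keeping the plan separate from the function that writes it lets you print the plan and review the three keys before touching the registry. Nothing is written until Register-TiffFilter is called with both values supplied.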
Exchange 2010 should now be configured to index TIFF content and Hub Transport rules should now be able to take some action based on TIFF file attachments. If the implementation steps were performed on a mailbox server, ensure that the database content has been fully indexed before attempting to search for TIFF content.
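One way to check whether indexing has finished is to look at the content index state reported for each database. The sketch below assumes the Exchange Management Shell and relies on the ContentIndexState property returned by Get-MailboxDatabaseCopyStatus; the helper name Select-NotFullyIndexed is hypothetical.

```powershell
# Sketch: passes through only the database copies whose content index
# is not yet reported as Healthy (for example, still Crawling).
function Select-NotFullyIndexed {
    param([Parameter(ValueFromPipeline = $true)]$CopyStatus)
    process {
        if ($CopyStatus.ContentIndexState -ne 'Healthy') { $CopyStatus }
    }
}

# Usage from the Exchange Management Shell on the Mailbox server:
# Get-MailboxDatabaseCopyStatus -Server $env:COMPUTERNAME |
#     Select-NotFullyIndexed |
#     Format-Table Name, ContentIndexState
```

If the usage line returns nothing, every database on that server reports a healthy index and searches for TIFF content should be meaningful.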
- Mailbox Servers - The easiest way I found to validate that the TIFF iFilter was working as intended on mailbox servers was to use MSPaint to create a TIFF file containing a unique keyword, which I then saved in a user mailbox. I then searched the user's mailbox for the keyword, and the TIFF file was found.
- Hub Transport Servers - I created a Hub Transport rule that rejected any email with an attachment containing the unique keyword embedded in the TIFF file. I then attached the TIFF file and attempted to send it to another user within the organization. As expected, the NDR was generated and the message delivery was blocked.
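Both checks can also be run from the Exchange Management Shell. The following is a minimal sketch, not the exact commands used above: the helper name Get-TiffRejectRuleParams, the rule name, the rejection text, the keyword, and the mailbox name are all made up for illustration. Depending on your RBAC setup, Search-Mailbox may need additional management roles assigned to your account.

```powershell
# Sketch: builds the parameters for a test transport rule that rejects
# messages whose attachments contain the given keyword.
function Get-TiffRejectRuleParams {
    param([Parameter(Mandatory = $true)][string]$Keyword)
    # Returned as a hashtable so it can be splatted onto New-TransportRule.
    return @{
        Name                    = 'TIFF keyword test'   # hypothetical rule name
        AttachmentContainsWords = $Keyword
        RejectMessageReasonText = 'Blocked: attachment contains a restricted keyword.'
    }
}

# Hub Transport check: create the rule, then send the test TIFF.
# $ruleParams = Get-TiffRejectRuleParams -Keyword 'zyxwv12345'
# New-TransportRule @ruleParams

# Mailbox check: count matching items without copying or deleting anything.
# Search-Mailbox -Identity 'testuser' -SearchQuery 'zyxwv12345' -EstimateResultOnly
```

Choosing a keyword that cannot occur anywhere else in the mailbox keeps the result unambiguous: a single hit can only come from the TIFF attachment.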
Image OCR is far from perfect. The quality, size, and clarity of the source image largely determine how accurate the Windows TIFF iFilter will be. Despite its imperfections, if you manage an Exchange organization that needs maximum control over attachment types, the Windows TIFF iFilter can add real value to a comprehensive solution.