We just introduced the world to the File Classification Infrastructure (Windows Server 2008 R2 File Classification Infrastructure – Managing data based on business value). Now let us show you how to set up your file server to classify files.
The first step is to install the File Classification Infrastructure (FCI). Because FCI is exposed through the File Server Resource Manager (FSRM), you install it by adding the FSRM role service in the File Services role. This means that FCI is available on every SKU of Windows Server 2008 R2 and comes at no additional cost.
Configuring properties to track
We think of classification as a business-driven process and expect the corporation to determine which file properties it needs to drive its policies. When you first install FCI, you’ll notice that no properties are predefined. Defining a property is easy, though:
- Open FSRM (it’s under the Administrative Tools)
- Navigate to Classification Management –> Classification Properties
- Click on “Create Property”
In this screenshot I’ve defined two properties:
- PII: a boolean (yes/no) that indicates whether a file contains personal information
- Secrecy: an ordered list of values (High, Medium, and Low)
People often tend to define a large number of properties that they think might be useful to know about files. However, we strongly encourage you to classify files only for properties that actually drive a management policy of some sort. Classifying files for too many properties slows down the classification process and means we have to store more information on your disks.
Properties really just require two pieces of information: the name and the type. Some property types require more information; an Ordered List property, for example, requires a list of valid values. Optionally, you can supply a description for the property. The following types are supported by FCI in Windows Server 2008 R2:
- Yes/No – a boolean
- Number – integer
- Multiple Choice List – a list of values where multiple values can be selected
- Ordered List – a list of values that have an implicit ordering (for example high/medium/low or first/second/last)
- Multi-String – allows you to set several unique strings to a property
This is a strict subset of the types available on SharePoint.
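To make the shape of a property definition concrete, here is a minimal Python sketch of the schema described above. It is only a model for illustration and not the FCI API: the names PropertyType and PropertyDefinition are hypothetical, and the check that a Multiple Choice List also needs a list of valid values is an assumption (the text above only states this for Ordered List).

```python
from dataclasses import dataclass
from enum import Enum


class PropertyType(Enum):
    """The five property types listed above (hypothetical model, not the FCI API)."""
    YES_NO = "Yes/No"
    NUMBER = "Number"
    MULTIPLE_CHOICE_LIST = "Multiple Choice List"
    ORDERED_LIST = "Ordered List"
    MULTI_STRING = "Multi-String"


# Assumption for this sketch: both list types need their valid values up front.
_TYPES_WITH_VALUES = {PropertyType.MULTIPLE_CHOICE_LIST, PropertyType.ORDERED_LIST}


@dataclass(frozen=True)
class PropertyDefinition:
    name: str                # required
    type: PropertyType       # required
    description: str = ""    # optional
    values: tuple = ()       # only needed for list types

    def __post_init__(self):
        if not self.name:
            raise ValueError("a property needs a name")
        if self.type in _TYPES_WITH_VALUES and not self.values:
            raise ValueError(f"a {self.type.value} property needs a list of valid values")


# The two properties from the screenshot.
pii = PropertyDefinition("PII", PropertyType.YES_NO, "File contains personal information")
secrecy = PropertyDefinition("Secrecy", PropertyType.ORDERED_LIST, values=("High", "Medium", "Low"))
```

Keeping the schema this small mirrors the advice above: define only the properties that a policy will actually consume.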
Creating classification properties sets up a schema of properties that should be tracked for this file server. No files have been modified at this point. However, everything is now set up for an admin to write a script that sets properties on files (using the FCI API http://msdn.microsoft.com/en-us/library/dd392349(VS.85).aspx), for a LOB application to set properties on files (the API is remotely scriptable COM, so scripts, native applications, and managed applications can all use it), or for Office files to be manually classified (a future blog post will cover this in more detail). To classify the existing files, however, automatic classification is necessary. For this we use Classification Rules.
Automatically classifying files
Classification rules are created by navigating to the “Classification Rules” node and clicking the “Create a New Classification Rule” action.
Each rule has a name and a scope. The name allows us to identify which rule assigned a property value to a file. The scope is necessary since you may have different logic to classify your engineering files than your finance files, and so on. Each classification rule uses a classification mechanism to decide which value to assign to a specific property.
This rule uses the Folder Classifier, which assigns the specified value to the classification property for all files within the rule’s scope. With this rule, everything that appears in D:\engineering will be marked as Medium Secrecy.
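The behavior of the Folder Classifier can be pictured with a few lines of Python. This is a sketch of the idea only, not how FCI implements it: the function name folder_classifier is hypothetical, and the case-insensitive path comparison comes from Python's PureWindowsPath rather than from anything stated in this post.

```python
from pathlib import PureWindowsPath


def folder_classifier(file_path, scope, property_name, value):
    """Return {property_name: value} if file_path lies under scope, else {}.

    Hypothetical helper for illustration; it never touches the file system.
    """
    path = PureWindowsPath(file_path)
    scope_path = PureWindowsPath(scope)
    # PureWindowsPath compares case-insensitively, so d:\Engineering and
    # D:\engineering are treated as the same folder.
    if scope_path in path.parents:
        return {property_name: value}
    return {}


# Everything under D:\engineering is marked as Medium Secrecy.
print(folder_classifier(r"D:\engineering\specs\design.docx", r"D:\engineering", "Secrecy", "Medium"))
print(folder_classifier(r"D:\finance\q1.xlsx", r"D:\engineering", "Secrecy", "Medium"))
```

Note that the file's contents play no role here; only its location relative to the rule's scope matters.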
Another rule may use the Content Classifier to search the contents of files.
The Content Classifier searches for text or patterns using the same mechanism as the search indexer, and if it finds them, it assigns the specified value to the classification property. For this to work, we have to tell the Content Classifier what to search for. This is done by clicking the Advanced button and selecting the Additional Classification Parameters tab in the resulting dialog.
Here you can supply a series of parameters by specifying their name and the parameter value. Parameter names that you can use are:
- RegularExpression – this is a standard .NET regular expression
- String – a simple string
- StringCaseSensitive – a string that is case sensitive
You can have multiple parameters of the same type or a mix of all the parameter types. The rule assigns the property value only when all of these parameters are found in a file. If you need to set a property value when a file contains either “Confidential” or “Private”, you will need to set up two different rules. The Content Classifier is intelligent enough to scan the file only once, even if multiple rules are defined.
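The "all parameters must match" behavior, and the need for two rules to express an "or", can be sketched in Python as follows. This is an illustration under assumptions, not the Content Classifier itself: the function names are hypothetical, Python's re module stands in for .NET regular expressions (the two dialects differ in places), and treating String as case-insensitive is inferred from the existence of StringCaseSensitive.

```python
import re


def content_matches(text, parameters):
    """True only if every (name, value) parameter is found in text.

    An empty parameter list trivially matches in this sketch.
    """
    for name, value in parameters:
        if name == "RegularExpression":
            found = re.search(value, text) is not None
        elif name == "String":
            # Assumption: the plain String parameter ignores case.
            found = value.lower() in text.lower()
        elif name == "StringCaseSensitive":
            found = value in text
        else:
            raise ValueError(f"unknown parameter name: {name}")
        if not found:
            return False
    return True


def classify(text, rules):
    """Return the values proposed by every rule whose parameters all match."""
    return [value for parameters, value in rules if content_matches(text, parameters)]


# "Confidential" OR "Private" takes two rules, each with a single parameter.
rules = [
    ([("String", "Confidential")], "High"),
    ([("String", "Private")], "High"),
]
print(classify("This memo is PRIVATE.", rules))
```

Putting both strings into a single rule would instead require a file to contain both words before the value is assigned.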
If these classification mechanisms are not enough for your needs, you can build your own classification plugin (see the Windows 7 SDK) or purchase one from a third-party ISV. Once such a plugin is installed, it shows up in the drop-down list of classification mechanisms.
Multiple classification rules may attempt to assign values to the same property on a file. Consider the following:
- A rule attempts to set the Secrecy Property to Medium since the file is located in d:\Engineering
- Another rule attempts to set the Secrecy Property to High since it contains the word Confidential
In such cases, FCI attempts to aggregate the property values. This can be done for the following property types:
- Yes/No properties – Yes wins over No
- Multiple Choice List – Combines the sets of values
- Ordered List properties – The highest entry in the list wins
- Multi-String – Combines the sets of strings into one set of unique strings
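The aggregation rules listed above can be written down as a small Python function. This is a sketch for illustration, not FCI code: the function name aggregate is hypothetical, and it assumes the Ordered List values are given highest first (as in High, Medium, Low), so that "the highest entry wins" means the entry closest to the front of the list.

```python
def aggregate(prop_type, first, second, ordered_values=()):
    """Combine two values proposed for the same property on one file."""
    if prop_type == "Yes/No":
        # Yes wins over No.
        return first or second
    if prop_type == "Multiple Choice List":
        # Combine the sets of values.
        return set(first) | set(second)
    if prop_type == "Ordered List":
        # Assumption: ordered_values is listed highest first.
        return min(first, second, key=ordered_values.index)
    if prop_type == "Multi-String":
        # One set of unique strings, keeping first-seen order.
        return list(dict.fromkeys([*first, *second]))
    raise ValueError(f"no aggregation defined for {prop_type}")


# The example above: Medium from the folder rule, High from the content rule.
print(aggregate("Ordered List", "Medium", "High", ("High", "Medium", "Low")))
```

Types that are not in the list above, such as Number, raise an error in this sketch because the text does not describe an aggregation for them.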
In addition, by default rules only classify files that have not yet been classified for the same property. To change this, click the Advanced button.
When the “Re-evaluate existing property values” box is checked, the rule will attempt to reclassify files if the file or the classification rules have changed. Once the box is checked, the rule must be set either to overwrite any existing property values or to aggregate the new value with the existing value.
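The choices described above boil down to a small decision table, sketched here in Python. The function apply_rule and its parameter names are hypothetical, and the sketch deliberately leaves out the check for whether the file or the rules have changed; it only shows how the default, overwrite, and aggregate settings differ once a rule proposes a value.

```python
def apply_rule(current, proposed, reevaluate=False, mode="overwrite", combine=None):
    """Decide the value a property ends up with after one rule runs.

    current  -- value already stored on the file, or None if unclassified
    proposed -- value this rule wants to assign
    combine  -- function(current, proposed) used when mode is "aggregate"
    """
    if current is None:
        # Unclassified files are always classified.
        return proposed
    if not reevaluate:
        # Default: leave an existing value alone.
        return current
    if mode == "overwrite":
        return proposed
    if mode == "aggregate":
        if combine is None:
            raise ValueError("aggregate mode needs a combine function")
        return combine(current, proposed)
    raise ValueError(f"unknown mode: {mode}")


print(apply_rule("Low", "High"))                   # default keeps the existing value
print(apply_rule("Low", "High", reevaluate=True))  # overwrite replaces it
```

In aggregate mode the combine function would follow the per-type aggregation rules, for example "Yes wins over No" for a Yes/No property.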
Because classifying the files on a server may take some time, the classification rules are applied on a scheduled basis. The administrator can specify the time window during which classification can take place. During that window, FCI finds any files that have to be classified or reclassified and processes as many of them as possible. Once a file has been classified, the properties stay with the file while it is moved around on NTFS file systems. For Office files, the properties stay attached no matter what is done to the file (they are stored in the file itself). Of course, there is also a mechanism to remove properties from files if the administrator needs to.
Now that we have specified the property schema we are interested in for this server and set up a series of rules to automatically classify files, we are ready to start managing our files based on their value to the business instead of just where we store them.
Post by Matthias Wollnik
See also these additional blog posts: