…DLP – Creating Custom Rules


Microsoft has provided multiple DLP Policy Templates to help Government customers protect their critical data quickly and efficiently. However, these templates may not meet your specific requirements, in which case you need custom DLP rules. A good example is Social Security Numbers and the built-in classification rule. The rule is designed to do several things:

  • Look for formatted and unformatted Social Security Numbers
  • Rule out numbers deemed invalid as SSNs
  • Look for proximity matches such as a leading or trailing value (SSN, SS#, etc.)

The rule then assigns values to the various 'hits', and if the total of the values reaches the threshold, which in this case is 85%, the content is deemed a match and the action you configured in your rule is taken. If that behavior does not fit your needs there are two ways to change it: you can override the default threshold within the rule (this is covered in another post), or you can create a custom rule pack, which we will do here.

Before we do that I'd like to break down a DLP policy into its components. First there are the classification rules, which are at the core of a DLP policy. They define the term or terms being scanned, the proximity of values if multiple terms are being evaluated, and the score assigned to each value when the term matches the criteria. You can also define the term being evaluated as a word or a complex expression, which we will see in the example below. Once you have defined what you are evaluating, you create a policy that defines the following:

  • What rules you want to use to evaluate messages
  • When you want to apply the rule (all messages, only internal, only internal to external, sent from a specific user, etc.)
  • What you want to do if the rule matches (encrypt, block, send for approval, etc.)

Now that we have the basics down, let's create a simple rule that ONLY looks for formatted and unformatted Social Security numbers and validates that the numbers meet SSN guidelines (ensuring they are valid). In addition, the rule will put a MailTip in the message if the rule is triggered; if the user still sends the message, it will be blocked and reported to a supervisor. Before we get started, note that there are several TechNet articles that go into great detail on classification rules, including Developing Sensitive Information Rules Packages and Define Your Own DLP Templates and Information Types. Let's get started…

Creating a custom Classification Rule

Get ready, because we are about to get knee deep into code! I am going to break down the full rule into sections, then show you the whole piece of code together that we will be uploading. One other important note before getting started concerns GUIDs. Below I have several GUIDs for the various objects, and it is important that these GUIDs are always unique. There are several GUID generators out there, but be aware that you may run into errors when saving code if you use a GUID that already exists in a rule set somewhere else. The first section covers the overall structure of the rule: things like the version, schema, GUID, publisher name and description. This section is shown below:

<?xml version="1.0" encoding='UTF-8'?>

<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">

    <RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be">

        <Version major="1" minor="0" build="0" revision="0"/>

        <Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/>

        <Details defaultLangCode="en-us">

            <LocalizedDetails langcode="en-us">

                <PublisherName>DLP by the Cloud Master</PublisherName>

                <Name>Custom SSN Classification</Name>

                <Description>Custom SSN Classification</Description>

            </LocalizedDetails>

        </Details>

    </RulePack>
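
If you do not have a GUID generator handy, a few lines of Python can produce the values. This is only a sketch: the `new_guid` helper and the `ids` dictionary are names made up for illustration, and it assumes that a random (version 4) GUID is acceptable for the RulePack, Publisher and Entity ids, as the ones shown in this post appear to be.

```python
import uuid


def new_guid() -> str:
    """Return a random (version 4) GUID in lowercase hyphenated form."""
    return str(uuid.uuid4())


# One fresh value each for the RulePack id, the Publisher id and the Entity id.
ids = {name: new_guid() for name in ("RulePack", "Publisher", "Entity")}

for name, value in ids.items():
    print(f"{name}: {value}")
```

Generate a new set for every rule pack you build rather than copying the GUIDs from this post, since reusing an existing id is exactly what leads to the import errors mentioned above.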

Once we get past the framework code we can look at what the package is actually delivering. I am going to break this part into multiple sections because it has a lot of meat to it. The first section is all about the Entity:

     <Rules>

        <!-- SSN -->

        <Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b" patternsProximity="300" recommendedConfidence="75">

            <Pattern confidenceLevel="85">

            <IdMatch idRef="FormattedSSN" />

            </Pattern>            

            <Pattern confidenceLevel="85">

            <IdMatch idRef="UnformattedSSN" />

            </Pattern>

        </Entity>

 

Some of the key attributes include the Entity id, patternsProximity and recommendedConfidence. The Entity id is a unique GUID for the rule. The patternsProximity attribute defines the distance, in Unicode characters, from the IdMatch location within which all other matches specified for that pattern must fall. Finally, recommendedConfidence is the attribute that suggests the confidence level for a 'positive ID' of the rule, in this case 75.

Within the rule there are additional components, including the Pattern confidenceLevel and the IdMatch idRef. Together these define the confidence level that is assigned if the IdMatch is found. In the rule here we are searching for two patterns, FormattedSSN and UnformattedSSN; if either pattern is matched, the confidence level is set to 85.
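
To make the relationship between confidenceLevel and recommendedConfidence more concrete, here is a simplified Python model of the behavior described above. It is not how Exchange is implemented: the `Pattern` class and the `entity_confidence` and `is_hit` functions are hypothetical, and the sketch assumes that the confidence of an entity is the highest confidenceLevel among the patterns whose IdMatch was found, which is then compared against a minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    id_ref: str            # the IdMatch idRef, e.g. "FormattedSSN"
    confidence_level: int  # the Pattern confidenceLevel attribute


# The two patterns and the recommendedConfidence from the Entity above.
PATTERNS = [Pattern("FormattedSSN", 85), Pattern("UnformattedSSN", 85)]
RECOMMENDED_CONFIDENCE = 75


def entity_confidence(patterns, matched_ids):
    """Highest confidence among patterns whose IdMatch was found, else 0."""
    return max(
        (p.confidence_level for p in patterns if p.id_ref in matched_ids),
        default=0,
    )


def is_hit(matched_ids, minimum=RECOMMENDED_CONFIDENCE):
    """True when the assumed entity confidence reaches the minimum."""
    return entity_confidence(PATTERNS, matched_ids) >= minimum


print(is_hit({"FormattedSSN"}))  # one pattern found
print(is_hit(set()))             # nothing found
```

Under this model either pattern on its own (85) clears the recommended confidence of 75, which is why the custom rule fires without needing a nearby keyword such as SSN or SS#.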

Now that we have defined the confidence level for a matched pattern, we need to define in more detail what FormattedSSN and UnformattedSSN mean. This is where we get a little crazy in the coding… and for full disclosure, I did not work out the logic in these rules… Bing is an amazing tool!

Now that we have that cleared up we can look at our expressions:

        <Regex id="FormattedSSN">

        (?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}

        </Regex>

        <Regex id="UnformattedSSN">

        (?!\b(\d)\1+\b)(?!123456789|219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}

        </Regex>

To get details on the expressions you can follow the link above to the original source article I pulled them from. Essentially, the FormattedSSN expression looks for the typical SSN sequence in the form NNN-NN-NNNN, where N is any digit from 0-9. It also filters out commonly faked SSNs and invalid sequences, such as 123-45-6789 or sections made up entirely of 0s. The second expression, UnformattedSSN, looks for everything above but doesn't require the '-', so it matches SSNs in the format NNNNNNNNN, where N is any digit from 0-9. This expression performs the same validation steps to ensure the SSN is a valid one.
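
Before uploading anything it can help to try the two expressions against a few sample strings. The sketch below does that with Python's `re` module, on the assumption that Python's regex engine treats these lookaheads and backreferences the same way the Exchange engine does, which is not guaranteed. The sample numbers are made up, `first_match` is a helper written for this example, and `BOUNDED_UNFORMATTED_SSN` is a hypothetical variant with word boundaries that is not part of the rule in this post.

```python
import re

FORMATTED_SSN = re.compile(
    r"(?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)"
    r"(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}"
)
UNFORMATTED_SSN = re.compile(
    r"(?!\b(\d)\1+\b)(?!123456789|219099999|078051120)"
    r"(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}"
)
# Hypothetical variant (not in the original rule): the same expression
# wrapped in word boundaries so it cannot match inside a longer digit run.
BOUNDED_UNFORMATTED_SSN = re.compile(r"\b(?:" + UNFORMATTED_SSN.pattern + r")\b")


def first_match(pattern, text):
    """Return the first substring of text the pattern matches, or None."""
    found = pattern.search(text)
    return found.group(0) if found else None


samples = [
    (FORMATTED_SSN, "My number is 536-22-8726, thanks"),  # plausible SSN
    (FORMATTED_SSN, "123-45-6789"),                       # commonly faked
    (FORMATTED_SSN, "000-12-3456"),                       # invalid area 000
    (UNFORMATTED_SSN, "536228726"),                       # plausible SSN
    (UNFORMATTED_SSN, "5551234567"),                      # 10-digit number
    (BOUNDED_UNFORMATTED_SSN, "5551234567"),              # same, with \b
]
for pattern, text in samples:
    print(repr(text), "->", first_match(pattern, text))
```

One thing this makes visible: because UnformattedSSN has no anchors or word boundaries, it matches the first nine digits of a longer digit string such as a 10-digit phone number. The bounded variant avoids that in Python, but test any such change in your own tenant before relying on it.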

The final section of code covers the LocalizedStrings, which provide localization for rule names and their descriptions. This includes the default name and default description of the rule.

        <LocalizedStrings>

            <Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">

                <Name default="true" langcode="en-us">

                    Custom Social Security Number

                </Name>

                <Description default="true" langcode="en-us">

                    A custom classification for detecting Social Security numbers

                </Description>

            </Resource>

        </LocalizedStrings>

    </Rules>

</RulePackage>

The full rule when it is all put together looks like this:

<?xml version="1.0" encoding='UTF-8'?>

<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">

    <RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be">

        <Version major="1" minor="0" build="0" revision="0"/>

        <Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/>

        <Details defaultLangCode="en-us">

            <LocalizedDetails langcode="en-us">

                <PublisherName>DLP by the Cloud Master</PublisherName>

                <Name>Custom SSN Classification</Name>

                <Description>Custom SSN Classification</Description>

            </LocalizedDetails>

        </Details>

    </RulePack>

    <Rules>

        <!-- SSN -->

        <Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b" patternsProximity="300" recommendedConfidence="75">

            <Pattern confidenceLevel="85">

            <IdMatch idRef="FormattedSSN" />

            </Pattern>            

            <Pattern confidenceLevel="85">

            <IdMatch idRef="UnformattedSSN" />

            </Pattern>

        </Entity>

        <Regex id="FormattedSSN">

        (?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}

        </Regex>

        <Regex id="UnformattedSSN">

        (?!\b(\d)\1+\b)(?!123456789|219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}

        </Regex>

        <LocalizedStrings>

            <Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">

                <Name default="true" langcode="en-us">

                    Custom Social Security Number

                </Name>

                <Description default="true" langcode="en-us">

                    A custom classification for detecting Social Security numbers

                </Description>

            </Resource>

        </LocalizedStrings>

    </Rules>

</RulePackage>
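
Since a single mistyped id is enough to make the import fail, it can be worth checking the file for internal consistency first. The following Python sketch parses a copy of the rule pack and verifies that the ids are well-formed, unique GUIDs, that every IdMatch idRef points at a defined Regex, and that every Resource idRef points at a defined Entity. The `check_rule_pack` function is hypothetical and covers only these checks; it is not a substitute for the validation Exchange performs on upload.

```python
import uuid
import xml.etree.ElementTree as ET

NS = {"mce": "http://schemas.microsoft.com/office/2011/mce"}

# A compact copy of the rule pack from this post (raw string keeps the \d, \b).
RULE_PACK_XML = r"""<?xml version="1.0" encoding="UTF-8"?>
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be">
    <Version major="1" minor="0" build="0" revision="0"/>
    <Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/>
    <Details defaultLangCode="en-us">
      <LocalizedDetails langcode="en-us">
        <PublisherName>DLP by the Cloud Master</PublisherName>
        <Name>Custom SSN Classification</Name>
        <Description>Custom SSN Classification</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>
  <Rules>
    <!-- SSN -->
    <Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b" patternsProximity="300" recommendedConfidence="75">
      <Pattern confidenceLevel="85"><IdMatch idRef="FormattedSSN"/></Pattern>
      <Pattern confidenceLevel="85"><IdMatch idRef="UnformattedSSN"/></Pattern>
    </Entity>
    <Regex id="FormattedSSN">(?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}</Regex>
    <Regex id="UnformattedSSN">(?!\b(\d)\1+\b)(?!123456789|219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}</Regex>
    <LocalizedStrings>
      <Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">
        <Name default="true" langcode="en-us">Custom Social Security Number</Name>
        <Description default="true" langcode="en-us">A custom classification for detecting Social Security numbers</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
"""


def check_rule_pack(xml_text):
    """Return a list of problems found; an empty list means all checks passed."""
    # Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    root = ET.fromstring(xml_text.encode("utf-8"))
    problems = []

    entities = root.findall("mce:Rules/mce:Entity", NS)
    guid_elements = (
        root.findall("mce:RulePack", NS)
        + root.findall("mce:RulePack/mce:Publisher", NS)
        + entities
    )
    seen = set()
    for element in guid_elements:
        value = element.get("id", "")
        try:
            uuid.UUID(value)
        except ValueError:
            problems.append(f"not a valid GUID: {value!r}")
        if value in seen:
            problems.append(f"GUID used more than once: {value}")
        seen.add(value)

    regex_ids = {r.get("id") for r in root.findall("mce:Rules/mce:Regex", NS)}
    entity_ids = {e.get("id") for e in entities}
    for match in root.findall("mce:Rules/mce:Entity/mce:Pattern/mce:IdMatch", NS):
        if match.get("idRef") not in regex_ids:
            problems.append(f"IdMatch refers to unknown Regex: {match.get('idRef')}")
    for res in root.findall("mce:Rules/mce:LocalizedStrings/mce:Resource", NS):
        if res.get("idRef") not in entity_ids:
            problems.append(f"Resource refers to unknown Entity: {res.get('idRef')}")
    return problems


print(check_rule_pack(RULE_PACK_XML) or "no problems found")
```

To check your own file instead of the embedded copy, read it as UTF-8 text and pass the contents to `check_rule_pack`.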

 

Adding the Custom Rule Classification to Exchange Online

Now that we have our rule defined, we can add it to Exchange Online to be used in a DLP policy. To do this we are going to need PowerShell and a global admin account for your Office 365 tenant.

 

  • First, copy your code to Notepad and save it in an easily accessible location; in this example we are going to use C:\Temp. When you save the file, change the Save as type to All Files and save it with an XML extension, using UTF-8 as the encoding type.

     

  • Connect to your Office 365 Tenant via PowerShell

    $Cred = Get-Credential

    $Session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://ps.outlook.com/powershell/ -Credential $Cred -Authentication Basic -AllowRedirection

    Import-PSSession $Session

  • Next, import your Classification Rule

    New-ClassificationRuleCollection -FileData ([Byte[]]$(Get-Content -Path C:\temp\ssn.xml -Encoding byte -ReadCount 0))

     

 

  • If you want to be sure, you can run the following command to confirm the rule was properly imported. Note that the command below provides full details of each rule rather than a summary; if you want a quick summary, just run Get-ClassificationRuleCollection and remove the last section (| fl). I have highlighted the rule for clarity.

    Get-ClassificationRuleCollection | fl

       

Time to create a new DLP rule in the Exchange Admin Center (EAC)

We are going to create a DLP rule that blocks email containing our custom sensitive information type when it is sent from inside the organization to external recipients, with the following features:

  • Block the message
  • Secure MailTip when rule is triggered to warn the user
  • Email notification to administrator when rule is tripped including details

 Here we go…

  • Log into the Office 365 Portal (http://portal.microsoftonline.com)
  • Click the Admin button and select Exchange from the drop down list
  • Once in the EAC, click Compliance Management from the menu on the left then click Data Loss Prevention
  • Select New custom DLP policy

       

  • In the DLP policy creator give the policy a unique name, description (optional), select enabled and Test DLP policy with Policy Tips then click Save

     

  •  Next we will select our newly created policy and click the pencil icon to edit it

  

  • Click Rules, select the drop down next to the '+' and select Block messages with sensitive information

         

  •  Click *select sensitive information types

 

  •  Click the '+', scroll down until you find your newly created classification rule, highlight the rule, click add, and click OK

 

  •  Leave all the defaults in the Sensitive Information Types section and click OK
    • Note – You can override the variables we set in the rule here by clicking the pencil and selecting the specific area you want to override

  • Next, click *Select One…

  • Select the administrator you want to receive the violation email, I am going to set this to my account for this demo
  • Next click *Include message properties

  •  Select the specific properties you want to be in the alert email, in this case we are going to take sender, recipient, cc'd recipients, bcc'd recipients, matching rules, detected data classifications and matching content

  

  •  For now leave all the other defaults and click Save

Testing the new rule

For clarity I am going to disable all rules except the one we just created.

I am a user up to no good. I open up an email and try to send a formatted SSN to an external recipient. As you can see, within seconds of writing the SSN in the email (without any other markers such as SSN, SS#, etc.) I get a MailTip warning me that sensitive data has been detected.

But I am a rebel so I decide to send the message anyway. While I obviously can't show you a message that wasn't delivered, I can show you the report I received within minutes of the rule being triggered, including all of the important attributes I asked for in the rule properties.

I get the same results if I use an unformatted SSN. So what happens when I have an attachment that has secure data in it? I created a simple Excel spreadsheet that includes a Social Security Number:

Well darn…I tried to be tricky but the filter caught this one as well. Not only did it add a MailTip, it also highlighted the document with the secure data.

And here is the corresponding alert email I got after attempting to send the message.

Conclusion

As you can see, DLP is very flexible and can be customized to protect your organization from the loss of virtually any type of content you deem sensitive. Not only that, but you can choose to take a variety of actions if a rule is triggered, including:

  • Block message
  • Warn user with MailTip
  • Block but allow user to override
  • Submit to admin for approval
  • Encrypt document
  • Apply rights management to the message

Exchange Online's DLP policy is a great hybrid of pre-created policies based on common laws across the world and custom policies tailored to your organization's specific needs. Additionally, DLP allows you to implement a rule without acting on it, essentially letting you see how effective the rule would be before you decide to fully enforce it across the organization. Stay tuned for more great posts about DLP templates and policies.


Comments (2)

  1. Kyle Green says:

    Jorge,

    Thanks a ton for the write up, this is exactly what my customer was missing from the built in SSN rules. I couldn’t help but notice that the "UnformattedSSN" regex will flag an unformatted 10 digit phone number, or any 10+ digit string of numbers for that matter.

    I attempted to add to your regex of :
    (?!\b(\d)\1+\b)(?!123456789|219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}

    By adding the following: ^\d{3}(?!00)\d{2}(?!0{4})\d{4}$

    Even though the regex checks out using various testing methods, EOP seems to generate inconsistent results with this. I have found that it would ignore a 10 digit phone number one email, then flag it the very next. Do you have any ideas on how to possibly improve
    the regex within your rule to prevent these false positives?

  2. chris pyle says:

    AWESOME!
