The new Search in Exchange Server 2007


In Exchange Server 2007, we completely rewrote Search. We upgraded from MS-Search 2.0 to 3.0 (which SQL Server 2005 also uses) and changed from a “crawl” model to an “always up to date” model. We also rewrote the indexing part to be more efficient in how it communicates with the Exchange Information Store. The end result is a 35x improvement in indexing speed.

Because Search used so many resources, it was disabled by default in Exchange Server 2003. In Exchange Server 2007, it is enabled by default, and all our Microsoft Dogfood servers have it running. It hums along nicely, taking between 2 and 5% of CPU in steady state, rather than bringing the machine to its knees every night. Another benefit of staying on top of things is that new messages are still fresh in the Jet cache when they are indexed, so reading them doesn’t cause any additional I/Os.

Two new services handle the indexing work:

  • The Exchange Search Indexer is the component responsible for driving the creation and the updates of the search index. Think of it as the conductor – it constantly monitors changes in the system and schedules work in response. For example:
    • Adding a new Mailbox Database (MDB) triggers a crawl of the new MDB
    • Moving a mailbox triggers a crawl of the moved mailbox
    • Receiving a new email causes the indexer to add the message to the index
  • MS-Search: The Search Indexer sends MS-Search batches of messages (documents, in MS-Search terminology) that need to be indexed. To retrieve the contents of these documents, MS-Search calls into an Exchange-specific protocol handler that we wrote to get data out of the Exchange Information Store. It then uses a set of filters that understand specific formats (HTML, PDF, Word, Excel, PowerPoint…) and decode them into plain text. To identify the words in the document, MS-Search uses another set of components called word breakers (in some languages, such as Chinese or Japanese, it is not easy to tell how symbols are grouped to form a word). This is no different than the Windows Content Indexing Service or Windows Desktop Search.
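The conductor role described in the list above can be sketched roughly as follows. This is only an illustration in Python, not Exchange code: the event names, the `IndexerSketch` class and the batch size are all hypothetical, chosen to mirror the three examples in the list and the idea of handing work over in batches.

```python
from collections import deque

# Hypothetical event names mapped to the work each one triggers,
# mirroring the three examples in the list above.
WORK_FOR_EVENT = {
    "database_added": "crawl_database",
    "mailbox_moved": "crawl_mailbox",
    "message_received": "index_message",
}


class IndexerSketch:
    """Toy scheduler: turns store events into queued indexing work."""

    def __init__(self):
        self.queue = deque()

    def on_event(self, event_type, target):
        # Look up the work item for this event and queue it in arrival order.
        work = WORK_FOR_EVENT[event_type]
        self.queue.append((work, target))

    def next_batch(self, size):
        # Hand out up to `size` queued items, oldest first.
        batch = []
        while self.queue and len(batch) < size:
            batch.append(self.queue.popleft())
        return batch


indexer = IndexerSketch()
indexer.on_event("database_added", "MDB1")
indexer.on_event("message_received", "msg-42")
indexer.on_event("mailbox_moved", "alice")
first_batch = indexer.next_batch(2)
second_batch = indexer.next_batch(2)
```

A plain first-in, first-out queue is the simplest choice here. The real indexer may well prioritize work differently; the post does not say.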

You might notice that a third process actually uses most of the CPU. That is because MS-Search is composed of the core indexer (msftesql.exe) and a sacrificial filter daemon (msftefd.exe), which can be recycled at will. The filter daemon is where the protocol handler, filters and word breakers live.

The performance of local search in Exchange Server 2007 is significantly better than in Exchange 2003. Search results are returned within seconds, and the search index is also typically updated within seconds after a message is created.

Full crawl is also much faster, but still uses a significant amount of CPU, memory and I/O. During more intensive processing phases, this could in principle disrupt regular mail flow. Since delivery of mail must take precedence over indexing, indexing backs off (throttles) when the load on the mailbox server becomes too high. The indexing load is controlled by regulating the number of items (document chunks) that are processed per unit of time. A monitoring thread keeps track of the load on each MDB using the average latency of retrieving documents from the MDB. If this latency crosses a certain threshold (the default is 20 ms), the thread starts to restrict indexing by setting progressively higher processing delays per MDB (the delay value). Before fetching a document from the MDB, the indexer checks the current throttling delay value; if it is larger than zero, it sleeps accordingly. Throttling only occurs during full crawls.
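The check-then-sleep step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual indexer: `crawl_with_throttling`, `fetch_document` and `get_delay_seconds` are hypothetical names, and expressing the delay in seconds is an assumption, since the post does not give the unit.

```python
import time


def crawl_with_throttling(doc_ids, fetch_document, get_delay_seconds, sleep=time.sleep):
    """Fetch documents one by one, backing off when a delay value is set.

    get_delay_seconds stands in for reading the per-MDB delay value that
    the monitoring thread maintains (hypothetical callback).
    """
    fetched = []
    for doc_id in doc_ids:
        delay = get_delay_seconds()
        # Only sleep when the current delay value is larger than zero.
        if delay > 0:
            sleep(delay)
        fetched.append(fetch_document(doc_id))
    return fetched


# Usage with a recorded sleep instead of a real one, so it runs instantly.
slept = []
delays = iter([0.0, 0.05, -0.01])
docs = crawl_with_throttling(
    [1, 2, 3],
    fetch_document=lambda doc_id: f"doc-{doc_id}",
    get_delay_seconds=lambda: next(delays),
    sleep=slept.append,
)
```

Passing `sleep` in as a parameter is just a convenience for trying the loop out without waiting.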

The delay value (DV_t) for period t is calculated according to the following formula:

Error_t = Max(0, Latency_t - LatencyThreshold)

PK_t = KP * Error_t

DV_t = Min(MaxDV, (1 - alfa) * PK_t - alfa * DV_(t-1))

where LatencyThreshold is the latency that triggers throttling, MaxDV is the maximum allowed delay value, KP is the proportional gain factor and alfa is the feedback factor.
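As a worked illustration, the formula above can be written as a small Python function. Only the 20 ms threshold comes from the text. The values used for KP, alfa and MaxDV are made-up assumptions chosen to keep the arithmetic easy to follow, and `next_delay_value` is a hypothetical name.

```python
def next_delay_value(latency_ms, prev_dv,
                     latency_threshold_ms=20.0,  # default threshold from the text
                     kp=2.0,                     # assumed proportional gain
                     alfa=0.5,                   # assumed feedback factor
                     max_dv=1000.0):             # assumed maximum delay value
    """Compute the delay value for the current period, as in the formula above."""
    error = max(0.0, latency_ms - latency_threshold_ms)
    pk = kp * error
    return min(max_dv, (1.0 - alfa) * pk - alfa * prev_dv)


# Latency 10 ms above the threshold, no previous delay:
# error = 10, pk = 20, delay value = 0.5 * 20 - 0.5 * 0 = 10
dv_over = next_delay_value(30.0, 0.0)

# Latency back under the threshold with a previous delay of 10:
# error = 0, pk = 0, delay value = 0 - 0.5 * 10 = -5
dv_under = next_delay_value(10.0, dv_over)

# A very large latency is capped at the maximum delay value.
dv_capped = next_delay_value(10000.0, 0.0)
```

With the formula as written, the result can drop below zero once latency falls back under the threshold. That is consistent with the indexer only sleeping when the delay value is larger than zero.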

Here is a Performance Monitor screenshot of throttling in action:

Notice the white line, which shows the delay value. The red line is the CPU utilization, which drops dramatically when the delay value increases. The greenish line is the latency.

– Remi Lemarchand

Comments (20)
  1. I know this is a little bit off-topic, but do you really mean SQL Server 2007(!) not 2005…?

    Christian

  2. Remi says:

    Indeed – I meant SQL Server 2005.

  3. Exchange says:

    OK fixed the typo in the article. :)

  4. tt says:

    None of this matters a whit unless you’ve persuaded the Outlook team to allow usage of this new Exchange search when Outlook is running in cached Exchange mode.  Current versions of Outlook unbelievably merely thrash the local disk in a brute force search.  It is so bad I recommend our Outlook users use Copernic desktop search to run searches on their Outlook email.

  5. Remi says:

    Well it does matter to some extent – not everybody runs cached mode Outlook, and most people do use OWA and/or Airsync which make full use of the new Exchange Search.

    If you’re running cached mode, you can install Windows Desktop Search or equivalent (I don’t know Copernic Desktop Search but I imagine it’s similar) – or upgrade to Outlook 2007.

  7. Matt Wilbur says:

    Very interesting. I’m curious if you’ve considered adding, as a feature, an ability to search against the catalog at a mailbox store level?  We are in the middle of trying to hunt down and remove some proprietary phrases that fell into some mailboxes (compliance issues) and it is IMPOSSIBLE to use anything but a dumb ASCII viewer to look inside the edb files directly. It’s a shame that the search index is unreachable to admins to look for potential audit/compliance/security issues when needed. (should be easy to add?)

    just my $.02. sounds very nice – would make this weekend a lot better for me if something like that was available for 2003 :)

  10. Exchange says:

    Matt,

    Good news – we DO have something like this available in Exchange 2007… great point, I’ll work on getting a blog post on this written as it definitely deserves its own entry and more detail!

  14. Chris24 says:

    Remi,

    I am anxious to hear what your team is doing for Administrative search (keyword search) against the catalog across the stores. Multi-mailbox is a feature we could use right away. Thx.

  15. Kumar says:

    Hello Chris & Matt,

    I’m Kumar Cunchala, the PM for Exchange 2007 Search. We are in the process of posting a blog article which will talk about performing keyword searches (in subject and body) against multiple mailboxes. If you have any other questions about Search in Exchange 2007, please feel free to email me at kumarcs AT microsoft DOT com.

    Regards,
    Kumar

  16. Andy says:

    Does this mean that I would be able to search across multiple exchange mailboxes in OL2007? WDS won’t let me do that, and upgrading to OL2007 broke Lookout (which did).

  17. Bill says:

    The only way I have been able to search multiple exchange mailboxes in Outlook 2007 is by installing Google Desktop Search.  It searches multiple exchange mailboxes and subfolders without issue.

  18. Andy says:

    GDS doesn’t appear to do it with OL2007, but Copernic is looking good so far to do the job – cheers tt

  19. Exchange says:

    Bill & Andy,

    Keep checking back – in a few days we will have a blog post that will go into this more!

  20. HK says:

    Remi

    In an earlier post here it was mentioned that there would be a blog article about performing keyword searches against multiple mailboxes. We need to search across multiple mailboxes in our Exchange 2007 without having to do it on individual mailboxes. Is there any info available on how to do this? I couldn’t find any info on this topic on the Internet. Your help would be much appreciated.

    Thanks in advance,

    Hosul
