This post is by Corom Thompson, Principal Software Engineer at Microsoft.
On November 22nd, 1963, the President of the United States, John F. Kennedy, was assassinated. He was shot by a lone gunman named Lee Harvey Oswald while riding through the streets of Dallas in his motorcade. The assassination has been the subject of so much controversy that, 25 years ago, an act of Congress mandated that all documents related to the assassination be released this year. The first batch of released files has more than 6,000 documents totaling 34,000 pages, and the last drop of files contains at least twice as many documents.
We’re all curious to know what’s inside them, but it would take decades to read through them all. We approached this problem by using Azure Search and Cognitive Services to extract knowledge from this deluge of documents. A continuous process ingests the raw documents and enriches them into structured information that enables you to explore the underlying data.
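To make the ingest-and-enrich idea concrete, here is a minimal sketch in plain Python. It does not call Azure Search or Cognitive Services; the `Document` class, the `fake_ocr` and `fake_entity_extraction` steps, and the `KNOWN_ENTITIES` list are all hypothetical stand-ins for the real services, and the sample text is invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Document:
    doc_id: str
    raw: str                                  # stand-in for scanned page content
    text: str = ""                            # filled in by the OCR-like step
    entities: List[str] = field(default_factory=list)


def fake_ocr(doc: Document) -> Document:
    # Hypothetical step: a real pipeline would send the page image to an OCR service.
    doc.text = doc.raw.strip()
    return doc


KNOWN_ENTITIES = ("Oswald", "CIA", "FBI", "KGB")  # illustrative list only


def fake_entity_extraction(doc: Document) -> Document:
    # Hypothetical step: naive substring matching instead of a real entity service.
    lowered = doc.text.lower()
    doc.entities = [e for e in KNOWN_ENTITIES if e.lower() in lowered]
    return doc


def enrich(docs: Iterable[Document],
           steps: List[Callable[[Document], Document]]) -> List[Document]:
    # Run every enrichment step, in order, over every ingested document.
    enriched = []
    for doc in docs:
        for step in steps:
            doc = step(doc)
        enriched.append(doc)
    return enriched


raw_docs = [
    Document("doc-1", "  Memo: the CIA reviewed Oswald's travel.  "),
    Document("doc-2", "FBI field report, Dallas office."),
]
records = enrich(raw_docs, [fake_ocr, fake_entity_extraction])
for record in records:
    print(record.doc_id, record.entities)
```

Keeping each enrichment as a separate function means new steps can be appended to the list without touching the loop, which is one plausible way to keep such a process continuous as more files arrive.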
Today, at the Microsoft Connect(); 2017 event, we created the demo web site* shown in Figure 1 below – this is a web application that uses the AzSearch.js library and is designed to give you interesting insights into this vast trove of information.
Figure 1 – JFK Files web application for exploring the released files
On the left you can see that the documents are broken down by the entities that were extracted from them. Already we know these documents are related to JFK, the CIA, and the FBI. Leveraging several Cognitive Services, including optical character recognition (OCR), Computer Vision, and custom entity linking, we were able to annotate all the documents to create a searchable tag index.
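The idea of a searchable tag index with an entity breakdown can be illustrated with a small inverted index. This is only a sketch of the general technique, not how Azure Search is implemented; the `annotated` data and the `build_tag_index`, `search`, and `facet_counts` helpers are hypothetical names invented for illustration.

```python
from collections import defaultdict

# Hypothetical annotated documents: id -> tags produced by enrichment.
annotated = {
    "doc-1": ["Oswald", "CIA"],
    "doc-2": ["FBI"],
    "doc-3": ["Oswald", "FBI", "KGB"],
}


def build_tag_index(docs):
    # Map each lower-cased tag to the set of documents that carry it.
    index = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag in tags:
            index[tag.lower()].add(doc_id)
    return index


def search(index, term):
    # Case-insensitive exact-tag lookup; returns matching document ids, sorted.
    return sorted(index.get(term.lower(), set()))


def facet_counts(docs, doc_ids):
    # Count how often each tag appears among the matching documents.
    counts = defaultdict(int)
    for doc_id in doc_ids:
        for tag in docs[doc_id]:
            counts[tag] += 1
    return dict(counts)


index = build_tag_index(annotated)
hits = search(index, "oswald")
print(hits)
print(facet_counts(annotated, hits))
```

The facet counts play the role of the entity breakdown on the left of the demo site: they summarize which other tags appear in the documents that match a query.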
We were also able to create a visual map of these linked entities to demonstrate the relationships between the different tags and data. Below, in Figure 2, is the visualization of what happened when we searched this index for “Oswald”.
Figure 2 – Visualization of the entity linked mapping of tags for the search term “Oswald”
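One simple way to derive such a map of linked entities is to count how often two tags appear in the same document and treat each pair as a weighted edge. The sketch below assumes that approach; it is not necessarily how the demo builds its visualization, and the `tagged_docs` sample and helper names are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-document tag lists (invented sample data).
tagged_docs = [
    ["Oswald", "CIA", "Nosenko"],
    ["Oswald", "FBI"],
    ["CIA", "Nosenko", "KGB"],
]


def cooccurrence_edges(docs):
    # Each unordered pair of tags sharing a document adds 1 to that edge's weight.
    edges = Counter()
    for tags in docs:
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


def neighbors(edges, entity):
    # Collect the entities directly linked to `entity`, with edge weights.
    result = {}
    for (a, b), weight in edges.items():
        if a == entity:
            result[b] = weight
        elif b == entity:
            result[a] = weight
    return result


edges = cooccurrence_edges(tagged_docs)
oswald_links = neighbors(edges, "Oswald")
print(oswald_links)
```

The resulting edge list can be handed to any graph-drawing library to render a map around a search term.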
Through further investigation and linking, we were even able to see that the entity linking Cognitive Service had annotated the name Nosenko with a connection to Wikipedia. We quickly realized that the Nosenko identified in the documents was actually a KGB defector interrogated by the CIA, and that these are audio tapes of the actual interrogation. It would have taken years to figure out these connections by hand, but we were instead able to do this in minutes thanks to the power of Azure Search and Cognitive Services.
Another fun fact we learned is that the government was actually using SQL Server and a secured architecture to manage these documents in 1997, as seen in the architecture diagram in Figure 3 below.
Figure 3 – Architecture diagram from 1997 indicating SQL Server was used to manage these documents
We have created an architecture diagram of our own to demonstrate how this new AI-powered approach is orchestrating the data and pulling insights from it – see Figure 4 below.
This is the updated architecture we used to apply the latest Azure-powered developer tools to create these insightful web apps. Figure 4 displays this architecture in the same style as the 1997 diagram in Figure 3.
Figure 4 – Updated architecture of Azure Search and Cognitive Services
We’ll be making this code available soon, along with tutorials on how we built the solution – stay tuned for more updates and links on this blog.
Update to original blog post: The code is now available on GitHub here.
Meanwhile, you can navigate through the online version of our application* and draw your own insights!
* Try typing a keyword into the Search bar up at the top of the demo site, to get started, e.g. “Oswald”.