Augmenting Image Metadata with Cognitive Services


Hello Folks,

This week, the team from Microsoft Canada has partnered with four startup teams at Communitech, an industry-led innovation centre based in Waterloo, Ontario that supports and fosters a community of nearly 1,000 tech companies, to incorporate Microsoft Cognitive Services into their offerings. While I am unfortunately unable to be there, I wanted to share my experience helping another startup do the same.

Last month, we partnered with a local startup to build a solution to an ever-growing problem we all share: how do we categorize and index all our digital pictures so that we can search and retrieve them based on picture content, and not just on the date taken or location?

The Kwilt team is based out of Invest Ottawa. This startup has ambitious goals for its applications and currently offers several apps that change how people interact with their photos.

They've created an application that helps users navigate the sea of photos scattered across their social, cloud, and messaging accounts. All of the search work happens seamlessly through the Kwilt app, both as a mobile app and on the web.


During the project, we leveraged Azure Cognitive Services to augment the capabilities of the app. We introduced features that spare users the chore of “tagging” every photo by hand to get accurate searches, tolerate mistyped tags that would otherwise return the wrong photos or none at all, handle accents and special characters in search terms (i.e., Montreal ≠ Montréal), and more.

Technologies used in this project:

  • Azure Storage
  • Azure Functions
  • Microsoft Cognitive Services – Computer Vision API
  • Azure Cosmos DB (formerly DocumentDB)
  • Azure Search

Here is how we get this done!

[Solution architecture diagram]

And by the way, the code we used is available here.

Ingest and Analyze

Ingesting the data from the Kwilt database into Azure Cognitive Services allows the service to tag photos automatically, eliminating the need for manual user input. We started by building a robust, efficient workflow to push data from the Kwilt backend database to Azure for analysis by Cognitive Services.

You can test the capabilities of the service yourselves.

First, the data is received from the Kwilt backend into an Azure Storage Queue. The queue is fed by a proprietary PHP process that extracts new entries from the Kwilt database, converts them to JSON, and sends them to the Azure Storage Queue using the Azure Storage PHP SDK with the configured storage account name and key.

Here is a sample message as it is stored in the queue.

{
    "sorting_time": "2015-06-07 22:50:36",
    "type": "image",
    "id": 68682364,
    "name": "010309_0800_6154_nals",
    "created_time": "2015-06-08 05:50:36",
    "width": 919,
    "height": 602,
    "mime_type": "image/jpeg",
    "size": 576761,
    "time_taken": "2015-06-07 22:50:36",
    "modified_time": "2015-06-08 05:50:38",
    "source_url": "https://farm1.staticflickr.com/333/18585231402_798c4247fe_o.jpg",
    "recent_time": "2015-06-07 22:50:36",
    "thumbnail_url": "https://farm1.staticflickr.com/333/18585231402_eac0b3fe77_z.jpg"
}
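For illustration, here is a minimal Node.js sketch of composing a message in that format and serializing it for the queue. This is only a sketch: the production feed is the proprietary PHP process described above, and the field values are taken from the sample message.

```javascript
// Illustrative only: the real feed is a proprietary PHP process.
// Compose a queue message matching the sample format and serialize it to JSON.
function buildQueueMessage(row) {
  return JSON.stringify({
    type: 'image',
    id: row.id,
    name: row.name,
    mime_type: row.mime_type,
    width: row.width,
    height: row.height,
    size: row.size,
    thumbnail_url: row.thumbnail_url,
    source_url: row.source_url,
    created_time: row.created_time,
    time_taken: row.time_taken
  });
}

var messageText = buildQueueMessage({
  id: 68682364,
  name: '010309_0800_6154_nals',
  mime_type: 'image/jpeg',
  width: 919,
  height: 602,
  size: 576761,
  thumbnail_url: 'https://farm1.staticflickr.com/333/18585231402_eac0b3fe77_z.jpg',
  source_url: 'https://farm1.staticflickr.com/333/18585231402_798c4247fe_o.jpg',
  created_time: '2015-06-08 05:50:36',
  time_taken: '2015-06-07 22:50:36'
});
console.log(messageText);
```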

 

Once a message is in the queue, it triggers the Azure Function (below is a screen capture of the Azure Function configuration).

[Screenshot: Azure Function configuration]
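The binding configuration behind that screen looks roughly like the following function.json sketch. The queue, database, collection, and connection names here are placeholders, not the project's actual values; the binding `name` properties match the `message` parameter and `context.bindings.outputDocument` used in the function code below.

```json
{
  "bindings": [
    {
      "type": "queueTrigger",
      "direction": "in",
      "name": "message",
      "queueName": "photo-queue",
      "connection": "AzureWebJobsStorage"
    },
    {
      "type": "documentDB",
      "direction": "out",
      "name": "outputDocument",
      "databaseName": "kwilt",
      "collectionName": "photos",
      "createIfNotExists": true,
      "connection": "CosmosDBConnection"
    }
  ],
  "disabled": false
}
```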

Once the data is in the queue for analysis, we leverage Azure Functions to send the info to Cognitive Services to analyze the image. Of course, since we want to follow proper DevOps practices, we have configured Azure Functions for continuous integration from a GitHub repository set up for the various functions.

var https = require('https');

module.exports = function (context, message) {
  logInfo(context, 'Analyzing Image Id: ' + message.id)
  logVerbose(context, 'Queue Message:\n' + JSON.stringify(message));

  // Validate Configuration
  if (!process.env.OcpApimSubscriptionKey) {
    throwError(context, 'Missing Configuration, OcpApimSubscriptionKey not configured in Application Settings.');
  }

  // Validate Message
  if (!message.thumbnail_url) {
    throwError(context, 'Invalid Message, thumbnail_url missing.');
  }

  // Define Vision API options
  var options = {
    host: 'westus.api.cognitive.microsoft.com',
    port: 443,
    path: '/vision/v1.0/analyze?visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&details=&language=en',
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'Ocp-Apim-Subscription-Key': process.env.OcpApimSubscriptionKey
    }
  };

  logVerbose(context, 'Thumbnail Url: ' + message.thumbnail_url);

  var req = https.request(options, function (res) {
    res.setEncoding('utf8');

    // Accumulate the response body; 'data' can fire more than once
    var body = '';
    res.on('data', function (chunk) {
      body += chunk;
    });

    res.on('end', function () {
      logVerbose(context, 'Vision API Response\n' + body);

      var visionData = JSON.parse(body);

      // Was the image successfully processed?
      switch (res.statusCode) {
        case 200: // Success
          updateFileMetaData(message, visionData);
          break;
        case 400: // Error processing image
          var errorMessage = visionData.message ? visionData.message : "Unknown error processing image";
          logInfo(context, errorMessage);
          updateFileMetaData(message, null, visionData);
          break;
        case 403: // Out of call volume quota
          context.done(new Error('Out of call volume quota'));
          return;
        case 429: // Rate limit is exceeded
          context.done(new Error('Rate limit is exceeded'));
          return;
      }

      // Set the object to be stored in Cosmos DB
      context.bindings.outputDocument = JSON.stringify(message);

      context.done();
    });
  });

  req.on('error', function (e) {
    logVerbose(context, 'Vision API Error\n' + JSON.stringify(e));
    throwError(context, e.message);
  });

  // write data to request body
  var data = {
    url: message.thumbnail_url
  };

  req.write(JSON.stringify(data));
  req.end();
};


function updateFileMetaData(message, visionData, error) {
  // Document DB requires ID to be a string
  // Convert message id to string
  message.id = message.id + '';

  // Keep a record of the raw/unedited Vision data
  message['azure_vision_data'] = {
    timestamp: new Date().toISOString().replace(/T/, ' ').replace(/\..+/, ''),
    data: visionData,
    error: error
  };

  if (visionData) {
    // Flatten/append vision data to the file object
    message['isAdultContent'] = visionData.adult.isAdultContent;
    message['isRacyContent'] = visionData.adult.isRacyContent;
    message['auto_tags'] = extractConfidenceList(visionData.tags, 'name', 0.1);
    message['auto_categories'] = visionData.categories ? extractConfidenceList(visionData.categories, 'name', 0.1) : [];
    message['auto_captions'] = extractConfidenceList(visionData.description.captions, 'text', 0.1);
    message['auto_description_tags'] = visionData.description.tags;
    message['auto_dominantColorForeground'] = visionData.color.dominantColorForeground;
    message['auto_dominantColorBackground'] = visionData.color.dominantColorBackground;
    message['auto_accentColor'] = visionData.color.accentColor;
    message['auto_isBWImg'] = visionData.color.isBWImg;
    message['auto_clipArtType'] = visionData.imageType.clipArtType;
    message['auto_lineDrawingType'] = visionData.imageType.lineDrawingType;
  }

  // Convert existing tags field from comma-separated string to array
  if (message.tags && typeof message.tags === 'string') {
    message.tags = message.tags.split(',');
  } else {
    message.tags = [];
  }

  // Azure Search requires location to be a single (GeoJSON Point) field
  if (typeof message.latitude === 'number' && typeof message.longitude === 'number') {
    message['location'] = {
      type: 'Point',
      coordinates: [message.longitude, message.latitude]
    };
  }
}

function throwError(context, message) {
  logVerbose(context, 'Error: ' + message);
  throw new Error(message);
}

function logInfo(context, message) {
  context.log('+[Info] ' + message);
}

function logVerbose(context, message) {
  if (process.env.VerboseLogging) {
    context.log('![Verbose] ' + message);
  }
}

// Extracts a list of values by field from an array of objects
// where the confidence value is greater than or equal to the
// optional minConfidenceValue.
function extractConfidenceList(objArray, field, minConfidenceValue) {
  if (Object.prototype.toString.call(objArray) !== '[object Array]') {
    throw new Error("objArray (type: " + Object.prototype.toString.call(objArray) + ") in extractConfidenceList is not an array.");
  }

  if (!field || typeof field !== 'string') {
    throw new Error("field in extractConfidenceList is missing or not a string.");
  }

  // If no minimum confidence value is specified, default to 0
  if (!minConfidenceValue) { minConfidenceValue = 0; }

  var list = [];

  objArray.forEach(function (obj) {
    // Do we need to do a confidence check?
    if (minConfidenceValue > 0 && typeof obj['confidence'] === 'number') {
      // Is confidence >= min required?
      if (obj['confidence'] >= minConfidenceValue) {
        list.push(obj[field]);
      }
    }
    else {
      // No check needed push field into array
      list.push(obj[field]);
    }
  });

  return list;
}
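To illustrate how the confidence filtering works, here is extractConfidenceList (condensed from the listing above) applied to sample tag data; with a 0.9 threshold, only the high-confidence tags survive.

```javascript
// Condensed from the function above: keep obj[field] where
// confidence >= minConfidenceValue, or where no confidence value exists.
function extractConfidenceList(objArray, field, minConfidenceValue) {
  if (!minConfidenceValue) { minConfidenceValue = 0; }
  var list = [];
  objArray.forEach(function (obj) {
    if (minConfidenceValue > 0 && typeof obj.confidence === 'number') {
      if (obj.confidence >= minConfidenceValue) { list.push(obj[field]); }
    } else {
      list.push(obj[field]);
    }
  });
  return list;
}

var tags = [
  { name: 'outdoor', confidence: 0.99 },
  { name: 'sky', confidence: 0.93 },
  { name: 'city', confidence: 0.62 }
];

console.log(extractConfidenceList(tags, 'name', 0.9)); // [ 'outdoor', 'sky' ]
```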

Analysis of the images (visual features & details) is configured in the
function when the Vision API HTTP API call options are defined.

// Define Vision API options
  var options = {
    host: 'westus.api.cognitive.microsoft.com',
    port: 443,
    path: '/vision/v1.0/analyze?visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&details=&language=en',
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'Ocp-Apim-Subscription-Key': process.env.OcpApimSubscriptionKey
    }
  };

In this case, the following features were configured in English (only English and Chinese are available for the API):

  • Categories – categorizes image content according to a taxonomy defined in the documentation.
  • Tags – tags the image with a detailed list of words related to the image content.
  • Description – describes the image content with a complete English sentence.
  • Faces – detects whether faces are present; if so, generates coordinates, gender, and age.
  • ImageType – detects whether the image is clip art or a line drawing.
  • Color – determines the accent color, dominant color, and whether the image is black & white.
  • Adult – detects whether the image is pornographic in nature (depicts nudity or a sex act); sexually suggestive content is also detected.
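The visualFeatures query parameter in the request path maps directly to that feature list; for example, the path used above can be assembled from the list:

```javascript
// Assemble the analyze path from the list of requested visual features
var features = ['Categories', 'Tags', 'Description', 'Faces', 'ImageType', 'Color', 'Adult'];
var path = '/vision/v1.0/analyze?visualFeatures=' + features.join(',') +
  '&details=&language=en';
console.log(path);
// → /vision/v1.0/analyze?visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&details=&language=en
```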

Once the analysis is complete, the resulting JSON message is stored in Cosmos DB.

Here is the result of processing the following image:

[Sample photo: a narrow street scene in Québec City]

The following is the result of the analysis:

 

{
	"sorting_time": "2016-06-19 15:59:58",
	"type": "image",
	"id": "68772289",
	"name": "Photo Jun 19, 15 59 58.jpg",
	"created_time": "2016-07-28 15:54:59",
	"modified_time": "2016-07-28 15:55:16",
	"size": 2289041,
	"mime_type": "image/jpeg",
	"latitude": 46.8124,
	"longitude": -71.2038,
	"geoname_id": 6325494,
	"city": "Québec",
	"state": "Quebec",
	"state_code": "QC",
	"country": "Canada",
	"country_code": "CA",
	"time_taken": "2016-06-19 15:59:58",
	"height": 3264,
	"width": 2448,
	"thumbnail_url": "",
	"recent_time": "2016-06-19 15:59:58",
	"tags": [
		"town",
		"property",
		"road",
		"neighbourhood",
		"residential area",
		"street"
	],
	"account_id": 111193,
	"storage_provider_id": 105156,
	"altitude": 53,
	"camera_make": "Apple",
	"camera_model": "iPhone 6",
	"azure_vision_data": {
		"timestamp": "2017-03-29 15:59:55",
		"data": {
			"categories": [
				{
					"name": "outdoor_street",
					"score": 0.96484375
				}
			],
			"adult": {
				"isAdultContent": false,
				"isRacyContent": false,
				"adultScore": 0.007973744533956051,
				"racyScore": 0.010262854397296906
			},
			"tags": [
				{
					"name": "outdoor",
					"confidence": 0.9993922710418701
				},
				{
					"name": "sky",
					"confidence": 0.9988007545471191
				},
				{
					"name": "building",
					"confidence": 0.9975806474685669
				},
				{
					"name": "street",
					"confidence": 0.9493720531463623
				},
				{
					"name": "walking",
					"confidence": 0.9154794812202454
				},
				{
					"name": "sidewalk",
					"confidence": 0.8519290685653687
				},
				{
					"name": "people",
					"confidence": 0.7953380942344666
				},
				{
					"name": "way",
					"confidence": 0.7908639311790466
				},
				{
					"name": "scene",
					"confidence": 0.7276134490966797
				},
				{
					"name": "city",
					"confidence": 0.624116063117981
				}
			],
			"description": {
				"tags": [
					"outdoor",
					"building",
					"street",
					"walking",
					"sidewalk",
					"people",
					"road",
					"city",
					"narrow",
					"bicycle",
					"man",
					"group",
					"woman",
					"standing",
					"old",
					"pedestrians",
					"holding",
					"platform",
					"parked",
					"carriage",
					"riding",
					"train",
					"clock"
				],
				"captions": [
					{
						"text": "a group of people walking down a narrow street",
						"confidence": 0.8872056096672615
					}
				]
			},
			"requestId": "38fa30e6-2a50-4a7f-b780-e6472c6d1a52",
			"metadata": {
				"width": 600,
				"height": 800,
				"format": "Jpeg"
			},
			"faces": [],
			"color": {
				"dominantColorForeground": "Grey",
				"dominantColorBackground": "Grey",
				"dominantColors": [
					"Grey",
					"White"
				],
				"accentColor": "2C759F",
				"isBWImg": false
			},
			"imageType": {
				"clipArtType": 0,
				"lineDrawingType": 0
			}
		}
	},
	"isAdultContent": false,
	"isRacyContent": false,
	"auto_tags": [
		"outdoor",
		"sky",
		"building",
		"street",
		"walking",
		"sidewalk",
		"people",
		"way",
		"scene",
		"city"
	],
	"auto_categories": [
		"outdoor_street"
	],
	"auto_captions": [
		"a group of people walking down a narrow street"
	],
	"auto_description_tags": [
		"outdoor",
		"building",
		"street",
		"walking",
		"sidewalk",
		"people",
		"road",
		"city",
		"narrow",
		"bicycle",
		"man",
		"group",
		"woman",
		"standing",
		"old",
		"pedestrians",
		"holding",
		"platform",
		"parked",
		"carriage",
		"riding",
		"train",
		"clock"
	],
	"auto_dominantColorForeground": "Grey",
	"auto_dominantColorBackground": "Grey",
	"auto_accentColor": "2C759F",
	"auto_isBWImg": false,
	"auto_clipArtType": 0,
	"auto_lineDrawingType": 0,
	"location": {
		"type": "Point",
		"coordinates": [
			-71.2038,
			46.8124
		]
	}
}

Once this analysis is stored in Cosmos DB, it can be indexed and searched using Azure Search. As you can see in the following screen captures, the Kwilt team was able to digest the analysis and, with Azure Search, build an extremely user-friendly search experience for their users.

In the project beta client, we were able to search by keywords (Food, Plates, Fireworks…) and by location (Gatineau) without any manual tagging of the pictures.
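As a sketch of what querying the index can look like (the service and index names here are hypothetical), an Azure Search query is just a GET against the index's docs endpoint; note that encodeURIComponent also takes care of accented search terms like Montréal:

```javascript
// Build an Azure Search query URL (service/index names are made up for this example)
function buildSearchUrl(service, index, term) {
  return 'https://' + service + '.search.windows.net/indexes/' + index +
    '/docs?api-version=2016-09-01&search=' + encodeURIComponent(term);
}

console.log(buildSearchUrl('kwilt-demo', 'photos', 'fireworks'));
console.log(buildSearchUrl('kwilt-demo', 'photos', 'Montréal'));
```

A real request would also need the index's api-key supplied in a request header.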

[Screenshot: search results in the Kwilt beta client]

All I can say is that I cannot wait to process my own photo streams through this service. I've already installed the app on my phone.

Take the opportunity to test out Microsoft Cognitive Services in your own initiatives by visiting: https://azure.microsoft.com/en-ca/services/cognitive-services/

Cheers!!


Pierre Roman
@pierreroman
