This post is by Janet Yeilding, a Program Manager on the Azure Stream Analytics team, and is based on a recent session from the Microsoft Advanced Analytics webinar series.
Azure Stream Analytics (ASA) is a cloud service that enables real-time processing over streams of data flowing in from devices, sensors, websites and other live systems. The stream-processing logic in ASA is expressed in a SQL-like query language with some added extensions such as windowing for performing temporal calculations. At a recent webinar, we explained key language constructs and query patterns and did an Internet of Things (IoT) demonstration – you can view a recording of the webinar here – and, in this post, we share a few highlights from the webinar and about this technology.
Events and Time
ASA is a temporal system, so every event that flows through it has a timestamp. A timestamp is assigned automatically based on the event's arrival time to the input source but you can also access a timestamp in your event payload explicitly using TIMESTAMP BY:
SELECT * FROM SensorReadings TIMESTAMP BY time
As you perform computations and aggregations over your input and generate output events, Stream Analytics will assign timestamps to them, as well. You can access this timestamp with System.Timestamp:
SELECT System.Timestamp AS Time FROM SensorReadings
Windowing is a core requirement for stream processing applications to perform set-based operations like counts or aggregations over events that arrive within a specified period of time. ASA supports three types of windows: Tumbling, Hopping, and Sliding.
Tumbling Windows define a repeating, non-overlapping window of time.
SELECT sensorId, COUNT(*) AS Count
FROM SensorReadings TIMESTAMP BY time
GROUP BY sensorId, TumblingWindow(second, 10)
Like Tumbling Windows, Hopping Windows move forward in time by a fixed period but they can overlap with one another.
SELECT sensorId, COUNT(*) AS Count, AVG(temp)
FROM SensorReadings TIMESTAMP BY time
GROUP BY sensorId, HoppingWindow(second, 10 , 5)
With a Sliding Window, the system is asked to logically consider all possible windows of a given length and output events for cases when the content of the window actually changes – that is, when an event entered or existed the window.
SELECT sensorId, MIN(temp) as temp
TIMESTAMP BY time
GROUP BY sensorId, SlidingWindow(second, 5)
HAVING MIN(temp) > 75
Joining Multiple Streams
Like standard T-SQL, JOIN in ASA is used to combine records from two or more input sources. JOINs in ASA are temporal in nature, meaning each JOIN must provide some limits on how far the matching rows can be separated in time.
Example: Find cases where the humidity reading changed by at least .1 within 5 seconds
SELECT s1.time, s1.dspl, s1.hmdt as previousHmdt, s2.hmdt as newHmdt, datediff(ss, s1.time, s2.time) as secondsApart
FROM SensorData s1 timestamp by time
JOIN SensorData s2 timestamp by time
ON s1.dspl = s2.dspl
AND DATEDIFF(s, s1, s2) BETWEEN 0 AND 5
WHERE (s2.hmdt – s1.hmdt >= .1) or (s1.hmdt – s2.hmdt >= .1)
Reference Data JOIN
Reference data is static or slow-changing data. It is often used for performing lookups or correlations and can be incorporated into a ASA query using a Reference Data JOIN.
Example: Use a lookup table to perform dynamic temperature threshold alerting for each device
SELECT SensorReadings.sensorID, SensorReadings.temp
ON SensorReadings.sensorID = thresholdRefData.sensorID
WHERE SensorReadings.temp > thresholdRefData.value
ASA supports multiple computations and output targets in a single job. You can use the INTO keyword to specify an output target explicitly.
Example: Write all events to outputLog and write events above 75 degrees to outputTempAlert
WHERE temp > 75
Azure ML and ASA are now integrated in private preview, enabling you to call ML models over streaming data in an ASA job. For details on supported scenarios and signup information, see Real Time scoring of streaming data using Machine Learning models.