This post is authored by Yiwen Sun, Data Scientist at Microsoft.
I recently attended the IEEE Visualization Conference 2015 in Chicago and jotted down a few points related to machine learning. For those of you unfamiliar with this conference, it's the largest annual gathering of practitioners, academics and researchers working to make data visually understandable and usable. Paper talks are organized into three tracks: Visual Analytics Science and Technology (VAST), Information Visualization (InfoVis), and Scientific Visualization (SciVis). Three IEEE symposia were co-located with the conference: Large Data Analysis and Visualization (LDAV), Visualization for Cyber Security (VizSec), and the inaugural Symposium on Visualization in Data Science (VDS).
More than 1,500 attendees participated this year, including representatives of leading Business Intelligence and Advanced Analytics companies such as Bloomberg, Google, IBM, Tableau and, of course, Microsoft.
One strong impression I came away with is that ML and data visualization are becoming more tightly coupled: over half of the papers used ML techniques in their data processing step. For example, the VAST best paper, “Reducing Snapshots to Points: A Visual Analytics Approach to Dynamic Network Exploration”, uses vectorization, normalization and dimensionality reduction to project each snapshot of high-dimensional dynamic network data onto a point in two dimensions. It then visualizes the results in two juxtaposed views: one showing network snapshots and the other showing the evolution of the network. This makes it easier for users to distinguish regular, stable states from anomalies.
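The vectorize, normalize and reduce pipeline can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each snapshot is an undirected graph stored as an adjacency matrix over a fixed node set, uses L2 normalization, and uses PCA (computed via SVD) as the dimensionality reduction step. The function name `snapshots_to_points` is hypothetical.

```python
import numpy as np

def snapshots_to_points(adjacency_snapshots):
    """Project a sequence of network snapshots onto 2-D points (one point per snapshot).

    adjacency_snapshots: array of shape (T, n, n), one adjacency matrix per time step.
    Assumes undirected graphs over the same n nodes and T >= 2 snapshots.
    Returns an array of shape (T, 2).
    """
    snaps = np.asarray(adjacency_snapshots, dtype=float)
    _, n, _ = snaps.shape
    # Vectorization: keep only the upper triangle, since the graph is assumed undirected.
    rows, cols = np.triu_indices(n, k=1)
    vectors = snaps[:, rows, cols]  # shape (T, n*(n-1)/2)
    # Normalization: scale each snapshot vector to unit length so the overall
    # number of edges does not dominate the layout (empty graphs stay all-zero).
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = vectors / np.where(norms == 0, 1.0, norms)
    # Dimensionality reduction: PCA via SVD of the mean-centered vectors.
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

Because identical snapshots map to the same point, a run of stable states shows up as a tight cluster and an anomalous snapshot shows up as an isolated point. A nonlinear reducer such as t-SNE or MDS could be swapped in for the PCA step.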
Below is a summary of ML techniques highlighted in four major application areas:
In network or spatial data visualization, clustering and classification are widely used to reduce clutter and identify regions of interest. For example, in the paper “MobilityGraphs: Visual Analysis of Mass Mobility Dynamics via Spatio-Temporal Graphs and Clustering”, hourly movement data from Twitter users in the Greater London area are spatially aggregated into regional clusters and color-coded by temporal cluster.
(Image from Interactive Graphics Systems Group at Technical University of Darmstadt)
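As a rough illustration of spatial aggregation, the sketch below clusters locations with plain k-means and then counts trips between the resulting regions to obtain a region-level flow graph. k-means is a stand-in chosen for brevity; MobilityGraphs' actual spatial and temporal clustering differs. The sketch assumes coordinates are already projected to a plane (not raw latitude/longitude), and the names `kmeans`, `region_flows` and `distances` are hypothetical.

```python
import numpy as np

def distances(points, centers):
    """Euclidean distance from every point (rows) to every center (columns)."""
    return np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialization. Returns (labels, centers)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-first init: start at a random point, then repeatedly add the
    # point farthest from all centers chosen so far.
    centers = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        d = distances(points, np.array(centers)).min(axis=1)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = distances(points, centers).argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return distances(points, centers).argmin(axis=1), centers

def region_flows(start_xy, end_xy, centers):
    """k x k matrix whose entry [a, b] counts trips from region a to region b."""
    origins = distances(np.asarray(start_xy, dtype=float), centers).argmin(axis=1)
    dests = distances(np.asarray(end_xy, dtype=float), centers).argmin(axis=1)
    flows = np.zeros((len(centers), len(centers)), dtype=int)
    np.add.at(flows, (origins, dests), 1)  # unbuffered add handles repeated pairs
    return flows
```

The flow matrix is the kind of aggregated graph that can then be drawn as region-to-region edges instead of thousands of individual trajectories.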
For time-series data visualization, a big challenge is presenting a large dataset in limited display space without over-plotting. One effective approach is to aggregate the data points into time segments and create a hierarchy of multi-focus zoomed line charts, as illustrated in the paper “TimeNotes: A Study on Effective Chart Visualization and Interaction Techniques for Time-Series Data”.
(Image from TimeNotes)
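A minimal sketch of segment-based aggregation: the series is split into a fixed number of buckets, each summarized by its minimum, mean and maximum, so a chart needs at most one glyph per bucket regardless of data size. Nested focus ranges reuse the same bucket budget to give progressively more detailed views. This generic min/mean/max downsampling scheme is an assumption for illustration, not the TimeNotes implementation, and `aggregate_segments` and `zoom_hierarchy` are hypothetical names.

```python
import numpy as np

def aggregate_segments(values, n_segments):
    """Summarize a non-empty series into buckets of (min, mean, max), one row per bucket."""
    values = np.asarray(values, dtype=float)
    # Never ask for more buckets than samples: empty buckets have no min/max.
    n_segments = max(1, min(n_segments, len(values)))
    buckets = np.array_split(values, n_segments)
    return np.array([[b.min(), b.mean(), b.max()] for b in buckets])

def zoom_hierarchy(values, focus_ranges, n_segments=200):
    """Overview summary plus one summary per focus range, all with the same bucket budget.

    focus_ranges: list of (start, end) sample-index pairs, each assumed to lie
    inside the previous one, so every level is a zoomed-in view of the last.
    """
    values = np.asarray(values, dtype=float)
    levels = [aggregate_segments(values, n_segments)]
    for start, end in focus_ranges:
        levels.append(aggregate_segments(values[start:end], n_segments))
    return levels
```

Drawing the min and max as a band and the mean as a line keeps spikes visible at the overview level, while deeper levels eventually show individual samples.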
In textual data visualization, text mining techniques such as entity extraction, topic identification and sentiment analysis are essential. In the paper “Exploring Evolving Media Discourse Through Event Cueing”, multiple mining results, such as entities displayed in a Wordle and sentiment scores along a timeline, are linked together to support the analysis of media discourse.
(Image from VADER Lab at Arizona State University)
Anomaly detection, though not a standalone research area in visualization, has been studied by several research groups as a way to support human judgment with automated analysis results. In “Visualization and Analysis of Rotating Stall for Transonic Jet Engine Simulation”, the authors applied Grubbs' test to identify outliers in blade passages as an early sign of rotating stall in the engine. In “TargetVue: visual analysis of anomalous user behaviors in online communication systems”, a TLOF (time-adaptive local outlier factor) model was used to detect sudden changes in user behavior, based on a set of features extracted for each user from the online communication data.
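Grubbs' test checks whether the most extreme value in an approximately normally distributed sample is an outlier, using the statistic G = max|x_i - mean| / s, where s is the sample standard deviation. The sketch below applies the two-sided test iteratively, removing one outlier per round until the test no longer rejects. It assumes NumPy and SciPy are available and that the input is a one-dimensional sample (for instance, one measurement per blade passage). It is not the code from the rotating-stall paper, `grubbs_outliers` is a hypothetical name, and the TLOF model from TargetVue is not shown.

```python
import numpy as np
from scipy import stats

def grubbs_outliers(values, alpha=0.05):
    """Iterative two-sided Grubbs' test; returns indices of detected outliers in input order of detection."""
    x = np.asarray(values, dtype=float)
    idx = np.arange(len(x))
    outliers = []
    while len(x) > 2:  # the test needs at least 3 samples (n - 2 >= 1 degrees of freedom)
        n = len(x)
        mean, sd = x.mean(), x.std(ddof=1)
        if sd == 0:
            break
        deviations = np.abs(x - mean)
        i = int(deviations.argmax())
        g = deviations[i] / sd
        # Critical value from the t-distribution with n - 2 degrees of freedom.
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        if g <= g_crit:
            break
        outliers.append(int(idx[i]))
        x = np.delete(x, i)
        idx = np.delete(idx, i)
    return outliers
```

For example, `grubbs_outliers([10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 25.0])` flags only the last reading (index 7).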
The VAST Challenge was another highlight. This annual contest, which began in 2006, is designed to reflect real-world analytics challenges and to encourage research into novel data processing, visualization and interaction methods. This year's challenge was to analyze individual and group movement in an amusement park over a weekend, in a scenario involving a criminal investigation. Python and R were popular choices for data processing and ML, and both are currently supported by Azure Machine Learning.
Overall, the conference was a great place to learn about the very latest in all things visualization, and to interact with experts in the domain.