This post is authored by Patty Ryan, Principal Data Scientist, Hang Zhang, Senior Data and Applied Scientist, and Mustafa Kasap, Senior Software Design Engineer, at Microsoft.
Learn from the Professionals through Comparison of Sensor Data
As any athlete aspiring to greatness can tell you, measurement of your own performance and tips from the pros are two keys to improvement. Thanks to affordable wearable sensors, it is now possible for you to measure your own performance and also benchmark it against that of the professionals.
Everyone knows that a professional practicing a sport looks visibly different from an amateur. In skiing, we’ve identified just nine sensor positions that can clearly differentiate professionals from amateurs. The information from these nine sensors allowed us to build a simple but powerful machine learning model that classifies professionals and non-professionals correctly 98% of the time.
Sensor Data Delivers an Activity Proficiency Signature
Here’s how it works: Each sensor measures position, acceleration and rotation relative to the x, y and z axes, and records this data along with a time stamp. While the sample rate of sensors varies, we recommend a minimum of 100 Hz. To illustrate the potential of these wearable sensors in measuring sports performance, we worked with the Professional Ski Instructors of America and the American Association of Snowboard Instructors, or PSIA-AASI. We measured the organization’s professional-level skiers and compared these measures to those of intermediate skiers.
To start, we worked with PSIA-AASI to characterize the hallmark differences between professionals and non-professionals. These differences include the position of the upper body relative to the lower body, the position of the limbs, and how the skier takes a turn. Using these insights from domain experts, we engineered data features that characterize limb position relative to one another, and the upper body relative to the lower body. We added these engineered features to the dataset of individual sensor measures.
Then we broke our ski activity sample into small activity interval slices – in our case, time slices of two seconds each. For each of these slices, we created summary statistical measures on the sensor data from these intervals. These summary statistical measures were basic measures and included medians, minimums, maximums and quartiles.
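The slicing and summary step can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the R script the post refers to. The function names, the choice of non-overlapping windows, and the synthetic ramp signal are assumptions made for the example.

```python
import statistics

SAMPLE_RATE_HZ = 100      # minimum rate recommended in the post
WINDOW_SECONDS = 2        # slice length used in the post


def slice_windows(samples, rate_hz=SAMPLE_RATE_HZ, seconds=WINDOW_SECONDS):
    """Split a list of readings into consecutive non-overlapping windows.

    A trailing partial window is dropped.
    """
    size = rate_hz * seconds
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def summarize(window):
    """Return basic summary statistics for one window of readings."""
    q1, median, q3 = statistics.quantiles(window, n=4, method="inclusive")
    return {
        "min": min(window),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(window),
    }


# Hypothetical single-axis signal: 5 seconds of a ramp sampled at 100 Hz.
signal = [float(i) for i in range(500)]
features = [summarize(w) for w in slice_windows(signal)]
```

Each dictionary in `features` becomes one row of summary features for a two-second slice. In a real experiment you would repeat this for every sensor axis and every engineered feature.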
We also created features that characterize frequency measures for this sensor data, represented by spatial and temporal features in the time-series graphic illustration. We ran a fast Fourier transform (an efficient algorithm for the discrete Fourier transform) on the variables to transform the data from time series to frequency measures (measured in Hz) for a given interval window. In our case, we chose a 2-second time window. This generated the frequency components of our sensor signal, including constant power, low-band average power, mid-band average power, and high-band average power. Finally, we generated cross-correlation measures on select variables to measure the similarity of various two-series combinations.
We filtered to the best thirty features, and trained and tested a logistic regression model. This model predicted the right skill level classification 98% of the time. In addition to this classification, the amateur can get even more guidance on specific differences and areas to improve relative to the professional model, by investigating specific sensor differences seen at various phases of the activity.
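As a rough illustration of the frequency features, the sketch below computes constant power and average power in three bands for one window, plus a similarity measure between two series. It is written in Python with only the standard library, so it uses a direct discrete Fourier transform instead of an FFT; a library routine such as `numpy.fft.rfft` returns the same coefficients faster. The band edges (5 Hz and 15 Hz), the power normalization, and the use of a zero-lag normalized correlation as the similarity measure are all assumptions, since the post does not state them.

```python
import cmath
import math


def power_spectrum(window, rate_hz):
    """Return (frequency_hz, power) pairs for the non-negative frequencies.

    Uses a direct O(n^2) discrete Fourier transform so the sketch needs only
    the standard library; an FFT computes the same values faster.
    """
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)
        )
        spectrum.append((k * rate_hz / n, abs(coeff) ** 2 / n))
    return spectrum


def band_powers(window, rate_hz, low_cut_hz=5.0, mid_cut_hz=15.0):
    """Summarize a window as constant power plus average power in three bands.

    The band edges are illustrative assumptions, not values from the post.
    """
    spectrum = power_spectrum(window, rate_hz)
    bands = {"low": [], "mid": [], "high": []}
    for freq, power in spectrum[1:]:          # skip the zero-frequency bin
        if freq < low_cut_hz:
            bands["low"].append(power)
        elif freq < mid_cut_hz:
            bands["mid"].append(power)
        else:
            bands["high"].append(power)
    result = {"constant": spectrum[0][1]}
    for name, values in bands.items():
        result[name] = sum(values) / len(values) if values else 0.0
    return result


def correlation(a, b):
    """Normalized zero-lag correlation of two equal-length series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return cov / norm if norm else 0.0


# Hypothetical 2-second window at 100 Hz: a 2 Hz sine wave with an offset of 1.0.
RATE_HZ = 100
window = [1.0 + math.sin(2 * math.pi * 2.0 * t / RATE_HZ) for t in range(200)]
powers = band_powers(window, RATE_HZ)
```

For this synthetic window, the offset shows up as constant power and the 2 Hz oscillation lands in the low band, while the mid and high bands stay near zero. Real sensor data would spread power across bands, and that distribution is what the frequency features capture.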
Follow this Recipe to Recreate from Scratch
Refer to sensor kit and R script here.
- Place the sensors on the body per the above diagram and test. Refer to the sensor kit at this GitHub location for suggestions on sensor options. Your sensors should emit positional, acceleration and rotational data from the feet, each of the lower legs, pelvis, torso and shoulders, as identified in the diagram above, with a minimum sampling rate of 100 Hz.
- Generate data! Create skiing experiment data of drills, including short radius turns, medium radius turns, and large radius turns. Be sure to exclude non-skiing time from your experiment sample.
- Label the data by skill level. We labeled it as professionals vs. non-professionals.
- Store the data in the cloud. Store data on the device, or your phone for batch upload, or stream data to a storage location in the cloud.
- Import data into an Azure Machine Learning workspace. Options for Azure ML data import include Azure SQL Database, Azure Blob Storage, Azure Table, Azure DocumentDB and more.
- Clean and transform the data. Transform into the wide data format, with one row for each athlete and experiment. Refer to the R script here for this, and the following steps. Using the dplyr functions makes this easy. Create engineered features to better illustrate the differences between pros and non-pros. Features that best illustrate these differences in skiing include the normalized difference between the upper body and lower body, and relational positions and rotation of upper body and lower body, as well as the relational position and rotation of the limbs.
- Slice the data into intervals and generate statistics for these intervals. Over your time window, generate summary statistics as well as frequency and frequency covariance statistics. You can use a fast Fourier transform function in R to generate frequency statistics. Generate summary statistics including median, standard deviation, max, min, 1st quartile, and 3rd quartile. Generate frequency statistics for constant power, low-band average power, mid-band average power, and high-band average power of each time window. Finally, you can generate frequency covariance statistics for select sets of variables.
- Prepare data to train the predictive model. Exclude the label and irrelevant columns from the model's input features. In this case, we exclude the experiment number, subject id, time stamp and, of course, skill level, which is the label the model predicts.
- Select features. In our case, we used Joint Mutual Information Maximization to reduce to 30 features.
- Split the data for training and test. In our case, we used a 70/30 split of train and test.
- Train the model. In our case, we chose logistic regression to predict the skill level.
- Evaluate the model on the test set, which the model has not seen during training. Reviewing the confusion matrix will allow you to see how your model performs.
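The last three steps of the recipe (split, train, evaluate) can be sketched as follows. This is a minimal standard-library Python illustration, not the Azure Machine Learning experiment or the R script described above. The gradient-descent training loop, its learning rate and epoch count, the helper names, and the synthetic two-feature data are all assumptions made for the example, and the Joint Mutual Information Maximization feature selection step is not shown.

```python
import math
import random


def train_test_split(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle the examples and split them into train and test portions."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = round(len(order) * train_fraction)
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, learning_rate=0.1, epochs=500):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (professional) or 0 (non-professional) for one feature row."""
    return 1 if sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5 else 0


def confusion_matrix(actual, predicted):
    """Return counts keyed by (actual, predicted) label pairs."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts


# Synthetic stand-in data: two features per window, classes well separated.
rng = random.Random(42)
rows = [[rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)] for _ in range(100)]
rows += [[rng.gauss(-2.0, 0.5), rng.gauss(-2.0, 0.5)] for _ in range(100)]
labels = [1] * 100 + [0] * 100

x_train, y_train, x_test, y_test = train_test_split(rows, labels)
weights, bias = fit_logistic(x_train, y_train)
matrix = confusion_matrix(y_test, [predict(weights, bias, x) for x in x_test])
```

In `matrix`, the keys `(1, 1)` and `(0, 0)` count correct predictions, while `(1, 0)` and `(0, 1)` count professionals classified as non-professionals and the reverse. The synthetic classes here are easy to separate by construction, so the result says nothing about accuracy on real skiing data.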
Compare Your Skiing Sensor Data to the Pros
Comparing your data to that of the pros will give you very specific guidance on how to improve. Use our data set linked here (about 890 MB), and analyze your own skiing sensor data vs. the sample from professionals.
Expand this Model to Other Sports
We invite you to help us expand this model by adding activity models for additional sports and activities. Read about our sports sensor work at Real Life Code. Contribute to the sensor kit, sports activity data and models found at this GitHub location. Or reach out to Kevin Ashley or Max Zilberman at Microsoft.