Guest Post: Analysing Twitter Data with Hadoop and Power View

By Jen Stirrup

In the first part of this blog series, we looked at how social media data is a big driver for Big Data technologies such as HDInsight. However, once it is in HDInsight, how is it possible for business users to access the data using familiar tools such as Excel?

Large datasets can become difficult to navigate, evaluate and to use – this means that users will need a ‘helping hand’ in order to obtain their data. Fortunately, we have a number of tools available to us for visualising data, and these tools vary with respect to their usability and accessibility. This is depicted below:

Figure 1Relationship between Business and Technical oriented reporting tools

In this section, we will look at PowerPivot and Power View, which are aimed more at the business end of Business Intelligence consumers and users, rather than technically-oriented team members.

How do we get Big Data into PowerPivot? There are a number of stages involved in connecting PowerPivot to Hadoop. The steps are outlined here:

1. Firstly, install the Hive ODBC connector for Excel. For details, please see the Windows Azure reference.

2. Next, you will need to create an ODBC data source for Hadoop. This is required so that you can connect to the defined data source, specify your credentials and so on. For details, please refer to the Windows Azure reference, where the ‘Help’ content is now being stored.

3. The final stage is to import data. Windows Azure has a reference section on importing data, which can be found here.

4. You may also need to open ports on your Hadoop Cluster. Cindy Gross (@SQLCIndy ¦ Blog) has provided details on this process, which you can find here.

5. You also need to make sure that PowerPivot is enabled as a COM add-in. To do that, please visit the Office reference here.

Importing Data into PowerPivot

For general details on importing data in to PowerPivot, please visit the Microsoft Office 2013 website here. Here are the instructions for importing Hadoop data into PowerPivot:

Open a new blank workbook in Excel 2013. Look for the PowerPivot tab at the right hand side, and then click on the green Manage button in the Data Model group on the far left. In order to get Hadoop data, you need to click on “Get External Data” in the PowerPivot for Excel dialog.

To get Hive data, choose “From Other Sources” then select “Others (OLEDB/ODBC)” and click on the "Next >" button. To give the connection a ‘friendly’ name, you can choose ‘TwitterData’ or something else descriptive.

Once you have entered all of the necessary details, the wizard will continue. Choose the “Next >” option, and then choose the option “Select from a list of tables and views to choose the data to import”. Choose the census table then click on “Finish”.

The data is now loaded into PowerPivot – success! We can now use it in PowerView.

Adding a PowerView visualisation

Once the data is in PowerPivot, we can visualise it using Power View. To add a ‘Power View’ sheet, click on the bottom and add ‘New Power View’ as an option. The data model will appear at the right hand side, and we can start to create our own visualisation of the Twitter data.

Summary

Although there are a lot of ‘working parts’ in order to get this working, it is straightforward since we are primarily using tools which are familiar to us. When #BigData goes mainstream, people will start to want to get their hands on it, and PowerPivot is perfectly positioned for data-savvy Business Intelligence practitioners to start to use Big Data due to its speed of access and delivery.

 

Author Bio

Jen Stirrup

Copper Blue Consulting

https://www.copperblueconsulting.com/

Jen Stirrup is a SQL Server MVP, with a Bachelors degree in Psychology and two Masters in Artificial Intelligence and Cognitive Science. Jen is joint owner of Copper Blue Consulting, delivering data and Business Intelligence solutions to global clients. Jen is a veteran SQL Server Developer with over 12 years SQL experience in delivering end-to-end Business Intelligence solution. Jen blogs for SQLServerPedia as well as her own blog. She is very active in many SQL Server User Groups in the UK, helping to run the SQLHerts group in England. Jen's focus is on the Microsoft BI stack, with a specific passion for Data Visualisation.