Twitter Sentiment Analysis - Part 1. Extracting and Mining Twitter Data Using RapidMiner and Google/Microsoft Tools | bicortex
Tuesday, March 11, 2014
Twitter Sentiment Analysis Part 1. Extracting and Mining Twitter Data Using
Zapier, RapidMiner and Google/Microsoft Tools
In this short series (two parts; the second part can be found HERE) I want to expand on the subject of sentiment analysis of Twitter data
through data mining techniques. In the previous post I showed how to extract Twitter data using an SSIS package, load it into a
relational database, and create a small cube to show the harvested tweets in a pivot table. For me, the best part of working on that
solution was creating a stored procedure which determines the sentiment of Twitter feeds based on a dictionary of the most commonly used
words. It was a pretty rudimentary approach; nevertheless, it worked well and turned out to be an effective way of analysing social
media data using bare SQL. In the next few posts I want to elaborate on the sentiment analysis part and, rather than using SQL, I will
show how to conduct such analysis using a more sophisticated tool, a free data mining application called RapidMiner.
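To make the dictionary-based approach mentioned above a little more concrete, here is a minimal sketch written in Python rather than SQL. It is not the stored procedure from the previous post; the word lists and the function name score_tweet are assumptions made purely for illustration.

```python
import re

# Hypothetical word lists; a real sentiment dictionary would be far larger.
POSITIVE_WORDS = {"good", "great", "love", "happy", "excellent"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "sad", "terrible"}


def score_tweet(text):
    """Label a tweet by counting how many dictionary words it contains."""
    # Lower-case the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(1 for word in words if word in POSITIVE_WORDS)
    negative = sum(1 for word in words if word in NEGATIVE_WORDS)
    if positive > negative:
        return "Positive"
    if negative > positive:
        return "Negative"
    return "Neutral"
```

A tweet with more positive than negative dictionary hits is labelled Positive, the reverse gives Negative, and a tie (including no hits at all) gives Neutral. This is exactly the kind of rudimentary rule that a data mining model is meant to improve on.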
This series is split into two separate posts: the first about the methods of tweet extraction and collection, and the second about the
actual sentiment analysis. In this first post I want to focus on how to amass a decent pool of tweets in two different ways: using a service
called Zapier, Google Docs and a handy little tool called GDocBackUpCMD, as well as SSIS and a little bit of C#. Each way has its
pros and cons, but both are more robust than using the RSS/ATOM feed shown in my previous post.
Firstly, let's look at Zapier. Zapier is a great service which allows web applications to talk to each other by providing an integration
platform for a number of popular web services, e.g. YouTube, Gmail, Evernote, PayPal etc. It is even possible to integrate the data into
MS SQL Server; however, this is classed as a premium service with a higher cost involved. We will use Zapier to extract Twitter feeds
into a Google Docs spreadsheet and then copy the data across to our local environment to mine it for sentiment trends. To get started,
make sure you have a Twitter account and a Google account, and have set up a Zapier account as well. Next, in Google Docs, create an empty
spreadsheet with three columns and their headings. I have labeled mine UserName for the feed author, Date for when the tweet
was posted and Feed for the content of the tweet. Also, make sure that you enter a dummy record in one of the cells; I
have added a test row as per the screenshot below. For some reason Zapier requires this to validate the object it will be writing to.
Once all your accounts are active and you have a Google Docs spreadsheet skeleton ready, building a zap is really easy. Just select the
services you wish to integrate by dragging and dropping them onto the top process pane, configure the parameters and enable the
integration. Here, I have placed Twitter as the source and Google Docs as the target.
http://bicortex.com/twitter-sentiment-analysis-mining-twitter-data-using-rapidminer-part-1/
Finally, enable the zap and bingo, every 15 minutes (on a free account) you should see your Twitter data populating the spreadsheet.
The complete zap should resemble the image below.
Provided your zap has run, we are ready to copy the Google Docs data into our local environment. As mentioned before, in this post I
will explain how to achieve this using two different methods: a free tool called GDocBackUpCMD driven by a small batch file, with SSIS
handling the database integration, and a C# script running entirely inside SQL Server Integration Services.
@echo off
echo
c:\GDocBackupCMD\GDocBackupCMD.exe -mode=backup -username=YourUserName -password=YourPasswo
Save the file on the C:\ drive as Export_Tweets.bat. The two directories and the batch file should look as per the image below.
Now we can execute the file to see if the process works as expected. Double-clicking on the batch file should execute it in the console
view and, provided all the parameters were entered correctly, run the process. After completion we should also have a copy of our
Google Docs spreadsheet in our local folder, Tweets.
Finally, all that is left to do is to create a database table into which the Twitter data will be exported and a small SSIS package which will
help us populate it from the local copy of the spreadsheet, as per below.
Naturally, if you wish to make this a continuous process, a more elaborate package would be required to handle truncation (of either the Google
spreadsheet or the database table), comparison against already stored tweets and some sort of looping.
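As a rough illustration of the comparison step, the sketch below filters a freshly exported batch of rows against the rows already stored, so that only unseen tweets would be loaded. It is written in Python purely to show the logic; in an SSIS package the same idea would typically be implemented with a lookup against the destination table. The function name filter_new_tweets and the row layout (UserName, Date, Feed) are assumptions based on the spreadsheet columns described earlier.

```python
def filter_new_tweets(stored_rows, incoming_rows):
    """Return the incoming rows that are not already stored, in their original order.

    Each row is assumed to be a (UserName, Date, Feed) tuple.
    """
    seen = set(stored_rows)
    fresh = []
    for row in incoming_rows:
        if row not in seen:
            # Remember the row so duplicates within the same batch are also skipped.
            seen.add(row)
            fresh.append(row)
    return fresh
```

Comparing on all three columns is the simplest possible choice; if the source exposed a unique tweet identifier, comparing on that alone would be more reliable.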
This is just one of the ways to export a Google Docs spreadsheet into our local environment. Another method involves a little bit of C#
but in return does not rely on other utilities and tools; it can all be done in SSIS.
By default, the Google API setup places the DLLs in C:\Program Files\Google\Google Data API SDK\Redist. When you open up this
location you should see a number of DLL files; the ones we are interested in are Google.GData.Client.dll, Google.GData.Extensions.dll
and Google.GData.Spreadsheets.dll. Next, we need to register those in the GAC, otherwise known as the Global Assembly Cache. For this
purpose you can use the gacutil.exe executable called from a batch file or the command line, the GacView tool if you are afraid of the
console, or simply copy those three DLL files into the C:\Windows\assembly folder. If for some reason you get an error, you should
change the settings for UAC (User Account Control) by typing UAC in the Windows search pane, adjusting the settings and
rebooting the computer. Once the DLLs have been registered we can create our database table which will store our Twitter feeds. As
my spreadsheet has three columns, we will create a table with three attributes, executing the following code:
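As a rough illustration of such a three-column table, the sketch below creates one using Python's built-in sqlite3 module as a stand-in for SQL Server. The table name Twitter_Feeds and the column types and lengths are assumptions, not the original definition; only the column names follow the spreadsheet headings described earlier.

```python
import sqlite3

# Assumed table definition: one column per spreadsheet heading.
# The name Twitter_Feeds and the VARCHAR lengths are illustrative only.
CREATE_TABLE_SQL = """
CREATE TABLE Twitter_Feeds (
    "UserName" VARCHAR(200),
    "Date"     VARCHAR(100),
    "Feed"     VARCHAR(1000)
)
"""


def create_tweet_table(conn):
    """Create the three-column table on the given database connection."""
    conn.execute(CREATE_TABLE_SQL)
    conn.commit()
```

Whatever types and lengths you choose, note them down, because the output columns of the Script Component defined later need to match them.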
Next, let's create an Integration Services package. To do this, in a blank project create two variables of string data type which will
represent your Gmail user ID and Gmail password, and populate them with your credentials as per the image below.
Now, drop a Data Flow Task from the tools menu onto the Control Flow tab pane and within it (in the Data Flow tab) place a Script
Component and an OLE DB Destination component, linking the two together as per below.
In the Script Component properties pane enter the two variables we defined earlier as ReadOnlyVariables.
Next, in the Inputs and Outputs property create three output columns corresponding to the columns of the database table we created earlier,
ensuring that the data types and lengths are the same as in your database table.
Just before we are ready to write the code, we should reference the DLLs from the Google API we downloaded previously. Go ahead
and click on Edit Script, which should invoke the Visual Studio code environment, and under Solution Explorer on the right-hand side, right-click on References and select Add Reference. From here, navigate to the directory where the DLLs are stored (on my machine they
were saved under C:\Program Files (x86)\Google\Google Data API SDK\Redist) and select the Google.GData.Client.dll,
Google.GData.Extensions.dll and Google.GData.Spreadsheets.dll files. Once all three files have been added, click OK.
Finally, we are ready to write some code. Enter the following C# script (making the appropriate adjustments to reflect your
spreadsheet/file name, sheet/tab name and column headings) and, when finished, save the code and run the package.
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Google.GData;
using Google.GData.Client;
using Google.GData.Extensions;
using Google.GData.Spreadsheets;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void PreExecute()
    {
        base.PreExecute();
    }

    public override void PostExecute()
    {
        base.PostExecute();
    }

    public override void CreateNewOutputRows()
    {
        SpreadsheetsService GoogleExcelService;
        GoogleExcelService = new SpreadsheetsService("Spreadsheet");
        GoogleExcelService.setUserCredentials(Variables.GmailUserID, Variables.GmailPasswo
        SpreadsheetQuery query = new SpreadsheetQuery();
        SpreadsheetFeed myFeed = GoogleExcelService.Query(query);
Hopefully, everything executed as expected and, when you query the table, you should see the tweets extracted from the Google
spreadsheet.
In the second part of this two-part series I want to explore sentiment analysis using the freely available RapidMiner tool, based on the data
we have collected, and create a mining model which can be used for classifying Twitter feeds. Finally, if you're after a more
ad hoc/playful sentiment analysis tool for Twitter data and (optionally) have some basic knowledge of the Python programming language,
check out my post on using the web-based etcML tool under THIS link.
Posted in: .NET, Excel, How To's, SQL, SSIS
Tags: .NET, C#, Code, Data Mining, Programming, RapidMiner, Social Networks, SQL, SSIS
This entry was posted on Thursday, February 28th, 2013 at 1:02 pm and is filed under .NET, Excel, How To's, SQL, SSIS.
My name is Marcin and this site is a random collection of (after) thoughts, recipes, reflections and
desultory posts about the world of BI, data analytics and everything that I fancy and categorize under the BI umbrella. I'm a native of
Poland but since my university days I have lived in Melbourne, Australia and worked as a DBA, BI Developer, Business Analyst and BI
Consultant. My main interests lie both in the technical aspects of Business Intelligence (primarily the Microsoft BI stack, i.e. MS SQL Server,
SSIS, SSRS, SSAS, SharePoint, PowerPivot), data modeling and systems architecture, and in the business applications of BI solutions
(project management, corporate data management strategies, enterprise BI solutions implementation). On the whole, I am very fond of
anything closely or remotely related to data, and as long as it can be represented as a string of ones and zeros and then analyzed and
visualized, you've got my attention!
Outside sporadic updates to this site I typically find myself fiddling with data, spending time with my kids or a good book (these days
odds are against the book), at the gym or watching a good movie while eating Polish sausage with Zubrowka (best served on the rocks with
apple juice and a lime twist). Please read on and if you find these posts of any interest, don't hesitate to leave me a comment!