Twitter is a social network widely used for research, both in industry and in academia, mostly because its API is robust and can emulate nearly any user action. Its Streaming API also offers a listening feature: it opens a connection that receives tweet objects matching a given subject. Among the many libraries available for the Twitter API, Tweepy, implemented in Python, is often the first choice because it is efficient and easy to use. The easiest way to install Tweepy is with pip, the Python package manager:
sudo pip install tweepy
If pip is not installed yet, you can get it on a Debian-based Linux distribution with apt-get:
sudo apt-get install python-pip
The next step is to register your application on the Twitter developer platform. Once it is registered, you will need four credentials: consumer key, consumer secret, access token, and access token secret. The tweet collector's source code needs them to authenticate with the Twitter API.
Twitter access credentials on the developer platform
Filling in those four values is the only change you need to make for the script to work. The on_status method is responsible for handling the tweet objects received from the stream. In this example, just to demonstrate, the method retrieves the author name and the text of the tweet. It also favorites the received tweet. In a real application, you could instead persist the object in a database, or take any other action that suits your needs.
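As a rough illustration of the database option mentioned above, here is a minimal sketch that stores tweets in SQLite using only the Python standard library. The table layout and the helper names open_store and save_tweet are assumptions made for this example, not part of the original script, and the tweet is a stand-in object exposing only the id, author.screen_name, and text attributes.

```python
import sqlite3
from types import SimpleNamespace


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with a single tweets table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, author TEXT, text TEXT)"
    )
    return conn


def save_tweet(conn, tweet):
    """Store one tweet; a tweet id that is already stored is skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO tweets (id, author, text) VALUES (?, ?, ?)",
        (tweet.id, tweet.author.screen_name, tweet.text),
    )
    conn.commit()


# Stand-in for a tweet object, used here only for demonstration
fake_tweet = SimpleNamespace(
    id=1, author=SimpleNamespace(screen_name="someone"), text="hello zelda"
)

conn = open_store()
save_tweet(conn, fake_tweet)
save_tweet(conn, fake_tweet)  # same id again, so no second row is created
rows = conn.execute("SELECT author, text FROM tweets").fetchall()
print(rows)
```

Inside on_status, a call such as save_tweet(conn, tweet) could replace or complement the print. Passing a file path instead of ":memory:" keeps the data between runs.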
THE SOURCE CODE
# -*- coding: utf-8 -*-
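# Assumes tweepy 3.x (tweepy.StreamListener was removed in tweepy 4.0)
import sys
import tweepy

# Retrieve the query terms from the command-line arguments
query = sys.argv[1:]
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, tweet):
        # When a tweet is received, this is where you handle it
        print(tweet.author.screen_name, tweet.text)
        api.create_favorite(tweet.id)

    def on_error(self, status_code):
        print("Error with code:", status_code)

    def on_timeout(self):
        print("Timed out!")

streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout=60)
streaming_api.filter(follow=None, track=query, languages=["en"])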
To run the code above, you must pass at least one argument; the arguments are the terms the API will search for. For example, to search for the terms “nintendo”, “zelda”, and “super mario”, run the script like this (assuming you saved the file as “collector.py”):
python collector.py "nintendo" "zelda" "super mario"
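To make the argument handling more concrete, the sketch below shows how the command line above turns into the list passed as track. The helper build_track_terms is a hypothetical addition for illustration (the original script simply uses sys.argv[1:]); it also rejects an empty term list.

```python
import sys


def build_track_terms(argv):
    """Turn command-line arguments into the list of terms to track."""
    # argv[0] is the script name, so the terms start at index 1
    terms = [arg.strip() for arg in argv[1:] if arg.strip()]
    if not terms:
        raise ValueError("usage: python collector.py TERM [TERM ...]")
    return terms


# Equivalent to: python collector.py "nintendo" "zelda" "super mario"
example_argv = ["collector.py", "nintendo", "zelda", "super mario"]
terms = build_track_terms(example_argv)
print(terms)
```

Because of the quotes, the shell passes “super mario” as a single argument, so it stays a single term. As far as I know, the Streaming API treats the space inside a term as a logical AND between its words, and treats separate terms as alternatives.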
The code should print the author’s name and the original text of each tweet to the console. It is worth pointing out that the Twitter Streaming API doesn’t capture all tweets related to the queried terms, especially when the terms are trending and therefore generate a large amount of data. Even so, the API provides a statistically significant amount of data about the subject in question. If you have any questions or suggestions, please use the comment area or contact me.