Tweepy: Tweet Collector in Python

Twitter is a social network widely used for research, both commercial and academic, mostly because its API is robust and lets a program perform almost any action a user can. It also offers a listening feature, the Streaming API, which opens a connection and receives tweet objects about any subject you specify. Among the many libraries available for the Twitter API, Tweepy, written in Python, is a popular first choice because it is efficient and easy to use. The easiest way to install Tweepy is with pip, Python's package manager: pip install tweepy

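If pip is not installed yet, you can get it on a Debian-based Linux distro with apt-get, for example: sudo apt-get install python3-pip (the package name may differ between releases).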

The next step is to register your application on the Twitter developer platform. Once it is registered, you will need four credentials: consumer key, consumer secret, access token, and access token secret. The tweet collector's source code requires them to authenticate with the Twitter API.

Figure: Where to find the authentication keys (Twitter access credentials) on the developer platform
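
As an alternative to pasting the four credentials into the script, they could be read from environment variables so they never end up in version control. The following is a minimal sketch of that idea, not part of the original collector; the variable names and the `load_credentials` helper are hypothetical.

```python
import os

# Hypothetical environment variable names; any naming scheme would work.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_VARS}
```

Failing early with the list of missing names makes a misconfigured environment easier to spot than an authentication error returned later by the API.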

Filling in those four values is the only change you need to make to the script for it to work. The on_status method handles the tweet objects received from the stream. In this example, just to demonstrate, it retrieves the author's name and the text of each tweet, and it also favorites the tweet. In a real application you could also persist the object in a database, or take any other action that suits your needs.
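
The idea of persisting tweets in a database could be sketched as below, using SQLite from the standard library. This is an illustration only: the table layout and the `open_store` and `save_status` helpers are assumptions, and the code relies only on the `id`, `text`, and `user.screen_name` attributes of the tweet object.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with a single tweets table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, author TEXT NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def save_status(conn, status):
    """Store one tweet; a repeated id is ignored instead of raising."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO tweets (id, author, text) VALUES (?, ?, ?)",
            (status.id, status.user.screen_name, status.text),
        )
```

Calling `save_status(conn, status)` from inside on_status would store each tweet as it arrives. Passing a file path to `open_store` keeps the data between runs, while the default in-memory database is only useful for experiments.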

THE SOURCE CODE
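
The original listing is not reproduced in this text, so what follows is a minimal sketch of a collector matching the description, not the author's exact script. It assumes Tweepy 3.x, where `tweepy.StreamListener` exists (it was removed in Tweepy 4.0), and the `describe` and `run_collector` names are illustrative. Twitter's API access has also changed since this was written, so the script may not run against the current platform.

```python
import sys

# Placeholders: replace with the four values from the developer platform.
CONSUMER_KEY = "YOUR_CONSUMER_KEY"
CONSUMER_SECRET = "YOUR_CONSUMER_SECRET"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"


def describe(status):
    """Format a tweet as 'author: text' (illustrative helper)."""
    return "{}: {}".format(status.user.screen_name, status.text)


def run_collector(terms):
    """Authenticate, then stream tweets matching any of the given terms."""
    # Imported here so the helper above can be used without Tweepy installed.
    import tweepy

    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)

    class Collector(tweepy.StreamListener):  # assumption: Tweepy 3.x
        def on_status(self, status):
            print(describe(status))
            api.create_favorite(status.id)  # favorite the received tweet

        def on_error(self, status_code):
            if status_code == 420:
                # Rate limited: returning False disconnects the stream.
                return False

    stream = tweepy.Stream(auth=api.auth, listener=Collector())
    stream.filter(track=terms)  # blocks while the connection is open


if __name__ == "__main__":
    if len(sys.argv) > 1:
        run_collector(sys.argv[1:])
    else:
        print("usage: python collector.py TERM [TERM ...]")
```

Each command-line argument becomes one tracked term, so a multi-word term has to be quoted in the shell to arrive as a single argument.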

To run the script, pass the terms to be retrieved by the API as command-line arguments. For example, to search for the terms “nintendo”, “zelda”, and “super mario”, run the script like this (assuming you saved the file as “collector.py”): python collector.py nintendo zelda "super mario"

As written, the script prints the author's name and the tweet's original text to the console. It's worth pointing out that the Twitter Streaming API doesn't capture all tweets related to the queried terms, especially when the terms are trending and therefore generate a large amount of data. Even so, the API delivers a sample large enough to be useful for studying the subject in question. If you have any questions or suggestions, please use the comment area or contact me.
