Tweepy: Tweet Collector in Python

Posted on November 25th, 2017

Twitter is a social network widely used for research, in both industry and academia, largely because its API is robust and lets a program emulate user actions. It also offers the Streaming API, which opens a persistent connection and delivers tweet objects that match the subjects you specify. Among the many libraries available for the Twitter API, Tweepy, written in Python, is a common first choice because it is efficient and easy to use. The easiest way to install Tweepy is with pip, the Python package manager:

sudo pip install tweepy


If pip is not installed yet, you can get it on a Debian-based Linux distribution with apt-get:

sudo apt-get install python-pip


The next step is to register your application on the Twitter developer platform. Once it is registered, you will need four credentials: consumer key, consumer secret, access token, and access token secret. The tweet collector's source code needs them to authenticate with the Twitter API.

Figure: Twitter access credentials on the developer platform
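
If you would rather not paste the credentials into the script itself, one option is to read them from environment variables. The sketch below is only an illustration of that idea: the variable names and the load_credentials helper are hypothetical, not something Tweepy or Twitter defines.

```python
import os

# Hypothetical variable names; pick whatever fits your setup.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

The returned values can then be passed to tweepy.OAuthHandler and set_access_token in place of the hard-coded strings.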

Filling in those four values is the only change you need to make to the script for it to work. The on_status method handles the tweet objects received from the stream. In this example, just as a demonstration, it prints the author's screen name and the tweet text, and it also favorites the tweet. In a real application you could instead persist the object in a database, or take any other action that suits your needs.

THE SOURCE CODE

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written for Tweepy 3.x, which provides tweepy.StreamListener.
from __future__ import print_function

import sys
import tweepy

# Terms to track, taken from the command-line arguments
query = sys.argv[1:]

# Authentication: fill in the four values from the developer platform
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):

    def on_status(self, tweet):
        # Called for every tweet received; this is where you handle it
        print(tweet.author.screen_name)
        print(tweet.text)
        api.create_favorite(tweet.id)
        return True

    def on_error(self, status_code):
        print("Error with status code:", status_code)
        return True

    def on_timeout(self):
        print("Timed out!")
        return True


def main():
    if not query:
        sys.exit("usage: python collector.py TERM [TERM ...]")
    streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout=60)
    streaming_api.filter(follow=None, track=query, languages=["en"])


if __name__ == '__main__':
    main()
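
As mentioned earlier, a real collector would usually store tweets rather than print them. The following is a minimal sketch of that idea using SQLite from the standard library. The table layout and the open_store and save_tweet helpers are assumptions made for illustration; they are not part of Tweepy.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, author TEXT, text TEXT)"
    )
    return conn


def save_tweet(conn, tweet_id, author, text):
    """Store one tweet; a tweet id that is already stored is skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO tweets (id, author, text) VALUES (?, ?, ?)",
            (tweet_id, author, text),
        )


conn = open_store()
save_tweet(conn, 1, "some_user", "Playing Zelda tonight")
save_tweet(conn, 1, "some_user", "Playing Zelda tonight")  # duplicate, ignored
print(conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])  # prints 1
```

Inside on_status, the call would look like save_tweet(conn, tweet.id, tweet.author.screen_name, tweet.text), with a file path passed to open_store instead of the in-memory default.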

To run the code above, you must pass at least one argument; the arguments are the terms the API will track. For example, to collect tweets about “nintendo”, “zelda”, and “super mario”, run the script like this (assuming you saved the file as “collector.py”):

python collector.py "nintendo" "zelda" "super mario"
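
To make it clearer what those arguments mean, here is a small sketch. Each quoted argument becomes one tracked phrase. Per Twitter's documentation for the track parameter, a tweet matches when any phrase matches, and a phrase matches when all of its words are present, in any order and ignoring case. The parse_terms and matches helpers are hypothetical, and matches is only a rough local approximation: the real matching happens on Twitter's servers and handles punctuation, hashtags, and URLs differently.

```python
def parse_terms(argv):
    """Return the tracked phrases from a command line, minus the script name."""
    terms = [arg.strip() for arg in argv[1:] if arg.strip()]
    if not terms:
        raise SystemExit("usage: python collector.py TERM [TERM ...]")
    return terms


def matches(text, terms):
    """Rough approximation: True if all words of any phrase appear in text."""
    words = set(text.lower().split())
    return any(
        all(word in words for word in phrase.lower().split())
        for phrase in terms
    )


terms = parse_terms(["collector.py", "nintendo", "zelda", "super mario"])
print(terms)                                 # ['nintendo', 'zelda', 'super mario']
print(matches("Mario is super fun", terms))  # True
print(matches("I like pizza", terms))        # False
```

Note that “super mario” is a single phrase here only because it was quoted on the command line; unquoted, the shell would pass “super” and “mario” as two separate terms.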


The script prints the author's screen name and the text of each tweet to the console. Note that the Twitter Streaming API does not deliver every tweet matching the queried terms, especially when the terms are trending and therefore generate a large volume of data. Even so, it provides a substantial sample of data on the subject. If you have any questions or suggestions, please use the comment area or contact me.

Ronan Lopes

CTO at FIT Energia. MSc in Computer Science from the Federal University of São João del-Rei (UFSJ). Took a specialization course in Data Science & Big Data at PUC Minas. Linux enthusiast and supporter of open-source software. A Buddhist who learns from Lama Padma Samten. In his spare time he draws, solves magic cubes, and enjoys some nice beers with his friends.

1 Comment

Sannytet

December 11th, 2018 at 20:20

Nice posts! 🙂
___
Sanny
