Django and Celery

(Originally posted August 19, 2013)

While working on Streakflow, it quickly became evident that email reminders would be important. Not only because they would hopefully bring back users, but also because a little nagging is a good way to get people to finish goals. The email system should email users daily, at a time selected by them. This is exactly the type of thing Celery is perfect for. In this case, we’re going to be using the scheduling portion of Celery. Jumping into something new and unknown may be overwhelming, but I’ll show you how simple it is to configure Celery for use in Django, and to run it in production.

The first step is to install django-celery:

$ pip install django-celery

Then add djcelery to your installed apps:

INSTALLED_APPS += ("djcelery", )

And then migrate your database (assuming you’re using South, which you should):

$ python manage.py migrate djcelery

Finally, add the following lines to your settings.py file.

import djcelery
djcelery.setup_loader()

That’s as far as the django-celery docs front page gets you, and it really is (almost) that simple. From here, we want to add functionality for scheduled tasks. We’ll deal with the code first, and the production setup at the end.

Go back to your settings.py file and add the following.

BROKER_URL = 'amqp://guest:guest@localhost:5672//'

CELERY_TIMEZONE = 'UTC'
from celery.schedules import crontab
CELERYBEAT_SCHEDULE = {
    'check-goals-complete': {
        'task': 'streakflow.apps.members.tasks.reminder_emails',
        'schedule': crontab(minute='*/30'),
    },
}

BROKER_URL is what celery uses to connect to the broker (something like Redis or RabbitMQ). We’ll get to that later.
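To make the pieces of that URL less mysterious, here is a small sketch that pulls it apart with Python’s standard library. This is only an illustration, not something celery needs in settings.py; the broker dictionary and its key names are my own, and the snippet assumes Python 3 for urllib.parse.

```python
from urllib.parse import urlsplit

BROKER_URL = 'amqp://guest:guest@localhost:5672//'

parts = urlsplit(BROKER_URL)

# Hypothetical dictionary, only here to label each piece of the URL.
broker = {
    "transport": parts.scheme,   # protocol spoken to the broker
    "user": parts.username,      # broker login
    "password": parts.password,  # broker password
    "host": parts.hostname,      # machine the broker runs on
    "port": parts.port,          # broker port
    "vhost": parts.path[1:],     # virtual host: everything after the first "/"
}

for key, value in broker.items():
    print(key, "=", value)
```

The doubled slash at the end is not a typo: the first slash separates the host from the path, and the second one is the name of the virtual host, which for a default RabbitMQ install is "/".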

CELERY_TIMEZONE is used so celery can keep track of the time correctly. There’s no reason here not to have it be UTC.

We then import crontab for use in the schedule. Here we do two things. We give the scheduled task an arbitrary name, and then tell celery where it should look for the task and on what schedule it should run the code. The ‘task’ parameter is a dotted path, in a similar vein to an import, and the crontab definition here is set up for every 30 minutes because some timezones are at half-hour offsets. This code is very much straight from the docs.
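The half-hour reasoning is easier to see with a little arithmetic. The sketch below is plain Python, separate from the celery config; utc_send_time and hit_by_schedule are hypothetical helper names I made up for the illustration, and offsets are given in minutes east of UTC.

```python
from datetime import datetime, timedelta


def utc_send_time(local_hour, local_minute, utc_offset_minutes):
    """Return the (hour, minute) in UTC at which a local wall-clock time occurs."""
    # The date is arbitrary; only the time of day matters here.
    local = datetime(2013, 8, 19, local_hour, local_minute)
    utc = local - timedelta(minutes=utc_offset_minutes)
    return utc.hour, utc.minute


def hit_by_schedule(minute, step=30):
    """True if a schedule that fires every `step` minutes fires at this minute."""
    return minute % step == 0


# A user in a UTC+5:30 timezone who wants a reminder at 9:00 local time.
hour, minute = utc_send_time(9, 0, 5 * 60 + 30)
print(hour, minute)
print(hit_by_schedule(minute, step=30))  # every 30 minutes
print(hit_by_schedule(minute, step=60))  # hourly
```

That user’s 9:00 falls on the half hour in UTC, so an hourly schedule would never line up with it, while the 30-minute one does. Note that a few timezones sit at 45-minute offsets, which this schedule would still not hit exactly.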

In our tasks.py file where we define the task, we just need to make sure that we have the following pieces.

from celery import task

@task
def reminder_emails():
  ...
  ...
  send_email(.....)

The logic would replace the …s. As long as the module path and function name match the ‘task’ entry from settings, you’ll be fine.
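To give a feel for what kind of logic could go in there, here is a rough sketch of picking out the users whose reminder falls in the current 30-minute run. This is not the actual Streakflow code: the users_due function, the dictionary keys (email, utc_offset_minutes, reminder_time) and the sample data are all assumptions for illustration, and it is plain Python with no Django models or celery involved.

```python
from datetime import datetime, time, timedelta, timezone


def users_due(users, now_utc):
    """Return emails of users whose chosen reminder time falls in this 30-minute slot."""
    due = []
    for user in users:
        # Shift the current UTC time into the user's local wall-clock time.
        local_now = now_utc + timedelta(minutes=user["utc_offset_minutes"])
        wanted = user["reminder_time"]
        # Compare hour and half-hour slot (0 for :00-:29, 1 for :30-:59).
        same_hour = local_now.hour == wanted.hour
        same_slot = local_now.minute // 30 == wanted.minute // 30
        if same_hour and same_slot:
            due.append(user["email"])
    return due


# Hypothetical users; in a real task these would come from the database.
users = [
    {"email": "a@example.com", "utc_offset_minutes": 330, "reminder_time": time(9, 0)},
    {"email": "b@example.com", "utc_offset_minutes": 0, "reminder_time": time(9, 0)},
    {"email": "c@example.com", "utc_offset_minutes": -240, "reminder_time": time(23, 30)},
]

now_utc = datetime(2013, 8, 19, 3, 30, tzinfo=timezone.utc)
print(users_due(users, now_utc))
```

Inside the real task, the result of something like this would be the list you loop over to send each email.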

At this point, that’s all the code/setup you need, so now we’re going to shift over to getting this thing to run in production. I used the smallest instance at Digital Ocean. They’re very cheap, and also come with many tutorials to get you started, which is fantastic. Highly recommended. After firing up a default Ubuntu box and installing everything else, which isn’t the topic of this post, we want to install the queue.

$ sudo apt-get install rabbitmq-server

All we want to do with this is make sure that it is running.

$ sudo rabbitmqctl status

It should be running, but if it isn’t, you can always start the service:

$ sudo service rabbitmq-server start

To monitor the services, I like to use supervisor. It’s intuitive and simple to hook up to all the Django-related processes, like gunicorn. For celery, we’re going to use two supervisor programs. This is because we need celery beat for the scheduler, as well as a celery worker to execute the code. The code for both will be almost identical. For each, we need two pieces of code. One is a bash script that contains the commands we want to run when we start the celery process. This looks like the following for the worker.

#!/bin/bash

source /path/to/env/bin/activate
cd /path/to/django/proj/
exec python manage.py celery worker --loglevel=INFO

This activates the virtualenv, changes directories to where manage.py is, and runs the celery command through manage.py like it says in the docs. The only difference for the beat is that we want to execute “celerybeat” instead of “celery worker”. Make sure to chmod u+x this script!

Finally, we want to create a supervisor conf file for each. These should go in /etc/supervisor/conf.d/, along with your supervisor conf file for gunicorn if you’re using that too.

[program:celery_worker]
command=/path/to/bin/celery_worker_start
stdout_logfile=/path/to/logs/celery_worker.log
redirect_stderr=true

Again, the difference for celery beat is just changing worker to beat in the program name, the command, and the log file, and making sure they match up with the beat script.

To load the new celery programs into supervisor:

$ sudo supervisorctl reread
$ sudo supervisorctl update

You can then check the status of both of them by running:

$ sudo supervisorctl status

At this point you should see both of them running! If you check the logs, you can see that the celery beat program wakes up, checks how things are, and then sleeps until it is time to run the mailer. The celery worker just sits there until it gets a task. Anything that you print from the task will show up in the log. Also, you can change the log level from INFO to DEBUG if you want more information.
