Django + Celery

Hey everyone, I've been using Django and Celery in production for the last 4 years and was thinking of making a YouTube series on Celery: scaling, how it works, using websockets with Celery (via django-channels), Kubernetes with Celery, and event-driven architecture. The Django community has been a great help to me while learning, so I wanted to give back in some way.

My question is what would you like to learn about?

21 thoughts on “Django + Celery”

  1. The basics, probably? I am actually really struggling with Celery right now. I found a lot of tutorials for periodic tasks, but I need an asynchronous task for uploading, then parsing and downloading data. I would also like to show a progress bar for it in the front end. I can’t manage to pull it off.
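One way to approach the progress bar from comment 1 is to have the parsing loop report how far it has got. This is a minimal sketch under assumptions, not a complete solution: `parse_with_progress` and its `report` callback are hypothetical names. In a Celery task declared with `bind=True`, the callback could call `self.update_state(state="PROGRESS", meta={"done": done, "total": total})`, and a Django view polled by the front end could read that back through `AsyncResult(task_id).info`. That requires a result backend to be configured. The sketch below uses a plain list as a stand-in so it runs without Celery.

```python
from typing import Callable, Iterable


def parse_with_progress(
    lines: Iterable[str],
    total: int,
    report: Callable[[int, int], None],
    every: int = 100,
) -> list[list[str]]:
    """Parse comma-separated lines, calling report(done, total) periodically."""
    parsed = []
    for done, line in enumerate(lines, start=1):
        parsed.append(line.strip().split(","))
        # Report every `every` rows, and always on the last row.
        if done % every == 0 or done == total:
            report(done, total)
    return parsed


# Stand-in for the Celery call: collect the updates in a list.
# Inside a bound task this lambda would call self.update_state(...) instead.
updates = []
rows = parse_with_progress(
    ["a,1", "b,2", "c,3"],
    total=3,
    report=lambda done, total: updates.append((done, total)),
    every=2,
)
```

Reporting every N rows rather than on every row keeps the number of writes to the result backend small.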

  2. Setting up Celery with Docker on Windows. I am really struggling with it. Also, my linter is not finding Celery in my Django project.

  3. I’ve used Celery a lot and have always hated how opaque it feels when combined with Redis. So I’d like to see something about queue visualisation, reporting, etc.

  4. Do an in-depth exploration of Redis vs RabbitMQ using Django + Celery. What are the advantages/disadvantages of either choice? What are the monitoring solutions (outside of Flower)? What are the common roadblocks (e.g. result queues may be set up to live for far too long)? There are a lot of questions like this that don’t have a centralized place to look up answers.

  5. Cool idea, I’ve been using it for a while too. I can provide some content/feedback if you want to work on it with someone else.

  6. Please be sure to have a transcript – I find watching videos takes too much time.

    Here are a few challenges I have not yet resolved:

    * Monitoring queues for display on our internal dashboard. Flower did not work well enough.
    * Getting “at most once” behavior. Right now some jobs run multiple times, and the flags about “when to ack” are confusing for longer jobs.
    * Best practices when logging from code that is used from Celery, from the command line (cron scripts), and from the web application.
    * Managing queues when there is a large variation in job length (50 ms to 30 minutes). We currently split into two queues but we still get delays (the “short” jobs vary from 50 ms to 2 minutes).
    * Best user interaction with short jobs and Celery. We have some downloadable reports that take 45 to 60 seconds to generate. The user now just waits while the web server computes it. I’d rather be doing this via Celery, but the user does not want to have to come back to the page. The problem we have is occasional timeouts when the database is heavily loaded (the report takes more than 120 seconds and the web request times out). A progress bar would be nice but is not critical; what matters more is that the user wants the report now (not via email or by coming back to it).
    * For AWS, how to share disk across servers. Celery job A downloads 5 files to local disk and archives them to S3. It then queues 5 jobs, one for each file. Right now we arrange for all of them to run on one server so they have a shared file system. The files are 100-400MB, so the second jobs don’t want to fetch from S3 again. The “load file” jobs can then start 100+ smaller jobs as a result of parsing the large file.
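For the “at most once” and mixed job length points in comment 6, a few Celery settings are worth checking. The fragment below is a sketch with illustrative values, and it assumes a Redis (or SQS) broker, which the comment does not state. The file name `celeryconfig.py` is hypothetical; in a Django settings file using the `CELERY` namespace, the names would be upper-cased with a `CELERY_` prefix. One possible cause of repeated runs on those brokers is the visibility timeout: a message that stays unacknowledged longer than the timeout is redelivered, which can affect late-acknowledged long tasks and tasks scheduled with a long ETA or countdown.

```python
# celeryconfig.py (hypothetical module name); values are illustrative only.

# Acknowledge the message just before the task starts rather than after it
# finishes. This is Celery's default and is the closer fit to "at most once":
# a task that crashes midway is not retried automatically.
task_acks_late = False

# Let each worker process reserve only one message at a time, so short jobs
# are less likely to wait behind a long job that a worker has prefetched.
worker_prefetch_multiplier = 1

# Redis/SQS transports: seconds before an unacknowledged message is
# redelivered to another worker. Assumed here to comfortably exceed the
# longest job (30 minutes in the comment above) and any ETA/countdown delay.
broker_transport_options = {"visibility_timeout": 2 * 60 * 60}
```

None of these settings makes a task exactly-once; tasks that must not repeat usually still need their own duplicate check.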

  7. I haven’t solved the problem of rolling out new releases while quite long-running, non-idempotent tasks are in progress. I would love to hear about that: how to stop accepting new tasks but finish the currently running ones. All of that runs in Docker.

  8. A thing that I’ve not been able to find a lot of good information on is how to incorporate something like Luigi into the Django/Celery workflow. I’ve got a lot of tasks that need to be run, and then another one that has to be run after all of those others finish. I’d like to run the final task only if all of the others pass successfully, and otherwise be notified of the failed ones so they can be cleaned up and re-run to kick off the final task. That would be super helpful and amazing!
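The workflow in comment 8 (run many tasks, then run a final task only if all of them succeeded, otherwise report the failures) is a fan-out/fan-in pattern. Celery's canvas has `group` and `chord` primitives aimed at this, where the chord callback runs after the header tasks finish. The sketch below is not Celery code: it uses the standard library's `concurrent.futures` as a stand-in so the gating logic can be shown in a runnable form. The names `run_all_then_finalize`, `finalize`, and `notify_failed` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all_then_finalize(jobs, finalize, notify_failed):
    """Run every job; call finalize(results) only if none of them raised."""
    results, failed = {}, {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {name: pool.submit(fn) for name, fn in jobs.items()}
    # Leaving the with-block waits for all submitted jobs to finish.
    for name, fut in futures.items():
        err = fut.exception()
        if err is None:
            results[name] = fut.result()
        else:
            failed[name] = err
    if failed:
        # Report which jobs failed so they can be cleaned up and re-run.
        notify_failed(failed)
        return None
    return finalize(results)


def boom():
    raise ValueError("bad input")


notified = []
ok = run_all_then_finalize(
    {"a": lambda: 1, "b": lambda: 2}, lambda r: sum(r.values()), notified.append
)
bad = run_all_then_finalize(
    {"a": lambda: 1, "b": boom}, lambda r: sum(r.values()), notified.append
)
```

In the second call the final step is skipped and `notified` receives a dict keyed by the failed job name, which is the information needed to re-run only the failures.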

  9. That would be very interesting indeed!

    It would be awesome to see a YouTube series that starts with an overview of what you have built and then goes into the details of the different areas, like using Celery, making use of websockets, Kubernetes, etc.

  10. I’d love to see patterns for handling large batch ETL jobs. Maybe show how to scale up/out to:

    1. Read in data from a CSV or Parquet file from S3
    2. Transform and process the data via a DRF serializer
    3. Add the data to a database via the Django ORM

    Do this for millions of rows of data.
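A common starting point for the ETL steps in comment 10 is to stream the file and hand fixed-size batches to workers instead of loading millions of rows at once. The sketch below covers only the batching step and uses an in-memory CSV as sample data; `iter_batches` is a hypothetical helper. Under the assumptions of that comment, each batch could be passed to a Celery task that validates it with a DRF serializer (`many=True`) and writes it with the Django ORM's `bulk_create`.

```python
import csv
import io
from itertools import islice


def iter_batches(rows, size):
    """Yield lists of at most `size` rows without reading everything first."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


# Sample data standing in for a CSV file streamed from S3.
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")
reader = csv.DictReader(sample)

# In a real pipeline each batch would be queued as one task
# rather than collected into a list.
batches = list(iter_batches(reader, 2))
```

The batch size trades off per-task overhead against memory use and the cost of retrying a failed batch.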

  11. “how it works, using websockets with celery via (django-channels)”

    That sounds interesting. What part(s) would you use the websocket for?

    I am one who strongly prefers written tutorials over YouTube videos, so something to consider is posting a text version or even a text transcript. I usually skip the videos unless I can’t find a written guide and really need to learn something, or if it is a topic where a video is particularly illustrative.

