Management tasks on agentless containers

Running one-off agentless commands.

Earlier this year, I was tasked with dealing with a Django deployment that had gone awry.

This particular project was running in an ECS container, and the underlying issue was a logic error in our entrypoint script: it failed to run migrations before starting the WSGI server. As a result, we only found out about the problem once code tried to access the fields that the unapplied migrations were meant to add.
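
For context, the ordering the entrypoint was supposed to enforce looks roughly like the sketch below. This is not our actual script: it is a hypothetical Python stand-in, and the function name, the injected runner and the server command are all assumptions made for illustration. Only the src/manage.py migrate call comes from this post.

```python
import sys
from typing import Callable, Sequence


def migrate_then_serve(
    run: Callable[[Sequence[str]], int],
    server_cmd: Sequence[str],
) -> int:
    """Apply migrations first, and only start the server if they succeed."""
    # Hypothetical path: src/manage.py mirrors the command used later in this post.
    migrate_cmd = [sys.executable, "src/manage.py", "migrate", "--noinput"]
    exit_code = run(migrate_cmd)
    if exit_code != 0:
        # Abort here so the server never starts against a stale schema.
        return exit_code
    return run(server_cmd)
```

In a real entrypoint you would pass something like subprocess.call as the runner, along with whatever command starts your WSGI server.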

Since the issue with our entrypoint wasn’t readily apparent, the immediate stop-gap was to run the migrations manually. There’s no easy way to do this, though, if your containers are running on EC2-backed ECS with the SSM Agent disabled, or on Fargate.

Thankfully, our database and cache were accessible behind an SSH bastion, meaning it was possible to spin up a local instance of the project hooked up directly to the live database. I hacked together this docker-compose.yml to allow us to run this one-off management command:

version: "3.7"

x-bastion-service: &bastion-service
  image: kroniak/ssh-client
  volumes:
    - "${SSH_AUTH_SOCK}:${SSH_AUTH_SOCK}"
    - "~/.ssh/id_rsa:/root/.ssh/id_rsa"
    - "~/.ssh/id_rsa.pub:/root/.ssh/id_rsa.pub"
  environment:
    - SSH_AUTH_SOCK
  entrypoint: "ssh -vp 22 -N [email protected] -o StrictHostKeyChecking=no -L"

x-live-environment: &live-environment
  environment:
    - DEPLOY_ENV=live
  links:
    # Alias the db service (the SSH tunnel) as the production hostname.
    - "db:postgres.production.host"

services:
  web:
    <<: *live-environment
    build: .
    depends_on:
      - db

  db:
    <<: *bastion-service
    command: "0.0.0.0:5432:postgres.production.host:5432"

If you’re not familiar with docker-compose syntax, this essentially defines a new service, db, that is actually just an SSH client masquerading as a database. It simply forwards connections to the container’s port 5432 through the bastion and on to postgres.production.host:5432.
We then add a link so that any requests intended for the production host are routed to our fake database.
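
Before running anything through the tunnel, it can be worth confirming that the forwarded port is accepting connections. The following is a minimal sketch, assuming it runs inside the web container where postgres.production.host resolves to the db service. The function name and default timeout are illustrative and not part of the original setup.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

Calling port_is_open("postgres.production.host", 5432) from a Python shell in the web container should return True once the tunnel is up. Note that this only shows the SSH client is listening; it does not prove that the bastion can reach the database behind it.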

Needless to say, this is a terrible idea to run against production, and you shouldn’t rely on it for an established production environment. In an ideal scenario, your entrypoints would be written and tested well before you run into issues with migrations. These were not ideal circumstances, but this compose file happened to save us a few hours of downtime.

It was a fun little challenge to solve, and hopefully this helps anyone who might be in a similar position. If you do end up needing something like this, the only things worth keeping in mind are:

Ensuring that initialisation actions are idempotent. If you do have non-idempotent tasks running on initialisation, disable them.
Ensuring that your container is otherwise identical to production. Don’t enable any debug settings, or point your local instance to a development configuration.
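
The second point can be enforced with a small preflight check before touching the live database. The sketch below assumes the project reads DEPLOY_ENV (as set in the compose file) and a DEBUG environment variable. The DEBUG variable name and the function itself are hypothetical, so adapt them to however your settings are actually loaded.

```python
from typing import List, Mapping

# Values treated as "debug is on"; an assumption, adjust to your settings loader.
TRUTHY = {"1", "true", "yes", "on"}


def preflight_errors(env: Mapping[str, str]) -> List[str]:
    """Return a list of reasons the environment does not look like production."""
    errors = []
    if env.get("DEPLOY_ENV") != "live":
        errors.append("DEPLOY_ENV must be 'live'")
    # DEBUG is a hypothetical variable name used for illustration.
    if env.get("DEBUG", "").strip().lower() in TRUTHY:
        errors.append("DEBUG must be disabled")
    return errors
```

Passing os.environ to preflight_errors and refusing to continue when the returned list is non-empty is one way to wire this in.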

Keep this in mind, and once the services are up (docker-compose exec only works against a running container), you should be able to execute your management command with:

docker-compose exec web python src/manage.py migrate