Get Python test coverage faster without killing your server

Jon Jagger
Published January 18, 2023 in technology

Getting system-test coverage from a Python web server is not straightforward. If you search the internet, all the hits describe killing the server (e.g. gunicorn) so that the coverage exit handlers run. When the server runs in a docker container, this means the next test run is forced to bring up a new container. That slowed our test cycle, which we didn’t like, so we found a faster way: we restart the server instead of killing it.

This blog post contains links to a public GitHub Kosli demo repo, tdd, which contains:

  • gunicorn running with multiple workers
  • each running a simple Flask web server (with an API)
  • the web server scores the XY Business Game by Jerry Weinberg
  • system tests, running in parallel, with coverage obtained by restarting gunicorn

The Architecture

  • The server is running inside a docker container S.
  • The pytest-cov package is installed from requirements.txt.
  • The code in S runs with coverage when in test mode.
  • The system tests are running inside a different docker container T.
  • The tests in T send HTTP requests to S.
  • When all the tests in T have finished we have to extract the coverage from S.

Getting system-test coverage the slower way

  1. Create a file called sitecustomize.py as follows:

import coverage
coverage.process_startup()

  2. Ensure that, when testing, an environment variable called COVERAGE_PROCESS_START is set to the path of your .coveragerc file inside S. For example, the .coveragerc file might contain:

[run]
data_file=${XY_CONTAINER_COV_DIR}/.coverage

  3. Bring up a server S, wait until S is ready, then run the tests from T. When the tests have finished, kill the server S. The coverage exit handler will run and write out the .coverage files.
  4. Ensure the directory specified in data_file= is volume-mounted in your docker-compose.yaml file so the .coverage files are not lost.
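
The slower way depends on exit handlers, which only run when the server process actually exits. The following is a minimal sketch of that mechanism using Python's standard atexit module. It is not coverage's own implementation; the child program, the marker file, and the function names are all hypothetical stand-ins for "save the data on exit".

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical child program: registers an exit handler that writes a
# marker file, standing in for "write the coverage data on exit".
CHILD_SOURCE = """
import atexit, sys
from pathlib import Path

def save_on_exit():
    Path(sys.argv[1]).write_text("saved")

atexit.register(save_on_exit)
"""


def run_child_and_check_marker() -> bool:
    """Run the child to completion and report whether its exit handler ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "marker.txt"
        # With "python -c SOURCE ARG", the child sees ARG as sys.argv[1].
        subprocess.run(
            [sys.executable, "-c", CHILD_SOURCE, str(marker)], check=True
        )
        return marker.exists()


if __name__ == "__main__":
    print(run_child_and_check_marker())
```

The marker file only appears once the child process has ended, which is why the slower way has to bring the server down before any coverage data can be collected.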

Getting system-test coverage the faster way

You do not need a sitecustomize.py file.

As before, when testing, an environment variable called COVERAGE_PROCESS_START is set to the path of the .coveragerc file inside S.

If the COVERAGE_PROCESS_START environment variable is set, our gunicorn.sh file starts gunicorn with the --config flag set to a file called gunicorn_coverage.py:

#!/usr/bin/env bash
set -Eeu

readonly MY_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")"; pwd)"

if [ -z "${COVERAGE_PROCESS_START:-}" ]; then
  COVERAGE_CONFIG=
else
  COVERAGE_CONFIG="--config ${MY_DIR}/gunicorn_coverage.py"
fi

gunicorn \
  ...
  ${COVERAGE_CONFIG} \
  --workers=2 \
  ...

The gunicorn_coverage.py file hooks into gunicorn’s post_fork handler to start coverage when a worker process starts, and into gunicorn’s worker_exit handler to stop and save coverage when a worker process exits.

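import coverage
import os
import subprocess

# One Coverage object, configured from the .coveragerc file the env var points to.
cov = coverage.Coverage(config_file=os.environ["COVERAGE_PROCESS_START"])

def post_fork(server, worker):
    # gunicorn hook: called in each new worker just after it has been forked.
    recreate_coverage_dir()
    cov.start()

def worker_exit(server, worker):
    # gunicorn hook: called in a worker process when it exits.
    cov.stop()
    cov.save()

def recreate_coverage_dir():
    # Remove any stale .coverage files, then make sure the directory exists.
    cov_dir = os.environ["XY_CONTAINER_COV_DIR"]
    rmdir_cmd = ["rm", "-rf", cov_dir]
    mkdir_cmd = ["mkdir", "-p", cov_dir]
    subprocess.run(rmdir_cmd, check=False)
    subprocess.run(mkdir_cmd, check=True)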

Now, instead of killing the server to get the coverage, we can restart the server by sending a SIGHUP signal to the gunicorn master process. This brings up new worker processes and calls their post_fork() handlers, and brings down the old worker processes and calls their worker_exit() handlers. The recreate_coverage_dir() function relies on the former happening before the latter. The server is running inside an Alpine container so we use the -o flag to find the oldest gunicorn process which will be the master process:

restart_server()
{
  docker exec --interactive "${CONTAINER_NAME}" \
    sh -c "pkill -SIGHUP -o gunicorn"
}
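
The ordering that recreate_coverage_dir() relies on can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it does not run gunicorn or coverage, the file names are made up, and the surviving_files() helper is hypothetical. It replays a sequence of "fork" and "exit" events against a temporary directory and counts the data files left at the end.

```python
import shutil
import tempfile
from pathlib import Path


def post_fork(cov_dir: Path) -> None:
    """A new worker starts: wipe and recreate the coverage directory."""
    shutil.rmtree(cov_dir, ignore_errors=True)
    cov_dir.mkdir(parents=True)


def worker_exit(cov_dir: Path, worker_id: int) -> None:
    """An old worker exits: write its (pretend) coverage data file."""
    (cov_dir / f".coverage.worker{worker_id}").write_text("data")


def surviving_files(events: list[str]) -> int:
    """Replay 'fork'/'exit' events in order; count data files left at the end."""
    with tempfile.TemporaryDirectory() as tmp:
        cov_dir = Path(tmp) / "cov"
        cov_dir.mkdir()
        for worker_id, event in enumerate(events):
            if event == "fork":
                post_fork(cov_dir)
            else:
                worker_exit(cov_dir, worker_id)
        return len(list(cov_dir.glob(".coverage*")))


if __name__ == "__main__":
    # New workers fork first, then old workers exit: both files survive.
    print(surviving_files(["fork", "fork", "exit", "exit"]))
    # Reversed order: the later forks wipe the files the exits wrote.
    print(surviving_files(["exit", "exit", "fork", "fork"]))
```

If a new worker's post_fork() ran after an old worker's worker_exit(), the directory wipe would delete the data that had just been saved, which is why the order matters.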

The SIGHUP signal is asynchronous, so after sending it we must wait until the cov.stop() and cov.save() calls have finished and all the .coverage files have been written. This is quite tricky: we are running the tests in parallel, across more than one web server (--workers=2). We opted to simply wait until the number of .coverage files stabilizes. Then we combine the .coverage files and generate json and html reports:

gather_coverage.sh

wait_for_all_coverage_files()
{
  while : ; do
    echo -n .
    a1="$(actual_coverage_files_count)"; sleep 0.25
    a2="$(actual_coverage_files_count)"; sleep 0.25
    a3="$(actual_coverage_files_count)"; sleep 0.25
    a4="$(actual_coverage_files_count)"; sleep 0.25
    [ "${a1}${a2}${a3}${a4}" == "${a1}${a1}${a1}${a1}" ] && break
  done
  echo .
}

actual_coverage_files_count()
{
  find "${COV_DIR}" -maxdepth 1 -type f -name '.coverage*' | wc -l | xargs
}

create_coverage_json() { ... }
create_coverage_html() { ... }

wait_for_all_coverage_files
cd "${COV_DIR}"
coverage combine --keep --quiet
create_coverage_json
create_coverage_html
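
The same "wait until the count stabilizes" idea can be sketched in Python. This is an illustrative equivalent, not code from the demo repo: the function names are hypothetical, and the timeout parameter is an addition that the bash version above does not have.

```python
import time
from pathlib import Path
from typing import Callable


def coverage_file_count(cov_dir: Path) -> int:
    """Count regular files in cov_dir whose names start with .coverage"""
    return sum(1 for p in cov_dir.glob(".coverage*") if p.is_file())


def wait_for_stable_count(
    count: Callable[[], int],
    samples: int = 4,
    interval: float = 0.25,
    timeout: float = 30.0,  # assumption: the bash version loops forever
) -> int:
    """Take `samples` readings per round until one round is all-equal."""
    deadline = time.monotonic() + timeout
    while True:
        readings = []
        for _ in range(samples):
            readings.append(count())
            time.sleep(interval)
        if len(set(readings)) == 1:
            return readings[0]
        if time.monotonic() > deadline:
            raise TimeoutError("coverage file count did not stabilize")


if __name__ == "__main__":
    cov_dir = Path(".")  # hypothetical location of the .coverage files
    print(wait_for_stable_count(lambda: coverage_file_count(cov_dir)))
```

A stable count is only a heuristic: it shows that no new files appeared during one round of readings, not that every worker has finished saving.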

When restarting the server, we want local edits to be live, so we use volume-mounts in docker-compose.yaml to “overlay” the source and test directories:

...
services:
    ...
    volumes:
      - ./source:${XY_CONTAINER_ROOT_DIR}/source:ro
      - ./test:${XY_CONTAINER_ROOT_DIR}/test:ro

The top level bash script to run the tests (in the restarted container) and gather the coverage looks like this:

...
restart_server; wait_till_server_ready
run_tests_system
restart_server; wait_till_server_ready
gather_coverage

  • The first restart_server call causes the gunicorn post_fork() handler to start coverage.
  • The second restart_server call causes the gunicorn worker_exit() handler to stop and save the coverage.
  • Both handlers run in both restarts, but this is OK because the new workers' post_fork() runs before the old workers' worker_exit().
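
The post does not show wait_till_server_ready. One plausible shape for it is a polling loop that keeps requesting a URL until the server answers. The sketch below uses only the Python standard library; the URL, the timeout values, and the choice of "HTTP 200 means ready" are all assumptions, not details from the demo repo.

```python
import http.client
import time
import urllib.request


def wait_till_server_ready(
    url: str, timeout: float = 10.0, interval: float = 0.1
) -> bool:
    """Poll url until it returns HTTP 200; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timed out, or an error response:
            # the server is not ready yet, so try again shortly.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # Hypothetical address; substitute whatever your server exposes.
    print(wait_till_server_ready("http://localhost:8000/", timeout=2.0))
```

Waiting for readiness after each restart matters because SIGHUP returns immediately, so the tests could otherwise start before the new workers are accepting requests.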

In the tdd demo repo, gathering the system test coverage from a new server takes ~10 seconds; from a restarted server ~4 seconds. FTW.

