Optimal Gunicorn Worker Configuration (Number and Class) for Python REST APIs
Solution 1:
Basically you need two different things: parallelism and async.
Gunicorn handles requests by letting each worker process one request at a time. As such, there is no "buffer" in front of the application to absorb overflow, and no built-in solution to a possible "thundering herd" problem (see here).
You will need to run two different Gunicorn instances, each serving one of the APIs.
Ideally, you should have a ballpark estimate of the expected load on each API, because in your case parallelism is very limited (2 vCPUs is not much) and the CPU will therefore be a bottleneck for every worker.
Given the Gunicorn documentation's recommendation (2 × number of cores + 1), I would start from the following, keeping in mind that it might overload the server:
    # for API1
    workers = 4
    worker_class = "sync"
    threads = 2

    # for API2
    workers = 10
    worker_class = "gevent"
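For illustration, here is a minimal sketch of what the two Gunicorn config files could look like, using the values above. The file names, bind ports, and app module paths (api1:app, api2:app) are assumptions, not part of the original answer; adjust them to your project layout. Gunicorn config files are plain Python, so string values like the worker class need quotes:

    # gunicorn_api1.py -- mostly CPU-bound API
    # started with: gunicorn -c gunicorn_api1.py api1:app
    bind = "127.0.0.1:8001"   # placeholder port
    workers = 4               # close to the (2 * cores + 1) guideline on 2 vCPUs
    worker_class = "sync"
    threads = 2

    # gunicorn_api2.py -- mostly I/O-bound API
    # started with: gunicorn -c gunicorn_api2.py api2:app
    bind = "127.0.0.1:8002"   # placeholder port
    workers = 10              # gevent workers for concurrent I/O-bound requests
    worker_class = "gevent"

Running each instance with its own config file keeps the two APIs independently tunable and restartable.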
You will have to tweak these values based on your server load, I/O traffic, and memory availability. Test the load response with a script that mocks a flurry of simultaneous requests to both APIs (you can use grequests for that).
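As a rough sketch of such a load test (the endpoint URLs, request counts, and concurrency level below are placeholders, not from the original answer), grequests can fire a batch of concurrent requests at both services and report basic success counts and timing:

    import time
    import grequests

    # Placeholder endpoints for the two APIs
    API1_URL = "http://127.0.0.1:8001/some-endpoint"
    API2_URL = "http://127.0.0.1:8002/some-endpoint"

    def flood(url, count=200, concurrency=20):
        """Send `count` GET requests with up to `concurrency` in flight at once."""
        pending = (grequests.get(url) for _ in range(count))
        start = time.time()
        responses = grequests.map(pending, size=concurrency)
        elapsed = time.time() - start
        ok = sum(1 for r in responses if r is not None and r.status_code == 200)
        print(f"{url}: {ok}/{count} OK in {elapsed:.1f}s")

    if __name__ == "__main__":
        flood(API1_URL)
        flood(API2_URL)

Watching how the OK count and elapsed time degrade as you raise the request count and concurrency should tell you which worker settings hold up under your real traffic mix.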