Optimal Gunicorn Worker Configuration (Number and Class) for Python REST APIs
Solution 1:
Basically you need two different things: parallelism and async.
Gunicorn handles requests by letting each worker process one request at a time. As such, there is no "buffer" in front of the application to absorb overflow, and no built-in solution to a possible "thundering herd" problem (see here).
You will need to run two different Gunicorn instances, each serving one of the APIs.
Ideally, you should have a ballpark estimate of the expected load on each API, because in your case parallelism is very limited (2 vCPUs is not much) and the CPU will therefore be a bottleneck for every worker.
Given the Gunicorn documentation's recommendation (2 × number of cores + 1), I would start from the following, keeping in mind that it might overload the server:
    # for API1
    workers = 4
    worker_class = "sync"
    threads = 2

    # for API2
    workers = 10
    worker_class = "gevent"
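For illustration, here is a minimal sketch of what the two Gunicorn config files could look like, using the values above. The file names, bind ports, and app module paths (api1:app, api2:app) are assumptions, not part of the original answer; adjust them to your project layout. Gunicorn config files are plain Python, so string values like the worker class need quotes:

    # gunicorn_api1.py -- mostly CPU-bound API
    # started with: gunicorn -c gunicorn_api1.py api1:app
    bind = "127.0.0.1:8001"   # placeholder port
    workers = 4               # close to the (2 * cores + 1) guideline on 2 vCPUs
    worker_class = "sync"
    threads = 2

    # gunicorn_api2.py -- mostly I/O-bound API
    # started with: gunicorn -c gunicorn_api2.py api2:app
    bind = "127.0.0.1:8002"   # placeholder port
    workers = 10              # gevent workers for concurrent I/O-bound requests
    worker_class = "gevent"

Running each instance with its own config file keeps the two APIs independently tunable and restartable.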
You will have to tweak these values based on your server load, I/O traffic, and memory availability. Test the load response with a script that mocks a flurry of simultaneous requests to both APIs (you can use grequests for that).
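As a rough sketch of such a load test (the endpoint URLs, request counts, and concurrency level below are placeholders, not from the original answer), grequests can fire a batch of concurrent requests at both services and report basic success counts and timing:

    import time
    import grequests

    # Placeholder endpoints for the two APIs
    API1_URL = "http://127.0.0.1:8001/some-endpoint"
    API2_URL = "http://127.0.0.1:8002/some-endpoint"

    def flood(url, count=200, concurrency=20):
        """Send `count` GET requests with up to `concurrency` in flight at once."""
        pending = (grequests.get(url) for _ in range(count))
        start = time.time()
        responses = grequests.map(pending, size=concurrency)
        elapsed = time.time() - start
        ok = sum(1 for r in responses if r is not None and r.status_code == 200)
        print(f"{url}: {ok}/{count} OK in {elapsed:.1f}s")

    if __name__ == "__main__":
        flood(API1_URL)
        flood(API2_URL)

Watching how the OK count and elapsed time degrade as you raise the request count and concurrency should tell you which worker settings hold up under your real traffic mix.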