Handling large bursts of POST requests to your ActivityPub inbox, using a buffer in Nginx
submitted 7 months ago by Rimu
join.piefed.social/2024/04/17/handling-large-bu…
Fediverse traffic is pretty bursty and sometimes there will be a large backlog of Activities to send to your server, each of which involves a POST. This can hammer your instance and overwhelm the backend’s ability to keep up. Nginx provides a rate-limiting function which can accept POSTs at full speed and proxy them slowly through to your backend at whatever rate you specify.
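As a rough illustration of the kind of setup the article describes, here is a minimal Nginx sketch using `limit_req`. The zone name, key, rate, burst size, hostname and backend address are all assumptions chosen for illustration, not values taken from the article.

```nginx
# Shared-memory zone keyed on the sending IP; requests above 10r/s get queued.
# (limit_req_zone belongs in the http{} context.)
limit_req_zone $binary_remote_addr zone=inbox_limit:10m rate=10r/s;

server {
    listen 443 ssl;
    server_name example.social;          # hypothetical instance hostname

    location /inbox {
        # Hold up to 300 excess POSTs and release them to the backend
        # at the configured rate instead of passing the burst straight through.
        limit_req zone=inbox_limit burst=300;
        proxy_pass http://127.0.0.1:8000; # assumed backend address
    }
}
```

With `burst` set and no `nodelay`, Nginx queues the excess requests and feeds them to the backend at the configured rate; anything beyond the burst size is rejected (503 by default, adjustable via `limit_req_status`).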
Do individual users send activities directly? I thought only users of your instance and remote federated instances send traffic to your instance, so would this change only affect data coming through from the larger instances?
Also, what happens to the original requests while they remain in the queue? Say, for example, large-instance.com is sending 11 updates to your instance; while your backend is processing the first 10, what happens to the 11th? Does it get put on hold while your backend churns, or does it get a 200 OK response even though the request may fail at a later date? Neither seems ideal — if the request gets put in a queue waiting for a response while you churn, or worse yet, if your backend fails and your buffer waits X seconds to time out each request, you're going to hurt the global federation by holding up a slot in the sending instance's outgoing federation queue; if it is sent a 200 OK and your backend eventually fails, you'd lose data because the other instance wouldn't know it needs to retry.

An HTTP 200 response is sent immediately. There is a small risk of data loss, but only during the time when rate limiting is happening. Most instances behave well and only send activities at a sensible rate, but when things go wrong we need to avoid the bad effects caused by those that are misconfigured or under attack.
It's not a perfect solution. But the alternative - failing to accept some activities, causing the senders to retry later and us to slowly fall further and further behind - is worse.
Thanks for the clarification! I'm inclined to agree here: a small risk of data loss is probably much better than failing to receive updates altogether during a peak burst.