TIL: Asyncio for API Requests - Batching vs. Semaphore
This article describes the difference between manual batching and using a semaphore.
In the world of API requests, such as those to Wikipedia, we often face the
challenge of rate limits. To handle them efficiently with Python’s asyncio, I
tried two strategies: batching and a semaphore. Batching splits the requests
into fixed-size groups. The requests within a group run concurrently, but the
groups run one after another, like the legs of a relay race. A semaphore
instead caps how many requests are in flight at any moment, much like a bouncer
controlling a nightclub’s entry: whenever one request finishes, the next one
is let in.
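Both strategies can be sketched in a few lines of asyncio. This is a minimal illustration, not the code from my tests. `fetch_entity` is a hypothetical stand-in that only sleeps to simulate network latency, so you would swap in a real HTTP request to use it against an actual API. `fetch_in_batches` and `fetch_with_semaphore` are illustrative names.

```python
import asyncio


async def fetch_entity(entity_id: int) -> dict:
    # Hypothetical stand-in for one API request: it only sleeps to
    # simulate network latency. Replace with a real HTTP call.
    await asyncio.sleep(0.01)
    return {"id": entity_id}


async def fetch_in_batches(ids: list[int], batch_size: int = 10) -> list[dict]:
    # Batching: requests inside a batch run concurrently, but the next
    # batch starts only after every request in the current one is done.
    results: list[dict] = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        results.extend(await asyncio.gather(*(fetch_entity(i) for i in batch)))
    return results


async def fetch_with_semaphore(ids: list[int], limit: int = 10) -> list[dict]:
    # Semaphore: every request is scheduled up front, but at most `limit`
    # run at once. A waiting request starts as soon as a slot frees up.
    semaphore = asyncio.Semaphore(limit)

    async def guarded(entity_id: int) -> dict:
        async with semaphore:
            return await fetch_entity(entity_id)

    return list(await asyncio.gather(*(guarded(i) for i in ids)))
```

Both functions return results in the same order as `ids`, because `asyncio.gather` preserves input order. The difference is timing. With real, uneven latencies, the semaphore keeps all 10 slots busy instead of waiting for each batch’s slowest request, so it sends requests at a higher overall rate.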
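In my tests against a public Wikipedia API, fetching data for 100 entities, the two strategies behaved quite differently. With batching, I sent 10 requests at a time in discrete bursts, which was a steadier and safer approach. The semaphore version was also limited to 10 concurrent requests, but it initially ran into rate limits and returned fewer results. A likely reason is that a semaphore refills a slot the moment a request finishes. That leaves no idle time between requests, so the overall request rate ends up higher than with batching, where each batch waits for its slowest request. This reflects real-world API interactions, where services like Wikipedia defend themselves against potential abuse.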
The key turning point was introducing a delay in the semaphore version. This small tweak made a significant difference, balancing speed against the API’s rate limits. It became clear that a more measured approach can beat sheer speed, particularly when external constraints such as API rate limits come into play.
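One way to add that delay is sketched below. It assumes the pause happens while the request still holds its semaphore slot, so each slot rests before it admits the next request. `fetch_entity` is a hypothetical stand-in that only simulates latency, and `fetch_with_semaphore_and_delay` is an illustrative name. The default `delay` of 0.5 seconds is an arbitrary example value, not one tuned for Wikipedia.

```python
import asyncio


async def fetch_entity(entity_id: int) -> dict:
    # Hypothetical stand-in for one API request (simulated latency only).
    await asyncio.sleep(0.01)
    return {"id": entity_id}


async def fetch_with_semaphore_and_delay(
    ids: list[int], limit: int = 10, delay: float = 0.5
) -> list[dict]:
    # At most `limit` requests run at once, and each slot sleeps for
    # `delay` seconds before it is released. This keeps the overall rate
    # at roughly limit / (latency + delay) requests per second.
    semaphore = asyncio.Semaphore(limit)

    async def guarded(entity_id: int) -> dict:
        async with semaphore:
            result = await fetch_entity(entity_id)
            await asyncio.sleep(delay)  # pause before freeing the slot
            return result

    return list(await asyncio.gather(*(guarded(i) for i in ids)))
```

With 100 IDs, `limit=10`, and `delay=0.5`, the requests pass through the 10 slots in about 10 rounds of roughly 0.51 seconds each. The whole run therefore takes a little over 5 seconds. That slowdown is the price of capping the rate at about `limit / (latency + delay)` requests per second.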
Key Lessons Learned:
The Reality of Rate Limiting: Many APIs have built-in measures to thwart abuse, and sending requests too rapidly can get them throttled or blocked. It’s crucial to check the results you actually receive and make sure each one is valid and not empty (None).
The Safety of Batching: Batching naturally spaces out requests, because each batch waits for its slowest request before the next one starts. This reduces the risk of hitting rate limits. Its simplicity might seem less ‘Pythonic’ to some, but processing batch by batch is often the more reliable way to stay under rate limits.
The Nuance of Semaphores: Semaphores may look more Pythonic, but they require careful calibration. Without it, some requests can be rejected, so the results are not guaranteed to be complete. Tuning the concurrency limit and deliberately introducing delays can keep the API’s defensive measures from triggering.
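A small check like the one below can separate usable results from empty ones. It assumes that a rejected or failed request comes back as `None`, and it also treats an empty payload such as `{}` as missing. `split_results` is a hypothetical helper name. The IDs it reports as missing are candidates for a retry, for example with a lower concurrency limit or a longer delay.

```python
from typing import Optional


def split_results(
    ids: list[int], results: list[Optional[dict]]
) -> tuple[list[dict], list[int]]:
    # Pair each requested ID with its result (asyncio.gather keeps input
    # order) and separate usable results from empty ones.
    if len(ids) != len(results):
        raise ValueError("expected one result per requested ID")
    found: list[dict] = []
    missing: list[int] = []
    for entity_id, result in zip(ids, results):
        if result:  # None and empty payloads ({}) both count as missing
            found.append(result)
        else:
            missing.append(entity_id)
    return found, missing


found, missing = split_results([1, 2, 3, 4], [{"id": 1}, None, {"id": 3}, {}])
print(f"{len(found)} ok, missing IDs: {missing}")  # 2 ok, missing IDs: [2, 4]
```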