It's a Little Stale

by Remy Porter
from The Daily WTF (#41WAW)

Megan K's organization does the sane and reasonable thing: they cache all of their dependencies locally, and their CI/CD process pulls from the local cache. Since they're in the Python world, this means pulling from PyPI using Bandersnatch.

Someone set up the Bandersnatch job. Someone linked it into their CI/CD process. No one wanted to claim credit when it started failing. Every five minutes, Bandersnatch tried to contact PyPI. Every five minutes, it got a StalePage error, specifically complaining about a "stale serial", and sent an alert email to the entire dev team. Every five minutes, the entire dev team ignored the error. For ten days.

The office turned into a tense standoff. No one wanted to talk about the errors, since the first one to bring it up was going to be the one who had to fix it. Each day could be underscored by Ennio Morricone.

Megan caved first. Instead of using rules to filter her inbox, she manually deleted each message. Eventually, sick of getting spammed, she did some research, and shared her findings with the developers.

Megan started where anyone would start debugging HTTP requests: with curl. She curled the failing URL, first from her machine, and when that worked, from the Bandersnatch machine. That also worked. Yet a Python script, running on the same machine and sending a very similar request, still failed.

Megan fired up a Python REPL and imported the requests library, which is used for sending HTTP requests. She sent a request, and that failed. curl always got a fresh serial, requests always got a stale serial. Obviously, there was something different about the requests.

Megan checked the headers, and found two key differences: the User-Agent and the Accept-Encoding headers. Thinking the obvious, she tried setting curl to use the same User-Agent as requests, and it kept working. Changing the User-Agent in requests didn't fix the problem, either.
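The header comparison described above can be sketched in a few lines. This is only an illustration: the diff_headers helper is hypothetical, and the header values are placeholders, since the story does not give the exact version strings or the full header sets that curl and requests sent.

```python
def diff_headers(a, b):
    """Return {name: (value_in_a, value_in_b)} for headers that differ.

    Header names are compared case-insensitively, since HTTP header
    names are not case-sensitive. A header missing from one side is
    reported as None.
    """
    a_norm = {k.lower(): v for k, v in a.items()}
    b_norm = {k.lower(): v for k, v in b.items()}
    names = sorted(set(a_norm) | set(b_norm))
    return {
        n: (a_norm.get(n), b_norm.get(n))
        for n in names
        if a_norm.get(n) != b_norm.get(n)
    }


# Placeholder values only; the real headers were not given in the story.
curl_headers = {
    "User-Agent": "curl/x.y.z",
    "Accept": "*/*",
}
requests_headers = {
    "User-Agent": "python-requests/x.y.z",
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
}

differences = diff_headers(curl_headers, requests_headers)
for name, (curl_value, requests_value) in differences.items():
    print(f"{name}: curl={curl_value!r} requests={requests_value!r}")
```

With these placeholder inputs, the only headers reported are the two the story names, which is what makes them the natural suspects.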

But changing the Accept-Encoding did. If Megan sent a request with Accept-Encoding: gzip, deflate, the PyPI load balancer routed her to a stale cache server, every time. An empty Accept-Encoding: '', on the other hand, always got a fresh server. By default, requests used gzip, deflate. Bandersnatch, in turn, used the default settings for requests.
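As a rough sketch of what overriding that header looks like, the snippet below builds (but never sends) two requests. It uses the standard library's urllib.request instead of requests so that it runs without third-party packages; the URL and the build_request helper are assumptions made for illustration, not anything taken from Bandersnatch.

```python
import urllib.request

# Hypothetical URL; the story does not give the one that failed.
URL = "https://pypi.example/simple/"


def build_request(url, accept_encoding=""):
    """Build (but do not send) a GET request with an explicit Accept-Encoding."""
    return urllib.request.Request(
        url, headers={"Accept-Encoding": accept_encoding}
    )


# What requests sent by default, according to the story.
default_like = build_request(URL, "gzip, deflate")
# The empty value that got Megan routed to a fresh server.
overridden = build_request(URL, "")

# urllib stores header names via str.capitalize(), hence "Accept-encoding".
print(repr(default_like.get_header("Accept-encoding")))
print(repr(overridden.get_header("Accept-encoding")))
```

With requests itself, the same idea would presumably be expressed by passing headers={"Accept-Encoding": ""} to requests.get, or by setting that key on a Session's headers.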

That, of course, wasn't enough. Eventually, the cache server that Bandersnatch was getting routed to would also become stale, so Megan had to add an arbitrary (and highly variable) parameter to her requests: specifically, a URL parameter called cache-bust, which was always set to the current timestamp.
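A minimal sketch of that cache-busting trick follows. The parameter name cache-bust comes from the story; everything else is an assumption, including the add_cache_bust helper, the example URL, and the choice of whole Unix seconds as the timestamp format.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_bust(url, now=None):
    """Return url with a cache-bust query parameter set to a Unix timestamp.

    `now` exists only so the result can be made deterministic in tests;
    by default the current time is used.
    """
    stamp = int(time.time() if now is None else now)
    parts = urlsplit(url)
    # Drop any earlier cache-bust value, keep other parameters in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "cache-bust"
    ]
    query.append(("cache-bust", str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL, with a fixed timestamp for a repeatable result.
print(add_cache_bust("https://pypi.example/simple/?format=json", now=1500000000))
# prints https://pypi.example/simple/?format=json&cache-bust=1500000000
```

Because the value changes from one request to the next, a cache keyed on the full URL sees each request as new and cannot answer it from a stored copy.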

An issue was raised with PyPI, which is still open at the time of this writing. Obviously, it's some simple misconfiguration in the load balancing and caching, or perhaps an intentional configuration based on bad assumptions. Either way, the moral of the story is this: Megan was glad they were caching all of their dependencies locally, just in case PyPI ever went really wrong.
