Optimized Database Access Patterns for Dummies
Initech sold some expensive devices which supported location tracking. Their customers wanted to track the location of these devices, with full history of all previous locations, down to five minute increments.
When Eurydice F joined the team, she understood that it would be a lot of data to manage. She was an experienced DBA, and had all sorts of ideas about how you might partition the database to make that scale of data manageable, and the ways you would index it to make access and retrieval efficient.
What she found instead was a table called DEVICELOGDAY. It had three fields. The first was a date- only a date, no time information, the second was an integer field simply called deviceId, and the third was a CLOB field called data.
There was also a temporary table, a transaction-local table, simply called DEVICELOG. This had more reasonable fields- a timestamp, a device ID, a lat/lon pair.
The secret of the dataflow existed in two stored procedures, PACKLOGDAY and UNPACKLOGDAY. A previous developer had seen "we need to store a row for every device, every five minutes" as an unbearable transaction burden. So instead, they decided to store one row per device, per day. That was DEVICELOGDAY. Every five minutes, a device would report its location and put that location in the DEVICELOG temporary table. Then PACKLOGDAY would be called, with the device ID as a parameter.
PACKLOGDAY would take all rows in DEVICELOG with that device ID (which, as its a transaction-local temp table, would be the only rows in that table), and append them to the data column in DEVICELOGDAY for that device and that day, as a character-separated-values entry, e.g.:
13:41:57|42.6559038|-73.806023313:42:02|41.9262175|-74.0180634
One row, per device, by day. If you wanted to convert the pipe-separated fields back into rows and columns, there was a helpful UNPACKLOGDAY stored procedure which would populate a temporary table, based on the combination of device ID and a date range.
This solution was taken because of performance concerns, and the solution proved the importance of those performance concerns: it was slow as a dead turtle at the end of a marathon. "But," stated the developers who had already been broken by this code, "imagine how slow it'd be if we hadn't optimized?"
Eurydice has this to add:
As a bonus WTF: we developed and edited stored procedures directly in the production database using Oracle SQL Developer
The database itself was their source control.
[Advertisement] Continuously monitor your servers for configuration changes, and report when there's configuration drift. Get started with Otter today!