CodeSOD: Producing Self Joins
Wesley considers himself an "intermediate SQL-er". The previous "database expert", though, has moved on to greener pastures, and now Wesley is "the database person". This means that they need to pick through a bunch of stored procedures and triggers and try and understand the undocumented, unversion-controlled database code.
The upshot, is that Wesley is starting to feel like his intermediate skills might be more "expert" than the previous expert.
For example, Wesley found this query. The goal of this query is, for a single point-of-sale terminal, in a date range, how many of each product did they sell, as a raw count. It should be able to say, there were 5 apples, 10 oranges, etc.
Now, you or I or Wesley are probably already reaching for the GROUP BY and maybe a CASE. It's the right choice, obviously. But the "expert" has a different solution:
SELECTa.Name,b.RegisterId,COUNT(c.ProduceId) AS Apples,COUNT(d.ProduceId) AS Oranges,-- ... COUNT(ac.ProduceId) AS Zucchini,COUNT(ad.ProduceId) AS Other,COUNT(ae.ProduceId) AS GrandTotalFROM Cashier aINNER JOIN Sale b ON a.RegisterId = b.RegisterId AND (CONVERT(DATE,b.SaleDate) BETWEEN @startDate AND @endDateLEFT OUTER JOIN ProduceType c ON b.ProduceId = c.ProduceId AND c.ProduceName = 'Apples'LEFT OUTER JOIN ProduceType d ON b.ProduceId = d.ProduceId AND d.ProduceName = 'Oranges'-- ...LEFT OUTER JOIN ProduceType ac ON b.ProduceId = x.ProduceId AND x.ProduceName = 'Zucchini'LEFT OUTER JOIN ProduceType ad ON b.ProduceId = y.ProduceId AND y.ProduceName = 'Other'LEFT OUTER JOIN ProduceType ae ON b.ProduceId = z.ProduceId AND z.ProduceName IS NOT NULLGROUP BY a.Name, b.RegisterIdORDER BY a.Name, b.RegisterId
This" is a massive self-join. Each join trips back to ProduceType, but filters it for one specific type of produce. For every kind of produce they track. Well, presumably every kind- at some point the list of produce in the ProduceType table might change, in which case this query breaks.
This isn't the simplest query to write in SQL, given that SQL really doesn't like it when you dynamically project an arbitrary number of columns, so pretty much any solution is going to be at least a little bit ugly. Still, with a smart use of CASE statements or possibly sub-queries, you can certainly get there. Given all the possible ways to write a query like this, it's impressive to see that the original developer hit upon what may be the worst possible one.
Wesley adds:
[Advertisement] Ensure your software is built only once and then deployed consistently across environments, by packaging your applications and components. Learn how today!I can't stop asking myself: how does someone end up writing a join into a N-way self-join instead of counting cases over a single join? HOW?! Were they challenged to write a loop-switch sequence in SQL? Was CASE broken somehow?
Since there's no revision control for these scripts, I may never know.