CodeSOD: Join Us in this Query

by Remy Porter, from The Daily WTF (#6WBF1)

Today's anonymous submitter worked for a "large, US-based, e-commerce company." This particular company was, some time back, looking to save money, and like so many companies do, that meant hiring offshore contractors.

Now, I want to stress, there's certainly nothing magical about national borders which turns software engineers into incompetents. The reality is simply that contractors never have their client's best interests at heart; they only want to be good enough to complete their contract. This gets multiplied by the contracting firm's desire to maximize their profits by keeping their contractors as booked as possible. And it gets further multiplied by the remoteness and siloing of the interaction, especially across timezones. Often, the customer sends out requirements, and three months later gets a finished feature, with no more contact than that, and it never goes well.

All that said, let's look at some SQL Server code. It's long, so we'll take it in chunks.

-- ===============================================================================
-- Author : Ignacius Ignoramus
-- Create date: 04-12-2020
-- Description:SP of Getting Discrepancy of Allocation Reconciliation Snapshot
-- ===============================================================================

That the comment reinforces that this is an "SP", aka stored procedure, is already not my favorite thing to see. The description is certainly made up of words, and I think I get the gist.

ALTER PROCEDURE [dbo].[Discrepency](@startDate DATETIME,@endDate DATETIME)
AS
BEGIN

Nothing really to see here; it's easy to see that we're going to run a query for a date range. That's fine and common.

DECLARE @tblReturn TABLE(intOrderItemId INT)

Hmm. T-SQL lets you define table variables, which are exactly what they sound like. It's a local variable in this procedure, that acts like a table. You can insert/update/delete/query it. The vague name is a little sketch, and the fact that it holds only one field also makes me go "hmmm", but this isn't bad.

DECLARE @tblReturn1 TABLE(intOrderItemId INT)

Uh oh.

DECLARE @tblReturn2 TABLE(intOrderItemId INT)

Oh no.

DECLARE @tblReturn3 TABLE(intOrderItemId INT)

Oh no no no.

DECLARE @tblReturn4 TABLE(intOrderItemId INT)

This doesn't bode well.

So they've declared five table variables, @tblReturn through @tblReturn4, that all hold the same data structure.

What happens next? This next block is gonna be long.

INSERT INTO @tblReturn --(intOrderItemId) VALUES (@_ordersToBeAllocated)
/* OrderItemsPlaced */
select intOrderItemId
from CompanyDatabase..Orders o
inner join CompanyDatabase..OrderItems oi on oi.intOrderId = o.intOrderId
where o.dtmTimeStamp between @startDate and @endDate
AND intOrderItemId Not In (
    /* _itemsOnBackorder */
    select intOrderItemId
    from CompanyDatabase..OrderItems oi
    inner join CompanyDatabase..Orders o on o.intOrderId = oi.intOrderId
    where o.dtmTimeStamp between @startDate and @endDate
    and oi.strstatus='backordered'
)
AND intOrderItemId Not In (
    /* _itemsOnHold */
    select intOrderItemId
    from CompanyDatabase..OrderItems oi
    inner join CompanyDatabase..Orders o on o.intOrderId = oi.intOrderId
    where o.dtmTimeStamp between @startDate and @endDate
    and o.strstatus='ONHOLD'
    and oi.strStatus <> 'BACKORDERED'
)
AND intOrderItemId Not In (
    /* _itemsOnReview */
    select intOrderItemId
    from CompanyDatabase..OrderItems oi
    inner join CompanyDatabase..Orders o on o.intOrderId = oi.intOrderId
    where o.dtmTimeStamp between @startDate and @endDate
    and o.strstatus='REVIEW'
    and oi.strStatus <> 'BACKORDERED'
)
AND intOrderItemId Not In (
    /*_itemsOnPending*/
    select intOrderItemId
    from CompanyDatabase..OrderItems oi
    inner join CompanyDatabase..Orders o on o.intOrderId = oi.intOrderId
    where o.dtmTimeStamp between @startDate and @endDate
    and o.strstatus='PENDING'
    and oi.strStatus <> 'BACKORDERED'
)
AND intOrderItemId Not In (
    /*_itemsCancelled */
    select intOrderItemId
    from CompanyDatabase..OrderItems oi
    inner join CompanyDatabase..Orders o on o.intOrderId = oi.intOrderId
    where o.dtmTimeStamp between @startDate and @endDate
    and oi.strstatus='CANCELLED'
)

We insert into @tblReturn the result of a query, and this query relies heavily on a big pile of subqueries to decide if a record should be included in the output. But these subqueries all query the same tables as the root query. I'm fairly certain this could be a simple join with a pretty readable where clause, but I'm also not going to sit here and rewrite it right now; we've got a lot more query to look at.
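For anyone who does want to see it, here is a rough sketch of what "a simple join with a readable where clause" could look like. It is not the submitter's fix. It runs against Python's built-in sqlite3 module rather than SQL Server, the two tables are cut-down, hypothetical versions of Orders and OrderItems with invented sample rows, and the three order-status subqueries are folded into one to save space. It also assumes the status columns are never NULL and are stored in one consistent case; with NULL statuses the two forms would not agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, cut-down versions of Orders and OrderItems.
CREATE TABLE Orders (
    intOrderId   INTEGER PRIMARY KEY,
    strStatus    TEXT NOT NULL,
    dtmTimeStamp TEXT NOT NULL
);
CREATE TABLE OrderItems (
    intOrderItemId INTEGER PRIMARY KEY,
    intOrderId     INTEGER NOT NULL,
    strStatus      TEXT NOT NULL
);
-- Order 5 falls outside the date range queried below.
INSERT INTO Orders VALUES
    (1, 'NEW',        '2020-04-02'),
    (2, 'ONHOLD',     '2020-04-03'),
    (3, 'REVIEW',     '2020-04-04'),
    (4, 'PENDING',    '2020-04-05'),
    (5, 'NEW',        '2020-03-01'),
    (6, 'PROCESSING', '2020-04-06');
INSERT INTO OrderItems VALUES
    (10, 1, 'NEW'),
    (11, 1, 'BACKORDERED'),
    (12, 1, 'CANCELLED'),
    (13, 2, 'NEW'),
    (14, 3, 'NEW'),
    (15, 4, 'BACKORDERED'),
    (16, 5, 'NEW'),
    (17, 6, 'NEW');
""")

# The shape of the original: re-query the same two tables inside NOT IN.
SUBQUERY_PILE = """
SELECT oi.intOrderItemId
FROM Orders o
JOIN OrderItems oi ON oi.intOrderId = o.intOrderId
WHERE o.dtmTimeStamp BETWEEN :start_date AND :end_date
  AND oi.intOrderItemId NOT IN (
        SELECT oi.intOrderItemId FROM OrderItems oi
        JOIN Orders o ON o.intOrderId = oi.intOrderId
        WHERE o.dtmTimeStamp BETWEEN :start_date AND :end_date
          AND oi.strStatus = 'BACKORDERED')
  AND oi.intOrderItemId NOT IN (
        SELECT oi.intOrderItemId FROM OrderItems oi
        JOIN Orders o ON o.intOrderId = oi.intOrderId
        WHERE o.dtmTimeStamp BETWEEN :start_date AND :end_date
          AND o.strStatus IN ('ONHOLD', 'REVIEW', 'PENDING')
          AND oi.strStatus <> 'BACKORDERED')
  AND oi.intOrderItemId NOT IN (
        SELECT oi.intOrderItemId FROM OrderItems oi
        JOIN Orders o ON o.intOrderId = oi.intOrderId
        WHERE o.dtmTimeStamp BETWEEN :start_date AND :end_date
          AND oi.strStatus = 'CANCELLED')
"""

# The same filter expressed once, on the rows the join already produced.
SINGLE_JOIN = """
SELECT oi.intOrderItemId
FROM Orders o
JOIN OrderItems oi ON oi.intOrderId = o.intOrderId
WHERE o.dtmTimeStamp BETWEEN :start_date AND :end_date
  AND oi.strStatus NOT IN ('BACKORDERED', 'CANCELLED')
  AND o.strStatus  NOT IN ('ONHOLD', 'REVIEW', 'PENDING')
"""

params = {"start_date": "2020-04-01", "end_date": "2020-04-30"}
old_ids = sorted(row[0] for row in conn.execute(SUBQUERY_PILE, params))
new_ids = sorted(row[0] for row in conn.execute(SINGLE_JOIN, params))

# Both forms keep the same order items on this sample data.
assert old_ids == new_ids
print(old_ids, new_ids)
```

The `oi.strStatus <> 'BACKORDERED'` conditions disappear in the second form because backordered items are already excluded by their own clause, so repeating the check changes nothing.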

INSERT INTO @tblReturn1
/* _backOrderItemsReleased */
select intOrderItemId
from CompanyDatabase..OrderItems oi
inner join CompanyDatabase..orders o on o.intorderid = oi.intorderid
where oi.intOrderItemid in (
    select intRecordID
    from CompanyDatabase..StatusChangeLog
    where strRecordType = 'OrderItem'
    and strOldStatus in ('BACKORDERED')
    and strNewStatus in ('NEW', 'RECYCLED')
    and dtmTimeStamp between @startDate and @endDate
)
and o.dtmTimeStamp < @startDate
UNION
(
/*_pendingHoldItemsReleased*/
select intOrderItemId
from CompanyDatabase..OrderItems oi
inner join CompanyDatabase..orders o on o.intorderid = oi.intorderid
where oi.intOrderID in (
    select intRecordID
    from CompanyDatabase..StatusChangeLog
    where strRecordType = 'Order'
    and strOldStatus in ('REVIEW', 'ONHOLD', 'PENDING')
    and strNewStatus in ('NEW', 'PROCESSING')
    and dtmTimeStamp between @startDate and @endDate
)
and o.dtmTimeStamp < @startDate
)
UNION
/* _reallocationsowingtonostock */
(
select oi.intOrderItemID
from CompanyDatabase.dbo.StatusChangeLog
inner join CompanyDatabase.dbo.OrderItems oi on oi.intOrderItemID = CompanyDatabase.dbo.StatusChangeLog.intRecordID
inner join CompanyDatabase.dbo.Orders o on o.intOrderId = oi.intOrderId
where strOldStatus = 'RECYCLED'
and strNewStatus = 'ALLOCATED'
and CompanyDatabase.dbo.StatusChangeLog.dtmTimestamp > @endDate
and strRecordType = 'OrderItem'
and intRecordId in (
    select intRecordId
    from CompanyDatabase.dbo.StatusChangeLog
    where strOldStatus = 'ALLOCATED'
    and strNewStatus = 'RECYCLED'
    and strRecordType = 'OrderItem'
    and CompanyDatabase.dbo.StatusChangeLog.dtmTimestamp between @startDate and @endDate
)
)

Okay, just some unions with more subquery filtering. More of the same. It's the next one that makes this special.

INSERT INTO @tblReturn2
SELECT intOrderItemId FROM @tblReturn
UNION
SELECT intOrderItemId FROM @tblReturn1

Ah, here's the stuff. This is just bonkers. If the goal is to combine the results of these queries into a single table, you could just insert into one table the whole time.
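To make the "one table the whole time" point concrete, here is a small sketch, again using Python's sqlite3 module as a stand-in for SQL Server. SQLite has no table variables, so temp tables play that role. PlacedItems, ReleasedItems and tblCandidates are invented names, with the first two standing in for the results of the two big queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the results of the two source queries.
CREATE TABLE PlacedItems   (intOrderItemId INTEGER);
CREATE TABLE ReleasedItems (intOrderItemId INTEGER);
INSERT INTO PlacedItems   VALUES (1), (2), (3);
INSERT INTO ReleasedItems VALUES (3), (4);

-- The procedure's approach: one table per query, plus a third to merge them.
CREATE TEMP TABLE tblReturn  (intOrderItemId INTEGER);
CREATE TEMP TABLE tblReturn1 (intOrderItemId INTEGER);
CREATE TEMP TABLE tblReturn2 (intOrderItemId INTEGER);
INSERT INTO tblReturn  SELECT intOrderItemId FROM PlacedItems;
INSERT INTO tblReturn1 SELECT intOrderItemId FROM ReleasedItems;
INSERT INTO tblReturn2
    SELECT intOrderItemId FROM tblReturn
    UNION
    SELECT intOrderItemId FROM tblReturn1;

-- One table, one statement: UNION the source queries directly.
CREATE TEMP TABLE tblCandidates (intOrderItemId INTEGER);
INSERT INTO tblCandidates
    SELECT intOrderItemId FROM PlacedItems
    UNION
    SELECT intOrderItemId FROM ReleasedItems;
""")

def ids(table):
    # Table names are fixed literals in this sketch, never user input.
    return sorted(row[0] for row in conn.execute(
        f"SELECT intOrderItemId FROM {table}"))

shuffled = ids("tblReturn2")
direct = ids("tblCandidates")

# Same rows either way; the two intermediate tables added nothing.
assert shuffled == direct
print(shuffled, direct)
```

UNION removes the duplicate id in both versions, so the single-table form ends up holding the same rows as @tblReturn2 does in the procedure.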

But we know that there are 5 of these tables, so why are we only going through the first two to combine them at this point?

INSERT INTO @tblReturn3
/* _factoryAllocation*/
select oi.intOrderItemId
from CompanyDatabase..Shipments s
inner join CompanyDatabase..ShipmentItems si on si.intShipmentID = s.intShipmentID
inner join Common.CompanyDatabase.Stores stores on stores.intStoreID = s.intLocationID
inner join CompanyDatabase..OrderItems oi on oi.intOrderItemId = si.intOrderItemId
inner join CompanyDatabase..Orders o on o.intOrderId = s.intOrderId
where s.dtmTimestamp >= @endDate
and stores.strLocationType = 'FACTORY'
UNION
(
/*_storeAllocations*/
select oi.intOrderItemId
from CompanyDatabase..Shipments s
inner join CompanyDatabase..ShipmentItems si on si.intShipmentID = s.intShipmentID
inner join Common.CompanyDatabase.Stores stores on stores.intStoreID = s.intLocationID
inner join CompanyDatabase..OrderItems oi on oi.intOrderItemId = si.intOrderItemId
inner join CompanyDatabase..Orders o on o.intOrderId = s.intOrderId
where s.dtmTimestamp >= @endDate
and stores.strLocationType <> 'FACTORY'
)
UNION
(
/* _ordersWithAllocationProblems */
select oi.intOrderItemId
from CompanyDatabase.dbo.StatusChangeLog
inner join CompanyDatabase.dbo.OrderItems oi on oi.intOrderItemID = CompanyDatabase.dbo.StatusChangeLog.intRecordID
inner join CompanyDatabase.dbo.Orders o on o.intOrderId = oi.intOrderId
where strRecordType = 'orderitem'
and strNewStatus = 'PROBLEM'
and strOldStatus = 'NEW'
and CompanyDatabase.dbo.StatusChangeLog.dtmTimestamp > @endDate
and o.dtmTimestamp < @endDate
)

Okay, @tblReturn3 is more of the same. Nothing more to really add.

INSERT INTO @tblReturn4
SELECT intOrderItemId
FROM @tblReturn2
WHERE intOrderItemId NOT IN (SELECT intOrderItemId FROM @tblReturn3)

Ooh, but here we see something a bit different: we're taking the set difference between @tblReturn2 and @tblReturn3. This would almost make sense if there weren't already set operations in T-SQL which would handle all of this.
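The set operation in question is EXCEPT, which T-SQL supports. As a hedged illustration, the sketch below compares it with NOT IN using Python's sqlite3 module, which spells EXCEPT the same way. Plain tables stand in for the @tblReturn2 and @tblReturn3 table variables, and the ids are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Plain tables standing in for the @tblReturn2 / @tblReturn3 table variables.
CREATE TABLE tblReturn2 (intOrderItemId INTEGER);
CREATE TABLE tblReturn3 (intOrderItemId INTEGER);
INSERT INTO tblReturn2 VALUES (1), (2), (3), (4);
INSERT INTO tblReturn3 VALUES (2), (4), (5);
""")

NOT_IN = """
SELECT intOrderItemId FROM tblReturn2
WHERE intOrderItemId NOT IN (SELECT intOrderItemId FROM tblReturn3)
"""

EXCEPT = """
SELECT intOrderItemId FROM tblReturn2
EXCEPT
SELECT intOrderItemId FROM tblReturn3
"""

def ids(sql):
    return sorted(row[0] for row in conn.execute(sql))

# With no NULLs and no duplicates involved, both give the set difference.
not_in_ids = ids(NOT_IN)
except_ids = ids(EXCEPT)
assert not_in_ids == except_ids

# One NULL on the right-hand side and NOT IN matches nothing, because
# "x NOT IN (..., NULL)" is never true. EXCEPT is unaffected.
conn.execute("INSERT INTO tblReturn3 VALUES (NULL)")
not_in_with_null = ids(NOT_IN)
except_with_null = ids(EXCEPT)
print(not_in_with_null, except_with_null)
```

The two are not perfectly interchangeable: EXCEPT also removes duplicate rows from its result, which is harmless here because the ids only ever feed another IN filter.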

Which brings us, finally, to the last query in the whole thing:

SELECT
    o.intOrderId,
    oi.intOrderItemId,
    o.dtmDate,
    oi.strDescription,
    o.strFirstName + o.strLastName AS 'Name',
    o.strEmail,
    o.strBillingCountry,
    o.strShippingCountry
FROM CompanyDatabase.dbo.OrderItems oi
INNER JOIN CompanyDatabase.dbo.Orders o on o.intOrderId = oi.intOrderId
WHERE oi.intOrderItemId IN (SELECT intOrderItemId FROM @tblReturn4)
END

At the end of all this, I've determined a few things.

First, the developer responsible didn't understand table variables. Second, they definitely didn't understand joins. Third, they had no sense of the overall workflow of this query and just sorta fumbled through until they got results that the client said were okay.

And somehow, this pile of trash made it through a code review by internal architects and got deployed to production, where it promptly became the worst performing query in their application. Correction: the worst performing query thus far.
