Some of the most popular courses we teach are the ones dealing with writing SQL queries against SQL Server. Recently I had some students ask how, for each order, can I see the number of days since the customer’s most recent order, as well as the number of days since the customer’s first order. Actually, these students were working in the healthcare industry, and they wanted to know the number of days since a patient’s most recent appointment and the number of days since the same patient’s first appointment, but we’ll adopt the question to order data to make it easy to practice with the many variations of SQL Server sample databases. How to solve the problem? SQL Server Window Functions to the rescue.
SQL Server had some limited window functions in SQL 2005 and 2008, but the capability was enhanced in SQL 2012. I’ll show you the queries against Microsoft’s AdventureWorks2012 database, but many sample databases would work.
Let’s start with the table structure. We only need three columns. The Sales.SalesOrderHeader table has what we need: SalesOrderID, CustomerID, and OrderDate. The basic query without any windows function:
SELECT CustomerID
, SalesOrderID
, OrderDate
FROM Sales.SalesOrderHeader
ORDER BY CustomeID, OrderDate DESC;
The first three rows from the result set:
Number of days since the customer’s most recent order.
CustomerID |
SalesOrderID |
OrderDate |
11000 |
43793 |
2005-07-22 00:00:00.000 |
11000 |
51522 |
2007-07-22 00:00:00.000 |
11000 |
57418 |
2007-11-04 00:00:00.000 |
What a window function does in a result set is, for a given row, let you compare or aggregate values from other rows from within the same result set. Let’s start by adding a fourth column to the result set that shows the first OrderDate for each customer for every row.
SELECT CustomerID
, SalesOrderID
, OrderDate
, FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS FirstOrderDate
FROM Sales.SalesOrderHeader
ORDER BY CustomeID, OrderDate DESC;
The first three rows from the result set:
Number of days since the customer’s most recent order and first order date
CustomerID |
SalesOrderID |
OrderDate |
FirstOrderDate |
11000 |
43793 |
2005-07-22 00:00:00.000 |
2005-07-22 00:00:00.000 |
11000 |
51522 |
2007-07-22 00:00:00.000 |
2005-07-22 00:00:00.000 |
11000 |
57418 |
2007-11-04 00:00:00.000 |
2005-07-22 00:00:00.000 |
So the fourth column now shows the first OrderDate for each customer. But how does it work?
The syntax for a window function definition goes something like this:
Window_Function() OVER (how to temporarily subdivide and order rows in result set)
So in our example, the window function is FIRST_VALUE(OrderDate), that is, get the first OrderDate in the group. Here it is in all its glory.
FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS FirstOrderDate
The OVER keyword is just that, a mandatory keyword. The parentheses are required after the OVER(). Inside the parentheses, you explain to SQL Server how to order and group the rows in the result set to which the window function is applied.
The PARTITION BY clause in the window function is optional, but is used to create groups for the rows in the result set temporarily. Think of it as a temporary GROUP BY clause for the result set. When PARTITION BY is included, the window function will basically “reset” for each group. In our case, it “resets” for each changed CustomerID.
We add to the OVER clause the ORDER BY statement. Again, in a temporary fashion, each of the rows in each of the groups is put into a specific order for the window function. So for each customer, we are going to report on the first order date placed by each individual customer. Note that the ORDER BY clause of the window function can be completely different from the ORDER BY clause for the entire query. The PARTITION BY and ORDER BY clauses inside the OVER() function are temporary in nature and only apply to the result set while the results of the window function are being calculated.
However, we don’t want the raw dates, we want the number of days between the first order and the current order. So we need to apply a DATEDIFF function.
SELECT CustomerID
, SalesOrderID
, OrderDate
, DATEDIFF(DAY,
FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate),
OrderDate) AS DaysFromFirstOrder
FROM Sales.SalesOrderHeader
ORDER BY CustomeID, OrderDate DESC;
The first three rows of the result:
Number of days since the customer’s most recent order, as well as the number of days since the customer’s first order.
CustomerID |
SalesOrderID |
OrderDate |
DaysFromFirstOrder |
11000 |
43793 |
2005-07-22 00:00:00.000 |
0 |
11000 |
51522 |
2007-07-22 00:00:00.000 |
730 |
11000 |
57418 |
2007-11-04 00:00:00.000 |
835 |
Just about there! The last requirement is the number of days between the current order and the most recent order for each customer. We’ll have to add a second window function to the select statement of our query. Point is, you can have multiple window functions in a single query.
The window function we need is LAG(column [, offset] [, default]). We want to LAG on the OrderDate column by 1 row. If there is no previous row, allow the result to be NULL. Here’s how it looks.
LAG(OrderDate, 1) OVER (PARTITION BY CustomerID ORDER BY OrderDate)
Here’s how the final query looks:
SELECT CustomerID
, SalesOrderID
, OrderDate
, DATEDIFF(DAY,
FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate),
OrderDate) AS DaysFromFirstOrder
, COALESCE(
DATEDIFF(DAY,
LAG(OrderDate, 1) OVER (PARTITION BY CustomerID ORDER BY OrderDate),
OrderDate)
,0) AS DaysFromPrevOrder
FROM Sales.SalesOrderHeader
ORDER BY CustomerID, OrderDate;
And the final result:
Number of days since the customer’s most recent order, as well as the number of days since the customer’s first order. Also, Days from previous order.
CustomerID |
SalesOrderID |
OrderDate |
DaysFromFirstOrder |
DaysFromPrevOrder |
11000 |
43793 |
2005-07-22 00:00:00.000 |
0 |
0 |
11000 |
51522 |
2007-07-22 00:00:00.000 |
730 |
730 |
11000 |
57418 |
2007-11-04 00:00:00.000 |
835 |
105 |
If you are interested in learning about these and many other powerful ways to write queries against data in SQL Server, come and take a class with us.