I recently presented a way of querying contiguous data ranges using SQL Server. In this post I am going to tackle the same problem, this time in Oracle.
Often we are confronted with problems that seem very easy to start with, but once you get your teeth into them you realise they weren't as easy as you initially thought. In this post, I work with data ranges that are not easily groupable.
Problem Statement: Using the data structure listed below, group the data ranges by dataKey for as long as the dataKey stays the same on consecutive rows. Whenever a break in dataKey is identified, start a new reporting output line.

Query input:
grp | dataRange | dataKey | Comments |
---|---|---|---|
A | 1000 | 1 | |
A | 1001 | 1 | |
A | 1002 | 1 | |
A | 1003 | 1 | |
A | 1004 | 1 | |
A | 1005 | 2 | Notice that dataKey changes here, which breaks the contiguous run |
A | 1006 | 2 | |
A | 1007 | 1 | |
A | 1008 | 1 | |
A | 1009 | 1 | |
A | 1010 | 1 | |
A | 1011 | 1 | |

Desired query output:
dataKey | DataRangeBreakDown |
---|---|
1 | A [1000 .. 1004] |
2 | A [1005 .. 1006] |
1 | A [1007 .. 1011] |
Approach: This is an interesting puzzle because we cannot simply group the data by the “dataKey” column; doing so would ignore the breaks in the data. The key to solving it is to introduce a pseudo-column that does the heavy lifting for us: a counter that starts at 0 and goes up by one every time dataKey differs from the previous row. The table below illustrates the new pseudo-column. Once we can produce this column in SQL, the rest is a simple GROUP BY. Before peeking at the final solution, I would encourage you to have a go; if you get stuck, the final solution is presented later in the post.
grp | dataRange | dataKey | New Column |
---|---|---|---|
A | 1000 | 1 | 0 |
A | 1001 | 1 | 0 |
A | 1002 | 1 | 0 |
A | 1003 | 1 | 0 |
A | 1004 | 1 | 0 |
A | 1005 | 2 | 1 |
A | 1006 | 2 | 1 |
A | 1007 | 1 | 2 |
A | 1008 | 1 | 2 |
A | 1009 | 1 | 2 |
A | 1010 | 1 | 2 |
A | 1011 | 1 | 2 |
Setup Data:
-- Recreate the sample table with numeric dataRange values
DROP TABLE tmpDataRange;
CREATE TABLE tmpDataRange AS
SELECT 'A' grp, 1000 AS dataRange, 1 AS dataKey FROM Dual UNION ALL
SELECT 'A', 1001, 1 FROM Dual UNION ALL
SELECT 'A', 1002, 1 FROM Dual UNION ALL
SELECT 'A', 1003, 1 FROM Dual UNION ALL
SELECT 'A', 1004, 1 FROM Dual UNION ALL
SELECT 'A', 1005, 2 FROM Dual UNION ALL
SELECT 'A', 1006, 2 FROM Dual UNION ALL
SELECT 'A', 1007, 1 FROM Dual UNION ALL
SELECT 'A', 1008, 1 FROM Dual UNION ALL
SELECT 'A', 1009, 1 FROM Dual UNION ALL
SELECT 'A', 1010, 1 FROM Dual UNION ALL
SELECT 'A', 1011, 1 FROM Dual;
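On a fresh schema the DROP TABLE statement above raises ORA-00942 (table or view does not exist), because there is nothing to drop yet. A minimal sketch of a guard, assuming you run it from a client such as SQL*Plus or SQL Developer where a lone `/` executes the PL/SQL block:

```sql
-- Drop tmpDataRange only if it exists; re-raise any other error.
BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE tmpDataRange';
EXCEPTION
    WHEN OTHERS THEN
        IF SQLCODE != -942 THEN  -- -942 is ORA-00942: table or view does not exist
            RAISE;
        END IF;
END;
/
```

This block can stand in for the plain DROP TABLE line; the CREATE TABLE statement is unchanged.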
Solution:
Welcome to the solution. I have tackled the problem with the approach above, using Common Table Expressions (CTEs). I think CTEs are fantastic even for everyday garden-variety queries, because they let a query be written the way we digest other information, i.e. top to bottom, which makes it much easier to follow.

Overview of solution query:
Data0 – Instantiate the problem statement input data
Data1 – Fetch the previous row's dataKey (ordered by dataRange) using LAG
Data2 – Flag each row with 1 where dataKey differs from the previous row, otherwise 0
Data3 – This is where the “magic” happens – Build a cumulative data column (a running sum of the flags) which increments on every dataKey break
Data4 – Is concerned with grouping and presenting the data in the format requested

Presented below is the final solution:
WITH data0 AS (
    -- Input data
    SELECT grp, dataRange, dataKey FROM tmpDataRange
), data1 AS (
    -- Previous row's dataKey (NULL on the first row)
    SELECT grp, dataRange, dataKey,
           LAG(dataKey) OVER (ORDER BY dataRange ASC) lagDataKey
    FROM data0
), data2 AS (
    -- 1 where dataKey changes from the previous row, otherwise 0
    SELECT grp, dataRange, dataKey,
           CASE WHEN dataKey = NVL(lagDataKey, dataKey) THEN 0 ELSE 1 END cumu
    FROM data1
), data3 AS (
    -- Running total of the flags: one value per contiguous run
    SELECT grp, dataRange, dataKey,
           SUM(cumu) OVER (ORDER BY dataRange) cumulativeWindow
    FROM data2
), data4 AS (
    -- One output line per run
    SELECT cumulativeWindow, dataKey,
           MAX(grp) || ' [' || TO_CHAR(MIN(dataRange)) || ' .. ' || TO_CHAR(MAX(dataRange)) || ']' DataRangeBreakDown
    FROM data3
    GROUP BY cumulativeWindow, dataKey
)
SELECT dataKey, DataRangeBreakDown
FROM data4
ORDER BY cumulativeWindow;  -- without this the output order is not guaranteed
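To see how the pseudo-column from the Approach section comes about, it can help to look at the intermediate values row by row, before any grouping. The query below is a sketch that reuses the same LAG and running SUM expressions as the solution; the names flagged, breaks, isBreak and newColumn are mine, chosen for illustration only.

```sql
-- Row-level view of the intermediate values (no GROUP BY).
WITH flagged AS (
    SELECT grp, dataRange, dataKey,
           LAG(dataKey) OVER (ORDER BY dataRange) AS lagDataKey
    FROM   tmpDataRange
), breaks AS (
    SELECT grp, dataRange, dataKey, lagDataKey,
           -- 1 on the first row of each new run of dataKey, otherwise 0
           CASE WHEN dataKey = NVL(lagDataKey, dataKey) THEN 0 ELSE 1 END AS isBreak
    FROM   flagged
)
SELECT grp, dataRange, dataKey, lagDataKey, isBreak,
       -- running total of the break flags: the "New Column" from the Approach table
       SUM(isBreak) OVER (ORDER BY dataRange) AS newColumn
FROM   breaks
ORDER  BY dataRange;
```

With the numeric setup data, newColumn is 0 for 1000 to 1004, 1 for 1005 and 1006, and 2 for 1007 to 1011, which matches the table in the Approach section.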
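Variations: One of the great things about this solution is that it does not depend on the data type of the dataRange column, as long as the values can be ordered and passed to TO_CHAR. For example, change the dataRange column to DATE as shown below and the same query still works; the range bounds are then rendered using the session's NLS_DATE_FORMAT.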
-- Recreate the sample table with DATE dataRange values
DROP TABLE tmpDataRange;
CREATE TABLE tmpDataRange AS
SELECT 'A' grp, SYSDATE AS dataRange, 1 AS dataKey FROM Dual UNION ALL
SELECT 'A', SYSDATE+1, 1 FROM Dual UNION ALL
SELECT 'A', SYSDATE+2, 1 FROM Dual UNION ALL
SELECT 'A', SYSDATE+3, 1 FROM Dual UNION ALL
SELECT 'A', SYSDATE+4, 1 FROM Dual UNION ALL
SELECT 'A', SYSDATE+5, 2 FROM Dual UNION ALL
SELECT 'A', SYSDATE+6, 2 FROM Dual UNION ALL
SELECT 'A', SYSDATE+7, 1 FROM Dual UNION ALL
SELECT 'A', SYSDATE+8, 1 FROM Dual UNION ALL
SELECT 'A', SYSDATE+9, 1 FROM Dual UNION ALL
SELECT 'A', SYSDATE+10, 1 FROM Dual UNION ALL
SELECT 'A', SYSDATE+11, 1 FROM Dual;
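Another variation worth sketching is input that contains more than one grp value. The solution query orders every row by dataRange regardless of grp, so rows from different groups would be compared against each other. Assuming each grp should be handled independently (my reading of the problem statement; the sample data only contains group A), adding PARTITION BY grp to both window functions and grp to the GROUP BY keeps the groups apart:

```sql
-- Same technique as the solution, evaluated separately for each grp.
WITH data1 AS (
    SELECT grp, dataRange, dataKey,
           LAG(dataKey) OVER (PARTITION BY grp ORDER BY dataRange) AS lagDataKey
    FROM   tmpDataRange
), data2 AS (
    SELECT grp, dataRange, dataKey,
           -- 1 where dataKey changes within the same grp, otherwise 0
           CASE WHEN dataKey = NVL(lagDataKey, dataKey) THEN 0 ELSE 1 END AS cumu
    FROM   data1
), data3 AS (
    SELECT grp, dataRange, dataKey,
           -- the counter restarts at 0 for every grp
           SUM(cumu) OVER (PARTITION BY grp ORDER BY dataRange) AS cumulativeWindow
    FROM   data2
)
SELECT dataKey,
       grp || ' [' || TO_CHAR(MIN(dataRange)) || ' .. ' || TO_CHAR(MAX(dataRange)) || ']' AS DataRangeBreakDown
FROM   data3
GROUP  BY grp, cumulativeWindow, dataKey
ORDER  BY grp, MIN(dataRange);
```

With only group A in the table this returns the same three lines as the original query. Because grp is now part of the GROUP BY, the MAX(grp) workaround is no longer needed.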
I would love to hear your thoughts on this. If you have a better solution, why not share it with everyone?