How can I use an SQL Pivot for this?

By : Peter M
Source: Stackoverflow.com
Question!

I have a data set that is organized in the following manner:

Timestamp|A0001|A0002|A0003|A0004|B0001|B0002|B0003|B0004 ...
---------+-----+-----+-----+-----+-----+-----+-----+-----
2008-1-1 |  1  |  2  | 10  |   6 |  20 |  35 | 300 |  8
2008-1-2 |  5  |  2  |  9  |   3 |  50 |  38 | 290 |  2    
2008-1-4 |  7  |  7  | 11  |   0 |  30 |  87 | 350 |  0
2008-1-5 |  1  |  9  |  1  |   0 |  25 | 100 |  10 |  0
...

Where A0001 is Value A of item #1 and B0001 is Value B of item #1. There can be over 60 different items in a table, and each item has an A value column and a B value column, meaning a total of over 120 columns in the table.
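For anyone who wants to try the queries on this page, here is a minimal setup sketch that loads the four sample rows shown above. The table name ptest is borrowed from Brettski's answer; the datetime and int column types are assumptions, since the question does not state them. The multi-row insert uses SELECT ... UNION ALL because SQL Server 2005 has no multi-row VALUES clause.

```sql
-- Assumed schema: column types are a guess, only the names come from the question
CREATE TABLE ptest (
    [Timestamp] datetime NOT NULL,
    A0001 int, A0002 int, A0003 int, A0004 int,
    B0001 int, B0002 int, B0003 int, B0004 int
)

-- Sample rows copied from the table in the question
INSERT INTO ptest ([Timestamp], A0001, A0002, A0003, A0004, B0001, B0002, B0003, B0004)
SELECT '20080101', 1, 2, 10, 6, 20,  35, 300, 8 UNION ALL
SELECT '20080102', 5, 2,  9, 3, 50,  38, 290, 2 UNION ALL
SELECT '20080104', 7, 7, 11, 0, 30,  87, 350, 0 UNION ALL
SELECT '20080105', 1, 9,  1, 0, 25, 100,  10, 0
```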

Where I want to get to is a 3 column result (Item index, A Value, B Value) that sums the A and B values for each item:

Index | A Value | B Value
------+---------+--------
 0001 |   14    |   125
 0002 |   20    |   260
 0003 |   31    |   950
 0004 |    9    |    10
 .... 

As I am going from columns to rows I would expect a pivot in the solution, but I am not sure of how to flesh it out. Part of the issue is how to strip out the A's and B's to form the values for the Index column. The other part is that I have never had to use a Pivot before, so I am stumbling over the basic syntax as well.

I think that ultimately I need to have a multi step solution that first builds the summations as:

ColName | Value
--------+------
A0001   |  14
A0002   |  20
A0003   |  31
A0004   |   9
B0001   | 125
B0002   | 260
B0003   | 950
B0004   |  10

Then modify the ColName data to strip out the index:

ColName | Value | Index | Aspect
--------+-------+-------+-------
A0001   |  14   | 0001  |  A
A0002   |  20   | 0002  |  A
A0003   |  31   | 0003  |  A
A0004   |   9   | 0004  |  A
B0001   | 125   | 0001  |  B
B0002   | 260   | 0002  |  B
B0003   | 950   | 0003  |  B
B0004   |  10   | 0004  |  B

Finally self join to move the B values up next to the A Values.

This seems to be a long-winded process to get what I want. So I am after advice as to whether I am headed down the right path, or whether there is another approach that I have overlooked that will make my life so much easier.
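For reference, the last two steps of that plan (splitting ColName into an index and an aspect, then self joining) might look roughly like the sketch below. The table variable @sums is hypothetical and simply holds the per-column sums listed above; how those sums get produced in the first place is left out.

```sql
-- Hypothetical table variable holding the per-column sums from step 1
DECLARE @sums TABLE (ColName varchar(10), Value int)

INSERT INTO @sums (ColName, Value)
SELECT 'A0001',  14 UNION ALL
SELECT 'A0002',  20 UNION ALL
SELECT 'A0003',  31 UNION ALL
SELECT 'A0004',   9 UNION ALL
SELECT 'B0001', 125 UNION ALL
SELECT 'B0002', 260 UNION ALL
SELECT 'B0003', 950 UNION ALL
SELECT 'B0004',  10

-- Steps 2 and 3: derive the index from ColName, then self join A rows to B rows
SELECT SUBSTRING(a.ColName, 2, 4) AS [Index],
       a.Value AS [A Value],
       b.Value AS [B Value]
FROM @sums a
JOIN @sums b
  ON SUBSTRING(b.ColName, 2, 4) = SUBSTRING(a.ColName, 2, 4)
WHERE LEFT(a.ColName, 1) = 'A'
  AND LEFT(b.ColName, 1) = 'B'
ORDER BY [Index]
```

With the sums above this should give the 3 column result from the start of the question. Note that Index is a reserved word in T-SQL, so it has to be bracketed when used as a column alias.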

Note 1) The solution has to be in T-SQL on MSSQL 2005.

Note 2) The format of the table cannot be changed.

Edit: Another method I have thought about uses UNIONs and individual SUM()s on each column:

SELECT '0001' as [Index], SUM(A0001) as A, SUM(B0001) as B FROM TABLE
UNION
SELECT '0002' as [Index], SUM(A0002) as A, SUM(B0002) as B FROM TABLE
UNION
SELECT '0003' as [Index], SUM(A0003) as A, SUM(B0003) as B FROM TABLE
UNION
SELECT '0004' as [Index], SUM(A0004) as A, SUM(B0004) as B FROM TABLE
UNION
...

But this approach really doesn't look very nice either.

EDIT: So far there are 2 great responses. But I would like to add two more conditions to the query :-)

1) I need to select the rows based on a range of timestamps (minv < timestamp < maxv).

2) I also need to conditionally select rows on a UDF that processes the timestamp.

Using Brettski's table names, would the above translate to:

...
(SELECT A0001, A0002, A0003, B0001, B0002, B0003 
 FROM ptest 
 WHERE timestamp>minv AND timestamp<maxv AND fn(timestamp)=fnv) p
unpivot
(val for item in (A0001, A0002, A0003, B0001, B0002, B0003)) as unpvt
...

Given that I have conditionally added the fn() requirement, I think that I also need to go down the dynamic SQL path as proposed by Jonathon, especially as I have to build the same query for 12 different tables, all of the same style.
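Filled out into a complete statement, the filtered UNPIVOT above might look like the sketch below. It assumes that ptest has a datetime column named Timestamp and that dbo.fn is a scalar UDF returning an int; both are placeholders standing in for the question's minv, maxv, fn and fnv, and the literal values are made up. Scalar UDFs in T-SQL have to be called with a schema-qualified name, hence dbo.fn rather than fn.

```sql
-- Assumptions: [Timestamp] is datetime; dbo.fn is a hypothetical scalar UDF returning int
DECLARE @minv datetime, @maxv datetime, @fnv int
SET @minv = '20080101'
SET @maxv = '20080105'
SET @fnv  = 1

SELECT RIGHT(item, 4) AS item1,
       SUM(CASE WHEN LEFT(item, 1) = 'A' THEN val ELSE 0 END) AS A,
       SUM(CASE WHEN LEFT(item, 1) = 'B' THEN val ELSE 0 END) AS B
FROM
    -- The row filter goes on the inner query, before the columns are unpivoted
    (SELECT A0001, A0002, A0003, B0001, B0002, B0003
     FROM ptest
     WHERE [Timestamp] > @minv
       AND [Timestamp] < @maxv
       AND dbo.fn([Timestamp]) = @fnv) p
UNPIVOT
    (val FOR item IN (A0001, A0002, A0003, B0001, B0002, B0003)) AS unpvt
GROUP BY RIGHT(item, 4)
```

The Timestamp column only needs to appear in the WHERE clause of the inner query; it does not have to be selected, which keeps it out of the UNPIVOT column list.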

By : Peter M


Answers

Same kinda answer here, that was fun:

-- Get column names from system table
DECLARE @phCols NVARCHAR(2000)
SELECT @phCols = COALESCE(@phCols + ',[' + name + ']', '[' + name + ']') 
    FROM syscolumns WHERE id = (select id from sysobjects where name = 'Test' and type='U')

-- Get rid of the column we don't want
SELECT @phCols = REPLACE(@phCols, '[Timestamp],', '')

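-- Query & sum using the dynamic column names
-- (NVARCHAR(MAX) because the column list is embedded twice; with 120+ columns NVARCHAR(2000) would truncate the query)
DECLARE @exec nvarchar(max)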
SELECT @exec =
'
    select
    	SUBSTRING([Value], 2, LEN([Value]) - 1) as [Index],
    	SUM(CASE WHEN (LEFT([Value], 1) = ''A'') THEN Cols ELSE 0 END) as AValue, 
    	SUM(CASE WHEN (LEFT([Value], 1) = ''B'') THEN Cols ELSE 0 END) as BValue
    FROM
    (
    	select *
    	from (select ' + @phCols + ' from Test) as t
    	unpivot (Cols FOR [Value] in (' + @phCols + ')) as p
    ) _temp
    GROUP BY SUBSTRING([Value], 2, LEN([Value]) - 1)
'
EXECUTE(@exec)

You don't need to hard code column names in this one.
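If the timestamp range from the question's edit is needed as well, one possible variation is to pass the bounds as parameters through sp_executesql instead of concatenating them into the string. This is only a sketch under assumptions: the table is dbo.Test with a datetime column named Timestamp, all value columns share one type (UNPIVOT requires that), and the date literals are made up. The names @cols, @sql, colname and val are illustrative.

```sql
DECLARE @cols nvarchar(max), @sql nvarchar(max)

-- Build the bracketed column list, leaving out the Timestamp column up front
SELECT @cols = COALESCE(@cols + ',', '') + QUOTENAME(name)
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.Test')
  AND name <> 'Timestamp'

-- @minv and @maxv stay as parameters inside the dynamic statement
SET @sql = N'
    SELECT SUBSTRING(colname, 2, LEN(colname) - 1) AS [Index],
           SUM(CASE WHEN LEFT(colname, 1) = ''A'' THEN val ELSE 0 END) AS AValue,
           SUM(CASE WHEN LEFT(colname, 1) = ''B'' THEN val ELSE 0 END) AS BValue
    FROM (SELECT ' + @cols + N'
          FROM dbo.Test
          WHERE [Timestamp] > @minv AND [Timestamp] < @maxv) AS t
    UNPIVOT (val FOR colname IN (' + @cols + N')) AS p
    GROUP BY SUBSTRING(colname, 2, LEN(colname) - 1)'

EXEC sp_executesql @sql,
     N'@minv datetime, @maxv datetime',
     @minv = '20080101', @maxv = '20080105'
```

Only the column names are concatenated into the statement, and they go through QUOTENAME; the filter values travel as typed parameters. For the 12 similar tables mentioned in the question, the table name would have to be concatenated in the same way as the column list.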



OK, I have come up with one solution which should get you started. It will probably take some time to put together, but will perform well. It would be nice if we didn't have to list out all the columns by name.

Basically this uses UNPIVOT and places the result into a temp table, then queries that into your final data set. I named my table ptest when I put this together; it is the one with all of the A0001, etc. columns.

-- Create the temp table
CREATE TABLE #s (item nvarchar(10), val int)

-- Insert UNPIVOT product into the temp table
INSERT INTO  #s (item, val)
SELECT item, val
FROM
(SELECT A0001, A0002, A0003, B0001, B0002, B0003
FROM ptest) p
unpivot
(val for item in (A0001, A0002, A0003, B0001, B0002, B0003)) as unpvt

-- Query the temp table to get final data set
SELECT RIGHT(item, 4) as item1,
Sum(CASE WHEN LEFT(item, 1) = 'A' THEN val ELSE 0 END) as A,
Sum(CASE WHEN LEFT(item, 1) = 'B' THEN val ELSE 0 END) as B
from #s
GROUP BY RIGHT(item, 4)

-- Delete temp table 
drop table #s

By the way, thanks for the question, this was the first time I got to use UNPIVOT. Always wanted to, just never had a need.

By : Brettski


I have found that 'transactions' behave very differently on different SQL servers. In some cases, starting a transaction locks all other connections from being able to execute any SQL until the transaction is committed or rolled back (MS SQLServer 6.5). Others don't have any problems, and only lock when there is a modification (oracle). The locks can even expand to encompass only your changes - cell locks / row locks / page locks / table locks.

Typically I use transactions only when data integrity between multiple insert / delete / update statements must be maintained. Even then, I prefer to implement this using DB-defined cascading deletes so that the database does it automatically and atomically.

Use a transaction if you can foresee a situation where you would want to roll back multiple modifications, but otherwise, the database will do its atomic updates without the extra code to deal with it.

By : Kieveli

