星期五, 10月 28, 2011

[Challenge] 累計加總(Runing Total)

Beyond Relational 的 TSQL Challenge 65
  • 資料來源(題目內有資料來源的 script)
Date       CustomerID Type  Amount
---------- ---------- ----- --------
2011-01-01 CUST1001   INV   12000.00
2011-01-02 CUST1001   PAY    3000.00
2011-01-03 CUST1001   INV    8000.00
2011-01-04 CUST1001   PAY    9000.00
2011-01-04 CUST1002   INV    1000.00
2011-01-05 CUST1002   PAY    5000.00
2011-01-05 CUST1002   INV    6000.00
  • 結果
Date       CustomerID Type  Amount   Balance
---------- ---------- ----- -------- ---------
2011-01-01 CUST1001   INV   12000.00  12000.00
2011-01-02 CUST1001   PAY    3000.00   9000.00
2011-01-03 CUST1001   INV    8000.00  17000.00
2011-01-04 CUST1001   PAY    9000.00   8000.00
2011-01-04 CUST1002   INV    1000.00   1000.00
2011-01-05 CUST1002   INV    6000.00   7000.00
2011-01-05 CUST1002   PAY    5000.00   2000.00
  • 規則:
    1. 同一個客戶同一天會有多筆 INV 和 PAY 資料
    2. 同一個客戶同一天內的全部的 INV 資料必須先計算,之後再計算 PAY 資料
    3. 結果必須根據 CustomerID , Date , Type , Amount 來排序
    4. Balance 可以為負


Self - Join 解法

;
WITH CTE AS
(
  SELECT 
          CustomerID,
          Date,
          [Type],
          Amount,
          ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY CustomerID,Date,Type) AS ROWNO -- 產生基準欄位
  FROM TC65
)
SELECT 
       T1.CustomerID,
       T1.Date,
       T1.Type,
       T1.Amount,
       Balance = 
           SUM
               (
                 CASE 
                   WHEN T2.[Type] = 'PAY' THEN T2.Amount * -1
                   ELSE T2.Amount
                 END
               )
FROM CTE AS T1 
      JOIN CTE AS T2 ON T1.CustomerID = T2.CustomerID 
                           AND T2.ROWNO <= T1.ROWNO -- JOIN 時 關鍵語法
GROUP BY T1.CustomerID,T1.Date,T1.Type,T1.Amount
ORDER BY T1.CustomerID, T1.Date, T1.Type, T1.Amount
使用自身連結(Self Join),把 T1 當成基本資料表,T2 當成累計資料表,利用 ROW_NUMBER() 產生基準連結欄位來 JOIN 兩個 Table,JOIN 條件限制為 T2 基準連結欄位不可大餘 T1 基準連擊欄位,結果如下圖,從圖形就可以清楚看出,根據基準連結欄位,產生 T1 基本資料表中每筆對應的累計資料。
;
WITH CTE AS
(
  SELECT
          CustomerID,
          Date,
          [Type],
          Amount,
          ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY CustomerID,Date,Type) AS ROWNO
  FROM TC65
  WHERE CustomerID = 'CUST1001' -- 故意限定在一個客戶就好
)
SELECT
        T1.CustomerID,
        T1.Date,
        T1.Type,
        T1.Amount,
        T1.ROWNO,
        T2.ROWNO, -- 把 T2 的 ROWNO、Type 和 Amount 欄位列出來
        T2.Type,
        T2.Amount
FROM CTE AS T1 
      JOIN CTE AS T2 ON T1.CustomerID = T2.CustomerID 
                           AND T2.ROWNO <= T1.ROWNO -- 關鍵語法
[SQL] 累計加總(Runing Total)

Cursor 解法

DECLARE @Result TABLE ( CustomerID varchar(50) , [Date] datetime , [Type] varchar(3) , Amount money , Balance money)

DECLARE curCustomerID CURSOR
  FOR
      SELECT CustomerID
      FROM TC65
      GROUP BY CustomerID

OPEN curCustomerID

  DECLARE @customerid as varchar(50)
  DECLARE @balance as money

  FETCH NEXT FROM curCustomerID INTO @customerid
    WHILE (@@FETCH_STATUS = 0)
      BEGIN  
   
        SET @balance = 0

        DECLARE curData CURSOR
          FOR
              SELECT [Date] , [Type] , Amount
              FROM TC65
              WHERE CustomerID = @customerid
              ORDER BY [Date] , [Type]

        OPEN curData
    
          DECLARE @date as datetime
          DECLARE @type as varchar(3)
          DECLARE @amount as money
   
          FETCH NEXT FROM curData INTO @date,@type,@amount
            WHILE (@@FETCH_STATUS = 0)
              BEGIN  

                SET @balance = @balance + IIF(@type = 'PAY' , @amount * -1 , @amount )

                INSERT INTO @result (CustomerID,[Date],[Type],Amount,Balance)
                  VALUES(@customerid,@date,@type,@amount,@balance)      

                FETCH NEXT FROM curData INTO @date,@type,@amount
              END

        CLOSE curData
        DEALLOCATE curData

        FETCH NEXT FROM curCustomerID INTO @customerid
      END

CLOSE curCustomerID
DEALLOCATE curCustomerID

SELECT * FROM @result

Window 函數解法

SELECT * , 
  SUM
    (
      CASE 
        WHEN [Type] = 'PAY' THEN Amount * -1 
        ELSE Amount 
      END
    ) 
  OVER 
    (
      PARTITION BY CustomerID 
      ORDER BY [Date] , CASE WHEN [Type] = 'INV' THEN 1 ELSE 2 END 
      ROWS UNBOUNDED PRECEDING -- 2012 新用法
    ) AS Balance
FROM TC65
ORDER BY CustomerID, [Date], [Type], Amount

Techday 2012 - SQL Server 2012 T-SQL 新語法應用實務筆記

課程中,講師有提到 Runing Total 議題,是少數建議使用 Cursor 來處理的問題,利用 Self Join 或 SubQuery 的話,會有效能問題。

30 VS 10萬筆資料測試:Cursor、Self Join、SubQuery 和 Window 函數跑30筆時差不多,但資料筆數增加時 Cursor 所需時間會呈線性成長,Self Join 和 SubQuery 所需時間會呈指數倍增,10萬筆資料實測結果如下:
  1. Cursor => 3 秒
  2. SubQuery => 18 秒 
  3. Self-Join => 跑不完
  4. Window 函數 => 小於 1 秒
建議 2012 請用 Window 函數、2012 之前版本請用 Cursor 處理 Runing Total。
    優缺點比較

程式碼
關聯子查詢
Window 函數
優點
         通常用在其他程式語言的解法
         遵循 SQL 以集合為基礎的概念
         最佳化
         容易寫
         符合 SQL 以集合為基礎
缺點
         不以集合為基礎
         SQL 引擎只能對一句句傳入的語法最佳化,沒辦法視整個需求為單一操作
         多行程式碼較高bugs
         較差的效能,因為執行計畫是以 N2 複雜度計算
         不好寫
         並非適合所有狀況
         不是所有的查詢都很容易透過 window 函數重寫

沒有留言:

張貼留言