天天看點

SQL自定義函數 字元串截取 遞歸查詢 以及一個小例子背景1. Table_UFName2. Table_UFLocation3. Table_AllUFLocation4. Table_AllUFFile總結

背景

最近遇到一個需求,抽出與本文最相關的,概括大緻如下:

  1. 有三個表,描述某個File Share的檔案拓撲:
    • User表,記錄file owner name(domain\user_name),唯一主鍵UserId
    • Location表,記錄folder的父子關系,唯一主鍵LocationId
    • File表,記錄檔案metadata
  2. 有些folder以某些user_name為檔案夾名,稱此類檔案夾為user folder
  3. 若某user folder下有子檔案夾,子檔案夾也屬于此user folder
  4. 找出所有user folder下的所有file

解決邏輯:

  1. 找出所有的user_name(Table_UFName)
  2. 根據user name清單,找出對應的user folder(Table_UFLocation)
  3. 遞歸找出所有的user folder(包括所有的子檔案夾,Table_AllUFLocation)
  4. 找到所有user folder下的所有file(Table_AllUFFile)

找個自己拿手的語言,把資料從表中取出來,然後照上面的邏輯來做,這樣比較簡單,但為了學習和練手,就拿Transact-SQL來寫實作。上面四步中,每步的輸出都是一個table。

START

1. Table_UFName

對于此表,首先,需要寫個String Split方法,能夠截取出來“\”後面的user_name;然後對于User表中每個user,都執行該Split方法,結果寫入此表。

自定義SQL function

我們需要建立一個新的自定義SQL Function。SQL function細節很多,此處不展開,有需求看前連結。此function的輸入是一個String,一個Char,調用的時候指定index(取截取出來的String list中的第幾個)

-- CREATE SPLITSTRING FUNCTION
IF EXISTS 
(
    SELECT *
    FROM SYS.OBJECTS
    WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[SplitString]')
    AND TYPE IN ( N'FN', N'IF', N'TF', N'FS', N'FT' )
)
DROP FUNCTION [dbo].[SplitString]
GO

CREATE FUNCTION [dbo].[SplitString]
(
    @str NVARCHAR(MAX), 
    @separator CHAR()
)
RETURNS TABLE
AS
RETURN (
WITH tokens(p, a, b) AS (
    SELECT 
        CAST( AS BIGINT), 
        CAST( AS BIGINT), 
        CHARINDEX(@separator, @str)
    UNION ALL
    SELECT
        p + , 
        b + , 
        CHARINDEX(@separator, @str, b + )
    FROM tokens
    WHERE b > 
)
SELECT
    p- ItemIndex,
    SUBSTRING(
        @str, 
        a, 
        CASE WHEN b >  THEN b-a ELSE LEN(@str) END) 
    AS Item
FROM tokens
);
GO
           

使用方法如下

傳回的是

Shawn
           

T-SQL while循環

對每個user name執行SplitString方法,結果輸出到一個表中,該表就是Table_UFName(代碼中就是臨時表 @Table_UFName)。(感覺T-SQL的循環應該有更簡單的寫法,這裡暫時不糾結)

DECLARE @userid INT
DECLARE @maxuserid INT
DECLARE @username NVARCHAR(260)
DECLARE @userfoldername NVARCHAR(260)

DECLARE @Table_UFName TABLE (UserFolderName NVARCHAR(260))

SELECT @userid = MIN(UserId) FROM [dbo].[User]
SELECT @maxuserid = MAX(UserId) FROM [dbo].[User]

-- FIND USER PROFILE FOLDER NAMES FROM SPLITTING USER NAMES
WHILE(@userid <= @maxuserid)
BEGIN
    SELECT @username = [UserName] FROM [dbo].[User] WHERE [UserId] = @userid
    SELECT @userfoldername = Item FROM [dbo].[SplitString](@username,'\') WHERE ItemIndex = 1
    INSERT INTO @Table_UFName ([UserFolderName]) VALUES
    (@userfoldername)
    SET @userid = @userid + 1
END
           

2. Table_UFLocation

此表比較簡單,就是根據Table_UFName表,取出所有的root user folder(root user folder指那些檔案夾名直接就在Table_UFName中的那些folder,不包含其中的子檔案夾),寫入Table_UFLocation

-- FIND USER FOLDER LOCATIONS
IF EXISTS (SELECT OBJECT_ID FROM SYS.TABLES WHERE name = 'Table_UFLocation' AND TYPE = 'U')
BEGIN
DROP TABLE [dbo].[Table_UFLocation]
END
--BEGIN
SELECT IDENTITY(INT, , ) AS [LocId], [LocationId], [ParentId], [FolderName], [Level] INTO [dbo].[Table_UFLocation]
FROM [dbo].[Location]
WHERE [FolderName] IN (SELECT UserFolderName FROM @Table_UFName)
--END
GO
           

NOTE

因為上面儲存Table_UFName的時候,用的是臨時表,是以,這裡正好也能展現BEGIN-END和GO的差別(就是上面代碼塊中被注釋了的BEGIN-END,可以自行comment GO,uncomment BEGIN-END,看看有什麼效果)

3. Table_AllUFLocation

這裡需要用到遞歸,對每個根user folder,找到其所有sub folder

T-SQL的遞歸查詢

這裡就需要提到CTE(common table expression)。

A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

最重要的,就是這個

self-referencing

,確定CTE能夠較簡單地支援使用遞歸查詢。下面就是使用CTE實作遞歸查詢的T-SQL(line 25~37)。建立Table_UFAllLocation,然後依次對每個root user folder執行遞歸查詢,找到所有的子user folder,并插入Table_UFAllLocation

DECLARE @maxlocid INT
DECLARE @locid INT
DECLARE @id INT

SELECT @maxlocid = MAX(LocId) FROM [dbo].[Table_UFLocation]
SELECT @locid = MIN(LocId) FROM [dbo].[Table_UFLocation]

IF EXISTS (SELECT OBJECT_ID FROM SYS.TABLES WHERE name = 'Table_AllUFLocation' AND TYPE = 'U')
BEGIN
DROP TABLE [dbo].[Table_AllUFLocation]
END
CREATE TABLE [dbo].[Table_AllUFLocation]
(
    [LocationId] INT NOT NULL,
    [ParentId] INT NOT NULL,
    [FolderName] NVARCHAR() NOT NULL,
    [Level] INT NOT NULL,
    [UF_Tag] NVARCHAR() NOT NULL
);

-- FIND ALL FOLDERS IN USER ROOT FOLDERS
WHILE(@locid <= @maxlocid)
BEGIN
    SELECT @id = LocationId FROM Table_UFLocation WHERE LocId = @locid
    ;WITH allloc AS
    (
        SELECT loc.[LocationId], loc.[ParentId], loc.[FolderName], loc.[Level], ufloc.[FolderName] AS [UF_Tag]
        FROM [dbo].[Location] AS loc
        INNER JOIN [dbo].[Table_UFLocation] AS ufloc ON loc.[LocationId] = ufloc.[LocationId]
        WHERE loc.[LocationId] = @id
        UNION ALL
        SELECT loc.[LocationId], loc.[ParentId], loc.[FolderName], loc.[Level], allloc.[UF_Tag]
        FROM [dbo].[Location] AS loc
        JOIN allloc ON loc.[ParentId] = allloc.[LocationId]
    )
    INSERT INTO [dbo].[Table_AllUFLocation]
    SELECT * FROM allloc
    SET @locid = @locid + 
END
GO
           

4. Table_AllUFFile

這步就是簡單的query了,找到所有user folder下的所有file,寫入Table_AllUFFile

IF EXISTS (SELECT OBJECT_ID FROM SYS.TABLES WHERE name = 'Table_AllUFFile' AND TYPE = 'U')
BEGIN
DROP TABLE [dbo].[Table_AllUFFile]
END
SELECT fi.[FileId], fi.[FileName], fi.[Location], upfallloc.[UF_Tag] INTO [dbo].[Table_AllUFFile]
FROM [dbo].[File] AS fi
INNER JOIN [dbo].[Table_AllUFLocation] AS upfallloc ON fi.[Location] = upfallloc.[LocationId]
WHERE fi.[Location] IN
(
    SELECT DISTINCT [LocationId] FROM [dbo].[Table_AllUFLocation]
)
           

FINISH

關于BEGIN-END與GO的差別,這裡可以補充說一下,通過以下code能看得很清楚:

  • BEGIN-END代表的是代碼塊的分隔符,意義在于分隔,就像是其他程式設計語言裡面的花括号一樣。必須配對,有BEGIN就要有對應的END,例如下面的code是錯誤的,會報

    Incorrect syntax near '@string_test'.

    (就像沒有寫反花括号一樣)
DECLARE @string_test nvarchar()
SET @string_test = 'A'
BEGIN
    PRINT @string_test
           
  • GO不僅僅代表代碼塊的形式上的分隔,更有含義上的分隔,就像是function的概念。若幹個GO之間的代碼塊與代碼塊之間,在code執行的過程中,是沒有關系的,這樣的code是正确的
DECLARE @string_test1 nvarchar(100)
SET @string_test1 = 'A'
BEGIN
    PRINT @string_test1
END
GO

DECLARE @string_test2 nvarchar()
SET @string_test2 = 'B'
BEGIN
    PRINT @string_test2
END
GO
           

而這樣的code就有誤,報

Must declare the scalar variable "@string_test".

DECLARE @string_test nvarchar(100)
SET @string_test = 'A'
BEGIN
    PRINT @string_test
END
GO

SET @string_test = 'B'
BEGIN
    PRINT @string_test
END
GO
           

因為@string_test的作用域是第一個GO之前,就像是一個function中定義的變量,出了這個function就無效了

— 在這裡下載下傳以上完整測試代碼 —

總結

  1. T-SQL自定義函數。
  2. T-SQL實作字元串截取的函數
  3. BEGIN-END與GO的差別
  4. 使用T-SQL CTE實作遞歸查詢
  5. 實作部落格零的突破,finally……