天天看點

使用SAP HANA Web-based Development Workbench進行SQLScript練習練習一:練習2:計算總共9125部電影,一共包含多少藝術類别?練習3:計算每種藝術類别總共包含多少部電影:練習4:列出每部電影包含的風格數目:練習5:羅列出每部電影的風格分布情況練習6:計算movie的rating分布情況練習7:統計使用者投票情況練習8:統計使用者投票得分情況

通過csv檔案提供的資料庫表内容:

links.csv的格式:

movies.csv格式,一個movie可以有多種風格(genres),通過|分隔:

ratings.csv:

使用者給movie打得分:

tags.csv:movie的标簽

練習一:

列出四張表的總記錄數:

select 'links'   as "table name", count(1) as "row count" from "MOVIELENS"."public.aa.movielens.hdb::data.LINKS"
union all
select 'movies'  as "table name", count(1) as "row count" from "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES"
union all
select 'ratings' as "table name", count(1) as "row count" from "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
union all
select 'tags'    as "table name", count(1) as "row count" from "MOVIELENS"."public.aa.movielens.hdb::data.TAGS";           

執行結果:

練習2:計算總共9125部電影,一共包含多少藝術類别?

DO
BEGIN
  DECLARE genreArray NVARCHAR(255) ARRAY;
  DECLARE tmp NVARCHAR(255);
  DECLARE idx INTEGER;
  DECLARE sep NVARCHAR(1) := '|';
  DECLARE CURSOR cur FOR SELECT DISTINCT "GENRES" FROM "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES";
  DECLARE genres NVARCHAR (255) := '';
  idx := 1;
  FOR cur_row AS cur() DO
    SELECT cur_row."GENRES" INTO genres FROM DUMMY;
    tmp := :genres;
    WHILE LOCATE(:tmp,:sep) > 0 DO
      genreArray[:idx] := SUBSTR_BEFORE(:tmp,:sep);
      tmp := SUBSTR_AFTER(:tmp,:sep);
      idx := :idx + 1;
    END WHILE;
    genreArray[:idx] := :tmp;
  END FOR;

  genreList = UNNEST(:genreArray) AS ("GENRE");
  SELECT "GENRE" FROM :genreList GROUP BY "GENRE";
END;           

執行結果,總共包含18種:

練習3:計算每種藝術類别總共包含多少部電影:

DO
BEGIN
  DECLARE genreArray NVARCHAR(255) ARRAY;
  DECLARE tmp NVARCHAR(255);
  DECLARE idx INTEGER;
  DECLARE sep NVARCHAR(1) := '|';
  DECLARE CURSOR cur FOR SELECT DISTINCT "GENRES" FROM "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES";
  DECLARE genres NVARCHAR (255) := '';
  idx := 1;
  FOR cur_row AS cur() DO
    SELECT cur_row."GENRES" INTO genres FROM DUMMY;
    tmp := :genres;
    WHILE LOCATE(:tmp,:sep) > 0 DO
      genreArray[:idx] := SUBSTR_BEFORE(:tmp,:sep);
      tmp := SUBSTR_AFTER(:tmp,:sep);
      idx := :idx + 1;
    END WHILE;
    genreArray[:idx] := :tmp;
  END FOR;

  genreList = UNNEST(:genreArray) AS ("GENRE");
  SELECT "GENRE", count(1) FROM :genreList GROUP BY "GENRE";
END;           

練習4:列出每部電影包含的風格數目:

SELECT
    "MOVIEID"
  , "TITLE"
  , OCCURRENCES_REGEXPR('[|]' IN GENRES) + 1 "GENRE_COUNT"
  , "GENRES"
FROM "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES"
ORDER BY "GENRE_COUNT" ASC;           

練習5:羅列出每部電影的風格分布情況

SELECT
    "GENRE_COUNT"
  , COUNT(1)
FROM (
  SELECT
    OCCURRENCES_REGEXPR('[|]' IN "GENRES") + 1 "GENRE_COUNT"
  FROM "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES"
)
GROUP BY "GENRE_COUNT" ORDER BY "GENRE_COUNT";           

比如至少擁有1個風格的電影,有2793部,2個風格的電影有3039部,等等。

練習6:計算movie的rating分布情況

SELECT DISTINCT
  MIN("RATING_COUNT") OVER( ) AS "MIN",
  MAX("RATING_COUNT") OVER( ) AS "MAX",
  AVG("RATING_COUNT") OVER( ) AS "AVG",
  SUM("RATING_COUNT") OVER( ) AS "SUM",
  MEDIAN("RATING_COUNT") OVER( ) AS "MEDIAN",
  STDDEV("RATING_COUNT") OVER( ) AS "STDDEV",
  COUNT(*) OVER( ) AS "CATEGORY_COUNT"
FROM (
  SELECT "MOVIEID", COUNT(1) as "RATING_COUNT"
  FROM "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
  GROUP BY "MOVIEID"
)
GROUP BY "RATING_COUNT";           

明細情況:

SELECT "RATING_COUNT", COUNT(1) as "MOVIE_COUNT"
FROM (
  SELECT "MOVIEID", COUNT(1) as "RATING_COUNT"
  FROM "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
  GROUP BY "MOVIEID"
)
GROUP BY "RATING_COUNT" ORDER BY "RATING_COUNT" asc;           

比如有397部電影的使用者投票數為5票

練習7:統計使用者投票情況

SELECT "RATING_COUNT", COUNT(1) as "USER_COUNT"
FROM (
  SELECT "USERID", COUNT(1) as "RATING_COUNT"
  FROM "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
  GROUP BY "USERID"
)
GROUP BY "RATING_COUNT" ORDER BY 1 DESC;           

有一位使用者投了2391票,一位使用者投了1868票:

練習8:統計使用者投票得分情況

SELECT "RATING", COUNT(1) as "RATING_COUNT"
FROM "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
GROUP BY "RATING" ORDER BY 1 DESC;           

有15095份使用者投票,打的分數是5分

本文來自雲栖社群合作夥伴“汪子熙”,了解相關資訊可以關注微信公衆号"汪子熙"。