天天看點

AWS - Redshift - Spectrum 外部資料

  1. 建立IAM Role

    在 Select your use case 下,選擇 Redshift - Customizable,然後選擇 Next: Permissions。

    此時顯示 Attach permissions policy 頁面。

    選擇 AmazonS3ReadOnlyAccess 和 AWSGlueConsoleFullAccess(如果使用的是 AWS Glue 資料目錄)。或選擇 AmazonAthenaFullAccess(如果使用的是 Athena Data Catalog)。

    選擇 Next: Review。

  2. 将Role 賦給 Redshift Cluster。

    選中 叢集,點選“管理IAM角色”。添加一個role。

  3. 建立外部 schema 及 外部表 (glue)

    3.1 建立外部 schema

testdb=# create external schema chen_glue_db_schema from data catalog
testdb-# database 'chen_glue_db'
testdb-# iam_role 'arn:aws-cn:iam::524560386974:role/chen-redshift-glue-role'
testdb-# create external database if not exists;
INFO:  External database "chen_glue_db" already exists
CREATE SCHEMA

           

檢視是否導入

testdb=# select * from pg_catalog.svv_external_tables;
     schemaname      |      tablename       |                              location                              |               input_format               |                       output_format                        |                 se
rialization_lib                  |                 serde_parameters                 | compressed |                                 parameters
---------------------+----------------------+--------------------------------------------------------------------+------------------------------------------+------------------------------------------------------------+-------------------
---------------------------------+--------------------------------------------------+------------+-----------------------------------------------------------------------------
 chen_glue_db_schema | chen_glue_table_0603 | s3://chen-redshift-spectrum/testdb1/spectrum/chen_glue_table_0603/ | org.apache.hadoop.mapred.TextInputFormat | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | org.apache.hadoop.
hive.serde2.OpenCSVSerde         | {"separatorChar":","}                            |          0 | {"classification":"csv"}
 spectrum            | sales                | s3://chen-redshift-spectrum/testdb1/spectrum/sales                 | org.apache.hadoop.mapred.TextInputFormat | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | org.apache.hadoop.
hive.serde2.lazy.LazySimpleSerDe | {"field.delim":"\t","serialization.format":"\t"} |          0 | {"EXTERNAL":"TRUE","numRows":"172000","transient_lastDdlTime":"1587466643"}
 spectrum            | test01               | s3://chen-redshift-spectrum/testdb1/spectrum/test01                | org.apache.hadoop.mapred.TextInputFormat | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | org.apache.hadoop.
hive.serde2.lazy.LazySimpleSerDe | {"field.delim":"\t","serialization.format":"\t"} |          0 | {"EXTERNAL":"TRUE","numRows":"172000","transient_lastDdlTime":"1587466757"}
 spectrum            | test02               | s3://chen-redshift-spectrum/testdb1/spectrum/test02                | org.apache.hadoop.mapred.TextInputFormat | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | org.apache.hadoop.
hive.serde2.lazy.LazySimpleSerDe | {"field.delim":"\t","serialization.format":"\t"} |          0 | {"EXTERNAL":"TRUE","numRows":"172000","transient_lastDdlTime":"1587467742"}
(4 rows)
           

查詢表

testdb=# select * from chen_glue_db_schema.chen_glue_table_0603;
 id | col1
----+------
  1 | a
  2 | b
(2 rows)
           

當在建立external schema 時,出現 這個role,說明建立role 時,沒有選擇對的 service。

重新建立role,Service 選擇 Redshift。添加 policy。

AmazonS3FullAccess
AmazonRedshiftFullAccess
AWSGlueConsoleFullAccess
AWSGlueServiceRole
AmazonAthenaFullAccess
           

錯誤

testdb=# create external schema s_test0604 from data catalog database 'spectrumdb_db_0604' iam_role 'arn:aws-cn:iam::XXXXXXXXXXXX:role/chen-glue-role' create external database if not exists;
ERROR:  User arn:aws-cn:redshift:cn-northwest-1:XXXXXXXXXXXX:dbuser:chen-test/dbadmin is not authorized to assume IAM Role arn:aws-cn:iam::XXXXXXXXXXXX:role/chen-glue-role
           

之後再建立時就可以了。

繼續閱讀