Kylin - Data Analysis

Go to the <code>Query</code> page in the top menu bar, then click <code>Manage Projects</code>.

Click the <code>+ Project</code> button to add a new project.

Enter a project name, e.g., “Tutorial”, and an optional description, then click the <code>submit</code> button to send the request.

Once the request succeeds, the project appears in the table.

Click <code>Model</code> in the top bar and then click the <code>Data Source</code> tab on the left, which lists all the tables loaded into Kylin; then click the <code>Load Hive Table</code> button.

Enter the Hive table names, separated by commas, and then click <code>Sync</code> to send the request.

[Optional] If you want to browse the Hive database to pick tables, click the <code>Load Hive Table From Tree</code> button.

[Optional] Expand the database node, click to select the table to load, and then click <code>Sync</code>.

A success message will pop up, and the newly loaded table is added to the <code>Tables</code> section on the left. Clicking the table name expands its columns.

In the background, Kylin runs a MapReduce job to calculate the approximate cardinality of the newly synced table. After the job finishes, refresh the web page and click the table name; the cardinality is shown in the table info.
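To make the term concrete, the cardinality of a column is its number of distinct values. The sketch below computes it exactly over a few in-memory rows; Kylin's job computes an approximation over the full Hive table instead, and the sample rows and the `column_cardinality` helper are hypothetical names used only for illustration.

```python
from collections import defaultdict

def column_cardinality(rows):
    """Return the exact number of distinct values per column."""
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            distinct[column].add(value)
    return {column: len(values) for column, values in distinct.items()}

# Hypothetical sample rows standing in for a synced Hive table.
sample_rows = [
    {"seller_id": 1, "country": "US"},
    {"seller_id": 2, "country": "US"},
    {"seller_id": 3, "country": "CN"},
]
cardinality = column_cardinality(sample_rows)
print(cardinality)
```

Here `seller_id` has three distinct values and `country` has two, so `seller_id` is the higher-cardinality column.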

Before creating a cube, you need to define a data model. The data model defines the star schema. One data model can be reused in multiple cubes.

Click <code>Model</code> in the top bar, and then click the <code>Models</code> tab. Click the <code>+New</code> button and, in the drop-down list, select <code>New Model</code>.

Enter a name for the model, with an optional description.

In the <code>Fact Table</code> box, select the fact table of this data model.

[Optional] Click the <code>Add Lookup Table</code> button to add a lookup table. Select the table name and the join type (inner or left).

[Optional] Click the <code>New Join Condition</code> button, select the FK column of the fact table on the left, and select the PK column of the lookup table on the right. Repeat this if there is more than one join column.

Click “OK”, then repeat the previous two steps (adding a lookup table and its join conditions) for any further lookup tables. When finished, click “Next”.
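The difference between the two join types offered for a lookup table can be sketched in plain Python. This is only an illustration of inner versus left join semantics on an FK/PK pair, not Kylin's implementation; the table contents, column names and the `join_tables` helper are hypothetical.

```python
def join_tables(fact_rows, lookup_rows, fk, pk, join_type="inner"):
    """Join fact rows to lookup rows where fact[fk] == lookup[pk]."""
    if join_type not in ("inner", "left"):
        raise ValueError("join_type must be 'inner' or 'left'")
    lookup_by_key = {row[pk]: row for row in lookup_rows}
    lookup_columns = list(lookup_rows[0].keys()) if lookup_rows else []
    joined = []
    for fact in fact_rows:
        match = lookup_by_key.get(fact[fk])
        if match is not None:
            joined.append({**fact, **match})
        elif join_type == "left":
            # A left join keeps the fact row and fills lookup columns with None.
            joined.append({**fact, **{col: None for col in lookup_columns}})
    return joined

# Hypothetical fact and lookup tables.
sales = [{"order_id": 1, "seller_fk": 10}, {"order_id": 2, "seller_fk": 99}]
sellers = [{"seller_pk": 10, "seller_name": "Acme"}]

inner_rows = join_tables(sales, sellers, "seller_fk", "seller_pk", "inner")
left_rows = join_tables(sales, sellers, "seller_fk", "seller_pk", "left")
print(len(inner_rows), len(left_rows))
```

With an inner join the fact row whose FK has no match in the lookup table is dropped; with a left join it is kept, which affects how many fact records end up in the cube.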

The “Dimensions” page lets you select the columns that will be used as dimensions in the child cubes. Click the <code>Columns</code> cell of a table, then select the columns to add from the drop-down list.

Click “Next” to go to the “Measures” page, and select the columns that will be used in measures/metrics. Measure columns can only come from the fact table.

Click “Next” to go to the “Settings” page. If the data in the fact table increases by day, select the corresponding date column in <code>Partition Date Column</code> and select the date format; otherwise leave it blank.

[Optional] Select <code>Cube Size</code>, an indicator of the scale of the cube; the default is <code>MEDIUM</code>.

[Optional] If some records should be excluded from the cube, such as dirty data, enter the condition in <code>Filter</code>.

Click <code>Save</code> and then select <code>Yes</code> to save the data model. Once created, the data model is shown in the <code>Models</code> list on the left.

After the data model is created, you can start to create a cube. Click the <code>+New</code> button and, in the drop-down list, select <code>New Cube</code>.

Step 1. Cube Info

Select the data model and enter the cube name, then click <code>Next</code> to go to the next step.

You can use letters, numbers and ‘_’ to name your cube (spaces are not allowed in the name).

<code>Notification List</code> is a list of email addresses that will be notified when a cube job succeeds or fails.
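The naming rule above can be checked before the form is submitted. The following is a minimal sketch that assumes “letters” means ASCII letters; the `is_valid_cube_name` helper and the sample names are hypothetical.

```python
import re

# Letters, digits and underscore only; assumes ASCII letters are intended.
CUBE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")

def is_valid_cube_name(name):
    """Return True if the whole name consists of allowed characters."""
    return CUBE_NAME_PATTERN.fullmatch(name) is not None

print(is_valid_cube_name("sales_cube_v1"))
print(is_valid_cube_name("sales cube"))
```

An explicit character class is used rather than `\w`, because `\w` also matches non-ASCII letters in Python 3.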

Step 2. Dimensions

Click <code>Add Dimension</code>; it pops up two options, “Normal” and “Derived”. “Normal” adds a normal, independent dimension column; “Derived” adds a derived dimension column. Read more in

Click “Normal”, select a dimension column, and give it a meaningful name.

[Optional] Click “Derived”, pick one or more columns from the lookup table, and give them a meaningful name.

Repeat the previous two steps to add all dimension columns; for “Normal” dimensions you can do this in batch with the <code>Auto Generator</code> button.

Click “Next” after selecting all dimensions.

Step 3. Measures

Click <code>+Measure</code> to add a new measure.

There are 6 types of measure, according to their expression: <code>SUM</code>, <code>MAX</code>, <code>MIN</code>, <code>COUNT</code>, <code>COUNT_DISTINCT</code> and <code>TOP_N</code>. Select the return type for <code>COUNT_DISTINCT</code> and <code>TOP_N</code> carefully, as it affects the cube size.
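As a rough illustration of what the first five measure types mean, the sketch below pre-aggregates a metric column for each combination of dimension values. It is a plain in-memory computation, not Kylin's build process; the rows, the column names and the `aggregate_measures` helper are hypothetical.

```python
from collections import defaultdict

def aggregate_measures(rows, dimensions, metric, distinct_column):
    """Pre-aggregate one metric per combination of dimension values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row)
    result = {}
    for key, members in groups.items():
        values = [m[metric] for m in members]
        result[key] = {
            "SUM": sum(values),
            "MAX": max(values),
            "MIN": min(values),
            "COUNT": len(members),
            # Exact distinct count using a set of the observed values.
            "COUNT_DISTINCT": len({m[distinct_column] for m in members}),
        }
    return result

# Hypothetical fact rows.
rows = [
    {"country": "US", "seller_id": 1, "price": 10},
    {"country": "US", "seller_id": 1, "price": 30},
    {"country": "US", "seller_id": 2, "price": 20},
    {"country": "CN", "seller_id": 3, "price": 5},
]
measures = aggregate_measures(rows, ["country"], "price", "seller_id")
print(measures[("US",)])
```

Note that the distinct count has to remember every value seen in the group, while the other measures need only a single running number; this is one way to see why its return type matters for cube size.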

SUM

MIN

MAX

COUNT

COUNT_DISTINCT

This measure has two implementations:

a) An approximate implementation with HyperLogLog: select an acceptable error rate; a lower error rate takes more storage.

b) A precise implementation with bitmap (see the limitations in https://issues.apache.org/jira/browse/KYLIN-1186).

Please note: distinct count is a very heavy data type; it is slower to build and query than the other measures.
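The trade-off between error rate and storage in the HyperLogLog option can be made concrete with the textbook estimate for the algorithm: with m registers, the relative standard error is roughly 1.04 / sqrt(m). The sketch below uses that generic formula only; the error rates Kylin shows for its own return types may be stated differently, and `hll_standard_error` is a hypothetical helper name.

```python
import math

def hll_standard_error(precision_bits):
    """Typical relative standard error of HyperLogLog with 2**precision_bits registers."""
    registers = 2 ** precision_bits
    return 1.04 / math.sqrt(registers)

# Each extra precision bit doubles the number of registers (storage)
# and shrinks the error by a factor of sqrt(2).
for bits in (10, 12, 14, 16):
    print(bits, 2 ** bits, hll_standard_error(bits))
```

Halving the error therefore costs about four times the storage, which is why the lowest error rate is not always the right choice.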

TOP_N

The approximate TopN measure pre-calculates the top records in each dimension combination, which gives better query-time performance than computing them without pre-calculation. You need to specify two parameters: the first is the column used as the metric for the top records (aggregated with SUM and then sorted in descending order); the second is the literal ID that represents the record, such as seller_id.

Select the return type according to how many top records you want to inspect: top 10, top 100 or top 1000.
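The two parameters can be illustrated with an exact computation: sum the metric column per literal ID, sort in descending order, and keep the first N. Kylin's measure is approximate and pre-calculated per dimension combination, so this is only a sketch of the result it targets; the `top_n` helper, the rows and the `price` column are hypothetical.

```python
from collections import defaultdict

def top_n(rows, id_column, metric_column, n):
    """Sum the metric per ID, sort descending, and return the top n pairs."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[id_column]] += row[metric_column]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical fact rows: seller_id is the literal ID, price is the metric.
sales = [
    {"seller_id": "s1", "price": 50},
    {"seller_id": "s2", "price": 70},
    {"seller_id": "s1", "price": 40},
    {"seller_id": "s3", "price": 10},
]
top_sellers = top_n(sales, "seller_id", "price", 2)
print(top_sellers)
```

Seller `s1` ranks first because its two rows are summed before sorting, which is the “aggregated with SUM and then sorted” order described above.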

Step 4. Refresh Setting

This step is designed for incremental cube build.

<code>Auto Merge Time Ranges (days)</code>: automatically merges small segments into medium and large segments. If you don’t want auto merge, remove the two default ranges.

<code>Retention Range (days)</code>: keeps only the segments whose data falls within the given number of past days; older segments are automatically dropped from the head of the cube. 0 disables this feature.

<code>Partition Start Date</code>: the start date of this cube.
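The retention setting can be pictured as a filter over the cube's segments. The sketch below is one plausible reading, not Kylin's code: it assumes the window is measured back from the end date of the newest segment, which the text above does not specify, and the `apply_retention` helper and segment dates are hypothetical.

```python
from datetime import date, timedelta

def apply_retention(segments, retention_days):
    """Drop the oldest segments that fall outside the retention window.

    segments: list of (start_date, end_date) tuples.
    retention_days: 0 disables retention, as described above.
    """
    if retention_days == 0 or not segments:
        return list(segments)
    # Assumption: the window is measured back from the newest segment's end date.
    newest_end = max(end for _, end in segments)
    cutoff = newest_end - timedelta(days=retention_days)
    return [(start, end) for start, end in segments if end > cutoff]

# Hypothetical monthly segments.
segments = [
    (date(2016, 1, 1), date(2016, 2, 1)),
    (date(2016, 2, 1), date(2016, 3, 1)),
    (date(2016, 3, 1), date(2016, 4, 1)),
]
kept = apply_retention(segments, 45)
print(kept)
```

With a 45-day range only the January segment is dropped, since it ends before the cutoff; a range of 0 returns all segments unchanged.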

Step 5. Advanced Setting

<code>Aggregation Groups</code>: by default, Kylin puts all dimensions into one aggregation group; if you know your query patterns well, you can create multiple aggregation groups. For the concepts of “Mandatory Dimensions”, “Hierarchy
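A quick count shows why splitting dimensions into several aggregation groups helps. A group of n dimensions can produce up to 2^n dimension combinations (cuboids), so the sketch below sums 2^n over the groups. This is a simplified upper bound that ignores combinations shared between groups and any further pruning rules; `cuboid_upper_bound` is a hypothetical helper.

```python
def cuboid_upper_bound(group_sizes):
    """Upper bound on cuboids: each group of n dimensions allows up to 2**n combinations."""
    return sum(2 ** size for size in group_sizes)

# Ten dimensions in a single group versus two groups of five.
print(cuboid_upper_bound([10]))
print(cuboid_upper_bound([5, 5]))
```

One group of ten dimensions allows up to 1024 combinations, while two groups of five allow at most 64, at the cost of not pre-computing combinations that mix dimensions from both groups.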

<code>Rowkeys</code>: the rowkeys are composed of the encoded dimension values. “Dictionary” is the default encoding method. If a dimension is not a good fit for dictionary encoding (e.g., cardinality &gt; 10 million), select “false” and then enter the fixed length for that dimension, usually the maximum length of that column; a value longer than that size will be truncated. Please note that without dictionary encoding, the cube size might be much bigger.
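The truncation behaviour of a fixed-length encoding can be sketched as follows. This is an illustration only: the zero-byte padding for short values is an assumption, Kylin's actual byte layout may differ, and `encode_fixed_length` is a hypothetical helper.

```python
def encode_fixed_length(value, length):
    """Encode a string into exactly `length` bytes, truncating or padding as needed."""
    raw = value.encode("utf-8")[:length]  # longer values are cut off
    return raw.ljust(length, b"\x00")     # assumption: shorter values are zero-padded

print(encode_fixed_length("US", 4))
print(encode_fixed_length("GERMANY", 4))
```

Every value occupies the full length regardless of its content, and two values that share the same first `length` bytes become indistinguishable after encoding, so the length should cover the longest value in the column.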

You can drag &amp; drop a dimension column to adjust its position in the rowkey. Put the mandatory dimensions at the beginning, followed by the dimensions that are heavily involved in filters (where conditions). Put high-cardinality dimensions ahead of low-cardinality dimensions.
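The ordering advice can be expressed as a sort key: mandatory dimensions first, then dimensions used in filters, then higher cardinality before lower. This is one possible reading of the guidance rather than a rule Kylin applies automatically; the dimension names, flags and cardinalities are hypothetical.

```python
def order_rowkey(dimensions):
    """Sort dimensions: mandatory first, then filtered, then by descending cardinality."""
    return sorted(
        dimensions,
        key=lambda d: (not d["mandatory"], not d["filtered"], -d["cardinality"]),
    )

# Hypothetical dimension metadata.
dims = [
    {"name": "country", "mandatory": False, "filtered": False, "cardinality": 200},
    {"name": "seller_id", "mandatory": False, "filtered": True, "cardinality": 1_000_000},
    {"name": "part_dt", "mandatory": True, "filtered": True, "cardinality": 730},
    {"name": "category", "mandatory": False, "filtered": True, "cardinality": 50},
]
ordered_names = [d["name"] for d in order_rowkey(dims)]
print(ordered_names)
```

The resulting order is `part_dt`, `seller_id`, `category`, `country`, which you would then reproduce by dragging the columns in the rowkey list.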

Step 6. Overview &amp; Save

You can review your cube and go back to a previous step to modify it. Click the <code>Save</code> button to complete the cube creation.
