Kylin - Basic Concepts

2016-05-25 23:50:00

Table - The definition of a Hive table used as a source for cubes. Tables must be synced before cubes can be built from them.

Cube Descriptor - Describes the definition and settings of a cube instance: which data model to use, which dimensions and measures to include, how to partition the cube into segments, how to handle auto-merge, etc.

Cube Instance - An instance of a cube, built from one cube descriptor and consisting of one or more cube segments according to the partition settings.

Partition - Users can define a DATE/STRING column as the partition column in the cube descriptor, to split one cube into several segments covering different date periods.

Cube Segment - The actual carrier of cube data; each segment maps to an HTable in HBase. One build job creates one new segment for the cube instance. When data changes within a specific period, the related segments can be refreshed instead of rebuilding the whole cube.
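To make the segment/partition relationship concrete, here is a minimal sketch that models each segment as a half-open date range `[start, end)` of the partition column and picks out only the segments touched by changed dates. The function names and the choice of monthly boundaries are hypothetical, for illustration only; they are not Kylin's API.

```python
from __future__ import annotations

from datetime import date


def split_into_segments(start: date, end: date, boundaries: list[date]) -> list[tuple[date, date]]:
    """Cut the half-open range [start, end) at each boundary date; each piece is one segment."""
    cuts = sorted(b for b in boundaries if start < b < end)
    points = [start, *cuts, end]
    return list(zip(points[:-1], points[1:]))


def segments_to_refresh(segments, changed_dates):
    """Return only the segments whose [start, end) range contains at least one changed date."""
    return [seg for seg in segments if any(seg[0] <= d < seg[1] for d in changed_dates)]


# Hypothetical example: one quarter split into three monthly segments.
segments = split_into_segments(date(2016, 1, 1), date(2016, 4, 1), [date(2016, 2, 1), date(2016, 3, 1)])
# A change on 2016-02-15 only requires refreshing the February segment.
print(segments_to_refresh(segments, [date(2016, 2, 15)]))
```

The point of partitioning is visible in the last call: a correction inside one period touches one segment, and the other two stay as they are.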

Aggregation Group - Each aggregation group is a subset of the dimensions, and cuboids are built only from combinations within that group. Its purpose is to prune cuboids for optimization.
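A quick way to see why aggregation groups help is to count combinations. The sketch below enumerates every dimension combination inside each group and takes the union across groups. It is a simplified model under the assumption that only in-group combinations are built; it ignores anything extra the real engine might add, and the function names are hypothetical.

```python
from itertools import combinations


def group_cuboids(group):
    """All combinations of one group's dimensions: 2**len(group) cuboids, the empty one included."""
    return {frozenset(c) for r in range(len(group) + 1) for c in combinations(group, r)}


def cube_cuboids(groups):
    """Cuboids of a cube whose dimensions are split into aggregation groups (union over groups)."""
    result = set()
    for g in groups:
        result |= group_cuboids(g)
    return result


# Six dimensions in one group versus two groups of three.
print(len(group_cuboids(list("ABCDEF"))))
print(len(cube_cuboids([list("ABC"), list("DEF")])))
```

With six dimensions in a single group there are 64 combinations; with two groups of three there are 15 (8 + 8, minus the empty cuboid both groups share).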

Mandatory - This dimension type is used for cuboid pruning: if a dimension is marked as "mandatory", all combinations that do not contain it are pruned.
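The mandatory rule is a simple filter over the candidate combinations. This minimal sketch (hypothetical helper names) enumerates all combinations of four dimensions and keeps only those containing the mandatory dimension `A`.

```python
from itertools import combinations


def all_cuboids(dims):
    """Every combination of the given dimensions (2**n cuboids)."""
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]


def prune_mandatory(cuboids, mandatory):
    """Keep only cuboids that contain every mandatory dimension."""
    required = frozenset(mandatory)
    return [c for c in cuboids if required <= c]


kept = prune_mandatory(all_cuboids(["A", "B", "C", "D"]), ["A"])
print(len(kept))  # 8
```

Each mandatory dimension halves the number of combinations: 16 become 8 here.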

Hierarchy - This dimension type is used for cuboid pruning: if dimensions A, B and C form a "hierarchy" relation, only combinations with A, AB or ABC are kept.
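The hierarchy rule says that the hierarchy levels present in a cuboid must form a prefix of the hierarchy. The sketch below checks that condition. One assumption beyond the wording above: a cuboid containing none of A, B, C is also treated as valid (the empty prefix), so it can still combine freely with dimensions outside the hierarchy. Helper names are hypothetical.

```python
from itertools import combinations


def all_cuboids(dims):
    """Every combination of the given dimensions (2**n cuboids)."""
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]


def respects_hierarchy(cuboid, hierarchy):
    """True if the hierarchy levels in the cuboid are a prefix: none, A, AB or ABC.

    Assumption: the empty prefix (no hierarchy level present) is allowed.
    """
    present = [level in cuboid for level in hierarchy]
    k = sum(present)           # number of hierarchy levels present
    return all(present[:k])    # they must be exactly the first k levels


hierarchy = ["A", "B", "C"]
kept = [c for c in all_cuboids(hierarchy) if respects_hierarchy(c, hierarchy)]
print(sorted("".join(sorted(c)) for c in kept))  # ['', 'A', 'AB', 'ABC']
```

For three hierarchy levels this keeps 4 of the 8 combinations; adding an unrelated dimension D doubles both counts (8 of 16).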

Derived - On lookup tables, some dimensions can be derived from the table's PK, so there is a fixed mapping between them and the FK in the fact table. Such dimensions are DERIVED and do not participate in cuboid generation.
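A derived dimension can be answered by aggregating on the FK only and translating through the lookup table at query time. The minimal sketch below assumes a hypothetical city lookup table whose `province` column depends only on the key; the data, column names, and function names are all illustrative.

```python
from collections import defaultdict

# Hypothetical lookup table keyed by its PK (referenced by the fact table's FK).
LOOKUP = {
    1: {"city": "Hangzhou", "province": "Zhejiang"},
    2: {"city": "Ningbo", "province": "Zhejiang"},
    3: {"city": "Nanjing", "province": "Jiangsu"},
}


def aggregate_by_fk(fact_rows):
    """Pre-aggregate the fact table on the FK only; derived columns are not in the cuboid."""
    totals = defaultdict(int)
    for fk, amount in fact_rows:
        totals[fk] += amount
    return dict(totals)


def query_by_derived(fk_totals, lookup, column):
    """Answer a query on a derived column: map each FK through the lookup table, then re-aggregate."""
    result = defaultdict(int)
    for fk, total in fk_totals.items():
        result[lookup[fk][column]] += total
    return dict(result)


fk_totals = aggregate_by_fk([(1, 10), (2, 5), (1, 7), (3, 4)])
print(query_by_derived(fk_totals, LOOKUP, "province"))
```

Only the FK takes part in the pre-aggregation, so adding `city` or `province` as derived dimensions adds no cuboids; the cost moves to a cheap lookup and re-aggregation at query time.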

Count Distinct(HyperLogLog) - An exact COUNT DISTINCT is hard to calculate on the fly, so an approximate algorithm, HyperLogLog, is used instead.
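For readers unfamiliar with HyperLogLog, here is a textbook-style sketch of the algorithm. It is not Kylin's implementation: the hash function (SHA-1 truncated to 64 bits), the default precision `p = 12`, and the class name are assumptions chosen for illustration. Each value is hashed; the top `p` bits pick one of `2**p` registers, and the register keeps the largest "leftmost 1-bit position" seen in the remaining bits.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers; an illustrative sketch, not Kylin's code."""

    def __init__(self, p: int = 12):
        if not 7 <= p <= 16:
            raise ValueError("this sketch supports 7 <= p <= 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value) -> int:
        # Assumption: SHA-1 truncated to 64 bits stands in for the real hash function.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value) -> None:
        h = self._hash64(value)
        width = 64 - self.p
        idx = h >> width                  # top p bits choose a register
        rest = h & ((1 << width) - 1)     # remaining bits determine the rank
        # rank = 1-based position of the leftmost 1-bit in `rest` (width + 1 if rest == 0)
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        """Union with another sketch of the same precision: register-wise max."""
        if other.p != self.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias-correction constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting over the empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


sketch = HyperLogLog()
for user_id in range(10000):
    sketch.add(user_id)
print(round(sketch.estimate()))  # close to 10000; standard error is about 1.04 / sqrt(4096)
```

The `merge` method hints at why this family of algorithms suits pre-aggregation: sketches built for different cuboids or segments can be combined by a register-wise max, and the result is identical to the sketch of the union.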

Count Distinct(Precise) - Precise COUNT DISTINCT is pre-calculated based on RoaringBitmap; currently only int and bigint columns are supported.
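The int/bigint restriction follows from the bitmap idea: every value must map directly to a bit position. The sketch below uses a simplified, Roaring-style layout (the high 16 bits of an unsigned 32-bit value select a container, the low 16 bits select a bit inside it). It is a stand-in, not RoaringBitmap itself: real Roaring also uses sorted-array and run-length containers, and this sketch only accepts 32-bit values, not bigint. The class name is hypothetical.

```python
class ChunkedBitmap:
    """Simplified Roaring-style bitmap: high 16 bits pick a container, low 16 bits pick a bit.

    Every container here is a plain Python int used as a 65536-bit bitmap.
    """

    def __init__(self):
        self.containers = {}  # high 16 bits -> int bitmap of the low 16 bits

    def add(self, value: int) -> None:
        if not 0 <= value < (1 << 32):
            raise ValueError("this sketch only supports unsigned 32-bit integers")
        high, low = value >> 16, value & 0xFFFF
        self.containers[high] = self.containers.get(high, 0) | (1 << low)

    def union(self, other: "ChunkedBitmap") -> "ChunkedBitmap":
        """Exact union, e.g. when combining two segments: OR the matching containers."""
        out = ChunkedBitmap()
        for key in self.containers.keys() | other.containers.keys():
            out.containers[key] = self.containers.get(key, 0) | other.containers.get(key, 0)
        return out

    def cardinality(self) -> int:
        """Exact distinct count: the number of set bits across all containers."""
        return sum(bin(c).count("1") for c in self.containers.values())


seg1, seg2 = ChunkedBitmap(), ChunkedBitmap()
for v in [1, 5, 70000, 5]:
    seg1.add(v)
for v in [5, 6, 70000]:
    seg2.add(v)
print(seg1.cardinality(), seg1.union(seg2).cardinality())  # 3 4
```

Unlike the HyperLogLog estimate, the union here is exact: duplicates across segments (5 and 70000) are counted once.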

Top N - With this measure type, users can easily retrieve a specified number of top items, for example the top sellers or buyers.
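The query a Top N measure serves looks like "the n sellers with the largest total sales". The sketch below computes it exactly by summing per seller and keeping the n largest totals with `heapq.nlargest`. The glossary does not say how Kylin stores this measure, and Kylin's own Top N may keep a bounded, approximate structure rather than exact totals; this sketch does not model that. The data and function name are hypothetical.

```python
import heapq
from collections import Counter


def top_n_sellers(sales, n):
    """Exact Top N: sum the measure per seller, then keep the n largest totals."""
    totals = Counter()
    for seller, amount in sales:
        totals[seller] += amount
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


sales = [("alice", 30), ("bob", 50), ("carol", 20), ("alice", 40), ("dave", 10)]
print(top_n_sellers(sales, 2))  # [('alice', 70), ('bob', 50)]
```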

BUILD - Given an interval of the partition column, this action builds a new cube segment.

REFRESH - This action rebuilds the cube segment for a given partition period; it is used when the source table data for that period has increased.

MERGE - This action merges multiple contiguous cube segments into a single one. This can be automated with the auto-merge settings in the cube descriptor.

PURGE - Clears all segments under a cube instance. This only updates metadata and does not delete cube data from HBase.
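The four segment actions can be summarized as operations on a list of intervals. This is a hypothetical in-memory model, not Kylin's metadata API: partition values are encoded as `YYYYMMDD` integers so they sort correctly, segments are half-open intervals, and REFRESH is modeled simply as bumping a version number to stand for a rebuild.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: int        # inclusive partition value, e.g. 20160101
    end: int          # exclusive partition value
    version: int = 1  # bumped on each refresh (stands in for a rebuild)


@dataclass
class CubeInstance:
    segments: list = field(default_factory=list)

    def build(self, start, end):
        """BUILD: add a new segment for an interval that overlaps no existing segment."""
        if any(s.start < end and start < s.end for s in self.segments):
            raise ValueError("interval overlaps an existing segment")
        self.segments.append(Segment(start, end))
        self.segments.sort(key=lambda s: s.start)

    def refresh(self, start, end):
        """REFRESH: rebuild the segment that covers exactly this interval."""
        for s in self.segments:
            if (s.start, s.end) == (start, end):
                s.version += 1
                return
        raise KeyError("no segment covers exactly this interval")

    def merge(self, start, end):
        """MERGE: replace two or more contiguous segments spanning [start, end) with one."""
        inside = [s for s in self.segments if start <= s.start and s.end <= end]
        if len(inside) < 2 or inside[0].start != start or inside[-1].end != end:
            raise ValueError("interval must be exactly covered by two or more segments")
        if any(a.end != b.start for a, b in zip(inside, inside[1:])):
            raise ValueError("segments are not contiguous")
        self.segments = [s for s in self.segments if not (start <= s.start and s.end <= end)]
        self.segments.append(Segment(start, end))
        self.segments.sort(key=lambda s: s.start)

    def purge(self):
        """PURGE: clear segment metadata only; in Kylin the HBase data is not deleted."""
        self.segments.clear()


cube = CubeInstance()
cube.build(20160101, 20160201)
cube.build(20160201, 20160301)
cube.refresh(20160201, 20160301)
cube.merge(20160101, 20160301)
print([(s.start, s.end) for s in cube.segments])  # [(20160101, 20160301)]
```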

NEW - The job has just been created.

PENDING - The job has been paused by the job scheduler and is waiting for resources.

RUNNING - The job is in progress.

FINISHED - The job has finished successfully.

ERROR - The job was aborted with errors.

DISCARDED - The job was cancelled by the end user.

RESUME - Once a job is in ERROR status, this action tries to restore it from the latest successful point.

DISCARD - Whatever a job's status is, the user can end it and release its resources with the DISCARD action.
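The job statuses and the RESUME/DISCARD actions fit together as a small state machine. The sketch below is a hypothetical model, not Kylin's job engine: a job is assumed to be a list of steps, RESUME is assumed to send an ERROR job back through PENDING and then continue from the first step that did not succeed, and DISCARD follows the wording above literally by being allowed from any status.

```python
from enum import Enum


class Status(Enum):
    NEW = "NEW"
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERROR = "ERROR"
    DISCARDED = "DISCARDED"


class Job:
    """Hypothetical job model: an ordered list of steps plus a resume point."""

    def __init__(self, steps):
        self.steps = steps       # callables, one per build step
        self.status = Status.NEW
        self.next_step = 0       # index of the first step that has not succeeded yet

    def run(self):
        if self.status not in (Status.NEW, Status.PENDING):
            raise RuntimeError(f"cannot run a job in status {self.status.value}")
        self.status = Status.RUNNING
        while self.next_step < len(self.steps):
            try:
                self.steps[self.next_step]()
            except Exception:
                self.status = Status.ERROR  # successful steps before this one are kept
                return
            self.next_step += 1
        self.status = Status.FINISHED

    def resume(self):
        """RESUME: only from ERROR; continue from the latest successful point."""
        if self.status is not Status.ERROR:
            raise RuntimeError("only jobs in ERROR can be resumed")
        self.status = Status.PENDING  # assumption: waits for resources again
        self.run()

    def discard(self):
        """DISCARD: end the job regardless of its current status."""
        self.status = Status.DISCARDED


calls, state = [], {"failed_once": False}


def extract():
    calls.append("extract")


def build_cuboids():
    if not state["failed_once"]:  # simulate one transient failure
        state["failed_once"] = True
        raise RuntimeError("transient failure")
    calls.append("build")


job = Job([extract, build_cuboids])
job.run()      # extract succeeds, build_cuboids fails -> ERROR
job.resume()   # restarts at build_cuboids; extract is not rerun
print(job.status, calls)
```

The resume point (`next_step`) is what makes "restore from the latest successful point" cheap: steps that already succeeded are not repeated.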
