天天看點

Oracle Database In-Memory Concept

Oracle Database In-Memory (Database In-Memory) is a suite of features, first introduced in Oracle Database 12c Release 1 (12.1.0.2), that greatly improves performance for real-time analytics and mixed workloads.

Traditionally, relational databases store data in either row or columnar formats. Memory and disk store data in the same format.

An Oracle database stores rows contiguously in data blocks. For example, in a table with three rows, an Oracle data block stores the first row, and then the second row, and then the third row. Each row contains all column values for the row. Data stored in row format is optimized for transaction processing. For example, updating all columns in a small number of rows may modify only a small number of blocks.

To address the problems relating to analytic queries, some database vendors have introduced a columnar format. A columnar database stores selected columns—not rows—contiguously. For example, in a large sales table, the sales IDs reside in one column, and sales regions reside in a different column.

Analytical workloads access few columns while scanning, but scan the entire data set. For this reason, the columnar format is the most efficient for analytics. Because columns are stored separately, an analytical query can access only required columns, and avoid reading inessential data. For example, a report on sales totals by region can rapidly process many rows while accessing only a few columns.

一般關系資料庫要麼使用行格式要麼使用列格式來存儲資料,記憶體與磁盤是以相同格式存儲。行格式即一行一行的存儲,列格式即一列一列的存儲

Database vendors typically force customers to choose between a columnar and row-based format. For example, if the data format is columnar, then the database stores data in columnar format both in memory and on disk. Gaining the advantages of one format means losing the advantages of the alternate format. Applications either achieve rapid analytics or rapid transactions, but not both. The performance problems for mixed-use databases are not solved by storing data in a single format.

The demand for real-time analytics means that more analytic queries are being executed in a mixed-workload database. The traditional approach is not sustainable.

Fact table and Dimension table

Fact table事實表即包含事實資料的表,這些資料可用于彙總,如彙總具體時間段内一組商店的特定商品的銷售情況。是以fact table通常包含大量行,需要彙總列會建立索引。一般來說,一個事實資料表都要和一個或多個次元表相關聯,使用者在利用事實資料表建立多元資料集時,可以使用一個或多個次元表。

Dimension table次元表,即是你從哪個角度去觀察事實表資料,它包含了事實表的特性,有些特性提供描述性資訊,有些特性指定如何彙總事實資料表資料。如某地區商品的銷量,是從地區這個角度觀察商品銷量的。事實表就是銷量表,次元表就是地區表

在fact table與dimension table連接配接時預設會先将fact table相應資料放入PGA,再根據join條件過濾dimension table, 即此時fact table稱為join的build side, dimension table稱為join的probe side

(一)Dual-Format: Column and Row 雙重格式:列式與行式

The IM column store maintains copies of tables, partitions, and individual columns in a special compressed columnar format that is optimized for rapid scans. The IM column store resides in the In-Memory Area, which is an optional portion of the system global area (SGA). The IM column store does not replace row-based storage or the database buffer cache, but supplements it.

是否啟用In-Memory并不會影響buff cache的工作方式

The database buffer cache stores and processes data blocks in the same way whether the IM column store is enabled or disabled. Buffer I/O and buffer pools function the same.

In-Memory Area的資料不需要在buffer cache中緩存

Objects populated in the IM column store do not also need to be loaded into the buffer cache.

You can configure all or a subset of a database object's columns for population in the IM column store. Similarly, for a partitioned table or materialized view, you can configure all or a subset of the partitions for population.Use the INMEMORY clause in DDL statements to enable the IM column store at any of the following levels:

Column (nonvirtual or virtual)

Table, materialized view, or partition

Tablespace

The following figure shows a sample IM column store. The database stores the sh.sales table on disk in traditional row format. The SGA stores the data in columnar format in the IM column store, and in row format in the database buffer cache.

The columnar format does not affect the format of data stored in data files or in the buffer cache, nor does it affect undo data and online redo logging.

The database processes DML modifications in the same way, regardless of whether the IM column store is enabled, by updating the buffer cache, online redo log, and undo tablespace. However, the database uses an internal mechanism to track changes and ensure that the IM column store is consistent with the rest of the database. For example, if the sales table is populated in the IM column store, and if an application updates a row in sales, then the database automatically keeps the copy of the sales table in the IM column store transactionally consistent. A query that accesses the IM column store always returns the same results for a query that accesses the buffer cache.

(二)In-Memory Storage Units 記憶體存儲結構

The IM column store manages both data and metadata in optimized storage units, not in traditional Oracle data blocks. 

Oracle Database maintains the storage units in the In-Memory Area.

In-Memory Area又細分為兩個子池(均由oracle自動決定使用大小):

  1. The columnar data pool:由IMCUs和IMEUs組成,用于存儲實際資料

The columnar pool of the In-Memory Area stores the actual data: IMCUs and IMEUs. 

V$INMEMORY_AREA的POOL列名為"1MB POOL" 即為columnar data pool

  1. The metadata pool:由SMUs組成

This subpool stores metadata about the objects that reside in the IM column store.

V$INMEMORY_AREA的POOL列名為"64KB POOL"即為metadata pool

Figure 2-5 IM Column Store: Memory and Process Architecture

  1. In-Memory Compression Units (IMCUs)

An In-Memory Compression Unit (IMCU) is a compressed, read-only storage unit that contains data for one or more columns. An IMCU is analogous to a tablespace extent.

可以不用解壓直接對IMCU進行過濾,過濾後擷取資料時再解壓。

注意IMCU是隻讀結構,不能對資料進行更改

The columnar format enables queries to execute directly against the compressed columns.

Compression enables scanning and filtering operations to process a much smaller amount of data, which optimizes query performance. Oracle Database only decompresses data when it is required for the result set.

The compression applied in the IM column store is closely related to Hybrid Columnar Compression. Both technologies process column vectors. The primary difference is that the column vectors for the IM column store are optimized for SIMD vector processing, whereas the column vectors for Hybrid Columnar Compression are optimized for disk storage.

When you enable an object for population into the IM column store, you specify the type of compression in the INMEMORY clause: FOR DML, FOR QUERY (LOW or HIGH), FOR CAPACITY (LOW or HIGH), or NONE. 如:

ALTER TABLE sh.sales INMEMORY MEMCOMPRESS FOR QUERY LOW;

ALTER TABLE sh.sales INMEMORY MEMCOMPRESS FOR QUERY LOW NO INMEMORY (promo_id, quantity_sold, amount_sold);

同一IMCU隻能存同一對象的列資料(類似表空間的extent概念),對象中指定INMEMORY的所有列同時存于相同的IMCU中。一個IMCU中存儲多少行資料稱為granule,它由Oracle自動決定,同一對象的所有IMCU的granule相同

The IM column store stores data for a single object (table, partition, materialized view) in a set of IMCUs. An IMCU stores columnar data for one and only one object.

Each IMCU contains all column values (including nulls) for a subset of rows in a table segment. A subset of rows is called a granule. All IMCUs for a given segment contain approximately the same number of rows. Oracle Database determines the size of a granule automatically depending on data type, data format, and compression type. A higher compression level results in more rows in the IMCU.

The number of rows in an IMCU dictates the amount of space an IMCU consumes. If the target number of rows causes an IMCU to grow beyond the amount of contiguous 1 MB extents available in the 1 MB pool, then the IMCU creates additional extents (pieces) to hold the remaining column CUs. An IMCU always allocates space in 1 MB increments.

IMCU與blocks是一對多的關系,即一個IMCU可以存儲多個blocks資料,存于IMCU的資料并不會被排序(按讀入順序存儲),另外IMCU中列順序同記錄順序

A one-to-many mapping exists between an IMCU and a set of database blocks. Each IMCU stores the values for columns for a different set of blocks.The columns in an IMCU are not sorted. Oracle Database populates them in the order that they are read from disk.

Example 2-2 IMCUs and Row Subsets

In this simplified example, only the following 4 columns of the customers table have the INMEMORY attribute: cust_id, cust_first_name, cust_last_name, and cust_gender. Only 5 rows exist in the table, stored in 2 data blocks. Conceptually, the first data block stores its rows as follows:

82,Madeline,Li,F;37004,Abel,Embrey,M;1714,Hardy,Gentle,M

The second data block stores rows as follows:

100439,Uma,Campbell,F;3047,Lucia,Downey,F

Assume IMCU 1 stores the data for the first data block. In this case, the cust_id column values for the 3 rows in this data block stores are stored “vertically” within a CU as follows:

82

37004

1714

IMCU 2 stores the data from the second data block. The cust_id column values for these 2 rows are stored within a CU as follows:

100439

3047

Because the cust_id value is the first value for each row in the data block, the cust_id column is in the first position within the IMCU. Columns always occupy the same position, so Oracle Database can reconstruct the rows by reading the IMCUs for a segment.

An IMCU has two parts: a set of Column Compression Units (CUs), and a header that contains metadata such as the IM storage index.

1.1 Column Compression Units

CU用于存儲IMCU中每列的資料,它又可分解為body and header.

A Column Compression Unit (CU) is contiguous storage for a single column in an IMCU. Every IMCU has one or more CUs. A CU is divided into a body and a header. The body of every CU stores the column values for the range of rows included in the IMCU. The header contains metadata about the values stored in the CU body, for example, the minimum and maximum value within the CU. It may also contain a local dictionary, which is a sorted list of the distinct values in that column and their corresponding dictionary codes.

The following figure shows an IMCU with 4 CUs for the sales table: prod_id, cust_id, time_id, and channel_id.

每個CU都是按rowid順序存儲的,即不同CU的相同位置資料為相同行記錄

The CUs store values in rowid order. For this reason, the database can answer queries by “stitching” the rows back together. For example, an application issues the following query:

SELECT cust_id, time_id, channel_id

FROM   sales

WHERE  prod_id =5;

The database begins by scanning the prod_id column for entries with the value 5. Assume that the database finds 5 in position two in the prod_id column. The database now must find the corresponding cust_id, time_id, and channel_id for this row.

Because the CUs store data in rowid order, the database can find the corresponding cust_id, time_id, and channel_id values in position 2 in those columns. Thus, to answer the query, the database must extract the values from position 2 in the cust_id, time_id, and channel_id columns, and then stitch the row back together to return it to the end user.

In a CU, the local dictionary has a list of distinct values and their corresponding dictionary codes.

The local dictionary stores the symbol contained in the column. The following figure illustrates how a CU stores a name column in a vehicles table.

Figure 2-8 Local Dictionary

In the preceding figure, the CU contains only 7 rows. Every distinct value in this CU, such as Cadillac or Audi, is assigned a different dictionary code, such as 2 for Cadillac and 0 for Audi. The CU stores the dictionary code rather than the original value.

Note: When the database uses a common dictionary for a join group, the local dictionary contains references to the common dictionary rather than the symbols. For example, rather than storing the values Audi, BWM, and Cadillac for the vehicles.name column, the local dictionary stores dictionary codes such as 101, 220, and 66.

如果表列使用了common dictionary,它的dictionary code使用common dictionary的code

The CU header contains the minimum and maximum values for the column. In this example, the minimum value is Audi and the maximum value is Cadillac. The local dictionary stores the list of distinct values: Audi, BMW, and Cadillac. Their corresponding dictionary codes (0, 1, and 2) are implicit. The local dictionary for a CU in each IMCU is independent of the local dictionaries in other IMCUs.

If a query filters on Audi automobiles, then the database scans this IMCU for only 0 codes.

1.2 In-Memory Storage Indexes

用于維護In-Memory Storage Indexes ,它用于存儲此IMCU中所有列各自的最大最小值

Every IMCU header automatically creates and manages In-Memory Storage Indexes (IM storage indexes) for its CUs. An IM storage index stores the minimum and maximum for all columns within the IMCU.

For example, sales is populated in the IM column store. Every IMCU for this table has all columns. The sales.prod_id column is stored in a separate CU within every IMCU. The IMCU header has the minimum and maximum values of each prod_id CU (and every other CU).

To eliminate unnecessary scans, the database can perform IMCU pruning based on SQL filter predicates. The database scans only the IMCUs that satisfy the query predicate, as shown in the WHERE prod_id > 14 AND prod_id < 29 example in the following graphic.

Figure 2-9 Storage Index for Columnar Data

  1. Snapshot Metadata Units (SMUs)

快照中繼資料單元(SMU)包含相關IMCU的中繼資料和事務資訊

A Snapshot Metadata Unit (SMU) contains metadata and transactional information for an associated IMCU.This figure shows IMCUs in the data pool, and SMUs in the metadata pool.

IMCU與SMU一一對應

Every IMCU maps to a separate SMU. Thus, if the columnar data pool contains 100 IMCUs, then the metadata pool contains 100 SMUs. The SMUs store several types of metadata for their associated IMCUs, including the following:

Object numbers

Column numbers

Mapping information for rows

如果buffer cache中修改了IM相關資料,會在SMU的事務日志中記錄更改的rowid、DML操作的SCN,并辨別出相應哪個IMCU已過期,如果有查詢用到此IMCU會直接從buffer cache中取資料。詳細population/repopulation過程見文檔《Repopulation of the IM Column Store》

Every SMU contains a transaction journal. The database uses the transaction journal to keep the IMCU transactionally consistent.

The database uses the buffer cache to process DML, just as when the IM column store is not enabled. For example, an UPDATE statement might modify a row in an IMCU. In this case, the database adds the rowid for the modified row to the transaction journal and marks it stale as of the SCN of the DML statement. If a query needs to access the new version of the row, then the database obtains the row from the database buffer cache.

The database achieves read consistency by merging the contents of the column, transaction journal, and buffer cache. When the IMCU is refreshed during repopulation, queries can access the up-to-date row directly from the IMCU.

Figure 2-11 Transaction Journal

  1. In-Memory Expression Units (IMEUs)

IMEU用于包含IM expression虛拟列及表虛拟列的資料,它與IMCU一一對應。

An In-Memory Expression Unit (IMEU) is a storage container for materialized In-Memory Expressions (IM expressions) and user-defined virtual columns.

  1. 表的虛拟列是建立表時指定的計算表達式,如create table t1(id number, vir1 GENERATED ALWAYS AS (id*2) VIRTUAL);
  2. IM expression是查詢語句中使用到的計算表達式,Oracle通過ESS自動選出hot expression,并将它們生成名字字首為SYS_IME的虛拟列。

Conceptually, an IMEU is a logical extension of its parent IMCU. Just as an IMCU can contain multiple columns, an IMEU can contain multiple virtual columns. Every IMEU maps to exactly one IMCU, mapping to the same row set. The IMEU contains expression results for the data contained in its associated IMCU. By default, the IMEU inherits the INMEMORY clause properties, including Oracle Real Application Clusters (Oracle RAC) properties such as DISTRIBUTE and DUPLICATE, from the base segment. You can selectively enable or disable virtual columns for storage in IMEUs. You can also specify compression levels for different columns.

在IMCU populate/repopulate同時會對IMEU資料populate/repopulate,但同時IMCU可以單獨進行populate及repopulate,詳見程序部分說明

When the IMCU is populated, the associated IMEU is also populated.

A typical IM expression involves one or more columns, possibly with constants, and has a one-to-one mapping with the rows in the table. For example, an IMCU for an employees table contains rows 1–1000 for the column weekly_salary. For the rows stored in this IMCU, the IMEU calculates the automatically detected IM expression weekly_salary*52, and the user-defined virtual column quarterly_salary defined as weekly_salary*12. The 3rd row down in the IMCU maps to the 3rd row down in the IMEU.

(三)Expression Statistics Store (ESS)  表達式統計資訊

ESS即IM expression統計資訊庫,存于資料字典中,用于辨別expression的權重,相關資料字典DBA_EXPRESSION_STATISTICS

The Expression Statistics Store (ESS) is a repository maintained by the optimizer to store statistics about expression evaluation. The ESS resides in the SGA and persists on disk.

When an IM column store is enabled, the database leverages the ESS for its In-Memory Expressions (IM expressions) feature. However, the ESS is independent of the IM column store. The ESS is a permanent component of the database and cannot be disabled.

The database uses the ESS to determine whether an expression is “hot” (frequently accessed), and thus a candidate for an IM expression. During a hard parse of a query, the ESS looks for active expressions in the SELECT list, WHERE clause, GROUP BY clause, and so on.

For each segment, the ESS maintains expression statistics such as the following:

Frequency of execution

Cost of evaluation

Timestamp evaluation

The optimizer assigns each expression a weighted score based on cost and the number of times it was evaluated. The values are approximate rather than exact. More active expressions have higher scores. The ESS maintains an internal list of the most frequently accessed expressions.

Control the behavior of IM expressions using the DBMS_INMEMORY_ADMIN package. For example, the IME_CAPTURE_EXPRESSIONS procedure prompts the database to identify and gradually populate the hottest expressions in the database. The IME_POPULATE_EXPRESSIONS procedure forces the database to populate the expressions immediately.

ESS information is stored in the data dictionary and exposed in the DBA_EXPRESSION_STATISTICS view. This view shows the metadata that the optimizer has collected in the ESS. IM expressions are exposed as system-generated virtual columns, prefixed by the string SYS_IME, in the DBA_IM_EXPRESSIONS view.

(四)In-Memory Process Architecture 相關程序

In response to queries and DML, server processes scan columnar data and update SMU metadata. Background processes populate row data from disk into the IM column store.

  1. In-Memory Coordinator Process (IMCO)

IMCO程序主要用于管理population與repopulation作業。IMCO程序預設每2min喚醒一次對所有變動的IMCU進行repopulation,不論它是否達到閥值

The In-Memory Coordinator Process (IMCO) manages many tasks for the IM column store. Its primary task is to initiate background population and repopulation of columnar data.

  1. Space Management Worker Processes (Wnnn)

Wnnn是IMCO實際的工作程序

Space Management Worker Processes (Wnnn) populate or repopulate data on behalf of IMCO.

During population, Wnnn processes are responsible for creating IMCUs, SMUs, and IMEUs.

The INMEMORY_MAX_POPULATE_SERVERS initialization parameter controls the maximum number of worker processes that can be started for population. The INMEMORY_TRICKLE_REPOPULATE_PERCENT initialization parameter controls the maximum percentage of time that worker processes can perform trickle repopulation.

(五)CPU Architecture: SIMD Vector Processing

For data that does need to be scanned in the IM column store, the database uses SIMD (single instruction, multiple data) vector processing.

The IM column store maximizes the number of column entries that the CPU can load into the vector registers(向量寄存器) and evaluate. Instead of evaluating each entry in the column one at a time, the database evaluates a set of column values in a single CPU instruction. SIMD vector processing enables the database to scan billions of rows per second.

For example, an application issues a query to find the total number of orders in the sales table that use the promo_id value of 9999. The sales table resides in the IM column store. The query begins by scanning only the sales.promo_id column, as shown in the following diagram:

Figure 2-12 SIMD Vector Processing

The CPU evaluates the data as follows:

  1. Loads the first 8 values (the number varies depending on data type and compression mode) from the promo_id column into the SIMD register, and then compares them with the value 9999 in a single instruction
  2. Discards the entries.
  3. Loads another 8 values into the SIMD register, and then continues in this way until it has evaluated all entries.

For example, suppose a user executes the following ad hoc query:

SELECT cust_id, time_id, channel_id

FROM sales

WHERE prod_id BETWEEN 14 and 29

When using the buffer cache, the database would typically scan an index to find the product IDs, use the rowids to fetch the rows from disk into the buffer cache, and then discard the unwanted column values. Scanning data in row format in the buffer cache requires many CPU instructions, and can result in suboptimal CPU efficiency.

When using the IM column store, the database can scan only the requested sales columns, avoiding disk altogether. Scanning data in columnar format pipelines only necessary columns to the CPU, increasing efficiency. Each CPU core scans local in-memory columns using SIMD vector instructions.

注:SIMD(Single Instruction Multiple Data)為單指令多資料流,能夠複制多個操作數,并把它們打包在大型寄存器的一組指令集。以加法指令為例,單指令單資料(SISD)的CPU對加法指令譯碼後,執行部件先通路記憶體,取得第一個操作數;之後再一次通路記憶體,取得第二個操作數;随後才能進行求和運算。而在SIMD型的CPU中,指令譯碼後幾個執行部件同時通路記憶體,一次性獲得所有操作數進行運算。這個特點使SIMD特别适合于多媒體應用等資料密集型運算