
In-place update in WiredTiger

There is a great new feature in the release notes for MongoDB 3.5.12:

Faster In-place Updates in WiredTiger. This work brings improvements to in-place update workloads for users running the WiredTiger engine, especially for updates to large documents. Some workloads may see a reduction of up to 7x in disk utilization (from 24 MB/s to 3 MB/s) as well as a 20% improvement in throughput.

I assumed WiredTiger had implemented the <code>delta page</code> feature introduced in the Bw-tree paper, that is, writing pages that are deltas from previously written pages. But after reading the source code, I found it is a totally different idea: <code>in-place update</code> only affects the in-memory and journal formats; the on-disk layout of the data is unchanged.

I will explain the core of the <code>in-place update</code> implementation.

MongoDB introduced <code>mutable bson</code> to describe a document update as an <code>incremental (delta) update</code>.

Mutable BSON provides classes to facilitate the manipulation of existing BSON objects or the construction of new BSON objects from scratch in an incremental fashion.

Suppose you have a very large document, say 1 MB.

If the fightvalue is changed from 100 to 101, you can use a <code>DamageEvent</code> to describe the update: it records <code>the offset, size, and content (kept in a separate array)</code> of the change.

So if you make many small changes to a document, you end up with an array of <code>DamageEvent</code> entries. MongoDB added a new storage interface that accepts a <code>DamageEvent array (DamageVector)</code>.
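To make the idea concrete, here is a minimal sketch of a damage event and a damage vector applied to a large buffer. The <code>DamageEvent</code> fields and the <code>apply_damages</code> helper are simplified, hypothetical stand-ins based only on the description above (offset, size, content kept in a separate array), not MongoDB's actual C++ definitions, and the buffer is plain bytes rather than real BSON.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DamageEvent:
    """One changed byte range (hypothetical, simplified layout)."""
    source_offset: int  # where the new bytes start in the shared source buffer
    target_offset: int  # where they are written in the document
    size: int           # number of bytes overwritten


def apply_damages(doc: bytearray, source: bytes, damages: List[DamageEvent]) -> None:
    """Overwrite each damaged range of doc in place; the length never changes."""
    for d in damages:
        new_bytes = source[d.source_offset:d.source_offset + d.size]
        doc[d.target_offset:d.target_offset + d.size] = new_bytes


# Stand-in for a large document: 1 MB of padding with a 4-byte
# little-endian counter stored at offset 16 (an arbitrary choice).
doc = bytearray(1024 * 1024)
doc[16:20] = (100).to_bytes(4, "little")

# The update 100 -> 101 is described by 4 bytes of content plus one event.
source = (101).to_bytes(4, "little")
damage_vector = [DamageEvent(source_offset=0, target_offset=16, size=4)]
apply_damages(doc, source, damage_vector)
```

The point of the sketch is that the description of the change is a few bytes long, no matter how large the document is.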

WiredTiger added a new update type called <code>WT_UPDATE_MODIFIED</code> to support this. When a <code>WT_UPDATE_MODIFIED</code> update happens, WiredTiger first writes a <code>change list</code>, converted from the DamageVector, to the journal, then keeps the change list in memory, associated with the original record.

When the record is read, WiredTiger first reads the original record, then applies every operation in the <code>change list</code> and returns the final record to the client.
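The read path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not WiredTiger's code: <code>Modify</code>, <code>apply_change_list</code>, and <code>read_record</code> are hypothetical names, each change is modeled as "replace <code>size</code> bytes at <code>offset</code> with <code>data</code>", and the record is a JSON-like byte string standing in for BSON.

```python
from typing import List, NamedTuple


class Modify(NamedTuple):
    """One entry of a change list (hypothetical model)."""
    offset: int  # byte offset in the record
    size: int    # number of bytes being replaced
    data: bytes  # replacement bytes


def apply_change_list(value: bytes, change_list: List[Modify]) -> bytes:
    """Return a new value with every change in the list applied in order."""
    buf = bytearray(value)
    for m in change_list:
        buf[m.offset:m.offset + m.size] = m.data
    return bytes(buf)


def read_record(base: bytes, update_chain: List[List[Modify]]) -> bytes:
    """Start from the original record and replay change lists, oldest first."""
    value = base
    for change_list in update_chain:
        value = apply_change_list(value, change_list)
    return value


base = b'{"name":"x","fightvalue":100}'
pos = base.index(b"100")
# Two successive small updates, each stored as its own change list.
chain = [[Modify(pos, 3, b"101")], [Modify(pos, 3, b"102")]]
result = read_record(base, chain)
```

In this model the original record is never rewritten; a reader pays a small cost to replay the deltas, in exchange for much smaller writes.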

So the core of <code>in-place update</code> is:

WiredTiger supports <code>delta update</code> in memory and in the journal, so the I/O for writing the journal is greatly reduced for large documents.

WiredTiger's on-disk data layout is unchanged, so the I/O for writing data does not change.
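A back-of-the-envelope comparison shows why the journal saving can be large. The helper names and the 16-byte per-entry overhead below are assumptions chosen purely for illustration; they are not WiredTiger's real log record format or measured numbers.

```python
from typing import Iterable


def journal_bytes_full(doc_size: int) -> int:
    """Journal payload when the whole new document is logged."""
    return doc_size


def journal_bytes_delta(change_sizes: Iterable[int], per_entry_overhead: int = 16) -> int:
    """Journal payload when only the changed ranges are logged.

    per_entry_overhead is an assumed cost for storing each entry's
    offset and size alongside its content.
    """
    return sum(size + per_entry_overhead for size in change_sizes)


# One 4-byte change to a 1 MB document.
full = journal_bytes_full(1024 * 1024)
delta = journal_bytes_delta([4])
```

Under these assumptions, the full-document log entry carries 1,048,576 bytes of payload while the delta entry carries 20, which is the effect described in the first point above. The data files see no such saving because pages are still written in the same format.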