
Problems with frequent bulk inserts into a MongoDB Replica Set

The problem

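I am running the following test: a Replica Set with 3 members, all running on the same machine on 3 different ports. A PHP script keeps inserting documents in batches of 10,000, for a total of roughly 100 million documents.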
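The shape of this test loop can be sketched roughly as follows. The original script is PHP and is not shown, so this Python version is only an illustration: `insert_batch` is a hypothetical callable standing in for whatever driver call performs the bulk insert, and the document layout `{"n": ...}` is made up.

```python
import time
from typing import Callable, Iterator, List


def batch_sizes(total: int, batch_size: int) -> Iterator[int]:
    """Yield the size of each batch needed to insert `total` documents."""
    remaining = total
    while remaining > 0:
        size = min(batch_size, remaining)
        yield size
        remaining -= size


def run_insert_test(total: int, batch_size: int,
                    insert_batch: Callable[[List[dict]], None]) -> List[float]:
    """Insert `total` documents in batches; return the seconds each batch took."""
    timings = []
    next_id = 0
    for size in batch_sizes(total, batch_size):
        # Placeholder documents; the real test data is not given in the post.
        docs = [{"n": next_id + i} for i in range(size)]
        next_id += size
        started = time.perf_counter()
        insert_batch(docs)  # hypothetical: replace with a real driver call
        elapsed = time.perf_counter() - started
        timings.append(elapsed)
        print(f"mongo insert in {elapsed} Secs.")
    return timings
```

Passing a do-nothing function such as `lambda docs: None` exercises the loop without a database; timing each batch separately is what makes the slow batches stand out.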

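During the inserts, both SECONDARY members went into the Recovering state with the error: "errmsg" : "error RS102 too stale to catch up". In addition, at around 70 million inserted documents (which does not mean the problem only started after 70 million), the insert speed became very unstable: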

>mongo insert in 2.1298868656158 Secs. memory 164.25 MB

>mongo insert in 61.71651506424 Secs. memory 164.25 MB

>mongo insert in 2.6970100402832 Secs. memory 164.25 MB

>mongo insert in 58.021383047104 Secs. memory 164.5 MB

>mongo insert in 43.900979042053 Secs. memory 165 MB

>mongo insert in 26.911059856415 Secs. memory 164 MB

>mongo insert in 49.29422211647 Secs. memory 164.25 MB

How should this be handled?

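Fortunately, the official document Resyncing a Very Stale Replica Set Member points to the cause: the OPLOG (short for operation log). The OPLOG is the system collection used to synchronize data between the PRIMARY and the SECONDARY members of a Replica Set. Its size is capped. On this 64-bit machine the default came to ~19 GB (19616.9029296875MB); the default is derived from free disk space, so it varies by machine. You can see it with db.printReplicationInfo():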

configured oplog size: 19616.9029296875MB (size of the OPLOG)

log length start to end: 15375secs (4.27hrs) (time between the oldest and the newest operation in the OPLOG)

oplog first event time: Thu Jul 07 2011 21:03:29 GMT+0800 (CST)

oplog last event time: Fri Jul 08 2011 01:19:44 GMT+0800 (CST)

now: Thu Jul 07 2011 17:20:16 GMT+0800 (CST)
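The "log length start to end" value is simply the difference between the two event times, which can be checked with a few lines of Python. This is only a sanity check on the numbers printed above; the function names are made up for this sketch, and the parser assumes the exact timestamp layout shown in this output.

```python
from datetime import datetime


def parse_shell_time(text: str) -> datetime:
    """Parse a timestamp such as 'Thu Jul 07 2011 21:03:29 GMT+0800 (CST)'."""
    without_zone_name = text.split(" (")[0]  # drop the trailing '(CST)'
    return datetime.strptime(without_zone_name, "%a %b %d %Y %H:%M:%S GMT%z")


def oplog_window_seconds(first_event: str, last_event: str) -> int:
    """Seconds between the oldest and the newest oplog event."""
    delta = parse_shell_time(last_event) - parse_shell_time(first_event)
    return int(delta.total_seconds())


first = "Thu Jul 07 2011 21:03:29 GMT+0800 (CST)"
last = "Fri Jul 08 2011 01:19:44 GMT+0800 (CST)"
seconds = oplog_window_seconds(first, last)
print(f"{seconds}secs ({seconds / 3600:.2f}hrs)")  # prints "15375secs (4.27hrs)"
```

This window is roughly how long a SECONDARY can fall behind before the operations it still needs have been overwritten.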

For more detail on what these values mean, see the mongo_vstudio.cpp source; the relevant code is JavaScript:

https://github.com/mongodb/mongo/blob/master/shell/mongo_vstudio.cpp

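When the PRIMARY handles a large number of operations, a correspondingly large number of documents is inserted into the OPLOG. Each document is one operation: an insert (i), an update (u), or a delete (d).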

test:PRIMARY> db.oplog.rs.find()

{ "ts" : { "t" : 1310044124000, "i" : 11035 }, "h" : NumberLong("-2807175333144039203"), "op" : "i", "ns" : "cas_v2.cas_user_flat", "o" : { "_id" : ObjectId("4e15afdb1d6988397e0c6612"), … } }

{ "ts" : { "t" : 1310044124000, "i" : 11036 }, "h" : NumberLong("5285197078463590243"), "op" : "i", "ns" : "cas_v2.cas_user_flat", "o" : { "_id" : ObjectId("4e15afdb1d6988397e0c6613"), … } }

ts: the time this operation occurred.

h: a unique ID for this operation. Each operation will have a different value in this field.

op: the write operation that should be applied to the slave. n indicates a no-op, this is just an informational message.

ns: the database and collection affected by this operation. Since this is a no-op, this field is left blank.

o: the actual document representing the op. Since this is a no-op, this field is pretty useless.
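To get a feel for what is filling the OPLOG, entries can be tallied by their op field. The sketch below works on plain dictionaries shaped like the entries above; the sample data is invented for illustration, and fetching real entries from the oplog collection is left out.

```python
from collections import Counter
from typing import Dict, Iterable

# Operation codes named in the text above.
OP_NAMES = {"i": "insert", "u": "update", "d": "delete", "n": "no-op"}


def count_operations(entries: Iterable[dict]) -> Dict[str, int]:
    """Count oplog entries by operation type; unknown codes go to 'other'."""
    counts = Counter(OP_NAMES.get(entry.get("op"), "other") for entry in entries)
    return dict(counts)


# Invented sample entries, reduced to the fields this function reads.
sample_entries = [
    {"op": "i", "ns": "cas_v2.cas_user_flat"},
    {"op": "i", "ns": "cas_v2.cas_user_flat"},
    {"op": "n", "ns": ""},
]
print(count_operations(sample_entries))  # prints {'insert': 2, 'no-op': 1}
```

In a bulk-insert test like this one, nearly every entry would be an insert, which is why the OPLOG fills so quickly.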

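Because the OPLOG has a size limit, its oldest entries are overwritten as new ones arrive, and the SECONDARY members may be unable to keep up with the rate of inserts on the PRIMARY. When that happens, rs.status() reports the error "error RS102 too stale to catch up".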

If this occurs, the slave will start giving error messages about needing to be resynced. It can’t catch up to the master from the oplog anymore: it might miss operations between the last oplog entry it has and the master’s oldest oplog entry. It needs a full resync at this point.
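The condition described in this quote can be written as a single comparison. This is a simplified model, not the server's actual check: timestamps are represented as (t, i) tuples like the ts field shown earlier, and the function name is made up.

```python
from typing import Tuple

Timestamp = Tuple[int, int]  # (t, i), mirroring the "ts" field of an oplog entry


def is_too_stale(secondary_last_applied: Timestamp,
                 primary_oldest_entry: Timestamp) -> bool:
    """True when the operations the secondary still needs may already be gone.

    If the secondary's last applied operation is older than the oldest entry
    still present in the primary's oplog, there can be a gap between the two
    that the oplog can no longer fill.
    """
    return secondary_last_applied < primary_oldest_entry


# The secondary stopped at increment 11035, but the primary's oplog
# now starts at 11036 or later: nothing is missing yet.
print(is_too_stale((1310044124000, 11035), (1310044124000, 11035)))  # False
# The primary's oldest entry is newer than anything the secondary has.
print(is_too_stale((1310044124000, 11035), (1310044999000, 1)))  # True
```

Tuples compare element by element, so the t value decides first and i only breaks ties within the same t.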

Solution:

Resyncing a Very Stale Replica Set Member describes what to do when you run into Error RS102. You can also follow Increasing the OpLog Size in Halted Replication and set the OPLOG to a more suitable size.

This indicates that you’re adding data to the database at a rate of 524MB/hr. If an initial clone takes 10 hours, then the oplog should be at least 5240MB, so something closer to 8GB would make for a safe bet.
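The arithmetic in this quote is just write rate multiplied by the time an initial clone takes. A small sketch, using only the figures quoted above (the function name is made up):

```python
def minimum_oplog_mb(write_rate_mb_per_hour: float,
                     initial_clone_hours: float) -> float:
    """Smallest oplog (in MB) that still holds every write made during a clone."""
    return write_rate_mb_per_hour * initial_clone_hours


minimum = minimum_oplog_mb(524, 10)
print(minimum)  # prints 5240

# The quote suggests rounding up generously, for example to 8 GB.
suggested_mb = 8 * 1024
print(suggested_mb >= minimum)  # prints True
```

The margin above the minimum matters because the write rate is rarely constant, and a burst of inserts shortens the window the OPLOG covers.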

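In the end, while the inserts were still running, I removed the 2 SECONDARY members with rs.remove(), and the insert speed went back to normal. What remains is to resync the SECONDARY members once the inserts have finished.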

>mongo insert in 0.62605094909668 Secs. memory 164.25 MB

>mongo insert in 0.63488984107971 Secs. memory 164 MB

>mongo insert in 0.64394617080688 Secs. memory 164.25 MB

>mongo insert in 0.61102414131165 Secs. memory 164 MB

>mongo insert in 0.64304113388062 Secs. memory 164.25 MB

Some explanations

Replica Set member state codes:

0 Starting up, phase 1

1 Primary

2 Secondary

3 Recovering

4 Fatal error

5 Starting up, phase 2

6 Unknown state

7 Arbiter

8 Down
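When checking member states from a script, the table above can be kept as a lookup. In the sketch below the member names and the input mapping are hypothetical; how the state codes are obtained from rs.status() is not shown.

```python
from typing import Dict, List

# State codes exactly as listed above.
MEMBER_STATES = {
    0: "Starting up, phase 1",
    1: "Primary",
    2: "Secondary",
    3: "Recovering",
    4: "Fatal error",
    5: "Starting up, phase 2",
    6: "Unknown state",
    7: "Arbiter",
    8: "Down",
}

# States in which a member is doing its normal job.
NORMAL_STATES = {1, 2, 7}


def describe_state(code: int) -> str:
    """Return the description for a state code."""
    return MEMBER_STATES.get(code, f"Unrecognized state code {code}")


def members_needing_attention(states: Dict[str, int]) -> List[str]:
    """Names of members that are not Primary, Secondary, or Arbiter."""
    return [name for name, code in states.items() if code not in NORMAL_STATES]


# Hypothetical members, matching the situation described in this post:
# one PRIMARY and two SECONDARY members stuck in Recovering.
observed = {"member-a": 1, "member-b": 3, "member-c": 3}
print(members_needing_attention(observed))  # prints ['member-b', 'member-c']
print(describe_state(3))  # prints Recovering
```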

References

  • http://www.snailinaturtleneck.com/blog/2010/10/14/getting-to-know-your-oplog/
  • http://www.snailinaturtleneck.com/blog/2010/10/12/replication-internals/
  • http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
  • http://www.mongodb.org/display/DOCS/Halted+Replication
