Problems with frequent large inserts into a MongoDB Replica Set

The problem

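I ran the following test: a Replica Set with 3 members, all running on the same machine on 3 different ports. A PHP script inserts documents continuously, 10,000 per batch, for a total of roughly 100 million documents.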

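During the inserts, both SECONDARY members went into the Recovering state with the error "errmsg" : "error RS102 too stale to catch up". Around the 70 million document mark (which does not mean the problem only began there), I also noticed that the insert speed had become very unstable: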

>mongo insert in 2.1298868656158 Secs. memory 164.25 MB

>mongo insert in 61.71651506424 Secs. memory 164.25 MB

>mongo insert in 2.6970100402832 Secs. memory 164.25 MB

>mongo insert in 58.021383047104 Secs. memory 164.5 MB

>mongo insert in 43.900979042053 Secs. memory 165 MB

>mongo insert in 26.911059856415 Secs. memory 164 MB

>mongo insert in 49.29422211647 Secs. memory 164.25 MB
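
The test loop described above can be sketched roughly as follows. The original test was written in PHP; this is a Python approximation, and `timed_batches` and `insert_many` are hypothetical names (with a real driver, `insert_many` would wrap the collection's bulk-insert call). A plain list stands in for the collection so the sketch runs without a server.

```python
import time

def timed_batches(insert_many, total_docs, batch_size=10000):
    """Insert total_docs documents in batches; return seconds spent per batch.

    insert_many is a hypothetical callable that accepts a list of documents.
    """
    timings = []
    for start in range(0, total_docs, batch_size):
        end = min(start + batch_size, total_docs)
        batch = [{"seq": n} for n in range(start, end)]
        began = time.perf_counter()
        insert_many(batch)
        timings.append(time.perf_counter() - began)
    return timings

# Stand-in for a real collection: it only remembers what was "inserted".
stored = []
timings = timed_batches(stored.extend, total_docs=25000)
for seconds in timings:
    print("mongo insert in %s Secs." % seconds)
```

Logging the time per batch like this is what makes the jump from a few seconds to nearly a minute visible.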

How to deal with it?

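Fortunately, the official document Resyncing a Very Stale Replica Set Member points to the cause: the OPLOG (short for operation log). The OPLOG is the system COLLECTION used to synchronize data between the PRIMARY and the SECONDARY members of a Replica Set. It is a capped collection, so its size has an upper limit. On 64-bit systems the default is typically about 5% of free disk space; on this machine that came to ~19 GB (19616.9029296875MB), as db.printReplicationInfo() shows: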

configured oplog size: 19616.9029296875MB (the size of the OPLOG)

log length start to end: 15375secs (4.27hrs) (the time span between the oldest and the newest operation in the OPLOG)

oplog first event time: Thu Jul 07 2011 21:03:29 GMT+0800 (CST)

oplog last event time: Fri Jul 08 2011 01:19:44 GMT+0800 (CST)

now: Thu Jul 07 2011 17:20:16 GMT+0800 (CST)

For a more detailed explanation of the fields above, see the source code in mongo_vstudio.cpp (the helper itself is written in JavaScript):

https://github.com/mongodb/mongo/blob/master/shell/mongo_vstudio.cpp
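
As a quick check, the "log length start to end" figure can be reproduced from the first and last event times in the output above. This is a minimal sketch; `parse_shell_time` and `oplog_window_seconds` are hypothetical helper names, and the parser assumes the timestamp layout printed by the shell in this post.

```python
from datetime import datetime, timedelta, timezone

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

def parse_shell_time(text):
    """Parse a timestamp such as 'Thu Jul 07 2011 21:03:29 GMT+0800 (CST)'."""
    _weekday, month, day, year, clock, zone = text.split()[:6]
    hour, minute, second = (int(part) for part in clock.split(":"))
    # zone looks like 'GMT+0800': a sign, two hour digits, two minute digits
    sign = -1 if zone[3] == "-" else 1
    offset = sign * timedelta(hours=int(zone[4:6]), minutes=int(zone[6:8]))
    return datetime(int(year), MONTHS[month], int(day),
                    hour, minute, second, tzinfo=timezone(offset))

def oplog_window_seconds(first_event, last_event):
    """Seconds between the oldest and the newest operation in the oplog."""
    delta = parse_shell_time(last_event) - parse_shell_time(first_event)
    return int(delta.total_seconds())

window = oplog_window_seconds(
    "Thu Jul 07 2011 21:03:29 GMT+0800 (CST)",
    "Fri Jul 08 2011 01:19:44 GMT+0800 (CST)")
print(window, round(window / 3600, 2))  # 15375 4.27
```

The window is how far behind a SECONDARY can fall before the entries it still needs are gone from the OPLOG.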

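When the PRIMARY handles a large number of operations, a correspondingly large number of documents is written to the OPLOG. Each document is one operation: an insert (i), an update (u), or a delete (d).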

test:PRIMARY> db.oplog.rs.find()

{ "ts" : { "t" : 1310044124000, "i" : 11035 }, "h" : NumberLong("-2807175333144039203"), "op" : "i", "ns" : "cas_v2.cas_user_flat", "o" : { "_id" : ObjectId("4e15afdb1d6988397e0c6612"), … } }

{ "ts" : { "t" : 1310044124000, "i" : 11036 }, "h" : NumberLong("5285197078463590243"), "op" : "i", "ns" : "cas_v2.cas_user_flat", "o" : { "_id" : ObjectId("4e15afdb1d6988397e0c6613"), … } }

ts: the time this operation occurred.

h: a unique ID for this operation. Each operation will have a different value in this field.

op: the write operation that should be applied to the slave: i (insert), u (update) or d (delete). n indicates a no-op, which is just an informational message.

ns: the database and collection affected by this operation. For a no-op, this field is left blank.

o: the actual document representing the op. For a no-op, this field carries nothing useful.
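
To illustrate how these fields can be read, here is a minimal sketch that tallies OPLOG entries by namespace and operation type. `summarize_oplog` is a hypothetical helper, and the sample entries are made-up dictionaries shaped like the documents above (only "op" and "ns" are used); nothing here talks to a real server.

```python
from collections import Counter

# Operation codes described above; anything else is grouped as "other".
OP_NAMES = {"i": "insert", "u": "update", "d": "delete", "n": "no-op"}

def summarize_oplog(entries):
    """Count oplog entries per (namespace, operation kind)."""
    counts = Counter()
    for entry in entries:
        kind = OP_NAMES.get(entry["op"], "other")
        counts[(entry.get("ns", ""), kind)] += 1
    return counts

# Hypothetical sample data, not taken from a real oplog.
sample = [
    {"op": "i", "ns": "cas_v2.cas_user_flat"},
    {"op": "i", "ns": "cas_v2.cas_user_flat"},
    {"op": "u", "ns": "cas_v2.cas_user_flat"},
    {"op": "n", "ns": ""},
]
summary = summarize_oplog(sample)
for (namespace, kind), count in sorted(summary.items()):
    print(namespace or "(none)", kind, count)
```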

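Because the OPLOG has a fixed size, its oldest entries are overwritten as new ones arrive, so a SECONDARY may be unable to keep up with the rate of inserts on the PRIMARY. When that happens, rs.status() reports the error "error RS102 too stale to catch up".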

If this occurs, the slave will start giving error messages about needing to be resynced. It can’t catch up to the master from the oplog anymore: it might miss operations between the last oplog entry it has and the master’s oldest oplog entry. It needs a full resync at this point.
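
The situation in the quote can be modelled with a toy capped log. This is a deliberately simplified sketch, not MongoDB's actual logic: `CappedOplog` and `too_stale` are hypothetical names, timestamps are plain integers, and the capacity is counted in entries rather than bytes.

```python
from collections import deque

class CappedOplog:
    """Toy oplog: keeps only the newest max_entries operations."""

    def __init__(self, max_entries):
        self.entries = deque(maxlen=max_entries)  # old entries fall off the left
        self.next_ts = 1

    def append(self, op):
        self.entries.append((self.next_ts, op))
        self.next_ts += 1

    def oldest_ts(self):
        return self.entries[0][0] if self.entries else None

def too_stale(oplog, last_applied_ts):
    """True if the entry right after last_applied_ts is no longer in the oplog."""
    oldest = oplog.oldest_ts()
    return oldest is not None and oldest > last_applied_ts + 1

primary = CappedOplog(max_entries=5)
for n in range(8):
    primary.append({"op": "i", "seq": n})
# The log now holds timestamps 4..8; a member that applied up to 3 can still catch up.
print(too_stale(primary, last_applied_ts=3))  # False

for n in range(8, 10):
    primary.append({"op": "i", "seq": n})
# The log now holds 6..10; entries 4 and 5 are gone, so that member needs a full resync.
print(too_stale(primary, last_applied_ts=3))  # True
```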

Solution:

Resyncing a Very Stale Replica Set Member explains what to do when you run into Error RS102. You can also follow the Increasing the OpLog Size section of Halted Replication and set the OPLOG to a more appropriate size.

This indicates that you’re adding data to the database at a rate of 524MB/hr. If an initial clone takes 10 hours, then the oplog should be at least 5240MB, so something closer to 8GB would make for a safe bet.
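
The arithmetic in that quote is simply write rate multiplied by resync time, plus some headroom. A minimal sketch follows; `min_oplog_mb` is a hypothetical helper, and the safety factor of 1.5 is an assumption chosen here to land near the "closer to 8GB" figure, not a value from the documentation.

```python
def min_oplog_mb(write_rate_mb_per_hour, resync_hours, safety_factor=1.0):
    """Oplog size (MB) needed to cover a full resync at the given write rate."""
    return write_rate_mb_per_hour * resync_hours * safety_factor

# Figures from the quoted passage: 524MB/hr and a 10-hour initial clone.
print(min_oplog_mb(524, 10))                     # 5240.0
# With an assumed 1.5x safety margin:
print(min_oplog_mb(524, 10, safety_factor=1.5))  # 7860.0
```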

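In the end, with the inserts still running, I used rs.remove() to remove the 2 SECONDARY members, and the insert speed went back to its original level. What remains is to resync the SECONDARY members once the inserts have finished.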

>mongo insert in 0.62605094909668 Secs. memory 164.25 MB

>mongo insert in 0.63488984107971 Secs. memory 164 MB

>mongo insert in 0.64394617080688 Secs. memory 164.25 MB

>mongo insert in 0.61102414131165 Secs. memory 164 MB

>mongo insert in 0.64304113388062 Secs. memory 164.25 MB

Some notes

Meaning of the Replica Set member state codes:

0 Starting up, phase 1

1 Primary

2 Secondary

3 Recovering

4 Fatal error

5 Starting up, phase 2

6 Unknown state

7 Arbiter

8 Down
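
The table above can be turned into a small lookup for reading the numeric "state" field of each member. This is only a sketch: `describe_state` is a hypothetical helper, and the member list below is made-up sample data shaped like part of an rs.status() result, with invented host names.

```python
# State codes exactly as listed above.
MEMBER_STATES = {
    0: "Starting up, phase 1",
    1: "Primary",
    2: "Secondary",
    3: "Recovering",
    4: "Fatal error",
    5: "Starting up, phase 2",
    6: "Unknown state",
    7: "Arbiter",
    8: "Down",
}

def describe_state(code):
    """Return the description for a state code, or a fallback for unlisted codes."""
    return MEMBER_STATES.get(code, "Unrecognized state code")

# Hypothetical members; the names and ports are illustrative only.
members = [
    {"name": "localhost:27017", "state": 1},
    {"name": "localhost:27018", "state": 3},
    {"name": "localhost:27019", "state": 3},
]
for member in members:
    print(member["name"], describe_state(member["state"]))
```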

References

http://www.snailinaturtleneck.com/blog/2010/10/14/getting-to-know-your-oplog/
http://www.snailinaturtleneck.com/blog/2010/10/12/replication-internals/
http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
http://www.mongodb.org/display/DOCS/Halted+Replication

