背景和概述
資料狀況:
mongodb的一個db中存了近2000萬記錄,有type(字元串)、ctime(float,建立時間)等索引參數,記錄的自然順序和ctime的排序是一緻的。
查詢需求:
從第47萬開始查詢并傳回結果,每批次傳回100條;為後續業務邏輯需求,在執行batch_size(100)前會先執行一個count()。
探索過程:
之前隻采用type作為查詢條件并用skip參數跳過前47萬記錄,感覺有些慢(約1分鐘),于是想通過type、ctime兩個查詢條件并棄用skip參數,結果大跌眼鏡,查詢耗時猛增到了7分鐘;仔細看日志,發現時間都耗在了count函數,就batch_size函數的執行時間來說,第二個查詢是更快的,猜測前一方案batch_size查詢慢是因為skip參數,于是測試type條件+skip參數分别為0、10w、20w、30w、40w的情況,發現耗時逐漸顯著增長,也即skip參數對速度的影響也是顯著的,skip值與耗時成正相關。
結論:
想提高查詢效率,若用了count就要減少查詢條件,最好繞過count函數相關需求,然後增加查詢條件并棄用skip參數。
主要查詢日志的對比
一個查詢條件,帶skip參數,耗時1分鐘出頭 count()用時10746ms batch_size(100)用時59983ms
x = bookscache.find({"type": "s"}, {"_id": 0, "type": 0}, skip = skip)
2017-02-20T11:30:48.600+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:1361 #61 (1 connection now open)
2017-02-20T11:30:48.606+0800 I NETWORK [conn61] received client metadata from 127.0.0.1:1361 conn61: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T11:30:48.617+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:1365 #62 (2 connections now open)
2017-02-20T11:30:48.619+0800 I NETWORK [conn62] received client metadata from 127.0.0.1:1365 conn62: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T11:30:59.366+0800 I COMMAND [conn62] command cache.bookscache command: count { count: "bookscache", query: { type: "s" } } planSummary: COUNT_SCAN { type: 1 } keysExamined:17090106 docsExamined:0 numYields:133793 reslen:44 locks:{ Global: { acquireCount: { r: 267588 } }, Database: { acquireCount: { r: 133794 } }, Collection: { acquireCount: { r: 133794 } } } protocol:op_query 10746ms
2017-02-20T11:31:59.352+0800 I COMMAND [conn62] command cache.bookscache command: find { find: "bookscache", filter: { type: "s" }, projection: { _id: 0, type: 0 }, skip: 487951, batchSize: 100 } planSummary: IXSCAN { type: 1 } cursorid:20810842870 keysExamined:488051 docsExamined:488051 numYields:5868 nreturned:100 reslen:396171 locks:{ Global: { acquireCount: { r: 11738 } }, Database: { acquireCount: { r: 5869 } }, Collection: { acquireCount: { r: 5869 } } } protocol:op_query 59983ms
2017-02-20T11:32:26.792+0800 I - [conn61] end connection 127.0.0.1:1361 (2 connections now open)
2017-02-20T11:32:26.792+0800 I - [conn62] end connection 127.0.0.1:1365 (2 connections now open)
兩個查詢條件,無skip參數,耗時近7分鐘 count()用時409125ms batch_size(100)用時280ms
x = bookscache.find({"type": "s", "ctime": {"$gte": minCtime}}, {"_id": 0, "type": 0})
2017-02-20T11:47:09.422+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:12310 #69 (1 connection now open)
2017-02-20T11:47:09.424+0800 I NETWORK [conn69] received client metadata from 127.0.0.1:12310 conn69: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T11:47:09.431+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:12314 #70 (2 connections now open)
2017-02-20T11:47:09.431+0800 I NETWORK [conn70] received client metadata from 127.0.0.1:12314 conn70: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T11:53:58.561+0800 I COMMAND [conn70] command cache.bookscache command: count { count: "bookscache", query: { type: "s", ctime: { $gte: 1483859529.57 } } } planSummary: IXSCAN { ctime: 1 } keysExamined:19366965 docsExamined:19366965 fromMultiPlanner:1 numYields:164853 reslen:44 locks:{ Global: { acquireCount: { r: 329708 } }, Database: { acquireCount: { r: 164854 } }, Collection: { acquireCount: { r: 164854 } } } protocol:op_query 409125ms
2017-02-20T11:53:58.849+0800 I COMMAND [conn70] command cache.bookscache command: find { find: "bookscache", filter: { type: "s", ctime: { $gte: 1483859529.57 } }, projection: { _id: 0, type: 0 }, batchSize: 100 } planSummary: IXSCAN { ctime: 1 } cursorid:17875490189 keysExamined:582 docsExamined:582 fromMultiPlanner:1 numYields:19 nreturned:100 reslen:427036 locks:{ Global: { acquireCount: { r: 40 } }, Database: { acquireCount: { r: 20 } }, Collection: { acquireCount: { r: 20 } } } protocol:op_query 280ms
2017-02-20T12:06:07.802+0800 I COMMAND [conn70] command cache.bookscache command: getMore { getMore: 17875490189, collection: "bookscache", batchSize: 100 } originatingCommand: { find: "bookscache", filter: { type: "s", ctime: { $gte: 1483859529.57 } }, projection: { _id: 0, type: 0 }, batchSize: 100 } planSummary: IXSCAN { ctime: 1 } cursorid:17875490189 keysExamined:1830 docsExamined:1830 fromMultiPlanner:1 numYields:22 nreturned:100 reslen:457999 locks:{ Global: { acquireCount: { r: 46 } }, Database: { acquireCount: { r: 23 } }, Collection: { acquireCount: { r: 23 } } } protocol:op_query 134ms
2017-02-20T12:06:36.267+0800 I - [conn69] end connection 127.0.0.1:12310 (2 connections now open)
2017-02-20T12:06:36.267+0800 I - [conn70] end connection 127.0.0.1:12314 (2 connections now open)
更多測試和查詢日志
skip0 用時8s(batch_size未執行) count()用時7683ms batch_size(100)用時因中止程序而缺失
2017-02-20T12:20:04.722+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:8250 #73 (1 connection now open)
2017-02-20T12:20:04.725+0800 I NETWORK [conn73] received client metadata from 127.0.0.1:8250 conn73: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:20:04.731+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:8254 #74 (2 connections now open)
2017-02-20T12:20:04.732+0800 I NETWORK [conn74] received client metadata from 127.0.0.1:8254 conn74: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:20:12.417+0800 I COMMAND [conn74] command cache.bookscache command: count { count: "bookscache", query: { type: "s" } } planSummary: COUNT_SCAN { type: 1 } keysExamined:17090106 docsExamined:0 numYields:133720 reslen:44 locks:{ Global: { acquireCount: { r: 267442 } }, Database: { acquireCount: { r: 133721 } }, Collection: { acquireCount: { r: 133721 } } } protocol:op_query 7683ms
2017-02-20T12:20:13.780+0800 I - [conn73] end connection 127.0.0.1:8250 (2 connections now open)
2017-02-20T12:20:13.780+0800 I - [conn74] end connection 127.0.0.1:8254 (2 connections now open)
skip10w 用時27s count()用時7563ms batch_size(100)用時19682ms
2017-02-20T12:20:30.563+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:8964 #75 (1 connection now open)
2017-02-20T12:20:30.565+0800 I NETWORK [conn75] received client metadata from 127.0.0.1:8964 conn75: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:20:30.571+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:8966 #76 (2 connections now open)
2017-02-20T12:20:30.571+0800 I NETWORK [conn76] received client metadata from 127.0.0.1:8966 conn76: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:20:38.136+0800 I COMMAND [conn76] command cache.bookscache command: count { count: "bookscache", query: { type: "s" } } planSummary: COUNT_SCAN { type: 1 } keysExamined:17090106 docsExamined:0 numYields:133734 reslen:44 locks:{ Global: { acquireCount: { r: 267470 } }, Database: { acquireCount: { r: 133735 } }, Collection: { acquireCount: { r: 133735 } } } protocol:op_query 7563ms
2017-02-20T12:20:57.820+0800 I COMMAND [conn76] command cache.bookscache command: find { find: "bookscache", filter: { type: "s" }, projection: { _id: 0, type: 0 }, skip: 100000, batchSize: 100 } planSummary: IXSCAN { type: 1 } cursorid:19274394564 keysExamined:100100 docsExamined:100100 numYields:1475 nreturned:100 reslen:909397 locks:{ Global: { acquireCount: { r: 2952 } }, Database: { acquireCount: { r: 1476 } }, Collection: { acquireCount: { r: 1476 } } } protocol:op_query 19682ms
2017-02-20T12:20:59.124+0800 I - [conn76] end connection 127.0.0.1:8966 (2 connections now open)
2017-02-20T12:20:59.124+0800 I - [conn75] end connection 127.0.0.1:8964 (2 connections now open)
skip20w 用時41s count()用時8020ms batch_size(100)用時33298ms
2017-02-20T12:21:25.084+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:10346 #77 (1 connection now open)
2017-02-20T12:21:25.085+0800 I NETWORK [conn77] received client metadata from 127.0.0.1:10346 conn77: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:21:25.097+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:10348 #78 (2 connections now open)
2017-02-20T12:21:25.098+0800 I NETWORK [conn78] received client metadata from 127.0.0.1:10348 conn78: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:21:33.119+0800 I COMMAND [conn78] command cache.bookscache command: count { count: "bookscache", query: { type: "s" } } planSummary: COUNT_SCAN { type: 1 } keysExamined:17090106 docsExamined:0 numYields:133726 reslen:44 locks:{ Global: { acquireCount: { r: 267454 } }, Database: { acquireCount: { r: 133727 } }, Collection: { acquireCount: { r: 133727 } } } protocol:op_query 8020ms
2017-02-20T12:22:06.419+0800 I COMMAND [conn78] command cache.bookscache command: find { find: "bookscache", filter: { type: "s" }, projection: { _id: 0, type: 0 }, skip: 200000, batchSize: 100 } planSummary: IXSCAN { type: 1 } cursorid:21072687025 keysExamined:200100 docsExamined:200100 numYields:2656 nreturned:100 reslen:487416 locks:{ Global: { acquireCount: { r: 5314 } }, Database: { acquireCount: { r: 2657 } }, Collection: { acquireCount: { r: 2657 } } } protocol:op_query 33298ms
2017-02-20T12:22:07.749+0800 I - [conn78] end connection 127.0.0.1:10348 (2 connections now open)
2017-02-20T12:22:07.749+0800 I - [conn77] end connection 127.0.0.1:10346 (2 connections now open)
skip30w 用時61s count()用時12879ms batch_size(100)用時47989ms
2017-02-20T12:22:12.500+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:11645 #79 (1 connection now open)
2017-02-20T12:22:12.502+0800 I NETWORK [conn79] received client metadata from 127.0.0.1:11645 conn79: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:22:12.514+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:11647 #80 (2 connections now open)
2017-02-20T12:22:12.517+0800 I NETWORK [conn80] received client metadata from 127.0.0.1:11647 conn80: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:22:25.397+0800 I COMMAND [conn80] command cache.bookscache command: count { count: "bookscache", query: { type: "s" } } planSummary: COUNT_SCAN { type: 1 } keysExamined:17090106 docsExamined:0 numYields:133837 reslen:44 locks:{ Global: { acquireCount: { r: 267676 } }, Database: { acquireCount: { r: 133838 } }, Collection: { acquireCount: { r: 133838 } } } protocol:op_query 12879ms
2017-02-20T12:23:13.388+0800 I COMMAND [conn80] command cache.bookscache command: find { find: "bookscache", filter: { type: "s" }, projection: { _id: 0, type: 0 }, skip: 300000, batchSize: 100 } planSummary: IXSCAN { type: 1 } cursorid:21139424984 keysExamined:300100 docsExamined:300100 numYields:3944 nreturned:100 reslen:623958 locks:{ Global: { acquireCount: { r: 7890 } }, Database: { acquireCount: { r: 3945 } }, Collection: { acquireCount: { r: 3945 } } } protocol:op_query 47989ms
2017-02-20T12:23:14.636+0800 I - [conn79] end connection 127.0.0.1:11645 (2 connections now open)
2017-02-20T12:23:14.636+0800 I - [conn80] end connection 127.0.0.1:11647 (2 connections now open)
skip40w 用時74s count()用時10389ms batch_size(100)用時63313ms
2017-02-20T12:23:46.867+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:14010 #81 (1 connection now open)
2017-02-20T12:23:46.868+0800 I NETWORK [conn81] received client metadata from 127.0.0.1:14010 conn81: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:23:46.881+0800 I NETWORK [thread1] connection accepted from 127.0.0.1:14012 #82 (2 connections now open)
2017-02-20T12:23:46.882+0800 I NETWORK [conn82] received client metadata from 127.0.0.1:14012 conn82: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Windows", name: "Windows 10", architecture: "AMD64", version: "10.0.14393-" }, platform: "CPython 2.7.12.final.0" }
2017-02-20T12:23:57.272+0800 I COMMAND [conn82] command cache.bookscache command: count { count: "bookscache", query: { type: "s" } } planSummary: COUNT_SCAN { type: 1 } keysExamined:17090106 docsExamined:0 numYields:133781 reslen:44 locks:{ Global: { acquireCount: { r: 267564 } }, Database: { acquireCount: { r: 133782 } }, Collection: { acquireCount: { r: 133782 } } } protocol:op_query 10389ms
2017-02-20T12:25:00.588+0800 I COMMAND [conn82] command cache.bookscache command: find { find: "bookscache", filter: { type: "s" }, projection: { _id: 0, type: 0 }, skip: 400000, batchSize: 100 } planSummary: IXSCAN { type: 1 } cursorid:19982057947 keysExamined:400100 docsExamined:400100 numYields:5292 nreturned:100 reslen:460769 locks:{ Global: { acquireCount: { r: 10586 } }, Database: { acquireCount: { r: 5293 } }, Collection: { acquireCount: { r: 5293 } } } protocol:op_query 63313ms
2017-02-20T12:25:02.015+0800 I - [conn82] end connection 127.0.0.1:14012 (2 connections now open)
2017-02-20T12:25:02.015+0800 I - [conn81] end connection 127.0.0.1:14010 (2 connections now open)