天天看點

資料庫測試一些知識

作者:亮 亮言

1 測試覆寫率的幾個衡量名額

2 傳統資料庫測試

2.1 SQLite

2.2 Oracle

3 新興的NewSQL資料庫

3.1 比較常見的設計

3.2 Tidb

3.3 FoundationDB

1 測試覆寫率的幾個衡量名額

常見的幾種衡量測試覆寫率方法

函數覆寫(Function Coverage)

語句覆寫(Statement Coverage)

決策覆寫(Decision Coverage)Branch Coverage

條件覆寫(Condition Coverage)

Modified Condition/Decision Coverage (DC + CC + 每一個condition能獨立影響decision結果的測試用例)

主要用在 safety-critical system (航空航天器件)
           
int foo( int x ,int y)
{
   int  z = 0;
   if ( (x>0) && (y>0) ) {
        z = x;
   }

   return z;
}           
if ( a or b ) and c then           

condition/decision criteria 滿足:

a = true , b = true, c = true

a = false, b = false, c= false

第一個Test中,b的值 ;第二個test中,c的值 都不會影響 decision的值。

a=false, b=true, c=false

a=false, b=true, c=true

a=false, b=false, c=true

a=true, b=false, c=true

2 傳統資料庫測試

2.1 SQLite

https://www.sqlite.org/testing.html

源碼:128.9 KSLOC of C code

測試代碼:91772.0 KSLOC of test code

比例:711:1

PR測試示例:

https://sqlite.org/src/info/940f2adc8541a838

主要測試集合清單:

Three independently developed test harnesses

100% branch test coverage in an as-deployed configuration

Millions and millions of test cases

Out-of-memory tests

I/O error tests

Crash and power loss tests

Fuzz tests

Boundary value tests

Disabled optimization tests

Regression tests

Malformed database tests

Extensive use of assert() and run-time checks

Valgrind analysis

Undefined behavior checks

其中三塊獨立開發的測試用例集合:

1 TCL 腳本寫的Test (最開始的測試用例):開發測試時候用

26.1 KSLOC of C code

million 級别的case

2 C語言的測試用例:100% MC/DC test coverage

792.3 KSLOC of C code

hundreds of millions of tests.

3 SQL Logic Test (邏輯測試)

SQLite同時和PostgreSQL, MySQL,Oracle 等資料庫跑相同功能的SQL語句,用來确認各個語句的執行結果是一緻的。

7.2 million

就算是跑了這麼多的測試,還是不能阻止bug的發生,騰訊的安全平台部門發現了SQLite的一個遠端代碼執行漏洞。

SQLite的作者也談到了最開始開發SQLite的一段故事:

當SQLite運作的場景越來越多的時候(million 級别的應用,billion個裝置上的時候),他會穩定的收到bug報告。

當bug數累計的越來越多的時候,作者花了十個月 (2008-09-25 through 2009-07-25) 的時間編寫測試用例,使

SQLite能達到100% MC/DC名額。在這之後,bug報告數目就迅速降低下來了。

SQLite是如何做測試的

https://news.ycombinator.com/item?id=18685296

Remote Code Execution vulnerability in SQLite

100% branch, line coverage means nothing. It's about logical coverage. What are you testing for? You are not testing lines of code, but logic.

Right. The actual standard is called "modified condition/decison coverage" or MC/DC. In languages like C, MC/DC and branch coverage, though not exactly the same, are very close.

Achieving 100% MC/DC does not prove that you always get the right answer. All it means is that your tests are so extensive that you managed to get every machine-code branch to go in both directions at least once. It is a high standard and is difficult to achieve. It does not mean that the software is perfect.

But it does help. A lot. When I was young, I used to think I could right flawless code. Then I wrote SQLite, and it got picked up and used by lots of applications. It will amaze you how many problems will crop up when your code runs on in millions of application on billions of devices.

I was getting a steady stream of bug reports against SQLite. Then I took 10 months (2008-09-25 through 2009-07-25) to write the 100% MC/DC tests for SQLite. And after that, the number of bug reports slowed to a trickle. There still are bugs. But the number of bugs is greatly reduced. (Note that 100% MC/DC was first obtained on 2009-07-25, but the work did not end there. I spend most of my development time adding and enhancing test cases to keep up with changes in the deliverable SQLite code.)

100% MC/DC is just an arbitrary threshold - a high threshold and one that is easy to measure and difficult to cheat - but it is just a threshold at which we say "enough". You could just as easily choose a different threshold, such as 100% line coverage. The higher the threshold, the fewer bugs will slip through. But there will always be bugs.

My experience is that the weird tests you end up having to write just to cause some obscure branch to go one way or another end up finding problems in totally unrelated parts of the system. One of the chief benefits of 100% MC/DC is not so much that every branch is tested, but rather that you have to write so many tests, and such strange, weird, convoluted, and stressful tests, that you randomly stumble across (and fix) lots of problems you would have never thought about otherwise.

Another big advantage of 100% MC/DC is that once they are in place, you can change anything, anywhere in the code, and if the tests all still pass, you have high confidence that you didn't break anything. This enables us to evolve the SQLite code much faster than we could otherwise, using relatively few eyeballs.

Yet another advantage of 100% MC/DC is that you are really testing compiled machine code, not source code. So you worry less about compiler bugs. "Undefined behavior" is a big bugbear with C. We worry less than others about UB because we have tested the output of the compiler and we know that the compiler did what we wanted, even if the official C-language spec didn't require it to. We still avoid UB, and SQLite does not currently contain any UB as far as we know. But is is nice to know that even if we missed some UB in the code someplace, it probably doesn't matter.

2.2 Oracle

https://news.ycombinator.com/item?id=18442941

Oracle Database 12.2

資料庫測試一些知識

Oracle Database developer 的工作内容:

1 開始處理新的bug

2 花費兩周時間理清引起這個bug的代碼邏輯,這個涉及到需要了解20多個标志位在不同場景下的組合情況,有時候多達上百個

3 為了處理bug,新增加一個标志位和幾行代碼規避這個bug

4 将Oracle DB代碼打包,送出到有100-200台機器組成的測試叢集裡測試代碼

5 可以回家了,測試需要跑20-30個小時

6 3-4-5的情況要重複幾次,如果不順利的話。

7 在大約兩周之後,你終于把幾十個标志位的排列組合情況摸清楚了,測試用例有都OK了

8 再給自己新增加的标志位寫測試用例,再來一輪測試。然後送出代碼review

9 代碼review過程持續2周到2個月不等。

是以大概一個bug的解決時間在 2w-2m不等。

A bug's Odyssey

Here is how the life of an Oracle Database developer is:

  • Start working on a new bug.
  • Spend two weeks trying to understand the 20 different flags

    that interact in mysterious ways to cause this bag.

  • Add one more flag to handle the new special scenario. Add a few more lines of code that

    checks this flag and works around the problematic situation and avoids the bug.

  • Submit the changes to a test farm consisting of about 100 to 200 servers that would compile the code,

    build a new Oracle DB, and run the millions of tests in a distributed fashion.

  • Go home. Come the next day and work on something else. The tests can take 20 hours to 30 hours to complete.
  • Go home. Come the next day and check your farm test results. On a good day, there would be about 100 failing tests.

    On a bad day, there would be about 1000 failing tests.

Pick some of these tests randomly and try to understand what went wrong with your assumptions.

Maybe there are some 10 more flags to consider to truly understand the nature of the bug.

  • Add a few more flags in an attempt to fix the issue. Submit the changes again for testing. Wait another 20 to 30 hours.
  • Rinse and repeat for another two weeks until you get the mysterious incantation of the combination of flags right.
  • Finally one fine day you would succeed with 0 tests failing.
  • Add a hundred more tests for your new change to ensure that the next developer who has the misfortune of touching

    this new piece of code never ends up breaking your fix.

  • Submit the work for one final round of testing. Then submit it for review.

    The review itself may take another 2 weeks to 2 months. So now move on to the next bug to work on.

  • After 2 weeks to 2 months, when everything is complete, the code would be finally merged into the main branch.

The above is a non-exaggerated description of the life of a programmer in Oracle fixing a bug. Now imagine what horror it is going to be to develop a new feature. It takes 6 months to a year (sometimes two years!) to develop a single small feature (say something like adding a new mode of authentication like support for AD authentication).

I don't work for Oracle anymore. Will never work for Oracle again!

3 新興的NewSQL資料庫

3.1 比較常見的設計

單機版的KV store 設計:

資料庫測試一些知識

在單機KV store基礎上 通過分布式一緻性協定(Paxos極其變種(Raft,ZK,VR等)

3.2 Tidb

我們現在有六百多萬個 Test

1 自動化測試

怎麼去自動生成測試Case:

1.1 fault injection 測試異常分支

Hardware : disck,cpu,network card,

Software: os,network protocol,file system

kill -9

1.2 fuzz testing

1.3 上層協定相容Mysql,把Mysql的測試用例直接遷移過來

1.3 通過解析文法樹

https://www.pingcap.com/blog-cn/golang-failpoint/

1.4 Jespen測試:

驗證分布式系統一緻性的測試架構

https://pingcap.com/blog-cn/tidb-jepsen/

2 所有出現過的 bug,曆史上隻要出現過一次,你一定要寫一個 Test 去 cover 它 目前主流開源社群都在堅持的做法

3.3 FoundationDB

FoundationDB開源産品,後來被Apple收購又不開源了,用在Apple的Icloud存儲背景場景下,

現在又被重新開源了。

https://apple.github.io/foundationdb/testing.html

重要思想:Simulation

a deterministic simulation of an entire FoundationDB cluster within a single-threaded process.

用單個程序(single-thread)确定性的(deterministic)模拟出整個叢集(entire cluster)

确定性:可以複現問題

同時可以方面的模拟各個層面的Failure Mode

在C++11 基礎上加入了actor-based-concurrency的能力,

開發出了一個新的Flow語言

1 high performance

2 actor-based concurrency

3 Simulation

one trillion CPU-hours of simulation on FoundationDB

Reference:

https://www.infoq.cn/article/test-coverage-rate-role https://www.sqlite.org/th3/doc/trunk/www/th3.wiki https://www.pingcap.com/blog-cn/distributed-system-test-1/ https://www.pingcap.com/blog-cn/distributed-system-test-2/ https://www.pingcap.com/blog-cn/distributed-system-test-3/ https://apple.github.io/foundationdb/engineering.html#simulation https://jepsen.io/

Remote code execution vulnerability in SQLite

Ask HN: What's the largest amount of bad code you have ever seen work?