Inside MSXML Performance (6)

Validation

Validation compares the types of elements in an XML document against a Document Type Definition (DTD) or XML Schema. For example, the DTD may say that all "Customer" elements must contain a child "Name" element. Take a look at the DTD for Hamlet.xml (hamletdtd.htm) and the XML Schema for Hamlet.xml (hamletschema.htm).

Validation is another huge area for performance analysis, but I only have time for a brief mention today. Validation is expensive for several reasons. First, it involves loading a separate file (the DTD or XML Schema) and compiling it. Second, it requires state machinery for performing the validation itself. Third, when the schema also includes information about data types, any data types also have to be validated. For example, if an XML element or attribute is typed as an integer, that text has to be parsed to see if it is a valid integer.
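
The third cost can be made concrete with a small sketch. The helper below is hypothetical and is not part of MSXML; it assumes that an integer's text form is an optional sign followed by decimal digits, with surrounding white space ignored. A real validator applies the rules of whichever schema language is in use, so treat this only as an illustration of the extra parsing each typed value requires.

```javascript
// Hypothetical helper, not an MSXML API: checks whether the text of an
// element or attribute typed as an integer is a valid integer.
// Assumed lexical form: optional sign, then one or more decimal digits.
function isValidIntegerText(text) {
    var trimmed = text.replace(/^\s+|\s+$/g, "");
    return /^[+-]?[0-9]+$/.test(trimmed);
}

// Every typed value in the document pays for a check like this one.
var samples = ["42", " -7 ", "3.14", "twelve", ""];
for (var i = 0; i < samples.length; i++) {
    console.log(JSON.stringify(samples[i]) + " -> " + isValidIntegerText(samples[i]));
}
```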

The following table shows the difference between loading without validation, with DTD validation, and with XML Schema validation.

All times are in milliseconds.

Sample           Load     DTD      Schema   Schema plus datatypes
Ado.xml          662      2,230    2,167    3,064
Hamlet.xml       106      215      220      N/A
Ot.xml           1,069    2,168    2,193    N/A
Northwind.xml    64       123      127      N/A

The bottom line is to expect validation to double or triple the time it takes to load your documents. New to MSXML January 2000 Web Release is a SchemaCollection object, which allows you to load the XML Schema once and then share it across your documents for validation. This will be discussed in a future article.
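
Until then, the idea of loading the schema once and sharing it can be sketched roughly as follows. This is an assumption-laden outline rather than the article's own code: the ProgIDs "Msxml2.XMLSchemaCache" and "Msxml2.DOMDocument" and the schemas property come from later MSXML releases and may differ in the Web Release, and createObject, namespaceUri, and schemaUrl are hypothetical parameters introduced here so that the object factory can be supplied by the caller.

```javascript
// Sketch only. "createObject" is a hypothetical factory parameter; under
// Windows Script Host or classic ASP it could be
//     function (progId) { return new ActiveXObject(progId); }
// The ProgIDs below are assumptions taken from later MSXML releases.
function makeValidatingLoader(createObject, namespaceUri, schemaUrl) {
    // Load and compile the schema once.
    var cache = createObject("Msxml2.XMLSchemaCache");
    cache.add(namespaceUri, schemaUrl);

    // Each document reuses the shared cache instead of reloading the schema.
    return function loadValidated(xmlUrl) {
        var doc = createObject("Msxml2.DOMDocument");
        doc.async = false;
        doc.validateOnParse = true;
        doc.schemas = cache;
        if (!doc.load(xmlUrl)) {
            throw new Error(doc.parseError.reason);
        }
        return doc;
    };
}
```

A caller would create the loader once, then call the returned function for every document that shares the schema, so the schema is compiled a single time no matter how many documents are validated.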

XSL

XSL can be a big performance win over using DOM code for generating "transformed" reports from an XML document. For example, suppose you wanted to print out all the speeches by Hamlet in the sample Hamlet.xml. You might use selectNodes to find all the speeches by Hamlet, then use another selectNodes call to iterate through the lines of each of those speeches, as follows:

function Method1(doc)
{
    var speeches = doc.selectNodes("/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']");
    var s = speeches.nextNode();
    var out = "";
    while (s)
    {
        var lines = s.selectNodes("LINE");
        var line = lines.nextNode();
        while (line)
        {
            out += line.text;
            line = lines.nextNode();
        }
        out += "<hr>";
        s = speeches.nextNode();
    }
    return out;
}

This works, but it takes about 1,500 milliseconds. A better way to tackle this problem is to use XSL. The following XSL style sheet (or template) does exactly the same thing:

<xsl:template xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:for-each select="/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']">
    <xsl:for-each select="LINE">
      <xsl:value-of/>
    </xsl:for-each>
    <hr/>
  </xsl:for-each>
</xsl:template>

You can then write the following simpler script code that uses this template:

function Method2(doc)
{
    // Load the style sheet synchronously so it is ready before the transform.
    var xsl = new ActiveXObject("Microsoft.XMLDOM");
    xsl.async = false;
    xsl.load("hamlet.xsl");
    return doc.transformNode(xsl);
}

This takes only 203 milliseconds—it is more than seven times faster. This is a rather compelling reason to use XSL. In addition, it is easier to update the XSL template than it is to rewrite your code every time you want to get a different report.
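
For readers who want to reproduce this kind of comparison, a measurement might be taken along the following lines. This is a minimal sketch, not the harness behind the figures above; averageMillis and speedup are hypothetical names, and the only numbers used are the two timings already quoted.

```javascript
// Hypothetical timing helper; not the harness used for the figures above.
// Runs fn(arg) "runs" times and returns the average wall-clock time in ms.
function averageMillis(fn, arg, runs) {
    var start = new Date().getTime();
    for (var i = 0; i < runs; i++) {
        fn(arg);
    }
    return (new Date().getTime() - start) / runs;
}

// How many times faster the second measurement is than the first.
function speedup(slowMs, fastMs) {
    return slowMs / fastMs;
}

// With the figures quoted above: 1,500 ms for Method1, 203 ms for Method2.
console.log(speedup(1500, 203).toFixed(1) + "x"); // prints "7.4x"
```

With a loaded document in hand, a call such as averageMillis(Method1, doc, 10) would give one side of the comparison and averageMillis(Method2, doc, 10) the other. Averaging over several runs smooths out timer granularity and one-off costs.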

The problem is that XSL is very powerful. You have a lot of rope with which to hang yourself, so to speak. XSL has a rich expression language that can be used to walk all over the document in any order. It is highly recursive, and the MSXML parser includes script support for added extensibility. Using all these features with reckless abandon will result in slow XSL style sheets. The following sections describe a few specific traps to watch out for.

Scripting

It is convenient to call script from within an XSL style sheet, and it is a great extensibility mechanism. But as always, there is a catch. Script code is slow. For purposes of illustration, imagine that we wrote the following style sheet instead of the one shown previously:

<xsl:template xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:for-each select="/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']">
    <xsl:eval>this.text</xsl:eval>
    <hr/>
  </xsl:for-each>
</xsl:template>

This produces the same result, but it takes 266 milliseconds instead of 203 milliseconds, which is about 31 percent slower. The more frequently your xsl:eval statements are executed, the slower the performance becomes. For purposes of illustration only, let's move the xsl:eval inside the inner for-each loop:

    <xsl:for-each select="LINE">
        <xsl:eval>this.text</xsl:eval>
    </xsl:for-each>

This one takes 516 milliseconds, more than twice as slow. The bottom line is to be careful with script code in XSL.

The Dreaded "//" Operator

Watch out for the "//" operator. This little operator walks the entire subtree looking for matches. Developers use it more than they should just because they are too lazy to type in the full path. (I catch myself using it all the time, too.) For example, try switching the select statement in the previous example to the following:

  <xsl:for-each select="//SPEECH[SPEAKER='HAMLET']">

The time it takes to perform the selection jumps from 203 milliseconds to 234 milliseconds. My laziness just cost me a 15 percent tax.
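
Why the shorter expression costs more can be seen on a toy tree. The sketch below is not how MSXML evaluates patterns; it is a simplified model, with hypothetical helper names, that counts how many nodes a naive descendant search examines compared with a search that follows an explicit path one step at a time.

```javascript
// Toy element tree; a stand-in for a parsed document, not the MSXML DOM.
function el(name, children) {
    return { name: name, children: children || [] };
}
function speech() { return el("SPEECH", [el("SPEAKER"), el("LINE"), el("LINE")]); }
function scene() { return el("SCENE", [el("TITLE"), speech(), speech()]); }
function act() { return el("ACT", [el("TITLE"), scene(), scene()]); }
var doc = el("#document", [el("PLAY", [el("TITLE"), act(), act()])]);

var visited = 0;

// Model of "//name": examine every descendant of the node.
function descendants(node, name, out) {
    for (var i = 0; i < node.children.length; i++) {
        var child = node.children[i];
        visited++;
        if (child.name === name) out.push(child);
        descendants(child, name, out);
    }
    return out;
}

// Model of "/A/B/C": at each step, examine only the children of the
// nodes matched by the previous step.
function byPath(root, names) {
    var current = [root];
    for (var s = 0; s < names.length; s++) {
        var next = [];
        for (var i = 0; i < current.length; i++) {
            var kids = current[i].children;
            for (var k = 0; k < kids.length; k++) {
                visited++;
                if (kids[k].name === names[s]) next.push(kids[k]);
            }
        }
        current = next;
    }
    return current;
}

visited = 0;
var viaDescendants = descendants(doc, "SPEECH", []).length;
var descendantsVisited = visited;

visited = 0;
var viaPath = byPath(doc, ["PLAY", "ACT", "SCENE", "SPEECH"]).length;
var pathVisited = visited;

console.log("//SPEECH: " + viaDescendants + " matches, " + descendantsVisited + " nodes examined");
console.log("full path: " + viaPath + " matches, " + pathVisited + " nodes examined");
```

On this toy tree both searches find the same 8 speeches, but the descendant search examines 46 nodes and the explicit path only 22, because the path never descends into the children of each SPEECH.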

Prune the Search Tree

If there's anything you can do to "prune" the search tree, by all means do it. For example, suppose you were reporting all speeches by Bernardo from Hamlet.xml. All Bernardo's speeches happen to be in Act I. If you already knew this, you could skip the entire search of Act II through Act V. The following shows what the new select statement would look like:

select="/PLAY/ACT[TITLE='ACT I']/SCENE/SPEECH[SPEAKER='BERNARDO']"

This chops the time down from 141 milliseconds to 125 milliseconds, a healthy 11 percent improvement.
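
The effect of the extra predicate can be modeled with plain arrays. The data below is invented for illustration and is not taken from Hamlet.xml, and findSpeeches is a hypothetical function; the sketch only shows that naming the act lets the search skip whole subtrees rather than test every speech in them.

```javascript
// Toy data, invented for illustration: five acts, each listing the
// speakers of its speeches. Only "ACT I" contains the speaker we want.
var acts = [
    { title: "ACT I",   speakers: ["BERNARDO", "FRANCISCO", "BERNARDO"] },
    { title: "ACT II",  speakers: ["POLONIUS", "HAMLET"] },
    { title: "ACT III", speakers: ["HAMLET", "OPHELIA"] },
    { title: "ACT IV",  speakers: ["KING CLAUDIUS", "LAERTES"] },
    { title: "ACT V",   speakers: ["HAMLET", "HORATIO"] }
];

// Counts how many speeches are examined. The optional actTitle plays the
// role of the [TITLE='ACT I'] predicate: acts that fail it are skipped whole.
function findSpeeches(acts, speaker, actTitle) {
    var examined = 0;
    var found = 0;
    for (var a = 0; a < acts.length; a++) {
        if (actTitle && acts[a].title !== actTitle) continue; // pruned subtree
        for (var s = 0; s < acts[a].speakers.length; s++) {
            examined++;
            if (acts[a].speakers[s] === speaker) found++;
        }
    }
    return { examined: examined, found: found };
}

var full = findSpeeches(acts, "BERNARDO");
var pruned = findSpeeches(acts, "BERNARDO", "ACT I");
console.log("unpruned: examined " + full.examined + ", found " + full.found);
console.log("pruned:   examined " + pruned.examined + ", found " + pruned.found);
```

Both calls find the same two speeches, but the pruned search examines 3 speeches instead of 11. The saving in a real document depends on how much of the tree the predicate rules out.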

Cross-Threading Models

Before, the transformNode and transformNodeToObject methods required that the threading model of the style sheet and that of the document being transformed be the same. In the MSXML January 2000 Web Release, you can use free-threaded style sheets on rental documents and vice versa. This means you can get the performance benefit of using rental documents at the same time as the performance win of sharing free-threaded style sheets across threads.

Conclusion
