Java那些坑(1):Java7的substring

問題描述

應用伺服器從Weblogic 10更新到了Weblogic 12，對應的jdk也從1.6更新到了1.7，結果原來的一個sql執行時間從1s變成了25s。

這個sql是使用Mybatis進行的動态拼裝，如下，省略了一些字段。

<code><</code><code>insert</code> <code>id</code><code>=</code><code>"insertBatchMap"</code> <code>parameterType</code><code>=</code><code>"Map"</code><code>></code>

<code> </code><code>INSERT INTO BTT_LOG</code>

<code> </code><code>DATATIME,</code>

<code> </code><code>CHANNEL,</code>

<code> </code><code>ORGAN_NO,</code>

<code> </code><code>VALUES</code>

<code> </code><code><</code><code>foreach</code> <code>collection</code><code>=</code><code>"list"</code> <code>separator</code><code>=</code><code>","</code> <code>index</code><code>=</code><code>"i"</code> <code>item</code><code>=</code><code>"ls"</code><code>></code>

<code> </code><code>#{ls[ID],jdbcType=BIGINT},</code>

<code> </code><code>#{ls[DATATIME],jdbcType=TIMESTAMP},</code>

<code> </code><code>#{ls[CHANNEL],jdbcType=VARCHAR},</code>

<code> </code><code>#{ls[ORGAN_NO],jdbcType=VARCHAR},</code>

<code> </code><code>#{ls[USER_ID],jdbcType=VARCHAR},</code>

<code> </code><code></</code><code>foreach</code><code>></code>

<code></</code><code>insert</code><code>></code>

傳入一個Map，裡面是個List。最終生成的sql是下面這樣的：

<code>insert</code>

<code>BTT_LOG(ID,DATATIME,CHANNEL,ORGAN_NO,USER_ID) </code>

<code>values</code>

這個sql的作用是将目前庫的資料遷移到曆史庫，一條sql遷移1000條，每天大約2000萬的資料，速度一下子慢了25倍，資料都導不完了。

問題分析

看資料庫中sql的執行時間，也就用了1s左右，是以肯定不是資料庫的問題。通過VisualVM檢視CPU熱點，發現熱點在org.apache.ibatis.parsing.GenericTokenParser.parse上面，自用時間特别長。

這個函數自用時間很長，看了下代碼如下：

<code> </code><code>public</code> <code>String parse1(String text) {</code>

<code> </code><code>StringBuilder builder = </code><code>new</code> <code>StringBuilder();</code>

<code> </code><code>String after = text;</code>

<code> </code><code>int</code> <code>start = after.indexOf(openToken);</code>

<code> </code><code>int</code> <code>end = after.indexOf(closeToken);</code>

<code> </code><code>while</code> <code>(start > -</code><code>1</code><code>) {</code>

<code> </code><code>if</code> <code>(end > start) {</code>

<code> </code><code>String before = after.substring(</code><code>0</code><code>, start);</code>

<code> </code><code>String content = after.substring(start + openToken.length(), end);</code>

<code> </code><code>String substitution;</code>

<code> </code><code>// check if variable has to be skipped</code>

<code> </code><code>if</code> <code>(start > </code><code>0</code> <code>&& text.charAt(start - </code><code>1</code><code>) == </code><code>'\\'</code><code>) {</code>

<code> </code><code>before = before.substring(</code><code>0</code><code>, before.length() - </code><code>1</code><code>);</code>

<code> </code><code>substitution = </code><code>new</code> <code>StringBuilder(openToken).append(content).append(closeToken).toString();</code>

<code> </code><code>substitution = handler.handleToken(content);</code>

<code> </code><code>builder.append(before);</code>

<code> </code><code>builder.append(substitution);</code>

<code> </code><code>after = after.substring(end + closeToken.length());</code>

<code> </code><code>String before = after.substring(</code><code>0</code><code>, end);</code>

<code> </code><code>builder.append(closeToken);</code>

<code> </code><code>break</code><code>;</code>

<code> </code><code>start = after.indexOf(openToken);</code>

<code> </code><code>end = after.indexOf(closeToken);</code>

<code> </code><code>builder.append(after);</code>

<code> </code><code>return</code> <code>builder.toString();</code>

感覺自用時間不應該那麼長，大部分都是調用其他函數的，是以我懷疑這個自用時間是有問題的，可能包含了substring的時間。

我寫了下面的代碼做實驗：

<code>public</code> <code>static</code> <code>void</code> <code>test() {</code>

<code> </code><code>while</code><code>(</code><code>true</code><code>){</code>

<code> </code><code>String xxxx = </code><code>"jfkfsdfjskfjsdkfjsakjdkfjdskfjslfjslkfjsdkfjsdlfjsd"</code><code>.substring(</code><code>2</code><code>);</code>

這段代碼對應的CPU抽樣如下：

自用時間特别長，substring的時間很短。

然後我改了一下，把while(true)改成for(int i=0;i<100;i++)，結果substring的時間很長。

<code> </code><code>String xxxx = </code><code>"jfkfsdfjskfjsdkfjsakjdkfjdskfjslfjslkfjsdkfjsdlfjsd"</code><code>.substring(</code><code>2</code><code>);</code>

對應的抽樣如下：

試了好多次都這樣。然後我看了下它們的位元組碼：

使用while(true)的

使用for循環的

沒有什麼大差別，使用for循環時由于還需要做加法，判斷是否小于100，是以自用的指令應該更多。substring方法的位元組碼遠遠多于這幾條位元組碼，是以我覺得上面兩個函數的時間都耗在substring上了。

我們用的MyBatis版本是3.1.1，我對比了下3.2.0和3.1.1的代碼，發現了如下問題：

原來的代碼：

<code>String after = text;</code>

<code>before = after.substring(</code><code>0</code><code>,start);</code>

<code>builder.append(before);</code>

變成了下面這樣：

<code>char</code><code>[] src = text.toCharArray();</code>

<code>builder.append(src, offset, start - </code><code>1</code><code>)</code>

把所有原來的substring方法全部替換成了使用char數組加上起止位置。是以說substring一定有問題，在新版的MyBatis裡不用了。

為什麼我們的應用這個問題會特别明顯，因為每條sql 1000行，每行100個字元的話就是10萬個字元，parse這個函數需要對這個10萬字元的字元串做上萬次substring，是以出問題了。更新了新版的MyBatis，這個問題解決了。

那麼substring到底有啥問題呢？其實是substring在java7裡面做了修改，在java6裡面使用substring，和使用char數組加起止位置是一樣的，但是在java7裡就不一樣了。

String類主要有三個成員：

<code>private</code> <code>final</code> <code>char</code> <code>value[]; </code>

<code>private</code> <code>final</code> <code>int</code> <code>offset; </code>

<code>private</code> <code>final</code> <code>int</code> <code>count;</code>

在1.6裡面，substring以及它調用的構造函數如下：

<code>// JDK6,共享 value數組提升速度 </code>

<code>String(</code><code>int</code> <code>offset, </code><code>int</code> <code>count, </code><code>char</code> <code>value[]) { </code>

<code> </code><code>this</code><code>.value = value; </code>

<code> </code><code>this</code><code>.offset = offset; </code>

<code> </code><code>this</code><code>.count = count; </code>

<code>public</code> <code>String substring(</code><code>int</code> <code>beginIndex, </code><code>int</code> <code>endIndex) { </code>

<code> </code><code>// 如果範圍和自己一模一樣,則傳回自身,否則用value字元數組構造一個新的對象 </code>

<code> </code><code>return</code> <code>((beginIndex == </code><code>0</code><code>) && (endIndex == count)) ? </code><code>this</code> <code>: </code>

<code> </code><code>new</code> <code>String(offset + beginIndex, endIndex - beginIndex, value); </code>

使用substring産生的新對象，和原來的對象共用一個char[]數組。

這樣能提高性能。但是有風險，就是記憶體浪費（有人稱之為”記憶體洩漏“，但是我覺得不能算記憶體洩露，因為這段記憶體是可以回收的），怎麼浪費的呢？

<code>String x = </code><code>"一個非常非常長的字元，可能是從網上抓取的一段文字"</code><code>;</code>

<code>String y = x.substring(a,b); </code><code>//比較短</code>

<code>y的使用時間很長，短時間内不會被回收</code>

問題就來了，原來x會有一個很長的char[]數組，使用substring後這個數組被y引用了，是以回收x的時候這個數組不會被回收，因為被y引用着，但是y隻會用其中很短的一部分，造成了記憶體的浪費。

在1.7中做了修改，構造函數變成下面這樣，每次都把自己的數組拷貝一份出來。

<code>public</code> <code>String(</code><code>char</code> <code>value[], </code><code>int</code> <code>offset, </code><code>int</code> <code>count) { </code>

<code> </code><code>this</code><code>.offset = </code><code>0</code><code>;</code>

<code> </code><code>this</code><code>.count = count; </code>

<code> </code><code>this</code><code>.value = Arrays.copyOfRange(value, offset, offset+count); </code>

這樣就不會有記憶體浪費的問題了。但是性能肯定不如共享char[]數組，不過也不會太差。

本文轉自nxlhero 51CTO部落格，原文連結：http://blog.51cto.com/nxlhero/1830578，如需轉載請自行聯系原作者

Java那些坑(1):Java7的substring

繼續閱讀

27 Best Free Eclipse Plug-ins for Java Developer to be ProductiveCode Quality PluginsText Editor PluginsDependency ManagementVersion Control Integration PluginsFramework Development Continuous Integration Related PluginsOther Utility Plugins

Java String.format方法的簡單使用

neo4j之cypher使用文檔

GitHub連夜封殺！這份阿裡 10W 字内部 Java 字面試手冊到底有多強？

用mybatis的generator插件在項目中自動生成dao及entity

spark/scala關于【資源檔案】加載方法概述外部檔案加載方案測試資源檔案打包入jar包中小結

mybatis_入門程式Mybatis入門

AOP程式設計_Android優雅權限架構(1)概念基礎，2021金三銀四前言正文大綱正文

sqlServer根據經緯查距離

Effective Java 8:通用程式設計

OOM三種類型

工廠模式-三種類型

【遞歸】高效率求2的n次幂

從mybatis到mybatis plus(mp)

win10本地scala和spark安裝安裝scala安裝spark

scala (3) Function 和 Method