Static Scanning with Yara, Part 2: Writing Yara Rules (1)

Writing Simple and Efficient Yara Rules (1)

Translated from: https://www.bsk-consulting.de/2015/02/16/write-simple-sound-yara-rules/

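Over the past two years I have written about 2,000 Yara rules based on samples found through IOCs. Many security professionals have found that Yara offers a simple and effective way to write custom rules based on strings or byte sequences in samples, which lets users build their own detection tools.

What dissatisfies me, though, is that the Yara rules researchers publish suffer from two main weaknesses:

They generate many false positives
They match only a single sample, in which case a hash would do just as well

So I decided to write an article on how to build optimal Yara rules: rules that can be used to scan single samples uploaded to a sandbox as well as entire file systems, with a very low false positive rate.

These rules are based on characteristic strings and are easy to understand. You do not need to know how to reverse engineer PE files. I decided to avoid newer Yara modules such as "pe", because I believe they may cause memory leaks or other errors in practice.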

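Automatic Yara rule generation

First of all, I believe that automatically generated rules can never match manually created ones. While working on the IOC scanners THOR and LOKI, I had to create hundreds of Yara rules by hand, which is clearly tedious work. My method used to be to extract UNICODE and ASCII strings from my samples with the following commands: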

strings -el sample.exe
strings -a sample.exe
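
For readers without the strings utility at hand, the two extractions can be approximated in a few lines of Python. This is a minimal sketch, not how strings is implemented: the function names are made up for illustration, and the minimum length of 4 is an assumption that mirrors the GNU strings default.

```python
import re

MIN_LEN = 4  # assumed minimum string length (the GNU strings default)


def ascii_strings(data: bytes, min_len: int = MIN_LEN) -> list:
    """Return runs of printable ASCII bytes, roughly like `strings -a`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def wide_strings(data: bytes, min_len: int = MIN_LEN) -> list:
    """Return runs of printable characters stored as UTF-16LE, roughly like `strings -el`."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]
```

Calling both functions on the raw bytes of a sample gives two separate lists, which makes it easy to see which candidates would need the "wide" keyword and which the "ascii" keyword.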

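I prefer UNICODE strings because they are often overlooked and change less frequently within a malware family. Make sure to use UNICODE strings with the "wide" keyword and ASCII strings with the "ascii" keyword in your rules, and add "fullword" if the string must match as a whole word.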
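
To make the three keywords more concrete, here is a small Python sketch of what they mean at the byte level. It is a simplified emulation written for this article, not Yara's actual matching code: "wide" is modelled as UTF-16LE, and "fullword" as "the characters directly before and after the match are not ASCII letters or digits". The function names are hypothetical.

```python
def encode(text: str, wide: bool) -> bytes:
    """ascii: one byte per character; wide: two bytes per character (UTF-16LE)."""
    return text.encode("utf-16-le") if wide else text.encode("ascii")


def _is_alnum(unit: bytes, wide: bool) -> bool:
    """True if the neighbouring character is an ASCII letter or digit."""
    if wide:
        return len(unit) == 2 and unit[1:] == b"\x00" and unit[:1].isalnum()
    return len(unit) == 1 and unit.isalnum()


def matches(data: bytes, text: str, wide: bool = False, fullword: bool = False) -> bool:
    """Simplified stand-in for a Yara string with optional wide/fullword modifiers."""
    needle = encode(text, wide)
    width = 2 if wide else 1
    start = data.find(needle)
    while start != -1:
        end = start + len(needle)
        before = data[max(start - width, 0):start]
        after = data[end:end + width]
        # fullword: reject matches glued to a letter or digit on either side
        if not fullword or not (_is_alnum(before, wide) or _is_alnum(after, wide)):
            return True
        start = data.find(needle, start + 1)
    return False
```

With this model, "urlmon" with fullword matches inside b"load urlmon.dll" but not inside b"myurlmonitor", and an ASCII search never finds a string that is stored in its wide form.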

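The problem with this approach is that we cannot guarantee the characteristic strings are unique; they may also appear in legitimate software.

Take a look at the extracted strings in the following example: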

NTLMSSP
%d.%d.%d.%d
%s\IPC$
\\%s
NT LM 0.12
%s%s%s
%s.exe %s
%s\Admin$\%s.exe
RtlUpcaseUnicodeStringToOemString
LoadLibrary( NTDLL.DLL ) Error:%d

Can you be sure that the string "NT LM 0.12" is specific to this malware and does not appear in legitimate software?

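To solve this problem I developed yarGen, a Yara rule generator that ships with a large database of goodware strings taken from legitimate software. I built this database from the Windows system folders of Windows 2003, Windows 7 and Windows 2008 R2 servers, plus legitimate software such as Microsoft Office, 7zip, Firefox, Chrome, Cygwin and various antivirus program folders. yarGen lets you generate your own goodware string database or add folders of further legitimate software to the existing one.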
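
The idea of a goodware string database can be sketched in Python as follows. This is only an illustration of the concept and not yarGen's code or storage format: the function names, the minimum string length of 6 and the use of a Counter are all assumptions.

```python
import os
import re
from collections import Counter
from typing import Optional

_ASCII = re.compile(rb"[\x20-\x7e]{6,}")         # assumed minimum length of 6
_WIDE = re.compile(rb"(?:[\x20-\x7e]\x00){6,}")  # same, stored as UTF-16LE


def extract_strings(data: bytes) -> set:
    """Collect the distinct ASCII and wide strings of one file."""
    found = {m.group().decode("ascii") for m in _ASCII.finditer(data)}
    found |= {m.group().decode("utf-16-le") for m in _WIDE.finditer(data)}
    return found


def build_goodware_db(folder: str, db: Optional[Counter] = None) -> Counter:
    """Count in how many files under `folder` each string occurs.

    Passing an existing `db` adds another folder of legitimate software to it.
    """
    db = Counter() if db is None else db
    for root, _dirs, files in os.walk(folder):
        for name in files:
            try:
                with open(os.path.join(root, name), "rb") as handle:
                    db.update(extract_strings(handle.read()))
            except OSError:
                continue  # unreadable file: skip it
    return db
```

Any string that ends up in such a database is a poor candidate for a signature, no matter how suspicious it looks in isolation.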

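yarGen extracts all ASCII and UNICODE strings from a malware sample and removes every string that also appears in the goodware string database. It then evaluates and scores each string using fuzzy regular expressions and a "Gibberish Detector", which allows yarGen to pick the most suitable characteristic strings. The top 20 of these strings are integrated into the final rule.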
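
The filter, score and select steps can be sketched like this. The scoring below is a toy stand-in and not yarGen's real heuristics: the weights, the list of file extensions and the crude looks_random check (used here in place of a proper gibberish detector) are all assumptions made for illustration.

```python
import re


def looks_random(candidate: str) -> bool:
    """Crude gibberish check: mostly symbols, or many upper/lower case flips."""
    length = max(len(candidate), 1)
    alnum_share = sum(c.isalnum() for c in candidate) / length
    flips = sum(
        1 for a, b in zip(candidate, candidate[1:])
        if a.isalpha() and b.isalpha() and a.isupper() != b.isupper()
    )
    return alnum_share < 0.5 or flips / length > 0.3


def score(candidate: str) -> int:
    """Toy score: higher means more useful as a signature string."""
    points = 0
    if len(candidate) >= 10:
        points += 2   # longer strings collide less often
    if re.search(r"\.(exe|dll|bat|tmp|dat)$", candidate, re.IGNORECASE):
        points += 3   # file names tend to be good indicators
    if looks_random(candidate):
        points -= 5   # random-looking strings rarely match other samples
    return points


def select_strings(sample_strings, goodware, top_n=20):
    """Drop strings known from goodware, then keep the top_n best-scoring ones."""
    unique = [s for s in set(sample_strings) if s not in goodware]
    return sorted(unique, key=lambda s: (-score(s), s))[:top_n]
```

Here goodware can be any container that supports the in operator, such as a set of strings collected from legitimate software.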

Let's look at two examples: samples of the Enfal Trojan and a sample of an SMB worm.

yarGen generates the following rule from the Enfal Trojan samples:

rule Enfal_Generic {
meta:
description = "Auto-generated rule - from 3 different files"
author = "YarGen Rule Generator"
reference = "not set"
date = "2015/02/15"
super_rule = 1
hash = "6d484daba3927fc0744b1bbd7981a56ebef95790"
hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
strings:
$s0 = "urlmon" fullword
$s1 = "Registered trademarks and service marks are the property of their respec" wide
$s2 = "Micorsoft Corportation" fullword wide
$s3 = "IM Monnitor Service" fullword wide
$s4 = "imemonsvc.dll" fullword wide
$s5 = "iphlpsvc.tmp" fullword
$s6 = "XpsUnregisterServer" fullword
$s7 = "XpsRegisterServer" fullword
$s8 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
$s9 = "tEHt;HuD" fullword
$s10 = "6.0.4.1624" fullword wide
$s11 = "#*8;->)" fullword
$s12 = "%/>#?#*8" fullword
$s13 = "\\%04x%04x\\" fullword
$s14 = ",," fullword
$s15 = ",," fullword
$s16 = ",," fullword
$s17 = ",," fullword
$s18 = ",," fullword
$s19 = ",," fullword
$s20 = ",," fullword
condition:
all of them
}

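The generated string set contains many useful strings, but also random ASCII characters ($s9, $s11, $s12). These match the current sample but will not match other, similar malware samples (for example from the same family).

yarGen generates the following rule from the SMB worm sample: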

rule sig_smb {
meta:
description = "Auto-generated rule - file smb.exe"
author = "YarGen Rule Generator"
reference = "not set"
date = "2015/02/15"
hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
strings:
$s0 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
$s1 = "SetServiceStatus failed, error code = %d" fullword ascii
$s2 = "%s\\Admin$\\%s.exe" fullword ascii
$s3 = "%s.exe %s" fullword ascii
$s4 = "iloveyou" fullword ascii
$s5 = "Microsoft@ Windows@ Operating System" fullword wide
$s6 = "\\svchost.exe" fullword ascii
$s7 = "secret" fullword ascii
$s8 = "SVCH0ST.EXE" fullword wide
$s9 = "msvcrt.bat" fullword ascii
$s10 = "Hello123" fullword ascii
$s11 = "princess" fullword ascii
$s12 = "Password123" fullword ascii
$s13 = "Password1" fullword ascii
$s14 = "config.dat" fullword ascii
$s15 = "sunshine" fullword ascii
$s16 = "password <=14" fullword ascii
$s17 = "del /a %1" fullword ascii
$s18 = "del /a %0" fullword ascii
$s19 = "result.dat" fullword ascii
$s20 = "training" fullword ascii
condition:
all of them
}

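The rules above are usable as they are, but they are far from optimal, even though they will not match legitimate software.

If you do not want to download or use yarGen, you can also use Yara Rule Generator, an online tool provided by Joe Security that is itself based on yarGen.

Next, let's look at how to create more efficient and more generic Yara rules.

Writing efficient, generic Yara rules

As I said in the introduction, rules that generate false positives are quite annoying. The real tragedy, however, is that most rules are too specific to match more than one sample, so they are no more effective than matching on a hash.

So I sort the strings into three categories:

Very specific strings: strings peculiar to the malicious sample
Rare strings: unlikely to appear in legitimate software, but it is possible
Strings that look common: generic-looking strings that nevertheless did not show up in legitimate software

Take a look at the rule to get a better understanding. Ignore the definition named $mz for now; I will explain it later.

The strings starting with $s are the specific strings. I consider them so distinctive that they will not appear in legitimate software. Note the typos in two of them: "Micorsoft Corportation" instead of "Microsoft Corporation", and "Monnitor" instead of "Monitor".

The strings starting with $x are the rare strings; they could still appear in legitimate software.

The strings starting with $z are the general strings: they look common on their own, yet they did not turn up in legitimate software.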

rule Enfal_Malware_Backdoor {
meta:
description = "Generic Rule to detect the Enfal Malware"
author = "Florian Roth"
date = "2015/02/10"
super_rule = 1
hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
strings:
$mz = { 4d 5a }

$s1 = "Micorsoft Corportation" fullword wide
$s2 = "IM Monnitor Service" fullword wide

$x1 = "imemonsvc.dll" fullword wide
$x2 = "iphlpsvc.tmp" fullword
$x3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword

$z1 = "urlmon" fullword
$z2 = "Registered trademarks and service marks are the property of their" wide
$z3 = "XpsUnregisterServer" fullword
$z4 = "XpsRegisterServer" fullword
condition:
( $mz at 0 ) and
(
( 1 of ($s*) ) or
( 2 of ($x*) and all of ($z*) )
)
and filesize < 
}

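Now look at the condition. Note that we use $mz to restrict the rule to PE files, which avoids false positives on things like antivirus signature files, browser caches or dictionary files. We also add filesize to put an upper limit on the size of the scanned sample, which makes the rule more precise.

I specified that the rule fires as soon as a single specific string is present in the target file (1 of $s*).

It also fires when at least two rare strings and all of the general strings are present (2 of $x* and all of $z*).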
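
The logic of this condition can be written out in plain Python to check one's understanding. This is a sketch under assumptions, not part of the rule: hits is a hypothetical set holding the identifiers of the strings that matched, and the size limit is passed in as the placeholder parameter max_size.

```python
def count(hits: set, prefix: str) -> int:
    """Number of matched string identifiers that start with `prefix`, e.g. "$s"."""
    return sum(1 for name in hits if name.startswith(prefix))


def enfal_condition(data: bytes, hits: set, max_size: int) -> bool:
    is_pe = data[:2] == b"MZ"             # $mz at 0 (4d 5a is "MZ")
    specific = count(hits, "$s") >= 1     # 1 of ($s*)
    rare_and_general = (
        count(hits, "$x") >= 2            # 2 of ($x*)
        and count(hits, "$z") == 4        # all of ($z*): the rule defines four
    )
    return is_pe and (specific or rare_and_general) and len(data) < max_size
```

A single $s hit is enough on its own, while the weaker $x and $z strings only count in combination, and neither path fires on a file that does not start with the MZ header.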

Now for the second example:

rule SMB_Worm_Tool_Generic {
meta:
description = "Generic SMB Worm/Malware Signature"
author = "Florian Roth"
reference = "http://goo.gl/N3zx1m"
date = "2015/02/08"
hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
strings:
$mz = { 4d 5a }

$s1 = "%s\\Admin$\\%s.exe" fullword ascii
$s2 = "SVCH0ST.EXE" fullword wide

$a1 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
$a2 = "\\svchost.exe" fullword ascii
$a3 = "msvcrt.bat" fullword ascii
$a4 = "Microsoft@ Windows@ Operating System" fullword wide

$x1 = "%s.exe %s" fullword ascii
$x2 = "password <=14" fullword ascii
$x3 = "del /a %1" fullword ascii
$x4 = "del /a %0" fullword ascii
$x5 = "SetServiceStatus failed, error code = %d" fullword ascii

$z1 = "secret" fullword ascii
$z2 = "Hello123" fullword ascii
$z3 = "princess" fullword ascii
$z4 = "Password123" fullword ascii
$z5 = "Password1" fullword ascii
$z6 = "sunshine" fullword ascii
$z7 = "training" fullword ascii
$z8 = "iloveyou" fullword ascii
condition:
$mz at 0 and
(  of ($s*) and  of ($x*) ) or
( all of ($a*) and  of ($x*) ) or
(  of ($z*) and  of ($x*) ) and
filesize < 
}

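$s* are the specific strings (for example SVCH0ST.EXE, where the letter "O" has been replaced with the digit "0", which may be a trait unique to this sample).
$a* are the rare strings, which might also appear in legitimate software.
$x* are the general strings, traits common to malware that do not match legitimate software.
$z* are hand-picked password strings, which usually only show up in malware that performs brute-force attacks, so we treat them as a category of their own.

Finally, we tune the condition by weighing the different string categories against each other.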
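
When a condition chains several alternatives with "or" and adds extra checks with "and", grouping deserves a second look. In Yara, as in Python, "and" binds more tightly than "or", so a check written only before the first or after the last alternative applies to that one alternative unless all alternatives are wrapped in one outer pair of parentheses. The sketch below shows the difference with plain booleans; all names are hypothetical.

```python
def ungrouped(is_pe, group_a, group_b, group_c, small_enough):
    # Parsed as: (is_pe and group_a) or group_b or (group_c and small_enough)
    return is_pe and group_a or group_b or group_c and small_enough


def grouped(is_pe, group_a, group_b, group_c, small_enough):
    # The header check and the size check now guard every alternative.
    return is_pe and (group_a or group_b or group_c) and small_enough
```

With the ungrouped form, a large non-PE file that satisfies only group_b still matches. If the header and size checks are meant to apply to every alternative, the grouped form expresses that intent.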

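Testing

Testing is a crucial step: how good a Yara rule is depends on whether it actually detects the samples it is given.

You should test your Yara rules in two steps:

Scan the malware samples
Scan legitimate software

Download Yara and give it a try!

If your rule detects the malware samples and does not match any legitimate software, it can be considered good enough ^_^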
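
The two test steps can be captured in a tiny harness. This is a sketch that assumes the rule is available as a Python callable taking the file contents; toy_rule below is a hypothetical stand-in for a real Yara rule, and the sample collections are plain dictionaries of file name to bytes.

```python
from typing import Callable, Dict, List


def evaluate(rule: Callable[[bytes], bool],
             malware: Dict[str, bytes],
             goodware: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Step 1: list malware the rule misses. Step 2: list goodware it wrongly flags."""
    missed = sorted(name for name, data in malware.items() if not rule(data))
    false_positives = sorted(name for name, data in goodware.items() if rule(data))
    return {"missed": missed, "false_positives": false_positives}


def toy_rule(data: bytes) -> bool:
    # Stand-in for a real rule: MZ header plus one specific wide string.
    return data[:2] == b"MZ" and "IM Monnitor Service".encode("utf-16-le") in data
```

A rule passes when both lists in the returned report are empty. A non-empty "missed" list means the rule is too specific, and a non-empty "false_positives" list means it is too loose.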
