title | author | date | CreateTime | categories |
C# 對 byte 數組進行模式搜尋 | lindexi | 2019-08-31 16:55:58 +0800 | 2018-07-19 11:16:46 +0800 | C# |
本文告訴大家幾個方法從 byte 數組找到對應的相同序列的數組
最簡單的方法是進行數值判斷,但是代碼最少是使用Linq ,效率比較高是使用 Boyer-Moore 算法,下面就告訴大家幾個算法的代碼
判斷數值
class ByteArrayRocks { public static IEnumerable<int> IndexOf(byte[] source, int start, byte[] pattern) { if (IsEmptyLocate(source, start, pattern)) { yield break; } for (int i = start; i < source.Length; i++) { if (!IsMatch(source, i, pattern)) { continue; } yield return i; } } private static readonly int[] Empty = new int[0]; private static bool IsMatch(byte[] array, int position, byte[] candidate) { if (candidate.Length > (array.Length - position)) { return false; } for (int i = 0; i < candidate.Length; i++) { if (array[position + i] != candidate[i]) { return false; } } return true; } private static bool IsEmptyLocate(byte[] array, int start, byte[] candidate) { return array == null || candidate == null || array.Length == 0 || array.Length < start || candidate.Length == 0 || candidate.Length + start > array.Length; } }
這是最簡單的方法,參見 https://stackoverflow.com/a/283648/6116637
linq
這個方法的代碼最少
class LinqArraySearch { public static IEnumerable<int> IndexOf(byte[] source, int start, byte[] pattern) { for (int i = start; i < source.Length; i++) { if (source.Skip(i).Take(pattern.Length).SequenceEqual(pattern)) { yield return i; } } } }
Boyer-Moore-Horspool 搜尋
這是最快的方法
class BoyerMooreHorspool { public static IEnumerable<long> IndexesOf(byte[] source, int start, byte[] pattern) { if (source == null) { throw new ArgumentNullException(nameof(source)); } if (pattern == null) { throw new ArgumentNullException(nameof(pattern)); } long valueLength = source.LongLength; long patternLength = pattern.LongLength; if ((valueLength == 0) || (patternLength == 0) || (patternLength > valueLength)) { yield break; } var badCharacters = new long[256]; for (var i = 0; i < 256; i++) { badCharacters[i] = patternLength; } var lastPatternByte = patternLength - 1; for (long i = 0; i < lastPatternByte; i++) { badCharacters[pattern[i]] = lastPatternByte - i; } long index = start; while (index <= valueLength - patternLength) { for (var i = lastPatternByte; source[index + i] == pattern[i]; i--) { if (i == 0) { yield return index; break; } } index += badCharacters[source[index + lastPatternByte]]; } } }
參見:https://stackoverflow.com/q/16252518/6116637
測試了三個方法的性能,請看下面
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134 Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores .NET Core SDK=2.1.202 [Host] : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT DefaultJob : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
Method | Mean | Error | StdDev |
Linq | 13,332.8 us | 251.782 us | 466.694 us |
ByteArrayRocks | 371.3 us | 7.327 us | 14.462 us |
BoyerMooreHorspool | 108.3 us | 1.153 us | 1.079 us |
其他方法請看下面
使用不安全代碼的 Boyer Moore 算法 C# High Performance Boyer Moore Byte Array Search Algorithm