原題:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
解決方法:
我采用了最簡單的一種方法,用一個哈希表來記錄基因串出現的次數,達到兩次即将其加入到結果集中。
還有很多優化的方法,但總體思路都差不多,這裡就不一一列舉了。
代碼:
vector<string> findRepeatedDnaSequences(string s) {
vector<string> res;
map<string,int> m;
int n = s.size();
if (n < 10)
return res;
string seq = s.substr(0, 10);
m[seq]++;
for(int i = 10; i < n; i++){
seq.erase(0, 1);
seq += s[i];
++m[seq];
if (m[seq] == 2)
res.push_back(seq);
}
return res;
}