天天看点

187. Repeated DNA Sequences [leetcode]

题目:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].       

方法一:

hashmap + string操作。

时间:O(N)

class Solution {public: vector<string> findRepeatedDnaSequences(string s) { vector<string> res; if (s.size() < 10) return res; unordered_map<string, int> strs; for (int i = 0; i <= s.size() - 10; i++) { strs[s.substr(i, 10)]++; } for (unordered_map<string, int>:: iterator it = strs.begin(); it != strs.end(); it++) { if (it->second > 1) res.push_back(it->first); } return res; }};

该方法的缺点是由于把string用作key,空间占用过大

接下来的方法参考的是讨论区大神们的思路:

方法二:介个bit manipulation

把10位的string用一个int来表示。 A:00 C:01 G:10 T:11 因此一个DNA片段只要20个bit即可表示,范围在INT内。

code:

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        if (s.size() < 10) return res;
        unordered_map<char,int> change;
        change['A'] = 0;
        change['C'] = 1;
        change['G'] = 2;
        change['T'] = 3;
        unordered_map<int, int> strs;
        int num = 0;
        for (int i = 0; i < s.size(); i++) {
             num = num << 2 | change[s[i]];
            if (i < 9) continue;
            num &= 0xfffff;
            if (strs.count(num) && strs[num] == 1) {
                res.push_back(s.substr(i - 9, 10));
            }
            strs[num]++;
        }

        return res;
    }
};