[leetcode] 187. Repeated DNA Sequences

2023-06-19 17:12:51

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

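Solution 1:

The idea here differs from earlier approaches in two ways: 1) a 10-character string is represented by a single int; 2) that int is updated incrementally as the window moves, rather than being rebuilt from scratch each time.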

First, a DNA string is made up of the characters A, C, G, and T, whose ASCII codes are: A: 0100 0001, C: 0100 0011, G: 0100 0111, T: 0101 0100.
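As a quick check of the bit patterns listed above, the following minimal sketch extracts the low three bits of each nucleotide with `c & 7`. The helper names `lowThreeBits` and `codesAreDistinct` are made up for illustration and do not appear in the original solution.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the low three bits of a character's ASCII code.
int lowThreeBits(char c) {
    return c & 7;  // 7 == binary 111, so only the last three bits survive
}

// Hypothetical helper: true if A, C, G, and T all map to different 3-bit codes.
bool codesAreDistinct() {
    const std::string bases = "ACGT";
    for (std::size_t i = 0; i < bases.size(); ++i)
        for (std::size_t j = i + 1; j < bases.size(); ++j)
            if (lowThreeBits(bases[i]) == lowThreeBits(bases[j])) return false;
    return true;
}
```

With these ASCII codes, `lowThreeBits` gives 1 for A, 3 for C, 7 for G, and 4 for T.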

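Notice that the lowest three bits of the four codes are all different (A: 001, C: 011, G: 111, T: 100). So 30 bits are enough to represent a string of length 10, which fits in a 32-bit int. To update the current substring, first apply a mask to keep the lowest 27 bits of the int (the nine most recent characters), then shift left by 3 bits, and finally place the 3 bits of the newest char into the lowest 3 bits. A hash table is used to detect repeated substrings.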
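The update step described above can be sketched in isolation as follows. The names `kMask27`, `encodeWindow`, and `rollWindow` are hypothetical helpers introduced only for this illustration; the idea is that rolling the previous window forward by one character should give the same int as encoding the new window from scratch.

```cpp
#include <cstddef>
#include <string>

// Mask keeping the lowest 27 bits, i.e. the nine most recent 3-bit codes.
const int kMask27 = 0x7ffffff;

// Hypothetical helper: encodes the 10 characters starting at `start` from scratch.
// Assumes start + 10 <= s.size().
int encodeWindow(const std::string& s, std::size_t start) {
    int code = 0;
    for (std::size_t k = 0; k < 10; ++k)
        code = (code << 3) | (s[start + k] & 7);
    return code;
}

// Hypothetical helper: slides the window one character to the right.
// Masking first drops the oldest character so the result stays within 30 bits.
int rollWindow(int code, char next) {
    return ((code & kMask27) << 3) | (next & 7);
}
```

Rolling costs a constant number of bit operations per character, whereas re-encoding every window would cost ten steps each time.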

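#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        // A string of length 10 or less cannot contain a repeated 10-letter window.
        if (s.size() <= 10) return res;

        int cur = 0;
        size_t i = 0;
        const int mask = 0x7ffffff;  // lowest 27 bits: the nine most recent 3-bit codes

        // Preload the first nine characters; the loop below appends the tenth.
        while (i < 9) cur = (cur << 3) | (s[i++] & 7);

        unordered_map<int, int> m;  // window code -> number of occurrences so far
        while (i < s.size()) {
            // Drop the oldest character, shift, and append the newest 3-bit code.
            cur = ((cur & mask) << 3) | (s[i++] & 7);
            // Report a window exactly once, when it is seen for the second time.
            if (++m[cur] == 2) res.push_back(s.substr(i - 10, 10));
        }
        return res;
    }
};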
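For cross-checking results, a plain baseline can be handy. The sketch below is not the bit-encoding method above: it simply counts every 10-letter substring using the substring itself as the hash key. The function name `repeatedDnaBaseline` is made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Baseline for comparison only: counts each 10-letter window directly by its text.
std::vector<std::string> repeatedDnaBaseline(const std::string& s) {
    std::vector<std::string> res;
    std::unordered_map<std::string, int> seen;  // window text -> occurrences so far
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        // Report a window once, at its second occurrence.
        if (++seen[window] == 2) res.push_back(window);
    }
    return res;
}
```

On the example input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this baseline returns "AAAAACCCCC" and "CCCCCAAAAA", matching the expected answer. It hashes a 10-character string per window, which is the cost the int encoding is meant to avoid.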
