187. Repeated DNA Sequences [leetcode]

2023-06-19 17:14:26

题目：

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

方法一：

hashmap + string操作。

时间：O(N)

class Solution {public: vector<string> findRepeatedDnaSequences(string s) { vector<string> res; if (s.size() < 10) return res; unordered_map<string, int> strs; for (int i = 0; i <= s.size() - 10; i++) { strs[s.substr(i, 10)]++; } for (unordered_map<string, int>:: iterator it = strs.begin(); it != strs.end(); it++) { if (it->second > 1) res.push_back(it->first); } return res; }};

该方法的缺点是由于把string用作key，空间占用过大

接下来的方法参考的是讨论区大神们的思路：

方法二：介个bit manipulation

把10位的string用一个int来表示。 A:00 C:01 G:10 T:11 因此一个DNA片段只要20个bit即可表示，范围在INT内。

code：

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        if (s.size() < 10) return res;
        unordered_map<char,int> change;
        change['A'] = 0;
        change['C'] = 1;
        change['G'] = 2;
        change['T'] = 3;
        unordered_map<int, int> strs;
        int num = 0;
        for (int i = 0; i < s.size(); i++) {
             num = num << 2 | change[s[i]];
            if (i < 9) continue;
            num &= 0xfffff;
            if (strs.count(num) && strs[num] == 1) {
                res.push_back(s.substr(i - 9, 10));
            }
            strs[num]++;
        }

        return res;
    }
};

187. Repeated DNA Sequences [leetcode]

继续阅读

一文看懂字符串的加减乘除

极大似然法(ML)与最大期望法(EM)

C++ 第十五周报告1--《冒泡法排序》

C++实现简单顺序表

笔试面试题目：滑动窗口(二)

27. Remove Element(列表)题目代码

数据结构与算法（27）——排序（二）

Dijkstra--简易版（最短路径）

C经典书籍笔记——C陷阱与缺陷②(语法陷阱之优先级)一、错误案列二、优先级规律

GitHub连夜封杀！这份阿里 10W 字内部 Java 字面试手册到底有多强？

线性表之顺序表的实现

LeetCode OJ Binary Tree Right Side View

leetcode809

C++判断素数、求最大公约数代码判断一个数是否为素数求两个数的最大公约数

SequoiaDB巨杉数据库C++驱动概述

hdu7108哈希