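Problem

A search engine records every query string that users submit in its log files. Each query string is 1 to 255 bytes long.

Suppose there are currently 10 million records. The query strings are highly repetitive: although the total is 10 million, no more than 3 million distinct strings remain after removing duplicates. The more often a query string repeats, the more users searched for it, and so the more popular it is. Find the 10 most popular query strings, using no more than 1G of memory.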

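Approach

Step 1: Count how often each query string occurs with a hash map (called unordered_map in the STL).

Step 2: Use a min-heap of capacity K to extract the K strings with the highest counts.

(See http://blog.csdn.net/fuyufjh/article/details/48369801)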

Code

#include <iostream>
#include <vector>
#include <queue>
#include <functional>
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <map>
#include <string>
#include <unordered_map>
using namespace std;

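// A (query string, occurrence count) pair.
typedef pair<string, int> Record;

// Orders records so that priority_queue behaves as a min-heap on the count:
// the record with the smallest count sits at the top.
struct RecordComparer {
    bool operator() (const Record &r1, const Record &r2) const {
        return r1.second > r2.second;
    }
};

// Returns up to k records with the highest counts, in ascending order of count.
vector<Record> TopKNumbers(const vector<string> &input, int k) {
    vector<Record> result;
    if (k <= 0) return result;  // avoids calling top() on an empty heap below

    // Step 1: count the occurrences of each string.
    unordered_map<string, int> stat;
    for (const string &s : input) stat[s]++;

    // Step 2: seed the heap with the first k distinct strings...
    priority_queue<Record, vector<Record>, RecordComparer> heap;
    auto iter = stat.begin();
    for (int i = 0; i < k && iter != stat.end(); i++, iter++) {
        heap.push(*iter);
    }
    // ...then replace the current minimum whenever a larger count appears.
    for (; iter != stat.end(); iter++) {
        if (iter->second > heap.top().second) {
            heap.pop();
            heap.push(*iter);
        }
    }
    // Drain the heap; the smallest count comes out first.
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}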

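/********  Test code  *********/
int main() {
    // The numeric literals in the original listing were lost; the three values
    // below are illustrative assumptions, not the author's settings.
    const int kNumStrings = 100;  // how many random strings to generate
    const int kDistinct = 20;     // strings are drawn from STR0 .. STR19
    const int kTopK = 5;          // how many top entries to report

    vector<string> test;
    for (int i = 0; i < kNumStrings; i++) {
        int x = rand() % kDistinct;
        test.push_back("STR" + to_string(x));
    }
    auto result = TopKNumbers(test, kTopK);
    // result is in ascending order of count, so walk it backwards.
    for (auto it = result.rbegin(); it != result.rend(); it++) {
        cout << it->first << '\t' << it->second << endl;
    }
    cout << "============================" << endl;
    // Print the sorted input so the counts above can be checked by eye.
    sort(test.begin(), test.end());
    for (const string &s : test) {
        cout << s << endl;
    }
}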

Ref

http://blog.csdn.net/v_JULY_v/article/details/6403777
