PTA Word Frequency Statistics (30 points)

2021-08-31 10:51:07

Word Frequency Statistics (30 points)

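Write a program that, given a passage of English text, counts the number of distinct words in it and reports the top 10% of words by frequency.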

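A "word" is a run of consecutive word characters, at most 80 of them; a word longer than 15 characters is truncated and only its first 15 word characters are kept. The legal "word characters" are upper- and lower-case letters, digits, and the underscore; every other character is treated as a word separator.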
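
The word rule above can be sketched as a small tokenizer. This is only an illustration, not part of the original post: the helper name `split_words` is hypothetical, and two behaviours are assumptions read off the sample rather than the definition itself, namely that words are compared case-insensitively (`This` and `this` are counted together) and that reading stops at the first `#`.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): split `text` into
// normalised words. Assumes case is folded and that '#' ends the text.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::string cur;
    for (char ch : text) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c) || c == '_') {
            // Word character: keep at most the first 15, lower-cased.
            if (cur.size() < 15) cur += static_cast<char>(std::tolower(c));
        } else {
            // Separator: flush the word collected so far, if any.
            if (!cur.empty()) {
                words.push_back(cur);
                cur.clear();
            }
            if (ch == '#') break;  // assumed end-of-text marker
        }
    }
    if (!cur.empty()) words.push_back(cur);  // text ended without '#'
    return words;
}
```

For instance, `split_words("This is a test.#")` yields `this`, `is`, `a`, `test`. Truncating while the word is being built, rather than afterwards, keeps each buffer at 15 characters no matter how long the run is.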

Input format:

Judging from the sample, the input is English text terminated by the character `#`; anything after the `#` is ignored.

Output format:

Judging from the sample, the first line is the number of distinct words, followed by one line per reported word in the form `count:word`, ordered by descending count with ties broken alphabetically.

Sample input:

This is a test.

The word "this" is the word with the highest frequency.

Longlonglonglongword should be cut off, so is considered as the same as longlonglonglonee.  But this_8 is different than this, and this, and this...#
this line should be ignored.

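Sample output: (Note: although the word `the` also appears 4 times, only the top 10% of words are output (that is, the first 2 of the 23 words), and in alphabetical order `the` ranks 3rd, so it is not output.)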

23
5:this
4:is
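
The selection rule in the note above can be sketched on its own: sort by count in descending order, break ties alphabetically, and keep the first tenth. The names `Entry` and `top_tenth` are hypothetical, and rounding the 10% down is an assumption; it matches the sample (2 of 23) and the integer division used in the solution below.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // (word, count)

// Hypothetical helper: return the top 10% of entries, rounded down.
std::vector<Entry> top_tenth(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  // Higher count first; equal counts in alphabetical order.
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    entries.resize(entries.size() / 10);  // integer division, e.g. 23 / 10 == 2
    return entries;
}
```

With 23 entries that include `this` (5), `is` (4) and `the` (4), only `this` and `is` survive: `is` sorts before `the`, and the cut falls between them.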

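#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>
using namespace std;

struct node {
    string s;  // the word (lower-cased, at most 15 characters)
    int n;     // number of occurrences
};
vector<node> q;

// Sort by frequency, highest first; equal frequencies in ascending word order.
bool cmp(const node &s1, const node &s2)
{
    if (s1.n == s2.n)
        return s1.s < s2.s;
    return s1.n > s2.n;
}

int main()
{
    char n;
    string s;
    // scanf returns EOF (non-zero) at end of input, so compare against 1.
    while (scanf("%c", &n) == 1)
    {
        if ((n >= 'A' && n <= 'Z') || (n >= 'a' && n <= 'z') || (n >= '0' && n <= '9') || n == '_')
        {
            if (n >= 'A' && n <= 'Z')
                n = n + 32;  // convert upper case to lower case
            s += n;          // append the character to the current word
        }
        else if (n == '#' || s.size() > 0)
        {
            if (s.size() > 0)
            {
                string ss = s.substr(0, 15);  // keep only the first 15 characters
                bool found = false;
                for (size_t i = 0; i < q.size(); i++)
                {
                    if (q[i].s == ss)
                    {
                        q[i].n++;  // known word: increase its count
                        found = true;
                        break;
                    }
                }
                if (!found)
                {
                    node cc;
                    cc.n = 1;
                    cc.s = ss;
                    q.push_back(cc);  // new word: add a new record
                }
            }
            s.clear();  // reset the buffer for the next word
            if (n == '#')
                break;  // '#' ends the text
        }
    }
    printf("%d\n", (int)q.size());  // number of distinct words
    sort(q.begin(), q.end(), cmp);
    for (size_t i = 0; i < q.size() / 10; i++)
        printf("%d:%s\n", q[i].n, q[i].s.c_str());
    return 0;
}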
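
The solution above looks each word up by scanning the vector `q` from the start, which takes time proportional to the number of distinct words for every word read. As a possible alternative, not taken from the original post, the counts could be kept in a `std::map`; the helper name `count_words` is hypothetical and it assumes the words have already been lower-cased and truncated.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: tally words that are already normalised
// (lower-cased and cut to 15 characters).
std::map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::map<std::string, int> freq;
    for (const std::string& w : words) {
        ++freq[w];  // operator[] inserts a zero count the first time a word is seen
    }
    return freq;
}
```

`freq.size()` is then the number of distinct words. The entries would still have to be copied into a vector and sorted by count, since a map is ordered by key rather than by value.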
