Composite Data Types and English Word Frequency Counting

The requirements for this assignment come from: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE2/homework/2696

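1. How to add, delete, modify, look up, and traverse lists, tuples, dictionaries, and sets.

　　① Adding, deleting, modifying, looking up, and traversing a list: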

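# List (named languages so the built-in list is not shadowed)
languages = ["Hello World", "C", "JAVA", "Python"]
print("List:", languages)
# Add
languages.append("JavaScript")
print("Append JavaScript:", languages)
languages.insert(2, "PHP")
print("Insert PHP at index 2 (the third position):", languages)
# Delete
languages.pop()
print("Delete the last value:", languages)
languages.pop(3)
print("Delete the fourth value:", languages)
# Modify
languages[0] = "HTML"
print("Modify the first value:", languages)
# Look up
print("Look up the second value:", languages[1])
# Traverse
print("Traverse the list:")
for item in languages:
    print("\t", item)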
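
The list example above deletes by position only. As a small supplementary sketch (the sample values here are made up for illustration and are not from the original post), a list can also be changed by value, and traversed together with its indexes:

```python
# Hypothetical sample data, not taken from the original post.
langs = ["HTML", "C", "PHP", "Python", "C"]

langs.remove("C")               # delete the first element equal to "C"
del langs[0]                    # delete by index without returning the value
langs.extend(["Go", "Rust"])    # add several values at once

position = langs.index("Python")  # index of the first match
has_c = "C" in langs              # membership test

for i, name in enumerate(langs):  # traverse with the index alongside the value
    print(i, name)
```

Note that remove() raises ValueError when the value is absent, so a membership test beforehand is a reasonable precaution.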

　　② Adding, deleting, modifying, looking up, and traversing a tuple:

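# Tuple (named languages so the built-in tuple is not shadowed)
# A tuple is immutable, so only lookup and traversal apply.
languages = ("Hello World", "C", "JAVA", "Python")
print("Tuple:", languages)
# Look up
print("Look up the first value:", languages[0])
# Traverse
print("Traverse the tuple:")
for item in languages:
    print("\t", item)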
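
The tuple code shows only lookup and traversal because a tuple cannot be changed in place. The sketch below illustrates that point: assignment raises TypeError, while "adding" or "deleting" really means building a new tuple. The variable names are chosen here for illustration.

```python
langs = ("Hello World", "C", "JAVA", "Python")

# Modifying an element in place is not allowed.
try:
    langs[0] = "HTML"
    modified = True
except TypeError:
    modified = False

# "Adding" builds a new tuple by concatenation; the original is untouched.
extended = langs + ("JavaScript",)

# "Deleting" likewise builds a new tuple, here by slicing off the last item.
shortened = langs[:-1]

print(modified, extended, shortened)
```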

　　③ Adding, deleting, modifying, looking up, and traversing a dictionary:

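# Dictionary (named scores so the built-in dict is not shadowed)
scores = {"Jack": 75, "Mary": 81, "Amy": 68, "Joe": 92}
print("Dictionary:", scores)
# Add
scores["Lida"] = 90
print("Add Lida:", scores)
# Delete
if "Jack" in scores:
    scores.pop("Jack")
print("Delete Jack:", scores)
# Modify
scores["Joe"] = 88
print("Modify Joe:", scores)
# Look up
print("Look up Amy's score:", scores["Amy"])
# Traverse (iterating over a dictionary yields its keys)
print("Traverse the dictionary:")
for name in scores:
    print("\t", name)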
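
Looking up a missing key with square brackets raises KeyError, and iterating over a dictionary yields only its keys. A minimal sketch of the alternatives, using a small made-up score table: get() and pop() accept a default, and items() yields key and value together.

```python
# Hypothetical sample data for illustration.
scores = {"Mary": 81, "Amy": 68, "Joe": 88}

# get() returns a default instead of raising KeyError for a missing key.
jack = scores.get("Jack", 0)

# pop() with a default removes the key if present and never raises.
removed = scores.pop("Jack", None)

# items() yields (key, value) pairs, so both are available in the loop.
for name, score in scores.items():
    print("\t", name, score)

# Names sorted by score, highest first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```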

　　④ Adding, deleting, modifying, looking up, and traversing a set:

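# Set (named languages so the built-in set is not shadowed)
languages = set(["Hello World", "C", "JAVA", "Python"])
print("Set:", languages)
# Add
languages.add("Hello World")
print("Add 'Hello World' (no effect, a set holds no duplicate values):", languages)
languages.add("C++")
print("Add 'C++':", languages)
# Delete
if "JAVA" in languages:
    languages.remove("JAVA")
print("Delete 'JAVA':", languages)
languages.pop()  # removes an arbitrary element
print("Delete one value:", languages)
# Traverse
print("Traverse the set:")
for item in languages:
    print("\t", item)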
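
A set has no indexes, so "lookup" is a membership test, and "modify" amounts to removing one value and adding another. The sketch below, with made-up sample sets, shows discard() as a delete that does not raise for a missing value, plus the basic set operations:

```python
# Hypothetical sample data for illustration.
a = {"C", "JAVA", "Python"}
b = {"Python", "C++"}

a.discard("PHP")      # no error even though "PHP" is absent
found = "JAVA" in a   # lookup is a membership test

union = a | b         # values in either set
common = a & b        # values in both sets
only_a = a - b        # values in a but not in b

for item in sorted(union):  # sorted() gives a repeatable traversal order
    print("\t", item)
```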

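2. Summarize the connections and differences among lists, tuples, dictionaries, and sets. Consider the following aspects:

Brackets
Ordered or unordered
Mutable or immutable
Duplicates allowed or not
Storage and lookup method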
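
The aspects listed above can be checked directly in code. This is an illustrative sketch rather than a full answer to the question; the variable names are made up. Lists and tuples are stored by position, while dictionaries and sets are hash-based, which is why keys and set values must be unique.

```python
items_list = ["a", "b", "a"]            # [] ordered, mutable, duplicates allowed
items_tuple = ("a", "b", "a")           # () ordered, immutable, duplicates allowed
items_dict = {"a": 1, "b": 2, "a": 3}   # {key: value} mutable, keys unique
items_set = {"a", "b", "a"}             # {} mutable, unordered, values unique

print(len(items_list), len(items_tuple), len(items_dict), len(items_set))

# Lists and tuples are looked up by position, dictionaries by key.
first = (items_list[0], items_tuple[0])
value = items_dict["a"]   # the later duplicate key overwrote the earlier one

# Sets support only membership tests, not indexing.
try:
    items_set[0]
    indexable = True
except TypeError:
    indexable = False
print(first, value, indexable)
```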

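3. Word frequency counting

　　1. Download a full-length novel and save it as a UTF-8 encoded text file (file)

　　2. Read the file into a string (str)

　　3. Preprocess the text

　　4. Split the text and extract the words (list)

　　5. Count the words with a dictionary (set, dict)

　　6. Sort by word frequency (list.sort(key=lambda), tuple)

　　7. Exclude grammatical words that carry no meaning, such as pronouns, articles, and conjunctions

　　　　Define a custom stop word list

　　　　or use stops.txt

　　8. Output the TOP(20)

　　9. Visualization: word cloud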
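
Steps 3 to 8 can be sketched on a short in-memory string before running them on a whole novel. The sample text and the custom stop word set below are assumptions made for illustration, and collections.Counter is swapped in here as one convenient way to count; it is not necessarily the approach the assignment expects.

```python
import re
from collections import Counter

# Hypothetical sample text and stop words, standing in for the novel and stops.txt.
text = "The cat and the dog. The dog saw a cat, and the cat ran!"
stop_words = {"the", "and", "a"}

# Steps 3-4: lowercase the text and extract runs of letters as words.
words = re.findall(r"[a-z']+", text.lower())
# Steps 5 and 7: count the words, skipping stop words.
counts = Counter(w for w in words if w not in stop_words)
# Steps 6 and 8: most_common() sorts by frequency and keeps the top 20.
top = counts.most_common(20)

for word, n in top:
    print(word, "--", n)
```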

Save the sorted word list, word, to a CSV file:

import pandas as pd
pd.DataFrame(data=word).to_csv('big.csv',encoding='utf-8')
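
If pandas is not available, the standard csv module can write the same kind of file. A minimal sketch, assuming word is a list of (word, count) pairs; the sample rows and the temporary output path are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample data in the same shape as the sorted word list.
word = [("goethe", 120), ("life", 95), ("time", 80)]

# Assumed output location; any writable path would do.
path = os.path.join(tempfile.gettempdir(), "big.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["word", "count"])  # header row
    writer.writerows(word)

# Read the file back to check what was written.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```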

Generate a word cloud with an online tool:
https://wordart.com/create

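import pandas as pd
from nltk.corpus import stopwords

# Get the stop words (the NLTK corpus must be installed: nltk.download('stopwords'))
stop_words = stopwords.words('english')
# Read the file
with open("Life of Johann Wolfgang Goethe.txt", "r", encoding="utf-8") as f:
    text = f.read()
# Process the text
text = text.lower()  # convert to lowercase
remove = ".,?:…—“”"
for ch in remove:
    text = text.replace(ch, " ")  # replace punctuation with spaces
words = text.split()  # split into words on whitespace
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1  # count each word in a single pass
for s in stop_words:
    if s in counts:
        counts.pop(s)  # remove stop words
d = sorted(counts.items(), reverse=True, key=lambda kv: kv[1])  # sort by frequency, descending
print("The 20 most frequent words and their counts:")
for word, count in d[:20]:  # slicing avoids an IndexError on short texts
    print(word, "--", count)
pd.DataFrame(data=d).to_csv('big.csv', encoding='utf-8')  # save in .csv format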
