1. How to add, delete, modify, look up, and traverse elements in a list, tuple, dictionary, and set.

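- Brackets: which delimiters each type is written with
- Ordered or unordered
- Mutable or immutable
- Duplicates allowed or not
- How elements are stored and looked up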
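The operations and properties listed above can be sketched as follows. This is a minimal illustration, not an exhaustive reference; all variable names and sample values are made up for the example.

```python
# List: written with [], ordered, mutable, duplicates allowed
nums = [3, 1, 3]
nums.append(5)           # add
nums.remove(1)           # delete the first matching value
nums[0] = 9              # modify by index
has_three = 3 in nums    # look up (scans the list)
for n in nums:           # traverse in order
    pass

# Tuple: written with (), ordered, immutable, duplicates allowed
point = (1, 2, 2)
point = point + (4,)     # "adding" builds a new tuple; no in-place change
twos = point.count(2)    # look up
for p in point:          # traverse in order
    pass

# Dict: written with {key: value}, mutable, keys are unique
ages = {"ann": 30}
ages["bob"] = 25         # add
ages["ann"] = 31         # modify
del ages["bob"]          # delete
ann = ages.get("ann")    # look up by key (hash-based)
for name, age in ages.items():   # traverse key/value pairs
    pass

# Set: written with {} or set(), unordered, mutable, no duplicates
letters = {"a", "b", "a"}        # the repeated "a" is stored once
letters.add("c")         # add
letters.discard("b")     # delete (no error if the value is absent)
has_a = "a" in letters   # look up (hash-based)
sorted_letters = sorted(letters) # traverse in a predictable order
```

A set has no "modify" operation: an element is changed by removing the old value and adding the new one.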
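3. Word frequency statistics
- Steps:
1. Download a full-length novel and save it as a UTF-8 encoded text file (file)
2. Read the file into a string (str)
3. Preprocess the text
4. Split the text into words (list)
5. Count the words with a dictionary (set, dict)
6. Sort by frequency (list.sort(key=lambda), tuple)
7. Exclude function words that carry no meaning of their own, such as pronouns, articles, and conjunctions
8. Output the TOP 20
- Visualization: word cloud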
Save the sorted word list word to a CSV file:
import pandas as pd
pd.DataFrame(data=word).to_csv('big.csv',encoding='utf-8')
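If pandas is not available, the same kind of file can probably be produced with the standard-library csv module. The sketch below is an assumption-laden alternative, not the method used in these notes: the helper write_counts, the header row, and the sample pairs are all hypothetical, and the output goes to an in-memory buffer so the example stays self-contained.

```python
import csv
import io

# Hypothetical sample of sorted (word, count) pairs, standing in for `word`
word = [("star", 7), ("night", 3), ("ship", 2)]

def write_counts(pairs, fileobj):
    """Write a header row followed by one (word, count) row per pair."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["word", "count"])
    writer.writerows(pairs)

# Write to an in-memory buffer instead of a real file
buf = io.StringIO()
write_counts(word, buf)
csv_text = buf.getvalue()
```

To write to disk instead, pass a file opened with open('big.csv', 'w', encoding='utf-8', newline='') in place of the buffer.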
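import pandas as pd

# Load the stop words (whitespace-separated) into a set for fast lookup
with open('stops.txt', 'r', encoding='utf8') as f:
    stopSet = set(f.read().split())

def gettxt():
    # Read the novel, lowercase it, and replace punctuation with spaces
    sep = ",.;:?-_"
    with open('star.txt', 'r', encoding='utf8') as f:
        txt = f.read().lower()
    for ch in sep:
        txt = txt.replace(ch, ' ')
    return txt

starList = gettxt().split()
# Distinct words, minus the stop words
starSet = set(starList) - stopSet

# Count how often each remaining word occurs
starDict = {}
for word in starSet:
    starDict[word] = starList.count(word)

# Sort the (word, count) pairs by count, highest first
word = list(starDict.items())
word.sort(key=lambda x: x[1], reverse=True)

pd.DataFrame(data=word).to_csv('star.csv', encoding='utf-8')

# Print the top 20 (the slice also copes with fewer than 20 distinct words)
for item in word[:20]:
    print(item)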
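Calling list.count once per distinct word rescans the whole word list each time, which can get slow on a full-length novel. One possible alternative is collections.Counter, which counts in a single pass. The sketch below assumes the same punctuation set as above; the function top_words, the sample sentence, and the tiny stop-word set are hypothetical and only serve to keep the example runnable without any files.

```python
from collections import Counter

def top_words(text, stop_words, n=20, sep=",.;:?-_"):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    text = text.lower()
    for ch in sep:
        text = text.replace(ch, " ")   # treat punctuation as a word break
    # Count every word that is not a stop word, in one pass
    counts = Counter(w for w in text.split() if w not in stop_words)
    return counts.most_common(n)

# Hypothetical sample text and stop words
sample = "The star is far. The star is bright, and the night is dark."
stops = {"the", "is", "and"}
top = top_words(sample, stops, n=1)
all_counts = dict(top_words(sample, stops))
```

With real data, the text would come from the novel file and the stop words from stops.txt, as in the program above.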