String operations, file operations, and preprocessing for English word-frequency counting

Source of this assignment: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE1/homework/2684

1. String operations:

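  • Parse an ID card number: birthday, sex, place of birth, etc.
IdCard = input("Enter your ID card number: ")
if len(IdCard) != 18:
    # Stop here: the slices below assume an 18-character number
    raise SystemExit("Invalid input")
print("Your ID number is: " + IdCard)
IdPlace = IdCard[0:6]    # 6-digit administrative region code
IdBirth = IdCard[6:14]   # birth date as YYYYMMDD
IdSex = IdCard[14:17]    # sequence code; its last digit encodes sex
print("Region code: " + IdPlace)
year = IdBirth[0:4]
month = IdBirth[4:6]
day = IdBirth[6:8]
print("Birthday: {}-{}-{}".format(year, month, day))
province = IdCard[0:2]
city = IdCard[2:4]
county = IdCard[4:6]
print("Place of birth codes: province {}, city {}, county {}".format(province, city, county))
# An even sequence code means female, an odd one means male
if int(IdSex) % 2 == 0:
    print("Sex: female")
else:
    print("Sex: male")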
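The slicing above can also be packaged as a function that returns its results instead of printing them, which makes it easier to test. The following is a minimal sketch, not part of the original assignment: the name parse_id_card and the returned dictionary keys are assumptions, and the sample number in the usage line is used purely for illustration.

```python
from datetime import date


def parse_id_card(id_card):
    """Split an 18-character ID number into region code, birth date and sex."""
    if len(id_card) != 18 or not id_card[:17].isdigit():
        raise ValueError("expected 17 digits followed by a check character")
    birth = id_card[6:14]
    # date() raises ValueError for impossible dates such as month 13
    birthday = date(int(birth[0:4]), int(birth[4:6]), int(birth[6:8]))
    # The 17th character (index 16) is odd for male, even for female
    sex = "male" if int(id_card[16]) % 2 else "female"
    return {"region": id_card[0:6], "birthday": birthday, "sex": sex}


if __name__ == "__main__":
    print(parse_id_card("11010519491231002X"))
```

Using datetime.date for the birthday means an impossible date is rejected rather than printed as if it were valid.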
  • Caesar cipher encoding and decoding
def encryption():
    str_raw = input("Enter the plaintext: ")
    k = int(input("Enter the shift value: ")) % 26  # keep the shift within the alphabet
    str_list = list(str_raw.lower())
    i = 0
    while i < len(str_list):
        if 'a' <= str_list[i] <= 'z':  # leave spaces, digits and punctuation unchanged
            if ord(str_list[i]) < 123 - k:
                str_list[i] = chr(ord(str_list[i]) + k)
            else:
                # wrap around past 'z'
                str_list[i] = chr(ord(str_list[i]) + k - 26)
        i = i + 1
    print("Encrypted result: " + "".join(str_list))
def decryption():
    str_raw = input("Enter the ciphertext: ")
    k = int(input("Enter the shift value: ")) % 26  # keep the shift within the alphabet
    str_list = list(str_raw.lower())
    i = 0
    while i < len(str_list):
        if 'a' <= str_list[i] <= 'z':  # leave spaces, digits and punctuation unchanged
            if ord(str_list[i]) >= 97 + k:
                str_list[i] = chr(ord(str_list[i]) - k)
            else:
                # wrap around before 'a'
                str_list[i] = chr(ord(str_list[i]) + 26 - k)
        i = i + 1
    print("Decrypted result: " + "".join(str_list))
while True:
    print("1. Encrypt")
    print("2. Decrypt")
    print("3. Quit")
    choice = input("Choose an option: ")
    if choice == "1":
        encryption()
    elif choice == "2":
        decryption()
    elif choice == "3":
        break
    else:
        print("Invalid input!")
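Encoding and decoding are the same operation with opposite shifts, so both can be expressed with a single function that takes the text and the shift as arguments. The sketch below is one possible way to write it; the name caesar is an assumption, and unlike the interactive version it preserves upper case.

```python
def caesar(text, k):
    """Shift ASCII letters in text by k positions; other characters are unchanged."""
    result = []
    for ch in text:
        if 'a' <= ch <= 'z':
            # Map to 0..25, shift with wrap-around, map back to a letter
            result.append(chr((ord(ch) - ord('a') + k) % 26 + ord('a')))
        elif 'A' <= ch <= 'Z':
            result.append(chr((ord(ch) - ord('A') + k) % 26 + ord('A')))
        else:
            result.append(ch)
    return "".join(result)


if __name__ == "__main__":
    secret = caesar("hello, world", 3)
    print(secret)              # encrypted text
    print(caesar(secret, -3))  # decrypting is a shift in the other direction
```

Because Python's % operator always returns a non-negative result for a positive modulus, a negative k wraps correctly without a separate decryption branch.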
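  • Observing a URL pattern and generating URLs in bulk
# The list pages differ only in the page number, so format it into the URL
for i in range(2,10):
    url='http://news.gzcc.cn/html/xiaoyuanxinwen/{}.html'.format(i)
    print(url)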

2. Preprocessing for English word-frequency counting

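  • Download the lyrics of an English song, or an English article or novel.
  • Convert all uppercase letters to lowercase.
  • Replace all other separators (,.?!) with spaces.
  • Split the text into individual words.
  • Count how many times each word occurs.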
article ='''
Big data analytics and business analytics
by Duan, Lian; Xiong, Ye
Over the past few decades, with the development of automatic identification, data capture and storage technologies, 
people generate data much faster and collect data much bigger than ever before in business, science, engineering, education and other areas. 
Big data has emerged as an important area of study for both practitioners and researchers. 
It has huge impacts on data-related problems. 
In this paper, we identify the key issues related to big data analytics and then investigate its applications specifically related to business problems.
'''

split = article.split()
print(split)

# Replace punctuation with spaces
for ch in ",.:;?!":
    article = article.replace(ch, " ")

# Convert uppercase letters to lowercase
exchange = article.lower()
print(exchange)

# Build the word list (named words so the built-in list is not shadowed)
words = exchange.split()
print(words)

# Count word frequencies
dic = {}
for i in words:
    dic[i] = dic.get(i, 0) + 1
print(dic)

# Exclude specific words
word = {'and','the','with','in','by','its','for','of','an','to'}
for i in word:
    dic.pop(i, None)  # no KeyError if the word is absent
print(dic)

# Sort by count, highest first
dic1 = sorted(dic.items(), key=lambda d: d[1], reverse=True)
print(dic1)

# Print the ten most frequent words
for item in dic1[:10]:
    print(item)
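The same pipeline (lowercase, strip punctuation, split, count, drop stop words, sort) can be written more compactly with the standard library. This is a sketch of an alternative technique rather than the method used above: it swaps the manual replace calls for a regular expression and the dictionary for collections.Counter. The function name top_words and its parameters are assumptions.

```python
import re
from collections import Counter


def top_words(text, stop_words=(), n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Words are runs of letters, optionally joined by a hyphen or apostrophe,
    # so "data-related" stays one token and punctuation is dropped.
    words = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Big data, big DATA! Data is data-related."
    print(top_words(sample, stop_words={"is"}, n=2))
```

Counter.most_common already sorts by count in descending order, so no separate sorted call is needed.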

3. File operations

  • Same directory, absolute path, relative path
# Same directory
with open('cipher.txt', 'r', encoding='utf8') as fo:
    content = fo.read()
print(content, end='')
# Absolute path
with open(r'C:/Users/Czc/PycharmProjects/untitled1/aa.py', 'r', encoding='utf8') as fo:
    content = fo.read()
print(content, end='')
# Relative path
with open(r'./cipher.txt', 'r', encoding='utf8') as fo:
    content = fo.read()
print(content, end='')
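Note that a bare file name and a path starting with ./ are both relative paths: Python resolves them against the current working directory, which is not necessarily the directory that contains the script. The sketch below makes that resolution explicit with pathlib; the function name read_text_file is an assumption, and the file names in the usage lines are only examples.

```python
from pathlib import Path


def read_text_file(path):
    """Read a UTF-8 text file, resolving relative paths against the working directory."""
    path = Path(path)
    if not path.is_absolute():
        # This is what open() does implicitly for relative paths
        path = Path.cwd() / path
    return path.read_text(encoding="utf8")


if __name__ == "__main__":
    print("Working directory:", Path.cwd())
    print("cipher.txt resolves to:", Path("cipher.txt").resolve())
    print("./cipher.txt resolves to:", Path("./cipher.txt").resolve())
```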
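  • Caesar cipher: read a message from a file, encrypt or decrypt it, and save the result to a file.
with open("cipher.txt", encoding="utf8") as file:
    a = file.read()
print(a)
cipher = ''
for i in a:
    if 'a' <= i <= 'z':
        # shift lowercase letters by 3, wrapping around after 'z'
        cipher = cipher + chr((ord(i) - 97 + 3) % 26 + 97)
    elif 'A' <= i <= 'Z':
        # same shift for uppercase letters
        cipher = cipher + chr((ord(i) - 65 + 3) % 26 + 65)
    else:
        cipher = cipher + i  # keep spaces, digits and line breaks unchanged
print("Encrypted text:", cipher)
# Note: this overwrites the original file with the encrypted text
with open("cipher.txt", 'w', encoding="utf8") as file:
    file.write(cipher)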
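Writing the result to a separate output file keeps the original message intact and lets the same routine handle decryption by passing a negative shift. The following is a sketch under those assumptions: the names shift_table and transform_file and the file names in the usage lines are hypothetical, and it builds the substitution with str.maketrans instead of a character-by-character loop.

```python
import string
from pathlib import Path


def shift_table(k):
    """Build a translation table that shifts ASCII letters by k positions."""
    k %= 26
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    return str.maketrans(lower + upper,
                         lower[k:] + lower[:k] + upper[k:] + upper[:k])


def transform_file(src, dst, k):
    """Read src, shift its letters by k, and write the result to dst."""
    text = Path(src).read_text(encoding="utf8")
    Path(dst).write_text(text.translate(shift_table(k)), encoding="utf8")


if __name__ == "__main__":
    Path("message.txt").write_text("Attack at dawn!\n", encoding="utf8")
    transform_file("message.txt", "encrypted.txt", 3)     # encrypt
    transform_file("encrypted.txt", "decrypted.txt", -3)  # decrypt
    print(Path("encrypted.txt").read_text(encoding="utf8"), end="")
    print(Path("decrypted.txt").read_text(encoding="utf8"), end="")
```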
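  • Word-frequency counting: download the lyrics of an English song, or an English article or novel, and save it as a UTF-8 file. Read the text from the file and process it.
with open("bin.txt", encoding="utf8") as file:
    text = file.read()
s = ",.?!"
for i in s:
    text = text.replace(i, " ")
# Lowercase and split once, after all punctuation has been replaced;
# doing this inside the loop would turn text into a list and break replace()
text = text.lower().split()
print(text)
count = {}
for i in text:
    try:
        count[i] = count[i] + 1
    except KeyError:
        count[i] = 1
print(count)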

4. Function definitions

  • Encryption function
def encryption():
    str_raw = input("Enter the plaintext: ")
    k = int(input("Enter the shift value: ")) % 26  # keep the shift within the alphabet
    str_list = list(str_raw.lower())
    i = 0
    while i < len(str_list):
        if 'a' <= str_list[i] <= 'z':  # leave spaces, digits and punctuation unchanged
            if ord(str_list[i]) < 123 - k:
                str_list[i] = chr(ord(str_list[i]) + k)
            else:
                # wrap around past 'z'
                str_list[i] = chr(ord(str_list[i]) + k - 26)
        i = i + 1
    print("Encrypted result: " + "".join(str_list))
  • Decryption function
def decryption():
    str_raw = input("Enter the ciphertext: ")
    k = int(input("Enter the shift value: ")) % 26  # keep the shift within the alphabet
    str_list = list(str_raw.lower())
    i = 0
    while i < len(str_list):
        if 'a' <= str_list[i] <= 'z':  # leave spaces, digits and punctuation unchanged
            if ord(str_list[i]) >= 97 + k:
                str_list[i] = chr(ord(str_list[i]) - k)
            else:
                # wrap around before 'a'
                str_list[i] = chr(ord(str_list[i]) + 26 - k)
        i = i + 1
    print("Decrypted result: " + "".join(str_list))
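  • Function for reading a text file
def readFile(filePath):
    # The with statement closes the file even if read() raises an error
    with open(filePath, 'r', encoding='utf-8') as file:
        return file.read()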