import os
import re
# import chardet
# Resolve data-file paths relative to this script's directory, not the CWD.
curr_dir = os.path.dirname(os.path.abspath(__file__))
org_path = os.path.join(curr_dir, 'org.txt')
words_path = os.path.join(curr_dir, 'words.txt')  # absolute paths next to this script

# Clean the document: drop every character that is not a newline, an ASCII
# letter, or a CJK ideograph (U+4E00-U+9FA5), keeping only Chinese
# characters and letters.
# FIX: the original iterated `open(...)` directly and never closed the
# file handle; `with` guarantees it is closed.
with open(org_path, 'r', encoding="utf-8") as org_file:
    for line in org_file:
        line = re.sub('[^\\na-zA-Z\u4e00-\u9fa5]', '', line)
2、make_word_list 函数:
import pypinyin
def pinyin(word):
    """Return *word* transliterated to pinyin (NORMAL style, tone-free),
    with all syllables concatenated into a single string.

    Used to expand each sensitive word into an additional pinyin-spelled
    match target.
    """
    # pypinyin.pinyin returns one sub-list per input character; join each
    # sub-list, then join all of them, with no separators.
    # FIX: the original built the result with quadratic `s += ...`
    # concatenation; a single join is linear and idiomatic.
    return ''.join(''.join(part)
                   for part in pypinyin.pinyin(word, style=pypinyin.NORMAL))
# Resolve data-file paths relative to this script's directory, not the CWD.
curr_dir = os.path.dirname(os.path.abspath(__file__))
org_path = os.path.join(curr_dir, 'org.txt')
words_path = os.path.join(curr_dir, 'words.txt')  # absolute paths next to this script

# Build the sensitive-word list: every line of words.txt contributes both
# the word itself and its pinyin transliteration as match targets.
words_list = []
with open(words_path, 'r', encoding="utf-8") as file1_object:
    for raw_line in file1_object:
        word = raw_line.strip()
        if not word:
            # A blank line would append '' to the list, and '' is a
            # substring of every line — it would flag the whole document.
            continue
        words_list.append(word)
        # FIX: the original called pinyin(line) on the UNstripped line,
        # so the trailing newline leaked into the pinyin match string.
        words_list.append(pinyin(word))
3、sensitive_search 函数:
# Scan org.txt line by line; for every sensitive word found, write one
# numbered hit record to ans.txt.
#
# FIX: the original re-opened ans.txt with mode 'r+' inside the inner
# loop, which (a) raises FileNotFoundError when ans.txt does not yet
# exist, and (b) seeks to offset 0 on every open, so each hit overwrote
# the previous one.  It also never closed the org.txt handle.  Open both
# files exactly once, outside the loops, via `with`.
int_count = 0
ans_path = os.path.join(curr_dir, 'ans.txt')
with open(org_path, 'r', encoding="utf-8") as org_file, \
        open(ans_path, 'w', encoding="utf-8") as file3_object:
    for line in org_file:
        # Keep only newlines, ASCII letters and CJK ideographs.
        line = re.sub('[^\\na-zA-Z\u4e00-\u9fa5]', '', line)
        int_count += 1
        for word in words_list:
            if word in line:
                file3_object.write("Line%d:<%s>%s\n" % (int_count, word, word))