
Natural Language Processing With Python (3)

Chapter 4 covers some Python basics:
(1) A list is typically a sequence of objects all of the same type, of arbitrary length. Mutable.
(2) A tuple is typically a collection of objects of different types, of fixed length. Immutable.
(3) A generator expression can be much faster (and more memory-efficient) than building the full list with a list comprehension.
(4) Looking up a key in a dict is much faster than searching through a list.
(5) Looking up a dict by value is so slow that we should invert the dict and search by key instead. But notice that dict keys must be immutable types, such as strings and tuples.
(6) Be careful about the difference between sort() (in place, returns None) and sorted() (returns a new list). Also notice that both take more than one argument, e.g. key and reverse.
(7) The LGB rule: names are resolved in Local, then Global, then Built-in scope.
(8) Function for checking type: isinstance(). Use an assert statement to raise an error when the type is wrong.
(9) Docstring style: a one-line summary, a more detailed explanation, a doctest example, and epytext markup.
(10) Lambda expressions and the use of higher-order functions.
(11) Multiple-argument functions (the use of *args and **kwargs).
(12) The if __name__ == '__main__' guard for unit tests, and __file__ to locate your file.
(13) A leading underscore (_x) and __all__ hide variables and functions from "from module import *", but not from "import module".
(14) Some traps: "%s %s" % "aaa", "ggg" is wrong without parentheses around the tuple of values; don't use a mutable object as the default value of a parameter, because the same object is reused across calls.
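Points (1)–(6) above can be sketched in a few lines of plain Python (the variable names are illustrative, not from the book):

```python
# (1)/(2) Lists are mutable sequences; tuples are immutable, fixed-length records.
words = ['the', 'cat', 'sat']        # list: same-typed items, arbitrary length
record = ('cat', 'NN', 3)            # tuple: mixed types, fixed length
words.append('down')                 # fine: lists are mutable
# record[0] = 'dog'                  # would raise TypeError: tuples are immutable

# (3) A generator expression avoids materializing an intermediate list.
total = sum(len(w) for w in words)   # no temporary list is built

# (4) Dict lookup by key is a hash lookup; searching a list is linear.
freq = {'the': 2, 'cat': 1, 'sat': 1}
has_cat = 'cat' in freq              # fast membership test on keys

# (5) Searching a dict by value is slow, so invert it and search by key.
#     Dict keys must be immutable (strings, tuples, ...).
inverted = {count: word for word, count in freq.items()}

# (6) sort() mutates in place and returns None; sorted() returns a new list.
nums = [3, 1, 2]
fresh = sorted(nums, reverse=True)   # nums unchanged, fresh is [3, 2, 1]
nums.sort()                          # nums is now [1, 2, 3]
```

Note that in the inverted dict, words sharing the same count collide and the last one wins, which is one reason value-to-key inversion needs care.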
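Points (8), (11), and (14) can also be shown concretely; the function names below are made up for illustration:

```python
# (11) *args collects extra positional arguments; **kwargs collects keyword ones.
def describe(word, *args, **kwargs):
    # (8) Check the type and fail loudly if it is wrong.
    assert isinstance(word, str), "word must be a string"
    return (word, args, kwargs)

result = describe('cat', 'NN', 'VB', lang='en')

# (14) Trap 1: without parentheses, "%s %s" % "aaa", "ggg" parses as
# (("%s %s" % "aaa"), "ggg") and raises TypeError, since only one value
# is supplied for two %s slots. The fix is to pass a tuple:
right = "%s %s" % ("aaa", "ggg")

# (14) Trap 2: a mutable default value is created once and shared by all calls.
def tag_bad(word, seen=[]):          # the SAME list object on every call
    seen.append(word)
    return seen

def tag_good(word, seen=None):       # the idiomatic fix
    if seen is None:
        seen = []
    seen.append(word)
    return seen

tag_bad('cat')
accumulated = tag_bad('dog')         # surprise: contains 'cat' too
safe = tag_good('dog')               # always starts from a fresh list
```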

Chapter 5 covers POS tagging. Some functions useful for tagging:
(1) nltk.pos_tag()
(2) nltk.corpus.brown.tagged_words() / tagged_sents(), using unsimplified tags
(3) nltk.defaultdict(), or collections.defaultdict in plain Python
(4) nltk.ConditionalFreqDist(); nltk.Index() is a special kind of dict
(5) nltk.RegexpTagger()
(6) nltk.UnigramTagger(), nltk.BigramTagger(), nltk.NgramTagger()
(7) tagger.evaluate()
(8) open(), pickle.dump(), close(), and pickle.load() for saving and loading a trained tagger
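The lookup-tagger idea from points (2)–(4) can be mimicked without nltk installed. The sketch below uses a toy tagged corpus standing in for brown.tagged_words() (the miniature data is an assumption for illustration; real code would use the Brown corpus):

```python
from collections import Counter, defaultdict

# Toy stand-in for nltk.corpus.brown.tagged_words().
tagged_words = [('the', 'AT'), ('cat', 'NN'), ('the', 'AT'),
                ('sat', 'VBD'), ('cat', 'NN'), ('run', 'VB')]

# Count tags per word with a defaultdict of Counters, the same role
# a ConditionalFreqDist plays in nltk.
tag_counts = defaultdict(Counter)
for word, tag in tagged_words:
    tag_counts[word][tag] += 1

# The lookup tagger is just a dict mapping each word to its most frequent tag.
model = {word: counts.most_common(1)[0][0]
         for word, counts in tag_counts.items()}

def lookup_tag(word, default='NN'):
    """Tag a word from the model, backing off to a default tag."""
    return model.get(word, default)
```

So lookup_tag('the') returns 'AT' from the model, while an unseen word falls back to the default tag.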
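Point (8) refers to persisting a trained tagger with pickle. A minimal sketch, using a plain dict in place of an NLTK tagger object (the file path is illustrative):

```python
import os
import pickle
import tempfile

# A trained model to persist; in NLTK this would be a tagger object.
model = {'the': 'AT', 'cat': 'NN'}

# Save: open a file in binary write mode, dump, close.
path = os.path.join(tempfile.gettempdir(), 'tagger.pkl')
output = open(path, 'wb')
pickle.dump(model, output)
output.close()

# Load: open in binary read mode, load, close.
infile = open(path, 'rb')
restored = pickle.load(infile)
infile.close()
```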

Some skills for NLP:
(1) some idiomatic uses of nltk.defaultdict()
(2) a lookup tagger, which uses a dict as the model of a real tagger
(3) an n-gram tagger conditions on the current word plus the tags of the n-1 preceding tokens
(4) use backoff and cutoff to solve the sparse-data problem; a typical chain: t0 = nltk.DefaultTagger(..); t1 = nltk.UnigramTagger(.., backoff=t0); t2 = nltk.BigramTagger(.., backoff=t1); t2.evaluate(..)
(5) Separate the training and testing data, e.g. 90% for training and 10% for testing. Use brown.tagged_sents() for training and testing (which includes evaluating), and some other corpus's sents() for the real application.
(6) Tag at the sentence level instead of the word level, e.g. using the sents() function.
(7) Transformation-based tagging beats n-gram tagging for two reasons: smaller model size and a better model. A useful example is the Brill tagger.
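The backoff chain and train/test evaluation in points (4) and (5) can be mimicked in plain Python; the classes below are simplified stand-ins for nltk.DefaultTagger and nltk.UnigramTagger, and the toy corpus is an assumption for illustration (real code would slice brown.tagged_sents() 90/10):

```python
from collections import Counter, defaultdict

class DefaultTagger:
    """Tag every word with the same tag: the backoff of last resort."""
    def __init__(self, tag):
        self.tag = tag
    def tag_word(self, word):
        return self.tag

class UnigramTagger:
    """Tag each word with its most frequent training tag, else back off."""
    def __init__(self, train_sents, backoff=None):
        counts = defaultdict(Counter)
        for sent in train_sents:
            for word, tag in sent:
                counts[word][tag] += 1
        self.model = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self.backoff = backoff
    def tag_word(self, word):
        if word in self.model:
            return self.model[word]
        return self.backoff.tag_word(word)
    def evaluate(self, gold_sents):
        """Fraction of gold (word, tag) pairs this tagger gets right."""
        pairs = [(w, t) for sent in gold_sents for w, t in sent]
        correct = sum(1 for w, t in pairs if self.tag_word(w) == t)
        return correct / len(pairs)

# Toy split standing in for 90% training / 10% testing.
train_sents = [[('the', 'AT'), ('cat', 'NN')],
               [('the', 'AT'), ('sat', 'VBD')]]
test_sents = [[('the', 'AT'), ('dog', 'NN')]]

t0 = DefaultTagger('NN')
t1 = UnigramTagger(train_sents, backoff=t0)
accuracy = t1.evaluate(test_sents)   # 'the' from the model, 'dog' via backoff
```

The design point mirrors NLTK's: each tagger handles what it knows and delegates the rest down the chain, so sparse training data degrades gracefully instead of failing.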