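I have recently been teaching my son phonics, and we played a word game together: using simple enumeration to find two-letter words suitable for young children to learn. Searching by hand inevitably misses some, so here is a simple script that uses PyEnchant.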
01 - foo.py
#!/usr/bin/python3
"""
A simple script to check whether a string is an English word

1. download PyEnchant from https://pypi.org/project/pyenchant/
2. save pyenchant-2.0.0.tar.gz to /tmp
3. tar zxf pyenchant-2.0.0.tar.gz
4. export PYTHONPATH=/tmp/pyenchant-2.0.0:$PYTHONPATH
5. ./foo.py <string>
"""

import sys
import enchant


def is_english_word(word):
    d_en = enchant.Dict("en_US")
    return d_en.check(word)


def get_alphabet():
    # Build the list ['a', 'b', ..., 'z'].
    l_alph = []
    for i in range(26):
        l_alph.append(chr(ord('a') + i))
    return l_alph


def main(argc, argv):
    if argc != 2:
        sys.stderr.write("Usage: %s <string>\n" % argv[0])
        return 1

    char_in = argv[1]

    # Pass 1: append each letter to the input and keep the valid words.
    l_word1 = []
    l_alph = get_alphabet()
    for char in l_alph:
        word = char_in + char
        if is_english_word(word):
            l_word1.append(word)
    print(l_word1)

    # Pass 2: repeat in upper case, keeping only words (e.g. abbreviations)
    # whose lower-case form was not already found in pass 1.
    l_word2 = []
    for char in l_alph:
        word = char_in + char
        word = word.upper()
        if is_english_word(word):
            if word.lower() in l_word1:
                continue
            l_word2.append(word)
    print(l_word2)
    return 0


if __name__ == '__main__':
    sys.exit(main(len(sys.argv), sys.argv))
It is very simple. The core code is just:
def is_english_word(word):
    d_en = enchant.Dict("en_US")
    return d_en.check(word)
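The core function builds a new dictionary object on every call. A possible variation, sketched here under the assumption that the checker is passed in as a callable, creates the dictionary once and reuses it. The names `make_enchant_checker` and `extend_prefix` are hypothetical helpers for illustration, not part of foo.py or of PyEnchant.

```python
import string


def make_enchant_checker(lang="en_US"):
    """Create the dictionary once and return its check method."""
    # Imported lazily so extend_prefix() can be used without PyEnchant.
    import enchant
    return enchant.Dict(lang).check


def extend_prefix(prefix, check):
    """Return (lower_words, upper_only_words) for prefix + one letter.

    check is any callable that takes a string and returns True or False.
    """
    candidates = [prefix + c for c in string.ascii_lowercase]
    # Words accepted as typed.
    lower = [w for w in candidates if check(w)]
    # Words accepted only in upper case (their lower-case form was rejected).
    upper = [w.upper() for w in candidates
             if check(w.upper()) and w.lower() not in lower]
    return lower, upper
```

With PyEnchant installed, `extend_prefix("a", make_enchant_checker())` should produce the same two lists that foo.py prints for `a`. Passing the checker as an argument also makes it easy to try the logic with a plain set of words instead of a real dictionary.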
02 - Testing foo.py
kaiba$ ./foo.py 'a'
['ab', 'ac', 'ad', 'ah', 'am', 'an', 'as', 'at', 'av', 'aw', 'ax']
['AA', 'AF', 'AG', 'AI', 'AK', 'AL', 'AP', 'AR', 'AU', 'AZ']
kaiba$ ./foo.py 'b'
['be', 'bf', 'bi', 'bk', 'bl', 'bu', 'bx', 'by']
['BA', 'BB', 'BC', 'BM', 'BO', 'BP', 'BR', 'BS']
kaiba$ ./foo.py 'be'
['bed', 'bee', 'beg', 'bet', 'bey']
['BEN']
kaiba$ ./foo.py 't'
['ta', 'ti', 'tn', 'to', 'tr', 'ts']
['TB', 'TC', 'TD', 'TE', 'TH', 'TL', 'TM', 'TU', 'TV', 'TX', 'TY']
kaiba$ ./foo.py 'tea'
['teak', 'teal', 'team', 'tear', 'teas', 'teat']
[]
Appendix - foo.sh (egrep /usr/share/dict/words directly)
#!/bin/bash

function is_english_word
{
    typeset word=${1?"*** str, e.g. a"}
    # Succeed only if the word list contains a line that is exactly $word.
    egrep "^$word$" /usr/share/dict/words > /dev/null 2>&1
    return $?
}

(( $# != 1 )) && echo "Usage: $0 <str prefix>" >&2 && exit 1
str_prefix=$1

lwords=""
uwords=""
for c in {a..z}; do
    typeset -l lword=$str_prefix$c    # -l: force lower case
    typeset -u uword=$lword           # -u: force upper case
    is_english_word "$lword" && lwords+="$lword "
    is_english_word "$uword" && uwords+="$uword "
done

lwords=$(echo $lwords)    # trim the trailing space
uwords=$(echo $uwords)
rc=1
[[ -n $lwords ]] && echo $lwords && rc=0
[[ -n $uwords ]] && echo $uwords && rc=0
exit $rc
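For comparison, the same word-list lookup can be sketched in Python without PyEnchant. This is only a rough equivalent of foo.sh, assuming a one-word-per-line file such as /usr/share/dict/words. The names `load_words` and `lookup_prefix` are hypothetical, and the set lookup is an exact, case-sensitive match, much like the anchored egrep pattern.

```python
import string


def load_words(path="/usr/share/dict/words"):
    """Read a one-word-per-line list into a set for fast exact lookups."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return {line.strip() for line in f if line.strip()}


def lookup_prefix(prefix, words):
    """Return (lower_words, upper_words) formed by prefix + one letter."""
    candidates = [(prefix + c).lower() for c in string.ascii_lowercase]
    # Like foo.sh, the two lists are checked independently, so a word may
    # appear in both its lower-case and upper-case form.
    lower = [w for w in candidates if w in words]
    upper = [w.upper() for w in candidates if w.upper() in words]
    return lower, upper
```

Loading the file once into a set avoids scanning it again for each of the 52 candidates, which is what the shell version does by calling egrep inside the loop.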
- Running foo.sh
$ for c in {a..z}; do ./foo.sh $c; echo; done
aa ab ac ad ae af ag ah ai ak al am an ap aq ar as at av aw ax ay az
AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AY AZ
ba bb bd be bf bg bi bk bl bm bn bo bp br bs bt bu bv bx by bz
BA BB BC BD BE BF BG BH BI BL BM BN BO BP BR BS BT BU BV BW BX
ca cb cc cd ce cf cg ch ck cl cm co cp cq cr cs ct cu cv cy
CA CB CC CD CE CF CG CH CI CJ CL CM CN CO CP CQ CR CS CT CU CV CW CY CZ
da db dc dd de dg di dj dk dl dm dn do dp dr ds dt du dx dy dz
DA DB DC DD DE DF DG DH DI DJ DK DM DN DO DP DQ DR DS DT DU DV DW DX DZ
ea ec ed ee ef eg eh el em en eo ep eq er es et eu ew ex ey
EA EC ED EE EF EG EI EL EM EO EP EQ ER ES ET EV EW
fa fb fc fe ff fg fi fl fm fn fo fp fr fs ft fu fv fw fy fz
FA FB FC FD FE FF FI FL FM FO FP FR FS FT FV FW FX FY
ga gd ge gi gl gm gn go gp gr gs gt gu gv
GA GB GC GD GE GG GH GI GM GN GO GP GQ GR GS GT GU GW
ha hb hd he hf hg hi hl hm ho hp hq hr hs ht hv hw hy
HA HB HC HD HE HF HG HH HI HJ HK HL HM HO HP HQ HR HS HT HU HV HW HZ
ia ib ic id ie if ii ik il im in io iq ir is it iv iw ix
IA IB IC ID IE IF IG IL IM IN IO IP IQ IR IS IT IU IV IW IX
ja jg jo jr js jt
JA JC JD JI JJ JO JP JV
ka kb kc kg ki kl km kn ko kr kt kv kw ky
KB KC KD KE KG KI KN KO KP KR KS KT KV KW KY
la lb lc ld le lf lg lh li ll lm ln lo lp lr ls lt lu lv lx ly
LA LB LC LD LE LF LG LH LI LJ LL LM LO LP LR LS LT LU LV LW LZ
ma mb mc md me mf mg mh mi mk ml mm mn mo mp mr ms mt mu mv mw my
MA MB MC MD ME MF MG MH MI MJ ML MM MN MO MP MR MS MT MU MV MW MX MY
na nb nd ne ng ni nj nl nm no np nr ns nt nu nv ny
NA NB NC ND NE NF NG NH NI NJ NL NM NP NQ NS NT NU NV NW NY NZ
ob oc od oe of og oh ok ol om on op or os ot ow ox oy oz
OA OB OC OD OE OF OG OH OK OL OM ON OO OP OR OS OT OU OV OW
pa pc pd pe pf pg ph pi pk pl pm po pp pq pr ps pt pu
PA PB PC PD PE PF PG PH PI PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY
qe qh ql qm qn qp qr qs qt qu qv qy
QA QB QC QD QE QF QM QN QP QR QS QV
ra rc rd re rf rg rh rm rn ro rs rt
RA RB RC RD RE RF RH RI RJ RL RM RN RO RP RQ RR RS RT RU RV RW RX
sa sb sc sd se sf sg sh si sk sl sm sn so sp sq sr ss st su sv sw
SA SB SC SD SE SF SG SI SJ SL SM SN SO SP SR SS ST SU SV SW SX SY
ta tb tc te tg th ti tk tm tn to tp tr ts tu tv tx
TA TB TC TD TE TG TH TI TL TM TN TO TP TR TS TT TU TV TW TX
uc ug uh ui um un up ur us ut ux
UA UB UC UG UH UI UK UL UN UP UR US UT UU UV UW
va vb vc vd vg vi vl vo vp vr vs vt vv
VA VB VC VD VE VF VG VI VJ VL VM VN VO VP VR VS VT VU VV VW
wa wb wc wd we wf wg wh wi wk wl wm wo wr ws wt wy
WA WB WC WD WF WG WH WI WL WM WO WP WR WS WU WV WW WY
xc xd xi xr xs xu xw xx
XA XB XD XL XN XO XP XQ XT
ya yd ye yi ym yn yo yr ys yt
YA YB YP YT YU YV YY
za zn zo zs
ZA ZB ZD ZG ZI ZK ZT ZZ