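These are my reading notes on The Art of Readable Code, with a few of my own observations added. I strongly recommend the book:
- English edition: The Art of Readable Code
- Chinese edition: 編寫可讀代碼的藝術
Why code should be easy to understand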
“Code should be written to minimize the time it would take for someone else to understand it.”
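The reality of day-to-day work:
- Thinking before writing code, and reading code, take far more time than actually writing it
- Reading code is routine, whether it is someone else's or your own; code you wrote six months ago might as well be someone else's
- When code is readable, you can grasp the program's logic quickly and get to work
- Fewer lines of code are not necessarily easier to understand
- Readability does not conflict with efficiency, good architecture, or testability
The whole book is organized around one goal: making code more readable. Readability is also one of the key criteria of good code.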
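Naming
Pack more information into names
Use words with a precise meaning, for example `download` rather than `get`. Some possible replacements:
- send -> deliver, dispatch, announce, distribute, route
- find -> search, extract, locate, recover
- start -> launch, create, begin, open
- make -> create, set up, build, generate, compose, add, new
Avoid generic names
Names like `tmp` and `retval` say nothing beyond "this is a temporary variable" or "this is the return value". Add a meaningful word and the purpose becomes clear: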
- tmp_file = tempfile.NamedTemporaryFile()
- ...
- SaveData(tmp_file, ...)
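Instead of `retval`, use a name that says what the value actually represents. With a name like `sum_squares`, a bug such as this one is easy to spot: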
- sum_squares += v[i]; // Where's the "square" that we're summing? Bug!
In nested `for` loops, `i`, `j`, and `k` can be just as confusing. The indices in the `if` below are swapped, and the mistake is hard to see:
- for (int i = 0; i < clubs.size(); i++)
-     for (int j = 0; j < clubs[i].members.size(); j++)
-         for (int k = 0; k < users.size(); k++)
-             if (clubs[i].members[k] == users[j])
-                 cout << "user[" << j << "] is in club[" << i << "]" << endl;
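With more descriptive index names (`ci`, `mi`, `ui`) the code is much clearer, and a mismatched index would stand out:
- if (clubs[ci].members[mi] == users[ui])  // OK. First letters match.
So if you do use a generic name, make sure you have a good reason.
Prefer concrete names
`CanListenOnPort` is better than `ServerCanStart`: "can start" is vague, whereas "listen on port" says exactly what the method checks.
Likewise, the flag `--run_locally` is less clear than `--extra_logging`.
Attach important details, such as a `_ms` unit suffix or a `_raw` suffix for unprocessed strings
If something about a variable is important, it is worth a few extra words in its name. For example, change `string id; // Example: "af84ef845cd8"` to `string hex_id;`. Function parameters that carry a unit benefit in the same way: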
- Start(int delay) --> delay → delay_secs
- CreateCache(int size) --> size → size_mb
- ThrottleDownload(float limit) --> limit → max_kbps
- Rotate(float angle) --> angle → degrees_cw
More examples:
- password -> plaintext_password
- comment -> unescaped_comment
- html -> html_utf8
- data -> data_urlenc
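As a small illustration of unit suffixes, here is a Python sketch. The function `measure_elapsed_ms` and its `task` parameter are made up for this example and do not come from the book.

```python
import time


def measure_elapsed_ms(task):
    """Run task() and return how long it took, in milliseconds."""
    # time.monotonic() returns seconds, so the conversion is explicit and
    # the _ms suffix tells the reader which unit each value is in.
    start_ms = time.monotonic() * 1000
    task()
    elapsed_ms = time.monotonic() * 1000 - start_ms
    return elapsed_ms


if __name__ == "__main__":
    print(measure_elapsed_ms(lambda: sum(range(100_000))))
```

Without the suffix, a reader would have to check the body to learn whether the result is in seconds or milliseconds.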
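Use longer names for larger scopes
Short names are fine in a small scope; variables used in a larger scope deserve longer names, and editor auto-completion takes care of the extra typing. For abbreviations and prefixes, stick to widely known ones (such as `str`). A good test: would a new team member understand the abbreviation without asking anyone?
Use formatting such as `_` and `-` to convey meaning, for example a `_` prefix for private variables.
- var x = new DatePicker(); // DatePicker() is a "constructor" function, so it starts with a capital letter
- var y = pageHeight(); // pageHeight() is an ordinary function
-
- var $all_images = $("img"); // $all_images is a jQuery object
- var height = 250; // height is not
-
- // ids use underscores; classes use dashes
- <div id="middle_column" class="main-content"> ...
Names must not be ambiguous
Before settling on a name, ask whether the word could mean something else. For example: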
- results = Database.all_objects.filter("year <= 2011")
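Does `results` now contain the objects whose year is <= 2011, or the objects that are left after those have been filtered out? `filter` is ambiguous; `select` or `exclude` would say which.
Use `min` and `max` instead of `limit`
- CART_TOO_BIG_LIMIT = 10
- if shopping_cart.num_items() >= CART_TOO_BIG_LIMIT:
-     Error("Too many items in cart.")
-
- MAX_ITEMS_IN_CART = 10
- if shopping_cart.num_items() > MAX_ITEMS_IN_CART:
-     Error("Too many items in cart.")
Compare `CART_TOO_BIG_LIMIT` and `MAX_ITEMS_IN_CART` above. With the first it is unclear whether the limit means "up to" or "up to and including", which makes the `>=` off-by-one bug easy to write. The second says plainly that 10 is the largest allowed value.
Use `first` and `last` for inclusive ranges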
- print integer_range(start=2, stop=4)
- # Does this print [2,3] or [2,3,4] (or something else)?
-
- set.PrintKeys(first="Bart", last="Maggie")
`first` and `last` are unambiguous and well suited to inclusive (closed) ranges.
Use `begin` and `end` for inclusive/exclusive (half-open) ranges such as [2, 9)
- PrintEventsInRange("OCT 16 12:00am", "OCT 17 12:00am")
-
- PrintEventsInRange("OCT 16 12:00am", "OCT 16 11:59:59.9999pm")
The first call, which uses an exclusive end, is much more comfortable to write than the second.
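The two naming conventions can be sketched in Python as follows. Both function names are hypothetical and chosen only to show how the parameter names signal whether the upper bound is included.

```python
def integers_first_last(first, last):
    """Inclusive range: both first and last are part of the result."""
    return list(range(first, last + 1))


def integers_begin_end(begin, end):
    """Half-open range: begin is included, end is not."""
    return list(range(begin, end))


if __name__ == "__main__":
    print(integers_first_last(2, 4))
    print(integers_begin_end(2, 4))
```

A caller who reads `first`/`last` expects both ends to be included, while `begin`/`end` follows the convention of Python's own `range`.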
Naming booleans
- bool read_password = true;
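This is a dangerous name: does it mean the password still needs to be read, or that it has already been read? You can't tell. A name such as `user_is_authenticated` avoids the problem. In general, prefixing a boolean with `is`, `has`, `can`, or `should` makes its meaning clearer, for example: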
- SpaceLeft() --> hasSpaceLeft()
- bool disable_ssl = false --> bool use_ssl = true
Match the reader's expectations
- public class StatisticsCollector {
- public void addSample(double x) { ... }
- public double getMean() {
- // Iterate through all samples and return total / num_samples
- }
- ...
- }
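In this example, `getMean` iterates over all the samples and returns total / num_samples, so it is not the lightweight accessor that `get` normally implies. `computeMean` would be a better name.
Aesthetics
Well-formatted code is pleasing to look at and more comfortable to read. Compare the following two examples: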
- class StatsKeeper {
- public:
- // A class for keeping track of a series of doubles
- void Add(double d); // and methods for quick statistics about them
- private: int count; /* how many so far
- */ public:
- double Average();
- private: double minimum;
- list<double>
- past_items
- ;double maximum;
- };
Here is a cleaner version:
- // A class for keeping track of a series of doubles
- // and methods for quick statistics about them.
- class StatsKeeper {
- public:
- void Add(double d);
- double Average();
- private:
- list<double> past_items;
- int count; // how many so far
- double minimum;
- double maximum;
- };
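Make line breaks consistent and compact
This code has to be broken across lines to stay within 80 characters per line, and the parameters need explanatory comments: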
- public class PerformanceTester {
- public static final TcpConnectionSimulator wifi = new TcpConnectionSimulator(
- 500, /* Kbps */
- 80, /* millisecs latency */
- 200, /* jitter */
- 1 /* packet loss % */);
-
- public static final TcpConnectionSimulator t3_fiber = new TcpConnectionSimulator(
- 45000, /* Kbps */
- 10, /* millisecs latency */
- 0, /* jitter */
- 0 /* packet loss % */);
-
- public static final TcpConnectionSimulator cell = new TcpConnectionSimulator(
- 100, /* Kbps */
- 400, /* millisecs latency */
- 250, /* jitter */
- 5 /* packet loss % */);
- }
For consistency, first rearrange it like this:
- public class PerformanceTester {
- public static final TcpConnectionSimulator wifi =
- new TcpConnectionSimulator(
- 500, /* Kbps */
- 80, /* millisecs latency */
- 200, /* jitter */
- 1 /* packet loss % */);
-
- public static final TcpConnectionSimulator t3_fiber =
- new TcpConnectionSimulator(
- 45000, /* Kbps */
- 10, /* millisecs latency */
- 0, /* jitter */
- 0 /* packet loss % */);
-
- public static final TcpConnectionSimulator cell =
- new TcpConnectionSimulator(
- 100, /* Kbps */
- 400, /* millisecs latency */
- 250, /* jitter */
- 5 /* packet loss % */);
- }
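This is more consistent, but still verbose and takes up a lot of vertical space. Moving the parameter descriptions into one comment at the top fixes that:
- public class PerformanceTester {
- // TcpConnectionSimulator(throughput, latency, jitter, packet_loss)
- //                          [Kbps]      [ms]     [ms]    [percent]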
- public static final TcpConnectionSimulator wifi =
- new TcpConnectionSimulator(500, 80, 200, 1);
-
- public static final TcpConnectionSimulator t3_fiber =
- new TcpConnectionSimulator(45000, 10, 0, 0);
-
- public static final TcpConnectionSimulator cell =
- new TcpConnectionSimulator(100, 400, 250, 5);
- }
Use helper functions to remove repetition
- // Turn a partial_name like "Doug Adams" into "Mr. Douglas Adams".
- // If not possible, 'error' is filled with an explanation.
- string ExpandFullName(DatabaseConnection dc, string partial_name, string* error);
-
- DatabaseConnection database_connection;
- string error;
- assert(ExpandFullName(database_connection, "Doug Adams", &error)
- == "Mr. Douglas Adams");
- assert(error == "");
- assert(ExpandFullName(database_connection, " Jake Brown ", &error)
- == "Mr. Jacob Brown III");
- assert(error == "");
- assert(ExpandFullName(database_connection, "No Such Guy", &error) == "");
- assert(error == "no match found");
- assert(ExpandFullName(database_connection, "John", &error) == "");
- assert(error == "more than one result");
The code above is cluttered and highly repetitive. A helper function cleans it up:
- CheckFullName("Doug Adams", "Mr. Douglas Adams", "");
- CheckFullName(" Jake Brown ", "Mr. Jacob Brown III", "");
- CheckFullName("No Such Guy", "", "no match found");
- CheckFullName("John", "", "more than one result");
-
- void CheckFullName(string partial_name,
- string expected_full_name,
- string expected_error) {
- // database_connection is now a class member
- string error;
- string full_name = ExpandFullName(database_connection, partial_name, &error);
- assert(error == expected_error);
- assert(full_name == expected_full_name);
- }
Column alignment
Aligning columns makes a block of code easier to scan:
- CheckFullName("Doug Adams"  , "Mr. Douglas Adams"  , "");
- CheckFullName(" Jake Brown ", "Mr. Jacob Brown III", "");
- CheckFullName("No Such Guy" , ""                   , "no match found");
- CheckFullName("John"        , ""                   , "more than one result");
-
- commands[] = {
- ...
- { "timeout" , NULL , cmd_spec_timeout},
- { "timestamping" , &opt.timestamping , cmd_boolean},
- { "tries" , &opt.ntry , cmd_number_inf},
- { "useproxy" , &opt.use_proxy , cmd_boolean},
- { "useragent" , NULL , cmd_spec_useragent},
- ...
- };
Organize declarations into blocks
- class FrontendServer {
- public:
- FrontendServer();
- void ViewProfile(HttpRequest* request);
- void OpenDatabase(string location, string user);
- void SaveProfile(HttpRequest* request);
- string ExtractQueryParam(HttpRequest* request, string param);
- void ReplyOK(HttpRequest* request, string html);
- void FindFriends(HttpRequest* request);
- void ReplyNotFound(HttpRequest* request, string error);
- void CloseDatabase(string location);
- ~FrontendServer();
- };
The version above is readable, but it can still be improved by grouping related declarations:
- class FrontendServer {
- public:
- FrontendServer();
- ~FrontendServer();
-
- // Handlers
- void ViewProfile(HttpRequest* request);
- void SaveProfile(HttpRequest* request);
- void FindFriends(HttpRequest* request);
-
- // Request/Reply Utilities
- string ExtractQueryParam(HttpRequest* request, string param);
- void ReplyOK(HttpRequest* request, string html);
- void ReplyNotFound(HttpRequest* request, string error);
-
- // Database Helpers
- void OpenDatabase(string location, string user);
- void CloseDatabase(string location);
- };
Here is another example:
- # Import the user's email contacts, and match them to users in our system.
- # Then display a list of those users that he/she isn't already friends with.
- def suggest_new_friends(user, email_password):
-     friends = user.friends()
-     friend_emails = set(f.email for f in friends)
-     contacts = import_contacts(user.email, email_password)
-     contact_emails = set(c.email for c in contacts)
-     non_friend_emails = contact_emails - friend_emails
-     suggested_friends = User.objects.select(email__in=non_friend_emails)
-     display['user'] = user
-     display['friends'] = friends
-     display['suggested_friends'] = suggested_friends
-     return render("suggested_friends.html", display)
With everything run together, the function is visually overwhelming. Split it into blocks by purpose:
- def suggest_new_friends(user, email_password):
-     # Get the user's friends' email addresses.
-     friends = user.friends()
-     friend_emails = set(f.email for f in friends)
-
-     # Import all email addresses from this user's email account.
-     contacts = import_contacts(user.email, email_password)
-     contact_emails = set(c.email for c in contacts)
-
-     # Find matching users that they aren't already friends with.
-     non_friend_emails = contact_emails - friend_emails
-     suggested_friends = User.objects.select(email__in=non_friend_emails)
-
-     # Display these lists on the page.
-     display['user'] = user
-     display['friends'] = friends
-     display['suggested_friends'] = suggested_friends
-
-     return render("suggested_friends.html", display)
Making code pleasant to read takes attention while writing and a few good habits. This matters most in a team: a style choice such as brace placement is neither right nor wrong in itself, but ignoring the team's convention is wrong.
Writing comments
While writing code you think about many things, but in the end the reader sees only the code itself, and that extra information is lost. The purpose of comments is to pass more of it on to the reader.
Knowing what to comment
What not to comment
Comments like these are worthless, because they only restate the code:
- // The class definition for Account
- class Account {
- public:
- // Constructor
- Account();
- // Set the profit member to a new value
- void SetProfit(double profit);
- // Return the profit from this Account
- double GetProfit();
- };
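Don't comment just for the sake of commenting. If a declaration does get a comment, it should add details that the signature does not already give, as this one does: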
- // Find a Node with the given 'name' or return NULL.
- // If depth <= 0, only 'subtree' is inspected.
- // If depth == N, only 'subtree' and N levels below are inspected.
- Node* FindNodeInSubtree(Node* subtree, string name, int depth);
Don't comment bad names; fix the names instead
- // Enforce limits on the Reply as stated in the Request,
- // such as the number of items returned, or total byte size, etc.
- void CleanReply(Request request, Reply reply);
Most of the comment is spent explaining what "clean" means. It is better to pick an accurate name:
- // Make sure 'reply' meets the count/byte/etc. limits from the 'request'
- void EnforceLimitsFromRequest(Request request, Reply reply);
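Record your thoughts
That covers what not to comment. So what should be commented? Comments should record the conclusions you reached while working out how to write the code, such as: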
- // Surprisingly, a binary tree was 40% faster than a hash table for this data.
- // The cost of computing a hash was more than the left/right comparisons.
-
- // This heuristic might miss a few words. That's OK; solving this 100% is hard.
-
- // This class is getting messy. Maybe we should create a 'ResourceNode' subclass to
- // help organize things.
Comments can also flag unfinished work and explain the reasoning behind constants:
- // TODO: use a faster algorithm
- // TODO(dustin): handle other image formats besides JPEG
-
- NUM_THREADS = 8 # as long as it's >= 2 * num_processors, that's good enough.
-
- // Impose a reasonable limit - no human can read that much anyway.
- const int MAX_RSS_SUBSCRIPTIONS = 1000;
Commonly used markers:
- TODO : Stuff I haven't gotten around to yet
- FIXME : Known-broken code here
- HACK : Admittedly inelegant solution to a problem
- XXX : Danger! Major problem here
Put yourself in the reader's shoes
Whatever would make someone reading your code stop and wonder is what you should comment.
- struct Recorder {
- vector<float> data;
- ...
- void Clear() {
- vector<float>().swap(data); // Huh? Why not just data.clear()?
- }
- };
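Many C++ programmers who see this will wonder why the code doesn't simply call `data.clear()` instead of swapping with an empty vector, so that line deserves a comment: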
- // Force vector to relinquish its memory (look up "STL swap trick")
- vector<float>().swap(data);
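Advertise likely pitfalls
While writing code you may use a hack, or know of some other pitfall that readers need to be aware of. That is the time to write a comment: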
- void SendEmail(string to, string subject, string body);
In reality, this function calls an external service and has a timeout, so it needs a comment:
- // Calls an external service to deliver email. (Times out after 1 minute.)
- void SendEmail(string to, string subject, string body);
Big-picture comments
Sometimes it helps to comment the file as a whole, so the reader gets an overall picture:
- // This file contains helper functions that provide a more convenient interface to our
- // file system. It handles file permissions and other nitty-gritty details.
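Summary comments
Even inside a function, a comment can summarize a block of code, much like a file-level comment:
- # Find all the items that customers purchased for themselves.
- for customer_id in all_customers:
-     for sale in all_sales[customer_id].sales:
-         if sale.recipient == customer_id:
-             ...
Or add a comment for each step of a function:
- def GenerateUserReport():
-     # Acquire a lock for this user
-     ...
-     # Read user's info from the database
-     ...
-     # Write info to a file
-     ...
-     # Release the lock for this user
Many people are reluctant to write comments, and writing good ones is indeed not easy. One option is to set aside a dedicated area in the file where you can write down anything you want to say.
Making comments precise and compact
The previous section covered what to comment; this one covers how. Comments matter, so they should be precise. They also take up screen space, so they should be compact.
Keep comments compact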
- // The int is the CategoryType.
- // The first float in the inner pair is the 'score',
- // the second is the 'weight'.
- typedef hash_map<int, pair<float, float> > ScoreMap;
That is too wordy. Compress it as much as possible:
- // CategoryType -> (score, weight)
- typedef hash_map<int, pair<float, float> > ScoreMap;
Avoid ambiguous pronouns
- // Insert the data into the cache, but check if it's too big first.
Here "it's" is ambiguous: it could refer to the data or to the cache. Rewrite the comment as:
- // Insert the data into the cache, but check if the data is too big first.
An even better fix restructures the sentence so that "it" has a clear referent:
- // If the data is small enough, insert it into the cache.
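Polish sloppy sentences
- # Depending on whether we've already crawled this URL before, give it a different priority.
This sentence takes too much effort to understand. The following version is much clearer:
- # Give higher priority to URLs we've never crawled before.
Describe function behavior precisely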
- // Return the number of lines in this file.
- int CountLines(string filename) { ... }
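A function documented like this leaves the caller guessing, because "line" can be interpreted in several ways:
- "" (an empty file): is that 0 lines or 1?
- "hello": 0 or 1?
- "hello\n": 1 or 2?
- "hello\n world": 1 or 2?
- "hello\n\r cruel\n world\r": 2, 3, or 4?
So the comment should be written like this: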
- // Count how many newline bytes ('\n') are in the file.
- int CountLines(string filename) { ... }
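To make the precise definition concrete, here is a minimal Python sketch that follows the revised comment literally. It is an illustration only: `count_newlines` is a hypothetical name, and it takes the text directly rather than a filename so that the edge cases listed above can be checked without touching the file system.

```python
def count_newlines(text):
    """Count how many newline characters are in text."""
    return text.count("\n")


if __name__ == "__main__":
    samples = ["", "hello", "hello\n", "hello\n world", "hello\n\r cruel\n world\r"]
    for sample in samples:
        print(repr(sample), count_newlines(sample))
```

Under this definition every case in the list has exactly one answer, which is the point of the more precise comment.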
Use examples to illustrate corner cases
- // Rearrange 'v' so that elements < pivot come before those >= pivot;
- // Then return the largest 'i' for which v[i] < pivot (or -1 if none are < pivot)
- int Partition(vector<int>* v, int pivot);
This description is precise, but it is even better with an example:
- // ...
- // Example: Partition([8 5 9 8 2], 8) might result in [5 2 | 8 9 8] and return 1
- int Partition(vector<int>* v, int pivot);
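The documented behavior can be checked against a small Python sketch. This is not the book's implementation: it is a simple stable variant that builds two temporary lists rather than swapping in place, written only to show that the example in the comment is consistent with the description.

```python
def partition(v, pivot):
    """Rearrange v in place so elements < pivot come before those >= pivot.

    Returns the largest index i for which v[i] < pivot, or -1 if no
    element is smaller than pivot.
    """
    smaller = [x for x in v if x < pivot]
    not_smaller = [x for x in v if x >= pivot]
    v[:] = smaller + not_smaller  # overwrite the caller's list in place
    return len(smaller) - 1


if __name__ == "__main__":
    values = [8, 5, 9, 8, 2]
    print(partition(values, 8), values)
```

For the input in the comment, this variant produces [5, 2, 8, 9, 8] and returns 1, matching the "might result in" wording; other valid implementations may order the elements within each half differently.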
State the intent of your code
- void DisplayProducts(list<Product> products) {
- products.sort(CompareProductByPrice);
- // Iterate through the list in reverse order
- for (list<Product>::reverse_iterator it = products.rbegin(); it != products.rend();
- ++it)
- DisplayPrice(it->price);
- ...
- }
The comment says the list is traversed in reverse order, which only restates the code. It should state the intent instead:
- // Display each price, from highest to lowest
- for (list<Product>::reverse_iterator it = products.rbegin(); ... )
Comments at function call sites
A call like this is bound to leave the reader puzzled:
- Connect(10, false);
In a language with named arguments, such as Python, the call can document itself:
- def Connect(timeout, use_encryption): ...
-
- # Call the function using named parameters
- Connect(timeout = 10, use_encryption = False)
Use information-dense words
- // This class contains a number of members that store the same information as in the
- // database, but are stored here for speed. When this class is read from later, those
- // members are checked first to see if they exist, and if so are returned; otherwise the
- // database is read from and that data stored in those fields for next time.
The long comment above is thorough, but a single well-known term says the same thing with no loss of clarity:
- // This class acts as a caching layer to the database.
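Simplifying loops and logic
Keep control flow simple
The goal of this section is to make conditionals, loops, and other control flow as natural as possible, so the reader never has to stop, think, or go back and reread.
Order of arguments in conditionals
Compare these two ways of writing the same conditions: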
- if (length >= 10)
- while (bytes_received < bytes_expected)
-
- if (10 <= length)
- while (bytes_expected > bytes_received)
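Should the operands simply follow "less than" or "greater than" order, or is there a better rule? There is: order them by meaning.
- Left-hand side: usually the value being examined, the one that changes
- Right-hand side: usually the value it is compared against, which is more constant
This explains why `bytes_received < bytes_expected` is easier to read than the reverse.
Order of if/else blocks
You are usually free to choose the order of the `if/else` blocks. Both of these work: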
- if (a == b) {
- // Case One ...
- } else {
- // Case Two ...
- }
-
- if (a != b) {
- // Case Two ...
- } else {
- // Case One ...
- }
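You may never have given this much thought, but sometimes one order really is better than the other:
- Put the positive case first: `if (debug)` is better than `if (!debug)`
- Put the simpler case first, so that the `if` and the `else` both fit on one screen
- Put the more interesting or conspicuous case first
For example: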
- if (!url.HasQueryParameter("expand_all")) {
- response.Render(items);
- ...
- } else {
- for (int i = 0; i < items.size(); i++) {
- items[i].Expand();
- }
- ...
- }
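When you read the `if`, the first thing you think of is `expand_all`. It is like being told "don't think of an elephant": you can't help thinking of it, and that causes a moment of confusion. It is better written as: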
- if (url.HasQueryParameter("expand_all")) {
- for (int i = 0; i < items.size(); i++) {
- items[i].Expand();
- }
- ...
- } else {
- response.Render(items);
- ...
- }
The ternary operator (?:)
- time_str += (hour >= 12) ? "pm" : "am";
-
- // Avoiding the ternary operator, you might write:
- if (hour >= 12) {
- time_str += "pm";
- } else {
- time_str += "am";
- }
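The ternary operator reduces the line count, and the example above is a good use of it. But the real goal is to reduce the time needed to read the code, so cases like the following are not a good fit for it: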
- return exponent >= 0 ? mantissa * (1 << exponent) : mantissa / (1 << -exponent);
-
- if (exponent >= 0) {
- return mantissa * (1 << exponent);
- } else {
- return mantissa / (1 << -exponent);
- }
So use the ternary operator only for simple expressions.
Avoid do/while loops
- do {
- continue;
- } while (false);
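How many times does this loop body run? You have to stop and think (once: `continue` jumps to the condition, which is false). A `do/while` loop can nearly always be rewritten another way, so it is best avoided.
Return early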
- public boolean Contains(String str, String substr) {
- if (str == null || substr == null) return false;
- if (substr.equals("")) return true;
- ...
- }
Returning early from a function makes its logic clearer.
Minimize nesting
- if (user_result == SUCCESS) {
- if (permission_result != SUCCESS) {
- reply.WriteErrors("error reading permissions");
- reply.Done();
- return;
- }
- reply.WriteErrors("");
- } else {
- reply.WriteErrors(user_result);
- }
- reply.Done();
This code has only one level of nesting, yet it is still a little confusing. Does your own code have spots like this? Restructured to return early, it reads much better:
- if (user_result != SUCCESS) {
- reply.WriteErrors(user_result);
- reply.Done();
- return;
- }
- if (permission_result != SUCCESS) {
- reply.WriteErrors(permission_result);
- reply.Done();
- return;
- }
- reply.WriteErrors("");
- reply.Done();
The same approach works for nesting inside loops:
- for (int i = 0; i < results.size(); i++) {
- if (results[i] != NULL) {
- non_null_count++;
- if (results[i]->name != "") {
- cout << "Considering candidate..." << endl;
- ...
- }
- }
- }
Inside a loop, the equivalent of returning early is `continue`:
- for (int i = 0; i < results.size(); i++) {
- if (results[i] == NULL) continue;
- non_null_count++;
-
- if (results[i]->name == "") continue;
- cout << "Considering candidate..." << endl;
- ...
- }
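The same guard-clause pattern can be sketched in Python. The data shape here (a list of dicts with a `name` key, or `None` for a missing result) and the function name `count_candidates` are assumptions made for this illustration.

```python
def count_candidates(results):
    """Return (non_null_count, candidate_names) for a list of results."""
    non_null_count = 0
    candidate_names = []
    for result in results:
        if result is None:
            continue  # skip missing results
        non_null_count += 1

        if result["name"] == "":
            continue  # unnamed results are not candidates
        candidate_names.append(result["name"])
    return non_null_count, candidate_names


if __name__ == "__main__":
    print(count_candidates([None, {"name": ""}, {"name": "alice"}]))
```

Each `continue` disposes of one uninteresting case, so the main logic stays at a single indentation level.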
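Breaking down giant expressions
The more complex an expression, the harder it is to read, so break large, complicated expressions into small, digestible pieces.
Explaining variables
The simplest way to break down a complex expression is to introduce a variable:
- if line.split(':')[0].strip() == "root":
-     ...
-
- # With an explaining variable
- username = line.split(':')[0].strip()
- if username == "root":
-     ...
Or take this example: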
- if (request.user.id == document.owner_id) {
- // user can edit this document...
- }
- ...
- if (request.user.id != document.owner_id) {
- // document is read-only...
- }
-
- // With a summary variable
- final boolean user_owns_document = (request.user.id == document.owner_id);
- if (user_owns_document) {
- // user can edit this document...
- }
- ...
- if (!user_owns_document) {
- // document is read-only...
- }
Using De Morgan's laws
- 1) not (a or b or c) <–> (not a) and (not b) and (not c)
- 2) not (a and b and c) <–> (not a) or (not b) or (not c)
So a condition can be rewritten like this:
- if (!(file_exists && !is_protected)) Error("Sorry, could not read file.");
-
- // Equivalent, after applying De Morgan's law
- if (!file_exists || is_protected) Error("Sorry, could not read file.");
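Because the laws are purely logical, they can be verified by brute force. The Python sketch below checks both identities for every combination of three booleans, and also checks the file example above; the function names are made up for this illustration.

```python
from itertools import product


def de_morgan_holds():
    """Return True if both laws hold for every combination of a, b, c."""
    for a, b, c in product([False, True], repeat=3):
        if (not (a or b or c)) != ((not a) and (not b) and (not c)):
            return False
        if (not (a and b and c)) != ((not a) or (not b) or (not c)):
            return False
    return True


def file_check_rewrite_is_equivalent():
    """Compare !(exists && !protected) with !exists || protected."""
    return all(
        (not (file_exists and not is_protected))
        == ((not file_exists) or is_protected)
        for file_exists, is_protected in product([False, True], repeat=2)
    )


if __name__ == "__main__":
    print(de_morgan_holds(), file_check_rewrite_is_equivalent())
```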
Don't abuse short-circuit logic
- assert((!(bucket = FindBucket(key))) || !bucket->IsOccupied());
This can be replaced with the following. It takes two lines but is much easier to understand:
- bucket = FindBucket(key);
- if (bucket != NULL) assert(!bucket->IsOccupied());
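Be careful with expressions like the one below as well: in languages such as Python, JavaScript, and Ruby, `x` is not assigned a boolean but the value of the first operand that evaluates as true: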
- x = a || b || c
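In Python the same behavior comes from the `or` operator, which returns one of its operands rather than a boolean. The helper `first_truthy` below is a hypothetical name used only to demonstrate this.

```python
def first_truthy(a, b, c):
    """Return a or b or c: the first truthy operand, else the last operand."""
    return a or b or c


if __name__ == "__main__":
    print(first_truthy(0, "", "fallback"))
    print(first_truthy(None, 0, 3))
```

Note the second call: a legitimate value of `0` is skipped because it is falsy, which is exactly the kind of surprise that makes this idiom worth a second look.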
Breaking down giant statements
- var update_highlight = function (message_num) {
- if ($("#vote_value" + message_num).html() === "Up") {
- $("#thumbs_up" + message_num).addClass("highlighted");
- $("#thumbs_down" + message_num).removeClass("highlighted");
- } else if ($("#vote_value" + message_num).html() === "Down") {
- $("#thumbs_up" + message_num).removeClass("highlighted");
- $("#thumbs_down" + message_num).addClass("highlighted");
- } else {
- $("#thumbs_up" + message_num).removeClass("highighted");
- $("#thumbs_down" + message_num).removeClass("highlighted");
- }
- };
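There is a lot of repetition here. Extracting the repeated expressions into variables simplifies the code and also guards against typos (the string "highlighted" is misspelled as "highighted" in the last branch above):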
- var update_highlight = function (message_num) {
- var thumbs_up = $("#thumbs_up" + message_num);
- var thumbs_down = $("#thumbs_down" + message_num);
- var vote_value = $("#vote_value" + message_num).html();
- var hi = "highlighted";
-
- if (vote_value === "Up") {
- thumbs_up.addClass(hi);
- thumbs_down.removeClass(hi);
- } else if (vote_value === "Down") {
- thumbs_up.removeClass(hi);
- thumbs_down.addClass(hi);
- } else {
- thumbs_up.removeClass(hi);
- thumbs_down.removeClass(hi);
- }
- }
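Variables and readability
Eliminating variables
The previous section used variables to break down giant expressions. This section looks at how to get rid of variables that are not needed.
Useless temporary variables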
- now = datetime.datetime.now()
- root_message.last_view_time = now
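Here `now` can be removed, because:
- it does not break down a complex expression
- it adds no clarity, since `datetime.datetime.now()` is already clear
- it is used only once
So the code can simply be:
- root_message.last_view_time = datetime.datetime.now()
Eliminating control-flow variables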
- boolean done = false;
- while (/* condition */ && !done) {
- ...
- if (...) {
- done = true;
- continue;
- }
- }
The job of `done` can be done better another way:
- while (/* condition */) {
- ...
- if (...) {
- break;
- }
- }
This example is easy to fix. With more complicated nesting, a single `break` may not be enough; in that case, move the code into a function.
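As a sketch of that last point, the Python function below searches with two nested loops and leaves both at once with `return`, so no `done` flag is needed. The function `find_pair_with_sum` is a made-up example, not taken from the book.

```python
def find_pair_with_sum(numbers, target):
    """Return the first pair (x, y) from numbers with x + y == target, or None."""
    for i, x in enumerate(numbers):
        for y in numbers[i + 1:]:
            if x + y == target:
                return x, y  # exits both loops at once; no flag variable
    return None


if __name__ == "__main__":
    print(find_pair_with_sum([1, 4, 6, 9], 10))
```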
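Shrink the scope of your variables
We have all heard the advice to avoid global variables. The reason is that the larger a variable's scope, the harder it is to keep track of, so keep scopes small.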
- class LargeClass {
- string str_;
- void Method1() {
- str_ = ...;
- Method2();
- }
- void Method2() {
- // Uses str_
- }
- // Lots of other methods that don't use str_
- ... ;
- }
The scope of `str_` is larger than it needs to be. It can be written another way:
- class LargeClass {
- void Method1() {
- string str = ...;
- Method2(str);
- }
- void Method2(string str) {
- // Uses str
- }
- // Now other methods can't see str.
- };
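Passing `str` as a function parameter shrinks its scope and makes the code easier to read. The same idea applies to class design: split a large class into smaller ones.
Be careful with nested scopes
- # No use of example_value up to this point.
- if request:
-     for value in request.values:
-         if value > 0:
-             example_value = value
-             break
-
- for logger in debug.loggers:
-     logger.log("Example:", example_value)
If `request` is empty or contains no positive value, `example_value` is never assigned, and the last line fails at run time because the name is undefined. The fix is not hard:
- example_value = None
- if request:
-     for value in request.values:
-         if value > 0:
-             example_value = value
-             break
-
- if example_value:
-     for logger in debug.loggers:
-         logger.log("Example:", example_value)
But applying the earlier advice about eliminating intermediate variables gives an even better version: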
- def LogExample(value):
-     for logger in debug.loggers:
-         logger.log("Example:", value)
-
- if request:
-     for value in request.values:
-         if value > 0:
-             LogExample(value)  # deal with 'value' immediately
-             break
Define variables where they are used
Older versions of C required all variables to be declared at the top of a function or block. With many variables, that forces the reader to keep all of them in mind from the start, so a variable that is not needed right away should be defined later. In the following function, everything is defined up front:
- def ViewFilteredReplies(original_id):
-     filtered_replies = []
-     root_message = Messages.objects.get(original_id)
-     all_replies = Messages.objects.select(root_id=original_id)
-     root_message.view_count += 1
-     root_message.last_view_time = datetime.datetime.now()
-     root_message.save()
-
-     for reply in all_replies:
-         if reply.spam_votes <= MAX_SPAM_VOTES:
-             filtered_replies.append(reply)
-
-     return filtered_replies
That is a lot for the reader to track at once. Moving each definition down to just before its first use helps:
- def ViewFilteredReplies(original_id):
-     root_message = Messages.objects.get(original_id)
-     root_message.view_count += 1
-     root_message.last_view_time = datetime.datetime.now()
-     root_message.save()
-
-     all_replies = Messages.objects.select(root_id=original_id)
-     filtered_replies = []
-     for reply in all_replies:
-         if reply.spam_votes <= MAX_SPAM_VOTES:
-             filtered_replies.append(reply)
-
-     return filtered_replies
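Prefer write-once variables
Too many variables confuse the reader, and so does a single variable that is reassigned over and over. The fewer times a variable changes, the more readable the code.
An example
Suppose a page like the one below, where the first empty `input` needs to be given a value: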
- <input type="text" id="input1" value="Dustin">
- <input type="text" id="input2" value="Trevor">
- <input type="text" id="input3" value="">
- <input type="text" id="input4" value="Melissa">
- ...
- var setFirstEmptyInput = function (new_value) {
- var found = false;
- var i = 1;
- var elem = document.getElementById('input' + i);
- while (elem !== null) {
- if (elem.value === '') {
- found = true;
- break;
- }
- i++;
- elem = document.getElementById('input' + i);
- }
- if (found) elem.value = new_value;
- return elem;
- };
The code works and uses three variables. Let's improve them one at a time. `found` is an intermediate variable that can be eliminated entirely:
- var setFirstEmptyInput = function (new_value) {
- var i = 1;
- var elem = document.getElementById('input' + i);
- while (elem !== null) {
- if (elem.value === '') {
- elem.value = new_value;
- return elem;
- }
- i++;
- elem = document.getElementById('input' + i);
- }
- return null;
- };
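Next, look at `elem`: it is reassigned on every pass through the loop, which makes its value hard to follow. It is really `i` that drives the iteration, so restructure the code as a `for` loop over `i`: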
- var setFirstEmptyInput = function (new_value) {
- for (var i = 1; true; i++) {
- var elem = document.getElementById('input' + i);
- if (elem === null)
- return null; // Search Failed. No empty input found.
- if (elem.value === '') {
- elem.value = new_value;
- return elem;
- }
- }
- };
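Reorganizing your code
Extracting unrelated subproblems
Engineering is about breaking a big problem into small ones and solving them one by one, which also helps keep a program robust and readable. Some guidelines for finding subproblems:
- Look at a function or block of code and ask yourself, "What is the high-level goal of this code?"
- For each line of code, ask, "Is it working directly toward that goal, or is it solving an unrelated subproblem?"
- If enough lines are solving a subproblem that is not directly related to the goal, extract them into a separate function
Here is an example: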
- ajax_post({
- url: 'http://example.com/submit',
- data: data,
- on_success: function (response_data) {
- var str = "{\n";
- for (var key in response_data) {
- str += " " + key + " = " + response_data[key] + "\n";
- }
- alert(str + "}");
- // Continue handling 'response_data' ...
- }
- });
The goal of this code is to send an `ajax` request, so the string-formatting part can be extracted:
- var format_pretty = function (obj) {
- var str = "{\n";
- for (var key in obj) {
- str += " " + key + " = " + obj[key] + "\n";
- }
- return str + "}";
- };
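A bonus
There are many reasons to extract `format_pretty`. A standalone function is easy to extend with features, to harden, and to adapt to edge cases. Here `format_pretty` can be improved into a more capable function: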
- var format_pretty = function (obj, indent) {
- // Handle null, undefined, strings, and non-objects.
- if (obj === null) return "null";
- if (obj === undefined) return "undefined";
- if (typeof obj === "string") return '"' + obj + '"';
- if (typeof obj !== "object") return String(obj);
- if (indent === undefined) indent = "";
-
- // Handle (non-null) objects.
-
- var str = "{\n";
- for (var key in obj) {
- str += indent + " " + key + " = ";
- str += format_pretty(obj[key], indent + " ") + "\n";
- }
- return str + indent + "}";
- };
This function produces output like:
- {
- key1 = 1
- key2 = true
- key3 = undefined
- key4 = null
- key5 = {
- key5a = {
- key5a1 = "hello world"
- }
- }
- }
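Doing this regularly builds up a stock of code that can be reused, collected into your own library, or shared with others.
Project-specific functionality
Functions unrelated to the goal can be extracted for reuse, and project-specific logic can be extracted as well to keep the code readable. For example: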
- business = Business()
- business.name = request.POST["name"]
-
- url_path_name = business.name.lower()
- url_path_name = re.sub(r"['\.]", "", url_path_name)
- url_path_name = re.sub(r"[^a-z0-9]+", "-", url_path_name)
- url_path_name = url_path_name.strip("-")
- business.url = "/biz/" + url_path_name
-
- business.date_created = datetime.datetime.utcnow()
- business.save_to_database()
Extracted into a function, it reads much better:
- CHARS_TO_REMOVE = re.compile(r"['\.]+")
- CHARS_TO_DASH = re.compile(r"[^a-z0-9]+")
-
- def make_url_friendly(text):
-     text = text.lower()
-     text = CHARS_TO_REMOVE.sub('', text)
-     text = CHARS_TO_DASH.sub('-', text)
-     return text.strip("-")
-
- business = Business()
- business.name = request.POST["name"]
- business.url = "/biz/" + make_url_friendly(business.name)
- business.date_created = datetime.datetime.utcnow()
- business.save_to_database()
Simplifying an existing interface
Consider this code, which reads a value from a cookie:
- var max_results;
- var cookies = document.cookie.split(';');
- for (var i = 0; i < cookies.length; i++) {
- var c = cookies[i];
- c = c.replace(/^[ ]+/, ''); // remove leading spaces
- if (c.indexOf("max_results=") === 0)
- max_results = Number(c.substring(12, c.length));
- }
This code is ugly. An ideal interface would look more like this:
- set_cookie(name, value, days_to_expire);
- delete_cookie(name);
When an interface is less than ideal, you can always wrap it in your own functions to make it easier to use.
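As a sketch of such a wrapper, the Python function below hides the string parsing behind a single call. It is an assumption-laden illustration: `get_cookie` is a hypothetical helper that takes the raw "name=value; name=value" string as a parameter, standing in for the browser's `document.cookie`.

```python
def get_cookie(cookie_string, name):
    """Return the value of cookie `name` from a 'k1=v1; k2=v2' string, or None."""
    for pair in cookie_string.split(";"):
        # partition splits on the first "=" only, so values may contain "=".
        key, separator, value = pair.strip().partition("=")
        if separator and key == name:
            return value
    return None


if __name__ == "__main__":
    raw_cookies = "theme=dark; max_results=25"
    print(get_cookie(raw_cookies, "max_results"))
```

The caller now writes one line and never sees the splitting and trimming.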
Reshaping an interface to your needs
- user_info = { "username": "...", "password": "..." }
- user_str = json.dumps(user_info)
- cipher = Cipher("aes_128_cbc", key=PRIVATE_KEY, init_vector=INIT_VECTOR, op=ENCODE)
- encrypted_bytes = cipher.update(user_str)
- encrypted_bytes += cipher.final() # flush out the current 128 bit block
- url = "http://example.com/?user_info=" + base64.urlsafe_b64encode(encrypted_bytes)
- ...
The high-level goal is to build a URL carrying the user's information, but most of the code deals with encrypting a Python object into a URL-safe string. That subproblem can be extracted:
- def url_safe_encrypt(obj):
-     obj_str = json.dumps(obj)
-     cipher = Cipher("aes_128_cbc", key=PRIVATE_KEY, init_vector=INIT_VECTOR, op=ENCODE)
-     encrypted_bytes = cipher.update(obj_str)
-     encrypted_bytes += cipher.final()  # flush out the current 128 bit block
-     return base64.urlsafe_b64encode(encrypted_bytes)
It can then be reused elsewhere:
- user_info = { "username": "...", "password": "..." }
- url = "http://example.com/?user_info=" + url_safe_encrypt(user_info)
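Extracting helper functions is a good habit, but don't overdo it: splitting code into too many tiny functions makes it harder to find your way around.
One task at a time
Code should do only one task at a time.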
- var place = location_info["LocalityName"]; // e.g. "Santa Monica"
- if (!place) {
- place = location_info["SubAdministrativeAreaName"]; // e.g. "Los Angeles"
- }
- if (!place) {
- place = location_info["AdministrativeAreaName"]; // e.g. "California"
- }
- if (!place) {
- place = "Middle-of-Nowhere";
- }
- if (location_info["CountryName"]) {
- place += ", " + location_info["CountryName"]; // e.g. "USA"
- } else {
- place += ", Planet Earth";
- }
-
- return place;
This function builds a place name. It is full of conditionals and hard to read. Can it be split into separate tasks?
- var town = location_info["LocalityName"]; // e.g. "Santa Monica"
- var city = location_info["SubAdministrativeAreaName"]; // e.g. "Los Angeles"
- var state = location_info["AdministrativeAreaName"]; // e.g. "CA"
- var country = location_info["CountryName"]; // e.g. "USA"
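The first task is to pull each value into its own variable, so the later code doesn't need to remember those long keys. The second task is to work out the second half of the place name:
- // Start with the default, and keep overwriting with the most specific value.
- var second_half = "Planet Earth";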
- if (country) {
- second_half = country;
- }
- if (state && country === "USA") {
- second_half = state;
- }
Then work out the first half:
- var first_half = "Middle-of-Nowhere";
- if (state && country !== "USA") {
- first_half = state;
- }
- if (city) {
- first_half = city;
- }
- if (town) {
- first_half = town;
- }
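Finally, join the two halves:
- return first_half + ", " + second_half;
Noticing that the logic branches on whether the country is "USA", the same thing can also be written like this: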
- var first_half, second_half;
- if (country === "USA") {
- first_half = town || city || "Middle-of-Nowhere";
- second_half = state || "USA";
- } else {
- first_half = town || city || state || "Middle-of-Nowhere";
- second_half = country || "Planet Earth";
- }
- return first_half + ", " + second_half;
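For comparison, here is the same logic ported to Python as a minimal sketch. The function name `format_place` is an assumption, and missing keys are treated as absent values via `dict.get`.

```python
def format_place(location_info):
    """Build a display string such as 'Santa Monica, CA' from a location dict."""
    town = location_info.get("LocalityName")
    city = location_info.get("SubAdministrativeAreaName")
    state = location_info.get("AdministrativeAreaName")
    country = location_info.get("CountryName")

    if country == "USA":
        first_half = town or city or "Middle-of-Nowhere"
        second_half = state or "USA"
    else:
        first_half = town or city or state or "Middle-of-Nowhere"
        second_half = country or "Planet Earth"
    return first_half + ", " + second_half


if __name__ == "__main__":
    print(format_place({"LocalityName": "Santa Monica",
                        "AdministrativeAreaName": "CA",
                        "CountryName": "USA"}))
    print(format_place({}))
```

Extracting the four values first and choosing the halves second keeps each step focused on one task.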
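Turning thoughts into code
When you explain something complicated to another person, small details easily cause confusion. So imagine explaining your code to someone in plain language: would they understand? A few guidelines help make code clearer:
- Describe what the code needs to do in plain language, as you would to a colleague
- Pay attention to the key words and phrases in that description
- Write the code to match the description
The following code checks whether a user is authorized: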
- $is_admin = is_admin_request();
- if ($document) {
- if (!$is_admin && ($document['username'] != $_SESSION['username'])) {
- return not_authorized();
- }
- } else {
- if (!$is_admin) {
- return not_authorized();
- }
- }
- // continue rendering the page ...
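The code is short, but its nested logic is complicated. As earlier sections noted, heavy nesting makes code hard to follow. Described in plain language, the logic is:
- There are two ways to be authorized:
- 1. you are an admin
- 2. you own the document
- Otherwise, you are not authorized
Now write the code to match the description: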
- if (is_admin_request()) {
- // authorized
- } elseif ($document && ($document['username'] == $_SESSION['username'])) {
- // authorized
- } else {
- return not_authorized();
- }
- // continue rendering the page ...
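The plain-language rule maps almost word for word onto a small Python function. This is a sketch under assumptions: `is_authorized` and its parameters are hypothetical stand-ins for the PHP request, document, and session values above.

```python
def is_authorized(is_admin, document, session_username):
    """Admins and document owners are authorized; everyone else is not."""
    if is_admin:
        return True
    if document and document["username"] == session_username:
        return True
    return False


if __name__ == "__main__":
    print(is_authorized(False, {"username": "bob"}, "bob"))
    print(is_authorized(False, None, "bob"))
```

Each branch corresponds to one line of the description, which makes the code easy to check against the intended rule.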
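Writing less code
The most readable code is no code at all!
- Drop features that add no value, and don't over-engineer
- Rethink the requirements and solve the simplest version of the problem that still meets the overall goal
- Know the libraries you use, and periodically read through their APIs
Finally
The book also has chapters on testing, which I'll leave for you to read yourself. Once again, I recommend it:
- English edition: The Art of Readable Code
- Chinese edition: 編寫可讀代碼的藝術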