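If you had asked me back in school, I would have said an API is an Application Programming Interface, at which point the HR interviewer would nod as if they understood and wave me on to the next round.
If you had asked me when I had just started working, I would have said an API is just an interface! The PM has a flash of inspiration and dreams up a new feature, I spend a few days writing an API that exposes it, and the front-end folks building web pages and apps can use it.
Yahoo, for example, offers an API for today's weather, an API for browsing news, and an API for sports scores. Modern life runs on APIs: browsing discounted items on Taobao, hailing a ride on DiDi, sending email at work, watching TV shows after work, all of it is backed by APIs.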
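APIs can also be categorized by how they are packaged. There are two main schools: one based on REST, the other on RPC. REST is carried over HTTP, while RPC often uses a custom protocol. Today the REST school has given rise to GraphQL, and the RPC school to gRPC. Which of the four to use is a daily argument at internet companies. In the end the argument usually comes down to the most fundamental definition: what is an API actually for?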
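So here is how I would explain an API now: at its core, an API helps people read data and write data. Schools change, technology changes, and the job titles of the people who write and use APIs change, but the essence of an API does not. Whatever kind of API it is, its ultimate purpose is to make reading data easy and writing data pleasant. Once you get this, you understand what problem GraphQL solves.
Before writing about GraphQL, we need to understand its predecessor, REST.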
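REST relies on a handful of HTTP methods: POST writes new data, GET reads data, PUT modifies data, and DELETE removes data. There are also less common ones such as PATCH and HEAD, which are rarely needed. All of these operations are built on HTTP and are easy to understand. To check today's weather, use GET. To buy a handbag, use POST. To change my QQ nickname, use PUT. To delete a blog post I wrote ten years ago, use DELETE.
If only life were that simple.
In practice the boundaries between these operations are often blurry. Writing new data, modifying data, and deleting data in particular are frequently hard to tell apart.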
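Here is a real example: likes. When I was writing the API for Yahoo's comment section, this one gave me a headache. There are many ways to implement a like.
For example, I could use POST for everything. To write a new "like", POST a "like". What if I want to turn the like into a dislike? POST a "dislike". To cancel the like, POST a cancellation.
Another approach is to use PUT for everything. Every user's default state toward every comment is "neither liked nor disliked", which is neutral. To like, I change my state from "neither" to "like". Disliking works the same way. To cancel, I change it back to "neither".
And of course some people would argue that cancelling a like should use DELETE, because data is being deleted.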
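To sum up, there are four ways to implement likes alone:
Use POST for every operation
Use PUT for every operation
Use POST to like or dislike, and DELETE to cancel
Use PUT to like or dislike, and DELETE to cancel
When I wrote the API at Yahoo, I used the fourth. Front-end engineers sometimes got confused and assumed I had used the first. If an API as simple as liking can be implemented four ways, more complex APIs are even harder to understand. Some APIs modify a lot of data: they write some new records, modify some old ones, and finally delete some duplicates. Designing that gets messy fast.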
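To make the fourth option concrete, here is a minimal Python sketch of its semantics, with an in-memory store standing in for a real service. `VoteStore`, its method names, and the user and comment ids are all made up for illustration; this is not the actual Yahoo API.

```python
from enum import Enum


class Vote(Enum):
    LIKE = "like"
    DISLIKE = "dislike"


class VoteStore:
    """In-memory stand-in for the vote resource (hypothetical)."""

    def __init__(self):
        # Maps (user_id, comment_id) -> Vote. A missing key is the neutral state.
        self._votes = {}

    def put(self, user_id, comment_id, vote):
        """PUT: create or overwrite the user's vote on a comment."""
        self._votes[(user_id, comment_id)] = Vote(vote)

    def delete(self, user_id, comment_id):
        """DELETE: cancel the vote. Deleting a missing vote is a no-op."""
        self._votes.pop((user_id, comment_id), None)

    def get(self, user_id, comment_id):
        """GET: return the current vote, or None for the neutral state."""
        return self._votes.get((user_id, comment_id))


store = VoteStore()
store.put("alice", "c1", "like")     # like
store.put("alice", "c1", "dislike")  # switch to dislike
store.delete("alice", "c1")          # cancel
```

Repeating the same PUT or DELETE leaves the store in the same state, so a client can retry either call safely.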
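REST APIs have another problem: too much redundant information. For example, to view a news article I make a GET, and what comes back includes
Title
News organization (e.g. Xinhua News Agency)
News category (e.g. sports, finance)
News image
Summary (one or two sentences)
Body text
News video
Reporter
Publication time
News link
Original link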
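The trouble is that many of these often go unused. Take a screen that shows only the title, image, and news organization: out of all those response fields, three would do.
Another screen needs five: title, news category (finance), news organization, summary, and image.
Even if I only need three or five response fields, the API makes me fetch all 11. Isn't that a waste of bandwidth?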
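What the client would like instead is to name the fields it needs and get back only those. The Python sketch below illustrates the idea on a plain dictionary. The field names and values are invented for illustration, and `select_fields` is a hypothetical helper, not part of any real API.

```python
# Hypothetical article record with the 11 fields listed above.
# All names and values are invented for illustration.
ARTICLE = {
    "title": "Example headline",
    "publisher": "Xinhua News Agency",
    "category": "finance",
    "image": "https://example.com/a.jpg",
    "summary": "One or two sentences.",
    "body": "Full text of the article...",
    "video": "https://example.com/a.mp4",
    "reporter": "Example Reporter",
    "publishedAt": "2020-02-02T15:37:01-08:00",
    "link": "https://example.com/news/1",
    "originalLink": "https://example.com/source/1",
}


def select_fields(record, wanted):
    """Return only the requested fields, in the order they were asked for."""
    return {name: record[name] for name in wanted}


# The two screens described above need 3 and 5 fields respectively.
list_view = select_fields(ARTICLE, ["title", "image", "publisher"])
card_view = select_fields(
    ARTICLE, ["title", "category", "publisher", "summary", "image"]
)
print(len(ARTICLE), len(list_view), len(card_view))  # 11 3 5
```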
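Another problem is that assembling the POST body is tedious. For example, to post a comment, the POST body is a fairly long JSON object.
That JSON is really just one long string, and every time I post a comment I have to assemble it again. It would be nice to use a template plus variables: I store a fixed template with placeholders such as $1, $2, $3, and then I only need to set those variables. REST itself offers no such mechanism, so this has to be done with some other library.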
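As a rough sketch of the "template plus variables" idea, here is one way to do it in Python with the standard library. The comment body is hypothetical: only `sendFrom` is a field name that appears in this post, and `articleId`, `text`, and `render_comment` are assumptions made for the example. Named placeholders are used instead of $1, $2, $3 because `string.Template` requires identifiers.

```python
import json
from string import Template

# Fixed template for a hypothetical comment body.
# Only "sendFrom" comes from the post; the other fields are invented.
COMMENT_TEMPLATE = Template(
    '{"articleId": $article_id, "text": $text, "sendFrom": $send_from}'
)


def render_comment(article_id, text, send_from):
    """Fill the template; json.dumps quotes and escapes each value."""
    return COMMENT_TEMPLATE.substitute(
        article_id=json.dumps(article_id),
        text=json.dumps(text),
        send_from=json.dumps(send_from),
    )


body = render_comment("n42", 'Nice "article"!', "iOS")
parsed = json.loads(body)  # the rendered body is valid JSON
```

Escaping each value with `json.dumps` before substitution keeps quotes inside the comment text from breaking the JSON.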
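These are the front-end engineer's headaches. The back-end engineer says: it's hard for me too, and I have even more trouble.
Take validation: every request parameter that comes in has to be checked for validity. For example, the sendFrom field of the comment body has to be a mobile operating system. If the front end accidentally claims that a user's comment was sent from a radio, I can't accept it. Every field has to be validated, so with twenty-odd fields I validate twenty-odd times. These checks can be written as a library to reduce duplicated code, but it is still a hassle.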
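A minimal sketch of what this hand-written validation tends to look like, in Python. The allowed `sendFrom` values and the `text` field are assumptions made for the example; the real API presumably had many more fields and rules.

```python
from enum import Enum


class SendFrom(Enum):
    # Assumed set of allowed platforms; the real list is not given in the post.
    IOS = "iOS"
    ANDROID = "Android"


def validate_comment(body):
    """Return a list of validation errors; an empty list means the body is valid."""
    errors = []

    text = body.get("text")
    if not isinstance(text, str) or not text:
        errors.append("text must be a non-empty string")

    try:
        SendFrom(body.get("sendFrom"))
    except ValueError:
        allowed = ", ".join(member.value for member in SendFrom)
        errors.append(f"sendFrom must be one of: {allowed}")

    return errors


ok = validate_comment({"text": "hello", "sendFrom": "iOS"})
bad = validate_comment({"text": "hello", "sendFrom": "radio"})
```

Every additional field needs another check of this kind, which is the repetition a typed schema is meant to remove.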
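When designing an API, a common approach is to map each endpoint to one kind of resource. Since an API is a tool for reading and writing data, I split the API into endpoints by type of data.
For example, a blog post is a resource, so I create /v1/news, and this endpoint supports POST, PUT, GET, and DELETE. Blog comments go under /v1/comments, with the same four operations. Comments can be liked, and that is /v1/vote.
Posts, comments, and likes actually depend on each other. You can't comment without a post, and you can't like thin air without a comment. So the dependency flow is: news -> comments -> votes. But if you only look at the three endpoints, you can't see this relationship.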
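GraphQL solves all of the problems above, and it does so by laying down a few rules:
No need for so many verbs like GET, POST, PUT, and DELETE; everything is reduced to reading and writing
The response never returns all the data at once; the server returns whatever the client asks for
The POST body can include variables
Write the schema before writing the API; all data must have a defined type
Data dependencies must be declared, so the resource structure is clear at a glance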
I won't repeat how to actually write GraphQL here; the tutorial on the official site is very good:
GraphQL: A query language for APIs (graphql.org)
Here are my study notes:
Query and mutation are the two pillars of GraphQL
GraphQL responses are returned as JSON
Query errors typically still return HTTP 200
Fragments and variables greatly reduce the need for query string manipulation.
Field selection improves performance
The type system helps the API server do type checking
While query fields are executed in parallel, mutation fields run in series, one after the other. (This is to prevent race conditions)
You can query the meta field "__typename" to get the name of an object's type
The GraphQL schema language defines types
`!` means non-nullable
The root of the schema declares the `query` and `mutation` entry points
Note that Query and Mutation are also objects. (Think of GraphQL as object-oriented programming)
GraphQL is basically a tree, and scalar types are the leaf nodes
GraphQL comes with a set of default scalar types out of the box:
Int: A signed 32‐bit integer.
Float: A signed double-precision floating-point value.
String: A UTF‐8 character sequence.
Boolean: true or false.
ID: The ID scalar type represents a unique identifier, often used to refetch an object or as the key for a cache. The ID type is serialized in the same way as a String; however, defining it as an ID signifies that it is not intended to be human‐readable.
There is also a way to specify custom scalar types. For example, we could define a Date type with `scalar Date`
Enums are defined with the `enum` keyword and restrict a value to a fixed set of allowed values
An Interface is an abstract type that includes a certain set of fields that a type must include (OOP again)
Union types are basically OR logic
Note that members of a union type need to be concrete object types; you can't create a union type out of interfaces or other unions.
Input types look exactly the same as regular object types, but with the keyword `input` instead of `type`.
You can't mix input and output types in your schema
My guess is that when they invented GraphQL, they wanted to call it TreeQL, but then realized that object relationships can point backward, like a graph. So TreeQL became GraphQL.
A fragment cannot refer to itself or create a cycle, as this could result in an unbounded result!
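Good: fragment A spreads fragment B, and B never leads back to A
Bad: fragment A spreads fragment B, and fragment B spreads fragment A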
Rules of querying
Don't query non-existent fields
Don't query an object without selecting a scalar field (i.e. always end with a leaf node)
Each field is a function of the previous type which returns the next type (think OOP). This function is called a "resolver"
A resolver receives the parent object and the field's arguments, and returns the value for that field
A resolver can run asynchronously and return a promise. This improves latency! (Especially when returning a list)
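A rough Python analogue of asynchronous resolvers, using coroutines where JavaScript would use promises. The data, the field names, and the `resolve_*` functions are hypothetical, and `asyncio.sleep` stands in for a database call. This is a sketch of the idea, not how any particular GraphQL server implements it.

```python
import asyncio

# Hypothetical data; ids, fields, and values are invented for illustration.
COMMENTS = {
    "c1": {"id": "c1", "text": "first", "author_id": "u1"},
    "c2": {"id": "c2", "text": "second", "author_id": "u2"},
}
USERS = {
    "u1": {"id": "u1", "name": "Alice"},
    "u2": {"id": "u2", "name": "Bob"},
}


async def resolve_author(comment):
    """Resolver for Comment.author: the parent object is the comment."""
    await asyncio.sleep(0.01)  # stand-in for a database round trip
    return USERS[comment["author_id"]]


async def resolve_comments(comment_ids):
    """Resolve a list of comments and their authors concurrently."""
    comments = [COMMENTS[cid] for cid in comment_ids]
    # gather() runs the author lookups concurrently, so the total wait is
    # roughly one round trip instead of one per list item.
    authors = await asyncio.gather(*(resolve_author(c) for c in comments))
    return [
        {"text": c["text"], "author": a["name"]}
        for c, a in zip(comments, authors)
    ]


result = asyncio.run(resolve_comments(["c1", "c2"]))
```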
API design
Good: Use a single endpoint to serve all queries and mutations
Bad: Use multiple endpoints to serve different resources. (You can still do it, but it will be difficult to use the GraphiQL reference tool https://github.com/graphql/graphiql)
Good: Use JSON and `Accept-Encoding: gzip`
Avoid versioning (personally I don't think it is avoidable)
Every field is nullable by default. Remember that when designing applications
GraphQL usually fetches each field individually, but you can optimize performance by batching and caching
Found a visualization tool: https://github.com/APIs-guru/graphql-voyager
Only the business layer should do business logic
The database has its own design logic, and the client also has its own logic for using the API. Better to design the API around client usage than around the database design.
In the API middleware pipeline, place GraphQL after authentication
GraphQL over HTTP only uses GET and POST. For GET, accept `?query=...&variables=...&operationName=...`
For POST, use the header `Content-Type: application/json`
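A small Python sketch of how a client might encode the same GraphQL request for GET and for POST. The endpoint URL and the `article` query are hypothetical. The parameter names `query`, `variables`, and `operationName` are the ones listed in the note above.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://example.com/graphql"  # hypothetical endpoint

# Hypothetical query; the `article` field and its `id` argument are invented.
QUERY = "query Article($id: ID!) { article(id: $id) { title } }"


def _payload(query, variables, operation_name):
    """Collect the request parts, leaving out the optional ones when unset."""
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    if operation_name is not None:
        payload["operationName"] = operation_name
    return payload


def build_get_url(endpoint, query, variables=None, operation_name=None):
    """GET: everything goes in the query string; variables are JSON-encoded."""
    params = _payload(query, variables, operation_name)
    if "variables" in params:
        params["variables"] = json.dumps(params["variables"])
    return endpoint + "?" + urlencode(params)


def build_post(query, variables=None, operation_name=None):
    """POST: a JSON body plus the matching Content-Type header."""
    headers = {"Content-Type": "application/json"}
    return headers, json.dumps(_payload(query, variables, operation_name))


url = build_get_url(ENDPOINT, QUERY, {"id": "1"}, "Article")
headers, body = build_post(QUERY, {"id": "1"}, "Article")
params = parse_qs(urlsplit(url).query)  # round-trip check of the GET encoding
```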
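The response should be a JSON object with a "data" field and, when something went wrong, an "errors" field
If no errors were returned, the "errors" field should not be present in the response. Per the GraphQL spec, if an error occurs before execution begins (for example a syntax or validation error), the "data" field should be omitted; if an error occurs during execution, "data" is still present, possibly as null.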
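The rules about which keys appear in the response can be sketched as a small Python helper. `build_response` and its `executed` flag are hypothetical names chosen for this example; a real GraphQL server library builds this envelope for you.

```python
def build_response(data=None, errors=None, executed=True):
    """Build a GraphQL-style response envelope.

    - "errors" appears only when there is at least one error.
    - "data" appears only when execution actually started; it may be None
      if an error during execution prevented a valid result.
    """
    response = {}
    if errors:
        response["errors"] = errors
    if executed:
        response["data"] = data
    return response


# Successful query: only "data".
success = build_response(data={"article": {"title": "Example"}})

# Syntax error caught before execution: only "errors".
rejected = build_response(errors=[{"message": "Syntax Error"}], executed=False)

# Failure during execution: both keys, with "data" set to None.
failed = build_response(data=None, errors=[{"message": "Database unavailable"}])
```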
Please disable GraphiQL in production
Don't do business-level authorization in the GraphQL (or any API) layer. Do it in the business layer.
For pagination, cursor-based pagination is the most powerful. It is better to base64-encode the cursor (so the format is opaque and no one will rely on it)
GitHub's GraphQL API is a good example of pagination. The request asks a connection for the `first` N items `after` a cursor, and the response returns `edges` (each with a `cursor` and a `node`) plus `pageInfo`.
Note 1: If you base64-decode the cursor, it has a format like `cursor:v2:2020-02-02T15:37:01-08:00`
Note 2: The cursor is on the edge, not on the node. The "edge" and "node" concepts make it clear that the cursor is a property of the connection, not of the object.
Note 3: PageInfo will tell you whether we reached the end of the list.
Note 4: With pageInfo, you don't even need the cursor field on each edge.
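Here is a minimal Python sketch of cursor-based pagination along the lines of these notes. The cursor format follows Note 1. The comment data, the `createdAt` field, and the function names are assumptions, and the sketch assumes timestamps are unique and the list is already sorted.

```python
import base64

CURSOR_PREFIX = "cursor:v2:"


def encode_cursor(created_at):
    """Wrap a timestamp in an opaque, base64-encoded cursor."""
    raw = (CURSOR_PREFIX + created_at).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    """Recover the timestamp from a cursor, rejecting unknown formats."""
    raw = base64.b64decode(cursor).decode("utf-8")
    if not raw.startswith(CURSOR_PREFIX):
        raise ValueError("unrecognized cursor")
    return raw[len(CURSOR_PREFIX):]


def paginate(items, first, after=None):
    """Return up to `first` items that come after the `after` cursor."""
    start = 0
    if after is not None:
        created_at = decode_cursor(after)
        # Resume right after the item the cursor points at.
        start = next(
            (i + 1 for i, item in enumerate(items)
             if item["createdAt"] == created_at),
            len(items),
        )
    page = items[start:start + first]
    edges = [
        {"cursor": encode_cursor(item["createdAt"]), "node": item}
        for item in page
    ]
    return {
        "edges": edges,
        "pageInfo": {
            "endCursor": edges[-1]["cursor"] if edges else None,
            "hasNextPage": start + first < len(items),
        },
    }


# Hypothetical comments, sorted by creation time.
COMMENTS = [
    {"id": "c1", "createdAt": "2020-02-02T15:37:01-08:00"},
    {"id": "c2", "createdAt": "2020-02-02T15:40:00-08:00"},
    {"id": "c3", "createdAt": "2020-02-02T15:45:30-08:00"},
]

page1 = paginate(COMMENTS, first=2)
page2 = paginate(COMMENTS, first=2, after=page1["pageInfo"]["endCursor"])
```

The client only ever passes `endCursor` back as `after`, which is why the per-edge cursor becomes optional once `pageInfo` is available.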
Every object should have an id. That way you can fetch any object by passing its id to a single root field, as GitHub does with `node(id: ...)`, and the response is that one object.
The object id in GitHub is actually base64-encoded.
User id "MDQ6VXNlcjU1NzI1MzU=" => "04:User5572535"
Repository id "MDEwOlJlcG9zaXRvcnkzODE3NjgwMg==" => "010:Repository38176802"
Issue id "MDU6SXNzdWU1Njg2MDM4MTY=" => "05:Issue568603816"
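So my guess is that their global id is <database_shard>:<object_type><database_key>, although the leading numbers (4, 10, 5) also happen to match the lengths of the type names "User", "Repository", and "Issue", so the prefix may simply be a length.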
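The decoding above is easy to reproduce with Python's standard library. The split into prefix, type name, and key in `parse_global_id` is only an interpretation of the observed strings, not a documented GitHub format.

```python
import base64
import re


def decode_global_id(global_id):
    """Base64-decode a global id into its readable form."""
    return base64.b64decode(global_id).decode("ascii")


def parse_global_id(global_id):
    """Split a decoded id into (prefix, type name, key).

    The pattern is inferred from the three examples in these notes.
    """
    decoded = decode_global_id(global_id)
    match = re.fullmatch(r"(\d+):([A-Za-z]+)(\d+)", decoded)
    if match is None:
        raise ValueError(f"unexpected id format: {decoded!r}")
    return match.groups()


for gid in (
    "MDQ6VXNlcjU1NzI1MzU=",
    "MDEwOlJlcG9zaXRvcnkzODE3NjgwMg==",
    "MDU6SXNzdWU1Njg2MDM4MTY=",
):
    print(gid, "=>", decode_global_id(gid), parse_global_id(gid))
```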
If the id is the same, it must be the same object.
This design is good, because when you fetch N items, you get exactly N results. If one of the N items turns out to be missing, you get null in its place. You will love this when doing batch requests! (Think of Yahoo's Sherpa batch API as a lesson)
Global object ids can be used in caching too. In the past, people would use cache keys like `<url>:<response>`. Now, people can do `<object_id>:<object>`.
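Both points, N results for N ids and caching keyed by object id, fit in a short Python sketch. The in-memory `DATABASE`, the stored objects, and `fetch_nodes` are hypothetical; the two ids are the ones decoded earlier in these notes.

```python
# Hypothetical backing store keyed by global object id.
DATABASE = {
    "MDQ6VXNlcjU1NzI1MzU=": {"type": "User", "key": 5572535},
    "MDU6SXNzdWU1Njg2MDM4MTY=": {"type": "Issue", "key": 568603816},
}


def fetch_nodes(ids, cache):
    """Return one result per requested id, with None for missing objects."""
    results = []
    for object_id in ids:
        obj = cache.get(object_id)
        if obj is None:
            obj = DATABASE.get(object_id)  # stand-in for a real lookup
            if obj is not None:
                cache[object_id] = obj     # cache key is the object id itself
        results.append(obj)
    return results


cache = {}
nodes = fetch_nodes(
    ["MDQ6VXNlcjU1NzI1MzU=", "does-not-exist", "MDU6SXNzdWU1Njg2MDM4MTY="],
    cache,
)
```

Because the result list lines up with the request list position by position, the caller can zip ids and results together without any extra bookkeeping.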