你也許已經知道Redis并不是簡單的key-value存儲,實際上他是一個資料結構伺服器,支援不同類型的值。也就是說,你不必僅僅把字元串當作鍵所指向的值。下列這些資料類型都可作為值類型:
二進制安全的字元串
Lists: 按插入順序排序的字元串元素的集合。他們基本上就是連結清單(linked lists)。
Sets: 不重複且無序的字元串元素的集合。
Sorted sets,類似Sets,但是每個字元串元素都關聯到一個叫score浮動數值(floating number value)。裡面的元素總是通過score進行着排序,是以不同的是,它是可以檢索的一系列元素。(例如你可能會問:給我前面10個或者後面10個元素)。
Hashes,由field和關聯的value組成的map。field和value都是字元串的。這和Ruby、Python的hashes很像。
Bit arrays (或者說 simply bitmaps): 通過特殊的指令,你可以将 String 值當作一系列 bits 處理:可以設定和清除單獨的 bits,數出所有設為 1 的 bits 的數量,找到最前的被設為 1 或 0 的 bit,等等。
HyperLogLogs: 這是被用于估計一個 set 中元素數量的機率性的資料結構。别害怕,它比看起來的樣子要簡單…參見本教程的 HyperLogLog 部分。D
學習這些資料類型的原理,以及如何使用它們解決 command reference 中的特定問題,并不總是不關緊要的。是以,本文檔是一個關于 Redis 資料類型和它們最常見特性的導論。 在所有的例子中,我們将使用 <code>redis-cli</code> 工具。它是一個簡單而有用的指令行工具,用于向 Redis 伺服器發出指令。
Redis key值是二進制安全的,這意味着可以用任何二進制序列作為key值,從形如”foo”的簡單字元串到一個JPEG檔案的内容都可以。空字元串也是有效key值。
關于key的幾條規則:
太長的鍵值不是個好主意,例如1024位元組的鍵值就不是個好主意,不僅因為消耗記憶體,而且在資料中查找這類鍵值的計算成本很高。
太短的鍵值通常也不是好主意,如果你要用”u:1000:pwd”來代替”user:1000:password”,這沒有什麼問題,但後者更易閱讀,并且由此增加的空間消耗相對于key object和value object本身來說很小。當然,沒人阻止您一定要用更短的鍵值節省一丁點兒空間。
最好堅持一種模式。例如:”object-type:id:field”就是個不錯的注意,像這樣”user:1000:password”。我喜歡對多單詞的字段名中加上一個點,就像這樣:”comment:1234:reply.to”。
這是最簡單Redis類型。如果你隻用這種類型,Redis就像一個可以持久化的memcached伺服器(注:memcache的資料僅儲存在記憶體中,伺服器重新開機後,資料将丢失)。
我們用redis-cli來玩一下字元串類型:
正如你所見到的,通常用SET command 和 GET command來設定和擷取字元串值。
值可以是任何種類的字元串(包括二進制資料),例如你可以在一個鍵下儲存一副jpeg圖檔。值的長度不能超過512 MB。
<code>SET</code> 指令有些有趣的操作,例如,當key存在時<code>SET</code>會失敗,或相反的,當key不存在時它隻會成功。
雖然字元串是Redis的基本值類型,但你仍然能通過它完成一些有趣的操作。例如:原子遞增:
INCR 指令将字元串值解析成整型,将其加一,最後将結果儲存為新的字元串值,類似的指令有INCRBY, DECR 和 DECRBY。實際上他們在内部就是同一個指令,隻是看上去有點兒不同。
INCR是原子操作意味着什麼呢?就是說即使多個用戶端對同一個key發出INCR指令,也決不會導緻競争的情況。例如如下情況永遠不可能發生:『用戶端1和用戶端2同時讀出“10”,他們倆都對其加到11,然後将新值設定為11』。最終的值一定是12,read-increment-set操作完成時,其他用戶端不會在同一時間執行任何指令。
對字元串,另一個的令人感興趣的操作是GETSET指令,行如其名:他為key設定新值并且傳回原值。這有什麼用處呢?例如:你的系統每當有新使用者通路時就用INCR指令操作一個Redis key。你希望每小時對這個資訊收集一次。你就可以GETSET這個key并給其指派0并讀取原值。
為減少等待時間,也可以一次存儲或擷取多個key對應的值,使用MSET和MGET指令:
MGET 指令傳回由值組成的數組。
有些指令不是針對任何具體的類型定義的,而是用于和整個鍵空間互動的。是以,它們可被用于任何類型的鍵。
使用EXISTS指令傳回1或0辨別給定key的值是否存在,使用DEL指令可以删除key對應的值,DEL指令傳回1或0辨別值是被删除(值存在)或者沒被删除(key對應的值不存在)。
TYPE指令可以傳回key對應的值的存儲類型:
在介紹複雜類型前我們先介紹一個與值類型無關的Redis特性:逾時。你可以對key設定一個逾時時間,當這個時間到達後會被删除。精度可以使用毫秒或秒。
上面的例子使用了EXPIRE來設定逾時時間(也可以再次調用這個指令來改變逾時時間,使用PERSIST指令去除逾時時間 )。我們也可以在建立值的時候設定逾時時間:
TTL指令用來檢視key對應的值剩餘存活時間。
要說清楚清單資料類型,最好先講一點兒理論背景,在資訊技術界List這個詞常常被使用不當。例如”Python Lists”就名不副實(名為Linked Lists),但他們實際上是數組(同樣的資料類型在Ruby中叫數組)
一般意義上講,清單就是有序元素的序列:10,20,1,2,3就是一個清單。但用數組實作的List和用Linked List實作的List,在屬性方面大不相同。
Redis lists基于Linked Lists實作。這意味着即使在一個list中有數百萬個元素,在頭部或尾部添加一個元素的操作,其時間複雜度也是常數級别的。用LPUSH 指令在十個元素的list頭部添加新元素,和在千萬元素list頭部添加新元素的速度相同。
那麼,壞消息是什麼?在數組實作的list中利用索引通路元素的速度極快,而同樣的操作在linked list實作的list上沒有那麼快。
Redis Lists用linked list實作的原因是:對于資料庫系統來說,至關重要的特性是:能非常快的在很大的清單上添加元素。另一個重要因素是,正如你将要看到的:Redis lists能在常數時間取得常數長度。
如果快速通路集合元素很重要,建議使用可排序集合(sorted sets)。可排序集合我們會随後介紹。
LPUSH 指令可向list的左邊(頭部)添加一個新元素,而RPUSH指令可向list的右邊(尾部)添加一個新元素。最後LRANGE指令可從list中取出一定範圍的元素:
注意:LRANGE 帶有兩個索引,一定範圍的第一個和最後一個元素。這兩個索引都可以為負來告知Redis從尾部開始計數,是以-1表示最後一個元素,-2表示list中的倒數第二個元素,以此類推。
上面的所有指令的參數都可變,友善你一次向list存入多個值。
還有一個重要的指令是pop,它從list中删除元素并同時傳回删除的值。可以在左邊或右邊操作。
我們增加了三個元素,并彈出了三個元素,是以,在這最後 清單中的指令序列是空的,沒有更多的元素可以被彈出。如果我們嘗試彈出另一個元素,這是我們得到的結果:
當list沒有元素時,Redis 傳回了一個NULL。
正如你可以從上面的例子中猜到的,list可被用來實作聊天系統。還可以作為不同程序間傳遞消息的隊列。關鍵是,你可以每次都以原先添加的順序通路資料。這不需要任何SQL ORDER BY 操作,将會非常快,也會很容易擴充到百萬級别元素的規模。
例如在評級系統中,比如社會化新聞網站 reddit.com,你可以把每個新送出的連結添加到一個list,用LRANGE可簡單的對結果分頁。
在部落格引擎實作中,你可為每篇日志設定一個list,在該list中推入部落格評論,等等。
可以使用LTRIM把list從左邊截取指定長度。
可以使用Redis來實作生産者和消費者模型,如使用LPUSH和RPOP來實作該功能。但會遇到這種情景:list是空,這時候消費者就需要輪詢來擷取資料,這樣就會增加redis的通路壓力、增加消費端的cpu時間,而很多通路都是無用的。為此redis提供了阻塞式通路 BRPOP 和 BLPOP 指令。 消費者可以在擷取資料時指定如果資料不存在阻塞的時間,如果在時限内獲得資料則立即傳回,如果逾時還沒有資料則傳回null, 0表示一直阻塞。
同時redis還會為所有阻塞的消費者以先後順序排隊。
如需了解詳細資訊請檢視 RPOPLPUSH 和 BRPOPLPUSH。
目前為止,在我們的例子中,我們沒有在推入元素之前建立空的 list,或者在 list 沒有元素時删除它。在 list 為空時删除 key,并在使用者試圖添加元素(比如通過 <code>LPUSH</code>)而鍵不存在時建立空 list,是 Redis 的職責。
這不光适用于 lists,還适用于所有包括多個元素的 Redis 資料類型 – Sets, Sorted Sets 和 Hashes。
基本上,我們可以用三條規則來概括它的行為:
當我們向一個聚合資料類型中添加元素時,如果目标鍵不存在,就在添加元素前建立空的聚合資料類型。
當我們從聚合資料類型中移除元素時,如果值仍然是空的,鍵自動被銷毀。
對一個空的 key 調用一個隻讀的指令,比如 <code>LLEN</code> (傳回 list 的長度),或者一個删除元素的指令,将總是産生同樣的結果。該結果和對一個空的聚合類型做同個操作的結果是一樣的。
規則 1 示例:
但是,我們不能對存在但類型錯誤的 key 做操作: > set foo bar OK > lpush foo 1 2 3 (error) WRONGTYPE Operation against a key holding the wrong kind of value > type foo string
規則 2 示例:
所有的元素被彈出之後, key 不複存在。
規則 3 示例:
Redis Hashes —
Redis hash 看起來就像一個 “hash” 的樣子,由鍵值對組成:
Hash 便于表示 objects,實際上,你可以放入一個 hash 的域數量實際上沒有限制(除了可用記憶體以外)。是以,你可以在你的應用中以不同的方式使用 hash。
<code>HMSET</code> 指令設定 hash 中的多個域,而 <code>HGET</code> 取回單個域。<code>HMGET</code> 和 <code>HGET</code> 類似,但傳回一系列值:
也有一些指令能夠對單獨的域執行操作,比如 <code>HINCRBY</code>:
你可以在文檔中找到 hash 指令的完整清單。
值得注意的是,小的 hash 被用特殊方式編碼,非常節約記憶體。
Redis Sets —
Redis Set 是 String 的無序排列。<code>SADD</code> 指令把新的元素添加到 set 中。對 set 也可做一些其他的操作,比如測試一個給定的元素是否存在,對不同 set 取交集,并集或差,等等。
現在我已經把三個元素加到我的 set 中,并告訴 Redis 傳回所有的元素。可以看到,它們沒有被排序 —— Redis 在每次調用時可能按照任意順序傳回元素,因為對于元素的順序并沒有規定。
Redis 有檢測成員的指令。一個特定的元素是否存在?
“3” 是 set 的一個成員,而 “30” 不是。
Sets 适合用于表示對象間的關系。 例如,我們可以輕易使用 set 來表示标記。
一個簡單的模組化方式是,對每一個希望标記的對象使用 set。這個 set 包含和對象相關聯的标簽的 ID。
假設我們想要給新聞打上标簽。 假設新聞 ID 1000 被打上了 1,2,5 和 77 四個标簽,我們可以使用一個 set 把 tag ID 和新聞條目關聯起來:
但是,有時候我可能也會需要相反的關系:所有被打上相同标簽的新聞清單:
擷取一個對象的所有 tag 是很友善的:
注意:在這個例子中,我們假設你有另一個資料結構,比如一個 Redis hash,把标簽 ID 對應到标簽名稱。
使用 Redis 指令行,我們可以輕易實作其它一些有用的操作。比如,我們可能需要一個含有 1, 2, 10, 和 27 标簽的對象的清單。我們可以用 <code>SINTER</code> 指令來完成這件事。它擷取不同 set 的交集。我們可以用:
不光可以取交集,還可以取并集,差集,擷取随機元素,等等。
擷取一個元素的指令是 <code>SPOP</code>,它很适合對特定問題模組化。比如,要實作一個基于 web 的撲克遊戲,你可能需要用 set 來表示一副牌。假設我們用一個字元的字首來表示不同花色:
現在,我們想要給每個玩家 5 張牌。<code>SPOP</code> 指令删除一個随機元素,把它傳回給用戶端,是以它是完全合适的操作。
但是,如果我們對我們的牌直接調用它,在下一盤我們就需要重新充滿這副牌。開始,我們可以複制 <code>deck</code> 鍵中的内容,并放入 <code>game:1:deck</code> 鍵中。
這是通過 <code>SUNIONSTORE</code> 實作的,它通常用于對多個集合取并集,并把結果存入另一個 set 中。但是,因為一個 set 的并集就是它本身,我可以這樣複制我的牌:
現在,我已經準備好給 1 号玩家發五張牌:
One pair of jacks, not great…
Now it’s a good time to introduce the set command that provides the number of elements inside a set. This is often called the cardinality of a set in the context of set theory, so the Redis command is called <code>SCARD</code>.
The math works: 52 - 5 = 47.
When you need to just get random elements without removing them from the set, there is the <code>SRANDMEMBER</code> command suitable for the task. It also features the ability to return both repeating and non-repeating elements.
Redis Sorted sets —
Sorted sets are a data type which is similar to a mix between a Set and a Hash. Like sets, sorted sets are composed of unique, non-repeating string elements, so in some sense a sorted set is a set as well.
However while elements inside sets are not ordered, every element in a sorted set is associated with a floating point value, called the score (this is why the type is also similar to a hash, since every element is mapped to a value).
Moreover, elements in a sorted sets are taken in order (so they are not ordered on request, order is a peculiarity of the data structure used to represent sorted sets). They are ordered according to the following rule:
If A and B are two elements with a different score, then A > B if A.score is > B.score.
If A and B have exactly the same score, then A > B if the A string is lexicographically greater than the B string. A and B strings can’t be equal since sorted sets only have unique elements.
Let’s start with a simple example, adding a few selected hackers names as sorted set elements, with their year of birth as “score”.
As you can see <code>ZADD</code> is similar to <code>SADD</code>, but takes one additional argument (placed before the element to be added) which is the score. <code>ZADD</code> is also variadic, so you are free to specify multiple score-value pairs, even if this is not used in the example above.
With sorted sets it is trivial to return a list of hackers sorted by their birth year because actually they are already sorted.
Implementation note: Sorted sets are implemented via a dual-ported data structure containing both a skip list and a hash table, so every time we add an element Redis performs an O(log(N)) operation. That’s good, but when we ask for sorted elements Redis does not have to do any work at all, it’s already all sorted:
Note: 0 and -1 means from element index 0 to the last element (-1 works here just as it does in the case of the <code>LRANGE</code>command).
What if I want to order them the opposite way, youngest to oldest? Use ZREVRANGE instead of ZRANGE:
It is possible to return scores as well, using the <code>WITHSCORES</code> argument:
Sorted sets are more powerful than this. They can operate on ranges. Let’s get all the individuals that were born up to 1950 inclusive. We use the <code>ZRANGEBYSCORE</code> command to do it:
We asked Redis to return all the elements with a score between negative infinity and 1950 (both extremes are included).
It’s also possible to remove ranges of elements. Let’s remove all the hackers born between 1940 and 1960 from the sorted set:
<code>ZREMRANGEBYSCORE</code> is perhaps not the best command name, but it can be very useful, and returns the number of removed elements.
Another extremely useful operation defined for sorted set elements is the get-rank operation. It is possible to ask what is the position of an element in the set of the ordered elements.
The <code>ZREVRANK</code> command is also available in order to get the rank, considering the elements sorted a descending way.
With recent versions of Redis 2.8, a new feature was introduced that allows getting ranges lexicographically, assuming elements in a sorted set are all inserted with the same identical score (elements are compared with the C <code>memcmp</code> function, so it is guaranteed that there is no collation, and every Redis instance will reply with the same output).
The main commands to operate with lexicographical ranges are <code>ZRANGEBYLEX</code>, <code>ZREVRANGEBYLEX</code>, <code>ZREMRANGEBYLEX</code> and <code>ZLEXCOUNT</code>.
For example, let’s add again our list of famous hackers, but this time use a score of zero for all the elements:
Because of the sorted sets ordering rules, they are already sorted lexicographically:
Using <code>ZRANGEBYLEX</code> we can ask for lexicographical ranges:
Ranges can be inclusive or exclusive (depending on the first character), also string infinite and minus infinite are specified respectively with the <code>+</code> and <code>-</code> strings. See the documentation for more information.
This feature is important because it allows us to use sorted sets as a generic index. For example, if you want to index elements by a 128-bit unsigned integer argument, all you need to do is to add elements into a sorted set with the same score (for example 0) but with an 8 byte prefix consisting of the 128 bit number in big endian. Since numbers in big endian, when ordered lexicographically (in raw bytes order) are actually ordered numerically as well, you can ask for ranges in the 128 bit space, and get the element’s value discarding the prefix.
If you want to see the feature in the context of a more serious demo, check the Redis autocomplete demo.
Just a final note about sorted sets before switching to the next topic. Sorted sets’ scores can be updated at any time. Just calling <code>ZADD</code> against an element already included in the sorted set will update its score (and position) with O(log(N)) time complexity. As such, sorted sets are suitable when there are tons of updates.
Because of this characteristic a common use case is leader boards. The typical application is a Facebook game where you combine the ability to take users sorted by their high score, plus the get-rank operation, in order to show the top-N users, and the user rank in the leader board (e.g., “you are the #4932 best score here”).
Bitmaps —
Bitmaps are not an actual data type, but a set of bit-oriented operations defined on the String type. Since strings are binary safe blobs and their maximum length is 512 MB, they are suitable to set up to 2^32 different bits.
Bit operations are divided into two groups: constant-time single bit operations, like setting a bit to 1 or 0, or getting its value, and operations on groups of bits, for example counting the number of set bits in a given range of bits (e.g., population counting).
One of the biggest advantages of bitmaps is that they often provide extreme space savings when storing information. For example in a system where different users are represented by incremental user IDs, it is possible to remember a single bit information (for example, knowing whether a user wants to receive a newsletter) of 4 billion of users using just 512 MB of memory.
Bits are set and retrieved using the <code>SETBIT</code> and <code>GETBIT</code> commands:
The <code>SETBIT</code> command takes as its first argument the bit number, and as its second argument the value to set the bit to, which is 1 or 0. The command automatically enlarges the string if the addressed bit is outside the current string length.
<code>GETBIT</code> just returns the value of the bit at the specified index. Out of range bits (addressing a bit that is outside the length of the string stored into the target key) are always considered to be zero.
There are three commands operating on group of bits:
<code>BITOP</code> performs bit-wise operations between different strings. The provided operations are AND, OR, XOR and NOT.
<code>BITCOUNT</code> performs population counting, reporting the number of bits set to 1.
<code>BITPOS</code> finds the first bit having the specified value of 0 or 1.
Both <code>BITPOS</code> and <code>BITCOUNT</code> are able to operate with byte ranges of the string, instead of running for the whole length of the string. The following is a trivial example of <code>BITCOUNT</code> call:
Common user cases for bitmaps are:
Real time analytics of all kinds.
Storing space efficient but high performance boolean information associated with object IDs.
For example imagine you want to know the longest streak of daily visits of your web site users. You start counting days starting from zero, that is the day you made your web site public, and set a bit with <code>SETBIT</code> every time the user visits the web site. As a bit index you simply take the current unix time, subtract the initial offset, and divide by 3600*24.
This way for each user you have a small string containing the visit information for each day. With <code>BITCOUNT</code> it is possible to easily get the number of days a given user visited the web site, while with a few <code>BITPOS</code> calls, or simply fetching and analyzing the bitmap client-side, it is possible to easily compute the longest streak.
Bitmaps are trivial to split into multiple keys, for example for the sake of sharding the data set and because in general it is better to avoid working with huge keys. To split a bitmap across different keys instead of setting all the bits into a key, a trivial strategy is just to store M bits per key and obtain the key name with <code>bit-number/M</code> and the Nth bit to address inside the key with <code>bit-number MOD M</code>.
HyperLogLogs —
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We’ll just call them HLL from now) has seen very few elements.
HLLs in Redis, while technically a different data structure, is encoded as a Redis string, so you can call <code>GET</code> to serialize a HLL, and <code>SET</code> to deserialize it back to the server.
Conceptually the HLL API is like using Sets to do the same task. You would <code>SADD</code> every observed element into a set, and would use <code>SCARD</code> to check the number of elements inside the set, which are unique since <code>SADD</code> will not re-add an existing element.
While you don’t really add items into an HLL, because the data structure only contains a state that does not include actual elements, the API is the same:
Every time you see a new element, you add it to the count with <code>PFADD</code>.
Every time you want to retrieve the current approximation of the unique elements added with <code>PFADD</code> so far, you use the <code>PFCOUNT</code>.
An example of use case for this data structure is counting unique queries performed by users in a search form every day.
Redis is also able to perform the union of HLLs, please check the full documentation for more information.
There are other important things in the Redis API that can’t be explored in the context of this document, but are worth your attention:
It is possible to iterate the key space of a large collection incrementally.
It is possible to run Lua scripts server side to win latency and bandwidth.
Redis is also a Pub-Sub server.
This tutorial is in no way complete and has covered just the basics of the API. Read the command reference to discover a lot more.
Thanks for reading, and have fun hacking with Redis!
本文作者:陳群
本文來自雲栖社群合作夥伴rediscn,了解相關資訊可以關注redis.cn網站。