This document is a collection of some examples and tips for using redis, the open-source data structure server. It is intended primarily for developers, and deliberately omits some topics that will be important in any redis deployment, like security and backups.
Some Uses for Redis Data Types
You may have heard of Redis referred to as a NoSQL Database. Technically, this is true: It's a database that doesn't use SQL. But the statement is meaningless. It is also true that ice cream is a food that isn't made from whale meat. But obviously, calling ice cream a NoWhale Food doesn't give you any sense of how good the ice cream is, or what sets it apart from other kinds of food that aren't made from whale meat but that are also very unlike ice cream -- it just gives the whaling industry a way to lump together people who don't use their products.
So rather than thinking about redis as a database with some kind of non-existent relationship to SQL, think of it as a data structure server with a rich set of commands for querying and manipulating those data structures over a network connection. Redis data types include:
- Strings
- Hashes
- Lists
- Sets
- Ordered Sets (called zsets in redis)
- Transactions
- Publishers and Subscribers
This table lists some common programming tasks and data structures, and suggests some redis functions or data structures for them:
| Task | Redis commands or structures |
|------|------------------------------|
| dictionary lookup | SET, GET, SETNX, etc. |
| counters | SET, INCR, INCRBY |
| list manipulation | LPUSH, RPUSH, LPOP, RPOP, LLEN, LINSERT, LINDEX, etc. |
| event logging | lists, zsets, pubsub |
| queues | lists (RPUSH, BLPOP, BRPOPLPUSH, etc.) |
| priority queues | zsets |
| membership | sets, bitstrings |
| state | hashes |
| heartbeats | zsets |
| hit counters | zsets |
| message broadcast | pubsub |
| search | reverse indexes (never use KEYS in production) |
Documentation
Redis has fantastic documentation. Yet redis is so easy to learn and use that you'll seldom feel you need to consult it.
There are three reasons to look at the docs for every command you use.
- Time complexity - Each command's complexity is given in Big-O notation. This is immensely helpful.
- Related commands - An ever-growing list of related commands is given on the right-hand side of the page.
- Recipes - At the bottom of many pages is a section that gives usage patterns or recipes for the command under discussion. There are many insightful ideas to be found here.
Getting Started with Redis
Before you go any further in this document, make sure you have played with redis for at least 5 or 10 minutes. If you have not, here is a fantastic on-line, interactive redis tutorial: http://try.redis.io/
For reference later on, you'll want the source of all things redis: http://redis.io
You'll want to run a redis-server on your machine, and use the redis-cli at the command-line for hacking.
Node.JS redis client:
- https://github.com/mranney/node_redis
- npm install redis
Python redis client:
There are some gotchas with the Python API: https://github.com/andymccurdy/redis-py#api-reference
- SELECT command not implemented
- DEL is 'delete' in python
- ZADD argument order is wrong
- SETEX argument order is wrong
The default redis port is 6379.
In node, create client with explicit host and port like so:
> var r = require('redis').createClient(6379, '127.0.0.1')
Commands are asynchronous. In standard node fashion, the first callback argument is an error (or null); the second is the result of the redis command.
> r.set('foo', 42, console.log)
null 'OK'
> r.get('foo', console.log)
null '42'
For the rest of this, I'm going to use the synchronous redis-cli for demonstrations. Lines beginning with redis> are input to the redis-cli.
Things are Strings, Mostly
As you can see from that last example, the values pointed to by keys are strings or they are nil. Nil values will be given as null in node, not undefined.
But some commands only work with numbers (like INCR); for these, if your key can't be parsed as a number, it's an error:
redis> set foo pie
OK
redis> incr foo
(error) ERR value is not an integer or out of range
Atomic Counters
I guess that sounds like Geiger counters. I mean counters you can update atomically.
- GET key
- SET key value
- EXISTS key
- SETNX key value
- INCR key
- INCRBY key int
- INCRBYFLOAT key float
- GETSET key value
GET and SET are fairly obvious.
SETNX sets a value on a key only if there is no existing value. This is so useful. And it is a single, atomic command.
redis> get foo
(nil)
redis> setnx foo 17
(integer) 1
redis> get foo
"17"
redis> setnx foo 42
(integer) 0
The return value from SETNX is 1 if the value was set, 0 otherwise.
INCR, INCRBY, and INCRBYFLOAT all increment counters. They are also atomic operations. They return the post-increment value of your key.
redis> set foo 42
OK
redis> incr foo
(integer) 43
redis> incrby foo 17
(integer) 60
Notice that these results come back as numbers, not strings.
INCRBYFLOAT is in redis 2.6.
GETSET is awesome. It sets the value of a key, and returns the value the key had before you changed it.
redis> del foo
(integer) 1
redis> get foo
(nil)
redis> getset foo 3
(nil)
redis> getset foo 4
"3"
Efficient Multiple Queries and Transaction Blocks
You'll often want to do several queries together. Each redis command costs one network round trip, so issuing lots of commands one after another wastes a lot of time. Redis has a solution for this: transaction blocks.
A transaction block is a series of commands sent all at once across the wire. They are executed sequentially as a single atomic operation.
Here's a stupid node script to show how this works:
#!/usr/bin/env node

var r = require('redis').createClient();

r.multi()
  .set("foo", 42)
  .set("bar", "ice cream")
  .set("baz", 6.28)
  .get("foo")
  .get("bar")
  .get("baz")
  .exec(function(err, resultList) {
    console.log(JSON.stringify(resultList, null, 2));
    r.end(); // terminate the redis connection; node can quit
  });
When run, this prints:
[ "OK", "OK", "OK", "42", "ice cream", "6.28" ]
The result list includes one value for each command executed. Awesome.
Note that if one of your commands fails, the others still all execute! (Try this by breaking one of the set statements: leave out the value argument.) You'll get one return value for each operation that didn't fail. This sounds terrifying, but it's not actually such a big deal in practice. Commands will generally only fail if you feed them the wrong arguments or the wrong number of arguments, and you'll probably find problems like that long before you go into production.
Optimistic Locking
Your transactions are atomic once they start, but of course you can't guarantee that you'll get there first. The redis command WATCH lets you name keys you're worried about; if any watched key is modified by another client before your EXEC runs, the transaction is aborted.
For example, here's some node code:
var r = require('redis').createClient();

r.watch('foo');

r.get('foo', function(err, result) {
  // meanwhile, by the time we get to the next exec(),
  // someone else has modified foo ...

  r.multi()
    .set('foo', "some heavy computation")
    .exec(function(err, results) {
      // Here we find that err is null
      // and results is also null  <--- nb, NOT an empty list
    });
});
Nice. You just have to be careful to check the result type of exec() when you're watching keys: it can be either a list or null. Note also that an aborted transaction is not an error, so err will be null in either case.
Lists and Queues
Lists make great queues. Use commands like
- BLPOP
- RPUSH
- BRPOPLPUSH
The commands that start with B are blocking. They will sit there and quietly do nothing until there is a value available on the list for them to pop. You can specify a timeout, or have them block forever (timeout = 0).
The R in RPUSH means to push on the right-hand side of the list. An L is the left-hand side. Suum cuique.
Try this in two separate redis-cli sessions. First here:
redis1> blpop myqueue 0
That redis client should just look like it's hanging. It is. Ok, now in a different shell, and another redis-cli, do this:
redis2> rpush myqueue 42
Now if you look back at the first redis-cli, you'll see that it immediately popped the value off the list and said something like:
redis1> 1) "myqueue" 2) "42" (26.87s)
That rules!
There's even an atomic BRPOPLPUSH that lets you pop something off of one queue and stick it on another.
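Here's a rough sketch of how that might look, again with two redis-cli sessions (the work-in-progress list name is just made up for this example). In the first shell:

redis1> brpoplpush myqueue work-in-progress 0

It blocks, just like BLPOP did. In the second shell:

redis2> rpush myqueue 42
(integer) 1

Back in the first shell, the blocked command returns the value, and the value now also sits on the other list:

redis1> "42"
redis1> lrange work-in-progress 0 -1
1) "42"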
Publish / Subscribe
That's right, redis comes with pub/sub functionality. How awesome is that. Check this out:
- PUBLISH
- SUBSCRIBE
- UNSUBSCRIBE
- PSUBSCRIBE
- PUNSUBSCRIBE
PUBLISH publishes a message to a channel. You get to make up names of channels on the fly just like variable names.
SUBSCRIBE and UNSUBSCRIBE attach and remove listeners to or from a channel. Again, the channel does not have to exist or be predefined in any way. By naming it, you cause it to exist.
PSUBSCRIBE and PUNSUBSCRIBE are like SUBSCRIBE and UNSUBSCRIBE, but they let you use wildcards for pattern matching. That means you can subscribe to a bunch of channels at once. This is useful for, say, log routing.
Ok, enough of the jibba jabba. Here's an example, again using two separate redis CLIs, to show how it works. In one redis-cli, do this:
redis1> subscribe message-channel
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "message-channel"
3) (integer) 1
In the same way the BLPOPping redis-cli blocked, this redis-cli is not capable of doing anything but listen for messages. Now in a second redis-cli, do this:
redis2> publish message-channel "i like pie"
(integer) 1
Cast your eager gaze back to the first shell and you will see:
redis1> 1) "message" 2) "message-channel" 3) "i like pie"
Well strip my gears and call me shiftless! Some things to notice:
- The subscriber blocks forever - not just for the first message. So subscribers are dedicated clients.
- The message channel doesn't have to be created in any way.
- The subscriber tells you the name of the channel as well as the message it received.
- I like pie.
- The return value from publish tells you how many subscribers received your message! So if that's 0, you know nobody's listening.
You might use PSUBSCRIBE in a case where you have message channels named according to application namespaces, as different modules or logging systems might do. In that case, just check the name of the channel on the received message.
Subscribe with one client:
redis1> psubscribe log.*
1) "psubscribe"
2) "log.*"
3) (integer) 1
Publish with another client:
redis2> publish log.bacon yes
(integer) 1
See message received in first client:
redis1> 1) "pmessage" 2) "log.*" 3) "log.bacon" 4) "yes"
Pubsub is useful for things like:
- Chat programs
- Message passing among application components
- Log routing from applications in multiple languages
It's good stuff.
Heartbeats and Time-based Event Logs
It's a common pattern to use numerical timestamps for ranking members in a ZSET.
I'll show this in the node console, to illustrate how the optional parameters are done.
> var r = require('redis').createClient();
// and I'll define these utility functions just for this example
> function now() { return (new Date()).getTime() / 1000 }
> function print(err, results) { console.log(JSON.stringify(results, null, 2)) }
Whenever someone logs into my site, I record them in my 'last-login' zset. I'll simulate some logins like so:
> r.zadd('last-login', now(), 'lloyd');
> r.zadd('last-login', now(), 'jparsons');
> r.zadd('last-login', now(), 'zcarter');
> r.zadd('last-login', now(), 'lloyd'); // he logged in again! w00t!
Here's what the zset contains now:
> r.zrange('last-login', 0, -1, print); // remember, I defined 'print' above
["jparsons", "zcarter", "lloyd"]
Since this is a set, lloyd only appears once, with an updated login timestamp.
ZRANGE gives you everything in the zset, in order of score, from the beginning offset to the end offset. (0 from the front, -1 from the end.)
As you can see, the scores are sorted smallest to biggest. To get the most recent, you can use ZREVRANGE:
> r.zrevrange('last-login', 0, -1, print);
["lloyd", "zcarter", "jparsons"]
To see the scores, use the optional WITHSCORES argument. As a rule, the name of the optional argument is given as a string in the node function call.
> r.zrevrange('last-login', 0, -1, 'WITHSCORES', print);
["lloyd", "1339627441.115", "zcarter", "1339627437.7579999", "jparsons", "1339627432.928"]
Awesome. In addition to getting elements by their rank in the ZSET, you can get them by score. For example, to see everyone who's logged in in the last hour, you could do:
> var an_hour_ago = now() - (60 * 60);
> r.zrevrangebyscore('last-login', Infinity, an_hour_ago, print);
Here are two ways to get the last person who logged in:
> r.zrevrange('last-login', 0, 0, print);
["lloyd"]
> r.zrevrangebyscore('last-login', Infinity, 0, 'WITHSCORES', 'LIMIT', 0, 1, print);
["lloyd", "1339627441.115"]
Timeout keys
You can set an expiration date on keys in redis. How cool is that?
- EXPIRE
- EXPIREAT
- TTL
- PERSIST
- SETEX
Example:
redis> set foo 42
OK
redis> ttl foo
(integer) -1
redis> expire foo 5
(integer) 1
Three seconds later ...
redis> ttl foo
(integer) 2
A further two seconds later ...
redis> get foo
(nil)
redis> ttl foo
(integer) -1
Using EXPIRE you set an expiration in seconds from the present. TTL tells you the time-to-live for the key, if there is an expiration set on it. Otherwise, you get -1. Note that you get -1 both when there's no expiration and when there's no key.
If you change your mind, and you want to save the key from oblivion, you can PERSIST it to undo the EXPIRE command.
EXPIREAT lets you specify the expiration date of a key as a unix timestamp.
SETEX is an atomic shortcut for SET + EXPIRE.
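A quick sketch of those three (the timestamp, expiry values, and key names are made up for illustration):

redis> set foo 42
OK
redis> expire foo 100
(integer) 1
redis> persist foo
(integer) 1
redis> ttl foo
(integer) -1
redis> expireat foo 1355270400
(integer) 1
redis> setex cache:greeting 60 "hello"
OK
redis> ttl cache:greeting
(integer) 60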
Using ZSETs, you can periodically cull expired objects by using the ZREMRANGEBYSCORE and ZREMRANGEBYRANK commands.
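For example, reusing the 'last-login' zset and the now() helper from earlier, a periodic cleanup job might drop everyone who hasn't logged in for 30 days (a sketch; the cutoff is arbitrary):

> var thirty_days_ago = now() - (30 * 24 * 60 * 60);
> r.zremrangebyscore('last-login', '-inf', thirty_days_ago, print);

The reply is the number of members removed.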
Mutex Locks
In the case where you need a mutex, here's a recipe for using the atomic GETSET and SETNX to orchestrate locking: http://redis.io/commands/setnx (scroll down). I have used this recipe in production.
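Here's a rough node sketch of that recipe, just to give the flavor (the key name and timeout are my own; see the SETNX page for the full discussion, including how to release the lock safely):

var r = require('redis').createClient();

var LOCK_KEY = 'example:lock';   // made-up key name
var LOCK_TIMEOUT = 10;           // seconds

function now() { return Math.floor(Date.now() / 1000); }

function acquireLock(callback) {
  var expires = now() + LOCK_TIMEOUT + 1;
  r.setnx(LOCK_KEY, expires, function(err, didSet) {
    if (err) return callback(err);
    if (didSet === 1) return callback(null, true);  // got the lock
    // Someone holds the lock; check whether their timestamp has expired.
    r.get(LOCK_KEY, function(err, timestamp) {
      if (err) return callback(err);
      if (parseInt(timestamp, 10) > now()) return callback(null, false);  // still held
      // Looks expired: try to take it over atomically with GETSET.
      r.getset(LOCK_KEY, now() + LOCK_TIMEOUT + 1, function(err, oldTimestamp) {
        if (err) return callback(err);
        // If the value we displaced was still expired, the lock is ours;
        // otherwise another client beat us to it and we should retry later.
        callback(null, parseInt(oldTimestamp, 10) <= now());
      });
    });
  });
}

A caller that gets false back would typically wait a bit and retry; releasing is a DEL on the key, provided you still hold the lock.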
Bit Strings
Bit strings are just that: compact representations of ones and zeroes. I haven't yet used bitstrings in production myself. I recommend reading this article on realtime metrics with bitstrings: http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/.
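The basic idea, sketched in redis-cli (the key name and user ids are made up; BITCOUNT needs redis 2.6): treat each user id as a bit offset, so marking a user active today is a single SETBIT, and counting today's active users is a BITCOUNT.

redis> setbit active:2012-06-13 42 1
(integer) 0
redis> setbit active:2012-06-13 99 1
(integer) 0
redis> getbit active:2012-06-13 42
(integer) 1
redis> bitcount active:2012-06-13
(integer) 2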
Extra data in zsets
The score for a zset value is a float. You can be clever and sneak extra data in by encoding something else after the decimal. This adds processing time and complexity for your application, but could save you space in redis for large data sets. So the score would be <score_number>.<extra_data_number>.
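A tiny sketch of what that packing might look like (hypothetical helpers; keep in mind a zset score is a double, so you only get roughly 15-16 significant digits to split between the score and the extra data):

// Pack a small integer after the decimal point of a timestamp score.
function encodeScore(timestamp, extra) {
  return parseFloat(timestamp + '.' + extra);
}

function decodeScore(score) {
  var parts = String(score).split('.');
  return { score: parseInt(parts[0], 10), extra: parts[1] || '' };
}

// e.g. encodeScore(1339627441, 42) => 1339627441.42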
My Most Favorite Flavor
My most favorite flavor in redis is the ZSET.
ZSETs are a large part of the answer to most of life's problems. And more than that, they are just fucking awesome. Fast, efficient, versatile, simple, powerful. Talk to your doctor today to find out whether ZSETs might be right for you.
You may encounter articles on the web trying to scare you away from using lots of small zsets due to their memory consumption. If they were written before Redis 2.4 was released, they are out-of-date and useless. I recommend instead reading about redis's memory optimizations for small zsets: http://antirez.com/post/everything-about-redis-24.html.
My Least Favorite Flavor
Storing huge globs of JSON.
How can I hate this? I love both redis and JSON; how can I not want to bring them together? It's so tempting to do so! And it's so easy! It's just a stringify() and a SET away.
In some cases, stashing JSON or passing it over queues and pubsub channels could be the right thing to do. I think logging data is a good candidate for this. But before you start cramming huge messes of stringified JSON in your db, check in with your gut and ask yourself:
- Does the structure of the JSON matter for my queries?
- How much space could this consume over time?
- Do I want a structured document db and/or sql on the side?
If the structure matters, consider using redis hashes or flattening your data across your keyspace. Or consider storing the JSON data in MongoDB and manipulating pointers and commonly-used and/or rapidly-changing metadata in redis.
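For instance, instead of one big JSON blob per user, "flattening" might look like a hash per user (a sketch with made-up keys and fields), which lets you read or update individual fields without parsing anything:

redis> hmset user:lloyd email "lloyd@example.com" plan "free" logins 17
OK
redis> hincrby user:lloyd logins 1
(integer) 18
redis> hget user:lloyd email
"lloyd@example.com"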
Maybe the data doesn't belong in your redis memory at all? For example, if you have a job queueing system, you might want redis to manage the queues themselves, and store pointers to the job data, which workers can retrieve from some different, disk-resident db. (This is, for instance, how TJ Holowaychuk's kue is implemented: https://github.com/learnboost/kue.)
Memory usage in redis is important to think about. The speed benefits of redis are lost when you overflow into disk memory. As a conservative estimate, I add up all the bytes I think I might store and multiply by 10, which antirez once recommended as a worst-case factor for data structure overhead. (For example, a million 50-byte values is 50 MB of raw data, so I'd budget roughly 500 MB.)
Keyspace Design
Here are some things I keep in mind when creating keys:
- Keys are like urls and should read like nice urls
- Keys are like urls with paths separated by a colon, ':' (this is convention only)
- Use a common prefix for all keys in your app (like a domain)
- Be careful about possible name collisions
- Key names should make it obvious what the values are for
- Redis data structures are like program variables; how would you structure your data types in your program?
So for a test user account creation and verification service called "Persona Test User," I have these keys:
| Key | Purpose |
|-----|---------|
| ptu:nextval | An iterator |
| ptu:mailq | A queue (list) of incoming verification emails |
| ptu:emails:staging | A ZSET of emails being staged, sorted by creation date |
| ptu:emails:valid | A ZSET of email accounts ready for use, sorted by creation date |
| ptu:email:<email>:passwd | The password for an email account |
The ptu: prefix makes it extra clear what these keys are for. Even if I have a private redis db for this app, this is descriptively useful and adds a measure of safety.
If I ever have more values to store per email than just the password, I could use a hash, with a key like ptu:identity:<email>.
In the case where you want to do something akin to a join, like, say, associate an email and a remote url to store a BrowserID assertion, just make a new key. In this case, it might be:
ptu:assertion:email:<email>:origin:<domain>
And since assertions time out, I would do a SETEX or EXPIREAT on that key when I created it.
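For example (made-up email, origin, and lifetime; the assertion value itself is just a placeholder here):

redis> setex ptu:assertion:email:lloyd@example.com:origin:example.org 300 "<assertion>"
OK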
You can rename keys with RENAME.