Redis: Particular configuration can cause mismatched responses

Created on 2 Aug 2019  ·  7Comments  ·  Source: go-redis/redis

Using the ClusterClient with ReadOnly set and MaxRetries > 0 can lead to serious bugged client state.

In short, if a timeout or other retryable error occurs, the connection can buffer responses incorrectly, so you can get into a state where your code is:

msg, err := client.Echo("message").Result()
log.Println("msg:", msg) // should print "msg: message"
log.Println("err:", err) // should print: "err: <nil>"

can in fact print "msg: OK" and others.

This can lead to states where the client is unknowingly returning totally incorrect, but valid-looking responses with no error:

Within Redis server: <AAAA: "hello", BBBB: "world">
msg, err = client.Get("AAAA").Result() // "OK", <nil>
msg, err = client.Get("BBBB").Result() // "hello", <nil>

I've created https://github.com/patrickwhite256/goredis-bug-demo which contains a fake redis "server" and a script that can trip the bug. Because it's not a real Redis server, it's not proof positive of the bug, so I have also included a detailed explanation of what exactly needs to happen to trip the bug all the way down the stack.

I've opened #1112 to address this.

Most helpful comment

Any chance we can get this fix back ported to V6? Since we are seeing the problem in production we can't deploy a beta version of the package. Plus we are in the early days of converting to go-modules.

All 7 comments

Hi,

I've not done as thorough check as you so please forgive me if I miss something.

But SingleConnPool is only created in single place to init the connection. If there are any errors the connection is removed from the pool. After net.Conn is initialized the SingleConnPool is discarded and is not used anywhere else.

So you probably confuse newConn and baseClient.newConn.

I also briefly looked at the program and there is no way the program that executes single command and exits can reproduce any bugs in buffering / connection pool. There should not be any buffers / connections next time you run it.

Does that make sense?

Have you run the demo, or just looked? If you haven't tried it, I strongly encourage doing so.

That single place where SingleConnPool is used, while initializing the connection, is where the bug occurs - this is why the bug only happens when ReadOnly is enabled - the initialization is skipped otherwise. From the perspective of _getConn, there are no errors, so the connection is not removed from the pool. The README in the repository I linked has a nearly line-by-line explanation.

Sorry, I've only looked at the program / description. BTW thanks - you've done big work - I appreciate it.

I've slightly changed your fix by providing more robust SingleConnPool implementation from https://github.com/go-pg/pg/blob/master/internal/pool/pool_single.go. Now your example reliably fails but ideally we should also retry by opening new connection. I will see what I can do later...

Thanks for working with me on this. It would be helpful if you could release a new version / tag.

Okay, I've tagged v7.0.0-beta that you should be able to use with Go Modules and semantic version importing import "github.com/go-redis/redis/v7". See https://github.com/golang/go/wiki/Modules if you are not familiar with Go Modules.

Any chance we can get this fix back ported to V6? Since we are seeing the problem in production we can't deploy a beta version of the package. Plus we are in the early days of converting to go-modules.

Also tagged v6.15.4 with the fix for this issue

Was this page helpful?
0 / 5 - 0 ratings

Related issues

songlong picture songlong  ·  8Comments

mouhong picture mouhong  ·  3Comments

MaerF0x0 picture MaerF0x0  ·  7Comments

Turing-Chu picture Turing-Chu  ·  3Comments

youcandoit95 picture youcandoit95  ·  3Comments