Testing With Load
Testing with behaviors is great, as I have mentioned in prior posts, and unit testing is great for making sure your code does what you think it does. With those two forms of testing we cover all our bases for web API development, right?
No. Web APIs must also be tested at scale, under load, because under those conditions strange issues start cropping up. Where I work we are experimenting with different load testing frameworks, and I believe we have settled on Locust for our load testing needs.
Locust is neat because with minimal Python (example below adapted from Locust) you can have a fully functional load test against your application:
Back to what I wanted to talk about in this post. We were performing load tests yesterday when, all of a sudden, we saw a HUGE number of new connections being made to our Redis ElastiCache instance. This was very strange, as we were using a well-known Redis library, redigo, and moreover we were using its connection pool, which to my knowledge was supposed to perform connection pooling, thereby only providing a certain number of connections to a Redis server.
We were using redigo as a consequence of using negroni-sessions, which uses boj/redistore. So right off the bat I can already tell we are too many layers deep in this Redis abstraction. Here is, at a high level, the flow:
- Request comes in
- Negroni's ServeHTTP is called
- Negroni calls negroni-sessions
- negroni-sessions provides a wrapper around Gorilla sessions
- Gorilla sessions uses a pluggable storage backend
- Gorilla sessions calls boj/redistore
- redistore wraps garyburd/redigo for persisting the session
As you can see, this is a complicated flow for merely setting a cookie and persisting its value in a key/value store. Anyway, back to the beginning: it appeared that there was no connection pooling to Redis at all. Here is a sample of how we were initializing negroni-sessions:
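Roughly like this (a reconstruction from memory, not our exact code; the import paths and the `redisstore.New` signature are as I remember them from the GoIncremental README, so double-check against the library you are actually vendoring):

```go
package main

import (
	"net/http"

	"github.com/codegangsta/negroni"
	sessions "github.com/goincremental/negroni-sessions"
	"github.com/goincremental/negroni-sessions/redisstore"
)

func main() {
	// redisstore.New(maxIdle, network, address, password, keyPairs...)
	// Note: the first argument sizes MaxIdle on the underlying redigo
	// pool; nothing in this call path ever sets MaxActive.
	store, err := redisstore.New(10, "tcp", "localhost:6379", "", []byte("session-secret"))
	if err != nil {
		panic(err)
	}

	n := negroni.Classic()
	n.Use(sessions.Sessions("my_session", store))
	n.UseHandler(http.DefaultServeMux)
	n.Run(":3000")
}
```

The important detail is how little of the pool's configuration is exposed at this layer.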
A few thoughts on everything so far:
- There is a lot of abstraction for the sake of abstraction in this flow
- Abstraction is fine if reasonable, this is becoming unreasonable
- A little copying is better than a little dependency
- There are a lot of hidden assumptions that get lost along the way
- Interfaces describe behaviors, Structs describe state
- redigo would be much more powerful (and mockable) if Pool was an Interface instead of a struct.
This is all great, but we are obviously creating a redis.Pool, and yet we are seeing roughly one new connection for every API call we make to Redis. Looking closer at the pooling implementation, specifically this block:
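I won't reproduce redigo's source verbatim, but the behavior boils down to this: when every idle connection is checked out and MaxActive is zero, the pool just dials a new connection. A runnable toy sketch of that check (my paraphrase of the semantics, not redigo's actual code):

```go
package main

import "fmt"

// pool is a toy illustrating the zero-value trap: MaxActive == 0 is
// treated as "unlimited", so an unset field silently disables the cap.
type pool struct {
	MaxActive int // 0 means no limit, mirroring redigo's semantics
	active    int
}

// get simulates checking out a connection when no idle ones are free.
func (p *pool) get() (bool, error) {
	if p.MaxActive != 0 && p.active >= p.MaxActive {
		return false, fmt.Errorf("pool exhausted")
	}
	p.active++ // under the limit (or no limit): dial a new connection
	return true, nil
}

func main() {
	p := &pool{} // MaxActive left at its zero value
	for i := 0; i < 1000; i++ {
		p.get() // never errors: every call may open a new connection
	}
	fmt.Println("connections opened:", p.active)
}
```

With the zero value, a thousand concurrent checkouts means a thousand real connections, which is exactly the behavior we saw under load.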
It is fairly obvious at this point: nowhere in my sessions journey was anyone setting a realistic value for Pool.MaxActive, and since it is an int in Go, it defaults to 0. The above implementation for getting connections says that if there is no MaxActive limit, just go ahead and create a new connection whenever all idle connections are busy. Since we were running a pair of webservers under load, it was fairly easy to overrun our MaxIdle connections, requiring new connections to be set up and reaped basically as we went. I believe there is also some tuning we could do with MaxIdle so that we do not have to reap nearly every connection we make.
This really flies in the face of Rob Pike's Go proverbs, specifically "Make the zero value useful." In my opinion, the standard behavior of a connection "pool" should be to default to a sane MaxActive limit, as opposed to defaulting to infinity. Under load this doesn't even resemble a pool: the overflow connections are reaped almost immediately after each connection is "closed", releasing it back to the pool.
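In the meantime, the workaround on our side is to construct the redigo pool ourselves with explicit limits. A sketch (the field names are redigo's Pool fields; the values are illustrative and would need tuning for your workload):

```go
pool := &redis.Pool{
	MaxIdle:     50,  // keep enough idle conns to avoid constant reaping
	MaxActive:   100, // hard cap; the zero value would mean "unlimited"
	Wait:        true, // block callers at the cap instead of dialing past it
	IdleTimeout: 240 * time.Second,
	Dial: func() (redis.Conn, error) {
		return redis.Dial("tcp", "localhost:6379")
	},
}
```

Of course, with negroni-sessions in the way there is no obvious place to inject this configuration, which circles back to my complaint about the layers of abstraction.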
Moreover, in reviewing redigo, there is a lot of synchronization with mutexes that would be better suited to goroutines and channels, going against "Channels orchestrate; mutexes serialize" and "Don't communicate by sharing memory, share memory by communicating."