Graceful Shutdown of Go App in AWS ECS
At work we ran into a problem: we needed to gracefully stop one of our Go applications running in Amazon ECS. This application shepherds many in-flight transactions, and it is not acceptable for us to simply stop the container without at least updating the state of those transactions, if not actually completing them.
After some research into how Docker stops containers, it seems that this is how Docker tells a container to stop processing:
docker stop => kill -SIGTERM, then after a grace period, kill -SIGKILL
docker rm -f => kill -SIGKILL
Basically, when you run docker stop against a container, PID 1 in that container receives a SIGTERM and is asked to terminate. This is great, because we can write our code to handle that signal: stop accepting new transactions and clean up the in-flight ones before shutting down.
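Here is a minimal sketch of that idea, assuming transactions are registered through a small hypothetical helper (txGate is our own name, not part of any library). Once SIGTERM arrives, the gate refuses new transactions and blocks until the in-flight ones have finished.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// txGate is a hypothetical helper: it admits new transactions until
// shutdown begins, then rejects them while in-flight ones finish.
type txGate struct {
	mu       sync.Mutex
	draining bool
	inFlight sync.WaitGroup
}

// Begin registers a new transaction. It returns false once draining has
// started, so callers can reject the work instead of starting it.
func (g *txGate) Begin() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.draining {
		return false
	}
	g.inFlight.Add(1)
	return true
}

// End marks a transaction registered by Begin as finished.
func (g *txGate) End() { g.inFlight.Done() }

// Drain stops admitting transactions and blocks until in-flight ones end.
// Setting draining under the lock guarantees no Add races with Wait.
func (g *txGate) Drain() {
	g.mu.Lock()
	g.draining = true
	g.mu.Unlock()
	g.inFlight.Wait()
}

func main() {
	gate := &txGate{}

	// Deliver SIGTERM (what docker stop sends) and SIGINT to sigs instead
	// of letting them terminate the process.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	// Request handlers would wrap each transaction in gate.Begin()/gate.End().

	sig := <-sigs
	fmt.Printf("received %v, draining in-flight transactions\n", sig)
	gate.Drain()
	fmt.Println("all transactions finished, exiting")
}
```

The mutex is there so that a transaction can never be admitted after Drain has started waiting.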
As seen in this excellent article, docker stop takes a timeout parameter (-t), which defaults to 10 seconds for the Docker CLI; in ECS, the agent's stop timeout (ECS_CONTAINER_STOP_TIMEOUT, or stopTimeout in the task definition) defaults to 30 seconds. When that timer runs out, Docker sends a SIGKILL to the container process and all hope is lost, as there is no way to intercept a SIGKILL.
What is really enlightening about that article is that if you do not write your Dockerfile's CMD in the exec form, your application is started as a child of /bin/sh -c, which does not forward the signals it receives to its child process, and that child is exactly the process we want to signal. By using the exec form of CMD in our Dockerfile (for example CMD ["/app"] rather than CMD /app), /app runs as PID 1 inside the container.
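One way to check at startup that the exec form is really in effect is to look at the process's own PID. This is a hypothetical sketch (pidWarning is our own name). It only warns rather than exiting, because a process can legitimately be something other than PID 1, for instance when run outside a container or behind an init process that forwards signals.

```go
package main

import (
	"fmt"
	"os"
)

// pidWarning returns a warning when pid is not 1, and "" when it is.
// With the shell form of CMD, /bin/sh -c is PID 1 and the app is its child,
// so the SIGTERM from docker stop may never reach the app.
func pidWarning(pid int) string {
	if pid == 1 {
		return ""
	}
	return fmt.Sprintf("running as PID %d, not PID 1: check that the Dockerfile uses the exec form of CMD", pid)
}

func main() {
	if w := pidWarning(os.Getpid()); w != "" {
		fmt.Fprintln(os.Stderr, "warning:", w)
	}
}
```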
I wrote a gist as a proof-of-concept Go service that watches for the appropriate signals from Docker.
When the application receives a SIGTERM or SIGINT (for example via kill -SIGTERM or kill -SIGINT), the signal is caught by our anonymous signal watcher goroutine, which in turn signals the main app to tell all of its worker goroutines to shut down. Once the shutdown business logic has run, the main app reports back to the signal watcher on the done channel, and the watcher then writes an appropriate exit code to returnCode.
Since we use Amazon ECS to deploy and undeploy our containers, we needed to confirm that ECS stops containers through the same docker stop mechanism for this to work, and sure enough, it uses the Docker client's StopContainer function to stop containers.
I guess the point is that there are times when your app needs to behave responsibly on shutdown, and you should make every effort to clean up after yourself, even when you are being signaled to quit.