Operations
FireDrill GameDays at Sky Betting & Gaming
Running GameDays at SB&G and the lessons learned
11 minute read
Zero-Downtime Kubernetes Deployments
When migrating services to shiny new cloud-native infrastructure, special care must be taken to ensure that releases that were zero-downtime continue to be so. When said service is the login system for your entire customer-facing product offering, a little extra effort is probably needed.
10 minute read
Rising from the Ashes
We’ve always enjoyed running incident response drills, but they were becoming stale. This post covers how we addressed the problems with our fire drills and iterated upon them.
8 minute read
Kafka on NFS
There is a general recommendation against running Apache Kafka on NFS storage, but nobody really gives a good explanation as to why. In this post we look at some broker crashes we have seen on Kafka clusters that use NFS storage, and why they happened.
5 minute read
JMX Metrics in Kafka Connect
JMX metrics in Java applications are often poorly documented, and many people are unaware the feature exists. In this post we explore how to use the JMX metrics provided by Kafka Connect.
11 minute read
Crash! Bang! Wallop! Practice makes perfect
Engineered Chaos, breaking production, and getting away with it. How the Core Tribe at Sky Betting and Gaming breaks stuff to make things better.
14 minute read