Our DevOps architect, Haggai Philip Zaguri, helps make sense of Chaos Engineering.
Tell us a little bit about your background
I started in IT and Ops at a small startup in Tel Aviv. At some point, we moved from the basement to new offices. I did everything, including IT, Ops, and scripting. We grew from a small company with just a few customers into an Internet provider. Initially, we turned to open source for financial reasons, and from there it became a way of life and a way of thinking.
Was there a turning point in your career? A moment when you understood that you needed to change direction or look at things from another perspective?
The turning point came when I started working at Tikal. I had to deal with a large number of clients that needed different solutions. My guiding principle was always good communication and a good understanding of the problem, which helped me reach the right solution for each specific customer.
What are you going to talk about at FTRD?
I'm going to talk about Chaos Engineering. In general, Chaos is a very sensitive word for me, because in the DevOps world there is literally a lot of chaos, and Chaos Engineering is basically about dealing with this mess and putting things in place. Issues like monitoring and logging carry more weight. In addition, each organization we meet presents the challenge of finding the right path for it: a parent company that buys other companies and needs to organize them, or a company pivoting from a monolith to microservices, suddenly needs new tools to meet its current challenges. The field of Chaos Engineering started at Netflix, which based it on chaos theory from mathematics. According to this theory, it's possible to calculate the path of an object, but not the anomaly of when it will come into contact with another object. Netflix had to deal with the early stages of cloud services, when concepts like autoscaling were not yet common. Therefore, they developed a set of tools, like an army of “monkeys”, each of which deals with a different part of the app, such as security. This fit well with their corporate culture of collective responsibility, so they started building tools that cause a system outage and let developers deal with the resulting problems. Some of these tools are less relevant today, since there are fewer resource limits in the cloud. Today there are more complex tools, such as kubeaudit or kube-hunter, that can take this approach further into the development process, to the CI/CD stages.
Are there any influencers in the field you are following?
I follow Martin Fowler in particular, along with a number of people at ThoughtWorks. I also follow Adrian Cockcroft, former cloud architect at Netflix, and Kelsey Hightower from Google.
In your eyes, what is the most important thing when you look at the world of Full Stack and development?
The most important thing is communication, at both the technological level and the interpersonal level. Without cooperation between people, it is impossible to get things done, especially in the Full Stack world, which is advancing towards Full Cycle and requires collective responsibility.