The Chamber 🏰 of Tech Secrets is open. Today the world of software and systems engineering meets self-help as we talk about resilience in systems and in life.
Resilience is a common theme in systems engineering. I like the definition: “the ability of a system to maintain an acceptable level of service despite failures, challenges, and unexpected conditions”. This is accomplished through architectural and engineering techniques that anticipate, respond to, and recover from failures.
Those with a systems background will be familiar with many of these techniques, including monitoring for early detection, elasticity, scaling, backups, hot-standby environments, regionally distributed workloads, rapid recovery, failovers, and the like.
When failures do occur, we are typically very intentional about learning from them, employing tactics like blameless postmortems.
I often like to look at metaphors in the world and bring them into tech, but today let’s go the opposite direction.
A few months ago, Nvidia CEO Jensen Huang gave a talk in which he said a few very interesting things about resilience. Give it a watch.
One that stands out to me: “I don’t know how to teach it to you except for I hope suffering happens to you.”
Is Jensen just a meanie? No. He understands that adversity, challenge, frustration, struggles, hardships, and failures are simply part of life, but are also often the very best of teachers. More than that, they are the catalyst for improvement and change.
I think this is an important lesson to take to heart in the business world. In my experience, the higher level your role, the more you have to influence and align with others. As your role expands, there are by nature more groups you must align with to be successful. This means things often won’t go your way.
How will you respond? Do you choose to live in frustration or resign yourself to giving a lesser effort? Or do you choose to treat each work frustration or failure to influence as a chance to self-assess, refocus on what is most important, and get better?
Do you choose to give up, or are you resilient?
Here are three tips from systems that may help you choose resilience.
Invest in resilience: In software and infrastructure, resilience isn’t the default; it requires deliberate investment in redundancy, failover mechanisms, and recovery processes. The same is true for personal resilience. You have to invest in habits, relationships, and mental frameworks that help you recover from setbacks. One example is failure testing. Just as engineers run chaos experiments to test system stability under stress, you can proactively put yourself in controlled challenges (hard workouts, “scary” activities, learning new skills) to build resilience. Personally, I try to have a challenge every 2 months that gets me out of my comfort zone, pushes me in a new area, or forces a beginner’s mindset. Just for fun:
January / February - Try Brazilian Jiu Jitsu (first drop-in today, actually)
March / April - Run a sub-6 minute mile. I want to see 5:XX on the clock. I am not sure if I’ve ever done that.
May / June - Hyrox, Tactical Games
July / August - 50k run / hike on the Appalachian Trail in GA (6.5k vertical up/down)
September / October - Climb Grand Teton from car-to-car in a day (7k vertical up/down), possible solo backcountry elk hunt
November / December - TBD
Observability: In systems, we have lots of tools that create data that we can use to understand what is taking place in near-real-time so that we can quickly react to issues. Do you have any data about you? What does nominal and optimal performance look like at work, in your health and fitness, and in your relationships? Do you have a vision for what you want life to look like? Goals? Do you have any time to reflect on those and measure if things are on par or not? Personally, I like to have goals for each month and have a weekly and monthly review process I go through. My wife and I also write down answers to ~10 questions each month about how things are going between us to ensure we’re communicating about what’s important. This creates data and a chance to reflect.
Blameless post-mortem: In software, teams often conduct “blameless post-mortems” or retrospectives to look back on an incident or outage and see, objectively, what can be done better in the future to avoid a repeat. Do the same for yourself: take the time to analyze your decisions and the resulting outcomes. Don’t blame others and don’t blame yourself (“I always fail at these things,” “nobody listens to me,” blah blah blah). I often use a simple set of questions:
What went well?
What didn’t go well?
What did I learn?
Being resilient does not mean pretending all is well. You can limp along through a struggle and still be resilient if you are operating at “an acceptable level of service”. Software outages hurt. Suffering in life hurts. We can and must choose resilience. What will you do when you experience failures, challenges, or unexpected conditions? Resilience is a choice that takes investment, and the choice is yours.
Bang on!
'If you do not make time for your wellness, you will be forced to make time for your illness'.
What we aren't changing, we're choosing!! I love the blameless checkins/postmortems, brilliant strategy, keeping it simple, keeping it focused, and living a life with intention and love!
Solid post, Brian. It seems there was a good bit of challenges from early on in Mr. Jensen Huang's life... which of course required much resiliency & grit from him. I smell biographical film potential here - hehe
Yes, his perseverance certainly appears to be paying off... funny how hard work does that ;)
Also, your "comfort zone challenges" are pretty darn awesome too! May have to steal one or two of those (was actually hopeful to see if my oldest daughter would want to check out Brazilian jiu-jitsu classes with me this year!)
Tech Secret #44, makes me think of this verse:
"but we also glory in our sufferings, because we know that suffering produces perseverance;
perseverance, character; and character, hope"
Thanks for this post, Brian. All the best!