The Fog of War: Answering "Tell me about a decision you made with incomplete data."

Level: Senior/Staff Engineer
Asked at: FAANG, Stripe, Cloudflare, Datadog

Q: "Tell me about a time you had to make a critical decision with incomplete information. How did you approach it, and what was the outcome?"

Why this matters: Engineering isn't about following a spec. It's about navigating ambiguity. This question tests your judgment, risk assessment, and ability to create forward momentum in the face of uncertainty. Interviewers aren't hiring a coder; they're hiring a decision-maker.

Interview frequency: Almost guaranteed in senior and staff loops.

❌ The Death Trap

Most candidates fall into one of three traps:
1. The Hypothetical: They talk about what they *would* do, revealing they lack real-world experience in high-stakes situations.
2. The Low-Stakes Story: They describe choosing a library or debating a minor feature, trivializing the question and signaling they avoid real responsibility.
3. The Lucky Guess: They tell a story where they made a gut call that happened to work out, failing to show a repeatable process for sound judgment.

"Most people say: 'Well, we weren't sure if we should use React or Vue. I gathered some data, we made a list of pros and cons, and we picked React. It worked out fine.'"

🔄 The Reframe

What they're really asking: "Describe your mental model for navigating uncertainty. Prove that you can de-risk the unknown and make progress when the path is not clear."

This reveals: Your ability to think in probabilities, your bias toward action, and your understanding of second-order consequences. It separates operators from theorists.

🧠 The Mental Model

Don't just tell a story. Frame your story with a decision-making framework. This shows you have a process, not just a lucky anecdote. I call it the **"Map & Move"** framework.

1. Frame the Real Question & Constraints
2. Map Knowns, Unknowns, and Reversibility
3. Choose the Path That Buys the Most Information
4. Act, Communicate the Trade-offs, and Measure

📖 The War Story

Situation: "At a previous fintech company, our core payment processing service began suffering intermittent, severe latency spikes—going from 50ms to over 5000ms—during peak traffic hours."

Challenge: "Our dashboards lit up like a Christmas tree, but the logs were useless. They showed the delay, but not the source. We had no clear root cause. Was it a database lock, a network issue, a subtle bug in our code, or a silent failure from our third-party payment gateway?"

Stakes: "Around 15% of checkouts were failing. We were losing about $10,000 in revenue for every 10 minutes the incident continued. More importantly, merchant trust was eroding with every failed transaction. The CTO was on the call, asking for an ETA on a fix for a problem we couldn't even identify."

✅ The Answer

My Thinking Process:

"This was a classic 'fog of war' scenario. My first instinct wasn't to find the root cause, but to find the fastest path to mitigate customer impact and gather more data. I immediately started running the 'Map & Move' framework in my head."

What I Did:

"First, I **framed the real question**. It wasn't 'What is the bug?' It was 'What is the fastest, safest way to restore service and gain clarity?' We had a 30-minute window before the next major peak.

Second, I **mapped the unknowns and reversibility**. We had two paths: 1) Deploy a speculative hotfix to add more logging and timeouts around our code. This was risky and a 'one-way door'—if it made things worse, rollback would be slow. 2) Activate our secondary, more expensive payment provider, which we used for disaster recovery. This was a 'two-way door'—highly reversible. We could flip a switch and be back in seconds.

Third, I **chose the path that bought information**. The failover was the obvious choice. It would immediately tell us if the problem was internal to our system or related to the primary payment gateway. Stopping the bleeding was more valuable than being a hero who finds the bug instantly.

Finally, I **acted and communicated**. I told my director: 'I'm initiating a failover to our secondary provider. It will increase our transaction cost by 8% while active, but it de-risks a potential 15% revenue loss and buys our team time to diagnose. This is a temporary measure.' I made the call, and we rerouted 100% of traffic within 60 seconds."
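
To make the "map" step concrete, here is a minimal Python sketch of how the two options in the story could be compared. The `Option` fields, the `pick_first_move` helper, and the rollback times are illustrative assumptions, not details from the incident; the story only says the hotfix rollback would be "slow" and the failover could be undone "in seconds."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    reversible: bool        # True for a "two-way door"
    rollback_seconds: int   # assumed time to undo the change
    immediate_signal: bool  # does acting tell us something right away?


def pick_first_move(options):
    """Prefer reversible options, then ones that yield an immediate
    diagnostic signal, then the fastest rollback."""
    return min(
        options,
        key=lambda o: (not o.reversible, not o.immediate_signal, o.rollback_seconds),
    )


options = [
    # Assumed values: the hotfix only helps once the next spike hits.
    Option("speculative hotfix with extra logging",
           reversible=False, rollback_seconds=1800, immediate_signal=False),
    # The failover immediately separates "our system" from "their gateway".
    Option("fail over to secondary provider",
           reversible=True, rollback_seconds=10, immediate_signal=True),
]

choice = pick_first_move(options)
print(choice.name)
```

In an interview you would never show code for this, but saying the ranking out loud (reversibility first, information second, speed of rollback third) is what makes the decision sound like a process rather than a hunch.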

The Outcome:

"The impact was immediate. The latency spikes vanished. Checkout success rate returned to 99.9%. This confirmed the issue was with our primary gateway. We had protected an estimated $150,000 in revenue at the cost of about $5,000 in higher fees. With the pressure off, we were able to work with the provider to identify a bug in their load balancer. We switched back four hours later."

What I Learned:

"I learned that in a crisis with incomplete information, the most valuable decision isn't the one that solves the problem, but the one that creates the conditions for the solution to be found. It’s about buying time and information. We have since built automated health checks that trigger this failover automatically."

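If the interviewer asks what the automated failover looks like, it helps to have a mental sketch ready. The version below is one possible shape, not the system from the story: the `FailoverRouter` class, the window of 5 checks, the threshold of 3 failures, and the 1000 ms latency budget are all hypothetical values chosen for illustration.

```python
from collections import deque


class FailoverRouter:
    """Routes to 'primary' until too many recent health checks fail."""

    def __init__(self, window=5, max_failures=3, latency_budget_ms=1000):
        self.results = deque(maxlen=window)  # True means the check was healthy
        self.max_failures = max_failures
        self.latency_budget_ms = latency_budget_ms
        self.active = "primary"

    def record_check(self, latency_ms, ok):
        """Record one health check and fail over if the window looks bad."""
        healthy = ok and latency_ms <= self.latency_budget_ms
        self.results.append(healthy)
        failures = sum(1 for r in self.results if not r)
        if self.active == "primary" and failures >= self.max_failures:
            self.active = "secondary"
        return self.active

    def restore_primary(self):
        """Switch back by hand, so a flapping gateway cannot cause ping-pong."""
        self.results.clear()
        self.active = "primary"


router = FailoverRouter()
# Latencies resembling the incident: normal at ~50 ms, then spiking past 5000 ms.
for latency in [50, 60, 5000, 5200, 5100]:
    router.record_check(latency_ms=latency, ok=True)
print(router.active)  # secondary
```

Failing over automatically but switching back manually is a deliberate design choice in this sketch: the failover stays a two-way door, and a human confirms the primary is healthy before traffic returns to it.
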
🎯 The Memorable Hook

This connects your practical action to a deeper principle about decision-making under uncertainty, showing you're not just a problem-solver but a strategic thinker.

💭 Inevitable Follow-ups

Q: "How did you get buy-in for a decision that immediately increased costs?"

Be ready: Frame your answer in terms of risk mitigation and expected value. "I presented it as buying an insurance policy. A guaranteed small loss of $5k was better than a probable large loss of $150k."
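
The insurance framing can be backed with a quick break-even calculation. The sketch below uses the two figures from the story ($5,000 in extra fees, $150,000 in revenue protected); the `break_even_probability` helper is a hypothetical name, and the model is a deliberate simplification that assumes the failover fully stops the loss.

```python
def break_even_probability(failover_cost, loss_if_no_action):
    """Probability of the large loss above which paying the certain
    failover cost has the lower expected cost."""
    return failover_cost / loss_if_no_action


failover_cost = 5_000      # extra fees while on the secondary provider
revenue_at_risk = 150_000  # estimated revenue protected

p = break_even_probability(failover_cost, revenue_at_risk)
print(f"Failover pays off if P(outage continues) > {p:.1%}")
# prints: Failover pays off if P(outage continues) > 3.3%
```

With checkouts already failing, the chance of the outage continuing was far above that threshold, which is why the cost increase was an easy sell.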

Q: "What would you have done if the failover didn't work?"

Be ready: This tests your contingency planning. "My next step was to assume the problem was ours. The plan was to immediately begin scaling down non-critical features on the same infrastructure to reduce load on the system, while a separate sub-team prepared to deploy the enhanced logging build."

🔄 Adapt This Framework

If you're junior: Focus on a smaller scope. Talk about a time you had to choose a technical approach for a feature with ambiguous requirements. The "Map & Move" framework still applies: choose the path that makes it easiest to pivot once the requirements become clearer.

If you're senior: The war story above is ideal. Emphasize the blast radius of the decision, the cross-functional communication (to product, leadership, etc.), and the long-term systemic improvements that came from it.

If you lack this specific experience: Use an architectural design decision. "We had to choose between a monolith and microservices for a new project with an unclear future scale. We chose a 'modular monolith'—a reversible, two-way door that gave us speed now without closing the door to microservices later."

Written by Benito J D