The Command Line Scalpel: Solving a $10,000/Hour Outage with One Line of Code

Junior/Mid Engineer Asked at: FAANG, Startups, Cloud Providers

Q: How do you find all files larger than 100MB under /var/log?

Why this matters: This is not a test of memory. It's a simulation of a real-world crisis. A server's disk space is finite, and runaway log files are a common cause of production outages. Your ability to quickly diagnose and resolve this issue on a live server is a direct measure of your operational competence.

Interview frequency: High. A fundamental skill for anyone who will ever touch a production environment.

❌ The Death Trap

The candidate freezes. They can't recall the exact syntax. They say, "Uh, I'd probably Google it." While honest, it misses the point. The worst answer is suggesting a complex solution, like "I'd write a Python script to iterate through the files," which is like using a sledgehammer for surgery.

"Most people say: 'I'd use `ls -l` and then maybe `grep` or `awk` to parse the size...' This is slow, inefficient, and signals that you're not familiar with the purpose-built tools for the job."

🔄 The Reframe

What they're really asking: "It's 3 AM. A critical server is down because the disk is full. SSH is lagging. Every second counts. Are you the firefighter who knows how to use the axe, or are you the bystander looking for the instruction manual?"

This reveals if you have "server instincts." Can you operate under pressure with the native, universal tools available on any Linux system? They are testing for self-sufficiency and resourcefulness.

🧠 The Mental Model

I think of this through the lens of the Unix Philosophy, which is a powerful model for problem-solving.

1. Small, Sharp Tools: Use the right tool for the job. For finding files, the tool is `find`. It's designed for this and does it better than anything else.
2. Compose and Conquer: Start with the simplest command that works, then add complexity only if needed. `find` is a one-stop shop for this problem.
3. Silence is Golden: A tool should do its job without chatter. The correct command will give you exactly what you asked for and nothing more.

📖 The War Story

Situation: "I was on-call. At 3:17 AM, PagerDuty screamed: 'Disk Space Critical on primary database replica.' This server handled all our internal analytics and reporting."

Challenge: "SSHing into the machine was painfully slow. Every command took 10-15 seconds to respond. `df -h` confirmed the root partition was at 100%. The application was down. The business was flying blind."

Stakes: "This wasn't just an inconvenience. Our fraud detection models relied on real-time queries from this replica. Every minute it was down was a minute we were exposed. We were losing situational awareness of our entire operation."

✅ The Answer

My Thinking Process:

"My first thought is, it's almost always a runaway log file or a core dump. Running a `du -sh *` from the root directory would be too slow on a dying machine. The most likely culprit is `/var/log`. I need a command that searches metadata (like file size) without reading the file contents. The perfect tool for this is `find`."

What I Did:

"So, the first command I'd run is the direct answer to the question:"

find /var/log -type f -size +100M

"I would then break down what each part does for the interviewer:

  • find: The command itself, our file-finding scalpel.
  • /var/log: The path to search within.
  • -type f: An expression to find only entities of type 'file', ignoring directories, links, etc.
  • -size +100M: The core condition. Find files strictly larger than (+) 100 mebibytes (find's `M` unit is 1,048,576 bytes).
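If the interviewer wants a demonstration rather than recall, the predicate is easy to show in a throwaway directory. This is a sandbox sketch with hypothetical paths, not the real `/var/log`; sparse files made with `truncate` report their full size without consuming real disk space:

```shell
# Sandbox demo: -size +100M matches only files strictly larger than 100 MiB.
tmp=$(mktemp -d)
truncate -s 150M "$tmp/huge.log"   # 150 MiB sparse file -> matched by +100M
truncate -s 10M  "$tmp/small.log"  # 10 MiB -> filtered out
find "$tmp" -type f -size +100M    # prints only .../huge.log
rm -rf "$tmp"
```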

"In my war story, this command instantly returned `/var/log/app/debug.log` at 47GB. My next steps were crucial:

  1. Verify: I used `tail /var/log/app/debug.log` to see the last few lines. It was a repeating error message from a misconfigured debug flag.
  2. Remediate (Safely): I didn't use `rm`. If a process still holds the file open, `rm` only unlinks the name; the disk space isn't reclaimed until that handle closes, so the partition stays full. Instead, I truncated the file to zero bytes in place, which is instantaneous and safe: `> /var/log/app/debug.log`
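The truncate-in-place behavior is easy to show on a scratch file (the path below is a hypothetical stand-in for the real debug log):

```shell
# Truncation keeps the inode alive; only the contents go.
log=/tmp/demo-debug.log                      # stand-in for /var/log/app/debug.log
printf 'ERROR: debug flag enabled\n' > "$log"
: > "$log"        # POSIX-portable truncation; plain `> file` works in bash
wc -c < "$log"    # file still exists, now 0 bytes
rm -f "$log"
```

Because the inode survives, a writer that already has the file open keeps appending to the same (now empty) file instead of a deleted one.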

The Outcome:

"The moment I truncated the file, disk usage dropped from 100% to 22%. The server load plummeted, and SSH became responsive again. Within two minutes, the database replica was healthy and all systems returned to normal. The immediate crisis was over."

What I Learned:

"That incident cemented a core belief: mastery of the basics is non-negotiable. In a crisis, you don't rise to the occasion; you fall to the level of your training. Knowing `find` cold was more valuable than any complex cloud tool in that moment. It also taught me the critical difference between `rm` and `>` for live log files."

🎯 The Memorable Hook

Anyone can learn a complex tool. True seniority is knowing which simple, universal tool will solve the problem with the least amount of risk and effort. The command line is the great equalizer—it's available everywhere, and its power is directly proportional to your understanding of it.

💭 Inevitable Follow-ups

Q: "Great. Now delete all those files you just found."

Be ready: "The `find` command has a built-in action for this: `find /var/log -type f -size +100M -delete`. This is safer and more efficient than piping to `xargs rm`. Two cautions: always run the command without `-delete` first to double-check what you're about to remove, and keep `-delete` as the last expression — if it appears before the tests, `find` deletes everything it visits."
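A safe habit worth narrating out loud: dry-run first, then re-run the identical expression with `-delete` appended (sandbox paths again, assumed for illustration):

```shell
tmp=$(mktemp -d)
truncate -s 150M "$tmp/big.log"
find "$tmp" -type f -size +100M          # dry run: review the match list
find "$tmp" -type f -size +100M -delete  # same expression, -delete last
ls -A "$tmp"                             # big.log is gone
rm -rf "$tmp"
```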

Q: "How would you find the top 5 largest files on the *entire* system?"

Be ready: "Here, you compose tools. You'd use `find` to get the raw data, `sort` to order it, and `head` to filter it: `find / -type f -printf "%s %p\n" 2>/dev/null | sort -nr | head -n 5`. Explain each part: print size and path, ignore permission errors, sort numerically in reverse, and take the top 5. Note that `-printf` is a GNU `find` extension; on BSD/macOS you'd reach for `-exec stat` or `du` instead."
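The same pipeline, scoped to a sandbox so the output is predictable. This is a sketch with made-up file names, and it assumes GNU `find` for `-printf`:

```shell
tmp=$(mktemp -d)
truncate -s 5M "$tmp/b.log"
truncate -s 3M "$tmp/a.log"
truncate -s 1M "$tmp/c.log"
# Emit "size-in-bytes path" per file, sort largest first, keep the top 2.
find "$tmp" -type f -printf '%s %p\n' 2>/dev/null | sort -nr | head -n 2
rm -rf "$tmp"
```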

Q: "How do you prevent this from happening again?"

Be ready: "This is the crucial step. You'd configure `logrotate`. I'd set up a policy in `/etc/logrotate.d/` for our application to rotate logs when they reach a certain size (e.g., 100M), keep a limited number of old logs, and compress them. This automates the cleanup and prevents future outages."
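A sketch of what that policy could look like. The file name and thresholds are illustrative, not a real production config:

```
# Hypothetical /etc/logrotate.d/app
/var/log/app/*.log {
    size 100M          # rotate as soon as a log crosses 100 MB
    rotate 5           # keep at most five rotated copies
    compress
    delaycompress      # leave the newest rotation uncompressed
    missingok
    notifempty
    copytruncate       # copy then truncate in place, so the app's
                       # open file handle stays valid (no restart needed)
}
```

`copytruncate` mirrors the manual fix from the war story: the running process never has to reopen its log file.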

Written by Benito J D