The River and the Lake: Python's Most Deceptive Interview Question
Q: Write a Python script to read a log file and print all lines containing the word "ERROR".
Why this matters: This seems trivial, but it's a filter for practicality. They are not testing your knowledge of Python syntax. They are testing if you understand that in the real world, log files are not 10 lines long; they are gigabytes. Your answer reveals if you write code for the textbook or for production reality.
Interview frequency: Extremely high. A go-to "warm-up" scripting question.
❌ The Death Trap
90% of candidates write code that works perfectly on their laptop with a 5-line test file. The trap is that this very same code will crash a server when faced with a real log file. They load the entire file into memory.
"Most people say: 'Sure, I'll just read all the lines into a list and loop through it.'"
```python
# The naive, memory-hungry approach
def find_errors(filename):
    with open(filename, 'r') as f:
        lines = f.readlines()  # <-- The TRAP!
        for line in lines:
            if 'ERROR' in line:
                print(line)
```
If `filename` is a 2GB log file, this script will attempt to allocate 2GB of RAM. On a resource-constrained server, this will get it killed by the OOM (Out Of Memory) killer.
🔄 The Reframe
What they're really asking: "Show me you can process data that is larger than the available memory. Prove you respect the machine's constraints."
This reveals your operational discipline. It's a proxy for understanding concepts like streaming, generators, and resource management. It shows you think about the environment your code will run in, not just the code itself.
🧠 The Mental Model
I use a simple analogy: **"Treat the file like a river, not a lake."** A lake is something you try to hold all at once; a river is something you stand beside and process as it flows past. `readlines()` tries to scoop up the whole lake, while iterating over the file object samples the river one line at a time.
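The "river" idea maps directly onto Python's iteration protocol: a file object is a lazy iterator, and wrapping it in a generator keeps the streaming property composable. A minimal sketch (the name `grep_lines` is illustrative, not part of the original question):

```python
def grep_lines(path, term):
    """Yield matching lines one at a time -- memory use stays constant."""
    with open(path, 'r') as f:
        for line in f:  # the file object is a lazy iterator: the 'river'
            if term in line:
                yield line.rstrip('\n')

# Because it is a generator, nothing is read until you iterate:
# for hit in grep_lines('/var/log/app.log', 'ERROR'):
#     print(hit)
```

The caller decides what to do with each match (print it, count it, send an alert) without the search function ever holding more than one line in memory.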
📖 The War Story
Situation: "At a former company, a junior engineer was tasked with writing a simple script to count error types from the previous day's application log. The script was meant to run once a day on the main application server."
Challenge: "He wrote a script using the 'Lake' approach—`readlines()`. It worked perfectly on his 1MB test log. However, in production, after a particularly busy day, the log file had grown to 5GB. The cron job kicked off at midnight."
Stakes: "The script immediately tried to allocate 5GB of RAM on a server that only had 8GB total. It starved the main application, causing it to become unresponsive. The OOM killer eventually terminated the script, but not before causing a 15-minute outage during a critical batch processing window. A simple, 'harmless' script took down production."
✅ The Answer
"Absolutely. This is a classic problem that highlights the importance of memory-efficient processing. My approach would be to treat the file like a stream, processing it line by line without ever loading the whole thing into memory."
The Memory-Efficient Solution
```python
import sys

def find_errors_safely(log_file_path):
    """
    Reads a log file line by line and prints lines containing 'ERROR'.
    This approach is memory-efficient and suitable for very large files.
    """
    try:
        # The 'with' statement ensures the file is properly closed, even if errors occur.
        with open(log_file_path, 'r') as log_file:
            # Iterating directly over the file object is the 'River' approach.
            # It reads the file one line at a time into memory.
            for line in log_file:
                if 'ERROR' in line:
                    # .strip() removes leading/trailing whitespace, including the newline character.
                    print(line.strip())
    except FileNotFoundError:
        print(f"Error: The file was not found at {log_file_path}", file=sys.stderr)
    except Exception as e:
        print(f"An unexpected error occurred: {e}", file=sys.stderr)

# Example usage:
if __name__ == "__main__":
    find_errors_safely('/var/log/app.log')
```
Why This Is Better:
- Memory Efficiency: The script's memory usage is constant, regardless of whether the file is 1 kilobyte or 1 terabyte. It only ever holds one line in memory at a time.
- Robustness: The `with open(...)` syntax is crucial. It guarantees the file handle is closed, preventing resource leaks, even if an exception occurs inside the block.
- Error Handling: It includes basic `try...except` blocks to gracefully handle common issues like a missing file, which is essential for any production-ready script.
🎯 The Memorable Hook
"A junior developer writes code that works on their machine. A senior developer writes code that respects the production machine it will ultimately run on."
This reframes the problem from a simple scripting task to a matter of professional responsibility and system-wide thinking. It shows you have the maturity to think beyond your own editor.
💭 Inevitable Follow-ups
Q: "How would you make the search term 'ERROR' case-insensitive?"
Be ready: "Simple. I would convert each line to lowercase before checking: `if 'error' in line.lower():`. This ensures it matches 'ERROR', 'Error', or 'error'."
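That one-line change, applied to the streaming structure above, might look like this sketch (the function name `find_matches_ci` is mine):

```python
def find_matches_ci(path, term):
    """Case-insensitive streaming search; still one line in memory at a time."""
    needle = term.lower()  # lowercase the search term once, outside the loop
    with open(path, 'r') as f:
        for line in f:
            if needle in line.lower():
                print(line.strip())
```

Lowercasing the search term once outside the loop avoids repeating that work on every line.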
Q: "How would you modify this to be a reusable command-line tool that can take the filename and search term as arguments?"
Be ready: "I'd use Python's built-in argparse library. This allows for defining command-line arguments, handling help text, and providing a clean user interface. I'd refactor the script to take `log_file_path` and `search_term` from `argparse` instead of being hardcoded."
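A minimal sketch of that refactor, using `argparse` from the standard library (argument names and defaults here are illustrative choices, not from the original):

```python
import argparse
import sys

def find_matches(log_file_path, search_term):
    """Stream the file and print matching lines; memory use stays constant."""
    with open(log_file_path, 'r') as log_file:
        for line in log_file:
            if search_term in line:
                print(line.strip())

def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Print log lines containing a search term.")
    parser.add_argument("log_file", help="Path to the log file")
    parser.add_argument("term", nargs="?", default="ERROR",
                        help="Search term (default: ERROR)")
    args = parser.parse_args(argv)
    try:
        find_matches(args.log_file, args.term)
    except FileNotFoundError:
        print(f"Error: file not found: {args.log_file}", file=sys.stderr)
        return 1
    return 0

# To run from a shell: python grep_tool.py /var/log/app.log ERROR
```

`argparse` also generates `--help` text for free, which is what makes the script feel like a real tool rather than a one-off.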
Q: "For a 100GB file, is Python even the right tool? What else could you use?"
Be ready: "That's a great question about choosing the right tool for the job. While this Python script would *work* and still be memory-efficient, standard command-line tools like grep, which are written in highly optimized C, would likely be much faster. For a simple search, I would probably use `grep 'ERROR' /var/log/app.log`. I'd choose Python when the logic becomes more complex—for example, if I needed to parse JSON from the log line, connect to a database, or send a formatted alert."
🔄 Adapt This Framework
If you're junior: Nailing the "River vs. Lake" analogy and providing the memory-safe code is a home run. It shows you've learned a crucial lesson early. You don't need to proactively bring up `argparse` or `grep`, but be ready for it.
If you're senior: You should immediately identify the memory trap. Your answer should start with, "The most important consideration here is memory usage, as log files can be massive." You should proactively discuss the trade-offs between this script and using native tools like `grep` or `awk`.
