The Sentinel's Logbook: A Python Script That Listens to Reality

Q: Can you write a Python script that monitors a directory for changes—like file creation, modification, or deletion—and logs those changes?

Why this matters: This seems like a simple coding challenge, but it's a test of your systems thinking. The interviewer wants to know if you can build a reliable agent that creates a feedback loop from a fundamental part of the operating system. Can you turn a simple requirement into a production-ready tool?

Interview frequency: High for SRE, DevOps, and Platform Engineering roles.

❌ The Death Trap

The candidate provides a quick, fragile script using a low-level library or a naive `while True` loop with `os.listdir`. They present the code without context or a story.

"Sure, I'd use the `watchdog` library. You just create an event handler and an observer, and then start it. Here's the code..."

This answer is technically functional but lacks any senior-level thinking. It solves the immediate problem but ignores reliability, robustness, and the *purpose* behind the request.

🔄 The Reframe

What they're really asking: "Can you build a reliable, event-driven agent that acts as a nervous system for a critical part of our infrastructure? Can you explain how this simple tool is actually a fundamental building block for automation, security, and data pipelines?"

This reframes the task from writing a script to architecting a solution. It's about demonstrating your ability to see a simple tool as a point of high leverage in a complex system.

🧠 The Mental Model

The "Digital Sentinel" model. This script is not a "watcher"; it's a sentinel guarding a critical asset.

1. The Vault (The Directory): This isn't just a folder. It's a vault containing critical assets: configuration files, incoming data for a pipeline, or deployment artifacts. Its state is a source of truth.
2. The Sentinel (The Python Script): The script is a tireless guard. Its sole purpose is to observe the vault and report any activity, creating an immutable audit trail.
3. The Sentinel's Senses (`watchdog`): The `watchdog` library provides the senses. It abstracts the OS-level file-notification APIs (`inotify` on Linux, `FSEvents` on macOS, `ReadDirectoryChangesW` on Windows) so the sentinel perceives events as they happen instead of constantly polling, making it highly efficient.
4. The Logbook (Structured Logging): The sentinel's reports must be clear, unambiguous, and machine-parseable. We don't just `print()`; we create structured logs (e.g., JSON) that can be ingested by other automated systems.

📖 The War Story

Situation: "We had a critical data ingestion pipeline that relied on partners dropping CSV files into a specific SFTP directory. This was the entry point for our entire analytics workflow."

Challenge: "Our old system was a fragile cron job that ran every 15 minutes, scanning the directory. It was unreliable. If the cron failed, data was delayed for hours. If a file was half-written when the scan ran, it would ingest corrupted data. We were blind to what was happening in this critical directory between scans."
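One standard mitigation for the half-written-file problem is to treat a file as "delivered" only once its size has stopped changing for a settle period. The helper below is a hypothetical sketch, not part of the original pipeline; a rename-into-place protocol on the partner's side is more robust, but size-settling works when you don't control the uploader:

```python
import os
import time


def wait_until_stable(path, settle_seconds=2.0, poll_interval=0.5, timeout=60.0):
    """Return True once `path` has kept the same size for `settle_seconds`,
    i.e. the upload has (probably) finished. Return False on timeout."""
    deadline = time.monotonic() + timeout
    last_size = -1
    stable_since = None
    while time.monotonic() < deadline:
        size = os.path.getsize(path)
        if size == last_size:
            if stable_since is None:
                stable_since = time.monotonic()
            elif time.monotonic() - stable_since >= settle_seconds:
                return True
        else:
            # Size changed: the writer is still active, restart the clock.
            last_size = size
            stable_since = None
        time.sleep(poll_interval)
    return False
```

Calling this from the sentinel's `on_created` handler before handing the file to the ingestion step would have prevented the corrupted-data incidents described above.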

Stakes: "Data delays meant our business intelligence dashboards were out of date, leading to bad decisions. Corrupted data ingestion would poison our entire data lake, requiring days of manual cleanup. The business was losing trust in our data."

✅ The Answer

My Thinking Process:

"The core problem was our polling-based, blind approach. We needed to move to an event-driven model. We needed a 'sentinel' that would react instantly and intelligently to any change in the directory, creating a reliable feedback loop. This isn't just a script; it's the first step in an intelligent automation pipeline."

What I Did: The Sentinel's Code

"Here's the robust, production-minded Python script I would build to solve this. It's not just a script; it's a reliable service."

```python
import json
import logging
import sys
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler


# The Sentinel's Logbook: structured, machine-parseable logging.
def setup_logging():
    handler = logging.StreamHandler()
    # %(message)s is itself a JSON object, so the whole line parses as JSON.
    formatter = logging.Formatter(
        '{"timestamp": "%(asctime)s", "level": "%(levelname)s", "message": %(message)s}'
    )
    handler.setFormatter(formatter)
    logger = logging.getLogger("sentinel")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# The Sentinel's Rules of Engagement: what to do when it sees something.
class SentinelEventHandler(FileSystemEventHandler):
    def __init__(self, logger):
        self.logger = logger

    def _log(self, event_type, event):
        self.logger.info(json.dumps({
            "event": event_type,
            "path": event.src_path,
            "is_directory": event.is_directory,
        }))

    def on_created(self, event):
        self._log("created", event)

    def on_modified(self, event):
        self._log("modified", event)

    def on_deleted(self, event):
        self._log("deleted", event)


if __name__ == "__main__":
    # The Vault: watch the directory given on the command line, or the CWD.
    path_to_watch = sys.argv[1] if len(sys.argv) > 1 else "."
    logger = setup_logging()
    event_handler = SentinelEventHandler(logger)

    observer = Observer()  # The Sentinel itself
    observer.schedule(event_handler, path_to_watch, recursive=True)
    logger.info(json.dumps({"event": "sentinel_started", "path": path_to_watch}))
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
        logger.info(json.dumps({"event": "sentinel_stopped"}))
    finally:
        observer.join()
```

The Outcome:

"By replacing the cron job with this event-driven sentinel, we eliminated the 15-minute data lag. Our pipeline became real-time. The structured logs fed directly into our monitoring system, giving us an instant audit trail of every file delivery and alerting us to any unexpected deletions. We went from a reactive, failure-prone system to a proactive, highly reliable one."

What I Learned:

"I learned that the most robust systems are built on simple, efficient feedback loops. This script isn't complex, but its value is immense because it creates a direct, real-time connection between the state of the filesystem and our business logic. It's a point of high leverage."

🎯 The Memorable Hook

"Don't poll reality; subscribe to it." This connects a simple technical implementation to a first-principles view of systems: the filesystem's state is a source of truth, and an event-driven sentinel turns every change to that truth into an immutable, machine-readable record.

💭 Inevitable Follow-ups

Q: "How would you make this script production-ready to run as a long-lived service?"

Be ready: "I'd containerize it with Docker for portability. I'd manage it with `systemd` to ensure it restarts on failure. I'd add more robust error handling and metrics to track the number of events processed and any errors in the handler itself. The logs would be shipped to a central logging platform like Elasticsearch or Loki, not just printed to stdout."

Q: "What are the limitations of this approach, especially on a very busy filesystem?"

Be ready: "The underlying OS mechanisms like `inotify` have limits on the number of watches. On an extremely busy system, the event queue in the kernel can overflow, causing you to miss events. For that scale, you might need a more distributed queuing system. However, for 99% of use cases like configuration or data drops, this approach is the most efficient and reliable."

Written by Benito J D