The Filesystem Compass: Navigating Directories with Precision

Junior/Mid Engineer | Asked at: Microsoft, Azure, Enterprises

Q: How do you recursively list all files with a certain extension in PowerShell?

Why this matters: This question is a performance test disguised as a file search question. The answer separates those who know the command from those who understand the system. On a file server with millions of files, the difference between the naive answer and the professional answer is the difference between a 3-second query and a 30-minute server meltdown.

Interview frequency: Very high. Tests fundamental knowledge of efficient file system interaction.

❌ The Death Trap

The candidate gives the most common, intuitive, and dangerously inefficient answer. They retrieve every single item from the file system and then filter them inside PowerShell. This is the brute-force approach.

"The performance nightmare:

Get-ChildItem -Path C:\... -Recurse | Where-Object { $_.Extension -eq '.log' }

This command tells PowerShell: 'Enumerate every single file and folder, turn each one into a rich object, send them all down the pipeline, and *then* I'll look at the extension and decide which ones to keep.' You've made PowerShell build and discard objects for a mountain of items you never wanted."

🔄 The Reframe

What they're really asking: "I need to find a needle in a haystack the size of a continent. Are you going to ship the entire haystack to my office for me to sift through, or do you know how to ask the haystack for the needle directly?"

This tests your understanding of "shifting left" on performance. The most efficient work is the work that you can get someone else—in this case, the highly-optimized file system provider—to do for you. It's about delegation and leverage, not just execution.

🧠 The Mental Model

I call this the "Librarian's Request" principle. You wouldn't ask a librarian to bring you every book in the library so you can find the ones about dragons. You go to the card catalog first.

1. Specify the Aisle (`-Path`): Tell the librarian which section of the library you're interested in.
2. Use the Card Catalog (`-Filter`): Give the librarian a precise query that they can resolve themselves, before anything is carried to your desk. This is usually the fastest way to narrow the search.
3. Receive Only the Results: The librarian only brings you the books that match your specific request, saving everyone a massive amount of effort.

📖 The War Story

Situation: "We needed to write a script to run a daily audit on a massive Windows file server. The goal was to find all `.mp3` and `.avi` files in user home directories, which violated company policy. The server stored millions of files."

Challenge: "The first version of the script, written by a junior admin, used the `Get-ChildItem | Where-Object` pattern. When it ran at 2 AM, the server's CPU and memory usage shot to 100%. The script took over 8 hours to complete and caused performance degradation that impacted our offices in Asia as they started their workday."

✅ The Answer

My Thinking Process:

"The problem is not PowerShell; it's the inefficient use of it. The script was asking PowerShell to do the filtering, which is slow. I need to tell PowerShell to ask the *file system provider* to do the filtering, which is written in low-level code and is orders of magnitude faster."

What I'd Do:

"The correct and performant way to do this is with the `-Filter` parameter. It pushes the filtering logic down to the file system provider itself."

Get-ChildItem -Path C:\inetpub\logs -Recurse -Filter *.log

"Here is the breakdown of why this is the professional's choice:

  • Get-ChildItem: The cmdlet for interacting with items in a provider (in this case, the filesystem). Alias is `gci`.
  • -Path C:\inetpub\logs: The 'aisle'—where to start looking.
  • -Recurse: Search this directory and all subdirectories.
  • -Filter *.log: The 'card catalog request'. This is the magic. The pattern is handed to the file system provider, which applies its own native wildcard matching while it enumerates. PowerShell never creates objects for files that *don't* match the filter, which is where the performance gain comes from. Note that `-Filter` takes a single pattern string.
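
A rough way to check this claim is to time both patterns against a throwaway directory tree. The sketch below assumes a temporary folder and invented file names (`app1.log`, `note1.txt`); on a tree this small the timings will be close and noisy, so treat them as illustrative only. The gap tends to show up on large trees.

```powershell
# Build a small disposable tree (all names here are made up for the demo).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("gci-demo-" + [guid]::NewGuid())
$sub  = Join-Path $root 'nested'
New-Item -ItemType Directory -Path $sub | Out-Null
1..50 | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $root "app$_.log")  | Out-Null
    New-Item -ItemType File -Path (Join-Path $sub  "note$_.txt") | Out-Null
}

# Pattern A: enumerate everything, then filter in the pipeline.
$viaWhere = @(Get-ChildItem -Path $root -Recurse | Where-Object { $_.Extension -eq '.log' })
$tWhere   = Measure-Command {
    Get-ChildItem -Path $root -Recurse | Where-Object { $_.Extension -eq '.log' }
}

# Pattern B: let the provider filter during enumeration.
$viaFilter = @(Get-ChildItem -Path $root -Recurse -Filter '*.log')
$tFilter   = Measure-Command {
    Get-ChildItem -Path $root -Recurse -Filter '*.log'
}

# Both patterns should return the same set of files; only the cost differs.
"Where-Object: {0} files in {1:N1} ms" -f $viaWhere.Count,  $tWhere.TotalMilliseconds
"-Filter:      {0} files in {1:N1} ms" -f $viaFilter.Count, $tFilter.TotalMilliseconds

# Clean up the demo tree.
Remove-Item -Path $root -Recurse -Force
```

Running each measurement several times and comparing the medians gives a fairer picture than a single run, because the first pass also warms the file system cache.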

"To demonstrate advanced knowledge, I would also explain the difference between `-Filter` and `-Include`."

"`-Filter` is always the fastest because it's provider-side. `-Include` is a PowerShell-side filter that happens after the provider returns items, so it's less efficient. A simple rule is: always use `-Filter` if you can. Use `-Include` only when you need to specify multiple, non-trivial patterns like `Get-ChildItem -Include *.log, *.txt`."

The Outcome:

"I rewrote the audit script to use `-Filter`. The new command was `Get-ChildItem -Path \\server\homedirs -Recurse -Filter *.mp3`. A second command ran for `.avi`. The script that previously took 8 hours and crashed the server now completed in under 5 minutes with negligible CPU impact. We could run it hourly without anyone noticing."

🎯 The Memorable Hook

Knowing your tool is good. Knowing how your tool interacts with the underlying system is better. The best engineers understand that their code is just one part of a larger system, and they leverage the strengths of every part of that system for maximum efficiency.

💭 Inevitable Follow-ups

Q: "How would you find only directories, not files?"

Be ready: "You can use the dedicated `-Directory` switch: `Get-ChildItem -Path C:\... -Directory`. Similarly, `-File` will get only files. This is clearer and often faster than piping to `Where-Object { $_.PSIsContainer }`."
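
A minimal sketch of the two switches next to the older `PSIsContainer` pattern, run against a temporary folder. The folder and file names are assumptions made up for the demo.

```powershell
# Disposable folder with one subdirectory and one file (invented names).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("dir-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path (Join-Path $root 'logs')  | Out-Null
New-Item -ItemType File -Path (Join-Path $root 'readme.txt') | Out-Null

# Dedicated switches: directories only, then files only.
$dirs  = @(Get-ChildItem -Path $root -Directory)
$files = @(Get-ChildItem -Path $root -File)

# Older pattern: fetch everything, then keep the containers.
$legacyDirs = @(Get-ChildItem -Path $root | Where-Object { $_.PSIsContainer })

"Directories: $($dirs.Name -join ', ')"
"Files:       $($files.Name -join ', ')"
"Legacy dirs: $($legacyDirs.Name -join ', ')"

# Clean up the demo folder.
Remove-Item -Path $root -Recurse -Force
```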

Q: "How would you combine this to find all `.log` files and then get their total size?"

Be ready: "You'd pipe the efficient search to `Measure-Object`: `Get-ChildItem -Path C:\... -Recurse -Filter *.log | Measure-Object -Property Length -Sum`. This shows you can compose cmdlets to build a full data-processing pipeline."
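
As a small worked example of that pipeline, the sketch below writes files of known sizes into a temporary folder and sums only the `.log` files. The folder, the file names, and the sizes are assumptions chosen so the total is easy to check by hand.

```powershell
# Disposable folder with files of known size (names and sizes are made up).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("size-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
[System.IO.File]::WriteAllBytes((Join-Path $root 'a.log'), (New-Object byte[] 1024))
[System.IO.File]::WriteAllBytes((Join-Path $root 'b.log'), (New-Object byte[] 2048))
[System.IO.File]::WriteAllBytes((Join-Path $root 'c.txt'), (New-Object byte[] 4096))

# Provider-side search piped straight into the aggregation.
$stats = Get-ChildItem -Path $root -Recurse -File -Filter '*.log' |
    Measure-Object -Property Length -Sum

# Two .log files, 1024 + 2048 = 3072 bytes; c.txt is never counted.
"Files: $($stats.Count)"
"Bytes: $($stats.Sum)"

# Clean up the demo folder.
Remove-Item -Path $root -Recurse -Force
```

`Measure-Object` also accepts `-Average`, `-Maximum`, and `-Minimum` on the same property if the follow-up question moves from total size to the largest or typical log file.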

Written by Benito J D