The Concurrency Contract: How to Stop Wasting Time Waiting

Level: Junior/Mid Engineer · Asked at: Stripe, Netflix, any company with microservices

Q: How would you run two functions in parallel in Python? For example, two functions that each call a slow API.

Why this matters: This question tests your understanding of leverage. A computer spends most of its time waiting—for the network, for the disk, for a database. This question asks: "Are you smart enough to do something useful during that waiting time?" Your answer shows whether you can write efficient, high-throughput systems or whether you write code that sits around doing nothing.

Interview frequency: High. Essential for any role building backend services or complex automation.

❌ The Death Trap

The trap is either not knowing how to do it at all, or reaching for the low-level `threading` module without a clear reason. The `threading` module is powerful, but it's like building your own furniture from raw lumber: you manage thread creation, joining, and error handling yourself, and it's easy to get wrong.

A junior candidate might write this:

# They just run the functions one after another.
result1 = call_slow_api_1()
result2 = call_slow_api_2()

This is a failure to answer the question. If each API call takes 2 seconds, the total time is 4 seconds. The program spends 2 seconds doing nothing while waiting for the first API, then another 2 seconds doing nothing while waiting for the second. It's a massive waste of time.

🔄 The Reframe

What they're really asking: "When your code is blocked waiting for something, how do you reclaim that time? Show me you know the modern, safe, and high-level tools for managing concurrency in Python."

This reveals if you are up-to-date with the Python standard library. It shows you prefer well-abstracted, safer tools (`concurrent.futures`) over lower-level, more error-prone ones (`threading`). It's a test of good judgment.

🧠 The Mental Model

I use the **"The Expert Chef"** analogy. Your program is a chef in a kitchen.

1. The Bad Chef (Sequential): Puts a pot of water on the stove and stares at it for 10 minutes until it boils. Only then do they start chopping vegetables. Inefficient.
2. The Expert Chef (Concurrent): Puts the water on to boil (makes the first API call). While waiting, they immediately start chopping vegetables (makes the second API call). The total time is governed by the longest single task, not the sum of all tasks. This is what we want.

📖 The War Story

Situation: "We had a dashboard for our support team that, on loading a customer's profile, needed to fetch data from three different microservices: the user service, the billing service, and the order history service."

Challenge: "The original code was written sequentially. The user service took 500ms, billing took 800ms, and orders took 1200ms. The dashboard took 500 + 800 + 1200 = 2.5 seconds to load. Our support agents, who handle dozens of calls an hour, were complaining that the system felt sluggish."

Stakes: "Slow tools lead to longer customer call times and frustrated support agents. Every second we could shave off that load time was a direct improvement in operational efficiency and employee morale. The business was losing money because our code was lazy."
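The fix for that dashboard might look like the sketch below: fire all three requests at once so the total load time is governed by the slowest service (~1.2s), not the sum (~2.5s). The function names, arguments, and payloads here are hypothetical stand-ins for the real microservice clients.

```python
import time
import concurrent.futures

# Hypothetical stand-ins for the three microservice calls.
def fetch_user(customer_id):
    time.sleep(0.5)   # user service: ~500ms
    return {"name": "Ada"}

def fetch_billing(customer_id):
    time.sleep(0.8)   # billing service: ~800ms
    return {"balance": 42}

def fetch_orders(customer_id):
    time.sleep(1.2)   # order history service: ~1200ms
    return [{"order_id": 1}]

def load_profile(customer_id):
    # Submit all three calls before waiting on any of them, so the
    # total time is roughly max(0.5, 0.8, 1.2) instead of the sum.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        user_f = executor.submit(fetch_user, customer_id)
        billing_f = executor.submit(fetch_billing, customer_id)
        orders_f = executor.submit(fetch_orders, customer_id)
        return {
            "user": user_f.result(),
            "billing": billing_f.result(),
            "orders": orders_f.result(),
        }
```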

✅ The Answer

"The best way to handle this in modern Python is with the `concurrent.futures` module, specifically `ThreadPoolExecutor`. It's a high-level abstraction that makes managing a group of threads simple and safe."

The Code: From Slow to Fast

First, let's define our slow, I/O-bound functions that simulate calling APIs.

import time
import concurrent.futures

def fetch_user_data():
    print("Fetching user data...")
    time.sleep(2)  # Simulate a 2-second network call
    print("User data fetched.")
    return {"user": "Naval"}

def fetch_billing_data():
    print("Fetching billing data...")
    time.sleep(2)  # Simulate a 2-second network call
    print("Billing data fetched.")
    return {"balance": 100}

# --- The SLOW, Sequential Way ---
start_time = time.time()
user_data = fetch_user_data()
billing_data = fetch_billing_data()
end_time = time.time()
print(f"\nSequential execution took: {end_time - start_time:.2f} seconds")

# --- The FAST, Concurrent Way ---
start_time = time.time()
# The 'with' statement ensures threads are cleaned up properly.
with concurrent.futures.ThreadPoolExecutor() as executor:
    # submit() schedules a function to be run and returns a Future object.
    user_future = executor.submit(fetch_user_data)
    billing_future = executor.submit(fetch_billing_data)
    
    # .result() waits for the function to complete and returns its result.
    user_data = user_future.result()
    billing_data = billing_future.result()

end_time = time.time()
print(f"Concurrent execution took: {end_time - start_time:.2f} seconds")

The sequential version takes about 4 seconds. The concurrent version takes about 2 seconds. We've effectively cut the execution time in half by performing the "waiting" in parallel. The `ThreadPoolExecutor` manages all the complexity of creating, running, and joining threads for us.
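When you submit more than a couple of tasks, you often want to handle each result the moment it's ready rather than in submission order. `concurrent.futures.as_completed` does exactly that; a minimal sketch (the `fetch` function is a made-up placeholder):

```python
import time
import concurrent.futures

def fetch(name, delay):
    # Placeholder for a real network call.
    time.sleep(delay)
    return name

# as_completed yields each Future the moment it finishes, so fast
# responses are processed without waiting for slow ones.
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(fetch, "billing", 0.2),
               executor.submit(fetch, "user", 0.1)]
    finished_order = [f.result()
                      for f in concurrent.futures.as_completed(futures)]

print(finished_order)  # "user" (0.1s) completes before "billing" (0.2s)
```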

🎯 The Memorable Hook

The hook: "Sequential code pays for every wait in full; concurrent code only pays for the longest one." This elevates the discussion from a mere optimization technique to a fundamental principle of value creation in software. It shows you think about impact, not just implementation.

💭 Inevitable Follow-ups

Q: "Why does this work for I/O-bound tasks? What about Python's Global Interpreter Lock (GIL)?"

Be ready: "The GIL is a lock in CPython that ensures only one thread executes Python bytecode at a time. This would seem to prevent parallelism. However, for I/O-bound operations like a network call, the Python thread releases the GIL while it's waiting for the operating system to respond. This allows another thread to acquire the GIL and start its own network call. So while we don't get true parallelism for Python code, we get concurrency for waiting, which is exactly what we need for I/O tasks."

Q: "When would this approach be a bad idea? When would you use `multiprocessing` instead?"

Be ready: "This is a terrible idea for CPU-bound tasks, like calculating a million Fibonacci numbers. Because of the GIL, the threads would just fight over the lock and could even run slower than the sequential version. For CPU-bound work where you need true parallelism, you must use the `multiprocessing` module. It creates separate processes, each with its own Python interpreter and memory, bypassing the GIL entirely."
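The switch is nearly mechanical, because `ProcessPoolExecutor` shares the same interface as `ThreadPoolExecutor`. A sketch, using a deliberately CPU-bound prime-counting function invented for illustration:

```python
import concurrent.futures

def count_primes(limit):
    # Deliberately CPU-bound: pure computation, nothing to wait on,
    # so the GIL would serialize this work across threads.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Each task runs in a separate process with its own interpreter
    # and its own GIL, so the chunks run in true parallel on the CPU.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(count_primes, [20_000] * 4))
    print(results)
```

The `if __name__ == "__main__":` guard is required on platforms that spawn worker processes (Windows, macOS), since each worker re-imports the module.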

🔄 Adapt This Framework

If you're junior: Correctly identifying `concurrent.futures.ThreadPoolExecutor` as the solution and writing the working code is a fantastic answer. It shows you know the modern, correct tool for the job.

If you're senior: You should lead with `ThreadPoolExecutor` and immediately explain *why* it's the right choice for I/O-bound tasks, proactively bringing up the GIL and contrasting it with CPU-bound tasks and `multiprocessing`. You're expected to demonstrate a deep understanding of the underlying constraints.

Written by Benito J D