Grafana Dashboards: The Art of Turning Data into Decisions

Level: Mid-level Engineer | Asked at: FAANG, Cloud-Native Startups

Q: Let's discuss dashboarding. What are the different ways to create a dashboard in Grafana, and what are the trade-offs of each approach?

Why this matters: This question isn't about your ability to click buttons in a UI. It's about your philosophy of information design. Can you choose the right tool for the job? Do you prioritize speed, consistency, or customizability? Your answer reveals whether you're a dashboard consumer or a true architect of clarity.

Interview frequency: High for SRE, DevOps, and Platform roles.

❌ The Death Trap

The candidate simply lists the three methods without any context, strategic reasoning, or discussion of the "why."

"There are three ways. You can import a dashboard from grafana.com using its ID, you can import a dashboard from a JSON file, or you can create one from scratch by adding panels."

This is a correct but low-value answer. It shows you've used the tool, but not that you've thought deeply about how to use it effectively to solve business problems.

🔄 The Reframe

What they're really asking: "How do you build a shared understanding of reality for an engineering team? What's your strategy for moving from zero visibility to actionable insight, and how do you ensure that insight is consistent and scalable?"

This reframes dashboarding from a technical task into a strategic act of communication and system building. It's about creating leverage, not just charts.

🧠 The Mental Model

The "Architectural Blueprints" model. Building a dashboard is like constructing a house; you have different strategies depending on your needs.

1. Importing by ID (The Prefab Home): This is for common, well-understood problems (like node resource usage). You leverage the wisdom of the community by importing a professionally designed, battle-tested blueprint from `grafana.com`. It's the fastest way to get a solid 80% solution.

2. Importing by JSON (The Blueprint Copy Machine): This is for ensuring consistency and scale. You've already designed the right house (a custom dashboard for your service in staging). You use its JSON model to build an exact copy of that blueprint at a new location (production): the same panels, queries, and layout, with only the data source mapping left to check. This is the foundation of managing dashboards as code.

3. Creating from Scratch (The Custom Architectural Design): This is for unique, business-specific problems that no community dashboard covers. You are the architect, starting with a blank slate to design a bespoke view of your system's health. It requires the most effort but delivers the highest value by providing tailored insights.
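Importing by ID and importing a shared JSON file end in the same place: Grafana reads a dashboard JSON model and, if the author exported it for external sharing, asks which local data source should fill each `${DS_...}` placeholder declared in the model's `__inputs` list. The Python sketch below mimics that substitution step so the mechanism is visible. It is a simplified illustration, not Grafana's importer; the dashboard, the `DS_PROMETHEUS` input name, and the `prom-prod` data source UID are hypothetical examples.

```python
import json
from typing import Any


def resolve_inputs(dashboard: dict, choices: dict) -> dict:
    """Return a copy of a shared dashboard with ${INPUT} placeholders filled in.

    `choices` maps each input name declared in `__inputs` (for example
    "DS_PROMETHEUS") to the UID of a data source on *this* Grafana instance.
    Simplified sketch: Grafana's real importer handles more input types.
    """
    declared = {item["name"] for item in dashboard.get("__inputs", [])}
    missing = declared - set(choices)
    if missing:
        raise ValueError(f"no data source chosen for inputs: {sorted(missing)}")

    def substitute(node: Any) -> Any:
        # Walk the JSON tree, rebuilding containers so the original stays untouched.
        if isinstance(node, dict):
            return {key: substitute(value) for key, value in node.items()}
        if isinstance(node, list):
            return [substitute(value) for value in node]
        if isinstance(node, str):
            for name, uid in choices.items():
                node = node.replace("${" + name + "}", uid)
        return node

    # `__inputs` only describes what to ask for; it is not needed after import.
    return substitute({k: v for k, v in dashboard.items() if k != "__inputs"})


# Hypothetical dashboard as it might look after "export for sharing externally".
shared = {
    "__inputs": [{"name": "DS_PROMETHEUS", "type": "datasource", "pluginId": "prometheus"}],
    "title": "Node overview (example)",
    "panels": [{
        "type": "timeseries",
        "title": "CPU busy",
        "datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"},
        "targets": [{"expr": 'sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))'}],
    }],
}

local = resolve_inputs(shared, {"DS_PROMETHEUS": "prom-prod"})
print(json.dumps(local["panels"][0]["datasource"]))  # {"type": "prometheus", "uid": "prom-prod"}
```

Treating the data source UID as a parameter rather than a constant is what lets one JSON file move between Grafana instances whose data sources have different UIDs.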

📖 The War Story

Situation: "We had just deployed our first application to a new Kubernetes cluster. The app was running, but we were effectively flying blind. We had metrics coming into Prometheus, but no way to visualize them."

Challenge: "The SRE team needed a baseline understanding of the cluster's health *immediately*. At the same time, the application team needed a highly specific view of their service's performance and business logic, which didn't exist anywhere."

Stakes: "We were about to route live traffic to this cluster. Going live without proper visibility wasn't just risky; it was unprofessional. An incident would be a black box, impossible to debug, leading to extended downtime and a loss of customer trust."

✅ The Answer

My Thinking Process:

"My approach was to use all three methods in phases, prioritizing speed for common problems and dedicating effort to our unique business needs."

What I Did: A Three-Pronged Strategy

1. The Quick Win (Prefab Home): The first thing I did was go to `grafana.com`, grab the ID for the standard 'Node Exporter Full' dashboard (ID 1860), and enter it in Grafana's Import screen. Within five minutes, we had a comprehensive, professionally designed view of CPU, memory, and disk usage for any node in our cluster, selectable from a dropdown. This immediately gave the SRE team the baseline visibility they needed and built confidence.

2. The Custom Blueprint (Architectural Design): Next, I worked with the application developers. Their service had custom Prometheus metrics like `orders_processed_total` and `payment_latency_seconds`. No public dashboard could visualize this. So, I created a new dashboard from scratch. This involved deep-diving into PromQL in the 'Explore' view to build queries that answered specific business questions. For example, to calculate the 95th percentile latency:"

```promql
# PromQL for P95 latency of the payment service:
# per-bucket rate over 5m, summed across instances while keeping the `le` label
histogram_quantile(0.95, sum(rate(payment_latency_seconds_bucket{job="my-app"}[5m])) by (le))
```

"This resulted in a highly valuable, bespoke dashboard that directly mapped to our business KPIs. This was our 'golden' blueprint for the service."

3. Scaling with Consistency (The Copy Machine): Once we were happy with our custom dashboard in the staging environment, the question was how to promote it to production without error-prone manual recreation. I used the 'Share' -> 'Export' feature to get the dashboard's JSON model, with the option to export for sharing externally enabled so that its data source references became inputs. Then, in our production Grafana, I used the 'Import' feature, pasted that JSON, and mapped those inputs to the production Prometheus data source. The result was an identical copy of every panel and query. We then checked this JSON into our Git repository, treating our dashboards as version-controlled code."
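The P95 query in step 2 returns an estimate, not a measured latency: after `sum(rate(...)) by (le)` collapses the series to one per-second rate per bucket boundary, `histogram_quantile` finds the bucket that contains the target rank and interpolates linearly inside it. The Python sketch below reproduces that arithmetic in simplified form (classic histograms only, non-negative bucket bounds assumed, and fewer edge cases than Prometheus handles); the bucket values are made-up sample data.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Simplified sketch of Prometheus-style quantile estimation.

    `buckets` is a list of (upper_bound, cumulative_value) pairs, i.e. one
    value per `le` label after sum(rate(..._bucket[5m])) by (le). The last
    upper bound must be +Inf, as in a classic Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # First bucket whose cumulative value reaches the target rank.
    idx = next(i for i, (_, value) in enumerate(buckets) if value >= rank)
    upper, value = buckets[idx]
    if upper == math.inf:
        # Rank lies beyond the largest finite bound, so that bound is all we can report.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Assume observations are spread evenly across the bucket and interpolate.
    return lower + (upper - lower) * (rank - below) / (value - below)


# Made-up per-second rates per `le` boundary (cumulative, as Prometheus stores them).
payment_buckets = [(0.1, 50.0), (0.25, 80.0), (0.5, 95.0), (1.0, 99.0), (math.inf, 100.0)]

print(histogram_quantile(0.95, payment_buckets))            # 0.5
print(round(histogram_quantile(0.90, payment_buckets), 4))  # 0.4167
print(histogram_quantile(0.995, payment_buckets))           # 1.0
```

Two practical consequences follow: the estimate can never be more precise than the bucket boundaries chosen in the application's instrumentation, and a P95 that sits exactly at the largest finite `le` value usually means the buckets stop too low.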

The Outcome:

"By using this tiered approach, we went from zero visibility to comprehensive, multi-layered observability in a single afternoon. The SRE team had their infrastructure view, and the app team had their business view. When we went live, we could see the health of the system from the bare metal all the way up to the customer transaction, all because we chose the right dashboarding strategy for each layer of the problem."

What I Learned:

"I learned that dashboarding is a spectrum. On one end is speed and leveraging community knowledge; on the other is deep, custom work to create unique business value. A mature observability strategy uses the entire spectrum. You don't build from scratch what you can borrow, and you don't borrow when you need a tailored solution."

🎯 The Memorable Hook

"A good dashboard isn't a collection of charts. It's an opinionated story about what matters. The three creation methods are just different ways of telling that story: borrowing a classic, copying a bestseller, or writing your own."

This connects the technical methods to the act of storytelling and communication, demonstrating a higher level of abstract thinking.

💭 Inevitable Follow-ups

Q: "How do you make dashboards dynamic and reusable, so you don't have to create one for every service or environment?"

Be ready: "Through template variables. I'd add a data source variable, `$datasource`, plus query variables for things like `$namespace` or `$service`. For example, a query variable using `label_values(my_metric, service)` automatically populates a dropdown with every service that reports that metric, and panels filter on it with a matcher like `{service=~"$service"}`. One dashboard can then show the context for any service in any environment, instead of a near-identical copy maintained per service."
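Here is roughly what those variables look like inside a dashboard's JSON model, built from Python so the structure is easy to read and reuse. It is a sketch: the field names (`templating.list`, `type`, `query`, `refresh`, `multi`, `includeAll`) follow dashboards exported from Grafana, but the exact shape varies between Grafana versions, and `http_requests_total` and both helper functions are hypothetical.

```python
import json


def prometheus_variables(metric: str, label: str) -> list:
    """Return a templating list: a data source picker plus a label-driven dropdown."""
    datasource_var = {
        "name": "datasource",
        "type": "datasource",
        "query": "prometheus",  # offer every Prometheus data source on the instance
    }
    label_var = {
        "name": label,
        "type": "query",
        # Query whichever data source the user picked above, not a hard-coded one.
        "datasource": {"type": "prometheus", "uid": "${datasource}"},
        "query": f"label_values({metric}, {label})",
        "refresh": 1,  # re-run the query when the dashboard loads
        "multi": True,
        "includeAll": True,
    }
    return [datasource_var, label_var]


def panel_query(label: str) -> str:
    """Build a hypothetical request-rate query filtered by the variable named `label`."""
    # =~ (regex match) lets multi-value or "All" selections expand into an alternation.
    return f'sum by ({label}) (rate(http_requests_total{{{label}=~"${label}"}}[5m]))'


dashboard = {
    "title": "Service overview (example)",
    "templating": {"list": prometheus_variables("my_metric", "service")},
    "panels": [{
        "type": "timeseries",
        "title": "Requests per second",
        "datasource": {"type": "prometheus", "uid": "${datasource}"},
        "targets": [{"expr": panel_query("service")}],
    }],
}

print(json.dumps(dashboard["templating"]["list"][1], indent=2))
print(dashboard["panels"][0]["targets"][0]["expr"])
```

The `=~` matcher matters: with multi-select or the "All" option enabled, Grafana formats `$service` for Prometheus as a regex alternation (such as `(checkout|payments)`), which an exact `=` match would not accept. A `$namespace` variable would follow the same pattern.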

Q: "How do you manage dashboards at scale and prevent 'dashboard rot'?"

Be ready: "By treating dashboards as code. We store the JSON models in a dedicated Git repository, and changes are made via pull requests. This gives us version history, peer review, and the ability to provision a new Grafana instance with all our standard dashboards automatically, using Grafana's file-based dashboard provisioning. We also use folders and tags within Grafana to keep things organized by team and service."
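Pull-request review catches more when a small automated check runs first in CI. The Python sketch below assumes dashboards are committed as plain JSON model files. It flags a few rot-prone patterns (no `uid`, an instance-specific numeric `id`, no tags, panels pinned to one concrete data source UID instead of a variable) and serializes the model with sorted keys so diffs stay readable. The rules and the `staging-prom` UID are illustrative choices, not a built-in Grafana feature.

```python
import json


def walk_panels(panels: list):
    """Yield every panel, including those nested inside collapsed rows."""
    for panel in panels:
        yield panel
        yield from walk_panels(panel.get("panels", []))


def find_problems(dashboard: dict) -> list:
    """Return human-readable warnings for patterns that tend to cause dashboard rot."""
    problems = []
    if not dashboard.get("uid"):
        problems.append("missing uid (links rely on a stable uid)")
    if dashboard.get("id") is not None:
        problems.append("numeric id present (instance-specific; drop it before committing)")
    if not dashboard.get("tags"):
        problems.append("no tags (hard to find by team or service)")
    for panel in walk_panels(dashboard.get("panels", [])):
        ds = panel.get("datasource")
        uid = ds.get("uid") if isinstance(ds, dict) else ds
        # A value starting with "$" is a variable such as ${datasource}; anything else is pinned.
        if isinstance(uid, str) and not uid.startswith("$"):
            problems.append(f"panel {panel.get('title', '?')!r} pins data source {uid!r}")
    return problems


def normalize(dashboard: dict) -> str:
    """Serialize for Git: drop the instance-specific id and sort keys for stable diffs."""
    clean = {key: value for key, value in dashboard.items() if key != "id"}
    return json.dumps(clean, indent=2, sort_keys=True) + "\n"


# Hypothetical dashboard exported from staging.
exported = {
    "id": 42,
    "uid": "payments-overview",
    "title": "Payments",
    "tags": ["payments"],
    "panels": [
        {"title": "P95 latency", "datasource": {"type": "prometheus", "uid": "staging-prom"}},
        {"title": "Orders/s", "datasource": {"type": "prometheus", "uid": "${datasource}"}},
    ],
}

for problem in find_problems(exported):
    print(problem)
```

A CI job could run `find_problems` over every file in the repository, fail the build on a non-empty result, and require committed files to match `normalize` output, so reviewers see meaningful changes rather than key reordering.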