AWS Quota Monitoring: A ChatGPT Success Story

December 30, 2024

AWS

ChatGPT

DevOps

Automation

SRE

AWS Quota Monitoring: A ChatGPT Success Story

Too many AWS outages start the same way: “We hit a quota limit we didn’t know existed.”

I’ve seen this happen multiple times across different environments. It’s almost always discovered after the fact—after something breaks. Teams don’t anticipate it the first time, but once it happens, they realize how preventable it could have been with just a bit of foresight.

The ChatGPT Experiment

In a recent ChatGPT exchange, I explored how to get ahead of that moment. After some thoughtful setup, a single question yielded a complete and practical approach for tracking AWS service quotas across accounts and regions.

What impressed me most wasn’t just the immediate solution, but the forward-looking framework: something teams can implement today to monitor and alert on quota usage, then grow into a governance-ready tool that supports scaling and DR planning.

The Key Question

My goal wasn’t just to solve quota monitoring—it was to demonstrate how ChatGPT can deliver high-value, structured guidance when provided with the right context. Here’s the question that unlocked the comprehensive response:

“If one of the desired solution features is to know if a quota resource limit has been changed from a default value, would it be possible to: use Config to get resources containing list of regions used, get the list of current resource limits for each resource type in a default region (say us-east-1), verify the limit has not been increased and store this initial value as a default for the region, and then on subsequent scans, report only on limit increases that have been implemented and optionally track request history as well and store this detail in a centralized data store that can store organization-wide resource limits across all accounts, with reporting optimized to focus on increases as a way for insight and governance and potentially mitigation support—say for ensuring DR regions already have provisioned limit increases in case DR is needed into a region that is sparsely populated?”

The Response Framework

ChatGPT provided a complete solution architecture:

Step 1: Identify Active AWS Regions
Use AWS Config, Cost Explorer, or SDK methods to reliably list all active regions in use.

Step 2: Capture Default Limits
Use the AWS Service Quotas API to retrieve initial quota limits per resource type and store these as baseline values for comparison.

Step 3: Identify Deviations from Defaults
During subsequent scans, compare current quotas to stored baselines and identify explicit increases or changes.

Step 4: Store Data Centrally
Maintain baselines, changes, and quota-increase-request history in a centralized data store (DynamoDB, S3, or structured database).

Step 5: Optimized Reporting for Governance
Create targeted dashboards emphasizing changes rather than static values, with clear insight into DR preparedness.

The Real Value

This isn’t just about managing AWS quotas—it’s about demonstrating how thoughtful framing can unlock actionable, low-effort solutions that address both immediate needs and long-term planning.

The response included critical considerations I hadn’t thought of:

AWS defaults often differ across regions (capture baselines per region)
Store quota-increase-request metadata for auditability
Integrate with existing governance dashboards
Highlight DR readiness explicitly in reporting

Key Takeaways

For AWS Operations: Proactive quota monitoring prevents outages and supports DR planning. The framework scales from immediate alerting to organization-wide governance.

For AI Tool Usage: With proper context and specific questions, ChatGPT can deliver comprehensive, implementable solutions that go beyond surface-level advice.

For SRE Practice: The best reliability improvements often come from preventing problems you haven’t had yet. Quota monitoring is infrastructure insurance—boring until you need it.

The Bottom Line

Sometimes the most valuable solutions come from asking the right questions of the right tools. In this case, a single well-crafted prompt delivered a complete strategy that addresses immediate operational needs while building toward long-term governance and resilience.

Because the alternative—discovering quota limits during an outage—is a lesson nobody wants to learn the hard way.