Introducing Gradual: A Concurrency-First Stress Testing Framework

Traditional stress testing tools focus on Requests Per Second (RPS) as the primary metric. While RPS is important, it doesn't tell the full story of how your system behaves under concurrent load. This is especially true in high-latency environments like fintech, where concurrent users can create bottlenecks that simple RPS testing might miss.

That's why I'm building Gradual, a concurrency-first stress testing framework designed to simulate real-world user load patterns.

The Problem with RPS-Based Testing

Most stress testing tools work like this:

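# Traditional RPS-based approach
import time

TARGET_RPS = 1000

for i in range(1000):  # 1000 requests in total
    send_request()  # placeholder for the actual HTTP call
    time.sleep(1 / TARGET_RPS)  # fixed spacing, at most ~1000 RPS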

This approach has several limitations:

Unrealistic Load Patterns: Real users don't make requests at perfectly spaced intervals
Misses Concurrency Issues: Doesn't simulate multiple users hitting the system simultaneously
Poor Latency Simulation: Doesn't account for network latency and user think time
Limited Bottleneck Discovery: May miss issues that only appear under concurrent access
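To put numbers on the difference, here is a minimal sketch based on Little's Law for a closed-loop test, where each simulated user waits for a response and then pauses before sending the next request. The function name and the figures are illustrative assumptions, not part of Gradual.

```python
def closed_loop_throughput(users: int, response_time_s: float, think_time_s: float) -> float:
    """Requests per second produced by `users` closed-loop users (Little's Law)."""
    cycle_time_s = response_time_s + think_time_s
    if cycle_time_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return users / cycle_time_s


# 100 users, 0.5 s responses, 4.5 s think time between requests
healthy = closed_loop_throughput(100, 0.5, 4.5)
# Same 100 users, but the backend has slowed down to 5.5 s responses
degraded = closed_loop_throughput(100, 5.5, 4.5)
print(healthy, degraded)
```

With a fixed number of users, throughput drops as the system slows down, which is how real users behave. A fixed-rate generator keeps sending at the same pace no matter how the system responds.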

Enter Gradual: Concurrency-First Design

Gradual takes a different approach by focusing on concurrent users rather than requests per second:

# gradual.yaml
config:
  name: "User Login Flow"
  duration: "5m"
  ramp_up: "additive"  # or "multiplicative"
  
users:
  - name: "authenticated_user"
    concurrency: 100
    behavior:
      - action: "login"
        think_time: "2-5s"  # Random think time
      - action: "browse_dashboard"
        think_time: "5-15s"
      - action: "view_reports"
        think_time: "3-8s"
      - action: "logout"
        think_time: "1-3s"
    
  - name: "guest_user"
    concurrency: 50
    behavior:
      - action: "view_public_data"
        think_time: "1-3s"
      - action: "register"
        think_time: "10-30s"
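Conceptually, a config like this boils down to N independent users, each walking through its steps with a random pause in between. The sketch below shows that idea with asyncio. It is not Gradual's implementation: `fake_action`, `simulate_user`, and `run_users` are hypothetical names, the HTTP call is replaced by a short sleep, and think times are scaled down to milliseconds so the example finishes quickly.

```python
import asyncio
import random


async def fake_action(name: str) -> None:
    """Stand-in for a real HTTP call; sleeps briefly to mimic latency."""
    await asyncio.sleep(random.uniform(0.001, 0.005))


async def simulate_user(user_id: int, steps: list, log: list) -> None:
    """Run each step in order, pausing for a random think time after each one."""
    for action, think_min, think_max in steps:
        await fake_action(action)
        log.append((user_id, action))
        await asyncio.sleep(random.uniform(think_min, think_max))


async def run_users(concurrency: int, steps: list) -> list:
    """Start `concurrency` users at once and wait for all of them to finish."""
    log: list = []
    await asyncio.gather(*(simulate_user(i, steps, log) for i in range(concurrency)))
    return log


# (action, min think time, max think time) in seconds, scaled down for the demo
steps = [
    ("login", 0.002, 0.005),
    ("browse_dashboard", 0.005, 0.015),
    ("logout", 0.001, 0.003),
]
log = asyncio.run(run_users(concurrency=10, steps=steps))
print(len(log))
```

Because every user runs as its own coroutine, requests from different users interleave in whatever order their think times dictate, rather than arriving at fixed intervals.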

Key Features

1. YAML-Based Configuration

No more writing complex tick functions. Define your test scenarios in simple YAML:

actions:
  login:
    method: "POST"
    url: "/api/auth/login"
    headers:
      Content-Type: "application/json"
    body:
      username: "{{user.username}}"
      password: "{{user.password}}"
    assertions:
      - status_code: 200
      - response_time: "< 500ms"
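An assertion such as `response_time: "< 500ms"` has to be turned into a comparison at some point. Here is one way that could work, as a sketch only: `check_response_time` is a hypothetical helper, and the supported operators and units (`ms`, `s`) are my assumptions rather than Gradual's actual grammar.

```python
import operator
import re

_UNIT_TO_MS = {"ms": 1.0, "s": 1000.0}
_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}
_PATTERN = re.compile(r"\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*")


def check_response_time(expression: str, observed_ms: float) -> bool:
    """Evaluate an expression like '< 500ms' against an observed latency in ms."""
    match = _PATTERN.fullmatch(expression)
    if match is None:
        raise ValueError(f"unsupported assertion: {expression!r}")
    op, value, unit = match.groups()
    limit_ms = float(value) * _UNIT_TO_MS[unit]
    return _OPERATORS[op](observed_ms, limit_ms)


print(check_response_time("< 500ms", 320.0))
print(check_response_time("< 500ms", 500.0))
```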

2. Flexible Ramp-Up Strategies

Gradual supports both additive and multiplicative concurrency ramping:

# Additive: Add 10 users every 30 seconds
additive_ramp = {
    "strategy": "additive",
    "initial": 10,
    "increment": 10,
    "interval": "30s",
    "max_concurrency": 100
}

# Multiplicative: Double users every minute
multiplicative_ramp = {
    "strategy": "multiplicative",
    "initial": 5,
    "multiplier": 2,
    "interval": "1m",
    "max_concurrency": 80
}
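To see what these two configs mean in practice, the sketch below computes the concurrency level at each interval. `ramp_schedule` is a hypothetical helper written for this post, not a Gradual API; it takes the same values as the dictionaries above and assumes the level is capped at `max_concurrency`.

```python
def ramp_schedule(strategy: str, initial: int, step: int, max_concurrency: int) -> list:
    """Return the concurrency level for each interval until the cap is reached."""
    if initial <= 0 or max_concurrency < initial:
        raise ValueError("need 0 < initial <= max_concurrency")
    if strategy == "additive":
        if step <= 0:
            raise ValueError("increment must be positive")
        advance = lambda current: current + step
    elif strategy == "multiplicative":
        if step <= 1:
            raise ValueError("multiplier must be greater than 1")
        advance = lambda current: current * step
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    levels = [initial]
    while levels[-1] < max_concurrency:
        # Never exceed the configured cap, even if the next step overshoots it
        levels.append(min(advance(levels[-1]), max_concurrency))
    return levels


additive = ramp_schedule("additive", initial=10, step=10, max_concurrency=100)
multiplicative = ramp_schedule("multiplicative", initial=5, step=2, max_concurrency=80)
print(additive)
print(multiplicative)
```

Under these assumptions the additive ramp needs nine 30-second steps (4.5 minutes) to go from 10 to 100 users, while the multiplicative ramp reaches 80 users after four one-minute doublings.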

3. Realistic User Behavior

Simulate real user patterns with think times and conditional logic:

behavior:
  - action: "login"
    think_time: "2-5s"
    success_rate: 0.95  # 95% success rate
    
  - action: "conditional_action"
    condition: "{{response.status == 'success'}}"
    think_time: "5-10s"
    
  - action: "error_handling"
    condition: "{{response.status == 'error'}}"
    think_time: "1-2s"
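A think time such as "2-5s" is a range to sample from. A minimal sketch of how it could be parsed and sampled follows; `parse_think_time` and `sample_think_time` are hypothetical names, and I'm assuming a uniform distribution and a seconds-only `min-max` format.

```python
import random
import re

_THINK_TIME = re.compile(r"(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)s")


def parse_think_time(spec: str) -> tuple:
    """Parse a 'min-max' range in seconds, e.g. '2-5s' -> (2.0, 5.0)."""
    match = _THINK_TIME.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unsupported think_time: {spec!r}")
    low, high = float(match.group(1)), float(match.group(2))
    if low > high:
        raise ValueError(f"think_time range is reversed: {spec!r}")
    return low, high


def sample_think_time(spec: str, rng=None) -> float:
    """Draw one pause length, in seconds, uniformly from the configured range."""
    low, high = parse_think_time(spec)
    source = rng if rng is not None else random
    return source.uniform(low, high)


pause = sample_think_time("2-5s", rng=random.Random(42))
print(parse_think_time("2-5s"), 2.0 <= pause <= 5.0)
```

Passing a seeded `random.Random` makes a test run reproducible, which helps when comparing results between runs.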

4. Pluggable Reporting

Gradual uses an adapter-based reporting system:

from gradual.reporting import BaseReporter

class CustomReporter(BaseReporter):
    def on_request_complete(self, request):
        # Custom logic for each completed request
        self.metrics.record_latency(request.duration)
        self.metrics.record_status(request.status_code)
    
    def on_test_complete(self, summary):
        # Generate custom report
        self.generate_html_report(summary)
        self.send_slack_notification(summary)
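Since the framework is still in development, here is a self-contained sketch of the adapter idea that runs without Gradual installed. The `BaseReporter` below is a stand-in I defined for illustration, and `CompletedRequest` and `LatencySummaryReporter` are hypothetical names; the real interface may differ.

```python
import statistics
from dataclasses import dataclass


@dataclass
class CompletedRequest:
    """Hypothetical record of one finished request."""
    duration: float  # seconds
    status_code: int


class BaseReporter:
    """Stand-in for the reporter interface; only the two hooks shown above."""

    def on_request_complete(self, request: CompletedRequest) -> None:
        raise NotImplementedError

    def on_test_complete(self, summary=None):
        raise NotImplementedError


class LatencySummaryReporter(BaseReporter):
    """Collects per-request latencies and summarizes them at the end."""

    def __init__(self) -> None:
        self.durations = []
        self.errors = 0

    def on_request_complete(self, request: CompletedRequest) -> None:
        self.durations.append(request.duration)
        if request.status_code >= 400:
            self.errors += 1

    def on_test_complete(self, summary=None) -> dict:
        if not self.durations:
            return {"count": 0, "errors": 0}
        return {
            "count": len(self.durations),
            "errors": self.errors,
            "median_s": statistics.median(self.durations),
            "max_s": max(self.durations),
        }


reporter = LatencySummaryReporter()
for duration, status in [(0.1, 200), (0.2, 200), (0.3, 500)]:
    reporter.on_request_complete(CompletedRequest(duration, status))
report = reporter.on_test_complete()
print(report)
```

The test runner only needs to know about the two hooks, so swapping in an HTML or Slack reporter means writing another subclass rather than touching the core.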

Architecture Overview

Gradual is built using clean architecture principles:

gradual/
├── core/           # Domain logic
│   ├── user.py     # User simulation
│   ├── action.py   # Action execution
│   └── metrics.py  # Performance metrics
├── adapters/       # External integrations
│   ├── http.py     # HTTP client
│   ├── reporting/  # Report generators
│   └── storage/    # Data persistence
├── config/         # Configuration parsing
└── cli/            # Command-line interface

Example Usage

# Run a simple test
gradual run --config test.yaml

# Run with custom reporting
gradual run --config test.yaml --reporter custom

# Run with real-time monitoring
gradual run --config test.yaml --monitor

Why This Matters for Fintech

In fintech applications, concurrent access patterns are crucial:

Trading Systems: Multiple traders accessing the same data simultaneously
Risk Management: Concurrent risk calculations across portfolios
Reporting: Multiple users generating reports at the same time
Data Ingestion: Concurrent data feeds being processed

Gradual helps identify bottlenecks in these scenarios that traditional RPS testing might miss.

Roadmap

Gradual is currently in active development. Planned features include:

Distributed Testing: Run tests across multiple machines
Real-time Monitoring: Live dashboard during test execution
Integration Testing: Support for complex multi-service scenarios
Performance Baselines: Automatic regression detection
Cloud Integration: Native support for AWS, GCP, Azure

Getting Involved

Gradual is open source and I'm actively looking for contributors. If you're interested in:

Stress testing and performance engineering
Python and async programming
Building developer tools
Fintech and high-performance systems

Check out the project on GitHub and let's build something amazing together!

Conclusion

Gradual represents a shift in how we think about stress testing: from simple request counting to realistic user simulation. By focusing on concurrency and user behavior, we can discover performance issues that traditional tools might miss.

The framework is still in early development, but the core concepts are solid. I'm excited to see how it evolves and helps teams build more robust, scalable systems.

Stay tuned for more updates on Gradual's development, and feel free to reach out if you'd like to contribute or learn more about stress testing strategies!