Preventing Chained .NET Microservice Calls from Causing System-Wide Latency and Retry Storms

A parcel delivery platform begins with four independently deployed .NET services: a public Delivery API and the Pricing, Capacity, and Route Planning services behind it. The Delivery API accepts a shipment request and calls the Pricing service. Pricing waits for the Capacity service, and Capacity in turn calls the Route Planning service. Each service is small, has its own deployment pipeline, and runs in a container.

On paper, this looks like a microservice architecture. In production, one slow Capacity response causes threads to wait in Pricing. Pricing then delays Delivery API requests. Clients retry, the number of calls grows, and a problem in one dependency spreads through the entire request path.

The problem is not that the services are too large. The problem is that they are independent deployment units but still behave like tightly coupled methods inside one process. A resilient design must reduce request-time communication, give each service the data it needs, make asynchronous messages safe to repeat, and protect the few synchronous calls that remain.

Context and Scope

The example system has four responsibilities:

Delivery API accepts shipment requests from web and mobile clients.
Pricing calculates the delivery price.
Capacity tracks available vehicles and collection slots.
Route Planning estimates routes and delivery windows.

The initial request flow is synchronous:

Client
  |
  v
Delivery API
  |
  v
Pricing
  |
  v
Capacity
  |
  v
Route Planning

Every arrow represents a network call that must finish before the previous service can continue. The response time observed by the client includes the work and waiting time of the entire chain.

The target architecture keeps the user-facing operation short:

Capacity changes -----+
                      |
Route changes --------+--> Message broker --> Local service projections
                      |
Pricing rules --------+

Client
  |
  v
Delivery API
  |
  v
Local decision and short required calls

Services publish state changes when those changes happen. Other services consume the messages and maintain the local data needed for their own decisions. The Delivery API no longer discovers basic information by recursively asking several services during every customer request.

Why a Chain of Small Services Can Be Slower Than a Monolith

A method call inside one process is relatively cheap. A microservice call crosses process and often machine boundaries. It may involve serialization, network transport, authentication, connection handling, queueing, and remote processing.

A synchronous chain multiplies failure opportunities.

Suppose Delivery API calls Pricing, Pricing calls Capacity, and Capacity calls Route Planning. Delivery API remains blocked until every downstream operation completes. If Route Planning slows down, Capacity waits. Pricing waits for Capacity, and Delivery API waits for Pricing.
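The cost of the chain can be written down directly under two simplifying assumptions: each hop contributes its own work and network time, and hops fail independently. Here T_i is the time spent on hop i and A_i is the probability that hop i succeeds. The 99.9 % figure is a hypothetical value chosen for illustration, not a measurement.

```latex
T_{\text{chain}} = \sum_{i=1}^{n} T_i,
\qquad
A_{\text{chain}} = \prod_{i=1}^{n} A_i,
\qquad
n = 4,\; A_i = 0.999 \;\Rightarrow\; A_{\text{chain}} = 0.999^{4} \approx 0.9960
```

Four hops at 99.9 % each fail about 0.4 % of the time instead of 0.1 %. The client also never sees a response faster than the sum of all four hops.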

This causes several risks:

The end-to-end response time grows with every dependency.
One unavailable service can make unrelated entry points unavailable.
Waiting requests consume threads, connections, and memory.
Client retries add more load to an already unhealthy path.
Scaling only the public API does not remove the downstream bottleneck.
Deploying one service independently becomes risky because callers depend on its immediate response.

Microservices provide independent deployment and fine-grained scaling only when their design boundaries are also independent.

Step 1: Define Logical Boundaries Before Splitting Processes

A logical microservice should represent a coherent business capability, not an arbitrary technical layer.

Domain-Driven Design uses a bounded context to describe an area where business terms and rules have one consistent meaning. A logical microservice commonly follows such a boundary.

For the delivery platform:

Logical service	Owns
Delivery	Shipment creation and delivery lifecycle
Pricing	Tariffs, discounts, and price calculation rules
Capacity	Vehicle capacity and collection-slot availability
Route Planning	Routes, travel estimates, and planning rules

A logical service may be implemented by more than one physical process when separate parts need different scaling. For example, price calculation workers and tariff administration endpoints may belong to the same logical Pricing boundary while being deployed separately.

The important rule is that a service boundary should reduce coordination. Splitting one tightly coupled workflow into many processes without changing its data and communication model only adds network cost.

Step 2: Give Each Service Exclusive Control of Its Data

Two services that write to the same tables are not independent. A schema change made for one service can force code changes and coordinated deployments in the other.

Each logical microservice should control its own storage. That storage may run inside the service boundary or use an external database engine, but other logical services should not bypass the owning service and manipulate the same schema.

Delivery Service ------> Delivery Store
Pricing Service -------> Pricing Store
Capacity Service ------> Capacity Store
Route Service ---------> Route Store

Exclusive storage does not mean that a service can never use information produced elsewhere. It means that the service receives that information through an explicit contract and stores the local representation needed for its own work.

For example, Pricing may need a capacity classification such as Available, Limited, or Unavailable. It does not need direct access to every Capacity table. Capacity can publish classification changes, and Pricing can store a small local projection.

This protects design independence:

Capacity can reorganize its internal tables without changing Pricing.
Pricing can use a storage technology suited to its own workload.
Each service can deploy its schema and code together.
A service can keep only the external data that its decisions require.

Step 3: Push Changes Instead of Pulling Data During Requests

The original chain asks downstream services for information only when a customer request arrives. This keeps services dependent on one another at the most latency-sensitive moment.

Reverse the direction. When Capacity changes, publish the change immediately. Pricing and Delivery consume it and update local projections.

Vehicle or slot update
        |
        v
Capacity Service
        |
        v
CapacityChanged event
        |
        +--------------------+
        |                    |
        v                    v
Pricing projection     Delivery projection

A service receiving an external request can now answer from its own data instead of starting a long call tree.
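The following is a minimal sketch of answering from local data. LocalCapacityProjection and PriceQuoteService are hypothetical names. The projection is an in-memory dictionary standing in for Pricing's own store, and the 20 % surcharge for Limited capacity is an invented rule used only to make the example concrete.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical local projection kept up to date by a CapacityChanged consumer.
public sealed class LocalCapacityProjection
{
    private readonly ConcurrentDictionary<string, string> _levels =
        new(StringComparer.OrdinalIgnoreCase);

    // Called by the message consumer, never by the request path.
    public void Apply(string regionCode, string availabilityLevel) =>
        _levels[regionCode] = availabilityLevel;

    // Regions that were never reported are treated as Unavailable in this sketch.
    public string GetLevel(string regionCode) =>
        _levels.TryGetValue(regionCode, out var level) ? level : "Unavailable";
}

// Hypothetical pricing logic that reads only local data:
// no call to Capacity or Route Planning happens while the customer waits.
public sealed class PriceQuoteService
{
    private readonly LocalCapacityProjection _capacity;

    public PriceQuoteService(LocalCapacityProjection capacity) => _capacity = capacity;

    // Returns null when no quote can be offered for the region.
    public decimal? Quote(string regionCode, decimal basePrice) =>
        _capacity.GetLevel(regionCode) switch
        {
            "Available" => basePrice,
            "Limited" => basePrice * 1.2m, // illustrative surcharge, not a real tariff
            _ => null
        };
}
```

The message consumer is the only writer to the projection. A slow Capacity service therefore makes the local data staler but does not delay the customer's request.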

This does not remove all communication. It moves communication away from the critical request path and makes it asynchronous.

The pattern works well because:

The publisher does not wait for every subscriber to finish.
A new subscriber can be added without changing the publisher.
Receivers can scale independently.
Temporary receiver outages can be absorbed by queued messages.
Each service owns the projection needed for its own response.

A queue and a topic solve different distribution problems:

A queue delivers each message to one competing receiver. This is useful when several worker instances share processing work.
A topic gives each subscription its own copy. This is useful when Delivery, Pricing, and Reporting must all react to the same event.
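The two shapes can be sketched in memory with System.Threading.Channels. InMemoryTopic is a hypothetical illustration, not a broker client. Each subscription gets its own channel, so every subscriber sees every message. A single channel shared by several readers behaves like a queue, where each message goes to only one reader.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;

// In-memory stand-in for a broker topic, for illustration only.
// Each subscription owns a separate channel, so every subscriber
// receives its own copy of each published message.
public sealed class InMemoryTopic<T>
{
    private readonly object _gate = new();
    private readonly List<Channel<T>> _subscriptions = new();

    public ChannelReader<T> Subscribe()
    {
        // Unbounded only to keep the sketch short; real queues need limits.
        var channel = Channel.CreateUnbounded<T>();
        lock (_gate)
        {
            _subscriptions.Add(channel);
        }
        return channel.Reader;
    }

    public void Publish(T message)
    {
        lock (_gate)
        {
            // Fan out: write the same message into every subscription.
            foreach (var subscription in _subscriptions)
            {
                subscription.Writer.TryWrite(message);
            }
        }
    }
}
```

In the usage below, the Pricing and Delivery subscriptions each read the same event. A plain shared channel hands its one job to whichever reader takes it first.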

Azure Service Bus supports both queues and topics. RabbitMQ is another option when the system needs an independently hosted broker. Higher-level libraries can add abstractions above message brokers, but the architectural decision remains the same: publish state changes rather than performing deep synchronous lookups.

Step 4: Design Messages for Independent Evolution

Independent deployment does not remove compatibility requirements. It moves compatibility from shared binaries and schemas to communication contracts.

A message should contain the stable facts consumers need, not the publisher's private persistence model.

public sealed record CapacityChanged(
    Guid EventId,
    DateTimeOffset OccurredAt,
    string RegionCode,
    string AvailabilityLevel);

This event does not expose internal vehicle tables or route-allocation objects. Consumers can store their own representation.

When evolving a contract:

Prefer additive changes that older consumers can ignore.
Avoid changing the meaning of an existing field.
Keep consumers tolerant of fields they do not use.
Support old and new message versions during a transition.
Remove old contract support only after every consumer has migrated.
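A tolerant consumer can be sketched with System.Text.Json, which by default skips JSON properties that have no matching member. The block repeats the CapacityChanged record so it compiles on its own. CapacityChangedReader is an illustrative name, and the extra vehicleCount field used in the test is a hypothetical additive change from a newer publisher.

```csharp
using System;
using System.Text.Json;

public sealed record CapacityChanged(
    Guid EventId,
    DateTimeOffset OccurredAt,
    string RegionCode,
    string AvailabilityLevel);

public static class CapacityChangedReader
{
    // Web defaults: camelCase names and case-insensitive matching.
    // Unknown properties are skipped, so additive changes do not break this consumer.
    private static readonly JsonSerializerOptions Options =
        new(JsonSerializerDefaults.Web);

    public static CapacityChanged Read(string json) =>
        JsonSerializer.Deserialize<CapacityChanged>(json, Options)
        ?? throw new JsonException("Message body was null.");
}
```

A missing field is not rejected here. The corresponding constructor parameter simply receives its default value, so a consumer that depends on a field should validate it explicitly.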

Independent CI/CD is possible only when one service can be upgraded without forcing every consumer to deploy at the same time.

Step 5: Make Message Processing Idempotent

Reliable messaging usually guarantees at-least-once delivery, so the same message may arrive more than once. A sender may publish successfully but never receive the broker's acknowledgment, so it publishes again. A receiver may finish processing but fail or time out before settling the message, so the broker delivers it again.

An idempotent operation has the same final effect whether it runs once or several times.

Setting a projection to a new availability value is naturally idempotent:

Set region NORTH to Limited
Set region NORTH to Limited

Incrementing a counter is not:

Increase unavailable count by 1
Increase unavailable count by 1

For messages that are not naturally idempotent, attach a unique identifier and record which identifiers have already been processed.

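using System;
using System.Threading;
using System.Threading.Tasks;

// Records which event identifiers have already been applied.
public interface IProcessedEventStore
{
    Task<bool> ContainsAsync(
        Guid eventId,
        CancellationToken cancellationToken);

    Task AddAsync(
        Guid eventId,
        DateTimeOffset processedAt,
        CancellationToken cancellationToken);
}

// Pricing's local view of Capacity availability per region.
public interface ICapacityProjection
{
    Task SetAvailabilityAsync(
        string regionCode,
        string availabilityLevel,
        CancellationToken cancellationToken);
}

public sealed class CapacityProjectionHandler
{
    private readonly IProcessedEventStore _processedEvents;
    private readonly ICapacityProjection _projection;

    public CapacityProjectionHandler(
        IProcessedEventStore processedEvents,
        ICapacityProjection projection)
    {
        _processedEvents = processedEvents;
        _projection = projection;
    }

    public async Task HandleAsync(
        CapacityChanged message,
        CancellationToken cancellationToken)
    {
        // Duplicate delivery: this event was already applied.
        if (await _processedEvents.ContainsAsync(
            message.EventId,
            cancellationToken))
        {
            return;
        }

        // Apply the state change to the local projection.
        await _projection.SetAvailabilityAsync(
            message.RegionCode,
            message.AvailabilityLevel,
            cancellationToken);

        // Remember the identifier so a redelivery is ignored.
        // In production, commit this together with the projection update.
        await _processedEvents.AddAsync(
            message.EventId,
            DateTimeOffset.UtcNow,
            cancellationToken);
    }
}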

The example illustrates the duplicate-detection workflow:

Read the unique event identifier.
Check whether it was processed.
Apply the state change.
Store the identifier.
Reject the same identifier when it appears again.
Remove sufficiently old identifiers according to the system's retention rules.

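Coordinate the projection update and the processed-event record, for example by committing both in the same local database transaction. Otherwise, a crash between the two writes leaves the business change applied while the event is still marked as unseen, and the redelivered message applies the change a second time.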
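The coordination requirement can be sketched in memory for a non-idempotent counter. The single lock stands in for one database transaction that records the event identifier and applies the change together. UnavailableVehicleCountProjection and its VehicleUnavailable event are hypothetical. In a relational store, the same effect typically comes from a unique constraint on the event identifier inside the transaction that updates the projection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical projection counting vehicles reported unavailable.
public sealed class UnavailableVehicleCountProjection
{
    // Stands in for a database transaction: both writes happen under one lock.
    private readonly object _transaction = new();
    private readonly HashSet<Guid> _processedEventIds = new();
    private int _unavailableCount;

    public int UnavailableCount
    {
        get { lock (_transaction) { return _unavailableCount; } }
    }

    // Returns false when the event identifier was already processed.
    public bool TryApplyVehicleUnavailable(Guid eventId)
    {
        lock (_transaction)
        {
            if (!_processedEventIds.Add(eventId))
            {
                return false; // duplicate delivery: no state change
            }

            // Non-idempotent change, safe because it is guarded by the identifier.
            _unavailableCount++;
            return true;
        }
    }
}
```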

Step 6: Protect Unavoidable Synchronous Calls with Polly

Some synchronous calls remain valid. A public API may need immediate confirmation from an authorization service, or a service may need to contact an external provider before responding.

These calls should be short, bounded, and protected by a resilience pipeline.

Polly allows .NET applications to combine strategies such as:

Retries with exponential backoff for transient failures
Circuit breaking for sustained failures
Rate limiting or bulkhead-like isolation to contain congestion
Timeouts to stop waiting indefinitely

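using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;
using Polly.Timeout;
using System.Threading.RateLimiting;

// Strategies added first run outermost: retry wraps the circuit breaker,
// which wraps the rate limiter, which wraps a per-attempt timeout.
// DependencyUnavailableException is an application-defined exception.
// AddRateLimiter comes from the Polly.RateLimiting package.
var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        // Per-attempt timeouts are treated as transient failures too.
        ShouldHandle = new PredicateBuilder()
            .Handle<DependencyUnavailableException>()
            .Handle<TimeoutRejectedException>(),
        BackoffType = DelayBackoffType.Exponential,
        UseJitter = true,
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromMilliseconds(400)
    })
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        // Without TimeoutRejectedException, a dependency that hangs
        // instead of failing would never open the circuit.
        ShouldHandle = new PredicateBuilder()
            .Handle<DependencyUnavailableException>()
            .Handle<TimeoutRejectedException>(),
        FailureRatio = 0.5,
        SamplingDuration = TimeSpan.FromSeconds(12),
        MinimumThroughput = 6,
        BreakDuration = TimeSpan.FromSeconds(20)
    })
    .AddRateLimiter(new SlidingWindowRateLimiter(
        new SlidingWindowRateLimiterOptions
        {
            PermitLimit = 40,
            SegmentsPerWindow = 4,
            Window = TimeSpan.FromMinutes(1)
        }))
    .AddTimeout(TimeSpan.FromSeconds(2))
    .Build();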

The protected operation runs through the pipeline:

await pipeline.ExecuteAsync(
    async cancellationToken =>
    {
        await dependencyClient.SendAsync(cancellationToken);
    },
    requestCancellationToken);

Each strategy addresses a different failure mode.

Retry

Retry is appropriate for a temporary problem that may disappear quickly. Exponential delays prevent the caller from retrying continuously. Jitter adds variation so many instances do not retry at exactly the same moment.

Retrying every exception is dangerous. Invalid requests and permanent business failures should not be repeated.
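Here is a sketch of both ideas, assuming an HTTP dependency. IsTransient is a hypothetical rule that treats timeouts, throttling, and most server errors as retryable and other client errors as permanent. The block also computes the doubling delay schedule and a "full jitter" variant. Polly's UseJitter applies its own jitter formula, so these delays show the shape of the schedule rather than Polly's exact values.

```csharp
using System;
using System.Net;

public static class RetryPolicySketch
{
    // Hypothetical classification: 408, 429, and 5xx (except 501) may be temporary.
    // Other 4xx responses describe a request that will not succeed on retry.
    public static bool IsTransient(HttpStatusCode status) =>
        status == HttpStatusCode.RequestTimeout
        || status == HttpStatusCode.TooManyRequests
        || ((int)status >= 500 && status != HttpStatusCode.NotImplemented);

    // Nominal exponential delay before 0-based retry 'attempt': baseDelay * 2^attempt.
    public static TimeSpan NominalDelay(TimeSpan baseDelay, int attempt) =>
        TimeSpan.FromTicks(baseDelay.Ticks * (1L << attempt));

    // Full jitter: a random delay between zero and the nominal delay,
    // so many instances do not retry at the same moment.
    public static TimeSpan FullJitterDelay(TimeSpan baseDelay, int attempt, Random random) =>
        TimeSpan.FromTicks((long)(random.NextDouble() * NominalDelay(baseDelay, attempt).Ticks));
}
```

With the 400 ms base delay used earlier, the nominal schedule for three retries is 400, 800, and 1,600 ms.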

Circuit breaker

A circuit breaker stops calls temporarily after failure conditions cross the configured threshold. Instead of continuing to pressure an unavailable dependency, callers fail quickly until the break period ends.

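Build the ResiliencePipeline once and reuse the same instance whenever callers should share circuit state, for example by registering it as a singleton. Rebuilding the pipeline for every request also rebuilds its state, so failures never accumulate toward the breaker's threshold.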
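The following deliberately simplified breaker shows why the state must be shared. It opens after a number of consecutive failures. This is not Polly's algorithm, which works from a failure ratio over a sampling window. SimpleCircuitBreaker is a hypothetical name, and the injectable clock exists only to make the behavior testable.

```csharp
using System;
using System.Threading.Tasks;

public sealed class CircuitOpenException : Exception
{
    public CircuitOpenException() : base("Circuit is open; failing fast.") { }
}

// Simplified consecutive-failure breaker. All state lives in the instance.
public sealed class SimpleCircuitBreaker
{
    private readonly object _gate = new();
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTimeOffset> _clock;
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public SimpleCircuitBreaker(
        int failureThreshold,
        TimeSpan breakDuration,
        Func<DateTimeOffset>? clock = null)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // Open while the break period since the last trip has not elapsed.
    public bool IsOpen
    {
        get
        {
            lock (_gate)
            {
                return _openedAt is DateTimeOffset openedAt
                    && _clock() - openedAt < _breakDuration;
            }
        }
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action)
    {
        // Fail fast without touching the dependency while the circuit is open.
        if (IsOpen)
        {
            throw new CircuitOpenException();
        }

        try
        {
            T result = await action();
            lock (_gate)
            {
                // Any success, including a trial call after the break, closes the circuit.
                _consecutiveFailures = 0;
                _openedAt = null;
            }
            return result;
        }
        catch
        {
            lock (_gate)
            {
                // The count is not reset when the break ends,
                // so one failed trial call reopens the circuit.
                _consecutiveFailures++;
                if (_consecutiveFailures >= _failureThreshold)
                {
                    _openedAt = _clock();
                }
            }
            throw;
        }
    }
}
```

The failure count lives in the instance. Creating a new SimpleCircuitBreaker per request would start every request with zero failures, so the circuit would never open.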

Rate limiter and isolation

A congested dependency should not consume every outbound connection or worker. A rate limiter bounds how often protected work can start. Related isolation strategies can cap concurrent work, queue a limited amount, and reject excess requests before they exhaust the entire service.
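Bulkhead-style isolation can be sketched with SemaphoreSlim. At most a fixed number of calls to one dependency run at the same time, and excess calls are rejected immediately rather than queued. ConcurrencyLimiter and its exception type are hypothetical names. Inside a Polly pipeline, the Polly.RateLimiting package's concurrency limiter serves the same purpose.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ConcurrencyLimitExceededException : Exception
{
    public ConcurrencyLimitExceededException()
        : base("Concurrency limit reached; request rejected.") { }
}

// At most maxConcurrency calls run at once; extra calls are rejected, not queued.
public sealed class ConcurrencyLimiter
{
    private readonly SemaphoreSlim _slots;

    public ConcurrencyLimiter(int maxConcurrency) =>
        _slots = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    public async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> action,
        CancellationToken cancellationToken)
    {
        // TimeSpan.Zero: take a free slot if one exists, otherwise reject now.
        if (!await _slots.WaitAsync(TimeSpan.Zero, cancellationToken))
        {
            throw new ConcurrencyLimitExceededException();
        }

        try
        {
            return await action(cancellationToken);
        }
        finally
        {
            // Always return the slot, even when the call fails.
            _slots.Release();
        }
    }
}
```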

Timeout

A timeout sets the maximum waiting period. The protected code must observe the cancellation token and stop when cancellation is requested. A timeout that the dependency ignores does not release work cleanly.
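The cooperative nature of timeouts can be sketched with a linked CancellationTokenSource, which is similar in spirit to how a timeout strategy cancels work. CooperativeTimeout is a hypothetical helper. It cancels the token after the timeout, but the operation stops only because it passes that token to Task.Delay. An operation that ignored the token would keep running.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CooperativeTimeout
{
    // The operation receives one token that is cancelled by either the caller
    // or the timer. It must pass the token on for the timeout to have any effect.
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken requestToken)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(requestToken);
        linked.CancelAfter(timeout);

        try
        {
            return await operation(linked.Token);
        }
        catch (OperationCanceledException)
            when (linked.IsCancellationRequested && !requestToken.IsCancellationRequested)
        {
            // Only the timer fired: report a timeout, not a caller cancellation.
            throw new TimeoutException($"Operation did not finish within {timeout}.");
        }
    }
}
```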

Retries, circuit breaking, limiting, and timeouts are not substitutes for removing chained calls. They protect necessary boundaries. Applying them to a deeply coupled chain may only make the chain fail more slowly and with more complexity.

Step 7: Run Background Consumers with a .NET Host

Message consumers need controlled startup, dependency injection, configuration, logging, and graceful shutdown. The .NET Generic Host provides these facilities.

A worker can inherit from BackgroundService and stop when the host signals cancellation.

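using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public sealed class CapacityEventWorker : BackgroundService
{
    private readonly ICapacityEventReceiver _receiver;
    private readonly CapacityProjectionHandler _handler;
    private readonly ILogger<CapacityEventWorker> _logger;

    public CapacityEventWorker(
        ICapacityEventReceiver receiver,
        CapacityProjectionHandler handler,
        ILogger<CapacityEventWorker> logger)
    {
        _receiver = receiver;
        _handler = handler;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(
        CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                var message = await _receiver.ReceiveAsync(stoppingToken);

                await _handler.HandleAsync(
                    message,
                    stoppingToken);
            }
            catch (OperationCanceledException)
                when (stoppingToken.IsCancellationRequested)
            {
                // The host is stopping: leave the loop instead of faulting.
                break;
            }
            catch (Exception ex)
            {
                // Log and continue so one bad message does not stop the host.
                // A real receiver would also abandon or dead-letter the message.
                _logger.LogError(ex, "Failed to process a capacity event.");
            }
        }
    }
}

// Program.cs
var builder = Host.CreateApplicationBuilder(args);

// The worker's dependencies must be registered too. ICapacityEventReceiver,
// IProcessedEventStore, and ICapacityProjection need application-specific
// implementations (for example, broker and database adapters) registered here.
builder.Services.AddSingleton<CapacityProjectionHandler>();
builder.Services.AddHostedService<CapacityEventWorker>();

builder.Services.Configure<HostOptions>(options =>
{
    // Keep this below the orchestrator's termination grace period.
    options.ShutdownTimeout = TimeSpan.FromSeconds(8);
});

using var host = builder.Build();

await host.RunAsync();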

Graceful shutdown matters because an orchestrator may stop or move an instance during load balancing or recovery. The worker should stop receiving new work, finish or safely abandon current work, release resources, and exit before the orchestrator's termination deadline.

Increasing the host shutdown timeout does not automatically increase the orchestrator's timeout. Both limits must agree.

Step 8: Use Containers for Deployment Independence

A microservice that depends on software installed manually on a particular server is difficult to move when a node is busy or unhealthy.

A container image packages the application and its runtime dependencies while sharing the host operating system kernel. This makes containers lighter and faster to start than full virtual machines.

A simplified multi-stage Dockerfile can build and package a .NET worker:

FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src

COPY ["CapacityProjection.Worker.csproj", "./"]
RUN dotnet restore "CapacityProjection.Worker.csproj"

COPY . .
RUN dotnet publish "CapacityProjection.Worker.csproj" \
    --configuration Release \
    --output /app/publish \
    /p:UseAppHost=false

FROM mcr.microsoft.com/dotnet/runtime:10.0 AS final
WORKDIR /app
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "CapacityProjection.Worker.dll"]

Build and run the image with Docker:

docker build ./ -t delivery/capacity-projection:v1
docker run --name capacity-projection delivery/capacity-projection:v1

Containerization improves deployment independence, but it does not create logical independence. A containerized service that shares a database and participates in long synchronous call chains is still tightly coupled.

Step 9: Decide Whether Microservices Are Worth the Cost

Microservices introduce additional engineering work:

Network communication replaces local calls.
Failure handling becomes part of normal design.
Integration testing becomes more important.
Message contracts must be versioned.
Duplicate delivery and eventual consistency must be handled.
Containers, registries, scaling, and orchestration require infrastructure.
Observing a request across several processes is harder than debugging one process.

Adoption is easier to justify when the system needs one or more of these benefits:

Different capabilities require independent scaling.
Separate teams need independent release cycles.
The application must combine several technology stacks.
Legacy subsystems must be integrated and replaced gradually.
One deployment unit has become too large to maintain efficiently.
Traffic and software complexity make coarse scaling wasteful.

When these pressures do not exist, splitting every module into a network service may increase cost without producing enough value.

Testing the Refactored Workflow

The architecture should be tested for failure behavior, not only successful responses.

Contract tests

Verify that publishers and consumers agree on required fields and meanings. Test additive evolution so an older consumer can still process a newer compatible message.

Duplicate-delivery tests

Deliver the same EventId several times and verify that the projection changes only once.

Out-of-order tests

Where event order matters, include version or sequence information and verify that an older update cannot overwrite newer state.
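The CapacityChanged contract shown earlier carries no ordering information. This sketch therefore assumes a hypothetical per-region Sequence number assigned by the publisher. VersionedCapacityProjection keeps the highest sequence it has applied and ignores anything older. Rejecting equal sequences also makes redelivery harmless.

```csharp
using System.Collections.Generic;

// Hypothetical projection that tracks the newest applied sequence per region.
public sealed class VersionedCapacityProjection
{
    private readonly object _gate = new();
    private readonly Dictionary<string, (string Level, long Sequence)> _regions = new();

    // Applies the update only if it is newer than the stored state.
    // Late and duplicate deliveries cannot overwrite newer state.
    public bool TryApply(string regionCode, string level, long sequence)
    {
        lock (_gate)
        {
            if (_regions.TryGetValue(regionCode, out var current)
                && current.Sequence >= sequence)
            {
                return false;
            }

            _regions[regionCode] = (level, sequence);
            return true;
        }
    }

    public string? GetLevel(string regionCode)
    {
        lock (_gate)
        {
            return _regions.TryGetValue(regionCode, out var current)
                ? current.Level
                : null;
        }
    }
}
```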

Dependency failure tests

Force the synchronous dependency to fail and verify:

Retries stop at the configured limit.
The circuit opens after the failure threshold.
Requests fail quickly while the circuit is open.
Rate limiting prevents unbounded outbound work.
Timeout cancellation reaches the protected code.

Shutdown tests

Cancel the host while the worker is idle and while it is processing a message. Verify that it exits inside the permitted shutdown period and does not leave the message in an unknown state.
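A shutdown test needs a loop that exits promptly when cancellation is requested. This sketch uses a Channel in place of a broker receiver, and ReceiveLoop is a hypothetical name. A real worker would also complete or abandon the in-flight message according to the broker's rules.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ReceiveLoop
{
    // Processes messages until cancellation is requested, then returns
    // how many were handled. A test can await it against a deadline.
    public static async Task<int> RunAsync(
        ChannelReader<string> reader,
        Func<string, CancellationToken, Task> handle,
        CancellationToken stoppingToken)
    {
        int processed = 0;
        try
        {
            await foreach (var message in reader.ReadAllAsync(stoppingToken))
            {
                await handle(message, stoppingToken);
                processed++;
            }
        }
        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
        {
            // Shutdown requested while idle or mid-message: exit without faulting.
        }
        return processed;
    }
}
```

The usage below handles one message, becomes idle, and then checks that cancellation ends the loop well inside a deadline.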

Isolation tests

Slow Capacity processing and confirm that unrelated Delivery endpoints remain responsive. A resilient boundary should prevent local congestion from becoming system-wide congestion.

Common Mistakes

Splitting by class or technical layer

A separate service for every controller, repository, or table creates communication without creating business independence. Split around coherent capabilities.

Sharing a database between logical services

A shared schema couples deployments and design choices even when the applications run in separate containers.

Fetching all external data during the user request

This recreates a distributed call stack. Push changes and maintain local projections for request-time decisions.

Retrying without idempotency

Retries can repeat completed work. Every receiver must be prepared for duplicate delivery.

Retrying permanent failures

An invalid request will not become valid after a delay. Retry only failures that can realistically be temporary.

Creating a new resilience pipeline per request

Circuit breaker state belongs to the pipeline instance. Recreating the pipeline prevents failures from contributing to one shared circuit state.

Treating a timeout as forced termination

Timeout cancellation works only when protected operations observe the cancellation token.

Assuming containers provide resilience

Containers improve packaging and movement. Resilience still requires sound boundaries, failure handling, message safety, and orchestration.

Allowing unlimited queues

A queue that grows without a bound turns temporary congestion into memory pressure and extreme latency. Limit queued work and reject overload intentionally.
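Inside a process, System.Threading.Channels makes the bound explicit. The hypothetical BoundedWorkQueue wraps a bounded channel in Wait mode. In that mode, TryWrite returns false once the channel is full, so the caller can reject overload (for example, with an HTTP 503) instead of accumulating work.

```csharp
using System.Threading.Channels;

// Hypothetical in-process work queue with a hard capacity.
public sealed class BoundedWorkQueue<T>
{
    private readonly Channel<T> _channel;

    public BoundedWorkQueue(int capacity) =>
        _channel = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
        {
            // Wait mode: TryWrite fails immediately when full, WriteAsync would wait.
            FullMode = BoundedChannelFullMode.Wait
        });

    // Returns false instead of growing without bound; the caller rejects the work.
    public bool TryEnqueue(T item) => _channel.Writer.TryWrite(item);

    public ChannelReader<T> Reader => _channel.Reader;
}
```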

Architecture Checklist

[ ] Each logical service represents a coherent business capability
[ ] Service boundaries reduce communication rather than multiply it
[ ] Every logical service controls its own data
[ ] Shared information crosses boundaries through explicit contracts
[ ] User requests do not trigger long nested service calls
[ ] State changes are pushed asynchronously to interested consumers
[ ] Topics are used when every subscriber needs a copy
[ ] Queues are used when competing workers share one stream of work
[ ] Message contracts can evolve without coordinated deployment
[ ] Every consumer can recognize duplicate messages
[ ] Non-idempotent operations are protected by event identifiers
[ ] Retries use bounded attempts and increasing delays
[ ] Circuit breakers stop pressure on failing dependencies
[ ] Rate limits or isolation bounds protect service resources
[ ] Timeouts pass cancellation to protected operations
[ ] Background workers support graceful shutdown
[ ] Container images include required runtime dependencies
[ ] Failure, duplication, shutdown, and overload paths are tested
[ ] The operational benefits justify the added distributed-system cost

Conclusion

A collection of small containers is not automatically a microservice architecture. If one incoming request causes a chain of synchronous calls, the services share failure and latency even though they deploy separately.

Reliable .NET microservices own their data, publish changes asynchronously, and keep request paths short. Consumers process duplicate messages safely. The synchronous boundaries that remain are protected with Polly retries, circuit breakers, limits, and timeouts. .NET hosted services manage background work and graceful shutdown, while containers provide deployment independence.

The result is not a system that never fails. It is a system designed so one slow or unavailable component does not turn a local problem into system-wide congestion.