API Design, Distributed Systems
June 7, 2026

Designing Reliable Messaging with Queues and Delivery Guarantees

Messaging is one of the most important tools for application integration. It lets one component produce information and another consume it later, often without both systems being online at the same time.

That flexibility is powerful, but it is not free. Once messages become part of the architecture, you must make explicit choices about queues, topics, delivery guarantees, message expiration, persistence, dead letter handling, and duplicate processing.

The Problem

Synchronous calls are easy to understand. A client calls a service, waits for the response, and either succeeds or fails. Messaging changes that shape.

With messaging, a producer sends a message to an intermediate broker. The broker then dispatches it to one or more consumers.

Producer
  |
  v
Message broker
  |
  +-- Consumer A
  +-- Consumer B

This gives you decoupling, buffering, and scaling options. It also creates new questions:

  • Can a message be lost?
  • Can a message be delivered twice?
  • What happens when no consumer is available?
  • What happens when a consumer fails halfway through processing?
  • How long should a message stay valid?
  • Where do broken messages go?
  • How do we monitor the health of the flow?

These questions define the reliability of the system.

Core Idea

A message broker is the intermediate component between producers and consumers. Examples of broker technology include Apache ActiveMQ, Kafka, and RabbitMQ.

The broker owns the mechanics of receiving, storing, dispatching, and sometimes replicating messages. The application code still owns the meaning of each message and how to handle duplicates, failures, and invalid payloads.

A useful mental model is:

Producer responsibility:
  - Build valid message
  - Add useful metadata
  - Send to broker
  - Handle broker acknowledgement

Broker responsibility:
  - Accept message
  - Store or forward message according to configuration
  - Dispatch to consumers
  - Apply delivery and expiration behavior

Consumer responsibility:
  - Process message safely
  - Handle duplicate delivery if required
  - Report success or failure

Do not treat the broker as a substitute for design. The broker provides delivery tools, but the system still needs clear semantics.

Queues versus Topics

The first design decision is whether the message should go through a queue or a topic.

A queue is commonly used for point-to-point or work-distribution scenarios. Producers send messages to the queue. Consumers take messages from the queue. If no consumer is currently connected, the queue stores the messages until one is available.

Producer 1 ---> Queue ---> Consumer 1
Producer 2 ------^   \---> Consumer 2
Producer 3 ----------/     Consumer 3

When several consumers read from the same queue, they usually compete for messages. Each message is consumed by one consumer. This supports horizontal scaling because you can add more consumers when the message volume grows.

A topic has a different meaning. Producers publish messages, and connected consumers receive a copy. This is closer to broadcast behavior.

Producer ---> Topic ---> Subscriber A
                 |-----> Subscriber B
                 |-----> Subscriber C

Use a queue when a message represents work that should be handled once. Use a topic when a message represents information that several consumers should observe.
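The difference between the two destinations can be sketched in a few lines of Java. This is a hypothetical in-memory model, not a real broker API: the class name DestinationSketch and both method names are illustrative assumptions, and round-robin dispatch stands in for whatever dispatch policy a real broker applies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical in-memory model of queue and topic semantics (not a broker API).
class DestinationSketch {

    // Queue semantics: consumers compete, so each message reaches exactly one of them.
    static List<List<String>> deliverViaQueue(List<String> messages, int consumerCount) {
        Queue<String> queue = new ArrayDeque<>(messages);
        List<List<String>> received = emptyInboxes(consumerCount);
        int next = 0;
        while (!queue.isEmpty()) {
            received.get(next).add(queue.poll());
            next = (next + 1) % consumerCount; // simple round-robin dispatch (assumption)
        }
        return received;
    }

    // Topic semantics: every subscriber receives its own copy of each message.
    static List<List<String>> deliverViaTopic(List<String> messages, int subscriberCount) {
        List<List<String>> received = emptyInboxes(subscriberCount);
        for (String message : messages) {
            for (List<String> inbox : received) {
                inbox.add(message);
            }
        }
        return received;
    }

    private static List<List<String>> emptyInboxes(int count) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }
}
```

With three messages and two consumers, the queue splits the work between the consumers, while the topic hands all three messages to each subscriber.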

Delivery Guarantees

Message quality of service describes what the broker promises after accepting a message. There are three common scenarios.

Guarantee     | Meaning                                                                | Practical consequence
At most once  | The message may be lost, but it is not delivered more than once        | Good for best-effort data where duplicates are worse than loss
At least once | The message is not lost, but it may be delivered more than once        | Consumers must handle duplicates
Exactly once  | The message is delivered once with no loss and no duplicate processing | Strongest behavior, usually more expensive

At most once can work for short-lived values such as rapidly changing exchange rates, where a later value will replace the previous one.

At least once is common when losing data is unacceptable, but duplicates can be detected and ignored by the consumer.

Exactly once is the ideal from a business point of view, but it is usually the most expensive option because the system must track duplicates and coordinate delivery state.
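To see where at least once duplicates come from, consider a minimal sketch of a redelivery loop. It assumes a simplified model in which the consumer returns true to acknowledge a message; AtLeastOnceSketch and deliver are hypothetical names, not part of any broker library.

```java
import java.util.function.Predicate;

// Hypothetical model of at-least-once delivery: redeliver until acknowledged.
class AtLeastOnceSketch {

    // Returns the number of delivery attempts that were needed.
    // The consumer returns true to acknowledge the message.
    static int deliver(String message, Predicate<String> consumer, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean acknowledged = consumer.test(message);
            if (acknowledged) {
                return attempt;
            }
            // No acknowledgement: the broker cannot tell whether the consumer
            // failed before or after doing its work, so it delivers again.
        }
        throw new IllegalStateException("Message not acknowledged after " + maxAttempts + " attempts");
    }
}
```

If the consumer completes its work but the acknowledgement is lost, the second delivery repeats the work. That is the duplicate the consumer must be prepared to handle.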

Designing Consumers for Duplicate Messages

A safe consumer should assume that a message can arrive more than once unless the whole system has been designed and configured otherwise.

One simple approach is to use an identifier in the message header or body and record that it has already been processed.

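public class PaymentSettlementConsumer {
    private final ProcessedMessageRepository processedMessages;
    private final SettlementService settlementService;

    public PaymentSettlementConsumer(ProcessedMessageRepository processedMessages,
                                     SettlementService settlementService) {
        this.processedMessages = processedMessages;
        this.settlementService = settlementService;
    }

    public void handle(Message message) {
        String messageId = message.getHeader("MessageID");

        // Skip messages that were already settled (duplicate delivery).
        if (processedMessages.exists(messageId)) {
            return;
        }

        // The settlement and the ID record should commit in one transaction;
        // otherwise a crash between the two calls lets a redelivery settle twice.
        settlementService.settle(message.getBody());
        processedMessages.save(messageId);
    }
}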

This is only a conceptual example, but the idea is important. If you choose at least once delivery, idempotent consumers become part of the architecture.
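A self-contained variant of the same idea is sketched below, assuming an in-memory set of processed IDs instead of a repository. The class name IdempotentHandlerSketch is hypothetical. The point it illustrates is that the "already seen?" check and the "mark as seen" step are a single atomic operation, so two concurrent deliveries of the same message cannot both pass the check.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical in-memory idempotent handler; a real system would persist the IDs.
class IdempotentHandlerSketch {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> businessAction;

    IdempotentHandlerSketch(Consumer<String> businessAction) {
        this.businessAction = businessAction;
    }

    // Returns true when the message was processed, false when it was skipped as a duplicate.
    boolean handle(String messageId, String body) {
        // add() is atomic: only the first caller for a given ID gets true.
        if (!processedIds.add(messageId)) {
            return false;
        }
        try {
            businessAction.accept(body);
            return true;
        } catch (RuntimeException e) {
            // Forget the ID so that a redelivery can retry the failed work.
            processedIds.remove(messageId);
            throw e;
        }
    }
}
```

An in-memory set is lost on restart, so this only shows the control flow. Durable deduplication needs the ID stored together with the business result.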

Zero Message Loss

When zero message loss is required, brokers usually rely on persistence, copies, or both.

With message persistence, the broker writes the message to a filesystem or database before acknowledging the producer. If the broker crashes, it can recover from the persisted journal.

Producer
  |
  v
Broker
  |
  +-- Persistent journal
  |
  v
Consumer
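The "persist before acknowledging" rule can be sketched with a plain append-only file. This is a simplified illustration, not how any particular broker implements its journal: JournalSketch is a hypothetical class, messages are assumed to be single-line strings, and the return value of accept plays the role of the producer acknowledgement.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical append-only journal: write and force to disk, then acknowledge.
class JournalSketch {
    private final Path journal;

    JournalSketch(Path journal) {
        this.journal = journal;
    }

    // Returns true (the acknowledgement) only after the message is forced to storage.
    boolean accept(String message) throws IOException {
        try (FileChannel channel = FileChannel.open(journal,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buffer = ByteBuffer.wrap((message + "\n").getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            channel.force(true); // flush content and metadata before acknowledging
        }
        return true;
    }

    // After a crash, the journal is replayed to rebuild the pending messages.
    List<String> recover() throws IOException {
        return Files.exists(journal)
                ? Files.readAllLines(journal, StandardCharsets.UTF_8)
                : List.of();
    }
}
```

The call to force is where the performance cost comes from: the producer waits for the storage device, not just for memory.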

With message copies, the broker sends copies to backup broker instances before acknowledging the producer. If the primary broker fails, a backup instance can take over.

Producer
  |
  v
Primary broker
  |
  +-- Copy to backup broker A
  +-- Copy to backup broker B

These techniques improve reliability, but they have performance costs. The more durable the guarantee, the more work the system must do before saying that a message was accepted.

Dead Letter Queues and Time to Live

A dead letter queue is a special destination for messages that cannot be processed normally. Messages may be sent there when no consumer is available after a certain period, when the broker does not know what to do with the message, or when processing fails repeatedly.

Queue
  |
  +-- valid message ---> Consumer
  |
  +-- failed message --> Dead Letter Queue

A dead letter queue should not be ignored. It is an operational signal and often contains recoverable business data.

Time to live defines how long a message remains valid. If the message expires before it is consumed, the broker may discard it or move it to a special queue such as the dead letter queue.

This is useful when old data is actively harmful. For example, a short-lived notification may not be worth processing after its time window has passed.
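The routing decision described above can be sketched as a small function. The names RoutingSketch, Route, and decide are hypothetical, and the policy (expired or repeatedly failed messages go to the dead letter queue) is one reasonable assumption; real brokers expose this through configuration rather than application code.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical routing policy combining time to live and retry limits.
class RoutingSketch {
    enum Route { DELIVER, RETRY, DEAD_LETTER }

    static Route decide(Instant createdAt, Duration timeToLive, int failedAttempts,
                        Instant now, int maxAttempts) {
        // An expired message is no longer valid, regardless of retries.
        boolean expired = now.isAfter(createdAt.plus(timeToLive));
        if (expired) {
            return Route.DEAD_LETTER;
        }
        // Repeated processing failures also end in the dead letter queue.
        if (failedAttempts >= maxAttempts) {
            return Route.DEAD_LETTER;
        }
        return failedAttempts == 0 ? Route.DELIVER : Route.RETRY;
    }
}
```

Passing the current time as a parameter keeps the function deterministic, which makes expiration behavior easy to test.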

Message Headers

Headers carry metadata used by the broker, the integration route, and the consumers.

A practical header set can include:

MessageID: unique identifier for duplicate detection
CorrelationID: identifier shared by related messages
CreatedAt: message creation time
Sender: producing system
MessageType: business type of the payload
TracePath: integration steps already passed

Good headers make troubleshooting easier. They also help build message history, where each route step records that it processed the message.
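As a minimal sketch, the header set and the message history idea could look like this in Java. HeaderSketch and its methods are hypothetical, the headers are modeled as a plain string map, and the ">" separator in TracePath is an arbitrary choice for illustration.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper that builds the header set and records route steps.
class HeaderSketch {

    static Map<String, String> newHeaders(String sender, String messageType,
                                          String correlationId, Instant createdAt) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("MessageID", UUID.randomUUID().toString()); // unique per message
        headers.put("CorrelationID", correlationId);            // shared by related messages
        headers.put("CreatedAt", createdAt.toString());         // ISO-8601 timestamp
        headers.put("Sender", sender);
        headers.put("MessageType", messageType);
        headers.put("TracePath", "");
        return headers;
    }

    // Each route step appends its name, which builds up the message history.
    static void recordStep(Map<String, String> headers, String stepName) {
        String path = headers.getOrDefault("TracePath", "");
        headers.put("TracePath", path.isEmpty() ? stepName : path + ">" + stepName);
    }
}
```

After a message has passed a validation step and an enrichment step, its TracePath would read "validate>enrich", which tells an operator how far the message got.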

Testing a Messaging Flow

A messaging test should cover more than the successful path.

A useful test workflow is:

  1. Publish a valid message.
  2. Verify that one consumer processes it.
  3. Publish the same message twice.
  4. Verify that duplicate handling is safe.
  5. Stop all consumers.
  6. Publish messages.
  7. Start consumers and verify that queued messages are processed.
  8. Publish invalid messages.
  9. Verify that they reach the dead letter queue.
  10. Send a special test message and verify that monitoring receives it.

A conceptual test message can look like this:

{
  "messageType": "TEST",
  "messageId": "test-001",
  "createdAt": "2026-05-28T10:00:00Z",
  "payload": {
    "route": "payment-settlement"
  }
}

Every intermediate step should either tolerate this message or explicitly route it away from business side effects.

Common Mistakes

One mistake is choosing exactly once everywhere. It sounds ideal, but it can create unnecessary performance costs.

Another mistake is forgetting that at least once requires duplicate-safe consumers. The broker can redeliver, so the consumer must be prepared.

A third mistake is not monitoring the dead letter queue. A growing dead letter queue usually means a real problem: bad messages, missing consumers, incorrect routing, or downstream failures.

A fourth mistake is putting too little metadata in the message. Without identifiers and timestamps, troubleshooting becomes guesswork.

A fifth mistake is treating topics as durable work queues. Topics and queues have different semantics. Choose based on the business meaning of the message.

Checklist

  • Queue or topic is chosen intentionally.
  • Delivery guarantee is documented.
  • Consumers are duplicate-safe when needed.
  • Message IDs are stable and unique.
  • Correlation IDs are available for related messages.
  • Dead letter queues are monitored.
  • Time to live is configured only when expiration is meaningful.
  • Zero-loss requirements are justified by business value.
  • Persistence and broker copies are understood as performance tradeoffs.
  • Test messages are routed safely.

Conclusion

Messaging is not only a transport mechanism. It is a reliability contract between producers, brokers, and consumers.

A good messaging design defines who receives each message, what delivery guarantee is required, how duplicates are handled, how failures are recovered, and how the flow is observed in production. Once those decisions are explicit, queues and topics become powerful tools instead of hidden sources of uncertainty.

Choosing Formats and API Protocols for Java Integration

Learn when to use XML, JSON, YAML, Protobuf, SOAP, REST, gRPC, GraphQL, ETL, data virtualization, and change data capture in Java integration design.

Integration design is not only about where messages go. It is also about how data is represented and which communication style connects the systems.

A Java service may need to receive JSON from a mobile application, write XML for a legacy settlement platform, expose REST APIs, call a gRPC service, support GraphQL queries, or move database records through ETL or change data capture. Each option solves a different problem.

The Problem

When two systems communicate, they must agree on two things:

  • The data format.
  • The communication protocol or integration technique.

The format defines how data is encoded. The protocol or technique defines how systems exchange it.

System A
  |
  | protocol: REST
  | format: JSON
  v
System B

Poor choices create unnecessary coupling. A format that is easy for humans may be inefficient for high-volume machine communication. A flexible API style may require extra validation, security, or governance. A data integration process may be useful for reporting but dangerous if used as the main way to synchronize operational systems.

Core Idea

Start from the use case, not from the technology.

Ask these questions:

  1. Is this request-response communication or asynchronous movement?
  2. Is the consumer a browser, mobile app, backend service, legacy system, or reporting platform?
  3. Is the payload human-readable or optimized for compact binary transfer?
  4. Does the client need fixed responses or flexible field selection?
  5. Is the data transient, as part of a call or message, or data at rest in storage?
  6. Who owns the source of truth?

Once those answers are clear, choosing between XML, JSON, Protobuf, REST, gRPC, GraphQL, ETL, virtualization, or change data capture becomes easier.

XML

XML is widely known in enterprise integration and legacy systems. It is structured, schema-friendly, and common in older platforms.

A settlement payload can look like this:

<payment>
  <id>1ef43029-f1eb-4dd8-85c4-1c332b69173c</id>
  <currency>EUR</currency>
  <sender>giuseppe@test.it</sender>
  <recipient>stefano@domain.com</recipient>
  <amount>10.0</amount>
</payment>

XML works well when a target system already expects it or when schema validation is important. The downside is verbosity. In many modern API scenarios, JSON is lighter and easier for web clients.
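Reading such a payload in Java does not require extra libraries. The sketch below uses the DOM parser that ships with the JDK; PaymentXmlSketch and readField are hypothetical names, and a real integration would more likely validate against a schema or bind the document to classes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper that reads one child element of a payment document.
class PaymentXmlSketch {

    static String readField(String xml, String fieldName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations to avoid XML external entity (XXE) attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = document.getDocumentElement();
            return root.getElementsByTagName(fieldName).item(0).getTextContent();
        } catch (Exception e) {
            // Covers malformed XML as well as a missing element.
            throw new IllegalArgumentException("Invalid payment XML", e);
        }
    }
}
```

Even this small example shows the cost of verbosity: several objects are involved before a single value can be read.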

JSON and YAML

JSON is common in web APIs and especially natural for JavaScript-based clients. In Java, Jackson is a common way to map JSON fields to Java object fields, either implicitly by matching names or explicitly through annotations.

{
  "id": "1ef43029-f1eb-4dd8-85c4-1c332b69173c",
  "currency": "EUR",
  "sender": "giuseppe@test.it",
  "recipient": "stefano@domain.com",
  "amount": 10.0
}

JSON is a practical default for REST APIs and browser or mobile clients.

YAML is another serialization format. It became especially common in configuration-heavy environments, including Kubernetes resource definitions and framework configuration. For application payloads, JSON is often more common. For configuration, YAML can be readable and compact.

payment:
  currency: EUR
  sender: giuseppe@test.it
  recipient: stefano@domain.com
  amount: 10.0

YAML should be treated carefully in integration payloads because whitespace and formatting mistakes can make it harder to process consistently.

Protobuf

Protocol Buffers, usually called Protobuf, is a binary serialization format. It is language independent and commonly used when systems need compact, efficient, strongly described payloads.

A .proto file describes the message structure and can be used by a compiler to generate Java classes for serialization and deserialization.

syntax = "proto3";

option java_outer_classname = "PaymentProto";
option java_package = "it.test";

message Payment {
  string id = 1;
  string currency = 2;
  string sender = 3;
  string recipient = 4;
  float amount = 5;
}

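The field numbers are important because Protobuf uses them, rather than the field names, as identifiers in the binary message. This means schema evolution must be handled deliberately: once a number is in use, it should not be changed or reused for a different field.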

Use Protobuf when compact machine-to-machine communication matters more than human readability.
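To make the role of field numbers concrete, here is a hand-written sketch of how a single string field is laid out in the Protobuf wire format: a tag built from the field number and wire type, a length, and the UTF-8 bytes. WireFormatSketch is a hypothetical class for illustration only; real code should use the classes generated by the Protobuf compiler.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: manual encoding of one length-delimited (string) field.
class WireFormatSketch {
    static final int WIRE_TYPE_LENGTH_DELIMITED = 2;

    static byte[] encodeStringField(int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The tag carries the field number, not the field name.
        writeVarint(out, (fieldNumber << 3) | WIRE_TYPE_LENGTH_DELIMITED);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Varint: 7 bits per byte, high bit set while more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }
}
```

For the currency field above (field number 2, value "EUR"), the encoded bytes are 0x12 0x03 0x45 0x55 0x52. The name "currency" never appears on the wire, which is why renaming a field is safe but renumbering it is not.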

SOAP and REST

SOAP and REST are common API styles for communication between systems.

SOAP is strongly associated with XML and traditional enterprise web services. It can provide formal contracts, but it is verbose and less flexible than many modern alternatives.

REST is more lightweight. It uses HTTP verbs such as GET, PUT, POST, and DELETE against resources identified by URIs. JSON is commonly used as the payload format.

POST /payments
Content-Type: application/json

{
  "recipient": "stefano@domain.com",
  "amount": 10.0,
  "currency": "EUR"
}

REST is flexible, but some capabilities that may be embedded in SOAP-style stacks, such as validation, session handling, or security details, are usually provided through additional tools, libraries, and platform conventions.
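In Java, the request above can be built with the HTTP client included in the JDK since Java 11. The sketch below only constructs the request; RestRequestSketch, createPayment, and the base URL passed in are hypothetical, and the target service is assumed to exist.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds the POST /payments request.
class RestRequestSketch {

    static HttpRequest createPayment(String baseUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/payments"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

Sending it would typically go through HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Validation of the body, authentication, and error mapping are not part of this call, which is the gap the paragraph above describes.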

gRPC

gRPC is a modern remote procedure call framework. It uses Protobuf by default and supports synchronous and asynchronous calls, bidirectional streaming, and notifications, along with built-in support for security and flow control.

A minimal service definition can look like this:

syntax = "proto3";

option java_multiple_files = true;
option java_package = "it.test";
option java_outer_classname = "GrpcTestProto";

package grpctest;

service Ping {
  rpc Send (PingRequest) returns (PingReply) {}
}

message PingRequest {
  string msg = 1;
}

message PingReply {
  string msg = 1;
}

In Java, a client builds a channel, creates a stub, builds a request object, sends it, and receives a response.

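import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// Plaintext is suitable for local testing only; use TLS in production.
ManagedChannel channel =
    ManagedChannelBuilder.forTarget("localhost:9783")
        .usePlaintext()
        .build();

// PingGrpc, PingRequest, and PingReply are generated from the .proto file above.
PingGrpc.PingBlockingStub blockingStub = PingGrpc.newBlockingStub(channel);

PingRequest request = PingRequest.newBuilder()
    .setMsg("Ciao")
    .build();

PingReply response = blockingStub.send(request);

// Release the channel's resources when the client is done.
channel.shutdown();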

Use gRPC for service-to-service communication where compact contracts, streaming, and efficient cross-language communication are important. In production, also plan for shutdown, exception handling, retries, flow control, and load balancing.

GraphQL

GraphQL is a way to define APIs where clients can request exactly the data they need. This can be useful for mobile applications and other clients where reducing unnecessary data transfer matters.

A client can request only selected fields:

query {
  payments {
    date
    amount
    recipient
  }
}

A query can also include conditions:

query {
  getPayments(recipient: "giuseppe") {
    amount
    date
  }
}

GraphQL supports complex nested types and can include features such as pagination, sorting, and caching. It can be implemented inside backend code or provided through a standalone server that reads from a data source.

Use GraphQL when client flexibility is the main concern. Use REST when simple resource-oriented APIs are enough. Use gRPC when strong service contracts and binary efficiency matter more.

Data Integration

So far, the focus has been on transient data: data moving as part of an API call, a message, or an integration route. Data integration is different. It focuses on data at rest, such as databases, files, and CSV exports.

Common techniques include ETL, data virtualization, and change data capture.

ETL:
Source data stores
  |
  v
Extract
  |
  v
Transform
  |
  v
Load into the target data store

ETL reads from one or more sources, transforms the data, and loads it into a target system. It is common in data warehouses and batch processing.
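The three stages can be sketched in a few lines of Java. This is a toy illustration under stated assumptions: the source is a list of CSV lines in the hypothetical layout "recipient,currency,amount", the transformation normalizes the currency code and sums amounts per currency, and the target is an in-memory map standing in for a warehouse table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical ETL job: CSV lines in, totals per currency out.
class EtlSketch {

    static Map<String, Double> run(List<String> csvLines) {
        Map<String, Double> target = new LinkedHashMap<>();
        for (String line : csvLines) {
            // Extract: split the raw source record into columns.
            String[] columns = line.split(",");
            if (columns.length != 3) {
                continue; // skip malformed rows (a real job would log or reject them)
            }
            // Transform: normalize the currency code and parse the amount.
            String currency = columns[1].trim().toUpperCase(Locale.ROOT);
            double amount = Double.parseDouble(columns[2].trim());
            // Load: accumulate into the target store.
            target.merge(currency, amount, Double::sum);
        }
        return target;
    }
}
```

A double is used here only to keep the sketch short. Monetary totals in a real job would normally use BigDecimal.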

Data virtualization tries to avoid copying data. Instead of loading into a target database, a virtual layer translates requests into queries against source systems. This can reduce replication, but it can become complicated and may need caching for performance.

Change data capture listens for changes in a source system and propagates them to interested systems. This can be done by polling or by reading database transaction logs. The detected events are often propagated through queues.
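The polling variant can be sketched as a comparison between two snapshots of a table. The names PollingCdcSketch and diff are hypothetical, rows are modeled as a map from primary key to a non-null row value, and the resulting event strings stand in for whatever would be published to a queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical polling-based change detection between two snapshots.
class PollingCdcSketch {

    static List<String> diff(Map<String, String> previous, Map<String, String> current) {
        List<String> events = new ArrayList<>();
        // Visit every key seen in either snapshot, in a stable order.
        TreeSet<String> keys = new TreeSet<>(previous.keySet());
        keys.addAll(current.keySet());
        for (String key : keys) {
            String before = previous.get(key);
            String after = current.get(key);
            if (before == null) {
                events.add("INSERT " + key);
            } else if (after == null) {
                events.add("DELETE " + key);
            } else if (!before.equals(after)) {
                events.add("UPDATE " + key);
            }
        }
        return events;
    }
}
```

Polling only sees the state at each poll, so a row that changes twice between polls produces one event. Reading the transaction log avoids that loss, at the price of a tighter coupling to the database.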

Choosing the Right Option

A practical decision guide:

Use case                           | Good fit
Web or mobile API                  | REST with JSON
Legacy enterprise contract         | SOAP with XML
Efficient service-to-service calls | gRPC with Protobuf
Client-selected fields             | GraphQL
Configuration                      | YAML or properties-style configuration
Batch reporting movement           | ETL
Avoiding physical data replication | Data virtualization
Reacting to database changes       | Change data capture

The table is not a rulebook. It is a starting point. The real choice depends on existing systems, operational skills, governance, security, and performance expectations.

Common Mistakes

One mistake is choosing data integration when an API would be cleaner. Moving data at rest can create stale data, unclear ownership, and synchronization problems.

Another mistake is using GraphQL only because it is flexible. If clients always need the same resource shape, REST may be simpler.

A third mistake is choosing Protobuf or gRPC without planning schema evolution and generated-code ownership.

A fourth mistake is treating REST as complete by itself. Validation, security, documentation, and error semantics still need design.

A fifth mistake is ignoring the consumer. A mobile client, backend service, legacy platform, and data warehouse have different needs.

Checklist

  • The data format matches the consumer.
  • The protocol matches the interaction style.
  • Human readability and binary efficiency are weighed deliberately.
  • API security and validation are handled explicitly.
  • Schema ownership is clear.
  • Data-at-rest integration has a clear source of truth.
  • ETL jobs have monitoring and recovery.
  • Change data capture events are traceable.
  • GraphQL is chosen for client-driven data selection, not fashion.
  • gRPC calls include error and lifecycle handling.

Conclusion

Formats and protocols are architectural decisions. XML, JSON, YAML, Protobuf, SOAP, REST, gRPC, GraphQL, ETL, virtualization, and change data capture all solve real problems, but not the same problem.

A good integration design starts with the communication need, the consumer, the source of truth, and the operational constraints. Once those are clear, the technology choice becomes a consequence rather than a guess.
