Tracing Java Microservices with Jaeger and APM

Distributed applications need more than logs and metrics. They also need a way to follow one request across many components.

Tracing gives that view. It helps a team understand the path of a request as it moves through REST endpoints, injected services, downstream APIs, and other microservices. In production, that path is often the difference between guessing and knowing where latency or failure started.

The Problem

A simple Java application has a short call path.

HTTP request
  |
  v
REST resource
  |
  v
service method
  |
  v
response

A microservices application can be much harder to follow.

Client
  |
  v
API gateway
  |
  v
Payment API
  |
  +--> Profile service
  +--> Authorization service
  +--> Notification service
  +--> Settlement integration

If the response is slow or incorrect, the team needs to know which part of the chain caused the problem. Logs alone may contain the data, but without a shared identifier the entries are hard to correlate. Metrics may show that something is slow, but not necessarily which request path created the issue.

Tracing solves this by propagating an identifier through the call chain.
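The propagation idea can be sketched in plain Java without any tracing library. The header name X-Trace-Id and the class and method names below are hypothetical, chosen only for illustration; real tracers such as Jaeger define their own header formats.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch only: the header name and helper methods are illustrative assumptions,
// not the wire format of any particular tracer.
final class TraceIdPropagationSketch {
    static final String TRACE_HEADER = "X-Trace-Id"; // hypothetical header name

    // Reuse the caller's identifier if present; otherwise start a new trace.
    static String extractOrCreate(Map<String, String> incomingHeaders) {
        String existing = incomingHeaders.get(TRACE_HEADER);
        return (existing != null && !existing.isBlank())
                ? existing
                : UUID.randomUUID().toString();
    }

    // Copy the identifier onto the headers of an outgoing downstream call.
    static Map<String, String> inject(String traceId, Map<String, String> outgoingHeaders) {
        Map<String, String> copy = new HashMap<>(outgoingHeaders);
        copy.put(TRACE_HEADER, traceId);
        return copy;
    }

    public static void main(String[] args) {
        // The gateway receives a request without an identifier and creates one.
        String traceId = extractOrCreate(Map.of());
        // The gateway calls the Payment API, which then calls the Profile service.
        Map<String, String> toPayment = inject(traceId, Map.of("Accept", "application/json"));
        String seenByPayment = extractOrCreate(toPayment);
        Map<String, String> toProfile = inject(seenByPayment, Map.of());
        System.out.println("Same id end to end: "
                + traceId.equals(extractOrCreate(toProfile)));
    }
}
```

Each service only reads the identifier on the way in and writes it on the way out, so every hop can tag its own logs and spans with the same value.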

Core Idea

Tracing records the path of a request through a system.

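The chapter describes OpenTracing as a CNCF standard for implementing this kind of functionality. In Quarkus, the example uses the SmallRye OpenTracing extension. The OpenTracing project has since been archived in favor of OpenTelemetry, but the trace and span concepts described here carry over.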

A trace can be understood as the complete request journey. A span is one unit of work inside that journey.

Trace: GET /payment-summary

Span 1: API gateway request
Span 2: Payment API resource
Span 3: Profile service call
Span 4: Transaction query
Span 5: Response assembly

A tracing tool can show the hierarchy and timing of these spans. That makes it useful for troubleshooting, audit logging, and understanding system behavior.
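To make the trace and span relationship concrete, here is a minimal in-memory sketch. It is a simplified model, not the data model of Jaeger or OpenTracing, and the class names and timings are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a simplified in-memory model, not the data model of Jaeger or OpenTracing.
final class SpanTreeSketch {
    // One unit of work with a start offset and a duration in milliseconds.
    static final class Span {
        final String name;
        final long startMs;
        final long durationMs;
        final List<Span> children = new ArrayList<>();

        Span(String name, long startMs, long durationMs) {
            this.name = name;
            this.startMs = startMs;
            this.durationMs = durationMs;
        }

        // Attach a child span and return this span so calls can be chained.
        Span child(Span span) {
            children.add(span);
            return this;
        }
    }

    // Render the span hierarchy as indented text, one span per line.
    static String render(Span span, int depth) {
        StringBuilder out = new StringBuilder();
        out.append("  ".repeat(depth))
           .append(span.name).append(" [").append(span.startMs).append("ms +")
           .append(span.durationMs).append("ms]\n");
        for (Span c : span.children) {
            out.append(render(c, depth + 1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Timings are invented for illustration.
        Span root = new Span("API gateway request", 0, 120)
            .child(new Span("Payment API resource", 5, 110)
                .child(new Span("Profile service call", 10, 40))
                .child(new Span("Transaction query", 55, 45))
                .child(new Span("Response assembly", 102, 8)));
        System.out.print(render(root, 0));
    }
}
```

The root span plus everything reachable from it is the trace; a viewer such as Jaeger draws the same tree as a timeline.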

Simple Quarkus Tracing Example

A REST resource can be traced without adding tracing code directly to the endpoint when the tracing extension is present.

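import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// The javax.* imports assume a Quarkus release that ships smallrye-opentracing.
@Path("/trace")
public class TracingTest {
    @Inject
    NameGuessService service;

    // No tracing code here: the extension traces the REST endpoint itself.
    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String hello() {
        String name = service.guess();
        return "Hello " + name;
    }
}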

The endpoint listens on /trace, calls an injected service, and returns plain text. As the source example points out, the framework traces REST endpoints automatically once the smallrye-opentracing extension is included.

The service can be traced explicitly with @Traced. Placed on the class, as here, the annotation applies to every method of the class.

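import java.util.Random;
import javax.enterprise.context.ApplicationScoped;
import org.eclipse.microprofile.opentracing.Traced;

@ApplicationScoped
@Traced // calls to this bean's methods are reported as spans
public class NameGuessService {
    // Picks one of the sample names at random.
    public String guess() {
        Random random = new Random();
        String[] names = {"Giuseppe", "Stefano", "Filippo", "Luca", "Antonello"};
        return names[random.nextInt(names.length)];
    }
}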

The method does not need complicated instrumentation. The annotation tells the framework that this operation should be visible as part of the trace.

Running Jaeger Locally

Jaeger collects and displays tracing data. The chapter uses a containerized all-in-one Jaeger server for local testing. The command below publishes the agent and collector ports, plus the Jaeger web UI on port 16686.

sudo docker run -p 5775:5775/udp -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 14268:14268 jaegertracing/all-in-one:latest

Then the application can be started with JVM arguments that name the service and configure sampling.

./mvnw compile quarkus:dev -Djvm.args="-DJAEGER_SERVICE_NAME=testservice -DJAEGER_SAMPLER_TYPE=const -DJAEGER_SAMPLER_PARAM=1"

After invoking the /trace endpoint a few times, Jaeger can display traces and spans. In the source example, one span represents the REST call and another span represents the injected service call.
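The sampler settings decide which requests are recorded at all. With the const sampler and a parameter of 1, every trace is kept, which suits local testing. Jaeger also offers a probabilistic sampler whose parameter is a fraction between 0 and 1. The sketch below mimics only the decision logic; it is not the Jaeger client implementation, and all names in it are hypothetical.

```java
import java.util.Random;

// Sketch only: mimics the idea behind "const" and "probabilistic" samplers;
// it is not the Jaeger client implementation.
final class SamplerSketch {
    interface Sampler {
        boolean isSampled();
    }

    // const: param 1 keeps every trace, param 0 keeps none.
    static Sampler constant(int param) {
        return () -> param == 1;
    }

    // probabilistic: keep roughly the given fraction of traces (0.0 to 1.0).
    static Sampler probabilistic(double rate, Random random) {
        return () -> random.nextDouble() < rate;
    }

    // Count how many of the given number of requests the sampler would record.
    static int countSampled(Sampler sampler, int requests) {
        int kept = 0;
        for (int i = 0; i < requests; i++) {
            if (sampler.isSampled()) kept++;
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println("const=1 keeps " + countSampled(constant(1), 1000) + " of 1000");
        System.out.println("probabilistic=0.1 keeps about "
                + countSampled(probabilistic(0.1, new Random()), 1000) + " of 1000");
    }
}
```

Keeping every trace is convenient on a laptop, while a lower rate is one common way to limit overhead and storage under production traffic.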

In a real system, the same idea extends across multiple microservices.

Incoming request
  |
  v
Span: REST endpoint
  |
  v
Span: application service
  |
  v
Span: remote service call
  |
  v
Span: database or integration call

What Tracing Helps You See

Tracing is useful because it connects the request to its sub-calls.

It can help answer:

Which services were called?
Which operation was slow?
Did the request fail before or after a downstream call?
How much time did each span take?
Which subsystem handled the request?
How does the call tree change between releases?

This is especially valuable when the same business feature is implemented by many components.

For example, a payment summary may need profile data, recent payments, authorization state, and notification status. A trace can show whether the slow part is the payment service itself or a downstream dependency.
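One way to separate "the service itself" from "a downstream dependency" is to compute each span's self time, meaning its duration minus the time spent in its children. The sketch below assumes child spans run one after another; the span names and timings are invented for illustration.

```java
import java.util.List;

// Sketch only: span names and timings are invented for illustration.
final class SelfTimeSketch {
    static final class Span {
        final String name;
        final long durationMs;
        final List<Span> children;

        Span(String name, long durationMs, List<Span> children) {
            this.name = name;
            this.durationMs = durationMs;
            this.children = children;
        }
    }

    // Time spent in the span itself, assuming its children run one after another.
    static long selfTimeMs(Span span) {
        long childTotal = 0;
        for (Span c : span.children) childTotal += c.durationMs;
        return Math.max(0, span.durationMs - childTotal);
    }

    // Walk the tree and return the span with the largest self time.
    static Span slowest(Span span) {
        Span worst = span;
        for (Span c : span.children) {
            Span candidate = slowest(c);
            if (selfTimeMs(candidate) > selfTimeMs(worst)) worst = candidate;
        }
        return worst;
    }

    public static void main(String[] args) {
        Span summary = new Span("GET /payment-summary", 900, List.of(
            new Span("Profile service call", 60, List.of()),
            new Span("Recent payments query", 90, List.of()),
            new Span("Authorization service call", 700, List.of()),
            new Span("Notification status call", 30, List.of())));
        Span worst = slowest(summary);
        System.out.println("Slowest by self time: " + worst.name
                + " (" + selfTimeMs(worst) + " ms)");
    }
}
```

With these numbers the payment summary span has only 20 ms of its own work, so the sketch points at the 700 ms authorization call as the place to look first.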

Tracing Is Not APM by Itself

Tracing is powerful, but it is only one part of Application Performance Management.

APM is broader. It combines several signals into one operational view:

Logs:
What happened?

Metrics:
How is the system behaving over time?

Health checks:
Is the service usable right now?

Tracing:
Which path did a specific request take?

Runtime data:
How are JVM memory, CPU, and threads behaving?

The chapter describes APM as the effort to understand how applications are performing and how underlying parameters, such as memory usage and database metrics, affect the end-user experience, for example responsiveness and response time.

This often requires a stack of tools. Logs, metrics, health checks, tracing, JVM data, and sometimes Java agents all contribute different views.
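The runtime-data view does not always need an external tool. As a minimal sketch, the standard java.lang.management API exposes heap usage, thread count, and system load from inside the JVM. The class and method names below are hypothetical; how the values are exported to a monitoring system is left out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch only: reads a few JVM values through the standard java.lang.management API.
final class RuntimeSnapshotSketch {
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Returns a negative value on platforms where the load average is unavailable.
    static double systemLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.printf("used heap: %d MiB%n", usedHeapBytes() / (1024 * 1024));
        System.out.printf("live threads: %d%n", liveThreadCount());
        System.out.printf("system load average: %.2f%n", systemLoadAverage());
    }
}
```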

Be Careful with APM Costs

APM is valuable, but it is not free.

There are three main concerns.

Performance Impact

Many modern approaches aim to be asynchronous and lightweight. Older or more invasive techniques, such as Java agents, can add noticeable overhead. That overhead matters most because APM is needed most when systems are already under pressure.

Monitoring must help the overloaded system,
not become another reason it is overloaded.
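One common way to keep reporting lightweight is a bounded, non-blocking hand-off: request threads put finished spans on a fixed-size queue, a background thread ships them in batches, and spans are dropped when the queue is full. The sketch below illustrates that pattern with hypothetical names; it is not taken from any specific client library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a bounded, non-blocking hand-off between request threads and a reporter.
final class BoundedSpanReporterSketch {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedSpanReporterSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called on the request thread: never blocks; drops the span when the queue is full.
    boolean report(String finishedSpan) {
        boolean accepted = queue.offer(finishedSpan);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    // Called by a background thread: takes whatever is queued and ships it as one batch.
    List<String> drainBatch() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        BoundedSpanReporterSketch reporter = new BoundedSpanReporterSketch(2);
        for (int i = 1; i <= 5; i++) reporter.report("span-" + i);
        System.out.println("batch=" + reporter.drainBatch()
                + " dropped=" + reporter.droppedCount());
    }
}
```

The design choice is deliberate: under load the application loses some monitoring data instead of losing request throughput, and the dropped counter makes that loss visible.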

Maintenance Effort

Collected data can be huge. One transaction may generate logs, metrics, timings, error information, and tracing spans. Multiply that by thousands of transactions and many services, and storage and configuration become serious operational work.
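A rough estimate helps before choosing retention settings. The sketch below multiplies transactions per day, spans per transaction, bytes per span, and the sampling rate. Every input is an invented planning figure, not a measurement, and real span sizes vary with tags and logs.

```java
// Sketch only: every input below is an invented planning figure, not a measurement.
final class TraceStorageEstimateSketch {
    // bytes per day = transactions/day * spans/transaction * bytes/span * sampling rate
    static long bytesPerDay(long transactionsPerDay, int spansPerTransaction,
                            int bytesPerSpan, double samplingRate) {
        return Math.round(transactionsPerDay * (double) spansPerTransaction
                * bytesPerSpan * samplingRate);
    }

    static long bytesRetained(long bytesPerDay, int retentionDays) {
        return bytesPerDay * retentionDays;
    }

    public static void main(String[] args) {
        long perDay = bytesPerDay(2_000_000L, 12, 500, 1.0);
        long retained = bytesRetained(perDay, 14);
        System.out.printf("per day: %.1f GB, retained for 14 days: %.1f GB%n",
                perDay / 1e9, retained / 1e9);
    }
}
```

With these assumed inputs, two million transactions of twelve 500-byte spans come to 12 GB per day, or 168 GB for a 14-day retention window, before indexes and replication.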

Correlation Difficulty

Tracing correlates one request path well. The harder part is connecting traces to logs, health checks, metrics, and business KPIs. A strong observability design should include common identifiers and consistent naming so the different views can be joined during troubleshooting.
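A simple form of common identifier is to stamp every log entry with the active trace id. In the sketch below a ThreadLocal stands in for the logging context (such as an MDC) that a real logging framework would provide; the field names and the example id are hypothetical.

```java
// Sketch only: a ThreadLocal stands in for the logging context (such as an MDC)
// that a real logging framework would provide. All names here are hypothetical.
final class TraceAwareLogSketch {
    private static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    // Remember the trace id for the request handled on the current thread.
    static void startRequest(String traceId) {
        CURRENT_TRACE_ID.set(traceId);
    }

    // Clear the id so a pooled thread does not leak it into the next request.
    static void endRequest() {
        CURRENT_TRACE_ID.remove();
    }

    // Prefix every entry with the active trace id so logs can be joined to traces.
    static String format(String service, String message) {
        String traceId = CURRENT_TRACE_ID.get();
        return "traceId=" + (traceId == null ? "none" : traceId)
                + " service=" + service + " msg=\"" + message + "\"";
    }

    public static void main(String[] args) {
        startRequest("4bf92f3577b34da6");
        try {
            System.out.println(format("payment-api", "authorization requested"));
            System.out.println(format("payment-api", "authorization accepted"));
        } finally {
            endRequest();
        }
    }
}
```

Using the same key name (here traceId) in every service is what lets a log search and a trace viewer be joined on one value.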

Service Monitoring and KPIs

Service monitoring connects technical data to business meaning.

The chapter separates two types of KPIs:

KPI type	Examples
Technical	memory usage, thread count, connection count, CPU usage
Business	transaction time, concurrent users, new users, amount of money passing through the platform

The boundary can be blurry. Average transaction time can be both a business metric and a technical symptom.

A useful architecture map connects business features to technical components.

Business KPI:
Payment transaction time

Implemented by:
Payment API
Authorization service
Profile service
Settlement integration

Technical metrics to inspect:
JVM heap usage
thread count
CPU usage
connection pools
downstream call timing

This map can be simple documentation. It does not need to be a complicated tool on day one. The value is in knowing where to look when a business metric changes.

If payment transaction time gets worse, the team can immediately inspect the services and infrastructure that implement that feature.

Example Monitoring Map

For a payment platform, a lightweight monitoring map can look like this:

Feature: make payment

Business metrics:
- average payment transaction time
- number of payment attempts
- number of accepted payments
- number of rejected payments

Technical components:
- mobile backend
- payment service
- authorization integration
- transaction database

Technical metrics:
- API response time
- JVM used heap
- live thread count
- database connection count
- downstream authorization call time

Tracing:
- one trace identifier per payment request

This map makes troubleshooting faster because it joins the language of business and operations.
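The same map can be kept as plain data so that tooling or a runbook can query it. The sketch below encodes the payment example as a lookup from a business KPI to the components and technical metrics worth inspecting. The class and method names are hypothetical, and a YAML file or wiki page would serve equally well.

```java
import java.util.List;
import java.util.Map;

// Sketch only: the monitoring map as plain data, using the payment example's names.
final class MonitoringMapSketch {
    static final class FeatureEntry {
        final List<String> components;
        final List<String> technicalMetrics;

        FeatureEntry(List<String> components, List<String> technicalMetrics) {
            this.components = components;
            this.technicalMetrics = technicalMetrics;
        }
    }

    // Business KPI -> where to look when that KPI changes.
    static final Map<String, FeatureEntry> BY_BUSINESS_KPI = Map.of(
        "average payment transaction time", new FeatureEntry(
            List.of("mobile backend", "payment service",
                    "authorization integration", "transaction database"),
            List.of("API response time", "JVM used heap", "live thread count",
                    "database connection count", "downstream authorization call time")));

    static FeatureEntry whereToLook(String businessKpi) {
        FeatureEntry entry = BY_BUSINESS_KPI.get(businessKpi);
        if (entry == null) {
            throw new IllegalArgumentException("No mapping for KPI: " + businessKpi);
        }
        return entry;
    }

    public static void main(String[] args) {
        FeatureEntry entry = whereToLook("average payment transaction time");
        System.out.println("Components: " + entry.components);
        System.out.println("Metrics:    " + entry.technicalMetrics);
    }
}
```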

Practical Workflow

Identify the most important user-facing flows.
Add tracing to REST entry points and meaningful internal service methods.
Propagate trace identifiers across service boundaries.
Use Jaeger or an equivalent collector to visualize spans.
Keep span names meaningful and stable.
Combine traces with metrics, health checks, and logs.
Add JVM and infrastructure metrics where they explain user experience.
Build a feature-to-component map for important business KPIs.
Watch APM overhead and storage growth.
Use collected data to improve architecture and maintenance decisions.

Common Mistakes

The first mistake is tracing only one service in a distributed flow. The most useful traces cross boundaries.

The second mistake is collecting spans with meaningless names. A span named method1 is not useful during an incident.

The third mistake is expecting tracing to replace logs and metrics. It complements them.

The fourth mistake is installing APM tooling without planning data volume and retention.

The fifth mistake is ignoring business KPIs. Technical metrics matter, but production systems exist to support business behavior.

Checklist

Important request paths have trace identifiers.
REST entry points are traced.
Important service methods are traced.
Span names describe real operations.
Jaeger or another trace viewer is available.
Logs and metrics can be correlated with traces where possible.
JVM memory, CPU, and thread metrics are visible.
APM overhead is considered.
Data retention and storage growth are planned.
Business KPIs are mapped to technical components.
Monitoring data is used in maintenance planning.

Conclusion

Tracing gives Java teams a request-level map of distributed behavior. Jaeger and OpenTracing-style instrumentation make it possible to see spans, timings, and call relationships that are difficult to reconstruct from logs alone.

APM expands that view by combining traces with logs, metrics, health checks, JVM data, and business KPIs. The strongest monitoring design does not only ask whether servers are alive. It asks whether important business features are working well and which technical components explain the answer.