This blog is part of a series on Spring AI. Check out the previous article in this series: Model Context Protocol (MCP) and Tool Calling 2.

Observability is crucial for any user-facing system, and the job gets even harder with a microservices architecture and high traffic. To make our lives easier, we rely on various observability tools to monitor our systems so that we can take action before the user experience takes a hit. LLM-driven AI systems introduce a unique problem here. AI hallucination is real, and since prompt engineering is more of an art than a science, the behavior of these GenAI-based tools in the wild isn't deterministic. So keeping a close eye on the system becomes even more important. Then there's the economic part of it too. If you're using the cutting-edge LLMs provided by OpenAI or Google, you are paying for every input and output token. So that's another thing to keep an eye on.

Fortunately, Spring AI makes our lives a lot easier in this regard. All we have to do is get an observation pipeline up and running, and that's where you'll be spending most of your time. Configuration on the Spring AI side is minimal, to say the least.

There are three basic components to monitor when it comes to observability:

  1. Metrics - Observing system metrics, mainly to monitor resource usage
  2. Logs - Observing the application logs to keep an eye on errors and faults
  3. Traces - Observing application or system activity. Traces are much more detailed and give a more granular picture of system behavior.

In this installment, we'll set up an observability pipeline covering each of these three components, with Grafana at the center.

Dockerizing Both of Our Java Applications

Before going into the observability setup, we'll quickly define the applications in our Docker Compose file.

mcp-server:
  depends_on:
    postgres:
      condition: service_healthy
  build: ./mcp-server
  container_name: mcp-server
  restart: on-failure
  networks:
    - agent-network
  ports:
    - "8081:8081"

chat-client:
  depends_on:
    ollama:
      condition: service_healthy
    cassandra:
      condition: service_healthy
    postgres:
      condition: service_healthy
  build: ./chat-client
  container_name: chat-client
  restart: on-failure
  networks:
    - agent-network
  ports:
    - "8080:8080"

Checking whether our Ollama, Cassandra, and Postgres instances are healthy may take a little bit of improvisation. Just having the containers up doesn't mean they are ready to serve. I'll share my experience on this in the future, but you can check the code repository right away if you're interested.
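
To give a flavor, here's a minimal sketch of what such healthchecks might look like in the compose file. The image names, credentials, and timings are assumptions on my part; adjust them to your actual setup (Cassandra can be probed similarly, for example with a cqlsh command).

postgres:
  image: postgres:16
  networks:
    - agent-network
  healthcheck:
    # pg_isready ships with the official postgres image
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 10s
    timeout: 5s
    retries: 5

ollama:
  image: ollama/ollama:latest
  networks:
    - agent-network
  healthcheck:
    # 'ollama list' only succeeds once the server is actually responding
    test: ["CMD", "ollama", "list"]
    interval: 15s
    timeout: 10s
    retries: 10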

Each of these applications will have a Dockerfile that builds the project, produces a jar, and runs it.

FROM gradle:jdk17 AS builder
COPY src /usr/src
COPY build.gradle /usr/
WORKDIR /usr/
RUN gradle clean bootJar

FROM eclipse-temurin:17-jre
COPY --from=builder /usr/build/libs/app.jar /usr/app.jar
ENTRYPOINT ["java", "-jar", "/usr/app.jar"]

We can now start setting up observability.

Metrics Collector (Prometheus)

Prometheus has been a widely used monitoring and alerting tool for a while now. We'll use it as our metrics collector. In the context of JVM applications, Prometheus is particularly useful for system health monitoring. On top of that, thanks to Spring AI, we'll be able to monitor various API usage metrics through Prometheus.

First, let’s add Spring Actuator and micrometer registry for Prometheus to our application.

implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'io.micrometer:micrometer-registry-prometheus'

And, in the application properties, we’ll add some new entries -

management.observations.key-values.application=mcp-server
management.endpoints.web.exposure.include=*
management.endpoint.health.show-details=always
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.percentiles-histogram.http.client.requests=true

Basically, we're exposing all kinds of data (metrics, threaddump, prometheus, etc.). This will help you explore all the different kinds of data exposed by Spring Actuator. For this application, only health, info, metrics, and prometheus would have been enough. Once this is done and the application is up, you can visually explore all this data at the /actuator/{exposed_data} endpoint. For example, the data that our Prometheus instance will ingest and parse will be at /actuator/prometheus.
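
If you'd rather not expose everything, narrowing the exposure to just the endpoints we actually use would look like this:

management.endpoints.web.exposure.include=health,info,metrics,prometheus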

This is all the setup we need on the application side. We'll have to make these same changes in both our blog (and MCP server) application and our LLM-driven chat client application, since we would like to monitor both. Now let's configure a Prometheus instance in our Docker Compose file.

prometheus:
  networks:
    - agent-network
  image: prom/prometheus:latest
  container_name: prometheus
  ports:
    - "9090:9090"
  volumes:
    - ./entrypoint-setup/prometheus.yml:/etc/prometheus/prometheus.yml

And we'll put that configuration file, prometheus.yml, in the ./entrypoint-setup directory that we created earlier. We'll configure both of our applications here.

scrape_configs:
  - job_name: 'chat-client'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 10s
    static_configs:
      - targets: ['chat-client:8080']

  - job_name: 'mcp-server'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 10s
    static_configs:
      - targets: ['mcp-server:8081']

Now you can explore Prometheus in all its glory! Just by itself, Prometheus can be extremely helpful for monitoring your applications.
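
For example, since we enabled percentile histograms above, a query like the following (a sketch, assuming the default Spring Boot metric names and the application tag we configured) shows the p95 HTTP latency per application right in the Prometheus UI:

histogram_quantile(0.95, sum by (le, application) (rate(http_server_requests_seconds_bucket[5m])))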

Log Aggregator (Grafana Loki)

Let’s have our Loki instance up first.

loki:
  image: grafana/loki:main
  networks:
    - agent-network
  container_name: loki
  command: [ "-config.file=/etc/loki/local-config.yml" ]
  volumes:
    - ./entrypoint-setup/loki.yml:/etc/loki/local-config.yml
  ports:
    - "3100:3100"

As you can see, there's a config file involved here as well. This is a very basic setup, with an in-memory kvstore.

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-05-15
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

Now, on the applications' side, we already have SLF4J (backed by Logback) as the out-of-the-box logging setup for our Spring applications. To set up Loki as our log aggregator, we'll use the Loki logback appender. Here we're adding the dependency first.

implementation 'com.github.loki4j:loki-logback-appender:2.0.1'

If you already have a logback-spring.xml in your resources folder (or create one if you don’t), add the Loki logback appender in it.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <contextName>mcp-server</contextName>

    <appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
        <http>
            <url>http://loki:3100/loki/api/v1/push</url>
        </http>
    </appender>

    <root level="DEBUG">
        <appender-ref ref="LOKI" />
    </root>
</configuration>

You can now use one of Loki's endpoints to verify that all the services are operational.

~ GET http://localhost:3100/services
store => Running
compactor => Running
distributor => Running
ingester-querier => Running
server => Running
query-frontend-tripperware => Running
analytics => Running
ruler => Running
cache-generation-loader => Running
memberlist-kv => Running
ring => Running
ingester => Running
query-scheduler-ring => Running
query-scheduler => Running
querier => Running
query-frontend => Running
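
Loki's readiness endpoint is an even quicker check; it simply returns a plain "ready" once the instance is up:

~ GET http://localhost:3100/ready
ready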

Distributed Tracing Integration (Grafana Tempo)

For our project, Tempo is almost overkill, but the ease of use kind of justifies it! We'll configure our Tempo instance first, with a Zipkin receiver to ingest the traces our applications report.

tempo:
  networks:
    - agent-network
  image: grafana/tempo:latest
  container_name: tempo
  command:
    - -config.file=/etc/tempo.yaml
  volumes:
    - ./entrypoint-setup/tempo.yml:/etc/tempo.yaml:ro
    - tempo:/var/tempo
  ports:
    - "3200:3200"
    - "9411:9411"

The first port mapping (host port 3200, since Loki is already using 3100 on the host) is for Tempo's HTTP API, and 9411 is for the Zipkin receiver. Let's write the configuration file now.

server:
  http_listen_port: 3200

distributor:
  receivers:
    zipkin:
      endpoint: 0.0.0.0:9411

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/blocks

overrides:
  metrics_generator_processors:
    - span-metrics
    - service-graphs
    - local-blocks

metrics_generator:
  storage:
    path: /var/tempo/storage
  traces_storage:
    path: /var/tempo/traces_storage
  processor:
    span_metrics:
    service_graphs:
    local_blocks:
      flush_to_storage: true
      filter_server_spans: false

You could ignore the metrics generators here if you just wanted to monitor tracing spans, but they help us build some TraceQL-driven charts and graphs in Grafana.
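
For instance, once traces are flowing, a TraceQL query along these lines (a sketch; the service name depends on what you configured) pulls out slow requests hitting the chat client in Grafana's Explore view:

{ resource.service.name = "chat-client" && duration > 500ms }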

On the application side, we'll add the Aspect-Oriented Programming (AOP) starter, Micrometer Tracing, and the Zipkin reporter dependencies.

implementation 'org.springframework.boot:spring-boot-starter-aop'
implementation 'io.micrometer:micrometer-tracing-bridge-brave'
implementation 'io.zipkin.reporter2:zipkin-reporter-brave'

And, some new properties.

management.opentelemetry.resource-attributes.service.name=mcp-server
management.tracing.sampling.probability=1.0

Basically, we're sampling all requests (100%). For production systems, you might want to dial that down quite a lot! We'll also have a bean configuration leveraging AOP.

import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.aop.ObservedAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ObservationConfig {
  @Bean
  ObservedAspect observedAspect(ObservationRegistry registry) {
      return new ObservedAspect(registry);
  }
}
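
With the aspect registered, we can annotate methods we care about with @Observed and they'll show up as spans (and timers) automatically. A quick hypothetical example, just for illustration:

import io.micrometer.observation.annotation.Observed;
import org.springframework.stereotype.Service;

@Service
public class BlogPostService { // hypothetical service name

    // Each call becomes an observation: a trace span plus a timer metric
    @Observed(name = "blog.posts.fetch", contextualName = "fetch-blog-posts")
    public String fetchPosts() {
        return "...";
    }
}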

Now, to check whether Tempo is ingesting data properly (and what kinds of data it's ingesting), you can check :3200/metrics.

Setting Up Grafana

All of our data aggregators are ready, so we’ll have our Grafana instance set up.

grafana:
  networks:
    - agent-network
  image: grafana/grafana:latest
  container_name: grafana
  ports:
    - "3000:3000"
  restart: unless-stopped
  environment:
    - GF_AUTH_ANONYMOUS_ENABLED=true
    - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    - GF_AUTH_DISABLE_LOGIN_FORM=true
  volumes:
    - grafana:/var/lib/grafana
    - ./entrypoint-setup/grafana:/etc/grafana/provisioning/datasources:ro

This should be enough. I have disabled authentication since I'm running it locally; drop those three environment variables if you want the regular login flow back.

Aaaand, now you can access Grafana and add all the data sources from the UI.

But you can also add all of these in a provisioning file, so that whenever you call docker compose up, your Grafana is ready to use!

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    jsonData:
      httpMethod: POST
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceUid: tempo
  - name: Tempo
    type: tempo
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    isDefault: true
    version: 1
    apiVersion: 1
    uid: tempo
    jsonData:
      httpMethod: GET
      tracesToLogs:
        datasourceUid: 'loki'
      nodeGraph:
        enabled: true
  - name: Loki
    type: loki
    uid: loki
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: false
    version: 1
    apiVersion: 1
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: \[.+,(.+?),
          name: TraceID
          url: $${__value.raw}

That’s it! Your observability pipeline is ready to use now.

Now you can monitor all the metrics and trace data exposed by Spring AI. For example, here I’m just checking token usage and server errors.
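
Queries along these lines could drive such panels (a sketch; the token usage metric follows the gen_ai.client.token.usage counter that Spring AI publishes, but double-check the exact name and labels for your Spring AI version):

# tokens consumed over the last hour, per application and token type
sum by (application, gen_ai_token_type) (increase(gen_ai_client_token_usage_total[1h]))

# HTTP 5xx responses over the last 5 minutes, per job
sum by (job) (increase(http_server_requests_seconds_count{status=~"5.."}[5m]))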

And, if we turn on all types of plain-text chat content monitoring (this should be for testing purposes only) in our client application -

spring.ai.chat.client.enabled=true
spring.ai.chat.client.observations.log-completion=true
spring.ai.chat.client.observations.log-prompt=true
spring.ai.tools.observations.include-content=true

- we can find the trace id of a certain request -

- and get all the details of that request!



I tried to cover as much as possible in one blog post. This project, in its current state, is available here. In the next one, we'll create a frontend around this project so that we can actually use the tool that we made!