There's this button in our Strapi admin panel. It says "Revalidate". When you click it, it's supposed to clear the cached version of a page so the fresh content shows up. Simple enough.
For a long time, clicking that button did almost nothing. You'd press it, check the page — old content. Press it again — old content. Try a different browser, incognito mode, clear your local cache — old content. It wasn't that the button was broken in any obvious way. It just didn't seem to have any visible effect most of the time.
The fix took us down a rabbit hole of Next.js caching internals, Kubernetes networking, and a Redis Pub/Sub setup that works great but lives in a file where it probably shouldn't. Here's the whole story.
Context
Quick recap of the stack. The strapi.io frontend is a Next.js 13 app using the Pages Router with ISR. We have over 1000 statically generated pages. Content comes from Strapi CMS, and editors use the strapi-plugin-revalidate-button to push updates. The app runs on Kubernetes as three replicas behind a load balancer, with CloudFront as the CDN layer.
If you want the background on how we set up ISR originally, there's an earlier article covering that. This one is about what happens when that setup meets horizontal scaling.
The Cache That Nobody Else Knew About
This is actually a well-known limitation of Next.js ISR in self-hosted multi-instance deployments. It comes up frequently in GitHub discussions, but unless you've been burned by it, it's easy to overlook.
The core issue: res.revalidate() only clears the ISR cache on the instance where it runs. Each Next.js pod keeps its own copy of every static page — in memory and on the filesystem. When the revalidation request arrives through the load balancer, one pod out of three gets the call. That pod regenerates the page. The other two keep serving the old version.
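To make the failure mode concrete, here's a toy model in plain Node — the names are hypothetical and there's no Kubernetes or Next.js involved, just three per-pod caches and a "revalidate" that only reaches one of them:

```javascript
// Toy model: three pods, each with its own private ISR cache.
const pods = [1, 2, 3].map((id) => ({
  id,
  cache: new Map([['/pricing', 'v1']]), // every pod starts with the old page
}))

// res.revalidate() only touches the pod that handled the request.
function revalidateOnPod(pod, path, freshContent) {
  pod.cache.set(path, freshContent)
}

// The load balancer routes the editor's click to a single pod.
revalidateOnPod(pods[0], '/pricing', 'v2')

const versions = pods.map((p) => p.cache.get('/pricing'))
console.log(versions) // ['v2', 'v1', 'v1'] — two pods still serve stale content
```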
Now here's where it gets really frustrating. When the content editor clicks the button multiple times, the load balancer tends to route their requests to the same pod — the one that was already revalidated. So they're re-revalidating a pod that's already fresh, while the other two stay stale. Clicking more doesn't help.
CloudFront made this even more unpredictable. The CDN invalidation request itself worked fine — CloudFront would clear its edge cache as expected. But then it needs to refetch the page from the origin. The origin is our load balancer, which routes to one of three pods. If it hits the one that was revalidated, great — fresh content gets cached at the edge. If it hits one of the other two, stale content goes right back into CloudFront.
In practice it was often worse than pure probability, because the load balancer routing isn't purely random. Depending on connection reuse and pod health checks, you could end up in situations where the same stale pod kept getting picked. Content editors had essentially lost trust in the button.
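Some back-of-the-envelope numbers, assuming (generously) that the origin fetch picked a pod uniformly at random — the `stillStaleAfter` helper is purely illustrative, not anything from our codebase:

```javascript
// With 3 pods and only 1 revalidated, a uniformly random origin fetch
// has a 2/3 chance of re-caching stale content at the edge.
const pods = 3
const stalePods = 2
const staleChancePerFetch = stalePods / pods

// Chance CloudFront still holds stale content after n invalidation attempts,
// under the (optimistic) independent-random-routing assumption:
const stillStaleAfter = (n) => staleChancePerFetch ** n

console.log(stillStaleAfter(1).toFixed(2)) // '0.67'
console.log(stillStaleAfter(3).toFixed(2)) // '0.30'
```

Our real odds were worse, because connection reuse made the routing sticky rather than random.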
Three Approaches That Didn't Work
cacheHandler with Redis — Next.js lets you swap the cache backend via configuration. Sounds perfect for sharing state across pods. Except this feature applies to the App Router's Data Cache, not to the Pages Router's res.revalidate() mechanism. Different caching layers, different APIs. Didn't apply to our setup.

instrumentation.ts — The clean answer. A file that runs once at server startup, ideal for setting up background processes like Redis subscriptions. Two issues: it was experimental in Next.js 13, and it had a known bug where it didn't execute at all in standalone output mode. We use standalone for Docker. Spent a few hours wondering why our code never fired before finding the GitHub issue. Classic.

Shared PVC across pods — Mount the same filesystem volume so all pods share the .next/cache directory. Sounds reasonable until you learn that Next.js keeps an in-memory LRU cache on top of the filesystem. Shared disk doesn't clear the other pods' RAM. Next.
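A toy two-layer cache shows why the shared volume doesn't help. The names here are hypothetical, but the layering mirrors the idea of a per-process memory cache sitting in front of shared disk:

```javascript
// Shared "disk" (the PVC), with a per-pod in-memory layer in front of it.
const sharedDisk = new Map([['/pricing', 'v1']])

function makePod() {
  const memory = new Map()
  return {
    get(path) {
      // Populate memory from disk on first read; memory wins afterwards.
      if (!memory.has(path)) memory.set(path, sharedDisk.get(path))
      return memory.get(path)
    },
  }
}

const podA = makePod()
const podB = makePod()
podA.get('/pricing') // both pods warm their memory layer
podB.get('/pricing')

sharedDisk.set('/pricing', 'v2') // "revalidation" rewrites the shared volume

console.log(podB.get('/pricing')) // still 'v1' — served from pod B's memory
```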
What Actually Worked: Redis Pub/Sub
Redis wasn't running in our cluster at the time, so we spun up a new instance for this. Its Pub/Sub feature lets you broadcast messages to all subscribers on a channel. The pattern we landed on:
1. Revalidation request arrives at any pod
2. That pod publishes to a Redis channel instead of revalidating directly
3. All pods are subscribed to that channel
4. Each pod receives the message and revalidates its own cache
5. Once all pods are revalidated, CloudFront cache gets invalidated too
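The steps above can be sketched with an in-process stand-in for Redis Pub/Sub — hypothetical names, no real Redis; the actual client code follows later:

```javascript
// Toy broadcast: a channel fans one message out to every subscriber,
// which is the property Redis Pub/Sub gives us across pods.
const subscribers = []
const channel = {
  subscribe: (fn) => subscribers.push(fn),
  publish: (msg) => subscribers.forEach((fn) => fn(msg)),
}

const pods = [1, 2, 3].map((id) => ({ id, cache: new Map([['/pricing', 'v1']]) }))

// Step 3: every pod subscribes; step 4: each one revalidates its own cache.
for (const pod of pods) {
  channel.subscribe(({ path }) => pod.cache.set(path, 'v2'))
}

// Steps 1–2: whichever pod gets the request just publishes.
channel.publish({ path: '/pricing' })

console.log(pods.map((p) => p.cache.get('/pricing'))) // ['v2', 'v2', 'v2']
```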
Step 4 deserves a bit more explanation. When a pod receives the Redis message, it needs to call res.revalidate(path) to clear its own ISR cache. The problem is that res.revalidate() is only available inside a Next.js API route handler — it's a method on the response object, not something you can import and call from anywhere in the codebase. There's no nextjs.revalidate(path) function you can use in a background process or a Redis subscriber callback. If instrumentation.ts had worked, we might have had access to the server internals in a way that could bypass this. But with that option gone, we needed another way in.
Our workaround: each pod makes an HTTP request to itself (http://127.0.0.1/api/internal-revalidate) when it receives a Redis message. A pod calling its own API endpoint to trigger revalidation. It's a roundabout way to get access to res.revalidate(), but it works reliably.
If Redis is unavailable, the endpoint falls back to direct revalidation on just the one pod — the old behavior. Not great, but it doesn't crash.
Walking Through the Implementation
Let's look at the key pieces. These snippets are simplified from our production code — just enough to see the logic, nothing you'd copy-paste directly.
The core of the solution is a Pub/Sub client singleton that manages Redis connections. Since Redis requires separate clients for publishing and subscribing, and different parts of the app only need one or the other, we split the connection into two methods. The revalidation endpoint only connects a publisher, while the subscriber setup in _app.js only connects a subscriber. If Redis isn't reachable, the application continues without it:
import Redis from 'ioredis'

// redisOptions comes from our config; lazyConnect makes connect() explicit,
// so a failure can be caught and treated as "run without Redis".
const redisOptions = { /* host, port, ... */ lazyConnect: true }

class PubSubClient {
  static #instance = null
  #publisher = null
  #subscriber = null

  static getInstance() {
    if (!PubSubClient.#instance) {
      PubSubClient.#instance = new PubSubClient()
    }
    return PubSubClient.#instance
  }

  async connectPublisher() {
    if (this.#publisher) return
    try {
      this.#publisher = new Redis(redisOptions)
      await this.#publisher.connect()
    } catch (err) {
      console.error('Redis publisher unavailable', err)
      this.#publisher = null // callers check for null and fall back
    }
  }

  async connectSubscriber() {
    if (this.#subscriber) return
    try {
      this.#subscriber = new Redis(redisOptions)
      await this.#subscriber.connect()
    } catch (err) {
      console.error('Redis subscriber unavailable', err)
      this.#subscriber = null
    }
  }

  get publisher() { return this.#publisher }
  get subscriber() { return this.#subscriber }
}

The main revalidation endpoint — the one the Strapi plugin hits — only needs the publisher side. It publishes a message to Redis instead of calling res.revalidate() directly. If Redis is down, it falls back to the old single-pod behavior:
// /api/revalidate.js
export default async function handler(req, res) {
  // ... auth check, resolve pageToRevalidate from request ...
  const client = PubSubClient.getInstance()
  await client.connectPublisher()

  if (client.publisher) {
    // Broadcast: every pod (including this one) will revalidate itself.
    await client.publisher.publish(
      'strapi-web:revalidate',
      JSON.stringify({ path: pageToRevalidate, timestamp: Date.now() })
    )
  } else {
    // Fallback: Redis is down, so revalidate this pod only (the old behavior).
    await res.revalidate(pageToRevalidate)
  }

  return res.json({ revalidated: true, page: pageToRevalidate })
}

Then there's the internal endpoint that each pod calls on itself. It's secured with a shared secret and only accessible from localhost — its only purpose is to give the Redis subscriber a way to trigger res.revalidate():
// /api/internal-revalidate.js — only accessible from localhost
export default async function handler(req, res) {
  // ... validate shared secret ...
  const path = req.query.path
  await res.revalidate(path)
  return res.json({ revalidated: true, path })
}

And finally, the part we're least proud of. With instrumentation.ts broken in standalone mode, the only reliable place to run server-side initialization in a Next.js 13 Pages Router app was the module scope of _app.js. When this file loads on the server, the top-level code runs once — and that's where we set up the Redis subscription. This side only needs the subscriber connection:
// _app.js — server-side module initialization
if (typeof window === 'undefined') {
  // Async IIFE: module scope here can't rely on top-level await.
  ;(async () => {
    const revalidatePath = async (path) => {
      const port = process.env.PORT || 3000
      await fetch(
        `http://127.0.0.1:${port}/api/internal-revalidate?path=${encodeURIComponent(path)}`,
        { headers: { /* shared secret for auth */ } }
      )
    }

    const { PubSubClient } = await import('src/lib/redis/pubsub-client.js')
    const client = PubSubClient.getInstance()
    await client.connectSubscriber()

    // If Redis is unreachable, the subscriber is null and we skip setup.
    if (client.subscriber) {
      client.subscriber.on('message', (channel, message) => {
        if (channel !== 'strapi-web:revalidate') return
        const { path } = JSON.parse(message)
        revalidatePath(path)
      })
      await client.subscriber.subscribe('strapi-web:revalidate')
    }
  })().catch((err) => console.error('Failed to set up revalidation subscriber', err))
}

Module-level side effect. A persistent Redis connection initialized from a React entry point. Not the kind of code you show off at a conference talk, but it's been doing its job without issues.
What We Learned
The problem we ran into is well-documented if you know where to look, but poorly surfaced otherwise. If you search for "Next.js ISR multiple instances" you'll find GitHub issues and discussions going back years. The official docs mention it almost as an aside. It's the kind of thing you discover in production, not during development.
What surprised us most was how much the framework constraints shaped the final architecture. The reason we have a pod calling itself via HTTP, the reason we initialize Redis in _app.js, the reason we need a second API endpoint — all of these exist because of specific limitations in Next.js 13's Pages Router and the broken instrumentation.ts in standalone mode. The solution matches the constraints we had, not any textbook pattern.
One design decision we're glad we made early was the graceful degradation. When Redis is down, revalidation still works on one pod. When the internal endpoint fails, it logs and moves on. The system never fully breaks. This has been a deliberate choice and it gives us peace of mind that a Redis hiccup won't cascade into content editors losing the ability to publish at all.
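The degradation pattern boils down to a few lines. This is a distilled sketch, not our actual code — `revalidateEverywhere` and its parameters are hypothetical names:

```javascript
// Try the distributed path first; fall back to local-only revalidation;
// never let the editor's request fail outright.
async function revalidateEverywhere({ publish, revalidateLocal, path, log = console.error }) {
  try {
    await publish(path) // normal path: broadcast via Redis
    return { mode: 'broadcast' }
  } catch (err) {
    log('redis unavailable, falling back to single-pod revalidate', err)
    await revalidateLocal(path) // old behavior: this pod only
    return { mode: 'local' }
  }
}

// Example with stubs, simulating a Redis outage:
revalidateEverywhere({
  publish: async () => { throw new Error('ECONNREFUSED') },
  revalidateLocal: async () => {},
  path: '/pricing',
  log: () => {},
}).then((r) => console.log(r.mode)) // 'local'
```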
The good news is that this is getting easier for everyone else. Next.js 15 stabilized instrumentation.ts and fixed the standalone mode bug. The cacheHandler API has matured, and libraries like @neshca/cache-handler handle shared Redis caching across instances properly. If you're starting a new project today, you have much better options. But if you're maintaining a Next.js 13 Pages Router app on Kubernetes and need on-demand revalidation to actually work — we can confirm this approach holds up. It's been running on strapi.io for weeks now and content editors have stopped complaining. Which, if you've ever worked with content teams, you know is the real metric that matters.
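On a newer Next.js, the subscriber setup could live where it belongs. A sketch of the wiring under that assumption — the commented-out import path is hypothetical:

```typescript
// instrumentation.ts — runs once per server start in Next.js 15,
// including standalone output mode.
export async function register(): Promise<void> {
  // Only the Node.js runtime should hold a Redis connection.
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    // Hypothetical: move the subscriber setup out of _app.js, e.g.
    // const { PubSubClient } = await import('./lib/redis/pubsub-client')
    // await PubSubClient.getInstance().connectSubscriber()
  }
}
```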
Full-Stack Engineer @ Notum, with passion for AI, open-source and space