Building Grafana Mimir’s Next-Gen Architecture with WarpStream Co-Founder, Ryan Worl
[00:00:00]
**Mat Ryer:** Hello, welcome to season 3 of Grafana's Big Tent, the podcast all about the people, community, tools, and tech around observability. I'm here with my co-host Tom Wilkie. Hello, Tom.
**Tom Wilkie:** Hello Matthew, how are you?
**Mat Ryer:** Pretty good, thanks. Listeners have been writing in, they'd rather you referred to me as Matt.
**Tom Wilkie:** Do you mean listener has been writing in?
**Mat Ryer:** Yes, the listener's name is M-at-ryer.
**Tom Wilkie:** And it's with one T. You're actually in my phone as "Matt, quote, one-T, Ryer."
**Mat Ryer:** Great. So then it's got two T's in it, because you've noted that.
**Tom Wilkie:** Yeah, exactly. Well, now I know you don't drink coffee either.
**Mat Ryer:** Yeah, cool. That's it.
**Tom Wilkie:** Did you put everyone's drink preferences in?
**Mat Ryer:** It's their middle name. Yeah. I like to be thorough on that. When I get someone's number, I just don't want the number. I want the company, I want the home address. Do you know what I mean? The work email, personal email. I had a builder around recently. I'm spending the summer in London and getting all the household chores done. And he had a black tea, no sugar, no milk.
**Tom Wilkie:** Wow.
**Mat Ryer:** I know.
**Tom Wilkie:** Not a real builder. Do you put the milk in the cup?
**Mat Ryer:** No, no. Well, obviously, I don't just pour it into his mouth.
**Tom Wilkie:** Yeah, okay. Not anymore.
**Mat Ryer:** I put it in after I brewed the tea though, not before I brewed the tea.
**Tom Wilkie:** Yeah. That's the correct way. Okay.
**Mat Ryer:** Well, we've also got guests, Tom, speaking of a cup of tea. Would you like a cup of tea, Marco?
**Marco Pracucci:** For sure.
**Mat Ryer:** Welcome to the Big Tent podcast. Tell us about yourself.
**Marco Pracucci:** Thank you. Hello, everyone. I'm Marco. I'm a software engineer at Grafana Labs. I joined Grafana five and a half years ago, quite a long time. I've always worked on our metrics product. And I spent the last couple of years leading the new Mimir architecture project. We're going to learn more about that today.
[00:02:30]
**Mat Ryer:** We're also joined by Cyril Tovena. Hello, Cyril.
**Cyril Tovena:** Hey, Matt. Hey, guys. I'm Cyril. I've been working at Grafana Labs for six years now. I just recently moved to the AI department, but I was working on Loki before, so I've also touched on databases at Grafana Labs.
**Tom Wilkie:** This is not a competition for how long we've been working at Grafana Labs, folks, because I have been here over seven years now.
**Mat Ryer:** Right. It's not a competition, though, you say, Tom, but if it was...
**Tom Wilkie:** Fair enough, yeah. Okay, cool. But just to be clear, it's not. I also work with Cyril quite a lot on the assistant and various other agentic LLM things, which is very cool. So he's a friend of mine now, I'd say. Would you say that, Cyril?
**Cyril Tovena:** We'll see. Yeah, except when I spot "Matt" with two T's in my texts after all these years, still doing it.
**Mat Ryer:** Yeah. It's fun.
**Tom Wilkie:** And from WarpStream, we're also joined by Ryan Worl. Hello, Ryan. Welcome. Tell us about yourself.
**Ryan Worl:** Hi. Thanks for having me, Matt. Yeah, I'm Ryan Worl. I'm one of the co-founders of WarpStream. And after the acquisition by Confluent last year, I'm a director of engineering at Confluent now. Previously, slightly relevant to this podcast, I worked at Datadog.
**Tom Wilkie:** Boo.
**Ryan Worl:** No, no, no.
**Tom Wilkie:** Datadog's not a dirty word here. And I guess, Ryan, you know, leading the witness a little bit here, but how long have you been working with Grafana Labs?
**Ryan Worl:** Oh, wow. I think it's over a year now, probably at least 15 months.
**Tom Wilkie:** Yeah. Well, it's safe to say we were one of your first, if not the first customer. I think we were in your first three.
**Ryan Worl:** First three. Okay. So maybe number two or number three, I can't remember.
**Tom Wilkie:** Definitely the biggest.
**Ryan Worl:** I'd say so. Well, we'll get on to that, I guess, won't we?
[00:04:30]
**Tom Wilkie:** But today we're going to talk about the new architecture for Mimir that Marco alluded to in his intro. And all of us say Mimir wrong. I do apologize for the rename on the project, but I say Mimir. But yeah, Marco, what is this new architecture? And what is it called?
**Marco Pracucci:** Well, internally, we call this project Sigyn.
**Tom Wilkie:** Sigyn.
**Marco Pracucci:** But in the end it's just the next generation architecture of Mimir. And all the development of the new architecture is done in public. Grafana Mimir is an open source project, it will continue to be open source, and the new architecture is supported in open source as well.
**Tom Wilkie:** Nice. Sigyn is a goddess in Norse mythology.
**Mat Ryer:** Norse mythology, yeah. We pick these names out from mythology.
**Tom Wilkie:** It's like from the North, like mythology, from Manchester.
**Mat Ryer:** Sigyn is a goddess in Norse mythology, primarily known as the loyal wife of Loki. Is that right?
**Tom Wilkie:** And what happens when we run out of Norse gods, do we have to just stop?
**Mat Ryer:** This is why we chose the Norse and those kind of pantheons of gods, because there's a lot of them, right? So we don't really have to worry about running out. I really wanted to call Mimir Mjolnir, though, but no one could pronounce it, even fewer people than can pronounce Mimir.
[00:06:00]
**Tom Wilkie:** So before we talk about Sigyn, let's talk a little bit about Mimir and the Mimir project and a little bit of the history. Really briefly, what is Mimir, Marco? And why are we talking about it?
**Marco Pracucci:** Yeah. So Mimir is an open source, time series database. It's scalable, multi-tenant, and natively supports OpenTelemetry metrics and Prometheus metrics. Mimir has been designed to store massive volumes of metrics in a reliable and cost-effective way. And today, it's what powers the hosted metrics solution in Grafana Cloud, alongside plenty of users running our open source version.
**Tom Wilkie:** And just to give some perspective on the scale of Mimir now, it's doing, I think, billions of active series just in Grafana Cloud. And if we add in all of the open source usage we know about, then it's easily doing tens of billions of active series worldwide. So I think it's probably one of the biggest hosted Prometheus-like projects out there now in terms of scale, which is nice.
But we, or I guess I, because, spoiler alert, I started Cortex, the project that became Mimir, made some architectural decisions ten years ago when we started this project. Decisions that I think, Marco, you've spent your last five years fixing, shall we say? Is that accurate?
**Marco Pracucci:** I wouldn't really call it fixing. I see everything as an evolution. The environment changes, the context changes, the technology changes, and so we change our products as well. What we build is not static, it's dynamic; it just follows that evolution. I'm used to saying that I've just been doing re-architectures since I joined Grafana Labs. First we changed the storage engine and moved from a database like Bigtable to object storage, back in the day. That's something we did originally in Cortex. And I've spent the last couple of years on the second big re-architecture of Mimir, which I guess is what we're going to talk about in the rest of this podcast.
[00:08:30]
**Tom Wilkie:** And just again, I know I'm kind of really stringing out this intro here, but just to set the scene. So Mimir, the original architecture or kind of the second architecture, not the third one, this was inspired heavily by Cassandra, right? So it did replication factor three, it did quorum reads and writes. It offloaded everything to object storage eventually. But in the meantime, we had this big kind of tier of services that we call ingestors that stored everything three times for reliability. And then obviously sharded things for scalability.
**Marco Pracucci:** Yeah. And ingestors had a write-ahead log to replay the most recent data, the data that had not yet been uploaded to the object storage, in case of restarts or crashes.
**Tom Wilkie:** And the thing, I guess, spoiler alert, the thing we're trying to fix, or have fixed, in the second architecture is that this replication cost a bunch of money, right? It cost us money to store this data three times. It used a lot of local storage for this write-ahead log. The chatter between all the services on the network meant it was expensive in network costs. All of these things were really holding us back from achieving massive scale, economically, if not technically.
**Marco Pracucci:** Yeah, that's absolutely correct. Replication factor three is expensive. On the ingestion path, what we call the write path, we have to keep three copies of the data. And when we query back the most recent data from the ingestors, we have to query at least two copies in order to guarantee consistency.
But I think this is about more than just cost reduction. It's also about reliability. In what I now call the old architecture, ingestors are also the weak link between the read path and the write path. The typical issue we see is heavy queries issued by customers that overload the ingestors. And if ingestors are overloaded, the ingestion path, the write path, is affected as well. So we may fail to ingest new metrics. In the worst case scenario, we have an outage affecting both the write path and the read path at the same time.
**Tom Wilkie:** Yeah, the ingestors are a big distributed single point of failure.
**Marco Pracucci:** Big distributed replicated single point of failure. Yeah. Also, the way the sharding and the replication works in the old architecture makes the ingestors fragile when it comes to computing the quorum. Essentially, you just need two random ingestors to be unavailable at the same time to have an outage, which is something we wanted to address with the new architecture as well.
[00:11:00]
**Marco Pracucci:** So actually, I think we can summarize the three big goals of the new architecture. First, decoupling the read and the write path. I'm used to saying that whatever happens on the read path should not affect the write path, and vice versa. Second, we would like better resiliency to node failures. We will dive more into this later, but the core idea of the new architecture is to have a predictable partitioning scheme, so that if one node is unavailable, we know exactly which other node to query for the same data. And last, but not least, reducing the cost of running the Mimir infrastructure, reducing the Mimir TCO. One of the biggest cost reductions comes from the reduced replication factor and quorum in the new architecture. We don't have replication factor three and quorum two anymore.
**Mat Ryer:** Yeah. So when you do these big kind of redesigns, is this something where you literally do it in docs, you sit down and just have to think through basically all of this at the same time, or do you try and kind of break it into bits which you can specialize in? And can you do iteratively? How does it work?
**Marco Pracucci:** My personal approach is to always start with a prototype to validate the idea. The prototype should be a quick hack. I think about prototypes like hackathons, something you should get done in a week. But in a week, you should get something working. Very hacky, but at least to prove the idea.
Then, to me, it's the right time to go back to paper, write down a doc with the high level design of the system you want to build. Describe the problem you want to solve and how you plan to solve this problem. And then there's the implementation phase or what I call the productionizing phase. If it takes one week to build a prototype, it probably takes one year to productionize it.
And that's where I typically try to break down this big work into smaller deliverables, something that we can iteratively build and roll out to production. My main goal typically is just to validate our assumptions and our ideas as soon as possible. I just don't want to spend a year building a new piece of software and then realize that it doesn't work because we haven't considered something. We should continuously validate our assumptions.
[00:13:30]
**Tom Wilkie:** Yeah, I love that. I remember distinctly when this happened, by the way, because we were working on adaptive metrics at the time. This was almost two and a half years ago, when we first started this. Adaptive metrics is the idea that we can aggregate metrics before we write them to disk. And the whole idea was that the aggregator had to be cheaper than writing them to disk, right? And we were actually struggling to do that; aggregating the metrics before writing them was turning out to be more expensive.
And one of the engineers not here had the idea, let's use Kafka, so we don't have to do a lot of the quorum consistency and the replication. And let's just put everything in Kafka and then write these really lightweight aggregators and a very, very different model to how we were architecting things. And it turned out to be more cost effective. And we started to get used to running these big old open source Kafka clusters with all their local disks and all the things that come with that.
And honestly, we couldn't have timed it better because about the same time, this project popped up on Hacker News, right? And you were number one on Hacker News for a while, weren't you Ryan? And it was like WarpStream, pitched as Kafka, but on Object Storage.
**Ryan Worl:** Correct. Yeah, we were on the front page of Hacker News with our initial launch blog post for, I think it was like a day and a half. The response was mixed, I would say, for sure. There was a lot of skepticism, but I'm glad the timing worked out for you.
**Tom Wilkie:** No, the response internally was like, none of us particularly liked the operational aspects of Kafka, but we definitely liked, or were beginning to understand and admire, the architectural implications, I think, of Kafka. And so the fact that the project popped up at the exact right moment for us was so serendipitous.
And then I remember, Marco, you and I had a chat about could we, should we learn from adaptive metrics and re-architect the whole of Mimir around this concept? And obviously, Marco, you kind of took that and ran with it for a few more years. Is my retelling of history just rose-colored glasses? Or is that actually kind of semi-accurate?
**Marco Pracucci:** No, it's accurate. It's definitely accurate. Speaking about Kafka, I always try to distinguish between the Kafka protocol and Kafka the software. I think Kafka the protocol is great. I think Kafka the software is quite hard to operate at scale. And that's where, for us, WarpStream was some fresh air because it kept what I think is a good protocol with an easier to operate implementation.
[00:16:00]
**Tom Wilkie:** And not just easy to operate. I mean, we as an organization at Grafana Labs, we don't actually run a lot of JVM-based services, right? It's not an area of expertise. I think a lot of the challenges we have with operating Kafka come from the fact we don't have the expertise in operating the JVM at scale. And Ryan, WarpStream's not JVM, right?
**Ryan Worl:** No, WarpStream is all written in Go. That was one of our goals as well: to make the operations, from "I don't need to understand how to make a fancy clustered thing run" all the way down to the installation process on a single machine, as easy as possible. And Go definitely makes that possible.
I think the JVM is definitely better than it was, say, five years ago, in terms of how easy it is to operate. But yeah, it's still scary for a lot of organizations that don't have somebody who understands deeply how the JVM works, and a lot of the time it's not worth hiring them if you didn't already have a need for somebody running Cassandra for you or something like that. I think, like a lot of people, I was carrying scars from 15 years ago using the JVM. And obviously it's a lot better now than it was then, but back in the Cassandra days, I could not wait to get away to a compiled language. Then Go came along, which was nice.
[00:17:30]
**Tom Wilkie:** So I guess, Marco, what do we, what is the second architecture? How has it changed? And I guess we've spoiled it now by talking about Kafka, but what are we doing? And how is it different from the old Mimir architecture?
**Marco Pracucci:** Yeah, well, I guess the first thing is to introduce Kafka into the big picture. The main idea of the new architecture is to put Kafka between the write path and the read path. Essentially, a write path request, a request to ingest metrics, completes as soon as the data has been committed to Kafka. Then this data is asynchronously replayed by ingestors. Ingestors in the new architecture are pure read path components that are used to serve the most recent data for queries.
So the core of the idea is to put Kafka between the write path and the read path and leverage the guarantees that the Kafka protocol gives us, for example, full consistency on a given partition. One of the core differences between the old architecture and the new architecture is that ingestors consume from partitions, and when they consume from a partition, the consumption is gapless. If you restart an ingestor and the process is down for five minutes, at startup it will resume consuming from where it left off. There are no gaps in the data, which is very different from the old architecture.
This allowed us to have just quorum one on the read path. We just need to query each partition once. We don't need to query two copies of each data, because there may be gaps in one of these two copies, like in the old architecture.
We still have a replication on the ingestors for high availability, but instead of having a replication factor three in the ingestors, we have replication factor two. So we have two copies of each data, and we just query one of these copies. So again, we have replication factor two, just in case one of these ingestors becomes unhealthy or restarts or whatever. We still have another copy of the partition. That's the gist of the new architecture.
Obviously, there are nuances in this architecture. There are some trade-offs we picked. But the core difference is this one, having Kafka in between and a different partitioning and replication scheme.
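The consume-from-offset model Marco describes can be sketched in a few lines of Go. This is an illustrative toy, not Mimir's actual code: the in-memory `partitionLog` and `ingester` types here are stand-ins for a Kafka partition and a read-path ingestor.

```go
package main

import "fmt"

// partitionLog stands in for a Kafka partition: an append-only,
// totally ordered sequence of records.
type partitionLog struct {
	records []string
}

// append commits a record and returns its offset; in the new write
// path, the produce request completes once this commit succeeds.
func (p *partitionLog) append(rec string) int {
	p.records = append(p.records, rec)
	return len(p.records) - 1
}

// ingester is a pure read-path consumer. It remembers the next offset
// to read, so a restart resumes exactly where it left off: gapless.
type ingester struct {
	log        *partitionLog
	nextOffset int
}

func (i *ingester) consume() []string {
	out := i.log.records[i.nextOffset:]
	i.nextOffset = len(i.log.records)
	return out
}

func main() {
	log := &partitionLog{}
	log.append("series-a sample 1")
	log.append("series-b sample 1")

	ing := &ingester{log: log}
	fmt.Println(ing.consume()) // replays everything from offset 0

	// The ingester is down for a while; writes keep landing in the log.
	log.append("series-a sample 2")

	// On resume it continues from nextOffset with no gaps, which is why
	// querying a single copy of each partition (quorum of one) is safe.
	fmt.Println(ing.consume())
}
```

Because consumption has no gaps, one healthy replica per partition is enough to answer queries, which is what drops the read quorum from two to one.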
[00:20:00]
**Tom Wilkie:** One of the things, again, a lot of things happened around this time. Grafana Cloud was predominantly deployed on Google two years ago, and we started to see a lot more demand from customers and partners for deployments of Grafana Cloud on Amazon. I remember around the same time we started spinning up regions on Amazon. I think we've now got 50 of these regions, not just on Amazon, but across a bunch of providers.
It's interesting, because in the Google world, we used to deploy everything in just a single availability zone, because our observation of Google's zonal and regional reliability is that they're basically the same thing. I don't speak for Google here, but there are a lot of regional and global services that can go down, and then it doesn't matter if you've split your application across multiple zones.
But we found in Amazon that was not the case. We found when we started deploying in Amazon that we really had to go multi-AZ because their zonal availability was worse than their regional availability and their regional availability was probably better than Google's regional availability, again, not sponsored. We had to battle then with a lot of the increased network costs, didn't we?
**Marco Pracucci:** That's right. The main downside of going multi-AZ is that if you do a bunch of cross-AZ data transfer, your TCO will increase significantly. In both GCP and AWS you pay for cross-AZ data transfer; essentially, you pay two cents per gig to move data from one AZ to another. That's when we started to look at WarpStream. If you run the new architecture on top of Kafka, you still pay for cross-AZ data transfer to replicate the data between AZs. The game changer for us was adopting WarpStream.
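As a back-of-envelope illustration of Marco's point (the produce rate below is invented, and the $0.02/GB is the approximate combined list price; this is a sketch, not Grafana's actual bill), cross-AZ replication of a steady stream adds up quickly:

```go
package main

import "fmt"

// crossAZMonthlyCost estimates the monthly cross-AZ transfer bill for
// replicating a steady produce stream to other availability zones.
func crossAZMonthlyCost(rateGBps, extraCopies, dollarsPerGB float64) float64 {
	const secondsPerMonth = 30 * 24 * 3600
	return rateGBps * extraCopies * dollarsPerGB * secondsPerMonth
}

func main() {
	// Illustrative numbers: 1 GB/s produced, replication factor three,
	// so roughly two extra copies ship across AZs at ~$0.02/GB.
	fmt.Printf("~$%.0f per month in cross-AZ transfer\n",
		crossAZMonthlyCost(1.0, 2.0, 0.02))
}
```

With these assumed numbers the transfer alone works out to roughly $100k a month per GB/s, before any compute or storage, which is why avoiding it matters at Mimir's scale.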
**Tom Wilkie:** How does WarpStream help with the inter-AZ traffic, Ryan?
**Ryan Worl:** The way that WarpStream solves the cross-AZ data transfer problem is by not replicating the way Kafka does. The open source Kafka project works roughly the way Mimir did in the second generation architecture, where it's replicating, though it's not quorum consistency. It looks a little bit more like Raft, where there's a leader replicating to followers, but the high level idea is the same: the write comes in in one availability zone and is replicated to followers in two other zones, if you're running in a three-AZ setup.
[00:22:30]
The way that WarpStream differs is that instead of replicating over the network directly between nodes, WarpStream writes data directly to object storage first. So we receive a bunch of concurrent writes from different clients, batch those writes together, and then write a file to object storage.
And in AWS and GCP, basically everywhere that there is object storage, the guarantee is that the data is replicated across some number of availability zones, equivalent to three. The data is replicated, and you can read it out from any of those other zones and not pay any cross-AZ data transfer costs. So by using object storage as both the storage layer and the network layer, you can bypass the cross-AZ data transfer costs in exchange for accepting the latency penalty that comes with writing the data to object storage first.
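The batching Ryan describes can be sketched as a toy Go batcher. This is a simplification under stated assumptions: it flushes on a size threshold only, whereas a real agent would also flush on a time interval, and the "PUT" here is just a print rather than a real object storage call.

```go
package main

import "fmt"

// batcher accumulates produce requests and writes them to "object
// storage" as one combined object once enough bytes have piled up.
type batcher struct {
	maxBytes int
	pending  [][]byte
	size     int
	puts     int // number of object-storage PUTs issued
}

func (b *batcher) produce(rec []byte) {
	b.pending = append(b.pending, rec)
	b.size += len(rec)
	if b.size >= b.maxBytes {
		b.flush()
	}
}

func (b *batcher) flush() {
	if len(b.pending) == 0 {
		return
	}
	// One PUT covers every record in the batch, amortizing the
	// per-request cost across all of their bytes.
	b.puts++
	fmt.Printf("PUT object: %d records, %d bytes\n", len(b.pending), b.size)
	b.pending, b.size = nil, 0
}

func main() {
	b := &batcher{maxBytes: 1 << 20} // flush at ~1 MiB
	for i := 0; i < 4096; i++ {
		b.produce(make([]byte, 1024)) // 4096 x 1 KiB records
	}
	b.flush()
	fmt.Println("total PUTs:", b.puts) // 4 PUTs instead of 4096
}
```

The size (or time) threshold is exactly the cost-versus-latency knob discussed next: a bigger batch means fewer PUTs per byte, but producers wait longer for their acknowledgement.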
**Mat Ryer:** It sounds almost like a loophole in Amazon's billing model. How does Amazon feel about this? Have you had any interesting conversations with them?
**Ryan Worl:** So no, I haven't had any conversations with Amazon. You're not the first person to bring that up. And basically there's a long line of companies whose products take advantage of this loophole. The biggest and most obvious one is probably Snowflake. Snowflake has been using a very similar model to this for well over a decade at this point. It's just so strongly baked in that it's kind of impossible to take it back without upsetting some huge slice of their customers. Yeah, it would be really challenging, and then also the competitive dynamics between the major CSPs, I don't think would let it happen. And it's the same, we talked about S3 and Amazon, but the same is true on other CSPs.
**Tom Wilkie:** Yeah.
**Ryan Worl:** The only major CSP that is different in this respect is Azure, in that they just don't charge for cross-AZ data transfer at all. They said that they would for a long time, and it was on the pricing page, but never actually happened. And they recently publicly announced that they will not.
**Tom Wilkie:** Yeah, they did finally acknowledge that they weren't going to. It's like Google for the longest time never charged for Prometheus queries for Google Managed Prometheus. I don't know whether they do now or not, but they didn't for a very long time.
**Marco Pracucci:** Well, AWS only just started to apply the pricing for some load balancer networking costs in May 2025, it was recently discovered. Yeah, it was in the pricing pages for several years, but it was not actually billed.
**Tom Wilkie:** Interesting. That explains a few things. Marco and I have been working a lot on our cloud bills this past few weeks, so we've been deep in the cost explorer.
[00:25:00]
**Cyril Tovena:** I do have a question for you, Ryan. I'm wondering, what's the catch? It sounds like WarpStream is solving a lot of our issues, but any software has probably trade-offs. So what's the trade-off in WarpStream? How does that work?
**Ryan Worl:** Yeah, the biggest trade-off is basically end-to-end latency and produce latency. So those are two distinct components because I like to break them down differently and explain where the latency comes from.
So obviously, writing data to object storage is much slower than writing to, say, a local NVMe SSD. That part is really obvious. The primary cause of latency in WarpStream, though, is actually the batching that you have to do in order to make it cost-effective. If you had some strawman where you wrote just one record into one object in object storage, obviously you'd get no cost savings, because every write to object storage is fairly expensive. But if you instead batch a bunch of records together and then write them all into one object, as the size of the object grows, you get to amortize the cost of that put over more and more bytes. So the effective cost per byte goes down.
And the knob that you have to tune the infrastructure cost in WarpStream is basically how long do I wait before I write that data to object storage. So we can kind of tune the produce latency up and down based on what costs you're willing to absorb.
And now there's also different tiers of object storage, like there's S3 Express One Zone in AWS and Azure Blob Storage has a premium tier that is lower latency. So the storage cost is higher on those tiers, but the latency to write is lower. But it still costs something to do that initial put and you still have to play this game of tuning how long do I want to wait.
So that's the produce side. The end-to-end latency is also higher because we're batching the same way on the other end. We're basically doing the same process in reverse to read the data out. But on the read side, there's a little bit more flexibility because you can cache the reads from object storage. Because the objects are immutable, you can cache them. So if you get subsequent reads from concurrent readers of the same partition, or, getting a little too into the weeds of how our storage engine works for now, but basically, it's all about waiting. How much are you willing to wait in order to achieve a certain cost threshold?
Obviously, you're taking a little bit of a risk on a new implementation of something being perfectly compatible. It's like every new Postgres-compatible database that comes out has its own little quirks. But basically, the big one there is the latency.
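Ryan's amortization point is easy to see with rough numbers. Assuming roughly $0.005 per 1,000 PUTs (about S3 standard's list price, used here purely for illustration), the request cost of writing a GiB falls linearly with object size:

```go
package main

import "fmt"

// putCostPerGiB returns the object-storage request cost of writing one
// GiB in objects of the given size, assuming ~$0.005 per 1,000 PUTs
// (an assumed price, roughly S3 standard's list rate).
func putCostPerGiB(batchBytes int) float64 {
	const putCost = 0.005 / 1000
	puts := float64(1<<30) / float64(batchBytes)
	return puts * putCost
}

func main() {
	for _, batch := range []int{4 << 10, 256 << 10, 4 << 20, 16 << 20} {
		fmt.Printf("batch %8d bytes -> $%.4f in PUTs per GiB\n",
			batch, putCostPerGiB(batch))
	}
}
```

So waiting long enough to build multi-megabyte objects drives the request cost down toward a fraction of a cent per GiB, at the price of the produce latency Ryan describes.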
[00:28:00]
**Cyril Tovena:** Yeah, I find that interesting, because at my previous company they were using Kafka for an event system. If something happened in the system, they would create an event, and at some point it would go into something like an OLTP database or whatever. And latency was never really a requirement. What was required was being able to just put something somewhere, knowing it will be there and that I'll be able to consume it at some point.
So I'm wondering, do you think the big majority of Kafka users actually fit into "I don't care much about the latency"? Or do you think a lot more people actually need the latency constraint on Kafka?
**Ryan Worl:** Yeah. In terms of byte volume and infrastructure spend, it's by far the non-latency-sensitive use cases. In terms of business value, or in terms of pure customer count, I think it's much, much less clear.
So the way that we found our introduction into the market was basically trying to target these latency-insensitive use cases, like logging, metrics, some kinds of CDC workloads, where the latency doesn't really matter. You just want the cost to be as low as possible and the operational burden to be as low as possible. And with WarpStream being part of Confluent now, there's a whole suite of other Kafka things at Confluent that can be used to accomplish the rest of the spectrum of use cases.
And then WarpStream also, the BYOC model of how we deliver it is also a little different.
**Cyril Tovena:** For our listeners, Ryan, what's BYOC?
**Ryan Worl:** Yeah, BYOC is an acronym as well, obviously. BYOC stands for Bring Your Own Cloud.
**Mat Ryer:** Oh, cloud. Yeah, I was going to say I didn't need to bring one of them, Tom.
[00:30:00]
**Ryan Worl:** Yeah, the idea is basically that there's some software that you want to run from a vendor and you would like it to ideally run inside your own VPC and on hardware that you're in control of because maybe you're processing sensitive data and you have compliance restrictions about what you can do with it. Or you have really good discounts with your cloud provider and the infrastructure cost would be lower if you ran it compared to running it in a vendor account with no discounts or whatever the vendor is willing to give you.
There's a lot of different reasons why people like BYOC, but basically the way that we deliver WarpStream is you get a Docker container, we have a Helm chart, you run it in Kubernetes on your own Kubernetes cluster in your VPC and we don't have any access to the actual raw data flowing through the cluster. So it lets you kind of sidestep a lot of annoying compliance burden if you were to use a vendor that didn't have a compliance certificate list a mile long. Confluent has a lot of them, so they're very effective. But if you're looking outside of Kafka at other infrastructure providers, there's a lot of people out there that will run different pieces of infrastructure software for you, but will it fit with your compliance is a much harder question to answer, especially if you're in a very regulated industry like healthcare or financial services.
[00:31:30]
**Tom Wilkie:** So I guess one of the things I'm interested in here, you talk about volume and latency being, you know, the less latency sensitive applications being the higher volume ones. Marco, do you happen to know off the top of your head what our volume is going through WarpStream? Because we've moved a lot of workloads to it now, haven't we?
**Ryan Worl:** I'd love to know this too.
**Tom Wilkie:** Yeah, come on.
**Marco Pracucci:** We can say.
**Tom Wilkie:** Marco, are you loading a dashboard right now?
**Marco Pracucci:** No, I know these off the top of my head. These are numbers I look at every day. Our largest installations are currently producing 20 gigabytes per second uncompressed for a single WarpStream cluster. And since, as I mentioned before, we have replication factor two, we consume double that. So we consume 40 gigabytes per second uncompressed.
We are using snappy compression and our data is not super compressible. So the compressed data transfer is not much lower. It's about seven, eight gigabytes per second produced compressed.
That's funny because when we started prototyping the new architecture and we approached WarpStream, we benchmarked WarpStream at one gigabyte per second uncompressed and we thought that was a big number. And now in production in a single cluster, we are doing more than 20x that.
**Tom Wilkie:** I love that you're quoting a single cluster. Single big cluster.
**Marco Pracucci:** But that's how I would look at it. Yeah, the single-cluster number is the one that matters, more than the total across all of our clusters, right?
**Tom Wilkie:** Yeah, that's not an aggregate across all clusters. When you talk about one billion series, it's not one billion series across 10 different clusters. It's in a single cluster. To me, that's what makes the big difference when it comes to scalability. Because you have some unit of data, per-tenant data that you can't really shard between different clusters. So that's why I think per-cluster numbers matter more than the aggregate across all the clusters.
**Mat Ryer:** Yeah, I was just looking for that big vanity metric.
**Ryan Worl:** Yeah, I was going to say that's the real fun one for me. It's like, I can go quote this podcast and say, so much data.
**Tom Wilkie:** Have we exceeded, we must have exceeded a hundred gigs a second now in aggregate.
**Marco Pracucci:** Yeah, for sure.
**Tom Wilkie:** Yeah. Okay.
[00:34:00]
**Tom Wilkie:** So Ryan, where does that put us amongst all of the WarpStream users?
**Ryan Worl:** Oh, that's a good question. So I don't think I can say precisely, but you're definitely in the upper percentiles. That's for sure.
**Tom Wilkie:** Very diplomatic.
**Ryan Worl:** Yeah. In terms of the amount of traffic in one cluster, you're about parity with our biggest other customers.
**Tom Wilkie:** Challenge accepted.
**Ryan Worl:** Yeah, keep going.
**Tom Wilkie:** Yeah, we will. We've got some big, we just have to write less efficient code.
**Mat Ryer:** Oh, that's easy. That's the easy one.
**Tom Wilkie:** Yeah. No, we've got some big customers coming up pretty soon. So I think we'll be pushing those limits pretty soon.
**Tom Wilkie:** And Marco, you've been using WarpStream now for over a year. How has it been? Has it delivered on everything that Ryan promised?
**Marco Pracucci:** Pretty much. Pretty much. Working with the WarpStream people was great. Along the way, we naturally had some issues here and there, like some performance or scalability issues. But in the end, the support we got from WarpStream was great.
So to get back to your question, operating WarpStream is fairly easy. In particular, operating the agents is very, very easy. Agents are stateless. I personally consider the agents the dumb part of the system. The system is essentially divided into two big blocks. The agents on one side and what we call the WarpStream control plane on the other side.
So operating the agents is very straightforward. They are stateless, easy to auto-scale. The load is evenly distributed between the replicas. I think we never had a single issue caused by the agents.
Over time, with our scale growing, we've hit some performance or scalability issues more on the control plane side. And that's where we got a lot of support from WarpStream. They both provided short-term workarounds and long-term fixes and improvements.
A year and a half ago, when we benchmarked WarpStream at one gigabyte per second, correct me if I'm wrong, Ryan, but I don't think it would have succeeded at 20 gigabytes per second at that time. It was more of a natural growth, supported by continuous improvements from the WarpStream team.
**Tom Wilkie:** And have we also seen the inter-AZ costs not be there? Does it behave the way we expected?
**Marco Pracucci:** Yeah, there's still some cross-AZ data transfer. On the Mimir side we have a control plane, and on the WarpStream side they also have a control plane. But that's zero point something percent compared to the data plane. The bulk of the data transfer is the actual data plane; the control plane data transfer is minimal and doesn't significantly impact the TCO.
[00:37:00]
**Marco Pracucci:** What's actually impacting the TCO a lot is the S3 cost or the object storage cost. If we look at the WarpStream TCO for us, 50% of the TCO is S3 or the object storage in other cloud providers, and 50% is everything else, like the CPU, the memory, the data transfer.
**Tom Wilkie:** And how much of that is the CloudTrail events?
**Marco Pracucci:** I can't speak about it.
**Tom Wilkie:** Well, for the listeners, Marco and I have been doing a lot of work on our cloud costs, and one of our engineers found hundreds of thousands of dollars a month worth of CloudTrail events that had gone unnoticed. It was quite easy to turn them off, though, so we can't blame WarpStream for that.
**Marco Pracucci:** Yeah, if you're running anything on top of S3 that does a lot of operations, and you have CloudTrail enabled for S3 events, be aware of a growing bill.
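To see how quickly this adds up, here's a rough estimate of a monthly CloudTrail data-event bill for a request-heavy object storage workload. The request rate and the price per 100,000 events are illustrative assumptions, not figures from the conversation; check your provider's current pricing:

```go
package main

import "fmt"

// cloudTrailDataEventCost estimates a monthly CloudTrail bill for S3
// data events. pricePer100k is the assumed price in dollars per
// 100,000 recorded events (hypothetical figure for illustration).
func cloudTrailDataEventCost(requestsPerSecond, pricePer100k float64) float64 {
	const secondsPerMonth = 30 * 24 * 3600
	events := requestsPerSecond * secondsPerMonth
	return events / 100_000 * pricePer100k
}

func main() {
	// A busy object-storage-backed system doing 50k S3 requests/s,
	// priced at an assumed $0.10 per 100k data events:
	fmt.Printf("~$%.0f/month\n", cloudTrailDataEventCost(50_000, 0.10))
}
```

Even at modest per-event prices, tens of thousands of requests per second translate into a six-figure monthly line item, which is consistent with the surprise Tom describes.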
[00:38:00]
**Tom Wilkie:** And as WarpStream and the new Sigyn Kafka-based architecture have allowed us to scale Mimir to bigger tenants, has it delivered on the cost savings in a global sense? Has it delivered on the reliability improvements that we talked about at the beginning?
**Marco Pracucci:** A lot of questions. Sorry.
**Tom Wilkie:** I'm a terrible interviewer. I know.
**Marco Pracucci:** You're great. Well, let me start with the cost reduction savings. Yes, absolutely, we did. Should we share some percentage numbers, or should we skip it?
**Tom Wilkie:** Yeah, I think you can, maybe don't give away the absolute TCOs, but the relative improvements we see should be fine.
**Marco Pracucci:** In Grafana Cloud, we reduced the Mimir TCO by 25% moving from the old architecture to the new one. And it's fair to say that we started from a position where we had already squeezed TCO down as much as possible in the old architecture. We had optimized everything that came to mind, so further reducing TCO required re-architecting part of the system.
**Marco Pracucci:** What about scalability improvements? I wouldn't say the new architecture is more scalable than the old one, because I consider the old one very scalable as well. Even with the old architecture, we reached the point where there's nearly no size that scares us too much.
**Tom Wilkie:** I'm going to quote you on that, Marco.
**Marco Pracucci:** Yeah, sure.
**Tom Wilkie:** 10 billion active series, Marco says we can do that now.
**Marco Pracucci:** Yeah. Well, you can do it, at a cost. We have some knobs. We recently introduced support for cross-cluster federation, so we are now able to shard a tenant even between different clusters. I feel quite comfortable saying that we now have options, both in the old architecture and in the new one, to support our most demanding customers.
[00:40:00]
**Marco Pracucci:** I think the question about reliability is more interesting, because that's where we got measurable improvements compared to the old architecture. Mostly thanks to the decoupling between the read path and the write path, and the different partitioning scheme, as I mentioned before.
Now, this is harder to measure, but it's fair to say that over the last year we've seen some issues affecting only one of the two paths, just the write path or just the read path. Issues are more common on the read path, like performance issues caused by heavy queries, but they don't affect the write path anymore.
With the old architecture, the write path would have been affected too, and in the worst case you could have a full outage on both the write path and the read path. In the new architecture, we've also had a couple of bad outages on the read path, but the write path just continued ingesting metrics as usual.
**Tom Wilkie:** And I guess now, because you're consuming from Kafka for the read path, you can consume multiple times to scale the reads. You didn't have that possibility before, I guess.
**Marco Pracucci:** We didn't have that possibility before, yeah. In the new architecture there's also a new component that I didn't mention, called the block builder. It's responsible for periodically consuming from Kafka or WarpStream and building the TSDB blocks that get uploaded to object storage for long-term storage.
Previously it was the responsibility of the ingesters to build these blocks, but now we've delegated that to a new component. And that's yet another component we can horizontally scale: if there's high load on the write path, we can just scale out the block builders and build smaller blocks, faster.
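The block-builder idea Marco describes can be pictured as a consumer that accumulates records from a log partition and cuts a block whenever a time window is full. This is a deliberately simplified sketch with invented names, not Mimir's actual API; in the real system the "block" is a TSDB block uploaded to object storage:

```go
package main

import "fmt"

// Record is one ingested data point read from a log partition.
type Record struct {
	Series string
	Value  float64
	TSMs   int64 // timestamp in milliseconds
}

// BlockBuilder accumulates records and cuts a "block" (here just the
// batch of records in one aligned time window) when a record for the
// next window arrives.
type BlockBuilder struct {
	rangeMs int64
	cur     []Record
	curEnd  int64
	flush   func([]Record)
}

func NewBlockBuilder(rangeMs int64, flush func([]Record)) *BlockBuilder {
	return &BlockBuilder{rangeMs: rangeMs, flush: flush}
}

func (b *BlockBuilder) Append(r Record) {
	if b.curEnd == 0 {
		b.curEnd = (r.TSMs/b.rangeMs + 1) * b.rangeMs
	}
	if r.TSMs >= b.curEnd { // record belongs to the next window: cut the block
		b.flush(b.cur)
		b.cur = nil
		b.curEnd = (r.TSMs/b.rangeMs + 1) * b.rangeMs
	}
	b.cur = append(b.cur, r)
}

func main() {
	bb := NewBlockBuilder(3_600_000, func(block []Record) { // 1h windows
		fmt.Printf("cut block with %d records\n", len(block))
	})
	for ts := int64(0); ts < 7_200_000; ts += 600_000 { // 2h of 10m samples
		bb.Append(Record{Series: "up", Value: 1, TSMs: ts})
	}
}
```

Because each partition is consumed independently, adding more block builders (one group of partitions each) is what makes this path horizontally scalable.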
**Tom Wilkie:** Nice. Very cool.
**Marco Pracucci:** In the old architecture, since the data is replicated three times, when the ingesters upload the blocks to object storage, those blocks contain duplicated data. Specifically, they contain each data point three times.
In the new architecture, the block builders consume from Kafka partitions, and within a Kafka partition data is not replicated. So we consume each data point just once, and when we upload the blocks to object storage, they are already de-duplicated.
We still have compactors to compact the horizontally-sharded blocks. Essentially, blocks are sharded by time: instead of keeping a lot of one-hour blocks forever, one-hour blocks are compacted into two-hour blocks, then six-hour, 12-hour, 24-hour blocks, and so on. But the compactor no longer de-duplicates metrics in the new architecture, because we don't have duplicated data in object storage in the first place.
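The time-based compaction ladder Marco sketches amounts to grouping blocks into aligned parent windows and compacting only the windows that are complete. A toy sketch, with our own function names (a real compactor plans on block metadata like min/max time, not bare hours):

```go
package main

import (
	"fmt"
	"sort"
)

// compactionGroups partitions hour-long blocks (identified by their
// start hour) into aligned windows of rangeHours, and returns the
// windows that are complete, i.e. eligible to be compacted into one
// larger block.
func compactionGroups(blockStartHours []int, rangeHours int) [][]int {
	byWindow := map[int][]int{}
	for _, h := range blockStartHours {
		w := h / rangeHours * rangeHours // aligned window start
		byWindow[w] = append(byWindow[w], h)
	}
	windows := make([]int, 0, len(byWindow))
	for w := range byWindow {
		windows = append(windows, w)
	}
	sort.Ints(windows)
	var full [][]int
	for _, w := range windows {
		if g := byWindow[w]; len(g) == rangeHours {
			sort.Ints(g)
			full = append(full, g)
		}
	}
	return full
}

func main() {
	// Five 1h blocks: hours 0-3 plus hour 5. With a 2h target range,
	// windows [0,1] and [2,3] are full; the window containing hour 5
	// is missing hour 4 and is skipped for now.
	fmt.Println(compactionGroups([]int{0, 1, 2, 3, 5}, 2))
}
```

Running the same grouping again on the 2h outputs with a larger range yields the next rung of the ladder, which is why the process composes into 6h, 12h, and 24h blocks over time.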
**Tom Wilkie:** So the compactors got cheaper to run as well then?
**Marco Pracucci:** Compactors were already cheap, so I wouldn't claim that a significant TCO reduction comes from compactors, but yeah, they're slightly cheaper.
**Tom Wilkie:** Nice. Very cool.
[00:43:00]
**Tom Wilkie:** And I guess we like this so much that we've rolled it out to everything at Grafana Labs, right? We're doing this in Loki, we're doing this in Tempo. I don't know if we're doing it in Pyroscope, but I think we're looking at it. And I wanted to touch on something I think is very interesting that we haven't talked about yet. Cyril, for Loki it's a bit different?
**Cyril Tovena:** It wasn't specifically just about TCO, it was about innovation. I think it's quite hard to actually innovate when you have a replication factor of three, because if you want to create a new index, where do you create that index? It's kind of hard to do. So it's nice to be able to decouple, to consume the data again so you can build an index on the side.
**Cyril Tovena:** Something that wasn't possible before. Another example is maybe adaptive logs, or the OpenTelemetry support we have now. Being able to have another set of workers that ingest the same data but for different purposes, that's definitely the difference you get.
**Tom Wilkie:** Of course, the original adaptive metrics team recently moved over to using WarpStream instead of the open source Kafka as well. So yeah, that's another bonus.
**Cyril Tovena:** Yeah, and the adaptive logs folks are also super excited about this, and they're doing the same.
**Tom Wilkie:** And how far along are we? We've talked about this as if Sigyn's done and dusted. What have we got left to do? How many clusters and tenants are left to migrate? What does the future hold?
**Marco Pracucci:** Right now, about 60 to 65 percent of our customers are already running on the new metrics architecture and the plan is to move 100 percent of the customers by the end of the year.
**Tom Wilkie:** And where are we at with Loki? I think we call it Thor internally.
**Cyril Tovena:** So we call it Thor. We're probably a bit behind, because Marco started this and we're following; we have a lag of six months, if not more. But we are already in production. There are about five cells running in production, and every new cell we create in Loki uses WarpStream by default. But changing the other cells takes more time. And I think this is something we haven't touched on, but all the work Marco did is "on the fly engine." It's definitely difficult to change ingestion from replication factor three to WarpStream while we are ingesting billions of metrics. So this is a difficult task for sure.
**Mat Ryer:** "On the fly engine" is swapping the engines on a plane mid-flight.
**Cyril Tovena:** Yeah. Okay, good. I'm glad about that.
**Mat Ryer:** Yeah.
**Cyril Tovena:** After "the fly engine." Yeah, we're just going to say "the fly engine."
**Mat Ryer:** This is going to be the quote. Yeah, that's the poster now.
[00:46:00]
**Mat Ryer:** But the thing is, I can't imagine working on something that takes a year. I can't really fathom it. Marco's been working on Mimir for five years. I'm sure you're using Cursor, Marco. Just type in, "make use WarpStream, make this faster."
**Marco Pracucci:** I'm using Claude Code instead of Cursor, because I actually love my IDE. I use GoLand, and I'm very attached to it. So I found Claude Code very interesting, because you can use the LLM in the terminal alongside your IDE, without having to learn a new IDE. So I'm using Claude Code nowadays. I love it. I think it's typically smarter than me, especially on small pieces of software.
It's still quite difficult to use it on Mimir itself. The codebase is very complicated, it's huge, and Claude is lacking a bunch of context there. But over the past three days, for example, I built a tool for some internal networking analysis, and I built it 100% with Claude. I didn't write a single line of code. I just typed what I wanted and guided Claude step by step: build these components, the components work this way, and so on. I didn't just ask for the final result; I spent days telling Claude what to do. But for these kinds of tasks, I think it's very good.
**Mat Ryer:** And speaking about Warp, I'm now using a terminal called Warp.
**Marco Pracucci:** Not WarpStream.
**Mat Ryer:** But yes, I use that. It's very good. If you have to do Git merges and stuff, it will do the thing for you; it'll resolve them. And you can kind of tell it in advance, "prefer this," so it chooses the content as it's merging, or give it something more complicated.
It's also really good for finding things on your network. You know when you need the IP address of some random device? If you just ask it, it finds it. It just uses all the tools to figure it out.
**Tom Wilkie:** How do you feel that there's now a new shinier Warp on the agenda?
**Ryan Worl:** I actually think ours is older. My co-founder and I previously had a different product called WarpTable, and I think that might predate Warp the terminal. So I don't know who was really there first.
**Tom Wilkie:** But are you using Claude Code or Cursor or Warp internally?
**Ryan Worl:** Yeah, so we're big Cursor folks on the WarpStream team. Cursor is actually a customer of WarpStream, which is very exciting; they're super fun to work with. I've had a pretty good experience using Cursor.
I tried kind of the same strategy Marco was describing. It's not great at doing arbitrary things in a really big, complicated codebase. But if you can scope a very narrow problem for it, like writing just one Go package, and then you glue everything together yourself, or it's very obvious at the end how to glue things together and you ask Cursor to do it, then the success rate is pretty high. But just asking it to do some arbitrary thing in a giant half-million-line codebase is not going to work.
**Mat Ryer:** Yeah, the higher-level the task, the harder it is. There are more options for it to get creative or just misunderstand. Whereas if you build small, focused bits, it does a surprisingly good job, and then you stitch them together to create the whole. So you're still doing the engineering.
**Tom Wilkie:** Are you just arguing yourself out of a salary here, Matt?
**Mat Ryer:** Well, honestly, sometimes I will just vibe it, if it's UI stuff and it's essentially just tweaking CSS.
**Tom Wilkie:** Most of the time, Matt.
**Mat Ryer:** I mean, yeah, you should see some of the new stuff we've built though.
**Tom Wilkie:** Should I?
**Mat Ryer:** Wouldn't mind. Yeah. You'll see it.
[00:50:00]
**Tom Wilkie:** I think, Ryan, was there anything we should have asked you that we didn't?
**Ryan Worl:** What I'd be interested to learn a little more about is: how has using WarpStream changed the way you're thinking about new features? Has it opened up any possibilities? Cyril, you kind of alluded to it a little bit, but what are the things that were on the to-do list of "what if we had an infinite amount of time to do this big new feature," but are maybe a little easier now that you have the ability to re-consume the data multiple times?
**Cyril Tovena:** Yeah, I think so. I'll answer for my side; Marco, you probably have another answer. For the Loki folks, it's definitely about being able to build another storage on the side while the current storage is still working. That's kind of nice, because you can experiment. That was a change in the culture of the team, I think: being able to experiment on the side by building another storage, or maybe multiple storages, and seeing which one works best. Those kinds of experiments weren't really possible before.
**Mat Ryer:** It's very common when people are building technology not to really think about that. But like you say, Cyril, this flexibility matters. We try to make choices that give our future selves the most options, because we assume we don't really know. We have strong feelings about things, but we don't really know, and often we're wrong. And pivoting matters, especially when we're working on AI stuff, where there are new things to pay attention to every week.
So yeah, I think that's a good lesson for anyone building systems, really. Especially if it's a platform: you've got engineers that could be working on it and building things. How do you enable them?
And I'm not surprised that Cursor and Grafana Labs are using WarpStream. Am I right, Ryan, that it's probably the best technology for ingesting data if you don't care that much about latency? It makes total sense to me.
**Tom Wilkie:** Marco, what's on your agenda? What does WarpStream unlock for you?
**Marco Pracucci:** For sure, it allowed us to build the new architecture faster. If we'd had to build a WarpStream-like solution into Mimir itself from scratch, it would have taken another year. So for sure, it was about delivery speed and velocity.
[00:52:30]
**Marco Pracucci:** To answer Ryan's question: a use case we had recently, where we immediately thought about WarpStream, was state synchronization, or state propagation, in Mimir. So far, for the control plane, we've always used memberlist. It's a gossip-like protocol, which is great as long as you don't need huge data transfer and you're fine with in-memory propagated state and no persistence.
But we recently had another use case, for a new service we are building, where we immediately thought about WarpStream, and we started building it around WarpStream to propagate state changes between replicas.
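One way to picture the state-propagation idea Marco mentions, where replicas consume an ordered stream of updates from a shared log instead of gossiping, is a last-writer-wins apply loop. This is our own hypothetical sketch, not Mimir code:

```go
package main

import "fmt"

// StateChange is one update appended to the shared log (a Kafka or
// WarpStream topic in Marco's description).
type StateChange struct {
	Key     string
	Value   string
	Version int64 // e.g. a log offset or timestamp
}

// Replica rebuilds the same state on every consumer by applying
// changes and keeping the highest version seen per key.
type Replica struct {
	state    map[string]string
	versions map[string]int64
}

func NewReplica() *Replica {
	return &Replica{state: map[string]string{}, versions: map[string]int64{}}
}

func (r *Replica) Apply(c StateChange) {
	if c.Version >= r.versions[c.Key] { // last writer wins
		r.state[c.Key] = c.Value
		r.versions[c.Key] = c.Version
	}
}

func main() {
	log := []StateChange{
		{Key: "tenant-1/limit", Value: "100k", Version: 1},
		{Key: "tenant-2/limit", Value: "50k", Version: 2},
		{Key: "tenant-1/limit", Value: "200k", Version: 3},
	}
	r := NewReplica()
	for _, c := range log {
		r.Apply(c)
	}
	fmt.Println(r.state["tenant-1/limit"]) // every replica converges on 200k
}
```

Unlike gossip, the log gives persistence and replay for free: a new replica can bootstrap its state simply by consuming the topic from the beginning.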
**Cyril Tovena:** Tom, you didn't ask me if there's any question I wanted you to ask.
**Tom Wilkie:** No, no, you were next. I haven't got to you yet. Sorry, Marco. What should I have asked you? What should Matt and I have asked you?
**Marco Pracucci:** You didn't ask me whether, if I could go back in time, I would make the same decisions.
**Tom Wilkie:** Okay, would you? How long do we have?
**Mat Ryer:** Well, I have another call to go to.
**Tom Wilkie:** It depends if you can go back in time. Can we go back in time?
**Mat Ryer:** What would you do differently?
**Marco Pracucci:** The new architecture? Yes. WarpStream? Absolutely yes. Actually, the pain point for us was, and still is, the higher produce latency. We knew about it since day zero, but we underestimated the impact on some of the agents clients use to remote-write metrics.
We were coming from a Prometheus background, and Prometheus and Grafana Alloy have dynamic concurrency; they have no trouble handling higher latency. But we learned the hard way that some customers are legitimately using other types of agents that are much more problematic when the write latency increases.
What I regret the most is not seeing this problem two years ago, because if I had, I would probably have invested more time improving these agents while working on the new architecture. Now it's a bit late. We took baby steps to contribute to some of these agents, in particular the OTel Collector, and in the end what we produced is a set of recommendations, some knobs to fine-tune in the config, to give to our customers and support them. But still, I think the higher produce latency was the biggest pain point for us, and something I should have addressed earlier.
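The reason dynamic-concurrency agents cope with higher produce latency is essentially Little's law: to sustain a given throughput, the number of in-flight requests has to scale with latency. This is our own illustration of the principle, not any agent's actual algorithm:

```go
package main

import "fmt"

// requiredConcurrency applies Little's law: the number of in-flight
// requests needed to sustain samplesPerSec, when each request carries
// batchSize samples and takes latencySec to complete, rounded up.
func requiredConcurrency(samplesPerSec, batchSize, latencySec float64) int {
	requestsPerSec := samplesPerSec / batchSize
	inFlight := requestsPerSec * latencySec
	if inFlight < 1 {
		return 1
	}
	return int(inFlight + 0.999) // round up to a whole shard
}

func main() {
	// 1M samples/s in 2000-sample batches (illustrative numbers):
	fmt.Println(requiredConcurrency(1_000_000, 2000, 0.05)) // 50ms backend
	fmt.Println(requiredConcurrency(1_000_000, 2000, 0.40)) // 400ms backend
}
```

An agent with a fixed, small concurrency cap simply cannot keep up when latency grows eightfold, which is why fixed-concurrency agents fall behind while Prometheus-style dynamic sharding absorbs the change.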
[00:55:00]
**Tom Wilkie:** Fair enough. Cyril, you can take us out, so finish the podcast with what should I have asked you, what would you have done differently?
**Cyril Tovena:** I don't think there's anything you should have asked me. But on Marco's last point, I'll do like Matt and say: well, don't you just use Claude to fix that?
**Marco Pracucci:** You mean Claude?
**Cyril Tovena:** I mean, yeah, Claude, okay. Yeah, for time travel. And I do feel like we're in the future sometimes with these new AI technologies, so it's quite exciting. But unfortunately, time still matters, and we have run out of it.
**Mat Ryer:** But thank you so much. That was really great, a properly deep technical chat. I learned a lot, I've got a lot of notes. Marco, I've got a lot of questions for later.
Last time we were in Montreal, we did a thing there, Cyril, Marco, and I, and they said, "Do you want to come for a cycle around the city and just see it?" And I was like, "Oh yeah, that sounds nice." Marco said, "Normally I do like one and a half kilometers, something like this." So I was like, "Okay, that sounds good." But he meant up. This was a 53-kilometer bike ride around the city.
**Marco Pracucci:** Just a small bike ride.
**Mat Ryer:** Yeah, so, but yeah, very, and fast, obviously. But no, so if I went back in time, it didn't look fast. Well, thank you. If I went back in time, I would probably just say, "I'm all right, thanks, Marco. I'll just stay here."
**Marco Pracucci:** Yeah. Well, we're ready to bike again in Amsterdam in November.
**Mat Ryer:** Oh, but luckily Amsterdam is nice and flat.
**Marco Pracucci:** Yeah, everything is flat there.
**Mat Ryer:** Okay, well, this is also time to say goodbye. Thank you so much to our guests, Cyril, Ryan, Marco, fantastic stuff. Tom, yeah, thanks a lot. See you next time on Grafana's Big Tent.