Inside Prometheus 3.0: UTF-8, Native Histograms, and OpenTelemetry Interoperability

**Mat Ryer:** Hello! Welcome to Grafana's Big Tent. I'm Mat Ryer. This is the podcast all about the people, community, tools, and tech around observability. I'm joined by my good friend and co-host Tom Wilkie. Hello, Tom.

**Tom Wilkie:** Hello, Matthew. How are you?

**Mat Ryer:** Pretty good, thanks. Yeah, feeling good about this episode. You?

**Tom Wilkie:** This is the first one I've been on for season three, so yeah, excited to be back.

**Mat Ryer:** Yeah, welcome back. And it's about Prometheus 3.0... You've used Prometheus before, haven't you?

**Tom Wilkie:** Just a little bit, yeah. Julius and I were trying to remember the first time I used it and we met, and it's probably the best part of 10 years I've been using Prometheus.

**Mat Ryer:** That's amazing. We're also joined by Richi Hartmann. Hello, Richi. Also Grafana Labs, aren't you?

**Richi Hartmann:** Yeah...

**Tom Wilkie:** Man of few words...

**Richi Hartmann:** I'm German.

**Mat Ryer:** You're German, okay. Thanks for telling us that. That actually is good to know going into this... We're also joined by Julius Volz. Julius, are you also German?

**Julius Volz:** I'm also German. I can say a few more words than Richard, but not that many more.

**Mat Ryer:** \[laughs\] Okay, good. So it sounds like we've got ourselves a podcast then, at least.

**Julius Volz:** Yup.

**Mat Ryer:** Julius, you are the co-creator of Prometheus, which is kind of cool. And you're also at PromLabs at the moment. Just quickly, for anyone that doesn't know Prometheus, what is it and where did it come from and why does it exist?

**Julius Volz:** Yeah, sure. So Prometheus is a metrics-based monitoring system and time-series database, and it came about as a hobby project. So it was the year 2012, I came from Google to SoundCloud, moving from Zurich to Berlin, and at the same time Matt Proud also had done the same move, but from a different Google office; we didn't know each other. We were tasked with making SoundCloud more reliable. SoundCloud was always down, slow, unreliable, all these things. It already had a cluster scheduler - self-built, in-house, very improvised - and we couldn't really monitor all the things going on there in a proper way anymore with the existing monitoring tools in the open source world.

So both Matt and I, we really, really, really started missing the monitoring tool that we had had at Google called Borgmon. So over a couple of months, we always came back to that same realization - if we want to improve things, we need to have better insight, better visibility, or nowadays you would maybe say observability... A religious topic. And yeah, eventually we just started building what then became Prometheus in our free time, and then more and more in SoundCloud time, and then eventually Prometheus was mature enough and documented enough and had enough components in its ecosystem, metrics exporters and so on, that we decided to fully publish it, both as an open source project, but also with a blog post from SoundCloud explaining what it is about.

A year later we joined the CNCF as the second project after Kubernetes. So it's all under an open governance now, many companies working under an open governance on Prometheus together to make it a better and better monitoring system. And yeah, the way you can imagine it, it collects metrics about your applications, your infrastructure... Anything really that can emit metrics. And then you gather those metrics, and then you have a query language called PromQL, with which you can do useful things with that collected data. You can wake yourself up at 3am in the night if either the database is down, or your wind park isn't generating power anymore, or there's something slow... Or you can have nice charts, you can do ad hoc debugging, you can do capacity planning... You can do anything you do with numeric metrics, basically. So that's Prometheus in a nutshell.

**Tom Wilkie:** Nice, nice. It's a similar founding story, I guess, to Backstage and Spotify, right? An internal tool becomes this popular external open source project. Did you ever have conversations at SoundCloud about SoundCloud actually offering this as a product, or offering services around it? Because I realize Spotify has started doing that with Backstage now.

**Julius Volz:** Oh, never. No, not at all. I think also in the first years of SoundCloud, very understandably so, most people at SoundCloud were at first kind of wondering "Why are you building a monitoring system? That's not what we hired you for. That's a kind of ludicrous idea." And honestly, if I had been my own manager back then, I maybe would have told myself "Cut it out or you'll be fired." But we were lucky and it worked out... And of course, we put in a lot of work to make it work.

Eventually, we kind of reached a tipping point internally where both politically and technically everything was working well enough, and accepted well enough that people actually said "Okay, every new service has to be monitored with Prometheus now, because it's like really changing the way we work with metrics and visibility in our services." But still, SoundCloud is in such a different type of business that they never even had the idea of "Let's do a business around Prometheus." They were very generous though, allowing us to kind of really separate it out of SoundCloud, although we had spent that much SoundCloud work time on it... You know, just allowing us to fully release it into the CNCF, and all that. That was really nice. Yeah.

**Tom Wilkie:** What is it about music streaming companies building open source infrastructure projects, though? ...like Spotify and SoundCloud?

**Julius Volz:** That is a very good idea, a very good question. Someone should look into that.

**Tom Wilkie:** Yeah. I wonder what we're going to get out of Apple Music. It's not going to be faster UIs, is it?

**Mat Ryer:** So one thing that struck me when I first encountered Prometheus many years ago was that it was just a sort of text-based format. When you get the metrics, it's just sort of text. And that always strikes me as either very naive, or someone's really thought about this. Which one was it, Julius?

**Julius Volz:** \[00:05:51.21\] Obviously, someone really thought about it. No, but I think it is true... So there's multiple layers of history to this. If you look at what Borgmon did internally at Google at the time, there was also a text-based format... Though, curiously, the first version of Prometheus actually used a JSON format. The very first alpha, pre-alpha, "Let's test something, let's get something going" version was JSON... Which really didn't work too well, because you needed to parse the entire JSON body - very inefficient, and all that.

And then eventually, we came up with both a protobuf format and a text-based format, and then with Prometheus 2, the protobuf format was kind of ditched in favor of "Hey, this text format actually works really, really well. It's also really highly optimized in the ingestion path", where we try to not allocate any memory for metrics we've already seen, and so on... And the really nice thing about this format is the low barrier of entry. So you can even emit it from a shell script, from whatever weird environment that doesn't have a full stack for modern programming languages. You really only need to serve something that looks like a very primitive HTTP endpoint, outputting a few ASCII characters, and that's it. And then you can get metrics out of that thing, whatever it happens to be, with a Prometheus server. And that is in contrast to trying to do protobuf from a shell script, or some embedded processor or so. So I think that actually really helped Prometheus as well, because it made it really easy for anyone to expose metrics from anything they cared about.
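For illustration, this is the kind of output such an endpoint serves - just plain text over HTTP (the metric name here is made up):

```text
# HELP jobs_processed_total Jobs processed since the exporter started.
# TYPE jobs_processed_total counter
jobs_processed_total{worker="a"} 1027
jobs_processed_total{worker="b"} 3
```

Anything that can print lines like these - even a shell script behind a minimal HTTP listener - can be scraped by Prometheus.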

**Tom Wilkie:** I'm glad you brought up protobufs there, because - two things on that. One is they're coming back though, aren't they?

**Julius Volz:** They're coming back. Yeah.

**Tom Wilkie:** Yeah, with the new native histograms. But also, have you seen the hyperpb stuff from the Buf.build team?

**Julius Volz:** Hyper... What is that?

**Tom Wilkie:** Hyperpb... It's the new just-in-time, data-driven --

**Julius Volz:** I have not. Tell me about it.

**Tom Wilkie:** Oh, I haven't used it yet. I've only seen it on Hacker News in the past few days... But it looks really exciting. It's supposed to be like 10 times faster than even generated code, which obviously Go doesn't do.

**Julius Volz:** Awesome.

**Tom Wilkie:** So I really want to check it out for the remote write path, where we're still very proto-heavy, and it's still a bit of a bottleneck.

**Julius Volz:** Yeah. Some other protocols there do use protobuf... And yeah, maybe for completeness' sake - protobuf for actual metrics ingestion from monitored services or targets is making a comeback. We're still always going to support the text-based protocol, but Prometheus is getting this new type of metric called a native histogram, and it has very fine-grained buckets. It is way more efficient to transfer it in a binary-based, well-structured format than integrating that somehow into the old text format. So that was the main motivation to bring back some kind of proto-based scraping format... But again, it is optional. You don't have to use it to get metrics into Prometheus.
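As a sketch of what opting in looks like: native histograms initially shipped behind a feature flag, and scrapes of targets that expose them negotiate the protobuf format automatically (flag name per the Prometheus docs; exact defaults may differ by version):

```shell
prometheus --enable-feature=native-histograms --config.file=prometheus.yml
```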

**Mat Ryer:** Yeah, cool. So Richard, do you remember when you first used Prometheus, or encountered it?

**Richi Hartmann:** October 2015.

**Mat Ryer:** \[laughs\] Thank you.

**Tom Wilkie:** Do you remember the exact day and time, or...?

**Mat Ryer:** Do you remember what time? I can't believe it.

**Richi Hartmann:** The first week of October of 2015, if you want to be precise... Because I switched jobs.

**Julius Volz:** What was the first page you got from Prometheus?

**Tom Wilkie:** I think it was one of the Kubernetes -- one of the early prototype Kubernetes alerts.

**Richi Hartmann:** I'm pretty certain it was disk or TLS certificate lifetime with a predict linear.
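For reference, an alert like the one Richi describes - extrapolating with `predict_linear` - might look roughly like this; the metric and thresholds are illustrative (`node_filesystem_avail_bytes` comes from the node_exporter):

```yaml
groups:
  - name: capacity
    rules:
      - alert: DiskWillFillInFourHours
        # Fit a line to the last 6h of free space and extrapolate 4h ahead
        expr: predict_linear(node_filesystem_avail_bytes{fstype!="tmpfs"}[6h], 4 * 3600) < 0
        for: 30m
        labels:
          severity: warning
```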

**Tom Wilkie:** Richard, you always came to this very much not from the cloud-native side, but from monitoring data centers and hardware and networks...

**Richi Hartmann:** Yes.

**Tom Wilkie:** Good. I'm glad you expanded on that. \[laughter\]

**Richi Hartmann:** \[00:10:00.10\] If you keep asking yes/no questions... No, but that's actually what I really liked about Prometheus. Historically, a lot of the old stuff used to work. I mean, some more, some less... But it used to work, by and large. But it wasn't exactly made for scale, and as such, a lot of the time stuff was just put into an image and that was the end of it. Basically, the storage which you could access as an end user was more or less \[unintelligible 00:10:29.03\]

**Tom Wilkie:** Oh, you're talking about legacy monitoring systems before Prometheus, right?

**Richi Hartmann:** Yes. Sorry. Yes, I was. And it kind of worked... But Prometheus had to be built for something of much larger scale, and for being able to analyze data in more or less arbitrary settings... which came with the deliberately exposed complexity of cloud. And this is something which really, really stuck with me and basically made me fall in love with Prometheus overnight... Because for the first time in my life, I was able to actually do math on my data, and not just look at it and be done. I could actually work on and with the data and take it further. And that's super-useful within cloud, and also with all the stuff you can still touch.

**Tom Wilkie:** And I think it's a common misconception that Prometheus and Kubernetes kind of grew up together, came from the same place, were built to work together. I mean, they do work incredibly well together, but actually - we were chatting about this beforehand... The Kubernetes integrations in Prometheus came relatively late. I think if you say 2012 is when you started working on Prometheus - it was three years before the Kubernetes service discovery. Is that right, Julius?

**Julius Volz:** Well, yes, but we only announced Prometheus to the world in January of 2015. And very shortly after, people from Red Hat \[unintelligible 00:11:51.07\] it was back then - added service discovery for Kubernetes. And then very soon all the Kubernetes components - the Kubernetes API server, etcd, the kubelet, and so on - added native Prometheus metrics. So there was immediately, or very soon, great cross-compatibility between the two systems.

And yes, it's true that the two systems kind of evolved completely separately without first knowing about each other, but on the other hand, they both kind of -- you know, Kubernetes is inspired by Google's Borg cluster scheduler, and then Prometheus was inspired by Google's Borgmon. So the tool to monitor services on Borg, and to monitor the Borg clusters themselves. So it's kind of -- it makes sense that they would work very well together philosophically, with a label-based data model, and service discovery, and all that.

**Tom Wilkie:** And that, I guess, is kind of -- in the early days I remember trying to explain the whole idea of Prometheus to people, and service discovery, and pull, and these kind of things... And it was alien, because dynamically-scheduled environments were relatively new, and few and far between. But I've always seen that as like the real killer set of features in Prometheus, that still no one has recreated.

**Julius Volz:** Yes. And I would say if you read the latest article on my blog, "Why I still recommend native Prometheus instrumentation versus OpenTelemetry", that is one of the main points. If you have a monitoring system, or you want to have a monitoring system, in my opinion, it should know which monitored services should exist, and then check whether they do actually send data. And Prometheus solves that by combining service discovery with a pull-based model. "Hey, first I discover everything that should exist in the world, and then I actively go out and try to pull from it. And if I can't, I will record that in a metric that tells me "Hey, this target is down", and I have built-in, easy health alerting."

\[00:13:49.12\] And that is unfortunately something -- you know, despite all the benefits of OTel, one thing that currently kind of just got dropped on the floor. When you just push metrics - when you treat your monitoring system as a receptacle for random metrics, as I would say - how does Prometheus, or whatever monitoring system you're pushing to, still know whether any data is not coming in? So I think that is really a concern that Prometheus addressed really well for the first time, and still, there are not many other solutions that actually even think about that problem, or handle it in the way that Prometheus does.

**Tom Wilkie:** I mean, it's still, I think -- probably actually back to your question about what was the first alert... I think actually an alert on the up metric was probably my first alert I received.
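The `up` metric Tom mentions is synthesized by Prometheus itself on every scrape - 1 if the target answered, 0 if not - so that classic first alert is, as a minimal sketch:

```yaml
- alert: TargetDown
  expr: up == 0
  for: 5m
  labels:
    severity: page
```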

**Mat Ryer:** So hold on, then... Prometheus 3.0 - when was the last release? And why is it taking so long? Julius, what if you type with both hands? It's been seven years since 2.0?

**Julius Volz:** Yeah, seven years. When was the 2.0 release? Was it in 2016, or '17? Does anyone remember?

**Richi Hartmann:** We talked with Planet.com in 2016 in Seattle about their pod lifetimes, and this is part of what triggered Prometheus 2.0. So I suspect it would have been 2017.

**Tom Wilkie:** Ah, well, the official Prometheus blog post says it was seven years ago. So I'm going with that one here.

**Julius Volz:** Okay, let's go with seven years. Yeah, I mean, here, of course, we can really say how stable we've been. So we really actually -- you know, we're an infrastructure project, we want to be somewhat conservative, and we want to be adopted and trusted by people, so... We actually managed to, let's say, finalize or build enough things into Prometheus 2.0 that were well thought out enough to survive for seven years without having to be broken... Which was really nice for users. And we made it really clear in our documentation page which elements of the Prometheus API surface are stable.

And then there's, of course, always been a few things - the UI can change in various ways and so on that are not declared stable... But mostly, that has meant that for the past seven years people could just get a new version of Prometheus, just start it without thinking, and it just keeps working, except if there's a bug... And then, of course, we started accumulating more and more smaller things, and a few bigger things that we did want to actually change in a breaking way in Prometheus. And then eventually, there came the idea for Prometheus 3.0, and then it actually happened at the end of last year.

So yeah, let's see how long it takes until the next major release now... It could be that we say something like "Hey, let's just bump the major release whenever we have a few breaking changes that we actually want to make", but it also could be that we say "Hey, the signaling is actually nice if the major release doesn't change all the time", because people don't get this fatigue of having to check whether they need to change any settings, and so on.

**Tom Wilkie:** I mean, it's like the Linux versioning. Like, how long did the two dot something Linux tree run for? It was a huge number in the end.

**Julius Volz:** A long, long time.

**Tom Wilkie:** Now they just bump the major version almost on a regular schedule, I believe. So we're somewhere between that, I think.

**Mat Ryer:** Yeah. I like the way Go does this too, the 1.0 backwards compatibility promise of the language... Because I think it's that stability that allows people to build on top of these as foundations. So yeah, a lot of respect for these long, long major release cycles, for sure. So what are you most excited about in 3.0? Richi, are you excited?

**Richi Hartmann:** Yes and no. So part of 3.0 was more of evolution, less of absolutely breaking everything and redoing the world. My own personal favorite is probably performance, because even though it is such a long-running program and such a long-running project, and even though we have made major improvements over the years, we were still able to massively improve the overall performance, both CPU and memory.

\[00:18:09.01\] It was between, I think, 3x and 7x in our tests, between early 2.x versions and 3.x. So contrary to a lot of other software, in particular end user-facing software, which just becomes slower and more expensive and just more of a memory hog over time, we actually eked out more performance in something which is just vital to most businesses these days.

**Mat Ryer:** Yeah, that's the thing... Because this is used in so many places, the impact of those performance improvements is kind of enormous when you think about it.

**Tom Wilkie:** Yeah, but I love that Prometheus is getting faster and more efficient, but I also feel like people are just using it -- you know, throwing more and more metrics into it. Like, if we're four times better with memory, people are just throwing four times as many metrics into it. So there's definitely this attitude of "Just add a metric for everything, because Prometheus can handle it." And whilst that's brilliant, it's also kind of leading to a huge volume of metrics.

**Julius Volz:** Yup. So, of course, the performance wasn't all directly related to Prometheus 3. That's an ongoing effort that is happening all the time, which is amazing. And then also, I would say a lot of the major cool features that we highlighted as being part of Prometheus 3 were not strictly breaking. So there were a few breaking things really that aren't even worth mentioning, like a couple of deprecated flags and this and that, but no major thing really went away. So most Prometheus users can still even just upgrade from 2 to 3 without having to think much; maybe if they're using some very arcane flag, they have to change something... But mostly, they're getting a couple of cool, new features.

So personally, I worked on just revamping the old UI... The old UI was - well, it was built on an outdated version of Bootstrap, very outdated under the hood, very cluttered, because people just added random UI elements to every page in a very uncoordinated manner over many years... And I felt it would be a little bit sad to release Prometheus 3 with something that still looked... Yeah, like a garbage dump, basically.

So we basically created a new UI, still based on React, based on the \[unintelligible 00:20:35.24\] React component framework... And then also -- so we basically tried to make the pages look more consistent, a bit cleaner... And also add some new features, like the PromLens-style tree view, which means if you enter a PromQL query, you can enable a tree view, which shows you in a tree-like layout the structure of your query and which data you have at which sub-expression; you can click on sub-expressions, you can get explanations for them, see the data at each sub-expression, and these kinds of new features that a revamp of the UI enabled.

So I would say that is maybe one of the highlights... We had a few other things. We had Remote Write 2.0. Remote Write is this protocol that the Prometheus server can send to some other remote endpoint, let's say a cloud service that stores your metrics. So it's meant for a Prometheus server to forward metrics to some either long-term storage or other processing system that wants to do further stuff with your metrics... And this new version of the Remote Write protocol has more support for metadata, trace exemplars, the new native histograms, and more. And then it's also more efficient. So that could be interesting for some people. And then, of course, the big thing - should I go on about the OTEL stuff, or does anyone --
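In configuration terms, opting into the new protocol is a per-endpoint setting - something along these lines, though check the current docs for the exact field names:

```yaml
remote_write:
  - url: https://metrics.example.com/api/v1/write
    # Select Remote Write 2.0; the 1.0 message (prometheus.WriteRequest) remains the default
    protobuf_message: io.prometheus.write.v2.Request
```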

**Tom Wilkie:** \[00:22:03.29\] No, I think the unsung hero of Prometheus 3.0 is UTF-8 support. And I know that might not be the most exciting for most people, but we talked about the things that make Prometheus different. We talked about pull, and we talked about service discovery... But there was also something in Prometheus' very early days of not using the dot as the separator in metric names.

**Julius Volz:** I'm still against that, but...

**Tom Wilkie:** You're still against it. So Prometheus had the underscores... And this was intentional. Right, Julius? This was intentional to separate ourselves from the hierarchical way of doing metrics.

**Julius Volz:** Yeah. Also -- I mean... Yes. \[laughs\]

**Tom Wilkie:** Go on. I feel like there's a story there.

**Julius Volz:** I would have to think about it a little bit... But the dot operator specifically was also just not used initially because we always had the idea of potentially using it as an actual operator, similar to how it was used, for example, in Borgmon. There you could say "metric name dot the name of the job", and it would be like an implicit label selector on the job name. But that never materialized, and then over the years there were more and more other ideas for how to potentially use the dot as an operator at some point... But then eventually, we had long discussions about that, and that never happened.

But we also did not allow a lot of other characters in metric names and label names because we have a query language, which like most programming languages has to care about character escaping issues; things that look like an operator but aren't, and really are just part of a metric name, can really throw a wrench into that. So if you think about most programming languages, they do not allow a dot as part of a variable name... Or a slash, or a minus, because those are operators. And so yeah, I think Tom -- I mean, talk about the UTF-8 support, and then maybe also what that means for people who want to use it in terms of PromQL selectors.

**Tom Wilkie:** Yeah, I mean, PromQL is still a bit clunky with names with funny characters in... But I also think the adoption of UTF-8 for label values and for metric names - which are just label values - is huge for interoperability. Because the experience before was that your metrics were not called the same thing in Prometheus as they were in the legacy system you were migrating from. Or as they were in OpenTelemetry. Or as they were in other things. And this was a huge barrier for a lot of people, I think.

I just think this really sets the tone that Prometheus 3.0 is much more open to wider usage, beyond the kind of hyper-opinionated Prometheus observability community, or Prometheus monitoring community. So I think it's more that -- this is why I'm excited about it, because honestly, there's a lot of people using OpenTelemetry, there's a lot of people using old legacy HP OpenView and all these other systems, that just want something more modern, more cost-effective, more scalable, more usable... And yet, this is honestly a massive barrier for them.

**Julius Volz:** Yup, it makes total sense. So yeah, now you can put any characters you like into your metric names and into your label names.

**Tom Wilkie:** Label names? Oh, yeah, label names have it also.

**Julius Volz:** Yeah. Label names as well.

**Tom Wilkie:** I didn't realize that.

**Julius Volz:** Yeah, yeah. And label values already were arbitrary strings before. The only thing you have to watch out for if you do that, if you go beyond the traditionally allowed character set is that you do have to quote more things in your PromQL selectors, so they will become a little bit more elaborate. But yeah, that's the cost of getting that nice, unchanged metric name.
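Concretely, the quoting Julius describes: a classic metric name can stay bare before the braces, while a UTF-8 name moves inside them as a quoted string (the names here are made up):

```promql
# Classic selector
http_requests_total{job="api"}

# UTF-8 metric and label names: quote them inside the braces
{"http.requests.total", "service.name"="api"}
```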

**Tom Wilkie:** But on that point, how much do you think -- I think PromQL is brilliant, and I live in the terminal day to day, which is weird, because I'm not really a coder anymore... But I still do most of my work on spreadsheets and Google Docs from the terminal. How long do you think this kind of very CLI-first way of interacting with Prometheus, and this query language-first way of interacting with Prometheus is going to persist? Because again, I think that's a sign of the highly opinionated audience community that Prometheus is aimed at. And there's a massive community outside of that who wants something more point and click. I mean, you built something point and click with PromLens. There's definitely an audience for this.

**Julius Volz:** \[00:26:19.23\] Yeah, for sure. Yeah. It's a good question, and of course, there's LLMs and AI and all that stuff these days, where probably in a few years either you'll talk English, or you'll have like a brain interface to just ask... And then humans at some point they're obsolete anyway...

**Tom Wilkie:** It'll be super-disturbing \[unintelligible 00:26:35.01\]

**Julius Volz:** I mean, that's true, to a degree. Of course, I think for quite a few years there will still be many people who just write tons of YAML with PromQL expressions in them, and so on. But yeah, it's a good question how big each of the audiences are. And you're right, probably the audience that is more casual and just wants to get all their data into that, and then somehow ask some intelligent system to give them something out without knowing full PromQL is probably even larger. Yeah.

**Tom Wilkie:** I think the AI bit you raised there is kind of super-interesting as well, because we're seeing more and more people interact with Grafana using the AI system. Did anyone on the call have anything to do with that?

**Mat Ryer:** Yeah, I was working on that one.

**Tom Wilkie:** You were working on that, weren't you, Mat?

**Mat Ryer:** Yeah. It's quite good.

**Tom Wilkie:** And I think one of the things we've been really impressed by - and this is really a testament to the size of the Prometheus community, and the amount of content that it generates out there - is how well these foundation models generate PromQL... Surprisingly well. Especially in the last year or so, they've got really good at it. And so I think that's just a testament to them being trained on the content generated by the huge Prometheus community. The same cannot be said for other monitoring systems. We even see them being not as good at generating queries in LogQL, Loki's query language. Even though it's so similar to Prometheus, the community is much, much smaller, and the content's much smaller.

**Mat Ryer:** Yeah, we even see the assistant trying to use PromQL and Prometheus where it probably shouldn't as well. Like, it was trying to just figure out "What's this a percentage of the whole?", because it can't really deal with numbers very well, but it knew it could generate PromQL, so it was using that to just do basic calculations.

So yeah, but it's true, we've done a lot more work teaching it about LogQL. And Prometheus, and also best practices, and all the conversations that people have had in the community about this in the past is all kind of there, and it's taken into account. So we do -- yeah, we do find it really good. I don't write any PromQL now. I will just ask the assistant and it will generate that for me.

**Tom Wilkie:** I'm ashamed to admit that I'm falling into that. I mean, I'm pretty decent at Prom -- I used to be pretty decent at PromQL, I think... But yeah, now I'm just asking the assistant to do it. It's quite embarrassing.

**Julius Volz:** It makes sense. I mean, also I know how to build a web app, but yesterday I built a web application without writing one line of code. And I deployed it and it's working.

**Mat Ryer:** I was going to ask you, did Prometheus 3.0 come about with the help of any sort of AI-guided coding tools, or systems?

**Julius Volz:** GitHub Copilot, mostly... Nothing too crazy. I mean, most of the time when I'm writing more serious code, I don't generate 10 different files automatically and then try to understand what was generated... It's still more auto-complete a few lines, understand everything, and then change the bits that need to be changed. But still definitely it's very helpful to have LLM-based coding assistance.

**Tom Wilkie:** So let's talk a bit about OpenTelemetry. Sorry, I know I took us on a tangent there...

**Mat Ryer:** Well, hold on. Before we do, UTF-8 - how long before people just have poop emojis in their names?

**Tom Wilkie:** Oh, it's already happened.

**Mat Ryer:** Yeah. That's the inevitable --

**Tom Wilkie:** Of course.

**Mat Ryer:** That's the elephant in the room.

**Tom Wilkie:** I think if you go and check our production services, there are UTF-8 characters now in the metric names.

**Mat Ryer:** Yeah. So it's going to be all emojis in the future. So are you happy with that, Julius?

**Julius Volz:** \[00:30:08.00\] Yeah. The poop emojis, I fully support. Just quote your poop, so it works.

**Mat Ryer:** \[laughs\] Okay. That's the clip for the promo for the season. Thank you very much. Okay.

**Tom Wilkie:** OpenTelemetry. Let's talk about the OpenTelemetry support. And I think UTF-8 was a big part of that, but Richi, what else did Prometheus 3.0 bring from an OpenTelemetry perspective?

**Richi Hartmann:** A lot in both ways. I mean, we invested quite some effort on both sides in increasing interoperability... Because Prometheus obviously has the largest installed base within all of cloud native. It's the only thing which kube-state-metrics speaks, and basically anything which does metrics within the CNCF is Prometheus-native. Whereas a lot of the applications and a lot of the mind share for new stuff are with OpenTelemetry. And if those don't work together seamlessly, it's going to create a lot of friction, a lot of pain for everyone.

So I would say that overall, with probably the exception of UTF-8, a lot more work had to be done on the OpenTelemetry side to be compatible with Prometheus, rather than the other way around... But to be clear, it's still something which we on the Prometheus team invested heavily in; myself, Goutham, a few others, yourself, Tom... And if I put on my Grafana hat for a second, Grafana Labs is also investing massively, with a lot of head count and a lot of time and effort and engineers, to make certain that OpenTelemetry is able to deliver on this promise of full interoperability with everything... Which is one of the big draws of OpenTelemetry.

Other things which we did for OpenTelemetry - or let's say with OpenTelemetry - include native histograms, for example. Native histograms are largely based on the work of Björn Rabenstein. He himself called it his magnum opus; he really poured a lot of time and energy into making this as near to perfect as is humanly possible. For those who haven't used the old histograms: with those, you basically needed to know the properties of your system in advance to choose sane bucket boundaries. Whereas native histograms more or less just do what you want automatically. So you don't have this observe, improve, observe, improve cycle anymore - which you needed unless you were already a subject matter expert or had a rough inclination of what your data would look like. And there was a specter on the horizon of OpenTelemetry also shipping something very close, but slightly incompatible - similar to what happened with the initial histogram format, which cost a lot of pain and effort to fix, in particular on the OpenTelemetry side.

So native histograms were the first case where OpenTelemetry and Prometheus really collaborated closely and made things work before anything got released as fully stable. And I feel this is less about Prometheus 3 and more about a change on both sides of the projects to really try and put something out which doesn't hurt or impede users in ways they cannot even anticipate before reaching at least a semi-professional level with the thing.

**Tom Wilkie:** \[00:34:00.09\] I'm super-excited about the native histograms as well. It is a big step forward. And I think it's also a relatively unique approach Prometheus has taken here, storing the really high-definition histograms long-term. Again, my understanding is limited on this one, but my understanding is most other systems, whilst maybe using the high-definition histograms for transport, effectively sample them and then only store the sampled percentiles... Whereas we store the full raw histogram, in a very, very efficient way, forever. And so you can go in ad hoc in the future and ask arbitrary questions like "What was the performance on this day, at this percentile?", which, if you didn't pre-sample and pre-store that, maybe you can't in other systems. I'm not an expert on this, but that's my understanding. Is that accurate?

**Richi Hartmann:** Yeah. As far as I know, yes. And also, you're touching on a very important point. Prometheus is one of the few major projects which deals with everything from generating the data, to transmitting it, storing it, querying it, using it for alerting, dashboarding, whatever. So the perspective of Prometheus on data handling is necessarily much more - 'constrained' is the wrong word, but it needs to be more deliberate. It's not a case of just generating something, tossing it over the wall and making it someone else's problem, or of being completely unable to determine how data is being generated... There is actually an underlying holistic design from beginning to end, through the whole lifetime of a metric.

**Tom Wilkie:** One of the things you said earlier was that you don't need to be an expert on the underlying distribution of the data to use native histograms anymore. I think what you mean by that is you don't have to pre-declare a set number of buckets and think really hard about them. Or use the wrong ones and then get basically garbage data. But the system kind of just deals with that all for you now.

**Richi Hartmann:** Yes.
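From the instrumentation side, the difference being described looks roughly like this with Go's prometheus/client_golang library - a sketch, not from the episode; the metric name and bucket values are illustrative:

```go
// Classic histogram: bucket boundaries must be guessed up front.
// Bad guesses mean coarse, hard-to-fix data.
classic := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration_seconds",
	Help:    "Request latency.",
	Buckets: []float64{0.01, 0.05, 0.1, 0.5, 1, 5},
})

// Native histogram: no bucket list at all. Setting
// NativeHistogramBucketFactor enables native histograms; buckets are
// exponential and chosen automatically (here, ~10% growth per bucket).
native := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "http_request_duration_seconds",
	Help: "Request latency.",
	NativeHistogramBucketFactor: 1.1,
})

classic.Observe(0.042)
native.Observe(0.042)
```

The observation API is unchanged; in client_golang, native histograms are switched on simply by setting NativeHistogramBucketFactor to a value greater than 1.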

**Tom Wilkie:** Yeah. I think this is huge. The other thing I think is so exciting about this is I think the number of hacky demos I've seen of native histograms that encode pictures in the histograms, and videos... Like, there's a smiley face, there's a Prometheus logo, but there's also "Never going to give you up" as a video, encoded in a native histogram, that you can play back in Grafana, which I just think is such a cool... And it also just kind of shows how much density and information you can pack into this data.

The thing I've really loved about native histograms in production usage is we've actually learned a ton about the behavior of our services that was previously hidden behind super over-aggregated data. It's like going from - and you're all going to cringe at this analogy, trust me, but... It's like going from standard definition to high-definition TV. Suddenly, I can read the values on the steering wheel of a Formula One car from the in-car footage. Suddenly, where I used to think "Oh, the 99th percentile on this service is a couple of hundred milliseconds, that's fine", now I'm actually seeing bimodalities where "No, there's a group of requests that are 30 milliseconds and a group of requests that are 300 milliseconds." And I never knew that about these services before. I think it's fascinating.

**Richi Hartmann:** Maybe we can link to the blog post about this in the show notes... Because it's kind of hard to -- or maybe we can put this in the video if there's a video version of this... But it's kind of hard to describe this through audio, but it's really brutally obvious if you just see an old and a new image. As you say, the fidelity of the data is so much higher. And humans are visual animals. The highest performance data path we have to our brains is the visual cortex. So...

**Tom Wilkie:** And for the listener, I was manically waving my hands around when I was explaining how excited I was about this, but obviously, I realized this is not a video recording, so you have no idea what I'm doing there...

**Julius Volz:** \[00:38:09.03\] The only thing I would add is that not only is it easier to configure - you basically don't have to configure it much at all anymore - and higher resolution, it's also way, way cheaper at the same time. It's way more efficient than the old histograms, because the whole histogram sample can be stored in one time series, versus many; like, one for each bucket. And then also the encoding of that sample is a very efficient binary... It's all very well managed and very efficient, and so it's just better on every dimension. It's amazing.

**Tom Wilkie:** But I mean, also the usability of it. We talked about the query language, PromQL, and the way you interact with these histograms... You don't need to know the ins and outs of what the le label means; you don't need to make sure that when you aggregate them, you propagate the le label... And I've already lost most of the listeners by saying aggregate and propagate the le label. You don't have to do that anymore. It just works.

**Julius Volz:** Yeah. It just works. The PromQL looks almost exactly as before with traditional histograms, except simpler. You just have to remove some suffixes and le labels and stuff, and it still works, and it just looks simpler than before... And it works better. It's amazing.
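To make the before/after concrete, here's roughly what that simplification looks like for a hypothetical http_request_duration_seconds metric (the name is illustrative):

```promql
# Classic histograms: query the _bucket series and preserve the le label
# through aggregation, or histogram_quantile() silently breaks:
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Native histograms: no _bucket suffix, no le label to manage:
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds[5m])))
```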

**Mat Ryer:** Yeah, that sounds like too good to be true. What's the process like for coming up with that? Is this just like someone makes a proposal from the community, or is this a team working on this?

**Julius Volz:** So we have a person who's very special, called Björn Rabenstein... And he comes up with very well thought through things every time he comes up with anything. \[laughs\] And he researches everything very deeply, and he thinks through every potential pitfall and avenue, and so on... He was the main driver initially behind coming up with this new kind of exponential sparse bucket histogram, and then also how to make it work with a pull-based model where you cannot reset the histogram anytime a Prometheus server just pulls from a server... Because that's like a GET request, it should be idempotent. And yeah, he was the main architect, and then other people joined... And yeah, he wrote a large proposal and lots of other documentation around it, and gave talks, and so on.

**Tom Wilkie:** It's been a multi-year project of Björn's, right? He's been working on it probably for the best part of five years. And, you know, on one level, sparse bucket histograms - even exponential sparse bucket histograms - are a well-known concept. But the nuance and detail of how it's implemented in Prometheus, and the end-to-end thinking of getting it working in the client libraries, in the exposition format, in the storage layer, in the query language - that's where Björn has really excelled. That's where the team has done a really good job, and why it's taken them a fair amount of time to work out all of those details.

So you mentioned some changes in the query language to support histograms there, and having to tweak some of the queries, and that's one of the kind of - let's call them breaking changes in Prometheus 3.0. What other breaking changes should people be aware of, and what else is kind of maybe gonna catch you out if you just blindly swap your 2.0 binary for a 3.0? Anyone? I've got one...

**Julius Volz:** Go ahead. Because there were so few that were actually relevant, that I forgot about all of them, and I will have to look them up.

**Tom Wilkie:** I will say, there were very few. It was the change in the behavior of the sliding window inclusion...

**Julius Volz:** Yup.

**Tom Wilkie:** That one broke a load of our alerts in production.

**Julius Volz:** Where nobody thought about it beforehand, but then people were using it in a very specific way, that you didn't anticipate. Yup.

**Tom Wilkie:** \[00:41:46.16\] This was one of those cases where, 100%, it wasn't really intentional that it was like this, and everyone just relied on the semantics of the previous behavior... And I don't think when we made this change we were expecting it to be so impactful. But we literally had to develop tools to analyze all our customers' alerts and their data, and then be able to proactively tell them "Oh, by the way, this alert is going to change. This is the upgrade date, and this is how we need to fix the alert for you." We had to go and do all of that for our customers, because it actually had this kind of unintended side effect. But it was worth it. The new behavior is much better.

**Julius Volz:** Yup, exactly. And I think other than that, there were no big things. If you go to prometheus.io and you search for migration, there's a migration guide to Prometheus 3, and it lists a bunch of flags and configuration settings that, if you're using them, you have to change or remove, and so on... But they're very minor things, and many had been deprecated for a while already. So yeah, just go through those if you have not upgraded yet, but don't worry about big things breaking, basically.
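For listeners wondering what the sliding window change actually was: in 3.0, range selectors became left-open intervals, so a sample falling exactly on the lower boundary of the window is no longer included. A sketch of the effect:

```promql
# Query evaluated at t=120s, with samples at t=0s, 60s, 120s:
rate(http_requests_total[1m])

# Prometheus 2.x: the window was [60s, 120s] - the sample at exactly
# t=60s was included.
# Prometheus 3.0: the window is (60s, 120s] - that boundary sample is
# excluded, which can change computed rates and hence alert behavior.
```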

**Tom Wilkie:** So tell me what's coming. When can we expect Prometheus 4.0? Is it going to be another seven years, Julius?

**Julius Volz:** I have no idea. Does anyone know? I don't think we know yet... I think we want to be pragmatic about it, we want to strike a balance between bumping the major version every year or so and waiting for another seven years. I think that was maybe a bit too long... But whenever there are enough features kind of accumulated that would necessitate breaking something, we will just get together and discuss and say "Oh yeah, it's worth cutting a 4.0." That's how I would imagine this would go, but there's no concrete plan around that yet.

**Tom Wilkie:** I know one of the things the team's working on a lot is a new governance structure. What's the ideas there, Richi? What are we trying to do with Prometheus governance?

**Richi Hartmann:** So that's one of the things where we learned from OpenTelemetry. OpenTelemetry has a very, very open and inclusive governance, where the threshold of contribution to become a member of standing - or a voting member, in the terminology of OpenTelemetry - is very low. And this leads to an effect where you can much more easily justify to your manager or your spouse investing time in this thing, because you have a stamp of official approval. And obviously, you also feel better about yourself if you have some standing... So the likelihood of those people staying around, and then doing more over time and contributing more, is higher. Prometheus, by contrast, has had a pretty high bar. We've already lowered it quite substantially - basically, anyone who is a maintainer of anything, even something not that major, is already a Prometheus team member... But that is the main design goal behind the change of the governance.

There are bits and pieces which are also -- but not super-relevant for you. The main thing is we want to really broaden the scope of who can call themselves a member of Prometheus, to just widen the contributor base massively.

**Tom Wilkie:** Yeah, I think even I qualify to be a member of the OpenTelemetry community, and I don't think I've contributed for years. But yeah. I'm really looking forward to expanding the Prometheus community even further, and making it even more inclusive. I think that should be a really big desire. I don't know whether we'll call that a 4.0 feature, though... Does that translate into a major release, maybe?

**Julius Volz:** No, no, no. We can do that without -- we should not have to wait until 4.0 for that.

**Tom Wilkie:** No, of course. Yeah. What else? What other ideas do you have for future work on Prometheus?

**Julius Volz:** So obviously, a lot of the things that we have just talked about are still in the early stages... UTF-8 support, for example, does not exist yet in every ecosystem component and client library, and in other systems that deal with Prometheus-like data. So that has to percolate through the entire ecosystem and be adopted everywhere.

We also want to add better OpenTelemetry compatibility and features, to make monitoring with OpenTelemetry work better.

\[00:46:17.04\] For example, which OpenTelemetry resource attributes to map into Prometheus target labels - which attributes should be automatically attached to every ingested metric from an OTel resource, basically. There's already some configurability around that, but still a lot of discussion... And yeah, how to basically graft together those two different monitoring models in a way that just feels more natural and does what people would expect it to do. Native histograms still need to be stabilized... And yeah, other than that, can other people think of big new things coming?
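As one example of the existing configurability Julius mentions, Prometheus's configuration has an otlp section for promoting OTel resource attributes to labels on ingested series - the attribute list below is just an illustration:

```yaml
# prometheus.yml (sketch)
otlp:
  # Attach these OTel resource attributes as labels on every ingested metric:
  promote_resource_attributes:
    - service.name
    - service.namespace
    - deployment.environment
```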

**Richi Hartmann:** So the one thing which is also going to come - it harkens back to what Mat said earlier about the text format... Prometheus has always had a text exposition format, because that's how data was transmitted... Then, between CNCF and the Prometheus team, we wanted to create a standard for this, at the request of CNCF, as a separate project - OpenMetrics. For those who haven't heard the name, OpenMetrics has now been merged back into Prometheus, and there is a working group defining a 2.0, which is also working on honing more of the native histograms and other bits and pieces... It's actually not driven by myself anymore; it's Bartek, also a Prometheus team member, who now mainly drives this. And this is also one of the things which might trigger a 4.0, or might just flow with everything else, because we should be able to do it without any breaking changes.

**Julius Volz:** Yeah. Normally, a new format would just -- you know, Prometheus would send a header saying "I also support this new format now. If you can speak it, please speak it to me." Otherwise, the target can still send an old format. So that should not break anything.
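That negotiation happens via ordinary HTTP content negotiation on the scrape request; schematically (the exact header Prometheus sends varies by version and configuration):

```
GET /metrics HTTP/1.1
Accept: application/openmetrics-text;version=1.0.0;q=0.8,text/plain;version=0.0.4;q=0.5
```

A target that doesn't understand the preferred format simply ignores it and responds with the classic text format, which is why the upgrade is non-breaking.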

**Tom Wilkie:** And I think there's a small team of people from a bunch of different organizations working on a more cloudy storage format for Prometheus, right? Isn't there -- I forgot the name of the Apache data format that's columnar... What's it called?

**Julius Volz:** Was it Arrow?

**Mat Ryer:** Parquet?

**Richi Hartmann:** Parquet, I think that's how you would pronounce it.

**Tom Wilkie:** Yeah, I know there's a couple of people from Grafana Labs working with people from - is it Shopify, or...? I don't know exactly, but yeah, working on maybe a newer storage format using Parquet. It's super-early days, but it might be even more efficient and cost-effective and performant and scalable than all the \[unintelligible 00:49:10.07\] basically. Julius, what's your opinion on that effort?

**Julius Volz:** \[00:49:15.28\] I have no idea about that effort. I just googled that effort, and it seems to exist. \[laughs\] There's a GitHub repo for it. There's a Prometheus-community/parquet working group, or wg... And yeah, so there's people working on that. I have no idea how far along it is... Some thoughts I heard about this kind of effort a long time ago are that, of course, the challenge is you expect this to eventually work much better than what we currently have, but it's a lot of work to get over that initial hump... You know, initially it will not work as well, and then -- basically, it's about getting over that hump until it actually works well, and is worth it. So let's see how that goes. I have no insight into that.

**Mat Ryer:** So there is a conference for Prometheus, isn't there? ...if people want to actually attend in person, in real life.

**Richi Hartmann:** It turns out in-person is kind of nice.

**Mat Ryer:** Tell me about it, Richi. When's PromCon?

**Richi Hartmann:** So PromCon is going to happen in October this year. It's going to be back in Munich, so depending on who you ask, it's coming back home. Some of the Prometheus team live in Berlin, and... Yeah. No, I mean, snark between Prometheus team members aside... This October we will have PromCon in Munich again. The first big PromCon happened in Munich, so for a lot of people it tends to be the first major one.

**Mat Ryer:** I love Munich. Great film.

**Tom Wilkie:** Do you know what date that is, Richi?

**Richi Hartmann:** Yes, but let me actually look it up. I think it's 21st and 22nd of October. Yes, it's 21st and 22nd of October. And the 23rd is going to be the Prometheus Dev Summit, where we basically get together and talk about all the things which go into Prometheus 4.

**Mat Ryer:** Richi, your memory is really good. Would you like to come to Vegas with me?

**Richi Hartmann:** Um, not really. It's way too hot.

**Mat Ryer:** I appreciate the honesty. Imagine him just -- he just says no, if you invite him somewhere. It's nice. You know where you stand with Richi, don't you? \[laughs\]

**Tom Wilkie:** Normally slightly below him, because he's very, very tall.

**Mat Ryer:** Yes. But he's very gentle, I'll just say that as well. Hopefully that makes it into the edit...

**Tom Wilkie:** Hopefully the me going and getting a parcel doesn't.

**Mat Ryer:** Well, that's it. We should do a new regular section. We need a theme song for this...

**Tom Wilkie:** Oh, the reveal of what was in the parcel?

**Mat Ryer:** Yeah. What's in Tom's parcel?

**Tom Wilkie:** It's gym equipment... Gym clothes. I know. I go to the gym now. I'm one of those people. I do apologize. What happened is I got old, and it turns out you lose muscle when you get old, so...

**Julius Volz:** Yup.

**Mat Ryer:** Well, that sucks. Well, Prometheus isn't too old, and has got plenty of muscle left...

**Tom Wilkie:** That is the best segue ever. Thank you, Mat. \[laughs\]

**Mat Ryer:** Anything else we want to chat about before I do a wrap-up?

**Tom Wilkie:** No, I think we've had everything we wanted to. Thank you, both of you.

**Mat Ryer:** Okay. Well, unfortunately, that's all the time we have for today... Thank you so much to our guests, Richi, Julius, and of course, to my co-host. If you enjoyed this episode, why not share it with your team or your friends, if you've got friends?

**Tom Wilkie:** Or your parents.

**Mat Ryer:** I don't know about parents. It depends on your parents, honestly. Some parents are cool, aren't they? So they won't be interested... \[laughter\] But you might have many parents who might care about this. Well, we'll find out. And hopefully, you join us again next time, on Grafana's Big Tent.

© 2022 Grafana Labs