00:20.58 Matt Godbolt Hi, Ben. 00:21.36 Ben Rady Hi, Matt. 00:22.52 Matt Godbolt How are you doing? 00:23.88 Ben Rady I'm doing pretty well. 00:25.22 Matt Godbolt We've just had a moment of exasperation, haven't we? 00:27.96 Ben Rady Yeah, we're probably talking over the intro. Probably not at this point, but maybe earlier. 00:32.66 Matt Godbolt Maybe earlier. I don't know. Yeah, that... it'll... I'll fix it in post, as I always try to do, or increasingly less. In fact, I try. 00:43.74 Matt Godbolt I think I've kidded myself that the charm is all about like the fact that a dog's barking in the background and there are children screaming across the road and... we lip smack and other things. 00:58.20 Matt Godbolt It's kind of ASMR actually, isn't it? 00:59.95 Ben Rady Yeah. 01:00.06 Matt Godbolt No. Yeah. ASMR. So recently there's been a lot of talk about the Dutch manufacturing company that makes the lithographic systems for chip manufacture. And they're called ASML. 01:14.52 Matt Godbolt And I just can't help but get the two things mixed up. 01:17.14 Ben Rady Ah, every time. 01:17.20 Matt Godbolt They're very different things, but that's not what we came here to talk about today. Is it? 01:23.04 Ben Rady It's not, it's not. I have an idea. I haven't... I have a thing. 01:27.90 Matt Godbolt You have an idea. Okay. 01:30.80 Matt Godbolt Tell me about your thing. 01:31.82 Matt Godbolt Whoa. 01:32.18 Ben Rady Let's say, careful now. 01:34.32 Ben Rady Let's say for the sake of argument, I have a program. 01:41.28 Ben Rady Stop it, phone. 01:43.64 Matt Godbolt There we are. 01:43.70 Ben Rady I have, exactly. 01:43.92 Matt Godbolt Some more charm. and More charm has just entered the room. His phone, Ben's phone, unmuted phone. 01:49.78 Ben Rady I have a program and I want it to run fast. How would I do that? 01:56.60 Matt Godbolt Oh, wow. That is a great question. And we only have around about 30 minutes, 40 minutes to talk about it. So I've got a feeling. 
02:07.08 Ben Rady So immediately this is gonna be our first 10 part series. 02:10.14 Matt Godbolt Maybe so. I don't know. I don't know. This is one of those questions because I think obviously deliberately you have asked me a leading and vague question. Because what do you mean by, in fact, each of those words, right? 02:25.12 Ben Rady Can you please define each of the words that you just used in that sentence? 02:29.07 Matt Godbolt Yeah. 02:29.21 Matt Godbolt Right. 02:29.34 Ben Rady Yes, exactly. 02:29.74 Matt Godbolt My program, make my program fast. 02:32.72 Ben Rady Right. 02:33.34 Matt Godbolt All right. So I think make, we can, yeah, cause it to become my program. You have something that you wish to go fast. Right. So the only word we really need to define then is fast. 02:44.84 Matt Godbolt And I'm pretty sure you don't mean stopping eating for a period of time. 02:48.62 Ben Rady Right. Well, I think you could also define my program because in this hypothetical thing is I haven't written anything yet. 02:54.42 Matt Godbolt Oh, 02:54.70 Matt Godbolt oh 02:54.98 Ben Rady Right. 02:55.50 Ben Rady It's like, I want to write a program that doesn't exist, but I want the execution of that program to be fast for some definition of fast that we will define shortly. 03:07.76 Matt Godbolt okay. 03:07.86 Matt Godbolt Right. And so, yes, not no eating, not also unmoving, which is the weird other case of you know stuck fast, which is like one of those really weird auto-antonymic words, you know, like inflammable or something like that. 03:21.15 Ben Rady Right. Right. Uh-huh. Yes. Stuck fast. It's the opposite of this. Yeah. Uh-huh. 03:21.50 Ben Rady Right. 03:22.44 Matt Godbolt You mean quick, to go quickly. All right. So let's talk about what that could mean. Right. 03:26.40 Ben Rady Yes. 
03:26.78 Matt Godbolt So the instant you say that and because of what we've been doing professionally for a long time and for what I've been doing most of my career, I think of low latency as fast. 03:39.54 Matt Godbolt Like if you decide you need to do a thing, if your program needs to do a thing and it discovers that need, it should act upon it as quickly as possible having discovered that need. 03:50.32 Ben Rady Mm-hmm. 03:50.82 Matt Godbolt That is what I mean by sort of low latency. It reacts to a real-time thing as quickly as possible. And I think a lot of embedded systems work in this world. A lot of UI, in fact, is this, right? 04:01.38 Matt Godbolt If you said, hey, I want my, you know, responsiveness in a UI is about latency, right? 04:05.94 Ben Rady Right. 04:06.04 Matt Godbolt But that's one form of fast. 04:08.79 Ben Rady Mm-hmm. 04:09.94 Matt Godbolt But there's a very real, very important other kind of fast, which is I am... I'm writing a distributed message passing system. 04:21.20 Matt Godbolt You know I'm... I've Mastodon, right? I've... and I'm dealing with an enormous volume of information. 04:27.10 Ben Rady Yeah. 04:27.10 Matt Godbolt And if I was slow at doing that, eventually I would back up and I would never keep up with the current... Well, it doesn't matter. Individual messages may take a few seconds to go through, and that's not fast. 04:40.00 Matt Godbolt I mean, maybe that is fast for that scale of network, right? But you're talking about a throughput problem, you know, or bank clearing is another one. 04:46.28 Ben Rady Right. 04:46.32 Matt Godbolt They've kind of got a bit of latency. 04:47.34 Ben Rady Mm-hmm. 04:47.50 Matt Godbolt You beep your card when you're at the coffee shop and it goes beep and sort of you've got it waiting six or seven seconds and that's a completely fine human amount of time. You know, it's an eternity for computers. 04:58.16 Matt Godbolt But on the other hand, there are... 
several hundred million people in that same five second period tapping their card on their reader in the same time. 05:07.73 Ben Rady Right, right. 05:07.92 Matt Godbolt And so there's a lot of things to go. So those things, so we've got latency and throughput. And are there any other aspects? 05:17.30 Ben Rady I think for the definition of this conversation, I think latency and throughput are two dimensions of fast which we should consider. 05:25.98 Matt Godbolt Okay. Okay. 05:27.44 Ben Rady I think you could maybe come up with some other ones, but again, we have 30 minutes. 05:32.18 Matt Godbolt Well, I mean, again, we can make this into more. but So so do you want which one do you want to start with? 05:37.11 Ben Rady That's true. 05:37.44 Ben Rady I think we should start with latency. 05:39.44 Matt Godbolt Okay. Okay. So then we get into this sort of amazing set of kind of questions that we get into when we're dealing with computers because how fast is fast now? Right, how quick is – like we just said, if I'm a human waiting at a sales kiosk, 06:00.02 Matt Godbolt If it takes me 30 seconds to pay for something, that is not fast. If it takes me three seconds, it's probably okay. 06:08.70 Ben Rady Mm-hmm. 06:09.36 Matt Godbolt If it takes 0.3 seconds, I won't notice that that anything happened, really. 06:13.70 Ben Rady Yeah. 06:13.78 Matt Godbolt There's a beep and I'm out of there. And any other order of magnitude less than that, like 0.03 seconds or whatever, is immaterial to me at that point, and and it doesn't matter. 06:22.47 Ben Rady Right. 06:22.55 Ben Rady Mm-hmm. 06:22.64 Matt Godbolt But you and I work in a world where at least some of the things that we have done or are adjacent to, we are talking about maybe seven or eight orders of magnitude faster than those three seconds. 06:35.28 Ben Rady Mm-hmm. 06:35.60 Matt Godbolt We're talking... Hundreds of nanoseconds, hundreds of microseconds. 06:39.80 Ben Rady Yeah. 
06:39.94 Matt Godbolt And I have to remind myself that a nanosecond is a billionth of a second, right? 06:44.73 Ben Rady Yeah. Yeah. 06:44.82 Matt Godbolt You know, sometimes like we throw these words around. I mean, maybe this is just, again, you and I and the weird world that we live in. But like nanoseconds are kind of a completely normal measurement for a, you know, three gigahertz computer. 06:56.52 Ben Rady Right. 06:56.80 Matt Godbolt A few clock cycles is a few nanoseconds. That's completely normal. Yeah, that's reasonable. And, you know, like to, you know, good old Grace Hopper's, you know, light foot measurement, that a nanosecond is. 07:07.56 Ben Rady Yeah. Yeah. Uh-huh. 07:07.56 Matt Godbolt Here's a foot of copper wire. That's a nanosecond. You're like, oh, shucks, man. You know, that's the speed of light. It's the fastest thing that there is can only go a couple of feet in the time it takes me to add two 64-bit numbers together. 07:20.88 Matt Godbolt That is insane, frankly, that we're at that stage. 07:24.38 Ben Rady It is insane. You know, whenever I think about that, I just think about Grace's wire just twisted around on the inside of the CPU and all the places where it would need to go in order for the light to move in that way to add those two numbers. 07:36.62 Ben Rady And it just makes my brain melt. 07:37.98 Matt Godbolt It's, yeah, it is quite something. And the fact that, yeah, we've taken these Factorio factories and we've crushed them down into the size of my you know thumbnail and somehow snaking around in there are many, many, many, many hundreds of feet of cable. 07:56.37 Ben Rady Yeah. Yeah. 07:57.46 Matt Godbolt As you say, each individual thing. Electron is – well, they don't go all the way down. It's complicated, right? 08:05.20 Matt Godbolt Let's stop pretending that we can talk about that as well. 08:07.09 Ben Rady Yeah. 08:07.44 Matt Godbolt So, yeah, we've got many orders of magnitude here. 
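As a rough check on the figures in this exchange, the arithmetic can be sketched in a few lines. The 3 GHz clock is the one mentioned in the conversation. The speed-of-light constant is the vacuum value, and signals in real wire travel somewhat slower, so the distance is best read as an upper bound. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the nanosecond figures discussed above.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # vacuum value; signals in wire are slower
METRES_PER_FOOT = 0.3048


def cycle_time_ns(clock_hz: float) -> float:
    """Duration of one clock cycle, in nanoseconds."""
    return 1e9 / clock_hz


def light_distance_feet(nanoseconds: float) -> float:
    """Distance light covers in a vacuum in the given time, in feet."""
    return SPEED_OF_LIGHT_M_PER_S * nanoseconds * 1e-9 / METRES_PER_FOOT


if __name__ == "__main__":
    print(f"one cycle at 3 GHz: {cycle_time_ns(3e9):.3f} ns")
    print(f"light in 1 ns: {light_distance_feet(1.0):.2f} ft")
```

At 3 GHz a cycle is a third of a nanosecond, and light covers just under a foot per nanosecond, which is where the foot-of-wire picture comes from.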
So, you know, we could break it down into things that need to be measured in nanoseconds, in microseconds, in milliseconds, and seconds. And, you know, that's, I think, probably – after that, I think, you know, the latency of something – maybe, I mean, maybe you still can measure it in minutes, right? I mean, let's think about like you know publishing a video on YouTube. 08:28.36 Matt Godbolt I've just updated my video on YouTube and I click the button to publish it and it takes how many minutes to transcode it to all the different formats that YouTube does before it becomes visible. Maybe that is measured in minutes and maybe I do care about that latency. 08:40.75 Ben Rady Mm-hmm. 08:41.35 Matt Godbolt But I think the thing that I can confidently talk about, and maybe you as well, is the sub-second orders of magnitude latency. 08:52.74 Ben Rady Yeah, yeah, yeah. 08:52.94 Matt Godbolt And so let's talk about what can you do in that amount of time? 08:55.80 Ben Rady Right, right. 08:57.96 Matt Godbolt What, you know, that's a good starting point of like, only, you know, if you say I would like my program to be fast without even knowing what your program is, I've got some bounds on what you could reasonably expect to do in hundreds of nanoseconds, hundreds of microseconds, hundreds of milliseconds, right? 09:14.40 Ben Rady Mm-hmm. 09:14.53 Ben Rady Right, right, right, right, yeah. 09:15.22 Matt Godbolt On commodity hardware, if we're just talking about things like a regular PC or a regular, you know, so a decent server in a data center somewhere or a laptop or something like that, they're all within spitting distance of each other. 09:26.15 Ben Rady Right. 09:26.32 Ben Rady Yeah. So if we think of latency as sort of like the time it between an input is provided and an output is produced, 09:33.46 Matt Godbolt Yeah, sounds good. 
09:33.90 Ben Rady then automatically down in the like tens of nanoseconds range, you're sort of like, what input could you possibly consume and what output could you possibly produce in that span of time, right? Because it's got to go into the computer and then come out. So like maybe a theoretical example here would be like a robotics application where you have some like active control device of a system or, you know, a set of servos or something like that where you're trying to like, you know, imagine like the, you know, like the robotic dogs type thing where you're just like, you know, taking like some information from like the sensor on a robot and then feeding it back and using that to adjust. Like that's already going to be up probably in the like low milliseconds range, probably. 10:24.06 Ben Rady I mean, I don't know. I'm not an expert in robotics, but 10:26.74 Matt Godbolt I'm not sure either. No, but I mean, you think about things that do like autobalance, or you know, drones, like that still can't be milliseconds because it's already upside down by the time a millisecond of wind has caught it. 10:37.22 Ben Rady Yeah, yeah. 10:37.22 Ben Rady Yeah, yeah, yeah, yeah, yeah. 10:37.22 Matt Godbolt Right. So they must be a little faster than that. But, you know, maybe we're talking microseconds at that level. It seems not totally unreasonable to have some sensor update and put something on a bus internal to a system on a chip kind of thing that can then go, oh, wait a second, I need to speed up this rotor a little bit more to account for that. 10:54.88 Ben Rady Mm-hmm, mm-hmm. 10:55.14 Matt Godbolt Right. But you're absolutely right. 10:55.86 Ben Rady Yeah. Cause a lot of those times those outputs can be sort of like targets, right? So they don't, it's not necessarily that it needs to go like all the way out to like a servo or something. 
It's just kind of like, okay, I've gotten this input and now I need to adjust, you know, my targeted angle here. Right. And there's maybe even some other external system that is like actually making the adjustments to the physical objects, but like you might want that target to update on the order of, 11:21.52 Ben Rady you know, tens of mics or hundreds of mics. 11:24.02 Matt Godbolt Right. 11:24.82 Ben Rady Just because when you do, you get sort of like a smoother, better prediction of like where the arm needs to be or something like that. 11:31.42 Matt Godbolt Right, right, right, right. And yeah, so I mean, let's to sort of break it down to consumer hardware. 11:35.59 Ben Rady Yeah. 11:35.60 Matt Godbolt It is difficult to get a message in and out of a computer in, you know, I mean, like you just do a ping, right? You ping a computer on your local network, and it's less than a millisecond, but it's, you know, still hundreds and hundreds of microseconds probably to go through commodity hardware. 11:52.95 Ben Rady Yeah. 11:53.28 Matt Godbolt And, you know, for the kind of weird stuff that you and I have been involved with, there is some very specialist hardware that happens, which we can't really talk about but on the far end into the nanosecond range of things. So I suppose if we're talking about 12:03.54 Ben Rady Mm-hmm. 12:03.54 Matt Godbolt Things that most folks could reasonably get, put their fingers on a keyboard and program, be it a little embedded system like an Arduino or an ARM-y type thing, or indeed a, you know, a program like I'm typing on a computer and I'm expecting, when having pressed a key press to see the key appear in the terminal, that kind of thing. 12:21.80 Ben Rady Mm-hmm. 12:22.77 Matt Godbolt we we've We've kind of picked now our range. Now we're talking probably hundreds of mics to low millis. 12:31.42 Ben Rady Mm-hmm. Mm-hmm. So I've got a program. 12:33.50 Matt Godbolt So, Yeah, so we're down. 12:34.22 Ben Rady It's going to... 
12:34.32 Matt Godbolt Yeah, that's right. I forgot where we were now. yeah 12:36.62 Ben Rady It's gonna read some input delivered from outside the computer, and this is a commodity computer, you know, like, and then it's gonna produce some output that is also going to leave the computer, right? 12:51.38 Matt Godbolt Yep. Yeah. 12:52.38 Ben Rady And we're thinking that this is gonna have to occur, it's gonna take at least hundreds of mics for that to occur. Is that a fair statement? 12:59.36 Matt Godbolt I think so. Without without getting... 12:59.78 Ben Rady So that's like the upper bound of how fast we can make it for my definition of fast. 13:03.34 Matt Godbolt I think so. I mean, you know, there are exotic techniques and things that you can even use on commodity hardware that is a little bit more exotic, like DPDK, which is a sort of network bypass thing that you can fiddle with. 13:16.16 Matt Godbolt And if you put a second network card in your computer, you can go faster. But yeah, I think let's say, yeah, already, whatever you're doing, if you're responding to an external network event, you're doing any kind of reaction to it and sending it back out again, we're in the... 13:30.66 Ben Rady Yeah. 13:30.66 Matt Godbolt Microsecond region but probably hundreds of microsecond region would be my instinct and maybe I'm wrong, maybe I've forgotten how, you know, computers are these days. But, or at least like normal computers, right. In normal computers, I mean actually I'm... I was telling you before we started recording I'm surrounded with networking gear and computers and things and AI agents running things for me as I'm setting up. I'm actually upgrading my home network for up to like two and a half gig because that's actually something that is not totally unaffordable for the home these days now and so I'm like, hey, maybe my ideas about what commodity hardware is is out of date but... 14:07.50 Ben Rady Yeah. 14:07.73 Ben Rady Yeah. 
14:07.85 Ben Rady No one knows how computers actually work. 14:08.75 Ben Rady Yeah. 14:09.00 Ben Rady Yeah. So my first question would be when I'm writing my program, how do I make sure, because my assumption here is that the input here is unpredictable. It's not like you're gonna start the program and immediately be provided the input. 14:22.66 Ben Rady The program is running, 14:24.33 Matt Godbolt Mm-hmm. Yeah. 14:25.28 Ben Rady There's some thing outside of the computer that creates this input, and then the program has to react and create an output, right? So my first question to you is how do I make sure that there's the minimum possible latency between the generation of that signal and the code in my program running? 14:42.38 Matt Godbolt Yeah. 14:42.38 Ben Rady Something, something epoll, right? 14:45.80 Matt Godbolt Well, that's... No, exactly right. No, well, yes and no. I mean, that goes into the DPDK kind of sort of thing I alluded to. So yeah, you know, a traditional server program that is maybe blocked on a socket. 14:59.18 Matt Godbolt So you've got a socket, you've got, you know, let's say UDP just to make things maybe easier. I don't know, right? You've got a blocking call to receive, which means that your process has gone to sleep 15:11.80 Ben Rady Mm-hmm. 15:12.24 Matt Godbolt told the operating system, hey, anything interesting coming in on this port, this UDP port, I'd like to know, please. And then I've gone to sleep and I've been descheduled. And now I've just been chucked in the list of all the things that are currently, all the processes that are currently asleep and have nothing to do. 15:30.60 Matt Godbolt So that's where we are now. Packet comes in, goes through the standard kernel process of saying, well, who's this for? Oh, it's for port 9007. That is registered to this process. So we're going to write it into a buffer now rather than discard it, or rather we keep it probably around in a buffer that exists. 
First of all, is there space in that buffer? Yes, there is. Okay, this is something you can configure for each socket, how much space that you can have, which gets complicated if you're sharing multiple processes listening on the same. 16:01.14 Matt Godbolt It's, yeah. Anyway, and then we go, well, who cares about this, right? We've put it into the buffer now. We've copied the data. Who cares? Oh, there's a process. Okay, well, this process is waiting on this, so we should mark it as ready. This process is now ready to be scheduled. And then... 16:17.82 Matt Godbolt You're done. And then as part of the whole, you're done process, the kernel goes back to, well, I guess I'm now finished, but back to the user process. Now who, who should be run? 16:30.30 Matt Godbolt And maybe there was something else that was going on on that CPU when the packet came in and that other process is actually still higher priority and whatever queuing fairness there is, in which case it will merrily do whatever it was doing before until it runs out of the time slice that it had allocated it and then it goes, oh, someone else's go now, oh look this thing's ready. So but what I'm sort of getting at here is there is a lot of things that happen between that packet arriving in your, even in your kernel, and your process being woken up to do anything about it, and that's a hugely variable amount of work too. 17:01.85 Ben Rady Mm-hmm. Mm-hmm. 17:02.70 Matt Godbolt And you can definitely reduce the amount of work. So the amount of other things that the CPU or the kernel is trying to do on a CPU so that your process is more likely to be woken up. There are schedulers that let you set these things or priorities and stuff, but, 17:22.62 Matt Godbolt The way that I intuitively do these type of things is using some kind of, yeah, like we said, DPDK, which is this sort of networking layer that lets you use shared memory with the network card. 
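The blocking path Matt walks through here (bind a UDP port, sleep in a receive call, get woken by the kernel once a packet lands in the socket buffer) can be sketched with Python's standard `socket` module. This is a minimal illustration, not how any particular server does it. Port 9007 is the one mentioned in the conversation; the helper names and the requested buffer size are assumptions, and the operating system may round or cap the buffer size it actually grants.

```python
import socket


def open_udp_receiver(port: int, rcvbuf_bytes: int = 1 << 18) -> socket.socket:
    """Bind a UDP socket on localhost and request a receive buffer size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Per-socket buffer the kernel copies incoming packets into.
    # The value is a request; the OS may adjust it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind(("127.0.0.1", port))
    return sock


def wait_for_packet(sock: socket.socket, max_bytes: int = 2048) -> bytes:
    """Blocking receive: the process sleeps until the kernel has a packet
    for this socket and the scheduler runs the process again."""
    data, _sender = sock.recvfrom(max_bytes)
    return data


# Example use (blocks until a datagram arrives on port 9007):
#   receiver = open_udp_receiver(9007)
#   payload = wait_for_packet(receiver)
```

Everything between the packet arriving and `recvfrom` returning is kernel and scheduler work that this code cannot see or control, which is the variable cost being described.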
17:37.62 Matt Godbolt And then instead of the kernel being involved at all, you just keep continuously reading a place of memory that the network card will write directly to and say, hey, I've got a packet for you. And then you just pick it up. 17:47.54 Matt Godbolt And then essentially your latency goes down to how fast a PCIe bus transaction can notify that a piece of memory has changed and you pick it up and off you go. 17:57.95 Ben Rady Mm-hmm. 17:58.08 Matt Godbolt Right. But that is still fairly esoteric. So it seems like that's not a fair thing. But I guess it's the difference between... right, if the program part that we've not even started talking about now is itself going to take tens of milliseconds because of the innate amount of work that it has to do. 18:19.18 Matt Godbolt And so the upper bound is dominated by its processing time. Then maybe the effort and all the work to put in shaving tens of microseconds off on the like the interrupt handling time isn't worth it, right? 18:32.55 Ben Rady Mm-hmm. Mm-hmm. 18:32.66 Matt Godbolt It's a lot of engineering effort. It's a lot of complexity. It's a lot of non-standard nonsense. But if we wanted to make something go as fast as possible, those are the kind of things that I would take to doing is to take the kernel out of the equation and burn a CPU core just spinning, doing nothing other than saying, is there anything to do? How about now? How about now? How about now? In a tight loop and potentially reschedule everything else so they can't run on the same CPU as me. There are ways that you can do this with CPU sets with isolation. 19:06.92 Matt Godbolt Trickery, isolcpus in a kernel command. There's a lot of weird things you can do. Anyway, right. And then you can get it down to, in the limit, single digit microseconds between a packet arriving and you going, okay, I have work to do. 19:22.30 Ben Rady Okay. So if we assume that the work that needs to be done is relatively simple, right? 
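Real kernel bypass such as DPDK needs a supported network card and its own libraries, so it is not shown here. As a much weaker stand-in that only illustrates the spinning pattern ("is there anything to do? How about now?"), the sketch below polls a non-blocking socket in a tight loop and can optionally pin the process to one core. Each poll is still a system call, so this does not take the kernel out of the equation. `os.sched_setaffinity` is available on Linux only, and pinning alone does not stop other processes from being scheduled on that core, which is what `isolcpus` is for. The function names and timeout are hypothetical.

```python
import os
import socket
import time


def pin_to_core(core: int) -> bool:
    """Try to pin this process to a single core. Returns False where the
    call is unavailable (non-Linux) or the core cannot be used."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, {core})
        return True
    except OSError:
        return False


def spin_for_packet(sock: socket.socket, max_bytes: int = 2048,
                    give_up_after_s: float = 1.0):
    """Burn CPU polling a non-blocking socket instead of sleeping.
    Returns the payload, or None if nothing arrives before the deadline."""
    sock.setblocking(False)
    deadline = time.monotonic() + give_up_after_s
    while time.monotonic() < deadline:
        try:
            data, _sender = sock.recvfrom(max_bytes)
            return data
        except BlockingIOError:
            continue  # nothing yet; ask again immediately
    return None
```

The trade is the one described in the conversation: one core is fully occupied doing nothing useful most of the time, in exchange for never waiting to be woken up.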
You're going to do some adds, you're going to do some multiplies, you're going to try to structure your code so that you're really only doing adds and multiplies, and then you're going to produce some output that is purely a function of that input and whatever state you may have held in memory. 19:46.02 Matt Godbolt All right. So what you're talking about, what we've got is an ALU as a service. You know, we are, have you seen this? There's a guy who's written a microservices based CPU emulator where every single instruction is handled by a different microservice, each of which is written in a different language and the thing runs and it's so... but yeah, isn't it cool? 20:05.10 Ben Rady Oh my God, that's amazing. Man, 20:07.52 Matt Godbolt I love that people think of these things. 20:10.68 Ben Rady you think you were in microservice hell. You should. 20:13.76 Matt Godbolt Yeah, you've got like literally 256 different things and a RAM service, you know, memory service so that everyone can agree on this. 20:20.81 Ben Rady Oh, that's amazing. 20:21.18 Matt Godbolt It's cool. but So you're talking about something like that. Let's just say you know it is... 20:24.55 Ben Rady Yeah. 20:24.60 Matt Godbolt In fact, honestly, probably a decent... Unless you want to... I'm going to say these and you can tell me if where you're trying to steer it is right or not. But like something like a Redis memory cache is like one of these things. 20:36.59 Ben Rady Yeah. Right. 20:36.68 Matt Godbolt It's like something comes in, I look it up in RAM and I'm like, yes, I've got it. 20:39.44 Ben Rady Sure. 20:39.48 Matt Godbolt Here it is. It's like a small piece of data. You know, a key value store that is just, you know, like a cache, right? 20:46.39 Ben Rady Yeah. 20:47.12 Matt Godbolt You're not doing very much work. 20:48.74 Ben Rady Yeah. 20:49.10 Matt Godbolt It's not as as a banal as like I'm adding the three numbers you've given to me and giving them back to you. 
20:54.64 Ben Rady Right. 20:54.78 Matt Godbolt It is, that you can imagine there is actual value in that, but it's still, you know. 20:58.76 Ben Rady Yeah. I'm trying to draw a line about the the other parts of the computer that you're going to be talking to. And what I'm trying to say by this is that you can't ignore memory, right? 21:08.54 Matt Godbolt I see. 21:08.62 Ben Rady It's not simply a function of the data that you've read in, but you don't have to go beyond that, right? 21:13.57 Matt Godbolt Yep. 21:13.90 Matt Godbolt So, okay. So let's so let's use like ah an in-memory cache, like a yeah Redis is like without the disk backing type stuff. 21:19.72 Ben Rady Yeah. Right, right. Mm-hmm. 21:20.32 Matt Godbolt Yeah. Yeah, yeah. That's perfect. Okay. So yeah, you're looking, I mean, at that point, there's a, without going into the extremely fun and interesting rabbit hole of how you actually do the lookup, 21:34.34 Matt Godbolt and what data structures you use. But that is probably the key thing is that like, you must consider the data structures you use. But the interesting thing is that with a latency lens, 21:45.34 Matt Godbolt You're not looking at data structures in quite the same way as you do when you are doing like CS, whatever. I don't know what numbers that you use, but you know when you're doing your data structures courses and you're like looking at big O notation, like, you know, big O of N, whatever, N log N for your, you know, so like a not unreasonable data structure for you 22:09.82 Ben Rady Right. Yeah. Right, right. 22:11.70 Matt Godbolt a key value store is something like a balanced red-black tree, right? It's log N to look into it, whatever. 22:20.35 Ben Rady Okay. 22:20.95 Matt Godbolt Another one is a B tree or a B plus tree or one of those kinds of trees, you know, hash maps are other ones as well. And, you know, obviously hash maps do have in theory O of one. So you'd probably want to pick that out of big O notation or whatever. 
22:33.61 Ben Rady Right. 22:34.42 Matt Godbolt But the other ones seem reasonable as well. Like it's like, but, the dominating factor with something as straightforward as the thing you've just described, where we're literally just looking something up in the key value store, is going to be reading from RAM. 22:46.96 Matt Godbolt Now, if you're talking about something that's so small it fits into L1, L2, or L3, which are of the order of tens of Ks in the L1 case up to several megabytes at L3, and you know, bigger than that on bigger servers and whatnot, but like we're still talking like not all that much information. 23:07.27 Ben Rady Right. 23:07.68 Ben Rady Can you describe for our listeners, he said, covering for his own ignorance, what L1, L2, and L3 actually are? 23:14.28 Matt Godbolt Oh, I'm so sorry. So there are layers of caches. Caches are expensive both in terms of the amount of space they take up on a chip and how fast, you know, going back to our Grace Hopper, you know, the bigger something is, the longer it takes to get to the other side of it. 23:36.44 Matt Godbolt So the way that CPUs 23:39.01 Ben Rady Mm-hmm. 23:39.56 Matt Godbolt prevent us from having to read from the very slow RAM, which we'll talk about in a second, is that we have on-chip caches, and they are layered so that you have a very small, incredibly fast level one cache, which is usually... 23:53.32 Matt Godbolt not very large, tens of kilobytes kind of area. Then you have a layer two cache, which is maybe a meg, maybe hundreds of K, that kind of area. And then a layer three cache, which is maybe tens of megs, maybe even a little more than that. 24:09.56 Matt Godbolt And each of them is slightly further away, like literally physically. They're still on the actual thumb size chip inside the machine. They're further away, and it takes longer to look through them to find the data. 24:20.94 Matt Godbolt And so if you can fit something in L1, it's kind of almost free. 
24:26.59 Ben Rady Mm-hmm. 24:26.84 Matt Godbolt I'm going to put air quotes over this. We're talking somewhere in the region of, you know, five or six cycles to read from L1. And that includes all of the other things that are like the fixed cost of reading from memory includes doing the lookup for the logical address that you've decided to read from and then turning it into the actual physical address that it needs to go to the correct chip, you know, different processes think that address 2000 has different data in it, right? 24:54.46 Ben Rady Yeah, yeah, yeah. 24:54.86 Matt Godbolt And something has to do that translation. And that's not at this scale. When we're talking about, you know, a three gigahertz machine, a third of a nanosecond is a clock cycle. It's that that look up itself takes time. 25:06.38 Ben Rady Right, right. 25:06.64 Matt Godbolt And so, you know, we need to be doing something. So that's about the limit of it. And there's some other really complicated things that the poor memory system has to do because of the weird out-of-order engine. But we'll ignore that for now. So five or six cycles for like an L1. 25:19.38 Matt Godbolt Now I'm going to forget this off the top of my... should have brought this up beforehand, you know, but like tens of cycles, low tens of cycles for L2, maybe late tens, you know, 80, 90 cycles, maybe L3. No, I don't think it's that much now. 25:30.52 Matt Godbolt Someone's, so you know, order of magnitude-y things. It's, you know, still, probably only tens of nanoseconds but then you go out to RAM and it can be hundreds of nanoseconds if you actually need to go out to RAM. So those are the layers. The other thing that's interesting is that usually on chips the L1 and maybe the L2 are unique to the CPU core that you're running on so you just get a copy of your own and then that's yours and no one can do anything with it with the 25:57.53 Ben Rady Right. 25:57.59 Ben Rady Okay. 
25:57.64 Matt Godbolt massive asterisk and footnote for older CPUs, but we maybe won't go there. And then the L3 typically is shared amongst the the other cores that are physically on the same chip as you. 26:08.76 Matt Godbolt And so there are some contentions and there are obviously conversations that have to happen between the chips to say, hey, I'm reading that. Oh, I was just writing to it. Oh, okay. Well, we need to make sure that we do this in the right order. 26:17.23 Ben Rady Right. Mm-hmm. 26:17.30 Matt Godbolt So there's a little bit more complexity there. But yeah, so if we're still talking in single, you know measurable, human, countable numbers of cycles, keeping it inside the L1 or L2 is great. L3 is okay. And if we have to go to RAM, we have to be a bit careful about things. And at that point, two things are important to note about the way caches work. 26:38.80 Matt Godbolt One is that the sort of the whole... the whole theory behind a cache is that having read a byte of memory, it's really, really likely you're going to read the bytes that are around that byte of memory. 26:52.59 Ben Rady Mm-hmm. 26:53.38 Matt Godbolt And so the fundamental unit of work from the memory system on the CPU to the caches and beyond is a sequence of bytes, which is usually 64 or maybe 128 bytes long. And that's kind of like the atom that gets shipped back and forth. And some chips even do things where you say, well, you ask for this block of 64 bytes, I'll just get the next one and maybe the one after that while we're going, because we might as well having just done that. And so this sort of temporal usage pattern is incredibly important for latency because if you're following a linked list, as you may be if you just have a naive hash map, suddenly you're jumping randomly around in memory and the poor system can't make a guess. 27:32.89 Ben Rady Right. 
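To make the cache-line point concrete, this sketch counts how many distinct 64-byte lines a set of memory accesses would touch. The 64-byte line size is the common figure mentioned in the conversation. The addresses, the 8-byte key size, and the count of sixteen keys are invented purely for illustration; nothing here measures real hardware.

```python
CACHE_LINE_BYTES = 64  # common line size; some chips use 128


def lines_touched(addresses, access_size=8, line_bytes=CACHE_LINE_BYTES):
    """Number of distinct cache lines covered by the given accesses."""
    lines = set()
    for addr in addresses:
        first = addr // line_bytes
        last = (addr + access_size - 1) // line_bytes
        lines.update(range(first, last + 1))
    return len(lines)


# Sixteen 8-byte keys packed back to back from a line-aligned base address.
contiguous = [0x1000 + 8 * i for i in range(16)]

# Sixteen 8-byte keys living in separately allocated nodes
# (hypothetical addresses 4 KiB apart, as pointer chasing might produce).
scattered = [0x1000 + 4096 * i for i in range(16)]

if __name__ == "__main__":
    print("contiguous array:", lines_touched(contiguous), "lines")
    print("scattered nodes: ", lines_touched(scattered), "lines")
```

Under these assumptions the packed array needs two line fetches while the scattered nodes need sixteen, and the second of the packed lines is exactly the "next one" a prefetcher is likely to have fetched already.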
27:33.22 Matt Godbolt But if you just have a damn array of everything and just go, well, it's expensive, it's an order N operation for me to search through these things, but I'm going to run through them linearly, 27:43.92 Matt Godbolt suddenly it may be more worthwhile from a cache standpoint, especially if a cache miss is hundreds of cycles. Well, I can do a lot of work in a hundred cycles on an out-of-order execution machine that has 12 execution units, right? 27:58.04 Matt Godbolt So why not just do tons of work and hope that that's quicker than going to RAM? So anyway, popping the stack all the way back, to keep the latency as low as possible, you're going to have to think about this. 28:09.35 Ben Rady Yeah. Mm-hmm. 28:09.48 Matt Godbolt And so you might want to make some concessions where the distribution of your latencies needs to be taken into account. Now, if you're going to say like, well, most of the time, I think I'm going to be... this like 80% of the stuff we're reading is hot. Like it's going to be commonly used again, in which case I'm going to use a data structure, which is fine. 28:36.44 Matt Godbolt And that 80% fits inside. It doesn't matter what I'm doing, whether it's a balanced tree or whatever, it fits inside L3, let's say. 28:43.94 Ben Rady Right. 28:44.14 Matt Godbolt But the 20% that's outside of it is just outside. It's going to RAM. We're just going to have to take the hit every time. And it's incredibly expensive because we're using a balanced tree. So I guess now we turn around and say, like, it's not just how fast to go. It's like, what is the shape of your distribution? Is it better to have the average case fast and the bad case awful? 29:04.70 Ben Rady Right. 29:05.06 Matt Godbolt Or is it better to bound the bad case and say, well, actually, I'm going to use a linear search every single time.
And I'm going to say that even the fast one where the linear search, you know, it would have been faster just to check, I don't know, the first one or pack them in a different way. But, you know, I'm paying the cost to do 16 compares, 29:25.00 Ben Rady Yeah, yeah, yeah. 29:26.06 Matt Godbolt Every time and just go, well, that costs me a little bit more than one. 29:29.08 Ben Rady Right. 29:29.12 Matt Godbolt Of course it does, but it makes the worst case go away, but at the lowest. 29:36.03 Ben Rady Right. 29:36.16 Matt Godbolt So yeah, sorry. I've just gibbered around in a big, as I've sort of like realized the enormity of the question you've asked. Yeah. Yeah. 29:42.75 Ben Rady That was the intention of the question. But yes, whenever you're talking about latency, you need to not, it's not like, oh, it's 10 microseconds of latency. 29:51.11 Matt Godbolt Yeah. 29:51.24 Ben Rady It's no, it's 10 microseconds of latency in what case? What percentage of the time? You know, oftentimes you talk about like P95, P99, P99.999 numbers. 30:01.42 Matt Godbolt Yeah. 30:01.42 Ben Rady Of like what percentage of the responses are going to be less than some particular target time. And you may have multiple layers of that, right? 30:11.86 Ben Rady Like you may have a maximum threshold that you need and like a P99.99 and then like a P95 and then like a median, right? 30:22.78 Ben Rady And you might want to like structure things you know differently based on whatever those targets might be. So in our case, sort of this hypothetical, you know, robot dog type plugged into a PC perhaps that's receiving the signal. 30:40.60 Ben Rady Maybe what we'd want to say here as one fork of the tree to go down is actually the thing that I care about is the worst case latency, right? 30:49.84 Ben Rady I want the... because if the latency gets too high, the dog is going to fall over and that will be bad, right? 30:54.58 Matt Godbolt Right.
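To make the percentile targets Ben describes concrete, here is a minimal Python sketch that summarises a batch of latency samples as a median, P95, P99, and maximum. It uses the nearest-rank definition of a percentile, which is one common convention among several; the sample values are invented for illustration and do not come from any real system.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_report(samples_us):
    """Summarise latencies (in microseconds) at several layers."""
    return {
        "median": percentile(samples_us, 50),
        "p95": percentile(samples_us, 95),
        "p99": percentile(samples_us, 99),
        "max": max(samples_us),
    }


# Made-up data: 98 fast responses plus two slow outliers.
samples_us = [8.0] * 98 + [40.0, 900.0]
report = latency_report(samples_us)
print(report)
```

With this made-up data the median and P95 both look excellent while the maximum is two orders of magnitude worse, which is why a single "it's 10 microseconds" figure says little without the percentile attached.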
30:54.90 Matt Godbolt That's a very common thing for like, yeah, control boards and things. 30:59.11 Ben Rady Mm-hmm. 30:59.28 Matt Godbolt And you know this is why in those situations, typically embedded systems that have that kind of very strong reliability requirement will use not regular operating systems. 31:12.04 Matt Godbolt They're using you know real-time operating systems that can give you hard deadlines for things happening within a certain amount of time and/or do no preemption at all. 31:19.47 Ben Rady Mm-hmm. 31:19.56 Matt Godbolt So it's not like a case that your process will have its CPU time taken away from it. It's like, you know, it's your responsibility to keep running until you get to the end and then yield to the next guy so that you can... 31:30.30 Matt Godbolt But and this is a huge problem. As computers have gotten more complicated, it becomes harder and harder to reason about what an upper bound could be. 31:40.94 Matt Godbolt You know, picking up my 6502 in its box over here. 31:45.06 Ben Rady Right. It's a little easier. 31:45.68 Matt Godbolt You know, it's a little easier. You know, it's like I can tell you exactly how long any single thing is going to take all the time. 31:52.49 Ben Rady Yeah. 31:52.54 Matt Godbolt It's completely deterministic and it's well documented. 31:56.34 Ben Rady Right. 31:56.74 Matt Godbolt But we have to reverse engineer what the hell an Intel chip is doing in order to understand... 32:02.86 Ben Rady Essentially. 32:02.86 Matt Godbolt Sort of maybe what the worst case is in that particular situation, right? You know, hey, what happens if, you know, we are trying, we have an L3 cache miss at the same time as another CPU core, like a physically distant core has an L3 cache miss, and both go for it at the same time. 32:20.69 Ben Rady Mm-hmm. 32:20.88 Matt Godbolt How does that work?
Is there some really weird edge case where, you know, in the fabric, it times out after some amount of time, I don't know, and it's hard to find out, but if I need to give you an actual, like, someone will get run over if the brakes do not come on when I put my foot on the brake, then maybe I can't, I certainly can't think about the world the same way, but I certainly can't use certain, like, CPUs and designs and operating systems like that. 32:45.75 Ben Rady Mm-hmm. 32:45.81 Ben Rady Mm-hmm. 32:45.86 Matt Godbolt So that's, but, it's interesting. 32:46.60 Ben Rady Yeah. So if I come to you and I'm asking this question of how do I make this program fast? And you go through all of these follow-up questions and what it turns out that I'm asking for here is how do I make sure that the round trip latency of this program that I have running on a just regular commodity PC is not going to ever be higher than 10 microseconds? 33:13.38 Ben Rady The answer is you can't. 33:14.98 Matt Godbolt I think that's a fair... Yeah, that the nearest people I can think of that have to deal with that specific problem are people who make digital audio workstations for you know consumer PCs because they they have... 33:29.46 Matt Godbolt A genuine latency bound because if you're playing a keyboard and there's a one second delay, that's not, between you're pressing the button and hearing the noise change. 33:40.68 Ben Rady Yeah, right. 33:40.79 Ben Rady Yeah. Yeah. Yeah. 33:40.96 Matt Godbolt That is not good for anybody. 33:42.88 Ben Rady Yeah. 33:43.38 Matt Godbolt And so the only way that you can get that is to make everything as low latency as possible. But if you go too far and you don't have enough buffering between you and the set of audio that's coming out, then if you, for whatever reason, can't keep up, you're going to get a massive pop or a crack or a click as suddenly you have to catch up and just drop a whole bunch of samples that you were generating. 
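The buffering trade-off Matt describes for digital audio workstations can be put into numbers with a little arithmetic: a buffer of N frames at a given sample rate holds N divided by the rate seconds of audio, which is both the delay the buffer adds and the deadline for producing the next buffer. The sketch below assumes a 48 kHz sample rate and a handful of buffer sizes purely for illustration; neither figure is mentioned in the conversation.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Milliseconds of audio held by one buffer of `frames` sample frames.

    This is the delay the buffer adds, and also how long the software has
    to produce the next buffer before the output runs dry."""
    if frames <= 0 or sample_rate_hz <= 0:
        raise ValueError("frames and sample_rate_hz must be positive")
    return 1000.0 * frames / sample_rate_hz


SAMPLE_RATE_HZ = 48_000  # assumed rate, for illustration only

for frames in (64, 256, 1024):  # assumed buffer sizes
    ms = buffer_latency_ms(frames, SAMPLE_RATE_HZ)
    print(f"{frames:5d} frames -> {ms:6.2f} ms of latency and of deadline")
```

Smaller buffers make the keyboard feel more immediate, but they also shrink the window in which everything else on the machine has to get out of the way; missing that window is what produces the pop or click.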
34:06.40 Ben Rady Yeah, yeah, yeah. 34:06.58 Matt Godbolt So that's, I mean, I don't know how they do it, honestly. In fact, I do know people who have worked on them and like we should perhaps consider getting them on to talk about this kind of stuff because it's a fascinating problem. 34:20.36 Matt Godbolt But yeah, anyway, but back to your... 34:23.70 Ben Rady Yeah. So the result of this, you can't, so now it's like, okay, yeah, right. 34:29.56 Matt Godbolt Yeah, you can't, yeah. Or it's really, really, really hard. 34:31.22 Ben Rady It's like, you're putting constraints on this problem that you can't really solve for. For example, you'd probably be better off using some specialized hardware rather than trying to do this on a commodity machine. 34:40.98 Ben Rady If you truly have a constraint that, you know, of, of let's just say 10 microseconds or something like that. 34:47.38 Matt Godbolt This is to an extent, I mean, obviously, microcontrollers have been around longer than computers because they've been in things or that's the original design. 34:54.09 Ben Rady Mm-hmm. 34:54.14 Matt Godbolt But that's that is why the brake disc controller in my car is not controlled by the one PC that is like running the the AV display in the middle panel, right? 35:05.19 Ben Rady Right. 35:05.25 Ben Rady Yes. 35:05.30 Matt Godbolt It is like a very specialist piece of software running on a very specialist operating system, which probably isn't an operating system by anyone else's definition of operating system. 35:15.20 Ben Rady Right. 35:15.65 Ben Rady Right. Right. 35:15.76 Matt Godbolt It's just a library you link against and then sets up the interrupt vector and hands you a few nice abstractions, good luck. 35:22.62 Ben Rady Yeah. Yeah. 35:23.54 Matt Godbolt But that's why it's doing nothing but monitoring the disc because then finally you can potentially say, all right, we can upper bound this stuff. 
We do know what the maximum latency could be, and it is, and I can sit and work it out on paper because it's an ARM, this is an in-order ARM and it only has an L1 cache and even if it's whatever, you know, there are things that you can say about it and you can bound it that way. 35:47.84 Ben Rady Mm-hmm. Mm-hmm. 35:49.26 Ben Rady Okay. So I've got, we've got our inputs. Maybe we've had some discussions about our constraints here and we're like, all right, well, we can't do this to a maximum level of 10 microseconds. 36:00.98 Matt Godbolt Mm-hmm. 36:01.10 Ben Rady That's insane. We could maybe do something where our like P95's, maybe less than that, more than that. 36:07.80 Matt Godbolt Let's just say, I mean, yeah, you and I, we don't know. 36:11.37 Ben Rady Let's just something like that. Yeah. And then, 36:13.74 Matt Godbolt We're making this up as we go along as if it wasn't... 36:15.26 Ben Rady Yeah, and then and then our maximum is something much more reasonable like a second, right? 36:20.01 Matt Godbolt Okay, yeah. 36:20.76 Ben Rady It's like, all right, it's never going to take more than a second. If it does, then we have a bug and we should fix it, right? As opposed to, yeah, that's how the system works, right? 36:32.24 Ben Rady So now we've gotten our message from the outside world. We've gone and looked up the value that we've cached. Hopefully that's an L1 cache and hopefully we've done some smart things to make sure that it's there. But you know again, this is why we have these ranges of latency. 36:49.08 Ben Rady How do we write out the response? Do we have the same sort of operating system packet issue when we're trying to write this thing back out? 36:58.16 Matt Godbolt I mean, yes. If we're talking about standard, like, I mean, we keep changing what it is we're doing. Is it a dog that's falling over? Is it a UDP packet going into a... 37:05.06 Ben Rady Yeah, yeah, yeah. 37:05.28 Matt Godbolt Yeah. 
But, you know, like, if we're going to send a packet back out again, then yes, there's a similar dance on the way out through a traditional operating system where you write to a bit of memory that you can read and write in user space, and then you say, hey, send it, please. 37:19.10 Ben Rady Mm-hmm. 37:19.22 Matt Godbolt And and the kernel goes, well, I need to send it. I can't hand this bit of memory to... the network card because this bit of memory is just wherever you decided to put it. 37:30.32 Matt Godbolt So maybe on the stack, whatever. And certainly the DMA engine for your network card does not have carte blanche to read any piece of memory anywhere in the whole system for all the obvious reasons of like separation of concerns and to do with some of the ways that they work internally. 37:46.91 Ben Rady Mm-hmm. 37:47.20 Matt Godbolt So the kernel has to copy it somewhere else in a buffer where it knows it could hand the address of that to a network card. And then it returns, potentially returns back to you and says, yes, it's sent, even though it isn't. It's sat in a buffer. Time will pass. Eventually, the kernel will decide to schedule sending it. I mean, probably it will actually send it immediately, all things considered, unless the network card's already busy doing something else. You know, there's these... 38:10.10 Matt Godbolt And then the network card will be notified through some mechanism. And that's relatively expensive and requires a ping across the network. Sorry, I say a network. It is a network. The PCIe bus itself, or however your network card is plugged into your computer, is effectively yet another network. 38:26.66 Ben Rady Mm-hmm. 38:26.74 Matt Godbolt And there's another hop and there are... complicated things going on there. So the message is addressed to it. The network card goes, oh, you say you have a packet for me. How quaint, how lovely. Okay, I'll go and get the address from the ring buffer. 38:37.22 Matt Godbolt Here it is. 
Okay, and I'll start streaming it out. And so that process can be 38:41.00 Ben Rady Mm-hmm. 38:41.44 Matt Godbolt relatively latent too. And so again, you're less likely to suffer from the scheduling randomness that the kernel does, because it's not like something asynchronous happened and now the kernel has to pick you being amongst all the things that it's next to do. 39:00.27 Ben Rady Mm-hmm. 39:00.48 Matt Godbolt It's like, now you've given it a piece of work and it's very obvious what it should be doing with that work. 39:04.17 Ben Rady Uh-huh. 39:04.58 Matt Godbolt But there still is some redundant copying of data. There is still some messaging to an asynchronous process, which is the network card itself. 39:14.69 Ben Rady Again? 39:14.86 Matt Godbolt The network card can be like, yes, sure, fine. And then it could be busy doing something for all we know, right? You know, I don't know what what magic network cards are doing. Maybe it's, you know, a virtual network card that doesn't even exist as a physical thing. 39:25.24 Matt Godbolt And really it's pretending to be a network card, but in fact it is, in a data center on a shared piece of infrastructure. And what you're really doing is talking to the hypervisor, which then goes, oh, cool, a packet from one of my many virtual machines. 39:40.64 Ben Rady Hypervised. Yeah, right. 39:41.25 Matt Godbolt Yeah, hypervised children, right. 39:42.48 Ben Rady So I'm kind of, I was maybe teeing this up a little bit, but this is like somebody comes along and says, hey, Ben, you made this program to control this robot and you did it in a very silly way. 39:54.34 Matt Godbolt Yeah. 39:54.34 Ben Rady Why are you using a UDP network? That is not necessary. 39:58.06 Matt Godbolt Right. 39:58.36 Ben Rady Are there other devices that one could use, I mean, like a USB-C connection or something else that would be available to, you know, just a regular consumer, commercial computer, something. 
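The send path Matt walks through above (write into your own memory, ask the kernel to send, get control back once the kernel has taken a copy) can be seen from user space with Python's standard `socket` module. This is only a sketch: it sends over the loopback interface, so no physical network card or PCIe hop is involved, and the helper name `send_and_receive` is invented for this example.

```python
import socket


def send_and_receive(payload, timeout_s=1.0):
    """Send one UDP datagram to ourselves over loopback and read it back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout_s)
        address = receiver.getsockname()

        # sendto() hands the payload to the kernel, which copies it into its
        # own buffer. The return value is the number of bytes the kernel
        # accepted; it is not confirmation that anything reached the far end.
        accepted = sender.sendto(payload, address)

        data, _ = receiver.recvfrom(2048)
        return accepted, data


accepted, data = send_and_receive(b"sit")
print(accepted, data)
```

The call returning promptly is the "yes, it's sent, even though it isn't" moment from the conversation: anything that happens after the kernel takes its copy is invisible to the caller.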
40:12.96 Matt Godbolt Yeah, that's it. I don't know too much about the ins and outs of how the kernel interacts with the USB. I know that the underlying USB protocol and the thing you talk with, the hub or whoever you're talking to on the end of it is sort of a negotiated thing. You can ask for, you negotiate bandwidth, you negotiate... 40:31.89 Matt Godbolt I don't think you negotiate latency. I think you can negotiate isochronous, that is repeating things. So like if you've got a webcam plugged in, it can book, hey, I need to send this much data just all the time. 40:45.66 Matt Godbolt And it gets sort of time on the USB bus to be able to send packets over that data. But I don't know about, like, latency. And obviously things like you know USB keyboards and things and mice. And you know gamers obviously use USB mice and whatever, but we're still talking almost human level levels of latency there, you know millisecond, sub-millisecond, I'm sure. 41:04.24 Ben Rady Right, 41:04.28 Ben Rady Right, right. 41:04.36 Matt Godbolt So yeah. I don't really know. My best guess I would be able to have is not to say something like USB, because again, there are some layers of indirection between you and the actual device, even if you're talking the USB protocol through the kernel somehow. 41:21.20 Matt Godbolt But if you had like either, you know, GPIO on a, which is general purpose IO pins on one of the more microcontrollery type systems where you can literally just say, if I write to this memory address, it's not really a memory address. 41:36.33 Ben Rady Mm-hmm. 41:36.56 Matt Godbolt It is whatever data I put there is the highs and lows of these eight pins, right? 41:40.58 Ben Rady Yeah, yeah, yeah. Right. 41:40.72 Matt Godbolt And at that point, you're like, yeah, you're off to the races, right? 
Or, you know, say a serial interface, which although serial is incredibly slow and awful, it is also something where you're kind of directly attached to and you can say, like, I'm just driving it myself. Now, I'm sure that's no longer true. Yeah. 41:56.02 Matt Godbolt Now I say it out loud, I'm sure that it's virtualized through some mechanism in the kernel. And so it's no longer the case if you open /dev/tty0 that like if you get the OK, you are the only person talking to that now. 42:07.54 Ben Rady Mm-hmm. 42:08.06 Matt Godbolt And still, it would suffer from that. If you read from it, then you would block and the kernel would put you to sleep until a byte came in. 42:15.84 Ben Rady Right. 42:15.94 Matt Godbolt So even that doesn't make sense. So GPIO is the nearest thing we have to the sit in a tight loop and wait kind of approach that we were talking about before, you know, for both input and output, you could say like, hey, you know, I'm doing a quiz show, whoever presses the button first kind of thing. 42:34.06 Matt Godbolt And there are eight buttons and they go into the eight bits inside my controller. 42:38.56 Ben Rady Yeah, yeah, yeah, yeah. 42:38.56 Matt Godbolt And I just sit in a tight loop reading from it. Obviously there are better hardware ways of doing that, but I'm just sort of thinking out loud. Yeah. Yeah, I don't know. Did you have an idea? 42:49.16 Matt Godbolt You've been driving this in a direction, and I'm wondering if you have a solution in mind that I have not got to. 42:54.94 Ben Rady No, no, no, I mean 42:55.98 Matt Godbolt By the way, have I got the job? Or oh... 42:58.90 Matt Godbolt Is it too late? 43:02.48 Ben Rady Well, you passed to the next round, but I'm going to have to, yeah, I know. 43:05.68 Matt Godbolt Oh, okay. We'll have to see. We're now 40 minutes in, so yeah. 43:08.43 Ben Rady Do you have any questions for me? 43:09.62 Matt Godbolt Do I not have questions for you?
43:13.40 Matt Godbolt We've done far too many interviews, haven't we? 43:15.72 Ben Rady Yeah, yeah, yeah, yeah, yeah. 43:18.13 Matt Godbolt Yeah. 43:18.74 Ben Rady I mean, I think that, I mean, we made the full kind of round trip, and I think that's good. I think the other, and if there is a part two to this, I think the part two should be, okay, now what happens when it's the actual like calculation that is the part that needs to quote unquote go fast, right? 43:34.02 Matt Godbolt Oh, yeah, yeah, yeah, yeah. 43:34.50 Ben Rady Like the IO no longer dominates as it did in this example. And it's basically just this exercise. And how do you take this signal, make sure that you don't accidentally make it slow when it's being processed in the CPU and then get a thing back out. 43:47.52 Matt Godbolt That's an interesting one. Yeah, we should. 43:49.08 Ben Rady Now it's, okay, this is gonna take a material amount of time and we're trying to make it as fast as we can. 43:52.99 Matt Godbolt All right. 43:53.46 Matt Godbolt Yeah, which is probably what most folks think of when you say, can I make my code go fast, is the code part, not all the other stuff around the outside, but often that can dominate. Yeah, no, that would be an interesting one. 44:03.38 Ben Rady Right. 44:03.60 Matt Godbolt So yeah, okay, we'll... well and then we've got to talk about throughput at some point, and maybe that will lead into throughput, because, you know, I think there is part of it, but we'll see. 44:13.11 Ben Rady Yes. Yeah. 44:13.24 Ben Rady Mm-hmm. 44:13.30 Matt Godbolt Well, all right. 44:13.97 Ben Rady Maybe. 44:14.02 Matt Godbolt We rather mysteriously then, and now you and I have to look each other in the eye and go, we will remember that the next episode we record will be the continuation of this one. 44:23.12 Ben Rady Yeah. 44:23.20 Matt Godbolt And we won't just leave our poor audience hanging. 44:25.43 Ben Rady Yes. 44:25.80 Ben Rady Right.
44:26.10 Matt Godbolt So I apologize in advance if we, if we do in fact, but I think this has been very interesting. 44:31.89 Ben Rady Right. 44:32.24 Ben Rady Yeah, that's good. 44:33.52 Matt Godbolt Hopefully our listener agrees and we will, I will see you next time, my friend. 44:38.26 Ben Rady Next time. 44:45.67 Matt Godbolt And cut.