00:20.58
Matt Godbolt
Hi, Ben.

00:21.36
Ben Rady
Hi, Matt.

00:22.52
Matt Godbolt
How are you doing?

00:23.88
Ben Rady
I'm doing pretty well.

00:25.22
Matt Godbolt
We've just had a moment of exasperation, haven't we?

00:27.96
Ben Rady
Yeah, we're probably talking over the intro. Probably not at this point, but maybe earlier.

00:32.66
Matt Godbolt
Maybe earlier. I don't know. Yeah, that... it'll... I'll fix it in post, as I always try to do, or increasingly less. In fact, I try.

00:43.74
Matt Godbolt
I think I've kidded myself that the charm is all about like the fact that a dog's barking in the background and there are children screaming across the road and... we lip smack and other things.

00:58.20
Matt Godbolt
It's kind of ASMR actually, isn't it?

00:59.95
Ben Rady
Yeah.

01:00.06
Matt Godbolt
No. Yeah. ASMR. So recently there's been a lot of talk about the Dutch manufacturing company that makes the lithographic systems for chip manufacture. And they're called ASML.

01:14.52
Matt Godbolt
And I just can't help but get the two things mixed up.

01:17.14
Ben Rady
Ah, every time.

01:17.20
Matt Godbolt
They're very different things, but that's not what we came here to talk about today. Is it?

01:23.04
Ben Rady
It's not, it's not. I have an idea. I haven't... I have a thing.

01:27.90
Matt Godbolt
You have an idea. Okay.

01:30.80
Matt Godbolt
Tell me about your thing.

01:31.82
Matt Godbolt
Whoa.

01:32.18
Ben Rady
Let's say, careful now.

01:34.32
Ben Rady
Let's say for the sake of argument, I have a program.

01:41.28
Ben Rady
Stop it, phone.

01:43.64
Matt Godbolt
There we are.

01:43.70
Ben Rady
I have, exactly.

01:43.92
Matt Godbolt
Some more charm. More charm has just entered the room. His phone, Ben's unmuted phone.

01:49.78
Ben Rady
I have a program and I want it to run fast. How would I do that?

01:56.60
Matt Godbolt
Oh, wow. That is a great question. And we only have around about 30 minutes, 40 minutes to talk about it. So I've got a feeling.

02:07.08
Ben Rady
So immediately this is gonna be our first 10 part series.

02:10.14
Matt Godbolt
Maybe so. I don't know. I don't know. This is one of those questions because I think obviously deliberately you have asked me a leading and vague question. Because what do you mean by, in fact, each of those words, right?

02:25.12
Ben Rady
Can you please define each of the words that you just used in that sentence?

02:29.07
Matt Godbolt
Yeah.

02:29.21
Matt Godbolt
Right.

02:29.34
Ben Rady
Yes, exactly.

02:29.74
Matt Godbolt
My program, make my program fast.

02:32.72
Ben Rady
Right.

02:33.34
Matt Godbolt
All right. So I think "make", we can, yeah: cause it to become. "My program": you have something that you wish to go fast. Right. So the only word we really need to define then is "fast".

02:44.84
Matt Godbolt
And I'm pretty sure you don't mean stopping eating for a period of time.

02:48.62
Ben Rady
Right. Well, I think you could also define "my program", because in this hypothetical I haven't written anything yet.

02:54.42
Matt Godbolt
Oh,

02:54.70
Matt Godbolt
oh

02:54.98
Ben Rady
Right.

02:55.50
Ben Rady
It's like, I want to write a program that doesn't exist, but I want the execution of that program to be fast for some definition of fast that we will define shortly.

03:07.76
Matt Godbolt
Okay.

03:07.86
Matt Godbolt
Right. And so, yes, not "no eating", and also not "unmoving", which is the weird other case, you know, "stuck fast". It's one of those really weird auto-antonymic words, you know, like inflammable or something like that.

03:21.15
Ben Rady
Right. Right. Uh-huh. Yes. Stuck fast. It's the opposite of this. Yeah. Uh-huh.

03:21.50
Ben Rady
Right.

03:22.44
Matt Godbolt
You mean quick, to go quickly. All right. So let's talk about what that could mean. Right.

03:26.40
Ben Rady
Yes.

03:26.78
Matt Godbolt
So the instant you say that and because of what we've been doing professionally for a long time and for what I've been doing most of my career, I think of low latency as fast.

03:39.54
Matt Godbolt
Like if you decide you need to do a thing, if your program needs to do a thing and it discovers that need, it should act upon it as quickly as possible having discovered that need.

03:50.32
Ben Rady
Mm-hmm.

03:50.82
Matt Godbolt
That is what I mean by sort of low latency. It reacts to a real-time thing as quickly as possible. And I think a lot of embedded systems work in this world. A lot of UI, in fact, is this, right?

04:01.38
Matt Godbolt
You know, responsiveness in a UI is about latency, right?

04:05.94
Ben Rady
Right.

04:06.04
Matt Godbolt
But that's one form of fast.

04:08.79
Ben Rady
Mm-hmm.

04:09.94
Matt Godbolt
But there's a very real, very important other kind of fast, which is I am... I'm writing a distributed message passing system.

04:21.20
Matt Godbolt
You know, I'm Mastodon, right? And I'm dealing with an enormous volume of information.

04:27.10
Ben Rady
Yeah.

04:27.10
Matt Godbolt
And if I was slow at doing that, eventually I would back up and I would never keep up with the current... Well, it doesn't matter. Individual messages may take a few seconds to go through, and that's not fast.

04:40.00
Matt Godbolt
I mean, maybe that is fast for that scale of network, right? But you're talking about a throughput problem, you know. Or bank clearing is another one.

04:46.28
Ben Rady
Right.

04:46.32
Matt Godbolt
They've kind of got a bit of latency.

04:47.34
Ben Rady
Mm-hmm.

04:47.50
Matt Godbolt
You beep your card when you're at the coffee shop and it goes beep, and you've sort of got it waiting six or seven seconds, and that's a completely fine human amount of time. You know, it's an eternity for computers.

04:58.16
Matt Godbolt
But on the other hand, there are several hundred million people tapping their card on a reader in that same five-second period.

05:07.73
Ben Rady
Right, right.

05:07.92
Matt Godbolt
And so there's a lot of things to go. So those things, so we've got latency and throughput. And are there any other aspects?

05:17.30
Ben Rady
I think for the definition of this conversation, I think latency and throughput are two dimensions of fast which we should consider.

05:25.98
Matt Godbolt
Okay. Okay.

05:27.44
Ben Rady
I think you could maybe come up with some other ones, but again, we have 30 minutes.

05:32.18
Matt Godbolt
Well, I mean, again, we can make this into more. But so, which one do you want to start with?

05:37.11
Ben Rady
That's true.

05:37.44
Ben Rady
I think we should start with latency.

05:39.44
Matt Godbolt
Okay. Okay. So then we get into this sort of amazing set of kind of questions that we get into when we're dealing with computers because how fast is fast now? Right, how quick is – like we just said, if I'm a human waiting at a sales kiosk,

06:00.02
Matt Godbolt
If it takes me 30 seconds to pay for something, that is not fast. If it takes me three seconds, it's probably okay.

06:08.70
Ben Rady
Mm-hmm.

06:09.36
Matt Godbolt
If it takes 0.3 seconds, I won't notice that anything happened, really.

06:13.70
Ben Rady
Yeah.

06:13.78
Matt Godbolt
There's a beep and I'm out of there. And any other order of magnitude less than that, like 0.03 seconds or whatever, is immaterial to me at that point, and it doesn't matter.

06:22.47
Ben Rady
Right.

06:22.55
Ben Rady
Mm-hmm.

06:22.64
Matt Godbolt
But you and I work in a world where at least some of the things that we have done or are adjacent to, we are talking about maybe seven or eight orders of magnitude faster than those three seconds.

06:35.28
Ben Rady
Mm-hmm.

06:35.60
Matt Godbolt
We're talking hundreds of nanoseconds, hundreds of microseconds.

06:39.80
Ben Rady
Yeah.

06:39.94
Matt Godbolt
And I have to remind myself that a nanosecond is a billionth of a second, right?

06:44.73
Ben Rady
Yeah. Yeah.

06:44.82
Matt Godbolt
You know, sometimes like we throw these words around. I mean, maybe this is just, again, you and I and the weird world that we live in. But like nanoseconds are kind of a completely normal measurement for a, you know, three gigahertz computer.

06:56.52
Ben Rady
Right.

06:56.80
Matt Godbolt
A few clock cycles is a few nanoseconds. That's completely normal. Yeah, that's reasonable. And, you know, there's good old Grace Hopper's light-foot measurement of what a nanosecond is.

07:07.56
Ben Rady
Yeah. Yeah. Uh-huh.

07:07.56
Matt Godbolt
Here's a foot of copper wire. That's a nanosecond. You're like, oh, shucks, man. You know, that's the speed of light. The fastest thing that there is can only go a couple of feet in the time it takes me to add two 64-bit numbers together.

07:20.88
Matt Godbolt
That is insane, frankly, that we're at that stage.
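
The Grace Hopper wire lends itself to a quick back-of-the-envelope check. The sketch below is only arithmetic, not a measurement: it assumes the vacuum speed of light (signals in real copper are slower) and the 3 GHz clock mentioned in the conversation, and the names are made up for illustration.

```python
# Rough arithmetic only: uses the vacuum speed of light, so a real signal
# in copper wire or silicon covers less distance than this.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
CLOCK_HZ = 3_000_000_000  # the "three gigahertz computer" from the conversation


def distance_travelled_m(seconds: float) -> float:
    """Distance light covers in the given time, in metres."""
    return SPEED_OF_LIGHT_M_PER_S * seconds


one_nanosecond_m = distance_travelled_m(1e-9)
one_cycle_m = distance_travelled_m(1 / CLOCK_HZ)

print(f"light in 1 ns:          {one_nanosecond_m * 100:.1f} cm")
print(f"light in 1 clock cycle: {one_cycle_m * 100:.1f} cm")
```

Under those assumptions light covers roughly a foot per nanosecond and about a third of that per clock cycle.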

07:24.38
Ben Rady
It is insane. You know, whenever I think about that, I just think about Grace's wire just twisted around on the inside of the CPU and all the places where it would need to go in order for the light to move in that way to add those two numbers.

07:36.62
Ben Rady
And it just makes my brain melt.

07:37.98
Matt Godbolt
It's, yeah, it is quite something. And the fact that, yeah, we've taken these Factorio factories and we've crushed them down into the size of my you know thumbnail and somehow snaking around in there are many, many, many, many hundreds of feet of cable.

07:56.37
Ben Rady
Yeah. Yeah.

07:57.46
Matt Godbolt
As you say, each individual thing. Electron is – well, they don't go all the way down. It's complicated, right?

08:05.20
Matt Godbolt
Let's stop pretending that we can talk about that as well.

08:07.09
Ben Rady
Yeah.

08:07.44
Matt Godbolt
So, yeah, we've got many orders of magnitude here. So, you know, we could break it down into things that need to be measured in nanoseconds, in microseconds, in milliseconds, and seconds. And, you know, that's, I think, probably – after that, I think, you know, the latency of something – maybe, I mean, maybe you still can measure it in minutes, right? I mean, let's think about like you know publishing a video on YouTube.

08:28.36
Matt Godbolt
I've just updated my video on YouTube and I click the button to publish it and it takes how many minutes to transcode it to all the different formats that YouTube does before it becomes visible. Maybe that is measured in minutes and maybe I do care about that latency.

08:40.75
Ben Rady
Mm-hmm.

08:41.35
Matt Godbolt
But I think the thing that I can confidently talk about, and maybe you as well, is the sub-second orders of magnitude latency.

08:52.74
Ben Rady
Yeah, yeah, yeah.

08:52.94
Matt Godbolt
And so let's talk about what can you do in that amount of time?

08:55.80
Ben Rady
Right, right.

08:57.96
Matt Godbolt
What, you know, that's a good starting point of like, only, you know, if you say I would like my program to be fast without even knowing what your program is, I've got some bounds on what you could reasonably expect to do in hundreds of nanoseconds, hundreds of microseconds, hundreds of milliseconds, right?

09:14.40
Ben Rady
Mm-hmm.

09:14.53
Ben Rady
Right, right, right, right, yeah.

09:15.22
Matt Godbolt
On commodity hardware, if we're just talking about things like a regular PC or a regular, you know, so a decent server in a data center somewhere or a laptop or something like that, they're all within spitting distance of each other.
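
One way to make those bounds concrete is to turn each latency budget into a count of clock cycles. This is a minimal sketch assuming a single core at a fixed 3 GHz and ignoring everything that makes real machines messier (multiple instructions per cycle, stalls, frequency scaling). The function name is hypothetical.

```python
CLOCK_HZ = 3_000_000_000  # assumed fixed 3 GHz clock
NS_PER_S = 1_000_000_000


def cycle_budget(latency_ns: int, clock_hz: int = CLOCK_HZ) -> int:
    """How many clock cycles fit into a latency budget given in nanoseconds."""
    return latency_ns * clock_hz // NS_PER_S


budgets = {
    "100 ns": cycle_budget(100),
    "100 us": cycle_budget(100_000),
    "100 ms": cycle_budget(100_000_000),
}

for label, cycles in budgets.items():
    print(f"{label:>6}: about {cycles:,} cycles on one core")
```

Each step up in the unit buys a thousand times more cycles, which is why the nanosecond, microsecond, and millisecond regimes call for such different engineering.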

09:26.15
Ben Rady
Right.

09:26.32
Ben Rady
Yeah. So if we think of latency as sort of like the time between an input being provided and an output being produced,

09:33.46
Matt Godbolt
Yeah, sounds good.

09:33.90
Ben Rady
then automatically, down in the tens of nanoseconds range, you're sort of like: what input could you possibly consume and what output could you possibly produce in that span of time, right? Because it's got to go into the computer and then come out. So maybe a theoretical example here would be a robotics application where you have some active control of a system, or, you know, a set of servos or something like that. Imagine the robotic dogs type thing, where you're taking some information from a sensor on a robot and then feeding it back and using that to adjust. That's already going to be up in the low milliseconds range, probably.

10:24.06
Ben Rady
I mean, I don't know. I'm not an expert in robotics, but

10:26.74
Matt Godbolt
I'm not sure either. No, but I mean, you think about things that do autobalance, or, you know, drones. Like, still, that can't be milliseconds, because it's already upside down by the time a millisecond of wind has caught it.

10:37.22
Ben Rady
Yeah, yeah.

10:37.22
Ben Rady
Yeah, yeah, yeah, yeah, yeah.

10:37.22
Matt Godbolt
Right. So they must be a little faster than that. But, you know, maybe we're talking microseconds at that level. It seems not totally unreasonable to have some sensor update and put something on a bus internal to a system on a chip kind of thing that can then go, oh, wait a second, I need to speed up this rotor a little bit more to account for that.

10:54.88
Ben Rady
Mm-hmm, mm-hmm.

10:55.14
Matt Godbolt
Right. But you're absolutely right.

10:55.86
Ben Rady
Yeah. Cause a lot of those times those outputs can be sort of like targets, right? So they don't, it's not necessarily that it needs to go like all the way out to like a servo or something. It's just kind of like, okay, I've gotten this input and now I need to adjust, you know, my targeted angle here. Right. And there's maybe even some other external system that is like actually making the adjustments to the physical objects, but like you might want that target to update on the order of,

11:21.52
Ben Rady
you know, tens of mics or hundreds of mics.

11:24.02
Matt Godbolt
Right.

11:24.82
Ben Rady
Just because when you do, you get sort of like a smoother, better prediction of like where the arm needs to be or something like that.

11:31.42
Matt Godbolt
Right, right, right, right. And yeah, so I mean, let's to sort of break it down to consumer hardware.

11:35.59
Ben Rady
Yeah.

11:35.60
Matt Godbolt
It is difficult to get a message in and out of a computer in, you know, I mean, like you just do a ping, right? You ping a computer on your local network, and it's less than a millisecond, but it's, you know, still hundreds and hundreds of microseconds probably to go through commodity hardware.
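
A rough way to get a feel for these numbers on your own machine is to time a UDP round trip. The sketch below is an illustration, not a benchmark: it stays on the loopback interface, so it never touches a network card and will come out much lower than a ping across a real network, and Python's own overhead is included in the figure. The function name is made up.

```python
import socket
import time


def loopback_round_trip_ns(payload: bytes = b"ping") -> int:
    """Time one UDP send, echo, and receive over loopback, in nanoseconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        client.bind(("127.0.0.1", 0))
        server.settimeout(1.0)
        client.settimeout(1.0)

        start = time.perf_counter_ns()
        client.sendto(payload, server.getsockname())
        data, peer = server.recvfrom(2048)
        server.sendto(data, peer)  # echo the datagram straight back
        echoed, _ = client.recvfrom(2048)
        elapsed = time.perf_counter_ns() - start

    if echoed != payload:
        raise RuntimeError("echo did not match what was sent")
    return elapsed


samples = sorted(loopback_round_trip_ns() for _ in range(100))
median_ns = samples[len(samples) // 2]
print(f"median loopback round trip: {median_ns / 1000:.1f} us")
```

The median is reported rather than the mean because a single scheduling hiccup can stretch one sample enormously.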

11:52.95
Ben Rady
Yeah.

11:53.28
Matt Godbolt
And, you know, for the kind of weird stuff that you and I have been involved with, there is some very specialist hardware that happens, which we can't really talk about but on the far end into the nanosecond range of things. So I suppose if we're talking about

12:03.54
Ben Rady
Mm-hmm.

12:03.54
Matt Godbolt
Things that most folks could reasonably get, put their fingers on a keyboard and program, be it a little embedded system like an Arduino or an ARM-y type thing, or indeed a, you know, a program like I'm typing on a computer and I'm expecting, when having pressed a key press to see the key appear in the terminal, that kind of thing.

12:21.80
Ben Rady
Mm-hmm.

12:22.77
Matt Godbolt
We've kind of picked now our range. Now we're talking probably hundreds of mics to low millis.

12:31.42
Ben Rady
Mm-hmm. Mm-hmm. So I've got a program.

12:33.50
Matt Godbolt
So, yeah, so we're down.

12:34.22
Ben Rady
It's going to...

12:34.32
Matt Godbolt
Yeah, that's right. I forgot where we were now. Yeah.

12:36.62
Ben Rady
It's gonna read some input delivered from outside the computer, and this is a commodity computer, you know, like, and then it's gonna produce some output that is also going to leave the computer, right?

12:51.38
Matt Godbolt
Yep. Yeah.

12:52.38
Ben Rady
And we're thinking that this is gonna have to occur, it's gonna take at least hundreds of mics for that to occur. Is that a fair statement?

12:59.36
Matt Godbolt
I think so. Without without getting...

12:59.78
Ben Rady
So that's like the upper bound of how fast we can make it for my definition of fast.

13:03.34
Matt Godbolt
I think so. I mean, you know, there are exotic techniques that you can use even on commodity hardware that is a little bit more exotic, like DPDK, which is a sort of kernel-bypass networking thing that you can fiddle with.

13:16.16
Matt Godbolt
And if you put a second network card in your computer, you can go faster. But yeah, I think let's say, yeah, already, whatever you're doing, if you're responding to an external network event, you're doing any kind of reaction to it and sending it back out again, we're in the...

13:30.66
Ben Rady
Yeah.

13:30.66
Matt Godbolt
Microsecond region, but probably hundreds-of-microseconds region, would be my instinct. And maybe I'm wrong, maybe I've forgotten how, you know, computers are these days. Or at least normal computers, right? I was telling you before we started recording, I'm surrounded with networking gear and computers and things, and AI agents running things for me as I'm setting up. I'm actually upgrading my home network to two and a half gig, because that's something that is not totally unaffordable for the home these days. And so I'm like, hey, maybe my ideas about what commodity hardware is are out of date, but...

14:07.50
Ben Rady
Yeah.

14:07.73
Ben Rady
Yeah.

14:07.85
Ben Rady
No one knows how computers actually work.

14:08.75
Ben Rady
Yeah.

14:09.00
Ben Rady
Yeah. So my first question would be when I'm writing my program, how do I make sure, because my assumption here is that the input here is unpredictable. It's not like you're gonna start the program and immediately be provided the input.

14:22.66
Ben Rady
The program is running,

14:24.33
Matt Godbolt
Mm-hmm. Yeah.

14:25.28
Ben Rady
There's something outside of the computer that creates this input, and then the program has to react and create an output, right? So my first question to you is: how do I make sure that there's the minimum possible latency between the generation of that signal and the code in my program running?

14:42.38
Matt Godbolt
Yeah.

14:42.38
Ben Rady
Something, something epoll, right?

14:45.80
Matt Godbolt
Well, that's... No, exactly right. No, well, yes and no. I mean, that goes into the DPDK kind of sort of thing I alluded to. So yeah, you know a traditional server program that is maybe blocked on a socket.

14:59.18
Matt Godbolt
So you've got a socket, you've got, you know let's say UDP just to make things maybe easier. I don't know, right? You've got a blocking call to receive, which means that your process has gone to sleep

15:11.80
Ben Rady
Mm-hmm.

15:12.24
Matt Godbolt
It's told the operating system, hey, anything interesting coming in on this port, this UDP port, I'd like to know, please. And then I've gone to sleep and I've been descheduled. And now I've just been chucked in the list of all the processes that are currently asleep and have nothing to do.

15:30.60
Matt Godbolt
So that's where we are now. Packet comes in, goes through the standard kernel process of saying, well, who's this for? Oh, it's for port 9007. That is registered to this process. So we're going to write it into a buffer now rather than discard it, or rather we keep it probably around in a buffer that exists. First of all, is there space in that buffer? Yes, there is. Okay, this is something you can configure for each socket, how much space that you can have, which gets complicated if you're sharing multiple processes listening on the same.

16:01.14
Matt Godbolt
It's, yeah. Anyway, and then we go, well, who cares about this, right? We've put it into the buffer now. We've copied the data. Who cares? Oh, there's a process. Okay, well, this process is waiting on this, so we should mark it as ready. This process is now ready to be scheduled. And then...

16:17.82
Matt Godbolt
You're done. And then as part of the whole, you're done process, the kernel goes back to, well, I guess I'm now finished, but back to the user process. Now who, who should be run?

16:30.30
Matt Godbolt
And maybe there was something else going on on that CPU when the packet came in, and that other process is actually still higher priority under whatever queuing fairness there is, in which case it will merrily do whatever it was doing before until it runs out of the time slice it had been allocated. And then it goes, oh, someone else's go now. Oh look, this thing's ready. So what I'm sort of getting at here is there are a lot of things that happen between that packet arriving, even in your kernel, and your process being woken up to do anything about it. And that's a hugely variable amount of work too.
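
The traditional blocking server described above can be sketched in a few lines of Python. This is an illustration under assumptions, not anyone's production code: the function names are hypothetical, the demo binds to port 0 so the OS picks a free port (a real service would pass its fixed port, such as the 9007 used as an example in the conversation), and the receive buffer size is only a request that the kernel is free to clamp or adjust.

```python
import socket


def open_udp_listener(port: int, rcvbuf_bytes: int = 1 << 20) -> socket.socket:
    """Create a UDP socket bound to localhost with a requested receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Per-socket buffer the kernel copies incoming datagrams into. If it is
    # full when a packet arrives, that packet is dropped.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind(("127.0.0.1", port))
    return sock


def serve_one(sock: socket.socket) -> bytes:
    """Block until one datagram arrives, reply with it upper-cased."""
    # recvfrom blocks: the process sleeps in the kernel until a datagram has
    # been queued for this socket and the scheduler chooses to run us again.
    data, sender = sock.recvfrom(2048)
    sock.sendto(data.upper(), sender)
    return data


# Demo on loopback: port 0 asks the OS for any free port.
listener = open_udp_listener(0)
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(1.0)
client.sendto(b"hello", listener.getsockname())
request = serve_one(listener)
reply, _ = client.recvfrom(2048)
print(request, reply)
client.close()
listener.close()
```

Everything between the packet landing and `recvfrom` returning happens inside the kernel, which is exactly the variable part of the latency being discussed.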

17:01.85
Ben Rady
Mm-hmm. Mm-hmm.

17:02.70
Matt Godbolt
And you can definitely reduce the amount of work. So the amount of other things that the CPU or the kernel is trying to do on a CPU so that your process is more likely to be woken up. There are schedulers that let you set these things or priorities and stuff, but,

17:22.62
Matt Godbolt
The way that I intuitively do these type of things is using some kind of, yeah, like we said, DPDK, which is this sort of networking layer that lets you use shared memory with the network card.

17:37.62
Matt Godbolt
And then instead of the kernel being involved at all, you just keep continuously reading a place of memory that the network card will write directly to and say, hey, I've got a packet for you. And then you just pick it up.

17:47.54
Matt Godbolt
And then essentially your latency goes down to how fast a PCIe bus transaction can notify that a piece of memory has changed and you pick it up and off you go.

17:57.95
Ben Rady
Mm-hmm.

17:58.08
Matt Godbolt
Right. But that is still fairly esoteric. So it seems like that's not a fair thing. But I guess it's the difference between... right, if the program part that we've not even started talking about now is itself going to take tens of milliseconds because of the innate amount of work that it has to do.

18:19.18
Matt Godbolt
And so the upper bound is dominated by its processing time. Then maybe the effort and all the work to put in shaving tens of microseconds off on the like the interrupt handling time isn't worth it, right?

18:32.55
Ben Rady
Mm-hmm. Mm-hmm.

18:32.66
Matt Godbolt
It's a lot of engineering effort. It's a lot of complexity. It's a lot of non-standard nonsense. But if we wanted to make something go as fast as possible, those are the kind of things that I would take to doing: take the kernel out of the equation and burn a CPU core just spinning, doing nothing other than saying, is there anything to do? How about now? How about now? How about now? In a tight loop, and potentially reschedule everything else so it can't run on the same CPU as me. There are ways that you can do this with CPU sets, with isolation.

19:06.92
Matt Godbolt
Trickery like isolcpus on the kernel command line. There's a lot of weird things you can do. Anyway, right. And then you can get it down to, in the limit, single-digit microseconds between a packet arriving and you going, okay, I have work to do.
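
The shape of that "how about now?" loop can be shown with an ordinary non-blocking socket. To be clear about what this is not: it is not DPDK and does not bypass the kernel, since every `recvfrom` here is still a system call. It only illustrates pinning to a core and spinning instead of sleeping. The helper names are hypothetical, and `os.sched_setaffinity` is only available on some platforms (Linux among them), so the sketch checks for it first.

```python
import os
import socket
from typing import Optional


def pin_to_core(core: int) -> bool:
    """Pin this process to one core where the OS supports it. True if applied."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform without this call (for example macOS)
    if core not in os.sched_getaffinity(0):
        return False  # we are not allowed to run on that core
    os.sched_setaffinity(0, {core})
    return True


def spin_until_packet(sock: socket.socket,
                      max_spins: int = 10_000_000) -> Optional[bytes]:
    """Burn CPU asking 'anything yet?' instead of sleeping in the kernel."""
    sock.setblocking(False)
    for _ in range(max_spins):
        try:
            return sock.recvfrom(2048)[0]
        except BlockingIOError:
            continue  # nothing queued yet: how about now?
    return None


if hasattr(os, "sched_getaffinity"):
    pin_to_core(min(os.sched_getaffinity(0)))

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"wake up", receiver.getsockname())
packet = spin_until_packet(receiver)
print(packet)
sender.close()
receiver.close()
```

Pinning alone does not stop other work from being scheduled onto the same core. That is what the isolation settings mentioned above are for.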

19:22.30
Ben Rady
Okay. So if we assume that the work that needs to be done is relatively simple, right? You're going to do some adds, you're going to do some multiplies, you're going to try to structure your code so that you're really only doing adds and multiplies, and then you're going to produce some output that is purely a function of that input and whatever state you may have held in memory.

19:46.02
Matt Godbolt
All right. So what you're talking about, what we've got is an ALU as a service. You know, we are, have you seen this? There's a guy who's written a microservices based CPU emulator where every single instruction is handled by a different microservice, each of which is written in a different language and the thing runs and it's so... but yeah, isn't it cool?

20:05.10
Ben Rady
Oh my God, that's amazing. Man,

20:07.52
Matt Godbolt
I love that people think of these things.

20:10.68
Ben Rady
you think you were in microservice hell. You should.

20:13.76
Matt Godbolt
Yeah, you've got like literally 256 different things and a RAM service, you know, memory service so that everyone can agree on this.

20:20.81
Ben Rady
Oh, that's amazing.

20:21.18
Matt Godbolt
It's cool. But so you're talking about something like that. Let's just say, you know, it is...

20:24.55
Ben Rady
Yeah.

20:24.60
Matt Godbolt
In fact, honestly, probably a decent... Unless you want to... I'm going to say these and you can tell me if where you're trying to steer it is right or not. But like something like a Redis memory cache is like one of these things.

20:36.59
Ben Rady
Yeah. Right.

20:36.68
Matt Godbolt
It's like something comes in, I look it up in RAM and I'm like, yes, I've got it.

20:39.44
Ben Rady
Sure.

20:39.48
Matt Godbolt
Here it is. It's like a small piece of data. You know, a key value store that is just, you know, like a cache, right?

20:46.39
Ben Rady
Yeah.

20:47.12
Matt Godbolt
You're not doing very much work.

20:48.74
Ben Rady
Yeah.

20:49.10
Matt Godbolt
It's not as banal as, like, I'm adding the three numbers you've given to me and giving them back to you.

20:54.64
Ben Rady
Right.

20:54.78
Matt Godbolt
It is, that you can imagine there is actual value in that, but it's still, you know.

20:58.76
Ben Rady
Yeah. I'm trying to draw a line about the the other parts of the computer that you're going to be talking to. And what I'm trying to say by this is that you can't ignore memory, right?

21:08.54
Matt Godbolt
I see.

21:08.62
Ben Rady
It's not simply a function of the data that you've read in, but you don't have to go beyond that, right?

21:13.57
Matt Godbolt
Yep.

21:13.90
Matt Godbolt
So, okay. So let's use an in-memory cache, like, yeah, Redis without the disk-backing type stuff.

21:19.72
Ben Rady
Yeah. Right, right. Mm-hmm.

21:20.32
Matt Godbolt
Yeah. Yeah, yeah. That's perfect. Okay. So yeah, you're looking, I mean, at that point, there's a, without going into the extremely fun and interesting rabbit hole of how you actually do the lookup,

21:34.34
Matt Godbolt
and what data structures you use. But that is probably the key thing is that like, you must consider the data structures you use. But the interesting thing is that with a latency lens,

21:45.34
Matt Godbolt
You're not looking at data structures in quite the same way as you do when you are doing, like, CS whatever. I don't know what numbers you use, but, you know, when you're doing your data structures courses and you're looking at big O notation, like, you know, big O of N, whatever, N log N. You know, so like a not unreasonable data structure for

22:09.82
Ben Rady
Right. Yeah. Right, right.

22:11.70
Matt Godbolt
a key value store is something like a balanced red-black tree, right? It's log N to look into it, whatever.

22:20.35
Ben Rady
Okay.

22:20.95
Matt Godbolt
Another one is a B tree or a B plus tree or one of those kinds of trees, you know, hash maps are other ones as well. And, you know, obviously hash maps do have in theory O of one. So you'd probably want to pick that out of big O notation or whatever.

22:33.61
Ben Rady
Right.

22:34.42
Matt Godbolt
But the other ones seem reasonable as well. Like it's like, but, the dominating factor with something as straightforward as the thing you've just described, where we're literally just looking something up in the key value store, is going to be reading from RAM.
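
To put the three big-O options side by side, here is a small sketch of the same key lookup done three ways. It makes some substitutions that should be read as assumptions: Python has no built-in red-black tree, so a binary search over a sorted list stands in for the O(log N) tree, `dict` plays the hash map, and a plain loop is the O(N) scan. The keys and values are invented for the example.

```python
from bisect import bisect_left
from typing import Optional

# Invented data: zero-padded keys so that list order is also sorted order.
keys = [f"key{i:04d}" for i in range(1000)]
values = [i * i for i in range(1000)]

hash_table = dict(zip(keys, values))


def lookup_hash(key: str) -> Optional[int]:
    """Expected O(1): hash the key, jump to its slot."""
    return hash_table.get(key)


def lookup_sorted(key: str) -> Optional[int]:
    """O(log N): binary search, standing in for a balanced tree."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None


def lookup_linear(key: str) -> Optional[int]:
    """O(N): walk every entry in order until the key matches."""
    for k, v in zip(keys, values):
        if k == key:
            return v
    return None


print(lookup_hash("key0007"), lookup_sorted("key0007"), lookup_linear("key0007"))
```

All three return the same answer. The point made in the conversation is that which one is quickest in practice depends on how each one touches memory, not only on the big-O column.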

22:46.96
Matt Godbolt
Now, if you're talking about something that's so small it fits into L1, L2, or L3, which are of the order of tens of Ks in the L1 case up to several megabytes at L3, and, you know, bigger than that on bigger servers and whatnot, we're still talking not all that much information.

23:07.27
Ben Rady
Right.

23:07.68
Ben Rady
Can you describe for our listeners, he said, covering for his own ignorance, what L1, L2, and L3 actually are?

23:14.28
Matt Godbolt
Oh, I'm so sorry. So there are layers of caches. Caches are expensive both in terms of the amount of space they take up on a chip and how fast, you know going back to our Grace Hopper, you know the bigger something is, the longer it takes to get to the other side of it.

23:36.44
Matt Godbolt
So the way that CPUs

23:39.01
Ben Rady
Mm-hmm.

23:39.56
Matt Godbolt
prevent us from having to read from the very slow RAM, which we'll talk about in a second, is that we have on-chip caches, and they are layered so that you have a very small, incredibly fast level one cache, which is usually...

23:53.32
Matt Godbolt
not very large, tens of kilobytes kind of area. Then you have a layer two cache, which is maybe a meg, maybe hundreds of K, that kind of area. And then a layer three cache, which is maybe tens of megs, maybe even a little more than that.

24:09.56
Matt Godbolt
And each of them is slightly further away, like literally physically. They're still on the actual thumb-size chip inside the machine, but they're further away, and it takes longer to look through them to find the data.

24:20.94
Matt Godbolt
And so if you can fit something in L1, it's kind of almost free.

24:26.59
Ben Rady
Mm-hmm.

24:26.84
Matt Godbolt
I'm going to put air quotes over this. We're talking somewhere in the region of, you know, five or six cycles to read from L1. And that includes all of the other things that are the fixed cost of reading from memory. That includes doing the lookup for the logical address that you've decided to read from and turning it into the actual physical address, so that it goes to the correct chip. You know, different processes think that address 2000 has different data in it, right?

24:54.46
Ben Rady
Yeah, yeah, yeah.

24:54.86
Matt Godbolt
And something has to do that translation. And that's not at this scale. When we're talking about, you know, a three gigahertz machine, a third of a nanosecond is a clock cycle. That lookup itself takes time.

25:06.38
Ben Rady
Right, right.

25:06.64
Matt Godbolt
And so, you know, we need to be doing something. So that's about the limit of it. And there's some other really complicated things that the poor memory system has to do because of the weird out-of-order engine. But we'll ignore that for now. So five or six cycles for like an L1.

25:19.38
Matt Godbolt
Now I'm going to forget this off the top of my... I should have brought this up beforehand, you know. But like tens of cycles, low tens of cycles for L2. Maybe late tens, you know, 80, 90 cycles, maybe, for L3. No, I don't think it's that much now.

25:30.52
Matt Godbolt
So, you know, order of magnitude-y things. It's still probably only tens of nanoseconds. But then you go out to RAM and it can be hundreds of nanoseconds, if you actually need to go out to RAM. So those are the layers. The other thing that's interesting is that usually on chips the L1 and maybe the L2 are unique to the CPU core that you're running on, so you just get a copy of your own, and then that's yours and no one can do anything with it, with the

25:57.53
Ben Rady
Right.

25:57.59
Ben Rady
Okay.

25:57.64
Matt Godbolt
massive asterisk and footnote for older CPUs, but we maybe won't go there. And then the L3 typically is shared amongst the the other cores that are physically on the same chip as you.
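
The rough cycle counts quoted for each cache level can be converted to nanoseconds with the "a third of a nanosecond is a clock cycle" rule for a 3 GHz machine. The specific numbers below are illustrative picks from within the ranges mentioned in the conversation, not measurements of any particular CPU. Treat every constant as an assumption.

```python
CLOCK_GHZ = 3.0  # cycles per nanosecond on the assumed 3 GHz machine

# Illustrative figures only, chosen from the rough ranges in the discussion.
ASSUMED_CYCLES = {"L1": 5, "L2": 15, "L3": 50}
ASSUMED_RAM_NS = 100.0  # low end of "hundreds of nanoseconds"


def cycles_to_ns(cycles: float, clock_ghz: float = CLOCK_GHZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles / clock_ghz


def ns_to_cycles(ns: float, clock_ghz: float = CLOCK_GHZ) -> float:
    """Convert nanoseconds to a cycle count at the given clock rate."""
    return ns * clock_ghz


for level, cycles in ASSUMED_CYCLES.items():
    print(f"{level}:  ~{cycles:3d} cycles = ~{cycles_to_ns(cycles):5.1f} ns")

ram_cycles = ns_to_cycles(ASSUMED_RAM_NS)
print(f"RAM: ~{ram_cycles:.0f} cycles = ~{ASSUMED_RAM_NS:5.1f} ns")
```

Even with generous assumptions, one trip to RAM costs as much as dozens of L1 hits, which is the gap the rest of the discussion is about.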

26:08.76
Matt Godbolt
And so there is some contention, and there are obviously conversations that have to happen between the cores to say, hey, I'm reading that. Oh, I was just writing to it. Oh, okay. Well, we need to make sure that we do this in the right order.

26:17.23
Ben Rady
Right. Mm-hmm.

26:17.30
Matt Godbolt
So there's a little bit more complexity there. But yeah, so if we're still talking in, you know, measurable, human-countable numbers of cycles, keeping it inside the L1 or L2 is great. L3 is okay. And if we have to go to RAM, we have to be a bit careful about things. And at that point, two things are important to note about the way caches work.

26:38.80
Matt Godbolt
One is that the whole theory behind a cache is that having read a byte of memory, it's really, really likely you're going to read the bytes that are around that byte of memory.

26:52.59
Ben Rady
Mm-hmm.

26:53.38
Matt Godbolt
And so the fundamental unit of work from the memory system on the CPU to the caches and beyond is a sequence of bytes, which is usually 64 or maybe 128 bytes long. And that's kind of like the atom that gets shipped back and forth. And some chips even do things where you say, well, you asked for this block of 64 bytes, I'll just get the next one and maybe the one after that while we're going, because we might as well, having just done that. And so this sort of access pattern is incredibly important for latency, because if you're following a linked list, as you may be if you just have a naive hash map, suddenly you're jumping randomly around in memory and the poor system can't make a guess.
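
To make the "atom" idea concrete, here is a small Python sketch of the address arithmetic, assuming the 64-byte line size mentioned above. The function names are hypothetical and this only models which line a byte falls in, not what any real cache does.

```python
# Model of which cache line a byte address belongs to.
# Assumption: 64-byte lines (the conversation says "64 or maybe 128").

LINE_SIZE = 64


def line_index(address: int, line_size: int = LINE_SIZE) -> int:
    """Index of the cache line that contains `address`."""
    return address // line_size


def lines_touched(start: int, length: int, line_size: int = LINE_SIZE) -> int:
    """How many distinct cache lines a read of `length` bytes at `start` spans."""
    if length <= 0:
        return 0
    first = line_index(start, line_size)
    last = line_index(start + length - 1, line_size)
    return last - first + 1


if __name__ == "__main__":
    # An 8-byte read at address 2000 sits entirely inside one line.
    print(lines_touched(2000, 8))
    # A 16-byte read at 2040 straddles a line boundary at 2048.
    print(lines_touched(2040, 16))
```

Reading one byte therefore pulls in its 63 neighbours for free, and a read that straddles a boundary costs two lines rather than one.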

27:32.89
Ben Rady
Right.

27:33.22
Matt Godbolt
But if you just have a damn array of everything and just go, well, it's expensive, it's an order N operation for me to search through these things, but I'm going to run through them linearly,

27:43.92
Matt Godbolt
suddenly it may be more worthwhile from a cache standpoint, especially if a cache miss is hundreds of cycles. Well, I can do a lot of work in a hundred cycles on an out-of-order execution machine that has 12 execution units, right?

27:58.04
Matt Godbolt
So why not just do tons of work and hope that that's quicker than going to RAM? So anyway, popping the stack all the way back, to keep the latency as low as possible, you're going to have to think about this.
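
One way to picture the array-versus-linked-list point is to count how many distinct cache lines each layout touches. The Python sketch below is a model, not a measurement: the 16-byte element size, the 64 MiB heap and the random node placement are all assumptions chosen for illustration.

```python
# Count distinct 64-byte cache lines touched by two memory layouts.
# Assumptions: 16-byte elements, linked-list nodes scattered at random
# 16-byte-aligned addresses in a 64 MiB heap. This is a toy model only.
import random

LINE_SIZE = 64
NODE_SIZE = 16
HEAP_BYTES = 64 * 1024 * 1024


def distinct_lines(addresses, size=NODE_SIZE):
    """Number of distinct cache lines covered by objects at `addresses`."""
    lines = set()
    for addr in addresses:
        lines.add(addr // LINE_SIZE)
        lines.add((addr + size - 1) // LINE_SIZE)
    return len(lines)


def contiguous_layout(count):
    """Addresses of `count` elements packed back to back, as in an array."""
    return [i * NODE_SIZE for i in range(count)]


def scattered_layout(count, seed=0):
    """Addresses of `count` nodes placed at random aligned heap positions."""
    rng = random.Random(seed)
    slots = HEAP_BYTES // NODE_SIZE
    return [rng.randrange(slots) * NODE_SIZE for _ in range(count)]


if __name__ == "__main__":
    n = 1024
    print("array lines:", distinct_lines(contiguous_layout(n)))
    print("list lines: ", distinct_lines(scattered_layout(n)))
```

In this model the packed array fits four elements per line, while nearly every scattered node lands on a line of its own, and each of those is a potential trip to RAM.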

28:09.35
Ben Rady
Yeah. Mm-hmm.

28:09.48
Matt Godbolt
And so you might want to make some concessions where the distribution of your latencies needs to be taken into account. Now, you might say, well, most of the time, like 80% of the stuff we're reading is hot. Like it's going to be commonly used again, in which case I'm going to use a data structure, which is fine.

28:36.44
Matt Godbolt
And that 80% fits inside. It doesn't matter what I'm doing, whether it's a balanced tree or whatever, it fits inside L3, let's say.

28:43.94
Ben Rady
Right.

28:44.14
Matt Godbolt
But the 20% that's outside of it is just outside. It's going to RAM. We're just going to have to take the hit every time. And it's incredibly expensive because we're using a balanced tree. So I guess now we turn around and say, like, it's not just how fast to go. It's like, what is the shape of your distribution? Is it better to have the average case fast and the bad case awful?

29:04.70
Ben Rady
Right.

29:05.06
Matt Godbolt
Or is it better to bound the bad case and say, well, actually, I'm going to use a linear search every single time. And I'm going to accept that even in the fast case, where, you know, it would have been faster just to check, I don't know, the first one, or pack them in a different way, I'm paying the cost to do 16 compares,

29:25.00
Ben Rady
Yeah, yeah, yeah.

29:26.06
Matt Godbolt
Every time and just go, well, that costs me a little bit more than one.

29:29.08
Ben Rady
Right.

29:29.12
Matt Godbolt
Of course it does, but it makes the worst case go away, but at the lowest.
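
The trade-off being described can be sketched as two searches over a 16-slot table: one that stops at the first match and one that always does all 16 compares. This is a Python illustration with hypothetical function names; it counts comparisons rather than measuring time, since Python itself would hide the hardware effects.

```python
# Early-exit search versus fixed-cost search over a small table.
# Assumption: a 16-slot table, matching the "16 compares" in the conversation.

SLOTS = 16


def find_early_exit(keys, wanted):
    """Return (index, compares); stops at the first match."""
    compares = 0
    for i, key in enumerate(keys):
        compares += 1
        if key == wanted:
            return i, compares
    return -1, compares


def find_fixed_cost(keys, wanted):
    """Return (index, compares); always examines every slot."""
    found = -1
    compares = 0
    for i, key in enumerate(keys):
        compares += 1
        if key == wanted and found == -1:
            found = i
    return found, compares


KEYS = list(range(100, 100 + SLOTS))

if __name__ == "__main__":
    print(find_early_exit(KEYS, 100), find_early_exit(KEYS, 115))
    print(find_fixed_cost(KEYS, 100), find_fixed_cost(KEYS, 115))
```

The early-exit version ranges from 1 to 16 compares depending on where the key sits; the fixed-cost version is always 16, which is slower on average but has no spread.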

29:36.03
Ben Rady
Right.

29:36.16
Matt Godbolt
So yeah, sorry. I've just gibbered around in a big, as I've sort of like realized the enormity of the question you've asked. Yeah. Yeah.

29:42.75
Ben Rady
That was the intention of the question. But yes, whenever you're talking about latency, it's not like, oh, it's 10 microseconds of latency.

29:51.11
Matt Godbolt
Yeah.

29:51.24
Ben Rady
It's no, it's 10 microseconds of latency in what case? What percentage of the time? You know, oftentimes you talk about like P95, P99, P99.999 numbers.

30:01.42
Matt Godbolt
Yeah.

30:01.42
Ben Rady
Of like what percentage of the responses are going to be less than some particular target time. And you may have multiple layers of that, right?

30:11.86
Ben Rady
Like you may have a maximum threshold that you need and like a P99.99 and then like a P95 and then like a median, right?
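
As a sketch of what checking those layered targets might look like, the Python below computes nearest-rank percentiles over a set of latency samples and compares them with per-percentile limits. The sample values and the limits are invented for illustration; only the idea of separate median, P99 and maximum targets comes from the conversation.

```python
# Nearest-rank percentiles over latency samples, checked against targets.
# Assumption: samples and targets are in microseconds and are made up.
import math


def percentile(samples, p):
    """Smallest sample such that at least p percent of samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_targets(samples, targets):
    """Map each percentile in `targets` to whether its limit is met."""
    return {p: percentile(samples, p) <= limit for p, limit in targets.items()}


# 95 fast responses, 4 slightly slow ones, and one terrible outlier.
SAMPLES_US = [8] * 95 + [12] * 4 + [900]

# Hypothetical targets: median, P99, and an absolute maximum of one second.
TARGETS_US = {50: 10, 99: 15, 100: 1_000_000}

if __name__ == "__main__":
    for p in (50, 95, 99, 100):
        print(f"P{p}: {percentile(SAMPLES_US, p)} us")
    print(check_targets(SAMPLES_US, TARGETS_US))
```

A single number such as the mean would blur the one 900 microsecond outlier into the rest, which is the reason for quoting several percentiles side by side.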

30:22.78
Ben Rady
And you might want to like structure things you know differently based on whatever those targets might be. So in our case, sort of this hypothetical, you know, robot dog type plugged into a PC perhaps that's receiving the signal.

30:40.60
Ben Rady
Maybe what we'd want to say here as one fork of the tree to go down is actually the thing that I care about is the worst case latency, right?

30:49.84
Ben Rady
I want the... because if the latency gets too high, the dog is going to fall over and that will be bad, right?

30:54.58
Matt Godbolt
Right.

30:54.90
Matt Godbolt
That's a very common thing for like, yeah, control boards and things.

30:59.11
Ben Rady
Mm-hmm.

30:59.28
Matt Godbolt
And, you know, this is why in those situations, typically, embedded systems that have those kinds of very strong reliability requirements will not use regular operating systems.

31:12.04
Matt Godbolt
They're using, you know, real-time operating systems that can give you hard deadlines for things happening within a certain amount of time, and/or do no preemption at all.

31:19.47
Ben Rady
Mm-hmm.

31:19.56
Matt Godbolt
So it's not a case where your process will have its CPU time taken away from it. It's like, you know, it's your responsibility to keep running until you get to the end and then yield to the next guy so that you can...

31:30.30
Matt Godbolt
But and this is a huge problem. As computers have gotten more complicated, it becomes harder and harder to reason about what an upper bound could be.

31:40.94
Matt Godbolt
You know, picking up my 6502 in its box over here.

31:45.06
Ben Rady
Right. It's a little easier.

31:45.68
Matt Godbolt
You know, it's a little easier. You know, it's like I can tell you exactly how long any single thing is going to take all the time.

31:52.49
Ben Rady
Yeah.

31:52.54
Matt Godbolt
It's completely deterministic and it's well documented.

31:56.34
Ben Rady
Right.

31:56.74
Matt Godbolt
But we have to reverse engineer what the hell an Intel chip is doing in order to understand...

32:02.86
Ben Rady
Essentially.

32:02.86
Matt Godbolt
Sort of maybe what the worst case is in that particular situation, right? You know, hey, what happens if we have an L3 cache miss at the same time as another CPU core, like a physically distant core, has an L3 cache miss, and both go for it at the same time?

32:20.69
Ben Rady
Mm-hmm.

32:20.88
Matt Godbolt
How does that work? Is there some really weird edge case where, you know, in the fabric, it times out after some amount of time? I don't know, and it's hard to find out. But if I need to give you an actual, like, someone will get run over if the brakes do not come on when I put my foot on the brake, then I certainly can't think about the world the same way, and I certainly can't use certain, like, CPUs and designs and operating systems like that.

32:45.75
Ben Rady
Mm-hmm.

32:45.81
Ben Rady
Mm-hmm.

32:45.86
Matt Godbolt
So that's, but, it's interesting.

32:46.60
Ben Rady
Yeah. So if I come to you and I'm asking this question of how do I make this program fast? And you go through all of these follow-up questions and what it turns out that I'm asking for here is how do I make sure that the round trip latency of this program that I have running on a just regular commodity PC is not going to ever be higher than 10 microseconds?

33:13.38
Ben Rady
The answer is you can't.

33:14.98
Matt Godbolt
I think that's a fair... Yeah, the nearest people I can think of that have to deal with that specific problem are people who make digital audio workstations for, you know, consumer PCs, because they have...

33:29.46
Matt Godbolt
A genuine latency bound, because if you're playing a keyboard and there's a one second delay between your pressing the button and hearing the noise change,

33:40.68
Ben Rady
Yeah, right.

33:40.79
Ben Rady
Yeah. Yeah. Yeah.

33:40.96
Matt Godbolt
That is not good for anybody.

33:42.88
Ben Rady
Yeah.

33:43.38
Matt Godbolt
And so the only way that you can get that is to make everything as low latency as possible. But if you go too far and you don't have enough buffering between you and the set of audio that's coming out, then if you, for whatever reason, can't keep up, you're going to get a massive pop or a crack or a click as suddenly you have to catch up and just drop a whole bunch of samples that you were generating.
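
The buffering trade-off can be put into numbers with a small Python sketch: a buffer of N frames at a given sample rate adds N divided by the rate seconds of delay, and rendering a block has to finish within that same window or samples get dropped. The 48 kHz rate, the buffer sizes and the 2 ms render time are assumptions for illustration, not figures from the conversation.

```python
# Latency added by an audio buffer, and whether a render time would underrun.
# Assumptions: 48 kHz sample rate and the buffer sizes below are illustrative.

SAMPLE_RATE_HZ = 48_000


def buffer_latency_ms(frames: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Delay in milliseconds contributed by a buffer of `frames` samples."""
    return 1000.0 * frames / sample_rate_hz


def would_underrun(render_ms: float, frames: int,
                   sample_rate_hz: int = SAMPLE_RATE_HZ) -> bool:
    """True if rendering one block takes longer than the block lasts."""
    return render_ms > buffer_latency_ms(frames, sample_rate_hz)


if __name__ == "__main__":
    for frames in (64, 256, 1024):
        latency = buffer_latency_ms(frames)
        risky = would_underrun(2.0, frames)
        print(f"{frames:5d} frames: {latency:6.2f} ms, "
              f"underrun with 2 ms render: {risky}")
```

Small buffers feel responsive but leave almost no slack for a late wake-up; large buffers absorb hiccups at the cost of a delay the player can hear.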

34:06.40
Ben Rady
Yeah, yeah, yeah.

34:06.58
Matt Godbolt
So that's, I mean, I don't know how they do it, honestly. In fact, I do know people who have worked on them and like we should perhaps consider getting them on to talk about this kind of stuff because it's a fascinating problem.

34:20.36
Matt Godbolt
But yeah, anyway, but back to your...

34:23.70
Ben Rady
Yeah. So the result of this, you can't, so now it's like, okay, yeah, right.

34:29.56
Matt Godbolt
Yeah, you can't, yeah. Or it's really, really, really hard.

34:31.22
Ben Rady
It's like, you're putting constraints on this problem that you can't really solve for. For example, you'd probably be better off using some specialized hardware rather than trying to do this on a commodity machine.

34:40.98
Ben Rady
If you truly have a constraint of, you know, let's just say 10 microseconds or something like that.

34:47.38
Matt Godbolt
This is to an extent, I mean, obviously, microcontrollers have been around longer than computers because they've been in things or that's the original design.

34:54.09
Ben Rady
Mm-hmm.

34:54.14
Matt Godbolt
But that is why the brake disc controller in my car is not controlled by the one PC that is, like, running the AV display in the middle panel, right?

35:05.19
Ben Rady
Right.

35:05.25
Ben Rady
Yes.

35:05.30
Matt Godbolt
It is like a very specialist piece of software running on a very specialist operating system, which probably isn't an operating system by anyone else's definition of operating system.

35:15.20
Ben Rady
Right.

35:15.65
Ben Rady
Right. Right.

35:15.76
Matt Godbolt
It's just a library you link against and then sets up the interrupt vector and hands you a few nice abstractions, good luck.

35:22.62
Ben Rady
Yeah. Yeah.

35:23.54
Matt Godbolt
But that's why it's doing nothing but monitoring the disc, because then finally you can potentially say, all right, we can upper bound this stuff. We do know what the maximum latency could be, and I can sit and work it out on paper, because this is an in-order ARM and it only has an L1 cache and, whatever, you know, there are things that you can say about it and you can bound it that way.

35:47.84
Ben Rady
Mm-hmm. Mm-hmm.

35:49.26
Ben Rady
Okay. So I've got, we've got our inputs. Maybe we've had some discussions about our constraints here and we're like, all right, well, we can't do this to a maximum level of 10 microseconds.

36:00.98
Matt Godbolt
Mm-hmm.

36:01.10
Ben Rady
That's insane. We could maybe do something where our, like, P95 is maybe less than that, or more than that.

36:07.80
Matt Godbolt
Let's just say, I mean, yeah, you and I, we don't know.

36:11.37
Ben Rady
Let's just say something like that. Yeah. And then,

36:13.74
Matt Godbolt
We're making this up as we go along as if it wasn't...

36:15.26
Ben Rady
Yeah, and then our maximum is something much more reasonable, like a second, right?

36:20.01
Matt Godbolt
Okay, yeah.

36:20.76
Ben Rady
It's like, all right, it's never going to take more than a second. If it does, then we have a bug and we should fix it, right? As opposed to, yeah, that's how the system works, right?

36:32.24
Ben Rady
So now we've gotten our message from the outside world. We've gone and looked up the value that we've cached. Hopefully that's an L1 cache and hopefully we've done some smart things to make sure that it's there. But you know again, this is why we have these ranges of latency.

36:49.08
Ben Rady
How do we write out the response? Do we have the same sort of operating system packet issue when we're trying to write this thing back out?

36:58.16
Matt Godbolt
I mean, yes. If we're talking about standard, like, I mean, we keep changing what it is we're doing. Is it a dog that's falling over? Is it a UDP packet going into a...

37:05.06
Ben Rady
Yeah, yeah, yeah.

37:05.28
Matt Godbolt
Yeah. But, you know, like, if we're going to send a packet back out again, then yes, there's a similar dance on the way out through a traditional operating system where you write to a bit of memory that you can read and write in user space, and then you say, hey, send it, please.

37:19.10
Ben Rady
Mm-hmm.

37:19.22
Matt Godbolt
And the kernel goes, well, I need to send it. I can't hand this bit of memory to the network card, because this bit of memory is just wherever you decided to put it.

37:30.32
Matt Godbolt
So maybe on the stack, whatever. And certainly the DMA engine for your network card does not have carte blanche to read any piece of memory anywhere in the whole system for all the obvious reasons of like separation of concerns and to do with some of the ways that they work internally.

37:46.91
Ben Rady
Mm-hmm.

37:47.20
Matt Godbolt
So the kernel has to copy it somewhere else in a buffer where it knows it could hand the address of that to a network card. And then it returns, potentially returns back to you and says, yes, it's sent, even though it isn't. It's sat in a buffer. Time will pass. Eventually, the kernel will decide to schedule sending it. I mean, probably it will actually send it immediately, all things considered, unless the network card's already busy doing something else. You know, there's these...
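
From user space, the "write it, then say send it, please" step looks roughly like the Python sketch below, which sends one UDP datagram to a receiver on the loopback interface. The helper name and payload are made up. The point to notice is that `sendto` returns as soon as the kernel has accepted the datagram, which says nothing about when a real network card would put it on the wire.

```python
# Send one UDP datagram over loopback and read it back.
# Assumption: loopback UDP is available; the helper name is hypothetical.
import socket


def send_and_receive(payload: bytes):
    """Send `payload` to a local receiver; return (bytes_sent, data_received)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # Port 0 asks the kernel to pick any free port.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(1.0)

        # sendto() copies the payload into kernel memory and returns the
        # number of bytes accepted. The user-space buffer can be reused
        # immediately, whether or not the data has actually left the machine.
        sent = sender.sendto(payload, receiver.getsockname())

        data, _addr = receiver.recvfrom(2048)
        return sent, data


if __name__ == "__main__":
    print(send_and_receive(b"sit"))
```

On loopback no network card is involved at all, so this only illustrates the system call boundary and the copy, not the PCIe hop described next.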

38:10.10
Matt Godbolt
And then the network card will be notified through some mechanism. And that's relatively expensive and requires a ping across the network. Sorry, I say a network. It is a network. The PCIe bus itself, or however your network card is plugged into your computer, is effectively yet another network.

38:26.66
Ben Rady
Mm-hmm.

38:26.74
Matt Godbolt
And there's another hop and there are... complicated things going on there. So the message is addressed to it. The network card goes, oh, you say you have a packet for me. How quaint, how lovely. Okay, I'll go and get the address from the ring buffer.

38:37.22
Matt Godbolt
Here it is. Okay, and I'll start streaming it out. And so that process can be

38:41.00
Ben Rady
Mm-hmm.

38:41.44
Matt Godbolt
relatively latent too. And so again, you're less likely to suffer from the scheduling randomness that the kernel does, because it's not like something asynchronous happened and now the kernel has to pick you from amongst all the things that it has to do next.

39:00.27
Ben Rady
Mm-hmm.

39:00.48
Matt Godbolt
It's like, now you've given it a piece of work and it's very obvious what it should be doing with that work.

39:04.17
Ben Rady
Uh-huh.

39:04.58
Matt Godbolt
But there still is some redundant copying of data. There is still some messaging to an asynchronous process, which is the network card itself.

39:14.69
Ben Rady
Again?

39:14.86
Matt Godbolt
The network card can be like, yes, sure, fine. And then it could be busy doing something for all we know, right? You know, I don't know what magic network cards are doing. Maybe it's, you know, a virtual network card that doesn't even exist as a physical thing.

39:25.24
Matt Godbolt
And really it's pretending to be a network card, but in fact it is, in a data center on a shared piece of infrastructure. And what you're really doing is talking to the hypervisor, which then goes, oh, cool, a packet from one of my many virtual machines.

39:40.64
Ben Rady
Hypervised. Yeah, right.

39:41.25
Matt Godbolt
Yeah, hypervised children, right.

39:42.48
Ben Rady
So I'm kind of, I was maybe teeing this up a little bit, but this is like somebody comes along and says, hey, Ben, you made this program to control this robot and you did it in a very silly way.

39:54.34
Matt Godbolt
Yeah.

39:54.34
Ben Rady
Why are you using a UDP network? That is not necessary.

39:58.06
Matt Godbolt
Right.

39:58.36
Ben Rady
Are there other devices that one could use, I mean, like a USB-C connection or something else that would be available to, you know, just a regular consumer, commercial computer, something.

40:12.96
Matt Godbolt
Yeah, that's it. I don't know too much about the ins and outs of how the kernel interacts with the USB. I know that the underlying USB protocol and the thing you talk with, the hub or whoever you're talking to on the end of it is sort of a negotiated thing. You can ask for, you negotiate bandwidth, you negotiate...

40:31.89
Matt Godbolt
I don't think you negotiate latency. I think you can negotiate isochronous, that is repeating things. So like if you've got a webcam plugged in, it can book, hey, I need to send this much data just all the time.

40:45.66
Matt Godbolt
And it gets sort of time on the USB bus to be able to send packets of that data. But I don't know about, like, latency. And obviously there are things like, you know, USB keyboards and mice. And, you know, gamers obviously use USB mice and whatever, but we're still talking almost human levels of latency there, you know, millisecond, sub-millisecond, I'm sure.

41:04.24
Ben Rady
Right,

41:04.28
Ben Rady
Right, right.

41:04.36
Matt Godbolt
So yeah. I don't really know. My best guess would be not to use something like USB, because again, there are some layers of indirection between you and the actual device, even if you're talking the USB protocol through the kernel somehow.

41:21.20
Matt Godbolt
But if you had, like, either, you know, GPIO, which is general purpose IO pins, on one of the more microcontrollery type systems, where you can literally just say, if I write to this memory address, it's not really a memory address.

41:36.33
Ben Rady
Mm-hmm.

41:36.56
Matt Godbolt
It is whatever data I put there is the highs and lows of these eight pins, right?

41:40.58
Ben Rady
Yeah, yeah, yeah. Right.

41:40.72
Matt Godbolt
And at that point, you're like, yeah, you're off to the races, right? Or, you know, say a serial interface, which although serial is incredibly slow and awful, it is also something where you're kind of directly attached to and you can say, like, I'm just driving it myself. Now, I'm sure that's no longer true. Yeah.

41:56.02
Matt Godbolt
Now I say it out loud, I'm sure that it's virtualized through some mechanism in the kernel. And so it's no longer the case if you open /dev/tty0 that like if you get the OK, you are the only person talking to that now.

42:07.54
Ben Rady
Mm-hmm.

42:08.06
Matt Godbolt
And still, it would suffer from that. If you read from it, then you would block and the kernel would put you to sleep until a byte came in.

42:15.84
Ben Rady
Right.

42:15.94
Matt Godbolt
So even that doesn't make sense. So GPIO is the nearest thing we have to the sit in a tight loop and wait kind of approach that we were talking about before, you know, for both input and output, you could say like, hey, you know, I'm doing a quiz show, whoever presses the button first kind of thing.

42:34.06
Matt Godbolt
And there are eight buttons and they go into the eight bits inside my controller.

42:38.56
Ben Rady
Yeah, yeah, yeah, yeah.

42:38.56
Matt Godbolt
And I just sit in a tight loop reading from it. Obviously there are better hardware ways of doing that, but I'm just sort of thinking out loud. Yeah. Yeah, I don't know. Did you have an idea?
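
The quiz-show idea can be sketched in Python with a stand-in for the hardware: a callable that returns the current 8-bit value of the input pins, polled in a tight loop until any bit is set. On a real microcontroller the read would be a memory-mapped register access; here `read_register` is a hypothetical simulation, and breaking ties by lowest bit is an arbitrary choice.

```python
# Busy-poll an 8-bit "button register" until some button is pressed.
# Assumption: read_register is a simulated stand-in for a GPIO input port.

def first_pressed(reg_value: int):
    """Index (0-7) of the lowest set bit in an 8-bit value, or None."""
    reg_value &= 0xFF
    if reg_value == 0:
        return None
    # x & -x isolates the lowest set bit; bit_length gives its position + 1.
    return (reg_value & -reg_value).bit_length() - 1


def poll_until_pressed(read_register, max_polls: int = 1_000_000):
    """Spin on read_register(); return (button, polls) or (None, max_polls)."""
    for polls in range(1, max_polls + 1):
        button = first_pressed(read_register())
        if button is not None:
            return button, polls
    return None, max_polls


def simulated_register(readings):
    """Build a fake register that yields the given readings in order."""
    iterator = iter(readings)
    return lambda: next(iterator)


if __name__ == "__main__":
    # Three idle reads, then button 5 goes high.
    reads = simulated_register([0, 0, 0, 0b0010_0000])
    print(poll_until_pressed(reads))
```

Nothing sleeps and nothing is scheduled: the loop notices the press on the very next read, which is the property the blocking serial read above does not have.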

42:49.16
Matt Godbolt
You've been driving this in a direction, and I'm wondering if you have a solution in mind that I have not got to.

42:54.94
Ben Rady
No, no, no, I mean

42:55.98
Matt Godbolt
By the way, have I got the job? Or oh...

42:58.90
Matt Godbolt
Is it too late?

43:02.48
Ben Rady
Well, you passed to the next round, but I'm going to have to, yeah, I know.

43:05.68
Matt Godbolt
Oh, okay. We'll have to see. We're now 40 minutes in, so yeah.

43:08.43
Ben Rady
Do you have any questions for me?

43:09.62
Matt Godbolt
Do I not have questions for you?

43:13.40
Matt Godbolt
We've done far too many interviews, haven't we?

43:15.72
Ben Rady
Yeah, yeah, yeah, yeah, yeah.

43:18.13
Matt Godbolt
Yeah.

43:18.74
Ben Rady
I mean, I think that, I mean, we made the full kind of round trip, and I think that's good. I think the other, and if there is a part two to this, I think the part two should be, okay, now what happens when it's the actual like calculation that is the part that needs to quote unquote go fast, right?

43:34.02
Matt Godbolt
Oh, yeah, yeah, yeah, yeah.

43:34.50
Ben Rady
Like the IO no longer dominates as it did in this example. And it's basically just this exercise. And how do you take this signal, make sure that you don't accidentally make it slow when it's being processed in the CPU and then get a thing back out.

43:47.52
Matt Godbolt
That's an interesting one. Yeah, we should.

43:49.08
Ben Rady
Now it's, okay, this is gonna take a material amount of time and we're trying to make it as fast as we can.

43:52.99
Matt Godbolt
All right.

43:53.46
Matt Godbolt
Yeah, which is probably what most folks think of when you say, can I make my code go fast, is the code part, not all the other stuff around the outside, but often that can dominate. Yeah, no, that would be an interesting one.

44:03.38
Ben Rady
Right.

44:03.60
Matt Godbolt
So yeah, okay, we'll... well, and then we've got to talk about throughput at some point, and maybe that will lead into throughput, because, you know, I think there is part of it, but we'll see.

44:13.11
Ben Rady
Yes. Yeah.

44:13.24
Ben Rady
Mm-hmm.

44:13.30
Matt Godbolt
Well, all right.

44:13.97
Ben Rady
Maybe.

44:14.02
Matt Godbolt
We rather mysteriously then, and now you and I have to look each other in the eye and go, we will remember that the next episode we record will be the continuation of this one.

44:23.12
Ben Rady
Yeah.

44:23.20
Matt Godbolt
And we won't just leave our poor audience hanging.

44:25.43
Ben Rady
Yes.

44:25.80
Ben Rady
Right.

44:26.10
Matt Godbolt
So I apologize in advance if we do, in fact, but I think this has been very interesting.

44:31.89
Ben Rady
Right.

44:32.24
Ben Rady
Yeah, that's good.

44:33.52
Matt Godbolt
Hopefully our listener agrees and we will, I will see you next time, my friend.

44:38.26
Ben Rady
Next time.

44:45.67
Matt Godbolt
And cut.