00:21.06
Ben Rady
Hey, Matt.

00:22.08
Matt Godbolt
So last time we were talking about performance and, amazingly, you and I have remembered just about enough of what we were talking about before to be able to continue on that subject.

00:32.62
Ben Rady
Mm hmm. Yep.

00:33.34
Matt Godbolt
So the story so far, as we were recapping during the intro, we have a relatively trivial piece of code and we were interested in making it go as fast as possible.

00:43.69
Ben Rady
okay

00:44.56
Matt Godbolt
Mostly by looking at the things outside of the code itself that could be taking time: IO, and shipping things magically through kernel space, or avoiding kernel space, and that good stuff.

00:55.11
Ben Rady
Mm-hmm.

00:55.36
Matt Godbolt
So then you said, well, what happens if it wasn't a trivial piece of code? What if it was like a lot of work? How do you make that go fast? And I think that's where we were going to continue.

01:07.18
Ben Rady
Right.

01:08.28
Matt Godbolt
So, I mean, how do you want to start this?

01:11.00
Ben Rady
Yeah.

01:11.38
Ben Rady
Well, so yeah, like the previous example, it was dominated by IO, right? Like ...

01:17.18
Matt Godbolt
Or you know a very simple map lookup and just RAM. And we talked about L1, L2, L3 and that kind of stuff.

01:22.70
Ben Rady
Right. Yeah, not quite adding two numbers together. You do maybe have to go out to RAM and pull some data down, but, you know, I think a Redis cache was sort of the rough idea.

01:32.22
Matt Godbolt
The mental model we had, exactly, yeah.

01:35.32
Ben Rady
Yeah. And, you know, maybe alternating with robot servo control, but I think we went off on a tangent there, yeah.

01:41.06
Matt Godbolt
That's right, we wouldn't have we both, yeah, unsurprisingly.

01:44.76
Ben Rady
So now imagine the program that I wish to make go fast, instead of being dominated by IO, is, let's say, dominated by the program itself. And I think this could be interesting in two dimensions. One is you might have something where what you're actually looking for is throughput, right?

02:07.06
Ben Rady
You're just trying to process a lot of data, or maybe you're not trying to process that much data, but the computation on that data is very expensive. And so you're trying to do that as quickly as you possibly can.

02:18.56
Ben Rady
And then there's also the dimension of just latency, where it's like, okay, IO is maybe not the dominant factor here, but the round trip time of this still matters.

02:31.92
Ben Rady
And so you have trade-offs that you now need to make. Whereas before we were just kind of optimizing for, you know, let's stay in the L1 cache and let's make sure that we you know never miss it.

02:40.47
Matt Godbolt
Right, right.

02:40.82
Ben Rady
It's like, well, we can't, this problem is too complicated for that. So now we maybe have some interesting trade-offs between IO and performance inside the CPU.

02:48.86
Matt Godbolt
To sort of concretize... concretize, is that a word?

02:51.98
Ben Rady
That's a word now.

02:53.36
Matt Godbolt
It is a word now. Something like a video game is a great example: there's a ton of stuff going on in every frame, but you want it so that when you hit the fire button,

03:01.14
Matt Godbolt
you aren't waiting 120 milliseconds before you actually finally see the fire, you know, the shot go off.

03:08.20
Ben Rady
Yeah, yeah.

03:08.20
Matt Godbolt
And, you know, there's a number of things that are sort of kernel-bypass-y in the way, but most of it is just down to computation. And can you get something done in, say, a 60th of a second, or a hundredth of a second in modern things?

03:19.34
Matt Godbolt
And then for a throughput based thing, we're talking maybe credit card processing?

03:23.04
Ben Rady
Right.

03:23.30
Matt Godbolt
I don't know, that's still IO-ey feeling. I don't know. I'm trying to think of an example that we could talk about that's, uh,

03:30.36
Ben Rady
I mean, you know, there's all kinds of problems with text manipulation. Like, you've got lots and lots of text and you're trying to summarize it, extract some signal from it, you know, run some NLP process on it.

03:44.72
Matt Godbolt
Yeah.

03:44.78
Ben Rady
That's one.

03:45.80
Matt Godbolt
Right, right. And you've got tons of data to go through and you just want to get through it quickly. But, you know, the end of any one piece of text is not particularly important.

03:52.42
Ben Rady
Yeah.

03:52.66
Matt Godbolt
Yeah, I mean, that works.

03:53.21
Ben Rady
Right.

03:53.66
Matt Godbolt
Well, so I think probably something we didn't even talk about last time or I don't remember talking about last time is the single most important thing about any kind of performance, which is measuring it and making sure you know what you're doing

04:09.96
Ben Rady
Ah, uh-huh.

04:09.96
Matt Godbolt
before you make any changes, because unlike almost any other aspect of software engineering, there are so many hidden variables and so many weird things going on, inside computers, or exogenous, or in your own code, which is more complicated than perhaps you remember it is,

04:28.99
Ben Rady
Yes. Right.

04:29.02
Matt Godbolt
or there are surprising aspects of it that the only way is to establish some kind of base truth, like,

04:37.09
Ben Rady
Mm-hmm. Mm-hmm.

04:37.78
Matt Godbolt
I have a representative amount of work and I have a harness that can run it and I can get reproducible results that are hopefully intersubjective so I can share it with my team and they can run it and they can also see that or that kind of thing first before you even start making the first change or even trying to understand the problem.

04:54.18
Ben Rady
Yeah.

04:54.24
Matt Godbolt
You just need to have a baseline.

04:56.08
Ben Rady
Right. Right.

04:57.44
Matt Godbolt
So measure, measure, measure, you know, is the key there.

05:01.21
Ben Rady
Mm-hmm.

05:01.32
Matt Godbolt
And then profiling is the next sort of aspect of it. Like, how do you know where you're spending time in your program? And that can be a surprising journey as you discover that actually it's not the computation. It's not the big O(N^2) thing that you've got. It's the fact that, unbeknownst to you, the particular function you're calling has to do a locale lookup every single time to parse a string, and you didn't realize that it was just checking:

05:27.35
Ben Rady
Right.

05:27.58
Matt Godbolt
"I wonder if the user has switched into German locale and they're expecting us to use periods instead of commas," or whatever.

05:31.58
Ben Rady
Right. It treats this identifier as a domain name and does a DNS lookup every time you call this function. Right? Yeah, exactly.

05:38.67
Matt Godbolt
Right. You know, yeah, yeah, exactly.

05:38.95
Ben Rady
Yeah.

05:39.00
Matt Godbolt
Understanding the complex code that you have written and that you're interacting with and the surprising effects it has are key.

05:46.24
Ben Rady
Yeah.

05:47.04
Matt Godbolt
So I realized we didn't really touch on that last time, but that would be like the first and most important thing is: measure everything.

05:52.66
Ben Rady
Mm-hmm.

05:52.76
Matt Godbolt
And then you kind of hit the sort of list of, you know, do I need to do this? Can I do this before? And then, you know, can I avoid it? Those kinds of things. I can't even remember. Who was it? Somebody we spoke to once had a list of, like,

06:07.72
Matt Godbolt
the sort of rules of optimization, and the sort of zeroth rule is, do I even need to do this at all?

06:12.26
Ben Rady
Man, that's a good rule for anything, honestly.

06:14.31
Matt Godbolt
But anyway, I'm getting ahead of myself. Yeah, that's true.

06:18.24
Ben Rady
Okay, so to kind of kick this conversation off, let me throw a scenario at you.

06:23.18
Matt Godbolt
That's true.

06:23.76
Ben Rady
Because everything you're saying here makes a ton of sense, right? And so like, okay, yeah, measure: measure at least twice, cut once, right? Like I'm going to measure this thing.

06:32.44
Ben Rady
I'm going to make sure that it is slow. I'm going to maybe have a profile that tells me that it is the slow part in the context of the system. It's not that it has some, you know, slowness to it that I don't like. It's like, no, the reason that, you know, the bullet isn't traveling out of the gun fast enough is because this line of code or this method or this thing is slow.

06:54.82
Ben Rady
So let's say that I'm like, all right, well, obviously what I want to do here is I want to measure it and then I want to change it. And then I want to measure it again and I want to make sure it got better, right? What do I do when I set up my little thing to measure it and I run it and I get one result, and then I run it again after having made no changes and I get a totally different result?

07:19.88
Matt Godbolt
Errr... I mean, despair!?

07:23.18
Ben Rady
[laughing]

07:24.74
Matt Godbolt
But be aware that that's very common, right? You know, benchmarking anything is incredibly difficult. And with all the variables that we've talked about, the noise on your machine is perhaps a source of that.

07:38.79
Ben Rady
Mm-hmm. Mm-hmm. Mm-hmm.

07:38.82
Matt Godbolt
You know, making sure that there isn't an exogenous input that you're actually depending on, like you say, a DNS lookup that sometimes is in the cache and sometimes isn't in the cache, those kinds of things. And then if you're on a shared machine,

07:50.90
Matt Godbolt
Obviously, I guess it depends, right? Some aspects of performance are so large, like if you're measuring the throughput of a system and you're running it for, like, 20 minutes and then you're getting results out of that, then usually the noise levels are low enough that it doesn't matter as much. But if you're doing something that's like, write a...

08:12.26
Matt Godbolt
you know, the collision detection for a video game. And you're like, okay, I have a test essentially, just like I would write a test to say, do these two things intersect, except that it's a little benchmark and I'm going to run the, "are these intersecting?" over and over again?

08:28.05
Ben Rady
Mm-hmm. Mm-hmm.

08:28.14
Matt Godbolt
And each time it takes, I don't know, 200 micros, something like that. And then I run it once and it says 200 micros, and I change nothing, I run it again and it says 10 micros. I'm like, what, hang on, this is,

08:39.80
Matt Godbolt
This is bonkers. Now you're in sort of the noise-of-the-machine area, and you are looking at things like the warmth of the cache, whether or not other processes are running at the same time as you, whether you're being kicked off the CPU, whether or not, you know, you've got Slack open in the background and someone's just posted an animated GIF and now 200% of your CPU is actually animating a stupid cat.

09:03.21
Ben Rady
Yeah.

09:03.82
Matt Godbolt
Those are the things that you immediately think about, if you can rule out some sort of internal differences like the DNS-related thing, or like maybe there's even a cache file on the disc, and whatever.

09:16.86
Matt Godbolt
And, you know, there are frameworks for running benchmarks that try to do their best to isolate your process. They will warn you if you haven't, like, turned off certain aspects of your CPU.

09:29.82
Matt Godbolt
So, for example, the CPU can throttle its speed up and down for power-saving reasons.

09:35.02
Matt Godbolt
And so that might be a reason, you know, like the first time you ran it, it ran slowly, and then it ran fast because actually the CPU woke up and was now faster, you know, the actual gigahertz clock frequency.

09:45.21
Ben Rady
Mm-hmm.

09:45.43
Ben Rady
Mm-hmm. Mm-hmm.

09:45.88
Matt Godbolt
So there's a ton of sources of noise at that level. And realistically, when I've done this kind of stuff for the more micro-optimizey stuff, it's best to try and have a dedicated resource somewhere. So we've had like CI machines before now that have been things that we can reserve, or run continuous performance monitoring on for critical pieces of code, and then just keep a graph of it. But it is noisy.

10:11.72
Matt Godbolt
There are tools you can run. So I have a friend who swears by using Valgrind, which many people know as being like a memory checker for compiled languages, but there are other modes that it has.

10:27.14
Matt Godbolt
And one of the modes that it has is to count the number of instructions executed. It kind of virtualizes the whole thing and it runs as a sort of emulator of the very CPU you're running it on, which is kind of how Valgrind itself works.

10:41.10
Matt Godbolt
but it gives you a deterministic "how many instructions did this code path take?", which is interesting.

10:47.08
Ben Rady
Interesting. Yeah. So you're not measuring it by time anymore, right?

10:49.48
Matt Godbolt
You're not measuring by time. And of course, that's not necessarily a perfect proxy, because ultimately "how long did it take?" is the thing you're trying to optimize.

10:57.10
Ben Rady
Yeah.

10:57.24
Matt Godbolt
But if you, for example, want to put something in CI, you could literally write a test that says, if this function takes more than 20 million CPU instructions, then we've probably done something dumb somewhere.

11:10.22
Matt Godbolt
Like either we've forgotten a command line switch or someone added something huge to a function.

11:15.41
Ben Rady
Mm-hmm.

11:15.64
Matt Godbolt
So it's a very... gross check, but it is one that you could consider doing. And similarly within that, there's Cachegrind, which tries to estimate what a cache might look like.

11:27.90
Matt Godbolt
And in fact, actually only today, somebody posted, I'm going to just go over to my other screen here.

11:32.35
Ben Rady
Oh, interesting.

11:32.50
Matt Godbolt
Sorry for the sound changing. Someone's just posted a tool called Cache Explorer, which tries to, yeah, exactly, tries to run a little bit of code and give you a visualization of what cache things have happened by simulating a cache.

11:48.84
Matt Godbolt
But again, we don't really know 100% what Intel's caches are doing and AMD's caches are doing, but we can make some you know guesses.

11:55.20
Ben Rady
Yeah.

11:56.12
Matt Godbolt
But those are the kinds of things that might be unstable between runs.

12:02.36
Ben Rady
Interesting. So in that world, and I think there's a few different things here. So you're talking about like you could have a continuously running performance test, continuously running in the sense that it runs on every change, runs on every build.

12:17.58
Matt Godbolt
Or once a night, you know, at two in the morning or something like that, where you can say like, hey, no one is working right now.

12:22.76
Ben Rady
Yeah, yeah, yeah, yeah.

12:22.84
Matt Godbolt
We've got all these resources that are dedicated and there's no other noise coming from it.

12:27.80
Ben Rady
Right. Yeah. And, you know, obviously you can try running those sorts of tests on dedicated hardware. You can try running the tests by not measuring time at all, and just counting instructions, which should be relatively stable, you know, even if the cat GIF is running. And,

12:41.30
Ben Rady
is your usual MO to then reuse those types of tests for performance optimization? Are you running the same tool that you would run in your CI process when you're like, oh, I need to optimize this method? Or are you building things that are more bespoke?

12:58.26
Matt Godbolt
It depends, honestly. There have been times when I've used the same tooling. You know, those microbenchmarks are a great source of... things to look at, but it's really, really easy to go wrong if you're not very careful.

13:12.06
Matt Godbolt
Again, it depends on which domain you're in. You know, we're talking again from the kind of industries that we've been in, where we usually have a very homogeneous deployment environment, where it's like, well, we have many computers that all look the same.

13:26.72
Ben Rady
okay

13:26.94
Matt Godbolt
So they have this CPU in them and we have the luxury of targeting that CPU or, you know, caring about this particular amount of RAM and number of CPUs and all that good stuff.

13:35.04
Ben Rady
Right. Right.

13:35.70
Matt Godbolt
But typically, developer workstations are not necessarily the same as the servers you're deploying to. And it can be easy to fool yourself by running microbenchmarks on your local developer hardware and then be very, very sad when it doesn't pan out when you run it in production.

13:50.90
Matt Godbolt
So you really do have to be careful about drawing too many conclusions from local stuff at the sort of scale that we're talking about, with the, you know, hundreds of microseconds

13:58.08
Ben Rady
Yeah, yeah.

13:58.08
Matt Godbolt
level. So it is antagonistic to doing, like, quick turnaround and sort of TDD-style "hey, just make it go faster." You've really got to be able to run on representative data, which is hard, because it's really, really easy to write a microbenchmark which is representative on its face, but actually

14:16.50
Ben Rady
Mm-hmm.

14:18.00
Matt Godbolt
is a little bit too best-case scenario.

14:21.12
Ben Rady
Yeah. Mm-hmm.

14:21.12
Matt Godbolt
For example, CPUs have caches, which we've just talked about, and CPUs have branch predictors. And so if you have a very branchy bit of code that you're running 100 million times in a tight loop with the same input data, to sort of get how fast this is gonna be,

14:36.81
Ben Rady
yeah

14:37.26
Matt Godbolt
A: you're hitting the same cache lines over and over again. B: the branch predictor has learned 100% of all of the branches in your benchmark and the benchmarking code around it. And so it is a perfect representation of the best case.

14:49.38
Matt Godbolt
Now, that's an important thing to understand. This is something I see people do: all sorts of distributions of these profiles. And, like, the min is an important point. It says, if everything lines up perfectly, then this is the theoretical best case that you can get.

15:04.60
Matt Godbolt
So the min gives you a piece of information, right? Then there's sort of like people talk about the mean, and I think that's a terrible representation of anything because there's so many tails in there usually.

15:14.92
Ben Rady
Yeah, yeah, right.

15:15.42
Matt Godbolt
But having an idea about what the distribution of your function looks like is important. And usually it's a horrible log-normal thing that is not very amenable to understanding. But like looking at it and kind of getting a sense of, like, are most of them less than 100 micros, or is it, you know, just the average? Yeah.

15:31.36
Ben Rady
Yeah.

15:31.62
Matt Godbolt
I don't even know what, I'm not a statistician, but I do tend to look at plots and CDFs and things of that, of what's coming out of the function under test.
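
[Editor's note: a minimal sketch of why the mean can mislead when a latency distribution has a long tail. The names `LatencySummary`, `percentile`, and `summarize` are hypothetical, and the nearest-rank percentile is just one common convention, not necessarily what either speaker uses.]

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of a set of latency samples, in whatever unit the caller recorded.
struct LatencySummary {
    double min;
    double mean;
    double median;
    double p99;
};

// Nearest-rank percentile of an already sorted, non-empty vector.
// q is in (0, 1]; the rank is ceil(q * n), converted to a 0-based index.
double percentile(const std::vector<double>& sorted, double q) {
    assert(!sorted.empty() && q > 0.0 && q <= 1.0);
    const double n = static_cast<double>(sorted.size());
    std::size_t rank = static_cast<std::size_t>(std::ceil(q * n));
    rank = std::min(std::max<std::size_t>(rank, 1), sorted.size());
    return sorted[rank - 1];
}

// Takes the samples by value so the caller's ordering is left untouched.
LatencySummary summarize(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    return LatencySummary{
        samples.front(),
        sum / static_cast<double>(samples.size()),
        percentile(samples, 0.50),
        percentile(samples, 0.99),
    };
}
```

With nine samples of 100 micros and one of 10,000, the min and median both stay at 100 while the mean is dragged above 1,000, which is the kind of gap a plot or CDF makes obvious.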

15:40.58
Ben Rady
Right.

15:40.78
Matt Godbolt
But yeah, winding back round to the cache and the branch predictor: if you don't have representative data, then when you deploy to production, you discover that every time you call your function,

15:54.36
Matt Godbolt
it's the worst case because you haven't recently called it.

15:56.88
Ben Rady
right

15:57.12
Matt Godbolt
The branch predictor has forgotten about that. The cache is not warm. Then you're hitting the worst case of everything.

16:03.42
Ben Rady
Mm-hmm.

16:04.08
Matt Godbolt
And then that doesn't dovetail.

16:05.12
Ben Rady
Mm-hmm.

16:05.26
Matt Godbolt
You're like, well, I made this improvement, but you don't see it. So it's almost impossible to do good microbenchmarks. They can only be sort of representative. They can help you along the way, but you always have to run on real hardware and you always have to run on like a real workload and measure something that you actually care about.

16:22.54
Matt Godbolt
You know, for us, in our world, it's usually, you know, the end-to-end latency of a system that we care about. So maybe you say, going back to our Redis example, let's say there was a bit more meat inside what it was actually doing. And you're like, well, I've optimized the hell out of this part, the bit that does a tree lookup for the cache key. And then there's some complicated stuff that goes on. And then we return it. And there's, you know, all this networking code that we've talked about. Well, yeah.

16:49.38
Matt Godbolt
You could probably sit and write a really good hash map implementation and all the benchmarks that go with it and think that you've done a great job. And then really you still have to see it in context, in the...

17:00.82
Ben Rady
yeah

17:01.98
Matt Godbolt
the actual Redis server, has it made a difference? Or is it actually no? Well, this is great because it uses tons more RAM than before. And in my micro benchmark, that just works a treat because I've got all the cache to myself. But when I have to share the cache with the networking code that runs around it, then suddenly it's not good anymore.

17:19.51
Ben Rady
okay

17:20.68
Matt Godbolt
I realise that's an incredibly long and rambly answer to your point, but...

17:24.58
Ben Rady
No, no, that's, I mean, that's really good. And so, you know, I can go back and forth and argue one versus the other about whether it's more important to have systems that are sort of, you know, we talk about systems that are designed to be tested, systems that are designed to be observable.

17:44.38
Matt Godbolt
yeah

17:45.30
Ben Rady
You know, you can maybe argue one dimension of observability is performance, I guess. And so you could say, like, is this system designed to be profiled, right?

17:53.59
Matt Godbolt
I think so, yeah.

17:53.88
Ben Rady
Like, can you measure its performance under real world conditions with real world data? And the trick, of course, there is that you know the more you care about performance, the more you care about not slowing down your performance by measuring your performance.

18:08.42
Ben Rady
And so having something that's running in the most real world environment, which is to say the actual production environment, that is measuring performance, you might be questioning, am I making my system slower by measuring how slow it is?

18:22.90
Ben Rady
So what are some tricks that you've used in the past to sort of break this, I don't know that it's an iron triangle, but the trade-off there?

18:28.04
Matt Godbolt
Oh, no, that's interesting. Yeah, because we haven't really talked about that. We've been talking about sort of end-to-end measurements, be it because I'm running a microbenchmark and calling a function, and then literally the microbenchmarking system is doing the, start the clock, call your function 100,000 times, stop the clock.

18:44.06
Ben Rady
Yeah.

18:44.42
Matt Godbolt
That's how long it takes. You know, that's the one thing.
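
[Editor's note: a minimal sketch of that "start the clock, call your function 100,000 times, stop the clock" loop. `average_ns_per_call` is a hypothetical helper, not taken from any benchmarking framework, and it uses `std::chrono::steady_clock` as a portable assumption.]

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Calls `fn` `iterations` times between two clock reads and returns the
// average wall-clock nanoseconds per call.
template <typename Fn>
double average_ns_per_call(Fn&& fn, std::uint64_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    // Convert the elapsed duration to floating-point nanoseconds.
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

A call might look like `average_ns_per_call([&] { result += work(input); }, 100000)`, where `work` stands in for the function under test. Real frameworks add warm-up runs and guard against the compiler optimizing the call away, and this sketch does neither, so it reports exactly the warm-cache best case described earlier.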

18:45.87
Ben Rady
Yeah, yeah.

18:46.10
Matt Godbolt
Or in the case of, you know, our Redis thing, it's like, well, I start the clock, I send a request to Redis, I get the response back, I stop the clock. You know, that's how long the request takes. That's pretty obvious. But yeah, if you need anything more fine grained than that, then usually some kind of instrumentation

19:00.70
Ben Rady
Mm-hmm. Mm-hmm. Mm-hmm.

19:01.46
Matt Godbolt
is what you want to put into your software. But to your point, somewhere along the line, that instrumentation itself takes time. And so you kind of get a Heisenberg effect of like, hey, I know exactly how long my program takes and it's really, really long, because, to a first approximation, all it's doing is getting the current time at every point in the program.

19:25.14
Ben Rady
yeah

19:25.48
Matt Godbolt
And so there are lots of tricks you can do. I mean, there are definitely relatively low-overhead timers that you can use.

19:35.96
Matt Godbolt
If you're, again, if you're in the weeds of microseconds, then it's useful in a C++ context to write yourself a little RAII style class, which is something which starts a clock in its constructor and stops the clock in its destructor.

19:51.10
Ben Rady
Mm-hmm. Mm-hmm.

19:51.32
Matt Godbolt
And then it squirrels away the number somewhere so that you know how long something took. And maybe you can accumulate over time all of the calls that fall into this scope. And those timers can be measured using the CPU's own timestamp counter, which is a relatively low-overhead number, measured in CPU cycles, that you can then later on turn back into a clock time.
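
[Editor's note: a minimal sketch of the RAII-style timer described above. `TimingBucket` and `ScopedTimer` are hypothetical names, and the sketch reads `std::chrono::steady_clock` rather than the CPU's timestamp counter, purely to stay portable.]

```cpp
#include <chrono>
#include <cstdint>

// Accumulated timing for one instrumented region of code.
struct TimingBucket {
    std::uint64_t calls = 0;
    std::chrono::nanoseconds total{0};
};

// Starts a clock in its constructor, stops it in its destructor, and adds
// the elapsed time to the bucket it was given.
class ScopedTimer {
public:
    explicit ScopedTimer(TimingBucket& bucket)
        : bucket_(bucket), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        bucket_.total +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
        ++bucket_.calls;
    }

    // Copying a running timer would record the same interval twice.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    TimingBucket& bucket_;
    std::chrono::steady_clock::time_point start_;
};
```

Declaring `ScopedTimer timer(bucket);` at the top of a scope times everything until the closing brace, including early returns and exceptions, and `bucket.total / bucket.calls` then gives an average per call. The two clock reads are the overhead being traded for that information.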

20:16.40
Matt Godbolt
I mean, nowadays, actually, the Linux kernel, if we're talking about Linux, is also pretty fast at getting at the actual real time. But the TSC is a pretty decent, fast way of getting time. So that lets you accumulate blocks, and then you can at least sort of look for hotspots in that way. But then there are many tools that you can get that will give you more information, or you can write your own tools that sort of hierarchically arrange these. So you can end up with like flame graphs, which are these time-divided blocks.

20:45.62
Matt Godbolt
They're almost like stalagmite- or stalactite-looking things, like, you know, flames, I suppose. That's what they look like, or rather, they do if you use the red colors. And so you see hierarchically where time is being spent.

20:54.77
Ben Rady
Mm-hmm. Mm-hmm.

20:54.80
Matt Godbolt
And that's a useful thing to have. So you can say, hey, I spend all my time doing this thing, but really in this function, it's because these three functions it calls are the things that take time, and you get that kind of view out of it.

21:08.02
Matt Godbolt
But there is a little bit of overhead there, because you're measuring on the entrance and exit from those functions, and maybe you're having to write that down somewhere, in like a log file or in shared memory or something like that. So those are all valid ways of measuring, but they have different amounts of overhead.

21:23.88
Matt Godbolt
There is, yeah, I was gonna say, this one other sort of trick, which is that modern CPUs have some amount of accounting that you can do within the CPU itself.

21:33.70
Ben Rady
So, look... go ahead.

21:34.72
Matt Godbolt
And there are sort of two ways of viewing that. So a traditional profiler, and by traditional I mean like every profiler I'd used until fairly, you know, until sometime at a previous company, every profiler uses a trick where you

21:55.42
Matt Godbolt
set a timer, like a CPU timer. And then, you know, a thousand times a second, you just say, where the hell is the CPU right now?

22:06.42
Matt Godbolt
What program counter do I have?

22:07.79
Ben Rady
Mm-hmm. Mm-hmm.

22:07.84
Matt Godbolt
And then you do a little bit of work to find out what the stack is. And then that gives you a sample that says, hey, you spent this much time in here. And then you can sort of infer, if you can run your task long enough, then you get this great answer of like, well, you spend 30% of your time in malloc and 30% of your time in free. And you're like, well, maybe I need to think about my memory allocation strategy. That's a great thing. That's fantastic for throughput-based things where you have a continuous workload that you care about all the time. You just want the whole thing to be faster. So you can put that on. It's relatively low overhead.
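
[Editor's note: a minimal sketch of how a pile of samples turns into "30% of your time in malloc". It assumes each sample has already been resolved to a function name. A real profiler records a program counter and a stack and symbolizes them afterwards. `sample_shares` is a hypothetical helper.]

```cpp
#include <map>
#include <string>
#include <vector>

// Given one function name per timer tick, returns the percentage of ticks
// that landed in each function.
std::map<std::string, double> sample_shares(
    const std::vector<std::string>& samples) {
    std::map<std::string, double> shares;
    if (samples.empty()) {
        return shares;
    }
    for (const std::string& function : samples) {
        shares[function] += 1.0;  // count the hits first
    }
    const double total = static_cast<double>(samples.size());
    for (auto& entry : shares) {
        entry.second = 100.0 * entry.second / total;  // then convert to percent
    }
    return shares;
}
```

The percentages only mean something once there are enough samples, which is why this approach suits long-running, throughput-style workloads.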

22:38.32
Matt Godbolt
You know, a thousand times a second sounds amazing and, jeez, you know, like really, really often. But to a computer, that's like eternities between sample points. And so if you can run it long enough, you get great results. If you're worried about latency, though,

22:51.88
Matt Godbolt
that's not much use. You typically find that you're spending 99.99% of your time in the "wait for something to do" routine.

23:01.03
Ben Rady
Right, right.

23:01.48
Matt Godbolt
And then the samples never actually land in the code that you care about. So in the Redis case, for example, you would find that you've got to optimize your epoll routine.

23:13.76
Matt Godbolt
You're like, no, I haven't. The epoll is just waiting for a packet to arrive, and I only care about it when the packet comes in. And it's like, well, that thousand-times-a-second timer never went off while you were actually doing anything useful.

23:24.30
Ben Rady
Right. Yeah, yeah, yeah. Yeah.

23:25.76
Matt Godbolt
And that's where, like, the instrumentation I was just talking about comes in, because there you get to put the instrumentation where you care about it.

23:30.99
Ben Rady
yeah

23:31.26
Matt Godbolt
But modern CPUs have the ability to do, like, essentially record... Oh, that's great. So we've got a car alarm going off somewhere in the background. I don't know if you can hear it. I can, and I've got my noise-canceling headphones on. All right, well, sorry. Sorry, editor Matt, you've now got something to worry about. [Editor Matt says, seems fine to me!]

23:48.16
Matt Godbolt
But yeah, so the modern CPUs, Intel ones, can do like a trace of execution. And so they can essentially instrument themselves, and every time a call happens, or a return happens, or a conditional branch is either taken or not taken, some tiny amount of information is recorded into a buffer. And then you can ask it to write out that buffer, and then you can post-process that and sort of infer what happened. And there is tooling, so Linux has perf, which can record this information. And then there's things like magic-trace, which is like a set of user scripts over the top of it and a nice website that lets you view it in like a timeline diagram. And that's an eye-opener, honestly. At that point, you've kind of got the whole world open to you and every single nanosecond is accounted for. And that's really quite something.

24:41.52
Matt Godbolt
So I forgot what you asked now. Again, you see, this is what happens: you just keep winding me up and watching me go.

24:48.06
Ben Rady
No, no, this is

24:48.62
Ben Rady
this is great. I mean, yeah, you answered my question exactly. So let's say that you do all these things and you find the part of your code that you're now fairly convinced is the reason that it's slow, right? And at the risk of having you just recount the Compiler Explorer origin story, now what?

25:07.28
Matt Godbolt
Well, it's funny you should say that. Yeah, the Compiler Explorer origin story is almost somewhat orthogonal, actually, you know, in some ways.

25:15.51
Ben Rady
Okay.

25:15.52
Matt Godbolt
That was, yeah, a little bit, but...

25:16.64
Ben Rady
Is it not, what instructions is this code generating?

25:19.76
Matt Godbolt
Well, it was, but that was more like arguing over whether the compiler was smart enough to make human-readable code as good as the awkward code that you used to write because compilers weren't smart enough, right?

25:32.46
Ben Rady
Right.

25:32.58
Matt Godbolt
So, I mean, we're very much in the micro-optimizations when we're talking about code generation.

25:37.47
Ben Rady
Right, right.

25:38.04
Matt Godbolt
And honestly, that's the fun bit for me.

25:40.20
Ben Rady
Mm-hmm.

25:40.20
Matt Godbolt
But very often, it's like, again, the things I can't remember.

25:44.68
Ben Rady
Yeah, it's just, you're just doing something dumb, right? Like it's not some, yeah.

25:44.68
Matt Godbolt
"Don't be stupid on purpose" is like the very first thing you should say. It's like, you know...

25:52.93
Ben Rady
Yeah.

25:53.48
Matt Godbolt
Like, am I looking this up in a map over and over and over again when I could just look it up once and remember it, right? Now, compilers are smart, but they're not necessarily smart enough to stop you from calling a whole function chain that is the "look up something in the map".

26:05.79
Ben Rady
Mm-hmm.

26:05.82
Matt Godbolt
You know, the compiler hasn't got perfect knowledge. It can't tell that no one has changed that map since the last time you looked in it.
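
[Editor's note: a minimal C++ sketch of the "look it up once and remember it" point. The function names, the `prices` map and the symbol are all made up for illustration; they are not from any real codebase discussed here.]

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Looks the symbol up on every iteration. The compiler generally cannot
// prove the map is unchanged between iterations, so the hashing and
// comparing tends to stay inside the loop.
double total_repeated_lookup(const std::unordered_map<std::string, double>& prices,
                             const std::string& symbol,
                             const std::vector<int>& quantities) {
    double total = 0.0;
    for (int quantity : quantities) {
        total += prices.at(symbol) * quantity;  // lookup every time round
    }
    return total;
}

// Looks the symbol up once and remembers the answer in a local.
double total_single_lookup(const std::unordered_map<std::string, double>& prices,
                           const std::string& symbol,
                           const std::vector<int>& quantities) {
    const double price = prices.at(symbol);  // one lookup, before the loop
    double total = 0.0;
    for (int quantity : quantities) {
        total += price * quantity;
    }
    return total;
}
```

[Both functions return the same total; the second simply does the map work once. Whether it matters is something to measure rather than assume.]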

26:14.09
Ben Rady
Mm-hmm.

26:14.76
Matt Godbolt
And so this is the kind of thing you see. But again, these are relatively small wins sometimes. um Algorithms are important, obviously. you know If you're doing something dumb, if you're searching through a massive array of things when you could use an acceleration structure, be it a hash map or something else, then that's important. But then at this granularity that we're sort of talking about, sometimes it is actually faster to go through a big array than it is to...

26:43.72
Matt Godbolt
jump randomly around in memory, which is what a hash map would potentially do.

26:48.59
Ben Rady
I really

26:49.46
Matt Godbolt
There's almost no case in which a linked list is the right answer.

26:54.14
Ben Rady
it's yeah

26:55.66
Matt Godbolt
I have one special case of where I think a linked list is the right answer.

27:01.14
Ben Rady
I really want to hear this. When should one use a linked list?

27:06.16
Matt Godbolt
So one should use a linked list when you need to keep an ordered list of things. Not necessarily sorted; let's say an insertion-ordered list of things.

27:21.81
Ben Rady
Mm-hmm.

27:22.00
Matt Godbolt
So I've got a sequence of objects, and I can only ever add them to the back of that list, right?

27:29.41
Ben Rady
Mm-hmm.

27:29.66
Matt Godbolt
But I can arbitrarily remove them from the middle of the list. And I'm given a unique identifier for each object by some external source that I have no control over.

27:42.55
Ben Rady
Mm-hmm.

27:43.16
Matt Godbolt
And it's associated forever with one of those objects. And I need to keep them in that sequence, right? But at any point, I could get a message saying, oh, number 79234 is gone. And I need to be able to find it and remove it from that ordered sequence. But I need to keep the sequence.

27:59.36
Matt Godbolt
So you've kind of, in the traditional sense, you might have a hash map to find that object.

28:09.80
Matt Godbolt
which is great. But if it's in the middle of a vector, like an array of memory, that would be cool.

28:15.76
Ben Rady
Yeah.

28:16.20
Matt Godbolt
But deleting it is not only really painful because you have to shuffle everyone beyond it down one; you now also have to go into the map of where everyone else is.

28:26.24
Matt Godbolt
And kind of update them to be in the new location that they're in.

28:29.81
Ben Rady
Mm-hmm. Mm-hmm. Yeah.

28:31.28
Matt Godbolt
So that's a pain.

28:33.15
Ben Rady
Yeah.

28:34.26
Matt Godbolt
Having a doubly linked list is disgusting, but... But given a pointer to just that object itself, all you have to do is like look at its next and prev and wire them up together and you're done.

28:47.92
Ben Rady
Yeah. And connect them together. Yeah. yeah

28:48.66
Matt Godbolt
And you're done. And so that's one of those examples where the worst case is for a single operation.

29:00.92
Matt Godbolt
So, I mean, what I'm alluding to here is me thinking about an order book. So in our world of finance, the exchange

29:08.02
Ben Rady
Man, I was just going to say this sounds like the priority queue for an order.

29:11.65
Matt Godbolt
It's a price level.

29:11.86
Ben Rady
Yeah. Uh huh.

29:12.20
Matt Godbolt
It's a price level, yeah, exactly. So you're told about these things and that happens.

29:15.34
Ben Rady
Yep.

29:15.34
Matt Godbolt
So you could imagine you've got um thousands and thousands of orders on a single level, and it's important to keep them in that sequence so you know their relative priority.

29:25.37
Ben Rady
Mhm. Mhm.

29:26.86
Matt Godbolt
Whenever anyone trades, it's always the order at the front that trades away, which is kind of the worst case for if you did keep them in a vector, right?

29:36.68
Ben Rady
Mhm.

29:36.82
Ben Rady
Yeah.

29:36.96
Matt Godbolt
But you could always store them backwards. And then that's a good case, right?

29:42.08
Ben Rady
Yeah, yeah.

29:42.28
Matt Godbolt
Maybe that works, you know, that's sort of a weird.

29:43.97
Ben Rady
Treat it a little more like a stack, basically, in a weird way.

29:45.52
Matt Godbolt
And then you're kind of like the ones at the back.

29:47.92
Matt Godbolt
Yeah, exactly. And there are also things like deques, double-ended queues, that have this sort of capability of being like multiple slabs,

29:54.41
Ben Rady
Yeah, yeah.

29:54.86
Ben Rady
Yeah.

29:55.08
Ben Rady
Yeah.

29:55.30
Matt Godbolt
where you effectively chain a linked list of slabs. You can chain a new slab on at the beginning, or unchain it. So those are all great and cool and everything.

30:06.00
Ben Rady
Mm-hmm.

30:06.83
Matt Godbolt
And it's much more cache efficient to have them laid out that way. But it does mean that the worst case is worse, right? You know, whatever it is, be it the front order or the back order, depending on which one you've optimized for,

30:23.18
Matt Godbolt
If you have to remove that one, then suddenly you're shuffling however many thousand other orders around potentially. So that's bad.

30:30.68
Ben Rady
yeah

30:31.34
Matt Godbolt
It's probably the case that the linked-list-based version is always worse, but it's consistent: there isn't a bad case, right?

30:40.52
Ben Rady
Right, right. You get the same amount of worseness every time.

30:44.56
Matt Godbolt
And so that's an interesting trade-off.

30:45.18
Ben Rady
Yeah.

30:45.60
Matt Godbolt
It's an interesting choice. And you know, I've seen a number of book implementations in my time, and there are just different tradeoffs. It's a really interesting data structure to try and optimize, and to ask the question: why would you do it this way?

30:56.66
Matt Godbolt
you know I've just started working in a new company and I've seen another way that it can be done, which I can't talk about, but is fascinating.

31:02.74
Ben Rady
Mm-hmm.

31:02.84
Ben Rady
All right.

31:03.04
Matt Godbolt
And I'm really, really excited by, but you know that's how it is.

31:05.85
Ben Rady
right Yeah.

31:06.60
Matt Godbolt
So, you know, that's an interesting one. So anyway, the algorithm can be important. That's, I think, how we got onto this.

31:15.36
Ben Rady
Mm-hmm.

31:16.64
Matt Godbolt
And then, you know, caching: not doing work you don't have to do. So I think, yeah, we've said that the questions you ask yourself are something like, do I actually need to do this at all for each thing?

31:25.02
Ben Rady
Mm-hmm.

31:25.14
Matt Godbolt
And oftentimes you discover, actually, why are we logging this out? Let's just take the log line out, right? Or put it behind a guard so it doesn't happen in production. It only turns on when we want that log or whatever.

31:35.86
Matt Godbolt
It's an expensive piece of work. The classic example here is that you format a bunch of strings and then you pass them to a debug log, which is not on in production, but you've already done all the work of formatting all the strings.
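
[Editor's note: a small C++ sketch of that classic example. The `Logger` struct, `describe_order`, and the `format_calls` counter are hypothetical, invented here just so the wasted work is visible; this is not any particular logging library's API.]

```cpp
#include <sstream>
#include <string>

// Hypothetical logger with debug output switched off, as in production.
struct Logger {
    bool debug_enabled = false;
    int lines_written = 0;
    std::string last_line;

    void debug(const std::string& message) {
        if (!debug_enabled) return;  // too late: the caller already formatted
        last_line = message;
        ++lines_written;
    }
};

int format_calls = 0;  // counts how often we pay for string formatting

std::string describe_order(int id, double price) {
    ++format_calls;
    std::ostringstream out;
    out << "order " << id << " @ " << price;
    return out.str();
}

// Wasteful: the string is built before debug() gets a chance to drop it.
void process_unguarded(Logger& log, int id, double price) {
    log.debug(describe_order(id, price));
}

// Guarded: check the flag first and only format if someone will read it.
void process_guarded(Logger& log, int id, double price) {
    if (log.debug_enabled) {
        log.debug(describe_order(id, price));
    }
}
```

[With debug off, `process_unguarded` still bumps `format_calls` while `process_guarded` does not. Many logging libraries hide this guard inside a macro for exactly this reason.]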

31:46.03
Ben Rady
Yeah, yeah, yeah.

31:46.18
Matt Godbolt
And you're like, well, hang on a second.

31:48.06
Ben Rady
Right, right.

31:49.22
Matt Godbolt
I don't need to do these. um Another thing is, do I need to do this now? So can I cache this ahead of time?

31:59.84
Matt Godbolt
Is it just something I could do at program startup? Can I make a reasonable guess as to the values that will happen in this particular thing? In which case, maybe I take some time at startup and I pre-populate a lookup table with all the possible values.
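
[Editor's note: a minimal C++ sketch of pre-populating a lookup table at startup. Counting the set bits in a byte is only a stand-in for "some calculation with a small, known set of inputs"; the function names are invented for illustration.]

```cpp
#include <array>
#include <cstdint>

// The "expensive" calculation: loops over the bits of one byte.
int count_bits_slow(std::uint8_t value) {
    int bits = 0;
    while (value != 0) {
        bits += value & 1;
        value >>= 1;
    }
    return bits;
}

// Built once, for example at program startup: one entry per possible input.
std::array<std::uint8_t, 256> build_bit_count_table() {
    std::array<std::uint8_t, 256> table{};
    for (int i = 0; i < 256; ++i) {
        table[i] = static_cast<std::uint8_t>(
            count_bits_slow(static_cast<std::uint8_t>(i)));
    }
    return table;
}

// Hot path: a single array index instead of a loop.
int count_bits_fast(const std::array<std::uint8_t, 256>& table,
                    std::uint8_t value) {
    return table[value];
}
```

[This only pays off when the input space is small enough to enumerate and the table stays in cache; a table lookup that misses cache can lose to just doing the arithmetic.]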

32:12.54
Ben Rady
Yeah.

32:12.62
Matt Godbolt
And then I say, okay, well, it's just a lookup now at runtime. There are other tricks in that world where you can say, well, do I need a perfect answer to this now? Or can I defer updating something until later?

32:25.32
Matt Godbolt
Or can I have another thread post me back occasionally, saying, hey, this is the amount of whatevers that are available? You know, can you be conservative, and get away without doing the really complicated stuff on your hot path, and then push other stuff to other threads?

32:40.88
Matt Godbolt
um yeah, I'm just, I'm trying to think of the, the sort of obvious things here and, you know,

32:46.78
Ben Rady
So, I feel like there's this almost myth, and I have to wonder if anyone has ever come up to you with questions like this: the myth of the expert programmer that has the function that is taking all the time. And they're like, I'm going to rewrite this in assembly code and it will be super fast because I have written it in assembly code.

33:13.10
Ben Rady
Like, does that actually happen? And when and why would that actually happen?

33:18.72
Matt Godbolt
So yes, yes, but it's something you have to do very advisedly. I'm trying to think about how I can say this without breaking any confidences, but I'm aware of certain critical core loops in certain circumstances where compilers just aren't good at intuiting which variables are the important variables, and no amount of tagging them can convince them otherwise. And so they forever spill onto or off of the stack for those things.

33:56.44
Matt Godbolt
In one specific case, rewriting the core loop of something, and by core loop, I mean it's a chunky piece of code, you know, a page of assembly at least.

34:09.34
Ben Rady
Mm-hmm. Mm-hmm.

34:10.42
Matt Godbolt
Writing that in assembly was a definite, definite win in terms of not spending half the time pushing and popping, or reading and writing to memory, which has its own issues to do with aliasing and stuff. We've grazed this before, in some of the conversations we've had about the clever register renaming tricks that can happen if you use registers. But as soon as you put things into memory, a whole new system has to come in, and it's a lot more complicated and slow. Anyway, so, yes.

34:38.08
Matt Godbolt
And there are also some other examples where if there are specific instructions and sequences of instructions that do specific things that you need to do, then sometimes cracking out the assembly is worth it.

34:50.52
Matt Godbolt
Although less often now that most of the instructions are available as intrinsics that you can call or library functions that have the same meaning behind the scenes and have basically been invented to wrap the underlying hardware's capabilities.

35:04.94
Matt Godbolt
And then you can phrase things in maybe a slightly tortured way, where you have to call underscore-underscore some weird thing to get some function, and assign it to some weird type, and then write your code out longhand, where it's like __add(__sub(X, Y), Z), that kind of stuff, rather than the more...

35:29.83
Ben Rady
Mm-hmm. Mm-hmm.

35:30.48
Matt Godbolt
But you know you can get it to generate the right code. But at that point, you're really just using the C compiler to write the assembly code that you would like it to write. And that's still better than writing the assembly. So to answer your question, yes, but it really, really has to be worth it because, more than anything else, you're trading off the ability to change and understand your code down the line.
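
[Editor's note: a small C++ sketch of what that "longhand" intrinsic style can look like. The `__add`/`__sub` names above are Matt's spoken placeholders; this example uses the real x86 SSE intrinsics `_mm_add_ps` and `_mm_sub_ps` instead, with a plain scalar fallback so it still compiles on targets without SSE. The function name `sub_then_add` is made up.]

```cpp
#include <array>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_sub_ps
#endif

// Computes (x - y) + z on four floats.
std::array<float, 4> sub_then_add(const std::array<float, 4>& x,
                                  const std::array<float, 4>& y,
                                  const std::array<float, 4>& z) {
    std::array<float, 4> result{};
#if defined(__SSE__)
    // The "weird type": one 128-bit register holding four floats.
    const __m128 vx = _mm_loadu_ps(x.data());
    const __m128 vy = _mm_loadu_ps(y.data());
    const __m128 vz = _mm_loadu_ps(z.data());
    // Longhand function-call form of (x - y) + z.
    const __m128 vr = _mm_add_ps(_mm_sub_ps(vx, vy), vz);
    _mm_storeu_ps(result.data(), vr);
#else
    // Fallback for non-SSE targets (an assumption of this sketch).
    for (int i = 0; i < 4; ++i) {
        result[i] = (x[i] - y[i]) + z[i];
    }
#endif
    return result;
}
```

[For something this simple a compiler will often vectorize the plain loop by itself; intrinsics earn their keep for instructions the compiler will not pick unprompted.]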

35:52.66
Matt Godbolt
It's so fragile. It is so very, very fragile once you've written it in assembly. And you have to be so sure it's right. So testing is important, and, you know, keeping a C version of it around that you occasionally race, with a new compiler, against your other, hand-optimized version, and making sure they...

36:10.53
Ben Rady
Yeah.

36:10.78
Ben Rady
I was going to say, I feel like one of the things you give up by doing this is the ability to just upgrade your compiler and then all of a sudden it's faster, right?

36:18.26
Matt Godbolt
Right, exactly. So I think the times that I've known that to have worked, and this is not anything I've done, for what it's worth. I haven't written a substantial amount of assembly code since the nineties. But where it has been known to work is where folks have taken the time to carefully write some assembly for something that's really, really important, and then keep a C version of it next to it, and then have continual races of the C code versus the assembly code on upgrades, and also correctness checks between the two.
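
[Editor's note: a minimal C++ sketch of the correctness-check half of that idea. A manually unrolled loop stands in for the hand-written assembly, and the checksum itself, the function names, and the input sizes are all invented for illustration. The timing "race" would sit alongside this in a benchmark harness and is not shown.]

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Plain, readable reference version: the one kept "next to" the tuned code.
std::uint32_t checksum_reference(const std::vector<std::uint8_t>& data) {
    std::uint32_t sum = 0;
    for (std::uint8_t value : data) sum += value;
    return sum;
}

// Stand-in for the hand-tuned version. In the scenario described this would
// be assembly; here it is just a manually unrolled loop.
std::uint32_t checksum_tuned(const std::vector<std::uint8_t>& data) {
    std::uint32_t a = 0, b = 0, c = 0, d = 0;
    std::size_t i = 0;
    for (; i + 4 <= data.size(); i += 4) {
        a += data[i];
        b += data[i + 1];
        c += data[i + 2];
        d += data[i + 3];
    }
    for (; i < data.size(); ++i) a += data[i];  // leftover tail
    return a + b + c + d;
}

// Feeds identical pseudo-random inputs to both versions.
// Returns -1 if every trial agreed, otherwise the first failing trial number.
int first_mismatch(int trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> length_dist(0, 64);
    std::uniform_int_distribution<int> byte_dist(0, 255);
    for (int trial = 0; trial < trials; ++trial) {
        std::vector<std::uint8_t> data(static_cast<std::size_t>(length_dist(rng)));
        for (auto& value : data) {
            value = static_cast<std::uint8_t>(byte_dist(rng));
        }
        if (checksum_reference(data) != checksum_tuned(data)) return trial;
    }
    return -1;
}
```

[A fixed seed keeps failures reproducible, and short random lengths exercise the awkward tail cases where hand-tuned loops tend to go wrong.]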

36:47.05
Ben Rady
Yeah.

36:47.12
Matt Godbolt
You know, we run inputs...

36:47.84
Ben Rady
Yeah. Right.

36:48.44
Matt Godbolt
we run inputs on the C code and then we compare its output against the output from the assembly code, that kind of stuff. But yeah, it's tough, man. And I mean, another example of when you have to write assembly is if you're interacting with kernel magical stuff, which, for some of the more esoteric profiling things, you might have to do: an instruction here and there. But yeah, usually no. Usually, in my experience,

37:17.74
Ben Rady
Sure.

37:18.02
Ben Rady
Mm-hmm.

37:18.30
Matt Godbolt
the flexibility you get from leaving it in C or C++, and having the ability to quickly manipulate, move things around, change things, play with compiler flags, and essentially let 40 years of other people's experience, poured into the heuristics of a compiler, loose on my code: that's usually better than most things I can come up with. And the times that I've found I can substantially beat the compiler with assembly have been times where I have been unable to explain correctly to the compiler the unwritten constraints that I'm aware of and it is not.

37:55.20
Matt Godbolt
or vice versa. It is being more conservative. you know It doesn't know that writing through pointer X and then reading through pointer Y, those two things will never be the same address, but I do. And so if I write the code that way, of course I won't.

38:08.36
Matt Godbolt
I'll read one into a register and and keep it in the register the whole time. Whereas the compiler's like, well, every time you write to Y, I have to reread X again because for all I know, Y points at X, that kind of thing.
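
[Editor's note: a small C++ sketch of that aliasing problem. The function names are invented. `__restrict` is a compiler extension accepted by GCC, Clang and MSVC rather than standard C++, and whether it changes the generated code in any given case is something to check in the compiler output.]

```cpp
// No promises made: `out` might point at `*scale`, so after each store to
// out[i] the compiler has to assume *scale may have changed and reload it.
void scale_may_alias(float* out, const float* scale, int count) {
    for (int i = 0; i < count; ++i) {
        out[i] *= *scale;
    }
}

// The programmer promises the two pointers never overlap, so *scale is free
// to live in a register for the whole loop. Breaking the promise is
// undefined behaviour.
void scale_no_alias(float* __restrict out, const float* __restrict scale,
                    int count) {
    for (int i = 0; i < count; ++i) {
        out[i] *= *scale;
    }
}

// Portable alternative: read the value into a local yourself.
void scale_local_copy(float* out, const float* scale, int count) {
    const float factor = *scale;  // read once, keep it
    for (int i = 0; i < count; ++i) {
        out[i] *= factor;
    }
}
```

[If `scale` really does point into `out`, the first and third functions give different answers, which is exactly why the compiler is not allowed to turn one into the other on its own.]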

38:16.84
Ben Rady
Yeah.

38:17.58
Matt Godbolt
Gosh, that's a lot of things to talk about abstractly 40 minutes into a podcast.

38:24.90
Ben Rady
Well, so I've got one. I'm just going to ask this question. So, talking about perf and using perf: a tool that I have also used for answering the general question of "what is my code doing?", for lots of reasons, performance optimization being one, is strace.

38:44.22
Matt Godbolt
Go ahead.

38:45.14
Ben Rady
Like, what system calls am I making? What are the arguments to those system calls? When are you using one versus the other? When are you using perf? When are you using strace? What problems are you trying to solve when you use those tools?

38:59.05
Matt Godbolt
Yeah, that's a really good point, actually. strace is like the go-to, isn't it? If something's taking a long time that I don't have the source code to, I'm going to strace that thing and see what the hell it's doing.

39:09.86
Ben Rady
Right. Yeah.

39:09.89
Matt Godbolt
And it's like...

39:09.96
Ben Rady
Going back to the, it's doing the DNS lookup or it's doing the weird file system call.

39:13.43
Matt Godbolt
Exactly.

39:13.46
Ben Rady
Right. Yeah.

39:14.76
Matt Godbolt
Exactly. And so if we're talking about very high performance things like we have been, then if you're doing it right, strace will show you nothing because you shouldn't be interacting with the operating system.

39:26.15
Ben Rady
Right.

39:26.98
Matt Godbolt
You know, the instant you're calling fwrite to write to a file, you've already lost, right?

39:32.61
Ben Rady
Mm-hmm.

39:32.80
Matt Godbolt
That's not a high performance piece of code, right?

39:34.81
Ben Rady
Mm-hmm.

39:35.08
Matt Godbolt
And that's perfectly valid. I mean, obviously, there are loads of bits of high-throughput code that are going to call fwrite. And then, you know, you do want to look at an strace. So maybe that's your answer. If you're looking for latency, you won't see anything. But for throughput, it might be a really good indicator of, like, am I spending a lot of time waiting for the file write to finish happening?

39:55.72
Matt Godbolt
So strace is very valuable for that kind of thing. And there are other tools as well. There's various eBPF-based stuff that I haven't had much personal experience with. My knowledge goes back to SystemTap, which is another similar thing where you can hook into more parts than just system calls. You can say, like, hey, tell...

40:14.34
Matt Godbolt
call this function, effectively, in like a funny little scripting language, every time there's a page fault. And then I can aggregate page faults, and I can see what the heck's happening there, and have it give me the stack. And then I can work out, hey, wait a second, we've just allocated a massive slab of memory and then we get all these page faults. What's going on? And you realize, oh, of course,

40:32.34
Matt Godbolt
Although I think I've allocated a big slab of memory, the operating system has just given me a big gaping empty hole of virtual address space.

40:38.94
Ben Rady
Mm-hmm.

40:39.20
Matt Godbolt
And then every time I read or write to it, it decides to now fault in the 4K page. And that takes a bit of time.

40:45.09
Ben Rady
Yeah.

40:45.40
Matt Godbolt
I'm like, no, no, no, no! I don't want to do this. It's a low-latency thing. Look, at program startup, I'd like you to do all of that, please. And then I don't have to pay for it later on.

40:52.87
Ben Rady
Yeah.

40:53.40
Matt Godbolt
Those kinds of things. And that kind of stuff you can find from, like, SystemTap, or DTrace on other operating systems, or again, there's something-something eBPF. So yeah, there are other tools available for sure.

41:05.82
Matt Godbolt
And I know that things like SystemTap and eBPF-based things can also patch function calls in the kernel, and I think in your user code as well. You can put trace points in.

41:16.64
Matt Godbolt
And so you can kind of add some dynamic things where you can say every time this function is called, I want to, I want to know about it.

41:21.71
Ben Rady
Nice.

41:21.86
Matt Godbolt
I mean, funny true story: it's not the worst profiler in the world, for throughput-based things, to just run it in GDB and hit Control-C and say backtrace. Where are you now? Okay, continue.

41:34.58
Ben Rady
You're sampling.

41:35.07
Matt Godbolt
Control C.

41:35.24
Ben Rady
You're just acting as a sampler, right?

41:37.24
Matt Godbolt
Yeah. Exactly. But it's a great...

41:38.76
Ben Rady
Sometimes it works.

41:39.14
Matt Godbolt
Exactly. And while we're just thinking about these things, another performance investigation technique is to just single-step through your code. And if *you* get bored of stepping through stuff, then your CPU is taking too long.

41:55.93
Ben Rady
Sympathetic profiling.

41:57.25
Matt Godbolt
That's right.

41:57.68
Ben Rady
How do I feel when this code runs? Do I feel fast?

42:02.50
Matt Godbolt
Does it feel good?

42:03.22
Ben Rady
Nah, it's just, yeah.

42:03.48
Matt Godbolt
I mean... But no, joking aside: especially with the layers and layers of indirection that things like C++ can give you, if it hasn't inlined everything, and the compiler hasn't heuristically determined that it's valuable to keep inlining until it nets out and says, oh well, actually, this is just, you know, return 2.

42:24.60
Matt Godbolt
Then sometimes you'll find yourself stepping into functions that step into functions that step into functions. You're like, what? How far down is this going? And then eventually you get to the return 2 and you're like, okay, I need to do something about this, right? This is not right. And again, I say this and people say: but in order to debug it, surely you had to do a debug build, and debug builds aren't fast.

42:47.78
Matt Godbolt
It's like, no, we have to separate the idea of optimization, you know the -O1 -O2 -O3 of like a C++ build from leaving the debug symbols around.

42:58.73
Ben Rady
Yeah,

42:59.14
Matt Godbolt
And knowing that you can still have a completely optimized binary that has at least somewhat useful debug information, even though inlining has happened and code has been moved all over the shop. And so if you actually single-step through it, you'll see your poor cursor inside your source code jumping all over the place.

43:15.08
Ben Rady
yeah yeah

43:15.16
Matt Godbolt
But it still gives you some idea about what happened and where time is being spent and what things are going on. Yeah, so, you know, for me, I always have debug symbols around because it's like, why wouldn't I? Now, obviously, if we were shipping to external customers, then maybe we wouldn't send them.

43:31.94
Ben Rady
Yeah, you're leaking that information or maybe your binaries are just a little bigger that way. And if you're sensitive about the space, but otherwise, why would you not?

43:39.44
Matt Godbolt
Yeah, and it does slow down some parts of the build. You know, it can slow down the build in some cases because it's a lot of debug stuff to go

43:42.56
Ben Rady
Yeah. Yeah. Okay.

43:42.56
Matt Godbolt
on. But there are clever linkers that can do tricks and things.

43:44.94
Ben Rady
Yeah.

43:45.61
Matt Godbolt
And yeah, so.

43:46.30
Ben Rady
Yeah. Okay.

43:46.76
Matt Godbolt
But yeah, that's why strace. What was the question again?

43:49.82
Ben Rady
No, I love that point of, like, if you're in latency-sensitive code and strace gives you anything, well, there's your problem, right?

43:56.75
Matt Godbolt
Yeah.

43:57.08
Ben Rady
Like, that's great.

43:58.28
Matt Godbolt
Yeah. Now, although of course, you know, with multiple threads, you have to be a bit careful because, you know, other threads could be doing system calls and that's, that's fine.

44:03.38
Ben Rady
Right. Right.

44:03.94
Matt Godbolt
We didn't really talk about threads as well.

44:05.07
Ben Rady
Yeah. Yeah.

44:05.14
Matt Godbolt
You know, there are all sorts of horrible thread-sharing-related issues that can come from that, and the fact that waiting on a mutex, or not waiting on a mutex, is sometimes an operating-system-level thing, and sometimes it's a futex, and sometimes it's sort of slightly outside of the kernel so that things can be, you know, so

44:22.60
Ben Rady
Mm hmm.

44:23.44
Matt Godbolt
It's complicated, man. I think that's the short version. There's a lot. It's a deep topic.

44:28.78
Ben Rady
Well, um we've gone 45 minutes on this so far.

44:31.98
Matt Godbolt
Yeah, I think there's a part three, isn't there?

44:35.16
Ben Rady
Maybe we do part three. I don't know.

44:35.24
Matt Godbolt
Maybe.

44:35.24
Ben Rady
I'm not going to commit to that yet, but that could possibly be a thing in the future.

44:39.40
Matt Godbolt
Yeah, we could certainly consider that. We'll see how this goes out.

44:43.98
Ben Rady
Yeah.

44:44.44
Matt Godbolt
Yeah.

44:44.90
Matt Godbolt
Well, we better stop because, yeah, I've got to edit this. And I'm lazy [Editor Matt certainly is...]. I don't want to have to do more than 45 minutes. Have you any idea how awful it is listening to yourself gabble and realizing all the mistakes you made while editing?

44:57.25
Ben Rady
Yeah.

45:00.66
Ben Rady
Yeah, you've got to listen to it on repeat, trying to cut out this "um" that I said, and it's just like the worst thing in the whole world.

45:04.99
Matt Godbolt
you know There's only so much I can do in the edit to make myself sound intelligent.

45:05.22
Matt Godbolt
Yeah, I'm sure our listeners have noticed that recently I've stopped cutting most of the things out because life's too short.

45:11.80
Ben Rady
Yeah, nah, it's better this way. Au naturel.

45:14.76
Matt Godbolt
Yeah, our listener is sat at the table next to us in a restaurant while we're having this kind of conversation anyway, and just listening in. And that's fine by me.

45:23.46
Ben Rady
Yeah, that's the way to do it. All right, should we call it there?

45:27.08
Matt Godbolt
Let's call it there, my friend. So I will see you next time.

45:30.12
Ben Rady
Until next time.