WEBVTT

1
00:00:00.020 --> 00:00:01.060
<v Matt Godbolt>I'm Matt Godbolt.

2
00:00:01.060 --> 00:00:02.120
<v Ben Rady>And I'm Ben Rady.

3
00:00:02.120 --> 00:01:10.060
<v Matt Godbolt>And this is Two's Complement: a programming podcast. Hey Ben, we've spoken a lot about how important it is to have a fast feedback cycle. That's easier said than done in some languages. Um, I spend most of my time in compiled languages like C++, and they're well known for having really slow builds. Now I'd like to talk a bit about how it might be possible to make our life easier and whether there are things we can learn from other languages or other approaches that you might know, or if there are some tricky tricks that I know from C++ that we can talk about, to just basically try and give people an idea about how one might test or run or deploy your compiled language project, or indeed any other language project, as quickly as possible and get that feedback, that rule of eights thing that we've talked about a number of times.

4
00:01:10.060 --> 00:01:24.920
<v Ben Rady>Uh, yeah, I think that would be cool. It might be kind of important to talk about, uh, when we say, like, feedback cycle, what do we mean by that? Like, what are some examples of feedback cycles in software development and, you know, how do you speed them up?

5
00:01:24.920 --> 00:02:40.340
<v Matt Godbolt>So when I'm developing, I'm a dyed-in-the-wool IDE person, as you know, having worked with me, and I love being able to make a small change to my code and then very quickly find out where I've gone wrong. And very, very often that is either the squiggly lines in the IDE itself, where it's worked out that, hey, you know, you've mistyped something, or maybe the green squiggly lines where the linter has run and said, you know what? That's probably bad practice. Or mostly it's, I hit build and very quickly I get the result back from the compiler that says you're a fool, you're missing a semicolon, you know, that kind of thing. So that's the first level of feedback that I'm looking for: am I on the right track as a developer, just on the nuts and bolts stuff about, have I got my syntax right? But then once I've got something working and building, I want to know whether it's right and, you know, we've talked a bunch about testing, and that basically means running automated tests or indeed just running it and kicking the tires myself and saying, is this what I wanted? So that's what I'm thinking about when I'm thinking about feedback. I think I'd like to go from the point where I've made a change in my code to seeing if it works, however we define that, as quickly as possible.

6
00:02:40.340 --> 00:03:15.500
<v Ben Rady>Yeah. In fact, you know, there's kind of a model of software development where it's really nothing more than just a number of feedback cycles, all increasing in length, right? From like typing the keys and seeing whether or not you got a syntax error to, you know, saving and maybe compiling and running tests to see if you have a logical error to starting up the system and maybe doing some manual testing or some integration, automated integration testing to see if you have a system level error to, you know, giving it to a user and making sure that the user got what they expected. Deploying it, you know, you can kind of go further and further out.

7
00:03:15.500 --> 00:03:55.220
<v Matt Godbolt>I guess. Yeah. They're all part of the same thing, in that each is a little bit further down the line to the finished product than the last step. But if you think about it holistically, then each of those is important to be able to get to quickly. Like deployment, for example: that's often something which is grafted on at the very, very end and is a massive pain, especially with, um, binary deploys, like in a compiled language. Sometimes you've got to deploy a particular build and it has certain dependencies, and if you think about that at the end, it can be a nightmare. But if you start from the beginning, then you're always deploying and it's not such a problem.

8
00:03:55.220 --> 00:04:46.380
<v Ben Rady>Yeah. I mean, you can think of all of these things, I think as sort of, um, spending a little bit of time to get confidence that you are ready to spend a little bit more time, right? Like you, you know, from a very naive point of view, as a, as a, you know, almost not a person not involved in programming at all, you could just say, well, why don't you just sit down and type out all the code from beginning to end, exactly correct, and then give it to me right. In the, in the final finished form. And, you know, we obviously know as programmers, that's not how it works, but you know, it's, it's maybe useful to think about, okay, well specifically, why doesn't that work? Right? Like what do you actually need to do to, you know, deliver working software? And I think a lot of it is the recognition that it's, you know, I would say it's basically impossible to build large things correctly.

9
00:04:46.380 --> 00:05:26.460
<v Ben Rady>You can only build large things out of small things and try to build the small things correctly. Um, and I mean, you know, maybe the lines that you draw between big and small can vary from person to person or team to team or problem to problem. But I think that's kind of mostly true. And so the feedback cycles are just a way to define small things and to try to buy a little bit of time by saying, well, I'm going to do some very, very small things and check that they're correct. And then do some more small things and slightly bigger things and slightly bigger things. So builds are obviously a big part of that, and how you set up those feedback cycles is a big part of that.

10
00:05:26.460 --> 00:06:18.720
<v Matt Godbolt>Yes, particularly in compiled languages. And let's face it, even JavaScript these days: when I'm hacking on my hobby project, I spend most of my time waiting for webpack to do its magic and convert my TypeScript to JavaScript or do whatever. So, like, everything is compiled these days. So there's definitely something to be said for getting that fast feedback cycle. Um, in C++ in particular, I've worked on very large projects. Back a decade and a half ago, when I was working in the games industry, there were giant, giant megalithic projects that took a very long time to build, and that made testing hard. It was a big deal, you know, you would build the code, you'd go and make a cup of tea. You'd come back and hopefully it had actually built and linked. And then you'd deploy it through a serial cable to, like, a Dreamcast on the other end of it.

11
00:06:18.720 --> 00:07:24.560
<v Matt Godbolt>And then you'd run it and you'd be like, Oh crap. Yeah, I've got that the wrong way round, haven't I? And that was bad. Nobody really wanted that situation. And so, along with a colleague of mine at the time, I was actually inspired. Uh, we started taking some approaches that, uh, a guy called John Lakos, who now works at Bloomberg, uh, had come up with, as in: these are principles by which I structurally, physically lay out my software so that the build time is minimized. And that's a really interesting thing for me. For Nik and I, it was a real eye-opener that, as well as designing software for testability, not that we really were at the time, but, you know, for modularity certainly, and ease of understanding and separation of concerns, if only because then we knew that we could give this to Tim and then that to, uh, the other guy, and then, you know, we know who's doing what, and they aren't treading on each other's toes, there was this idea of structurally designing your code so that you laid out things to minimize rebuilds particularly.

12
00:07:24.560 --> 00:08:08.460
<v Matt Godbolt>And some of these are very specific to C++. For those listening who aren't as familiar with C++, you kind of have to repeat yourself in C++ in many, many ways: you declare that something exists with a particular, um, interface, be it just a function or an entire class and all of its contents, and then you can define what those things are somewhere else. And that's the header file and the CPP file. And there's a huge blur between the two of them, because you can put things in the header file that you could otherwise have put in the CPP file and vice versa. But the real trick is that anything that's calling your code needs to see the declaration, i.e., it exists and it has this form, but doesn't care necessarily about the definition.

13
00:08:08.460 --> 00:08:57.300
<v Matt Godbolt>It doesn't actually care what you're doing with it. It just says there exists a function called printf. Hey, it takes a variable number of arguments, good luck. And that's all you care about as a caller. But anytime you change that contract, anytime you modify it, well, now it takes three parameters or two parameters, or it takes an integer instead of a double, all of the callers of your function need to be rebuilt in order to be updated, to know that they have to change their calling convention or things like that. And so the trick is to minimize the chance that you're going to be changing something that's a declaration in a header file, so that you don't force everything to rebuild. And so there are a number of tricks that you can do to do this, and Lakos' book "Large-Scale C++ Software Design", I think is what it's called, was a huge eye-opener for us.
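
NOTE A minimal sketch of the declaration/definition split described in this cue.
The file names (price.h, price.cpp, caller.cpp) and the apply_discount function
are hypothetical, made up for illustration; the three "files" are shown in one
listing so the sketch compiles on its own.
```cpp
#include <cassert>
// File: price.h (hypothetical header). Callers only need this declaration.
double apply_discount(double price, double percent);
// File: price.cpp (hypothetical source). The definition lives here.
// Editing this body recompiles only price.cpp. Editing the declaration
// above would force every file that includes price.h to rebuild.
double apply_discount(double price, double percent) {
    return price * (1.0 - percent / 100.0);
}
// File: caller.cpp (hypothetical). It compiles against the declaration alone.
void check_discount() {
    assert(apply_discount(200.0, 50.0) == 100.0);
}
```
Calling check_discount() from a test exercises the function through the
declaration only, which is all a real caller in another file would see.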

14
00:08:57.300 --> 00:10:04.040
<v Matt Godbolt>And actually we realized that a lot of these things could be automated, and when the games company went down, uh, we founded a little startup to try and make this a machineable thing, an automatic, uh, you know, set of, I guess what they would be called nowadays is refactorings of an existing code base. We had a very different idea and it didn't work out. But that's a whole other episode, I think, about the failures of my startup business. But, um, what it means is that there are ways of changing your code that sometimes are, like, just free. Like, don't put stuff in header files if you can avoid it, right? Don't expose your inner workings to your consumers if they don't need to be up in your business, right? They don't need to know that I'm calling some other function. And of course, there's a sort of transitivity issue here, which is that if you are exposing your innards to another person and your innards use another library, now the person who's including you is transitively including another library. And so C++ gets a bad rap for build times. Understandably.

15
00:10:04.040 --> 00:10:40.360
<v Ben Rady>Yeah. Well, I think it is one of the few languages where, well, actually, I don't know about the few. It is uncommon among popular programming languages...I think it's a fair characterization to make...that the structure of your code has such a dramatic impact on your build. Like, I don't even think that you see that in other compiled languages. You definitely don't see that in non-compiled languages. I mean, the load time of, you know, a Ruby or a Python file is negligible compared to the execution time, a lot of the times. Um, and so even with something like Java, like you don't really think about those things, Right?

16
00:10:40.360 --> 00:11:44.840
<v Matt Godbolt>No, there's some magic behind the scenes. And some, like, very quick pass over the Java files has generated enough understanding of, like, hey, this is what the module contains, that you just don't think about it. It's hidden away from you, but it's exposed warts and all to you in C++. And in many ways, it's a feature, because in some cases I do want to expose my deepest implementation to the outside world, because I'm a generic, uh, algorithm. And if I'm a generic algorithm, genericized on some type that I don't even know yet, C++ decided to go down the route of effectively code generating with my algorithm and your type: you inject your type into my algorithm and the compiler sees it and is able to do all sorts of amazing optimizations given that. And so it's a feature that the innermost part of my binary search or whatever I've written generically is exposed to you, so that when you put in your type, that has a particular less than operator, the compiler can make a really good decision about how to optimize that.
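
NOTE A minimal sketch of the generic binary search idea from this cue. The names
contains_sorted and Price are hypothetical. The template body has to be visible
to callers (so it would normally sit in a header), and the compiler generates
a version for each type using that type's own less than operator.
```cpp
#include <cassert>
#include <cstddef>
#include <vector>
// File: binary_search.h (hypothetical). Searches the half-open range [lo, hi).
template <typename T>
bool contains_sorted(const std::vector<T>& sorted, const T& target) {
    std::size_t lo = 0, hi = sorted.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < target) lo = mid + 1;    // target is to the right
        else if (target < sorted[mid]) hi = mid;   // target is to the left
        else return true;                          // neither is less: found
    }
    return false;
}
// A caller's own type: only operator< is required by the algorithm.
struct Price { int cents; };
bool operator<(const Price& a, const Price& b) { return a.cents < b.cents; }
void check_binary_search() {
    std::vector<Price> prices{{100}, {250}, {999}};
    assert(contains_sorted(prices, Price{250}));
    assert(!contains_sorted(prices, Price{300}));
}
```
Because the whole body is in the header, any edit to contains_sorted rebuilds
every file that includes it, which is the trade-off discussed here.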

17
00:11:44.840 --> 00:12:34.820
<v Matt Godbolt>That's brilliant. That's great. And of course, if I do change the algorithm inside my, um, my binary search, of course you have to rebuild: I've changed how it works and you will be able to optimize in a different way. So that's a good thing. Um, I think a lot of, uh, I mean, just to take one simple example, it's really convenient as a C++ developer to put trivial functions into the header files of, say, a class. So if you're writing a class, you have a choice: when you declare a function, you can also define it in place, and just actually open the squiggly braces and say, well, okay, this is return my member thingamajig, right? So then the implementation is in the header file too. And that's great. And it's super convenient as a programmer, right?

18
00:12:34.820 --> 00:13:20.780
<v Matt Godbolt>You don't want to be opening two files and keeping track of these things or whatever. So you might do that, but now of course you have put your implementation in the header. And so you've exposed it to other people, which is fine. Now, there are two reasons why you might do that. One, you're lazy. And I am very lazy all of the time, so mea culpa. And two, because certainly in older generation compilers, it was the case that the only time that things could be inlined is if the compiler could see the body of a function, which again makes sense, right? The compiler needs to be able to see what the function is doing, looking into its innards and having a look at what actual operations are being done, in order to make the determination about whether it should inline something, and then ultimately to inline it.

19
00:13:20.780 --> 00:14:04.920
<v Matt Godbolt>And as we know, inlining is sort of like the uber optimization; everything runs off inlining. You inline stuff, and then you notice that things are constant, and then you can delete huge areas of code, and then you can inline some more stuff. And then before you know it, a huge complicated chain of things becomes like a simple operation. So inlining is something you definitely, definitely want to have available. So obviously you want to put some things in the header file in order to say to the world, yeah, okay, you can inline my function. But now you've got this sort of decision to make: is performance important to me? Is it really important? If it's a trivial, uh, accessor, uh, like, you know, get something, um, then return something, like, well, it doesn't hurt me to do this, so I'm going to put it in the header file.

20
00:14:04.920 --> 00:15:25.760
<v Matt Godbolt>Something I've seen that people haven't realized is that compilers have moved on, and now with link time code generation or, um, yeah, LTO, link time optimization, compilers are able to do the inlining process even with things that they didn't have access to at the point of compilation. And so what I have started doing now is putting very, very few things in headers, very few things indeed, and relying on link time optimization when I'm doing my release builds, which does take longer, but I'm not caring about that. Right. Um, for my testable builds, I really want it to build fast while remaining honest to, this is what I'm going to ship. Yeah. So there's a bit of a dichotomy there, because obviously everything you do that's different between shipping and debugging, um, is maybe problematic or could be problematic down the road. But I've found that having a very tight turnaround in debug mode, where I, uh, don't turn on link time code generation and I, um, move everything out of the header files, usually comes back as a boon for me.

21
00:15:25.760 --> 00:16:41.800
<v Matt Godbolt>So that means I can make changes, you know, hey, I need to, uh, who all is accessing this thing? I want to print out who's doing it. Or actually, no, I'm going to get rid of this, um, this field entirely, and I'm going to replace it with a calculation. Well, actually that may be a problem, but anyway, um, now I can make those changes. And then the only thing that rebuilds is my test, the CPP file that it's, um, implemented in, and then the linker has to churn and do some magic. And that's great. And then when I come to do my release build, I turn on link time code generation and it's as if I had written it the old-fashioned way, the traditional way: anything that could be inlined will be inlined if it's profitable to do so. And so it's kind of the best of all worlds. But what I definitely do when I do this is I make sure that I build and run my tests in debug and release in my CI, right? That is anathema to a lot of people. They're like, why would you do this? Right. Just build it in debug. And I'm like, well, I am sort of voiding the warranty a bit by saying, um, I'm trusting that the compiler's optimizations don't change the meaning of my code, which I think at some point you have to do. I think we've talked about this before: programming is a faith-based activity at some level. One has to trust the compiler most of the time. Right?
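
NOTE A minimal sketch of keeping even a trivial getter out of the header, as
described in this cue. The Counter class and file names are hypothetical. The
build flags in the comment are an assumption about a GCC or Clang setup, not
something stated in the episode.
```cpp
#include <cassert>
// File: counter.h (hypothetical). Declarations only, no function bodies.
class Counter {
public:
    void increment();
    int value() const;  // trivial getter, deliberately not defined in the header
private:
    int count_ = 0;
};
// File: counter.cpp (hypothetical). Bodies can change without touching counter.h,
// so only this file and the final link are redone after an edit.
void Counter::increment() { ++count_; }
int Counter::value() const { return count_; }
// Assumed flags: something like -O0 -g for the fast edit loop, and -O2 -flto
// at both compile and link time for release, which lets the optimizer
// consider inlining value() across translation units.
void check_counter() {
    Counter c;
    c.increment();
    c.increment();
    assert(c.value() == 2);
}
```
Removing or replacing the count_ field would still mean editing the header,
which is the caveat raised in this cue.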

22
00:16:41.800 --> 00:16:47.600
<v Ben Rady>You're going to spend a lot of time checking things that are already true if you don't have some level of trust somewhere.

23
00:16:47.600 --> 00:18:03.080
<v Matt Godbolt>Exactly. Exactly. So assuming that the compiler works, you can get a lot of coverage by running the debug mode, like, day in, day out. But just to be sure, just to be sure, and it's relatively low cost when you're doing a release candidate or something like that, you should run the tests in release mode as well, mainly because, um, it's not actually the compiler that will be at fault. 99 times out of a hundred, if you have a difference in the release versus the debug, it's almost certainly your own fault. C++ has plenty of traps for the unwary. Uh, if you're invoking undefined behavior, which is the sort of dreaded "you've gone off script from what the language says you're allowed to do", then in some cases the compiler is completely, uh, able to do something very, very different from what you intended, because you did it wrong. And, uh, that typically turns up in release builds when the optimizer sort of comes out. So I guess that's just an example of one of the things that one can do, uh, with a more modern compiler. And I mean, anything greater than, like, GCC five, I think, has supported this, and actually Microsoft's compilers have supported this forever.
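
NOTE A minimal sketch of one well-known kind of undefined behavior that can act
differently between debug and release builds. Signed overflow is chosen here as
an illustration; the episode does not name a specific case, and the function
names are hypothetical.
```cpp
#include <cassert>
#include <limits>
// Relies on undefined behavior (signed overflow). An optimizer is allowed to
// assume x + 1 never wraps and may reduce this to "return false".
// Shown for illustration only; it is never called in this sketch.
bool next_would_overflow_ub(int x) { return x + 1 < x; }
// Well-defined alternative: compare against the limit instead of overflowing.
bool next_would_overflow(int x) { return x == std::numeric_limits<int>::max(); }
void check_overflow_guard() {
    assert(!next_would_overflow(41));
    assert(next_would_overflow(std::numeric_limits<int>::max()));
}
```
Running the same tests in both debug and release is one way to notice code
like the first function before it ships.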

24
00:18:03.080 --> 00:18:05.460
<v Ben Rady>And the magic here is really in the linker, right?

25
00:18:05.460 --> 00:18:06.360
<v Matt Godbolt>Correct.

26
00:18:06.360 --> 00:18:07.340
<v Ben Rady>Yeah.

27
00:18:07.340 --> 00:18:42.560
<v Matt Godbolt>The linker effectively, when you're doing this, your builds only build to, like, an intermediate language. And then when the linker is invoked, as it's connecting the dots of this function calls this function calls this function, it's, well, they're very abstract. It's just a DAG almost of, like, what calls what? And then it goes, Oh, I need to reify this function now, and I have the whole program visible to me. Now, it does make for a slow link, of course. So that's another reason why you try to do this only in your release builds, but it opens the door to making your development cycle faster.

28
00:18:42.560 --> 00:19:31.040
<v Ben Rady>Yeah. And I mean, certainly, you know, this fits very well with the model that we were just talking about in terms of, you know, doing those debug builds as a way to spend a little bit of time now to give yourself confidence that you can spend more time later, right? With the release builds, you're intentionally shifting work into the release builds and making them slower because it enables you to make some of those other feedback loops, the more frequent feedback loops, faster. Right. And, you know, it takes maybe a little bit of in-your-head math to make sure that that makes sense, but it sounds like, from your experience, that makes sense quite often. Um, I have to ask though, is the whole, like, running the tests on the release build because you don't trust that it's exactly the same, is that something that's from hard-won experience? Have you seen that before?

29
00:19:31.040 --> 00:20:18.700
<v Matt Godbolt>I'm afraid so. Well, I have found compiler bugs, uh, before now in this instance, but as I say, 99 times out of a hundred I've found cases where I was inadvertently relying on undefined behavior. And I think when one has spent a couple of decades doing this, you kind of build a mental model of what the compiler is allowed to do and the kind of optimizations it's allowed to do. And so you might, on purpose or otherwise, write the body of a function in a separate file and know that the compiler, calling it from another CPP file, won't be able to see, in inverted commas, the sort of nasty trick that you're pulling in some other place. So the compiler, having compiled these two things separately, will do the right thing, but when it can see the whole program, it's like, Oh, wait a second.

30
00:20:18.700 --> 00:20:58.400
<v Matt Godbolt>You do this thing. Oh, I can throw that away then you're like, no, no, no, no, no, no. Don't do that. You know, you were wrong. I was wrong as a programmer. I was wrong. Definitely. I was relying on what was undefined behavior, but I was getting away with it, which is a dangerous, dangerous world to live in. And I definitely don't want anyone listening to this to think that it's okay to rely on undefined behavior. It is not. It's definitely, you're outside of the warranty of the compiler such as there is one, but, um, but yeah, it is hard won, unfortunately, and it's just worth doing. And I mean, oftentimes if you're going to build, if you're going to build a release version to test, you can do like performance analysis as well. And that seems like a worthwhile thing to have as a side effect.

31
00:20:58.400 --> 00:21:46.400
<v Ben Rady>Yeah. Yeah. And I mean, you know, you're kind of into a little bit of a whole release pipeline with that, right? Where, you know, again, it's these successive feedback cycles: I'm gonna run my tests, I'm gonna do my debug build. I'm gonna run my tests. I'm gonna do my release build and run my tests again. I'm going to do some performance testing and then deploy, and then deploy to one server. And I'm going to deploy to 10 servers. I'm going to deploy to a hundred servers, I'm gonna turn the feature flag on, whatever it is, right? To sort of go through those progressions of more and more confidence that things are working. But you know, you don't want to get to that point before you discover that you've, you know, flipped a sign operator somewhere or, you know, used the double equal instead of a single equal or whatever the thing might be.

32
00:21:46.400 --> 00:22:28.360
<v Ben Rady>Right. It's an intentional moving of that cost to later, only because the chances of it failing are lower. Right, right. Uh, there is like an optimal amount of time for a build to fail. It's not never, right? If your build literally never fails, why do you even have it? Right? Like, it's not giving you any new information. So you do want it to fail sometimes. Um, you just want it to fail the right amount of time for the, uh, you know, length of the feedback cycle, right? The longer feedback cycles should fail less frequently. And the shorter feedback cycles should fail more often.

33
00:22:28.360 --> 00:23:33.480
<v Matt Godbolt>It's worth saying that not every compiler supports this. I know I said GCC five and up do support this. But the more unusual compilers for embedded systems won't support this feature, the link time code generation stuff. So maybe depending on your exact situation, you might not be able to apply this all the time. And in fact, another thing that I have sort of internalized as a server developer, as I am, is that my servers are so fast that even a debug version running on my computer runs tractably fast. And you know, it can be 10, 20, 30 times slower than the release build, maybe even more. And that's still okay for me to run all my tests in, certainly the tests that I care about before I'll check code in. And that isn't always the case. If you're deploying or you're running tests on a system that is considerably slower or has timings to meet, then maybe you can't do that. Maybe you can't rely on the compiler working fast enough in release. But I think a lot of people fall into the category of being able to use these kinds of tricks.

34
00:23:33.480 --> 00:24:18.700
<v Ben Rady>Oh, for sure. And I mean, you know, one of the actually great things about working in C++ is that, as you said, you know, you're probably choosing C++ for a reason. And that reason is probably performance. Well, guess what? That means your tests run super fast if you write them correctly. Right. Like, especially in comparison to a language like, you know, Ruby or Python or JavaScript, or even Java, uh, in some cases, uh, the tests can be extremely fast. So, you know, I sort of have this benchmark in my head, which I forget if I've mentioned before, actually, like, you know, your tests, you should be able to run hundreds of tests per second. Well, in C++ it's hundreds of tests per second averaged, including your build time, is kind of how I would say that, because the actual test run should be like thousands of tests per second. Right.

35
00:24:18.700 --> 00:24:42.300
<v Matt Godbolt>Right. I mean, test systems, especially in debug, are a bit slower than that for all the reasons we discussed and, you know, test frameworks like to try and capture as much information as they can for when they ultimately fail. So there's a bit of, uh, you know, I'd say it's okay to only run hundreds of tests a second. I'd be surprised if my tests actually run that fast, I'll be honest with you. I should check. I haven't really thought to do it. But

36
00:24:42.300 --> 00:25:31.500
<v Ben Rady>Yeah. I mean, you know, I think it definitely depends on the style of tests that you write. And I, and I think that you can actually even sometimes get into a little bit of a, of a broken windows thing. Well, maybe this isn't exactly broken windows, but it's, you sort of fall into the trap of the speed of the code itself actually hides, um, some not great testing techniques that you wind up doing. Like there's this sort of subtle interaction between tests that are brittle and tests that are slow, right. Tests that are brittle tend to do things like access databases and, you know, read and write files and communicate with services and do all these other things and tests that are slow, also tend to do those things. Right. And so if you sort of listen to the speed of your tests, I don't know if that's a workable metaphor.

37
00:25:31.500 --> 00:26:43.260
<v Ben Rady>If you go with it, if you, if you pay attention to the speed of your tests, they can sometimes tell you when you've done some of that stuff inadvertently, like I have definitely been in the middle of writing tests and all of a sudden the tests got really slow and I'm like, Oh, what's going on here? I'm like, Oh, well, I'm actually reading data from this service. That's why it's so slow. I need to mock that out. Yeah. Um, one of the other things I kind of wanted to ask you is, is, um, there is, there's definitely, you know, you were talking earlier about, you know, having in C++ to sort of say things twice, right? Like there's this choice of like, do I put this in a CPP file or do I put it in a header file? And you know, I think the case that you gave it was, it was more obvious that you shouldn't be putting certain things in a header file, but I can imagine there are lots of situations in which that decision almost becomes arbitrary, right? Like maybe, you know, whether or not I should be using templates or something like that is a decision that you might, you might make. Are there, are there ways to sort of frame those decisions in terms of faster feedback and faster builds where it's sort of like, well, normally from, uh, from one perspective, these two solutions might be, you know, equal in dignity, but from a, you know, build time perspective, actually one of them is much better than the other.

38
00:26:43.260 --> 00:27:48.620
<v Matt Godbolt>That's a really good question. Um, it's definitely the case that certain C++ features mandate you putting code in the header, or almost mandate it. So you mentioned templates. That is a great example, where generic programming almost always has to go into the header file because anyone who might reasonably use that functionality needs the implementation of it in order to optimize, like we talked about before, or even instantiate it, as we said before. Um, another thing is, uh, the compile-time programming of constexpr functions, which is a sort of new-money style way of doing programming that, um, well, to take a small diversion, you know, templates were initially designed to be a generic programming tool. And then very quickly it was discovered that, um, they were themselves, the way that the instantiations were done and the way that the compiler resolved certain features, uh, was itself Turing complete.

39
00:27:48.620 --> 00:28:20.340
<v Matt Godbolt>So you can actually write a program purely using templates, which is a meta program. It's a program about a program. And that's a useful characteristic. It turns out it's essentially like an, uh, in-built code generator of a sort. And, you know, you can start doing all sorts of tricks with, like, well, okay, I want to do this if the template type that I was passed is unsigned, because I don't have to check now if it's negative, 'cause it can't be negative. And therefore I can actually write my algorithm to take advantage of that potentially.
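
NOTE A minimal sketch of the unsigned trick mentioned in this cue, assuming a
C++17 compiler for "if constexpr" and the _v type traits. The in_range helper
is hypothetical. The negative check is only compiled for signed types.
```cpp
#include <cassert>
#include <type_traits>
// Hypothetical helper: true if index is a valid position for the given size.
template <typename T>
constexpr bool in_range(T index, T size) {
    static_assert(std::is_integral_v<T>, "in_range expects an integer type");
    if constexpr (std::is_signed_v<T>) {
        if (index < 0) return false;  // this branch does not exist for unsigned T
    }
    return index < size;
}
void check_in_range() {
    assert(!in_range(-1, 10));    // signed: the negative check runs
    assert(in_range(3u, 10u));    // unsigned: the check is compiled out
    assert(!in_range(10u, 10u));
}
```
The choice between the two versions is made by the compiler per type, which
is why the template body has to be visible wherever it is used.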

40
00:28:20.340 --> 00:28:22.900
<v Ben Rady>And then eventually re-implement Lisp.

41
00:28:22.900 --> 00:28:43.280
<v Matt Godbolt>And then, I mean, do not all languages reimplement Lisp eventually? And in fact, a lot of template meta-programs look like, uh, you know, all the cons-ing and cdr-ing and all the weird Lispy type terms come up. So, um, constexpr is another way of writing a much more imperative, programmer-based meta program of a sort.

42
00:28:43.280 --> 00:29:28.240
<v Matt Godbolt>It can be used in many other contexts too, but very often, again, that means that the functions that are constexpr have to be put into a header file, because the compiler has to be able to see their body in order to evaluate them appropriately, which is great. But with those two techniques, if you start out with that being, like, I'm going to do everything that way, you basically don't have the option of pulling things into a CPP file without extreme tricks. And there are some tricky tricks for doing some of those things. And I have definitely seen a modern style of C++ which I don't write, just because I'm the way that I am. I think, you know, my journey has meant that I'm much more of an imperative, one line after another kind of programmer.
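
NOTE A minimal sketch of why a constexpr function usually ends up in a header,
assuming C++14 or later for the loop inside a constexpr function. The file name
bits.h and the round_up_pow2 function are hypothetical.
```cpp
#include <cassert>
// File: bits.h (hypothetical header). The body must be visible to any caller
// that wants the result computed at compile time.
constexpr unsigned round_up_pow2(unsigned n) {
    unsigned p = 1;
    while (p < n) p *= 2;  // smallest power of two that is not less than n
    return p;
}
// Evaluated by the compiler, so a wrong answer would fail the build.
static_assert(round_up_pow2(5) == 8, "5 rounds up to 8");
static_assert(round_up_pow2(8) == 8, "powers of two are unchanged");
void check_round_up() {
    unsigned runtime_n = 100;  // the same function also works at run time
    assert(round_up_pow2(runtime_n) == 128);
}
```
If only the declaration were in the header and the body in a CPP file, the
static_assert lines in other files could not be evaluated.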

43
00:29:28.240 --> 00:30:13.560
<v Matt Godbolt>Um, and I use templates when they are profitable, obviously profitable to me based on my sensibilities and constexpr I use as much as I can, but usually it's not by default, which means that a lot of my stuff is pushed into CPP files just because that's how I go. So I can make these statements about like, Oh yeah, just put it into the CPP file and your builds go faster. Um, and it's very hard to be in the situation where you have a template, heavy piece of code and try and reduce the build time of it. Uh, just because there isn't anything to reduce in a way. So I guess the design of your system, and I suppose this comes back to Lakos' thing, you know, like actual physical design of your system has to be factored in when you're making high-level design of how you're going to fit your components together.

44
00:30:13.560 --> 00:30:50.220
<v Matt Godbolt>If you want to choose, if you want to select for build time, then that's something you need to consider earlier on. Although in terms of some of the things that we've been talking about, it occurs to me now that another sort of dimension is if I am, in fact, changing a template-heavy piece of code, let's say it's the binary search that we sort of made up earlier. So I'm editing the binary search and I want to make sure that I haven't broken it, right? I mean, it seems like no one's ever written a faulty binary search. I know it's a trivial thing to get right.

45
00:30:50.220 --> 00:30:52.680
<v Ben Rady>Especially in interviews, perfect every time.

46
00:30:52.680 --> 00:31:08.260
<v Matt Godbolt>Every time without fail. Uh, but you will have a test file for that almost certainly. And so now the way that your build system is set up, contributes to how quickly you can iterate on, did I break my binary search, right?

47
00:31:08.260 --> 00:31:58.160
<v Matt Godbolt>Because if your build system is like the sort of default go-to, make or CMake or whatever, where you just sort of say "make test", it's going to run all the tests. And if you've just touched binarysearch.h, sure as heck 97% of your code base needs to rebuild for you to run the one test that tests it with, you know, ints and floats and doubles or whatever your binary search test cases are. And what you really, really want to do is build just that one test and run that one test. Right? And that's hard to do in C++ because there's a lot of complexity in the way the build systems work, and almost necessarily you group things in modules and then sort of treat them as an atom and build and link that one library together.

48
00:31:58.160 --> 00:32:48.620
<v Matt Godbolt>And so in my own projects, I tend to make one library per subdirectory, and then I try and put minimal stuff in each subdirectory. And I have a test for that library and then branch out, but it's not always possible to do that. Uh, and I mean, there are certainly the cases of some of the, um, bigger code bases where everything's in everything else. It may be hard to extract sensible modules, which I think comes back to the design stuff we've talked about before, where if you design for, say, testability, this is another metric: you measure testability by how isolated can I make this object? And it's a test at the build level, which again is like a physical property, not a conceptual thing. It's like, no, can I just type "make binary search test", or go into my IDE and hit just that, and know that I'm not even compiling the rest of the code?

49
00:32:48.620 --> 00:33:30.580
<v Matt Godbolt>That could be something worth thinking about. I don't have an answer for that, but, you know, one tries to organize one's code as best as one can. Actually, I do know, uh, I have an ex-colleague who works for, uh, another financial company. And I think he's trying to open source a build tool, a C++ build tool that essentially starts from a test: you point it at the test and say, can you run that test please? And it kind of backs out the build tree, sort of by following all the include files and working backwards, and goes, well, this is the minimum set of things I need to have built to be up to date in order to run that one test. Which, if he can open source it...

50
00:33:30.580 --> 00:34:36.840
<v Matt Godbolt>And if it works as well as he described it to me, would be a huge boon, to say, no, just those tests, please. And there are some subtleties because, of course, everything's more complicated in C++: you've got like global objects and all sorts of nonsense like that. But that kind of thing is maybe a sea change, pardon the pun, in the way that one tests one's code, if you can literally point to that. And, in fact, do refactorings. How many times in C++ have you gone, I really want to change this interface, and I want to test it with a subset of my code, but everything's broken now because I broke the interface and, again, 97% of my code now fails to build, and I don't want to spend the time updating it until I'm sure that this is the right step for me to take. And so I want to make sure my tests pass first, so I haven't broken the intent of what I'm changing. And then I want to test like a subset of my code. Does it smell right? Does it feel right? Can I run the tests in that part of the code before I then commit to rolling up my sleeves and dealing with the 3000 compiler errors in the rest of the code base? And sometimes that's hard to do. So a build system maybe contributes to that.

51
00:34:36.840 --> 00:34:55.380
<v Ben Rady>Yeah, definitely. I mean, absolutely. This is scary. That's a scary prospect of like, you know, having to make a change to an interface and not really having confidence that it's correct until you've done an hour's worth of work. Right, right. Like,

52
00:34:55.380 --> 00:35:52.180
<v Matt Godbolt>Unfortunately that is the way of C++, where you make the change and then you type make and start fixing every single error you've got, one after another. And then you go, that was worthwhile, which of course leads to that kind of false dichotomy as well, where, having made that change, if you're on the fence about whether it was worthwhile or not, you're definitely going to say, I'm keeping that, that was an hour of toil. Right? I'm not undoing it. Right. So to an extent, you know, um, and again, as an IDE-friendly person, I'm starting to trust IDEs more and more with some of the refactorings, which I would never have done in C++. And that can be a superpower with some of these changes, because you can say, hey, add a new parameter, and there's a danger that it gets it wrong, of course. But, um, in which case, undoing that change doesn't feel quite as personal. It doesn't feel like a failure as much when you discover actually, no, it was not the right thing to do. But that's probably a whole other conversation for another time.

53
00:35:52.180 --> 00:36:48.940
<v Ben Rady>True, true enough. But yeah, those automated refactorings are very powerful. But yeah, I mean, I dunno, I think that we've talked a lot about some of these things sort of being more about philosophy than they are about technique, right? Like, you know, there's a lot of smart programmers out there doing clever things. If you start with the philosophy, the techniques will come, right? You'll either find them, um, because you sort of have that desire for them, or you'll invent them, or you'll borrow them. Um, but whatever it is, like, if you don't have that buy-in of like, no, we're not going to have the situation where every time I change, you know, my binary search implementation or my binary search algorithm, I'm going to just go get a coffee for 30 minutes and then come back later and figure out if anything is broken, right? Like you gotta design it from the beginning. Um,

54
00:36:48.940 --> 00:37:08.700
<v Matt Godbolt>This is the whole, there's an XKCD for this, which tells you how endemic and problematic it is in the industry: sword fighting... on chairs. And just for the record, no one should really be writing their own binary search. There's an STL implementation that is perfectly good. So use that. Even if it's confusingly named, just use that. Yeah.
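
NOTE
A small sketch of leaning on the standard library instead of a hand-written
binary search. The naming remark above is presumably about std::binary_search
only returning a bool, while std::lower_bound is the call that yields a
position; that reading is an assumption. find_index is a hypothetical helper
name, not a standard function.
```cpp
#include <algorithm>
#include <vector>
// Returns the index of `value` in a sorted vector, or -1 if it is absent.
inline long find_index(const std::vector<int>& sorted, int value) {
    // std::lower_bound performs the binary search: it returns an iterator to
    // the first element that is not less than `value`.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (it == sorted.end() || *it != value) return -1;
    return static_cast<long>(it - sorted.begin());
}
// std::binary_search answers only "is it there?", with no position.
inline bool contains(const std::vector<int>& sorted, int value) {
    return std::binary_search(sorted.begin(), sorted.end(), value);
}
```
Both calls require the range to already be sorted.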

55
00:37:08.700 --> 00:37:11.240
<v Ben Rady>Yeah. All the more reason. Right.

56
00:37:11.240 --> 00:38:23.460
<v Matt Godbolt>And I mean, so yeah, there are other tricks, of course. Um, but some of the other tricks that one might use to reduce one's, uh, coupling in C++ do have a runtime impact. Now I've sort of said about pulling stuff out of headers and then relying on link time code generation to kind of undo that, that is one thing, because that essentially nets out these days, I think. But sometimes you can extract an interface and declare it and define it literally as an interface, like an actual virtual interface thing, knowing that everyone who's ever calling you is now going to be cursed to go through a virtual function call to get to your implementation. But now that acts as this beautiful disconnect, both in terms of the pure interface, like from a design standpoint, and from the, well, now I can substitute any old object that I like, and I've built that separately to you. It may not have even been from the same build process. It could be from some other thing completely different. And so I've insulated you from changes to my implementation at the very highest level. And that can be a powerful technique. And then just because I love this stuff so much, and I'm sorry to get excited about more compilation trickery, even.
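
NOTE
A hedged sketch of the "actual virtual interface" idea. All names here
(FileSystem, MockFileSystem, describe) are hypothetical and only echo the
file system example used later in the conversation. Callers compile against
the abstract class alone, so the implementation can change or be swapped for
a test double without rebuilding them, at the price of a virtual call.
```cpp
#include <set>
#include <string>
// file_system.h: the only header callers need to include.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual bool exists(const std::string& path) const = 0;
};
// Caller code: depends on the interface, never on a concrete class.
inline std::string describe(const FileSystem& fs, const std::string& path) {
    return fs.exists(path) ? "present" : "missing";  // virtual dispatch here
}
// mock_file_system.h: a test double that could be built separately.
class MockFileSystem : public FileSystem {
public:
    void add(const std::string& path) { files_.insert(path); }
    bool exists(const std::string& path) const override {
        return files_.count(path) != 0;
    }
private:
    std::set<std::string> files_;
};
```
A real implementation that touches the disk would derive from FileSystem in
the same way, in its own source file.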

57
00:38:23.460 --> 00:38:27.560
<v Ben Rady>That's why we're doing the podcast, is to get excited about this stuff.

58
00:38:27.560 --> 00:39:17.720
<v Matt Godbolt>That is a very fair point, mate. But even that, which seems like it's an irredeemable change to the way you've written your code, and you've put this massive doorstop in between, I dunno if doorstop's the right thing. You've put this massive, uh, barrier for the optimizer between my implementation and your calling through an interface. Compilers are just starting to get clever enough to see through that now. In managed languages like Java, um, this kind of trick has been able to be done for a while at runtime. As you're calling a virtual method, it kind of goes, hey, you know what? This is always concrete file system. I wonder if I should just call concrete file system directly and then inline it and then put a check. And if it's not a concrete file system, then immediately, like, go, oh, nope, we're done here.

59
00:39:17.720 --> 00:40:12.560
<v Matt Godbolt>We have to de-optimize, and do something else. But most of the time, as long as it is just a concrete file system, then it's as fast as if I'd written a direct call. C++ is starting to pick up on this. There is some devirtualization going into Clang and GCC. And a lot of that stuff is getting more and more sophisticated. It's not a total panacea yet, but I would imagine that as time ticks on, it will become more and more possible to rely on the compiler doing magic for you. And that way I can write my code with an interface between things to separate it, both from a testability point of view, so I can have my file system and I can have my mock file system and my concrete file system and whatever, and from a build time standpoint, because I very rarely change the interface to my file system, but if I'm working on the file system, I'm probably hacking around inside the implementation all the time, and all I'm doing is building and then the linker just has to run.

60
00:40:12.560 --> 00:41:15.040
<v Matt Godbolt>And so there's a lot of good things going on there. And one doesn't have to use virtual methods to divorce your implementation from your interface: there's a technique called pImpl, which insulates your callers from the structural layout of your object. So if you have like an int and a float as member variables, and you decide to add a char later on, of course you've changed the size of your object, which means anyone who has one of your objects in there has to be rebuilt and all these kinds of things. So there are ways and means around those too, but that does come with a runtime cost. And that's something you might just have to take on the chin for certain things. And, you know, again, if you iterate fast, you can probably find quickly the areas that need the performance. So I don't know, I'm throwing out ideas really for build time, um, insulation techniques, which is less about the build time itself and more about reducing the, uh, coupling between components so that a small change to an implementation doesn't necessitate a giant rebuild of your program.
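
NOTE
A minimal pImpl sketch, assuming a hypothetical Counter class; the comments
mark where the header/source split would fall in a real project. Callers see
only a pointer to an incomplete type, so adding a member to Impl changes the
.cpp file but not the layout callers were compiled against.
```cpp
#include <memory>
// ---- counter.h (what callers compile against) ----
class Counter {
public:
    Counter();
    ~Counter();  // defined in the .cpp, where Impl is a complete type
    Counter(const Counter&) = delete;
    Counter& operator=(const Counter&) = delete;
    void increment();
    int value() const;
private:
    struct Impl;                  // layout hidden from callers
    std::unique_ptr<Impl> impl_;  // Counter's size no longer depends on Impl
};
// ---- counter.cpp (free to change without rebuilding callers) ----
struct Counter::Impl {
    int count = 0;
    float scale = 1.0f;
    // Adding `char flag;` here changes sizeof(Impl), not sizeof(Counter).
};
Counter::Counter() : impl_(std::make_unique<Impl>()) {}
Counter::~Counter() = default;
void Counter::increment() { ++impl_->count; }  // extra pointer hop: the run-time cost
int Counter::value() const { return impl_->count; }
```
The heap allocation and the indirection on every access are the run-time
cost mentioned above.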

61
00:41:15.040 --> 00:41:25.720
<v Ben Rady>Right. Right. And, you know, the, the, the sort of add on benefits of, of doing that beyond just the build time, right? Like the decoupling that you get otherwise.

62
00:41:25.720 --> 00:41:26.940
<v Matt Godbolt>Right. Exactly.

63
00:41:26.940 --> 00:41:39.260
<v Ben Rady>I was going to say, what about, um, other things like distributed builds and other kind of external tricks for speeding up builds? What's your take on those in general?

64
00:41:39.260 --> 00:42:31.420
<v Matt Godbolt>So I think they're a necessary evil once you get to some level of complexity. Uh, certainly at the companies I've worked at before, we've had good success with commercial offerings that allow you to distribute your build in a relatively straightforward way. But most of the time, the time and effort in getting that to work reliably, and not to have mis-builds, and not have issues with distributing the particular version of the compiler that you've got or anything like that, is, um, outweighed by just, I say "just" with little air quotes here, just putting as fast a computer as you can possibly afford in the hands of your developers for their day-to-day activities. So I'm lucky enough to have like a 16 core machine here. And so building locally with 16 cores, 16 threads of a build, is fast enough. And obviously that's great.

65
00:42:31.420 --> 00:43:34.160
<v Matt Godbolt>Not everyone can afford that. Not every company is going to shell out for a machine with that amount of power, but given the cost of engineers twiddling their thumbs waiting, that has probably the best bang for buck. Otherwise, you're talking about buying in a, uh, an external distributed build system or trying to get distcc or equivalent to work, which is a very good product. But as soon as you're having to worry about which version of the compiler is installed on some other person's machine and making sure that they have the same header files that you do, the effort of doing that, plus the effort of debugging it when it goes wrong, is very, very high. Nobody wants to be in the situation where you do the build, you run your tests, they fail in a surprising way, and so you blow away everything and rebuild, only to discover that it was actually a genuine problem. And you just wasted the time because you didn't trust your build. And this is, I know you know this, but I'm very passionate about being able to trust the reliability of your builds. Yeah.

66
00:43:34.160 --> 00:43:42.740
<v Ben Rady>And yeah. So I've heard you rail before about, you should never, maybe never is a strong word, but you should be careful about using "make clean".

67
00:43:42.740 --> 00:43:46.310
<v Matt Godbolt>Yeah. You were right the first time.

68
00:43:46.310 --> 00:44:55.860
<v Matt Godbolt>If you type "make clean" then all bets are off. Now, my particular case in that is, I have seen it when people have builds that are not reproducible, when people have builds that are flaky in some way, often to do with parallel builds, because they either haven't specified their build properly or they have a step that is not parallelizable and it doesn't say that it isn't, and so it's non-deterministic. Um, then what tends to happen is that with any sort of weird unexpected behavior in the program, either like a weird crash or, um, a test that fails unusually, you can get into the situation where the knee-jerk response is, well, maybe it was a bad build, and then you do make clean and then you run it again and maybe it works. And then there are two reasons for this. The first reason is that your build was indeed buggy and broken and you were in an indeterminate state. And my strong belief is that you should fix your build. And the only way to fix your build is to look at the carcass, pick over it and try and understand from all the files that you have, what the heck happened. And if you've just typed make clean, you got rid of that. Yeah.

69
00:44:55.860 --> 00:45:02.920
<v Ben Rady>It's like trying to solve a crime by cleaning up the crime scene. I can get rid of this blood. Like these fingerprints are all messy. I can fix that.

70
00:45:02.920 --> 00:45:05.280
<v Matt Godbolt>All right. No more murder, right?

71
00:45:05.280 --> 00:45:08.260
<v Ben Rady>Murder solved! Nothing to see here, folks.

72
00:45:08.260 --> 00:46:10.320
<v Matt Godbolt>That's kind of the good case in a way, because make clean, make, and it starts working again. And you're like, oh, okay, I had a bad build, but you won't ever be able to fix it if you take that approach. The second thing is that maybe you do actually have a genuine one-in-a-million bug in your code. Like it's a race condition, or it's a strange, uh, case where the network drive was down temporarily, whatever, whatever unusual circumstances, in which case you have chalked it up to a bad build, and maybe missed the one opportunity to debug and dig into a freak occurrence. And so now you just do make clean, make, and it goes away, and then you don't trust your build and you've lost the opportunity. And so this is why I feel so strongly about this kind of stuff, as you can tell. And to bring it back on to the subject, 'cause I know this is a thing you can wind me up on for hours, um, you know, obviously adding distribution into your build system adds yet another reason for your build system to be non-deterministic or to be broken in some way or to have issues.

73
00:46:10.320 --> 00:46:26.900
<v Matt Godbolt>Um, and so if you give people another excuse to kind of go, Oh, I don't really know what that problem was. I fancy another cup of coffee, make clean, make, and then walk away from their computer for a bit. Then I think that's a bad direction to go in. But I understand that I have strong opinions about this.

74
00:46:26.900 --> 00:47:10.440
<v Ben Rady>No, I mean, and I it's funny cause I share your opinions in, in a few other forms. One of, you know, staying on brand here, obviously, uh, testing! Of course you knew I was going to say that. Distributed test systems, right? Like whole industries were built trying to figure out how to get Ruby on Rails apps to run their tests in a performant manner. Once they had, you know, thousands and tens of thousands of them, because each one took like a second, right. Cause ActiveRecord. So like, you know, Oh, run them in parallel. Right. And it's like, you're solving the wrong problem when you do that. Right. You're, you're introducing a whole other form of unreliability into your tests in the name of speeding up your, your feedback, which is, you know, like they're feeling the pain and they should try to speed up those feedback loops.

75
00:47:10.440 --> 00:47:46.620
<v Ben Rady>But the way to do that is, you know, there are certain situations in which the best that you can do is try to run it in parallel. But you have to recognize that when you do that, you're creating a whole other kind of unreliable failure that you're then also just going to have to deal with. Um, and certainly in the, you know, the world of "make clean", I feel like there are two kinds of developers in the world: the kind where, you know, when you turn it off and turn it back on again, or reset it, or do whatever the thing is to make the bug go away, they're happy. And then the other kind, where they're disappointed, right? Like, what I wanted to see is, when you reloaded that page, that the bug happened again.

76
00:47:46.620 --> 00:47:59.540
<v Matt Godbolt>How many times have you had to tell a family member to turn off a system and turn it back on again and then hated yourself so much because it's the only solution available, but yeah, you're, you're so right. Yeah,

77
00:47:59.540 --> 00:48:35.780
<v Ben Rady>Yeah. Yeah. Like, those opportunities, sometimes the next time you're going to get that is in production, when there's like serious stuff on the line. And so all those opportunities to try to figure out what's going on with these things, like, the worst thing is bugs that only happen one in a thousand times, right? Because they're so hard to fix. If you're not interested in fixing them, it's great, because, okay, I got 999 more times left before I need to worry about this again. But, uh, if you actually want to solve a problem, you really want it to be reproducible. So the only way to fix that sort of intermittent failure is to really just kill it with fire at every opportunity,

78
00:48:35.780 --> 00:49:36.820
<v Matt Godbolt>At the first opportunity you get, exactly. And I think you and I have talked about this kind of stuff, failing early, failing fast, and not doing things occasionally, if you can avoid it. You know, in our world of finance, there are some sort of things that one does when one is, uh, receiving, uh, market data from the outside world. And there is a good case and there's a bad case. And if you can engineer it so you always start off with the bad case, then you'll never be surprised when, at the peak of market activity, you have to do the same thing. And similarly, like, I remember that in the Linux kernel, they fixed a bug in the way the rollover worked in a timer. And so nowadays, um, they fudge the timer so it starts up with like 10 minutes before it overflows, just to force it to happen early, and not like four years, 55 days into the uptime of a machine, which I think is, you know, a pragmatic solution. Just like, okay, you boot it up, you get to see whether it works every time now, rather than it happening once, literally, well, not even in a blue moon. I think that would be three months
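
NOTE
A hedged sketch of the "start the timer just before it overflows" idea. This
is not the kernel's code: the names, the tick rate, and the ten-minute head
start (taken from the "like 10 minutes" remark above) are all illustrative
assumptions. The counter begins close to the 32-bit limit so wraparound is
exercised soon after every boot, and comparisons use the signed difference so
they stay correct across the wrap.
```cpp
#include <cstdint>
// Hypothetical tick rate and head start, for illustration only.
constexpr std::uint32_t kTicksPerSecond = 1000;
constexpr std::uint32_t kSecondsBeforeWrap = 600;
// Unsigned arithmetic wraps, so this lands just below the 32-bit maximum.
constexpr std::uint32_t kInitialTicks =
    static_cast<std::uint32_t>(0) - kSecondsBeforeWrap * kTicksPerSecond;
// Wrap-safe "is a later than b": look at the sign of the difference rather
// than comparing the raw counter values.
constexpr bool tick_after(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::int32_t>(b - a) < 0;
}
```
A deadline set 700 seconds after boot wraps to a numerically smaller value
than the starting tick, yet tick_after still reports it as later.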

79
00:49:36.820 --> 00:49:44.840
<v Ben Rady>Either it's going to work or it's going to not, if you can put it in a situation where if it's broken, it will fail. That is always better. It's always better. Yeah. Yeah.

80
00:49:44.840 --> 00:49:51.880
<v Matt Godbolt>If you're going to fail at all. Yeah. Well, I guess that's, uh, I mean, there's tons more to talk about on this, but

81
00:49:51.880 --> 00:49:55.500
<v Ben Rady>Oh, so much more to talk about on this topic, but you can only go for so long.

82
00:49:55.500 --> 00:50:22.080
<v Matt Godbolt>I know, right. But our poor listeners can only tolerate us blabbering on for so long too. But next time we can talk about some more C++ stuff or we can, we can go into other languages or maybe a whole other topic who knows, but it's been great fun talking about it. I, I I'm, I'm glad you riled me up on the make clean stuff. I'm feeling ready to go out and make some strong statements about things, which is very uncharacteristic for me.

83
00:50:22.080 --> 00:50:23.780
<v Ben Rady>So, yeah. All right. That's good.

84
00:50:23.780 --> 00:50:36.470
<v Matt Godbolt>I guess I'll see you next time.

85
00:50:36.470 --> 00:50:38.470
<v Ben Rady>Next time.