WEBVTT

1
00:00:25.080 --> 00:00:26.160
<v Matt Godbolt>Hey, Ben.

2
00:00:26.160 --> 00:00:27.160
<v Ben Rady>Hey, Matt.

3
00:00:27.160 --> 00:00:35.940
<v Matt Godbolt>I left that a bit long, didn't I, this time? That was a bit later in the intro. I was like, oh, because I was chatting to you pre-show and then, yeah, got distracted.

4
00:00:35.940 --> 00:00:39.900
<v Ben Rady>How long could you go not saying that before people would just be like, is this thing broken?

5
00:00:39.900 --> 00:00:41.100
<v Matt Godbolt>Are we on the right podcast here?

6
00:00:41.100 --> 00:00:42.700
<v Ben Rady>Is this the right podcast? I don't even know.

7
00:00:42.700 --> 00:01:02.180
<v Matt Godbolt>I have a funny story about that. Maybe we've already said it, but one of my pals, one of my C++ pals, his name is Ben, Ben Deane, right? And he's also British. And so he was listening to the first episode of the podcast and he just nearly fell off his chair because the first thing I do is go, hi, Ben. And he's like, how? What?

8
00:01:02.180 --> 00:01:08.480
<v Matt Godbolt>I mean, I'm sure that other people called Ben exist. I appreciate that. But it was just him telling the story that made me laugh.

9
00:01:08.480 --> 00:01:12.140
<v Ben Rady>So it's like, did I forget to leave a phone call or a Google Meet somewhere? And it's just Matt being like, yeah, hi.

10
00:01:12.140 --> 00:01:17.360
<v Matt Godbolt>Now someone is just talking to me. Hey, you bum dialed me.

11
00:01:17.360 --> 00:01:19.560
<v Ben Rady>You left your phone on.

12
00:01:19.560 --> 00:01:20.340
<v Matt Godbolt>Yeah.

13
00:01:20.340 --> 00:01:23.100
<v Ben Rady>Yeah, exactly.

14
00:01:23.100 --> 00:01:33.910
<v Matt Godbolt>We've all been there shouting as loud as we can, trying to get someone's attention. But yeah, not that. But anyway, I wonder how long we could get away with. But it's what we do. It's what we say. Oh, another thing.

15
00:01:33.910 --> 00:01:34.060
<v Ben Rady>Mm-hmm.

16
00:01:34.060 --> 00:01:38.940
<v Matt Godbolt>I met somebody who is a listener or our listener, singular.

17
00:01:38.940 --> 00:01:40.700
<v Ben Rady>The listener. You met the listener.

18
00:01:40.700 --> 00:01:45.150
<v Matt Godbolt>And it occurred to me that's about the fourth person now who's told me that they are our listener.

19
00:01:45.150 --> 00:01:45.580
<v Ben Rady>Oh.

20
00:01:45.580 --> 00:01:49.620
<v Matt Godbolt>So we might have to accept that we have more than one now.

21
00:01:49.620 --> 00:01:51.900
<v Ben Rady>So we have four listeners.

22
00:01:51.900 --> 00:02:06.220
<v Matt Godbolt>It's so funny. Given how the internet is all about tracking and every YouTube video tells you how many people have watched it and all that kind of good stuff, it's really hard to know how many listeners there really are out there for us.

23
00:02:06.220 --> 00:02:07.420
<v Ben Rady>Oh yeah. Because...

24
00:02:07.420 --> 00:02:08.680
<v Matt Godbolt>It seems broken in the podcast world.

25
00:02:08.680 --> 00:02:14.120
<v Ben Rady>Because of one of the classic problems of computer science, cache invalidation.

26
00:02:14.120 --> 00:02:16.000
<v Matt Godbolt>Yeah.

27
00:02:16.000 --> 00:02:17.360
<v Ben Rady>Yeah.

28
00:02:17.360 --> 00:02:27.900
<v Matt Godbolt>Everyone wants to cache your... I mean, I suppose that's the thing, like videos, nobody would want to cache gigabytes of videos, but you're like, hey, a couple of megs of MP3 file, sure.

29
00:02:27.900 --> 00:02:28.340
<v Ben Rady>Right.

30
00:02:28.340 --> 00:02:32.200
<v Matt Godbolt>Spotify, we'll take a copy of that and we'll hand it out to everyone when they press play.

31
00:02:32.200 --> 00:02:32.960
<v Ben Rady>Yeah.

32
00:02:32.960 --> 00:02:43.700
<v Matt Godbolt>And then we get you to try and sign up for our Spotify podcast publisher account that then lets you see how many plays you've had on Spotify. That's great.

33
00:02:43.700 --> 00:03:05.040
<v Matt Godbolt>But you can also listen to it on Apple and Google and YouTube and off of our website. And you can pay, there are these places that will charge you decent amounts of money to tell you how many people listen to your podcast. But I'm just like, somebody should write a web scraper that goes to all of these places and gets them all in one place, right?

34
00:03:05.040 --> 00:03:10.290
<v Matt Godbolt>And by somebody, I mean, probably an LLM that I could just tell it to do that thing.

35
00:03:10.290 --> 00:03:10.860
<v Ben Rady>Yeah.

36
00:03:10.860 --> 00:03:13.380
<v Matt Godbolt>Anyway, that's not what we're here to talk about today.

37
00:03:13.380 --> 00:03:15.520
<v Ben Rady>No, we're not. What are we talking about today?

38
00:03:15.520 --> 00:03:26.080
<v Matt Godbolt>We're talking about the last three days of my life, which have been supposedly preparing for conference presentations, which I have in September.

39
00:03:26.080 --> 00:03:26.200
<v Ben Rady>Oh, okay.

40
00:03:26.200 --> 00:03:41.260
<v Matt Godbolt>I've got three conference presentations coming up, which is great. And while you're sort of writing slides and throwing out ideas, you're kind of like, well, I want a background window that I can kind of just tap into occasionally. And there's tons of stuff in Compiler Explorer

41
00:03:41.260 --> 00:03:53.380
<v Matt Godbolt>which is just like, hey, run this thing, wait for it to finish, have a look at the output, see if it makes sense, push it to production if it does, updating libraries, that kind of good stuff. And so we've hit August now,

42
00:03:53.380 --> 00:03:58.240
<v Matt Godbolt>which I believe will be actually when this podcast comes out in a rare departure from the norm.

43
00:03:58.240 --> 00:03:58.720
<v Ben Rady>Yeah, that's right.

44
00:03:58.720 --> 00:04:10.300
<v Matt Godbolt>But we've hit August, and I've long since had a calendar reminder to say, hey, we should really upgrade to Ubuntu 24 for all of the production nodes for Compiler Explorer.

45
00:04:10.300 --> 00:04:10.640
<v Ben Rady>Mm-hmm.

46
00:04:10.640 --> 00:04:14.990
<v Matt Godbolt>And, you know, we tend to drag our feet a little bit because we've been bitten before.

47
00:04:14.990 --> 00:04:15.560
<v Ben Rady>Mm-hmm.

48
00:04:15.560 --> 00:04:22.000
<v Matt Godbolt>This is where we should see a big foreshadowing thing that should pop up here.

49
00:04:22.000 --> 00:04:22.280
<v Ben Rady>Oh, boy.

50
00:04:22.280 --> 00:04:32.640
<v Matt Godbolt>And in fact, 10 years ago, we were bitten by an issue. I was doing some research trying to work out what happened and I found my own blog post from 10 years ago.

51
00:04:32.640 --> 00:04:33.360
<v Ben Rady>Oh, wow. Yeah.

52
00:04:33.360 --> 00:04:35.870
<v Matt Godbolt>You know, that kind of thing where you're like, oh, who had this problem?

53
00:04:35.870 --> 00:04:36.100
<v Ben Rady>Yes.

54
00:04:36.100 --> 00:04:37.850
<v Matt Godbolt>Oh, I had this problem.

55
00:04:37.850 --> 00:04:39.180
<v Ben Rady>I did. Former me. Uh-huh.

56
00:04:39.180 --> 00:04:56.740
<v Matt Godbolt>So anyway, how hard can it be to upgrade the operating system? We have everything scripted through the wazoo. We use Packer to build our images, all of our images, apt install all the things they need and then they shut themselves down.

57
00:04:56.740 --> 00:05:03.860
<v Matt Godbolt>Great. And then we make an image, a machine image out of that. And that's what starts up and is a Compiler Explorer node.

58
00:05:03.860 --> 00:05:14.770
<v Matt Godbolt>Pretty straightforward. And we do this fairly often because if you don't, then it rots and you're relying on a ton of stuff that's, you know, install this from this random website.

59
00:05:14.770 --> 00:05:15.290
<v Ben Rady>Right.

60
00:05:15.290 --> 00:05:15.820
<v Ben Rady>Yeah.

61
00:05:15.820 --> 00:05:23.700
<v Matt Godbolt>So anyway, that wasn't a problem. So in theory, it was just change a 22 to 24 and then rerun the Packer.

62
00:05:23.700 --> 00:05:23.810
<v Ben Rady>Mm-hmm.

63
00:05:23.810 --> 00:05:23.930
<v Matt Godbolt>And

64
00:05:23.930 --> 00:05:24.040
<v Ben Rady>Mm-hmm.

65
00:05:24.040 --> 00:05:30.660
<v Matt Godbolt>it failed the first time for an easy to diagnose reason.

66
00:05:30.660 --> 00:05:31.320
<v Ben Rady>Yeah.

67
00:05:31.320 --> 00:05:44.700
<v Matt Godbolt>Luckily I'd already hit this before because I started a 24 upgrade for a different part of the system and hit it. I was like, oh, I must remember this. And of course, completely failed to remember to do it this time around.

68
00:05:44.700 --> 00:05:55.330
<v Matt Godbolt>Which was to do with the way AppArmor has been updated and some of the jailing things that we do. Obviously, we run things in secure environments, and AppArmor has opinions about which things can run.

69
00:05:55.330 --> 00:05:55.640
<v Ben Rady>Yeah.

70
00:05:55.640 --> 00:06:11.760
<v Matt Godbolt>And certainly, our own jailing code needs to be configured so that it can do the things that are usually dodgy. Like, hey, I want to make a whole new namespace. I want to make all of these sort of isolated environments. That shouldn't just be allowed to happen randomly.

71
00:06:11.760 --> 00:06:26.650
<v Matt Godbolt>So AppArmor comes in and tells you, no, you can't do that. And then Compiler Explorer doesn't work. So we fixed that. Cool. Deployed. You know, anecdotally, it took a while to start up. And I was like, ah, watched pot, you know, never boils kind of thing.

72
00:06:26.650 --> 00:06:27.230
<v Ben Rady>Mm-hmm.

73
00:06:27.230 --> 00:06:27.820
<v Ben Rady>Mm-hmm.

74
00:06:27.820 --> 00:06:35.530
<v Matt Godbolt>But to sort of make this slightly less of a shaggy dog story than it already is going to be.

75
00:06:35.530 --> 00:06:36.260
<v Ben Rady>Mm-hmm.

76
00:06:36.260 --> 00:06:44.400
<v Matt Godbolt>It turns out that our boot up time went from not a great couple of minutes to, you know, four or five minutes. It was timing things out.

77
00:06:44.400 --> 00:06:45.200
<v Ben Rady>Hmm. Yeah.

78
00:06:45.200 --> 00:06:54.120
<v Matt Godbolt>And even more interestingly, while the node was booting up, it was unresponsive. Like SSH would time out.

79
00:06:54.120 --> 00:07:00.590
<v Matt Godbolt>And I'm like, what is going on here? What on earth could it be doing at startup?

80
00:07:00.590 --> 00:07:01.140
<v Ben Rady>Oh.

81
00:07:01.140 --> 00:07:06.370
<v Matt Godbolt>that could, you know, bring it to its knees like that.

82
00:07:06.370 --> 00:07:07.150
<v Ben Rady>Yeah.

83
00:07:07.150 --> 00:07:07.940
<v Ben Rady>Mm-hmm.

84
00:07:07.940 --> 00:07:22.100
<v Matt Godbolt>So what it ended up being is that at startup, we mount around 2000 SquashFS images. And I don't know if we've talked about this before on the podcast, but Compiler Explorer has a very unusual...

85
00:07:22.100 --> 00:07:22.880
<v Ben Rady>Don't think so.

86
00:07:22.880 --> 00:07:28.200
<v Matt Godbolt>Yeah. Okay. So let's do a bit of an interlude. This is, this is definitely a therapy session, by the way.

87
00:07:28.200 --> 00:07:28.320
<v Ben Rady>Yeah.

88
00:07:28.320 --> 00:07:32.380
<v Matt Godbolt>Thank you, Ben, for being my therapist. And thank you, listener, for being my therapist.

89
00:07:32.380 --> 00:07:36.880
<v Ben Rady>Well, yeah, you spent three days of your life on this. You deserve to be able to vent about it, I feel like.

90
00:07:36.880 --> 00:07:38.380
<v Matt Godbolt>I'm just, yeah, I'm going to feel better.

91
00:07:38.380 --> 00:07:39.450
<v Ben Rady>That's just like your moral right.

92
00:07:39.450 --> 00:07:39.980
<v Matt Godbolt>It's because catharsis.

93
00:07:39.980 --> 00:07:40.370
<v Ben Rady>Yeah.

94
00:07:40.370 --> 00:07:40.750
<v Matt Godbolt>Yes.

95
00:07:40.750 --> 00:07:41.520
<v Ben Rady>Yeah. Uh-huh.

96
00:07:41.520 --> 00:07:54.120
<v Matt Godbolt>So Compiler Explorer has many compilers. They are immutable because once I've installed GCC 12.1, I never need to do anything with it again. Again, with a massive footnote there, sometimes we break and we have to redo them, whatever.

97
00:07:54.120 --> 00:07:59.960
<v Matt Godbolt>But they also need to be shared amongst up to about 40 different machines.

98
00:07:59.960 --> 00:08:05.000
<v Matt Godbolt>And that constitutes about two and a half, three terabytes of binary files.

99
00:08:05.000 --> 00:08:05.260
<v Ben Rady>Yeah.

100
00:08:05.260 --> 00:08:05.760
<v Ben Rady>Yeah, yeah.

101
00:08:05.760 --> 00:08:16.500
<v Matt Godbolt>And there's just not really a good solution for sharing that amount of data with the kind of access patterns that are, well, it's a compiler, it's an executable, I'm going to run it from

102
00:08:16.500 --> 00:08:16.500
<v Ben Rady>Right.

103
00:08:16.500 --> 00:08:19.760
<v Matt Godbolt>NFS or wherever you're storing it.

104
00:08:19.760 --> 00:08:20.180
<v Ben Rady>Yeah, yeah.

105
00:08:20.180 --> 00:08:43.390
<v Matt Godbolt>So when we first started out with Compiler Explorer, all the compilers were built into that AMI image on every node. So the image that I said I was building from 22 to 24 would actually contain the compilers. They were actually apt installed at one stage, and that was okay, but it does not scale because, you know, effectively every build gets slightly slower than the previous build.

106
00:08:43.390 --> 00:08:43.700
<v Ben Rady>Yeah.

107
00:08:43.700 --> 00:08:56.740
<v Matt Godbolt>More and more compiler images get sort of unpacked onto it. And once it took more than 24 hours to complete the AMI image, I knew that we were in trouble, and I thought, we have to come up with a different solution.

108
00:08:56.740 --> 00:08:56.920
<v Ben Rady>Yeah.

109
00:08:56.920 --> 00:09:08.240
<v Matt Godbolt>So we used NFS and NFS was great for a while. We hit some funny performance issues with NFS just out of the gate, but ultimately we got that settled down.

110
00:09:08.240 --> 00:09:19.140
<v Matt Godbolt>And then things like Boost, which is a C++ header-only library. It makes a lot of use of the fact that you can include a file from itself.

111
00:09:19.140 --> 00:09:35.430
<v Matt Godbolt>You can self-reference the file, and so it has a bunch of preprocessor trickery that includes the file multiple times and hash-defines something to be N plus one. So it can include the same file 50 times to expand out certain things that you can't do in the template system.

112
00:09:35.430 --> 00:09:35.580
<v Ben Rady>Yeah.

113
00:09:35.580 --> 00:09:43.800
<v Matt Godbolt>Those kinds of tricks. Either way, what that is, is tiny text files, including tiny other text files, which is like the worst case scenario for NFS.

114
00:09:43.800 --> 00:09:44.120
<v Ben Rady>Yeah.

115
00:09:44.120 --> 00:10:01.690
<v Matt Godbolt>You've got this massive... even if you cache the file contents, NFS will always go and fetch the, or pretty quickly will go and fetch the metadata to see if the thing has changed on the remote side before it will serve up the cached content it has locally.

116
00:10:01.690 --> 00:10:01.870
<v Ben Rady>Right.

117
00:10:01.870 --> 00:10:02.060
<v Ben Rady>Right.

118
00:10:02.060 --> 00:10:05.810
<v Matt Godbolt>And at that point, you might as well have just read the darn file because it's only 80 bytes long.

119
00:10:05.810 --> 00:10:05.910
<v Ben Rady>Yeah.

120
00:10:05.910 --> 00:10:06.000
<v Matt Godbolt>so

121
00:10:06.000 --> 00:10:06.270
<v Ben Rady>Right.

122
00:10:06.270 --> 00:10:06.540
<v Ben Rady>Yeah.

123
00:10:06.540 --> 00:10:19.100
<v Matt Godbolt>Latency, massive problem. And so Boost was timing out. And so our first solution to this was we would rsync a few libraries to the local disk when we started up so that they were local and then substitute the path.

124
00:10:19.100 --> 00:10:33.600
<v Matt Godbolt>Great, lovely, but not scalable because we have thousands of libraries and then the boot up time was getting longer and longer as well. So take two was for every compiler or library that is

125
00:10:33.600 --> 00:10:33.600
<v Ben Rady>yeah

126
00:10:33.600 --> 00:10:45.360
<v Matt Godbolt>like a final build, like a 12.1 released compiler, we install it on NFS, but then we also build a SquashFS image of that compiler.

127
00:10:45.360 --> 00:10:46.600
<v Ben Rady>Okay.

128
00:10:46.600 --> 00:10:57.240
<v Matt Godbolt>That SquashFS image is also on NFS, which so far you're thinking this is just manifestly worse, which maybe it is.

129
00:10:57.240 --> 00:10:57.820
<v Ben Rady>Okay.

130
00:10:57.820 --> 00:11:20.660
<v Matt Godbolt>And then at startup, we mount the SquashFS image over top of the NFS location where it is also stored. So it looks like we have one unified /opt/compiler-explorer, all the compilers in the world, but some of those directories in there have actually been mounted over top and are actually being served from a SquashFS image that's on NFS as well.
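
NOTE
Editor's sketch, not from the episode: one way the "mount the image over the
NFS copy" step described above could be expressed. The image and target paths
are hypothetical; only /opt/compiler-explorer comes from the conversation.
The block just builds the argv for a read-only loop mount and does not run it.
```python
from pathlib import PurePosixPath
#
def squashfs_mount_command(image: PurePosixPath, target: PurePosixPath) -> list[str]:
    """Build the argv for loop-mounting a SquashFS image read-only at target."""
    # "-o loop" asks mount(8) to attach the image file to a loop device first.
    return ["mount", "-t", "squashfs", "-o", "loop,ro", str(image), str(target)]
#
# Hypothetical layout: the image sits on NFS and is mounted over the NFS
# directory that holds the unpacked copy of the same compiler.
image = PurePosixPath("/opt/compiler-explorer/images/gcc-12.1.0.img")
target = PurePosixPath("/opt/compiler-explorer/gcc-12.1.0")
command = squashfs_mount_command(image, target)
print(" ".join(command))
```
Running the printed command needs root and would be repeated once per image.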

131
00:11:20.660 --> 00:11:23.870
<v Ben Rady>Right. And the intention here is to sort of solve the tiny file problem.

132
00:11:23.870 --> 00:11:24.140
<v Matt Godbolt>Right.

133
00:11:24.140 --> 00:11:35.100
<v Ben Rady>And instead of shipping across an 80 byte file, you're shipping across a SquashFS for a particular compiler and all of the bits come along with it in one swell foop.

134
00:11:35.100 --> 00:11:44.130
<v Matt Godbolt>In one swell foop, yes. It's actually better than that, because the way SquashFS works is, yes, it does do packing of smaller files into an area.

135
00:11:44.130 --> 00:11:44.180
<v Ben Rady>Wow.

136
00:11:44.180 --> 00:11:45.730
<v Matt Godbolt>It also compresses them.

137
00:11:45.730 --> 00:11:47.340
<v Ben Rady>Mm-hmm.

138
00:11:47.340 --> 00:12:01.670
<v Matt Godbolt>And as far as the kernel is concerned, SquashFS is like a block device file system, like a hard disk. And so the kernel is caching and loading things as like 4K pages, 8K, whatever size it's reading and writing.

139
00:12:01.670 --> 00:12:02.360
<v Ben Rady>Right.

140
00:12:02.360 --> 00:12:13.900
<v Matt Godbolt>SquashFS doesn't know that the underlying image is actually on a mutable network file system. And so it caches them forever. It's like, yeah, fine. I'll keep this in my page cache, right?

141
00:12:13.900 --> 00:12:21.490
<v Matt Godbolt>I read page seven of this file system to hand to SquashFS that then unpacks it.

142
00:12:21.490 --> 00:12:21.560
<v Ben Rady>Uh-huh. Uh-huh.

143
00:12:21.560 --> 00:12:32.860
<v Matt Godbolt>And then the files are also cached, right? But the raw block-level accesses are cached essentially forever. We've laundered the fact that behind the scenes, it's an NFS drive.

144
00:12:32.860 --> 00:12:46.120
<v Matt Godbolt>And so we get much better caching, much better performance of like reading the metadata for things because yeah, like you read the directory contents and it gives you like a 4K block that probably has all of the directories for everything that's inside that SquashFS image.

145
00:12:46.120 --> 00:12:46.520
<v Ben Rady>Mm-hmm.

146
00:12:46.520 --> 00:13:06.120
<v Matt Godbolt>And then the tiny files are all packed together as well. So like we're winning on every level and it was a huge improvement in the compile time. But it's not free to mount the image in the first place. And obviously it takes up a little bit of memory on the machine to have like...

147
00:13:06.120 --> 00:13:06.250
<v Matt Godbolt>4,000 or 2,000 mounts, right?

148
00:13:06.250 --> 00:13:06.520
<v Ben Rady>Yeah. Yeah.

149
00:13:06.520 --> 00:13:17.440
<v Matt Godbolt>So each mount, you know, takes up a certain amount of kernel space, and part of the acceleration is effectively that pre-caching of the top level of the directory of every compiler and library.

150
00:13:17.440 --> 00:13:29.500
<v Matt Godbolt>So over the years, you know, we went from tens to hundreds to thousands of compilers, and our boot up time became dominated by, well, it takes 50 milliseconds to mount each SquashFS image

151
00:13:29.500 --> 00:13:31.220
<v Ben Rady>Yeah.

152
00:13:31.220 --> 00:13:34.620
<v Matt Godbolt>Times 2000. Suddenly that's an appreciable amount of time.
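
NOTE
Editor's sketch, not from the episode: the arithmetic behind "50 milliseconds
times 2000". The two figures are the ones quoted above; the "four in flight"
case is an idealised assumption that mounts overlap perfectly, which the rest
of the story shows was not what happened in practice.
```python
import math
#
MOUNT_MS = 50        # per-mount cost quoted in the episode
IMAGE_COUNT = 2000   # approximate image count quoted in the episode
#
def serial_boot_seconds(count: int, per_mount_ms: int) -> float:
    """Total time when mounts run strictly one after another."""
    return count * per_mount_ms / 1000
#
def overlapped_boot_seconds(count: int, per_mount_ms: int, in_flight: int) -> float:
    """Idealised lower bound when `in_flight` mounts overlap perfectly."""
    return math.ceil(count / in_flight) * per_mount_ms / 1000
#
serial = serial_boot_seconds(IMAGE_COUNT, MOUNT_MS)
overlapped = overlapped_boot_seconds(IMAGE_COUNT, MOUNT_MS, in_flight=4)
print(f"serial: {serial:.0f} s, four in flight: {overlapped:.0f} s")
```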

153
00:13:34.620 --> 00:13:35.360
<v Ben Rady>Yeah, yeah.

154
00:13:35.360 --> 00:13:41.660
<v Matt Godbolt>And so then you're like, what if we do this in the background while we're starting up? And that didn't work out.

155
00:13:41.660 --> 00:13:43.580
<v Matt Godbolt>Now I know why.

156
00:13:43.580 --> 00:13:44.390
<v Ben Rady>Yeah, OK.

157
00:13:44.390 --> 00:13:44.790
<v Ben Rady>Ah.

158
00:13:44.790 --> 00:13:50.340
<v Matt Godbolt>So anyway, that's the situation that we're in. That's why we mount thousands of files at startup.

159
00:13:50.340 --> 00:14:03.210
<v Matt Godbolt>And we have any number of ways of thinking about consolidating the number of SquashFS images we have so that we can combine all the GCCs together in one image or whatever. But there are a number of other issues with that, which is a whole other podcast episode.

160
00:14:03.210 --> 00:14:04.260
<v Ben Rady>Hmm.

161
00:14:04.260 --> 00:14:19.660
<v Matt Godbolt>And every time I describe this problem to people, by the way, everyone's like, yeah, I can't think of a solution to this problem. This is the general problem of like, I need to ship immutable binaries with low latency to many places.

162
00:14:19.660 --> 00:14:28.380
<v Matt Godbolt>And I need to take advantage of the immutability as much as possible. And yeah. And then the management of that, right.

163
00:14:28.380 --> 00:14:28.480
<v Ben Rady>Yeah, yeah, yeah, yeah.

164
00:14:28.480 --> 00:14:48.020
<v Matt Godbolt>I've got thousands of these things, right. Anyway. So the problem turned out to be that when you mount an image on a modern Ubuntu, systemd comes along and says, oh, you've added a mount.

165
00:14:48.020 --> 00:14:59.160
<v Matt Godbolt>I need to make an ad hoc unit, which is kind of its dependency-tracking node in its graph. I don't know much about systemd.

166
00:14:59.160 --> 00:15:02.000
<v Matt Godbolt>I've learned a lot more about it in the last few days, but...

167
00:15:02.000 --> 00:15:02.440
<v Ben Rady>Heh

168
00:15:02.440 --> 00:15:10.770
<v Matt Godbolt>What it effectively does is it creates a node in some dependency graph so that when you shut the system down, it knows to unmount it and it knows who depends on it and things like that, right?

169
00:15:10.770 --> 00:15:11.080
<v Ben Rady>Okay, yeah, right.

170
00:15:11.080 --> 00:15:24.780
<v Matt Godbolt>Sort of that stuff. Because you can also phrase the entirety of like, /etc/fstab or certain services to say, hey, this service needs this thing. And this thing is a network mount and that network mount needs networking and networking needs to be.

171
00:15:24.780 --> 00:15:30.990
<v Matt Godbolt>So systemd can like follow this graph and say, well, if you're turning on this service, I'll make sure the mounts it needs are in place.

172
00:15:30.990 --> 00:15:31.240
<v Ben Rady>Yeah.

173
00:15:31.240 --> 00:15:33.160
<v Matt Godbolt>And when you turn it off, I can unmount them as well.

174
00:15:33.160 --> 00:15:33.640
<v Ben Rady>Yeah, yeah.

175
00:15:33.640 --> 00:15:35.660
<v Matt Godbolt>So it makes a ton of sense.

176
00:15:35.660 --> 00:15:42.280
<v Ben Rady>I discovered actually, side note, the other day that systemd also does that for shared memory directories,

177
00:15:42.280 --> 00:15:43.140
<v Matt Godbolt>Interesting.

178
00:15:43.140 --> 00:15:51.650
<v Ben Rady>which bit me in a very painful way because of a technology that you're familiar with that uses shared memory for storing data.

179
00:15:51.650 --> 00:15:53.240
<v Matt Godbolt>I'm aware of that.

180
00:15:53.240 --> 00:15:57.960
<v Ben Rady>And it suddenly disappeared when we turned the service off. And we're like, what just happened?

181
00:15:57.960 --> 00:15:59.060
<v Matt Godbolt>Oh my gosh.

182
00:15:59.060 --> 00:15:59.960
<v Ben Rady>Yeah.

183
00:15:59.960 --> 00:16:02.780
<v Matt Godbolt>Yeah. systemd has its fingers in a lot of pies.

184
00:16:02.780 --> 00:16:03.940
<v Ben Rady>Mm-hmm.

185
00:16:03.940 --> 00:16:17.960
<v Matt Godbolt>So when you mount an ad hoc thing, be it, apparently, shared memory or a SquashFS image, which itself is mounted through loopback, which is another sort of mount, you know, all that kind of stuff,

186
00:16:17.960 --> 00:16:18.220
<v Ben Rady>Right.

187
00:16:18.220 --> 00:16:18.740
<v Ben Rady>Yeah.

188
00:16:18.740 --> 00:16:22.800
<v Matt Godbolt>systemd tracks it and

189
00:16:22.800 --> 00:16:38.800
<v Matt Godbolt>I don't know what systemd is doing to take so long, because, and this is the rub, systemd essentially takes a hundred percent CPU, twice over. So on the two-core machines that we run these things on, I could run top when I actually got onto it. I said to you the machine was unresponsive, right?

190
00:16:38.800 --> 00:16:42.690
<v Matt Godbolt>Because this is all in kernel land, locks are being taken out left, right, and center.

191
00:16:42.690 --> 00:16:43.020
<v Ben Rady>Yeah, yeah.

192
00:16:43.020 --> 00:16:43.360
<v Ben Rady>Yeah. Mm-hmm.

193
00:16:43.360 --> 00:16:56.040
<v Matt Godbolt>You know, we're trying to mount these things in parallel at sensible levels because we want to try and mount them and deal with the latency. If it takes 50 milliseconds and most of that is network latency, I should be able to fire off two or three mounts at once

194
00:16:56.040 --> 00:17:09.060
<v Matt Godbolt>and get the SquashFS to read the root directory and then have the mounts, whatever. Even if the kernel is sequencing them, you'd think I could have some mounts on the go at once.

195
00:17:09.060 --> 00:17:19.990
<v Matt Godbolt>But no, every time that comes in, systemd does a ton of work. And so PID 1, which is, you know, "init", but it's called systemd on these systems, ha ha, takes 100%.

196
00:17:19.990 --> 00:17:20.360
<v Ben Rady>Right.

197
00:17:20.360 --> 00:17:42.340
<v Matt Godbolt>And a second process called systemd, which is the one that is per user, I think, takes a hundred percent CPU while this is going on. And that is not the case on Ubuntu 22. It does take some CPU there, like I measured 30 and 40% on those, which again, it's like, what on earth are you doing?

198
00:17:42.340 --> 00:17:56.400
<v Matt Godbolt>But I'm sure it's got this massive dependency graph that it's running through. So yeah, the short answer is that on Ubuntu 24, either systemd has changed or some aspect of the configuration has.

199
00:17:56.400 --> 00:18:05.270
<v Matt Godbolt>And every time you mount something, it takes a little bit of time, which is probably not a big deal unless you're doing 3000 of them back to back or trying to do four of them at a time.

200
00:18:05.270 --> 00:18:05.540
<v Ben Rady>Right. Right.

201
00:18:05.540 --> 00:18:20.280
<v Matt Godbolt>And then it sort of jams the system up completely. And the knock-on effects for this when I rolled out the 24 was, A, our machines took ages to boot, but they did eventually come in just under the threshold of them getting whacked by...

202
00:18:20.280 --> 00:18:22.700
<v Matt Godbolt>machine did not become, you know, responsive. Exactly.

203
00:18:22.700 --> 00:18:26.580
<v Ben Rady>Yeah. The timeout essentially. Yeah. Yeah.

204
00:18:26.580 --> 00:18:44.650
<v Matt Godbolt>But of course, they've been chewing 200% CPU for like those three minutes while they were booting up. And the way that we do our auto-scaling for our cluster is we take the average CPU, which is a terrible metric, but it's also the simplest one to do in AWS.

205
00:18:44.650 --> 00:18:44.900
<v Ben Rady>Yeah. Yeah.

206
00:18:44.900 --> 00:19:04.860
<v Matt Godbolt>Now, one of our committers is working hard on getting a much better way of scaling up and scaling down and using metrics that make sense. But the only sensible one that you can go to that just has a dropdown entry in AWS is add or remove nodes from the cluster to keep the average CPU at blah.

207
00:19:04.860 --> 00:19:05.860
<v Ben Rady>Yeah, yeah.

208
00:19:05.860 --> 00:19:25.550
<v Matt Godbolt>And so we've got that set to like 30%, 25% right now, which is, again, not ideal. But it does mean that now suddenly you get into this runaway situation where a little bit of load comes in, you fire up a new node, and for those three minutes, it's taking 100% CPU, which pulls the average up even further.

209
00:19:25.550 --> 00:19:27.180
<v Ben Rady>Yep, yep. Right.

210
00:19:27.180 --> 00:19:30.760
<v Matt Godbolt>And before you know it, you have 40 nodes that are booted up.

211
00:19:30.760 --> 00:19:30.820
<v Ben Rady>right

212
00:19:30.820 --> 00:19:38.880
<v Matt Godbolt>And then once it hits that maximum of 40 nodes, obviously the CPU thing plunges down, and so it quickly starts dropping them all.
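
NOTE
Editor's sketch, not from the episode: a toy model of the runaway described
above, where booting nodes inflate the average CPU that the scaler is trying
to hold down. The 30% target, 40-node cap, three-minute boot and 100% boot CPU
are the figures mentioned in the conversation; the demand figure, the 5%
"cheap boot" case and the scaling rule itself are assumptions made for
illustration, and scale-in is not modelled. This is not AWS's algorithm.
```python
import math
#
def simulate(boot_cpu: float, minutes: int = 30, target: float = 30.0,
             max_nodes: int = 40, boot_minutes: int = 3, demand: float = 70.0) -> int:
    """Return the node count after `minutes` of a toy average-CPU autoscaler."""
    ages = [boot_minutes, boot_minutes]  # start with two fully booted nodes
    for _ in range(minutes):
        serving = sum(1 for age in ages if age >= boot_minutes)
        booting = len(ages) - serving
        # Real work is shared by serving nodes; booting nodes burn boot_cpu each.
        total_cpu = min(demand, 100.0 * serving) + boot_cpu * booting
        # Scale out to the node count that would bring the average down to target.
        desired = min(max_nodes, math.ceil(total_cpu / target))
        ages.extend([0] * max(0, desired - len(ages)))
        ages = [age + 1 for age in ages]
    return len(ages)
#
calm = simulate(boot_cpu=5.0)       # boot is cheap: settles near the real need
runaway = simulate(boot_cpu=100.0)  # boot pegs the CPU: climbs to the cap
print(f"cheap boot: {calm} nodes, expensive boot: {runaway} nodes")
```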

213
00:19:38.880 --> 00:19:39.000
<v Ben Rady>Mm-hmm.

214
00:19:39.000 --> 00:19:54.500
<v Matt Godbolt>But then, yeah. So we rolled back to Ubuntu 22. And I spent the last two days, or day and a half, trying to turn off systemd, trying to disable this part of systemd,

215
00:19:54.500 --> 00:20:04.530
<v Matt Godbolt>trying to add every mount option known to mankind to the end of the SquashFS thing to say, for the love of God and all that's holy, don't track this. I don't care. I'm going to mount it and then I'm going to throw it away.

216
00:20:04.530 --> 00:20:05.100
<v Ben Rady>Yeah, yeah.

217
00:20:05.100 --> 00:20:10.840
<v Matt Godbolt>And neither I, my internet searches, nor any of the LLMs that I asked could come up with a way.

218
00:20:10.840 --> 00:20:10.860
<v Ben Rady>Yeah.

219
00:20:10.860 --> 00:20:16.250
<v Matt Godbolt>It just doesn't seem like there's a way to do it. In fact, by the end of one session, Claude was saying, you really need to file this as a bug.

220
00:20:16.250 --> 00:20:16.320
<v Ben Rady>Yeah.

221
00:20:16.320 --> 00:20:20.680
<v Matt Godbolt>And I'm like, I'm probably doing it wrong. I still think I'm doing it wrong.

222
00:20:20.680 --> 00:20:20.840
<v Ben Rady>Yeah.

223
00:20:20.840 --> 00:20:29.340
<v Matt Godbolt>Right. Right. And, you know, for the longest time, having 3000 things mounted has not really been an ideal situation.

224
00:20:29.340 --> 00:20:35.700
<v Matt Godbolt>And so that was fun. Yeah. You're pulling the face of, oh...

225
00:20:35.700 --> 00:20:35.700
<v Ben Rady>yeah

226
00:20:35.700 --> 00:20:41.080
<v Ben Rady>I'm just thinking, is there really no way to tell systemd just not to track these things?

227
00:20:41.080 --> 00:20:49.030
<v Matt Godbolt>I couldn't find it. You can do all these sort of repressions and things, and it still didn't, you know... It was still being thrown through it, and it's monitoring some mount thing.

228
00:20:49.030 --> 00:20:49.720
<v Ben Rady>Ugh, it didn't work. Yeah.

229
00:20:49.720 --> 00:21:00.140
<v Matt Godbolt>The best I could do was "kill -STOP" on the second systemd process, like the user-level systemd process, which essentially is like breakpointing it,

230
00:21:00.140 --> 00:21:00.440
<v Ben Rady>Yeah.

231
00:21:00.440 --> 00:21:18.680
<v Matt Godbolt>then mount them all, and then "kill -CONT" the process. And that got rid of one of the 100 percenters. But that process is sort of lazily created on demand. So until you start doing work that needs it, it's not there. And so my scripts were dying because they were trying to send it the STOP.

232
00:21:18.680 --> 00:21:25.860
<v Matt Godbolt>And it was like, well, that PID doesn't exist yet. And then I'd log onto the machine and finally get through. Yeah, computers, man.
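
NOTE
Editor's sketch, not from the episode: the "kill -STOP, do the mounts,
kill -CONT" workaround described above, written so that a PID that does not
exist yet is reported instead of killing the script. The function names and
the `work` callback are hypothetical; how the PID of the per-user systemd
process is found is deliberately left out. POSIX only.
```python
import os
import signal
from typing import Callable
#
def pause_process(pid: int) -> bool:
    """Send SIGSTOP; return False rather than raising if the PID is absent."""
    try:
        os.kill(pid, signal.SIGSTOP)
    except ProcessLookupError:
        return False
    return True
#
def resume_process(pid: int) -> bool:
    """Send SIGCONT; return False rather than raising if the PID is absent."""
    try:
        os.kill(pid, signal.SIGCONT)
    except ProcessLookupError:
        return False
    return True
#
def run_while_paused(pid: int, work: Callable[[], None]) -> bool:
    """Run `work` with `pid` stopped if possible; always resume afterwards."""
    paused = pause_process(pid)
    try:
        work()  # for example, the loop that mounts every image
    finally:
        if paused:
            resume_process(pid)
    return paused
```
The return value tells the caller whether the target was actually paused, so
a lazily created process that has not started yet does not abort the mounts.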

233
00:21:25.860 --> 00:21:27.540
<v Ben Rady>I mean, how do they even work?

234
00:21:27.540 --> 00:21:30.020
<v Matt Godbolt>I mean, how even... how even

235
00:21:30.020 --> 00:21:45.030
<v Matt Godbolt>And so what I would ideally like is to not... Well, first of all, I would like to have a much cleverer way of managing a large nest of SquashFS images and...

236
00:21:45.030 --> 00:21:45.990
<v Ben Rady>Wow.

237
00:21:45.990 --> 00:21:46.940
<v Ben Rady>Mm-hmm.

238
00:21:46.940 --> 00:21:59.300
<v Matt Godbolt>In fact, this whole approach to mounting SquashFS images through NFS or whatever was something that I was doing around the same time that the company-wide solution at Aquatic was being developed.

239
00:21:59.300 --> 00:22:06.640
<v Matt Godbolt>So there is no surprise that the thing I've just been describing to you is familiar to you, at least in part, because some of the...

240
00:22:06.640 --> 00:22:06.960
<v Ben Rady>Yeah.

241
00:22:06.960 --> 00:22:07.280
<v Ben Rady>Yeah, right.

242
00:22:07.280 --> 00:22:20.700
<v Matt Godbolt>For our audience, and I don't believe it's revealing any IP. We will have to do some very creative cutting if it is. But Aquatic has a solution for storing environments by effectively putting them in SquashFS images. And, you know, that's not new either.

243
00:22:20.700 --> 00:22:25.210
<v Matt Godbolt>That's what snap images are. That's what Flatpaks are or various types of things.

244
00:22:25.210 --> 00:22:25.240
<v Ben Rady>Yeah.

245
00:22:25.240 --> 00:22:32.640
<v Matt Godbolt>They just mount them. So like this is this is one of the ways that one solves a, hey, I just want an immutable bundle of things.

246
00:22:32.640 --> 00:22:33.320
<v Ben Rady>Yeah.

247
00:22:33.320 --> 00:23:02.100
<v Matt Godbolt>And then one of the people who worked on that from Aquatic, who is also a Compiler Explorer committer, has come up with some solutions that are a bit more clever: having a list of symlinks that point from the file system to a well-known path, and that well-known path has AutoFS that auto-mounts the thing on demand. And so you still present this apparent file system that looks like it's got every compiler known to mankind, but only when you actually cd into the directory or try and run it

248
00:23:02.100 --> 00:23:17.820
<v Matt Godbolt>does the symlink get resolved. And now that SquashFS image gets mounted and it appears in that position and then you can carry on with your life. And that's great. And obviously if we'd designed it from the beginning for that, it would have been fine, but retrofitting it into our current

249
00:23:17.820 --> 00:23:18.940
<v Ben Rady>Yeah.

250
00:23:18.940 --> 00:23:37.880
<v Matt Godbolt>setup is really, really, really difficult. And then you still have these problems. So that gives you on-demand mounting, which is one thing you can do. And you could try and configure AutoFS to do this, but there are a number of reasons why it's difficult, which are much too complicated to go into now. So it doesn't just work out of the gate, although it sounds like it ought to.
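
NOTE
A minimal sketch of the symlink idea described above, assuming a hypothetical automount root
and hypothetical compiler directory names. It is not the Aquatic or Compiler Explorer
implementation, and the AutoFS configuration itself is left out. It only shows the shape:
every visible entry is a symlink into a well-known path, so nothing is mounted until the
link is followed.
```python
import os
from pathlib import PurePosixPath
from typing import Dict, Iterable
#
def plan_symlinks(names: Iterable[str], tree_root: str, automount_root: str) -> Dict[str, str]:
    """Map each visible path in the tree to its target under the automounted root."""
    return {str(PurePosixPath(tree_root) / n): str(PurePosixPath(automount_root) / n) for n in names}
#
def create_symlinks(plan: Dict[str, str]) -> None:
    """Create the links; targets need not exist yet, they appear on first access."""
    for link, target in plan.items():
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if not os.path.islink(link):
            os.symlink(target, link)
```
Keeping the plan separate from the side effects makes it possible to inspect what would be
linked before touching the file system.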

251
00:23:37.880 --> 00:23:57.880
<v Matt Godbolt>But we don't really want 3,000 SquashFS images. That's a pain. I do want to have, like, here are all the GCCs. And what you could do is mount sub parts of those images into this unified tree, which means that like, Hey, I've got all the GCCs, but now GCC 15 has just come out.

252
00:23:57.880 --> 00:24:05.720
<v Matt Godbolt>I don't want to have to rebuild that image because that's 500 gigabytes of GCC and SquashFS is immutable.

253
00:24:05.720 --> 00:24:05.860
<v Ben Rady>Yeah.

254
00:24:05.860 --> 00:24:10.160
<v Matt Godbolt>You can't add things after the fact you have to unpack it and then repack it again.

255
00:24:10.160 --> 00:24:10.700
<v Ben Rady>Right.

256
00:24:10.700 --> 00:24:25.960
<v Matt Godbolt>So if your solution for "I'm adding one more GCC" is unpack all the GCCs and then repack all the GCCs with the new one, then you're kind of back to that original AMI problem we had, that is, it's going to get incrementally worse every time you add a new thing.

257
00:24:25.960 --> 00:24:35.350
<v Matt Godbolt>So what you really want to be able to do again, is like have this, well, all of the GCCs for the last, you know, 10 years are in "older GCCs", but they are mounted in each one individually.

258
00:24:35.350 --> 00:24:35.760
<v Ben Rady>yeah, yeah.

259
00:24:35.760 --> 00:24:37.410
<v Matt Godbolt>It's one image that has like the old ones.

260
00:24:37.410 --> 00:24:38.000
<v Ben Rady>yeah Right.

261
00:24:38.000 --> 00:24:41.800
<v Matt Godbolt>And then periodically you add kind of layers. This is sort of like a LayerFS thing.

262
00:24:41.800 --> 00:24:43.020
<v Ben Rady>Yeah, right.

263
00:24:43.020 --> 00:24:52.620
<v Matt Godbolt>And then you... you consolidate the layers. You can have a process that goes away and says, hey, layers three through nine, I can now net them out and make a new layer three, and then I rewrite the file system to be these things.
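
NOTE
A minimal sketch of the "net out layers three through nine" idea, under a deliberately
simplified, hypothetical model: each layer is a dict from path to a content tag, and later
layers override earlier ones. Real SquashFS or overlay tooling is not modelled here.
```python
from typing import Dict, List
#
Layer = Dict[str, str]  # hypothetical model: path mapped to a content tag
#
def resolve(layers: List[Layer]) -> Layer:
    """Flatten layers into one view; later layers win, as in a layered file system."""
    view: Layer = {}
    for layer in layers:
        view.update(layer)
    return view
#
def consolidate(layers: List[Layer], first: int, last: int) -> List[Layer]:
    """Replace layers[first..last] (inclusive) with a single merged layer."""
    merged = resolve(layers[first:last + 1])
    return layers[:first] + [merged] + layers[last + 1:]
```
The property that matters is that the resolved view is unchanged by consolidation; only the
number of layers goes down.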

264
00:24:52.620 --> 00:24:52.780
<v Ben Rady>Yeah.

265
00:24:52.780 --> 00:25:04.960
<v Matt Godbolt>So that's where we want to go ultimately with this. But there isn't a quick MVP for it that gets me out of my current hot potato situation right now.

266
00:25:04.960 --> 00:25:05.200
<v Ben Rady>Right. So you're back to Ubuntu 22.

267
00:25:05.200 --> 00:25:07.570
<v Matt Godbolt>And we've tried it. We're back to Ubuntu 22.

268
00:25:07.570 --> 00:25:08.320
<v Ben Rady>Yeah.

269
00:25:08.320 --> 00:25:12.870
<v Matt Godbolt>Although, although... although

270
00:25:12.870 --> 00:25:13.660
<v Ben Rady>yeah

271
00:25:13.660 --> 00:25:31.090
<v Matt Godbolt>I had a sort of an idea. So while banging my head on my keyboard, trying to go like, how does systemd even notice when I mount things or when I do stuff to the system?

272
00:25:31.090 --> 00:25:32.220
<v Ben Rady>Mm-hmm.

273
00:25:32.220 --> 00:25:58.390
<v Matt Godbolt>I was like, wait a second. What if I wrote something that looked at file system accesses? So I have, for every file that's stored in a SquashFS image somewhere, it is also available naked in NFS because if SquashFS isn't around or whatever, or for those things we genuinely update quicker than the SquashFS images, it allows me to have access to the files that are just in that /opt/compiler-explorer, right?

274
00:25:58.390 --> 00:26:00.100
<v Ben Rady>right

275
00:26:00.100 --> 00:26:16.620
<v Matt Godbolt>So if I could post hoc, that is, run some trace through the whole system and say, hey, anytime I notice somebody accesses a file that is on NFS, that is inside one of the directories that I have a SquashFS image for, that's when I'm going to choose to mount it.

276
00:26:16.620 --> 00:26:18.280
<v Ben Rady>Hmm.

277
00:26:18.280 --> 00:26:29.460
<v Matt Godbolt>And for the first few times, they're still reading from NFS, but once that mount has finished, we sweep in, sweep, that's not a word, swap in or flip in or something, I don't know, one of those, sweep in.

278
00:26:29.460 --> 00:26:31.160
<v Ben Rady>Yeah. Yeah. Right. Yeah. But eventually when it mounts, yeah. Uh huh.

279
00:26:31.160 --> 00:26:33.340
<v Ben Rady>Sweep in. Fly in.

280
00:26:33.340 --> 00:26:48.280
<v Matt Godbolt>Fly in the mounted SquashFS image over the top. And so that is what I've been doing for the last two hours, which is why I was slightly like, let's just talk about it, it's top of mind for me right now. And that's showing some early promise. So in this world, what we would do is we would not mount anything.

281
00:26:48.280 --> 00:26:48.440
<v Ben Rady>Yeah.

282
00:26:48.440 --> 00:26:54.750
<v Matt Godbolt>And then we'd run this daemon, and all it does is sit there and watch file accesses and then sort of lazily bring in the SquashFS images.
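
NOTE
A minimal sketch of the lazy-mount decision described above, not the daemon from the show.
How file accesses are observed is left out entirely; this only shows the bookkeeping once a
path has been seen. The class name, the image path, and the injected mount callback are
hypothetical; /opt/compiler-explorer is the only path taken from the conversation.
```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Optional, Set
#
class LazyMounter:
    def __init__(self, images: Dict[str, str], mount: Callable[[str, str], None]) -> None:
        self.images = images  # directory mapped to the image that can be mounted over it
        self.mount = mount    # injected; a real one might shell out to mount(8)
        self.mounted: Set[str] = set()
    #
    def covering_dir(self, path: str) -> Optional[str]:
        """Return the directory that has an image and contains path, if any."""
        p = PurePosixPath(path)
        for candidate in [p, *p.parents]:
            if str(candidate) in self.images:
                return str(candidate)
        return None
    #
    def on_access(self, path: str) -> bool:
        """Called per observed access; True if this access triggered a mount."""
        d = self.covering_dir(path)
        if d is None or d in self.mounted:
            return False
        self.mount(self.images[d], d)
        self.mounted.add(d)
        return True
```
The first access still goes through the slow path; the mount only helps whatever is read
afterwards, which matches the behaviour described in the conversation.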

283
00:26:54.750 --> 00:26:54.860
<v Ben Rady>Yeah. Yeah.

284
00:26:54.860 --> 00:27:04.200
<v Matt Godbolt>And obviously, in the worst case, a Compiler Explorer node will eventually mount all of the images. But likely as not, it'll never get close.

285
00:27:04.200 --> 00:27:15.000
<v Ben Rady>I was actually going to ask you about that: does that mean that as the Compiler Explorer nodes are running, they're just slowly accumulating these SquashFS mount points?

286
00:27:15.000 --> 00:27:21.680
<v Ben Rady>And do you need to like, clean them up on any sort of regular basis or do you just restart the machines every once in a while? Like, how does that work?

287
00:27:21.680 --> 00:27:29.160
<v Matt Godbolt>Well, so in the current situation we just mount them all at startup and they're up forever. So that's the end, right?

288
00:27:29.160 --> 00:27:29.520
<v Ben Rady>Okay.

289
00:27:29.520 --> 00:27:30.500
<v Matt Godbolt>So this would be a mark.

290
00:27:30.500 --> 00:27:36.000
<v Ben Rady>And the only memory they're really consuming in that state is just that small amount of kernel memory that you were talking about before.

291
00:27:36.000 --> 00:27:46.260
<v Matt Godbolt>Yeah, which isn't that small. I can't remember what it was, but it was like a trivial enough, not sorry, non-trivial enough that you can see that the machine's like memory has gone down having finished mounting everything.

292
00:27:46.260 --> 00:27:46.420
<v Ben Rady>Yeah.

293
00:27:46.420 --> 00:27:49.390
<v Matt Godbolt>Oh, well, which is less than ideal.

294
00:27:49.390 --> 00:27:50.540
<v Ben Rady>Mm-hmm.

295
00:27:50.540 --> 00:27:56.690
<v Matt Godbolt>So yeah, at the moment we pay the cost for all of them, even though, you know, we have some like GCC 1.23.

296
00:27:56.690 --> 00:27:57.360
<v Ben Rady>Yeah.

297
00:27:57.360 --> 00:28:08.470
<v Matt Godbolt>How often do people use that? Probably not very often. And again, we have 40 nodes. They're recycled really quickly. It's very unlikely that any one node will need all 3000 in its lifetime.

298
00:28:08.470 --> 00:28:08.630
<v Ben Rady>Mm-hmm.

299
00:28:08.630 --> 00:28:08.780
<v Ben Rady>Mm-hmm.

300
00:28:08.780 --> 00:28:17.240
<v Matt Godbolt>So in this new world order, the way that I was imagining it is, at least for V1, we just mount them and leave them up because it's no worse than what we had before.

301
00:28:17.240 --> 00:28:17.700
<v Ben Rady>Mm-hmm.

302
00:28:17.700 --> 00:28:30.260
<v Matt Godbolt>You know, it might take an extra half a second, even a second, to mount on the first access. But we're not holding up the compile in that case, it's just going through the slow NFS path.

303
00:28:30.260 --> 00:28:31.060
<v Ben Rady>Yeah, yeah.

304
00:28:31.060 --> 00:28:53.280
<v Matt Godbolt>And also, obviously the SquashFS images are extra NFS accesses that we didn't need to do otherwise, but the hope is it'll net out pretty quickly. And then by the time we either run it again, or even by the time it's finished the first reading of, like, the ELF and it's starting to look at the DLLs that it needs to load in, then it's going to pull the DLLs from the SquashFS image.

305
00:28:53.280 --> 00:28:55.670
<v Matt Godbolt>So that is the hope. We'll see how it goes.

306
00:28:55.670 --> 00:28:56.080
<v Ben Rady>Mm-hmm.

307
00:28:56.080 --> 00:29:08.260
<v Matt Godbolt>I have already found one situation where the SquashFS image is not actually up to date with respect to the changes on the disk, which is like, oh, well, this is going to throw a wrench, a spanner in the works.

308
00:29:08.260 --> 00:29:08.660
<v Ben Rady>OK.

309
00:29:08.660 --> 00:29:09.720
<v Ben Rady>yeah. Yeah.

310
00:29:09.720 --> 00:29:11.730
<v Matt Godbolt>So that is an issue.

311
00:29:11.730 --> 00:29:12.340
<v Matt Godbolt>um

312
00:29:12.340 --> 00:29:12.680
<v Ben Rady>Interesting.

313
00:29:12.680 --> 00:29:17.080
<v Ben Rady>I guess you could have the daemon that you're writing actually do that, right?

314
00:29:17.080 --> 00:29:17.320
<v Matt Godbolt>It would...

315
00:29:17.320 --> 00:29:17.640
<v Ben Rady>Maybe.

316
00:29:17.640 --> 00:29:25.990
<v Matt Godbolt>I guess so. I think I'm just going to go kind of caveat emptor. We'll find them as we hit them, or maybe it'll be something we do as a post-process.

317
00:29:25.990 --> 00:29:26.580
<v Ben Rady>Yeah.

318
00:29:26.580 --> 00:29:30.720
<v Matt Godbolt>That's just running and looking for things that are out of date, you know, and yeah, go ahead.
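
NOTE
A minimal sketch of a post-process that looks for out-of-date images, assuming (hypothetically)
that an image build time and the newest modification time of its source directory are
already known as plain timestamps. How those timestamps are gathered is not shown, and the
names below are illustrative only.
```python
from typing import Dict, List
#
def stale_images(image_built_at: Dict[str, float], newest_source_mtime: Dict[str, float]) -> List[str]:
    """Return the names whose on-disk files changed after their image was built."""
    return sorted(
        name for name, built in image_built_at.items()
        if newest_source_mtime.get(name, 0.0) > built
    )
```
A name with no recorded source time is treated as not stale here; a stricter check could
flag it instead.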

319
00:29:30.720 --> 00:29:39.600
<v Ben Rady>Yeah. Well, I guess the other thing you could do with that, and you probably have this already, is you could farm that thing for usage statistics, right?

320
00:29:39.600 --> 00:29:55.440
<v Matt Godbolt>Yes. In fact, about a year and a half ago, we changed the privacy policy and our backend to track statistics, because, you know, we don't like to track things.

321
00:29:55.440 --> 00:29:56.540
<v Matt Godbolt>That's not what we're into.

322
00:29:56.540 --> 00:29:56.660
<v Ben Rady>Yeah.

323
00:29:56.660 --> 00:30:06.350
<v Matt Godbolt>I don't care what you're doing with it, really. But it is incredibly useful to say, how often do we use this compiler versus that compiler, which I think is a fair use of non-identifiable information.

324
00:30:06.350 --> 00:30:06.740
<v Ben Rady>Right.

325
00:30:06.740 --> 00:30:14.220
<v Ben Rady>For exactly like problems like this, where you're just like, I'm optimizing these things. I want to optimize them based on the usage, not on like, you know, random things.

326
00:30:14.220 --> 00:30:24.660
<v Matt Godbolt>Exactly. And so certainly we can do things like down the line, we can do things like, hey, let's have a cluster that only does legacy compilers, right?

327
00:30:24.660 --> 00:30:26.610
<v Matt Godbolt>And then that cluster just sits there, lives there forever.

328
00:30:26.610 --> 00:30:26.980
<v Ben Rady>Yeah.

329
00:30:26.980 --> 00:30:36.800
<v Matt Godbolt>There's two machines that run all the time. They sit there in their old timey world and request for GCC 1, 2, 3. They're on the front porch with their shotgun across the lap.

330
00:30:36.800 --> 00:30:37.160
<v Ben Rady>Right, with the rocking chairs.

331
00:30:37.160 --> 00:30:51.350
<v Matt Godbolt>Yeah, that's right. Waiting for, you know, the... yeah. And then we could even have, conversely, some faster nodes that are serving the GCC 15.1s that have just come out and the trunk builds and things like that.

332
00:30:51.350 --> 00:30:51.680
<v Ben Rady>Mm-hmm.

333
00:30:51.680 --> 00:31:05.920
<v Matt Godbolt>But our management is not good enough. At the moment, having multiple clusters is painful for us. And so we have kind of two or maybe three clusters, you know, one for the GPU things, one for the ARM compilers, and then one for everything else.

334
00:31:05.920 --> 00:31:06.620
<v Ben Rady>Yeah.

335
00:31:06.620 --> 00:31:11.740
<v Matt Godbolt>And retrofitting in everything without breaking the site is so hard.

336
00:31:11.740 --> 00:31:17.730
<v Matt Godbolt>You know, this is why, when we had our conversation and you talked a little bit about your sort of branch-based deployment, spinning up that...

337
00:31:17.730 --> 00:31:18.060
<v Ben Rady>yeah

338
00:31:18.060 --> 00:31:18.580
<v Ben Rady>Yeah. Yeah.

339
00:31:18.580 --> 00:31:22.380
<v Matt Godbolt>That was like, oh man, I wish. I wish we'd thought of that ahead of time.

340
00:31:22.380 --> 00:31:36.580
<v Ben Rady>Right. Just twist, twist the knife. I mean, you know, I, I think it sounds like at the scale that you guys are at right now, just spinning up one of those environments would be prohibitive in terms of cost, but maybe you could structure in a way where it wasn't quite so bad, you know?

341
00:31:36.580 --> 00:31:41.570
<v Matt Godbolt>So cost is not such a big deal anymore. And I say that with a massive footnote.

342
00:31:41.570 --> 00:31:41.660
<v Ben Rady>Yeah.

343
00:31:41.660 --> 00:31:51.260
<v Matt Godbolt>People are surprised at how relatively cheap Compiler Explorer is to run. We're currently at around about three grand a month burn rate of AWS stuff, although it's just gone up.

344
00:31:51.260 --> 00:31:51.460
<v Ben Rady>yeah

345
00:31:51.460 --> 00:32:03.340
<v Matt Godbolt>But it's gone up for good reasons. The good reason is that I've sent a message out, a blog post, in fact. That's what we call these things, a message. I've made a blog post kind of explaining

346
00:32:03.340 --> 00:32:06.300
<v Ben Rady>You made one internet and you sent it out into the webs

347
00:32:06.300 --> 00:32:23.140
<v Matt Godbolt>I did. I sent it out into the webs about the cost breakdown of Compiler Explorer. I did a big sort of dive into it so I could justify, you know, we're very lucky. We have a lot of commercial sponsors. We have a lot of people who donate on Patreon and GitHub Sponsors and...

348
00:32:23.140 --> 00:32:37.460
<v Matt Godbolt>And my dog's barking, which I can't be bothered to edit out. So we have a lot of money coming in, which is fantastic. And I like to be very upfront and open about what we do with the money, as much as I can within the reasonableness of the fact that it's still

349
00:32:37.460 --> 00:32:37.800
<v Ben Rady>hu

350
00:32:37.800 --> 00:32:40.220
<v Matt Godbolt>my private finances at some level still, right?

351
00:32:40.220 --> 00:32:40.380
<v Ben Rady>Right.

352
00:32:40.380 --> 00:32:53.480
<v Matt Godbolt>I'm sort of hand-waving and gesturing about this stuff. So I like people to know where the money's going. And so telling people like, this is what we spend it on and this is how much it breaks down to, it was interesting. And so it ended up on Hacker News, which was great.

353
00:32:53.480 --> 00:33:05.380
<v Matt Godbolt>And one person said, have you ever considered talking to, you know, the Grafana people or the... SolarWinds or whatever, you know, the people that we pay money to for subscriptions for like monitoring stuff.

354
00:33:05.380 --> 00:33:18.020
<v Matt Godbolt>And I was like, yeah, kind of, but you know, there's something to be said for it's not that much. It's not a huge, you know, that's costing me 40 bucks a month. That's kind of noise. And I don't want to give up too much of my,

355
00:33:18.020 --> 00:33:22.190
<v Matt Godbolt>you know, I don't want to sell out. I would rather pay 40 bucks a month than have them say, hey, you have to put an ad on the top.

356
00:33:22.190 --> 00:33:22.440
<v Ben Rady>Right.

357
00:33:22.440 --> 00:33:28.130
<v Matt Godbolt>Or say thank you to them. And I'm like, I don't know if they would do that, but we went back and forth on that.

358
00:33:28.130 --> 00:33:28.180
<v Ben Rady>Yeah, yeah.

359
00:33:28.180 --> 00:33:28.200
<v Ben Rady>Right.

360
00:33:28.200 --> 00:33:35.980
<v Matt Godbolt>And then he was like, Oh, you do know that AWS have an open source budget. And I'm like the what now?

361
00:33:35.980 --> 00:33:37.700
<v Ben Rady>Oh, yeah.

362
00:33:37.700 --> 00:33:50.080
<v Matt Godbolt>And so I looked it up and it was a blog post from like 2012 that made some mention of, you know, like, hey, if you're an open source project, contact us. We might be able to help you.

363
00:33:50.080 --> 00:33:51.340
<v Matt Godbolt>Here's a form to fill in.

364
00:33:51.340 --> 00:33:51.420
<v Ben Rady>Yeah.

365
00:33:51.420 --> 00:34:03.100
<v Matt Godbolt>And I'm like, okay. Now the form looked complicated. So given how old the blog post was, I just emailed the email address that it said and said, hey, is this thing still on, right?

366
00:34:03.100 --> 00:34:04.170
<v Ben Rady>yeah Right.

367
00:34:04.170 --> 00:34:15.250
<v Matt Godbolt>I'm, you know, I'm the creator of Compiler Explorer. I'm interested in talking if this thing's still on. I immediately got an email bounce and I thought, well, there you go. That tells me, I'm glad I did this rather than spending all the time to look at.

368
00:34:15.250 --> 00:34:15.320
<v Ben Rady>Yeah.

369
00:34:15.320 --> 00:34:27.260
<v Matt Godbolt>So I thought nothing more of it. An hour and a half later, somebody replied going, "oh, wow. Yes. We use Compiler Explorer. We'd love to help definitely fill in the form and send it to us."

370
00:34:27.260 --> 00:34:31.640
<v Matt Godbolt>And I look back at the bounce and it was like, oh, it must be one person's inbox is full.

371
00:34:31.640 --> 00:34:31.770
<v Ben Rady>OK.

372
00:34:31.770 --> 00:34:32.140
<v Ben Rady>Oh, yeah, OK.

373
00:34:32.140 --> 00:34:34.880
<v Matt Godbolt>of the distribution list that it ultimately ended up.

374
00:34:34.880 --> 00:34:35.140
<v Ben Rady>Yeah.

375
00:34:35.140 --> 00:34:54.170
<v Matt Godbolt>So I filled it in, and it asked for what is a year's amount of money that your site or your project might need. And so I was like, I guess three grand times 12, or four grand. I can't remember exactly. I think it was three, three and a half grand. So 36 grand, which, you know, is like a monstrous amount of money.

376
00:34:54.170 --> 00:34:54.180
<v Ben Rady>Yeah.

377
00:34:54.180 --> 00:35:00.820
<v Matt Godbolt>And I was expecting him to just say "hahahahah", no, seriously, how much do you need?

378
00:35:00.820 --> 00:35:13.300
<v Matt Godbolt>And I thought, again, nothing more. Someone much more business oriented got back to me and said, thank you for your email. We'll get back to you within 30 days.

379
00:35:13.300 --> 00:35:13.480
<v Ben Rady>Yeah.

380
00:35:13.480 --> 00:35:17.420
<v Matt Godbolt>And I thought, all right, now it's gone into bureaucracy. We'll never see it again.

381
00:35:17.420 --> 00:35:18.700
<v Ben Rady>Right.

382
00:35:18.700 --> 00:35:25.220
<v Matt Godbolt>And I just happened to log into my AWS account and I saw 36 grand credit had been applied to it.

383
00:35:25.220 --> 00:35:25.220
<v Ben Rady>What? Yeah.

384
00:35:25.220 --> 00:35:37.640
<v Matt Godbolt>And I had to email them back and say like, okay, before I even talk to anyone, what are the... Do I have to tell people about this? Do I have to go out of my way to, like, thank you? What's the deal here?

385
00:35:37.640 --> 00:35:37.780
<v Ben Rady>Yeah.

386
00:35:37.780 --> 00:35:48.620
<v Matt Godbolt>And the woman eventually got back to me: "Oh, sorry. I meant to tell you that you'd been approved." I'm like, were you just going to let that hang? Wow. And so the short version, this is such a,

387
00:35:48.620 --> 00:36:17.400
<v Matt Godbolt>off topic thing is that AWS are now funding the cost that I put forward, the year cost as it was last year, which means immediately you're like, oh, maybe we can start running more instances and whatever, because I have the cash for it now. So it feels like I'm treating it like a subsidy, but what I'm mindful of is they may not renew at the end of this year, in which case I have to be able to scale everything back again. So there's a bit of thought

388
00:36:17.400 --> 00:36:18.660
<v Ben Rady>Yeah, yeah, yeah.

389
00:36:18.660 --> 00:36:32.410
<v Matt Godbolt>I don't know. It's an interesting situation to be in, but it's an amazing situation in the short term. It means like I'm looking at like Redis caching, whereas before I was like, I don't really think that I can justify, you know, another $100 a month just to have something that I might not use, or these kinds of things.

390
00:36:32.410 --> 00:36:32.440
<v Ben Rady>Yeah.

391
00:36:32.440 --> 00:36:38.160
<v Matt Godbolt>So it's very explorative. So I'm excited about that.

392
00:36:38.160 --> 00:36:41.300
<v Matt Godbolt>But how do we get to this?

393
00:36:41.300 --> 00:36:41.360
<v Ben Rady>Oh, that's cool.

394
00:36:41.360 --> 00:36:41.460
<v Ben Rady>That's very cool, actually.

395
00:36:41.460 --> 00:36:44.500
<v Matt Godbolt>I'm forgetting. Oh, yeah, you were saying about how expensive it might be.

396
00:36:44.500 --> 00:36:47.200
<v Ben Rady>Yeah, you know, the branch based stuff might be too expensive and, you know.

397
00:36:47.200 --> 00:36:52.960
<v Matt Godbolt>But now it might be okay. Honestly, I look at my load balancer now.

398
00:36:52.960 --> 00:36:53.260
<v Ben Rady>Yeah.

399
00:36:53.260 --> 00:36:57.780
<v Matt Godbolt>So load balancers cost, what, $10 a month plus the transfer? Maybe a little more...

400
00:36:57.780 --> 00:37:00.520
<v Ben Rady>Yeah, it's the data that you really pay for there, right? Yeah.

401
00:37:00.520 --> 00:37:01.700
<v Matt Godbolt>It is, yeah.

402
00:37:01.700 --> 00:37:09.770
<v Matt Godbolt>And I've often wanted to have multiple load balancers, you know, one for each. I used to have subdomains for like our staging environment and things like that.

403
00:37:09.770 --> 00:37:09.900
<v Ben Rady>Yeah.

404
00:37:09.900 --> 00:37:10.160
<v Ben Rady>Mm. Mm-hmm.

405
00:37:10.160 --> 00:37:18.090
<v Matt Godbolt>And then I made it so that every subdomain ends up in godbolt.org, you know, comes to Compiler Explorer. So you have to be like saying, www.staging.godbolt.org.

406
00:37:18.090 --> 00:37:18.610
<v Ben Rady>Mm-hmm.

407
00:37:18.610 --> 00:37:19.640
<v Ben Rady>Yeah.

408
00:37:19.640 --> 00:37:23.960
<v Matt Godbolt>And then you're into multi-level DNS and that's a pain in the bum because you can't do wildcards and all that kind of stuff. You know, you know these things.

409
00:37:23.960 --> 00:37:25.120
<v Ben Rady>Yeah. Yeah.

410
00:37:25.120 --> 00:37:40.040
<v Matt Godbolt>But I stopped doing that because I was originally using it to route to a different load balancer. But, you know, to have one load balancer per environment was expensive. So everything now goes to the same load balancer and it's URL-matched to go off on its merry way.

411
00:37:40.040 --> 00:37:42.620
<v Matt Godbolt>And that's not scaling all that well now.

412
00:37:42.620 --> 00:37:42.620
<v Ben Rady>Right. Yeah, yeah.

413
00:37:42.620 --> 00:37:51.120
<v Matt Godbolt>I've got multiple of them and there's other things on there. And you know there was a time when 10 bucks a month for another load balancer was like meaningful. And now...

414
00:37:51.120 --> 00:37:51.120
<v Ben Rady>Right.

415
00:37:51.120 --> 00:37:54.270
<v Matt Godbolt>not to, you know, put too far to border.

416
00:37:54.270 --> 00:37:54.420
<v Ben Rady>Yeah.

417
00:37:54.420 --> 00:37:58.820
<v Matt Godbolt>That's noise. I don't worry about it. So maybe I should go, maybe I should explore branch based development.

418
00:37:58.820 --> 00:38:02.860
<v Ben Rady>If you can spend 10 bucks a month to make your life a little easier, I would suggest that you do it.

419
00:38:02.860 --> 00:38:10.360
<v Matt Godbolt>Yeah. Yeah. That is, that is the cost. I think right now is, you know, the trade-off between the cost of time.

420
00:38:10.360 --> 00:38:10.780
<v Ben Rady>Yeah.

421
00:38:10.780 --> 00:38:23.940
<v Matt Godbolt>And I have supposedly three presentations to prepare for and not be doing Compiler Explorer stuff. And then I've got kind of two and a half clear months before I have to go and work for a real job.

422
00:38:23.940 --> 00:38:23.960
<v Ben Rady>What - a job?!

423
00:38:23.960 --> 00:38:25.640
<v Matt Godbolt>I know.

424
00:38:25.640 --> 00:38:28.040
<v Ben Rady>That sounds terrible.

425
00:38:28.040 --> 00:38:28.340
<v Matt Godbolt>I know it does.

426
00:38:28.340 --> 00:38:28.380
<v Ben Rady>Yeah.

427
00:38:28.380 --> 00:38:41.460
<v Matt Godbolt>Doesn't it? It sounds awful. So yeah, I'm sort of very much top of mind thinking about how I'm going to go back to work, Ben. I don't know.

428
00:38:41.460 --> 00:38:46.780
<v Ben Rady>Yeah. I think in the first week you're going to be like, this is awesome.

429
00:38:46.780 --> 00:38:48.000
<v Ben Rady>That's what I predict. You're just going to be like, oh, right.

430
00:38:48.000 --> 00:38:48.060
<v Matt Godbolt>I reckon so too.

431
00:38:48.060 --> 00:38:50.360
<v Ben Rady>I remember why I love this.

432
00:38:50.360 --> 00:38:51.870
<v Matt Godbolt>Yeah, I think you're absolutely right.

433
00:38:51.870 --> 00:38:52.000
<v Ben Rady>Yeah. Yeah.

434
00:38:52.000 --> 00:38:58.740
<v Matt Godbolt>I'm pretty bullish about it. I check in with my new gig from time to time.

435
00:38:58.740 --> 00:39:04.840
<v Matt Godbolt>And, you know, I always come away feeling excited. Buoyed [in a British accent: "boid"], as I would say, or Booid, would you say, as a yank?

436
00:39:04.840 --> 00:39:05.160
<v Ben Rady>who

437
00:39:05.160 --> 00:39:12.200
<v Ben Rady>I would probably say buoyed. But I wouldn't say either of those words. I would just say excited because it's just too nautical.

438
00:39:12.200 --> 00:39:12.600
<v Matt Godbolt>You'd say excited. Yeah, that was... That

439
00:39:12.600 --> 00:39:15.280
<v Ben Rady>I'm not.

440
00:39:15.280 --> 00:39:17.700
<v Matt Godbolt>says a man who works for a company called Aquatic.

441
00:39:17.700 --> 00:39:19.300
<v Ben Rady>Yeah.

442
00:39:19.300 --> 00:39:23.040
<v Matt Godbolt>Yeah. Bowie.

443
00:39:23.040 --> 00:39:29.730
<v Ben Rady>We have a service actually at Aquatic called buoy, and I like trip over it every time I say it or spell it.

444
00:39:29.730 --> 00:39:29.940
<v Matt Godbolt>Which...

445
00:39:29.940 --> 00:39:30.500
<v Matt Godbolt>It's...

446
00:39:30.500 --> 00:39:30.620
<v Ben Rady>Buied.

447
00:39:30.620 --> 00:39:40.340
<v Matt Godbolt>So in British English, that is boy. It's always been boy. Like, you know, what is the property of being able to float?

448
00:39:40.340 --> 00:39:40.610
<v Ben Rady>Yeah.

449
00:39:40.610 --> 00:39:41.440
<v Matt Godbolt>It is... Say it.

450
00:39:41.440 --> 00:39:42.280
<v Ben Rady>Booeyant.

451
00:39:42.280 --> 00:39:51.120
<v Matt Godbolt>Yeah! Listener, you should look at the contortions on Ben's face as he tried to justify pronouncing it that way. Yeah.

452
00:39:51.120 --> 00:39:52.340
<v Ben Rady>Okay. Point taken.

453
00:39:52.340 --> 00:40:03.440
<v Matt Godbolt>But, you know, very few of these language-based justifications hold water if you start looking too deeply, because English is not very logical.

454
00:40:03.440 --> 00:40:04.220
<v Ben Rady>Yeah. I think

455
00:40:04.220 --> 00:40:05.760
<v Ben Rady>Well, none of them hold water with

456
00:40:05.760 --> 00:40:06.440
<v Matt Godbolt>Anyway.

457
00:40:06.440 --> 00:40:09.390
<v Ben Rady>buoyant because they float. That's the that's what you're doing.

458
00:40:09.390 --> 00:40:09.540
<v Matt Godbolt>but do

459
00:40:09.540 --> 00:40:12.740
<v Ben Rady>it's and Never mind. I'll go home. um

460
00:40:12.740 --> 00:40:21.200
<v Matt Godbolt>Maybe we should actually somehow we've been talking for, well, I've been talking and you've been very kindly and listener has been very kindly listening to me vent my spleen.

461
00:40:21.200 --> 00:40:28.940
<v Ben Rady>Oh, no. I mean, I love these war stories. I think we should do more of these. It's like, let me tell you about this bug that consumed two days of my life.

462
00:40:28.940 --> 00:40:29.240
<v Matt Godbolt>Yeah.

463
00:40:29.240 --> 00:40:29.700
<v Ben Rady>Those are great.

464
00:40:29.700 --> 00:40:38.580
<v Matt Godbolt>I mean, I think it's valuable sometimes to hear them. I mean, so it's fun to tell them, but sometimes it's nice to hear them because then you secretly, you go back to your desk.

465
00:40:38.580 --> 00:40:38.660
<v Ben Rady>Mm-hmm.

466
00:40:38.660 --> 00:40:43.380
<v Matt Godbolt>You're like, I don't feel so bad about spending four hours tracking down this thing now.

467
00:40:43.380 --> 00:40:55.640
<v Ben Rady>Right. Yeah. You know, it's like the old thing about hacking and programming in movies: it's people with, you know, one hand on each keyboard, and then all the things scrolling by on the screen, and the charts and graphs.

468
00:40:55.640 --> 00:41:02.530
<v Ben Rady>And in reality, it's just staring at a stack trace for 30 minutes going like, "I am bad at my job" [whispered].

469
00:41:02.530 --> 00:41:09.640
<v Matt Godbolt>Yeah. Well, in case there was any doubt, you're not bad at your job. I don't think I'm bad at my job, but yeah, feeling that way occasionally is...

470
00:41:09.640 --> 00:41:17.380
<v Ben Rady>Yeah, that's just how you feel. You're just like, oh, well, how did this ever work? I don't understand how it ever even worked, let alone what's happening now.

471
00:41:17.380 --> 00:41:17.500
<v Matt Godbolt>Yeah...

472
00:41:17.500 --> 00:41:17.620
<v Ben Rady>So, yes.

473
00:41:17.620 --> 00:41:21.960
<v Matt Godbolt>Yeah, like, who wrote this? You know, the git blame? And you're like, oh. Oh, yeah.

474
00:41:21.960 --> 00:41:23.160
<v Ben Rady>Right. Oh, it's me. Yeah.

475
00:41:23.160 --> 00:41:31.780
<v Matt Godbolt>All right, friend. Well, short of starting a whole new conversation, I think we should finish it up here, unless there's anything you want... Parting words of wisdom?

476
00:41:31.780 --> 00:41:32.760
<v Ben Rady>No, that sounds good. This was a good episode.

477
00:41:32.760 --> 00:41:33.300
<v Matt Godbolt>Those were your...

478
00:41:33.300 --> 00:41:35.240
<v Ben Rady>I like it. I dig it. Ship it.

479
00:41:35.240 --> 00:41:37.710
<v Matt Godbolt>Fantastic, mate. I will. I will.

480
00:41:37.710 --> 00:41:38.300
<v Ben Rady>Yeah.

481
00:41:38.300 --> 00:41:43.380
<v Matt Godbolt>I will get the minions to edit and put it out soon.

482
00:41:43.380 --> 00:41:45.640
<v Matt Godbolt>And by the minions, I mean me.

483
00:41:45.640 --> 00:41:46.740
<v Ben Rady>Perfect.

484
00:41:46.740 --> 00:41:48.420
<v Matt Godbolt>There are no minions.

485
00:41:48.420 --> 00:41:48.860
<v Ben Rady>Right.

486
00:41:48.860 --> 00:41:49.300
<v Matt Godbolt>Cool.

487
00:41:49.300 --> 00:41:49.700
<v Ben Rady>Cool.

488
00:41:49.700 --> 00:41:51.560
<v Matt Godbolt>All right, friend. Until next time.

489
00:41:51.560 --> 00:41:54.560
<v Ben Rady>Until next time.

