WEBVTT

1
00:00:19.120 --> 00:00:19.680
<v Matt Godbolt>Hi, Ben.

2
00:00:19.680 --> 00:00:20.580
<v Ben Rady>Hey, Matt.

3
00:00:20.580 --> 00:00:21.860
<v Matt Godbolt>How are things?

4
00:00:21.860 --> 00:00:23.180
<v Ben Rady>Not too bad.

5
00:00:23.180 --> 00:00:33.100
<v Matt Godbolt>Cool. Well, unusually for us, we have a topic before we start today, because yesterday you and I got coffee, and I got excited.

6
00:00:33.100 --> 00:00:33.660
<v Ben Rady>Yes. Uh-huh.

7
00:00:33.660 --> 00:00:45.480
<v Matt Godbolt>And then we both were like, I wish we'd recorded this. So we're going to try and have that same conversation again, with the same level of excitement and enthusiasm, which you know is a bit of a struggle for us both.

8
00:00:45.480 --> 00:00:45.700
<v Ben Rady>Mm hmm.

9
00:00:45.700 --> 00:00:47.440
<v Matt Godbolt>We're not very passionate about what we do.

10
00:00:47.440 --> 00:00:49.220
<v Ben Rady>Right. We're not normally enthusiastic people.

11
00:00:49.220 --> 00:00:50.560
<v Matt Godbolt>No.

12
00:00:50.560 --> 00:00:54.090
<v Ben Rady>So every moment of this podcast has been pure pain.

13
00:00:54.090 --> 00:00:54.320
<v Matt Godbolt>So we'll have to try that. But...

14
00:00:54.320 --> 00:00:59.230
<v Ben Rady>I'm just going to admit that I've been keeping it hidden up until this very moment.

15
00:00:59.230 --> 00:00:59.460
<v Matt Godbolt>like i

16
00:00:59.460 --> 00:01:02.120
<v Matt Godbolt>Until this second, and now it's revealed to the public.

17
00:01:02.120 --> 00:01:02.430
<v Ben Rady>Uh huh.

18
00:01:02.430 --> 00:01:05.650
<v Matt Godbolt>Yeah. I mean, we've already told everyone that this is an extremely low-effort podcast.

19
00:01:05.650 --> 00:01:05.900
<v Ben Rady>Yes.

20
00:01:05.900 --> 00:01:14.390
<v Matt Godbolt>We go out of our way to make it as simple as possible for ourselves, even though, you know, I still like the fact that we do transcripts and that we do edit them a little bit.

21
00:01:14.390 --> 00:01:14.660
<v Ben Rady>Right.

22
00:01:14.660 --> 00:01:18.180
<v Matt Godbolt>I try and make my dog not make a noise in the background.

23
00:01:18.180 --> 00:01:18.860
<v Ben Rady>Mm hmm. Good luck.

24
00:01:18.860 --> 00:01:23.630
<v Matt Godbolt>I know, he's just started moving. I can hear him. We'll see how this goes.

25
00:01:23.630 --> 00:01:24.300
<v Ben Rady>Mm hmm.

26
00:01:24.300 --> 00:01:29.000
<v Matt Godbolt>But today, yeah, I wanted to talk to you about some cool programming stuff that I've actually been doing.

27
00:01:29.000 --> 00:01:29.320
<v Ben Rady>Mm hmm.

28
00:01:29.320 --> 00:01:39.780
<v Matt Godbolt>You know, after our last conversation where we said, hey, it's kind of cool to be a programmer and not be a people manager, that has come to pass in the last few days.

29
00:01:39.780 --> 00:01:40.190
<v Ben Rady>Yeah.

30
00:01:40.190 --> 00:01:41.440
<v Ben Rady>Mm hmm.

31
00:01:41.440 --> 00:01:45.650
<v Matt Godbolt>So yeah, let me phrase it as a puzzle to you.

32
00:01:45.650 --> 00:01:46.500
<v Ben Rady>OK.

33
00:01:46.500 --> 00:02:04.520
<v Matt Godbolt>So these are the constraints that you have. You're writing a piece of code where you have many threads, possibly many processes. There are some important, critical pieces of information that you need to share between these various programs.

34
00:02:04.520 --> 00:02:12.510
<v Matt Godbolt>And it's imperative that at no point is the producer of this information stopped or blocked by anything.

35
00:02:12.510 --> 00:02:13.280
<v Ben Rady>OK, yeah.

36
00:02:13.280 --> 00:02:14.060
<v Ben Rady>okay yeah

37
00:02:14.060 --> 00:02:33.820
<v Matt Godbolt>You only need the most recent output. So imagine it's, let's say, the current S&P 500, right? Whatever the S&P 500 is, something's happening, you're measuring it, and then you're writing the value to somewhere that many readers are then going to read from.

38
00:02:33.820 --> 00:02:34.000
<v Ben Rady>Mm hmm. Yeah.

39
00:02:34.000 --> 00:02:43.260
<v Matt Godbolt>But obviously, this information is big enough that you can't just write it in the hope that everyone sees the output

40
00:02:43.260 --> 00:02:43.260
<v Ben Rady>Right.

41
00:02:43.260 --> 00:02:45.440
<v Matt Godbolt>exactly as you wrote it in, right?

42
00:02:45.440 --> 00:02:46.300
<v Ben Rady>It's many bytes.

43
00:02:46.300 --> 00:02:47.860
<v Matt Godbolt>It's many bytes, right, exactly.

44
00:02:47.860 --> 00:02:48.460
<v Ben Rady>Uh huh. Yeah.

45
00:02:48.460 --> 00:03:07.320
<v Matt Godbolt>So how do you design a system where you can share small, but not trivially small, pieces of information between threads, where you never block the writer and the readers need fast access to it, but you don't mind if they have to do a bit of extra work?

46
00:03:07.320 --> 00:03:14.300
<v Matt Godbolt>So there are actually many solutions to this, but what comes to mind?

47
00:03:14.300 --> 00:03:19.660
<v Ben Rady>Well, I mean, you haven't mentioned anything about the latency constraints.

48
00:03:19.660 --> 00:03:20.640
<v Matt Godbolt>Correct. Yeah.

49
00:03:20.640 --> 00:03:25.980
<v Ben Rady>So given the fact that there are currently no latency constraints, I can think of many solutions.

50
00:03:25.980 --> 00:03:26.100
<v Matt Godbolt>Yeah.

51
00:03:26.100 --> 00:03:32.570
<v Matt Godbolt>Well, actually, go on then. So let's think of a latency constraint where we don't block the writer. At no point do we block the writer.

52
00:03:32.570 --> 00:03:32.840
<v Ben Rady>yeah

53
00:03:32.840 --> 00:03:35.460
<v Matt Godbolt>So think of, well, what can we do?

54
00:03:35.460 --> 00:03:43.840
<v Ben Rady>Uh-huh. So going back to one of our previous episodes on building boring things: files. You could use a fan-out queue.

55
00:03:43.840 --> 00:03:44.720
<v Matt Godbolt>Uh-huh.

56
00:03:44.720 --> 00:04:04.120
<v Ben Rady>You could use something where, if a receiver of this information is expected to handle falling behind by just losing information,

57
00:04:04.120 --> 00:04:07.140
<v Matt Godbolt>Right, which is absolutely the case. We only care about the most recent value for the readers.

58
00:04:07.140 --> 00:04:10.720
<v Ben Rady>Yeah. You could use some sort of network broadcast to just broadcast it out.

59
00:04:10.720 --> 00:04:10.880
<v Matt Godbolt>Yeah.

60
00:04:10.880 --> 00:04:13.880
<v Ben Rady>And it's like, if you're listening and you get this, that's cool. And if you don't, then you don't.

61
00:04:13.880 --> 00:04:16.290
<v Matt Godbolt>That is cool. Yeah.

62
00:04:16.290 --> 00:04:23.610
<v Ben Rady>Which has some of the same properties as a fan-out queue, but isn't exactly the same thing.

63
00:04:23.610 --> 00:04:25.700
<v Matt Godbolt>Yeah.

64
00:04:25.700 --> 00:04:32.750
<v Ben Rady>You know, those are probably the top three, again, modulo there being no latency constraints.

65
00:04:32.750 --> 00:04:34.620
<v Ben Rady>That's where I would probably start, with one of those three.

66
00:04:34.620 --> 00:04:43.640
<v Matt Godbolt>I mean, if you drop RAM constraints and things like that, you could just have an append-only queue, and then the readers only look at the most recently written entry.

67
00:04:43.640 --> 00:04:45.840
<v Ben Rady>Yeah, right. Well, that's what I was saying. My first one was files, right?

68
00:04:45.840 --> 00:04:46.280
<v Matt Godbolt>Yeah.

69
00:04:46.280 --> 00:04:56.220
<v Ben Rady>Like, here's a file. You know, the messages are fixed length. So start at byte zero and read. And then when you hit the end of file, try again, right?

70
00:04:56.220 --> 00:05:00.550
<v Matt Godbolt>Right. And then if you want to read the latest, and you don't care about the intermediates, which is the part

71
00:05:00.550 --> 00:05:00.780
<v Ben Rady>And just keep going.

72
00:05:00.780 --> 00:05:00.840
<v Ben Rady>Yeah.

73
00:05:00.840 --> 00:05:04.370
<v Matt Godbolt>I didn't state very well. Then you seek to the end of the file.

74
00:05:04.370 --> 00:05:04.520
<v Ben Rady>Yeah.

75
00:05:04.520 --> 00:05:14.440
<v Matt Godbolt>And if it's not a multiple of the object size, then you know that you've kind of caught the writer while it's still writing. So you mod it, something like that.
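
NOTE
Editor's sketch of the seek-to-the-end idea described in this cue. It assumes fixed-length records appended to a single file and that a partially appended record shows up as a file size that is not a multiple of the record size. The function name and parameters are hypothetical and do not come from the episode.
```cpp
#include <cstdint>
#include <optional>
// Hypothetical helper: given the current file size and the fixed record size,
// return the byte offset of the most recent fully written record, if any.
// Rounding down with the modulo skips a record the writer is still appending.
std::optional<std::uint64_t> last_complete_record_offset(std::uint64_t file_size,
                                                         std::uint64_t record_size) {
    if (record_size == 0) return std::nullopt;      // invalid record size
    const std::uint64_t complete_bytes = file_size - (file_size % record_size);
    if (complete_bytes == 0) return std::nullopt;   // no whole record written yet
    return complete_bytes - record_size;            // start of the last whole record
}
```
A reader would stat the file, pass the size to this helper, and read one record at the returned offset.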

76
00:05:14.440 --> 00:05:17.220
<v Ben Rady>Yeah, yeah, yeah, that'd be really dumb and would work.

77
00:05:17.220 --> 00:05:17.520
<v Matt Godbolt>And that gives you some amount of. Exactly.

78
00:05:17.520 --> 00:05:18.060
<v Ben Rady>Yeah.

79
00:05:18.060 --> 00:05:27.960
<v Matt Godbolt>Yeah. And all of those things I think you could make work. But yeah, to get back round to the obvious part here: we don't have infinite RAM, and files are probably not a good choice.

80
00:05:27.960 --> 00:05:47.440
<v Matt Godbolt>And we're looking for the lowest possible latency. What is the fastest possible way that you could share this information? Again, the writer is probably going to be updating this very frequently. It's, you know, some derived financial thing, and then the readers from time to time need to be able to look at it and say, what is the current price?

81
00:05:47.440 --> 00:06:02.200
<v Matt Godbolt>Maybe it's even a web view of, like, what is the current S&P 500? And you're like, okay, I don't care how long it takes my web service to read the current value out, but it is at least a consistent value.

82
00:06:02.200 --> 00:06:02.700
<v Ben Rady>Right.

83
00:06:02.700 --> 00:06:03.880
<v Matt Godbolt>It's not like half of... Yeah, that's right, yeah.

84
00:06:03.880 --> 00:06:07.300
<v Ben Rady>Yeah, you don't get half the bytes. You're like, wow, the market looks cheap today.

85
00:06:07.300 --> 00:06:12.420
<v Matt Godbolt>Yeah, that's a very bad mistake that you make usually only once in a career.

86
00:06:12.420 --> 00:06:14.080
<v Ben Rady>Yeah.

87
00:06:14.080 --> 00:06:45.840
<v Matt Godbolt>So yeah, one of the solutions to this problem is to use a relatively unusual and fun construct called a sequence lock. It's used in a bunch of places in the Linux kernel, so it's kind of well-worn and well-trusted. But it has some interesting characteristics. So let me explain to you what a sequence lock is. And then we can talk about why it's kind of difficult to get this right, and why I've spent a week trying to write one and am not sure whether I have actually got it right or not.

88
00:06:45.840 --> 00:07:02.560
<v Matt Godbolt>So the trick is this. Well, first of all, the piece of information is in a shared piece of memory, and it's always the same piece of memory. We're not using it like a queue; it's just one place to look for a particular value.

89
00:07:02.560 --> 00:07:06.700
<v Matt Godbolt>So like, this is the S&P 500, it lives at this address in some piece of shared memory.

90
00:07:06.700 --> 00:07:06.820
<v Ben Rady>Okay.

91
00:07:06.820 --> 00:07:06.960
<v Matt Godbolt>And if you've got lots of

92
00:07:06.960 --> 00:07:15.320
<v Ben Rady>So the readers and the writers in this context have this shared memory. So they both have access to this memory, which means they're at least running on the same machine.

93
00:07:15.320 --> 00:07:16.580
<v Matt Godbolt>Correct, yes, that is right, yeah.

94
00:07:16.580 --> 00:07:20.380
<v Ben Rady>They may not necessarily be in the same process, but they're running on the same physical computer or virtual computer.

95
00:07:20.380 --> 00:07:21.990
<v Matt Godbolt>That is correct for this. Yes.

96
00:07:21.990 --> 00:07:22.340
<v Ben Rady>Yes.

97
00:07:22.340 --> 00:07:25.070
<v Matt Godbolt>Yeah. Same physical... well, or yeah, virtual computer or whatever.

98
00:07:25.070 --> 00:07:25.400
<v Ben Rady>Yes.

99
00:07:25.400 --> 00:07:38.160
<v Matt Godbolt>I mean, modulo tricks like RDMA and things like that across networks. But no, what we're talking about here really, genuinely is the same computer, with shared RAM chips somewhere in the system.

100
00:07:38.160 --> 00:07:39.690
<v Matt Godbolt>Right. And it's the same address.

101
00:07:39.690 --> 00:07:40.560
<v Ben Rady>Yep.

102
00:07:40.560 --> 00:07:51.630
<v Matt Godbolt>So, as well as storing the piece of information that we want to protect, which, as I say, is like eight, 16... well, if it's eight bytes we probably wouldn't worry about this. If it's 16 to 32-ish bytes, that kind of size.

103
00:07:51.630 --> 00:07:51.860
<v Ben Rady>Mhm.

104
00:07:51.860 --> 00:08:06.160
<v Matt Godbolt>We also store a sequence number. Every time the writer goes to write, it increments the sequence number, it updates the data inside the block that's being protected, and then it increments the sequence number again.

105
00:08:06.160 --> 00:08:15.500
<v Matt Godbolt>We'll get to why that's important. Anyone who's listening who knows roughly what I'm talking about here has already seen all the problems with the things that we've just talked about. But let's talk about what the reader does.

106
00:08:15.500 --> 00:08:25.470
<v Matt Godbolt>The reader reads the sequence number. If it's odd, the reader knows that the writer is in the process of updating it.

107
00:08:25.470 --> 00:08:25.720
<v Ben Rady>Okay, yeah.

108
00:08:25.720 --> 00:08:27.920
<v Matt Godbolt>So the reader says, I can't look.

109
00:08:27.920 --> 00:08:28.480
<v Ben Rady>Right.

110
00:08:28.480 --> 00:08:30.490
<v Matt Godbolt>And it just waits a bit and tries again.

111
00:08:30.490 --> 00:08:31.260
<v Ben Rady>Mm hmm.

112
00:08:31.260 --> 00:08:45.400
<v Matt Godbolt>So what we've got is a very optimistic kind of attempt here. If it's even, we take a copy of the protected data, but that doesn't mean to say that it wasn't being updated by the writer while we were copying it.

113
00:08:45.400 --> 00:08:45.980
<v Ben Rady>Yeah.

114
00:08:45.980 --> 00:08:46.680
<v Ben Rady>Mm hmm. Right.

115
00:08:46.680 --> 00:09:14.390
<v Matt Godbolt>So we've just read a bunch of data. We can't look at that data yet. What we're going to do is look at the sequence number again. And only if the sequence number matches the first sequence number we read do we know that we got a good copy of the data that was being shared. And now we can look at it. It's safe to look at it. We know that the writer wasn't in the process of updating it, nor did the writer get in and do an entire update while we were taking a copy of the information.

116
00:09:14.390 --> 00:09:15.320
<v Ben Rady>Right.

117
00:09:15.320 --> 00:09:18.860
<v Matt Godbolt>Obviously, if the sequence number changes, we have to retry.
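
NOTE
Editor's sketch of the writer and reader steps described above, for threads within one process. This is not Matt's implementation. It follows a commonly cited pattern for sequence locks under the C++ memory model: the payload words are relaxed atomics and fences order them against the sequence number. The class and member names are hypothetical. Using it across processes would additionally assume the object lives in shared memory and that these atomics are lock-free.
```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>
// Sketch of a single-writer sequence lock guarding N 64-bit words.
// The payload words are relaxed atomics so that racing reads are well defined.
template <std::size_t N>
class SeqLock {
public:
    using Payload = std::array<std::uint64_t, N>;
    // Writer side: never blocks. Must only ever be called by one writer.
    void write(const Payload& value) {
        const std::uint64_t seq0 = seq_.load(std::memory_order_relaxed);
        seq_.store(seq0 + 1, std::memory_order_relaxed);      // odd: update in progress
        std::atomic_thread_fence(std::memory_order_release);  // data stores stay after the odd store
        for (std::size_t i = 0; i < N; ++i)
            data_[i].store(value[i], std::memory_order_relaxed);
        seq_.store(seq0 + 2, std::memory_order_release);      // even again: safe to read
    }
    // Reader side: one optimistic attempt. An empty result means "retry".
    std::optional<Payload> try_read() const {
        const std::uint64_t before = seq_.load(std::memory_order_acquire);
        if (before & 1) return std::nullopt;                  // writer is mid-update
        Payload copy;
        for (std::size_t i = 0; i < N; ++i)
            copy[i] = data_[i].load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);  // data loads stay before the re-check
        const std::uint64_t after = seq_.load(std::memory_order_relaxed);
        if (before != after) return std::nullopt;             // a write overlapped the copy
        return copy;
    }
    // Convenience: spin until a consistent copy is obtained.
    Payload read() const {
        for (;;)
            if (auto v = try_read()) return *v;
    }
private:
    std::atomic<std::uint64_t> seq_{0};
    std::array<std::atomic<std::uint64_t>, N> data_{};
};
```
The reader only inspects the copy after the second sequence check passes; a failed attempt costs it a retry and never delays the writer.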

118
00:09:18.860 --> 00:09:26.500
<v Ben Rady>Right. So what happens if you have the slowest reader in the world? It's like a dude with an abacus.

119
00:09:26.500 --> 00:09:27.080
<v Matt Godbolt>Yeah.

120
00:09:27.080 --> 00:09:33.680
<v Ben Rady>And every single time you go to read the sequence number a second time, it's different.

121
00:09:33.680 --> 00:09:36.100
<v Matt Godbolt>Then you are in trouble, right? It is abs...

122
00:09:36.100 --> 00:09:38.870
<v Ben Rady>You're just never getting this data. You're just not, you're not fast enough.

123
00:09:38.870 --> 00:09:39.140
<v Matt Godbolt>It is absolutely.

124
00:09:39.140 --> 00:09:40.420
<v Ben Rady>You need to be faster.

125
00:09:40.420 --> 00:09:55.120
<v Matt Godbolt>That is correct. Yes. So in terms of testing on the machine that I've got, I can get it to the state where, if I have a writer that is fast enough that it literally goes back to back doing modifications, then it is possible to completely starve out the reader.

126
00:09:55.120 --> 00:09:55.520
<v Ben Rady>Yeah.

127
00:09:55.520 --> 00:09:56.240
<v Ben Rady>Mm hmm.

128
00:09:56.240 --> 00:10:13.200
<v Matt Godbolt>I say completely starve out: every few microseconds the reader was able to sneak in, but this is with a very fast reader, right? And on purpose, what we're talking about is a tiny data structure where you're saying these are bytes that I'm going to memcpy out of shared memory. So you're kind of hoping that,

129
00:10:13.200 --> 00:10:18.790
<v Matt Godbolt>Even if you're going to do something slow with it, you're going to do something slow with the copy that you got, the good copy that you got of the data.

130
00:10:18.790 --> 00:10:19.140
<v Ben Rady>Yeah.

131
00:10:19.140 --> 00:10:30.120
<v Matt Godbolt>And so it shouldn't matter too much. But it is possible. That is absolutely a trade-off you're making here. The writer can starve out the readers, but computers are super fast.

132
00:10:30.120 --> 00:10:43.000
<v Matt Godbolt>And provided your writer isn't doing something pathological like sitting in a tight loop doing an update, even the loop overhead is enough to give the reader a decent chance of sneaking in from time to time, at least on a computer timescale.

133
00:10:43.000 --> 00:10:55.360
<v Matt Godbolt>But it's absolutely possible, yeah. And if you care about these things, you probably need to make your API not retry, or have a maximum number of retries and fail for the reader and say, hey, I tried a thousand times and it didn't happen.
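
NOTE
Editor's sketch of the "maximum number of retries" idea from this cue. It assumes a single read attempt is modelled as a callable returning an empty std::optional on failure. The helper name and its parameters are hypothetical.
```cpp
#include <cstddef>
#include <optional>
// Hypothetical helper: call `attempt` up to `max_tries` times and return the
// first non-empty result, or an empty optional if every attempt failed.
template <typename Attempt>
auto read_with_retry_limit(Attempt&& attempt, std::size_t max_tries) -> decltype(attempt()) {
    for (std::size_t i = 0; i < max_tries; ++i) {
        if (auto result = attempt()) return result;  // got a consistent copy
    }
    return std::nullopt;                             // caller decides what to do next
}
```
Returning an empty result, instead of spinning forever, lets a starved reader choose between reporting stale data, backing off, or raising an error.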

134
00:10:55.360 --> 00:10:55.520
<v Ben Rady>Yeah, right.

135
00:10:55.520 --> 00:11:06.740
<v Matt Godbolt>And then the reader could make a choice about it. So it's an interesting thing here. Again, the writer never has to block. The writer is able to just write unconditionally.

136
00:11:06.740 --> 00:11:23.080
<v Matt Godbolt>So for example, if the writer is processing a stream, a torrent of incoming UDP market data, as we do in our world, then it's imperative that you never block, because if you block for any length of time, you might miss some information on the network and then you have to go through an expensive recovery process.

137
00:11:23.080 --> 00:11:24.850
<v Matt Godbolt>So it has a great property for the writer.

138
00:11:24.850 --> 00:11:25.200
<v Ben Rady>right

139
00:11:25.200 --> 00:11:31.240
<v Matt Godbolt>And for the reader, it has a bit of a tax. It's not free to do this check, but

140
00:11:31.240 --> 00:11:31.240
<v Ben Rady>Mm hmm.

141
00:11:31.240 --> 00:11:42.540
<v Matt Godbolt>If the ratio of times the reader and writer are both looking at the same thing at the same time is effectively de minimis, which hopefully it would be, then essentially it's free on both sides.

142
00:11:42.540 --> 00:11:43.190
<v Ben Rady>Mm hmm.

143
00:11:43.190 --> 00:11:43.840
<v Ben Rady>Mm hmm.

144
00:11:43.840 --> 00:11:52.830
<v Matt Godbolt>It's a cool data structure. My understanding is it's used in the Linux kernel to protect some of the data around the system time.

145
00:11:52.830 --> 00:11:53.740
<v Ben Rady>Mm hmm. Hmm.

146
00:11:53.740 --> 00:12:07.240
<v Matt Godbolt>So the kernel has some configuration information about, you know, how fast the clock's running and all that kind of good stuff. It's bigger than it can atomically switch out, which is a thing that CPUs can do.

147
00:12:07.240 --> 00:12:16.700
<v Matt Godbolt>And it's mapped into every process so that when you call gettimeofday, you don't actually have to do a system call. You can just read these numbers directly out of a piece of memory that's read-only in your process.

148
00:12:16.700 --> 00:12:17.260
<v Ben Rady>Oh, cool. Yeah.

149
00:12:17.260 --> 00:12:30.560
<v Matt Godbolt>But the kernel can update it when, you know, NTP is fiddling with the time, or when there are overflows in some of the hardware counters that you need to use to actually work out what the current time is. But yeah, for the most part, the kernel doesn't touch that information.

150
00:12:30.560 --> 00:12:39.910
<v Matt Godbolt>But if it does, it needs to do it in a way that's safe for all the processes to read from, and they don't mind doing a retry. So I may be butchering my memory of exactly where that's used. But it's something like that.

151
00:12:39.910 --> 00:12:40.200
<v Ben Rady>Mm hmm.

152
00:12:40.200 --> 00:12:43.700
<v Matt Godbolt>It's a cool thing to do. So...

153
00:12:43.700 --> 00:12:50.260
<v Ben Rady>Hmm. So this reminds me there's a Sylvester Stallone movie. I don't remember if it's Judge Dredd.

154
00:12:50.260 --> 00:12:52.720
<v Matt Godbolt>Whoa, was not expecting it to go this way.

155
00:12:52.720 --> 00:12:59.880
<v Ben Rady>Yeah, it might be. I think it's Judge Dredd. It might be Demolition Man. I don't remember which. I think it's Judge Dredd.

156
00:12:59.880 --> 00:13:02.680
<v Matt Godbolt>I've only seen one of those, so...

157
00:13:02.680 --> 00:13:13.860
<v Ben Rady>But there's a robot walking around selling food and it's recycled food. And there's this like thing that's playing. It's like recycled food is good for the environment and OK for you.

158
00:13:13.860 --> 00:13:21.640
<v Ben Rady>And so this data structure to me sounds like this data structure is good for the writers and okay for the readers.

159
00:13:21.640 --> 00:13:26.910
<v Matt Godbolt>That's basically it. Again, though, it does depend on the duty cycle of the writers and readers.

160
00:13:26.910 --> 00:13:27.120
<v Ben Rady>Yes.

161
00:13:27.120 --> 00:13:51.410
<v Matt Godbolt>And so even at the scale that we work at in finance, there are many microseconds in between packet updates. Well, single-digit microseconds between packets, which is more than enough for even the slowest reader to get in and get a chance, even if they're delayed for a microsecond or so. So it's usually a win-win situation, and so it's fun.

162
00:13:51.410 --> 00:13:51.420
<v Ben Rady>Yeah.

163
00:13:51.420 --> 00:13:51.460
<v Ben Rady>Right. Right. Yeah.

164
00:13:51.460 --> 00:14:06.950
<v Matt Godbolt>It's fascinating. It's interesting. But the real trick comes from the fact that everything that both the compiler and the CPU want to do on your behalf to optimize your code is at loggerheads with this very strict ordering of events.

165
00:14:06.950 --> 00:14:07.300
<v Ben Rady>Yeah.

166
00:14:07.300 --> 00:14:14.420
<v Matt Godbolt>So as I described to you, the writer absolutely has to bump the sequence number first.

167
00:14:14.420 --> 00:14:15.300
<v Ben Rady>yeah

168
00:14:15.300 --> 00:14:22.520
<v Matt Godbolt>Then it needs to change all of the data that it's updating. And it's got a certain amount of time to rummage around changing the data. Again, it's blocking the reader all the time it's doing this.

169
00:14:22.520 --> 00:14:27.270
<v Ben Rady>Yeah. Yeah. And the sequence number is odd while this is happening, which is the signal to the reader that it shouldn't be reading.

170
00:14:27.270 --> 00:14:27.430
<v Matt Godbolt>Correct.

171
00:14:27.430 --> 00:14:27.600
<v Ben Rady>Right.

172
00:14:27.600 --> 00:14:28.510
<v Matt Godbolt>That's correct, yes, yep.

173
00:14:28.510 --> 00:14:29.080
<v Ben Rady>Yep.

174
00:14:29.080 --> 00:14:34.250
<v Matt Godbolt>And this only allows for a single writer. There are extensions that relatively trivially let you have multiple writers, for what it's worth.

175
00:14:34.250 --> 00:14:34.480
<v Ben Rady>Right.

176
00:14:34.480 --> 00:14:53.700
<v Matt Godbolt>And then at the end, once you've finished rummaging around and mutating your data, you need to bump it again, both to make it even again, to signify that it's okay to start reading it, and to be two higher than when you started, so that anybody who took a copy while you were modifying it also notices and knows to retry.

177
00:14:53.700 --> 00:14:54.940
<v Ben Rady>Uh-huh. Right.

178
00:14:54.940 --> 00:15:09.560
<v Matt Godbolt>But those absolutely have to happen in that order. And for folks who are used to regular programming languages like, you know, Python, you write a sequence of statements and they absolutely do happen in that order, right?

179
00:15:09.560 --> 00:15:09.600
<v Ben Rady>Uh

180
00:15:09.600 --> 00:15:14.740
<v Matt Godbolt>There's not anything particularly exciting. If I said A equals one, B equals two, A equals three.

181
00:15:14.740 --> 00:15:14.780
<v Ben Rady>-huh.

182
00:15:14.780 --> 00:15:14.820
<v Ben Rady>Uh-huh.

183
00:15:14.820 --> 00:15:31.040
<v Matt Godbolt>At no point could you pause the CPU and observe that A was three and B was two or whatever. Sorry, that's a really bad example, and it needs a picture for me to even get it right in my head.

184
00:15:31.040 --> 00:15:31.370
<v Ben Rady>yeah

185
00:15:31.370 --> 00:15:31.920
<v Ben Rady>Sort of. Yeah.

186
00:15:31.920 --> 00:15:44.280
<v Matt Godbolt>But there's nothing funky going on there. It just does things one after another. But for an optimizing compiler, it's very convenient to be able to play fast and loose with all of the operations that you're doing, right?

187
00:15:44.280 --> 00:15:44.480
<v Ben Rady>Right.

188
00:15:44.480 --> 00:15:54.720
<v Matt Godbolt>If you can pull forward something expensive that you know you're going to do later on, then you can start, like, a big divide or a multiply or whatever while you're doing some other setup work.

189
00:15:54.720 --> 00:15:54.940
<v Ben Rady>Yeah.

190
00:15:54.940 --> 00:15:57.940
<v Matt Godbolt>And then when you go to look for the result of the divide, it's ready for you.

191
00:15:57.940 --> 00:15:58.020
<v Ben Rady>Uh-huh.

192
00:15:58.020 --> 00:16:01.740
<v Matt Godbolt>Hooray, compilers want to do this all the time. So they want to be able to move the instructions around.

193
00:16:01.740 --> 00:16:02.500
<v Ben Rady>Uh-huh.

194
00:16:02.500 --> 00:16:08.880
<v Matt Godbolt>And you need to be able to tell the compiler, no, you can't move these otherwise unnecessary looking memory operations.

195
00:16:08.880 --> 00:16:24.800
<v Ben Rady>Right, right, right. Is there anything that you can use to indicate that you're writing into memory that is potentially shared with another process? So that the compiler knows, like, I'm doing this for the side effects, on purpose, in a way.

196
00:16:24.800 --> 00:16:47.040
<v Matt Godbolt>There is. I mean, specifically in terms of C++, which is where I am strongest in this regard, a way to model this is to use the volatile keyword. That is not a good way, but it is a way. Volatile is on the path to being deprecated because nobody can really describe what it's for or how it works.

197
00:16:47.040 --> 00:16:51.590
<v Matt Godbolt>But what it essentially says is, don't make any assumptions about this memory location.

198
00:16:51.590 --> 00:16:52.040
<v Ben Rady>Okay.

199
00:16:52.040 --> 00:17:06.120
<v Matt Godbolt>And when you're writing a device driver, you know, if you're writing some very low-level stuff, then you might have a volatile pointer to something that maps to, I don't know, a temperature sensor, so that when you read from it, you're reading the temperature.

200
00:17:06.120 --> 00:17:10.500
<v Matt Godbolt>It's not real memory. That's a great candidate for something to be volatile.

201
00:17:10.500 --> 00:17:17.760
<v Matt Godbolt>It says, hey, this can change from outside. Every time I say to read or write to this, you can't optimize it away. There's nothing you can do. Just do it.
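
NOTE
Editor's sketch of the volatile device-register idea from the temperature sensor example. It assumes an ordinary variable standing in for the memory-mapped register, because a real address depends on the hardware. All names here are hypothetical.
```cpp
#include <cstdint>
// Stand-in for a memory-mapped sensor register. On real hardware this would be
// a fixed address from the device documentation; here it is a plain variable.
volatile std::uint32_t fake_temperature_register = 0;
// Every call performs a real load: the compiler may not cache or elide it.
std::uint32_t read_temperature(const volatile std::uint32_t* reg) {
    return *reg;
}
// Two reads in a row stay two loads, because the value may change in between.
bool temperature_changed(const volatile std::uint32_t* reg) {
    const std::uint32_t first = *reg;
    const std::uint32_t second = *reg;
    return first != second;
}
```
Note that volatile only constrains the compiler's treatment of these accesses. It does not by itself give the atomicity or inter-thread ordering that the rest of the conversation is about.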

202
00:17:17.760 --> 00:17:18.200
<v Ben Rady>Yeah.

203
00:17:18.200 --> 00:17:20.050
<v Matt Godbolt>So that is a way to do it.

204
00:17:20.050 --> 00:17:20.340
<v Ben Rady>Mm hmm.

205
00:17:20.340 --> 00:17:29.700
<v Matt Godbolt>But most compilers will see the keyword volatile and then start slowly backing away from your code.

206
00:17:29.700 --> 00:17:31.040
<v Ben Rady>They Homer into the bushes.

207
00:17:31.040 --> 00:17:33.320
<v Matt Godbolt>They Homer into the... well, certainly the optimizer does.

208
00:17:33.320 --> 00:17:33.520
<v Ben Rady>Yeah.

209
00:17:33.520 --> 00:17:36.020
<v Matt Godbolt>The optimizer says, oh, there's a volatile here.

210
00:17:36.020 --> 00:17:37.080
<v Ben Rady>Uh huh.

211
00:17:37.080 --> 00:17:52.480
<v Matt Godbolt>So as not to anger the gods of volatility, we're not going to do anything here. Right. So in your very critical code... arguments obviously exist for not using volatile for this, for other reasons, which we'll get to in a second.

212
00:17:52.480 --> 00:18:12.830
<v Matt Godbolt>But specifically for shared memory, where another process outside of the program you're compiling could be changing it, there's an argument that says volatile is the right way to model it. So the short answer is, for shared memory like this, there isn't really a good way for C++ to say, hey, this bit of RAM outside of this program's remit will change under your feet.

213
00:18:12.830 --> 00:18:14.160
<v Ben Rady>Mm hmm.

214
00:18:14.160 --> 00:18:23.760
<v Matt Godbolt>Right. So the best we can do is model it with atomics. And atomics have a whole bunch of things that come with them, right? One of the things is atomicity at the hardware level.

215
00:18:23.760 --> 00:18:33.920
<v Matt Godbolt>Like, if I read or write an atomic variable, then it either happened or it didn't happen. You'll never see half of it written, you know, if you've got an eight-byte sequence number or a four-byte sequence number.

216
00:18:33.920 --> 00:18:34.520
<v Ben Rady>Yeah, right.

217
00:18:34.520 --> 00:18:43.480
<v Matt Godbolt>There's no world in which I write that, and somehow only the first two bytes of that four byte value have been written. That's what, at the basic level, an atomic operation is.

218
00:18:43.480 --> 00:18:44.200
<v Ben Rady>Yeah, yeah.

219
00:18:44.200 --> 00:18:54.780
<v Matt Godbolt>And the compiler will collaborate and generate the correct CPU sequences, or it will make it an error that this can't happen. Like, hey, you've got a 128-byte structure.

220
00:18:54.780 --> 00:18:56.400
<v Matt Godbolt>We can't do this atomically.

221
00:18:56.400 --> 00:18:57.720
<v Ben Rady>Yeah, you can't guarantee that. So yeah.

222
00:18:57.720 --> 00:19:04.290
<v Matt Godbolt>Now, that being said, what it will actually do is use a spin lock to do bigger things.

223
00:19:04.290 --> 00:19:04.420
<v Ben Rady>Go.

224
00:19:04.420 --> 00:19:14.120
<v Matt Godbolt>So there's a way of saying... you can assert statically, at compile time, hey, can I do this without some kind of other lock being taken out? Because obviously the whole point of this is not to have a lock.

225
00:19:14.120 --> 00:19:14.540
<v Ben Rady>Uh huh.

226
00:19:14.540 --> 00:19:17.340
<v Matt Godbolt>So it would be really tragic to have something...
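
NOTE
Editor's sketch of the compile-time check mentioned here, using std::atomic's is_always_lock_free member from C++17. The 128-byte struct and the variable name are hypothetical, chosen to mirror the size mentioned in the conversation. The first assertion assumes a platform with lock-free 64-bit atomics.
```cpp
#include <atomic>
#include <cstdint>
// Fails to compile if the sequence counter would need a hidden lock.
static_assert(std::atomic<std::uint64_t>::is_always_lock_free,
              "sequence number must be lock-free");
// Hypothetical 128-byte payload, far too large for a single atomic instruction.
struct BigPayload { unsigned char bytes[128]; };
// std::atomic<BigPayload> still compiles, but on typical current hardware it
// is not lock-free, which is exactly what a sequence lock is meant to avoid.
constexpr bool big_payload_is_lock_free = std::atomic<BigPayload>::is_always_lock_free;
```
Turning the second check into a static_assert as well would reject the oversized atomic at compile time instead of silently falling back to a lock.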

227
00:19:17.340 --> 00:19:17.340
<v Ben Rady>come

228
00:19:17.340 --> 00:19:50.120
<v Matt Godbolt>But all modern CPUs will let you write four bytes and probably eight bytes, and, with some very big caveats and footnotes, 16 bytes using some tricks. But you can do those atomically, provided they're aligned, provided they're inside a cache line and don't straddle two cache lines, all these kinds of things. So we can assume that we can do that, but it's only a small amount that we can do that for. But another thing with the C++ memory model is that the atomics also model the fact that this could be being written or read from other threads.

229
00:19:50.120 --> 00:19:55.230
<v Matt Godbolt>Now, obviously, if we talk about multiple processes, that's kind of a stretch to call it a thread.

230
00:19:55.230 --> 00:19:56.420
<v Ben Rady>Right.

231
00:19:56.420 --> 00:20:07.940
<v Matt Godbolt>But it's kind of the best we've got right now. If you wanted to share between processes, it's the only thing, and it's what I've gone with here. Now, atomics are funny because,

232
00:20:07.940 --> 00:20:28.520
<v Matt Godbolt>Reading or writing to an atomic, or sorry, reading or writing atomically to memory doesn't necessarily mean that the compiler isn't going to reorder things. It just means that you're literally reading or writing to this memory location. So back in the example of the A equals 1, B equals 2, C equals, sorry, A equals 3,

233
00:20:28.520 --> 00:20:28.720
<v Ben Rady>Mm hmm.

234
00:20:28.720 --> 00:20:41.220
<v Matt Godbolt>thing. You could make those all atomic, and I'm saying this is not C++ atomic. I mean atomic in terms of, like, the CPU can read and write these memory locations, and it either happens or it doesn't happen.

235
00:20:41.220 --> 00:20:51.690
<v Matt Godbolt>And without telling the compiler, oh, but you couldn't, like, elide the fact that A equals one ever happened. You can't throw the A equals one away because you're about to overwrite it with a three later on. That's a separable thing.

236
00:20:51.690 --> 00:20:51.740
<v Ben Rady>Right, yeah.

237
00:20:51.740 --> 00:21:12.140
<v Matt Godbolt>But in C++, they're sort of bound together. And so when you say something's atomic, you're also saying something about the operations that can happen with the memory system, where the memory system encompasses both the compiler's ability to move things around, and more critically, the CPU's ability, which we'll talk about in a sec.

238
00:21:12.140 --> 00:21:27.440
<v Matt Godbolt>So in the example of setting A to 1, setting B to 2, and then setting A to 3, if they were all atomic, the compiler wouldn't throw away the A equals 1 at the top, because it's like, no, this had some semantic meaning to something I can't see.
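
NOTE
Show-notes sketch, not from the recording. The A, B, A example written out twice, once with plain ints and once with std::atomic. The variable and function names are made up. The claim in the comments is about what compilers tend to do in practice today, not a guarantee taken from the standard.
```cpp
#include <atomic>
#include <cassert>
// Plain globals: with no other visible reader, an optimizer may drop the first store to plain_a.
int plain_a = 0, plain_b = 0;
void plain_writes() { plain_a = 1; plain_b = 2; plain_a = 3; }
// Atomic globals: each store is an operation another thread could observe.
std::atomic<int> a{0}, b{0};
void atomic_writes() {
  a.store(1);  // default order is std::memory_order_seq_cst
  b.store(2);
  a.store(3);
}
```
Comparing the optimized assembly of the two functions is a quick way to see the difference: the plain version usually shrinks to two stores, the atomic version usually keeps all three.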

239
00:21:27.440 --> 00:21:29.600
<v Ben Rady>Right. Yeah.

240
00:21:29.600 --> 00:22:02.550
<v Matt Godbolt>At least by default, it won't do that. The default in C++ is to use an incredibly expensive sequentially consistent view for atomic operations, which means that the atomic operations and the code around them happen in the one true order, even if you have multiple threads doing it, which is potentially expensive to serialize on some hardware. You could potentially imagine that, in order to make sure that everybody sees A equals 1 before anyone sees B equals 2,

241
00:22:02.550 --> 00:22:14.690
<v Matt Godbolt>you have to kind of add in instructions or some other sort of ex machina way of saying all threads need to make sure that they're good. The caches are all consistent across all the threads.

242
00:22:14.690 --> 00:22:14.840
<v Ben Rady>Right, right.

243
00:22:14.840 --> 00:22:20.250
<v Matt Godbolt>And now I can write to B, and then we go again: hey, has everyone got the B equals two, if you care about it, and so on.

244
00:22:20.250 --> 00:22:20.280
<v Ben Rady>Mm hmm.

245
00:22:20.280 --> 00:22:32.270
<v Matt Godbolt>And that can be a very expensive operation. And so there are these different ways of acting with atomics where you can give sort of a secondary semantic meaning that's less strict than this sequentially consistent view.

246
00:22:32.270 --> 00:22:33.120
<v Ben Rady>Mm hmm.

247
00:22:33.120 --> 00:22:45.400
<v Matt Godbolt>Like, for example, you can say, this is an acquire. So when you read, or when you store a number in, you could say, this is an acquire operation. I am acquiring a lock, is the way I like to think about it.

248
00:22:45.400 --> 00:22:45.920
<v Ben Rady>Mm hmm.

249
00:22:45.920 --> 00:23:05.600
<v Matt Godbolt>And then when you finish with it, you release it. And again, it's not like this has any, I mean, we're not actually acquiring any locks or anything like this, but it has the semantics of a release operation and an acquire operation. And what do I mean by that? Well, if you're taking out a lock, it's a pretty big hint that anything after that is under the purview of that lock.

250
00:23:05.600 --> 00:23:05.980
<v Ben Rady>Right.

251
00:23:05.980 --> 00:23:15.510
<v Matt Godbolt>So if you've got 10 lines of code and line 5 takes out a lock, line 6, 7, 8 and 9 can't be shuffled above the lock because otherwise you're doing some things that you were trying to protect outside of the locked area.

252
00:23:15.510 --> 00:23:16.180
<v Ben Rady>Yes.

253
00:23:16.180 --> 00:23:16.400
<v Ben Rady>Right. Right, right, right. Yep.

254
00:23:16.400 --> 00:23:28.260
<v Matt Godbolt>And then similarly, the release says nothing above the release can come south of the release operation itself, because again, you want to protect things that are bounded by an acquire and a release sort of pair.

255
00:23:28.260 --> 00:23:28.480
<v Ben Rady>Yeah.

256
00:23:28.480 --> 00:23:56.320
<v Matt Godbolt>There are some other aspects to this to do with different CPUs and different threads and things that are much more subtle in this respect. But that is sort of the gist of it. And then you can also have like a relaxed operation, which is: I need it to be an atomic operation, but I don't really make any guarantees about which CPUs might see which values, as long as it is still atomic. That's more like the traditional, original atomic that we were talking about, where it's just make sure this happens atomically at a hardware level, but otherwise it imposes no other restrictions on the order of things.
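
NOTE
Show-notes sketch, not from the recording. A minimal message-passing example of the acquire, release and relaxed orderings just described. All names (payload, ready, hits, producer, consumer, run_once) are invented for illustration.
```cpp
#include <atomic>
#include <cassert>
#include <thread>
int payload = 0;                 // ordinary data, published via the flag
std::atomic<bool> ready{false};  // the atomic that carries the ordering
std::atomic<int> hits{0};        // a counter where only atomicity matters
void producer() {
  payload = 42;                                  // may not sink below the release store
  ready.store(true, std::memory_order_release);
}
int consumer() {
  while (!ready.load(std::memory_order_acquire)) { std::this_thread::yield(); }
  return payload;                                // may not hoist above the acquire load
}
int run_once() {
  payload = 0;
  ready.store(false);
  int seen = 0;
  std::thread reader([&] { seen = consumer(); hits.fetch_add(1, std::memory_order_relaxed); });
  std::thread writer([] { producer(); hits.fetch_add(1, std::memory_order_relaxed); });
  writer.join();
  reader.join();
  return seen;
}
```
The relaxed fetch_add is the "just make it atomic" case: nothing is ordered around it, but no increment is lost. The release and acquire pair is what makes reading the plain int payload safe.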

257
00:23:56.320 --> 00:24:11.840
<v Matt Godbolt>So it's easy to think about it from where I'm sitting in terms of acquire and release. And so you could imagine that in our case of our writer here, we write the incremented version, the value, with an acquire.

258
00:24:11.840 --> 00:24:31.160
<v Matt Godbolt>We do all of our rummaging around, and then we write the second increment with a release. And it kind of tells my processor that everything I did between the adding one and adding two to the original number, the first increment and the second increment, is for all intents and purposes a lock.

259
00:24:31.160 --> 00:24:31.800
<v Ben Rady>Yeah, yeah, yeah.

260
00:24:31.800 --> 00:24:43.820
<v Matt Godbolt>And that makes a lot of sense, right? Unfortunately C++ has some awkward, very specific to C++ things that make that not true, but that's fine. But logically, that seems to make sense.

261
00:24:43.820 --> 00:24:55.620
<v Matt Godbolt>And on the reader, it's a little bit more difficult. You're definitely reading to acquire both times. You're like acquiring a lock, "lock" in inverted commas, then you're reading out stuff, and then you're reading it again.

262
00:24:55.620 --> 00:25:06.090
<v Matt Godbolt>But again, it's a little bit more subtle than that, because the second thing is also a release in a way. Because if you did get a good copy, you need to make sure that nothing of the copy leaks the other side of the acquire.
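
NOTE
Show-notes sketch, not from the recording and not the code discussed in the episode. One textbook way to write a single-writer sequence lock so that it stays within the C++ rules: the payload words are relaxed atomics and explicit fences carry the ordering. The names SeqLock, Payload, write and try_read are made up, and the 16-byte payload is an assumption.
```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
// Hypothetical payload: two 64-bit words, i.e. 16 bytes.
struct Payload { std::uint64_t words[2]; };
class SeqLock {
  std::atomic<std::uint64_t> seq_{0};        // odd while a write is in progress
  std::atomic<std::uint64_t> data_[2] = {};  // relaxed atomics keep the racy copy well-defined
public:
  // Single writer only.
  void write(const Payload& p) {
    const std::uint64_t s = seq_.load(std::memory_order_relaxed);
    seq_.store(s + 1, std::memory_order_relaxed);         // first increment: now odd
    std::atomic_thread_fence(std::memory_order_release);  // data stores stay below the odd value
    for (std::size_t i = 0; i < 2; ++i) data_[i].store(p.words[i], std::memory_order_relaxed);
    seq_.store(s + 2, std::memory_order_release);         // second increment: even again
  }
  // Returns false if a write overlapped the copy; the caller should retry.
  bool try_read(Payload& out) const {
    const std::uint64_t before = seq_.load(std::memory_order_acquire);
    if (before & 1) return false;
    for (std::size_t i = 0; i < 2; ++i) out.words[i] = data_[i].load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);  // data loads stay above the re-check
    return seq_.load(std::memory_order_relaxed) == before;
  }
};
```
The acquire fence before the second read of the counter is the "second thing is also a release in a way" part: it stops the copy from drifting past the re-check. A reader would normally loop on try_read until it returns true.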

263
00:25:06.090 --> 00:25:06.760
<v Ben Rady>Yeah.

264
00:25:06.760 --> 00:26:14.110
<v Matt Godbolt>It's easy with a picture, but it's subtle. It's subtle and difficult to get right, which is why it took two solid days and lots of testing. And that brings us to the next bit here. So it's somewhat straightforward to explain it, without a picture and without a whiteboard around, in terms of this sort of hand-waving about things the compiler is allowed to reorder. It's much, much harder to prove that you got it right, because the output of the compiler may or may not encode this optimization barrier that you've put in. All you've said to the compiler is, hey, by putting this acquire in here, I'm saying that no read that happens after this acquire can be reordered by the compiler north of the acquire operation. But all I can get is sort of circumstantial evidence by looking at the disassembly of it and going, it doesn't seem to have done anything there. I can't tell that in general it is not able to do that, because

265
00:26:14.110 --> 00:26:14.900
<v Ben Rady>Yeah, right. Right.

266
00:26:14.900 --> 00:26:33.440
<v Matt Godbolt>It depends which program I put it in. If I put it in a big program where lots of inlining happens and all that kind of stuff, I don't know if I have hinted to the compiler correctly that this is in fact the right thing to do. So that was a real challenge. And we've talked a little bit about testing these kinds of things before.

267
00:26:33.440 --> 00:26:37.420
<v Ben Rady>Yeah, testing multi-threaded code in general is not easy.

268
00:26:37.420 --> 00:26:38.560
<v Matt Godbolt>No, exactly.

269
00:26:38.560 --> 00:26:38.980
<v Ben Rady>Yeah.

270
00:26:38.980 --> 00:26:46.020
<v Matt Godbolt>It's very difficult, and all you can do is kind of develop a certain amount of confidence that you haven't got something egregiously wrong.

271
00:26:46.020 --> 00:27:02.150
<v Ben Rady>Right. I mean, it's the same. I think it's the same thing with like looking at your compiler output, right? The best that you can do in that case is see that you were wrong. But if you fail to see that you were wrong, you don't know that you're right.

272
00:27:02.150 --> 00:27:04.240
<v Matt Godbolt>That's, that's exactly it. Yeah.

273
00:27:04.240 --> 00:27:19.220
<v Ben Rady>So you tend to do, and I don't know if this is true of the sort of, you know, decompiled analysis, but you can do a lot of things where you give yourself lots of opportunities to see that you're wrong.

274
00:27:19.220 --> 00:27:28.820
<v Ben Rady>You know, run the code on different architectures, run it with different sets of data, run it multiple times over and over again, different compilers, different compiler settings, I'm sure.

275
00:27:28.820 --> 00:27:29.080
<v Matt Godbolt>different compilers even, yeah.

276
00:27:29.080 --> 00:27:29.560
<v Matt Godbolt>Yeah, yeah, yeah, for real.

277
00:27:29.560 --> 00:27:39.680
<v Ben Rady>And you sort of like look for all of the possible opportunities to prove yourself wrong. And then you fail to prove yourself wrong. And and then you just kind of give up and admit that you might be right.

278
00:27:39.680 --> 00:27:46.220
<v Matt Godbolt>That is exactly, yeah, that is exactly the sort of approach that I've been taking, which is why it's taken

279
00:27:46.220 --> 00:27:46.220
<v Ben Rady>Mm hmm.

280
00:27:46.220 --> 00:27:50.200
<v Matt Godbolt>quite so long apart from just reasoning about it in the first place.

281
00:27:50.200 --> 00:28:07.150
<v Matt Godbolt>And interesting you mentioned architectures there, because one of the gifts to programmers like myself of the Intel domination in server architecture, at least, is a very strong memory ordering.

282
00:28:07.150 --> 00:28:07.720
<v Ben Rady>Hmm.

283
00:28:07.720 --> 00:28:19.100
<v Matt Godbolt>So one of the really cool things, I'm sure we've talked about it before, and certainly anyone who's seen any of the nonsense that I've put on the internet knows that I love the kind of amazing tricks that the CPU is pulling off under the hood.

284
00:28:19.100 --> 00:28:19.820
<v Ben Rady>Mm hmm.

285
00:28:19.820 --> 00:28:31.660
<v Matt Godbolt>And one of its sort of real trump cards that has caused problems along the way is its ability to reorder and speculate instructions out of the sequence that you gave it them.

286
00:28:31.660 --> 00:28:33.280
<v Ben Rady>Uh huh. Yeah.

287
00:28:33.280 --> 00:28:50.600
<v Matt Godbolt>um Very much like the compiler itself does, the CPU is able to find flows and sequences of instructions and reorder them to take better advantage of the fact, hey, I have a spare multiplier going now, but look, I can find a multiplier that's like 300 instructions in the future, and I know what inputs are going to go into it, so I might as well start it now.

288
00:28:50.600 --> 00:28:50.920
<v Ben Rady>Mm hmm.

289
00:28:50.920 --> 00:29:07.620
<v Matt Godbolt>All those kind of tricks. It's amazing. But when it comes to loads and stores to memory, it doesn't reorder those. Or if it does, it does it in a way where it can track if it got it wrong, and it can redo them later on, which takes a bonkers amount of silicon to do.

290
00:29:07.620 --> 00:29:07.800
<v Ben Rady>Hmm.

291
00:29:07.800 --> 00:29:42.100
<v Matt Godbolt>But it does mean that in general, if I go load, load, store on one thread, and on another thread I go, ah, no, yeah, it's so difficult to explain this without a picture, but the sequence of operations will make sense. For most intents and purposes, and there are exceptions, it's effectively sequential, even in the presence of these kinds of atomic-like operations where you've got thread A doing one thing and thread B doing another thing. It's very hard to catch it out doing the wrong thing. So for the most part

292
00:29:42.100 --> 00:30:01.240
<v Matt Godbolt>even the naive code that's got it wrong, in air quotes, won't be wrong on an x86 because, short of the compiler doing its own optimizations, the code that it generates will run in the boring, dull order of loads and stores and increments and things that you gave it.

293
00:30:01.240 --> 00:30:02.120
<v Ben Rady>Mm hmm.

294
00:30:02.120 --> 00:30:07.500
<v Matt Godbolt>That is not true in general of every other CPU on the planet.

295
00:30:07.500 --> 00:30:34.560
<v Matt Godbolt>Because Intel loved to be backwards compatible, and presumably the very first time they went multi-threaded, they went, oh, this seems like a useful property to have, unaware that it was going to hamstring them for like the next 25 years. But ARM CPUs, RISC-V CPUs, MIPS, and anything else, nothing else that I'm aware of has such a strong memory ordering. That is to say, if you do a bunch of stores and loads on one CPU,

296
00:30:34.560 --> 00:30:44.160
<v Matt Godbolt>they could be reordered with respect to each other if you are able to observe them on another CPU. Right, obviously, they're not willy-nilly reordered to make your program wrong in the single-threaded case, right?

297
00:30:44.160 --> 00:30:44.280
<v Ben Rady>Yeah, yeah, yeah. Okay. me

298
00:30:44.280 --> 00:30:59.930
<v Matt Godbolt>That's not what I'm saying here. I'm saying that, you know, hey, if you do load, load, store, the CPU won't reorder all those loads and stores from where you're sitting to give you the wrong answer, but you might see something weird and wacky on another CPU. I think that there's a sort of case, if you

299
00:30:59.930 --> 00:31:00.060
<v Ben Rady>Right.

300
00:31:00.060 --> 00:31:23.580
<v Matt Godbolt>do sort of, like, remember the value of A on one thread and then write B to be one, and then on the other one you go write B to be, sorry, write A to be one and then remember the value of A, you can see a case where they both observed A and B to be zero and then they both wrote one to the other, and there shouldn't be a way that they could do that, right? Again, paper. This is terrible podcast material, I'm sorry.
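
NOTE
Show-notes sketch, not from the recording. One well-known litmus shape, often called store buffering, that is close to (but may not be exactly) the case being described: each thread writes one variable and then reads the other. With the default sequentially consistent order the C++ standard forbids both reads returning zero. Weakening the operations to memory_order_relaxed permits that outcome, and it can show up on x86 as well. All names here are invented.
```cpp
#include <atomic>
#include <cassert>
#include <thread>
std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;
// Returns true if both threads read 0, the outcome sequential consistency rules out.
bool both_saw_zero_once() {
  x.store(0);
  y.store(0);
  std::thread t1([] { x.store(1); r1 = y.load(); });  // seq_cst by default
  std::thread t2([] { y.store(1); r2 = x.load(); });
  t1.join();
  t2.join();
  return r1 == 0 && r2 == 0;
}
int count_violations(int rounds) {
  int bad = 0;
  for (int i = 0; i < rounds; ++i) bad += both_saw_zero_once() ? 1 : 0;
  return bad;
}
```
As written this should never count a violation. It is the kind of test that can only prove you wrong, never right, which is the point made earlier in the conversation.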

301
00:31:23.580 --> 00:31:45.520
<v Matt Godbolt>So anyway, on those architectures, the atomic operations and this acquire and release, or acquire-release, type semantic thing that you give to it is not just a hint to the compiler to say, compiler, you're not allowed to move these things around. It tells the compiler to put instructions, or flags on the instructions, that say, hey, CPU, you're not allowed to do reordering here.

302
00:31:45.520 --> 00:32:07.220
<v Matt Godbolt>And so one of the other things that we were doing while we were working on this was we were deliberately, although we don't run on any architecture other than x86, we were using the presence and the particular semantic interpretation of the presence of these serializing barriers or fences or loading instructions that have particular properties about reordering that can happen

303
00:32:07.220 --> 00:32:07.800
<v Ben Rady>oh Okay.

304
00:32:07.800 --> 00:32:27.960
<v Matt Godbolt>as a litmus paper, a litmus test: not only do we observe the code to be correct on x86, but we observe the kinds of loads and stores to be the right kinds of loads and stores on ARM, which says, hey, the compiler knows that it absolutely has to let this increment happen before any of the following code completes.
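
NOTE
Show-notes sketch, not from the recording. Three tiny functions one could compile for both x86-64 and AArch64 to try the litmus-paper idea. The function names are invented, and the instruction names in the comments are what GCC and Clang typically emit, which is worth checking on your own compiler and flags.
```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
std::atomic<std::uint64_t> counter{0};
// AArch64 compilers typically emit a store-release (stlr) here; x86-64 typically a plain mov.
void publish(std::uint64_t v) { counter.store(v, std::memory_order_release); }
// AArch64 typically emits a load-acquire (ldar, or ldapr when targeting newer cores); x86-64 a plain mov.
std::uint64_t observe() { return counter.load(std::memory_order_acquire); }
// Relaxed: typically a plain ldr on AArch64, so the missing ordering is visible in the listing.
std::uint64_t peek() { return counter.load(std::memory_order_relaxed); }
```
On x86 all three usually look identical, which is why the ARM listing is the more informative one: it shows whether the compiler was actually told about the ordering.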

305
00:32:27.960 --> 00:32:28.540
<v Ben Rady>Right.

306
00:32:28.540 --> 00:32:36.860
<v Ben Rady>Right, right, right, right. Yes. This is sort of like telling somebody something and then being like, OK, now explain it back to me.

307
00:32:36.860 --> 00:32:37.840
<v Matt Godbolt>I suppose it is.

308
00:32:37.840 --> 00:32:45.770
<v Ben Rady>Right. And making sure that they understood what you said. It's like you're going to put these things in there and see if the compiler reorders things in a way that doesn't make sense.

309
00:32:45.770 --> 00:32:46.080
<v Matt Godbolt>Yeah.

310
00:32:46.080 --> 00:32:54.160
<v Ben Rady>And you're like, well, OK, maybe we're not on the same page here about what instructions need to be ordered.

311
00:32:54.160 --> 00:32:55.770
<v Matt Godbolt>Well that's absolutely it.

312
00:32:55.770 --> 00:32:55.960
<v Ben Rady>Yeah.

313
00:32:55.960 --> 00:33:04.080
<v Matt Godbolt>Now obviously there's still no guarantee that that's right and it's still no guarantee that while the compiler has emitted those correct instructions that it still wouldn't

314
00:33:04.080 --> 00:33:14.760
<v Matt Godbolt>move things around above and below those instructions because it feels it has that degree of freedom. I can't introspect into the compiler and look at each instruction and say, given free rein, would you move this up further?

315
00:33:14.760 --> 00:33:15.740
<v Ben Rady>Right. Yeah.

316
00:33:15.740 --> 00:33:17.700
<v Matt Godbolt>But that would be an interesting thing to do.

317
00:33:17.700 --> 00:33:29.930
<v Matt Godbolt>So it has been an interesting few days. I mean, the performance you get out of this is pretty astronomical. It's like three on x86. It's, you know, single clock cycles to read and write to something that is, you know, 16 bytes long.

318
00:33:29.930 --> 00:33:30.140
<v Ben Rady>Oh wow.

319
00:33:30.140 --> 00:33:30.140
<v Matt Godbolt>Now,

320
00:33:30.140 --> 00:33:30.140
<v Ben Rady>Yeah.

321
00:33:30.140 --> 00:33:44.140
<v Matt Godbolt>For something that's eight bytes long, you can just atomically write the whole thing and be done with it, and you don't need any kind of sequence lock. But if you've got 16, 24, 32, even up to 48 bytes, a single cache line's worth of work, this is still essentially free.

322
00:33:44.140 --> 00:33:44.620
<v Ben Rady>Mhm. Mhm. Mhm.

323
00:33:44.620 --> 00:33:55.320
<v Matt Godbolt>Free, it's all relative in our world. So it's been a super fun thing, to the point where actually, you know, someone was reviewing the code and saying, you know, well, you say it's a single threaded writer.

324
00:33:55.320 --> 00:34:13.260
<v Matt Godbolt>Can you detect the case where there are two writers and, you know, throw an exception or blow up or, you know, something like that? And it's like the answer would be yes, I can. But it would add something in the region of twice the overhead, because just even the check and the read and the whatever and the out of band jump is actually noticeable at this level.

325
00:34:13.260 --> 00:34:13.520
<v Ben Rady>Right.

326
00:34:13.520 --> 00:34:23.920
<v Matt Godbolt>It's pretty minimal. Don't get me wrong, it is pretty minimal. But also, what are you going to do at this point? You know, it's a bit late if you've got two writers, but it's nice to have in debug. So that's what we've done: we put it in as a debug check.
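
NOTE
Show-notes sketch, not from the recording and not the check described in the episode. One way a debug-only "is there a second writer" check might be structured, so that it vanishes when NDEBUG is defined. SingleWriterCheck, enter and leave are hypothetical names.
```cpp
#include <atomic>
#include <cassert>
// Hypothetical helper: detects overlapping writers in debug builds, compiles away with NDEBUG.
class SingleWriterCheck {
#ifndef NDEBUG
  std::atomic<bool> busy_{false};
#endif
public:
  // Returns false if another writer is already inside the write section.
  bool enter() {
#ifndef NDEBUG
    return !busy_.exchange(true, std::memory_order_acquire);
#else
    return true;
#endif
  }
  void leave() {
#ifndef NDEBUG
    busy_.store(false, std::memory_order_release);
#endif
  }
};
```
A writer would wrap its update in assert(check.enter()) and check.leave(). In release builds both collapse to nothing, so the fast path pays no extra atomic operation.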

327
00:34:23.920 --> 00:34:24.040
<v Ben Rady>Yeah, yeah.

328
00:34:24.040 --> 00:34:38.500
<v Matt Godbolt>But it's been a really fun and interesting journey. And in doing so, we've discovered a whole bunch of interesting other techniques that you can use that have different trade-offs between back pressure on the writer versus readers having free rein.

329
00:34:38.500 --> 00:34:39.520
<v Ben Rady>Mm hmm.

330
00:34:39.520 --> 00:34:47.800
<v Matt Godbolt>The classic example of something that's a little bit more fair for the readers is a readers-writer lock, like an actual lock in this instance, where

331
00:34:47.800 --> 00:34:47.800
<v Ben Rady>Okay.

332
00:34:47.800 --> 00:35:10.770
<v Matt Godbolt>The lock is an atomic value, and like the bottom 16 bits or the bottom 32 bits are how many readers are currently reading, and the top bit or however many bits is like, is there a writer? And then the readers, in order to acquire a lock, they have to do some work here. They acquire a lock and they basically try to atomically replace the value with one higher.

333
00:35:10.770 --> 00:35:11.780
<v Ben Rady>Mm hmm.

334
00:35:11.780 --> 00:35:16.930
<v Matt Godbolt>And if it fails, as part of the atomic swap operation that the CPUs can do,

335
00:35:16.930 --> 00:35:17.120
<v Ben Rady>Mm hmm.

336
00:35:17.120 --> 00:35:28.000
<v Matt Godbolt>You get back what the actual number was and you can see whether or not the writer was there. And if the writer was there, you know, all bets are off. But if you were able to increment it, okay, then you've now let the writer know that there's at least one reader.

337
00:35:28.000 --> 00:35:40.270
<v Matt Godbolt>And so that's what all the readers do, and you can have like 10 readers all reading from the same thing and that's fine. They can all be reading from it. And then the writer has to do the other thing where it says, I need to be able to replace a zero, like literally there are no readers and no writers, with a top-bit-set value.

338
00:35:40.270 --> 00:35:40.620
<v Ben Rady>Right.

339
00:35:40.620 --> 00:35:48.020
<v Matt Godbolt>And if I'm able to atomically do that operation, then I've locked out all of the readers and I've got the lock and I can spend my time writing to it.
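
NOTE
Show-notes sketch, not from the recording. A minimal readers-writer lock along the lines just described, assuming a 32-bit word whose top bit means "writer present" and whose remaining bits count active readers. RWSpinLock and its member names are made up, and the try_ functions return instead of spinning to keep the example short.
```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
class RWSpinLock {
  static constexpr std::uint32_t kWriter = 0x80000000u;  // top bit: a writer holds the lock
  std::atomic<std::uint32_t> state_{0};                  // low bits: count of active readers
public:
  bool try_lock_shared() {
    std::uint32_t cur = state_.load(std::memory_order_relaxed);
    while (!(cur & kWriter)) {
      // On failure compare_exchange_weak reloads cur with the value actually found.
      if (state_.compare_exchange_weak(cur, cur + 1, std::memory_order_acquire, std::memory_order_relaxed)) return true;
    }
    return false;  // a writer is present
  }
  void unlock_shared() { state_.fetch_sub(1, std::memory_order_release); }
  bool try_lock() {
    std::uint32_t expected = 0;  // no readers and no writer
    return state_.compare_exchange_strong(expected, kWriter, std::memory_order_acquire, std::memory_order_relaxed);
  }
  void unlock() { state_.store(0, std::memory_order_release); }
};
```
The compare-exchange handing back the value it actually found is the "you get back what the actual number was" step: the reader inspects it for the writer bit before trying again.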

340
00:35:48.020 --> 00:36:02.640
<v Ben Rady>Yeah. So just to clarify here, when you say you've got these two halves of this, the upper bits being the number of writers and the lower bits being the number of readers, you're saying the number of things that are currently reading or currently writing.

341
00:36:02.640 --> 00:36:02.880
<v Matt Godbolt>Yeah, sure.

342
00:36:02.880 --> 00:36:03.660
<v Matt Godbolt>Correct, currently reading and writing.

343
00:36:03.660 --> 00:36:06.460
<v Ben Rady>Not like the logical number of readers and writers in the system.

344
00:36:06.460 --> 00:36:08.900
<v Matt Godbolt>No, no, no, no, no, this is a current thing which allows them and so yes, yeah, yeah.

345
00:36:08.900 --> 00:36:09.020
<v Ben Rady>Yeah. Yes.

346
00:36:09.020 --> 00:36:13.530
<v Matt Godbolt>And so at that point, what you're doing is you're actually preventing the data race.

347
00:36:13.530 --> 00:36:13.640
<v Ben Rady>Yeah.

348
00:36:13.640 --> 00:36:22.240
<v Matt Godbolt>So that's an aspect we glossed over earlier is that strictly by the book, taking that copy of the data while it was unprotected by any lock is

349
00:36:22.240 --> 00:36:22.240
<v Ben Rady>Mm hmm.

350
00:36:22.240 --> 00:36:33.650
<v Matt Godbolt>not allowed by the C++ standard. It's undefined behavior to read a value that's being updated. You know, even though we don't look at it unless we know that we've got a good copy, that's not okay by the C++ standard.

351
00:36:33.650 --> 00:36:33.690
<v Ben Rady>Right.

352
00:36:33.690 --> 00:36:33.720
<v Ben Rady>Right.

353
00:36:33.720 --> 00:36:39.400
<v Matt Godbolt>There's some papers out there that are trying to canonify it and come up with ways to make it like okay to do it, but ignoring that for now.

354
00:36:39.400 --> 00:36:40.480
<v Ben Rady>Mm hmm. Yeah.

355
00:36:40.480 --> 00:36:52.160
<v Matt Godbolt>But yeah, coming back to this thing, in this particular instance, no, we are actually preventing any kind of data race because the readers do take out the lock for the tiny amount of time that they're taking their copy or working on it, and then they reduce the lock.

356
00:36:52.160 --> 00:36:52.440
<v Ben Rady>Mm hmm.

357
00:36:52.440 --> 00:36:57.750
<v Matt Godbolt>So it's a number that now it's not the number of readers, it's the number of readers that are currently reading and locked it.

358
00:36:57.750 --> 00:36:58.260
<v Ben Rady>Yeah, yeah, yeah.

359
00:36:58.260 --> 00:37:13.710
<v Matt Godbolt>But they obviously, you can have as many readers as you like, reading from it. And that's potentially useful, again, if it's some kind of shared piece of data, like, you know, this is the, again, the Linux kernel kind of example would be, hey, this is configuration information about which time zone everyone's in.

360
00:37:13.710 --> 00:37:14.540
<v Ben Rady>Right.

361
00:37:14.540 --> 00:37:20.070
<v Matt Godbolt>You know, there are many processes that are getting the time of day all the damn time. We never really, you know, we want to

362
00:37:20.070 --> 00:37:20.300
<v Ben Rady>right

363
00:37:20.300 --> 00:37:23.250
<v Matt Godbolt>allow them to make progress most of the time. They don't have to lock each other out.

364
00:37:23.250 --> 00:37:23.580
<v Ben Rady>Right, right.

365
00:37:23.580 --> 00:37:31.870
<v Matt Godbolt>But there's only one writer and it's the kernel. And every now and then he needs to be able to go in and say, hey, someone just did a sysctl and changed the time zone or whatever it is, whatever kind of shared information.

366
00:37:31.870 --> 00:37:32.220
<v Ben Rady>Yeah, right.

367
00:37:32.220 --> 00:37:39.780
<v Matt Godbolt>Of course, the problem with that is that the readers, if there are enough of them can easily starve out the writer, you can get a situation where the writer can never actually

368
00:37:39.780 --> 00:37:41.520
<v Ben Rady>Oh, right. Yes.

369
00:37:41.520 --> 00:37:52.320
<v Matt Godbolt>And so then it gets more sophisticated where you start using more bits to say, hey, there's a pending writer, and then the readers aren't allowed to get a lock if there is a pending writer, even though, you know, and so on and so forth.
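
NOTE
Show-notes sketch, not from the recording. One possible way to add the "pending writer" bit so that a steady stream of readers cannot starve the writer. It assumes a single writer, as in the kernel example, and the names FairRWLock, kWriter and kPending are invented.
```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
class FairRWLock {
  static constexpr std::uint32_t kWriter  = 0x80000000u;  // a writer holds the lock
  static constexpr std::uint32_t kPending = 0x40000000u;  // a writer is waiting its turn
  std::atomic<std::uint32_t> state_{0};                   // remaining bits: active readers
public:
  bool try_lock_shared() {
    std::uint32_t cur = state_.load(std::memory_order_relaxed);
    while (!(cur & (kWriter | kPending))) {  // new readers back off once a writer is queued
      if (state_.compare_exchange_weak(cur, cur + 1, std::memory_order_acquire, std::memory_order_relaxed)) return true;
    }
    return false;
  }
  void unlock_shared() { state_.fetch_sub(1, std::memory_order_release); }
  // Announce intent, then succeed only once the readers already inside have drained.
  bool try_lock() {
    state_.fetch_or(kPending, std::memory_order_relaxed);
    std::uint32_t expected = kPending;  // pending bit set, zero readers, no writer
    return state_.compare_exchange_strong(expected, kWriter, std::memory_order_acquire, std::memory_order_relaxed);
  }
  void unlock() { state_.store(0, std::memory_order_release); }
};
```
Once try_lock has been called, new readers are turned away even though the writer does not hold the lock yet; the writer keeps retrying until the readers that were already inside have left.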

370
00:37:52.320 --> 00:37:52.400
<v Ben Rady>Uh huh.

371
00:37:52.400 --> 00:38:02.790
<v Matt Godbolt>So these things are complicated, but they're fun. They're fun and interesting. But at least that case doesn't suffer from the data race, the sort of strict data race by the book version there.

372
00:38:02.790 --> 00:38:03.660
<v Ben Rady>Mm hmm.

373
00:38:03.660 --> 00:38:20.660
<v Matt Godbolt>But yeah, testing this has been essentially an exercise in all the things we talked about, looking at the disassembly, writing some simple-ish tests. And I think in one of our earlier episodes, you made a comment about, you know, these kinds of tests have to be something like, you have to have seen it work once.

374
00:38:20.660 --> 00:38:33.620
<v Ben Rady>Oh, yeah, yeah, yeah, yeah, right. Well, so I had made the argument in an earlier episode that, you know, write all the unit tests you want, and I do, but write all the unit tests you want.

375
00:38:33.620 --> 00:38:39.280
<v Ben Rady>But if you've never actually seen this thing work in production, you have no reason to believe that it works, right?

376
00:38:39.280 --> 00:38:39.660
<v Matt Godbolt>Right.

377
00:38:39.660 --> 00:39:00.080
<v Ben Rady>And the intention there is to, again, as we were just kind of talking about, you're passing on an opportunity to prove yourself wrong. And sort of moreover, like, it's one thing to say like, okay, I wrote this, you know, lock free queue, and I pushed a bunch of data into it, and I pulled a bunch of data back out,

378
00:39:00.080 --> 00:39:11.920
<v Ben Rady>And I've tried that a number of different ways with different data sets on different architectures and all these things trying to prove that it doesn't work. But if you never even tried the base case of like, did it ever work at all once?

379
00:39:11.920 --> 00:39:12.880
<v Matt Godbolt>Right, then that's true.

380
00:39:12.880 --> 00:39:29.260
<v Ben Rady>And that's just, you're just being silly, right? You're being silly on purpose. But you know, that sort of rule of like, have you seen it work once, I think is more applicable for things that are more mundane, that are like,

381
00:39:29.260 --> 00:39:36.100
<v Ben Rady>you know, pretty easy to get confidence that they work with just unit tests. And the only question is, do you have enough unit tests?

382
00:39:36.100 --> 00:39:36.280
<v Matt Godbolt>Mm-hmm.

383
00:39:36.280 --> 00:39:44.860
<v Ben Rady>Right. It's like, if you've never seen it work once, then, you know, you don't really, there's no reason to believe that you have enough unit tests essentially. Right.

384
00:39:44.860 --> 00:39:45.200
<v Matt Godbolt>Yeah.

385
00:39:45.200 --> 00:39:53.500
<v Ben Rady>But this is a whole other thing, right? This is like, you can see it work a thousand times and still not be confident that it's going to work the thousand and first.

386
00:39:53.500 --> 00:39:54.480
<v Matt Godbolt>That's, that is, I think the material difference.

387
00:39:54.480 --> 00:40:09.340
<v Ben Rady>Right. Yeah. And you got to get much more creative about how to create that confidence. Um, because unit tests alone and a little bit of exploratory testing, you know, watching it work one time or two times or four, five times is not going to be enough.

388
00:40:09.340 --> 00:40:22.910
<v Matt Godbolt>Yeah, no, it's really interesting. I mean, so I put it out and we had four different engineers review it, and this definitely passes the test of, you know, as in, in general, I don't write comments in my code.

389
00:40:22.910 --> 00:40:23.100
<v Ben Rady>Yeah. yeah

390
00:40:23.100 --> 00:40:34.040
<v Matt Godbolt>I think I've said this before, I prefer to have explanatory names for things where it makes a lot of sense. And so, you know, rather than manifest, I mean, most people and a lot of people feel this way.

391
00:40:34.040 --> 00:40:41.900
<v Matt Godbolt>You know, you put sensibly named intermediate values or or variable names, and then it's like it explains itself because essentially the last line is a piece of English prose, right?

392
00:40:41.900 --> 00:40:42.020
<v Ben Rady>and Yeah.

393
00:40:42.020 --> 00:40:45.170
<v Matt Godbolt>That's great. Return, interest rate, times, whatever, you know, there you go.

394
00:40:45.170 --> 00:40:45.280
<v Ben Rady>Yeah. Yeah.

395
00:40:45.280 --> 00:40:55.550
<v Matt Godbolt>They don't have to explain it. But this code has somewhere in the region of 50 lines of non-comment, and it's two or 300 lines of header.

396
00:40:55.550 --> 00:40:55.800
<v Ben Rady>Yeah.

397
00:40:55.800 --> 00:40:56.090
<v Matt Godbolt>Right?

398
00:40:56.090 --> 00:40:56.160
<v Ben Rady>yeah

399
00:40:56.160 --> 00:41:01.980
<v Matt Godbolt>It's like the ratio is obscenely the other way, because there's a huge set up explaining how the general thing works.

400
00:41:01.980 --> 00:41:02.220
<v Ben Rady>Right.

401
00:41:02.220 --> 00:41:14.800
<v Matt Godbolt>Then above every single line, there's my reasoning: why it's okay to have it in this particular order, why these flags are set, and which lines of the code it kind of interlocks with, in terms of, you know, if the other thread was doing this thing at the same time.

402
00:41:14.800 --> 00:41:15.740
<v Ben Rady>Mm hmm. Mm hmm.

403
00:41:15.740 --> 00:41:41.500
<v Matt Godbolt>But, um, yeah, so anyway, it went to review. Four engineers looked at it, and all of them so far have said, yes, it looks good to me. But one of them brought up a really interesting point, which was like, you know, there are formal verification systems for multithreaded programming that you can use, where you describe in a sort of DSL the operations that you're doing, and then a sort of theorem prover runs and shows that there isn't a window of opportunity where things are wrong.

404
00:41:41.500 --> 00:41:59.480
<v Matt Godbolt>And so, for example, you know, if you're designing, I don't know, the Paxos protocol or the Raft protocol or whatever, then this is the kind of thing where, if you can explain your protocol algorithm to this system, it can say, yes, I can prove that there exists no opportunity for this to give you the wrong answer.

405
00:41:59.480 --> 00:41:59.580
<v Ben Rady>Yeah.

406
00:41:59.580 --> 00:42:09.400
<v Matt Godbolt>And that was a really interesting observation. But unfortunately, they don't run on C++, they run on the description of the system.

407
00:42:09.400 --> 00:42:10.140
<v Ben Rady>Right, right.

408
00:42:10.140 --> 00:42:25.840
<v Matt Godbolt>And that's where I fell over, in terms of like, I don't know that this is going to help me here, because I, quote, "know" that sequence locks work in theory, because they're in the Linux kernel, and, you know, there are five different open source things that have wildly different atomic operations in them.

409
00:42:25.840 --> 00:42:42.680
<v Matt Godbolt>And so I was like, no. Let me try and reason this from first principles so that I can defend why we're doing it this way. But I would have no confidence that any explanation of what I had done to a theorem prover was a good representation of what the C++ semantics were.

410
00:42:42.680 --> 00:42:44.620
<v Ben Rady>It's the difference between the algorithm and the implementation.

411
00:42:44.620 --> 00:42:45.060
<v Matt Godbolt>Exactly right.

412
00:42:45.060 --> 00:42:51.700
<v Ben Rady>You can prove that the algorithm is correct using the theorem prover. To prove the implementation is correct is much more work.

413
00:42:51.700 --> 00:42:52.400
<v Matt Godbolt>Yeah.

414
00:42:52.400 --> 00:42:59.750
<v Ben Rady>And the other thing is, you know, and we mentioned this at the start of the podcast, we were like, well, assume no latency constraints.

415
00:42:59.750 --> 00:43:00.080
<v Matt Godbolt>Right.

416
00:43:00.080 --> 00:43:06.920
<v Ben Rady>And this problem gets a lot simpler. When you add in the latency constraints, you know you were talking about the sort of ratio of code to comments here.

417
00:43:06.920 --> 00:43:08.900
<v Matt Godbolt>Yeah, yeah, yeah, yeah.

418
00:43:08.900 --> 00:43:20.650
<v Ben Rady>It's like the square of the comments now because you have the dimension of thread safety and the dimension of latency, and you're doing both of those things at the same time in the same code, right?

419
00:43:20.650 --> 00:43:20.900
<v Matt Godbolt>Yeah.

420
00:43:20.900 --> 00:43:39.150
<v Ben Rady>And that's gonna make it much, much, much harder, because, you know, the cycle time at least, it's like, okay, we figured out some way to improve the performance by another 30%, and it's like, okay, and how do you know that this hasn't completely broken your threading model?

421
00:43:39.150 --> 00:43:39.420
<v Matt Godbolt>Yeah.

422
00:43:39.420 --> 00:43:42.460
<v Ben Rady>Well, we don't, so we get to do all of that testing all over again, yes.

423
00:43:42.460 --> 00:43:53.040
<v Matt Godbolt>All that, again, yeah, that's the thing. I think, you know, one loses the warm, fuzzy feeling that you can touch this code in any way and not go through some non-automated handoff.

424
00:43:53.040 --> 00:43:53.060
<v Ben Rady>Right.

425
00:43:53.060 --> 00:44:02.480
<v Matt Godbolt>You know, obviously, I will check in the tests, and the smoke test that runs for five seconds and makes sure that nothing overtly stupid happens will remain. And I'll leave that in.

426
00:44:02.480 --> 00:44:02.640
<v Ben Rady>Yeah.

427
00:44:02.640 --> 00:44:36.480
<v Matt Godbolt>But I can't, or I don't want to, think about having, like, "Oh yeah, this is it: we run it through, you know, Compiler Explorer effectively" (I've actually been using the site to do something for a change) "through six different architectures, and then there should be no LDR of this variable before the LDAR that's the serializing load," blah, blah, you know, that kind of stuff. One could go as far as that, but I don't know that that wouldn't be so brittle that it would... yeah, I don't know. Actually, maybe that is the way to do it. I don't know if there's some...

428
00:44:36.480 --> 00:44:36.730
<v Ben Rady>Yeah.

429
00:44:36.730 --> 00:44:46.610
<v Ben Rady>Yeah. Well, the golden rule of all of these things, and especially if you have something that is this thread-safety- and latency-sensitive, is: design it in a way where you won't have to change it, right?

430
00:44:46.610 --> 00:44:50.070
<v Ben Rady>Like, you're gonna pay all this upfront cost and do all this really hard work

431
00:44:50.070 --> 00:44:50.310
<v Matt Godbolt>ah

432
00:44:50.310 --> 00:44:54.760
<v Matt Godbolt>very

433
00:44:54.760 --> 00:45:06.740
<v Ben Rady>to get this code just exactly the way that you want. Make sure that you encapsulate it really well, so that the chances of you having to change it are as small as you can make them, so that you can just kind of leave it there and then, you know, go into git five years later and be like, yep, this code hasn't changed in five years.

434
00:45:06.740 --> 00:45:07.220
<v Matt Godbolt>That's very... Yep.

435
00:45:07.220 --> 00:45:12.960
<v Ben Rady>It's been working for the five years. That's as confident as I'm ever going to get, in any piece of code like this, that it actually works.

436
00:45:12.960 --> 00:45:14.700
<v Ben Rady>So please don't change it, right?

437
00:45:14.700 --> 00:45:34.320
<v Matt Godbolt>That is a very, very good observation. In fact, one of the review comments was about this: I had had an interface that was slightly more open-ended, and you could do some cool things with it. You know, rather than just replace the value, I was like, well, the writing thread can at any time read it; it knows no one else has changed it, because it is the only writer.

438
00:45:34.320 --> 00:45:34.620
<v Ben Rady>Mm-hmm. Yeah, yeah.

439
00:45:34.620 --> 00:45:34.720
<v Ben Rady>Mm-hmm.

440
00:45:34.720 --> 00:45:40.660
<v Matt Godbolt>So like you could actually update it anytime you like by looking at the current value and like maybe incrementing it rather than replacing it.

441
00:45:40.660 --> 00:45:46.480
<v Matt Godbolt>And then one of the reviewers was like, yeah, I'll be a user of this, and I will never want to do that.

442
00:45:46.480 --> 00:45:46.920
<v Ben Rady>Yeah, yeah.

443
00:45:46.920 --> 00:45:50.380
<v Matt Godbolt>And so I took out that interface. It's just like, okay, you've got get, and you've got set.

444
00:45:50.380 --> 00:45:51.020
<v Ben Rady>Right.

445
00:45:51.020 --> 00:45:57.810
<v Matt Godbolt>And maybe, just maybe, I will have a "get, but I promise, Scout's honor, that I am the writing thread."

446
00:45:57.810 --> 00:45:58.040
<v Ben Rady>Mm-hmm.

447
00:45:58.040 --> 00:46:05.700
<v Matt Godbolt>And as such, I'm allowed to get it without any kind of lock: a "promise, promise, promise" function that will allow the other access type.
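
NOTE
Editor's sketch, not the code reviewed in the episode. To make the get/set
interface described above concrete, here is a minimal single-writer sequence
lock in C++. Everything here is an assumption for illustration: the names
Pair, SeqLockPair and get_from_writer are hypothetical, the payload is fixed
at two 64-bit words, and the fence placement follows the commonly published
pattern for seqlocks under the C++ memory model rather than anything stated
in the conversation.
```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
// Hypothetical payload: two words that readers must always see as a matched pair.
struct Pair {
    std::uint64_t a;
    std::uint64_t b;
};
class SeqLockPair {
public:
    // Writer thread only. An odd sequence number means "write in progress".
    void set(Pair value) {
        const std::uint64_t seq = seq_.load(std::memory_order_relaxed);
        assert((seq & 1) == 0 && "set() must only be called by the single writer");
        seq_.store(seq + 1, std::memory_order_relaxed);
        // Keep the payload stores below from becoming visible before the odd sequence.
        std::atomic_thread_fence(std::memory_order_release);
        a_.store(value.a, std::memory_order_relaxed);
        b_.store(value.b, std::memory_order_relaxed);
        // Publish: the even sequence is released after the payload stores.
        seq_.store(seq + 2, std::memory_order_release);
    }
    // Any thread. Retries until it reads the payload with no write overlapping.
    Pair get() const {
        for (;;) {
            const std::uint64_t before = seq_.load(std::memory_order_acquire);
            if (before & 1) continue;  // writer is mid-update, try again
            const Pair result{a_.load(std::memory_order_relaxed),
                              b_.load(std::memory_order_relaxed)};
            // Keep the payload loads above from drifting past the re-check below.
            std::atomic_thread_fence(std::memory_order_acquire);
            const std::uint64_t after = seq_.load(std::memory_order_relaxed);
            if (before == after) return result;  // no write happened in between
        }
    }
    // The "Scout's honor" accessor: only valid on the writer thread, which is
    // the sole mutator, so it can skip the sequence check and never retries.
    Pair get_from_writer() const {
        return Pair{a_.load(std::memory_order_relaxed),
                    b_.load(std::memory_order_relaxed)};
    }
private:
    std::atomic<std::uint64_t> seq_{0};
    std::atomic<std::uint64_t> a_{0};
    std::atomic<std::uint64_t> b_{0};
};
```
Usage would be one thread calling set(Pair{1, 2}) while any number of other
threads call get(). Keeping the surface this small is the point made in the
conversation: fewer operations means fewer interleavings to reason about.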

448
00:46:05.700 --> 00:46:05.900
<v Ben Rady>Mm hmm.

449
00:46:05.900 --> 00:46:28.960
<v Matt Godbolt>But yeah, it's been an interesting few days. And, uh, yeah, I figured it would be a fun thing to talk about. And I said to you, this is probably going to be a short one. I've just looked at the timer. And if we haven't lost absolutely everybody with my attempt to try and describe multi-threading memory ordering problems, then I don't know what will get rid of listeners.

450
00:46:28.960 --> 00:46:30.420
<v Ben Rady>We're trying our best here.

451
00:46:30.420 --> 00:46:33.900
<v Matt Godbolt>Yeah. No, no, this has been fun. Thank you for letting me talk about my excite, my pet project.

452
00:46:33.900 --> 00:46:38.420
<v Ben Rady>Oh, no, this is super cool. I love this stuff. I'm a big fan. I love this.

453
00:46:38.420 --> 00:46:43.920
<v Matt Godbolt>Well, maybe next time we'll talk about something a little bit more human-y again.

454
00:46:43.920 --> 00:46:46.980
<v Ben Rady>I don't know. We can talk about more of this. I'm down with that.

455
00:46:46.980 --> 00:46:47.210
<v Matt Godbolt>All right.

456
00:46:47.210 --> 00:46:47.320
<v Ben Rady>That's.

457
00:46:47.320 --> 00:46:52.360
<v Matt Godbolt>That's cool. Actually, yeah, we have done a lot of human-ing recently, so we're probably due a bit more tech content.

458
00:46:52.360 --> 00:46:52.700
<v Ben Rady>Yeah, right. Uh huh.

459
00:46:52.700 --> 00:47:04.190
<v Matt Godbolt>Although, you know, to the extent that folks have contacted us, and we encourage you, you know, you can contact us on Mastodon, hachyderm.io or whatever it is, or by email; you can find our emails.

460
00:47:04.190 --> 00:47:04.680
<v Ben Rady>Mm hmm.

461
00:47:04.680 --> 00:47:06.810
<v Matt Godbolt>um We always love to hear from our listener.

462
00:47:06.810 --> 00:47:07.880
<v Ben Rady>Mm hmm, mm hmm.

463
00:47:07.880 --> 00:47:15.130
<v Matt Godbolt>Um, and you know, folks were very, you know, positive about some of our more human focused and less tech focused stuff, but you know, we can get back to tech as well.

464
00:47:15.130 --> 00:47:15.190
<v Ben Rady>Yeah.

465
00:47:15.190 --> 00:47:15.320
<v Matt Godbolt>That's cool.

466
00:47:15.320 --> 00:47:22.680
<v Ben Rady>You know, I take the philosophy that the human elements of software are inseparable from the technological elements.

467
00:47:22.680 --> 00:47:23.300
<v Matt Godbolt>Yeah.

468
00:47:23.300 --> 00:47:37.460
<v Ben Rady>They are sort of one and the same thing. It's hard to see that, but it really is true. And if you want a great example of this, take a code base that was written by other human beings and give it to a different set of human beings and then watch them.

469
00:47:37.460 --> 00:47:37.480
<v Matt Godbolt>Oh, yeah.

470
00:47:37.480 --> 00:47:38.300
<v Matt Godbolt>Oh, my gosh.

471
00:47:38.300 --> 00:47:45.400
<v Ben Rady>And if that doesn't convince you that technology and people are intimately intertwined, I don't think anything will.

472
00:47:45.400 --> 00:47:53.630
<v Matt Godbolt>We really do need to stop. But Kate Gregory has an amazing keynote speech where she talks about the empathetic programmer, or something like this.

473
00:47:53.630 --> 00:47:53.760
<v Ben Rady>Uh-huh.

474
00:47:53.760 --> 00:48:01.910
<v Matt Godbolt>And it's all about that. It's like, you know, you should have empathy, because who's going to be reading this code is going to be your friends, your colleagues, and also probably yourself.

475
00:48:01.910 --> 00:48:02.020
<v Ben Rady>Yeah.

476
00:48:02.020 --> 00:48:02.140
<v Matt Godbolt>And obviously,

477
00:48:02.140 --> 00:48:08.260
<v Ben Rady>Yes. Future you who has slept since then and has no memory of what any of these things do.

478
00:48:08.260 --> 00:48:13.280
<v Matt Godbolt>...as well. No, yeah, exactly. All right, we should stop now. But this has been fun. Thank you very much. And I guess we'll see you next time.

479
00:48:13.280 --> 00:48:16.280
<v Ben Rady>Yep, until next time.