WEBVTT

1
00:00:20.500 --> 00:00:21.440
<v Matt Godbolt>Hey, Ben.

2
00:00:21.440 --> 00:00:22.320
<v Ben Rady>Hey, Matt.

3
00:00:22.320 --> 00:00:30.980
<v Matt Godbolt>So we yeah planned comprehensively, as always, and today's topic is going to be signals and processes.

4
00:00:30.980 --> 00:00:31.210
<v Ben Rady>Yep.

5
00:00:31.210 --> 00:00:33.340
<v Ben Rady>Yeah, that was, that's

6
00:00:33.340 --> 00:00:36.420
<v Matt Godbolt>And that is the sum extent of our planning.

7
00:00:36.420 --> 00:00:38.280
<v Ben Rady>We said those words out loud annnnnd... Record.

8
00:00:38.280 --> 00:00:52.060
<v Matt Godbolt>And I said, yes, and hit record. And then we continued talking about it during the intro. And we're here. So why is that top of mind for you? Is there a reason why you are worrying about this right now?

9
00:00:52.060 --> 00:01:07.580
<v Ben Rady>There's a reason that I'm worried about this right now, which is that I'm always worried about this because I see part of my job as a software engineer is making sure that the software that I write actually runs... and does what it's supposed to do.

10
00:01:07.580 --> 00:01:07.980
<v Matt Godbolt>Mm-hmm.

11
00:01:07.980 --> 00:01:19.080
<v Ben Rady>Uh, I know that there are lots of places in the world where as a software engineer, you're expected to write code. And then there's another group or team or organization.

12
00:01:19.080 --> 00:01:35.540
<v Ben Rady>or outsourced company that is responsible for actually taking that software and running it on computers and making sure that it continues to run on those computers and that it delivers the value that it is intended to do.

13
00:01:35.540 --> 00:01:39.660
<v Ben Rady>And in some cases, those things are like very separated, right?

14
00:01:39.660 --> 00:01:39.890
<v Matt Godbolt>Right.

15
00:01:39.890 --> 00:01:39.940
<v Ben Rady>Like,

16
00:01:39.940 --> 00:01:51.920
<v Matt Godbolt>You might just make a PR to a function. You change the function, your tests pass, you check it in, and then you have literally no idea how it ends up serving people's requests or whatever it is your company does.

17
00:01:51.920 --> 00:01:52.420
<v Ben Rady>Right.

18
00:01:52.420 --> 00:01:53.920
<v Matt Godbolt>Yeah.

19
00:01:53.920 --> 00:02:11.620
<v Ben Rady>Right, right. And then on the other end of that spectrum, I think you can have situations and I have definitely been in these myself where it is like, no, we're building this for the very first time. There's no infrastructure team. There's you and you are going to compile your code.

20
00:02:11.620 --> 00:02:22.900
<v Ben Rady>you are going to SCP your code onto a server somewhere, and then you are going to run a screen and then exec that program in the screen.

21
00:02:22.900 --> 00:02:23.070
<v Matt Godbolt>ah

22
00:02:23.070 --> 00:02:23.560
<v Matt Godbolt>Old school.

23
00:02:23.560 --> 00:02:30.420
<v Ben Rady>And now you can post in a Slack channel or some other but log place, hey, we've deployed to production.

24
00:02:30.420 --> 00:02:35.660
<v Matt Godbolt>And by production you mean, yes, the only reason it didn't quit is because I'm still running it in a screen [session].

25
00:02:35.660 --> 00:02:36.490
<v Ben Rady>Yes, exactly.

26
00:02:36.490 --> 00:02:37.180
<v Matt Godbolt>This is shades of

27
00:02:37.180 --> 00:02:41.320
<v Ben Rady>I did Control A, Control D in the screen, and now our production environment is safe.

28
00:02:41.320 --> 00:02:41.740
<v Matt Godbolt>Everything's fine.

29
00:02:41.740 --> 00:02:42.160
<v Ben Rady>Yes.

30
00:02:42.160 --> 00:02:46.700
<v Matt Godbolt>Everything's fine. What's your logging strategy? Oh, we log back in and we reattach to the screen session to see what happens.

31
00:02:46.700 --> 00:02:46.960
<v Ben Rady>You check the screen. What's in the screen?

32
00:02:46.960 --> 00:02:54.420
<v Matt Godbolt>Yeah! Okay. That does work, but I can understand why, yeah, you might want something a little more sophisticated.

33
00:02:54.420 --> 00:03:02.630
<v Ben Rady>Yes. Well, and those are the two ends of the spectrum, I think, if we're going to simplify it down to a spectrum of like,

34
00:03:02.630 --> 00:03:04.160
<v Matt Godbolt>Yeah.

35
00:03:04.160 --> 00:03:16.040
<v Ben Rady>And, I think that you can in your career and I have done a lot of this as a software engineer, you can kind of like hop to the left.

36
00:03:16.040 --> 00:03:31.060
<v Ben Rady>I don't know.... the side of that spectrum and say, all right, well, okay. I obviously don't want to run it in a screen. What else could I do? And then you start learning about like systemd and things like runit and supervisord and things like that.

37
00:03:31.060 --> 00:03:32.000
<v Matt Godbolt>Or old school nohup was my...

38
00:03:32.000 --> 00:03:33.820
<v Ben Rady>Yeah. Right, nohup.

39
00:03:33.820 --> 00:03:36.520
<v Matt Godbolt>Just nohup the thing and then log out and you're done, right.

40
00:03:36.520 --> 00:03:47.440
<v Ben Rady>Exactly. And then of course, you start moving into distributed environments, the cloud, you learn about Kubernetes and Elastic, what does the ECS stand for? I forget.

41
00:03:47.440 --> 00:03:48.120
<v Ben Rady>and Elastic Compute Service?

42
00:03:48.120 --> 00:03:51.590
<v Matt Godbolt>Container store, compute something, container something.

43
00:03:51.590 --> 00:03:53.210
<v Ben Rady>No, like Container Service.

44
00:03:53.210 --> 00:03:53.610
<v Matt Godbolt>Yeah.

45
00:03:53.610 --> 00:03:54.020
<v Ben Rady>Yeah.

46
00:03:54.020 --> 00:03:57.910
<v Matt Godbolt>Or you've got what's, what's the HashiCorp thing?

47
00:03:57.910 --> 00:03:57.920
<v Ben Rady>Nomad. Yeah.

48
00:03:57.920 --> 00:03:59.980
<v Matt Godbolt>Nomad, Nomad, similar things, you know, yeah.

49
00:03:59.980 --> 00:04:00.160
<v Ben Rady>Nomad. Yeah, yeah.

50
00:04:00.160 --> 00:04:10.880
<v Matt Godbolt>All of these things, which are like orchestration setups that say, Hey, you just tell me through some mechanism what you would like to have running and I'll find a place to run them and run them in a particular controlled way.

51
00:04:10.880 --> 00:04:11.400
<v Ben Rady>ah huh

52
00:04:11.400 --> 00:04:17.380
<v Matt Godbolt>And then you take that part of the deployment and running part is taken out of your hands. It's done by a framework, but.

53
00:04:17.380 --> 00:04:17.380
<v Ben Rady>But

54
00:04:17.380 --> 00:04:20.380
<v Matt Godbolt>Presumably. Yeah, go on.

55
00:04:20.380 --> 00:04:33.560
<v Ben Rady>But all these things are accomplishing what is fundamentally the same goal, which is I have produced software and I want it to run on a computer or maybe multiple computers, maybe not multiple computers.

56
00:04:33.560 --> 00:04:33.680
<v Matt Godbolt>Yeah.

57
00:04:33.680 --> 00:04:35.600
<v Ben Rady>It's like, oh, this needs to run, like exactly one, right?

58
00:04:35.600 --> 00:04:38.000
<v Matt Godbolt>Exactly one or [none at all].

59
00:04:38.000 --> 00:04:44.470
<v Ben Rady>Like there can only, there's like something is consuming a queue and there better only be one of them at a time or bad things are going to happen, right?

60
00:04:44.470 --> 00:04:46.480
<v Matt Godbolt>Yeah.

61
00:04:46.480 --> 00:04:57.750
<v Ben Rady>So I think all of that is kind of encompassed in this in this topic of like, I'm trying to run a program and how do I actually make sure that is happening the way that I want?

62
00:04:57.750 --> 00:04:57.920
<v Matt Godbolt>Yep.

63
00:04:57.920 --> 00:05:07.700
<v Ben Rady>And I think that we could even structure this from sort of the bottom up, right? So we started with screen and I'm just running screen and now I've got a process and it's executing.

64
00:05:07.700 --> 00:05:26.460
<v Matt Godbolt>Well, even screen is one level too far from, "I literally run the process and it's there and I'm watching it and I'm watching it I'll Control-C it", which you know is also valid, but it gives us a sort of starting point of like, what happens when you fire up a process and why is that not okay?

65
00:05:26.460 --> 00:05:42.060
<v Ben Rady>Yeah, right. Right. Yeah, that's great. I love this. Okay, so it's like, so when you do that, you're like, all right, my plan for deployment now is I'm going to SSH onto the production server or EC2 instance or whatever you got, and I'm going to copy and SCP my bits up there, and then I'm going to run

66
00:05:42.060 --> 00:05:44.440
<v Matt Godbolt>Yeah, and let's not get into packaging and deployment. That's even more complicated.

67
00:05:44.440 --> 00:05:44.560
<v Ben Rady>Yeah, right.

68
00:05:44.560 --> 00:05:45.200
<v Matt Godbolt>Let's leave it at that.

69
00:05:45.200 --> 00:05:45.360
<v Ben Rady>Yeah.

70
00:05:45.360 --> 00:05:49.000
<v Matt Godbolt>Some magical process happens and you have the bits that you need on that machine.

71
00:05:49.000 --> 00:05:53.780
<v Ben Rady>Yes. Yes. I have my executable bits on the machine and then I'm just going to run it.

72
00:05:53.780 --> 00:05:53.900
<v Matt Godbolt>And then...

73
00:05:53.900 --> 00:06:01.640
<v Ben Rady>Well, now what you have is you have a process that's a child of an sshd process, right?

74
00:06:01.640 --> 00:06:03.920
<v Matt Godbolt>Probably a child of the shell that you ran it on, depending on how you do it.

75
00:06:03.920 --> 00:06:04.320
<v Ben Rady>Oh, yeah. No, yeah. That's right. Yeah.

76
00:06:04.320 --> 00:06:04.440
<v Matt Godbolt>I mean, if you're going to...

77
00:06:04.440 --> 00:06:09.890
<v Ben Rady>If you're if you're looking at, no, no, you're absolutely right. So if you got the tree, it's like, okay, it's a child of bash.

78
00:06:09.890 --> 00:06:10.320
<v Matt Godbolt>pstree will show...

79
00:06:10.320 --> 00:06:19.960
<v Ben Rady>And then Bash is going to be a child of sshd. And then that's going to be a child of the parent SSH server. And then that's probably going to be a child of init, right?

80
00:06:19.960 --> 00:06:20.180
<v Ben Rady>Like roughly, am I

81
00:06:20.180 --> 00:06:23.140
<v Matt Godbolt>Or which nowadays is probably systemd.

82
00:06:23.140 --> 00:06:23.640
<v Ben Rady>Yeah.

83
00:06:23.640 --> 00:06:28.150
<v Matt Godbolt>Thorin, son of Thrain, son of Thror.

84
00:06:28.150 --> 00:06:29.220
<v Ben Rady>Right.

85
00:06:29.220 --> 00:06:39.580
<v Matt Godbolt>It's going to be your program, son of Bash, son of sshd child process, son of sshd parent process.

86
00:06:39.580 --> 00:06:40.920
<v Ben Rady>Right, right. So.

87
00:06:40.920 --> 00:06:43.060
<v Matt Godbolt>Yeah, got it. Yes, that makes sense. All right.
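
NOTE
Editor's sketch of the ancestry chain described above (program, shell, sshd, init), in Python.
The names ancestry and parent_via_ps are made up for illustration. It assumes a POSIX-style ps
that accepts "-o ppid= -p PID"; it is a rough stand-in for pstree, not a replacement for it.
```python
import subprocess
def parent_via_ps(pid):
    """Ask ps for the parent PID of pid (assumes a POSIX-style ps is installed)."""
    result = subprocess.run(
        ["ps", "-o", "ppid=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
def ancestry(pid, parent_of=parent_via_ps):
    """Return [pid, parent, grandparent, ...], stopping at PID 1 (init or systemd) or 0."""
    chain = [pid]
    while chain[-1] > 1:
        chain.append(parent_of(chain[-1]))
    return chain
```
Run against a process started from an SSH login, this would typically list the shell, the
per-connection sshd, the listening sshd and finally PID 1, though the exact chain depends on the system.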

88
00:06:43.060 --> 00:06:55.300
<v Ben Rady>So if you naively, or maybe like not naively, but you just sort of like just have enough knowledge to be dangerous, you're like, oh, I've got the ampersand operator in Bash that I could put at the end of that.

89
00:06:55.300 --> 00:06:55.340
<v Matt Godbolt>Yeah.

90
00:06:55.340 --> 00:07:06.490
<v Ben Rady>Because it's like, okay, cool. The production server is running on my laptop. And if I put my laptop to sleep or, you know, the SSH session, the client is on my laptop. The server is a server, but it's like, all right, I started on the server.

91
00:07:06.490 --> 00:07:06.700
<v Matt Godbolt>Yeah.

92
00:07:06.700 --> 00:07:12.780
<v Ben Rady>Now I want to go home. I need to close my laptop lid and I need to leave. Well,

93
00:07:12.780 --> 00:07:13.380
<v Matt Godbolt>Yeah.

94
00:07:13.380 --> 00:07:19.100
<v Ben Rady>What exactly is going to happen if I close this lid? Like, I don't want it to stop, right? So you're like, okay, well, here's what it...

95
00:07:19.100 --> 00:07:21.700
<v Matt Godbolt>Well, let's talk about what happens in that situation, just to be absolutely clear.

96
00:07:21.700 --> 00:07:22.020
<v Ben Rady>Yeah, okay, okay, go.

97
00:07:22.020 --> 00:07:31.810
<v Matt Godbolt>Right. So let me read this back to you. So you're saying, yeah, you're running, as described, the production binary having SSH'd into a machine, and you've closed your laptop lid.

98
00:07:31.810 --> 00:07:32.060
<v Ben Rady>Yes.

99
00:07:32.060 --> 00:07:32.580
<v Matt Godbolt>All right.

100
00:07:32.580 --> 00:07:41.460
<v Matt Godbolt>So assuming or even yeah assuming you just close your laptop lid and nothing shut down nicely, it just literally suspended, I don't actually know exactly what your laptop will do in this situation.

101
00:07:41.460 --> 00:07:41.480
<v Ben Rady>Mm-hmm.

102
00:07:41.480 --> 00:07:48.470
<v Matt Godbolt>But let's just assume it disappears off the network instantaneously, which is also completely reasonable if you go into like a tunnel on the train on the way home, that kind of thing.

103
00:07:48.470 --> 00:07:48.660
<v Ben Rady>Right. Yeah. yeah

104
00:07:48.660 --> 00:08:03.680
<v Matt Godbolt>Right. Then eventually the TCP connection between your computer and the SSH daemon on the remote end will time out. There'll be a keep alive that's missing probably, or some other heart beating mechanism will go down.

105
00:08:03.680 --> 00:08:18.720
<v Matt Godbolt>And the SSH daemon will say, hey, that person's gone now. It's time to clean up their session. It will, I think, kill the bash process. And the bash process then will kill all of the children that it knows about,

106
00:08:18.720 --> 00:08:23.620
<v Matt Godbolt>Something like that. Or there's some... sig...Yeah, so this is this is kind of it, right? So what what is...

107
00:08:23.620 --> 00:08:26.820
<v Ben Rady>Yeah, signals and processes, right?

108
00:08:26.820 --> 00:08:40.250
<v Matt Godbolt>Yeah. I mean, I know that the result will be: my program will die. Exactly how that dies, I'm not 100% sure, but that's what would happen eventually, maybe five or six minutes later when the SSH daemon times out your connection and says this person's not there anymore.

109
00:08:40.250 --> 00:08:40.960
<v Ben Rady>Yes.

110
00:08:40.960 --> 00:08:49.850
<v Matt Godbolt>It kills the process tree through some mechanism and then, yeah, you get a phone call as you've got onto the train telling you that the production system is down.

111
00:08:49.850 --> 00:08:49.940
<v Ben Rady>Right.

112
00:08:49.940 --> 00:08:51.200
<v Matt Godbolt>Please fix it.

113
00:08:51.200 --> 00:09:08.700
<v Ben Rady>Exactly. Exactly right. And so, and this is maybe where we troll our listener into, into posting the right answer on the internet to this, because I would suspect what probably happens is that the SSH daemon kills like the process group.

114
00:09:08.700 --> 00:09:09.320
<v Matt Godbolt>Of course.

115
00:09:09.320 --> 00:09:09.740
<v Ben Rady>Right?

116
00:09:09.740 --> 00:09:16.010
<v Matt Godbolt>Yeah, because Bash becomes a a process group controller or whatever the name is...a leader.

117
00:09:16.010 --> 00:09:16.340
<v Ben Rady>Yeah.

118
00:09:16.340 --> 00:09:18.960
<v Matt Godbolt>Process group leader. That's right.

119
00:09:18.960 --> 00:09:19.640
<v Ben Rady>Yeah.

120
00:09:19.640 --> 00:09:25.120
<v Matt Godbolt>Where's my Stevens book? I haven't got it here. No. But yeah, there's...

121
00:09:25.120 --> 00:09:29.190
<v Ben Rady>Yeah, but it's probably going to send a SIGTERM to that process group.

122
00:09:29.190 --> 00:09:29.500
<v Matt Godbolt>Okay. So...

123
00:09:29.500 --> 00:09:41.080
<v Ben Rady>And so every process in the process group is going to receive that term signal and then hopefully gracefully shut down. I don't know if it follows it up with like a SIGKILL at some point or not. Maybe it does. Maybe it doesn't.

124
00:09:41.080 --> 00:09:44.240
<v Ben Rady>I'm not exactly sure what sshd would do there.
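
NOTE
Editor's sketch of the "hopefully gracefully shut down" part, in Python. The hosts are unsure which
signal actually arrives; when a terminal session goes away the signal is commonly SIGHUP rather than
SIGTERM (hence nohup), so this sketch, under that assumption, records either one. GracefulShutdown
and serve are hypothetical names, and do_work stands in for whatever the real program does.
```python
import signal
import time
class GracefulShutdown:
    """Records the first termination signal instead of letting it kill the process."""
    def __init__(self, signals=(signal.SIGTERM, signal.SIGHUP)):
        self.received = None
        for signum in signals:
            signal.signal(signum, self._handle)  # must be called from the main thread
    def _handle(self, signum, frame):
        if self.received is None:
            self.received = signal.Signals(signum)
def serve(shutdown, do_work, poll_seconds=0.01):
    """Call do_work() repeatedly until a signal has been recorded; return the iteration count."""
    iterations = 0
    while shutdown.received is None:
        do_work()
        iterations += 1
        time.sleep(poll_seconds)
    return iterations
```
The loop finishes the unit of work it is in and then exits cleanly, which is the behaviour a
supervisor sending SIGTERM is hoping for. SIGKILL cannot be caught this way.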

125
00:09:44.240 --> 00:09:50.030
<v Matt Godbolt>No. No, but that would seem reasonable so that you never you don't end up with loads of processes that just decided not to kill themselves.

126
00:09:50.030 --> 00:09:50.160
<v Ben Rady>Yeah, yeah.

127
00:09:50.160 --> 00:09:55.730
<v Matt Godbolt>And frankly, I think Bash will probably do the right thing for that circumstance.

128
00:09:55.730 --> 00:09:57.140
<v Ben Rady>Yeah, yeah, yeah.

129
00:09:57.140 --> 00:09:59.390
<v Ben Rady>So day one, we try to deploy like this.

130
00:09:59.390 --> 00:09:59.720
<v Matt Godbolt>okay

131
00:09:59.720 --> 00:10:19.270
<v Ben Rady>We close our laptop lid, we go home, we get the unfortunate call, and then we rush home, and then we open the laptop lid back up, and then we rerun the process. All right, well, I can't do that. So an enterprising person might say, okay what I'm going to do is I'm going to use the bash ampersand command because I know that will put a process into the background, right?

132
00:10:19.270 --> 00:10:19.700
<v Matt Godbolt>Right.

133
00:10:19.700 --> 00:10:30.990
<v Ben Rady>And so I'm going to do that next time. I'm going to run, I'm going to do my deploy. I'm going to put an ampersand on the end, right? And then I'm going to like, now it's running in the background and now I shouldn't have to worry about this.

134
00:10:30.990 --> 00:10:31.380
<v Matt Godbolt>And yeah.

135
00:10:31.380 --> 00:10:44.620
<v Matt Godbolt>Although if I were to do that with like a process like we were just talking about, the very first thing I would notice is that my shell prompt comes back and then immediately loads of junk from my log file is now appearing over top of what I'm running.

136
00:10:44.620 --> 00:10:44.700
<v Ben Rady>Yes.

137
00:10:44.700 --> 00:10:47.290
<v Matt Godbolt>So even before we get into processes and threads there's like a pragmatic thing.

138
00:10:47.290 --> 00:10:47.540
<v Ben Rady>Yes.

139
00:10:47.540 --> 00:10:58.370
<v Matt Godbolt>So what I would probably do is redirect output to, you know, ~/log.txt and then we'll put the ampersand on the end.

140
00:10:58.370 --> 00:10:59.040
<v Ben Rady>Right, exactly.

141
00:10:59.040 --> 00:11:08.790
<v Matt Godbolt>So that idea already, right. So good. And now it's in the background and I think we're great. And I, you know, I'm tailing that log file for a bit and that's safe because that's a separate process.

142
00:11:08.790 --> 00:11:09.580
<v Ben Rady>Yep.

143
00:11:09.580 --> 00:11:14.780
<v Matt Godbolt>And now I close the laptop lid and get on the plane, a plane, train, whatever, any more mode of, what happens now?

144
00:11:14.780 --> 00:11:23.770
<v Ben Rady>Right. Right. Well, I think what happens, and I think this because I've had this burn me from time to time, is that, yes, you redirected standard out, but you did not redirect standard error.

145
00:11:23.770 --> 00:11:39.020
<v Ben Rady>And so there is actually still the daemon has a file handle that it thinks it needs to be writing to back to your thing. And so you put this in the background and you do this again and it breaks again. It does exactly the same thing all over again.

146
00:11:39.020 --> 00:11:45.120
<v Matt Godbolt>Well, I think there's more than one reason. So yes, first of all, standard error isn't going anywhere useful.

147
00:11:45.120 --> 00:11:46.300
<v Ben Rady>Right.

148
00:11:46.300 --> 00:11:52.730
<v Matt Godbolt>The second thing here is that although it is in the background, it's still a child of Bash.

149
00:11:52.730 --> 00:11:53.200
<v Ben Rady>Yeah. Right.

150
00:11:53.200 --> 00:12:01.800
<v Matt Godbolt>So you've got it's got you coming both ways. And maybe thirdly, thirdful, is its standard input is still potentially connected to...

151
00:12:01.800 --> 00:12:01.800
<v Ben Rady>Mm-hmm.

152
00:12:01.800 --> 00:12:19.440
<v Matt Godbolt>the console, the terminal, something. I'm waving my hands a lot here because that's a very, I'm less sure about it. But I certainly know that if you try and read from the console, you'll get one of the even more esoteric signals about like, hey, yeah, you can't, you're not connected to it right now. [editing matt here, SIGTTIN]

153
00:12:19.440 --> 00:12:20.370
<v Matt Godbolt>And then you'll get stopped.

154
00:12:20.370 --> 00:12:20.540
<v Ben Rady>Yeah.

155
00:12:20.540 --> 00:12:24.770
<v Matt Godbolt>And so you'll see in Bash, stopped inputting required or something weird like that.

156
00:12:24.770 --> 00:12:25.100
<v Ben Rady>Mm-hmm.

157
00:12:25.100 --> 00:12:25.150
<v Matt Godbolt>um

158
00:12:25.150 --> 00:12:25.200
<v Ben Rady>Mm-hmm.

159
00:12:25.200 --> 00:12:29.440
<v Matt Godbolt>So all of those would defeat you and you end up with a dead process.

160
00:12:29.440 --> 00:12:45.740
<v Ben Rady>Right, right. So this is where you start investigating all of the various options that you can pass to SSH when you run this, because you're like, I'm going to make a script. I'm going to make a script that works, and I'm just going to run the script, and it's going to do my deploy, and then I'm going to trust that it works.

161
00:12:45.740 --> 00:12:56.420
<v Ben Rady>And you start learning about, OK, well, I need to do the option that like doesn't read from standard in, because I don't want the standard in problem. And then I got to make sure that I redirect standard out and standard error so I can put this thing in the background.
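
NOTE
Editor's sketch of "no standard in, redirect standard out and standard error", in Python rather than
shell or SSH flags. launch_detached is a hypothetical helper; the log path is whatever the caller picks.
It uses subprocess.Popen with start_new_session=True, which has the child call setsid() so it
is not in the launching shell's session. This is one way to get the effect, not the hosts' script.
```python
import subprocess
def launch_detached(argv, log_path):
    """Start argv with no terminal attached to stdin, stdout or stderr."""
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # reads return end-of-file instead of touching a terminal
            stdout=log,                 # normal output is appended to the log file
            stderr=subprocess.STDOUT,   # error output goes to the same log file
            start_new_session=True,     # child calls setsid() and leaves our session
        )
```
The parent can close its copy of the log file straight away because the child has inherited its own
file descriptor. Nothing here rotates the log, so a long-running process can still fill the disk.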

162
00:12:56.420 --> 00:13:02.180
<v Matt Godbolt>Right. You're saying this right. Just to be clear, these are options you say to SSH or to the bash.

163
00:13:02.180 --> 00:13:03.720
<v Ben Rady>SSH, right?

164
00:13:03.720 --> 00:13:07.840
<v Matt Godbolt>Oh, I see. So now, now we're not going to run bash at all. We're just going to run the executable directly. And, or what were you thinking?

165
00:13:07.840 --> 00:13:16.580
<v Ben Rady>Well, so you're going to run. So I'm thinking of the world where it's like you do a thing, you like copy the bits up to the machine.

166
00:13:16.580 --> 00:13:17.160
<v Matt Godbolt>Uh huh.

167
00:13:17.160 --> 00:13:23.900
<v Ben Rady>And then you have like a separate SSH call where you're passing the command that you want to run as an argument into SSH.

168
00:13:23.900 --> 00:13:27.400
<v Matt Godbolt>Right. So you're no longer running an interactive session. You're just going to... Yeah, that makes sense.

169
00:13:27.400 --> 00:13:27.460
<v Ben Rady>Right?

170
00:13:27.460 --> 00:13:31.650
<v Matt Godbolt>Okay. Then that takes Bash out of the equation, which helps us a bit in this context.

171
00:13:31.650 --> 00:13:31.740
<v Ben Rady>Yeah.

172
00:13:31.740 --> 00:13:42.540
<v Matt Godbolt>Although there is a there is still another Bashian solution that I think I see people go for, which is you type disown in Bash, which says, push this thing and make it not a child of this process anymore.

173
00:13:42.540 --> 00:13:43.080
<v Ben Rady>Ah, yeah. Uh-huh.

174
00:13:43.080 --> 00:14:00.410
<v Matt Godbolt>And that probably, probably... might solve the problem most of the time, except you've left a big like rake in the grass for that because there are other processes in the system that might wish to get rid of that apparently now orphaned process.

175
00:14:00.410 --> 00:14:00.960
<v Ben Rady>Yes.

176
00:14:00.960 --> 00:14:09.300
<v Matt Godbolt>So... That's what nohup's for. It's like it gets rid of the hang-up and there's some other things that it does. And then there's daemonization and other bits and pieces, which I'm sure we'll get to in a second.
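
NOTE
Editor's sketch of the core of what nohup does, in Python: ignore SIGHUP, then run the program.
It relies on the POSIX rule that a signal set to "ignore" stays ignored across exec.
run_nohup_style and ignore_hangup are hypothetical names. The real nohup also redirects
output to a file when stdout is a terminal; here the caller chooses the log file explicitly.
```python
import signal
import subprocess
def ignore_hangup():
    """Runs in the child between fork and exec; the ignore setting survives the exec."""
    signal.signal(signal.SIGHUP, signal.SIG_IGN)
def run_nohup_style(argv, log_path):
    """Start argv so that a later SIGHUP does not terminate it."""
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            preexec_fn=ignore_hangup,  # caution: preexec_fn is not safe in multi-threaded parents
        )
```
A program that installs its own SIGHUP handler after starting will override the ignore setting,
so this protects only programs that leave SIGHUP alone.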

177
00:14:09.300 --> 00:14:18.270
<v Matt Godbolt>But let's put that to one side and let's go down the rabbit hole that you've described, which is that like I'm now going to run SSH on my computer

178
00:14:18.270 --> 00:14:18.740
<v Ben Rady>Okay.

179
00:14:18.740 --> 00:14:19.680
<v Ben Rady>Yeah.

180
00:14:19.680 --> 00:14:38.400
<v Matt Godbolt>um And I'm going to pass it rather than just SSH. I'm going to do /path/to/my/executable with all the redirects and things set and try and run it from a server and have it live on the remote machine with all of the pipes and things stdin, stderr and stdout all connected to sensible places.

181
00:14:38.400 --> 00:14:40.020
<v Matt Godbolt>So go ahead.

182
00:14:40.020 --> 00:14:40.080
<v Ben Rady>Yes.

183
00:14:40.080 --> 00:14:40.140
<v Ben Rady>Yes.

184
00:14:40.140 --> 00:14:41.300
<v Matt Godbolt>Sorry. That's where I cut you off.

185
00:14:41.300 --> 00:15:07.280
<v Ben Rady>So, so you do that and then you should, I believe, be able to SSH in separately and do like a pstree and see that the parent process of this, the parent of this process is now one because it is disconnected from what it was doing before from the process group that it was in before.

186
00:15:07.280 --> 00:15:09.480
<v Matt Godbolt>Right.

187
00:15:09.480 --> 00:15:29.920
<v Ben Rady>Um, And at that point, you maybe have something where you can close your laptop and have it hang out. Now, hopefully you sent your log somewhere sensible and you don't fill up the disk with logs. You can pipe it into syslog, which is something that I do when I'm trying to punt on this problem entirely is I'm just like, you know what?

188
00:15:29.920 --> 00:15:37.480
<v Ben Rady>There's already a log rotation system on this machine and it's called syslog. So I'm just going to pipe all my logs into that.

189
00:15:37.480 --> 00:15:43.640
<v Matt Godbolt>Right. And quite possibly you already have log aggregation set up for that so that you can go and read it on like a website and all that kind of nonsense as well.

190
00:15:43.640 --> 00:15:43.980
<v Ben Rady>Maybe you do if you're fancy.

191
00:15:43.980 --> 00:15:50.080
<v Matt Godbolt>Maybe. I mean, but yeah, if you're considering that option, you probably don't because you probably don't have any other infrastructure to lean on.

192
00:15:50.080 --> 00:15:51.200
<v Ben Rady>Right, right.

193
00:15:51.200 --> 00:15:55.800
<v Matt Godbolt>Yeah. Okay. So that seems reasonable.

194
00:15:55.800 --> 00:16:04.950
<v Ben Rady>So what do you do? What do you do after this? So you do this, you finally can go home now. You can shut your laptop and go home. And you're like, right, surely we can make this better than this.

195
00:16:04.950 --> 00:16:05.060
<v Matt Godbolt>Right.

196
00:16:05.060 --> 00:16:08.120
<v Ben Rady>What do we do next?

197
00:16:08.120 --> 00:16:11.440
<v Matt Godbolt>Yeah, right. So I still have...

198
00:16:11.440 --> 00:16:13.820
<v Ben Rady>Do you make the systemd job is what is... I'm kind of questioning here.

199
00:16:13.820 --> 00:16:29.280
<v Matt Godbolt>Well, see, I was thinking another thing. So there is a process... Process is a terribly overloaded term. There is a sequence of things you can do on a POSIX system to become a daemon.

200
00:16:29.280 --> 00:16:31.220
<v Ben Rady>It's special incantation. You got to sacrifice something and that's how that works.

201
00:16:31.220 --> 00:16:41.850
<v Matt Godbolt>That's correct. Yes, there's a pentagram involved and not a "Damon" also so because Matt Damon is the only "Damon".

202
00:16:41.850 --> 00:16:42.140
<v Ben Rady>Yeah. Uh-huh.

203
00:16:42.140 --> 00:16:42.280
<v Ben Rady>Right.

204
00:16:42.280 --> 00:16:51.890
<v Matt Godbolt>So aside here, so as you recall, one of the first ah folks at the company you still work at was also called Matt and was not me.

205
00:16:51.890 --> 00:16:52.520
<v Ben Rady>Mm-hmm. Mm-hmm. Yep.

206
00:16:52.520 --> 00:17:01.840
<v Matt Godbolt>And we were discussing various long-lived processes that we were designing a system to use. And the obvious name was the Matt Daemon system. To be pronounced Matt Damon, obviously.

207
00:17:01.840 --> 00:17:02.940
<v Ben Rady>Right, right.

208
00:17:02.940 --> 00:17:16.750
<v Matt Godbolt>But we never did it. Anyway, daemonization is... Let's not get into politics. Becoming a daemon, as I understand it, is a multi-step process.

209
00:17:16.750 --> 00:17:16.880
<v Ben Rady>Right.

210
00:17:16.880 --> 00:17:37.480
<v Matt Godbolt>The first thing you need to do is fork, which gives you a new process, a shiny new process. Then you call something called setsid, which says, I would like to become the session leader for this new process that I've created because only a process group, and I'm doing this from memory, so listener, please.

211
00:17:37.480 --> 00:17:42.680
<v Matt Godbolt>And although Ben's nodding, this is not necessarily correct. So just take this massive pile of [salt]

212
00:17:42.680 --> 00:17:43.320
<v Ben Rady>Yeah, right.

213
00:17:43.320 --> 00:17:46.060
<v Ben Rady>Nope. We may be hallucinating all of us.

214
00:17:46.060 --> 00:18:01.720
<v Matt Godbolt>Yes. So you fork. The child process then does setsid to become a process leader in its new group. And then if I remember rightly, you have to fork again to then dissociate yourself from any last tendrils that previous process had.

215
00:18:01.720 --> 00:18:14.900
<v Matt Godbolt>And now you're running and you are completely in the clear. It's something like that. It's some weird sequence of events, which means that you have lost all connection with the previous process.
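
NOTE
Editor's sketch of the fork, setsid, fork sequence Matt recalls from memory, in Python.
daemonize is a hypothetical helper name. Details such as umask handling, PID files and closing
other inherited file descriptors are left out, and a supervisor like systemd can make all of this
unnecessary. Treat it as an illustration of the steps, not a production recipe.
```python
import os
def daemonize():
    """Classic double fork: this function returns only in the detached grandchild."""
    if os.fork() > 0:
        os._exit(0)            # original process exits, so the launching shell gets its prompt back
    os.setsid()                # child becomes leader of a new session with no controlling terminal
    if os.fork() > 0:
        os._exit(0)            # session leader exits; the survivor cannot reacquire a terminal
    os.chdir("/")              # do not keep the starting directory busy
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)   # point stdin, stdout and stderr away from the old terminal
    if devnull > 2:
        os.close(devnull)
```
After daemonize() returns, the process is not a session leader, and once its parent has exited
it is adopted by PID 1 (or a designated subreaper), so the launching shell's jobs list no longer shows it.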

216
00:18:14.900 --> 00:18:31.000
<v Matt Godbolt>And so when you run some like system process and you pass it with --d or -d, sorry, then, and it immediately returns and disappears. Apparently like, "Hey, did it do anything?" But you know, you you run PS and it's still running. That's the kind of process that it's been through.

217
00:18:31.000 --> 00:18:36.640
<v Matt Godbolt>And you're, you know, you can type jobs and it won't be there. It's like completely lost from you. And probably...

218
00:18:36.640 --> 00:18:36.640
<v Ben Rady>Yeah.

219
00:18:36.640 --> 00:19:01.630
<v Matt Godbolt>I didn't realize that the thing you were just talking about, and the penny is dropping now: some of the flags that you were talking about finding for SSH to set it up correctly might be the ones that effectively have the same side effect. But having just written something that is a daemon (if you go back to the systemd conversation we were having last time, something became a daemon and I went through that process), so it's a bit

220
00:19:01.630 --> 00:19:02.340
<v Ben Rady>Yeah

221
00:19:02.340 --> 00:19:29.980
<v Matt Godbolt>Somewhat in top of mind. And even though I had a daemonization thing there, I still, you can choose, I think, systemd, which we're going to, to say either systemd runs the process and does that for it in its own container, or it's expecting it to run in that particular way. ah And so it can babysit different types of processes, if I remember rightly. Okay, let's go back to what you said about systemd, because that sounds like a useful thing to know about. What is systemd?

222
00:19:29.980 --> 00:19:38.040
<v Ben Rady>Right. So so so the so just to put the problem in context, systemd is a solution to a problem. What's the problem? Well, so here's the problem. So you've written your script.

223
00:19:38.040 --> 00:19:45.010
<v Matt Godbolt>ah Yes. See the last conversation we had about it was to what of solution it might be.

224
00:19:45.010 --> 00:19:45.020
<v Ben Rady>Right.

225
00:19:45.020 --> 00:19:46.940
<v Matt Godbolt>What problem it is.

226
00:19:46.940 --> 00:19:50.380
<v Ben Rady>What problem are we creating by solving another problem?

227
00:19:50.380 --> 00:19:51.220
<v Matt Godbolt>Yes

228
00:19:51.220 --> 00:19:52.250
<v Ben Rady>Right? I think actually...

229
00:19:52.250 --> 00:19:52.680
<v Matt Godbolt>Yeah

230
00:19:52.680 --> 00:20:07.080
<v Ben Rady>Is that a thing? I feel like I've said this before on the podcast. I don't remember the difference between computer science and software engineering. We know this one: computer science is solving problems with computers. Software engineering is solving the problems that you create when solving problems with computers.

231
00:20:07.080 --> 00:20:09.400
<v Ben Rady>And ah this is a, this is exactly.

232
00:20:09.400 --> 00:20:11.500
<v Matt Godbolt>Yes, that follows. That checks out. The maths checks out for that for certain.

233
00:20:11.500 --> 00:20:29.280
<v Ben Rady>Yeah. And so what problems are we both solving and creating by using systemd? Well, so you write your bash script, it deploys your thing. You shut your laptop and then you wait five minutes, you open it back up and then you have [a check?] and it's still running. And you're like, right, I think I maybe believe that this is going to work.

234
00:20:29.280 --> 00:20:43.780
<v Ben Rady>And you go home and the next day you come in and still running. Cool. And then three days later it crashes. And you're like, what would have been super cool is instead of me getting a phone call in the middle of the night because it crashed, if it had just restarted.

235
00:20:43.780 --> 00:20:47.540
<v Matt Godbolt>Well, I mean, "wouldn't it be cool if it hadn't crashed" would be the first thought you'd have.

236
00:20:47.540 --> 00:20:47.540
<v Ben Rady>True.

237
00:20:47.540 --> 00:20:51.660
<v Matt Godbolt>But at three in the morning, you probably just want to go, ah for God's sake, just restart the thing.

238
00:20:51.660 --> 00:20:59.080
<v Ben Rady>Just restart it, please. I'll fix it tomorrow. But can we please not call me because I have to SSH back in and rerun the script again or whatever, right?

239
00:20:59.080 --> 00:21:00.640
<v Matt Godbolt>Right, right, right.

240
00:21:00.640 --> 00:21:13.640
<v Ben Rady>So you're like, I just want this to restart. And then you Google and you're like, well, maybe I should run this in systemd, right? And so you wind up making a whole systemd job definition.

241
00:21:13.640 --> 00:21:18.060
<v Ben Rady>And you, I forget where you put it. You put it in /etc/something, right?

242
00:21:18.060 --> 00:21:19.740
<v Matt Godbolt>Or is it? Yeah. So there's...

243
00:21:19.740 --> 00:21:20.880
<v Ben Rady>I don't even remember now.

244
00:21:20.880 --> 00:21:30.820
<v Matt Godbolt>So, I mean, my understanding is in the beginning, there was init. And init is effectively the first thing that the kernel executes...

245
00:21:30.820 --> 00:21:31.240
<v Ben Rady>Mm-hmm.

246
00:21:31.240 --> 00:21:41.470
<v Matt Godbolt>As a user process and it then decides what to do. And back in the mysteries of time, there were like run levels and it was all like clever directory structures and things like that.

247
00:21:41.470 --> 00:21:41.540
<v Ben Rady>Oh, yeah.

248
00:21:41.540 --> 00:21:52.250
<v Matt Godbolt>And it just fired up the right sequence of daemon processes. One of which would be, you know, sshd so you could log into the machine, or a getty that would actually let you type on the console to get into the machine.

249
00:21:52.250 --> 00:21:52.340
<v Ben Rady>Mm-hmm.

250
00:21:52.340 --> 00:22:00.670
<v Matt Godbolt>And that was it. And then after that, you're off to the races. And systemd is the new init. And instead of it being,

251
00:22:00.670 --> 00:22:01.600
<v Ben Rady>Mm-hmm.

252
00:22:01.600 --> 00:22:24.760
<v Matt Godbolt>...a set of essentially shell scripts that get run to fire things up in the right order. Again, I'm probably a bit... missing loads of bits of context here, but it's a sort of more principled approach where you have units that are like, I would like this thing to run, please. I would like this to be true under these circumstances. And it depends on these other things that also need to be either running or at least have started before me.

253
00:22:24.760 --> 00:22:28.620
<v Matt Godbolt>And so instead of having essentially numbered directories with, you know, 40.do-this, 41.do...,

254
00:22:28.620 --> 00:22:33.360
<v Ben Rady>Yeah, RC dot D or RC dot one RC dot two, something like that.

255
00:22:33.360 --> 00:22:37.500
<v Matt Godbolt>Yeah, those were the run levels, I think, which was slightly different because it's single user mode versus multi-user mode.

256
00:22:37.500 --> 00:22:37.860
<v Ben Rady>Yeah, something like that. Yeah, right.

257
00:22:37.860 --> 00:22:45.220
<v Matt Godbolt>But this is more like, hey, what sequence do I need to run things in and shut them down in, in order for my system to come up?

258
00:22:45.220 --> 00:22:46.660
<v Ben Rady>Mhm.

259
00:22:46.660 --> 00:22:58.040
<v Matt Godbolt>And systemd does that kind of the right way by actually tracking dependencies, which again was expensive and caused me problems in our last conversation, but is the right approach and the correct thing to do.
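
NOTE
Editor's sketch, not part of the recording. The "tracking dependencies" idea above can be
illustrated as ordering units so that everything a unit depends on starts first. This is only
a toy model in Python, not how systemd is implemented; the unit names and the dependencies
between them are assumptions made up for the example.
```python
from graphlib import TopologicalSorter
# Hypothetical units; each key maps to the set of units that must start before it.
units = {
    "network.target": set(),
    "sshd.service": {"network.target"},
    "myapp.service": {"network.target", "sshd.service"},
}
def start_order(deps):
    """Return one valid start-up order in which dependencies precede their dependents."""
    return list(TopologicalSorter(deps).static_order())
order = start_order(units)
print(order)
```
Compared with numbered directories, the order falls out of the declared relationships instead
of being maintained by hand.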

260
00:22:58.040 --> 00:23:08.650
<v Matt Godbolt>And so that's what systemd is. It's like the overarching orchestrator of a computer and all of the processes that are running on it.

261
00:23:08.650 --> 00:23:08.820
<v Ben Rady>Mhm.

262
00:23:08.820 --> 00:23:12.820
<v Matt Godbolt>And so, yes, to make something run in systemd, you put a file in the right magical place.

263
00:23:12.820 --> 00:23:23.760
<v Matt Godbolt>You issue the correct incantation to systemd to go and notice that file is there. And then what?

264
00:23:23.760 --> 00:23:24.380
<v Ben Rady>And then you need to reload the system daemon.

265
00:23:24.380 --> 00:23:29.740
<v Matt Godbolt>I'm looking at you because I thought you might've just done this and you could answer the question.

266
00:23:29.740 --> 00:23:33.120
<v Ben Rady>Yes. Reload the systemd

267
00:23:33.120 --> 00:23:34.960
<v Matt Godbolt>Yeah, there's like daemonctl reload or something.

268
00:23:34.960 --> 00:23:38.670
<v Matt Godbolt>That's the magical incantation that says, hey, systemd, look through your configuration files.

269
00:23:38.670 --> 00:23:38.840
<v Ben Rady>Yes

270
00:23:38.840 --> 00:23:40.240
<v Matt Godbolt>Something has changed.

271
00:23:40.240 --> 00:23:40.540
<v Ben Rady>Yes.

272
00:23:40.540 --> 00:23:42.600
<v Matt Godbolt>Please do the needful now.

273
00:23:42.600 --> 00:23:49.860
<v Ben Rady>And then it should start up and then you're using something like journalctl to look at the logs of the thing to make sure that it started.
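
NOTE
Editor's sketch, not part of the recording. The "incantation" the hosts are reaching for is,
as far as the editor is aware, "systemctl daemon-reload", followed by enabling and starting the
unit and then following its logs with journalctl. The Python below only builds and prints those
command lines (it runs nothing); the unit name myapp.service is a hypothetical placeholder.
```python
import shlex
def activation_commands(unit):
    """Return the usual commands to pick up a new unit file, start it, and follow its logs."""
    return [
        ["systemctl", "daemon-reload"],          # re-read unit files from disk
        ["systemctl", "enable", "--now", unit],  # start now and on every boot
        ["journalctl", "-u", unit, "-f"],        # follow this unit's log output
    ]
commands = activation_commands("myapp.service")
for argv in commands:
    print(shlex.join(argv))
```
On a real machine these would typically be run with root privileges.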

274
00:23:49.860 --> 00:24:04.140
<v Matt Godbolt>Which... is I think for most people, when Linux systems particularly moved from init to systemd, the biggest frying pan to the side of the head was, where are all my chuffing logs?

275
00:24:04.140 --> 00:24:20.640
<v Matt Godbolt>They used to be /var/log/whatever, and that's burnt into my mind. They are text files and are in /var/log/blah, and systemd stopped that. And now there are a few logs in /var/log, but nowadays you have to interact with it through... and it has a binary log file format, as I understand it, behind the hood.

276
00:24:20.640 --> 00:24:29.080
<v Matt Godbolt>And you have to learn journalctl, which I still haven't learned, and I still Google the same thing over and over and over again and type in the thing that it tells me to do, which...

277
00:24:29.080 --> 00:24:29.080
<v Ben Rady>Yeah. Right.

278
00:24:29.080 --> 00:24:52.300
<v Matt Godbolt>is ...note to self. Don't, don't do taking a note here. Don't do that. Make a cheat sheet for it and stick it to my monitor. Like all the other cheat sheets I have. Yeah. So that was, but that was like the, but that broke most people, I think, because I didn't have to interact with adding and removing daemons from my system. That's what, you know, my package management system did. But whenever something went wrong, I'm like, where the hell's the log file?

279
00:24:52.300 --> 00:24:52.880
<v Matt Godbolt>Anyway, so journalctl.

280
00:24:52.880 --> 00:24:52.880
<v Ben Rady>right

281
00:24:52.880 --> 00:25:02.020
<v Ben Rady>It's in this magical program called journalctl. OK. I feel like this is like I want to go to the next level now.

282
00:25:02.020 --> 00:25:02.040
<v Matt Godbolt>So...

283
00:25:02.040 --> 00:25:10.220
<v Ben Rady>It's like, OK, cool. We're going to run this on like two computers because we discovered that the reason it crashed is it got OOM killed.

284
00:25:10.220 --> 00:25:18.030
<v Matt Godbolt>Well, let's finish the thought. So just to be... Right, right, right. Let's just finish the thought there. So very concretely, you would install the binary to a known good location, which you probably were anyway.

285
00:25:18.030 --> 00:25:18.200
<v Ben Rady>Yeah. Yeah.

286
00:25:18.200 --> 00:25:19.700
<v Matt Godbolt>It wasn't just your home directory, hopefully.

287
00:25:19.700 --> 00:25:22.340
<v Ben Rady>Pick a user that you're going to run it as.

288
00:25:22.340 --> 00:25:23.160
<v Matt Godbolt>Maybe it was. Yes, that's true.

289
00:25:23.160 --> 00:25:24.640
<v Ben Rady>Might be root, might not.

290
00:25:24.640 --> 00:25:27.900
<v Matt Godbolt>Yeah, let's hope it avoids being root if it can.

291
00:25:27.900 --> 00:25:28.700
<v Ben Rady>Yeah.

292
00:25:28.700 --> 00:25:38.690
<v Matt Godbolt>But then, yeah, you make a little text file that sort of, it looks like TOML-ish to me, that systemd config-ish file that says, hey, I need these things.

293
00:25:38.690 --> 00:25:39.080
<v Ben Rady>Yeah. Yeah.

294
00:25:39.080 --> 00:25:41.780
<v Matt Godbolt>I provide these things, which you often don't have to do.

295
00:25:41.780 --> 00:25:42.180
<v Ben Rady>Mm-hmm.

296
00:25:42.180 --> 00:25:46.250
<v Matt Godbolt>This is how I'm going to be started up. This script needs to run before I run.

297
00:25:46.250 --> 00:25:46.320
<v Ben Rady>Yeah.

298
00:25:46.320 --> 00:26:14.330
<v Matt Godbolt>This script needs to run after I run. There's a few, like, customization points you've got like that. And you can say what you're wanted by as well. So in this instance, you probably say I'm wanted by multi-user.target, which is like a magical sort of target that says, hey, when it becomes a multi-user system, the fifth, whatever, run level five, then this is, I am saying that I am wanted by it, which is a way of you kind of going the other way around from the usual dependency saying it depends on me.

299
00:26:14.330 --> 00:26:14.460
<v Ben Rady>Yeah.

300
00:26:14.460 --> 00:26:14.860
<v Matt Godbolt>And that means...

301
00:26:14.860 --> 00:26:17.180
<v Ben Rady>Right. You're joining the dependency tree there. Yeah.

302
00:26:17.180 --> 00:26:31.260
<v Matt Godbolt>Yeah, so now when you reboot the machine, your service will come back up. And then you can have some policies about retrying, restarting it, maximum number of times to restart, how long to wait between them, those kinds of things.
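
NOTE
Editor's sketch, not part of the recording. The unit file described above (what to run, which
user to run as, a restart policy, and "wanted by multi-user.target") might look roughly like the
text this Python helper renders. The directive names are standard systemd ones; the description,
binary path, and user name are hypothetical. Hand-written units commonly live under
/etc/systemd/system/.
```python
def render_unit(description, exec_start, user, restart_sec=5):
    """Render a minimal systemd service unit as text (all values are illustrative)."""
    lines = [
        "[Unit]",
        f"Description={description}",
        "After=network.target",          # start only after networking is up
        "[Service]",
        f"ExecStart={exec_start}",
        f"User={user}",                  # avoid running as root
        "Restart=on-failure",            # restart if the process crashes
        f"RestartSec={restart_sec}",     # seconds to wait between restarts
        "[Install]",
        "WantedBy=multi-user.target",    # come back up after a reboot
    ]
    return "\n".join(lines) + "\n"
unit = render_unit("My hypothetical service", "/opt/myapp/bin/myapp", "myapp")
print(unit)
```
Limits on how many restarts are allowed are configured separately and are left out here.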

303
00:26:31.260 --> 00:26:31.340
<v Ben Rady>Mm-hmm.

304
00:26:31.340 --> 00:26:34.220
<v Matt Godbolt>And then effectively, it runs itself after that.

305
00:26:34.220 --> 00:26:35.380
<v Matt Godbolt>So that's what we do. Yeah.

306
00:26:35.380 --> 00:26:35.620
<v Ben Rady>Yeah.

307
00:26:35.620 --> 00:26:42.710
<v Matt Godbolt>And so your installation process is copy the binary bits up and make sure that this systemd configuration is there.

308
00:26:42.710 --> 00:26:42.980
<v Ben Rady>Yeah.

309
00:26:42.980 --> 00:26:48.780
<v Matt Godbolt>And then obviously if you want to restart it, there are processes for restarting service, restart and all that kind of good stuff.

310
00:26:48.780 --> 00:26:49.340
<v Ben Rady>Yeah, yeah.

311
00:26:49.340 --> 00:26:51.020
<v Ben Rady>servicectl?

312
00:26:51.020 --> 00:26:54.410
<v Matt Godbolt>Yeah, is that what you use? I still use service space, service name restart.

313
00:26:54.410 --> 00:26:54.580
<v Ben Rady>I think that's one. I don't know.

314
00:26:54.580 --> 00:26:57.520
<v Matt Godbolt>There's almost certainly a hundred ways to do it.

315
00:26:57.520 --> 00:27:01.390
<v Matt Godbolt>Honestly, I still want to go var run blah or whatever the whole old thing was.

316
00:27:01.390 --> 00:27:01.480
<v Ben Rady>Yeah.

317
00:27:01.480 --> 00:27:14.240
<v Matt Godbolt>I actually don't know what this command is, but it just comes out of my fingers when I need to say, make that thing run again. But yeah, service space, name of thing, space restart is now what I've learned to do. But okay, so that's where we are.

318
00:27:14.240 --> 00:27:14.720
<v Ben Rady>Okay.

319
00:27:14.720 --> 00:27:15.780
<v Matt Godbolt>Right, okay, so now we're good, right?

320
00:27:15.780 --> 00:27:15.860
<v Ben Rady>Yes.

321
00:27:15.860 --> 00:27:32.190
<v Matt Godbolt>We know that the process is being appropriately managed by a piece of software that's designed to start it up at the right time and keep it running. It also has some handling for like, if it does output to standard out, it'll go to a well-defined log place inside this journalctl thing.

322
00:27:32.190 --> 00:27:32.740
<v Ben Rady>Mm-hmm.

323
00:27:32.740 --> 00:27:42.940
<v Matt Godbolt>If it crashes, it will restart it. If you reboot the machine, it'll come back up with it if you set that to be so. So everything is wonderful. So what's next?

324
00:27:42.940 --> 00:27:55.240
<v Ben Rady>Right. So what's next is that you discover that the thing just crashes every four or five days because it's running out of memory because it needs to run on more than one computer. It is too big.

325
00:27:55.240 --> 00:27:59.460
<v Ben Rady>So you have to now run it on multiple computers and you have to distribute whatever work it's doing.

326
00:27:59.460 --> 00:28:02.080
<v Matt Godbolt>We're assuming you've ruled out the, there's a memory leak type issue here.

327
00:28:02.080 --> 00:28:03.350
<v Ben Rady>Yes, it's not a memory leak.

328
00:28:03.350 --> 00:28:03.800
<v Matt Godbolt>Yeah. We're just, yeah, yeah, yeah.

329
00:28:03.800 --> 00:28:06.020
<v Ben Rady>It's just too much data.

330
00:28:06.020 --> 00:28:06.500
<v Matt Godbolt>It's just like, Hey, it's too much.

331
00:28:06.500 --> 00:28:07.200
<v Ben Rady>Yeah.

332
00:28:07.200 --> 00:28:08.640
<v Matt Godbolt>So what do we do now then?

333
00:28:08.640 --> 00:28:16.600
<v Ben Rady>So now we need to run it on multiple computers. And so like one thing you might reach for here is Ansible maybe?

334
00:28:16.600 --> 00:28:25.610
<v Matt Godbolt>I was going to say, it's probably duplicating the line in the "scp ssh machine service blah restart" and just doing "for host in".

335
00:28:25.610 --> 00:28:27.750
<v Ben Rady>Right. Yes. For host in host list. Yes.

336
00:28:27.750 --> 00:28:28.460
<v Matt Godbolt>yeah

337
00:28:28.460 --> 00:28:30.500
<v Ben Rady>Uh-huh. And just do the exact same thing.

338
00:28:30.500 --> 00:28:31.280
<v Matt Godbolt>So that's the first thing I would do, right?

339
00:28:31.280 --> 00:28:31.360
<v Ben Rady>Yes.

340
00:28:31.360 --> 00:28:39.010
<v Matt Godbolt>At least to start with, right? That's the V0 of anything is like, well, okay, let's deploy it to the two computers I know about right now and just do the same thing on both of them.
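
NOTE
Editor's sketch, not part of the recording. The "for host in host list" V0 described above is
just the single-machine copy-and-restart repeated per host. The Python below builds those
command lines without running them; the host names, paths, and service name are hypothetical.
```python
import shlex
def deploy_commands(hosts, binary, service):
    """Build (but do not run) the copy-and-restart commands for every host, in order."""
    commands = []
    for host in hosts:
        commands.append(["scp", binary, f"{host}:/opt/myapp/bin/"])
        commands.append(["ssh", host, "sudo", "systemctl", "restart", service])
    return commands
hosts = ["host-a.example", "host-b.example"]
commands = deploy_commands(hosts, "./build/myapp", "myapp.service")
for argv in commands:
    print(shlex.join(argv))
```
Because the hosts are handled one after another, the total time grows with the length of the
list, which is the pain point Ben goes on to describe.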

341
00:28:39.010 --> 00:28:39.070
<v Ben Rady>Right. Yes.

342
00:28:39.070 --> 00:28:39.100
<v Ben Rady>Yeah.

343
00:28:39.100 --> 00:28:40.580
<v Matt Godbolt>And then, okay.

344
00:28:40.580 --> 00:28:53.280
<v Ben Rady>That is probably what I would do. And then I would have the thing where I would try to deploy it and there'd be some package or some configure. Oh, we got to increase the size of the maximum size of the receive buffers on the network.

345
00:28:53.280 --> 00:29:02.760
<v Ben Rady>And so now I've got to like go and change that configuration. I gotta change it. And I've already scaled this out to like 10 computers now, like every month for the last, you know, 10 months, I've been just adding another computer to the list of hosts.

346
00:29:02.760 --> 00:29:06.740
<v Matt Godbolt>You've been adding another host to the list of hosts. Yeah.

347
00:29:06.740 --> 00:29:17.700
<v Ben Rady>And now it takes like, you know, three minutes just to iterate through all of them. And I'm like, oh, and I have to remember to log in and set all these settings every time I add a new host and it's getting worse and worse and worse.

348
00:29:17.700 --> 00:29:25.930
<v Matt Godbolt>Okay, so we've now gone firmly outside of signals and processes. And now this is like the setting up of the machine here is what you're talking about, which is valid.

349
00:29:25.930 --> 00:29:26.200
<v Ben Rady>Well.

350
00:29:26.200 --> 00:29:41.880
<v Matt Godbolt>And if you think of, you know, the system, sorry, the systemd configuration unit file, whatever we just said, as being part of this machine configuration, then it does make sense to talk about some of the other things that you might need that machine to have set up like packages.

351
00:29:41.880 --> 00:29:45.240
<v Matt Godbolt>And as you say, system settings. So yeah let's segue into that. Let's do it.

352
00:29:45.240 --> 00:29:47.640
<v Ben Rady>Yeah.

353
00:29:47.640 --> 00:30:07.000
<v Ben Rady>Yeah, OK. So you've decided that now, okay I need to retire this bash script. It's served me well, but it's time to move on to something a little bit where I don't have to like build all this stuff myself and make sure that it works and troubleshoot it all. So I'm going to try to use Ansible.

354
00:30:07.000 --> 00:30:08.560
<v Ben Rady>Let's just say.

355
00:30:08.560 --> 00:30:14.080
<v Matt Godbolt>And what is Ansible and what makes something able to be ansed, which is presumably what it means?

356
00:30:14.080 --> 00:30:15.840
<v Ben Rady>And well, first you have to have pants and you can have ants in your pants and then Pantsible.

357
00:30:15.840 --> 00:30:18.720
<v Matt Godbolt>That would be pansible.

358
00:30:18.720 --> 00:30:21.120
<v Ben Rady>That's going to be the fork of Ansible, is Pantsible.

359
00:30:21.120 --> 00:30:23.760
<v Matt Godbolt>Okay. Okay.

360
00:30:23.760 --> 00:30:40.300
<v Ben Rady>So Ansible is honestly a tool that I have only used sometimes. It is not... I sort of wind up making the jump from, like, the shell script to, like, Terraform.

361
00:30:40.300 --> 00:30:45.900
<v Ben Rady>That's usually what I do is I'm like, all right, I'm going to go and I'm going to have something like Nomad manage these, or I'm going to manage them in the cloud, just making Docker containers.

362
00:30:45.900 --> 00:30:55.930
<v Matt Godbolt>I see. So at that point, you jump straight out into sort of an orchestration environment as opposed to I'm controlling individual machines, because that's the other thing in here, that host list and the provisioning of those machines.

363
00:30:55.930 --> 00:30:56.000
<v Ben Rady>Yeah. Yeah.

364
00:30:56.000 --> 00:31:00.350
<v Matt Godbolt>We're assuming that these machines exist and you haven't got to like make them appear in EC2.

365
00:31:00.350 --> 00:31:00.440
<v Ben Rady>Yeah.

366
00:31:00.440 --> 00:31:02.720
<v Matt Godbolt>But let's go through what Ansible is, because I think that is interesting.

367
00:31:02.720 --> 00:31:11.940
<v Ben Rady>Yes. But real high level, Ansible is you write a playbook. And I think that playbook is pretty much in YAML and it's got like the steps that you want to perform.

368
00:31:11.940 --> 00:31:25.200
<v Ben Rady>And there's like a lot of sort of baked in things of like, "Oh, I need to copy this artifact from this place to this place". Cool. I need to create a configuration file here. Cool. I need to restart systemd. Cool. It can do all those things for you.

369
00:31:25.200 --> 00:31:41.860
<v Ben Rady>And there's lots of baked-in tools in Ansible to sort of do the typical system management things: You can install packages. You can create users. You can... you know, because it's like hopefully, like you said, we weren't running this thing as root. So we had a dedicated user for it.

370
00:31:41.860 --> 00:31:57.400
<v Ben Rady>When I'm setting up a new machine, I need to make that user. I need to make sure they don't have a password, that they have the right SSH keys, you know, all those kinds of wonderful things. So you have some, you know, script or some playbook that you run, you know, as root because it needs to be able to do all these things.

371
00:31:57.400 --> 00:32:08.940
<v Ben Rady>But then it sort of sets up the environment and then like subsequent deploys and things can, you know, kind of make it that the program can run as a user and it doesn't need root.

372
00:32:08.940 --> 00:32:18.850
<v Matt Godbolt>Got it. Right. That makes sense. So it is essentially a canonifi...canonific..., that word, of what, the steps that you need to do the playbook.

373
00:32:18.850 --> 00:32:18.980
<v Ben Rady>Yeah.

374
00:32:18.980 --> 00:32:20.100
<v Matt Godbolt>I mean, that's a good name for it, right?

375
00:32:20.100 --> 00:32:20.240
<v Ben Rady>Yeah.

376
00:32:20.240 --> 00:32:27.580
<v Matt Godbolt>Like, it replaces the playbook, which is the, you know, the Google doc that you have that says, remember, when you create a new machine, here's the 25 steps that you have to do.

377
00:32:27.580 --> 00:32:27.580
<v Ben Rady>Mm-hmm.

378
00:32:27.580 --> 00:32:30.800
<v Matt Godbolt>And you kind of roll your eyes and do them. And it's like, well, let's automate this.

379
00:32:30.800 --> 00:32:45.660
<v Matt Godbolt>And it does it in a principled way, with a bunch of support files that help you, sort of support functionality that lets you do, like, add user rather than having to go through whatever steps you actually have to take to add the user, which I forget these days.

380
00:32:45.660 --> 00:32:45.900
<v Ben Rady>Yeah

381
00:32:45.900 --> 00:32:59.240
<v Matt Godbolt>Okay. That makes sense to me. I think one of the things that I have had difficulty in getting my head around when looking at these sets of tools and only because you've mentioned Terraform.

382
00:32:59.240 --> 00:33:04.800
<v Matt Godbolt>One thing I like about something like Terraform is that you kind of describe the end state

383
00:33:04.800 --> 00:33:04.800
<v Ben Rady>Yeah.

384
00:33:04.800 --> 00:33:11.800
<v Matt Godbolt>And Terraform's responsible for getting whatever the current state is to the end state.

385
00:33:11.800 --> 00:33:12.480
<v Ben Rady>Yeah, yeah.

386
00:33:12.480 --> 00:33:26.370
<v Matt Godbolt>So, whereas with things like Ansible, as I understand it, is you have to be very careful to either be idempotent so you can run the same thing twice and it doesn't re-add another user if there is one already called that thing.

387
00:33:26.370 --> 00:33:27.260
<v Ben Rady>Right.

388
00:33:27.260 --> 00:33:39.160
<v Matt Godbolt>Or you just have to not run that step again. You know, like, hey, once we add that user, don't try and do it again. And then you kind of go like, well, now I want to change the user to have a different, you know, full name or a different shell or whatever.

389
00:33:39.160 --> 00:33:39.220
<v Ben Rady>yeah

390
00:33:39.220 --> 00:33:44.790
<v Matt Godbolt>You're like, now I have to run the change command and I can't just change the add.
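
NOTE
Editor's sketch, not part of the recording. The difference Matt describes between an "add"
step that cannot safely run twice and an idempotent step can be shown with a toy example. The
dictionary below merely stands in for a machine's user database; the function names, user name,
and shells are hypothetical, and this is not any particular tool's API.
```python
def add_user(users, name, shell):
    """Imperative step: fails if it is run twice for the same name."""
    if name in users:
        raise ValueError(f"user {name!r} already exists")
    users[name] = {"shell": shell}
def ensure_user(users, name, shell):
    """Idempotent step: create or update the user; return True only if something changed."""
    desired = {"shell": shell}
    if users.get(name) == desired:
        return False
    users[name] = desired
    return True
users = {}
first = ensure_user(users, "myapp", "/usr/sbin/nologin")     # creates the user
second = ensure_user(users, "myapp", "/usr/sbin/nologin")    # no change the second time
changed_shell = ensure_user(users, "myapp", "/bin/bash")     # same call handles an update
print(first, second, changed_shell)
```
With the "ensure" style, changing the shell later means editing the same step rather than
adding a separate "change" command.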

391
00:33:44.790 --> 00:33:46.340
<v Ben Rady>Right

392
00:33:46.340 --> 00:34:17.170
<v Matt Godbolt>And Unix systems are so, so complicated. I can't actually imagine how you could write a more general purpose, like, "make my system look this way" thing. Except for, at least one listener, somebody, is currently shouting "Nix" into the void as they're walking along. And I know that Nix solves this in a very cool way and I'm very excited by it, but I don't have any personal experience with it other than someone demoing it to me and me going, wow, that is super cool.

393
00:34:17.170 --> 00:34:18.320
<v Ben Rady>Yeah.

394
00:34:18.320 --> 00:34:24.450
<v Matt Godbolt>But so just for that, yeah, Nix seems to be, it seems to be like a kind of,

395
00:34:24.450 --> 00:34:26.200
<v Ben Rady>I've heard those same things about Nix, but I have, again, no personal experience.

396
00:34:26.200 --> 00:34:34.920
<v Matt Godbolt>A mind virus that people get, not in a bad way necessarily. That does sound pejorative, but like, cause once you get it, I think you're like, Oh my gosh, this is how everything should always be done.

397
00:34:34.920 --> 00:34:35.240
<v Ben Rady>yeah yeah

398
00:34:35.240 --> 00:34:40.750
<v Matt Godbolt>And that's great. And you, like, proselytize it to everybody. And then most people's eyes glaze over.

399
00:34:40.750 --> 00:34:41.300
<v Ben Rady>Right

400
00:34:41.300 --> 00:34:55.160
<v Matt Godbolt>And then you're like, that seems great. And then you just log back onto the machine and just go "sudo apt install bob". And you're like, there we are. We're done. Anyway, back to, oops, I've just banged my, yeah but sorry, editing Matt. You just, I've just whacked the microphone stand. [that's ok, I didn't edit it out -editing Matt]

401
00:34:55.160 --> 00:35:07.420
<v Matt Godbolt>Where were we? So I was sort of saying that there's this sort of difference between sort of prescriptive run these things in order and maybe they're idempotent or maybe they can adapt and say like, well, if there's a user already there, don't re-add it, that kind of feeling.

402
00:35:07.420 --> 00:35:15.860
<v Matt Godbolt>Versus the Terraform thing where you just say I should like this to be the end state. Here is a list of users the machine has to have with the properties that users have.

403
00:35:15.860 --> 00:35:16.140
<v Ben Rady>Right.

404
00:35:16.140 --> 00:35:26.020
<v Matt Godbolt>And then Terraform goes behind the scenes and goes, well, why don't I look at what users I've got? Oh, now I'll make a plan. A plan is add three users, delete one user, and presents it to you and says, this is what I'm going to do.
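
NOTE
Editor's sketch, not part of the recording. The "describe the end state, then compute a plan"
idea can be sketched as a diff between current and desired state. This is a toy illustration
of the concept only, not Terraform's implementation, and, as the hosts say shortly afterwards,
it is not a claim that Terraform manages system users. All names and values are made up.
```python
def plan(current, desired):
    """Compare current and desired users; return sorted names to add, change, and delete."""
    to_add = sorted(desired.keys() - current.keys())
    to_delete = sorted(current.keys() - desired.keys())
    to_change = sorted(
        name for name in desired.keys() & current.keys() if desired[name] != current[name]
    )
    return {"add": to_add, "change": to_change, "delete": to_delete}
current = {"alice": {"shell": "/bin/bash"}, "olduser": {"shell": "/bin/sh"}}
desired = {
    "alice": {"shell": "/bin/bash"},
    "bob": {"shell": "/bin/bash"},
    "carol": {"shell": "/bin/bash"},
    "dave": {"shell": "/bin/bash"},
}
result = plan(current, desired)
print(result)
```
The plan here is "add three users, delete one user", and nothing is applied until a person has
looked at it.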

405
00:35:26.020 --> 00:35:26.980
<v Ben Rady>Yeah.

406
00:35:26.980 --> 00:35:32.080
<v Ben Rady>Have you ever actually used Terraform to do that type of system administration before?

407
00:35:32.080 --> 00:35:38.800
<v Matt Godbolt>Not on a system, no. I've only ever done it with infrastructural components.

408
00:35:38.800 --> 00:35:39.520
<v Ben Rady>Right. Yeah.

409
00:35:39.520 --> 00:35:42.320
<v Matt Godbolt>So yes, that is true. I've never used it for a you

410
00:35:42.320 --> 00:35:44.080
<v Ben Rady>That'd be amazing. I don't know if I can do that, actually.

411
00:35:44.080 --> 00:35:44.860
<v Matt Godbolt>I don't know that it does.

412
00:35:44.860 --> 00:35:45.020
<v Ben Rady>That'd be amazing if you could do that.

413
00:35:45.020 --> 00:35:53.270
<v Matt Godbolt>You're right. Yeah, now I say. But certainly, that's where I was going with that. It was less Terraform specifically, but like the phrasing is either outcome or steps.

414
00:35:53.270 --> 00:35:53.880
<v Ben Rady>Yeah.

415
00:35:53.880 --> 00:36:07.080
<v Matt Godbolt>And you know it's nice to supply the outcome. But yeah, I don't know if something does exist. And my only interaction with things like that are with Packer, where I always start from an empty image and then run the sequence of steps to make an image that looks the way I want it to.

416
00:36:07.080 --> 00:36:07.140
<v Ben Rady>Mmm.

417
00:36:07.140 --> 00:36:11.800
<v Matt Godbolt>So I never go back to it and kind of go, hey, I want that image, but slightly different.

418
00:36:11.800 --> 00:36:12.180
<v Ben Rady>Yeah.

419
00:36:12.180 --> 00:36:12.900
<v Matt Godbolt>So yeah, anyway.

420
00:36:12.900 --> 00:36:13.350
<v Ben Rady>Yeah. Yeah.

421
00:36:13.350 --> 00:36:13.960
<v Matt Godbolt>We're all over the place.

422
00:36:13.960 --> 00:36:23.420
<v Ben Rady>But yeah, maybe that's the, I feel like this, this podcast is like the rough draft of a conference talk. Cause it's like, imagine that you want to run a program.

423
00:36:23.420 --> 00:36:23.500
<v Matt Godbolt>[laughing]

424
00:36:23.500 --> 00:36:28.200
<v Ben Rady>What do you do? And we just sort of work up from the bottom up. And then I feel like the, it'd be a good talk, right?

425
00:36:28.200 --> 00:36:31.320
<v Matt Godbolt>I think that's a... When was the last time you gave a conference talk? Come on, it's your turn.

426
00:36:31.320 --> 00:36:36.060
<v Ben Rady>Oh, it's been a long time. I, I, I'm probably overdue, honestly.

427
00:36:36.060 --> 00:36:47.780
<v Matt Godbolt>Because...very much part of the, the last week's conversation. The reason I was looking into that was because I was avoiding writing several conference talks that I have to give in about a month's time.

428
00:36:47.780 --> 00:36:54.040
<v Matt Godbolt>And a week has passed since we last spoke; now I'm giving away all of our secrets.

429
00:36:54.040 --> 00:37:05.920
<v Matt Godbolt>Although much longer will have passed in real time. And I've probably given the conference talk by the time I've released this. So listener, you can be the judge of whether it was any good or not. But yeah, I have done no work on it at all. So...

430
00:37:05.920 --> 00:37:12.920
<v Matt Godbolt>..oops. But yeah, this is a rough draft of a conference talk on...

431
00:37:12.920 --> 00:37:13.620
<v Ben Rady>It is.

432
00:37:13.620 --> 00:37:15.920
<v Matt Godbolt>"So you want to deploy a service" or "So you want to run a service?"

433
00:37:15.920 --> 00:37:17.910
<v Ben Rady>Yeah, exactly. So you want to run some software, right?

434
00:37:17.910 --> 00:37:18.180
<v Matt Godbolt>Yeah, yeah.

435
00:37:18.180 --> 00:37:32.100
<v Ben Rady>How are you going to do it? And I feel like the punchline of this is like, okay, and now we're migrating this all to the cloud and we're going to use Terraform. We're going to use GCP or maybe you have like, you know, a lot of companies I feel like these days have essentially like an internal cloud.

436
00:37:32.100 --> 00:37:42.220
<v Ben Rady>Like they're still using Terraform, but they're using tools like Nomad and they have their own, you know, physical servers and they have an infrastructure team that's managing it all. And this maybe leads us back.

437
00:37:42.220 --> 00:37:51.080
<v Ben Rady>This is how you get this. Okay. This is the whole ... This is how you get into the state where you're just like, yeah, I just like changed one function with some unit tests and pushed the PR and I have no idea where it goes.

438
00:37:51.080 --> 00:37:57.400
<v Matt Godbolt>Yeah, that's exactly right. Yeah.

439
00:37:57.400 --> 00:37:58.840
<v Ben Rady>Yeah. Uh-huh. Yeah. And now, and now the circle is complete.

440
00:37:58.840 --> 00:38:04.700
<v Matt Godbolt>Well... And now the circle is complete. Yeah, I think we've probably reached a good spot then.

441
00:38:04.700 --> 00:38:06.760
<v Ben Rady>Yeah.

442
00:38:06.760 --> 00:38:07.340
<v Matt Godbolt>Yeah.

443
00:38:07.340 --> 00:38:21.420
<v Matt Godbolt>It's good to know these. I think like all of these, like everything we talk about, really, certainly everything that I hold dear that we talk about on this podcast is all about finding the right level of abstraction, knowing that there's a level beneath you.

444
00:38:21.420 --> 00:38:21.470
<v Ben Rady>Yeah.

445
00:38:21.470 --> 00:38:43.210
<v Matt Godbolt>Which in this case, you know maybe your level of abstraction is those cloud tools that we've just been talking about and the services that run. But knowing enough about the level beneath you to say like, okay, I do know that there are processes that run and that something is taking care of the input and output for those processes and making sure the right signals get to them at the right time and not the wrong things like me logging out.

446
00:38:43.210 --> 00:38:43.440
<v Ben Rady>Yeah

447
00:38:43.440 --> 00:39:02.520
<v Matt Godbolt>But I don't know that it exists and maybe I could sketch something, but I don't necessarily know off the top of my head. And then you should know beneath that what... that something exists, right? Beneath that layer, we know that there is a systemd and I don't know how that works, but it's always good to have a decent understanding of the level beneath where you're working and then be aware of the layer below that.

448
00:39:02.520 --> 00:39:06.540
<v Ben Rady>Right. Know vaguely what to Google or ask ChatGPT, right?

449
00:39:06.540 --> 00:39:08.030
<v Matt Godbolt>Right. Or ask your favorite Large...

450
00:39:08.030 --> 00:39:08.730
<v Ben Rady>Yeah. Yeah.

451
00:39:08.730 --> 00:39:09.080
<v Matt Godbolt>Yeah.

452
00:39:09.080 --> 00:39:10.490
<v Ben Rady>Ask your favorite LLM.

453
00:39:10.490 --> 00:39:11.200
<v Matt Godbolt>Yeah. Yeah.

454
00:39:11.200 --> 00:39:19.340
<v Matt Godbolt>And so I think this plugs into that kind of mindset completely, as in, you know, it's kind of like: know how the cloud works and then...

455
00:39:19.340 --> 00:39:20.160
<v Ben Rady>Yeah.

456
00:39:20.160 --> 00:39:23.320
<v Matt Godbolt>...know where to look when it doesn't work.

457
00:39:23.320 --> 00:39:36.970
<v Ben Rady>Mm-hmm. Mm-hmm. Yeah. Like, honestly, the only downside to this is that in those environments, I feel like, where you have those, you know, a million layers of abstraction between you and the physical server...

458
00:39:36.970 --> 00:39:37.310
<v Matt Godbolt>Cool.

459
00:39:37.310 --> 00:39:49.920
<v Ben Rady>If you're like an old fuddy-duddy like us and you're like, can I just SSH in? It's like, no, you can't have root. It's like, what? Why? I know exactly what to do. I know exactly how to fix this problem. And now I'm going to have... OK, fine. Sure.

460
00:39:49.920 --> 00:39:50.270
<v Matt Godbolt>Yeah.

461
00:39:50.270 --> 00:39:50.580
<v Ben Rady>Whatever.

462
00:39:50.580 --> 00:39:58.840
<v Matt Godbolt>Well, and of course, the irony is, they can probably give you root, but it's not even on the real computer because you're several layers of virtualization away from the machine that's actually running.

463
00:39:58.840 --> 00:39:58.910
<v Ben Rady>Yeah. Mm-hmm.

464
00:39:58.910 --> 00:39:58.980
<v Ben Rady>Right, yeah, exactly.

465
00:39:58.980 --> 00:39:59.340
<v Matt Godbolt>You talk about the metal.

466
00:39:59.340 --> 00:40:04.300
<v Ben Rady>It's like it's running in the container service. There's no root to give you. Like you can't get there from here, right?

467
00:40:04.300 --> 00:40:04.580
<v Matt Godbolt>Yeah.

468
00:40:04.580 --> 00:40:05.740
<v Ben Rady>Yeah, yeah.

469
00:40:05.740 --> 00:40:08.910
<v Matt Godbolt>Yeah. Cool. All right, friend. Well, this has been great.

470
00:40:08.910 --> 00:40:09.920
<v Ben Rady>Yeah, yeah.

471
00:40:09.920 --> 00:40:11.080
<v Matt Godbolt>We jammed it. We did it.

472
00:40:11.080 --> 00:40:12.300
<v Ben Rady>Not bad for winging it.

473
00:40:12.300 --> 00:40:25.910
<v Matt Godbolt>Yeah, listener, you can let us know. Post a comment somewhere. I mean, some people watch this on YouTube and that's where I see most of the comments, and then otherwise tweet at us, or the hachyderm.io Mastodon-y thing, or just email us.

474
00:40:25.910 --> 00:40:25.980
<v Ben Rady>Yeah.

475
00:40:25.980 --> 00:40:26.120
<v Ben Rady>Yeah. Mastodon.

476
00:40:26.120 --> 00:40:33.430
<v Matt Godbolt>You can get us. But we'd love to hear what you think and what we're doing right and wrong because we've never really asked that.

477
00:40:33.430 --> 00:40:33.490
<v Ben Rady>That's not hard either.

478
00:40:33.490 --> 00:40:33.500
<v Ben Rady>Yeah.

479
00:40:33.500 --> 00:40:37.990
<v Matt Godbolt>We just do this for us. This is just our excuse to catch up, isn't it?

480
00:40:37.990 --> 00:40:39.980
<v Ben Rady>Yeah, that's true.

481
00:40:39.980 --> 00:40:40.090
<v Matt Godbolt>Cool.

482
00:40:40.090 --> 00:40:40.180
<v Ben Rady>That's true.

483
00:40:40.180 --> 00:40:54.630
<v Matt Godbolt>All right, friend. Well, have yourself a great weekend and I'll speak to you soon.

484
00:40:54.630 --> 00:40:57.940
<v Ben Rady>All right.

485
00:40:57.940 --> 00:41:02.910
<v Ben Rady>Until next time.

486
00:41:02.910 --> 00:41:05.910
<v Matt Godbolt>Until next time.

