Matt (00:18): Hey, Ben.
Ben (00:18): Hey, Matt.
Matt (00:20): How are things going?
Ben (00:21): Uh, good. Good.
Matt (00:23): It's that time of the week again when we like to record a podcast.
Ben (00:27): Mm-hmm. And we're definitely doing that right now.
Matt (00:30): That is exactly what I'm doing. I hope you are doing it too. Otherwise, this is gonna be a very one-sided conversation.
Ben (00:35): Uh, wait, let me check. Yes.
Matt (00:37): Okay, that's good. Now, as our regular listener will know, we put so much care and effort into every one of these. You know, we rehearse over and over again until we are word perfect. We have a clear idea of exactly what's going on. Then the script is prepared and edited well in advance.
Ben (00:56): Mm-hmm. Using WordPerfect.
Matt (01:00): Using WordPerfect. Oh my gosh.
Ben (01:02): That's not using an editor. Uh, a document editor from the 1980s. That's what we do.
Matt (01:07): That is quite something. Yeah, actually. Oh, we're already derailed. And there I was, padding for us coming up with an idea about what this episode's going to be around. And now I've completely derailed. I was watching an American YouTube channel where they do up computers, like old retro computers, and they'd gotten their hands on a BBC Master, which was the kind of enhanced version of the BBC Micro, my favorite computer. And it was in fact actually the computer that I had growing up. And it comes with a built-in set of ROMs, and one of the ROMs is in fact a simple word processor. And seeing him discover that, and then kind of try and type in it, it was so frustrating, screaming at the screen going, no, no, you press this key to get out of it, although you do this type of thing. You know, it's like watching someone use vi and be confused as heck, and you're like, no, it's obvious. Oh, if you were actually using computers in the late eighties, that is.
Ben (02:03): Right.
Before WYSIWYG.
Matt (02:04): Yeah. So we're not using Starview. Yeah. Before WYSIWYG. Right. Oh my gosh.
Ben (02:08): Before WYSIWYG. People don't even probably know what that means, because there wouldn't be a distinction in their minds. It'd be like, yeah. If you're writing like a Google Doc, you just type the thing and then that's what it looks like. Right?
Matt (02:19): Isn't that funny? Yeah. Like, presumably mid, late eighties through I guess mid, late nineties, WYSIWYG was a thing, and it was like a defining difference between applications. And for those few listeners that, I dunno, that dunno what WYSIWYG is, it stands for what you see...
Ben (02:37): The teenage child of our listener.
Matt (02:39): The teenage child of our listener. What you see is what you get, which is to say, if you edit stuff in a document, in Word or Google Docs, whatever, as you're typing it, that's how it's gonna appear on the printer. Right. You're just writing it and it's kind of obvious. It's so obvious that why would you have ever done it any other way? But back in the day of text-based computing, very often you would have sort of essentially HTML-style markup code around your document. And then only when you sent it to the printer did you actually discover what it was gonna end up looking like. You had to render it non-interactively. And so there was this big difference between those that could show you interactively what your document would look like, and those that essentially had to do it offline.
Ben (03:27): Mm-hmm. And then of course the response to that from the hypothetical teenage child of our listener would be like, why would you wanna print it?
Matt (03:35): I suppose so. I suppose so. Yeah. I mean, although I had an interview today and I was trying to print out the candidate's resume, and twice I sent it to my printer and twice it didn't appear.
And I gave up because I hadn't got time. And then, yeah, like a caveman, or no, I guess not like a caveman, like a modern person, I had to look at the resume on a screen next to them, which is better, but I can't doodle on it. And I think the main reason why I print out resumes is the therapy of having something that I can, you know, idly doodle upon.
Ben (04:10): Mm-hmm, right. While you're listening to them drone on about their various accomplishments.
Matt (04:17): Uh, drone on is not... no, it was more than that. I don't want to make it sound like our interviewing process is dull. And this is not even what this episode is supposed to be. We are, what now, five minutes in.
Ben (04:27): This has been a hell of a tangent.
Matt (04:30): This is a tangent. So what do you wanna talk about today, Ben?
Ben (04:34): I think we were gonna talk about copying and pasting code, and when it is actually a good idea to copy paste.
Matt (04:43): Is it ever a good idea to copy paste?
Ben (04:45): I think we can make the... I think I can.
Matt (04:48): Go on, convince me.
Ben (04:49): I think I'm reasonably prepared to make a few arguments.
Matt (04:50): I'm gonna say every bug I've ever written is a result of copy pasting some bit of code from somewhere else and not really thinking about it.
Ben (04:58): Every bug?
Matt (04:58): Uh, no, but let's just start with that though. Okay. When is it a good idea?
Ben (05:02): When is it a good idea? So I can think of some maybe more mundane, less controversial examples of this, where you are copying and pasting from an example or a tutorial into, like, a REPL, to explore something or to see how something works, as a form of experimentation, right? There's no intention that the code that you write is gonna make its way into any real life system. That's not why you're writing it.
You're writing it because you have some API or some behavior that you don't understand, maybe even just part of the language that you don't understand, and you're trying to understand it, right? Uh, I think you've built an entire website around this concept.
Matt (05:44): There may be some amount of truth to that, yeah.
Ben (05:47): Um, and so copying and pasting code in that context, I think, is obviously a good idea. I think people can sometimes get into trouble when they're like, all right, well, I've got this copy and pasted code that I maybe hacked a few pieces into, and now I'm done. Right? No further work is necessary from this point on.
Matt (06:10): And that is definitely the knee-jerk mental image I had when I was saying I copy paste my code: you don't read it properly, you don't understand it properly, and you move on.
Ben (06:21): Mm-hmm. Yeah. So, I mean, I think the more interesting places that you can talk about this and think about this are when you've moved beyond that stage. We're not exploring; the goal here is actually to write real, production, going-to-be-used-by-other-people code. And when is copying and pasting code a good idea in that case? I think a more controversial second condition that I could throw out here for discussion hinges on the definition of duplication. So, to explain this, I gotta talk for a little bit about why copying and pasting code is generally bad.
Matt (07:06): Okay. I guess that's not a bad idea given that we just assumed... Yeah. I just asserted it was terrible and we didn't back it up in any meaningful way. So, yeah, why is it bad?
Ben (07:15): So one way it can be bad, and one way it can create fairly serious problems, is when you take a decision in a system that is being made in one place, and now all of a sudden you are making it in two places. And if those two places are ever out of sync, that will create a bug in your system, right? Like imagine the way that you calculate interest for an account, and that's in one place, and all of your interest calculations use one algorithm and they're all consistent, right? And so the amount of money that you transfer from account A to account B is consistent with the statements that you send out to your customers saying, you earned this amount of interest, this is the amount that shows up in your statement.
Ben (07:58): This is the amount that we sent to the, you know, ancient bank transfer API that actually does the transfer from point A to point B.
Matt (08:05): Written in COBOL.
Ben (08:06): Yes. Exactly. And now these things are in agreement, and if you were to duplicate that code and have the code that produced the customer statements be different from the code that did the bank transfer, they could get out of sync. And if they did, that would be a very serious bug. Right?
Matt (08:24): Yes. Agreed.
Ben (08:24): And so that kind of duplication is harmful, and we do our very best, I think, as software engineers to try to avoid that because we know it's harmful. And when you're just copying and pasting code, the question that you should be asking yourself is, am I doing this? Am I creating this situation now where, if you change one and you forget or neglect or just aren't aware that you need to change the other, that is going to create a bug? Given that, I think there are situations where the opposite is kind of true, where it's sort of like, I have this piece of code.
Matt (08:59): Right?
Ben (09:00): And I need to do something similar but different in this code, right? So what I'm trying to do is I'm calculating an interest rate, but for a completely different purpose, right? Like maybe I'm calculating, you know, a bond yield rate or something like that.
Matt (09:20): Okay.
Ben (09:20): So it's different enough to where it's like, yeah, there's some sort of similarity with this code, but it's not the same thing. And it is certainly not the case that if I changed my bond interest calculation I should automatically and always change the customer bank account interest calculation, right? Those are two separate things with two different reasons to change. And so a reasonable approach to implementing that new functionality might be, depending on the structure of the code, to copy that interest rate calculation code, to drive out the new behavior that you want with tests, to delete the behavior that you don't need, and make a copy of it and be like, now this is our bond algorithm, and this is our customer account algorithm.
Ben (10:06): And they're different and they have different reasons to change, and they have different behaviors as specified by these tests. The code maybe started out the same and maybe right now has some sort of accidental similarity between it.
Matt (10:18): Right.
Ben (10:18): And you might even discover after going through that whole process that there is some essential similarity that really is the same. It's like doing any kind of interest rate calculation where you're just multiplying a balance by a rate. It's like, okay, fine, maybe that gets pulled out into its own thing that is separate from those two things, that is tested in its own way. That's more like a library, and that is something that you don't want to duplicate.
But I would almost argue that it is totally reasonable to start with duplicating those things on purpose because they have different reasons to change, making them be the best form that they can be, and then looking at them and being like, is there any actual duplication between these two things at that point? And how do I remove that duplication?
Matt (11:08): Right. That makes a lot of sense to me. Definitely the argument I was gonna make, that wasn't as maybe trite as... like, the interest rate itself is the calculation piece of code. And if you copy the calculation code and you have it in two places, then a bug fix or a performance optimization in one doesn't necessarily make it to the other. So that would just be a pile-on to your argument about not duplicating it. But yeah, that second reason, where it is not copy pasting as an end, it's not like I need this in two places. It's like, I'm starting with something that's similar and I don't want to have coupling between these things, on purpose. I am choosing to essentially branch the code.
Ben (11:51): Yeah, that's a good way to think about it.
Matt (11:54): Or a little facet of the code. And say, they share an ancestry, untracked as it may be in source control. But they share an ancestry because they have a similar job to do, but I am deliberately going to evolve them in different directions.
And that to me is actually more of a point: sometimes by reusing a piece of code, you are introducing coupling between components that otherwise wouldn't share so much coupling in code, which maybe hurts you down the line. So to try and think of a concrete example of this: certainly if you are planning on moving two systems in two different directions, the fact that they share a piece of common functionality might make it more complicated and harder to test if it has to do essentially two jobs and the jobs are disjoint or partially disjoint. And at some point, obviously you wanna be able to extract a little bit that is common, truly common. And maybe that's your, like, util library function, you know, the horrific naming of whatever that part is notwithstanding. But...
Ben (12:57): The software junk drawer.
Matt (12:59): Yes. I mean, we used to have something, yeah. C-cruft was our, all of the crufty bits of code that you needed in C just to get things to work. Naming is difficult. Right? We know that. It's been well documented that naming is hard, but we can and should do better. That said though, yeah: deliberately choosing to copy now lets the two copies evolve in their own directions, and that's maybe a boon, because you're not weighing down one implementation with the changes and the modifications to support functionality it doesn't care about, which maybe bloats its API, makes it harder to test, makes it less performant. But it is a trade-off against, well, what if there is a core bug in that feature? How will you get the fix over to the sort of equivalent in the other piece of code?
Ben (13:49): Mm-hmm. Yeah. For me, it comes down to this sort of litmus test of, if these two things don't change in lockstep, is that a bug or a feature? Right?
Matt (13:59): Right.
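To make the accidental-versus-essential distinction concrete, here is a minimal Python sketch. The function names, the day-count conventions, and the rounding rule are all assumptions made up for illustration; none of them come from the conversation. Only the "balance multiplied by a rate" core is shared, and each caller keeps its own rules.

```python
from decimal import Decimal


def simple_interest(balance, annual_rate, days, days_in_year):
    """Essential, shared piece: a balance multiplied by a pro-rated rate."""
    return balance * annual_rate * days / days_in_year


def account_interest(balance, annual_rate, days):
    """Customer account rule (hypothetical): 365-day year, rounded to cents."""
    raw = simple_interest(balance, annual_rate, days, 365)
    return raw.quantize(Decimal("0.01"))


def bond_accrued_interest(face_value, coupon_rate, days_since_coupon):
    """Bond rule (hypothetical): 360-day year, left unrounded."""
    return simple_interest(face_value, coupon_rate, days_since_coupon, 360)
```

With this shape, the two wrappers can change for their own reasons (a different day count, a different rounding rule) without touching each other, while a fix to `simple_interest` reaches both. That is one way of answering the lockstep question separately for each piece.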
Ben (13:59): Because it could be either depending on what they are.
Matt (14:01): Very context dependent. Yeah.
Ben (14:03): Yeah. Another thing that I think is related, but not the same thing as we're talking about here, is a situation where you are using duplication as a way to evolve the design of a system. And I actually have this right now in one of the things that I'm working on. We are moving a process that we've been running from being sort of a single-threaded process that runs on a local machine to a distributed process that runs inside of a work queue. And the implementation of the algorithm is the same in both cases, but the sort of scaffolding around it is completely different. And we could have kind of tortured the design a little bit to unify the duplication of those two things. But the intention is to get rid of the single-threaded local version and only have the distributed version.
Ben (14:59): So we literally have two copies, essentially, of this class in the system right now. One is called like Job, and the other one's called JobNew. And JobNew is pretty much a copy and paste of Job with all of the changes necessary to make it run inside of our work queue. And the intention is we're going to delete Job and then rename JobNew to Job when this transition is complete. Now, this is definitely something that takes discipline within the team.
Matt (15:29): Right.
Ben (15:30): 'Cause of course if you don't have that discipline and you get pulled off onto other things, then this sticks around...
Matt (15:36): NewJobNew.
Ben (15:36): JobNew, and then JobNewNew, and then JobNewNewNew, right? And it just gets terrible. But, you know, for teams that have that discipline... I say discipline, but really it's just like, do you have control over your own priorities?
Matt (15:50): Right. Commitment.
Ben (15:50): Because if you have control over your own priorities, then you can just decide that you want to do this, and then you can do it. If your priorities are set by somebody that doesn't have an understanding of how your source code works, then I wouldn't do this. Because you will easily find yourself in a situation where it's like, oh, we have three new JobNews and they're all terrible, and I don't know what to do now. But the technique itself, I think, for teams that can do this, is a very valuable one. There's sort of a form of this that I do locally. You know, this one is living through multiple PRs and multiple commits, and it's probably gonna be a few weeks before we fully make this transition. But there are definitely situations in which I will do this exact same thing within the span of a few hours, where I'm trying to change the implementation of something, maybe to add some new behaviors, some performance characteristics, whatever it might be.
Ben (16:50): And you maybe have heard me say this before, this metaphor of, like, the bag of sand.
Matt (16:55): As in Indiana Jones bag of sand.
Ben (16:58): Yes. As in, you know, for...
Matt (17:01): Those who don't know what WYSIWYG is.
Ben (17:01): The folks who have seen that movie. Yes. For those who don't know what WYSIWYG is, let me also explain to you the first Indiana Jones movie. There's a scene at the very beginning of the Indiana Jones movie where he's in this, like, lost forgotten temple and he's trying to recover this golden statue before it gets stolen by this other person who's just gonna sell it. It belongs in a museum. You know that scene if you've ever seen it.
And there's traps all around this place, and he's trying not to get killed, and he knows that there's some weight sensor mechanism thing in this pedestal.
Ben (17:36): And so he's sitting there staring at this golden idol with a bag of sand in his hand, trying to figure out what the weight of the idol is so that he can swap out one for the other. And unfortunately, Mr. Jones fails at this task and a giant boulder rolls down at him. But in software, you can attempt to do this by creating a new implementation of something that has some characteristic that you want. You know, maybe it's more performant, or maybe the code is simpler, or maybe it's even got some new behavior in it, but it supports all the existing behavior. You completely implement that thing and then find a point, like a single seam in the code. You know, maybe it's where you're instantiating a class or calling a function, and you're like, all right, I'm gonna comment out the old one and I'm gonna put in the new one, and then I'm gonna run all my tests and I'm gonna see what happens. And if all the tests pass, you're great. Now you can go and delete all that duplicated code that you created. Right. You can delete the old implementation, you can delete all those old things and clean it all up. If the tests don't pass, then you have a giant boulder rolling at you, and now you have to do something, which is usually undo, and try again. Right? Unlike Indiana Jones, you have the opportunity to undo.
Matt (18:44): I think the critical part of that is that first and foremost, the bag of sand itself was a copy paste, potentially, of the original code.
Ben (18:53): Yes. Yes.
Matt (18:53): Right? It was like, hey, I copy pasted it and now I have pretty much carte blanche, 'cause nothing is using this currently. The bag of sand is in my hand. It's not on the pedestal.
Ben (19:02): Exactly.
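The bag-of-sand swap can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the two `unique_in_order_*` functions play the old and new implementations, and the single assignment plays the seam. It is a sketch of the technique under those assumptions, not the code Ben is describing.

```python
def unique_in_order_old(items):
    """Existing implementation: correct, but quadratic on long lists."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def unique_in_order_new(items):
    """The bag of sand: started as a copy, now uses a set for membership.

    Assumes the items are hashable, which the old version did not need.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# The seam: the one place callers get their implementation from.
# Swapping is a one-line change here, and undoing it is just as cheap.
unique_in_order = unique_in_order_new


def check_behaviour(implementation):
    """Behaviour both versions must share before the old one is deleted."""
    assert implementation([]) == []
    assert implementation([3, 1, 3, 2, 1]) == [3, 1, 2]
    assert implementation(["b", "a", "b"]) == ["b", "a"]


for candidate in (unique_in_order_old, unique_in_order_new, unique_in_order):
    check_behaviour(candidate)
```

Running the same checks against both copies is what makes the swap low-risk: if the new one fails, pointing the seam back at the old function is the undo.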
Matt (19:03): This analogy is breaking down a little bit here, but...
Ben (19:05): Oh, I think you got it going. Yeah.
Matt (19:07): And you've got the chance to look at it. And also critically, it could have its own tests for the new functionality, it could have...
Ben (19:13): Absolutely. Yes.
Matt (19:13): Its own assertions that you want to test about the replacement characteristics of this thing. And that could be committed and checked in, and it could be side by side in your code base for some amount of time, even. While maybe it's used in a couple of new locations where you're like, I definitely need the new functionality and we haven't got it in the old version. And then at some point you make the call that it seems to be doing what you need it to do. It's passing all the tests for the old system, and then you have your Indiana Jones moment of doing the switch.
Ben (19:40): Yeah, yeah.
Matt (19:41): And that can come in a very controlled part of the development process. And then you sort of commit it on a Friday and say, all right, everyone, the old system is gone on Monday. I'm really sorry if you've been hacking on the old system over the weekend, you're gonna have some horrible merge conflicts on Monday morning.
Ben (19:58): Right, right. Yeah. And I mean, you can do more sophisticated things than that, where you have like feature flags or different modes where you're running the new and the old code. And that is actually kinda what we're doing with this job that I was talking about earlier, where, you know, we've got some things that are using the old job and some things that are using the new job, and we're sort of slowly transitioning everything over to be the new thing.
And then when there's no more uses of the old thing, then we'll just have the new thing and we can delete the old thing. But yeah, you can also just do it with just a couple of comments. Right? Like, comment out old thing, put in new thing, switch back if...
Matt (20:34): But don't check in those comments. Yes. Right. Which I think we've talked about before.
Ben (20:39): Yeah. We might've talked about that before.
Matt (20:40): We talked about not checking in commented-out code. That's a whole other episode. So here's another place where I am tempted very often to use copy paste, and I say this to you partly as the catharsis of admitting it in public, or at least to you and then our listener, a...
Ben (21:03): Confession? Confessional.
Matt (21:04): Thank you. That was the word I was looking for.
Ben (21:05): Cleanse your soul. You'll feel a lot better.
Matt (21:07): I will cleanse, yeah, what remains of my soul. What I tend to use copy paste for is, having written a test for my code and observed its output against the empty string I asserted it to be equal to, I will then read through the difference that my failing test tells me: hey, got blah, expected empty string. I will read through the blah. And if the blah makes sense to me, if the formatting is right and everything, then I would be tempted to take that and copy it into the test itself.
Ben (21:43): Yeah. I mean, that's certainly not a deadly sin. I don't know. If you could see his face...
Matt (21:52): You know, there's clearly some sin-like characteristics of this. He's trying to be kind.
Ben (21:58): So for me, it's a question of if it literally is a string. So you can talk about sort of different values and things, right?
Um, you know, like the, the places let's talk about the places where that, that is probably not a great idea. And some of the places where it is, it probably is actually a time saver and not, not a bad idea at all. So if what you're doing is you have written some complicated interest rate calculation that's like compounding daily. Matt (22:25): For example. Ben (22:25): It's got a bunch of different factors in it, and you write out all that code, and then you write the test that asserts that it's the, the new balance is equal to zero. And then you run the test and it says, no, it actually, it's $217 and 38 cents. And then you just copy that into your answer, and then you'd be like, well, that must be correct. Ben (22:43): You're doing it wrong. Right? That's right. Like, that is not what you should be doing. Right. Um, if what you're doing is you've got some like human readable string representation of an object, right? That has like, you know, some interesting information in it that is intended for logs, or you're reporting on a screen or something like that, and you've written that function that kind of appends all the stuff together and there's no branches in that code. It's just gathering up a bunch of information, printing it all out, and you wanna take that and put that into your output. I think that's completely reasonable. So long as you're confident that there aren't gonna be any sort of like weird, like invisible character type things in there that you might not expect, right? Matt (23:30): Right. Or, Ben (23:31): 'cause you can accidentally sort of like copy and paste like an unprintable character, and then someone comes along and they like edit the thing, like taking outta space or putting it back in, and then all of a sudden the tests are failing and you're like, what is going on here? Right? Like, this code is identical. Like, what, what is the, what is the thing? 
So, you know, as long as you're confident that the copy and paste is not going to surprise anyone with its contents, then I think that's a completely reasonable thing.
Matt (23:56): That's actually the exact case where I do use this particular kind of thing. It's like, yeah, I have the toString of something and, well, I wrote the code and it seems reasonably sane to me, and I just wanna have some kind of test somewhere that says, nobody broke this in a way that's surprising. And so I might toString something and look at it. Now, obviously, with things like that, you have to be a bit careful because, you know, if you have containers that don't have a well-defined sequence to them, sets and things like that, then you can sometimes become too sensitive to minutiae of your code and it becomes very brittle.
Ben (24:30): Yeah, you wind up with an inconsistent test.
Matt (24:30): And almost by its very nature, it's very brittle. And in fact, it reminds me a little bit of one of our very first podcasts, when we had Claire on talking about acceptance tests. It's a kind of inline acceptance test where you're saying, this seems reasonable to me and I'd be interested if it changed. But you don't get the very, very high fidelity of, oh, clearly it was the toString of some sub-sub-sub-object that is now missing a closing paren, where you get the single targeted failure that says, oh, that's where it went wrong. Probably you're just gonna find out that your giant string of, like, the whole world object that you created just as a kind of smorgasbord of all-the-things test is now different, and hopefully your diff output is good enough for you to spot that it's a brace missing or a comma or a parenthesis.
Ben (25:17): Right? Yeah. Yeah.
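Matt's point about unordered containers can be sketched as follows. The `Shipment` class and its fields are invented for illustration. The idea is that a pasted expected string is only a safe assertion if the formatting function produces the same text on every run, so the set is sorted before it is printed.

```python
from dataclasses import dataclass, field


@dataclass
class Shipment:
    shipment_id: int
    labels: set = field(default_factory=set)

    def describe(self):
        # Sets have no defined order, so sort before formatting; otherwise
        # a pasted expected string could pass on one run and fail on another.
        labels = ", ".join(sorted(self.labels))
        return f"Shipment(id={self.shipment_id}, labels=[{labels}])"


def test_describe_is_stable():
    shipment = Shipment(42, {"fragile", "express", "insured"})
    # The expected text was read and checked by eye before being pasted in.
    expected = "Shipment(id=42, labels=[express, fragile, insured])"
    assert shipment.describe() == expected


test_describe_is_stable()
```

This kind of test still fails as one large string comparison rather than pointing at the field that changed, which is the loss of fidelity described above.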
I think that's a really astute observation, because I think a lot of the interesting stuff that Claire was talking about was all of the tooling and the infrastructure that they had to deal with the fact that these things are kind of brittle and you need to do things like strip out transient dates and deal with things that are potentially out of order. And all the tools she was talking about are specifically designed to deal with those kinds of things, which your unit testing framework is almost certainly not.
Matt (25:47): Right.
Ben (25:47): So if you're writing that style of test...
Matt (25:50): You probably wanna use the right tool for the job then.
Ben (25:52): Yeah. Right. Or structure the test in a way where you understand that, okay, we need to be very careful about how we set this up because these tools aren't capable of handling that kind of variation, and so we can't let that variation in. Whereas with the acceptance testing tools that Claire was talking about, it'd be like, yes, that's completely fine.
Matt (26:12): It comes with its approach out of the gate, it's doing some things for you. Yeah. That actually makes me think of something interesting. So one of the patterns when I'm writing Python code and I'm testing edge cases, and to some extent in C++, I'm one of the rare heathens who actually uses exceptions in my C++ code. I know that's unfashionable, but certainly for error cases and error conditions that are genuinely exceptional and typically will shut the application down, I don't mind using them. But that's not why I'm talking about them.
What typically one has with exceptions is that you have some kind of class that holds the exception, that in your testing framework you can match against and say, hey, I expect an exception or an error of this type to be thrown, you know, missing parameter expression or something like that.
Matt (26:55): And it's good practice in Python to create a unique error for each of your modules so that you can catch them in tests, rather than just catching RuntimeError and things like that. But even having done that, even with the fidelity of knowing that you've caught the, you know, key-not-found-in-remote-data-store exception or whatever, you probably wanna look at the message that's in there. And there again, you want to try and find the right amount of brittleness of, here's the exact error message. And of course, if it contains upstream things from, say, AWS or Google Cloud, it might have some arcane error number inside the string that you've got, but what you really are looking for is just a bit of it.
Matt (27:40): And so I will typically write sort of things like: it contains a string, and it must contain the string "missing", and it must contain my key name that I asked it for. And then other than that, all bets are off. Right. I'm happy with that level of fidelity only, so I haven't copy pasted the exact error message that I was expecting, which I could easily get by, again, matching an empty string and then seeing what I get, because it just seems too brittle. Right. And it's, you know, case or otherwise, and most of these matchers also support regular expressions, so you can kind of substring and regular-expression bits of it. And so even then, when you're copy pasting it, it pays to look at what you pasted and say, is this exactly what I wanted? Or should I have modified it in some small way?
Ben (28:25): Yeah, yeah.
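A minimal sketch of that loose matching, using only the Python standard library. The error classes, the `fetch` helper, and the message text (including the made-up upstream code) are hypothetical; the point is that the test asserts on the word "missing" and the key name and ignores everything else in the message.

```python
import re


class StoreError(Exception):
    """Base error for this (hypothetical) data-store module."""


class KeyNotFoundError(StoreError):
    """Raised when a key is absent from the store."""


def fetch(store, key):
    try:
        return store[key]
    except KeyError:
        # The trailing code stands in for upstream detail a test should ignore.
        message = f"Missing key '{key}' in store (upstream code 4041)"
        raise KeyNotFoundError(message) from None


def test_missing_key_message():
    try:
        fetch({"a": 1}, "user-17")
    except KeyNotFoundError as error:
        message = str(error)
    else:
        raise AssertionError("expected KeyNotFoundError")
    # Assert only on the parts that matter, case-insensitively.
    assert re.search(r"missing", message, re.IGNORECASE)
    assert "user-17" in message


test_missing_key_message()
```

In pytest the same check is commonly written with `pytest.raises(KeyNotFoundError, match=...)`, where `match` is applied as a regular expression search against the message.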
Or am I just reinforcing the bug in this code by writing a test that asserts that the bug is there? Matt (28:30): Make sure the bug is not fixed. Yeah. Ben (28:33): Uh-huh. Right. Exactly. Yeah, I mean, I think those are all very, very reasonable places. Some of the places that might be a little bit more borderline, where I could make some arguments for doing it and some arguments for not doing it: a situation where there's a library that exists that does exactly what you need it to do. Right. There's some function in there, or there's some class in there, that has exactly the behavior that you want. You do not want to implement this yourself. You want to use the library, but unfortunately the library has a hundred thousand transitive dependencies, none of which are necessary for the one function that you want. Right. Matt (29:26): I have a quick question on that. Um, yes. You're developing in Java at the moment, aren't you? Ben (29:30): Yes. Also true in Python. Matt (29:35): Yeah. Okay. Fair. All right. It's a fair fight. Yeah. All right. Yes. But I was just thinking, it sounds like every Maven thing you've ever done. And in fairness, you know, JavaScript and TypeScript as well. You're like, hey, I just want this thing, this one function. And then you're like, why have I got 150 megabytes of text in my node packages? How is this possible? How has that much code been written? Ben (29:53): How could you possibly need all these things? Yes. But yes, I will very easily grant, this is a common problem in the Java world, but it is also a common problem in many other worlds too. Matt (30:05): To get back on track, which I derailed you from: you were saying that there's this one class that implements this perfect thing of like, this is how to parse some data structure or some string. Ben (30:15): Right, right, right.
And now I gotta pull in layers and layers and layers of transitive dependencies, some of which might conflict with my existing ones, or even if they don't, might in the future. I might just be like, nope, we're gonna copy and paste this code. I'm gonna go find these 12 lines of code and I'm going to put them into my project, and I might even write some tests around them, because now this is my code. I own it now. Matt (30:43): Yeah, that's true. But we should also say that one should be careful about licenses when doing this kind of thing. Ben (30:47): Yes. Matt (30:48): So always consider that. I know obviously that's top of mind for us, but not necessarily, Ben (30:54): Not all open source licenses are created equal, and you need to understand how they work. You know, there are things like MIT that are generally pretty safe, but beyond that, it's sort of like, ask your lawyer. Matt (31:06): Ask your, yeah, yeah. Certainly at work. Ben (31:08): Yeah, exactly. But that is another situation where I would never say that that is generally the right thing to do, but in certain situations I have done it, and I wouldn't complain about anybody else doing it, because the trade-off just doesn't make sense. Like, I don't want to add, you know, three times the number of dependencies to this project for a 12 line function, or even like a hundred line function. Right. Yeah. I just want this behavior in my system, and so the best way for me to get that is to copy and paste it. Matt (31:43): Yeah. Well, I think that's a good argument for copying and pasting as well: just some cases where you want that little bit of code, but you don't need everything else that comes with it. So here's an interesting thought I had. And this is sort of now slightly left field. It occurs to me that copy and pasting is only bad when a human does it. Right.
If we do it, then it's problematic, but if the computer does it, then maybe it's forgivable. Ben (32:13): Oh. Matt (32:13): And the reason I say this is that one of the, sort of like the mother of all optimizations in the compiler community is inlining functions, which is to say, copying the function you are calling from where it was defined to where it's being used, every single time it's used. And that is generally accepted to be a very good thing to do, under certain metrics at least. Ben (32:41): Yeah. Matt (32:42): But the thing is, it's automated, right? It is done reliably by a computer every time, and if I change the original implementation, it obviously will be recompiled every time, as long as you've got your compiler build set up correctly. So, yeah. So I wondered if there was something from the properties of copy pasting that we're losing. I think it's that when a human does it, you have lost the link between the original and the copy, right? Although I can sort of see there's this ancestry that you could track in some way. Maybe this is the missing functionality in our tooling that we Ben (33:14): Yeah, that's an excellent point. And it's sort of like, imagine a world where, you know, the lineage of a piece of copy and pasted code could be very easily and obviously tracked. And I have no idea what that would actually look like, or how it would work. Matt (33:28): Me neither, but Ben (33:29): But like, you know, it's like, oh, you're changing this line, and then there's some popup that's like, did you know that this is a copy of this other line over here? And in a not annoying and stupid way it would tell you. Matt (33:41): I mean, many IDEs of course do have some amount of like, hey, these 10 lines of code are duplicated somewhere else. So it's a kind of extension of that, I suppose.
But it's more like knowingly editing a duplicate: the IDE could say, hey, somewhere else you're also looping over this, and you've just looped over one less than you have in the copy. So do you wanna update the copy or not, or should I tell you about that? Ben (34:04): So the trick to that would be that the IDE would somehow have to understand the domain of the problem that you're trying to solve enough to know that, oh no, this is just another balance calculation, just the same as this other one. Because one of the things about duplication is you can have code that is the exact same code, character for character, and it can be not duplicated, right? It can be there for completely different reasons: completely different reasons to change, completely different reasons to exist. It just happens to be in the same shape right now. Yes. But tomorrow it might be totally different. And if you changed one and changed the other, that would be a huge bug and you would never want to do that. Right? Yeah. And you can have code that is totally different, but is actually duplicated, right? Yeah. Like the structure is different, everything's different, but it's solving the same problem in just two different ways, and it needs to be unified, right? Yeah. And so the IDE would have to tell the difference Matt (35:04): Which humans can't even. Ben (35:04): Between all those different, humans can barely do it. Right? Matt (35:07): This is why heuristics, you know, are coming to these IDEs, and how many lines do I say? And I know some IDEs are like, well, even if you rename all the variables, I can still know it's the same bit of code. Yeah. But yeah, if you changed a for to a do while or some other thing, it's very hard.
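A rough sketch of the "rename all the variables and it's still the same code" heuristic Matt mentions, using Python's standard `ast` module. This is not how any particular IDE does it; `shape_of` and the sample functions are made up for illustration, and the approach only sees through renaming, not through a change of loop style:

```python
import ast
import builtins


class _Normalizer(ast.NodeTransformer):
    """Rename user-chosen identifiers to placeholders in order of first use."""

    def __init__(self):
        self.placeholders = {}

    def _rename(self, name):
        if hasattr(builtins, name):
            return name  # keep len, range, etc. so different calls stay different
        return self.placeholders.setdefault(name, f"v{len(self.placeholders)}")

    def visit_FunctionDef(self, node):
        node.name = "f"  # the function's own name is not part of its shape
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node


def shape_of(source):
    """Return a string describing the code's structure with names normalized."""
    return ast.dump(_Normalizer().visit(ast.parse(source)))


FOR_LOOP_A = """
def total(items):
    acc = 0
    for item in items:
        acc += item.price
    return acc
"""

FOR_LOOP_B = """
def sum_prices(orders):
    result = 0
    for o in orders:
        result += o.price
    return result
"""

WHILE_LOOP = """
def sum_prices(orders):
    result = 0
    i = 0
    while i < len(orders):
        result += orders[i].price
        i += 1
    return result
"""

same_after_renaming = shape_of(FOR_LOOP_A) == shape_of(FOR_LOOP_B)
same_across_loop_styles = shape_of(FOR_LOOP_A) == shape_of(WHILE_LOOP)
```

Here the two `for` versions compare equal once the names are normalized, while the `while` version does not, even though it computes the same thing. It also cannot tell whether two identical shapes exist for the same reason, which is the part Ben says even humans can barely do.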
Although, again, interestingly, and here I am just steering back to things I know I love to talk about, but something that compilers do internally is they try and canonicalize all the different ways that you can write code. So they take a loop, any loop, be it a do while or a while or a for, and they rewrite it in a canonical form so that similarities in code can be found. And more importantly, for a compiler specifically, Ben (35:46): Oh yeah. Matt (35:47): It's, if somebody writes a piece of, so my favorite example of this is something where you count the number of one bits in an integer by looping over all of the different, you know, 32 possible or 64 possible bits, and saying, is that bit set, and then adding that up, or you do it by shifting it down, or you do it by other ways. Ultimately there are about two or three ways you could write that. And if the compiler can canonicalize them into one or two possible variations of intermediate representation, whether you wrote a for loop or a while loop or a do while or whatever, then the optimization that looks at that and says, there's a single CPU instruction that does exactly that whole loop, can kick in much more easily. Whereas otherwise it has to deal with every possible combination and permutation of like, well, they used a for over here, but they used a while here. Ben (36:32): Yeah. Yeah. Matt (36:32): So it's kind of interesting that there are some similar shapes in all of this. It's all sort of connected. Ben (36:39): Yeah. So in an interesting way, the compiler is unifying duplication across all of the projects that it compiles in the entire world. Matt (36:48): In essence, in order to then match it against the human curated list of like, hey, this is one way you can do a pop count, and this is another way you can do a pop count. Right. And all the other things that happen.
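To make the pop count example concrete, here are three differently shaped loops that all count the one bits in a non-negative integer. The function names are made up, and this is Python purely for illustration: Python does not do this kind of pattern matching, whereas a compiler for a language like C++ may recognize such loops and replace them with a single instruction.

```python
def popcount_test_each_bit(value, width=64):
    """Ask of every bit position in turn: is that bit set?"""
    count = 0
    for position in range(width):
        if value & (1 << position):
            count += 1
    return count


def popcount_shift_down(value):
    """Shift the value down one bit at a time, adding up the lowest bit."""
    count = 0
    while value:
        count += value & 1
        value >>= 1
    return count


def popcount_clear_lowest(value):
    """Clear the lowest set bit on each pass; the number of passes is the count."""
    count = 0
    while value:
        value &= value - 1
        count += 1
    return count


sample = 0b1011_0001
results = (
    popcount_test_each_bit(sample),
    popcount_shift_down(sample),
    popcount_clear_lowest(sample),
)
assert len(set(results)) == 1  # all three agree
```

A `for` over bit positions, a `while` that shifts, and a `while` that clears bits look quite different as source text, which is why reducing them to a small number of canonical shapes makes them easier to recognize.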
And, you know, obviously it makes their testing easier if they can guarantee that the matrix of all possible combinations is reduced by saying, look, if there's a loop, it always looks like this. It doesn't matter if it's a do or a for: there's always a start condition, there's always a step condition, there's always a terminating condition, and there's always a cleanup condition. There are always four labels. It's always this way. Some of 'em are empty. Some of them immediately jump to the beginning of the loop. Some of 'em check the condition. Matt (37:24): So if you look at, for example, unoptimized code in your favorite tool for looking at compiled languages, then you'll often see that, bizarrely, for a for loop it'll jump to the end of the loop, and then it'll jump back to the top again, because sometimes the end of the loop is where the check is done. So the first thing it does is, hey, I set everything up, and jump to the end. You're like, no, no, no, start the loop. But no, it has to check the condition before the first iteration, and that's checked at the end. So it just jumps to the end, does the conditional check, and goes around to the top. And so it's really quite interesting to see these things come out, but we're talking about copying and pasting, or we were, and now I'm talking about loop optimizers. Ben (38:00): I mean, I really like the way that you're thinking about this, 'cause this is fascinating. I think it gets to the heart of why duplication is bad and what are some of the things that can make it good. Right. And, you know, I never really thought about this before, but maybe there's some world where you're kind of removing duplication by temporarily creating it.
So imagine a situation where you suspect that two pieces of code are actually the same underneath the hood. Like, they seem like they're doing the same thing, but you're not sure. And so the way that you're gonna figure this out is you're gonna bag of sand it. So you're gonna make a copy of one of them, and then you're gonna try to refactor it, and maybe even just rewrite it, into the shape of the other one. And then as soon as you've achieved duplication, it's almost like rows in Tetris. Matt (39:03): Yeah. They disappear. Ben (39:05): And they disappear. And you're like, these are actually the same. And then you can delete your copy, and then you can delete one of the originals, and now you only have one of the three. Matt (39:11): That's cool. Right. That's an interesting way, yeah. It's almost like refactoring, where you can make changes over and over again. Or, you know, there are branches of mathematics where you can show two things are the same by slowly applying changes until you can show that one is equivalent to the other through a sequence of steps. Matt (39:29): So yeah, I'm sure there's some clever things that you can do in that respect. Ben (39:34): Yeah. Well, but yeah, duplication done by your compiler is an interesting variation on this. Matt (39:38): I mean, again, it seems interesting. Any way to get me to talk about things I'm not talking about. Ben (39:44): This is why we have this podcast. Matt (39:46): That's why we have this, Ben (39:46): The intersection between all these different things that works out so well. Matt (39:49): Yeah, I suppose so. I suppose so. Yeah. Well, I think that's about all I've got for copy and pasting. Ben (39:54): Yeah. I can't really think of anything else. Matt (39:56): I'm sure there are other things that we could come up with.
Ben (39:57): There's probably some other, you know, Matt (39:58): It seems like this is a good amount of time to be talking about it, and our listener, hopefully, has been kept entertained on their dog walk or commute or whatever it is, Ben (40:07): Explaining to their teenager what WYSIWYG and printing are. Matt (40:10): Their teenager what printing and WYSIWYG are. Yeah. Ben (40:15): Yeah. Matt (40:16): Cool. Well, I'll see you on the next one. Ben (40:19): All right. Sounds good.