Episode 26: Is Universal Darwinism the Sole Source of Knowledge Creation?

  • Links to this episode: Spotify / Apple Podcasts
  • This transcript was generated with AI using PodcastTranscriptor.
  • This is an unofficial AI-generated transcript. It may contain mistakes. Please check against the actual podcast.
  • Speakers are denoted as color names.

Transcript

[00:00:00]  Blue: The Theory of Anything Podcast could use your help. We have a small but loyal audience, and we’d like to get the word out about the podcast so others can enjoy it as well. To the best of our knowledge, we’re the only podcast that covers all four strands of David Deutsch’s philosophy as well as other interesting subjects. If you’re enjoying this podcast, please give us a five-star rating on Apple Podcasts. This can usually be done right inside your podcast player, or you can Google “The Theory of Anything Podcast Apple” or something like that. Some players have their own rating system, and giving us a five-star rating on any rating system would be helpful. If you enjoy a particular episode, please consider tweeting about us or linking to us on Facebook or other social media to help get the word out. If you are interested in financially supporting the podcast, we have two ways to do that. The first is via our podcast host site, Anchor. Just go to anchor.fm/fourstrands, F-O-U-R-S-T-R-A-N-D-S. There’s a support button available that allows you to do recurring donations. If you want to make a one-time donation, go to our blog, which is 4strands.org. There is a donation button there that uses PayPal. Thank you. Welcome back to The Theory of Anything Podcast. Hey, Cameo, how’s it going?

[00:01:29]  Red: It’s going great. I’m feeling great. Bruce, how about you?

[00:01:33]  Blue: Doing good. I am about ready to start a new job, and that’s exciting. We also have Tracy with us this time. Tracy.

[00:01:43]  Green: Hello. Happy to be here.

[00:01:46]  Blue: So we’re going to continue the discussion on artificial intelligence and knowledge creation. In this episode today, I’m going to admit that everything I’ve been telling you up to this point is wrong. Or is it wrong? Maybe. Maybe. We were having a little bit of a pre-show discussion about this. This is the way knowledge creation is. It’s our good epistemology. We are throwing out bad explanations. We’re holding on to the good ones. We only really have one good one, which is the universal Darwin algorithm that I’ve suggested, and yet we’re going to see there’s a bit of a problem with it. What we ended up with is a very simple high-level algorithm: try out variations of potential solutions to a problem and select the variations that best solve the problem. My argument was that that process, as long as you actually have some way of measuring which one’s better (and there are a few caveats that are important), must of necessity result in improvements. Let me be clear here. “Improvements” is really a euphemism for error correction: you will correct errors, and you will improve because you’re doing error correction. I’m basically saying that error correction is knowledge creation, that knowledge isn’t something special. It’s just a word. It’s the word we happen to use to describe the error correction that takes place, and the improvements that take place, when we run a variation and selection algorithm like this. I’ve struggled to see alternatives to this. Even though, as I’m going to show you, there’s a problem with it, I don’t really know what the alternatives would be, and even though I’ve discussed this with lots of people, no one’s ever really offered an alternative, and that’s the thing I’ve been struggling with.

[00:03:44]  Blue: I’m going to explain, and of course, a lot of times I think they’re offering an alternative, but I’m going to explain why I don’t feel like they have. Let me move along here. Here’s the thing. Campbell said the following. He said, “I insist that if discovery or expansion of knowledge are achieved, blind variation is requisite.” Now, we’ve just discussed that when Campbell says blind variation, I believe he means just variation. He’s trying to say variation that might be random, but it might not be random, just as long as there’s variation. And I’ve explained why I feel that way. I’ve also admitted that most people probably disagree with me on that, but this is, as far as I can tell, the correct reading of Campbell. It shouldn’t come as too much of a surprise, then, that I read this statement differently than many other people do. To me, this looks like a bold prediction, a very bold prediction. It seems to me that he’s saying universal Darwinism is the sole source of knowledge creation: that if you ever have knowledge creation, you are using this variation and selection algorithm that I’m talking about, and that you will never find knowledge creation unless underneath that’s what’s going on. And that is how I read Campbell when I first read him. And I’ve heard that elsewhere. I didn’t know where it came from. People will often say, Deutsch will often say, that there is no source of knowledge creation other than the evolutionary algorithm. And so I think maybe this was the source of that. I don’t know how far back that idea goes. Maybe it predates Campbell, but this was the first time I had read it in a source where someone had actually made that claim.

[00:05:32]  Blue: Now, here’s the problem though. Not everybody reads that sentence the way I just did. So there are other people I’ve talked with, and they read it as kind of backwards from me. They see it instead as a sort of untestable metaphysical assertion. So the two possible interpretations would be, first, mine: if you compute a novel solution to a problem that you don’t already know how to solve, via a direct computation without variation and selection, you’ve disproved universal Darwinism as a universal law. It might still be valid in some non-universal sense. I mentioned before in past episodes, I’m not personally claiming that this is the sole source of knowledge creation, but it would disprove how I’m reading this claim from Campbell. That therefore makes it a testable claim. Can you see that that’s the case? Because if you read it the way I’m reading it, he’s making this bold prediction that’s testable. I agree. Okay. Now, the other reading, though, is that if you compute a solution to a problem, and it’s this novel new solution to a novel problem, and you didn’t use variation and selection, then we’re just going to relabel that as pre-existing knowledge that we didn’t know that we had. And you know what? The way I kind of complained that Campbell’s kind of an opaque writer, you actually can read that sentence both ways. And I’m going to have to just admit that up front. So I guess it’s at least a possibility that the second interpretation is correct, and that he wasn’t trying to make some sort of bold prediction that the Darwinian algorithm is the sole source of knowledge creation, or rather one that can be tested.

[00:07:29]  Blue: That instead he was just simply saying, this algorithm defines what knowledge creation is. Okay. So in other words, this other interpretation is that we define knowledge as that which is produced by the algorithm that he is suggesting. Now, if this is the case, if that is what he was saying, one thing that should be obvious is that a lot of my arguments fall apart. Okay. Like my argument that we have to drop the word blind to make sense of this, that was based on the assumption that we’re talking about a testable algorithm, and that we’re looking for a hard-to-vary explanation that is testable. If we’re reading Campbell as making more of a metaphysical, untestable assumption, then there’s no particular reason to favor my interpretation over everyone else’s. Because you can just simply say, well, he’s saying blind variation is requisite to define knowledge creation. And therefore, if you have a sighted variation, then that isn’t knowledge creation; it’s pre-existing knowledge. And many people, especially because of the pseudo-Deutsch theory of knowledge that we discussed, are very comfortable with that interpretation of Campbell. And this is part of the reason why I’m recording this: to make my argument against that point of view. The thing that you have to keep in mind here, though, is that it absolutely makes the theory unfalsifiable by definition. If blind variation, whatever that means, is required for something to be considered knowledge creation, and anything else is pre-existing knowledge, then there is no way to test this theory. It’s just an untestable metaphysical theory. You cannot falsify it, not even in principle. What if it were wrong? Let’s say you had to have red blind variations. That’s stupid, but I’m just trying to give a stupid example.

[00:09:19]  Blue: How would you differentiate between the two theories? They’re both just untestable metaphysical theories, right? This is the real reason why I don’t favor that reading of Campbell. And it’s not just that this matches my theory or something like that. It’s that we can discard untestable metaphysical assertions under critical rationalism. We really shouldn’t spend time on them. They might maybe make a good research program, but they aren’t good explanations. This is why I not only don’t favor this interpretation, but if I’m wrong and this is the correct interpretation, that he was just trying to make an untestable metaphysical assertion, then I’ve lost all interest in his theory anyhow, because it’s no longer a good theory, a good explanation, and I don’t need to spend any more time on it. This is my point of view. So this raises the question then. I’m going to now treat it like it is a bold prediction: that he is making the claim that you’re not going to find examples of apparent knowledge creation without finding that underneath there was variation and selection going on. Regardless of whether this is the correct interpretation or not, I’ve now explained why this is the interpretation that is of interest to me, the one that I want to see tested, and why I’m disinterested in the other interpretation. So that’s what we’re going to do next. I’m now going to go back to the examples that we used before about artificial intelligence, and then we’re going to test this assertion against those examples. So let’s start with problem solving by searching. Do you guys remember this from back from that episode? Tracy has had a chance to watch those. Do you remember this example?

[00:11:09]  Red: Absolutely. And yes, I do remember this problem solving by searching. So

[00:11:15]  Blue: in the example that’s on the screen for those who can actually see it, we’re talking about in this case, searching for the shortest path between two cities. But that’s what, you know, there’s all sorts of, that’s generalizable to all sorts of different types of searching. So what is the knowledge? We want to find the best path. That’s the knowledge that we’re seeking that we’re trying to create. And how are we actually doing it? We’re doing it by trying the best possible paths. So we have clearly variation in selection going on. So this example exactly matches Campbell’s bold prediction and exactly matches the universal Darwin algorithm as I’ve explained it.

[00:11:54]  Unknown: Okay.

[00:11:55]  Blue: So let’s keep going. Let’s see if the other examples match. The non-optimal search algorithms. All right. So this is like hill climbing, simulated annealing, genetic algorithms. Those all work. The knowledge you’re seeking is to find a locally best variant. You’re not necessarily looking for the absolute best variant, the global optimum, but you’re looking for just something that’s pretty good. And the way you do that is you check your neighbor. You say, okay, I’m at this location in the fitness landscape. What variants are neighbors to me? And you just see if you can improve. And they do it in different ways. Hill climbing is kind of the most obvious, where it just checks exactly the neighbor next to it. Simulated annealing does that, but with some extra tricks. The genetic algorithm actually leaps across the landscape, and it’s one of the most interesting because it actually learns information about what the landscape looks like that allows it to try to home in on the best places it should be in the landscape. But all of them are using a process where they have some sort of concept of what’s my neighbor, and how do I check a variant that’s a little different than the one I’m currently looking at? Okay, so this is again spot on. Campbell is correct. The universal Darwin algorithm underlies this artificial intelligence technique. So adversarial search was the next one we talked about. This is the minimax algorithm. This is how you learn to play chess, how you actually get Deep Blue to play really good chess: using this algorithm where it tries out different moves.
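The neighbor-checking idea above can be sketched in a few lines of hill climbing. The fitness function here is an invented toy example, not one from the episode: propose a neighboring variant, and keep it only if it scores better.

```python
import random

def hill_climb(score, x, step=0.1, iters=1000):
    """Greedy local search: generate a neighboring variant (variation)
    and keep it only if it improves the score (selection)."""
    for _ in range(iters):
        candidate = x + random.choice([-step, step])  # a nearby variant
        if score(candidate) > score(x):
            x = candidate  # the better variant survives
    return x

# Toy fitness landscape with a single peak at x = 3.
best = hill_climb(lambda x: -(x - 3) ** 2, x=0.0)
```

Starting from 0, the accepted variants walk uphill until every neighbor scores worse, which is exactly the "locally best variant" described above; simulated annealing and genetic algorithms differ only in how the variants are proposed.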

[00:13:26]  Blue: Okay, well, I kind of just gave it away the way I said it. The knowledge is, I’m trying to find the best move, and how I do that is I try every single possible move out as far as I can. I need to have a board evaluation algorithm, a heuristic that helps me know when I’ve got a better board. That part is different, but what we’re really doing with this algorithm is trying variations and selecting the best ones. So this again is spot on. Campbell is correct. All right, constraint satisfaction problems. We talked about trying to solve Sudoku. The knowledge is, we’re trying to find a solution, and the way we do it is we try every combination. Again, variation and selection. Campbell is spot on. Okay, logic and planning problems. This is a little harder to describe. We gave the example of the Wumpus world. So let’s just go with that. For the Wumpus world, we iteratively update a policy, attempting to improve it. That is a type of variation and selection. For the logic, what you do is you try improvements to the knowledge base. You say, okay, these true statements imply this true statement, so I’m going to add that to the knowledge base. I’ve actually programmed one of these with an HPL algorithm. And then it now knows a little more. And so it tries something else. And even though we’re talking about very different senses of variation and selection, there’s still a really clear variation and selection process going on with both of these examples. So again, these are just spot on. Campbell is just correct with these examples. Okay.
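For the adversarial search case, a bare-bones minimax over a hand-made toy game tree (the leaf values are invented for illustration) shows the try-every-move-and-select pattern:

```python
def minimax(node, maximizing=True):
    """Exhaustively generate every line of play (variation) and back up
    the best achievable score for the side to move (selection)."""
    if isinstance(node, (int, float)):  # leaf: a board-evaluation score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Toy two-ply game: the maximizer picks a branch, the minimizer replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

The minimizer drives each branch down to its worst case (3, 2, 2), and the maximizer then selects the best of those, so `minimax(tree)` returns 3. Real chess engines like Deep Blue add depth limits and a heuristic board evaluator, but the variation-and-selection core is this.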

[00:15:01]  Blue: And so far, Campbell’s just hitting it out of the ballpark. Okay. Now, this is the one where we start to get into maybe a problem. Not yet, but it starts to uncover that maybe a problem exists. So let’s talk about probabilistic reasoning. We talked about Bayes nets as an example. And we talked about Google cars as an example. The Google cars use particle filters, which is really a survival-of-the-fittest, very clearly a survival-of-the-fittest type situation. You create a particle that has a momentum and a position. And then you move them all on your map that’s inside the computer, compare it to what the sensors are saying. And then you say, okay, which of those particles best explains what I just saw using my sensors? Interestingly, it does a survival of the fittest, but one of the debates that comes up in critical rationalism is the whole replicators versus non-replicators question. This is a case where they fake replicators. You don’t actually have to have multiple versions of the same particle surviving. You simply mark that this one has survived more often, and therefore it’s a better one, and you weight it. Ella, who we had on the show, has pointed out that anytime you have replicators, you can always rewrite the algorithm to not use replicators. And this is an example of that: that’s exactly what they did to save memory and to save time. But the end result is something that’s very close to the Darwinian algorithm. You know, not just very close. It is the Darwinian algorithm. The one that isn’t like this is the Bayes net example. So the Bayes net, I

[00:16:47]  Blue: kind of explained how you can calculate the chances that a burglary has taken place based on whether John or Mary calls, that sort of thing. There is no variation and selection going on there. However, before we conclude that this is a counterexample to Campbell, there is something we need to point out. Some things that we call artificial intelligence aren’t really meant to discover new knowledge. Most of these algorithms are trying to discover some sort of piece of knowledge, and they’re using search algorithms to do it. But the term artificial intelligence is this giant umbrella term. And it includes all sorts of things, including sometimes just how to utilize existing knowledge better. A Bayes net really falls into that category. So yes, it’s an exception, but it’s a benign exception. It’s not trying to be knowledge creation. Therefore, you can’t count it against Campbell’s prediction. What they’re really doing there is they’ve got some knowledge about probabilities, and then they’re laying it out in a way that’s tractable so that a computer can calculate it quickly. For some reason, that’s still called artificial intelligence, but it’s no longer the kind of artificial intelligence that is even pretending to be knowledge creation. Therefore, I don’t see it as a true exception to Campbell’s prediction. There are other things like that. Production systems are a famous kind of artificial intelligence, which really boils down to writing out a bunch of if-then statements. It’s almost like programming. You’re trying to capture expertise in a language that people can understand, so they don’t have to be a programmer, but it’s still kind of program-like. And then my Raven’s Progressive Matrices agent; I had to write one for a class.
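To see why a Bayes net query is direct computation rather than variation and selection, here is a sketch of the burglary example. The episode doesn't give the exact numbers, so these are the commonly cited textbook probabilities; the point is that the answer falls out of straight enumeration and arithmetic, with nothing being varied or selected.

```python
# Burglary network: Burglary and Earthquake can trigger the Alarm;
# John and Mary each call (noisily) when the alarm sounds.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls | Alarm)

def p_burglary_given_calls():
    """P(Burglary | JohnCalls, MaryCalls) by enumerating every joint
    assignment and summing out Earthquake and Alarm: pure arithmetic
    over stored probabilities, no search."""
    totals = {True: 0.0, False: 0.0}
    for b in (True, False):
        for e in (True, False):
            for a in (True, False):
                p = ((P_B if b else 1 - P_B)
                     * (P_E if e else 1 - P_E)
                     * (P_A[(b, e)] if a else 1 - P_A[(b, e)])
                     * P_J[a] * P_M[a])
                totals[b] += p
    return totals[True] / (totals[True] + totals[False])
```

With these standard numbers the query comes out to roughly 0.28: the net is just laying out existing probabilistic knowledge so it can be computed tractably, exactly as described above.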

[00:18:34]  Blue: Raven’s Progressive Matrices is an IQ test. So we were trying to write AI algorithms that would do well on this human IQ test. And partway through this class, I started to realize all I was doing was using my own creativity to come up with answers to the examples I was being given. I was figuring out what I had done, and I was programming it into my program, and then any time a similar sort of problem came up, it would know how to solve it. There’s zero attempt to do knowledge creation by the algorithm. It’s really just me distilling it into a program, and therefore not really all that different from any other type of programming. So I want to just say here, then, that we’re not really going to claim that all of artificial intelligence has to match Campbell. If we know it’s a benign exception, then we just don’t care. If it’s not trying to create knowledge, if it’s making no attempt or it doesn’t even seem like it’s creating knowledge, then we don’t consider it an exception and we’ll just ignore it. Okay. Are you with me on that, and can you buy that? I can buy that. Yeah, we can buy that. Okay. And you know what, these examples I picked, Cameo, I did this for your company, right? I mean, they’re the exact same examples that I used there. I was picking the most common examples. There was not an attempt for me to pick ones that matched Campbell. So it really is just the case that the examples I picked, all of them, match Campbell.

[00:20:04]  Blue: And so that’s a really good sign that Campbell’s onto something here, that his bold prediction is looking correct so far. Okay. And that was actually what got me excited when I read Campbell. I had done this presentation for you before I ever read Campbell, and when I went back and looked at it, I thought, wow, Campbell’s like spot on with this. Now, here’s the problem though. Machine learning is the most important form of artificial intelligence. It’s certainly the one that gets all the newsworthy stuff today in the media. And machine learning has made amazing advances recently, in the last decade, 15 years or so, particularly as we’ve gotten faster computers and we’ve had companies like Google that have just got gobs of data. And it turns out that machine learning works way better when you have more data. If you have really, really large amounts of data, it improves machine learning way more than we thought it would, and you can get all sorts of results that would have been considered impossible 10 years ago. You can suddenly start having machine learning actually function on those problems. We’ve never had that much data sitting with one company before. So this is where the big breakthrough came from: suddenly we had companies that existed that had massive amounts of data that they could try running through machine learning algorithms. And that’s when we discovered, oh my gosh, machine learning is more effective than we thought it was, right? It’s just that up to this point we haven’t had the necessary computing power, and we haven’t had the data necessary for it.

[00:21:40]  Blue: So really, machine learning, if there’s going to be a problem with Campbell’s theory, it’ll be in machine learning. So we’re going to have to consider machine learning on its own here. Now, I’ve got this interesting quote up here from Leslie Valiant. Leslie Valiant is super famous in machine learning circles. He came up with what’s called PAC learning theory: probably approximately correct. It’s the theory underlying all of current machine learning. And basically, it gives certain guarantees: if you have enough examples, then you can eliminate hypotheses, get down to the ones that remain, and set certain probability bounds on the chances they will be correct. It’s a very interesting theory because it creates theoretical guarantees. So he said in his book (his book is called Probably Approximately Correct, by the way) that machine learning is the general field that studies how complex mechanisms can be created without a designer. So certainly, people in the field of machine learning very strongly perceive machine learning as existing to create advances in knowledge. We’re studying how knowledge creation happens and how we can have complex mechanisms, as he says, without a designer. So this is what they’re trying to do in machine learning. So that’s the first thing. Furthermore, machine learning has consistently come up with what Campbell called apparent discoveries or expansions of knowledge. So you guys have both watched the AlphaGo movie, haven’t you? Tracy, have you seen the AlphaGo movie? No, I have not. Okay. No worries. You’ve had a chance to see it though, right, Cameo? Yeah, I just watched it last night. It was great. It’s an amazing movie. It really is fun.
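The PAC guarantees mentioned above can be made concrete with the standard sample-complexity bound for a finite hypothesis class (this is a textbook formula, not one quoted in the episode): with at least m ≥ (1/ε)(ln|H| + ln(1/δ)) examples, a learner that finds a hypothesis consistent with all of them is, with probability at least 1 − δ, within error ε of the target.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Sample size sufficient for a consistent learner over a finite
    hypothesis class H to be 'probably (prob >= 1 - delta) approximately
    (error < epsilon) correct': m >= (ln|H| + ln(1/delta)) / epsilon."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# e.g. 1000 hypotheses, 5% error tolerance, 99% confidence
m = pac_sample_bound(1000, epsilon=0.05, delta=0.01)
```

Note the Popperian flavor hiding in the math: the examples work by eliminating hypotheses, and the bound just counts how many examples are needed before the surviving ones are probably approximately correct.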

[00:23:35]  Blue: We’ll do an episode on just the AlphaGo movie because it’s such a great movie. So the reason why I bring this up is that when I explain what I’m about to explain, it doesn’t have the same impact as when you actually see the people who are involved trying to deal with what’s happening. So AlphaGo was this machine learning algorithm that was the first to play Go at a professional level. And in fact, it didn’t just play at a professional level. It ended up beating Lee Sedol, who was the world champion at the time. So it became the world champion. It was a gigantic leap. Go is so hard to write an algorithm for, or at least that was what they believed back then, that they thought it would require general intelligence to be able to actually create a good Go algorithm. The minimax algorithm, which is how we normally do game algorithms like this, just didn’t work on it. The number of moves is so exponentially large. And the degree to which a single move can shift a board from being a good board to a bad board in one move is amazingly large. And so you have this branching factor that’s just too large. And so you just can’t get away with using the sorts of algorithms we use to play chess with. Nobody had any idea how to go about writing a Go-playing algorithm that was anything but a real amateur. And Go players knew this. So the idea of a computer beating them was laughable, because everyone knew Go programs were horrid.

[00:25:21]  Blue: And so AlphaGo comes on the scene, and Lee Sedol is super confident that he’s going to win, because it’s just unthinkable that a computer could beat him. And during one of the games, there was something called move 37 that took place. Everybody laughed because the move was such a bad move. And in the movie, you can actually see the commentators who are watching the show start to laugh because AlphaGo has made such an embarrassingly bad move. And Lee Sedol is studying the board, and then they continue to play. And the programmers think it’s a bug. They’re like panicking because they think that they have a bug in their algorithm that led to this really bad move. And they’re super embarrassed. They continue to play, and down the road, it turns out that move was an amazingly good move. It looked bad because AlphaGo had come up with a whole new style of play that no human being had ever seen before. And Lee Sedol later, and I’ve got the quote here, said, “I thought AlphaGo was merely a machine. But when I saw this move, I changed my mind. Surely AlphaGo is creative. This move was really creative and beautiful.” So you’ve got this AlphaGo, which is doing creative gameplay that no human being has ever seen before, that the programmers think is a bug because they don’t understand what it’s doing. And yet it turns out it’s a very subtly beautiful move that changed things. And it ends up that human beings today don’t play Go the same way they did back then, because AlphaGo as a player has changed how human beings play Go. It came up with a style of play that they had never thought of before.

[00:27:08]  Blue: So it would be so hard to think of this example as not being a discovery or expansion of knowledge, not being creative in some legitimate sense.

[00:27:20]  Red: And the other thing that I think stands out is the programmers who were programming AlphaGo don’t actually understand the game that well.

[00:27:32]  Blue: You know, when they’re talking about it, they’re confused by some of the moves that great players make. They don’t completely understand the scoring. They don’t really understand the game. So how could it not be knowledge creation for the algorithm to be able to figure out how to do something that they don’t understand? Right. I think this is a really convincing example of this, particularly if you go watch the movie and you see the human beings involved, where you can actually see their emotions as it’s taking place in real time. I think it will seriously cause doubt for someone who thinks AlphaGo didn’t create knowledge, who is convinced that the pseudo-Deutsch theory of knowledge is correct, that the programmers had the actual knowledge and all the knowledge came from the programmers. I think this is a really strong refuting, falsifying example of that point of view. I mean, if you don’t accept this as a falsifying example, I doubt that you can conceive of a falsifying example, right? It seems unlikely that falsifying examples exist for you if you don’t accept this one. A couple other examples of this that I actually find also very convincing. Visual recognition algorithms were hand-coded by experts 10, 15 years ago, and they were really bad. But they had these people who had spent their lifetimes coming up with how you make a good visual recognition algorithm. How do I find lines? How do I look for eyes? How do I use that to, let’s say, make an algorithm that recognizes Cameo, right? What am I looking for? These humans would come up with these, and they just weren’t that good, right?

[00:29:23]  Blue: It was just very poor performance, and you had people who would spend their whole careers just trying to understand how to improve visual recognition algorithms. There was a guy in a podcast that I was listening to, a machine learning guy, and he said, I went to the people who were doing this for their career, and I was talking to them at a conference, and I told them, you all need to switch fields now, because in five years, this whole field will be dominated by deep learning. Deep learning is going to create visual recognition algorithms better than any human being knows how to do. And he was right. Today, when you take a picture and it says, oh, that’s Cameo, it’s deep learning that makes those algorithms. No human being alive knows how to make a good visual recognition algorithm. We don’t have a clue how to do it, but deep learning knows how to do it, okay? This is again an example of how we have what really looks like discovery or expansion of knowledge, as per Campbell, coming out of machine learning. If you’re not going to accept this example, then it seems unlikely you will accept any example. You probably should just admit that your theory is unfalsifiable. The other one is deepfakes. Have you guys seen deepfakes? Do you know what I mean? Yes. I’ve seen ones where, have you seen the ones where they have Obama saying something really funny, and you can see him talking and his mouth is moving and it’s his voice, and yet he’s just saying something ridiculous? Have you seen anything like that before?

[00:31:09]  Green: I’ve seen things like it, yes.

[00:31:12]  Blue: So those are created by something called generative adversarial networks. We’ll discuss this in a future episode, but basically the idea is that you have two neural networks, one that tries to create a fake and one that tries to detect a fake. So when they first get started, the one that’s trying to detect the fake doesn’t know how to detect it. You feed it real images and you feed it fake images, which at the beginning are maybe just white noise, and it doesn’t know the difference. But it’s being trained, so it’s starting to get better with each generation. And you have the other one that’s trying to create a fake, and it’s just outputting noise. So you give a picture of Obama and you give a picture of noise, and it’s really obvious to a human which one is the fake and which one isn’t. And then you let it learn over time, and the two compete, and you just let them go at it. And the end result is that even though the one that generates the pictures of Obama, the fake images or video of Obama, never saw images of Obama, just from its competition with the other network, it starts to generate video that looks just like Obama, to the point where finally a human can’t tell the difference anymore.
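The two-network competition described above can be caricatured in a few lines. This is only a toy sketch under invented assumptions (1-D "data", a one-parameter generator, a logistic discriminator), nothing like a real image GAN, but it shows the adversarial loop: the discriminator learns to separate real from fake, and the generator learns to fool the discriminator, without ever seeing the real data directly.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_toy_gan(steps=3000, lr=0.05, seed=0):
    """Toy 1-D adversarial setup. 'Real' data ~ N(3, 1). The generator
    shifts N(0, 1) noise by a learned offset theta; the discriminator is
    a logistic classifier D(x) = sigmoid(w*x + b). Trained in alternation."""
    rng = random.Random(seed)
    w, b, theta = 0.0, 0.0, 0.0
    for _ in range(steps):
        real = rng.gauss(3.0, 1.0)
        fake = rng.gauss(0.0, 1.0) + theta
        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        for x, label in ((real, 1.0), (fake, 0.0)):
            grad = label - sigmoid(w * x + b)
            w += lr * grad * x
            b += lr * grad
        # Generator step: nudge theta so the fake looks more real to D.
        theta += lr * (1.0 - sigmoid(w * fake + b)) * w
    return theta
```

The generator parameter drifts toward the real distribution purely from the discriminator's feedback, which is the same structural trick that lets a deepfake generator produce convincing video it was never directly shown.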

[00:32:32]  Green: Wow.

[00:32:33]  Blue: So that’s how those work, right? How can those not be an expansion of our knowledge? This is a super creative sort of function that deep learning is creating, that we don’t know how to create. No human being alive knows how to create it. And yet it’s creating this. So machine learning has got to be the strongest challenge, right? For Campbell’s bold prediction to be true, it must be tested against machine learning. And if machine learning didn’t use universal Darwinism, that would absolutely be a problem for Campbell’s bold prediction. And it would call into question whether the universal Darwin algorithm is actually the sole source of knowledge creation that we think it is. So, okay, now, worse yet: machine learning, the way they teach it in school. And I just barely graduated, so I’m very familiar with the way they teach it in school. They absolutely teach machine learning as being inductive. Now, induction is the philosophy that was discredited by Popper. In fact, here’s a quote from my textbook. This is the textbook by Tom Mitchell called Machine Learning. It’s the most famous starter text for machine learning in existence, the oldest one. And he says, any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples. That is a description of induction, right? I mean, this is very close to the understanding of induction that Popper discredited. And then, worse yet, here’s his definition of learning.

[00:34:16]  Blue: Okay, he says a program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at task T, as measured by P, improves with experience E. This is clearly empiricism, right? Another discredited philosophy, that we learn from experience directly. Now, obviously, we do learn from experience, but from a Popperian viewpoint, it’s always theory first, right? Things are based on conjecture and refutation. So again, this seems like they’re endorsing these bad philosophies. Even worse, machine learning, with one exception called explanation-based learning, is usually purely instrumentalist, which is another discredited philosophy. They’re just trying to make predictions; they’re not trying to find explanations. They don’t care what the explanations are; they’re just trying to find something useful. So machine learning is deeply rooted in bad philosophy. And worse than that, it actually works. And it seems to actually work based on these bad philosophies, right? Do you see why that’s scary? Yeah, okay, I wanted to really emphasize this, because this is what I bring up a lot to my Popperian friends, and they don’t really see it as a serious problem. But it is. It’s a serious problem. They’ll usually admit that machine learning is inductive, and say, oh, it’s inductive, so therefore I don’t care about it, because that’s a bad philosophy. Okay, when I saw this, my first impression was, well, induction is false. So it must be the case that machine learning works based on universal Darwinism and no one’s noticed it before.

[00:36:00]  Blue: And in fact, it was in the AGI episode of this podcast where I expressed that idea publicly for the first time. And immediately, Dennis Ella and I got into an argument, and then we had to shelve it. And then we continued the argument after the show for a long time. And I was the only one actually supporting that view. I thought that would be what any Popperian would think. I thought they would say, well, clearly it can’t be the case that machine learning is working based on induction. So even though it looks like it’s working based on induction, it must not be induction, because induction’s false; it must be based on universal Darwinism instead. Otherwise, how would we explain the knowledge creation that is coming out of it? And nobody agreed with me on that. And I’ve talked to not just Dennis Ella, but numerous Popperians. I won’t say I’m alone; I actually think I’m starting to convince some people. But I thought it would have been obvious that that was the correct conclusion from a Popperian standpoint, and no other Popperians saw it that way. So that was actually where the idea for this all came from, from that little discussion on that episode. I thought, you know, I should circle back. I should see if I’m right about this or not. And as we’ll see, maybe I’m not. Okay, but that was my first impression. So let me make this clear, though. A common argument is: machine learning is inductive, so it’s on the wrong path. You hear that all the time amongst Popperians.

[00:37:25]  Blue: Okay, and usually they mean on the wrong path for AGI or something like that. But just it’s on the wrong path. It’s not that interesting because it’s inductive and induction’s wrong. It is not possible for machine learning to work based on a non -existent theory. This is the thing that I think people are missing. Okay, so it’s either one of these two would have to be the case. Either A, machine learning is not inductive, it looks inductive, but it’s not. Or induction has to actually exist. And Popper’s disproof of induction must be wrong in some way. If you’re trying to use the logic, because I just laid it out, I don’t see any other possibility. You see any other possibility?

[00:38:03]  Red: Not with the way you’ve laid out your logic. I don’t.

[00:38:08]  Blue: So my conclusion was machine learning must not be based on induction. Now, I’ve come to the realization there actually is an error with the way I just laid that out. It could be the case that induction is vague, that the word induction doesn’t point to a single specific concept. And I’ve started to believe that’s actually the case. The more I’ve talked to people who consider themselves inductivists, the more I think they don’t have a very specific idea in mind. I think it’s kind of vague and loosey-goosey. If that’s the case, then it could be that the word sometimes points to a concept that is discredited, but sometimes points to something else. In which case A and B could both be true, just under different definitions of induction. However, when I’ve talked to both inductivists and Popperians and suggested that to them, again, I get universal pushback. There seems to be agreement between both inductivists and Popperians that induction is a very specific concept, that it’s not loosey-goosey. Popperians say it’s wrong, inductivists say it’s right, but they don’t buy my argument that it might be a vague word that actually points to many different underlying concepts. So let me just say this. I actually think I’m right that it’s vague. The more I’ve talked with inductivists, the more I can see that they don’t agree with each other. So I’m convinced that we’re not talking about a single coherent concept. However, if you really do buy that induction is a single coherent concept, then my argument above must hold. In which case, the dismissal of machine learning because it’s inductive can’t be correct. It must be the case that there’s something else going on.

[00:39:50]  Blue: So let’s take a look now. Let’s dig into machine learning. The most popular kind of machine learning is deep learning. It’s the most powerful, so I want to start with this one. If it turns out that deep learning, which is neural networks, is not based on the universal Darwin algorithm that we’ve talked about, variation and selection, then we’ve got a serious problem with Campbell’s bold prediction. So let me explain what deep learning is. It works off of something called gradient descent. For those who are able to actually see my screen (I’ll describe it for everyone else), it’s very similar to hill climbing, or hill descent in this case. You’re somewhere on this parabola here. Imagine that’s the problem we’re trying to solve, and you’re here; this is what your weights are. And then you say, okay, since I’m here, I’m going to measure what the slope is. That’s what a gradient is: the slope. Based on my slope, I know which direction is going to improve if I move. Or at least I know with a high degree of probability that it will improve; it’s not 100% guaranteed. And then you move in that direction, and then you measure it again. And you say, okay, what’s my gradient now? And you just keep doing that until you no longer get improvements. And by doing that, if you really had this parabola-like structure, you would be guaranteed to get to the global minimum, which would be the best possible solution with the fewest errors. Gradient descent is based on that. In fact, linear regression uses gradient descent too, and its error surface is a parabola shape, so it’s guaranteed to find the global minimum.
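[Editor's note: the follow-the-slope loop just described can be written in a few lines. A minimal sketch; the parabola, starting point, and step size are invented for illustration.]

```python
# Minimal gradient descent on a 1-D parabola f(x) = (x - 3)^2,
# whose global minimum sits at x = 3.
def f(x):
    return (x - 3.0) ** 2

def slope(x):           # the gradient: the derivative of f
    return 2.0 * (x - 3.0)

x = -10.0               # start somewhere on the parabola
lr = 0.1                # step size
for _ in range(200):
    x -= lr * slope(x)  # measure the slope, move downhill, repeat

print(round(x, 4))      # → 3.0
```

Because the surface really is a parabola, this always reaches the global minimum, which is exactly the guarantee described above for linear regression.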

[00:41:38]  Blue: But neural networks aren’t shaped like that. This image here with this grand canyon -like look to it, even that’s too simple. I mean, we’re talking about hundreds of thousands of dimensions. We’re talking about going in all sorts of different directions. And there just isn’t really any guarantee that you’re going to get to the global minima on a neural network. So typically what we’re really doing is we’re just doing a search problem. And all these images, by the way, credit to my instructor, Dr. Kira. I just pulled them from his lectures. And you can see from the lectures, he defines gradient descent as a search problem. And so he sees it as, look, it’s a lot like hill climbing. You’re just simply trying to find a decent local minima. You’re trying to just find some sort of solution that’s fairly good, good enough. It makes good enough predictions and you’re done. And so you just travel down the gradient, you keep checking what the slope is, you keep going a little bit further down. And once you no longer are getting improvements for some period of time, you halt the algorithm and you select which one was ever the best one and you’re done. And typically what you would actually do though is you would run this multiple times, you would try it with different hyperparameters. So there’s actually kind of this second outer loop that takes place that has maybe more clear variation in selection. But even if you’re just talking about a single descent down, this is, you know, this is how it works.

[00:43:05]  Blue: So based on this, the reason why I want to look at this is that this is the example I get most commonly that paparians tell me this is a sighted variant. Therefore, this one doesn’t follow Campbell’s algorithm, because it seems very sighted at some intuitive level, right? I mean, this is a very powerful algorithm that allows you to search through this gigantic space and because it can measure the slope of the function, it just almost magically knows how to keep finding better and better variants. And it doesn’t have to search all across the landscape, it can just kind of search right to a decent local minimum. And so this, if anything was a sighted variant, if they exist, I don’t know that they do exist, but if they do exist, this would probably be it. Okay. Therefore, this represents an edge case to the theory. So this is the basic gradient descent algorithm. Does that make sense? Do you have any questions about this? I can clarify if I wasn’t as clear as I should have been.

[00:44:04]  Red: I think you were clear. Probably any questions I had will just show a lack of an tell of processing on my part.

[00:44:17]  Green: You’re not alone. Maybe just keep going. Well, I’ll wrap my brain around this better.

[00:44:22]  Blue: Okay. So maybe just a couple of things that will help clarify a little bit. A neural network is a computation graph. So imagine you’re trying to do a computation and you’re doing it through nodes. You move through nodes and as you compute something at one node and then you move the next node and you compute something different. That’s really what a neural network is. Okay. What they really do is they set up a computation graph and these nodes have a large number of parameters. So imagine a function with hundreds or thousands or hundreds of thousands, maybe parameters in it. Okay. Imagine that what you’re trying to do is you’ve got examples. So you’ve got, okay, I’ve got inputs and I’ve got what the correct output should be. So I’m going to try to feed it into my function and then I’m going to try to tweak all the parameters on my function until I’m getting a fairly good prediction that matches what the correct answer is. That’s really all it’s doing, but it uses this idea of a function having a slope to seek out how to find a good local minima. And it’s very cool how it works. It’s amazing that it works at all. Like if I were to take you up into the hills and I would say, okay, here I’m going to give you something that tells you the slope of where you’re standing or you can probably just tell just because your inner ear tells you that. And you’re blindfolded though, but I want you to walk down the mountains following the slope.

[00:45:55]  Blue: Is there really any chance you’d make it down the mountain or would you really just get stuck really soon at some local minima where you have to go up before you can go further down? Okay. That’s interesting. Okay. There’s no real reason why this should work. The fact that it does work is actually an active area for research because we don’t know why neural networks really work. It seems like they should just get stuck in local minima that are useless. And yet they don’t. Not only do they not do that, but they once you get the right hyper parameters, they almost uniformly find that no matter where you start, they end up in almost always an equally good final result. And I’ve got papers on this where they’re trying to study why that is because it’s a mystery. And they’ve come up with mathematical models to try to explain why it works. The simplest intuitive understanding, and I don’t know how accurate this is, but this is how they usually explain it, is in the 3D world, there just aren’t enough directions for you to explore. But if you’re in the hyperspace that exists for machine learning problem where there’s hundreds of thousands of dimensions, there’s almost always some place where you can go down a bit further on your slope. And so because of that, because you’ve got these large parameter spaces, it just continues to find improvements. And then it kind of bottoms out and all of them kind of just matter which slope you follow, they all bottom out somewhere near the same level. And it is a bit of a mystery, but it’s almost like magic right now.

[00:47:33]  Blue: And they didn’t come up with neural networks, because they thought, oh, this would be a good idea to do this, they were just trying to copy the brain, it’s not even a good copy of the brain, it’s not even slightly a good copy of the brain. And so there’s a great deal of just luck that they came up with this model based on the brain, it was wrong. And it just so happens that it works really well. And it finds these really good functions. I wish more of the things when I did things wrong would work as well as this did. So there really is a strong happy accident thing going on here. Okay, so let’s take a look at gradient descent. Does it follow the universal Darwin algorithm that I’ve suggested? So I’m going to go through each part, start with a problem, pass, absolutely, that one gradient descent follows that one. Conjecture a solution to a problem. Yes, when even with just a single descent, you have to conjecture multiple solutions, multiple parameters to go into this computation graph, and then you have to see which ones are better. So we’ve, this one definitely follows it. Measure how good the proposed solution is. Well, you have to measure each variant against the last one to know when to stop, because that’s what the stopping out the criteria is. Once you don’t get improvements anymore, you say, okay, after I haven’t gotten an improvement, you make it up, but don’t get improvements for, you know, 10 generations or something, I’m going to just stop and take the best one. So that is a case of measuring how good the solutions are. You’re doing that implicitly as part of the algorithm. Retain the better solutions.

[00:49:09]  Blue: This is also implicit in the stopping criteria, the very fact that you stop and that you’re doing it once you’re getting improvements. That implies you’re going with one of the better variants. And then you usually save off the best variant you found and you go with that one. So this is clearly a pass also. Go back to step two and repeat until the problem is sufficiently solved. That is what the stopping criteria is. So this one also passes. The result, you will end up with improving solutions to your problem. This is precisely the final results. This is why deep learning works at all is because you keep trying the different sets of parameters until you find the best set following exactly this algorithm. So it’s a pass. And the explanation, this result is the case because the act of comparing variants and discarding the worst ones while keeping the better ones must have necessity result in improvements or error correction, in other words. That is precisely what is going on. So key point, gradient descent creates knowledge precisely because it is the universal Darwin algorithm. If it does not create knowledge, then we have spoiled the explanation for the universal Darwin algorithm. This is my contention. So this would be, from my point of view, a strong pass for Campbell’s theory. Gradient descent, deep learning, is deep rooted in the evolutionary algorithm exactly like Campbell said it would be. So another strong hit for Campbell in my mind. Let me, because this is the one I get challenged on so much, I’m going to dwell on this just a bit longer.

[00:50:42]  Blue: Let’s pretend like, so one of the things Popper talks about is take the other proposed theory seriously and follow it to its logical conclusions. So let’s say that we just really don’t believe gradient descent counts as knowledge creation. How would you have to modify the algorithm and the explanation that I’ve just laid out to make it work with the idea that gradient descent isn’t creating knowledge but instead is just discovering existing knowledge that was in the data or something to that effect. So I’ve rewritten the algorithm and I’m doing my best here. Maybe somebody else could do this in a better way, but I’ve rewritten the algorithm the way to try to amend it in a way that would be consistent with that viewpoint. And I don’t see a way to do it that isn’t as bad as what I have up on the screen where step two changes from conjecture solution to a problem to use blind variation to conjecture solution to the problem. Because if you use sudden variation like gradient descent, then even though it also will end up with improvements, we’re going to declare the final output to not be knowledge creation. I mean, this is a horrible change to the theory. You’re just throwing stuff on there for seemingly no reason. And I don’t know any other way to go about it. The simple truth is that gradient descent works because it is taking variations. It’s comparing them. And when you do that and it’s keeping the better ones, and when you do that, you end up with improvements. And that’s the explanation.

[00:52:19]  Blue: If you’re going to insist that it’s not doing that, then you have to amend the explanation to say this results in the act of comparing variants and discarding the worst ones while keeping the better ones of necessity because this will of necessity result in improvements. Though I want to make clear that improvements aren’t the same as knowledge creation. And if blind variation was used, then knowledge was created. But if sighted variations like gradient descent were used, then the improvements produced aren’t knowledge. This is because knowledge is defined by whether or not the variants were blind. I mean, that’s a terrible explanation. My explanation was a real explanation. This one’s just like, it’s just out there. And I don’t know what else to do here. And this is why I can’t accept that viewpoint. Even though I admit gradient descent seems very sighted and I understand that Campbell said blind variation and didn’t say variation like I’m suggesting. I don’t see how this does anything but disprove that theory. Can I ask something right there?

[00:53:23]  Green: I thought that you said before that even though you’ve got blind variation, I thought that you said technically that includes the set of sighted variations

[00:53:31]  Blue: that includes all of it. That’s my viewpoint. So I’m now saying I’m taking the point of view of someone who’s taking the other side of that argument. So I’m allowing us to say, oh, there is such a thing as a sighted variant. And it’s something distinct from a blind variant. It ruins the explanation. It completely destroys the explanation. It’s just gone as a good explanation. And I don’t see how this could possibly be what Campbell or Popper had in mind either. When they talk about knowledge creation, they’re just talking about the types of improvements that take place when you try out variants to a problem and you keep the better ones. So the end result of this then is that we end up with a kind of convoluted version of the universal Darwin algorithm that I don’t know how to fix. If someone else knows how to fix, I would love to see it. But I don’t know how to save this point of view. And the key questions that you would have to ask are why are we adding an unnecessary extra part to the theory? The theory works if you just treat knowledge as improvements. And it works based on any variants in any selection. It doesn’t matter if they’re blind or sighted. And the end result is that you took a really hard to vary explanation that has a really good explanation as to what’s going on. And you’ve just kind of destroyed it. So this is why I reject this version of that point of view. Now, given all that, and since gradient descent is kind of the premier form of machine learning, it might feel like we’ve now proven Campbell right.

[00:55:10]  Blue: It might feel like we’ve said, wow, I mean like machine learning, the premier form of machine learning is using the universal Darwin algorithm. Campbell’s again got a home run. And he does. It’s amazing that nobody has up to this point noticed that gradient descent is evolutionary algorithm, right? Exactly like Campbell said it would be. It’s posed in such a way that people have missed, that’s how it works. But let’s look at other types of machine learning. So decision trees is another kind of machine learning that’s based on information theory. You use entropy to measure which features are the most important to the answer that you’re trying to select. So it really seems like it’s based on an entirely different theory. But it turns out that it’s still based on the evolutionary algorithm. So what you actually do is a decision tree takes every attribute or feature that you give it, and it measures, okay, based on this feature, if I use it to create a branch in my tree, what information gain would I get based on the computation for entropy, which you don’t need to know what it is, but there’s a computation you can do for it. And it says, okay, what’s my information gain? And then it says, oh, which of which of them was the best information gain? And then it first branches on the one that’s the best. Well, that’s variation in selection. You’ve probably never thought of it that way before. You know, if you’re even if you’re a machine learning expert, it’s probably never occurred to you that a decision tree is an evolutionary algorithm, because it just doesn’t seem like it is. Okay, but it is.

[00:56:46]  Blue: However, let me give you one that’s a way harder example, something called the naive Bayes classifier. So you guys are probably familiar with like spam filter.

[00:56:55]  Unknown: Sure.

[00:56:56]  Blue: Have you ever thought about how spam filter works? No,

[00:57:01]  Red: I actually haven’t.

[00:57:03]  Blue: So imagine that you get have a bunch of examples of good emails and a bunch of examples of spam emails. And so you want to create a machine learning algorithm that differentiates between the class of good emails and the class of spam emails. So one way you might do that would be that you would measure the occurrence or frequency of certain types of words. So if the email contains the word Viagra, that that would be a very strong indicator that it might be a spam, not guaranteed, but a strong indicator that might be a spam, you know, whereas, you know, maybe if it’s something that’s not common in spam, then it’s a good chance. It’s just a personal conversation. These two people know what they’re talking about and they know each other. So by measuring in the occurrence of the frequency of words, you can see how you could come up with a spam filter. This is what a naive Bayes classifier is. Okay. And I don’t see how it has any variants or selection in it. If it does, they’re hidden from me better than with the case of the decision trees or in the case of gradient descent. And yet it still comes up with something very similar to any other like I can take a naive Bayes classifier, I can use it on probably any problem. It’s not the best classifier. And this is this is actually a point all that might be important. But anything I can do with machine learning, I can probably do with a naive Bayes classifier, it won’t be as good as deep learning, it won’t be as good as some of these other techniques.

[00:58:36]  Blue: But for the right situation, like spam filter, it’s the best, it’s really good. And even for other types of problems, it’ll at least give me a decent result, right? So here we have a machine learning algorithm that seems to work just like other machine learning algorithms and yet has no variation in selection behind it that I can see. So this is actually a counter example. Interesting. To Campbell’s bold prediction. Or at least it seems to be a counter example. I mean, there’s always the chance that I’m getting something wrong, that something’s wrong in my background knowledge. And it probably deserves stronger study by someone smarter than me before we declare for certain that it’s a counter example to Campbell. But this is this is probably one of the best counter examples I could give you that’s easy to explain. There’s a couple that are even maybe better counter examples that are hard to explain. So in machine learning, there are certain techniques that can be done either through a variation in selection process, or without a variation in selection process. And so for example, linear regression, typically, for just tractability reasons, you would use gradient descent, which we just said is a variation in selection process. So it’s an evolutionary process. But there’s a way to solve a linear regression directly through just a normal computation that’s called the normal equations, that again, I don’t see any variation selection going on. When I look at those equations, if it is, it’s implicit in some way that I’m missing. Okay, they give the same result, right? The normal equations is guaranteed to give you exactly the final answer, the gradient descent will give you something really close to the answer.

[01:00:21]  Blue: You can get it arbitrarily close to the correct answer. Gradient descent is a much faster algorithm, namely, because it doesn’t have to try every single, it doesn’t have to look at the combination of every single data point. Because of that, it’s just a more tractable algorithm. So that’s why we prefer gradient descent. And yet the normal equations would work. I mean, I could find examples of problems and code up and get the exact same answer to say something that recognizes handwritten digits. I’m going to do this one of these days just to prove it can be done. And you would be able to do it and get a 80 % success rate or something like that on the MNIST problem, which is the handwritten digits that it was done so that they could like get zip codes. They would use linear regression on that they’ve got much better techniques that work a lot better than that. But it’s not a terrible, it’s not a terrible algorithm. I mean, 80 % is okay, right? And you can get the same 80 -ish percent using the normal equations, which uses no evolutionary algorithm at all, as far as I can tell. And then with neural nets, you can’t use the normal equations. But for really, really simple neural nets, ones that are quadratic, okay, you could, in theory, solve for the global minima using a Hessian. You know, it doesn’t matter. I’m not even going to explain what it is. Using a Hessian and Taylor series expansion, you could solve for the global minima on really, really simple neural networks. They would never work with any real problems, right? I mean, if you couldn’t be doing all the cool things we’re doing with deep learning using this technique.

[01:02:03]  Blue: But there are cases where you could solve directly without using gradient descent, and thus you wouldn’t be using an evolutionary algorithm. So it really looks like, on first blush here, that we just found counter examples to Campbell’s bold prediction, which would seem to indicate that it’s possible to create knowledge without using the evolutionary algorithm. And honestly, I don’t know what to make of this. This is a very cool, very interesting problem in epistemology that I don’t think anybody except the 100 or so people who listened to this podcast now know about. And it’s a super exciting problem. And it’s probably got some easy solution. And we just had to figure out how to frame the question better than I’m doing it. But this is a super exciting epistemological problem that exists in machine learning that we need papyrians to look at. And if we can figure out what’s going on, there’s probably potential to improve machine learning to understand it better or to understand knowledge creation better. So I think that’s way cool. Now, it should be noted that the counter examples are all subpar examples. So it really looks like that Campbell, even if he’s wrong, there’s first of all, he was right in so many cases while it may not turn out to be a universal law, there’s got to be strong verisimilitude to it. It’s got to be the case that he was approximately correct. And what we want to figure out is why wasn’t he 100 % correct? This is assuming that I haven’t just framed the question wrong, which is a possibility and maybe he is correct. The other thing that could be noted here is that he’s correct for all the really powerful algorithms. Why is that?

[01:03:56]  Blue: Why is it that you have to have evolutionary algorithm and you can’t do knowledge creation except in a subpar way with a non -evolutionary algorithm? I don’t know. What are ways we might go about solving this problem? In fact, we’ve been talking about this all this time. That was why I first built up my case that some of these answers that might at first plus seem like good explanations aren’t actually good explanations. So one thing we might say is machine learning or some machine learning results look like expansions, discoveries of knowledge, but really aren’t. So that might be a possible way to deal with this problem. Now, that’s a form of the pseudo -Deutch theory of knowledge that we’ve dismissed as not being an explanation. So and it’s circular, it’s a bad explanation. Everything still is true, even if this now maybe looks a little more appealing because we know we have a problem we need to solve. In addition to that, there’s an implication here, which is the act of organizing data into a model creates no expansion or discovery of knowledge. Well, I don’t think that’s the way most people would think of knowledge. No matter how novel the result, if we take a bunch of data and we use this algorithm and it comes up with new type of starship or something, whatever the result is, we’re going to suddenly claim that, yes, it looks like an expansion or discovery of knowledge, but it isn’t really, really the knowledge pre -existed, no matter what it is. Well, that’s a very strong, if that’s true, if this is actually a correct answer, that’s a really interesting implication that probably we should be studying a bit more closely.

[01:05:42]  Blue: This argument also can be used for literally anything. So people who have argued with me that gradient descent isn’t knowledge creating. They’ve used this argument with me and I can respond back. They’ll say something of the effect of, well, we know that gradient descent isn’t knowledge creating because it’s using cited variants and therefore it’s not blind variation. And I’ll come back and I’ll say, no, it is blind variation and the way I know is because there’s expansion of knowledge and the argument works either way. You can’t actually differentiate between theories using this answer in its current form. That’s why it’s an easy to vary explanation. How would you falsify this theory? I can’t even think of in principle how you would go about falsifying this theory. The moment you’re allowed to say, well, that expansion of discovery of knowledge isn’t really a expansion of and discovery knowledge. The knowledge just existed in the data. I don’t know how you test that theory. It’s just a testable theory. Right?

[01:06:43]  Red: Yeah. Right.

[01:06:44]  Blue: And then there's the fact that when you run a machine learning algorithm and it creates a Move 37, that is what we mean by knowledge. Deciding not to call it knowledge almost seems like a petty relabeling scheme. Right? This is what people mean by knowledge. That was what we were trying to explain. We're trying to explain how it came up with Move 37. Right? And if that Move 37 isn't knowledge, if it's something else, then whatever that something else is, that's what I'm really interested in. I'm not actually interested in knowledge creation after all. In fact, why don't we just use the word knowledge to refer to what Move 37 is? That would be easier, because that's what we're really interested in in the first place. So these are the problems that exist with this possible way to explain the issue that I just raised. And by the way, even if this one is true, it creates an interesting new category: pre-existing knowledge that needs to be reorganized to be useful. Right? That's a category we probably ought to put out there and build an epistemology around, because I didn't even know it existed until it turned out this explanation might be true. Right? And then it's also interesting that the way you go about reorganizing that pre-existing knowledge is by using something that's exactly like the universal Darwin algorithm that I'm laying out. Why is that? There are a number of problems here, and even if you favor this explanation, you should find these interesting problems. You should look at these and go, okay, I believe in number one here.

[01:08:26]  Blue: I believe that machine learning sometimes results in things that look like expansions or discoveries of knowledge, but really they're coming from the programmer. You should look at these problems that I'm laying out, and they should strike you as interesting epistemological problems. You should be saying, yeah, I want to look into that. If this is true, that's an interesting problem I want to solve. Okay. My experience, though, is that when people say this, they're kind of done. They don't go further and look at what the implications of this theory are. There's an alternative to this theory that says something like this: machine learning is just a collection of known algorithms that finds patterns in existing observations, but creates no new knowledge. This is really the same as number one. That's why I call it one B, but it's worded in a different way. The reason I want to bring this one out is that it raises some interesting problems of its own. For example, why isn't finding patterns a form of discovery of knowledge? If you find a pattern, that's kind of what I meant by knowledge, right? How would you falsify this theory? I don't see a way to do that. And isn't this theory really just, for all intents and purposes, saying that induction is a real thing and it really works? Yeah, maybe. I mean, sure, you've slapped on the label that it doesn't create knowledge, but who cares, right? Induction really works. It really does find patterns in observations. It's a real thing. We should start taking induction seriously, regardless of whether we call it knowledge creation or not.
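One way to make the "finding patterns in existing observations" idea concrete (a hypothetical sketch, not from the episode; the data is invented): ordinary least-squares fitting extracts a pattern from observations in a single closed-form step, with no loop of variation and selection anywhere in sight, which is exactly the kind of case that makes pattern-finding look like a separate, induction-like process:

```python
# Closed-form least squares: find the line y = a*x + b that best fits
# the observations, with no trial-and-error loop at all.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]  # roughly y = 2x + 1 plus noise

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Slope and intercept follow directly from the normal equations.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(f"pattern found: y = {a:.2f}x + {b:.2f}")
```

Whether that recovered slope and intercept count as "discovered knowledge" or merely "knowledge that pre-existed in the data" is precisely the question the episode says this theory cannot settle.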

[01:09:56]  Blue: It's whatever it's doing that's of interest now. Okay. Now, based on that, there's an alternative explanation that's been offered. My friend Dan Elton did this in an actual paper called Applying Deutsch's Concept of Good Explanations to Artificial Intelligence and Neuroscience: An Initial Exploration. Very interesting paper. I don't agree with his conclusions. He and I are starting to team up now; we're going to be writing a couple of papers together. I went over this with him and he said, oh, wow, this is interesting, we should talk more about this. So, very open-minded guy. But in his paper, he takes the stance that induction and Popperian epistemology are two separate knowledge-creating processes. So he believes there are two knowledge-creating processes, and he thinks induction's one of them. But he thinks induction's inferior to Popper's knowledge-creating process in certain ways. For one thing, he argues that it's an easy-to-vary process. I won't get into exactly what he means by that because it's a little bit technical. But he's right. Machine learning works through a process where it's guaranteed, no matter what, to find something. And that's kind of an easy-to-vary thing. It's different from the way science works, right? He also argues that basically it just produces heuristic knowledge. It doesn't produce explanations, which is true, maybe with the exception of explanation-based learning. Machine learning is not trying to create explanations. It's trying to create heuristics. Heuristics are a kind of knowledge. Even Deutsch accepts that heuristics are a kind of knowledge. So Elton's essentially arguing machine learning does create knowledge. It uses an inductive process instead of the universal evolutionary process. But it's limited in what it can create. It

[01:11:52]  Blue: can only do heuristic knowledge. It just doesn't have reach in the same way that explanatory knowledge does. And so this is kind of what he lays out in his paper. Now, again, I have to ask, how would you falsify this theory? Well, his theory is just an initial exploration. It doesn't yet have any way that you could falsify it, right? It's just an initial thought, so it's more of a research program. And it doesn't really explain what induction is, right? We've got a better understanding of what an explanation is than we do of what induction is. It's just that machine learning is induction, that's how people refer to it, and he's kind of going with that. And it never explains why induction usually uses variation and selection. Like I've just shown, the most important forms of machine learning have to use variation and selection to work. Why is that? He never explains that. He hadn't even thought of that when he wrote this paper. Now that I'm talking with him, he's starting to think about these things. But these are the potential problems with this theory. So, in fact, let's even say this is the correct theory. It might be. The fact that we've got these counterexamples leaves me with the impression that his theory might be correct. But what this means is that this theory still has problems. It lacks the necessary explanatory structure to explain even basic things about what we mean by induction, right? And what we mean by knowledge creation, or how we would falsify this theory, or things like that. So this theory, even if it's right, needs a lot of work. And that should be exciting.

[01:13:25]  Blue: If you're a Popperian and you favor theory two here, you should be saying, wow, this is a very cool thing that we've just discovered is true. We need to look into this. We need to understand it better. And we need to actually turn this into a good, hard-to-vary explanation. So if this one's true, it's got its own set of problems. Now let's talk about number three. This was my original theory: all machine learning uses universal Darwinism, but it's not always so obvious. And I gave the example of the decision tree. I gave the example of gradient descent. Okay, but then I also refuted my own theory. I basically gave you examples that show that unless we can come up with some way to show that this is actually still secretly universal Darwinism, my theory must be false, right? This is the theory I advanced, and then I came up with a way to refute it. So it's tempting to see this one as off the table and to say, well, that's a bad explanation because it's refuted, and therefore we should go with one of the other explanations. And I understand why people say that. But that is not how critical rationalism works. Critical rationalism at its heart is about trying to create good, hard-to-vary explanations. And the other two aren't that. So at some level, what we're talking about here is a refuted explanation, and the reason it's a refuted explanation is that it was such a good explanation. It was the only one on the table that was. It was the only one that was actually testable. So I went out and I tested it and I found a counterexample.
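The claim that decision-tree building is "secretly" universal Darwinism can be illustrated with a toy sketch (hypothetical code and data, not from the episode): each candidate split threshold plays the role of a variant, and the impurity score does the selecting:

```python
def gini(labels):
    # Gini impurity of a list of 0/1 labels (0.0 means perfectly pure).
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    # One greedy step of decision-tree building, framed as variation + selection.
    best = None
    for t in sorted(set(xs)):                    # variation: candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:      # selection: keep the best variant
            best = (t, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # splits cleanly at x <= 3
```

Seen this way, each node of the tree is grown by trying out variants and retaining the winner; the open question in the episode is whether every machine learning method can be redescribed in this form.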

[01:15:07]  Blue: This is kind of the state of things. We've got two not-very-good explanations, and we've got one very good explanation that we think we have a counterexample to. Again, this should be exciting. This is an exciting thing that exists out there: a problem in epistemology that Popperians really should be taking a look at. So this is my summary, then, of the three proposed solutions. We can declare it not knowledge. We can accept that induction does in fact exist as a separate process. Or we have to somehow show that the counterexamples really aren't counterexamples. Every single one of these has severe problems that need to be looked into. My third one has one benefit, and this is the point I'm trying to make: it's the only testable theory. The other two are what you would call not even wrong. So mine's wrong, but the other two are not even wrong. They need to be turned into good explanations that could be tested and then survive the criticism and the testing, because at this point they can't even be tested, right? That makes number three the best theory, even though we basically know it's wrong. Strange as that may sound, that is actually the correct critical rationalist answer to this problem. Now, this raises an interesting epistemological point: we should embrace our best explanation even if we know it has a problem. And that is true. So our best explanation is that machine learning uses universal Darwinism, and we know there are some exceptions to that, or there at least appear to be exceptions. The other two aren't good explanations, but they may be really good starting points for a new research program.

[01:16:49]  Blue: And this is something that I think people don't quite get, and it's something I really want to emphasize. The mere fact that something is a bad explanation doesn't mean it's bad. In fact, pretty much everything that eventually turns into a good explanation starts life as a bad explanation. Right? If you were to go back in time and look at how Einstein's theory of general relativity came about, it started life as a bad explanation. Right? He had to keep figuring out how to address its problems. And this is what I'm really trying to say: when I'm arguing with people about this, I'm not trying to say that I know one or two to be wrong. It may well be the case that there are things that aren't really knowledge but look like knowledge. It may be the case that induction is actually a real thing, that it can create knowledge, and that it's separate from the evolutionary algorithm. I don't really know these things to be false. What I'm really trying to say is not, don't pay attention to those because they're false, but pay attention to the problems that that explanation creates and take those problems seriously. Whichever of the three directions you want to go, you need to take the problems seriously. You either take your theory's problems very seriously or you don't. And this is the argument I want to make to those I've been discussing this with. It's not that I think you're wrong, or know you to be wrong, or something along those lines. These actually may be the right directions to go.

[01:18:23]  Blue: It's that you really need to be giving thought to what the implications of your theory are, and how you then deal with the problems that seem to exist with it, including, not least and probably most important, how you make your theory testable. You know, how do you get it to where it's a hard-to-vary theory that's as testable as number three was? And again, I can't overemphasize how exciting this actually is, that we have this very interesting epistemological problem that exists in the machine learning space. And how many Popperians do you think there are in the machine learning space? Right? Even most Popperians that are interested in artificial intelligence are really only interested in artificial general intelligence. I don't think there's anyone currently looking at this other than me, and maybe Dan now, because he's working with me on this. Nobody's really looking at this problem. And it's a real problem that needs to be addressed, right? And imagine that at some future date we replace the machine learning theories that are inductive with Popperian machine learning theories underpinning them instead. Wouldn't that result in huge improvements in machine learning, if we actually understood knowledge creation in a better, more accurate way? I mean, it should, right? There's really giant potential for breakthroughs here if we could get some good minds really looking at this. And this is why I've recorded these episodes.

[01:19:55]  Blue: I'm really trying to get people interested in this problem because I see it as a legitimate problem that we really need Popperians looking at. I'd like them to move into not AGI but specifically narrow artificial intelligence, the machine learning kind, and really start looking at it with a Popperian eye and try to make some progress in this area. So, just as a summary then: artificial intelligence is broken into two branches, machine learning and regular AI. Regular AI utilizes search algorithms to solve problems, and all search algorithms are pretty much by definition evolutionary algorithms, so they match the universal Darwin algorithm that I've laid out. Machine learning apparently sometimes does not use the universal Darwin algorithm. The best forms of machine learning do, but not all of them. This is a problem for the idea of universal Darwinism as a universal law. Most Popperians assume it's a universal law; most non-Popperians don't. So Popperians should take interest in the fact that there might be a counterexample, should look at it, and should try to make sense of it. However, most machine learning algorithms do use the universal Darwin algorithm. Therefore, universal Darwinism, and Campbell's version of it as I understand it, has good verisimilitude. He was onto something, and it's something good. The most powerful form of machine learning, gradient descent, uses the universal Darwin algorithm. Artificial intelligence is the study of knowledge-creating algorithms, and even if you think they're currently failing, that's what they're trying to do, and so it's still of interest. Machine learning needs to be rethought so that it doesn't use discredited philosophies. There should be a Popperian version of machine learning that's non-inductive, if induction is really not a thing.

[01:21:44]  Blue: If you believe induction is a thing, then instead you would want to look at how to tie those two together. Why is it that there are two knowledge-creating processes? What other knowledge-creating processes exist? If you don't believe induction is a thing, which is what most Popperians would say, then there should be a version of machine learning with Popperian underpinnings. What is it? I've laid out my best guess at it, and I've shown you the problems with my best guess. So universal Darwinism seems like the right path for rethinking the underpinnings of machine learning. And ultimately, artificial intelligence and machine learning need Popperians to join the field and help break it out of its philosophical rut, and that is my appeal for the day.

[01:22:27]  Red: That's a heck of an appeal. We're going to need a much bigger listenership, probably, to really get this problem solved.

[01:22:36]  Blue: I really wanted to get people thinking about it. I'm hoping that for people who just kind of listen to the podcast, this would be enough information that it at least piques their interest, and they say, I wonder if there's something to what he's saying here. I've done my best to lay it out in such a way that I've shown there's no easy way to dismiss the problem as just a pseudo-problem; there's a real problem here that needs to be addressed. We don't know what it is. Our best guess, what seems like the straight Popperian answer, which is, well, machine learning is expanding knowledge and so therefore it is universal Darwinism, doesn't seem like it's necessarily correct. So let's get some better minds looking at this. That would be my appeal.

[01:23:26]  Red: That’s cool. It’s awesome. Yeah.

[01:23:29]  Green: All right. Well, thank you guys. Thank you, Bruce. Thank you, Hayden. I think I speak for Kami when I say we want tickets to your first TED Talk.

[01:23:41]  Blue: Exactly. I would be more than ecstatic if someone I'm working with got a TED Talk. It doesn't even have to be me. I just want to see this area get pushed forward. So, all right. Thank you, guys. Glad you made it, Tracy.

[01:23:58]  Red: Thank you for joining us.

[01:24:00]  Green: Yeah, thank you. It was fun, guys. Have a great rest of your day.

[01:24:03]  Blue: All right. You too. Bye -bye. Bye.

[01:24:07]  Green: Bye.

