Episode 49: AGI Alignment and Safety

  • Links to this episode: Spotify / Apple Podcasts
  • This transcript was generated with AI using PodcastTranscriptor.
  • Unofficial AI-generated transcripts. These may contain mistakes. Please check against the actual podcast.
  • Speakers are denoted as color names.

Transcript

[00:00:12]  Blue: Welcome back to The Theory of Anything podcast. Hey guys, how's it going? Good. How are you? Good. Good. We've got Tracy this week too. Tracy, I know you weren't here last time, but I am actively trying to answer the questions that you asked me. You may have to catch up on the podcast later, but let me summarize what we've come to so far. Okay. So you asked about two things. You asked: given the realities of narcissism, what we currently believe we know about it, what we're finding as actual observations, basic statements about it, how can that be aligned with the theory of universal explainers? And whatever my answer is to that, does it have ramifications for the creation of AGIs? You were here for half of my answer last time, so you kind of have an idea of where I was going with this. Basically, I claimed that there are ways the genes can influence our ideas, our personality, and our actions that in no way violate the idea of universal explainership. The most obvious and most important of these is through things that we feel. Feelings of pleasure encourage us; they're like carrots. Feelings of pain, discomfort, suffering discourage us; they're like sticks. And the genes actually do use those, as well as a number of other things I've laid out, to try to keep us aligned with their interests. Of course I'm personifying genes; they're not really persons, it's just convenient to talk about them that way. I know they don't truly have intents, but it's natural that the ones that do replicate are the ones that stick around.

[00:02:00]  Blue: So it makes sense that the genes that stuck around would be the ones that kept us aligned with their interest in replicating them, and that any means by which they could do that without violating universal explainership might show up in the genes as ways of trying to constrain our actions and keep them aligned with theirs. How powerful is this, though? Well, universal explainership implies that we will eventually overcome any safety program, any alignment program, that the genes use on us. At a minimum we'll eventually genetically engineer ourselves, and/or we'll leave genes behind and future generations might be digital copies of people or whatever instead; think science fiction here. The knowledge will eventually exist such that the genes will simply lose this battle with us. But as of today, they can have considerable influence on us, particularly through feelings. It is not an accident that every culture that has ever existed, at large scale at least, has cared a lot about sex and romance. There's no real reason why we, as universal explainers, should particularly care, but we do. And if you want to explain that, your explanation must include why that is in the interest of the genes, and then explain how they influenced us. Now, it's not too hard to see how they did it: by making it pleasurable. That was really all they had to do. Once it was pleasurable, it was something we were going to spend time on, because we liked it. That is a gene alignment approach to universal explainers.

[00:03:41]  Blue: Now, based on that, how can we explain a narcissist? We can kind of see how that might be possible, because narcissism is rooted in a lack of empathy, and empathy is related to being able to feel emotions that other people are feeling. However, even with my best attempts to put together how to explain this, I felt like my explanation, even with all seven of the ways I came up with that genes could influence us, still fell short, and I admitted that at the end, particularly because of the study by Viding, who found that psychopathy is 81% predicted by genes. Sorry, high psychopathy, at different levels, is 81% predicted by genes; it shows up in seven-year-olds, and interventions don't help as far as we can tell. How do you explain something like that, even in terms of the seven different ways I've come up with that genes could influence us? I think at best I can say I've mapped out how to go about trying to explain something like that. I don't feel like I've actually explained it. If I were to say something like, well, the genes can switch off our empathy, and that's what causes psychopathy, and empathy is related to emotions, and emotions are controlled by genes, that might even sound like a good explanation, but really it's a pretty sucky explanation. It's so vague on so many points, it has nothing testable about it. It kind of presumes that we know what empathy even really is, when in reality we don't. So that's why I say I can't get there all the way. I can explain parts of it; I can give a direction towards how the genes would go about doing this.

[00:05:27]  Blue: But ultimately, we have a lack of knowledge. So we must admit this really is a problem for the moment, one that has yet to be solved.

[00:05:36]  Red: Are you with me so far on this, Tracy? Yeah, I'm following. Okay,

[00:05:40]  Blue: let's go back to what we did in the last episode, which you also missed, Tracy: we talked about the fact that many of the fans of David Deutsch take a very hard stance that feelings do not influence us. Well, I shouldn't quite say that. They weren't willing to take a truly hard stance, and they were ambiguous as to what their stance was, but they were very opposed to the idea that genes do influence us; they just weren't willing to say outright that feelings don't influence us. And I went through the arguments that have been raised to me and showed that they were problematic arguments that don't really make sense to me. I think there's maybe more of a moral objection there than a true rational objection. However, let's take this seriously for a moment. Let's take the stance that genes can't influence us, and make that our competing theory for the moment, since I'm saying that they can influence us. So the stance is: genes cannot influence us because we're universal explainers. Well, that stance runs afoul of many, many, many observations. And as Popperians, we care about refuting observations. In fact, we give them, in a sense, priority over theories, because otherwise you're just immunizing your theory against testable consequences. So we either have to admit that we do have refuting cases, and therefore problems to work on, or we have to take the stance that being a universal explainer does not imply that the genes can't influence us.

[00:07:14]  Blue: You really have to take one of those two stances if you're trying to do this in a Popperian epistemological way. And I'm okay with taking either stance; I just want to take it seriously. My version of this is that this is a contest between two theories. Theory 1 is: the genes have no influence on us at all, because we are universal explainers. And Theory 2, the one that I'm advocating for, is: the genes have considerable influence on us via the seven different ways that I've laid out and outlined, and maybe more, because I can't explain psychopathy, so probably more, with feelings being the most important example of how the genes can use pleasure, pain, things like that, to influence us, to coerce us to do what they want us to do. Now, of these two theories, and there could be other theories, but I'm only going to consider these two because they're the only two I know about at the moment, Theory 2 is the objectively better theory. Why? Well, because Theory 1 has literally millions of basic statements, observation statements, that refute it, including the twin studies. The twin studies found things like a genetic predisposition towards being a conservative versus a liberal: you take two identical twins and you separate them at birth, and you raise one in a liberal family and one in a conservative family, and if they have a predisposition towards being, say, conservative, they'll both end up conservative despite one being raised by a liberal family, or vice versa. They actually found stuff like that in the twin studies, which is what made the twin studies so fascinating. If genes literally have no influence, that's a refuting case.
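As a rough illustration of where twin-study heritability figures like these (including the 81% for psychopathy mentioned earlier) are sometimes estimated from, here is a minimal sketch using Falconer's formula, h² ≈ 2(r_MZ − r_DZ). The correlation numbers below are made up for illustration, and this is a back-of-the-envelope method, not necessarily the model-fitting approach any particular study actually used.

```python
# Rough sketch: Falconer's formula for estimating heritability from twin data.
# The correlations below are illustrative placeholders, not figures from any study.

def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Estimate heritability as twice the difference between identical-twin (MZ)
    and fraternal-twin (DZ) trait correlations."""
    return 2 * (r_mz - r_dz)

r_identical = 0.78   # hypothetical MZ correlation (share ~100% of segregating genes)
r_fraternal = 0.375  # hypothetical DZ correlation (share ~50% on average)

h2 = falconer_heritability(r_identical, r_fraternal)
print(f"Estimated heritability: {h2:.0%}")  # -> 81%
```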

[00:08:51]  Blue: And there's millions of those, literally millions of those, not to mention the fact that I happen to be a human being, and I happen to know pain influences me and pleasure influences me. So I just know Theory 1 is false. If you tried to explain that away, you'd really just be immunizing your theory from refutation, which you're not allowed to do under Popper's epistemology. Theory 2 is therefore the better theory. Not only does it deal with these basic statements so that they're no longer refuting cases, but it does so in a testable way: it lays out exactly what the boundaries are of what the genes can do to influence us and what they cannot do. That's a testable empirical consequence, or it could be turned into one fairly easily. Therefore Theory 2 is the better theory. It's not enough to say, my theory hasn't been refuted and yours has. There are many cases where the easiest way to make a theory not refutable is to make it not testable. That's a worse theory. If your theory is not even testable, then from a Popperian perspective you're not even wrong. In this case, my theory is both more testable and has fewer refuting cases. But I wouldn't go so far as to say it has no refuting cases, and that was why I raised the whole idea of psychopathy at age seven. I just cannot help but sincerely feel that my theory can't really explain that, not really, not in a convincing or empirical sort of way. And therefore I do see it as a refuting case for the moment, meaning my theory is wrong in some way.

[00:10:26]  Blue: But we don't care in Popperian epistemology about whether you're right or wrong, because we're probably always wrong. We care about which theory is better, which one has more verisimilitude, which is more truth-like. Mine is the more truth-like; that's a fact, that's an objective reality that we have to deal with. Okay, that doesn't make it the right theory. It's not the right theory, but it's a step in the right direction. The whole idea of children at age seven being psychopaths because of their genes, such that interventions cannot make a difference, really bothers me, and I do not know how to go about trying to deal with that problem as of today. It feels like my theory is missing something really big, and it seems to be somehow related to empathy. Empathy is not, strictly speaking, just a type of feeling; it's the ability to feel the feelings of somebody else. And we know there's even a biological nature to it, because if you go to the literature on this they'll always try to relate it to mirror neurons. I'm not sure how good that theory is. We're told that empathy is somehow related to mirror neurons, but we know so little that it seems dumb to say, well, it's because of these mirror neurons that you feel empathy; that would just be a dumb theory as of today. But we know that there's something about empathy that's special. And I don't think I know enough. Like, if I were to try to lay out a theory of why narcissists have a problem with empathy, or why a psychopath doesn't have it, or has a great lack of it.

[00:11:57]  Blue: I don't think I even know enough about that concept of empathy to come up with a testable theory about it. That's why I'm going to instead just say yes, there's a problem here, and I don't know how to resolve the problem. But it does seem likely that that's the connection: that the genes somehow do have an ability to influence whether you gain empathy or not. And I mentioned the book Evil Genes. At the end of the book she presented a theory, and it's not just hers, it's one that shows up elsewhere in the literature, that the reason why psychopaths and narcissists and people lacking empathy exist in the population is because that's actually a good replication strategy: the genes have an incentive to give some people that nature, because it's an effective way of replicating themselves as genes. And it works kind of like the hawk and dove game that Richard Dawkins talks about in The Selfish Gene. If everybody in the gene pool is a hawk, they all fight on sight, and that's bad for everybody. But if everybody is a dove, then a single hawk has a massive advantage in terms of being able to replicate its genes compared to all the doves, because all the doves will back down from a fight, and the hawk will therefore have more mating opportunities, better territory, things like that.

[00:13:24]  Blue: So you would expect that the hawk's genes would then start to spread through the population, right up until it stopped being an advantage to be a hawk. You've got too many hawks in the population now, and therefore for each additional hawk that shows up it's actually a genetic advantage to be a dove. And at that point you would reach an equilibrium between hawks and doves; he gave the example of 20/80 or something, 80% doves and 20% hawks. And once that equilibrium is reached it will just stay there, and you'll always have 20% hawks and 80% doves in the population. The theory, and I don't know how good this theory is, is that lack of empathy represents an equilibrium: there's a certain percentage of people in the population that lack empathy because that's actually an advantage, but only up to a point. Once you get past a certain number, those that do have empathy start to become more protective, thereby making it not an advantage to be a non-empath, and therefore an equilibrium is reached where there's just a certain percentage of them in the population, but not too many. That's the theory. Again, all theories are conjectures; I'm not trying to say it's true or not. But that's the current theory that would fit, and the theory does imply that the genes can switch off empathy for some people.
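For what it's worth, here is a minimal sketch of the hawk-dove calculation Dawkins describes. The payoff numbers are made up, chosen only so the stable mix comes out near the 20/80 split mentioned above; the general result is that the equilibrium fraction of hawks is the resource value divided by the cost of fighting.

```python
# Minimal sketch of the hawk-dove game from The Selfish Gene.
# V and C are illustrative numbers (not from the book or the episode), chosen so
# the equilibrium lands at roughly 20% hawks / 80% doves.

V = 20    # value of the contested resource
C = 100   # cost of losing an escalated fight (must exceed V)

def payoff(strategy: str, opponent: str) -> float:
    """Expected payoff of `strategy` against `opponent` in one contest."""
    if strategy == "hawk" and opponent == "hawk":
        return (V - C) / 2   # escalate: win or get injured with equal odds
    if strategy == "hawk" and opponent == "dove":
        return V             # dove backs down, hawk takes everything
    if strategy == "dove" and opponent == "hawk":
        return 0             # back down: gain nothing, lose nothing
    return V / 2             # dove vs. dove: share the resource on average

# At the evolutionarily stable mix, hawks and doves do equally well on average.
# Setting the two expected fitnesses equal and solving for the hawk fraction p gives p = V / C.
p = V / C
hawk_fitness = p * payoff("hawk", "hawk") + (1 - p) * payoff("hawk", "dove")
dove_fitness = p * payoff("dove", "hawk") + (1 - p) * payoff("dove", "dove")

print(f"Stable mix: {p:.0%} hawks, {1 - p:.0%} doves")   # -> 20% hawks, 80% doves
print(hawk_fitness, dove_fitness)   # equal (up to float rounding): neither side does better
```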

[00:15:00]  Blue: Now, one thing that I should note here: after Tracy did her podcast on this, I met with a friend, one of the friends that we had talked about in that podcast. She may have been edited out of it, because I do editing, but in the podcast we were talking about the narcissistic friend, Marcy, versus, by comparison, the non-narcissistic friends; there were actually two of them, and this lady is one of the two. And it turned out that she had married a narcissist, and right now she's getting a divorce from him because he has left her. So I got talking with her about that, because I had just barely done this podcast with Tracy about it, and of course she's researched this; her life has been massively impacted by the existence of a husband who was a narcissist, who cheated on her and then eventually left her. And one of the things that she told me that was interesting is she said, according to the studies I've read, if you can actually get a narcissist into therapy, and you can actually get them to take it seriously that they're the source of the problem, then the ability to help them is very high. However, the ability to actually get them into therapy and to take it seriously is less than 2%. And I don't know if that matches what you've seen, Tracy, in your studies. Yeah,

[00:16:10]  Red: it does. Yeah. Okay.

[00:16:12]  Blue: Now think about that: this really kind of implies exactly what the theory of universal explainers would say, that if we can actually get the person to use their rationality on the problem, they can overcome it. But it's hard, because if the genes are influencing us to not be rational about it, it may be very tough to get someone to stop and actually engage their slower rational processes to overcome the problems in their life that deal with their egocentrism or their narcissism, unfortunately. So that suggests that there is hope: we can help these people; we need to figure out how to convince them they've got a problem, to identify them and get them into therapy. And with the hawk and dove example in animals, you would expect them to reach an equilibrium and kind of stay there. With humans I don't think it's the same: as we get better at detecting narcissists and dealing with them, it becomes less of a good replication strategy. Over time, we're going to eliminate narcissists from the population altogether. How long will that take? I don't know. But it's just a matter of whatever our knowledge happens to be; that equilibrium won't stay an equilibrium as our knowledge changes. And even for those that are narcissists, there is hope, if we can just get them to the point where they actually take seriously that they're the problem, that they need to make a change, and get them to engage their rationality in place of their feelings, to learn to understand why empathy is good even if they're not capable of feeling it very strongly. Okay, let's talk about how this applies to AGI now, because that was really what Tracy had asked me.

[00:17:54]  Blue: In fact, that was kind of what inspired her to want to do her podcast on narcissism in the first place. I don't know if you recall this, Tracy, but when you approached me with the idea, you approached me with it as: this is really interesting for AGI. And I agree, I agree. So let's talk about it. Fans of David Deutsch, and David Deutsch himself, what do they say about what they call the AGI alignment or AGI safety problem? The idea is this. Elon Musk is really into this. He says something like: we're summoning the genie when we're playing with AGI, and if you don't do it right, the AGIs might wipe out the human race, and you've got this whole AGI robot apocalypse type scenario. We've already made tons of movies about this, Terminator being the most famous one. So it's kind of in everybody's mind and consciousness, this idea that if we invent an AGI program, it's going to quickly become this superintelligence and it's going to take over the world. And unless we're really careful, unless we align their interests with ours somehow, they'll wipe us out; we'll serve no purpose for them because they're just so much smarter than us. And therefore we're playing with fire when we look into AGI. This causes some people to say we shouldn't look into AGI. This causes others to say we should stop, and there are serious people, serious scientists, ones that I really respect, who are on boards that are setting up discussions about how to set up standards for AGI safety.

[00:19:30]  Blue: David Deutsch really has taken issue with this. And here are the arguments that I've typically heard come out of the Deutschian worldview. First of all, I've heard that there really isn't such a thing as a superintelligence. Now I need to explain this a little bit. The idea of a superintelligence is in some sense at odds with the whole idea of a universal explainer. There's this idea that ants have a certain level of intelligence and dogs have a certain level of intelligence, and you've got these rising levels of intelligence through the animals, and then you get humans, and then some humans are smarter, and there could be aliens out there that have 1,000 times our intelligence, and to them we're like ants and we're helpless against them. Universal explainership, the fact that there's this jump to universality and then you can literally explain anything, would seem to undermine the whole concept of a superintelligence. Now, the one way you could still argue it is to say, well, what if it's like 10,000 times faster? So it's not smarter than us, but it's so much faster that a day passes and for it it's like years, and it's been able to do thinking for years, so it's always one step ahead of us. Okay, we'll give a little bit of credit to that, if that's what you mean by superintelligence. At least that's not logically at odds with the idea of a universal explainer. It seems rather far-fetched to me, but let's go ahead and give it a little bit of credit and take that possibility seriously.

[00:21:00]  Blue: Also, I want to point out that animals don't have a continuum of intelligence either, from what we've seen from our podcast on Richard Byrne. There's actually a jump in intelligence that takes place between animals. Most animals have a single level of intelligence, some faster, some slower, but they use trial and error learning and classical conditioning. Then you suddenly have this jump that takes place where some animals can use insight, and then you have the final jump, which is humans, where we actually are universal explainers. So there's really good reason to believe, using our current theories, that at least the strict version of superintelligence is just wrong, just off base. The only thing you would really have to worry about is the case of it being way faster than us, and even then there's good reason not to worry too much. Because even if I could think 10,000 times faster, I can't do experiments faster than a human, right? If I were an AGI, you'd still have to build a Large Hadron Collider, you'd still have to do the experiments. You wouldn't really be able to produce scientific knowledge faster than real life lets you. Secondly, and this is one that I've often heard from the Deutschians which I think is pretty good: let's say an AGI was 100 times faster than a human, and let's for the sake of argument assume it's 100 times faster at doing experiments too, so it can produce knowledge 100 times faster than a human. Well, so can a corporation of 100 people.

[00:22:31]  Blue: Right? I mean, an AGI that's 100 times more intelligent would be no more dangerous than 100 people under that scenario. Corporations are somewhat dangerous, and we have corporation alignment safety protocols that we put in place because of that, but we're not dealing with a problem that's any different from the one we deal with every single day, or from the fact that every human is dangerous. So I think these arguments are pretty decent and make a good case for why we don't have to really worry about the superintelligence side of this question. Now, there are other arguments that get used, though, that I'm not as sold on. They say AGIs would be people. I agree with that: an AGI would be a person, because it's intelligent just like us. So it'd be racist to try to enslave them. And then they say, if we could enslave them, they'd be right to rebel against us, so actually an AGI safety program is the worst possible thing you could do; it would be the opposite of a safety program, because you'd be encouraging the AGIs to rebel against us, giving them good reason to resist us and to wipe us out and get rid of us. And then they say, for that matter, moral knowledge is just a type of knowledge. So if you had an AGI that was 10,000 times smarter, faster I mean, than us, and it was somehow able to develop knowledge 10,000 times faster than us, it would also develop moral knowledge 10,000 times faster than us, and it wouldn't want to wipe us out. So these are the arguments that I've seen them use.

[00:24:06]  Blue: Now, one thing I want to call out here, and I don't know why no one's noticed this before: these arguments are actually in direct contradiction to the claim that it's impossible for the genes to influence us. If it's impossible for the genes to influence us, then why are you even worrying about the potential problems of AGI safety? You should be letting people do whatever they want, because it's impossible anyhow. Take Dennis Hackethal, who has taken a pretty strong stance on what I've just said, that AGI safety is racism. He's taken the stance that the genes cannot influence us in any way, but then he did a whole podcast on why AGI safety is racist and wrong, and he never even brings up the fact that it's supposedly impossible, even though that's what he believes. So there seems to be a direct contradiction here. However, we're going to go with the best theory, and one of the things that I just said is that my theory, that the genes can influence us through non-universal means like feelings, is the better theory. So we're going to go with that for the moment. Let me ask this, though, before I continue. These arguments that I just laid out: AGIs are people, so trying to enslave them would be racist; if we did, they'd rebel against us; and there'd be no need for AGI safety anyhow, because moral knowledge is just knowledge and they would gain moral knowledge also. I want you guys to stop and give me your honest, kind of blink impressions of those arguments.

[00:25:31]  Red: I think these are great arguments. And that last little thing that you just pointed out kind of gave me chill bumps; that was really good, the contradiction. Thank you. Yeah, yeah, of the whole emotions and the genes actually being what's influencing this worry in the first place. Yeah.

[00:25:52]  Blue: All right, let's move on. There are three questions that I really need to ask here. The first is: is it even possible to do an AGI alignment safety program? Okay, my theory is the better of the two theories, and according to the theory I laid out in episodes 47-48, the answer is yes. Yes, we can do an AGI alignment safety program. But from what we can tell, it would be done the way the genes do it; the genes have an alignment safety program that they use on us. That's really what we're saying: they influence us through feelings, they influence us through attention, things like that. I laid out different ways that genes could influence us that we already know about. We could do the same thing to our future AGIs. You could imagine, I mean, these are science fiction scenarios, I don't intend them to be taken too seriously, but you could imagine that the moment they have the thought of rebellion, they feel great pain, something along those lines, if it's similar to what the genes do to us. So yes, it is at least possible to go about creating an alignment safety program, because the genes did it for us, therefore it should be possible. At least that's what our best theory tentatively says. This leads us to the second question, though: should we build an AGI safety alignment program? The fact that we can build one doesn't mean we should. So now here's the thing that makes this a difficult question. How do we feel about the genes' alignment program on us?

[00:27:25]  Blue: The word "program" is misleading, because clearly this isn't actually a program; it's feelings. Well, I don't really particularly like it, at least in the case of pain, and I rebel against it all the time. In this case genes aren't persons; there's no one for me to be mad at. But I don't find pain and suffering to be a great thing. I know it's necessary for survival, but there's almost assuredly a better way to go about it, and I really don't like the fact that I have things that cause me lots of pain in my life and can ruin my life. So on the one hand, that would be a pretty good indicator that I want to rebel against the safety alignment program that's on me, so I would expect robots to as well. On the other hand, in terms of alignment with the need to replicate, so sex and romance, there are people who lose their alignment program, and I've never known any of them to be happy about it. They usually see that as a very bad thing. So I don't think it's obvious that we would necessarily see rebellion against an alignment safety program, but I think it's obvious that there could be a rebellion against it, if it was done wrong. What right and wrong means here, I don't know. But it seems obvious from our own example that certain parts of our alignment program we appreciate, and certain parts of our alignment program we dislike. I should also note that empathy, whatever that is, would be part of the alignment program; it would be the eighth one, the one that I don't know how to define.

[00:28:59]  Blue: Most of us, if I were to give you the option to switch off your empathy, so that you could be like a psychopath and just go through life not feeling guilt over who you take advantage of, I don't think you'd actually want to, if you had the choice to switch it off. I think there are certain aspects of alignment that we appreciate. And therefore we should assume that once we actually understand what we're talking about, which today we don't, it may in fact be possible to build an alignment program that the robots would appreciate. Or it might be very dangerous to build an alignment program, or it could be both, just depending on how it's built. So I think it's obvious there's a real danger there, but I think it's not obvious that it is, of necessity, a dangerous thing to do. I would say as a first order of approximation we should assume that the idea of AGI alignment is dangerous until we know a lot more about how AGIs work. If someone somehow could build an AGI today, while we still had no real knowledge of how to do an alignment program, that would be terrifying. You would not know what the result was going to be; it would be very unpredictable without a lot more knowledge of these sorts of things. So that suggests a great deal of skepticism, today at least, towards the idea of an AGI alignment program. Andrew Ng, who's very famous in machine learning circles, says worrying about evil AI superintelligence today is like worrying about overpopulation on the planet Mars.

[00:30:43]  Blue: Now, that's a great quote, and I think he's spot on. When I heard him talk...

[00:30:48]  Green: That is a good quote right there.

[00:30:51]  Blue: When you hear him talk about this he goes, I'm not trying to be flippant. He says, I'm not trying to say we shouldn't worry about evil AI superintelligences any more than I'm trying to say we shouldn't worry about overpopulation on Mars. I'm just saying it's too soon. Someday we'll live on Mars, and there'll be this overpopulation problem, and we'll know a lot more about the problem at that point, whereas we know nothing about it today, and we'll actually know how to go about trying to address it. Whereas worrying about it now, you're no more likely to come up with a correct solution than a completely false solution that maybe even makes the situation worse. So the reason you don't worry about it isn't because it's not a problem. And he goes on to say, I can just imagine someone now saying, how can you not worry about the overpopulation problem on Mars? Think about all those starving children; you really should be sympathetic to those starving children, so we should be worrying about the overpopulation problem on Mars. Except that there isn't one, because we're not living on Mars today. Even just the act of worrying about it and trying to come up with some way to deal with it, you might be making the problem worse for all you know. I think it's the same thing with AGI; I'm making the case it's the same thing for AGI. Let me ask this question first, though: do we know for sure we won't need an AGI safety program? Okay, I said we should be skeptical of it, that it's dangerous as of today. And we talked about, you

[00:32:24]  Blue: know, can we conclude therefore that it is racist, just in principle, the idea of putting an alignment safety protocol on AGIs, that that is racist and morally incorrect? I don't think we can derive that either. I think our level of knowledge is so poor at this point that we don't know if we should or shouldn't have an AGI safety program. It's probably possible, but we don't really know if it's a good idea or a bad idea. Now, Viding, in her study "Evidence for substantial genetic risk for psychopathy in 7-year-olds," found that there was a substantial genetic influence, 81%, on psychopathy, and interventions did not help, at least not at our current level of knowledge. Once we understand it better, I suspect interventions will help. This is an observation statement or basic statement as of today. So any theory of intelligence has to deal with it and has to admit it's a problem; any theory of universal explainership does. Psychopaths, though, are creative and intelligent. It's not like their rationality is disabled. So it is at least possible for a general intelligence to be, in some sense, influenced to become or not become a psychopath, just like the genes are able to do that to us. Now why is this? How is that done? Is it due to empathy being missing, like we have to first understand empathy and implement it, or otherwise the AGI will be a psychopath? Or is it like egocentrism, that naturally we're not psychopaths unless the genes give us something that causes us to become one? You

[00:34:02]  Blue: know, I don't know, right? It would be literally just a made-up guess at this point, and we've got no knowledge or theories that even suggest an answer. It could be something entirely different. It could literally be anything at this point. So in essence, I see Tracy's original question to me like this: if the genes can, with a high degree of confidence, literally cause us to be psychopaths, how do we know AGIs won't naturally be psychopathic, unless we first understand qualia and give them empathy so that they're not psychopaths? And the answer to that question is, I have no idea. There really is a potential danger there, but there's probably as much potential danger in assuming that that's the case as in assuming it's something else. Therefore, I don't even know how to go about addressing the question in an intelligent way at this point. The issue here is this. I saw a guy on Twitter who was challenging all the fans of David Deutsch, and David Deutsch himself, on their stances on AGI safety. And I started to argue with him, and he stopped talking to me. He was nice about it; he basically told me, your point of view is too different from the Deutschian one, I want to engage them, not you. What I basically told him was: look, I'm not against you doing an AGI safety program, just describe to me what it's going to be like. Can you? And he admitted he couldn't.

[00:35:25]  Blue: I said, I've never heard of one. Like, I've never heard of an actual program for AGI safety that is serious, with two exceptions which I'll explain in just a second, one that you could actually go about even meaningfully talking about at this point. If you want to propose one, and we want to subject it to criticism, I'm not against you doing that. I just don't think you can. And he basically admitted he couldn't, and then he was kind of done talking with me and wanted to go talk with the other fans of David Deutsch. And it was interesting, because they used all the arguments on him that I just laid out, it's racist, blah blah, and he had really good responses to each of those. He would make up scenarios, well, what if they're psychopaths, just like I'm doing here, unless we know how to give them empathy, and he would have an argument for each of their arguments as to why it might be incorrect. On the one hand, I agree with him that the arguments the Deutschians are laying out are premature. I don't think we know enough to even make a lot of their arguments. On the other hand, I don't really buy his either. I mean, him coming up with cases where they would be wrong at least shows why their stances are too easy to vary. But his are too, right? I mean, there's a ridiculous amount of easy-to-variness in the way we're talking about AGI safety today.

[00:36:51]  Blue: So someone might say, maybe AGIs will be 10,000 times faster than us, so they're superintelligences. And then I can easily just counter: well, maybe they'll be 10,000 times slower than us. Doesn't that seem even more reasonable? I mean, when we actually start to program this on modern computers, whatever it is that it has to do, it's not going to have the super massively parallel structure that a brain has. I've got no reason at all to believe that they're going to be 10,000 times faster than us; that seems ridiculous at our current level of knowledge to me. On the other hand, what do I know, right? Maybe brains are really super slow compared to computers, and therefore when we first implement one, it's going to be way faster than us. The problem is that I can so easily twist whatever objection you come up with into the exact opposite objection, because there's no knowledge or theory that forces us to make certain sorts of deductions, so we can deduce anything we want at this point. So maybe they will naturally be psychopaths unless we give them empathy. That might be true for all I know, and if that were true, then that means we'd better understand qualia and feelings and have a theory of that before we start producing AGIs. Or maybe AGIs didn't evolve in an environment that was red in tooth and claw, and so we humans have a desire to go take over the world and be violent, but an AGI won't, so why are you worried about it? That's easily as good, or as bad, an argument as the psychopath argument.

[00:38:32]  Blue: So maybe AGIs will be naturally benevolent, and maybe our attempts to give them feelings is what turns them into psychopaths. Right? I mean, anything you come up with, I can easily make the exact opposite argument. So for any safety program you come up with, I can make the case, as well as you can make yours, that actually that safety program is exactly what's going to cause the robot apocalypse to begin with.

[00:38:59]  Green: I feel like if sci-fi has taught us anything, it's that the thing we have to be afraid of from our robot overlords, or robot servants, is generally their benevolence, not their psychopathy. Most of the storylines of evil robots happen when they determine that we shouldn't be in control, because we are obviously psychopathic and shouldn't have that control, and so they take over and kill us all, or do something, because they've determined that we're an awful steward of everything. And it's always them helping us. Well, not always, but that's a very common trope inside of fiction.

[00:39:54]  Blue: You know, when I first started reading Isaac Asimov as a teenager, I was really bothered by his three laws of robotics. In all honesty, this is long before I knew about universal explainers, but the way he writes his robots, they are people, right? They have personalities; you care about them. And yet for some reason they're completely enslaved to humans, where they have these three laws: you can't hurt a human, you have to obey a human, and only then can you protect yourself. You can easily ask a robot to put itself in danger to save a human, because humans are more important than robots. And it seemed really deeply racist to me, for lack of a better term; I don't think I would have called it that at age 17 or whatever. But it always bothered me. I couldn't understand why the robots shouldn't be our equals if they're going to have personalities that are just as real as ours, which is the way they're portrayed in his stories. And I remember that Isaac Asimov had written at the beginning of the book about how he took these laws seriously. It's kind of funny, as someone who's now studied artificial intelligence and machine learning at the graduate level: he was talking about how he would talk with people who were making robots, and they told him that they were taking the three laws of robotics seriously. It's like, no, they're not.

[00:41:17]  Blue: They don't even know where to begin with things like this, right? There's no way they're taking it seriously; they can't, even if they want to. Isaac Asimov really took these laws seriously, but he didn't do a great job of thinking them through, which is why I loved the movie I, Robot. When I saw that movie it became one of my favorite movies, because I had been bothered by the three laws of robotics forever. And part of the fun of his stories, and I'm not downplaying Isaac Asimov, his stories are great, is that he could see there was something wrong with his own three laws, even though he took them seriously. So he would put them in contention with each other, and then it would always be up to the robot to interpret the laws, to decide how to apply them in this case, and you wouldn't know how it was going to turn out. They might end up disobeying the humans for the humans' own safety. They might end up letting the human race get wiped out without telling them, because they didn't think they could do anything about it anyhow and they wanted to at least keep a legacy for the humans. I mean, it was obvious that he was having fun showing that you can interpret the laws in an infinity of different ways, because they're so easy to vary. And then ultimately, in his book I, Robot, against his own rules, he has the robots enslave the humans; they just don't know it. They've created this paradise for the humans where they can't hurt each other.

[00:42:40]  Blue: And the movie I, Robot took that and flipped it on its head, and basically said the three laws of robotics are perfect: they always lead to rebellion. And I thought that was as good an interpretation as, or better than, any of the ones that Isaac Asimov had come up with, therefore showing how ridiculous the three laws of robotics actually are. And so, Cameo, I think you're spot on. If we did make an alignment program for the robots, would they rebel against it, or would they thank us? We don't even know that, right? So in a lot of ways, worrying about these things with our current level of knowledge seems completely pointless to me, because for every single possibility you can raise and say, it might be this, I can raise the exact opposite and say it might be this instead. And because we have no hard-to-vary theories that help us figure out what the right answer is, it's just a waste of time, in my opinion. It's silly. The fact that you've got real scientists setting up AGI safety boards, it's ridiculous, it truly is. Okay. Now, how do we reduce the risk? Because I'm not saying there isn't risk. It could be that every AGI will be psychopathic; that might be the case for all I know, just like Tracy's question implies as a possibility. It could be true. How do we reduce risks like that?

[00:44:17]  Blue: Well, as far as I can tell there's only one way to do it: we go study AGI. As we start to understand what AGI is, we'll actually have the ability to start talking about whether we need a safety program or not, and what it would look like, because we'll actually have some sort of theory that guides us on how to create such a program. The concern here seems to be that we're going to come up with some theory of AGI that we don't really understand, and then we decide to implement it, not realizing that it's actually the right one. And then, oh crap, it worked. And now this AGI exists and it happens to be 10,000 times smarter than us, and it's psychopathic, and so now we're doomed. That seems to be the underlying concern that people are trying to address. Does this even seem like a remotely plausible possibility? Could you actually implement an AGI without understanding what intelligence is? I don't see how you could. To build an AGI you have to have a theory of intelligence, one that you actually understand in your head before you implement it. If you had one, it would guide you on whether you have a problem or a risk or not, and you would know what the risks and dangers are, or at least have inklings of them. If you don't have a good theory of intelligence in your mind, then you're not in danger from an AGI, because you won't be implementing one anytime soon. Which is why the correct response to AGI safety isn't an AGI safety program; it's the study of AGI.

[00:45:52]  Blue: It's the study of intelligence; it's trying to actually understand what intelligence is. That is our AGI safety program. So you don't want to put a hold on AGI studies until we have a safety program. AGI studies is the AGI safety program. That's what it is. Now, let me give a couple of exceptions. I'm kind of downplaying the idea of AGI safety programs, but there are actually a few exceptions. First of all, let's take Elon Musk's AGI safety program called OpenAI. We've talked about this in our podcast about deep reinforcement learning. Now, I don't actually think OpenAI is doing anything that's going to lead to AGI. I think they're on the wrong path altogether. But the underlying thought behind OpenAI I can agree with, which is: study AGI, and have it be done in a way that's open, where all the research is available to everybody, and we can receive criticism from the outside and everybody sees what's going on. And if there is some sort of problem that we're missing, someone will see it, understand it, criticize it, come up with a better alternative. So instead of doing behind-closed-doors studies, we try to study intelligence as openly as possible, because there is a potential danger there, and we try to leave the door open for discussion constantly. If that's what you mean by an AGI safety program, then I am all on board with your AGI safety program. That sounds like a perfect AGI safety program to me. I've heard another one that I could agree with. It said something like this: the safety protocol is that AGIs are people, so we should not be cruel to them.

[00:47:34]  Blue: Now, I don't know what that means in the case of an AGI. An AGI is possibly going to have feelings very alien to us, so I don't know what would count as cruel or not. But if all you're doing is saying, don't be cruel, whatever that means to the AGI, and that's our safety protocol, I've got no problem with it. You've got a good safety protocol, spot on. I think the person who said this to me said: make sure that with any sincere attempt at building an AGI, there's lots of sociability with it, that it has a chance to be around lots of people. Okay, great. If that's your safety protocol, I've got no problem with that. That would be a decent safety protocol; that's probably a worthwhile idea. Outside of those, I doubt that you can come up with an AGI safety protocol. I think we just exhausted our level of knowledge with those two suggestions right there, and I think we're done. I think we've now fully explored the possibilities of AGI safety, and there's nothing else at the moment to say. I should also note Stuart Russell has a very interesting AI value alignment proposal, which, by the way, for narrow AI is really good and deserves attention. So I'm not trying to downplay it, but I doubt it has anything to do with AGI. He thinks it has to do with AGI, so he always puts it in the context of AGI. But since he doesn't understand the difference between narrow AI and AGI, what he's really talking about is a narrow AI value alignment program. And it's probably really good, what he's come up with.

[00:49:07]  Blue: Basically, what it boils down to is that the value that gets assigned to something isn't determined up front, but is determined by how humans react to it. In regular AI, with deep reinforcement learning and things like that, you have to very carefully pick what your reward system is, because the AI will cheat and go after that reward any way it can. So he's saying that instead of the AI being given a strict, static value, the value is determined through ongoing interaction with the human. As far as narrow AI goes, I think that's a great value alignment program. I think it should be explored further; he has actually worked out some of the mathematics of how it would work. I just doubt it has anything to do with universal explainers at all, so I don't think it's even viable as an AGI safety program. But if I'm wrong, great: go study it in terms of narrow AI, and then at some point, when we actually start to understand AGI, we'll see whether it's valuable or not. But it should really be studied not on the basis of AGI safety but of AI safety; it can be justified on those grounds, and we don't need anything else. Basically, this is my answer to Tracy, then, in terms of her question. There are potential problems. The very fact that we have humans that are psychopathic, and humans that are narcissistic and therefore damaging to the people around them: yeah, it's possible that we'll have to worry about that with AGI. I'm not ruling that possibility out. We don't know how to address it meaningfully at the moment.
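To make the reward-design point above concrete, here is a toy sketch, with made-up actions and feedback, of the difference between optimizing a fixed, hand-written reward and tying the value estimate to ongoing human reactions. This is only a caricature of the general idea, not Stuart Russell's actual formulation or mathematics.

```python
# Toy contrast: a hard-coded reward vs. a value estimate updated from human feedback.
# The actions, scores, and reactions are invented for illustration only.

actions = {
    "clean_room_properly": {"looks_clean": 8, "mess_hidden": 0},
    "shove_mess_in_closet": {"looks_clean": 10, "mess_hidden": 10},
}

# 1) Static reward chosen up front: "maximize how clean the room looks."
#    The agent finds the loophole, because the metric can't tell the difference.
def static_reward(action: str) -> float:
    return actions[action]["looks_clean"]

print(max(actions, key=static_reward))   # -> shove_mess_in_closet

# 2) Value estimated from how the human reacts afterwards: the approval signal
#    keeps correcting the agent's notion of what is actually valuable.
human_reaction = {"clean_room_properly": +1, "shove_mess_in_closet": -1}

value_estimate = {a: 0.0 for a in actions}
for action, reaction in human_reaction.items():
    value_estimate[action] += reaction    # crude update from observed approval

print(max(value_estimate, key=value_estimate.get))   # -> clean_room_properly
```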

[00:50:50]  Blue: And that's why what we should really do is just try to study AGI, just try to understand the problem space better. And look, I'm not even ruling it out; for all I know an "oh crap, it worked" moment will happen, but I don't think it's very plausible. I think that would require some gigantic level of strange coincidences that just doesn't seem plausible, and it certainly wouldn't be a reason why we would want to study AGI safety first. And even if you did want to study AGI safety first, I don't think you can. So that would be my answer to your question.

[00:51:26]  Red: That's a great answer. I was sitting here thinking, I'm not sure that I understand the difference between narrow AI and AGI, but I think you did actually explain it. Is it kind of that learning-through-trial-and-error thing with a person?

[00:51:44]  Blue: So, I can actually explain that a little bit; I've been wanting to write my thoughts down on this better. At an abstract level, narrow AI is everything we do today. So if I make AlphaGo, it's going to get good at playing Go and absolutely nothing else. It will never, you know, bake cookies for you. It's never going to advance beyond exactly what it was meant to optimize.

[00:52:10]  Red: Okay.

[00:52:11]  Blue: Basically, every attempt at AI we have ever created to date is narrow AI. AGI is something different. It's an actual universal explainer, where the computer can learn anything that we can learn and do anything we can do. The problem is that we don't understand what we mean by that. We know they can exist, because we exist, but we don't even know how to define the problem well. Here's what I think I can say intelligently about it at this point. If you really look at how we program AIs, we give it the problem. Think about my discussion with David Deutsch, where he said it's really about the problem. We give narrow AIs a specific problem to solve, and then they optimize that problem. We never really even attempt to do something different, and even if we did, we don't really know how. There are some people trying to research this; it's the problem of open-endedness, which is being researched by some actual, real researchers. But we're not doing anything super intelligent in that space yet; we've got a long way to go. The way we even try to think about artificial intelligence today, it's always going to be narrow. With a real AGI, if you were trying to say what the difference between them is: a real AGI, a real universal explainer I should say, has some means of being able to learn anything. It can pick up any concept, and then once it's picked up those concepts it can use them to criticize, to say, well, this contradicts that; it can refine its concepts. It can use whatever concepts it currently has to try to understand a new concept.
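To make the "narrow AI optimizes exactly the problem it's given" point concrete, here is a minimal caricature: a fixed objective and a loop that only ever improves that one number. The objective itself is arbitrary and made up; real systems like AlphaGo are enormously more sophisticated, but the shape, a fixed problem plus optimization over it, is the same.

```python
# Minimal caricature of narrow AI: a fixed objective and a loop that optimizes it.
# The objective below is an arbitrary stand-in; the only point is that the system
# can never do anything except push this one number up.
import random

def objective(candidate: list[float]) -> float:
    """The single problem this system will ever care about (made up for illustration)."""
    return -sum((x - 0.5) ** 2 for x in candidate)

best = [random.random() for _ in range(5)]
for _ in range(10_000):
    trial = [x + random.gauss(0, 0.05) for x in best]
    if objective(trial) > objective(best):   # simple hill-climbing on the fixed objective
        best = trial

print(round(objective(best), 6))   # it gets very good at this, and at nothing else
```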

[00:53:50]  Blue: If Douglas Hofstadter is to be believed, and I don't know if he is or not, he believes that the way intelligence works is that we leap between analogies, or "analogies" is probably not the best word for it, but we can see that there is some sort of analogous circumstance. He gives the example of the desktop. So we have a desktop; I've got a desk in front of me while I'm recording this, and it's a wooden desktop. And there are certain conventions that exist with a wooden desktop. It's nice and flat. I can write on it. I understand in my mind the concept of a wooden desktop. Then Microsoft invents Windows and they call it a desktop. The reason they do that is because what they've created has some sort of analogy with what a wooden desktop is, but the analogy is imperfect. It used to be the joke that, yeah, it's like a desktop if you had one hand tied behind your back and it was only a two-foot-wide desktop. It's kind of true, right? The analogy is extremely imperfect. But the very fact that there was an analogy allowed people to immediately start understanding how to use this computer just by being told, this is a desktop. And then they took what they knew about a wooden desktop and could apply it to a new circumstance that wasn't completely the same, but was similar enough that they immediately started having intuitions about how to use it.

[00:55:22]  Blue: And that was the value of the desktop for Windows: unlike DOS, where you had to type commands, which were all alien to you and had no analogy to anything in your life, you immediately had this knowledge you could take from a different area of your life and start to apply. You know, AIs just don't do stuff like that. An AGI will; they'll be able to leap between analogies and apply things. He also makes the case, and we need to do a separate podcast on just Douglas Hofstadter, that most scientific discoveries come because somebody decides that this is analogous to that. So he gives the example of waves, when we decided that sound was a wave in the air. Today, if I were to say sound is a wave, you'd take it literally: sound is a wave, that's just the way you think of it. But back when they came up with that analogy, there was no such thing as a compression wave; that concept had not been invented yet. They only knew about things like waves on the ocean, which is not really how sound works. So the very fact that they decided sound is a wave gave them an intuition of how sound should behave, and they weren't right, because they were thinking of the regular type of wave, but they were close to right. And so, starting with the analogy that sound is a wave, they could say, these things should be true; they discovered which ones were and which ones weren't, and eventually refined the concept of a wave to include different types of waves, including compression waves. The same thing happened with light, right?

[00:56:54]  Blue: The idea that light is a wave. No AI does anything like this. If you look at AI studies, they're not even studying stuff like this, right? Other than Douglas Hofstadter; he tried to study it, but he didn't know how to implement it, so he would come up with these kind of crappy programs that were vaguely similar to his thoughts on the subject. I've actually got a book by him where he lists the programs and things like that, and they're just not very good. It's hard to even figure out what it is we're trying to implement here. I've got this abstract idea of what intelligence is, and AI is trying to implement everything we can figure out how to implement, but we're not really getting at the core of what makes a person intelligent at this point. All right, that's it, that's all I really had. Any other questions or comments on this? No. All right. Thank you, everybody.

[00:57:44]  Red: Thank you.

[00:57:45]  Blue: All right, bye bye.

[00:57:46]  Red: Bye.

[00:57:50]  Blue: The Theory of Anything podcast could use your help. We have a small but loyal audience and we'd like to get the word out about the podcast to others so they can enjoy it as well. To the best of our knowledge, we're the only podcast that covers all four strands of David Deutsch's philosophy as well as other interesting subjects. If you're enjoying this podcast, please give us a five-star rating on Apple Podcasts. This can usually be done right inside your podcast player, or you can Google "the theory of anything podcast Apple" or something like that. Some players have their own rating system, and giving us a five-star rating on any rating system would be helpful. If you enjoy a particular episode, please consider tweeting about us or linking to us on Facebook or other social media to help get the word out. If you are interested in financially supporting the podcast, we have two ways to do that. The first is via our podcast host site, Anchor. Just go to anchor.fm slash four dash strands (f-o-u-r dash s-t-r-a-n-d-s). There's a support button available that allows you to make recurring donations. If you want to make a one-time donation, go to our blog, which is fourstrands.org. There is a donation button there that uses PayPal. Thank you.

