Episode 7: Lies, Damn Lies, and Statistics

  • Links to this episode: Spotify / Apple Podcasts
  • This transcript was generated with AI using PodcastTranscriptor.
  • Unofficial AI-generated transcripts. These may contain mistakes. Please check against the actual podcast.
  • Speakers are denoted as color names.

Transcript

[00:00:00]  Blue: The Theory of Anything Podcast could use your help. We have a small but loyal audience, and we'd like to get the word out about the podcast so others can enjoy it as well. To the best of our knowledge, we're the only podcast that covers all four strands of David Deutsch's philosophy as well as other interesting subjects. If you're enjoying this podcast, please give us a five-star rating on Apple Podcasts. This can usually be done right inside your podcast player, or you can Google "The Theory of Anything Podcast Apple" or something like that. Some players have their own rating system, and giving us a five-star rating on any rating system would be helpful. If you enjoy a particular episode, please consider tweeting about us or linking to us on Facebook or other social media to help get the word out. If you are interested in financially supporting the podcast, we have two ways to do that. The first is via our podcast host site, Anchor. Just go to anchor.fm/fourstrands, F-O-U-R-S-T-R-A-N-D-S. There's a support button available that allows you to do recurring donations. If you want to make a one-time donation, go to our blog, which is fourstrands.org. There is a donation button there that uses PayPal. Thank you. Welcome to The Theory of Anything Podcast. I'm Bruce Nielsen. I've got Cameo here with me, and today we're going to do a subject that Cameo herself chose: statistics. You've probably heard the joke about lies, damn lies, and statistics. That's what we're going to talk about today: how people use, or misuse, statistics. Absolutely.

[00:01:48]  Blue: I've got a couple of things to use as examples. Cami and I started this conversation over lunch. This was right during the coronavirus scare, so it was an online lunch. Here are some things we had talked about. Suppose there's a woman who's age 52, and she hears this statistic: women over 40 have a 5% chance of getting breast cancer. What is her chance of getting breast cancer? Anybody know? Probably the assumption would be a 5% chance, right? But suppose later she hears this statistic: women over 50 have a 6% chance of getting breast cancer. Well, she's both over 40 and over 50, so which one applies to her? Again, maybe you would assume it means 6%. Oh, she's actually in the 6% category. But now suppose this woman later hears that women over 60 have a 10% chance of getting breast cancer. Now what is her percent chance of getting breast cancer? We'll stop and think about that one for a second. What do you think, Cami, at this point?

[00:02:56]  Red: None of those percentages. Because you and I have already been talking about this, I know that of course none of those percentages actually represents her likelihood of getting breast cancer, because breast cancer risk is typically based on personal and individual things.

[00:03:16]  Blue: Yeah, so in this case it's hard to even figure out; she's not cleanly in any of these categories, right? I mean, "women over 50" includes women over 60, who have a much higher chance, so does that mean her chance would be lower than 6%? Probably. It's hard to figure out based on these numbers, because statistics apply to populations, not to individuals. We use statistics as if they apply to individuals, but they really never directly apply to individuals.

[00:03:47]  Red: Yeah, and even within populations. The reason we wanted to talk about this is that we see a lot of statistics about how many people are going to die of coronavirus. Originally, like two or three weeks ago, people weren't scared so much, because the death rate is lower than the flu's, or the death rate is just 1% higher than the flu's, or a variety of different percentages that we were hearing. And when people think about those percentages, a lot of times they assume they're going to be applied consistently across the population. So if we know that the death rate is 10%, that's pretty scary, because one in 10 people could be dying. But the book that has been in my head a lot lately is Doomsday Book by Connie Willis, in which a small population ends up having 100% mortality, because they're unable to care for themselves, and they all end up dying from something that only has a 20% death rate. That was what you and I first started talking about that got us into this mode of wanting to talk about probability, and I love that you have this slide now: the two uses of probability theory.

[00:05:26]  Blue: Yes, so there are two uses for probability theory, and this is something that I think people don't think that much about. Okay, so take a die. Before you roll it, what are the chances you're going to roll a six? Well, obviously it's one in six. Now, you probably don't think about this, but now I take a die and I roll it, and I cover it with a bowl, so I don't know what I just rolled. What are the chances that the die underneath the bowl is a six? Well, obviously one in six. But there's actually a difference between these two cases: one is actual probability, and the other is ignorance. We use probability to cover ignorance, to discuss what we don't know, and in fact this is probably the main way we use probability theory today. And you can see from this example that it makes some sense, right? This isn't just something stupid. The die is either a six or it isn't, right? It's already been rolled; it's been determined. So there are no probabilities at all involved, in the strict sense of probabilities, and yet to say there's a one in six probability still makes sense to us, because we just don't know what it is, but we know it was produced by a process that gave it a one in six chance, right?

[00:06:53]  Red: you’re breaking my brain a little bit here,

[00:06:57]  Blue: So it gets crazier, though. What's the difference between these two examples? Well, it's a little hard to describe what the difference between the two examples is, other than the obvious one: one is a straight-up probability and the other is somehow related to ignorance.

[00:07:13]  Red: okay,

[00:07:14]  Blue: So we use probability to cover our ignorance, to measure our ignorance, and there are ways in which this makes sense and ways in which it doesn't, and it's not always obvious which it is when we start using probability theory on things. Okay, so suppose we discovered that you have a hundred percent chance of getting some disease if you have a certain gene, and the gene exists in one in ten thousand people. The question I want to ask is: what are the odds that I'll get this disease? Right off the bat, we could say one in ten thousand people have this gene, and we can treat that as a probability: there's a one in ten thousand chance you're going to get this disease, if you live long enough or whatever. And this makes sense, right? It isn't complete stupidity to use probability theory in this way. But there's something a little off about it too. So, a quick primer on probability theory, so I can use some notation here. We write P(disease), and if you're watching this on YouTube you can see the screen, but if not you're going to have to just bear with me, for the probability that you're going to get the disease, and we're postulating that it's a 0.01% chance, one in ten thousand. Then the next one is the probability of the disease given (the bar means "given") that you have the gene. Well, that's a hundred percent. If you know you have the gene, there isn't a probability that you're going to get the disease: you are going to get the disease. Then there's the probability that you're going to get the disease if you don't have the gene, and in this case we're claiming it's zero percent. If you know you don't have the gene, there is no chance you're going to get the disease. So then the real question we're asking is: what's the probability that you have the gene? We're postulating that that is also 0.01%, because one in ten thousand people have the gene. But the odds of you having the gene are not random. Obviously it depends on who your parents were, whether they had the gene or not, and that's not going to be randomly distributed throughout the population. That may be the odds across the entire world, but maybe nobody in India has the gene, or it's way more rare there, or something like that. So which population you're a part of matters in this case, and that 0.01% isn't really going to apply to you. It's just the best we could come up with to express our ignorance, if that makes any sense.
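A quick way to sanity-check the numbers in this example is the law of total probability, sketched here with the transcript's assumed figures (the 1-in-10,000 gene frequency and the 100%/0% conditionals):

```python
# Assumed figures from the example: the gene occurs in 1 in 10,000 people,
# the disease is certain given the gene and impossible without it.
p_gene = 1 / 10_000            # P(gene) = 0.01%
p_disease_given_gene = 1.0     # P(disease | gene)
p_disease_given_no_gene = 0.0  # P(disease | no gene)

# Law of total probability:
# P(disease) = P(disease|gene)*P(gene) + P(disease|no gene)*P(no gene)
p_disease = (p_disease_given_gene * p_gene
             + p_disease_given_no_gene * (1 - p_gene))

print(p_disease)  # 0.0001, a population average, not anyone's personal odds
```

As Blue notes, the 0.01% is only a statement about the population; how well it applies to you depends on which population you actually belong to.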

[00:10:10]  Red: You know, it actually makes great sense, and I love this example, especially given coronavirus and how much conversation we're having about which populations are likely to be hit hardest. When you look at death rates right now in Germany as compared to Italy, there's so much under the surface, and so much of people postulating about what's causing the different death rates, where really we're just doing it to cover our ignorance. It's fascinating. Go on, Bruce.

[00:10:46]  Blue: Okay, so this is a famous example that comes up. I'm obviously studying machine learning, working on a master's degree, and this came up in my machine learning class; it comes up everywhere, they always bring it up. It's the idea of screening for a disease. So you have a test for a disease, let's say it's the coronavirus test or whatever, and it's 95% accurate. What we mean by that is that the probability it shows a positive if you have the disease is 95%; that's the second line down there. But it gives 5% false positives, meaning that if you do not have the disease, there's a 5% chance it will still say positive. Okay, so now let's say the disease rate is one in a thousand, so 0.001, and you've just tested positive. What are the odds that you have the disease? Just using your intuition, Cameo, make up what you think the odds are going to be based on what I've told you so far.

[00:11:50]  Red: Well, you know, the easy answer, the answer my brain wants, is that there's a 95% chance that I have the disease.

[00:12:00]  Blue: Yeah, okay, because that's easy math, right? You just say, well, that's as good a guess as any. So why do you think that number is wrong?

[00:12:11]  Red: Well, I'm not sure I believe that it's wrong or right, because I don't believe any of this data is particularly relevant to me, or to anybody, in this specific instance.

[00:12:24]  Blue: All right, so there's actually a way to calculate the actual probability using something called Bayes' theorem, which deserves its own show, so I won't get too far into it. If you look on the screen here, though, I give the formula for Bayes' theorem. What Bayes' theorem does is allow you to switch the order: we know the probability of getting a positive if you have the disease is 95%, but what we really want to know is the chance that you have the disease if you get a positive. So we want to take that and flip it; you see how I'm showing it can be flipped there with Bayes' theorem. Now, I'm not going to use Bayes' theorem; I'm going to just give you straight numbers that will be far more intuitively obvious. Here are the actual numbers. Let's say we have a population of 100,000. How many of them, based on our numbers, have the disease? Well, it's one in a thousand, so 100 of them have the disease, and the rest don't. Now we test all of them, and of those 100 that have the disease, 95 get a positive and five get a negative, because those are the chances, right. Now, of the 99,900 who don't, 5% get a positive and 95% get a negative. That's 4,995 that get a positive. So if you have a positive from this test, you're in that group of 5,090, of which only 95 have the disease. In other words, there's a 1.9% chance you have the disease.
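Blue's head-count walkthrough is exactly what Bayes' theorem computes. Here's a minimal sketch using the episode's numbers (95% sensitivity, 5% false-positive rate, 1-in-1,000 prevalence):

```python
# Screening-test numbers from the example.
p_disease = 0.001            # prevalence: 1 in 1,000
p_pos_given_disease = 0.95   # sensitivity
p_pos_given_healthy = 0.05   # false-positive rate

# Bayes' theorem:
# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(round(p_disease_given_pos, 3))  # 0.019, the same ~1.9% as the 95-out-of-5,090 head count
```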

[00:14:05]  Red: interesting,

[00:14:06]  Blue: Okay. If I tested positive, if you tested positive. So this is actually why it doesn't make sense to take a screening test like this and give it to the whole population. The assumption here is that we're giving it to the whole population, which in this case we're saying is 100,000 people. You would never really do this in real life; it would be stupid to use a test that way. What you actually do is go to the doctor, the doctor first checks you out and says, oh, your symptoms suggest you have this disease, which, although I don't know the exact probability at that point, moved you into a much higher probability category for having the disease. Then the test makes some sense, right? Sure. But if you just went out and tested everybody, the test would be basically useless.

[00:14:56]  Red: Yeah, because for us as humans, when we look at numbers, we think that 95% accuracy is a pretty good number. Yes. In fact, when you put this example up, one of the first things I thought about was condoms: condoms are considered to be 98% effective when they're used correctly every time, but even 98% isn't great if what you're trying to do is prevent pregnancy.

[00:15:32]  Blue: And it's also interesting. Let's say for some reason you had to test the entire world. What you would actually want to do is run the test multiple times, right? It would be super expensive, but you would want the test to get you down to: okay, this is the group most likely to have it, and then you would test them again, and again, until you got it down to some level of odds that was reasonable. It would take a while to get there.

[00:16:03]  Red: You would have that 5,000-or-so group of people who had tested positive, that you would then test again, and you'd apply the same percentages again as you were narrowing it down for accuracy.
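The retesting idea can be sketched by applying Bayes' rule iteratively, with each round's posterior becoming the next round's prior. This uses the same assumed 95%-sensitive, 5%-false-positive test as above, and it assumes retests err independently, which real tests often don't:

```python
def posterior_after_positive(prior, sensitivity=0.95, false_pos=0.05):
    """Probability of disease after one more positive test result."""
    p_positive = sensitivity * prior + false_pos * (1 - prior)
    return sensitivity * prior / p_positive

p = 0.001  # 1-in-1,000 prior prevalence
for n in (1, 2, 3):
    p = posterior_after_positive(p)
    print(f"after {n} positive test(s): {p:.1%}")
# after 1: 1.9%, after 2: 26.5%, after 3: 87.3%
```

Note that the fraction of true cases among the remaining positives climbs with each round, which is why retesting the positive group narrows things down.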

[00:16:19]  Blue: Yeah, but consider also: if what you're trying to do is stop the spread of the disease, let's say it's contagious, you've still got those five out there who have the disease but tested negative. You've got to deal with them somehow, right? Fascinating, okay. Okay, so now let's talk about machine learning. So machine learning is based on statistics. There's a joke meme out there (I included it in the original version of the show and then thought, you know, I'm not sure I should, because I'm not sure who it came from) about how machine learning is really just statistics, but it's been reframed so that people like it. Reframed so that people like it, yeah. So this isn't quite true. Machine learning is its own field of study that has its own vocabulary and its own tools that are different from statistics, but it is rooted in statistics; that part is certainly true. A lot of machine learning techniques are actually statistical learning techniques, which are used by statisticians, and then machine learning has its own techniques separate from statistical learning techniques, such as neural nets, which aren't used by statisticians. Right,

[00:17:37]  Red: right.

[00:17:37]  Blue: So, but they all work in the same way. The idea is that we're trying to split up the population; we're trying to make a prediction for an individual based on some sample that we've been trained on. And it's the exact same problem: how do you apply statistics to the individual? There's no easy answer to it. So machine learning is trying to come up with some way to automatically split up the population so that it's going to give you a good answer for an individual, and there are tons of problems with it. That's what the whole field of study is about: how we solve those problems. So this graph I've got here, let me describe it, is from one of my actual master's papers. There's a data set called the diabetes data set, where they were checking for diabetes in Pima Indians, and you can use machine learning techniques to try to predict whether someone is going to have diabetes based on certain characteristics. Okay. What I've done here is something called t-SNE, which takes a whole bunch of different dimensions and flattens them down to two dimensions, so that you can get a kind of intuitive feel for what's going on in machine learning. And what we have here is class zero and class one. Class zero is red, meaning you don't have diabetes, and class one is blue, meaning you do have diabetes. So what's this machine learning technique doing here? Well, it's taking this giant population, based on a number of different statistics that have been flattened to a simple x and y.

[00:19:12]  Blue: And it says: I'm going to draw a line, and if you're above the line, then I'm predicting that you have diabetes, and if you're below the line, I'm predicting that you don't. Okay. And you can see the line there, right?

[00:19:22]  Red: Okay.

[00:19:24]  Blue: Well, now, if you really look at it, the diabetes and not-diabetes points are all pretty intermixed. Okay. But you can see that it rather intelligently drew the line such that if it predicts you have diabetes, there's a better than 50% chance it's right, and if it predicts that you don't have diabetes, there's a better than 50% chance it's right. So it just draws that line and says: okay, this is it, predicting diabetes for these and not for those. And it knows it's going to maximize its percent chance based on this technique with the line, which is called logistic regression. It's going to maximize its predictions by drawing the line right there. Okay. Now, when I put it this way, you see what a simple mathematical technique it is, right? It's not even all that intelligent. I mean, we treat machine learning like it's some sort of counterpart to human intelligence, but really it's just statistics. It's just measuring: okay, I'm going to maximize my predictions if I guess here. Now, we do that on the sample set, and notice that I refer to it as the training set. It gets a 70% accuracy on the training set, and then on my cross-validation set it's a little less. The reason why is, of course, that it scores really well on the set that it trained on. The question is: how well does it do in real life? And we don't know that. You never actually get to know how well it does in real life.
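The line-drawing step can be sketched in plain Python: a toy logistic regression fit by gradient descent on made-up 2-D points, not the actual diabetes data or the t-SNE output from the paper:

```python
import math
import random

random.seed(0)

# Hypothetical 2-D points: label 1 if the point sits above the line y = x.
points = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(200)]
labeled = [(x, y, 1 if y > x else 0) for x, y in points]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Logistic regression: P(class 1) = sigmoid(w1*x + w2*y + b).
w1 = w2 = b = 0.0
lr = 0.5
for _ in range(1000):  # epochs of stochastic gradient descent on the log-loss
    for x, y, t in labeled:
        p = sigmoid(w1 * x + w2 * y + b)
        w1 -= lr * (p - t) * x
        w2 -= lr * (p - t) * y
        b -= lr * (p - t)

# "Training accuracy": how often the learned line splits its own sample correctly.
correct = sum((sigmoid(w1 * x + w2 * y + b) >= 0.5) == (t == 1)
              for x, y, t in labeled)
print(correct / len(labeled))  # high here, because this toy data is cleanly separable
```

On real, intermixed data like the diabetes set, the same procedure tops out around the 70% Blue mentions; the line simply cannot do better when the classes overlap.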

[00:20:58]  Blue: What you do to try to mimic that is, first, we have something called a cross-validation set, where I hold back some of the samples and don't let it look at them until after I've trained. And then I have something called the test set, which I don't show here. Each time I train, I then check against the cross-validation set, but slowly over time that contaminates the cross-validation set, because I keep tweaking things ("oh, I need to tweak this because it's not doing very well yet"), and it starts to indirectly learn the cross-validation set. So then I have a test set I use at the very end that it's never seen at all, and that's supposed to give me some sort of confidence that my final trained model will actually work in real life, because now I'm checking against a certain number of samples that I've held back that it's just never seen before. And it never scores as well on that test set as it does on the training or cross-validation set. It always scores less well, and in real life it'll do even worse. And that's precisely because there is no easy, straightforward way to apply statistics to individuals. These are all the techniques we've had to come up with over the years to try to deal with that. And there are really interesting things that have come out of this. Maybe you've heard about racist machine learning or sexist machine learning. It's the exact same problem, the statistics problem of trying to apply it to individuals, but in a special case. So I've got two actual online magazine quotes here that I've put up, one

[00:22:33]  Blue: from The Verge and one from Wired. The Verge one talks about how Google Photos had a problem where somebody noticed that photos of their African American friends were getting tagged as gorillas, and Google was embarrassed by that. So how did they take care of it? They simply removed "gorilla" as a label so that it couldn't possibly do that. They didn't have any direct way to go in and get it to tag correctly, so they just removed the label, and that way it couldn't do that and they wouldn't be embarrassed. So they didn't really ever fix the problem at all. The other one that's interesting is the Wired story. The way they word it kind of downplays what's really going on. The idea was that they had this thing that could predict your gender from your photo, and it worked really well. It says that if you were white, it would predict well, but if you were an African American woman, or a woman with darker skin, then it tended to err a lot. Okay, well, what's really going on here is that its sample set was probably based on a population in, say, the United States or something like that, where there were more white people than black people. And so when it trained, it found dark skin to be a good predictor that it should guess male. So it tended to guess that black women were male.

[00:24:10]  Red: Interesting.

[00:24:11]  Blue: Okay, and because they were only a small part of the training set, this didn't really screw up its statistics for the predictions overall, and so the end result was that the predicting engine was really only useful if you were white.

[00:24:29]  Red: Right. Right.

[00:24:30]  Blue: And so what they had to actually do is go back and do a separate training just for African Americans, so that it would learn to predict correctly without using skin color as the basis. You know. Go ahead.

[00:24:44]  Red: Well, just that it's super fascinating. In the first example, Google dumps tons and tons of money into things like this, so they might have been appalled at the mistake. This quote says "nearly three years on, Google hasn't really fixed anything," so I'm assuming this was in 2018. What do you think Google's trying to do to deal with these problems? I mean, I assume the claim that they haven't done anything is a little bit of a mischaracterization; I'm assuming they're actually trying to fix the problem.

[00:25:22]  Blue: Yeah. So, you know, there are techniques you can use to fix problems like this, and this is actually a big area of study right now: how to make machine learning, as they usually say, not racist.

[00:25:35]  Red: But there's a broader problem here, which is: how do you just make machine learning accurate to begin with, so that it isn't trying to make guesses for a broad population, but is better at making guesses for the population that you are a part of, that you care about?

[00:25:52]  Unknown: Right.

[00:25:53]  Blue: And there was an interesting, similar study to this. There was that guy that Social Streams has done some business with, Ben. I forget his last name. Ben Taylor.

[00:26:04]  Red: Oh yeah yeah yeah. So

[00:26:06]  Blue: I went to one of his talks at a conference, and he had created a machine learning engine that would take a picture of you and then tell you how good-looking you were.

[00:26:20]  Red: Okay. Interesting.

[00:26:22]  Blue: And obviously there are ethics around that too. I mean, do you want your kids to take a picture of themselves and then get rated as a four, you know? Yes. But it worked. Okay. It would take a picture; you could put Keira Knightley into it and it would tell you that she was good-looking. It actually did properly predict whether a person is good-looking or not. Right. And one of the things that he pointed out is that they had to do special training to get it there. The data set they got was off of some dating site they found, where they could download the data.

[00:27:03]  Unknown: So

[00:27:03]  Blue: they could download the pictures. They could download the ratings from real people.

[00:27:06]  Unknown: Right.

[00:27:07]  Blue: Interesting. And so they took this data off the site (if I remember correctly, they had to download it slowly so they didn't get caught) and then trained on it. Now, here's the thing, though: you're being trained on a population that matches whoever that dating site catered to, which, for the sake of argument, let's say is the United States. Okay. Well, you're going to have a population that matches the racial breakdown of that country, so obviously African Americans are going to be in a minority.

[00:27:46]  Unknown: Right.

[00:27:47]  Blue: And so they would then do these ratings, and what would happen, of course (I mean, you can look this up; it's been shown in a number of different studies on dating sites) is that certain races are more popular for dating than other races. Well, the ratings would match that. So if you were of some race that was less popular for dating, you would automatically get rated lower in terms of your looks. So what he did is he tweaked it so that it would rate equivalently no matter what your race was, by looking only at that subpopulation and then giving a rating based on just that subpopulation. So if it's giving you a seven, then you're a seven for that subpopulation, rather than for the overall population, and that was his way of making it non-racist.
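The adjustment described here, scoring people only against their own subpopulation, can be sketched as a within-group percentile. The scores and group labels below are purely illustrative, not Ben Taylor's actual data or method:

```python
from collections import defaultdict

def rate_within_group(records):
    """records: list of (group, raw_score) pairs. Returns a 0-10 rating for
    each record, computed only against scores from that record's own group."""
    by_group = defaultdict(list)
    for group, score in records:
        by_group[group].append(score)
    ratings = []
    for group, score in records:
        scores = by_group[group]
        # Fraction of the group this score ties or beats, scaled to 0-10.
        percentile = sum(s <= score for s in scores) / len(scores)
        ratings.append(round(10 * percentile, 1))
    return ratings

# Two groups with very different raw-score distributions land on the same
# scale relative to their own group: a "7" means a 7 within that group.
records = [("a", 90), ("a", 80), ("a", 70), ("b", 40), ("b", 30), ("b", 20)]
print(rate_within_group(records))  # [10.0, 6.7, 3.3, 10.0, 6.7, 3.3]
```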

[00:28:40]  Red: Interesting.

[00:28:41]  Blue: Now, one of the things that's interesting here, though, is this: let's say you're just a regular person on a dating site. The pre-adjusted numbers may actually represent your views quite well.

[00:28:55]  Red: Right.

[00:28:55]  Blue: So you've adjusted it so that it's not racist, but it's actually, in a sense, less accurate in terms of prediction for certain people now. Right. Presumably the people in the majority. Right.

[00:29:06]  Red: Because while the population itself may not be racist, it is likely to find the things that it finds attractive, attractive. That's right. Yeah.

[00:29:18]  Blue: Fascinating. Okay. So it kind of shows how there's no real easy answer to these questions, right? When you make it so that the machine learning isn't racist, you are giving up a certain amount of accuracy to do that, or you have to find some other way. I mean, there are other interesting things that come out of this. For instance, even if you don't have race in your training set, it might be able to figure out race through your zip code or

[00:29:46]  Red: something like

[00:29:46]  Blue: that. Right. And so it might end up becoming racist even if there’s not an ounce of race listed anywhere within the numbers that it’s playing with. If it has some sort of correlation to race it will still find it and use it. Yeah. Because it’s just try to and there’s cases like where they’ll predict your chances of going back to jail or you know something like that they’ll use it for parole. And you know if you get a case like that you know if it starts using race and it starts saying okay you know all black men are going to not get parole. Right. Now you’re really talking about straight up racist machine learning where it’s greatly favoring one race and you can see why people would want to come up with some way to tweak that so it doesn’t do that anymore.

[00:30:34]  Red: So, you know, bringing it back to statistics, it's actually interesting, because the way we use and talk about statistics within our culture is kind of that way. We use statistics a lot of times to back up our own biases. Yes. And to make assumptions about other portions of populations, or things like that, where, you know, we have statistics that say how likely people are to go back to jail, and we use them as ways to be biased against certain populations. Right. Which is really, really interesting, and there's not a way to fix that; even if you fix machine learning, it's a lot harder to fix people's brains.

[00:31:23]  Blue: Yeah. And this is actually an interesting thing: there's this big concern about racism in machine learning, but one of the things you can demonstrate is that humans are biased, racist, sexist, right? Even when they're not intending to be, they can be. But there are some really interesting examples out there. Like, there was somebody who found, statistically... so they took these inmates in jail and gave some of them facelifts, plastic surgery, so they would be more attractive,

[00:32:00]  Red: and they found that the ones they gave the surgery to didn’t return to jail.

[00:32:04]  Blue: And so they had this theory where they said, okay, it must be that now that they're more accepted, because they're more attractive, they're able to go get a job, become part of society, and there's no reason for them to, you know, rob a bank and come back to jail again. Right. So this was the original theory. Well, some scientist came up with an alternative theory. He said, I wonder if that's not what's going on, so he did a study. (I don't know if it was a he or a she, but I'm going to say he.) And they took people who were in court, and they had pictures of them, and they would rate them by their attractiveness, and then they would see how many of them ended up going to jail. And what they found is that juries do not send attractive people to jail. Oh, interesting. So this is now the alternative explanation: you gave these guys a facelift, and they aren't necessarily fitting into society better. They're just not getting sent back to jail anymore.

[00:33:05]  Red: Just a side note: the interesting thing about that study, to me, is that almost nobody goes to trial anymore. Anyway, our judicial system has basically eliminated trial as part of what it means to go through the judicial system; it is really, really rare for people to ever go to trial. In fact, I saw some statistic that said... I don't remember, actually. I don't remember.

[00:33:35]  Blue: Anyway, okay. So I have a friend who's a lawyer who told me that he and the lawyers he's partners with actually do go to trial. But if you see one of those big billboards, you know, "call so-and-so if you got injured" or something like that, he says they never go to trial, ever, and everybody knows that; they're just trying to settle out of court, right. So part of his pitch is that you should really go with a group of lawyers that intends to go to trial. They'll settle out of court, fine, but they will go to trial if they have to, because then they're actually in a more powerful position than the ones who make their money by not going to trial, right.

[00:34:16]  Red: right that’s it that’s interesting we’ll have to uh put um I think that there’s some interesting topics we could have around that I’ll we’ll have to talk about that later

[00:34:24]  Blue: Okay.

[00:34:25]  Red: All right, so what are the chances...

[00:34:27]  Blue: All right, so this was actually just intended to be a bunch of examples; we don't have to talk about any of them specifically. The point I'm making here is that we use probability theory on all sorts of different things. I've got this large list here of things that we might try to use probability theory on: some of them make perfect sense, some of them make sense as a measure of ignorance under some circumstances, and some of them don't make any sense at all.

[00:34:51]  Red: So: what are the chances of rolling a six with a six-sided die? What are the chances of winning at poker? What are the chances of catching coronavirus? You don't have it on here, but what are the chances of dying of coronavirus? What are the chances of winning the lottery? What are the chances that the 49ers will beat the Miami Dolphins? Oh, and I like this one: what are the chances that Obama is the best president of the last century?

[00:35:14]  Blue: Yes. We say things like this using probability-theory language, and it's not always clear what we mean. The chance of Obama being the best president of the last century is probably meaningless in terms of probability theory; it probably doesn't mean anything at all. What are the chances that global warming is a real problem? It's not clear whether probability-theory language makes sense in that setting or not. The chances the 49ers will beat the Dolphins? We would use probability theory there, basing it on a population of their most recent wins, and it's not clear what that means. It's probably a decent thing to do, probably not completely inappropriate the way putting odds on Obama being the best president is, but it's not super clear how meaningful it is either. So whatever odds you come up with should probably not be taken too seriously, because the outcome is actually going to be determined by a bunch of other factors. Have you ever seen or read Rosencrantz and Guildenstern Are Dead? It's a play, an absurdist, existentialist play.

[00:36:31]  Blue: and the first part of the play starts with with the two characters the one character is flipping a coin over and over again heads heads heads and he’s on this roll heads heads heads heads um and it’s this really beautiful thing because when people think about chances one of the things that we want is fortune we want chance to be applied evenly they they do studies where they will have people trying to make up um to show randomness through number strings you know going back to your six by six sided died and they might have a group of people put together a let’s pretend what a role would look like what the numbers would look like and they can always figure out which one was made by people because people want chance to be applied evenly right they want heads to roll 50 percent of the time and ultimately it will but it’s just a question of how long it takes to get there right you might roll heads a hundred times it could happen it’s because each roll is a new chance it doesn’t have any knowledge of the previous chance right

[00:37:42]  Blue: so in fact that is one of the things that um is interesting is humans don’t do not mimic randomness well and in fact um rock paper scissors which solution stream does a lot um the trick that i used that usually gets me to the i don’t always win but gets me to the end of the of the competition that got me a reputation for being good at it is that people overwhelmingly do rock right so you just select paper as your first go and then after that you just have to pick whatever right and you’ll win more often that way and my wife and i used to do rock paper scissors to determine who was going to go do something unpleasant and i won every single time and she drove her nuts right and i didn’t back then know that it was non -random i found out about that later um but for whatever reason the way i happened to play would beat her strategy that she was happening to play presumably i was less likely to go with rock on my first one and she was more likely to go with rock on her first one and so it was vastly disproportionate how often i won so rock paper scissors is not random right it’s and it’s even possible to have an explicit strategy that wins it for you so

[00:39:02]  Red: i i i like to always say i you know you play with people’s heads that’s that’s also interesting because you have the what are the chances of winning at poker you know poker is is there is a statistics that are and the chances around around getting a hand but you don’t win poker based on what hand you have right you win poker based on the way you play the the table yes and the way you manipulate the people around you and manipulate their understanding of the statistics around or the the probability of each individual hand at any given time and the cards that have already hit table and so many other things that are that have nothing to do with chance they

[00:39:47]  Blue: recently did a in this with they they’ve been working on artificial intelligence algorithms for each of the different games and so obviously we had chess originally with the blue and then we had go was was beat was dominated by alpha go they just recently did texas texas hold them and the thing that’s interesting is they discussed whether or not they should try to

[00:40:10]  Blue: try to read people's facial expressions as part of the game or not. And they decided not to, because they realized at some point that if they tried to read people's facial expressions during the game, people would learn to beat the AI by faking facial expressions. So they had it just play the odds, and it got to the point where it could beat all the humans just playing the odds. But that's not the way humans play; humans actually read each other. So anyhow, that's why I use that as an example: poker is not a purely chance thing. It seems like it's heavily based on chance, and it is, but it really isn't all chance.

Another one that's interesting is "Microsoft rising by 10 points." We measure stocks and their risk by applying probability theory, a bell curve, a normal distribution, to stocks and how much they move. But what actually causes stocks to move is events in the real world, which have nothing to do with probability distributions. This is the concept in Nassim Nicholas Taleb's The Black Swan: it's the real-life, rare, black-swan event that moves stocks. If stocks really followed a normal distribution, if financial things really followed a normal distribution, then a 10-sigma event would happen only once in the entire history of the universe, if that, whereas they happen all the time on the stock market, once every 50 years or so. Rare, but not that rare. And in fact, those rare moves determine something like half the value of the stock market: if you only had your money in the market on, say, the top 20 moves of the decade, that would account for half its value. So we try to use probability theory in a case that has something to do with probability, but it's determined by something else entirely: rare events that are completely unpredictable and do not follow the assumed distribution at all. This is what Taleb's book The Black Swan is about: the fact that we're trying to use probability theory in situations that have something to do with probability but do not follow our probability distributions. And so we use the numbers because it's, quote, the best we can do, but really it's just an inappropriate use of probability theory.
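The 10-sigma claim can be checked with a quick back-of-the-envelope calculation (mine, not the episode's; the 250-trading-days figure is a standard approximation). Under a standard normal distribution, the upper-tail probability beyond 10 standard deviations is so small that a daily process would wait vastly longer than the age of the universe for one such event:

```python
import math

def normal_upper_tail(sigma):
    """P(Z > sigma) for a standard normal variable, via the error function."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

p10 = normal_upper_tail(10.0)
print(p10)  # roughly 7.6e-24

# If daily market moves were truly Gaussian (~250 trading days a year),
# the expected wait for a single 10-sigma day would dwarf the
# ~1.4e10-year age of the universe.
expected_wait_years = 1 / (p10 * 250)
print(expected_wait_years)
print(expected_wait_years / 1.4e10)  # how many universe-ages that is
```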

[00:42:48]  Red: well and going going back to i think you’re very one of your first slides um we’re it it’s us using statistics to cover our ignorance again um so what are our chances of dying of uh the

[00:43:06]  Blue: coronavirus you know we don’t know right and what they’re trying to do is they’re trying to say okay out of the people that have the coronavirus how many of them die well that that makes a certain amount of sense right that in terms of using probability to cover our ignorance you would think that look if you know if a hundred people got the coronavirus and one of them dies then your chances of dying are one in 100 right okay but first of all we don’t actually know what the population is because we don’t know how many people have the coronavirus and how many don’t right we only know of the cases that reported and obviously there’s going to be a disproportionate number of cases reported if you die yes so that automatically inflates the numbers in terms of deaths from the coronavirus so that’s why will you hear like 3.8 or something like that that’s where that number comes from and it won’t turn out to be that bad we don’t know how bad it is it could be bad the flu is 0.1 percent okay coronavirus might be like 0.7 percent that’d be seven seven times higher than the flu which is why we’re being so cautious with the coronavirus because it may be that it kills quite a bit more often than the flu right however even then it’s not an evenly distributed across the population it like the flu i mean like i get the flu all the time and i’m never even a little bit worried about dying right here because i’m not in the population that’s the most at risk for dying from the flu which would be older people and the coronavirus is going to have the same sort of thing your your chances of dying from it will depend on your age right i mean to say let’s say that it was a one percent death rate it for you personally it won’t be one percent it will depend on what your age is right well

[00:44:57]  Red: and and age is also likely only one of the factors that’s correct you know did you smoke for 30 years 30 years of of your you know you’re you’re 60 and you’ve been smoking for the last 30 or 40 years um uh how good is your health care in your community um what i suspect is that we’ll see populations that are much much higher death rates um and that we’ll see populations that are much much lower death rates depending on even with even maybe potentially at the state level you know Utah’s a very healthy state right we end up with much lower death rates than places where there’s a lot of obesity i mean we we haven’t actually seen or have any real knowledge as to what the contributors are to death

[00:45:49]  Blue: right and obviously it’s also going to be related to what’s your living conditions how good is your health care system it’s going to be based on a number of different things probably based on um earlier on versus later on when it was catching people off guard versus now that we have knowledge of that it is this virus we have to worry about right probably greatly affect the death rates so trying to use probability theory in this case maybe makes some sense but again you have to keep in mind that statistics apply to populations not to you right right well that was a great wrap up for this week’s

[00:46:30]  Red: podcast yes um and on that note um thank you this was fun this was a great conversation yes

[00:46:38]  Blue: thank you cameo

