Summary: Naked Statistics

[ NOTE: These book summaries are designed as captures for what I’ve read, and aren’t necessarily great standalone resources for those who have not read the book. Their purpose is to ensure that I capture what I learn from any given text, so as to avoid realizing years later that I have no idea what it was about or how I benefited from it. ]

Capture

  • The point of statistics is to make better decisions about how to live our lives
  • When you say who’s the “best” at something, that’s subjective and could mean many things
  • An index is a single number that represents multiple descriptive metrics
  • It’s very easy to deceive with statistics, because there are so many ways to say true things that don’t answer the question actually being asked
  • Mean is all observations added up and divided by the number of observations
  • Median is the value that has an equal number of observations above and below it
  • Correlation measures how closely two variables move together: as one changes, the other tends to change too (in the same or the opposite direction)
  • In a normal distribution, about 68% of observations fall within 1 SD of the mean, and about 95% fall within 2 SD
  • The standard error is the standard deviation of sample means: it measures how far a sample’s mean is likely to be from the mean of the overall population
  • The central limit theorem says that if you take many large random samples from a population, their means will form a bell curve around the population mean, even if the population itself isn’t normally distributed
  • As a rule of thumb, each sample needs at least 30 observations for the central limit theorem to apply
  • When you do regression analysis you fit a line through the data, and the coefficient is the slope of that line: how much the outcome changes for each one-unit change in the explanatory variable
  • If you have bad data, bad samples, etc., there’s little statistics can do to help you
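  • When you do regression analysis you are trying to show that the null hypothesis (e.g., that there is no relationship) is unlikely. You aren’t proving it’s wrong; you’re using the central limit theorem and other fundamentals to show that a result like yours would be improbable, at some chosen significance level such as 5%, if the null hypothesis were true. But you can still be wrong.
  • Something called the sum of squares helps you understand how far off your line is from the data: regression picks the line that minimizes the sum of the squared distances between each observation and the line. I don’t fully understand it yet, though. Need Carl to explain it to me.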
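The next sketch checks the 68%/95% rule with `statistics.NormalDist`. It also simulates the central limit theorem and the standard error. The exponential population (mean 1, SD 1), the sample size of 30, and the seed are arbitrary assumptions chosen for illustration. Individual draws are heavily skewed, yet the sample means cluster in a bell shape. Their spread matches the predicted standard error, SD / √n.

```python
import math
import random
import statistics
from statistics import NormalDist

# Share of a normal distribution within 1 and 2 standard deviations of the mean.
std_normal = NormalDist(mu=0, sigma=1)
within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)  # about 0.683
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)  # about 0.954

# Central limit theorem: sample from a skewed population (exponential, mean 1, SD 1).
random.seed(42)
POP_MEAN, POP_SD = 1.0, 1.0
SAMPLE_SIZE = 30      # the rule-of-thumb minimum
NUM_SAMPLES = 10_000

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

# Standard error predicted from the population SD vs. the spread actually observed.
predicted_se = POP_SD / math.sqrt(SAMPLE_SIZE)
observed_se = statistics.stdev(sample_means)

# The sample means follow roughly the same 68%/95% pattern as a normal distribution.
share_within_1se = sum(abs(m - POP_MEAN) <= predicted_se for m in sample_means) / NUM_SAMPLES
share_within_2se = sum(abs(m - POP_MEAN) <= 2 * predicted_se for m in sample_means) / NUM_SAMPLES

print(f"normal: {within_1sd:.3f} within 1 SD, {within_2sd:.3f} within 2 SD")
print(f"SE predicted {predicted_se:.3f}, observed {observed_se:.3f}")
print(f"sample means: {share_within_1se:.3f} within 1 SE, {share_within_2se:.3f} within 2 SE")
```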
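The sketch below illustrates "showing the null hypothesis is unlikely" with a simple normal-approximation z-test for a proportion. It is a swapped-in example rather than a regression test. The coin-flip data, the 5% significance level, and the helper name `two_sided_p_value` are assumptions. Sixty heads in 100 flips gives z = 2 and p ≈ 0.046, so a fair coin is rejected at 5%. Even so, a test at that level will wrongly reject about 1 in 20 true null hypotheses.

```python
import math
from statistics import NormalDist

def two_sided_p_value(successes, trials, null_rate):
    """Normal-approximation z-test for a proportion (a sketch, not an exact test)."""
    observed_rate = successes / trials
    # Standard error of the proportion if the null hypothesis were true.
    se = math.sqrt(null_rate * (1 - null_rate) / trials)
    z = (observed_rate - null_rate) / se
    # Probability of a result at least this extreme, in either direction, under the null.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical data: 60 heads in 100 flips; null hypothesis: the coin is fair.
z, p = two_sided_p_value(60, 100, 0.5)
ALPHA = 0.05  # chosen significance level
print(f"z={z:.2f}, p={p:.4f}, reject null at 5%: {p < ALPHA}")
```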

Lessons

  • The Central Limit Theorem seems to be the most important idea in statistics
  • There are SO MANY traps to doing statistics correctly, such as assuming that spending more money on schools produces better results, when those schools might simply have a lot of rich students
  • One of them is reverse causality, such as concluding that golf lessons cause bad golf, when it’s really bad golf that causes people to take golf lessons
  • Standard deviations are KEY
  • Standard error is KEY

Questions

  • Is Machine Learning basically doing opaque (neural net) magic in the Sum of Squares space?

[ Find my other book summaries here. ]

__

I do a weekly show called Unsupervised Learning, where I curate the most interesting stories in infosec, technology, and humans, and talk about why they matter. You can subscribe here.

Source: http://feeds.danielmiessler.com
