## Stream: general

### Topic: Applied statistics

#### Simon Hudon (Jul 05 2018 at 23:41):

I haven't done any statistics in a while. Can someone point me to a formula I can use to calculate the probability that x ≤ y if both are normally distributed random variables?

#### Kenny Lau (Jul 05 2018 at 23:42):

mean and variance?

#### Simon Hudon (Jul 05 2018 at 23:42):

Yes using the mean and variance.

#### Simon Hudon (Jul 06 2018 at 00:06):

Do you have a formula for that?

#### Mario Carneiro (Jul 06 2018 at 01:30):

This is like taking the 2D Gaussian and drawing a line across the distribution; if you project parallel to that line the projected distribution is another Gaussian so the answer should be Erf of some simple term involving the mean and variance

#### Simon Hudon (Jul 06 2018 at 01:37):

I'm not knowledgeable enough to make sense of that answer

#### Simon Hudon (Jul 06 2018 at 01:42):

Maybe it would be useful to reveal more context of my problem. I have two tactics. I'm running them each ~30 times, measuring average running time and variance. I'd like to check if one is actually faster than the other

#### Simon Hudon (Jul 06 2018 at 01:43):

The difference is not big but I'd like to know where I'm heading with the optimization

#### Mario Carneiro (Jul 06 2018 at 01:44):

If the variables are $X \sim N(a, \sigma_X)$ and $Y \sim N(b, \sigma_Y)$ then let $X = a + \sigma_X X'$ and $Y = b + \sigma_Y Y'$ where $X'$ and $Y'$ are standard normal variables; then $a+\sigma_X X' \le b + \sigma_Y Y'$ iff $\sigma_Y Y' - \sigma_X X' \ge a - b$. But $\sigma_Y Y' - \sigma_X X'$ is a Gaussian with SD $\sqrt{\sigma^2_X + \sigma^2_Y}$, so the probability is $\mathrm{erf}\frac{a-b}{\sqrt{\sigma^2_X + \sigma^2_Y}}$

#### Mario Carneiro (Jul 06 2018 at 01:48):

At the first order the answer to your question is easy: If the mean is less then it's probably an improvement

#### Mario Carneiro (Jul 06 2018 at 01:48):

but the larger the standard deviation the less confidence you have in this assertion

#### Simon Hudon (Jul 06 2018 at 01:49):

Right. One of them has a pretty large standard deviation

#### Simon Hudon (Jul 06 2018 at 01:49):

Is it weird that it only started making sense when you brought up the formula? :)

#### Mario Carneiro (Jul 06 2018 at 01:51):

By the way there is a square root in the last two math blocks there, but in my mathjax display I only see a conspicuous space

#### Simon Hudon (Jul 06 2018 at 01:53):

I see the same. You were missing a bracket and a \$ so I ended up compiler your latex myself. I see both square roots

#### Mario Carneiro (Jul 06 2018 at 01:54):

I think I've fixed all the delimiter problems above, but the square root still isn't rendering. You know how to fix it?

#### Simon Hudon (Jul 06 2018 at 01:56):

You're right, it's now rendering almost properly. I haven't played with mathjax so I wouldn't know where to start

#### Mario Carneiro (Jul 06 2018 at 01:57):

MSE uses mathjax and there isn't much trouble using $\sqrt x$ there

#### Reid Barton (Jul 06 2018 at 01:57):

Zulip actually uses katex, not mathjax

#### Simon Hudon (Jul 06 2018 at 01:57):

$x$

#### Mario Carneiro (Jul 06 2018 at 01:58):

you have to use double dollar delimiter

#### Mario Carneiro (Jul 06 2018 at 01:58):

does $\sqrt x$ work?

Nice thanks!

#### Simon Hudon (Jul 06 2018 at 01:59):

$\sqrt x$

No

#### Simon Hudon (Jul 06 2018 at 01:59):

$\sqrt x$

#### Mario Carneiro (Jul 06 2018 at 01:59):

Not sure how to do display math either

#### Simon Hudon (Jul 06 2018 at 01:59):

$\sqrt x$

oh, there it is

#### Simon Hudon (Jul 06 2018 at 02:00):

If you do "three ticks" math (like you would for Lean code), then it seems to work

#### Simon Hudon (Jul 06 2018 at 02:05):

If I try my tactic on three problems, I evaluate the formula you gave me for each problem, how do you recommend I aggregate the data (if I need to)?

#### Mario Carneiro (Jul 06 2018 at 02:22):

To make it a bit easier, you can subtract the two times for each problem, resulting in a random variable that you want to compare to zero

#### Mario Carneiro (Jul 06 2018 at 02:23):

then you take an average weighted by how many trials were performed in each problem

#### Mario Carneiro (Jul 06 2018 at 02:26):

the standard deviation would be the sqrt-sum of the individual SDs, divided by 3 since there are three subproblems (assuming equal weights on the three problems)

#### Simon Hudon (Jul 06 2018 at 02:29):

Does it matter than one of the problem is much harder than the others? It has a run time around 16s while the others are around 7s and 3s

#### Mario Carneiro (Jul 06 2018 at 02:29):

Oh, I see, yes that changes things

#### Mario Carneiro (Jul 06 2018 at 02:30):

I'm assuming in this analysis that we have two gaussian variables X and Y and want to estimate them

#### Mario Carneiro (Jul 06 2018 at 02:30):

but this supposes that the mean and SD are fixed across the sample

#### Simon Hudon (Jul 06 2018 at 02:30):

That sounds right

#### Simon Hudon (Jul 06 2018 at 02:31):

I don't understand your last sentence. What's the opposite of fixed?

#### Mario Carneiro (Jul 06 2018 at 02:32):

We can assume that problem 1 has a gaussian distrib of times for each tactic, but there is no reason to believe that this same gaussian describes the times for problem 2

#### Simon Hudon (Jul 06 2018 at 02:34):

I think that's reasonable

#### Mario Carneiro (Jul 06 2018 at 02:37):

So we have to make some assumption about what we are estimating. Let's say we have good data on each problem, so we can say things like problem 1 is 2s +- .1s for tac 1 and 4s +- 1s for tac 2, and similarly for problem 2. If the goal is to minimize total run time over all future uses of tac1 vs tac2, then we need to assume that problem 1 and problem 2 are somehow representative of those future uses

#### Mario Carneiro (Jul 06 2018 at 02:37):

If we think problem 2 is more common than problem 1, then it will get a higher weight in the average

#### Simon Hudon (Jul 06 2018 at 02:38):

The three problems are deriving transportable for monoids, groups and rings. Rings are very expensive

#### Simon Hudon (Jul 06 2018 at 02:39):

We could believe that the complexity is linear in the number of fields of the structure

#### Mario Carneiro (Jul 06 2018 at 02:39):

I think it is safe to say that each of those will be run only once, but perhaps you want to extrapolate to even larger examples, or smaller?

#### Mario Carneiro (Jul 06 2018 at 02:40):

In other words you should estimate the common use case size

#### Simon Hudon (Jul 06 2018 at 02:42):

Digression: should there be an absolute value in your formula? It's giving me negative numbers

#### Simon Hudon (Jul 06 2018 at 02:42):

I think as you say there will be an average size for the problems where this is used

#### Mario Carneiro (Jul 06 2018 at 02:49):

You mean the a-b/sqrt(sd) formula?

#### Mario Carneiro (Jul 06 2018 at 02:51):

there should not be an absolute value, although I think I may have a sign error in there - one sign means tac1 is better, the other means tac2 is better

#### Simon Hudon (Jul 06 2018 at 02:52):

Ah I see! The highest probability I get is 50%. I'll keep the optimization in but that doesn't seem conclusive

#### Mario Carneiro (Jul 06 2018 at 02:52):

when you use erf it turns the number into a probability which is close to 100% for positive numbers and close to 0 for negative numbers

#### Mario Carneiro (Jul 06 2018 at 02:53):

50% means you have a very small difference, of course

#### Simon Hudon (Jul 06 2018 at 02:53):

Really? I don't get that. Maybe I should be suspicious of my erf implementation.

#### Mario Carneiro (Jul 06 2018 at 02:54):

you can report the number without the erf, that gives the number of SDs away from the mean

#### Mario Carneiro (Jul 06 2018 at 02:55):

wait, did you just push some haskell to mathlib?

#### Simon Hudon (Jul 06 2018 at 02:55):

On my branch yes. I'll take it out before pushing for a merge

#### Simon Hudon (Jul 06 2018 at 03:01):

I checked with another erf implementation, the results are consistent. I get the following numbers, one for each problem:

0.3548605107843447
0.392231363819503
0.5018422725422864


#### Simon Hudon (Jul 06 2018 at 03:01):

Is the last one 50% or 0.50%?

#### Mario Carneiro (Jul 06 2018 at 03:01):

I would assume 50%

#### Mario Carneiro (Jul 06 2018 at 03:02):

erf returns numbers in the range 0-1

#### Simon Hudon (Jul 06 2018 at 03:03):

Ok, I got confused when you said:

when you use erf it turns the number into a probability which is close to 100% for positive numbers and close to 0 for negative numbers

#### Mario Carneiro (Jul 06 2018 at 03:05):

Oh, I just checked wikipedia and it looks like the normalization conventions are a bit different for erf

#### Mario Carneiro (Jul 06 2018 at 03:05):

there is a factor sqrt 2 on the inputs, and it returns a value in the range -1 to 1

#### Mario Carneiro (Jul 06 2018 at 03:06):

Maybe you should just leave the erf off and interpret the number of SDs away from the mean directly

#### Mario Carneiro (Jul 06 2018 at 03:06):

what I really want there is the CDF of the standard normal distribution

#### Mario Carneiro (Jul 06 2018 at 03:07):

which is basically erf but needs some preprocessing

#### Mario Carneiro (Jul 06 2018 at 03:09):

wikipedia says the CDF is

$CDF_{N(\mu,\sigma)}(x)=\frac{1}{2}\left[1+\mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt 2}\right)\right]$

#### Simon Hudon (Jul 06 2018 at 03:10):

The erf library in haskell has a function normcdf with normcdf x = erfc(-x sqrt 2) ^ 2 does that make sense?

#### Mario Carneiro (Jul 06 2018 at 03:10):

based on the name that's likely to be the right one

#### Mario Carneiro (Jul 06 2018 at 03:11):

try evaluating it at -1, 0, 1

#### Simon Hudon (Jul 06 2018 at 03:13):

0.15865525393145705
0.5
0.8413447460685429


that's correct

#### Mario Carneiro (Jul 06 2018 at 03:15):

use that in place of erf in the formula I gave before

Thanks!

#### Simon Hudon (Jul 06 2018 at 03:17):

That looks much better!

#### Simon Hudon (Jul 06 2018 at 03:18):

0.6276517385920415
0.6416715838500027
0.6840264065071848


#### Simon Hudon (Jul 06 2018 at 03:18):

And if I flip aand b in a - b, I get:

0.37234826140795846
0.3583284161499974
0.3159735934928151


#### Simon Hudon (Jul 06 2018 at 03:19):

If we use b - a, that's the probability that b ≥ a, is correct?

yes

Cool

#### Mario Carneiro (Jul 06 2018 at 03:20):

The interpretation is something like "given the data you gathered, there is a 62% chance that in fact tac 1 is better than tac 2 on problem 1"

#### Simon Hudon (Jul 06 2018 at 03:21):

Btw, unfold_projs yielded a big improvement in the end: Lean stopped crashing

#### Mario Carneiro (Jul 06 2018 at 03:21):

ooh, what caused the crash?

Timeout

#### Simon Hudon (Jul 06 2018 at 03:22):

I suspect that definitions generated by a tactic don't play nicely with simp / dsimp / unfold / dunfold if you try to unfold them

#### Mario Carneiro (Jul 06 2018 at 03:24):

you can write your own simp lemmas for them if you want

#### Simon Hudon (Jul 06 2018 at 03:25):

Yes, I tried that. It did help a bit

#### Simon Hudon (Jul 06 2018 at 03:26):

But not as much as unfold_projs. It removed the need to hard code has_mul.mul et cie

#### Simon Hudon (Jul 06 2018 at 03:35):

I don't know about you but I'm having so much fun with that stuff (Lean) lately :D

#### Kenny Lau (Jul 06 2018 at 06:47):

I haven't done any statistics in a while. Can someone point me to a formula I can use to calculate the probability that x ≤ y if both are normally distributed random variables?

oh of course it's 50%, it's symmetric

#### Kevin Buzzard (Jul 06 2018 at 07:23):

oh of course it's 50%, it's symmetric

I think the means are different

Last updated: May 08 2021 at 11:09 UTC