On-line Probability Estimation
Posted on 2025-12-20 by xkollar.
Maybe one day I’ll turn this into a real article, but for now here are just some points and a picture:
- For cases where we don’t have a “total population” and samples just come
  to us one-by-one, we can’t use the approach from the previous article.
- But we can use the part from “Stumbling in the Dark” and realize that
  after n samples with no errors

      P(observed no errors) = (1-P_err)^n

- Some math is possible (https://en.wikipedia.org/wiki/Rule_of_three_(statistics)),
  but also just binary search to find P_err for a given confidence
  (a rough sketch of that follows right after these points).
- Logs are again useful.
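For the binary-search option, something like the following would do (a rough Python sketch; the function name, parameters, and iteration count are mine, picked for illustration): keep narrowing an interval for P_err until (1-P_err)^n dips below 1-c.

    def p_err_upper_bound(n: int, confidence: float = 0.95, iterations: int = 100) -> float:
        """Smallest P_err such that seeing n error-free samples would be
        surprising at the given confidence, i.e. (1 - P_err)^n <= 1 - confidence."""
        lo, hi = 0.0, 1.0  # P_err is somewhere in [0, 1]
        for _ in range(iterations):
            mid = (lo + hi) / 2
            if (1 - mid) ** n <= 1 - confidence:
                hi = mid  # mid is already "confident enough"; try a smaller P_err
            else:
                lo = mid  # not confident enough; P_err must be larger
        return hi

    print(p_err_upper_bound(5))    # ~0.4507
    print(p_err_upper_bound(100))  # ~0.0295, close to the rule-of-three value 3/100

Bisection works here because (1-P_err)^n is monotonically decreasing in P_err, so there is a single crossing point to home in on.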
 Errs | Samples   | Probability   | Comment
------+-----------+---------------+------------------
   0  | . . . . . | (1-P_err)^5   | <- You are here!
------+-----------+---------------+------------------
   1  | . . . . X | 1-(1-P_err)^5 | Part with errors
      | . . . X . |               | that we haven't
      | . . X . . |               | observed.
      | . X . . . |               |
      | X . . . . |               | For 95% confidence
------+-----------+               | we need this part
   2  | . . . X X |               | to be at least 95%
      | . . X . X |               |
      | . . X X . |               |
      | . X . . X |               |
      | . X . X . |               |
      | . X X . . |               |
      | X . . . X |               |
      | X . . X . |               |
      | X . X . . |               |
      | X X . . . |               |
------+-----------+               |
   3  | . . X X X |               |
      | . X . X X |               |
      | . X X . X |               |
      | . X X X . |               |
      | X . . X X |               |
      | X . X . X |               |
      | X . X X . |               |
      | X X . . X |               |
      | X X . X . |               |
      | X X X . . |               |
------+-----------+               |
   4  | . X X X X |               |
      | X . X X X |               |
      | X X . X X |               |
      | X X X . X |               |
      | X X X X . |               |
------+-----------+               |
   5  | X X X X X |               |
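A quick sanity check of the picture (a throwaway Python sketch; the function name and the P_err value are arbitrary): enumerate all 2^5 outcomes and confirm that the single “no errors” row carries (1-P_err)^5 of the probability mass while all the other rows together add up to 1-(1-P_err)^5.

    from itertools import product

    def check_picture(p_err: float, n: int = 5) -> None:
        """Enumerate all 2^n outcomes (True = error) and split their probability
        mass between the single "no errors" row and everything below it."""
        p_no_errors = 0.0
        p_some_errors = 0.0
        for outcome in product([False, True], repeat=n):
            p = 1.0  # probability of this particular sequence of samples
            for is_error in outcome:
                p *= p_err if is_error else (1.0 - p_err)
            if any(outcome):
                p_some_errors += p
            else:
                p_no_errors += p
        print(f"P(no errors)   = {p_no_errors:.6f} vs (1-P_err)^n   = {(1 - p_err) ** n:.6f}")
        print(f"P(some errors) = {p_some_errors:.6f} vs 1-(1-P_err)^n = {1 - (1 - p_err) ** n:.6f}")

    check_picture(0.1)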
To have c-confidence, we want

    (1-P_err)^n <= (1-c)

But high values of P_err are not very interesting (for example, P_err = 1
satisfies the inequality but is completely uninteresting), so we want to find
as small a P_err as possible, which gets us to

    (1-P_err)^n = (1-c)
And again, running things through logs makes it a smidge faster.
    log (1-P_err)^n = log (1-c)
    n * log (1-P_err) = log (1-c)
    P_err = 1 - exp(log (1-c) / n)
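That closed form can be computed directly. A tiny Python sketch (again, the names are mine) that does exactly that and compares it with the 3/n shortcut from the linked rule-of-three article (which assumes 95% confidence):

    import math

    def p_err_for_confidence(n: int, confidence: float = 0.95) -> float:
        """Solve (1 - P_err)^n = 1 - confidence for P_err via logs:
        n * log(1 - P_err) = log(1 - confidence)
        P_err = 1 - exp(log(1 - confidence) / n)
        """
        return 1.0 - math.exp(math.log(1.0 - confidence) / n)

    for n in (5, 30, 300, 3000):
        print(f"n={n:5d}  exact={p_err_for_confidence(n):.5f}  rule of three={3 / n:.5f}")

For n = 5 the exact value lands around 0.45, while as n grows the two columns get closer and closer.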