怎么证明一个算法代码的正确性

#算法证明

##背景

看到一个Reservoir Sampling算法,即水塘抽样算法。这个算法是用来解决从一个长度未知的集合中,随机选择K个元素的问题。先看一下算法的实现:

final int SAMPLE_COUNT = 10;
int index = 0;

Object[] samples = new Object[SAMPLE_COUNT];
Random rnd = new Random();
for (Object element in stream) {
	if (index < SAMPLE_COUNT) {
		samples[index] = element;		
	} else {
		int r = rnd.nextInt(0, index);
		if (r < SAMPLE_COUNT) {
			samples[r] = element;
		}
	}
	index++;
}

##如何证明这个算法的正确性

如何证明这个算法的正确性呢,在网上找到两个相关的证明,一个是来自wiki:

alt text

证明方式采用的是演绎法,即假设一行被选中,且后续所有行都没有覆盖这一行的概率。Pn是第n行被选中的概率,依次类推后续任一[n+1 : N]行没有覆盖这一行的概率=1-Pk,所以程序结束,任一一行n被选中的概率就等于Pn乘以后续所有(1-Pk)

===

一个是来自MSDN:

Base case is trivial. For the k+1st case, the probability a given element i with position <= k is in R is s/k. The prob. i is replaced is the probability k+1st element is chosen multiplied by i being chosen to be replaced, which is: s/(k+1) * 1/s = 1/(k+1), and prob that i is not replaced is k/k+1.

So any given element's probability of lasting after k+1 rounds is: (chosen in k steps, and not removed in k steps)

= s/k * k/(k+1), which is s/(k+1).

So, when k+1 = n, any element is present with probability s/n.

这里用的是归纳法,翻译过来就是设第i个位置被替换成第k个元素的概率是1/k,所以第k+1次迭代,i这个位置仍然是k的。

Written on December 9, 2014