These are the counts produced by cty_sub_big.cc:

    total     = 95412
    total_sub = 15000
    yes       = 4843
    yes_sub   = 2430
    no        = 90569
    no_sub    = 12570
    freq      = 0.0507588
    freq_sub  = 0.162
    OSR       = 3.19156

The raw data set contains 95412 records, of which 15000 were randomly sampled for cty_sub.dat.

The proportion of donors in the raw data set is 5.07588% (4843 of 95412). The proportion of donors in the subset is 16.2% (2430 of 15000).

This is an oversampling rate (OSR) of 0.162/0.0507588 = 3.19156.
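
The arithmetic behind freq, freq_sub and OSR can be sketched in a few lines of C++. This is only an illustration and is not taken from cty_sub_big.cc: the names Counts, donor_freq and oversampling_rate are hypothetical, and the sketch assumes that "yes" counts the donors in each data set.

```cpp
#include <cassert>

// Hypothetical helper type; not taken from cty_sub_big.cc.
struct Counts {
    long total;  // number of records in the data set
    long yes;    // number of donors among those records (assumed meaning)
};

// Proportion of donors in a data set (freq or freq_sub in the listing).
double donor_freq(const Counts& c) {
    assert(c.total > 0 && c.yes >= 0 && c.yes <= c.total);
    return static_cast<double>(c.yes) / static_cast<double>(c.total);
}

// Oversampling rate: donor proportion in the subset divided by
// the donor proportion in the raw data set.
double oversampling_rate(const Counts& raw, const Counts& sub) {
    assert(raw.yes > 0);  // avoid dividing by a zero frequency
    return donor_freq(sub) / donor_freq(raw);
}
```

With the raw counts {95412, 4843} and the subset counts {15000, 2430}, oversampling_rate should reproduce the OSR reported above, up to rounding.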