#12
|
|||
|
|||
Re: Simple storage problem (math)
This is a worthwhile idea to consider. Define g(p) to be a
"modified gap" function g(p) = (q-p-2)/2 where q is the least prime >p. [ Take advantage that the difference between successive primes is even. ] First, instead of stopping at the prime 13, go up to 23 [ 2x3x5x...x23 < 2^32 but 2x3x5x...x29 > 2^32 ] so the relevant factor is about 6.112910518 (ratio of highest "candidate" to bits used). Now encode all primes <10^8 using the basic idea. Note the average gap at about 10^8 between primes is about 18.42 or g(p) is around 8.21 on average already. Second, using the fact that pi(100 million) = 5 761 455, use pzhon's idea so that the encoding takes only about log_2(C(16 358 820, 5 761 446)) bits or about 15 312 000 bits which is a little bit better than 16 358 820 bits (we save more than 1 megabit). [ How efficiently are the primes encoded? Ignoring 2,3,5,..., 29 for now, on average about 2.6577 bits per prime are used. If pzhon's idea is not used, it's about 2.8394 bits per prime and if just the basic idea is used for all primes <2.29x10^9, the average would be about 3.3532 bits per prime.] So there remains about 424.688 megabits for the remaining 105 959 352 primes or just above 4.008 bits per prime. Is that enough to naively encode g(p)? The "jumping champion" is 6 (most common difference between primes) for any "big" range <1.74x10^35 (see http://mathworld.wolfram.com) , so the most common value for g(p) is 2, so it's possible that for many values of p in (10^8, 2.29x10^9), 0<=g(p)<=14 so only four bits are used for the vast majority of p. Also note the maximum value of g(p) is 145 in the range ( see: http://primes.utm.edu/notes/GapsTable.html ). On the other hand, the average for g(p) should be around 9.5 since ln(2.29x10^9)/2 is about 10.776, i.e., we should expect many values of p in the relevant range such that g(p)>14 (where we would need 4 bits + several more to encode the value of g(p)). It seems that the "budget" of 4.008 bits per value of g(p) will be slightly exceeded to naively encode the difference. OTOH, there could also be enough p such that 0<=g(p)<=6 so that a 3-bit scheme could be used at first instead, but I suspect that there are too many p in the range for which g(p)>=7. Yes, it's close at 100 million. If 200 million is used instead, we should be there. |
|
|