
The Heiwa Challenge 2


JebKeb


4 hours ago, Findthepin1 said:

Now that I've read that, I've realized I have no idea how file compression works. Any explanations?

The last couple of posts seem to have covered this.

The main problem is the question of "repetition" in random data.

Assuming I've understood the universe correctly: there will always be some repetition in random data. Therefore, if you have a reasonable volume of data, you can always compress it slightly by looking for repetitions and then adding an algorithm to find them, reference them and splice them back in. This necessarily gives a net data saving.

The exploit (the challenge that was rejected) was simply taking a single number and treating this as a repetition, and splitting the data into bundles ending with that number. Rather than referencing the repetition, it called it "End Of File". It works, but if you try to save it to a USB stick (for example), all those EOFs take up more space than the original number did.
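A rough sketch of that exploit (the marker digit, the sample data and the "<EOF>" spelling below are all made up for illustration):

```python
# Toy version of the "EOF" exploit: pick a single digit, treat it as the
# repetition, split the data on it, and write an end-of-file marker where
# each piece stops. Writing those markers back out costs more than the
# digit occurrences they replaced.

data = "314159265358979323846264338327950288"   # made-up sample digits
marker = "9"                                     # the chosen "repeated" digit

chunks = data.split(marker)        # every occurrence of the marker is removed
stored = "<EOF>".join(chunks)      # what would actually land on the USB stick

print(len(data), "digits originally")
print(len(stored), "characters once the EOF markers are written out")
```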

Still, it would be trivial (but time-consuming) to find some way of restating the original data to create some sort of repetition, and therefore a data-saving algorithm. If the data is truly random, it has to contain redundancy. If it is random, there must be some 6- or 7-digit sequence in 1 MB that appears a few times; you can therefore split the file by that sequence, add it back in with a very simple algorithm, and make a net saving.

The main problem is if that supposedly "random" data isn't at all random. With care, you can scrub your random data to give a result which is ur-random: it is not just random but is guaranteed to remain random even if you look for repetitions. It's fake, it's a product of intelligence, but it takes even greater intelligence to reduce it to a slightly smaller dataset, and that might not even be possible.


1 hour ago, Plusck said:

The last couple of posts seem to have covered this.

The main problem is the question of "repetition" in random data.

Assuming I've understood the universe correctly: there will always be some repetition in random data. Therefore, if you have a reasonable volume of data, you can always compress it slightly by looking for repetitions and then adding an algorithm to find them, reference them and splice them back in. This necessarily gives a net data saving.

The exploit (the challenge that was rejected) was simply taking a single number and treating this as a repetition, and splitting the data into bundles ending with that number. Rather than referencing the repetition, it called it "End Of File". It works, but if you try to save it to a USB stick (for example), all those EOFs take up more space than the original number did.

Still, it would be trivial (but time-consuming) to find some way of restating the original data to create some sort of repetition, and therefore a data-saving algorithm. If the data is truly random, it has to contain redundancy. If it is random, there must be some 6- or 7-digit sequence in 1 MB that appears a few times; you can therefore split the file by that sequence, add it back in with a very simple algorithm, and make a net saving.

The main problem is if that supposedly "random" data isn't at all random. With care, you can scrub your random data to give a result which is ur-random: it is not just random but is guaranteed to remain random even if you look for repetitions. It's fake, it's a product of intelligence, but it takes even greater intelligence to reduce it to a slightly smaller dataset, and that might not even be possible.

Sorry. If the data is random, then only half the possible datasets can be compressed (no matter how you divide the datasets). And of those compressible datasets, only half can be compressed by more than a single bit. And of those compressible by more than a single bit, half can only be compressed by two bits, and so on and so on. And while you may argue that you have compressed each dataset on average by a single bit, you still need to mark whether you have a compressed set or not, which puts you back where you started. You can't compress random data.
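A minimal sketch of that counting, assuming binary strings:

```python
# Pigeonhole counting: there are 2**n bit strings of length n, but only
# 2**n - 1 bit strings that are strictly shorter. A lossless compressor
# that shrank every n-bit input would have to map two different inputs
# to the same output, so decompression could not tell them apart.

n = 16
inputs  = 2 ** n                          # distinct n-bit strings
shorter = sum(2 ** k for k in range(n))   # strings of length 0 .. n-1

print(f"{inputs} inputs, but only {shorter} strictly shorter outputs")
assert shorter == inputs - 1              # always exactly one short, for every n
```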

Below is my (bad) proof that random data is incompressible (google "pigeonhole principle random compression" for a better one).

 


4 hours ago, cantab said:

On the contrary, it's kind of the definition of "random data" that there is no repetition and no redundancy.

No, wrong. A truly random sequence, if long enough, is bound to have repetition.

While I can't claim random.org to be truly random, it's as close as I can get. This is the sequence generated by it:

http://pastebin.com/RdcsdcXf

A very rudimentary analysis shows many instances of the same digit repeated twice in a row, quite a few runs of three in a row, and also four 3s in a row, four 8s in a row and even five 9s in a row.

All this in a relatively short sequence of only 1000 digits.
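A quick way to check a sequence like that for runs (a sketch; it assumes the linked digits have been saved locally as digits.txt):

```python
# Count runs of identical digits in a digit sequence, e.g. the random.org
# output linked above saved locally as digits.txt (the filename is assumed).
from itertools import groupby

with open("digits.txt") as f:
    digits = "".join(ch for ch in f.read() if ch.isdigit())

runs = [(digit, len(list(group))) for digit, group in groupby(digits)]
longest = max(runs, key=lambda r: r[1])

print("total digits:", len(digits))
print("runs of 3 or more:", sum(1 for _, length in runs if length >= 3))
print("longest run:", longest[1], "x", repr(longest[0]))
```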


5 hours ago, wumpus said:

Sorry. If the data is random, then only half the possible datasets can be compressed (no matter how you divide the datasets). And of those compressible datasets, only half can be compressed by more than a single bit. And of those compressible by more than a single bit, half can only be compressed by two bits, and so on and so on. And while you may argue that you have compressed each dataset on average by a single bit, you still need to mark whether you have a compressed set or not, which puts you back where you started. You can't compress random data.

Below is my (bad) proof that random data is incompressible (google "pigeonhole principle random compression" for a better one).

 

Absolutely agree that a universal compressor isn't possible. That's not what I was trying to say.

All I was saying is that some effort has to be made so that a given large file of "random" data contains no exploitable pattern. The amount of effort - and intelligence - needed to exclude all feasible patterns increases as the size of the file increases.

To take a simple example, with a set of 1 million base-10 digits you need 6 digits to express a position (i.e. an address); therefore, if it contains a repeated 7-digit number, you only need 8 repeats to express it as [7-digit repeat][address1][address2]...[expunged dataset] and "compress" the file by a single digit.
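The arithmetic behind that, as a sketch (same made-up numbers: a million digits, 6-digit addresses, a 7-digit sequence occurring 8 times):

```python
# Cost accounting for the scheme above: store the repeated 7-digit sequence
# once, plus a 6-digit address for every place it was cut out of the file.
file_size   = 10 ** 6   # digits in the file (10**6 positions to address)
addr_len    = 6         # digits needed to write down any position
pattern_len = 7         # length of the repeated sequence
repeats     = 8         # how many times it occurs

in_place = pattern_len * repeats               # 56 digits left as they are
encoded  = pattern_len + addr_len * repeats    # 7 + 48 = 55 digits

print("net saving:", in_place - encoded, "digit")   # exactly one digit
```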

Of course, the probability of such multiple occurrences is low, but there is no limit on the number of different strategies you can apply to a given file. If you use digits of pi, or e, or whatever (and assuming that those pre-determined digits are available without being considered part of the "compressed" dataset), then you again increase the probability of finding a workable pattern in your data.

Now you could argue that the lack of any discernible pattern is the very definition of "random", but that cannot be true for both a dynamic system (a random number generator) and its static output (say, a file of n random digits). For any given sequence of numbers spat out by an RNG, the next number to come will either create a pattern (however obscure) or not. If it creates a pattern then, from a static point of view (looking at the resulting dataset), the output fails to demonstrate randomness. However, if you apply a static requirement of randomness to the output of the RNG to validate its results, then you can predict that the next number will not create a pattern, so the RNG fails the randomness test by incorporating an element of predictability. In other words, an RNG must prove the gambler's fallacy to be a fallacy, but to be non-compressible a random file must prove the gambler's fallacy to be (at least partially) true.

So, obviously a universal compressor is impossible. However, making sure that data cannot be compressed by any compressor requires careful massaging of its randomness, and I wouldn't bet on there being no possible strategy or key available to expose an exploitable pattern. Still, I'd be happy to bet on it being so obscure that it is virtually impossible to implement without requiring even more data... but that is a virtual impossibility and not an absolute one for any given large file.


Thanks for asserting he's not to be trusted. That bit on cognitive dissonance at the side of the homepage was a bit...savage?

He thinks the world is round and believes in planets, but doesn't think they move, and thinks only spacecraft are affected by gravity.


On 25.6.2016 at 11:19 PM, JebKeb said:

Thanks for asserting he's not to be trusted. That bit on cognitive dissonance at the side of the homepage was a bit...savage?

He thinks the world is round and believes in planets, but doesn't think they move, and thinks only spacecraft are affected by gravity.

Help, Newton solved all issues related to the movement of anything in the solar system, with the exception of minor errors due to relativity or light pressure.
Finding large errors in Newton's laws demands a lot of evidence.


On 06/23/2016 at 2:28 AM, cantab said:

On the contrary, it's kind of the definition of "random data" that there is no repetition and no redundancy.

Can you define the term 'random' with some care? 

As soon as you create a random set of, for instance, numbers, repeating digits or identical sequences will pop up. Those are easily collapsed into a more compact set plus description.

321a39e06d6401301d80001dd8b71c47
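One simple form of "a more compact set plus description" is run-length encoding; a sketch (it only wins when runs are long enough to pay for the counts):

```python
# Run-length encoding: collapse runs of identical symbols into (symbol, count)
# pairs. This shrinks data with long runs; on typical random digits the counts
# usually cost more than they save.
from itertools import groupby

def rle(s):
    return [(ch, len(list(group))) for ch, group in groupby(s)]

def unrle(pairs):
    return "".join(ch * count for ch, count in pairs)

sample = "111119999900323"                 # made-up example with long runs
encoded = rle(sample)
assert unrle(encoded) == sample            # the collapse is reversible
print(encoded)                             # [('1', 5), ('9', 5), ('0', 2), ...]
```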

 


1 hour ago, Camacha said:

Can you define the term 'random' with some care? 

As soon as you create a random set of, for instance, numbers, repeating digits or identical sequences will pop up. Those are easily collapsed into a more compact set plus description.

321a39e06d6401301d80001dd8b71c47

 

You can get truly random numbers using radiation, either from a source or from background radiation. Here your only issue is the offset, that is, calibrating zero and the maximum number.
Computers use an algorithm based on the clock or a fast counter; scramble this, for example by taking the square root and then using only some of the digits, and you have a pretty random number.
The ones used in games like KSP work well enough. Other algorithms are legal for gambling use; there the main issue is more about review to avoid cheating, since it would be tempting to give a lower chance of dropping the jackpot.
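A toy version of that clock-and-scramble idea (a sketch only; it is not how KSP or any gambling-approved generator actually works, and nothing like it should be used where quality matters):

```python
# Toy "scramble the clock" generator: seed from a fast counter, take a square
# root, and keep only a few of the resulting digits. Illustrative only.
import time

def toy_random_digits(count=6):
    seed = time.perf_counter_ns()               # fast counter, as described above
    scrambled = str(seed ** 0.5).replace(".", "")
    return scrambled[3:3 + count]               # keep a few middle digits

print(toy_random_digits())
```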
 

