Tuesday, May 3, 2005
5:13 EST
I did a lot of research on MD5 and stuff like that. I believe I can compress a giant file into a tiny one (like one GB into about a MB, which is 1/1000 the size!) by doing what I am going to call:
Checksum Compression (C2)
Ok, bear with me on this long explanation.
Let's consider checksum compression. You can take a huge file and MD5 it down to a tiny checksum.
A certain kind of program could look at that checksum and produce all the possible files that checksum could stand for. However, if the checksum carried certain parameters that told you which result was the true file (the 1985th guess or something like that), then you wouldn't even have to produce all the other results, just the one that is actually correct.
I'm not declaring that I want to use the MD5 checksum. I want to make my own, of course. I want to make one that has information imprinted in it about how to decompress the checksum and stuff like that.
This will take a significant amount of computing time and special programs that take up a lot of space to handle the two-way algorithm. But look at Zip drives. Perhaps a special hardware device made specifically for that purpose could be made for programs of a reasonable size (the algorithm wouldn't be able to back up an entire hard drive).
Computers are getting better and better every year or so. Soon basic computers will be able to handle the algorithm no problem. Imagine a site that looks like this:
Hey guys! The new version of Macromedia Flash MX is here! Here's the C2 code:
macromediaflashMX.c2 size: 1472KB
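The scheme described above can be written down concretely. Below is a minimal sketch, assuming a toy additive checksum in place of the never-specified C2 filter; `c2_compress` and `c2_decompress` are hypothetical names. Note that finding the index in the first place already requires enumerating candidates, which is the catch the replies dig into.

```python
import itertools

def checksum(data: bytes) -> int:
    """Toy additive checksum (NOT MD5): sum of all bytes, mod 65536."""
    return sum(data) % 65536

def c2_compress(data: bytes) -> tuple[int, int, int]:
    """Return (checksum, length, index), where index says which of the
    equally-checksummed candidates, in lexicographic order, is the real
    file. The search is exponential in len(data)."""
    cs, index = checksum(data), 0
    for cand in itertools.product(range(256), repeat=len(data)):
        if checksum(bytes(cand)) == cs:
            if bytes(cand) == data:
                return cs, len(data), index
            index += 1

def c2_decompress(cs: int, length: int, index: int) -> bytes:
    """Regenerate the index-th candidate that matches the checksum."""
    seen = 0
    for cand in itertools.product(range(256), repeat=length):
        if checksum(bytes(cand)) == cs:
            if seen == index:
                return bytes(cand)
            seen += 1

print(c2_compress(b"Hi"))            # (177, 2, 72)
print(c2_decompress(177, 2, 72))     # b'Hi'
```

Even for a 2-byte input the compressor walks 65,536 candidates; every extra byte multiplies that by 256.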
A gigabyte to a megabyte? You don't know what you're talking about.
A megabyte is 1,048,576 bytes. A gigabyte is 1,073,741,824 bytes. This means if you had one of these c2 files that was 1,048,576 ASCII characters (because a character is one byte), then each byte would have to represent 1,024 bytes of the original, and one byte only has 256 possible values, nowhere near enough to distinguish the 256^1024 possibilities of a kilobyte. Wouldn't surprise me if a 40 MB hard drive has more than a googol of possibilities for how its magnetic bits are arranged.
There is no way in hell you can compress something down to 1/1024th of its size and do it with a two-way algorithm. There are too many possibilities for how the bits of the file can be arranged.
I mean, you could make the algorithm metamorphic so it molds the checksum depending on how the bits look, but making the installer alone would take years, and I doubt you could finish it in your lifetime, since the program would have to redo the checksum for every bit of the file you're compressing. Even the smallest files have lots of bits, and you'd have to rearrange it that way unless you can pull a miracle out of your ass and find some mystical algorithm to do this efficiently.
This checksum algorithm you thought up is totally improbable. People will have to deal with regular file compression programs like archivers, and if they don't want to deal with that, then they can deal with programs that automatically download and install things off anonymous FTP servers.
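The objection above is the pigeonhole principle, and it can be checked by plain counting (a sketch; the file sizes are arbitrary):

```python
# A compressor that shrinks every 2-byte file to 1 byte is impossible:
inputs = 256 ** 2    # 65,536 distinct 2-byte files
outputs = 256 ** 1   # only 256 distinct 1-byte outputs
print(inputs // outputs)   # 256 inputs forced to share each output
# To tell those 256 apart you need log2(256) = 8 more bits, which is
# exactly one byte, putting you right back at the original size.
```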
omg.
Playstation Network tag: muffin-noodle
the empty set
At 5/3/05 06:13 PM, RegExp wrote: A gigabyte to a megabyte? You don't know what you're talking about.
I see where you're coming from on this one. I won't claim to know as much as you about that kind of computer stuff, because you obviously know what you're talking about. However, I still think it could work. Maybe not as well as I originally thought, but I still think it could work.
Ok, so there are millions of possibilities. Let's say you have a program that looks at a c2 checksum and echoes all the possible results for the file. Every time it echoes a new result, a check parameter is changed. However, let's say that you already know which guess is going to be the correct one. You know that when the 6,130,200th try (three bytes, 5D8A18 in hex) comes around, it's going to be the right answer. The program can figure out what the parameters of that try would be, then print the result into a file. This would take a considerable amount of time, but I think it would work.
Somewhat similar to monkeys with typewriters.
At 5/3/05 06:51 PM, cryptacet wrote: Somewhat similar to monkeys with typewriters.
banana banana banana bana-
wait... this is totally off topic!
There's no way in hell this is gonna work.
A checksum algorithm returns a fingerprint of a file, not a summary of it. Not only can different files have identical checksums, the checksum simply doesn't contain any useful information for restoring the file.
The whole purpose of one-way hashing algorithms is that you cannot reconstruct the original input data from the hash value.
Just think about it for a sec: regardless of whether you calculate the checksum of a file that is 4 KB or 4 terabytes, you will always get a checksum that is 16, 32, or however many digits long...
Oh, and the md5 hash values of "HELLO" and "HELLA" are
eb61eead90e3b899c6bcbe27ac581660 and
44b888210d99f3088a979d9d5a373551
At first sight, I can't spot two digits in the same place that are identical, although the two words are almost the same.
Or you could put a zip in a zip in a zip in a zip...
Compressed data usually can't be compressed further with good results.
Each time someone abuses hitTest, God kills a kitten. Please, learn real collision testing.
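The comparison above is easy to reproduce with Python's hashlib; the snippet recomputes the two digests and counts the positions where they agree:

```python
import hashlib

h1 = hashlib.md5(b"HELLO").hexdigest()
h2 = hashlib.md5(b"HELLA").hexdigest()
print(h1)
print(h2)
# Count the hex positions where the two digests happen to agree:
matches = sum(a == b for a, b in zip(h1, h2))
print(f"{matches} of 32 positions match")
```

This scattering of the whole output on a one-character input change is the avalanche effect, and it is deliberate: a digest that varied smoothly with its input would leak information about the input.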
Yeah, maybe in a few years we'll have brute-forcers that can try 10,000,000,000,000 possibilities a second. Then we'll compress a 10 gigabyte file into a single checksum and brute-force it to "unzip" it.
Besides, this technique would probably have some problems: even though there are 2^128 possible MD5 hashes, there are probably going to be files that come out completely incorrect because of colliding hashes (is that how it's said?).
Anyway, forget it dude.
OK. There are a lot of misunderstandings here.
First of all, I am not in any way saying that I want to use MD5 or SHA-1 or anything like that; I'm just citing those checksums as examples. My checksums would be in a different, unencrypted format (only put through a checksum filter, nothing other than that).
Second of all, using my idea you wouldn't have to try millions of different combinations to get the one you want. The number of guesses you'll have to try before you get the right answer is imprinted into the file. Here's an example.
You have a checksum. You have a program that figures out all the combinations that it could be (let's say there's a thousand total). However, it would take forever to generate all of them.
You know which guess is going to be the correct one, and fortunately you can just type in that number (but in the .c2 files, that information will be automatic). The program generates the parameters for that guess (since each try is generated using the try # anyway) and generates the file. The process would take a while, but it would save websites bandwidth, hard disk space, and it would save YOU download time.
Hey James, that is a really cool idea. Oh, also, I like your sig for CDD, pretty cool. Well, talk to you later.
jmsadmin
big_dogg068
Just forget it dude, your idea would never work.
You're thinking of enhanced rainbow tables, which really suck anyway. I think you're better off finding some other way of compressing a file. I can't think of anything cool right now, but all I know is that your idea is impossible. I mean, any checksum algorithm has collisions. You would probably end up creating bad files because of them.
Stop arguing your super-duper idea. It won't work.
At 5/4/05 07:16 PM, F-13 wrote: Just forget it dude, your idea would never work.
You're thinking of enhanced rainbow tables, which really suck anyway. I think you're better off finding some other way of compressing a file. I can't think of anything cool right now, but all I know is that your idea is impossible. I mean, any checksum algorithm has collisions. You would probably end up creating bad files because of them.
Stop arguing your super-duper idea. It won't work.
I wouldn't say it would be impossible, but it is highly highly highly improbable. Also F-13, he's saying that a way around the collisions is to say that following a certain algorithm, it would be the nth or so collision. It would take a lot of computing time in order to actually do this, and I'm not sure how it would work in programming.
At 5/4/05 07:16 PM, F-13 wrote: You're thinking of enhanced rainbow tables...
No way. Not those thirty-something-gigabyte rainbow tables. No, no, no. The rainbow tables don't work with fingerprints, because with fingerprints there are millions of different combinations. Regular MD5 hashes actually expand the file. I wouldn't use a filter like that.
I wouldn't need tables, because the number provided in the .c2 file would hold the info for generating the file. For example, let's say there are 18,446,744,073,709,551,615 possible combinations for a certain code. All you would need is a number between 0 and 18,446,744,073,709,551,615 to generate the parameters. This wouldn't be too hard, because your computer can probably handle this very quickly.
Remember this:
18446744073709551615 (Decimal)
=
FFFFFFFFFFFFFFFF (Hexadecimal)
=
1111111111111111111111111111111111111111111111111111111111111111 (Binary, 64 ones)
That's a q-word, or 8 characters in the file. Eight characters is (I believe) 8 bytes. I still think this idea can work, it just needs some computing power* and it will be done.
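The equivalences above check out, as a quick sketch confirms. Note the flip side, though: an 8-byte index can only distinguish 2^64 files, so on its own it carries no more information than 8 bytes of the original.

```python
n = 18446744073709551615
assert n == 0xFFFFFFFFFFFFFFFF      # 16 hex digits
assert n == 2 ** 64 - 1             # the largest 64-bit value
assert n == int("1" * 64, 2)        # 64 binary ones
print(n.bit_length(), n.bit_length() // 8)   # 64 8  (64 bits -> 8 bytes)
```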
Also, there is the problem of the compression screwing up sometimes. Let's face it: all encryption, compression, archive formats, etc. have their weaknesses. Every once in a while you'll get a file that is damaged or corrupt. But would you rather download a huge file in zip format that suddenly gets corrupted, or a small file in .c2 format that won't decompress worth a damn? Just redownload the file, and if it still doesn't work, then it's the site's copy that's corrupt. All they have to do is re-compress it and there you go.
*This operation wouldn't really be that bad. If your computer can handle a thousand floating-point operations per second, which is below normal, you won't have any trouble running this program. It wouldn't be extremely fast, but it would serve a good purpose. It would be extremely useful for people who have good computers but dial-up internet.
At 5/4/05 10:14 PM, JesusCyborg wrote: I suppose if you were using Earth Simulator or some other ultra-fast computer, you could write a program that would take a text file with no spelling errors, write down its length and checksum, then you could take that checksum and generate all possible combinations at that specific length and filter out the ones that pass spell check.
Well, Cyborg, my programming idol, I don't think you're getting me on this one.
You aren't figuring out all the possible combinations. The number you are given is not the length of the file; it works like this:
There are a certain number of possible combinations. All decompressors will try certain combinations in certain orders. However, the number you are given allows you to generate the parameters (because the number you are given is the amount of times the parameters change before you get the correct file, so you are producing one file instead of a whole bunch of files) and make the file on the first try.
Get it yet?
At 5/4/05 10:27 PM, JesusCyborg wrote: Data: 0x02 0x02
CheckSum: 0x04
Possibility table for two bytes: (1) 0x02 0x02, (2) 0x01 0x03, (3) 0x00 0x04, (4) 0x04 0x00
Grab possibility number 1 as the correct one
Right. The checksum would hold:
0x04 (the checksum)
1 (the number of tries before the result is correct)
2 (length)
And, of course, this would work better if the original input was larger, but you get the picture.
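The quoted two-byte table can be generated exhaustively. Here is a sketch with a plain (non-wrapping) additive checksum; note the enumeration finds five candidates, one more than the post lists (0x03 0x01 is missing there):

```python
# Every two-byte string whose bytes sum to the checksum 0x04:
table = [(a, b) for a in range(256) for b in range(256) if a + b == 0x04]
print(table)               # [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
# The "number of tries" stored in a .c2 file is just this index:
index = table.index((0x02, 0x02))
print(index)               # 2
```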
Well dude, I'll be back tomorrow. Sorry, I really gotta go. I'll be awaiting your answer though!
You're quite the optimist!
Anywho, I have developed a plan and a couple equations to test this idea:
Original:
5 2 25 1
(05021901 in hex)
Checksum:
33 (0x21 in hex)
C2 file:
33 4 49565
(2140C19D in hex)
This example is more of an encryption than a compression, but I will post some more examples as the idea progresses.
At 5/5/05 03:35 PM, potbellybob wrote: (2140C19D in hex)
That's supposed to be:
2104C19D
sorry for the typo!
At 5/5/05 04:33 PM, Lilj wrote: If you ever get it working, I'll be glad to test it.
Do you have the massive amount of computing power that would be needed?
The problem of your system is going to be the time complexity.
If I understood you correctly, your compressed file will contain the checksum and the length of the original file as well as some sort of index.
In a simple example, you could restrict yourself to only compressing ASCII strings, and for simplicity, you could use CRC16 as the checksum algorithm.
The checksum and the length alone won't allow you to reconstruct the original string, so you will definitely need an index. So at least once (during compression), you have to check all permutations of strings of the same length as the input string to see which ones have the same checksum as the one you got from the input string.
Alright, imagine you have a string with 10 characters, and for simplicity, we use an 8-bit charset. That means each of the 10 characters can have a value between 0 and 255, so 256 possible values per character.
That means there are 256 * 256 * 256 ... * 256 possibilities, or 256^10 = 1,208,925,819,614,629,174,706,176. In other words, you need to calculate and compare the checksums of (at least in the worst case) 256^10 strings.
Since there's obviously no direct way to tell from a given checksum which strings are gonna have that checksum, the only way to figure that out is through brute-force search.
The time complexity is O(m^n), where n is the length of the input file and m is the size of each "unit".
Imagine you're compressing a file that is 2 gigs in size; then you'd get
256^2,147,483,648 possibilities you need to check in the worst case.
See where I'm going with that?
I totally think your system would work IN THEORY, but at least in the near future, I don't see any chance that any computer will be able to compress or decompress with your method.
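The growth rate described above is worth seeing in digits; a small sketch of 256^n for a few file sizes:

```python
# 256 possible values per byte, so an n-byte file has 256**n candidates:
for n in (1, 2, 4, 10):
    print(n, 256 ** n)
# n=10 already gives 1,208,925,819,614,629,174,706,176 (= 2**80).
# A 2 GB file gives 256**2147483648, a number with over five billion
# decimal digits: hopeless even at trillions of tries per second.
```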
At 5/5/05 04:33 PM, Lilj wrote: If you ever get it working, I'll be glad to test it.
That's good to hear!
But I have some bad news. I expected there to be several trillion possible combinations, but damn, I'm getting numbers so long they hurt my eyes. Plus, the compression turns out to be about the same size as before (because the number of tries before you get the right answer is such a huge number). I'm still trying, though, so don't give up.
At 5/5/05 04:54 PM, HarryHunt wrote: Imagine you're compressing a file that is 2gigs in size, then you'd get
256^2,147,483,648 possibilities you need to check in the worst case.
Well, my original idea would bypass all that.
You have this string:
"This"
Which has a .c2 checksum of:
408 or 0198 in hex.
the length is of course 4.
The number of tries until you get the right answer is:
1,913,712,354 or 7210EEE2 in hex
So the .c2 file is going to be something like:
19847210EEE2
That is, by the way, larger than the original input.
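The dead end above is not bad math; it is forced by counting. Here is a sketch of the expected index size for a 4-byte input like "This", assuming a plain additive checksum:

```python
import math

candidates = 256 ** 4        # all 4-byte strings: 4,294,967,296
checksums = 4 * 255 + 1      # possible sums 0..1020: 1021 values
avg_collisions = candidates / checksums
print(f"{avg_collisions:,.0f}")                  # 4,206,628 on average
print(math.ceil(math.log2(avg_collisions)))      # 23 bits of index
# ~10 bits of checksum + ~23 bits of index + a length field already
# meet or exceed the original 32 bits: the index hands back nearly
# every bit the checksum threw away.
```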
Peaks
Now to get a little off topic, I am going to talk about peaks.
I could add peaks to the mix to save some space.
I'll basically do everything exactly as before, except add one parameter and shorten the "# of tries" entry.
The largest value, or peak, in "This" is 115 (the "s"). I can shorten the # of tries entry by a bunch if I use the 115 peak.
checksum:
408 or 0198 in hex.
the length is of course 4.
The peak is 115.
The number of tries until you get the right answer is:
176,301,294 or 0A8224EE
The output:
198473A8224EE
Hm... that's larger than the original! Either my math is horrible or this idea isn't going so well...
I think there'd need to be a third value to work with to get the right output straight away. You said at the beginning something about adding in how many "guesses" (let's just call them steps) would be needed to get the right result...
The decompressor would only need to caclulate up to the step limit. Or to go even further, an algorithm that calculates the result based on steps, instead of incrementing, would get it much faster.
I can't think about it much more right now; I have a science test in under 9 hours, so I need to do some last minute cramming and get to sleep.
At 5/5/05 07:43 PM, ABoxInABox wrote: caclulate
Damn my spelling! How the hell will I pass my english exam?
At 5/5/05 07:45 PM, ABoxInABox wrote:
At 5/5/05 07:43 PM, ABoxInABox wrote: caclulate
Damn my spelling! How the hell will I pass my english exam?
LOL! Anyways, my original idea is NOT going to work. I've tried everything... However, I remain persistent. I've come up with a new (hopefully better) idea that I will post once I've made a few examples.