User:Xavier Combelle/7zip long range compression

According to Dbzip2, the Wikimedia Foundation is looking for a way to improve compression speed.

The rzip experiments show that exploiting long-range redundancy gives a compression ratio similar to 7zip's at a much higher compression speed (about 20x faster). One problem raised is that it would be hard to get dump users to adopt a new compressor.

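My proposal is to integrate long-range redundancy detection into the 7zip compressor itself, aiming for a similar compression ratio at a higher compression speed than the existing compressor. Since only the compressor's match search would change, dump users should be able to keep their existing 7zip tools.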

The idea is to add a new match-finding method to the C/LzFind.c source file of p7zip to search for LZ matches.

This new method would work in two parts:

  • first, a long-range search that indexes the whole window and looks for long-range similarities, in a way similar to rzip;
  • second, a close-range search that indexes only the parts of the window not covered by a long-range match, and searches them with a classic match finder.
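
The two parts can be illustrated with a minimal sketch. It is written in Python rather than the C of LzFind.c, and it is not the rzip or 7zip algorithm: the block size, window size, token format and all function names are hypothetical. The long-range pass uses a plain dictionary of aligned blocks where rzip would use a rolling hash, and the close-range pass remembers only the most recent position of each 3-byte sequence.

```python
LONG_MIN = 32        # hypothetical minimum length of a long-range match
SHORT_MIN = 3        # hypothetical minimum length of a close-range match
SHORT_WINDOW = 4096  # hypothetical close-range window size


def extend(data, src, pos, limit):
    """Length of the common run of data[src:] and data[pos:], stopping at limit."""
    n = 0
    while pos + n < limit and data[src + n] == data[pos + n]:
        n += 1
    return n


def long_range_matches(data):
    """Part 1: index aligned LONG_MIN-byte blocks of the whole input and
    return non-overlapping (pos, src, length) matches."""
    index = {}
    matches = []
    pos = 0
    indexed = 0  # next block-aligned position to add to the index
    while pos + LONG_MIN <= len(data):
        # Only index blocks that end at or before the current position.
        while indexed + LONG_MIN <= pos:
            index.setdefault(data[indexed:indexed + LONG_MIN], indexed)
            indexed += LONG_MIN
        src = index.get(data[pos:pos + LONG_MIN])
        if src is not None:
            length = extend(data, src, pos, len(data))
            matches.append((pos, src, length))
            pos += length
        else:
            pos += 1
    return matches


def close_range_tokens(data, start, end, out):
    """Part 2: classic short-window search restricted to the gap data[start:end]."""
    last = {}  # 3-byte sequence -> most recent position inside this gap
    pos = lit_start = start
    while pos < end:
        src = None
        if pos + SHORT_MIN <= end:
            key = data[pos:pos + SHORT_MIN]
            src = last.get(key)
            last[key] = pos
        if src is not None and pos - src <= SHORT_WINDOW:
            length = extend(data, src, pos, end)
            if lit_start < pos:
                out.append(("lit", data[lit_start:pos]))
            out.append(("match", pos - src, length))
            pos += length
            lit_start = pos
        else:
            pos += 1
    if lit_start < end:
        out.append(("lit", data[lit_start:end]))


def compress(data):
    """Run the long-range pass, then the close-range pass on what it left over."""
    out = []
    pos = 0
    for mpos, src, length in long_range_matches(data):
        close_range_tokens(data, pos, mpos, out)
        out.append(("match", mpos - src, length))
        pos = mpos + length
    close_range_tokens(data, pos, len(data), out)
    return out


def decompress(tokens):
    """Rebuild the input from ("lit", bytes) and ("match", distance, length) tokens."""
    buf = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            buf += tok[1]
        else:
            _, dist, length = tok
            for _ in range(length):
                buf.append(buf[-dist])  # byte-by-byte copy handles overlapping matches
    return bytes(buf)


block = bytes(range(200))
data = block + b"filler text, filler text" + block
tokens = compress(data)
assert decompress(tokens) == data
```

In this sketch the second copy of `block` is picked up by the long-range pass, while the repeated "filler text" inside the gap is left to the close-range pass. A real implementation would emit matches through the existing match finder interface rather than building its own token list.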

I have two questions:

  • Where should I get the p7zip source so that it is easiest to work with the Wikimedia Foundation? (I would like to develop in my Debian environment.)
  • What would be the simplest algorithm to adapt for the close-range search that still has good enough properties?