Zstd: What's the Weissman score for this?

Created on 31 Mar 2018  ·  3 Comments  ·  Source: facebook/zstd

I want to use this algorithm for various use cases.
I only want to use it if its Weissman score is better than the theoretical limit.
I don't see the Weissman score listed in the README.

Most helpful comment

Weissman scoring has a number of problems:

  1. It is a relative score. You need to pick a reference speed and ratio to compare against.
  2. It produces nonsense answers with T <= 1: log(T) is zero at T = 1 and negative below it.
  3. It is sensitive to the time unit used. (log(T_ref / T_exp) would probably have been better than log(T_ref) / log(T_exp)). As it stands, scoring a compressor using minutes vs seconds produces different scores.
  4. It fails to capture the real-world trade-off between ratio and time. The possibility frontier of tradeoffs between speed and ratio in compression is not log-shaped.
  5. It doesn't factor in decompression speed at all.
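To make points 2 and 3 concrete, here is a minimal sketch, assuming the usual statement of the score as alpha * (r / r_ref) * (log(T_ref) / log(T)). The function names and all the numbers are made up for illustration; they are not measurements of any real compressor.

```python
import math

def weissman(ratio, t, ref_ratio, ref_t, alpha=1.0):
    """Weissman score: alpha * (r / r_ref) * (log(T_ref) / log(T))."""
    return alpha * (ratio / ref_ratio) * (math.log(ref_t) / math.log(t))

def log_of_quotient_variant(ratio, t, ref_ratio, ref_t, alpha=1.0):
    """Hypothetical variant from point 3: uses log(T_ref / T) instead."""
    return alpha * (ratio / ref_ratio) * math.log(ref_t / t)

# Hypothetical measurements: same ratio, candidate is 4x faster than the reference.
ref_ratio, ref_seconds = 3.0, 20.0
ratio, seconds = 3.0, 5.0

# Point 3: the same run scored in seconds and in tenths of a second.
score_seconds = weissman(ratio, seconds, ref_ratio, ref_seconds)
score_tenths = weissman(ratio, seconds * 10, ref_ratio, ref_seconds * 10)

# The log-of-quotient variant does not change with the time unit.
variant_seconds = log_of_quotient_variant(ratio, seconds, ref_ratio, ref_seconds)
variant_tenths = log_of_quotient_variant(ratio, seconds * 10, ref_ratio, ref_seconds * 10)

# Point 2: a candidate that finishes in under one time unit gets a negative score.
score_sub_unit = weissman(ratio, 0.5, ref_ratio, ref_seconds)

print(f"seconds: {score_seconds:.3f}, tenths: {score_tenths:.3f}")
print(f"variant: {variant_seconds:.3f} vs {variant_tenths:.3f}")
print(f"T = 0.5: {score_sub_unit:.3f}")
```

With T exactly equal to 1, `math.log(t)` is zero and the division raises `ZeroDivisionError`. Note that the variant puts the reference compressor at 0 rather than 1, so its scale is not comparable to the original score.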

Nonetheless, with the following parameters:

  • Using gzip (at its default level 6) as the reference compressor.
  • Benchmarking on the Silesia corpus.
  • Using tenths of a second as the time unit (since some of the faster compressors take less than a second, which would otherwise produce negative logs).
  • Using an alpha of one.
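The parameters above can be sketched roughly as follows. This is not the harness used for the numbers in this comment: it assumes Python's standard `zlib` module as a stand-in for gzip (both use DEFLATE), and the helper names `measure` and `score_levels` are hypothetical. The Silesia corpus is not bundled, so the caller supplies the input bytes.

```python
import math
import time
import zlib

TENTHS_PER_SECOND = 10  # time unit chosen in the list above

def measure(data, level):
    """Return (compression ratio, elapsed time in tenths of a second)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(data) / len(compressed), elapsed * TENTHS_PER_SECOND

def weissman(ratio, t, ref_ratio, ref_t, alpha=1.0):
    """Weissman score: alpha * (r / r_ref) * (log(T_ref) / log(T))."""
    return alpha * (ratio / ref_ratio) * (math.log(ref_t) / math.log(t))

def score_levels(data, levels, ref_level=6):
    """Score each zlib level against the reference level on the same input."""
    ref_ratio, ref_t = measure(data, ref_level)
    scores = {}
    for level in levels:
        if level == ref_level:
            ratio, t = ref_ratio, ref_t  # reuse the reference measurement
        else:
            ratio, t = measure(data, level)
        scores[level] = weissman(ratio, t, ref_ratio, ref_t)
    return scores
```

The reference level always scores 1.0 by construction. For the other levels to be meaningful, the input has to be large enough that every run takes more than one time unit; otherwise the logs go negative and the scores are nonsense.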

I get the following scores:

Algo | Lvl | Score
---- | --- | -----
gzip | 1 | 1.19
gzip | 2 | 1.20
gzip | 3 | 1.15
gzip | 4 | 1.16
gzip | 5 | 1.09
gzip | 6 | 1.00
gzip | 7 | 0.96
gzip | 8 | 0.87
gzip | 9 | 0.83
lz4 | 1 | 2.98
zstd | -5 | 2.97
zstd | -4 | 2.86
zstd | -3 | 2.77
zstd | -2 | 2.58
zstd | -1 | 2.54
zstd | 1 | 2.67
zstd | 2 | 2.34
zstd | 3 | 2.11
zstd | 4 | 1.98
zstd | 5 | 1.67
zstd | 6 | 1.55
zstd | 7 | 1.42
zstd | 8 | 1.34
zstd | 9 | 1.24
zstd | 10 | 1.18
zstd | 11 | 1.12
zstd | 12 | 1.03
zstd | 13 | 0.97
zstd | 14 | 0.94
zstd | 15 | 0.90
zstd | 16 | 0.89
zstd | 17 | 0.86
zstd | 18 | 0.84
zstd | 19 | 0.82
zstd | 20 | 0.82
zstd | 21 | 0.80
zstd | 22 | 0.79

As you can see, under these parameters both zstd (at level -5) and lz4 do break the theoretical limit of 2.9.

I hope this information is helpful!

All 3 comments

Wondering the same thing. Has anyone run enough tests to publish an accurate Weissman score?

