1 Introduction
Giulio Ermanno Pibiri
ISTI-CNR, [email protected]
@giulio_pibiri
@jermp
Overview
• Data compression: the process by which data is transformed into another
representation that takes less storage space, in order to:
- save space when storing data,
- save time when transmitting data.
Space vs. Time Trade-Off
• The most common trade-off is between the space of the compressed
data structure and the efficiency of the operations that we want to support on
the data.
Example: gzip has 9 compression “levels” (1 is the fastest but gives the “worst” compression; 9 is the slowest but gives the “best”).
[Figure: a compressor C applied repeatedly to a bit string B: B → C(B) → C(C(B)) → … → C(…C(C(B))…)]
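One way to see why such a chain cannot keep shrinking forever is a counting argument. This is a standard fact, offered here as a sketch and not as the slide's own derivation. It assumes that C is lossless (no two inputs share an output) and writes n for the length of B in bits.

```latex
\[
\#\{\text{bit strings of length } n\} = 2^n,
\qquad
\#\{\text{bit strings of length} < n\} = \sum_{i=0}^{n-1} 2^i = 2^n - 1 < 2^n .
\]
```

There are fewer short strings than n-bit strings, so by the pigeonhole principle no lossless C can map every n-bit string to a shorter one. Repeated application of C must therefore stop shrinking at some point.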
• Think about it: most of the data we deal with (Web pages, log files,
sequencing data, etc.) is created by programs, not by humans.
Undecidability
Q. How would you compress these bits?
A. With the following piece of code.
What if n = 1,000,000?
What happens to the compression ratio (CR) now?
Compile with:
g++ random_bits.cpp -o random_bits
Run with:
./random_bits
[Figure: 100,000 pseudo-random bits]
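The code shown on the slide did not survive extraction, so what follows is only a minimal sketch of what a generator in the spirit of random_bits.cpp might look like. The function name random_bits, the fixed seed 42, and the choice of std::mt19937 are all assumptions made for illustration. The point it tries to convey is that a short program with a fixed seed is itself a complete description of the bits it prints.

```cpp
#include <cstdint>
#include <random>
#include <string>

// Hypothetical helper (not the slide's code): returns n pseudo-random bits
// as a string of '0'/'1' characters. The seed is fixed, so the same n
// always yields exactly the same bits.
std::string random_bits(std::uint64_t n, std::uint32_t seed = 42) {
    std::mt19937 gen(seed);  // deterministic pseudo-random generator
    std::string bits;
    bits.reserve(n);
    for (std::uint64_t i = 0; i != n; ++i) {
        // Keep only the lowest bit of each generated 32-bit value.
        bits.push_back((gen() & 1u) ? '1' : '0');
    }
    return bits;
}
```

A main function that prints random_bits(100000) to std::cout would reproduce the setup of the question. Under these assumptions, moving to n = 1000000 changes only one constant in the source, so the description barely grows while the output becomes ten times longer.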
Data and Information
• Data is expected to grow much faster than information in the future:
data will become more and more redundant.
Why Data Compression?
• Communication cost.
- Skype, Zoom, FaceTime, WhatsApp, etc.
- Social networks (Facebook, Instagram, Twitter, …).
Memory Hierarchies
A Simple Experiment
Experiment methodology.
1. Allocate two vectors of the same size, one
holding large_record objects and the other
holding small_record objects.
2. Fill the two vectors with the same data.
3. Sort the two vectors (say, on the day attribute),
using std::chrono to measure the time.
• Q. Which sort will take less time?
• Hint. Remember: the smaller the records, the more of them are transferred to the processor with each memory access.
Compile with:
g++ -std=c++11 -O3 sort_bench.cpp -o sort_bench
Run with:
./sort_bench 10000000
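The source of sort_bench.cpp is not part of the extracted text, so the following is a hedged sketch of the three methodology steps, not the original program. The record layouts (a 4-byte date, plus a 60-byte payload in large_record), the helper names make_records and sort_by_day_ms, and the simple pseudo-random fill are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layouts: both records carry the same date fields, but
// large_record also carries a payload that the sort never inspects.
struct small_record {
    std::uint8_t day;
    std::uint8_t month;
    std::uint16_t year;
};

struct large_record {
    std::uint8_t day;
    std::uint8_t month;
    std::uint16_t year;
    char payload[60];  // assumed extra bytes that only inflate the record
};

// Steps 1+2: allocate n records and fill them with the same deterministic
// pseudo-random dates, whatever the record type.
template <typename Record>
std::vector<Record> make_records(std::size_t n) {
    std::vector<Record> v(n);
    std::uint32_t x = 12345;
    for (auto& r : v) {
        x = x * 1664525u + 1013904223u;  // linear congruential step
        r.day = static_cast<std::uint8_t>((x >> 24) % 31 + 1);
        r.month = static_cast<std::uint8_t>((x >> 16) % 12 + 1);
        r.year = static_cast<std::uint16_t>(2000 + (x >> 8) % 25);
    }
    return v;
}

// Step 3: sort on the day attribute and return the elapsed milliseconds,
// measured with std::chrono.
template <typename Record>
double sort_by_day_ms(std::vector<Record>& v) {
    auto start = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end(),
              [](const Record& a, const Record& b) { return a.day < b.day; });
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A main function could read n from the command line, build make_records<large_record>(n) and make_records<small_record>(n), and print the two values returned by sort_by_day_ms. The sketch deliberately makes no claim about the measured times, since those depend on the machine and the compiler flags.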