Let's Shrink "Bloated Debian Repository": Hideki Yamane


Let's shrink

bloated Debian repository


Hideki Yamane
(Debian Project:Debian Developer)
<henrich @ debian.org/or.jp>
http://wiki.debian.org/HidekiYamane

Today's Agenda

How large is Debian Repository

One day, I found a solution...

Is it really effective?

Problem on slower Arch

How much can we shrink it?



Debian supports...

How large Debian Repository is?
Arch source, all, amd64, armel, armhf, hurd-i386,
i386, ia64, kfreebsd-amd64, kfreebsd-i386,
mips, mipsel, powerpc, s390, s390x, sparc

How large is Debian Repository?
Arch source 52GB, all 57GB, amd64 53GB, armel
38GB, armhf 26GB, hurd-i386 14GB, i386 50GB,
ia64 42GB, kfreebsd-amd64 37GB, kfreebsd-i386
36GB, mips 35GB, mipsel 34GB, powerpc 42GB, s390
36GB, s390x 24GB, sparc 39GB...
(http://www.debian.org/mirror/size)

How can we
improve this?

Can we shrink this?
Yes, in some ways...
Drop support architectures
Delete packages from archive

Can we shrink this?
However, we don't want these
solutions
Drop support architectures
Delete packages from archive

Use xz!
Default compression is
gzip
xz can reduce file size
Use xz!
ex)
fonts-horai-umefont (I'm maintainer :)
By gzip -9 : 43,664kb
By xz : 25,476kb

Use xz!
ex)
fonts-horai-umefont (I'm maintainer :)
By gzip -9 : 43,664kb
By xz -9 : 25,476kb
→ 5,916kb

The archive software now accepts packages using xz for compression in
addition to gzip and bzip2 for both source and binary packages.
(snip)
Additionally please only use xz (or bzip2 for that matter) if your package
really profits from its usage (for example, it provides a significant space
saving). While those methods may compress better they often use more
CPU time to do so and a very small decrease in package size is hardly worth
the extra effort placed on slower systems. Think of both user systems and
the Debian buildds which will waste more time, an especially bad problem
on slower architectures.
(The archive now supports xz compression by Ansgar Burchardt <[email protected]>
http://lists.debian.org/debian-devel-announce/2011/08/msg00001.html)

xz on Slower arch is problem...
It'll eat
most CPU time

xz on Slower arch is problem...
Then...
if only on Powerful arch?

xz on Powerful arch is NOT problem
assumption:
use XZ on Intel/AMD arch by default

[Bar chart: size (GB) by architecture. all 57, i386 52, amd64 55, hurd-i386 14, ia64 42, kfreebsd-amd64 39, kfreebsd-i386 37]

[Bar chart: size (GB) by architecture, before / after xz. all 57 / 45, i386 52 / 32, amd64 55 / 34, hurd-i386 14 / 10, ia64 42 / 24, kfreebsd-amd64 39 / 24, kfreebsd-i386 37 / 23]

How much can we shrink it?
[Stacked bar chart: total size before and after xz (axis 0 to 350), stacked by all, i386, amd64, hurd-i386, ia64, kfreebsd-amd64, kfreebsd-i386]

How much can we shrink it?
(sizes in GB)
architecture      before   after xz   difference   reduction rate
all                   57         45          -12              21%
i386                  52         32          -20              38%
amd64                 55         34          -21              38%
hurd-i386             14         10           -4              29%
ia64                  42         24          -18              43%
kfreebsd-amd64        39         24          -15              38%
kfreebsd-i386         37         23          -14              38%
total                296        192         -104              35%
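
The reduction rate column can be reproduced from the before and after sizes. The following is a minimal sketch in POSIX sh with awk, assuming the rate is (before - after) / before rounded to the nearest percent. The helper name reduction_rate is made up for illustration and is not part of any Debian tool.

```sh
#!/bin/sh
# Sketch: reduction rate in percent, rounded to the nearest integer.
# reduction_rate is a hypothetical helper name used only for this example.
reduction_rate() {
    # $1 = size before (GB), $2 = size after xz (GB)
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (before - after) / before * 100 }'
}

reduction_rate 57 45     # all
reduction_rate 42 24     # ia64
reduction_rate 296 192   # total
```

With the figures for all (57 to 45), ia64 (42 to 24) and the total (296 to 192), this prints 21, 43 and 35, which agrees with the slide.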

Get the Fact!!
(Log tells the truth...)
(actually f[email protected].jp and it uses CDN system
but most of traffic goes to jaist)

Which arch? (%)
[Pie chart: share of downloads by architecture. Labels: s390, hurd-i386, sh4, amd64, sparc, m68k, powerpc, arm, source, s390x, armel, armhf, kfreebsd-amd64, mips, ia64, alpha, kfreebsd-i386, mipsel, hppa, i386, all]

Which arch? (size)
Total: 83TB

all : 34

i386 : 25

amd64 : 18

source : 3
architecture      download (TB)
all                    34.40
alpha                   0.02
amd64                  17.80
arm                     0.03
armel                   0.66
armhf                   0.02
hppa                    0.01
hurd-i386               0.03
i386                   25.10
ia64                    0.15
kfreebsd-amd64          0.22
kfreebsd-i386           0.23
m68k                    0.00
mips                    0.10
mipsel                  0.13
powerpc                 0.80
s390                    0.08
s390x                   0.01
sh4                     0.00
source                  2.87
sparc                   0.13
total                  82.79
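
To see how concentrated the traffic is, each architecture's share of the 82.79TB total can be computed directly. This is a small sketch in POSIX sh with awk; traffic_share is a hypothetical helper name, and the inputs are the download figures for all (34.40), i386 (25.10) and amd64 (17.80).

```sh
#!/bin/sh
# Sketch: share of total download traffic, in percent with one decimal.
# traffic_share is a hypothetical helper name used only for this example.
traffic_share() {
    # $1 = download for one group (TB), $2 = total download (TB)
    awk -v part="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", part / total * 100 }'
}

traffic_share 34.40 82.79   # all
traffic_share 25.10 82.79   # i386
traffic_share 17.80 82.79   # amd64
traffic_share 77.30 82.79   # all + i386 + amd64 together
```

This prints 41.6, 30.3, 21.5 and 93.4, so these three groups carry roughly 93% of the download traffic.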

How much can we cut?
If we'll apply xz...

Cut 24TB!

It's benefit for mirror admins


architecture      download cut (TB)
all                     7.24
alpha                   0.00
amd64                   6.80
arm                     0.00
armel                   0.00
armhf                   0.00
hppa                    0.00
hurd-i386               0.01
i386                    9.66
ia64                    0.06
kfreebsd-amd64          0.08
kfreebsd-i386           0.09
m68k                    0.00
mips                    0.00
mipsel                  0.00
powerpc                 0.00
s390                    0.00
s390x                   0.00
sh4                     0.00
source                  0.00
sparc                   0.00
total                  23.94
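
The cut figures appear to follow from multiplying each architecture's download volume by its size reduction, though the slides do not state the method, so treat that as an assumption. A minimal sketch in POSIX sh with awk, using a hypothetical helper estimate_cut:

```sh
#!/bin/sh
# Sketch: estimated traffic cut = download * (before - after) / before.
# The formula is an assumption; estimate_cut is a hypothetical helper name.
estimate_cut() {
    # $1 = download (TB), $2 = repository size before (GB), $3 = size after xz (GB)
    awk -v dl="$1" -v before="$2" -v after="$3" \
        'BEGIN { printf "%.2f\n", dl * (before - after) / before }'
}

estimate_cut 34.40 57 45   # all
estimate_cut 17.80 55 34   # amd64
estimate_cut 25.10 52 32   # i386
```

This prints 7.24, 6.80 and 9.65. The first two match the slide exactly, and i386 differs from the slide's 9.66 only in the last digit, presumably because the slide used unrounded inputs.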

Source: 2011

Pando Networks Releases Global Internet Speed Study, Pando
Networks Inc 2011, viewed 22nd September, 2011,
<http://www.pandonetworks.com/Pando-Networks-Releases-Global-Internet-Speed-Study>.
Global Download Study

http://chartsbin.com/view/2484
You can check your download speed at
http://www.speedtest.net/

Best 5 countries
1. Korea : 2202KBps
2. Romania : 1909
3. Bulgaria : 1611
4. Lithuania : 1462
5. Latvia : 1377

United States : 616KBps
Germany : 647KBps
Japan : 1364KBps (My result 5.98 MB/s, it's enough :-)
Nicaragua : 180KBps
World Average : 580KBps

North America = 500-600KBps

South America = 100-200KBps

Europe = eastern is better than western



If we would update Desktop/Laptop everyday in unstable

Download 10-15MB (maybe) for each

It takes 2-3 mins

xz cut 1min


It's benefit for Debian users
(including developers, of course :)

Conclusion ?

How large is Debian Repository: 615GB

One day, I found a solution... : use xz

Is it really effective? : YES!

Problem on slower Arch : x86 & all

How much can we shrink it? : 100GB+

It'll cut download traffic : 24TB/year

It's benefit for mirror admins

Also for Debian Users/Developers



Trade-off (vs decompression)
better compression
vs
increase decompression time

Trade-off (vs decompression)
Test Machine Spec
Intel Core i5
16GB Mem

Test1
ex1) fonts-horai-umefont_440-1_all.deb
$ du -k data.tar.*
43664 data.tar.gz
5780 data.tar.xz
$ time tar xf data.tar.gz
real 0m0.897s
user 0m0.880s
sys 0m0.104s
$ time tar xf data.tar.xz
real 0m0.619s
user 0m0.64s
sys 0m0.144s

Test1.5
$ cat decomp.sh
#! /bin/sh
i=0
while [ "$i" -lt 100 ]
do
    i=`expr $i + 1`
    tar xf "$1"
done
$ time ./decomp.sh data.tar.gz
real 1m43.487s
user 1m39.706s
sys 0m14.101s
$ time ./decomp.sh data.tar.xz
real 1m10.106s
user 1m5.780s
sys 0m18.169s
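
The decomp.sh loop runs the same extraction 100 times so that the timing differences become measurable. As a sketch of the same idea in a reusable form, the function below repeats an arbitrary command a given number of times using POSIX arithmetic instead of expr. The name repeat is made up for this example, and the function stops at the first failing run, which the original script does not do.

```sh
#!/bin/sh
# Sketch: run a command N times, stopping at the first failure.
# repeat is a hypothetical helper, not the script used in the talk.
repeat() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" || return 1   # run the command with its original arguments
        i=$((i + 1))
    done
}

repeat 3 echo "one extraction would run here"
```

In bash, where time is a shell keyword, something like `time repeat 100 tar xf data.tar.gz` should then give a measurement comparable to the one above.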

Test2
ex2) openclipart-png
$ du -k data.tar.*
601368 data.tar.gz
61100 data.tar.xz
$ time ./decomp.sh data.tar.gz
real 10m04.67s
user 9m5.809s
sys 0m16.849s
$ time ./decomp.sh data.tar.xz
real 69m08.146s
user 6m39.686s
sys 3m4.008s

Test3
ex3) non-all package → linux-image-3.2.0-3-amd64_3.2.21-3_amd64.deb (*)
$ time ../decomp.sh data.tar.gz
real 1m23.894s
user 1m20.993s
sys 0m21.061s
$ time ../decomp.sh data.tar.xz
real 3m0.363s
user 2m56.699s
sys 0m24.258s
*) linux-image-3.2.0 has already been applied xz
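
The slowdown factor for the linux-image test can be worked out from the two "real" times (1m23.894s for gz, 3m0.363s for xz). This is a small sketch in POSIX sh with awk; to_seconds and slowdown are hypothetical helper names, and the input format is assumed to be the `XmY.ZZZs` form printed by time.

```sh
#!/bin/sh
# Sketch: convert "1m23.894s" style times to seconds and compare them.
# to_seconds and slowdown are hypothetical helper names.
to_seconds() {
    # Split on "m", strip the trailing "s", then add minutes and seconds.
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

slowdown() {
    # $1 = gz time, $2 = xz time; prints xz / gz with one decimal.
    gz_s=$(to_seconds "$1")
    xz_s=$(to_seconds "$2")
    awk -v gz="$gz_s" -v xz="$xz_s" 'BEGIN { printf "%.1f\n", xz / gz }'
}

slowdown 1m23.894s 3m0.363s
```

For these two times the sketch prints 2.1, that is, xz decompression took a little over twice as long as gz for this package.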

Test4
ex4) on non-x86 arch
... sorry, not checked yet :-)


Test5
ex5) installing package
Good case
root@hp:/tmp/buildd# time dpkg -i fonts-horai-umefont_439-1_all.deb
real 0m0.71s
user 0m0.888s
sys 0m0.116s
root@hp:/tmp/buildd# time dpkg -i fonts-horai-umefont_440-3_all.deb
real 0m0.764s
user 0m0.848s
sys 0m0.100s

Test5
Normal case
root@hp:/tmp/buildd# time dpkg -i poppler-data_0.4.5-1_all.deb
real 0m0.109s
user 0m0.144s
sys 0m0.030s
root@hp:/tmp/buildd# time dpkg -i poppler-data_0.4.5-8_all.deb
real 0m0.033s
user 0m0.036s
sys 0m0.036s
download time = almost same
install time +0.104s

Test5
Worst case
root@hp:/tmp/buildd# time dpkg -i openclipart-png_0.0-0_all.deb
real 0m4.736s
user 0m6.180s
sys 0m1.68s
root@hp:/tmp/buildd# time dpkg -i openclipart-png_0.0-0.1_all.deb
real 0m40.69s
user 0m41.779s
sys 0m1.600s
download time = almost same
install time +36s (x8)

Test tells us...
xz decompression is slower than default gz
(most of the time)

rarely faster than gz

usually 2-8 times slower than gz


it depends on its own data.

good compression rate = faster decompression


it doesn't depend on running arch?

Not checked

Log tells the truth (again)
     package name     total download size (GB)    package name            numbers
 1   linux-2.6        4,830                       krb5                    723,923
 2   openoffice.org   2,853                       eglibc                  683,543
 3   libreoffice      2,346                       linux-2.6               679,016
 4   eglibc           1,566                       cups                    613,836
 5   texlive-extra    1,432                       openoffice.org          591,510
 6   mesa             1,223                       mono                    580,730
 7   evolution        1,199                       evolution-data-server   537,474
 8   freepats         1,111                       bind9                   513,989
 9   texlive-base     1,022                       libreoffice             507,735
10   samba            1,018                       avahi                   497,764

They ate 47% of all traffic (39 / 82TB)

First target?

...then, how to apply it?
Apply top 50 packages?
Modify debhelper? (to apply xz for all/i386/amd64 by default)
Modify build daemon?
Massrebuild for i386/amd64/all arch?
Thoughts?
(after this presentation,
welcome YOUR comment :-)

Conclusion (really)

How large is Debian Repository: 615GB

One day, I found a solution... : use xz

Is it really effective? : YES!

Problem on slower Arch : x86 & all

How shrink : 100GB+

It'll cut download traffic : 24TB/year


So, recommend to apply XZ to all, *i386 and
*amd64 if we can (surely exclude "Priority: required")

Also, Thanks to nice pictures

SpaceFun
http://wiki.debian.org/DebianArt/Themes/SpaceFun
By Valessio Brito
licensed under !"#$

Debian Theme %etch&'

Debian Theme %by (noga)un'

Thinking
http://www.flickr.com/photos/nachoissd/+,--./0-++/
By Victor Pérez :: victorperezp.com
licensed under Creative Commons Attribution 2.0 Generic (CC BY 2.0)

A successful tool is one that was used to do something undreamed of by its author.
http://www.flickr.com/photos/katerha/06,7-/070$/
By katerha
licensed under Creative Commons Attribution 2.0 Generic (CC BY 2.0)
