eMMC/SSD File System Tuning Methodology
Rev. 1.0
May 24, 2013
Abstract
This document describes methods of file system performance measurement and tuning for
eMMC/SSD storage media, including open source tools and file system options. It also covers
longevity aspects of eMMC/SSD media with certain file systems and file system power-fail
tolerance in the presence of performance-tuning options.
Audience
This document is intended for engineers optimizing the I/O performance of embedded systems
with managed FLASH storage devices.
Document Status
Draft
Contact Information
Send your comments, questions, and any other issues related to this document to:
[email protected] and [email protected].
Cogent Embedded, Inc. contact information is available at http://cogentembedded.com
Revision History
Rev.  Date               Author                       Changes
0.1   October 15, 2012   M. Filippov                  Initial revision
0.2   November 1, 2012   M. Filippov                  Add preliminary results and benchmark scripts overview
0.3   November 12, 2012  M. Filippov                  More results, more scripts details
0.4   January 12, 2013   M. Filippov                  Add IO schedulers results
0.5   May 13, 2013       M. Filippov, K. Kozhevnikov  Add Wear Intensity and Power-Failure Tolerance sections. Update results section and clean up text.
1.0   May 24, 2013       D. Semyonov                  Describe power-failure testing results. Clean up text, styles, and paragraph ordering. Update logo.
Issues
None
Table of Contents
1 Introduction
  1.1 Overview
  1.2 Acronyms and Abbreviations
2 Setup
  2.1 Hardware Setup
  2.2 Software Setup
3 Performance Benchmarking and Optimization
  3.1 Approach
  3.2 Measured Performance Parameters
  3.3 Results Comparison
  3.4 Benchmarking Tools and Workload Generators
    3.4.1 Flashbench
    3.4.2 FIO
  3.5 File Systems and Their Performance Tuning Options
    3.5.1 Ext3/Ext4
    3.5.2 BTRFS
    3.5.3 F2FS
  3.6 I/O Schedulers
  3.7 Expected Results
  3.8 Observed Results
    3.8.1 Linear and Random Direct Reading
    3.8.2 Linear and Random Direct Writing
    3.8.3 Linear Reading from Multiple Files (grep)
    3.8.4 Linear Writing to Multiple Files (untar)
    3.8.5 Linear Reading and Writing Multiple Files (compile)
    3.8.6 Random Reading and Writing Multiple Files (fileserver)
    3.8.7 Random Reading and Writing Multiple Memory-mapped Files (database)
  3.9 Conclusion
4 SSD Wear Intensity
  4.1 Approach
  4.2 Results
5 Power-Fail Tolerance
  5.1 Approach
  5.2 Results
    5.2.1 Ext4
    5.2.2 BTRFS
    5.2.3 F2FS
    5.2.4 Fsck, Mount, and Unmount Performance
Appendix I. Benchmarking Scripts
  Tree Structure
  Configuration
  Execution
  Sample Usage
1 Introduction
1.1 Overview
This document provides a methodology for performance tuning of a file system operating on a
managed FLASH device such as an SSD, SD card, or eMMC chip. It focuses on the ext3/ext4, BTRFS,
and f2fs open source file systems (the list can be extended). A set of performance characteristics
is defined and tools for workload simulation and performance measurement are proposed. Procedures
for estimating the wear intensity and power-fail tolerance of tuned file systems are described.
The document is accompanied by a set of scripts that allow conducting performance testing and
tuning for an exact target/storage configuration in accordance with the methodology.
2 Setup
3 Performance Benchmarking and Optimization
3.1 Approach
The following iterative approach to file system tuning is proposed:
1. the file system is set up without any tuning (that is, with default options) and its
performance parameters are measured;
2. each of the remaining tuning options is applied to the current file system setup and the
performance parameters of the new setup are measured;
3. the parameter differences are evaluated, and the option that gives the biggest performance
gain is added to the current file system setup; step 2 is repeated until there are no more
options or no option gives a performance gain.
Each benchmark provides a method for visual comparison of performance parameters. The
benchmarking script also provides hints about the optimal file system parameters/IO scheduler
for each workload, and for a file system in general, based on either cumulative throughput or
worst latency.
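In pseudo-shell form, the procedure amounts to a simple greedy loop. The sketch below is
illustrative only; the measure_score helper and the option list are hypothetical placeholders,
not part of the accompanying scripts:
# greedy option selection (illustrative sketch; measure_score is a hypothetical helper)
current=""
remaining="nojournal writeback nobarrier geometry noatime"
best_score=$(measure_score "$current")        # benchmark the default setup first
while [ -n "$remaining" ]; do
    best_opt=""
    for opt in $remaining; do
        score=$(measure_score "$current $opt")
        if [ "$score" -gt "$best_score" ]; then
            best_score=$score
            best_opt=$opt
        fi
    done
    [ -z "$best_opt" ] && break               # stop when no option improves performance
    current="$current $best_opt"
    remaining=$(echo "$remaining" | sed "s/\<$best_opt\>//")
done
echo "selected options:$current"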
3.4.1 Flashbench
Although not a file system benchmarking tool, flashbench makes it possible to measure FLASH
media characteristics such as the erase block/segment size, which in turn allows tuning the
file system geometry accordingly. It is only used during the initial evaluation of media
characteristics.
The following approach to FLASH media geometry determination is proposed in the
flashbench README:
“This is a simple read-only test doing small reads across boundaries of various sizes.
Example:
$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=1024
align 134217728 pre 735µs on 1.08ms post 780µs diff 324µs
align 67108864 pre 736µs on 1.05ms post 763µs diff 300µs
align 33554432 pre 722µs on 1.04ms post 763µs diff 294µs
align 16777216 pre 727µs on 1.05ms post 772µs diff 302µs
align 8388608 pre 724µs on 1.04ms post 768µs diff 299µs
align 4194304 pre 741µs on 1.08ms post 788µs diff 317µs
align 2097152 pre 745µs on 950µs post 811µs diff 171µs
align 1048576 pre 745µs on 945µs post 807µs diff 169µs
align 524288 pre 743µs on 936µs post 799µs diff 165µs
align 262144 pre 746µs on 948µs post 809µs diff 171µs
align 131072 pre 737µs on 935µs post 804µs diff 165µs
align 65536 pre 735µs on 925µs post 796µs diff 159µs
align 32768 pre 735µs on 925µs post 800µs diff 157µs
align 16384 pre 745µs on 911µs post 781µs diff 148µs
align 8192 pre 785µs on 808µs post 725µs diff 53.3µs
align 4096 pre 784µs on 788µs post 779µs diff 5.85µs
align 2048 pre 787µs on 793µs post 789µs diff 4.65µs
This shows the access times to do two 1024 byte reads around the boundaries of power-
of-two aligned blocks. Reading at the end of a 128 MB unit takes around 735
microseconds, reading the last block of this unit together with the first block of the next
one takes about 1080 microseconds and reading the first two blocks in a 128 MB unit
takes around 780 microseconds.
The most interesting number here is the last one, the difference between the second
number and the average of the first and the third is 324 microseconds. These numbers
all stay roughly the same for all units between 4 MB and 128 MB.
However, from 2 MB down to 16 KB, the last column has a much lower value. This
indicates that whatever the memory card does on a 4 MB boundary does not happen at
other boundaries. The educated guess here is that 4 MB is the erase block size, also
called the segment or allocation unit size. This erase blocksize will need to be used in
other tests following this one.
Similarly, both 16 KB and 8 KB boundaries are special. The logical explanation for this
is that the card has 8 KB pages, but can use multi-plane accesses to read two 8 KB
pages simultaneously. ”
Benchmarking scripts automate FLASH geometry guessing with the described method by
finding the two adjacent rows in the 'flashbench -a' output with the biggest relative change
(D(row+1) - D(row)) / D(row), where D(i) is the value of the last output column, and printing
the corresponding block sizes.
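A rough sketch of that heuristic is shown below. It assumes the 'flashbench -a' output format
quoted above, with the block size in the second column and the boundary difference in the last
column, and a uniform time unit (µs) on every row; the device path is simply the one from the
example:
sudo flashbench -a /dev/mmcblk0 --blocksize=1024 |
awk '$1 == "align" { n++; size[n] = $2; diff[n] = $NF + 0 }   # "+ 0" drops the unit suffix
     END {
         best = 0
         for (i = 2; i <= n; i++) {
             r = (diff[i-1] - diff[i]) / diff[i]              # relative change between adjacent rows
             if (r > best) { best = r; hi = size[i-1]; lo = size[i] }
         }
         print "largest relative change between block sizes", hi, "and", lo
     }'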
3.4.2 FIO
FIO is described as a tool that “would be able to simulate a given io workload without resorting
to writing a tailored test case”. It is highly configurable: there can be any number of processes
or threads involved, and each of them can use its own way of generating IO.
FIO is driven by job files. A job file may contain any number of threads and/or files; the
typical contents of a job file are a global section defining shared parameters and one or more
job sections describing the jobs involved. When run, fio parses this file and sets everything up
as described. Breaking a job down from top to bottom, it contains the following basic
parameters:
1. IO type: defines the IO pattern issued to the file(s). We may only be reading sequentially
from the file(s), or we may be writing randomly, or even mixing reads and writes,
sequentially or randomly;
2. block size: how large the chunks are in which we issue IO. This may be a single value or a
range of block sizes;
3. IO size: how much data we are going to read/write;
4. IO engine: how we issue IO. We could be memory mapping the file, using regular read/write,
splice, async IO, syslet, or even SG (SCSI generic sg);
5. IO depth: if the IO engine is async, how deep a queue we maintain;
6. IO type: whether we do buffered IO or direct/raw IO;
7. num files: how many files we spread the workload over;
8. num threads: how many threads or processes we spread the workload over.
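For illustration, a minimal job file touching most of these parameters might look as follows.
The option names (directory, ioengine, iodepth, direct, bs, size, rw, nrfiles, numjobs) are
standard fio options, while the file name, target directory, and chosen values are arbitrary
assumptions:
# randrw-sample.fio: an illustrative job file (directory and values are assumptions)
[global]
directory=/mnt/test
ioengine=libaio
iodepth=8
direct=1
bs=4k
size=256m

[randrw]
rw=randrw
nrfiles=4
numjobs=2
It would be run as 'fio randrw-sample.fio'.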
FIO has the following shortcomings that limit the range of useful workloads:
1. it cannot measure the read and write throughput of a readwrite job separately: for jobs with
equal read and write data amounts the reported throughput numbers are also equal;
2. it cannot generate metadata-intensive activity.
3.5.1 Ext3/Ext4
• no journal (tune2fs -O ^has_journal)
  • this option turns off the file system journal. It should be safe to do so on a battery-backed
device with a stable kernel;
• data=writeback (mount -o data=writeback)
  • this option selects the following journaling mode for file data: data may be written
into the main file system after its metadata has been committed to the journal. This is
rumored to be the highest-throughput option. It guarantees internal file system
integrity, but it can allow old data to appear in files after a crash and journal
recovery;
• nobarrier (mount -o nobarrier)
  • next to the “no journal” option; should be tried in case “no journal” is not acceptable,
and may be skipped if “no journal” is selected;
  • this option disables the use of write barriers in the jbd code. Write barriers enforce
proper on-disk ordering of journal commits, making volatile disk write caches safe to
use, at some performance penalty. On battery-backed media, disabling barriers
may safely improve performance;
• fs geometry (mount -o stripe=, mkfs -E stripe-width=)
  • this option controls the number of file system blocks that mballoc will try to use for
allocation size and alignment. As in the RAID5/6 case, proper IO block size and
alignment may speed up FLASH write operations. A hint for the stripe width may be
deduced from the flashbench tool output.
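As a sketch only (the device name is an assumption, and the stripe width of 1024 blocks
corresponds to the 4 MiB erase block from the flashbench example divided by 4 KiB file system
blocks):
$ mkfs.ext4 -E stripe-width=1024 /dev/mmcblk0p1
# either drop the journal entirely (battery-backed systems with a stable kernel) ...
$ tune2fs -O ^has_journal /dev/mmcblk0p1
$ mount -o noatime,stripe=1024 /dev/mmcblk0p1 /mnt
# ... or keep the journal and relax its guarantees at mount time instead
$ mount -o noatime,data=writeback,nobarrier,stripe=1024 /dev/mmcblk0p1 /mnt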
3.5.2 BTRFS
• fs geometry (mkfs --leafsize=)
  • this option controls the leaf size, the smallest unit in which btrfs stores data. As
in the RAID5/6 case, proper IO block size and alignment may speed up FLASH write
operations. A hint for the leaf size may be deduced from the flashbench tool output;
• SSD mode (mount -o ssd, mount -o ssd_spread)
  • these options enable SSD-optimized allocation schemes.
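For example (a sketch; the device name is an assumption and the 16 KiB leaf size is just an
illustrative hint):
$ mkfs.btrfs --leafsize=16384 /dev/mmcblk0p1
$ mount -o ssd,noatime /dev/mmcblk0p1 /mnt        # or ssd_spread instead of ssd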
3.5.3 F2FS
• fs geometry (mkfs -s -z)
  • these options control the number of segments per section and the number of sections per
zone. Sizes that match the chip parameters may speed up FLASH write operations.
A hint for the segment and zone sizes may be deduced from the flashbench tool
output.
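For example (a sketch; the device name and chosen values are assumptions; f2fs segments are
2 MiB, so two segments per section would match a 4 MiB erase block):
$ mkfs.f2fs -s 2 -z 1 /dev/mmcblk0p1              # -s: segments per section, -z: sections per zone
$ mount -t f2fs -o noatime /dev/mmcblk0p1 /mnt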
• BTRFS is consistently faster with SSD mode enabled on devices that support TRIM;
• ext4 prefers noatime/nojournal and the CFQ scheduler. Strangely, it works (slightly) faster
with discard enabled on an eMMC without TRIM support;
• f2fs runs best with its basic settings, or with noatime enabled;
• the CFQ scheduler dominates the table, followed by ROW.
Observed read behavior falls into two types: the first, where random reads are as fast as
sequential reads on all file systems, and the second, where random reads are slower than
sequential reads for small block sizes, but the difference shrinks as the block size grows.
In both cases performance is affected very little by file system options or IO scheduler
selection, but differs between file systems.
Observed write behavior also falls into two types. The first is where random writes are
significantly slower than sequential writes and throughput grows linearly with the IO block
size; even on such media the outcome depends on the file system type, and f2fs performs
significantly better than btrfs or ext4, keeping random writes on par with sequential writes
for IO block sizes up to 64 KiB. The second is where random writes with small and large IO
block sizes are fast, but writes with a medium IO block size are slow.
On such media f2fs also performs well with a medium IO block size:
16 Gbyte class 10 SD card (SanDisk)
    btrfs                       ext4                         f2fs
    discard   cfq       8250    base      noop       8315    noatime  noop      8263
    ssd       cfq       8246    base      deadline   8314    noatime  cfq       8201
    discard   deadline  8244    discard   noop       8303    noatime  deadline  8199
    base      row       8244    base      row        8302    noatime  row       8193
    discard   noop      8242    geometry  deadline   8301    base     deadline  8148

16 GByte eMMC chip (Toshiba)
    btrfs                       ext4                         f2fs
    geometry  row       9188    writeback cfq        9160    discard  noop      9226
    geometry  deadline  9186    nojournal cfq        9155    noatime  noop      9225
    ssd       row       9184    base      cfq        9152    base     deadline  9225
    noatime   row       9184    discard   cfq        9147    discard  cfq       9224
    noatime   deadline  9183    geometry  cfq        9144    noatime  row       9222
The effect of file system options and IO scheduler selection on performance reaches up to 20%
when the ratio of the working set size to free memory is large.
File system options and IO scheduler selection have a significant impact on performance,
especially when the working set is smaller than free memory:
16 GByte class 10 SD card (Kingston)
    btrfs                       ext4                         f2fs
    base      row      10203    discard    deadline  9592    base     noop      9867
    discard   cfq      10088    writeback  cfq       9406    discard  noop      9743
    base      cfq      10075    nobarrier  deadline  9381    discard  deadline  9735
    base      noop      9793    noatime    deadline  9335    base     deadline  9627
    ssd       deadline  9752    nojournal  noop      9308    noatime  deadline  9614

16 Gbyte class 10 SD card (Transcend)
    btrfs                       ext4                         f2fs
    ssd       cfq      10148    nobarrier  row       9865    base     noop     10200
    noatime   cfq      10143    discard    row       9848    discard  noop     10154
    base      cfq      10130    base       noop      9848    base     deadline 10044
    geometry  cfq      10127    writeback  cfq       9800    discard  deadline  9919
    ssd       row      10091    noatime    noop      9790    base     row       9863

16 Gbyte class 10 SD card (SanDisk)
    btrfs                       ext4                         f2fs
    geometry  noop      9375    base       row       9435    noatime  deadline  9261
    geometry  deadline  9375    noatime    row       9297    discard  noop      9250
    discard   deadline  9359    discard    cfq       9281    noatime  row       9071
    ssd       row       9341    base       noop      9274    discard  deadline  9002
    noatime   deadline  9340    nojournal  noop      9265    base     cfq       8998

16 GByte eMMC chip (Toshiba)
    btrfs                       ext4                         f2fs
    base      cfq      10561    base       row      10461    base     cfq      10318
    ssd       noop     10440    base       deadline 10451    noatime  cfq      10207
    noatime   cfq      10440    nojournal  cfq      10412    discard  deadline 10205
    discard   noop     10439    discard    noop     10406    noatime  deadline 10194
    noatime   deadline 10436    base       cfq      10380    base     deadline 10153
Write performance starts at very high numbers when the working set is small and drops to nearly
the direct linear medium write speed at large working set sizes. Read performance varies
insignificantly as the working set size changes, but depends heavily on the selected IO
scheduler and somewhat less on file system options. The noop scheduler shows the worst
aggregated throughput; CFQ and ROW show the best. Both read and write throughput variation with
file system options and IO scheduler selection reaches 400%:
16 Gbyte class 10 SD card (SanDisk)
    btrfs                       ext4                         f2fs
    ssd       row      19977    noatime    cfq      19161    base     row      19995
    geometry  row      19951    noatime    row      19099    base     cfq      19983
    geometry  cfq      19948    nojournal  row      18957    discard  cfq      19964
    discard   row      19884    writeback  row      18780    noatime  cfq      19797

16 GByte eMMC chip (Toshiba)
    btrfs                       ext4                         f2fs
    base      cfq      21189    writeback  cfq      21492    noatime  cfq      20891
    ssd       cfq      21133    base       noop     21447    base     cfq      20885
    noatime   cfq      21008    noatime    cfq      21330    discard  row      20879
    ssd       row      20972    base       deadline 21296    base     row      20801
    discard   row      20895    discard    cfq      21246    discard  cfq      20781
Throughput varies greatly with file system options and IO scheduler selection.
16 GByte eMMC chip (Toshiba)
    btrfs                       ext4                         f2fs
    discard   cfq      17396    nobarrier  cfq      17561    noatime  cfq      17825
    base      row      17366    writeback  cfq      17537    base     cfq      17754
    ssd       cfq      17323    discard    cfq      17529    discard  cfq      17739
    geometry  cfq      17323    geometry   row      17518    base     row      17324
    base      cfq      17305    nojournal  row      17505    discard  row      17296
3.9 Conclusion
Benchmarking results look mostly as expected: read bandwidth is limited by the media, while
random write bandwidth is high with a big FS cache and becomes limited by the media once the
I/O working set outgrows the FS cache.
There is no single set of file system options and IO schedulers optimal for all media; each
medium has its own performance profile.
Unidirectional (read-only, write-only) workloads are less sensitive to file system options and
IO scheduler selection than bidirectional workloads.
4.1 Approach
A straightforward comparison procedure is to retrieve the total erase counter before and after a
performance test run and to use the relative growth of the counter as a score (to achieve a
certain level of confidence, the simulation may be repeated multiple times and the resulting
scores averaged). Of two file system configurations (different file systems, or different
configuration parameters of a single file system), the one with the lower averaged score
observed in this way is considered preferable.
In practice, the system designer is interested in the expected lifetime of the device, which
naturally varies with the intended usage scenario. A workload that is not typical for the device
may quickly kill its storage (e.g. a tablet used as a mail server). Therefore the benchmark used
for estimating wear intensity shall be chosen to reflect “typical” usage.
Such typical usage can be modeled roughly as a number of primitive operations performed daily /
weekly / monthly, so there is a known relation between a single benchmark cycle and “modeled”
device run time (e.g. one cycle corresponds to 3 months of “normal” device usage). Based on the
observed wear intensity and the characteristics of the underlying physical storage, it is
possible to derive the expected device lifetime for a given file system configuration, and that
information may be used by the system designer to decide which file system to use. For example,
if two configurations yield expected lifetimes of 30 and 100 years respectively, the parameter
is probably not that important and the decision should be based on file system performance
characteristics; in contrast, if the lifetimes are 1 and 3 years, the importance of that
parameter becomes quite high.
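As an illustration of the arithmetic only (every number below is a made-up assumption, not a
measurement):
# made-up example: one benchmark cycle models 3 months of use
pe_cycles=3000            # rated program/erase cycles per block (assumption)
blocks=4096               # erase blocks on the device (assumption)
erases_per_cycle=100000   # total-erase-counter growth per benchmark cycle (assumption)
months_per_cycle=3        # device run time modeled by one benchmark cycle
echo "$pe_cycles $blocks $erases_per_cycle $months_per_cycle" |
awk '{ printf("expected lifetime: %.1f years\n", $1 * $2 / $3 * $4 / 12) }'
# prints "expected lifetime: 30.7 years"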
4.2 Results
Unfortunately, none of the tested samples supports the S.M.A.R.T. diagnostic interface, so the
described comparison method is not applicable. As there is no way to execute the flash
controller firmware in a simulated environment (with, for instance, nandsim used as the
underlying flash media), the only way to compare two file system configurations is to wear the
device out completely and compare the times until failure. Such a brute-force comparison
procedure is too costly (in every sense) to perform, and therefore it has not been carried out.
5 Power-Fail Tolerance
The aim of power-failure tests is to make sure that file system and data integrity are preserved
after unexpected power outages (or, in some cases, after a sudden system reset, e.g. one
triggered by a hardware watchdog). It is an expected outcome that data which existed only in
operating system caches will be lost; more important is what happens to information that had
previously been committed to persistent storage. Corruption of data that was successfully
flushed to disk, corruption of data that was not accessed at all, or even corruption of the
entire file system (i.e. the inability to mount the disk on the next boot) are major faults that
may rule out the use of a specific file system configuration, or even of particular storage
media, unless hardware modifications are added that (practically) guarantee the absence of
power failures.
From the hardware perspective, certain operations are expected to be atomic. For instance, block
erase and page program operations should both be atomic. If such an operation is not completed
for some reason, the state of the flash cell array is, strictly speaking, unknown: a block may
be erased only partially, or page programming may be incomplete. It is known that abnormal
termination of such atomic operations may put the flash array into an internally unstable state
(subsequent reads of the “partially written” bits may return different values, and writing into
“partially erased” pages is not guaranteed to be correct), and the suggested resolution is to
repeat the aborted procedure. As the operating system does not manage the flash (that task is
performed by the flash controller firmware), there is no real control over such “low-level”
errors from the file system standpoint, and consequently it is entirely the responsibility of
the firmware to assure proper handling of events like these. The file system, in turn, is
responsible for the consistency of its metadata and data (in other words, it shall recover from
errors provided the underlying flash management layer is “reliable”), and it is expected that
different configurations may show different behavior under power failures.
A number of parameters may be affected by a power failure. For instance, a file system recovery
procedure may be triggered on the next boot, resulting in a significantly longer start-up time;
or the first file access after a power failure may take longer. All such aspects may be
important from the general system operation standpoint; however, in this section we concentrate
only on file system metadata and data integrity, something that we can assess relatively easily.
5.1 Approach
The power-failure test script (located at git://git.cogentembedded.com/pub/celf/pf-test) performs
unattended simulation of power failures and analyzes file system and data integrity check
results. A detailed execution log is saved for further examination. A power-fail test server
residing on the host manages the execution of various remote tests on the target with a
particular file system configuration set up, and controls the power source to initiate power
outages at random time instants during remote test execution. The test server connects to the
target and retrieves logging and diagnostic information.
The following steps are executed in a loop for each remote test and file system:
1. connect to the target;
2. perform a mandatory file system check and recovery;
3. perform a data integrity check, if applicable;
4. mount the current file system configuration;
5. start the background test;
6. initiate a power outage after a random interval.
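A simplified sketch of that loop is shown below; the power-switch and integrity-check helpers,
the host/target names, the variables, and the timings are assumptions, and the actual pf-test
scripts are more elaborate:
# simplified power-fail loop sketch (helper commands, names, and variables are assumptions)
while true; do
    power_on_target                                   # hypothetical power-switch helper
    wait_for_ssh target                               # hypothetical: wait until the target has booted
    ssh root@target "fsck -y $DEV"                    # mandatory file system check and recovery
    ssh root@target "check_data_integrity $DEV"       # hypothetical data integrity check
    ssh root@target "mount -o $FS_OPTIONS $DEV /mnt"  # mount the configuration under test
    ssh root@target "start_background_test /mnt"      # hypothetical background workload
    sleep $(( RANDOM % 120 + 10 ))                    # let it run for a random interval
    power_off_target                                  # initiate the "power outage"
done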
5.2 Results
Power-failure testing is a time-consuming process. Depending on the required confidence level, a
complete testing round may take from one day to several weeks. (Of course, really bad
configurations will be detected much earlier.) Therefore, this section provides just a few
samples of what power-failure tests produce, without trying to cover all possible configurations
exercised by the performance tuning procedure.
We used two sets of parameters for each file system, “default” and “recommended
performance-wise”, and executed the tests on a PandaBoard with a 16 GB class 10 SD card
(Kingston).
It should be noted that the optimal parameters from the performance point of view may be the
worst possible ones from the power-failure tolerance perspective. This is especially true for
ext4, which delivers its best performance with crucial safety features (barriers, journal)
disabled.
5.2.1 Ext4
Ext4 proved to be the most robust file system among the tested ones, but only at the cost of
forgoing the performance improvements. (Without the journal, fsck fails to auto-recover the file
system after just a couple of power failures; thus, this mode is not covered in the following
performance-related comparison table.)
The default journaling mode, with and without the nobarrier mount option enabled at random,
survived 1406 power cycles before fsck detected an unrecoverable failure.
Note: the “manual recovery” fsck mode still allowed bringing the ext4 partition into a consistent state by blindly
answering “yes” to very low-level and highly technical questions. But in real life this would likely mean loss of
data.
It is interesting that the last mount command before the failure did not use the nobarrier
option. This may indicate that it is safe to use it in our configuration to improve performance
without affecting power-fail tolerance, although more testing rounds are required to prove this.
5.2.2 BTRFS
The test failed after just a few power cycles because fsck found errors which it was not able to
auto-recover. This happened both with the default mount options and with the
performance-oriented ones. Since no safety-related parameters were disabled during the tests,
the conclusion is to avoid using btrfs in that particular configuration if power loss events are
anticipated.
5.2.3 F2FS
Unfortunately, F2FS does not have an fsck utility. Therefore, the “mandatory file system check
and recovery” step could not be executed for F2FS. This is probably the reason why it
demonstrated the most troublesome behavior: a hung sync command during one round of testing, and
severe file system issues revealed by manual checking of basic file system operations (although
the mount operation alone reported no problems) during another round.
Thus, similarly to btrfs, F2FS is not recommended for use in this particular configuration in
the presence of power failures.
5.2.4 Fsck, Mount, and Unmount Performance
Since we got more-or-less trustworthy results only for the ext4 file system, the others are
provided only as samples of what you can get from power-failure test logs.
An interesting observation is that the maximum values can be much larger than the median ones.
This should be taken into account when designing a system with a tight startup/shutdown time
budget.
Appendix I. Benchmarking Scripts
Tree Structure
./
├── benchmark/ # benchmarks directory
│ ├── common # common parts of benchmark scripts
│ ├── benchmark-script # script instance
│ ├── benchmark-script.config # instance configuration
│ ├── benchmark-script.job # FIO job definition (for fio scripts)
│ └── ...
├── fs/ # file systems directory
│ ├── btrfs/ # file system instance
│ │ ├── config # common parts of configuration scripts
│ │ ├── fs-configuration-script # fs configuration script instance
│ │ ├── fs-configuration-script.config # instance configuration
│ │ └── ...
│ ├── ...
│ └── common # common parts of all file system scripts
├── iosched/ # IO schedulers directory
│ ├── config # common parts of configuration scripts
│ ├── ioscheduler-script # IO scheduler configuration script instance
│ └── ...
├── result-xxx/ # results directory
│ └── fs/
│ ├── <fs>/ # results for the file system <fs>
│ │ ├── <fs-config>-<benchmark> # result for <benchmark> in <fs-config>
│ │ └── ...
│ └── ...
├── config # common configuration parameters
└── run* # topmost run wrapper
All scripts in the tree are bash scripts; configuration files are sourced.
Configuration
The ./config file is the main configuration file; it defines parameters of the media used and
the file system options.
A hint on the media geometry parameters may be obtained from ./run geometry.
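The listing below is only an illustration of what such a config might contain; result_dir and
the *_default/*_exclude variables appear in the Sample Usage section, while the remaining
variable names and all values are assumptions:
# ./config (illustrative sketch; device, mountpoint, and geometry variable names are assumptions)
device=/dev/mmcblk0p1                # storage device under test
mountpoint=/mnt/test                 # where file systems are mounted for benchmarking
result_dir=result-sd16g10            # benchmark output directory
erase_block_size=4194304             # geometry hint, e.g. from './run geometry'
page_size=8192                       # geometry hint, e.g. from './run geometry'
ext4_default=''
ext4_exclude=''
btrfs_default=''
btrfs_exclude=''
f2fs_default=''
f2fs_exclude=''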
Execution
Once ./config is filled in, the ./run script may be started. For each missing file under
${result_dir} it invokes the fs configuration scripts not mentioned in the *_default variables
to create and mount the corresponding file system, then invokes the benchmarking scripts in turn
and puts their output into ${result_dir}.
Once ./run is complete, result comparison graphs may be built by running ./run gdiff.
Comparison graphs for arbitrary (but compatible) result sets may be built by calling the
benchmarking scripts with the parameter 'gdiff' and a list of result files to compare.
Sample Usage
Note that run geometry is essentially guessing based on the flashbench output; it may be
unstable and inaccurate.
3. If the step 2 result does not look sane, try to analyze the flashbench output manually.
Please see section 3.4.1 for the recommended way of doing that.
ext4_exclude='geometry'
btrfs_exclude='geometry'
f2fs_exclude='geometry'
^D
4. run it:
# ./run
5. see the results:
1. plot performance parameters:
$ ./run gdiff
This will pop up a bunch of gnuplot windows with performance parameters for every
benchmark that was run.
2. see the automatic analysis hint:
This will print the best file system option, IO scheduler and associated performance
metric for every file system and every benchmark.
6. choose the file system options to apply and add them to the corresponding *_default
parameters in the config file, e.g. set
ext4_default='nojournal noatime'
btrfs_default='ssd noatime'
f2fs_default='geometry'
7. change result directory, e.g. set
result_dir=result-kingston-sd8g4c-iteration2
8. repeat from step 4.