Linux File System Performance Comparison
Executive Overview
Filesystems
  Ext2 Filesystem
  Ext3 Filesystem
  Raw Devices
  Oracle Clustered Filesystem (OCFS)
Benchmark
  Configuration
  Test Setup
  Test Goals
Results
Conclusion
Appendix A – Datafile Layout
Appendix B – Init.ora Parameter File
EXECUTIVE OVERVIEW
This paper describes the technical background and the performance results of a
transaction processing workload that we ran on an Oracle 9i Release 2 database
server to compare the performance of four Linux filesystems on direct-attached
SCSI storage. We tested two traditional filesystems, ext2 and ext3, and two
special-purpose filesystems, raw device pseudo-files and the Oracle Cluster
Filesystem (OCFS).
Unlike most applications, the Oracle database server itself implements many of
the same functions that a traditional filesystem implements, including cache
management, IO optimization, and data layout. This
overlap of filesystem and database functionality raises several questions: How does
Oracle on Linux perform when the storage is provided by a typical filesystem? Can
the two systems work together? How well can the database perform when it
controls the storage more directly? These questions motivated the performance
tests discussed in this paper.
The test results show that OCFS and raw devices performed similarly; that ext2 and
ext3 performed similarly; and that as the workload scaled up, OCFS and raw device
files yielded significantly greater performance than the ext2 and ext3 filesystems.
The following sections discuss the details of the tests and the results.
FILESYSTEMS
This section briefly describes the filesystems we tested: ext2, ext3, raw, and OCFS.
Nearly all Linux filesystems, including ext2 and ext3, use the buffer cache layer in
the Linux operating system kernel for disk reads and writes. The kernel not only
caches data but also applies algorithms such as read-ahead, which reads extra
consecutive data blocks into the cache in the hope that they will be the next data
blocks requested by the application.
Although the kernel buffer cache layer is beneficial for many applications, it
usually decreases performance for a database application such as Oracle. There are
two main reasons. First, the Oracle database already has its own buffer cache, so
double-caching the data wastes system memory. Second, the Oracle database has
more knowledge of its client IO access patterns than the filesystem can have, so
cache buffer replacement strategies and other algorithms (such as read-ahead) are
most likely to improve performance if handled in the database rather than the
filesystem. A variety of other factors can also decrease database performance in a
filesystem; for example, some OS filesystem implementations develop resource
bottlenecks when Oracle uses large numbers of files.
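The effect of read-ahead on different access patterns can be illustrated with a small simulation. The following Python sketch is a toy model only: the function names `simulate` and `compare`, the LRU replacement policy, the cache size, and the read-ahead depth are all assumptions chosen for illustration and do not describe the actual Linux kernel implementation. It counts cache hits and blocks fetched from disk for a sequential scan and for scattered random reads, with and without read-ahead.

```python
import random
from collections import OrderedDict


def simulate(accesses, cache_blocks, readahead):
    """Toy LRU block cache with read-ahead.

    Returns (hits, disk_reads): cache hits and blocks fetched from "disk".
    """
    cache = OrderedDict()  # block number -> None, least recently used first
    hits = 0
    disk_reads = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
            continue
        # Miss: fetch the requested block plus `readahead` following blocks.
        for b in range(block, block + readahead + 1):
            if b in cache:
                continue
            cache[b] = None
            disk_reads += 1
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used block
    return hits, disk_reads


def compare(num_accesses=1000, cache_blocks=64, readahead=4, seed=1):
    """Run sequential and random workloads with and without read-ahead."""
    rng = random.Random(seed)
    sequential = list(range(num_accesses))
    scattered = [rng.randrange(1_000_000) for _ in range(num_accesses)]
    return {
        "sequential, no read-ahead": simulate(sequential, cache_blocks, 0),
        "sequential, read-ahead": simulate(sequential, cache_blocks, readahead),
        "random, no read-ahead": simulate(scattered, cache_blocks, 0),
        "random, read-ahead": simulate(scattered, cache_blocks, readahead),
    }


if __name__ == "__main__":
    for name, (hits, reads) in compare().items():
        print(f"{name}: hits={hits}, disk_reads={reads}")
```

In this model the sequential scan turns most requests into cache hits at no extra disk cost, while the random workload fetches several times as many blocks with read-ahead as without it and gains almost no hits. This is the kind of mismatch that arises when a generic cache guesses at the access pattern of a database.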
It is important to note that the files which most directly affect the performance of a
database workload such as the OLTP benchmark discussed in this paper are the
Oracle database’s data files. In this paper the filesystems are compared solely for
the performance impact on the chosen benchmark, which means that only the
datafiles are stored in the filesystems for our test (see Appendix A). The following
are some examples of other files used with Oracle that are not data files: executable
binaries, message files, trace files, and shared libraries.
Ext2 Filesystem
Ext2, which is short for “second extended filesystem,” was the de facto standard
on numerous Linux distributions for many years. Ext2 is a reliable and robust
filesystem and provides a rich set of features including subdirectories, attributes,
quotas, and locks.
One potential problem with ext2 and many other filesystems is that in case of an
improper system shutdown such as a power failure, an ext2 filesystem cannot be
Ext3 Filesystem
Ext3 is now available on most Linux distributions. Ext3 enhances ext2 by adding
a journal of all writes to the on-disk filesystem, together with algorithms for using
that journal efficiently. The journal itself is also stored on the disk, which enables
ext3 to be reliable, and the IO to the journal is sequential (due to the physical
layout of the journal’s data blocks on the disk), which enables ext3 to perform well.
Ext3 has a number of advantages. One major advantage is that when you have
a large disk and need to recover from a system crash, the recovery process
(filesystem consistency check) is much faster than on ext2. By journaling changes
(writes), ext3 can recover very quickly regardless of the size of the filesystem.
Another advantage is that ext3 is compatible with ext2, so converting filesystems
between ext2 and ext3 is easy and does not require a reboot or repartitioning of
the disks.
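The reason journal recovery does not depend on filesystem size can be shown with a minimal sketch. The `JournaledStore` class below is hypothetical and greatly simplified: it is not the ext3 on-disk format, and it ignores commit records, journal wrap-around, and the distinction between data and metadata. It only illustrates that recovery replays the entries still in the journal rather than scanning every block.

```python
class JournaledStore:
    """Toy block store with a write-ahead journal (not the ext3 on-disk format)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks  # final on-disk block locations
        self.journal = []                 # append-only log of (block_no, data)

    def write(self, block_no, data):
        # Record the write in the journal first; appending is sequential IO.
        self.journal.append((block_no, data))

    def checkpoint(self):
        # Copy journaled writes to their final locations, then empty the journal.
        for block_no, data in self.journal:
            self.blocks[block_no] = data
        self.journal.clear()

    def recover(self):
        """Replay the journal after a crash; return the number of entries replayed."""
        replayed = len(self.journal)
        self.checkpoint()
        return replayed


if __name__ == "__main__":
    store = JournaledStore(num_blocks=1_000_000)
    store.write(7, b"alpha")
    store.write(42, b"beta")
    # Simulated crash here: the journal holds two writes not yet checkpointed.
    print("entries replayed:", store.recover())
```

In this sketch the store has one million blocks, yet recovery touches only the two journaled writes. A full consistency check, by contrast, would have to examine structures across the whole filesystem.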
Raw Devices
Linux provides raw device access through a pseudo filesystem that presents a
file-like interface for reading and writing actual disk partitions. Normally reads and
writes to files go through the Linux filesystem buffer cache; raw device IO, however,
bypasses the buffer cache and deals directly with the underlying hardware devices.
Each raw device corresponds to one partition. For the 2.4 series Linux kernel that
was used for this performance comparison, a system can have at most 255 raw
devices, and at most 14 partitions can be created on any disk.
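These two limits together bound how many raw devices a given number of disks can provide. The following Python sketch works through the arithmetic. The limits are taken from the text above; the function names are hypothetical, and the sketch assumes that each datafile occupies exactly one raw device, which is an assumption made for illustration.

```python
MAX_RAW_DEVICES = 255         # per system, for the 2.4 kernel described above
MAX_PARTITIONS_PER_DISK = 14  # per disk, as described above


def max_raw_devices(num_disks):
    """Upper bound on raw devices for a system with `num_disks` disks."""
    if num_disks < 0:
        raise ValueError("num_disks must be non-negative")
    return min(MAX_RAW_DEVICES, num_disks * MAX_PARTITIONS_PER_DISK)


def layout_fits(num_disks, datafiles_needed):
    """True if one raw device per datafile fits within both limits (assumed mapping)."""
    return datafiles_needed <= max_raw_devices(num_disks)


if __name__ == "__main__":
    for disks in (6, 18, 19):
        print(disks, "disks ->", max_raw_devices(disks), "raw devices at most")
```

With 6 disks the partition limit is the binding one (6 x 14 = 84 raw devices); from 19 disks upward the system-wide limit of 255 takes over.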
Oracle Clustered Filesystem (OCFS)
Beyond clustering features and basic file service, OCFS provides a number of
manageability benefits (for example, resizing datafiles and partitions is easy) and
comes with a set of tools to manage OCFS files.
BENCHMARK
This test used an OLTP workload, generated by processes simulating users who
connect to a database and perform transactions. The database simulated a real-
world retail store chain with 1000 warehouses. By varying the number of users, we
controlled the amount of work and the load on the CPU and the IO subsystem.
Configuration
Hardware
Processor 4 x Intel Pentium III 700 MHz (Cache 2MB)
OCFS 1.0.8-4
Test Setup
The setup consisted of the above server with direct-attached storage. The database
files were distributed evenly over 6 disks. The disk layout is given in
Appendix A. Appendix B contains the database parameter file used for the tests.
Test Goals
Evaluate the performance of ext2, ext3, raw, and OCFS in terms of the following
parameters, which are representative of the database performance.
• Transactions Per Second (TPS): The amount of work done per second; this
is a measure of throughput. The more transactions, the better the overall
system performance.
TPS indicates the overall database performance; it is the chief metric we used
to compare the filesystems.
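As a reminder of how the metric is derived, the following Python sketch computes TPS from a count of committed transactions and the length of the measurement interval. The function name and the sample numbers are hypothetical and are not measurements from this benchmark.

```python
def transactions_per_second(committed, elapsed_seconds):
    """Throughput: committed transactions divided by elapsed wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return committed / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    print(transactions_per_second(committed=18_000, elapsed_seconds=60.0))
```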
[Figure 1. X-axis: Users. Series: raw, ocfs, ext2, ext3j, ext3o, ext3w.]
Figure 1 shows that with raw or OCFS the transaction throughput increases
linearly. Since raw and OCFS bypass the filesystem cache, there is more memory
available for data. For ext2 and ext3, throughput increases linearly at first and
then the TPS graph stays level. The increase stops once the database cache fills
up. To acquire free buffers, the database writer must clear space in the database
cache, and system performance becomes IO-bound because the database is busy
writing these blocks to disk. Consequently we see more “free buffer waits” in the
Oracle statspack reports.
[Figure: Input (KB/s) versus Users. Series: raw, ocfs, ext2, ext3j, ext3o, ext3w.]
[Figure 3: Output (KB/s) versus Users. Series: raw, ocfs, ext2, ext3j, ext3o, ext3w.]
Figure 3, above, shows several effects. First, for ext2 and ext3 the number of bytes
written is greater than for raw or OCFS until the workload is scaled to a high
number of users. Why? Initially there are more Oracle buffer cache flushes for
ext2 and ext3, but for high numbers of users the system does less total work than
with raw and OCFS (see also Figure 1 above). Second, the output for ext2 and ext3
eventually levels off, which is likely due to a bottleneck in the IO subsystem.
Third, in journal mode ext3 writes more because it journals both data and metadata.
CONCLUSION
The ext2 and ext3 filesystems use the kernel buffer cache, which decreases database
workload performance as the number of users and the load on the system increase.
For a small number of users, ext2 or ext3 may suffice, but for a large number of
users, the limits of the IO subsystem are reached sooner than with raw and OCFS
storage.
The performance of the overall system depends on the type of application, and for
an Oracle database it usually depends greatly on the physical organization of data
on disk. Traditional filesystems such as ext2 and ext3 may perform better for
simple applications that benefit from the caches and algorithms of the filesystem
layer. For Oracle database systems, however, the on-disk data layout and access
strategies are best left to the Oracle server, and our test results demonstrate the
performance advantages of storing Oracle data in either raw files or OCFS files.
APPENDIX B – INIT.ORA PARAMETER FILE
compatible = 9.0.1.3.0
control_files = (/dwork1/ctl1,/dwork3/ctl2)
#control_files = ($diskloc/ctl1,$diskloc/ctl2)
db_name = fstest
db_block_size = 2048
db_files = 100
db_block_buffers = 750000 # SGA 1.7 Gb
sort_area_size = 10485760
shared_pool_size = 25000000
shared_pool_size = 15000000
log_buffer = 1048576
parallel_max_servers = 50
recovery_parallelism = 40
dml_locks = 500
processes = 619
sessions = 619
transactions = 600
cursor_space_for_time = TRUE
undo_management = auto
UNDO_SUPPRESS_ERRORS = true
undo_retention = 60
UNDO_TABLESPACE = undo_ts
max_rollback_segments = 520
db_writer_processes = 1
dbwr_io_slaves = 10
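The size of the database buffer cache implied by these parameters is db_block_buffers multiplied by db_block_size. The following Python sketch works through that arithmetic for the two values in the file above. The parser is a simplified illustration, not Oracle's own parameter file parser, and the names `parse_params` and `buffer_cache_bytes` are hypothetical.

```python
PFILE_EXCERPT = """
db_block_size = 2048
db_block_buffers = 750000 # SGA 1.7 Gb
"""


def parse_params(text):
    """Parse simple `name = value` lines, dropping `#` comments (simplified)."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip().lower()] = value.strip()
    return params


def buffer_cache_bytes(params):
    """Buffer cache size in bytes: number of buffers times block size."""
    return int(params["db_block_buffers"]) * int(params["db_block_size"])


if __name__ == "__main__":
    size = buffer_cache_bytes(parse_params(PFILE_EXCERPT))
    print(f"{size} bytes = {size / 10**9:.3f} GB = {size / 2**30:.2f} GiB")
```

The product is 750000 x 2048 = 1,536,000,000 bytes, or roughly 1.5 GB, so the buffer cache makes up most of the 1.7 Gb SGA noted in the parameter file comment.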
Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.
Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
www.oracle.com