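Analysis and Recommendations for Ext4 Mount Options

Based on test results and practical engineering experience, this article analyzes the mount options of Ext4 and XFS and gives recommendations.

Summary

The following options are recommended for big-data environments.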

noatime,nodelalloc
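
As a rough sketch of how this recommendation could be audited, the function below scans text in /proc/mounts format and reports which ext4 mounts lack the recommended options. The function name `missing_recommended` and the sample mount lines are illustrative assumptions, not part of any existing tool.

```python
RECOMMENDED = ("noatime", "nodelalloc")

def missing_recommended(mounts_text, fstype="ext4"):
    """Map each mount point of the given type to the recommended options it lacks."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != fstype:
            continue
        options = set(fields[3].split(","))
        missing = [opt for opt in RECOMMENDED if opt not in options]
        if missing:
            result[fields[1]] = missing
    return result

# Hypothetical sample in /proc/mounts format (devices and paths are made up)
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /data1 ext4 rw,noatime,nodelalloc 0 0
/dev/sdc1 /data2 ext4 rw,noatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""

if __name__ == "__main__":
    print(missing_recommended(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mounts, for example `missing_recommended(open("/proc/mounts").read())`.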

noatime

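Disables inode access-time updates on the filesystem, which can improve performance.

With traditional atime semantics (the Linux default before kernel 2.6.30), the access time is recorded every time a file is read. The biggest problem with this behavior is that even a read served from the page cache (from memory rather than from disk) still causes a disk write!

The noatime option eliminates these writes on reads. Most applications work fine with it; only a few programs, such as Mutt, need access times. Mutt users should use the relatime option instead, which updates the access time only when the previous access time is not later than the file's last modify or change time. The nodiratime option disables access-time updates for directories only. relatime is a good compromise: programs such as Mutt keep working, while the reduced number of access-time updates still improves performance.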

Note: noatime already implies nodiratime, so there is no need to specify both.
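
Since noatime covers directories as well, an option string that lists both can be simplified. The helper below is a minimal sketch of that cleanup; the name `drop_redundant_nodiratime` is hypothetical.

```python
def drop_redundant_nodiratime(options):
    """Remove nodiratime from a comma-separated option string when noatime is present."""
    parts = [opt for opt in options.split(",") if opt]
    if "noatime" in parts:
        # noatime already disables access-time updates for directories
        parts = [opt for opt in parts if opt != "nodiratime"]
    return ",".join(parts)

if __name__ == "__main__":
    print(drop_redundant_nodiratime("rw,noatime,nodiratime,nodelalloc"))
```

An option string that contains only nodiratime is left unchanged, because in that case the option still has an effect.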

Do not update inode access times on this filesystem (e.g., for faster access on the news spool to speed up news servers).

nodiratime

Disables access-time updates for directory inodes on the filesystem, which can improve performance.

relatime

Update inode access times relative to modify or change time. Access time is only updated if the previous access time was earlier than the current modify or change time. (Similar to noatime, but doesn't break mutt or other applications that need to know if a file has been read since the last time it was modified.) Because far fewer access-time writes are issued than with traditional atime semantics, this can improve performance.

Since Linux 2.6.30, the kernel defaults to the behavior provided by this option (unless noatime was specified), and the strictatime option is required to obtain traditional semantics. In addition, since Linux 2.6.30, the file’s last access time is always updated if it is more than 1 day old.
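
The relatime rule, including the one-day refresh, can be modeled as a small decision function. This is only a sketch of the behavior described above, not kernel code: timestamps are plain seconds, the function name is made up, and treating an access time equal to the modify or change time as needing an update is an assumption of this sketch.

```python
DAY_SECONDS = 24 * 60 * 60

def relatime_needs_update(atime, mtime, ctime, now):
    """Return True if a read at time `now` would update atime under relatime."""
    # Previous access is not later than the last modification
    if mtime >= atime:
        return True
    # Previous access is not later than the last inode change
    if ctime >= atime:
        return True
    # Stored access time is at least one day old
    if now - atime >= DAY_SECONDS:
        return True
    return False

if __name__ == "__main__":
    # File modified at t=200 and last read at t=100: the next read updates atime
    print(relatime_needs_update(atime=100, mtime=200, ctime=200, now=300))
    # File already read (t=300) after its last modification: no update
    print(relatime_needs_update(atime=300, mtime=200, ctime=200, now=400))
```

The second case is what saves disk writes: repeated reads of an unmodified file leave the inode untouched until the stored access time is a day old.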

nobarrier

This disables the use of write barriers in the jbd code. Write barriers require an IO stack which can support them; if jbd gets an error on a barrier write, it will disable barriers again with a warning. Write barriers enforce proper on-disk ordering of journal commits, making volatile disk write caches safe to use, at some performance penalty. If your disks are battery-backed in one way or another, disabling barriers may safely improve performance. Besides barrier=1 and barrier=0, the mount options "barrier" and "nobarrier" can also be used to enable or disable barriers, for consistency with other ext4 mount options.

data=journal

All data are committed into the journal prior to being written into the main file system.

data=writeback 

Data ordering is not preserved, data may be written into the main file system after its metadata has been committed to the journal.

defaults

Use default options: rw, suid, dev, exec, auto, nouser, and async.

delalloc

Default. Defer block allocation until just before ext4 writes out the block(s) in question. This allows ext4 to make better allocation decisions more efficiently.

nodelalloc

Disable delayed allocation. Blocks are allocated when the data is copied from userspace to the page cache, either via the write(2) system call or when an mmap'ed page which was previously unallocated is written for the first time.

discard

On SSDs, do not add the discard option. The ext4 default is nodiscard.

Controls whether ext4 should issue discard/TRIM commands to the underlying block device when blocks are freed. This is useful for SSD devices and sparse/thinly-provisioned LUNs, but it is off by default until sufficient testing has been done.

IMPORTANT: Do not discard blocks in filesystem usage.
Be sure to turn off the discard option when making your Linux filesystem. You want to allow the SSD to manage blocks and its activity between the NVM (non-volatile memory) and host with more advanced and consistent approaches in the SSD controller.
Core filesystems:
• ext4 – the default extended option is not to discard blocks at filesystem make time; retain this, and do not add the "discard" extended option as some information will tell you to do.
• xfs – with mkfs.xfs, add the -K option so that you do not discard blocks.
If you are going to use a software RAID, it is recommended to use a chunk size of 128k as a starting point, depending on the workload you are going to run. You must always test your workload.
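
To follow the advice on the discard mount option, existing fstab entries can be checked for it. The sketch below assumes the standard whitespace-separated fstab layout; the function name and the sample entries (devices and mount points) are hypothetical.

```python
def entries_using_discard(fstab_text):
    """Return (mount_point, fs_type) for fstab entries whose options include discard."""
    flagged = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4:
            continue
        # Exact match, so that nodiscard is not flagged
        if "discard" in fields[3].split(","):
            flagged.append((fields[1], fields[2]))
    return flagged

# Hypothetical fstab content
SAMPLE_FSTAB = """\
# <device> <mount point> <type> <options> <dump> <pass>
/dev/nvme0n1p1 /data1 ext4 noatime,nodelalloc 0 2
/dev/nvme1n1p1 /data2 ext4 noatime,discard 0 2
/dev/nvme2n1p1 /data3 xfs noatime,nodiscard 0 2
"""

if __name__ == "__main__":
    print(entries_using_discard(SAMPLE_FSTAB))
```

Options are compared as whole tokens rather than by substring, which keeps an explicit nodiscard from being reported.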

References

https://www.intel.com/content/dam/support/us/en/documents/ssdc/data-center-ssds/Intel_Linux_NVMe_Guide_330602-002.pdf