Linux 3.1 released on 24 Oct 2011
Summary: Support for the OpenRISC open source CPU, performance improvements to the writeback throttling, some speedups in the slab allocator, a new iSCSI implementation, support for Near-Field Communication chips used to enable mobile payments, bad block management in the generic software RAID layer, a new "cpupowerutils" userspace utility for power management, file system barriers enabled by default in ext3, Wii Remote controller support, new drivers and many small improvements.
Contents
- Prominent features in the 3.1 kernel
- New architecture: OpenRISC
- Dynamic writeback throttling
- File system barriers enabled by default in ext3
- Support for Near-Field Communication
- Slab allocator speedups
- VFS scalability improvements
- New iSCSI implementation
- cpupowerutils
- Software RAID: Bad block management
- Personality to report 2.6.x version numbers
- Wii Remote controller support
- Driver and architecture-specific changes
- Memory management
- Networking
- File systems
- Block layer
- Crypto
- Virtualization
- Security
- Tracing/profiling
- Various core changes
1. Prominent features in the 3.1 kernel
1.1. New architecture: OpenRISC
OpenRISC is an Open Source CPU from the OpenCores project that brings to the world of hardware all the same advantages that Open Source software has known for so long. The aim of the project is to create free and open source computing platforms available under the GNU (L)GPL license, and a set of free, open source implementations of the architecture and open source software development tools, libraries, operating systems and applications. The implementation merged in this release is the 32-bit OpenRISC 1000 family (or1k). Details about the CPU are available from the OpenCores project.
Code: arch/openrisc
1.2. Dynamic writeback throttling
Recommended LWN article: "Dynamic writeback throttling"
"Writeback" is the process of writing data from RAM to the disk, and in this context throttling means temporarily blocking processes so that they stop creating new data that needs to be written until the current data has been written to the disk. The old writeback code was suboptimal: in certain situations it throttled several processes and forced them to write their data to the disk simultaneously, creating random IO patterns which are not good for performance. The new code avoids these situations and helps to create more linear IO patterns. The new code also tries to detect the available disk bandwidth and uses it to improve the heuristics that decide which processes should be throttled, which should improve overall throughput. The writeback<->filesystems coupling has also been improved, and an SMP scaling problem has been fixed.
Code: (commit), (commit), (commit), (commit), (commit), (commit)
1.3. File system barriers enabled by default in ext3
Hard disks have a memory buffer where they temporarily store the instructions and data issued by the OS while the disk processes them. The internal software of modern disks changes the order of the instructions to improve performance, which means that instructions may or may not be committed to the disk in the same order the OS issued them. This breaks many of the assumptions that file systems need to reliably implement things like journaling or COW, so disks provide a "cache flush" instruction that the OS uses when it needs it. In the Linux world, when a file system issues that instruction, it is called a "barrier". File systems such as XFS, Btrfs and ext4 already use and enable barriers by default; ext3 supports them, but until this release it did not enable them by default: while the data safety guarantees are higher, their performance impact in ext3 is noticeable in many common workloads, and enabling them by default was considered an unacceptable performance regression. However, Linux distributions like Red Hat have enabled barriers by default in ext3 for a long time, and now the default for mainline has been changed as well.
In other words: if you use ext3 and you notice performance regressions with this release, try disabling barriers ("barrier=0" mount option).
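To see which ext3 file systems carry an explicit barrier setting, one can scan mount table text in the /proc/mounts layout (device, mount point, type, options). The following is a minimal sketch; the function name ext3_barrier_settings is hypothetical, and it assumes that an explicit setting appears in the options field as a string starting with "barrier". A mount with no such option is reported as None, meaning the kernel default applies.

```python
def ext3_barrier_settings(mounts_text):
    """Map each ext3 mount point to its explicit barrier option, or None.

    mounts_text is text in the /proc/mounts layout:
    device, mount point, file system type, comma-separated options, ...
    """
    settings = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext3":
            continue  # malformed line or not an ext3 mount
        mount_point = fields[1]
        barrier = None
        for opt in fields[3].split(","):
            # Assumption: explicit settings look like "barrier=0" / "barrier=1".
            if opt.startswith("barrier"):
                barrier = opt
        settings[mount_point] = barrier
    return settings
```

On a running system the input could come from reading /proc/mounts; any mount point mapped to None relies on the default, which this release changes to barriers enabled.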
Code: (commit)
1.4. Support for Near-Field Communication
Near-field communication (Wiki article) allows for simplified wireless exchange of data between two devices in close proximity to each other, usually no more than a few centimeters apart. Co-invented by NXP Semiconductors and Sony in 2002, NFC chips can be found in many smartphones already available in the market, and more manufacturers are planning to add them.
NFC is expected to become a widely used system for making payments by smartphone in the US: shoppers who have their credit card information stored in their NFC smartphones can pay for purchases by waving their smartphones near or tapping them on the reader, rather than bothering with the actual credit card. A smartphone or tablet with an NFC chip could make a credit card payment or serve as keycard or ID card. NFC devices can also read NFC tags on a museum or retail display to get more information or an audio or video presentation. NFC can also be used to share contacts, photos, songs, applications, or videos.
This release adds an NFC subsystem and a new NFC socket family.
Code: (commit 1, 2, 3, 4, 5, 6)
1.5. Slab allocator speedups
In this release, the "slub" slab allocator (the default one) implements wider lockless operations in most of the slowpaths on architectures that support CMPXCHG (compare and exchange). In particular, the patches decrease the overhead in the performance-critical section that frees slabs, which speeds up slab-intensive workloads.
Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
1.6. VFS scalability improvements
As in past releases, this release has a new round of VFS scalability improvements:
Convert the inode_stat.nr_unused counter to a per-cpu counter (commit)
Convert the global LRU list of unused inodes to a per-superblock LRU list (commit), (commit), (commit), (commit), (commit)
As a consequence of the per-superblock LRU list of unused inodes, remove the iprune_sem semaphore (commit)
Kill i_alloc_sem and replace its functionality with a simpler scheme (see commit for details) (commit)
Mount lock scalability for file systems that don't have a mount point (e.g. sockfs and pipefs) (commit)
Avoid taking inode_hash_lock on pipes and sockets (commit)
1.7. New iSCSI implementation
Recommended LWN article: A tale of two SCSI targets
The current iSCSI implementation used in the kernel, STGT, has been obsoleted with the inclusion of the Linux-iSCSI.org SCSI target. The Linux-iSCSI.org target module is a full featured in-kernel software implementation of iSCSI target mode (RFC-3720). More information can be found here.
Code: (commit)
1.8. cpupowerutils
cpupowerutils is a new project derived from cpufrequtils and extended with other features, like a powerful HW monitoring tool. Why a new project? The announcement explains it:
"CPU power consumption vs performance tuning is not about CPU frequency switching anymore for quite some time. Deep sleep states, traditional dynamic frequency scaling and hidden turbo/boost frequencies are tight close together and depend on each other. The first two exist on different architectures like PPC, Itanium and ARM the latter only on x86. On x86 the APU (CPU+GPU) will only run most efficiently if CPU and GPU has proper power management in place. Users and Developers want to have *one* tool to get an overview what their system supports and to monitor and debug CPU power management in detail". cpupowerutils is that tool.
The code is available in tools/power/cpupower/
1.9. Software RAID: Bad block management
The MD layer (aka "Multiple Devices", aka software RAID) has gained bad block management support: bad blocks will be added to a list, and the system will try not to use them. This feature requires an updated mdadm version.
Code: many commits
1.10. Personality to report 2.6.x version numbers
Some programs broke with the new Linux 3.0 version number. Some of those were binary only (for example, all kinds of management software from a certain printer manufacturer). sys.platform in Python is also known to return "linux3" under 3.0, which breaks code that checks for sys.platform == "linux2". To solve this problem, a UNAME26 personality has been added to report 2.6.x version numbers (commit).
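For Python code that the program author can still change, the sys.platform breakage described above can be avoided without the personality by not comparing against an exact string. A minimal sketch follows; the helper name is_linux is hypothetical.

```python
import sys


def is_linux(platform=None):
    """Return True for any Linux platform string ("linux2", "linux3", "linux").

    A prefix match keeps working when the kernel major version changes,
    unlike an exact comparison such as sys.platform == "linux2".
    """
    name = sys.platform if platform is None else platform
    return name.startswith("linux")
```

Binary-only programs cannot be fixed this way, which is the case the UNAME26 personality is meant to cover.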
1.11. Wii Remote controller support
A driver for the Nintendo Wii Remote controller (wiimote) has been added.
Code: (commit)
2. Driver and architecture-specific changes
All the driver and architecture-specific changes can be found in the Linux_3.1_DriverArch page
3. Memory management
Memory control group: add memory.vmscan_stat (commit)
Extend memory hotplug API to allow memory hotplug in virtual machines (commit)
page allocator: fix significant stalls while copying large amounts of data on NUMA machines (commit), (commit)
tmpfs: Increase the file size limit (commit)
"Slub" slab allocator
Add method to verify memory is not freed with slub_debug (commit)
Reduce overhead of slub_debug (commit)
4. Networking
AF_PACKET: add 'cpu' fanout policy. (commit), (commit), (commit)
B.A.T.M.A.N: improved client announcement mechanism (commit), (commit)
Add support for skb zero-copy buffers (commit), (commit), (commit)
Compute protocol sequence numbers and fragment IDs using MD5 instead of MD4, in line with both RFC1948 and other operating systems (commit)
Add multicast group for DCB (commit)
- Netfilter
Lower the default initRTO from 3 seconds to 1 second, as per RFC2988bis. It falls back to 3 seconds if the SYN or SYN-ACK packet has been retransmitted and the TCP timestamp option is not on (commit)
SCTP: Add Auto-ASCONF (RFC5061) support (commit), (commit), (commit), (commit)
inetpeer microoptimization: reduce the false sharing effect by reordering the members of a struct (commit)
ipv4 microoptimization: save cpu cycles from check_leaf(), with route cache disabled this saves ~2% of cpu in udpflood bench (commit)
ethtool: remove support for ETHTOOL_GRXNTUPLE (commit)
9P: Add 9P2000.L renameat operation (commit), add 9P2000.L unlinkat operation (commit)
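The initRTO rule listed above (1 second by default, 3 seconds when the SYN or SYN-ACK was retransmitted and TCP timestamps are off) can be written as a small decision function. This is only an illustrative sketch of the rule as stated in this list, not the kernel's code; the names are hypothetical, and it ignores the RTT measurements a real TCP stack would normally use once they are available.

```python
INIT_RTO_SECS = 1.0      # new default initial retransmission timeout
FALLBACK_RTO_SECS = 3.0  # previous default, still used as a fallback


def rto_after_handshake(syn_retransmitted, timestamps_enabled):
    """Pick the RTO to use once the handshake completes.

    Fall back to 3 seconds only when BOTH conditions hold: the SYN or
    SYN-ACK had to be retransmitted AND the TCP timestamp option is off.
    """
    if syn_retransmitted and not timestamps_enabled:
        return FALLBACK_RTO_SECS
    return INIT_RTO_SECS
```

For example, rto_after_handshake(True, True) stays at 1 second, because only the combination of a retransmitted handshake packet and disabled timestamps triggers the fallback.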
5. File systems
Btrfs
Improve ls readdir() performance significantly (commit), (commit)
Switch the btrfs tree locks to reader/writer (see commit link for details) (commit)
NFS
XFS
REISERFS
Default to barrier=flush (commit)
FAT
FAT16: support a maximum file/volume size of 4GB, as Windows XP and Windows 7 do (commit)
HFSplus
Lift the 2TB size limit (commit)
SquashFS
Make zlib compression support optional (commit)
6. Block layer
Strict CPU affinity, by writing the value 2 to /sys/block/<bdev>/queue/rq_affinity (commit)
loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distributions to configure 0 pre-allocated loop devices (commit), (commit)
Device Mapper
flakey target: add corrupt_bio_byte feature (commit), add drop_writes (commit)
Support the MD RAID1 personality through the dm-raid target (commit)
raid: Support metadata devices (commit)
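The strict CPU affinity setting mentioned in this list is enabled by writing the value 2 to the rq_affinity file of a block device. A minimal sketch of doing that from a script follows; the function names and the sysfs_root parameter are hypothetical conveniences (the parameter exists only so the code can be tried against a scratch directory), and writing to the real file requires root privileges.

```python
import os


def rq_affinity_path(bdev, sysfs_root="/sys"):
    """Build the path /sys/block/<bdev>/queue/rq_affinity."""
    return os.path.join(sysfs_root, "block", bdev, "queue", "rq_affinity")


def set_strict_rq_affinity(bdev, sysfs_root="/sys"):
    """Request strict CPU affinity for a device by writing 2."""
    with open(rq_affinity_path(bdev, sysfs_root), "w") as f:
        f.write("2\n")
```

For a device named sda, set_strict_rq_affinity("sda") would write to /sys/block/sda/queue/rq_affinity.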
7. Crypto
Add ablkcipher support (commit)
s390: support hardware accelerated SHA-224 (commit)
eCryptfs: Add mount option to check that the uid of the device being mounted matches the expected uid (commit)
encrypted-keys: add key format support (commit), add eCryptfs format support (commit)
8. Virtualization
KVM
Nested VMX (Intel virtualization) support (commit)
Enable ERMS (Enhanced REP MOVSB/STOSB) feature support; with it, fast string operations attempt to move as much of the data as possible using larger load/stores (commit)
vhost TX zero-copy support (commit)
Lockless walking shadow page table (commit)
MMIO page fault support (commit)
PPC: e500: Add shadow PID support (commit), add support for Book3S processors in hypervisor mode (commit), book3s_hv: Add support for PPC970-family processors (commit)
Xen
VMware
vmxnet3: Enable GRO support. (commit)
Others
Introduce Freescale hypervisor management driver (commit)
9. Security
TOMOYO
Add auditing interface. (commit)
Add ACL group support. (commit)
Add policy namespace support. (commit)
Add built-in policy support. (commit)
Make several options configurable. (commit)
Allow using the following properties as conditions: argv[]/envp[] of execve() (commit), executable's realpath and symlink's target (commit), owner/group etc. of file objects (commit), UID/GID etc. of current thread (commit)
10. Tracing/profiling
perf probe: Support adding probes on offline kernel modules (commit), (commit)
perf: Add model 45 Intel Sandy Bridge support (commit)
perf report/annotate/script: Add option to specify a CPU range (commit)
perf tools: Add inverted call graph report support. (commit)
11. Various core changes
Add lock-less NULL-terminated singly linked list (commit)
crc8: add new library module providing crc8 algorithm (commit)
Make gen_pool memory allocator lockless. This makes it safe to use in NMI handlers and other special unblockable contexts that could otherwise deadlock on locks (commit)
ptrace: implement PTRACE_INTERRUPT (commit), PTRACE_LISTEN (commit), PTRACE_SEIZE (commit), TRAP_NOTIFY (commit)
module: add /sys/module/<name>/uevent files (commit)
Add SEEK_HOLE and SEEK_DATA flags in lseek() (commit)
coredump: escape "/" in hostname and comm strings (commit)
ipc: introduce shm_rmid_forced sysctl. If set to 1, all shared memory objects will be automatically forced to use IPC_RMID. (commit)
PM / Domains: Support for generic I/O PM domains (v8) (commit)
PM / Domains: System-wide transitions support for generic domains (v5) (commit)
PM / Domains: Wakeup devices support for system sleep transitions (commit)
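The SEEK_HOLE and SEEK_DATA flags listed above let a program walk the allocated regions of a sparse file without reading the holes. The following is a minimal sketch using Python's os.lseek with os.SEEK_DATA and os.SEEK_HOLE (available where the platform provides them); the function name data_segments is hypothetical. On file systems that do not track holes, the whole file is expected to be reported as a single data segment.

```python
import errno
import os


def data_segments(fd):
    """Return a list of (start, end) byte ranges that contain data."""
    size = os.fstat(fd).st_size
    segments = []
    offset = 0
    while offset < size:
        try:
            # Jump to the next region that contains data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                break  # no more data after this offset
            raise
        # The data region ends at the next hole (or at end of file).
        end = os.lseek(fd, start, os.SEEK_HOLE)
        segments.append((start, end))
        offset = end
    return segments
```

A backup or copy tool could use such a list to read and transfer only the ranges that actually hold data.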