Linux 3.12 was released on November 2, 2013
Summary: This release adds support for offline deduplication in Btrfs, automatic GPU switching in laptops with dual GPUs, a performance boost for AMD Radeon graphics, better RAID-5 multicore performance, improved handling of out-of-memory situations, improved VFS path name resolution scalability, improvements to the timerless multitasking mode, separate modesetting and rendering device nodes in the graphics DRM layer, improved locking performance for virtualized guests, XFS directory recursion scalability improvements, IPC scalability improvements, tty layer locking improvements, new drivers and many small improvements.
Contents
Prominent features
- Offline data deduplication support in Btrfs
- Graphics performance boost for AMD Radeon hardware
- Automatic GPU switching in laptops with dual GPUs
- Separate device nodes for graphics mode setting and rendering
- Improved timerless multitasking: allow the timekeeping CPU to go idle
- RAID5 multithreading
- Improved locking performance for virtualized guests
- New lockref locking scheme, VFS locking improvements
- Better Out-Of-Memory handling
- XFS directory recursion scalability, namespace support
- Improved tty layer locking
- IPC locking improvements
- Drivers and architectures
- Core
- Memory management
- Block layer
- File systems
- Networking
- Crypto
- Virtualization
- Security
- Tracing/perf
- Other news sites that track the changes of this release
1. Prominent features
1.1. Offline data deduplication support in Btrfs
The Btrfs filesystem has gained support for offline data deduplication. Deduplication consists of removing copies of repeated data from the filesystem: since the data is identical, only one copy is necessary. In some workloads, such as virtualization (VM images often contain similar copies of operating systems) or backups, the gains can be enormous. "Offline" means that deduplication is not done automatically and transparently as processes write data; instead, it is triggered by userspace software at a time controlled by the system administrator, while the file system is mounted and running. Online deduplication will be added in future releases.
The bedup deduplication tool has a branch that works with this support. The branch can be found here.
The author of the deduplication support has also written a sample deduplication tool, duperemove, which can be found here.
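The sketch below illustrates the candidate-finding step that an offline deduplication tool performs before asking the kernel to share identical data. It is a hypothetical, simplified example, not the code of bedup or duperemove (the names find_duplicate_blocks and block_hash are illustrative). It splits an in-memory buffer into fixed-size blocks, hashes each block with FNV-1a, and confirms every hash match byte for byte. A real tool would then pass the matching ranges to the kernel (on Btrfs in this release, through the BTRFS_IOC_FILE_EXTENT_SAME ioctl), which compares the data again before sharing the extents.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a 64-bit hash of one block (illustrative choice of hash). */
static uint64_t block_hash(const unsigned char *p, size_t len)
{
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/*
 * Scan `nblocks` blocks of `bsize` bytes in `buf`. dup_of[i] receives the
 * index of the first earlier block with identical contents, or i itself
 * if block i is the first copy. `hashes` needs room for nblocks entries.
 * Returns the number of duplicate blocks, i.e. blocks whose storage could
 * be shared with an earlier copy.
 */
size_t find_duplicate_blocks(const unsigned char *buf, size_t bsize,
                             size_t nblocks, uint64_t *hashes, size_t *dup_of)
{
    size_t dups = 0;
    for (size_t i = 0; i < nblocks; i++) {
        const unsigned char *blk = buf + i * bsize;
        hashes[i] = block_hash(blk, bsize);
        dup_of[i] = i;
        for (size_t j = 0; j < i; j++) {
            /* Only compare against first copies. A hash match is cheap but
             * not proof of equality, so confirm with memcmp(). */
            if (dup_of[j] == j && hashes[j] == hashes[i] &&
                memcmp(buf + j * bsize, blk, bsize) == 0) {
                dup_of[i] = j;
                dups++;
                break;
            }
        }
    }
    return dups;
}
```

For the four 4-byte blocks "AAAA", "BBBB", "AAAA", "AAAA", the function returns 2 and maps blocks 2 and 3 to block 0. The quadratic scan keeps the sketch short; a practical tool would index the hashes instead.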
Code: commit
1.2. Graphics performance boost for AMD Radeon hardware
The website Phoronix.com found that graphics performance on modern AMD Radeon GPUs had improved a lot in Linux 3.12. However, there had not been any significant change in the Radeon driver that could explain such massive gains. After some investigation, Phoronix found that the change responsible for this boost was not in the Radeon driver itself, but in the algorithms of the cpufreq ondemand governor. Apparently, the ondemand governor was oscillating too much between frequencies, and this oscillation harmed graphics performance on Radeon GPUs. The new frequency algorithm eliminates this problem.
Code: commit
1.3. Automatic GPU switching in laptops with dual GPUs
Some laptops, such as those using Nvidia Optimus, have two GPUs: one optimized for performance and the other for power saving. Until now, some hacks were needed to switch between these GPUs. In this release, the driver handles the switch automatically.
1.4. Separate device nodes for graphics mode setting and rendering
Recent hardware development (especially on ARM) shows that rendering (via the GPU) and mode setting (via the display controller) are not necessarily bound to the same graphics device. This release adds support to the graphics layer for separate device nodes for mode setting and rendering. The main use is to allow different access modes for graphics compositors (which require the mode-setting API) and for client-side rendering or GPGPU users (which both require the rendering API).
For more information, see this blog post: Splitting DRM and KMS device nodes
1.5. Improved timerless multitasking: allow the timekeeping CPU to go idle
Linux 3.10 added support for timerless multitasking, that is, the ability to run processes without needing to fire the timer interrupt that is traditionally used to implement multitasking. This support, however, had a caveat: it could turn off the timer interrupt on all CPUs except one, which is used to keep track of time on behalf of the other CPUs. That CPU kept its timer running even when all the CPUs were idle, which was wasteful. This release allows the timer on the timekeeping CPU to be turned off when all CPUs are idle.
Recommended LWN article: Is the whole system idle?
Code: commit 1, 2, 3, 4, 5, 6, 7, 8
1.6. RAID5 multithreading
This release tries to spread the work needed to handle RAID5 stripes across multiple CPUs in the MD ("multiple devices") layer, which allows more I/O operations per second on fast (SSD) devices.
1.7. Improved locking performance for virtualized guests
The operating system running in each virtualized guest also uses its own locks. With some locks, such as spinlocks, this causes problems when many guests are present: guests keep spinning and wasting host CPU time, among other problems. This release replaces paravirtualized spinlocks with paravirtualized ticket spinlocks, which have better performance properties for virtualized guests and bring speedups on various benchmarks.
Recommended paper: Prevent Guests from Spinning Around
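A ticket spinlock hands out tickets in arrival order and grants the lock to waiters strictly in that order. The following is a minimal C11 sketch of a plain (non-paravirtualized) ticket lock; the names and the spin threshold value are illustrative, not the kernel's code. The sched_yield() call only stands in for what the paravirtualized version does after spinning for a while, which is to ask the hypervisor to halt the waiting vCPU until the unlocker kicks it.

```c
#include <sched.h>
#include <stdatomic.h>

/* Illustrative spin budget before giving the CPU away. */
#define SPIN_THRESHOLD (1u << 15)

struct ticket_lock {
    atomic_uint next;     /* next ticket to hand out */
    atomic_uint serving;  /* ticket that currently owns the lock */
};

void ticket_acquire(struct ticket_lock *l)
{
    /* Take a ticket; waiters are served strictly in ticket order. */
    unsigned int my = atomic_fetch_add(&l->next, 1);
    for (;;) {
        for (unsigned int i = 0; i < SPIN_THRESHOLD; i++)
            if (atomic_load(&l->serving) == my)
                return;
        /* Stand-in for the paravirt slow path, where the guest would ask
         * the hypervisor to halt this vCPU until it is kicked. */
        sched_yield();
    }
}

void ticket_release(struct ticket_lock *l)
{
    /* Hand the lock to the holder of the next ticket. */
    atomic_fetch_add(&l->serving, 1);
}
```

The strict FIFO order is what makes ticket locks delicate in guests: if the vCPU holding the next ticket is preempted by the host, every waiter behind it is stuck, which is why letting waiters block in the hypervisor matters.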
1.8. New lockref locking scheme, VFS locking improvements
This release adds a new locking scheme, called "lockref". The "lockref" structure combines a spinlock and a reference count in a way that allows optimized reference count accesses. In particular, it guarantees that the reference count is updated as if the spinlock were held, but by using atomic accesses that cover both the reference count and the spinlock words, it can often perform the update without actually taking the lock. This avoids the worst cases of spinlock contention on large machines. When updating the reference counts on a large system, the cache line will still bounce around, but that is much less noticeable than actually having to spin waiting for the lock. This release already uses lockref to improve the scalability of heavy pathname lookup on large systems.
Recommended LWN article: Introducing lockrefs
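The idea can be modeled in userspace with C11 atomics. In this hypothetical sketch, which is not the kernel's struct lockref, a single 64-bit word holds a lock flag in its low 32 bits and the reference count in its high 32 bits. The fast path increments the count with one compare-and-swap over the whole word, but only while the lock half reads as unlocked; if the lock is held, it falls back to taking the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: low 32 bits = lock (0 = unlocked),
 * high 32 bits = reference count, always updated together. */
struct lockref_model {
    _Atomic uint64_t word;
};

#define LR_LOCK_MASK 0xffffffffULL
#define LR_COUNT_ONE (1ULL << 32)

/* Fast path: bump the count with a CAS, only while unlocked. */
static bool lockref_get_fast(struct lockref_model *lr)
{
    uint64_t old = atomic_load(&lr->word);
    while ((old & LR_LOCK_MASK) == 0) {
        if (atomic_compare_exchange_weak(&lr->word, &old, old + LR_COUNT_ONE))
            return true;
        /* CAS failed: `old` now holds the current value, so retry. */
    }
    return false;   /* lock is held: caller must take the slow path */
}

static void lockref_lock(struct lockref_model *lr)
{
    for (;;) {
        uint64_t old = atomic_load(&lr->word);
        if ((old & LR_LOCK_MASK) == 0 &&
            atomic_compare_exchange_weak(&lr->word, &old, old | 1))
            return;
    }
}

static void lockref_unlock(struct lockref_model *lr)
{
    atomic_fetch_and(&lr->word, ~LR_LOCK_MASK);  /* clear lock, keep count */
}

/* Take a reference: lockless if possible, otherwise under the lock. */
void lockref_get(struct lockref_model *lr)
{
    if (lockref_get_fast(lr))
        return;
    lockref_lock(lr);
    atomic_fetch_add(&lr->word, LR_COUNT_ONE);   /* count++ while locked */
    lockref_unlock(lr);
}

uint32_t lockref_count(struct lockref_model *lr)
{
    return (uint32_t)(atomic_load(&lr->word) >> 32);
}
```

Because the lock and the count share one word, a lockless increment can never slip in while another thread holds the lock: the CAS would see the lock bit and fail.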
1.9. Better Out-Of-Memory handling
The Out-Of-Memory (OOM) state happens when the computer runs out of both RAM and swap. When Linux gets into this state, it kills a process in order to free memory. This release includes important changes to how OOM states are handled, to the number of out-of-memory errors sent to userspace, and to reliability. For more details, see the LWN article below.
Recommended LWN article: Reliable out-of-memory handling
Code: commit 1, 2, 3, 4, 5, 6, 7
1.10. XFS directory recursion scalability, namespace support
XFS has added support for storing the file type in directory entries, so that readdir can return to userspace the type of the inode a dirent points to without first having to read that inode from disk. This greatly improves the performance of directory recursion. A parallel walk of ~50 million directory entries across hundreds of directories improves from roughly 500 getdents() calls per second and 250,000 inode lookups per second (needed to determine each inode's type), at roughly 17,000 read IOPS, to 3,500 getdents() calls per second at 16,000 IOPS, with no inode lookups at all.
This release has also added XFS support for namespaces, and has restored defragmentation support for the new CRC filesystem format.
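From userspace, the benefit shows up in the d_type field that readdir() returns. A directory walker can trust d_type when the filesystem fills it in, and only needs an extra lstat() (an inode lookup) when it reports DT_UNKNOWN. The helper below is a hypothetical sketch of that pattern; count_subdirs and its stat_calls counter are illustrative names.

```c
#define _DEFAULT_SOURCE            /* for the DT_* constants and lstat() */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count the subdirectories of `path`. d_type is used when the filesystem
 * fills it in; otherwise each entry costs an lstat() (an inode lookup).
 * *stat_calls receives the number of such fallbacks. Returns -1 on error.
 */
long count_subdirs(const char *path, long *stat_calls)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long subdirs = 0;
    struct dirent *e;
    *stat_calls = 0;
    while ((e = readdir(dir)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (e->d_type != DT_UNKNOWN) {
            /* Type came straight from the directory entry: no inode read. */
            subdirs += (e->d_type == DT_DIR);
            continue;
        }
        /* The filesystem did not record the type: look up the inode. */
        char full[4096];
        struct stat st;
        snprintf(full, sizeof full, "%s/%s", path, e->d_name);
        (*stat_calls)++;
        if (lstat(full, &st) == 0 && S_ISDIR(st.st_mode))
            subdirs++;
    }
    closedir(dir);
    return subdirs;
}
```

On a filesystem that records the type, stat_calls stays at zero; on one that returns DT_UNKNOWN, it grows with every entry, which corresponds to the inode lookups the new XFS format eliminates.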
1.11. Improved tty layer locking
The tty layer locking was cleaned up, and in the process a lot of locking became per-tty, which makes a noticeable difference on some unusual workloads.
Code: merge commit
1.12. IPC locking improvements
This release reduces the amount of contention on the IPC lock (kern_ipc_perm.lock). These changes mostly deal with shared memory; similar work was already done for semaphores in 3.10 and for message queues in 3.11.
With these changes, the execution time of a custom shm microbenchmark that stresses shmctl() by doing IPC_STAT a million times with 4 threads is reduced by 50%. A similar run, this time with IPC_SET, goes from 3 minutes 35 seconds to 27 seconds.
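The benchmark itself is not reproduced here, but its inner operation looks roughly like the single-threaded sketch below: create a private System V shared memory segment, call shmctl() with IPC_STAT in a loop, then remove the segment. The function name shm_stat_loop is hypothetical; running several copies of this loop in parallel threads against the same segment is what creates contention on the per-segment IPC lock.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Create a private segment of `size` bytes, query it `iters` times with
 * IPC_STAT, and remove it. Returns the segment size reported by the last
 * IPC_STAT, or -1 on error (or if iters is 0).
 */
long shm_stat_loop(size_t size, long iters)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    struct shmid_ds ds;
    long segsz = -1;
    for (long i = 0; i < iters; i++) {
        if (shmctl(id, IPC_STAT, &ds) != 0) {
            segsz = -1;
            break;
        }
        segsz = (long)ds.shm_segsz;
    }
    shmctl(id, IPC_RMID, NULL);   /* always clean up the segment */
    return segsz;
}
```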
Code: commit, 2, 3, 4, 5, 6, 7, 8, 9, 10
2. Drivers and architectures
All the driver and architecture-specific changes can be found on the Linux_3.12-DriversArch page.
3. Core
task scheduler: Implement smarter wake-affine logic commit
seqlock: Add a new locking reader type commit
idr: Percpu ida commit
initmpfs: use initramfs if rootfstype= or root= specified commit
Lock in place mounts from more privileged users commit
sysfs: Restrict mounting sysfs commit
CacheFiles: Implement interface to check cache consistency commit, commit
modules: add support for soft module dependencies commit, commit
Add support for aio ring pages migration commit
Implement generic deferred AIO completions commit
4. Memory management
Rework the cache shrinking mechanisms (recommended LWN article: Smarter shrinkers) commit
Data writeback: add strictlimit feature. The feature prevents untrusted filesystems (e.g. FUSE mounts created by unprivileged users) from accumulating a large number of dirty pages before being throttled. commit
Page allocator: fair zone allocator policy commit
Account anon transparent huge pages into NR_ANON_PAGES commit
swap: change block allocation algorithm for SSD commit
swap: make cluster allocation per-cpu commit
swap: make swap discard async commit
5. Block layer
Detect hybrid MBRs commit
dm cache: add data block size limits. The data block size cannot be an arbitrary number: its value must be between 32KB and 1GB, and it must be a multiple of 32KB commit
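The limits above amount to a simple check, sketched here in C with hypothetical function names. KB and GB are interpreted as powers of two. Since dm-cache table lines give the block size in 512-byte sectors, the second helper converts before checking: 64 sectors (32KB) is the smallest valid size, while 96 sectors (48KB) is rejected because it is not a multiple of 32KB.

```c
#include <stdbool.h>
#include <stdint.h>

#define KIB 1024ULL
#define DM_CACHE_MIN_BLOCK_BYTES (32 * KIB)           /* 32KB */
#define DM_CACHE_MAX_BLOCK_BYTES (1024 * 1024 * KIB)  /* 1GB  */
#define SECTOR_BYTES 512ULL

/* Block size in bytes: within [32KB, 1GB] and a multiple of 32KB. */
bool dm_cache_block_size_ok(uint64_t bytes)
{
    return bytes >= DM_CACHE_MIN_BLOCK_BYTES &&
           bytes <= DM_CACHE_MAX_BLOCK_BYTES &&
           bytes % DM_CACHE_MIN_BLOCK_BYTES == 0;
}

/* Same check for a size given in 512-byte sectors. */
bool dm_cache_block_sectors_ok(uint64_t sectors)
{
    /* Reject values so large that the byte conversion could overflow. */
    if (sectors > DM_CACHE_MAX_BLOCK_BYTES / SECTOR_BYTES)
        return false;
    return dm_cache_block_size_ok(sectors * SECTOR_BYTES);
}
```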
6. File systems
- Btrfs
Limit the size of delayed allocation ranges, which will limit extent sizes to 128 MB commit
Allow compressed extents to be merged during defragment commit
Add mount option to force UUID tree checking commit
Check UUID tree during mount if required commit
Create UUID tree if required commit
Fill UUID tree initially commit
- Ext4
Add support for extent pre-caching through a new fiemap flag. This is critically important when using AIO to a preallocated file commit, commit
Allow specifying external journal by pathname mount option commit
Mark block group as corrupt on block bitmap error commit
Mark group corrupt on group descriptor checksum commit
- Ext3
Allow specifying external journal by pathname mount option commit
- XFS
- F2FS
Add support for inline xattrs commit, commit
Add support for controlling the garbage collection policy commit, commit
- Pstore
- CEPH
Punch hole support commit
- HFS+
Implement POSIX ACLs support commit
- NFS
Refuse mount attempts with proto=udp commit
- isofs
Refuse RW mount of the filesystem instead of making it RO commit
- udf
Refuse RW mount of the filesystem instead of making it RO commit
7. Networking
tcp: TCP_NOTSENT_LOWAT socket option commit
tcp: TSO packets automatic sizing commit
tcp: add tcp_syncookies mode to allow unconditional generation of syncookies commit
tcp: increase throughput when reordering is high commit
tcp: prefer packet timing to TS-ECR for RTT commit
tcp: use RTT from SACK for RTO commit
ipv6: Add generic UDP Tunnel segmentation commit
ipv6: drop fragmented ndisc packets by default (RFC 6980) commit
ipv6: mld: implement RFC3810 MLDv2 mode only commit
bridge: apply multicast snooping to IPv6 link-local, too commit
macvlan fdb replace support commit
Devices: export physical port id via sysfs commit
igmp: Allow user-space configuration of igmp unsolicited report interval commit
tcp_probe: add IPv6 support commit
tcp_probe: allow more advanced ingress filtering by mark commit
netfilter: add IPv6 SYNPROXY target commit
- Wireless
- openvswitch
pkt_sched: fq: Fair Queue packet scheduler commit
pktgen: Add UDPCSUM flag to support UDP checksums commit
qdisc: allow setting default queuing discipline commit
tun: Add ability to create tun device with given index commit
tun: Allow to skip filter on attach commit
tun: Support software transmit time stamping. commit
tuntap: hardware vlan tx support commit
vxlan: Add tx-vlan offload support. commit
vxlan: add ipv6 support commit
NFC: Add a GET_SE netlink API, which dumps a list of discovered secure elements in an NFC controller commit
Infiniband: Add receive flow steering support commit
USBNET: Improving USB3 throughput commit
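Of the options above, TCP_NOTSENT_LOWAT is set per socket with setsockopt(). It limits how much not-yet-sent data may sit in the socket's write queue: once that amount reaches the threshold, poll() and epoll stop reporting the socket as writable, which keeps latency-sensitive senders from queuing large backlogs in the kernel. The sketch below is a minimal example; the helper name and the 16KB value used with it are illustrative, and the fallback definition of 25 matches the value in the kernel's <linux/tcp.h> for C libraries that lack it.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25   /* value from the kernel's <linux/tcp.h> */
#endif

/* Open a TCP socket whose unsent-data threshold is `lowat` bytes.
 * Returns the file descriptor, or -1 on error. */
int tcp_socket_with_notsent_lowat(int lowat)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                   &lowat, sizeof lowat) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For example, tcp_socket_with_notsent_lowat(16384) returns a socket that is reported writable only while fewer than 16KB of unsent data are queued.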
8. Crypto
omap-sham - Add OMAP5/AM43XX SHAM Support commit
omap-sham - Add SHA384 and SHA512 Support commit
Add NEON accelerated XOR implementation commit
Add OMAP4 random generator support commit
9. Virtualization
Add nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2 commit
vmware: Add support for virtual IOMMU commit, Add support for virtual IOMMU in VMXNET3 commit
vfio-pci: PCI hot reset interface commit
vfio: add external user support commit
xen: Support 64-bit PV guest receiving NMIs commit
Add xen tpmfront interface commit
10. Security
- Apparmor
Add an optional profile attachment string for profiles commit
Add interface files for profiles and namespaces commit
Add the ability to report a sha1 hash of loaded policy commit
Add the profile introspection file to interface commit
Allow setting any profile into the unconfined state commit
Enable users to query whether apparmor is enabled commit
11. Tracing/perf
- perf
Add option to limit stack depth in callchain dumps commit
Add option to print stack trace on single line commit
diff: Add generic order option for compute sorting commit
diff: Update perf diff documentation for multiple data comparison commit
gtk/hists: Display callchain overhead also commit
kvm stat report: Add option to analyze specific VM commit
list: Skip unsupported events commit
perf report/top: Add option to collapse undesired parts of call graph commit
stat: Add support for --initial-delay option commit
symbols: Add support for reading from /proc/kcore commit
tools: Add 'S' event/group modifier to read sample value commit
tools: Default to cpu// for events commit
tools: Make it possible to read object code from kernel modules commit
tools: Make it possible to read object code from vmlinux commit
top: Add --objdump option commit
trace: Allow specifying which syscalls to trace commit
trace: Implement -o/--output filename commit
trace: Support ! in -e expressions commit