Linux 3.11 was released on September 2, 2013
Summary: This release adds support for a new O_TMPFILE open(2) flag that allows easy creation of secure temporary files, experimental dynamic power management for all Radeon GPUs since r600, preliminary support for NFS 4.2 and SELinux Labeled NFS, experimental support for the Lustre distributed filesystem, detailed tracking of which pages a program writes, ARM huge page support and KVM/Xen support for ARM64, SYSV IPC message queue scalability improvements, a low latency network polling mechanism, a compressed swap cache, new drivers and many small improvements.
Contents
Prominent features
- New O_TMPFILE open(2) flag to reduce temporary file vulnerabilities
- AMD Radeon experimental dynamic power management support
- Experimental Lustre filesystem client support
- Preliminary support for NFS 4.2 and SELinux Labeled NFS
- Detailed tracking of which pages a task writes
- ARM huge page support, KVM and Xen support for ARM64
- SYSV IPC message queue scalability improvements
- Low latency network polling
- Zswap: A compressed swap cache
- Drivers and architectures
- Core
- Memory management
- Block layer
- File systems
- Networking
- Crypto
- Virtualization
- Security
- Tracing/perf
- Other news sites that track the changes of this release
1. Prominent features
1.1. New O_TMPFILE open(2) flag to reduce temporary file vulnerabilities
O_TMPFILE is a new open(2)/openat(2) flag that makes it easier to create secure temporary files. Files opened with the O_TMPFILE flag are created, but they are not visible in the filesystem, and as soon as they are closed they are deleted, just like a file that has been opened and then unlinked.
There are two uses for these files. One is race-free temporary files (deleted when closed, never reachable from any directory, not subject to symlink attacks, and with no need to come up with unique names; basically, tmpfile(3) done right). The other is to create an initially unreachable file, write whatever you want into it, fchmod()/fchown()/fsetxattr() it as you wish, and then atomically link it in, already fully set up.
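The second use case can be sketched from user space roughly as follows. This is a minimal illustration, not a reference implementation: the function names, the 0644 mode and the fallback path are assumptions made for the example, and linking through /proc/self/fd is one documented way to give the file a name (it requires procfs to be mounted). When the kernel or filesystem rejects O_TMPFILE, the sketch falls back to a named temporary file, which is only an approximation of the same behaviour.

```python
import os
import tempfile


def _publish_unnamed(directory: str, name: str, data: bytes) -> None:
    """Create an unnamed file in directory, fill it, then give it a name."""
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # The path only selects the directory (and filesystem) that will
        # own the inode; no directory entry exists after this call.
        fd = os.open(directory, os.O_TMPFILE | os.O_RDWR, 0o600)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fchmod(fd, 0o644)  # finish setup while still unreachable
            # Passing dst_dir_fd should make CPython use linkat(2) with
            # AT_SYMLINK_FOLLOW, so the /proc magic link is resolved to
            # the unnamed inode, which then gets its first name.
            os.link(f"/proc/self/fd/{fd}", name,
                    dst_dir_fd=dir_fd, follow_symlinks=True)
    finally:
        os.close(dir_fd)


def publish(directory: str, name: str, data: bytes) -> str:
    """Return "O_TMPFILE" or "fallback" depending on the path taken."""
    if hasattr(os, "O_TMPFILE"):
        try:
            _publish_unnamed(directory, name, data)
            return "O_TMPFILE"
        except FileExistsError:
            raise  # the target name is taken; do not overwrite it
        except OSError:
            pass  # kernel or filesystem without O_TMPFILE support
    # Hypothetical fallback: a named temporary file, linked then unlinked.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(tmp_path, 0o644)
        os.link(tmp_path, os.path.join(directory, name))
    finally:
        os.unlink(tmp_path)
    return "fallback"
```

Calling publish("/some/dir", "report.txt", b"...") leaves a single, fully written file in the directory; with O_TMPFILE, no partially written file is ever reachable by name.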
1.2. AMD Radeon experimental dynamic power management support
Drivers for AMD graphics cards have gained dynamic power management support for all GPUs from r600 to the present day. This code is experimental and off by default for now. To enable it, it is necessary to pass the radeon.dpm=1 module parameter.
Code: commit, commit 1, 2, 3, 4, 5, 6, 7, 8
1.3. Experimental Lustre filesystem client support
Lustre is a parallel distributed file system. It can support multiple compute clusters with tens of thousands of client nodes, tens of petabytes (PB) of storage on hundreds of servers, and more than a terabyte per second (TB/s) of aggregate I/O throughput. It is the most popular cluster file system in high performance computing: six of the top 10 and more than 60 of the top 100 supercomputers in the world have Lustre file systems in them.
This release adds client support, but it is experimental: the code is not very clean and needs to live in drivers/staging for some time. See drivers/staging/lustre/TODO for details.
For more details about Lustre, visit http://lustre.org
Code: (commit)
1.4. Preliminary support for NFS 4.2 and SELinux Labeled NFS
Client support for NFS 4.2
Linux 3.11 has gained preliminary client support for NFS 4.2, a new version of the NFS standard that is currently being developed. For details on the features this new version will bring, see this post.
Labeled NFS (SELinux for NFS)
This kernel version has also gained support for Labeled NFS, which adds full SELinux support to NFS. Until now, NFS mounts were treated with a single label, usually something like nfs_t; at best, an administrator could override the default label using the mount --context option. With Labeled NFS, many different labels are supported on an NFS share. This can be useful to secure virtualization applications by setting the label on an image file on an NFS share. It is also useful for exporting home directories on an NFS share and then confining applications to certain places, instead of allowing them to write any file on the share.
Recommended LWN article: LSFMM 2013: NFS status
Code: commit, commit, commit, commit, commit
1.5. Detailed tracking of which pages a task writes
This release adds a mechanism that helps to track which pages a task writes to. This feature is used by the checkpoint-restore project, but it could also be used for improved statistics and profiling.
For more details, see Documentation/vm/soft-dirty.txt
Code: (commit)
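As a rough illustration of the interface described in Documentation/vm/soft-dirty.txt (write "4" to /proc/PID/clear_refs, then read bit 55 of the 64-bit entries in /proc/PID/pagemap), a process can track its own writes along these lines. The function names are made up for this sketch, and it returns None instead of failing when procfs or soft-dirty support is unavailable.

```python
import ctypes
import mmap
import struct

SOFT_DIRTY_BIT = 55  # bit position given in Documentation/vm/soft-dirty.txt
PAGE_SIZE = mmap.PAGESIZE


def is_soft_dirty(pagemap_entry: int) -> bool:
    """Return True when the soft-dirty bit of a pagemap entry is set."""
    return bool((pagemap_entry >> SOFT_DIRTY_BIT) & 1)


def read_pagemap_entry(address: int) -> int:
    """Read this process's 64-bit pagemap entry for the page at address."""
    with open("/proc/self/pagemap", "rb") as pagemap:
        pagemap.seek((address // PAGE_SIZE) * 8)  # one 8-byte entry per page
        return struct.unpack("=Q", pagemap.read(8))[0]


def track_one_write():
    """Return (dirty_after_clear, dirty_after_write), or None if unsupported."""
    buf = mmap.mmap(-1, PAGE_SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    buf[0] = 1  # fault the page in before tracking starts
    view = ctypes.c_char.from_buffer(buf)
    address = ctypes.addressof(view)
    try:
        with open("/proc/self/clear_refs", "w") as clear_refs:
            clear_refs.write("4")  # "4" clears the soft-dirty bits
        before = is_soft_dirty(read_pagemap_entry(address))
        buf[0] = 2  # the write that the kernel is expected to record
        after = is_soft_dirty(read_pagemap_entry(address))
        return before, after
    except (OSError, struct.error):
        return None  # no procfs access, or kernel without soft-dirty support
    finally:
        del view  # release the buffer export so the mapping can be closed
        buf.close()
```

The first element of the result reports the bit right after clearing and the second reports it after the page has been written again; a checkpoint tool would scan many pages this way to find the ones that changed since the last pass.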
1.6. ARM huge page support, KVM and Xen support for ARM64
The ARM architecture has gained support for huge pages for both 32-bit and 64-bit CPUs. This implementation allows mapping of 2MB sections; the 64K pages configuration is not supported. It also adds support for transparent huge pages; when enabled the kernel will try to map anonymous pages as 2MB sections where possible.
Code: commit, commit, commit, commit, commit
This release also adds KVM and Xen virtualization support for the ARM64 architecture.
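For a sense of scale, one 2MB section replaces 512 base pages of 4KB. The sketch below also shows how an application might hint that an anonymous mapping is a good candidate for transparent huge pages. The use of madvise(2) with MADV_HUGEPAGE is an assumption brought in for illustration (it is the generic Linux interface, not something specific to this release), and the helper names are hypothetical.

```python
import mmap

HUGE_PAGE_SIZE = 2 * 1024 * 1024  # 2MB sections, as described above
MADV_HUGEPAGE = getattr(mmap, "MADV_HUGEPAGE", 14)  # 14 on Linux


def base_pages_per_huge_page(base_page_size: int = mmap.PAGESIZE) -> int:
    """How many base pages a single 2MB mapping replaces."""
    return HUGE_PAGE_SIZE // base_page_size


def map_with_thp_hint(length: int = 2 * HUGE_PAGE_SIZE):
    """Map anonymous memory and ask the kernel to back it with huge pages.

    Returns (mapping, hinted); hinted is False when the hint was refused,
    for example on a kernel built without transparent huge pages.
    """
    # Mapping twice the huge page size guarantees that the region contains
    # at least one 2MB-aligned, 2MB-long range.
    region = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        region.madvise(MADV_HUGEPAGE)
        hinted = True
    except (OSError, AttributeError):
        hinted = False
    return region, hinted
```

The hint is only a request: whether 2MB mappings are actually used depends on the kernel configuration and on free contiguous memory.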
1.7. SYSV IPC message queue scalability improvements
This improvement continues the work that began in the SYSV IPC semaphore scaling that was merged in Linux 3.10.
Just as semaphores used to, message queues abuse the lock used by the SYSV IPC code, holding it unnecessarily for operations such as permission and security checks, which hurts performance and scalability. This release addresses the message queues (future releases will deal with shared memory). A mix of lockless code paths, shortened critical regions, per-semaphore statistics and cacheline assignments is implemented in the code to make it faster and more scalable.
Code: commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
1.8. Low latency network polling
Modern Linux device drivers don't notify the system of new packet arrival with interrupts, because with current network bandwidth requirements this would generate many thousands of interrupts per second, which can't be handled without severe performance degradation. For that reason, a periodic poll method (called NAPI in Linux) is used instead. However, the polling interval adds latency. This release allows applications to request a per-socket low latency poll interval. Currently only ixgbe, mlx4, and bnx2x support this feature. For more details, see the recommended LWN article.
Recommended LWN article: Low-latency Ethernet device polling
Related paper: A way towards Lower Latency and Jitter
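A per-socket request might look like the following sketch. It assumes the socket option is SO_BUSY_POLL (value 46 on most architectures, taking a time in microseconds), which is not named in the text above; the helper name and return strings are invented for the example. Raising the value can require CAP_NET_ADMIN, and kernels built without busy polling reject the option, so both cases are reported rather than treated as fatal.

```python
import socket

# Assumption: SO_BUSY_POLL is 46 on most architectures; Python's socket
# module may not export the constant, hence the fallback value.
SO_BUSY_POLL = getattr(socket, "SO_BUSY_POLL", 46)


def request_busy_poll(sock: socket.socket, microseconds: int) -> str:
    """Ask the kernel to busy-poll this socket for up to `microseconds`.

    Returns "enabled", "denied" (insufficient privileges) or
    "unsupported" (option not available on this kernel).
    """
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_BUSY_POLL, microseconds)
    except PermissionError:
        return "denied"
    except OSError:
        return "unsupported"
    return "enabled"
```

For example, request_busy_poll(sock, 50) on a UDP socket asks for up to 50 microseconds of polling on blocking receives; the setting only has an effect when the underlying driver supports the feature.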
1.9. Zswap: A compressed swap cache
Quoting from this recommended LWN article:
"Zswap is a lightweight, write-behind compressed cache for swap pages. It takes pages that are in the process of being swapped out and attempts to compress them into a dynamically allocated RAM-based memory pool. If this process is successful, the writeback to the swap device is deferred and, in many cases, avoided completely. This results in a significant I/O reduction and performance gains for systems that are swapping"
For more details and performance numbers, see this recommended LWN article: The zswap compressed swap cache
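The idea in the quote can be pictured with a toy user-space model. This is not how the kernel implements zswap: the class, its names and the use of zlib are assumptions made purely for illustration, and the real eviction of old entries to the swap device is left out. A page being swapped out is compressed and kept in a bounded RAM pool when that pays off, and is written to the (simulated) swap device otherwise.

```python
import zlib


class ToyCompressedSwapCache:
    """Toy model of a compressed cache sitting in front of a swap device."""

    def __init__(self, pool_limit: int):
        self.pool_limit = pool_limit  # maximum bytes of compressed data
        self.pool = {}                # page id -> compressed bytes (in RAM)
        self.pool_bytes = 0
        self.swap_device = {}         # page id -> raw page (stands in for disk)

    def swap_out(self, page_id: int, page: bytes) -> str:
        """Store a page; return "cached" or "written"."""
        compressed = zlib.compress(page)
        fits = self.pool_bytes + len(compressed) <= self.pool_limit
        if len(compressed) < len(page) and fits:
            # Compression succeeded: the write to the device is avoided.
            self.pool[page_id] = compressed
            self.pool_bytes += len(compressed)
            return "cached"
        self.swap_device[page_id] = page  # fall back to real swap I/O
        return "written"

    def swap_in(self, page_id: int) -> bytes:
        """Return the page contents, from the pool if possible."""
        if page_id in self.pool:
            compressed = self.pool.pop(page_id)
            self.pool_bytes -= len(compressed)
            return zlib.decompress(compressed)
        return self.swap_device.pop(page_id)
```

A page that compresses well never touches the simulated device, which is where the I/O reduction described in the article comes from.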
2. Drivers and architectures
All the driver and architecture-specific changes can be found in the Linux_3.11-DriversArch page
3. Core
Add alarm timers: Add support for clocks CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM, thereby enabling wakeup alarm timers via file descriptors (commit)
Add support for LZ4 compressed kernels (commit), (commit), (commit)
Add option to log time spent in suspend (commit)
Task scheduler: Add per-task load-tracking statistics to /proc/<PID>/sched (commit)
Add support for wound/wait style locks, for more details see Documentation/ww-mutex-design.txt and this recommended LWN article. (commit)
RCU: Remove TINY_PREEMPT_RCU (commit)
timers: Allow the unbinding of clockevents/clocksources, and provide sysfs interfaces to request the unbinding (commit), (commit), (commit)
Implement generic percpu refcounting (commit)
tools/cpupower: Implement disabling of cstate interface (commit), add Haswell family 0x45 specific idle monitor to show PC8,9,10 states (commit)
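For the alarm timer entry in the list above, a wakeup timer backed by a file descriptor might be opened roughly as follows. The sketch assumes the file descriptor interface is timerfd_create(2), called here through ctypes; the clock ID values are the ones from the Linux headers, and the helper names are hypothetical. Alarm clocks usually require CAP_WAKE_ALARM, so the sketch falls back to CLOCK_MONOTONIC when the kernel refuses. Arming the timer with timerfd_settime(2) is not shown.

```python
import ctypes
import os

# Clock IDs from the Linux headers.
CLOCK_MONOTONIC = 1
CLOCK_REALTIME_ALARM = 8
CLOCK_BOOTTIME_ALARM = 9

_libc = ctypes.CDLL(None, use_errno=True)
_libc.timerfd_create.argtypes = [ctypes.c_int, ctypes.c_int]
_libc.timerfd_create.restype = ctypes.c_int


def timerfd_create(clock_id: int) -> int:
    """Thin wrapper around timerfd_create(2); raises OSError on failure."""
    fd = _libc.timerfd_create(clock_id, 0)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fd


def open_wakeup_timer():
    """Return (fd, clock_name), preferring a clock that can wake the system."""
    try:
        return timerfd_create(CLOCK_BOOTTIME_ALARM), "CLOCK_BOOTTIME_ALARM"
    except OSError:
        # Typically EPERM (no CAP_WAKE_ALARM) or EINVAL (older kernel).
        return timerfd_create(CLOCK_MONOTONIC), "CLOCK_MONOTONIC"
```

The returned descriptor can be passed to select/poll/epoll like any other; with an alarm clock, expiry is also able to wake a suspended system.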
4. Memory management
Allow mmap's MAP_HUGETLB for hugetlbfs files (commit)
Support mmap() on /proc/vmcore (commit)
Tune vm_committed_as per-cpu counter (commit)
Kswapd and page reclaim behaviour has been screwy in one way or another for a long time. One example is reports of large copy operations or backups causing the machine to grind to a halt or applications to be pushed to swap. Sometimes, in low memory situations, a large percentage of memory suddenly gets reclaimed. In other cases an application starts and kswapd hits 100% CPU usage for prolonged periods of time. This patch series aims at addressing some of the worst of these problems. Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
5. Block layer
bcache: RAID 5/6 optimizations (commit)
Device-Mapper: add switch target. This target creates a device that supports an arbitrary mapping of fixed-size regions of I/O across a fixed set of paths (commit)
Device Mapper: Add ability to restore transiently failed devices on resume (commit)
blkio cgroup controller: implement proper hierarchy support (commit)
Add AIX partition table support (commit)
md/raid5: allow 5-device RAID6 to be reshaped to 4-device. (commit)
6. File systems
XFS
Disable speculative preallocation for small files, as it causes freespace fragmentation (commit)
Currently userspace has no way of determining that a filesystem is CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to indicate that the filesystem has v5 superblock support enabled (commit)
Ordered log vector support: It allows metadata to be written without physically logging every individual change while still maintaining the full transactional integrity guarantees (commit), (commit)
Disable swap extents ioctl on CRC-enabled filesystems (commit)
Disable noattr2/attr2 mount options for CRC-enabled filesystems, as they are always enabled in them (commit)
ext4
Transaction reservation support (commit)
Avoid issuing empty commits unnecessarily (commit)
Make punch hole work with bigalloc mode (commit)
The common writepages code path is now used for the nodelalloc and ext3 compatibility modes. This allows big writes to be submitted much more efficiently as a single bio request, instead of being sent as individual 4k writes (commit)
The extent cache shrink mechanism, which was introduced in kernel 3.9, no longer has a scalability bottleneck caused by the i_es_lru spinlock (commit)
Btrfs
Remove the btrfs_sector_sum structure; this improved performance in a dd benchmark by ~74% on an SSD (31MB/s -> 54MB/s) (commit)
Allow file data clone within a file (commit)
Performance optimization: merge pending I/O for tree log write back. In testing, sync write performance went up ~60% (2.9MB/s -> 4.6MB/s) on a SCSI disk whose disk buffer was enabled (commit)
Show compiled-in config features at module load time (commit)
Add ioctl to wait for qgroup rescan completion (commit)
F2FS
GFS2
Add atomic open support, whose main benefit is reduced locking overhead for combined lookup/create and open operations (commit)
CIFS
Add SMB 3.02 dialect support (commit)
SMB3 Signing enablement (commit)
Handle big endianness in NTLM (ntlmv2) authentication (commit)
SMB2 FSCTL and IOCTL requests (commit)
Add a "nosharesock" mount option to force a new socket to the server to be created (commit)
HPFS
Implement prefetch to improve performance (commit)
FAT
Add FAT_IOCTL_GET_VOLUME_ID (commit)
NILFS2
Implement calculation of free inodes count (commit)
7. Networking
sit: add IPv4 over IPv4 support (commit)
9p: Make 9P2000.L the default protocol for 9P file system (commit)
9p: add privport option to 9P TCP transport (commit)
Add VF link state control (commit)
ipv6: add support of peer address (commit)
mac80211: add support for per-chain signal strength reporting (commit)
mac80211: enable Auth Protocol Identifier on mesh config. (commit)
gso: Update tunnel segmentation to support Tx checksum offload (commit)
Implement /proc/net/icmp6. (commit)
bridge: Add a flag to control unicast packet flood. (commit)
netfilter: Implement RFC 1123 for FTP conntrack (commit)
nl80211: Add generic netlink module alias for cfg80211/nl80211 (commit)
openvswitch: Add gre tunnel support. (commit)
openvswitch: Add tunneling interface. (commit)
packet: nlmon: virtual netlink monitoring device for packet sockets (commit)
xfrm: add LINUX_MIB_XFRMACQUIREERROR statistic counter (commit)
- RDMA
- NFC
MPLS: Add limited GSO support (commit)
8. Crypto
sha256_ssse3: add SHA-224 support (commit)
sha512_ssse3: add SHA-384 support (commit)
Add LZ4 Cryptographic API (commit)
crct10dif: Accelerated CRC T10 DIF computation with PCLMULQDQ instruction (commit), (commit)
dcp: support for Freescale's DCP co-processor (commit)
9. Virtualization
hv
vmbus: Implement multi-channel support (commit)
10. Security
Smack
Local IPv6 port-based controls (commit)
Add smkfstransmute mount option. It complements the smkfsroot option but also marks the root inode as transmuting (commit)
AppArmor
Remove "permipc" command (commit)
11. Tracing/perf
Tracing
Add function probe to trigger a ftrace dump of current CPU trace (commit)
Add function probe to trigger a ftrace dump to console (commit)
perf
Add --percent-limit option in perf report and perf top for not showing small overhead entries in the output; also, a report.percent-limit config variable (commit), (commit), (commit)
Add sysfs entry /sys/devices/xxx/perf_event_mux_interval_ms to adjust the multiplexing interval per PMU. The unit is milliseconds (commit)
perf record: Remove -A/--append option, as it no longer works and is no longer needed (commit)