|
|
Subscribe / Log in / New account

2.6.36 merge window: the sequel

By Jonathan Corbet
August 11, 2010
As of this writing, some 6700 non-merge changesets have been accepted for the 2.6.36 development cycle. These changes bring a lot of fixes and a number of new features, some of which have been in the works for some time. The most interesting changes since last week's summary are summarized here.

User-visible changes include:

  • The ext3 filesystem, once again, defaults to the (safer) "ordered" mode at mount time. This reverses the change (to "writeback" mode) made in 2009, which was typically overridden by distributions.

  • The out-of-memory killer has been rewritten. The practical result is that the system may choose different processes to kill in out-of-memory situations, and the user-space API for adjusting how attractive processes appear to the OOM killer has changed.

  • The fanotify mechanism has been merged. Fanotify allows a user-space daemon to obtain notification of file operations and, perhaps, block access to specific files. It is intended for use with malware scanning applications, but there are other potential uses (hierarchical storage management, for example) as well.

  • There is a new system call for working with resource limits:

        int prlimit64(pid_t pid, unsigned int resource, 
                      const struct rlimit64 *new_rlim, struct rlimit64 *old_rlim);
    

    It is meant to (someday) replace setrlimit(); the differences include the ability to modify limits belonging to other processes and the ability to query and set a limit in a single operation.

  • The TTY driver has gained support for the EXTPROC mode supported by BSD for the last 20 years or so. This option was originally developed to facilitate telnet's "linemode", but it is useful for contemporary protocols as well.

  • New drivers:

    • Processors and systems: Ingenic JZ4740 SOC systems, Trapeze ITS GPR boards, ifm PDM360NG boards, Freescale P1022DS reference boards, TQM mcp8xx-based boards, TI TNETV107X-based systems, OMAP4430-based PandaBoards, NVIDIA Tegra-based systems, and Tilera TILEPro and TILE64 processors (a whole new architecture).

    • Block: QLogic ISP82XX host adaptors, AppliedMicro 460EX processor on-chip SATA controllers, Samsung S3C/S5P board PATA controllers, and Moorestown NAND Flash controllers.

    • Media: EasyCAP USB video adapters, Softlogic 6x10 MPEG codec cards, Winbond/Nuvoton NUC900-based audio controllers, Cirrus Logic CS42L51 codecs, Cirrus Logic EP93xx series audio devices, Marvell Kirkwood I2S audio devices, Ingenic JZ4740-based audio devices, SmartQ board audio devices, Wolfson Micro WM8741 codecs, and Samsung S5P FIMC video postprocessors.

    • Miscellaneous: Silicon Image sil164 TMDS transmitters, TI DSP bridge devices, PCILynx TSB12LV21/A/B controllers (as a FireWire sniffer; the user-space side has also been added under tools/firewire), Bosch Sensortec BMP085 digital pressure sensors, ROHM BH1780GLI ambient light sensors, Honeywell HMC6352 compasses, Summit Microelectronics SMM665 six-channel active DC output controller/monitor devices, JEDEC JC 42.4 compliant temperature sensors, Intel Topcliff PCH DMA controllers, Intel Moorestown DMAC1 and DMAC2 controllers, Intel Moorestown MAX3110 and MAX3107 UARTs, Intel Medfield UARTs, Quatech SSU-100 USB serial ports, and ARM Primecell SP805 watchdog timers.

Changes visible to kernel developers include:

  • The SCSI layer now supports runtime power management, but almost no work has been done (yet) to push that support down into individual drivers.

  • The MIPS architecture now has kprobes support.

  • The KGDB debugger is now supported with the Microblaze architecture.

  • There are a few new build-time configuration commands: listnewconfig outputs a list of new configuration options, oldnoconfig sets all new configuration options to "no" without asking, alldefconfig sets all options to their default values, and savedefconfig writes a minimal configuration file in defconfig. (This patch adding the first two options above also introduces a new Whatevered-by: patch tag, with unknown semantics).

  • There is a new scripts/coccinelle directory containing a number of Coccinelle "semantic patches" which perform various useful checks. They can be run with "make coccicheck".

  • The kmemtrace ftrace plugin is gone; "perf kmem" should be used instead. The ksym plugin has also been superseded by perf, and, thus, removed.

  • There is a new function for short, blocking delays:

        void usleep_range(unsigned long min, unsigned long max);
    

    This function will sleep (uninterruptibly) for a period between min and max microseconds. It is based on hrtimers, so the timing will be more precise than obtained with msleep().

  • The new IRQF_NO_SUSPEND flag for request_irq() will cause the interrupt line not to be disabled during suspend; IRQF_TIMER can no longer be (mis)used for this purpose.

  • The concurrency-managed workqueues patch set has been merged, completely changing the way workqueues are implemented. One immediate user-visible result will be that there should be far fewer kernel threads running on most systems. All users of the "slow work" API have been converted to concurrency-manged workqueues, so the slow work mechanism has been removed from the kernel.

  • The cpuidle mechanism has been enhanced to allow for the set of available idle states to change over time. Details can be found in this patch.

  • The Blackfin architecture has gained dynamic ftrace support.

  • There is a new super_operations method called evict_inode(); it handles all of the necessary work when an in-core inode is being removed. It should be used instead of clear_inode() and delete_inode().

  • The inotify mechanism has been removed from inside the kernel; the fsnotify mechanism must be used instead. (Of course, the user-space inotify interface is still supported).

  • The Video4Linux2 layer has gained a new framework which simplifies the handling of controls; see this commit and Documentation/video4linux/v4l2-controls.txt for details.

  • The open() and release() functions in struct block_device_operations are now called without the big kernel lock held. Additionally, the locked_ioctl() function has gone away; all block drivers must implement their own locking there as well.

  • The domain name resolution code has been pulled out of the CIFS filesystem and made generic. It works by using the key mechanism to request DNS resolution from user space; see Documentation/networking/dns-resolver.txt for details.

The merge window remains open as of this writing, so we may yet see more interesting features merged for 2.6.36. Watch this space next week for the final merge window updates for this development cycle.

Index entries for this article
KernelReleases/2.6.36


to post comments

2.6.36 merge window: the sequel

Posted Aug 12, 2010 0:07 UTC (Thu) by viro (subscriber, #7872) [Link]

Actually, there's a bit more to ->evict_inode() story: ->drop_inode() has also changed.

Current rules are pretty simple:

1) ->drop_inode() is called when we release the last reference to struct inode. It tells us whether fs wants inode to be evicted (as opposed to retained in inode cache). Doesn't do actual eviction (as it used to), just returns an int. The normal policy is "if it's unhashed or has no links left, evict it now". generic_drop_inode() does these checks. NULL ->drop_inode means that it'll be used. generic_delete_inode() is "just evict it". Or fs can set rules of its own; grep and you'll see.

2) ->delete_inode() and ->clear_inode() are gone; ->evict_inode() is called in all cases when inode (without in-core references to it) is about to be kicked out, no matter why that happens (->drop_inode() telling that it shouldn't be kept around, memory pressure, umount, etc.) It will be called exactly once per inode's lifetime. Once it returns, inode is basically just a piece of memory about to be freed.

3) ->evict_inode() _must_ call end_writeback(inode) at some point. At that point all async access from VFS (writeback, basically) will be completed and inode will be fs's to deal with. That's what calls of clear_inode() in original ->delete_inode() should turn into. Don't dirty an inode past that point; it never worked to start with (writeback logics would've refused to trigger ->write_inode() on such inodes) and now it'll be detected and whined about.

4) kicking the pages out of page cache (== calling truncate_inode_pages()) is up to ->evict_inode() instance; that was already the case for ->delete_inode(), but not for ->clear_inode(). Of course, if fs doesn't use page cache for that inode, it doesn't have to bother. Other than that, ->evict_inode() instance is basically a mix of old ->clear_inode() and ->delete_inode(). inodes with NULL ->evict_inode() behave exactly as ones with NULL ->delete_inode() and NULL ->clear_inode() used to.

That's it. Original was much more convoluted...

About tty line mode

Posted Aug 12, 2010 11:13 UTC (Thu) by raul_benito (subscriber, #6964) [Link]

2.6.36 merge window: the sequel

Posted Aug 19, 2010 9:33 UTC (Thu) by renox (guest, #23785) [Link] (1 responses)

>> The ext3 filesystem, once again, defaults to the (safer) "ordered" mode at mount time. This reverses the change (to "writeback" mode) made in 2009, which was typically overridden by distributions. <<

Is-there an interesting inside story with this?
This kind of change of the default setting looks really amateurish for an external point of view..

Re: ext3: default to ordered mode

Posted Aug 19, 2010 11:45 UTC (Thu) by cladisch (✭ supporter ✭, #50193) [Link]

From this thread:
Ok, so now I know *why* that one filesystem got busted - I built a
kernel without CONFIG_EXT3_DEFAULTS_TO_ORDERED set and it got a
forced reboot (echo b > proc/sysrq-trigger). That'll teach me for
trying to reproduce bugs Andrew is tripping over with his config
files.
Quite frankly, data=writeback mode for ext3 is a dangerous,
dangerous configuration to run by default. IMO, it shouldn't be the
default. Patch below.
Jan's changelog:
data=writeback mode is dangerous as it leads to higher data loss and stale data
exposure when systems crash. It should not be the default, especially when all
major distros ensure their ext3 filesystems default to ordered mode. Change the
default mode to the safer data=ordered mode, because we should be caring far
more about avoiding stale data exposure than performance.


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds