Re: [Xen-devel] [PATCH] xen/balloon: don't online new memory initially

From: Igor Druzhinin
Date: Mon Jul 03 2017 - 14:44:46 EST


On 03/07/17 16:40, Juergen Gross wrote:
> When setting up the Xenstore watch for the memory target size the new
> watch will fire at once. Don't try to reach the configured target size
> by onlining new memory in this case, as the current memory size will
> be smaller in almost all cases due to e.g. BIOS reserved pages.
>
> Onlining new memory will lead to more problems e.g. undesired conflicts
> with NVMe devices meant to be operated as block devices.
>
> Instead remember the difference between target size and current size
> when the watch fires for the first time and apply it to any further
> size changes, too.
>
> In order to avoid races between balloon.c and xen-balloon.c init calls
> do the xen-balloon.c initialization from balloon.c.
>
> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
> ---
> drivers/xen/balloon.c | 3 +++
> drivers/xen/xen-balloon.c | 20 ++++++++++++--------
> include/xen/balloon.h | 8 ++++++++
> 3 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 50dcb68d8070..ab609255a0f3 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -780,6 +780,9 @@ static int __init balloon_init(void)
> }
> #endif
>
> + /* Init the xen-balloon driver. */
> + xen_balloon_init();
> +
> return 0;
> }
> subsys_initcall(balloon_init);
> diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
> index e7715cb62eef..66ec519c825c 100644
> --- a/drivers/xen/xen-balloon.c
> +++ b/drivers/xen/xen-balloon.c
> @@ -59,6 +59,8 @@ static void watch_target(struct xenbus_watch *watch,
> {
> unsigned long long new_target;
> int err;
> + static bool watch_fired;
> + static unsigned long target_diff;
>
> err = xenbus_scanf(XBT_NIL, "memory", "target", "%llu", &new_target);
> if (err != 1) {
> @@ -69,7 +71,14 @@ static void watch_target(struct xenbus_watch *watch,
> /* The given memory/target value is in KiB, so it needs converting to
> * pages. PAGE_SHIFT converts bytes to pages, hence PAGE_SHIFT - 10.
> */
> - balloon_set_new_target(new_target >> (PAGE_SHIFT - 10));
> + new_target >>= PAGE_SHIFT - 10;
> + if (watch_fired) {
> + balloon_set_new_target(new_target - target_diff);
> + return;
> + }
> +
> + watch_fired = true;
> + target_diff = new_target - balloon_stats.target_pages;
> }
> static struct xenbus_watch target_watch = {
> .node = "memory/target",
> @@ -94,13 +103,8 @@ static struct notifier_block xenstore_notifier = {
> .notifier_call = balloon_init_watcher,
> };
>
> -static int __init balloon_init(void)
> +void __init xen_balloon_init(void)
> {
> - if (!xen_domain())
> - return -ENODEV;
> -
> - pr_info("Initialising balloon driver\n");
> -
> register_balloon(&balloon_dev);
>
> register_xen_selfballooning(&balloon_dev);
> @@ -109,7 +113,7 @@ static int __init balloon_init(void)
>
> return 0;
> }
> -subsys_initcall(balloon_init);
> +EXPORT_SYMBOL_GPL(xen_balloon_init);
>
> #define BALLOON_SHOW(name, format, args...) \
> static ssize_t show_##name(struct device *dev, \
> diff --git a/include/xen/balloon.h b/include/xen/balloon.h
> index d1767dfb0d95..8906361bb50c 100644
> --- a/include/xen/balloon.h
> +++ b/include/xen/balloon.h
> @@ -35,3 +35,11 @@ static inline int register_xen_selfballooning(struct device *dev)
> return -ENOSYS;
> }
> #endif
> +
> +#ifdef CONFIG_XEN_BALLOON
> +void xen_balloon_init(void);
> +#else
> +static inline void xen_balloon_init(void)
> +{
> +}
> +#endif
>

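If I'm reading the new accounting right, it boils down to the sketch
below (a standalone user-space approximation with made-up page counts,
not the driver code itself): the first watch event only records the
offset between the Xenstore target and the pages the guest currently
has, and later events apply that offset instead of trying to online the
missing memory up front.

/*
 * Standalone sketch of the accounting, illustrative only.  The page
 * counts are made up.
 */
#include <stdbool.h>
#include <stdio.h>

static unsigned long current_pages = 1040384;	/* 4 GiB minus reserved pages */
static unsigned long target_diff;
static bool watch_fired;

static void watch_target_sketch(unsigned long new_target)	/* in pages */
{
	if (watch_fired) {
		/* Later events: apply the remembered offset. */
		printf("balloon to %lu pages\n", new_target - target_diff);
		return;
	}

	/* First event: only remember the offset, don't online memory. */
	watch_fired = true;
	target_diff = new_target - current_pages;
	printf("first event: remembered diff of %lu pages\n", target_diff);
}

int main(void)
{
	watch_target_sketch(1048576);	/* initial memory/target of 4 GiB */
	watch_target_sketch(786432);	/* toolstack later sets 3 GiB */
	return 0;
}
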
We came across the same issue just recently. The problem was that, with
some kernel versions, DMA buffers for emulated devices get allocated in
this freshly hotplugged area. The area is not properly described to
QEMU, so when a DMA request comes in, QEMU treats it as "unassigned" and
skips it by default. This eventually leads to cryptic failures while the
system is loading.
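
Roughly, the failure mode looks like the sketch below (the memory
layout, addresses and function names are made up for illustration, not
the actual QEMU code paths). The point is simply that a DMA access to a
guest-physical address QEMU has no region for ends up in the
"unassigned" handling and gets dropped:

/*
 * Illustrative sketch only, not the actual QEMU code.  A DMA write
 * that lands in the hotplugged (ballooned-in) area has no RAM region
 * backing it in QEMU's view, so it is treated as "unassigned" and the
 * data is silently dropped.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical guest-physical layout for the example. */
#define BOOT_RAM_END		0x40000000ULL	/* RAM described to QEMU */
#define HOTPLUG_REGION_END	0x48000000ULL	/* hotplugged, unknown to QEMU */

static bool known_to_qemu(uint64_t gpa)
{
	return gpa < BOOT_RAM_END;
}

/* Emulated device performing a DMA write of one buffer. */
static void emulated_dma_write(uint64_t gpa, const void *buf, size_t len)
{
	if (!known_to_qemu(gpa)) {
		/* The "unassigned" case: the write is skipped. */
		fprintf(stderr, "DMA to %#llx dropped (unassigned)\n",
			(unsigned long long)gpa);
		return;
	}
	/* ... otherwise copy into the mapped guest RAM ... */
	(void)buf;
	(void)len;
}

int main(void)
{
	char data[512] = { 0 };

	emulated_dma_write(0x1000000ULL, data, sizeof(data));	/* fine */
	emulated_dma_write(0x44000000ULL, data, sizeof(data));	/* dropped */
	return 0;
}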

Internally we developed a QEMU workaround that tries to satisfy all the
"unassigned" requests, but it doesn't solve the problem in a proper way
IMHO.

I haven't completely understood your use case, but we might try to come
up with a general solution for both problems, since they are obviously
related.

> Onlining new memory will lead to more problems e.g. undesired conflicts
> with NVMe devices meant to be operated as block devices.

Could you explain this in more detail?

Igor