KNIME Python Integration Guide: KNIME AG, Zurich, Switzerland Version 4.3 (Last Updated On 2020-12-06)
Quickstart
Anaconda Setup
Anaconda installation
Troubleshooting
Installation
Introduction
This guide describes how to install the KNIME Python Integration to be used with KNIME
Analytics Platform.
This guide refers to the KNIME Python Integration, which has been available since the
v3.4 release of KNIME Analytics Platform (not to be confused with the KNIME
Python Scripting Extension). The integration is the recommended and most
recent way to use arbitrary Python™ scripts in KNIME Analytics Platform and
supports both Python 2 and Python 3.
The KNIME Python Integration relies on an existing Python installation that is set up alongside
KNIME Analytics Platform. As the KNIME Python Integration depends on certain Python
packages, the Python installation needs to have these packages installed. Our recommended
way to set up such a Python environment is to use the Anaconda Python distribution from
Continuum Analytics. In this guide we describe how to install Python and the necessary
packages using Anaconda, as well as how to configure the KNIME Python Integration.
Quickstart
This quickstart guide shows you the basic steps required to install the KNIME Python
Integration and its Python prerequisites. It does not go into further detail. If you’d
like a more thorough explanation, please refer to the more detailed Anaconda Setup section.
1. First, install the KNIME Python Integration. In KNIME Analytics Platform, go to File →
Install KNIME Extensions. The KNIME Python Integration can be found under KNIME &
Extensions or by entering Python Integration into the search box.
2. Next, install Anaconda. It is used to manage Python environments. Anaconda can be
downloaded here (choose Anaconda with Python 3).
3. Finally, configure the KNIME Python Integration. Go to the Python Preference page
located at File → Preferences. Select KNIME → Python from the list on the left. In the
page that opens, select Conda under Python environment configuration. Next, provide
the path to your Anaconda installation folder (the default installation path is documented
here). Once a valid path has been entered, the conda version number is shown. Below
the conda version number you can choose which conda environment should be used for
Python 3 and Python 2 by selecting it from a combo box. If you have already set up a
Python environment, containing all the necessary dependencies for the KNIME Python
Integration, just select it from the list and you are ready to go. If you do not have a
suitable environment available, click the New environment… button. This opens the
following dialog:
Provide a name for the new environment and click the Create new environment button.
This creates a new conda environment containing all the required dependencies for the
KNIME Python Integration.
Once the environment is successfully created, the dialog closes and the new
environment is selected automatically.
Anaconda Setup
This section describes how to install and configure Anaconda to be used with the KNIME
Python Integration. Anaconda allows you to manage several so-called conda environments,
each of which can contain a different Python version and a different set of packages in
different versions. A conda environment is essentially a folder that contains a specific Python
version and the installed packages. This means you can have several different Python
versions installed on your system at the same time in a clean and easy-to-maintain manner.
For KNIME, this is especially useful as it allows you to use Python 3 and Python 2 at the same
time without running into version issues; Anaconda keeps each environment nicely
encapsulated and independent of all others. Furthermore, Anaconda is able to create
predefined environments with a single command and makes it easy to add Python packages
to existing ones.
Next, you will learn how to set up an environment that contains the dependencies needed for
the KNIME Python Integration.
Anaconda installation
First, you need to install the latest Anaconda version (Anaconda ≥ 2019.03, conda ≥
4.6.2). On the Anaconda download page you can choose between Anaconda with Python 3.x
and Python 2.x; however, this only affects the root conda environment, which we will not use (as
we are creating our own). Therefore, you can choose either one (if you’re not sure, we
suggest selecting Python 3).
Option 2: Manual
If you do not want to create a conda environment automatically, you can create one manually
after Anaconda is installed. Do this with a YAML configuration file, which lists all of the
packages to be installed in the newly created environment. We have provided two such
configuration files below (one configuration file to create a new Python 3 environment and
one file for Python 2). They list all of the dependencies needed for the KNIME Python
Integration:
py3_knime.yml
py2_knime.yml
The above configuration files only contain the Python packages that the KNIME
Python Integration depends on. If you want to use more Python packages in
KNIME you can either add the name of the package at the end of the
configuration file or add them after the environment has been created.
For example, for Python 3 you can use the py3_knime.yml and download it to any folder on
your system (e.g. your home folder). In order to create an environment from this file, open a
shell (Linux), terminal (Mac), or Anaconda prompt (Windows, can be found by entering
anaconda in Windows Search), change the directory to the folder that contains the
configuration file and execute the following command:
conda env create -f py3_knime.yml
This command creates a new environment with the name provided at the top of the
configuration file (of course you can change the name). It also downloads and installs all of
the listed packages (depending on your internet speed, this may take a while).
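If you would rather script this step than type it into a shell, the environment creation can be wrapped in a few lines of Python. This is only a minimal sketch, assuming conda is available on the PATH. The helper names build_env_create_command and create_environment are hypothetical, and the optional -n flag is conda's own way to override the environment name given at the top of the configuration file.

```python
import shutil
import subprocess


def build_env_create_command(yaml_file, env_name=None):
    """Return the conda command line that creates an environment from a YAML file."""
    command = ["conda", "env", "create", "-f", yaml_file]
    if env_name is not None:
        # -n overrides the name stated at the top of the configuration file.
        command += ["-n", env_name]
    return command


def create_environment(yaml_file, env_name=None):
    """Run the command; raises if conda is missing or the command fails."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda was not found on the PATH")
    subprocess.run(build_env_create_command(yaml_file, env_name), check=True)


# Show the command that would be executed for the Python 3 configuration file.
print(" ".join(build_env_create_command("py3_knime.yml")))
```

Calling create_environment("py3_knime.yml") from the folder that contains the file would then download and install the listed packages, which may take a while.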
If you want to use both Python 3 and Python 2 at the same time, just repeat the above steps
using the respective configuration file.
The list of dependencies for Python 3 and Python 2 is almost the same;
only the version numbers differ.
After Anaconda has successfully created the environment, Python is all set up and you are
ready to proceed with Setting up the KNIME Python Integration.
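To add a further package to an existing environment, use conda install with the --name
option (conda install --name <ENV_NAME> <PACKAGE>). Replace <ENV_NAME> with the name of the
environment in which you want to install the package.
You can request a specific version of the package with e.g. scikit-learn==0.20.2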
Troubleshooting
Mac Matplotlib
On Mac, there may be issues with the matplotlib package. If matplotlib fails with an error, setting its backend to TkAgg with the following commands may resolve it:
mkdir ~/.matplotlib
echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc
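The two commands above create the folder ~/.matplotlib and write a matplotlibrc file that selects the TkAgg backend. A rough Python equivalent is sketched below. The function name set_matplotlib_backend and its home parameter are hypothetical and exist only so that the sketch can be tried against a scratch folder first.

```python
from pathlib import Path


def set_matplotlib_backend(backend="TkAgg", home=None):
    """Write 'backend: <backend>' to <home>/.matplotlib/matplotlibrc and return the file path."""
    home = Path.home() if home is None else Path(home)
    config_dir = home / ".matplotlib"
    # Unlike a bare mkdir, this does not fail if the folder already exists.
    config_dir.mkdir(parents=True, exist_ok=True)
    config_file = config_dir / "matplotlibrc"
    # Like the shell redirection '>', this overwrites any existing file.
    config_file.write_text(f"backend: {backend}\n")
    return config_file
```

Calling set_matplotlib_backend() without arguments targets your real home folder, so an existing matplotlibrc there would be replaced.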
Installation
From KNIME Analytics Platform, install the KNIME Python Integration by going to File →
Install KNIME Extensions. The KNIME Python Integration can be found under KNIME &
Extensions or by entering Python Integration into the search box.
Select Conda under Python environment configuration. The dialog should look like the
screenshot shown below.
In this dialog, provide the path to the folder containing your Anaconda installation (the default
installation path is documented here). Once you have entered a valid path, the installed conda
version is displayed and KNIME automatically checks for all available conda environments.
Underneath the conda version number, you can choose which conda environment should be
used for Python 3 and Python 2 by selecting it from a combo box. If you have already set up a
Python environment containing all the necessary dependencies for the KNIME Python
Integration, just select it from the list and you are ready to go. If you do not have a suitable
environment available, click the New environment… button. This opens the following dialog:
Provide a name for the new environment and click the Create new environment button. This
creates a new conda environment containing all required dependencies for the KNIME Python
Integration.
Once the environment is successfully created, the dialog closes and the new environment is
selected automatically. If everything worked out fine, the Python version is now shown below
the environment selection and you are ready to go.
Option 2: Manual
If you choose the manual option, you have to point KNIME to a start script which activates
the environment you want to use for Python 2 and Python 3 respectively. This option
assumes that you have created a suitable Python environment earlier by following the
instructions given under Option 2: Manually of the Creating a Conda environment Section.
In order to use the created Anaconda environment for the KNIME Python Integration, you
need to create a start script (shell script on Linux and Mac, bat file on Windows).
If you are using Linux or Mac, here’s an example shell script for the Python environment:
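#!/bin/bash
# Start by making sure that the anaconda folder is on the PATH
# so that the source activate command works.
# This isn't necessary if you already know that
# the anaconda bin dir is on the PATH
export PATH="<PATH_WHERE_YOU_INSTALLED_ANACONDA>/bin:$PATH"

# Activate the environment, then start Python with the arguments passed to this script
source activate <ENVIRONMENT_NAME>
python "$@"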
These example scripts need to be edited in order to point to the location of your
Anaconda installation and to activate the correct Anaconda environment. I.e.
replace the <PATH_WHERE_YOU_INSTALLED_ANACONDA> with the location of your
Anaconda installation (the default installation path is documented here) and
<ENVIRONMENT_NAME> with the name of the conda environment the script should
start and which you created in the Anaconda Setup guide (e.g. in this case
py3_knime for Python 3.6 or py2_knime for Python 2.7).
On Windows, for example, create a new bat script named e.g. py3.bat (py3.sh on Linux or
Mac) and paste the corresponding script into the file.
On Linux/Mac you additionally need to make the file executable (i.e. chmod
gou+x py3.sh).
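If the start script is generated by a setup script rather than by hand, the execute permission can also be set from Python. The following is a small sketch for Linux/Mac; the helper name make_executable is hypothetical.

```python
import os
import stat


def make_executable(path):
    """Add the execute bit for user, group, and others (like 'chmod gou+x')."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For the example above, this would be called as make_executable("py3.sh") from the folder that contains the script.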
Once you have created the start script, you’re almost finished setting up Python. The last
thing to do is to point KNIME to the start script you just created. Do this in the Preference
page of the KNIME Python Integration located at File → Preferences. Select KNIME → Python
from the list on the left. The dialog should look like the screenshot shown below.
Figure 1. KNIME Python Preferences page. Here you can set the path to the executable script
that launches your Python environment.
On this page you need to provide the path to the script/bat file you created to start Python. If
you like, you can have configurations for both Python 2 and Python 3 (as is shown above).
Just select the one that you would like to have as the default. If everything is set correctly, the
Python version is now shown in the dialog window and you are ready to go.
Serialization library
You can choose which serialization library should be used by the KNIME Python Integration
to transfer data from KNIME Analytics Platform to Python.
This option does not usually need to be changed and can be left as the default.
Some of these serialization libraries have additional dependencies stated below; however, if
you followed the Anaconda Setup, all required dependencies are already included (see the YAML
configuration files in the Anaconda Setup section). Currently, there are three options:
Advanced
The Advanced settings let you configure the pool of pre-launched Python processes.
In the background, KNIME Analytics Platform initializes and maintains a pool of
Python processes that can be used by individual Python nodes, reducing the startup cost
when executing any Python nodes. Here, you can set the pool size in the field Maximum
number of provisioned processes, and the time in minutes after which idle processes
in the pool are recycled in the field Expiration duration of each process (in minutes).
The Conda Environment Propagation node also makes workflows that contain Python nodes more
portable: it can recreate the Conda environment used on the source machine (for example your
personal computer) on the target machine (for example a KNIME Server instance).
1. On your local machine, you need to have Conda set up and configured in the
Preferences of the KNIME Python Integration as described in the Anaconda Setup
section.
2. Open the node configuration dialog and select the Conda environment you want to
propagate and the packages to include in the environment in case it is recreated
on a different machine.
[Figure: Conda Environment Propagation node and Python Script node]
4. Successively open the configuration dialogs of the Python node and all subsequent
Python nodes in the workflow that you want to make portable. When their dialogs are
opened for the very first time, they automatically pick up the environment by
setting their respective Python 2 and/or Python 3 entries on the Flow Variables tab to
the propagated conda.environment variable.
Once you have configured the Conda Environment Propagation node and set up the desired
workflow, you might want to run this workflow on a target machine, for example a KNIME
Server instance.
1. Deploy the workflow by uploading it to the KNIME Server, sharing it via the KNIME Hub,
or exporting it. Make sure that the Conda Environment Propagation node is reset before
or during the deployment process.
2. On the target machine, Conda must also be set up and configured in the Preferences of
the KNIME Python Integration. If the target machine runs a KNIME Server, you may need
to contact your server administrator and/or refer to the Server Administration Guide in
order to do this.
3. During execution (on either machine), the node checks whether a local Conda
environment exists that matches its configured environment. When configuring the
node, you can choose how the Conda environment is validated on the target machine:
Check name only checks only whether an environment with the same name as the original
one exists; Check name and packages checks that both the name and the requested
packages match; Always overwrite existing environment ignores any matching environment
on the target machine and recreates it.
This option affects the execution time of the node: Conda needs more
time when the packages are checked in addition to the environment name
than when only the name is checked.
Please be aware that exporting Python environments between systems that run
different operating systems might cause some libraries to conflict.
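The three validation modes described in step 3 can be summarized as a small decision function. This is an illustrative sketch only, not the node's actual implementation: the function name, the mode strings, and the local_envs mapping are all assumptions made for the example.

```python
def needs_recreation(mode, env_name, required_packages, local_envs):
    """Decide whether the environment has to be (re)created on the target machine.

    local_envs maps each local environment name to the set of its package specs.
    """
    if mode == "always_overwrite":
        # An existing environment is ignored and recreated in any case.
        return True
    if env_name not in local_envs:
        # No environment with that name exists, so it has to be created.
        return True
    if mode == "check_name_only":
        return False
    if mode == "check_name_and_packages":
        # Recreate unless every requested package is already present.
        return not set(required_packages) <= local_envs[env_name]
    raise ValueError(f"unknown validation mode: {mode}")
```

The name-only branch returns before any package comparison takes place, which fits with that mode being the quickest check.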
# Path to the folder containing the notebook, e.g. the folder 'data' contained
# in my workflow folder
notebook_directory = "knime://knime.workflow/data/"
• notebook_version: The Jupyter notebook format major version. Sometimes the version
can’t be read from a notebook file. In these cases, this option lets you specify the
expected version in order to avoid compatibility issues. Should be an integer.
• only_include_tag: Only load cells that are annotated with the given custom cell tag
(since Jupyter 5.0.0). This is useful to mark cells that are intended to be used in a
Python module. All other cells are excluded. This is helpful, for example, to exclude
cells that do visualization or contain demo code. Should be a string.
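The two options above might be passed as follows. This is a sketch under stated assumptions: the options are assumed to be keyword arguments of knime_jupyter.load_notebook, the notebook file name and the cell tag are hypothetical placeholders, and the knime_jupyter module is only importable inside a KNIME Python node, so the import is guarded.

```python
# Path to the folder containing the notebook (same example location as above)
notebook_directory = "knime://knime.workflow/data/"
# Hypothetical notebook file name, used for illustration only
notebook_name = "my_notebook.ipynb"

# Optional arguments described in the list above
options = {
    "notebook_version": 4,        # expected notebook format major version (integer)
    "only_include_tag": "knime",  # hypothetical custom cell tag (string)
}

try:
    # knime_jupyter is only available inside a KNIME Python node
    import knime_jupyter
except ImportError:
    knime_jupyter = None

if knime_jupyter is not None:
    # Load the notebook as a Python module, restricted to the tagged cells
    my_notebook = knime_jupyter.load_notebook(notebook_directory, notebook_name, **options)
```

Outside of KNIME the guarded import simply leaves knime_jupyter as None, so the snippet can be read and run without side effects.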
The Python nodes support code completion similar to an IDE. Just hit ctrl-space
(command-space on Mac), e.g. after knime_jupyter., in order to show the
available methods and documentation (knime_jupyter refers to the imported
knime_jupyter Python module, e.g. see script example above).
The Jupyter notebook support for the KNIME Python Integration depends on
the packages IPython, nbformat, and scipy, which are already included if you
used the configuration files from the Anaconda Setup.
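If you built your environment by other means, a quick check like the following sketch can tell you whether these packages are importable. It has to be run with the Python interpreter of the environment that KNIME uses; the helper name find_missing_packages is hypothetical.

```python
import importlib.util


def find_missing_packages(module_names):
    """Return the module names that cannot be imported in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Packages the Jupyter notebook support depends on
missing = find_missing_packages(["IPython", "nbformat", "scipy"])
print("Missing packages:", missing or "none")
```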
You can find example workflows using the knime_jupyter Python module on our EXAMPLES
server.
MDF Reader
Similar to the KNIME Deep Learning Integration, the MDF Reader node requires certain
Python packages to be installed in the Python 3 environment. Since the v4.1 release of
KNIME Analytics Platform, these will be automatically installed if you set up your Python
environment via the Python Preference page (see here). The required packages are the
following:
numpy=1.16.1
libiconv=1.15
asammdf=5.13.13
The MDF Reader node requires a newer numpy version (1.16.1) compared to the
numpy version (1.15) required before.
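To see whether an existing environment already satisfies these pins, the installed versions can be compared against them. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the comparison only mimics conda's fuzzy name=version matching in a simplified way. It uses importlib.metadata (Python 3.8 or later) and falls back to the importlib_metadata backport on older interpreters.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport for older interpreters


def matches_pin(installed, pinned):
    """Simplified fuzzy match: pin 1.16 accepts 1.16 and 1.16.x, but not 1.160."""
    return installed == pinned or installed.startswith(pinned + ".")


def check_pins(pins):
    """Map each distribution name to True/False (pin satisfied) or None (not installed)."""
    result = {}
    for name, pinned in pins.items():
        try:
            result[name] = matches_pin(metadata.version(name), pinned)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


# libiconv is a native library rather than a Python distribution,
# so it cannot be checked this way and is left out here.
print(check_pins({"numpy": "1.16.1", "asammdf": "5.13.13"}))
```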
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license
from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.