Deep Learning Software Installation Guide


CITS5017 Deep Learning

Semester 2, 2024
Software Installation Guide
(drafted by Unit Coordinator Du Huynh)

If you intend to use your own laptop or your desktop at home, you will need to install Miniconda first.
Please go through this entire software installation guide before carrying out your installation.
If the installation process looks too daunting for you, then maybe you should consider using Google
Colab (see Section 5).

1 Installing Miniconda
Anaconda is an open-source distribution of the Python and R programming languages for scientific
computing. It simplifies package management and deployment. The Anaconda distribution includes
data-science packages for Windows, Linux, and macOS. You don't need to install Python separately, as it
is included with Anaconda. By default, the latest version of Python is installed when you install
Anaconda. However, different projects may need different versions of Python, so it is better to set up a
separate Python environment for each project.
As of July 2024, the latest version of Python is 3.12. However, incompatibility issues with other libraries
(especially TensorFlow) sometimes mean that you need to use a lower version of Python. For this unit,
you need Python version ≥ 3.8. The latest version of TensorFlow that runs on Ubuntu (and possibly other
platforms) is 2.16.2.
Unfortunately, Anaconda includes many packages that you don’t actually need. Furthermore, as we will
set up a specific Python environment for the unit, the hundreds of packages downloaded and installed
in the base environment are unlikely to be used at all. To save disk space, you should install miniconda
instead. It is a significantly cut-down version of Anaconda and can be downloaded from the URL below:
https://docs.anaconda.com/miniconda/
You should choose the file that is appropriate for your computer's operating system. Ensure that you
download and install the 64-bit version[1], as most Python packages are no longer available for 32-bit
processors. The installation process for miniconda should be very quick.
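Once a Python interpreter is available (for example, right after miniconda is installed), the two requirements above (Python ≥ 3.8 and a 64-bit installation) can also be checked from Python itself. The sketch below is an illustration rather than part of the official procedure: the function names are hypothetical, and it reports the bitness of the Python build (via the size of a C pointer), which complements the operating-system checks described in footnote 1.

```python
import platform
import struct
import sys

# Minimum Python version required for the unit, as stated in this guide.
MIN_PYTHON = (3, 8)


def interpreter_bits() -> int:
    """Return 64 for a 64-bit Python build, 32 for a 32-bit build.

    struct.calcsize("P") is the size in bytes of a C pointer (void *).
    """
    return struct.calcsize("P") * 8


def meets_min_version(version=sys.version_info[:2], minimum=MIN_PYTHON) -> bool:
    """True if the (major, minor) version is at least the minimum."""
    return tuple(version) >= minimum


if __name__ == "__main__":
    print(f"Python {platform.python_version()} ({interpreter_bits()}-bit) "
          f"on machine type {platform.machine()!r}")
    if interpreter_bits() != 64:
        print("Warning: this is a 32-bit Python; install the 64-bit miniconda instead.")
    if not meets_min_version():
        print(f"Warning: Python >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]} is needed for the unit.")
```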
The installation process should bring in the following executable programs: conda, pip, pip3 (just an alias
of pip), pydoc, python, etc. By default, miniconda would be installed in your home directory. However,
you can also specify a directory where you want it to be installed. If you are not sure, just use the default
setting for everything. For instance, if you have chosen the default setting, then after the installation, you
should find the directory miniconda3 in your home directory and under miniconda3/bin, you should
see all the executable programs mentioned above.
You should open a terminal window and type:

~/miniconda3/bin/conda init

This runs the conda program to initialise the PATH environment variable for you: it adds a few lines to
your shell's start-up file (e.g., ~/.bashrc for bash on Linux) so that PATH is set up whenever you open a
terminal window. From then on, for any program you want to run, you only need to type the program
name (without the path) in the terminal window that you open.

[1] If you are still using a 32-bit computer, then you need to upgrade it. To find out whether your computer
has 32-bit or 64-bit processors: (i) on the Mac and Linux, type uname -m in a terminal window; if you see
something like x86_64 (or arm64/aarch64 on ARM machines such as Apple Silicon Macs), then it uses
64-bit processors; if you see i686 or i386, then it uses 32-bit processors. (ii) On Windows, open File
Explorer, right-click This PC and select Properties. In the window that pops up, look at the description
under System type.
Open a new terminal window (if that is not sufficient, reboot your computer). In the terminal window,
type:

conda env list

If you see the error message “command not found”, then your PATH environment variable has not been
set up properly. Otherwise, you should see something like:

# conda environments:
#
base * /home/du/miniconda3

By default, miniconda installs a default Python environment called base (this is the name of the
environment). The “*” symbol means that it is currently the active environment.
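To double-check the PATH setup and the active environment from Python, a minimal sketch such as the following can be used. It assumes that conda exports the CONDA_DEFAULT_ENV and CONDA_PREFIX environment variables when an environment is activated (as current conda versions do); the helper names are hypothetical.

```python
import os
import shutil

# Programs that the miniconda installation should make available on PATH.
EXPECTED_PROGRAMS = ("conda", "python", "pip")


def find_programs(names=EXPECTED_PROGRAMS):
    """Map each program name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or (None, None).

    conda sets CONDA_DEFAULT_ENV and CONDA_PREFIX when an environment is activated.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV"), environ.get("CONDA_PREFIX")


if __name__ == "__main__":
    for name, path in find_programs().items():
        # None here corresponds to the "command not found" case.
        print(f"{name:>7}: {path or 'NOT FOUND (check your PATH)'}")
    env_name, env_prefix = active_conda_env()
    print("Active conda environment:", env_name, env_prefix)
```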

2 Installing TensorFlow, Jupyter, etc.


Installing the Python packages needed for CITS5017 is very simple, and the process is identical whether
you use Windows, Linux, or macOS. The easiest way is to type the conda or pip commands given below,
one by one, in a terminal window. If you are certain about the version numbers of the different packages
that you want to install, you can instead install all the packages in one go from a YAML environment file
(e.g., the cits5017-2024.yml file provided for the unit; see Section 2.2).
A common problem that we encountered in past years is that some versions of TensorFlow cannot be
installed, or have incompatibility issues with some versions of Python and other packages. One way to
overcome this issue is to let the libraries decide which version of Python to use.

2.1 Installing the packages one by one

This installation procedure requires you to enter the installation commands yourself. However, it is more
interactive: if any step fails (e.g., because of version incompatibility among packages), you can see
exactly which one. The following versions are known to be compatible (after some detailed
investigation):

• tensorflow 2.16 works for python 3.9–3.12


• scikit-learn and version 2024.3.0 (or later) of scikit-learn-intelex work for Python 3.12, but earlier
versions of scikit-learn-intelex are compatible only with Python ≤ 3.11. The scikit-learn-intelex
package is optional and should be installed only if your computer has an Intel processor. It has been
shown (see https://pypi.org/project/scikit-learn-intelex/) to speed up the training of scikit-learn models
significantly on such computers.
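The compatibility notes above can be encoded as a small lookup table, which makes it easy to see which of the listed combinations apply to the Python you are running. This is only a sketch: the table contains just the ranges stated in this guide, the entry names and helper functions are hypothetical, and the upper bounds are the newest Python versions the guide confirms, not hard limits.

```python
import sys

# Python (major, minor) ranges, inclusive, from the compatibility notes in this guide.
# Upper bounds are the newest version the guide confirms; None means no bound is
# stated on that side (which is not a guarantee of compatibility).
KNOWN_RANGES = {
    "tensorflow 2.16": ((3, 9), (3, 12)),
    "scikit-learn-intelex >= 2024.3": (None, (3, 12)),
    "scikit-learn-intelex < 2024.3": (None, (3, 11)),
}


def in_range(version, bounds):
    """True if the (major, minor) version lies within the inclusive (low, high) bounds."""
    low, high = bounds
    version = tuple(version[:2])
    return (low is None or version >= low) and (high is None or version <= high)


def compatible_with(version=sys.version_info[:2], ranges=KNOWN_RANGES):
    """Return the sorted names of entries whose stated range includes this Python version."""
    return sorted(name for name, bounds in ranges.items() if in_range(version, bounds))


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}:", compatible_with())
```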

By default, if no version number is specified for a package, conda and pip would try to install the latest
version.
In the procedure below, we assume that the computer does not have a GPU and we will go for the latest
version of Python.
In your terminal window, type the following commands one by one:

1. (optional) upgrade conda to the latest version:
conda update -n base conda
This command is required only if you had an older version of conda previously installed.
2. create an environment called cits5017-2024 for the CITS5017 Deep Learning unit:
conda create --name cits5017-2024

3. activate the environment so that the packages installed by subsequent conda install commands are
stored there:
conda activate cits5017-2024

Note that this step is very important. If you omit it, then the subsequent installation commands
will put all the packages in the default base environment.
Type:
conda env list
Now you should see the new environment listed alongside the base environment, for example,
something like the following:

# conda environments:
#
base /home/du/miniconda3
cits5017-2024 * /home/du/miniconda3/envs/cits5017-2024

The “*” symbol should be on the line for cits5017-2024, as it should be the active environment after
we have activated it.
4. install scikit-learn:
conda install scikit-learn

5. (optional) install scikit-learn-intelex:


conda install -c conda-forge scikit-learn-intelex=2024.3.0
Here, we need to specify the conda-forge channel; otherwise, conda would not be able to find
version 2024.3.0 of scikit-learn-intelex. Note that the latest version of scikit-learn-intelex may be
higher than the number given above, as new versions are released frequently. Older versions of
scikit-learn-intelex would require a lower version of Python.
If the command above does not work on your computer, then replace it with
pip install scikit-learn-intelex

You may get version 2024.5.0 installed instead. This version should work fine as well.
After each installation command, you can type conda list to inspect the installed packages and
their version numbers.
6. install the CPU version of tensorflow:
pip install tensorflow-cpu
Note that pip was installed automatically alongside Python by the previous conda install
commands, so we don't need to install it explicitly ourselves. By default, TensorBoard is installed
together with TensorFlow. For this unit, TensorFlow version 2.12 onward should be sufficient.
7. install tensorflow-datasets:
pip install tensorflow-datasets

8. install transformers:
pip install transformers

9. install gym:
pip install gym

10. install jupyter:
conda install notebook

Jupyter-notebook and jupyter-lab provide the interface for editing and running Python notebook
(.ipynb) files. Both are similar and either one is sufficient for the unit. In the command above,
both are installed.
11. install ipywidgets (needed in Chapter 12):
conda install ipywidgets

12. install seaborn:
conda install seaborn

Since matplotlib and pandas are dependencies of seaborn, this command also installs both of them;
they are needed for the unit as well.
13. install openpyxl. This backend library is needed for pandas to read and write Excel .xlsx files:
conda install openpyxl

14. install chardet. This library package seems to be needed for Jupyter-notebook and Jupyter-lab and
needs to be installed explicitly:
pip install chardet

15. deactivate the environment and then clean up unwanted files:


conda deactivate
conda clean --all

Each installation command above downloaded packages in compressed format (files ending with .conda
or .tar.bz2). The installer extracted them and put the extracted files in a sub-directory under your home
directory or somewhere else (on macOS, it should be in /opt/miniconda3 if you installed miniconda).
After installation, these compressed files are no longer needed, and you can save a lot of disk space by
doing this cleaning-up step. When you are asked to confirm whether a long list of files ending with .bz2
or .conda should be removed, just type yes.
You can either deactivate the environment or just close the terminal window. Alternatively, while
the cits5017-2024 environment is still activated, you can try running some of the sample Python
code provided by the author of the textbook. Note that, as the library packages are installed in the
cits5017-2024 environment, you must activate that environment whenever you need to use them;
otherwise, you will see only the packages that come with the installation of miniconda. After
deactivation, you will be back in the default base environment; typing conda list there shows the
difference.
NOTE: To open a .ipynb file, always activate the environment in a terminal window, start jupyter-
notebook or jupyter-lab from a suitable directory, and navigate to the sub-directory where the
.ipynb file is and open it. Do not open the .ipynb file by just double-clicking it in your File Explorer
window. If you have made changes to a .ipynb file, then the modification date/time should reflect
that the file has been updated.
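After working through the steps above, a quick way to confirm that the cits5017-2024 environment contains everything is to query the installed package metadata from Python. The sketch below uses the standard-library importlib.metadata module (available from Python 3.8); the checklist and helper names are hypothetical, the optional scikit-learn-intelex is left out, and it assumes that the conda-installed packages ship the usual Python package metadata (most do). TensorFlow is looked up under both tensorflow and tensorflow-cpu, since step 6 installs the latter.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical checklist: package -> distribution names to try, in order.
# TensorFlow is tried under both names because step 6 installs tensorflow-cpu.
EXPECTED = {
    "scikit-learn": ("scikit-learn",),
    "tensorflow": ("tensorflow", "tensorflow-cpu"),
    "tensorflow-datasets": ("tensorflow-datasets",),
    "transformers": ("transformers",),
    "gym": ("gym",),
    "notebook": ("notebook",),
    "ipywidgets": ("ipywidgets",),
    "seaborn": ("seaborn",),
    "matplotlib": ("matplotlib",),
    "pandas": ("pandas",),
    "openpyxl": ("openpyxl",),
    "chardet": ("chardet",),
}


def installed_version(candidates):
    """Return the version of the first installed distribution among candidates, else None."""
    for dist in candidates:
        try:
            return version(dist)
        except PackageNotFoundError:
            continue
    return None


def check_environment(expected=EXPECTED):
    """Map each expected package to its installed version string, or None if missing."""
    return {name: installed_version(dists) for name, dists in expected.items()}


if __name__ == "__main__":
    report = check_environment()
    for name, ver in report.items():
        print(f"{name:<20} {ver or 'MISSING'}")
    missing = [name for name, ver in report.items() if ver is None]
    if missing:
        print("Missing:", ", ".join(missing), "(is cits5017-2024 activated?)")
```

Run it with the environment activated; any package reported as MISSING points to a step that failed, or to the environment not being active.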

Some useful conda commands can be found in the conda cheat sheet.

2.2 Installing packages using a YAML file

To install all the packages using the supplied cits5017-2024.yml file, type:
conda env create --file cits5017-2024.yml
Do not use the author’s supplied environment.yml file on the GitHub page https://github.com/ageron/handson-ml3,
as we don’t need all the packages specified there.

The process should take only a few minutes to finish. If the installation fails at any point, you will need
to sort out the compatibility issues among the packages, modify the cits5017-2024.yml file where needed,
remove the broken environment (e.g., conda env remove --name cits5017-2024), and recreate the
environment by repeating the conda command above.
If the installation completed successfully, you should clean up the unwanted compressed files (see the
previous subsection) by typing:
conda clean --all
To activate and deactivate the cits5017-2024 environment, see the previous subsection.

3 Using your GPU


If your computer has a GPU (graphics processing unit) and you want to make use of it, then the
installation process is a bit more complicated, as it depends on which GPU you have and on the versions
of the driver and associated libraries you have installed (or need to install). Assuming that you have an
NVIDIA GPU, you will need to install the NVIDIA driver first, and then appropriate versions of CUDA[2]
and cuDNN[3].
Some useful links are given below:

• TensorFlow GPU Installation Guide:
https://www.tensorflow.org/install/gpu
• NVIDIA CUDA Installation Guide for Microsoft Windows:
https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html
• NVIDIA CUDA Installation Guide for Linux:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Some useful installation commands for Linux are given below:

• To find out what type of graphics card you have[4], type: sudo lshw -c display
If you have an NVIDIA GPU, you should see the line Configuration: driver=nvidia latency=0
in the displayed message.
• To find out what version of NVIDIA driver you are using (if you have it already installed), type:
nvidia-smi
• To install the NVIDIA driver version 450.x (for instance), type: sudo apt install nvidia-driver-450
• To find out what version of CUDA you are using (if you have it already installed), type: nvcc --version
If you get the error message command not found, then check your PATH environment variable. If
you have CUDA installed successfully, the directory /usr/local/cuda/bin should exist.
• To install an appropriate cuDNN library, see the instructions on
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
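Once the driver and libraries are in place, a short Python sketch can report whether the NVIDIA driver responds and whether TensorFlow can see a GPU. It calls nvidia-smi -L (which lists the detected GPUs) and tf.config.list_physical_devices("GPU"); the helper names are hypothetical. Note that with the CPU-only tensorflow-cpu package from Section 2.1, TensorFlow will report no GPUs even if one is present.

```python
import importlib.util
import shutil
import subprocess


def nvidia_smi_summary():
    """Return the output of `nvidia-smi -L` (one line per GPU), or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def tensorflow_gpus():
    """Return TensorFlow's list of visible GPU devices, or None if TensorFlow is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # imported lazily: it is slow to load and may be absent

    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    print("nvidia-smi:", nvidia_smi_summary() or "not found, or no working driver")
    gpus = tensorflow_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    else:
        print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```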

4 Running some sample notebook files


The best way to test whether your installed Python environment cits5017-2024 works for the unit is to
try running some of the textbook's sample notebook files from Chapter 10 onward.
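Before opening the notebooks, you may also want a faster smoke test: train a tiny Keras model for a single epoch on random data. This is a hypothetical sketch, not taken from the textbook; the model architecture, data, and function name are arbitrary choices, and it simply returns None if TensorFlow is not installed in the current environment.

```python
import importlib.util


def keras_smoke_test(seed=0):
    """Train a tiny Keras model for one epoch on random data.

    Returns the training loss as a float, or None if TensorFlow is not installed.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import numpy as np  # NumPy is installed as a TensorFlow dependency
    import tensorflow as tf

    rng = np.random.default_rng(seed)
    x = rng.random((64, 4), dtype=np.float32)
    y = (x.sum(axis=1) > 2.0).astype(np.float32)  # synthetic binary labels

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    history = model.fit(x, y, epochs=1, batch_size=16, verbose=0)
    return float(history.history["loss"][0])


if __name__ == "__main__":
    loss = keras_smoke_test()
    print("TensorFlow not installed" if loss is None else f"One epoch OK, loss = {loss:.4f}")
```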
[2] CUDA is a parallel computing platform and programming model developed by NVIDIA for general
computing on its own GPUs.
[3] cuDNN is the NVIDIA CUDA Deep Neural Network library.
[4] If you are logged in as the superuser (root), then the word sudo in the commands can be omitted.

5 FAQs
• What are environments and why do we need them? It is useful to set up different environments
for different projects when you use Python. For example, project 1 might need TensorFlow version
1.8.0, while project 2 needs TensorFlow 2.0. You can create an environment named proj1 for
project 1 and another environment named proj2 for project 2; different packages, and different
versions of the same packages, can then be installed in the two environments. You should choose
environment names that are meaningful and easy to remember. For example, we use the environment
name cits5017-2024 for all the programming work for the unit.
• What is Jupyter Notebook? Jupyter Notebook is an open-source web application that allows you
to create and share documents that contain live code, equations, visualisations and narrative text.
The output (graphs, plots, etc.) is displayed inside the notebook. Jupyter Notebook files have the
extension .ipynb. Markdown (https://en.wikipedia.org/wiki/Markdown), a lightweight markup
language with plain-text formatting syntax, is supported in Jupyter Notebook.
• What is JupyterLab? It is the next-generation web-based user interface for Project Jupyter. You
can start JupyterLab by typing jupyter-lab in a terminal window. You can use either jupyter-lab or
jupyter-notebook, or use them interchangeably.

• What is Google Colab? Google Colab (https://colab.research.google.com/) is a free web-based
environment that supports the editing and running of Python programs. It has all the packages
needed by the unit installed. It also provides free access to GPUs (up to a certain number of hours).
The interface is almost identical to that of jupyter-lab. For the later part of the unit, where we need
to train deep neural networks (DNNs), having access to a GPU will help you speed up the training
process. Rather than performing the installation procedure described above, you can use Google
Colab for the programming work for the entire unit. For more information about Google Colab,
see https://research.google.com/colaboratory/faq.html.
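If you switch between Colab and your local installation, a small check at the top of a notebook can tell you where it is running. The sketch below relies on the commonly used heuristic that the google.colab module is already loaded in a Colab runtime, and on the CONDA_DEFAULT_ENV variable set by conda activate; the function names are hypothetical.

```python
import os
import sys


def running_in_colab(modules=None):
    """Heuristic: Colab runtimes have the google.colab module already loaded."""
    modules = sys.modules if modules is None else modules
    return "google.colab" in modules


def describe_runtime(modules=None, environ=None, prefix=None):
    """Return a short description of where this Python code is running."""
    if running_in_colab(modules):
        return "Google Colab"
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix  # the environment's directory
    env_name = environ.get("CONDA_DEFAULT_ENV")  # set by `conda activate`
    if env_name:
        return f"conda environment {env_name!r} ({prefix})"
    return f"Python at {prefix} (no conda environment active)"


if __name__ == "__main__":
    print(describe_runtime())
```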
