Python
------

The python environment can be activated using the command:

.. code-block:: bash

    module load python/3

Most of the python environments are set up based on ``anaconda`` in the
``/apps/sw/miniconda/`` directory. Users who wish to extend these
environments or create custom ones can:

- use the flag ``--user`` when using ``pip``. For example, to install
  ``ipython`` a user can issue the command:

  .. code-block:: bash

      pip install --user ipython

  This will install ``ipython`` to ``~/.local/lib/python3.7/site-packages``.
  Users can check whether the imported package is the one installed in their
  home directory by importing it and printing it, e.g.

  .. code-block:: ipython

      >>> import IPython
      >>> print(IPython)

- a similar approach can be used for ``anaconda`` environments, e.g. to
  create a new conda environment in a custom location:

  .. code-block:: bash

      conda create --prefix /home/john/test-env python=3.8

- ``virtualenvs`` are by default created in the home directory
  ``~/.virtualenvs``. The package ``virtualenvwrapper`` might also be useful.

- use ``pipenv``, a new and powerful way of creating and managing python
  environments. The following is an excellent guide on getting started with
  ``pipenv``:
  https://robots.thoughtbot.com/how-to-manage-your-python-projects-with-pipenv

- install anaconda locally in their home directories.

- compile and install ``python`` from source. This is non-trivial and
  requires good knowledge of what the user is doing, but it gives full
  control over the build process and the customization of python. For
  optimal performance, this is the recommended approach.

Jupyter notebooks
^^^^^^^^^^^^^^^^^

.. _jupyter_notebook_job_octopus:

A jupyter lab server is run on a compute node to which a user can connect
using a browser on the local machine (i.e. laptop/desktop/terminal).

- submit the jupyter server script using ``sbatch`` (see below)
- get the port number from ``jupyter-${MY_NEW_JOB_ID}.log`` after the job
  starts running
- create the tunnel to Octopus
- get the URL with the authentication token from
  ``jupyter-${MY_NEW_JOB_ID}.log`` and use that link (with the token) in
  your browser

Jupyter notebook job on a compute node
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following job script can be used as a template to submit a job.

.. code-block:: bash

    #!/bin/bash

    #SBATCH --job-name=jupyter-server
    #SBATCH --partition=normal
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=8000
    #SBATCH --time=0-01:00:00
    #SBATCH --account=foo_project

    source ~/.bashrc

    module purge
    module load python/3

    # pick a random unused port for the server; random_unused_port is a
    # shell helper assumed to be available (e.g. defined in ~/.bashrc)
    JUPYTER_PORT=$(random_unused_port)

    # start the server in the background and log its output
    jupyter-lab --no-browser --port=${JUPYTER_PORT} > jupyter-${SLURM_JOB_ID}.log 2>&1 &

    # reverse tunnel from the head node (ohead1) back to the compute node
    ssh -R localhost:${JUPYTER_PORT}:localhost:${JUPYTER_PORT} ohead1 -N

Connect to the jupyter server from a client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After the job is submitted, it is possible to connect to the jupyter server
(that is running on the compute node) from your local client machine's web
browser using an ssh tunnel. To create the tunnel, execute (in your local
terminal):

.. code-block:: bash

    $ ssh -L localhost:38888:localhost:38888 octopus.aub.edu.lb -N

Here ``38888`` should be replaced (on both sides) with the port number found
in ``jupyter-${MY_NEW_JOB_ID}.log``. After creating the tunnel, you can
access the server from your browser by typing in the URL (with the token)
found in the same log file (see the previous section).
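The port number and the tokenized URL can be read directly from the log
file. The following is a minimal sketch (the exact format of the URL printed
by ``jupyter-lab`` may vary between versions, and ``123456`` is a
placeholder for your actual job id):

.. code-block:: bash

    # on Octopus, after the job starts running; 123456 stands for
    # your actual SLURM job id
    grep -o 'http://[^ ]*token=[^ ]*' jupyter-123456.log | head -n 1
    # the port number is the number following "localhost:" in the URL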
The diagram for the steps involved is:

.. figure:: jupyter/jupyter_hpc_usage_model.png
   :scale: 100 %
   :alt: The Jupyter usage model on the HPC cluster

Running production jobs with Jupyter notebooks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using Jupyter notebooks through the browser as described above requires a
continuous and stable connection to the HPC cluster (to keep the ssh tunnel
alive). When connected from inside the campus network, such issues are
minimal. However, the connection might experience instability and could get
disconnected, especially when there are no user interactions with the
notebook, e.g. when running a production job while the user is away from the
terminal.

After developing a Jupyter notebook (through the browser), production jobs
can be run in batch mode by executing the notebook. Such execution does not
require interacting with the notebook through the browser. The following
template job script can be used to execute the input notebook; the executed
notebook is saved to a separate one that can be retrieved from the cluster
and examined elsewhere, i.e. the notebook with the results is saved, and no
cluster resources or gpu are needed to view them.

.. note:: no ssh tunnel is required for executing the notebook

.. code-block:: bash

    #!/bin/bash

    #SBATCH --job-name=jupyter-nbconvert
    #SBATCH --partition=normal
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=8000
    #SBATCH --time=0-01:00:00
    #SBATCH --account=foo_project

    ## load modules here
    module load python/3

    ## execute the notebook and save the result to a new notebook
    jupyter nbconvert --to notebook \
        --ExecutePreprocessor.enabled=True \
        --ExecutePreprocessor.timeout=9999999 \
        --execute my_production_notebook.ipynb --output my_results.ipynb
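Once the job completes, the executed notebook contains all the outputs. As a
minimal sketch of one possible workflow (assuming the file names used in the
script above; ``execute_notebook.sh`` is a hypothetical name for that
script), the results can be rendered to static HTML and copied to the local
machine for viewing without any jupyter server:

.. code-block:: bash

    # on Octopus: submit the batch execution job
    sbatch execute_notebook.sh

    # after the job completes, render the executed notebook to static HTML
    jupyter nbconvert --to html my_results.ipynb

    # on the local machine: copy the rendered result for offline viewing
    # (adjust the remote path as needed)
    scp octopus.aub.edu.lb:my_results.html .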