-
Notifications
You must be signed in to change notification settings - Fork 171
Anaconda in a network environment
Here at the USGS Astrogeology Science Center (ASC) we have a software development team consisting of roughly a dozen full or part time developers, another dozen or so technical users, and a few dozen scientists who all need consistent access to software builds and releases. As we moved to using Anaconda for dependency management and software installation, our IT group quickly flagged an issue where users' Home directories were growing astronomically large from each having their own Anaconda installation. Along with that, we had many users who had never used Anaconda before having to manage and debug their own installations. Fairly quickly, we settled upon having shared Anaconda installations that users could access and use across systems without having to manage their own Anaconda installation or update their environments.
I'm sure our IT folks could give a much better explanation than me, but here's a quick run-down of our systems at the ASC. The majority of our users have either a Windows desktop, an Ubuntu desktop, or a macbook. We primarily target our software for use on Unix systems so those users who have a windows desktop use PuTTY to connect to Linux Virtual Machines (VMs). Along side individual workstations, we also have a Linux processing cluster and have Network File System (NFS) drives that users can access. Users can mount the NFS drives directly to their Unix work station or they can access them by connecting to the VMs. The primary NFS drive that I'll be talking about is /nfs/cpkgs/
. This drive is a shared drive that we mount on all of our Unix systems, hence the name cpkgs for common packages. Generally, users manage their own project data either locally, on other NFS drivers, or on high performance storage used by the processing cluster. Files on /nfs/cpks/
are only updated by software developers or some technical users to setup environments for a project or a set of users. So, most users access software and general purpose data from /nfs/cpkgs/
but don't store anything there.
If you are already familiar with Anaconda, Anaconda channels, Anaconda environments, and Anaconda configuration; then you can go ahead and skip to the next section. If any of those things are new or you have questions about them, this section will give you a brief introduction and some links to further docs.
First of all, Anaconda is a package manager that allows you to install a whole bunch of different pieces of software with one or two commands. Let's say you want to install opencv so that you can do some computer vision work in Python; all you have to do to get started is run conda install opencv
and Anaconda will download the lastest version of OpenCV (and all of its dependencies) and install the Python bindings into its Python installation. You can download Anaconda and read the docs on their Individual Edition Home Page. The company that makes Anaconda has some paid products and support that you can get, but we don't use them so I won't be talking about them.
You can use Anaconda to install packages with conda install <package-name>
. For example, I can install the latest available version of OpenCV with conda install opencv
. That will download and install the latest version of OpenCV from https://anaconda.org/anaconda/opencv/files, currently 4.0.1. I can instead install a specific version with conda install <package-name>=<version-number>.
For example, I can install OpenCV 3.4.1 with conda install opencv=3.4.1
.
So now that you know how to find and install packages, what do you do when you don't want a package available all the time or you want different versions of a package available? By default Anaconda will install packages into the directory you install it in. For example, if I install Anaconda into ~/anaconda
and then I install latest version of OpenCV, it will install the dynamic library into ~/anaconda/lib
, the c++ headers into ~/anaconda/include
, and the Python bindings into ~/anaconda/lib/python3.7/site-packages
. If I then have Anaconda install an older version of OpenCV, let's say 3.2, it will replace everything I just installed with an older version. Uninstalling and reinstalling different versions of a package all the time can be a real pain, so instead let's use environments to install them and then we can active the latest OpenCV environment or the OpenCV 3.2 environment depending on what we want. Each environment is basically a separate set of packages that you can swap between whenever you want, and without having to reinstall anything. First, you can create an environment using conda create my_envrionment
. Then, you can activate that environment by running conda activate my_environment
. From here, you are ready to install any packages you want and when you're done with it you can deactivate it by running conda deactivate
.
Using the same example installation as before, I can run conda create -n my_environment
and I will get a new environment at ~/anaconda/envs/my_environment
. If I then activate that environment with conda activate my_environment
and install the latest OpenCV with conda install -c conda-forge opencv
it will install the library into ~/anaconda/envs/my_environment/lib
, the c++ headers into ~/anaconda/envs/my_environment/include
, and the Python bindings into ~/anaconda/envs/my_environment/lib/python3.7/site-packages
. I do some work and then I want to swap to OpenCV 3.2. First, I deactivate my latest OpenCV environment with conda deactivate
. Next, I make a second environment with conda create -n opencv_old
which will be located at ~/anaconda/envs/opencv_old
and activate it with conda activate opencv_old
. Now I have a clean environment and can install OpenCV 3.2 with conda install -c conda-forge opencv=3.2
. This will install the library into ~/anaconda/envs/opencv_old/lib
, the c++ headers into ~/anaconda/envs/opencv_old/include
, and the Python bindings into ~/anaconda/envs/opencv_old/lib/python3.7/site-packages
.
You can have many different environments in your anaconda installation and swap between them at will. The only stuff shared between different environments is what's in your base environment. Your base environment is the root directory in your Anaconda installation and you can activate it using conda activate
. In our installation example, the base environment would be located at ~/anaconda
. Notice that by default Anaconda installs packages into your base environment when you don't have another environment active. Because anything installed here will bleed into your other environments, we highly discourage installing anything into your base environment besides what comes with your Anaconda installation.
Constantly specifying channels can be a real chore, so you may want to add a channel to your Anaconda config with conda config --add channels <channel-name>
. This command will add the channel to your default search and you won't have to specify it everytime you install a package anymore. For example you can default to packages from conda-forge with conda config --add channels conda-forge
. You can also specify a configuration for just an evironment by activating the environment and then adding --env
to your conda config command. For example, I could setup my_environment
from above to default to searching conda-forge for packages by first activating it with conda activate my_environment
and then modifying the config with conda config --env --add channels conda-forge
. This would cause Anaconda to default to conda-forge only when my_environment
is activate.
All of this works using .condarc
files at different places. If you run conda config, then it defaults to modifying your personal .condarc
file at ~/.condarc
. If you use the --env
option, then it modifies the .condarc
file in your environment. For example, when I modified my_environment
to default search conda-forge it added the following to ~/anaconda/envs/my_environment/.condarc
:
channels:
- conda-forge
- defaults
The conda config can set a whole bunch of other options that are beyond the scope of this document. You can read more about what you can do with .condarc
files here.
We wanted to setup a single Anaconda installation in our /nfs/cpkgs/
NFS drive, but Anaconda requires separate installations for Linux and Mac. So, we instead have two Anaconda installations: /nfs/cpkgs/anaconda3_linux
and /nfs/cpkgs/anaconda3_macOS
. As their name suggests, anaconda3_linux
is for Linux and anaconda3_macOS
is for Mac. Within each installation we have separate environments for different versions of software. For example, under /nfs/cpkgs/anaconda3_linux
we have the following environments for different versions of ISIS:
- isis3.6.0
- isis3.6.2
- isis3.7.0
- isis3.7.1
- isis3.8.0
- isis3.8.1
- isis3.9.0
- isis3.9.1
- isis4.0.0
- isis4.0.1
- isis4.1.0
- isis4.1.1
- isis4.2.0
- isis4.3.0
Whenever there is a new release or release candidate for ISIS, we make a new environment and install the new version into it. This allows users to select any version they want and they can easily upgrade or downgrade for their processing needs. The permissions on the installations are restricted so that users can access all of the files, but cannot modify any of them to prevent accidentally changing an environment.
Users can access the shared installation by running the conda.sh
or conda.csh
scripts in /nfs/cpkgs/anaconda3_linux/etc/profile.d/
or /nfs/cpkgs/anaconda3_macOS/etc/profile.d/
. These will modify their environment to run conda and give them access to the shared environments. Most of our users run bash, so we recommend that users simply add source "/nfs/cpkgs/anaconda3_linux/etc/profile.d/conda.sh"
or source "/nfs/cpkgs/anaconda3_macOS/etc/profile.d/conda.sh"
to their .bashrc
or .bash_profile
.
Users can even create personal environments with the shared Anaconda utilities. If a user creates an environment with the shared installation, it will create the environment in their home directory instead of inside the installation. For example, if I attempt to create an OpenCV environment with conda create -n opencv
then it will be located at ~/.conda/envs/opencv
. This is caused by users not having write permissions to the shared installation, so conda defaults back to their home directory. It is okay for users to create one or two environments this way but if users want to have many different environments that they manage themselves and use the shared environments we recommend they setup their own Anaconda installation and follow the steps in the next section to add the shared environments.
Some users want to have their own Anaconda installation and still have access to the shared environments. Some users want to have more fine control over their processing environment while others want to be able to keep their own environments for when they are outside of our network and don't have access to the NFS drives. Thankfully, there is an easy solution to still allow them access to the share environments. Users simply need to run conda config --add envs_dirs "/nfs/cpkgs/anaconda3_linux/envs"
or conda config --add envs_dirs "/nfs/cpkgs/anaconda3_macOS/envs"
. This will add the following to their /.condarc
file:
envs_dirs:
- /nfs/cpkgs/anaconda3_linux/envs
Once this is done, they can run conda activate <shared-env-name>
to activate a shared environment. If the shared environment files are not available because the NFS drive is not mounted, then Anaconda will still continue to operate as before.
NFS drives are not 100% foolproof and you will sometimes have issues, here are a few we have run into. An Anaconda package adds some files to the NFS drive and the NFS software modifies them resulting in the Anaconda package thinking it is corrupted. An Anaconda package uses a SQLite database that does not lock and unlock properly because the NFS drive doesn't handle rapidly locking and unlocking a file well. NFS drives can be very slow for I/O bound processes such as NAIF DSK routines.
Anaconda uses fairly aggressive caching. If it finds anything locally that works, it generally doesn't check the cloud for something better. By keeping old environments with old packages around, you can sometimes get solves that use old packages. Even, just adding the environments via the conda config command from the previous section can cause Anaconda to pull old packages from them. Our developers often have to remove the shared environments from their envs_dirs config in order to reproduce user reported installation issues.
If you have a networked home directory that you share between Linux and Mac systems, you'll need to add some logic to your .bashrc
and/or .bash_profile
files that selects the proper Anaconda installation dependending on what OS you are on. If you have a shared home like this, then you likely already have logic like this in your start up files.
- Building
- Writing Tests
- Test Data
- Start Contributing
- Public Release Process
- Continuous Integration
- Updating Application Documentation
- Deprecating Functionality
- LTS Release Process and Support
- RFC1 - Documentation Delivery
- RFC2 - ISIS3 Release Policy
- RFC3 - SPICE Modularization
- RFC3 - Impact on Application Users
- RFC4 - Migration of ISIS Data to GitHub - Updated Information 2020-03-16
- RFC5 - Remove old LRO LOLA/GRAIL SPK files
- RFC6 - BLOB Redesign
- Introduction to ISIS
- Locating and Ingesting Image Data
- ISIS Cube Format
- Understanding Bit Types
- Core Base and Multiplier
- Special Pixels
- FAQ