Maintenance

Website

Internal Pages

Internal pages are handled by the User Access Manager plugin. A “Members” group has been created to include Administrators, Editors, Authors and Subscribers (all logged-in users). When editing a page, an Access box appears on the right-hand side – ticking the “Members” box therefore ensures that only logged-in users can see the page (and it also appears on the menu). The Visibility of the page should be left as Public so that page hierarchies (Parents) can be maintained.

Team Pages

Biographies should be concise and one paragraph long. Emails (and full LinkedIn URLs) should be added to the Social Profile Links section rather than to Additional Information (they have no effect there). Pictures should be 209 x 270px (portrait).

Security and Caching

These are both handled by the Wordfence Security plugin. The current settings suffice.

Backups

This is handled by the BackWpUp plugin.

Deep Learning Network

The administrative interface for the mailing list is here.

Servers (rogue/iceman)

File System (iceman)

The OS is on an SSD, and there is RAID 0 set up over two 2TB HDDs. On top of this is a logical volume (not filled out to 4TB for some reason…) at /dev/mapper/vg_pool-lv_home – this contains a backup of the home directories from before the machine’s clean reinstall. It is mounted on /home via /etc/fstab with type “auto” (ext4 fails to mount and neither fdisk nor gdisk find a partition table, so goodness knows what’s happening).

User Setup (iceman)

Although some user folders exist already and network login is enabled, users do need to be added to groups (such as sshd) via /etc/group before they can actually access the machine.
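
As a minimal sketch (the username is a placeholder, and sshd is just the group named above), group membership can be managed with usermod rather than editing /etc/group by hand:

# Append the user to the sshd group without touching their other groups
sudo usermod -aG sshd <username>
# Check the membership has been applied
groups <username>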

MATLAB

In order to renew the MATLAB license, open up the MATLAB graphical interface and go to Help > Licensing > Update Current Licenses… . Then use the departmental activation key (passwords here – please ask for access).

In order to upgrade MATLAB, download the Linux installer from the website and extract the files inside. Using a GUI (via NX or VNC), run the install script as sudo. Use the departmental activation key/TAH Network Concurrent User license. Install with defaults (plus create links to /bin) and activate online.

File Sharing

Files can be copied using scp:

scp -p /path/to/local/file username@address.of.remote.machine:/path/to/remote/file

Servers (angel/beast/cyclops)

Software tl;dr

Full details can be found below, but the new and improved recipe is essentially Ubuntu Desktop 14.04, SSH, CUDA (via network .deb) + post-install, Docker, NVIDIA Docker, GlusterFS, Samba.

Hardware

Building was tricky but doable. We had to be careful with the order of installing parts to make sure cables could connect – particularly power to the top of the motherboard with the top-left radiator fan. The SSD is installed overhanging one of the top trays with its cable routed around. Three HDDs can be connected using the special combined connector from the motherboard box. The bottom motherboard connections should be connected before putting in the GPUs. The motherboard connection helpers cannot be used with the case and USB plugs since they get in the way of the bottom GPU.

The BIOS should always be updated to the latest version, as the system may otherwise be unstable. In the BIOS, enable both virtualization settings. PCs on the Imperial College network have Wake-on-LAN; PCI wakeup needs to be enabled in the BIOS for this to work. Therefore the settings that need to be enabled are:

  • CPU Configuration -> Intel Virtualization Technology
  • System Agent Configuration -> Intel VT for Directed I/O (VT-d) -> Intel VT for Directed I/O (VT-d)
  • APM Configuration -> Power On By PCI-E/PCI

Extra devices should be disabled:

  • Onboard Devices -> Bluetooth Controller
  • Onboard Devices -> Wi-Fi Controller

All fan profiles should be set to Turbo to ensure adequate cooling. To improve stability, the AI Overclock Tuner should be changed from “Auto” to “Manual”.

Operating System

The next few sections detail the setup for the servers, and remain good instructions for setting up any new PCs with Linux.

First boot a Linux Desktop live CD to open a browser and register the PC for an internet connection. Wait about ten minutes for the IP to register. Then install Ubuntu Server 14.04.3 LTS with SSH (thankfully Ubuntu does not set a password for root, which is good security practice). Use the entire SSD (no LVM). The default user is “bicv”. Other options are generally left as default.

Change to the latest v4 of the Linux kernel supported by Ubuntu (and remove the old version) by running the following:

sudo apt-get install --install-recommends linux-generic-lts-xenial
sudo apt-get autoremove

Every server is assigned a static IP with an address of the form <hostname>.bg.ic.ac.uk.

To increase the accuracy of NTP synchronisation, add the following (Imperial DoC) NTP servers to /etc/ntp.conf on each machine (below “Specify one or more NTP servers”) and then run “sudo service ntp reload”:

time1.doc.ic.ac.uk
time2.doc.ic.ac.uk
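
For reference, a minimal sketch of the resulting /etc/ntp.conf entries and the reload/check commands (assuming the stock Ubuntu ntp package):

# Lines added to /etc/ntp.conf below "Specify one or more NTP servers"
server time1.doc.ic.ac.uk
server time2.doc.ic.ac.uk

# Reload ntpd and confirm the new peers are being polled
sudo service ntp reload
ntpq -p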

Sometimes memory allocated to processes that have completed is not cleared up. If this needs to be done, log in as root, sync to flush the file system buffers, and then free the pagecache, dentries and inodes:

sync && echo 3 > /proc/sys/vm/drop_caches

CUDA

However, when booting, the nouveau driver is too old for the GTX 980s, so it fails. Boot into safe mode and repair broken packages/upgrade the system as necessary (it is safest to reboot into safe mode after this). Then drop to the root shell prompt and install build-essential (for the necessary C++ compiler) and dkms (to rebuild the driver after kernel upgrades).
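
From the root shell prompt this step amounts to installing the two packages named above:

# Compiler toolchain for the NVIDIA installer, plus DKMS for kernel upgrades
apt-get update
apt-get install build-essential dkms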

Download the NVIDIA CUDA run file with GTX 9xx support (e.g. using wget); it is almost 1GB in size. Then extract the driver installer:

./<run file> --extract=`pwd`

Run just the driver installer (choosing DKMS support and 32-bit compatibility libraries, but not signing) so that the system can boot and has the appropriate kernel modules for running CUDA installed.
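
As a rough sketch of that step (the driver installer filename depends on the CUDA run file version, so treat it as a placeholder):

# Run only the extracted driver installer; choose DKMS support and
# 32-bit compatibility libraries in the prompts, and decline module signing
sudo ./NVIDIA-Linux-x86_64-<version>.run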

After boot the devices will be missing under /dev, so adjust /etc/rc.local to include the following commands before “exit 0”:

X :1 &
nvidia-smi
nvidia-modprobe -c 0 -u
( sleep 60; mount -a; service smbd restart ) &

Running nvidia-smi makes the devices (nvidia0-3) and nvidiactl appear under /dev; it also reports information about the cards, including temperatures. The nvidia-modprobe command loads nvidia-uvm.

The first command is for OpenGL (see below). The final command waits a minute, assuming that the network will be ready by then, makes sure GlusterFS is mounted, and then restarts Samba.

OpenGL

Software such as Gazebo requires OpenGL (for rendering, cameras etc.) – getting this to work with headless servers and in Docker at the same time is complex. The first part has been distilled from two sources. Firstly, to make sure there is some amount of X support, run `sudo apt-get install xinit`. Then, to make sure `xorg.conf` is set up properly, run `nvidia-xconfig --use-display-device=None` (the key is `Option “UseDisplayDevice” “none”`); the headless X server should then be running on restarts. However, for some reason the default display does not work… Therefore add `X :1 &` to the top of /etc/rc.local. To debug, it’s helpful to have `glxinfo`, which is part of the `mesa-utils` package. `DISPLAY=:1` should be used as the environment variable when requiring access to OpenGL.
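
To check that the headless X server on display :1 is providing NVIDIA-accelerated OpenGL, something along these lines can be used:

# glxinfo ships with mesa-utils
sudo apt-get install mesa-utils
# The renderer string should report the NVIDIA GPU rather than a software renderer
DISPLAY=:1 glxinfo | grep -i "opengl renderer"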

File System

The Ubuntu installer sorts out the partitioning and file systems for the SSD. The HDDs for the 3 servers now use GlusterFS (details also below) to have a 22TB pool replicated over the servers. The details can be found on their own internal page. This storage is mounted under /mnt/glusterfs.
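
The volume name and exact mount options live on that internal page; purely as an illustrative sketch (the volume name here is a placeholder), the pool is mounted along these lines:

# Check the replicated volume is up and healthy
sudo gluster volume status
# Mount it locally under the usual mount point
sudo mount -t glusterfs localhost:/<volume> /mnt/glusterfs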

Docker

Install Docker using the instructions for Ubuntu 14.04.

Dockerfiles are developed by Kai and Jose and are available in Bitbucket. Instructions for images can be found in their respective folders. Generally start a container with the interactive and pseudo-tty options (-it), and make it ephemeral when testing with --rm. Otherwise, once you exit a container it is stopped but all of its state is saved. Use docker rm <id> to delete it, or docker start <id> and docker attach <id> to restart it and get back to the terminal.
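
As a concrete illustration of that workflow (ubuntu:14.04 is just an arbitrary example image):

# Ephemeral interactive container for testing; it is removed on exit
sudo docker run -it --rm ubuntu:14.04 /bin/bash
# Without --rm the container stops on exit but keeps its state
sudo docker run -it ubuntu:14.04 /bin/bash
sudo docker start <id> && sudo docker attach <id>   # get back to the terminal
sudo docker rm <id>                                 # or delete it once finished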

As an example, to mount the /mnt/glusterfs folder as read-only, but a user’s directory as read-write, use the following options when running the docker container:

-v /mnt/glusterfs/:/data/:ro -v /mnt/glusterfs/users/<user>:/data/users/<user>

Note that for security reasons it is only possible to mount volumes inside a running container with the --privileged=true option. Also note the convention for mounting /mnt/glusterfs as /data within containers.

File Sharing

The default password is “password” – instructions for changing this can be found in the readme.

The /mnt/glusterfs folder is shared via Samba (under the share name “data”) with the user “root”. The address is:

smb://<hostname>.bg.ic.ac.uk

Every user has space available under /data/users/<user_id>/ – these are created manually on demand. Datasets are stored under /data/datasets/. Please respect these conventions.
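
From a Linux machine the share can be checked quickly with smbclient before setting up any mounts:

# List the available shares, then browse the data share as root
smbclient -L //<hostname>.bg.ic.ac.uk -U root
smbclient //<hostname>.bg.ic.ac.uk/data -U root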

Mount Samba shares with CIFS CLI for Linux

This is a good solution when you want an accessible path to your data on the servers without having to set up a permanent mount with /etc/fstab, or when this is not possible on your system (e.g. you are not the admin).

mkdir ~/mountdir
sudo mount -t cifs //<hostname>/<path/to/share> ~/mountdir -o username=root,domain=WORKGROUP,password=password,noexec

Mount Samba shares automatically with /etc/fstab

These instructions have been tested on Ubuntu Linux only; I believe they are quite similar for other distros and Mac OS. First, create a folder in which to mount the shares, e.g. /media/<hostname>. Then edit /etc/fstab to add the following line:

//<hostname>.bg.ic.ac.uk/data /media/<hostname> cifs uid=user,credentials=/home/user/.smbcredentials,iocharset=utf8,sec=ntlm 0 0

To be able to log in, we need to create a credentials file at ~/.smbcredentials and add the following lines with the access details*:

username=root
domain=IC
password=password

For security reasons, restrict the permissions on this file: sudo chown root ~/.smbcredentials and sudo chmod 600 ~/.smbcredentials.

*For the password, please contact the member of the team that started your container.

SSH

The default password is “password” – instructions for changing this can be found in the readme.

To reverse SSH tunnel use the following command, which has options for going into the background, not executing commands, not allocating a pseudo-tty, and performing forwarding:

ssh -fNT -R <remote_port>:localhost:<host_port> <remote_address>

Then the remote can ssh back simply:

ssh <username>@localhost -p <remote_port>

Remote Desktop

The default password is “password” – instructions for changing this can be found in the readme.

Remote desktop is provided by an image running a TightVNC server. Once started you will be given a port to connect to with a VNC client (TurboVNC is recommended cross-platform). The environment keeps running even once the VNC client is closed – contact Kai if you are done with it and it can be removed.

To mount your Imperial network drive, check this service to find the link to your personal folder, and then enter the appropriate address in the address bar of the file manager. Adjust the link so that it looks something like the following (where <user> is your college username):

smb://<user>@icnas4.cc.ic.ac.uk/<user>

When asked for authentication enter “IC” as the domain and your college password.

CUDA

Note that the CUDA images run without the devices attached but require them attached for actual CUDA computation. For convenience on the servers the command is generally:

sudo docker run -it --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0 --device /dev/nvidia1 --device /dev/nvidia2 --device /dev/nvidia3 <image>

MATLAB

The cuda-matlab image required instantiating a cuda-vnc container, installing MATLAB through the GUI and then committing the container as a new image. MATLAB registers the “host ID” for activating the product, which requires starting any containers based on the cuda-matlab image with the appropriate MAC address (formerly 02:42:ac:11:00:13). Hence (including the data folder):

sudo docker run -dP --mac-address 02:42:ac:11:00:02 -v /mnt/glusterfs/:/data/:ro -v /mnt/glusterfs/users/<user>:/data/users/<user> cuda-matlab

For use with CUDA the NVIDIA device options must also be included.

There is a script in the home directory of each of the new servers to run this automatically, expecting the CRSID as the first argument (assuming the appropriate folder has already been created in /mnt/glusterfs/users):

./matdocker.sh <crsid>

Management

To stop all running containers use:

sudo docker stop $(sudo docker ps -a -q)

To remove all stopped containers use:

sudo docker rm $(sudo docker ps -a -q)

To remove all untagged images use:

sudo docker rmi $(sudo docker images | grep "<none>" | awk "{print \$3}")

Containers can be monitored via cAdvisor on port 8080. If it is not running it can be started using the following command (note the addition of `--restart=always`):

sudo docker run --restart=always -v /:/rootfs:ro -v /var/run:/var/run:rw -v /sys:/sys:ro -v /var/lib/docker/:/var/lib/docker:ro -p 8080:8080 -d --name cadvisor google/cadvisor:latest

Networking

Private LAN

The private network is set up on 10.0.0.x. Each server has a second network interface for this; run the following to find its name:

ifconfig -a

Then adjust /etc/network/interfaces to add (with the appropriate interface name and x for the final bit of the network address):

auto eth0
iface eth0 inet static
  address 10.0.0.x
  netmask 255.255.255.0

Add the private IPs (below) to /etc/hosts, and add the network “cerebro” to /etc/networks with address “10.0.0.0”.

Current machines on the network:

  • <switch>: 10.0.0.1(?)
  • bg-angel: 10.0.0.2
  • bg-beast: 10.0.0.3
  • bg-cyclops: 10.0.0.4

NB: In reality the /etc/hosts files have the public addresses, as the adapter on bg-cyclops is believed to be faulty and this causes problems with distributed software.
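
For reference, a sketch of the entries described above (per the NB, the real /etc/hosts entries use the public addresses rather than the 10.0.0.x ones):

# /etc/networks
cerebro 10.0.0.0

# /etc/hosts (private addresses shown for illustration)
10.0.0.2  bg-angel
10.0.0.3  bg-beast
10.0.0.4  bg-cyclops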

etcd

etcd is a distributed key-value store. A single cluster is run across the 3 servers on the private network, hence a static configuration can be used. The following command (see the docs for more details) must be run successfully on each server to start the cluster (the cluster expects all machines to be up initially), but henceforth the cluster will still run even with only one instance:

ADDR=http://`ifconfig eth0|grep -Po 't addr:\K[\d.]+'` && \
sudo docker run -d --net=host --restart=always --name etcd quay.io/coreos/etcd \
-name `hostname` \
-initial-advertise-peer-urls $ADDR:2380 \
-listen-peer-urls $ADDR:2380 \
-listen-client-urls $ADDR:2379,http://127.0.0.1:2379 \
-advertise-client-urls $ADDR:2379 \
-initial-cluster bg-angel=http://10.0.0.2:2380,bg-beast=http://10.0.0.3:2380,bg-cyclops=http://10.0.0.4:2380

The first part of the command extracts the private IP address (dependent on “eth0” being the correct network interface). The Docker containers can be restarted as necessary, as long as one machine is running. Note that the container must be bound to the host’s network interface, and will bind to ports 2379 for client traffic and 2380 for peer (etcd instance) traffic.

To run etcdctl, for example to query “cluster-health”, the following can be used:

sudo docker run --rm --net=host --entrypoint ./etcdctl quay.io/coreos/etcd <command>
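
For example, to check the health of the cluster and exercise the store with a throwaway key (etcd v2 etcdctl syntax):

sudo docker run --rm --net=host --entrypoint ./etcdctl quay.io/coreos/etcd cluster-health
sudo docker run --rm --net=host --entrypoint ./etcdctl quay.io/coreos/etcd set /test/hello world
sudo docker run --rm --net=host --entrypoint ./etcdctl quay.io/coreos/etcd get /test/hello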

Backups

Box

College users now have unlimited storage space in Box. There’s a web UI and a client for Windows and Mac to synchronise your files. However, Linux users don’t have a native client, nor do we have documentation on how to synchronise files without using the web UI. There are two workarounds (one GUI, one CLI/automatic):

CLI

The following is based on these instructions for davfs2 to mount a Box folder with /etc/fstab. Firstly, install davfs2. Then, make a mount point. Thirdly, disable locks (see link above for what to edit). Fourth, add your credentials to the secrets file (see link above for what to add). In order (excluding what needs to be edited in the file):

sudo apt-get install davfs2
mkdir ~/box
sudo nano /etc/davfs2/davfs2.conf
sudo nano /etc/davfs2/secrets

Finally, add the following (one) line to /etc/fstab, replacing <user> with your username:

https://dav.box.com/dav/  /home/<user>/box  davfs  defaults,_netdev,uid=<user>,gid=<user>  0  0
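
Once the fstab entry is in place, the share can be mounted without rebooting:

# davfs2 prompts for the WebDAV credentials unless they are in the secrets file
sudo mount /home/<user>/box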

GUI

You can use the nautilus file manager in Gnome to access your Box space by doing the following:

  1. Bring up Nautilus
  2. Select Files -> Connect to Server
  3. For Server Address enter: davs://username%40imperial.ac.uk@dav.box.com/dav
    Replace username with your IC Box username, which you can find in your Box Account Settings. The ‘%40’ is the character encoding for ‘@’ and you must leave it there exactly as shown.
  4. Click Connect and, when prompted, enter your external password for Enterprise Box. This should bring up a File Browser window showing you the files you have in your Box space.