Yijin’s Writings

Obsolete: Old Post on Uniswap V2

2021-02-15T18:00:00+00:00

Uniswap is now on v3 and beyond. This old post is now very obsolete, and thus removed to avoid misleading anyone looking for relevant info online!

Using fastai2 on PEER Hub ImageNet Challenge Tasks

2020-04-13T09:00:00+00:00

In my previous post, I described how I built a Singularity container with an editable fastai2 installation for use in the new iteration of their Deep Learning Part 1 course (aka ‘part1-v4’), which is currently on-going.

In this post I would like to share (and record for my own future reference!) my exploratory use of fastai2 on a dataset/challenge that is of interest in the built environment, which is an obvious area of focus for my company Arup.

The dataset is from the PEER Hub ImageNet (PHI) Challenge 2018 ¹ ², which is apparently the first image-based structural damage recognition competition, with a large image dataset (now called Φ-Net) relevant to the field of structural engineering. PEER designed a total of eight detection tasks ² to contribute to the establishment of automated vision-based structural health monitoring. Using fastai2, I explored a few of these tasks, and will use Task 1 as an example in this post.

Task 1 – Scene level – Three classes (pixel/object/structural):

Looking at the data

For Task 1, PEER provided 17424 labelled images, marked as one of the three classes – pixel/object/structural. I was not part of the original PHI Challenge back in 2018, and do not actually know how PEER provided the dataset at the time. The data that I have (courtesy of my colleagues’ entry in the PHI Challenge) is a set of numpy .npy files, which contain the bitmap RGB data with standard resolution 224 × 224px, and their respective class labels. The test set images are also available (along with the sample_submission.csv for the original challenge entry submission), but I do not have the corresponding labels (i.e. the ‘answers’) for the test set:

ylee@hpc01 Task1 ]$ ls
sample_submission.csv  X_test.npy  X_train.npy  y_train.npy

The first thing is obviously to have a look at the data. This means loading it using numpy, where the shape of (17424, 224, 224, 3) indicates that it should be 17424 images of 224px sides with three channels (RGB).

import numpy as np

data = np.load('X_train.npy')
y = np.load('y_train.npy')

print(data.shape, y.shape)

(17424, 224, 224, 3) (17424,)

We can confirm the image data by using PIL to create an image from the first item, and showing it inline in Jupyter Notebook:

from PIL import Image

im = Image.fromarray(data[0])
im

I prefer to have image data in the form of actual image files, as it makes it possible to easily look at the data just by using image viewers or thumbnail view in a file manager. PIL can be used to save the dataset back into bitmap .bmp files. I chose to output filenames in the format of num_pX.bmp, where num is the item number [0 to 17423] and X is the class ID (0 = pixel; 1 = object; 2 = structural).

for i in range(len(data)):
    fname = '%05d_p%s.bmp' % (i, y[i])
    im = Image.fromarray(data[i])
    im.save(fname)

To make it easier for data-loading in fastai2, I created three subfolders and just moved the images by class into the respective folders. Incidentally, this actually made it easier later on, when I started to put in my own ‘corrections’ to the PHI training data set labels. I also quickly checked how many images there are for each class.

ylee@hpc01 bmp ]$ mkdir p0
ylee@hpc01 bmp ]$ mv *_p0.bmp ./p0/
ylee@hpc01 bmp ]$ ls ./p0/ | wc -l
5879
ylee@hpc01 bmp ]$ mkdir p1
ylee@hpc01 bmp ]$ mv *_p1.bmp ./p1/
ylee@hpc01 bmp ]$ ls ./p1/ | wc -l
5713
ylee@hpc01 bmp ]$ mkdir p2
ylee@hpc01 bmp ]$ mv *_p2.bmp ./p2/
ylee@hpc01 bmp ]$ ls ./p2/ | wc -l
5832

Looks like the 17424 images were roughly evenly split into the three classes, i.e. no real need to worry about an imbalanced data set. Note that the above could have been done within Python in Jupyter Notebook, but I am in the shell terminal a lot anyway, so I just did it there.
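For reference, the same sorting and counting could be sketched in Python roughly as below. This is only an illustration using the standard library: the function name sort_by_class is made up for this sketch, and it assumes the num_pX.bmp filename format described above.

```python
import re
import shutil
from collections import Counter
from pathlib import Path


def sort_by_class(folder):
    """Move files named like 00012_p1.bmp into p0/, p1/, p2/ and count them.

    Hypothetical helper for illustration; assumes the num_pX.bmp naming scheme.
    """
    folder = Path(folder)
    pattern = re.compile(r"^\d+_(p\d)\.bmp$")
    counts = Counter()
    # sorted() materialises the listing first, since files are moved while looping.
    for f in sorted(folder.glob("*.bmp")):
        match = pattern.match(f.name)
        if match is None:
            continue  # leave unexpected filenames untouched
        class_dir = folder / match.group(1)
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(f), str(class_dir / f.name))
        counts[match.group(1)] += 1
    return dict(counts)
```

Calling sort_by_class on the folder holding the .bmp files would return a dict of per-class counts, which gives the same balance check as the ls | wc -l commands.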

Now that the data is in the form of .bmp files, with subfolders indicating their respective labels, it is straightforward to load into fastai2.

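# Assumed import (not shown in the original notebook excerpt); the pre-release package was named fastai2
from fastai2.vision.all import *

path = Path('/data/phi_challenge/task1/bmp')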

path.ls()

(#3) [Path('/data/phi_challenge/task1/bmp/p0'),
Path('/data/phi_challenge/task1/bmp/p1'),
Path('/data/phi_challenge/task1/bmp/p2')]

Using fastai2’s very convenient get_image_files function to get all the image filenames (in this case, they are .bmp files):

fns = get_image_files(path)
fns

(#17424) [Path('/data/phi_challenge/task1/bmp/p0/00000_p0.bmp'),
Path('/data/phi_challenge/task1/bmp/p0/00002_p0.bmp'),
Path('/data/phi_challenge/task1/bmp/p0/00006_p0.bmp'),
Path('/data/phi_challenge/task1/bmp/p0/00007_p0.bmp'),
Path('/data/phi_challenge/task1/bmp/p0/00009_p0.bmp'),
Path('/data/phi_challenge/task1/bmp/p0/00011_p0.bmp'),
Path('/data/phi_challenge/task1/bmp/p0/00012_p0.bmp'),
Path('/data/phi_challenge/task1/bmp/p0/00016_p0.bmp'),
Path('/data/phi_challenge/task1/bmp/p0/00017_p0.bmp'),
Path('/data/phi_challenge/task1/bmp/p0/00019_p0.bmp')...]

This is followed by another useful function, verify_images, to check for invalid image files. In this case, it returned zero items, i.e. all 17424 images were okay. That is expected, since they were written into image files by PIL previously!

failed = verify_images(fns)
failed

(#0) []

Then, I can create a fastai2 DataBlock with the labelled data set. For more information on the fastai2 DataBlock API, have a look at this great blog post from Zach Mueller.

phi1 = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2),
    get_y=parent_label,
    batch_tfms=aug_transforms())

dls = phi1.dataloaders(path)

The blocks for this data set are images (independent variable) and category (dependent variable) i.e. the label.
The data items can be obtained from the same get_image_files function used above.
I used RandomSplitter to create a validation set with 20% of randomly chosen training data.
The y (dependent) variable can be obtained from the subfolder name, i.e. ‘parent_label’ of the image filenames.
This is just for quick exploration, so I just used the fastai2 defaults for data augmentation, passing the transform definitions from aug_transforms to be applied onto the data batches.

After that, a DataLoaders object (wrapping PyTorch-style training and validation DataLoaders) is created from the path containing my data, using the DataBlock definition above.
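To make the random 20% validation split more concrete, here is a minimal standard-library sketch of a random hold-out split. It is not fastai2's actual implementation; the function name random_split and the seed value are assumptions for illustration only.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=None):
    """Return (train_idxs, valid_idxs) for a random hold-out split.

    Illustrative sketch only, not the library's own code.
    """
    idxs = list(range(n_items))
    # A seeded generator makes the shuffle (and thus the split) reproducible.
    random.Random(seed).shuffle(idxs)
    n_valid = int(n_items * valid_pct)
    return idxs[n_valid:], idxs[:n_valid]


# 17424 is the number of labelled Task 1 images mentioned in the post.
train_idxs, valid_idxs = random_split(17424, valid_pct=0.2, seed=42)
```

One design point worth noting: without a fixed seed, every run is judged on a different random validation set, so error rates from separate notebooks are not strictly comparable. As far as I know, RandomSplitter also accepts a seed argument for this purpose.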

Now I can do a quick visual check on the data, by showing a single batch of the images with their labels. The default batch size is 64, but I just need to see a few of the images to spot-check for any problem, so I asked for 16 in a batch to be shown.

dls.valid.show_batch(max_n=16)

Looks reasonable, with pixel-level labelled as p0 (e.g. a crack on a wall), object-level as p1 (only one item shown above, looks like part of a wall/column?), and structure-level as p2 (e.g. a whole house/building/bridge). There seems to be a p2 image that was wrongly rotated by 90°, but I’ll just leave it for now, unless it turns out to be a problem when looking at the trained model and its predictions and interpretations later.

Create and train model

From here, it is very easy to create a standard CV deep learning CNN learner, with the DataLoader (defined above) and a pretrained model (i.e. the now-ubiquitous “transfer learning” method). Here I used a pretrained ResNet34 model, asking for an additional KPI metric of error rate during training.

Then, I asked for the model to be trained and fine-tuned for five epochs, using the default fastai2 hyperparameters without thinking too much about it (since it’s just for exploring~).

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(5)

epoch	train_loss	valid_loss	error_rate	time
0	0.522718	0.320116	0.124569	00:32

epoch	train_loss	valid_loss	error_rate	time
0	0.320330	0.233091	0.093571	00:52
1	0.262438	0.225992	0.098163	00:49
2	0.210474	0.198712	0.080941	00:46
3	0.143000	0.205004	0.076636	00:45
4	0.102853	0.202703	0.073192	00:44

As shown above, with just the initial training of the ‘head’ of the ResNet34 model (with the original model parameters pretrained on ImageNet), it was already achieving an error rate of just 12.5%, which is not too shabby. Note that it is recommended and sensible to first try and establish a simple ‘baseline’ for sanity check and basic benchmark, but I did not do that here (sorry!).

During five more epochs of fine-tuning, we can see that both training loss and validation loss continue to decrease, i.e. the model is ‘learning’ successfully. The ever-reducing validation loss indicates that the model is not quite suffering from the dreaded ‘overfitting’ yet. At the end of a total of just six epochs of training, the model has an error rate (‘judged’ on the random 20% validation set) of 7.3%, or in other words, it is 92.7% accurate in differentiating between the three classes. That’s pretty good-going, with just ~5 minutes of training (albeit on an NVIDIA Tesla V100…)!

I can then show the confusion matrix to see where the errors are made:

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

The confusion matrix looks reasonable, in that the model did not ‘skip’ a level in mistaking p0 as p2 or vice versa. As an aside, this might mean that there will not be much benefit in trying an ‘ordinal regression’ approach for this classification exercise (though it might still be worth trying, something for another time, perhaps).
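The ‘skipped level’ check described above can be sketched in plain Python. The labels below are made up purely for illustration (they are not my actual predictions), and the helper names are hypothetical.

```python
from collections import Counter

CLASSES = ["p0", "p1", "p2"]


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Build a matrix where rows are actual classes and columns are predicted classes."""
    pairs = Counter(zip(actual, predicted))
    return [[pairs[(a, p)] for p in classes] for a in classes]


def skipped_level_errors(matrix):
    """Count mistakes between the first and last class (p0 predicted as p2, or p2 as p0)."""
    return matrix[0][-1] + matrix[-1][0]


# Made-up labels, purely for illustration.
actual = ["p0", "p0", "p1", "p1", "p2", "p2"]
predicted = ["p0", "p1", "p1", "p2", "p2", "p2"]
matrix = confusion_matrix(actual, predicted)
n_skipped = skipped_level_errors(matrix)
```

If the corner cells of the matrix are zero (or close to it), the model’s mistakes are only ever between neighbouring levels, which is the pattern I saw in the plot.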

In addition to confusion matrix, it is also useful to plot the images that gave the top losses, to see where/what the model was most inaccurate with:

interp.plot_top_losses(36)

This is where I cannot say that I agree with some of the labels in the data set… For example, in the grid above, the second from right image in the first row (shown again below) is labelled p1 (i.e. object-level), but it sure looks like a p0 (i.e. pixel-level) to my human eyes, in agreement with the trained model’s prediction!

p0, not p1?

For cases like this, with fastai2 it is possible to quickly ‘correct’ the data within Jupyter Notebook, taking and modifying some functions from the part1-v4 course notebooks, which use ipywidgets to provide a graphical UI for picking actions. I can create an ImageClassifierCleaner from the CNN learner, and display the UI to pick the correcting ‘actions’ for images of interest:

cleaner = ImageClassifierCleaner(learn)
cleaner

For data tracking purposes, I modified the ‘correction’ actions from the original part1-v4 notebook: instead of actually deleting unwanted training data images (with the unlink function), it renames the unwanted .bmp file to .deleted. This means that when I retrain a model with the cleaned data, the ‘deleted’ files will not be picked up by the get_image_files function (see above). Because of the way I formed the .bmp filenames when writing out the original .npy data into .bmp images, it is very easy to trace back the changes I made: which images have been discarded (i.e. renamed to .deleted), and which images have had their labels corrected (e.g. nnnnn_p1.bmp being moved into the p0 subfolder).

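import shutil

# Rename unwanted images to .deleted instead of unlinking them, so that
# get_image_files no longer picks them up but the change stays traceable.
for idx in cleaner.delete():
    fn = cleaner.fns[idx]
    shutil.move(str(fn), str(fn.with_suffix('.deleted')))
# Move relabelled images into the subfolder of their corrected class.
for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)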
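Since the original label stays in each filename, the corrections can be audited afterwards. A minimal sketch, assuming the p0/p1/p2 subfolder layout and the num_pX.bmp naming described earlier (audit_corrections is a hypothetical helper name, not something from fastai2):

```python
import re
from pathlib import Path


def audit_corrections(root):
    """List discarded images, and images whose folder differs from their original label.

    Hypothetical helper; assumes files like p0/00012_p1.bmp and p1/00003_p1.deleted.
    """
    root = Path(root)
    deleted = sorted(p.name for p in root.rglob("*.deleted"))
    relabelled = []
    for p in sorted(root.rglob("*.bmp")):
        match = re.search(r"_(p\d)\.bmp$", p.name)
        # The filename keeps the original label; the parent folder holds the current one.
        if match and match.group(1) != p.parent.name:
            relabelled.append((p.name, match.group(1), p.parent.name))
    return deleted, relabelled
```

Each entry in the second list is (filename, original label, current label), so a file 00002_p1.bmp sitting in the p0 folder shows up as a p1 to p0 correction.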

In a few rounds of quick training, plotting top losses, and running ImageClassifierCleaner, I ended up deleting a few images as shown below, where they look like strange computer screenshots instead of actual photos:

I also ‘corrected’ (by my interpretation) the labels for about 120 images, which is only ~0.7% of the data set, but I think it’s always useful to have more accurately labelled data, especially when it’s so easy to correct them within the notebooks!

Results and quick comparison

After these corrections, in about 30 minutes of training across four quick experimental notebooks, the best error rate I got was around 6.6% with a pretrained ResNet50 model, or a 93.4% accuracy. Looking at this pdf, the 2018 PHI Challenge winners achieved a test set accuracy of 95% for Task 1, using ensembles of trained models. The mean accuracy for Task 1 was 89%, so my numbers (caveat below) are between the mean and the winner (closer to the winner), which is not too bad : )

Some thoughts on these:

For the amount of work put into this quick exploration and training, I am quite happy with the accuracy of 93% vs. the winning 95% (back in 2018), though obviously note that a difference of ~18 months is aaaaaaaages in the Deep Learning world in terms of improvements in techniques, best practices, and results metrics!
To some, an accuracy difference of 2% might not sound like much, but actually, the winning entry in 2018 (5% error) is 2 percentage points better than my 7% error, i.e. it’s 2/7 × 100 = 29 percent better!
As I only have the validation-set accuracy (of 20% randomly chosen from the labelled data) and do not have the test-set ‘answers’, it is not really a like-for-like comparison with the PHI Challenge numbers, though I think it’s likely still indicative.
I have only used a single ResNet model, and I am not sure whether (or how much) ensembles of models can help with my numbers, noting that ensembles seemed to have significantly boosted the PHI Challenge 2018 winning entries, so it is definitely worth looking into.
I wonder if there are higher resolution images available for the same data set, and whether (or how much) that might help. I used 224 × 224px images that I had to hand, but there might be better quality, higher-resolution images available. It seems like the great people at PEER have now made their Φ-Net dataset available for download, but I have not downloaded or looked at it yet – maybe this is the same 224px data that I already have.
fastai2 has made it very easy for me to load in the data, explore the data visually, create CNN models with pretrained architectures, interpret training results, and even quickly correct mislabelled (I think) data for retraining. As always, kudos to the great people at fast.ai, including the vibrant user community there.
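The ‘29 percent better’ figure in the thoughts above is a relative reduction in error rate, which can be written out as a small sketch (the function name is just for illustration, and the rounded 7% and 5% error rates are the ones quoted above):

```python
def relative_error_reduction(my_error, their_error):
    """Fraction by which their error rate is lower than mine, relative to mine."""
    return (my_error - their_error) / my_error


# Rounded error rates quoted in the post: 7% (mine) vs. 5% (2018 winner).
reduction_pct = relative_error_reduction(0.07, 0.05) * 100
```

Rounded to the nearest whole number, reduction_pct gives the 29 percent mentioned above, even though the gap is only 2 percentage points in absolute terms.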

Citations:

1. Gao, Y. and Mosalam, K. M. (2019). PEER Hub ImageNet (Φ-Net): A large-scale multi-attribute benchmark dataset of structural images. PEER Report No. 2019/07, Pacific Earthquake Engineering Research Center, University of California, Berkeley, CA.
2. Gao, Y. and Mosalam, K. M. (2018). Deep transfer learning for image-based structural damage recognition. Computer-Aided Civil and Infrastructure Engineering, 33(9), 748-768.

Bufferbloat and Mikrotik Router

2020-03-29T16:00:00+00:00

Due to covid-19 social (distancing) responsibility, a lot of people are now having to work from home (WFH). For me, this means having to connect to the company VPN for file access, remote-desktop for data visualisation on HPC, and online-only communications including frequent video calls and screen-sharing.

With a sudden spike in network traffic (from everyone WFH!), company network bandwidth can obviously become a bottleneck. However, besides bandwidth, network latency (a.k.a. ‘lag’ – which reminds me of Counter Strike Beta 6.0…) can also be a problem, e.g. for remote-desktop and screen-sharing.

From my recent WFH network activities, and partly by pure chance, I happened to stumble into (or at least to the entrance of…) the rabbit hole of home networking tweaks, including learning about something called Bufferbloat that affects latency, and how to mitigate it.

Bufferbloat

The Bufferbloat Project explains Bufferbloat as “the undesirable latency that comes from a router or other network equipment buffering too much data.” Their wiki suggests a simple measure of Bufferbloat using DSLReports Speed Test. When I ran the speed test, I got results that looked like the following:

My broadband package is 50Mbit/s down, 5Mbit/s up, and so it looks like I am getting my money’s worth (extra ~10% down-link bandwidth!), but the latency seems affected by this Bufferbloat problem, where I observed occasional latencies of >200ms during the speed test:

Their suggested solution is to use Smart Queue Management (SQM) in your router, but unfortunately my router (like most, I suspect) does not have SQM… However, they did say that QoS (which is more widely available in routers) can help, even though it will not solve Bufferbloat completely. And so I can still give it a go~
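To get a feel for why an over-full buffer adds so much latency: a packet arriving at the back of a queue has to wait for everything ahead of it to drain at the link rate, so the added delay is roughly the queued bits divided by the link speed. A back-of-the-envelope sketch, where the 128 KiB buffer size is purely a hypothetical number (I do not know my router’s actual buffer sizes):

```python
def queue_delay_ms(buffered_bytes, link_mbit_per_s):
    """Time in milliseconds to drain a full buffer at the given link rate."""
    bits = buffered_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000) * 1000


# Hypothetical 128 KiB buffer, drained over the 5 Mbit/s up-link and 50 Mbit/s down-link.
HYPOTHETICAL_BUFFER_BYTES = 128 * 1024
uplink_delay = queue_delay_ms(HYPOTHETICAL_BUFFER_BYTES, 5)
downlink_delay = queue_delay_ms(HYPOTHETICAL_BUFFER_BYTES, 50)
```

With these assumed numbers, the slower up-link takes ten times longer to drain the same buffer, which is one reason a saturated upload can hurt latency so noticeably.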

With my newly setup Mikrotik router (more on that below), a bit more Googling brought me to this nice page, where they have a simple Mikrotik QoS config tool. My internet connection details are simple enough, with just a single WAN internet connection to Mikrotik’s ether1 port, and a single bridge (for ethernet and WiFi) on the LAN interface side:

Up-link and down-link speeds seem to follow my broadband package specs (see speed test above), and so I can just use 5M and 50M on the config tool webpage:

I left the rest of the settings alone, and just downloaded the resulting script. To be safe, I had a quick look at it in a text editor, and then followed the instructions to import the config script into Mikrotik Winbox. I also had to go into IP - Firewall - Filter Rules, and removed the ‘FastTrack’ rule so that the new QoS config settings will apply, instead of being bypassed by ‘FastTrack’.

The generated config seems to use a queue type called PCQ, even though another site about lag/latency mentioned the use of a different queue type called SFQ for Bufferbloat (in the absence of the preferred SQM method). But, what the hell, they all don’t make much sense to me anyways, so I’ll just try the config I’ve got!

Then, the moment of truth. I re-ran the DSLReports Speed Test, and now it’s saying that the Bufferbloat problem is gone, with the rating improving from B to A+, though it looks like the QoS bandwidth limits have caused the speeds to reduce slightly, as a trade-off:

I have since tested a few different speed limit (approx. ±10%) settings in the QoS config, but in the end still settled on the ‘rated speeds’ for my broadband package. I also tweaked the QoS service list and protocol/port settings, to change the priorities for my own use cases, while also adding new services such as Microsoft Teams to higher priority, for video meetings etc.

Anecdotally, it seems like the overall network performance has improved with the configured Mikrotik router, and video calls on various software seem to perform well, with less ‘lag’ than before. I guess the best solution to prevent Bufferbloat is still to use a router that supports SQM e.g. via OpenWrt firmware, but the routers can be expensive, and even the more affordable ones look like they will have trade-offs in other features in an all-in-one WiFi router. Overall, I am happy to stick with the new Mikrotik, which brings us to how I started using it in the first place…

Mikrotik for home use

Even before the extra WFH traffic, my old Netgear WiFi router was already starting to act up, with its 5GHz WiFi occasionally dropping for no reason. I did a bit of research, and read that Mikrotik gear (frequently used by SOHOs) can be a cost-effective, high-performance home router. A bit more Googling led me to the Mikrotik hAP ac², which seems to fit my requirements:

Router and WiFi access point all-in-one
Dual-Concurrent 2.4/5GHz AP, supporting up to 802.11ac WiFi
Five Gigabit ethernet ports for WAN and LAN-wired devices
Small unit, with no crazy antennas, but enough coverage for a small place
Relatively cheap for its features, c.£65 on Amazon

I found one for <£60 from an eBay seller, and went for it. Then came the fun(?) of setting it up for home networking use!

Unlike normal retail WiFi routers, this required a bit more work. For the initial setup, I pretty much just followed this great guide (thanks, Murray!). As I bought my Mikrotik from eBay, before doing anything, I did a full reset of the router. I did not have to do the “Make You Old Router Into a Modem” step, because my fibre-optic ISP-supplied router (WiFi functionality already disabled from before) can just be connected directly to the Mikrotik ethernet port 1 as the WAN connection. I did not hear any beeps (mentioned in step 2) when I powered up the Mikrotik, but I guess that differs from model to model.

Following step 3:

I used Winbox’s Quick Set to setup the local network (step 3a) and system password (3b).
For WiFi (3c), I only setup 5GHz WiFi, and disabled the 2.4GHz WiFi because none of the devices at home will need it.
I also checked the 5GHz WiFi frequency ranges occupied by my neighbours, and picked a freq. range that appeared free.
I double-checked that WPS is completely disabled, because I do not plan to use it, and it can be a bit dodgy.
I skipped step 3d (“Internet”) because my fibre-optic broadband is already connected as WAN.
Step 3e (“Updates”) showed that the Mikrotik I bought was already up-to-date in its firmware and packages.
I will circle back to the setup of Guest WiFi (3f) later.
And I skipped step 3g because I do not plan to have a VPN server running at home (plus my broadband is without static IP or port-forwarding…)

Then, I had a look at the various things mentioned in step 4. The interfaces all looked okay, and the 5GHz WiFi signal looked good even in the room furthest from the Mikrotik, which is amazing for such a small box (compared to the much bigger old Netgear!). When looking at the DHCP server, I also setup new static IPs for NAS and desktop PC (sometimes used via RDP). For DNS servers, I ran the super informative DNS benchmark tool from GRC, which confirmed that CloudFlare’s 1.1.1.1 (primary) and 1.0.0.1 (secondary) servers are my best bet, by far. The guide also recommended turning off unused IP Services to reduce attack surface, and this is where I referred to further steps (besides keeping things up-to-date) on securing the Mikrotik router, as there have been major vulnerabilities before, though mainly affecting out-of-date firmware.

I did not actually do much more for steps 5 and 6, as the default firewall rules looked okay for my use, and I am not using IPv6 for my home network. In the last section just before his Conclusion, Murray’s guide mentioned Queues and QoS. The tip given is to not use fifo queues, but to use sfq or pcq queues to prevent Bufferbloat, though without much detail. None of these meant anything to me at all, at the time..! But I looked into it further, and I think I got something useful out of it.

Isolated Guest WiFi setup

Next, I circled back to step 3f of this guide, to setup an isolated guest WiFi that sits on a different subnet IP range and is prevented from accessing the LAN devices on my network. There was only a brief list of basic descriptions in the guide, and so I found another more detailed one here (thanks, Marthur!) to follow.

Marthur’s steps are pretty clear, and the setup of Virtual WiFi AP, VLAN, firewall rules, etc. were straightforward enough. The only extra thing I had to do was to create a new ‘interface list’ to include both my original LAN Bridge and the new Guest WiFi Bridge, and then modify the Firewall Mangle rules (from the QoS config script; see above) so that the QoS rules are now applied to all traffic (including to/from the new Guest WiFi). Then I just quickly hopped on to the Guest WiFi, confirmed that it has internet connectivity but no access to LAN devices, and then double-checked on Speed Test that it still honoured the QoS settings and did not cause Bufferbloat. And that’s the Guest WiFi done~!
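The ‘internet but no LAN’ check can also be scripted from a device on the Guest WiFi, instead of poking around by hand. Below is a minimal sketch using a plain TCP connection attempt; the function name can_connect is made up, and any hosts and ports used with it (e.g. a NAS address on the main LAN) would be placeholders for your own setup.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

From the guest network, a call such as can_connect("1.1.1.1", 443) should come back True, while the same call against the IP and port of a LAN device should come back False if the isolation rules are doing their job. Note that this only tests TCP reachability on the chosen port, so it is a spot-check rather than proof of full isolation.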

Mikrotik automatic backup and update

Finally, wary of potential vulnerabilities if firmware/packages go out-of-date, I found and followed this (thanks, /u/beeyev!) to setup automatic backup and update for the Mikrotik router. A quick look at the script did not throw up any obvious red flags, so I just imported it into Winbox and set it up according to the clearly commented instructions. I enabled the setting to install only patch minor version updates, and also setup auto-email whenever the script runs (scheduled for every two days). The email feature needs an SMTP server, so I just followed the recommendation and used the excellent free service on smtp2go. One small note: in Mikrotik’s Email settings, for “Start TLS” the tls_only option did not work for me, so I chose yes instead, and it all seems to work fine:

That’s all for this post. Time to get back to WFH with strange working hours…

fastai2 in Singularity Container

2020-03-26T18:00:00+00:00

The awesome people at fast.ai started the 2020 iteration (aka ‘part1-v4’) of their wildly popular Deep Learning Part I course earlier this month, running it entirely online because of covid-19 social (distancing) responsibility.

The course is using the brand new fastai v2 library (fastai2, currently in pre-release) along with PyTorch, and makes a start in covering the content of their upcoming book.

Installation of the fastai v2 library can be pretty straightforward using conda and pip. It is also well-supported on various cloud GPU platforms such as Paperspace and Colab. However, as with many other cutting-edge deep learning software stacks that typically involve quite frequent updates and changes (for bugfixes, performance enhancements, etc.), it can be a challenge to have everything setup in a multi-user HPC environment, without the risk of affecting other users’ software packages needed for production work.

Containerisation technology presents a possible solution to these challenges, by enabling self-contained (hah!) containers that can be built and deployed with all the internally consistent dependencies, without affecting other parts of the host system or other containers. Docker is arguably the most well-known container system right now, but it might not necessarily be the best for a multi-user HPC environment used for projects and production —instead of experimentation— work, as it can be difficult to setup and ensure the correct user/group permissions in the host system are replicated and honoured in Docker containers. There also seems to be potential risk of undesired privilege escalation to root access due to the way that the Docker daemon works, which is again a problem for multi-user production HPC.

My quick search showed that a different container system, Singularity, might be better-suited for my use case above. The article here helpfully describes some of the problems in Docker defaults that can be solved by Singularity. Even though I do not have sudo permission on the multi-user HPC, I am able to build Singularity containers with fastai2 on a different machine (where I have sudo), e.g. a cheap and cheerful small cloud instance. And when I (and/or others) run the container on the HPC, it will natively support NVIDIA’s CUDA GPU compute for deep learning, honour user/group permissions and filesystem access on the HPC, and will not break or interfere with other software stacks (e.g. finite element analysis with MPI, and GPU-enabled CFD with a different CUDA version) on the HPC used by other users. This gives me the flexibility of being able to experiment and tinker with the latest development version of fastai2 (or other deep learning packages) without having sudo on the HPC, prepare and share Singularity containers that have functioning fastai2 installations, while retaining the rigidity and stability needed for existing software with potentially conflicting dependencies and project-based user security permissions on the HPC.

I have not been experimenting with and using Singularity containers for very long yet, but I will try to describe the steps I took to build the Singularity container with an editable install (i.e. linked to an update-able Git repository) of fastai2.

Installing Singularity

Firstly, Singularity will need to be installed by the sysadmin on the HPC by just following the installation guide. If a separate machine/instance is used to build the Singularity containers (like in my case), then Singularity needs to be installed there too, and root permission is needed for the container-build.

Creating Singularity def file

Next, a Singularity definition file (similar to Docker’s Dockerfile) is created, to have all the steps needed to build the container with the software (fastai2 in this example) and its dependencies (e.g. fastai v1 library, fastcore, etc.), plus any ancillaries (e.g. Jupyter Notebook).

Singularity containers can be bootstrapped from Docker images (which are more popular and widely available), and so in the def file I start with NVIDIA’s own Docker image containing CUDA:

BootStrap: docker
From: nvidia/cuda

Then, define the environment variables that will be set at runtime (i.e. when the container is used):

%environment
    export LANG=C.UTF-8
    export PATH=$PATH:/opt/conda/bin
    export PYTHON_VERSION=3.7
    export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64

The next bit contains the steps that will be used to install fastai2 and its dependencies, within the %post section of the def file. Again, start by defining the same environment variables, which are used also at build-time (as opposed to runtime, mentioned above):

%post
    export LANG=C.UTF-8
    export PATH=$PATH:/opt/conda/bin
    export PYTHON_VERSION=3.7
    export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64

Then, install the software and tools needed to setup fastai2 later on. The default OS in NVIDIA’s CUDA Docker image is Ubuntu, and so apt-get is used for this step. I also update pip, and install miniconda, as conda will be used in the next step.

    apt-get -y update
    apt-get -y install --no-install-recommends build-essential ca-certificates \
            git vim zip unzip curl python3-pip python3-setuptools graphviz
    apt-get clean
    rm -rf /var/lib/apt/lists/*

    pip3 install --upgrade pip

    curl -o ~/miniconda.sh -O \
      https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
      && chmod +x ~/miniconda.sh \
      && ~/miniconda.sh -b -p /opt/conda \
      && rm ~/miniconda.sh \
      && conda install conda-build

Next, go ahead and use conda to install fastai v1 library, and while we are at it, also install Jupyter Notebook and its extensions:

    conda update conda && conda install -c pytorch -c fastai fastai \
      && conda install jupyter notebook \
      && conda install -c conda-forge jupyter_contrib_nbextensions

As I am going to do an editable pip install of both fastai2 and the fastcore dependency, I git clone the two repositories. Note that they are cloned into a shared filepath that exists on the HPC host system, so that I can choose to git pull update the repositories on the HPC host in future, and all the user(s) running the fastai2 Singularity container will automatically pick up the latest updates on the editable install:

    mkdir -p /data/shared
    cd /data/shared && git clone https://github.com/fastai/fastai2 \
    && git clone https://github.com/fastai/fastcore

Then, run the editable pip installs, as recommended currently by fastai as “probably the best approach at the moment, since fastai v2 is under heavy development” still:

    cd /data/shared/fastcore && python3.7 -m pip install -e ".[dev]"
    cd /data/shared/fastai2 && python3.7 -m pip install -e ".[dev]"

As a final setup step, install some other libraries and packages used in the part1-v4 fastai course:

    conda install pyarrow
    python3.7 -m pip install graphviz ipywidgets matplotlib "nbdev>=0.2.12" \
        pandas scikit_learn azure-cognitiveservices-search-imagesearch sentencepiece

With that, all the necessary installs and setup should be in place. I then add the ‘start script’ that will be executed when the Singularity container is started. In this case:

- Start the Jupyter Notebook server
- Make it accessible to other computers/IPs (firewalled to the internal network only, in our case)
- Have the server listen on a non-default port of 9999 (the Jupyter default is 8888)
- Give it a password hash for access (in this case, the hash corresponds to the password fastai)
- Make it start in the shared filepath on the HPC host system where I cloned the fastai2 and fastcore repositories. This is also where I keep other shared files (e.g. the part1-v4 course material)

%startscript
    jupyter notebook --ip=0.0.0.0 --port=9999 --no-browser \
        --NotebookApp.password='sha1:a60ff295d0b9:506732d050d4f50bfac9b6d6f37ea6b86348f4ed' \
        --log-level=WARN --notebook-dir=/data/shared/ &
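The password value above has the form algorithm:salt:digest. As a minimal sketch of how such a string can be produced and checked, assuming the classic Notebook layout where the digest is taken over the passphrase followed by the salt, the standard library is enough. The function names below are my own, purely for illustration; in practice the `passwd` helper in `notebook.auth` is the tool to use.

```python
import hashlib
import secrets


def make_password_hash(passphrase: str, algorithm: str = "sha1") -> str:
    """Build an 'algorithm:salt:digest' string (assumed classic Notebook layout)."""
    salt = secrets.token_hex(6)  # 6 random bytes -> 12 hex characters
    digest = hashlib.new(algorithm, passphrase.encode("utf-8") + salt.encode("ascii"))
    return f"{algorithm}:{salt}:{digest.hexdigest()}"


def check_password(hashed: str, passphrase: str) -> bool:
    """Recompute the digest with the stored salt and compare it to the stored one."""
    try:
        algorithm, salt, expected = hashed.split(":", 2)
    except ValueError:
        return False  # not in 'algorithm:salt:digest' form
    digest = hashlib.new(algorithm, passphrase.encode("utf-8") + salt.encode("ascii"))
    return secrets.compare_digest(digest.hexdigest(), expected)
```

Because the salt is random, hashing the same password twice gives two different strings, and both will validate against that password.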

Finish the def file by adding some basic labels and a description:

%labels
    ABOUT container for fastai2 (dev editable install) with jupyter notebook on startup (port 9999), for March 2020 fastai course
    AUTHOR Yijin Lee

The complete example def file explained above can be found here.

Building the Singularity container

With the def file, I can now build the Singularity container and get the resulting sif file. Building needs sudo or root permission, so I used a cheap AWS instance (t2.small) instead of the HPC environment (where I only have basic user permissions). My AWS instance has limited space on its / root device, so I set an environment variable to make Singularity use a directory on a separate AWS block storage device as its temp directory (otherwise the build will fail):

root@aws-t2:~# export TMPDIR=/blockdevice/tmp
root@aws-t2:~# ls
fastai2.def
root@aws-t2:~# singularity build fastai2.sif fastai2.def
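Since the build fails when the temp directory runs out of room, a quick free-space check before starting can save a wasted build. The following is only a sketch: `has_free_space` is a hypothetical helper name, and any threshold passed to it is a guess rather than a measured requirement.

```python
import shutil


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024**3


# Example (illustrative threshold only):
# has_free_space("/blockdevice/tmp", 15)
```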

The build creates the requested sif file. It is quite big, at around 5.0GB, but I only need to build and transfer it once, since it contains an editable (and thus updatable) install of fastai2:

root@aws-t2:~# ls -lh
-rw-r--r-- 1 root   root   1.9K Mar 25 12:00 fastai2.def
-rwxr-xr-x 1 root   root   5.0G Mar 25 12:30 fastai2.sif

The sif file can then be copied/transferred to the HPC environment for actual use.
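For a file of this size, it can be worth comparing a checksum on both machines after the transfer. This is an optional extra rather than part of my original workflow; a minimal sketch using only the standard library, with `sha256_of_file` as an illustrative name:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a multi-GB sif never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running it on the build machine and again on the HPC should give the same hex string if the copy is intact.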

Running the Singularity container

As I want to use an NVIDIA GPU for deep learning compute, the HPC where I run the fastai2 Singularity container needs to have the correct NVIDIA GPU drivers installed (by the sysadmin). Note that the only hard requirement is the drivers: CUDA and the other dependencies are already self-contained in our Singularity sif file, all with the correct versions. I can check the NVIDIA GPU status by running nvidia-smi:

[ylee@hpc01 shared]$ nvidia-smi
Thu Mar 25 13:00:00 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   53C    P0    30W / 250W |     14MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2468      G   Xorg                                          14MiB |
+-----------------------------------------------------------------------------+
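The same check can be scripted. Here is a rough sketch that asks nvidia-smi for a few fields in CSV form and parses them; the helper names are my own, and the function simply returns an empty list on a machine without the driver tools.

```python
import shutil
import subprocess
from typing import Dict, List


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Parse 'name, driver_version, memory.total' CSV rows (no header line)."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # rsplit keeps the name intact even if it happens to contain a comma.
        name, driver, memory = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "driver": driver, "memory_total": memory})
    return gpus


def query_gpus() -> List[Dict[str, str]]:
    """Return GPU info from nvidia-smi, or an empty list if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []
```

An empty result from `query_gpus()` is a hint that the driver is missing on the host, which is the one dependency the container cannot supply.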

I will start the Singularity container from the shared filepath where the necessary files (e.g. fastai2 and fastcore repositories, part1-v4 course material, etc.) reside, as mentioned above. In my case, this is /data/shared, and my sif file is in /data/shared/singularity:

[ylee@hpc01 shared]$ pwd
/data/shared
[ylee@hpc01 shared]$ singularity instance start --nv ./singularity/fastai2.sif fastai2
INFO:    instance started successfully
[ylee@hpc01 shared]$ singularity instance list
INSTANCE NAME    PID      IMAGE
fastai2          13579    /data/shared/singularity/fastai2.sif

The --nv flag above enables NVIDIA GPU support inside the container.

Because of the ‘startscript’ defined in the def file, there should now be a Jupyter Notebook server running and listening on port 9999:

[ylee@hpc01 shared]$ netstat -plunt
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9999            0.0.0.0:*               LISTEN      13579/python

I can thus point a web browser to the IP at port 9999, and enter the password (defined as fastai in the hash within our def file) to access Jupyter Notebook.
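From another machine on the network, where netstat on the HPC is not an option, a plain TCP connection attempt gives the same answer. A small sketch, with `port_is_open` as a hypothetical helper name:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling it with the HPC's address and port 9999 should return True once the instance has started. Note that this only shows something is listening, not that it is Jupyter.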

I can also run a shell within the Singularity container instance, to start interactive Python directly, without going via Jupyter Notebook:

[ylee@hpc01 shared]$ singularity shell instance://fastai2
Singularity fastai2.sif:/data/shared> python3.7
Python 3.7.6 (default, Jan  8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

From within Python, I can also quickly confirm that fastai2 is indeed installed, and CUDA compute is available for PyTorch:

>>> from fastai2.vision.all import *
>>> torch.cuda.is_available()
True

And, because the container has an editable pip install of fastai2 residing on the HPC host system, I can git pull or git checkout to a specific fastai2 commit from the HPC, and all users of the Singularity container will then ‘get’ the corresponding fastai2 version. For example, starting with a slightly older version (0.0.14):

>>> import fastai2
>>> fastai2.__version__
...
'0.0.14'
>>> exit()

I can exit from the Singularity instance shell to get back to the HPC host system, while leaving the container still running. I then change the fastai2 version (e.g. update to the latest via git pull), and the change will be ‘live’ back in the Singularity instance shell.

Singularity fastai2.sif:/data/shared> exit
exit
[ylee@hpc01 shared]$ cd fastai2
[ylee@hpc01 fastai2]$ git pull
.
.
.
[ylee@hpc01 fastai2]$ cd ..
[ylee@hpc01 shared]$ singularity shell instance://fastai2
Singularity fastai2.sif:/data/shared> python3.7
Python 3.7.6 (default, Jan  8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastai2
>>> fastai2.__version__
'0.0.16'
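Besides the version string, another way to confirm that Python is really picking up the editable install is to look at where the module is loaded from. A minimal sketch, with helper names of my own choosing; it assumes the module has a `__file__`, which holds for ordinary packages:

```python
import importlib
from pathlib import Path


def module_location(name: str) -> Path:
    """Return the resolved file path a module is imported from."""
    module = importlib.import_module(name)
    return Path(module.__file__).resolve()


def is_installed_under(name: str, root: str) -> bool:
    """True if module `name` is loaded from somewhere beneath `root`."""
    return Path(root).resolve() in module_location(name).parents
```

Inside the container, I would expect `is_installed_under("fastai2", "/data/shared")` to be True for the editable install described above, and False if a regular site-packages copy were shadowing it.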

All the shared filesystem files (e.g. ipynb notebooks) can be accessed from within the container, retaining the original user/group permissions, without any extra Singularity configuration. When done, just stop the running container:

[ylee@hpc01 shared]$ singularity instance stop fastai2
Killing fastai2 instance of /data/shared/singularity/fastai2.sif (PID=13579) (Timeout)

Summary

Without needing sudo or root permission on a production HPC cluster, I can define and build a Singularity container on a separate cheap cloud instance (where root is available), and that container can hold a pip editable install of fastai2.

The resulting sif file can be used on the HPC cluster with native access to GPU CUDA compute. It retains user/group permissions in the multi-user HPC environment and carries all the necessary software stack dependencies (except the NVIDIA GPU driver, which must be present on the HPC host system), without interfering with other software stacks or environments on the HPC host system.

The editable install residing on the HPC host filesystem means that I can easily upgrade/change the version of fastai2 via git, and users of the Singularity container get the corresponding version changes ‘live’. This strikes a balance: there is flexibility to experiment with software stacks in a multi-user production HPC environment with native user/group permissions, with less risk of messing things up for everyone (e.g. via the undesired root privilege escalation that can happen in Docker). It also means that other users can all (re)use the same container with the same versions of the software stack, e.g. for a fastai study group.

My Singularity example def file explained above can be found here. And please do join us for lively discussions on the fastai forums.

Coding Post

2020-01-23T19:07:00+00:00

A plug for the JavaScript API in the Oasys LS-DYNA Environment.

var m = new Model();
var n = Node.First(m); 
m.UpdateGraphics();

Hello, Dunia

2020-01-23T19:00:00+00:00

Just testing out Jekyll for blog : )

Check out the Jekyll docs for more info on how to get the most out of Jekyll. File all bugs/feature requests at Jekyll’s GitHub repo. If you have questions, you can ask them on Jekyll Talk.