Running Powerflow on Ubuntu with SLURM and Infiniband

This is a walkthrough of my work on running a proprietary computational fluid dynamics code on the snap version of SLURM over Infiniband. This time, I’ll take you through what it takes to get Powerflow to run on Ubuntu 18.04. If you would like to try the same thing with STARCCM+, here is a link to a post that takes you through that.

You can use this to perform scaling studies, track down issues and optimize performance, or use it however you like. Much of this will work on other OSes too.

This is the workbench used here:

Hardware: 2 hosts with 2×20 cores and 187 GB RAM.
Infiniband: Mellanox MT28908 Family [ConnectX-6]
OS: Linux 4.15.0-109-generic (x86_64) Ubuntu18.04.4
SLURM 20.04 (https://snapcraft.io/slurm)
OpenMPI: 4.0.4 (ucx, openib)
Powerflow: 6.2019
A Reference model which is small enough for your computers and large enough to run over 2 nodes on your available cores.

I use Juju to deploy my SLURM clusters in any cloud to get up and running. In this case, I use “MAAS” as the underlying cloud, but this would work on other clouds as well.

Let’s get started.

Modify ulimits on all nodes.

This is done by editing /etc/security/limits.d/30-slurm.conf

* soft nofile  65000
* hard nofile  65000
* soft memlock unlimited
* hard memlock unlimited
* soft stack unlimited
* hard stack unlimited

Modify the slurmd systemd unit file to make the ulimits permanent for the slurmd processes.

$ sudo systemctl edit snap.slurm.slurmd.service

[Service]
LimitNOFILE=131072
LimitMEMLOCK=infinity
LimitSTACK=infinity

* Restart slurm on all nodes.

$ sudo systemctl restart snap.slurm.slurmd.service

* Make sure the login nodes have correct ulimits after a login.

* Validate that all worker nodes also have correct ulimit values when using SLURM. For example:

$ srun -N 1 --pty bash
$ ulimit -a

You must have consistent ulimit settings everywhere or things will go sideways. Remember that SLURM propagates ulimits from the submitting node, so make sure those are consistent there too.
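As a quick sanity check (a sketch, assuming a two-node “debug” partition as used in the job script later, and that the snap’s daemon shows up as a process named slurmd), you can verify both the daemon limits and what job steps actually see:

# Limits applied to the running slurmd daemon
$ grep -E 'open files|locked memory|stack size' /proc/$(pidof slurmd)/limits

# Limits as seen from inside a job step on every node
$ srun -N 2 --ntasks-per-node=1 -p debug bash -c 'hostname; ulimit -n -l -s'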

I’m going to assume you have installed Powerflow on all your nodes at /software/powerflow/6.2019, but you can put it wherever you like.

Powerflow also needs csh, so install it:

sudo apt install csh

Modify the Powerflow installation

Since Powerflow doesn’t yet support Ubuntu (which is a shame), we need to work around a few small bugs to get our simulation running the way we want.

Workaround #1 – Incorrect awk path

Powerflow assumes that a few OS commands are located at fixed paths on all OSes, which is of course a bug. The bug is in the file “/software/powerflow/6.2019/dist/generic/scripts/exawatcher” and causes problems.

To fix this, you can either edit the exawatcher script and comment out the hard-coded paths:

#set awk=/bin/awk
#set cp=/bin/cp
#set date=/bin/date
#set rm=/bin/rm
#set sleep=/bin/sleep

… or, as an ugly alternative, create a symlink for “awk”, which is enough to work around the bug. Hopefully this will be fixed in future versions of Powerflow.

sudo ln -s /usr/bin/awk /bin/awk

This is not needed on OSes such as CentOS 6 and CentOS 7, which already have those symlinks in place.
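If you go the symlink route, you will want it on every compute node. A minimal sketch (the host names here are made up, and it assumes passwordless ssh and sudo on the nodes):

$ for host in node1 node2; do ssh $host sudo ln -sf /usr/bin/awk /bin/awk; done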

Workaround #2 – bash is not sh

Powerflow has an incorrect script header, referencing “#!/bin/sh” for code that is in fact bash, which results in a syntax error on Ubuntu (where /bin/sh is dash).

Replace the #!/bin/sh header with #!/bin/bash in the file: /software/powerflow/6.2019/dist/generic/server/pf_sim_cp.select
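A one-liner for that edit could look like this (just a sketch; adjust the path to wherever your installation lives):

$ sudo sed -i '1s|^#!/bin/sh|#!/bin/bash|' /software/powerflow/6.2019/dist/generic/server/pf_sim_cp.select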

This is really all it takes. It’s time to run Powerflow through SLURM.

Time to write the job-script

#!/bin/bash
#SBATCH -J powerflow_ubuntu
#SBATCH -A erik_lonroth
#SBATCH -e slurm_errors.%J.log
#SBATCH -o slurm_output.%J.log
#SBATCH -N 2
#SBATCH --ntasks-per-node=40
#SBATCH --exclusive
#SBATCH --partition debug

LC_ALL="en_US.utf8"
RELEASE="6.2019"
hosttype="x86_64-unknown-linux"
INSTALLPATH="/software/powerflow/${RELEASE}"

export PATH="$INSTALLPATH/bin:$PATH"
export LD_LIBRARY_PATH="$INSTALLPATH/dist/x86_linux/lib:$INSTALLPATH/dist/x86_linux/lib64"

# Set a low number of timesteps since we are only here to test
NUM_TIMESTEPS=100

export EXACORP_LICENSE=27007@license.server.com


exaqsub \
-decompose \
-infiniband \
-num_timesteps $NUM_TIMESTEPS \
-foreground \
--slurm \
-simulate \
-nprocs $(expr $SLURM_NPROCS - 1) \
--mme_checkpoint_at_end \
*.cdi

You will probably need to modify the above script for your own environment, but the general pieces are in there. An important note here is that Powerflow normally needs a separate node, with more memory than the “Simulation Processor (SP)” nodes, to run its “Control Process (CP)” on. I’m not taking that into account, since my example job is small and fits in RAM. This is also why I get away with setting:

-nprocs $(expr $SLURM_NPROCS - 1) \

Powerflow will decompose the simulation into N-1 partitions, which leaves one CPU core for running the “CP” process when the simulation starts. This is suboptimal, but unless we do this, SLURM will complain with:

srun: error: Unable to create step for job 197: More processors requested than permitted

There is probably a smart way of telling SLURM about a master process, which I hope to learn about soon and use to run Powerflow properly with a separate “CP” node.
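Until then, for reference, the N-1 calculation in the job script can also be written with plain bash arithmetic instead of expr (just a sketch of an equivalent snippet):

# Equivalent to $(expr $SLURM_NPROCS - 1)
NPROCS=$(( SLURM_NPROCS - 1 ))
echo "Decomposing into ${NPROCS} SP partitions out of ${SLURM_NPROCS} allocated tasks"

…and then pass -nprocs $NPROCS to exaqsub.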

Submit to slurm

Submitting the script is simply:

$ sbatch -p debug ./powerflow-on-ubuntu.sh
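Once the job is submitted, you can keep an eye on it with the usual SLURM tools, for example:

$ squeue -p debug
$ sacct -j <jobid> --format=JobID,State,Elapsed,NNodes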

You can watch your Infiniband counters to see that a significant amount of traffic is sent over the wire, which indicates that you have succeeded.

watch -d cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_packets

You can also inspect a status file that Powerflow writes continuously, like this, from the working directory of the simulation:

# Let's look at the status of the simulation
$ cat .exa_jobctl_status
Decomposer: Decomposing scale 6

# ... again
$ cat .exa_jobctl_status
Simulator: Initializing voxels [43% complete [21947662 of 51040996]

# ... and once the simulation is complete.
$ cat .exa_jobctl_status
Simulator: Finished 100 Timesteps

I’ve presented at Ubuntu Masters about the setup I use to work with my systems, which allows me to do things like this easily. Here is a link to that material: https://youtu.be/SGrRqCuiT90

Running STARCCM+ using OpenMPI on Ubuntu with SLURM and Infiniband

This is a walkthrough of my work on running a proprietary computational fluid dynamics code, STAR-CCM+, on Ubuntu 18.04 using the snap version of SLURM with OpenMPI 4.0.4 over Infiniband.

You can use this to perform scaling studies, track down issues and optimize performance, or use it however you like. Much of this will work on other OSes too.

This is the workbench used here:

Hardware: 2 hosts with 2×20 cores and 187 GB RAM.
Infiniband: Mellanox MT28908 Family [ConnectX-6]
OS: Linux 4.15.0-109-generic (x86_64) Ubuntu18.04.4
SLURM 20.04 (https://snapcraft.io/slurm)
OpenMPI: 4.0.4 (ucx, openib)
StarCCM+: STAR-CCM+14.06.012
A Reference model which is small enough for your computers and large enough to run over 2 nodes.

Let’s get started.

Modify ulimits on all nodes.

This is done by editing /etc/security/limits.d/30-slurm.conf

* soft nofile  65000
* hard nofile  65000
* soft memlock unlimited
* hard memlock unlimited
* soft stack unlimited
* hard stack unlimited

Modify the slurmd systemd unit file to make the ulimits permanent for the slurmd processes.

$ sudo systemctl edit snap.slurm.slurmd.service

[Service]
LimitNOFILE=131072
LimitMEMLOCK=infinity
LimitSTACK=infinity

* Restart slurm on all nodes.

$ sudo systemctl restart snap.slurm.slurmd.service

* Make sure the login nodes have correct ulimits after a login.

* Validate that all worker nodes also have correct ulimit values when using SLURM. For example:

$ srun -N 1 --pty bash
$ ulimit -a

You must have consistent ulimit settings everywhere or things will go sideways. Remember that SLURM propagates ulimits from the submitting node, so make sure those are consistent there too.

Compile OpenMPI 4.0.4

At the time of writing, this was the latest version. This is my configure line, but you can compile it differently for your needs.

$ ./configure --without-cm --with-ib --prefix=/opt/openmpi-4.0.4
$ make
$ make install
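Before involving any CFD code, it can be worth confirming that the new OpenMPI can actually launch across both hosts. A minimal sketch (the host names are made up, and it assumes passwordless ssh between the nodes):

$ /opt/openmpi-4.0.4/bin/mpirun --host node1:1,node2:1 --mca pml ucx hostname

If both hostnames come back, the basic launch path works.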

Validate that OpenMPI can see the correct MCA components

In this step, I’m mostly concerned that the ucx PML is available in OpenMPI’s MCA, so once the compilation is done, I check for that and for the openib BTL.

$ /opt/openmpi-4.0.4/bin/ompi_info  | grep -E 'btl|ucx'

MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.4)
MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.4)
MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.4)

What we are looking for here is:

* MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.4)
* MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.4)

The rest are not important at this point, but if you know better, please let me know. You can see where these modules are referenced in the job script later.

Validate that ucx_info sees your Infiniband device and ib_verbs transports

In my case, I have a Mellanox device (shown with: ibv_devices), so I should see it with ucx_info:

$ ucx_info -d | grep -1 mlx5_0

#
# Memory domain: mlx5_0
#     Component: ib

#   Transport: rc_verbs
#      Device: mlx5_0:1
#

#   Transport: rc_mlx5
#      Device: mlx5_0:1
#

#   Transport: dc_mlx5
#      Device: mlx5_0:1
#

#   Transport: ud_verbs
#      Device: mlx5_0:1
#

#   Transport: ud_mlx5
#      Device: mlx5_0:1
#
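If the expected transports do not show up, it is worth double-checking that the port is actually up before digging further, for example:

$ cat /sys/class/infiniband/mlx5_0/ports/1/state

which should report an ACTIVE state.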

Modify the STARCCM+ installation

My version of STAR-CCM+ uses an old UCX and calls /usr/bin/ucx_info. At some point during startup, it fails when it’s not able to find libibcm.so.1 while using our custom OpenMPI. Perhaps there is a way to force STAR-CCM+ to look for ucx_info on the system, but I have not found one.

To have STAR-CCM+ ignore its bundled UCX, simply remove the UCX libraries from the installation tree and replace them with an empty directory:

$ sudo rm -rf /opt/STAR-CCM+14.06.012/ucx/1.5.0-cda-001/linux-x86_64*
$ sudo mkdir -p /opt/STAR-CCM+14.06.012/ucx/1.5.0-cda-001/linux-x86_64-2.17/gnu7.1/lib

This is not needed on OSes such as CentOS 6 and CentOS 7, because they still ship the deprecated library libibcm.so.1.

Time to write the job-script

#!/bin/bash
#SBATCH -J starccmref
#SBATCH -N 2
#SBATCH -n 80
set -o xtrace
set -e

# StarCCM+
export PATH=$PATH:/opt/STAR-CCM+14.06.012/star/bin

# OpenMPI
export OPENMPI_DIR=/opt/openmpi-4.0.4
export PATH=${OPENMPI_DIR}/bin:$PATH
export LD_LIBRARY_PATH=${OPENMPI_DIR}/lib

# Kill any leftovers from previous runs
kill_starccm+

export CDLMD_LICENSE_FILE="27012@license.server.com"
SIM_FILE=SteadyFlowBackwardFacingStep_final.sim
STAR_CLASS_PATH="/software/Java/poi-3.7-FINAL"
NODE_FILE="nodefile"

# Assemble a nodelist using this python lib
hostListbin=/software/hostlist/python-hostlist-1.18/hostlist
$hostListbin --append=: --append-slurm-tasks=$SLURM_TASKS_PER_NODE -e $SLURM_JOB_NODELIST >  $NODE_FILE
# Start
starccm+ -machinefile ${NODE_FILE} \
         -power \
         -batch ./starccmSim.java \
         -np $SLURM_NTASKS \
         -ldlibpath $LD_LIBRARY_PATH \
         -classpath $STAR_CLASS_PATH \
         -fabricverbose \
         -mpi openmpi \
         -mpiflags "--mca pml ucx --mca btl openib --mca pml_base_verbose 10 --mca mtl_base_verbose 10"  \
         ./${SIM_FILE}
# Kill off any rogue processes
kill_starccm+

You will probably need to modify the above script for your own environment, but the general pieces are in there.
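If you would rather avoid the python-hostlist dependency, a similar machinefile can be assembled with SLURM’s own tools (a sketch, assuming the same -N/-n layout as above), since scontrol can expand the compressed nodelist:

# Sketch: build a host:ntasks style machinefile without python-hostlist
TASKS_PER_NODE=$(( SLURM_NTASKS / SLURM_JOB_NUM_NODES ))
scontrol show hostnames "$SLURM_JOB_NODELIST" | awk -v t="$TASKS_PER_NODE" '{print $1 ":" t}' > "$NODE_FILE"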

Submit to slurm

You want this job to run on multiple machines, so I use -n 80 to allocate 2×40 cores, which SLURM understands as allocating the two nodes used in this example. If you have fewer or more cores than I do, use a 2×N number in your submit.

$ sbatch -p debug -n 80 ./starccmubuntu.sh

You can watch your Infiniband counters to see that a significant amount of traffic is sent over the wire, which indicates that you have succeeded.

watch -d cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_packets

I’ve presented at Ubuntu Masters about the setup I use to work with my systems, which allows me to do things like this easily. Here is a link to that material: https://youtu.be/SGrRqCuiT90

Here is a similar walkthrough of doing this with the CFD application Powerflow: https://eriklonroth.com/2020/07/15/running-powerflow-on-ubuntu-with-slurm-and-infiniband/

Workshop open source for the future

Come join the Edgeryders global festival and create an open source future for the coming generations of internet citizens.

Our theory of change is building dense networks around people trying to tackle messy social-ecological, economic and political challenges. Having a dense network around you is one of the hallmarks of social capital, and gives you access to expertise, resources, skill sharing and financing. In a dense network, there are many ways for you to get from one point to another. A key challenge here is for people with aligned interests to find each other, and as a community we’re out to solve just that.

If you, like me, think open source matters – exceedingly much, in a digital world – this is something for you.

The description below was originally written in Swedish; an English version is also found here: Teaching Teachers Open Source


https://festival.edgeryders.eu/

When: 28 November, 10:00–17:00
Where: Södra Hamnvägen 9 (Hus Blivande, directions here)
How: Lunch and coffee are included.

You get a ticket like this:

  1. Visit https://tell.edgeryders.eu/festival/ticket and fill in the form.
  2. Create an Edgeryders account and introduce yourself in the forum.
  3. Once you have introduced yourself in the forum, you will receive a ticket via email.

Background to the workshop

The internet, computers and programming immediately become part of our children’s lives today. Coming generations will never have experienced a world without the internet and computers.

In the education system, they are introduced to tools, ideas and concepts derived from the digital world. It is in school that we shape much of our world view, and today most of it is closed, even though it should be open.

What really happens to knowledge in a digital future if a closed view of software and the internet is left unchallenged in school?

Free and open source software is characterized by transparency, collaboration, freedom and empowerment: established cornerstones of our education system. So why don’t we practice what we preach in this area, which will fundamentally shape our children?

The hypothesis is that the school world lacks depth and knowledge about how free and open source code plays a central role in how knowledge of the digital world works. Educating teachers and leaders in this area is how we help coming generations shape the next generation of internet citizens.

If you are already active in the school world, your participation is welcome, and if you come from an open source engagement and the software world, your knowledge and experience are central to shaping the content.

What will you experience?

You will get the chance to learn about the cornerstones of free & open source software and how they are fundamental to knowledge in a digital world.

You will get the opportunity to share your knowledge of schools, software or both, and to take part of others’.

You will get to collaborate with experts and enthusiasts on how we reach out to the school system.

You will meet people with the same burning interest in shaping coming generations’ internet, in an exciting setting.

Magic with Juju

I’ve been working on a set of tutorials over the last few months on how to write “juju charms”.

If you haven’t checked out Juju yet – do it now.

Juju is an open source framework to operate small and large software stacks in any cloud environment. It works fairly transparently on well-known clouds like AWS, Google Compute Engine, Oracle Cloud, Joyent, RackSpace, CloudSigma, Azure, vSphere, MAAS, Kubernetes, LXD and even custom clouds!

I touched base with Juju some 5 years ago and immediately understood that it was going to make an impact in the computer science domain. The primary reasons, to me, were:

  1. Open Source all the way.
  2. No domain specific languages required to work with it. E.g. use your preferred programming language to do your Infrastructure As Code / DevOps.
  3. Smart modular architecture (charms) that allows re-use and evolution of best solutions in the juju ecosystem.
  4. “Fairly” linux agnostic – although the community is heavy on Ubuntu.
  5. Cloud agnostic – you can use your favorite public cloud provider or use your own private cloud, like an openStack or even VMware environment. Huge benefit to anyone looking to move into cloud.
  6. Robust and maintained codebase backed by Canonical – if enterprise support matters to you. There are other companies, like OmniVector Solutions, also delivering commercial services around Juju if you don’t like Canonical. [Disclaimer: I’m biased in my affiliation with OmniVector]
  7. A healthy, active & friendly open source community – if you, like me, think a living open source community is a key factor to success.

The above factors weigh in heavily for me, and even though there are a few competitors to Juju, I think you can’t really ignore it if you are into DevOps, IaC, CI/CD, clouds, etc.

Juju means “magic”, which is an accurate description of what you can do with this tool.

So, here are a few decent starting points from my collection for getting started with Juju. Knock yourself out.

The juju community

Getting started with juju.

Tutorial “Getting started with juju hooks”.

Tutorial “Getting started with reactive charms”

Tutorial “Charms with snaps”

Massive open source win!

Now, this is good news!

The OpenChain project (backed by The Linux Foundation) helps companies all over the world comply with good practices for dealing with open source. In effect, it is fundamental to how they do business together within the software realm.

On December 6, 2018, a large number of major industry players made a massive joint announcement regarding this project!

I’ve been working for over 10 years, slowly pulling my employer Scania towards this path – and now, we are there! Have a look at the map and you will also see the global reach. Given the massive support from huge players in many domains, it’s foreseeable that most companies will want or need to follow suit, many of which I’m helping in the process.

What does this mean to you?

If you are working for any software company and want to sell to or do business with an OpenChain compliant company in the future, you need to show that your company is following the OpenChain standard, or you won’t be able to do business with us. The only exception will be if none of your code is open source, which in our world is more or less impossible.

OpenChain can be seen as somewhat similar to an ISO certification for software. This change will of course need a transition period – where exceptions will be needed to sort out the proprietary mess – but we are getting there.

This framework will, in time, affect all software businesses, probably all over the world. The initiative comes not from politics but from a strong and healthy private sector, and it is a massive win for open source worldwide. As the chairman of open source at Scania, and its representative within the Volkswagen Group (640,000 employees), I’m more than happy to announce this message!

Now, go open source! Share this message with everyone who works with software.

Develop or die

I’ve been the technical lead for an HPC center for over a decade now, and for some 15 years I have watched paradigms come and go within the systems engineering domain. This text is about the future development of HPC in relation to the more general software domain. There are some very tough challenges ahead, and I’ll try to explain what is going on.

But first, let me get you into my line of thinking with a bit of my own story.

Turning forty this year, after exploring and experimenting within computer science, has landed me in a very peculiar place. People reach out for advice on very difficult matters around building all sorts of crazy computer systems. Some set out to build successful businesses, others just want to build robots. I help out as much as I can, although my time is normally just gone. It’s a weird feeling to be referred to as senior. Where did those years go?

My experience in building HPC systems started in the early 2000s, and from a computer science perspective I’ve tried to stay aligned with developments in the field. This is never easy in large organizations, but having an employer bold enough to let me develop the field has made it possible not only to deliver enterprise-grade HPC for over a decade – but also to develop some other parts of it in exciting ways.

It didn’t always look like this of course.

I’ve been that person, constantly promoting and adopting “open source” wherever I’ve gone. A concept that, at least in automotive (where I mostly roam), was pariah (at best) in my junior years. I came in contact with open source and Linux during my university studies and was convinced open source was going to take over the world. I could see Linux dominating the data centers from the single fact that the licensing model would allow it. There are of course tons of other reasons for that to happen, but the licensing model in itself was decisive.

I’m sorry, but a license model that doesn’t scale better than O(n) is just doomed in the face of O(1).

(I think I’m a nightmare to talk to sometimes.)

Anyway, telling my story of computer science and systems engineering can’t be done without reflecting on some technology elements and methodologies that came about during the past decades. After all, important projections on systems engineering can’t be made without referencing technology paradigms.

In the mid-1990s, “Unix sysadmins” ruled the domain of HPC. Enterprise systems outside of HPC meant Microsoft Windows. For heavyweight systems, VMS and UNIX-dialect systems prevailed. Proprietary software was what mattered to management (still is?), and everything was home-rolled. The concept of getting “locked in” by proprietary systems was poorly understood, which was reflected in a poor understanding of real costs and of what the ability to build high-quality systems rests on. Building HPC systems at this time was an art form. Web front ends didn’t exist. Most people were self-taught and HPC was very different from today. My first HPC cluster (for a large company) lived on wooden shelves from IKEA in a large wardrobe. The CTO of the IT division was a carpenter. I’ll refer to this era as the “Windows/Unix Era”.

The Windows/Unix Era started to morph during the 2000s into the Linux revolution. In just a few years, available software and infrastructure (not only in HPC) was getting more and more Linux-based. Other OS dialects such as HP-UX, VMS, AIX, Solaris and mainframes ended up in phase-out, witnesses of an era slowly disappearing. Linux on client desktops entered the stage, and at this point the crafting of HPC systems was becoming an art of automation. Macs got thinner.

Sysadmin work, in practice, meant scripting: bash, ash, ksh and so on, plus cfengine, ansible and puppet. Computer science was starting to become quasi-magic (which I think is how most people perceive computers in general) and advanced provisioning systems became popular. Automation!

I was myself using Rocks Clusters for my HPC platform, with some Red Hat Satellite assistance to achieve various forms of automation. The still ever-popular DevOps appeared, mainly in the web services domain, and some people started to do Python in favor of bash (OMG!). Everything was now moving towards open source. It became impossible to ignore this development, also at my primary employer. I became the chairman of the open source forum and slowly started to formalize that field in a professional way. A decade behind giants such as Canonical, IBM or Red Hat, but hey – automotive is not IT, right?

However, time and DevOps development stood still in the HPC domain. I blame the performance loss in virtualization layers for that situation, or perhaps HPC being a victim of its own success. While VMware, Xen, KVM and the like filled up the data centers in just a few years, virtualization never took off in HPC. Half a decade passed like this, and so-called “clouds” started to appear.

I personally hate the term cloud. It blurs and confuses the true essence, which is perhaps better put as: “someone else’s computer”.

Recent developments in the cloud have exposed one of the core problems with cloud resources. I’m not going to make the full case here, but generally speaking, the access, integrity, safety and confidentiality of any data are almost impossible to safeguard – if you are using someone else’s computer. There is a fairly accessible remedy, which I’ll throw up a fancy quote for:

“Architect distributed systems, keep your data local & always stay in control over your own computing. Encrypt”. 

I think time will prove me right, and I’ve challenged that heavily by taking an active part in a long-running research project, LIM-IT, which sets out to map and understand lock-in effects of various kinds. The results from this project should be mandatory education for any IT manager who wants to stay on top of their game in my industry.

From a technology point of view, I can recommend looking at projects like Nextcloud, OpenStack, BitTorrent, etc., but there are many, many others that operate with this great mindset.

So, where are we now? First of all, HPC is in desperate need of ingesting technology from other domains, especially low-hanging fruit like that from the Big Data domain. Just to take one conceptual example: making use of TCP/IP communication stacks to solve typical HPC problems. One of my research teams explored this two years ago by implementing a pCMC algorithm in Spark as part of a thesis program. The intention was to explore and prove that HPC systems indeed serve as excellent hybrid solutions for tackling typical data analytics problems, and vice versa. I’m sure MPI doctors will object, but frankly, that’s a lot of “Not Invented Here” syndrome. The results from our research thesis spoke for themselves. Oh, and the code can be downloaded here under an open source license so you can try it yourself. (Credits to Tingwei Huang and Huijie Shen for excellent work on this.)

Now, there is a downside to everything, and so also when opening up to a completely new domain of software stacks. Once you travel down the path of revamping your HPC environment with new technologies from, for example, “cloud” (oh, how I hate that word) and going all in on “Everything as a Service”, you are basically faced with an avalanche of technology: Hadoop, Spark, ELK, OpenStack, Jenkins, K8s, Docker, LXC, Travis and so on. All of these stacks require years, if not decades, to learn. There exist few, if any, people who master even a few of these stacks, and their skills are in high demand on a global market for computer wizards. Even more true is that they’re probably not on your payroll, because you just can’t afford them.

So, as a manager in HPC, or in some other sufficiently complex IT environment, you face a Gordian knot of a problem:

“How do I manage and operate an IT-environment of vast complexity PLUS manage to keep my cost under some kind of control?”

Mark Shuttleworth, by the way, talks brilliantly about this in his keynotes nowadays, which I’m happy to repeat.

Most managers will unfortunately fail when facing this challenge, and they will fail in one of three ways:

  1. Managers in IT will fail to adopt the right kind of technologies and get “locked in” on either data-lock-in, license-lock-in, vendor-lock-in, technology-lock-in or all of them – ending up in a budget disaster.
  2. Managers in IT will fail to recruit sufficiently skilled people, or recruit too many with the wrong skill set – they will fail to deliver relevant services to the market.
  3. Managers in IT will do both of the above – delivering sub-performing services too late to market, at extreme costs.

If I’m right in my dystopian projections, we will see quite a few IT companies go down in the coming years. The market will be brutal to those who fail to develop. Computer scientists will be able to ask for fantasy salaries, and there will be a latent panic in management.

I’ve spent a significant amount of time researching and working on this very challenging problem, which is general in nature but definitely applicable to my own expert domain in HPC. In a few days, I’ll fly out to get the opportunity to present a snapshot of my work at one of Europe’s largest HPC centers in Germany – HLRS. These are happy times indeed.

My native IPv6 – part 1

So, I’ve spent about two years arguing with my network provider Fibra to enable native IPv6 on my high-speed internet connection. A struggle that paid off in January 2018: I was kindly offered to be included in their IPv6 PoC, and I was thrilled.

Thank you Fibra for finally coming around on this.

I’m no guru on IPv6, but I feel comfortable navigating the fundamentals and have been running IPv6 tunnels for many years now. This is my hopefully short blog series on getting native IPv6 working at home.

My ambition at this point is to have a public IPv6 address assigned by my network provider, along with a routed /48 network prefix which will be split into a /64 prefix delegation for my home network.

This seems like a very basic setup at the time of writing and suits my ambitions for now. Later, I will be running separate IPv6 prefixes on my other virtual environments, while keeping the home network separate.

My current WAN access setup is an OpenWrt Chaos Calmer TP-Link router, with the WAN interface obtaining public IPs from Bahnhof in Sweden. Fibra is the network provider that, for the purpose of the PoC, has turned on the switch port for IPv6 traffic for me.

For this to make sense to a reader, here is an extract of /etc/config/network that serves as a starting point for a native IPv6 setup:

config interface 'wan'
 option ifname 'eth0.2'
 option proto 'dhcp'
 option ipv6 '1'

config interface 'wan6'
 option _orig_ifname 'eth0.2'
 option _orig_bridge 'false'
 option ifname 'eth0.2'
 option proto 'dhcpv6'
 option reqaddress 'try'
 option reqprefix 'auto'

Bringing the interface down and up again should leave you with an IPv6 address on the eth0.2 interface (see the sketch below for the commands). I did that maneuver, but no IPv6 address was handed out to me.
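For reference, on OpenWrt that bounce-and-check is roughly the following (a sketch; ifstatus is the standard netifd status tool):

$ ifdown wan6 && ifup wan6
$ ifstatus wan6 | grep -A 3 ipv6-address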

Some basic debugging was needed; you would likely do the same if you get this result.

I started by turning off the firewall (to be sure no traffic is dropped before it hits the TCP/IP stack):

$ /etc/init.d/firewall stop

Then I ran tcpdump for IPv6 ICMP traffic. I expected to see some ICMPv6 traffic, which I did. (In an IPv6 network, ICMP traffic is chatty, which is a bit different from what you might be used to in IPv4 networks.)

$ tcpdump -i eth0.2 icmp6

I could see Router Advertisement messages from an interface that is not on my device, so something was working on Fibra’s side. I needed to verify that my router actually sent out DHCPv6 messages, and to look for the replies that should follow.

So, I forced the IPv6 DHCP client (odhcp6c) to send router solicitation messages out on the interface that I want handling the native IPv6 stack:

$ odhcp6c -s /lib/netifd/dhcpv6.script -Nforce -P48 -t120 eth0.2

Nope, no reply from my network provider.

What should happen here is that the DHCPv6 server on my network provider Fibra’s side responds with a configuration packet (Advertise), but the provider side just kept spamming RA packets, seemingly ignoring my client’s Router Solicitations. This is not the way it is supposed to be, and I sent Fibra the information above.

To be continued.

Update 1:

For the curious reader, a correct DHCPv6 exchange (RFC 3315) should look like this:
Client -> Solicit
Server -> Advertise
Client -> Request
Server -> Reply

Update 2:

Also, a note from my friend and IPv6 guru Jimmy: if you need to see more details on the IPv6 DHCP messages coming from odhcp6c, you can expand the tcpdump filter like this:

$ tcpdump -i eth0.2 icmp6 or port 546 or 547

You should see packets coming out from your interface like this:

IP6 fe80::56e6:fcff:fe9a:246a.dhcpv6-client > ff02::1:2.dhcpv6-server: dhcp6 solicit

My blog

I’ll add some blogging here soon.