Tutorial: Relations with Juju

This post teaches you how to build Juju charms with relations by studying two example charms made for this purpose. I have worked professionally with Juju for many years, and I’ve always thought Juju relations deserved more tutorials to help beginner-level programmers get started with this extremely powerful tool. You can find other tutorials I’ve written for the open source community on https://discourse.juju.is

Difficulty: Intermediate

What you will learn

This tutorial will teach you how to implement a simple relation in Juju. We will use two existing charms that implement a master-worker pattern and study their code for reference.

Get the code here: git clone https://github.com/erik78se/masterworker

You will:

  • Learn what a relation is and how to use it in your charms.
  • Learn more about hooks and how hook-tools drive relation data exchanges.
  • Learn about relation-hooks and when they run.
  • Learn how to debug/introspect a relation with hook-tools.

Preparations

  • You need a basic understanding of what juju, charms and models are.
  • You should have gone through the getting started chapter of the official juju documentation.
  • You need to be able to deploy charms to a cloud.
  • You have read: the lifecycle of charm relations
  • You have read: charm writing relations.
  • You need beginner-level Python programming skills.

Refreshing our memory a little

Some key elements of Juju are worth mentioning before we dig into the code.

Juju hook-tools

When working with relations and juju in general, what goes on under the hood are calls to juju hook-tools.

Listing the available hook-tools can be done via:

juju help hook-tools

Two specific hook-tools are of high importance when working with juju relations:

relation-get & relation-set

Those are the primary tools when working with juju relations because they get/set data in relations.

Important: You set data on the local unit, and get data from remote units.

Hooks, their environment & context

Hook-tools are normally executed from within a ‘hook’, where environment variables are set by the Juju agent depending on the context/hook. These variables are used when writing code for relations.

The example below shows logging of some of those environment variables.

#!/bin/bash
juju-log "This is the relation id: $JUJU_RELATION_ID"
juju-log "This is the remote unit: $JUJU_REMOTE_UNIT"
juju-log "This is the local unit: $JUJU_UNIT_NAME"

Charmhelpers

When building charms with Python, the python package charmhelpers provides a set of functions that wraps the hook-tools. Charmhelpers can be installed with

pip install charmhelpers

Here is the documentation: charmhelpers-docs

Installing charmhelpers for use within your charm could be part of your install hook, or even better, you can clone it into the “./lib/” directory of your charm, making it part of your charm software.

Cloning charmhelpers into your charm is good practice since it isolates your charm’s software requirements from other charms that may live on the same host.
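
As a minimal sketch of how this can look (assuming the charm keeps hookenv.py under ./lib as in the masterworker repo, and that the hook adds that directory to sys.path; the exact path handling in the repo may differ), a hook imports the wrappers and then follows the rule above: set data for the local unit, get data from remote units. The ‘hostname’ key is just an illustrative example, not something the masterworker charms actually set:

    #!/usr/bin/env python3
    # Sketch of a relation hook using the hookenv wrappers bundled under ./lib
    import os
    import sys

    # Make the bundled hookenv.py importable (this path layout is an assumption).
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'lib'))
    from hookenv import log, relation_get, relation_set

    # Set data on our own (local) side of the relation.
    relation_set(relation_settings={'hostname': 'example-host'})

    # Read data published by the remote unit that triggered this hook.
    remote_unit = os.environ.get('JUJU_REMOTE_UNIT')
    log(f"Remote private-address: {relation_get('private-address', unit=remote_unit)}")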

Feeling all refreshed on Juju basics, let’s now introduce the “master” and “worker” charms.

Master worker

Clone the “masterworker” repo to your client.

git clone https://github.com/erik78se/masterworker

The repo contains:

├── bundle.yaml         # <--- A bundle with a related master + 2 workers
├── master              # <--- The master charm
├── worker              # <--- The worker charm
├── ./lib/hookenv.py    # <--- Part of charmhelpers

The idea here is that:

  • The master is a single unit, whereas the workers can be many.
  • The master sends some unique information to each individual worker unit.
  • The master sends some common information to all worker units.
  • The workers don’t send (relation-set) any information at all.

This pattern is useful in a lot of situations in computer science, such as when implementing client-server solutions.

Let’s deploy the master and two workers so we can see how it looks and how the charms are related.

juju deploy master
juju deploy worker -n 2
juju relate master worker

Note: You could of course deploy the bundle instead:

juju deploy ./bundle.yaml
[Image: masterworker-deployed.png]

Implementation

So, let’s go through the steps required to produce the relation between these charms.

The first step in implementing a relation between two charms is defining the relation endpoint for each charm and the interface name. This is done in metadata.yaml.

Step 1. Define an endpoint and select an interface name

A starting point when creating a relation charm is to modify the metadata.yaml file. We do this for both master and worker since they have different roles in the relation.

The endpoints for the master and worker are defined as below.

master/metadata.yaml

provides:                # <--- Role
  master-application:    # <--- Relation Name
    interface: exchange  # <--- Interface name 
    limit: 1

worker/metadata.yaml

requires:                # <--- Role
  master:                # <--- Relation name
    interface: exchange  # <--- Interface name

The interface name must be the same for the master/worker endpoints or Juju will refuse to relate the charms.

Step 2. Decide what data to pass

As we described above, the master is the only party in the relation that exchanges information with the worker over our invented exchange interface. The data is:

  1. worker-key for each unique worker. The worker-key is created by the master.
  2. message from the master to all the workers.

So, this data is all we will “get/set” on the relation.
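
To make this concrete, the master’s side of the relation data could end up looking roughly like the dictionary below once two workers have joined and a message has been broadcast (the values come from the example output shown later in this tutorial):

    # Illustrative shape of the data the master sets on the relation.
    # The composite worker keys are explained in step 3 below.
    relation_data = {
        'worker/0-worker-key': '5914',   # unique per worker unit
        'worker/1-worker-key': 'ADA1',   # unique per worker unit
        'message': 'Hello there',        # common to all workers
    }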

This is all done as part of the “relation hooks” that we will look into now.

Step 3. Use the relation hooks to set/get data.

Let’s follow the events after the call to juju relate:

juju relate master worker

What happens now is that Juju triggers a specific set of hooks, called “relation hooks”, on all units involved in the relation. The picture below shows which hooks are called and in what order when a relation is formed.

[Image: juju-hook-state-machine.png]

The master sets data in master-application-relation-joined

The worker gets data in master-relation-changed

A best practice here is to use relation-joined and/or relation-created to set initial data, and relation-changed to retrieve it, just as we have done in the master and worker charms.

The reason for this is that in relation-created or relation-joined we can’t know whether the other end of the relation has set any relation data yet.

Only a few relation keys (such as the remote unit ‘private-address’) are available at these early stages (available in relation-joined), and it is not until relation-changed that the relation data you are interested in should be expected to be available.

Apart from these considerations, all data management goes via “relation-set” and “relation-get”.

Now, let’s look a bit closer at how the master sends out data that is unique to each of our worker units.

Communicating unit unique data

Data exchanged on juju relations is a dictionary.

So, to pass individual data to workers, the master creates a composite dictionary key, made up of the joining remote unit name + key name, and relation-sets data for that composite key.

./master/hooks/master-application-relation-joined

    log(" ========= hook: master-application-relation-joined  ========")

    # Generate a worker-key
    workerKey = generateWorkerKey()

    # Get the remote unit name so that we can use that for a composite key.
    remoteUnitName = os.environ.get('JUJU_REMOTE_UNIT', None) # remote_unit()

    # Get the worker remote unit private-address for logging
    workerAddr = relation_get('private-address', unit=remoteUnitName)

    log(f"Joined with WORKER at private-address: {workerAddr}")

    # Assemble the relation data.
    relation_data = { f"{remoteUnitName}-worker-key": workerKey }

    # Set the relation data on the relation.
    relation_set(relation_id(), relation_settings=relation_data )

The worker accesses its individual ‘worker-key’ in the master-relation-changed hook:

./worker/hooks/master-relation-changed

    log(" ========= hook: master-relation-changed  ========")
    
    localunitname = os.environ['JUJU_UNIT_NAME']

    ## If we have data that belong to this unit
    if relation_get(f"{localunitname}-worker-key"):

        # Get the worker-key with our unit name on it, e.g.: 'worker/0-worker-key'
        workerKey = relation_get(f"{localunitname}-worker-key")

Pretty straightforward, right?

Let’s now explore an alternative way to send a message to the workers, from outside the relation hooks.

Triggering a relation-changed event via a juju action

Juju makes sure that any change on a relation triggers the <relation_name>-relation-changed hook on the remote units, and we can trigger this from other, non-relation hooks too, since we can access the relations by their ids.

Look at the juju action broadcast-message to see how this is achieved:

./master/actions/broadcast-message

# Assume that the first relation_id is the only and use that.
relation_id = relation_ids('master-application')[0]

# Get the message from the juju function/action
message = function_get('message')

relation_data = { 'message': message }

# ... set the relation data.
relation_set(relation_id, relation_settings=relation_data)

If you run the action ‘broadcast-message’ and watch the “juju debug-log” you will see all units logging the message sent.

juju run-action master/0 broadcast-message message="Hello there"
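
On the worker side, nothing special is needed to receive the broadcast: the change simply shows up as another master-relation-changed event. A minimal sketch (not verbatim from the worker charm, and assuming the same hookenv import setup as in the earlier sketch) of how the message could be picked up and logged:

    # Sketch: inside ./worker/hooks/master-relation-changed
    from hookenv import log, relation_get

    # The broadcast key is shared by all workers, so no composite key is needed.
    message = relation_get('message')
    if message:
        log(f"Got broadcast message from master: {message}")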

Look into the relations (debugging)

We will often need to see what goes on in a relation, what data is set, etc. Let’s see how that is done using the hook-tools.

Here we retrieve the relation-ids for the master/0 unit.

juju run --unit master/0 "relation-ids master-application"
master-application:0

Removing and adding back a relation shows how the relation-id changes from master-application:0 to master-application:1

juju remove-relation master worker
juju relate master worker
juju run --unit master/0 'relation-ids master-application'
master-application:1

The command below shows how the worker can access all keys/data (“-”) set by the master/0 unit.

juju run --unit worker/0 'relation-get -r master:1 - master/0'
egress-subnets: 172.31.27.134/32
ingress-address: 172.31.27.134
private-address: 172.31.27.134
worker/0-worker-key: "5914"
worker/1-worker-key: ADA1

From the command below we can see that master/0 gets no information from the worker, which is expected. Remember that the workers don’t set any data.

juju run --unit master/0 'relation-get -r master-application:1 - worker/0'
egress-subnets: 172.31.35.128/32
ingress-address: 172.31.35.128
private-address: 172.31.35.128

Individual keys can be retrieved as well with their key names:

juju run --unit master/0 "relation-get -r master-application:1 worker/1-worker-key master/0"
ADA1

Step 4. Departing the relation

The last step to implement in a Juju relation is taking care of when a unit departs from the relation. The programmer should:

  1. Remove any relation data associated with the departing unit from the relation dictionary with the relation-set hook tool.
  2. Do whatever is needed to remove the departing unit from the service, e.g. perform reconfiguration, remove databases, etc.

Let’s walk through this by removing a worker. Follow the events with juju debug-log.

juju remove-unit worker/1

The master (and worker/1) get notified of the event and execute their respective relation-departed hooks.

Departing – as it happens on the master

The master cleans up the relation data associated with the departing (remote) unit.

./master/hooks/master-application-relation-departed

    # Set a None value on the key (removes it from the relation data dictionary)
    relation_data = {f"{remoteUnitName}-worker-key": None}

    # Update the relation data on the relation.
    relation_set(relation_id(), relation_settings=relation_data) 

The master hasn’t done anything else on the host itself, so its duties are complete.

Inspecting the relation will show that the data for worker/1 is gone:

juju run --unit worker/0 'relation-get -r master:1 - master/0'
egress-subnets: 172.31.27.134/32
ingress-address: 172.31.27.134
private-address: 172.31.27.134
worker/0-worker-key: 5914

Departing – as it happens on the worker

On the worker side of the relation, the worker didn’t set any relation data, so it has nothing to clean up in the relation data.

But, the worker should remove the WORKERKEY.file that it created on the host as part of joining the relation.

This cleanup procedure is placed in the ‘relation-broken’ hook.

./worker/hooks/master-relation-broken

    # Remove the WORKERKEY.file
    os.remove("WORKERKEY.file")

The relation-broken hook runs when the unit is completely cut off from the other side of the relation, as if the relation was never there. It is the last hook in the relation life-cycle and is a good place to do cleanup related to the host or the underlying service deployed by the charm.

If relation-broken is being executed, you can be sure that no remote units are currently known locally. So, on the master, this hook is not run until there are no more workers.

Keep in mind that the “-broken” in the hook name has nothing to do with the relation being bogus or in error. It just means that the relation is “cut”.
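
As a side note, if you want this cleanup to be safe to run even when the file is already gone (for example after a retried or repeated hook), a slightly more defensive variant of the worker’s cleanup could look like this sketch:

    # Sketch: remove the worker key file only if it is still present,
    # so the hook can run repeatedly without failing.
    import os

    if os.path.exists("WORKERKEY.file"):
        os.remove("WORKERKEY.file")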

Let’s finish up by removing all the relations:

juju remove-relation master worker

Inspect the relations and look for the file WORKERKEY.file on the remaining worker units (they are gone!).

You will also see in the juju debug-log that the master has finally run its “relation-broken” hook.

Congratulations, you have completed the tutorial on juju relations!

Running Powerflow on Ubuntu with SLURM and Infiniband

This is a walkthrough of my work on running a proprietary computational fluid dynamics code on the snap version of SLURM over Infiniband. This time, I’ll take you through what it takes to get Powerflow to run on Ubuntu 18.04. If you would like to try out the same thing with STARCCM+, here is a link to a post that takes you through that.

You can use this to perform scaling studies, track down issues and optimize performance, or use it however you like. Much of this will work on other OSes too.

This is the workbench used here:

Hardware: 2 hosts with 2×20 cores 187GB ram.
Infiniband: Mellanox MT28908 Family [ConnectX-6]
OS: Linux 4.15.0-109-generic (x86_64) Ubuntu18.04.4
SLURM 20.04 (https://snapcraft.io/slurm)
OpenMPI: 4.0.4 (ucx, openib)
Powerflow: 6.2019
A Reference model which is small enough for your computers and large enough to run over 2 nodes on your available cores.

I use Juju to deploy my SLURM clusters in any cloud to get up and running. In this case, I use “MAAS” as the underlying cloud, but this would work on other clouds as well.

Let’s get started.

Modify ulimits on all nodes.

This is done by editing /etc/security/limits.d/30-slurm.conf

* soft nofile  65000
* hard nofile  65000
* soft memlock unlimited
* hard memlock unlimited
* soft stack unlimited
* hard stack unlimited

Modify the slurm systemd unit files to make the ulimits permanent for the slurmd processes.

$ sudo systemctl edit snap.slurm.slurmd.service

[Service]
LimitNOFILE=131072
LimitMEMLOCK=infinity
LimitSTACK=infinity

* Restart slurm on all nodes.

$ sudo systemctl restart snap.slurm.slurmd.service

* Make sure login nodes have correct ulimits after a login.

* Validate that all worker nodes also have correct ulimit values when using slurm. For example:

$ srun -N 1 --pty bash
$ ulimit -a

You must have consistent ulimit settings everywhere or things will go sideways. Remember that slurm propagates ulimits from the submitting node, so make sure those are consistent there too.

I’m going to assume you have installed Powerflow on all your nodes at /software/powerflow/6.2019, but you can have it wherever you like.

Powerflow also needs csh; install it:

sudo apt install csh

Modify the Powerflow installation

Since Powerflow doesn’t yet support Ubuntu (which is a shame), we need to get around this by fixing a few small bugs to get our simulation running the way we want.

Workaround #1 – Incorrect awk path

Powerflow assumes that a few OS commands are located at fixed locations on all OSes, which is of course a bug. The bug is located in the file “/software/powerflow/6.2019/dist/generic/scripts/exawatcher” and causes problems.

To fix this, you either edit the exawatcher script and comment out the references:

#set awk=/bin/awk
#set cp=/bin/cp
#set date=/bin/date
#set rm=/bin/rm
#set sleep=/bin/sleep

… or, as an ugly alternative, you create a symlink to “awk”, which is enough to work around the bug. Hopefully this will be fixed in future versions of Powerflow.

sudo ln -s /usr/bin/awk /bin/awk

This is not needed on OSes such as CentOS 6 and CentOS 7, which have those symlinks already in place.

Workaround #2 – bash is not sh

Powerflow has an incorrect script header, referencing “#!/bin/sh” for code that is in fact bash, which results in a syntax error on Ubuntu.

Replace #!/bin/sh header with #!/bin/bash in the file: /share/apps/powerflow/6.2019/dist/generic/server/pf_sim_cp.select

This is really all that is needed. It’s time to run Powerflow through SLURM.

Time to write the job-script

#!/bin/bash
#SBATCH -J powerflow_ubuntu
#SBATCH -A erik_lonroth
#SBATCH -e slurm_errors.%J.log
#SBATCH -o slurm_output.%J.log
#SBATCH -N 2
#SBATCH --ntasks-per-node=40
#SBATCH --exclusive
#SBATCH --partition debug

LC_ALL="en_US.utf8"
RELEASE="6.2019"
hosttype="x86_64-unknown-linux"
INSTALLPATH="/software/powerflow/${RELEASE}"

export PATH="$INSTALLPATH/bin:$PATH"
export LD_LIBRARY_PATH="$INSTALLPATH/dist/x86_linux/lib:$INSTALLPATH/dist/x86_linux/lib64"

# Set a low number of timesteps since we are only here to test
NUM_TIMESTEPS=100

export EXACORP_LICENSE=27007@license.server.com


exaqsub \
-decompose \
-infiniband \
-num_timesteps $NUM_TIMESTEPS \
-foreground \
--slurm \
-simulate \
-nprocs $(expr $SLURM_NPROCS - 1) \
--mme_checkpoint_at_end \
*.cdi

You probably need to modify the above script for your own environment, but the general pieces are there. An important note here is that Powerflow normally needs a separate node to run its “Control Process (CP)” on, with more memory than the “Simulation Processor (SP)” nodes. I’m not taking that into account since my example job is small and fits in RAM. This is why I also get away with setting:

-nprocs $(expr $SLURM_NPROCS - 1) \

Powerflow will decompose the simulation into N-1 partitions, which, when the simulation starts, leaves 1 CPU free for running the “CP” process. This is suboptimal, but unless we do this, slurm will complain with:

srun: error: Unable to create step for job 197: More processors requested than permitted

There is probably a smart way of telling slurm about a master process which I hope to learn about soon and use to properly run powerflow with a separate “CP” node.

Submit to slurm

Submitting the script is simply:

$ sbatch -p debug ./powerflow-on-ubuntu.sh

You can watch your Infiniband counters to see that a significant amount of traffic is sent over the wire, which indicates that you have succeeded.

watch -d cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_packets

You can also inspect a status file that Powerflow writes continuously, like this, from the working directory of the simulation:

# Lets look at the status of the simulation
$ cat .exa_jobctl_status
Decomposer: Decomposing scale 6

# ... again
$ cat .exa_jobctl_status
Simulator: Initializing voxels [43% complete [21947662 of 51040996]

# ... and once the simulation is complete.
$ cat .exa_jobctl_status
Simulator: Finished 100 Timesteps

I’ve presented at Ubuntu Masters about the setup I use to work with my systems, which allows me to do things like this easily. Here is a link to that material: https://youtu.be/SGrRqCuiT90

Running STARCCM+ using OpenMPI on Ubuntu with SLURM and Infiniband

This is a walkthrough of my work on running a proprietary computational fluid dynamics code, StarCCM+, on Ubuntu 18.04 using the snap version of SLURM with OpenMPI 4.0.4 over Infiniband.

You can use this to perform scaling studies, track down issues and optimize performance, or use it however you like. Much of this will work on other OSes too.

This is the workbench used here:

Hardware: 2 hosts with 2×20 cores 187GB ram.
Infiniband: Mellanox MT28908 Family [ConnectX-6]
OS: Linux 4.15.0-109-generic (x86_64) Ubuntu18.04.4
SLURM 20.04 (https://snapcraft.io/slurm)
OpenMPI: 4.0.4 (ucx, openib)
StarCCM+: STAR-CCM+14.06.012
A Reference model which is small enough for your computers and large enough to run over 2 nodes.

Let’s get started.

Modify ulimits on all nodes.

This is done by editing /etc/security/limits.d/30-slurm.conf

* soft nofile  65000
* hard nofile  65000
* soft memlock unlimited
* hard memlock unlimited
* soft stack unlimited
* hard stack unlimited

Modify the slurm systemd unit files to make the ulimits permanent for the slurmd processes.

$ sudo systemctl edit snap.slurm.slurmd.service

[Service]
LimitNOFILE=131072
LimitMEMLOCK=infinity
LimitSTACK=infinity

* Restart slurm on all nodes.

$ sudo systemctl restart snap.slurm.slurmd.service

* Make sure login nodes have correct ulimits after a login.

* Validate that all worker nodes also have correct ulimit values when using slurm. For example:

$ srun -N 1 --pty bash
$ ulimit -a

You must have consistent ulimit settings everywhere or things will go sideways. Remember that slurm propagates ulimits from the submitting node, so make sure those are consistent there too.

Compile OpenMPI 4.0.4

At the time of writing, this is the latest version. This is my configure line, but you can compile it differently for your needs.

$ ./configure --without-cm --with-ib --prefix=/opt/openmpi-4.0.4
$ make
$ make install

Validate that OpenMPI can see the correct MCA components (ucx)

I’m mostly concerned in this step that the ucx pml is available in the MCA for OpenMPI, so after my compilation is done, I check for that and for the openib btl.

$ /opt/openmpi-4.0.4/bin/ompi_info  | grep -E 'btl|ucx'

MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.4)
MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.4)
MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.4)

What we are looking for here is:

* MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.4)
* MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.4)

The rest are not important at this point, but you might know better; if so, please let me know. You can see in the job script later where these modules are referenced.

Validate that ucx_info sees your Infiniband device and ib_verbs transports

In my case, I have a Mellanox device (shown with: ibv_devices), so I should see that with ucx_info:

$ ucx_info -d | grep -1 mlx5_0

#
# Memory domain: mlx5_0
#     Component: ib

#   Transport: rc_verbs
#      Device: mlx5_0:1
#

#   Transport: rc_mlx5
#      Device: mlx5_0:1
#

#   Transport: dc_mlx5
#      Device: mlx5_0:1
#

#   Transport: ud_verbs
#      Device: mlx5_0:1
#

#   Transport: ud_mlx5
#      Device: mlx5_0:1
#

Modify the STARCCM+ installation

My version of StarCCM+ uses an old ucx and calls /usr/bin/ucx_info. At some point during startup, it fails when it is not able to find libibcm.so.1 when using our custom OpenMPI. Perhaps there is a way to force StarCCM+ to look for ucx_info on the system, but I have not found any way to do this.

To have StarCCM+ ignore its own ucx, simply remove the ucx from the installation tree and replace it with an empty directory.

$ sudo  rm -rf /opt/STAR-CCM+14.06.012/ucx/1.5.0-cda-001/linux-x86_64*
$ mkdir -p /opt/STAR-CCM+14.06.012/ucx/1.5.0-cda-001/linux-x86_64-2.17/gnu7.1/lib

This is not needed on OSes such as CentOS 6 and CentOS 7, because they still provide the deprecated library libibcm.so.1.

Time to write the job-script

#!/bin/bash
#SBATCH -J starccmref
#SBATCH -N 2
#SBATCH -n 80
set -o xtrace
set -e

# StarCCM+
export PATH=$PATH:/opt/STAR-CCM+14.06.012/star/bin

# OpenMPI
export OPENMPI_DIR=/opt/openmpi-4.0.4
export PATH=${OPENMPI_DIR}/bin:$PATH
export LD_LIBRARY_PATH=${OPENMPI_DIR}/lib

# Kill any leftovers from previous runs
kill_starccm+

export CDLMD_LICENSE_FILE="27012@license.server.com"
SIM_FILE=SteadyFlowBackwardFacingStep_final.sim
STAR_CLASS_PATH="/software/Java/poi-3.7-FINAL"
NODE_FILE="nodefile"

# Assemble a nodelist using this python lib
hostListbin=/software/hostlist/python-hostlist-1.18/hostlist
$hostListbin --append=: --append-slurm-tasks=$SLURM_TASKS_PER_NODE -e $SLURM_JOB_NODELIST >  $NODE_FILE
# Start
starccm+ -machinefile ${NODE_FILE} \
         -power \
         -batch ./starccmSim.java \
         -np $SLURM_NTASKS \
         -ldlibpath $LD_LIBRARY_PATH \
         -classpath $STAR_CLASS_PATH \
         -fabricverbose \
         -mpi openmpi \
         -mpiflags "--mca pml ucx --mca btl openib --mca pml_base_verbose 10 --mca mtl_base_verbose 10"  \
         ./SteadyFlowBackwardFacingStep_final.sim
# Kill off any rogue processes
kill_starccm+

You probably need to modify the above script for your own environment, but the general pieces are there.

Submit to slurm

You want this job to run on multiple machines, so in my case I use -n 80 to allocate 2×40 cores, which slurm understands to mean allocating the two nodes I have used in this example. If you have fewer or more cores than I do, use a 2×N number in your submission.

$ sbatch -p debug -n 80 ./starccmubuntu.sh

You can watch your Infiniband counters to see that a significant amount of traffic is sent over the wire, which indicates that you have succeeded.

watch -d cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_packets

I’ve presented at Ubuntu Masters about the setup I use to work with my systems, which allows me to do things like this easily. Here is a link to that material: https://youtu.be/SGrRqCuiT90

Here is another similar walkthrough of doing this with the CFD application Powerflow: https://eriklonroth.com/2020/07/15/running-powerflow-on-ubuntu-with-slurm-and-infiniband/

Workshop open source for the future

Come join the Edgeryders global festival and create an open source future for the coming generations of internet citizens.

Our theory of change is building dense networks around people trying to tackle messy social-ecological, economic and political challenges. Having a dense network around you is one of the hallmarks of social capital, and gives you access to expertise, resources, skill sharing and financing. In a dense network, there are many ways for you to get from one point to another. A key challenge here is for people with aligned interests to find each other, and as a community we’re out to solve just that.

If you, like me, think open source matters – exceedingly much, in a digital world – this is something for you.

The text below was originally written in Swedish; the English version is also found here: Teaching Teachers Open Source


https://festival.edgeryders.eu/

When: November 28th – 10:00–17:00
Where: Södra Hamnvägen 9 (Hus Blivande, directions here)
How: Lunch and coffee are included.

You get a ticket like this:

  1. Visit https://tell.edgeryders.eu/festival/ticket and fill in the form.
  2. Create an Edgeryders account and introduce yourself in the forum.
  3. Once you have introduced yourself in the forum, a ticket will be sent to you via email.

Background to the workshop

The internet, computers and programming immediately become part of our children’s lives today. Coming generations will never have experienced a world without the internet and computers.

In the education system, they are introduced to tools, ideas and concepts derived from the digital world. It is in school that we form much of our world view, and today most of it is closed, even though it should be open.

What really happens to knowledge in a digital future if a closed view of software and the internet is left unchallenged in school?

Free and open source software is characterized by transparency, collaboration, freedom and empowerment: established cornerstones of our education system. So why don’t we practice what we preach in this area, which will fundamentally shape our children?

The hypothesis is that the school world lacks depth and knowledge of how free and open source code plays a central role in how knowledge about the digital world works. Educating teachers and leaders in this area is how we help coming generations shape the next generation of internet citizens.

If you already work in the school world, your participation is welcome, and if you come from an open source engagement and the software world, your knowledge and experience is central to shaping the content.

What will you experience?

You will get the chance to learn about the cornerstones of free & open source software and how they are fundamental to knowledge in a digital world.

You will get the opportunity to share your knowledge of schools, software or both, and to learn from others in turn.

You will get to collaborate with experts and enthusiasts on how we reach out to the school system.

You will meet people with the same burning interest in shaping coming generations’ internet, in an exciting environment.

Magic with Juju

I’ve been working on a set of tutorials over the last few months on how to write “juju charms”.

If you haven’t checked out Juju yet – do it now.

Juju is an open source framework to operate small and large software stacks in any cloud environment. It works fairly transparently on well-known clouds like AWS, Google Compute Engine, Oracle Cloud, Joyent, RackSpace, CloudSigma, Azure, vSphere, MAAS, Kubernetes, LXD and even custom clouds!

I touched base with Juju some 5 years ago and immediately understood that this was going to make an impact in the computer science domain. The primary reasons, to me, were:

  1. Open Source all the way.
  2. No domain specific languages required to work with it. E.g. use your preferred programming language to do your Infrastructure As Code / DevOps.
  3. Smart modular architecture (charms) that allows re-use and evolution of best solutions in the juju ecosystem.
  4. “Fairly” linux agnostic – although the community is heavy on Ubuntu.
  5. Cloud agnostic – you can use your favorite public cloud provider or use your own private cloud, like an openStack or even VMware environment. Huge benefit to anyone looking to move into cloud.
  6. Robust and maintained codebase backed by Canonical – if enterprise support matters to you. There are other companies, like OmniVector Solutions, also delivering commercial services around Juju if you don’t like Canonical. [Disclaimer: I’m biased in my affiliation with OmniVector]
  7. A healthy, active & friendly open source community – if you, like me, think a living open source community is a key factor to success.

The above factors weigh heavily for me, and even though there are a few competitors to Juju, I think you can’t really ignore it if you are into DevOps, IaC, CI/CD, clouds, etc.

Juju means “magic”, which is an accurate description of what you can do with this tool.

So, here are a few decent starting points from my collection for getting started with Juju. Knock yourself out.

The juju community

Getting started with juju.

Tutorial “Getting started with juju hooks”.

Tutorial “Getting started with reactive charms”

Tutorial “Charms with snaps”

Massive open source win!

Now, this is good news!

The OpenChain project (backed by The Linux Foundation) helps companies all over the world comply with good practices for dealing with open source. In effect, this is fundamental to how they do business together within the software realm.

On December 6, 2018, a large number of major industry players made a massive joint announcement regarding this project!

I’ve been working for over 10 years, slowly pulling my employer Scania towards this path – and now, we are there! Have a look at the map and you will also see the global reach. Given the massive support from huge players in many domains, it is foreseeable that most companies will need to follow suit, and I’m helping many more in the process.

What does this mean to you?

If you are working for any software company and want to sell to or do business with any OpenChain-compliant company in the future, you need to show that your company is following the OpenChain standard, or you won’t be able to do business with us. The only exception will be if none of your code is open source, which in our world is more or less impossible.

OpenChain can be seen as somewhat similar to an ISO certification for software. This change will of course need a transition period, where exceptions will be needed for sorting out proprietary messes, but we are getting there.

This framework will, in time, affect all software businesses, probably all over the world. The initiative comes not from politics, but from a strong and healthy private sector, and is a massive win for open source in the whole world. As the chairman of open source at Scania, and its representative within the Volkswagen group (640,000 employees), I’m more than happy to announce this message!

Now, go open source! Share this message with everyone who works with software.

Develop or die

I’ve been the technical lead for an HPC center for over a decade now, and for some 15 years I have watched paradigms come and go within the systems engineering domain. This text is about the future development of HPC in relation to the more general software domain. There are some very tough challenges ahead and I’ll try to explain what is going on.

But first, let me get you into my line of thinking with a bit of my own story.

Turning forty this year, exploring and experimenting within computer science has landed me in a very peculiar place. People reach out for advice on very difficult matters around building all sorts of crazy computer systems. Some set out to build successful businesses, others just want to build robots. I help out as much as I can, although my time is normally just gone. It’s a weird feeling to be referred to as senior. Where did those years go?

My experience in building HPC systems started in the early 2000s, and from a computer science perspective I’ve tried to stay aligned with the development in the field. This is never easy in large organizations, but having an employer bold enough to let me develop the field has made it possible not only to deliver enterprise-grade HPC for over a decade – but also to develop some other parts of it in exciting ways.

It didn’t always look like this of course.

I’ve been that person, constantly promoting and adopting “open source” wherever I’ve gone. A concept that, at least in automotive (where I mostly roam), was pariah (at best) in my junior years. I came into contact with open source and Linux during my university studies and was convinced open source was going to take over the world. I could see Linux dominating the data centers from the single fact that the licensing model would allow it. There are of course tons of other reasons for that to happen, but the licensing model in itself was decisive.

I’m sorry, but a license model that doesn’t scale better than O(n) is just doomed in the face of O(1).

(I think I’m a nightmare to talk to sometimes.)

Anyway, telling my story of computer science and systems engineering can’t be done without reflecting on some technology elements and methodologies that came about during the past decades. After all, important projections on systems engineering can’t be made without referencing technology paradigms.

In the mid-to-late 1990s, “Unix sysadmins” ruled the domain of HPC. Enterprise systems outside of HPC were Microsoft Windows. For heavyweight systems, VMS and UNIX-dialect systems prevailed. Proprietary software was what mattered to management (still is?), and everything was home-rolled. The concept of getting “locked in” by proprietary systems was poorly understood, which was reflected in a poor understanding of real costs and of what the ability to build high-quality systems depends on. Building HPC systems at this time was an art form. Web front ends didn’t exist. Most people were self-taught and HPC was very different from today. My first HPC cluster (for a large company) lived on wooden shelves from IKEA in a large wardrobe. The CTO of the IT division was a carpenter. I’ll refer to this era as the “Windows/Unix Era”.

The Windows/Unix Era started to morph into the Linux revolution during the 2000s. In just a few years, available software and infrastructure (and not only for HPC) was getting more and more Linux. Other OS dialects such as HP-UX, VMS, AIX, Solaris and mainframes ended up in phase-out, witnesses of an era slowly disappearing. Linux on client desktops entered the stage, and at this point the crafting of HPC systems was becoming an art of automation. Macs got thinner.

Sysadmin work meant, in practice, scripting: bash, ash, ksh, etc., plus cfengine, ansible and puppet. Computer science was starting to become quasi-magic (which is how I think most people perceive computers in general) and advanced provisioning systems became popular. Automation!

I was myself using Rocks Clusters for my HPC platform, with some Red Hat Satellite assistance to achieve various forms of automation. The still ever-popular DevOps appeared, mainly in the web services domain, and some people started to do Python in favor of bash (OMG!). Everything was now moving towards open source. It became impossible to ignore this development, also at my primary employer. I became the chairman of the open source forum and slowly started to formalize that field in a professional way. A decade behind giants such as Canonical, IBM or Red Hat, but hey – automotive is not IT, right?

However, time and DevOps development stood still in the HPC domain. I blame the performance loss in virtualization layers for that situation, or perhaps HPC being a victim of its own success. While VMware, Xen, KVM etc. filled up the data centers in just a few years, virtualization never took off in HPC. Half a decade passed like this, and so-called “clouds” started to appear.

I personally hate the term cloud. It blurs and confuses the true essence, which is perhaps better put as: “someone else’s computer”.

Recent developments in the cloud have exposed one of the core problems with cloud resources. I’m not going to make the full case here, but generally speaking, the access, integrity, safety and confidentiality of any data is almost impossible to safeguard – if you are using someone else’s computer. There is a fairly easily accessible remedy, which I’ll throw up a fancy quote for:

“Architect distributed systems, keep your data local & always stay in control over your own computing. Encrypt”. 

I think time will prove me right, and I’ve challenged that heavily by taking an active part in a long-running research project, LIM-IT, which is set out to map and understand lock-in effects of various kinds. The results from this project should be mandatory education for any IT manager who wants to stay on top of their game in my industry.

From a technology point of view, I can recommend looking at projects like Nextcloud, OpenStack, BitTorrent, etc., but there are many, many others that operate with this great mindset.

So, where are we now? First of all, HPC is in desperate need of ingesting technology from other domains, especially low-hanging fruit like that from the Big Data domain. Just to take one conceptual example: making use of TCP/IP communication stacks to solve typical HPC problems. One of my research teams explored this two years ago by implementing a pCMC algorithm in Spark as part of a thesis program. The intention was to explore and prove that HPC systems indeed serve as excellent hybrid solutions to tackle typical data analytics problems, and vice versa. I’m sure MPI doctors will object, but frankly, that’s a lot of “Not Invented Here” syndrome. The results from our research thesis spoke for themselves. Oh, the code can be downloaded here under an open source license so you can try it yourself. (Credits to Tingwei Huang and Huijie Shen for excellent work on this.)

Now, there is a downside to everything, and so also when opening up for a completely new domain of software stacks. Once you travel down the path of revamping your HPC environment with new technologies from, for example, “cloud” (oh, how I hate that word) and going all in on “Everything as a Service”, you are basically faced with an avalanche of technology: Hadoop, Spark, ELK, OpenStack, Jenkins, K8s, Docker, LXC, Travis, etc. All of these stacks require years, if not decades, to learn. There exist few, if any, people who master even a few of these stacks, and their skills are highly desired on a global market for computer wizards. Even more true is that they’re probably not on your payroll, because you just can’t afford them.

So, as a manager in HPC, or in some other complex enough IT environment, you face a Gordian knot of a problem:

“How do I manage and operate an IT-environment of vast complexity PLUS manage to keep my cost under some kind of control?”

Mark Shuttleworth talks brilliantly about this in his keynotes nowadays, by the way, and I’m happy to repeat it.

Most managers will unfortunately fail facing this challenge, and they will fail in three ways:

  1. Managers in IT will fail to adopt the right kind of technologies and get “locked in” on either data-lock-in, license-lock-in, vendor-lock-in, technology-lock-in or all of them – ending up in a budget disaster.
  2. Managers in IT will fail to recruit skilled enough people, or recruit too many with the wrong skill set – they will fail to deliver relevant services to the market.
  3. Managers in IT will do both of the above – delivering sub-performing services too late to market, at extreme costs.

If I’m right in my dystopian projections, we will see quite a few companies go down within IT in the coming years. The market will be brutal to those failing to develop. Computer Scientists will be able to ask for fantasy salaries and there will be a latent panic in management.

I’ve spent a significant amount of time researching and working on this very challenging problem, which is general in nature but definitely applicable to my own expert domain in HPC. In a few days, I’ll fly out to get the opportunity to present a snapshot of my work at one of Europe’s largest HPC centers, HLRS in Germany. It’s happy times indeed.

My native ipv6 – part 1

So, I’ve spent about two years arguing with my network provider Fibra to enable native IPv6 on my high speed internet connection. A struggle that paid off in January 2018. I was kindly offered to be included in their IPv6 PoC and I was thrilled.

Thank you Fibra for finally coming around on this.

I’m no guru on IPv6, but I feel comfortable navigating the fundamentals and have been running IPv6 tunnels for many years now. This is my, hopefully short, blog series on getting native IPv6 working at home.

My ambition at this point is to have a public IPv6 address assigned from my network provider, along with a routed 48-bit network prefix which will be split into a 64-bit prefix delegation for my home network.

This seems like a very basic setup at the time of writing and suits my ambitions for now. Later, I will be running separate ipv6 prefixes on my other virtual environments, while keeping the home network separate.

My current WAN access setup is an OpenWrt Chaos Calmer TP-Link router, with the WAN interface obtaining public IPs from Bahnhof in Sweden. Fibra is the network provider that, for the purpose of the PoC, has turned on the switch port for IPv6 traffic for me.

For this to make sense to a reader, here is an extract of /etc/config/network that serves as a starting point for a native IPv6 setup.

config interface 'wan'
 option ifname 'eth0.2'
 option proto 'dhcp'
 option ipv6 '1'

config interface 'wan6'
 option _orig_ifname 'eth0.2'
 option _orig_bridge 'false'
 option ifname 'eth0.2'
 option proto 'dhcpv6'
 option reqaddress 'try'
 option reqprefix 'auto'

Bringing the interface down and up again should leave you with an IPv6 address on the eth0.2 interface. I did that maneuver, but no IPv6 address was handed out to me.

Basic debugging was needed; you would likely do the same if you get this result.

I started by turning off the firewall (to be sure no traffic is dropped before it hits the TCP stack):

$ /etc/init.d/firewall stop

Then I ran tcpdump looking for IPv6 ICMP traffic. I expected to see some ICMPv6 traffic, which I did. (In an IPv6 network, ICMP traffic is chatty, which is a bit different from what you might be used to from IPv4 networks.)

$ tcpdump -i eth0.2 icmp6

I could see Router Advertisement messages from an interface not on my devices, so something was working on Fibra’s side. I needed to verify that my router actually sent out DHCPv6 messages, and look for the replies that should follow.

So, I forced the IPv6 DHCP client (odhcp6c) to send router solicitation messages out on the interface I wish to handle the native IPv6 stack.

$ odhcp6c -s /lib/netifd/dhcpv6.script -Nforce -P48 -t120 eth0.2

[Image: tcpdump output (tcpdumpv6)]

Nope, no reply from my network provider.

What should happen here is that the DHCPv6 server on my network provider Fibra’s side should respond with a configuration packet (Advertise), but as you can see – it’s just spamming RA packets, seemingly ignoring my client’s Router Solicitations. This is not the way it’s supposed to be, and I sent Fibra the information above.

To be continued.

Update 1:

For the curious reader, a correct dhcpv6 exchange (RFC 3315) should look like this:
Client -> Solicit
Server -> Advertise
Client -> Request
Server -> Reply

Update 2:

Also, a note from my IPv6 guru friend Jimmy: if you need to see more details on the IPv6 DHCP messages coming from odhcp6c, you can expand the tcpdump filter like this:

$ tcpdump -i eth0.2 icmp6 or port 546 or 547

You should see packets coming out from your interface like:

IP6 fe80::56e6:fcff:fe9a:246a.dhcpv6-client > ff02::1:2.dhcpv6-server: dhcp6 solicit

My blog

I’ll add some blogging here soon.