Friday, 22 January 2016

Port forwarding with systemd

I wanted to run a daemon on a remote shell server and connect to it remotely.
Specifically it's a quassel core which I want to run inside the vpn.

Problem 1: The shell server, quite sensibly, doesn't allow external connections.

Problem 2: The shell server, less sensibly, doesn't permit ssh local port
forwarding. I'll demonstrate why this doesn't make a great deal of sense on a
shell server (spoiler: you can do it anyway).

Assumption: When appropriately connected and authenticated, I can ssh into the
shell server without entering a password.

My first thought was obviously ssh port forwarding:
$ ssh -L 5555:localhost:5555 -Nf
But no:
channel 3: open failed: administratively prohibited: open failed
Boo! nc is available on the shell server, though, and if it wasn't I'd just
copy it there:
$ cat > ~/bin/ <<EOF
exec ssh -- nc localhost 5555
EOF
$ nc -l 5555 --keep-open -e ~/bin/
This runs a local nc process which acts like inetd. When it gets a connection
on 5555 it runs the script, which sshes to the shell server and runs nc
remotely to connect to my daemon on remote localhost port 5555. Now, I can
point my quassel client at localhost 5555 and it will create the ssh
connections automatically. Which is nice.
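The inetd pattern above is simple enough to sketch in a few lines of Python
(a hypothetical illustration, not part of the setup itself): accept a
connection, then hand the connected socket to a child process as its stdin
and stdout.

```python
import socket
import subprocess

def inetd_listen(host="127.0.0.1", port=0):
    """Create a listening TCP socket; port=0 picks a free port."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    return srv

def handle_one(srv, argv):
    """Accept one connection and run argv with the connected socket as its
    stdin and stdout, which is what nc's -e option does per connection."""
    conn, _ = srv.accept()
    proc = subprocess.Popen(argv, stdin=conn.fileno(), stdout=conn.fileno())
    conn.close()  # the child keeps its own copy of the descriptor
    return proc
```

Calling handle_one in a loop with argv set to the ssh command above (with a
real hostname in place) would reproduce the nc invocation.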

But wait! We can do better than that. I'd still need to remember to fire up nc
when I log in to my workstation, and that's way too much like hard work.
systemd to the rescue!

On an appropriately configured workstation (my Fedora 23 workstation does this
by default), systemd will run both a system daemon, and user daemons for any
users with active sessions. systemd also does the inetd thing, so I can just
write my own systemd unit for it:
$ mkdir -p ~/.config/systemd/user 
$ cat > ~/.config/systemd/user/quassel.socket <<EOF
[Unit]
Description=Proxy quassel connections

[Socket]
ListenStream=5555
Accept=yes

[Install]
WantedBy=sockets.target
EOF
$ cat > ~/.config/systemd/user/quassel@.service <<EOF
[Unit]
Description=Proxy Quassel connections

[Service]
StandardInput=socket
ExecStart=-/usr/bin/ssh -- nc localhost 5555
EOF
$ systemctl --user daemon-reload
$ systemctl --user enable quassel.socket
$ systemctl --user start quassel.socket
And that's it! Whenever I'm logged in to my workstation, connected to the vpn,
and appropriately authenticated, connections to localhost 5555 will be
automagically proxied over ssh to my quassel core daemon.

Tuesday, 15 September 2015

Taking a Nova Compute host down for maintenance

There are times when you want to take an OpenStack Nova compute host down, for example to upgrade the host's OS or hardware. Of course, in a perfectly cloudy world you'd just pull the plug and your massively distributed app would automatically reconfigure itself around the missing bits. In practice, though, you might want something a little less disruptive.

I wrote nova-compute-maintenance to do a best-effort evacuation of a nova compute host prior to taking it down. This tool first disables the target nova compute service in the scheduler, which will prevent any new instances from being scheduled to it. Then it attempts to live migrate all instances on the host to other hosts, leaving the decision about where to the scheduler.

Usage is simple. It uses the python nova client library, and takes its authentication credentials from the environment the same way the nova command line client does:

$ source keystonerc_admin

For simplicity, they can only be specified via the environment, not via the command line. It takes a small number of command line options:

usage: [-h]
       [--max-migrations MAX_MIGRATIONS]
       [--poll-interval POLL_INTERVAL]

At its simplest, the invocation is just:

$ ./

The tool is quite chatty. It will initially display a list of all instances it found:

Found instances on host:
  foo-7ac59434-1e45-47a3-bc84-8e39dd9562e8(7ac59434- ...
  foo-16d59bc9-c072-4342-bd85-18fa3b8aa47a(16d59bc9- ...
  foo-4a8d3e3d-de86-4bf3-83f6-9d57ba76a7af(4a8d3e3d- ...
  foo-8a729a41-b59f-470d-ba69-dfa5adae6384(8a729a41- ...
  foo-d8dccfe4-2ec7-4e5e-98d0-de453ef2cae8(d8dccfe4- ...
  foo-325f5d60-ea64-4141-9c20-d74cb7796578(325f5d60- ...
  foo-3e8ce1b7-3ffb-4cbd-92b8-0139ae726f1a(3e8ce1b7- ...
  foo-6293c837-f40d-40b9-b90c-66dc34acc114(6293c837- ...

It will initiate up to a fixed number of migrations at any one time. By default this is 2, but this can be adjusted for the capabilities of your system with the --max-migrations argument. It polls nova regularly to monitor the status of these migrations, and starts new ones if required. It displays its current status every time it polls:
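The core decision in that polling loop can be sketched as a pure function
(hypothetical and simplified from the description above, not the tool's
actual code): given the instances still on the host, the set currently
migrating, and the cap, pick which migrations to start next.

```python
def select_migrations(instances, migrating, max_migrations):
    """Pick which instances to start live-migrating next.

    instances: dict of instance name -> state for everything on the host
    migrating: set of instance names with a migration already in flight
    max_migrations: cap on concurrent migrations (the tool's default is 2)
    """
    slots = max(max_migrations - len(migrating), 0)
    # This sketch only considers ACTIVE instances; sorted() just makes
    # the choice deterministic.
    candidates = [name for name, state in sorted(instances.items())
                  if state == "ACTIVE" and name not in migrating]
    return candidates[:slots]
```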

  foo-7ac59434-1e45-47a3-bc84-8e39dd9562e8(7ac59434- ...
  foo-16d59bc9-c072-4342-bd85-18fa3b8aa47a(16d59bc9- ...

On completion it displays success or failure. In this case the evacuation failed. There is 1 instance left on the host, and it is in the ACTIVE state.

Failed to migrate the following instances:
  foo-6293c837-f40d-40b9-b90c-66dc34acc114(6293c837- ...: ACTIVE
See logs for details

The tool is idempotent, so if it fails it's completely safe to run it again:

Found instances on host:
  foo-6293c837-f40d-40b9-b90c-66dc34acc114(6293c837- ...
  foo-6293c837-f40d-40b9-b90c-66dc34acc114(6293c837- ...
Success: No instances left on host

You can test the success or failure of the script by its exit code, which follows the usual convention: zero for success, non-zero for failure.

The tool is conservative by default: it will not do anything disruptive to an instance. This means that there are certain instances which it cannot handle automatically. These include instances which are paused, being rescued, or in the error state. If the host has any instances in these states, the tool will migrate all other instances, but leave these in place. As above, the tool will report failure and list all remaining instances and their states.

There is 1 case where the tool will disrupt an instance. If you specify --cold-fallback on the command line and the tool fails to live migrate an instance 3 times, it will fall back to trying a cold migration, which shuts the instance down for the duration of the migration. By default, if live migration fails the tool leaves the instance alone and reports it as a failure.
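That fallback policy is easy to state as a function (again a sketch with
hypothetical names, not the tool's actual code):

```python
def next_action(live_attempts, cold_fallback):
    """Decide what to do with an instance after `live_attempts` failed
    live migrations, given whether --cold-fallback was specified."""
    if live_attempts < 3:
        return "live-migrate"    # keep retrying live migration
    if cold_fallback:
        return "cold-migrate"    # disruptive: instance is shut down
    return "report-failure"      # default: leave the instance alone
```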

I have developed this tool against Red Hat Enterprise Linux OpenStack Platform 5, which is based on OpenStack Icehouse. I would expect it to work against subsequent versions, too.

N.B. This tool's functionality overlaps significantly with the Nova client's host-evacuate-live command, although this tool is considerably more robust. It is my intention to roll the functionality of this tool into Nova itself, or failing that into a more robust command in the Nova client. This external tool is intended to bridge the gap until that lands.

Tuesday, 7 July 2015

Don't cache repomd.xml when using squid

If, like me, you try to ensure all your package update traffic goes through a caching proxy, you may occasionally have hit an issue where yum/dnf is trying to download repo metadata which doesn't exist. This can happen because your proxy is caching repomd.xml, and the cached version refers to metadata which has since been deleted.

Fortunately repomd.xml is normally very small, typically less than 4k, so the simplest solution is just not to cache it. You can do that by adding the following stanza to squid.conf:
acl repomd url_regex /repomd\.xml$
cache deny repomd
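url_regex matches an unanchored regular expression against the full request
URL, so the pattern above catches repomd.xml in any repository path while
leaving the other (hash-named, immutable) metadata files cacheable. A quick
sanity check of the pattern, here in Python's regex dialect with a
hypothetical mirror URL:

```python
import re

# The same pattern used in the squid.conf stanza above
repomd = re.compile(r"/repomd\.xml$")

assert repomd.search("http://mirror.example.com/fedora/repodata/repomd.xml")
assert not repomd.search("http://mirror.example.com/fedora/repodata/abc123-primary.xml.gz")
```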

Thursday, 18 June 2015

Git pull through a proxy

I found myself this afternoon trying to do a git pull from an internal host with a self-signed (or at least, signed by an authority I don't have locally) certificate. I also needed to do this through a proxy. This required a git incantation I hadn't come across before, and also the trusty https_proxy environment variable:

  https_proxy=http://proxy:3128/ \
  git pull refs/changes/X/Y/Z

Thursday, 11 June 2015

Diffing diffs with bash process substitution

I found myself wanting to examine the differences between different git commits. Specifically these commits represent the same change applied to 2 different branches, so I'm interested in what changes the backport author had to make. I was initially using temporary files for this, but stumbled across this bash gem:
$ diff -u <(git show [original]) <(git show [backport])
Note that this isn't your regular redirection, because I'm passing 2 filenames to diff. This is bash's process substitution syntax. It essentially runs a command and substitutes the path of a temporary named pipe connected to the command.
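Process substitution isn't specific to git, of course. Here's a minimal
example you can try anywhere (the file names are hypothetical):

```shell
# Compare the sorted contents of two files without creating temporaries.
printf 'b\na\n' > /tmp/left
printf 'a\nb\n' > /tmp/right

# Each <(...) expands to a path like /dev/fd/63 connected to the command,
# so diff sees two ordinary-looking filename arguments.
diff <(sort /tmp/left) <(sort /tmp/right) && echo "identical when sorted"
```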

Stick it in a bash function, and you get:
function diffdiff() {
    diff -u <(git show "$1") <(git show "$2") | less
}
Now I can run:
$ diffdiff 01234567 89abcdef
and see my gloriously diffed diffs.

Wednesday, 10 June 2015

Configure a simple Galera cluster on Fedora

This post documents the most basic possible configuration of a 2-node Galera cluster on Fedora 22. The resulting cluster is good enough for playing with Galera, but should not be considered ready for production.

The required tasks are:
  • Install Fedora and Galera packages
  • Configure firewall
  • Configure SELinux
  • Configure Galera
  • Start Galera cluster

Install Fedora and Galera packages

For the purposes of testing I am using 2 virtual machines, each with 1 CPU, 1 NIC, 2GB RAM and a single 5GB disk. I have installed both of these with Fedora 22 Server using the minimal install + standard packages as defined in the installer. Configure networking such that both nodes can reach each other and DNS is working correctly.

Install Galera and its dependencies with dnf:
# dnf install mariadb-galera-server

Configure firewall

Galera uses the following TCP ports:
  • 3306 for MySQL network connections
  • 4567 for cluster traffic
  • 4444 for state snapshot transfer using rsync
Note that it does not keep the rsync port open, as it is only used when joining the cluster. Other backends may use different ports.

Open the above ports in the firewall with:
# firewall-cmd --add-port 4567/tcp --add-port 4444/tcp
# firewall-cmd --permanent --add-port 4567/tcp --add-port 4444/tcp
# firewall-cmd --add-service mysql
# firewall-cmd --permanent --add-service mysql

Configure SELinux

Note that the default targeted SELinux policy shipped with Fedora 22 will currently prevent Galera from starting on all but the first node. I have reported this in Fedora bug 1229794. There is a reasonable chance that by the time you read this the bug will have been fixed, or the steps below replaced with an SELinux boolean. Please check first.

Copy the following into a file called galeralocal.te:
module galeralocal 1.0;

require {
    type rsync_exec_t;
    type mysqld_safe_exec_t;
    type kerberos_port_t;
    type mysqld_t;
    class tcp_socket name_bind;
    class file { getattr read open execute execute_no_trans };
}

#============= mysqld_t ==============
allow mysqld_t kerberos_port_t:tcp_socket name_bind;
allow mysqld_t mysqld_safe_exec_t:file getattr;
allow mysqld_t rsync_exec_t:file { read getattr open execute execute_no_trans };
Install the required tools to compile and install local SELinux policy:
# dnf install checkpolicy policycoreutils-python
Compile and load the custom SELinux policy:
# checkmodule -M -m -o galeralocal.mod galeralocal.te
# semodule_package -o galeralocal.pp -m galeralocal.mod
# semodule -i galeralocal.pp
The module must be installed on all nodes. However, after compiling the module on 1 node it is sufficient to copy galeralocal.pp to the other and just install it with semodule, meaning it isn't necessary to install the additional packages.

Configure Galera

Fedora puts the default galera configuration in /etc/my.cnf.d/galera.cnf. For the purposes of this setup we leave almost everything untouched. We need to set wsrep_provider to the Galera library, comment out wsrep_sst_auth (which is not used by the rsync sst method), and set wsrep_cluster_address to a list of both nodes. Make these changes on both nodes.
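For illustration, the relevant lines in /etc/my.cnf.d/galera.cnf might end up
looking something like the following sketch (the node names are hypothetical;
the provider path is where the Fedora package installs the Galera library):

```
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://node1.example.com,node2.example.com"
#wsrep_sst_auth=root:
```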

Starting the cluster

With the above configuration we are almost ready to start the cluster. However, if you try to start the database you will notice that it fails. At startup, the database will attempt to connect to any of the hosts listed in wsrep_cluster_address other than itself, to get the current database state. If it can't do this, it can't safely join the database. However, if none of the database nodes are running we have a bootstrapping problem.

To get round this, after ensuring that the database is most definitely not running anywhere, we edit the configuration on the node with the most up to date state to tell it to start without contacting any other database node. When initialising the cluster this can be any node.

Edit /etc/my.cnf.d/galera.cnf again on that node, setting wsrep_cluster_address to the empty address gcomm://, which tells the node to bootstrap a new cluster rather than join an existing one. With this in place, start the database with:
# systemctl start mariadb
Once the database has come up, you should put wsrep_cluster_address back to its original setting immediately. The bootstrap setting allows the database to come up without synchronising, so if the database is, in fact, already running somewhere else, leaving it in place can result in diverged state.

With 1 node running, you can now start all the other nodes with:
# systemctl start mariadb

Using the database

You now have a naively configured multi-master Galera cluster. Connect to mysql on any node and use it as normal:
# mysql
MariaDB [(none)]> create database foo;
Query OK, 1 row affected (0.00 sec) 
MariaDB [(none)]> connect foo 
MariaDB [foo]> create table bar (id int auto_increment primary key);
Query OK, 0 rows affected (0.02 sec)
New databases and data will be propagated immediately to all other nodes, and updates can be made on any node.