<h1><a href="http://dischord.org/2024/03/07/debian-12-testing-x1-nano-g2-setup-notes">Debian testing / X1 Nano G2 setup notes</a></h1>
<p><em>2024-03-07</em></p>
<p>I recently picked up a second <a href="https://www.lenovo.com/gb/en/p/laptops/thinkpad/thinkpadx1/thinkpad-x1-nano-gen-2-(13-inch-intel)/len101t0008">ThinkPad X1 Nano</a>, a Gen 2 device, ostensibly to replace my older but much loved Gen 1 machine. It doesn’t really need replacing since it’s still a great little machine and easily my preferred laptop when travelling, but the deal I found for the Gen 2 was too good to pass up so here we are.</p>
<p>Both this and the G1 run flawlessly under Linux: everything is supported, even firmware updates. It’s relatively power-efficient (I’m aware that the G2 isn’t as good in this regard), and fast despite a relatively weedy CPU.</p>
<p>This time I thought I’d dump some notes on how I configure Debian on my machines, as well as anything specific required for this device.</p>
<h1 id="hardware">Hardware</h1>
<p>It’s basically the ‘poverty’ spec model with an Intel i5-1240P CPU, 16GB RAM, and a 256GB disk. All fine apart from the disk; fortunately that’s one of the parts that is user-serviceable and thus upgradeable. Unfortunately it’s a 2242 NVMe, and a single-sided one at that, which as it turns out is pretty rare. The good news is that you can instead fit a 2230 - which are in plentiful single-sided supply - if you use a <a href="https://www.amazon.co.uk/dp/B0BLH4WHS3?psc=1&ref=ppx_yo2ov_dt_b_product_details">simple adapter</a> <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. I ordered that plus a 1TB Corsair <a href="https://www.amazon.co.uk/dp/B0C28HLKNB?psc=1&ref=ppx_yo2ov_dt_b_product_details">MP600</a> since the reviews for this drive were positive for both performance as well as power consumption.</p>
<p>The other key upgrade is replacing the default TrackPoint cap with <a href="https://www.etsy.com/listing/862901861/3mm-high-softrim-type-3d-printed-caps">one of these</a>.</p>
<p>At that point it was ready for a nice fresh install of Debian. Here’s some rough notes.</p>
<h1 id="install">Install</h1>
<p>Guided, but then delete default <code>/</code>, <code>swap</code>, and <code>/home</code>.</p>
<ul>
<li>Add <code>120GB root</code></li>
<li>Create encrypted volume for remainder</li>
<li>Create partition in encrypted volume for <code>/home</code></li>
<li>Do not add swap (we’ll add a swap file post install)</li>
</ul>
<h2 id="update-to-testing">Update to <code>testing</code></h2>
<p>If you didn’t install from a <code>testing</code> installer image, then:</p>
<p>Edit <code>/etc/apt/sources.list</code> and change all <code>bookworm</code> references to <code>testing</code></p>
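<p>A quick way to do that (check the result afterwards):</p>
<pre><code>sudo sed -i 's/bookworm/testing/g' /etc/apt/sources.list
</code></pre>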
<pre><code>sudo apt-get update
sudo apt-get -y dist-upgrade
</code></pre>
<p>Reboot once done.</p>
<h2 id="reconfigure-terminal-font">Reconfigure Terminal font</h2>
<pre><code>sudo dpkg-reconfigure console-setup
</code></pre>
<p>UTF8 - Latin1 - Terminus - 14x28 FB only</p>
<h2 id="liquorix-kernel"><a href="https://liquorix.net/">Liquorix kernel</a>:</h2>
<pre><code>curl -s 'https://liquorix.net/install-liquorix.sh' | sudo bash
</code></pre>
<h2 id="setup-swapfile">Setup swapfile</h2>
<pre><code>sudo fallocate -l 8G /.swap
sudo chmod 0600 /.swap
sudo mkswap /.swap
</code></pre>
<p>Line for <code>fstab</code>:</p>
<pre><code>/.swap none swap sw 0 0
</code></pre>
<p>Then <code>swapon -a</code>.</p>
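<p>For example, to append that line and then enable and check the swap:</p>
<pre><code>echo '/.swap none swap sw 0 0' | sudo tee -a /etc/fstab
sudo swapon -a
swapon --show
</code></pre>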
<h1 id="packages">Packages</h1>
<p>Add <code>contrib</code> to <code>/etc/apt/sources.list</code>, then install the following packages:</p>
<pre><code>git
curl
build-essential
imwheel
btop
syncthing
neovim
zsh
flatpak
gnupg2
powertop
ttf-mscorefonts-installer
gnome-software-plugin-flatpak
libssl-dev
</code></pre>
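<p>These can all be installed in one go, e.g.:</p>
<pre><code>sudo apt install -y git curl build-essential imwheel btop syncthing neovim zsh flatpak \
  gnupg2 powertop ttf-mscorefonts-installer gnome-software-plugin-flatpak libssl-dev
</code></pre>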
<h2 id="flatpak">Flatpak</h2>
<p><a href="https://flatpak.org/setup/Debian">https://flatpak.org/setup/Debian</a>:</p>
<pre><code>sudo apt install flatpak
sudo apt install gnome-software-plugin-flatpak
sudo flatpak remote-add --if-not-exists flathub https://dl.flathub.org/repo/flathub.flatpakrepo
</code></pre>
<p>Then install <a href="https://apps.gnome.org/en-GB/NewsFlash/">Newsflash</a> and <a href="https://obsidian.md/">Obsidian</a> from the GNOME Software store.</p>
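<p>Or from the terminal, assuming these are still the correct Flathub application IDs:</p>
<pre><code>flatpak install -y flathub io.gitlab.news_flash.NewsFlash md.obsidian.Obsidian
</code></pre>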
<h2 id="commercial">Commercial</h2>
<p>1Password: <a href="https://1password.com/downloads/linux/">https://1password.com/downloads/linux/</a></p>
<p>Chrome: <a href="https://www.google.com/intl/en_uk/chrome/dr/download/">https://www.google.com/intl/en_uk/chrome/dr/download/</a></p>
<p>Slack: <a href="https://slack.com/intl/en-gb/downloads/linux">https://slack.com/intl/en-gb/downloads/linux</a></p>
<h2 id="kubernetes">Kubernetes</h2>
<p>kubectl:</p>
<pre><code>curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
</code></pre>
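<p>Then put it somewhere on your <code>$PATH</code>:</p>
<pre><code>sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
</code></pre>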
<p>Krew:</p>
<pre><code>(
set -x; cd "$(mktemp -d)" &&
OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
KREW="krew-${OS}_${ARCH}" &&
curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
tar zxvf "${KREW}.tar.gz" &&
./"${KREW}" install krew
)
</code></pre>
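<p>Krew installs itself and any plugins under <code>~/.krew</code>, so that needs adding to your <code>$PATH</code>, for example in <code>~/.zshrc</code>:</p>
<pre><code>export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
</code></pre>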
<p>Then plugins:</p>
<pre><code>kubectl krew install {ctx,ns,who-can,view-allocations,stern,tree}
</code></pre>
<p>And finally <a href="https://github.com/sbstp/kubie">Kubie</a> for quickly switching kubeconfigs:</p>
<pre><code>wget -O ~/bin/kubie https://github.com/sbstp/kubie/releases/download/v0.23.0/kubie-linux-amd64
chmod +x ~/bin/kubie
</code></pre>
<h2 id="programming-languages">Programming Languages</h2>
<h3 id="go">Go</h3>
<p>Download from <a href="https://go.dev/doc/install">https://go.dev/doc/install</a>, then:</p>
<pre><code>sudo tar -C /usr/local -xzf go1.22.1.linux-amd64.tar.gz
</code></pre>
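<p>And add Go to your <code>$PATH</code>, e.g. in <code>~/.zshrc</code>:</p>
<pre><code>export PATH=$PATH:/usr/local/go/bin
</code></pre>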
<h3 id="pyenv"><a href="https://github.com/pyenv/pyenv">Pyenv</a></h3>
<pre><code>git clone https://github.com/pyenv/pyenv.git ~/.pyenv
cd ~/.pyenv && src/configure && make -C src
pyenv install 3.12.2
pyenv global 3.12.2
</code></pre>
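<p>Note that pyenv also needs wiring into the shell before the <code>pyenv install</code> step above will work; the upstream README suggests something along these lines for <code>~/.zshrc</code>:</p>
<pre><code>export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
</code></pre>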
<p>And then some Python packages:</p>
<h4 id="openstack">OpenStack</h4>
<pre><code>pip install python-{openstack,neutron,nova,octavia,swift,ironic}client
</code></pre>
<h4 id="ansible">Ansible</h4>
<pre><code>pip install -U ansible ansible-lint
</code></pre>
<h3 id="rbenv"><a href="https://github.com/rbenv/rbenv">Rbenv</a></h3>
<pre><code>git clone https://github.com/rbenv/rbenv.git ~/.rbenv
git clone https://github.com/rbenv/ruby-build.git "$(rbenv root)"/plugins/ruby-build
sudo apt -y install libyaml-dev
rbenv install 3.3.0
rbenv global 3.3.0
</code></pre>
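<p>As with pyenv, rbenv needs to be on your <code>$PATH</code> and hooked into the shell before the <code>rbenv</code> commands above will work; per the rbenv README:</p>
<pre><code>echo 'eval "$(~/.rbenv/bin/rbenv init - zsh)"' >> ~/.zshrc
</code></pre>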
<h2 id="neovim">Neovim</h2>
<p>Grab the latest release and extract:</p>
<pre><code>wget https://github.com/neovim/neovim/releases/latest/download/nvim-linux64.tar.gz
sudo tar zxvf nvim-linux64.tar.gz --strip-components=1 -C /usr/local
</code></pre>
<h2 id="helix"><a href="https://helix-editor.com/">Helix</a></h2>
<p>Grab the latest release from <a href="https://github.com/helix-editor/helix/releases">https://github.com/helix-editor/helix/releases</a> and extract to <code>~/bin</code>.</p>
<blockquote>
<p><a href="https://github.com/LGUG2Z/helix-vim">This repo</a> provides useful configuration options when you’re used to vim.</p>
</blockquote>
<h2 id="visual-studio-code">Visual Studio Code</h2>
<pre><code>sudo apt-get install wget gpg
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > packages.microsoft.gpg
sudo install -D -o root -g root -m 644 packages.microsoft.gpg /etc/apt/keyrings/packages.microsoft.gpg
sudo sh -c 'echo "deb [arch=amd64,arm64,armhf signed-by=/etc/apt/keyrings/packages.microsoft.gpg] https://packages.microsoft.com/repos/code stable main" > /etc/apt/sources.list.d/vscode.list'
rm -f packages.microsoft.gpg
</code></pre>
<pre><code>sudo apt install apt-transport-https
sudo apt update
sudo apt install code
</code></pre>
<h2 id="docker">Docker</h2>
<pre><code>sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
bookworm stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
</code></pre>
<pre><code>sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
</code></pre>
<h1 id="shell-setup-and-dotfiles">Shell setup and dotfiles</h1>
<p>Install <a href="https://github.com/tarjoilija/zgen">zgen</a>:</p>
<pre><code>git clone https://github.com/tarjoilija/zgen.git "${HOME}/.zgen"
</code></pre>
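<p>zgen then just needs sourcing from <code>~/.zshrc</code>:</p>
<pre><code>source "${HOME}/.zgen/zgen.zsh"
</code></pre>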
<p>I typically copy across my shell’s <code>.history</code> (yes I know I should use something like <a href="https://atuin.sh/">Atuin</a>) and <code>git clone</code> my <a href="https://github.com/yankcrime/dotfiles">dotfiles</a>, symlinking various things into place.</p>
<h1 id="gnome-settings-and-extensions">GNOME Settings and Extensions</h1>
<p>Via gnome-tweaks:</p>
<ul>
<li>Display - set to 100% for scale</li>
<li>Font scale: 1.25</li>
<li>Set Caps Lock to act as Control</li>
<li>Right mouse button should resize windows</li>
</ul>
<p>General settings:</p>
<ul>
<li>Increase mouse pointer size (Accessibility - Seeing - Cursor size - medium)</li>
<li>Disable animations (makes GNOME feel super snappy)</li>
<li>Set ‘Super+D’ to ‘Hide all normal windows’</li>
</ul>
<h2 id="extensions">Extensions</h2>
<p>I only install a handful of extensions in GNOME, mostly to hide a few extraneous items from the status bar:</p>
<p><a href="https://extensions.gnome.org/extension/744/hide-activities-button/">Hide Activities</a></p>
<p><a href="https://extensions.gnome.org/extension/2398/hide-universal-access/">Hide Universal Access</a></p>
<p><a href="https://extensions.gnome.org/extension/4099/no-overview/">No Overview on startup</a></p>
<p><a href="https://extensions.gnome.org/extension/4630/no-titlebar-when-maximized/">No titlebar when maximised</a></p>
<p>I don’t believe in using anything like Dash-to-dock, instead relying on shortcut keys for my three most important apps (Terminal, Chrome, and Slack) and also the overview plus alt-tab to navigate my way around.</p>
<h2 id="remove-default-music-pictures-etc-dirs-from-files">Remove default Music, Pictures etc. dirs from Files</h2>
<p>To hide these sections, edit <code>~/.config/user-dirs.dirs</code> and set the directory to be <code>$HOME</code>, for example:</p>
<pre><code>XDG_DESKTOP_DIR="$HOME/Desktop"
XDG_DOWNLOAD_DIR="$HOME/Downloads"
XDG_TEMPLATES_DIR="$HOME"
XDG_PUBLICSHARE_DIR="$HOME"
XDG_DOCUMENTS_DIR="$HOME/Documents"
XDG_MUSIC_DIR="$HOME"
XDG_PICTURES_DIR="$HOME"
XDG_VIDEOS_DIR="$HOME"
</code></pre>
<h2 id="got-black-and-white-emojis">Got black and white Emojis?</h2>
<p>Unlink <code>70-no-bitmaps.conf</code> from <code>/etc/fonts/conf.d</code></p>
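<p>In other words, something like:</p>
<pre><code>sudo unlink /etc/fonts/conf.d/70-no-bitmaps.conf
fc-cache -f -v
</code></pre>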
<h1 id="syncthing">Syncthing</h1>
<pre><code>mkdir -p ~/.config/systemd/user
cd ~/.config/systemd/user
wget https://raw.githubusercontent.com/syncthing/syncthing/master/etc/linux-systemd/user/syncthing.service
systemctl --user enable syncthing.service
systemctl start --user syncthing
journalctl -f --user -u syncthing
</code></pre>
<p>Then jump through the hoops to get it to sync with my Synology NAS.</p>
<h1 id="terminal">Terminal</h1>
<p>I use <a href="https://gogh-co.github.io/Gogh/">Gogh</a> to quickly import some profiles into Gnome Terminal, in particular GitHub Light and GitHub Dark. However, before any profiles appear you need to create a new profile called ‘Default’ and delete the ‘Unnamed’ one. Basically follow the steps in this issue: <a href="https://github.com/Gogh-Co/Gogh/issues/63#issuecomment-401510226">https://github.com/Gogh-Co/Gogh/issues/63#issuecomment-401510226</a>.</p>
<p>I then rebind keys in Terminal so that Super+c is copy and Super+v is paste.</p>
<h1 id="appearance">Appearance</h1>
<p>I like to stick to the default where possible so I don’t change much else. I think GNOME looks great as it is.</p>
<p><img src="/public/static/screenshot_gnome_debian.png" alt="GNOME 45 on Debian testing" class="center" /></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Thanks to <a href="https://jcs.org">jcs</a> for the <a href="https://jcs.org/2021/01/27/x1nano">heads-up</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1><a href="http://dischord.org/2023/07/03/rick-froberg-rip">Rick Froberg, RIP</a></h1>
<p><em>2023-07-03</em></p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/361925338/"><img src="https://live.staticflickr.com/76/361925338_3f151a95af_b.jpg" title="DSC_0771" /></a></p>
<blockquote>
<p><em>Hot Snakes in Nottingham, back in 2004.</em></p>
</blockquote>
<p>As the tagline for this site and my online persona will attest to, the musical
endeavours that Rick Froberg was involved in - notably Drive Like Jehu and Hot
Snakes - had a profound impact on me. From a teenager hearing ‘Here Come the
Rome Plows’ for the first time through to having my ears melted a couple of
years back at what will sadly be the last time I’ll ever see Hot Snakes live,
his lyrics, vocals, and guitar playing are something that resonated with me on
a deeply fundamental level.</p>
<p>Hot Snakes in particular were absolutely <em>my band</em>, especially during the early
2000s when I was getting excited to go stay in San Diego for the first time.
That trip changed the course of my life in a number of ways, set the scene for
friendships and memories for a lifetime, and was planned around being able to
watch them live in their home town. It cemented my affinity for Southern
California, and ‘Suicide Invoice’ is the soundtrack to that trip - a CD I all
but wore out during numerous road trips whilst there.</p>
<p>I’m beyond saddened at Rick’s passing, especially knowing that he’s gone well
before his time.</p>
<p>RIP Rick.</p>
<h1><a href="http://dischord.org/2023/01/15/tales-from-the-sausage-factory">Tales from the Sausage Factory</a></h1>
<p><em>2023-01-15</em></p>
<p>It’s been far too long since the last Sausage Cloud update, and a <em>lot</em> has happened that warrants an update and some news on what we’ve been up to. The platform itself is now much, much more resilient and performant than it was previously, and this is in no small part thanks to a few special folks and companies that have helped out on the hardware front.</p>
<p><img src="/public/static/crieff-snow.jpeg" alt="Snowy drive to the Bunker" class="center" /></p>
<h1 id="infrastructure">Infrastructure</h1>
<h2 id="networking">Networking</h2>
<p>Since Sausage Cloud was first installed we’ve limped along with 1GbE, which has been mostly fine since the networking that most folks care about is between their instance and the Internet. However, there was no way this was going to cut it if we were to provide network-attached storage, so the hunt began for a reasonably priced (i.e. free) 10GbE switch with a sufficient number of ports to service all Blades plus whatever storage nodes we hung off the back.</p>
<p>When fellow Sausage enthusiast Bartosz mentioned that he had a couple of ancient (but perfectly serviceable) Juniper EX4500s going spare, we jumped at the chance to make use of them. They eventually made their way up to my flat in Edinburgh (via an eerie early lockdown service station rendez-vous with <a href="https://softiron.com/company/our-team/danny-abukalam/">Danny</a>) where we updated them to the latest firmware prior to making the trip to the Bunker. And if I had to pick just one adjective to describe these things, it would be “loud”.</p>
<p><img src="/public/static/ex4500.jpeg" alt="EX4500" class="center" /></p>
<p>Eventually we were able to get one of these switches plumbed in and everything recabled - we followed the approach established with the 1GbE and used a couple of upgraded passthrough modules, and this piece of work paved the way for the next round of upgrades.</p>
<h2 id="storage">Storage</h2>
<p>Perhaps the single biggest upgrade that Sausage Cloud has received during lockdown happened thanks to the awesome folks over at <a href="https://softiron.com">SoftIron</a>, in particular Danny. The lack of a fast, reliable persistent storage solution was fine at first but as usage has grown it started to become more and more of a blocker for certain types of workloads. Whilst we could’ve cobbled something together in classic Sausage style, fundamental infrastructure that underpins your platform - networking, storage - is not usually something you want to do too shitty of a job on, even by our standards. We’re also very much power constrained, and so of course the combination of reliable (since it’s a trek to the Bunker, even for me), low-power and yet high-performance distributed storage requirements (“pick two”) meant that we’d have to spend a fair old chunk of change to implement. Way beyond the hobbyist or community funding that we ostensibly have.</p>
<p>After a few chats, SoftIron came to our rescue and offered to fit us up with their HyperDrive Ceph storage solution, with enough capacity and redundancy to more than meet our meagre requirements. A couple of months later and everything was lined up, ready to go. However, there was no point in getting the storage in until we got the networking upgraded from 1GbE to 10GbE, and that was no small undertaking, so the kit sat in the Bunker gathering dust for the best part of a year before we performed the aforementioned network upgrades. The stars also had to align for Danny to make the long trek up and spend the day doing the installation. Finally they did and so in mid-2021 we were able to get the SoftIron kit racked, cabled and provisioned in a day (!).</p>
<p><img src="/public/static/danny-legend.jpeg" alt="Danny's an absolute legend" class="center" /></p>
<h2 id="compute">Compute</h2>
<p>The “ancient-yet-spritely” Blades we were using previously weren’t doing us any favours in terms of power, and as time wore on that spriteliness became less and less true. So we’ve invested in some (OK, second-hand) G9 Blades with much newer CPUs, quicker memory, and faster local SSDs. This has allowed us to consolidate workloads without sacrificing performance while at the same time lowering our overall power usage.</p>
<h1 id="platform">Platform</h1>
<p>Rest assured, the software that powers Sausage Cloud hasn’t been neglected either. The OpenStack control plane is now properly redundant with three nodes in the cluster, and on the software side we’ve gone through a couple of rounds of updates to OpenStack, and we’re now on Yoga with an update to Zed planned soon.</p>
<p>We’re also doing a much better job of monitoring! Previously we’d cobbled together something with Netdata which worked OK, but for a while now <a href="https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/prometheus-guide.html">K-A</a> has provided comprehensive roles for installing Prometheus plus the various exporters and so we’ve gone with that plus some shiny new dashboards in Grafana to give us much better visibility and awareness of what’s going on and where.</p>
<p><img src="/public/static/libvirt-grafana.png" alt="Grafana" class="center" /></p>
<h1 id="whats-next">What’s next?</h1>
<p>We’d be lying if we said that the recent energy crisis hadn’t hit us hard - it has. Very much so. Our running costs are now nearly four times what they were which is absolutely insane. Luckily everyone’s chipping in extra to cover this increase which is a testament to the usefulness of something that started off as a hobby project.</p>
<p>If you think you’d like to get involved and make use of our platform then feel free to get in touch!</p>
<h1><a href="http://dischord.org/2022/06/14/rke2-and-nvidia-gpus-with-the-grid-operator">RKE2 and NVIDIA GPUs with the GRID Operator</a></h1>
<p><em>2022-06-14</em></p>
<p>Similar to my <a href="https://dischord.org/2022/05/16/k3s-rke2-and-gpu-pci-passthrough/">earlier entry</a> on enabling GPUs with K3s or RKE2, this post builds on the process when you’re working with the <a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/overview.html">NVIDIA GRID Operator</a> and specifically <a href="https://docs.rke2.io">RKE2</a>.</p>
<p>Being an Operator, it introduces a custom controller into your cluster which encompasses the business logic required to automatically set up and provision GPU nodes within your cluster. In fact, I believe it’s your only option if you want to take advantage of features like vGPUs which entail a licensing server elsewhere on your network (and the provisioning of which is outside the scope of this post).</p>
<h2 id="building-the-os-image">Building the OS image</h2>
<p>By default, the GRID Operator assumes that any node booted into the cluster with a GPU doesn’t have the driver installed to initialise the hardware. Therefore, it first launches a custom image and if necessary builds the kernel module before loading it on the host. The steps for doing this are well documented <a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/install-gpu-operator-vgpu.html">here</a> and work fine as long as you’re using an OS in the linked repo.</p>
<p>In practice I found this process a little slow when it comes to initialising a new node, and cloud being the name of the game you might find yourself scaling your cluster up and down to meet workload demands, meaning you don’t want to add any extra - avoidable - delays into the whole process. Therefore, I decided to bake a custom OS image with the prerequisites already installed instead and then there’s a flag you can set when you install the Helm chart to tell the Operator not to bother with driver installation. More on that in a minute.</p>
<p>The other problem is that K3s and RKE2 do a lot of the work of configuring the container runtime for us, as long as the NVIDIA container toolkit and the driver are already installed. This doesn’t work as expected if we’re initialising the GPU <em>after</em> we’ve already launched RKE2 on a node.</p>
<p>So to include the drivers and the toolkit in the OS image, you want to include a step along these lines:</p>
<pre><code class="language-shell">export DEBIAN_FRONTEND=noninteractive
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
echo 'deb https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/$(ARCH) /' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update
apt-get install -y build-essential nvidia-container-toolkit
curl -s -o /tmp/NVIDIA-Linux-x86_64-510.47.03-grid.run http://your.repo/NVIDIA-Linux-x86_64-510.47.03-grid.run
sh /tmp/NVIDIA-Linux-x86_64-510.47.03-grid.run -s
rm -f /tmp/NVIDIA-Linux-x86_64-510.47.03-grid.run
rm -f /tmp/script.sh
</code></pre>
<blockquote>
<p>I use <a href="https://packer.io">Packer</a> to build custom OS images, and it’s trivial to include this shell script snippets such as these as part of that process.</p>
</blockquote>
<p>The other thing you need to do is to include the config so that the node registers itself with your licensing server. If you’re not building an image with this stuff included and instead you’re relying on the Operator to do its bit with the driver then this step is managed for you. However, if you’re following along and skipping this last bit, then we need to bake it into our OS image by including something like the following after you’ve installed the driver:</p>
<pre><code>cat > /etc/nvidia/gridd.conf << EOF
ServerAddress=192.168.1.123
EOF
</code></pre>
<p>As an aside, if you <em>don’t</em> include the licensing information then your GPU-resourced Pods will run fine for a brief while (<1 hour), but then all of a sudden performance will completely drop off a cliff. If you log into the node itself and check the logs for the <code>gridd</code> systemd unit, you’ll see something along the lines of <code>notice: vmiop_log: (0x0): vGPU license state: Unlicensed (Restricted)</code>. This is because the driver drops the GPU into limp mode when you exceed the grace period for the licensing server being contactable. When you haven’t come across this behaviour before, it can be very confusing to troubleshoot.</p>
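<p>A quick way to check the licence state from the node itself is to query the driver - with the GRID drivers the licensing status shows up in the full <code>nvidia-smi</code> query output:</p>
<pre><code>nvidia-smi -q | grep -i -A 2 license
</code></pre>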
<h2 id="configuring-rke2">Configuring RKE2</h2>
<p>In this example, I’m also going to make the NVIDIA container runtime the default on our GPU nodes. Create a file called <code>/var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl</code> with the following contents:</p>
<pre><code class="language-toml">[plugins.opt]
path = "/var/lib/rancher/rke2/agent/containerd"
[plugins.cri]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = false
sandbox_image = "index.docker.io/rancher/pause:3.6"
[plugins.cri.containerd]
snapshotter = "overlayfs"
disable_snapshot_annotations = true
default_runtime_name = "nvidia"
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes."nvidia"]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes."nvidia".options]
BinaryName = "/usr/bin/nvidia-container-runtime"
</code></pre>
<blockquote>
<p>If you do this via cloud-init as part of node provisioning, then it’ll be automatically picked up by RKE2 when it first launches.</p>
</blockquote>
<h2 id="installing-the-operator">Installing the Operator</h2>
<p>Now we can go ahead and install the Operator with a few specific options to match both our pre-installed driver as well as where to find the various bits that RKE2 ships with:</p>
<pre><code>helm install gpu-operator \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.enabled=false \
--set toolkit.enabled=false \
--set driver.licensingConfig.configMapName=licensing-config \
--set toolkit.env[0].name=CONTAINERD_CONFIG \
--set toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml \
--set toolkit.env[1].name=CONTAINERD_SOCKET \
--set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
--set toolkit.env[2].value=nvidia \
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
--set-string toolkit.env[3].value=true
</code></pre>
<p>After a brief period, you should end up with something looking like this:</p>
<pre><code>nick@deadline ~ % k get pods -n gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-cs2qx 1/1 Running 0 45s
gpu-operator-7bfc5f55-hmqhk 1/1 Running 0 60s
gpu-operator-node-feature-discovery-master-6598566c8c-7rlbc 1/1 Running 0 60s
gpu-operator-node-feature-discovery-worker-jv5q6 1/1 Running 0 60s
gpu-operator-node-feature-discovery-worker-qkm77 1/1 Running 0 60s
nvidia-cuda-validator-ctzzz 0/1 Completed 0 41s
nvidia-dcgm-exporter-rl4bj 1/1 Running 0 45s
nvidia-device-plugin-daemonset-wc5bm 1/1 Running 0 45s
nvidia-device-plugin-validator-xvcp7 0/1 Completed 0 29s
nvidia-operator-validator-zqj9t 1/1 Running 0 46s
</code></pre>
<p>And of course nodes should now have the <code>nvidia.com/gpu</code> resource listed:</p>
<pre><code>nick@deadline ~ % k get node worker-gpu-53d95674-787gx -o jsonpath="{.status.allocatable}" | jq
{
"cpu": "6",
"ephemeral-storage": "197546904422",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "32883428Ki",
"nvidia.com/gpu": "1",
"pods": "110"
}
</code></pre>
<p>If anything fails it’s usually fairly clear from the Pod logs as to what went awry, in particular the logs for the ‘validators’.</p>
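<p>For example, using the Pod name from the listing above:</p>
<pre><code>kubectl -n gpu-operator logs nvidia-operator-validator-zqj9t --all-containers
</code></pre>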
<h1><a href="http://dischord.org/2022/05/16/k3s-rke2-and-gpu-pci-passthrough">K3s / RKE2 and GPU PCI-passthrough</a></h1>
<p><em>2022-05-16</em></p>
<p>Here’s some notes on getting GPUs working with K3s or RKE2. It’s pulled together from a few places to save folks the same trouble I found, i.e hunting through PRs and issues on GitHub to work out what needs to be configured in order for this to work.</p>
<p>The first step is to get yourself a node with an NVIDIA GPU. In my case I’ve a 3080 configured using PCI passthrough to a VM running Ubuntu 20.04:</p>
<pre><code>nick@gpu0:~$ sudo lspci -v | grep -i nvidia
05:00.0 VGA compatible controller: NVIDIA Corporation Device 2206 (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device 1467
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
06:00.0 Audio device: NVIDIA Corporation Device 1aef (rev a1)
Subsystem: NVIDIA Corporation Device 1467
</code></pre>
<h2 id="node-os-configuration">Node OS configuration</h2>
<p>Within this VM, we need to install the NVIDIA drivers and also the <a href="https://github.com/NVIDIA/nvidia-docker">container toolkit</a>:</p>
<pre><code>$ sudo apt -y install nvidia-driver-510
$ curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | apt-key add -
$ echo 'deb https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/$(ARCH) /' > /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ apt update
$ apt install -y nvidia-container-toolkit
</code></pre>
<p>With the necessary binary blobs installed, we can verify that the GPU is working at least as far as the host operating system is concerned by running <code>nvidia-smi</code>:</p>
<pre><code>root@gpu0:~# nvidia-smi
Mon May 16 09:28:25 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:05:00.0 Off | N/A |
| 0% 28C P8 3W / 320W | 0MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
</code></pre>
<p>Looks good, but that’s the easy bit. Now let’s sort out Kubernetes!</p>
<h2 id="configuring-kubernetes">Configuring Kubernetes</h2>
<p>The first thing I want to draw your attention to is <a href="https://github.com/k3s-io/k3s/pull/3890">this PR</a>. It landed in K3s in 1.22, so you need to be installing this version at the very least. This saves the hassle of having to manually craft a containerd config template - it automatically generates the right section if it detects the presence of the drivers and the toolkit. So all we have to do is install K3s with no special options, and when K3s starts you should see the following section in <code>/var/lib/rancher/k3s/agent/etc/containerd/config.toml</code> (if you’re using RKE2 then the path is <code>/var/lib/rancher/rke2/agent/etc/containerd/config.toml</code>):</p>
<pre><code class="language-toml">[plugins.cri.containerd.runtimes."nvidia"]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes."nvidia".options]
BinaryName = "/usr/bin/nvidia-container-runtime"
</code></pre>
<p>With everything started and your Kubernetes cluster up and running, it should look something like this:</p>
<pre><code>NAME STATUS ROLES AGE VERSION
control0 Ready control-plane,etcd,master 12d v1.23.6+k3s1
gpu0 Ready <none> 86m v1.23.6+k3s1
worker0 Ready <none> 61m v1.23.6+k3s1
</code></pre>
<h3 id="runtime-classes">Runtime Classes</h3>
<p>Although K3s (and RKE2) will have set up the container runtime (containerd) options for us automatically, in practice what we have on that node now is two available runtimes - the default (runc) and also nvidia. When a container is scheduled on that node, we need a way of telling the scheduler which runtime should be used - this is what <a href="https://kubernetes.io/docs/concepts/containers/runtime-class/">Runtime Classes</a> are for. So, we need to create a <code>runtimeClass</code> for the nvidia runtime:</p>
<pre><code>$ kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: nvidia
handler: "nvidia"
EOF
</code></pre>
<h3 id="node-feature-discovery">Node Feature Discovery</h3>
<p>The bit that we need to install next to update our nodes and advertise the availability of a GPU resource to the scheduler is the <a href="https://github.com/NVIDIA/k8s-device-plugin">NVIDIA device plugin</a>. By default this will attempt to create a DaemonSet on all nodes in the cluster, and in my case not every single node has a GPU. So what we actually need is a way of selecting the right nodes, and for this we’ll make use of a handy project called <a href="https://github.com/kubernetes-sigs/node-feature-discovery">Node Feature Discovery</a>. It’s an add-on which detects and then adds labels to nodes with bits of information on what hardware is in that node and so on. We can then use these labels to target our nodes with a GPU. It’s available as a Helm chart, or we can install it directly via <code>kubectl</code>:</p>
<pre><code>$ kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.11.0"
</code></pre>
<p>Once deployed, you’ll notice a bunch of extra labels like:</p>
<pre><code>$ kubectl describe node gpu0 | grep -i pci
feature.node.kubernetes.io/pci-0300_10de.present=true
feature.node.kubernetes.io/pci-0300_1b36.present=true
</code></pre>
<p>Looking on our host, the NVIDIA GPU corresponds with the PCI ID in the label above:</p>
<pre><code>root@gpu0:~# lspci -nn | grep -i nvidia
05:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2206] (rev a1)
06:00.0 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
</code></pre>
<p>So we can use a <code>nodeSelector</code> to target nodes with this particular label.</p>
<h3 id="nvidia-device-plugin">NVIDIA Device Plugin</h3>
<p>Now, the Helm chart provided by NVIDIA for the device plugin is missing a couple of things that we’ll need to make sure the workload lands in the right place and with the right options:</p>
<ul>
<li>It won’t have the <code>runtimeClass</code> set;</li>
<li>It won’t have a nodeSelector for our GPU</li>
</ul>
<p>So instead, we’ll need to grab and template out the manifests and add in our options.</p>
<pre><code>$ helm repo add nvidia-device-plugin https://nvidia.github.io/k8s-device-plugin
$ helm template \
nvidia-device-plugin \
--version=0.11.0 \
--set runtimeClassName=nvidia \
nvidia-device-plugin/nvidia-device-plugin > ~/nvidia-device-plugin.yml
</code></pre>
<p>Then edit <code>nvidia-device-plugin.yml</code> and add those two things to the template spec section. It should look something like this:</p>
<pre><code>spec:
[..]
template:
[..]
spec:
[..]
nodeSelector:
feature.node.kubernetes.io/pci-0302_10de.present: "true"
runtimeClassName: nvidia
</code></pre>
<p>With those changes made we can instantiate those resources in our cluster:</p>
<pre><code>$ kubectl apply -f ~/nvidia-device-plugin.yml
</code></pre>
<p>If everything’s lined up and goes according to plan, you should see the right number of Pods being scheduled as part of the DaemonSet, and there should be some useful bits of information in the logs:</p>
<pre><code>2022/05/30 13:51:50 Loading NVML
2022/05/30 13:51:50 Starting FS watcher.
2022/05/30 13:51:50 Starting OS watcher.
2022/05/30 13:51:50 Retreiving plugins.
2022/05/30 13:51:50 Starting GRPC server for 'nvidia.com/gpu'
2022/05/30 13:51:50 Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
2022/05/30 13:51:50 Registered device plugin for 'nvidia.com/gpu' with kubelet
</code></pre>
<p>And examining the node itself should show that we do in fact have a <code>nvidia.com/gpu</code> resource available:</p>
<pre><code>$ kubectl get node gpu0 -o jsonpath="{.status.allocatable}" | jq
{
"cpu": "8",
"ephemeral-storage": "99891578802",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "32882948Ki",
"nvidia.com/gpu": "1",
"pods": "110"
}
</code></pre>
<h2 id="testing">Testing</h2>
<p>To test we can use a <a href="https://www.olcf.ornl.gov/tutorials/cuda-vector-addition/">CUDA vector add</a> example image which should get scheduled to our GPU node if nothing’s amiss. Note the <code>runtimeClassName</code> added to the Pod spec:</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
name: cuda-vector-add
spec:
runtimeClassName: nvidia
restartPolicy: OnFailure
containers:
- name: cuda-vector-add
env:
- name: NVIDIA_VISIBLE_DEVICES
value: all
- name: NVIDIA_DRIVER_CAPABILITIES
value: compute,utility
# https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
image: "k8s.gcr.io/cuda-vector-add:v0.1"
resources:
limits:
nvidia.com/gpu: 1
</code></pre>
<pre><code>$ kubectl apply -f gputest.yaml
pod/cuda-vector-add created
$ kubectl get pod cuda-vector-add -o jsonpath="{.spec.nodeName}"
gpu0
$ kubectl get pod cuda-vector-add -o jsonpath="{.status.phase}"
Running
</code></pre>
<p>Looks good!</p>
<h1><a href="http://dischord.org/2022/01/04/k3s-kube-vip-cilium-egress">K3s and kube-vip with Cilium's Egress Gateway feature</a></h1>
<p><em>2022-01-04</em></p>
<p>New year, new blog post! This time it’s more magic with Cilium, which has a <a href="https://docs.cilium.io/en/v1.10/gettingstarted/egress-gateway/">useful new feature</a> that lets you specify a specific IP address in order to egress traffic from your cluster. This is really handy in very constrained environments; however, the official guide suggests that you statically configure an IP address on the designated egress gateway node. This is a bit limiting, as if that node happens to go down then you might need to wait five minutes for a Pod to be scheduled elsewhere to re-plumb in that IP address. In this post we’ll work around that using the venerable <a href="https://kube-vip.io">kube-vip</a> instead.</p>
<blockquote>
<p><em>NB</em>: The enterprise version of Cilium 1.11 has a HA feature for the Egress Gateway functionality, <a href="https://isovalent.com/blog/post/2021-12-release-111#egress-gateway-ha">see this blog post</a>.</p>
</blockquote>
<p>We’ll also heavily lean into Cilium’s support for eBPF by doing away with kube-proxy entirely, but note that this does come with some
<a href="https://docs.cilium.io/en/v1.10/gettingstarted/kubeproxy-free/#limitations">limitations</a>.</p>
<h2 id="install-k3s">Install K3s</h2>
<p>First, let’s set some common options for K3s. We disable the in-built CNI and <a href="https://github.com/k3s-io/klipper-lb">Klipper</a> (the Service LB), disable kube-proxy and the network policy controller (since the functionality will be handled by Cilium), and also specify an additional IP address - that of a VIP which we’ll configure shortly - as a SAN to be able to access our Kubernetes API:</p>
<pre><code>export K3S_VERSION="v1.22.4+k3s1"
export K3S_OPTIONS="--flannel-backend=none --no-flannel --disable-kube-proxy --disable servicelb --disable-network-policy --tls-san=192.168.20.200"
</code></pre>
<p>I’ve got three VMs running openSUSE Leap created on vSphere - <code>cilium{0..2}</code>. Note that I use <a href="https://github.com/vmware/govmomi/tree/master/govc"><code>govc</code></a> extensively during this article, and I’ll be making use of <a href="https://github.com/alexellis/k3sup">k3sup</a> to bootstrap my cluster:</p>
<pre><code>k3sup install --cluster --ip $(govc vm.ip /42can/vm/cilium0) --user nick --local-path ~/.kube/cilium.yaml --context cilium --k3s-version $K3S_VERSION --k3s-extra-args "$K3S_OPTIONS"
k3sup join --ip $(govc vm.ip /42can/vm/cilium1) --server-ip $(govc vm.ip /42can/vm/cilium0) --server --server-user nick --user nick --k3s-version $K3S_VERSION --k3s-extra-args "$K3S_OPTIONS"
k3sup join --ip $(govc vm.ip /42can/vm/cilium2) --server-ip $(govc vm.ip /42can/vm/cilium0) --server --server-user nick --user nick --k3s-version $K3S_VERSION --k3s-extra-args "$K3S_OPTIONS"
</code></pre>
<p>At this point nodes will be in <code>NotReady</code> status and no Pods will have started as we have no functioning CNI:</p>
<pre><code>% kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
cilium0 NotReady control-plane,etcd,master 2m53s v1.22.4+k3s1 192.168.20.49 <none> openSUSE Leap 15.3 5.3.18-57-default containerd://1.5.8-k3s1
cilium1 NotReady control-plane,etcd,master 65s v1.22.4+k3s1 192.168.20.23 <none> openSUSE Leap 15.3 5.3.18-57-default containerd://1.5.8-k3s1
cilium2 NotReady control-plane,etcd,master 24s v1.22.4+k3s1 192.168.20.119 <none> openSUSE Leap 15.3 5.3.18-57-default containerd://1.5.8-k3s1
% kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-85cb69466-n7z42 0/1 Pending 0 2m50s
kube-system helm-install-traefik--1-xv7q2 0/1 Pending 0 2m50s
kube-system helm-install-traefik-crd--1-w7w5w 0/1 Pending 0 2m50s
kube-system local-path-provisioner-64ffb68fd-mfqsr 0/1 Pending 0 2m50s
kube-system metrics-server-9cf544f65-kqbj4 0/1 Pending 0 2m50s
</code></pre>
<h2 id="add-a-vip-for-the-kubernetes-api">Add a VIP for the Kubernetes API</h2>
<p>Before we go any further and install Cilium, let’s add a VIP for our control plane via kube-vip. Note that we do this as a static Pod (as opposed to as a DaemonSet) since we don’t yet have a functioning CNI. To make this happen we need to do a couple of things - first create the various RBAC-related resources either via <code>kubectl</code> or by dropping the contents of <a href="https://kube-vip.io/manifests/rbac.yaml">this file</a> into <code>/var/lib/rancher/k3s/server/manifests</code>, and we also need to generate the manifest for our static Pod, customised slightly based on our environment and also for K3s. To create the manifest in my case, I ran these commands:</p>
<pre><code>% alias kube-vip="docker run --network host --rm ghcr.io/kube-vip/kube-vip:v0.4.0"
% kube-vip manifest pod \
--interface eth0 \
--vip 192.168.20.200 \
--controlplane \
--services \
--arp \
--leaderElection > kube-vip.yaml
</code></pre>
<p>This created a <code>kube-vip.yaml</code> with the IP I want to use for my VIP (<code>192.168.20.200</code>) as well as the interface on my nodes to which this should be bound (<code>eth0</code>). The file then needs editing: change the <code>hostPath</code> <code>path</code> to point to <code>/etc/rancher/k3s/k3s.yaml</code> instead of the default <code>/etc/kubernetes/admin.conf</code>, since the path to the kubeconfig is different with K3s.</p>
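<p>A one-liner for that edit:</p>
<pre><code>sed -i 's#/etc/kubernetes/admin.conf#/etc/rancher/k3s/k3s.yaml#' kube-vip.yaml
</code></pre>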
<p>With those changes made, this needs to be copied into the default directory for static Pod manifests on all of our nodes: <code>/var/lib/rancher/k3s/agent/pod-manifests</code>:</p>
<pre><code>% kubectl apply -f https://kube-vip.io/manifests/rbac.yaml
serviceaccount/kube-vip created
clusterrole.rbac.authorization.k8s.io/system:kube-vip-role created
clusterrolebinding.rbac.authorization.k8s.io/system:kube-vip-binding created
% for node in cilium{0..2} ; do cat kube-vip.yaml | ssh $(govc vm.ip $node) 'cat - | sudo tee /var/lib/rancher/k3s/agent/pod-manifests/kube-vip.yaml' ; done
</code></pre>
<p>If everything’s working, after a few seconds you should see the <code>kube-vip</code> Pods in state running and you should be able to ping your VIP:</p>
<pre><code>% kubectl get pods -n kube-system | grep -i vip
kube-vip-cilium0 1/1 Running 0 107m
kube-vip-cilium1 1/1 Running 0 5m9s
kube-vip-cilium2 1/1 Running 0 2m15s
% ping -c 1 192.168.20.200
PING 192.168.20.200 (192.168.20.200) 56(84) bytes of data.
64 bytes from 192.168.20.200: icmp_seq=1 ttl=63 time=1.78 ms
--- 192.168.20.200 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.781/1.781/1.781/0.000 ms
</code></pre>
<p>Now we can install Cilium via Helm, specifying the VIP for the service host to which Cilium should connect:</p>
<pre><code>helm install cilium cilium/cilium --version 1.11 \
--namespace kube-system \
--set kubeProxyReplacement=strict \
--set k8sServiceHost=192.168.20.200 \
--set k8sServicePort=6443 \
--set egressGateway.enabled=true \
--set bpf.masquerade=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true
</code></pre>
<h2 id="deploying-the-kube-vip-cloud-controller">Deploying the kube-vip Cloud Controller</h2>
<p>The Traefik Ingress controller deployed as part of K3s creates a service of type LoadBalancer. Our install options meant that <a href="https://github.com/k3s-io/klipper-lb">Klipper</a> isn’t deployed, so we need something else to handle these types of resources and to surface a VIP on our cluster’s behalf. Given we’re using kube-vip for the control plane, we’ll go ahead and use it for this as well. First, install the kube-vip controller component which will watch and handle Services of this type:</p>
<pre><code class="language-shell">kubectl apply -f https://raw.githubusercontent.com/kube-vip/kube-vip-cloud-provider/main/manifest/kube-vip-cloud-controller.yaml
</code></pre>
<p>Now create a <code>ConfigMap</code> resource with a range that kube-vip will allocate an IP address from by default:</p>
<pre><code class="language-yaml">apiVersion: v1
kind: ConfigMap
metadata:
name: kubevip
namespace: kube-system
data:
range-global: 192.168.20.220-192.168.20.225
</code></pre>
<p>Once the <code>kube-vip-cloud-provider-0</code> Pod in the <code>kube-system</code> namespace is <code>Running</code>, you should see that the LoadBalancer Service for Traefik now has an IP address allocated and is reachable from outside the cluster:</p>
<pre><code class="language-shell">% kubectl get svc traefik -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
traefik LoadBalancer 10.43.21.231 192.168.20.220 80:31186/TCP,443:31184/TCP 16h
% curl 192.168.20.220
404 page not found
</code></pre>
<blockquote>
<p>NB: The 404 is expected and a valid response</p>
</blockquote>
<h2 id="using-and-testing-ciliums-egress-gateway-feature">Using and testing Cilium’s Egress Gateway feature</h2>
<p>As mentioned, the example documentation for the Egress Gateway feature for Cilium suggests that you create a Deployment which launches a container somewhere in your cluster and plumbs in an IP address on an interface, and this IP will be the nominated Egress IP. If that node goes down, it might take some time before kube-scheduler reschedules that Pod onto another node and reconfigures this IP address.</p>
<p>However, we have other options - for example, we can make use of an existing IP address within our cluster, and if it’s one being managed by kube-vip then we can trust that it’ll be failed over and reassigned in a timely and graceful manner. Let’s test using the IP that’s been assigned to our Traefik LoadBalancer Service - <code>192.168.20.220</code> - by kube-vip.</p>
<p>To test this we need an external service running somewhere that we can connect to. I’ve spun up another VM, imaginatively titled <code>test</code>, and this machine has an IP of <code>192.168.20.70</code>. I’m just going to launch NGINX via Docker:</p>
<pre><code class="language-shell">% echo 'it works' > index.html
% docker run -d --name nginx -p 80:80 -v $(pwd):/usr/share/nginx/html nginx
% docker logs -f nginx
</code></pre>
<p>If we attempt to connect from a client in our cluster to this server, we should see something like the following:</p>
<pre><code class="language-shell">% kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
If you don't see a command prompt, try pressing enter.
bash-5.1# curl 192.168.20.70
it works
</code></pre>
<p>And from NGINX:</p>
<pre><code class="language-shell">192.168.20.82 - - [07/Dec/2021:12:35:43 +0000] "GET / HTTP/1.1" 200 9 "-" "curl/7.80.0" "-"
</code></pre>
<p>Predictably, the source IP is the node on which my netshoot container is running (<code>cilium0</code>). Now let’s add a <code>CiliumEgressNATPolicy</code> which will put in place the configuration to set the source IP address to that of my Traefik LoadBalancer for any traffic destined for my external test server IP:</p>
<pre><code class="language-yaml">% cat egress.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumEgressNATPolicy
metadata:
name: egress-sample
spec:
egress:
- podSelector:
matchLabels:
io.kubernetes.pod.namespace: default
destinationCIDRs:
- 192.168.20.70/32
egressSourceIP: "192.168.20.220"
% kubectl apply -f egress.yaml
ciliumegressnatpolicy.cilium.io/egress-sample created
</code></pre>
<p>We can verify the NAT that’s been put in place by running the <code>cilium bpf egress list</code> command from within one of our Cilium Pods:</p>
<pre><code class="language-shell">% kubectl -n kube-system get pods -l k8s-app=cilium
NAME READY STATUS RESTARTS AGE
cilium-28gdd 1/1 Running 0 17h
cilium-n5wv7 1/1 Running 0 17h
cilium-vfvkd 1/1 Running 0 16h
% kubectl exec -it cilium-vfvkd -n kube-system -- cilium bpf egress list
SRC IP & DST CIDR EGRESS INFO
10.0.0.80 192.168.20.70/32 192.168.20.220 192.168.20.220
10.0.2.11 192.168.20.70/32 192.168.20.220 192.168.20.220
</code></pre>
<p>Now let’s run that <code>curl</code> command again from our netshoot container and observe what happens again in NGINX:</p>
<pre><code class="language-shell">bash-5.1# curl 192.168.20.70
it works
</code></pre>
<pre><code class="language-shell">192.168.20.220 - - [07/Dec/2021:12:47:53 +0000] "GET / HTTP/1.1" 200 9 "-" "curl/7.80.0" "-"
</code></pre>
<p>There we can see that the source IP of the request is now the VIP assigned to our Traefik LoadBalancer. This works because behind the scenes, kube-vip is handling the IP address assignment to a specific physical interface on our behalf. For example, I can ssh to this VIP and validate that it’s been assigned to <code>eth0</code>:</p>
<pre><code>% ssh 192.168.20.220
Warning: Permanently added '192.168.20.240' (ED25519) to the list of known hosts.
nick@cilium0:~> ip -br a li eth0
eth0 UP 192.168.20.163/24 192.168.20.200/32 192.168.20.220/32
</code></pre>
<p>As it turns out, it’s currently assigned to the first node in my cluster (and it actually also happens to be the one with the VIP for the control plane). Let’s see what happens when we shut this node down, and get some timings for how long it takes to fail over. My <code>tmp-shell</code> Pod is running on <code>cilium2</code>, so I’ll kick off my <code>curl</code> command in a loop running every second, shut <code>cilium0</code> down, and keep an eye on NGINX:</p>
<pre><code>bash-5.1# while true ; do curl 192.168.20.70 ; sleep 1; done
it works
it works
[..]
</code></pre>
<p>And from the NGINX logs, from the time I shut down <code>cilium0</code> I can see it took about two minutes before it saw any further requests:</p>
<pre><code>192.168.20.220 - - [18/Jan/2022:08:31:48 +0000] "GET / HTTP/1.1" 200 9 "-" "curl/7.80.0" "-"
192.168.20.220 - - [18/Jan/2022:08:34:11 +0000] "GET / HTTP/1.1" 200 9 "-" "curl/7.80.0" "-"
</code></pre>
<p>Not bad!</p>
<h1><a href="http://dischord.org/2021/12/14/cilium-cew-longhorn">Cluster external iSCSI initiator to Longhorn volume target, via Cilium's CEW feature</a></h1>
<p><em>2021-12-14</em></p>
<p>This post was partly prompted by the realisation that it’s nearly 2022
and I haven’t posted anything for the year…</p>
<p>Anyway, here’s a snappily-titled post that demos a cool feature of
<a href="https://cilium.io">Cilium</a> - <a href="https://docs.cilium.io/en/v1.9/gettingstarted/external-workloads/">Cluster External Workloads</a> - that lets you extend cluster
networking to external clients, i.e other virtual machines, so that
they can access resources hosted in Kubernetes. I’m a big fan of Cilium as aside from all the security and observability benefits, it also has a lot of cool features such as this that help bridge the gap between the ‘old world’ and the Cloud Native way of doing things 🌠</p>
<p>For this example we’re going to make use of a <a href="https://longhorn.io">Longhorn</a> feature that
lets you connect any <a href="https://longhorn.io/docs/1.2.2/advanced-resources/iscsi/">iSCSI initiator to a Longhorn Volume as a target</a>.
This particular use-case was prompted by a data recovery scenario, in
which maybe you have a VM outside of your Kubernetes cluster to which
you’d like to present a Kubernetes PV.</p>
<h2 id="cluster-bring-up">Cluster bring-up</h2>
<p>I’ve created four virtual machines for my cluster - one as a controlplane / etcd host, and three workers. I’m going to use the venerable <a href="https://rancher.com/docs/rke/latest/en/">RKE</a> for this, so I just need to craft a simple <code>cluster.yaml</code> and run <code>rke up</code>:</p>
<pre><code class="language-yaml">cluster_name: cilium
ssh_agent_auth: true
ignore_docker_version: true
nodes:
- address: 192.168.20.30
user: nick
role:
- controlplane
- etcd
- address: 192.168.20.181
user: nick
role:
- worker
- address: 192.168.20.79
user: nick
role:
- worker
- address: 192.168.20.184
user: nick
role:
- worker
kubernetes_version: v1.20.9-rancher1-1
ingress:
provider: nginx
network:
plugin: none
</code></pre>
<p>Once the cluster is up, install Cilium with the CEW feature enabled:</p>
<pre><code class="language-sh">$ helm repo add cilium https://helm.cilium.io/
$ helm repo update
$ helm install cilium cilium/cilium --version 1.9.9 \
--namespace kube-system \
--set externalWorkloads.enabled=true
</code></pre>
<p>I like to have a VIP to make network access to nodes in my cluster highly available, so for this I install <a href="https://github.com/immanuelfodor/kube-karp">kube-karp</a>:</p>
<pre><code class="language-sh">$ git clone https://github.com/immanuelfodor/kube-karp
$ cd kube-karp/helm
$ helm install kube-karp . \
--set envVars.virtualIp=192.168.20.200 \
--set envVars.interface=eth0 \
-n kube-karp --create-namespace
</code></pre>
<pre><code>$ ping -c 2 192.168.20.200
PING 192.168.20.200 (192.168.20.200) 56(84) bytes of data.
64 bytes from 192.168.20.200: icmp_seq=1 ttl=63 time=1.82 ms
64 bytes from 192.168.20.200: icmp_seq=2 ttl=63 time=2.15 ms
--- 192.168.20.200 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 1.821/1.983/2.146/0.162 ms
</code></pre>
<p>This is the IP I’ll use in the next step when configuring Cilium on my cluster external VM.</p>
<h2 id="configure-external-workload">Configure external workload</h2>
<p>I’ve created another VM which won’t be part of my Kubernetes cluster, and it’s called <code>longhorn-client</code> with an IP of <code>192.168.20.171</code>. This VM has the <code>open-iscsi</code> package installed.</p>
<p>Create and apply the <code>CiliumExternalWorkload</code> resource definition. The name needs to match the hostname of the external VM:</p>
<pre><code class="language-sh">$ cat longhorn-cew.yaml
apiVersion: cilium.io/v2
kind: CiliumExternalWorkload
metadata:
name: longhorn-client
labels:
io.kubernetes.pod.namespace: default
spec:
ipv4-alloc-cidr: 10.192.1.0/30
$ kubectl apply -f longhorn-cew.yaml
ciliumexternalworkload.cilium.io/longhorn-client created
</code></pre>
<p>Grab the TLS keys necessary for external workloads to authenticate with Cilium in our cluster, and <code>scp</code> them to our VM:</p>
<pre><code>$ curl -LO https://raw.githubusercontent.com/cilium/cilium/v1.9/contrib/k8s/extract-external-workload-certs.sh
$ chmod +x extract-external-workload-certs.sh
$ ./extract-external-workload-certs.sh
$ ls external*
external-workload-ca.crt external-workload-tls.crt external-workload-tls.key
$ scp external* 192.168.20.171:
Warning: Permanently added '192.168.20.171' (ED25519) to the list of known hosts.
external-workload-ca.crt 100% 1151 497.8KB/s 00:00
external-workload-tls.crt 100% 1123 470.4KB/s 00:00
external-workload-tls.key 100% 1675 636.3KB/s 00:00
</code></pre>
<h3 id="install-cilium-on-external-vm">Install Cilium on external VM</h3>
<p>On the cluster external VM, run the following, adjusting <code>CLUSTER_ADDR</code> for your setup (in my case it’s the VIP):</p>
<pre><code>$ curl -LO https://raw.githubusercontent.com/cilium/cilium/v1.9/contrib/k8s/install-external-workload.sh
$ chmod +x install-external-workload.sh
docker pull cilium/cilium:v1.9.9
CLUSTER_ADDR=192.168.20.200 CILIUM_IMAGE=cilium/cilium:v1.9.9 ./install-external-workload.sh
</code></pre>
<p>After a few seconds this last command should return, and then you can verify connectivity as follows:</p>
<pre><code class="language-sh">$ cilium status
KVStore: Ok etcd: 1/1 connected, lease-ID=7c027b34bb8c593c, lock lease-ID=7c027b34bb8c593e, has-quorum=true: https://clustermesh-apiserver.cilium.io:32379 - 3.4.13 (Leader)
Kubernetes: Disabled
Cilium: Ok 1.9.9 (v1.9.9-5bcf83c)
NodeMonitor: Disabled
Cilium health daemon: Ok
IPAM: IPv4: 2/3 allocated from 10.192.1.0/30, IPv6: 2/4294967295 allocated from f00d::aab:0:0:0/96
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables
Controller Status: 18/18 healthy
Proxy Status: OK, ip 10.192.1.2, 0 redirects active on ports 10000-20000
Hubble: Disabled
Cluster health: 5/5 reachable (2021-08-11T11:34:42Z)
</code></pre>
<p>And on the Kubernetes side, if you look at the status of the CEW resource we created you’ll see it now has our external VM’s IP address:</p>
<pre><code class="language-sh">$ kubectl get cew longhorn-client
NAME CILIUM ID IP
longhorn-client 57664 192.168.20.171
</code></pre>
<p>From our VM, an additional test is that we should now be able to resolve cluster-internal FQDNs. The <code>install-external-workload.sh</code> script should’ve updated <code>/etc/resolv.conf</code> for us, but note that you might also need to disable <code>systemd-resolved</code> or netconfig (in my case) so that the update doesn’t get clobbered. If it’s working, you can test it by running the following command:</p>
<pre><code class="language-sh">$ dig +short clustermesh-apiserver.kube-system.svc.cluster.local
10.43.234.249
</code></pre>
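<p>As mentioned above, if <code>systemd-resolved</code> keeps rewriting <code>/etc/resolv.conf</code>, one way to stop it - assuming a systemd-based distro - is simply:</p>
<pre><code class="language-sh">$ sudo systemctl disable --now systemd-resolved
</code></pre>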
<h2 id="install-longhorn">Install Longhorn</h2>
<p>Install Longhorn with the default settings into our target cluster:</p>
<pre><code class="language-sh">helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn -n longhorn-system --create-namespace
</code></pre>
<p>With Longhorn rolled out, use the UI to create a new volume, set the frontend to be ‘iSCSI’, and then make sure it’s attached to a host in your cluster. Verify its status:</p>
<pre><code class="language-sh">$ kubectl get lhv test -n longhorn-system
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
test attached healthy True 21474836480 192.168.20.184 25s
</code></pre>
<p>Grab the iSCSI endpoint for this volume either via the UI (under ‘Volume Details’) or via <code>kubectl</code>:</p>
<pre><code>$ kubectl get lhe -n longhorn-system
NAME STATE NODE INSTANCEMANAGER IMAGE AGE
test-e-b8eb676b running 192.168.20.184 instance-manager-e-baea466a longhornio/longhorn-engine:v1.1.2 17m
$ kubectl get lhe test-e-b8eb676b -n longhorn-system -o jsonpath='{.status.endpoint}'
iscsi://10.0.2.24:3260/iqn.2019-10.io.longhorn:test/1
</code></pre>
<h2 id="connect-iscsi-initiator-client-to-longhorn-volume">Connect iSCSI initiator (client) to Longhorn volume</h2>
<p>Now let’s try and connect our cluster-external VM to the Longhorn volume in our target cluster:</p>
<pre><code class="language-sh">$ iscsiadm --mode discoverydb --type sendtargets --portal 10.0.2.24 --discover
10.0.2.24:3260,1 iqn.2019-10.io.longhorn:test
$ iscsiadm --mode node --targetname iqn.2019-10.io.longhorn:test --portal 10.0.2.24:3260 --login
$ iscsiadm --mode node
10.0.2.24:3260,1 iqn.2019-10.io.longhorn:test
</code></pre>
<p>And now this volume is available as <code>/dev/sdb</code>, in my case:</p>
<pre><code>$ journalctl -xn 100 | grep sdb
Aug 12 09:05:11 longhorn-client kernel: sd 3:0:0:1: [sdb] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
Aug 12 09:05:11 longhorn-client kernel: sd 3:0:0:1: [sdb] Write Protect is off
Aug 12 09:05:11 longhorn-client kernel: sd 3:0:0:1: [sdb] Mode Sense: 69 00 10 08
Aug 12 09:05:11 longhorn-client kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
Aug 12 09:05:11 longhorn-client kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
$ lsblk /dev/sdb
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 20G 0 disk
</code></pre>
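<p>From here it behaves like any other block device so, purely as an example, you could format and mount it:</p>
<pre><code class="language-sh">$ sudo mkfs.ext4 /dev/sdb   # assumes the device really is sdb and contains nothing you care about
$ sudo mount /dev/sdb /mnt
$ df -h /mnt
</code></pre>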
<blockquote>
<p>NB: In the current implementation, the path to this iSCSI target (endpoint) is not HA. It’s useful for some ad-hoc data access and recovery, but you cannot rely on this approach for anything beyond that for now.</p>
</blockquote>
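<p>When you’re finished, unmount the volume and log out of the iSCSI session so the target can be detached cleanly on the Longhorn side - something along these lines, using the same target name and portal as discovered earlier:</p>
<pre><code class="language-sh">$ sudo umount /mnt
$ iscsiadm --mode node --targetname iqn.2019-10.io.longhorn:test --portal 10.0.2.24:3260 --logout
</code></pre>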
Running the Rancher CIS Operator on any Kubernetes cluster2020-10-22T09:58:00+00:00http://dischord.org/2020/10/22/rancher-cis-operator-on-any-kubernetes-cluster<p><a href="https://rancher.com/products/rancher/2.5">Rancher 2.5</a> has ushered in a bunch of changes, and some pieces of functionality - like <a href="https://github.com/rancher/backup-restore-operator">backups</a> and <a href="https://github.com/rancher/cis-operator">CIS scans</a> - have been moved out into their own <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/operator/">Operators</a>. It’s possible to make use of these on any Kubernetes cluster, not just one that’s been deployed and managed via Rancher, so this post details the steps necessary to deploy and run the CIS Operator specifically, and to view the results.</p>
<p>First of all, here’s my cluster deployed in AWS. It’s a four-node cluster, deployed using <a href="https://rancher.com/products/rke/">RKE</a>, with pretty much the defaults:</p>
<pre><code>$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-13-88.eu-west-2.compute.internal Ready worker 2m50s v1.19.2
ip-172-31-4-76.eu-west-2.compute.internal Ready worker 2m50s v1.19.2
ip-172-31-6-16.eu-west-2.compute.internal Ready worker 2m50s v1.19.2
ip-172-31-8-152.eu-west-2.compute.internal Ready controlplane,etcd 2m51s v1.19.2
</code></pre>
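<p>For reference, the <code>cluster.yml</code> that RKE consumes for a cluster like this is only a handful of lines - roughly the following, with the SSH user a placeholder and the addresses lifted from the node names above:</p>
<pre><code class="language-sh">$ cat cluster.yml
nodes:
  - address: 172.31.8.152
    user: ubuntu
    role: [controlplane, etcd]
  - address: 172.31.13.88
    user: ubuntu
    role: [worker]
  - address: 172.31.4.76
    user: ubuntu
    role: [worker]
  - address: 172.31.6.16
    user: ubuntu
    role: [worker]
$ rke up
</code></pre>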
<p>Install the Operator using the official Rancher Helm charts:</p>
<pre><code>$ helm repo add rancher https://charts.rancher.io
$ helm repo update
$ helm install rancher-cis-benchmark-crd rancher/rancher-cis-benchmark-crd \
--create-namespace -n cis-operator-system
$ helm install rancher-cis-benchmark rancher/rancher-cis-benchmark \
-n cis-operator-system
</code></pre>
<p>At this point we’ve some objects created in the <code>cis-operator-system</code> namespace as well as some new custom resource definitions that we can examine:</p>
<pre><code>$ kubectl get crds | grep cis
clusterscanbenchmarks.cis.cattle.io 2020-10-22T09:40:57Z
clusterscanprofiles.cis.cattle.io 2020-10-22T09:40:57Z
clusterscanreports.cis.cattle.io 2020-10-22T09:40:57Z
clusterscans.cis.cattle.io 2020-10-22T09:40:57Z
$ kubectl get all -n cis-operator-system
NAME READY STATUS RESTARTS AGE
pod/cis-operator-5cc97bd778-4t45g 1/1 Running 0 10m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cis-operator 1/1 1 1 10m
NAME DESIRED CURRENT READY AGE
replicaset.apps/cis-operator-5cc97bd778 1 1 1 10m
</code></pre>
<pre><code>$ kubectl get clusterscanbenchmarks
NAME CLUSTERPROVIDER MINKUBERNETESVERSION MAXKUBERNETESVERSION
cis-1.5 1.15.0
eks-1.0 eks 1.15.0
gke-1.0 gke 1.15.0
rke-cis-1.5-hardened rke 1.15.0
rke-cis-1.5-permissive rke 1.15.0
$ kubectl get clusterscanprofiles
cis-1.5-profile cis-1.5
eks-profile eks-1.0
gke-profile gke-1.0
rke-profile-hardened rke-cis-1.5-hardened
rke-profile-permissive rke-cis-1.5-permissive
</code></pre>
<p>Let’s use the built-in profile <code>rke-profile-permissive</code> to perform a scan in accordance with the CIS Kubernetes benchmark 1.5. We’ll create an object of kind <code>ClusterScan</code> and refer to the <code>rke-profile-permissive</code> profile:</p>
<pre><code>$ kubectl apply -f - << EOF
---
apiVersion: cis.cattle.io/v1
kind: ClusterScan
metadata:
name: rke-cis
spec:
scanProfileName: rke-profile-permissive
EOF
</code></pre>
<p>Now we can check the status of this via the <code>clusterscans</code> CRD:</p>
<pre><code>$ kubectl get clusterscans
NAME CLUSTERSCANPROFILE TOTAL PASS FAIL SKIP NOT APPLICABLE LASTRUNTIMESTAMP
rke-cis rke-profile-permissive 2020-10-22T10:02:53Z
</code></pre>
<p>While it’s running, if you check in the <code>cis-operator-system</code> namespace you’ll see a number of Pods have been launched that are doing the job of running this scan against our cluster:</p>
<pre><code>$ kubectl get pods -n cis-operator-system
NAME READY STATUS RESTARTS AGE
cis-operator-5cc97bd778-4t45g 1/1 Running 0 21m
security-scan-runner-rke-cis-kbqfl 1/1 Running 0 20s
sonobuoy-rancher-kube-bench-daemon-set-8016e26168744e62-2662m 0/2 ContainerCreating 0 8s
sonobuoy-rancher-kube-bench-daemon-set-8016e26168744e62-2dw7q 2/2 Running 0 8s
sonobuoy-rancher-kube-bench-daemon-set-8016e26168744e62-hq4mt 0/2 ContainerCreating 0 8s
sonobuoy-rancher-kube-bench-daemon-set-8016e26168744e62-qbggn 0/2 ContainerCreating 0 8s
</code></pre>
<p>After a minute or so, our scan should run to completion:</p>
<pre><code>$ kubectl get clusterscans
NAME CLUSTERSCANPROFILE TOTAL PASS FAIL SKIP NOT APPLICABLE LASTRUNTIMESTAMP
rke-cis rke-profile-permissive 92 58 0 0 34 2020-10-22T10:02:53Z
</code></pre>
<p>And now we can check out the report:</p>
<pre><code>$ kubectl get clusterscanreports
NAME LASTRUNTIMESTAMP BENCHMARKVERSION
scan-report-rke-cis 2020-10-22 10:03:26.744176873 +0000 UTC m=+1304.643435643 rke-cis-1.5-permissive
</code></pre>
<p>The report itself is in JSON:</p>
<pre><code>$ kubectl get clusterscanreport scan-report-rke-cis -o jsonpath="{.spec.reportJSON}" | jq
{
"version": "rke-cis-1.5-permissive",
"total": 92,
"pass": 58,
"fail": 0,
"skip": 0,
✂️ ---------
</code></pre>
<p>Piping the output via <a href="https://stedolan.github.io/jq/"><code>jq</code></a> tidies things up, but it’s still not particularly consumable within our terminal. Obviously the output is designed to be parsed and displayed by something else (i.e. Rancher, duh), but we can also quickly tidy it up via a bit of Python to just dump the output into tables:</p>
<pre><code>#!/usr/bin/env python3
# Render a Rancher CIS Operator scan report (JSON on stdin) as tables.
import sys
import json
from prettytable import PrettyTable

data = json.load(sys.stdin)

resultsTable = PrettyTable()
summaryTable = PrettyTable()

# One row per individual check, sorted by check ID
resultsTable.field_names = ["ID", "Area", "Description", "Result"]
resultsTable.align = "l"
resultsTable.sortby = "ID"
for r in data["results"]:
    for c in r["checks"]:
        resultsTable.add_row(
            [c["id"], r["description"], c["description"], c["state"]])

# Overall pass / fail / skip / not-applicable counts
summaryTable.field_names = ["Total", "Pass", "Fail", "Skip", "N/A"]
summaryTable.align = "r"
summaryTable.add_row(
    [data["total"], data["pass"], data["fail"],
     data["skip"], data["notApplicable"]]
)

print(resultsTable)
print(summaryTable)
</code></pre>
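<p>The only dependency beyond the standard library is <a href="https://pypi.org/project/prettytable/">prettytable</a>, which you’ll need to install first:</p>
<pre><code class="language-sh">$ pip3 install prettytable
</code></pre>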
<p>Finally, if we save that as <code>~/tmp/scanreport.py</code> then we can pipe the output of the previous command, minus <code>jq</code>, and see our results:</p>
<pre><code>$ kubectl get clusterscanreport scan-report-rke-cis -o jsonpath="{.spec.reportJSON}" | ~/tmp/scanreport.py
+--------+----------------------------------+-------------------------------------------------------------------------------------------------------------------+---------------+
| ID | Area | Description | Result |
+--------+----------------------------------+-------------------------------------------------------------------------------------------------------------------+---------------+
| 1.1.1 | Master Node Configuration Files | Ensure that the API server pod specification file permissions are set to 644 or more restrictive (Scored) | notApplicable |
| 1.1.11 | Master Node Configuration Files | Ensure that the etcd data directory permissions are set to 700 or more restrictive (Scored) | pass |
| 1.1.12 | Master Node Configuration Files | Ensure that the etcd data directory ownership is set to etcd:etcd (Scored) | notApplicable |
| 1.1.13 | Master Node Configuration Files | Ensure that the admin.conf file permissions are set to 644 or more restrictive (Scored) | notApplicable |
| 1.1.14 | Master Node Configuration Files | Ensure that the admin.conf file ownership is set to root:root (Scored) | notApplicable |
| 1.1.15 | Master Node Configuration Files | Ensure that the scheduler.conf file permissions are set to 644 or more restrictive (Scored) | notApplicable |
| 1.1.16 | Master Node Configuration Files | Ensure that the scheduler.conf file ownership is set to root:root (Scored) | notApplicable |
| 1.1.17 | Master Node Configuration Files | Ensure that the controller-manager.conf file permissions are set to 644 or more restrictive (Scored) | notApplicable |
✂️ --------
+-------+------+------+------+-----+
| Total | Pass | Fail | Skip | N/A |
+-------+------+------+------+-----+
| 92 | 58 | 0 | 0 | 34 |
+-------+------+------+------+-----+
</code></pre>
Nova and libvirt 6.x2020-08-19T21:59:00+00:00http://dischord.org/2020/08/19/nova-and-libvirt-6-x<p>This one’s a heads-up in case you’ve an OpenStack deployment that’s been around for a while and which hosts instances spawned prior to the Train release. Pets, if you will. You’re in for a bit of a shock if one of these instances is stopped and then started again - libvirt will refuse to create the domain, with a Nova backtrace along these lines:</p>
<pre><code>2020-08-19 16:53:20.533 6 ERROR oslo_messaging.rpc.server libvirt.libvirtError: Requested operation is not valid: format of backing image '/var/lib/nova/instances/_base/c3395c4245b7573c83342d68a0d0ea675b7a1722' of
image '/var/lib/nova/instances/947df0d3-5aab-456d-a200-63b055934a43/disk' was not specified in the image metadata (See https://libvirt.org/kbase/backing_chains.html for troubleshooting)
</code></pre>
<p>The error is because of a change introduced in <a href="https://github.com/libvirt/libvirt/commit/3615e8b39badf2a526996a69dc91a92b04cf262e">libvirt 6.0</a> which means that it’ll fail to launch the domain if the underlying disk’s backing store doesn’t have a format explicitly defined.</p>
<p>The good news is that this was spotted - and fixed - in Nova in the Train release. There’s an associated bug on Launchpad along with additional background on the problem <a href="https://bugs.launchpad.net/nova/+bug/1864020">here</a>. The bad news is that it only applies to instances created after the fix was in place. Older instances which were created without specifying that backing file format will fail with the above error if you’ve upgraded libvirt on your hypervisors.</p>
<p>I hit this when upgrading from Train to Ussuri via <a href="https://docs.openstack.org/kolla-ansible/latest/">Kolla-Ansible</a>; The Kolla-built Ubuntu-based Docker images for Ussuri include this newer version of libvirt, and so when an older VM was stopped (i.e. powered off) and then started again I saw this problem.</p>
<p>The fix is relatively straightforward and is mostly a case of following the documentation <a href="https://libvirt.org/kbase/backing_chains.html">linked in the error message</a>, with a few OpenStack-specific twists that are worth being aware of, especially if you’re using Kolla. You should also take a backup (if possible) of the <code>disk</code> file as well as the backing file. In my case, the instance’s disks are hosted on the hypervisors themselves. You’ll need to adjust this process if you’re presenting block storage via some other means.</p>
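<p>A minimal sketch of that backup, assuming the instance is powered off and you have somewhere with enough free space to copy to - on a Kolla deployment these paths are the ones visible from inside the <code>nova_compute</code> container described below:</p>
<pre><code class="language-sh"># cp /var/lib/nova/instances/947df0d3-5aab-456d-a200-63b055934a43/disk /var/tmp/disk.bak
# cp /var/lib/nova/instances/_base/c3395c4245b7573c83342d68a0d0ea675b7a1722 /var/tmp/base.bak
</code></pre>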
<p>For a Kolla-based deployment, the commands need to be run in the <code>nova_compute</code> container so we can guarantee we’re using the right version of the QEMU tools and also to make our lives a bit easier when referring to disk file paths. If you <code>docker exec</code> straight in to this container you’ll be dropped in as the <code>nova</code> user which won’t have the permissions necessary to update the disk file, so instead we need to use <code>nsenter</code>:</p>
<pre><code>root@compute2:~# PID=$(docker inspect --format {{.State.Pid}} nova_compute)
root@compute2:~# nsenter --target $PID --mount --uts --ipc --net --pid
()[root@compute2 /]#
</code></pre>
<p>Now we need to navigate to the folder hosting our instance’s disk. You’ll need the UUID of the instance to do that (<code>947df0d3-5aab-456d-a200-63b055934a43</code> in my example), then we can use <code>qemu-img info</code> to find out a bit more about it:</p>
<pre><code># cd /var/lib/nova/instances/947df0d3-5aab-456d-a200-63b055934a43
# qemu-img info disk
image: disk
file format: qcow2
virtual size: 80 GiB (85899345920 bytes)
disk size: 21.3 GiB
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/c3395c4245b7573c83342d68a0d0ea675b7a1722
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
</code></pre>
<p>In this case, we’re missing a field - backing file format. If we examine another instance booted using Ussuri, we can see that’s present:</p>
<pre><code>image: disk
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 291 MiB
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/c3395c4245b7573c83342d68a0d0ea675b7a1722
backing file format: raw
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
</code></pre>
<p>To fix the problem, we need to run another <code>qemu-img</code> command. We should validate the format of the backing file first, and then armed with the right info we can update the image:</p>
<pre><code># qemu-img info /var/lib/nova/instances/_base/c3395c4245b7573c83342d68a0d0ea675b7a1722
image: /var/lib/nova/instances/_base/c3395c4245b7573c83342d68a0d0ea675b7a1722
file format: raw
virtual size: 2.2 GiB (2361393152 bytes)
disk size: 1.02 GiB
# qemu-img rebase -f qcow2 -F raw \
-b /var/lib/nova/instances/_base/c3395c4245b7573c83342d68a0d0ea675b7a1722 \
/var/lib/nova/instances/947df0d3-5aab-456d-a200-63b055934a43/disk
# qemu-img info disk
image: disk
file format: qcow2
virtual size: 80 GiB (85899345920 bytes)
disk size: 21.3 GiB
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/c3395c4245b7573c83342d68a0d0ea675b7a1722
backing file format: raw
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
</code></pre>
<p>The second command updated the image, and the last command validated that we now see the <code>backing file format</code> specified correctly.</p>
<p>If you’ve done everything right then you should now be able to start the instance up again without Nova erroring.</p>
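<p>As a quick sanity check - assuming you have the OpenStack CLI to hand and credentials loaded - start the instance and watch its status:</p>
<pre><code class="language-sh">$ openstack server start 947df0d3-5aab-456d-a200-63b055934a43
$ openstack server show 947df0d3-5aab-456d-a200-63b055934a43 -c status -f value   # should eventually report ACTIVE
</code></pre>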
Linked List # 32019-10-08T10:00:00+00:00http://dischord.org/2019/10/08/linked-list-3<blockquote>
<p>In an effort to inject some life back into this blog, I’m opting to do something relatively lazy and start posting semi-regularly with the odd link to anything that I’ve found of value or interest, but which don’t seem to have percolated their way up to the usual places. I’ve stolen the name (although it’s an obvious one) from <a href="http://daringfireball.net">Daring Fireball</a> and it’s inspired to some degree by <a href="https://blog.scottlowe.org">Scott Lowe’s</a> “Technology Short Takes”.</p>
</blockquote>
<ul>
<li>
<p><a href="https://blog.pragmaticengineer.com/operating-a-high-scale-distributed-system/">Practices and Lessons Learned operating a large-scale distributed system in a reliable way</a>.</p>
</li>
<li>
<p><a href="https://iximiuz.com/en/posts/linux-pty-what-powers-docker-attach-functionality/">Linux PTY</a>: Linux PTYs explained in the context of what powers Docker’s <code>attach</code> functionality.</p>
</li>
<li>
<p><a href="https://github.com/wagoodman/dive">Dive</a>: A tool for exploring a docker image, layer contents, and discovering ways to shrink your Docker image size.</p>
</li>
<li>
<p><a href="https://github.com/rs/curlie">Curlie</a>: A front-end for <a href="https://curl.haxx.se">cURL</a> that provides the ease-of-use and aesthetic goodness of <a href="https://httpie.org">HTTPie</a> but without sacrificing the power and flexibility of cURL itself.</p>
</li>
<li>
<p><a href="https://blog.hackedu.io/analysis-of-common-federated-identity-protocols/">An analysis of common federated identity protocols</a>, specifically OpenID Connect vs OAuth 2.0 vs SAML 2.0. Would’ve been extremely handy to have had this resource whilst working on <a href="https://blog.hackedu.io/analysis-of-common-federated-identity-protocols/">Federated OpenStack at StackHPC</a>.</p>
</li>
<li>
<p><a href="https://github.com/nushell/nushell">Nu Shell</a>: A modern shell written in Rust.</p>
</li>
<li>
<p><a href="https://github.com/saschagrunert/kubernix">kuberNix</a>: Kubernetes development cluster bootstrapping with <a href="https://nixos.org/nix/">Nix</a> packages.</p>
</li>
<li>
<p><a href="https://github.com/sapcc/kubernikus">Kubernikus</a>: Kubernetes as a Service for OpenStack, developed by the folks at <a href="https://www.sap.com">SAP</a>.</p>
</li>
</ul>
Linked List # 22019-08-07T12:42:00+00:00http://dischord.org/2019/08/07/linked-list-2<blockquote>
<p>In an effort to inject some life back into this blog, I’m opting to do something relatively lazy and start posting semi-regularly with the odd link to anything that I’ve found of value or interest, but which don’t seem to have percolated their way up to the usual places. I’ve stolen the name (although it’s an obvious one) from <a href="http://daringfireball.net">Daring Fireball</a> and it’s inspired to some degree by <a href="https://blog.scottlowe.org">Scott Lowe’s</a> “Technology Short Takes”.</p>
</blockquote>
<ul>
<li>
<p><a href="https://drewdevault.com/2019/06/03/Announcing-aerc-0.1.0.html">aerc</a>: An email client for your terminal. I’ve been a fan of <a href="http://mutt.org">Mutt</a> (and <a href="http://neomutt.org">Neomutt</a>) for many, many years. The latter is a valiant attempt at bringing in additional functionality via various patches that have been floating around, but some classic problems (such as lack of asynchronous operations, flaky IMAP support, and so on) persist. aerc is brand new terminal-based email client that solves these problems and rethinks what such a client should do. It’s already perfectly functional and has picked up a decent amount of momentum.</p>
</li>
<li>
<p><a href="https://medium.com/netflix-techblog/predictive-cpu-isolation-of-containers-at-netflix-91f014d856c7">Predictive CPU isolation of containers at Netflix</a>: Infrastructure insghts from the folks at Netflix are always worth a read, and this post is about solving the noisy neighbour problem in granular fashion.</p>
</li>
<li>
<p><a href="https://thebsdbox.co.uk/2019/06/20/Balancing-the-API-Server-with-nftables/">Using nftables to provide Kubernetes API load-balancing</a>: <a href="http://thebsdbox.co.uk">Dan’s</a> been on a roll recently with technical blog updates, and here’s one I found particularly interesting as a worked example of using <code>nftables</code>. I’ve always found packet filtering on Linux to be something of a sorry tale in usability terms, especially when compared to OpenBSD’s <a href="https://www.openbsd.org/faq/pf/">pf</a>.</p>
</li>
<li>
<p><a href="https://github.com/alexellis/inlets">Inlets: Expose local endpoints on the Internet</a>: As someone who runs a public cloud with a paucity of public IPv4 addresses available, I get the problems that this sort of thing is trying to solve.</p>
</li>
<li>
<p><a href="https://github.com/notqmail/notqmail">notqmail</a>: Collaborative open-source successor to <a href="https://cr.yp.to/qmail.html">qmail</a>. Similar to Neomutt mentioned above, this looks like an attempt at repackaging the plethora of qmail patches that are floating around and also cleaning up some of the codebase. I used to run numerous qmail servers back in the day (I was a huge fan of <a href="https://cr.yp.to/djb.html">DJB’s</a> software in general…), so this is of peripheral interest. I say peripheral because running your own mail server in 2019 is a pain in the ass.</p>
</li>
<li>
<p><a href="https://github.com/akavel/up">up</a>: A tool for writing Linux pipes with instant live preview. You can also combine this with <a href="https://github.com/junegunn/fzf">fzf</a> with some pretty nifty results, i.e:</p>
<p><code>echo '' | fzf --multi --preview='bash -c {q}' --preview-window=up:70</code></p>
</li>
</ul>
Inside the Sausage Factory2019-07-23T21:35:00+00:00http://dischord.org/2019/07/23/inside-the-sausage-factory<blockquote>
<p>Update 12/2021: There’s a lot that’s changed in the last couple of years since this post was originally written, and it’s long overdue a bit of an update! Pretty much all of the ‘ancient-yet-spritely’ Blades have been replaced with much more modern servers in an effort to increase performance and also reduce power consumption. We’ve also upgraded to 10GbE networking throughout, and best of all we have a shiny new Ceph cluster thanks to the amazing folks at <a href="https://softiron.com/resources/softiron-donates-hyperdrive-storage-solution-to-open-source-community-project-sausage-cloud/">SoftIron</a>! There’ll be a follow-up post soon with some insight into the work we’ve done….</p>
</blockquote>
<p>Part of the reason why we chose such a ridiculous name for Sausage Cloud was, although <a href="https://dischord.org/2019/07/01/everyone-loves-a-sausage/">everyone loves a sausage</a>, no one really wants to know where they come from or how they’re made, much less to look inside the sausage factory itself 🙀. This seems apt for someone running infrastructure services.</p>
<p>So in this post I’m going to do exactly that, and gaze at the horrors that lie within when it comes to running a cloud platform on a shoestring budget with blatant disregard for service because hobby project.</p>
<h2 id="hardware">Hardware</h2>
<p>The hardware that underpins Sausage Cloud is a set of ancient-yet-spritely HP BL460c G6 Blades. I say spritely because they’ve got a pair of mirrored 1TB disks sat in each one, which makes I/O just about bearable. The generation of Xeon that lies within (E5520) is power hungry and sadly too old for testing some nascent virtualisation technologies - specifically <a href="https://katacontainers.io">Kata</a> and <a href="https://firecracker-microvm.github.io">Firecracker</a> - which is a shame because a number of us would like to play around with that stuff some more.</p>
<p>Networking is provided by a single Cisco 3750G 48-port switch, and we use a couple of passthrough looms in the Blade chassis itself to present a pair of 1GbE interfaces to each Blade. We’ve got VLANs defined in order to segregate each class of traffic, which broadly boils down to:</p>
<ul>
<li>Management</li>
<li>Internet</li>
<li>Overlay</li>
<li>Out-of-band</li>
</ul>
<p>You’ll notice there’s no dedicated VLAN for storage traffic, more on that a little later.</p>
<p>Routing and basic perimeter security is managed by a weedy Juniper SRX210.</p>
<p>For managing out-of-band (consoles etc.) we have to go via the HP BladeSystem’s “onboard administrator” which is a creaky old thing, and remains just about serviceable with the latest firmware applied. It only renders properly in Firefox though for some reason, and the virtual console stuff is only supported via a Java option 🤮. I keep a dedicated VM around on my laptop for exactly this sort of thing.</p>
<h2 id="deployment-and-automation">Deployment and automation</h2>
<p>To manage the baremetal deployment of the cloud infrastructure we use Canonical’s MAAS. It does just enough of the basics in a reasonably sensible fashion to justify its existence, and it’s deployed to a dedicated Blade. I hate that this is a SPOF; All too often administrators overlook the importance of such core infrastructure, and we’re guilty here. We all have our reasons.</p>
<p>This Blade also has a checkout of the <a href="https://docs.openstack.org/kolla-ansible/latest/">Kolla-Ansible</a> source code repository along with a corresponding Python virtualenv, and this is what’s used to deploy and configure the OpenStack deployment itself entirely in Docker containers from just <em>46 lines of configuration</em>:</p>
<pre><code class="language-shell">$ grep -o '^[^#]*' globals.yml | wc -l
46
</code></pre>
<p>And I’m sure there’s some redundant junk in there! Let that sink in for a minute as you gaze upon this expanse of blog. I can’t overstate how fantastic the <a href="https://wiki.openstack.org/wiki/Kolla">Kolla</a> project is, and why - even if you’re not interested in OpenStack - you should take a look. It’s borne from operator experience and it’s comprehensive enough to cater for the majority of use-cases. If you can’t see it covering your use-case, chances are you’re thinking about Doing It Wrong™️.</p>
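<p>To give a flavour of what those lines look like, here’s the sort of thing you’d find in a minimal <code>globals.yml</code> - the values below are illustrative rather than ours:</p>
<pre><code class="language-sh">$ grep -o '^[^#]*' globals.yml | head -6
kolla_base_distro: "ubuntu"
kolla_install_type: "source"
network_interface: "eno1"
neutron_external_interface: "eno2"
kolla_internal_vip_address: "192.0.2.250"
enable_haproxy: "yes"
</code></pre>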
<p>There’s a little <a href="https://www.pcengines.ch/apu2.htm">PC Engines APU2</a> sat on one side which provides secure connectivity into the platform, with VPN access facilitated by <a href="https://www.wireguard.com">Wireguard</a>.</p>
<p><img src="/public/static/electrics.jpg" alt="Wiring" class="center" /></p>
<h2 id="controllers">Controllers</h2>
<p>Two of the ten Blades have been designated our control nodes. These run all of the core OpenStack API services, workers, and supporting junk such as memcached, Galera and RabbitMQ. The observant amongst you will have probably spat out your drink at the insane notion of running a <em>two node</em> Galera cluster, let alone RabbitMQ, but that’s exactly what I’ve done here, simply because I didn’t want to ‘waste’ another node that could be otherwise used for compute workloads. It’s possible to run a service called <a href="https://galeracluster.com/library/documentation/arbitrator.html">Galera Arbitrator</a> that doesn’t persist any data and which just participates in cluster quorum election but I gave that a miss because it’s not currently supported by Kolla. Instead I decided to YOLO my way through the deployment and just run with two nodes. To be honest, this isn’t quite as big of a deal for a couple of reasons:</p>
<p>Firstly, our cloud is so small such that it doesn’t really see all that much change in the environment. I can take a <a href="https://docs.openstack.org/kolla-ansible/latest/admin/mariadb-backup-and-restore.html">backup</a> of Galera on a daily basis and that’s sufficient to be able to restore in the event of data loss. Secondly, remember that everything is wired into a single switch anyway - so partitioning is a lot less likely.</p>
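<p>For what it’s worth, that backup is a single Kolla-Ansible command run from the deploy host - something like the following, with the inventory name being whatever yours happens to be called:</p>
<pre><code class="language-sh">$ kolla-ansible -i multinode mariadb_backup
</code></pre>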
<p>Finally we’re accountable to no one but ourselves because this is a hobby project, but at the same time we’re proud enough to care - enough to give it sufficient thought to weigh it up anyway.</p>
<h2 id="compute">Compute</h2>
<p>There are five blades allocated for the task of running virtualised workloads. There’s not much to say about these - they pretty much get on with doing their job. I’ve been side-eyeing one of them recently though - <code>compute5</code> - with some suspicion as workloads just don’t seem happy on there, but it keeps plodding along.</p>
<p><img src="/public/static/camp.jpg" alt="Camp" class="center" /></p>
<h2 id="networking">Networking</h2>
<p>I’ve mentioned the physical networking aspects already. The virtual networking is taken care of by one or two lines in Kolla’s configuration (haha remember the good times we had with discovering how to do this ourselves with Puppet and Open vSwitch and oh why am I crying please hold me it’s dark also the pain), and this handles deploying Neutron and using VXLAN to tunnel the overlay for virtually segregated tenant traffic. Just a couple of calls post-deployment (as a one-off task) to the Neutron API are required to define our provider network, specify a range of publicly-accessible IP addresses which can be allocated as floating IPs or used for router gateways, and we’re done.</p>
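<p>Those couple of post-deployment calls look roughly like this - the network name, physnet label, and address ranges below are purely illustrative:</p>
<pre><code class="language-sh">$ openstack network create --external --provider-network-type flat \
    --provider-physical-network physnet1 public
$ openstack subnet create --network public --subnet-range 203.0.113.0/24 \
    --allocation-pool start=203.0.113.10,end=203.0.113.200 \
    --gateway 203.0.113.1 --no-dhcp public-subnet
</code></pre>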
<p>Our two (lol) designated controllers also happen to run double-duty as network nodes, handling layer-3 traffic ingress and egress for all tenant traffic. We make use of VLAN tagged interfaces to ensure the appropriate segregation, otherwise everything works as it should. OpenStack components for the APIs and for network routing are configured for ‘high availability’, meaning that if we do lose one controller, the other one will assume responsibility for routing traffic in and out of wherever it needs to go. Behind the scenes this is made possible by <a href="https://www.keepalived.org">keepalived</a>.</p>
<p>As far as usage goes, the network is presented much like other public OpenStack cloud platforms and indeed AWS. You get the ability to create your own private networks, subnets and routers, and you can set a public (gateway) IP on a router in order to be able to ingress and egress traffic from the Internet. Virtual machines get private IP addresses assigned, and you can allocate floating IPs from our small pool of public IPv4 addresses. These are our scarcest, most precious resource, and I think after tackling the persistent storage we’d look to IPv6 instead.</p>
<h2 id="monitoring">Monitoring</h2>
<p>Much like DNS, monitoring is ever the misunderstood afterthought. And again, much like DNS, fatally so. Luckily we have some monitoring! What are we, hobbyist amateurs?! 😉</p>
<p>We collect all logs via fluentd, parse them via Logstash, send them on their merry way to Elasticsearch, and then scratch our heads over the Lucene query syntax using Kibana. We’ve also deployed <a href="https://www.netdata.cloud">Netdata</a>; It seemed like a quick win at the time, but it offers up a <em>lot</em> of metrics with no easy way to aggregate or do anything especially useful other than a bit of ad-hoc analysis. It’s been good enough, but as soon as I have the time this is the area in which I’d like to pay some serious attention.</p>
<p><img src="/public/static/netdata.png" alt="Netdata" class="center" /></p>
<h2 id="usage">Usage</h2>
<p>Most of the platform’s usage has centred around people being able to spin up a handful of VMs with a decent amount of memory allocated - 16 or 32GB per instance isn’t uncommon. This fits the pattern of being able to have a persistent, remote virtual environment in which to develop and test without having to worry about crazy bills. The flexibility of the per-tenant networking means that you can replicate some functionality that would otherwise be awkward to achieve using ‘desktop’ virtualisation software. The CPU performance seems to matter a lot less for this sort of thing. In short, it’s a useful platform for those of us interested in developing and testing other cloud platforms. For example, I made use of it extensively when <a href="https://github.com/dcos-terraform/examples/tree/openstack/openstack">pulling together some bits of Terraform</a> to get <a href="https://dcos.io">DC/OS</a> to install via the <a href="https://docs.mesosphere.com/1.13/installing/evaluation/#about-the-mesosphere-universal-installer">Universal Installer</a>.</p>
<h2 id="whats-missing">What’s missing</h2>
<p>Even though it’s such a budget deployment, the platform itself presents and supports a decent selection of services. Apart from all the base stuff required to run and manage virtual compute and networking, there’s some additional supporting components that can keep our users well away from <em>spits</em> virtual machines. If you want to deploy Kubernetes, we have Magnum for that. If it’s good enough for <a href="https://archive.fosdem.org/2017/schedule/event/magnumcern/">CERN’s crazy amount of clusters</a>, then it’s good enough for you. We also have some niceties such as Designate for DNS-as-a-Service.</p>
<p>However, there is one glaring omission - I mean apart from the general infrastructure horror show: Persistent storage. Right now you’re limited to ephemeral, which generally is fine; Your virtual machine’s storage is local to the hypervisor (meaning it’s on an SSD), and it’s only ephemeral insofar as the VM itself. For most of us right now this is more than good enough, but it would be nice to be able to mount software-defined block storage volumes on demand, so we’re working towards plugging that gap with <a href="https://ceph.com">Ceph</a>.</p>
<h2 id="closing-thoughts">Closing thoughts</h2>
<p>Reading the above, you’d be forgiven for thinking that this sounds like a recipe for disaster. A painful exercise in learning what it takes to run a complex IaaS. After all, OpenStack is hard, right? Surprisingly enough, no. I mean, I live in daily dread of something happening which necessitates a trip to the Bunker (thankfully now not all that far away) and which means myself and Matt have to get our hands in our pockets, but honestly it’s required very little involvement on our behalf to keep it ticking over. In the time since it’s been deployed, it’s seen a fair amount of usage with over 2418 virtual machines popping into and out of existence, and I’ve had comments from people about the stability of their IRC bouncer running on a VM with 32GB of RAM.</p>
<p>If that’s not validation then I don’t know what is.</p>
<p>To close things out, here’s a picture of <a href="https://twitter.com/M0nk3Ee">Matt</a> after a job well done:</p>
<p><img src="/public/static/welldone.JPG" alt="Well Done" class="center" /></p>
kind and Persistent Storage2019-07-11T14:13:00+00:00http://dischord.org/2019/07/11/persistent-storage-kind<blockquote>
<p>NB: As of <a href="https://github.com/kubernetes-sigs/kind/releases/tag/v0.7.0">kind 0.7.0</a>, the below is no longer necessary! 🎉</p>
</blockquote>
<p>I’ve been using <a href="https://github.com/kubernetes-sigs/kind">kind</a> instead of <a href="https://github.com/kubernetes/minikube">Minikube</a> recently for any Kubernetes-related hacking about - specifically for <a href="https://kudo.dev">KUDO</a>. Why? Well, on macOS at least, it’s quicker to launch and it’s also more efficient from a resource usage perspective - if you happen to have Docker running anyway that is. Remember that Docker for Mac actually runs Linux in a VM via <a href="https://github.com/moby/hyperkit">Hyperkit</a> on your behalf; Minikube - if you <a href="https://github.com/kubernetes/minikube/blob/master/docs/drivers.md#hyperkit-driver">configure it correctly</a> - <em>also</em> creates a VM to deploy Kubernetes. Having two VMs with a committed amount of CPU and memory running behind the scenes doesn’t make sense. You might as well run it all in containers under the one VM (the Docker-provisioned one).</p>
<p>Anyway, I hit a problem early on trying to deploy either the <a href="https://github.com/kudobuilder/operators/tree/master/repository/zookeeper">ZooKeeper</a> or <a href="https://github.com/kudobuilder/operators/tree/master/repository/kafka">Kafka</a> operators, as they both have a <code>StatefulSet</code> defined with persistent storage requirements via <code>volumeClaimTemplates</code>. By default kind uses hostpath-provisioner which is somewhat limited, and so if you look at the logs for your service - for example one of the ZooKeeper pods in my case - you’ll see errors along the lines of:</p>
<pre><code>Creating ZooKeeper log4j configuration
mkdir: cannot create directory '/var/lib/zookeeper/data': Permission denied
chown: cannot access '/var/lib/zookeeper/data': No such file or directory
mkdir: cannot create directory '/var/lib/zookeeper/data': Permission denied
chown: invalid group: 'zookeeper:USER'
/usr/bin/start-zookeeper: line 176: /var/lib/zookeeper/data/myid: No such file or directory
</code></pre>
<p>Fortunately, it’s possible to fix this by switching to the <a href="https://github.com/rancher/local-path-provisioner">Local Path Provisioner</a> by doing the following after launching your kind cluster (and before attempting to run any of the aforementioned services, obvs):</p>
<pre><code class="language-sh">kubectl delete storageclass standard
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
kubectl annotate storageclass --overwrite local-path storageclass.kubernetes.io/is-default-class=true
</code></pre>
<p>I actually spotted the fix for this problem via the <a href="https://github.com/kudobuilder/operators/blob/e4c10303b5d60810c72e85c4a221e465bfd66755/Makefile#L34">Makefile</a> in the <a href="https://github.com/kudobuilder/operators">KUDO operators repository</a>, so kudos (🤦♂️) to the <a href="https://twitter.com/justinmbarrick">original author</a> for that one.</p>
Everyone Loves a Sausage2019-07-01T10:03:00+00:00http://dischord.org/2019/07/01/everyone-loves-a-sausage<blockquote>
<p>NB: There’s a follow-up post <a href="https://dischord.org/2019/07/23/inside-the-sausage-factory/">here</a> that goes into a bit more technical detail if that’s your thing…</p>
</blockquote>
<p>People that know me are probably aware of the fact that I’m something of a pack rat when it comes to hardware (and software). I hate throwing away (or deleting) anything that still works and so might have some useful function, however unlikely - the box of ancient SCSI hard disks (9GB and 18GB!) along with SGI workstations and countless other random bits attest to that fact. But when the chance comes to finally make use of that thing you’ve saved… :chefs-kiss-emoji:</p>
<p>A couple of years back, with the demise of <a href="http://github.com/datacentred/">DataCentred</a> there were a couple of significant gaps in my life. The first was gainful employment, and the second was the painful realisation that I had to start <em>paying</em> for hosting again. Unconscionable!</p>
<p>Fortunately the first problem was taken care of by an amazing opportunity at <a href="http://stackhpc.com">StackHPC</a>. The second, well…</p>
<p>Once the dust had settled on the whole thing, over the course of a conversation down the pub one evening myself and <a href="https://twitter.com/M0nk3Ee">Matt</a> came to the realisation that between us, we had enough hardware lying around in our garages and basements to build a very small but serviceable cloud platform; a little too much for ‘homelab’ meddling, but definitely enough to do something ‘interesting’. As it turns out, we also knew someone who had recently purchased a former nuclear bunker in Scotland and had kitted it out with some serious connectivity, and so we decided to do the obvious thing: Cart it up there, rack and install the lot of it, and stick it online!</p>
<p><img src="/public/static/bunker.png" alt="Great place to hide a sausage" class="center" /></p>
<p>A plan was soon hatched to perform the following in a single day:</p>
<ul>
<li>Drive from Manchester to <a href="https://en.wikipedia.org/wiki/Comrie">Comrie</a></li>
<li>Build a rack</li>
<li>Physically install all the hardware</li>
<li>Deploy bootstrap and out-of-band infrastructure</li>
<li>Configure basic networking</li>
<li>Do enough testing to be confident that we can manage it all remotely from a few hundred miles away…</li>
<li>Drive from Comrie back to Manchester</li>
</ul>
<p>No mean feat when you consider that most of the hardware hadn’t been turned on for the best part of a year and really hadn’t been taken care of all that well in the meantime. To say it was a long day was an understatement, but we just about managed to keep it under 24 hours…</p>
<p><img src="/public/static/matthax.JPG" alt="Matt at work" class="center" /></p>
<p>As with all things, it’s the naming that’s often the hardest nut to crack. The discussion around what to name it kept us going from North of Glasgow until nearly Lancaster, before Matt cracked it. He had the idea that we should name it after something that everyone likes or loves, and who doesn’t love a sausage?! Of course, this lent itself perfectly when it came to what to name the various ‘flavours’ (sizes) of virtual machine. Gone are boring names such as <code>dc1.1x2</code> and <code>c5d.2xlarge</code>, what you really want is either a ‘chipolata’ (small) or a ‘cumberland’ (large!).</p>
<p>Sausage Cloud was born.</p>
<pre><code>$ openstack flavor list -c Name -c VCPUs -c RAM -c Disk
+------------+-------+------+-------+
| Name | RAM | Disk | VCPUs |
+------------+-------+------+-------+
| wiener | 512 | 1 | 1 |
| chipolata | 1024 | 10 | 1 |
| hotdog | 8192 | 80 | 4 |
| saveloy | 16384 | 80 | 4 |
| cumberland | 32768 | 100 | 8 |
| bratwurst | 4096 | 40 | 2 |
+------------+-------+------+-------+
</code></pre>
<p>Anyway, it seemed like a good idea after having been awake for about 18 hours.</p>
<p>We continued the rest of the work entirely remotely, and within a few days thanks to the combined wonders of Canonical’s <a href="https://maas.io">MAAS</a> and the <a href="https://wiki.openstack.org/wiki/Kolla">Kolla</a> project we had a fully functioning <a href="https://openstack.org">OpenStack</a> platform up and running and presented to the world as a public ‘cloud’. Anyone that still thinks OpenStack is hard to deploy and manage is <em>dead wrong</em>.</p>
<p>Fast forward nearly 14 months and this thing is still going, creaking along and consuming horrendous amounts of power because that’s what 10-year old Intel Xeons like to do best. It’s been through two OpenStack upgrades so far (again, painless thanks to Kolla) and it’s helped with countless test deployments and projects such as my recent attempts at using Terraform to deploy <a href="https://github.com/dcos-terraform/examples/tree/openstack/openstack">DC/OS</a>. And that’s before I mention that it’s great experience and a lot of fun running an Internet-facing service such as this, something that covers the whole gamut from baremetal, networking, and right the way up through the stack.</p>
<p><img src="/public/static/rack-rear.png" alt="Rack" class="center" /></p>
<p>During the course of that year or so we’ve offered free - and relatively unlimited - hosting to various close friends and anyone interested in meddling with OpenStack, either its APIs or what it takes to run this sort of platform, as well as anyone that fancies themselves as something of a refugee from the hyperscale cloud providers.</p>
<p>I had meant to write a bit about this whole process back when we originally installed the platform, but time ran away with me and I never got around to it. Why now then? Well, the free bit is sadly coming to a close, and we need to start paying our gracious datacentre hosts money for power, bandwidth and cooling. There’s a chance that we won’t be able to keep it going without enough people committing to helping pay towards costs, so I figured I’d best write something before it’s no longer a thing!</p>
<p>With that said, if you’re interested in making use of - or helping out with - Sausage Cloud then drop me or Matt a message 🌭</p>
Linked List2019-05-31T09:45:00+00:00http://dischord.org/2019/05/31/linked-list<blockquote>
<p>In an effort to inject some life back into this blog, I’m opting to do something relatively lazy and start posting semi-regularly with the odd link to anything that I’ve found of value or interest. I’ve stolen the name (although it’s an obvious one) from <a href="http://daringfireball.net">Daring Fireball</a> and it’s inspired to some degree by <a href="https://blog.scottlowe.org">Scott Lowe’s</a> “Technology Short Takes”.</p>
</blockquote>
<ul>
<li>
<p><a href="https://github.com/rcoh/angle-grinder">Angle Grinder</a>. A CLI tool for manipulating logfiles. If you’ve ever wanted to quickly construct some ad-hoc live-updating log reporting in your terminal but didn’t have the data available in something like Elasticsearch, then this might be for you.</p>
</li>
<li>
<p><a href="http://nmattia.com/posts/2018-03-21-nix-reproducible-setup-linux-macos.html">Reproducible Linux and macOS setups using Nix</a> - use the Nix language to declaratively describe your system configuration. I had <a href="http://nixos.org">NixOS</a> on my workstation for a time (until the SSD it lived on died) and since then I’ve been meaning to take another look at applying some of the basic principles to other distributions or to my Mac.</p>
</li>
<li>
<p><a href="https://github.com/bpowers/mstat">mstat - measure memory usage of a program over time</a>. Another useful utility for those ad-hoc situations that you might find yourself in where you need to take a closer look at memory usage.</p>
</li>
<li>
<p><a href="https://garethr.dev/2019/04/configuring-kubernetes-with-cue/">Configuring Kubernetes with CUE</a>. Some discussions within the <a href="http://kudo.dev">KUDO</a> community recently led me to take a look at a new language called CUE. Here’s an excellent introduction from Gareth Rushgrove.</p>
</li>
<li>
<p><a href="https://github.com/Zooz/predator">Predator - distributed performance testing platform for APIs</a>. It even supports <a href="https://dcos.io">DC/OS</a>!</p>
</li>
<li>
<p><a href="https://k8s.devstats.cncf.io/d/12/dashboards?refresh=15m&orgId=1">Kubernetes Development Stats Dashboards</a>. I only found about this one very recently - it’s sort of analogous to <a href="https://www.stackalytics.com">Stackalytics</a>.</p>
</li>
</ul>
Identity Crisis2018-09-04T21:29:00+00:00http://dischord.org/2018/09/04/identity-crisis<p>I’ve long had a bit of a thing for ‘classic’ ThinkPads. I owned a couple of X40s back in the day, and then a few years back I bought an X200 to lightly modify and run as an OpenBSD desktop. There are lots to like about these machines - decent keyboard, reasonable build quality, and by virtue of their popularity within open-source circles, excellent hardware support for operating systems other than Windows (and macOS, obviously). There’s also a lot to hate about them, <a href="https://www.techworm.net/2015/08/lenovo-pcs-and-laptops-seem-to-have-a-bios-level-backdoor.html">especially under Lenovo’s stewardship</a>, but for a certain vintage these problems can be disregarded or worked around.</p>
<blockquote>
<p><em>For a laugh and to break up this wall of text, here’s the only photo I could find of one of my old X40s. It’s a picture of my desktop at Carphone Warehouse, somewhere around 2004, with the ThinkPad off to the left looking classy in amongst all the corporate junk:</em></p>
</blockquote>
<p><img src="/public/static/cpw_x40.jpg" alt="My desk at CPW" /></p>
<p>However, if there’s one thing that’s always let them down it’s the screens. Every one I’ve had or used has been terrible. The simple modification I did to the X200 was to upgrade its screen with an AFFS model but even so, it was still low resolution and was of very poor quality compared to my MacBook’s. I eventually sold it as it didn’t really see all that much use, and since then I’ve not bothered thinking about them much.</p>
<p>My attention turned to them again recently though, prompted by a couple of things but mostly because my eye was caught a while ago by some really creative modders in China, with the technical chops required to transform models like the X61 and the X201 into machines <a href="https://liliputing.com/2018/03/x210-mod-turns-classic-lenovo-thinkpad-x201-into-a-modern-pc.html">equipped with modern CPUs</a> and, most importantly (for me anyway), decent screens.</p>
<p>I’d be lying if I said it hasn’t been motivated in part by Apple’s attitude - perceived or otherwise - towards their ‘traditional’ (i.e non-iOS) devices. It’s something of a worry, and although I’ve not been affected by any of the issues with the hardware or software it’s hard to ignore it completely. Maybe the new release of macOS and some refreshed non-Pro laptops, along with the re-designed Mac Pro will turn that around but it remains to be seen.</p>
<p>So anyway, I did my research, weighed up the pros and cons of <a href="https://forum.thinkpads.com/viewtopic.php?t=122640">trying to do the mod myself</a>, and eventually decided to wire nearly 400 UKP to a very random email address supposedly owned by one of the most respected modders (Jacky / <a href="http://facebook.com/lcdfans">lcdfans</a>) in return for a relatively straightforward upgraded X230 with:</p>
<ul>
<li>An Intel i7-3520M CPU;</li>
<li>A 2560x1440 12.5” screen;</li>
<li>A motherboard modded to be able to drive the above display;</li>
<li>A new shell;</li>
<li>A new, modified 7-row keyboard from the previous (pre-chiclet) generation;</li>
<li>Upgraded wireless (with support for 802.11ac);</li>
<li>Upgraded Bluetooth (support for BT4.0).</li>
</ul>
<p>Lo and behold - and completely unannounced - it arrived just a few short weeks later. I stuck 16GB in it along with a 1TB SSD and it’s <em>awesome</em>.</p>
<p>Or rather, it was. It worked for a day. Then the display crapped out. A few frantically-exchanged emails and videos later allayed any concerns I had that I’d be left high-and-dry with a non-working machine, as Jacky was extremely helpful and super responsive. However, despite my best efforts at diagnosing the problem (under his direction) and an attempt at re-soldering the motherboard mod (!) I couldn’t coax it back into life. So I had to return it to China, which took a loooong time.</p>
<p>In fact the whole process took so long that the shine had almost completely worn off the whole idea. The delay wasn’t Jacky’s fault, it was customs on either side taking their sweet time processing the shipment. But I kept seeing other people on Twitter that had cottoned on to the availability of these machines as well and were showing off photos of them, working flawlessly, leaving me somewhat envious. Finally though my replacement unit arrived and the great news is that it’s worked magnificently ever since.</p>
<p><img src="/public/static/x230.jpg" alt="The X230 in action" /></p>
<p>As for as which OS I’m running… I had a brief dally with <a href="http://nixos.org">NixOS</a>, but there was a little too much friction (and potential fragility - mostly my fault) for what I need to do on a day-to-day basis. So instead I’ve settled on the latest release of <a href="https://getfedora.org/">Fedora Linux</a> (I’m really bored of <a href="http://ubuntu.com/">orange</a>) and - brace yourself - it’s been wonderful. I’ve been using it every day now for nearly a month and I’ve not - yet - switched back to my Mac full-time.</p>
<p>Everything works. I get good battery life, about 7-8 hours for moderate usage from the new, official 9-cell battery I purchased. I’m using <a href="http://i3wm.org">i3</a> with <a href="https://github.com/csxr/i3-gnome">some bits</a> to bring in the best Just Works components of GNOME and it’s both fast and distraction-free, helping my concentration no end. The screen is bright and clear, with decent viewing angles. Unfortunately the resolution means that I have to run at 200% scaling for some bits of the UI; It’d be great if fractional scaling was properly supported in Linux so I could do 150% instead, but support for it just isn’t quite there yet. However, this isn’t much of a problem in practice.</p>
<p>Here’s a quick (staged) screenshot of i3 - bonus points if you recognise the background (the mouse mat I’m using in that previous photo provides a clue)…</p>
<p><img src="/public/static/i3neofetch.png" alt="Screnshot of i3" /></p>
<p>There are other benefits too. The whole thing is properly <em>serviceable</em>. If something breaks then there’s a good chance I can just pick up a replacement part or, hell, an entire machine, on eBay for next to nothing. The only precious bit is the modded motherboard, but odds are that it’ll last (assuming I’ve not just jinxed it by saying that). Also it’s refreshing not to have to baby the thing to quite the same degree as with a MacBook. If I drop it and crack the case then no problem, it’s pennies to replace and you can DIY.</p>
<p>Saying all that though, I can’t hand-on-heart say goodbye to Apple’s macOS entirely. Despite the recent mis-steps, I still think that macOS is the best desktop operating system and that Apple make the best laptop hardware. But it’s <em>fun</em> using Linux on the desktop again for the first time in a while, and for the job I’m doing right now it’s actually more suitable than macOS is. That’s a topic for another post though…</p>
<p>If you’re interested in one of these machines (and you should be, especially given the price and limited availability) then Jacky has a new website <a href="http://cnmod.cn">here</a> and a Facebook page <a href="https://facebook.com/lcdfans">here</a>. Tell him I said “hi” - we exchanged about 80 emails during the course of this whole process so I’m sure he misses me 😉</p>
RIAT 20182018-07-18T22:06:00+00:00http://dischord.org/2018/07/18/riat-2018<p>Whoah, 2018?! Here’s some photos taken last weekend, whilst stood baking in the sunshine, at RAF Fairford for the Royal International Air Tattoo:</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/28606888217/"><img src="https://live.staticflickr.com/1763/28606888217_d54c747ebc_b.jpg" title="DSC_4851" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/29622482188/"><img src="https://live.staticflickr.com/927/29622482188_c377b664cf_b.jpg" title="DSC_4315" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/42588984985/"><img src="https://live.staticflickr.com/842/42588984985_4666f384d2_b.jpg" title="DSC_4642" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/42588977825/"><img src="https://live.staticflickr.com/1764/42588977825_a91abf5587_b.jpg" title="DSC_4722" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/43445793112/"><img src="https://live.staticflickr.com/921/43445793112_08814f2b2e_b.jpg" title="DSC_3872" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/42588988545/"><img src="https://live.staticflickr.com/844/42588988545_7e9b17c517_b.jpg" title="DSC_4597" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/42588976195/"><img src="https://live.staticflickr.com/1786/42588976195_6e0bb8f055_b.jpg" title="DSC_4778" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/43445722532/"><img src="https://live.staticflickr.com/1823/43445722532_7dc1ae461b_b.jpg" title="DSC_5054" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/41685766370/"><img src="https://live.staticflickr.com/1783/41685766370_f6c7aafe85_b.jpg" title="DSC_4509" /></a></p>
<p>There’s a bunch more over on <a href="https://flic.kr/s/aHsmjnJ8of">Flickr here</a>. Be grateful that the album only consists of 87 images - I actually took nearly 1600 photos over the course of two days.</p>
<p>I also came to realise that my Nikon D700 is now nearly 10 years old! I’m probably due for some kind of upgrade…</p>
Hell Ride Burnley2016-12-05T22:16:00+00:00http://dischord.org/2016/12/05/hell-ride-burnley<blockquote>
<p><em>“You don’t need a camera like that any more mate, we’ve all got smartphones!”</em></p>
</blockquote>
<p>The Hell Ride Blues made the trip over to Burnley last night to play a gig at The Turf Hotel, so here’s a few photos:</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/31444900005/"><img src="https://live.staticflickr.com/5538/31444900005_fc6169b800_b.jpg" title="DSC_3078" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/31329364671/"><img src="https://live.staticflickr.com/5330/31329364671_ac41e3bcd0_b.jpg" title="DSC_3063-Edit" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/30623093464/"><img src="https://live.staticflickr.com/5788/30623093464_8243e53995_b.jpg" title="DSC_3077" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/31408306916/"><img src="https://live.staticflickr.com/5601/31408306916_7034e5e5a6_b.jpg" title="DSC_3024" /></a></p>
<p>And a couple more for good measure up on <a href="https://www.flickr.com/photos/yankcrime/sets/72157673540666703/">Flickr here</a>.</p>
OpenStack Neutron LBaaS v1 Failover2016-12-02T15:49:00+00:00http://dischord.org/2016/12/02/openstack-neutron-lbaas-v1-failover<p>A quick note on what to do if you find yourself in the unfortunate position of having to failover an <a href="https://wiki.openstack.org/wiki/Neutron/LBaaS">LBaaS</a> V1 pool from one agent to another. This isn’t supported via the API so you need to roll up your sleeves and dust off your favourite MySQL client.</p>
<p>In this example, <code>nn2</code> has failed and everything else has gracefully migrated away - I’m just stuck with some LBaaS pools that I need to move. In case you didn’t realise, you can figure out your Neutron agent UUID by doing the following:</p>
<pre><code>$ openstack network agent list | grep -iE 'loadbalancer.*nn2'
| 9dfa45dd-4562-4180-b7da-879b8c539e5f | Loadbalancer agent | nn2 | None | False | UP | neutron-lbaas-agent |
</code></pre>
<p>In this case, the ‘False’ column means it’s dead and so we need to move all pools associated with that agent. You can see which pools are affected by doing:</p>
<pre><code>$ neutron lb-pool-list-on-agent 9dfa45dd-4562-4180-b7da-879b8c539e5f
+--------------------------------------+---------+-------------------+----------+----------------+--------+
| id | name | lb_method | protocol | admin_state_up | status |
+--------------------------------------+---------+-------------------+----------+----------------+--------+
| 00662f74-3fd6-5162-af12-8ee15b73232e | lb1 | LEAST_CONNECTIONS | TCP | True | ACTIVE |
| 0976e919-5305-2232-b924-97176a1abbcb | lb2 | ROUND_ROBIN | TCP | True | ACTIVE |
| 0a628525-3708-abc2-ac87-60c0ef4d66f5 | lb3 | ROUND_ROBIN | TCP | True | ACTIVE |
| 0b57422f-052f-d124-ac31-49b962cd825e | lb4 | ROUND_ROBIN | TCP | True | ACTIVE |
| 10095dbe-1568-bb2c-923e-d9f01241a838 | lb5 | ROUND_ROBIN | TCP | True | ACTIVE |
[ .. ]
</code></pre>
<p>You’ll then want to select a suitable target for where to migrate these pools to. In my example, it’s <code>nn5</code>:</p>
<pre><code>$ openstack network agent list | grep -iE 'loadbalancer.*nn5'
| ee998c47-6a6c-4632-b4d5-64612523823b | Loadbalancer agent | nn5 | None | True | UP | neutron-lbaas-agent |
</code></pre>
<p>The table in question which manages the pool to agent mapping is <code>poolloadbalanceragentbindings</code>:</p>
<pre><code>MariaDB [neutron]> describe poolloadbalanceragentbindings;
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| pool_id | varchar(36) | NO | PRI | NULL | |
| agent_id | varchar(36) | NO | MUL | NULL | |
+----------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
</code></pre>
<blockquote>
<p>Make sure you take a backup before doing any updates!</p>
</blockquote>
<p>We need to update the rows in this table so that the target agent UUID is responsible for the pools that need migrating.</p>
<p>Let’s verify how many we have to deal with:</p>
<pre><code>MariaDB [neutron]> select * from poolloadbalanceragentbindings where agent_id = '9dfa45dd-4562-4180-b7da-879b8c539e5f';
[..]
35 rows in set (0.00 sec)
</code></pre>
<p>Now let’s update this table and change the agent ID en masse:</p>
<pre><code>MariaDB [neutron]> update poolloadbalanceragentbindings set agent_id = 'ee998c47-6a6c-4632-b4d5-64612523823b' where agent_id = '9dfa45dd-4562-4180-b7da-879b8c539e5f' limit 35;
</code></pre>
<blockquote>
<p>Protip: Always qualify updates like this with a <code>LIMIT</code> clause. This can often stop or limit the damage from a botched query!</p>
</blockquote>
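<p>Before kicking anything, it’s worth re-running the earlier query against the <em>new</em> agent UUID as a sanity check - you should see the same number of rows you just updated:</p>
<pre><code>MariaDB [neutron]> select * from poolloadbalanceragentbindings where agent_id = 'ee998c47-6a6c-4632-b4d5-64612523823b';
[..]
35 rows in set (0.00 sec)
</code></pre>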
<p>With that done, we need to kick Neutron in order for the new agent to realise that there’s a whole load of pools it should be responsible for. You need to restart <code>neutron-server</code> first of all, and then the target <code>neutron-lbaas-agent</code> in question.</p>
<p>Keep an eye on your logs and at this point you should see your pools come back to life on your target network node.</p>
Cleaning up failed Nova instance migrations2016-09-21T14:16:00+00:00http://dischord.org/2016/09/21/cleaning-up-failed-nova-instance-migrations<p>A quick one from my notes. If you’re blighted with logging spam from <code>nova-compute</code> along the lines of:</p>
<pre><code>Migration instance not found: Instance 0db0e707-5e29-4da4-8e23-1c5cdf9a69f7 could not be found.
</code></pre>
<p>Then you’ve some failed instance migration metadata lingering in your Nova database that needs to be purged. Fortunately this is easy enough to fix, as long as you’re happy with manually deleting data that is 😉</p>
<p>First up, find the corresponding row from the <code>migrations</code> table in the Nova DB:</p>
<pre><code>MariaDB [nova]> select id,status,instance_uuid,source_compute,dest_compute from migrations where instance_uuid = '0db0e707-5e29-4da4-8e23-1c5cdf9a69f7';
+-----+-----------+--------------------------------------+----------------+--------------+
| id | status | instance_uuid | source_compute | dest_compute |
+-----+-----------+--------------------------------------+----------------+--------------+
| 998 | confirmed | 0db0e707-5e29-4da4-8e23-1c5cdf9a69f7 | alabama | agenda |
+-----+-----------+--------------------------------------+----------------+--------------+
1 row in set (0.00 sec)
</code></pre>
<p>Then it’s simply a case of removing the offending entry:</p>
<pre><code>MariaDB [nova]> delete from migrations where id = '998' limit 1;
Query OK, 1 row affected (0.00 sec)
MariaDB [nova]> select id,status,instance_uuid,source_compute,dest_compute from migrations where instance_uuid = '0db0e707-5e29-4da4-8e23-1c5cdf9a69f7';
Empty set (0.00 sec)
</code></pre>
<blockquote>
<p>NB: You don’t have to restart any services; You should find that once <code>nova-compute</code> synchronises its state over the next few minutes then that spam ceases.</p>
</blockquote>
More on Docker and Puppet2016-09-10T00:00:00+00:00http://dischord.org/2016/09/10/more-on-docker-and-puppet<p>I gave a presentation at the inaugural - and awesome - <a href="http://www.meetup.com/Docker-Manchester/events/232283804/">Docker Manchester</a> meetup a few weeks back which detailed my <a href="http://dischord.org/2016/03/27/docker-and-puppet/">“building Docker images with Puppet”</a> workflow and why we’ve headed down that path. Since then it’s been refined a fair bit and so my original post is now a bit out of date. So, time for an update!</p>
<p>Headline changes are:</p>
<ul>
<li>No longer building a ‘base’ Puppet image;</li>
<li>A switch to Puppet 4 and the <a href="https://puppet.com/blog/say-hello-to-open-source-puppet-4">AIO</a> packaging which includes <a href="https://github.com/puppetlabs/r10k">r10k</a>;</li>
<li>Use of the latter for handling installation of modules on a per-image basis;</li>
<li>Discovery of <a href="https://github.com/grammarly/rocker">‘Rocker’</a> which makes mounting volumes at image build time (more on that shortly) possible, as well as an ability to template the build process.</li>
</ul>
<p>Most of this was prompted and inspired by the brilliant work that <a href="http://www.morethanseven.net/">Gareth Rushgrove</a> is doing on pretty much exactly this problem.</p>
<p>Here’s the skinny from a working example that we use at <a href="http://www.datacentred.co.uk">DataCentred</a>. It’s basically what I discussed during my presentation - the steps required to build an image from which we can deploy a container that runs <a href="http://docs.openstack.org/developer/horizon/">OpenStack’s Horizon</a>.</p>
<blockquote>
<p><em>I’ve also updated my <a href="https://github.com/yankcrime/docker-puppet">personal repo</a> to follow this approach, so have a poke around there for something to clone and mess around with.</em></p>
</blockquote>
<h2 id="overview">Overview</h2>
<p>The tl;dr summary is that this approach uses Puppet in a mode which runs once during the image build process and applies any relevant configuration. It uses Rocker to share data volumes across builds and to template common aspects of configuration and build artefacts.</p>
<h2 id="introducing-rocker">Introducing Rocker</h2>
<p>Our starting point is this Dockerfile, which contains a few Rocker-specific options:</p>
<pre><code>FROM ubuntu:16.04
ENV DOCKER_BUILD_DOMAIN='sal01.datacentred.co.uk'
ENV FACTER_domain=${DOCKER_BUILD_DOMAIN:-vagrant.test} FACTER_role='horizon'
ENV PUPPET_AGENT_VERSION="1.6.1" UBUNTU_CODENAME="xenial"
ENV PATH=/opt/puppetlabs/server/bin:/opt/puppetlabs/puppet/bin:/opt/puppetlabs/bin:$PATH
MOUNT /opt/puppetlabs /etc/puppetlabs /root/.gem
MOUNT ./puppet/hieradata:/hieradata
MOUNT ~/.config/keys:/keys
RUN apt-get update && \
apt-get install -y lsb-release git wget && \
wget https://apt.puppetlabs.com/puppetlabs-release-pc1-"$UBUNTU_CODENAME".deb && \
dpkg -i puppetlabs-release-pc1-"$UBUNTU_CODENAME".deb && \
rm puppetlabs-release-pc1-"$UBUNTU_CODENAME".deb && \
apt-get update && \
apt-get install --no-install-recommends -y puppet-agent="$PUPPET_AGENT_VERSION"-1"$UBUNTU_CODENAME" && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN /opt/puppetlabs/puppet/bin/gem install hiera-eyaml deep_merge r10k:2.2.2 --no-ri --no-rdoc
COPY horizon/Puppetfile /
COPY puppet/modules/profile /profile
COPY puppet/default.pp /
COPY Rockerfile /Dockerfile
RUN r10k puppetfile install --moduledir /etc/puppetlabs/code/modules && \
ln -s /profile /etc/puppetlabs/code/modules/profile && \
puppet apply /default.pp --verbose --show_diff --summarize --hiera_config=/hieradata/hiera.yaml && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
EXPOSE 80
CMD ["/usr/bin/supervisord", "-n"]
TAG horizon:mitaka
</code></pre>
<p>Amongst other things, <a href="https://github.com/grammarly/rocker">Rocker</a> provides a <code>MOUNT</code> directive which you can use to share volumes across image builds, as well as being able to mount local directories at build time. This is great because it now means it’s possible to have per-image module installation in a way that doesn’t slow the process down too much. It also solves a couple of the other flaws I pointed out in my previous post, chief amongst which is the concern that you’re baking in a crapton of extra guff - configuration data, Puppet modules - that you simply don’t need. Using Rocker means it’s mounted only when it’s needed - i.e at build time.</p>
<p>Maintaining these on a per-image basis is a pain in the butt as you end up with a slew of Dockerfiles that are almost exactly the same, bar a few options. Fortunately it’s also possible to template your Dockerfiles with this same tool, and as all we basically need to change across builds is this handful of parameters - specifically the ‘role’ that’s used to dictate to Puppet which configuration classes to include, which ports to expose, and what image tags to apply - this means we can keep duplication of our build artefacts down to a reasonable minimum.</p>
<p>If we go ahead and do that we can get this down to a couple of files that are common to all images: a templated ‘Rockerfile’ and a bit of YAML with all the keys / values:</p>
<pre><code>$ cat common/Rockerfile
FROM {{ .BASE }}
MAINTAINER {{ .MAINTAINER }}
ENV DOCKER_BUILD_DOMAIN={{ .DOCKER_BUILD_DOMAIN }}
ENV FACTER_role={{ .ROLE }} FACTER_domain=${DOCKER_BUILD_DOMAIN:-vagrant.test}
ENV PUPPET_AGENT_VERSION={{ .PUPPET_AGENT_VERSION }}
ENV UBUNTU_CODENAME={{ .UBUNTU_CODENAME }}
ENV PATH={{ .PATH }}
MOUNT {{ .MOUNT.puppet }}
MOUNT {{ .MOUNT.hiera }}
MOUNT {{ .MOUNT.keys }}
RUN {{ .RUN.puppet_install }}
RUN {{ .RUN.install_gems }}
COPY puppet/r10k/{{ .ROLE }} /Puppetfile
COPY {{ .COPY.profile }}
COPY {{ .COPY.defaultpp }}
RUN {{ .RUN.puppet_apply }}
CMD {{ .CMD }}
EXPOSE {{ .EXPOSE }}
TAG {{ .TAG }}
</code></pre>
<p>And then the YAML data that’s used to populate this before it’s sent to the Docker build API:</p>
<pre><code>$ cat common/common.yaml
BASE: 'ubuntu:16.04'
DOCKER_BUILD_DOMAIN: 'sal01.datacentred.co.uk'
PUPPET_AGENT_VERSION: '1.6.1'
UBUNTU_CODENAME: 'xenial'
PATH: '/opt/puppetlabs/server/bin:/opt/puppetlabs/puppet/bin:/opt/puppetlabs/bin:$PATH'
MOUNT:
puppet: '/opt/puppetlabs /etc/puppetlabs /root/.gem'
hiera: './puppet/hieradata:/hieradata'
keys: '~/.config/keys:/keys'
RUN:
puppet_install: |
apt-get update && \
apt-get install -y lsb-release inetutils-ping vim git wget && \
wget https://apt.puppetlabs.com/puppetlabs-release-pc1-"$UBUNTU_CODENAME".deb && \
dpkg -i puppetlabs-release-pc1-"$UBUNTU_CODENAME".deb && \
rm puppetlabs-release-pc1-"$UBUNTU_CODENAME".deb && \
apt-get update && \
apt-get install --no-install-recommends -y puppet-agent="$PUPPET_AGENT_VERSION"-1"$UBUNTU_CODENAME" && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
install_gems: '/opt/puppetlabs/puppet/bin/gem install hiera-eyaml deep_merge r10k:2.2.2 --no-ri --no-rdoc'
puppet_apply: |
r10k puppetfile install --moduledir /etc/puppetlabs/code/modules && \
ln -s /profile /etc/puppetlabs/code/modules/profile && \
puppet apply /default.pp --verbose --show_diff --summarize --hiera_config=/hieradata/hiera.yaml && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
COPY:
profile: 'puppet/modules/profile /profile'
defaultpp: 'puppet/default.pp /'
CMD: '["/usr/bin/supervisord", "-n"]'
</code></pre>
<h2 id="puppet">Puppet</h2>
<p>All Puppet-related configuration data lives under its own subdirectory. This includes Hiera data common to all images and the usual smattering of profile classes, so the directory structure ends up something like this:</p>
<pre><code>$ tree -d -L 2
.
├── common
└── puppet
├── hieradata
├── modules
└── r10k
</code></pre>
<p>In order to handle Puppet module installation, per-image <code>r10k</code> Puppetfiles reside in the r10k subdirectory with a filename that reflects the rolename, i.e <code>puppet/r10k/horizon</code>. As per before, there’s still a common <code>default.pp</code> that simply contains (excluding some PATH setting):</p>
<pre><code class="language-puppet">hiera_include('classes')
Class['apt::update'] -> Package <| |>
create_resources(supervisord::program, hiera('service'))
</code></pre>
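<p>For reference, one of those per-role Puppetfiles - <code>puppet/r10k/horizon</code> in this example - is just a standard r10k Puppetfile listing the modules that the role needs. The module names and versions below are illustrative rather than our exact list:</p>
<pre><code>forge 'https://forgeapi.puppetlabs.com'

mod 'puppetlabs/apt', '2.2.2'
mod 'puppetlabs/apache', '1.10.0'
mod 'openstack/horizon', '8.0.1'
mod 'ajcrowe/supervisord', '0.6.1'
</code></pre>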
<p>The role <code>ENV</code> variable - but this time passed in via the <code>rocker build</code> command line - is responsible for defining what classes are included as an image is created. Our Hiera hierarchy has a role subdirectory with files named in accordance with this role variable. So for this example, <code>puppet/hiera/role/horizon.yaml</code> contains:</p>
<pre><code class="language-puppet">---
classes:
- '::profile::openstack::horizon'
service:
'horizon':
'command': '/usr/sbin/apachectl -DFOREGROUND'
'stdout_logfile': '/dev/stdout'
'stderr_logfile': '/dev/stderr'
'stdout_logfile_maxbytes': '0'
'stderr_logfile_maxbytes': '0'
branding::horizon::release: 'mitaka'
</code></pre>
<p>And then the <code>::profile::openstack::horizon</code> class is just a couple of includes:</p>
<pre><code class="language-puppet">class profile::openstack::horizon {
include ::horizon
include ::branding::horizon
file { '/var/log/apache2/horizon_access.log':
target => '/dev/stdout',
require => Package['httpd'],
}
file { [ '/var/log/apache2/horizon_error.log', '/var/log/apache2/error.log' ]:
target => '/dev/stderr',
require => Package['httpd'],
}
}
</code></pre>
<p>The <code>file</code> resources are to make sure everything logs to either <code>STDOUT</code> or <code>STDERR</code> so that we can leverage Docker’s logging capabilities. The rest of the configuration data is mostly in Hiera, scoped to module or role.</p>
<h2 id="build-process">Build Process</h2>
<p>With all that in place, kicking off a build with Rocker is simple enough:</p>
<pre><code class="language-bash">rocker build -f common/Rockerfile --vars common/common.yaml \
--var EXPOSE="80" --var TAG=horizon:mitaka --var ROLE=horizon .
</code></pre>
<p>Apart from the Puppet and r10k configuration data, all we have to do to build a different image is slightly amend the command line options which specify the role, which ports to expose, and the tag. To kick off a <a href="http://docs.openstack.org/developer/glance/">Glance</a> image build for example, it’s simply a case of amending a few of those <code>--var</code> parameters:</p>
<pre><code class="language-bash">rocker build -f common/Rockerfile --vars common/common.yaml \
--var EXPOSE="9191 9292" --var TAG=glance:mitaka --var ROLE=glance .
</code></pre>
<p>Protip: <code>rocker</code> has a <code>-print</code> option that you can use to make sure the resulting Rockerfile is what you’d expect without actually triggering a build.</p>
<p>Resulting images are then pushed to a private registry and then we’re using Puppet again to deploy containers from these images. Good times!</p>
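<p>For the curious, that last deployment step is mostly the <a href="https://forge.puppetlabs.com/garethr/docker">garethr/docker</a> module again. As a rough sketch - the registry address and ports here are illustrative - it boils down to something like:</p>
<pre><code class="language-puppet">include ::docker

docker::run { 'horizon':
  image => 'registry.example.com/horizon:mitaka',
  ports => ['80:80'],
}
</code></pre>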
Nova server group affinity policy violations2016-04-11T19:17:00+00:00http://dischord.org/2016/04/11/nova-affinity-policy-violations<p>Virtual machine live migrations are a <del>crutch</del> fact of life for pretty much anyone managing a sizeable estate. Working with <a href="http://www.linux-kvm.org">KVM</a> via <a href="http://www.openstack.org">OpenStack</a> Nova is no exception; Upgrades, hardware failure, and general hypervisor maintenance often necessitate execution of <code>nova live-migration</code> to move an instance off onto another hypervisor elsewhere. This is all well and good, and in normal operation <code>nova-scheduler</code> will handle placement where it sees fit based on the scheduler filters you’ve configured.</p>
<p>Tangentially, there’s a relatively little-known (but essential) feature of Nova called <a href="https://raymii.org/s/articles/Openstack_Affinity_Groups-make-sure-instances-are-on-the-same-or-a-different-hypervisor-host.html">Server Groups</a>. These allow you to - as the name suggests - group servers (instances) together and apply some kind of policy. Mostly this is used to set affinity or anti-affinity rules, as in to make certain that members of the group are instantiated on disparate hypervisors. The use-case is obvious - you’re ensuring that your failure domain for that particular group of instances is wider than a single hypervisor, something that’s essential given the CPU and memory density that most cloud operators configure their compute nodes with. Loss of a single compute node can translate to sometimes 100s of instances going offline, and if the entirety of your database server cluster sat on that one physical box then you’re well and truly in the hatezone. Configuring a server group which contains your database cluster and setting the anti-affinity policy for that group ensures that this won’t - or shouldn’t - happen.</p>
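<p>(For reference, setting one of these up is just a case of creating the group with the desired policy and then passing its UUID as a scheduler hint at boot time - something along these lines, with illustrative flavor and image names:)</p>
<pre><code>$ nova server-group-create db-cluster anti-affinity
$ nova boot --flavor m1.medium --image ubuntu-trusty \
    --hint group=$SERVER_GROUP_UUID app-db-1
</code></pre>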
<p>The key word there is shouldn’t. The <code>nova live-migration</code> command takes another, optional argument - a target hypervisor - which an OpenStack administrator can use to migrate an instance onto an arbitrary host. The problem comes about when you realise that this basically sidesteps the <code>nova-scheduler</code> filters, including the one which enforces anti-affinity policies. Oops.</p>
<p>We’re still debating whether or not this is a bug (it probably is). There was <a href="https://review.openstack.org/#/c/135351/">some work done</a> a while back to ensure <code>live-migration</code> honors these affinity rules, but this doesn’t seem to be the case when an administrator explicitly chooses a target hypervisor.</p>
<p>So in the meantime, enter <a href="https://github.com/openstack/osops-tools-contrib/commit/e3b5bc9634c1437ef9538c0a6e7d89c18289b1bb">antiaffinitycheck.py</a> - a script that OpenStack administrators can use to check whether anti-affinity rules are being adhered to for a given Server Group. It’s pretty straightforward, here’s a couple of examples:</p>
<pre><code>$ ./antiaffinitycheck.py --list c353197c-5fbb-410f-a7b3-843452a55276
+--------------------------------------+-----------+----------------+
| Instance ID | Instance | Hypervisor |
+--------------------------------------+-----------+----------------+
| 4c38cf7f-2073-4d96-b377-d2bd29595d8a | app-db-13 | compute2.dev |
| 7c9d9a8e-1758-4145-a118-b72360eff112 | app-db-10 | compute15.dev |
| 29d3fe6b-2303-4ce2-af90-636e216ac95e | app-db-4 | compute32.dev |
| a701afaf-fef5-44f6-9b09-e440ce6b85fe | app-db-14 | compute11.dev |
| 5c40b1e9-8b38-4978-98b2-391e53e65418 | app-db-11 | compute2.dev |
| 0b8b19cb-4fae-47c2-b3c7-07b92e11f4a1 | app-db-5 | compute38.dev |
+--------------------------------------+-----------+----------------+
</code></pre>
<p>The default output of <code>nova server-group-list</code> is a bit awkward and there are no reformatting options available, so <code>--list</code> shows us the servers and hypervisors involved in a given Server Group in a fairly sane format… Now, if it’s not immediately obvious from that output that the same hypervisor appears more than once in that group - and with an anti-affinity policy it shouldn’t! - we can use <code>--check</code>:</p>
<pre><code>$ ./antiaffinitycheck.py --check c353197c-5fbb-410f-a7b3-843452a55276
Anti-affinity rules violated in Server Group: c353197c-5fbb-410f-a7b3-843452a55276
+--------------------------------------+-----------+--------------+
| Instance ID | Instance | Hypervisor |
+--------------------------------------+-----------+--------------+
| 4c38cf7f-2073-4d96-b377-d2bd29595d8a | app-db-13 | compute2.dev |
| 5c40b1e9-8b38-4978-98b2-391e53e65418 | app-db-11 | compute2.dev |
+--------------------------------------+-----------+--------------+
</code></pre>
<p>If you’ve any users that make extensive use of these rules then hopefully this script will come in handy for making sure everything’s as it should be.</p>
Dartmoor2016-04-06T08:32:00+00:00http://dischord.org/2016/04/06/dartmoor<p>Dartmoor’s a long way to go for a weekend’s ‘glamping’ but we did it anyway.</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/26249502725/"><img src="https://live.staticflickr.com/1478/26249502725_d4d498ac8b_b.jpg" title="DSC_1182" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/26233683636/"><img src="https://live.staticflickr.com/1494/26233683636_782f281619_b.jpg" title="DSC_1201" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/26193340101/"><img src="https://live.staticflickr.com/1629/26193340101_954fd9a407_b.jpg" title="IMG_0551" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/25654824364/"><img src="https://live.staticflickr.com/1616/25654824364_57ff28cd46_b.jpg" title="DSC_1228" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/25986751400/"><img src="https://live.staticflickr.com/1542/25986751400_38e9c57363_b.jpg" title="DSC_1229" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/26167153472/"><img src="https://live.staticflickr.com/1446/26167153472_65fdf13b75_b.jpg" title="DSC_1233" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/26193344541/"><img src="https://live.staticflickr.com/1480/26193344541_696fd94e5e_b.jpg" title="DSC_1262" /></a></p>
<p>The third shot was taken with my iPhone 6S+. It’s unreal how far cameras in
phones have come in the last few years.</p>
<p>The last four photos are of the <a href="http://www.stone-circles.org.uk/stone/merrivalerows.htm">‘Merrivale Rows’</a>.</p>
Docker and Puppet2016-03-27T13:06:00+00:00http://dischord.org/2016/03/27/docker-and-puppet<blockquote>
<p><strong><em>The principles behind this post still stand, but there’s an updated workflow and tooling which solves a few of the problems mentioned below, <a href="http://dischord.org/2016/09/10/more-on-docker-and-puppet/">described here</a>.</em></strong></p>
</blockquote>
<p>On and off for the last few weeks I’ve been trying to create a reasonably sane workflow that introduces <a href="http://puppetlabs.com">Puppet</a> - my favourite configuration management tool - to <a href="http://docker.com">Docker</a> - everyone’s favourite containerization (is that a word?) technology. I can’t say I’ve really cracked it; I’ve come up with something functional, but it’s clear at this point that existing configuration management tools - and container-based technologies - still have a long way to go before we’ve anything reasonably coherent. In fact, the answer might be something different altogether, but who knows.</p>
<p>This all started because I wanted to review the way in which we deploy some of our OpenStack services that would potentially ease the pain of upgrades when it comes to running multiple services on the same host and isolating resources such as shared python libraries. This is a mostly solved problem for some operators (i.e using Python virtualenvs), but I fancied trying to do something cleaner and which at the same time would earn us additional geek cred by being able to tout the fact that “yes, we do run Docker” ;)</p>
<p>I also wanted to re-deploy the services used on my personal domain (i.e the webserver and database that sit behind this very site), so that was a good starting point.</p>
<h2 id="opinions">Opinions</h2>
<p>There’s a few ways to crack this nut, some of which I agree with and a lot of which I don’t. Here’s my thoughts in no particular order:</p>
<ul>
<li>Containers should be ephemeral. If you have to change something in a running container, you should be deploying from a fresh image that contains the necessary change;</li>
<li>A corollary to that is running SSH within a container is a big no-no. You shouldn’t be SSH’ing into them in order to make any configuration amendments;</li>
<li>Likewise, having Puppet run in a container in a master-agent setup is wrong;</li>
<li>Building images for containers using shell scripts is all kinds of wrong and feels like a terrible regression and makes me very sad indeed.</li>
</ul>
<p>There’s a lot of evolving dogma around usage of containers in general - the move towards Unikernels for example - that probably fly in the face of most of what I’m doing here. But if this approach solves a problem and scratches that itch for now then it’s good enough. With that in mind…</p>
<h2 id="why-puppet">Why Puppet?</h2>
<p>Bundling shellscripts and makefiles with your Dockerfile is the wrong way to go, in my opinion. Shell scripts are often fragile, highly opinionated in nature (as in, hard to standardise given the disparity in convention), and this results in horror shows such as inlining configuration or using sed and awk to alter configuration stanzas and variables. I mean, look at the madness that I came up with <a href="http://dischord.org/2013/08/13/docker-and-owncloud-part-2/">here</a>!</p>
<p>These are problems that configuration management and tools like Puppet sought to - and mostly succeeded in - solving. On top of that, I haven’t had to closely examine an application configuration file in quite some time. Puppet becomes the configuration API - all you have to do is know how to write Puppet manifests and you can configure just about anything. No more getting to grips with macro preprocessors just to generate your MTA’s configuration!</p>
<p>I don’t see why any of that should go away with the advent and proliferation of container-based technologies. You still have to generate infrastructure and application configuration <em>somewhere</em>, so why not use the right tool for the job? Puppet is one of the right tools (along with Ansible, Chef, Salt, and so on - just pick one).</p>
<h2 id="having-a-go">Having a go</h2>
<p>Plenty of people have attempted this already, and it’s entirely possible that others have come to the same conclusion and I’ve managed to miss it. Apologies if so! A year or so ago James Turnbull came up with <a href="https://puppetlabs.com/blog/building-puppet-based-applications-inside-docker">this workflow</a> as a suggestion, but that falls pretty far from the mark when you’re developing a Puppet manifest to generate your image, because running <a href="https://github.com/voxpupuli/librarian-puppet">librarian-puppet</a> each time adds a significant amount of time to the Docker image build process.</p>
<p>Instead I’ve settled on the following process:</p>
<ul>
<li>Manage modules (still using librarian-puppet for now) from the ‘host’ OS, the host in this case being wherever you’re building your image;</li>
<li>Generate a ‘base’ image which contains Puppet and its various dependencies, along with a couple of standard hooks so that your Hiera data and manifests are included by default when you inherit that image;</li>
<li>Run <code>puppet apply</code> as part of your target image generation process;</li>
<li>Clean up after yourself.</li>
</ul>
<p>It’s not perfect, but it works.</p>
<h2 id="structure">Structure</h2>
<p>Everything I’m about to describe is <a href="https://github.com/yankcrime/docker-puppet">in this GitHub repo</a>, but the salient bits of this workflow are structured as follows:</p>
<pre><code>.
├── Dockerfile
├── Puppetfile
├── default.pp
├── docker
│ ├── cachier
│ │ └── Dockerfile
│ └── dischord
│ ├── database
│ └── webserver
├── hiera.yaml
├── modules
│ └── profile, nginx, etc.
└── hieradata
├── common.yaml
├── container
│ ├── dischord_database.yaml
│ └── dischord_webserver.yaml
├── nodes
└── role
├── cachier.yaml
├── database.yaml
└── webserver.yaml
</code></pre>
<p>The <a href="https://github.com/yankcrime/docker-puppet/blob/master/Dockerfile">top-level Dockerfile</a> defines my base image, which includes everything needed to bootstrap Puppet as well as the additional hooks needed to get my modules, Hiera data, and manifests in place to do the configuration. Building this base image is simply a case of running <code>$ docker build -t puppet .</code> Pretty standard stuff. There’s some <code>ONBUILD</code> options in there to ensure that any image derived from this base includes those files in the mix, which we’ll then need when we run <code>puppet apply</code> - the gist of it is sketched below.</p>
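<p>Something along these lines - a trimmed-down sketch rather than the exact file in the repo:</p>
<pre><code>FROM ubuntu:14.04
MAINTAINER Nick Jones "nick@dischord.org"
RUN apt-get update && apt-get install -y puppet && apt-get clean
# Pull in the Hiera data, modules, and manifest from whatever inherits this image
ONBUILD COPY hiera.yaml default.pp /puppet/
ONBUILD COPY modules /puppet/modules
ONBUILD COPY hieradata /puppet/hieradata
</code></pre>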
<h2 id="docker">Docker</h2>
<p>I then have a sub-directory called <code>docker</code> which contains per-site (for now) Dockerfiles, so in my personal example these are grouped under <code>dischord</code>. These are also straightforward, so the webserver one looks like:</p>
<pre><code>FROM puppet:latest
MAINTAINER Nick Jones "nick@dischord.org"
ENV FACTER_role='webserver'
ENV FACTER_container='dischord_webserver'
RUN puppet apply --verbose \
--modulepath /puppet/modules \
--hiera_config /puppet/hiera.yaml \
--manifestdir /puppet/ /puppet/default.pp
RUN apt-get -y clean && rm -rf /puppet
EXPOSE 80 443
CMD ["/usr/bin/supervisord", "-n"]
</code></pre>
<p>Other Dockerfiles follow almost this exact same pattern. The key differences between generated images are a couple of top-level facts, in this case <code>role</code> and <code>container</code>, and also what’s exposed service-wise. I’m not totally settled on this arrangement but it works for me now; I have a generic set of configuration options I want to apply to any webserver and then some container (i.e purpose) specific options that get inherited.</p>
<h2 id="puppet">Puppet</h2>
<p>The necessary configuration data in Puppet (and Hiera) is also pretty straightforward. Basically the top-level role defines which profile classes to include, and these then take care of including any application-specific facts plus ‘workarounds’. There’s some basic configuration in place for Hiera (defining the hierarchy, duh) and then we run <code>puppet apply</code> as detailed in the above Dockerfile. Here’s the <code>nginx</code> profile class:</p>
<pre><code class="language-puppet">class profile::nginx {
include ::nginx
create_resources(nginx::resource::vhost, hiera('vhosts'))
create_resources(nginx::resource::location, hiera('locations'))
file { '/var/www':
ensure => 'directory',
owner => 'root',
group => 'root',
}
file_line { 'nginx_foreground':
path => '/etc/nginx/nginx.conf',
line => 'daemon off;',
require => Class['::nginx'],
}
}
</code></pre>
<p>Which then goes on to inherit configuration data from the module level and eventually the container level, so what’s inherited for my website is this:</p>
<pre><code class="language-yaml">vhosts:
'dischord.org':
'www_root': '/srv/www/'
'try_files':
- '$uri'
- '$uri/'
- '/index.html'
- '/index.php?$query_string'
locations:
'php-fastcgi':
'vhost': 'dischord.org'
'location': '~ \.php$'
'fastcgi': 'unix:/var/run/php5-fpm.sock'
</code></pre>
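<p>That layering is all down to the Hiera hierarchy, which keys off the <code>role</code> and <code>container</code> facts set in each Dockerfile. As a rough sketch (rather than the exact file in the repo), <code>hiera.yaml</code> looks something like this:</p>
<pre><code class="language-yaml">---
:backends:
  - yaml
:yaml:
  :datadir: /puppet/hieradata
:hierarchy:
  - "container/%{::container}"
  - "role/%{::role}"
  - common
</code></pre>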
<p>What this means is that each time I want to update the configuration for my website’s nginx Docker container, I just need to add a few lines to a YAML file and then trigger the build and deployment of a new container from this updated image. Easy. A video tells a thousand words, so here’s me building a database image. On the first run I just build the image with the configuration as-is; then I make a couple of changes to Hiera to define a new database and re-generate the image:</p>
<center><script type="text/javascript" src="https://asciinema.org/a/40506.js" id="asciicast-40506" async=""></script></center>
<p>Using Puppet to manage configuration in this way doesn’t add much in the way of build time, in fact we’re mostly waiting for packages to download (albeit via a local cache) and install.</p>
<h2 id="problems">Problems</h2>
<p>There’s a few things that suck about this workflow and using Puppet to build Docker images:</p>
<ul>
<li>Docker hasn’t been designed with this sort of configuration management in mind, so having to copy the entirety of that repo into each and every container at build time feels bad. It’d be nice if you could just mount the Puppet-related stuff at build time to save you having to clean up after yourself, and indeed other people have come up with different use-cases for this option but so far it’s not been implemented;</li>
<li>It breaks the image layers philosophy (see above point). You’ve got two technologies competing for their idea of idempotency, and because of the way these two interact Puppet eventually wins. The user loses because ultimately it takes longer to generate an image;</li>
<li>Idiosyncrasies to do with how containers act differently from a standard Linux installation, i.e init, cause problems with some modules. In my case I’ve ended up standardising on <a href="http://supervisord.org">supervisord</a> to handle running applications in containers, but sometimes you have to consider workarounds for modules that assume a working Upstart configuration, i.e the <a href="https://forge.puppetlabs.com/puppetlabs/mysql">puppetlabs-mysql</a> module which errors with:</li>
</ul>
<pre><code>Debug: Executing '/sbin/initctl --version'
Error: /Stage[main]/Mysql::Server::Service/Service[mysqld]: Could not evaluate: undefined method `[]' for nil:NilClass
</code></pre>
<p>To fix this, you need to override the default service provider to be plain ol’ <code>init</code>, i.e in the case of the <code>puppetlabs-mysql</code> module:</p>
<pre><code>mysql::server::service_provider: 'init'
</code></pre>
<ul>
<li>Puppet’s lack of support for the more ‘minimal’ Linux distributions leads to slightly ‘bloated’ images. Taking advantage of the convenience of Puppet’s fantastic community and ecosystem means you really need to stick to Ubuntu or RHEL, which in turn means fatter images. There’s ongoing work for better support for things like Alpine Linux but there’s a way to go yet. For me, the convenience outweighs the extra bandwidth requirements.</li>
</ul>
<h2 id="next-steps">Next steps</h2>
<p>There’s still a ways to go before this workflow becomes useful. Missing steps from this post are getting a private Docker Registry stood up, and then deploying containers from our generated images automatically. Again, <a href="https://forge.puppetlabs.com/garethr/docker">Puppet can be used here</a> but that’s for another post at a later date.</p>
<p>It’d be even better if we made more use of service discovery tools such as Consul to handle some aspects of configuration but really that’s outside the scope of this simplified example. Of course, it could just be that over the coming few weeks and months I bin this stuff off altogether but for now it’s a fun diversion, if nothing else ;)</p>
Cleaning up after Neutron2016-01-05T17:42:00+00:00http://dischord.org/2016/01/05/cleaning-up-after-neutron<p><em>I’ve cobbled this post together from some older notes - there’s a few things in here that haven’t
been a problem (for <a href="http://www.datacentred.co.uk">us</a>) since Juno, and with recent changes to
Neutron i.e DVR and L3-HA some of it is lapsing into irrelevance anyway. But seeing as it came up
recently on <a href="https://wiki.openstack.org/wiki/IRC">IRC</a> I thought I’d put it together here just in
case anyone else needs some help.</em></p>
<p>Managing orphaned Neutron objects is something that most operators are probably going to have to do
at some point. Whether it’s related to a configuration issue or you’ve had some kind of problem
with Neutron itself, you can often end up with network namespaces on your network nodes that are
effectively redundant, hogging resources that could otherwise be used elsewhere.</p>
<p>As an example, <a href="https://bugs.launchpad.net/neutron/+bug/1052535">the default configuration in Juno</a>
when installing from Ubuntu’s packages was to not delete namespaces after their associated network
or router was removed. If you didn’t keep tabs on this, you’d soon end up with a lot of redundant
namespaces on your network nodes. As a public cloud operator this is especially problematic when
you’ve got public IPv4 address space to manage and you really don’t want precious addresses being
wasted on gateway interfaces for virtual routers that are no longer in use.</p>
<p>So what to do?</p>
<ul>
<li><a href="#identifying">Identifying orphans</a></li>
<li><a href="#routers">Cleaning up routers</a></li>
<li><a href="#networks">Cleaning up networks</a></li>
<li><a href="#namespaces">Cleaning up namespaces</a></li>
</ul>
<h2 id="identifying-orphans"><a name="identifying"></a>Identifying orphans</h2>
<p>I wrote a <a href="http://dischord.org/2015/04/14/openstack-orphans">small Python script</a> a while back that
helps to do exactly this. It’s <a href="https://github.com/openstack/osops-tools-generic/blob/master/neutron/listorphans.py">now a part of the official OSOps
repo</a>, and
using it we can do something like:</p>
<pre><code>❯ ./listorphans.py routers
45 orphan(s) found of type routers [03fe3926-167c-4460-a139-12335615a02c,
096bfb34-2df0-4781-993d-5a0edb0db179, 16a48cf4-940d-4221-9427-e8037a223bb4
<snip>
</code></pre>
<p>If you wanted to double check some of these before you go ahead and delete anything, that’s both
sensible and easy. The script is calling the Neutron and Keystone APIs, and for each object
(routers in our case), it’s seeing if the associated tenant ID is valid. Let’s do the same from the
CLI:</p>
<pre><code>❯ neutron router-show 096bfb34-2df0-4781-993d-5a0edb0db179
+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | True |
| distributed | False |
| external_gateway_info | {"network_id": "6751cb30-0aef-4d7e-94c3-ee2a09e705eb", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "2af591ca-48ac-42b7-afc6-e691b3aa4c8a", "ip_address": "185.98.148.134"}]} |
| ha | False |
| id | 096bfb34-2df0-4781-993d-5a0edb0db179 |
| name | default |
| routes | |
| status | ACTIVE |
| tenant_id | 6ca0dd9ace034080853781c411f8e7a8 |
+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
</code></pre>
<pre><code>❯ openstack project show 6ca0dd9ace034080853781c411f8e7a8
No tenant with a name or ID of '6ca0dd9ace034080853781c411f8e7a8' exists.
</code></pre>
<p>Finding redundant Neutron networks is just a case of running the script with the <code>networks</code> option
instead:</p>
<pre><code>❯ ./listorphans.py networks
1 orphan(s) found of type networks
5fe2e974-10b4-4f5f-a678-303943137497
</code></pre>
<p>Again, if you want to verify this then you just need to do <code>neutron net-show
5fe2e974-10b4-4f5f-a678-303943137497</code> followed by <code>openstack project show</code> with the UUID of the
associated tenant.</p>
<p>Now that we’ve identified some orphans, we need to clean them up.</p>
<h2 id="cleaning-up">Cleaning up</h2>
<h3 id="routers"><a name="routers"></a>Routers</h3>
<p>Deleting an orphaned router can be a little involved depending on how many interfaces it has
defined. In this example we’ll delete a router that has a gateway interface (i.e an Internet-facing
IP address) and a port in a private subnet. This is probably the most common configuration.</p>
<p>If we look at an example router’s configuration:</p>
<pre><code>❯ neutron router-show 16a48cf4-940d-4221-9427-e8037a223bb4
+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | True |
| distributed | False |
| external_gateway_info | {"network_id": "6751cb30-0aef-4d7e-94c3-ee2a09e705eb", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "2af591ca-48ac-42b7-afc6-e691b3aa4c8a", "ip_address": "185.98.149.130"}]} |
| ha | False |
| id | 16a48cf4-940d-4221-9427-e8037a223bb4 |
| name | default |
| routes | |
| status | ACTIVE |
| tenant_id | 06348371a09148d194354183108708f8 |
+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
</code></pre>
<p>The first thing to do is remove the gateway interface. This is straightforward:</p>
<pre><code>❯ neutron router-gateway-clear 16a48cf4-940d-4221-9427-e8037a223bb4
Removed gateway from router 16a48cf4-940d-4221-9427-e8037a223bb4
❯ neutron router-show 16a48cf4-940d-4221-9427-e8037a223bb4
+-----------------------+--------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------+
| admin_state_up | True |
| distributed | False |
| external_gateway_info | |
| ha | False |
| id | 16a48cf4-940d-4221-9427-e8037a223bb4 |
| name | default |
| routes | |
| status | ACTIVE |
| tenant_id | 06348371a09148d194354183108708f8 |
+-----------------------+--------------------------------------+
</code></pre>
<p>Note that the second time we run the <code>neutron router-show</code> command there’s now no information
displayed for the <code>external_gateway_info</code>. Now we need to delete the internally-facing ports, so
let’s find out exactly what remaining ports are enabled on this router:</p>
<pre><code>❯ neutron router-port-list 16a48cf4-940d-4221-9427-e8037a223bb4
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| id | name | mac_address | fixed_ips |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| ff468adc-1bf0-4bf5-b655-d757fd258047 | | fa:16:3e:1a:fe:fb | {"subnet_id": "758f9ba5-cbb7-4667-9783-ee3c9b5bee98", "ip_address": "192.168.0.1"} |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
</code></pre>
<p>With that information, specifically the associated subnet ID, we can now delete the port:</p>
<pre><code>❯ neutron router-interface-delete 16a48cf4-940d-4221-9427-e8037a223bb4 758f9ba5-cbb7-4667-9783-ee3c9b5bee98
Removed interface from router 16a48cf4-940d-4221-9427-e8037a223bb4.
</code></pre>
<pre><code>❯ neutron router-delete 16a48cf4-940d-4221-9427-e8037a223bb4
Deleted router: 16a48cf4-940d-4221-9427-e8037a223bb4
</code></pre>
<p>I’m lazy, so here’s a script to do the whole thing for us:</p>
<pre><code>#!/usr/bin/env bash
ROUTER=$1
neutron router-gateway-clear $ROUTER
for PORT in $(neutron router-port-list -F fixed_ips $ROUTER | awk '{ print $3 }' | tr -d '\n|",') ; do
neutron router-interface-delete $ROUTER $PORT
done
neutron router-delete $ROUTER
</code></pre>
<p>Now it’s just a case of wrapping that up in a quick for loop for each of the orphaned router UUIDs
we collected earlier.</p>
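<p>Something along these lines does the trick, assuming the script above is saved as <code>cleanup_router.sh</code> - adjust the parsing to match however you’ve captured the orphaned UUIDs:</p>
<pre><code># Feed each orphaned router UUID from listorphans.py into the cleanup script
for ROUTER in $(./listorphans.py routers | grep -oE '[0-9a-f-]{36}') ; do
    ./cleanup_router.sh $ROUTER
done
</code></pre>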
<h3 id="networks"><a name="networks"></a>Networks</h3>
<p>Once you’ve got the routers out of the way, it’s mostly a simple matter of deleting the networks
with <code>neutron net-delete $NET_UUID</code>. If you see the following error:</p>
<pre><code>❯ ./neutronlistorphans.py networks
1 orphan(s) found of type networks
5fe2e974-10b4-4f5f-a678-303943137497
❯ neutron net-delete 5fe2e974-10b4-4f5f-a678-303943137497
Unable to complete operation on network 5fe2e974-10b4-4f5f-a678-303943137497. There are one or more ports still in use on the network.
</code></pre>
<p>Then, as it suggests, something still has an interface plumbed into that network. If it’s not a
router then it could be a VM, but you can find out what exactly by doing the following:</p>
<pre><code>❯ neutron net-show -F subnets 5fe2e974-10b4-4f5f-a678-303943137497
+---------+--------------------------------------+
| Field | Value |
+---------+--------------------------------------+
| subnets | 96554e99-9eca-40ac-af1c-1ff1870fed0f |
+---------+--------------------------------------+
❯ neutron port-list --all-tenants -c id -c fixed_ips | grep 96554e99-9eca-40ac-af1c-1ff1870fed0f
| 5badcd73-a7cb-4e65-a368-254458b1b203 | {"subnet_id": "96554e99-9eca-40ac-af1c-1ff1870fed0f", "ip_address": "192.168.0.1"} |
| 6938fff5-9b6c-4bb5-bf81-d47b6fbc44f9 | {"subnet_id": "96554e99-9eca-40ac-af1c-1ff1870fed0f", "ip_address": "192.168.0.3"} |
</code></pre>
<p>And then doing <code>neutron port-show $UUID</code> on each of those ports will tell you what’s still lingering. In
my case it’s a router and a virtual machine. Oops - I should probably get rid of those as well.</p>
<h2 id="namespaces"><a name="namespaces"></a>Namespaces</h2>
<p>What about orphaned namespaces then? Here’s a quick script that’ll identify any L3
(<code>qrouter-</code>) namespaces across a number of network nodes that Neutron doesn’t know about
(and so is no longer managing):</p>
<pre><code>#!/usr/bin/env bash
for netnode in $1 ; do
echo $netnode
for router in $(ssh $netnode 'ip netns list | grep qrouter | cut -d - -f 2-20') ; do
neutron router-show $router | grep -i unable
done
done
</code></pre>
<p>Just call it with a quoted, space-separated list of network nodes and it should return the invalid namespaces.</p>
<p>For L2 namespaces (<code>qdhcp-</code>), use this slightly altered version instead:</p>
<pre><code>#!/usr/bin/env bash
for netnode in $1 ; do
echo $netnode
for router in $(ssh $netnode 'ip netns list | grep qdhcp | cut -d - -f 2-20') ; do
neutron net-show $router | grep -i unable
done
done
</code></pre>
<p>Now, for each of the invalid <code>qdhcp</code> or <code>qrouter</code> namespaces we need to manually delete the network
namespace itself as well as the corresponding port(s) in OVS. In this example I’m on a network node
called <code>osnet1</code> and deleting a router whose UUID is <code>1e944eb4-2773-40f1-9a03-7403f915b334</code>:</p>
<pre><code>nick@osnet1:~$ sudo ip netns exec qrouter-1e944eb4-2773-40f1-9a03-7403f915b334 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2956: qg-51fde206-37: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether fa:16:3e:df:ea:ee brd ff:ff:ff:ff:ff:ff
inet 185.98.149.219/23 brd 185.98.149.255 scope global qg-51fde206-37
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fedf:eaee/64 scope link
valid_lft forever preferred_lft forever
</code></pre>
<p>Here we have a router with a single (gateway) interface configured - <code>qg-51fde206-37</code>. Verify that there’s an OVS port we need to delete as well:</p>
<pre><code>nick@osnet1:~$ sudo ovs-vsctl show | grep -A5 -B5 qg-51fde206-37
type: internal
Port "qg-eebb6dbe-9a"
tag: 1
Interface "qg-eebb6dbe-9a"
type: internal
Port "qg-51fde206-37"
tag: 1
Interface "qg-51fde206-37"
type: internal
Port "qr-e724f3da-db"
tag: 113
Interface "qr-e724f3da-db"
type: internal
</code></pre>
<p>The fact that it’s tagged with ‘1’ is another sign that it’s invalid. In the
case of a DHCP namespace it’ll have a ‘tap’ interface and if it’s invalid
it’ll have a tag of 4095. Now we can go ahead and delete the namespace and then
the OVS port with confidence:</p>
<pre><code>nick@osnet1:~$ sudo ip netns delete qrouter-1e944eb4-2773-40f1-9a03-7403f915b334
nick@osnet1:~$ sudo ovs-vsctl del-port qg-51fde206-37
nick@osnet1:~$ sudo ip netns exec qrouter-1e944eb4-2773-40f1-9a03-7403f915b334 ip a
Cannot open network namespace "qrouter-1e944eb4-2773-40f1-9a03-7403f915b334": No such file or directory
nick@osnet1:~$ sudo ovs-vsctl show | grep qg-51fde206-37
nick@osnet1:~$
</code></pre>
<p>Done. Oh too many steps you say? Here’s another script that does the work for you (for routers
anyway):</p>
<pre><code>#!/usr/bin/env bash
ROUTER=$1
GIF=$(sudo ip netns exec qrouter-$ROUTER ip a | grep qg | awk '{ print $2 }' RS="\n\n" | cut -d : -f 1)
RIF=$(sudo ip netns exec qrouter-$ROUTER ip a | grep qr | awk '{ print $2 }' RS="\n\n" | cut -d : -f 1)
echo "Deleting network namespace qrouter-$1"
ip netns delete qrouter-$ROUTER
echo "Deleting OVS ports $GIF $RIF"
if [[ $GIF ]]; then ovs-vsctl del-port $GIF ; fi
if [[ $RIF ]]; then ovs-vsctl del-port $RIF ; fi
</code></pre>
<p>In fact, the need to do these things in a specific order is sometimes the reason why you might still
have namespaces hanging around, even if you’ve configured Neutron’s L3 agent to delete them
automatically. It’s not impossible for additional interfaces (that Neutron doesn’t know about) to be
defined in a namespace, which will then cause the auto-delete functionality to fail.</p>
<p>Deleting the <code>qdhcp</code> namespaces follows a similar process, although they’re normally a little less
problematic as you shouldn’t have any additional L3 interfaces configured. Simply do the <code>ip netns
delete</code> step for each offending namespace, along with any associated OVS ports.</p>
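<p>In other words, something along these lines - the namespace UUID and tap interface here are illustrative:</p>
<pre><code>nick@osnet1:~$ sudo ip netns exec qdhcp-5fe2e974-10b4-4f5f-a678-303943137497 ip a | grep tap
2960: tap8a76d48a-45: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
nick@osnet1:~$ sudo ip netns delete qdhcp-5fe2e974-10b4-4f5f-a678-303943137497
nick@osnet1:~$ sudo ovs-vsctl del-port tap8a76d48a-45
</code></pre>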
Archiving data in Nova's database2015-12-30T16:59:00+00:00http://dischord.org/2015/12/30/archiving-data-in-nova-s-database<blockquote>
<p>Update: Good news on the below. The issue with <code>nova-manage db archive_deleted_rows</code> has been <a href="https://review.openstack.org/#/c/299474/">fixed in Newton</a>, and also <a href="https://review.openstack.org/#/c/326730/">backported to Mitaka</a>. So if you’re running a fairly recent installation of Nova then you’ve a properly supported option for archiving this data prior to deletion. There’s also <a href="https://blueprints.launchpad.net/nova/+spec/purge-deleted-instances-cmd">this blueprint</a> which would handle purging the data entirely, hopefully slated for implementation in Ocata.</p>
</blockquote>
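<p>(On a suitably recent release, that’s then just a case of something along the lines of the following, archiving in batches until there’s nothing left to move:)</p>
<pre><code># nova-manage db archive_deleted_rows --max_rows 1000
</code></pre>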
<p>OpenStack Nova’s database can grow to a significant size over time, thanks to the fact that entries
in the <code>instances</code> table (amongst others) aren’t actually deleted, they’re only flagged as such:</p>
<pre><code>MariaDB [nova]> select count(*) from instances;
+----------+
| count(*) |
+----------+
| 66746 |
+----------+
1 row in set (0.02 sec)
MariaDB [nova]> select count(*) from instances where deleted_at is not null;
+----------+
| count(*) |
+----------+
| 66344 |
+----------+
1 row in set (0.26 sec)
</code></pre>
<p>And of course it doesn’t take long on a reasonably busy platform for cruft like that to build up. The
knock-on effect is that certain API calls end up being particularly slow, notably
<code>os-simple-tenant-usage</code>, as described in this <a href="https://bugs.launchpad.net/nova/+bug/1421471">bug
here</a>. This API is also what’s used when you log in to
Horizon with an account that has the ‘admin’ role assigned, and unfortunately it means that it can
take a long time to log in as more and more cruft accumulates in the database.</p>
<p>Until a while back this was an easy problem to stay on top of using the <code>nova-manage</code> command, but
sadly this functionality broke <a href="http://lists.openstack.org/pipermail/openstack-dev/2014-December/052896.html">“at some point” because of a change that was
introduced</a>. There’s
work being done to properly address this but it doesn’t look like it’ll land until the next release.</p>
<p>Fortunately in the meantime someone has committed a couple of scripts to the
<a href="https://wiki.openstack.org/wiki/Osops">OSOps</a> repo that take
care of handling this archival process using Percona’s
<a href="https://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html">pt-archiver</a> tool, and they do the
job nicely. For example, you can obtain a preview of what needs to be done using
<code>openstack_db_archive_progress.sh</code>:</p>
<pre><code># ./openstack_db_archive_progress.sh -d nova -H localhost -u root -p nova123
Wed Dec 30 18:52:24 GMT 2015 nova.block_device_mapping has 1048, 67306 ready for archiving
and 0 already in shadow_block_device_mapping. Total records is 68354
Wed Dec 30 18:52:24 GMT 2015 nova.instance_metadata has 75, 75119 ready for archiving and 0
already in shadow_instance_metadata. Total records is 75194
Wed Dec 30 18:52:25 GMT 2015 nova.instance_system_metadata has 1187555, 50555 ready for
archiving and 0 already in shadow_instance_system_metadata. Total records is 1238110
Wed Dec 30 18:52:26 GMT 2015 nova.instance_actions has 150350, 0 ready for archiving and 0
already in shadow_instance_actions. Total records is 150350
Wed Dec 30 18:52:26 GMT 2015 nova.instance_faults has 1619, 15409 ready for archiving and 0
already in shadow_instance_faults. Total records is 17028
Wed Dec 30 18:52:26 GMT 2015 nova.virtual_interfaces has 0, 0 ready for archiving and 0
already in shadow_virtual_interfaces. Total records is 0
Wed Dec 30 18:52:26 GMT 2015 nova.fixed_ips has 0, 0 ready for archiving and 0 already in
shadow_fixed_ips. Total records is 0
Wed Dec 30 18:52:26 GMT 2015 nova.security_group_instance_association has 0, 0 ready for
archiving and 0 already in shadow_security_group_instance_association. Total records is 0
Wed Dec 30 18:52:26 GMT 2015 nova.migrations has 218, 0 ready for archiving and 0 already in
shadow_migrations. Total records is 218
Wed Dec 30 18:52:26 GMT 2015 nova.instance_extra has 358, 51401 ready for archiving and 0
already in shadow_instance_extra. Total records is 51759
</code></pre>
<p>And then to actually do the archival it’s simply a case of running:</p>
<pre><code># ./openstack_db_archive.sh -d nova -H localhost -u root -p nova123
</code></pre>
<p>By way of comparison, here’s how long <code>nova usage-list</code> (which hits the <code>os-simple-tenant-usage</code> endpoint) took before:</p>
<pre><code># time nova usage-list
[..]
real 0m45.250s
user 0m0.592s
sys 0m0.252s
</code></pre>
<p>And after:</p>
<pre><code># time nova usage-list
[..]
real 0m24.126s
user 0m0.355s
sys 0m0.109s
</code></pre>
<p>Timings and data are from a development environment in VMs (running OpenStack Kilo, with Galera / MariaDB) on my
laptop, so take from that what you will!</p>
<p>Finally, the scripts themselves are mirrored on GitHub here:
<a href="https://github.com/openstack/osops-tools-generic/tree/master/nova">https://github.com/openstack/osops-tools-generic/tree/master/nova</a>.</p>
Cinder multi-backend with multiple Ceph pools2015-12-22T14:34:00+00:00http://dischord.org/2015/12/22/cinder-multi-backend-with-multiple-ceph-pools<p>This isn’t so much of a ‘how-to’ (as it’s been documented with perfect clarity by Sébastien Han <a href="http://www.sebastien-han.fr/blog/2013/04/25/ceph-and-cinder-multi-backend/">here</a>), it’s more of a warning when enabling the multi-backend functionality. <em>tl;dr</em> If you enable Cinder multi-backend, double-check the output of <code>cinder service-list</code> and be sure to update the host mappings for all your existing volumes.</p>
<p>I came across a problem recently after enabling Cinder’s multi-backend feature to support multiple Ceph pools. Attempting to delete existing instances that had volumes associated with them would fail; the symptom was an HTTP 504 (gateway timeout), leaving the machine in an ‘error’ state having sat there ‘deleting’ for some time. Instantiating new virtual machines, attaching volumes, and then deleting them was all fine.</p>
<p>For the failed deletions, as far as Nova’s logs are concerned the delete mostly goes according to plan apart from the call to Cinder to detach the associated volume:</p>
<pre><code>DEBUG cinderclient.client [req-3011c2ec-2274-49e4-9f1e-e9ec7eb850a5 ] Failed attempt(1 of
3), retrying in 1 seconds _cs_request
/usr/lib/python2.7/dist-packages/cinderclient/client.py:297
</code></pre>
<p>And a few seconds later after the third attempt, the following exception makes an appearance:</p>
<pre><code>ERROR oslo.messaging.rpc.dispatcher [req-3011c2ec-2274-49e4-9f1e-e9ec7eb850a5 ] Exception
during message handling: Gateway Time-out (HTTP 504)
</code></pre>
<p>This is where our instance is left in the error state. Tracing through from the original request using the request-id (yay for <a href="https://www.elastic.co/videos/introduction-to-the-elk-stack">ELK</a>), Cinder seemingly gives up and the RPC request times out. We can see where Nova talks to the Cinder API, for example:</p>
<pre><code>INFO cinder.api.openstack.wsgi [req-c0854007-3b72-4964-b337-b9e331874c4e
d1a60c7b4bbc427e8f2c3f99d0c2e6d2 509d6ae853114ea9aaaac02804d3a4dd - - -] POST
http://compute.datacentred.io:8776/v1/509d6ae853114ea9aaaac02804d3a4dd/volumes/7a8a46ba-868a-4620-acc5-84467036fecb/action
</code></pre>
<p>There’s the message generated:</p>
<pre><code>DEBUG oslo_messaging._drivers.amqpdriver [req-c0854007-3b72-4964-b337-b9e331874c4e
d1a60c7b4bbc427e8f2c3f99d0c2e6d2 509d6ae853114ea9aaaac02804d3a4dd - - -] MSG_ID is
774b48ebff5540fcb9262f35941f6f9d _send
/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:311
DEBUG oslo_messaging._drivers.amqp [req-c0854007-3b72-4964-b337-b9e331874c4e
d1a60c7b4bbc427e8f2c3f99d0c2e6d2 509d6ae853114ea9aaaac02804d3a4dd - - -] UNIQUE_ID is
a39a4a8ec681468b87463968a8b64a2c. _add_unique_id
/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqp.py:252
</code></pre>
<p>And then a few seconds later we see another exception:</p>
<pre><code>ERROR cinder.api.middleware.fault [req-c0854007-3b72-4964-b337-b9e331874c4e
d1a60c7b4bbc427e8f2c3f99d0c2e6d2 509d6ae853114ea9aaaac02804d3a4dd - - -] Caught error: Timed
out waiting for a reply to message ID 774b48ebff5540fcb9262f35941f6f9d
[..]
INFO eventlet.wsgi.server [req-c0854007-3b72-4964-b337-b9e331874c4e d1a60c7b4bbc427e8f2c3f99d0c2e6d2 509d6ae853114ea9aaaac02804d3a4dd - - -] Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 468, in handle_one_response
write(b''.join(towrite))
File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 399, in write
_writelines(towrite)
File "/usr/lib/python2.7/socket.py", line 334, in writelines
self.flush()
File "/usr/lib/python2.7/socket.py", line 303, in flush
self._sock.sendall(view[write_offset:write_offset+buffer_size])
File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 376, in sendall
tail = self.send(data, flags)
File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 358, in send
total_sent += fd.send(data[total_sent:], flags)
error: [Errno 104] Connection reset by peer
</code></pre>
<p>There were no other problems on the platform at the time - loadbalancers are fine, RabbitMQ is OK - and so on the face of it this is something of a mystery.</p>
<p>My initial attempts at triaging the issue involved manually trying to detach the volume associated with the instance:</p>
<pre><code>❯ nova reset-state --active 75966ed2-e003-4575-8b64-08c4c414492c
Reset state for server 75966ed2-e003-4575-8b64-08c4c414492c succeeded; new state is active
❯ nova stop 75966ed2-e003-4575-8b64-08c4c414492c
Request to stop server 75966ed2-e003-4575-8b64-08c4c414492c has been accepted.
❯ nova volume-detach 75966ed2-e003-4575-8b64-08c4c414492c
7a8a46ba-868a-4620-acc5-84467036fecb
❯ cinder show 7a8a46ba-868a-4620-acc5-84467036fecb | grep -i status
| status | detaching |
</code></pre>
<p>And there it hung, indefinitely. Googling the symptoms of the problem doesn’t necessarily get you very far at this point, as there’s <a href="https://bugs.launchpad.net/nova/+bug/1449221">no</a> <a href="https://bugs.launchpad.net/cinder/+bug/1413610">shortage</a> of <a href="https://ask.openstack.org/en/question/62198/how-to-debug-volume-stuck-in-deleting-state-issue/">bugs</a> out there to do with Cinder volumes ending up in a strange state and instances that can’t be deleted. Most of these suggest <a href="https://ask.openstack.org/en/question/66918/how-to-delete-volume-with-available-status-and-attached-to/">diving into the database</a> in order to clean things up, but really you should try and discern the root cause if it becomes apparent that this isn’t a one-off.</p>
<p>The clue really is in the fact that there’s some kind of messaging timeout. We know that <code>cinder-api</code> is receiving the request from <code>nova-compute</code>, so what’s happening to the message? Why isn’t <code>cinder-volume</code> picking it up? Well, every service on OpenStack has a registry of what runs where in terms of agents, whether it’s <code>nova-conductor</code>, <code>nova-compute</code>, <code>neutron-dhcp-agent</code>, <code>neutron-metadata-agent</code>, etc., and the same is true of Cinder - there’s <code>cinder-volume</code> and <code>cinder-scheduler</code>. Knowing that there’d been some very recent changes to Cinder’s configuration, on a hunch I decided to check the output of <code>cinder service-list</code>:</p>
<pre><code>❯ cinder service-list
+------------------+-----------------------------------------------+----------+-------+
| Binary | Host | Status | State |
+------------------+-----------------------------------------------+----------+-------+
| cinder-scheduler | controller0 | enabled | up |
| cinder-scheduler | controller1 | enabled | up |
| cinder-volume | controller0 | enabled | down |
| cinder-volume | controller1 | enabled | down |
| cinder-volume | rbd:cinder.volumes.flash@cinder.volumes.flash | enabled | up |
| cinder-volume | rbd:cinder.volumes@cinder.volumes | enabled | up |
+------------------+-----------------------------------------------+----------+-------+
</code></pre>
<p>And there is our smoking gun. Enabling the multi-backend functionality has introduced two new instances of <code>cinder-volume</code>, one per pool, and the previous instances with the old configuration that were responsible for talking to Ceph are now ‘down’ - that’s why the RPC messages from <code>cinder-api</code> aren’t being responded to: the ‘host’ responsible for that volume is AWOL.</p>
<p>In order to fix this there are a couple of things that need to be done. The first is to administratively disable the older, now redundant, services:</p>
<pre><code>❯ for host in controller{0,1}; do cinder service-disable $host cinder-volume ; done
+-------------+---------------+----------+
| Host | Binary | Status |
+-------------+---------------+----------+
| controller0 | cinder-volume | disabled |
+-------------+---------------+----------+
+-------------+---------------+----------+
| Host | Binary | Status |
+-------------+---------------+----------+
| controller1 | cinder-volume | disabled |
+-------------+---------------+----------+
</code></pre>
<p>The next step is to update the mappings for the volumes themselves, and unfortunately to do that we need to make some changes in the Cinder database. The first change is to the <code>volumes</code> table, and the second is to the <code>volume_attachment</code> table - in both cases we’re updating the host reference (the <code>host</code> and <code>attached_host</code> columns respectively) for the row that corresponds to the <code>volume_id</code> associated with this instance.</p>
<pre><code>MariaDB [cinder]> select host from volumes where id='7a8a46ba-868a-4620-acc5-84467036fecb';
+---------------------+
| host |
+---------------------+
| controller1#DEFAULT |
+---------------------+
MariaDB [cinder]> select volume_id,attached_host,instance_uuid from volume_attachment where
volume_id='7a8a46ba-868a-4620-acc5-84467036fecb';
+--------------------------------------+---------------------+--------------------------------------+
| volume_id | attached_host | instance_uuid |
+--------------------------------------+---------------------+--------------------------------------+
| 7a8a46ba-868a-4620-acc5-84467036fecb | controller1#DEFAULT | 75966ed2-e003-4575-8b64-08c4c414492c |
+--------------------------------------+---------------------+--------------------------------------+
</code></pre>
<p>The convention when updating the <code>host</code> and <code>attached_host</code> columns is to use <code>host#pool</code>, so to fix these rows we need to do the following:</p>
<pre><code>MariaDB [cinder]> update volumes set host='rbd:cinder.volumes@cinder.volumes#cinder.volumes'
where id='7a8a46ba-868a-4620-acc5-84467036fecb' limit 1;
MariaDB [cinder]> update volume_attachment set
attached_host='rbd:cinder.volumes@cinder.volumes#cinder.volumes' where
volume_id='7a8a46ba-868a-4620-acc5-84467036fecb' limit 1;
</code></pre>
<p>Now let’s try deleting our instances again:</p>
<pre><code>❯ cinder reset-state --state in-use 7a8a46ba-868a-4620-acc5-84467036fecb
❯ nova delete 75966ed2-e003-4575-8b64-08c4c414492c
Request to delete server 75966ed2-e003-4575-8b64-08c4c414492c has been accepted.
</code></pre>
<p>And a couple of seconds later:</p>
<pre><code>❯ nova show 75966ed2-e003-4575-8b64-08c4c414492c
ERROR (CommandError): No server with a name or ID of '75966ed2-e003-4575-8b64-08c4c414492c'
exists.
</code></pre>
<p>Hurrah! If that works you’ll need to update both the <code>volumes</code> and <code>volume_attachment</code> tables en masse to reflect the new hosts, and then delete the service entries for the redundant services themselves.</p>
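<p>For reference, a rough sketch of the en-masse version is below. The backend and host names are the ones from this post’s example so adjust them to match your own environment, and take a backup of the Cinder database before touching anything:</p>
<pre><code># a sketch only - 'deleted=0' restricts the change to live rows
mysqldump cinder > cinder-pre-multibackend.sql
mysql cinder -e "update volumes \
  set host='rbd:cinder.volumes@cinder.volumes#cinder.volumes' \
  where host like 'controller%#DEFAULT' and deleted=0;"
mysql cinder -e "update volume_attachment \
  set attached_host='rbd:cinder.volumes@cinder.volumes#cinder.volumes' \
  where attached_host like 'controller%#DEFAULT' and deleted=0;"
# 'binary' is a reserved word in MySQL, hence the backticks
mysql cinder -e "delete from services where \`binary\`='cinder-volume' \
  and host in ('controller0','controller1');"
</code></pre>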
Battle of Britain 75th Anniversary Airshow2015-09-21T16:49:00+00:00http://dischord.org/2015/09/21/battle-of-britain-75th-anniversary-airshow<p>This past weekend saw Duxford and the <a href="http://www.iwm.org.uk/visits/iwm-duxford">IWM</a> host an airshow dedicated to the 75th anniversary of
the Battle of Britain. It was an incredible event with the sun shining almost all day long - here’s
a few of the photos that I managed to take throughout the day:</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/20978537183/"><img src="https://live.staticflickr.com/5798/20978537183_be3bf86a70_b.jpg" title="DSC_0329" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/21608528251/"><img src="https://live.staticflickr.com/5819/21608528251_5cd96b2b13_b.jpg" title="DSC_0410" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/21411824458/"><img src="https://live.staticflickr.com/5716/21411824458_5da673fb65_b.jpg" title="DSC_0458" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/21588322702/"><img src="https://live.staticflickr.com/654/21588322702_2bd27a9f61_b.jpg" title="DSC_0563" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/21411823548/"><img src="https://live.staticflickr.com/5750/21411823548_fb4530b062_b.jpg" title="DSC_0641" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/21411823298/"><img src="https://live.staticflickr.com/5697/21411823298_ea1d85743d_b.jpg" title="DSC_0678" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/21573468186/"><img src="https://live.staticflickr.com/5779/21573468186_c22280aedd_b.jpg" title="DSC_0974" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/21588321222/"><img src="https://live.staticflickr.com/5821/21588321222_34e947f391_b.jpg" title="DSC_1005" /></a></p>
<p>That last photo is about the closest I was able to get to capturing all 17 (!) Spitfires that took
to the sky for the show’s finale. Seeing and hearing all of those incredible machines in the sky at
the same time was a singular experience.</p>
<p>Thanks again to the IWM for putting a fantastic show on, and there’s a few more photos in the series
<a href="https://www.flickr.com/photos/yankcrime/albums/72157658849898596">over on Flickr</a>.</p>
Chamonix2015-06-19T18:39:00+00:00http://dischord.org/2015/06/19/chamonix<p><a href="http://www.chamonixit.com">Chris</a> and Catherine’s wedding was the perfect excuse to spend a few days in and around Chamonix. The weather couldn’t have been any better for the big day, but the rest of the time it was a bit hit-and-miss and meant that our trip up the Aiguille du Midi wasn’t ideal for sweeping vistas of the valley:</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/18983394091/"><img src="https://live.staticflickr.com/515/18983394091_1f40907324_b.jpg" title="DSC_0120" /></a></p>
<p>Still, the <a href="http://dischord.org/misc/minorthreat/scanman.htm">ScanmaN</a> came fully prepared with apple vodka, wine, and baguettes - so what was the problem again exactly?</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/18792907698/"><img src="https://live.staticflickr.com/522/18792907698_5259d87b64_b.jpg" title="DSC_0156" /></a></p>
<p>Here’s a few more of my favourites - the rest are on <a href="https://www.flickr.com/photos/yankcrime/sets/72157652448859203">Flickr</a>.</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/18331933723/"><img src="https://live.staticflickr.com/334/18331933723_8232ff1012_b.jpg" title="DSC_0253" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/18764890890/"><img src="https://live.staticflickr.com/406/18764890890_9a44901c1e_b.jpg" title="DSC_0148" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/18975487882/"><img src="https://live.staticflickr.com/363/18975487882_c9ffdf1b76_b.jpg" title="DSC_0275" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/18764917748/"><img src="https://live.staticflickr.com/346/18764917748_0632ff7b5b_b.jpg" title="DSC_0122" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/18764917188/"><img src="https://live.staticflickr.com/3826/18764917188_1e55561887_b.jpg" title="DSC_0283" /></a></p>
OpenStack Summit Vancouver 20152015-05-23T20:48:00+00:00http://dischord.org/2015/05/23/vancouver-openstack-summit-2015<p>It’s a fair old way to go for just a week’s stay, but the pain of long-distance travel was more than worthwhile for an opportunity to attend the first <a href="https://www.openstack.org/summit/vancouver-2015/">OpenStack Summit</a> of 2015, and to soak up some of the sights and sounds that the fantastic city of Vancouver has to offer. Here’s a few photos I snapped during the bits of time we did have free away from the Summit itself:</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/18078415381/"><img src="https://live.staticflickr.com/5447/18078415381_3c96a07236_b.jpg" title="DSC_9990" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/17456963583/"><img src="https://live.staticflickr.com/5457/17456963583_7f7c2a5d6d_b.jpg" title="DSC_0043" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/17804281152/"><img src="https://live.staticflickr.com/5341/17804281152_c3e99559f6_b.jpg" title="DSC_9963" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/17806810375/"><img src="https://live.staticflickr.com/7787/17806810375_61687738cb_b.jpg" title="DSC_9985" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/17780536716/"><img src="https://live.staticflickr.com/7673/17780536716_8854f3914a_b.jpg" title="DSC_9953" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/17780540506/"><img src="https://live.staticflickr.com/8767/17780540506_ac3f89d975_b.jpg" title="DSC_9933" /></a></p>
<p>And a few more over on Flickr <a href="https://www.flickr.com/photos/yankcrime/sets/72157652616326350">here</a>.</p>
Orchestrating CoreOS with OpenStack Heat2015-04-18T17:53:00+00:00http://dischord.org/2015/04/18/orchestrating-coreos-with-openstack-heat<p>Having finally spent a bit of time with <a href="https://wiki.openstack.org/wiki/Heat">OpenStack’s Heat</a>, I’ve started to see what I can do with automating infrastructure deployments and services by using it in conjunction with <a href="http://coreos.org">CoreOS</a>. This post sort of builds on <a href="http://blog.scottlowe.org/2014/08/13/deploying-coreos-on-openstack-using-heat/">Scott Lowe’s introduction to CoreOS and Heat</a> and does a few of the things he suggests, such as creating a dedicated network and deploying an arbitrary number of instances. It’s just enough to get a cluster stood up with which you can then define some services and roll out your application stack in order to start testing.</p>
<p>The Heat template itself looks like this:</p>
<script src="https://gist.github.com/af2a9dce692014dd38b8.js"> </script>
<p>Most of it is pretty standard; the interesting bits that I think are worth pointing out are:</p>
<ul>
<li>Line 22, or thereabouts, where I list a few flavors. These are specific to <a href="http://www.datacentred.co.uk/on-demand-compute/">DataCentred’s OpenStack installation</a> and will need changing if you’re deploying this elsewhere;</li>
<li>The image ID on line 35 is also specific to <a href="http://www.datacentred.co.uk">DataCentred</a> but can be overridden either here or as a parameter when you create the stack;</li>
<li>Line 99, where we define a ResourceGroup with a count passed in as a parameter;</li>
<li>Line 111, which saves confusion by suffixing the (parameterised) name with the RG’s index value for the OS::Nova::Server instance;</li>
<li>The <code>user_data</code> section which does just about enough to start <a href="https://coreos.com/etcd/">etcd</a> and <a href="https://github.com/coreos/fleet">fleet</a> and gets our instances talking to one another.</li>
</ul>
<p>To launch this stack using the <code>heat</code> CLI, run the following command:</p>
<pre><code>$ heat stack-create -f coreos-heat.yaml \
-P key_name=deadline \
-P count=3 \
-P public_net_id=6751cb30-0aef-4d7e-94c3-ee2a09e705eb \
-P discovery_url=$(curl -sw "\n" 'https://discovery.etcd.io/new?size=3') \
-P name=webserver coreos
</code></pre>
<p>In that example, <code>webserver</code> is the prefix that each instance will use and the last argument, <code>coreos</code>, is the name of the stack itself. And yes, passing in <code>count=3</code> is a bit redundant as it’s the default in the template, but for illustration’s sake I think it helps here ;) The <code>discovery_url</code> is passed in as a parameter, and in our example and in my lab I’ve been using the etcd project’s provided discovery service, but you’re free to run your own instead of course.</p>
<p>Kick that command off, give it a few minutes, and eventually you should have a successfully deployed stack. Log in to one of the instances and then you’ll be able to verify the state of the cluster:</p>
<pre><code>nick@deadline:~> ssh core@185.43.218.192
Last login: Sat Apr 18 18:06:22 2015 from 86.143.53.8
CoreOS stable (607.0.0)
core@webserver-0 ~ $ fleetctl list-machines
MACHINE IP METADATA
1fc4fbb3... 192.168.10.24 -
59c16e8a... 192.168.10.23 -
882768f6... 192.168.10.22 -
</code></pre>
<p>At this point you can define some units and launch containers in your cluster via <code>fleet</code> - the CoreOS project’s website has you covered with a good introduction <a href="https://coreos.com/docs/launching-containers/launching/launching-containers-fleet/">to get you started</a>.</p>
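<p>As a quick taster - and this is only a sketch, not part of the template above - a trivial unit to prove the cluster works looks something like this, submitted straight from one of the members:</p>
<pre><code>core@webserver-0 ~ $ cat <<'EOF' > hello.service
[Unit]
Description=Hello World

[Service]
ExecStart=/usr/bin/docker run --rm busybox /bin/sh -c "while true; do echo hello; sleep 10; done"
EOF
core@webserver-0 ~ $ fleetctl start hello.service
core@webserver-0 ~ $ fleetctl list-units
</code></pre>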
Out-of-sync quotas in OpenStack Nova2015-04-17T14:54:00+00:00http://dischord.org/2015/04/17/out-of-sync-quotas-in-openstack<p>From time to time user quotas in OpenStack become out of sync, i.e. usage will show X instances in use when the reality, Y, is a different value altogether. Until now the way to fix this has been to manually update the <code>nova</code> database, amend the <code>quota_usages</code> table, and then trigger a refresh by launching a new machine, e.g.:</p>
<pre><code>update quota_usages set in_use = 0 where id = 890 and project_id = '509d6ae853114ea9aaaac02804d3a4dd' limit 1;
</code></pre>
<p>Or potentially updating en masse - if you’re feeling brave - with something like:</p>
<pre><code>update quota_usages, (select usage_id, sum(delta) as sum from reservations where project_id='86d829ffc0ff4efb943131f7d2a18d52' and deleted!=id group by usage_id) as r set quota_usages.in_use = r.sum where quota_usages.id = r.usage_id;
update quota_usages, (select distinct(usage_id) as usage_id from reservations where project_id='ad3e3ee7a08d45df965908704f29b873' and deleted=id ) as r set quota_usages.in_use = 0 where quota_usages.id = r.usage_id;
</code></pre>
<p>However, the CERN chaps have just released a handy little tool that checks and then updates where necessary if there are any mismatches. They’ve blogged a bit about it <a href="http://openstack-in-production.blogspot.co.uk/2015/03/nova-quota-usage-synchronization.html">here</a> and the script itself is over <a href="https://github.com/cernops/nova-quota-sync">here</a>.</p>
OpenStack Orphans2015-04-14T10:39:00+00:00http://dischord.org/2015/04/14/openstack-orphans<p>It’s surprisingly easy to find yourself with a large number of orphaned Neutron objects in your OpenStack installation, i.e. resources left behind after a project or tenant has been deleted. This can be particularly problematic when it comes to routers and floating IP addresses as these can quickly eat into what precious few addresses you might have available, especially if they’re publicly accessible.</p>
<p>Here’s a bit of python I’ve knocked together to check for orphaned resources of this type within your OpenStack install. The credentials are taken from your current working environment and you’ll need to be an administrator in order for this to work.</p>
<script src="https://gist.github.com/16e9fec07bd1210209cf.js"> </script>
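<p>If you’d rather not run the script for whatever reason, the same idea can be roughed out with the standard CLI tools. This isn’t the script above - just a sketch of the approach, assuming admin credentials are loaded and clients of roughly this vintage:</p>
<pre><code># IDs of all projects that still exist
keystone tenant-list | awk -F'|' 'NF>2 {gsub(/ /,"",$2); print $2}' \
  | grep -v '^id$' | sort -u > tenants.txt
# IDs of projects that currently own routers or floating IPs
{ neutron router-list -c tenant_id -f csv --quote none
  neutron floatingip-list -c tenant_id -f csv --quote none; } \
  | grep -v tenant_id | sort -u > owners.txt
# owners that Keystone no longer knows about are orphan candidates
comm -13 tenants.txt owners.txt
</code></pre>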
Installing OpenStack's CLI tools on OS X2015-03-23T12:01:00+00:00http://dischord.org/2015/03/23/installing-openstack-s-cli-tools-on-os-x<p>Here’s a quick guide for installing OpenStack’s CLI tools on OS X in a way that doesn’t make too much of a mess.</p>
<p>First up we need to get a couple of additional tools installed that will ‘pollute’ your base install, but once we’re there we can easily create ‘virtual’ Python development environments and install isolated sets of packages on a per-project basis. This includes creating an ‘openstack’ project in which we’ll install keystone, glance, and so on.</p>
<pre><code>$ sudo easy_install pip
$ sudo pip install virtualenv
$ sudo pip install virtualenvwrapper
</code></pre>
<p>With those three things in place, we can then set about configuring a home directory for our virtual Python environments:</p>
<pre><code>$ mkdir ~/.virtualenvs
$ source /usr/local/bin/virtualenvwrapper.sh
</code></pre>
<p>Stick that last line into one of your shell’s startup files. In my case that’s ZSH so I’ve added the following to my <code>~/.zshrc</code>:</p>
<pre><code># virtualenv junk
[[ -s "/usr/local/bin/virtualenvwrapper.sh" ]] && source "/usr/local/bin/virtualenvwrapper.sh"
</code></pre>
<p>Now let’s go and create an OpenStack Python development environment:</p>
<pre><code>$ mkvirtualenv openstack
</code></pre>
<p>To work on this environment at any time, do:</p>
<pre><code>$ workon openstack
</code></pre>
<p>At which point we can go ahead and install the various command-line tools and their dependencies all within this environment:</p>
<pre><code>$ (openstack)nick@deadline:~> pip install python-{keystone,glance,nova,neutron,cinder}client
Downloading/unpacking python-keystoneclient
Downloading python_keystoneclient-1.2.0-py2.py3-none-any.whl (392kB): 392kB downloaded
Downloading/unpacking python-glanceclient
Downloading python_glanceclient-0.17.0-py2.py3-none-any.whl (84kB): 84kB downloaded
[..]
Successfully installed python-keystoneclient python-glanceclient python-novaclient python-neutronclient oslo.serialization six oslo.utils oslo.config oslo.i18n msgpack-python netifaces
</code></pre>
<p>And we’re done - all you have to remember is that each time you want to use these CLI tools you’ll need to first do <code>workon openstack</code>.</p>
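<p>For example, a typical session ends up looking something like this (the credentials file name is just an illustration):</p>
<pre><code>$ workon openstack
(openstack)$ source ~/openrc.sh
(openstack)$ nova list
(openstack)$ deactivate
</code></pre>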
Troubleshooting OpenStack Neutron Networking, Part One2015-03-09T12:05:00+00:00http://dischord.org/2015/03/09/troubleshooting-openstack-neutron-networking-part-one<p>There’s a bit of a gap in the current crop of OpenStack documentation, both official and unofficial, when it comes to doing any kind of end-to-end operational troubleshooting on the networking side of things. This series of posts is an attempt to rectify that and join a few concepts together in a way that other administrators will find useful (I hope!).</p>
<p>Our installation mirrors the recommended architecture involving Neutron + ML2 + Open vSwitch and GRE tunnels so these notes only really apply to that combination. You’re on your own with anything else ;)</p>
<p>In the example below, ‘acid’ is a compute node, ‘deadline’ is my laptop, and ‘osnet1’ is a network node.</p>
<h2 id="dhcp">DHCP</h2>
<h3 id="information-gathering">Information gathering</h3>
<p>Pretty much every guide suggests this because it’s solid advice: A good place to start is to get a few key pieces of information noted down in one place somewhere, as you’ll be referring to these multiple times over the course of this exercise.</p>
<h4 id="instance-name-uuid-location-hypervisor-mac-address"><em>Instance name, UUID, location (hypervisor), MAC address</em></h4>
<p>Get the UUID of your instance either via Horizon or by doing the following:</p>
<pre><code>(openstack)nick@deadline:~> nova list --all_tenants | grep void
| d54a6557-e114-4e26-98b8-55c814fb938a | void | ACTIVE | - | Running | default=192.168.0.2, 85.199.252.173
</code></pre>
<p><code>nova show $UUID</code> will tell you which Hypervisor this instance is currently on. Look for the <code>OS-EXT-SRV-ATTR:hypervisor_hostname</code> property, but note that this attribute is only returned if you’re an admin.</p>
<h4 id="network-name-uuid-segmentation-id"><em>Network name, UUID, segmentation ID</em></h4>
<p>You can use Horizon to get the network UUID or use <code>neutron net-list</code> and filter by network name. Once you’ve got the UUID, you can find out the <code>segmentation_id</code> by doing the following:</p>
<pre><code>(openstack)nick@deadline:~> neutron net-show -F provider:segmentation_id 4dc325ed-f141-41d9-8d0a-4f513defacad
+--------------------------+-------+
| Field | Value |
+--------------------------+-------+
| provider:segmentation_id | 11 |
+--------------------------+-------+
</code></pre>
<p>At this point you should convert that <code>segmentation_id</code> into hexadecimal as that’s how it’s referred to in OpenFlow, so in this case 11 = 0xb. What do we need this for? All will soon be revealed!</p>
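<p>If you don’t fancy doing that conversion in your head, a quick one-liner takes care of it:</p>
<pre><code>$ printf '0x%x\n' 11
0xb
</code></pre>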
<h4 id="subnet-name-uuid"><em>Subnet name, UUID</em></h4>
<h4 id="responsible-network-node-for-dhcp-and-l3"><em>Responsible network node for DHCP and L3</em></h4>
<p>You can obtain this now that you’ve got the network’s UUID by doing the following:</p>
<pre><code>(openstack)nick@deadline:~> neutron dhcp-agent-list-hosting-net 5f1e4cc8-37e2-4bfb-b96f-ea1884121542
+--------------------------------------+--------+----------------+-------+
| id | host | admin_state_up | alive |
+--------------------------------------+--------+----------------+-------+
| 1beb99ef-e6f6-4083-8fb6-661f2f61c565 | osnet1 | True | :-) |
+--------------------------------------+--------+----------------+-------+
</code></pre>
<p>If it lists more than one agent then lucky you - you now have more than one place to look ;)</p>
<h2 id="examining-the-basics">Examining the basics</h2>
<p>Your first step should be to examine the basics and look for anything obvious that might be awry. It’s amazing how often the simple problems are overlooked and how quickly some people seem to want to dive into the more esoteric aspects of the system.</p>
<ul>
<li>
<p>Look on the compute node and the network node to make sure they’ve got an interface in the right network and that it’s got an IP address. Can you ping one from the other?</p>
</li>
<li>
<p>On your compute node, check to see that there’s the necessary Open vSwitch bridges in place - br-int and br-tun if you’ve followed the official installation guides - and that there’s some GRE tunnels established between the virtual switches. Are those tunnels on the right subnet? Do the IP addresses look familiar?</p>
</li>
<li>
<p>If you haven’t already, now might be a good time to take a look at the logs for nova-compute, neutron-server, neutron-plugin-openvswitch-agent, and OVS.</p>
</li>
<li>
<p>Make sure there’s a DHCP agent responsible somewhere for this network - see the 3rd point above;</p>
</li>
<li>
<p>Check on the relevant network node for a corresponding network namespace and running dnsmasq instance.</p>
</li>
</ul>
<p>Use <code>ovs-vsctl</code> to take a look at the virtual switches that are defined and to check established GRE tunnels:</p>
<pre><code>root@acid:~# ovs-vsctl show
aa3447b4-a88a-459a-bdae-5c4f04a67632
Bridge br-tun
Port "gre-0a0aaa92"
Interface "gre-0a0aaa92"
type: gre
options: {in_key=flow, local_ip="10.10.170.133", out_key=flow, remote_ip="10.10.170.146"}
[…]
</code></pre>
<p>Similarly, taking a quick look on the network node at the relevant network namespace can sometimes give you an obvious clue to work on:</p>
<pre><code>nick@osnet1:~$ sudo ip netns list | grep 5f1e4cc8-37e2-4bfb-b96f-ea1884121542
qdhcp-5f1e4cc8-37e2-4bfb-b96f-ea1884121542
nick@osnet1:~$ ps auxww | grep [5]f1e4cc8-37e2-4bfb-b96f-ea1884121542
nobody 11651 0.0 0.0 28204 1092 ? S Nov27 0:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tapb7c643d7-1c --except-interface=lo --pid-file=/var/lib/neutron/dhcp/5f1e4cc8-37e2-4bfb-b96f-ea1884121542/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/5f1e4cc8-37e2-4bfb-b96f-ea1884121542/host --addn-hosts=/var/lib/neutron/dhcp/5f1e4cc8-37e2-4bfb-b96f-ea1884121542/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/5f1e4cc8-37e2-4bfb-b96f-ea1884121542/opts --leasefile-ro --dhcp-range=set:tag0,192.168.0.0,static,86400s --dhcp-lease-max=256 --conf-file= --domain=datacentred.io
</code></pre>
<p>But if you’ve double-checked and exhausted the basics above and you can’t see anything amiss, it’s time to brew a fresh pot of coffee and roll up your sleeves.</p>
<h2 id="dude-wheres-my-dhcpdiscover">Dude where’s my DHCPDISCOVER</h2>
<p>At this point we need to start digging to find out exactly where our traffic is (or isn’t) going. As you’ve already got a session open on the responsible network node, probably the easiest place to look first is within the relevant qdhcp network namespace. Use tcpdump to look for DHCPDISCOVER messages arriving here:</p>
<pre><code>root@osnet1:~# ip netns exec qdhcp-5f1e4cc8-37e2-4bfb-b96f-ea1884121542 tcpdump port 67 or port 68 -lne
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tapb7c643d7-1c, link-type EN10MB (Ethernet), capture size 65535 bytes
11:18:27.112022 fa:16:3e:34:a3:5c > fa:16:3e:54:77:5b, ethertype IPv4 (0x0800), length 301: 192.168.0.10.68 > 192.168.0.3.67: BOOTP/DHCP, Request from fa:16:3e:34:a3:5c, length 259
11:18:33.009733 fa:16:3e:34:a3:5c > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 333: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:34:a3:5c, length 291
11:18:33.009915 fa:16:3e:54:77:5b > fa:16:3e:34:a3:5c, ethertype IPv4 (0x0800), length 354: 192.168.0.3.67 > 192.168.0.10.68: BOOTP/DHCP, Reply, length 312
11:18:33.011455 fa:16:3e:34:a3:5c > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 345: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:34:a3:5c, length 303
11:18:33.011631 fa:16:3e:54:77:5b > fa:16:3e:34:a3:5c, ethertype IPv4 (0x0800), length 373: 192.168.0.3.67 > 192.168.0.10.68: BOOTP/DHCP, Reply, length 331
</code></pre>
<p>The above shows what it looks like when everything’s working as it should. Lucky me, right? You can also use <code>strace</code> against that dnsmasq’s PID to find out what’s going wrong within the process, if you’re that way inclined.</p>
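<p>For example, using the pid file path from the dnsmasq command line shown earlier - adjust the network UUID to suit:</p>
<pre><code>root@osnet1:~# strace -f -e trace=network \
    -p $(cat /var/lib/neutron/dhcp/5f1e4cc8-37e2-4bfb-b96f-ea1884121542/pid)
</code></pre>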
<p>If you don’t see traffic reaching this network namespace then the problem lies elsewhere in the chain. Nine times out of ten everything up to this point will have checked out fine, and it’s here really that the lack of documentation might get in the way of making further progress.</p>
<p>Let’s make sure DHCP traffic is making its way out of our compute node as it should do.</p>
<h3 id="on-the-compute-node">On the Compute Node</h3>
<p>Backing up a little, it’s worth refreshing our memories on how this is supposed to hang together. Each instance’s network interface is realised as a TAP interface on the hypervisor. These TAP interfaces are bridged (using Linux bridges) into an Open vSwitch bridge, and it’s here that most of the magic happens. But why is there a Linux bridge in the mix? This is actually where Security Groups are realised - the official explanation is: “Ideally, the TAP device vnet0 would be connected directly to the integration bridge, br-int. Unfortunately, this isn’t possible because of how OpenStack security groups are currently implemented. OpenStack uses iptables rules on the TAP devices such as vnet0 to implement security groups, and Open vSwitch is not compatible with iptables rules that are applied directly on TAP devices that are connected to an Open vSwitch port.”</p>
<p>So let’s check to see if there are some sensible iptables rules in place for our instance’s TAP interface. To find out which TAP interface we should be examining, you can either look at the process table or check the VM’s configuration file - libvirt.xml - which exists in a folder with the same name as its UUID under /var/lib/nova/instances:</p>
<pre><code>root@acid:/var/lib/nova/instances/2c9a0a61-ccf0-4f26-baa0-c9b16d72b645# grep -i tap libvirt.xml
<target dev="tap9c4a0ea7-ba"/>
</code></pre>
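<p>As an aside, <code>virsh</code> can be queried directly too - it’ll happily accept the instance’s UUID as the domain identifier:</p>
<pre><code>root@acid:~# virsh dumpxml 2c9a0a61-ccf0-4f26-baa0-c9b16d72b645 | grep -i tap
</code></pre>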
<p>We can use the ‘9c4a0ea7’ part of the interface name in this example to double-check the iptables rules that are in place. You’re looking for something like:</p>
<pre><code>root@acid:~# iptables -S | grep 9c4a0ea7
[...]
-A neutron-openvswi-o9c4a0ea7-b -p udp -m udp --sport 68 --dport 67 -j RETURN
-A neutron-openvswi-o9c4a0ea7-b -p udp -m udp --sport 67 --dport 68 -j DROP
[…]
</code></pre>
<p>If that looks OK then it’s time to dig a little deeper. Be warned: You’re almost certainly going to need more coffee.</p>
<p><img src="/public/static/under-the-hood-scenario-1-ovs-compute.png" alt="Compute node networking" /></p>
<p>Remember that each instance’s network interface corresponds with a TAP device on the hypervisor, and that this is connected into a Linux bridge and then from there into an OVS bridge called ‘br-int’. You can use <code>brctl show</code> to see which interfaces are associated with which bridge:</p>
<pre><code>root@acid:~# brctl show
bridge name bridge id STP enabled interfaces
qbr2b57faaf-8d 8000.0e83816f2e37 no qvb2b57faaf-8d
tap2b57faaf-8d
qbr3c726ea9-34 8000.9a7f311e7773 no qvb3c726ea9-34
qbr529df2ab-78 8000.ea0665357633 no qvb529df2ab-78
tap529df2ab-78
qbr9a356881-4f 8000.8a401b915524 no qvb9a356881-4f
qbr9c4a0ea7-ba 8000.3ae5d1c542a1 no qvb9c4a0ea7-ba
tap9c4a0ea7-ba
qbre1041f6d-9e 8000.b690266a1a3c no qvbe1041f6d-9e
qbrec9191c7-60 8000.5278b133888e no qvbec9191c7-60
tapec9191c7-60
</code></pre>
<p>(If you see a <code>virbr0</code> bridge it can be disregarded - this is created as part of the installation of QEMU and KVM and isn’t used by OpenStack.)</p>
<p>Sticking with the ‘9c4a0ea7’ part of our interface’s ID used above, we can check to make sure that the relevant port exists on the integration bridge:</p>
<pre><code>root@acid:~# ovs-vsctl show
aa3447b4-a88a-459a-bdae-5c4f04a67632
[…]
Bridge br-int
fail_mode: secure
[…]
Port "qvo9c4a0ea7-ba"
tag: 104
Interface "qvo9c4a0ea7-ba"
[…]
</code></pre>
<p>Output snipped for brevity. Key details here are a) we have the interface on bridge br-int that we’d expect, and b) the tag that’s associated with that port.</p>
<p>If any of these components are missing you’ll need to check through nova-compute’s logs in <code>/var/log/nova</code> and also the OVS agent’s logs under <code>/var/log/neutron</code>, both on the compute node in question. Something will have impeded the creation of these components and the logfiles should give you a pretty clear idea as to what. If not, try bumping up the logging and debug levels and then restarting the responsible agents to see where that gets you.</p>
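<p>Bumping the log levels is just a case of flipping the usual oslo options and restarting the relevant agents. The snippet below assumes <code>crudini</code> is installed and a stock Ubuntu packaging layout of this era, so adjust to suit:</p>
<pre><code>crudini --set /etc/neutron/neutron.conf DEFAULT debug True
crudini --set /etc/neutron/neutron.conf DEFAULT verbose True
crudini --set /etc/nova/nova.conf DEFAULT debug True
service neutron-plugin-openvswitch-agent restart
service nova-compute restart
</code></pre>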
<h3 id="open-vswitch-and-openflow">Open vSwitch and OpenFlow</h3>
<p>Still not worked out what’s wrong? Poor you - this is where it gets messy. Let’s examine how a packet traverses Open vSwitch and makes its way to a given network node. We’ve seen above how an instance’s network interface is connected into br-int. br-int is connected to br-tun via a patch:</p>
<pre><code>root@acid:~# ovs-vsctl show
aa3447b4-a88a-459a-bdae-5c4f04a67632
Bridge br-tun
[…]
Port patch-int
Interface patch-int
type: patch
options: {peer=patch-tun}
[…]
Bridge br-int
fail_mode: secure
[…]
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
</code></pre>
<p>All traffic from br-int makes its way into br-tun via this patch. To look at the OpenFlow rules in place, use <code>ovs-ofctl dump-flows br-tun</code>:</p>
<pre><code>root@acid:~# ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
[…]
</code></pre>
<p>Whoah, that’s a lot of output - so let’s try and break it down. The rules are grouped by tables, with packets modified and various options set before being sent off to another table or to a port. The high-level flow is basically this:</p>
<p><img src="https://assafmuller.files.wordpress.com/2014/01/flow-table-flow-chart.png" alt="OpenFlow table chart" /></p>
<p>Thanks to Assaf Muller for this diagram. Really what you’re looking for here is rules in tables that correspond with the <code>segmentation_id</code> (translated into hex, remember!) and a vlan_tag which is internally assigned by OVS but corresponds with the tag associated with a given port.</p>
<p>If we look at what happens with our network’s <code>segmentation_id</code> (27, or 0x1b):</p>
<pre><code>root@acid:~# ovs-ofctl dump-flows br-tun | grep 0x1b
cookie=0x0, duration=941713.195s, table=2, n_packets=21465888, n_bytes=30396202891, idle_age=5, hard_age=65534, priority=1,tun_id=0x1b actions=mod_vlan_vid:104,resubmit(,10)
cookie=0x0, duration=941706.726s, table=20, n_packets=13688285, n_bytes=2686393985, hard_timeout=300, idle_age=2, hard_age=5, priority=1,vlan_tci=0x0068/0x0fff,dl_dst=fa:16:3e:52:c7:34 actions=load:0->NXM_OF_VLAN_TCI[],load:0x1b->NXM_NX_TUN_ID[],output:3
cookie=0x0, duration=941713.249s, table=21, n_packets=370, n_bytes=54414, idle_age=11432, hard_age=65534, dl_vlan=104 actions=strip_vlan,set_tunnel:0x1b,output:2,output:5,output:4,output:3,output:10,output:11,output:12,output:20,output:13,output:19,output:23,output:22,output:21,output:14,output:6,output:16,output:17,output:8,output:15,output:7,output:9,output:18
</code></pre>
<p>That corresponds with what we’d expect to see. For completeness, table 10 looks like:</p>
<pre><code>root@acid:~# ovs-ofctl dump-flows br-tun table=10
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=2753753.313s, table=10, n_packets=22800331, n_bytes=31425726409, idle_age=2, hard_age=65534, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
</code></pre>
<p>In this example everything’s as expected. We can use tcpdump on the relevant physical interface to see if all of this lines up and DHCP requests make it out of our hypervisor. You’ll need to just <code>grep</code> for DHCP and / or the instance’s MAC address as our traffic should be GRE encapsulated at this point:</p>
<pre><code>root@acid:~# tcpdump -enl -i p1p1 | grep -i dhcp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on p1p1, link-type EN10MB (Ethernet), capture size 65535 bytes
15:06:41.750735 00:1b:21:6e:cc:a4 > 00:1b:21:6e:ce:7c, ethertype IPv4 (0x0800), length 374: 10.10.170.133 > 10.10.170.119: GREv0, key=0x7, proto TEB (0x6558), length 340: fa:16:3e:d6:12:52 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 332: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:d6:12:52, length 290
</code></pre>
<p>Still with me? Now that we’re happy traffic is making it from the compute node, let’s take a look at what happens on the network node.</p>
<h3 id="network-node">Network Node</h3>
<p>As you’d expect, the configuration of the various Open vSwitch bridges and internals largely mirrors what’s on the compute node, except in reverse.</p>
<p><img src="/public/static/under-the-hood-scenario-1-ovs-netns.png" alt="Network node" /></p>
<p>We know that our dnsmasq instance is running in a network namespace specific to our network’s UUID, but how do packets end up there exactly? The network namespace also has a TAP interface:</p>
<pre><code>root@osnet0:~# ip netns exec qdhcp-cfc5510c-6dda-4862-b63c-d16c7f52b521 ip li
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
188: tap90752968-96: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether fa:16:3e:4e:79:57 brd ff:ff:ff:ff:ff:ff
</code></pre>
<p>Which will be connected to a port on br-int and have an associated tag:</p>
<pre><code>root@osnet0:~# ovs-vsctl show | grep -A1 tap90752968-96
Port "tap90752968-96"
tag: 22
Interface "tap90752968-96"
type: internal
</code></pre>
<p>Now if we look at the OpenFlow flows with these two bits of context - our internal tag and also the <code>segmentation_id</code> - it should start to make a bit more sense. The first table to examine is 2:</p>
<pre><code>root@osnet0:~# ovs-ofctl dump-flows br-tun table=2 | grep 0x1b
cookie=0x0, duration=421236.766s, table=2, n_packets=133943644, n_bytes=26674991756, idle_age=0, hard_age=65534, priority=1,tun_id=0x1b actions=mod_vlan_vid:22,resubmit(,10)
</code></pre>
<p>Here we see packets matching our tunnel ID being tagged with VLAN ID 22 and then resubmitted to table 10, the ‘learning’ table. From there, as this is DHCP and therefore broadcast traffic, there should be something in table 21:</p>
<pre><code>root@osnet0:~# ovs-ofctl dump-flows br-tun table=21 | grep 0x1b
cookie=0x0, duration=421352.327s, table=21, n_packets=924, n_bytes=44295, idle_age=14, hard_age=65534, dl_vlan=22 actions=strip_vlan,set_tunnel:0x1b,output:2,output:4,output:3,output:9,output:10,output:11,output:12,output:20,output:13,output:19,output:23,output:22,output:21,output:14,output:5,output:16,output:17,output:7,output:15,output:6,output:8,output:18
</code></pre>
<p>And there is. This removes the VLAN tag and outputs it to the other ports on that switch. You can actually see the port numbers and corresponding attached interfaces with <code>ovs-dpctl show</code>:</p>
<pre><code>root@osnet0:~# ovs-dpctl show
system@ovs-system:
lookups: hit:2979708493 missed:17652283 lost:0
flows: 69
port 0: ovs-system (internal)
port 1: br-int (internal)
port 2: br-ex (internal)
port 3: em2
port 4: br-tun (internal)
port 5: tap30264f60-4a (internal)
[…]
</code></pre>
<p>Again, that’s a working example. Hopefully you’ve not made it this far without spotting something that’s broken and a clue somewhere as to what the fix should be.</p>
<p>If you find you’re missing rules, then one way to re-add them without having to recreate the whole network is to remove and then re-add the network from the responsible DHCP agent. You can do this as follows:</p>
<p>With your network’s UUID, obtain the relevant network agent’s UUID:</p>
<pre><code>(openstack)nick@deadline:~> neutron dhcp-agent-list-hosting-net 4dc325ed-f141-41d9-8d0a-4f513defacad
+--------------------------------------+--------+----------------+-------+
| id | host | admin_state_up | alive |
+--------------------------------------+--------+----------------+-------+
| 1beb99ef-e6f6-4083-8fb6-661f2f61c565 | osnet1 | True | :-) |
+--------------------------------------+--------+----------------+-------+
</code></pre>
<p>Then remove and re-add that network:</p>
<pre><code>(openstack)nick@deadline:~> neutron dhcp-agent-network-remove 1beb99ef-e6f6-4083-8fb6-661f2f61c565 4dc325ed-f141-41d9-8d0a-4f513defacad
Removed network 4dc325ed-f141-41d9-8d0a-4f513defacad from DHCP agent
(openstack)nick@deadline:~> neutron dhcp-agent-network-add 1beb99ef-e6f6-4083-8fb6-661f2f61c565 4dc325ed-f141-41d9-8d0a-4f513defacad
Added network 4dc325ed-f141-41d9-8d0a-4f513defacad to DHCP agent
</code></pre>
<p>And now check to see if the missing OpenFlow rules are in place.</p>
<p>Next up - L3!</p>
<h2 id="references">References</h2>
<p><a href="http://assafmuller.com/2013/10/14/gre-tunnels-in-openstack-neutron/">http://assafmuller.com/2013/10/14/gre-tunnels-in-openstack-neutron/</a>
<a href="http://keepingitclassless.net/2014/07/sdn-protocols-2-openflow-deep-dive/">http://keepingitclassless.net/2014/07/sdn-protocols-2-openflow-deep-dive/</a>
<a href="http://ovs-demystify.tumblr.com">http://ovs-demystify.tumblr.com</a>
<a href="https://ask.openstack.org/en/question/51184/neutron-gre-segmentation-id-unique/">https://ask.openstack.org/en/question/51184/neutron-gre-segmentation-id-unique/</a>
<a href="http://docs.openstack.org/admin-guide-cloud/content/under_the_hood_openvswitch.html">http://docs.openstack.org/admin-guide-cloud/content/under_the_hood_openvswitch.html</a></p>
New York2015-01-29T14:43:00+00:00http://dischord.org/2015/01/29/new-york<p>Flickr album with a bunch more shots <a href="https://www.flickr.com/photos/yankcrime/sets/72157650114559919/">here</a>.</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/16364859956/"><img src="https://live.staticflickr.com/7454/16364859956_8be9888ec7_b.jpg" title="DSC_9446" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/16204588049/"><img src="https://live.staticflickr.com/7335/16204588049_de1ce3e91d_b.jpg" title="DSC_9623" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15768337564/"><img src="https://live.staticflickr.com/8577/15768337564_f0bb51e32c_b.jpg" title="DSC_9750" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15770770503/"><img src="https://live.staticflickr.com/7457/15770770503_ba5dfd71e3_b.jpg" title="DSC_9758" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15773167934/"><img src="https://live.staticflickr.com/7379/15773167934_5d736155de_b.jpg" title="DSC_9913" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/16203170898/"><img src="https://live.staticflickr.com/7388/16203170898_aa6c7d3c6b_b.jpg" title="DSC_9717" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/16204931407/"><img src="https://live.staticflickr.com/7387/16204931407_7255c69e1a_b.jpg" title="DSC_9536" /></a></p>
Hell Ride Blues at Wangies2014-09-26T13:15:15+00:00http://dischord.org/2014/09/26/hell-ride-blues-at-wangies<p>A few shots from <a href="https://www.facebook.com/pages/The-Hell-Ride-Blues/120158028013667">The Hell Ride Blues’</a> gig at Wangies (no, I’m not making the name
of the place up…) last night as part of 2014’s <a href="http://www.salfordmusicfestival.co.uk/">Salford Music Festival</a>. So
good to see these guys in a context other than their basement - a lot more
people deserve to see them play live.</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15172982457/"><img src="https://live.staticflickr.com/3873/15172982457_9f9677a3a4_b.jpg" title="DSC_9398" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15336498446/"><img src="https://live.staticflickr.com/3900/15336498446_b2647d3ae3_b.jpg" title="DSC_9350" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15172930258/"><img src="https://live.staticflickr.com/3899/15172930258_9b42310d19_b.jpg" title="DSC_9365-Edit" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15172811790/"><img src="https://live.staticflickr.com/2944/15172811790_13f838156e_b.jpg" title="DSC_9386" /></a></p>
Back to the Basement2014-09-13T21:22:22+00:00http://dischord.org/2014/09/13/back-to-the-basement<p>Just over a year on from their last ‘underground’ gig and no less awesome.
Someone should tell them that there’s a world outside of that basement
though…</p>
<p>A few more <a href="https://www.flickr.com/photos/yankcrime/sets/72157647152538659/">over here</a>.</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15228835945/"><img src="https://live.staticflickr.com/5582/15228835945_e171cf131e_b.jpg" title="DSC_9260" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15042267398/"><img src="https://live.staticflickr.com/3920/15042267398_cb8a0dc8df_b.jpg" title="DSC_9191-Edit" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15042085059/"><img src="https://live.staticflickr.com/3901/15042085059_2dcb3c2833_b.jpg" title="DSC_9185-Edit" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/15042164630/"><img src="https://live.staticflickr.com/3874/15042164630_5de599a826_b.jpg" title="DSC_9187" /></a></p>
Sintra and Lisbon2014-08-11T21:15:18+00:00http://dischord.org/2014/08/11/sintra-and-lisbon<p>I’m sat here struggling to string together even a few words to say about our
little visit to Sintra, its surrounding area, and Lisbon. Some places
genuinely take you by surprise - in a good way - and that’s exactly what
happened during our trip. It was an amazing week on every level. Even the
excuse for going was perfect: the wedding of a couple of very awesome people
indeed.</p>
<p>But at least I took a few photos while we were out and about - would’ve been
daft not to really…</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/14877767802/"><img src="https://live.staticflickr.com/5563/14877767802_f39ed6c9f6_b.jpg" title="DSC_9080" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/14875670994/"><img src="https://live.staticflickr.com/3871/14875670994_b2c142d99a_b.jpg" title="DSC_9006" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/14875212461/"><img src="https://live.staticflickr.com/3876/14875212461_21449afebf_b.jpg" title="DSC_8647" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/14875726804/"><img src="https://live.staticflickr.com/3905/14875726804_0608bb6cd4_b.jpg" title="DSC_8850" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/14691540900/"><img src="https://live.staticflickr.com/3876/14691540900_e142a8d647_b.jpg" title="DSC_8724" /></a></p>
<p>And there’s a few more up here on <a href="https://www.flickr.com/photos/yankcrime/sets/72157645923925929/">Flickr</a>.</p>
Pavey Ark2013-12-20T11:00:00+00:00http://dischord.org/2013/12/20/pavey-ark<p>I’m not a morning person by any stretch of the imagination. I’m also not
really one for hiking and walking about or the ‘Great Outdoors’ in general, but
an adventure like this - a hike up to the top of Pavey Ark in the Lake District -
might have just about changed that.</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/11423440876/"><img src="https://live.staticflickr.com/3755/11423440876_06cf0b182d_b.jpg" title="DSC_8028" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/11423438914/"><img src="https://live.staticflickr.com/5548/11423438914_7abeabff0e_b.jpg" title="DSC_8020" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/11423437164/"><img src="https://live.staticflickr.com/7394/11423437164_1678bb9a4c_b.jpg" title="DSC_8038" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/11423439764/"><img src="https://live.staticflickr.com/5471/11423439764_a47807cb4e_b.jpg" title="DSC_8016" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/11423437924/"><img src="https://live.staticflickr.com/5512/11423437924_472203ebf1_b.jpg" title="DSC_8035" /></a></p>
<p>Related: <a href="http://www.amazon.co.uk/Nikon-AF-S-NIKKOR-14-24mm-2-8G/dp/B000VDCTCI">It’s my birthday tomorrow, just saying</a>.</p>
Deploying ownCloud with Docker, Part 22013-08-13T10:03:00+00:00http://dischord.org/2013/08/13/docker-and-owncloud-part-2<p>If you followed my <a href="http://dischord.org/blog/2013/07/10/docker-and-owncloud/">previous guide</a> (and haven’t
done much with it since…), then you’ll have a basic install of ownCloud
sitting in a Docker container as well as a corresponding base image. Although
this is a good starting point, it’s far from ideal for a number of reasons - we
haven’t configured SSL nor have we configured a slightly more efficient
database backend (MySQL), for example. To get this to a point where we could
happily run it on a daily basis and trust it with our data, we need to do
some more work.</p>
<p>Attempting to stay vaguely compliant with the Docker philosophy of being
host-agnostic and easily transportable means that we also need to get a little
bit clever about a few things. With this type of setup there are a few
environment-specific aspects to the configuration we need to be aware of, such
as what to use for the SSL certificate generation (notably the Common Name), as
well as passwords for things like MySQL’s root account and the ownCloud
database that we’ll create.</p>
<p>So with that in mind, let’s start off by updating the Dockerfile - our starting
point for creating a Docker image:</p>
<pre><code>FROM ubuntu:12.04
MAINTAINER Nick Jones "nick@dischord.org"
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" >> /etc/apt/sources.list
RUN apt-get -y update
RUN dpkg-divert --local --rename --add /sbin/initctl
RUN ln -s /bin/true /sbin/initctl
RUN locale-gen en_US en_US.UTF-8
RUN dpkg-reconfigure locales
RUN echo "mysql-server-5.5 mysql-server/root_password password root123" | debconf-set-selections
RUN echo "mysql-server-5.5 mysql-server/root_password_again password root123" | debconf-set-selections
RUN echo "mysql-server-5.5 mysql-server/root_password seen true" | debconf-set-selections
RUN echo "mysql-server-5.5 mysql-server/root_password_again seen true" | debconf-set-selections
RUN apt-get install -y apache2 php5 php5-gd php-xml-parser php5-intl php5-sqlite mysql-server-5.5 smbclient curl libcurl3 php5-mysql php5-curl bzip2 wget vim openssl ssl-cert
RUN wget -q -O - http://download.owncloud.org/community/owncloud-5.0.10.tar.bz2 | tar jx -C /var/www/
RUN mkdir /etc/apache2/ssl
ADD resources/cfgmysql.sh /tmp/
RUN chmod +x /tmp/cfgmysql.sh
RUN /tmp/cfgmysql.sh
RUN rm /tmp/cfgmysql.sh
ADD resources/001-owncloud.conf /etc/apache2/sites-available/
ADD resources/start.sh /start.sh
RUN ln -s /etc/apache2/sites-available/001-owncloud.conf /etc/apache2/sites-enabled/
RUN a2enmod rewrite ssl
EXPOSE :443
RUN chown -R www-data:www-data /var/www/owncloud
</code></pre>
<p>There’s a few key changes here that I should explain before we go much further:</p>
<ul>
<li>Lines 9-12 set some debconf preferences for our MySQL installation ahead of
time. You’ll want to change the password here to something more appropriate (there’s a quick sketch of one way to do that just after this list), ensuring that MySQL is
secured as the packages are installed;</li>
<li>Line 13 now includes the various MySQL-related server packages;</li>
<li>Line 14 grabs the latest version of the ownCloud tarball from the site;</li>
<li>Resources being added from a ‘resources’ folder.</li>
</ul>
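<p>One way to avoid hard-coding that throwaway <code>root123</code> password is to substitute a randomly generated one into both the Dockerfile and cfgmysql.sh before building. This is just a rough sketch - it assumes GNU sed and that you’ve left the <code>root123</code> placeholder untouched in both files:</p>
<pre><code># generate a random password and swap it in for the root123 placeholder
MYSQL_ROOT_PW=$(openssl rand -hex 16)
sed -i "s/root123/${MYSQL_ROOT_PW}/g" Dockerfile resources/cfgmysql.sh
echo "MySQL root password: ${MYSQL_ROOT_PW}"
</code></pre>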
<p>So what do we do about our MySQL configuration and customisation for ownCloud?
Well, line 16’s resources/cfgmysql.sh script that we’ll insert contains the
following:</p>
<pre><code>#!/bin/bash
/usr/bin/mysqld_safe &
sleep 5
/usr/bin/mysql -u root -proot123 -e "CREATE DATABASE owncloud; GRANT ALL ON owncloud.* TO 'owncloud'@'localhost' IDENTIFIED BY 'owncloudsql';"
</code></pre>
<p>Amend the -p option to use the MySQL root password specified above and you
should also edit the ‘owncloudsql’ password to be something a little more
secure.</p>
<p>The ‘sleep 5’ is necessary in order to give MySQL enough time to start before
creating the database and assigning permissions. Note that we can’t have this
as separate ‘RUN’ directives in our Dockerfile as, if you’ll recall, Docker
starts a new container per command and then commits the change if the command
is successful. This way we know that ‘mysqld’ is running before attempting to
run our bit of SQL.</p>
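<p>If the fixed five second sleep ever turns out to be too short (or needlessly long), one option is to poll mysqld until it answers instead. A minimal variation on cfgmysql.sh, assuming the same <code>root123</code> placeholder password:</p>
<pre><code>#!/bin/bash
/usr/bin/mysqld_safe &
# wait up to ~30 seconds for mysqld to accept connections rather than
# relying on a fixed sleep
for _ in $(seq 1 30); do
    /usr/bin/mysqladmin -u root -proot123 ping >/dev/null 2>&1 && break
    sleep 1
done
/usr/bin/mysql -u root -proot123 -e "CREATE DATABASE owncloud; GRANT ALL ON owncloud.* TO 'owncloud'@'localhost' IDENTIFIED BY 'owncloudsql';"
</code></pre>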
<p>Next up are the SSL aspects of our configuration. Line 21 of this Dockerfile
takes care of including a second script into our install - start.sh - and it’s
in here that we’ll do the necessary post-container-creation customisation to
get things like our SSL certificates for Apache correctly generated. Let’s
take a closer look at this:</p>
<pre><code>#!/bin/bash
if [ ! -f /etc/apache2/ssl/server.key ]; then
mkdir -p /etc/apache2/ssl
KEY=/etc/apache2/ssl/server.key
DOMAIN=$(hostname)
export PASSPHRASE=$(head -c 128 /dev/urandom | uuencode - | grep -v "^end" | tr "\n" "d")
SUBJ="
C=UK
ST=England
O=Dischord
localityName=Manchester
commonName=$DOMAIN
organizationalUnitName=
emailAddress=nick@dischord.org
"
openssl genrsa -des3 -out /etc/apache2/ssl/server.key -passout env:PASSPHRASE 2048
openssl req -new -batch -subj "$(echo -n "$SUBJ" | tr "\n" "/")" -key $KEY -out /tmp/$DOMAIN.csr -passin env:PASSPHRASE
cp $KEY $KEY.orig
openssl rsa -in $KEY.orig -out $KEY -passin env:PASSPHRASE
openssl x509 -req -days 365 -in /tmp/$DOMAIN.csr -signkey $KEY -out /etc/apache2/ssl/server.crt
fi
HOSTLINE=$(echo $(ip -f inet addr show eth0 | grep 'inet' | awk '{ print $2 }' | cut -d/ -f1) $(hostname) $(hostname -s))
echo $HOSTLINE >> /etc/hosts
/usr/bin/mysqld_safe &
/usr/sbin/apache2ctl -D FOREGROUND
</code></pre>
<p>First off we do a quick test to see if this is being run for the first time in
a new container from an image, and if so generate our SSL keys accordingly.
The section to be customised here is what’s in the SUBJ variable (but mind the
formatting!). This allows for a certain degree of repeatability, i.e. from the
one Docker image you could spin up multiple ownCloud containers and know that
they’ll receive unique certificates.</p>
<p>Note that here we’re assuming the CN for the SSL certificate is the container’s
hostname - more on that in a minute as we come to start it up. The passphrase
is randomly generated but then stripped so that Apache can start without asking
for it.</p>
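<p>If you want to sanity-check what actually got generated, openssl will happily print the subject and validity dates of the resulting certificate - for example, using the path from start.sh:</p>
<pre><code>openssl x509 -in /etc/apache2/ssl/server.crt -noout -subject -dates
</code></pre>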
<p>Next, lines 22 and 23 of this start.sh script take care of populating
/etc/hosts with the hostname and IP address of the container. This fixes a
problem with ownCloud to do with how it performs some checks against itself -
you might have <a href="https://gist.github.com/mattwilliamson/6188354">noticed an error</a> with the version configured using my previous
guide.</p>
<p>Finally, the other file we will include from our resources/ folder is the
Apache configuration containing the necessary directives to enable SSL:</p>
<pre><code><Directory /var/www/owncloud>
Options Indexes FollowSymLinks MultiViews
AllowOverride All
Order allow,deny
allow from all
</Directory>
<VirtualHost *:443>
DocumentRoot /var/www/owncloud
<Directory /var/www/owncloud>
Options Indexes FollowSymLinks MultiViews
AllowOverride All
Order allow,deny
allow from all
</Directory>
SSLEngine on
SSLCertificateFile /etc/apache2/ssl/server.crt
SSLCertificateKeyFile /etc/apache2/ssl/server.key
</VirtualHost>
</code></pre>
<p>And that should be all we need in terms of our Dockerfile build framework to
get our image created:</p>
<pre><code># ls -R
.:
Dockerfile resources
./resources:
001-owncloud.conf cfgmysql.sh start.sh
</code></pre>
<p>So let’s kick off the build from this set of files and tag it as ‘owncloud’:</p>
<pre><code># docker build -t owncloud .
Uploading context 10240 bytes
Step 1 : FROM ubuntu:12.04
[..]
Successfully built be56671aabdf
</code></pre>
<p>Now for the moment of truth - let’s run up a container from our new image.
This is where we need to pass the hostname that the container will use with the
‘-h’ option:</p>
<pre><code># OWNCLOUD=$(docker run -d -h "owncloud.int.dischord.org" owncloud)
# docker logs $OWNCLOUD
Generating RSA private key, 2048 bit long modulus
.+++
...................+++
e is 65537 (0x10001)
No value provided for Subject Attribute organizationalUnitName, skipped
writing RSA key
Signature ok
subject=/C=UK/ST=England/O=Dischord/L=Manchester/CN=owncloud.dischord.org/emailAddress=nick@dischord.org
Getting Private key
[..]
</code></pre>
<p>The remainder of the output should be MySQL starting up, and finally Apache
which stays in the foreground for Docker to monitor its status. Now if we
point our browser to our host we should see the obligatory self-signed SSL
certificate warning but then:</p>
<p><img class="center" src="https://dl.dropboxusercontent.com/u/174303/owncloudssl1.png" /></p>
<p>Huzzah! Click the ‘Advanced’ dropdown and populate the MySQL options with the details that we used in our cfgmysql.sh script earlier:</p>
<p><img class="center" src="https://dl.dropboxusercontent.com/u/174303/owncloudssl2.png" /></p>
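<p>For reference, assuming you kept the defaults in cfgmysql.sh, those values are:</p>
<pre><code>Database user:     owncloud
Database password: owncloudsql
Database name:     owncloud
Database host:     localhost
</code></pre>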
<p>Sorted!</p>
<p>Finally I should say that all the files used in this example are available on <a href="https://github.com/yankcrime/dockerfiles/tree/master/owncloud">Github</a>, and feel
free to hit me up on <a href="http://twitter.com/yankcrime">Twitter</a> or <a href="http://app.net/yankcrime">App.net</a> if you need any help or have any suggestions / corrections.</p>
Basemental2013-08-11T21:57:00+00:00http://dischord.org/2013/08/11/basemental<p>Saturday night, at an undisclosed location in deepest darkest Manchester, we
witnessed <a href="http://hellridedeathpunkblues.com">Hell Ride Death Punk Blues</a> tear
up a basement for a minuscule crowd of people. And what a treat. I did my
best to grab some photos of the occasion; a couple of my faves are below and be
sure to check the rest on
<a href="http://www.flickr.com/photos/yankcrime/sets/72157635031239380/">Flickr</a>.</p>
<p>Keep an ear (and an eye) out for these guys… They’ve got big plans, and
rightfully so.</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/9486732029/"><img src="https://live.staticflickr.com/5486/9486732029_881fde719c_b.jpg" title="DSC_7863-Edit" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/9489538220/"><img src="https://live.staticflickr.com/7447/9489538220_a52e431eb8_b.jpg" title="DSC_7913" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/9489531612/"><img src="https://live.staticflickr.com/2846/9489531612_5c91c15111_b.jpg" title="DSC_7858" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/9489547592/"><img src="https://live.staticflickr.com/7302/9489547592_44c48d9772_b.jpg" title="DSC_7872-Edit" /></a></p>
Getting started with Docker - Deploying ownCloud2013-07-10T21:53:00+00:00http://dischord.org/2013/07/10/docker-and-owncloud<p><a href="http://docker.io">Docker</a> is another piece of virtualisation-related
technology that’s very much ‘in vogue’ at the minute. And for good reason -
most of which will be immediately apparent and familiar if you’ve ever worked
with FreeBSD’s Jails or Solaris Zones. It’s billed as ‘The Linux container
engine’, with a primary goal of providing a framework for self-contained,
easily redistributable and deployable images - independently of hardware,
language, operating system, and so on. The upshot really is that it kind of
looks and works a lot like the aforementioned Jails or Zones, in that it’s a
way of neatly isolating applications or processes within a self-contained
environment that doesn’t require a full-blown hypervisor or the typical set of
dedicated virtual hardware in order to run. It’s built on top of a few key
Linux kernel features, and the Docker project provides the awesomesauce in
terms of tooling and framework to make managing and deploying this stuff in a
repeatable and easily transferable manner a snap.</p>
<p>So what can I use it for, you ask? Good question, and one which I hope to
explain with an example image that’ll allow you to run your own container
hosting another popular bit of open-source software -
<a href="http://owncloud.org">Owncloud</a>.</p>
<h2 id="getting-started">Getting Started</h2>
<p>First things first and that’s getting Docker installed. I’ll be using Ubuntu
12.04 in my example, so I suggest you follow the <a href="http://www.docker.io/gettingstarted/">existing (and excellent) documentation</a> on getting Docker up and
running. Once you’ve managed that we should be on the same page:</p>
<pre><code># docker version
Client version: 0.4.8
Server version: 0.4.8
Go version: go1.1
</code></pre>
<p>If you’ve followed the installation document correctly, you’ll have a default
set of base images to work with as a starting point for a project such as this.
We can see what’s available with the following:</p>
<pre><code># docker images
REPOSITORY TAG ID CREATED SIZE
base latest b750fe79269d 3 months ago 24.65 kB (virtual 180.1 MB)
base ubuntu-12.10 b750fe79269d 3 months ago 24.65 kB (virtual 180.1 MB)
base ubuntu-quantal b750fe79269d 3 months ago 24.65 kB (virtual 180.1 MB)
ubuntu 12.10 b750fe79269d 3 months ago 24.65 kB (virtual 180.1 MB)
ubuntu latest 8dbd9e392a96 12 weeks ago 131.5 MB (virtual 131.5 MB)
ubuntu quantal b750fe79269d 3 months ago 24.65 kB (virtual 180.1 MB)
</code></pre>
<p>As we’ll be basing our Owncloud install around LTS, we need to pull in that
image specifically:</p>
<pre><code># docker pull ubuntu:12.04
</code></pre>
<p>Once that command completes, <code>docker images</code> should include the following in its output:</p>
<pre><code>ubuntu 12.04 8dbd9e392a96 3 months ago 131.5 MB (virtual 131.5 MB)
</code></pre>
<p>Now we’ve got our starting point it’s on to Owncloud itself.</p>
<h2 id="owncloud">Owncloud</h2>
<p>As you’d probably expect, the base image we’ve pulled in above is relatively
sparse and Owncloud has numerous dependencies including Apache httpd and PHP.
There’s a few steps we need to run through before we’re able to install the
Owncloud package, including grabbing the various dependencies and amending the
Apache configuration. While it’s possible to create a new container from the
base image and manually run the necessary commands, Docker has a concept known
as ‘Dockerfiles’ which allow you to essentially batch all these steps up and
have it run through them for you, delivering a built-to-spec image to use. For
Owncloud, we’ll need a Dockerfile with the following:</p>
<pre><code>FROM ubuntu:12.04
MAINTAINER You "you@your.email"
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" >> /etc/apt/sources.list
RUN apt-get -y update
RUN dpkg-divert --local --rename --add /sbin/initctl
RUN ln -s /bin/true /sbin/initctl
RUN apt-get install -y apache2 php5 php5-gd php-xml-parser php5-intl php5-sqlite smbclient curl libcurl3 php5-curl bzip2 wget vim
RUN wget -O - http://download.owncloud.org/community/owncloud-5.0.7.tar.bz2 | tar jx -C /var/www/
RUN chown -R www-data:www-data /var/www/owncloud
ADD ./001-owncloud.conf /etc/apache2/sites-available/
RUN ln -s /etc/apache2/sites-available/001-owncloud.conf /etc/apache2/sites-enabled/
RUN a2enmod rewrite
EXPOSE :80
CMD ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"]
</code></pre>
<p>Stick the above into a file called <code>Dockerfile</code>. This should all look
relatively familiar but there’s a few lines that warrant a bit more
explanation:</p>
<ul>
<li>5-6, which are needed because Upstart is currently broken in Docker. This is a workaround so that apt’s actions successfully complete;</li>
<li>13 tells Docker to NAT port 80 on our host to the container when it’s run;</li>
<li>14 is the command that starts Apache’s httpd in the foreground. This is so that Docker can manage the process.</li>
</ul>
<p>We also need to inject a simple configuration file for Apache into this image
so that it’s included when httpd starts. In the same folder that you’ve saved
this Dockerfile, create a new file called <code>001-owncloud.conf</code> with the
following basic directives:</p>
<pre><code><Directory /var/www/owncloud>
Options Indexes FollowSymLinks MultiViews
AllowOverride All
Order allow,deny
allow from all
</Directory>
</code></pre>
<h2 id="creating-an-image">Creating an image</h2>
<p>Now we’re in a position to go ahead and create our Owncloud image. Run the
following command then sit back and watch the magic:</p>
<pre><code># docker build -t owncloud .
</code></pre>
<p>At this point Docker will run through the steps in your Dockerfile and show you
the status at each step:</p>
<pre><code>Uploading context 10240 bytes
Step 1 : FROM ubuntu:12.04
---> 8dbd9e392a96
Step 2 : MAINTAINER Nick Jones "nick@dischord.org"
---> Running in bd4ffa2bde51
---> 5be3042626e5
Step 3 : RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" >> /etc/apt/sources.list
---> Running in 6342a8a5bf3a
---> 0a5f6332ccf1
Step 4 : RUN apt-get -y update
---> Running in 5ba451710f4c
---> 25275d3b5080
Step 5 : RUN dpkg-divert --local --rename --add /sbin/initctl
---> Running in 0a0473208260
---> 927abde6330d
Step 6 : RUN ln -s /bin/true /sbin/initctl
---> Running in 92844ba5d028
---> 21680ab43143
Step 7 : RUN apt-get install -y apache2 php5 php5-gd php-xml-parser php5-intl php5-sqlite smbclient curl libcurl3 php5-curl bzip2 wget vim
---> Running in 0380af0585d2
---> 2d2684e40291
Step 8 : RUN wget -O - http://download.owncloud.org/community/owncloud-5.0.7.tar.bz2 | tar jx -C /var/www/
---> Running in f21f8a471b80
---> dc2c75b30292
Step 9 : RUN chown -R www-data:www-data /var/www/owncloud
---> Running in b9b3e7220075
---> 74f5580c700b
Step 10 : ADD ./001-owncloud.conf /etc/apache2/sites-available/
---> c0179ce255ef
Step 11 : RUN ln -s /etc/apache2/sites-available/001-owncloud.conf /etc/apache2/sites-enabled/
---> Running in 6ba5513a8432
---> f96198e6b59d
Step 12 : RUN a2enmod rewrite
---> Running in 161a90a9e940
---> 67cea873cdd7
Step 13 : EXPOSE :80
---> Running in 9d4cfd17812a
---> 00bd10cd6f7d
Step 14 : CMD ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"]
---> Running in 2ccef53a5cfe
---> f0d7dbd897c5
Successfully built f0d7dbd897c5
</code></pre>
<p>The output of <code>docker images</code> should now include our freshly built Owncloud
image:</p>
<pre><code># docker images | grep owncloud
owncloud latest f0d7dbd897c5 About a minute ago 12.29 kB (virtual 779.4 MB)
</code></pre>
<h2 id="running-the-container">Running the container</h2>
<p>With this built-to-spec image we’re in a position to create a new container
which will run our Owncloud ‘stack’, isolated from the host OS and using only
the bare essentials:</p>
<pre><code># docker run -d owncloud
d7db4d7c6499
</code></pre>
<p>This creates and runs the new container; the <code>-d</code> switch tells Docker to kick
it off in the background. If successful that command returns a container ID -
<code>d7db4d7c6499</code> in my case. We can view running containers by doing the
following:</p>
<pre><code># docker ps
ID IMAGE COMMAND CREATED STATUS PORTS
d7db4d7c6499 owncloud:latest /usr/sbin/apache2ctl 2 minutes ago Up 2 minutes 80->80
</code></pre>
<p>Looks good so far. If you now fire up your browser and point it at /owncloud/ on
your Docker host, you should be greeted by the initial Owncloud setup page asking
you to create an Admin user:</p>
<p><img class="center" src="http://db.tt/vpKBHEmj" /></p>
<p>Go ahead and do that and you should now be logged into your new Owncloud
install, running atop Docker’s various technologies.</p>
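<p>Incidentally, if you want a quick way to check that the container is answering without reaching for a browser, a HEAD request against the Docker host should do the trick (substitute your own hostname or IP for <code>dockerhost</code>):</p>
<pre><code># curl -sI http://dockerhost/owncloud/ | head -n1
</code></pre>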
<h2 id="managing-containers">Managing containers</h2>
<p>So we now have our Owncloud container and our base image, what’s next?
Stopping the container is pretty straightforward:</p>
<pre><code># docker stop d7db4d7c6499
d7db4d7c6499
# docker ps
ID IMAGE COMMAND CREATED STATUS PORTS
</code></pre>
<p>The output of <code>docker ps</code> shows us that nothing is currently running. To see all containers, stopped or otherwise, do <code>docker ps -a</code>:</p>
<pre><code># docker ps -a
ID IMAGE COMMAND CREATED STATUS PORTS
d7db4d7c6499 owncloud:latest /usr/sbin/apache2ctl 12 minutes ago Exit 137
2ccef53a5cfe 00bd10cd6f7d /bin/sh -c #(nop) CM 15 minutes ago Exit 0
9d4cfd17812a 67cea873cdd7 /bin/sh -c #(nop) EX 15 minutes ago Exit 0
161a90a9e940 f96198e6b59d /bin/sh -c a2enmod r 15 minutes ago Exit 0
6ba5513a8432 c0179ce255ef /bin/sh -c ln -s /et 15 minutes ago Exit 0
[..]
</code></pre>
<p>I’ve snipped the output for brevity, but what’s interesting here to note is
that Docker effectively created a new container from a new image for each step
in our Dockerfile before finally committing that image with the tag we
specified with the <code>-t</code> option.</p>
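<p>You can see those intermediate images for yourself by asking Docker for the history of the finished image - each step from the Dockerfile should show up as a separate layer:</p>
<pre><code># docker history owncloud
</code></pre>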
<p>To restart our container, run the following:</p>
<pre><code># docker restart d7db4d7c6499
d7db4d7c6499
# docker ps
ID IMAGE COMMAND CREATED STATUS PORTS
d7db4d7c6499 owncloud:latest /usr/sbin/apache2ctl 15 minutes ago Up 18 seconds 80->80
</code></pre>
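<p>And if the leftover containers from the build start to clutter the output of <code>docker ps -a</code>, they can be cleaned up with <code>docker rm</code> - for example, using a couple of the IDs from the earlier listing:</p>
<pre><code># docker rm 2ccef53a5cfe 9d4cfd17812a
</code></pre>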
<h2 id="next-steps">Next Steps</h2>
<p>That’s hopefully given you a feel for Docker and the basics of how to get
something up and running. At this point we have a very basic installation of
Owncloud; there’s potentially a lot more to do before you’d consider making
full-time use of it, such as updating the Apache httpd configuration to enable
TLS and adding MySQL into the mix instead of SQLite as the database backend.</p>
<h2 id="tldr---feeling-lazy">tl;dr - Feeling lazy?</h2>
<p>What if you don’t care about creating your own image from scratch and just
want to be able to run an Owncloud instance with a couple of commands? From
what I’ve described so far, this technology should be all about being able to
do just that, right? Yup, of course - just do the following:</p>
<pre><code># docker pull yankcrime/owncloud
</code></pre>
<p>And subsequently spin up a new container from that image:</p>
<pre><code># docker run -d yankcrime/owncloud
</code></pre>
<p>Easy. Starting to understand the potential? (-:</p>
<p>Part 2 - adding SSL and MySQL into the mix - is <a href="http://dischord.org/blog/2013/08/13/docker-and-owncloud-part-2/">online here</a>.</p>
Scottish Championship Bike Racing2013-06-09T20:01:00+00:00http://dischord.org/2013/06/09/scottish-championship-bike-racing<p><img class="center" src="https://farm8.staticflickr.com/7379/8999365250_2d211152cb_h.jpg" /></p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8998178813/"><img src="https://live.staticflickr.com/7358/8998178813_265770f9a4_b.jpg" title="DSC_7452" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8998172997/"><img src="https://live.staticflickr.com/3808/8998172997_e1e8bd9c75_b.jpg" title="DSC_7553" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8999373600/"><img src="https://live.staticflickr.com/7460/8999373600_0817e4205c_b.jpg" title="DSC_7206" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8998189685/"><img src="https://live.staticflickr.com/7339/8998189685_4b19f19ab2_b.jpg" title="DSC_7146" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8998182927/"><img src="https://live.staticflickr.com/3819/8998182927_60fd4de886_b.jpg" title="DSC_7258" /></a></p>
<p><a href="https://www.flickr.com/photos/yankcrime/sets/72157634031044506/">More of the same over on Flickr</a>.</p>
Getting amongst it2013-05-29T21:38:00+00:00http://dischord.org/2013/05/29/getting-amongst-it<p>Landscape photography isn’t usually my thing, but if there’s one way of
changing that it’s spending some time in the Scottish Highlands. After nearly
three (!) years in Edinburgh we finally took a trip out beyond our doorstep and
tried to cram in as much sightseeing as possible over the course of a Bank
Holiday weekend. The appearance of the sun was almost as incredible as the
use of my camera.</p>
<p>Glen Coe, Fort William, Loch Duich, Ben Nevis, Glen Affric, Glen Roy, Lochalsh,
Loch Ness, Inverness, Loch an Eilein, the Cairngorms… Yeah we didn’t do too
bad. Equally as impressive was the weather - the sun shone pretty much the
entirety of the weekend, except sod’s law the light went flat just as I was
setting myself up for the best shots of the weekend - at Eilean Donan castle
and also at Dog Falls.</p>
<p>I took a ton of photos but I was largely experimenting and just trying a few
new things out hence the relatively meagre selection up on Flickr. Reckon I
could get into this sort of thing - think I’ve got the photography bug back a
little…</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8859400053/"><img src="https://live.staticflickr.com/5445/8859400053_960fb635ce_b.jpg" title="DSC_6881" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8865433637/"><img src="https://live.staticflickr.com/7343/8865433637_eb803fa344_b.jpg" title="DSC_6969" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8865435431/"><img src="https://live.staticflickr.com/8131/8865435431_8092e233d8_b.jpg" title="DSC_6939" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8871976591/"><img src="https://live.staticflickr.com/3789/8871976591_f4d58f5a4c_b.jpg" title="DSC_6885" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/8871978081/"><img src="https://live.staticflickr.com/5346/8871978081_7fbe11a1b2_b.jpg" title="DSC_6907" /></a></p>
Bushfire2010-06-28T00:00:00+00:00http://dischord.org/2010/06/28/bushfire<p>Sweated my arse off in Darmstadt’s <a href="http://www.goldene-krone.de/">Golden Krone</a>
to take a few shots of these guys playing at the ‘Nonstock’ warm-up. Click on
the small ones below to enlarge, or check out the full set on my Flickr
account <a href="http://www.flickr.com/photos/yankcrime/sets/72157624367706626/show/">over here</a>.</p>
<p><a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/4737890457/"><img src="https://live.staticflickr.com/4073/4737890457_37a691a18f_b.jpg" title="DSC_3895-Edit-2" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/4738531624/"><img src="https://live.staticflickr.com/4098/4738531624_d8274f970e_b.jpg" title="DSC_3939" /></a>
<a class="thumbnail" href="https://www.flickr.com/photos/yankcrime/4737886297/"><img src="https://live.staticflickr.com/4080/4737886297_e72490bd0c_b.jpg" title="DSC_3887" /></a></p>
Married to the Sea2010-03-16T00:00:00+00:00http://dischord.org/2010/03/16/married-to-the-sea<p><img class="left" src="https://farm5.staticflickr.com/4053/4438284354_4c112062e1_m.jpg" /></p>
<p>My good friend Greg and his cohorts, known collectively as <a href="http://www.marriedtothesea.co.uk">Married to the
Sea</a>, rolled into Darmstadt late yesterday to
play an acoustic set at a locally (in)famous and unusual venue known as the
Gute Stube. It offered an ideal opportunity to bust out the camera and test
out some low-light shooting of which I’ve done very little with the D700. In
short, I was blown away - not just by the music, but by the fact that I was
snapping away quite happily at ISO 3,200 and coming out with fantastic results.</p>
<p>Anyway, yeah - the gig! The band were fantastic, winning the crowd over not
just with their music but also with their infectious charm and humour in this
very intimate venue. Even if the songs weren’t necessarily your cup of tea you
couldn’t help but smile and just enjoy being a part of the event, it was great
fun and hopefully I’ve managed to capture some of that.</p>
<p>Below is a few of my favourite shots from the night, and the full set is over
on
<a href="https://www.flickr.com/photos/yankcrime/sets/72157627090114966/show/">Flickr</a>.</p>
<p><img class="center" src="https://farm5.staticflickr.com/4053/4437517311_f209ec4515_o.jpg" width="1024" />
<img class="center" src="https://farm5.staticflickr.com/4054/4438289190_31e4718105_o.jpg" width="1024" />
<img class="center" src="https://farm5.staticflickr.com/4010/4437507619_b05efaacef_o.jpg" width="1024" /></p>
The Crownhate Ruin2009-01-02T00:00:00+00:00http://dischord.org/2009/01/02/the-crownhate-ruin<p>Two or three years back I was lucky enough to purchase some rare records off a
certain Mr. Joe McRedmond, who some of you might know as one of the members of
<a href="http://www.dischord.com/band/hoover">Hoover</a> and <a href="http://www.dischord.com/band/crownhate-ruin">The Crownhate Ruin</a>. We exchanged a few emails
and he offered up some unreleased TCR tracks which I downloaded, listened to,
and that was that. He’s mighty nice, is Joe.</p>
<p>Fast forward a couple of years to late last year, when I uploaded these
unreleased tracks for a friend of mine to my webserver for her to download.
Some time later a blog that I follow posted the final chapter in an excellent
series regarding the <a href="http://hardcorefornerds.blogspot.com/search/label/Hoover%20Genealogy%20Project">genealogy of Hoover</a>, where I mentioned these unreleased
tracks and offered to make them available. Anyway, I guess somehow word got
around and Joe posted the link to these on his Facebook with the following
comment:</p>
<blockquote><p>Some unreleased not very good, poorly recorded The Crownhate Ruin songs with Alex joining us, and David Titus Batista on drums. I'll be surprised if you enjoy it.</p></blockquote>
<p>At this point I felt like a dick as obviously I hadn’t checked with him
that it was cool before essentially making these publicly available.
But, as I’ve mentioned, he’s a mighty nice chap and he’s totally cool with
these being online.</p>
<p>So anyway, enough waffle. I’m posting this to make them ‘formally’ available
as I’ve no intention of taking them down in light of Joe’s blessing.
<a href="http://dischord.org/misc/music/tcr/Unreleased%2c%20with%20Alex/">Take a listen and - contrary to the above - enjoy</a>.</p>
<p>Update: More info over at HFN <a href="http://hardcorefornerds.blogspot.com/2009/01/joseph-mcredmond-interview-tcr.html">here</a>.</p>
Fugazi Live Recordings2004-04-21T00:00:00+00:00http://dischord.org/2004/04/21/fugazi-live-recordings<p>Been waiting for this to happen for a good while now. Here’s the announcement
via the <a href="http://dischord.com">Dischord</a> mailing list:</p>
<p>Fugazi bass player, Joe Lally, has announced the launch of a new website to
sell CDs of the first of the many live recordings Fugazi has collected over the
band’s 15+ year career. The website can be found by going to:</p>
<p><a href="http://www.fugaziliveseries.com">http://www.fugaziliveseries.com</a></p>
<p>Now, here is a description of the Fugazi “Live” site from the band:</p>
<blockquote><p>"For many years, Fugazi has made a point of taping our live shows. We started out using a simple cassette recorder, moved on to a digital audio tape recorder (DAT) and finally just burned straight on to CDs. Our past attempts at releasing a definitive live show proved fruitless as we could never settle on performances we all felt represented the band at it's best. Instead it was decided that some day we would try to find an easy way to make as many of the tapes available to people as we could."</p><p>"This site marks the beginning of that concept - a basic testing of the waters to see what, if any, interest there is. We have digitally transferred an initial sampling of tapes from our archives and for every order we receive, we will burn a CD copy of the show requested using a generic cover with concert information and a track listing.</p><p>These are very much the original recordings without any attempt to correct for things like volume changes, strange mixing effects, or the occasionally out-of-tune guitar. Though the sound quality on these tapes does vary, if a show was too poorly recorded it didn't make the cut. There will be more shows added as interest indicates and time permits but for now here's twenty. "</p></blockquote>
<p>Fucking awesome, to say the least.</p>
P.O.W House2004-01-25T00:00:00+00:00http://dischord.org/2004/01/25/pow-house<p>Anyone who has been into BMX for a while (i.e more than a couple of years) will
probably have heard of the P.O.W (Pros of Westminster) house. It’s the
original BMXer pad that came about in Southern California in the early 90’s and
where riders like Chris Moeller, Dave Clymer, Keith Treanor and Todd Lyons as
well as many others over the few years of its existence lived, visited and
rode. As such it’s the source of many a legendary tale and the precursor to
high-profile shows like MTV’s Real World.</p>
<p>As I was cleaning up some cruft on my disk at home I came across a couple of
documents which contain a few short diary entries written by Chris Moeller
himself. I don’t think it actually ended up being used in any magazine and as
far as I can see they’re nowhere to be found on the WWW, hence me copying it in
here instead.</p>
<blockquote><p>Welcome to the P.O.W. house. If you don't like it, get the fuck out.</p></blockquote>
<h2 id="the-pow-house">THE POW HOUSE</h2>
<h3 id="part-i">PART I</h3>
<ul>
<li>Thursday, 4/7/94</li>
</ul>
<p>I showed up around 7:00 pm for the usual after work riding session and found
Dave Clymer down on the ground in the shed looking for a matching set of rims.
Dave was wearing some ripped up old Vision shorts with no underwear, no shirt
and some unlaced Airwalks with no laces or socks. What was left of his shorts
was being held up with a rope or something and he’s got a forty ounce bottle of
OE in his hand. His weird mohawk hair set-up is looking pretty strange these
days. To top it all off he’s starting to compile homemade tattoos. On one
shoulder he let some rider kid do a P.O.W. deal above an ahnk that’s supposed
to symbolize everlasting life. On the other arm he’s got this huge unfinished
bondage chick that’s going to be part of an S M Bikes logo.</p>
<p>S & M is Dave’s primary sponsor and the bike company I helped co-found back in
‘87. We’ve been lucky to have Dave as our main rider for the last five years.
In that time Dave’s name has become almost synonymous with S & M and what
industry executives have labelled “the grunge element”.</p>
<p>As a testament to Dave’s marketing value we recently used a mail order ad to
sell all of Dave’s old dreadlocks for $2.00 each. Then we sold some other
kids’. Even after cutting each dread into three or four pieces we ran out and
had to cancel the ad because kids were still sending money.</p>
<p>Originally from Pennsylvania, Dave moved out to California in ‘88 to ride and
race more often. He quickly became one of the world’s fastest and most
well-known professional racers. His aggressive come-from-behind riding style
and hardcore tactics never made him any friends on the track but did make Dave
the ultimate underdog hero for little BMX punks everywhere. He was once
described by a major publication as “the dirtiest rider in the BMX”. The pun
was of course intended. Lately, Dave’s attention has been focused on the world
of freestyle which he has since turned upside down. Dave’s outrageous antics
include huge ramp-to-ramp backflips and plenty of Evil Knievel-style stunts.
Unlike anyone in BMX before him, Dave has successfully made the transition from
weightlifting BMX stud to chain smoking freestyle daredevil. Now twenty-five
years old Dave is making a living as a part-time mover and a full-time
freestyle showman.</p>
<p>Right now Dave’s down in his shed digging through a bunch of shit. It resembles
that scene in Star Wars where Luke and Hans Solo are in that garbage compactor
thing fighting for their lives and that big snake thing pulls Luke down, you
know. Anyway, the shed is like that, only smaller, about 10’ by 5’. It was
originally built just to house the water heater. Dave has since turned it into
his own little room. To make it more liveable he’s added a bunk bed and a new
electrical outlet. The pile of filthy clothes, bike parts, and porno mags that
was once four inches deep in the room I shared with him is now two feet deep in
the tiny shed.</p>
<p>He’s been working on his bikes for the last two weeks nonstop and still hasn’t
gotten anywhere. At the moment he’s building up another complete bike he was
given to do shows on so he can sell it to some kid for $250.00. He needs to
make it look good because the kid’s dad is coming to look at it. Since this
bike building project began two weeks ago Dave has only been out of the house a
few times and all of his trips have been to the liquor store for beer and
cigarettes. With freestyle shows starting at the local amusement park on
Monday, he needs to finish at least one bike soon.</p>
<p>After riding I venture back into the house. I skip the living room and head
straight for the back room where “Cruisin” Chris is hard at work on issue #2 of
his BMX Racing magazine. Cruisin has his door locked and he’s not answering me.
I ask his roommate Jay to let me into the room so I can check out the computer
setup but he won’t. He acts really mysterious about it all and says Cruisin
will spot my footprints in the carpet.</p>
<p>Across the hall, a bunch of guys are smoking some pot they just got brought
down from L.A. They are using the infamous “four footer”. Kids have passed out
after just one hit from this ridiculous bong. Griffin, another S & M team rider
and two year P.O.W. clears the 48-inch chamber, grabs a cigarette and says he’s
ready for a session on the ramp. He proceeds to rip the ramp apart on both his
bike and his skateboard.</p>
<p>At twenty-one, Griffin is the youngest guy in the house. He moved out to
California from Pennsylvania two years ago to escape the bad weather and to
ride more. Living mainly off checks from his mom, I think Griffin is on a
permanent vacation. Other than a little moving work here and there he spends
the majority of his time sitting around the house smoking, or out riding the
ramp or some local jumps.</p>
<p>Cruisin finally lets me into his room to see the operation. A set of bunk beds,
tons of audio tapes, a nice stereo, a TV, VCR, computer setup, you name it,
this is the secret blue door magazine room. Right now, Cruisin is doing photo
selection and the monitor has some page layout graphic on it. Cruisin says he’s
printing 5,000 issues of BMX Racing magazine, his current brainchild.
Unfortunately for his advertisers they think he’s printing 10,000. With his
first company, RAD Accessories, Cruisin marketed number plates and safety pad
sets. According to Cruisin the whole deal ended with some weird buyout, but I
think he just traded the name to some guy he owed money. Nevertheless, it was
enough to establish Cruisin as a bonafied member of the BMX industry. That was
back in Virginia before his big move to Southern California. BMX was born over
twenty years ago right here in L.A. county and [?], the birthplace of BMX.
Seventies and continues to be the epicenter of the sport today. He just got out
here and he’s having a hard time getting advertisers to pay his bills so
Cruisin’s looking for a job until the magazine picks up. He says after three
issues he’ll be established.</p>
<p>We skate and ride until 8:30 when Sal, the house watchdog shuts the lights off
while I’m in the middle of a run. Cruisin goes to go pick Big Island up from
the hospital where he was having a cast put on his broken arm. Someone threw
their bike off the ramp a few days ago and broke Mike’s arm. Mike is visiting
from Hawaii but is slowly becoming a resident of the house. His T-shirt
company, Lip clothing, has just released ten new shirts. Not ten new designs,
ten new shirts…period. They feature his new I heart beer logo which doesn’t
seem to be selling. When he finally gets rid of all ten he’s gonna come out
with his next product, the I heart Ibuprofen shirt. For now he’s surviving off
the two dollars kids send him in the mail for product info and stickers. He
paid for the ad in the magazine back when he was in Hawaii working. Luckily,
because he is so broke he got some state insurance deal to pay for his hand.</p>
<p>From the house we went to Club 5902, a local bar that has “Disco” night every
Thursday. The passes the guys collect get us in free before 10 pm. I made it on
time but nobody else from the house did. When the chick at the door asked
everyone for five dollars because it was after 10, they all went home.</p>
<p>11:30 pm. Lawann is riding his bike to a local hotel where some shady friends
of his are spending the night. Griffin makes a few jokes about Lawann fucking
some fat girl and Lawann just laughs and says he’s gonna do some drinkin’ and
smokin’…that’s it. We spend the next few hours watching American Me. After
the movie, Big Island suggests we hit the titty bar. Mike is the only guy from
the house that has enough energy to drag himself off the couch for the trip.
Since it was already 1:30 by the time we got there, we convinced the doorman to
let us in for free. 2:30 am. Dave is still digging around in his shed trying to
build that bike. It doesn’t seem like he’s made any progress at all.</p>
<ul>
<li>Friday, 4/8/94</li>
</ul>
<p>I ask the guys at the local jumps why the big double jumps are called the
P.O.W.s and nobody seems to know. I think somebody from the house built it for
the first time a few years ago. There are a bunch of kids that come back here
behind the river bed and build jumps everyday. One local we call Rat Boy just
put a really steep lip on the P.O.W.s and I crash really bad. I stop off at
7-11 to get some beer for the pain, then head to the house.</p>
<p>It’s about 7:00 pm now. The house is just about empty except for Big Island and
Jay. Everybody is working for the moving company today. Moving is a sketchy
deal. Crazy ex-cons on speed moving FBI offices an stuff. The work is sparse,
but when it’s on it’s on. Twenty-four hour shifts aren’t out of the ordinary.
With rent at only 80 something bucks a head, one day of work could pay
someone’s bills for an entire month. Oh yeah, they worked a ten hour shift
moving some offices. I can’t remember any fucked up stories from this
particular night. There are so many moving stories. A couple weeks ago, Lawann
told me about this guy that just fell out of the van on the road. I’ve been
hearing the stories for so long, nothing really phases me anymore. I’ve been
trying to get on a job just for this story but the guy that runs the company
hates my guts because of some shit John Paul wrote about his kid in Ride
magazine. The moving guy’s name is Bingo Reyes and his kid Anthony is a totally
rad rider from the neighborhood. That’s the connection that got everyone
moving. But for some reason Bingo thinks I own the magazine and no matter what
people tell him he hates me. When the story first came out with his kid he got
on the microphone at a race in Las Vegas and caused a really big scene. His
teary-eyed speech ended with him screaming “This magazine is not worth the
paper it’s printed on. You have not heard the last of Bingo Reyes.” It kills me
that this guy hires a bunch of speeded-out gangsters to do his business for him
but he can’t handle his kid getting quoted saying that he likes big tits. His
employees sneak off into corners and get high. People hiding, stealing shit,
getting high, doing lines, it’s fucked up. One time Keith Treanor got caught
hiding, or playing volleyball or something. This one kid, Gonz, was seen
sneaking off into a field and hiding. Even crackhead Ned moved before. They
said he couldn’t get out of bed for a week after that. John Paul moved before.
The White Bear used to move for a couple different companies. I’ll have to get
some good stories later.</p>
<p>Big Island calls up What a Lot Pizza and orders a few larges for only $3.99 a
piece. The WALA guys know everybody at the house now, and the orders come in
simply from “the bikers”. Today’s driver was new though and he was wandering
all over the street. He went to Mrs. Iroquois’ house across the street and then
to the Sand People’s house next door before Big Island started yelling at him,
which was funny because the guy turned out to be deaf, for real. Eventually he
saw Mike waving his arms around and came over.</p>
<p>Iroquois is a strange street. The houses themselves aren’t unusual, just older
single-story tract houses. It’s the motley crew of local characters that makes
it weird. At first the etc., etc., etc. [?-is like this in manuscript]</p>
<p>By the time everybody gets home, I’ve finished a six pack and start accusing
everybody of being on crystal. I make all the usual remarks about the doorknob
and hinge collections, making CB radios out of old shavers, whatever. Griffin
freaks out and starts telling me that I don’t have any respect for his house
and that I should get out. They drag themselves into the back room for a smoke
session and I can hear them in there going crazy about me. Making a really big
deal out of stupid little things like respecting the house can go on for hours,
even days. When there are twenty-something people to tell the same exact story
to, things seem to drag on forever. The funny thing is that when you are in the
house it seems like everything that is going on is really important. So much
information is being dished out, stories and stuff, you feel like something is
happening. Fuck, you don’t even think you need to leave the house. Dave takes
my truck to the store for beer and we turn our attention back to Single White
Female.</p>
<p>Anyway, by the time Dave got back with a shitload of weird imported beers that
only cost 50 cents each, we were heavy into the next movie, some assassin movie
with the girl from Single White Female in it. It was Bridget Fonda. I was
pretty drunk by this point. I’m pretty infamous for showing up at the house
drunk and causing some huge scene. I can remember coming over with my fucked up
friend Crazy Red right after he got out of jail for selling drugs and starting
a food fight that turned into a full-blown furniture fight. Now the lamps are
nailed to the walls. That little incident got me “banned” for a while.</p>
<p>Plenty of people have been “banned” from the house. Shit, I’ve been banned so
many times even I can’t remember. I just don’t show up for a while and then
everything is OK. The P.O.W.s are either very forgiving or very forgetful,
because everybody comes back. Keith got banned after he crashed some party at a
girl’s house, beat up some kids there with a pool stick and then came back to
the house and tried to beat everyone up there too. The cops showed up and took
Keith and Lawann both to jail. Imagine a full lineup of P.O.W.s in the front
yard being IDed by the girl and the beat up kid. Before the cops showed up, Sal
shaved off his dreads so the kid wouldn’t recognize him. Lawann is the only
black kid in the house so they spotted him right away, and nobody could forget
Keith’s face. When he gets drunk, Keith looks and acts even more psychotic than
normal. Oh yeah, one of the cops that showed up and arrested Lawann and Keith
was Craig Turner, son of Gary Turner of GT Bicycles. That was a funny
coincidence.</p>
<p>Shade Nade got banned after a bunch of stuff kept turning up missing including
Darrin’s handgun. People catch Ned wearing clothes that were buried way back in
their closets. Ned is bad news. When nobody was home during the summer and I
was watching the house, Ned had all kinds of crazy drug deals going down in the
back room. I saw people doing lines on the living room table. A few of the
Mansons were hanging around. Ned doesn’t even live at the house. As soon as Sal
heard all that when he got home, Shade Nade was banned. Now he hangs out all
the time. I think they need Ned around for pot.</p>
<p>The funniest ban ever has to be the ban on the White Bear, this friend of mine
that used to live in the water heater room where Dave lives now. Steve got
banned for “talking shit around the house”. Unlike the guys that need to come
over to ride the ramps or get high or whatever, I don’t think Steve ever wants
to come back anyway so it’s no big deal. But they like to talk about it. The
big rumor lately is that Sal is going to kick Dave out of the house, which is
really funny because Dave is the only original P.O.W. left in the house. The
joke is that if they kick Dave out of the house, he’ll take the house with him.
I don’t doubt it.</p>
<ul>
<li>Saturday, 4/9/94</li>
</ul>
<p>Lawann left a message on my answering machine that was about three minutes of
him going “Ah, hey, um, ahhhhhh, huh, aaaaah”. I called him back and he said he
wanted to come in Monday and print some T-shirts. His last United States of
Hate P.O.W. sneeches on the beaches shirts never got printed because he never
gave me any money. Before that I printed P.O.W. St. Ides logo bite shirts for
him and we ended up fighting over 20 bucks. I ended up with a huge gash on my
head from his class ring. It was a really bad ordeal. Definitely the most
bloodshed the house had ever seen. The wall had my blood squirted all over it.
After it was all over the room was empty except for me and Lawann and he
reached into his pocket and pulled out 20 bucks and handed it to me.</p>
<p>Everybody from the house spent the day at some indoor dual slalam mountain bike
race in Long Beach where a bunch of BMXers dominated like usual. Nobody from
the house has bothered trying to cash in on the easy money all the other
Factory BMXers are getting so used to. According to Lawann, a whole race only
lasted about eight seconds and the entire race was the biggest joke ever. Dave
ended up going to downtown HB on the cheesy bar tour. Back at my house at 2:30
am, some crazy guy dressed in a pirate outfit called his chick a bitch, then
slapped her a bunch of times before he collapsed on our corner crying really
loud. The cops showed up and questioned the woman that had a huge sword on her
belt. What the fuck?</p>
<h2 id="the-pow-house-1">THE POW HOUSE</h2>
<h3 id="part-ii">PART II</h3>
<ul>
<li>Sunday, 4/10/94</li>
</ul>
<p>Dave woke up from the cheesy bar night with his own puke all over him. By 7:00
am he was working on his bike trying frantically to get ready for the first
Magic Mountain show at 2:00 pm. The drive to the park is about two hours and
Dave has a lot of work to do. Besides his inoperable bike, Dave’s car has two
flats, a dead battery, and a funky screwdriver setup for an ignition. Dave lost
the keys so Big Mike fixed it up Inglewood style. I don’t think Dave’s driven
his car since he did shows at the L.A. County Fair about six months ago. By
2:00 pm Dave has given up on making it to the show. Even after jumping it, the
car won’t start.</p>
<p>I show up around 2:00 pm and Dave is back in his shed with a porno mag getting
ready to jerk off. His Toyota is in the driveway with the hood up, tools
everywhere. It’s sitting on two flats and is packed full of spare bike parts. I
think Dave realizes he has lost his privacy and comes out of the shed in that
same pair of ratty Vision shorts drinking an Old English 40. We try jumping the
car again…no luck. One quick look at his bike and we all figure out it’s not
even rideable. So Dave sets out wrenching on it again. The car still doesn’t
run and the bike still doesn’t work and there are only 20 hours left to get
ready for tomorrow’s show. Dave can’t afford to lose the $100 a day twice.
Besides, the promoter bought his story today, but another no show could end the
deal. Luckily, Dave’s never even been late for a show in the past.</p>
<p>Lawan eventually creeps out of his room undisturbed about his lazy day. I ask
him why he didn’t go to the big Richard Bartlett jumping contest with the rest
of the house and he tells me Rich the promoter is a bigot. After a few hours of
MTV, Lawan and Alex take my truck to the mall so Lawan can buy some shorts.
Before leaving for the mall, Lawan changes into his funky fresh Filas and some
new shorts. He says there are some ladies there.</p>
<p>It’s about 5:00 pm when Cruisin, Griffin, Sal, and Neal from England show up in
the Radalac, Cruisin’s beat up ‘62 Cadillac. The race/contest was a flop. They
were the only ones that showed up for the contest and Rich called it off. So
they drove two hours each way to do a demo on some really lame track. There
were only about 20 kids watching, and Neal ended up hurting himself. For all
their efforts, Rich ended up giving them 40 bucks to split up. There was
supposed to be a $200 purse-but that’s Rich. With the money they bought two
cases of Budweiser for the house. Cruisin had to eat all the gas costs himself.
I guess he figures it into the cost of doing the magazine. Rich promised him a
full page ad in the next issue which costs $400, yeah.</p>
<p>Later, Jay, Scotty, and Mark get back from Venice Beach with a bunch of pot.
Everyone’s running around calling it “candy”. The front room is suddenly busy.
Jay’s got the four-footer in the kitchen working on it, Griffin is looking for
his lighter. I don’t know exactly what everybody is doing but the whole scene
reminds me of an Indy 500 pit stop. As soon as that bag pulled into the house
people started running around getting loud and doing stuff. After the session,
the smoking crew packed up and headed for the adult bookstore down the road.
They call it “the spot”-16 channels, full doors with locks, paper towel
dispensers in every booth-class. About $1.50 is enough to get any guy off. With
any luck they’ll get a Todd Steen video. Todd is a fellow BMX guy that doubles
as a porn star. Too much.</p>
<p>Eventually, Brian gets home from some shitty race in Minnesota. He had a good
weekend but the track was built out of frozen dirt that thawed out during the
race and turned into slop. Brian got a second and a third and came home with
$1040.00 in prize money.</p>
<p>Magoo, John Paul and Greg Esser show up. Magoo goes crazy for a while pounding
on the back doors and demanding some “candy”. Dave is still tinkering with his
bike. I settle into the couch with a WALA pizza, a Bud, and Spinal Tap on TV.
Cruisin is trying to get an ad out of Greg.</p>
<ul>
<li>Monday, 4/11/94</li>
</ul>
<p>I didn’t get over to the house today. Work was way too hectic. We had a few big
orders to take care of and I had to stick around pretty late. Big Island came
by the shop on his way home from the hospital. He went there to get a new cast
on his arm but after four hours in the waiting room he said fuck it and came
over to the warehouse. Neal and Griffin came over with him. Neal was in his
chick’s car and found a bunch of pictures of her and her high school friends in
bikinis. Neal’s girl looks unreal. Really tan, big tits, fuckin’ crazy. She’s
still in high school and she’s gonna pay Neal’s rent when he runs out of money
in a few months so he won’t have to go back to England. Neal wants to stay out
here and ride as much as possible. After we all look at the photos and Neal
splits, Alex starts going on about how she must run back to school and tell
everyone about her English boyfriend with the tattoos and a pierced dick. After
laughing hysterically Alex starts into some story about some girls at an
English rave. They were in the bathroom being taped by a hidden camera when one
girl tells the others she’s got some Kettamin which is just a horse
tranquilizer and the other girl says “Special K, you’re a hardcore bitch.” Alex
starts laughing hysterically. Alex is always laughing hysterically.</p>
<p>Dave makes the show today. Lawan spent the day working on some rap tracks with
a friend of his from school. Lawan is taking some record producing class. When
he was younger and living in Michigan Lawan played drums in a hardcore band
that put out a 12 song demo and played some local shows.</p>
<ul>
<li>Tuesday, 4/12/94</li>
</ul>
<p>Alex and I didn’t show up at the house until about 7:30 pm so we rushed
straight through the front room and back towards the ramp so we could get some
runs in before closing time. Povah and Griffin were already riding. Povah just
got back from some shows in Texas where he rode with Tony Hawk. Griffin is
learning a ton of new tricks on the six-foot ramp. Sal and Scotty were drinking
some big bottle of wine. It was just a normal night..until Dave got back from
his show.</p>
<p>Before I even got my pads on Dave was running around the ramp asking me if I
wanted to help him build a spine ramp for tomorrow’s show. What the fuck? By
the time we were done riding, it would be close to 9 and Dave wanted to build a
fuckin’ ramp. I didn’t think he was serious but he was. He was rambling about
everything. He said he’d buy beer, pizza, whatever, he just needed a ramp by
tomorrow. For some reason I said yes.</p>
<p>On the way out, Dave was screaming at everyone, running around babbling,
talking to himself, humming songs, you name it. We should have backed out right
then. Sal was sitting in front of the stove cooking some food and bitching at
Dave about the gas getting turned off. Dave might be a slacker when it comes to
paying the bills but that’s partly because nobody ever gives him the money on
time. I don’t think anybody realized it at the time but the stove burns gas so
obviously the gas wasn’t turned off. The water heater had just been run down.
It must have been one of those rare P.O.W. days when more than one person took
a shower. By the time I made it back into the front room Dave was arguing with
Griffin about the lawn. Dave was acting pretty fuckin’ weird. If any of the
rumors I’d been hearing about Dave being a big speed freak were true, they
would explain all this. Eventually we left.</p>
<p>A quick trip to the hardware store for coping and then to Jeffro’s house for
power tools and we were off. Jeffro’s real name is just Jeff but he has a
pretty big afro so everybody calls him Jeffro. He looks like Horshack from
Welcome Back, Kotter and I think he’s the street’s speed dealer.</p>
<p>From 10 pm to 5:30 am me and Alex built ramps. Dave cut a couple of boards and
pulled some nails, but for the most part he just ran around in circles, smoked
broken cigarettes and drank beer. Occasionally he’d start an argument about
something stupid and then he’d get busy doing nothing again. The broken
cigarettes were laying everywhere burning. They would fall out of Dave’s mouth
and he’d just light a new one without even knowing what was going on. At 6:00
am we pulled up to the Gas-Mart with a five foot wide, four foot high spine
ramp in the back of the truck. Budweiser still in hand, Dave goes in to get
some burritos. As we pulled up to the house the sun was just coming up and
Jeffro was riding up on his Diamond Back. After leaving Dave at his house with
the truck and ramps, we took his car and came home to sleep.</p>
<ul>
<li>Wednesday, 4/13/94</li>
</ul>
<p>Dave called me from the park. The truck made it up there but some hose blew off
the engine causing it to overheat. I’ve driven the truck for three years and
I’ve never blown any hoses off it…give it to Dave and shit just starts
falling off. I think whatever Dave has is contagious. He also broke the handle
that opens the gas-cap compartment from inside. Luckily, Dave brought one of
the Mansons with him to help out and he knew what was wrong with the radiator
hose. The Mansons are a bunch of speed freaks that live down the street. Dave’s
been hanging out with them lately. Iroquois’ got a few houses of weirdos on it
but the Mansons definitely take the cake.</p>
<p>Dave said the promoter shit when she saw the size of the ramp. It didn’t help
any either that Dave hung up so bad on his first backflip attempt that he
ripped the coping right off the top of the ramp. He said the audience was
covering their faces in fear when he went for his second attempt. Just another
day at the Say No to Drugs/Safety in Sports demo starring Dave Clymer.</p>
<ul>
<li>One week ending Wednesday, 4/20/94</li>
</ul>
<p>I didn’t go to the house all week. The ramp building thing and the whole deal
with Dave put me into a mild state of shock. Plus there was a race in Las Vegas
and I had to go to San Francisco for a couple of days. In Vegas I crashed my
brains out in my first moto and sprained both my wrists pretty bad. Lawan and
Neil were both ripping in Superclass which is kind of like semi-pro. It’s not
as hard as pro but you can still make plenty of money. I think they both made a
few hundred bucks for the weekend. Brian was out front in pro all weekend and
ended up with some ridiculous amount of money like normal, something like eight
hundred or so. For Big Island, Rat Boy, and the rest of the kids in my truck,
the weekend was a fucking nightmare. We drove all Friday night just making
Saturday morning sign-ups by five minutes. It was about 400 fucking degrees and
I spent most of Saturday lying in a field dosed up on pain killers in some kind
of concussion daze. We ended up sleeping in this kid’s garage Saturday night
because all the cheap motels were booked solid. The kid wouldn’t even let us
come into his house to use the bathroom so Alex went out back and shit on a
lawn chair next to the kid’s ramp. He was going to shit in the coping of the
ramp but he couldn’t get his ass up to the right angle. After Sunday’s race, we
went to meet the rest of the P.O.W.s at the live jerk off spot downtown.
Driving around in Vegas traffic is a pain in the ass and we usually spend most
of our time there lost, but everybody knows where the jerk off spot is, so it’s
a good meeting spot. My wrists were so fucked up I couldn’t even jerk off. I
didn’t give a shit though because I really don’t like the place as much as
everyone else does. I like clean little video booths with full-length locked
doors and plastic seats. This place has video booths but the doors are saloon
style. Plus they have these really big padded chairs that are hard to wipe off
and that lean way back and cause you to nut all over yourself. Nobody comes
here for the videos, they come here for the live booths.</p>
<p>In the live booth you stand up and put quarters in to keep the light on that
allows you to see the girl that is dancing in the big ring. There are a bunch
of booths in a big circle and the girl is in the middle. The quarters just keep
the light on in your booth but you can’t really see the girl unless you slip
dollar bills in through the tip slot so she’ll come near you. A few weeks ago
when we were in Vegas for a freestyle contest I was standing there
rubber-necking it so I could jerk off and see her dancing for someone else and
the chick freaked out because I wasn’t tipping her. She was pounding on my
window telling me I couldn’t jerk off without tipping her. I told her to fuck
off. It’s a cheesy scene with all these girls walking around hitting all the
windows yelling at everybody to tip. Sometimes they just sit down in their
chair naked and smoke until someone tips. Fuck ’em, I’ll jerk off to that scene.</p>
<p>I was waiting out front for everyone to come out when Rat Boy came running out
laughing. He put his dollar in the tip slot so the girl would come dance for
him but right when he was nutting he grabbed it back out and ran. I think he
thought the girl was gonna come after him. I’m surprised she didn’t.</p>
<ul>
<li>Thursday, 4/21/94</li>
</ul>
<p>It’s been a week since I let Dave borrow my car and it still stinks. I have no
idea what it smells like but it stinks. I roll the windows down all the time
but it just won’t air out. I think he impregnated my seats with some strange
shed fungus…who fuckin’ knows. Everybody at the house gives Dave a hard time
about smelling like fish but this is different. Anyway, Dave left this morning
for a freestyle contest in Pennsylvania and once again is the talk of the
house. I guess he has a couple ounces of pot and a $100 bag of speed rocks on
him and everybody thinks he’s in jail. Neal dropped him off at LAX but Keith
called from PA and said Dave wasn’t on the plane and he didn’t make it to the
airport. This story was the talk of the town for about 36 hours. People were
calling around for Dave updates. I called his parents’ house to see if he had
made it but his parents had moved and the forwarding number they left with the
new residents was the Allentown county elementary school lunch menu…turkey
and potatoes. Eventually Dave turned up at the contest. He just missed his
flight and caught a later one. It’s funny how out of hand the story got.
Sometimes I think Dave does this shit on purpose. It’s an ingenious plan to
become the most underground BMX cult hero of all time. Unfortunately for Dave I
don’t think that’s the case, he’s just completely weird, and getting weirder by
the minute. Last week he was telling someone that he was in the best shape of
his life and since he was skinny he could pull off all these tricks that he
would have crashed on when he was buff and 30 pounds heavier. “I can
over-rotate and just pull out of it because I’m so little.” He’s probably
right…as long as he really believes it. And he does. The White Bear says
Dave is the ultimate psycho-semantic superman. Yeah whatever Steve.</p>
<ul>
<li>Sunday, 4/24/94</li>
</ul>
<p>Today was Lawan’s 22nd birthday so everybody pitched in three dollars and
bought hamburgers and beer for a barbeque. A bunch of kids came over and rode
while Scotty worked the new hibachi under the deck of the ramp. They got the
hibachi at the swap meet and it cost everybody an extra $7 on their rent. The
“house” spends rent money kind of like the government spends tax money. Today
Brian was talking about raising the rent up to buy some light bulbs. I sat on
the couch all day and watched TV. Crazy tattooed Jim came over with some guy
and tried to get some pot. Ned sat and cleaned his speed pipe for most of the
day. A few kids learned some new tricks on the ramp and I ended up losing the
controller. There was a little skate session going on out front and Sal hooked
up a basketball hoop and tried unsuccessfully to get a game going. At some
point he started yelling at people about not washing their dirty dishes. I’ve
always liked Sundays at the house. I guess just about every day is Sunday at the
house.</p>
<p>Unfortunately for the guys that live here, they don’t have much of a choice for
now. None of the P.O.W.s have family west of the Mississippi, and $1075 a month
split nine ways is a far cry from the $350 a month it costs most people to rent
a room in the same area. Most of the guys already spent their money coming out
to Southern California for the weather and the happening BMX scene. That’s
always been the P.O.W. story. Luckily, being broke doesn’t seem that bad to a
bunch of bikers with a backyard full of ramps and jumps. It’s not a lifestyle
most people could handle but like the sign on the living room wall reads,
“Welcome to the P.O.W. house. If you don’t like it, get the fuck out.”</p>
<p>End.</p>