Screen corruption on KDE with Ubuntu 20.04

At some point in the past couple of weeks my laptop must have updated some libraries, because when I had to reboot it yesterday I started seeing massive screen corruption in some applications. It showed up as horizontal white/black lines left behind on the screen, particularly when selecting text, and was most obvious in Konsole. I couldn’t see any particular package that had updated recently, and trying a few different Xorg and kernel options got me nowhere.

Eventually I looked at the KDE compositor settings and noticed that the “Rendering Backend” was set to XRender, which as far as I understand is very old. Changing it to “OpenGL 3.1” instantly fixed the issue.
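
If you prefer to make (or script) the change from a terminal, the setting lives in ~/.config/kwinrc. The key names and values below are taken from my own kwinrc, so treat this as a sketch – they may well differ between Plasma versions:

kwriteconfig5 --file kwinrc --group Compositing --key Backend OpenGL
kwriteconfig5 --file kwinrc --group Compositing --key GLCore true
qdbus org.kde.KWin /KWin reconfigure    # ask the running KWin to reload its configuration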

I’m leaving this mostly as a note to myself in case it happens again in the future, but if it is a wider regression in the Ubuntu 20.04 KDE packages then perhaps it will help someone else too.

Best Wordle starter words

I recently, like pretty much everyone else, got into Wordle. One of the most important parts of finding the answer is choosing the best first word or two, which then guide you towards the solution. The ideal starting words should each use a different set of the most common letters, so that across the first two guesses you test the top ten letters.

My first (relatively uneducated) guesses, based on what I vaguely remembered about letter frequency in English, were ‘spear’ and ‘mount’ – four vowels and some of the most common consonants. However, that was pretty much a random choice, so I wondered whether we could figure out a better approach.

It’s pretty straightforward to look at the source code of Wordle, which contains two word lists. The first contains the 2,315 five-letter words that can be the answer; the second contains a further roughly 10,000 five-letter words that are accepted as guesses.

So I wrote a small script to analyse the frequency of letters in the list of possible answers, and then use that to filter the word list and find the best starting (and subsequent) guesses.

I’ve put the simple Python script I used at the bottom of the article, but the output is:

Matching 5 new letters (39%) are: ['arose']
Matching 5 new letters (66%) are: ['unlit', 'until']
Matching 4 new letters (81%) are: ['duchy']
Matching 3 new letters (89%) are: ['pygmy']

What this means is that if you start with ‘arose’ and then ‘until’ (or ‘unlit’), even though that is only 10 unique letters (38% of the alphabet), because they are the most frequent ones they account for two thirds (66%) of all the letter occurrences across the possible answers.

In terms of overall letter frequency we get the following ordered list:

[('e', 1233), ('a', 979), ('r', 899), ('o', 754), ('t', 729), ('l', 719), ('i', 671), ('s', 669), ('n', 575), ('c', 477), ('u', 467), ('y', 425), ('d', 393), ('h', 389),
('p', 367), ('m', 316), ('g', 311), ('b', 281), ('f', 230), ('k', 210), ('w', 195), ('v', 153), ('z', 40), ('x', 37), ('q', 29), ('j', 27)]

The script I wrote is not perfect, but it’s at least a start at finding some optimal words:

import sys

# Load the candidate answer words (one per line) from the file given on the command line
with open(sys.argv[1]) as fh:
    words = [l.strip() for l in fh]

# Count how often each letter appears across the whole word list
chars = {}
for char in ''.join(words):
    chars[char] = chars.get(char, 0) + 1
frequency = sorted(chars.keys(), key=lambda c: -chars[c])
print(sorted(chars.items(), key=lambda c: -c[1]))

# Greedily find words containing as many as possible of the most frequent
# letters, then repeat with the letters not yet covered for subsequent guesses
total_freq = 0
while len(frequency) > 5:
    matching = words
    letters = []
    for char in frequency:
        new_matching = [w for w in matching if char in w]
        if new_matching:
            matching = new_matching
            letters.append(char)
        if len(letters) == 5:
            break

    total_freq += sum([chars[c] for c in letters])
    print("Matching %d new letters (%d%%) are: %r" % (len(letters), total_freq / sum(chars.values()) * 100, matching))
    frequency = [c for c in frequency if c not in letters]
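
To reproduce this, save the script as something like wordle_letters.py (the name is just a placeholder) and point it at a file containing the possible answer words, one per line:

python3 wordle_letters.py answer_words.txt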

Simple mitigation for the new DNS cache poisoning attack

As reported in many places, a new attack has been presented which can allow an attacker to poison the entries of caching and forwarding DNS servers. The PDF is an interesting read and contains a number of different ideas which, chained together, lead to this attack. I believe the following firewall rule should defend caching servers against the attack with very few side effects, by preventing the server from sending ICMP messages saying that a given UDP port was unreachable:

iptables -I OUTPUT -p icmp --icmp-type port-unreachable -m u32 --u32 '34 & 0xFF = 17' -j DROP

The u32 match looks inside the outgoing ICMP error: assuming a standard 20-byte IP header, the quoted original packet starts at offset 28, so the byte selected here (offset 34 plus the 0xFF mask picks out byte 37) is that packet’s protocol field, and a value of 17 means it was a UDP packet – exactly the port-unreachable responses the attack relies on.

Running systemd inside a CentOS 8 Docker container

Support for systemd in Docker has improved a lot since this 2016 article, but it’s still not obvious quite how to make it work. Why would you want this? Mostly for testing full-server deploys (for example, we test Ansible deployments against various Docker containers to ensure there are no bugs). Here’s a systemd-based CentOS 8 Dockerfile that also includes an SSH server:

FROM centos:8

# Set up base packages that are expected
RUN dnf -y install openssh-server crontabs NetworkManager firewalld selinux-policy

RUN systemctl mask dev-mqueue.mount dev-hugepages.mount \
     systemd-remount-fs.service sys-kernel-config.mount \
     sys-kernel-debug.mount sys-fs-fuse-connections.mount \
     graphical.target systemd-logind.service \
     NetworkManager.service systemd-hostnamed.service

STOPSIGNAL SIGRTMIN+3

# SSHd setup
EXPOSE 22
COPY docker.pub /root/.ssh/authorized_keys
RUN chmod 600 /etc/ssh/ssh_host* /root/.ssh/authorized_keys

CMD ["/sbin/init"]

You can then launch it like:

docker run -v /sys/fs/cgroup:/sys/fs/cgroup:ro --tmpfs /run container-name
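
For completeness, a typical build-and-connect cycle looks something like the following. The key filename matches the COPY line above, but the image tag, container name and host port are placeholders of mine, and this assumes sshd starts via the distribution presets (the CentOS 7 version below enables it explicitly):

ssh-keygen -t ed25519 -N '' -f docker      # creates docker + docker.pub used by the COPY above
docker build -t centos8-systemd .
docker run -d --name c8-test -p 2222:22 -v /sys/fs/cgroup:/sys/fs/cgroup:ro --tmpfs /run centos8-systemd
ssh -i docker -p 2222 root@localhost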

For the latest CentOS 7 you can use the following Dockerfile:

FROM centos:7

RUN yum -y install openssh-server NetworkManager firewalld && \
    systemctl disable NetworkManager && systemctl enable sshd

EXPOSE 22
COPY docker.pub /root/.ssh/authorized_keys
RUN chmod 600 /etc/ssh/ssh_host* /root/.ssh/authorized_keys

STOPSIGNAL SIGRTMIN+3

CMD ["/sbin/init"]

Complex network setup on CentOS/RedHat 8 via NetworkManager

Here are some notes, mostly for my future self, about how to set up bond + VLAN + bridge networks on CentOS 8 in order to create a highly resilient virtual machine host server.

Firstly, create the bond interface and set the options:

nmcli con add type bond ifname bond0 bond.options "mode=802.3ad,miimon=100" ipv4.method disabled ipv6.method ignore connection.zone trusted

Note the 802.3ad option – this means we will be using the LACP protocol, which requires support on the switch/router side; the other available modes are described in the kernel bonding documentation.

Then we add the required physical interfaces into the bond – in this case enp131s0f0 and enp131s0f1:

nmcli con add type ethernet ifname enp131s0f0 master bond0 slave-type bond connection.zone trusted
nmcli con add type ethernet ifname enp131s0f1 master bond0 slave-type bond connection.zone trusted
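
Before layering anything on top, it’s worth checking that the bond has come up and negotiated LACP with the switch. The connection names are whatever nmcli auto-generated, so list them first; the kernel’s bonding status file is the most reliable place to confirm the mode:

nmcli con show                   # list the auto-generated connection names and their devices
cat /proc/net/bonding/bond0      # should show "Bonding Mode: IEEE 802.3ad" and "MII Status: up" for each slave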

If you want to create a standard VLAN interface on top of the bond for VLAN tag 123, you can do this with just one more command:

nmcli con add type vlan ifname bond0.123 dev bond0 id 123 ipv4.method manual ipv4.addresses a.b.c.d/24 ipv4.gateway a.b.c.d ipv4.dns 8.8.8.8 connection.zone public

Alternatively if you want to set up a bridge on this interface:

nmcli con add ifname br.123 type bridge ipv4.method manual ipv4.addresses a.b.c.d/24 ipv4.gateway a.b.c.d ipv4.dns 8.8.8.8 \
        bridge.stp no bridge.forward-delay 0 connection.zone public
nmcli con add type vlan ifname bond0.123 dev bond0 id 123 master br.123 slave-type bridge connection.zone trusted

If you don’t want the default route or DNS, simply remove the ipv4.gateway and ipv4.dns options above.

If you want to create a bridge which doesn’t have any IP address on the host (it just bridges traffic through to the guest VMs), then replace all of the ipv4 settings with ipv4.method disabled ipv6.method ignore.
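
Applying that substitution to the bridge example above gives something along these lines (the other options are left exactly as before):

nmcli con add ifname br.123 type bridge ipv4.method disabled ipv6.method ignore \
        bridge.stp no bridge.forward-delay 0 connection.zone public
nmcli con add type vlan ifname bond0.123 dev bond0 id 123 master br.123 slave-type bridge connection.zone trusted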

Ripping unfinalized DVDs from Linux

Recently we had some DVDs of old family videos that had been recorded directly to disc but never finalized. The device that recorded them then broke, so it was impossible to finalize them, and it seemed pretty much impossible for anything else to read them. So I figured out a way to recover them using Linux…

First I installed the dvd+rw-tools package on Ubuntu and used it to get various info and prove that the DVD itself was readable, even though nothing on the system could see it as a video disc or filesystem.

$ dvd+rw-mediainfo /dev/sr0 
INQUIRY:                [HL-DT-ST][DVDRAM GTB0N    ][1.00]
GET [CURRENT] CONFIGURATION:
 Mounted Media:         11h, DVD-R Sequential
 Media ID:              CMC MAG. AE1
 Current Write Speed:   8.0x1385=11080KB/s
 Write Speed #0:        8.0x1385=11080KB/s
 Write Speed #1:        4.0x1385=5540KB/s
 Speed Descriptor#0:    00/0 R@8.0x1385=11080KB/s W@8.0x1385=11080KB/s
 Speed Descriptor#1:    00/0 R@8.0x1385=11080KB/s W@4.0x1385=5540KB/s
READ DVD STRUCTURE[#10h]:
 Media Book Type:       00h, DVD-ROM book [revision 0]
 Legacy lead-out at:    2298496*2KB=4707319808
READ DVD STRUCTURE[#0h]:
 Media Book Type:       25h, DVD-R book [revision 5]
 Last border-out at:    8390653*2KB=17184057344
READ DISC INFORMATION:
 Disc status:           appendable
 Number of Sessions:    1
 State of Last Session: incomplete
 "Next" Track:          1
 Number of Tracks:      19
READ TRACK INFORMATION[#1]:
 Track State:           reserved
 Track Start Address:   0*2KB
 Next Writable Address: 0*2KB
 Free Blocks:           2544*2KB
 Track Size:            2544*2KB
READ TRACK INFORMATION[#2]:
 Track State:           complete incremental
 Track Start Address:   2560*2KB
 Free Blocks:           0*2KB
 Track Size:            240*2KB
 Last Recorded Address: 2591*2KB

I then pulled the entire DVD into a file on the local computer for easier processing later. The command will produce lots of errors (as there are parts of the DVD that were never written to and so are unreadable), but the output file (image.iso) will eventually contain a full dump of the DVD:

dd if=/dev/sr0 of=image.iso bs=2048 conv=noerror,notrunc iflag=nonblock

I then put together a short Perl script to search through this file for 1kB blocks beginning with the magic tag DVDVIDEO – these seemed to mark the starts of individual chapters, which avconv (called ffmpeg on some distributions) can then extract into proper video/audio files.

#!/usr/bin/perl
use v5.16;
use strict;
use warnings;
my $off = 0;
my $file = $ARGV[0];
open my $fh, '<:bytes', $file or die;
my $buf;
my @pos;

# Search through each block for one beginning with the header text and store these in array of offsets - I think it's one for each track
while( my $len = read $fh, $buf, 1024 ) {
        die if $len != 1024;

        if( $buf =~ /^DVDVIDEO/ ) {
                push @pos, $off;
        }
        $off++;
}
push @pos, $off;

my $chap = 0;
for( my $i = 0; $i < @pos - 1; $i++ ) {
        my $length = $pos[$i+1] - $pos[$i];
        next if $length < 1000;
        $chap++;
        say "dd if=$file bs=1024 skip=$pos[$i] count=$length | avconv -i - -acodec copy -vcodec copy out$chap.mp4";
}

Save that as extract_dvd_tracks.pl and run it; it prints a dd | avconv command line for each track, which you can then execute (for example by piping the output to sh) to produce files named out1.mp4, out2.mp4 and so on. Note that it skips tracks smaller than about 1MB, because there seemed to be a number of small sections like this that we wanted to ignore.

perl extract_dvd_tracks.pl image.iso

Job done! Note that there are probably many different formats and layouts for unfinalized DVDs; this may be just one of many, but hopefully the principle remains the same.

Extracting all PHP code from a file

While checking over a project recently, I wanted to extract all the PHP code from a set of files and combine it into a single output so that I could easily assess what was being used. The command I eventually ended up with is as follows; hopefully it will be useful to someone else in the future:

find -name \*.php | xargs perl -nE 'BEGIN{ undef $/ } say for /<\?php\s*((?:(?!\s*\?>).)+)/sg'

This slurps each .php file whole (undef $/) and prints everything between each <?php opening tag and its following ?> closing tag (or the end of the file if there isn’t one).

Implementing correct modal navigation with react-router

I’ve recently been converting a big project from jQuery Mobile to React and Material-UI. One area that seems pretty weak compared to other frameworks is the lack of a proper mobile-focused infrastructure for navigation. One of the big issues for me was modals (for example dialog, alert or popup page components). When you see an alert you want to be able to click the OK button to dismiss it, and on Android you also expect to be able to press the back button to dismiss it. However, if you have separate routes for your modals, so that the URL changes when they are open, then refreshing the page at that point leaves you with a modal but no previous state behind it to return to.

There seem to be certain hacks with react-router-dom which allow you to change the page but keep the URL the same, assuming you are using BrowserRouter, but because this was a legacy project I wanted to keep using HashRouter. So I whipped up a quick HOC hack to wrap around a modal which allows both back navigation to work as expected and refreshes to land on the main page rather than reopening the modal.

I already had a standard base class with a handleClose() method to close a dialog and signal to the parent that it was done, so I expanded it to hook into the browser history (popstate) events, as below:

function getDisplayName(WrappedComponent) {
    return WrappedComponent.displayName || WrappedComponent.name || 'Component';
}

// Shows some sort of dialog which has a handleClose method to close it,
// optionally calling props.onClose and also handles history appropriately
export const DialogMixin = Base => class extends Base {
    displayName = `withDialogMixin(${getDisplayName(Base)})`;

    __close = () => {
        if( window.location.hash == this.__start_hash ) {
            window.removeEventListener('popstate', this.__close);
            this.setState({ closed: true });
            if( this.props.onClose )
                this.props.onClose();
        }
    }

    handleClose = () => window.history.back();

    render() {
        // A component could be mounted but not showing any dialog - only hook
        // the history if actually showed something.
        const render = super.render();
        if( !this.state.closed && !this.__start_hash && render ) {
            this.__start_hash = window.location.hash;
            window.location.hash += '?dialog';

            window.addEventListener('popstate', this.__close);
        }
        return render;
    }
}

You then just wrap your component with this – for example:

export const Alert = DialogMixin(_Alert);

This works because react-router-dom doesn’t attempt to parse query strings within the hash, so the ?dialog suffix doesn’t affect routing; when the app is first loaded you just want something to remove any leftover ?dialog hack from the URL:

window.location.hash = window.location.hash.replace(/\?.*/, '');

Dumping all remote MySQL queries to a server

For a project recently we needed to see which users were accessing which databases on a remote MySQL server. After a little bit of reading and experimenting I came up with the following command – hopefully it will be useful to someone in the future:

tshark -i any -n \
    -d tcp.port==3306,mysql \
    -T fields -e timestamp -e ip.src -e mysql.user -e mysql.schema -e mysql.query \
    -f "dst port 3306" \
    -Y "len(mysql.user) > 0 || len(mysql.query) > 0" 

This uses tshark, the terminal version of the excellent Wireshark tool, to capture traffic. The -f flag specifies a capture filter so that we only inspect traffic inbound to the server (as we don’t care about the responses).

The -T and -e options specify the fields we wish to dump (in tab-separated form), and -d ensures that all traffic on port 3306 is decoded as MySQL. The final -Y display filter ensures that only packets containing logins or queries are output, rather than TCP overhead, placeholder bindings and so on.

Note that the user is only given at the beginning of a session. The schema (database name) may be on that same line, or you may have to scan the queries looking for a USE database style command, as there are two different ways of setting the current database that will be accessed.

How to generate good-looking geographical heatmaps

For a project recently I needed to produce a geographical heatmap with millions of data points. I first tried using R with OpenStreetMap rendering, but I couldn’t make the heatmap display as flexibly as I wanted. Then a friend suggested I try Python with the geopandas library. Even then, in spite of some examples in the manual, including a heatmap-based one, I couldn’t make the map look how I wanted using the interfaces that were exposed.

There were a number of issues:

  • My data was weighted, but there is no support for a weighted heatmap – all points are considered equal
  • I wanted a nicer basemap with town names and so on – ideally the CartoDB light style
  • My data was heavily concentrated in certain locations, so I wanted to run a logarithmic-style function over the weights produced by the heatmap algorithm to make the data more easily visible
  • The standard heatmap setup assumes there is nothing underneath other than a map outline, so by default it is not transparent at all
  • I wanted to include the result in a report, so all I need is a single PNG image output, without the axis lines, various paddings and so on around the graph

So, looking through the geopandas heatmap code (which mostly relies on seaborn’s kdeplot), I pulled various bits out into my own program so that I could have full control over these steps. Here is a short run-through of the resulting code.

First, we include the libraries, read the command-line values and set up the area of our map as a geopandas object:

import contextily as ctx
import geopandas
import matplotlib.pyplot as plt
import numpy as np
import pandas
import scipy.stats
import seaborn.palettes
import seaborn.utils
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]
langs = sys.argv[3].split(',')
country_codes = sys.argv[4].split(',')

max_tiles = 10

world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

if country_codes[0] == 'list':
    pandas.set_option('display.max_rows', 1000)
    print(world)
    sys.exit()

# Restrict to specified countries
if country_codes[0] == 'all':
    area = world[world.continent != 'Antarctica']
else:
    area = world[world.iso_a3.isin(country_codes)]

Then we save the bounds in lat/lng form and convert the area to Spherical Mercator (EPSG:3857) for web display. Everything needs to be plotted in this projection so that we can pull in OpenStreetMap-style tiles for the background later:

latlng_bounds = area.total_bounds
area = area.to_crs(epsg=3857)
axis = area.total_bounds

# Create the map stretching over the requested area
ax = area.plot(alpha=0)

We now read in the CSV and drop any data points that fall outside the bounds of the area we are displaying:

df = pandas.read_csv(input_file,
    header=None,
    names=['weight', 'lat', 'lng'],
    dtype = {
        'weight': np.int32,
        'lat': np.float64,
        'lng': np.float64,
    }
)

df.drop(df[
            (df.lat < axis[1]) | (df.lat > axis[3]) | (df.lng < axis[0]) | (df.lng > axis[2])     # outside bounds of country
    ].index, inplace=True)

NOTE: I tried calling .to_crs() on the CSV data (in normal lat/lng form) to convert it into Spherical Mercator, but it was very slow. As I’m generating the data from PostGIS anyway, I just run the conversion there instead. The data is basically derived from the freely available MaxMind GeoIP database, which gives a list of lat/lng/radius entries; I then pick a random point within each radius and output it to the CSV file using a query like:

SELECT ST_AsText(ST_GeneratePoints(ST_Transform(ST_Buffer(location, radius)::geometry, 3857), 1)), ...

Then comes the hard bit – converting these values into a heatmap using kernel density estimation (KDE). These bits were lifted and simplified from the kdeplot routines mentioned above; fortunately, although seaborn doesn’t expose it, the underlying scipy.stats module contains a weighted KDE which is easy to use.

# Calculate the KDE
data = np.c_[df.lng, df.lat]
kde = scipy.stats.gaussian_kde(data.T, bw_method="scott", weights=df.weight)
data_std = data.std(axis=0, ddof=1)
bw_x = getattr(kde, "scotts_factor")() * data_std[0]
bw_y = getattr(kde, "scotts_factor")() * data_std[1]
grid_x = grid_y = 100
x_support = seaborn.utils._kde_support(data[:, 0], bw_x, grid_x, 3, (axis[0], axis[2]))
y_support = seaborn.utils._kde_support(data[:, 1], bw_y, grid_y, 3, (axis[1], axis[3]))
xx, yy = np.meshgrid(x_support, y_support)
levels = kde([xx.ravel(), yy.ravel()]).reshape(xx.shape)

The levels output variable now contains the weight of every point on the heatmap grid we are going to output. Because the data is so heavily skewed, we can apply a mathematical function to it (in this case taking the quintic root) in order to scale it into a more useful range.
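
A minimal sketch of that scaling step, assuming a simple element-wise power over the NumPy array is all that’s needed, would be:

levels = levels ** (1.0 / 5)    # quintic root – compresses the dynamic range so sparse areas remain visible

With the weights scaled we can draw the contour layer over the map axes: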

cset = ax.contourf(xx, yy, levels,
    20,                     # n_levels
    cmap=seaborn.palettes.blend_palette(('#ffffff10', '#ff0000af'), 6, as_cmap=True),
    antialiased=True,       # avoids lines on the contours to some extent
)

By generating the palette ourselves we can give each end of it an alpha channel so that the underlying map remains visible. However, there is quite a lot of noise at the lower levels, so we hide those contours completely:

# Hide lowest N levels
for i in range(0,5):
    cset.collections[i].set_alpha(0)

We then add in the basemap tiles, which is a bit of a mess as the contextily library seems rather immature:

def add_basemap(ax, latlng_bounds, axis, url='https://a.basemaps.cartocdn.com/light_all/tileZ/tileX/tileY@2x.png'):
    prev_ax = ax.axis()
    # TODO: Zoom should surely take output pixel request size into account...
    zoom = ctx.tile._calculate_zoom(*latlng_bounds)
    while ctx.tile.howmany(*latlng_bounds, zoom, ll=True) > max_tiles:      # don't ever try to download loads of tiles
        zoom = zoom - 1
    print("downloading %d tiles with zoom level %d" % (ctx.tile.howmany(*latlng_bounds, zoom, ll=True), zoom))
    basemap, extent = ctx.bounds2img(*axis, zoom=zoom, url=url)
    ax.imshow(basemap, extent=extent, interpolation='bilinear')
    ax.axis(prev_ax)        # restore axis after changing the background

add_basemap(ax, latlng_bounds, axis)

Finally, we can tweak some of the output settings and save the result as a PNG file:

ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
ax.set_frame_on(False)

plt.savefig(output_file, dpi=200, format='png', bbox_inches='tight', pad_inches=0)
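
For reference, the script expects the input CSV, the output PNG, a comma-separated language list (read but not used in the code shown here) and a comma-separated list of ISO country codes, so invocations look something like this (the script name is just a placeholder):

python3 heatmap.py points.csv heatmap.png en list       # dump the country table to find the ISO codes
python3 heatmap.py points.csv heatmap.png en GBR,IRL    # render the heatmap restricted to those countries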