Upgrading Ubuntu servers from 16.04 to 20.04 on Digital Ocean (includes handling floating IP)

Recently I upgraded a bunch of our app servers that were on old 16.04 and other similar sub 20.04 Ubuntu versions.
(Why 20.04 and not 21.04? Ubuntu releases a long-term support (LTS) version every 2 years, and 20.04 is the latest LTS.)

While upgrading, half of them had no issues at all, while the other half ran into networking issues. Bad times, but this article will help you resolve those.

Talking to Digital Ocean support about this, it's not an uncommon issue with larger upgrades like these. Their recommendation, for anyone who can, is to install a new server and transfer the files over instead. If that's an option for you, it's a good one, but in my case it wasn't that simple.

In this post I'll run through my tips for doing each server upgrade from 16.04 to 18.04, and then to 20.04, along with a few common issues noticed along the way to help if you run into them too.

First, start with backup / snapshot.

Digital Ocean provide a great way to take live snapshots of the servers, but the preferable way, if you're able to, is to turn off the server and snapshot it before continuing. For several of the servers this process took less than half an hour, depending on the size of the server. As I ran into network issues later, the snapshot proved especially useful, since I could restore back to it while I debugged the issue further.

Updates & upgrades

You'll want to make sure your packages on the server are working and up to date before you continue.
Pretty standard, but run both and check the output to make sure everything's ok:

sudo apt update && sudo apt upgrade

If you notice any issues, like apt repos now 404'ing, make a note of them. You can see if they can be fixed now, but as you're upgrading from an EOL release you may find some repos are no longer available for your current version, and you'll need to come back to these after the upgrade.

Run the release upgrade

Before starting the release upgrade, you'll want to open port 1022. When you run the upgrade over ssh, the upgrade process starts an extra ssh daemon on this port in case there's an issue.
If you're using ufw you can run:

sudo ufw allow 1022/tcp

(Check if you're running ufw with sudo ufw status)
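If you'd rather script this check, here's a minimal sketch that only opens the port when ufw reports itself as active. The function names `ufw_is_active` and `open_fallback_port` are my own for illustration; the `ufw` commands are the same ones used above.

```shell
#!/usr/bin/env bash

# Reads `ufw status` output on stdin and succeeds only if the firewall is active.
# "Status: inactive" does not match, because the pattern requires "active"
# directly after "Status: ".
ufw_is_active() {
    grep -q '^Status: active'
}

# Hypothetical helper: opens the fallback ssh port only when ufw is running.
open_fallback_port() {
    if sudo ufw status | ufw_is_active; then
        sudo ufw allow 1022/tcp
    else
        echo "ufw is not active, nothing to open"
    fi
}
```

Call `open_fallback_port` before starting the release upgrade. If ufw isn't in use it just prints a message and changes nothing.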

Now you're ready to begin the upgrade, simply start it with the following command and follow the on screen instructions:

sudo do-release-upgrade

My advice during this is to answer Y (yes) to the questions about starting and automated restarts, but answer D (diff) or N (no) to the config changes.
If you know you've made custom changes to files like /etc/sysctl.conf or /etc/nginx/nginx.conf, you won't want to redo them, but on some files it's worth checking the differences to see what's changed.

The nice thing to remember here is that if you keep your own config file, the new default is saved to the same location with a .dpkg-dist extension,
e.g. /etc/nginx/nginx.conf.dpkg-dist, so you can later compare your custom version with the new default.

The same goes if you went the other way and said yes to replacing the files: your previous versions are kept with a .dpkg-old extension.
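To find those leftover files after the upgrade, something like the sketch below can help. The helper names `list_dpkg_leftovers` and `diff_against_dist` are hypothetical; they only wrap the standard `find` and `diff` tools.

```shell
#!/usr/bin/env bash

# Lists config files left behind by dpkg under the given directory
# (defaults to /etc). Both .dpkg-dist and .dpkg-old files are included.
list_dpkg_leftovers() {
    local dir="${1:-/etc}"
    find "$dir" -type f \( -name '*.dpkg-dist' -o -name '*.dpkg-old' \) 2>/dev/null | sort
}

# Shows a unified diff between a live config file and the new default
# that dpkg saved next to it as <file>.dpkg-dist.
diff_against_dist() {
    local conf="$1"
    diff -u "$conf" "$conf.dpkg-dist"
}
```

For example, `sudo bash -c "$(declare -f list_dpkg_leftovers); list_dpkg_leftovers"` lists everything under /etc, and `diff_against_dist /etc/nginx/nginx.conf` shows what changed in the nginx default.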

Lastly, you have your backup / snapshot to go back to if ever needed.

The last stage of this command is a request to restart your machine. Select yes and restart.

In all my upgrades, the 16.04 to 18.04 step came back up fine. The only issues I ran into were during the 18.04 to 20.04 upgrade, and they were mainly network issues.
However, if you do find the server doesn't come back online after a couple of minutes, skip down to the Resolving Network Issues section below.

Checks & repeat

After the 16.04 to 18.04 upgrade, you'll want to check everything is up and running well on your server before continuing to upgrade.
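One quick check worth scripting is confirming which release you actually landed on. This is a small sketch that reads `VERSION_ID` from the standard /etc/os-release file; the function name `release_version` is made up for this example.

```shell
#!/usr/bin/env bash

# Prints VERSION_ID from an os-release style file (default /etc/os-release).
# The file is sourced in a subshell so its variables don't leak into the caller.
release_version() {
    local file="${1:-/etc/os-release}"
    ( . "$file" && printf '%s\n' "$VERSION_ID" )
}
```

After the first hop you'd expect `release_version` to print 18.04, and 20.04 after the second.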

Check key services are up and running, view your sites on the frontend, and check any API monitoring you might have.

It's worth checking your apt repos here too, in your /etc/apt/sources.list file and the files in your /etc/apt/sources.list.d folder. Look for repos commented out by the upgrade and decide if you want to bring them back. If so, do it one at a time: uncomment, run sudo apt update to check it's working and sudo apt upgrade where needed, then repeat.
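To spot the commented-out repos quickly, a rough sketch like this lists every commented `deb` line in your apt source files. The name `list_disabled_repos` is hypothetical, and it assumes GNU grep (for `--include`), which is what Ubuntu ships.

```shell
#!/usr/bin/env bash

# Lists commented-out "deb" / "deb-src" entries in *.list files under a
# directory (default /etc/apt), printed as file:line:content.
list_disabled_repos() {
    local dir="${1:-/etc/apt}"
    grep -rnE --include='*.list' '^[[:space:]]*#[[:space:]]*deb(-src)?[[:space:]]' "$dir" 2>/dev/null
}
```

Note this also picks up entries you commented out yourself long ago, so treat the output as a list to review rather than a list to re-enable.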

If everything's good, you can repeat the release upgrade step to go from 18.04 to 20.04.

Resolving Network Issues

On a bunch of the servers I upgraded, I ran into networking issues where I couldn't connect to the server over ssh and, once I got in another way, couldn't ping out.
This happened mostly on the servers that started on 16.04, but during their upgrade from 18.04 to 20.04.

Luckily Digital Ocean provide a cloud console you can use to directly access the server even when its networking is down. (Top right of the droplet info screen)

The cloud console may take a little time to start up. What I ran into was that the networking service was making startup take around 2 minutes.
Once in, I quickly found I needed a password, as I've always just used ssh keys here. If you're like me, you'll need to reset the root password, which you can do via the server's info page under Access.
That'll email you the new password. It's a long one, and the issue I ran into is that you can't paste into the console (at least I couldn't at the time of writing), so you'll have to type the password manually.
Once you're in, if you did reset the password, it'll ask for it again and then ask you to change it. That means typing out the long password once more, but this'll be the last time.

The main source of the issue is Ubuntu switching to netplan from the networking setup used before Ubuntu 18.04 (ifupdown and /etc/network/interfaces).

View your current config with sudo vim /etc/netplan/50-cloud-init.yaml. It should look something like this:

network:  
    ethernets:
        eth0:
            dhcp4: true
            match:
                macaddress: XX:XX:XX:XX:XX:XX
            set-name: eth0
    version: 2

Inside eth0 you need to add 2 new properties: a list of IP addresses and the gateway to use.
Edit the file and add in your server's IP, and its floating IP if you're using one, like so:

network:  
    version: 2
    ethernets:
        eth0:
            addresses:
              - xxx.xx.xxx.xxx/16
              - xxx.xx.xxx.xxx/16
            gateway4: XXX.XX.XXX.X
            match:
                macaddress: XX:XX:XX:XX:XX:XX
            nameservers:
                addresses:
                - 67.207.67.2
                - 67.207.67.3
                search: []
            set-name: eth0
        eth1:
            addresses:
              - xxx.xx.xxx.xxx/16
            match:
                macaddress: XX:XX:XX:XX:XX:XX
            nameservers:
                addresses:
                - 67.207.67.3
                - 67.207.67.2
                search: []
            set-name: eth1

Your IP (and floating IP if in use) are available in your droplet info screen, and your IP is also shown at the bottom of the cloud console screen. (Add /16 to the end of these.)

Get the MAC addresses for eth0 and eth1 by running ifconfig (they're next to ether in the output block for each interface). If ifconfig isn't installed, ip link show lists them next to link/ether.

eth1 is the private network while eth0 is public.

Your gateway / gateway4 IP is also shown both in the droplet info screen and at the bottom of the cloud console screen. (No /16 needed here though!)
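Since typing YAML by hand in the cloud console is error prone, here's a minimal sketch that prints the eth0 part of the file from the values you've collected. The function name `render_netplan` and the example addresses are assumptions for illustration. It only covers eth0 and leaves out the nameservers and eth1 sections shown above, and it uses the /16 suffix from this post, so adjust it if your droplet's netmask differs.

```shell
#!/usr/bin/env bash

# Prints a minimal netplan config for eth0.
# Usage: render_netplan GATEWAY MAC IP [IP...]
# Every IP gets the /16 suffix used in this post; the gateway gets none.
render_netplan() {
    local gateway="$1" mac="$2"
    shift 2
    printf 'network:\n'
    printf '    version: 2\n'
    printf '    ethernets:\n'
    printf '        eth0:\n'
    printf '            addresses:\n'
    local ip
    for ip in "$@"; do
        printf '              - %s/16\n' "$ip"
    done
    printf '            gateway4: %s\n' "$gateway"
    printf '            match:\n'
    printf '                macaddress: %s\n' "$mac"
    printf '            set-name: eth0\n'
}
```

For example, `render_netplan 203.0.113.1 aa:bb:cc:dd:ee:ff 203.0.113.10 203.0.113.20 > /tmp/eth0.yaml` writes a scratch copy you can compare against your real file before editing it.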

Now you'll need to apply the changes with netplan, run:

sudo netplan apply --debug

With this, your server can now reach and be reached by the outside world again through your droplet's default IP, but not the floating IP yet.

At this point, I opened an ssh connection to continue as then I can paste commands faster.

In order to fully hook up your floating IP, you need to add your server's "Anchor IP". To look this up, run:

curl -s http://169.254.169.254/metadata/v1/interfaces/public/0/anchor_ipv4/address

With this, go back and edit your /etc/netplan/50-cloud-init.yaml file again and add another line under addresses for this anchor IP (with /16 after it again).
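If you want to avoid typos here, this sketch fetches the anchor IP and prints it as a ready-to-paste addresses line. The function names are hypothetical; the metadata URL is the one from the curl command above, and the indentation matches the example file in this post.

```shell
#!/usr/bin/env bash

# Fetches the anchor IP from the droplet metadata service (URL from this post).
fetch_anchor_ip() {
    curl -s --max-time 5 http://169.254.169.254/metadata/v1/interfaces/public/0/anchor_ipv4/address
}

# Formats an IPv4 address as a netplan address list entry with the /16 suffix.
# Returns non-zero if the input does not look like a dotted-quad address,
# which catches an empty reply from the metadata service.
netplan_address_entry() {
    local ip="$1"
    if ! printf '%s\n' "$ip" | grep -qE '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'; then
        echo "not an IPv4 address: $ip" >&2
        return 1
    fi
    printf '              - %s/16\n' "$ip"
}
```

On the droplet, `netplan_address_entry "$(fetch_anchor_ip)"` prints the line to add under addresses for eth0.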

Run netplan apply again and you should now be able to access your server via its floating IP.

sudo netplan apply

Last part: after doing the above, I found that my changes to this file would be lost after rebooting. So before rebooting, you'll want to copy this file to a couple of locations to make sure it's always applied (once for netplan as a backup and another for cloud-init).

sudo cp /etc/netplan/50-cloud-init.yaml /etc/netplan/01-netcfg.yaml  
sudo cp /etc/netplan/50-cloud-init.yaml /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg

Next, disable the network config changes in cloud-init by adding the line "network: {config: disabled}" to the following file:

sudo vim /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
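If you'd rather not open an editor for a one-line file, here's a small sketch that writes it for you. The one-line form cloud-init documents for this override is `network: {config: disabled}`; the function names below are my own.

```shell
#!/usr/bin/env bash

# Prints the cloud-init override that turns off its network configuration.
cloud_init_network_override() {
    printf 'network: {config: disabled}\n'
}

# Succeeds if the given file already contains the override line.
network_config_disabled() {
    grep -qxF 'network: {config: disabled}' "$1"
}

# Usage (writes the file named in this post):
#   cloud_init_network_override | sudo tee /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
```

Running `network_config_disabled /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg && echo ok` afterwards confirms the line is in place.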

Finally you can re-init cloud-init and netplan once more to be safe.

sudo cloud-init clean -r  
sudo netplan apply

You should now be safe to reboot your machine. It can still take a couple of minutes to come up due to cloud-init waiting for the old networking (I'm working on a solution for this too, but if you have a simple one please let me know), but it'll come back up fine, connected on both your droplet's default IP and floating IP.

PHP version issues

You may find after the upgrade that PHP has been upgraded too, e.g. if you've been running 7.2 with fpm, you might find that service is not running and 7.4 or 8.0 is running instead.
Here you can simply stop the version you don't want yet (if you're not ready to upgrade) and then start the service you want.

E.g.

sudo systemctl stop php8.0-fpm  
sudo systemctl start php7.2-fpm
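If you're not sure which php-fpm versions ended up installed, a sketch like this pulls the unit names out of systemctl's output. The helper name `php_fpm_units` is hypothetical, and the version numbers in your output will depend on what's on your server.

```shell
#!/usr/bin/env bash

# Reads `systemctl list-units` output on stdin and prints only the
# php-fpm unit names, one per line (e.g. php7.2-fpm.service).
php_fpm_units() {
    grep -oE 'php[0-9]+\.[0-9]+-fpm\.service'
}

# Usage:
#   systemctl list-units --type=service --all --no-legend 'php*-fpm.service' | php_fpm_units
```

The `--all` flag matters here, as it includes units that are loaded but not currently running.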

You can then check its status and logs with the following commands:

sudo systemctl status php7.2-fpm  
sudo journalctl -u php7.2-fpm  
less /var/log/php7.2-fpm.log

If it's all working, you can disable the previous service altogether, or even mask it to prevent it starting up in the future.

sudo systemctl disable php8.0-fpm  
sudo systemctl mask php8.0-fpm

If you do mask the service you can unmask it at a later date if you want to enable it in the future.

You might also want to change your local CLI version of php, e.g. roll it back to use 7.2 as default.

sudo update-alternatives --set php /usr/bin/php7.2

Resolving MySQL issues

You might also run into some issues if your mysql version upgrades.
On one of the servers I was upgrading, mysql went from 5.6 to v8.
It was a surprisingly easy upgrade in the end, but I did run into some issues starting and restarting it.

To debug I used 2 terminals: one to restart mysql and the other to watch the mysql error log for error lines.

tail -f /var/log/mysql/error.log

If mysql isn't coming back online you should see the error in the log output.
It'll look something like this:

2021-04-07T17:52:31.945816Z 0 [ERROR] [MY-000067] [Server] unknown variable 'query-cache-size=0'.  
2021-04-07T17:52:31.946389Z 0 [ERROR] [MY-010119] [Server] Aborting

The above was a real example from that server. In this case it turns out mysql v8 [removed a bunch of config params](https://dev.mysql.com/doc/refman/8.0/en/added-deprecated-removed.html), query-cache-size included, so the fix was to edit the mysql config file and comment out that line.

sudo vim /etc/mysql/my.cnf

In this case the query cache has been retired from mysql, so there's not a straightforward replacement for this config var. But in other cases it's worth exploring what the alternative is if a config variable has been dropped in favour of another. Google is your friend here.
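To find any other query cache settings before mysql trips over them one restart at a time, a rough sketch like this searches the config directory for them. The helper name `list_query_cache_options` is made up, and the /etc/mysql default is an assumption based on where Ubuntu's packages keep the config.

```shell
#!/usr/bin/env bash

# Lists active (not commented out) query cache settings in config files
# under a directory (default /etc/mysql), printed as file:line:content.
# Matches both dash and underscore spellings, e.g. query-cache-size
# and query_cache_type.
list_query_cache_options() {
    local dir="${1:-/etc/mysql}"
    grep -rnE '^[[:space:]]*query[-_]cache[-_a-z]*' "$dir" 2>/dev/null
}
```

Each line it prints is a candidate to comment out, the same way as the query-cache-size line above.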

Another common connection issue comes from the sql_mode needing to be set to blank or tweaked.
You may also need to set default-authentication-plugin = mysql_native_password to mimic how authentication worked before.

Node version issues

Some of our apps on these app servers ran using older node versions and though we plan to upgrade those, while upgrading the server OS you might not want to handle that separate upgrade at the same time.

That's where nvm comes in, the node version manager.

To install nvm, run the following command:

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash

Then edit where your node service starts up. For me, that's the systemd service files.

E.g. inside your app's service file "/etc/systemd/system/app-name-here.service"

If previously you had a line like:

ExecStart=node /srv/app-name/index.js

Update it to use the nvm init script and to run with your desired node version: (in this example I'm using the latest node v6 release)

ExecStart=bash -c ". /home/andrew/.nvm/nvm.sh && nvm run 6 /srv/app-name/index.js"

I hope this guide has helped you with your server upgrades. At the least it serves as a reminder to me of what I've done on mine, but if you have any comments or questions, let me know in the comments below.

Thanks for reading!