OpenShift 3.11 – Reduce hostSubnetLength without downtime

Have you ever installed OpenShift 3.11 and, once it went into production, found that it needed far more nodes than expected? And did it also happen that you could not extend the cluster network because your IT department would not give you any more IP addresses? I faced exactly this situation and realised that, although hostSubnetLength was set to 8 (reserving around 250 IP addresses for pods on each node), I never came close to using even half of them before a node was fully saturated.

In such a situation you may think (at least I did): why not reduce hostSubnetLength to 7? Each node would get 126 IP addresses for pods, which is still more than needed, and the number of possible nodes would double without expanding the IP range. Sounds straightforward, but it isn't.
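The arithmetic behind this idea can be checked with a small sketch. It assumes the 10.1.120.0/21 cluster network used later in this article; the function name is made up for illustration, and "usable" here simply means the subnet size minus the network and broadcast addresses.

```python
import ipaddress


def host_subnet_capacity(cluster_cidr, host_subnet_length):
    """Return (number of host subnets, usable addresses per host subnet)."""
    network = ipaddress.ip_network(cluster_cidr)
    # hostSubnetLength is the number of host bits given to each node
    host_prefix = 32 - host_subnet_length
    subnet_count = 2 ** (host_prefix - network.prefixlen)
    usable_ips = 2 ** host_subnet_length - 2  # minus network and broadcast
    return subnet_count, usable_ips


for length in (8, 7):
    count, ips = host_subnet_capacity("10.1.120.0/21", length)
    print(f"hostSubnetLength={length}: {count} node subnets, {ips} usable addresses each")
```

With these inputs the /21 holds 8 node subnets at length 8 and 16 at length 7, which is where the doubling comes from.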

According to the OpenShift documentation, this value cannot be changed after installation!

Searching the internet high and low did not turn up a satisfying answer, so here is how I did it (even without downtime, as my cluster is a multi-master environment).

Disclaimer

This procedure is not officially supported by Red Hat and may render your cluster unusable. Make sure to test it on a non-production cluster first. You should also have a fully recoverable backup at hand, should things go terribly wrong! There are some troubleshooting hints at the end of this article, but make sure you read it completely and understand what is being done before continuing! You have been warned.

First, to record the initial state, run oc adm diagnostics networkcheck on any master node and save the results. This makes sure there are no network problems in the cluster before we even begin.

Now adjust the configuration in your installation Ansible inventory, so that it stays in sync should you need to re-run some of the playbooks in the future (for example to scale up the cluster with new nodes). This is one of the simplest steps: just replace the old osm_host_subnet_length value with the new, smaller one.

os_sdn_network_plugin_name=redhat/openshift-ovs-networkpolicy
openshift_portal_net=10.1.128.0/23
osm_cluster_network_cidr=10.1.120.0/21
osm_host_subnet_length=7

Then save your current clusternetwork (just in case things go wrong):

[root@master1 ~]# oc get clusternetwork
NAME      CLUSTER NETWORKS    SERVICE NETWORK   PLUGIN NAME
default   10.1.120.0/21:8   10.1.128.0/23   redhat/openshift-ovs-networkpolicy
[root@master1 ~]# oc get clusternetwork -o yaml > clusternetwork-backup.yml
[root@master1 ~]#

Then, on all master nodes, edit /etc/origin/master/master-config.yaml and set the new hostSubnetLength in this section:

networkConfig:
  clusterNetworks:
  - cidr: 10.1.120.0/21
    hostSubnetLength: 7
  externalIPNetworkCIDRs:
  - 0.0.0.0/0
  networkPluginName: redhat/openshift-ovs-networkpolicy
  serviceNetworkCIDR: 10.1.128.0/23

After editing the file on all master nodes, delete the current default cluster network and restart the master API and controllers on each master node. Pause between nodes so that the restarted API and controllers can start up fully. The new cluster network definition should then show up:

[root@master1 ~]# oc delete clusternetwork --all
clusternetwork.network.openshift.io "default" deleted
[root@master1 ~]# master-restart api; master-restart controllers
2
2
[root@master2 ~]# master-restart api; master-restart controllers
2
2
[root@master3 ~]# master-restart api; master-restart controllers
2
2
[root@master1 ~]# oc get clusternetwork
NAME      CLUSTER NETWORKS    SERVICE NETWORK   PLUGIN NAME
default   10.1.120.0/21:7   10.1.128.0/23   redhat/openshift-ovs-networkpolicy

This still does not change the hostsubnet of existing hosts (note the /24 mask). To do that, you need to change the hostsubnet objects:

[root@master1 ~]# oc get hostsubnet
NAME          HOST          HOST IP       SUBNET          EGRESS CIDRS   EGRESS IPS
infra1.lan    infra1.lan    10.1.118.15   10.1.123.0/24   []             []
infra2.lan    infra2.lan    10.1.118.16   10.1.126.0/24   []             []
infra3.lan    infra3.lan    10.1.118.17   10.1.124.0/24   []             []
master1.lan   master1.lan   10.1.118.10   10.1.122.0/24   []             []
master2.lan   master2.lan   10.1.118.11   10.1.121.0/24   []             []
master3.lan   master3.lan   10.1.118.12   10.1.120.0/24   []             []
node01.lan    node01.lan    10.1.118.20   10.1.125.0/24   []             []
node02.lan    node02.lan    10.1.118.21   10.1.127.0/24   []             []

However, you cannot do this via oc edit hostsubnet, as OpenShift will not allow it. Instead, first create a YAML dump of the current objects and edit it with your preferred text editor:

[root@master1 ~]# oc get hostsubnet -o yaml > hostsubnet.yaml
[root@master1 ~]# vi hostsubnet.yaml

Edit the file and change every subnet: 10.1.xxx.0/24 entry to a smaller subnet of the original one. Here that just means changing /24 to /25. Make sure that the new, smaller subnet lies inside the previous, bigger one, as the cluster will be restarted in a rolling fashion. (Theoretically it should be possible to move to an entirely new address space; in that case I would first edit master-config on all master nodes and shut down all nodes except one master. This has not been tested or even tried, so feel free to try it on a test system.)
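If you prefer not to edit the dump by hand, the change can be scripted. The following is only a sketch: it treats the file as plain text, assumes every relevant line has the unquoted form subnet: <cidr> as in the dump above, and refuses any replacement that does not lie inside the old subnet. The function name and the sample snippet are made up for illustration.

```python
import ipaddress
import re

# Matches lines such as "  subnet: 10.1.123.0/24" and keeps the indentation
SUBNET_LINE = re.compile(r"^(\s*subnet:\s*)(\S+)\s*$")


def shrink_subnets(yaml_text, new_prefix=25):
    """Rewrite every 'subnet:' line to the first /new_prefix of the old subnet."""
    out = []
    for line in yaml_text.splitlines():
        match = SUBNET_LINE.match(line)
        if match:
            old = ipaddress.ip_network(match.group(2))
            new = ipaddress.ip_network(f"{old.network_address}/{new_prefix}")
            # The new subnet must stay inside the old one (rolling restart)
            if not new.subnet_of(old):
                raise ValueError(f"{new} is not inside {old}")
            line = f"{match.group(1)}{new}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical excerpt of hostsubnet.yaml, reduced to the relevant fields
sample = """\
- apiVersion: network.openshift.io/v1
  kind: HostSubnet
  host: infra1.lan
  hostIP: 10.1.118.15
  subnet: 10.1.123.0/24
"""
print(shrink_subnets(sample), end="")
```

Review the rewritten file with a diff against the original dump before applying it, and keep the untouched dump as a backup.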

Now the tricky part: how do you update non-updatable hostsubnet entries? Re-create them.

[root@master1 ~]# oc delete hostsubnet --all
hostsubnet.network.openshift.io "infra1.lan" deleted
hostsubnet.network.openshift.io "infra2.lan" deleted
hostsubnet.network.openshift.io "infra3.lan" deleted
hostsubnet.network.openshift.io "master1.lan" deleted
hostsubnet.network.openshift.io "master2.lan" deleted
hostsubnet.network.openshift.io "master3.lan" deleted
hostsubnet.network.openshift.io "node01.lan" deleted
hostsubnet.network.openshift.io "node02.lan" deleted
[root@master1 ~]# oc apply -f hostsubnet.yaml
hostsubnet.network.openshift.io/infra1.lan created
hostsubnet.network.openshift.io/infra2.lan created
hostsubnet.network.openshift.io/infra3.lan created
hostsubnet.network.openshift.io/master1.lan created
hostsubnet.network.openshift.io/master2.lan created
hostsubnet.network.openshift.io/master3.lan created
hostsubnet.network.openshift.io/node01.lan created
hostsubnet.network.openshift.io/node02.lan created
[root@master1 ~]# oc get hostsubnet
NAME          HOST          HOST IP       SUBNET          EGRESS CIDRS   EGRESS IPS
infra1.lan    infra1.lan    10.1.118.15   10.1.123.0/25   []             []
infra2.lan    infra2.lan    10.1.118.16   10.1.126.0/25   []             []
infra3.lan    infra3.lan    10.1.118.17   10.1.124.0/25   []             []
master1.lan   master1.lan   10.1.118.10   10.1.122.0/25   []             []
master2.lan   master2.lan   10.1.118.11   10.1.121.0/25   []             []
master3.lan   master3.lan   10.1.118.12   10.1.120.0/25   []             []
node01.lan    node01.lan    10.1.118.20   10.1.125.0/25   []             []
node02.lan    node02.lan    10.1.118.21   10.1.127.0/25   []             []

Now it is time to restart each node in the cluster so that docker starts with the new configuration. To avoid downtime this has to be done one node at a time, waiting for each node to come back up fully before draining the next one. I would suggest starting with the master nodes and then continuing with the remaining nodes, but you may just as well start with the worker nodes and finish with the masters. Make sure you have enough nodes of each role, so that after draining any node there is still enough capacity left in the cluster. Otherwise partial downtime may occur. Restart the nodes one by one, following this procedure for each:

1. oc adm drain master1 --ignore-daemonsets --delete-local-data
2. log in to master1 and run shutdown -r now
3. wait until the node has rebooted and oc get node shows it as Ready,SchedulingDisabled
4. re-enable the node with oc adm uncordon master1
5. repeat steps 1-4 for the other master nodes
6. repeat steps 1-4 for all other nodes
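To keep track of the order on a larger cluster, the procedure can be written out as a plan. This sketch only prints the commands and does not run anything. The ssh line assumes root SSH access to the nodes, the function names are made up, and the node names are placeholders that must match what oc get node shows in your cluster.

```python
def restart_steps(node):
    """Return the four manual steps of the procedure for one node."""
    return [
        f"oc adm drain {node} --ignore-daemonsets --delete-local-data",
        f"ssh root@{node} shutdown -r now  # assumption: root SSH access to the node",
        f"oc get node {node}  # repeat until STATUS is Ready,SchedulingDisabled",
        f"oc adm uncordon {node}",
    ]


def ready_to_uncordon(status):
    """True if the STATUS column of 'oc get node' reads Ready,SchedulingDisabled."""
    conditions = status.split(",")
    return "Ready" in conditions and "SchedulingDisabled" in conditions


def rolling_plan(masters, others):
    """Masters first, then all other nodes, strictly one node at a time."""
    plan = []
    for node in list(masters) + list(others):
        plan.extend(restart_steps(node))
    return plan


for step in rolling_plan(["master1", "master2", "master3"], ["infra1", "node01"]):
    print(step)
```

Printing the plan instead of executing it is deliberate: each reboot should be confirmed by a human before the next node is drained.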

After the restart, you should not be able to run more than 126 pods on each node. You may want to edit the node configuration to specify a new maxpods value. You can test the limit by starting many small pods with a node selector for a given node until you hit an error like:
failed to allocate for range 0: no IP addresses available in range set: 10.1.124.1-10.1.124.126

After this operation you should check the cluster for stability, especially inter-node connectivity. This can be done by running oc adm diagnostics networkcheck from a master node. It should not report any errors or warnings.

You should now also be able to add new nodes to a cluster that until now seemed impossible to extend because there were not enough host subnets left in the cluster network. It would simply allocate the 10.1.xxx.128/25 networks for up to 8 additional nodes. I hope this helps; at least this is how I managed to extend such a cluster without having to allocate new address ranges.
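Which subnets are still free can be worked out from the hostsubnet listing. The sketch below assumes the eight /25 subnets shown earlier are the only ones allocated; the function name is made up, and it only lists what is free, not the order in which the SDN would hand the subnets out.

```python
import ipaddress


def free_host_subnets(cluster_cidr, host_subnet_length, allocated):
    """List host subnets of the cluster network that overlap no allocated subnet."""
    cluster = ipaddress.ip_network(cluster_cidr)
    used = [ipaddress.ip_network(cidr) for cidr in allocated]
    candidates = cluster.subnets(new_prefix=32 - host_subnet_length)
    return [c for c in candidates if not any(c.overlaps(u) for u in used)]


# The eight existing nodes hold 10.1.120.0/25 through 10.1.127.0/25
allocated = [f"10.1.{octet}.0/25" for octet in range(120, 128)]

for subnet in free_host_subnets("10.1.120.0/21", 7, allocated):
    print(subnet)
```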

Possible troubleshooting

If, after performing these steps, you observe problems inside the cluster that did not exist before, especially with the SDN or inter-node connectivity, you may try to re-create all SDN pods by running these commands:

[root@master1 ~]# oc delete pod -n openshift-sdn -l app=sdn
[root@master1 ~]# oc delete pod -n openshift-sdn -l app=ovs

After adding new nodes, make sure that they got subnets that do not overlap! Should this happen, you may need to edit the hostsubnet objects again so that they no longer overlap, and restart the affected nodes. If inter-node communication problems still occur after correcting the hostsubnets, re-create the sdn/ovs pods as shown above. This should not happen, but it happened to me once, maybe because I re-applied the saved hostsubnet file after adding nodes. It is worth checking, however.
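Checking for overlaps by eye gets tedious as the cluster grows, so here is a small sketch of an automated check. The input is a host-to-subnet mapping that you would fill in from oc get hostsubnet; the function name is made up, and node03.lan and node04.lan are hypothetical entries included only to show what a conflict looks like.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(subnets):
    """Return all pairs of hosts whose subnets overlap."""
    nets = {host: ipaddress.ip_network(cidr) for host, cidr in subnets.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


hostsubnets = {
    "node02.lan": "10.1.127.0/25",
    "node03.lan": "10.1.127.128/25",  # hypothetical new node, no conflict
    "node04.lan": "10.1.127.0/24",    # hypothetical stale /24 entry
}

for a, b in overlapping_pairs(hostsubnets):
    print(f"overlap: {a} and {b}")
```

An empty result means every node has its own range; any reported pair points at the hostsubnet objects to correct.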

Should this not help, you can always roll back the changes by repeating this procedure with the previous hostSubnetLength.
