Have you ever installed OpenShift 3.11 and found, once it went into production, that it needed far more new nodes than expected? And has it happened to you that you could not extend the cluster network any further because your IT department would not give you even more IP addresses? I faced exactly that situation and realised that, although I had set hostSubnetLength to 8 (reserving about 250 pod IPs on each node), I never came even close to using half of them before a node was fully saturated.
In such a situation you may think (at least I did): why not reduce hostSubnetLength to 7, giving each node 126 pod IP addresses, which would still be more than enough? At the same time it would double the possible number of nodes without expanding the IP range... Sounds straightforward, but it isn't.
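To put numbers on the idea: the cluster network is carved into one subnet per node, so the node capacity is 2^(32 - cluster prefix - hostSubnetLength). A quick sanity check with shell arithmetic, using this cluster's 10.1.120.0/21 network (values illustrative):

```shell
# Host subnets available in a /21 cluster network:
# 2^(32 - 21 - hostSubnetLength)
echo $(( 1 << (32 - 21 - 8) ))   # hostSubnetLength=8 -> 8 nodes, a /24 each
echo $(( 1 << (32 - 21 - 7) ))   # hostSubnetLength=7 -> 16 nodes, a /25 each
```

So halving each node's pod subnet doubles the number of nodes the same cluster network can hold.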
According to the OpenShift documentation, it is not possible to change this after the installation!
Well, googling far and wide did not turn up a satisfying answer, so here is how I did it (even without downtime, as my cluster is a multi-master environment).
This procedure is not officially supported by Red Hat and may render your cluster unusable. Make sure to test it on a non-production cluster first. You should also have a fully recoverable backup at hand, should things go terribly wrong! There are some troubleshooting hints at the end of this article, but read it completely and understand what is being done before continuing. You have been warned.
First, to record the initial state, run and save the results of
oc adm diagnostics networkcheck
on any master node, to make sure there are no network problems in the cluster before we even begin.
Now adjust the configuration in your installation Ansible inventory, to stay in sync should you need to re-run some of the playbooks in the future (for example when scaling up the cluster with new nodes). This is one of the simplest steps: just replace the old
osm_host_subnet_length value with the new (smaller) one.
os_sdn_network_plugin_name=redhat/openshift-ovs-networkpolicy
openshift_portal_net=10.1.128.0/23
osm_cluster_network_cidr=10.1.120.0/21
osm_host_subnet_length=7
Then save your current clusternetwork (just in case things go wrong):
[root@master1 ~]# oc get clusternetwork
NAME      CLUSTER NETWORKS   SERVICE NETWORK   PLUGIN NAME
default   10.1.120.0/21:8    10.1.128.0/23     redhat/openshift-ovs-networkpolicy
[root@master1 ~]# oc get clusternetwork -o yaml > clusternetwork-backup.yml
Then, on all master nodes, edit
/etc/origin/master/master-config.yaml, in particular this section, to specify the new hostSubnetLength:

networkConfig:
  clusterNetworks:
  - cidr: 10.1.120.0/21
    hostSubnetLength: 7
  externalIPNetworkCIDRs:
  - 0.0.0.0/0
  networkPluginName: redhat/openshift-ovs-networkpolicy
  serviceNetworkCIDR: 10.1.128.0/23
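One way to make the same edit consistently on every master is a simple sed substitution. The snippet below is a sketch that demonstrates the substitution on a scratch copy; in practice you would run it (after taking a backup) against /etc/origin/master/master-config.yaml on each master:

```shell
# Demo on a scratch copy; the real file is /etc/origin/master/master-config.yaml
# on every master node. Back it up before editing.
cat > /tmp/master-config-demo.yaml <<'EOF'
networkConfig:
  clusterNetworks:
  - cidr: 10.1.120.0/21
    hostSubnetLength: 8
EOF
sed -i 's/hostSubnetLength: 8/hostSubnetLength: 7/' /tmp/master-config-demo.yaml
grep hostSubnetLength /tmp/master-config-demo.yaml
```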
After editing it on all master nodes, delete the current default cluster network and restart the API and controllers on each master node. Pause between nodes to allow the restarted API and controllers to come up fully. The new cluster network definition should then show:
[root@master1 ~]# oc delete clusternetwork --all
clusternetwork.network.openshift.io "default" deleted
[root@master1 ~]# master-restart api; master-restart controllers
2
2
[root@master2 ~]# master-restart api; master-restart controllers
2
2
[root@master3 ~]# master-restart api; master-restart controllers
2
2
[root@master1 ~]# oc get clusternetwork
NAME      CLUSTER NETWORKS   SERVICE NETWORK   PLUGIN NAME
default   10.1.120.0/21:7    10.1.128.0/23     redhat/openshift-ovs-networkpolicy
This still does not change the host subnets of existing hosts (note the /24 masks below). To do that, you need to change the hostsubnet objects:
[root@master1 ~]# oc get hostsubnet
NAME          HOST          HOST IP       SUBNET          EGRESS CIDRS   EGRESS IPS
infra1.lan    infra1.lan    10.1.118.15   10.1.123.0/24
infra2.lan    infra2.lan    10.1.118.16   10.1.126.0/24
infra3.lan    infra3.lan    10.1.118.17   10.1.124.0/24
master1.lan   master1.lan   10.1.118.10   10.1.122.0/24
master2.lan   master2.lan   10.1.118.11   10.1.121.0/24
master3.lan   master3.lan   10.1.118.12   10.1.120.0/24
node01.lan    node01.lan    10.1.118.20   10.1.125.0/24
node02.lan    node02.lan    10.1.118.21   10.1.127.0/24
You cannot, however, do this via
oc edit hostsubnet, as OpenShift will not allow it. Instead, first create a YAML dump of the current objects and edit it with your preferred text editor:
[root@master1 ~]# oc get hostsubnet -o yaml > hostsubnet.yaml
[root@master1 ~]# vi hostsubnet.yaml
Edit the file and change every
subnet: 10.1.xxx.0/24 entry to a smaller subset of that subnet; here that just means changing /24 to
/25. Make sure each new, smaller subnet lies inside the previous, bigger one, because the cluster will be restarted in a rolling fashion. (Theoretically it should be possible to switch to an entirely new address space, but for that I would first edit master-config on all master nodes and shut down all nodes except one master; this has not been tested or even tried. Feel free to try it on a test system.)
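Since the change is the same on every line, sed can do the editing for you. The snippet below is a sketch on a scratch file with sample data; in practice you would point it at the hostsubnet.yaml you just dumped:

```shell
# Demo: shrink every node subnet from /24 to /25 in the dumped file.
# The sample entries mimic the subnet: lines in a real hostsubnet.yaml.
cat > /tmp/hostsubnet-demo.yaml <<'EOF'
  subnet: 10.1.123.0/24
  subnet: 10.1.126.0/24
EOF
sed -i 's#\(subnet: 10\.1\.[0-9]\{1,3\}\.0\)/24#\1/25#' /tmp/hostsubnet-demo.yaml
cat /tmp/hostsubnet-demo.yaml
```

The capture group keeps the network address untouched and only rewrites the prefix length, so the new /25 is guaranteed to sit inside the old /24.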
Now the tricky part: how do you update the non-updateable hostSubnet entries? Re-create them.
[root@master1 ~]# oc delete hostsubnet --all
hostsubnet.network.openshift.io "infra1.lan" deleted
hostsubnet.network.openshift.io "infra2.lan" deleted
hostsubnet.network.openshift.io "infra3.lan" deleted
hostsubnet.network.openshift.io "master1.lan" deleted
hostsubnet.network.openshift.io "master2.lan" deleted
hostsubnet.network.openshift.io "master3.lan" deleted
hostsubnet.network.openshift.io "node01.lan" deleted
hostsubnet.network.openshift.io "node02.lan" deleted
[root@master1 ~]# oc apply -f hostsubnet.yaml
hostsubnet.network.openshift.io/infra1.lan created
hostsubnet.network.openshift.io/infra2.lan created
hostsubnet.network.openshift.io/infra3.lan created
hostsubnet.network.openshift.io/master1.lan created
hostsubnet.network.openshift.io/master2.lan created
hostsubnet.network.openshift.io/master3.lan created
hostsubnet.network.openshift.io/node01.lan created
hostsubnet.network.openshift.io/node02.lan created
[root@master1 ~]# oc get hostsubnet
NAME          HOST          HOST IP       SUBNET          EGRESS CIDRS   EGRESS IPS
infra1.lan    infra1.lan    10.1.118.15   10.1.123.0/25
infra2.lan    infra2.lan    10.1.118.16   10.1.126.0/25
infra3.lan    infra3.lan    10.1.118.17   10.1.124.0/25
master1.lan   master1.lan   10.1.118.10   10.1.122.0/25
master2.lan   master2.lan   10.1.118.11   10.1.121.0/25
master3.lan   master3.lan   10.1.118.12   10.1.120.0/25
node01.lan    node01.lan    10.1.118.20   10.1.125.0/25
node02.lan    node02.lan    10.1.118.21   10.1.127.0/25
Now it is time to restart every node in the cluster so that Docker starts with the new configuration. To avoid downtime, do this one node at a time, waiting for all nodes to be fully up before draining the next one. I suggest starting with the master nodes and then doing the remaining nodes, but you may just as well start with the worker nodes and finish with the masters. Make sure you have enough nodes of each role so that, after draining any single node, there is still spare capacity in the cluster; otherwise partial downtime may occur. Restart the nodes one by one, following this procedure for each:
1. Drain the node:
   oc adm drain master1 --ignore-daemonsets --delete-local-data
2. Log in to master1 and run
   shutdown -r now
3. Wait until the node has rebooted and
   oc get nodes shows the node as Ready
4. Re-enable the node with
   oc adm uncordon master1
5. Repeat steps 1-4 for the other master nodes
6. Repeat steps 1-4 for all other nodes
After the restarts, you should not be able to run more than 126 pods on any node. You may want to edit the node configuration to specify a matching new maxpods value. You can test the limit by starting many small pods with a node-selector for a given node until you reach a scheduling error like:
failed to allocate for range 0: no IP addresses available in range set: 10.1.124.1-10.1.124.126
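The 126 in that range is exactly what the /25 mask predicts; only the network and broadcast addresses of each per-node subnet are excluded from the pod range:

```shell
# Usable pod IPs in a /25: 128 addresses minus network and broadcast.
echo $(( (1 << (32 - 25)) - 2 ))   # -> 126, matching 10.1.124.1-10.1.124.126
```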
After this operation you should check the cluster for stability, especially inter-node connectivity. This can be done by running the
oc adm diagnostics networkcheck command from a master node. It should report no errors or warnings.
You should now also be able to add new nodes to a cluster that previously seemed unextendable for lack of free host subnets inside the cluster network. OpenShift will simply allocate the
10.1.xxx.128/25 networks for up to 8 additional nodes. I hope this helps you; at least this is how I managed to extend such a cluster without allocating new address ranges.
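Assuming the third octets from the hostsubnet listing above, the freed upper halves of the former /24 subnets (illustrative, your octets will differ) can be listed like this:

```shell
# The upper half of each former /24 is now free for a new node's /25.
# Third octets taken from this cluster's 10.1.120.0/21 range (example values).
for o in 120 121 122 123 124 125 126 127; do
  echo "10.1.$o.128/25"
done
```

Eight freed /25 subnets, hence room for up to 8 additional nodes.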
If, after performing these steps, you observe problems inside the cluster (that did not exist before), especially with the SDN or inter-node connectivity, you may try re-creating all SDN pods by running these commands:
[root@master1 ~]# oc delete pod -n openshift-sdn -l app=sdn
[root@master1 ~]# oc delete pod -n openshift-sdn -l app=ovs
After adding new nodes, make sure the subnets they received do not overlap! Should this happen, you may need to re-edit the hostsubnet objects so that they no longer overlap and restart the affected nodes. If inter-node communication problems persist after correcting the hostsubnets, re-create the SDN/OVS pods as shown above. This should not normally happen, but it did happen once for me, perhaps because I re-applied the saved hostsubnet file after adding nodes. It is worth checking in any case.
Should this not help, you can always roll back the changes by repeating this procedure with the previous hostSubnetLength.