Thursday, April 30, 2026

Cilium and Kubernetes - Caveats / Concepts

Cilium is a widely used open-source solution that integrates into the Container Network Interface (CNI) hooks of Kubernetes to provide software-based load balancing and SSL processing. This allows generic computing horsepower to be used in lieu of the expensive, proprietary hardware more commonly used for these functions in years prior. While the capabilities and lower financial costs of Cilium are attractive, Cilium imposes a significant learning curve and has a few technical quirks that make it harder to confirm that a configuration is correct for the desired solution.

The differences between hardware-based and software-based load balancing and SSL processing, installation of Cilium and use of ARP versus BGP access patterns are provided in these related posts:

Cilium and Kubernetes - Caveats / Concepts
Cilium and Kubernetes - Installing Cilium Within Kubernetes
Cilium and Kubernetes - Configuring SSL and Load Balancing
Cilium and Kubernetes - Externally Accessing Services via ARP
Cilium and Kubernetes - Externally Accessing Services via BGP

Important Cilium Caveats

After eventually resolving specific quirks with the implementation of Cilium, use of Cilium for load balancing and SSL / TLS processing within Kubernetes is relatively straightforward in hindsight. However, the Cilium capabilities come with two CRUCIAL caveats. One is related to a documented bug within Cilium which has not been fixed as of version 1.19.3 (April 30, 2026) and has no specific timeline for development / resolution. The second involves a likely flaw with FRR (FRRouting, also known as Free Range Routing) used in illustrating BGP based access to services deployed using Cilium. The SHORT descriptions of these two flaws are provided below.

Missing labels on auto-generated services for Gateways -- When a Gateway is defined for SSL processing, Cilium generates a second virtual load balancer Service object. However, this second Service is not created with the labels applied to the parent Gateway, and those labels are what trigger advertisement of the assigned virtual IP when using ARP or BGP. This makes the Gateway VIP unreachable from outside the cluster, defeating the very purpose of using Cilium.

WORKAROUND -- This problem can be solved by manually applying the required key=value label to the auto-generated Service resource after Cilium creates it.

Use of FRR (FRRouting) in virtualized environments can encounter IP forwarding glitches -- Cilium provides its own code that implements a BGP router process within Kubernetes. External to the cluster, either a real router with BGP capabilities or a virtual router must speak BGP to learn routes from the Kubernetes cluster for advertised IPs. FRR is a common open-source BGP router implementation that runs atop Linux. When FRR runs on a virtual machine and peers with a Cilium BGP router that is also running on a virtual machine, the Linux server running FRR may fail to properly forward incoming traffic towards the Kubernetes cluster. At the same time, test traffic originated WITHIN that Linux FRR host will function correctly.

WORKAROUND -- This problem can be mitigated by running the FRR instance on a physical Linux host rather than a VM. The problem is likely due to too many layers of nested virtualized Ethernet packet handling. Running FRR on a physical Linux host exhibits no issues accepting incoming traffic and forwarding it correctly into the Kubernetes BGP router instances.
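
As a rough sketch of the first workaround above (the missing label on the auto-generated Service), the labeling step might be wrapped as shown here. Several things are assumptions for illustration only: the Service name pattern cilium-gateway-<Gateway name>, the namespace default, the Gateway name cdapi-gateway and the label advertise=bgp. The label actually required is whatever key=value pair the local ARP or BGP advertisement configuration selects on. The function only prints the command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: re-apply the advertisement label to the Service Cilium generates for a Gateway.

label_gateway_service() {
  # $1 = namespace, $2 = Gateway name, $3 = key=value label to apply
  local ns="$1" gateway="$2" label="$3"
  # ASSUMPTION: the generated Service is named "cilium-gateway-<Gateway name>".
  local svc="cilium-gateway-$gateway"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    # Default: show the command without touching the cluster.
    echo "kubectl -n $ns label service $svc $label --overwrite"
  else
    kubectl -n "$ns" label service "$svc" "$label" --overwrite
  fi
}

# Hypothetical namespace, Gateway name and label.
label_gateway_service default cdapi-gateway advertise=bgp
```

Because the label is applied by hand after Cilium creates the Service, it is worth repeating the step any time the Gateway is deleted and re-created.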

Overall, Cilium is relatively easy to deploy, undeploy, re-deploy, ad nauseam. However, partly due to its nature and that of Kubernetes in general, the process of getting a service deployed on a Kubernetes + Cilium platform can be INCREDIBLY error prone unless one works from a very formal checklist to make sure all requirements have been addressed. This document will provide a visual image illustrating all of these capabilities:

defining distinct IP ranges for pods deployed to dev and prod environments
defining distinct IP ranges for virtual IPs deployed to dev and prod environments
using unique ConfigMap entries for dev and prod environments
using unique SSL cert/key files for dev and prod environments
making hostname distinctions in URLs for HTTPRoute objects
configuring distinct dev / prod Gateway objects with SSL termination

If Kubernetes is ONLY being used in a hobbyist scenario, virtually none of this complexity is required or warranted. However, if you are learning Kubernetes as part of work requirements, it is likely ANY use of Kubernetes will require these types of operational distinctions to be understood and incorporated into local practices.

Traditional Hardware Based Load Balancing and SSL

Understanding the configuration and use of Cilium is easier after first comparing a traditional hardware based service topology to a Cilium-based footprint within Kubernetes. The approach with Cilium and Kubernetes is replicating each task required via hardware but doing so using dynamically assigned resources. A single administrator can drive all of the assignments but that requires a single administrator to UNDERSTAND all of the steps that were often handled by multiple people in a hardware driven environment.

In this example, a Java Spring Boot web service that accesses financial CD information is being deployed to run in three instances behind a load balancer which not only balances the traffic but handles all SSL processing that encrypts client traffic prior to executing the core service.

In a hardware based deployment, the topology might look like this:

In this physical topology,

The web service runs as a JVM on three physical hosts on the arbitrary subnet 192.168.101.0/24
One or more routers connect this host segment to another segment housing a load balancer appliance - a physical system commonly using ASICs to accelerate balancing decisions and handle decryption and encryption of traffic using SSL / TLS keys.
The load balancer's configuration for this service includes the public and private key pair associated with the hostname api.mdhlabs.com
Actual clients accessing the web service behind api.mdhlabs.com exist on another subnet 192.168.99.0/24 which might be one or more router hops away from the load balancer.

In order to turn up this configuration,

A load balancer appliance had to be purchased and installed, at a cost which might easily surpass $100,000 per appliance for high-capacity units.
A security administrator had to generate or purchase the public / private keys for api.mdhlabs.com
Server administrators had to assign IPs for the three physical hosts, turn up those hosts and deploy the JAR file and JVM to run the three instances of the service.
The load balancer administrator had to assign an IP for the virtual IP of api.mdhlabs.com
The load balancer administrator and network administrator had to ensure the IP range used for the virtual IPs was routed within all of these networks and reachable from the 192.168.99.0/24 subnet.
The load balancer administrator had to coordinate with the application administrators to identify what health checks to perform on the service to eliminate dead members from the pool to avoid outages, etc.

Software Based Load Balancing and SSL

Use of Cilium within Kubernetes to perform the "layer 7" functions of load balancing and SSL termination eliminates the need for expensive appliance hardware by spreading the processing work among all of the compute nodes within the Kubernetes cluster. However, virtually all of the same administrative tasks are still required. They are just performed in a different way.

The topology below reflects the typical Deployment, Service, HTTPRoute and Gateway objects configured within Kubernetes to implement load balancing and SSL termination.

In this virtual software-based topology,

A deployment defines a ReplicaSet that operates three instances of the web service as pods on the three Kubernetes nodes.
A service defines a single load balancer IP in front of the three service pods that evenly distributes traffic to the unencrypted endpoints. This virtual IP is automatically assigned from a previously configured pool of IP addresses.
An HTTPRoute object identifies hostname references and URI paths that should be steered to the Service load balancer.
A Gateway object identifies SSL keys and expected hostnames that should be used to decode incoming external traffic that can then be routed by the HTTPRoute into the Service into the Pods. A second virtual IP is automatically assigned to this endpoint from a predefined range.
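
To make the last two objects in this list more concrete, the following sketch writes an illustrative Gateway and HTTPRoute manifest to a file for review. It is not the configuration used later in this series. The object names (cdapi-gateway, cdapi-route, cdapi-service), the Secret name api-mdhlabs-tls and the backend port 8080 are hypothetical placeholders; only the hostname api.mdhlabs.com comes from the example above. The gatewayClassName of cilium assumes Cilium was installed with its Gateway API support enabled.

```shell
#!/usr/bin/env bash
# Sketch: generate example Gateway + HTTPRoute manifests (names are hypothetical).
cat > gateway-sketch.yaml <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cdapi-gateway              # hypothetical name
spec:
  gatewayClassName: cilium         # assumes Cilium's Gateway API support is enabled
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: api.mdhlabs.com      # hostname the TLS key pair was issued for
    tls:
      mode: Terminate              # decrypt here, forward plain HTTP to the Service
      certificateRefs:
      - kind: Secret
        name: api-mdhlabs-tls      # hypothetical TLS Secret holding cert + key
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: cdapi-route                # hypothetical name
spec:
  parentRefs:
  - name: cdapi-gateway            # attach this route to the Gateway above
  hostnames:
  - api.mdhlabs.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: cdapi-service          # hypothetical Service in front of the three pods
      port: 8080                   # hypothetical unencrypted service port
EOF
echo "wrote gateway-sketch.yaml"
```

The file can be inspected and adapted before anything is applied to a cluster with kubectl apply -f gateway-sketch.yaml.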

In order to turn up this configuration,

No hardware load balancer was required -- all processing is performed by the hosts within the Kubernetes cluster.
The Kubernetes administrator had to coordinate with the network administrator to pick a pool of IP addresses to automatically allocate within Cilium.
IP addresses for the pods are automatically assigned.
The Kubernetes administrator had to obtain the same public / private SSL keys for api.mdhlabs.com
The Kubernetes administrator had to coordinate with the application administrators to identify what health checks to perform on the service to eliminate dead members from the pool to avoid outages, etc.
The Kubernetes administrator and application administrator had to define the Service, HTTPRoute and Gateway resources with additional labels that drive advertisement of the assigned virtual IPs into the larger network to allow client access.

Use of Cilium or similar "layer 7" tools within Kubernetes involves addressing one final networking problem -- ensuring access to the resulting service from outside the cluster. Cilium provides two approaches for doing this:

Advertising via ARP -- This approach requires the virtual IPs to be assigned from the same subnet used for the physical Kubernetes hosts. The nodes then answer any ARP request attempting to find a MAC for the active virtual IP then process any traffic sent to that VIP.
Advertising via BGP -- This approach uses the BGP routing protocol to advertise host routes (e.g. 192.168.77.128/32) for each virtual IP to routers outside the Kubernetes cluster which use them to route traffic to the VIPs into the cluster for processing.

The ARP approach is somewhat less complicated but poses more potential for administrative mistakes if IPs meant for virtual IPs are accidentally assigned to new physical hosts on the network segment serving the physical Kubernetes hosts. The BGP approach requires more expertise at network design and routing protocols but makes it easier to avoid assignment conflicts between VIPs and physical hosts. Each approach requires different configuration options to be supplied to Cilium at installation. However, these configuration options are not mutually exclusive. Both approaches can be used in a single Kubernetes cluster simultaneously without conflict.
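
The subnet rule that separates the two approaches can be illustrated with a small shell sketch. It simply tests whether a candidate VIP falls inside the subnet used by the physical Kubernetes hosts: if it does, ARP advertisement is possible, otherwise the VIP has to be reached via a BGP route. The VIP 192.168.99.200 is a made-up example; the subnet 192.168.99.0/24 and the address 192.168.77.128 are the ones used in this series.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a VIP can be advertised via ARP (same subnet as the nodes).

ip_to_int() {
  # Convert a dotted-quad IPv4 address into a 32-bit integer.
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

ip_in_subnet() {
  # $1 = IPv4 address, $2 = subnet in CIDR form (e.g. 192.168.99.0/24)
  local ip="$1" net="${2%/*}" bits="${2#*/}" mask
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

NODE_SUBNET=192.168.99.0/24
for vip in 192.168.99.200 192.168.77.128; do
  if ip_in_subnet "$vip" "$NODE_SUBNET"; then
    echo "$vip is inside $NODE_SUBNET: ARP advertisement is possible"
  else
    echo "$vip is outside $NODE_SUBNET: requires a BGP route"
  fi
done
```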

More information on using Cilium within Kubernetes is provided in other posts in this series:

Cilium and Kubernetes - Installing Cilium Within Kubernetes

This post is installment #2 in a series of posts providing directions on installing and using Cilium for load balancing and SSL processing. Links to all of the posts in the series are provided below for convenience.

As mentioned in the first installment on concepts, Cilium allows an ARP based approach and a BGP routing based approach to make internal virtual IPs reachable from clients outside the Kubernetes cluster. These approaches are not mutually exclusive of each other and require careful consideration of connectivity needs and available IP space at the time Cilium is being installed as well as for routine administration tasks.

Because this tutorial series is intended to explain BOTH the ARP and BGP approaches, the illustrations and instructions will reflect BOTH approaches being deployed within a single cluster, with different services active simultaneously under each approach. At first, this may appear to make naming conventions for files and Kubernetes objects more verbose / obscure than necessary. However, such complexities will better illustrate where the two approaches differ and how those differences need to be accommodated outside the cluster.

Selecting and Devising an IP Scheme

If load balancer virtual IPs are going to be advertised for external access via ARP, the IP range selected must be assigned in light of these needs:

The IP range must be the same range used for the physical Ethernet ports of the Kubernetes nodes in the cluster.
The IP range must be large enough to contain the number of physical Kubernetes nodes in the cluster AND twice the number of externally accessible services (one IP for the Service VIP and one for the Gateway VIP of each service)

As an example, if the Kubernetes cluster will consist of five nodes and a total of twenty services requiring SSL and load balancing will be deployed in the cluster, the ARP approach requires a subnet of at least 45 IPs to be allocated. Given subnetting logistics, the smallest block providing 45 IPs would be a /26 which contains 64 consecutive IPs.
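
The arithmetic above can be checked with a short shell sketch. The function name is made up for this example, and it reserves two extra addresses for the network and broadcast addresses of the subnet, which is an assumption added here rather than part of the rule stated above.

```shell
#!/usr/bin/env bash
# Sketch: size the subnet needed when VIPs are advertised via ARP.

arp_subnet_size() {
  # $1 = number of Kubernetes nodes, $2 = number of externally accessible services
  local nodes="$1" services="$2"
  # One IP per node plus two VIPs (Service + Gateway) per service.
  local needed=$(( nodes + 2 * services ))
  # ASSUMPTION: also reserve the network and broadcast addresses.
  local total=$(( needed + 2 ))
  local prefix=32 size=1
  # Double the block size until it is large enough.
  while [ "$size" -lt "$total" ]; do
    size=$(( size * 2 ))
    prefix=$(( prefix - 1 ))
  done
  echo "needed=$needed prefix=/$prefix block=$size"
}

arp_subnet_size 5 20   # prints: needed=45 prefix=/26 block=64
```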

If using BGP advertising, the IP range used for load balancer VIPs does not have to reside within the physical subnet used for the physical Kubernetes hosts. It can be any range not already in use in the larger network.

For home hobby labs, it isn't likely necessary to further subdivide IP ranges used for load balancer IPs. However, for larger clusters that might serve a mix of production, STAGE, QA and DEV work, it may be advisable to further subdivide ranges into environment-specific ranges so firewalls can block non-production traffic from reaching production endpoints. Since this series is intended to explain some of the more complicated subjects, the following overall plan will be used to dedicate specific ranges for ARP and BGP for both a production and development environment.

In the examples to follow, the ARP advertisement functionality will be demonstrated using IPs in the 192.168.99.0/24 subnet already in use for the physical Kubernetes nodes. The resulting setup is reflected in this visual:

In the examples to follow, the BGP advertisement functionality will be demonstrated using IPs from a standalone IP range of 192.168.77.0/24. The resulting setup is reflected in this visual:

Of course, one extra complexity of using the BGP approach in a home hobbyist setting is that the approach requires at least one additional BGP router existing outside the cluster to propagate routes into the local host network. It is possible to use FRR (FRRouting) on Linux as a BGP-capable router but this involves more configuration work in technical areas many software developers avoid -- network design, subnetwork configurations and routing.

Kubernetes Host Preparation Steps

If using the ARP approach, many virtualization platforms including VMware and ProxMox may require modifications to the network configuration of virtual machines to DISABLE protections they deploy by default to halt "gratuitous ARP" traffic being originated from virtual hosts.

On ProxMox systems, this ARP filtering can be disabled by unchecking the Firewall attribute on the virtual machine's Network Device configuration. Here, the function is ENABLED and needs to be unchecked.

Also, for each VM, the MAC Filtering attribute should be disabled. The screen below shows the attribute still enabled, which can interfere with ARP behavior required for load balancing.

Disabling Firewall Software

Attempting to run any Kubernetes networking add-on (Istio, MetalLB, Cilium) on nodes built upon virtual machines tends to encounter numerous errors dealing with the virtualization of virtualization of virtualization of physical Ethernet port processing, especially related to MAC discovery via ARP. To avoid as many of these headaches as possible, it is wise to completely disable any firewall functionality running as part of the operating system. In Fedora, the firewalld daemon is enabled by default but should be disabled. Its current state can be checked via this command.

[root@kube1 ~]# systemctl status firewalld
○ firewalld.service - firewalld - dynamic firewall daemon
     Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: inactive (dead)
       Docs: man:firewalld(1)
[root@kube1 ~]#

If the firewalld daemon is still running, it can be stopped and disabled for future reboots via this command.

[root@kube1 ~]# systemctl disable --now firewalld
Removed '/etc/systemd/system/multi-user.target.wants/firewalld.service'.
Removed '/etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service'.
[root@kube1 ~]#

This step should be performed on ANY virtual machine acting as a node in a Kubernetes cluster.

Enabling ARP Traffic via Kernel Parameters

In addition to disabling functions running on hypervisors such as ProxMox or VMware that might block required ARP traffic, there are also operating system network settings which may default to inhibiting processing of ARP traffic that need to be explicitly enabled. These requirements are described in section 10.1.3 and are summarized here as well. The listing shows the ARP-related settings as they appear on kube1, first for the default profile and then for the ens18 interface.

[root@kube1 ~]# sysctl --all | grep "ipv4.conf" | grep arp | grep default
net.ipv4.conf.default.arp_accept = 1
net.ipv4.conf.default.arp_announce = 1
net.ipv4.conf.default.arp_evict_nocarrier = 1
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.default.arp_notify = 1
net.ipv4.conf.default.drop_gratuitous_arp = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.proxy_arp_pvlan = 0
[root@kube1 ~]# sysctl --all | grep "ipv4.conf" | grep arp | grep ens18
net.ipv4.conf.ens18.arp_accept = 1
net.ipv4.conf.ens18.arp_announce = 1
net.ipv4.conf.ens18.arp_evict_nocarrier = 1
net.ipv4.conf.ens18.arp_filter = 0
net.ipv4.conf.ens18.arp_ignore = 0
net.ipv4.conf.ens18.arp_notify = 1
net.ipv4.conf.ens18.drop_gratuitous_arp = 0
net.ipv4.conf.ens18.proxy_arp = 0
net.ipv4.conf.ens18.proxy_arp_pvlan = 0
[root@kube1 ~]#
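
Comparing these values by eye across several nodes is tedious, so a check along the following lines may help. This is only a sketch: the function name is invented here, and the expected values are simply the ones shown in the kube1 listing above rather than an authoritative list of what Cilium requires. It reads sysctl output on standard input and reports any setting for the named interface that differs.

```shell
#!/usr/bin/env bash
# Sketch: flag ARP-related sysctl values that differ from the kube1 listing above.

check_arp_sysctls() {
  # $1 = interface name (e.g. ens18); stdin = lines of "key = value" from sysctl
  local iface="$1" rc=0 key val name want
  while IFS=' =' read -r key val; do
    name=${key#"net.ipv4.conf.$iface."}
    # Skip lines that belong to other interfaces or other settings.
    [ "$name" = "$key" ] && continue
    case "$name" in
      arp_accept|arp_announce|arp_notify)        want=1 ;;
      arp_filter|arp_ignore|drop_gratuitous_arp) want=0 ;;
      *) continue ;;
    esac
    if [ "$val" != "$want" ]; then
      echo "MISMATCH $key: have $val, listed $want"
      rc=1
    fi
  done
  return $rc
}

# Example use on a node (run as root):
#   sysctl --all 2>/dev/null | check_arp_sysctls ens18 && echo "ARP settings match"
```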

Installing bpftool (Berkeley Packet Filter) and Helm

Cilium builds its CNI functions on a lower level kernel framework named eBPF (extended Berkeley Packet Filter) that allows traffic filtering and forwarding logic to be inserted within the kernel's network stack for more efficient processing. Most modern kernels actually enable BPF by default. However, many operating systems do NOT automatically include a companion diagnostic tool named bpftool with the core kernel function. That can be added by running this command as root.

dnf install bpftool

With that tool available, the required modules can be verified with the following command:

[root@kube1 ~]# bpftool feature probe kernel | grep -E "CONFIG_BPF|CONFIG_CGROUP_BPF|CONFIG_XDP"
CONFIG_BPF is set to y
CONFIG_BPF_SYSCALL is set to y
CONFIG_BPF_JIT is set to y
CONFIG_BPF_JIT_ALWAYS_ON is set to y
CONFIG_CGROUP_BPF is set to y
CONFIG_BPF_EVENTS is set to y
CONFIG_BPF_KPROBE_OVERRIDE is not set
CONFIG_XDP_SOCKETS is set to y
CONFIG_BPF_LIRC_MODE2 is set to y
CONFIG_BPF_STREAM_PARSER is set to y
[root@kube1 ~]#

The eBPF functionality is required on physical hosts acting as nodes in the Kubernetes cluster (like kube1 and kube2). It is NOT needed on hosts merely used for administration and source code development (like fedora1).

Use of Cilium within a cluster also requires Helm, the package manager for Kubernetes, to be available on machines used to administer the cluster. Here, fedora1 is used for development and administration so Helm is installed by pulling a shell script from the Helm GitHub site, making the script executable and running it.
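
A minimal sketch of those Helm steps is shown below. The script URL is the installer location published in the Helm project's own installation instructions at the time of writing and should be confirmed against the current Helm documentation before use. The second function registers the Cilium chart repository, which the cilium/cilium chart reference used later in this post assumes is already present. Nothing runs until the functions are called explicitly.

```shell
#!/usr/bin/env bash
# Sketch: install Helm on an administrative host and register the Cilium chart repo.

install_helm() {
  # ASSUMPTION: installer script location taken from the Helm install docs; verify before use.
  curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
  chmod 700 get_helm.sh
  ./get_helm.sh
}

add_cilium_repo() {
  # Makes the "cilium/cilium" chart name resolvable by helm.
  helm repo add cilium https://helm.cilium.io/
  helm repo update
}

# To perform the installation, call the functions in order:
#   install_helm && add_cilium_repo && helm version
```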

Installing Cilium CLI and the Cilium Deployment Within a Cluster

There are a few key tasks for installing Cilium with appropriate parameters for performing load balancing with the other components being used.

Installing the Cilium CLI on an administrative host like fedora1.
Removing the default Kubernetes kube-proxy function from the cluster which will be replaced by Cilium
Using helm to deploy Cilium functions to the cluster with the appropriate eBPF-related options
Verifying status

Installing Cilium CLI

Cilium includes its own command line interface (CLI) tool that is used for installing the software on a cluster and configuring and inspecting Cilium functions within that cluster on an ongoing basis.

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

Disabling kube-proxy Daemon Set from Kubernetes

Default Kubernetes functionality for proxying traffic is provided by the kube-proxy daemon set. This functionality is replaced by Cilium so it does not need to be active in the cluster. Prior to installing Cilium, it can be removed by the following commands:

mdh@fedora1:~ $ kubectl -n kube-system get daemonset
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-proxy     2         2         2       2            2           kubernetes.io/os=linux   17d
mdh@fedora1:~ $
mdh@fedora1:~ $ kubectl -n kube-system delete daemonset kube-proxy
daemonset.apps "kube-proxy" deleted from kube-system namespace
mdh@fedora1:~ $ kubectl -n kube-system get daemonset
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
mdh@fedora1:~ $
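
Deleting the daemon set can leave two things behind: the kube-proxy ConfigMap and any iptables rules kube-proxy already wrote on each node. The sketch below covers both and mirrors the cleanup steps described in Cilium's kube-proxy-free installation guide as best recalled here, so treat the exact commands as an assumption to verify against that guide. It defaults to a dry run that only prints the commands; the function names and the DRY_RUN switch are conventions invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: remove kube-proxy leftovers after deleting its daemon set.

DRY_RUN=${DRY_RUN:-1}

run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

admin_cleanup() {
  # From an administrative host: remove the ConfigMap so kube-proxy
  # is not re-created later (for example by a kubeadm upgrade).
  run kubectl -n kube-system delete configmap kube-proxy
}

node_cleanup() {
  # On EACH node, as root: drop the KUBE-* iptables rules kube-proxy left behind.
  run sh -c 'iptables-save | grep -v KUBE | iptables-restore'
}

admin_cleanup
node_cleanup
```

Setting DRY_RUN=0 executes the commands for real, so the node step should only be run that way on the Kubernetes nodes themselves.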

Deploying Cilium to the Cluster

Cilium can be deployed to the cluster by either using the CLI just installed via a cilium install command or by using helm. For the most part, the two approaches are equivalent but two crucial caveats apply for both.

EVERY OPTION shown below must be included EXACTLY as typed for the load balancing and external routing into load balancer virtual IPs to function correctly.
If installed via a cilium install command, adding additional settings generally requires a cilium uninstall then the new cilium install command to be executed. Examples online of updating a Cilium deployment using helm upgrade to change individual settings generally DO NOT WORK if the original install was accomplished via cilium install.

The default installation of Cilium that will be executed with a simple cilium install command using the CLI will NOT enable the proper options required for load balancing and proxy (dumb!). In particular, the kubeProxyReplacement parameter must be set to true. The command below must also supply the host IP and TCP port used by the control plane node of the target Kubernetes cluster. This can be verified with this command:

mdh@fedora1:~ $ kubectl cluster-info
Kubernetes control plane is running at https://192.168.99.12:6443
CoreDNS is running at https://192.168.99.12:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
mdh@fedora1:~ $

With that information confirmed, the host of 192.168.99.12 (kube1) and TCP port 6443 can be referenced in the command below.


helm upgrade --install cilium cilium/cilium --version 1.19.3 \
 --namespace kube-system \
 --set k8sServiceHost=192.168.99.12 \
 --set k8sServicePort=6443 \
 --set ipam.mode=multi-pool \
 --set ipam.operator.autoCreateCiliumPodIPPools.default.ipv4.cidrs='{10.99.0.0/16}' \
 --set ipam.operator.autoCreateCiliumPodIPPools.default.ipv4.maskSize=27 \
 --set ipam.operator.clusterPoolIPv4PodCIDRList=10.0.0.0/8 \
 --set ipam.operator.clusterPoolIPv4MaskSize=27 \
 --set kubeProxyReplacement=true \
 --set enableLoadBalancer=true \
 --set hostServices.enabled=true \
 --set externalIPs.enabled=true \
 --set nodePort.enabled=true \
 --set hostPort.enabled=true \
 --set gatewayAPI.enabled=true \
 --set gatewayAPI.enableAlpn=true \
 --set devices=ens18 \
 --set bpf.masquerade=true \
 --set ipv4.masquerade=true \
 --set l2announcements.enabled=true \
 --set l2announcements.leaseDuration=10s \
 --set l2announcements.leaseRenewDeadline=5s \
 --set l2announcements.leaseRetryPeriod=1s \
 --set l2NeighDiscovery.enabled=true \
 --set k8sClientRateLimit.qps=20 \
 --set k8sClientRateLimit.burst=30 \
 --set l7proxy=true \
 --set nodeIPAM.enabled=true \
 --set bgpControlPlane.enabled=true \
 --set bgp-secrets-namespace=kube-system \
 --set hubble.enabled=true \
 --set hubble.relay.enabled=true \
 --set hubble.ui.enabled=true

At any point, if particular configuration settings of a running Cilium deployment need to be confirmed, all current settings can be summarized using the cilium config view command as shown below.

Given the criticality of parameters specified during installation, documenting both the installation command above and this configuration view output as a reference is highly suggested. If Cilium needs to be re-installed in the future, knowing the EXACT installation command can avoid HOURS of troubleshooting subtle issues stemming from a single omitted parameter.
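
Once the configuration view has been saved as a reference, the handful of settings this series depends on can be checked automatically. The sketch below reads cilium config view output on standard input and compares selected keys against expected values. The function name is made up for this example, and the keys chosen in the usage line are just the ones that correspond to the load balancing, ARP, BGP and Gateway options discussed in this post.

```shell
#!/usr/bin/env bash
# Sketch: verify selected keys in `cilium config view` output.

check_cilium_config() {
  # stdin = output of `cilium config view`; arguments = key=value pairs to verify
  local cfg rc=0 pair key want have
  cfg=$(cat)
  for pair in "$@"; do
    key=${pair%%=*}
    want=${pair#*=}
    # The first column is the key, the second column is its current value.
    have=$(printf '%s\n' "$cfg" | awk -v k="$key" '$1 == k { print $2 }')
    if [ "$have" != "$want" ]; then
      echo "MISMATCH $key: have '$have', want '$want'"
      rc=1
    fi
  done
  return $rc
}

# Example use from the administrative host:
#   cilium config view | check_cilium_config \
#     kube-proxy-replacement=true enable-l2-announcements=true \
#     enable-bgp-control-plane=true enable-gateway-api=true
```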

mdh@fedora1:~ $ cilium config view
agent-not-ready-taint-key                         node.cilium.io/agent-not-ready
auto-create-cilium-pod-ip-pools                   default=ipv4-cidrs:10.99.0.0/16;ipv4-mask-size:27
auto-direct-node-routes                           false
bgp-router-id-allocation-ip-pool
bgp-router-id-allocation-mode                     default
bgp-secrets-namespace                             kube-system
bpf-distributed-lru                               false
bpf-events-drop-enabled                           true
bpf-events-policy-verdict-enabled                 true
bpf-events-trace-enabled                          true
bpf-lb-acceleration                               disabled
bpf-lb-algorithm-annotation                       false
bpf-lb-external-clusterip                         false
bpf-lb-map-max                                    65536
bpf-lb-mode-annotation                            false
bpf-lb-sock                                       false
bpf-lb-source-range-all-types                     false
bpf-map-dynamic-size-ratio                        0.0025
bpf-policy-map-max                                16384
bpf-policy-stats-map-max                          65536
bpf-root                                          /sys/fs/bpf
cgroup-root                                       /run/cilium/cgroupv2
cilium-endpoint-gc-interval                       5m0s
cluster-id                                        0
cluster-name                                      default
clustermesh-cache-ttl                             0s
clustermesh-enable-endpoint-sync                  false
clustermesh-enable-mcs-api                        false
clustermesh-mcs-api-install-crds                  true
cni-exclusive                                     true
cni-log-file                                      /var/run/cilium/cilium-cni.log
custom-cni-conf                                   false
datapath-mode                                     veth
debug                                             false
debug-verbose
default-lb-service-ipam                           lbipam
devices                                           ens18
direct-routing-skip-unreachable                   false
dnsproxy-enable-transparent-mode                  true
dnsproxy-socket-linger-timeout                    10
egress-gateway-reconciliation-trigger-interval    1s
enable-auto-protect-node-port-range               true
enable-bgp-control-plane                          true
enable-bgp-control-plane-status-report            true
enable-bgp-legacy-origin-attribute                false
enable-bpf-clock-probe                            false
enable-bpf-masquerade                             true
enable-drift-checker                              true
enable-dynamic-config                             true
enable-endpoint-health-checking                   true
enable-endpoint-lockdown-on-policy-overflow       false
enable-envoy-config                               true
enable-gateway-api                                true
enable-gateway-api-alpn                           true
enable-gateway-api-app-protocol                   true
enable-gateway-api-proxy-protocol                 false
enable-gateway-api-secrets-sync                   true
enable-health-check-loadbalancer-ip               false
enable-health-check-nodeport                      true
enable-health-checking                            true
enable-hubble                                     true
enable-ipv4                                       true
enable-ipv4-big-tcp                               false
enable-ipv4-masquerade                            true
enable-ipv6                                       false
enable-ipv6-big-tcp                               false
enable-ipv6-masquerade                            true
enable-k8s-networkpolicy                          true
enable-l2-announcements                           true
enable-l2-neigh-discovery                         true
enable-l7-proxy                                   true
enable-lb-ipam                                    true
enable-masquerade-to-route-source                 false
enable-metrics                                    true
enable-no-service-endpoints-routable              true
enable-node-ipam                                  true
enable-node-selector-labels                       false
enable-non-default-deny-policies                  true
enable-policy                                     default
enable-policy-secrets-sync                        true
enable-sctp                                       false
enable-service-topology                           false
enable-source-ip-verification                     true
enable-tcx                                        true
enable-vtep                                       false
enable-well-known-identities                      false
enable-xt-socket-fallback                         true
envoy-access-log-buffer-size                      4096
envoy-base-id                                     0
envoy-config-retry-interval                       15s
envoy-keep-cap-netbindservice                     false
external-envoy-proxy                              true
gateway-api-hostnetwork-enabled                   false
gateway-api-hostnetwork-nodelabelselector
gateway-api-secrets-namespace                     cilium-secrets
gateway-api-service-externaltrafficpolicy         Cluster
gateway-api-xff-num-trusted-hops                  0
health-check-icmp-failure-threshold               3
http-retry-count                                  3
http-stream-idle-timeout                          300
hubble-disable-tls                                false
hubble-listen-address                             :4244
hubble-network-policy-correlation-enabled         true
hubble-socket-path                                /var/run/cilium/hubble.sock
hubble-tls-cert-file                              /var/lib/cilium/tls/hubble/server.crt
hubble-tls-client-ca-files                        /var/lib/cilium/tls/hubble/client-ca.crt
hubble-tls-key-file                               /var/lib/cilium/tls/hubble/server.key
identity-allocation-mode                          crd
identity-gc-interval                              15m0s
identity-heartbeat-timeout                        30m0s
identity-management-mode                          agent
install-no-conntrack-iptables-rules               false
ipam                                              multi-pool
ipam-cilium-node-update-rate                      15s
iptables-random-fully                             false
k8s-client-burst                                  30
k8s-client-qps                                    20
k8s-require-ipv4-pod-cidr                         false
k8s-require-ipv6-pod-cidr                         false
kube-proxy-replacement                            true
kube-proxy-replacement-healthz-bind-address
l2-announcements-lease-duration                   10s
l2-announcements-renew-deadline                   5s
l2-announcements-retry-period                     1s
max-connected-clusters                            255
mesh-auth-enabled                                 false
mesh-auth-gc-interval                             5m0s
mesh-auth-queue-size                              1024
mesh-auth-rotated-identities-queue-size           1024
metrics-sampling-interval                         5m
monitor-aggregation                               medium
monitor-aggregation-flags                         all
monitor-aggregation-interval                      5s
nat-map-stats-entries                             32
nat-map-stats-interval                            30s
node-port-bind-protection                         true
nodeport-addresses
nodes-gc-interval                                 5m0s
operator-api-serve-addr                           127.0.0.1:9234
operator-prometheus-serve-addr                    :9963
packetization-layer-pmtud-mode                    blackhole
policy-cidr-match-mode
policy-default-local-cluster                      true
policy-deny-response                              none
policy-secrets-namespace                          cilium-secrets
policy-secrets-only-from-secrets-namespace        true
preallocate-bpf-maps                              false
procfs                                            /host/proc
proxy-cluster-max-connections                     1024
proxy-cluster-max-requests                        1024
proxy-connect-timeout                             2
proxy-idle-timeout-seconds                        60
proxy-initial-fetch-timeout                       30
proxy-max-active-downstream-connections           50000
proxy-max-concurrent-retries                      128
proxy-max-connection-duration-seconds             0
proxy-max-requests-per-connection                 0
proxy-use-original-source-address                 true
proxy-xff-num-trusted-hops-egress                 0
proxy-xff-num-trusted-hops-ingress                0
remove-cilium-node-taints                         true
routing-mode                                      tunnel
service-no-backend-response                       reject
set-cilium-is-up-condition                        true
set-cilium-node-taints                            true
synchronize-k8s-nodes                             true
tofqdns-dns-reject-response-code                  refused
tofqdns-enable-dns-compression                    true
tofqdns-endpoint-max-ip-per-hostname              1000
tofqdns-idle-connection-grace-period              0s
tofqdns-max-deferred-connection-deletes           10000
tofqdns-preallocate-identities                    true
tofqdns-proxy-response-max-delay                  100ms
tunnel-protocol                                   vxlan
tunnel-source-port-range                          0-0
unmanaged-pod-watcher-interval                    15s
vtep-cidr
vtep-endpoint
vtep-mac
vtep-mask
write-cni-conf-when-ready                         /host/etc/cni/net.d/05-cilium.conflist
mdh@fedora1:~ $

Installing Custom Resource Definitions (CRDs) for Cilium

Cilium functionality relies on a handful of Custom Resource Definitions (CRDs), which are the Kubernetes equivalent of an XML schema definition: they define the configuration data structures passed to Kubernetes. Cilium registers its own cilium.io CRDs, but the Gateway API CRDs behind the Gateway and HTTPRoute objects must be applied to the Kubernetes cluster separately before attempting to build any of those objects in the cluster. The commands to apply these CRDs from the Kubernetes Gateway API project on GitHub are shown below.


kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.4.1/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.4.1/config/crd/standard/gateway.networking.k8s.io_gateways.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.4.1/config/crd/standard/gateway.networking.k8s.io_httproutes.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.4.1/config/crd/standard/gateway.networking.k8s.io_referencegrants.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.4.1/config/crd/standard/gateway.networking.k8s.io_grpcroutes.yaml
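
The five commands above differ only in the resource name at the end of the URL, so they can also be generated from a list. Here is a minimal sketch, assuming the same v1.4.1 release and standard channel path shown above; the variable and function names are illustrative only. It prints the commands rather than running them.

```shell
# Release and base path copied from the commands above.
GATEWAY_API_VERSION="v1.4.1"
CRD_BASE="https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/${GATEWAY_API_VERSION}/config/crd/standard"

# Build the raw GitHub URL for one Gateway API CRD ($1 = plural resource name).
crd_url() {
  printf '%s/gateway.networking.k8s.io_%s.yaml\n' "$CRD_BASE" "$1"
}

# Print one "kubectl apply" command per CRD used in this series.
for kind in gatewayclasses gateways httproutes referencegrants grpcroutes; do
  echo "kubectl apply -f $(crd_url "$kind")"
done
```

Piping the output to sh would execute the commands; changing GATEWAY_API_VERSION in one place keeps all five URLs on the same release.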

Any installed CRDs can be verified as shown below.

[mdh@fedora1 cdtrackerapi]$ kubectl get crds
NAME                                         CREATED AT
ciliumcidrgroups.cilium.io                   2026-03-13T03:38:55Z
ciliumclusterwideenvoyconfigs.cilium.io      2026-04-01T02:52:04Z
ciliumclusterwidenetworkpolicies.cilium.io   2026-03-13T03:38:54Z
ciliumendpoints.cilium.io                    2026-03-13T03:38:51Z
ciliumenvoyconfigs.cilium.io                 2026-04-01T02:52:05Z
ciliumgatewayclassconfigs.cilium.io          2026-04-01T02:52:07Z
ciliumidentities.cilium.io                   2026-03-13T03:38:49Z
ciliuml2announcementpolicies.cilium.io       2026-03-13T03:38:58Z
ciliumloadbalancerippools.cilium.io          2026-03-13T03:38:57Z
ciliumnetworkpolicies.cilium.io              2026-03-13T03:38:53Z
ciliumnodeconfigs.cilium.io                  2026-03-13T03:38:59Z
ciliumnodes.cilium.io                        2026-03-13T03:38:52Z
ciliumpodippools.cilium.io                   2026-03-13T03:38:50Z
gatewayclasses.gateway.networking.k8s.io     2026-03-28T22:13:51Z
gateways.gateway.networking.k8s.io           2026-03-28T22:13:51Z
grpcroutes.gateway.networking.k8s.io         2026-03-28T22:13:52Z
httproutes.gateway.networking.k8s.io         2026-03-28T22:13:52Z
referencegrants.gateway.networking.k8s.io    2026-03-28T22:13:52Z
[mdh@fedora1 cdtrackerapi]$

CRDs are left on a cluster even if Cilium is uninstalled. If Cilium has to be repeatedly installed and uninstalled, these CRDs do NOT need to be re-applied each time. They will remain.

Deleting / Re-Installing Cilium

Due to the complexity of Cilium, it is possible (likely) that the tool will require SEVERAL attempts at installation due to missing settings or special override steps required for some operating systems, etc. Once installed, a given installation can be removed from a Kubernetes cluster by the cilium uninstall command.

[mdh@fedora1 cdtrackerapi]$ cilium uninstall
🔥 Deleting pods in cilium-test namespace...
🔥 Deleting cilium-test namespace...
⌛ Uninstalling Cilium
🔥 Deleting pods in cilium-test namespace...
🔥 Deleting cilium-test namespace...
🔥 Cleaning up Cilium node annotations...
[mdh@fedora1 cdtrackerapi]$

Once deleted, it can be re-installed via the larger command in the prior section.

Defining IP Pools for Virtual IP Assignments

In addition to logically implementing SSL processing and load balancing functions for applications, Cilium also implements IP address management functionality that automatically assigns virtual IPs to load balancers based upon criteria in the Service and Gateway objects defined for SSL access and load balancing. Earlier in this installment, an IP addressing scheme was described that distinguishes between development and production environments and between load balancer VIPs advertised via ARP versus BGP. With Cilium running in the cluster and its Custom Resource Definitions loaded for Cilium specific objects, those IP pools can now be configured. Here are the four YAML files along with the commands to apply them to the Kubernetes cluster.

Here is the mdhlabs-arp-vip-pool-dev.yaml file defining the development ARP block:

apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: mdhlabs-arp-vip-pool-dev
spec:
  blocks:
  - start: "192.168.99.64"
    stop: "192.168.99.95"
  # allowFirstLastIPs: No reserves first and last
  # as gateway and broadcast IP of the subnet
  allowFirstLastIPs: "No"
  serviceSelector:
    matchLabels:
      "io.kubernetes.service.namespace": "development"
      "mdhlabs-arp": "enable"

Here is the mdhlabs-arp-vip-pool-prod.yaml file defining the production ARP block:

apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: mdhlabs-arp-vip-pool-prod
spec:
  blocks:
  - start: "192.168.99.128"
    stop: "192.168.99.159"
  # allowFirstLastIPs: No reserves first and last
  # as gateway and broadcast IP of the subnet
  allowFirstLastIPs: "No"
  serviceSelector:
    matchLabels:
      "io.kubernetes.service.namespace": "prod"
      "mdhlabs-arp": "enable"

Here is the mdhlabs-bgp-vip-pool-dev.yaml file defining the development BGP block:

apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: mdhlabs-bgp-vip-pool-dev
spec:
  blocks:
  - start: "192.168.77.64"
    stop: "192.168.77.95"
  # allowFirstLastIPs: No reserves first and last
  # as gateway and broadcast IP of the subnet
  allowFirstLastIPs: "No"
  serviceSelector:
    matchLabels:
      "io.kubernetes.service.namespace": "development"
      "mdhlabs-bgp": "enable"

Here is the mdhlabs-bgp-vip-pool-prod.yaml file defining the production BGP block:

apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: mdhlabs-bgp-vip-pool-prod
spec:
  blocks:
  - start: "192.168.77.128"
    stop: "192.168.77.159"
  # allowFirstLastIPs: No reserves first and last
  # as gateway and broadcast IP of the subnet
  allowFirstLastIPs: "No"
  serviceSelector:
    matchLabels:
      "io.kubernetes.service.namespace": "prod"
      "mdhlabs-bgp": "enable"

With all of those configuration files created, they can be applied against the cluster with the following commands:

kubectl apply -f mdhlabs-arp-vip-pool-dev.yaml
kubectl apply -f mdhlabs-arp-vip-pool-prod.yaml
kubectl apply -f mdhlabs-bgp-vip-pool-dev.yaml
kubectl apply -f mdhlabs-bgp-vip-pool-prod.yaml

The status and names of all configured IP pool resources can be viewed as shown below.

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl get ciliumloadbalancerippool
NAME                        DISABLED   CONFLICTING   IPS AVAILABLE   AGE
mdhlabs-arp-vip-pool-dev    false      False         32              3s
mdhlabs-arp-vip-pool-prod   false      False         32              2d2h
mdhlabs-bgp-vip-pool-dev    false      False         32              19d
mdhlabs-bgp-vip-pool-prod   false      False         32              19d
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $
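
As a quick sanity check on the IPS AVAILABLE column, the size of a start / stop block can be computed directly. The sketch below uses hypothetical helper names (ip_to_int, range_size) and plain shell arithmetic. It only illustrates the arithmetic and does not query the cluster.

```shell
# Convert a dotted IPv4 address to an integer. The function body runs in a
# subshell so the IFS change does not leak into the calling shell.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Number of addresses in an inclusive start/stop block.
range_size() {
  echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

range_size 192.168.99.64 192.168.99.95     # development ARP block
range_size 192.168.77.128 192.168.77.159   # production BGP block
```

Each of the four blocks defined above spans 32 addresses, which matches the value reported for every pool in the listing.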

The command kubectl describe ciliumloadbalancerippool mdhlabs-bgp-vip-pool-prod can be used to see the actual address blocks in a pool, as illustrated here.

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl describe ciliumloadbalancerippool mdhlabs-bgp-vip-pool-prod
Name:         mdhlabs-bgp-vip-pool-prod
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  cilium.io/v2
Kind:         CiliumLoadBalancerIPPool
Metadata:
  Creation Timestamp:  2026-04-11T22:54:39Z
  Generation:          1
  Resource Version:    12016097
  UID:                 87c8d61d-667c-49e4-b053-ae3bee76142b
Spec:
  Allow First Last I Ps:  No
  Blocks:
    Start:   192.168.77.128
    Stop:    192.168.77.159
  Disabled:  false
  Service Selector:
    Match Labels:
      io.kubernetes.service.namespace:  prod

(other details omitted)
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $

The next installment in this series will explain how to configure SSL processing and load balancing for an application using the Service, Gateway and HTTPRoute resources that Cilium acts upon.

More information on using Cilium within Kubernetes is provided in other posts in this series:

Cilium and Kubernetes - Configuring SSL and Load Balancing

This post is installment #3 in a series of posts providing directions on installing and using Cilium for load balancing and SSL processing. Links to all of the posts in the series are provided below for convenience.

This post in the series explains the use of the newer Gateway and HTTPRoute objects defined by the Kubernetes Gateway API for implementing SSL processing and load balancing. These newer resource types supersede the older Ingress objects previously standardized within Kubernetes. Because this tutorial series is intended to illustrate how to access such services using both the ARP and BGP schemes supported by Cilium, this post creates two parallel sets of services, one that uses ARP to allow access from the physical host segment and the other using BGP to advertise available service points via routing protocols. This will hopefully make it easier to understand how the approaches differ.

Creating a Deployment for a ReplicaSet

Both the ARP and BGP examples start with an underlying deployment of a redundant set of pods executing the underlying Spring Boot web service. These are named differently to ensure completely separate processing paths but, in reality, the core deployment is unaffected by the adoption of Cilium to provide SSL and load balancing at higher levels.

Here is the content of the file cd-arp-deploy.yaml for the deployment that will be used by the ARP implementation.

apiVersion: apps/v1
kind: Deployment
metadata:
   name: cd-arp-deploy
spec:
   replicas: 3
   selector:
     matchLabels:
       app: cd-arp
   template:
     metadata:
       labels:
          app: cd-arp
     spec:
       affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             - labelSelector:
                matchExpressions:
                   - key: app
                     operator: In
                     values:
                      - cd-arp
               topologyKey: "kubernetes.io/hostname"
       containers:
       - name: cdtrackerapi
         image: fedora1.mdhlabs.com:5000/cdtrackerapinossl:latest
         imagePullPolicy: Always
         startupProbe:
           httpGet:
             path: /cdtracker/api/readycheck
             port: 6680
           periodSeconds: 2
           failureThreshold: 10
         readinessProbe:
           httpGet:
             path: /cdtracker/api/readycheck
             port: 6680
           initialDelaySeconds: 0
           periodSeconds: 60
         livenessProbe:
           httpGet:
             path: /cdtracker/api/healthcheck
             port: 6680
           initialDelaySeconds: 10
           periodSeconds: 60
         envFrom:
         - configMapRef:
            name: cd-configmap
         - secretRef:
            name: cd-dbpass-secret
         ports:
         - containerPort: 6680
           protocol: TCP
       tolerations:
       - operator: "Exists"
         effect: "NoSchedule"

Here is the content of the file cd-bgp-deploy.yaml for the deployment that will be used by the BGP implementation.

apiVersion: apps/v1
kind: Deployment
metadata:
   name: cd-bgp-deploy
spec:
   replicas: 3
   selector:
     matchLabels:
       app: cd-bgp
   template:
     metadata:
       labels:
          app: cd-bgp
     spec:
       affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             - labelSelector:
                matchExpressions:
                   - key: app
                     operator: In
                     values:
                      - cd-bgp
               topologyKey: "kubernetes.io/hostname"
       containers:
       - name: cdtrackerapi
         image: fedora1.mdhlabs.com:5000/cdtrackerapinossl:latest
         imagePullPolicy: Always
         startupProbe:
           httpGet:
             path: /cdtracker/api/readycheck
             port: 6680
           periodSeconds: 2
           failureThreshold: 10
         readinessProbe:
           httpGet:
             path: /cdtracker/api/readycheck
             port: 6680
           initialDelaySeconds: 0
           periodSeconds: 60
         livenessProbe:
           httpGet:
             path: /cdtracker/api/healthcheck
             port: 6680
           initialDelaySeconds: 10
           periodSeconds: 60
         envFrom:
         - configMapRef:
            name: cd-configmap
         - secretRef:
            name: cd-dbpass-secret
         ports:
         - containerPort: 6680
           protocol: TCP
       tolerations:
       - operator: "Exists"
         effect: "NoSchedule"

Creating the Inner LoadBalancer Service

Both the ARP and BGP examples include an inner load balancer which distributes requests to the set of pods without SSL encryption. The two examples below do reflect one key difference between them. The ARP version specifies a label of mdhlabs-arp: enable and the BGP version specifies a label of mdhlabs-bgp: enable. This label coupled with the environment name of prod or development will drive selection of the assigned load balancer virtual IP from the pools of IP space configured earlier. Other than that tag to drive IP selection, these two Service definitions are functionally identical.

Here is the content of the file cd-arp-svc.yaml for the deployment that will be used by the ARP implementation.

apiVersion: v1
kind: Service
metadata:
    labels:
      app: cd-arp
      mdhlabs-arp: enable
    name: cd-arp-svc
spec:
    selector:
      app: cd-arp
    ports:
    - protocol: TCP
      port: 6680
      targetPort: 6680
    type: LoadBalancer
    # externalTrafficPolicy controls how requests from OUTSIDE the
    # cluster are distributed WITHIN the cluster
    # Local = traffic is processed by first node / pod that attracted the request
    # but does not undergo source NAT
    # Cluster (default) = requests are balanced across all nodes/pods but source IPs are NATed
    externalTrafficPolicy: Cluster
    # internalTrafficPolicy controls how requests originating from WITHIN the
    # cluster are distributed:
    # Local - requests stay within pods on same node
    # Cluster (default) - requests are balanced across all nodes and pods
    internalTrafficPolicy: Cluster

Here is the content of the file cd-bgp-svc.yaml for the deployment that will be used by the BGP implementation.

apiVersion: v1
kind: Service
metadata:
    labels:
      app: cd-bgp
      mdhlabs-bgp: enable
    name: cd-bgp-svc
spec:
    selector:
      app: cd-bgp
    ports:
    - protocol: TCP
      port: 6680
      targetPort: 6680
    type: LoadBalancer
    # externalTrafficPolicy controls how requests from OUTSIDE the
    # cluster are distributed WITHIN the cluster
    # Local = traffic is processed by first node / pod that attracted the request
    # but does not undergo source NAT
    # Cluster (default) = requests are balanced across all nodes/pods but source IPs are NATed
    externalTrafficPolicy: Cluster
    # internalTrafficPolicy controls how requests originating from WITHIN the
    # cluster are distributed:
    # Local - requests stay within pods on same node
    # Cluster (default) - requests are balanced across all nodes and pods
    internalTrafficPolicy: Cluster

For both of these Service definitions, more explanation of the externalTrafficPolicy and internalTrafficPolicy parameters is warranted. As referenced in the comment lines of the YAML files, the externalTrafficPolicy parameter controls whether externally arriving traffic should be balanced across ALL pods in the cluster (Cluster) or stick to pods running on the same node that accepted the traffic from outside the cluster (Local). Similarly, the internalTrafficPolicy parameter controls whether traffic originating from WITHIN the cluster (such as web service A calling web service B) is balanced across all pods in the cluster (Cluster) or sticks to pods on the same node that originated the traffic (Local). In general, if true load balancing is desired / required, these should be set to Cluster.

Creating the HTTPRoute Object

In the newer Gateway API based solutions for layer 7 processing, the HTTPRoute object provides a structure for defining the hostnames and URI paths in incoming traffic that should be routed to underlying Service objects, which in turn steer into ReplicaSets of pods. This layer of traffic routing matches on the hostname appearing in a URL such as https://api.mdhlabs.com/cdtracker/api/healthcheck, which might differ across environments, for example https://apidev.mdhlabs.com/cdtracker/api/healthcheck. As a result, this layer of the flow will typically require environment-specific configuration files. While these files may differ in content because of environment, this layer does not reflect any differences based upon the use of ARP versus BGP.

Here is the content of the file cd-arp-httproute.prod.yaml for the deployment that will be used by the ARP implementation.

# cd-arp-httproute.prod.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: cd-arp-httproute
  namespace: prod
spec:
  parentRefs:
  - name: cd-arp-gw
  hostnames:
  - "api.mdhlabs.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /cdtracker/api
    backendRefs:
    - name: cd-arp-svc
      port: 6680

Here is the content of the file cd-bgp-httproute.prod.yaml for the deployment that will be used by the BGP implementation.

# cd-bgp-httproute.prod.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: cd-bgp-httproute
  namespace: prod
spec:
  parentRefs:
  - name: cd-bgp-gw
  hostnames:
  - "api.mdhlabs.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /cdtracker/api
    backendRefs:
    - name: cd-bgp-svc
      port: 6680
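
Since only the namespace and hostname change between environments, the per-environment route files can be rendered from one template. The following is a rough sketch rather than part of the original setup: the render_httproute helper is hypothetical, the development namespace name is taken from the IP pool selectors defined earlier, and apidev.mdhlabs.com is the example development hostname mentioned above.

```shell
# Render an HTTPRoute manifest on stdout.
#   $1 = name prefix (cd-arp or cd-bgp)
#   $2 = namespace
#   $3 = hostname expected in incoming requests
render_httproute() {
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: $1-httproute
  namespace: $2
spec:
  parentRefs:
  - name: $1-gw
  hostnames:
  - "$3"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /cdtracker/api
    backendRefs:
    - name: $1-svc
      port: 6680
EOF
}

# Example: the development variant of the ARP route.
render_httproute cd-arp development apidev.mdhlabs.com
```

The rendered text could be redirected to a file such as cd-arp-httproute.dev.yaml or piped to kubectl apply -f - once a matching Gateway and Service exist in that namespace.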

Defining a Secret for the SSL Certificate and Private Key

The Gateway resource defined next identifies the hostname expected for incoming traffic and must identify where the public certificate and private key required for SSL processing are housed for use by the gateway in decrypting traffic. The following command creates a Secret resource of type TLS in the prod namespace referencing the certificate and key files needed.

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl create secret tls -n prod \
api-secrettls --cert=/containeretc/cdtrackerapi/api.mdhlabs.com.cert.pem \
--key=/containeretc/cdtrackerapi/api.mdhlabs.com.key.pem
secret/api-secrettls created
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $

Use of the basic Secret object within Kubernetes to supply private SSL key information to deployed resources has some security concerns associated with it. In particular, Secret data is only base64-encoded and, unless encryption at rest has been configured for the cluster, is stored unencrypted in etcd, where anyone with sufficient API or etcd access can read it. However, optimization of that aspect of SSL administration is beyond the scope of this tutorial.

Creating the Gateway Object

In the newer Gateway API based solutions for layer 7 processing, the Gateway object replaces the older Ingress component. Like the HTTPRoute object, the Gateway object will often contain references to hostnames and related SSL keys that require distinct configurations per environment. Like the Service objects earlier, the two examples below reflect one key difference between them. The ARP version specifies a label of mdhlabs-arp: enable and the BGP version specifies a label of mdhlabs-bgp: enable. This label, coupled with the environment name of prod or development, drives selection of the assigned load balancer virtual IP from the pools of IP space configured earlier. Other than that label to drive IP selection, these two Gateway definitions are functionally identical.

Here is the content of the file cd-arp-gw.prod.yaml for the deployment that will be used by the ARP implementation.

# cd-arp-gw.prod.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  labels:
     app: cd-arp
     service-name: cd-arp-svc
     mdhlabs-arp: enable
# NOTE: The mdhlabs-arp=enable label above must match the label
# specified by a CiliumL2AnnouncementPolicy object to trigger
# advertisement via ARP.  HOWEVER, this Gateway object generates
# a second Service object of type LoadBalancer that must also
# have this mdhlabs-arp=enable label to trigger the actual ARP advertisement.
#
# Cilium up to version 1.19.3 has a bug that fails to copy this 
# label to that auto-generated Service which prevents the ARP advertisement
# from being generated.  That label must be manually added after
# this gateway is created via this command:
#
#    kubectl -n prod label service cilium-gateway-cd-arp-gw mdhlabs-arp=enable
#
# Once added, Cilium will attempt to generate the ARP for the VIP
# NOTE: This tag must be reapplied each time Cilium auto-generates
# the Service.
  name: cd-arp-gw
  namespace: prod
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "api.mdhlabs.com"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: api-secrettls

Here is the content of the file cd-bgp-gw.prod.yaml for the deployment that will be used by the BGP implementation.

# cd-bgp-gw.prod.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  labels:
     app: cd-bgp
     service-name: cd-bgp-svc
     mdhlabs-bgp: enable
# NOTE: For VIPs assigned for Gateway and Service objects, Cilium
# expects to match a label in the Gateway or Service to a label
# that triggers advertisement of the IP via ARP or BGP.
#
# The label here (mdhlabs-bgp: enable) matches a label in a
# CiliumBGPAdvertisement config which SHOULD trigger the IP
# assigned here to be advertised.  HOWEVER, this mechanism actually
# works on the SERVICE object and here, this Gateway auto-generates
# a SERVICE object but Cilium does not label that auto-generated
# service with the (mdhlabs-bgp: enable) tag here so the IP
# is NOT advertised.
#
# This (mdhlabs-bgp: enable) tag must be MANUALLY added to the
# auto-generated Service for the Gateway EACH time the Gateway
# is deployed.
  name: cd-bgp-gw
  namespace: prod
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "api.mdhlabs.com"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: api-secrettls

Manually Labeling the Auto-Generated Service for a Gateway

As referenced in the YAML file examples in the prior section that define a Gateway for SSL termination, Cilium has a known bug in its advertisement functionality for both ARP and BGP that requires a manual workaround. The CONCEPT of the advertising mechanism is that a special label such as mdhlabs-arp: enable or mdhlabs-bgp: enable, when it matches a policy for L2 ARP or BGP, triggers the actions that generate the advertisement. However, Cilium releases up to 1.19.3 fail to copy this label from the Gateway resource specifying it to the auto-generated Service object that creates the LoadBalancer for the gateway. As a result, the ARP or BGP function is never notified to trigger its advertisement process, and the IP assigned to the LoadBalancer for the Gateway never gets advertised and is not reachable outside the cluster.

While waiting for a bug fix in a future Cilium release, this problem can be corrected by manually adding the desired label to the auto-generated Service resource after creating the Gateway in the cluster. The auto-generated Service object is always assigned a name of cilium-gateway- followed by the name of the original Gateway. For the gateways defined as cd-arp-gw and cd-bgp-gw, the following commands would be required to attach the expected label to trigger ARP or BGP advertisement of the IP:


kubectl -n prod label service cilium-gateway-cd-arp-gw mdhlabs-arp=enable
kubectl -n prod label service cilium-gateway-cd-bgp-gw mdhlabs-bgp=enable
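
Because the label has to be re-applied every time Cilium regenerates the Service, it may help to script the step. Here is a minimal sketch, assuming the cilium-gateway- naming pattern described above and using hypothetical helper names. It prints the commands rather than running them, and it adds kubectl's --overwrite flag so an existing label value is replaced rather than rejected.

```shell
# Name of the Service that Cilium auto-generates for a Gateway ($1 = Gateway name).
gateway_service_name() {
  printf 'cilium-gateway-%s\n' "$1"
}

# Print the kubectl command that labels that Service.
#   $1 = namespace, $2 = Gateway name, $3 = label in key=value form
label_cmd() {
  printf 'kubectl -n %s label service %s %s --overwrite\n' \
    "$1" "$(gateway_service_name "$2")" "$3"
}

label_cmd prod cd-arp-gw mdhlabs-arp=enable
label_cmd prod cd-bgp-gw mdhlabs-bgp=enable
```

After running the printed commands, kubectl -n prod get service --show-labels can be used to confirm that the label is present on the auto-generated Services.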

Verification of Components After Deployment

Despite what the words might imply, the command kubectl -n prod get all does NOT actually exhaustively list all deployed components of all types in a given namespace. It only returns information on pods, services, deployments and replicasets.

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl -n prod get all
NAME                                 READY   STATUS    RESTARTS        AGE
pod/cd-bgp-deploy-6675bf6bb5-cw6xd   1/1     Running   1 (4h31m ago)   16h
pod/cd-bgp-deploy-6675bf6bb5-nrkq9   1/1     Running   1 (4h31m ago)   16h
pod/cd-bgp-deploy-6675bf6bb5-tw5lx   1/1     Running   1 (4h31m ago)   16h

NAME                               TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE
service/cd-bgp-svc                 LoadBalancer   10.100.125.156   192.168.77.128   6680:32391/TCP   16h
service/cilium-gateway-cd-bgp-gw   LoadBalancer   10.101.163.26    192.168.77.129   443:32549/TCP    16h

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cd-bgp-deploy   3/3     3            3           16h

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/cd-bgp-deploy-6675bf6bb5   3         3         3       16h
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $

In order to confirm the status of all resource types associated with a Cilium based load balancer configuration, the component types must be explicitly listed like this:

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl -n prod get pods,deployments,replicaset,services,httproute,gateway
NAME                                 READY   STATUS    RESTARTS        AGE
pod/cd-bgp-deploy-6675bf6bb5-cw6xd   1/1     Running   1 (4h36m ago)   17h
pod/cd-bgp-deploy-6675bf6bb5-nrkq9   1/1     Running   1 (4h37m ago)   17h
pod/cd-bgp-deploy-6675bf6bb5-tw5lx   1/1     Running   1 (4h37m ago)   17h

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cd-bgp-deploy   3/3     3            3           17h

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/cd-bgp-deploy-6675bf6bb5   3         3         3       17h

NAME                               TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE
service/cd-bgp-svc                 LoadBalancer   10.100.125.156   192.168.77.128   6680:32391/TCP   17h
service/cilium-gateway-cd-bgp-gw   LoadBalancer   10.101.163.26    192.168.77.129   443:32549/TCP    17h

NAME                                                   HOSTNAMES             AGE
httproute.gateway.networking.k8s.io/cd-bgp-httproute   ["api.mdhlabs.com"]   17h

NAME                                          CLASS    ADDRESS          PROGRAMMED   AGE
gateway.gateway.networking.k8s.io/cd-bgp-gw   cilium   192.168.77.129   True         17h

Based upon all the configuration created to this point, both cd-arp-deploy and cd-bgp-deploy should be running on the cluster and reachable from any of the three Kubernetes nodes kube1, kube2 or kube3. Note that an HTTPS service cannot be reached with a simple curl command that specifies the host IP address instead of the fully qualified domain name.

[root@kube1 ~]# curl -X GET https://api.mdhlabs.com/cdtracker/api/healthcheck
{ "host": "cd-arp-deploy-b94f54dbf-s6ftj", "ready": "true" + "time": "2026-04-30 18:43:29" }[root@kube1 ~]#
[root@kube1 ~]#
[root@kube1 ~]# curl -X GET https://192.168.99.129/cdtracker/api/healthcheck
curl: (35) Recv failure: Connection reset by peer
[root@kube1 ~]#

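The request by IP address fails because of how the Gateway matches TLS connections to hostnames. When the URL contains a bare IP address, curl does not send a server name (SNI) in the TLS handshake, so the Gateway most likely finds no listener matching api.mdhlabs.com and resets the connection. Even if the handshake completed, the IP string would not match the host name in the certificate. To test both the ARP version (on 192.168.99.129) and BGP version (on 192.168.77.129), the local /etc/hosts file will need to be edited to flip between the two IP addresses mapped to api.mdhlabs.com.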
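
As an alternative to editing /etc/hosts, curl can pin a host name to a specific address for a single request with its --resolve option, which takes a host:port:address value. The host name still appears in the TLS handshake and the HTTP request while the connection goes to the chosen VIP. Here is a small sketch with a hypothetical helper that prints the two test commands:

```shell
# Print a curl command that pins a hostname to one VIP for a single request.
#   $1 = hostname, $2 = VIP address, $3 = request path
vip_curl_cmd() {
  printf 'curl --resolve %s:443:%s https://%s%s\n' "$1" "$2" "$1" "$3"
}

vip_curl_cmd api.mdhlabs.com 192.168.99.129 /cdtracker/api/healthcheck   # ARP VIP
vip_curl_cmd api.mdhlabs.com 192.168.77.129 /cdtracker/api/healthcheck   # BGP VIP
```

Running the printed commands from one of the cluster nodes would exercise each path in turn without changing name resolution for anything else on the host.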

The final point to note here is that with no other configuration being completed, even though these services are running WITHIN the cluster, the services cannot be accessed from OUTSIDE the cluster. The processes for providing external access into these services via ARP and BGP are covered in the final two installments of this series.

More information on using Cilium within Kubernetes is provided in other posts in this series:

Cilium and Kubernetes - Accessing Services Via ARP

This installment focuses on configuration elements that make virtual IP addresses assigned for load balancing within a Kubernetes cluster reachable to clients outside the cluster using ARP protocol. The processes for allocating IP addresses for auto-assignment and tagging Service objects with attributes to trigger ARP advertisements are described along with workarounds required for current defects in the Cilium implementation for ARP.

With the configuration elements built up to now, the pods are live with private IPs assigned within the cluster, they have been mapped to a service of type LoadBalancer which obtained an IP from a different reserved pool of IPs in the 192.168.99.128/25 range and a layer 7 gateway has been defined with hostname and SSL key information to terminate incoming SSL requests and forward them through an HTTPRoute to the LoadBalancer onto the pods.

However, the VIP address itself is not yet reachable from outside the cluster nodes. In order to expand reachability to services beyond the cluster via ARP, the following tasks must be performed:

A CiliumL2AnnouncementPolicy must be created to define the label that determines which IPs will be advertised via ARP -- here the label mdhlabs-arp=enable will be used.
The Service object defining the unencrypted LoadBalancer VIP for the pod Deployment must be altered to specify that label triggering ARP advertisements.
The Gateway object defining the SSL processing for incoming traffic must be altered to specify that same label triggering ARP advertisement.
After the Gateway object is activated, the auto-generated Service object tied to the Gateway must be MANUALLY tagged with the same label of mdhlabs-arp=enable

Visualized Configuration Flow

Given the number of discrete configuration elements required for a complete service deployment using Cilium and the possibility of additional environment-specific deployments, it is useful to depict ALL of the primary configuration objects in a single view that relates the references between the elements. When something isn't working, it is likely due to one of these components being overlooked or having a typo in it.

It is also useful to show all of the YAML configuration files for the overall cluster and specific application or service. Doing so illustrates the value in adopting a naming scheme for these files that reflects their content and object type and allows them to be used as a reminder to ensure all bases have been covered. Here, the files associated with using ARP advertisements for the "cd" service are:

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ ls -l cd-arp*
-rw-r--r--. 1 mdh mdh 1514 Apr 13 22:02 cd-arp-deploy.yaml
-rw-r--r--. 1 mdh mdh  423 Apr 13 22:05 cd-arp-gw.dev.yaml
-rw-r--r--. 1 mdh mdh 1137 Apr 13 22:06 cd-arp-gw.prod.yaml
-rw-r--r--. 1 mdh mdh  371 Apr 13 22:03 cd-arp-httproute.dev.yaml
-rw-r--r--. 1 mdh mdh  361 Apr 13 22:00 cd-arp-httproute.prod.yaml
-rw-r--r--. 1 mdh mdh  469 Apr 13 21:58 cd-arp-svc.yaml
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ ls -l mdhlabs-arp*
-rw-r--r--. 1 mdh mdh 882 Apr  9 18:47 mdhlabs-arp-l2-policy.yaml
-rw-r--r--. 1 mdh mdh 382 Apr 27 19:31 mdhlabs-arp-vip-pool-dev.yaml
-rw-r--r--. 1 mdh mdh 378 Apr 27 19:32 mdhlabs-arp-vip-pool-prod.yaml
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $

Creating the CiliumL2AnnouncementPolicy

The CiliumL2AnnouncementPolicy object defines the criteria Cilium uses to decide which Kubernetes nodes are responsible for handling ARP traffic, which physical interface is used to listen for and answer ARP requests, and which types of virtual IPs trigger this process. In this example, the nodeSelector criteria allow all nodes not responsible for the Kubernetes control plane to handle ARP advertisement work.

apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: mdhlabs-layer2-policy
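spec:
  # 1. Select which Services are announced, by label
  # (assumed here from the mdhlabs-arp=enable label used throughout this post)
  serviceSelector:
    matchLabels:
      mdhlabs-arp: enable

  # 2. Select which nodes can announce these services (optional)
  # Excludes control-plane nodes in this example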
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  
  # 3. Specify network interfaces for ARP/NDP responses (optional)
  # Supports regular expressions (e.g., matching eth0, eth1)
  interfaces:
    - ens18
  
  # 4. Choose which IP types to announce
  externalIPs: true
  loadBalancerIPs: true

This policy should be deployed on the cluster with the following command. These policies are global and are not restricted to any particular namespace.

[mdh@fedora1 cdtrackerapi]$ kubectl apply -f kb.mdhlabs.layer2policy.yaml
ciliuml2announcementpolicy.cilium.io/mdhlabs-layer2-policy created
[mdh@fedora1 cdtrackerapi]$ kubectl get CiliumL2AnnouncementPolicy
NAME                    AGE
mdhlabs-layer2-policy   32s
[mdh@fedora1 cdtrackerapi]$

Adding the Advertising Label to the Service (cd-arp-svc)

With the policy deployed, the required label now needs to be added to the inner Service definition providing the LoadBalancer VIP into the pods of the Deployment. The YAML is shown below; the required line is the mdhlabs-arp: enable label under metadata.

apiVersion: v1
kind: Service
metadata:
    labels:
      app: cd-arp
      mdhlabs-arp: enable
    name: cd-arp-svc
spec:
    selector:
      app: cd-arp
    ports:
    - protocol: TCP
      port: 6680
      targetPort: 6680
    type: LoadBalancer
    # externalTrafficPolicy controls how requests from OUTSIDE the
    # cluster are distributed WITHIN the cluster
    # Local = traffic is processed by first node / pod that attracted the request
    # but does not undergo source NAT
    # Cluster (default) = requests are balanced across all nodes/pods but source IPs are NATed
    externalTrafficPolicy: Cluster
    # internalTrafficPolicy controls how requests originating from WITHIN the
    # cluster are distributed:
    # Local - requests stay within pods on same node
    # Cluster (default) - requests are balanced across all nodes and pods
    internalTrafficPolicy: Cluster

Adding the Advertising Label to the Gateway (cd-arp-gw) and Service (cilium-gateway-cd-arp-gw)

The same label also needs to be included in the configuration of the Gateway that defines the SSL processing to perform on arriving external traffic prior to distribution to the inner pod load balancer. Here is the updated Gateway YAML with the additional mdhlabs-arp: enable label under metadata.

# cd-arp-gw.prod.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  labels:
     app: cd-arp
     service-name: cd-arp-svc
     mdhlabs-arp: enable
# NOTE: The mdhlabs-arp=enable label above must match the label
# specified by a CiliumL2AnnouncementPolicy object to trigger
# advertisement via ARP.  HOWEVER, this Gateway object generates
# a second Service object of type LoadBalancer that must also
# have this mdhlabs-arp=enable label to trigger the actual ARP advertisement.
#
# Cilium up to version 1.19.3 has a bug that fails to copy this 
# label to that auto-generated Service which prevents the ARP advertisement
# from being generated.  That label must be manually added after
# this gateway is created via this command:
#
#    kubectl -n prod label service cilium-gateway-cd-arp-gw mdhlabs-arp=enable
#
# Once added, Cilium will attempt to generate the ARP for the VIP
# NOTE: This tag must be reapplied each time Cilium auto-generates
# the Service.
  name: cd-arp-gw
  namespace: prod
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "api.mdhlabs.com"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: api-secrettls

After this Gateway object is applied to the cluster, Cilium will auto-generate a second Service object of type LoadBalancer to provide a virtual IP for this outer SSL processing layer. HOWEVER, a bug in Cilium that still exists in version 1.19.3 as of April 30, 2026 fails to copy the labels found in the Gateway YAML definition onto that Service. As such, the VIP assigned for the Gateway will NOT trigger ARP advertisements via Cilium's automated behavior because its Service lacks the mdhlabs-arp=enable label.

To correct this problem, the auto-generated Service object for the Gateway needs to have the mdhlabs-arp: enable label manually added to it. Here is the before and after showing the missing label, the command to add it, and the required label afterwards.
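
The sequence can be sketched as a small shell function that shows the labels, applies the missing one, and shows the labels again. The label_gateway_service name is hypothetical; the kubectl label command itself is the one given in the Gateway YAML comments above, with --overwrite added so the function can be re-run safely. Output is omitted because it varies by cluster.

```shell
# Hypothetical helper: add the advertisement label to the Service that
# Cilium auto-generates for a Gateway (named cilium-gateway-<gateway name>).
label_gateway_service() {
    local namespace="$1" gateway="$2" label="$3"
    local svc="cilium-gateway-${gateway}"
    # Before: the label list should be missing the advertisement label
    kubectl -n "$namespace" get service "$svc" --show-labels
    # Apply the label; --overwrite makes repeated runs harmless
    kubectl -n "$namespace" label service "$svc" "$label" --overwrite
    # After: the label should now be present
    kubectl -n "$namespace" get service "$svc" --show-labels
}

# Example call for the Gateway in this post:
#   label_gateway_service prod cd-arp-gw mdhlabs-arp=enable
```

Because the label must be reapplied whenever Cilium regenerates the Service, keeping the step in a function or script makes it easy to repeat after each Gateway redeployment.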

Troubleshooting Layer 2 ARP Advertisements

The following checklist covers some of the more common problems likely to occur when first implementing ARP advertisements for external access to services.

Cilium MUST be installed with the devices option to specify which interfaces will be used to originate ARP messages to attract layer 2 traffic destined for a VIP. If nodes are not showing an active lease for a VIP that has been assigned, the VIP will work from those hosts but will NOT be reachable from other hosts on the same subnet.
Gateway objects must be configured with an annotation specifying devices that match the devices included with the Cilium install command.
In order for a node to generate an ARP message advertising the IP of a Gateway or Service, it has to acquire a "lease" on that VIP and the service must reference a device that is enabled for use in broadcasting ARP messages to the rest of the world.
Cilium developers have zero plans to support the ability to PING a virtual IP of any kind, making it harder to identify which layer of configuration is preventing actual traffic from reaching a service from external sources.
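
Since ping is not an option, one way to narrow down the failing layer from a host on the same subnet is to test ARP resolution and a TCP connection separately. The following is a rough sketch rather than a definitive procedure: probe_vip is a hypothetical name, and it assumes the arping, ip and nc utilities are installed on the test host and that the interface name matches the local system.

```shell
# Hypothetical helper: check a VIP at layer 2 (ARP) and layer 4 (TCP)
# since ICMP echo to the VIP is not answered.
probe_vip() {
    local iface="$1" vip="$2" port="$3"
    # Layer 2: does any node answer ARP requests for the VIP?
    arping -c 3 -I "$iface" "$vip" || echo "no ARP reply for $vip on $iface"
    # Show the MAC address learned for the VIP, if any
    ip neigh show "$vip"
    # Layer 4: can a TCP connection be opened to the service port?
    if nc -z -w 3 "$vip" "$port"; then
        echo "TCP $vip:$port reachable"
    else
        echo "TCP $vip:$port NOT reachable"
    fi
}

# Example call from an external host on the 192.168.99.0/24 subnet:
#   probe_vip ens18 192.168.99.129 443
```

If ARP resolves but the TCP connection fails, the lease and announcement are probably fine and the problem lies in the Service or Gateway layers. If ARP does not resolve, the label, policy or lease is the more likely culprit.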

All Layer2 related settings within Cilium can be summarized with the following command:

[mdh@fedora1 cdtrackerapi]$ cilium config view | grep l2
enable-l2-announcements                           true
enable-l2-neigh-discovery                         false
l2-announcements-lease-duration                   10s
l2-announcements-renew-deadline                   5s
l2-announcements-retry-period                     1s
[mdh@fedora1 cdtrackerapi]$


Information about leases and associated load balancer services that control them can be verified like this:

[mdh@fedora1 cdtrackerapi]$ kubectl -n kube-system get lease
NAME                                          HOLDER                                                                      AGE
apiserver-issnhmndiijorifh2ntjne5c2q          apiserver-issnhmndiijorifh2ntjne5c2q_258531d0-1190-4dfd-91c9-51a5ca3928b4   122m
cilium-l2announce-prod-cdtrackerapi-service   kube2                                                                       14h
cilium-operator-resource-lock                 kube2-86p7gzk7wb                                                            21d
kube-controller-manager                       kube1_201a8e2a-5280-4fa5-8952-cd81cc2074f3                                  21d
kube-scheduler                                kube1_90c94d72-92a4-46e6-a052-457259e40caa                                  21d
[mdh@fedora1 cdtrackerapi]$ kubectl -n kube-system describe lease cilium-l2announce-prod-cdtrackerapi-service
Name:         cilium-l2announce-prod-cdtrackerapi-service
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>
API Version:  coordination.k8s.io/v1
Kind:         Lease
Metadata:
  Creation Timestamp:  2026-04-03T02:45:24Z
  Resource Version:    6676812
  UID:                 4e0ff733-9278-453e-bcc8-09b3660eab6a
Spec:
  Acquire Time:            2026-04-03T14:46:06.313167Z
  Holder Identity:         kube2
  Lease Duration Seconds:  15
  Lease Transitions:       1
  Renew Time:              2026-04-03T16:48:51.528994Z
Events:                    
[mdh@fedora1 cdtrackerapi]$





More information on using Cilium within Kubernetes is provided in other posts in this series:

Cilium and Kubernetes - Caveats / Concepts

Cilium and Kubernetes - Installing Cilium Within Kubernetes

Cilium and Kubernetes - Configuring SSL and Load Balancing

Cilium and Kubernetes - Externally Accessing Services via ARP

Cilium and Kubernetes - Externally Accessing Services via BGP

Cilium and Kubernetes - Accessing Services Via BGP

This post is installment #5 in a series of posts providing directions on installing and using Cilium for load balancing and SSL processing. Links to all of the posts in the series are provided below for convenience.

This installment focuses on configuration elements that make virtual IP addresses assigned for load balancing within a Kubernetes cluster reachable to clients outside the cluster using BGP protocol. The processes for allocating IP addresses for auto-assignment and tagging Service objects with attributes to trigger BGP advertisements from within the cluster to BGP routers outside the cluster are described along with workarounds required for some aspects of FRR behavior when using it as a BGP router on Linux operating systems.

With the configuration elements built up to now, the pods are live with private IPs assigned within the cluster, they have been mapped to a service of type LoadBalancer which obtained an IP from a reserved pool of IPs and a layer 7 gateway has been defined with hostname and SSL key information to terminate incoming SSL requests and forward them through an HTTPRoute to the LoadBalancer onto the pods.

However, the VIP address itself is not yet reachable from outside the cluster nodes. In order to expand reachability to services beyond the cluster via BGP, the following tasks must be performed:

External BGP routers must be configured to talk to BGP router processes running within the cluster to propagate internal host routes for assigned virtual IPs to attract traffic to those IP addresses from outside the Kubernetes cluster.
The BGP router instances within the Kubernetes cluster must be configured via a CiliumBGPClusterConfig object and by adding a label to each node desired to act as a BGP router within the cluster.
The Cilium deployment within Kubernetes must be updated to enable its BGP control plane to integrate into each Kubernetes host IP stack.
The Service object defining the unencrypted LoadBalancer VIP for the pod Deployment must be altered to specify the label triggering BGP advertisements.
The Gateway object defining the SSL processing for incoming traffic must be altered to specify that same label triggering BGP advertisement.
After the Gateway object is activated, the auto-generated Service object tied to the Gateway must be MANUALLY tagged with the same label of mdhlabs-bgp: enable

If the BGP approach is selected to make virtual IPs assigned to services within Kubernetes externally reachable, a diagram can be assembled to reflect all of the configuration parameters needed in both the internal Cilium configuration elements inside Kubernetes and the external BGP routers. The first key point is that it is not strictly necessary to use a physical router that supports BGP for this function. A Linux host running FRR (Free Range Routing) can provide basic BGP capabilities. In this example, FRR will run within two Linux virtual machines simulating two different physical hosts.

The diagram below illustrates the overall topology and the IP address space that will be advertised via BGP both between the two non-Kubernetes router nodes and for ranges advertised from within Kubernetes.

The diagram merits a few summary points of clarification:

The Kubernetes cluster will act as its own Autonomous System using AS=65432 and each node will run a process that acts like a BGP router
The external network will be treated as a distinct Autonomous System using AS=62112 and two standalone hosts fedora1 and fedora2 will use frr to implement BGP routers for that AS
The 192.168.77.0/24 range of IP space to be used for load balancer virtual IPs within Kubernetes is NOT directly reachable to regular hosts on the 192.168.99.0/24 subnet. Those 192.168.77.x IPs will ONLY be reachable after an assignment triggers the Kubernetes BGP routers to originate a /32 advertisement for that IP which will then be advertised to the AS=62112 routers outside the cluster.
Besides sharing those /32 host routes for VIPs between each other via BGP, the fedora1 and fedora2 hosts will incorporate those routes at their Linux OS layer, making the VIPs reachable for command line tests from those hosts.
In this LAN, the router at 192.168.99.1 is normally the gateway IP configured for all of these hosts.
If the 192.168.99.1 gateway router supports BGP or OSPF, it is possible to redistribute the /32 routes to the 192.168.99.1 router so all hosts on the 192.168.99.0/24 subnet can reach the VIPs.
If the 192.168.99.1 gateway router CANNOT support altering its routing configuration, other hosts can be reconfigured to use fedora1 (192.168.99.10) or fedora2 (192.168.99.11) as their gateway IP on their Ethernet configuration.
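
For a quick test from a single Linux host, a narrower variation on the last point is to leave the default gateway alone and add a route for only the VIP range through one of the FRR hosts. This is a sketch under that assumption and not part of the original setup; route_vips_via is a hypothetical name, the route is not persistent across reboots, and the command must be run as root.

```shell
# Hypothetical helper: send only traffic for the 192.168.77.0/24 VIP range
# through one of the FRR hosts, leaving the default gateway unchanged.
route_vips_via() {
    local router="$1"
    # "replace" adds the route or updates it if it already exists
    ip route replace 192.168.77.0/24 via "$router"
}

# Example call on a test host in 192.168.99.0/24:
#   route_vips_via 192.168.99.10    # use fedora1 as next hop for VIPs
```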

Visualized Configuration Flow

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ ls -l cd-bgp*
-rw-r--r--. 1 mdh mdh 1514 Apr 13 22:08 cd-bgp-deploy.yaml
-rw-r--r--. 1 mdh mdh 1181 Apr 13 22:10 cd-bgp-gw.dev.yaml
-rw-r--r--. 1 mdh mdh 1171 Apr 12 18:01 cd-bgp-gw.prod.yaml
-rw-r--r--. 1 mdh mdh  361 Apr 12 17:53 cd-bgp-httproute.prod.yaml
-rw-r--r--. 1 mdh mdh 1094 Apr 27 17:21 cd-bgp-svc.yaml
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ ls -l mdhlabs-bgp*
-rw-r--r--. 1 mdh mdh  349 Apr 12 14:00 mdhlabs-bgp-advertisement.yaml
-rw-r--r--. 1 mdh mdh  883 Apr 26 11:57 mdhlabs-bgp-cluster-config.yaml
-rw-r--r--. 1 mdh mdh 1280 Apr 13 17:47 mdhlabs-bgp-fedora1frr.config.txt
-rw-r--r--. 1 mdh mdh 1280 Apr 13 17:47 mdhlabs-bgp-fedora2frr.config.txt
-rw-r--r--. 1 mdh mdh  781 Apr 11 21:30 mdhlabs-bgp-kube1-node-config.yaml
-rw-r--r--. 1 mdh mdh  781 Apr 11 21:47 mdhlabs-bgp-kube2-node-config.yaml
-rw-r--r--. 1 mdh mdh  781 Apr 11 21:48 mdhlabs-bgp-kube3-node-config.yaml
-rw-r--r--. 1 mdh mdh  328 Apr 12 14:01 mdhlabs-bgp-peer-config.yaml
-rw-r--r--. 1 mdh mdh  133 Apr 12 14:57 mdhlabs-bgp-pw-secret.yaml
-rw-r--r--. 1 mdh mdh  382 Apr 11 17:50 mdhlabs-bgp-vip-pool-dev.yaml
-rw-r--r--. 1 mdh mdh  378 Apr 11 17:50 mdhlabs-bgp-vip-pool-prod.yaml
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $

All of the steps below were first applied to two Fedora servers running as VMs atop Proxmox. The resulting "routers" properly passed routes between themselves and their peers within the Kubernetes cluster, properly routed traffic originated WITHIN them to the virtual IPs advertised via BGP, and received responses. However, neither router would FORWARD incoming traffic from other hosts towards those BGP advertised destinations. The same installation was performed on a physical Fedora server running on bare metal on host IP 192.168.99.9 and worked perfectly. There is possibly some limit to how many times packets can be forwarded over virtual Ethernet adapters within virtual machines that prevented the VM based BGP routers from functioning.

Installing FRR (Free Range Routing) on Linux

Rather than using real routers that cost real dollars, this environment will run FRR, which implements BGP, OSPF, IS-IS and other routing protocols atop Linux operating systems. The tool mimics most of the Cisco IOS command syntax so it is relatively easy for anyone familiar with Cisco routers to use it as a virtual replacement. The tool can be installed and enabled as a daemon using these commands as root.


dnf install frr
systemctl enable frr
systemctl start frr

In order for FRR to function completely in this use case, two additional changes are required. First, packet forwarding must be permanently enabled at the operating system level by setting net.ipv4.ip_forward=1 within sysctl. The current status can be validated via this command.

root@fedorabgp:~ # sysctl -a | grep ipv4 | grep ip_forward
net.ipv4.ip_forward = 0
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0
root@fedorabgp:~ #

It can be changed dynamically (the change is lost after reboot) via this command:

root@fedorabgp:~ # sysctl -w net.ipv4.ip_forward=1
root@fedorabgp:~ # sysctl -a | grep ipv4 | grep ip_forward
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0
root@fedorabgp:~ #

To ensure the change is applied at startup, the command can be added to a file placed in the /etc/sysctl.d directory which will be read at startup.

root@fedorabgp:~ # cat /etc/sysctl.d/99-sysctl.conf
# added by mdh 2021-11-25 to alter this setting for elasticsearch
vm.max_map_count=2621441
# added by mdh 2026-04-25 - enables forwarding to use this
# host as a BGP router
net.ipv4.ip_forward=1
root@fedorabgp:~ #
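
As a sketch of the same step in scripted form, the setting can be written to its own drop-in file and then loaded with sysctl -p. The persist_ip_forward function and the 90-ip-forward.conf file name are hypothetical choices for illustration; the approach shown above of appending to an existing file works equally well.

```shell
# Hypothetical helper: write the forwarding setting to a dedicated drop-in
# file. The directory defaults to /etc/sysctl.d but can be overridden.
persist_ip_forward() {
    local dir="${1:-/etc/sysctl.d}"
    local file="${dir}/90-ip-forward.conf"
    printf '%s\n' \
        '# enables forwarding to use this host as a BGP router' \
        'net.ipv4.ip_forward=1' > "$file"
    # Print the path so the caller can load it immediately
    echo "$file"
}

# Example (as root): write the file and apply it without rebooting
#   sysctl -p "$(persist_ip_forward)"
```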

A second change is required to the /etc/frr/daemons configuration file to enable the bgpd daemon. That change is excerpted below.

# The watchfrr, zebra and staticd daemons are always started.
#
bgpd=yes
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=no

After making that change, the frr daemon should be restarted via systemctl restart frr.
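
The edit and restart can also be scripted. The sketch below assumes GNU sed and a daemons file that currently contains the line bgpd=no; enable_bgpd is a hypothetical name and the file path is a parameter so the function can be tried on a copy first.

```shell
# Hypothetical helper: flip bgpd=no to bgpd=yes in the FRR daemons file
# and print the resulting setting for confirmation.
enable_bgpd() {
    local daemons="${1:-/etc/frr/daemons}"
    # Only the bgpd line is touched; other daemons keep their settings
    sed -i 's/^bgpd=no$/bgpd=yes/' "$daemons"
    grep '^bgpd=' "$daemons"
}

# Example (as root):
#   enable_bgpd && systemctl restart frr
```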

Configuring the fedora1 and fedora2 BGP Routers

Before adding configuration to Kubernetes for BGP, the process of configuring FRR with a BGP network with peers and policies can be demonstrated to ensure the basics are working. This example creates a network with the following characteristics:

the routers outside of the Kubernetes cluster use Autonomous System (AS) number 62112
the routers inside of the Kubernetes cluster use Autonomous System (AS) number 65432
fedora1 router's BGP id will be its primary host IP of 192.168.99.10
fedora2 router's BGP id will be its primary host IP of 192.168.99.11
kube1 router's BGP id will be its primary host IP of 192.168.99.12

Here is the initial configuration for the fedora1 router:

fedora1# show running-config
Building configuration...

Current configuration:
!
frr version 10.5.0
frr defaults traditional
hostname fedora1
!
ip prefix-list KUBEFILTER seq 5 permit 192.168.77.0/24
ip prefix-list KUBEFILTER seq 10 deny 0.0.0.0/0 le 32
!
route-map ALLOW-ALL permit 10
exit
!
route-map INBOUNDFILTER permit 10
 match ip address prefix-list KUBEFILTER
exit
!
router bgp 62112
 bgp router-id 192.168.99.10
 neighbor KUBE_AS peer-group
 neighbor KUBE_AS remote-as 65432
 neighbor KUBE_AS password weakpassword
 neighbor KUBE_AS port 32179
 neighbor KUBE_AS timers 30 90
 neighbor MDHLABS peer-group
 neighbor MDHLABS remote-as 62112
 neighbor MDHLABS password weakpassword
 neighbor MDHLABS timers 30 90
 neighbor 192.168.99.12 peer-group KUBE_AS
 neighbor 192.168.99.12 description "kube1 peer"
 neighbor 192.168.99.13 peer-group KUBE_AS
 neighbor 192.168.99.13 description "kube2 peer"
 neighbor 192.168.99.14 peer-group KUBE_AS
 neighbor 192.168.99.14 description "kube3 peer"
 neighbor 192.168.99.11 peer-group MDHLABS
 neighbor 192.168.99.11 description "fedora2 peer"
 !
 address-family ipv4 unicast
  network 172.16.99.0/24
  neighbor KUBE_AS route-map INBOUNDFILTER in
  neighbor KUBE_AS route-map ALLOW-ALL out
  neighbor MDHLABS route-map ALLOW-ALL out
 exit-address-family
exit
!
end
fedora1#

Here is the initial configuration for the fedora2 router:

fedora2# show running-config
Building configuration...

Current configuration:
!
frr version 10.5.0
frr defaults traditional
hostname fedora2
!
ip prefix-list KUBEFILTER seq 5 permit 192.168.77.0/24
ip prefix-list KUBEFILTER seq 10 deny 0.0.0.0/0 le 32
!
route-map ALLOW-ALL permit 10
exit
!
route-map FILTER_IN permit 10
 match ip address prefix-list MDHFILTER
exit
!
route-map INBOUNDFILTER permit 10
 match ip address prefix-list KUBEFILTER
exit
!
router bgp 62112
 bgp router-id 192.168.99.11
 neighbor KUBE_AS peer-group
 neighbor KUBE_AS remote-as 65432
 neighbor KUBE_AS password weakpassword
 neighbor KUBE_AS timers 30 90
 neighbor MDHLABS peer-group
 neighbor MDHLABS remote-as 62112
 neighbor MDHLABS password weakpassword
 neighbor MDHLABS timers 30 90
 neighbor 192.168.99.12 peer-group KUBE_AS
 neighbor 192.168.99.12 description "kube1 peer"
 neighbor 192.168.99.13 peer-group KUBE_AS
 neighbor 192.168.99.13 description "kube2 peer"
 neighbor 192.168.99.14 peer-group KUBE_AS
 neighbor 192.168.99.14 description "kube3 peer"
 neighbor 192.168.99.10 peer-group MDHLABS
 neighbor 192.168.99.10 description "fedora1 peer"
 neighbor 192.168.99.10 timers 30 90
 !
 address-family ipv4 unicast
  network 172.16.88.0/24
  neighbor KUBE_AS route-map INBOUNDFILTER in
  neighbor KUBE_AS route-map ALLOW-ALL out
  neighbor MDHLABS route-map ALLOW-ALL out
 exit-address-family
exit
!
end
fedora2#

With these configurations in place, querying the fedora1 router for a BGP summary and a list of IP ROUTES produces the following output:

fedora1# show ip bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.99.10, local AS number 62112 VRF default vrf-id 0
BGP table version 2
RIB entries 3, using 384 bytes of memory
Peers 2, using 47 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.99.11   4      62112        31        32        2    0    0 00:13:55            1        1 "fedora1 internal
192.168.99.12   4      62112         0         0        0    0    0    never       Active        0 "fedora1 internal

Total number of neighbors 2
fedora1# show ip route
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, D - SHARP, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF default:
K>* 0.0.0.0/0 [0/100] via 192.168.99.1, ens18, weight 1, 00:43:35
B>* 172.16.88.0/24 [200/0] via 192.168.99.11, ens18, weight 1, 00:05:54
K>* 172.16.99.0/24 [0/0] via 192.168.99.1, ens18, weight 1, 00:06:45
C>* 172.17.0.0/16 is directly connected, docker0, weight 1, 00:43:35
L>* 172.17.0.1/32 is directly connected, docker0, weight 1, 00:43:35
C>* 192.168.99.0/24 [0/100] is directly connected, ens18, weight 1, 00:43:35
L>* 192.168.99.10/32 is directly connected, ens18, weight 1, 00:43:35
fedora1#

Similarly, running the same commands on the fedora2 router inside vtysh shows the following.

fedora2# show ip bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.99.11, local AS number 62112 VRF default vrf-id 0
BGP table version 2
RIB entries 3, using 384 bytes of memory
Peers 2, using 47 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.99.10   4      62112        22        22        2    0    0 00:09:18            1        1 "fedora1 internal
192.168.99.12   4      62112         0         0        0    0    0    never       Active        0 "kube1 internal

Total number of neighbors 2
fedora2# show ip route
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, D - SHARP, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF default:
K>* 0.0.0.0/0 [0/100] via 192.168.99.1, ens18, weight 1, 00:11:06
K>* 172.16.88.0/24 [0/0] via 192.168.99.1, ens18, weight 1, 00:01:27
B>* 172.16.99.0/24 [200/0] via 192.168.99.10, ens18, weight 1, 00:02:18
C>* 172.17.0.0/16 is directly connected, docker0, weight 1, 00:11:06
L>* 172.17.0.1/32 is directly connected, docker0, weight 1, 00:11:06
C>* 192.168.99.0/24 [0/100] is directly connected, ens18, weight 1, 00:11:06
L>* 192.168.99.11/32 is directly connected, ens18, weight 1, 00:11:06
fedora2#

Together, both routers show expected / desired results:

both fedora1 and fedora2 show a live peering session to the other with one route sent and one route received
each shows another peering session to 192.168.99.12 with no connection since Cilium has not been configured yet
the unique IP ranges of 172.16.99.0/24 and 172.16.88.0/24 appear in the other router's IP ROUTES summary as having been learned via BGP
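
The same checks can be run from the regular Linux shell without entering the interactive vtysh prompt, which is convenient when re-testing after each configuration change. This is a minimal sketch; bgp_status is a hypothetical name, and it relies on vtysh accepting commands through its -c option.

```shell
# Hypothetical helper: print the BGP peer summary and only the routes
# learned via BGP, without opening an interactive vtysh session.
bgp_status() {
    vtysh -c 'show ip bgp summary' -c 'show ip route bgp'
}

# Example (as root on fedora1 or fedora2):
#   bgp_status
```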


Configuring the Cilium BGP Advertisement Policy

The CiliumBGPAdvertisement object is used to define criteria that Cilium will use in deciding what IP addresses will be advertised by this BGP mechanism. For this example, a specific meta tag of mdhlabs-bgp: enable will be used to flag IPs needing advertisement. These tags will be added as part of the configuration of Service objects and Gateways that need a LoadBalancer VIP but the policy specifying this tag is defined as shown below in the mdhlabs-bgp-advertisement.yaml file.

apiVersion: cilium.io/v2
kind: CiliumBGPAdvertisement
metadata:
  name: mdhlabs-bgp-advertisement
  labels:
    bgp.cilium.io/advertise: mdhlabs-bgp-advertisement
spec:
  advertisements:
    - advertisementType: Service
      service:
        addresses:
          - LoadBalancerIP
      selector:
        matchLabels:
          mdhlabs-bgp: enable

This policy acts against the entire cluster so it doesn't get applied against a particular namespace.

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl apply -f mdhlabs-bgp-advertisement.yaml
ciliumbgpadvertisement.cilium.io/mdhlabs-bgp-advertisement created
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $

Configuring the Cilium BGP Routers

In this example, connections between the external BGP routers on fedora1 and fedora2 will require a password to establish a BGP session to the Cilium BGP routers so the Cilium side needs to be provided that password. This will be done by first creating a Secret object which will then be referenced by a CiliumBGPPeerConfig object. This YAML will be housed in mdhlabs-bgp-pw-secret.yaml.

This secret must be deployed to the same namespace used to run Cilium, which normally installs itself inside the kube-system namespace. Some random examples on the Internet reference putting this secret in a namespace called cilium-secrets, which will NOT make the secret accessible in default Cilium installations.

apiVersion: v1
kind: Secret
metadata:
  name: mdhlabs-bgp-pw-secret
  namespace: kube-system
stringData:
  password: "weakpassword"

The CiliumBGPClusterConfig object is used to define all external BGP routers the Cilium BGP routers will use to share internal routes for virtual IPs. The configuration for this sample lab environment as saved in mdhlabs-bgp-cluster-config.yaml is shown below.

apiVersion: cilium.io/v2
kind: CiliumBGPClusterConfig
metadata:
  name: mdhlabs-bgp-cluster-config
spec:
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/control-plane: ""
  bgpInstances:
    - name: "kubeas65432"
      localASN: 65432
      peers:
        - name: "fedora1"
          peerASN: 62112
          peerAddress: 192.168.99.10
          peerConfigRef:
            name: mdhlabs-bgp-peer-config
        - name: "fedora2"
          peerASN: 62112
          peerAddress: 192.168.99.11
          peerConfigRef:
            name: mdhlabs-bgp-peer-config

The CiliumBGPPeerConfig object serves the same role as a peer-group attribute in the FRR router configuration. It allows a set of common attributes used by all routers in the cluster to be defined in a single element and re-used. In this example, it is being used to supply the password that will be used to authenticate to the external routers on fedora1 and fedora2. Here is the peer configuration as saved in mdhlabs-bgp-peer-config.yaml.

apiVersion: cilium.io/v2
kind: CiliumBGPPeerConfig
metadata:
  name: mdhlabs-bgp-peer-config
spec:
  authSecretRef: mdhlabs-bgp-pw-secret
  gracefulRestart:
    enabled: true
  families:
    - afi: ipv4
      safi: unicast
      advertisements:
        matchLabels:
          bgp.cilium.io/advertise: mdhlabs-bgp-advertisement
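
With the four YAML files in place, applying them and checking the result might look like the sketch below. The apply_bgp_config name is hypothetical and the file names are the ones listed earlier in this post. The order shown simply applies referenced objects before the objects that reference them. The final command assumes the cilium CLI is installed on the workstation.

```shell
# Hypothetical helper: apply the BGP-related objects, then list them and
# show the peering state as seen from the Cilium side.
apply_bgp_config() {
    local f
    for f in mdhlabs-bgp-pw-secret.yaml \
             mdhlabs-bgp-advertisement.yaml \
             mdhlabs-bgp-peer-config.yaml \
             mdhlabs-bgp-cluster-config.yaml; do
        # Stop at the first file that fails to apply
        kubectl apply -f "$f" || return 1
    done
    # Confirm the three Cilium BGP objects exist
    kubectl get ciliumbgpclusterconfig,ciliumbgppeerconfig,ciliumbgpadvertisement
    # Show session state for each peer from each participating node
    cilium bgp peers
}

# Example (from the directory holding the YAML files):
#   apply_bgp_config
```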

Adding the Advertising Label to the Service and Gateway

With the policy deployed to look for the mdhlabs-bgp: enable label to trigger advertising assigned VIPs, existing Service and Gateway definitions that create load balancer virtual IPs must be altered to specify the expected label in their metadata and redeployed. The revised YAML for the Service object, saved in cd-bgp-svc.yaml, is shown below.

apiVersion: v1
kind: Service
metadata:
    labels:
      app: cd-bgp
      mdhlabs-bgp: enable
    name: cd-bgp-svc
spec:
    selector:
      app: cd-bgp
    ports:
    - protocol: TCP
      port: 6680
      targetPort: 6680
    type: LoadBalancer
    # externalTrafficPolicy controls how requests from OUTSIDE the
    # cluster are distributed WITHIN the cluster
    # Local = traffic is processed by first node / pod that attracted the request
    # but does not undergo source NAT
    # Cluster (default) = requests are balanced across all nodes/pods but source IPs are NATed
    externalTrafficPolicy: Cluster
    # internalTrafficPolicy controls how requests originating from WITHIN the
    # cluster are distributed:
    # Local - requests stay within pods on same node
    # Cluster (default) - requests are balanced across all nodes and pods
    internalTrafficPolicy: Cluster

The outer Gateway object that initiates SSL / TLS processing also needs the metadata label applied. For this Gateway defined in cd-bgp-gw.prod.yaml, the full configuration looks like this:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  labels:
     app: cd-bgp
     service-name: cd-bgp-svc
     mdhlabs-bgp: enable
# NOTE: For VIPs assigned to Gateway and Service objects, Cilium
# expects to match a label on the Gateway or Service to a label
# selector that triggers advertisement of the IP via ARP or BGP.
#
# The label here (mdhlabs-bgp: enable) matches a label in a
# CiliumBGPAdvertisement config which SHOULD trigger the IP
# assigned here to be advertised.  HOWEVER, this mechanism actually
# works on the SERVICE object. This Gateway auto-generates
# a SERVICE object, but Cilium does not copy the (mdhlabs-bgp: enable)
# label onto that auto-generated Service, so the IP
# is NOT advertised.
#
# This (mdhlabs-bgp: enable) label must be MANUALLY added to the
# auto-generated Service for the Gateway EACH time the Gateway
# is deployed.
  name: cd-bgp-gw
  namespace: prod
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "api.mdhlabs.com"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: api-secrettls

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl -n prod apply -f cd-bgp-gw.prod.yaml
gateway.gateway.networking.k8s.io/cd-bgp-gw created
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl -n prod get all
NAME                                 READY   STATUS    RESTARTS         AGE
pod/cd-bgp-deploy-78d5554d58-lvrzz   1/1     Running   15 (6h55m ago)   15d
pod/cd-bgp-deploy-78d5554d58-rxh84   1/1     Running   12 (6h54m ago)   15d
pod/cd-bgp-deploy-78d5554d58-zp64d   1/1     Running   22 (6h54m ago)   15d

NAME                               TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE
service/cd-bgp-svc                 LoadBalancer   10.105.122.66    192.168.77.128   6680:31091/TCP   15d
service/cilium-gateway-cd-bgp-gw   LoadBalancer   10.103.224.100   192.168.77.129   443:32484/TCP    7s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cd-bgp-deploy   3/3     3            3           15d

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/cd-bgp-deploy-78d5554d58   3         3         3       15d
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ ip route
default via 192.168.99.1 dev ens18 proto static metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.49.0/24 dev br-a18ac545ed4a proto kernel scope link src 192.168.49.1 linkdown
192.168.77.128 nhid 24 proto bgp metric 20
        nexthop via 192.168.99.13 dev ens18 weight 1
        nexthop via 192.168.99.14 dev ens18 weight 1
        nexthop via 192.168.99.12 dev ens18 weight 1
192.168.99.0/24 dev ens18 proto kernel scope link src 192.168.99.10 metric 100
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $

Why isn't the 192.168.77.129 virtual IP being advertised from the cluster back to fedora1? Because the auto-generated Service created by Cilium wasn't given the mdhlabs-bgp: enable label defined on the original Gateway. Querying the cluster for the details of that Service shows the missing label:

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl -n prod describe service cilium-gateway-cd-bgp-gw
Name:                     cilium-gateway-cd-bgp-gw
Namespace:                prod
Labels:                   gateway.networking.k8s.io/gateway-name=cd-bgp-gw
                          io.cilium.gateway/owning-gateway=cd-bgp-gw
Annotations:              <none>
Selector:                 <none>
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.103.224.100
IPs:                      10.103.224.100
LoadBalancer Ingress:     192.168.77.129 (VIP)
Port:                     port-443  443/TCP
TargetPort:               443/TCP
NodePort:                 port-443  32484/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Internal Traffic Policy:  Cluster
Events:                   <none>
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $
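The same gap can be checked across the whole cluster rather than one Service at a time. The sketch below assumes the io.cilium.gateway/owning-gateway label seen in the output above is present on every Service Cilium generates for a Gateway. The function name find_unlabeled_gateway_services is hypothetical.

```shell
# List Cilium-generated Gateway Services that lack the advertising label key.
# Usage: find_unlabeled_gateway_services [label-key]
find_unlabeled_gateway_services() {
    adv_key="${1:-mdhlabs-bgp}"
    # Selector: owning-gateway label exists AND the advertising key is absent.
    # The "!" is kept inside single quotes so an interactive bash session
    # does not treat it as history expansion.
    kubectl get service --all-namespaces \
        -l 'io.cilium.gateway/owning-gateway,!'"$adv_key" \
        -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name
}
```

This checks only for the presence of the label key, not its value, so a Service labeled with an unexpected value would not be listed.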

If the live auto-generated service is updated to add the expected label, the route will be advertised as shown below.

mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl -n prod label service cilium-gateway-cd-bgp-gw mdhlabs-bgp=enable
service/cilium-gateway-cd-bgp-gw labeled
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ kubectl -n prod describe service cilium-gateway-cd-bgp-gw
Name:                     cilium-gateway-cd-bgp-gw
Namespace:                prod
Labels:                   gateway.networking.k8s.io/gateway-name=cd-bgp-gw
                          io.cilium.gateway/owning-gateway=cd-bgp-gw
                          mdhlabs-bgp=enable
Annotations:              <none>
Selector:                 <none>
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.103.224.100
IPs:                      10.103.224.100
LoadBalancer Ingress:     192.168.77.129 (VIP)
Port:                     port-443  443/TCP
TargetPort:               443/TCP
NodePort:                 port-443  32484/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Internal Traffic Policy:  Cluster
Events:                   <none>
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $  
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $ ip route
default via 192.168.99.1 dev ens18 proto static metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.49.0/24 dev br-a18ac545ed4a proto kernel scope link src 192.168.49.1 linkdown
192.168.77.128 nhid 24 proto bgp metric 20
        nexthop via 192.168.99.13 dev ens18 weight 1
        nexthop via 192.168.99.14 dev ens18 weight 1
        nexthop via 192.168.99.12 dev ens18 weight 1
192.168.77.129 nhid 24 proto bgp metric 20
        nexthop via 192.168.99.13 dev ens18 weight 1
        nexthop via 192.168.99.14 dev ens18 weight 1
        nexthop via 192.168.99.12 dev ens18 weight 1
192.168.99.0/24 dev ens18 proto kernel scope link src 192.168.99.10 metric 100
mdh@fedora1:~/gitwork/webservices/cdtrackerapi $
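Because the label has to be re-applied every time the Gateway is deployed, the manual step above can be wrapped in a small helper. This is a sketch under assumptions, not an official workaround. The function name label_gateway_service is hypothetical, and it selects the generated Service by the io.cilium.gateway/owning-gateway label shown in the describe output instead of hard-coding the Service name.

```shell
# Re-apply the advertising label to the Service Cilium auto-generates
# for a Gateway.
# Usage: label_gateway_service <namespace> <gateway-name> [key=value]
label_gateway_service() {
    gw_ns="$1"
    gw_name="$2"
    adv_label="${3:-mdhlabs-bgp=enable}"
    # --overwrite makes the call safe to repeat if the label already exists.
    kubectl -n "$gw_ns" label service \
        -l "io.cilium.gateway/owning-gateway=$gw_name" \
        "$adv_label" --overwrite
}
```

For the Gateway in this post, the call would be label_gateway_service prod cd-bgp-gw, run after each kubectl apply of cd-bgp-gw.prod.yaml.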

More information on using Cilium within Kubernetes is provided in other posts in this series: