Kubernetes: Support port ranges in services

Created on 5 Apr 2016  ·  126 Comments  ·  Source: kubernetes/kubernetes

There are several applications, such as SIP or RTP apps, that need a lot of ports to run multiple calls or media streams. Currently there is no way to specify a range of ports in the spec, so essentially I have to do this:

          - name: sip-udp5060
            containerPort: 5060
            protocol: UDP
          - name: sip-udp5061
            containerPort: 5061
            protocol: UDP

Doing the above for 500 ports is not pretty. Can we have a way to allow port ranges, like 5060-5160?

area/ipvs kind/feature lifecycle/frozen priority/backlog sig/network

Most helpful comment

Any updates on supporting port range? I think it's a very useful feature

All 126 comments

FYI, you don't actually need to specify a port in the RC unless you're going to target it through a Service, but I guess this is a pain to do if you have even O(10) ports. I wonder if there's an easy client-side solution that doesn't involve changing the Service (something like kubectl expose --port 3000-3020).

The problem with port ranges is that the userspace kube-proxy can't handle them, and that is still the fallback path. Until/unless we can totally EOL that, we're rather limited.

Aside from that, I don't immediately see a reason against port ranges.


I do need to target the ports through a Service since that is the only way outside users would be able to place a call via SIP

The short story is it doesn't work right now. Changing it is not impossible, but we would need to define what happens when the external LB and/or kube-proxy doesn't support ranges. It's not on the roadmap, though, so it would either have to be contributed or escalated.


I'm starting to work on this issue. I will try to implement just the logic in the kubectl package to map port ranges to individual ports, which seems like an easy and useful workaround. Later I will look at the LB and/or kube-proxy option.

Go and Kubernetes are new to me, so any help, ideas or guidance will be welcome.

@antonmry What API changes are you looking to do to support this? I glanced at your dev branch and it looks like there's a lot more copy-paste going on than I'd expect. I think I can save you effort if you talk out your desired changes here first.

This change, should we decide to do it, doesn't warrant a new API. It's just some field changes to Services and maybe Pods. The right place to start is with a proposal doc that details the change, the API compatibility concerns, the load-balancer implementation concerns, etc.

@bgrant0607 @antonmry needs this change to make TeleStax (an RTC framework) work well in Kubernetes, so this isn't a "for fun" request, but rather one that's needed to support their application well.

I agree w/ @lavalamp and @thockin that there are simpler designs, and that a lightweight design proposal is the right way to get to agreement on design quickly.

@brendandburns I assumed there was a reason behind the request, but API changes are tricky and expensive. This particular change has been discussed since #1802.

In the best case, any API change would appear in 1.4. If it is blocking an application now, I suggest finding a solution that doesn't require an API change, potentially in parallel with an API and design proposal.

As mentioned above, ports don't need to be specified on pods. The trick is how to LB to them. Perhaps @thockin or @bprashanth has a suggestion.

@bgrant0607 totally agree that this is targeted at 1.4 just wanted to give context on the motivation for the change.

Yeah. Like @thockin is hinting, I also expected to see this start out as annotations on pod and/or service objects.

(as much as I hate that this is how new fields start, this is how new fields start.)

Hi @bgrant0607, @thockin, @brendandburns, @lavalamp

I'm new to Kubernetes development, so I was trying different things to test and see how the API works. Even if it looked different in the beginning, once I realized the complexity of the issue it became clear that none of my dev branches is intended to be a proposal or anything similar, so please ignore them.

As far as I understand from your comments, for this issue it's better to have a new annotation than a new API version. I started a new API version as a way to keep my development completely separate and then start the discussion. Also, because containerPort is mandatory, even with an annotation to indicate the range I don't see how to avoid the mandatory containerPort without changing the API. How can I do that?

Finally, please feel free to manage this issue independently of the work I'm doing here. I would like to contribute and I appreciate your guidance and help, but I don't expect you to accept my solution or a hypothetical PR, even though I would like to get to that point.

One idea is to put the beginning of the range in the current field, and use an annotation of the form alpha-range-end-<port name>=<range end value> or some variation thereof, depending on how exactly the mapping will work. This is kinda hokey but should do the trick.
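A rough sketch of what that could look like on a pod, purely for illustration: the annotation key below is hypothetical and was never implemented, and the exact format would depend on how the mapping ends up working.

apiVersion: v1
kind: Pod
metadata:
  name: sip-media
  annotations:
    # hypothetical key, never implemented: end of the range whose start is containerPort 5060
    alpha-range-end-sip-udp: "5160"
spec:
  containers:
  - name: sip
    image: example/sip-server    # placeholder image
    ports:
    - name: sip-udp
      containerPort: 5060        # start of the range
      protocol: UDP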

We also need to expose a large port range (10k+) for RTP/WebRTC and SIP frameworks. Since SIP stacks are usually not stable and in some parts very unpredictable, this is a really needed thing. BTW exposing port range is not what most of us want. We have dedicated IPs for media traffic and it would be nice to just map all traffic for a specific IP to a service. Something like a "dmz", plus the ability to expose port ranges for it.

I'd be OK to see some alpha feature enabling ranges (including whole IP)


+1 to this idea. Specifying port ranges in service declarations is mandatory for any VoIP-related application. It would be great to have this done. Has anyone figured out a temporary workaround for this? Thanks in advance.

+1. We also need support for a large port range to enable a media processing engine running in a Docker container that is part of a Pod. As others have mentioned, this is needed to process (RTP) media streams. The SIP protocol is limited to 5060 and 5061, so we don't need it for the SIP signaling. The issue is with the standard RTP port range, which is 16384-32767. It's important to understand that these ports are typically allocated dynamically when a new media session is being processed, but they need to be accessible on the public IP associated with the container. I agree that it's critical for anyone wishing to use Kubernetes to orchestrate a VoIP service that includes any form of media processing, including gateways, call recording services, MCUs, etc. I'm early to Kubernetes, but it seems that this is one of the few things holding us back from using it at this point. I'm interested in helping here as well.

+1 Welp also ran into this with a media server application (need host networking to have a stable static IP, clients connect on the standard RTP port range).

Yes please, we would like this for a custom game server proxy which maps a range of external ports to petsets inside the cluster. UDP traffic, very similar to voice or media server applications I suppose.

Bonus points for being able to expose this service and have the GKE firewall rules work with the port range also.

+1

+1. We want to move to K8s. However, this issue is holding us back. If there is a temporary workaround, please let us know. Thanks.

What has mostly held us back is to spec it and explore the compatibility matrix with @kubernetes/sig-network folks - this WILL have an impact on anyone who implements their own Services or who uses the userspace proxy.

Copying specific people off the top of my head

@ncdc re: idling
@brendandburns re: windows
@DirectXMan12 re: userspace proxy
@smarterclayton re: Openshift

I'd be happy to help someone explore this in 1.6

cc @knobunc (re OpenShift networking) (and for future reference, I'm probably a good bet to CC on idling as well ;-) )

and now a bug caused by this: #37994 (though on pods via deployments)

That was actually me :) I attempted to manually expose a large number of ports on a deployment, which immediately caused issues with the k8s cluster due to the issue linked. @thockin As someone less familiar with the inner workings, what are the main blockers or issues to providing this feature? In my naive view of things, the API would just thread down to how the iptables rules are established, and the computation of whether two pods can coexist on a machine needs to change to accommodate this.

I think the right thing to do is start with a proposal. Things to consider:

  • do we do pods, services, or both?
  • how does this affect iptables rules?
  • how does this affect non-iptables implementations (sig-network)
  • can we support NodePorts on ranges? Seems hard.
  • can ~all (Services, L4) load-balancers support ranges?
  • type=load-balancer is a superset of node port, but nodeport is hard. How to resolve that?
  • node-ports are required for Ingress on most platforms, does this mean no ingress?

Other considerations to put in the doc as caveats:

  • IPVS does not seem to support multi-port, so this precludes ever using IPVS
  • This can't really be supported in the userspace proxy-mode

That's just off the top of my head. I am sure there's more to think about. As with any such change, the easy parts are easy; it's the other 90% that makes it hard.


If you look at where most of these requests seem to be coming from (media servers and VoIP), I think we could start with only enabling port ranges for pods. Until k8s provides native support for UDP-based protocols like SIP and RTP, I don't see why k8s services would need to expose large port ranges. For the most part, media services currently require pods to be deployed in "hostNetwork": true mode because the user proxies do not support session affinity. iptables rules also do not help with these protocols for the same reason. SIP load balancers must be able to discover and route to pods directly on the node IP:port. If initially k8s only supported port ranges for pods, it seems like this could simplify the port range requirement, assuming it would be acceptable to the community.

So as an initial proposal, could we move forward with this?

  • do we do pods, services, or both? (pods only)
  • how does this affect iptables rules? (ignored, no effect)
  • how does this affect non-iptables implementations? (not sure about this one)
  • can we support NodePorts on ranges? (I don't see why this is needed for the reasons stated above)
  • can ~all (Services, L4) load-balancers support ranges? (I don't see a reason for current k8 LBs to support port ranges for the reasons stated above. Basically they can't handle session affinity for the protocols that require it anyway)
  • type=load-balancer is a superset of node port, but nodeport is hard. How to resolve that? (see previous statement)
  • node-ports are required for Ingress on most platforms, does this mean no ingress? (I believe the answer is yes)

Hopefully I'm not oversimplifying this. I realize this could cause some confusion if not properly documented. VoIP services are very stateful in nature, and most of the service routing capabilities in k8s seem to be focused on stateless applications that rely mainly on http/s or services that are built on top of TCP. Stateful VoIP services that rely on UDP and must have support for session affinity will require a completely different type of load balancer: one that works at layer 7 (e.g. SIP), or, in the case of RTP, the network will need to support tunneling IP:ports to a backing service container. Biting off all of this to get port ranges just seems like too much to take on. Couldn't we start simple and bring along some of these other capabilities in stages?

By the way, I'm also very interested in a native SIP load balancer for k8. If this discussion ultimately leads the group in that direction I'm all in.


> If you look at where most of these requests seem to be coming from (media servers and VoIP) I think we could start with only enabling port ranges for pods. Until k8s provides native support for UDP-based protocols like SIP and RTP, I don't see why k8s services would need to expose large port ranges.

I don't know what "native" means here? Other than a large number of ports, what doesn't work?

> For the most part, media services currently require pods to be deployed in "hostNetwork": true mode because the user proxies do not support session affinity. iptables rules also do not help with these protocols for the same reason.

Our proxies support basic IP affinity, or do you mean something else?

> So as an initial proposal, could we move forward with this?
> do we do pods, services, or both? (pods only)

I really think Services are what is interesting (and hard). Just allowing port ranges for pods doesn't do much. Pod ports are mostly advisory; hostPort and names are the only reasons I can think of to use them.


@thockin With respect to the service vs pod point, my take is that one of the few reasons why we'd want to expose a vast range of ports through the firewall is specifically for applications where we don't need/want the service abstraction, because presumably the protocol being used for such applications isn't TCP (which has a listening port + ephemeral port ranges).

VoIP, SIP, RTP, games, all require host networking for latency and protocol reasons, so while services are "interesting," I suspect that most people who are able to use services can live with the inconvenience of having more lines exposing individual ports in their pod/service files. For people who need to expose anything in excess of 100 or so ports (and practically tens of thousands on average), this approach is intractable, and shipping earlier with a smaller scope would be preferable.

But you don't need to list them in a pod at all, unless you are either naming them (not this case) or asking for hostPorts (you said you are already hostNetwork). If you run in hostNetwork mode, you can just receive on any port you want. No YAML needed.

That said, most load-balancers DO support UDP. So if you had Services with port ranges, you could use affinity and run many SIP backends on the same node. I don't know SIP, so I can't say whether that is useful or not :)


Firewall rules automatically written by k8s prevent those ports from being openly accessible though, correct?

@thockin Session affinity with SIP and UDP will require layer 7 packet introspection. You can't just rely on client IP for session affinity, because the client for these connections is most likely some other server (such as a session border controller) rather than the client initiating the call. Therefore a service will possibly see all its inbound connections coming from the same endpoint. The LB will need to understand the SIP protocol (via headers, route headers, etc.). So that's what I meant by native SIP support. I would like to be able to provision a SIP service in front of a bunch of pods that communicate with SIP and have it listen on the standard SIP UDP port (5060 or 5061), but that would require a native SIP load balancer. By the way, this is not really a port range issue, because SIP only needs a couple of ports, just like http.

RTP, which is used to stream media in VoIP applications, is the issue when you start talking about enabling k8s services with large port ranges. That's the only use case I can think of. The IP:port that a caller needs to stream media to on a media server is sent back to the caller in a response to a SIP INVITE message received at the service (e.g. a SIP-based service allocates a media port on a media service and then sends that back to the caller). This is carried in a Session Description back to the caller and may traverse many hops. This media port could be any dynamic port allocated from a large port range. I was not aware that hostNetwork would allow you to bind to any port on the host without a definition in the yml file. If that's the case, this may not be necessary. Docker Compose doesn't work this way, so I assumed k8s would also need that port range to be specified in the yml file. If we wanted to do this without hostNetwork it could get very complicated. For instance, how would the application that is allocating the IP and port be able to choose a port that is routable by some external endpoint? For now I think I agree with your comment that it's probably best to just stick with hostNetwork and not worry about large port ranges for media services. As you said, these types of applications will most likely prefer hostNetwork anyway for performance reasons. I'm going to try removing the static list of ports from my yml file today!

With that said, should this really morph into a discussion about native support for SIP in k8?

As @thockin stated above, there is no need to specify a port range in the yml file if you are using "hostNetwork": true mode. I tested this and it works. This provides what I was looking for, but I would be interested to hear if others still need port range support and why. It seems like native support for SIP load balancing should be opened as a different feature request.
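For reference, a minimal sketch of that workaround (the image name is a placeholder): with hostNetwork enabled, the process can bind any free host port at runtime, so no ports list is needed.

apiVersion: v1
kind: Pod
metadata:
  name: rtp-media
spec:
  hostNetwork: true                 # share the node's network namespace
  containers:
  - name: media
    image: example/media-server     # placeholder image
    # no ports: section required; the server can bind e.g. 16384-32767/UDP directly on the node IP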

I've seen a number of places where people have wanted to open ranges of ports. I don't have the examples off-hand. One example was "whole-IP" forwarding with no port remapping (VIP).


We would also like to see port ranges supported for pods. We are trying to run GlusterFS inside Kubernetes using PetSet. GlusterFS uses one port for each volume provisioned. Their official documentation recommends opening 49152:49251 ports.
https://github.com/heketi/heketi/wiki/Kubernetes-Integration#infrastructure-requirements

This is particularly important when dynamic provisioning is used for GlusterFS volumes.

@bpulito How do you implement load balancing for SIP/RTP in a Kubernetes environment? I understand that K8S services / cloud load balancers would need to operate at L7 to implement session affinity for SIP/RTP. And in such a scenario - if you're not using K8S services - how do you implement service discovery and load balancing in K8S for SIP/RTP workloads.

@thockin Is it possible to write an L7 ingress controller for SIP/RTP traffic that bypasses the concept of services in K8S? In such a case, one would run pods in host networking mode - so you don't need any yaml for port ranges in the pod spec - and use the ingress controller to load balance traffic between the pods.

I too agree that a new issue should be created for supporting native load balancing for SIP/RTP traffic in Kubernetes.

PS: We have a requirement for running a WebRTC gateway on Kubernetes.

@khatribharat I've experimented in the past with using Kamailio as a SIP/RTP proxy in a DaemonSet with hostNetwork. This would be deployed to nodes with public addresses and do direct-to-pod communication on the inside, as you say. I've not had a play with custom ingress controllers, but I think it would be a similar concept.
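A rough sketch of that kind of deployment, assuming the edge nodes carry a label such as edge=true and that a Kamailio image is available (both the label and the image are placeholders, not anything settled in this thread):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sip-proxy
spec:
  selector:
    matchLabels:
      app: sip-proxy
  template:
    metadata:
      labels:
        app: sip-proxy
    spec:
      hostNetwork: true           # SIP/RTP ports bound directly on the node's public IP
      nodeSelector:
        edge: "true"              # assumed label for nodes with public addresses
      containers:
      - name: kamailio
        image: kamailio/kamailio  # placeholder image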

+1

+1

In our use case we need to open ports to a daemon (ds) on each host, and need to allow access to these ports.
Typically we need up to a few hundred ports per host.
Opening a range would be helpful.

I think it's very helpful for web games. We need to start thousands of services, and many services go up and down every day. So if port ranges were supported in services, that would be very, very helpful.

+1

Wanted to set up a localtunnel server on k8s; is there any info on when this feature is going to be available?

There's an IPVS proposal open now, and IPVS, for example, doesn't natively support ranges (as far as I know). Before we can proceed here, we need to answer that.


@bpulito It's not only SIP; you'll also need it for telemetry like MQTT and for many other Erlang-based distributed systems like CouchDB, Couchbase, etc.

Even if IPVS adds support for port range load balancing, the signaling packets and media gateways still remain a problem for both SIP and MQTT protocols because of the missing session affinity mechanism.

If you control the clients then perhaps it's better to load balance via a well-implemented SRV mechanism. However, most SIP services do not control clients and have to deal with trunks and SBCs.

@thockin My proposal would sound something like this:

  1. We need to build a custom object like Ingress. This custom controller object should be able to allocate a public IP, control SRV entries in kube-dns, and perform sanity checks against the k8s API.

  2. We need a new service type like "ExternalName" which, instead of mapping a CNAME, will map SRV records. However, it needs to act like a headless service, and the created Endpoints record in the k8s API should point directly to Pods without proxying. This way you don't need to map port ranges, because there is nothing to proxy.

This implementation would cover any kind of stateful service on L4: MQTT, SIP, XMPP, cross cluster/datacenter replication or shard balancing. NFV will really need this kind of support, but at this point the telco industry is more focused on implementations on top of OpenStack; at some point they will also need K8s, so the telco community should start contributing on this part. I would start contributing, but I'm not 100% sure this is the right approach, since I do not have much experience with K8s internals. Maybe my proposal is not the right approach.

And just to add one more example: passive FTP. I cannot use it, as I need to expose a passive port range on the service (NodePort for example) with a range (31000-31300 for example).

@m1093782566 Any thoughts on if/how to support ranges in IPVS mode?

@thockin

As far as I know, fwmark with ipvsadm can do port ranges. For example,

# mark packets for the whole port range with fwmark 9
iptables -A PREROUTING -t mangle -d 172.16.52.57 -p tcp --dport 100:200 -j MARK --set-mark 9

# create an IPVS virtual service keyed on fwmark 9, round-robin scheduling
ipvsadm -A -f 9 -s rr

# add the real servers in direct-routing (-g) mode
ipvsadm -a -f 9 -r 172.16.52.60 -g
ipvsadm -a -f 9 -r 172.16.52.61 -g

Each service with a port range would create a single iptables rule. I think that's acceptable, since services requiring port ranges are a minority.

PS. the example is in IPVS DR(not NAT) forwarding mode, but it also works for NAT mode.

Another option is to create the IPVS services one by one - not a clever way.

> I've seen a number of places where people have wanted to open ranges of ports. I don't have the examples off-hand. One example was "whole-IP" forwarding with no port remapping (VIP).

I wonder if there are REALLY use cases requiring port remapping?

One use case I can find is:

A real server (A) listens on both 80 and 443, and users specify the port range [80, 443].

Visiting the service at VIP:80 or VIP:443 -> A, without port remapping.

However, with port remapping, if users specify the port range [11, 12], what should the corresponding target ports be? [80, 443] or [443, 80]?

Someone may say, port range can be a service port & target port pair, for example,

service port: [1000: 2000]

target port: [5000: 6000]

I don't think it makes sense, especially with a large number of ports, since the mapping relationship is usually unpredictable.

PS.

From my test result, FWMARK with IPVS does not support port remapping, even in masq mode.

Even with pure iptables -m multiport, I don't know how to support port ranges and port remapping at the same time (with a single iptables command). For example,

iptables -A INPUT -p tcp -m multiport --dports 3000,10000,7080,8080,3000,5666 -j ACCEPT

I'm in a project right now requiring that we open somewhere in the ballpark of 10000 ports, and I am not sitting down to write out every single port one by one in the config file. Is there any ETA on this? It's a pretty ridiculous feature not to have, considering how often it is required when running servers.

  • This can't really be supported in the userspace proxy-mode

I think it could be done with libnetfilter_queue: the proxy would add a rule so that incoming connections would first get sent to -j NFQUEUE, which kube-proxy would be monitoring, so it would see the details of the incoming packet before it was accepted, and it could record the source IP:port → destination port in a hash table. Then the packet would hit another iptables rule, basically like the standard userspace proxy per-service rule, except redirecting the entire port range to a single port on the proxy. The proxy would accept the connection, look up the source IP:port in the hash table to find the original destination port, and then open a connection to the corresponding port on the pod IP.

FYI, there's a decent (needs a couple tweaks, but seems to work well) nfqueue client library in pure go here: https://github.com/subgraph/go-nfnetlink

@danwinship @DirectXMan12

Thanks for coming up userspace proxy mode solutions. BTW, there is a proposed IPVS proxy mode solution(IPVS + FWMARK) in https://github.com/kubernetes/community/pull/1738. Please help review this potential solution when you have a chance, thanks!

Anyway, port range is a strong feature request and I think that most of the blocking issues are in implementation side.

@Excludos Even if you wrote out the config for 10000 ports it wouldn't work. The deployment complains that the file is too large. I generated the port configuration and ended up with exactly that error. So the problem isn't just creating a big YAML file; it simply isn't possible to expose such port ranges at the moment. As you pointed out, port ranges are a must-have, and it's very strange that they are still not available.

@KacperMucha is the host networking not an option for you?

Given the extra memory Docker requires for forwarding such large numbers of ports, doing the port forwarding at that scale even seems like somewhat of an anti-pattern (in the sense that, by design, it is impractical to forward such large numbers of ports).

@gsaslis I have similar case as in the initial post here. It's a SIP server and it has to have such large amount of ports available. As to your suggestion - are you referring to this - https://github.com/Azure/azure-container-networking/blob/master/docs/acs.md?

@gsaslis hostNetwork comes with caveats. TLDR is that using hostNetwork currently requires that, at most, a single instance of any service may be run on any host. A longer response would detail all the reasons for this, possible hacks, and hard limitations, but I think this is the primary detractor.

@pdf yes, of course.

The problem is that - given the current state of Docker - it seems you should NOT even be trying to expose large numbers of ports. You are advised to use the host network anyway, due to the overhead involved with large port ranges (it adds latency as well as consuming significant resources - e.g. see https://www.percona.com/blog/2016/02/05/measuring-docker-cpu-network-overhead/).

If you are looking for a more official source, there is still (for years) an open issue in Docker about this:
https://github.com/moby/moby/issues/11185#issuecomment-245983651

So it seems the tl;dr actually is:
if your app needs a large port range, use host network, and work around the limitations it does come with. Or go write your own iptables rules.

I don't like either, but, well, when life gives you lemons... ;)

@gsaslis it's not clear to me that those concerns apply to Kubernetes. I suggest deferring to those with the knowledge of the various parts involved to declare what's viable. In the mean time, possible workarounds have already been well documented, however it's apparent that the workarounds are not adequate for all use-cases.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Port range proposal using Kubernetes annotations:
annotation: portrange.alpha.kubernetes.io/port-end-xxxx
https://github.com/kubernetes/kubernetes/commit/2cf5af1f8cc33e2d22ed1968aca05e5323ef9a0b

Can this be merged?

/lifecycle stale

@maoge We need to figure out how IPVS implements port range.

There are a lot of "+1" type comments on this bug, but still a lot of vagueness about exactly what people need. In particular, the proposal in https://github.com/kubernetes/community/pull/1738 excludes NodePort and ExternalIP services and doesn't discuss LoadBalancers or Ingress at all (in the actual PR). So it doesn't provide any way for a port-range service to accept connections coming from outside the cluster, which is something I think many of the people here probably need.


For SIP/RTP-like use cases, where the pod picks a port to use at random out of the port range, you are effectively limited to one Service per client-visible IP, because the pods will be picking the random ports to use based on what ports are currently unbound in their own network namespace, and so if two pods tried to share the same External/Node/LoadBalancer/Ingress IP, they would be unable to see each other's bound ports and so might both try to use the same ephemeral port at the same time.

Given that, it seems like a simpler model would be that rather than having a Service that binds a range of ports, you instead have a Service that binds an entire IP address. (@thockin hinted at this earlier.) This would be much simpler to implement with the IPVS/userspace/etc proxiers, and would be trivial to add ExternalIP support to as well (perhaps even with an admission controller to ensure you don't bind two whole-IP services to the same ExternalIP). And it's certainly no harder to add LoadBalancer/Ingress support for than port ranges would be. (NodePort would still be out though, since you can't actually reserve the whole IP in that case.)

Binding an entire IP rather than a port range wouldn't work for:

  1. the case where you want to have both a SIP/RTP-like pick-a-random-port service and one or more other services on the same client-visible IP, but not have all of the services implemented by the same pod.
  2. the case where you want to have multiple services binding to different port ranges on the same client-visible IP. (ie, service A binds ports 1000-1999, service B binds ports 2000-2999, etc.)

Does anyone need to do either of those things? (Or anything else where port ranges would work but binding a whole IP wouldn't?)

Whole-IP services might also not be much help to people who want a Service that binds, eg, 10 ports as opposed to 1000. But they can still just keep writing out all 10 ports individually like they're doing now.

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

+1 to this on my side. I'm working on cluster-to-cluster communication pods that need to route lots of services; it would be very helpful to be able to expose a bunch of ports at once so I don't need to re-kick the pod to define new ports.

+1 for Deluge traffic range

/lifecycle frozen
/remove-lifecycle stale

Just making sure this doesn't rot.

I hit the same problem. In my case, I need to run a TCP server on a random port at runtime.
My solution is to expose a port range in the Dockerfile, then create a service with a NodePort from my application using the Kubernetes Java client. One TCP server per service.
A stupid and complicated method, but it works around my problem for now. Hoping for port range support.

Hi @adimania, were you able to fix that? Also, I am getting ICE failures when I deploy an SFU/MCU in Kubernetes. Can you help me with this?

+1

Needed to run the RTP Endpoint services in kubernetes. Currently running with host networking

I ran into this as an issue getting a mosh based bastion host up and running.. workaround:

for i in $(seq 60000 60100); do echo -e "  - protocol: TCP\n    port: $i\n    targetPort: $i\n    name: bastion-$i"; done

I do wonder how many ports can be in a service and what the implications of adding hundreds of single items are for the system.

get services is now pretty ugly:

bastion-mosh LoadBalancer 10.111.49.103 192.168.0.243 60000:30077/UDP,60001:30289/UDP,60002:30482/UDP,60003:32327/UDP,60004:31479/UDP,60005:31174/UDP,60006:31713/UDP,60007:30375/UDP,60008:31823/UDP,60009:31801/UDP,60010:32442/UDP,60011:30647/UDP,60012:31187/UDP,60013:32446/UDP,60014:31723/UDP,60015:30504/UDP,60016:32186/UDP,60017:31339/UDP,60018:31230/UDP,60019:31210/UDP,60020:31659/UDP,60021:31886/UDP,60022:30806/UDP,60023:32163/UDP,60024:32553/UDP,60025:31685/UDP,60026:32478/UDP,60027:30841/UDP,60028:31189/UDP,60029:32533/UDP,60030:30711/UDP,60031:31945/UDP,60032:32311/UDP,60033:30253/UDP,60034:30218/UDP,60035:31354/UDP,60036:31675/UDP,60037:31624/UDP,60038:31019/UDP,60039:31406/UDP,60040:31256/UDP,60041:31430/UDP,60042:32570/UDP,60043:30888/UDP,60044:32179/UDP,60045:32294/UDP,60046:30308/UDP,60047:31087/UDP,60048:31443/UDP,60049:31872/UDP,60050:30373/UDP,60051:30317/UDP,60052:31521/UDP,60053:32437/UDP,60054:31190/UDP,60055:32625/UDP,60056:30150/UDP,60057:31784/UDP,60058:31021/UDP,60059:30882/UDP,60060:32543/UDP,60061:30747/UDP,60062:30712/UDP,60063:32287/UDP,60064:31478/UDP,60065:32761/UDP,60066:32061/UDP,60067:32472/UDP,60068:32147/UDP,60069:31375/UDP,60070:30554/UDP,60071:32521/UDP,60072:32394/UDP,60073:31274/UDP,60074:31091/UDP,60075:31394/UDP,60076:31052/UDP,60077:32025/UDP,60078:32757/UDP,60079:31047/UDP,60080:31633/UDP,60081:30081/UDP,60082:30499/UDP,60083:30750/UDP,60084:32193/UDP,60085:32721/UDP,60086:32071/UDP,60087:31532/UDP,60088:30451/UDP,60089:30461/UDP,60090:30511/UDP,60091:31033/UDP,60092:31086/UDP,60093:31796/UDP,60094:30899/UDP,60095:31887/UDP,60096:32372/UDP,60097:32613/UDP,60098:31490/UDP,60099:31295/UDP,60100:30822/UDP 19h

Another reason this would be handy: specifying a large amount of ports to open on Services means that an equal amount of iptables rules get created (if k8s is using the iptables proxy mode, like GKE does). Iptables performance slows down dramatically after about 10,000 rules (as implied here) which means that a service with 10,000 open ports (for PASV FTP, for instance) or 10 services with 1000 open ports would degrade iptables performance on k8s nodes.

This problem is solved by the IPVS proxy, but not all environments support this mode (again, GKE). Being able to specify ranges instead of individual ports would also solve this problem.

is this still a possibility without killing iptables?

@m1093782566
If this issue has been triaged, please comment /remove-triage unresolved.

If you aren't able to handle this issue, consider unassigning yourself and/or adding the help-wanted label.

🤖 I am a bot run by vllry. 👩‍🔬

Side note: Now that multiple interfaces per pod are becoming a standard, it seems that such an additional interface is the escape hatch to expose port ranges.

@fabiand could you elaborate?

Actually - I was wrong, I ignored the context.
No, even with additional interfaces you can not expose port ranges _in services_. It would just allow you to expose a complete IP to the outside of a cluster, which is similar but not the same.

And how would you do that?

use multus to provide additional nics to a pod?

Multus does not seem to work with openshift 3.11 or atomic host :/

> Multus does not seem to work with openshift 3.11 or atomic host :/

@kempy007 Multus does work with OpenShift 3.11 though it's not quite as well integrated as with OpenShift 4.x

Thanks @dcbw, I pinned it down to the interface requiring promisc mode to be enabled. Is OpenShift 4.x now deployable to non-AWS targets? I see more providers now. :) I will eagerly try v4 tomorrow.

Hi,
I am working on adding a static NAT rule to translate the cluster IP to the pod IP without port translation, so the entire port range opens up between the cluster IP and the pod IP.

The implementation is simple: in the presence of an annotation, the static NAT rule is added.

I wanted to know if anyone would be interested before I proceed further.

Implementation is available at mybranch


That is great, we are interested.

Any updates on supporting port range? I think it's a very useful feature

Context: sig-architecture subgroup work to search for technical debt
Category: Networking
Reason: Wider adoption, useful for RTP/media kind of traffic, community interest, not supported in usermode kubeproxy

This feature would also be handy for running a Samba DC within Kubernetes, as the required ports include all of the dynamic/RPC ports: 49152-65535.

Deploying this service is therefore currently impossible, as kubectl apply -f samba-dc.yaml fails with:

The Deployment "samba-dc" is invalid: metadata.annotations: Too long: must have at most 262144 characters

It is also impractical to handle such a large deployment file, e.g. https://gist.github.com/agowa338/fbf945f03dd6b459c315768ecbe89fc0

Hi,

Any update on the support of port ranges for a pod/service? Is any development happening in this area?

Gaurav

same need here for media streaming.

Until now I just found one solution for implementing this, by changing the networking stack of the k8s cluster to an implementation like shown here: https://fosdem.org/2020/schedule/event/rethinking_kubernetes_networking_with_srv6/

I am reminded of this issue as I try to put an NFS server behind a load balancer for serving clients outside the cluster. Portmapper hands out ports that the LB is not forwarding, and thus this does not work.

Wondering if https://github.com/cilium/cilium can solve kube-proxy scale issues by bypassing iptables limitations.

What is the plan on this feature? Any work around? Needed for

  • Media Server
  • RTC Server
  • SIP/RTP Server
  • NFS Server
  • Samba DC
  • Telnet Server

Also for STUN server

> What is the plan on this feature? Any work around? Needed for

The workaround is:

> Until now I just found one solution for implementing this, by changing the networking stack of the k8s cluster to an implementation like shown here: https://fosdem.org/2020/schedule/event/rethinking_kubernetes_networking_with_srv6/

You need to remove the NAT and route your traffic directly to the pods. This also works for IPv4, but it either requires an external NAT46 (with 1:1 mapping) or a routed subnet (which might be expensive).

>   • Telnet Server

This one shouldn't be affected, Telnet uses only a single port. Did you mean active FTP instead?

> You need to remove the NAT and route your traffic directly to the pods. This also works for IPv4, but it either requires an external NAT46 (with 1:1 mapping) or a routed subnet (which might be expensive).

That's a workaround, yes, but not a solution. Routing to the pod subnet is a major security hole and people who understand the ramifications of doing this ... don't.

@briantopping It isn't a security hole if you know what you're doing... The document linked above shows how to architect such a setup. And with that setup in place you can just as easily add a NAT46 for a public IPv4 in front of it...

Instead of "exposing" ports, one can just as well create iptables rules to allow/deny the traffic...

It's a modern IPv6-first design; these all rely on routing instead of NATing, and security-wise it's the same. It just requires the admin to know different concepts...
These setups also tend to be much more secure because they don't rely on a "this service is not NATed to the outside so nobody can access it" philosophy (there are often very trivial ways to bypass these and still reach the non-exposed services)...

>   • Telnet Server
>
> This one shouldn't be affected, Telnet uses only a single port. Did you mean active FTP instead?

By telnet server I meant a console server: a server that listens on a range of ports and connects each port to a serial console of a physical or virtual device, mostly using the telnet protocol.

> It isn't a security hole if you know what you're doing

Nothing is a security hole, if you know what you're doing. Let's strive for secure and easy to use by default.

> Nothing is a security hole, if you know what you're doing. Let's strive for secure and easy to use by default.

Then you shouldn't use NAT44 (and should stick to IPv6 only), as that only provides a false sense of security instead of being secure and easy by default.
By now there are countless write-ups of how to bypass NATs from the outside to reach unintended destinations...
NAT is not, was not, and most likely will not be a security boundary. In fact, it was not even designed to be one. All it does is make your firewall ruleset more complex to read and handle, as well as making your log files opaque as to where a connection actually comes from or goes to...

Any update on the ability to expose a port range? I skimmed through the comments but couldn't find an answer.

Thanks!

@dkatipamula you can't do it at the moment. You could do it programmatically via the API, by patching the service object, but not with the manifest itself. I hope this gets implemented at one point.

Same issue here; this feature was requested back in 2016, but it seems that no effort has been dedicated to solving it. Adding the range manually is a hassle and error-prone.

A shame because work is on the way for port ranges on networkpolicies: https://github.com/kubernetes/enhancements/pull/2090
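For context, that enhancement adds an endPort field to NetworkPolicy port entries, so a contiguous range can be allowed in a single entry, roughly like this (selector and range values are only examples):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-rtp-range
spec:
  podSelector:
    matchLabels:
      app: media          # example selector
  ingress:
  - ports:
    - protocol: UDP
      port: 16384         # start of the range
      endPort: 32767      # end of the range (needs a cluster version where endPort is supported)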

Hello,

As the original author of the mentioned Kep, I'm going to take a look into this issue, but some points to consider:

  • Because of the nature of Services, I think this would be pretty hard, as you can have a Service with a ClusterIP pointing to a different target port. So how would the following use case be covered by a portRange:
 ports:
  - port: 443
    protocol: TCP
    targetPort: 8443

Maybe if the idea is only to make a from:to mapping with the same ports, it could be validated in the API with "if both an endPort and a targetPort exist, return an error".
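To make that constraint concrete, a hypothetical Service with an endPort field (no such field exists in the Service API today; this only mirrors the NetworkPolicy shape mentioned above) might be accepted only when no port remapping is requested:

apiVersion: v1
kind: Service
metadata:
  name: rtp-range
spec:
  selector:
    app: media            # example selector
  ports:
  - protocol: UDP
    port: 16384
    endPort: 32767        # hypothetical field, not part of the Service API
    # targetPort deliberately omitted: combined with a range it could only mean "same port"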

I see the problem we want to solve here as: I have a Pod with multiple ports / a range that needs to be exposed via my ClusterIP / LoadBalancer / ExternalIP, like an NFS or FTP server, and I want to expose all of them through a single ClusterIP, right?

I don't have a strong feeling about how to achieve this, due to the nature of the components that deal with the ClusterIP (aka kube-proxy, though Calico and Cilium have their own Service implementations).

I think one thing that could be done is to ping sig-network on Slack or the mailing list, asking about this issue and how plausible it is.

If it's plausible, a KEP (like the one above) should be written, as this is going to change a core API.

@thockin @andrewsykim @danwinship WDYT about this?

There was actually a KEP for this feature before, but it was being specified in a way that wouldn't have worked for most of the users here (https://github.com/kubernetes/kubernetes/issues/23864#issuecomment-435957275). Indeed, one of the +1 comments here even notes "BTW exposing port range is not what most of us want", and another comment points out that for the SIP/RTP case, service port ranges don't actually even help.

If you google "kubernetes sip rtp", this issue is one of the first hits (and the very first "kubernetes-internal" hit), and I think most of the people +1'ing here are doing so because they want to do SIP/RTP to kubernetes pods, and they are assuming that because other people are talking about SIP/RTP here, that this must be the issue that needs to be fixed in order to make SIP/RTP in Kubernetes work better. But I don't think it actually is the feature that they need.

How would we deal with the scale of environment variables in the auto-injected services? Right now, for example, a single service creates all these entries, and the pods get this metadata for free (i.e. that's how the magic in-cluster client is configured, but some folks that don't have kube-dns also use these env vars in apps to find IPs for stuff):

MY_SERVICE_PORT=tcp://10.96.24.249:80
MY_SERVICE_PORT_80_TCP=tcp://10.96.24.249:80
MY_SERVICE_PORT_80_TCP_ADDR=10.96.24.249
MY_SERVICE_PORT_80_TCP_PORT=80
MY_SERVICE_PORT_80_TCP_PROTO=tcp
MY_SERVICE_SERVICE_HOST=10.96.24.249
MY_SERVICE_SERVICE_PORT=80

If you supported a large range of, say, all ports, would that mean you'd inject something like 4096*4 environment variables for every service?

Maybe if you did this, you could say "if using a port range, there are no guarantees around env var injection", but that of course would make for a split-brained UX around service env var injection.

The env var problem would be quite trivial to solve. Since it is the application that uses them, and applications needing a single port won't suddenly need a port range, we could just extend the standard to have two forms: for single ports, keep the current env variables; for the port ranges requested here, use something like:

MY_SERVICE_PORT_RANGE_START=1024
MY_SERVICE_PORT_RANGE_SIZE=2048

This will indicate that there is also a port range defined with 2048 ports starting at 1024.

From following this issue for years now, I think the best thing to do is to just not use the export/expose feature anymore and instead use the network CNI directly to publish everything and restrict access externally to the k8s cluster, as we're clearly hitting a design limitation here that is not being addressed in a timely manner.

The only clean solution I can see would be to make the cluster IPs (at least partially) routable from outside and, instead of NATing IPs to the outside world, to configure a firewall and/or connection security rules.
Because the next problem we'd hit after getting the port range feature will inevitably be port exhaustion: if I want to run two services that need the full dynamic port range, e.g. active FTP + Samba AD DC, the ports on the node's own IP will not be enough.

Therefore we'd either need:

  • the cluster IP to be accessible from outside, with connections restricted/allowed by firewall/connection security rules instead of NAT
  • if cluster IPs shouldn't be exposed (because of a false sense of security or whatever), a second IP allocation pool that is similar to the cluster IP pool but with publicly routable IPs and proper firewalling, under a different name

For me, I just went the "use IPv6 and have enough address space" route, so I can make the cluster IPs routable from outside and do proper firewalling instead of NAT. My cloud provider offers a NAT64 gateway as a service, so I just needed to configure a DNS64 DNS server and egress was basically done. Ingress is a different story: since my cloud provider doesn't offer BGP, traffic would always pass through one node (i.e. I still need to write a manager that flips the route into the cluster when the "ingress" node goes down...), but besides that it works.
Bonus, if you now ask yourself how IPv4 clients will connect to these services: the answer is quite simple, through a VIP that the cloud provider's load balancer has allocated.

@danwinship Port ranges are what some of us want, even for doing SIP/WebRTC. Using host networking is an inferior solution compared to port ranges. Of course, having IPv6 or public cluster IPs could be better, but it's impossible to have that in most cloud setups; hence, port ranges could be a decent compromise until we get to full IPv6 connectivity.

@nustiueudinastea so what is the complete architecture? Are you using load balancers? External IPs? Do you have a separate LB/external IP for each pod? If not, don't you have to worry about port conflicts between the different pods, who have no way to coordinate among each other which ports are in use and which aren't? Or do you only have a single (or very small number) of pods doing SIP/RTC so that doesn't really matter? If you do have a separate LB/external IP for each pod, then would your needs be met just as well by having a "whole IP" service rather than a port range, or are you also using _other_ ports on the same LB/externalIP for unrelated services?

> How would we deal with the scale of environment variables in the auto-injected services?

Don't. Environment variables for services are bad anyway.

@danwinship Good questions. In my particular case, I have several pods on each node, deployed using DaemonSets, which allows me to pre-define the port ranges for each DaemonSet so that they don't conflict. At the moment, I set the port ranges programmatically via the K8s API so I don't have to embed tens of thousands of ports in the DaemonSet definitions. Clients/applications talk directly to the DaemonSet servers via the external IP of the node they are hosted on.

Clarification: at the moment I am using hostPorts to provide direct connectivity to the WebRTC servers running in the daemonsets.
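For reference, hostPort entries still have to be enumerated one per port in the container spec, which is exactly the pain point of this issue; a fragment of such a pod template might look like this (names and port numbers are only illustrative, mirroring the fragment at the top of the issue):

  containers:
  - name: webrtc
    image: example/webrtc-server    # placeholder image
    ports:
    - containerPort: 50000
      hostPort: 50000               # each port in the range needs its own entry
      protocol: UDP
    - containerPort: 50001
      hostPort: 50001
      protocol: UDP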

Env vars ... ok ... if you all say we can break the 1990s Docker links env var model to make progress, I'm all for it :). We should probably have a KEP for that, since some things like client-go rely on this?

> @nustiueudinastea so what is the complete architecture? Are you using load balancers? External IPs? Do you have a separate LB/external IP for each pod? If not, don't you have to worry about port conflicts between the different pods, who have no way to coordinate among each other which ports are in use and which aren't? Or do you only have a single (or very small number) of pods doing SIP/RTC so that doesn't really matter? If you do have a separate LB/external IP for each pod, then would your needs be met just as well by having a "whole IP" service rather than a port range, or are you also using _other_ ports on the same LB/externalIP for unrelated services?

Well in the "normal enterprise world" or in a "normal cdn" world one could simply use ECMP with an deterministic algo, like using the client IP. Same client IP goes to the same backend. If the backend is down the connection goes to to another one. That other one than sends a RST package and the connection gets reestablished by the client...
Or alternatively the connection state of open and closed ports from the kernel could be synced, I've already seen people doing it for very highly frequented services to prevent RSTs when the lb does a failover (aka. Active-Active configuration).

Anyway, I this is out of scope for k8s and would be something that people would/could do with a daemonset on top of it.
I think we should go with a KEP that drops port mapping entirely (or at least discourages until we can deprecate it) and provides firewalling of the services instead. Should also improve performance and reduce cpu load. As well as enable clusters to span multiple clouds without paying for VPN tunnels or doing any crazy overlay networking...

> Env vars ... ok ... if you all say we can break the 1990s Docker links env var model to make progress, I'm all for it :). We should probably have a KEP for that, since some things like client-go rely on this?

oh, KUBERNETES_SERVICE_HOST needs to stay. I just meant env vars for end-user services. See also https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/0028-20180925-optional-service-environment-variables.md

Guys, I have been following this topic since 2016 and I need to remind you of one thing. The whole point of using K8s is to EASILY automate a fleet control plane. When the automation itself becomes a bigger problem than the process itself, you have to stop the madness of these crazy workarounds. K8s was not built as a data plane orchestration solution. It's actually crazy inefficient to run the data plane through K8s.

I have dealt with multiple telco setups in the last 5 years. Forget about K8S if you need distributed SBCs or distributed media. Just go with Terraform for orchestration and keep K8S as a proxied fleet control plane for the data plane(media).

TLDR: Do not use K8s to orchestrate the data plane. Orchestrate it with Terraform.
