Enhancements: Topology aware routing of services

Created on 11 Jan 2018  ·  79 Comments  ·  Source: kubernetes/enhancements

Feature Description

  • One-line feature description (can be used as a release note):
    Implement "local service", i.e. topology-aware routing of services. "Local" means "same topology level", e.g. same node, same rack, same failure zone, same failure region, or whatever users like.

  • Primary contact (assignee):
    @m1093782566

  • Responsible SIGs:
    /sig network

  • Design proposal link (community repo):
    kubernetes/enhancements#640

  • KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/20181024-service-topology.md

  • Link to e2e and/or unit tests:

  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred:
    @thockin @quinton-hoole @kevin-wangzefeng

  • Approver (likely from SIG/area to which feature belongs):
    @thockin

  • Feature target (which target equals to which milestone):

    • Alpha release target (1.15)

    • Beta release target (x.y)

    • Stable release target (x.y)
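
In the alpha implementation (as described in the KEP), this routing preference is expressed as an ordered `topologyKeys` list on the Service. A minimal sketch, with a made-up service name and selector for illustration:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical service name
spec:
  selector:
    app: my-app
  ports:
    - port: 80
  # Keys are tried in order: prefer endpoints on the same node,
  # then the same zone, then fall back to any endpoint.
  # "*" is the catch-all and must be the last entry if used.
  topologyKeys:
    - "kubernetes.io/hostname"
    - "topology.kubernetes.io/zone"
    - "*"
```

Traffic goes to the endpoints that match the client's node on the first key that yields any match.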

PRs

kind/api-change kind/feature sig/network stage/alpha tracked/no wg/policy

Most helpful comment

@helayoty It's already in alpha in v1.17, and some things are still unclear, so it may have no major changes in v1.18.

All 79 comments

/assign

After discussing in today's sig-network meeting, we are targeting it for Alpha in v1.10.

cc @kubernetes/sig-network-misc

The design discussion has not been resolved, so this now looks targeted for v1.11? Please update if not so.

It seems we have missed the v1.10 time window. I will present it at the next SIG-Network meeting to see if it can get a v1.11 ticket.

@m1093782566
Any plans for this in 1.11?

If so, can you please ensure the feature is up-to-date with the appropriate:

  • Description
  • Milestone
  • Assignee(s)
  • Labels:

    • stage/{alpha,beta,stable}

    • sig/*

    • kind/feature

cc @idvoretskyi

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

/wg policy

This feature currently has no milestone, so we'd like to check in and see if there are any plans for this in Kubernetes 1.12.

If so, please ensure that this issue is up-to-date with ALL of the following information:

  • One-line feature description (can be used as a release note):
  • Primary contact (assignee):
  • Responsible SIGs:
  • Design proposal link (community repo):
  • Link to e2e and/or unit tests:
  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred:
  • Approver (likely from SIG/area to which feature belongs):
  • Feature target (which target equals to which milestone):

    • Alpha release target (x.y)

    • Beta release target (x.y)

    • Stable release target (x.y)

Set the following:

  • Description
  • Assignee(s)
  • Labels:

    • stage/{alpha,beta,stable}

    • sig/*

    • kind/feature

Once this feature is appropriately updated, please explicitly ping @justaugustus, @kacole2, @robertsandoval, @rajendar38 to note that it is ready to be included in the Features Tracking Spreadsheet for Kubernetes 1.12.


Please note that Features Freeze is tomorrow, July 31st, after which any incomplete Feature issues will require an Exception request to be accepted into the milestone.

In addition, please be aware of the following relevant deadlines:

  • Docs deadline (open placeholder PRs): 8/21
  • Test case freeze: 8/28

Please make sure all PRs for features have relevant release notes included as well.

Happy shipping!

P.S. This was sent via automation

Hi @m1093782566
This enhancement has been tracked before, so we'd like to check in and see if there are any plans for this to graduate stages in Kubernetes 1.13. This release is targeted to be more ‘stable’ and will have an aggressive timeline. Please only include this enhancement if there is a high level of confidence it will meet the following deadlines:
Docs (open placeholder PRs): 11/8
Code Slush: 11/9
Code Freeze Begins: 11/15
Docs Complete and Reviewed: 11/27

Please take a moment to update the milestones on your original post for future tracking and ping @kacole2 if it needs to be included in the 1.13 Enhancements Tracking Sheet

We are also now encouraging that every new enhancement aligns with a KEP. If a KEP has been created, please link to it in the original post or take the opportunity to develop a KEP.

Thanks!

Per sig-network we are looking at getting a revised KEP in for the 1.13 timeframe but not the code.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

@m1093782566 Hello - I'm the enhancements lead for 1.14 and I'm checking in on this issue to see what work (if any) is being planned for the 1.14 release. Enhancements freeze is Jan 29th, and I want to remind you that all enhancements must have a KEP.

@m1093782566 Thanks! Based on the description it sounds like this feature is targeting alpha in 1.15 so there is nothing to track for 1.14 but let me know if that is not correct

@claurence We were hoping for alpha in 1.14 if we can get it in. @m1093782566 do you think that you will finish the CRD rework in time for 1.14?

Thanks @johnbelamaric - do you have any open PRs for this issue for it to make the alpha milestone?

@claurence We were hoping for alpha in 1.14 if we can get it in. @m1093782566 do you think that you will finish the CRD rework in time for 1.14?

I will try.

@m1093782566 @johnbelamaric Hello - we've noticed the KEP for this issue is still marked as "pending" - what more is needed to make the KEP implementable?

This is implementable, we should update that. @m1093782566 do you still think this can make the March 7 code freeze date?


@johnbelamaric

Thanks. I am not sure if we can make the 1.14 code freeze date but I am working on it now.

/assign @thockin
for API review

@vishh shadow

@johnbelamaric @m1093782566 @thockin can one of you please PR the KEP to implementable status? ref: https://github.com/kubernetes/enhancements/issues/536#issuecomment-462418092

done in #845

@claurence this has been retargeted to 1.15. Can you reassign it (I don't have permissions)?

/milestone v1.15

I am interested in shadowing the API review for this... next quarter

Hello @johnbelamaric , I'm the Enhancement Lead for 1.15. Is this feature going to be graduating alpha/beta/stable stages in 1.15? Please let me know so it can be tracked properly and added to the spreadsheet.

Once coding begins, please list all relevant k/k PRs in this issue so they can be tracked properly.

@kacole2

Yes, we are targeting v1.15 and the relevant PR in k/k can be found here: https://github.com/kubernetes/kubernetes/pull/72046

My friend @guanyuding and I will carry it forward.

Thanks!

@m1093782566 we're doing a KEP review for enhancements to be included in the Kubernetes v1.15 milestone. After reviewing your KEP, it's currently missing test plans and graduation criteria which is required information per the KEP Template. Please update the KEP to include the required information before the Kubernetes 1.15 Enhancement Freeze date of 4/30/2019. Thank you.

@m1093782566, Enhancement Freeze for Kubernetes 1.15 has passed and this did not meet the deadline. Test Plans and Graduation Criteria still need to be added to the KEP. This is now being removed from the 1.15 milestone and the tracking sheet. If there is a need for this to be in 1.15, please file an Enhancement Exception. Thank you. @claurence

/milestone clear

Hi @m1093782566 , I'm a 1.16 Enhancement Shadow. Is this feature going to be graduating alpha/beta/stable stages in 1.16? Please let me know so it can be added to the 1.16 Tracking Spreadsheet. If it's not graduating, I will remove it from the milestone and change the tracked label.

After reviewing your KEP, it's currently missing test plans and graduation criteria which is required information per the KEP Template. Please update the KEP to include the required information before the Kubernetes 1.16 Enhancement Freeze date of 7/30/2019

Milestone dates are Enhancement Freeze 7/30 and Code Freeze 8/29.

Thank you.

Hello @m1093782566, 1.17 Enhancement Shadow here! 🙂

I wanted to reach out to see if this enhancement will be graduating to alpha/beta/stable in 1.17.
Please let me know so that this enhancement can be added to the 1.17 tracking sheet.

Please note that the KEP is missing a test plan.

Thank you!

🔔 Friendly Reminder

  • The current release schedule is

    • Monday, September 23 - Release Cycle Begins

    • Tuesday, October 15, EOD PST - Enhancements Freeze

    • Thursday, November 14, EOD PST - Code Freeze

    • Tuesday, November 19 - Docs must be completed and reviewed

    • Monday, December 9 - Kubernetes 1.17.0 Released

  • A Kubernetes Enhancement Proposal (KEP) must meet the following criteria before Enhancement Freeze to be accepted into the release

    • PR is merged in
    • In an implementable state
    • Include test plan and graduation criteria
  • All relevant k/k PRs should be listed in this issue

@annajung

I wanted to confirm that this enhancement will be graduating to alpha in 1.17. Would you please help with adding it to the 1.17 tracking sheet? Thanks!

Hey @m1093782566 , I will add this enhancement to the tracking sheet to be tracked 👍

Please see the message above for friendly reminders and note that KEP is missing a test plan.

Hey @m1093782566 , if possible, please include links to the tests in testgrid and keep track of any added tests. Thank you!

Hey @m1093782566 , we're only 5 days away from the Enhancements Freeze (Tuesday, October 15, EOD PST). Another friendly reminder that to graduate this in the 1.17 release, you will need to have a test plan defined in the KEP.

Could you also clarify what you mean by this statement under graduation criteria in the KEP for going from alpha to beta?

E2E tests exist for service topology

Any info on helping other members understand the intent would be very appreciated! Thanks again! Please don't hesitate to reach out if you have any questions

Hey @m1093782566 , unfortunately deadline for 1.17 enhancement freeze has passed and looks like the KEP is still missing a test plan. I will be removing this enhancement from the 1.17 milestone.

Please note that you can file an enhancement exception if you need to get this in for 1.17

/milestone clear

I will file an exception today. I added a test plan here: https://github.com/kubernetes/enhancements/pull/1322

@guineveresaenger approved the exception today, so I'll add this back to the 1.17 milestone.

/milestone v1.17

Hello @m1093782566 I'm one of the v1.17 docs shadows.
Does this enhancement (or the work planned for v1.17) require any new docs or modifications to existing docs? If not, can you please update the 1.17 Enhancement Tracker Sheet (or let me know and I'll do so).

If so, just a friendly reminder that we're looking for a PR against k/website (branch dev-1.17) due by Friday, November 8th; it can just be a placeholder PR at this time. Let me know if you have any questions!

@m1093782566

Since we're approaching Docs placeholder PR deadline on Nov 8th. Please try to get one in against k/website dev-1.17 branch.

Hey @m1093782566 1.17 Enhancement Shadow here! 👋 I am reaching out to check in with you to see how this enhancement is going.

Thank you for providing k/k PR, I have added the following kubernetes/kubernetes#72046 in the tracking sheet. Are there any other k/k PRs that need to be tracked as well?

Also, another friendly reminder that we're quickly approaching code freeze (Nov. 14th).

@irvifa docs placeholder PR submitted

Hi @m1093782566 @johnbelamaric , Tomorrow is code freeze for 1.17 release cycle. It looks like the k/k PRs have not been merged. We’re flagging this enhancement as At Risk in the 1.17 tracking sheet.

Do you think all necessary PRs will be merged by the EoD of the 14th (Thursday)? After that, only release-blocking issues and PRs will be allowed in the milestone with an exception.

FYI: I wrote a Chinese article about this feature a few days ago, which received a great response in China, and I have now translated it into English: https://imroc.io/posts/kubernetes/service-topology-en/

Thank all of you for your patience and careful review @thockin @andrewsykim @johnbelamaric @robscott

I will invest more time and effort to promote this feature. I listed some TODOs:

  • Update KEP
  • Make Service Topology a full replacement for ExternalTrafficPolicy
  • Support more proxy backends
  • Support headless services (kube-dns/CoreDNS)

Cross-AZ traffic is not free, and currently there aren't really any tools to avoid it in Kubernetes.

Amazon bills $0.02 per GB: https://www.lastweekinaws.com/blog/aws-cross-az-data-transfer-costs-more-than-aws-says/

Meaning with 3 AZs, two-thirds of our internal traffic will be billed at $0.02 per GB (assuming random distribution of target IPs).

Topology-aware routing would be a great feature for keeping data transfers free by preferring pods within the same AZ.
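
The two-thirds figure follows from assuming target IPs are picked uniformly at random: with n zones, only 1/n of targets share the caller's zone. A quick back-of-the-envelope sketch (the 10 TB traffic volume is a made-up number):

```python
def cross_az_fraction(zones: int) -> float:
    """Expected fraction of traffic that crosses zones when targets
    are spread uniformly across `zones` availability zones."""
    return (zones - 1) / zones

def monthly_cross_az_cost(traffic_gb: float, zones: int,
                          rate_per_gb: float = 0.02) -> float:
    """Cost of the cross-zone share of `traffic_gb` at `rate_per_gb` ($/GB)."""
    return traffic_gb * cross_az_fraction(zones) * rate_per_gb

# With 3 AZs, 2/3 of traffic crosses zones; 10 TB/month costs ~$133.
print(cross_az_fraction(3))              # 0.666...
print(monthly_cross_az_cost(10_000, 3))  # ~133.33
```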


Yes, cross-AZ traffic is usually not free with most cloud providers. Let's look forward to this feature.

/assign

/assign @johnbelamaric @andrewsykim

I'm going to update the KEP, but I found that some aspects of how to implement headless services need to be rethought and discussed.

In order to handle headless services, the DNS server needs to know the node corresponding to the client IP address in the DNS request, i.e. it needs to map PodIP -> Node. Kubernetes DNS servers (including kube-dns and CoreDNS) would watch PodLocator objects. When a client pod requests a headless service domain from the DNS server, the DNS server would retrieve the node labels of both the client and the backend pods via PodLocator, select only the IPs of backend pods in the same topological domain as the client pod, and then write the A records.

The initial design was that the DNS server watches PodLocator to get the node labels (topology information) of both the client and the backend pods: when a client pod requests a headless service domain, the DNS server retrieves those labels via PodLocator.

But our current kube-proxy implementation watches EndpointSlice to get the topology information of the backend pods, and watches the current node to get the topology information of the client, so PodLocator is no longer needed there. But how does the DNS server get the topology information of the client pod without PodLocator? I think we need PodLocator again if we want to implement this for headless services 😂

@thockin @robscott @andrewsykim @johnbelamaric @m1093782566

I think kube-dns/CoreDNS could just watch node metadata to get topology information. That takes more compute resources, but the DNS server has only a few replicas, unlike kube-proxy, which runs one replica per node.

What do you think? @thockin @andrewsykim

Hi @imroc, a friend of mine just sent me this issue, because I was thinking of writing a CoreDNS plugin to route requests to the same zone, and I actually did.

It is just a prototype but it does the trick:
https://github.com/erickfaustino/coredns/blob/inmemory/plugin/k8szoneaware/

Observations I had while writing this:

  1. Using the K8s API for each request could be slower than caching everything.
  2. I was thinking of using an informer to update my cached maps on any Add, Delete, or Update events (I have not written this part yet).
  3. Use an annotation on the Service to make it eligible.
  4. It can only be used with headless services.

I hope this helps you, and I'm glad to help further if I can.

@erickfaustino Thanks for your help. But the key point now is how to get the client's topology information. With your approach, we would need to cache all pods and all nodes (get the pod by request IP, get the nodeName from the pod, then get the node's topology). If we do this, CoreDNS will consume a lot of resources in a big cluster, e.g. 5k nodes and 50k+ pods, and the nodes are updated frequently because kubelet reports node status.

The whole idea is simple:

  1. Get the client's source IP from the DNS request.
  2. Find the topology information of the client (hostname, region, zone, ...).
  3. Filter the endpoints for the requested headless service according to the client's topology.
  4. Return the filtered endpoints.
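
The filtering in steps 3 and 4 is the same first-match-wins walk over topology keys that kube-proxy performs. A minimal sketch in Python (the function name and data shapes are illustrative, not the real CoreDNS or kube-proxy types):

```python
def filter_by_topology(client_labels, endpoints, topology_keys):
    """Return the IPs of endpoints matching the first topology key on which
    the client and at least one endpoint share the same label value.
    `endpoints` is a list of (ip, labels) pairs; "*" matches everything."""
    for key in topology_keys:
        if key == "*":                # wildcard: fall back to all endpoints
            return [ip for ip, _ in endpoints]
        want = client_labels.get(key)
        matched = [ip for ip, labels in endpoints
                   if want is not None and labels.get(key) == want]
        if matched:                   # first key with any match wins
            return matched
    return []                         # no key matched and no wildcard

client = {"kubernetes.io/hostname": "node-a",
          "topology.kubernetes.io/zone": "us-east-1a"}
backends = [
    ("10.0.0.1", {"kubernetes.io/hostname": "node-b",
                  "topology.kubernetes.io/zone": "us-east-1a"}),
    ("10.0.0.2", {"kubernetes.io/hostname": "node-c",
                  "topology.kubernetes.io/zone": "us-east-1b"}),
]
keys = ["kubernetes.io/hostname", "topology.kubernetes.io/zone", "*"]
print(filter_by_topology(client, backends, keys))  # ['10.0.0.1'] (same zone; no same-host match)
```

The hard part, as noted below, is populating `client_labels` for the pod behind the source IP; the filtering itself is cheap.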

The difficulty now is the second step. If we choose to watch nodes (or node metadata) and pods, CoreDNS consumes a lot more resources as the cluster grows.

Or maybe we need to use PodLocator again? Or is headless service support even worthwhile for implementing service topology? @thockin @andrewsykim

IMHO, we may not need to support headless services: if people want service topology support, they can just create another non-headless Service. This also avoids conflicting with Istio's Locality Load Balancing, because if the cluster DNS returns the filtered endpoints' IPs, Pilot cannot get the complete list of endpoints. Pilot needs to generate EDS rules and send them to Envoy according to the DestinationRule, such as adjusting endpoint weights according to locality, rather than never forwarding to endpoints in a different locality.

Hey there @imroc @m1093782566 -- 1.18 Enhancements shadow here. I wanted to check in and see if you think this Enhancement will be graduating to alpha in 1.18 or having a major change in its current level?

The current release schedule is:

Tuesday, January 28th EOD PST - Enhancements Freeze
Thursday, March 5th, EOD PST - Code Freeze
Monday, March 16th - Docs must be completed and reviewed
Tuesday, March 24th - Kubernetes 1.18.0 Released

To be included in the release, this enhancement must have a merged KEP in the implementable status. The KEP must also have graduation criteria and a Test Plan defined.

I can see that the KEP link mentioned in the issue description is outdated. Please update the description with the current KEP link.

If you would like to include this enhancement, once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍

We'll be tracking enhancements here: http://bit.ly/k8s-1-18-enhancements

Thanks!

@helayoty It's already in alpha in v1.17, and some things are still unclear, so it may have no major changes in v1.18.

@imroc Thank you for the response. Please update the KEP when possible.

cc: @robscott

@imroc as discussed on the sig-network call today, I'm going to work on a PR that will ensure EndpointSlices set topology fields matching the topologyKeys defined on a Service. @helayoty I'll try to have a PR for the KEP tomorrow covering that change. I'm thinking this is a small enough update that it won't need tracking, but not sure. At the very least, I don't think 1.18 will have any API changes for Service Topology.

@robscott That's great, and yes, this small change does not affect the logic of Service Topology at all. Thanks for your information.

@robscott @imroc Perfect, we'll include the issue in 1.18.

Please share the KEP PR and any other k/k PRs.

/milestone v1.18

Hello, @robscott @imroc @m1093782566 -

Seth here, Docs shadow on the 1.18 release team.

Does this enhancement work planned for 1.18 require any new docs or modifications to existing docs?

If not, can you please update the 1.18 Enhancement Tracker Sheet (or let me know and I'll do so)

If doc updates are required, reminder that the placeholder PRs against k/website (branch dev-1.18) are due by Friday, Feb 28th.

Let me know if you have any questions!

Hey @sethmccombs, this will require some small additions to the Service Topology and/or EndpointSlice docs. I'll have placeholder docs PR in place by Feb 28.

Hey @sethmccombs, this change is not going to make it into 1.18, feel free to take it out of tracking for this release.

There was a good discussion on the sig-network call yesterday about other potential ways we might want to approach topology aware routing. I'm not sure exactly what, if anything, that will turn into, but I don't want to push too far in this direction until we've had more time to explore other potential approaches. This was already going to be a pretty small change, so I don't think leaving it til later will really be noticed.

With all that said, I'm assigning myself to this issue because I'd like to be part of pushing this forward in future releases.

/assign

/milestone clear

Hi @robscott -- 1.19 Enhancements Lead here. I wanted to check in to see if you think this enhancement will graduate in 1.19?


The current release schedule is:

  • Monday, April 13: Week 1 - Release cycle begins
  • Tuesday, May 19: Week 6 - Enhancements Freeze
  • Thursday, June 25: Week 11 - Code Freeze
  • Thursday, July 9: Week 14 - Docs must be completed and reviewed
  • Tuesday, August 4: Week 17 - Kubernetes v1.19.0 released

Hey @palnabarun, thanks for the reminder! There's been some discussion around a more automated approach to topology-aware routing as an alternative to this. Although closely related to this enhancement, it may end up with a separate KEP. I don't think there will be any direct progress on this KEP in this cycle as we explore this other potential approach.

Thank you @robscott for the updates. I will update the tracking sheet accordingly. :+1:

This is an excellent feature and exactly what I need.

In particular, I'm using the following topologyKeys:

  • ["kubernetes.io/hostname", "topology.kubernetes.io/region", "*"]
  • ["topology.kubernetes.io/region", "*"]

Hi @robscott @imroc @m1093782566

Enhancements Lead here - any plans for this in 1.20? As a reminder Enhancements Freeze is Oct 6th.

The link to the KEP in your description 404s, can you please update it? https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0033-service-topology.md should be https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/20181024-service-topology.md

Thanks!
Kirsten

Hey @kikisdeliveryservice, I don't _think_ there are any plans to move this forward in 1.20. I know we've been talking about ways we can simplify the API in sig-network, and we proposed one option in KEP 2004. That KEP has not been approved, but if it is, I think it may represent an evolution and a potential replacement of this approach. Unfortunately I don't have access to update the link to the KEP in this issue, but maybe @m1093782566 can?

Thanks for the update @robscott I was able to update the description myself. :+1: Will leave this enhancement as Untracked. If this changes let me know by October 6th :smile:
