
Streaming Encoding for LIST Responses #5116

Open
12 tasks done
serathius opened this issue Jan 31, 2025 · 29 comments
Labels
  • lead-opted-in: Denotes that an issue has been opted in to a release
  • sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
  • stage/beta: Denotes an issue tracking an enhancement targeted for Beta status
  • tracked/yes: Denotes an enhancement issue is actively being tracked by the Release Team
Milestone

Comments

@serathius
Contributor

serathius commented Jan 31, 2025

Enhancement Description

@k8s-ci-robot added the needs-sig label (Indicates an issue or PR lacks a `sig/foo` label and requires one.) Jan 31, 2025
@serathius
Contributor Author

/sig api-machinery

@k8s-ci-robot added the sig/api-machinery label (Categorizes an issue or PR as relevant to SIG API Machinery.) and removed the needs-sig label Jan 31, 2025
@serathius changed the title from "Streaming Response Encoding" to "Streaming Encoding for LIST Responses" Jan 31, 2025
@chenk008
Contributor

chenk008 commented Feb 7, 2025

I'm glad to see this proposal. We have also implemented similar capabilities in our internal repo and are preparing to upstream this part. We have submitted a CFP for the upcoming KubeCon China conference.

In our implementation, we use sync.Pool to manage memory allocation efficiently and to cache the serialized result of each item. When the buffer reaches a certain size, we perform a flush, parallelizing the serialization and writing to HTTP/2.

Additionally, we have added support for gzip compression, which is enabled only when the first batch of cached data reaches 128 * 1024 bytes.

For JSON serialization, we have customized the StreamMarshal method for unstructuredList.

As for protobuf, we generate code through a generator to preserve compatibility with reverse protobuf marshalling.

type StreamMarshaller interface {
	// StreamSize returns the total encoded size of the object and the encoded size of each item.
	StreamSize() (uint64, []int)

	// StreamMarshal writes the encoded object to w, using the per-item sizes returned by StreamSize.
	StreamMarshal(w stream.Writer, itemSize []int) error
}
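
To make the buffering behaviour described above concrete, here is a minimal Go sketch of the flush-on-threshold and gzip-on-first-batch logic, assuming a plain io.Writer and encoding/json. All names here (writeListItems, bufPool, gzipThreshold) are hypothetical; the actual implementation uses its own stream.Writer type, serializes items in parallel, and writes a proper List envelope, none of which is shown here.

package streamencode

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"io"
	"sync"
)

// gzipThreshold mirrors the behaviour described above: compression is enabled
// only if the first batch of buffered data reaches 128 KiB.
const gzipThreshold = 128 * 1024

// bufPool reuses buffers across requests to reduce allocations.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// writeListItems encodes items one at a time, flushing the buffer to w whenever
// it grows past flushSize. For brevity it emits newline-delimited JSON and
// serializes sequentially.
func writeListItems(w io.Writer, items []any, flushSize int) error {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	var out io.Writer = w
	var gz *gzip.Writer
	first := true

	flush := func() error {
		if first {
			first = false
			// Decide on compression based only on the size of the first batch.
			if buf.Len() >= gzipThreshold {
				gz = gzip.NewWriter(w)
				out = gz
			}
		}
		if _, err := out.Write(buf.Bytes()); err != nil {
			return err
		}
		buf.Reset()
		return nil
	}

	enc := json.NewEncoder(buf)
	for _, item := range items {
		if err := enc.Encode(item); err != nil {
			return err
		}
		if buf.Len() >= flushSize {
			if err := flush(); err != nil {
				return err
			}
		}
	}
	if buf.Len() > 0 {
		if err := flush(); err != nil {
			return err
		}
	}
	if gz != nil {
		return gz.Close()
	}
	return nil
}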

We have conducted extensive testing with large datasets and obtained comparative results. @yulongfang, can you share some benchmark results?

@yulongfang

Thanks @chenk008 for the introduction. We have many large-scale clusters in Alibaba Cloud. When the controllers of these large-scale clusters restart, they initiate full LIST requests to the apiserver, which affects cluster stability. We have had to run the apiserver on larger machines, which wastes resources.

In this context, we applied this optimization approach and achieved the following results.

LIST JSON-format stress test scenario:

  • apiserver version: 1.30
  • apiserver specification: 32 cores, 128 GB
  • apiserver replicas: 1
  • existing resources: 10,000 custom resources of ~100 KB each
  • stress test scenario: increase load along a QPS gradient of 0.1 / 0.5

LIST JSON-format stress test results:

qps 0.05

  • before optimization: CPU 35.7 cores, memory 89 GB
  • after streaming JSON optimization: CPU 6.22 cores, memory 60 GB

qps 0.1

  • before optimization: CPU 11 cores, memory 146 GB
  • after streaming JSON optimization: CPU 7.45 cores, memory 97 GB

LIST protobuf-format stress test scenario:

  • apiserver version: 1.30
  • apiserver specification: 32 cores, 128 GB
  • apiserver replicas: 1
  • existing resources: 10,000 ConfigMaps of ~100 KB each
  • stress test scenario: increase load along a QPS gradient of 0.1 / 0.5

LIST ConfigMaps stress test results:

qps 0.05

  • before optimization: CPU 16.8 cores, memory 54.3 GB
  • after streaming optimization: CPU 16.8 cores, memory 16.1 GB

qps 0.1

  • before optimization: CPU 42 cores, memory 122 GB
  • after streaming optimization: CPU 42 cores, memory 18 GB

@BenTheElder
Member

BenTheElder commented Feb 12, 2025

FYI: Technical details are usually discussed in KEP PRs or elsewhere, with the KEP issue serving as a place to link back to the work.

@chenk008 @yulongfang you might consider reviewing #5119

@serathius
Contributor Author

Hey @chenk008 @yulongfang, please see the previous discussion in kubernetes/kubernetes#129304 and kubernetes/kubernetes#129334.
We have also already done a performance analysis of our changes in kubernetes/kubernetes#129304 (comment).

We have also added an automatic benchmark of LIST requests. You can see the results at https://perf-dash.k8s.io/#/?jobname=benchmark%20list&metriccategoryname=E2E&metricname=Resources&Resource=memory&PodName=kube-apiserver-benchmark-list-master%2Fkube-apiserver

We currently run it with a JSON + ConfigMap + RV="" configuration and hope to expand it to cover proto, Pods, CustomResources, and other types of LIST requests. It would be awesome if you could contribute.
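
For reference, a minimal client-go sketch of the kind of request that configuration corresponds to: a JSON-encoded ConfigMap LIST with ResourceVersion set to the empty string. The namespace, kubeconfig loading, and error handling below are illustrative only and not part of the benchmark.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load a kubeconfig from the default location (illustrative only).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	// Request JSON explicitly to match the benchmark's JSON configuration.
	config.ContentType = "application/json"

	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// ResourceVersion "" requests the most recent data, the LIST variant
	// exercised by the benchmark linked above.
	list, err := client.CoreV1().ConfigMaps("default").List(context.TODO(),
		metav1.ListOptions{ResourceVersion: ""})
	if err != nil {
		panic(err)
	}
	fmt.Println("configmaps:", len(list.Items))
}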

@serathius
Contributor Author

/milestone v1.33

@k8s-ci-robot added this to the v1.33 milestone Feb 13, 2025
@pacoxu
Member

pacoxu commented Feb 14, 2025

@jpbetz @dipesh-rawat this is targeting v1.33 and the KEP was merged.
Should the lead-opted-in and tracked labels be added so this is tracked by the release team?

@dipesh-rawat
Member

@serathius @pacoxu Unfortunately, the enhancement freeze deadline has passed, and this KEP issue was not lead-opted-in, so it wasn’t added to the tracking board for the v1.33 release. Post-freeze, we've disabled the automated sync job for KEP issues to the tracking board.

To move forward, we’ll need a short exception request filed so the team can add the lead-opted-in label and manually include this in the tracking board.

If you still wish to progress this enhancement in v1.33, please file an exception request as soon as possible, within three days. If you have any questions, you can reach out in the #release-enhancements channel on Slack and we'll be happy to help. Thanks!

(cc v1.33 Release Lead @npolshakova)

@serathius
Contributor Author

serathius commented Feb 14, 2025

Oops, @jpbetz is OOO. @deads2k can you take a look?

@serathius
Contributor Author

@dipesh-rawat
Member

@serathius Since the release team has APPROVED the exception request here, this will be added to the milestone for the v1.33 release.

@dipesh-rawat
Member

Hello @serathius 👋, v1.33 Enhancements team here.

This enhancement is targeting stage beta for v1.33 (correct me if otherwise)
/stage beta

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: v1.33. KEPs targeting stable will need to be marked as implemented after code PRs are merged and the feature gates are removed.
  • KEP readme has up-to-date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here). If your production readiness review is not completed yet, please make sure to fill the production readiness questionnaire in your KEP by the PRR Freeze deadline on Thursday 6th February 2025 so that the PRR team has enough time to review your KEP.

With all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀

Could we please link the KEP README in the issue description?

The status of this enhancement is marked as Tracked for enhancements freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

/label tracked/yes

@k8s-ci-robot added the stage/beta (Denotes an issue tracking an enhancement targeted for Beta status) and tracked/yes (Denotes an enhancement issue is actively being tracked by the Release Team) labels Feb 14, 2025
@dipesh-rawat moved this to Tracked for enhancements freeze in 1.33 Enhancements Tracking Feb 14, 2025
@dipesh-rawat
Member

I've manually added this KEP to the tracking board and marked it as tracked for enhancements freeze 🚀

Could one of the sig leads add the lead-opted-in label? @deads2k, would you be able to help with this or point me to someone who can? Thanks!

@dipesh-rawat
Member

Could one of the sig leads add the lead-opted-in label?

@serathius Would you be able to assist with the above request? It would be great to get the label added as work is being done in this v1.33 release.

@serathius
Contributor Author

I'm not a SIG api-machinery lead, so I don't think I should use it. I can ask nicely on Slack.

@deads2k
Contributor

deads2k commented Feb 17, 2025

/label lead-opted-in
/milestone v1.33

@fykaa
Member

fykaa commented Feb 28, 2025

Hey again @serathius 👋, v1.33 Enhancements team here,

Just checking in as we approach Code Freeze at 02:00 UTC Friday 21st March 2025 / 19:00 PDT Thursday 20th March 2025.

Here's where this enhancement currently stands:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are ready to be merged (they have approved and lgtm labels applied) by the code freeze deadline. This includes tests.

For this enhancement, it looks like the following PRs need to be merged before code freeze (and we need to update the Issue description to include all the related PRs of this KEP):

If you anticipate missing code freeze, you can file an exception request in advance.

Also, please let me know if there are other PRs in k/k we should be tracking for this KEP.

The status of this enhancement is marked as At risk for code freeze.

As always, we are here to help if any questions come up. Thanks!

@aakankshabhende
Member

Hi @serathius 👋 -- this is Aakanksha (@aakankshabhende) from the 1.33 Communications Team!

For the 1.33 release, we are currently in the process of collecting and curating a list of potential feature blogs, and we'd love for you to consider writing one for your enhancement!

As you may be aware, feature blogs are a great way to communicate to users about features which fall into (but are not limited to) the following categories:

  • This introduces some breaking change(s)
  • This has significant impacts and/or implications to users
  • ...Or this is a long-awaited feature, which would go a long way to cover the journey more in detail 🎉

To opt in to write a feature blog, could you please let us know and open a "Feature Blog placeholder PR" (which can be only a skeleton at first) against the website repository by Wednesday, 5th March, 2025? For more information about writing a blog, please find the blog contribution guidelines 📚

Tip

Some timeline to keep in mind:

  • 02:00 UTC Wednesday, 5th March, 2025: Feature blog PR freeze
  • Monday, 7th April, 2025: Feature blogs ready for review
  • You can find more in the release document

Note

In your placeholder PR, use XX characters for the blog date in the front matter and file name. We will work with you on updating the PR with the publication date once we have a final number of feature blogs for this release.

@serathius
Contributor Author

@aakankshabhende, done kubernetes/website#49985

@dipesh-rawat
Member

Hi @serathius 👋, v1.33 Enhancements team here,

Just a quick friendly reminder as we approach the code freeze later this week, at 02:00 UTC Friday 21st March 2025 / 19:00 PDT Thursday 20th March 2025.

The current status of this enhancement is marked as At risk for code freeze. There are a few requirements mentioned in the comment #5116 (comment) that still need to be completed.

If you anticipate missing code freeze, you can file an exception request in advance. Thank you!

@serathius
Contributor Author

We are just missing kubernetes/kubernetes#130216; however, I don't think I will have time to implement it due to KEP-4988.

I would propose moving it to a GA requirement. @liggitt, do you think this is acceptable?

@liggitt
Member

liggitt commented Mar 17, 2025

We are just missing kubernetes/kubernetes#130216; however, I don't think I will have time to implement it due to KEP-4988.

I would propose moving it to a GA requirement. @liggitt, do you think this is acceptable?

Without that check in place, every new API that gets added risks being unable to make use of streaming encoding... I think it's important to get it in place early.

@serathius
Contributor Author

I see that more as future-proofing against API changes. I don't expect we will add an API in this release that will not work with streaming encoding, but if that happens, we can still validate it manually. Even if we miss some resource, we should be fine, as the most important resources are covered.

@liggitt
Member

liggitt commented Mar 18, 2025

https://github.com/liggitt/kubernetes/commits/streaming-list-lint/ has the linting I'd expect... the kube-openapi commit goes to https://github.com/kubernetes/kube-openapi/, then bump that dependency and update the exceptions for the one missing item in k/k.

@serathius
Contributor Author

serathius commented Mar 18, 2025

It's hard to believe how awesome you are, @liggitt! Knowing how busy you are, you still prepared a draft.

@serathius
Contributor Author

Sent kubernetes/kube-openapi#531

@serathius
Contributor Author

serathius commented Mar 19, 2025

And we are done!

One follow-up is to write the blog post in kubernetes/website#49985, but that can be done post-freeze. I asked @fuweid to collaborate.

@dipesh-rawat
Member

@serathius Thanks for the update and for confirming that all required changes are merged (here). We can now mark this as tracked for code freeze. Also, please let us know if anything changes before the freeze or if there are any other PRs in k/k we should track for this KEP to keep the status accurate.

This enhancement is now marked as tracked for code freeze for the v1.33 Code Freeze!

@dipesh-rawat moved this from At risk for code freeze to Tracked for code freeze in 1.33 Enhancements Tracking Mar 19, 2025
@serathius
Contributor Author

There is an ongoing discussion about whether this KEP should supersede KEP-3157, as on the server side it achieves better results without the need for a separate API. There are still some other considerations; the discussion is at: https://kubernetes.slack.com/archives/C0EG7JC6T/p1743152351539269?thread_ts=1741283769.908819&cid=C0EG7JC6T

Performance comparison after increasing the number of informers from 6 to 16 in kubernetes/perf-tests#3242:

WatchList memory usage increased by 30%, from 2 GB to 2.6 GB:

https://perf-dash.k8s.io/#/?jobname=watch-list-on&metriccategoryname=E2E&metricname=L[…]PodName=kube-apiserver-bootstrap-e2e-master%2Fkube-apiserver

On the other hand, streaming encoding memory usage increased by 6%, from 1.74 GB to 1.85 GB:

https://perf-dash.k8s.io/#/?jobname=watch-list-off&metriccategoryname=E2E&metricname=[…]PodName=kube-apiserver-bootstrap-e2e-master%2Fkube-apiserver

Of course, this is not an apples-to-apples comparison. Looking at request counts, WatchList makes 3 times more requests while using 50% less CPU. I expect there may be differences in informer break/restart logic and gzip enablement.

Projects
Status: Tracked for code freeze