
Add a scale-down-min-node-age config #8005

Open
neumaa opened this issue Apr 2, 2025 · 0 comments
Labels
area/cluster-autoscaler kind/feature Categorizes issue or PR as related to a new feature.

neumaa commented Apr 2, 2025

Which component are you using?:

cluster-autoscaler

/area cluster-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

This feature request is designed to solve an issue we encounter when doing maintenance on our Kubernetes nodes: Cluster Autoscaler (CA) shuts down nodes that have only recently been created.

We regularly replace our Kubernetes nodes to apply security patches and other updates. After a new node joins the cluster, the pods on the old nodes are drained and rescheduled onto the new node. Depending on what is running in the cluster, the number of concurrent drains, and the defined PodDisruptionBudgets, this can take considerable time. While it is in progress, we find that CA occasionally marks the new nodes as unneeded and shuts them down, which interferes with the replacement process.

Describe the solution you'd like.:

I propose adding a new configuration option that tells CA not to scale down nodes younger than a certain age, something like "scale-down-min-node-age". Nodes would not be considered for scale-down until they have reached this age.
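A sketch of what this could look like alongside CA's existing flags. Note that `--scale-down-min-node-age` is the proposed flag and does not exist today; `--scale-down-unneeded-time` and `--scale-down-delay-after-add` are existing CA options, shown here with example values:

```shell
# Hypothetical invocation: --scale-down-min-node-age is the proposed flag.
cluster-autoscaler \
  --scale-down-unneeded-time=2m \
  --scale-down-delay-after-add=10m \
  --scale-down-min-node-age=45m   # proposed: never scale down nodes younger than 45m
```

Unlike `--scale-down-delay-after-add`, which pauses scale-down cluster-wide after a scale-up event, the proposed flag would apply per node based on its creation timestamp, so it would also protect nodes created outside CA (e.g. by our replacement automation).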

Describe any alternative solutions you've considered.:

We are using a fairly short scale-down-unneeded-time setting to prioritize cost. Increasing it would fix this issue, but since cost is a big priority, we'd prefer to keep this value low during normal operations.

We've considered using tags, labels, or taints to mark these new nodes as non-candidates for scale-down during maintenance, then reverting to the desired operational settings once it completes. We've considered the same approach for the scale-down-unneeded-time setting, as well as disabling CA before the maintenance and re-enabling it afterwards. These approaches are doable, but they add extra complexity: our automation triggers the cloud provider's built-in node-replacement processes, which often take a while to complete, so we would have to run something that monitors the replacement and takes action when needed. We feel the requested feature would be a simpler solution and potentially useful to other CA users.
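For reference, the annotation-based workaround described above could use CA's existing per-node opt-out annotation, toggled by our automation around the maintenance window (`<node-name>` is a placeholder):

```shell
# Before maintenance: exclude the new node from scale-down consideration
# using CA's documented node annotation.
kubectl annotate node <node-name> \
  cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# After maintenance completes: remove the annotation (trailing "-" deletes it)
# so the node becomes a normal scale-down candidate again.
kubectl annotate node <node-name> \
  cluster-autoscaler.kubernetes.io/scale-down-disabled-
```

This works, but as noted it requires something watching the replacement process to apply and remove the annotation at the right times, which is exactly the extra moving part we'd like to avoid.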

@neumaa neumaa added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 2, 2025