Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
This feature request addresses a problem we hit when doing maintenance on our Kubernetes nodes: Cluster Autoscaler (CA) shuts down new nodes that have only recently been created.
We regularly replace our Kubernetes nodes to apply security patches and other updates. After a new node joins the cluster, the pods on the old nodes are drained and moved to the new node. This can take considerable time depending on what is running in the cluster, the number of concurrent drains, and the defined PodDisruptionBudgets. While it is in progress, we find that CA occasionally marks the new nodes as unneeded and shuts them down, which interferes with our replacement process.
Describe the solution you'd like.:
I propose adding a new configuration option that tells CA not to scale down nodes younger than a certain age, something like `scale-down-min-node-age`. Nodes would not be considered for scale-down until they reach this age.
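As a sketch of how this might look, the proposed flag could sit alongside CA's existing scale-down flags in the Deployment args. Note that `--scale-down-min-node-age` is the hypothetical option from this request and does not exist in cluster-autoscaler today; the image tag is illustrative:

```yaml
# Excerpt of a cluster-autoscaler Deployment spec (sketch only).
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0  # illustrative tag
    command:
      - ./cluster-autoscaler
      - --scale-down-unneeded-time=2m    # existing flag, kept short for cost
      - --scale-down-min-node-age=30m    # proposed flag; not implemented today
```

With this configuration, a node created less than 30 minutes ago would never be a scale-down candidate, regardless of how long it has been marked unneeded.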
Describe any alternative solutions you've considered.:
We are using a fairly short scale-down-unneeded-time setting to prioritize cost. Increasing it would fix this issue, but since cost is a high priority, we'd prefer to keep this value low during normal operations.
We've considered using tags, labels, or taints to mark these new nodes as non-candidates for scale-down during maintenance. Once complete, we could set these back to the desired operational settings. We've considered the same approach for the scale-down-unneeded-time setting, or disabling CA beforehand and re-enabling it afterward. These approaches are doable, but they add extra complexity. Our automation triggers the cloud provider's built-in node-replacement processes, which often take a while to complete, so we'd have to run something that monitors the replacement process and takes action when needed. We feel that the requested feature would be a simpler solution and potentially useful to others using CA.
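For reference, the annotation-based workaround described above can be done with CA's existing `cluster-autoscaler.kubernetes.io/scale-down-disabled` node annotation, applied and removed around the maintenance window (`NODE_NAME` is a placeholder for the new node's name):

```shell
# Before/during maintenance: exclude the freshly created node from scale-down.
kubectl annotate node NODE_NAME \
  cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# After maintenance: remove the annotation so CA can manage the node normally.
kubectl annotate node NODE_NAME \
  cluster-autoscaler.kubernetes.io/scale-down-disabled-
```

This works, but it is exactly the kind of per-node orchestration step that the proposed age-based flag would make unnecessary.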
Which component are you using?:
cluster-autoscaler
/area cluster-autoscaler