Ray Serve Autoscaling
巨大八爪鱼
Floor 1, posted: 2026-5-22 18:14
Ray Serve Autoscaling

Each Ray Serve deployment has one replica by default. This means there is one worker process running the model and serving requests. When traffic to your deployment increases, the single replica can become overloaded. To maintain high performance of your service, you need to scale out your deployment.


Manual Scaling

Autoscaling is the more complex option, so consider manual scaling first. You can increase the number of replicas by setting a higher value for num_replicas in the deployment options and applying it as an in-place update. By default, num_replicas is 1. Increasing the number of replicas scales your deployment out horizontally, which improves latency and throughput under higher traffic.
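A minimal sketch of manual scaling, assuming Ray Serve is installed (pip install "ray[serve]"). The names Echo, MANUAL_SCALING_OPTIONS, build_app and main are hypothetical and only illustrate where num_replicas goes; the replica count of 3 is an arbitrary example value.

```python
# Hypothetical example names: Echo, MANUAL_SCALING_OPTIONS, build_app, main.

# Fixed replica count chosen by hand (the default would be 1).
MANUAL_SCALING_OPTIONS = {"num_replicas": 3}


def build_app(options: dict):
    """Return a Serve application whose deployment uses the given options."""
    # Imported lazily so this module can be loaded without Ray installed.
    from ray import serve

    @serve.deployment(**options)
    class Echo:
        async def __call__(self, request) -> str:
            # Each replica runs its own copy of this handler.
            return "ok"

    return Echo.bind()


def main() -> None:
    from ray import serve

    # Deploys the application with the replica count given above.
    serve.run(build_app(MANUAL_SCALING_OPTIONS))
```

Calling main() on a machine with Ray available should start the deployment with three replicas. To scale further, you would change the num_replicas value and deploy again.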


Autoscaling Basic Configuration

Instead of setting a fixed number of replicas for a deployment and manually updating it, you can configure a deployment to autoscale based on incoming traffic. The Serve autoscaler reacts to traffic spikes by monitoring queue sizes and making scaling decisions to add or remove replicas. Turn on autoscaling for a deployment by setting num_replicas="auto". You can further configure it by tuning the autoscaling_config in deployment options.
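The sketch below shows one way the autoscaling options might look, plus a deliberately simplified model of the scaling decision. The field names min_replicas, max_replicas and target_ongoing_requests follow the linked autoscaling guide, but the values are illustrative. The function desired_replicas is a hypothetical helper, not part of Ray: the real autoscaler also applies delays and smoothing that are ignored here.

```python
import math

# Illustrative values only; tune them for your own workload.
AUTOSCALING_OPTIONS = {
    "num_replicas": "auto",
    "autoscaling_config": {
        "min_replicas": 1,
        "max_replicas": 10,
        "target_ongoing_requests": 2,
    },
}


def desired_replicas(
    total_ongoing_requests: int,
    target_ongoing_requests: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Simplified model (an assumption, not Ray's implementation):
    pick enough replicas that each handles about the target number of
    ongoing requests, then clamp to the configured bounds."""
    if target_ongoing_requests <= 0:
        raise ValueError("target_ongoing_requests must be positive")
    raw = math.ceil(total_ongoing_requests / target_ongoing_requests)
    return max(min_replicas, min(max_replicas, raw))


def build_autoscaling_app(options: dict):
    """Return a Serve application with autoscaling enabled."""
    # Imported lazily so this module can be loaded without Ray installed.
    from ray import serve

    @serve.deployment(**options)
    class Model:
        async def __call__(self, request) -> str:
            return "ok"

    return Model.bind()
```

Under this simplified model, 10 ongoing requests with a target of 2 per replica would call for 5 replicas, and the result never leaves the range set by min_replicas and max_replicas.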


https://docs.ray.io/en/latest/serve/autoscaling-guide.html
