Ray Serve Autoscaling

巨大八爪鱼

Sect Leader, Level 20

Floor 1, posted: 2026-5-22 18:14

Each Ray Serve deployment has one replica by default. This means there is one worker process running the model and serving requests. When traffic to your deployment increases, the single replica can become overloaded. To maintain high performance of your service, you need to scale out your deployment.

Manual Scaling

Before turning to autoscaling, which is more complex, consider manual scaling. You can increase the number of replicas by setting a higher value for num_replicas in the deployment options and applying it as an in-place update. By default, num_replicas is 1. Increasing the number of replicas scales your deployment out horizontally, which improves latency and throughput under higher traffic.
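
To pick a value for num_replicas by hand, a rough capacity estimate can help. The sketch below is only an illustration and is not part of Ray Serve: the helper estimate_num_replicas and its parameters are hypothetical, and it assumes steady traffic where the number of in-flight requests is roughly the arrival rate times the average latency.

```python
import math


def estimate_num_replicas(requests_per_second: float,
                          avg_latency_seconds: float,
                          concurrent_requests_per_replica: int) -> int:
    """Hypothetical sizing helper (not a Ray Serve API).

    Assumes steady traffic, so the average number of in-flight requests
    is approximately arrival rate * average latency.
    """
    in_flight = requests_per_second * avg_latency_seconds
    # Always keep at least one replica, matching the default of 1.
    return max(1, math.ceil(in_flight / concurrent_requests_per_replica))


# Example numbers are made up: 40 req/s, 250 ms latency,
# and each replica handling 2 requests at a time.
num_replicas = estimate_num_replicas(40, 0.25, 2)

# With Ray Serve installed, this value would then go into the deployment
# options, e.g. @serve.deployment(num_replicas=num_replicas).
```

The result is only a starting point. Observed latency under load is a better guide than the estimate, and the value still has to be updated manually whenever traffic changes.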

Autoscaling Basic Configuration

Instead of setting a fixed number of replicas for a deployment and manually updating it, you can configure a deployment to autoscale based on incoming traffic. The Serve autoscaler reacts to traffic spikes by monitoring queue sizes and making scaling decisions to add or remove replicas. Turn on autoscaling for a deployment by setting num_replicas="auto". You can further configure it by tuning the autoscaling_config in deployment options.
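
As a rough illustration of what tuning autoscaling_config means, the sketch below pairs an example config with a simplified scaling decision. The keys min_replicas, max_replicas and target_ongoing_requests are assumed to match the names used by recent Ray Serve releases (older releases may use a different name for the target). The function desired_replicas is hypothetical and much simpler than the real autoscaler, which also applies delays and smoothing before acting.

```python
import math

# Example values only; tune them for your own workload.
autoscaling_config = {
    "min_replicas": 1,
    "max_replicas": 10,
    "target_ongoing_requests": 2,
}


def desired_replicas(total_ongoing_requests: int, config: dict) -> int:
    """Simplified, hypothetical scaling decision (not Ray Serve's code).

    Aims for about `target_ongoing_requests` per replica, then clamps
    the result to the [min_replicas, max_replicas] range.
    """
    raw = math.ceil(total_ongoing_requests / config["target_ongoing_requests"])
    return max(config["min_replicas"], min(config["max_replicas"], raw))


# With Ray Serve installed, the same dict would be passed as
# @serve.deployment(autoscaling_config=autoscaling_config).
```

Under these assumptions, 7 ongoing requests map to 4 replicas, an idle deployment stays at min_replicas, and a large spike is capped at max_replicas.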

https://docs.ray.io/en/latest/serve/autoscaling-guide.html
