Together AI Upgrades GPU Clusters With Autoscaling and Self-Healing Features

2026/03/11 01:34

Lawrence Jengar Mar 10, 2026 17:34

Together AI adds enterprise-grade autoscaling, RBAC, observability dashboards, and self-healing node repair to GPU Clusters as company pursues $1B funding round.

Together AI has rolled out a significant infrastructure upgrade to its GPU Clusters platform, adding autoscaling, role-based access control (RBAC), full-stack observability, and self-healing node repair. The upgrade arrives as the AI cloud company seeks $1 billion in fresh funding, according to reports from earlier this month.

The timing isn't coincidental. Enterprise customers running distributed training workloads across hundreds of GPUs need more than raw compute—they need infrastructure that doesn't require babysitting.

Autoscaling Targets GPU Waste

The new autoscaling feature, powered by the Kubernetes Cluster Autoscaler, monitors for GPU-constrained workloads and automatically provisions or decommissions nodes based on real-time demand. For teams running variable inference workloads or bursty training jobs, this means no more paying for idle hardware during quiet periods.

Static GPU provisioning has been a persistent pain point. Organizations either overprovision (expensive) or underprovision (performance bottlenecks during demand spikes). Together's approach lets clusters expand during peak load and contract when demand subsides.
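
The expand-and-contract behavior can be pictured as a small sizing calculation. The Python sketch below is illustrative only: it is not the Kubernetes Cluster Autoscaler's algorithm or Together's implementation, and the function name, the 8-GPU node size, and the node limits are assumptions made for the example.

```python
import math


def desired_node_count(gpus_requested: int, gpus_per_node: int,
                       min_nodes: int, max_nodes: int) -> int:
    """Return how many nodes are needed to fit the requested GPUs.

    Hypothetical helper: rounds demand up to whole nodes, then clamps
    the result to the configured [min_nodes, max_nodes] range.
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    needed = math.ceil(gpus_requested / gpus_per_node)
    return max(min_nodes, min(max_nodes, needed))


# Assumed example values: 8 GPUs per node, between 1 and 16 nodes.
print(desired_node_count(20, 8, 1, 16))    # bursty demand: scale up
print(desired_node_count(0, 8, 1, 16))     # quiet period: shrink to the floor
print(desired_node_count(1000, 8, 1, 16))  # demand beyond the cap: stay at max
```

With 20 GPUs requested on 8-GPU nodes, the helper asks for 3 nodes. With no demand it falls back to the configured minimum, which is where the savings on idle hardware would come from.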

Self-Healing Addresses Hardware Reality

GPU hardware fails. In large fleets, it's not a question of if but when. For distributed training, a single unstable node can invalidate hours of compute time.

Together's solution: self-serve health checks that users can trigger before launching major training jobs. Tests range from basic DCGM diagnostics to multi-node NCCL and InfiniBand bandwidth tests. When a node does fail, a three-click self-repair process automatically cordons, drains, and recreates the node—bringing clusters back to healthy status within minutes rather than hours.

Acceptance tests now run automatically during provisioning. Clusters won't be marked ready until they pass.
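
The cordon, drain, and recreate sequence described above can be modeled in a few lines. This is a toy in-memory sketch, not Together's repair workflow or a Kubernetes client: the `Node` class, the `repair_node` function, the round-robin placement, and the `acceptance_check` callback are all hypothetical names and choices introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    schedulable: bool = True
    ready: bool = True
    pods: List[str] = field(default_factory=list)


def repair_node(failed: Node, peers: List[Node],
                acceptance_check: Callable[[Node], bool]) -> Node:
    """Cordon, drain, and recreate a failed node (illustrative model only)."""
    if failed.pods and not peers:
        raise RuntimeError("no healthy peers available to receive workloads")

    # 1. Cordon: stop new work from being scheduled onto the failed node.
    failed.schedulable = False

    # 2. Drain: move existing pods to healthy peers, round-robin.
    for i, pod in enumerate(failed.pods):
        peers[i % len(peers)].pods.append(pod)
    failed.pods.clear()

    # 3. Recreate: the replacement is not marked ready (or schedulable)
    #    until it passes the acceptance check.
    replacement = Node(name=failed.name, schedulable=False, ready=False)
    passed = acceptance_check(replacement)
    replacement.ready = passed
    replacement.schedulable = passed
    return replacement


bad = Node("gpu-node-3", pods=["train-0", "train-1", "train-2"])
peers = [Node("gpu-node-1"), Node("gpu-node-2")]
fresh = repair_node(bad, peers, acceptance_check=lambda node: True)
print(fresh.ready, [p.pods for p in peers])
```

Gating readiness on the check mirrors the article's point that clusters are not marked ready until acceptance tests pass. A real check would run diagnostics rather than return a constant.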

Enterprise Access Controls

The RBAC implementation introduces "Projects" as isolation boundaries for teams. Two default roles split responsibilities cleanly: Admins get full control plane access for cluster creation and deletion, while Members can access GPU worker nodes and run workloads without touching infrastructure provisioning.

This matters for organizations where platform engineers need to lock down infrastructure while giving ML researchers freedom to experiment.
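
As a rough illustration of that split, the sketch below models Projects as isolation boundaries with two roles. It is not Together's API: the action names and the `is_allowed` helper are hypothetical, and letting Admins also run workloads is an assumption the article does not confirm.

```python
from enum import Enum
from typing import Dict, Tuple


class Role(Enum):
    ADMIN = "admin"
    MEMBER = "member"


# Hypothetical action names, chosen to match the article's description.
WORKLOAD_ACTIONS = frozenset({"access_worker_nodes", "run_workload"})
CONTROL_PLANE_ACTIONS = frozenset({"create_cluster", "delete_cluster"})

PERMISSIONS = {
    # Assumption: Admins can do everything Members can, plus provisioning.
    Role.ADMIN: WORKLOAD_ACTIONS | CONTROL_PLANE_ACTIONS,
    Role.MEMBER: WORKLOAD_ACTIONS,
}


def is_allowed(memberships: Dict[Tuple[str, str], Role],
               user: str, project: str, action: str) -> bool:
    """Check an action against the user's role in one specific project."""
    role = memberships.get((user, project))
    # No membership in the project means no access at all (isolation).
    return role is not None and action in PERMISSIONS[role]


memberships = {
    ("ana", "research"): Role.ADMIN,
    ("bo", "research"): Role.MEMBER,
}
print(is_allowed(memberships, "bo", "research", "run_workload"))    # True
print(is_allowed(memberships, "bo", "research", "delete_cluster"))  # False
print(is_allowed(memberships, "bo", "platform", "run_workload"))    # False
```

Keying memberships on the (user, project) pair is what makes the project the isolation boundary: a role in one project grants nothing in another.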

Observability Gets Native

Every GPU Cluster project now includes a dedicated Grafana instance with pre-built dashboards. Telemetry covers GPU utilization via DCGM metrics, InfiniBand and NIC-level networking data, storage I/O performance, and Kubernetes orchestration health. The feature is currently in private preview.
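
To give a sense of what the GPU utilization telemetry looks like underneath a dashboard, the sketch below averages per-GPU readings by host from Prometheus-style text. The sample data is made up, and the choice of the `DCGM_FI_DEV_GPU_UTIL` metric and the `Hostname` label is an assumption about a typical DCGM exporter setup, not something the article specifies.

```python
import re
from collections import defaultdict
from typing import Dict

# Made-up sample in Prometheus text exposition format.
SAMPLE = """\
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 97
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 91
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-b"} 4
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-b"} 0
"""

LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def mean_gpu_util_by_host(text: str,
                          metric: str = "DCGM_FI_DEV_GPU_UTIL") -> Dict[str, float]:
    """Average the utilization readings (percent) of each host's GPUs."""
    readings = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match or match.group(1) != metric:
            continue  # skips comments, blank lines, and other metrics
        labels = dict(LABEL.findall(match.group(2)))
        readings[labels.get("Hostname", "unknown")].append(float(match.group(3)))
    return {host: sum(vals) / len(vals) for host, vals in readings.items()}


print(mean_gpu_util_by_host(SAMPLE))
```

In this sample, node-a averages 94% utilization while node-b sits near idle. A per-node view like this would surface the kind of imbalance that autoscaling is meant to reduce.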

Market Context

Together AI has been building momentum in the GPU-as-a-service space. The company introduced Instant GPU Clusters at NVIDIA GTC in March 2025 and launched self-service GPU infrastructure that September. The platform supports NVIDIA Hopper (H100) and Blackwell (B200) GPUs, with Instant Clusters scaling up to 64 GPUs and Dedicated Clusters reaching 1,000 GPUs.

With a reported $7.5 billion valuation and a potential billion-dollar funding round in progress, Together is positioning itself as a serious alternative to hyperscaler GPU offerings, targeting teams that want bare-metal performance without the operational overhead of managing their own hardware.

Apart from the observability dashboards, which remain in private preview, the new features are available immediately to existing Together GPU Clusters customers.
