25Q1
No due date
15% complete
- Cloud auto-scaling using CSP APIs
- Compose stack (single-session / multi-session) to bundle multiple models and services
- More RBAC support (introducing new features based on it, such as project administrators)
- Account manager for SSO
- More Relay-compliant GraphQL schema (maybe full migration at some point)
- Raftify-based HA setup
- Extensive Prometheus/OpenTelemetry integration across the entire project
- Enhanced container registry integration (per-project registry, per-project quota, etc.)
- Unified storage resource group (storage proxy + storage agent with "direct access" SFTP/filebrowser containers)
- VFolder abstractions for object storage buckets (Minio / S3)
- Rolling update of Backend.AI cluster (or at least agents)
- Multi-license support in a single license server
- Retirement of keypair resource policies and migration to user resource policies
  - The owner-access-key option also needs to be updated to use user identities
- Project-first architecture – per-user "workspace"
- Project-level sharing of sessions (#2346)
- Project-level container image visibility (including user-committed images)
- Audit logs for user and vfolder operations
- Migration of resource allocation maps from agent to manager for more holistic scheduling optimization (e.g., guaranteeing no fragmentation of GPUs)
- Hierarchical managers to parallelize per-resource-group scheduling and idle checks
- Session template revamps
- Live propagation of configurations (e.g., fGPU options) via etcd watch
- Easier (multi-node) installation of the open-source edition
- Logging contexts and request IDs
- Make idle checkers scoped within resource groups
- Optimized App Proxy traffic routing (probably via native modules and/or with Cilium)
- User-defined network partitions via flexible SDN control plane integration
- Snapshot and lineage tracking of vfolders (when the underlying storage backend supports it)
- Virtual agents to proxy external container orchestrators and node pools
- Experimental agent backends like Singularity and native processes
- Improved documentation for the various plugins and the SDK