
Proxmox VE Advanced: Clustering, High Availability & Disaster Recovery
Master advanced Proxmox VE clustering with High Availability, Ceph hyper-converged storage, SDN with VLAN/VXLAN, and enterprise disaster recovery strategies. Hands-on 4-day training for production-grade deployments.
Objectifs de la formation
Upon completion of this training, you will be able to:
- Design and deploy production-grade Proxmox VE clusters with 3+ nodes
- Implement High Availability (HA) for automatic failover and zero-downtime operations
- Configure Ceph storage for hyper-converged infrastructure with RBD and CephFS
- Master Software-Defined Networking with VLAN, VXLAN, and EVPN configurations
- Build disaster recovery strategies with Proxmox Backup Server and live restore
- Automate cluster operations using REST API and command-line tools
- Troubleshoot complex scenarios including split-brain, storage failures, and network issues
- Optimize performance for production workloads and resource allocation
- Implement security best practices for clustered environments
- Plan capacity and scaling for growing infrastructure needs
Public concerné
This training is designed for:
Senior System Administrators
Experienced professionals managing production virtualization environments who need to implement high availability and disaster recovery solutions
Infrastructure Architects
Responsible for designing resilient, scalable infrastructure solutions using open-source technologies as alternatives to VMware or Hyper-V
DevOps Engineers
Seeking to automate infrastructure deployment and management with API-driven approaches and Infrastructure as Code
Cloud Engineers
Building private or hybrid cloud solutions with enterprise-grade availability and performance requirements
IT Managers
Technical decision-makers evaluating Proxmox VE for mission-critical workloads and cost optimization strategies
MSP Professionals
Managed Service Providers implementing multi-tenant infrastructure with advanced networking and isolation requirements
This advanced training is particularly relevant for organizations in French-speaking Africa seeking sovereign, cost-effective alternatives to proprietary virtualization solutions.
Prérequis
Technical Prerequisites
Required
- Proxmox VE Experience: Minimum 6 months managing Proxmox VE in production or completion of "Proxmox VE Fundamentals" training
- Linux Administration: Advanced command-line skills, systemd, networking, and storage management
- Networking Expertise: Deep understanding of VLANs, routing, switching, and TCP/IP stack
- Virtualization Knowledge: Experience with KVM, storage concepts, and resource management
- Basic Scripting: Bash scripting abilities for automation tasks
Recommended
- Experience with distributed storage (Ceph, GlusterFS, or similar)
- Understanding of BGP and EVPN concepts
- Familiarity with REST APIs and JSON
- Basic Python knowledge for advanced automation
- Experience with backup and disaster recovery planning
Lab Environment Requirements
Each participant needs access to a lab environment with:
- Minimum 3 physical servers or nested virtualization capability
- 64GB RAM total across all nodes (minimum 16GB per node)
- 500GB storage space for Ceph OSDs and VM storage
- Dedicated network for cluster communication (10Gbps recommended)
- Internet access for package updates and documentation
Programme détaillé
Detailed Training Program
Day 1: Advanced Clustering Architecture
Module 1: Proxmox VE Cluster Deep Dive (4h)
- Cluster architecture review and components
- Corosync cluster engine and communication
- Proxmox Cluster File System (pmxcfs) internals
- Quorum concepts and split-brain prevention
- Advanced cluster networking
- Redundant cluster communication links
- Network latency requirements and optimization
- Multicast vs Unicast configuration
- Cluster scalability and limits
- Node count considerations (tested up to 50 nodes)
- Performance implications of cluster size
- Geographic clustering possibilities
- Advanced cluster management
- Node addition and removal procedures
- Cluster recovery from various failure scenarios
- Backup and restore of cluster configuration
Build a 3-node cluster with redundant corosync links, simulate network failures, and practice cluster recovery procedures
Module 2: High Availability (HA) Implementation (3h)
- HA architecture and components
- HA Manager (ha-manager) internals
- Local Resource Manager (LRM) and Cluster Resource Manager (CRM)
- Fencing mechanisms and watchdog timers
- HA configuration and policies
- Resource states and state machines
- HA groups and migration priorities
- Custom HA policies and constraints
- Failure detection and recovery
- Node failure scenarios and automatic recovery
- Network partition handling
- Storage failure impact on HA
- HA best practices
- Hardware requirements for reliable HA
- Testing HA failover scenarios
- Maintenance mode and planned migrations
Configure HA for critical VMs, test various failure scenarios, implement custom HA policies
Day 2: Ceph Hyper-Converged Storage
Module 3: Ceph Storage Architecture (4h)
- Ceph fundamentals for Proxmox VE
- RADOS architecture and object storage
- Ceph monitors, managers, and OSDs
- CRUSH map and data placement
- Deploying Ceph on Proxmox VE
- Hardware requirements and recommendations
- Network design for Ceph (public/cluster networks)
- OSD deployment strategies (BlueStore)
- Ceph pools and performance tuning
- Pool creation and replication factors
- Erasure coding for space efficiency
- Performance optimization techniques
- QoS and bandwidth limitations
- Ceph RBD for VM storage
- RBD image features and snapshots
- Live migration with Ceph storage
- Thin provisioning and space reclamation
Deploy a 3-node Ceph cluster, create pools with different replication strategies, benchmark performance
Module 4: Advanced Ceph Features (3h)
- CephFS for shared storage
- MDS deployment and high availability
- CephFS volumes and subvolumes
- Access control and quotas
- Ceph maintenance and operations
- Adding and removing OSDs safely
- Upgrading Ceph while maintaining service
- Handling degraded states and recovery
- Monitoring and troubleshooting
- Ceph health monitoring and alerts
- Performance metrics and bottleneck identification
- Common issues and resolution strategies
- Disaster recovery with Ceph
- RBD mirroring for site replication
- Snapshot management strategies
- Recovery from catastrophic failures
Configure CephFS, simulate OSD failures, practice recovery procedures, implement monitoring
Day 3: Software-Defined Networking & Advanced Features
Module 5: SDN Implementation (4h)
- SDN architecture in Proxmox VE
- SDN zones: Simple, VLAN, QinQ, VXLAN, EVPN
- Controllers and transport networks
- VNets and subnet management
- VLAN and QinQ implementation
- VLAN-aware bridges and tagging
- QinQ for service provider scenarios
- Inter-VLAN routing strategies
- VXLAN overlay networks
- VXLAN concepts and encapsulation
- Multicast vs Unicast VXLAN
- MTU considerations and optimization
- Performance impact and hardware offloading
- EVPN-BGP advanced networking
- BGP configuration for EVPN
- Multi-site connectivity
- Anycast gateways and distributed routing
- Exit nodes and SNAT configuration
Implement multi-tenant isolation with VXLAN, configure EVPN for distributed networking, test cross-site connectivity
Module 6: Storage Replication & Migration (3h)
- ZFS replication framework
- Scheduled replication jobs
- Bandwidth limitations and scheduling
- Failover and failback procedures
- Cross-cluster migration strategies
- Online migration techniques
- Storage migration between different backends
- Minimizing downtime during migrations
- Backup integration
- vzdump advanced options
- Backup performance optimization
- Snapshot coordination with applications
Configure ZFS replication, perform live migrations between storage types, optimize backup windows
Day 4: Disaster Recovery & Automation
Module 7: Disaster Recovery Implementation (4h)
- Proxmox Backup Server integration
- PBS architecture and deduplication
- Incremental backup strategies
- Encryption and security considerations
- Disaster recovery planning
- RTO and RPO definitions
- Multi-site backup strategies
- Automated failover procedures
- Live restore capabilities
- Instant VM recovery from backup
- File-level recovery options
- Testing DR procedures without impact
- Cluster disaster recovery
- Full cluster backup strategies
- Recovering from total cluster loss
- Configuration backup and restore
- Ceph disaster recovery procedures
Deploy PBS, implement automated DR workflows, simulate disaster scenarios and recovery
Module 8: Automation and Monitoring (3h)
- REST API automation
- API authentication and tokens
- Common automation scenarios
- Python and pvesh scripting
- Ansible integration
- Proxmox Ansible modules
- Automated deployment workflows
- Configuration management
- Monitoring and alerting
- Metrics collection with InfluxDB
- Grafana dashboards for Proxmox
- Alert configuration and escalation
- Troubleshooting methodology
- Log analysis and correlation
- Performance bottleneck identification
- Common issues and solutions
Build an automated deployment pipeline, implement comprehensive monitoring, create runbooks for common scenarios
Certification and Assessment
- Practical assessment: Deploy and troubleshoot a complex multi-tier application
- Written exam covering all advanced topics
- ECINTELLIGENCE Advanced Clustering certificate upon successful completion
- Complete lab guides and automation scripts to take away
- 90-day access to cloud lab environment for practice
- Preparation guidance for Proxmox VE certified professional paths
Certification
À l'issue de cette formation, vous recevrez une attestation de participation délivrée par ECINTELLIGENCE.
Autres formations qui pourraient vous intéresser
Prêt à développer vos compétences ?
Rejoignez des centaines de professionnels qui ont fait confiance à ECINTELLIGENCE pour leur montée en compétences.
Voir toutes nos formations