Responsibilities
1. Infrastructure Architecture & Planning
- Design scalable cloud infrastructure for real-time data processing systems
- Define production architecture for Kubernetes environments, streaming platforms (Redpanda/Kafka), ClickHouse clusters, Redis deployments, and Solana validator nodes
- Recommend best practices for reliability, scalability, and cost optimization
- Define deployment topology and environment separation (dev/staging/prod)
2. DevOps Strategy & Implementation
- Establish Infrastructure-as-Code practices (Terraform / Pulumi)
- Design CI/CD pipelines and deployment workflows
- Implement containerization and orchestration strategies
- Guide secure networking, IAM, and secrets management
- Define backup, failover, and disaster recovery strategies
3. Reliability & Observability
- Design monitoring and alerting systems
- Implement an observability stack: Prometheus, Grafana, and logging and tracing systems
- Define operational metrics, SLIs, and SLOs
- Review system performance and identify bottlenecks
Qualifications
- 4+ years of experience in DevOps, Platform Engineering, or SRE
- Proven experience designing production cloud architectures
- Strong Kubernetes production experience
- Experience operating distributed systems at scale
- Deep knowledge of Linux, networking, and performance tuning
- Experience with Infrastructure-as-Code tools
- Strong experience with CI/CD automation