Senior DevOps Engineer (AWS)
Remote
Full Time
Experienced

Overview
As a Senior DevOps Engineer, you will own the AWS infrastructure and DevOps toolchain for a high-scale ad-serving system composed of asynchronous Java microservices built on the Akka framework. Targets include response times under 50 ms, up to 5M concurrent users, and 99.99% uptime.
Responsibilities
- Design & stand up AWS environments end-to-end (landing zone, VPCs, networking, security, automation).
- Build immutable infrastructure and CI/CD for Java microservices (Maven/Gradle) including blue/green & canary releases and automated rollbacks.
- Implement observability: metrics, logs, traces, SLOs/SLIs, alerting, on-call runbooks.
- Engineer reliability & performance: autoscaling, caching layers, multi-AZ/region DR, capacity planning to support 5M+ concurrent users and p95/p99 latency goals.
- Establish security-by-design: IAM least privilege, KMS/Secrets Manager, WAF/Shield, image/signing policies, CIS benchmarks.
- Partner with EY developers and the Performance Test Engineer to tune JVM/Akka settings, thread pools, GC, and infrastructure limits based on load-testing feedback.
- Champion cost governance and tagging; produce dashboards and weekly reports.
Technical Skills
- AWS: EKS/ECS, EC2, ALB/NLB, API Gateway/Lambda, S3/CloudFront, DynamoDB/ElastiCache (Redis), Aurora/RDS, MSK/Kinesis, OpenSearch, Route 53, VPC, NAT/GW, WAF/Shield, CloudWatch/X-Ray, IAM, KMS, Secrets Manager.
- IaC & CI/CD: Terraform/CloudFormation, Helm, Argo CD or Flux, GitHub Actions/Jenkins/GitLab CI, Docker.
- Observability: CloudWatch, OpenTelemetry, Prometheus/Grafana, log pipelines.
- Languages/Build: Bash/Python for automation; familiarity with Java build/release workflows.
Qualifications
- 3–5+ years total experience; Senior/Manager-level depth in AWS platform engineering for high-throughput, low-latency services.
- Proven ownership of production systems at 10k–1M+ concurrent users (or comparable high RPS) with 99.9x SLOs.
- Hands-on experience with Akka/Java microservice delivery pipelines (a plus if you have tuned the JVM, GC, or Akka dispatchers).
- Strong grounding in scaling patterns (event-driven, async IO, caching, backpressure, rate limiting) and resilience (circuit breakers, retries, chaos engineering).
- Excellent collaboration, documentation, and stakeholder communication.
Logistics
- Location: Remote (candidates based in India preferred).
- Schedule: Must join US morning calls (Eastern Time) as needed.
- Start: 1–3 weeks from offer.
- Term: Through end of January (likely extension).