Senior DevOps Engineer (AWS)

Remote
Full Time
Experienced
                                                                                              
Overview
As a Senior DevOps Engineer, you will own the AWS infrastructure and DevOps toolchain for a high-scale ad-serving system built from asynchronous Java microservices (Akka framework). Targets include <50 ms response times, up to 5M concurrent users, and 99.99% uptime.
Responsibilities
  • Design & stand up AWS environments end-to-end (landing zone, VPCs, networking, security, automation).
  • Build immutable infrastructure and CI/CD for Java microservices (Maven/Gradle) including blue/green & canary releases and automated rollbacks.
  • Implement observability: metrics, logs, traces, SLOs/SLIs, alerting, on-call runbooks.
  • Engineer reliability & performance: autoscaling, caching layers, multi-AZ/region DR, capacity planning to support 5M+ concurrent users and p95/p99 latency goals.
  • Establish security-by-design: IAM least privilege, KMS/Secrets Manager, WAF/Shield, image/signing policies, CIS benchmarks.
  • Partner with EY developers & the Performance Test Engineer to tune JVM/Akka settings, thread pools, GC, and infrastructure limits based on load-testing feedback.
  • Champion cost governance and tagging; produce dashboards and weekly reports.
Tech you’ll use (you don’t need every single one, but you know most)
  • AWS: EKS/ECS, EC2, ALB/NLB, API Gateway/Lambda, S3/CloudFront, DynamoDB/ElastiCache (Redis), Aurora/RDS, MSK/Kinesis, OpenSearch, Route 53, VPC, NAT/GW, WAF/Shield, CloudWatch/X-Ray, IAM, KMS, Secrets Manager.
  • IaC & CI/CD: Terraform/CloudFormation, Helm, Argo CD or Flux, GitHub Actions/Jenkins/GitLab CI, Docker.
  • Observability: CloudWatch, OpenTelemetry, Prometheus/Grafana, log pipelines.
  • Languages/Build: Bash/Python for automation; familiarity with Java build/release workflows.
What makes you a great fit
  • 3–5+ years of total experience, with Senior/Manager-level depth in AWS platform engineering for high-throughput, low-latency services.
  • Proven ownership of production systems at 10k–1M+ concurrent users (or comparable high RPS) with 99.9x SLOs.
  • Hands-on with Akka/Java microservice delivery pipelines (nice if you’ve tuned JVM, GC, Akka dispatchers).
  • Strong grounding in scaling patterns (event-driven, async IO, caching, backpressure, rate limiting) and resilience (circuit breakers, retries, chaos).
  • Excellent collaboration, documentation, and stakeholder communication.
Logistics
  • Location: Remote (India-based candidates preferred)
  • Schedule: Must join US morning calls (Eastern Time) as needed.
  • Start: 1–3 weeks from offer.
  • Term: Through the end of January (extension likely).

 