Site Reliability Engineer
TCS
- Maintained 99.95% uptime for critical healthcare applications serving 100K+ users
- Implemented comprehensive monitoring with Prometheus, Grafana, and custom alerting, reducing MTTR by 60%
- Migrated monolithic applications to microservices architecture on Kubernetes, improving scalability by 300%
- Built CI/CD pipelines with automated testing, security scanning, and blue-green deployments
- Developed infrastructure-as-code templates, reducing provisioning time from days to hours
- Led incident response and post-mortem reviews, establishing a blameless culture
- Automated backup, disaster recovery, and compliance reporting processes
