Senior Operations Engineer (Acting Lead) | Production Support | SRE | DevOps Operations
Naga Durga Prasad Talla
“Ensuring 24×7 production stability, observability, and operational excellence.”
About Me
Senior Operations and Production Support professional with 8+ years of hands-on experience ensuring high availability, incident resilience, and service continuity across telecom and gaming environments. Proven track record in managing critical production incidents, driving observability maturity, and enabling faster root cause resolution through proactive monitoring frameworks. Currently leading a 12-member operations team, collaborating with cross-functional engineering units to stabilize complex production ecosystems, strengthen CI/CD operational practices, and deliver consistent service reliability under business-critical workloads.
Core Skills
Incident Management
Problem Management
Change Management
Production Support
Release Management
Monitoring & Observability
DevOps Operations
Team Leadership
Root Cause Analysis
Infrastructure Operations
Cross-team Coordination
Tech Stack
Monitoring & Observability
DevOps
Cloud
Tools
Databases
Professional Experience
Senior Support Engineer (Acting Lead)
May 2023 – Present | Qvantel – Hyderabad
- Leading a 12-member production operations team
- Managing telecom BSS applications
- Incident and outage management
- Kubernetes cluster operations
- Monitoring with Dynatrace, Instana, OpsGenie, and Thruk
- RCA and service stabilization
- CI/CD operational support using Jenkins and Git
System Engineer
May 2019 – Feb 2022 | ValueLabs – Hyderabad
- Delivered 24×7 production support services
- Resolved Sev1–Sev4 incidents across critical services
- Handled API monitoring and job failure recovery
- Coordinated infrastructure upgrade windows
- Built dashboards and improved monitoring visibility
Game Tester & Customer Care Representative
Oct 2018 – Apr 2019 | Glu Mobile – Hyderabad
- Performed game testing and bug reporting
- Validated gameplay quality and release-readiness
- Provided player support and feedback analysis
Customer Service Associate
2017 – 2018 | Amazon – Hyderabad
- Supported customer service and helpdesk processes
- Handled incidents and service escalations
- Tracked service requests for process closure
Production Operations & Site Reliability
Responsible for maintaining highly critical telecom production systems, ensuring 24×7 uptime and service reliability under business-critical operating conditions.
- Kubernetes cluster lifecycle management and workload stability
- Observability implementation with Prometheus and Grafana
- Centralized log monitoring pipelines with Loki
- Incident alerting and on-call action orchestration through OpsGenie
- CI/CD operations support using Jenkins pipelines
- Deployment automation with Helm charts and YAML configuration
- Pod troubleshooting and remediation using kubectl workflows
- RCA-driven reliability improvements and service hardening
Leadership & Achievements
GitHub
wild-apache
Automation, DevOps experiments and infrastructure tools.
Contact
Email: durgap34@gmail.com
Phone: +91-9010613910
LinkedIn: linkedin.com/in/tndp
GitHub: github.com/wild-apache