Senior Operations Engineer (Acting Lead) | Production Support | SRE | DevOps Operations

Naga Durga Prasad Talla

Senior Operations Engineer | Site Reliability | Production Support | DevOps Operations

“Ensuring 24×7 production stability, observability, and operational excellence.”

About Me

Senior operations and production support professional with 8+ years of hands-on experience maintaining high availability and service continuity across telecom and gaming environments. Track record of managing critical production incidents, improving observability, and speeding up root cause analysis through proactive monitoring. Currently leading a 12-member operations team and working with cross-functional engineering teams to stabilize complex production systems, strengthen CI/CD operational practices, and keep business-critical services reliable.

Core Skills

IT Service Management

Incident Management

Problem Management

Change Management

Production Support

Release Management

Monitoring & Observability

DevOps Operations

Team Leadership

Root Cause Analysis

Infrastructure Operations

Cross-team Coordination

Tech Stack

Monitoring & Observability

Prometheus, Grafana, Loki, Dynatrace, Instana, Nagios, OpsGenie, Splunk

DevOps

Jenkins, Git, Docker, Kubernetes, Helm

Cloud

AWS, GCP

Tools

ServiceNow, JIRA, Rancher, PagerDuty, Postman, Salesforce

Databases

SQL, Oracle

Professional Experience

Senior Support Engineer (Acting Lead)

May 2023 – Present

Qvantel – Hyderabad

Leading a 12-member production operations team
Managing telecom BSS applications
Handling incident and outage management
Operating Kubernetes clusters
Monitoring with Dynatrace, Instana, OpsGenie, and Thruk
Performing RCA and service stabilization
Providing CI/CD operational support using Jenkins and Git

System Engineer

May 2019 – Feb 2022

ValueLabs – Hyderabad

Delivered 24×7 production support services
Resolved Sev1–Sev4 incidents across critical services
Handled API monitoring and job failure recovery
Coordinated infrastructure upgrade windows
Built dashboards and improved monitoring visibility

Game Tester & Customer Care Representative

Oct 2018 – Apr 2019

Glu Mobile – Hyderabad

Performed game testing and bug reporting
Validated gameplay quality and release-readiness
Provided player support and feedback analysis

Customer Service Associate

2017 – 2018

Amazon – Hyderabad

Supported customer service and helpdesk processes
Handled incidents and service escalations
Tracked service requests for process closure

Production Operations & Site Reliability

Responsible for maintaining highly critical telecom production systems, ensuring 24×7 uptime and service reliability under business-critical operating conditions.

Kubernetes cluster lifecycle management and workload stability
Observability implementation with Prometheus and Grafana
Centralized log monitoring pipelines with Loki
Incident alerting and on-call action orchestration through OpsGenie
CI/CD operations support using Jenkins pipelines
Deployment automation with Helm charts and YAML configuration
Pod troubleshooting and remediation using kubectl workflows
RCA-driven reliability improvements and service hardening

Leadership & Achievements

Leading a 12-member operations team

Supporting business-critical telecom applications

Handling critical production incidents effectively

Stabilizing services during transition to steady state

Designing monitoring dashboards and alert frameworks

Conducting technical knowledge transfer sessions

GitHub

wild-apache

Automation, DevOps experiments and infrastructure tools.

Contact

Email: durgap34@gmail.com

Phone: +91-9010613910

LinkedIn: linkedin.com/in/tndp

GitHub: github.com/wild-apache