Building BGP Troubleshooting Agents
One of the most time-consuming parts of network operations is troubleshooting BGP sessions. When a peer drops, the runbook is always the same: check the summary, pull neighbor details, verify routes, analyze the state.
What if an agent could do that entire workflow automatically?
The Agent
I built a BGP troubleshooting agent using Google's Agent Development Kit (ADK). It has three tools (get_bgp_summary, get_bgp_neighbor_detail, and get_route_information) and chains them together based on what it finds.
The agent evaluates each peer independently. If a peer is Established, it summarizes health indicators. If not, it investigates: pulls detailed neighbor info, checks route reachability, and provides structured analysis.
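The per-peer branching can be sketched in plain Python. In the real agent the LLM decides which tool to call next; this version hard-codes the order only to make the control flow visible. The dictionary keys (address, state, flap_count, last_error) and the two callables passed in are assumptions for illustration, not the actual tool schemas.

```python
from typing import Callable, Optional


def triage_peers(
    summary: list[dict],
    get_detail: Callable[[str], dict],
    get_route: Callable[[str], Optional[dict]],
) -> list[dict]:
    """Evaluate each peer independently, following the runbook order."""
    reports = []
    for peer in summary:
        address, state = peer["address"], peer["state"]
        if state == "Established":
            # Healthy peer: summarize and move on, no extra device calls.
            reports.append({
                "peer": address,
                "healthy": True,
                "flap_count": peer.get("flap_count"),
            })
            continue
        # Unhealthy peer: pull neighbor detail, then check reachability.
        detail = get_detail(address)
        route = get_route(address)
        reports.append({
            "peer": address,
            "healthy": False,
            "state": state,
            "last_error": detail.get("last_error"),
            "route_to_peer": route is not None,
        })
    return reports
```

Passing the lookups in as callables keeps the sketch testable without a device: swap in stubs that return canned dictionaries.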
What Works
The agent consistently identifies the most common failure modes: misconfigured ASNs, missing routes to peer addresses, and TCP/179 reachability issues. It's faster than a human at correlating data across multiple show commands.
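To make the correlation concrete, here is a deliberately simplified rule-based version of the three failure modes. It is not how the agent reasons (the LLM works from the raw tool output), and the function name and argument shapes are hypothetical. The (2, 2) tuple is the BGP NOTIFICATION code and subcode for OPEN Message Error / Bad Peer AS from RFC 4271; Connect and Active are the standard FSM states a session sits in while the TCP connection is not completing.

```python
from typing import Optional


def classify_failure(
    state: str,
    route_found: bool,
    last_notification: Optional[tuple[int, int]] = None,
) -> str:
    """Map evidence from the three tools onto a likely failure mode."""
    # No route to the peer address: TCP cannot even be attempted usefully,
    # so this is checked before anything else.
    if not route_found:
        return "missing route to peer address"
    # NOTIFICATION code 2, subcode 2: OPEN Message Error / Bad Peer AS.
    if last_notification == (2, 2):
        return "misconfigured ASN"
    # A route exists but the session never gets past TCP setup.
    if state in ("Connect", "Active"):
        return "TCP/179 reachability"
    return "unclassified"
```

Real diagnoses need more evidence than this (an Active state can have other causes), which is why the agent reports its findings as analysis rather than a verdict.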
What Breaks Production
The risk isn't in the agent's logic; it's in the tool implementations. If your tools wrap a generic CLI executor instead of calling specific read-only RPCs, the LLM can pass any command string it likes, including destructive ones. Always connect with a NETCONF user in a read-only class, and validate every input.
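A minimal sketch of what that looks like at the tool boundary: each tool maps to one fixed operation, and the only free-form argument (the peer address) must parse as a bare IP literal. The RPC names below are Junos-style names that I am assuming match the three tools; treat them as placeholders and check your platform. The helper names and the request dictionary shape are hypothetical.

```python
import ipaddress
from typing import Optional

# Assumed mapping: tool name -> one fixed read-only RPC (placeholder names).
ALLOWED_RPCS = {
    "get_bgp_summary": "get-bgp-summary-information",
    "get_bgp_neighbor_detail": "get-bgp-neighbor-information",
    "get_route_information": "get-route-information",
}


def validate_peer_address(value: object) -> str:
    """Return a canonical IP string, or raise ValueError."""
    if not isinstance(value, str):
        raise ValueError("peer address must be a string")
    candidate = value.strip()
    # Reject IPv6 zone IDs outright: the text after '%' is not
    # validated by the ipaddress module and could carry a payload.
    if "%" in candidate:
        raise ValueError("zone identifiers are not accepted")
    # ip_address() raises ValueError for anything that is not a
    # plain IPv4 or IPv6 literal, including appended commands.
    return str(ipaddress.ip_address(candidate))


def build_request(tool_name: str, peer: Optional[str] = None) -> dict:
    """Build a request from the allowlist; the LLM never supplies the RPC."""
    if tool_name not in ALLOWED_RPCS:
        raise ValueError(f"unknown tool: {tool_name!r}")
    args = {} if peer is None else {"address": validate_peer_address(peer)}
    return {"rpc": ALLOWED_RPCS[tool_name], "args": args}
```

The point of the design is that the model only ever chooses among tool names and supplies an address. It has no argument through which a command string can reach the device.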
Guardrails
Three rules: read-only device credentials, input validation on every tool argument, and rate limiting on device connections. The agent should never open more than one device connection per minute.
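The rate limit is the easiest of the three to enforce in code. Below is a minimal sketch of a limiter that allows one connection per interval; the class name and the injectable clock are my own choices for illustration, and a tool would call try_acquire() before opening a session and refuse if it returns False.

```python
import time
from typing import Callable, Optional


class ConnectionRateLimiter:
    """Allow at most one device connection per min_interval_s seconds."""

    def __init__(
        self,
        min_interval_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so tests need not sleep
        self._last: Optional[float] = None

    def try_acquire(self) -> bool:
        """Return True and record the time if a connection is allowed now."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval_s:
            return False
        self._last = now
        return True
```

Returning False instead of sleeping lets the tool hand the refusal back to the agent as a normal result, so the LLM sees that it has to wait rather than hanging mid-call.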