Building BGP Troubleshooting Agents

One of the most time-consuming parts of network operations is troubleshooting BGP sessions. When a peer drops, the runbook is always the same: check the summary, pull neighbor details, verify routes, analyze the state.

What if an agent could do that entire workflow automatically?

The Agent

I built a BGP troubleshooting agent using Google's Agent Development Kit (ADK). It has three tools (get_bgp_summary, get_bgp_neighbor_detail, and get_route_information) and chains them together based on what it finds.

The agent evaluates each peer independently. If a peer is Established, it summarizes health indicators. If not, it investigates: pulls detailed neighbor info, checks route reachability, and provides structured analysis.
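
The per-peer branching can be sketched as plain Python. This is a deterministic illustration of the decision the agent makes, not the ADK agent itself: the function name triage_peers, the summary field names (peer_address, state), and the report layout are assumptions for the example, and the two callables stand in for get_bgp_neighbor_detail and get_route_information.

```python
from typing import Callable, Dict, List


def triage_peers(
    summary: List[Dict[str, str]],
    get_neighbor_detail: Callable[[str], dict],
    get_route_info: Callable[[str], dict],
) -> List[dict]:
    """Evaluate each peer independently; investigate only unhealthy ones."""
    reports = []
    for peer in summary:
        # Field names are hypothetical; adapt them to your summary tool's output.
        address, state = peer["peer_address"], peer["state"]
        if state == "Established":
            # Healthy peer: record it without calling the detail tools.
            reports.append({"peer": address, "status": "healthy", "state": state})
            continue
        # Unhealthy peer: gather neighbor detail and route reachability.
        reports.append({
            "peer": address,
            "status": "investigate",
            "state": state,
            "neighbor_detail": get_neighbor_detail(address),
            "route": get_route_info(address),
        })
    return reports
```

Passing the tools in as callables keeps the branching testable with stubs, so the logic can be checked without touching a device.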

What Works

The agent consistently identifies the most common failure modes: misconfigured ASNs, missing routes to peer addresses, and TCP/179 reachability issues. It's faster than a human at correlating data across multiple show commands.
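
As a rough illustration of how those three failure modes map to observable data, here is a rule-based sketch. The PeerObservation fields, the rule ordering, and the returned labels are all assumptions made for the example; the agent itself reasons over the raw tool output rather than fixed rules. The two protocol facts it leans on are standard BGP behavior: "Bad Peer AS" is an OPEN message error, and a session sitting in Connect or Active has not completed its TCP connection.

```python
from dataclasses import dataclass


@dataclass
class PeerObservation:
    """Hypothetical digest of what the three tools returned for one peer."""
    state: str            # BGP FSM state from the summary
    route_to_peer: bool   # whether a route to the peer address exists
    last_error: str = ""  # last error text from neighbor detail, if any


def likely_cause(obs: PeerObservation) -> str:
    """Map one observation to the most likely of the common failure modes."""
    if obs.state == "Established":
        return "none: session is established"
    if not obs.route_to_peer:
        # Without a route, the TCP session cannot even be attempted.
        return "missing route to peer address"
    if "bad peer as" in obs.last_error.lower():
        # The peer rejected our OPEN because the AS number did not match.
        return "misconfigured ASN"
    if obs.state in {"Connect", "Active"}:
        # A route exists but TCP never completes: filter, ACL, or no listener.
        return "possible TCP/179 reachability issue"
    return "undetermined: needs manual review"
```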

What Breaks Production

The risk isn't in the agent's logic; it's in the tool implementations. If your tools wrap a generic CLI executor instead of read-only RPCs, the LLM could craft destructive commands. Always connect over NETCONF as a user restricted to a read-only class, and validate every input before it reaches the device.
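
A minimal sketch of what that validation might look like, assuming each tool takes at most a peer address as its argument. The names READ_ONLY_TOOLS, validate_peer_address, and build_request are hypothetical, and the returned dict is a stand-in for whatever structured RPC call your tool layer makes. The point is that the LLM only ever selects from a fixed set of operations and supplies typed arguments; it never supplies command text.

```python
import ipaddress
from typing import Optional

# The only operations the agent may request; anything else is rejected.
READ_ONLY_TOOLS = {
    "get_bgp_summary",
    "get_bgp_neighbor_detail",
    "get_route_information",
}


def validate_peer_address(value: str) -> str:
    """Return the canonical IP string, or raise ValueError."""
    if "%" in value:
        # ipaddress accepts free-form IPv6 scope IDs after '%'; refuse them
        # so arbitrary text cannot ride along with a valid address.
        raise ValueError("scoped addresses are not allowed")
    return str(ipaddress.ip_address(value))


def build_request(tool: str, peer_address: Optional[str] = None) -> dict:
    """Build a structured request instead of a CLI string."""
    if tool not in READ_ONLY_TOOLS:
        raise ValueError(f"tool not allowed: {tool!r}")
    request = {"tool": tool}
    if peer_address is not None:
        request["peer_address"] = validate_peer_address(peer_address)
    return request
```

With this shape, an argument such as "192.0.2.1; request system reboot" fails address parsing and never reaches the device. The read-only credentials remain the backstop if validation has a gap.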

Guardrails

Three rules: read-only device credentials, input validation on every tool argument, and rate limiting on device connections. For the rate limit, the agent should never connect to more than one device per minute.
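
The rate limit can be enforced in the tool layer rather than left to the agent's instructions. Below is a minimal sketch, assuming a single process and a hypothetical ConnectionRateLimiter class that each tool consults before opening a connection. The clock is injectable so the limiter can be tested without waiting.

```python
import time
from typing import Callable, Optional


class ConnectionRateLimiter:
    """Allow at most one device connection per min_interval_s seconds."""

    def __init__(
        self,
        min_interval_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last: Optional[float] = None  # time of the last allowed connection

    def try_acquire(self) -> bool:
        """Return True if a connection may be opened now, else False."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval_s:
            return False  # too soon; the caller should refuse or retry later
        self._last = now
        return True
```

A tool would call try_acquire() before connecting and return an error to the agent when it gets False, so a looping agent cannot hammer a device. This sketch is not thread-safe; concurrent tool calls would need a lock around try_acquire().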