Designing a Datacenter Intelligence Platform
When you're responsible for datacenter infrastructure, you need to see everything at once: BGP state, traffic flows, hardware health, and topology changes. Most monitoring tools show you one slice at a time. I wanted to build something that connects the dots.
The Stack
The platform uses a Go backend with ConnectRPC, a Next.js frontend, ClickHouse for analytics, PostgreSQL for operational state, and Kafka as the universal event bus. Every piece of telemetry — NetFlow, gNMI, SNMP, Redfish — flows through Kafka into ClickHouse.
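Here is a minimal Go sketch of how collectors might tag records and choose a Kafka topic before publishing. The Event envelope, the telemetry.<source> topic names, and keying by device are assumptions made for illustration; they are not the platform's actual schema. The sketch uses only the standard library and leaves the produce call to whichever Kafka client you use.

```go
package main

import (
	"fmt"
	"time"
)

// Source identifies which collector produced a telemetry record.
type Source string

const (
	SourceNetFlow Source = "netflow"
	SourceGNMI    Source = "gnmi"
	SourceSNMP    Source = "snmp"
	SourceRedfish Source = "redfish"
)

// Event is a hypothetical normalized envelope that every collector publishes to Kafka.
type Event struct {
	Source    Source
	Device    string
	Timestamp time.Time
	Payload   []byte // raw or re-encoded record body; format is an assumption
}

// topicFor maps a source to a hypothetical topic name such as "telemetry.netflow".
func topicFor(s Source) (string, error) {
	switch s {
	case SourceNetFlow, SourceGNMI, SourceSNMP, SourceRedfish:
		return "telemetry." + string(s), nil
	default:
		return "", fmt.Errorf("unknown telemetry source %q", s)
	}
}

// partitionKey keys messages by device. With a key-hashing partitioner, all records
// from one device land on the same partition, where Kafka preserves their order.
func partitionKey(e Event) []byte {
	return []byte(e.Device)
}

func main() {
	e := Event{Source: SourceGNMI, Device: "leaf-01", Timestamp: time.Now()}
	topic, err := topicFor(e.Source)
	if err != nil {
		panic(err)
	}
	fmt.Println(topic, string(partitionKey(e))) // telemetry.gnmi leaf-01
}
```

Keying by device rather than by source trades even partition load for per-device ordering. That ordering usually matters more when you later correlate a device's events in time.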
Why ClickHouse
Network telemetry is high-volume, append-only, and query-heavy. ClickHouse is a columnar database built for that workload. On suitable hardware it can ingest millions of rows per second and answer aggregation queries over billions of rows in under a second. That makes it the right tool for time-series network data.
AI Integration
The chat interface uses Claude for reasoning: generating SQL queries from natural language, explaining anomalies, and correlating events across data sources. The key insight is that the AI doesn't replace the engineer's judgment; it accelerates the investigation workflow.