Our Take on What AI PMs Deserve

Dec 12, 2024

The GenAI Landscape: Through Our Eyes

When we first dove into the GenAI space in early 2023, we noticed something that didn't sit right with us. Nearly every tool was built for engineers and data scientists—the folks building the infrastructure. Makes sense, right? The field was new, and technical professionals were racing to launch innovative applications and showcase what this technology could do.

As we talked with teams implementing AI solutions, we watched the rise of "AI Engineers" and prompt specialists. Open-source libraries multiplied overnight. Products like LangSmith evolved to boost productivity, introducing features for testing prompts and configurations. Engineers were happy, and domain experts had ways to contribute.

But something was missing.

A Painful Reality: Measuring Quality

After speaking with over 100 teams building AI products, we discovered a frustrating truth: product leaders were being left behind. The tools weren't designed with them in mind. We saw product leaders painfully reviewing agent runs manually, forced to sift through long text chunks for tiny insights into what actually works.

This hit close to home for us. In most companies—AI-powered or not—product leaders make the tough calls. They're the ones who need to know if something's working, why it isn't, and how to fix it.

This need previously gave rise to giants like Amplitude, Mixpanel, and Fullstory. But here's the hard truth we've discovered: these platforms fall flat when analyzing GenAI applications with text or audio interactions. The unstructured nature of these interactions creates a blindspot that existing tools weren't built to address.

Three Questions AI Product Leaders Struggle to Answer

Through our conversations with early customers, we've identified three critical questions:

1. Where is it lacking?

Product managers need easy access to quality metrics, but GenAI outputs don't fit neatly into traditional analytics. One customer told us they were "flying blind" when trying to understand where their chatbot was failing users. They had engagement data but couldn't connect it to the actual conversation quality.

2. Why is it lacking?

We've seen two approaches to answering this question:

The first involves manual review of sessions and transcripts—a process one of our beta users described as "soul-crushing" due to the time investment required.

The second involves slicing data to explore hypotheses. But here's the challenge we kept hearing: GenAI applications log unstructured data points—conversations, audio, images—that don't play well with traditional analytics tools. As one product leader told us, "It's like trying to analyze a book with a calculator."
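To make the challenge concrete, here is a minimal sketch (all field names and the data shape are hypothetical illustrations, not any real product's schema) of the kind of flattening step needed before a traditional analytics tool can slice conversation data:

```python
# Hypothetical sketch: deriving simple, sliceable metrics from one
# unstructured conversation session. Field names are illustrative.

def flatten_session(session: dict) -> dict:
    """Turn a raw transcript into a structured row of metrics."""
    messages = session["messages"]
    user_msgs = [m for m in messages if m["role"] == "user"]
    return {
        "session_id": session["id"],
        "turns": len(messages),
        "user_turns": len(user_msgs),
        "avg_user_chars": (
            sum(len(m["text"]) for m in user_msgs) / len(user_msgs)
            if user_msgs else 0.0
        ),
        # A session ending on a user message may hint at an unanswered question.
        "ended_with_user": bool(messages) and messages[-1]["role"] == "user",
    }

session = {
    "id": "s-1",
    "messages": [
        {"role": "user", "text": "How do I reset my password?"},
        {"role": "assistant", "text": "Go to Settings > Security."},
        {"role": "user", "text": "That menu does not exist."},
    ],
}
print(flatten_session(session))
```

Even this toy version shows the gap: every metric worth tracking has to be invented and extracted first, whereas web events arrive pre-structured.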

3. How can we track its improvement?

Maybe the hardest question of all: once a problematic pattern is detected, what steps can I take to fix it, and how do I track whether those steps actually improve the product?

We believe product managers deserve tools that speak their language. They need to:

  • Visualize complete user journeys across structured and unstructured interactions

  • Easily derive insights that help them engage users better

  • Make data-informed decisions about feature prioritization

  • Analyze patterns across multiple data types—web events, GenAI events, costs, latency—all in one place

Product managers uniquely define what "good" looks like in GenAI experiences. While engineers implement features, determining their quality and impact is fundamentally a product responsibility.

That's why we're building Sticklight—not just another developer tool, but a platform that empowers product leaders to truly understand and improve their AI products. We're learning alongside our customers, and we'll continue to share what we discover as we grow.

If you're a product leader navigating the GenAI landscape, we built this for you. We'd love to hear about your challenges and learn together.