The business problem the client had identified was real and quantifiable. Mid-market organizations accumulate software portfolios without any intelligent governance layer. Tools get bought by individual teams, owned by people who leave the company, and auto-renewed at full price while half the licenses sit unused. Duplicate tools proliferate across departments; a company might run both Zoom and Teams for video conferencing, or Salesforce and HubSpot for CRM, without anyone doing a systematic analysis of which to keep, which to consolidate, and how much can be saved.
The challenge was not a shortage of data. Finance systems, identity providers, and procurement tools all held pieces of the picture. The problem was synthesis — there was no intelligence layer to bring the data together and turn it into decisions. Doing this manually for a company running 50+ tools across multiple departments could take weeks of analyst effort. The vision for the Platform was to compress that to minutes.
For Zimetrics, the engineering challenge was well defined but demanding: build a production-ready agentic AI platform with a multi-layer architecture, G2 data integration, semantic search, a deterministic scoring engine, enterprise authentication, and a conversational AI interface, and do it fast enough to matter.
Zimetrics approached the Platform as an AI-native platform engineering challenge. Two parallel AI streams ran throughout the engagement: AI as the product being built, and AI as the method of building it. Both were deliberate architectural decisions from Day 1.
The central intelligence layer was built on Google’s Agent Development Kit (ADK), chosen for its mature support for multi-step agentic workflows, LLM orchestration, and tool calling. The ADK’s agent framework provided structured reasoning, conversation state management, and reusable AI “skills” (categorization, recommendation summarization, evidence retrieval) that could be composed and extended as the platform evolved.
A defining architectural principle was the hybrid AI-and-deterministic model. Early iterations of the recommendation engine relied purely on LLM outputs. The team recognized that this introduced accuracy and consistency risks for a product where enterprise users would be acting on financial recommendations. The engine was redesigned with deterministic scoring at its core: five structured overlap signals with weighted scoring logic, with LLM reasoning applied only at the end to generate explanations and rationale. This preserved the intelligence of generative AI while eliminating hallucination risk from the decision pathway.
The G2 data layer was another critical decision. Rather than relying on generic web scrapers or third-party data APIs, Zimetrics engineered a domain-specific ingestion system using Bright Data. This produced structured, owned, schema-driven data assets at 2 to 5 times lower cost than equivalent external tools and gave the platform durable, updatable product intelligence rather than brittle scrape outputs.
Zimetrics embedded GitHub Copilot as the primary development accelerator across the entire backend engineering cycle: code generation, refactoring, debugging, test generation, and documentation. Claude and other LLM providers were used for ideation, architecture strategy, and implementation planning. Figma’s AI features were used for rapid UI/UX prototyping and stakeholder alignment, which was particularly valuable given that the client needed visual representations of flows and architecture to review and sign off.
The combination compressed what would have been a sequential, 6-8 month engineering cycle into an AI-parallelized delivery model. Core coding was largely complete within 2 months. The remaining time went into performance optimization, quality assurance, and the evolving feature scope that emerged as the Founder/CEO refined his requirements throughout the engagement.
At the heart of the Platform is an agentic AI workflow that takes a company’s tool inventory as input and produces structured optimization recommendations as output. The process begins when an IT or finance user uploads a stack file (an Excel or CSV) containing tool names, monthly costs per seat, the number of active users, and the teams using each tool. The platform ingests this file, maps each tool against the G2 product catalog via semantic vector search, categorizes every tool, groups overlapping tools, and runs the recommendation engine across each group.
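The ingestion step can be sketched as a small parser over the uploaded CSV. The column names and the semicolon-delimited `teams` cell below are illustrative assumptions; the actual schema is defined by the platform’s upload template:

```python
import csv
from dataclasses import dataclass

@dataclass
class StackEntry:
    tool_name: str
    monthly_cost_per_seat: float
    active_users: int
    teams: list[str]

def load_stack_file(path: str) -> list[StackEntry]:
    """Parse an uploaded stack CSV into structured inventory records.

    Column names here are illustrative; the real schema comes from
    the platform's upload template.
    """
    entries = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            entries.append(StackEntry(
                tool_name=row["tool_name"].strip(),
                monthly_cost_per_seat=float(row["monthly_cost_per_seat"]),
                active_users=int(row["active_users"]),
                # Teams are assumed to be semicolon-delimited in one cell.
                teams=[t.strip() for t in row["teams"].split(";") if t.strip()],
            ))
    return entries
```

Structured records like these are what the downstream categorization and matching steps operate on.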
The recommendation engine uses five deterministic overlap signals to score every pair of tools in the inventory:
| Signal | Name | Trigger Condition | Why It Matters |
|---|---|---|---|
| S1 | Category Overlap | Two tools share at least 2 G2 categories | Tools in the same categories serve the same function, structural evidence of redundancy |
| S2 | Feature Overlap | Two tools share at least 2 features in G2 | Specific feature duplication that creates direct substitutability between tools |
| S2b | Feature Coverage | Shared features cover at least 50% of the smaller tool’s feature set | Migration risk is low; the smaller tool’s capabilities are largely absorbed by the larger one, making consolidation straightforward |
| S3 | Alternatives Cross-Reference | One tool appears in the other tool’s alternatives list (name match) | Direct market evidence of substitutability — the product intelligence platform itself classifies these tools as interchangeable |
| S4 | Embedding Similarity | Cosine similarity between tool overviews > 0.60 | Semantic similarity of product descriptions — catches tools that do the same job under different branding |
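The table above maps onto simple boolean checks over G2 catalog fields. The sketch below uses illustrative record and field names (`name`, `categories`, `features`, `alternatives`) and assumes the embedding similarity is computed separately:

```python
def overlap_signals(a: dict, b: dict, cosine_sim: float) -> dict[str, bool]:
    """Evaluate the five deterministic overlap signals for a tool pair.

    `a` and `b` are tool records with illustrative fields drawn from
    the G2 catalog; `cosine_sim` is the embedding similarity of the
    two tool overviews.
    """
    shared_cats = set(a["categories"]) & set(b["categories"])
    feats_a, feats_b = set(a["features"]), set(b["features"])
    shared_feats = feats_a & feats_b
    smaller = min(feats_a, feats_b, key=len)  # the smaller feature set
    return {
        "S1_category_overlap": len(shared_cats) >= 2,
        "S2_feature_overlap": len(shared_feats) >= 2,
        # S2b: shared features cover >= 50% of the smaller tool's features.
        "S2b_feature_coverage": bool(smaller) and len(shared_feats) >= 0.5 * len(smaller),
        # S3: either tool lists the other among its G2 alternatives.
        "S3_alternatives": a["name"] in b["alternatives"] or b["name"] in a["alternatives"],
        "S4_embedding_similarity": cosine_sim > 0.60,
    }
```

Because every check is a plain set or threshold comparison, the same inventory always produces the same signals, which is the property the redesign was after.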
Once signals are computed for each tool pair, the engine aggregates the weighted scores across all signals and assigns one of three actions to every tool in an overlapping group.
The scoring logic accounts for company-specific context: the number of active users on each tool influences the migration complexity assessment, and team distribution informs whether consolidation is practical. A tool used by 10 people is a significantly different consolidation decision than one used by 200.
After deterministic scoring produces the action assignments, an LLM reasoning pass generates a natural language explanation for each recommendation: why a specific tool should be removed, what savings it would unlock, and what migration steps the user should consider. This gives end users an explainable, trustworthy output.
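As a rough sketch of how the hybrid model separates concerns: the weighted aggregation below is fully deterministic (the weights shown are illustrative assumptions, not the production values), and the LLM is invoked only to narrate a decision that has already been made.

```python
# Illustrative weights; the production weighting is tuned internally
# and is not published in this case study.
WEIGHTS = {
    "S1_category_overlap": 0.15,
    "S2_feature_overlap": 0.20,
    "S2b_feature_coverage": 0.25,
    "S3_alternatives": 0.25,
    "S4_embedding_similarity": 0.15,
}

def overlap_score(signals: dict[str, bool]) -> float:
    """Deterministic aggregation: sum the weights of the fired signals."""
    return sum(w for name, w in WEIGHTS.items() if signals.get(name))

def explanation_prompt(tool: str, action: str, score: float) -> str:
    """Prompt for the LLM reasoning pass. The action and score are
    already decided deterministically; the model only narrates."""
    return (
        f"Tool '{tool}' was assigned the action '{action}' "
        f"(overlap score {score:.2f}). Explain the rationale and "
        "outline the migration steps a user should consider."
    )
```

Keeping the score and action out of the model’s hands is what removes hallucination risk from the decision pathway: the LLM can only explain, never decide.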
The entire analysis, from file upload to categorized inventory to full overlap recommendations, runs in under 5 minutes for a typical mid-market stack.
The platform’s intelligence depends entirely on the quality of the product data it reasons over. Zimetrics engineered a domain-specific G2 catalog ingestion pipeline using Bright Data, applying schema-driven extraction and incremental change detection to maintain a structured, current product knowledge base. All G2 tool data (overviews, features, categories, pricing indicators, and alternatives listings) was stored as vector embeddings in an AWS S3 vector layer.
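Incremental change detection of this kind is commonly implemented with content hashes, so only new or changed records are re-embedded on each crawl. A minimal stdlib sketch, with illustrative record and field names:

```python
import hashlib
import json

def record_fingerprint(record: dict) -> str:
    """Stable content hash of a catalog record (canonical JSON)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def changed_ids(stored: dict[str, str], fresh: dict[str, dict]) -> list[str]:
    """Return IDs from a fresh crawl whose content differs from the
    stored fingerprints; new or changed records need re-embedding."""
    return [pid for pid, rec in fresh.items()
            if stored.get(pid) != record_fingerprint(rec)]
```

This is one reason a schema-driven pipeline stays cheap to keep current: unchanged records cost nothing on re-ingestion.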
When a user uploads their stack file, the categorization engine constructs a combined semantic query for each tool (tool name + description + category signals) and runs a similarity search against the vector database to find the highest-confidence G2 match. This matching step enables the platform to categorize unfamiliar or lightly documented tools and map them to G2’s taxonomy, a critical capability given the long tail of SaaS products organizations run.
DynamoDB and DocumentDB serve as the transactional data layer, maintaining tool inventory records, pricing data, conversation session state, user preferences, and saved shortlists. The data model was designed for flexibility and analytics readiness — supporting schema evolution as the platform’s feature set expands.
The backend is built on Python FastAPI, acting as the central orchestration layer between the front end, the GenAI/ADK layer, and the data infrastructure. It handles all business logic: tool categorization and inventory management, company profile operations, stack file ingestion and processing, report generation, and REST API exposure for marketplace integration.
Authentication runs via AWS Cognito, supporting SSO integration with Gmail, Okta, and Entra. All communication is secured via HTTPS with JWT/OAuth-based authorization. Stripe integration handles token-based billing on a pay-as-you-go model. The full stack is deployed within a secure AWS VPC with IAM role-based access control, encryption at rest and in transit, secrets management, and environment isolation across development, UAT, and production, providing the governance controls enterprise buyers require.
GitHub Copilot was the primary tool for development acceleration across backend engineering. It generated API endpoints, business logic scaffolding, and reusable components, shifting engineering effort from writing boilerplate to reviewing, refining, and validating AI-generated code.
Figma’s AI features were used for rapid UI/UX prototyping, enabling the team to quickly produce screen mockups, flow diagrams, and architecture visuals for the client to review and validate. Design moved from a coordination bottleneck to an alignment tool.
LLM providers including Claude and ChatGPT were used for ideation, architecture strategy evaluation, and sprint planning — not for production code generation, but for reasoning through design trade-offs and breaking down feature requirements into engineering tasks.
The Zimetrics team translated our vision into something technically coherent. They have delivered a platform that is go-to-market ready, engineered with AI intelligence baked in from the ground up. The signal-based recommendation engine, the semantic search layer, and the speed of the analysis were all designed into the product architecture. We look forward to building more compelling features with them.