
Building a Scalable Graph RAG System for Enterprise Insights

I led the development of a Graph-based Retrieval-Augmented Generation (RAG) system integrating Neo4j, LangChain, and Azure OpenAI. Designed for enterprise-grade inventory and financial data analytics, the system enables natural language querying over knowledge graphs with structured Cypher output and dynamic result interpretation.

  • LLM-Oriented Architecture Design
  • Knowledge Graph Engineering (Neo4j)
  • LangChain Integration & Prompt Engineering
  • Cloud Deployment (GCP) & Cost Optimization
The aero lesson builder app dragging an audio component into a screen about plant cells.

The problem

In 2025, we initiated the GraphRAG project to build a Retrieval-Augmented Generation system over a Neo4j knowledge graph, designed from the ground up to give domain experts and business users natural language access to structured enterprise data. The legacy systems were rigid and siloed, with limited ability to connect insights across group companies, inventory hierarchies, and financial years. Our goals were to unify this fragmented data into a queryable knowledge graph, lower the barrier to data exploration using LLMs, and create a system that is intuitive for non-technical users while remaining scalable and robust for complex analytics.

A set of dark themed components for the aero design system

The aero design system

To streamline development across data engineers, AI researchers, and product teams, we built a modular architecture combining graph databases, LLM pipelines, and prompt-driven orchestration. A flexible schema design and prompt framework laid the foundation for consistent query generation, natural language analysis, and scalable knowledge integration. This architecture informed not only the backend reasoning system but also the user experience across the application and supporting tools.
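As a rough illustration of how a schema description and a prompt framework can fit together, here is a minimal sketch in plain Python. It is not the project's actual code: `GraphSchema`, `CYPHER_PROMPT`, and `build_cypher_prompt` are hypothetical names, and the labels in the usage example are assumptions loosely based on the entities mentioned above (companies, inventory, financial years).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSchema:
    """Hypothetical, simplified description of the knowledge graph."""

    node_labels: tuple[str, ...]
    # Each relationship is (start label, relationship type, end label).
    relationships: tuple[tuple[str, str, str], ...]

    def describe(self) -> str:
        """Render the schema as text that can be embedded in a prompt."""
        nodes = ", ".join(self.node_labels)
        rels = "\n".join(
            f"(:{start})-[:{rel}]->(:{end})"
            for start, rel, end in self.relationships
        )
        return f"Node labels: {nodes}\nRelationships:\n{rels}"


# Assumed prompt wording; a real prompt would be tuned against the model.
CYPHER_PROMPT = (
    "You translate questions into Cypher for the schema below.\n"
    "Use only the labels and relationship types listed.\n"
    "Return a single read-only Cypher query and nothing else.\n\n"
    "{schema}\n\nQuestion: {question}\nCypher:"
)


def build_cypher_prompt(schema: GraphSchema, question: str) -> str:
    """Combine the schema text and the user's question into one prompt."""
    return CYPHER_PROMPT.format(
        schema=schema.describe(), question=question.strip()
    )


if __name__ == "__main__":
    schema = GraphSchema(
        node_labels=("Company", "InventoryItem", "FinancialYear"),
        relationships=(
            ("Company", "HOLDS", "InventoryItem"),
            ("InventoryItem", "REPORTED_IN", "FinancialYear"),
        ),
    )
    print(build_cypher_prompt(schema, "Which companies hold item X?"))
```

Keeping the schema text in one place means every prompt sees the same labels and relationship types, which is one way to keep generated queries consistent as the graph evolves.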

The homepage of the aero design system docs website linking to principles and components.

Design system docs

A system is only effective if teams know how to use it, so we built thorough documentation to guide contributors. It covers graph schema structure, prompt engineering principles, Cypher query patterns, LLM integration, and deployment guidelines—ensuring both developers and analysts can confidently work within the GraphRAG ecosystem.

A dramatic ocean scene with lava forming a new land mass.

Motion design

Interactivity and clarity were core principles in designing the query and result flow. Visual transitions and feedback help users understand how natural language inputs translate into Cypher queries and graph traversals—making the entire reasoning process feel intuitive and traceable.
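One way to make that flow traceable is to keep the question, the generated Cypher, and the returned rows together in a single record that the interface can display. The sketch below assumes this shape and is not the production pipeline: `QueryTrace`, `is_read_only`, and `answer_with_trace` are hypothetical names, and the LLM call and database call are passed in as plain callables so the example runs without Neo4j or an API key. The keyword check is a deliberately naive guard that may reject some valid read queries and does not catch procedures that write.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

# Cypher keywords that modify the graph. A generated query containing any of
# them as a whole word is rejected before it reaches the database.
_WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE
)


def is_read_only(cypher: str) -> bool:
    """Return True if no write keyword appears in the query text."""
    return _WRITE_CLAUSES.search(cypher) is None


@dataclass
class QueryTrace:
    """Everything needed to show the user how an answer was produced."""

    question: str
    cypher: str
    rows: list[dict[str, Any]]


def answer_with_trace(
    question: str,
    generate_cypher: Callable[[str], str],
    run_query: Callable[[str], list[dict[str, Any]]],
) -> QueryTrace:
    """Generate Cypher, check it, run it, and keep each step for display."""
    cypher = generate_cypher(question).strip()
    if not is_read_only(cypher):
        raise ValueError(f"Refusing non-read-only query: {cypher}")
    return QueryTrace(question=question, cypher=cypher, rows=run_query(cypher))


if __name__ == "__main__":
    # Stand-ins for the LLM and the database, for illustration only.
    trace = answer_with_trace(
        "How many companies are there?",
        generate_cypher=lambda q: "MATCH (c:Company) RETURN count(c) AS n",
        run_query=lambda c: [{"n": 3}],
    )
    print(trace)
```

Because the trace carries the exact query that ran, the interface can show it next to the results instead of presenting the answer as a black box.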

Encouraging adaptivity

A major part of solving for collaboration was being able to visualize the learner experience in the editor. This was especially beneficial for subject matter experts and instructors who need to review and give feedback on the higher-level structure without having to dig through all of the adaptivity scenarios screen by screen.

A drag and drop storyboard style editor for creating an adaptive lesson.

An extensible plugin ecosystem usable by everyone

The most powerful aspect of the platform is the ability to create custom plugins for any content, whether it be a degree, course, lesson, screen, or interactive component. Out of the box these can be made configurable with minimal effort from developers. Learning designers can then edit everything using a common configuration interface.
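To illustrate the idea of plugins declaring their own options for a shared configuration interface, here is a small language-agnostic sketch written in Python. It does not reflect the platform's real plugin API: `ConfigField`, `Plugin`, and the `audio` example with its `autoplay` and `volume` options are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ConfigField:
    """One option a plugin exposes to the shared configuration interface."""

    name: str
    kind: type   # expected Python type of the value
    default: Any
    label: str   # text shown to the learning designer


@dataclass
class Plugin:
    """A plugin described only by its name and its declared options."""

    name: str
    fields: tuple[ConfigField, ...]

    def defaults(self) -> dict[str, Any]:
        """Return the configuration used when nothing is overridden."""
        return {f.name: f.default for f in self.fields}

    def configure(self, **overrides: Any) -> dict[str, Any]:
        """Apply overrides, rejecting unknown options and wrong types."""
        known = {f.name: f for f in self.fields}
        config = self.defaults()
        for key, value in overrides.items():
            if key not in known:
                raise KeyError(f"{self.name} has no option {key!r}")
            if not isinstance(value, known[key].kind):
                raise TypeError(f"{key} expects {known[key].kind.__name__}")
            config[key] = value
        return config


if __name__ == "__main__":
    audio = Plugin(
        name="audio",
        fields=(
            ConfigField("autoplay", bool, False, "Start playing automatically"),
            ConfigField("volume", float, 1.0, "Playback volume"),
        ),
    )
    print(audio.configure(volume=0.5))
```

Since each plugin only declares its fields, a single editor can render a form for any plugin by iterating over `fields`, which is the kind of common interface described above.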

Configuration options for a component.
Configuration options for text.

Next-generation learning experiences

The flexibility of the product allowed developers to build engaging interactive experiences as highly configurable plugins that learning designers could then use and adapt.

Bringing 3D into learning

One really cool example is the 3D screen plugin. Learning designers can load any model into it and then configure a camera position for each section, with the camera animating between them.

Interactivity

Learners can then be directed to specific parts of the model and shown labels. They’re also able to click and drag to orbit around and freely explore at any time.

Animation

Learning designers can pick an animation included in the model to play or loop for any section without having to use any complex animation tools.

Project outcomes

Ultimately, the project was a success: Smart Sparrow and the aero platform were acquired by Pearson in 2020 to become a foundation for its next-generation learning platform.