Personalized search using vector databases and student data to surface the most relevant learning content from 1,900+ courses.
With 18K daily searches from 8K students, the search feature was one of the most used tools, yet it was failing users.
Over 40% of search results received zero clicks, indicating irrelevant or unhelpful recommendations. Fewer than 2% of searches led to course enrollment.
User feedback revealed the search was unintuitive, difficult to use, and often returned results that didn't match what students were actually looking for.
The existing system didn't consider student context, learning history, or preferences. Everyone saw the same results regardless of their needs.
Rebuilt the search system using vector databases combined with student behavioral data to deliver personalized, relevant results.
Vectorized all course content to enable semantic understanding beyond simple keyword matching.
Incorporated student viewing history, weighting results toward topics and formats they've engaged with previously.
Redesigned the results page to show more content above the fold, allowing students to quickly scan and select what they need.
All platform content is converted into vector representations stored in a specialized database. For courses, we create separate vectors for titles and for full content. This dual approach improves relevance by giving more weight to exact title matches while still capturing semantic meaning from descriptions and modules.
This vectorization happens automatically whenever new content is published, keeping the search index current without manual updates.
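As a rough illustration of the dual-vector idea, the sketch below blends a title similarity and a content similarity into one score. The function names, the toy two-dimensional vectors, and the 0.6 title weight are assumptions made for the example, not the values or the database API used in the project.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dual_vector_score(query_vec, title_vec, content_vec, title_weight=0.6):
    """Blend title and content similarity.

    title_weight is a hypothetical tuning knob: values above 0.5 favor
    courses whose title closely matches the query.
    """
    title_sim = cosine(query_vec, title_vec)
    content_sim = cosine(query_vec, content_vec)
    return title_weight * title_sim + (1.0 - title_weight) * content_sim


# Toy example: the query matches the title exactly but not the content.
score = dual_vector_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
print(round(score, 2))
```

Keeping the two vectors separate means the title weight can be tuned in experiments without re-embedding any content.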
When a student searches, we combine their query with context about courses they've viewed in recent months. The vector database returns only results above a minimum relevance threshold, so that only meaningful matches appear.
This personalization helps surface content aligned with their current learning trajectory while still allowing discovery of new topics.
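One way to picture this step is to nudge the query vector toward the average of the student's recently viewed courses and then drop anything below a similarity cutoff. This is a minimal sketch under that assumption: the blending approach, the `history_weight` of 0.25, the `min_score` of 0.5, and the in-memory dictionary standing in for the vector database are all illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def personalize(query_vec, history_vecs, history_weight=0.25):
    """Blend the query with the centroid of recently viewed course vectors.

    history_weight is a hypothetical parameter; 0.0 disables personalization.
    """
    if not history_vecs:
        return list(query_vec)
    centroid = mean_vector(history_vecs)
    return [
        (1.0 - history_weight) * q + history_weight * h
        for q, h in zip(query_vec, centroid)
    ]


def search(query_vec, index, min_score=0.5):
    """Return (course_id, score) pairs at or above min_score, best first.

    index is a plain dict standing in for the vector database.
    """
    hits = [(course_id, cosine(query_vec, vec)) for course_id, vec in index.items()]
    hits = [hit for hit in hits if hit[1] >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy index with made-up course ids and vectors.
index = {
    "python-intro": [1.0, 0.0],
    "python-data": [0.9, 0.1],
    "sql-basics": [0.0, 1.0],
}
blended = personalize([1.0, 0.0], history_vecs=[[0.0, 1.0]])
for course_id, score in search(blended, index):
    print(course_id, round(score, 3))
```

Keeping the history weight well below 0.5 lets the typed query dominate, which leaves room for students to discover topics outside their viewing history.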
Raw vector search results are re-ranked using multiple quality signals. Courses get bonus points if they relate to recently viewed content, have recent launch dates, or have higher student ratings and completion rates.
This mathematical reweighting ensures the final results balance relevance with quality, showing students the best matches first.
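The re-ranking described above can be sketched as a base similarity plus small additive bonuses. The `Candidate` fields, the bonus sizes, and the 180-day recency window below are hypothetical; the source only states which signals are used, not how they are weighted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    course_id: str
    similarity: float        # raw vector-search score, assumed to be in [0, 1]
    related_to_recent: bool  # relates to content the student viewed recently
    launched: date
    rating: float            # average student rating on a 0-5 scale
    completion_rate: float   # share of enrolled students who finished, 0-1


def rerank_score(c, today, recent_days=180):
    """Add illustrative bonuses for each quality signal to the raw similarity."""
    score = c.similarity
    if c.related_to_recent:
        score += 0.10
    if (today - c.launched).days <= recent_days:
        score += 0.05
    score += 0.05 * (c.rating / 5.0)
    score += 0.05 * c.completion_rate
    return score


def rerank(candidates, today):
    """Sort candidates by boosted score, best first."""
    return sorted(candidates, key=lambda c: rerank_score(c, today), reverse=True)


today = date(2024, 6, 1)
candidates = [
    Candidate("older-course", 0.80, False, date(2020, 1, 1), 3.0, 0.2),
    Candidate("fresh-course", 0.75, True, date(2024, 4, 1), 4.5, 0.8),
]
for c in rerank(candidates, today):
    print(c.course_id, round(rerank_score(c, today), 3))
```

In this toy run the course with the lower raw similarity ends up first because it picks up every bonus, which is the trade-off between relevance and quality that the bonus sizes control.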
The redesigned interface displays all content types in a clean, easy-to-scan layout optimized for quick decision-making. Students see more results above the fold, with clear visual differentiation between courses, paths, and other content.
This simplified design reduced cognitive load and made it easier for students to find and act on relevant content.
We ran multiple experiments to optimize each component, measuring CTR on top results to validate improvements.
Compared a third-party search tool against a custom vector-based system. The internal solution with personalization significantly outperformed generic external tools.
Tested including module titles in vectors versus just descriptions and titles. More detailed vectors improved relevance without sacrificing performance.
Created dedicated vectors for course titles to give them more weight. This improved precision for exact-match queries while maintaining broad semantic search.
Experimented with removing AI-generated tags from vectors. Cleaner data without noisy tags actually improved result quality and relevance.
A/B tested a simplified results layout against the original. The cleaner design increased top-2 position CTR by showing more scannable content.
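For reference, the top-position CTR used to compare variants in these experiments could be computed along the following lines. The log format (one clicked position per search, or `None` when nothing was clicked) and the sample numbers are invented for the example and are not the project's data.

```python
def top_k_ctr(clicked_positions, k=2):
    """Share of searches whose click landed in the top k results.

    clicked_positions holds one entry per search: the 1-based position of
    the clicked result, or None if the search ended without a click.
    """
    if not clicked_positions:
        return 0.0
    hits = sum(1 for pos in clicked_positions if pos is not None and pos <= k)
    return hits / len(clicked_positions)


def relative_lift(control_ctr, variant_ctr):
    """Relative change of the variant over the control, e.g. 0.2 means +20%."""
    if control_ctr == 0:
        raise ValueError("control CTR is zero, relative lift is undefined")
    return (variant_ctr - control_ctr) / control_ctr


# Made-up logs for two layout variants.
control = [1, None, None, 4]
variant = [1, None, 3, 2]
print(top_k_ctr(control), top_k_ctr(variant))
print(relative_lift(top_k_ctr(control), top_k_ctr(variant)))
```

With real traffic, a comparison like this would normally be paired with a significance check before declaring a winner.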
The new search system dramatically improved discoverability and engagement across the platform.
Adding student behavioral data transformed search from generic to personal.
Running structured A/B tests on each component revealed non-obvious improvements that assumption-based development would have missed.
Both the technical architecture and the interface benefited from simplification over added complexity.
Prioritizing heavily used features yields the best return on product development effort.