The Evolution of Site Search: From Tokenization & Regex to MongoDB, AI Semantics, and My Multi-Solution Exploration

MaoMaoyu

Current Hybrid Search Solution: A Safe Bet, but Still Limited

Currently, my navigation website uses a hybrid search solution, mainly consisting of these three parts:

  1. Tokenization + Regular Expressions: This is the most basic search method. It tokenizes the user's input and matches the tokens against website titles and descriptions using regular expressions. It's simple, fast, and low-cost, and it works well for basic keyword searches. However, it doesn't understand semantics, so it produces false matches: a search for "apple phone" can return a recipe that merely contains the word "apple."
  2. MongoDB Document Search: MongoDB offers built-in text search, which I use for full-text indexing of website data. Compared to simple tokenization and regex, it handles more complex queries and ranks results by a text relevance score. But it's still based on keyword matching and doesn't truly understand user intent.
  3. AI Semantic Category Matching (DeepSeek Model): To tackle semantic understanding, I use the DeepSeek model to categorize websites. Each user query is also analyzed semantically and matched to the most relevant category, and the websites in that category are returned. This has improved accuracy, but matching at the category level is still not granular enough, and using the DeepSeek model comes with a cost.
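
As a rough illustration of the first two parts above, here is a minimal Python sketch. The sample records, the field names (`title`, `description`), and the function names are assumptions made for illustration, not the site's actual code. The MongoDB part only builds the query documents for a `$text` search; it does not connect to a database.

```python
import re

# Hypothetical sample records; the real site data and field names are not
# shown in the article.
SITES = [
    {"title": "Apple Pie Recipe", "description": "A classic dessert made with fresh apple slices."},
    {"title": "Phone Reviews", "description": "Hands-on reviews of the latest Apple phone models."},
    {"title": "Flowchart Maker", "description": "A free online flowchart tool."},
]


def tokenize(query: str) -> list[str]:
    """Lowercase the query and split it into word tokens."""
    return re.findall(r"\w+", query.lower())


def regex_search(query: str, sites: list[dict]) -> list[dict]:
    """Return sites whose title or description contains any query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # re.escape keeps regex metacharacters in user input from being interpreted.
    pattern = re.compile("|".join(re.escape(t) for t in tokens), re.IGNORECASE)
    return [
        s for s in sites
        if pattern.search(s["title"]) or pattern.search(s["description"])
    ]


def mongo_text_query(query: str) -> tuple[dict, dict]:
    """Build the filter and projection documents for a MongoDB $text search."""
    query_filter = {"$text": {"$search": query}}
    projection = {"score": {"$meta": "textScore"}}
    return query_filter, projection


if __name__ == "__main__":
    for site in regex_search("apple phone", SITES):
        print(site["title"])
```

Here `regex_search("apple phone", SITES)` returns both the recipe and the phone review, which is the kind of false match described above. With pymongo, the two documents from `mongo_text_query` would typically be passed to `collection.find(...)` and the cursor sorted on the same `textScore` metadata; this requires a text index on the searched fields.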

This hybrid approach is a solid choice for now, balancing cost and effectiveness. But as my website grows and user expectations rise, I realize its limitations:

  • Poor Semantic Understanding: Keyword matching fails to grasp users' true intent. A query like "what are good online collaboration tools?" returns poor results.
  • Unintelligent Result Ranking: Ranking is mostly based on simple keyword relevance, not on what the user truly needs.
  • Higher Maintenance Costs: Three separate systems mean more maintenance, which gets harder as data grows.

Other Solutions Explored: Balancing Cost and Complexity

I've considered other solutions, but didn't adopt them for the reasons below:

  • Database Semantic Search: This is the ideal solution. It converts website data and user queries into vectors and matches them by similarity. However, the computation and storage demands make it too costly for a small website like mine.
    • Intermediate Solution: To cut costs, I considered adding a semantic vector field to each website record, using a free semantic model to generate vectors from the website descriptions. When a user searches, I'd generate a vector for the search term and match it against the stored vectors. Though cheaper, this would take a lot of development work and be hard to maintain.
  • Third-Party Site Search APIs (Algolia): These services are easy to integrate and professional, but far too pricey. The free tier of 10,000 searches/month is insufficient for my site.
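
The vector matching idea in the first bullet can be sketched with plain cosine similarity. This is only a toy example: the three-dimensional vectors are hand-written stand-ins for what an embedding model would produce, and the `vector` field and function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: list[float], sites: list[dict], top_k: int = 3) -> list[tuple[float, str]]:
    """Score every site against the query vector and return the best top_k."""
    scored = [(cosine_similarity(query_vec, s["vector"]), s["title"]) for s in sites]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hand-made vectors standing in for real embeddings (assumption).
SITES = [
    {"title": "Apple Pie Recipe", "vector": [0.0, 0.2, 0.9]},
    {"title": "Flowchart Maker", "vector": [0.9, 0.1, 0.0]},
    {"title": "Team Whiteboard", "vector": [0.7, 0.6, 0.1]},
]

if __name__ == "__main__":
    query_vec = [1.0, 0.2, 0.0]  # pretend embedding of a flowchart-tool query
    for score, title in rank_by_similarity(query_vec, SITES):
        print(f"{score:.3f}  {title}")
```

A linear scan like this is fine for a few thousand records. The cost concerns mentioned above come mainly from generating the embeddings and from storing and indexing them once the data grows.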

These explorations have shown me that choosing a site search solution requires balancing cost, effectiveness, complexity, and maintenance. It's all about finding that sweet spot that fits.

Future Vision: AI Knowledge Base + Semantic Search

While my current solution does the job, I have an ideal solution in mind: transforming all my website data into an AI knowledge base and using AI semantic search.

The advantages are clear:

  • Stronger Semantic Understanding: AI can understand user intent, providing more accurate results.
  • Smarter Ranking: AI-powered ranking can better sort results based on user needs and website quality.
  • More Flexible Search: Users can search as if talking to AI, such as "find me a free online flowchart tool."
  • Lower Maintenance Costs: I'd only need to maintain an AI model and knowledge base.

But there are challenges:

  • AI Knowledge Base Construction: How can website data be efficiently transformed into AI-understandable knowledge?
  • AI Model Selection: Which model best fits my needs?
  • AI Caching: How can search results be cached efficiently to avoid redundant computations?
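
For the caching question, one simple starting point is to cache results under a normalized query string with a time-to-live, so that repeated or near-identical searches skip the expensive AI call. The sketch below is an assumption about how that could look, not something the site currently does; the class name, the TTL value, and the `compute` callback are all hypothetical.

```python
import time
from typing import Callable


class SearchCache:
    """In-memory cache keyed on a normalized query, with a time-to-live."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, list]] = {}

    @staticmethod
    def normalize(query: str) -> str:
        """Lowercase and collapse whitespace so trivial variants share a key."""
        return " ".join(query.lower().split())

    def get_or_compute(self, query: str, compute: Callable[[str], list]) -> list:
        """Return a cached result if still fresh, otherwise call compute."""
        key = self.normalize(query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        result = compute(key)  # the expensive semantic search would go here
        self._entries[key] = (now, result)
        return result


if __name__ == "__main__":
    cache = SearchCache(ttl_seconds=60.0)

    def slow_search(q: str) -> list:
        print(f"computing results for {q!r}")
        return [f"result for {q}"]

    cache.get_or_compute("Free  Flowchart Tool", slow_search)
    cache.get_or_compute("free flowchart tool", slow_search)  # served from cache
```

Exact-match caching only helps when users repeat the same wording. Catching paraphrases would need a similarity-based lookup, which is a separate design decision.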

I'm still figuring out some of these things, but I believe that AI-powered site search is the future.

Conclusion and Future Outlook

Site search is an ongoing evolution, with no one-size-fits-all solution. We must constantly experiment, adjust, and optimize.

I hope sharing my journey in this article sparks discussion on site search solutions. If you have any experiences or thoughts, feel free to share them in the comments!

In the future, I will keep exploring site search built on an AI knowledge base and look forward to sharing more. Thanks for reading!