Building Semantic Search for My Smart Notes App
GitHub Repository: https://github.com/Aymen-Guerrouf/smart-note-backend
Connect with me: LinkedIn - Aymen Guerrouf
I built a smart notes app recently, and honestly the coolest part was the semantic search. You know that feeling when you're desperately trying to find that one note about machine learning, but you can only remember you wrote something about "neural networks" or "AI algorithms"? Traditional keyword search would leave you hanging, but semantic search? It just gets it.
Let me walk you through how I built this system and why it's actually way simpler than you might think.
What Even Is Semantic Search?
Before we dive into code, let's get our heads around what semantic search actually means. Think about how humans understand language. When I say "dog," you don't just think about the letters d-o-g. Your brain immediately connects it to furry animals, pets, barking, walks in the park, and probably a dozen other related concepts.
Traditional search engines are basically fancy word-matching systems. They look for exact matches or slight variations of your search terms. But semantic search tries to understand the meaning behind words, just like your brain does.
Here's a concrete example from my notes app. Let's say I have these two notes:
- "Machine Learning Fundamentals" - contains content about neural networks, algorithms, and training data
- "My Weekend Plans" - mentions walking the dog and visiting the park
If I search for "artificial intelligence," traditional search might miss the first note entirely because those exact words don't appear in it. But semantic search understands that "artificial intelligence," "machine learning," and "neural networks" are all related concepts. It would find that first note and rank it highly, while completely ignoring the weekend plans note.
The magic happens through something called embeddings.
Understanding Embeddings: The Heart of Semantic Search
Embeddings are probably the most important concept to grasp here. Think of an embedding as a way to represent text as a list of numbers that captures its meaning. It's like giving every piece of text a unique fingerprint, but instead of identifying individual texts, these fingerprints group similar meanings together.
Here's how my system works. Every note gets converted into a 1024-dimensional vector. I know that sounds intimidating, but imagine it like this: each note becomes a point in a 1024-dimensional space. Notes about similar topics cluster together in this space, while unrelated notes spread far apart.
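To make the "points in space" intuition concrete, here's a dependency-free toy. The 4-dimensional vectors below are hand-invented purely for illustration; real BGE embeddings have 1024 learned dimensions. The point is that notes on related topics end up closer together than unrelated ones:

```python
import math

# Toy 4-dimensional "embeddings" (real ones from BGE have 1024 dimensions).
# These numbers are invented to illustrate topic clustering.
notes = {
    "Machine Learning Fundamentals": [0.9, 0.8, 0.1, 0.0],
    "Deep Learning Notes":           [0.8, 0.9, 0.2, 0.1],
    "My Weekend Plans":              [0.1, 0.0, 0.9, 0.8],
}

def euclidean(a, b):
    """Straight-line distance between two points in the vector space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

ml = euclidean(notes["Machine Learning Fundamentals"], notes["Deep Learning Notes"])
unrelated = euclidean(notes["Machine Learning Fundamentals"], notes["My Weekend Plans"])
print(ml < unrelated)  # prints True: the two ML notes cluster together
```

The same geometry holds in 1024 dimensions; you just can't draw it anymore.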
import logging
from typing import List, Optional

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

class EmbeddingModel:
    """
    Text embedding generator with batch processing and fallback handling.
    Supports HuggingFace transformer models with automatic fallback to
    deterministic hash-based embeddings when models fail.
    """

    def __init__(self, model_name: str = "BAAI/bge-large-en-v1.5", vector_size: int = 1024):
        self.model_name = model_name
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.dim = vector_size
        # Model components loaded lazily to save memory
        self.tokenizer: Optional[AutoTokenizer] = None
        self.model: Optional[AutoModel] = None
        self.use_fallback = False
I'm using a pre-trained model called BGE (BAAI General Embedding), which has already learned to understand relationships between millions of words and concepts. When I feed it text like "machine learning algorithms," it outputs a vector that's mathematically close to vectors for "neural networks," "deep learning," and "artificial intelligence."
The beautiful thing is that I don't need to train this model myself. Researchers at Beijing Academy of Artificial Intelligence already did the heavy lifting, training it on massive amounts of text to understand semantic relationships.
The Embedding Generation Process
Let me break down exactly how text becomes a meaningful vector. This process happens every time I save a note or perform a search.
def encode_text(self, text: str) -> List[float]:
    """Convert text into a vector embedding."""
    if not self.use_fallback:
        try:
            self._ensure_model_loaded()
            if self.tokenizer and self.model:
                return self._generate_transformer_embedding(text)
        except Exception as e:
            logging.warning(f"Transformer embedding failed: {e}")
            self.use_fallback = True
    # Fallback to deterministic hash-based embedding
    return self._generate_fallback_embedding(text)
The process starts with tokenization. The model breaks down my text into smaller pieces called tokens. These might be whole words, parts of words, or even individual characters, depending on the text. Think of it like preparing ingredients before cooking.
Next comes the actual embedding generation. The transformer model processes these tokens and generates what's called contextualized embeddings. This means that the same word can have different embeddings depending on its context. The word "bank" in "river bank" gets a different embedding than "bank" in "savings bank."
def _generate_transformer_embedding(self, text: str) -> List[float]:
    """Generate embedding using transformer model."""
    # First, we tokenize the input text
    encoded_input = self.tokenizer(
        text,
        padding=True,
        truncation=True,
        max_length=512,  # Limit to avoid memory issues
        return_tensors="pt",
    )
    # Move to appropriate device (GPU if available)
    encoded_input = {k: v.to(self.device) for k, v in encoded_input.items()}
    with torch.no_grad():  # Disable gradients for inference
        model_output = self.model(**encoded_input)
    # Apply mean pooling to get sentence-level embeddings
    sentence_embeddings = self._mean_pooling(
        model_output, encoded_input["attention_mask"]
    )
    # Normalize embeddings for cosine similarity
    sentence_embeddings = torch.nn.functional.normalize(
        sentence_embeddings, p=2, dim=1
    )
    return sentence_embeddings[0].cpu().numpy().astype(np.float32).tolist()
The mean pooling step is crucial here. The transformer model gives us embeddings for individual tokens, but I need one embedding for the entire note. Mean pooling averages all the token embeddings while paying attention to the attention mask, which tells us which tokens are actual content versus padding.
Finally, I normalize the embeddings. This ensures that when I compare embeddings later using cosine similarity, I'm measuring the angle between vectors rather than their magnitude. This makes the similarity scores more meaningful and consistent.
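Here's what those two steps look like in plain Python, stripped of tensors. The token values and the `mean_pool`/`l2_normalize` helpers are invented for illustration; the class's actual `_mean_pooling` works on PyTorch tensors, but the arithmetic is the same:

```python
import math

# Toy token embeddings: 4 tokens x 3 dimensions (real model: tokens x 1024).
# The last "token" is padding, so the attention mask zeroes it out.
token_embeddings = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],  # padding junk that must not influence the result
]
attention_mask = [1, 1, 1, 0]

def mean_pool(embeddings, mask):
    """Average only the real (unmasked) token embeddings."""
    n = sum(mask)
    dims = len(embeddings[0])
    return [
        sum(e[d] for e, m in zip(embeddings, mask) if m) / n
        for d in range(dims)
    ]

def l2_normalize(vec):
    """Scale to unit length so cosine similarity reduces to a dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

pooled = mean_pool(token_embeddings, attention_mask)
print(pooled)  # prints [2.0, 2.0, 1.0] - the padding row was ignored
print(l2_normalize(pooled))
```

Note how the `[9.0, 9.0, 9.0]` padding row never touches the result; that's exactly what the attention mask buys you.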
Storing Embeddings: Enter Qdrant
Once I have these beautiful embeddings, I need somewhere to store them efficiently. This is where vector databases shine, and I chose Qdrant for its simplicity and performance.
Traditional databases are great at exact matches and simple comparisons, but they're terrible at finding "similar" vectors. Vector databases like Qdrant are specifically designed for this task. They use specialized indexing algorithms to make similarity searches incredibly fast, even with millions of vectors.
Why I Chose Qdrant Over Other Vector Databases
The vector database landscape is pretty crowded these days, with options like Pinecone, Weaviate, Chroma, and Milvus all competing for attention. So why did I go with Qdrant? Let me break down the decision-making process.
First, let's talk about what makes a good vector database for semantic search. You need three core capabilities: fast similarity search (obviously), horizontal scalability for growing datasets, and flexible filtering to combine vector search with traditional metadata queries. You also want something that doesn't require a PhD in distributed systems to operate.
Qdrant hit the sweet spot for my use case in several key ways. The biggest factor was its hybrid search capabilities. While I can perform pure vector similarity searches, I can also combine them with traditional filters on metadata like tags, creation dates, or note types. This is exactly what I demonstrated in the search function - finding semantically similar content while restricting results to specific categories.
The architecture is another major advantage. Qdrant uses HNSW (Hierarchical Navigable Small World) graphs for indexing, which provides excellent performance characteristics. Think of HNSW as building a multi-layer highway system through your vector space. The top layer connects distant regions with high-speed links, while lower layers provide local connections. When searching, you start on the highway and gradually zoom into more detailed local roads until you find your destination. This approach gives you roughly logarithmic search time, meaning searches stay fast even as your database grows by orders of magnitude.
From a practical standpoint, Qdrant runs beautifully in Docker containers and doesn't require complex cluster management for moderate scale applications. Setting up Pinecone requires API keys and external service management, while Weaviate and Milvus often need more complex orchestration. For a notes app that might scale to hundreds of thousands of entries, Qdrant's simplicity is a major win.
The API design also influenced my choice. Qdrant's REST API is intuitive and well-documented, making it easy to integrate with FastAPI. The Python client handles connection pooling and retries automatically, which reduces the boilerplate code I need to write and maintain.
Performance-wise, Qdrant consistently delivers sub-millisecond search times for my dataset size. The cosine similarity calculations are optimized at the SIMD instruction level, taking advantage of modern CPU capabilities to parallelize vector operations. When you're dealing with 1024-dimensional vectors, these optimizations make a real difference in user experience.
One often overlooked factor is memory efficiency. Qdrant uses quantization techniques to reduce the memory footprint of stored vectors without significantly impacting search accuracy. For a self-hosted solution, this means I can run larger datasets on smaller machines, keeping operational costs reasonable.
The filtering system deserves special mention because it's where many vector databases fall short. Qdrant doesn't just bolt on traditional filtering as an afterthought. Instead, it integrates metadata filtering directly into the vector search algorithm. This means I can efficiently search for "machine learning concepts in research notes created last month" without first filtering millions of vectors and then searching the subset. The database optimizes the entire query pipeline together.
import asyncio

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

class QdrantDatabase:
    """Enhanced Qdrant vector database wrapper with connection management."""

    def __init__(self, host: str = "localhost", port: int = 6333):
        self.client = QdrantClient(host=host, port=port)

    async def ensure_collection(self, collection_name: str, vector_size: int = 1024) -> bool:
        """Create collection if it doesn't exist."""
        def _ensure_collection_sync(client: QdrantClient) -> bool:
            collections = client.get_collections().collections
            existing_collections = [col.name for col in collections]
            if collection_name not in existing_collections:
                client.create_collection(
                    collection_name=collection_name,
                    vectors_config=VectorParams(
                        size=vector_size,
                        distance=Distance.COSINE,  # This is key for semantic similarity
                    ),
                )
                return True
            return False

        # Run the blocking client call off the event loop
        return await asyncio.get_running_loop().run_in_executor(
            None, lambda: _ensure_collection_sync(self.client)
        )
I configured Qdrant to use cosine similarity as the distance metric. This measures the angle between vectors, which is perfect for semantic similarity. Two vectors pointing in the same direction (similar meaning) have a cosine similarity close to 1, while vectors pointing in opposite directions have a similarity close to -1.
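Cosine similarity is just a normalized dot product, so the -1 to 1 behavior is easy to verify with a minimal implementation (Qdrant computes this internally; this standalone version is only for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v = [1.0, 2.0, 3.0]
print(cosine_similarity(v, [2.0, 4.0, 6.0]))    # ~1.0: same direction, magnitude ignored
print(cosine_similarity(v, [-1.0, -2.0, -3.0])) # ~-1.0: opposite direction
```

Because magnitude is divided out, a long rambling note and a terse one about the same topic can still score as near-identical, which is exactly the behavior you want for semantic search.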
Creating Notes with Automatic Embedding
Every time someone creates a note in my app, the system automatically generates and stores the embedding. This happens seamlessly in the background.
from datetime import datetime, timezone
from uuid import uuid4

from qdrant_client.models import PointStruct

async def create_note(note_data: dict) -> dict:
    """Create a new note with automatic embedding generation and storage."""
    # Generate unique ID and timestamp
    note_id = str(uuid4())
    current_time = datetime.now(timezone.utc)
    # Combine title and content for richer embeddings
    full_text = f"{note_data['title']}\n\n{note_data['content']}"
    # Generate vector embedding
    vector_embedding = embedding_model.encode_text(full_text)
    # Prepare metadata payload
    payload = {
        "title": note_data["title"],
        "content": note_data["content"],
        "tags": note_data.get("tags", []),
        "note_type": note_data.get("note_type", "general"),
        "created_at": current_time.isoformat(),
        "updated_at": current_time.isoformat(),
        "word_count": len(note_data["content"].split()) if note_data["content"] else 0,
    }
    # Create point for Qdrant storage
    point = PointStruct(
        id=note_id,
        vector=vector_embedding,  # Our 1024-dimensional semantic fingerprint
        payload=payload,
    )
    # Store in Qdrant
    qdrant_db.client.upsert(
        collection_name="notes",
        points=[point],
        wait=True,  # Ensure operation completes before returning
    )
    return {"id": note_id, **payload}
Notice how I combine the title and content into a single text for embedding generation. This gives the model more context to work with and creates richer semantic representations. A note titled "Machine Learning" with content about "neural networks and training data" produces a much more informative embedding than either piece alone.
The Magic of Semantic Search
Now comes the exciting part. When someone searches for notes, the system performs the same embedding process on their query, then finds notes with the most similar embeddings.
from qdrant_client.models import FieldCondition, Filter, MatchValue

async def search_notes(query: str, limit: int = 10, score_threshold: float = 0.0,
                       filters: Optional[dict] = None) -> dict:
    """Perform semantic search on notes with optional filtering."""
    # Step 1: Convert the search query into the same vector space
    query_vector = embedding_model.encode_text(query)

    # Step 2: Build filters for additional constraints
    qdrant_filter = None
    if filters:
        conditions = []
        # Filter by tags if specified
        if filters.get("tags"):
            for tag in filters["tags"]:
                conditions.append(
                    FieldCondition(key="tags", match=MatchValue(value=tag))
                )
        # Filter by note type if specified
        if filters.get("note_type"):
            conditions.append(
                FieldCondition(key="note_type", match=MatchValue(value=filters["note_type"]))
            )
        if conditions:
            qdrant_filter = Filter(must=conditions)

    # Step 3: Perform the actual vector search
    search_results = qdrant_db.client.search(
        collection_name="notes",
        query_vector=query_vector,
        query_filter=qdrant_filter,
        limit=limit,
        score_threshold=score_threshold if score_threshold > 0 else None,
        with_payload=True,
        with_vectors=False,  # Don't return vectors to save bandwidth
    )

    # Step 4: Shape the hits into the response structure
    results = [
        {"id": hit.id, "score": hit.score, **hit.payload} for hit in search_results
    ]
    return {"results": results, "total_found": len(results)}
The search process is remarkably straightforward. I convert the user's query into an embedding using the exact same process used for notes. Then Qdrant efficiently finds the notes with embeddings most similar to the query embedding.
The beauty of this approach is that it captures semantic relationships automatically. A search for "neural networks" will find notes about "deep learning," "machine learning," and "artificial intelligence" without me having to manually specify these relationships.
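Conceptually, what Qdrant does is equivalent to this brute-force sketch: score every stored vector against the query and keep the top hits. (Real HNSW indexing avoids scanning every vector, and real vectors come from the embedding model; the 3-dimensional ones below are hand-made for illustration.)

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Tiny hand-made corpus standing in for the 1024-dim Qdrant collection.
corpus = {
    "Machine Learning Fundamentals": [0.9, 0.8, 0.1],
    "Weekend Plans":                 [0.1, 0.0, 0.9],
    "Neural Network Training Tips":  [0.8, 0.9, 0.2],
}
query_vector = [0.85, 0.85, 0.1]  # pretend this encodes "artificial intelligence"

# Rank every note by similarity to the query, highest first
ranked = sorted(corpus.items(), key=lambda kv: cosine(query_vector, kv[1]), reverse=True)
top = [title for title, _ in ranked[:2]]
print(top)  # the two ML-related notes rank above "Weekend Plans"
```

The query never contains the words "machine learning" or "neural network"; proximity in vector space does all the matching.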
Combining Semantic and Traditional Search
While semantic search is powerful, it's not perfect. Sometimes users really do want exact keyword matches. My system combines both approaches for the best of both worlds.
The semantic search handles the conceptual matching, while optional filters allow for precise constraints like specific tags or note types. This hybrid approach covers edge cases where pure semantic search might miss something important.
# Example search combining semantic understanding with precise filtering
search_request = {
    "query": "machine learning algorithms",  # Semantic search
    "limit": 10,
    "score_threshold": 0.7,
    "filters": {
        "tags": ["ai", "technical"],  # Exact tag matching
        "note_type": "research",  # Exact type matching
    },
}
Production Considerations and Robustness
Building a production system means planning for when things go wrong. My embedding model includes a fallback mechanism that switches to deterministic hash-based embeddings if the transformer model fails.
def encode_text(self, text: str) -> List[float]:
    """Convert text into a vector embedding with fallback handling."""
    if not self.use_fallback:
        try:
            self._ensure_model_loaded()
            if self.tokenizer and self.model:
                return self._generate_transformer_embedding(text)
        except Exception as e:
            logging.warning(f"Transformer embedding failed: {e}")
            self.use_fallback = True
    # Graceful degradation to hash-based embeddings
    return self._generate_fallback_embedding(text)
This fallback ensures that the search functionality never completely breaks, even if the main embedding model encounters issues. The hash-based embeddings won't capture semantic relationships as well, but they'll still provide some level of similarity matching.
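The post doesn't show `_generate_fallback_embedding` itself, but one plausible way to build a deterministic hash-based embedding is to hash the text once per dimension and normalize the result. This is my own illustrative sketch, not the repository's actual implementation:

```python
import hashlib
import math

def fallback_embedding(text: str, dim: int = 1024) -> list:
    """Deterministic pseudo-embedding: the same text always maps to the
    same unit-length vector, so similarity search still functions even
    though no semantic meaning is captured."""
    values = []
    for i in range(dim):
        # Hash text plus dimension index, map the first 4 digest bytes to [-1, 1)
        digest = hashlib.sha256(f"{text}:{i}".encode()).digest()
        values.append(int.from_bytes(digest[:4], "big") / 2**31 - 1.0)
    # L2-normalize so the vector stays drop-in compatible with cosine search
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]

a = fallback_embedding("machine learning")
b = fallback_embedding("machine learning")
print(a == b)  # prints True: identical input gives an identical vector
```

Identical texts still match each other exactly, and the vectors slot into the same Qdrant collection, which is what keeps search alive during a model outage.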
Performance Optimizations
Real-world performance matters, especially when users expect instant search results. I implemented several optimizations to keep the system snappy.
The embedding model uses lazy loading, only initializing the transformer components when first needed. This reduces startup time and memory usage. GPU acceleration automatically kicks in when available, significantly speeding up embedding generation.
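The lazy-loading pattern itself is simple. Here's a minimal sketch (the names mirror the `_ensure_model_loaded` idea from the class above, but the placeholder "model" is invented for illustration):

```python
class LazyModel:
    """Sketch of lazy loading: defer expensive initialization until the
    first call that actually needs the model."""

    def __init__(self):
        self._model = None  # nothing heavy happens at construction time

    def _ensure_model_loaded(self):
        if self._model is None:
            # Placeholder for the expensive step (e.g. loading transformer
            # weights onto the GPU); here we just record that it ran.
            self._model = "loaded"

    def encode(self, text: str) -> str:
        self._ensure_model_loaded()
        return f"embedding({text})"

m = LazyModel()
print(m._model)   # prints None: startup stayed cheap
m.encode("hello")
print(m._model)   # prints loaded: initialized on first use
```

The app process starts fast, and the one-time model load cost is paid by the first search or note creation instead of at boot.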
For the vector database, I chose cosine similarity over other distance metrics because it's computationally cheap on normalized vectors and produces intuitive scores. Mathematically cosine similarity ranges from -1 to 1, but for text embeddings like these the scores land almost entirely between 0 and 1, which makes thresholds easy to reason about.
Putting It All Together: The Search API
The final piece is a clean API that ties everything together. Here's how the complete search endpoint works:
import time

from fastapi import APIRouter, HTTPException, status

router = APIRouter()

@router.post("/search", response_model=SearchResponse)
async def search_notes_endpoint(search_request: SearchRequest):
    """
    Complete semantic search endpoint.

    Request lifecycle:
    1. User query (JSON) →
    2. Generate query embedding →
    3. Vector search in Qdrant →
    4. Ranked results (JSON)
    """
    start_time = time.time()
    try:
        # Validate the search query
        query = search_request.query.strip()
        if not query:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Search query cannot be empty",
            )
        # Perform the semantic search
        search_results = await search_notes(
            query=query,
            limit=search_request.limit,
            score_threshold=search_request.score_threshold,
            filters=search_request.filters,
        )
        # Calculate performance metrics
        search_time_ms = (time.time() - start_time) * 1000
        returned_count = len(search_results["results"])
        has_more = returned_count >= search_request.limit
        # Return structured response
        return SearchResponse(
            results=search_results["results"],
            total_found=search_results["total_found"],
            returned_count=returned_count,
            query=query,
            search_time_ms=search_time_ms,
            has_more=has_more,
        )
    except HTTPException:
        raise  # Re-raise client errors instead of masking them as 500s
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Search failed: {str(e)}",
        )
Why This Approach Works
The magic of this system lies in its simplicity. I don't need to manually define relationships between concepts or build complex rule systems. The pre-trained embedding model already understands semantic relationships learned from massive amounts of text data.
Users can search using natural language, synonyms, related concepts, or even descriptions of what they're looking for. The system finds relevant notes regardless of the exact words used, making it feel almost intuitive.
The combination of semantic understanding with traditional filtering gives users the flexibility to be as specific or as broad as they need. Sometimes you want everything related to machine learning, sometimes you need that specific research note about convolutional neural networks.
Looking Forward
This semantic search system has transformed how I interact with my notes. Instead of trying to remember exact keywords or frantically browsing through tags, I can simply describe what I'm looking for in natural language.
The foundation is solid and extensible. I could easily add features like search result highlighting, query expansion, or even personalized ranking based on user behavior. The vector database could scale to handle millions of notes without breaking a sweat.
Most importantly, the system gets better over time. As I add more notes, the semantic relationships become richer and more interconnected. Each new note adds context that improves search quality for existing content.
Building semantic search might seem daunting at first, but with the right tools and understanding of the core concepts, it's surprisingly achievable. The combination of pre-trained embeddings, vector databases, and thoughtful system design creates something that feels like magic but runs on solid engineering principles.
The future of search isn't about matching keywords—it's about understanding meaning. And honestly, once you experience semantic search in action, going back to traditional keyword matching feels like using a flip phone after an iPhone.
Check It Out
Explore the code: GitHub Repository - Smart Note Backend
Connect with me: LinkedIn - Aymen Guerrouf
Want to discuss semantic search, vector databases, or building intelligent applications? Let's connect!
Enjoyed this post? Share the link with others!