How to Add Semantic Search with Weaviate (Step by Step Tutorial)

📖 5 min read•899 words•Updated May 3, 2026

Building a Semantic Search with Weaviate

We’re building a Weaviate semantic search application that returns contextually similar results for a query, even when the wording differs from the stored text. Why? Because traditional keyword-based search misses documents that express the same idea in different words.

Prerequisites

Docker 20.10+, Docker Compose 1.29+
Go 1.19+ (if you’re working with custom Weaviate setups)
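Python 3.11+, pip install "weaviate-client>=3.0.0,<4.0.0" (the code below uses the v3 client API, weaviate.Client; the v4 client has a different interface)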

Step 1: Set Up Weaviate with Docker Compose

First things first, let’s get Weaviate running. We’ll use Docker Compose for a quick setup. This makes it easy to run the database locally and get development going in no time.


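version: '3.8'
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    environment:
      - QUERY_DEFAULTS_LIMIT=25
      - AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true
      # Keep data under the mounted volume so it survives restarts
      - PERSISTENCE_DATA_PATH=/var/lib/weaviate
      # We supply our own vectors, so no vectorizer module is needed
      - DEFAULT_VECTORIZER_MODULE=none
    ports:
      - "8080:8080"
    volumes:
      - ./data:/var/lib/weaviate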

Run this command to start Weaviate:


docker-compose up -d

Why Docker? It isolates your environment, allowing you to focus on development without having to deal with installation issues. I once decided to install everything manually for a project and ended up with a non-functioning mess. Lesson learned: Docker is a lifesaver.

Step 2: Confirm Weaviate is Running

After your Weaviate instance is up, let’s confirm it’s running correctly. You can hit the following endpoint:


curl http://localhost:8080/v1/schema

If you see a JSON response, you’re good to go. If not, check if the Docker instance is still active. Sometimes Docker just decides to stop for no reason—like that friend who says they’ll come to your party but ghosts you at the last minute.
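The same check can be scripted, which helps when a setup script needs to wait for the container. This is a minimal sketch using only the standard library; it assumes the instance exposes Weaviate's readiness endpoint at /v1/.well-known/ready on the port mapped above, and the function name is made up for illustration.

```python
import urllib.error
import urllib.request


def weaviate_is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False
```

Call weaviate_is_ready() in a short retry loop after docker-compose up -d; if it keeps returning False, docker-compose ps will show whether the container exited.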

Step 3: Create a Class in Weaviate

Now, let’s define a class that you will use for storing data. This is essentially a schema that tells Weaviate how to handle your data objects.


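import weaviate

# v3 Python client; anonymous access is enabled on the local instance
client = weaviate.Client("http://localhost:8080")

# create_class takes a single class definition
# (schema.create expects a full schema with a top-level "classes" list)
client.schema.create_class({
    "class": "Document",
    "vectorizer": "none",  # vectors are supplied by us, not by a module
    "properties": [
        {
            "name": "content",
            "dataType": ["text"]
        },
        {
            "name": "embedding",
            "dataType": ["number[]"]
        }
    ]
})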

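Why this structure? The “content” property holds your text data, and the “embedding” property keeps a copy of each document’s vector as plain data. Keep in mind that vector search does not run over a number[] property: it runs against the vector attached to the object itself, which you supply through the vector argument when you create the object. If your data has a different shape, adjust the property list before importing anything, because changing a property’s type later generally means recreating the class.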
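Running the class-creation call a second time fails because the class already exists, which is easy to trip over while iterating. A small guard avoids that. This is a sketch assuming the v3 client's client.schema.get() returns a dict with a "classes" list and that client.schema.create_class() creates one class; the helper names are hypothetical.

```python
def class_exists(schema, class_name):
    """Check a schema dict (shape: {"classes": [{"class": ...}, ...]}) for a class."""
    return any(c.get("class") == class_name for c in schema.get("classes") or [])


def ensure_class(client, class_definition):
    """Create the class only if the running instance does not have it yet.

    Returns True when the class was created, False when it was already there.
    """
    if class_exists(client.schema.get(), class_definition["class"]):
        return False
    client.schema.create_class(class_definition)
    return True
```

With this in place, ensure_class(client, {...}) can replace the direct call, and rerunning the script no longer raises on the schema step.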

Step 4: Add Data to Weaviate

Let’s put some documents into your class. This snippet adds each document together with its embedding, which you can generate locally with a model like BERT or request from an external embedding service.


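import numpy as np

# Random 300-dimensional vectors stand in for real embeddings
documents = [
    {"content": "This is a document about AI.", "embedding": np.random.rand(300).tolist()},
    {"content": "Another paper discussing machine learning.", "embedding": np.random.rand(300).tolist()}
]

for doc in documents:
    # vector= attaches the embedding to the object so near-vector search can use it
    client.data_object.create(doc, class_name="Document", vector=doc["embedding"])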

Notice how I used random embeddings (not recommended for production)? If I had a dime for every time I’ve done something dumb like that, I could retire. In real applications, these embeddings should come from a pretrained transformer model.
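To make the "pretrained transformer model" part concrete, here is one way the random vectors could be swapped out. It is a sketch, not the only option: it assumes the sentence-transformers package and its all-MiniLM-L6-v2 model, neither of which the steps above require, and both function names are invented for this example. The encoder is passed in as a plain function so the pairing logic can be tried without downloading a model.

```python
def load_sentence_encoder(model_name="all-MiniLM-L6-v2"):
    """Return a function mapping a list of strings to a list of float vectors.

    Needs `pip install sentence-transformers`; the import is lazy so the
    rest of this sketch runs without the package installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(list(texts)).tolist()


def attach_embeddings(texts, encode):
    """Pair each text with the vector produced by `encode`."""
    vectors = encode(texts)
    if len(vectors) != len(texts):
        raise ValueError("encoder returned a different number of vectors")
    return [{"content": t, "embedding": v} for t, v in zip(texts, vectors)]
```

Usage would look like documents = attach_embeddings(["This is a document about AI."], load_sentence_encoder()). Note that all-MiniLM-L6-v2 produces 384-dimensional vectors rather than 300, and the query vector has to come from the same model as the stored ones.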

Step 5: Querying with Semantic Search

Finally, you can search for documents semantically using Weaviate’s vector search capabilities. The following example queries the database:


query_vector = np.random.rand(300).tolist() # Replace with your actual query embedding
response = client.query.get("Document", ["content"]).with_near_vector({"vector": query_vector}).do()
print(response)

This allows you to find documents based on the meaning behind the query rather than just matching keywords. It’s a huge win for applications that deal with complex datasets and natural language.
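The printed response is a nested GraphQL-style dict, so a small helper makes the results easier to work with. This sketch assumes the v3 client's usual response shape ({"data": {"Get": {"Document": [...]}}}, with failures reported under "errors"); the helper name is hypothetical. The distance is only present if the query asks for it, for example by adding .with_additional(["distance"]) before .do().

```python
def extract_hits(response, class_name="Document"):
    """Flatten a Get response into a list of (content, distance) tuples.

    `distance` is None when the query did not request it under _additional.
    """
    if "errors" in response:
        raise RuntimeError(f"Weaviate returned errors: {response['errors']}")
    objects = response.get("data", {}).get("Get", {}).get(class_name) or []
    hits = []
    for obj in objects:
        distance = (obj.get("_additional") or {}).get("distance")
        hits.append((obj.get("content"), distance))
    return hits
```

Results come back ordered from closest to farthest, so the first tuple is the best match. With random vectors that ordering is meaningless; it only becomes useful once real embeddings are in place.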

The Gotchas

Document Size: Weaviate has constraints on the size of individual objects, so a very large document can be rejected on import. Split long documents into smaller parts when needed.
Embedding Quality: Garbage in, garbage out. Poor-quality embeddings will result in irrelevant search results. Be meticulous when choosing a model for generating embeddings.
Indexing Time: Depending on the size of your dataset, indexing can take time. Don’t expect instant results after adding a bunch of documents. Patience is key here.
Open Issues: Regularly check for open issues on the Weaviate GitHub page. At the time of writing, there are 579 open issues, some of which could affect your project.
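For the Document Size gotcha, splitting can be as simple as a sliding window over words. The sketch below is one possible approach rather than anything Weaviate prescribes; the function name and the default sizes (200 words per chunk, 20 words of overlap) are arbitrary assumptions to tune for your embedding model.

```python
def chunk_text(text, max_words=200, overlap=20):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk then becomes its own object with its own embedding, ideally carrying a property that points back to the source document.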

Full Code Example


import weaviate
import numpy as np

# Create Weaviate client (v3 Python client API)
client = weaviate.Client("http://localhost:8080")

# Create the class; create_class takes a single class definition
client.schema.create_class({
    "class": "Document",
    "vectorizer": "none",  # vectors are supplied by us, not by a module
    "properties": [
        {
            "name": "content",
            "dataType": ["text"]
        },
        {
            "name": "embedding",
            "dataType": ["number[]"]
        }
    ]
})

# Add documents with random embeddings (placeholders, not for production)
documents = [
    {"content": "This is a document about AI.", "embedding": np.random.rand(300).tolist()},
    {"content": "Another paper discussing machine learning.", "embedding": np.random.rand(300).tolist()}
]

for doc in documents:
    # vector= attaches the embedding to the object so near-vector search can use it
    client.data_object.create(doc, class_name="Document", vector=doc["embedding"])

# Querying using a random embedding
query_vector = np.random.rand(300).tolist()
response = client.query.get("Document", ["content"]).with_near_vector({"vector": query_vector}).do()
print(response)

What’s Next?

Try integrating a proper embedding generation process using a model like Sentence Transformers. That’ll enhance your search results significantly.

FAQ

1. Can Weaviate handle large datasets?

Yes, but performance will depend on your hardware and the quality of embeddings. Always monitor performance when scaling up.

2. Is Weaviate free to use?

Weaviate is open-source, so yes, you can self-host it for free, but make sure to check the licensing. Currently, it’s released under the BSD-3-Clause license.

3. How do I visualize the vector data?

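Weaviate’s GraphQL API can return each object’s vector when you request it under _additional { vector }, but it does not plot anything for you. A common approach is to fetch the vectors, project them to two dimensions with PCA, t-SNE, or UMAP, and chart the result in the tool of your choice. Grafana is better suited to dashboards of Weaviate’s operational metrics than to the vectors themselves.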

Data Sources

Last updated May 03, 2026. Data sourced from official docs and community benchmarks.

🕒 Published: May 3, 2026


Written by Jake Chen

AI technology writer and researcher.
