Introduction
Imagine if your search could think. When users type "remote work best practices," they'd instantly find your article titled "Telecommuting Strategies for Modern Teams" because the search understands the conceptual relationship between these phrases.
This is semantic search in action: technology that comprehends meaning rather than just matching keywords.
In this guide, you'll build a Strapi 5 plugin that brings semantic search to your content.
The system automatically generates AI embeddings when content is created and provides lightning-fast search through clean REST APIs. You'll finish with a production-ready plugin that transforms how users discover your content, complete with configurable field mapping.
Prerequisites
Before we dive in, you'll need:
- Strapi 5 project (we'll use the latest stable version)
- OpenAI API key with access to the embeddings endpoint
- Node.js 18+ and npm
- Basic JavaScript/TypeScript knowledge
- Understanding of Strapi 5 plugin development (helpful but not required)
We'll be working with OpenAI's text-embedding-ada-002 model, which generates 1536-dimensional vectors from text. Each embedding costs about $0.0001 per 1000 tokens, making this economical even for large content libraries.
Step 1: Scaffolding Your Strapi Semantic Search Plugin
Let's start by creating our plugin structure. We'll build this from scratch to understand every component.
1. Create Plugin Directory Structure
First, cd into your Strapi project folder, then create the plugin directory structure:

```shell
mkdir -p src/plugins/semantic-search/server/src/{controllers,services,routes}
cd src/plugins/semantic-search
```
Next, set up the plugin's package.json (`./src/plugins/semantic-search/package.json`):

```json
{
  "name": "strapi-plugin-semantic-search",
  "version": "1.1.0",
  "description": "Intelligent semantic search plugin for Strapi 5 powered by OpenAI embeddings",
  "main": "strapi-server.js",
  "keywords": [
    "strapi",
    "plugin",
    "semantic-search",
    "search",
    "embedding",
    "openai",
    "ai"
  ],
  "dependencies": {
    "axios": "^1.10.0",
    "openai": "^5.8.2"
  },
  "engines": {
    "node": ">=18.0.0 <=22.x.x"
  },
  "strapi": {
    "displayName": "Semantic Search",
    "name": "semantic-search",
    "description": "Add semantic search capabilities to your Strapi content using OpenAI embeddings",
    "kind": "plugin"
  }
}
```
2. Install the dependencies:

```shell
npm install
```
3. Create Plugin Entry Point
Create the plugin entry point (`src/plugins/semantic-search/strapi-server.js`) and add the following code:

```javascript
// Path: ./src/plugins/semantic-search/strapi-server.js

module.exports = require("./server");
```
This file instructs Strapi on where to locate our server-side plugin code. Every Strapi plugin needs this entry point.
4. Enable Plugin in Strapi
Now enable the plugin in your main Strapi configuration (`config/plugins.js`):

```javascript
// ./config/plugins.js

module.exports = ({ env }) => ({
  "semantic-search": {
    enabled: true,
    resolve: "./src/plugins/semantic-search",
  },
});
```
5. Add OpenAI API key
Add your OpenAI API key to your `.env` file:

```shell
OPENAI_API_KEY=your_openai_api_key_here
```
Step 2: Create Embedding and Vector Services for OpenAI Semantic Search
The heart of our plugin lies in three interconnected services. Think of them as specialized workers: one talks to OpenAI, another handles vector math, and the third orchestrates search operations.
1. Create Index File for Services
Create the services index file (`server/src/services/index.js`):

```javascript
// Path: ./src/plugins/semantic-search/server/src/services/index.js

"use strict";

const embeddingService = require("./embedding-service");
const vectorService = require("./vector-service");
const searchService = require("./search-service");

module.exports = {
  embeddingService,
  vectorService,
  searchService,
};
```
2. Create Embedding Service: OpenAI Integration
This service handles all OpenAI communication and text preprocessing. We'll build it in three parts within the same `server/src/services/embedding-service.js` file: initialization, text preprocessing, and embedding generation.
Service Initialization
First, set up the OpenAI client and validate the API key. Create `./src/plugins/semantic-search/server/src/services/embedding-service.js` and start with the initialization code:

```javascript
// Path: ./src/plugins/semantic-search/server/src/services/embedding-service.js

'use strict';

const { OpenAI } = require('openai');

module.exports = ({ strapi }) => ({

  init() {
    if (!process.env.OPENAI_API_KEY) {
      throw new Error('OPENAI_API_KEY environment variable is required');
    }

    this.openai = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
    });

    strapi.log.info('OpenAI embedding service initialized');
  },
```
The initialization validates that your API key exists and creates the OpenAI client that we'll use for all embedding requests.
Text Preprocessing
Add the preprocessing function to the same `embedding-service.js` file:

```javascript
// Path: ./src/plugins/semantic-search/server/src/services/embedding-service.js

// ...

  preprocessText(text) {
    if (!text) return '';

    // Remove HTML tags and normalize whitespace
    const cleaned = text
      .replace(/<[^>]*>/g, ' ') // Remove HTML tags
      .replace(/\s+/g, ' ')     // Normalize whitespace
      .trim();

    // Limit to 8000 characters for safety (well under OpenAI's token limit)
    return cleaned.substring(0, 8000);
  },
```
This preprocessing removes HTML tags from rich text fields and normalizes whitespace. The 8000-character limit provides a safety buffer well under OpenAI's token limits.
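You can verify the preprocessing behavior in isolation. This is a standalone copy of the function, mirroring the service method so you can see exactly what reaches OpenAI; it is a sketch for experimentation, not the plugin service itself:

```javascript
// Standalone mirror of preprocessText: strip HTML, collapse whitespace, truncate.
function preprocessText(text) {
  if (!text) return '';
  const cleaned = text
    .replace(/<[^>]*>/g, ' ') // strip HTML tags
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim();
  return cleaned.substring(0, 8000); // stay well under the token limit
}

console.log(preprocessText('<h1>Hello</h1>\n<p>World   wide</p>'));
// → "Hello World wide"
```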
Embedding Generation
Finally, add the core embedding generation function to complete the `embedding-service.js` file:

```javascript
// Path: ./src/plugins/semantic-search/server/src/services/embedding-service.js

// ...

  async generateEmbedding(text) {
    if (!text || text.trim().length === 0) {
      throw new Error('Text is required for embedding generation');
    }

    const processedText = this.preprocessText(text);
    const originalLength = text.length;
    const processedLength = processedText.length;

    try {
      const response = await this.openai.embeddings.create({
        model: 'text-embedding-ada-002',
        input: processedText,
      });

      const embedding = response.data[0].embedding;

      strapi.log.debug(`Generated embedding: ${embedding.length} dimensions`);

      return {
        embedding,
        processedText,
        originalLength,
        processedLength,
      };
    } catch (error) {
      strapi.log.error('OpenAI embedding generation failed:', error.message);
      throw error;
    }
  },
});
```
This function sends preprocessed text to OpenAI and returns both the 1536-dimensional embedding and metadata about the processing.
3. Create Vector Service: Mathematics and Storage
The vector service handles similarity calculations and database operations. We'll build three key functions within `./src/plugins/semantic-search/server/src/services/vector-service.js`: similarity calculation, storage, and search.
Cosine Similarity Calculation
Start with the mathematical foundation of semantic search. Create `./src/plugins/semantic-search/server/src/services/vector-service.js`:

```javascript
// Path: ./src/plugins/semantic-search/server/src/services/vector-service.js

'use strict';

module.exports = ({ strapi }) => ({

  calculateCosineSimilarity(vectorA, vectorB) {
    if (!vectorA || !vectorB || vectorA.length !== vectorB.length) {
      throw new Error('Invalid vectors for similarity calculation');
    }

    let dotProduct = 0;
    let magnitudeA = 0;
    let magnitudeB = 0;

    for (let i = 0; i < vectorA.length; i++) {
      dotProduct += vectorA[i] * vectorB[i];
      magnitudeA += vectorA[i] * vectorA[i];
      magnitudeB += vectorB[i] * vectorB[i];
    }

    magnitudeA = Math.sqrt(magnitudeA);
    magnitudeB = Math.sqrt(magnitudeB);

    if (magnitudeA === 0 || magnitudeB === 0) {
      return 0;
    }

    return dotProduct / (magnitudeA * magnitudeB);
  },
```
Cosine similarity measures the angle between two vectors, focusing on direction rather than magnitude. This is like comparing where two people are pointing rather than how far their arms extend: direction matters, not distance.
A score of 1.0 means identical meaning, 0.0 means no relationship, and -1.0 means opposite concepts. For content discovery, you'll typically see scores between 0.3 (somewhat related) and 0.95 (highly similar but not identical).
Why cosine similarity for semantic search? Unlike Euclidean distance, cosine similarity ignores magnitude differences. Two articles about machine learning—one with 50 mentions of "AI" and another with 5 mentions—are conceptually similar despite different intensities. They point in the same direction in vector space.
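You can see the magnitude-invariance point directly with a standalone copy of the similarity math (the three-dimensional vectors here are toy values for illustration, not real embeddings):

```javascript
// Standalone mirror of calculateCosineSimilarity, demonstrating that scaling
// a vector changes its length but not its direction, so the score stays 1.0.
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  magA = Math.sqrt(magA);
  magB = Math.sqrt(magB);
  return magA === 0 || magB === 0 ? 0 : dot / (magA * magB);
}

const article = [0.2, 0.8, 0.1];
const louder = article.map((x) => x * 10); // same direction, 10x the magnitude

console.log(cosineSimilarity(article, louder).toFixed(4)); // → "1.0000"
console.log(cosineSimilarity(article, [0.8, 0.2, 0.1]) < 1); // → true (different direction)
```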
4. Create Function to Store Embeddings: Embedding Storage
Add the function to store embeddings in your Strapi database to the same `vector-service.js` file:

```javascript
// Path: ./src/plugins/semantic-search/server/src/services/vector-service.js

// ...

  async storeEmbedding(documentId, embedding, contentType, metadata = {}) {
    if (!documentId || !embedding || !contentType) {
      throw new Error('Document ID, embedding, and content type are required');
    }

    try {
      const updated = await strapi.documents(contentType).update({
        documentId: documentId,
        data: {
          embedding: embedding,
          embeddingMetadata: {
            model: 'text-embedding-ada-002',
            generatedAt: new Date().toISOString(),
            dimensions: embedding.length,
            ...metadata,
          },
        },
      });

      strapi.log.debug(`Stored embedding for ${contentType} document ${documentId}`);
      return updated;
    } catch (error) {
      strapi.log.error(`Failed to store embedding for ${contentType} document ${documentId}:`, error.message);
      throw error;
    }
  },
```
This function saves both the embedding vector and helpful metadata like generation timestamp and model information.
5. Create Similarity Search Function
Finally, add the search function that finds similar content to complete the `vector-service.js` file:

```javascript
// Path: ./src/plugins/semantic-search/server/src/services/vector-service.js

// ...

  async searchSimilar(queryEmbedding, contentType, options = {}) {
    const {
      limit = 10,
      threshold = 0.1,
      filters = {},
      locale = null,
    } = options;

    try {
      const documents = await strapi.documents(contentType).findMany({
        filters: {
          embedding: { $notNull: true },
          ...filters,
        },
        locale: locale,
        limit: 1000, // Fetch up to 1000 documents for similarity comparison
      });

      if (!documents || documents.length === 0) {
        return [];
      }

      const scoredResults = documents
        .map((doc) => {
          if (!doc.embedding) return null;

          try {
            const similarity = this.calculateCosineSimilarity(queryEmbedding, doc.embedding);

            return {
              ...doc,
              similarityScore: similarity,
            };
          } catch (error) {
            strapi.log.warn(`Failed to calculate similarity for document ${doc.documentId}:`, error.message);
            return null;
          }
        })
        .filter((result) => result !== null && result.similarityScore >= threshold)
        .sort((a, b) => b.similarityScore - a.similarityScore)
        .slice(0, limit);

      strapi.log.debug(`Found ${scoredResults.length} similar documents for ${contentType}`);

      return scoredResults;
    } catch (error) {
      strapi.log.error(`Failed to search similar documents for ${contentType}:`, error.message);
      throw error;
    }
  },
});
```
This function fetches documents with embeddings, calculates similarity scores, filters by threshold, and returns the most relevant results sorted by score.
Strapi logs during startup, highlighting the embedding service initialization message.
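One gap worth closing now: the `/stats` endpoint we build in Step 4 calls a `getEmbeddingStats` method on this vector service, which the walkthrough doesn't otherwise define. Here is a minimal sketch of what it could look like; it's written as a standalone function taking `strapi` as a parameter so it can run outside Strapi (inside `vector-service.js` it would close over `strapi` like the other methods), and the response shape is an assumption, not the published plugin's API:

```javascript
// Hypothetical getEmbeddingStats helper: counts documents with and without
// stored embeddings via the Strapi 5 Document Service and reports coverage.
async function getEmbeddingStats(strapi, contentType) {
  const totalDocuments = await strapi.documents(contentType).count({});
  const documentsWithEmbeddings = await strapi.documents(contentType).count({
    filters: { embedding: { $notNull: true } },
  });

  return {
    contentType,
    totalDocuments,
    documentsWithEmbeddings,
    coveragePercent:
      totalDocuments === 0
        ? 0
        : Math.round((documentsWithEmbeddings / totalDocuments) * 100),
  };
}
```

Wired into the service, this is what the stats controller's `vectorService.getEmbeddingStats(contentType)` call would return.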
6. Create Search Service: Orchestration
The search service ties everything together, coordinating between the embedding and vector services. We'll build two main functions: single content type search and multi-content type search.
Single Content Type Search
Start with the core search functionality. Create `./src/plugins/semantic-search/server/src/services/search-service.js`:

```javascript
// Path: ./src/plugins/semantic-search/server/src/services/search-service.js

'use strict';

module.exports = ({ strapi }) => ({

  async semanticSearch(query, contentType, options = {}) {
    if (!query || !contentType) {
      throw new Error('Query and content type are required');
    }

    const startTime = Date.now();

    try {
      // Generate embedding for the search query
      const embeddingService = strapi.plugin('semantic-search').service('embeddingService');
      const queryEmbedding = await embeddingService.generateEmbedding(query);

      // Search for similar documents
      const vectorService = strapi.plugin('semantic-search').service('vectorService');
      const results = await vectorService.searchSimilar(
        queryEmbedding.embedding,
        contentType,
        options
      );

      const searchTime = Date.now() - startTime;

      strapi.log.info(`Semantic search completed in ${searchTime}ms for query: "${query}"`);

      return {
        query,
        contentType,
        results,
        metadata: {
          totalResults: results.length,
          queryProcessing: {
            originalQuery: query,
            processedText: queryEmbedding.processedText,
            embeddingDimensions: queryEmbedding.embedding.length,
          },
          searchOptions: {
            limit: options.limit || 10,
            threshold: options.threshold || 0.1,
            locale: options.locale || null,
            filtersApplied: Object.keys(options.filters || {}).length > 0,
          },
          performance: {
            searchTime: `${searchTime}ms`,
          },
        },
      };
    } catch (error) {
      strapi.log.error(`Semantic search failed for query "${query}":`, error.message);
      throw error;
    }
  },
```
This function starts by validating inputs and recording the start time for performance tracking. It then retrieves the embedding service from the plugin registry and converts the user's search query into a 1536-dimensional vector through OpenAI's API. With this query embedding in hand, it calls the vector service to find similar documents by calculating cosine similarity scores against all stored document embeddings in the specified content type.
The function wraps up by calculating the total search time and returning a comprehensive response object that includes the matching documents sorted by relevance, along with detailed metadata about query processing, search options applied, and performance metrics for monitoring and debugging purposes.
Multi-Content Type Search
Add the capability to search across multiple content types simultaneously:
```javascript
// Path: ./src/plugins/semantic-search/server/src/services/search-service.js

// ...

  async multiContentTypeSearch(query, contentTypes, options = {}) {
    if (!query || !contentTypes || contentTypes.length === 0) {
      throw new Error('Query and content types are required');
    }

    const { aggregateResults = true, ...searchOptions } = options;

    try {
      const searchPromises = contentTypes.map((contentType) =>
        this.semanticSearch(query, contentType, searchOptions)
      );

      const results = await Promise.all(searchPromises);

      if (aggregateResults) {
        // Combine and sort all results by similarity score
        const allResults = results.flatMap((result) =>
          result.results.map((item) => ({
            ...item,
            sourceContentType: result.contentType,
          }))
        );

        allResults.sort((a, b) => b.similarityScore - a.similarityScore);

        return {
          query,
          contentTypes,
          results: allResults.slice(0, searchOptions.limit || 10),
          metadata: {
            totalResults: allResults.length,
            searchedContentTypes: contentTypes.length,
          },
        };
      }

      return {
        query,
        contentTypes,
        results,
        metadata: {
          searchedContentTypes: contentTypes.length,
        },
      };
    } catch (error) {
      strapi.log.error(`Multi-content type search failed for query "${query}":`, error.message);
      throw error;
    }
  },
});
```
This function performs parallel searches across multiple content types by creating an array of search promises that execute the single semantic search function for each content type concurrently.
When `aggregateResults` is true, it flattens all the individual search results into a single array, adds a `sourceContentType` field to track which content type each result came from, then sorts the entire collection by similarity score to create a unified ranking across all content types.
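The flatten-tag-sort step can be seen in isolation with stubbed per-type result sets (the titles and scores below are made up for illustration):

```javascript
// Standalone sketch of the aggregation step: per-content-type result sets
// are flattened, tagged with their source, and re-ranked into one list.
const perTypeResults = [
  {
    contentType: 'api::article.article',
    results: [{ title: 'React Hooks Guide', similarityScore: 0.91 }],
  },
  {
    contentType: 'api::blog.blog',
    results: [
      { title: 'Intro to Hooks', similarityScore: 0.95 },
      { title: 'CSS Grid Tips', similarityScore: 0.40 },
    ],
  },
];

const merged = perTypeResults
  .flatMap((r) =>
    r.results.map((item) => ({ ...item, sourceContentType: r.contentType }))
  )
  .sort((a, b) => b.similarityScore - a.similarityScore);

console.log(merged.map((m) => m.title));
// → [ 'Intro to Hooks', 'React Hooks Guide', 'CSS Grid Tips' ]
```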
Step 3: Automate Semantic Embedding Generation in Strapi
This automation makes semantic search transparent to content creators. When they save content, the system automatically generates embeddings without any manual intervention.
1. Strapi Plugin Registration and Bootstrap
Start with the main plugin setup. Create `server/src/index.js`:

```javascript
// Path: ./src/plugins/semantic-search/server/src/index.js

"use strict";

const services = require("./services");
const controllers = require("./controllers");
const routes = require("./routes");

module.exports = {
  services,
  controllers,
  routes,

  register({ strapi }) {
    // Initialize the embedding service
    const embeddingService = strapi
      .plugin("semantic-search")
      .service("embeddingService");
    embeddingService.init();
  },

  bootstrap({ strapi }) {
    strapi.log.info("Semantic Search plugin bootstrapped successfully");

    // Register lifecycle hooks for auto-embedding
    registerEmbeddingLifecycles(strapi);
  },
};
```
This plugin entry point follows Strapi's plugin lifecycle: it exports the required services, controllers, and routes, then defines two initialization phases. `register()` initializes the embedding service before the application starts, and `bootstrap()` wires up the auto-embedding lifecycle hooks once everything is loaded.
2. Create Strapi Lifecycle Hook for Plugin
Add the function that registers automatic embedding generation for your content types:
```javascript
// Path: ./src/plugins/semantic-search/server/src/index.js

// ...

function registerEmbeddingLifecycles(strapi) {
  // Get configuration from plugin config
  const config = strapi.config.get("plugin.semantic-search") || {};

  // Default content types and field mappings
  const defaultContentTypes = {
    "api::article.article": ["title", "content", "summary"],
    "api::blog.blog": ["title", "body", "excerpt"],
  };

  // Use configured content types or defaults
  const contentTypes = validateConfiguration(
    config.contentTypes || defaultContentTypes,
    strapi
  );

  // Store the configuration for use in other functions
  strapi.plugin("semantic-search").config = { contentTypes };

  Object.keys(contentTypes).forEach((contentType) => {
    // Use Strapi 5 lifecycle hooks
    strapi.db.lifecycles.subscribe({
      models: [contentType],
      beforeCreate: async (event) => {
        await processDocumentEmbedding(event, "create", strapi);
      },
      beforeUpdate: async (event) => {
        await processDocumentEmbedding(event, "update", strapi);
      },
    });

    strapi.log.info(
      `Registered embedding lifecycle hooks for ${contentType} with fields: ${contentTypes[
        contentType
      ].join(", ")}`
    );
  });
}
```
This function registers hooks that automatically trigger embedding generation when content is created or updated.
3. Create Document Processing Function
Add the core function that processes documents and generates embeddings:
```javascript
// Path: ./src/plugins/semantic-search/server/src/index.js

// ...

async function processDocumentEmbedding(event, action, strapi) {
  const { model, params } = event;
  const data = params.data;

  // Get the model name as string
  const modelName = typeof model === "string" ? model : model.uid;

  // Skip admin and plugin content types
  if (modelName.startsWith("admin::") || modelName.startsWith("plugin::")) {
    return;
  }

  try {
    const embeddingService = strapi
      .plugin("semantic-search")
      .service("embeddingService");

    // Extract text content from the document
    const textContent = extractTextContent(data, modelName, strapi);

    if (!textContent || textContent.trim().length < 10) {
      strapi.log.debug(
        `Skipping embedding generation for ${modelName} - insufficient text content`
      );
      return;
    }

    // Generate embedding
    const embeddingResult = await embeddingService.generateEmbedding(
      textContent
    );

    if (embeddingResult && embeddingResult.embedding) {
      // Add embedding to the data that will be saved
      data.embedding = embeddingResult.embedding;
      data.embeddingMetadata = {
        model: "text-embedding-ada-002",
        generatedAt: new Date().toISOString(),
        dimensions: embeddingResult.embedding.length,
        processedText: embeddingResult.processedText,
        originalLength: embeddingResult.originalLength,
        processedLength: embeddingResult.processedLength,
      };

      strapi.log.info(
        `Generated embedding for ${modelName} document (${action})`
      );
    }
  } catch (error) {
    strapi.log.error(
      `Failed to generate embedding for ${modelName} document:`,
      error.message
    );
    // Don't throw error - we don't want to break content creation/update
  }
}
```
4. Create Text Extraction Function
Add the function that extracts searchable text from configured fields:
```javascript
// Path: ./src/plugins/semantic-search/server/src/index.js

// ...

function extractTextContent(data, modelName, strapi) {
  let textContent = "";

  // Get configured field mappings for this content type
  const config = strapi.plugin("semantic-search").config || {};
  const contentTypes = config.contentTypes || {};

  // Get fields for this specific content type, or use defaults
  const textFields = contentTypes[modelName] || [
    "title",
    "name",
    "content",
    "body",
    "summary",
    "description",
    "excerpt",
  ];

  textFields.forEach((field) => {
    if (data[field]) {
      if (typeof data[field] === "string") {
        textContent += data[field] + " ";
      } else if (Array.isArray(data[field])) {
        // Handle rich text blocks or arrays
        textContent += JSON.stringify(data[field]) + " ";
      } else if (typeof data[field] === "object") {
        // Handle nested objects (like rich text)
        textContent += JSON.stringify(data[field]) + " ";
      }
    }
  });

  return textContent.trim();
}
```
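To see what the extractor actually hands to the embedding service, you can run the same field walk against a stub entry (the entry data and the `extractText` name here are illustrative, not part of the plugin):

```javascript
// Standalone run of the field-extraction logic above against a stub draft.
function extractText(data, textFields) {
  let textContent = "";
  textFields.forEach((field) => {
    if (data[field]) {
      if (typeof data[field] === "string") {
        textContent += data[field] + " ";
      } else if (Array.isArray(data[field]) || typeof data[field] === "object") {
        // Rich text blocks are serialized wholesale, keys included
        textContent += JSON.stringify(data[field]) + " ";
      }
    }
  });
  return textContent.trim();
}

const draft = {
  title: "Getting Started with React Hooks",
  summary: "A comprehensive guide",
  viewCount: 42, // numeric field: truthy, but skipped by the type checks
};

console.log(extractText(draft, ["title", "content", "summary", "viewCount"]));
// → "Getting Started with React Hooks A comprehensive guide"
```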
5. Create Plugin Configuration Validation Function
Finally, add the validation function that ensures plugin configuration is correct:
```javascript
// Path: ./src/plugins/semantic-search/server/src/index.js

// ...

function validateConfiguration(contentTypes, strapi) {
  if (!contentTypes || typeof contentTypes !== "object") {
    strapi.log.warn(
      "Semantic Search: Invalid contentTypes configuration, using defaults"
    );
    return {
      "api::article.article": ["title", "content", "summary"],
      "api::blog.blog": ["title", "body", "excerpt"],
    };
  }

  const validatedConfig = {};

  Object.keys(contentTypes).forEach((contentType) => {
    // Validate content type format
    if (!contentType.startsWith("api::") || !contentType.includes(".")) {
      strapi.log.warn(
        `Semantic Search: Invalid content type format: ${contentType}. Should be like 'api::article.article'`
      );
      return;
    }

    // Validate fields array
    const fields = contentTypes[contentType];
    if (!Array.isArray(fields) || fields.length === 0) {
      strapi.log.warn(
        `Semantic Search: Invalid fields for ${contentType}. Should be an array of field names`
      );
      return;
    }

    // Validate field names
    const validFields = fields.filter((field) => {
      if (typeof field !== "string" || field.trim() === "") {
        strapi.log.warn(
          `Semantic Search: Invalid field name '${field}' for ${contentType}`
        );
        return false;
      }
      return true;
    });

    if (validFields.length > 0) {
      validatedConfig[contentType] = validFields;
      strapi.log.info(
        `Semantic Search: Validated configuration for ${contentType}: ${validFields.join(
          ", "
        )}`
      );
    }
  });

  return validatedConfig;
}
```
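The validation rules boil down to three checks, which a trimmed standalone version makes easy to exercise (this sketch drops the logging and the `strapi` dependency; `validateContentTypes` is an illustrative name, not the plugin's function):

```javascript
// Trimmed, standalone version of the validation rules: keep only entries
// whose key looks like an api::<name>.<name> UID and whose value is a
// non-empty array of non-blank field-name strings.
function validateContentTypes(contentTypes) {
  const validated = {};
  for (const [uid, fields] of Object.entries(contentTypes || {})) {
    if (!uid.startsWith("api::") || !uid.includes(".")) continue; // bad UID
    if (!Array.isArray(fields)) continue;                          // bad fields value
    const valid = fields.filter((f) => typeof f === "string" && f.trim() !== "");
    if (valid.length > 0) validated[uid] = valid;
  }
  return validated;
}

console.log(validateContentTypes({
  "api::article.article": ["title", "", "summary"], // blank field dropped
  "not-a-uid": ["title"],                           // whole entry dropped
}));
// → { 'api::article.article': [ 'title', 'summary' ] }
```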
Step 4: Build RESTful Semantic Search APIs in Strapi
Now we'll expose our semantic search capabilities through clean REST APIs.
1. Create Controller
Create the controller (`./src/plugins/semantic-search/server/src/controllers/search-controller.js`) and add the following code:

```javascript
// Path: ./src/plugins/semantic-search/server/src/controllers/search-controller.js

"use strict";

module.exports = ({ strapi }) => ({
  async search(ctx) {
    try {
      const {
        query,
        contentType,
        limit = 10,
        threshold = 0.1,
        filters = {},
      } = ctx.request.body;

      // Validate required parameters
      if (!query || !contentType) {
        return ctx.badRequest("Query and contentType are required");
      }

      // Validate limits
      if (limit > 50) {
        return ctx.badRequest("Limit cannot exceed 50");
      }

      const searchService = strapi
        .plugin("semantic-search")
        .service("searchService");
      const results = await searchService.semanticSearch(query, contentType, {
        limit,
        threshold,
        filters,
        locale: ctx.request.query.locale,
      });

      ctx.body = {
        success: true,
        data: results,
      };
    } catch (error) {
      strapi.log.error("Search API error:", error.message);
      ctx.badRequest(error.message);
    }
  },

  async multiSearch(ctx) {
    try {
      const {
        query,
        contentTypes,
        limit = 10,
        threshold = 0.1,
        aggregateResults = true,
      } = ctx.request.body;

      // Validate required parameters
      if (!query || !contentTypes || !Array.isArray(contentTypes)) {
        return ctx.badRequest("Query and contentTypes array are required");
      }

      if (contentTypes.length === 0) {
        return ctx.badRequest("At least one content type is required");
      }

      const searchService = strapi
        .plugin("semantic-search")
        .service("searchService");
      const results = await searchService.multiContentTypeSearch(
        query,
        contentTypes,
        {
          limit,
          threshold,
          aggregateResults,
          locale: ctx.request.query.locale,
        }
      );

      ctx.body = {
        success: true,
        data: results,
      };
    } catch (error) {
      strapi.log.error("Multi-search API error:", error.message);
      ctx.badRequest(error.message);
    }
  },

  async stats(ctx) {
    try {
      const { contentType } = ctx.request.query;

      const vectorService = strapi
        .plugin("semantic-search")
        .service("vectorService");
      const stats = await vectorService.getEmbeddingStats(contentType);

      ctx.body = {
        success: true,
        data: stats,
      };
    } catch (error) {
      strapi.log.error("Stats API error:", error.message);
      ctx.badRequest(error.message);
    }
  },
});
```
2. Create Index Controller
Create the controller index (`./src/plugins/semantic-search/server/src/controllers/index.js`):

```javascript
// Path: ./src/plugins/semantic-search/server/src/controllers/index.js

"use strict";

const searchController = require("./search-controller");

module.exports = {
  searchController,
};
```
3. Define Routes
Now define the routes (`./src/plugins/semantic-search/server/src/routes/index.js`):

```javascript
// Path: ./src/plugins/semantic-search/server/src/routes/index.js

"use strict";

module.exports = [
  {
    method: "POST",
    path: "/search",
    handler: "searchController.search",
    config: {
      policies: [],
    },
  },
  {
    method: "POST",
    path: "/multi-search",
    handler: "searchController.multiSearch",
    config: {
      policies: [],
    },
  },
  {
    method: "GET",
    path: "/stats",
    handler: "searchController.stats",
    config: {
      policies: [],
    },
  },
];
```
4. Create Server Index File
Finally, create the server index (`server/index.js`):

```javascript
// Path: ./src/plugins/semantic-search/server/index.js

"use strict";

module.exports = require("./src");
```
The API design follows REST principles with clear endpoints for different search operations. The validation ensures that malformed requests don't reach the OpenAI API, protecting both performance and costs.
Step 5: Testing and Validation
Let's test our plugin to ensure everything works correctly. First, add the embedding fields to your content types through the Strapi admin panel:
- Navigate to Content Type Builder
- Select your content type (e.g., Article)
- Add two new fields:
  - `embedding`: JSON field
  - `embeddingMetadata`: JSON field
Restart your Strapi development server:

```shell
npm run develop
```
Look for these log messages confirming the plugin loaded correctly:

```
[INFO] OpenAI embedding service initialized
[INFO] Semantic Search plugin bootstrapped successfully
[INFO] Semantic Search: Validated configuration for api::article.article: title, content, summary
[INFO] Registered embedding lifecycle hooks for api::article.article with fields: title, content, summary
```
Create a test article with content like:
- Title: "Getting Started with React Hooks"
- Content: "React Hooks revolutionized how we write React components..."
- Summary: "A comprehensive guide to understanding React Hooks"
When you save the article, you should see:

```
[INFO] Generated embedding for api::article.article document (create)
```
Now test the search API using curl:

```shell
curl -X POST http://localhost:1337/api/semantic-search/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "React components and hooks",
    "contentType": "api::article.article",
    "limit": 5
  }'
```
You should get a response like:

```json
{
  "success": true,
  "data": {
    "query": "React components and hooks",
    "contentType": "api::article.article",
    "results": [
      {
        "id": 1,
        "title": "Getting Started with React Hooks",
        "content": "React Hooks revolutionized...",
        "similarityScore": 0.8945,
        "createdAt": "2025-01-15T10:30:00.000Z"
      }
    ],
    "metadata": {
      "totalResults": 1,
      "queryProcessing": {
        "embeddingDimensions": 1536
      }
    }
  }
}
```
Test the statistics endpoint:

```shell
curl http://localhost:1337/api/semantic-search/stats
```
This should return embedding coverage statistics for all your content types.
The similarity scores tell you how semantically related the results are to your query:
- 0.85-1.0: Highly relevant (direct topic match)
- 0.75-0.85: Relevant (related concepts)
- 0.65-0.75: Somewhat relevant (tangential connection)
- Below 0.65: Low relevance
Step 6: Configuration and Customization
The real power of our plugin lies in its configurability. You can customize which content types to process and which fields to extract text from.
Create a custom configuration in your `config/plugins.js`:

```javascript
module.exports = ({ env }) => ({
  "semantic-search": {
    enabled: true,
    resolve: "./src/plugins/semantic-search",
    config: {
      contentTypes: {
        "api::article.article": ["title", "content", "summary", "tags"],
        "api::blog.blog": ["title", "body", "excerpt", "category"],
        "api::product.product": ["name", "description", "features", "benefits"],
        "api::course.course": ["title", "overview", "learningOutcomes"],
      },
    },
  },
});
```
This configuration tells the plugin:

- Articles: process `title`, `content`, `summary`, and `tags`
- Blog posts: process `title`, `body`, `excerpt`, and `category`
- Products: process `name`, `description`, `features`, and `benefits`
- Courses: process `title`, `overview`, and `learningOutcomes`
Restart Strapi and you'll see the new field mappings in the logs:

```
[INFO] Semantic Search: Validated configuration for api::product.product: name, description, features, benefits
[INFO] Registered embedding lifecycle hooks for api::product.product with fields: name, description, features, benefits
```
Performance Optimization
For production deployments, consider these optimizations:
- Batch Processing: Process multiple documents at once during large imports
- Caching: Cache frequently searched embeddings to reduce API calls
- Database Indexing: Add indexes on frequently filtered fields
- Rate Limiting: Implement rate limiting to prevent OpenAI API overuse
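The caching idea can be sketched as a small memoization layer in front of the embedding service. This is a hypothetical helper, not part of the plugin; `generateEmbedding` stands in for the embedding service method:

```javascript
// Hypothetical query-embedding cache: repeated searches for the same query
// reuse the stored vector instead of calling OpenAI again. Map preserves
// insertion order, so evicting the first key drops the oldest entry.
const queryCache = new Map();

async function cachedEmbedding(text, generateEmbedding, maxEntries = 500) {
  if (queryCache.has(text)) return queryCache.get(text);

  const result = await generateEmbedding(text);

  if (queryCache.size >= maxEntries) {
    queryCache.delete(queryCache.keys().next().value); // evict oldest entry
  }
  queryCache.set(text, result);
  return result;
}
```

For user-facing search traffic, where a handful of queries tend to repeat, even a small cache like this eliminates most duplicate embedding calls.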
Cost Management
Monitor your OpenAI usage carefully. Embedding generation costs approximately $0.0001 per 1,000 tokens. A typical 1,000-word article is roughly 1,300 tokens, so expect costs on the order of $0.0001 to $0.0002 per embedding.
Set up usage alerts in your OpenAI dashboard to track costs and implement caching strategies for frequently accessed content.
GitHub Repo and Plugin Installation
The complete source code is available on GitHub and published as an npm package: strapi-plugin-semantic-search.
Install the plugin from npm:

```shell
npm install strapi-plugin-semantic-search
```
Then enable the plugin in your `config/plugins.js`:

```javascript
module.exports = ({ env }) => ({
  'semantic-search': {
    enabled: true,
  },
});
```
Conclusion
The future of content discovery is semantic, and you've just built the bridge to get there: a Strapi plugin that automatically generates AI embeddings for your content and serves fast, meaning-aware search through clean REST APIs.
Next steps for extending the plugin:
- Add admin panel components for search testing
- Implement webhook support for external integrations
- Add support for image and audio content embeddings
- Integrate with external vector databases for larger datasets
- Build recommendation engines based on content similarity
This foundation enables unlimited possibilities for AI-powered content experiences. Your users will discover content they never knew existed, and you'll see the measurable impact on engagement and satisfaction.
Software Engineer focused on building web applications and writing technical content.