Yes, that will work. I also suggest using PostgreSQL with the vector type (from the pgvector extension), and then you can do magic: semantic search becomes an ordinary SQL query. Here is the schema for embedding types:
                                                        Table "public.embeddingtypes"
┌─────────────────────────────┬──────────────────────────┬───────────┬──────────┬───────────────────────────────────────────────────────────┐
│           Column            │           Type           │ Collation │ Nullable │                          Default                          │
├─────────────────────────────┼──────────────────────────┼───────────┼──────────┼───────────────────────────────────────────────────────────┤
│ embeddingtypes_id           │ integer                  │           │ not null │ nextval('embeddingtypes_embeddingtypes_id_seq'::regclass) │
│ embeddingtypes_uuid         │ uuid                     │           │ not null │ gen_random_uuid()                                         │
│ embeddingtypes_datecreated  │ timestamp with time zone │           │ not null │ CURRENT_TIMESTAMP                                         │
│ embeddingtypes_datemodified │ timestamp with time zone │           │          │                                                           │
│ embeddingtypes_usercreated  │ text                     │           │ not null │ CURRENT_USER                                              │
│ embeddingtypes_usermodified │ text                     │           │ not null │ CURRENT_USER                                              │
│ embeddingtypes_name         │ text                     │           │ not null │                                                           │
│ embeddingtypes_description  │ text                     │           │          │                                                           │
└─────────────────────────────┴──────────────────────────┴───────────┴──────────┴───────────────────────────────────────────────────────────┘
Indexes:
    "embeddingtypes_pkey" PRIMARY KEY, btree (embeddingtypes_id)
    "embeddingtypes_embeddingtypes_uuid_key" UNIQUE CONSTRAINT, btree (embeddingtypes_uuid)
Referenced by:
    TABLE "embeddings" CONSTRAINT "embeddings_embeddingtypes_fkey" FOREIGN KEY (embeddings_embeddingtypes) REFERENCES embeddingtypes(embeddingtypes_id)
Triggers:
    embeddingtypes_moddatetime BEFORE UPDATE ON embeddingtypes FOR EACH ROW EXECUTE FUNCTION moddatetime('embeddingtypes_datemodified')
    insert_username_embeddingtypes BEFORE INSERT OR UPDATE ON embeddingtypes FOR EACH ROW EXECUTE FUNCTION insert_username('embeddingtypes_usermodified')
This lets you search embeddings by type. Think of it as a principle; here are the first few types:
1 Elementary objects
2 People
3 Files
The types are: elementary objects (pieces of information in the database), people you know, and files. If I search for people, the search is semantic and returns embeddings only for people. I also keep embeddings for speech transcripts, in a separate table.
Here is the embeddings schema:
                                                           Table "public.embeddings"
┌───────────────────────────┬──────────────────────────┬───────────┬──────────┬────────────────────────────────────────────────────────────────┐
│          Column           │           Type           │ Collation │ Nullable │                            Default                             │
├───────────────────────────┼──────────────────────────┼───────────┼──────────┼────────────────────────────────────────────────────────────────┤
│ embeddings_id             │ integer                  │           │ not null │ nextval('embeddings_embeddings_id_seq'::regclass)              │
│ embeddings_uuid           │ uuid                     │           │ not null │ gen_random_uuid()                                              │
│ embeddings_datecreated    │ timestamp with time zone │           │ not null │ CURRENT_TIMESTAMP                                              │
│ embeddings_datemodified   │ timestamp with time zone │           │          │                                                                │
│ embeddings_usercreated    │ text                     │           │ not null │ CURRENT_USER                                                   │
│ embeddings_usermodified   │ text                     │           │ not null │ CURRENT_USER                                                   │
│ embeddings_name           │ text                     │           │ not null │ 'embedding_'::text || nextval('embeddings_name_seq'::regclass) │
│ embeddings_description    │ text                     │           │          │                                                                │
│ embeddings_embeddings     │ vector(768)              │           │ not null │                                                                │
│ embeddings_chunkid        │ integer                  │           │ not null │ nextval('embeddings_chunkid_seq'::regclass)                    │
│ embeddings_embeddingtypes │ integer                  │           │ not null │                                                                │
│ embeddings_referencedid   │ integer                  │           │ not null │                                                                │
│ embeddings_referenceduuid │ uuid                     │           │ not null │                                                                │
└───────────────────────────┴──────────────────────────┴───────────┴──────────┴────────────────────────────────────────────────────────────────┘
Indexes:
    "embeddings_pkey" PRIMARY KEY, btree (embeddings_id)
    "embeddings_embeddings_uuid_key" UNIQUE CONSTRAINT, btree (embeddings_uuid)
Foreign-key constraints:
    "embeddings_embeddingtypes_fkey" FOREIGN KEY (embeddings_embeddingtypes) REFERENCES embeddingtypes(embeddingtypes_id)
Triggers:
    embeddings_moddatetime BEFORE UPDATE ON embeddings FOR EACH ROW EXECUTE FUNCTION moddatetime('embeddings_datemodified')
    insert_username_embeddings BEFORE INSERT OR UPDATE ON embeddings FOR EACH ROW EXECUTE FUNCTION insert_username('embeddings_usermodified')
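
A search limited to one type, as described above for people, could be sketched roughly like this. It is only a sketch under assumptions: it assumes pgvector's cosine-distance operator <=> is what you want, the helper name build_search_sql and the default limit of 10 are made up for illustration, and only the table and column names come from the schemas shown here. The function just prints SQL, so you can inspect it before piping it to psql.

```shell
#!/bin/bash
# Sketch only: print a similarity query restricted to one embedding type.
# Assumption: pgvector's "<=>" (cosine distance) operator is the metric used.

# Usage: build_search_sql TYPE_ID VECTOR_LITERAL [LIMIT]
# (build_search_sql is a hypothetical helper name, not part of any tool.)
build_search_sql() {
    local type_id="$1" limit="${3:-10}" vector inner

    # The embedding script prints the array over many lines; remove all
    # whitespace so the value becomes a compact [x,y,...] vector literal.
    vector=$(printf '%s' "$2" | tr -d '[:space:]')

    # The values are pasted into SQL text, so accept only digits for the
    # numbers and only a bracketed list of numeric characters for the vector.
    case "$type_id" in ''|*[!0-9]*) echo "type id must be an integer" >&2; return 1 ;; esac
    case "$limit" in ''|*[!0-9]*) echo "limit must be an integer" >&2; return 1 ;; esac
    case "$vector" in
        \[*\]) ;;
        *) echo "vector must look like [0.1,0.2,...]" >&2; return 1 ;;
    esac
    inner="${vector#\[}"
    inner="${inner%\]}"
    case "$inner" in ''|*[!0-9eE.,+-]*) echo "vector contains non-numeric characters" >&2; return 1 ;; esac

    cat <<SQL
SELECT embeddings_referencedid,
       embeddings_referenceduuid,
       embeddings_chunkid,
       embeddings_embeddings <=> '$vector'::vector AS distance
  FROM embeddings
 WHERE embeddings_embeddingtypes = $type_id
 ORDER BY distance
 LIMIT $limit;
SQL
}

# Hypothetical usage, searching only type 2 (People); the script name and
# database name below are placeholders:
#   build_search_sql 2 "$(echo 'who repairs pumps' | ./get-embedding.sh)" 5 | psql mydb
```

The WHERE clause on embeddings_embeddingtypes is what keeps results to one type; embeddings_referencedid and embeddings_referenceduuid then point back to the matching record.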
And here is an example record from the files table:
ID 3
UUID "2b0e12ec-e261-4483-ada0-2819a0bcd159"
Date created "2025-03-21 09:03:28.273204+03"
Date modified "2025-03-22 02:00:10.485717+03"
User created "maddox"
User modified "maddox"
File name "/home/data1/protected/Documents/Org/Preliminary-Site-Assessment-and-Inspe
Description nil
Plain text "#+TITLE: {{{site}}} Report \\\\ #+TITLE: Preliminary Site Assessment and
Summary "Based on the provided text and your request for a summary of information
I also store embeddings in chunks, because large documents must be chunked. But then again:
- you can store an embedding of the summary instead;
- and it helps to extract the plain text first.
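
Chunking plain text could be sketched as below. This is a minimal sketch, assuming fixed-size chunks counted in words with no overlap; the function name chunk_words, the default of 200 words, and the one-chunk-per-line output are illustrative choices and not necessarily how my own chunks were produced.

```shell
#!/bin/bash
# Sketch only: split plain text from stdin into chunks of at most N words.
# Each chunk is printed on its own line, with words joined by single spaces.

# Usage: chunk_words [WORDS_PER_CHUNK] < plain-text-file
# (chunk_words is a hypothetical helper name.)
chunk_words() {
    local size="${1:-200}"

    # Require a positive integer, otherwise no chunk would ever be closed.
    case "$size" in ''|*[!0-9]*|0) echo "chunk size must be a positive integer" >&2; return 1 ;; esac

    awk -v size="$size" '
        {
            # Walk over every word of every input line.
            for (i = 1; i <= NF; i++) {
                buf = (count == 0) ? $i : (buf " " $i)
                if (++count == size) {
                    print buf
                    buf = ""
                    count = 0
                }
            }
        }
        # Print the last, possibly shorter, chunk.
        END { if (count > 0) print buf }
    '
}
```

Each output line can then be embedded separately and stored as its own row, with embeddings_chunkid telling the chunks of one document apart.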
I often invoke shell scripts to get embeddings. That makes the most sense when, say, you are speaking and want the computer to do something:
#!/bin/bash
# Example how to run embeddings model:
# /usr/local/bin/llama-server -ngl 999 -v -c 8192 -ub 8192 --embedding --log-timestamps --host 192.168.1.68 --port 9999 -m /mnt/data/LLM/nomic-ai/quantized/nomic-embed-text-v1.5-Q8_0.gguf
# Get LAN IP address
HOST=$(rcd-get-ethernet-interface.sh)
# Configuration
API_ENDPOINT="http://$HOST:9999/v1/embeddings"
API_KEY="any"
MODEL="nomic-embed-text-v1.5-Q8_0.gguf"
# Read input from standard input
INPUT_TEXT=$(cat)
# Prepare JSON payload
JSON_PAYLOAD=$(jq -n \
    --arg model "$MODEL" \
    --arg input "$INPUT_TEXT" \
    '{model: $model, input: $input}')
# Send request to the API and get the response
RESPONSE=$(curl -s -X POST "$API_ENDPOINT" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d "$JSON_PAYLOAD")
# Extract the embedding from the response
EMBEDDING=$(echo "$RESPONSE" | jq -r '.data[0].embedding')
# Output the embedding
echo "$EMBEDDING"
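
To connect the script's output to the embeddings table, something like the following could work. It is a sketch under assumptions: the helper name build_insert_sql is hypothetical, the four columns listed are simply the NOT NULL columns without defaults from the embeddings schema, and the function only prints the INSERT statement rather than running it.

```shell
#!/bin/bash
# Sketch only: read an embedding (JSON array) on stdin and print an INSERT
# statement for the embeddings table.

# Usage: build_insert_sql TYPE_ID REFERENCED_ID REFERENCED_UUID < embedding
# (build_insert_sql is a hypothetical helper name.)
build_insert_sql() {
    local type_id="$1" ref_id="$2" ref_uuid="$3" vector inner

    # jq prints the array over many lines; with whitespace removed it has
    # the same [x,y,...] shape as a pgvector text literal.
    vector=$(tr -d '[:space:]')

    # Everything is pasted into SQL text, so validate strictly.
    case "$type_id" in ''|*[!0-9]*) echo "type id must be an integer" >&2; return 1 ;; esac
    case "$ref_id" in ''|*[!0-9]*) echo "referenced id must be an integer" >&2; return 1 ;; esac
    case "$ref_uuid" in ''|*[!0-9a-fA-F-]*) echo "referenced uuid looks wrong" >&2; return 1 ;; esac
    case "$vector" in
        \[*\]) ;;
        # A failed request typically ends up here, because jq prints "null".
        *) echo "no embedding array on stdin" >&2; return 1 ;;
    esac
    inner="${vector#\[}"
    inner="${inner%\]}"
    case "$inner" in ''|*[!0-9eE.,+-]*) echo "embedding contains non-numeric characters" >&2; return 1 ;; esac

    cat <<SQL
INSERT INTO embeddings
       (embeddings_embeddings, embeddings_embeddingtypes,
        embeddings_referencedid, embeddings_referenceduuid)
VALUES ('$vector'::vector, $type_id, $ref_id, '$ref_uuid'::uuid);
SQL
}

# Hypothetical usage for the file record shown earlier (type 3 = Files);
# the script name, input file and database name are placeholders:
#   ./get-embedding.sh < summary.txt \
#     | build_insert_sql 3 3 2b0e12ec-e261-4483-ada0-2819a0bcd159 | psql mydb
```

The remaining columns (name, chunk id, dates, users) are left to their defaults and triggers, so the statement stays short.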