Replying to Crocodile, Sun 30 Mar 2025 11:19:10 AM EAT

Yes, that will work. I would also suggest PostgreSQL with the vector type from the pgvector extension, which makes semantic search across all your data straightforward. Here is the schema for embedding types:

                                                        Table "public.embeddingtypes"
┌─────────────────────────────┬──────────────────────────┬───────────┬──────────┬───────────────────────────────────────────────────────────┐
│           Column            │           Type           │ Collation │ Nullable │                          Default                          │
├─────────────────────────────┼──────────────────────────┼───────────┼──────────┼───────────────────────────────────────────────────────────┤
│ embeddingtypes_id           │ integer                  │           │ not null │ nextval('embeddingtypes_embeddingtypes_id_seq'::regclass) │
│ embeddingtypes_uuid         │ uuid                     │           │ not null │ gen_random_uuid()                                         │
│ embeddingtypes_datecreated  │ timestamp with time zone │           │ not null │ CURRENT_TIMESTAMP                                         │
│ embeddingtypes_datemodified │ timestamp with time zone │           │          │                                                           │
│ embeddingtypes_usercreated  │ text                     │           │ not null │ CURRENT_USER                                              │
│ embeddingtypes_usermodified │ text                     │           │ not null │ CURRENT_USER                                              │
│ embeddingtypes_name         │ text                     │           │ not null │                                                           │
│ embeddingtypes_description  │ text                     │           │          │                                                           │
└─────────────────────────────┴──────────────────────────┴───────────┴──────────┴───────────────────────────────────────────────────────────┘
Indexes:
    "embeddingtypes_pkey" PRIMARY KEY, btree (embeddingtypes_id)
    "embeddingtypes_embeddingtypes_uuid_key" UNIQUE CONSTRAINT, btree (embeddingtypes_uuid)
Referenced by:
    TABLE "embeddings" CONSTRAINT "embeddings_embeddingtypes_fkey" FOREIGN KEY (embeddings_embeddingtypes) REFERENCES embeddingtypes(embeddingtypes_id)
Triggers:
    embeddingtypes_moddatetime BEFORE UPDATE ON embeddingtypes FOR EACH ROW EXECUTE FUNCTION moddatetime('embeddingtypes_datemodified')
    insert_username_embeddingtypes BEFORE INSERT OR UPDATE ON embeddingtypes FOR EACH ROW EXECUTE FUNCTION insert_username('embeddingtypes_usermodified')
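If you want to reproduce this table, the psql description above can be turned back into DDL roughly as follows. This is a sketch reconstructed from the \d output, not the original creation script. It assumes PostgreSQL 13 or later (built-in gen_random_uuid() and EXECUTE FUNCTION), takes moddatetime and insert_username from the contrib spi extensions, and seeds the first three types from the list that follows. The helper name embeddingtypes_ddl and the database name mydb are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print DDL that recreates the embeddingtypes table
# as described by psql, plus the first three type rows.
# Usage sketch: embeddingtypes_ddl | psql -d mydb
embeddingtypes_ddl() {
  # Quoted heredoc: nothing inside is expanded by the shell
  cat <<'SQL'
-- Trigger functions from the contrib "spi" module
CREATE EXTENSION IF NOT EXISTS moddatetime;
CREATE EXTENSION IF NOT EXISTS insert_username;

-- serial creates embeddingtypes_embeddingtypes_id_seq, as in the psql output
CREATE TABLE embeddingtypes (
  embeddingtypes_id           serial PRIMARY KEY,
  embeddingtypes_uuid         uuid NOT NULL UNIQUE DEFAULT gen_random_uuid(),
  embeddingtypes_datecreated  timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
  embeddingtypes_datemodified timestamptz,
  embeddingtypes_usercreated  text NOT NULL DEFAULT CURRENT_USER,
  embeddingtypes_usermodified text NOT NULL DEFAULT CURRENT_USER,
  embeddingtypes_name         text NOT NULL,
  embeddingtypes_description  text
);

-- Stamp the modification time on every update
CREATE TRIGGER embeddingtypes_moddatetime
  BEFORE UPDATE ON embeddingtypes
  FOR EACH ROW EXECUTE FUNCTION moddatetime('embeddingtypes_datemodified');

-- Record which database user inserted or last changed the row
CREATE TRIGGER insert_username_embeddingtypes
  BEFORE INSERT OR UPDATE ON embeddingtypes
  FOR EACH ROW EXECUTE FUNCTION insert_username('embeddingtypes_usermodified');

-- On a fresh table these get IDs 1, 2 and 3
INSERT INTO embeddingtypes (embeddingtypes_name)
VALUES ('Elementary objects'), ('People'), ('Files');
SQL
}
```

Keeping the DDL in a shell function fits the rest of this workflow, where everything is piped between small scripts.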

This lets you search embeddings by type. Think of the type as the principle by which embeddings are grouped; here are the first few:

 1          Elementary objects
 2          People
 3          Files

Elementary objects are pieces of information in the database, People are the people you know, and Files are files. When I search for people, the search is semantic, but it only returns embeddings of the People type. I also keep embeddings for speech transcripts, in a separate table.
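A search restricted to one type can be sketched as a shell function that turns a query embedding into a pgvector query. This is an illustration, not the author's code: it assumes the query embedding comes from the embedding script shown further down (saved under the hypothetical name embed.sh), uses pgvector's cosine distance operator <=>, and the function name search_by_type and database mydb are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a nearest-neighbour query limited to one
# embedding type. Reads the query embedding (a JSON array) on stdin.
# Usage sketch: echo "my plumber" | embed.sh | search_by_type 2 10 | psql -d mydb
search_by_type() {
  local type_id="$1" limit="${2:-10}" vec
  local re='^\[[-+0-9.eE,]+\]$'
  # Only accept plain integers so nothing else can be spliced into the SQL
  if [[ ! "$type_id" =~ ^[0-9]+$ || ! "$limit" =~ ^[0-9]+$ ]]; then
    echo "usage: search_by_type TYPE_ID [LIMIT] < embedding.json" >&2
    return 1
  fi
  # Collapse the pretty-printed JSON array into a literal like [0.1,-0.2]
  vec=$(tr -d '[:space:]')
  if [[ ! "$vec" =~ $re ]]; then
    echo "input is not a numeric JSON array" >&2
    return 1
  fi
  # <=> is cosine distance in pgvector: smaller means more similar
  cat <<SQL
SELECT embeddings_referencedid,
       embeddings_referenceduuid,
       embeddings_embeddings <=> '$vec'::vector AS distance
FROM embeddings
WHERE embeddings_embeddingtypes = $type_id
ORDER BY distance
LIMIT $limit;
SQL
}
```

The WHERE clause on embeddings_embeddingtypes is what keeps a "people" search from returning files or elementary objects; the returned referenced ID and UUID then point back to the row in the source table.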

Here is the embeddings schema:

                                                           Table "public.embeddings"
┌───────────────────────────┬──────────────────────────┬───────────┬──────────┬────────────────────────────────────────────────────────────────┐
│          Column           │           Type           │ Collation │ Nullable │                            Default                             │
├───────────────────────────┼──────────────────────────┼───────────┼──────────┼────────────────────────────────────────────────────────────────┤
│ embeddings_id             │ integer                  │           │ not null │ nextval('embeddings_embeddings_id_seq'::regclass)              │
│ embeddings_uuid           │ uuid                     │           │ not null │ gen_random_uuid()                                              │
│ embeddings_datecreated    │ timestamp with time zone │           │ not null │ CURRENT_TIMESTAMP                                              │
│ embeddings_datemodified   │ timestamp with time zone │           │          │                                                                │
│ embeddings_usercreated    │ text                     │           │ not null │ CURRENT_USER                                                   │
│ embeddings_usermodified   │ text                     │           │ not null │ CURRENT_USER                                                   │
│ embeddings_name           │ text                     │           │ not null │ 'embedding_'::text || nextval('embeddings_name_seq'::regclass) │
│ embeddings_description    │ text                     │           │          │                                                                │
│ embeddings_embeddings     │ vector(768)              │           │ not null │                                                                │
│ embeddings_chunkid        │ integer                  │           │ not null │ nextval('embeddings_chunkid_seq'::regclass)                    │
│ embeddings_embeddingtypes │ integer                  │           │ not null │                                                                │
│ embeddings_referencedid   │ integer                  │           │ not null │                                                                │
│ embeddings_referenceduuid │ uuid                     │           │ not null │                                                                │
└───────────────────────────┴──────────────────────────┴───────────┴──────────┴────────────────────────────────────────────────────────────────┘
Indexes:
    "embeddings_pkey" PRIMARY KEY, btree (embeddings_id)
    "embeddings_embeddings_uuid_key" UNIQUE CONSTRAINT, btree (embeddings_uuid)
Foreign-key constraints:
    "embeddings_embeddingtypes_fkey" FOREIGN KEY (embeddings_embeddingtypes) REFERENCES embeddingtypes(embeddingtypes_id)
Triggers:
    embeddings_moddatetime BEFORE UPDATE ON embeddings FOR EACH ROW EXECUTE FUNCTION moddatetime('embeddings_datemodified')
    insert_username_embeddings BEFORE INSERT OR UPDATE ON embeddings FOR EACH ROW EXECUTE FUNCTION insert_username('embeddings_usermodified')

And here is an example record from the files table:

                             ID   3                                                                         
                           UUID   "2b0e12ec-e261-4483-ada0-2819a0bcd159"                                    
                   Date created   "2025-03-21 09:03:28.273204+03"                                           
                  Date modified   "2025-03-22 02:00:10.485717+03"                                           
                   User created   "maddox"                                                                  
                  User modified   "maddox"                                                                  
                      File name   "/home/data1/protected/Documents/Org/Preliminary-Site-Assessment-and-Inspe
                    Description   nil                                                                       
                     Plain text   "#+TITLE: {{{site}}} Report \\\\ #+TITLE: Preliminary Site Assessment and 
                        Summary   "Based on the provided text and your request for a summary of information

I also store embeddings in chunks, because large documents must be split before they can be embedded. Then again, you can store a single embedding of the summary instead, and it pays to extract the plain text first.
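Chunking can be as simple as splitting the plain text into overlapping runs of words. The following is a minimal sketch, not the author's chunker: words are only a rough proxy for model tokens, and the chunk size and overlap in the usage line are assumptions. The function name chunk_words and the script name embed.sh are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read text on stdin and print one chunk per line.
# Each chunk holds up to SIZE words; consecutive chunks share OVERLAP words.
chunk_words() {
  local size="${1:-200}" overlap="${2:-0}"
  if (( overlap >= size )); then
    echo "overlap must be smaller than size" >&2
    return 1
  fi
  # One word per line, then regroup the words with awk
  tr -s '[:space:]' '\n' | awk -v size="$size" -v overlap="$overlap" '
    NF { w[n++] = $0 }                       # collect non-empty words
    END {
      step = size - overlap
      for (start = 0; start < n; start += step) {
        line = ""
        for (i = start; i < start + size && i < n; i++)
          line = line (i > start ? " " : "") w[i]
        print line
        if (start + size >= n) break         # last chunk reached the end
      }
    }'
}
```

Usage sketch: chunk_words 200 20 < document.txt | while IFS= read -r chunk; do printf '%s' "$chunk" | embed.sh; done. Each chunk would then become its own row in embeddings, with the same referenced ID and UUID pointing back to the file.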

I often call the embedding model from shell scripts. Say you are speaking and want the computer to do something: that is where this approach makes the most sense. The script reads text on standard input and prints its embedding:

#!/bin/bash
# Read text on standard input and print its embedding as a JSON array.
# Example of how to run the embeddings model:
# /usr/local/bin/llama-server -ngl 999 -v -c 8192 -ub 8192 --embedding --log-timestamps --host 192.168.1.68 --port 9999 -m /mnt/data/LLM/nomic-ai/quantized/nomic-embed-text-v1.5-Q8_0.gguf

# Stop on errors, unset variables and failed pipeline stages
set -euo pipefail

# Get LAN IP address
HOST=$(rcd-get-ethernet-interface.sh)

# Configuration
API_ENDPOINT="http://$HOST:9999/v1/embeddings"
API_KEY="any"
MODEL="nomic-embed-text-v1.5-Q8_0.gguf"

# Read input from standard input
INPUT_TEXT=$(cat)

# Prepare JSON payload (jq takes care of quoting and escaping the text)
JSON_PAYLOAD=$(jq -n \
  --arg model "$MODEL" \
  --arg input "$INPUT_TEXT" \
  '{model: $model, input: $input}')

# Send request to the API; --fail makes curl exit non-zero on HTTP errors
RESPONSE=$(curl -sS --fail -X POST "$API_ENDPOINT" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d "$JSON_PAYLOAD")

# Extract the embedding from the response
EMBEDDING=$(jq -r '.data[0].embedding' <<<"$RESPONSE")

# Fail loudly instead of printing "null" when no embedding came back
if [[ "$EMBEDDING" == "null" ]]; then
  echo "No embedding in response: $RESPONSE" >&2
  exit 1
fi

# Output the embedding
echo "$EMBEDDING"
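To close the loop, the script's output can be turned into a row in the embeddings table. This is a sketch under assumptions: the script above is saved under the hypothetical name embed.sh, the helper embedding_insert_sql and database mydb are hypothetical, and only the columns without defaults are filled in.

```shell
#!/bin/bash
# Hypothetical helper: print an INSERT for one embedding read on stdin.
# Arguments: embedding type ID, referenced row ID, referenced row UUID.
embedding_insert_sql() {
  local type_id="$1" ref_id="$2" ref_uuid="$3" vec
  local vec_re='^\[[-+0-9.eE,]+\]$'
  local uuid_re='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
  # Validate every value before it is placed into SQL text
  if [[ ! "$type_id" =~ ^[0-9]+$ || ! "$ref_id" =~ ^[0-9]+$ || ! "$ref_uuid" =~ $uuid_re ]]; then
    echo "usage: embedding_insert_sql TYPE_ID REF_ID REF_UUID < embedding.json" >&2
    return 1
  fi
  # Collapse the JSON array printed by jq into a pgvector literal
  vec=$(tr -d '[:space:]')
  if [[ ! "$vec" =~ $vec_re ]]; then
    echo "input is not a numeric JSON array" >&2
    return 1
  fi
  cat <<SQL
INSERT INTO embeddings
  (embeddings_embeddings, embeddings_embeddingtypes,
   embeddings_referencedid, embeddings_referenceduuid)
VALUES ('$vec'::vector, $type_id, $ref_id, '$ref_uuid');
SQL
}
```

For the example file record above (ID 3, type 3 for Files), storing an embedding of its summary could look like: printf '%s' "$SUMMARY" | embed.sh | embedding_insert_sql 3 3 2b0e12ec-e261-4483-ada0-2819a0bcd159 | psql -d mydb, where SUMMARY is a hypothetical variable holding the summary text.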
