Website links that automatically improve their accuracy with assistance from a Large Language Model (LLM)
Linking within a website enhances navigation and user experience by connecting related content. This connectivity ensures users can easily explore various sections without getting lost, ultimately improving engagement and satisfaction.
Every webmaster should link terms on one page to related pages within the website, and link back as well. This improves search rankings and makes it easier for users to find relevant information on your website.
More importantly, internal links help drive sales of products or services by guiding visitors through different sections of your website. For an informative site in particular, proper use of internal links can greatly enhance the user experience by helping them better understand the content they are reading.
Now imagine the painstaking process of creating a new page, where I have to search meticulously through countless entries just to link them appropriately. It’s not exactly rocket science, but rather an exercise in extreme patience, or so it seems until you realize how much time is wasted on this trivial chore.
I mean, who knew managing an online presence could be such masochistic, monotonous routine work?
A single webpage can contain anywhere from ten to thirty crucial terms that link both internally within the site and externally.
Locating pertinent information is an incredibly time-consuming task!
Workflow: Semantic Search with Large Language Model Embeddings
This workflow outlines a process for implementing semantic search using large language model (LLM) embeddings. The goal is to create intelligent links on web pages that automatically connect users to the most relevant content based on context.
Steps:
Data Storage Setup:
- Use PostgreSQL with the vector data type (provided by the pgvector extension) to store and manage LLM-generated text embeddings efficiently.
Text Chunking:
- Divide all website page texts into manageable chunks or segments for processing.
Embedding Generation:
- Generate embeddings (vector representations) for each chunk of text using a large language model.
Template Markup Creation:
- Develop template markup to identify specific terms within the pages that require semantic links.
Markup Interpolation and Linking Program:
- Implement an automated program to process these marked-up terms.
- Convert each marked term into its corresponding LLM embedding.
- Compare this embedding with all stored embeddings in the database using a similarity metric (e.g., cosine similarity).
- Identify the most relevant page based on highest similarity score.
- Incorporate an automatic link to this most relevant page within the original content.
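The comparison and selection steps above can be sketched in a few lines of Emacs Lisp. This is only an illustration under stated assumptions: the function names `my-cosine-similarity` and `my-most-similar-page` and the in-memory list of pages are hypothetical, and in the workflow described here the comparison is done inside PostgreSQL rather than in Lisp.

```lisp
(defun my-cosine-similarity (a b)
  "Return the cosine similarity of numeric vectors A and B.
A and B must have the same length and must not be all zeros."
  (let ((dot 0.0) (norm-a 0.0) (norm-b 0.0))
    (dotimes (i (length a))
      (let ((x (aref a i)) (y (aref b i)))
        (setq dot (+ dot (* x y))
              norm-a (+ norm-a (* x x))
              norm-b (+ norm-b (* y y)))))
    (/ dot (* (sqrt norm-a) (sqrt norm-b)))))

(defun my-most-similar-page (term-embedding pages)
  "Return the (URL . EMBEDDING) entry of PAGES closest to TERM-EMBEDDING.
PAGES is a hypothetical in-memory alist standing in for the database."
  (let (best best-score)
    (dolist (page pages best)
      (let ((score (my-cosine-similarity term-embedding (cdr page))))
        ;; Keep the entry with the highest similarity seen so far.
        (when (or (null best-score) (> score best-score))
          (setq best page
                best-score score))))))
```

With pgvector, the `<=>` operator returns the cosine distance, that is, 1 minus the cosine similarity, so the smallest distance corresponds to the highest similarity.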
Benefits:
- Enhanced User Experience: Users are directed to highly contextually related information, improving navigation and understanding of complex topics.
- Efficiency: Automates the process of creating semantic links, reducing manual effort while maintaining accuracy.
I won’t cover the details of setting up data storage, text chunking, or generating embeddings in this explanation. However, if you’re interested in learning more about these topics or need assistance specifically with PostgreSQL setup, feel free to reach out at any time—I’ll be happy to help!
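For orientation only, the text chunking step could look roughly like the following sketch. The function name `my-chunk-text` and the paragraph-based splitting rule are assumptions made for illustration, not the method used on this website.

```lisp
(defun my-chunk-text (text max-chars)
  "Split TEXT into a list of chunks of about MAX-CHARS characters.
Paragraphs (separated by blank lines) are joined while they fit;
a paragraph longer than MAX-CHARS becomes its own oversized chunk."
  (let ((paragraphs (split-string text "\n[ \t]*\n+" t "[ \t\n]+"))
        (current nil)
        (chunks nil))
    (dolist (paragraph paragraphs)
      (cond
       ;; First paragraph of a new chunk.
       ((null current)
        (setq current paragraph))
       ;; The paragraph still fits, counting the two newlines between.
       ((<= (+ (length current) 2 (length paragraph)) max-chars)
        (setq current (concat current "\n\n" paragraph)))
       ;; Otherwise close the current chunk and start a new one.
       (t
        (push current chunks)
        (setq current paragraph))))
    (when current
      (push current chunks))
    (nreverse chunks)))
```

Each resulting chunk would then be sent to the embedding model and stored together with the ID of the page it came from.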
Template Markup Creation
My RCD Template Interpolation System package for Emacs aids in using interpolation functions and allows me to customize the markup as well.
By default, I use the following markup to interpolate template tags: `⟦ ⟧`.
As a primarily Emacs Lisp user, I want to point out that similar templating and string interpolation features exist across various programming languages—such as Python or whatever else you might be using. If you’re curious about how it works in another specific language, feel free to ask an LLM for guidance!
I like these special delimiters as they are not often used in markup and text:
- ⟦ MATHEMATICAL LEFT WHITE SQUARE BRACKET - For opening the delimiter
- ⟧ MATHEMATICAL RIGHT WHITE SQUARE BRACKET - For closing the delimiter
I’m looking for a way to add interpolation for both variables and Lisp code within various types of markup documents like Markdown, Org mode, or Asciidoctor. Typically, this involves first using interpolation followed by one of these markups.
However, in my case, I don’t need any variable or code interpolation; all I want is simple text formatting with markup. Therefore, I plan to use the following delimiters:
```lisp
(defcustom rcd-template-any-delimiter-open ""
  "The opening delimiter for RCD Template Interpolation System."
  :group 'rcd
  :type 'string)
```
How is this used in practice?
Imagine that I am writing this (precisely this) paragraph about Emacs Lisp. To highlight a section of text, selecting it for conversion into a “universal semantic link”, I navigate to the words “Emacs Lisp”, mark them, and then use a function, perhaps triggered by a key binding, to enclose the selected terms in specific delimiters.
You can see there how I did it: ``. Spaces before or after the string should not matter.
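The marking step can be as simple as a small command that wraps the region. The following is a minimal sketch, not part of the RCD Template Interpolation System: the names `my-semantic-open`, `my-semantic-close`, and `my-semantic-mark-region`, the `⟪ ⟫` placeholder delimiters, and the `C-c s` key binding are all hypothetical, so substitute your own.

```lisp
(defvar my-semantic-open "⟪"
  "Hypothetical opening delimiter; substitute your own.")

(defvar my-semantic-close "⟫"
  "Hypothetical closing delimiter; substitute your own.")

(defun my-semantic-mark-region (start end)
  "Enclose the text between START and END in semantic-link delimiters."
  (interactive "r")
  (save-excursion
    ;; Insert the closing delimiter first so that START stays valid.
    (goto-char end)
    (insert my-semantic-close)
    (goto-char start)
    (insert my-semantic-open)))

;; Hypothetical key binding for the command.
(global-set-key (kbd "C-c s") #'my-semantic-mark-region)
```

After selecting a term and pressing the key, the term sits between the two delimiters and is ready for interpolation.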
Words or terms enclosed by delimiters will be transformed into embeddings. These embeddings will then be compared with those stored in the database that correspond either to the same section of a website, a specific set of pages, or external links.
My own function is the following:
```lisp
(defun wrs-area-pages-by-embeddings (&optional query limit link)
  "Search for pages in the current Hyperscope or selected website area.
The search is based on embeddings similarity to a given QUERY.

This function performs an embedding-based search within a specified
area, using either the currently active table's associated
hyperscope or allowing selection from available areas if none is set.

- QUERY: optional text to search for; `wrs-semantic-link' passes the
  marked term here.  When nil, the query is taken from the selected
  text or from user input.
- LIMIT: optional maximum number of rows to consider, 10 by default.
- LINK: when non-nil, insert a hyperlink to the best match instead of
  listing the results.

The query is converted into an embedding using
`rcd-llm-get-embedding-single'.  The SQL subquery calculates the
distance between embeddings and groups the rows by referenced object.
Results are ordered by the minimum distance within each group.

If any matching IDs `id-list' are found, they will be opened in
Hyperscope mode with a formatted query description as context."
  (interactive)
  (let* ((query (or query (rcd-region-string) (rcd-ask-get "Query: ")))
         ;; Keep the query text separate from its embedding so that the
         ;; description below shows the text, not the vector.
         (embedding (rcd-llm-get-embedding-single query))
         (limit (or limit 10))
         (area (cond ((and rcd-db-current-table-id
                           (hyperscope-area rcd-db-current-table-id))
                      (hyperscope-area rcd-db-current-table-id))
                     (t (wrs-areas-select))))
         ;; The `<=>' operator yields a distance, so smaller is better.
         (id-list (rcd-sql-list "SELECT subquery.embeddings_referencedid
                                   FROM (SELECT e.embeddings_referencedid,
                                                e.embeddings_embeddings <=> $1 AS similarity
                                           FROM embeddings e
                                           JOIN hyobjects h ON e.embeddings_referencedid = h.hyobjects_id
                                          WHERE e.embeddings_embeddingtypes = 1
                                            AND e.embeddings_embeddings <=> $1 < 0.5
                                            AND h.hyobjects_areas = $2
                                       ORDER BY similarity ASC
                                          LIMIT $3) AS subquery
                               GROUP BY subquery.embeddings_referencedid
                               ORDER BY MIN(subquery.similarity) ASC"
                                rcd-db embedding area limit)))
    (cond ((and link id-list)
           (wrs-insert-lightweight-markup-hyperlink (car id-list)))
          (id-list
           (hyperscope-by-id-list id-list (format "Query: %s" query)))
          (t (rcd-message "Matches not found")))))
```
Then comes a higher-level function that runs it during template interpolation:
```lisp
(defun wrs-semantic-link (query)
  "Generate a semantic link for QUERY using embeddings.

This function creates a semantic link by leveraging the
`wrs-area-pages-by-embeddings' function.  QUERY is the marked term.
Return QUERY formatted as a Markdown link to the best matching page,
or QUERY unchanged when no link is found."
  (let* ((id (wrs-area-pages-by-embeddings query nil t))
         (link (hyperscope-url-link id)))
    (cond ((and link (rcd-string-not-empty-p link))
           ;; Markdown link: [term](url)
           (format "[%s](%s)" query link))
          (t query))))

(defun wrs-semantic-link-region ()
  "Interactively replace the selected text with semantic links.
Use the `wrs-semantic-link' function.  If no region is active, do nothing.

The function uses `rcd-region-string' to get the currently selected
text and stores it in variable `region'.  It then checks if there is
any content in this variable; if not, execution stops here without
further action.

Finally, replace the original selected text with the content of
`interpolated' using the helper function `rcd-region-string-replace'."
  (interactive)
  (let ((region (rcd-region-string)))
    (when region
      (let ((interpolated (rcd-template-eval-any region #'wrs-semantic-link)))
        (rcd-region-string-replace interpolated)))))
```
How is the function used?
The terms in the text must be marked, like here: Emacs Lisp is a dialect of the LISP programming language used by the text editor for extending and customizing its functionality.
The text is interpolated by the `rcd-template-eval-any` function from the RCD Template Interpolation System for Emacs; how exactly a programmer does that depends on the text processing involved. Any marked terms receive their links in the appropriate markup, whatever it is.
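As a rough illustration of what such interpolation involves, the sketch below replaces every delimited term in a string with the result of a function. It is not the implementation of `rcd-template-eval-any`. The names `my-interpolate-marked-terms`, `my-lookup-url`, and `my-semantic-link-demo`, the `⟪ ⟫` delimiters, and the example.org URL are all hypothetical, the fixed lookup table stands in for the embedding search, and marked terms are assumed not to span several lines.

```lisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my-interpolate-marked-terms (text open close fn)
  "Replace each OPEN term CLOSE span in TEXT with the result of FN.
FN receives the term with surrounding whitespace trimmed."
  (let ((regexp (concat (regexp-quote open)
                        "\\(.*?\\)"
                        (regexp-quote close))))
    (replace-regexp-in-string
     regexp
     (lambda (match)
       ;; Group 1 is the text between the delimiters.
       (funcall fn (string-trim (match-string 1 match))))
     text t t)))

(defun my-lookup-url (term)
  "Hypothetical lookup: return a URL for TERM from a fixed table, or nil."
  (cdr (assoc term '(("Emacs Lisp" . "https://example.org/emacs-lisp.html")))))

(defun my-semantic-link-demo (term)
  "Return TERM as a Markdown link, or TERM unchanged when no URL is found."
  (let ((url (my-lookup-url term)))
    (if url
        (format "[%s](%s)" term url)
      term)))
```

Calling `(my-interpolate-marked-terms text "⟪" "⟫" #'my-semantic-link-demo)` turns marked terms with a known page into Markdown links and leaves the others as plain text. Supporting a different markup only requires a different format string.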
It’s important to note that, beyond highlighting specific terms in the text, there is minimal further human involvement required. The system has the capability of automatically linking these highlighted terms appropriately without additional input from a person.
Let me make an Emacs Lisp region function that demonstrates how it works…
Sample text
Emacs input methods provide a way to input characters from various writing systems and languages.
…then markup…
provide a way to input characters from various writing systems and languages.
…followed by semantic linking…
- Emacs input methods provide a way to input characters from various writing systems and languages.
Which can be shown here as source:
```markdown
3. [Emacs input methods](https://gnu.support/gnu-emacs/emacs-lisp/Emacs-Lisp-input-method-for-FULLWIDTH-LATIN-LETTERS.html) provide a way to
   input characters from various writing systems and languages.
```
Let us say, we are speaking of :
```markdown
Let us say, we are speaking of [Large Language Models](https://gnu.support/large-language-models-llm/index.html):
```
It is important to note that I, the author, no longer look up links; I leave that for the system to decide.
My job is only to mark the terms that I want hyperlinked through semantic search.
How do these semantically found website links automatically improve their accuracy?
The more website pages you write, the more semantic content you add, which improves matching accuracy. With each page update, there may be changes in semantic linking that can be verified and optimized through your own algorithms for greater precision.
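One simple way to verify such changes is to compare the links chosen before and after an update. The sketch below assumes the chosen links are available as association lists of term and URL; the function name `my-changed-links` and that data layout are hypothetical.

```lisp
(defun my-changed-links (old new)
  "Return the entries of NEW whose URL differs from the one in OLD.
OLD and NEW are alists of (TERM . URL).  Each element of the result
is a list (TERM OLD-URL NEW-URL); OLD-URL is nil for a new term."
  (let (changes)
    (dolist (entry new (nreverse changes))
      (let ((previous (cdr (assoc (car entry) old))))
        (unless (equal previous (cdr entry))
          (push (list (car entry) previous (cdr entry)) changes))))))
```

Reviewing only the changed entries keeps the human check short while still catching a term that starts pointing to a less suitable page.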
Automated link building is a powerful tool for webmasters; it requires human oversight to mark up relevant text but significantly reduces tedious tasks like link searching and manual updates.
Furthermore, if any URLs change, they are automatically updated across all website pages. This article aims to provide developers with concepts that can be implemented in their own projects.