embeddingsearch
Embeddingsearch is a Python library that uses embedding similarity search (similarly to Magna) to semantically compare a given input to a database of pre-processed entries.
The first implementation was designed only to import files into the database.
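To make the idea of embedding similarity search concrete, here is a minimal sketch that ranks entries by cosine similarity to a query vector. It is an illustration only: the function names, the file names, and the toy 3-dimensional vectors are made up for this example, and the README does not say which similarity measure the library actually uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_entries(query_vector, entries):
    """Sort (name, vector) pairs by similarity to the query, best first."""
    scored = [(name, cosine_similarity(query_vector, vec)) for name, vec in entries]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real model output (hypothetical data).
entries = [
    ("cats.txt", [0.9, 0.1, 0.0]),
    ("dogs.txt", [0.7, 0.3, 0.1]),
    ("taxes.txt", [0.0, 0.2, 0.9]),
]
ranking = rank_entries([1.0, 0.0, 0.0], entries)
print([name for name, _ in ranking])
```

In a real setup the vectors would come from an embedding model and have hundreds or thousands of dimensions, but the ranking step stays the same.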
How to set up
- Install
- Pull a few models using ollama (e.g. paraphrase-multilingual, bge-m3, mxbai-embed-large, nomic-embed-text)
- Install the dependencies
- Set up a local MySQL database
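Once a model has been pulled, one way to check that it produces embeddings is to call ollama's local REST API directly. The sketch below assumes ollama is listening on its default address (localhost, port 11434) and uses its /api/embed endpoint. The helper names build_embed_request and embed are hypothetical and are not part of embeddingsearch.

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_embed_request(model, text, url=OLLAMA_URL):
    """Build (but do not send) a POST request asking ollama for an embedding."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(model, text):
    """Send the request and return the first embedding as a list of floats."""
    with urllib.request.urlopen(build_embed_request(model, text)) as response:
        body = json.load(response)
    return body["embeddings"][0]


# Usage, with a running ollama server and the model already pulled:
#   vector = embed("nomic-embed-text", "hello world")
#   print(len(vector))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.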
How to run the example script
- Start the script:
python3 dbtest.py
- Generate the index: type index_folder and submit, then type target and submit. (This might take a while without GPU acceleration, so go get some coffee.)
- After the indexing is done, you can run searches by typing search.
Installing the dependencies
Ubuntu 24.04
pip install mysql.connector
sudo apt install python3-magic
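A quick way to confirm that both packages are visible to the interpreter is to try importing them. This is a small sketch, not part of the project; is_importable is a hypothetical helper, and mysql.connector and magic are the module names the two packages above are expected to provide.

```python
import importlib


def is_importable(module_name):
    """Return True if the named module can be imported in this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Module names expected from the two packages installed above.
for module_name in ("mysql.connector", "magic"):
    status = "ok" if is_importable(module_name) else "MISSING"
    print(f"{module_name}: {status}")
```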
Windows
TODO
MySQL database setup
- Install MySQL and connect to it:
sudo apt install mysql-server
sudo mysql -u root
- Create the database:
CREATE DATABASE embeddingsearch;
USE embeddingsearch;
- Create the user:
CREATE USER embeddingsearch IDENTIFIED BY "somepassword!";
GRANT ALL ON embeddingsearch.* TO embeddingsearch;
- Create the tables:
CREATE TABLE searchdomain (id int PRIMARY KEY auto_increment, name varchar(512), settings JSON);
CREATE TABLE query (id int PRIMARY KEY auto_increment, id_searchdomain int, query TEXT, FOREIGN KEY (id_searchdomain) REFERENCES searchdomain(id));
CREATE TABLE entity (id int PRIMARY KEY auto_increment, name varchar(512), probmethod varchar(128), id_searchdomain int, FOREIGN KEY (id_searchdomain) REFERENCES searchdomain(id));
CREATE TABLE queryresult (id int PRIMARY KEY auto_increment, id_query int, id_entity int, result double, FOREIGN KEY (id_query) REFERENCES query(id), FOREIGN KEY (id_entity) REFERENCES entity(id));
CREATE TABLE attribute (id int PRIMARY KEY auto_increment, id_entity int, attribute varchar(512), value longtext, FOREIGN KEY (id_entity) REFERENCES entity(id));
CREATE TABLE datapoint (id int PRIMARY KEY auto_increment, name varchar(512), probmethod_embedding varchar(512), id_entity int, FOREIGN KEY (id_entity) REFERENCES entity(id));
CREATE TABLE embedding (id int PRIMARY KEY auto_increment, id_datapoint int, model varchar(512), embedding blob, FOREIGN KEY (id_datapoint) REFERENCES datapoint(id));
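The embedding table stores each vector in a blob column, so the vector has to be turned into bytes before it is inserted. The README does not describe the encoding the library uses. The sketch below assumes little-endian float32 values, and the names pack_embedding, unpack_embedding, and INSERT_EMBEDDING are hypothetical.

```python
import struct


def pack_embedding(vector):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob):
    """Inverse of pack_embedding: turn the stored bytes back into floats."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# Hypothetical parameterized statement matching the embedding table above,
# using the %s placeholder style that mysql.connector accepts.
INSERT_EMBEDDING = (
    "INSERT INTO embedding (id_datapoint, model, embedding) VALUES (%s, %s, %s)"
)

vector = [0.5, -1.25, 2.0]  # values chosen to be exactly representable in float32
blob = pack_embedding(vector)
print(len(blob))  # 4 bytes per component
print(unpack_embedding(blob))
```

A MySQL BLOB column holds up to 65,535 bytes, which under this float32 assumption leaves room for roughly 16,000 components per vector.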
To-do
- Proper config file
- Add support for other databases?
- Add database setup script?
- Remove tables related to caching (It's not done on the sql server side anymore.)
Off-scope
- Support for other database types