# embeddingsearch Embeddingsearch is a python library that uses Embedding Similarity Search (similiarly to [Magna](https://github.com/yousef-rafat/Magna/tree/main)) to semantically compare a given input to a database of pre-processed entries. When first implementing the idea, it was conceptualized to only import files into the database. # How to set up 1. Install ![ollama](https://ollama.com/download) 2. Pull a few models using ollama (e.g. `paraphrase-multilingual`, `bge-m3`, `mxbai-embed-large`, `nomic-embed-text`) 3. [Install the depencencies](#installing-the-dependencies) 4. [Set up a local mysql database](#mysql-database-setup) # How to run the example script 1. Start the script `python3 dbtest.py` 2. Generate the index. Type in `index_folder` and submit. Then `target` and submit. (This might take a while with no GPU acceleration - go get some coffee) 3. After the indexing is done, you may prompt searches using `search` # Installing the dependencies ## Ubuntu 24.04 `pip install mysql.connector` `apt install python3-magic` ## Windows TODO # MySQL database setup 1. Install mysql: `sudo apt install mysql-server` and connect to it: `sudo mysql -u root` 1. Create the database `CREATE DATABASE embeddingsearch; use embeddingsearch;` 2. Create the user `CREATE USER embeddingsearch identified by "somepassword!"; GRANT ALL ON embeddingsearch.* TO embeddingsearch;` 3. Create the tables ```sql CREATE TABLE searchdomain (id int PRIMARY KEY auto_increment, name varchar(512), settings JSON); CREATE TABLE query (id int PRIMARY KEY auto_increment, id_searchdomain int, query TEXT, FOREIGN KEY (id_searchdomain) REFERENCES searchdomain(id)); CREATE TABLE entity (id int PRIMARY KEY auto_increment, name varchar(512), probmethod varchar(128), id_searchdomain int, FOREIGN KEY (id_searchdomain) REFERENCES searchdomain(id)); CREATE TABLE queryresult (id int PRIMARY KEY auto_increment, id_query int, id_entity int, result double, FOREIGN KEY (id_query) REFERENCES query(id), FOREIGN KEY (id_entity) REFERENCES entity(id)); CREATE TABLE attribute (id int PRIMARY KEY auto_increment, id_entity int, attribute varchar(512), value longtext, FOREIGN KEY (id_entity) REFERENCES entity(id)); CREATE TABLE datapoint (id int PRIMARY KEY auto_increment, name varchar(512), probmethod_embedding varchar(512), id_entity int, FOREIGN KEY (id_entity) REFERENCES entity(id)); CREATE TABLE embedding (id int PRIMARY KEY auto_increment, id_datapoint int, model varchar(512), embedding blob, FOREIGN KEY (id_datapoint) REFERENCES datapoint(id)); ``` # To-do - Proper config file - Add support for other databases? - Add database setup script? - Remove tables related to caching (It's not done on the sql server side anymore.) # Off-scope - Support for other database types