Graphical User Interface Installable Client
This GUI tool is installed locally on your machine. It serves as an interface to call API jobs and manage the results efficiently.
Requires a cloud account
Free to Download
Open Source Code Available
Create, Select or Delete a Project
You can create an unlimited number of projects.
Choose a name and a content type (Image, Video, Sound, Text, 3D Model/Point Cloud).
The raw and sampled files are kept on your local machine.
A cloud archive is created for the indexed vectors.
Data directories are created for the project on your local machine, where you can place:
Raw Data to index
Fine-tuning data
Similarity calibration data
Relevance calibration data
Reverse search examples
Essential data examples
Forbidden data examples
Populate a Project’s Data Dirs
For the folders Raw_Data, Search_Examples, Essential_Content, Forbidden_Content and Relevance_Calibration_Dataset, you only need to respect the file extensions.
Vectorizer_Fine_Tuning_Dataset
We recommend at least 100 and no more than 10000 examples for fine-tuning.
You can have up to 100 labels in your fine-tuning dataset.
The labels can be any class that describes the content. Each file can have multiple labels.
Open a text editor and add the labels of each file in the following format:
{
"file_name_1.ext": ["label_1", "label_2", ...],
"file_name_2.ext": ["label_2"]
}
Save the file as "example_to_labels.json" and place it in the "Fine_Tuning_Data" folder.
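The label file can also be produced and checked from a script. The following is a minimal sketch, not part of the GUI: check_labels and write_labels are hypothetical helper names, and the limits are the ones recommended above.

```python
import json
from pathlib import Path

MIN_EXAMPLES = 100    # recommended minimum number of examples
MAX_EXAMPLES = 10000  # recommended maximum number of examples
MAX_LABELS = 100      # maximum number of distinct labels


def check_labels(example_to_labels):
    """Return a list of warnings for a {file name: [labels]} mapping."""
    warnings = []
    n_examples = len(example_to_labels)
    if n_examples < MIN_EXAMPLES:
        warnings.append(
            f"only {n_examples} examples; at least {MIN_EXAMPLES} recommended"
        )
    if n_examples > MAX_EXAMPLES:
        warnings.append(
            f"{n_examples} examples; no more than {MAX_EXAMPLES} recommended"
        )
    # Collect every distinct label used across all files.
    distinct = {
        label for labels in example_to_labels.values() for label in labels
    }
    if len(distinct) > MAX_LABELS:
        warnings.append(
            f"{len(distinct)} distinct labels; the limit is {MAX_LABELS}"
        )
    return warnings


def write_labels(example_to_labels, folder):
    """Write example_to_labels.json into the given folder and return its path."""
    path = Path(folder) / "example_to_labels.json"
    path.write_text(json.dumps(example_to_labels, indent=2), encoding="utf-8")
    return path
```

Call check_labels on your mapping first, then write_labels with the path of the fine-tuning folder once the list of warnings is empty.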
Similarity_Calibration_Dataset
The similarity dataset must contain at least 200 and at most 10000 pairs of examples that are similar according to the client's criteria.
To assemble the similarity dataset, we recommend grouping your data into clusters, one per fine-tuning label (each cluster containing the items with that label), and then extracting at least 2 pairs from each cluster.
The names of the two files in a pair must start with a prefix that is the id of the pair, so the files inside the folder should look like this:
1_file_1.ext
1_file_2.ext
2_file_3.ext
2_file_4.ext
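Before running the calibration, the naming convention can be checked with a short script. This is a sketch under assumptions: it treats everything before the first underscore as the pair id (as in the listing above), and group_pairs and check_pairs are hypothetical helpers, not GUI functions.

```python
from collections import defaultdict

MIN_PAIRS = 200    # minimum number of pairs stated above
MAX_PAIRS = 10000  # maximum number of pairs stated above


def group_pairs(file_names):
    """Group file names by the pair id found before the first underscore."""
    pairs = defaultdict(list)
    for name in file_names:
        pair_id, sep, rest = name.partition("_")
        if not sep or not pair_id or not rest:
            raise ValueError(f"missing pair id prefix: {name!r}")
        pairs[pair_id].append(name)
    return dict(pairs)


def check_pairs(file_names):
    """Return a list of problems found in a similarity dataset listing."""
    pairs = group_pairs(file_names)
    # Every pair id should be shared by exactly two files.
    problems = [
        f"pair {pair_id} has {len(names)} file(s), expected 2"
        for pair_id, names in sorted(pairs.items())
        if len(names) != 2
    ]
    if len(pairs) < MIN_PAIRS:
        problems.append(f"{len(pairs)} pairs; at least {MIN_PAIRS} required")
    if len(pairs) > MAX_PAIRS:
        problems.append(f"{len(pairs)} pairs; at most {MAX_PAIRS} allowed")
    return problems
```

Pass the file names of the similarity folder, for example from os.listdir, to check_pairs; an empty list means the naming and the pair count both look consistent.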
Run the Standard Jobs Pipeline
The pipeline checks the project data directories for files and, when it finds them, runs the corresponding jobs automatically.
Calibration Steps:
Model fine-tuning
Similarity calibration
Relevance calibration
Indexing Steps:
Data subsampling
Data indexing
Analysis Steps:
Clustering by auto discovery and by calibrated similarity
Reverse search against essential and forbidden data
Inlier and outlier sorting
All results will be stored and made available as data browsing criteria
A data distribution analysis with balancing recommendations and statistics will be produced from all available results
You can assign a custom name to the saved results
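The directory check at the start of the pipeline can be pictured as follows. This is only an illustrative sketch: the folder names are the ones used in this document, but the mapping from folders to steps is an assumption, and runnable_steps is a hypothetical helper rather than the GUI's own code.

```python
from pathlib import Path

# Assumed mapping from pipeline steps to the folder each step reads from.
STEP_INPUTS = {
    "Model fine-tuning": "Vectorizer_Fine_Tuning_Dataset",
    "Similarity calibration": "Similarity_Calibration_Dataset",
    "Relevance calibration": "Relevance_Calibration_Dataset",
    "Data indexing": "Raw_Data",
    "Reverse search against essential data": "Essential_Content",
    "Reverse search against forbidden data": "Forbidden_Content",
}


def has_files(folder):
    """Return True if the folder exists and contains at least one file."""
    folder = Path(folder)
    return folder.is_dir() and any(p.is_file() for p in folder.iterdir())


def runnable_steps(project_dir):
    """List the steps whose input folder is populated, in pipeline order."""
    project_dir = Path(project_dir)
    return [
        step
        for step, subfolder in STEP_INPUTS.items()
        if has_files(project_dir / subfolder)
    ]
```

Running runnable_steps on a project directory before starting the pipeline gives a quick view of which folders are still empty.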
Run Individual Jobs
Account Management Jobs:
create_archive
update_parameters
list_content
remove_content
get_archive_ids_and_urls
add_urls_to_contents
clear_batch_folder
Support Jobs:
fine_tune_vectorizer
sample_data
data_balance
send_feedback
Archive Jobs:
calibrate_similarity
calibrate_relevance
index
inliers_outliers
search
cluster_by_number_of_clusters
cluster_by_calibrated_similarity
You can assign a custom name to the saved results
Monitor Jobs
Monitor the status, inputs and results of all jobs in the queue.
Request balancing recommendations, calculated from the available results, that prioritize which data to delete and which to source.
Browse Indexed Data by Criteria
Browsing criteria give a human expert a high-productivity, qualitative way to browse the data while focusing on the high-impact items.
Browsing criteria are automatically updated from the available results.
When a browsing criterion is applied, the contents of the Indexed Data directory are sorted and clustered to reflect it.
Highlights by discovered clusters
Clustered by discovered clusters
Highlights by calibrated similarity
Clustered by calibrated similarity
Sorted by similarity to Essential Examples
Sorted by similarity to Forbidden Examples
Sorted by similarity to Custom Examples
Sorted by Inliers
Sorted by Outliers
Prioritized Under-represented Data to Source for balance
Prioritized Over-represented Data to Remove for balance
Data Distribution Statistical Metrics
Distribution Metrics are updated from all job results available.
Summary of the results of filtering during indexing:
Number of indexed examples
Number of redundant examples
Number of irrelevant examples
Ratio of redundant examples
Ratio of irrelevant examples
Summary of the results of analysis:
Number of clusters and average cluster size, by cluster discovery or by calibrated similarity
Number and ratio of over-represented and under-represented clusters and contents
Quantitative method to compare the distribution balance quality across projects or against benchmarks:
Mean distance to dataset center
Average distance between cluster elements and cluster center
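The two distance metrics above can be illustrated with a small calculation. The sketch below assumes Euclidean distance and takes the center to be the arithmetic mean of the vectors; the distance actually used by the tool is not stated here, and the function names are hypothetical. The cluster metric is averaged over all elements rather than per cluster.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mean_distance_to_center(vectors):
    """Mean Euclidean distance from each vector to the dataset centroid."""
    center = centroid(vectors)
    return sum(math.dist(v, center) for v in vectors) / len(vectors)


def mean_intra_cluster_distance(vectors, labels):
    """Mean distance from each element to the centroid of its own cluster."""
    clusters = {}
    for vector, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vector)
    total = 0.0
    for members in clusters.values():
        center = centroid(members)
        total += sum(math.dist(v, center) for v in members)
    return total / len(vectors)
```

A dataset with tight, well-separated clusters shows a small intra-cluster distance relative to its mean distance to the dataset center, which is what makes the pair useful for comparing projects.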
Installation Instructions
Download the GUI repository to your Downloads folder.
Unzip it and place it in your desired folder in your file system.
In the Instalation folder of the GUI repository, choose the Windows, macOS or Linux subdirectory and run the Python and Anaconda installers.
In the Instalation folder of the GUI repository, open a terminal and run the command:
conda env create -f gui.yml
Setting Permissions:
Inside the GUI repository:
On Windows, right-click run_gui_windows.bat in Explorer and select Run as administrator.
On macOS, select run_gui_mac.command in Finder:
Right-click on the run_gui_mac.command file and select Get Info.
In the Info window, look for the Sharing & Permissions section at the bottom.
Click the lock icon and enter your password to make changes.
Right-click (or Control-click) on the run_gui_mac.command file and select “Make Alias” from the context menu.
Drag the run_gui_mac.command alias file to the Applications folder.
Right-click the file in the Applications folder and select Open. This should prompt a security warning. Confirm that you want to open the file.
If you encounter any security warnings, allow the script to run in System Preferences > Security & Privacy.
On Linux (required the first time only):
Select run_gui_linux.sh.
Right-click on the run_gui_linux.sh file and select Properties.
In the Properties window, go to the Permissions tab.
Check the box that says Allow executing file as program.
Close the Properties window.
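On Linux, the same permission can also be set from a script instead of the Properties window. A minimal sketch, assuming the checkbox corresponds to the file's execute bit; make_executable is a hypothetical helper, not part of the GUI repository.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for the owner, keeping the other bits unchanged."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    # Return only the permission bits of the updated mode.
    return stat.S_IMODE(os.stat(path).st_mode)
```

Calling make_executable("run_gui_linux.sh") from inside the GUI repository has the same effect as running chmod u+x on the file.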
How to Run
On Windows, open the GUI repository and double-click run_gui_windows.bat.
On macOS, open Applications and double-click run_gui_mac.command.
On Linux, open the GUI repository and double-click run_gui_linux.sh.
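If you start the GUI from your own tooling, the launcher for the current operating system can be picked automatically. This is a small sketch using the script names listed above; launcher_for is a hypothetical helper, and it only returns the file name without running it.

```python
import platform

# Launcher script per operating system, keyed by platform.system() values.
LAUNCHERS = {
    "Windows": "run_gui_windows.bat",
    "Darwin": "run_gui_mac.command",  # platform.system() reports macOS as "Darwin"
    "Linux": "run_gui_linux.sh",
}


def launcher_for(system=None):
    """Return the launcher script name for the given or current system."""
    system = system or platform.system()
    try:
        return LAUNCHERS[system]
    except KeyError:
        raise ValueError(f"unsupported platform: {system!r}") from None
```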