Graphical User Interface Installable Client

This GUI tool is installed locally on your machine. It serves as an interface to call API jobs and manage the results efficiently.

  • Requires a cloud account

  • Free to Download

  • Open Source Code Available

Create, Select or Delete a Project

You can create an unlimited number of projects.

Choose a name and a content type (Image, Video, Sound, Text, 3D Model/Point Cloud).

The raw and sampled files are kept on your local machine.

A cloud archive is created for the indexed vectors.

Data directories are created for the project on your local machine, where you can place:

  • Raw Data to index

  • Fine-tuning data

  • Similarity calibration data

  • Relevance calibration data

  • Reverse search examples

  • Essential data examples

  • Forbidden data examples

Populate a Project’s Data Dirs

For the folders Raw_Data, Search_Examples, Essential_Content, Forbidden_Content and Relevance_Calibration_Dataset, you only need to respect the file extensions.

Vectorizer_Fine_Tuning_Dataset

  1. We recommend at least 100 examples and no more than 10000 examples for fine-tuning.

  2. You can have up to 100 labels in your fine-tuning dataset.

  3. The labels can be any class that describes the content. Each file can have multiple labels.

  4. Open a text editor and add the labels of each file following the format:

{

    "file_name_1.ext": ["label_1", "label_2", ...],

    "file_name_2.ext": ["label_2"]

}

Save the file as "example_to_labels.json" and place it in the "Fine_Tuning_Data" folder.
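If you prefer to build the labels file from a script instead of a text editor, the following is a minimal Python sketch. The function names `check_labels` and `write_labels` are hypothetical and are not part of the GUI; the limits are the ones recommended above.

```python
import json
from pathlib import Path

# Limits taken from the recommendations above.
MIN_EXAMPLES, MAX_EXAMPLES, MAX_LABELS = 100, 10000, 100


def check_labels(mapping):
    """Return a list of warnings for a {file name: [labels]} mapping."""
    warnings = []
    if not MIN_EXAMPLES <= len(mapping) <= MAX_EXAMPLES:
        warnings.append(
            f"{len(mapping)} examples; recommended range is "
            f"{MIN_EXAMPLES} to {MAX_EXAMPLES}"
        )
    # Collect every distinct label used across all files.
    labels = {label for file_labels in mapping.values() for label in file_labels}
    if len(labels) > MAX_LABELS:
        warnings.append(f"{len(labels)} distinct labels; the limit is {MAX_LABELS}")
    return warnings


def write_labels(mapping, folder):
    """Write the mapping as example_to_labels.json inside `folder`."""
    path = Path(folder) / "example_to_labels.json"
    path.write_text(json.dumps(mapping, indent=4), encoding="utf-8")
    return path
```

Writing the file with `json.dumps` avoids syntax slips such as a trailing comma after the last entry, which strict JSON parsers reject.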

Similarity_Calibration_Dataset

  1. The similarity dataset must contain at least 200 and at most 10000 pairs of examples that are similar according to the client's criteria.

  2. To assemble the similarity dataset, we recommend that you gather your data into clusters, one for each fine-tuning label (each cluster containing the items with that label), and then extract at least 2 pairs from each cluster.

  3. The name of each file in a pair must start with a prefix that is the id of the pair, so the files inside the folder should look like this:

1_file_1.ext

1_file_2.ext

2_file_3.ext

2_file_4.ext
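Before running a calibration, you may want to check the naming convention with a script. The sketch below assumes that the pair id is the numeric text before the first underscore, as in the listing above; the function names are hypothetical and not part of the GUI.

```python
from collections import defaultdict

MIN_PAIRS, MAX_PAIRS = 200, 10000  # limits stated above


def group_pairs(file_names):
    """Group file names by the pair id that precedes the first underscore."""
    pairs = defaultdict(list)
    for name in file_names:
        pair_id, sep, _rest = name.partition("_")
        # Assumption: pair ids are numeric, as in the example listing.
        if not sep or not pair_id.isdigit():
            raise ValueError(f"missing numeric pair id prefix: {name}")
        pairs[pair_id].append(name)
    return dict(pairs)


def check_pairs(file_names):
    """Return a list of problems found in a similarity dataset listing."""
    pairs = group_pairs(file_names)
    problems = [
        f"pair {pair_id} has {len(names)} file(s), expected 2"
        for pair_id, names in pairs.items()
        if len(names) != 2
    ]
    if not MIN_PAIRS <= len(pairs) <= MAX_PAIRS:
        problems.append(
            f"{len(pairs)} pairs; allowed range is {MIN_PAIRS} to {MAX_PAIRS}"
        )
    return problems
```

You could call `check_pairs` with the file names of the similarity dataset folder, for example the names returned by `os.listdir`, and fix any reported problem before starting the job.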

Run the Standard Jobs Pipeline

The pipeline checks the project data directories for files and, when it finds them, runs the corresponding jobs automatically.

Calibration Steps:

  • Model fine-tuning

  • Similarity calibration

  • Relevance calibration

Indexing Steps:

  • Data subsampling

  • Data indexing

Analysis Steps:

  • Clustering by auto discovery and by calibrated similarity

  • Reverse search against essential and forbidden data

  • Inlier and outlier sorting

All results are stored and made available as data browsing criteria.

A data distribution analysis with balancing recommendations and statistics is produced from all available results.

You can assign a custom name to the saved results.
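The directory check that starts the pipeline can be pictured with the short Python sketch below. It is not the GUI's own code: the folder names are the ones that appear in this document, the real project layout may differ, and `populated_dirs` is a hypothetical helper.

```python
from pathlib import Path

# Folder names as they appear in this document; the real layout may differ.
DATA_DIRS = [
    "Raw_Data",
    "Vectorizer_Fine_Tuning_Dataset",
    "Similarity_Calibration_Dataset",
    "Relevance_Calibration_Dataset",
    "Search_Examples",
    "Essential_Content",
    "Forbidden_Content",
]


def populated_dirs(project_root):
    """Return the names of data directories that contain at least one file."""
    root = Path(project_root)
    found = []
    for name in DATA_DIRS:
        folder = root / name
        if folder.is_dir() and any(p.is_file() for p in folder.iterdir()):
            found.append(name)
    return found
```

Running it on a project folder before starting the pipeline shows which directories hold files and are therefore likely to trigger a job.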

Run Individual Jobs

Account Management Jobs:

  • create_archive

  • update_parameters

  • list_content

  • remove_content

  • get_archive_ids_and_urls

  • add_urls_to_contents

  • clear_batch_folder

Support Jobs:

  • fine_tune_vectorizer

  • sample_data

  • data_balance

  • send_feedback

Archive Jobs:

  • calibrate_similarity

  • calibrate_relevance

  • index

  • inliers_outliers

  • search

  • cluster_by_number_of_clusters

  • cluster_by_calibrated_similarity

You can assign a custom name to the saved results.

Monitor Jobs

Monitor the status, inputs and results of all jobs in the queue.

Request balancing recommendations based on prioritizing data deletion and sourcing, calculated from available results.

Browse Indexed Data by Criteria

Browsing criteria give a human expert a high-productivity, qualitative way to browse the data while focusing on the high-impact items.

Browsing criteria are automatically updated from the available results.

When a browsing criterion is applied, the contents of the Indexed Data directory are sorted and clustered to reflect that criterion.

  • Highlights by discovered clusters

  • Clustered by discovered clusters

  • Highlights by calibrated similarity

  • Clustered by calibrated similarity

  • Sorted by similarity to Essential Examples

  • Sorted by similarity to Forbidden Examples

  • Sorted by similarity to Custom Examples

  • Sorted by Inliers

  • Sorted by Outliers

  • Prioritized Under-represented Data to Source for balance

  • Prioritized Over-represented Data to Remove for balance

Data Distribution Statistical Metrics

Distribution Metrics are updated from all job results available.

Summary of the results of filtering during indexing:

  • Number of indexed examples

  • Number of redundant examples

  • Number of irrelevant examples

  • Ratio of redundant examples

  • Ratio of irrelevant examples

Summary of the results of analysis:

  • Number of clusters, average cluster size by cluster discovery or by calibrated similarity

  • Number and ratio of over-represented and under-represented clusters and contents

Quantitative metrics for comparing the distribution balance quality across projects or against benchmarks:

  • Mean distance to dataset center

  • Average distance between cluster elements and cluster center
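To make the two balance metrics concrete, here is a minimal Python sketch. It assumes Euclidean distance and takes each center to be the component-wise mean of the vectors; the document does not state which distance or center definition the service actually uses, and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def mean_distance_to_center(vectors):
    """Mean Euclidean distance from each vector to the dataset centroid."""
    center = centroid(vectors)
    return sum(math.dist(v, center) for v in vectors) / len(vectors)


def mean_intra_cluster_distance(clusters):
    """Average distance between cluster elements and their own cluster
    center, taken over all elements of all clusters."""
    distances = []
    for cluster in clusters:
        center = centroid(cluster)
        distances.extend(math.dist(v, center) for v in cluster)
    return sum(distances) / len(distances)
```

For the two vectors `[0, 0]` and `[2, 0]`, the centroid is `[1, 0]` and `mean_distance_to_center` returns 1.0. When two projects use the same vectorizer, a lower value indicates a more tightly concentrated dataset.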

Installation Instructions

Download the GUI repository to your Downloads folder.

Unzip it and place it in your desired folder in your file system.

In the Instalation folder of the GUI repository, open the subdirectory for your operating system (Windows, Mac OS or Linux) and run the Python and Anaconda installers.

In the Instalation folder of the GUI repository, open a terminal and run the command:

conda env create -f gui.yml

Setting Permissions:

Inside the GUI repository:

On Windows, right-click run_gui_windows.bat in Explorer and select Run as administrator.

On Mac OS, select run_gui_mac.command in Finder:

  • Right-click on the run_gui_mac.command file and select Get Info.

  • In the Info window, look for the Sharing & Permissions section at the bottom.

  • Click the lock icon and enter your password to make changes.

  • Right-click (or Control-click) on the run_gui_mac.command file and select “Make Alias” from the context menu.

  • Drag the run_gui_mac.command alias file to the Applications folder.

  • Right-click the file in the Applications folder and select Open. This should prompt a security warning. Confirm that you want to open the file.

  • If you encounter any security warnings, allow the script to run in System Preferences > Security & Privacy.

On Linux (required the first time only):

  • Select run_gui_linux.sh.

  • Right-click on the run_gui_linux.sh file and select Properties.

  • In the Properties window, go to the Permissions tab.

  • Check the box that says Allow executing file as program.

  • Close the Properties window.
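If your Linux file manager has no Permissions tab, the same permission can be set from a script. The sketch below uses only the Python standard library and adds the execute bit for the file's owner; `make_executable` is a hypothetical helper name, and the path you pass to it depends on where you placed the repository.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for the file's owner, keeping other bits as-is."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
```

For example, `make_executable("run_gui_linux.sh")` run from inside the GUI repository would mark the launcher as executable for your user.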

How to Run

On Windows, open the GUI repository and double-click run_gui_windows.bat.

On Mac OS, open Applications and double-click run_gui_mac.command.

On Linux, open the GUI repository and double-click run_gui_linux.sh.
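If you script your setup across several machines, the launcher for the current operating system can be picked as in the following sketch. The launcher file names come from this document; `launcher_for` is a hypothetical helper that only returns the file name and does not start the GUI.

```python
import platform

LAUNCHERS = {
    "Windows": "run_gui_windows.bat",
    "Darwin": "run_gui_mac.command",  # platform.system() reports Mac OS as "Darwin"
    "Linux": "run_gui_linux.sh",
}


def launcher_for(system=None):
    """Return the launcher file name for the given or current operating system."""
    system = system or platform.system()
    try:
        return LAUNCHERS[system]
    except KeyError:
        raise ValueError(f"no launcher documented for {system!r}") from None
```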