fine_tune_vectorizer

This job fine-tunes the vectorizer model on your own data. Fine-tuning improves the accuracy and relevance of similarity results for the given content type. It requires a small labeled dataset.

The custom model will be exclusively accessible from your account and can be used across different archives.

You can use both real and synthetic (AI-generated) data for your examples.

Required Account Privileges: "read-write"

Request JSON ["inputs"]:

   "content_type": 
       string in ["Image", "Video", "Sound", "Text", "Point_Cloud"]
       null NOT allowed
       A string specifying the type of content to be vectorized.
    
   "custom_vectorizer_name_for_sampling": 
       string (3 <= len <= 30)
       null allowed
       An optional saved custom vectorizer name if sampling data with a custom vectorizer.
    
   "starting_custom_vectorizer_name": 
       string (3 <= len <= 30)
       null allowed
       An optional saved custom vectorizer name, used if you want to start fine-tuning from a previously saved custom vectorizer.

   "custom_vectorizer_name":
      string (3 <= len <= 30)
      null NOT allowed
      A string indicating the name of the new custom vectorizer being created.

   "augmentation_conditions":
      dict of string to bool
      null allowed 
      Optional. A dict that controls the transformations applied to Image and Video data to increase the dataset size with variations.
      All options are true by default; set an option to false if it would invalidate the data in your use case.
      The options are:
         "horizontal_flip" -  Randomly flips the image/video horizontally.
         "vertical_flip" - Randomly flips the image/video vertically.
         "small_rotation" -  Applies small random rotations up to 30 degrees to the image/video.
         "large_rotation" - Applies larger random rotations up to 90 degrees to the image/video.
         "center_crop" - Crops the image/video around its center, keeping the aspect ratio, at a scale between 0.5 and 1.0 of the image size.
         "brightness_jitter" - Randomly adjusts the brightness of the image/video.
         "contrast_jitter" - Randomly adjusts the contrast of the image/video.
         "saturation_jitter" - Randomly adjusts the color saturation of the image/video.
         "blur" - Applies a Gaussian blur to the image/video with a randomly chosen blur radius.
       For Sound data, time-domain augmentations like adding noise, time stretching, pitch shifting, and volume adjustments are randomly applied.
       For Point_Cloud data, slight random noise is added to the coordinates as a form of augmentation.
       For Text data, augmentation is not applied automatically as it can easily alter the meaning and coherence of the text.
   
   "file_urls":
      list of strings
      null allowed
      An optional list of strings containing the URLs of files to be downloaded.
      
   "download_from_batch_cloud_folder":
      bool
      null NOT allowed
      A boolean indicating whether to download files from the batch cloud folder.
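
The constraints listed above can be checked on the client side before a request is sent. The following Python sketch is illustrative only: the helper names (build_inputs, _check_name) and the example vectorizer name are hypothetical, and any request fields outside "inputs" are not covered by this section and are left out.

```python
import json

# Allowed values as documented for this job.
CONTENT_TYPES = ("Image", "Video", "Sound", "Text", "Point_Cloud")
AUGMENTATIONS = (
    "horizontal_flip", "vertical_flip", "small_rotation", "large_rotation",
    "center_crop", "brightness_jitter", "contrast_jitter",
    "saturation_jitter", "blur",
)


def _check_name(key, value, required):
    """Hypothetical helper: enforce the documented 3..30 character limit."""
    if value is None:
        if required:
            raise ValueError(f'"{key}" must not be null')
        return
    if not isinstance(value, str) or not 3 <= len(value) <= 30:
        raise ValueError(f'"{key}" must be a string of 3 to 30 characters')


def build_inputs(content_type, custom_vectorizer_name,
                 download_from_batch_cloud_folder, file_urls=None,
                 augmentation_conditions=None,
                 starting_custom_vectorizer_name=None,
                 custom_vectorizer_name_for_sampling=None):
    """Hypothetical helper: assemble and validate the "inputs" dict."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f'"content_type" must be one of {CONTENT_TYPES}')
    _check_name("custom_vectorizer_name", custom_vectorizer_name, True)
    _check_name("starting_custom_vectorizer_name",
                starting_custom_vectorizer_name, False)
    _check_name("custom_vectorizer_name_for_sampling",
                custom_vectorizer_name_for_sampling, False)
    if not isinstance(download_from_batch_cloud_folder, bool):
        raise ValueError('"download_from_batch_cloud_folder" must be a bool')
    for option, enabled in (augmentation_conditions or {}).items():
        if option not in AUGMENTATIONS or not isinstance(enabled, bool):
            raise ValueError(f"invalid augmentation option: {option!r}")
    return {
        "content_type": content_type,
        "custom_vectorizer_name_for_sampling": custom_vectorizer_name_for_sampling,
        "starting_custom_vectorizer_name": starting_custom_vectorizer_name,
        "custom_vectorizer_name": custom_vectorizer_name,
        "augmentation_conditions": augmentation_conditions,
        "file_urls": file_urls,
        "download_from_batch_cloud_folder": download_from_batch_cloud_folder,
    }


# Example: disable two augmentations that would invalidate the data.
inputs = build_inputs(
    "Image", "product_photos_v1", True,
    augmentation_conditions={"vertical_flip": False, "large_rotation": False},
)
request_json = json.dumps({"inputs": inputs}, indent=4)
print(request_json)
```

Options omitted from "augmentation_conditions" keep their default value of true, so only the transformations to disable need to be listed.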

Response JSON ["results"]

The job does not return results in the response JSON.

File Requirements

Files must be sent via FTP to the batch cloud folder or listed in "file_urls".

Fine-tuning a vectorizer is a classification-based training step that forces the model to pay attention to the important features and to ignore the irrelevant ones.
We recommend at least 100 examples and no more than 10000 examples for fine-tuning.
You can have up to 100 labels in your fine-tuning dataset.
A label can be any class that describes the content. Each file can have multiple labels.
Open a text editor and add the labels of each file in the following format:
{
    "file_name_1.ext": ["label_1", "label_2", ...],
    "file_name_2.ext": ["label_2"],
    "file_name_3.ext": ["label_1", "label_3"]
}
Save the file as "example_to_labels.json" and place it with the dataset files.
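
For larger datasets the labels file can also be generated with a script instead of a text editor. The following Python sketch is one possible approach: the function names (check_labels, write_labels_file) and the sample file names and labels are made up for illustration, and a temporary directory stands in for the real dataset folder.

```python
import json
import tempfile
from pathlib import Path


def check_labels(example_to_labels):
    """Return warnings based on the limits stated in this section."""
    warnings = []
    n_examples = len(example_to_labels)
    if not 100 <= n_examples <= 10000:
        warnings.append(f"{n_examples} examples; 100 to 10000 are recommended")
    # Count distinct labels across all files.
    labels = {label
              for file_labels in example_to_labels.values()
              for label in file_labels}
    if len(labels) > 100:
        warnings.append(f"{len(labels)} distinct labels; up to 100 are allowed")
    return warnings


def write_labels_file(dataset_dir, example_to_labels):
    """Write the mapping next to the dataset files and return its path."""
    path = Path(dataset_dir) / "example_to_labels.json"
    path.write_text(json.dumps(example_to_labels, indent=4), encoding="utf-8")
    return path


# Made-up dataset: 120 files, some with one label and some with two.
example_to_labels = {
    f"image_{i:04d}.jpg": (["cat"] if i % 2 == 0 else ["dog", "outdoor"])
    for i in range(120)
}
warnings = check_labels(example_to_labels)

# A temporary directory stands in for the real dataset folder here.
with tempfile.TemporaryDirectory() as dataset_dir:
    path = write_labels_file(dataset_dir, example_to_labels)
    reloaded = json.loads(path.read_text(encoding="utf-8"))
```

In practice, dataset_dir would be the folder that holds the files sent to the batch cloud folder, so that the labels file travels with them.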