fine_tune_translator

This job fine-tunes a translator model on your own data so that content of one type can be translated into another. A translator can then be used with the search job to query an archive of one content type using examples of a different type.

Fine-tuning requires a small dataset of example pairs. The custom model is accessible only from your account and can be used across different archives.

Your examples can be real data, synthetic (AI-generated) data, or a mix of both.

Required Account Privileges: "read-write"

Request JSON ["inputs"]:

   "translator_name": 
      string (3 <= len <= 30)
      null NOT allowed
      A string specifying the name of the custom translator being created.

   "input_content_type":  
      string in ["Image", "Video", "Sound", "Text", "Point_Cloud"]
      null NOT allowed
      A string specifying the type of input content.
    
   "input_custom_vectorizer_name":
      string (3 <= len <= 30)
      null allowed
      An optional string representing the name of the custom vectorizer to be used for input content.
    
   "output_content_type":
      string in ["Image", "Video", "Sound", "Text", "Point_Cloud"]
      null NOT allowed
      A string specifying the type of output content.

   "output_custom_vectorizer_name":
      string (3 <= len <= 30)
      null allowed
      An optional string representing the name of the custom vectorizer to be used for output content.
    
   "file_urls":
      list of strings
      null allowed
      An optional list of strings containing the URLs of files to be downloaded.
      
   "download_from_batch_cloud_folder":
      bool
      null NOT allowed
      A boolean indicating whether to download files from the batch cloud folder.
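
As a rough illustration of the field rules above, the following Python sketch builds and checks the ["inputs"] object on the client side. The helper names (`build_translator_inputs`, `_check_name`) and the example translator name are hypothetical, and the defaults for the optional arguments are an assumption. The rest of the request (endpoint, authentication, and any fields outside ["inputs"]) is not covered in this section, so it is left out.

```python
import json

CONTENT_TYPES = ("Image", "Video", "Sound", "Text", "Point_Cloud")


def _check_name(label, value, allow_null):
    """Enforce the documented 3 <= len <= 30 rule for name fields."""
    if value is None:
        if allow_null:
            return
        raise ValueError(f"{label} must not be null")
    if not isinstance(value, str) or not 3 <= len(value) <= 30:
        raise ValueError(f"{label} must be a string of 3 to 30 characters")


def build_translator_inputs(translator_name, input_content_type, output_content_type,
                            input_custom_vectorizer_name=None,
                            output_custom_vectorizer_name=None,
                            file_urls=None,
                            download_from_batch_cloud_folder=False):
    """Return the ["inputs"] object after checking it against the field rules."""
    _check_name("translator_name", translator_name, allow_null=False)
    _check_name("input_custom_vectorizer_name", input_custom_vectorizer_name, allow_null=True)
    _check_name("output_custom_vectorizer_name", output_custom_vectorizer_name, allow_null=True)

    for label, value in (("input_content_type", input_content_type),
                         ("output_content_type", output_content_type)):
        if value not in CONTENT_TYPES:
            raise ValueError(f"{label} must be one of {list(CONTENT_TYPES)}")

    if file_urls is not None:
        if not isinstance(file_urls, list) or not all(isinstance(u, str) for u in file_urls):
            raise ValueError("file_urls must be null or a list of strings")

    if not isinstance(download_from_batch_cloud_folder, bool):
        raise ValueError("download_from_batch_cloud_folder must be a bool")

    return {
        "translator_name": translator_name,
        "input_content_type": input_content_type,
        "input_custom_vectorizer_name": input_custom_vectorizer_name,
        "output_content_type": output_content_type,
        "output_custom_vectorizer_name": output_custom_vectorizer_name,
        "file_urls": file_urls,
        "download_from_batch_cloud_folder": download_from_batch_cloud_folder,
    }


# Hypothetical example: files were already uploaded to the batch cloud folder.
inputs = build_translator_inputs(
    translator_name="sketch_to_photo",
    input_content_type="Image",
    output_content_type="Image",
    download_from_batch_cloud_folder=True,
)
print(json.dumps(inputs, indent=3))
```

Nullable fields are serialized as JSON null here; whether the service also accepts omitting them is not stated in this section.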

Response JSON ["results"]

The job does not return results in the response JSON.

File Requirements

Files must either be uploaded via FTP to the batch cloud folder or listed in "file_urls".

A fine-tuned translator lets you search a vector archive of one content type using data of another content type, or using data of the same content type that has gone through initial processing.
We recommend between 100 and 20,000 examples for fine-tuning.

Open a text editor and list the input and output name pairs in the following format:
[
    ["input_name_1", "output_name_1"],
    ["input_name_2", "output_name_2"],
    ["input_name_3", "output_name_3"]
    ...
]

Save the file as "training_mappings.json" and place it with the files.
Optionally, you can also create a "validation_mappings.json" file in the same format.
If no validation file is provided, 15% of the training data is used for validation.
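
The mapping files can also be generated with a script instead of a text editor. The Python sketch below is one way to do it, under these assumptions: `write_mapping_files` is a hypothetical helper, each pair holds the input name and the output name of one example, and the optional local validation split (with its shuffle and seed) is an illustrative choice, not the service's own splitting method.

```python
import json
import random
import warnings
from pathlib import Path


def write_mapping_files(pairs, folder, validation_fraction=None, seed=0):
    """Write training_mappings.json and, optionally, validation_mappings.json.

    pairs: iterable of (input_name, output_name) tuples or lists.
    validation_fraction: None writes only the training file, leaving the
    default 15% validation split to the service.
    """
    pairs = [[str(input_name), str(output_name)] for input_name, output_name in pairs]

    # 100 to 20000 examples is a recommendation, so warn instead of failing.
    if not 100 <= len(pairs) <= 20000:
        warnings.warn(f"{len(pairs)} examples is outside the recommended 100 to 20000 range")

    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    validation = []
    if validation_fraction is not None:
        # Shuffle with a fixed seed so the split is reproducible.
        random.Random(seed).shuffle(pairs)
        n_val = round(len(pairs) * validation_fraction)
        validation, pairs = pairs[:n_val], pairs[n_val:]
        (folder / "validation_mappings.json").write_text(
            json.dumps(validation, indent=4), encoding="utf-8"
        )

    (folder / "training_mappings.json").write_text(
        json.dumps(pairs, indent=4), encoding="utf-8"
    )
    return pairs, validation
```

For example, `write_mapping_files(pairs, "upload_folder", validation_fraction=0.1)` would write both files into a hypothetical "upload_folder" directory, which should be the same place as the example files themselves.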