extract_similarity_dataset

The job extracts pairs of highlights from each Video or Sound file. The pairs can be used to calibrate an archive's similarity parameters.

It is useful when you do not want to build your similarity calibration dataset manually and your data consists of videos or sounds, from which two highlights can be sampled along the timeline.

Required Account Privileges: "read"

Request JSON ["inputs"]:

   "source_content_type":
      string in ["Video", "Sound"]
      null NOT allowed
      A string specifying the type of source content.

   "content_type":
      string in ["Video", "Image", "Sound"]
      null NOT allowed
      A string specifying the type of output content.

   "max_nr_of_pairs":
      int (>=1)
      null NOT allowed
      The total number of pairs to export from all the files. If the number is smaller than the total number of files, the most diverse pairs are prioritized.

   "custom_vectorizer_name":
      string (3 <= len <= 30)
      null allowed
      An optional string representing the name of the custom vectorizer to be used for sampling.
   
   "file_urls":
      list of strings
      null allowed
      An optional list of strings containing the URLs of files to be downloaded.
      
   "download_from_batch_cloud_folder":
      bool
      null NOT allowed
      A boolean indicating whether to download files from the batch cloud folder.
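
The constraints listed above can be checked client-side before a job is submitted. The following is a minimal Python sketch, not part of the API: the helper name build_inputs and the example URL are hypothetical, and the rule that at least one file source must be given is an assumption drawn from the File Requirements section. Only the "inputs" object is built here; how it is wrapped and sent is not covered by this section.

```python
import json

ALLOWED_SOURCE_TYPES = {"Video", "Sound"}
ALLOWED_OUTPUT_TYPES = {"Video", "Image", "Sound"}


def build_inputs(source_content_type, content_type, max_nr_of_pairs,
                 download_from_batch_cloud_folder,
                 custom_vectorizer_name=None, file_urls=None):
    """Hypothetical helper: validate and assemble the "inputs" object."""
    if source_content_type not in ALLOWED_SOURCE_TYPES:
        raise ValueError("source_content_type must be 'Video' or 'Sound'")
    if content_type not in ALLOWED_OUTPUT_TYPES:
        raise ValueError("content_type must be 'Video', 'Image' or 'Sound'")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(max_nr_of_pairs, bool)
            or not isinstance(max_nr_of_pairs, int)
            or max_nr_of_pairs < 1):
        raise ValueError("max_nr_of_pairs must be an int >= 1")
    if not isinstance(download_from_batch_cloud_folder, bool):
        raise ValueError("download_from_batch_cloud_folder must be a bool")
    if custom_vectorizer_name is not None:
        if (not isinstance(custom_vectorizer_name, str)
                or not 3 <= len(custom_vectorizer_name) <= 30):
            raise ValueError("custom_vectorizer_name must have 3 to 30 characters")
    if file_urls is not None:
        if (not isinstance(file_urls, list)
                or not all(isinstance(url, str) for url in file_urls)):
            raise ValueError("file_urls must be a list of strings")
    # Assumption (from File Requirements): files come from at least one source.
    if not download_from_batch_cloud_folder and not file_urls:
        raise ValueError("provide file_urls or use the batch cloud folder")
    return {
        "source_content_type": source_content_type,
        "content_type": content_type,
        "max_nr_of_pairs": max_nr_of_pairs,
        "custom_vectorizer_name": custom_vectorizer_name,
        "file_urls": file_urls,
        "download_from_batch_cloud_folder": download_from_batch_cloud_folder,
    }


# Example with a placeholder URL: sample image pairs from one video.
example_inputs = build_inputs(
    "Video", "Image", 50, False,
    file_urls=["https://example.com/videos/clip_01.mp4"],
)
print(json.dumps(example_inputs, indent=2))
```

Optional fields are emitted as null when omitted, which the parameter list above permits for "custom_vectorizer_name" and "file_urls".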

Response JSON ["results"]:

   "similarity_calibration_pairs_download_urls":
      list of strings
         public_url (string). File names from the same pair share a prefix with the structure "<pair_id>_cluster".
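
Because the two files of a pair share a file-name prefix, the returned URLs can be regrouped into pairs on the client. The sketch below is illustrative only: the function name group_urls_by_pair and the example URLs are hypothetical, and it assumes the pair id is everything in the file name before the literal text "_cluster". The exact file-name layout after the prefix is not specified here.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlparse


def group_urls_by_pair(urls):
    """Group download URLs by the pair id encoded in each file name.

    Assumes file names start with "<pair_id>_cluster". URLs whose file
    name lacks that marker are collected under the key None.
    """
    pairs = defaultdict(list)
    for url in urls:
        # Use only the path component so query strings are ignored.
        file_name = posixpath.basename(urlparse(url).path)
        pair_id, marker, _rest = file_name.partition("_cluster")
        pairs[pair_id if marker else None].append(url)
    return dict(pairs)


# Placeholder URLs, invented for illustration.
example_urls = [
    "https://example.com/export/12_cluster_a.jpg",
    "https://example.com/export/12_cluster_b.jpg",
    "https://example.com/export/7_cluster_a.jpg",
    "https://example.com/export/7_cluster_b.jpg",
]
for pair_id, members in group_urls_by_pair(example_urls).items():
    print(pair_id, len(members))
```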

File Requirements

Files must be sent via FTP to the batch cloud folder or listed in "file_urls".