sample_data

The sample_data job extracts samples from video, sound, and text content. It supports highlight sampling and time-interval sampling, so you can generate a representative subset of content that is too long to analyze or process in full.

You do not need to call sample_data before calling the index job, because the index job automatically samples data that is too long. Call sample_data yourself only if you need to review the selected highlights.

Required Account Privileges: "read"

Request JSON ["inputs"]:

   "source_content_type":
      string in ["Video", "Sound", "Text"]
      null NOT allowed
      A string specifying the type of source content.
      
   "content_type":
      string in ["Image", "Video", "Sound", "Text"]
      null NOT allowed
      A string specifying the type of content after sampling.
      
   "time_intervals_or_highlights":
      string in ["time_intervals", "highlights"]
      null NOT allowed
      A string indicating whether to sample based on time intervals or highlights.
      
   "time_interval":
      float (>=0.1)
      null allowed
      A float specifying the length of each time interval in seconds for sampling.
      
   "nr_samples_per_file": 
      int (>=1)
      null allowed 
      An integer specifying the number of samples to extract per file. If null and using time intervals, all possible intervals are extracted; if null and using highlights, a single sample will be extracted.
   
   "custom_vectorizer_name":
      string (3 <= len <= 30)
      null allowed
      An optional string representing the name of the custom vectorizer to be used for sampling.
   
   "file_urls":
      list of strings
      null allowed
      An optional list of strings containing the URLs of files to be downloaded.
      
   "download_from_batch_cloud_folder":
      bool
      null NOT allowed
      A boolean indicating whether to download files from the batch cloud folder.
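
A minimal sketch of how the "inputs" object above could be assembled and checked on the client side before submitting the job. The helper name build_sample_data_inputs is hypothetical and not part of the API; it only mirrors the constraints listed above. How the object is wrapped and sent (endpoint, authentication) is not covered here.

```python
import json
from typing import Optional

# Allowed values, copied from the parameter list above.
SOURCE_CONTENT_TYPES = ("Video", "Sound", "Text")
CONTENT_TYPES = ("Image", "Video", "Sound", "Text")
SAMPLING_MODES = ("time_intervals", "highlights")


def build_sample_data_inputs(
    source_content_type: str,
    content_type: str,
    time_intervals_or_highlights: str,
    download_from_batch_cloud_folder: bool,
    time_interval: Optional[float] = None,
    nr_samples_per_file: Optional[int] = None,
    custom_vectorizer_name: Optional[str] = None,
    file_urls: Optional[list] = None,
) -> dict:
    """Hypothetical helper: check the documented constraints, return the "inputs" dict."""
    if source_content_type not in SOURCE_CONTENT_TYPES:
        raise ValueError(f"source_content_type must be one of {SOURCE_CONTENT_TYPES}")
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"content_type must be one of {CONTENT_TYPES}")
    if time_intervals_or_highlights not in SAMPLING_MODES:
        raise ValueError(f"time_intervals_or_highlights must be one of {SAMPLING_MODES}")
    if not isinstance(download_from_batch_cloud_folder, bool):
        raise ValueError("download_from_batch_cloud_folder must be a bool")
    if time_interval is not None and time_interval < 0.1:
        raise ValueError("time_interval must be >= 0.1 seconds")
    if nr_samples_per_file is not None and nr_samples_per_file < 1:
        raise ValueError("nr_samples_per_file must be >= 1")
    if custom_vectorizer_name is not None and not 3 <= len(custom_vectorizer_name) <= 30:
        raise ValueError("custom_vectorizer_name must be 3 to 30 characters long")
    return {
        "source_content_type": source_content_type,
        "content_type": content_type,
        "time_intervals_or_highlights": time_intervals_or_highlights,
        "time_interval": time_interval,
        "nr_samples_per_file": nr_samples_per_file,
        "custom_vectorizer_name": custom_vectorizer_name,
        "file_urls": file_urls,
        "download_from_batch_cloud_folder": download_from_batch_cloud_folder,
    }


# Example: sample a video into images, one sample every 2 seconds,
# at most 10 samples per file, reading files from the batch cloud folder.
inputs = build_sample_data_inputs(
    source_content_type="Video",
    content_type="Image",
    time_intervals_or_highlights="time_intervals",
    download_from_batch_cloud_folder=True,
    time_interval=2.0,
    nr_samples_per_file=10,
)
print(json.dumps(inputs, indent=2))
```

Optional fields left unset are emitted as null, which the parameter list above marks as allowed for those fields.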

Response JSON ["results"]:

   "samples_file_names_with_download_urls":
      list of lists of strings
         [file_name (string), public_url (string)]
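
As an illustration of the response shape above, the following sketch turns the list of [file_name, public_url] pairs into a dictionary keyed by file name. The function name samples_by_file_name and the example file names and URLs are made up for this sketch; only the "results" and "samples_file_names_with_download_urls" keys come from the description above.

```python
def samples_by_file_name(response: dict) -> dict:
    """Hypothetical helper: map each sample file name to its public download URL."""
    pairs = response["results"]["samples_file_names_with_download_urls"]
    return {file_name: public_url for file_name, public_url in pairs}


# Made-up response body that follows the documented structure.
example_response = {
    "results": {
        "samples_file_names_with_download_urls": [
            ["clip_sample_0.jpg", "https://example.com/clip_sample_0.jpg"],
            ["clip_sample_1.jpg", "https://example.com/clip_sample_1.jpg"],
        ]
    }
}

for name, url in samples_by_file_name(example_response).items():
    print(name, url)
```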

File Requirements

Files must be sent via FTP to the batch cloud folder or listed in "file_urls".
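
A small client-side check for the file requirement, as a sketch. It assumes that a request needs at least one file source, either "download_from_batch_cloud_folder" set to true or a non-empty "file_urls" list; how the service itself reacts when neither is given is not stated here. The function name has_file_source is hypothetical.

```python
def has_file_source(inputs: dict) -> bool:
    """Hypothetical check: True if the inputs name at least one source of files."""
    uses_batch_folder = inputs.get("download_from_batch_cloud_folder") is True
    has_urls = bool(inputs.get("file_urls"))  # None and [] both count as missing
    return uses_batch_folder or has_urls


# Files already sent via FTP to the batch cloud folder.
print(has_file_source({"download_from_batch_cloud_folder": True, "file_urls": None}))
# Neither source given.
print(has_file_source({"download_from_batch_cloud_folder": False, "file_urls": []}))
```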