data_balance
The job recommends adjustments to your dataset so that its content is more evenly represented. It identifies over-represented and under-represented items based on their diversity and their similarity to essential and forbidden examples. The resulting, more evenly distributed dataset supports improved model training, efficient archiving, and relevant discovery.
Required Account Privileges: "read"
Request JSON ["inputs"]:
"clustered_content_ids_sorted_by_decreasing_diversity_with_contents_sorted_by_distance_to_centroid": list of lists of ints, null NOT allowed. A list of lists of integers in which each sublist holds the content IDs of one cluster. Clusters are sorted by decreasing diversity, and the IDs within each cluster are sorted by distance to the centroid.
"ids_sorted_from_inliers_to_outliers": list of ints, null allowed. An optional list of content IDs sorted from inliers to outliers.
"ids_sorted_by_essential_examples": list of ints, null allowed. An optional list of content IDs sorted by essential examples.
"ids_sorted_by_forbidden_examples": list of ints, null allowed. An optional list of content IDs sorted by forbidden examples.
Response JSON ["results"]:
"prioritized_over_represented_ids_to_remove": list of ints
"prioritized_under_represented_ids_to_source": list of ints