Automatic determination of data samples in need of human annotation for a machine learning model improvement
Jan 10, 2023
Daniel Kałuża
Antoni Jamiołkowski
Andrzej Janusz
Igor Marczak
Maciej Matraszek
Andrzej Skowron
Dominik Ślęzak
Abstract
In one embodiment, a method includes determining which objects from a substantial dataset are expected to lead to the largest increase in model quality by applying a sample-selection algorithm using computational capability comprising a processor and/or a memory (e.g., of a processing system and/or a graphics processing unit). The method quantifies an informativeness score for data elements in the substantial dataset to determine how likely, and/or to what degree, those data elements will lead to model improvement. The method then automatically determines which data elements of the substantial dataset are in need of human annotation, based on a prioritization order derived from the informativeness score, and selects data accordingly. The method then matches the selected data to an expert.
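The pipeline described in the abstract (score, prioritize, select, match) can be illustrated with a minimal sketch. The abstract does not say how the informativeness score is computed or how experts are matched, so everything below is an assumption made for illustration: the score is the Shannon entropy of a model's predicted class probabilities (a common uncertainty measure in active learning), and the names `entropy`, `select_for_annotation`, `match_to_experts`, and the round-robin assignment are hypothetical, not the patented method.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution.

    Assumption: entropy stands in for the informativeness score,
    which the abstract leaves unspecified.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Order item ids by descending entropy and keep the top `budget`."""
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:budget]


def match_to_experts(selected, experts):
    """Hypothetical matching rule: assign selected items round-robin."""
    return {
        item_id: experts[i % len(experts)]
        for i, item_id in enumerate(selected)
    }


# Toy predicted class probabilities for four unlabeled items.
predictions = {
    "a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "b": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    "c": [0.70, 0.20, 0.10],
    "d": [0.50, 0.49, 0.01],
}

selected = select_for_annotation(predictions, budget=2)
assignment = match_to_experts(selected, ["expert_1", "expert_2"])
print(selected)
print(assignment)
```

In this toy run the two most uncertain items, `b` and `c`, are chosen and handed to one expert each. A real system would presumably use a richer score and match items to experts by competence rather than by position in the list.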
Type
Publication
In United States Patent Office (Pending)
