Automatic determination of data samples in need of human annotation for a machine learning model improvement

Jan 10, 2023
Daniel Kałuża, Antoni Jamiołkowski, Andrzej Janusz, Igor Marczak, Maciej Matraszek, Andrzej Skowron, Dominik Ślęzak
Abstract
In one embodiment, a method includes determining which objects from a substantial dataset are expected to lead to the largest increase in model quality by applying a sample-selection algorithm using computational capability comprising a processor and/or a memory (e.g., of a processing system and/or a graphics processing unit). The method quantifies an informativeness score for data elements in the substantial dataset to determine how likely, and/or to what degree, those data elements will lead to model improvement. The method then automatically determines which data elements of the substantial dataset are in need of human annotation based on a prioritization order derived from the informativeness score, and selects data accordingly. The method then matches the selected data to an expert.
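The pipeline described in the abstract (score, prioritize, select, match) can be illustrated with a minimal sketch. The abstract does not say how informativeness is computed or how experts are matched, so everything concrete here is an assumption made for illustration: predictive entropy stands in for the informativeness score, a fixed annotation `budget` stands in for the selection rule, and round-robin assignment stands in for expert matching. The function names and sample data are hypothetical and are not taken from the patent application.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution.

    Assumption: used here as a stand-in informativeness score;
    the abstract does not name a specific scoring function.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Order sample ids by descending score and keep the top `budget`."""
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:budget]


def match_to_experts(selected, experts):
    """Hypothetical matching rule: assign samples to experts round-robin."""
    return {
        sample_id: experts[i % len(experts)]
        for i, sample_id in enumerate(selected)
    }


# Made-up model outputs: sample id -> predicted class probabilities.
predictions = {
    "a": [0.98, 0.02],
    "b": [0.55, 0.45],
    "c": [0.70, 0.30],
    "d": [0.50, 0.50],
}

selected = select_for_annotation(predictions, budget=2)
assignment = match_to_experts(selected, ["expert_1", "expert_2"])
```

In this toy run the two samples whose predictions are closest to uniform are ranked first, since the model is least certain about them. A real system would likely replace both the score and the matching rule with something domain-specific, for example by taking annotator expertise into account.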
Type
Publication
In United States Patent Office (Pending)
Authors
Maciej Matraszek
PhD Candidate
Currently, my research focuses on various aspects of low-power wireless sensor networks: at one time I conducted sociometric studies with wearable IoT devices; at another, I am trying to model the inner workings of a microcontroller.