Automatic determination of data samples in need of human annotation for a machine learning model improvement

Jan 10, 2023

Daniel Kałuża

Antoni Jamiołkowski

Andrzej Janusz

Igor Marczak

Maciej Matraszek

Andrzej Skowron

Dominik Ślęzak

Google Patents for US20250021864A1

Abstract

In one embodiment, a method includes determining which objects from a substantial dataset are expected to lead to the largest increase in model quality. It does so by applying a samples-selection algorithm using computational capability comprising a processor and/or a memory (e.g., of a processing system and/or a graphics processing unit). The method quantifies an informativeness score for data elements in the substantial dataset to determine how likely they are to lead to model improvement and/or by what degree. The method then automatically determines which data elements of the substantial dataset are in need of human annotation, based on a prioritization order derived from the informativeness score, and chooses the selected data according to that determination. Finally, the method matches the selected data to an expert.
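The abstract does not say how the informativeness score is computed or how data is matched to experts. The sketch below illustrates the overall pipeline: score, prioritize, select for annotation, then match to an expert. It rests on several assumptions:

- Predictive entropy serves as the informativeness score. This is a common uncertainty-sampling choice in active learning.
- The annotation budget is fixed.
- Experts are assigned round-robin.

All names (`informativeness`, `prioritize`, `select_for_annotation`, `match_to_experts`, the sample and expert IDs) are hypothetical. This is not the patented method.

```python
import math
from typing import Dict, List, Sequence


def informativeness(probs: Sequence[float]) -> float:
    """Assumed score: predictive entropy of a model's class-probability vector.
    Higher entropy means the model is less certain, so labeling the sample
    is assumed to be more useful."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def prioritize(predictions: Dict[str, Sequence[float]]) -> List[str]:
    """Prioritization order: sample IDs sorted from most to least informative."""
    return sorted(predictions, key=lambda sid: informativeness(predictions[sid]), reverse=True)


def select_for_annotation(predictions: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Select the top-`budget` samples in the prioritization order."""
    return prioritize(predictions)[:budget]


def match_to_experts(selected: List[str], experts: List[str]) -> Dict[str, List[str]]:
    """Hypothetical matching policy: distribute selected samples round-robin."""
    if not experts:
        raise ValueError("at least one expert is required")
    assignment: Dict[str, List[str]] = {e: [] for e in experts}
    for i, sid in enumerate(selected):
        assignment[experts[i % len(experts)]].append(sid)
    return assignment


if __name__ == "__main__":
    # Hypothetical model outputs (class probabilities) for unlabeled samples.
    preds = {
        "img_001": [0.98, 0.01, 0.01],  # confident prediction, low priority
        "img_002": [0.34, 0.33, 0.33],  # near-uniform, high priority
        "img_003": [0.60, 0.35, 0.05],
        "img_004": [0.50, 0.50, 0.00],
    }
    chosen = select_for_annotation(preds, budget=3)
    print(chosen)
    print(match_to_experts(chosen, ["expert_a", "expert_b"]))
```

With this toy input, the near-uniform prediction `img_002` is selected first and the confident `img_001` is left out. In practice, the matching step would likely use expert competence rather than round-robin; that policy is purely illustrative.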

Type

Patent

Publication

In United States Patent Office (Pending)

Last updated on Jan 10, 2023

Authors

PhD Candidate

Currently, my research focuses on low-power wireless sensor networks from various angles: I have conducted sociometric studies with wearable IoT devices, and I am now trying to model the inner workings of a microcontroller.

Andrzej Skowron

Dominik Ślęzak
