diff --git a/latest_release.txt b/latest_release.txt index 3af323aac..57fec884d 100644 --- a/latest_release.txt +++ b/latest_release.txt @@ -1 +1 @@ -v2.6.5 +v2.6.6 diff --git a/v2.6.6/.buildinfo b/v2.6.6/.buildinfo new file mode 100644 index 000000000..106cc014f --- /dev/null +++ b/v2.6.6/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 4bb82cf1760245cc580767c78eddaf79 +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/v2.6.6/.doctrees/cleanlab/benchmarking/index.doctree b/v2.6.6/.doctrees/cleanlab/benchmarking/index.doctree new file mode 100644 index 000000000..f6113ce61 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/benchmarking/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/benchmarking/noise_generation.doctree b/v2.6.6/.doctrees/cleanlab/benchmarking/noise_generation.doctree new file mode 100644 index 000000000..ebedd19b8 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/benchmarking/noise_generation.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/classification.doctree b/v2.6.6/.doctrees/cleanlab/classification.doctree new file mode 100644 index 000000000..331b60fbb Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/classification.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/count.doctree b/v2.6.6/.doctrees/cleanlab/count.doctree new file mode 100644 index 000000000..d470edd0e Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/count.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/data_valuation.doctree b/v2.6.6/.doctrees/cleanlab/data_valuation.doctree new file mode 100644 index 000000000..8f40b1b01 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/data_valuation.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/datalab.doctree b/v2.6.6/.doctrees/cleanlab/datalab/datalab.doctree new file mode 100644 index 000000000..d8de7eb11 Binary files /dev/null 
and b/v2.6.6/.doctrees/cleanlab/datalab/datalab.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/guide/_templates/issue_types_tip.doctree b/v2.6.6/.doctrees/cleanlab/datalab/guide/_templates/issue_types_tip.doctree new file mode 100644 index 000000000..6a4ce9a58 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/guide/_templates/issue_types_tip.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/guide/custom_issue_manager.doctree b/v2.6.6/.doctrees/cleanlab/datalab/guide/custom_issue_manager.doctree new file mode 100644 index 000000000..0fedbb5ff Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/guide/custom_issue_manager.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/guide/generating_cluster_ids.doctree b/v2.6.6/.doctrees/cleanlab/datalab/guide/generating_cluster_ids.doctree new file mode 100644 index 000000000..b77c79590 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/guide/generating_cluster_ids.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/guide/index.doctree b/v2.6.6/.doctrees/cleanlab/datalab/guide/index.doctree new file mode 100644 index 000000000..73fe4f693 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/guide/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/guide/issue_type_description.doctree b/v2.6.6/.doctrees/cleanlab/datalab/guide/issue_type_description.doctree new file mode 100644 index 000000000..6fe684a64 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/guide/issue_type_description.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/guide/table.doctree b/v2.6.6/.doctrees/cleanlab/datalab/guide/table.doctree new file mode 100644 index 000000000..5da53150e Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/guide/table.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/index.doctree b/v2.6.6/.doctrees/cleanlab/datalab/index.doctree new file mode 100644 index 000000000..ecea0ad15 
Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/data.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/data.doctree new file mode 100644 index 000000000..17b764847 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/data.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/data_issues.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/data_issues.doctree new file mode 100644 index 000000000..452f75ca1 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/data_issues.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/factory.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/factory.doctree new file mode 100644 index 000000000..6bce38def Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/factory.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/index.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/index.doctree new file mode 100644 index 000000000..76d6ce521 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_finder.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_finder.doctree new file mode 100644 index 000000000..19f6e9229 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_finder.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/_notices/not_registered.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/_notices/not_registered.doctree new file mode 100644 index 000000000..23cd6ec35 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/_notices/not_registered.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/data_valuation.doctree 
b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/data_valuation.doctree new file mode 100644 index 000000000..0af7c9989 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/data_valuation.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/duplicate.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/duplicate.doctree new file mode 100644 index 000000000..12e11a1d6 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/duplicate.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/imbalance.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/imbalance.doctree new file mode 100644 index 000000000..9b4e5f775 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/imbalance.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/index.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/index.doctree new file mode 100644 index 000000000..a62fe36fc Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/issue_manager.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/issue_manager.doctree new file mode 100644 index 000000000..697a0cf8a Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/issue_manager.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/label.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/label.doctree new file mode 100644 index 000000000..0ec26ed0f Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/label.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/multilabel/index.doctree 
b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/multilabel/index.doctree new file mode 100644 index 000000000..526cd7938 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/multilabel/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/multilabel/label.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/multilabel/label.doctree new file mode 100644 index 000000000..3265ce6e7 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/multilabel/label.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/noniid.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/noniid.doctree new file mode 100644 index 000000000..79efc44fd Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/noniid.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/null.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/null.doctree new file mode 100644 index 000000000..40ad02300 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/null.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/outlier.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/outlier.doctree new file mode 100644 index 000000000..53f779057 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/outlier.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/regression/index.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/regression/index.doctree new file mode 100644 index 000000000..365de87e8 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/regression/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/regression/label.doctree 
b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/regression/label.doctree new file mode 100644 index 000000000..50aa1c7e5 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/regression/label.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/underperforming_group.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/underperforming_group.doctree new file mode 100644 index 000000000..fff9689e0 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/issue_manager/underperforming_group.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/model_outputs.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/model_outputs.doctree new file mode 100644 index 000000000..7321c1e96 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/model_outputs.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/report.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/report.doctree new file mode 100644 index 000000000..a8468c85f Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/report.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/internal/task.doctree b/v2.6.6/.doctrees/cleanlab/datalab/internal/task.doctree new file mode 100644 index 000000000..3f9ef3b8e Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/internal/task.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/datalab/optional_dependencies.doctree b/v2.6.6/.doctrees/cleanlab/datalab/optional_dependencies.doctree new file mode 100644 index 000000000..a03ee8ae8 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/datalab/optional_dependencies.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/dataset.doctree b/v2.6.6/.doctrees/cleanlab/dataset.doctree new file mode 100644 index 000000000..744d855eb Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/dataset.doctree differ diff --git 
a/v2.6.6/.doctrees/cleanlab/experimental/cifar_cnn.doctree b/v2.6.6/.doctrees/cleanlab/experimental/cifar_cnn.doctree new file mode 100644 index 000000000..130a90ea3 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/experimental/cifar_cnn.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/experimental/coteaching.doctree b/v2.6.6/.doctrees/cleanlab/experimental/coteaching.doctree new file mode 100644 index 000000000..51b5cb722 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/experimental/coteaching.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/experimental/index.doctree b/v2.6.6/.doctrees/cleanlab/experimental/index.doctree new file mode 100644 index 000000000..b0ed9d555 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/experimental/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/experimental/label_issues_batched.doctree b/v2.6.6/.doctrees/cleanlab/experimental/label_issues_batched.doctree new file mode 100644 index 000000000..f6d8f0a3b Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/experimental/label_issues_batched.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/experimental/mnist_pytorch.doctree b/v2.6.6/.doctrees/cleanlab/experimental/mnist_pytorch.doctree new file mode 100644 index 000000000..b7e904cec Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/experimental/mnist_pytorch.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/experimental/span_classification.doctree b/v2.6.6/.doctrees/cleanlab/experimental/span_classification.doctree new file mode 100644 index 000000000..59305e399 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/experimental/span_classification.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/filter.doctree b/v2.6.6/.doctrees/cleanlab/filter.doctree new file mode 100644 index 000000000..7de3e675c Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/filter.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/index.doctree 
b/v2.6.6/.doctrees/cleanlab/internal/index.doctree new file mode 100644 index 000000000..8dfb4b289 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/label_quality_utils.doctree b/v2.6.6/.doctrees/cleanlab/internal/label_quality_utils.doctree new file mode 100644 index 000000000..8bf2b3032 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/label_quality_utils.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/latent_algebra.doctree b/v2.6.6/.doctrees/cleanlab/internal/latent_algebra.doctree new file mode 100644 index 000000000..d1e6b43f2 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/latent_algebra.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/multiannotator_utils.doctree b/v2.6.6/.doctrees/cleanlab/internal/multiannotator_utils.doctree new file mode 100644 index 000000000..389f51be6 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/multiannotator_utils.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/multilabel_scorer.doctree b/v2.6.6/.doctrees/cleanlab/internal/multilabel_scorer.doctree new file mode 100644 index 000000000..3cb73f24b Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/multilabel_scorer.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/multilabel_utils.doctree b/v2.6.6/.doctrees/cleanlab/internal/multilabel_utils.doctree new file mode 100644 index 000000000..6dc52f351 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/multilabel_utils.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/neighbor/index.doctree b/v2.6.6/.doctrees/cleanlab/internal/neighbor/index.doctree new file mode 100644 index 000000000..0efd10016 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/neighbor/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/neighbor/knn_graph.doctree 
b/v2.6.6/.doctrees/cleanlab/internal/neighbor/knn_graph.doctree new file mode 100644 index 000000000..992abe002 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/neighbor/knn_graph.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/neighbor/metric.doctree b/v2.6.6/.doctrees/cleanlab/internal/neighbor/metric.doctree new file mode 100644 index 000000000..7e7a8935d Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/neighbor/metric.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/neighbor/search.doctree b/v2.6.6/.doctrees/cleanlab/internal/neighbor/search.doctree new file mode 100644 index 000000000..036f482d6 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/neighbor/search.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/outlier.doctree b/v2.6.6/.doctrees/cleanlab/internal/outlier.doctree new file mode 100644 index 000000000..9ec787e6e Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/outlier.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/token_classification_utils.doctree b/v2.6.6/.doctrees/cleanlab/internal/token_classification_utils.doctree new file mode 100644 index 000000000..03542d2b9 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/token_classification_utils.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/util.doctree b/v2.6.6/.doctrees/cleanlab/internal/util.doctree new file mode 100644 index 000000000..ee099047e Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/util.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/internal/validation.doctree b/v2.6.6/.doctrees/cleanlab/internal/validation.doctree new file mode 100644 index 000000000..3e3b021ba Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/internal/validation.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/models/fasttext.doctree b/v2.6.6/.doctrees/cleanlab/models/fasttext.doctree new file mode 100644 index 000000000..fab040946 Binary 
files /dev/null and b/v2.6.6/.doctrees/cleanlab/models/fasttext.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/models/index.doctree b/v2.6.6/.doctrees/cleanlab/models/index.doctree new file mode 100644 index 000000000..5967536bf Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/models/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/models/keras.doctree b/v2.6.6/.doctrees/cleanlab/models/keras.doctree new file mode 100644 index 000000000..880797cbd Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/models/keras.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/multiannotator.doctree b/v2.6.6/.doctrees/cleanlab/multiannotator.doctree new file mode 100644 index 000000000..3e7d8275f Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/multiannotator.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/multilabel_classification/dataset.doctree b/v2.6.6/.doctrees/cleanlab/multilabel_classification/dataset.doctree new file mode 100644 index 000000000..1af374ec6 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/multilabel_classification/dataset.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/multilabel_classification/filter.doctree b/v2.6.6/.doctrees/cleanlab/multilabel_classification/filter.doctree new file mode 100644 index 000000000..239259164 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/multilabel_classification/filter.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/multilabel_classification/index.doctree b/v2.6.6/.doctrees/cleanlab/multilabel_classification/index.doctree new file mode 100644 index 000000000..11f8ef7a2 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/multilabel_classification/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/multilabel_classification/rank.doctree b/v2.6.6/.doctrees/cleanlab/multilabel_classification/rank.doctree new file mode 100644 index 000000000..627b42875 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/multilabel_classification/rank.doctree differ 
diff --git a/v2.6.6/.doctrees/cleanlab/object_detection/filter.doctree b/v2.6.6/.doctrees/cleanlab/object_detection/filter.doctree new file mode 100644 index 000000000..d53036059 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/object_detection/filter.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/object_detection/index.doctree b/v2.6.6/.doctrees/cleanlab/object_detection/index.doctree new file mode 100644 index 000000000..8bc2a4537 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/object_detection/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/object_detection/rank.doctree b/v2.6.6/.doctrees/cleanlab/object_detection/rank.doctree new file mode 100644 index 000000000..eea74b1ef Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/object_detection/rank.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/object_detection/summary.doctree b/v2.6.6/.doctrees/cleanlab/object_detection/summary.doctree new file mode 100644 index 000000000..85293fb91 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/object_detection/summary.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/outlier.doctree b/v2.6.6/.doctrees/cleanlab/outlier.doctree new file mode 100644 index 000000000..a70c3cedc Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/outlier.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/rank.doctree b/v2.6.6/.doctrees/cleanlab/rank.doctree new file mode 100644 index 000000000..be58cb1de Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/rank.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/regression/index.doctree b/v2.6.6/.doctrees/cleanlab/regression/index.doctree new file mode 100644 index 000000000..50d31cbd2 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/regression/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/regression/learn.doctree b/v2.6.6/.doctrees/cleanlab/regression/learn.doctree new file mode 100644 index 000000000..0b46e2708 Binary files /dev/null and 
b/v2.6.6/.doctrees/cleanlab/regression/learn.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/regression/rank.doctree b/v2.6.6/.doctrees/cleanlab/regression/rank.doctree new file mode 100644 index 000000000..f3692c655 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/regression/rank.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/segmentation/filter.doctree b/v2.6.6/.doctrees/cleanlab/segmentation/filter.doctree new file mode 100644 index 000000000..fae1efbee Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/segmentation/filter.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/segmentation/index.doctree b/v2.6.6/.doctrees/cleanlab/segmentation/index.doctree new file mode 100644 index 000000000..117dc43c4 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/segmentation/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/segmentation/rank.doctree b/v2.6.6/.doctrees/cleanlab/segmentation/rank.doctree new file mode 100644 index 000000000..fd647d26b Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/segmentation/rank.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/segmentation/summary.doctree b/v2.6.6/.doctrees/cleanlab/segmentation/summary.doctree new file mode 100644 index 000000000..8b05b2805 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/segmentation/summary.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/token_classification/filter.doctree b/v2.6.6/.doctrees/cleanlab/token_classification/filter.doctree new file mode 100644 index 000000000..ef80eb436 Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/token_classification/filter.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/token_classification/index.doctree b/v2.6.6/.doctrees/cleanlab/token_classification/index.doctree new file mode 100644 index 000000000..cf178921a Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/token_classification/index.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/token_classification/rank.doctree 
b/v2.6.6/.doctrees/cleanlab/token_classification/rank.doctree new file mode 100644 index 000000000..9debbb8dc Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/token_classification/rank.doctree differ diff --git a/v2.6.6/.doctrees/cleanlab/token_classification/summary.doctree b/v2.6.6/.doctrees/cleanlab/token_classification/summary.doctree new file mode 100644 index 000000000..bfc6cdb9c Binary files /dev/null and b/v2.6.6/.doctrees/cleanlab/token_classification/summary.doctree differ diff --git a/v2.6.6/.doctrees/environment.pickle b/v2.6.6/.doctrees/environment.pickle new file mode 100644 index 000000000..14c5dfd23 Binary files /dev/null and b/v2.6.6/.doctrees/environment.pickle differ diff --git a/v2.6.6/.doctrees/index.doctree b/v2.6.6/.doctrees/index.doctree new file mode 100644 index 000000000..1fed81d8f Binary files /dev/null and b/v2.6.6/.doctrees/index.doctree differ diff --git a/v2.6.6/.doctrees/migrating/migrate_v2.doctree b/v2.6.6/.doctrees/migrating/migrate_v2.doctree new file mode 100644 index 000000000..0e73764ff Binary files /dev/null and b/v2.6.6/.doctrees/migrating/migrate_v2.doctree differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/clean_learning/tabular.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/clean_learning/tabular.ipynb new file mode 100644 index 000000000..9594cf80d --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/clean_learning/tabular.ipynb @@ -0,0 +1,820 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Classification with Structured/Tabular Data and Noisy Labels\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Consider Using Datalab\n", + "
\n", + "\n", + "If interested in detecting a wide variety of issues in your tabular data, check out the [Datalab tabular tutorial](https://docs.cleanlab.ai/stable/tutorials/datalab/tabular.html). Datalab can detect many other types of data issues beyond label issues, whereas CleanLearning is a convenience method to handle noisy labels with sklearn-compatible classification models.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this 5-minute quickstart tutorial, we use cleanlab with scikit-learn models to find potential label errors in a classification dataset with tabular features (numeric/categorical columns). Tabular (or *structured*) data are typically organized in a row/column format and stored in a SQL database or file types like: CSV, Excel, or Parquet. Here we consider a Student Grades dataset, which contains over 900 individuals who have three exam grades and some optional notes, each being assigned a letter grade (their class label). cleanlab automatically identifies _hundreds_ of examples in this dataset that were mislabeled with the incorrect final grade (data entry mistakes). \n", + "\n", + "This tutorial shows how to handle noisy labels and produce more robust classification models for your own tabular datasets. cleanlab's `CleanLearning` class automatically detects and filters out such badly labeled data, in order to train a more robust version of any Machine Learning model. No change to your existing modeling code is required! \n", + "\n", + "\n", + "**Overview of what we'll do in this tutorial:**\n", + "\n", + "- Train a classifier model (here scikit-learn's ExtraTreesClassifier, although any model could be used) and use this classifier to compute (out-of-sample) predicted class probabilities via cross-validation.\n", + "\n", + "- Identify potential label errors in the data with cleanlab's `find_label_issues` method.\n", + "\n", + "- Train a robust version of the same ExtraTrees model via cleanlab's `CleanLearning` wrapper.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have an sklearn compatible `model`, tabular `data` and given `labels`? Run the code below to train your `model` and get label issues.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.classification import CleanLearning\n", + "\n", + "cl = CleanLearning(model)\n", + "_ = cl.fit(train_data, labels)\n", + "label_issues = cl.get_label_issues()\n", + "preds = cl.predict(test_data) # predictions from a version of your model \n", + " # trained on auto-cleaned data\n", + "\n", + "\n", + "```\n", + " \n", + "
\n", + " \n", + "Is your model/data not compatible with `CleanLearning`? You can instead run cross-validation on your model to get out-of-sample `pred_probs`. Then run the code below to get label issue indices ranked by their inferred severity.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.filter import find_label_issues\n", + "\n", + "ranked_label_issues = find_label_issues(\n", + " labels,\n", + " pred_probs,\n", + " return_indices_ranked_by=\"self_confidence\",\n", + ")\n", + " \n", + "\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Install required dependencies\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install cleanlab\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:16.812911Z", + "iopub.status.busy": "2024-06-25T22:58:16.812738Z", + "iopub.status.idle": "2024-06-25T22:58:18.003187Z", + "shell.execute_reply": "2024-06-25T22:58:18.002555Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "dependencies = [\"cleanlab\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + 
"iopub.execute_input": "2024-06-25T22:58:18.005860Z", + "iopub.status.busy": "2024-06-25T22:58:18.005379Z", + "iopub.status.idle": "2024-06-25T22:58:18.023261Z", + "shell.execute_reply": "2024-06-25T22:58:18.022710Z" + } + }, + "outputs": [], + "source": [ + "import random\n", + "import numpy as np\n", + "import pandas as pd \n", + "from sklearn.preprocessing import StandardScaler, LabelEncoder\n", + "from sklearn.model_selection import cross_val_predict, train_test_split\n", + "from sklearn.metrics import accuracy_score\n", + "from sklearn.ensemble import ExtraTreesClassifier\n", + "\n", + "from cleanlab.filter import find_label_issues\n", + "from cleanlab.classification import CleanLearning\n", + "\n", + "SEED = 100 \n", + "\n", + "np.random.seed(SEED)\n", + "random.seed(SEED)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Load and process the data\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We first load the data features and labels (which are possibly noisy).\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:18.025554Z", + "iopub.status.busy": "2024-06-25T22:58:18.025175Z", + "iopub.status.idle": "2024-06-25T22:58:18.161564Z", + "shell.execute_reply": "2024-06-25T22:58:18.160996Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
stud_IDexam_1exam_2exam_3notesletter_grade
0f48f7353.0077.009.003C
10bd4e781.0064.0080.00great participation +10B
20bd4e781.0064.0080.00great participation +10B
3cb9d7a0.610.940.78NaNC
49acca448.0090.009.001C
\n", + "
" + ], + "text/plain": [ + " stud_ID exam_1 exam_2 exam_3 notes letter_grade\n", + "0 f48f73 53.00 77.00 9.00 3 C\n", + "1 0bd4e7 81.00 64.00 80.00 great participation +10 B\n", + "2 0bd4e7 81.00 64.00 80.00 great participation +10 B\n", + "3 cb9d7a 0.61 0.94 0.78 NaN C\n", + "4 9acca4 48.00 90.00 9.00 1 C" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "grades_data = pd.read_csv(\"https://s.cleanlab.ai/grades-tabular-demo-v2.csv\")\n", + "grades_data.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:18.192134Z", + "iopub.status.busy": "2024-06-25T22:58:18.191649Z", + "iopub.status.idle": "2024-06-25T22:58:18.195447Z", + "shell.execute_reply": "2024-06-25T22:58:18.194896Z" + } + }, + "outputs": [], + "source": [ + "X_raw = grades_data[[\"exam_1\", \"exam_2\", \"exam_3\", \"notes\"]]\n", + "labels_raw = grades_data[\"letter_grade\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we preprocess the data. Here we apply one-hot encoding to categorical features and standardize numeric features. We also perform label encoding on the labels, as cleanlab's functions require the label for each example to be an integer in 0, 1, …, num_classes - 1. 
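The following is a minimal sketch of what this label encoding does. The hypothetical helper `encode_labels` follows the same convention as scikit-learn's `LabelEncoder`: the distinct class names are sorted, and each label is replaced by its position in that sorted list. It is plain Python for illustration only; the tutorial itself uses `LabelEncoder`.

```python
def encode_labels(raw_labels):
    # Hypothetical helper: sort the distinct class names, then map each
    # label to its index in that sorted list (integers 0 .. num_classes - 1).
    classes = sorted(set(raw_labels))
    class_to_index = {name: i for i, name in enumerate(classes)}
    encoded = [class_to_index[name] for name in raw_labels]
    return encoded, classes


def decode_labels(encoded, classes):
    # Inverse mapping: turn integer labels back into the original class names.
    return [classes[i] for i in encoded]


labels, classes = encode_labels(['C', 'B', 'B', 'C', 'A'])
print(classes)  # ['A', 'B', 'C']
print(labels)   # [2, 1, 1, 2, 0]
print(decode_labels(labels, classes))  # ['C', 'B', 'B', 'C', 'A']
```

With `LabelEncoder`, calling `fit` and then `transform` produces the same integers, and `inverse_transform` recovers the original class names.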
" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:18.197612Z", + "iopub.status.busy": "2024-06-25T22:58:18.197174Z", + "iopub.status.idle": "2024-06-25T22:58:18.205441Z", + "shell.execute_reply": "2024-06-25T22:58:18.204889Z" + } + }, + "outputs": [], + "source": [ + "categorical_features = [\"notes\"]\n", + "X_encoded = pd.get_dummies(X_raw, columns=categorical_features, drop_first=True)\n", + "\n", + "numeric_features = [\"exam_1\", \"exam_2\", \"exam_3\"]\n", + "scaler = StandardScaler()\n", + "X_processed = X_encoded.copy()\n", + "X_processed[numeric_features] = scaler.fit_transform(X_encoded[numeric_features])\n", + "\n", + "encoder = LabelEncoder()\n", + "encoder.fit(labels_raw)\n", + "labels = encoder.transform(labels_raw)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Bringing Your Own Data (BYOD)?\n", + "\n", + "You can easily replace the above with your own tabular dataset, and continue with the rest of the tutorial.\n", + " \n", + "Your classes (and entries of `labels`) should be represented as integer indices 0, 1, ..., num_classes - 1. \n", + "For example, if your dataset has 7 examples from 3 classes, `labels` might look like: `np.array([2,0,0,1,2,0,1])`\n", + "\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Select a classification model and compute out-of-sample predicted probabilities\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we use a simple ExtraTrees classifier that fits an ensemble of randomized decision trees on our data, but you can choose any suitable scikit-learn model for this tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:18.207513Z", + "iopub.status.busy": "2024-06-25T22:58:18.207186Z", + "iopub.status.idle": "2024-06-25T22:58:18.210226Z", + "shell.execute_reply": "2024-06-25T22:58:18.209815Z" + } + }, + "outputs": [], + "source": [ + "clf = ExtraTreesClassifier()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To find potential labeling errors, cleanlab requires a probabilistic prediction from your model for every datapoint. However, these predictions will be _overfitted_ (and thus unreliable) for examples the model was previously trained on. For best results, cleanlab should be applied with **out-of-sample** predicted class probabilities, i.e., on examples held out from the model during training.\n", + "\n", + "K-fold cross-validation is a straightforward way to produce out-of-sample predicted probabilities for every datapoint in the dataset by training K copies of our model on different data subsets and using each copy to predict on the subset of data it did not see during training. An additional benefit of cross-validation is that it provides a more reliable evaluation of our model than a single training/validation split. 
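To make the cross-validation idea concrete, here is a minimal plain-Python sketch that builds out-of-sample predicted probabilities one fold at a time. Several pieces are illustrative assumptions and are not part of scikit-learn or cleanlab: the helper names, the unshuffled fold assignment (example `i` goes to fold `i % k`), and the toy nearest-mean model on one-dimensional features. Any classifier that outputs class probabilities could replace the toy model.

```python
def fold_indices(n, k):
    # Unshuffled assignment for illustration: example i goes to fold i % k.
    return [[i for i in range(n) if i % k == f] for f in range(k)]


def nearest_mean_proba(X_train, y_train, X_test, num_classes):
    # Toy 1-D model: score each class by the inverse distance to its
    # training mean, then normalize the scores into probabilities.
    # Assumes every class appears in the training data of each fold.
    means = []
    for c in range(num_classes):
        vals = [x for x, y in zip(X_train, y_train) if y == c]
        means.append(sum(vals) / len(vals))
    probs = []
    for x in X_test:
        scores = [1.0 / (1e-9 + abs(x - m)) for m in means]
        total = sum(scores)
        probs.append([s / total for s in scores])
    return probs


def out_of_sample_pred_probs(X, labels, num_classes, k, fit_predict_proba):
    # Each example's probabilities come from a model that never saw it.
    n = len(labels)
    pred_probs = [None] * n
    for held_out in fold_indices(n, k):
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        fold_probs = fit_predict_proba(
            [X[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [X[i] for i in held_out],
            num_classes,
        )
        for i, p in zip(held_out, fold_probs):
            pred_probs[i] = p
    return pred_probs


X = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
labels = [0, 0, 1, 1, 1, 1]  # example 2 looks like class 0 but is labeled 1
pred_probs = out_of_sample_pred_probs(X, labels, 2, 3, nearest_mean_proba)
for x, y, p in zip(X, labels, pred_probs):
    print(x, y, [round(v, 2) for v in p])
```

In this toy run, example 2 gets a low out-of-sample probability for its given label (about 0.19). This is the kind of signal cleanlab relies on to flag potential label errors.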
We can implement this via the `cross_val_predict` method from scikit-learn:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:18.212201Z", + "iopub.status.busy": "2024-06-25T22:58:18.211873Z", + "iopub.status.idle": "2024-06-25T22:58:18.729475Z", + "shell.execute_reply": "2024-06-25T22:58:18.728937Z" + } + }, + "outputs": [], + "source": [ + "num_crossval_folds = 5 \n", + "pred_probs = cross_val_predict(\n", + " clf,\n", + " X_processed,\n", + " labels,\n", + " cv=num_crossval_folds,\n", + " method=\"predict_proba\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Use cleanlab to find label issues\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Based on the given labels and out-of-sample predicted probabilities, cleanlab can quickly help us identify poorly labeled instances in our data table. For a dataset with N examples from K classes, the labels should be a 1D array of length N and predicted probabilities should be a 2D (N x K) array. Here we request that the indices of the identified label issues be sorted by cleanlab's self-confidence score, which measures the quality of each given label via the probability assigned to it in our model's prediction." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:18.731907Z", + "iopub.status.busy": "2024-06-25T22:58:18.731548Z", + "iopub.status.idle": "2024-06-25T22:58:20.610609Z", + "shell.execute_reply": "2024-06-25T22:58:20.609890Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cleanlab found 212 potential label errors.\n" + ] + } + ], + "source": [ + "ranked_label_issues = find_label_issues(\n", + " labels=labels, pred_probs=pred_probs, return_indices_ranked_by=\"self_confidence\"\n", + ")\n", + "\n", + "print(f\"Cleanlab found {len(ranked_label_issues)} potential label errors.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's review some of the most likely label errors:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:20.613273Z", + "iopub.status.busy": "2024-06-25T22:58:20.612577Z", + "iopub.status.idle": "2024-06-25T22:58:20.622571Z", + "shell.execute_reply": "2024-06-25T22:58:20.622087Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
exam_1exam_2exam_3noteslabel
45658.092.093.0NaND
82799.086.074.0NaND
6370.079.065.0cheated on exam, gets 0ptsA
1200.081.097.0cheated on exam, gets 0ptsB
23368.083.076.0NaNF
\n", + "
" + ], + "text/plain": [ + " exam_1 exam_2 exam_3 notes label\n", + "456 58.0 92.0 93.0 NaN D\n", + "827 99.0 86.0 74.0 NaN D\n", + "637 0.0 79.0 65.0 cheated on exam, gets 0pts A\n", + "120 0.0 81.0 97.0 cheated on exam, gets 0pts B\n", + "233 68.0 83.0 76.0 NaN F" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_raw.iloc[ranked_label_issues].assign(label=labels_raw.iloc[ranked_label_issues]).head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These final grades look suspicious and should definitely be carefully re-examined! This is a straightforward approach to visualize the rows in a data table that might be mislabeled." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Train a more robust model from noisy labels\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Following proper ML practice, let's split our data into train and test sets.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:20.624712Z", + "iopub.status.busy": "2024-06-25T22:58:20.624398Z", + "iopub.status.idle": "2024-06-25T22:58:20.628520Z", + "shell.execute_reply": "2024-06-25T22:58:20.628114Z" + } + }, + "outputs": [], + "source": [ + "X_train, X_test, labels_train, labels_test = train_test_split(\n", + " X_encoded,\n", + " labels,\n", + " test_size=0.2,\n", + " random_state=SEED,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We again standardize the numeric features, this time fitting the scaling parameters solely on the training set.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:20.630722Z", + "iopub.status.busy": "2024-06-25T22:58:20.630413Z", + "iopub.status.idle": "2024-06-25T22:58:20.637186Z", + "shell.execute_reply": 
"2024-06-25T22:58:20.636770Z" + } + }, + "outputs": [], + "source": [ + "scaler = StandardScaler()\n", + "X_train[numeric_features] = scaler.fit_transform(X_train[numeric_features])\n", + "X_test[numeric_features] = scaler.transform(X_test[numeric_features])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's now train and evaluate the original ExtraTrees model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:20.639175Z", + "iopub.status.busy": "2024-06-25T22:58:20.638844Z", + "iopub.status.idle": "2024-06-25T22:58:20.749460Z", + "shell.execute_reply": "2024-06-25T22:58:20.748869Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Test accuracy of original model: 0.783068783068783\n" + ] + } + ], + "source": [ + "clf.fit(X_train, labels_train)\n", + "acc_og = clf.score(X_test, labels_test)\n", + "print(f\"Test accuracy of original model: {acc_og}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "cleanlab provides a wrapper class that can be easily applied to any scikit-learn compatible model. 
Once wrapped, the resulting model can still be used in the exact same manner, but it will now train more robustly if the data have noisy labels.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:20.751665Z", + "iopub.status.busy": "2024-06-25T22:58:20.751360Z", + "iopub.status.idle": "2024-06-25T22:58:20.754171Z", + "shell.execute_reply": "2024-06-25T22:58:20.753696Z" + } + }, + "outputs": [], + "source": [ + "clf = ExtraTreesClassifier() # Note we first re-initialize clf\n", + "cl = CleanLearning(clf) # cl has same methods/attributes as clf" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following operations take place when we train the cleanlab-wrapped model: The original model is trained in a cross-validated fashion to produce out-of-sample predicted probabilities. Then, these predicted probabilities are used to identify label issues, which are then removed from the dataset. 
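The identification and removal steps can be sketched in plain Python as follows. The sketch assumes out-of-sample `pred_probs` are already available, and the numbers below are made up for illustration. As a deliberate simplification, an example is flagged when its self-confidence (the predicted probability of its given label) falls below a fixed threshold. cleanlab's own `find_label_issues` uses confident learning rather than a fixed cutoff, so it will generally flag a different set of examples. The helper names and the `threshold` parameter are hypothetical.

```python
def self_confidence(labels, pred_probs):
    # Probability the model assigns to each example's given label.
    return [probs[y] for y, probs in zip(labels, pred_probs)]


def remove_suspect_examples(X, labels, pred_probs, threshold=0.5):
    # Hypothetical simplification: flag examples whose self-confidence is
    # below a fixed threshold, rank them from most to least suspicious,
    # and keep only the unflagged examples.
    scores = self_confidence(labels, pred_probs)
    flagged = [i for i, s in enumerate(scores) if s < threshold]
    ranked_issues = sorted(flagged, key=lambda i: scores[i])
    flagged_set = set(flagged)
    keep = [i for i in range(len(labels)) if i not in flagged_set]
    X_clean = [X[i] for i in keep]
    labels_clean = [labels[i] for i in keep]
    return X_clean, labels_clean, ranked_issues


X = ['a', 'b', 'c', 'd', 'e']
labels = [0, 0, 1, 1, 0]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]

X_clean, labels_clean, ranked_issues = remove_suspect_examples(X, labels, pred_probs)
print(ranked_issues)  # [2, 4]
print(X_clean, labels_clean)  # ['a', 'b', 'd'] [0, 0, 1]
```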
Finally, the original model is trained on the remaining clean subset of the data once more.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:20.756045Z", + "iopub.status.busy": "2024-06-25T22:58:20.755865Z", + "iopub.status.idle": "2024-06-25T22:58:22.765018Z", + "shell.execute_reply": "2024-06-25T22:58:22.764374Z" + } + }, + "outputs": [], + "source": [ + "_ = cl.fit(X_train, labels_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can get predictions from the resulting model and evaluate them, just as we did for the original scikit-learn model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:22.768039Z", + "iopub.status.busy": "2024-06-25T22:58:22.767307Z", + "iopub.status.idle": "2024-06-25T22:58:22.778512Z", + "shell.execute_reply": "2024-06-25T22:58:22.777936Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Test accuracy of cleanlab-trained model: 0.8095238095238095\n" + ] + } + ], + "source": [ + "preds = cl.predict(X_test)\n", + "acc_cl = accuracy_score(labels_test, preds)\n", + "print(f\"Test accuracy of cleanlab-trained model: {acc_cl}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see that the test set accuracy slightly improved as a result of the data cleaning. Note that this will not always be the case, especially when we evaluate on test data that are themselves noisy. The best practice is to run cleanlab to identify potential label issues and manually review them before trusting any accuracy metrics. In particular, the most effort should go into ensuring high-quality test data, since the test set is meant to reflect the expected performance of our model during deployment."
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:22.780796Z", + "iopub.status.busy": "2024-06-25T22:58:22.780390Z", + "iopub.status.idle": "2024-06-25T22:58:22.815718Z", + "shell.execute_reply": "2024-06-25T22:58:22.815171Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "if acc_og >= acc_cl: # check cleanlab has improved prediction accuracy\n", + " raise Exception(\"Cleanlab training failed to improve model accuracy.\")\n", + " \n", + "# this file contains true and noisy labels\n", + "true_data = pd.read_csv(\"https://s.cleanlab.ai/student-grades-demo.csv\")\n", + "true_errors = np.where(true_data[\"letter_grade\"] != true_data[\"noisy_letter_grade\"])[0]\n", + "if not all(x in true_errors for x in ranked_label_issues[:5]): # check top errors are indeed errors\n", + " raise Exception(\"Some of the top listed errors are not actually label errors.\")" + ] + } + ], + "metadata": { + "interpreter": { + "hash": "cda20062bc42cfdcaa0f9720c0b28e880bba110e9dfce6c1689934eec9b595a1" + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/clean_learning/text.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/clean_learning/text.ipynb new file mode 100644 index 000000000..454aa8e13 --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/clean_learning/text.ipynb @@ -0,0 +1,3651 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + 
"# Text Classification with Noisy Labels\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Consider Using Datalab\n", + "
\n", + "\n", + "If you are interested in detecting a wide variety of issues in your text dataset, check out the [Datalab text tutorial](https://docs.cleanlab.ai/stable/tutorials/datalab/text.html). Datalab can detect many other types of data issues beyond label issues, whereas CleanLearning is a convenience method to handle noisy labels with sklearn-compatible classification models.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this 5-minute quickstart tutorial, we use cleanlab to find potential label errors in an intent classification dataset composed of (text) customer service requests at an online bank. We consider a subset of the [Banking77-OOS Dataset](https://arxiv.org/abs/2106.04564) containing 1,000 customer service requests which can be classified into 10 categories corresponding to the intent of the request. cleanlab will shortlist examples that confuse our ML model the most, many of which are potential label errors, out-of-scope examples, or otherwise ambiguous examples. cleanlab's `CleanLearning` class automatically detects and filters out such badly labeled data in order to train a more robust version of any machine learning model. No change to your existing modeling code is required!\n", + "\n", + "\n", + "**Overview of what we'll do in this tutorial:**\n", + "\n", + "- Define an ML model that can be trained on our dataset (here we use Logistic Regression applied to text embeddings from a pretrained Transformer network; you can use any text classifier model).\n", + "\n", + "- Use `CleanLearning` to wrap this ML model and compute out-of-sample predicted class probabilities, which allow us to identify potential label errors in the dataset.\n", + "\n", + "- Train a more robust version of the same ML model after dropping the detected label errors using `CleanLearning`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have an sklearn-compatible `model`, `data` and given `labels`? Run the code below to train your `model` and get label issues using `CleanLearning`. \n", + " \n", + "You can subsequently use the same `CleanLearning` object to train a more robust model (trained only on the clean data) by calling the `.fit()` method and passing in the `label_issues` found earlier.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.classification import CleanLearning\n", + "\n", + "cl = CleanLearning(model)\n", + "label_issues = cl.find_label_issues(train_data, labels) # identify mislabeled examples \n", + " \n", + "cl.fit(train_data, labels, label_issues=label_issues)\n", + "preds = cl.predict(test_data) # predictions from a version of your model \n", + " # trained on auto-cleaned data\n", + "\n", + "\n", + "```\n", + " \n", + "
\n", + " \n", + "Is your model/data not compatible with `CleanLearning`? You can instead run cross-validation on your model to get out-of-sample `pred_probs`. Then run the code below to get label issue indices ranked by their inferred severity.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.filter import find_label_issues\n", + "\n", + "ranked_label_issues = find_label_issues(\n", + " labels,\n", + " pred_probs,\n", + " return_indices_ranked_by=\"self_confidence\",\n", + ")\n", + " \n", + "\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Install required dependencies\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install sentence-transformers\n", + "!pip install cleanlab\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:27.076644Z", + "iopub.status.busy": "2024-06-25T22:58:27.076237Z", + "iopub.status.idle": "2024-06-25T22:58:30.002682Z", + "shell.execute_reply": "2024-06-25T22:58:30.002092Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs.cleanlab.ai).\n", + "# If running on Colab, you may want to use a GPU (select: Runtime > Change runtime type > Hardware accelerator > GPU)\n", + "# Package versions we used: scikit-learn==1.2.0 sentence-transformers==2.2.2\n", + "\n", + "dependencies = [\"cleanlab\", \"sentence_transformers\"]\n", + "\n", + "# Suppress outputs that may appear if tensorflow happens to be improperly installed: \n", + "import os \n", + "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\" # disable parallelism to avoid deadlocks with huggingface\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " 
missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:30.005283Z", + "iopub.status.busy": "2024-06-25T22:58:30.004816Z", + "iopub.status.idle": "2024-06-25T22:58:30.008249Z", + "shell.execute_reply": "2024-06-25T22:58:30.007788Z" + } + }, + "outputs": [], + "source": [ + "import re \n", + "import string \n", + "import pandas as pd \n", + "from sklearn.metrics import accuracy_score\n", + "from sklearn.model_selection import train_test_split, cross_val_predict \n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sentence_transformers import SentenceTransformer\n", + "\n", + "from cleanlab.classification import CleanLearning" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:30.010238Z", + "iopub.status.busy": "2024-06-25T22:58:30.009917Z", + "iopub.status.idle": "2024-06-25T22:58:30.013407Z", + "shell.execute_reply": "2024-06-25T22:58:30.013000Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This cell is hidden from docs.cleanlab.ai \n", + "\n", + "import random \n", + "import numpy as np \n", + "\n", + "pd.set_option(\"display.max_colwidth\", None) \n", + "\n", + "SEED = 123456 # for reproducibility \n", + "\n", + "np.random.seed(SEED)\n", + "random.seed(SEED)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. 
Load and format the text dataset\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:30.015468Z", + "iopub.status.busy": "2024-06-25T22:58:30.015064Z", + "iopub.status.idle": "2024-06-25T22:58:30.062140Z", + "shell.execute_reply": "2024-06-25T22:58:30.061566Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
textlabel
0i accidentally made a payment to a wrong account. what should i do?cancel_transfer
1i no longer want to transfer funds, can we cancel that transaction?cancel_transfer
2cancel my transfer, please.cancel_transfer
3i want to revert this mornings transaction.cancel_transfer
4i just realised i made the wrong payment yesterday. can you please change it to the right account? it's my rent payment and really really needs to be in the right account by tomorrowcancel_transfer
\n", + "
" + ], + "text/plain": [ + " text \\\n", + "0 i accidentally made a payment to a wrong account. what should i do? \n", + "1 i no longer want to transfer funds, can we cancel that transaction? \n", + "2 cancel my transfer, please. \n", + "3 i want to revert this mornings transaction. \n", + "4 i just realised i made the wrong payment yesterday. can you please change it to the right account? it's my rent payment and really really needs to be in the right account by tomorrow \n", + "\n", + " label \n", + "0 cancel_transfer \n", + "1 cancel_transfer \n", + "2 cancel_transfer \n", + "3 cancel_transfer \n", + "4 cancel_transfer " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data = pd.read_csv(\"https://s.cleanlab.ai/banking-intent-classification.csv\")\n", + "data.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:30.064388Z", + "iopub.status.busy": "2024-06-25T22:58:30.063968Z", + "iopub.status.idle": "2024-06-25T22:58:30.067815Z", + "shell.execute_reply": "2024-06-25T22:58:30.067249Z" + } + }, + "outputs": [], + "source": [ + "raw_texts, raw_labels = data[\"text\"].values, data[\"label\"].values\n", + "\n", + "raw_train_texts, raw_test_texts, raw_train_labels, raw_test_labels = train_test_split(raw_texts, raw_labels, test_size=0.1)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:30.070039Z", + "iopub.status.busy": "2024-06-25T22:58:30.069730Z", + "iopub.status.idle": "2024-06-25T22:58:30.073325Z", + "shell.execute_reply": "2024-06-25T22:58:30.072856Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This dataset has 10 classes.\n", + "Classes: {'visa_or_mastercard', 'card_payment_fee_charged', 'beneficiary_not_allowed', 'supported_cards_and_currencies', 'cancel_transfer', 
'apple_pay_or_google_pay', 'lost_or_stolen_phone', 'getting_spare_card', 'card_about_to_expire', 'change_pin'}\n" + ] + } + ], + "source": [ + "num_classes = len(set(raw_train_labels))\n", + "\n", + "print(f\"This dataset has {num_classes} classes.\")\n", + "print(f\"Classes: {set(raw_train_labels)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's print the first example in the train set." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:30.075289Z", + "iopub.status.busy": "2024-06-25T22:58:30.074975Z", + "iopub.status.idle": "2024-06-25T22:58:30.078128Z", + "shell.execute_reply": "2024-06-25T22:58:30.077586Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Example Label: getting_spare_card\n", + "Example Text: can i have another card in addition to my first one?\n" + ] + } + ], + "source": [ + "i = 0\n", + "print(f\"Example Label: {raw_train_labels[i]}\")\n", + "print(f\"Example Text: {raw_train_texts[i]}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The data is stored as two numpy arrays for each of the train and test sets:\n", + "\n", + "1. `raw_train_texts` and `raw_test_texts` store the customer service request utterances in text format\n", + "2. `raw_train_labels` and `raw_test_labels` store the intent categories (labels) for each example\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, we need to perform label encoding on the labels, since cleanlab's functions require the label for each example to be an integer in 0, 1, …, num_classes - 1. 
We will use sklearn's `LabelEncoder` to encode our labels.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:30.080201Z", + "iopub.status.busy": "2024-06-25T22:58:30.079864Z", + "iopub.status.idle": "2024-06-25T22:58:30.083112Z", + "shell.execute_reply": "2024-06-25T22:58:30.082659Z" + } + }, + "outputs": [], + "source": [ + "encoder = LabelEncoder()\n", + "encoder.fit(raw_train_labels)\n", + "\n", + "train_labels = encoder.transform(raw_train_labels)\n", + "test_labels = encoder.transform(raw_test_labels)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Bringing Your Own Data (BYOD)?\n", + "\n", + "You can easily replace the above with your own text dataset, and continue with the rest of the tutorial.\n", + "\n", + "Your classes (and entries of `train_labels` / `test_labels`) should be represented as integer indices 0, 1, ..., num_classes - 1.\n", + "For example, if your dataset has 7 examples from 3 classes, `train_labels` might be: `np.array([2,0,0,1,2,0,1])`\n", + "\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we convert the text strings into vectors better suited as inputs for our ML model. \n", + "\n", + "We will use numeric representations from a pretrained Transformer model as embeddings of our text. The [Sentence Transformers](https://huggingface.co/docs/hub/sentence-transformers) library offers simple methods to compute these embeddings for text data. Here, we load the pretrained `electra-small-discriminator` model and then run our data through the network to extract a vector embedding of each example." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:30.085179Z", + "iopub.status.busy": "2024-06-25T22:58:30.084823Z", + "iopub.status.idle": "2024-06-25T22:58:34.525740Z", + "shell.execute_reply": "2024-06-25T22:58:34.525194Z" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "71a66c5855e24b9ba0271326d76cbbeb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + ".gitattributes: 0%| | 0.00/391 [00:00" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A typical way to leverage pretrained networks for a particular classification task is to add a linear output layer and fine-tune the network parameters on the new data. However, this can be computationally intensive. Alternatively, we can freeze the pretrained weights of the network and train only the output layer, without having to rely on GPU(s). Here we do this conveniently by fitting a scikit-learn linear model on top of the extracted embeddings."
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:34.528355Z", + "iopub.status.busy": "2024-06-25T22:58:34.527985Z", + "iopub.status.idle": "2024-06-25T22:58:34.530866Z", + "shell.execute_reply": "2024-06-25T22:58:34.530379Z" + } + }, + "outputs": [], + "source": [ + "model = LogisticRegression(max_iter=400)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can define the `CleanLearning` object with our Logistic Regression model and use `find_label_issues` to identify potential label errors.\n", + "\n", + "`CleanLearning` provides a wrapper class that can easily be applied to any scikit-learn compatible model, which can be used to find potential label issues and train a more robust model if the original data contains noisy labels." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:34.532807Z", + "iopub.status.busy": "2024-06-25T22:58:34.532502Z", + "iopub.status.idle": "2024-06-25T22:58:34.535140Z", + "shell.execute_reply": "2024-06-25T22:58:34.534577Z" + } + }, + "outputs": [], + "source": [ + "cv_n_folds = 5 # for efficiency; values like 5 or 10 will generally work better\n", + "\n", + "cl = CleanLearning(model, cv_n_folds=cv_n_folds)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:34.537360Z", + "iopub.status.busy": "2024-06-25T22:58:34.537008Z", + "iopub.status.idle": "2024-06-25T22:58:37.213928Z", + "shell.execute_reply": "2024-06-25T22:58:37.213222Z" + }, + "scrolled": true + }, + "outputs": [], + "source": [ + "label_issues = cl.find_label_issues(X=train_texts, labels=train_labels)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `find_label_issues` method above will perform cross validation to compute out-of-sample predicted probabilites 
for each example, which are used to identify label issues.\n", + "\n", + "This method returns a dataframe containing a label quality score for each example. These numeric scores lie between 0 and 1, where lower scores indicate examples more likely to be mislabeled. The dataframe also contains a boolean column specifying whether or not each example is identified as having a label issue (indicating it is likely mislabeled). Note that the given and predicted labels here are encoded as integers, since that is the format expected by `cleanlab`; we will inverse-transform them later in this tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:37.216936Z", + "iopub.status.busy": "2024-06-25T22:58:37.216358Z", + "iopub.status.idle": "2024-06-25T22:58:37.224141Z", + "shell.execute_reply": "2024-06-25T22:58:37.223672Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_label_issuelabel_qualitygiven_labelpredicted_label
0False0.85837166
1False0.54727433
2False0.82622877
3False0.96600888
4False0.79244944
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_quality given_label predicted_label\n", + "0 False 0.858371 6 6\n", + "1 False 0.547274 3 3\n", + "2 False 0.826228 7 7\n", + "3 False 0.966008 8 8\n", + "4 False 0.792449 4 4" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issues.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can get the subset of examples flagged with label issues, and also sort by label quality score to find the indices of the 10 most likely mislabeled examples in our dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:37.226182Z", + "iopub.status.busy": "2024-06-25T22:58:37.225848Z", + "iopub.status.idle": "2024-06-25T22:58:37.229638Z", + "shell.execute_reply": "2024-06-25T22:58:37.229167Z" + } + }, + "outputs": [], + "source": [ + "identified_issues = label_issues[label_issues[\"is_label_issue\"] == True]\n", + "lowest_quality_labels = label_issues[\"label_quality\"].argsort()[:10].to_numpy()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:37.231718Z", + "iopub.status.busy": "2024-06-25T22:58:37.231397Z", + "iopub.status.idle": "2024-06-25T22:58:37.234484Z", + "shell.execute_reply": "2024-06-25T22:58:37.233942Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "cleanlab found 44 potential label errors in the dataset.\n", + "Here are indices of the top 10 most likely errors: \n", + " [646 390 628 121 702 863 456 135 337 735]\n" + ] + } + ], + "source": [ + "print(\n", + " f\"cleanlab found {len(identified_issues)} potential label errors in the dataset.\\n\"\n", + " f\"Here are indices of the top 10 most likely errors: \\n {lowest_quality_labels}\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "Let's review some of the most likely label errors. To help us inspect these datapoints, we define a method to print any example from the dataset, together with its given (original) label and the suggested alternative label from cleanlab.\n", + "\n", + "We then display some of the top-ranked label issues identified by cleanlab:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:37.236420Z", + "iopub.status.busy": "2024-06-25T22:58:37.236116Z", + "iopub.status.idle": "2024-06-25T22:58:37.239113Z", + "shell.execute_reply": "2024-06-25T22:58:37.238568Z" + } + }, + "outputs": [], + "source": [ + "def print_as_df(index):\n", + " return pd.DataFrame(\n", + " {\n", + " \"text\": raw_train_texts, \n", + " \"given_label\": raw_train_labels,\n", + " \"predicted_label\": encoder.inverse_transform(label_issues[\"predicted_label\"]),\n", + " },\n", + " ).iloc[index]" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:37.240996Z", + "iopub.status.busy": "2024-06-25T22:58:37.240698Z", + "iopub.status.idle": "2024-06-25T22:58:37.247747Z", + "shell.execute_reply": "2024-06-25T22:58:37.247283Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
textgiven_labelpredicted_label
646i was charged for getting cash.card_about_to_expirecard_payment_fee_charged
390can i change my pin on holiday?beneficiary_not_allowedchange_pin
628will i be sent a new card before mine expires?apple_pay_or_google_paycard_about_to_expire
121Would you rather fight one horse-sized duck or 100 duck-sized horses?lost_or_stolen_phonegetting_spare_card
702please tell me how to change my pin.beneficiary_not_allowedchange_pin
\n", + "
" + ], + "text/plain": [ + " text \\\n", + "646 i was charged for getting cash. \n", + "390 can i change my pin on holiday? \n", + "628 will i be sent a new card before mine expires? \n", + "121 Would you rather fight one horse-sized duck or 100 duck-sized horses? \n", + "702 please tell me how to change my pin. \n", + "\n", + " given_label predicted_label \n", + "646 card_about_to_expire card_payment_fee_charged \n", + "390 beneficiary_not_allowed change_pin \n", + "628 apple_pay_or_google_pay card_about_to_expire \n", + "121 lost_or_stolen_phone getting_spare_card \n", + "702 beneficiary_not_allowed change_pin " + ], + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print_as_df(lowest_quality_labels[:5])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These are very clear label errors that cleanlab has identified in this data! Note that the `given_label` does not correctly reflect the intent of these requests; whoever produced this dataset made many mistakes that are important to address before modeling the data.\n", + "\n", + "cleanlab has shortlisted the most likely label errors to speed up your data cleaning process. With this list, you can decide whether to fix these label issues or remove ambiguous examples from the dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. 
Train a more robust model from noisy labels\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Fixing the label issues manually may be time-consuming, but cleanlab can filter out these noisy examples and train a model on the remaining clean data for you automatically.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To establish a baseline, let's first train and evaluate our original Logistic Regression model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:37.249754Z", + "iopub.status.busy": "2024-06-25T22:58:37.249441Z", + "iopub.status.idle": "2024-06-25T22:58:37.472788Z", + "shell.execute_reply": "2024-06-25T22:58:37.472207Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " Test accuracy of original model: 0.87\n" + ] + } + ], + "source": [ + "baseline_model = LogisticRegression(max_iter=400) # note we first re-instantiate the model\n", + "baseline_model.fit(X=train_texts, y=train_labels)\n", + "\n", + "preds = baseline_model.predict(test_texts)\n", + "acc_og = accuracy_score(test_labels, preds)\n", + "print(f\"\\n Test accuracy of original model: {acc_og}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have a baseline, let's check whether using `CleanLearning` improves our test accuracy.\n", + "\n", + "`CleanLearning` provides a wrapper that can be applied to any scikit-learn-compatible model. The resulting model object can be used in the same manner, but it will now train more robustly if the data has noisy labels.\n", + "\n", + "We can use the same `CleanLearning` object defined above and pass the label issues we already computed into `.fit()` via the `label_issues` argument. This accelerates things; if we did not provide the label issues, then they would be recomputed via cross-validation.
After that, `CleanLearning` simply deletes the examples with label issues and retrains your model on the remaining data." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:37.475292Z", + "iopub.status.busy": "2024-06-25T22:58:37.474901Z", + "iopub.status.idle": "2024-06-25T22:58:37.673676Z", + "shell.execute_reply": "2024-06-25T22:58:37.673094Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Test accuracy of cleanlab's model: 0.89" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "cl.fit(X=train_texts, labels=train_labels, label_issues=cl.get_label_issues())\n", + "\n", + "pred_labels = cl.predict(test_texts)\n", + "acc_cl = accuracy_score(test_labels, pred_labels)\n", + "print(f\"Test accuracy of cleanlab's model: {acc_cl}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see that the test set accuracy improved slightly as a result of the data cleaning. Note that this will not always be the case, especially when we are evaluating on test data that are themselves noisy. The best practice is to run cleanlab to identify potential label issues and then manually review them before blindly trusting any accuracy metrics.
In particular, the most effort should go toward ensuring high-quality test data, since the test set is meant to reflect the expected performance of our model during deployment.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:37.676213Z", + "iopub.status.busy": "2024-06-25T22:58:37.675824Z", + "iopub.status.idle": "2024-06-25T22:58:37.679887Z", + "shell.execute_reply": "2024-06-25T22:58:37.679368Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "highlighted_indices = [646, 390, 628, 702] # check these examples were found in find_label_issues\n", + "if not all(x in identified_issues.index for x in highlighted_indices):\n", + " raise Exception(\"Some highlighted examples are missing from identified_issues.\")\n", + "\n", + "# Also check that cleanlab has improved prediction accuracy\n", + "if acc_og >= acc_cl:\n", + " raise Exception(\"Cleanlab training failed to improve model accuracy.\")" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "Text x TensorFlow", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "051b95236e1842619ae11a0dfcb5cea9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "12c2308937124a598ca9aa72a3d387dd": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_051b95236e1842619ae11a0dfcb5cea9", + "placeholder": "​", + "style": "IPY_MODEL_bc1275e8a8c84c77bba48e1eeaa77bd5", + "tabbable": null, + "tooltip": null, + "value": "tokenizer_config.json: 100%" + } + }, + "13da050dbb7f402b8cc4519904f50aa2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1797900059ca4d3e9302ac523d560132": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "17b6d679744f48b89b9ecade906024c8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": 
"IPY_MODEL_db77b75555a8454b83f7f9bf1426976d", + "placeholder": "​", + "style": "IPY_MODEL_6527f782c06541b088360bc4cb4d005e", + "tabbable": null, + "tooltip": null, + "value": "pytorch_model.bin: 100%" + } + }, + "1972ba4e39964510bf089a893d813eef": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "19aa373c55c24f1f82485bda00282f9b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": 
"IPY_MODEL_8c8a8a802f994fd2a723938eab36ac42", + "placeholder": "​", + "style": "IPY_MODEL_d459e593b0cf4897965b636cad71a1a8", + "tabbable": null, + "tooltip": null, + "value": "vocab.txt: 100%" + } + }, + "19b17b159cb045e2b9409a32373baddb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_b79fde25096642c3986cdf253bb822c6", + "max": 2211.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_3276755a5223478c8966c62bbd86c931", + "tabbable": null, + "tooltip": null, + "value": 2211.0 + } + }, + "1a6969d869184aae809e5bd7297a5640": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "1d6b40b92bfb4edc8ad82a502ca348e7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, 
+ "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1e3a852d593944b18283fc6631b75205": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": 
null, + "visibility": null, + "width": null + } + }, + "3276755a5223478c8966c62bbd86c931": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "353319008bd54b5db7c5aa1156c04857": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "36a549ec12aa40118c49990d00d92e56": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "379c19fb127744fbaddedb38e706da2f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "379c274234a84819be64cb34a14e5a60": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3a0bf4a32fc648128d7e3f9a4feb12a4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "4175eb1743bb47febc1c98003c127efa": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_fd918815e8fd416e9e948d96e4725abd", + "placeholder": "​", + "style": "IPY_MODEL_bd5b40ed2ef84fe0b0237ab4c21fdbca", + "tabbable": null, + "tooltip": null, + "value": "README.md: 100%" + } + }, + "52216603b0444bb99dbf316ab4eb06da": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": 
null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5325af54fcc3497e9c80f084403c9542": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "53965e4c6b784a2abcbb37e1db49e5a1": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + 
"border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "56ee21d5ae804bf4b335e5170af63770": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + 
}, + "589b38d8042e487a9264c2a173338b9b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_5325af54fcc3497e9c80f084403c9542", + "max": 391.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_caa96b6f766d49c2bd7c88e557f23410", + "tabbable": null, + "tooltip": null, + "value": 391.0 + } + }, + "5a6fb2df8c9448babf55562331f7d91f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_b8d2d005669f4ec0ba9f7d5c95fb40dd", + "placeholder": "​", + "style": "IPY_MODEL_c8a5f3defa9140d4857106695fa10791", + "tabbable": null, + "tooltip": null, + "value": " 232k/232k [00:00<00:00, 38.7MB/s]" + } + }, + "5aa52ab41e6747ccb71de9e1648a2e43": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": 
null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5e75f0427ce14a47a052cb2740161cb8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_1797900059ca4d3e9302ac523d560132", + "placeholder": "​", + "style": "IPY_MODEL_3a0bf4a32fc648128d7e3f9a4feb12a4", + "tabbable": null, + "tooltip": null, + "value": "tokenizer.json: 100%" + } + }, + "64dc69730469416f9a563187b21cce57": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + 
"border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6527f782c06541b088360bc4cb4d005e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "6645dc2a73d84294ba3b21b6571ae376": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9ca7416cbcc44207b9ff8ec60b77e4f7", + "IPY_MODEL_88175351326240ea9b9e4a86c6adbf79", + "IPY_MODEL_99f9e075e06845aab6c516674ddf54b3" + ], + "layout": "IPY_MODEL_56ee21d5ae804bf4b335e5170af63770", + "tabbable": null, + "tooltip": 
null + } + }, + "6890d3a5e3ca459e846322cbd0cb7681": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7194a422b3d74e05a5f2cc1c2755b878": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_64dc69730469416f9a563187b21cce57", + "placeholder": "​", + "style": "IPY_MODEL_eff1b23e81db4d9fb99976f2f804f4d4", + "tabbable": null, + "tooltip": null, + "value": " 466k/466k [00:00<00:00, 
15.3MB/s]" + } + }, + "71a66c5855e24b9ba0271326d76cbbeb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_cefa58e6f04245eaafe2b5263b13bb95", + "IPY_MODEL_589b38d8042e487a9264c2a173338b9b", + "IPY_MODEL_e1fa773d76e549459e4897b226a0889a" + ], + "layout": "IPY_MODEL_53965e4c6b784a2abcbb37e1db49e5a1", + "tabbable": null, + "tooltip": null + } + }, + "727afc6855144b23b5e5c7f31e22a5e6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "78eec705b8b24284a5ab90664dde1e13": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_52216603b0444bb99dbf316ab4eb06da", + "placeholder": "​", + "style": "IPY_MODEL_1a6969d869184aae809e5bd7297a5640", + "tabbable": null, + "tooltip": null, + "value": " 48.0/48.0 [00:00<00:00, 8.61kB/s]" + } + }, + 
"7fbfcf03a225467b8c41902c74fc32f9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_17b6d679744f48b89b9ecade906024c8", + "IPY_MODEL_c55108249e8543e78764a0f0a45a568d", + "IPY_MODEL_b9ef14965c534b329236a3a445f116ec" + ], + "layout": "IPY_MODEL_36a549ec12aa40118c49990d00d92e56", + "tabbable": null, + "tooltip": null + } + }, + "80d61f23f3b1421597675f9158230806": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"82917f444a44410580081144539a8367": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "85b6e0daec254d90817fdc4896b81182": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "86788b006c0c4145a40ff9c32d8f71d6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_12c2308937124a598ca9aa72a3d387dd", + "IPY_MODEL_cbf4e292633e4c32a012d754c36df918", + "IPY_MODEL_78eec705b8b24284a5ab90664dde1e13" + ], + "layout": "IPY_MODEL_5aa52ab41e6747ccb71de9e1648a2e43", + "tabbable": null, + "tooltip": null + } + }, + "88175351326240ea9b9e4a86c6adbf79": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_6890d3a5e3ca459e846322cbd0cb7681", + "max": 665.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_13da050dbb7f402b8cc4519904f50aa2", + 
"tabbable": null, + "tooltip": null, + "value": 665.0 + } + }, + "8c8a8a802f994fd2a723938eab36ac42": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9541afed3d2d4fbb9874c45ec2dd0ca0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_19aa373c55c24f1f82485bda00282f9b", + "IPY_MODEL_a3e6dfde4cba4164a3a9f7b64bf5ece9", + "IPY_MODEL_5a6fb2df8c9448babf55562331f7d91f" + ], + "layout": 
"IPY_MODEL_379c274234a84819be64cb34a14e5a60", + "tabbable": null, + "tooltip": null + } + }, + "99f9e075e06845aab6c516674ddf54b3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ae6e8c3bbf59436a8d38e4cbadafc16d", + "placeholder": "​", + "style": "IPY_MODEL_c76d546ab3bb40afbc09225f4ba9a61a", + "tabbable": null, + "tooltip": null, + "value": " 665/665 [00:00<00:00, 122kB/s]" + } + }, + "9ca7416cbcc44207b9ff8ec60b77e4f7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_80d61f23f3b1421597675f9158230806", + "placeholder": "​", + "style": "IPY_MODEL_ee2ef3a1837742ba90d4d891c883a448", + "tabbable": null, + "tooltip": null, + "value": "config.json: 100%" + } + }, + "a09960d7c6b845278d156742e46337b4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, 
+ "a13610463eff443eab5201d88b437c32": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_4175eb1743bb47febc1c98003c127efa", + "IPY_MODEL_19b17b159cb045e2b9409a32373baddb", + "IPY_MODEL_e6cb5b942c2643058c8881d9abb1ee59" + ], + "layout": "IPY_MODEL_1d6b40b92bfb4edc8ad82a502ca348e7", + "tabbable": null, + "tooltip": null + } + }, + "a3e6dfde4cba4164a3a9f7b64bf5ece9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_85b6e0daec254d90817fdc4896b81182", + "max": 231508.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_fdd93a54ae914a45a4e3f741258f7ebf", + "tabbable": null, + "tooltip": null, + "value": 231508.0 + } + }, + "a5d79ce8fcef45fc855951236e270f87": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": 
null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a75f055623954af4bff1b9766480c9b4": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + 
"top": null, + "visibility": null, + "width": null + } + }, + "adb613c0dee44ac6ad8ac069e861a8b0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ae6e8c3bbf59436a8d38e4cbadafc16d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b2df1dd291914ac7861f8930a5723c1b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "b79fde25096642c3986cdf253bb822c6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b8d2d005669f4ec0ba9f7d5c95fb40dd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + 
"_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b9ef14965c534b329236a3a445f116ec": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_a5d79ce8fcef45fc855951236e270f87", + "placeholder": "​", + "style": "IPY_MODEL_c9893574ae0c4a5a84dfb5057618473d", + "tabbable": null, + "tooltip": null, + "value": " 54.2M/54.2M [00:00<00:00, 158MB/s]" + } + }, + "bc1275e8a8c84c77bba48e1eeaa77bd5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "bd5b40ed2ef84fe0b0237ab4c21fdbca": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c55108249e8543e78764a0f0a45a568d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_1972ba4e39964510bf089a893d813eef", + "max": 54245363.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_e507c6a79fc249daa9169eb29294e44f", + "tabbable": null, + "tooltip": null, + "value": 54245363.0 + } + }, + "c76d546ab3bb40afbc09225f4ba9a61a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + 
"c8a5f3defa9140d4857106695fa10791": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c9893574ae0c4a5a84dfb5057618473d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c9d478f89f134427855dc672b84f2d28": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": 
null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "caa96b6f766d49c2bd7c88e557f23410": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "cbf4e292633e4c32a012d754c36df918": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_379c19fb127744fbaddedb38e706da2f", + "max": 48.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_a09960d7c6b845278d156742e46337b4", + "tabbable": null, + "tooltip": null, + "value": 48.0 + } + }, + "cefa58e6f04245eaafe2b5263b13bb95": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": 
"HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c9d478f89f134427855dc672b84f2d28", + "placeholder": "​", + "style": "IPY_MODEL_b2df1dd291914ac7861f8930a5723c1b", + "tabbable": null, + "tooltip": null, + "value": ".gitattributes: 100%" + } + }, + "d459e593b0cf4897965b636cad71a1a8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "d7220a33224d4df99774d8518fb4df26": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "d9cca23a44ac4e8a9bd6eee7d7fb84cb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_5e75f0427ce14a47a052cb2740161cb8", + "IPY_MODEL_e13d202ff3674675bee75cf64f2aeb4a", + "IPY_MODEL_7194a422b3d74e05a5f2cc1c2755b878" + ], + "layout": "IPY_MODEL_1e3a852d593944b18283fc6631b75205", + 
"tabbable": null, + "tooltip": null + } + }, + "db77b75555a8454b83f7f9bf1426976d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e13d202ff3674675bee75cf64f2aeb4a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_a75f055623954af4bff1b9766480c9b4", + "max": 466062.0, + "min": 0.0, + "orientation": "horizontal", + "style": 
"IPY_MODEL_adb613c0dee44ac6ad8ac069e861a8b0", + "tabbable": null, + "tooltip": null, + "value": 466062.0 + } + }, + "e1fa773d76e549459e4897b226a0889a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_353319008bd54b5db7c5aa1156c04857", + "placeholder": "​", + "style": "IPY_MODEL_727afc6855144b23b5e5c7f31e22a5e6", + "tabbable": null, + "tooltip": null, + "value": " 391/391 [00:00<00:00, 64.7kB/s]" + } + }, + "e507c6a79fc249daa9169eb29294e44f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "e6cb5b942c2643058c8881d9abb1ee59": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_82917f444a44410580081144539a8367", + "placeholder": "​", + "style": "IPY_MODEL_d7220a33224d4df99774d8518fb4df26", + "tabbable": null, + "tooltip": null, + "value": " 
2.21k/2.21k [00:00<00:00, 384kB/s]" + } + }, + "ee2ef3a1837742ba90d4d891c883a448": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "eff1b23e81db4d9fb99976f2f804f4d4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "fd918815e8fd416e9e948d96e4725abd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": 
null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fdd93a54ae914a45a4e3f741258f7ebf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/audio.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/audio.ipynb new file mode 100644 index 000000000..1c3bab40e --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/audio.ipynb @@ -0,0 +1,3213 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "eVufWTY3jRPx" + }, + "source": [ + "# Detecting Issues in an Audio Dataset with Datalab\n", + "\n", + "In this 5-minute quickstart tutorial, we use cleanlab to find label issues in the [Spoken Digit dataset](https://www.tensorflow.org/datasets/catalog/spoken_digit) (it's like MNIST for audio). 
The dataset contains 2,500 audio clips with English pronunciations of the digits 0 to 9 (these are the class labels to predict from the audio).\n", + "\n", + "**Overview of what we'll do in this tutorial:**\n", + "\n", + "- Extract features from audio clips (.wav files) using a [pre-trained PyTorch model](https://huggingface.co/speechbrain/spkrec-xvect-voxceleb) from HuggingFace that was previously fit to the [VoxCeleb](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/) speech dataset.\n", + "\n", + "- Train a cross-validated linear model using the extracted features and generate out-of-sample predicted probabilities.\n", + "\n", + "- Apply cleanlab's `Datalab` audit to these predictions to identify which audio clips in the dataset are likely mislabeled.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have a `model`? Run cross-validation to get out-of-sample `pred_probs`, and then run the code below to audit your dataset and identify any potential issues.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(data=your_dataset, label_name=\"column_name_of_labels\")\n", + "lab.find_issues(pred_probs=your_pred_probs, issue_types={\"label\":{}})\n", + "\n", + "lab.get_issues(\"label\")\n", + " \n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eqsqBq3PiUHA" + }, + "source": [ + "## 1. Install dependencies and import them\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7nT-U9qc8MS" + }, + "source": [ + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install tensorflow==2.12.1 tensorflow_io==0.32.0 huggingface_hub==0.17.0 speechbrain==0.5.13 \n", + "!pip install \"cleanlab[datalab]\"\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:40.845728Z", + "iopub.status.busy": "2024-06-25T22:58:40.845543Z", + "iopub.status.idle": "2024-06-25T22:58:46.056318Z", + "shell.execute_reply": "2024-06-25T22:58:46.055750Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "# Package versions used: tensorflow==2.12.1 tensorflow-io==0.32.0 torch==2.1.2 torchaudio==2.1.2 speechbrain==0.5.13\n", + "\n", + "dependencies = [\"cleanlab\", \"tensorflow==2.12.1\", \"tensorflow_io==0.32.0\", \"huggingface_hub==0.17.0\", \"speechbrain==0.5.13\", \"datasets\"]\n", + "\n", + "# Suppress outputs that may appear if tensorflow happens to be improperly installed: \n", + "import os \n", + "os.environ[\"TF_CPP_MIN_LOG_LEVEL\"] = \"3\" \n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else
dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\") " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "x-oboEbRdhf6" + }, + "source": [ + "Let's import some of the packages needed throughout this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:46.059141Z", + "iopub.status.busy": "2024-06-25T22:58:46.058559Z", + "iopub.status.idle": "2024-06-25T22:58:46.061863Z", + "shell.execute_reply": "2024-06-25T22:58:46.061316Z" + }, + "id": "LaEiwXUiVHCS" + }, + "outputs": [], + "source": [ + "import os\n", + "import pandas as pd\n", + "import numpy as np\n", + "import random\n", + "import tensorflow as tf\n", + "import torch\n", + "\n", + "from cleanlab import Datalab\n", + "\n", + "SEED = 456 # ensure reproducibility" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:46.064020Z", + "iopub.status.busy": "2024-06-25T22:58:46.063572Z", + "iopub.status.idle": "2024-06-25T22:58:46.068163Z", + "shell.execute_reply": "2024-06-25T22:58:46.067637Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This (optional) cell is hidden from docs.cleanlab.ai \n", + "\n", + "def set_seed(seed=0):\n", + " \"\"\"Ensure reproducibility.\"\"\"\n", + " np.random.seed(seed)\n", + " torch.manual_seed(seed)\n", + " torch.backends.cudnn.deterministic = True\n", + " torch.backends.cudnn.benchmark = False\n", + " torch.cuda.manual_seed_all(seed)\n", + "\n", 
+ "\n", + "set_seed(SEED)\n", + "pd.options.display.max_colwidth = 500\n", + "tf.get_logger().setLevel('FATAL') # suppress more TF logs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SOen_sxQidLC" + }, + "source": [ + "## 2. Load the data\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uHVskN2eeNj6" + }, + "source": [ + "We must first fetch the dataset. To run the command below, you'll need to have `wget` installed; alternatively, you can open the link in your browser and download the archive manually.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:58:46.070430Z", + "iopub.status.busy": "2024-06-25T22:58:46.070018Z", + "iopub.status.idle": "2024-06-25T22:58:47.619144Z", + "shell.execute_reply": "2024-06-25T22:58:47.618527Z" + }, + "id": "GRDPEg7-VOQe", + "outputId": "cb886220-e86e-4a77-9f3a-d7844c37c3a6" + }, + "outputs": [], + "source": [ + "%%capture\n", + "\n", + "!wget https://github.com/Jakobovski/free-spoken-digit-dataset/archive/v1.0.9.tar.gz\n", + "!mkdir spoken_digits\n", + "!tar -xf v1.0.9.tar.gz -C spoken_digits" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tRvNnyB0e_IE" + }, + "source": [ + "The audio data are .wav files in the `recordings/` folder. Note that the label for each audio clip (i.e. digit from 0 to 9) is indicated in the prefix of the file name (e.g. `6_nicolas_32.wav` has the label 6). If you are instead applying cleanlab to your own dataset, its classes should be represented as integer indices 0, 1, ..., num_classes - 1."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:58:47.621726Z", + "iopub.status.busy": "2024-06-25T22:58:47.621385Z", + "iopub.status.idle": "2024-06-25T22:58:47.631691Z", + "shell.execute_reply": "2024-06-25T22:58:47.631151Z" + }, + "id": "FDA5sGZwUSur", + "outputId": "0cedc509-63fd-4dc3-d32f-4b537dfe3895" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/7_george_26.wav',\n", + " 'spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/0_nicolas_24.wav',\n", + " 'spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/0_nicolas_6.wav']" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "DATA_PATH = \"spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/\"\n", + "\n", + "# Get list of .wav file names\n", + "# os.listdir order is nondeterministic, so for reproducibility,\n", + "# we sort first and then do a deterministic shuffle\n", + "file_names = sorted(i for i in os.listdir(DATA_PATH) if i.endswith(\".wav\"))\n", + "random.Random(SEED).shuffle(file_names)\n", + "\n", + "file_paths = [os.path.join(DATA_PATH, name) for name in file_names]\n", + "\n", + "# Check out first 3 files\n", + "file_paths[:3]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Xi2592bVhSab" + }, + "source": [ + "Let's listen to some example audio clips from the dataset. We introduce a `display_example` function to process the .wav file so we can listen to it in this notebook (feel free to skip these details)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
See the implementation of `display_example` **(click to expand)**\n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai; if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "import tensorflow_io as tfio\n", + "from pathlib import Path\n", + "from IPython import display\n", + "\n", + "# Utility function for loading audio files and making sure the sample rate is correct.\n", + "@tf.function\n", + "def load_wav_16k_mono(filename):\n", + " \"\"\"Load a WAV file, convert it to a float tensor, resample to 16 kHz single-channel audio.\"\"\"\n", + " file_contents = tf.io.read_file(filename)\n", + " wav, sample_rate = tf.audio.decode_wav(file_contents, desired_channels=1)\n", + " wav = tf.squeeze(wav, axis=-1)\n", + " sample_rate = tf.cast(sample_rate, dtype=tf.int64)\n", + " wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000)\n", + " return wav\n", + "\n", + "\n", + "def display_example(wav_file_name, audio_rate=16000):\n", + " \"\"\"Allows us to listen to any wav file and displays its given label in the dataset.\"\"\"\n", + " wav_file_example = load_wav_16k_mono(wav_file_name)\n", + " label = Path(wav_file_name).parts[-1].split(\"_\")[0]\n", + " print(f\"Given label for this example: {label}\")\n", + " display.display(display.Audio(wav_file_example, rate=audio_rate))\n", + "```\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:47.633725Z", + "iopub.status.busy": "2024-06-25T22:58:47.633541Z", + "iopub.status.idle": "2024-06-25T22:58:47.638761Z", + "shell.execute_reply": "2024-06-25T22:58:47.638322Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "import tensorflow_io as tfio\n", + "from pathlib import Path\n", + "from IPython import display\n", + "\n", + "# Utility function for loading audio files and making sure the sample rate is correct.\n", + "@tf.function\n", + "def load_wav_16k_mono(filename):\n", + " \"\"\"Load a WAV file, convert it to a float tensor, resample to 16 kHz single-channel audio.\"\"\"\n", + " file_contents = tf.io.read_file(filename)\n", + " wav, sample_rate = tf.audio.decode_wav(file_contents, desired_channels=1)\n", + " wav = tf.squeeze(wav, axis=-1)\n", + " sample_rate = tf.cast(sample_rate, dtype=tf.int64)\n", + " wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000)\n", + " return wav\n", + "\n", + "\n", + "def display_example(wav_file_name, audio_rate=16000):\n", + " \"\"\"Allows us to listen to any wav file and displays its given label in the dataset.\"\"\"\n", + " wav_file_example = load_wav_16k_mono(wav_file_name)\n", + " label = Path(wav_file_name).parts[-1].split(\"_\")[0]\n", + " print(f\"Given label for this example: {label}\")\n", + " display.display(display.Audio(wav_file_example, rate=audio_rate))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2bLlDRI6hzon" + }, + "source": [ + "Click the play button below to listen to this example .wav file. 
Feel free to change the `wav_file_name_example` variable below to listen to other audio clips in the dataset.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 92 + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:58:47.640672Z", + "iopub.status.busy": "2024-06-25T22:58:47.640496Z", + "iopub.status.idle": "2024-06-25T22:58:48.094556Z", + "shell.execute_reply": "2024-06-25T22:58:48.094048Z" + }, + "id": "dLBvUZLlII5w", + "outputId": "c6a4917f-4a82-4a89-9193-415072e45550" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Given label for this example: 7\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "wav_file_name_example = \"spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/7_jackson_43.wav\" # change this to hear other examples\n", + "display_example(wav_file_name_example)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-QvbZA7yHwkh" + }, + "source": [ + "## 3. Use pre-trained SpeechBrain model to featurize audio\n", + "\n", + "The [SpeechBrain](https://github.com/speechbrain/speechbrain) package offers many PyTorch neural networks that have been pretrained for speech recognition tasks. Here we instantiate an audio feature extractor using SpeechBrain's `EncoderClassifier`.
We'll use the \"spkrec-xvect-voxceleb\" network, which has been pre-trained on the [VoxCeleb](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/) speech dataset.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:48.096739Z", + "iopub.status.busy": "2024-06-25T22:58:48.096382Z", + "iopub.status.idle": "2024-06-25T22:58:48.947341Z", + "shell.execute_reply": "2024-06-25T22:58:48.946757Z" + }, + "id": "vL9lkiKsHvKr" + }, + "outputs": [], + "source": [ + "%%capture\n", + "\n", + "from speechbrain.pretrained import EncoderClassifier\n", + "\n", + "feature_extractor = EncoderClassifier.from_hparams(\n", + " \"speechbrain/spkrec-xvect-voxceleb\",\n", + " # run_opts={\"device\":\"cuda\"} # Uncomment this to run on GPU if you have one (optional)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vXlE6IK4ibcr" + }, + "source": [ + "Next, we run the audio clips through the pre-trained model to extract vector features (aka embeddings).\n", + "\n", + "For this tutorial, ensure that you have `ffmpeg` installed on your system. It is the backend used for loading the audio files." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 143 + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:58:48.949724Z", + "iopub.status.busy": "2024-06-25T22:58:48.949534Z", + "iopub.status.idle": "2024-06-25T22:58:48.967991Z", + "shell.execute_reply": "2024-06-25T22:58:48.967533Z" + }, + "id": "obQYDKdLiUU6", + "outputId": "4e923d5c-2cf4-4a5c-827b-0a4fea9d87e4" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
wav_audio_file_pathlabel
0spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/7_george_26.wav7
1spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/0_nicolas_24.wav0
2spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/0_nicolas_6.wav0
\n", + "
" + ], + "text/plain": [ + " wav_audio_file_path \\\n", + "0 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/7_george_26.wav \n", + "1 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/0_nicolas_24.wav \n", + "2 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/0_nicolas_6.wav \n", + "\n", + " label \n", + "0 7 \n", + "1 0 \n", + "2 0 " + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Create dataframe with .wav file names\n", + "df = pd.DataFrame(file_paths, columns=[\"wav_audio_file_path\"])\n", + "df[\"label\"] = df.wav_audio_file_path.map(lambda x: int(Path(x).parts[-1].split(\"_\")[0]))\n", + "df.head(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:48.970070Z", + "iopub.status.busy": "2024-06-25T22:58:48.969629Z", + "iopub.status.idle": "2024-06-25T22:58:48.972812Z", + "shell.execute_reply": "2024-06-25T22:58:48.972286Z" + }, + "id": "I8JqhOZgi94g" + }, + "outputs": [], + "source": [ + "import torchaudio\n", + "\n", + "def extract_audio_embeddings(model, wav_audio_file_path: str) -> torch.Tensor:\n", + " \"\"\"Feature extractor that embeds audio into a vector.\"\"\"\n", + " signal, fs = torchaudio.load(wav_audio_file_path, backend=\"ffmpeg\") # Load the audio waveform as a tensor (plus its sample rate)\n", + " embeddings = model.encode_batch(\n", + " signal\n", + " ) # Pass tensor through pretrained neural net and extract representation\n", + " return embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:58:48.974728Z", + "iopub.status.busy": "2024-06-25T22:58:48.974554Z", + "iopub.status.idle": "2024-06-25T22:59:03.451707Z", + "shell.execute_reply": "2024-06-25T22:59:03.451065Z" + }, + "id": "2FSQ2GR9R_YA" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ +
"/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch/functional.py:650: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.\n", + "Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:863.)\n", + " return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]\n" + ] + } + ], + "source": [ + "# Extract audio embeddings\n", + "embeddings_list = []\n", + "for i, file_name in enumerate(df.wav_audio_file_path): # for each .wav file name\n", + " embeddings = extract_audio_embeddings(feature_extractor, file_name)\n", + " embeddings_list.append(embeddings.cpu().numpy())\n", + "\n", + "embeddings_array = np.squeeze(np.array(embeddings_list))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dELkcdXgjTn_" + }, + "source": [ + "Now we have our features in a 2D numpy array. Each row in the array corresponds to an audio clip. We're now able to represent each audio clip as a 512-dimensional feature vector!\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:59:03.454512Z", + "iopub.status.busy": "2024-06-25T22:59:03.454100Z", + "iopub.status.idle": "2024-06-25T22:59:03.458096Z", + "shell.execute_reply": "2024-06-25T22:59:03.457604Z" + }, + "id": "kAkY31IVXyr8", + "outputId": "fd70d8d6-2f11-48d5-ae9c-a8c97d453632" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[-14.196311 7.319459 12.478975 ... 2.2890875 2.8170238\n", + " -10.89265 ]\n", + " [-24.898056 5.256195 12.559641 ... -3.559721 9.62067\n", + " -10.285245 ]\n", + " [-21.709627 7.5033693 7.913803 ... 
-6.819831 3.1831515\n", + " -17.208763 ]\n", + " ...\n", + " [-16.084257 6.3210397 12.005453 ... 1.216152 9.478235\n", + " -10.6821785 ]\n", + " [-15.053807 5.242471 1.091424 ... -0.78334856 9.03954\n", + " -23.569176 ]\n", + " [-19.761097 1.1258295 16.753237 ... 3.3508866 11.598274\n", + " -16.23712 ]]\n", + "Shape of array: (2500, 512)\n" + ] + } + ], + "source": [ + "print(embeddings_array)\n", + "print(\"Shape of array: \", embeddings_array.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o4RBcaARmfVG" + }, + "source": [ + "## 4. Fit linear model and compute out-of-sample predicted probabilities\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y9BIVyI9kHa4" + }, + "source": [ + "A typical way to leverage pretrained networks for a particular classification task is to add a linear output layer and fine-tune the network parameters on the new data. However, this can be computationally intensive. Alternatively, we can freeze the pretrained weights of the network and only train the output layer without having to rely on GPU(s). Here we do this conveniently by fitting a scikit-learn linear model on top of the extracted network embeddings.\n", + "\n", + "To identify label issues, cleanlab requires a probabilistic prediction from your model for every datapoint that should be considered. However, these predictions will be _overfit_ (and thus unreliable) for datapoints the model was previously trained on. cleanlab is intended to be used only with **out-of-sample** predicted probabilities, i.e. predictions on datapoints held out from the model during training.\n", + "\n", + "K-fold cross-validation is a straightforward way to produce out-of-sample predicted probabilities for every datapoint in the dataset, by training K copies of our model on different data subsets and using each copy to predict on the subset of data it did not see during training.
An additional benefit of cross-validation is that it provides more reliable evaluation of our model than a single training/validation split. We can obtain cross-validated out-of-sample predicted probabilities from any classifier via the [cross_val_predict](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html) wrapper provided in scikit-learn.\n", + "Make sure that the columns of your `pred_probs` are properly ordered with respect to the ordering of classes, which for Datalab means lexicographically sorted by class name.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:03.460277Z", + "iopub.status.busy": "2024-06-25T22:59:03.459958Z", + "iopub.status.idle": "2024-06-25T22:59:04.184137Z", + "shell.execute_reply": "2024-06-25T22:59:04.183570Z" + }, + "id": "i_drkY9YOcw4" + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.model_selection import cross_val_predict\n", + "\n", + "model = LogisticRegression(C=0.01, max_iter=1000, tol=1e-2, random_state=SEED)\n", + "\n", + "num_crossval_folds = 5 # can decrease this value to reduce runtime, or increase it to get better results\n", + "pred_probs = cross_val_predict(\n", + " estimator=model, X=embeddings_array, y=df.label.values, cv=num_crossval_folds, method=\"predict_proba\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FW1yI9Ryrfkj" + }, + "source": [ + "For each audio clip, the corresponding predicted probabilities in `pred_probs` are produced by a copy of our `LogisticRegression` model that has never been trained on this audio clip. Hence we call these predictions _out-of-sample_.
An additional benefit of cross-validation is that it provides more reliable evaluation of our model than a single training/validation split.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.187084Z", + "iopub.status.busy": "2024-06-25T22:59:04.186697Z", + "iopub.status.idle": "2024-06-25T22:59:04.191382Z", + "shell.execute_reply": "2024-06-25T22:59:04.190911Z" + }, + "id": "_b-AQeoXOc7q", + "outputId": "15ae534a-f517-4906-b177-ca91931a8954" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cross-validated estimate of accuracy on held-out data: 0.9708\n" + ] + } + ], + "source": [ + "from sklearn.metrics import accuracy_score\n", + "\n", + "predicted_labels = pred_probs.argmax(axis=1)\n", + "cv_accuracy = accuracy_score(df.label.values, predicted_labels)\n", + "print(f\"Cross-validated estimate of accuracy on held-out data: {cv_accuracy}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SPz8WBwIlxUE" + }, + "source": [ + "## 5. Use cleanlab to find label issues\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "laui-jXMm6qR" + }, + "source": [ + "Based on the given labels, out-of-sample predicted probabilities and features, cleanlab can quickly help us identify label issues in our dataset. For a dataset with N examples from K classes, the labels should be a 1D array of length N and predicted probabilities should be a 2D (N x K) array. \n", + "\n", + "Here, we use cleanlab to find potential label errors in our data. `Datalab` has several ways of loading the data. In this case, we can just pass the DataFrame created above to instantiate the object. We will then pass in the predicted probabilities to the `find_issues()` method so that Datalab can use them to find potential label errors in our data."
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.193789Z", + "iopub.status.busy": "2024-06-25T22:59:04.193426Z", + "iopub.status.idle": "2024-06-25T22:59:04.293132Z", + "shell.execute_reply": "2024-06-25T22:59:04.292506Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding label issues ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Audit complete. 7 issues found in the dataset.\n" + ] + } + ], + "source": [ + "lab = Datalab(df, label_name=\"label\")\n", + "lab.find_issues(pred_probs=pred_probs, issue_types={\"label\":{}})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can view the results of running Datalab by calling the `report` method:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.295635Z", + "iopub.status.busy": "2024-06-25T22:59:04.295248Z", + "iopub.status.idle": "2024-06-25T22:59:04.308102Z", + "shell.execute_reply": "2024-06-25T22:59:04.307524Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset Information: num_examples: 2500, num_classes: 10\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + "issue_type num_issues\n", + " label 7\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. 
due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 7\n", + "Overall dataset quality in terms of this issue: 0.9976\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "986 True 0.002161 6 3\n", + "176 True 0.002483 7 8\n", + "2318 False 0.004411 3 6\n", + "1005 False 0.004857 0 9\n", + "1871 True 0.007494 6 8\n" + ] + } + ], + "source": [ + "lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We observe from the report that cleanlab has found some label issues in our dataset. Let us investigate these examples further.\n", + "\n", + "We can view more details about the label quality for each example using the `get_issues` method, specifying `label` as the issue type." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.310494Z", + "iopub.status.busy": "2024-06-25T22:59:04.310016Z", + "iopub.status.idle": "2024-06-25T22:59:04.317984Z", + "shell.execute_reply": "2024-06-25T22:59:04.317507Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_label_issue label_score given_label predicted_label
0 False 0.040587 7 6
1 False 0.999207 0 0
2 False 0.999377 0 0
3 False 0.975220 8 8
4 False 0.999367 5 5
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_score given_label predicted_label\n", + "0 False 0.040587 7 6\n", + "1 False 0.999207 0 0\n", + "2 False 0.999377 0 0\n", + "3 False 0.975220 8 8\n", + "4 False 0.999367 5 5" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issues = lab.get_issues(\"label\")\n", + "label_issues.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This method returns a dataframe containing a label quality score for each example. These numeric scores lie between 0 and 1, where lower scores indicate examples more likely to be mislabeled. The dataframe also contains a boolean column specifying whether or not each example is identified to have a label issue (indicating it is likely mislabeled).\n", + "\n", + "We can then filter for the examples that have been identified as a label error:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.320165Z", + "iopub.status.busy": "2024-06-25T22:59:04.319843Z", + "iopub.status.idle": "2024-06-25T22:59:04.324123Z", + "shell.execute_reply": "2024-06-25T22:59:04.323688Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Here are indices of the most likely errors: \n", + " [ 986 176 1871 516 1946 469 2132]\n" + ] + } + ], + "source": [ + "identified_label_issues = label_issues[label_issues[\"is_label_issue\"] == True]\n", + "lowest_quality_labels = identified_label_issues.sort_values(\"label_score\").index\n", + "\n", + "print(f\"Here are indices of the most likely errors: \\n {lowest_quality_labels.values}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iI07jQ0BnTgt" + }, + "source": [ + "These examples flagged by cleanlab are those worth inspecting more closely." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 237 + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.326143Z", + "iopub.status.busy": "2024-06-25T22:59:04.325821Z", + "iopub.status.idle": "2024-06-25T22:59:04.331436Z", + "shell.execute_reply": "2024-06-25T22:59:04.330970Z" + }, + "id": "FQwRHgbclpsO", + "outputId": "fee5c335-c00e-4fcc-f22b-718705e93182" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
wav_audio_file_path label
986 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_25.wav 6
176 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/7_nicolas_43.wav 7
1871 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_theo_27.wav 6
516 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_36.wav 6
1946 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_14.wav 6
469 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_35.wav 6
2132 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_nicolas_8.wav 6
\n", + "
" + ], + "text/plain": [ + " wav_audio_file_path \\\n", + "986 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_25.wav \n", + "176 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/7_nicolas_43.wav \n", + "1871 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_theo_27.wav \n", + "516 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_36.wav \n", + "1946 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_14.wav \n", + "469 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_35.wav \n", + "2132 spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_nicolas_8.wav \n", + "\n", + " label \n", + "986 6 \n", + "176 7 \n", + "1871 6 \n", + "516 6 \n", + "1946 6 \n", + "469 6 \n", + "2132 6 " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.iloc[lowest_quality_labels]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PsDmd5WDnZJG" + }, + "source": [ + "Let's listen to some audio clips below of label issues that were identified in this list.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p9jLn3Lp85rU" + }, + "source": [ + "In this example, the given label is **6** but it sounds like **8**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 92 + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.333385Z", + "iopub.status.busy": "2024-06-25T22:59:04.333193Z", + "iopub.status.idle": "2024-06-25T22:59:04.447734Z", + "shell.execute_reply": "2024-06-25T22:59:04.447152Z" + }, + "id": "ff1NFVlDoysO", + "outputId": "8141a036-44c1-4349-c338-880432513e37" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Given label for this example: 6\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": 
{}, + "output_type": "display_data" + } + ], + "source": [ + "wav_file_name_example = \"spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_14.wav\"\n", + "display_example(wav_file_name_example)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HwokyN0bfVsn" + }, + "source": [ + "In the three examples below, the given label is **6** but they sound quite ambiguous.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 92 + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.449970Z", + "iopub.status.busy": "2024-06-25T22:59:04.449600Z", + "iopub.status.idle": "2024-06-25T22:59:04.558381Z", + "shell.execute_reply": "2024-06-25T22:59:04.557776Z" + }, + "id": "GZgovGkdiaiP", + "outputId": "d76b2ccf-8be2-4f3a-df4c-2c5c99150db7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Given label for this example: 6\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "wav_file_name_example = \"spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_36.wav\"\n", + "display_example(wav_file_name_example)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 92 + }, + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.560668Z", + "iopub.status.busy": "2024-06-25T22:59:04.560327Z", + "iopub.status.idle": "2024-06-25T22:59:04.665255Z", + "shell.execute_reply": "2024-06-25T22:59:04.664691Z" + }, + "id": "lfa2eHbMwG8R", + "outputId": "6627ebe2-d439-4bf5-e2cb-44f6278ae86c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Given label for this example: 6\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + 
"" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "wav_file_name_example = \"spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_yweweler_35.wav\"\n", + "display_example(wav_file_name_example)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.667615Z", + "iopub.status.busy": "2024-06-25T22:59:04.667174Z", + "iopub.status.idle": "2024-06-25T22:59:04.772289Z", + "shell.execute_reply": "2024-06-25T22:59:04.771694Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Given label for this example: 6\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "wav_file_name_example = \"spoken_digits/free-spoken-digit-dataset-1.0.9/recordings/6_nicolas_8.wav\"\n", + "display_example(wav_file_name_example)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-rf8iSngtV83" + }, + "source": [ + "You can see that even widely-used datasets like Spoken Digit contain problematic labels. Never blindly trust your data! 
You should always check it for potential issues, many of which can be easily identified by cleanlab.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:04.774517Z", + "iopub.status.busy": "2024-06-25T22:59:04.774191Z", + "iopub.status.idle": "2024-06-25T22:59:04.777492Z", + "shell.execute_reply": "2024-06-25T22:59:04.776921Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "highlighted_indices = [1946, 516, 469, 2132] # verify these examples were found in find_label_issues\n", + "if not all(x in lowest_quality_labels for x in highlighted_indices):\n", + " raise Exception(\"Some highlighted examples are missing from label_issues_indices.\")" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [], + "name": "audio_quickstart_tutorial_deterministic.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "00f915c09107454681ebfa2b4540f0e8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": 
"IPY_MODEL_51ef1073b0524fd98b6e130059757688", + "placeholder": "​", + "style": "IPY_MODEL_fe6b6ca6fb084738b979aef37c54b399", + "tabbable": null, + "tooltip": null, + "value": " 3.20k/3.20k [00:00<00:00, 803kB/s]" + } + }, + "0290afa0cb3c4a17931ec3f42ae804d8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f1fe2b33807f43c8b4e60f2337122b2e", + "IPY_MODEL_809f1529186a47dcb483d1429c1e4b9a", + "IPY_MODEL_4f6ccbed1de243daa51d2d74dc38f97f" + ], + "layout": "IPY_MODEL_2f5c030fde3548cc9aa00e086715d89d", + "tabbable": null, + "tooltip": null + } + }, + "04cb4c5077eb41298a773da4cc32dec2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + 
"min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0f572c1ad5114f4588cfb051d49b535f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "1099cdbb12f641959627d62a900557ec": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "27d3185b562d445791b473b250716e1f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "2f5c030fde3548cc9aa00e086715d89d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + 
"_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "32f98320ef984e139c321d7e214c8db5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "334369b6e2f344128d1a4fc9b7d9524c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "36a700b9da9649e0bf5fd678a84f23b4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "36e18140e7104f689f0c950e74fab62d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "3a2804ae6bd348f3a97dae2a44f48017": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e2592b808bd84023a5493fa870d55a0b", + "max": 2041.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_8d5abf6685cb4600a2c78ff80f0701f4", + "tabbable": null, + "tooltip": null, + "value": 2041.0 + } + }, + "405acc520a9c439aa3ec12b175636625": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_4e8c2f6b4a55404cb4cac34a5e1ddbb0", + "placeholder": "​", + "style": 
"IPY_MODEL_63fea54b15ef4ac5a9da0eb1b2a0e03d", + "tabbable": null, + "tooltip": null, + "value": " 2.04k/2.04k [00:00<00:00, 509kB/s]" + } + }, + "4179d8185e434aa1a112ee7d9b29d20b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_443712db39c1449786d22e3a7c0767c4", + "placeholder": "​", + "style": "IPY_MODEL_b02f3f2fb4ec417e88b2a19a29b941a2", + "tabbable": null, + "tooltip": null, + "value": "hyperparams.yaml: 100%" + } + }, + "42b41247581040349e012b68be8f53c5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + 
"order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "43f2f3d821d04504997dede2c1b82600": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "443712db39c1449786d22e3a7c0767c4": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4e8c2f6b4a55404cb4cac34a5e1ddbb0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": 
"2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4f6ccbed1de243daa51d2d74dc38f97f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_efe307f51ad34dae898c58c7beda3d63", + "placeholder": "​", + "style": "IPY_MODEL_27d3185b562d445791b473b250716e1f", + "tabbable": null, + "tooltip": null, + "value": " 129k/129k [00:00<00:00, 21.3MB/s]" + } + }, + "4f89be5e3b364ada9df5338e5f7ee4a8": { + "model_module": "@jupyter-widgets/controls", + 
"model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_d48bb13c95054f2f9414d2ab9a822ed1", + "max": 15856877.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_36a700b9da9649e0bf5fd678a84f23b4", + "tabbable": null, + "tooltip": null, + "value": 15856877.0 + } + }, + "51c3c4a3bbe448b4b9902bdd3e6734e7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "51ef1073b0524fd98b6e130059757688": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": 
null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "53f9b1e20eb3420b9853f15dea0bffb5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "586066e739754b39b2eaf7a3f1da8504": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "63fea54b15ef4ac5a9da0eb1b2a0e03d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "6686c7d44ccd448c99ca91fa3c91917c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_905292a5be4c43c29e50cb746d562746", + "placeholder": "​", + "style": "IPY_MODEL_a9ee521be6c2488296be4f26b9f8868f", + "tabbable": null, + "tooltip": null, + "value": "mean_var_norm_emb.ckpt: 100%" + } + }, + "6e9efc55a4eb48c1ac08f0b7d8563228": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + 
"border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "779883c9e0f3479aaface68a7fd38753": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": 
null, + "visibility": null, + "width": null + } + }, + "809f1529186a47dcb483d1429c1e4b9a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_32f98320ef984e139c321d7e214c8db5", + "max": 128619.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_43f2f3d821d04504997dede2c1b82600", + "tabbable": null, + "tooltip": null, + "value": 128619.0 + } + }, + "87c5a65a353c4367ab44da337bb564a4": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + 
"overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8d5abf6685cb4600a2c78ff80f0701f4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "8dcf45d6cb38489abf0bc2385f1f0fe3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c7d4ac522a8141b1855d6a0a06ffccca", + "IPY_MODEL_4f89be5e3b364ada9df5338e5f7ee4a8", + "IPY_MODEL_e2a41f4fe1f64747825001239ab7c734" + ], + "layout": "IPY_MODEL_6e9efc55a4eb48c1ac08f0b7d8563228", + "tabbable": null, + "tooltip": null + } + }, + "905292a5be4c43c29e50cb746d562746": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + 
"grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "98f6b5a3a1d541ffb3fcff9fa60a584a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "9a67d2e831e64ac1b3f117423732ffeb": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, 
+ "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9c880a00ce26488ca94bed3c4686acba": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_4179d8185e434aa1a112ee7d9b29d20b", + "IPY_MODEL_3a2804ae6bd348f3a97dae2a44f48017", + "IPY_MODEL_405acc520a9c439aa3ec12b175636625" + ], + "layout": "IPY_MODEL_9a67d2e831e64ac1b3f117423732ffeb", + "tabbable": null, + "tooltip": null + } + }, + "a595f74c4f7842f7ad38229c810d64a7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a9ee521be6c2488296be4f26b9f8868f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "aa06b8f59a1841478727fdbd8e406eb1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_53f9b1e20eb3420b9853f15dea0bffb5", + "placeholder": "​", + "style": "IPY_MODEL_1099cdbb12f641959627d62a900557ec", + "tabbable": null, + "tooltip": null, + "value": "embedding_model.ckpt: 100%" + } + }, + "af3e5db77667443a85ca53bbd97b3455": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b02f3f2fb4ec417e88b2a19a29b941a2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c41bda47d0d84a2a82813309cee28e95": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_04cb4c5077eb41298a773da4cc32dec2", + "max": 3201.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_af3e5db77667443a85ca53bbd97b3455", + "tabbable": null, + "tooltip": null, + "value": 3201.0 + } + }, + "c7d4ac522a8141b1855d6a0a06ffccca": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c94f00f43bbe4fe6b134a8d5d95ef380", + "placeholder": "​", + "style": 
"IPY_MODEL_0f572c1ad5114f4588cfb051d49b535f", + "tabbable": null, + "tooltip": null, + "value": "classifier.ckpt: 100%" + } + }, + "c94f00f43bbe4fe6b134a8d5d95ef380": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cb39878725a9444a9bcdd0b909c96332": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + 
"display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d08e98d155fd42dcbeb07f6f9eb4725b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_cb39878725a9444a9bcdd0b909c96332", + "max": 16887676.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_586066e739754b39b2eaf7a3f1da8504", + "tabbable": null, + "tooltip": null, + "value": 16887676.0 + } + }, + "d1696dfb2368405b8499e222be6f1194": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6686c7d44ccd448c99ca91fa3c91917c", + 
"IPY_MODEL_c41bda47d0d84a2a82813309cee28e95", + "IPY_MODEL_00f915c09107454681ebfa2b4540f0e8" + ], + "layout": "IPY_MODEL_a595f74c4f7842f7ad38229c810d64a7", + "tabbable": null, + "tooltip": null + } + }, + "d48bb13c95054f2f9414d2ab9a822ed1": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e2592b808bd84023a5493fa870d55a0b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": 
null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e2a41f4fe1f64747825001239ab7c734": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_87c5a65a353c4367ab44da337bb564a4", + "placeholder": "​", + "style": "IPY_MODEL_36e18140e7104f689f0c950e74fab62d", + "tabbable": null, + "tooltip": null, + "value": " 15.9M/15.9M [00:00<00:00, 145MB/s]" + } + }, + "e8e9c6cffb19480e95e34110a00102bf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_aa06b8f59a1841478727fdbd8e406eb1", + 
"IPY_MODEL_d08e98d155fd42dcbeb07f6f9eb4725b", + "IPY_MODEL_ebff3171e53c43e0ac9629989e5563a8" + ], + "layout": "IPY_MODEL_42b41247581040349e012b68be8f53c5", + "tabbable": null, + "tooltip": null + } + }, + "ebff3171e53c43e0ac9629989e5563a8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_779883c9e0f3479aaface68a7fd38753", + "placeholder": "​", + "style": "IPY_MODEL_51c3c4a3bbe448b4b9902bdd3e6734e7", + "tabbable": null, + "tooltip": null, + "value": " 16.9M/16.9M [00:00<00:00, 145MB/s]" + } + }, + "efe307f51ad34dae898c58c7beda3d63": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f1fe2b33807f43c8b4e60f2337122b2e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_334369b6e2f344128d1a4fc9b7d9524c", + "placeholder": "​", + "style": "IPY_MODEL_98f6b5a3a1d541ffb3fcff9fa60a584a", + "tabbable": null, + "tooltip": null, + "value": "label_encoder.txt: 100%" + } + }, + "fe6b6ca6fb084738b979aef37c54b399": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/datalab_advanced.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/datalab_advanced.ipynb new file mode 100644 index 000000000..abc6a7c01 --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/datalab_advanced.ipynb @@ -0,0 +1,1830 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Datalab: Advanced workflows to audit your data" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "Cleanlab offers a `Datalab` object to identify various issues in your machine learning datasets that may negatively impact models if not addressed. By default, `Datalab` can help you identify noisy labels, outliers, (near) duplicates, and other types of problems that commonly occur in real-world data.\n", + "\n", + "`Datalab` performs these checks using the (probabilistic) predictions of *any* already-trained ML model, or that model's learned representations of the data. Under the hood, this class calls all the appropriate cleanlab methods for your dataset and the provided model outputs, in order to best audit the data and alert you to important issues. This makes it easy to apply many of this library's functionalities with a single line of code. \n", + "\n", + "**This tutorial will demonstrate some advanced functionalities of Datalab including:**\n", + "\n", + "- Incremental issue search\n", + "- Specifying nondefault arguments to issue checks\n", + "- Saving and loading Datalab objects\n", + "- Adding a custom IssueManager\n", + "\n", + "If you are new to `Datalab`, check out this [quickstart tutorial](datalab_quickstart.html) for a 5-min introduction!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have (out-of-sample) `pred_probs` from a model trained on an existing set of labels? Maybe you have some `features` as well? Run the code below to examine your dataset for multiple types of issues.\n", + "\n", + "
\n", + " \n", + "```ipython3 \n", + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(data=your_dataset, label_name=\"column_name_of_labels\")\n", + "lab.find_issues(features=your_feature_matrix, pred_probs=your_pred_probs)\n", + "\n", + "lab.report()\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install and import required dependencies" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`Datalab` has additional dependencies that are not included in the standard installation of cleanlab.\n", + "\n", + "You can use pip to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install matplotlib \n", + "!pip install \"cleanlab[datalab]\"\n", + "\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:09.120470Z", + "iopub.status.busy": "2024-06-25T22:59:09.120297Z", + "iopub.status.idle": "2024-06-25T22:59:10.344076Z", + "shell.execute_reply": "2024-06-25T22:59:10.343514Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "dependencies = [\"cleanlab\", \"matplotlib\", \"datasets\"] # TODO: make sure this list is updated\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required 
dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:10.346661Z", + "iopub.status.busy": "2024-06-25T22:59:10.346333Z", + "iopub.status.idle": "2024-06-25T22:59:10.349437Z", + "shell.execute_reply": "2024-06-25T22:59:10.348915Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.model_selection import cross_val_predict\n", + "\n", + "from cleanlab import Datalab" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create and load the data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll load a toy classification dataset for this tutorial. The dataset has two numerical features and a label column with three classes." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
See the code for data generation. **(click to expand)**\n", + " \n", + "```ipython3\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "from cleanlab.benchmarking.noise_generation import (\n", + " generate_noise_matrix_from_trace,\n", + " generate_noisy_labels,\n", + ")\n", + "\n", + "SEED = 123\n", + "np.random.seed(SEED)\n", + "\n", + "BINS = {\n", + " \"low\": [-np.inf, 3.3],\n", + " \"mid\": [3.3, 6.6],\n", + " \"high\": [6.6, +np.inf],\n", + "}\n", + "\n", + "BINS_MAP = {\n", + " \"low\": 0,\n", + " \"mid\": 1,\n", + " \"high\": 2,\n", + "}\n", + "\n", + "\n", + "def create_data():\n", + "\n", + " X = np.random.rand(250, 2) * 5\n", + " y = np.sum(X, axis=1)\n", + " # Map y to bins based on the BINS dict\n", + " y_bin = np.array([k for y_i in y for k, v in BINS.items() if v[0] <= y_i < v[1]])\n", + " y_bin_idx = np.array([BINS_MAP[k] for k in y_bin])\n", + "\n", + " # Split into train and test\n", + " X_train, X_test, y_train, y_test, y_train_idx, y_test_idx = train_test_split(\n", + " X, y_bin, y_bin_idx, test_size=0.5, random_state=SEED\n", + " )\n", + "\n", + " # Add several (5) out-of-distribution points. 
Sliding them along the decision boundaries\n", + " # to make them look like they are out-of-frame\n", + " X_out = np.array(\n", + " [\n", + " [-1.5, 3.0],\n", + " [-1.75, 6.5],\n", + " [1.5, 7.2],\n", + " [2.5, -2.0],\n", + " [5.5, 7.0],\n", + " ]\n", + " )\n", + " # Add a near duplicate point to the last outlier, with some tiny noise added\n", + " near_duplicate = X_out[-1:] + np.random.rand(1, 2) * 1e-6\n", + " X_out = np.concatenate([X_out, near_duplicate])\n", + "\n", + " y_out = np.sum(X_out, axis=1)\n", + " y_out_bin = np.array([k for y_i in y_out for k, v in BINS.items() if v[0] <= y_i < v[1]])\n", + " y_out_bin_idx = np.array([BINS_MAP[k] for k in y_out_bin])\n", + "\n", + " # Add to train\n", + " X_train = np.concatenate([X_train, X_out])\n", + " y_train = np.concatenate([y_train, y_out])\n", + " y_train_idx = np.concatenate([y_train_idx, y_out_bin_idx])\n", + "\n", + " # Add an exact duplicate example to the training set\n", + " exact_duplicate_idx = np.random.randint(0, len(X_train))\n", + " X_duplicate = X_train[exact_duplicate_idx, None]\n", + " y_duplicate = y_train[exact_duplicate_idx, None]\n", + " y_duplicate_idx = y_train_idx[exact_duplicate_idx, None]\n", + "\n", + " # Add to train\n", + " X_train = np.concatenate([X_train, X_duplicate])\n", + " y_train = np.concatenate([y_train, y_duplicate])\n", + " y_train_idx = np.concatenate([y_train_idx, y_duplicate_idx])\n", + "\n", + " py = np.bincount(y_train_idx) / float(len(y_train_idx))\n", + " m = len(BINS)\n", + "\n", + " noise_matrix = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=0.9 * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=SEED,\n", + " )\n", + "\n", + " noisy_labels_idx = generate_noisy_labels(y_train_idx, noise_matrix)\n", + " noisy_labels = np.array([list(BINS_MAP.keys())[i] for i in noisy_labels_idx])\n", + "\n", + " return X_train, y_train_idx, noisy_labels, noisy_labels_idx, X_out, X_duplicate\n", + "```\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:10.351773Z", + "iopub.status.busy": "2024-06-25T22:59:10.351390Z", + "iopub.status.idle": "2024-06-25T22:59:10.360110Z", + "shell.execute_reply": "2024-06-25T22:59:10.359562Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "from cleanlab.benchmarking.noise_generation import (\n", + " generate_noise_matrix_from_trace,\n", + " generate_noisy_labels,\n", + ")\n", + "\n", + "SEED = 123\n", + "np.random.seed(SEED)\n", + "\n", + "BINS = {\n", + " \"low\": [-np.inf, 3.3],\n", + " \"mid\": [3.3, 6.6],\n", + " \"high\": [6.6, +np.inf],\n", + "}\n", + "\n", + "BINS_MAP = {\n", + " \"low\": 0,\n", + " \"mid\": 1,\n", + " \"high\": 2,\n", + "}\n", + "\n", + "\n", + "def create_data():\n", + "\n", + " X = np.random.rand(250, 2) * 5\n", + " y = np.sum(X, axis=1)\n", + " # Map y to bins based on the BINS dict\n", + " y_bin = np.array([k for y_i in y for k, v in BINS.items() if v[0] <= y_i < v[1]])\n", + " y_bin_idx = np.array([BINS_MAP[k] for k in y_bin])\n", + "\n", + " # Split into train and test\n", + " X_train, X_test, y_train, y_test, y_train_idx, y_test_idx = train_test_split(\n", + " X, y_bin, y_bin_idx, test_size=0.5, random_state=SEED\n", + " )\n", + "\n", + " # Add several (5) out-of-distribution points. 
Sliding them along the decision boundaries\n", + " # to make them look like they are out-of-frame\n", + " X_out = np.array(\n", + " [\n", + " [-1.5, 3.0],\n", + " [-1.75, 6.5],\n", + " [1.5, 7.2],\n", + " [2.5, -2.0],\n", + " [5.5, 7.0],\n", + " ]\n", + " )\n", + " # Add a near duplicate point to the last outlier, with some tiny noise added\n", + " near_duplicate = X_out[-1:] + np.random.rand(1, 2) * 1e-6\n", + " X_out = np.concatenate([X_out, near_duplicate])\n", + "\n", + " y_out = np.sum(X_out, axis=1)\n", + " y_out_bin = np.array([k for y_i in y_out for k, v in BINS.items() if v[0] <= y_i < v[1]])\n", + " y_out_bin_idx = np.array([BINS_MAP[k] for k in y_out_bin])\n", + "\n", + " # Add to train\n", + " X_train = np.concatenate([X_train, X_out])\n", + " y_train = np.concatenate([y_train, y_out])\n", + " y_train_idx = np.concatenate([y_train_idx, y_out_bin_idx])\n", + "\n", + " # Add an exact duplicate example to the training set\n", + " exact_duplicate_idx = np.random.randint(0, len(X_train))\n", + " X_duplicate = X_train[exact_duplicate_idx, None]\n", + " y_duplicate = y_train[exact_duplicate_idx, None]\n", + " y_duplicate_idx = y_train_idx[exact_duplicate_idx, None]\n", + "\n", + " # Add to train\n", + " X_train = np.concatenate([X_train, X_duplicate])\n", + " y_train = np.concatenate([y_train, y_duplicate])\n", + " y_train_idx = np.concatenate([y_train_idx, y_duplicate_idx])\n", + "\n", + " py = np.bincount(y_train_idx) / float(len(y_train_idx))\n", + " m = len(BINS)\n", + "\n", + " noise_matrix = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=0.9 * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=SEED,\n", + " )\n", + "\n", + " noisy_labels_idx = generate_noisy_labels(y_train_idx, noise_matrix)\n", + " noisy_labels = np.array([list(BINS_MAP.keys())[i] for i in noisy_labels_idx])\n", + "\n", + " return X_train, y_train_idx, noisy_labels, noisy_labels_idx, X_out, X_duplicate" + ] + }, + { + "cell_type": "code", + "execution_count": 4, 
+ "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:10.362137Z", + "iopub.status.busy": "2024-06-25T22:59:10.361814Z", + "iopub.status.idle": "2024-06-25T22:59:10.366508Z", + "shell.execute_reply": "2024-06-25T22:59:10.366061Z" + } + }, + "outputs": [], + "source": [ + "X_train, y_train_idx, noisy_labels, noisy_labels_idx, X_out, X_duplicate = create_data()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We make a scatter plot of the features, with each point colored by its observed label. Given labels that do not match the true label are circled in red, outliers are marked with a black cross, and duplicates are marked with a cyan cross." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
See the code to visualize the data. **(click to expand)**\n", + " \n", + "```ipython3\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "import matplotlib.pyplot as plt\n", + "\n", + "def plot_data(X_train, y_train_idx, noisy_labels_idx, X_out, X_duplicate):\n", + " # Plot data with clean labels and noisy labels, use BINS_MAP for the legend\n", + " fig, ax = plt.subplots(figsize=(8, 6.5))\n", + " \n", + " low = ax.scatter(X_train[noisy_labels_idx == 0, 0], X_train[noisy_labels_idx == 0, 1], label=\"low\")\n", + " mid = ax.scatter(X_train[noisy_labels_idx == 1, 0], X_train[noisy_labels_idx == 1, 1], label=\"mid\")\n", + " high = ax.scatter(X_train[noisy_labels_idx == 2, 0], X_train[noisy_labels_idx == 2, 1], label=\"high\")\n", + " \n", + " ax.set_title(\"Noisy labels\")\n", + " ax.set_xlabel(r\"$x_1$\", fontsize=16)\n", + " ax.set_ylabel(r\"$x_2$\", fontsize=16)\n", + "\n", + " # Plot true boundaries (x+y=3.3, x+y=6.6)\n", + " ax.set_xlim(-3.5, 9.0)\n", + " ax.set_ylim(-3.5, 9.0)\n", + " ax.plot([-0.7, 4.0], [4.0, -0.7], color=\"k\", linestyle=\"--\", alpha=0.5)\n", + " ax.plot([-0.7, 7.3], [7.3, -0.7], color=\"k\", linestyle=\"--\", alpha=0.5)\n", + "\n", + " # Draw red circles around the points that are misclassified (i.e. 
the points that are in the wrong bin)\n", + " for i, (X, y) in enumerate(zip([X_train, X_train], [y_train_idx, noisy_labels_idx])):\n", + " for j, (k, v) in enumerate(BINS_MAP.items()):\n", + " label_err = ax.scatter(\n", + " X[(y == v) & (y != y_train_idx), 0],\n", + " X[(y == v) & (y != y_train_idx), 1],\n", + " s=180,\n", + " marker=\"o\",\n", + " facecolor=\"none\",\n", + " edgecolors=\"red\",\n", + " linewidths=2.5,\n", + " alpha=0.5,\n", + " label=\"Label error\",\n", + " )\n", + "\n", + "\n", + " outlier = ax.scatter(X_out[:, 0], X_out[:, 1], color=\"k\", marker=\"x\", s=100, linewidth=2, label=\"Outlier\")\n", + "\n", + " # Plot the exact duplicate\n", + " dups = ax.scatter(\n", + " X_duplicate[:, 0],\n", + " X_duplicate[:, 1],\n", + " color=\"c\",\n", + " marker=\"x\",\n", + " s=100,\n", + " linewidth=2,\n", + " label=\"Duplicates\",\n", + " )\n", + " \n", + " first_legend = ax.legend(handles=[low, mid, high], loc=[0.75, 0.7], title=\"Given Class Label\", alignment=\"left\", title_fontproperties={\"weight\":\"semibold\"})\n", + " second_legend = ax.legend(handles=[label_err, outlier, dups], loc=[0.75, 0.45], title=\"Type of Issue\", alignment=\"left\", title_fontproperties={\"weight\":\"semibold\"})\n", + " \n", + " ax = plt.gca().add_artist(first_legend)\n", + " ax = plt.gca().add_artist(second_legend)\n", + " plt.tight_layout()\n", + "```\n", + " \n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:10.368744Z", + "iopub.status.busy": "2024-06-25T22:59:10.368339Z", + "iopub.status.idle": "2024-06-25T22:59:10.554054Z", + "shell.execute_reply": "2024-06-25T22:59:10.553420Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "def plot_data(X_train, y_train_idx, noisy_labels_idx, X_out, X_duplicate):\n", + " # Plot data with clean labels and noisy labels, use BINS_MAP for the legend\n", + " fig, ax = plt.subplots(figsize=(6, 4))\n", + " \n", + " low = ax.scatter(X_train[noisy_labels_idx == 0, 0], X_train[noisy_labels_idx == 0, 1], label=\"low\")\n", + " mid = ax.scatter(X_train[noisy_labels_idx == 1, 0], X_train[noisy_labels_idx == 1, 1], label=\"mid\")\n", + " high = ax.scatter(X_train[noisy_labels_idx == 2, 0], X_train[noisy_labels_idx == 2, 1], label=\"high\")\n", + " \n", + " ax.set_title(\"Noisy labels\")\n", + " ax.set_xlabel(r\"$x_1$\", fontsize=16)\n", + " ax.set_ylabel(r\"$x_2$\", fontsize=16)\n", + "\n", + " # Plot true boundaries (x+y=3.3, x+y=6.6)\n", + " ax.set_xlim(-2.5, 8.5)\n", + " ax.set_ylim(-3.5, 9.0)\n", + " ax.plot([-0.7, 4.0], [4.0, -0.7], color=\"k\", linestyle=\"--\", alpha=0.5)\n", + " ax.plot([-0.7, 7.3], [7.3, -0.7], color=\"k\", linestyle=\"--\", alpha=0.5)\n", + "\n", + " # Draw red circles around the points that are misclassified (i.e. 
the points that are in the wrong bin)\n", + " for i, (X, y) in enumerate(zip([X_train, X_train], [y_train_idx, noisy_labels_idx])):\n", + " for j, (k, v) in enumerate(BINS_MAP.items()):\n", + " label_err = ax.scatter(\n", + " X[(y == v) & (y != y_train_idx), 0],\n", + " X[(y == v) & (y != y_train_idx), 1],\n", + " s=180,\n", + " marker=\"o\",\n", + " facecolor=\"none\",\n", + " edgecolors=\"red\",\n", + " linewidths=2.5,\n", + " alpha=0.5,\n", + " label=\"Label error\",\n", + " )\n", + "\n", + "\n", + " outlier = ax.scatter(X_out[:, 0], X_out[:, 1], color=\"k\", marker=\"x\", s=100, linewidth=2, label=\"Outlier\")\n", + "\n", + " # Plot the exact duplicate\n", + " dups = ax.scatter(\n", + " X_duplicate[:, 0],\n", + " X_duplicate[:, 1],\n", + " color=\"c\",\n", + " marker=\"x\",\n", + " s=100,\n", + " linewidth=2,\n", + " label=\"Duplicates\",\n", + " )\n", + " \n", + " title_fontproperties = {\"weight\":\"semibold\", \"size\": 8}\n", + " first_legend = ax.legend(handles=[low, mid, high], loc=[0.76, 0.7], title=\"Given Class Label\", alignment=\"left\", title_fontproperties=title_fontproperties, fontsize=8, markerscale=0.5)\n", + " second_legend = ax.legend(handles=[label_err, outlier, dups], loc=[0.76, 0.46], title=\"Type of Issue\", alignment=\"left\", title_fontproperties=title_fontproperties, fontsize=8, markerscale=0.5)\n", + " \n", + " ax = plt.gca().add_artist(first_legend)\n", + " ax = plt.gca().add_artist(second_legend)\n", + " plt.tight_layout()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:10.556868Z", + "iopub.status.busy": "2024-06-25T22:59:10.556364Z", + "iopub.status.idle": "2024-06-25T22:59:10.881814Z", + "shell.execute_reply": "2024-06-25T22:59:10.881239Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_data(X_train, y_train_idx, noisy_labels_idx, X_out, X_duplicate)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In real-world scenarios, you won't know the true labels or the distribution of the features, so we won't use these in this tutorial, except for evaluation purposes." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Get out-of-sample predicted probabilities from a classifier" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To detect certain types of issues in classification data (e.g. label errors), `Datalab` relies on predicted class probabilities from a trained model. Ideally, the prediction for each example should be out-of-sample (to avoid overfitting), coming from a copy of the model that was not trained on this example. \n", + "\n", + "This tutorial uses a simple logistic regression model \n", + "and the `cross_val_predict()` function from scikit-learn to generate out-of-sample predicted class probabilities for every example in the training set. You can replace this with *any* other classifier model and train it with cross-validation to get out-of-sample predictions.\n", + "Make sure that the columns of your `pred_probs` follow the same ordering of classes that Datalab uses, which is the class names sorted lexicographically." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:10.883982Z", + "iopub.status.busy": "2024-06-25T22:59:10.883793Z", + "iopub.status.idle": "2024-06-25T22:59:10.907249Z", + "shell.execute_reply": "2024-06-25T22:59:10.906811Z" + } + }, + "outputs": [], + "source": [ + "model = LogisticRegression()\n", + "pred_probs = cross_val_predict(\n", + " estimator=model, X=X_train, y=noisy_labels, cv=5, method=\"predict_proba\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Instantiate Datalab object" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we instantiate the Datalab object that will be used in the remainder of the tutorial by passing in the data created above.\n", + "\n", + "`Datalab` has several ways of loading the data. In this case, we'll simply wrap the training features and noisy labels in a dictionary so that we can pass it to `Datalab`.\n", + "\n", + "Other supported data formats for `Datalab` include: [HuggingFace Datasets](https://huggingface.co/docs/datasets/index) and [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). `Datalab` works across most data modalities (image, text, tabular, audio, etc.). It is intended to find issues that commonly occur in datasets for which you have trained a supervised ML model, regardless of the type of data.\n", + "\n", + "Currently, pandas DataFrames that contain categorical columns may cause problems when instantiating the `Datalab` object, so we recommend ensuring that your DataFrame does not contain any categorical columns, or using another data format (e.g. a Python dictionary or HuggingFace Datasets) to pass in your data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:10.909656Z", + "iopub.status.busy": "2024-06-25T22:59:10.909243Z", + "iopub.status.idle": "2024-06-25T22:59:10.920635Z", + "shell.execute_reply": "2024-06-25T22:59:10.920101Z" + } + }, + "outputs": [], + "source": [ + "data = {\"X\": X_train, \"y\": noisy_labels}\n", + "\n", + "lab = Datalab(data, label_name=\"y\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## **Functionality 1**: Incremental issue search " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can call `find_issues` multiple times on a `Datalab` object to detect issues one type at a time.\n", + "\n", + "This is done via the `issue_types` argument, which accepts a dictionary mapping each issue type to a dictionary of any nondefault keyword arguments to use when detecting that type of issue. In this first call, we only want to detect label issues, which are detected solely from `pred_probs`, so there is no need to pass in `features` here." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:10.922949Z", + "iopub.status.busy": "2024-06-25T22:59:10.922753Z", + "iopub.status.idle": "2024-06-25T22:59:13.016689Z", + "shell.execute_reply": "2024-06-25T22:59:13.016047Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding label issues ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Audit complete.
11 issues found in the dataset.\n", + "Dataset Information: num_examples: 132, num_classes: 3\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + "issue_type num_issues\n", + " label 11\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 11\n", + "Overall dataset quality in terms of this issue: 0.9318\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "77 True 0.006940 high mid\n", + "7 True 0.007830 low mid\n", + "40 True 0.014828 mid low\n", + "107 True 0.021241 high mid\n", + "120 True 0.026407 high mid\n" + ] + } + ], + "source": [ + "lab.find_issues(pred_probs=pred_probs, issue_types={\"label\": {}}) \n", + "lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can check for additional types of issues with the same `Datalab`. Here, we would like to detect outliers and near duplicates, both of which rely on the features of the data.\n", + "\n", + "Notice that this second call to `find_issues()` updates the output of `report()`: the previously detected label issues are shown alongside the new issues." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:13.019424Z", + "iopub.status.busy": "2024-06-25T22:59:13.018849Z", + "iopub.status.idle": "2024-06-25T22:59:13.041040Z", + "shell.execute_reply": "2024-06-25T22:59:13.040364Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding outlier issues ...\n", + "Finding near_duplicate issues ...\n", + "\n", + "Audit complete. 21 issues found in the dataset.\n", + "Dataset Information: num_examples: 132, num_classes: 3\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + " issue_type num_issues\n", + " label 11\n", + " outlier 6\n", + "near_duplicate 4\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 11\n", + "Overall dataset quality in terms of this issue: 0.9318\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "77 True 0.006940 high mid\n", + "7 True 0.007830 low mid\n", + "40 True 0.014828 mid low\n", + "107 True 0.021241 high mid\n", + "120 True 0.026407 high mid\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. 
potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 6\n", + "Overall dataset quality in terms of this issue: 0.3558\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "126 True 0.006636\n", + "130 True 0.012571\n", + "129 True 0.012571\n", + "127 True 0.014909\n", + "128 True 0.017443\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 4\n", + "Overall dataset quality in terms of this issue: 0.6160\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "131 True 0.000000 [123] 0.000000e+00\n", + "123 True 0.000000 [131] 0.000000e+00\n", + "129 True 0.000002 [130] 4.463180e-07\n", + "130 True 0.000002 [129] 4.463180e-07\n", + "51 False 0.161148 [] 3.859087e-02\n" + ] + } + ], + "source": [ + "lab.find_issues(features=data[\"X\"], issue_types={\"outlier\": {}, \"near_duplicate\": {}})\n", + "lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## **Functionality 2**: Specifying nondefault arguments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also overwrite previously executed checks for a type of issue.
Here we re-run outlier detection, but specify a different, non-default setting to use (in this case, the number of neighbors `k` that each datapoint is compared against to determine which datapoints are outliers). \n", + "The results from this new detection replace the original outlier detection results in the updated `Datalab`. You could similarly specify non-default settings for other issue types in the first call to `Datalab.find_issues()`." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:13.043443Z", + "iopub.status.busy": "2024-06-25T22:59:13.043109Z", + "iopub.status.idle": "2024-06-25T22:59:13.062851Z", + "shell.execute_reply": "2024-06-25T22:59:13.062187Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding outlier issues ...\n", + "\n", + "Audit complete. 22 issues found in the dataset.\n", + "Dataset Information: num_examples: 132, num_classes: 3\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + " issue_type num_issues\n", + " label 11\n", + " outlier 7\n", + "near_duplicate 4\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g.
due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 11\n", + "Overall dataset quality in terms of this issue: 0.9318\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "77 True 0.006940 high mid\n", + "7 True 0.007830 low mid\n", + "40 True 0.014828 mid low\n", + "107 True 0.021241 high mid\n", + "120 True 0.026407 high mid\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 7\n", + "Overall dataset quality in terms of this issue: 0.3453\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "126 True 0.029542\n", + "130 True 0.031182\n", + "129 True 0.031182\n", + "128 True 0.057961\n", + "127 True 0.058244\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. 
feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 4\n", + "Overall dataset quality in terms of this issue: 0.6160\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "131 True 0.000000 [123] 0.000000e+00\n", + "123 True 0.000000 [131] 0.000000e+00\n", + "129 True 0.000002 [130] 4.463180e-07\n", + "130 True 0.000002 [129] 4.463180e-07\n", + "51 False 0.161148 [] 3.859087e-02\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/runner/work/cleanlab/cleanlab/cleanlab/datalab/internal/data_issues.py:348: UserWarning: Overwriting columns ['is_outlier_issue', 'outlier_score'] in self.issues with columns from issue manager OutlierIssueManager.\n", + " warnings.warn(\n", + "/home/runner/work/cleanlab/cleanlab/cleanlab/datalab/internal/data_issues.py:378: UserWarning: Overwriting row in self.issue_summary with row from issue manager OutlierIssueManager.\n", + " warnings.warn(\n", + "/home/runner/work/cleanlab/cleanlab/cleanlab/datalab/internal/data_issues.py:357: UserWarning: Overwriting key outlier in self.info\n", + " warnings.warn(f\"Overwriting key {issue_name} in self.info\")\n" + ] + } + ], + "source": [ + "lab.find_issues(features=data[\"X\"], issue_types={\"outlier\": {\"k\": 30}})\n", + "lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also increase the verbosity of the `report` to see additional information about the data issues and control how many top-ranked examples are shown for each issue." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:13.065062Z", + "iopub.status.busy": "2024-06-25T22:59:13.064860Z", + "iopub.status.idle": "2024-06-25T22:59:13.080822Z", + "shell.execute_reply": "2024-06-25T22:59:13.080209Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset Information: num_examples: 132, num_classes: 3\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + " issue_type num_issues\n", + " label 11\n", + " outlier 7\n", + "near_duplicate 4\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 11\n", + "Overall dataset quality in terms of this issue: 0.9318\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "77 True 0.006940 high mid\n", + "7 True 0.007830 low mid\n", + "40 True 0.014828 mid low\n", + "107 True 0.021241 high mid\n", + "120 True 0.026407 high mid\n", + "54 True 0.039122 mid low\n", + "53 True 0.044598 high mid\n", + "105 True 0.105196 mid high\n", + "4 True 0.133654 high mid\n", + "43 True 0.168033 high mid\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. 
potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 7\n", + "Overall dataset quality in terms of this issue: 0.3453\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "126 True 0.029542\n", + "130 True 0.031182\n", + "129 True 0.031182\n", + "128 True 0.057961\n", + "127 True 0.058244\n", + "125 True 0.101107\n", + "37 True 0.183382\n", + "109 False 0.209259\n", + "35 False 0.211042\n", + "5 False 0.221316\n", + "\n", + "Additional Information: \n", + "average_ood_score: 0.34530442089193386\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. 
feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 4\n", + "Overall dataset quality in terms of this issue: 0.6160\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "131 True 0.000000 [123] 0.000000e+00\n", + "123 True 0.000000 [131] 0.000000e+00\n", + "129 True 0.000002 [130] 4.463180e-07\n", + "130 True 0.000002 [129] 4.463180e-07\n", + "51 False 0.161148 [] 3.859087e-02\n", + "52 False 0.161148 [] 3.859087e-02\n", + "5 False 0.169820 [] 4.087324e-02\n", + "89 False 0.169820 [] 4.087324e-02\n", + "92 False 0.259024 [] 6.583757e-02\n", + "91 False 0.346458 [] 9.341292e-02\n", + "\n", + "Additional Information: \n", + "threshold: 0.13\n" + ] + } + ], + "source": [ + "lab.report(num_examples=10, verbosity=2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice how the number of flagged outlier issues has changed (from 6 to 7) after specifying different settings for outlier detection." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## **Functionality 3**: Save and load Datalab objects" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A `Datalab` can be saved to a folder at a specified path. In a future Python process, this path can be used to load the `Datalab` from file back into memory. Your dataset is not saved as part of this process, so you'll need to save/load it separately to keep working with it."
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:13.083353Z", + "iopub.status.busy": "2024-06-25T22:59:13.082873Z", + "iopub.status.idle": "2024-06-25T22:59:13.103386Z", + "shell.execute_reply": "2024-06-25T22:59:13.102798Z" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "643c32c58a4e48a3a2025d1b0dca76c2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Saving the dataset (0/1 shards): 0%| | 0/132 [00:00)`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 11\n", + "Overall dataset quality in terms of this issue: 0.9318\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "77 True 0.006940 high mid\n", + "7 True 0.007830 low mid\n", + "40 True 0.014828 mid low\n", + "107 True 0.021241 high mid\n", + "120 True 0.026407 high mid\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. 
potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 7\n", + "Overall dataset quality in terms of this issue: 0.3453\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "126 True 0.029542\n", + "130 True 0.031182\n", + "129 True 0.031182\n", + "128 True 0.057961\n", + "127 True 0.058244\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 4\n", + "Overall dataset quality in terms of this issue: 0.6160\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "131 True 0.000000 [123] 0.000000e+00\n", + "123 True 0.000000 [131] 0.000000e+00\n", + "129 True 0.000002 [130] 4.463180e-07\n", + "130 True 0.000002 [129] 4.463180e-07\n", + "51 False 0.161148 [] 3.859087e-02\n" + ] + } + ], + "source": [ + "new_lab = Datalab.load(path)\n", + "new_lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## **Functionality 4**: Adding a custom IssueManager" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`Datalab` detects pre-defined types of issues for you in one line of code: `find_issues()`. 
What if you want to check for other custom types of issues along with these pre-defined types, all within the same line of code?\n", + "\n", + "All issue types in `Datalab` are subclasses of cleanlab's `IssueManager` class.\n", + "To register a custom issue type for use with `Datalab`, make it a subclass of `IssueManager` and decorate it with `register` (imported from `cleanlab.datalab.internal.issue_manager_factory`).\n", + "\n", + "The necessary members to implement in the subclass are:\n", + "\n", + "- A class variable called `issue_name` that acts as a unique identifier for the type of issue.\n", + "- An instance method called `find_issues` that:\n", + " - Computes a quality score for each example in the dataset (between 0 and 1), in terms of how *unlikely* it is to be an issue.\n", + " - Flags each example as an issue or not (which may be based on thresholding the quality scores).\n", + " - Combines these in a dataframe that is assigned to the `issues` attribute of the `IssueManager`.\n", + " - Defines a summary score for the overall quality of the entire dataset in terms of this type of issue, and sets this score as part of the `summary` attribute of the `IssueManager`.\n", + " \n", + "To demonstrate this, we create an arbitrary issue type that checks the divisibility of an example's index in the dataset by 13."
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:13.123279Z", + "iopub.status.busy": "2024-06-25T22:59:13.122985Z", + "iopub.status.idle": "2024-06-25T22:59:13.129079Z", + "shell.execute_reply": "2024-06-25T22:59:13.128617Z" + } + }, + "outputs": [], + "source": [ + "from cleanlab.datalab.internal.issue_manager import IssueManager\n", + "from cleanlab.datalab.internal.issue_manager_factory import register\n", + "\n", + "\n", + "def scoring_function(idx: int, div: int = 13) -> float:\n", + " if idx == 0:\n", + " # Zero excluded from the divisibility check, gets the highest score\n", + " return 1\n", + " rem = idx % div\n", + " inv_scale = idx // div\n", + " if rem == 0:\n", + " return 0.5 * (1 - np.exp(-0.1*(inv_scale-1)))\n", + " else:\n", + " return 1 - 0.49 * (1 - np.exp(-inv_scale**0.5))*rem/div\n", + "\n", + "\n", + "@register # register this issue type for use with Datalab\n", + "class SuperstitionIssueManager(IssueManager):\n", + " \"\"\"A custom issue manager that keeps track of issue indices that\n", + " are divisible by 13.\n", + " \"\"\"\n", + " description: str = \"Examples with indices that are divisible by 13 may be unlucky.\" # Optional\n", + " issue_name: str = \"superstition\"\n", + "\n", + " def find_issues(self, div=13, **_) -> None:\n", + " ids = self.datalab.issues.index.to_series()\n", + " issues_mask = ids.apply(lambda idx: idx % div == 0 and idx != 0)\n", + " scores = ids.apply(lambda idx: scoring_function(idx, div))\n", + " self.issues = pd.DataFrame(\n", + " {\n", + " f\"is_{self.issue_name}_issue\": issues_mask,\n", + " self.issue_score_key: scores,\n", + " },\n", + " )\n", + " summary_score = 1 - sum(issues_mask) / len(issues_mask)\n", + " self.summary = self.make_summary(score = summary_score)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once registered, this `IssueManager` will perform custom issue checks when 
`find_issues` is called on a `Datalab` instance.\n", + "\n", + "As our `Datalab` instance here already has results from the outlier and near duplicate checks, we perform the custom issue check separately." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:13.131277Z", + "iopub.status.busy": "2024-06-25T22:59:13.130917Z", + "iopub.status.idle": "2024-06-25T22:59:13.150009Z", + "shell.execute_reply": "2024-06-25T22:59:13.149473Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding superstition issues ...\n", + "\n", + "Audit complete. 32 issues found in the dataset.\n", + "Dataset Information: num_examples: 132, num_classes: 3\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + " issue_type num_issues\n", + " label 11\n", + " superstition 10\n", + " outlier 7\n", + "near_duplicate 4\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. 
due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 11\n", + "Overall dataset quality in terms of this issue: 0.9318\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "77 True 0.006940 high mid\n", + "7 True 0.007830 low mid\n", + "40 True 0.014828 mid low\n", + "107 True 0.021241 high mid\n", + "120 True 0.026407 high mid\n", + "\n", + "\n", + "------------------- superstition issues --------------------\n", + "\n", + "About this issue:\n", + "\tExamples with indices that are divisible by 13 may be unlucky.\n", + "\n", + "Number of examples with this issue: 10\n", + "Overall dataset quality in terms of this issue: 0.9242\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_superstition_issue superstition_score\n", + "13 True 0.000000\n", + "26 True 0.047581\n", + "39 True 0.090635\n", + "52 True 0.129591\n", + "65 True 0.164840\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 7\n", + "Overall dataset quality in terms of this issue: 0.3453\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "126 True 0.029542\n", + "130 True 0.031182\n", + "129 True 0.031182\n", + "128 True 0.057961\n", + "127 True 0.058244\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. 
The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 4\n", + "Overall dataset quality in terms of this issue: 0.6160\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "131 True 0.000000 [123] 0.000000e+00\n", + "123 True 0.000000 [131] 0.000000e+00\n", + "129 True 0.000002 [130] 4.463180e-07\n", + "130 True 0.000002 [129] 4.463180e-07\n", + "51 False 0.161148 [] 3.859087e-02\n" + ] + } + ], + "source": [ + "lab.find_issues(issue_types={\"superstition\": {}})\n", + "lab.report()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + }, + "vscode": { + "interpreter": { + "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe" + } + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "1f72c80fe22c46a488d3c9212d66f963": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": 
null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "550862a10e5a4375899db3c2c60d9e0f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_5806cc635c8b401eac9494d608e0a4dc", + "max": 132.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_65f8134f62ed4b9f8f0cfe3571737a56", + "tabbable": null, + "tooltip": null, + "value": 132.0 + } + }, + "57c5ce9bcbf94dbd88f51a8af165ce2f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + 
"border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5806cc635c8b401eac9494d608e0a4dc": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + 
}, + "623729b17fa74332a232bc1d78883eb5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "643c32c58a4e48a3a2025d1b0dca76c2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_76e360db34ac445a8fb794ca51e9e771", + "IPY_MODEL_550862a10e5a4375899db3c2c60d9e0f", + "IPY_MODEL_f565c514bf67479795a7e07548f290ea" + ], + "layout": "IPY_MODEL_1f72c80fe22c46a488d3c9212d66f963", + "tabbable": null, + "tooltip": null + } + }, + "65f8134f62ed4b9f8f0cfe3571737a56": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "76e360db34ac445a8fb794ca51e9e771": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + 
"_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_dd9d4219702c4c07b5e7c7d41596dff9", + "placeholder": "​", + "style": "IPY_MODEL_623729b17fa74332a232bc1d78883eb5", + "tabbable": null, + "tooltip": null, + "value": "Saving the dataset (1/1 shards): 100%" + } + }, + "9913c381bcb54655870e2b7af0ee0a6b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "dd9d4219702c4c07b5e7c7d41596dff9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, 
+ "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f565c514bf67479795a7e07548f290ea": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_57c5ce9bcbf94dbd88f51a8af165ce2f", + "placeholder": "​", + "style": "IPY_MODEL_9913c381bcb54655870e2b7af0ee0a6b", + "tabbable": null, + "tooltip": null, + "value": " 132/132 [00:00<00:00, 12360.98 examples/s]" + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/datalab_quickstart.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/datalab_quickstart.ipynb new file mode 100644 index 000000000..b733e520f --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/datalab_quickstart.ipynb @@ -0,0 +1,1655 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Datalab: A unified audit to detect all kinds of issues in data and labels" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cleanlab offers a `Datalab` object that can identify various issues in your machine learning datasets, such as noisy labels, outliers, (near) duplicates, drift, and other types of problems common in real-world data. These data issues may negatively impact models if not addressed. 
`Datalab` can use *any* ML model you have already trained for your data to diagnose these issues. It only requires access to either (probabilistic) predictions from your model or its learned representations of the data.\n", + "\n", + "\n", + "**Overview of what we'll do in this tutorial:**\n", + "\n", + "- Compute out-of-sample predicted probabilities for a sample dataset using cross-validation.\n", + "- Use `Datalab` to identify issues such as noisy labels, outliers, (near) duplicates, and other types of problems.\n", + "- View the issue summaries and other information about our sample dataset.\n", + "\n", + "You can easily replace our demo dataset with your own image, text, tabular, audio, or other dataset, and then run the same code to discover what sort of issues lurk within it!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have (out-of-sample) `pred_probs` from a model trained on an existing set of labels? Maybe you also have some numeric `features` (or model embeddings of data)? Run the code below to examine your dataset for multiple types of issues.\n", + "\n", + "
\n", + " \n", + "```ipython3 \n", + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(data=your_dataset, label_name=\"column_name_of_labels\")\n", + "lab.find_issues(features=your_feature_matrix, pred_probs=your_pred_probs)\n", + "\n", + "lab.report()\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Install and import required dependencies" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`Datalab` has additional dependencies that are not included in the standard installation of cleanlab.\n", + "\n", + "You can use pip to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install matplotlib\n", + "!pip install \"cleanlab[datalab]\"\n", + "\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:16.072028Z", + "iopub.status.busy": "2024-06-25T22:59:16.071628Z", + "iopub.status.idle": "2024-06-25T22:59:17.253438Z", + "shell.execute_reply": "2024-06-25T22:59:17.252912Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "dependencies = [\"cleanlab\", \"matplotlib\", \"datasets\"] # TODO: make sure this list is updated\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required 
dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:17.255988Z", + "iopub.status.busy": "2024-06-25T22:59:17.255721Z", + "iopub.status.idle": "2024-06-25T22:59:17.259172Z", + "shell.execute_reply": "2024-06-25T22:59:17.258757Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.model_selection import cross_val_predict\n", + "\n", + "from cleanlab import Datalab" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Create and load the data (can skip these details)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll load a toy classification dataset for this tutorial. The dataset has two numerical features and a label column whose main classes are *low*, *mid*, and *high*. The data-generation code also assigns a few examples to a rare fourth class, *max*." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
See the code for data generation. **(click to expand)**\n", + " \n", + "```ipython3\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "from cleanlab.benchmarking.noise_generation import (\n", + " generate_noise_matrix_from_trace,\n", + " generate_noisy_labels,\n", + ")\n", + "\n", + "SEED = 123\n", + "np.random.seed(SEED)\n", + "\n", + "BINS = {\n", + " \"low\": [-np.inf, 3.3],\n", + " \"mid\": [3.3, 6.6],\n", + " \"high\": [6.6, +np.inf],\n", + "}\n", + "\n", + "BINS_MAP = {\n", + " \"low\": 0,\n", + " \"mid\": 1,\n", + " \"high\": 2,\n", + "}\n", + "\n", + "\n", + "def create_data():\n", + "\n", + " X = np.random.rand(250, 2) * 5\n", + " y = np.sum(X, axis=1)\n", + " # Map y to bins based on the BINS dict\n", + " y_bin = np.array([k for y_i in y for k, v in BINS.items() if v[0] <= y_i < v[1]])\n", + " y_bin_idx = np.array([BINS_MAP[k] for k in y_bin])\n", + "\n", + " # Split into train and test\n", + " X_train, X_test, y_train, y_test, y_train_idx, y_test_idx = train_test_split(\n", + " X, y_bin, y_bin_idx, test_size=0.5, random_state=SEED\n", + " )\n", + "\n", + " # Add several (5) out-of-distribution points. 
Sliding them along the decision boundaries\n", + " # to make them look like they are out-of-frame\n", + " X_out = np.array(\n", + " [\n", + " [-1.5, 3.0],\n", + " [-1.75, 6.5],\n", + " [1.5, 7.2],\n", + " [2.5, -2.0],\n", + " [5.5, 7.0],\n", + " ]\n", + " )\n", + " # Add a near duplicate point to the last outlier, with some tiny noise added\n", + " near_duplicate = X_out[-1:] + np.random.rand(1, 2) * 1e-6\n", + " X_out = np.concatenate([X_out, near_duplicate])\n", + "\n", + " y_out = np.sum(X_out, axis=1)\n", + " y_out_bin = np.array([k for y_i in y_out for k, v in BINS.items() if v[0] <= y_i < v[1]])\n", + " y_out_bin_idx = np.array([BINS_MAP[k] for k in y_out_bin])\n", + "\n", + " # Add to train\n", + " X_train = np.concatenate([X_train, X_out])\n", + " y_train = np.concatenate([y_train, y_out])\n", + " y_train_idx = np.concatenate([y_train_idx, y_out_bin_idx])\n", + "\n", + " # Add an exact duplicate example to the training set\n", + " exact_duplicate_idx = np.random.randint(0, len(X_train))\n", + " X_duplicate = X_train[exact_duplicate_idx, None]\n", + " y_duplicate = y_train[exact_duplicate_idx, None]\n", + " y_duplicate_idx = y_train_idx[exact_duplicate_idx, None]\n", + "\n", + " # Add to train\n", + " X_train = np.concatenate([X_train, X_duplicate])\n", + " y_train = np.concatenate([y_train, y_duplicate])\n", + " y_train_idx = np.concatenate([y_train_idx, y_duplicate_idx])\n", + "\n", + " py = np.bincount(y_train_idx) / float(len(y_train_idx))\n", + " m = len(BINS)\n", + "\n", + " noise_matrix = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=0.9 * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=SEED,\n", + " )\n", + "\n", + " noisy_labels_idx = generate_noisy_labels(y_train_idx, noise_matrix)\n", + " noisy_labels = np.array([list(BINS_MAP.keys())[i] for i in noisy_labels_idx])\n", + "\n", + " return X_train, y_train_idx, noisy_labels, noisy_labels_idx, X_out, X_duplicate\n", + "```\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:17.261320Z", + "iopub.status.busy": "2024-06-25T22:59:17.260998Z", + "iopub.status.idle": "2024-06-25T22:59:17.269851Z", + "shell.execute_reply": "2024-06-25T22:59:17.269405Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "from cleanlab.benchmarking.noise_generation import (\n", + " generate_noise_matrix_from_trace,\n", + " generate_noisy_labels,\n", + ")\n", + "\n", + "SEED = 123\n", + "np.random.seed(SEED)\n", + "\n", + "BINS = {\n", + " \"low\": [-np.inf, 3.3],\n", + " \"mid\": [3.3, 6.6],\n", + " \"high\": [6.6, +np.inf],\n", + "}\n", + "\n", + "BINS_MAP = {\n", + " \"low\": 0,\n", + " \"mid\": 1,\n", + " \"high\": 2,\n", + "}\n", + "\n", + "\n", + "def create_data():\n", + "\n", + " X = np.random.rand(250, 2) * 5\n", + " y = np.sum(X, axis=1)\n", + " # Map y to bins based on the BINS dict\n", + " y_bin = np.array([k for y_i in y for k, v in BINS.items() if v[0] <= y_i < v[1]])\n", + " y_bin_idx = np.array([BINS_MAP[k] for k in y_bin])\n", + "\n", + " # Split into train and test\n", + " X_train, X_test, y_train, y_test, y_train_idx, y_test_idx = train_test_split(\n", + " X, y_bin, y_bin_idx, test_size=0.5, random_state=SEED\n", + " )\n", + "\n", + " # Add several (5) out-of-distribution points. 
Sliding them along the decision boundaries\n", + " # to make them look like they are out-of-frame\n", + " X_out = np.array(\n", + " [\n", + " [-1.5, 3.0],\n", + " [-1.75, 6.5],\n", + " [1.5, 7.2],\n", + " [2.5, -2.0],\n", + " [5.5, 7.0],\n", + " ]\n", + " )\n", + " # Add a near duplicate point to the last outlier, with some tiny noise added\n", + " near_duplicate = X_out[-1:] + np.random.rand(1, 2) * 1e-6\n", + " X_out = np.concatenate([X_out, near_duplicate])\n", + "\n", + " y_out = np.sum(X_out, axis=1)\n", + " y_out_bin = np.array([k for y_i in y_out for k, v in BINS.items() if v[0] <= y_i < v[1]])\n", + " y_out_bin_idx = np.array([BINS_MAP[k] for k in y_out_bin])\n", + "\n", + " # Add to train\n", + " X_train = np.concatenate([X_train, X_out])\n", + " y_train = np.concatenate([y_train, y_out])\n", + " y_train_idx = np.concatenate([y_train_idx, y_out_bin_idx])\n", + "\n", + " # Add an exact duplicate example to the training set\n", + " exact_duplicate_idx = np.random.randint(0, len(X_train))\n", + " X_duplicate = X_train[exact_duplicate_idx, None]\n", + " y_duplicate = y_train[exact_duplicate_idx, None]\n", + " y_duplicate_idx = y_train_idx[exact_duplicate_idx, None]\n", + "\n", + " # Add to train\n", + " X_train = np.concatenate([X_train, X_duplicate])\n", + " y_train = np.concatenate([y_train, y_duplicate])\n", + " y_train_idx = np.concatenate([y_train_idx, y_duplicate_idx])\n", + "\n", + " py = np.bincount(y_train_idx) / float(len(y_train_idx))\n", + " m = len(BINS)\n", + "\n", + " noise_matrix = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=0.9 * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=SEED,\n", + " )\n", + "\n", + " noisy_labels_idx = generate_noisy_labels(y_train_idx, noise_matrix)\n", + " noisy_labels = np.array([list(BINS_MAP.keys())[i] for i in noisy_labels_idx])\n", + " # Assign few datapoints to rare class\n", + " random_idx = np.random.randint(0, X_train.shape[0], 3)\n", + " noisy_labels[random_idx] = \"max\"\n", 
+ " noisy_labels_idx[random_idx] = np.max(y_bin_idx) + 1\n", + " \n", + "\n", + " return X_train, y_train_idx, noisy_labels, noisy_labels_idx, X_out, X_duplicate" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:17.271937Z", + "iopub.status.busy": "2024-06-25T22:59:17.271495Z", + "iopub.status.idle": "2024-06-25T22:59:17.276247Z", + "shell.execute_reply": "2024-06-25T22:59:17.275702Z" + } + }, + "outputs": [], + "source": [ + "X_train, y_train_idx, noisy_labels, noisy_labels_idx, X_out, X_duplicate = create_data()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We make a scatter plot of the features, with each point colored by its observed (given) label. Points whose given label does not match the true label are circled in red, outliers are marked with a black cross, and duplicates are marked with a cyan cross." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
See the code to visualize the data. **(click to expand)**\n", + " \n", + "```ipython3\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "import matplotlib.pyplot as plt\n", + "\n", + "def plot_data(X_train, y_train_idx, noisy_labels_idx, X_out, X_duplicate):\n", + " # Plot data with clean labels and noisy labels, use BINS_MAP for the legend\n", + " fig, ax = plt.subplots(figsize=(8, 6.5))\n", + " \n", + " low = ax.scatter(X_train[noisy_labels_idx == 0, 0], X_train[noisy_labels_idx == 0, 1], label=\"low\")\n", + " mid = ax.scatter(X_train[noisy_labels_idx == 1, 0], X_train[noisy_labels_idx == 1, 1], label=\"mid\")\n", + " high = ax.scatter(X_train[noisy_labels_idx == 2, 0], X_train[noisy_labels_idx == 2, 1], label=\"high\")\n", + " \n", + " ax.set_title(\"Noisy labels\")\n", + " ax.set_xlabel(r\"$x_1$\", fontsize=16)\n", + " ax.set_ylabel(r\"$x_2$\", fontsize=16)\n", + "\n", + " # Plot true boundaries (x+y=3.3, x+y=6.6)\n", + " ax.set_xlim(-3.5, 9.0)\n", + " ax.set_ylim(-3.5, 9.0)\n", + " ax.plot([-0.7, 4.0], [4.0, -0.7], color=\"k\", linestyle=\"--\", alpha=0.5)\n", + " ax.plot([-0.7, 7.3], [7.3, -0.7], color=\"k\", linestyle=\"--\", alpha=0.5)\n", + "\n", + " # Draw red circles around the points that are misclassified (i.e. 
the points that are in the wrong bin)\n", + " for i, (X, y) in enumerate(zip([X_train, X_train], [y_train_idx, noisy_labels_idx])):\n", + " for j, (k, v) in enumerate(BINS_MAP.items()):\n", + " label_err = ax.scatter(\n", + " X[(y == v) & (y != y_train_idx), 0],\n", + " X[(y == v) & (y != y_train_idx), 1],\n", + " s=180,\n", + " marker=\"o\",\n", + " facecolor=\"none\",\n", + " edgecolors=\"red\",\n", + " linewidths=2.5,\n", + " alpha=0.5,\n", + " label=\"Label error\",\n", + " )\n", + "\n", + "\n", + " outlier = ax.scatter(X_out[:, 0], X_out[:, 1], color=\"k\", marker=\"x\", s=100, linewidth=2, label=\"Outlier\")\n", + "\n", + " # Plot the exact duplicate\n", + " dups = ax.scatter(\n", + " X_duplicate[:, 0],\n", + " X_duplicate[:, 1],\n", + " color=\"c\",\n", + " marker=\"x\",\n", + " s=100,\n", + " linewidth=2,\n", + " label=\"Duplicates\",\n", + " )\n", + " \n", + " first_legend = ax.legend(handles=[low, mid, high], loc=[0.75, 0.7], title=\"Given Class Label\", alignment=\"left\", title_fontproperties={\"weight\":\"semibold\"})\n", + " second_legend = ax.legend(handles=[label_err, outlier, dups], loc=[0.75, 0.45], title=\"Type of Issue\", alignment=\"left\", title_fontproperties={\"weight\":\"semibold\"})\n", + " \n", + " ax = plt.gca().add_artist(first_legend)\n", + " ax = plt.gca().add_artist(second_legend)\n", + " plt.tight_layout()\n", + "```\n", + " \n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:17.278685Z", + "iopub.status.busy": "2024-06-25T22:59:17.278223Z", + "iopub.status.idle": "2024-06-25T22:59:17.466377Z", + "shell.execute_reply": "2024-06-25T22:59:17.465700Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "def plot_data(X_train, y_train_idx, noisy_labels_idx, X_out, X_duplicate):\n", + " # Plot data with clean labels and noisy labels, use BINS_MAP for the legend\n", + " fig, ax = plt.subplots(figsize=(6, 4))\n", + " \n", + " low = ax.scatter(X_train[noisy_labels_idx == 0, 0], X_train[noisy_labels_idx == 0, 1], label=\"low\")\n", + " mid = ax.scatter(X_train[noisy_labels_idx == 1, 0], X_train[noisy_labels_idx == 1, 1], label=\"mid\")\n", + " high = ax.scatter(X_train[noisy_labels_idx == 2, 0], X_train[noisy_labels_idx == 2, 1], label=\"high\")\n", + " \n", + " ax.set_title(\"Noisy labels\")\n", + " ax.set_xlabel(r\"$x_1$\", fontsize=16)\n", + " ax.set_ylabel(r\"$x_2$\", fontsize=16)\n", + "\n", + " # Plot true boundaries (x+y=3.3, x+y=6.6)\n", + " ax.set_xlim(-2.5, 8.5)\n", + " ax.set_ylim(-3.5, 9.0)\n", + " ax.plot([-0.7, 4.0], [4.0, -0.7], color=\"k\", linestyle=\"--\", alpha=0.5)\n", + " ax.plot([-0.7, 7.3], [7.3, -0.7], color=\"k\", linestyle=\"--\", alpha=0.5)\n", + "\n", + " # Draw red circles around the points that are misclassified (i.e. 
the points that are in the wrong bin)\n", + " for i, (X, y) in enumerate(zip([X_train, X_train], [y_train_idx, noisy_labels_idx])):\n", + " for j, (k, v) in enumerate(BINS_MAP.items()):\n", + " label_err = ax.scatter(\n", + " X[(y == v) & (y != y_train_idx), 0],\n", + " X[(y == v) & (y != y_train_idx), 1],\n", + " s=180,\n", + " marker=\"o\",\n", + " facecolor=\"none\",\n", + " edgecolors=\"red\",\n", + " linewidths=2.5,\n", + " alpha=0.5,\n", + " label=\"Label error\",\n", + " )\n", + "\n", + "\n", + " outlier = ax.scatter(X_out[:, 0], X_out[:, 1], color=\"k\", marker=\"x\", s=100, linewidth=2, label=\"Outlier\")\n", + "\n", + " # Plot the exact duplicate\n", + " dups = ax.scatter(\n", + " X_duplicate[:, 0],\n", + " X_duplicate[:, 1],\n", + " color=\"c\",\n", + " marker=\"x\",\n", + " s=100,\n", + " linewidth=2,\n", + " label=\"Duplicates\",\n", + " )\n", + " \n", + " title_fontproperties = {\"weight\":\"semibold\", \"size\": 8}\n", + " first_legend = ax.legend(handles=[low, mid, high], loc=[0.76, 0.7], title=\"Given Class Label\", alignment=\"left\", title_fontproperties=title_fontproperties, fontsize=8, markerscale=0.5)\n", + " second_legend = ax.legend(handles=[label_err, outlier, dups], loc=[0.76, 0.46], title=\"Type of Issue\", alignment=\"left\", title_fontproperties=title_fontproperties, fontsize=8, markerscale=0.5)\n", + " \n", + " ax = plt.gca().add_artist(first_legend)\n", + " ax = plt.gca().add_artist(second_legend)\n", + " plt.tight_layout()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:17.468995Z", + "iopub.status.busy": "2024-06-25T22:59:17.468541Z", + "iopub.status.idle": "2024-06-25T22:59:17.842164Z", + "shell.execute_reply": "2024-06-25T22:59:17.841582Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_data(X_train, y_train_idx, noisy_labels_idx, X_out, X_duplicate)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In real-world scenarios, you won't know the true labels or the distribution of the features, so this tutorial uses them only for evaluation purposes.\n", + "\n", + "\n", + "\n", + "`Datalab` has several ways of loading the data.\n", + "In this case, we'll simply wrap the training features and noisy labels in a dictionary so that we can pass them to `Datalab`." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:17.844521Z", + "iopub.status.busy": "2024-06-25T22:59:17.844084Z", + "iopub.status.idle": "2024-06-25T22:59:17.846842Z", + "shell.execute_reply": "2024-06-25T22:59:17.846404Z" + } + }, + "outputs": [], + "source": [ + "data = {\"X\": X_train, \"y\": noisy_labels}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Other supported data formats for `Datalab` include [HuggingFace Datasets](https://huggingface.co/docs/datasets/index) and [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). `Datalab` works across most data modalities (image, text, tabular, audio, etc.). It is intended to find issues that commonly occur in datasets for which you have trained a supervised ML model, regardless of the type of data.\n", + "\n", + "Currently, pandas DataFrames that contain categorical columns might cause some issues when instantiating the `Datalab` object, so we recommend either ensuring that your DataFrame contains no categorical columns or passing in your data in another format (e.g. a Python dictionary or HuggingFace Datasets)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.
Get out-of-sample predicted probabilities from a classifier" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To detect certain types of issues in classification data (e.g. label errors), `Datalab` relies on predicted class probabilities from a trained model. Ideally, the prediction for each example should be out-of-sample (to avoid overfitting), coming from a copy of the model that was not trained on this example. \n", + "\n", + "This tutorial uses a simple logistic regression model \n", + "and the `cross_val_predict()` function from scikit-learn to generate out-of-sample predicted class probabilities for every example in the training set. You can replace this with *any* other classifier model and train it with cross-validation to get out-of-sample predictions.\n", + "Make sure that the columns of your `pred_probs` are properly ordered with respect to the ordering of classes, which for Datalab is: lexicographically sorted by class name." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:17.848849Z", + "iopub.status.busy": "2024-06-25T22:59:17.848666Z", + "iopub.status.idle": "2024-06-25T22:59:17.884461Z", + "shell.execute_reply": "2024-06-25T22:59:17.883825Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/sklearn/model_selection/_split.py:776: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=5.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "model = LogisticRegression()\n", + "pred_probs = cross_val_predict(\n", + " estimator=model, X=data[\"X\"], y=data[\"y\"], cv=5, method=\"predict_proba\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. 
Use Datalab to find issues in the dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We create a `Datalab` object from the dataset, also providing the name of the label column in the dataset. Only instantiate one `Datalab` object per dataset, and note that only classification datasets are supported for now.\n", + "\n", + "All that is needed to audit your data is a call to `find_issues()`.\n", + "This method accepts various inputs, such as predicted class probabilities and numeric feature representations of the data. The more information you provide here, the more thoroughly `Datalab` will audit your data! Note that `features` should be some numeric representation of each example, obtained either by preprocessing your raw data or as embeddings from a (pre)trained model. In this case, our data is already entirely numeric, so we just provide the features directly." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:17.886873Z", + "iopub.status.busy": "2024-06-25T22:59:17.886451Z", + "iopub.status.idle": "2024-06-25T22:59:19.940099Z", + "shell.execute_reply": "2024-06-25T22:59:19.939460Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding null issues ...\n", + "Finding label issues ...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/runner/work/cleanlab/cleanlab/cleanlab/filter.py:904: UserWarning: May not flag all label issues in class: 2, it has too few examples (see `min_examples_per_class` argument)\n", + " warnings.warn(\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding outlier issues ...\n", + "Finding near_duplicate issues ...\n", + "Finding non_iid issues ...\n", + "Finding class_imbalance issues ...\n", + "Finding underperforming_group issues ...\n", + "\n", + "Audit complete.
30 issues found in the dataset.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/sklearn/neighbors/_base.py:246: EfficiencyWarning: Precomputed sparse input was not sorted by row values. Use the function sklearn.neighbors.sort_graph_by_row_values to sort the input by row values, with warn_when_not_sorted=False to remove this warning.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "lab = Datalab(data, label_name=\"y\")\n", + "lab.find_issues(pred_probs=pred_probs, features=data[\"X\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's review the results of this audit using `report()`.\n", + "This provides a high-level summary of each type of issue found in the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:19.942524Z", + "iopub.status.busy": "2024-06-25T22:59:19.942128Z", + "iopub.status.idle": "2024-06-25T22:59:19.961923Z", + "shell.execute_reply": "2024-06-25T22:59:19.961449Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset Information: num_examples: 132, num_classes: 4\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + " issue_type num_issues\n", + " label 17\n", + " outlier 6\n", + " near_duplicate 4\n", + "class_imbalance 3\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. 
due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 17\n", + "Overall dataset quality in terms of this issue: 0.8561\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "77 False 0.001908 max mid\n", + "58 False 0.003564 max high\n", + "8 False 0.007331 max mid\n", + "7 True 0.008963 low mid\n", + "120 True 0.009664 high mid\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 6\n", + "Overall dataset quality in terms of this issue: 0.3558\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "126 True 0.006636\n", + "130 True 0.012571\n", + "129 True 0.012571\n", + "127 True 0.014909\n", + "128 True 0.017443\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. 
feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 4\n", + "Overall dataset quality in terms of this issue: 0.6160\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "131 True 0.000000 [123] 0.000000e+00\n", + "123 True 0.000000 [131] 0.000000e+00\n", + "129 True 0.000002 [130] 4.463180e-07\n", + "130 True 0.000002 [129] 4.463180e-07\n", + "51 False 0.161148 [] 3.859087e-02\n", + "\n", + "\n", + "------------------ class_imbalance issues ------------------\n", + "\n", + "About this issue:\n", + "\tExamples belonging to the most under-represented class in the dataset.\n", + "\n", + "Number of examples with this issue: 3\n", + "Overall dataset quality in terms of this issue: 0.0227\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_class_imbalance_issue class_imbalance_score given_label\n", + "8 True 0.022727 max\n", + "77 True 0.022727 max\n", + "58 True 0.022727 max\n", + "86 False 1.000000 mid\n", + "87 False 1.000000 mid\n", + "\n", + "Additional Information: \n", + "Rarest Class: max\n" + ] + } + ], + "source": [ + "lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Learn more about the issues in your dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Datalab detects all sorts of issues in a dataset and what to do with the findings will vary case-by-case. For automated improvement of a dataset via best practices to handle auto-detected issues, try [Cleanlab Studio](https://cleanlab.ai/?utm_source=internal&utm_medium=blog&utm_campaign=clostostudio).\n", + "\n", + "To conceptually understand how each type of issue is defined and what it means if detected in your data, check out the [Issue Type Descriptions](../../cleanlab/datalab/guide/issue_type_description.html) page. 
The [Datalab Issue Types](https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html) page also lists additional types of issues that `Datalab.find_issues()` can detect, as well as optional parameters you can specify for greater control over how your data are checked.\n", + "\n", + "Datalab offers several methods to understand more details about a particular issue in your dataset.\n", + "The `get_issue_summary()` method fetches summary statistics regarding how severe each type of issue is overall across the whole dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:19.964041Z", + "iopub.status.busy": "2024-06-25T22:59:19.963733Z", + "iopub.status.idle": "2024-06-25T22:59:19.970181Z", + "shell.execute_reply": "2024-06-25T22:59:19.969663Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " issue_type score num_issues\n", + "0 null 1.000000 0\n", + "1 label 0.856061 17\n", + "2 outlier 0.355772 6\n", + "3 near_duplicate 0.616034 4\n", + "4 non_iid 0.821750 0\n", + "5 class_imbalance 0.022727 3\n", + "6 underperforming_group 0.901562 0" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lab.get_issue_summary()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the returned summary DataFrame, LOWER `score` values indicate types of issues that are MORE severe *overall* across the dataset (lower-quality data in terms of this issue), and HIGHER `num_issues` values likewise indicate types of issues that are MORE severe *overall* across the dataset (more datapoints appear to exhibit this issue).\n", + "\n", + "We can also request the summary for just one particular type of issue." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:19.972190Z", + "iopub.status.busy": "2024-06-25T22:59:19.971888Z", + "iopub.status.idle": "2024-06-25T22:59:19.977670Z", + "shell.execute_reply": "2024-06-25T22:59:19.977201Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " issue_type score num_issues\n", + "0 label 0.856061 17" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lab.get_issue_summary(\"label\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `get_issues()` method returns information for each *individual example* in the dataset, including whether or not it exhibits each type of issue (Boolean) and a *quality score* (numeric value between 0 and 1) quantifying how severe that issue appears to be for this particular example." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:19.979760Z", + "iopub.status.busy": "2024-06-25T22:59:19.979425Z", + "iopub.status.idle": "2024-06-25T22:59:19.989814Z", + "shell.execute_reply": "2024-06-25T22:59:19.989379Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " is_null_issue null_score is_label_issue label_score is_outlier_issue \\\n", + "0 False 1.0 False 0.859131 False \n", + "1 False 1.0 False 0.816953 False \n", + "2 False 1.0 False 0.531021 False \n", + "3 False 1.0 False 0.752808 False \n", + "4 False 1.0 True 0.090243 False \n", + "\n", + " outlier_score is_near_duplicate_issue near_duplicate_score \\\n", + "0 0.417707 False 0.664083 \n", + "1 0.375317 False 0.641516 \n", + "2 0.460593 False 0.601188 \n", + "3 0.321635 False 0.562539 \n", + "4 0.472909 False 0.746763 \n", + "\n", + " is_non_iid_issue non_iid_score is_class_imbalance_issue \\\n", + "0 False 0.970324 False \n", + "1 False 0.890575 False \n", + "2 False 0.826147 False \n", + "3 False 0.948362 False \n", + "4 False 0.878267 False \n", + "\n", + " class_imbalance_score is_underperforming_group_issue \\\n", + "0 1.0 False \n", + "1 1.0 False \n", + "2 1.0 False \n", + "3 1.0 False \n", + "4 1.0 False \n", + "\n", + " underperforming_group_score \n", + "0 1.0 \n", + "1 1.0 \n", + "2 1.0 \n", + "3 1.0 \n", + "4 1.0 " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lab.get_issues().head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each example receives a separate *quality score* for each issue type (eg. `outlier_score` is the *quality score* for the `outlier` issue type, quantifying *how typical* each datapoint appears to be). LOWER scores indicate MORE severe instances of the issue, so the most-concerning datapoints have the lowest quality scores. Sort by these scores to see the most-concerning examples in your dataset for each type of issue. 
The quality scores are directly comparable between examples/datasets, but not across different issue types.\n", + "\n", + "As before, we can pass the issue type as an argument to `get_issues()` to get information for one particular type of issue.\n", + "For example, let's look at the examples identified as having the most severe *label* issues:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:19.991928Z", + "iopub.status.busy": "2024-06-25T22:59:19.991591Z", + "iopub.status.idle": "2024-06-25T22:59:20.000388Z", + "shell.execute_reply": "2024-06-25T22:59:19.999887Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_score given_label predicted_label\n", + "7 True 0.008963 low mid\n", + "120 True 0.009664 high mid\n", + "40 True 0.013445 mid low\n", + "107 True 0.025184 high mid\n", + "53 True 0.026376 high mid" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "examples_w_issue = (\n", + " lab.get_issues(\"label\")\n", + " .query(\"is_label_issue\")\n", + " .sort_values(\"label_score\")\n", + ")\n", + "\n", + "examples_w_issue.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Inspecting the labels for some of these top-ranked examples, we find that their given labels were indeed incorrect." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Get additional information \n", + "\n", + "Miscellaneous additional information (statistics, intermediate results, etc.) related to a particular issue type can be accessed via `get_info(issue_name)`." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:20.002549Z", + "iopub.status.busy": "2024-06-25T22:59:20.002211Z", + "iopub.status.idle": "2024-06-25T22:59:20.009019Z", + "shell.execute_reply": "2024-06-25T22:59:20.008557Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Class Name Class Index Label Issues Inverse Label Issues Label Noise \\\n", + "0 low 1 12 2 0.428571 \n", + "1 high 0 11 2 0.407407 \n", + "2 mid 3 25 5 0.337838 \n", + "3 max 2 1 40 0.333333 \n", + "\n", + " Inverse Label Noise Label Quality Score \n", + "0 0.111111 0.571429 \n", + "1 0.111111 0.592593 \n", + "2 0.092593 0.662162 \n", + "3 0.952381 0.666667 " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issues_info = lab.get_info(\"label\")\n", + "label_issues_info[\"classes_by_label_quality\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This portion of the info shows overall label quality summaries of all examples annotated as a particular class (e.g. the `Label Issues` column is the estimated number of examples labeled as this class that should actually have a different label).\n", + "To learn more about this, see the documentation for the [cleanlab.dataset.rank_classes_by_label_quality](../../cleanlab/dataset.html#cleanlab.dataset.rank_classes_by_label_quality)\n", + "method.\n", + "\n", + "You can view all sorts of information about your dataset by calling the `get_info()` method with no arguments. This is not printed here because it returns a huge dictionary, but feel free to check it out yourself! Don't worry if you don't understand all of the miscellaneous information in this `info` dictionary; none of it is critical to diagnosing the issues in your dataset. Understanding this miscellaneous info may require reading the documentation of the cleanlab functions that computed it.\n", + "\n", + "#### Near duplicate issues \n", + "\n", + "Let's also inspect the examples flagged as (near) duplicates.\n", + "For each such example, the `near_duplicate_sets` column below indicates *which* other examples in the dataset are highly similar to it (this value is empty for examples not flagged as nearly duplicated). 
The `near_duplicate_score` quantifies *how similar* each example is to its nearest neighbor in the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:20.011153Z", + "iopub.status.busy": "2024-06-25T22:59:20.010812Z", + "iopub.status.idle": "2024-06-25T22:59:20.020230Z", + "shell.execute_reply": "2024-06-25T22:59:20.019671Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets \\\n", + "123 True 0.000000 [131] \n", + "131 True 0.000000 [123] \n", + "129 True 0.000002 [130] \n", + "130 True 0.000002 [129] \n", + "\n", + " distance_to_nearest_neighbor \n", + "123 0.000000e+00 \n", + "131 0.000000e+00 \n", + "129 4.463180e-07 \n", + "130 4.463180e-07 " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lab.get_issues(\"near_duplicate\").query(\"is_near_duplicate_issue\").sort_values(\"near_duplicate_score\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Learn more about handling near duplicates detected in a dataset from [the FAQ](../faq.html#How-to-handle-near-duplicate-data-identified-by-cleanlab?). \n", + "\n", + "Other issues detected in this tutorial dataset include **outliers** and **class imbalance**; see the [Issue Type Descriptions](../../cleanlab/datalab/guide/issue_type_description.html) for more information. `Datalab` makes it very easy to check your datasets for all sorts of issues that are important to deal with for training robust models. The inputs it uses to detect issues can come from *any* model you have trained (the better your model, the more accurate the issue detection will be).\n", + "\n", + "To learn more, check out this [example notebook](https://github.com/cleanlab/examples/blob/master/datalab_image_classification/datalab.ipynb) (demonstrates Datalab applied to a real dataset) and the [advanced Datalab tutorial](datalab_advanced.html) (demonstrates configuration and customization options to exert greater control)."
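As a rough illustration of one way to act on the `near_duplicate_sets` column, the sketch below keeps the lowest-index member of each group of near duplicates and marks the others for removal. The helper `near_duplicate_indices_to_drop` is hypothetical (it is not part of cleanlab), and it assumes the `is_near_duplicate_issue` and `near_duplicate_sets` columns from `lab.get_issues("near_duplicate")` have been converted to plain Python lists (e.g. with `.tolist()`). Whether to drop, merge, or keep duplicates depends on your use case; see the FAQ linked above.

```python
from typing import List, Sequence


def near_duplicate_indices_to_drop(
    is_near_duplicate: Sequence[bool],
    near_duplicate_sets: Sequence[Sequence[int]],
) -> List[int]:
    """Hypothetical helper: keep one example per group of near duplicates.

    Examples are visited in index order. A flagged example is dropped if any
    of its near duplicates has already been kept; otherwise it is kept.
    """
    kept = set()
    to_drop = []
    for idx, (flagged, neighbors) in enumerate(zip(is_near_duplicate, near_duplicate_sets)):
        if flagged and any(j in kept for j in neighbors):
            to_drop.append(idx)
        else:
            kept.add(idx)
    return to_drop


# Toy input mirroring the flagged rows shown above (123 <-> 131, 129 <-> 130)
n = 132
flags = [False] * n
sets = [[] for _ in range(n)]
for a, b in [(123, 131), (129, 130)]:
    flags[a] = flags[b] = True
    sets[a], sets[b] = [b], [a]

print(near_duplicate_indices_to_drop(flags, sets))  # [130, 131]
```

This greedy "keep the first one seen" policy is only one simple choice; you might instead keep the member with the best label quality score.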
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:20.022362Z", + "iopub.status.busy": "2024-06-25T22:59:20.022024Z", + "iopub.status.idle": "2024-06-25T22:59:20.033881Z", + "shell.execute_reply": "2024-06-25T22:59:20.033295Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "from sklearn.metrics import roc_auc_score\n", + "\n", + "issue_results = lab.get_issues(\"label\")\n", + "outlier_results = lab.get_issues(\"outlier\")\n", + "duplicate_results = lab.get_issues(\"near_duplicate\")\n", + "\n", + "def jaccard_similarity(l1, l2):\n", + " s1 = set(l1)\n", + " s2 = set(l2)\n", + " intersect_set = s1.intersection(s2)\n", + " union_set = s1.union(s2)\n", + " if len(intersect_set) == 0:\n", + " return 0\n", + " return len(intersect_set) / len(union_set)\n", + "\n", + "identified_label_issues_indices = issue_results[issue_results[\"is_label_issue\"] == True].index.tolist()\n", + "label_issue_indices = np.where(y_train_idx != noisy_labels_idx)[0]\n", + "\n", + "label_quality_scores = issue_results[\"label_score\"].tolist()\n", + "Z = (y_train_idx == noisy_labels_idx).astype(float).tolist()\n", + "\n", + "identified_outlier_issues_indices = outlier_results[outlier_results[\"is_outlier_issue\"] == True].index.to_list()\n", + "outlier_issue_indices = list(range(125, 130+1))\n", + "exact_duplicate_idx = [index for index, elem in enumerate(X_train) if (elem == X_duplicate).all()][0]\n", + "if exact_duplicate_idx >= 125: # if the random index selected to create a duplicate >= 125, then the last point is also an outlier\n", + " outlier_issue_indices.append(131)\n", + " \n", + "identified_duplicate_issues_indices = duplicate_results[duplicate_results[\"is_near_duplicate_issue\"] == True].index.tolist()\n", + "duplicate_issue_indices = [exact_duplicate_idx, 129, 130, 
131]\n", + "\n", + "\n", + "assert jaccard_similarity(identified_label_issues_indices, label_issue_indices) > 0.4\n", + "assert roc_auc_score(Z, label_quality_scores) > 0.9\n", + "assert jaccard_similarity(identified_outlier_issues_indices, outlier_issue_indices) > 0.9\n", + "assert jaccard_similarity(identified_duplicate_issues_indices, duplicate_issue_indices) > 0.9" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + }, + "vscode": { + "interpreter": { + "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe" + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/image.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/image.ipynb new file mode 100644 index 000000000..8cc43a68f --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/image.ipynb @@ -0,0 +1,8739 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Detecting Issues in an Image Dataset with Datalab\n", + "\n", + "This quickstart tutorial demonstrates how to find issues in image classification data. 
Here we use the Fashion-MNIST dataset (60,000 images of fashion products from 10 categories), but you can replace this with your own image classification dataset and still follow the same tutorial.\n", + "\n", + "**Overview of what we'll do in this tutorial:**\n", + "\n", + "- Build a simple [PyTorch](https://pytorch.org/) neural net.\n", + "\n", + "- Use cross-validation to compute out-of-sample predicted probabilities (`pred_probs`) and feature embeddings (`features`) for each image in the dataset.\n", + "\n", + "- Utilize these `pred_probs` and `features` to identify potential issues within the dataset using the `Datalab` class from cleanlab. The issues found by cleanlab include mislabeled examples, near duplicates, outliers, and image-specific problems such as excessively dark or low information images." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have an ML model? Run cross-validation to get out-of-sample `pred_probs` and provide `features` (embeddings of the data). Then use the code below to find any potential issues in your dataset (you can also run this code with only one of `pred_probs` or `features`, but fewer issue types will be considered).\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(data=your_dataset, label_name=\"column_name_of_labels\") # include `image_key` to detect low-quality images\n", + "lab.find_issues(pred_probs=pred_probs, features=features)\n", + "\n", + "lab.report()\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Install and import required dependencies" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install matplotlib torch torchvision \"datasets>=2.19.0\"\n", + "!pip install \"cleanlab[image]\"\n", + "# We install cleanlab with extra dependencies for image data\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install \"cleanlab[image] @ git+https://github.com/cleanlab/cleanlab.git\"\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:23.042776Z", + "iopub.status.busy": "2024-06-25T22:59:23.042620Z", + "iopub.status.idle": "2024-06-25T22:59:25.976008Z", + "shell.execute_reply": "2024-06-25T22:59:25.975453Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (this cell is hidden from docs.cleanlab.ai).\n", + "# If running on Colab, you may want to use a GPU (select: Runtime > Change runtime type > Hardware accelerator > GPU)\n", + "\n", + "dependencies = [\"cleanlab\", \"matplotlib\", \"torch\", \"torchvision\", \"datasets\", \"cleanvision\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install \"cleanlab[image]\" # for colab\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " missing_dependencies = []\n", + " for dependency in dependencies:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, 
sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")\n", + "\n", + "# Suppress benign warnings:\n", + "import warnings\n", + "warnings.filterwarnings(\"ignore\", \"Lazy modules are a new feature.*\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:25.978473Z", + "iopub.status.busy": "2024-06-25T22:59:25.978153Z", + "iopub.status.idle": "2024-06-25T22:59:25.981939Z", + "shell.execute_reply": "2024-06-25T22:59:25.981484Z" + } + }, + "outputs": [], + "source": [ + "from torch.utils.data import DataLoader, TensorDataset, Subset\n", + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "\n", + "from sklearn.model_selection import StratifiedKFold\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from tqdm.autonotebook import tqdm\n", + "import math\n", + "import time\n", + "import multiprocessing\n", + "\n", + "from cleanlab import Datalab\n", + "from datasets import load_dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Fetch and normalize the Fashion-MNIST dataset\n", + "\n", + "Load the train split of the `fashion_mnist` dataset and view the number of rows and columns in the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T22:59:25.983914Z", + "iopub.status.busy": "2024-06-25T22:59:25.983584Z", + "iopub.status.idle": "2024-06-25T22:59:36.604830Z", + "shell.execute_reply": "2024-06-25T22:59:36.604361Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/datasets/load.py:1486: FutureWarning: The repository for fashion_mnist contains custom code which must be executed to correctly load the dataset. 
You can inspect the repository content at https://hf.co/datasets/fashion_mnist\n", + "You can avoid this message in future by passing the argument `trust_remote_code=True`.\n", + "Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ae6a30ccb0c14d74a979a0380deb3f43", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Downloading builder script: 0%| | 0.00/4.83k [00:00\n", + "Bringing Your Own Data (BYOD)?\n", + "\n", + "Load any Hugging Face dataset or your local image folder dataset, apply relevant transformations, and continue with the rest of the tutorial.\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Define a classification model\n", + "Here, we define a simple neural network with PyTorch. Note that this is just a toy model to ensure quick runtimes for the tutorial; you can replace it with any other (larger) PyTorch network."
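The `Net` below uses `nn.LazyLinear`, which infers its input size on the first forward pass. As a quick sanity check of what that size should be, this sketch applies the standard output-size formula for convolution and pooling layers (no padding, dilation 1, stride 1 for the 5x5 convolutions, which are the PyTorch defaults). It assumes the images stay 28x28 single-channel after normalization; the helper `conv2d_out` is a hypothetical name used only for illustration.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a Conv2d/MaxPool2d layer with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Trace one side of a 28x28 image through the layers of `Net.cnn`
# (ReLU and BatchNorm2d do not change the spatial size)
side = 28
side = conv2d_out(side, kernel=5)            # Conv2d(1, 6, 5)   -> 24
side = conv2d_out(side, kernel=2, stride=2)  # MaxPool2d(2, 2)   -> 12
side = conv2d_out(side, kernel=5)            # Conv2d(6, 16, 5)  -> 8
side = conv2d_out(side, kernel=2, stride=2)  # MaxPool2d(2, 2)   -> 4

out_channels = 16
flattened = out_channels * side * side
print(side, flattened)  # 4 256
```

Under these assumptions, `torch.flatten(x, 1)` yields 256 features per image, so `nn.LazyLinear(128)` would be materialized with 256 input features.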
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:00:04.885214Z", + "iopub.status.busy": "2024-06-25T23:00:04.884639Z", + "iopub.status.idle": "2024-06-25T23:00:04.889665Z", + "shell.execute_reply": "2024-06-25T23:00:04.889242Z" + } + }, + "outputs": [], + "source": [ + "class Net(nn.Module):\n", + " def __init__(self):\n", + " super().__init__()\n", + " self.cnn = nn.Sequential(\n", + " nn.Conv2d(1, 6, 5),\n", + " nn.ReLU(),\n", + " nn.BatchNorm2d(6),\n", + " nn.MaxPool2d(2, 2),\n", + " nn.Conv2d(6, 16, 5, bias=False),\n", + " nn.ReLU(),\n", + " nn.BatchNorm2d(16),\n", + " nn.MaxPool2d(2, 2),\n", + " )\n", + " self.linear = nn.Sequential(nn.LazyLinear(128), nn.ReLU())\n", + " self.output = nn.Sequential(nn.Linear(128, num_classes))\n", + "\n", + " def forward(self, x):\n", + " x = self.embeddings(x)\n", + " x = self.output(x)\n", + " return x\n", + "\n", + " def embeddings(self, x):\n", + " x = self.cnn(x)\n", + " x = torch.flatten(x, 1) # flatten all dimensions except batch\n", + " x = self.linear(x)\n", + " return x" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:00:04.891871Z", + "iopub.status.busy": "2024-06-25T23:00:04.891471Z", + "iopub.status.idle": "2024-06-25T23:00:04.895483Z", + "shell.execute_reply": "2024-06-25T23:00:04.895048Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This (optional) cell is hidden from docs.cleanlab.ai\n", + "\n", + "SEED = 123 # for reproducibility\n", + "np.random.seed(SEED)\n", + "torch.manual_seed(SEED)\n", + "torch.backends.cudnn.deterministic = True\n", + "torch.backends.cudnn.benchmark = True\n", + "torch.cuda.manual_seed_all(SEED)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
Helper methods for cross-validation **(click to expand)**\n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai; if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "# Set device\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "\n", + "# Method to calculate validation accuracy in each epoch\n", + "def get_test_accuracy(net, testloader):\n", + " net.eval()\n", + " accuracy = 0.0\n", + " total = 0.0\n", + "\n", + " with torch.no_grad():\n", + " for data in testloader:\n", + " images, labels = data[\"image\"].to(device), data[\"label\"].to(device)\n", + "\n", + " # run the model on the test set to predict labels\n", + " outputs = net(images)\n", + "\n", + " # the label with the highest energy will be our prediction\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += labels.size(0)\n", + " accuracy += (predicted == labels).sum().item()\n", + "\n", + " # compute the accuracy over all test images\n", + " accuracy = 100 * accuracy / total\n", + " return accuracy\n", + "\n", + "\n", + "# Method for training the model\n", + "def train(trainloader, testloader, n_epochs, patience):\n", + " model = Net()\n", + "\n", + " criterion = nn.CrossEntropyLoss()\n", + " optimizer = optim.AdamW(model.parameters())\n", + "\n", + " model = model.to(device)\n", + "\n", + " best_test_accuracy = 0.0\n", + " best_epoch = 0\n", + "\n", + " for epoch in range(n_epochs): # loop over the dataset multiple times\n", + " model.train() # get_test_accuracy() switches to eval mode, so re-enable training mode each epoch\n", + " start_epoch = time.time()\n", + " running_loss = 0.0\n", + "\n", + " for _, data in enumerate(trainloader):\n", + " # get the inputs; data is a dict of {\"image\": images, \"label\": labels}\n", + "\n", + " inputs, labels = data[\"image\"].to(device), data[\"label\"].to(device)\n", + "\n", + " # zero the parameter gradients\n", + " optimizer.zero_grad()\n", + "\n", + " # forward + backward + optimize\n", + " outputs = model(inputs)\n", + " loss = criterion(outputs, labels)\n", + " 
loss.backward()\n", + " optimizer.step()\n", + "\n", + " running_loss += loss.detach().cpu().item()\n", + "\n", + " # Get accuracy on the test set\n", + " accuracy = get_test_accuracy(model, testloader)\n", + "\n", + " if accuracy > best_test_accuracy:\n", + " best_epoch = epoch\n", + " best_test_accuracy = accuracy\n", + "\n", + " # Condition for early stopping\n", + " if epoch - best_epoch > patience:\n", + " print(f\"Early stopping at epoch {epoch + 1}\")\n", + " break\n", + "\n", + " end_epoch = time.time()\n", + "\n", + " print(\n", + " f\"epoch: {epoch + 1} loss: {running_loss / len(trainloader):.3f} test acc: {accuracy:.3f} time_taken: {end_epoch - start_epoch:.3f}\"\n", + " )\n", + " return model\n", + "\n", + "\n", + "# Method for computing out-of-sample embeddings\n", + "def compute_embeddings(model, testloader):\n", + " embeddings_list = []\n", + "\n", + " with torch.no_grad():\n", + " for data in tqdm(testloader):\n", + " images, labels = data[\"image\"].to(device), data[\"label\"].to(device)\n", + "\n", + " embeddings = model.embeddings(images)\n", + " embeddings_list.append(embeddings.cpu())\n", + "\n", + " return torch.vstack(embeddings_list)\n", + "\n", + "\n", + "# Method for computing out-of-sample predicted probabilities\n", + "def compute_pred_probs(model, testloader):\n", + " pred_probs_list = []\n", + "\n", + " with torch.no_grad():\n", + " for data in tqdm(testloader):\n", + " images, labels = data[\"image\"].to(device), data[\"label\"].to(device)\n", + "\n", + " outputs = model(images)\n", + " pred_probs_list.append(outputs.cpu())\n", + "\n", + " return torch.vstack(pred_probs_list)\n", + "```\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:00:04.897367Z", + "iopub.status.busy": "2024-06-25T23:00:04.897105Z", + "iopub.status.idle": "2024-06-25T23:00:04.906044Z", + "shell.execute_reply": "2024-06-25T23:00:04.905581Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Set device\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "\n", + "# Method to calculate validation accuracy in each epoch\n", + "def get_test_accuracy(net, testloader):\n", + " net.eval()\n", + " accuracy = 0.0\n", + " total = 0.0\n", + "\n", + " with torch.no_grad():\n", + " for data in testloader:\n", + " images, labels = data[0].to(device), data[1].to(device)\n", + "\n", + " # run the model on the test set to predict labels\n", + " outputs = net(images)\n", + "\n", + " # the label with the highest energy will be our prediction\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += labels.size(0)\n", + " accuracy += (predicted == labels).sum().item()\n", + "\n", + " # compute the accuracy over all test images\n", + " accuracy = 100 * accuracy / total\n", + " return accuracy\n", + "\n", + "\n", + "# Method for training the model\n", + "def train(trainloader, testloader, n_epochs, patience):\n", + " model = Net()\n", + "\n", + " criterion = nn.CrossEntropyLoss()\n", + " optimizer = optim.AdamW(model.parameters())\n", + "\n", + " model = model.to(device)\n", + "\n", + " best_test_accuracy = 0.0\n", + " best_epoch = 0\n", + "\n", + " for epoch in range(n_epochs): # loop over the dataset multiple times\n", + " model.train() # get_test_accuracy() switches to eval mode, so re-enable training mode each epoch\n", + " start_epoch = time.time()\n", + " running_loss = 0.0\n", + "\n", + " for _, data in enumerate(trainloader):\n", + " # get the inputs; data is an (images, labels) pair\n", + "\n", + " inputs, labels = data[0].to(device), data[1].to(device)\n", + "\n", + " # zero the parameter gradients\n", + " optimizer.zero_grad()\n", + "\n", 
+ " # forward + backward + optimize\n", + " outputs = model(inputs)\n", + " loss = criterion(outputs, labels)\n", + " loss.backward()\n", + " optimizer.step()\n", + "\n", + " running_loss += loss.detach().cpu().item()\n", + "\n", + " # Get accuracy on the test set\n", + " accuracy = get_test_accuracy(model, testloader)\n", + "\n", + " if accuracy > best_test_accuracy:\n", + " best_epoch = epoch\n", + " best_test_accuracy = accuracy\n", + "\n", + " # Condition for early stopping\n", + " if epoch - best_epoch > patience:\n", + " print(f\"Early stopping at epoch {epoch + 1}\")\n", + " break\n", + "\n", + " end_epoch = time.time()\n", + "\n", + " print(\n", + " f\"epoch: {epoch + 1} loss: {running_loss / len(trainloader):.3f} test acc: {accuracy:.3f} time_taken: {end_epoch - start_epoch:.3f}\"\n", + " )\n", + " return model\n", + "\n", + "\n", + "# Method for computing out-of-sample embeddings\n", + "def compute_embeddings(model, testloader):\n", + " embeddings_list = []\n", + "\n", + " with torch.no_grad():\n", + " for data in tqdm(testloader):\n", + " images, labels = data[0].to(device), data[1].to(device)\n", + "\n", + " embeddings = model.embeddings(images)\n", + " embeddings_list.append(embeddings.cpu())\n", + "\n", + " return torch.vstack(embeddings_list)\n", + "\n", + "\n", + "# Method for computing out-of-sample predicted probabilities\n", + "def compute_pred_probs(model, testloader):\n", + " pred_probs_list = []\n", + "\n", + " with torch.no_grad():\n", + " for data in tqdm(testloader):\n", + " images, labels = data[0].to(device), data[1].to(device)\n", + "\n", + " outputs = model(images)\n", + " pred_probs_list.append(outputs.cpu())\n", + "\n", + " return torch.vstack(pred_probs_list)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. 
Prepare the dataset for K-fold cross-validation \n", + "\n", + "To find label issues based on `pred_probs`, we recommend out-of-sample predictions, which can be produced [via K-fold cross-validation](https://docs.cleanlab.ai/stable/tutorials/pred_probs_cross_val.html). To ensure this tutorial runs quickly, we set K and other important neural network training hyperparameters to small values here. Use larger values to get good results in practice!" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:00:04.907974Z", + "iopub.status.busy": "2024-06-25T23:00:04.907710Z", + "iopub.status.idle": "2024-06-25T23:00:04.933797Z", + "shell.execute_reply": "2024-06-25T23:00:04.933335Z" + } + }, + "outputs": [], + "source": [ + "K = 3 # Number of cross-validation folds. Set to small value here to ensure quick runtimes, we recommend 5 or 10 in practice for more accurate estimates.\n", + "n_epochs = 2 # Number of epochs to train model for. Set to a small value here for quick runtime, you should use a larger value in practice.\n", + "patience = 2 # Parameter for early stopping. If the validation accuracy does not improve for this many epochs, training will stop.\n", + "train_batch_size = 64 # Batch size for training\n", + "test_batch_size = 512 # Batch size for testing\n", + "num_workers = multiprocessing.cpu_count() # Number of workers for data loaders\n", + "\n", + "# Create k splits of the dataset\n", + "kfold = StratifiedKFold(n_splits=K, shuffle=True, random_state=0)\n", + "splits = kfold.split(transformed_dataset, transformed_dataset[\"label\"])\n", + "\n", + "train_id_list, test_id_list = [], []\n", + "\n", + "for fold, (train_ids, test_ids) in enumerate(splits):\n", + " train_id_list.append(train_ids)\n", + " test_id_list.append(test_ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. 
Compute out-of-sample predicted probabilities and feature embeddings\n", + "\n", + "We use cross-validation to compute out-of-sample predicted probabilities separately for each dataset fold. However, we use only one model to generate embeddings for all the images across the full dataset. This ensures all feature embeddings lie in the same representation space for more accurate detection of data issues. Here we embed all the data using our model trained in the first cross-validation fold, but you could also train a separate embedding model on the full dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:00:04.936173Z", + "iopub.status.busy": "2024-06-25T23:00:04.935735Z", + "iopub.status.idle": "2024-06-25T23:00:37.815128Z", + "shell.execute_reply": "2024-06-25T23:00:37.814505Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Training on fold: 1 ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 1 loss: 0.482 test acc: 86.720 time_taken: 4.794\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 2 loss: 0.329 test acc: 88.195 time_taken: 4.521\n", + "Computing feature embeddings ...\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a34c2ac5b25b4ee9b3271f7d1cb58d09", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + " 0%| | 0/40 [00:00)`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. 
potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 3772\n", + "Overall dataset quality in terms of this issue: 0.3651\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "27080 True 3.873833e-07\n", + "40378 True 6.915575e-07\n", + "25316 True 1.390277e-06\n", + "2090 True 3.751164e-06\n", + "14999 True 3.881301e-06\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 3585\n", + "Overall dataset quality in terms of this issue: 0.9569\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "11262 True 0.000003 Coat T - shirt / top\n", + "19228 True 0.000010 Dress Shirt\n", + "32657 False 0.000013 Bag Dress\n", + "21282 False 0.000016 Bag Dress\n", + "53564 True 0.000018 Pullover T - shirt / top\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. 
feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 175\n", + "Overall dataset quality in terms of this issue: 0.6321\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "30968 True 0.001267 [30659] 0.000022\n", + "30659 True 0.001267 [30968] 0.000022\n", + "47824 True 0.001454 [3370] 0.000026\n", + "3370 True 0.001454 [47824] 0.000026\n", + "54565 True 0.001854 [9762, 258, 47139] 0.000033\n", + "\n", + "\n", + "\n", + "Removing grayscale from potential issues in the dataset as it exceeds max_prevalence=0.5 \n", + "------------------ low_information images ------------------\n", + "\n", + "Number of examples with this issue: 166\n", + "Examples representing most severe instances of this issue:\n", + "\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "----------------------- dark images ------------------------\n", + "\n", + "Number of examples with this issue: 16\n", + "Examples representing most severe instances of this issue:\n", + "\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAFhCAYAAADgPRuZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8WgzjOAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA2uElEQVR4nO3deXRV5b3/8U8MM4R50ASRMioxAzTKIHAp3qIIohWlKnK1F2sdlsoP67TqVIdevW29rjosUVvagr14K7UiDmgdioooVBm03lYZVIgyhhCGCCTP7w8W55K9v5GHc07ISZ73ay3X8jzZe59n7/OcnS873+/zZDnnnAAAABCMo+q7AwAAADiyCAABAAACQwAIAAAQGAJAAACAwBAAAgAABIYAEAAAIDAEgAAAAIEhAAQAAAgMASAAAEBgCABT4JzT5Zdfrj59+qikpESrVq2KbfPKK69o0KBBKigo0LBhw7Ry5crEz+bPn6/+/furb9++euKJJxLtI0eOVFFRkQYMGKA777yzxrGKi4uVn5+v//f//l/dnhwahLoag5JUXV2twYMH69xzz020VVZW6pJLLlH//v11wgkn6K233qq7k0ODkeo4/N73vqcOHTrUGGtS7ffCu+++Wz169FDnzp3r7qTQ4NTFOKyoqFBxcXHiv3bt2umBBx6QJH3/+99PtOfl5enss8+u61NMLxewffv2pbT/c8895yZOnBj7/4N98MEH7ssvv3TOObdgwQI3YsQI55xze/fudX379nXr1q1zFRUVrl+/fm7z5s3OOefKy8sT2wwePNi9//77rqqqyvXo0cOtWbPGOefcpZde6hYsWJBS/1H/MnUMOufcY4895iZNmlTjmD/5yU/cPffc45xzbs+ePa6srCyl/iMz1Oc4dM65119/3c2bNy+2n3UvdM659957z5WWlrpOnTql1G9klkwdhwdUV1e7Hj16uNWrV8d+NnnyZDdz5syU+n+kNagngDt27NDpp5+ugoICFRQUaMGCBZKk559/XgMHDlRRUZEuvPBCSdLq1as1atQoFRYWasKECdq6daskadSoUZo2bZpKSko0a9YsLViwQEOHDtXAgQN10UUXac+ePd79mTdvnqZMmSJJGjdunBYtWiQXWVq5uLhYRx99tCTppJNO0vr16yVJ7733nvLz85WXl6c2bdpo7NixevnllyVJbdu2lSTt3btXe/fuVVZWljZv3qw2bdqoZ8+ekqTRo0frT3/6UzKXESkIZQxu3bpVc+bM0WWXXVbjWLNnz9b06dMlSU2bNlX79u0P5/IhTRrTODzQl5ycnNhxrXvhgf2POeYY7/6hboQyDg945513dPTRR+tb3/pWjfavv/5aCxYsaHBPABtUALhgwQJ16tRJK1eu1IoVKzR06FBt3LhRV199tebPn6/ly5froYcekiRdc801uvLKK7VixQqdcsopuuOOOxLHadq0qZYuXarx48fr5z//uV577TV98MEH6tWrlx5//PHY+952222aN29erL20tFR5eXmSpKysLHXo0EFbtmyptf+//e1vNWbMmNi+kpSXl1djIA4bNkxdu3bVv/7rv6q4uFhdunTRzp07tXLlSlVVVWnevHk1tseREcoY/MlPfqJbb71
V2dnZiZ9v27ZNTZo00Y9//GMNGjRIP/jBD1RRUXEYVw/p0pjG4aFE74XIHCGNQ0n6n//5H33/+9+Ptb/44osaOnRog/sHcYMKAAsKCrRw4ULdcMMNWrx4sdq2bavFixdr9OjRiQ+9Y8eOkqQlS5bovPPOkyRNmTJFb775ZuI4B9oXL16cGLTFxcX64x//qDVr1sTe984779SECRNS6vu7776rGTNm6O677/baftGiRSotLdWyZcv04YcfKisrS7Nnz9bll1+uYcOGKS8vr8YvZxwZIYzBDz74QGVlZRo1alSN9n379mnVqlUaO3as3n//fR1zzDG69957U+oTkhPCODwgei9E5ghpHDrnNHfuXE2aNCn2s9oCw0zXpL47cDj69eunZcuWaf78+Zo+fbomT56sHj16mNse+FOBpVWrVpL2J7mPGzdOM2fOTKo/ubm5Wr9+vUpKSuScU1lZmTp16hTbbs2aNZoyZYqeeeaZxM8P7HvA+vXrdfLJJ9fYLycnR6eeeqpeeuklnXjiiRo+fLjefvttSfv/FPdN54i6EcIYXLx4sd5880317NlTlZWVqqio0GWXXaYZM2aobdu2GjdunKT9CdMH/yseR05jGoc+ovdCZIaQxuFbb72l4447Tt27d6/Rvnv3br3yyiuaMWNGUn2uTw3qCWBpaalat26tiy++WNOmTdOyZcs0ZMgQvfbaa4lfZAfyCkpKSjR37lxJ0pNPPqmRI0fGjjd06FC9/vrr+uyzzyRJ27dvN/+1UZvx48dr1qxZkvbnPAwdOjQ2yMvKynTWWWfp4YcfVn5+fqL95JNP1ocffqj169drx44devHFF3XaaaepvLxcmzZtkvR/eQXHH3+8JGnjxo2S9uddPPjgg5o6dap3X5EeIYzBK664QuvXr9fatWs1Z84cjR07Vo899piysrI0ZswYvfPOO5KkN954QyeccIJ3X5E+jWkc1uab7oXIDCGMwwNqe8r3wgsvaOTIkd+YO5ix6q38JAkvvfSSO/HEE11RUZEbMmSI++ijj5xz+6t9ioqKXGFhoZs8ebJzzrlVq1a5kSNHuoKCAjd+/Hi3ZcsW55xz//Iv/+JWrlyZOObLL7/svv3tb7uCggJXVFTkXn/99dj73nrrre7ZZ5+NtVdVVbkf/vCHrlevXm7QoEHun//8p3POuSVLlripU6c655y76667XJs2bVxRUZErKipyJ598cmL/Z5991vXt29f17t3bzZgxwznn3Nq1axP9yc/Pdz/96U8T20+bNs0df/zx7vjjj3e///3vU7mUSFIIY/Bgr7/+eo2KuFWrVrlhw4bFzglHVmMbh6eeeqrr3Lmza9mypcvLy3OLFi36xnvhLbfc4vLy8txRRx3l8vLy3C9/+csUryiSEcI4PHDcvLw8V1paGnvPSZMmuTlz5iRz+epdlnOREhkAAAA0ag3qT8AAAABIHQEgAABAYAgAAQAAAtMoA8DS0lJNnjzZ/FnPnj21Y8cO72Ndd911KiwsVGFhoc477zzt2rWrxs8feughZWVlJY65fft2jRs3TsXFxSosLNSLL75YY/vly5erSZMmmj9/fqLt+uuvV35+vk444QT9x3/8h3ffkLnSOQbvuOMOde/ePbHm5IH5s1wt617WttblP/7xjxprWrZs2VJ//vOfJUmrVq1SSUmJ+vTpo8svvzw2ez4aviNxX1y7dq1GjRqlgoICjR07VuXl5ZKkOXPmqLCwUMXFxRo+fLj+93//N3GsV155RYWFhcrPz2+Qc6kheekck7Wty1vb2PvrX/+qoqIiFRcXq6SkRIsWLUr5fBqc+q1BOfKOO+44V1FR4b39gbUonXNu+vTp7v7770+83rhxozv99NNdjx49Esf8xS9+4W666SbnnHMff/yx69evX2L76upqN3bsWHf66ae75557zjnn3NKlS90pp5ziqqqq3K5du1zPnj3NSiM0Hoc7Bm+//Xb34IMPxtprW/fym9a6PKCiosJ
16tTJ7dixwznn3MSJExNj8uD/RxjSdV8855xzEhWRs2fPdjfffLNzzrnt27e76upq55xz8+bNcxMmTHDOObd161aXn5+fuOdt2LAh9ZNBo3C4Y/JgB6/LW9vY27FjR2Lt4RUrVrjCwsLUO93ANMongGvXrlVJSYkkadeuXZo4caIGDBigSy655LCfbBxYi9I5p8rKyhpzCt1888366U9/WqMtKysrsTxWeXl5jfUqZ82apdGjR6tbt241tq+srNSePXtUWVmpFi1aqE2bNod/0sgo6RyDtalt3ctvWuvy4H1PPfVUtW7dWs45LVq0KDHB80UXXaTnnnsuLX1E5jgS98WPP/5Yo0ePllRzvfKcnJzENrt27Ur8/x/+8Aedf/75iftk165dUzxLNCR1cZ+Mrstb29hr3bp1YjWtg9tD0igDwIM98sgjysvL09///ndNmjRJn3/+ubndGWecodLSUvNn11xzjXJzc/XRRx/pRz/6kaT9S9ZUV1fHVu+47LLL9NFHHyk3N1enn366fvnLX0raHww+8cQTuvbaa2tsP2jQIH3nO99Rbm6uevTooWnTpjXMCSVRq3SMwfvvv1+FhYW64oorEn8W8Vn3sra1Lg+e1HTLli3q2LFj4gYYXZcajU9d3RcLCwsTQd+f/vSnGuPo97//vfr27avrrrtOv/jFLyRJn3zyiTZs2KARI0bo5JNP1vPPP5/O00QDko4xKdnr8lpjT5L+8pe/6IQTTtDYsWP16KOPpu1cGoz6e/hYd9asWeO+/e1vO+ecO+uss9zbb7+d+FmHDh2SeqxcVVXlpk2b5n7zm9+4qqoqN2rUqMSf2Q5+VP3HP/4x8Sfg999/3+Xn5yf2nT9/vnPOuYsvvjjxJ7ZPPvnEnX322W7Xrl1uy5YtrqCgwK1atSr5k0dGSOcY/Oqrr9y+ffvcvn373PTp092Pf/xj55xz48aNc0uWLElsN2DAALdp06bE68WLF7v+/fu7zZs31zheeXm569Kli9u9e7dzzrlNmza5/Pz8xM/fe+89N27cuMM4WzQEdX1fdM65devWuQkTJriBAwe6W265xXXr1i22z9y5c92//du/Oeecu+qqq9yIESPc7t273bp169xxxx3ntm7dmszpoQGqizF5wQUXuNmzZ5s/O3jsHeydd95x3/3udw/7vRq6Rv8EUPrmNQh9HXXUUbrgggs0d+5cVVRU6MMPP9SQIUPUs2dPrVu3Tvn5+dq+fbtmzpypc845R5I0cOBAOee0efNm/e1vf9NVV12lnj176umnn9bUqVP18ssv65lnntGwYcPUsmVLdezYUSNGjNDSpUtT7i8ySypjsFu3bsrOzlZ2drb+/d//XUuWLJFUcy1fF1n38sBal3Pnzo2tdfnss89qzJgxatGihSSpU6dO2rp1a+JPLuvXr1dubm7S/UXDkO77orT/6fGzzz6r999/Xz/60Y907LHHxvY555xz9MILLyS2Hzt2rFq0aKG8vDzl5+fr008/TblfaJhSHZMH1uWdMGGC+fODx97BhgwZonXr1mnz5s0pvX9D0+gDwOHDh+upp56SJL300ksqKys7rP0/+eSTxP/PmzdPxx9/vNq1a6dNmzZp7dq1Wrt2rbp3766PPvpIbdu21bHHHqtXX31V0v5fwtu3b1fnzp21cOHCxPbnnnuufv3rX2vMmDE69thj9cYbb6iqqkqVlZVatGiR+vfvn74LgHqX6hj88ssvE///7LPPJtavrG3dy0OtdRld0zIrK0tDhgxJ/PntySef1Jlnnnl4J4kGpS7ui5K0efPmxD8k7rnnHl122WWx7V955RUdd9xxkqQJEybozTffVHV1tbZt26aPP/5Y3/rWt5I/MTRYqY5JyV6Xt7axt3r1alVVVUmSPvzwQ1VUVMT+sdzYNfoA8Morr9Tnn3+uAQMG6KmnnlKPHj3M7WrLK7jmmmtUUFCgwsJCrV69Wrfddts3vt+tt96amNbg7LPP1mOPPaa
jjqr9Mp933nk65phjVFBQoEGDBmnSpEkqKio6vJNERkt1DN5www2JMfj+++/rrrvukrQ/AOzYsaN69+6t22+/Xffee68k6eGHH9aaNWt0/fXXq7i4WIMHD04cq7y8XO+9955OO+20Gu9x33336fbbb1fv3r3VoUOHREEIGqe6ui+++uqr6t+/v/r166dWrVpp6tSpkqTZs2crPz9fxcXF+tnPfqbf/e53kqT8/HwNHz5cJ554okaMGKG77rpLnTt3rqOzRiZLdUxK+/9xO2nSpBpttY29A7+ni4uLdemll+oPf/hDcIUgrAUMAAAQmEb/BBAAAAA1EQACAAAEhgAQAAAgMASAAAAAgSEABAAACAwBIAAAQGCa+GxUXV2t0tLSGosqA9L+FSgqKiqUm5v7jfMdpgPjELU5UuOQMYjacC9EJjiccegVAJaWlppL+gAHfPHFF+revXudvgfjEIdS1+OQMYhD4V6ITOAzDr0CwIOXVWmo2rRpE2tr2bJlrG3Tpk2HPNbq1atjbQeWfzvYlVdeGWvbu3fvIY/fEB2JMdIYxqGlVatWsbajjz461rZx48Yar5s0iX99s7OzY21btmxJoXcNS12PkcY6Bi3FxcWxtmXLliV1LGtt6dpWc2jouBcmr3Xr1rG2fv36xdqi49Baz6J9+/axti5dusTaDl4qrjHxGSNeAWBjeMRsnUOyj+mtC2v9Em8M183XkTjXxno9fcdmdLt0junGoq7HSGMdgxbrHxM+Qh+X3AuTZ52XNQ6j21kBoO+xGiufMRLOtxIAAACSPJ8ANjTWo19rYWnrX6XWvxC6du1a47W1WPnll18ea7P+7GGx/hSyZ88er32Ruax/gVmpCM2bN/dqiz5lrq6ujm1j/QnFGtNlZWWxtq+//jrWhjCUlJR4bXfppZfWeP3444/HtrHa7rrrrlibNcYZg42T9Vl369Yt1mb9ufcvf/lLrO3Pf/5zjddnnXVWbJv7778/1nbdddfF2k488cRY2/r162Nt1j2zoeMJIAAAQGAIAAEAAAJDAAgAABAYAkAAAIDANLgiEKvAI9pmbbN79+5YW1VVVazNmuLlyy+/rPHaSu633tOap81Khi0sLIy1bdu2rcbrTz/9NLYNMkt0DFiftcUam5s3b461ReettPaz5pnct29frK1Tp05e+1ZUVNR4XVlZGdsGmaNdu3Y1Xlv3Jeu+17Fjx1jbkiVLYm2jRo2q8Xrr1q2xbaw5Ua3iDiv53pqzcteuXTVeW8n4jXV+1UzTtGnTWFuHDh1ibdFCSWs/S/R+I9nFmj/4wQ9qvD7hhBNi2yxatCjWZk3XZt0frTigRYsWh+yXNc6te3mm4AkgAABAYAgAAQAAAkMACAAAEJgsZ62hErF9+/ZYbkl9yc/Pj7VF/+5u/U3fl5UfE51I15psd+fOnWl9z7y8vBqvrfUKM2mN1/LycrVt27ZO3yOTxqF1rtEcPd+8JOsraE0EHj13axtrcmjr+2Dlp1p5OtE8xnXr1nm9Z32p63FYX2PQyic99thjY23Rvll5olaektVmveeqVau+sZ+16d27t9d7WvnVVn5i1IYNG2JtX331lWfv0qux3AutSet93zOat2ndI6z7TXTRBetYUnyiZutY1v3XWsTBymv26a/VL+uaWcc6EnmBPuOQJ4AAAACBIQAEAAAIDAEgAABAYAgAAQAAApPRE0E3a9Ys1pZKgUeUlQhvtUVZycs++9W2r5XMH53s1JpsM5OKQEJjfd7RpOPt27fHtvEdv9YEvjt27Kjx2pqI1EqiT2VsRgugopOhSnYyNNLLKviwPq9ocrw1RlIp2snNzT3ksaziEese5zPepPhk09Z+VnK/VQTAPdNf9+7dY21WsY31/Y9OuGwVOlqFG9Yk3z6TT1v7Wb8zrTGR7PfB+m5Zx7cKZ8rLy732rWs8AQQAAAgMASAAAEBgCAABAAACQwAIAAAQmIwuArGSia22aBKqz2oemSS6goQUP4d
0Fr8gdVZi8qZNm2q8tpJ/R40aFWv74osvYm3Lly+PteXk5ByyX74rOvgmMPv0gSKQ9LLuVdbqQ9GiIMkel1HWZ2+xkuOj48vaxjep3veeHL0XWv2vqKiItVE4lxrrOlvjK1rwIcU/W+veYv2ettqslTqirHucdS/0PSeLz/3R9/tgXTOrMKSu8QQQAAAgMASAAAAAgSEABAAACAwBIAAAQGAyugjEYiWJRgskrARLKyHUdyb6KN+VFSxW36wk+mhSq0+irWRfH6TGShy2rn002ffaa6+NbTNgwIBY23XXXRdrs4p+osf37Zdzzuv41ioi0fe0kq2RXp06dYq1+Sa0R1mfvbUqh+++yb6nNbYsPsf3Tdq3Vq2BP2ucWPcX6/OI3jesAop0rnyRyuo21r7WOIy2+RbSWQVKmfJ7mieAAAAAgSEABAAACAwBIAAAQGAIAAEAAAKT0UUgvoUPyaqP1UF8Zz+PFpr4JvxnSnJpY2IlOe/evTvWlpubW+P1/PnzY9usXr061nbhhRfG2u67775D9stKtk+3aLFIs2bN6vw9Q2ettuK7wkBd39NSSbZPF+sel8qKJLDvcVaxo/V7yCq2SXZ1oHSOX+ucrOP7FqhE9/UpfqlNpqxMxhNAAACAwBAAAgAABIYAEAAAIDAEgAAAAIHJ6CKQTEmUTJbVf2tGf999ozp06BBr27Bhg9fx4c9KhrZWxIgmPi9dujS2zbnnnhtru/HGG2NtPkUg6WYleEcT7n1WakBqfFZkkexx6VMYZO3nW0Th8/lb/U9l3ET3tfrlm5DvuxJTaHw/H98ih9atW9d4bX0+lZWVXsfy6UcqxY9W33zbfFgFJZmyohJ3cwAAgMAQAAIAAASGABAAACAwGZ0DaOVr+EyanMl8cxWiuYLW5LBMyntkWPkxPrlW1md20003ebVZk6tGx46Vk5JsX63jW+9BDmDds77XFRUVsTbrvheduNvKx7OkMwfM91jJjiVrP6vN6qtP7i72i44lyX8hg+j9y7oH+U7oXR+se2t07Fh9tfL9LJlS38DdHAAAIDAEgAAAAIEhAAQAAAgMASAAAEBgGk71RAOUyuSUPvtmSiJpY+d7naMJ91YCvjV5tyWVsXOk+Sbgw49v4YY1Ln0m7vYtCvKR7nFq9Td6PXy2qQ3j0maNJWucWEUOyV5T6z2T/Z2WyrGs8eQz6bNvEZ7PBPv1hSeAAAAAgSEABAAACAwBIAAAQGAIAAEAAALTKItArBnyoytr1Jdkk1ytWdnbtm3rdfxMSThtqKwE83SuiGAlWzekwgpr5YrKysp66EnjZX2HrXGZzpVa6noMJlvM4bMCimT3tSGtGnUkWZ+FVfDhu9JF9HjpHqvRfdO9+oy1nc/YsX7/+l6z+sATQAAAgMAQAAIAAASGABAAACAwBIAAAACBaXAZsT7FHFZCsMVKrE62iMI3MdlivWd03927d8e2admyZazNSsi39oU/K4HZKtyIbmeNm3QmPh+JopDoe1jvac2ITxGIn1atWnlt5/tZp3NM+Bwr3cn9PquUWN+r+iiSaeysa9WiRYtY265duw65r+84T3aVGut3qO9KHdY4t44X/d1qnbf1ntZ2OTk5sbaysrJYW10XkPBtAAAACAwBIAAAQGAIAAEAAAJDAAgAABCYjCkCsQoaLFZhRbQAI5VVP3wKPqxtmjdvHmvz7YfP6iBWIql1zTp06BBrowjEn5Uk3JASx30LVpItKrCuRbKr28B/bPmuYBH9rH1W1kg3a2yl8h3yOQff4j0rSR82qwDB+hyt33Pl5eU1Xrdp0ya2TSqFk+mU7NiMnqMkdezYMdaWyWOu4fxmAwAAQFoQAAIAAASGABAAACAwGZMDaOUD+EyQ7HusVPIC64NPzo+V22flAJaWlqalT6HyzauLqo/cwWQnUvVl5Xf55qchzrrHWblX9ZEb1ZBY31FyAP1Z49B3bPq0WQsU7Nix43C6eMj39GHdv6z8fR9WXv4xxxw
Ta7N+D2TKJOU8AQQAAAgMASAAAEBgCAABAAACQwAIAAAQmIzJ3rYSTpMt5rASpn0nCs0U0XPwvRbJJrRiP99JjX0mUra2sY6fzsKNVCb+TfY74jupNOKsxG/fQhvf5PJk+SSlp3vSZx/WmLSKDOCvVatWsTbrc7SKaKztokUa1mfme99IZd9k+Zyn70TZ1rGse761XV0Xr/IEEAAAIDAEgAAAAIEhAAQAAAgMASAAAEBgMqYIxEpCtVjJ0NHCh3QXR6SzWMS3H9EiEN+VAKxk6JYtW8barFVEYF8/K2HXp3DD2sY3Sd/azifx2ff4vgUEPu9pJS8nu3pKaKz7me+48Sm2SOWa+4yl+li9wJc1dtNZJNOYWL/jrJUuWrRoEWvzuUf4rspibefz+9faxvd3bbKrg1j7Wb+nrfO0Ckisa5vKaik+MvfbCwAAgDpBAAgAABAYAkAAAIDAEAACAAAEJmOKQKwkzp07d8barKRpnwIJa5UD38KKaN98V4uwWO9pnZMPa789e/bE2tq2bRtrowjEn++s8+lMMK/rggnfVUqirO+pdd7W2LQSn0NnJX5b1zhTC2iOxEog0XO3xpY1Bq3rmOy9trHzXQXIt1gknXzuS9Y2vr+nfYs0fMa1b0xhqY+VyXgCCAAAEBgCQAAAgMAQAAIAAASGABAAACAwGZMRa61WYSXsJpsQWh+spM5k++Z7LayE3M6dO8faNmzYkFQ/Gjvfz8dnpQvfopBUVgzxOVY6WWPaSo72TawOnTXerM/QSi63rns6P/9kj1XXq4Ok8h3N5JVL6pO18oW1CoW1ylayfFcQ8imssD5X38/aOnermNIq2IpK5fvnW2yYTnwbAAAAAkMACAAAEBgCQAAAgMAQAAIAAAQmY4pArORyq3hh8+bNhzyWVTBhHd9K/rSSXJs1a3bIY1latWoVa0t2pnCrX9YKH/Uxm3hj4ps4bG0X/Wx9E5N9V4eJJkj7Fo/4JiYnO5u+70ogiLOur5WAbn2vrST6qHSuTiPFE9Wt8Wydk5Xgnuz49V21wjo+RSA235V7rN+Zluh1tj7/yspKr2P5FEdYY8K36MwqnLSuh08/rPe04gDrPdu0aRNrKy8vP+R7poJvAwAAQGAIAAEAAAJDAAgAABCYjEnUsXILrNwoKz8m+jd23xwR31zBaD+svlrHsrbbvn17rM0SPZ7VL2vybCsvqD4mmGyofCc6rg91PcmzD2sspXPC89BYk8ta9zjrevpMDp9KTqgPazz45vtZ5+TT32hOtmRfM6sfvjlsofHN2fX9XRL9bH0/H4vP/cX3vm29p5Vfb41Nn98DVo1C3759Y21WDmB9TJSfGb/ZAAAAcMQQAAIAAASGABAAACAwBIAAAACByZgiEN/JcHNycmJt0WKI3bt3x7axEj2t9/SZSNlKaPXlO1FzNFnZSkqtqKg45H6SVFZW5tk7+BYvNKTCGisB37cQwCfx2foeUQTixyoCsQrFfCYGr227KN9Jky0+E/z6svrvM26svlrnbRXJMVG+P+u773v9ovta9whrImifyc0tvuPQOr51TsmOE6u4w2IdnyIQAAAA1DkCQAAAgMAQAAIAAASGABAAACAwGVMEsm3btqT3jRZ4tG7dOraNlVxsbZdsYUh0NRKrX7Xx6e+aNWti23Tu3DnWls6E1hD5JttbiejRsZPuQgirH1HWe1p99TmWFE+a9h1LvisLhM63kMu6R1jFItEx6FsokuwYse5xvqs6Jfv92LFjR6zNKg60Cmxgswo+UllVKvrZWmPOKnrwXYEj2pZscUptbT4r3FjXx7eQwzrPjh07xtrquoCTJ4AAAACBIQAEAAAIDAEgAABAYAgAAQAAApMxmdpWQUObNm1ibV988UWsbcuWLTVeWwmW+fn5sTYr+dNKuowez0pyto5lJb526NAh1mbNHv7555/XeG2tPmIlYFvHtxKwYbOS8q0CH58Edt9Ci0xhJT43a9asxmsr2doaX1aiPuJ
8rrlkjyWrCCR6z/Rd4cNXtL9W/33vhb59i567lWhvjUHrOqb7ejRmqRQ5RO8TqRSFWWMs2g+r4Mdqs1Yf8S08SWYbyb5nWr9T6qNwjieAAAAAgSEABAAACAwBIAAAQGAIAAEAAAKTMUUg0aIHyS4MiRZ8WKykS+v4FmtFkv79+9d4bc06b63UYR3LSkzevXu3V9+irGvhc31QO6uwxkp87tSp0yGP5bsqR7KsY/mu6OCbwBxlJdtbhTPwY90PWrZs6bWvdX+Jjktr7FpJ9cmOkVSKQHyL06L7lpeXx7axkvu7du0aa7MK7mAXGVqFXL73jei4sPbzXQkm2X2tMW3t57t6lu8qKFEbN2702s5aCaSu8QQQAAAgMASAAAAAgSEABAAACAwBIAAAQGAypgjESkItLS1N2/GthGlf0YRQq2jDt/gi2YIP1B/fxGefxGQrudi3MCTZAhJrP6sfluh21io4SJ5VlGAVR7Rt2zbWZn0WrVu3rvHauq9aBRO+BRnR74JV3GEVnviutmD1I7pqgrWKgjWerYIu69xhF7VZY8d3JZAoa1WRZAvR0s3qmzV2oqvs+BaxWEWj1nW0il7rWmZ8AgAAADhiCAABAAACQwAIAAAQmIzJAcxk0ZwTa/JWhMUnH893omaLlZcSzcmxJjq1crIsVs6UlZMTbbPGvtWPdE543ZhFc/YkaefOnbE263O18gejE95beYK+E01bOWBRVj6exRpv1vFbtGgRaysrK6vxet26dbFtevbsGWuz8qwyJe8s02zYsCHWZl0/30nffXIFrbFj5d5Zop9jKrnV1gTP7dq1i7VFx6ZvHrV1Laz7e33g2wAAABAYAkAAAIDAEAACAAAEhgAQAAAgMBSBeIgmyPbp06eeeoJMYRU++PBNHLaO7/Oe1kS6vgnSVjJ0NIHZmvyUIpDkbdq0KdZmFWlYyfdWcvlXX32Vno41MNYE+9a4ZCJo29q1a5Pe1yqsid5LPvnkk9g21sTH1rGswqBocZNvUYX1ndmxY0eszfq+rVmzpsZr34KVaGFWbVatWuW1XTrxBBAAACAwBIAAAACBIQAEAAAIjFcOYOj5PNF8Bt+F00NyJMZIfY1D33y56HY+29Qm2X1990u2LZVzOhLqui/pPr51Pa0239zRUPleR6st3RrzvdCS7KT4qXxm0e+D7wTf1vfI9z0z6Zr78OmvVwDoO/t3YxVN1LYSt0NXUVFhzp6e7veoD9ZNY8uWLXX6nlZiMg6trsdhusfg5s2b03q8UGVS8UtjvhdafAINq0jHWtEF6eMzDrOcx6dXXV2t0tJS5eTkJF39iMbJOaeKigrl5ubW+TJLjEPU5kiNQ8YgasO9EJngcMahVwAIAACAxoMiEAAAgMAQAAIAAASGABAAACAwBIApcM7p8ssvV58+fVRSUmIu5fLKK69o0KBBKigo0LBhw7Ry5UpJ0rZt21RSUqLi4mKdeOKJevzxxxP7vPfee8rPz1efPn105513Jtrvvvtu9ejRw1xCBwAAwFfQAWCqc2s9//zz2rx5sz799FPdcccduvHGG2PbdOnSRS+88IJWrlypO+64Q1dddZUkKScnRwsXLtSyZcv07rvv6mc/+1liapGrrrpK//3f/61//OMfiX0l6bTTTtO7776bUp8BAAAaVAC4Y8cOnX766SooKFBBQYEWLFggaX8gNnDgQBUVFenCCy+UJK1evVqjRo1SYWGhJkyYoK1bt0qSRo0apWnTpqmkpESzZs3SggULNHToUA0cOFAXXXSR9uzZ492fefPmacqUKZKkcePGadGiRbE5kYqLi3X00UdLkk466SStX79ekpSdna1WrVpJ2r+otHNOzjmVlpZq3759KiwsVHZ2ts4//3zNnz8/sf8xxxyT7OUDAACQ1MACwAULFqhTp05auXKlVqxYoaFDh2rjxo26+uqrNX/+fC1fvlwPPfSQJOmaa67RlVdeqRUrVuiUU07RHXfckThO06ZNtXTpUo0
fP14///nP9dprr+mDDz5Qr169avwp9oDbbrtN8+bNi7WXlpYqLy9PkpSVlaUOHTp84wTBv/3tbzVmzJjE623btqmoqEjdu3fX9ddfr86dO9c4piTl5eUlgkYAAIB08FoJJFMUFBRo2rRpuuGGG/S9731PQ4cO1RtvvKHRo0cngqaOHTtKkpYsWaLnnntOkjRlyhSNGzcucZzzzjtPkrR48eJEICntfxJ38HYHHJyHl6x3331XM2bM0Ntvv51oa9++vZYvX64NGzbonHPO0bnnnpvy+wAAABxKgwoA+/Xrp2XLlmn+/PmaPn26Jk+erB49epjbftPs6Af+9FpdXa1x48Zp5syZSfUnNzdX69evV0lJiZxzKisrU6dOnWLbrVmzRlOmTNEzzzxj/rxbt24qKirSm2++qWHDhtV44rd+/Xrl5uYm1T8AAABLg/oTcGlpqVq3bq2LL75Y06ZN07JlyzRkyBC99tpriaDpQK5fSUmJ5s6dK0l68sknNXLkyNjxhg4dqtdff12fffaZJGn79u1as2aNd3/Gjx+vWbNmSdqfhzh06NBY4FlWVqazzjpLDz/8sPLz8xPtGzZsSKznWF5eroULF6p///7Kzc1Vdna2VqxYoaqqKs2ZM0dnnnmmd58AAAAOpUEFgCtXrtRJJ52k4uJiPfDAA5o+fbq6du2qX/3qVxo3bpyKiop0zTXXSJJ+9atf6cEHH1RhYaEWLlyo22+/PXa8Ll266PHHH9fEiRNVWFiokSNHJoLBg9WWAzh+/Hh17NhRvXv31u233657771XkrR06VJdeumlkqSHH35Ya9as0fXXX6/i4mINHjxYkvTZZ59pxIgRKioq0ogRI3T11VeroKBAkvTQQw/pggsuUL9+/RJFL5J06623qnv37iorK1P37t11//33p+GqAgCA0LAWMAAAQGAa1BNAAAAApI4AEAAAIDAEgAAAAIFp9AFgaWmpJk+ebP6sZ8+e2rFjh/exli9frsGDB6u4uFinnHKKVq9eLUmqrKzUOeeco759++o73/mONm/eLGl/RfKZZ56pwsJCjRgxQp9//nnseE2aNEms9AEAAHAkNPoAMDc3V08++WRajnXLLbfozjvv1LJlyzRlyhTdd999kqQnnnhCvXr10ieffKKJEycmqoHvueceDR8+XCtWrNC9996rm266KXEs55xuvvlmffe7301L3wAAAHw1+gBw7dq1KikpkSTt2rVLEydO1IABA3TJJZfE1u09lKysrBpz9x1Yl/fgNYEvuuiixAokH3/8sUaPHi1JOuWUU/TCCy8k3nPWrFkaPXq0unXrlvpJAgAAHIZGHwAe7JFHHlFeXp7+/ve/a9KkSbE/yR5wxhlnqLS0NNb+n//5n5o+fbq6d++umTNnavr06ZJqrgncvn17bdu2TZJUWFioZ555RtL+iaLLy8u1detWlZeX64knntC1115bB2cJAADwzYIKAN966y2df/75kvYHeR06dDC3e+GFF8zl1x555BE9+uijWrduna6++upEAFibm2++WV988YUGDhyo+fPnq1evXsrOztYdd9yhG2+8UU2bNk39pAAAAA5TUAGg9M1rBB/KnDlzdMYZZ0iSJk2apEWLFkn6vzWBJWnbtm1q3769JKldu3aaNWuWPvjgA/3Xf/2Xqqqq1L59e/3tb3/TVVddpZ49e+rpp5/W1KlT9fLLL6d2YgAAAJ6CCgCHDx+up556SpL00ksvqays7LD279ixoxYvXixJevXVV9W/f39JNdcEnj17tsaPHy9pfzC4d+9eSdIDDzyQqEZeuHCh1q5dq7Vr1+rcc8/Vr3/9a40ZMyb1EwQAAPDQpL47cCRdeeWVuuiiizRgwAANHjxYPXr0MLc744wz9MQTT8T+DDxjxgxdccUVqq6uVrt27fSb3/xGkvTDH/5QF1xwgfr06aO8vDw9/fTTkvavXTx16lRlZWVpyJAhevTRR+v2BAEAADywFjAAAEBggvoTMAA
AAAgAAQAAgkMACAAAEBgCQAAAgMAQAAIAAASGABAAACAwXvMAVldXq7S0VDk5OSmtpIHGxzmniooK5ebm6qij+PcEAAANgVcAWFpaqmOPPbau+4IG7IsvvlD37t3ruxsAAMCDVwCYk5NT1/2oF926dYu1NW/ePNb2+eefH/JYXbp0ibVt2rQpuY4dAdaT3FTmBG+sYwQAgMbIKwBsrH/2tf5kmeyfMRvanz/THQA21jECAEBj1LCiFgAAAKTM6wlgQ9OmTZtYm/WnXetPtDNnzoy1FRcX13i9ZcuW2DYTJ06MtfXq1SvWZv05ed++fbG2dGpoTycBAEDdIjIAAAAIDAEgAABAYAgAAQAAAkMACAAAEJgGVwTSrl27WFt0DrpWrVp5Hcsq5nj88cdjbX/9619rvL7++uu9jmXNDeg7WfLXX39d43VZWVlsm8rKylibb8FHdXW113YAAKDx4QkgAABAYAgAAQAAAkMACAAAEJiMzgG0JnTu2bNnrC2aH1dRURHbxsp5s46/cOHCWNuUKVNqvH7xxRdj21i5d3v27Im17d69O9ZmieY6Wmvt/vOf/4y1WefJRNAAAOBgRAYAAACBIQAEAAAIDAEgAABAYAgAAQAAApPRRSDt27ePte3cuTPWFi2ssAohWrZsGWuzCiuaN28ea+vcuXON1717945ts3HjRq/3tIpArP5GC1u6du0a28aa8HrXrl2xNgAAgIPxBBAAACAwBIAAAACBIQAEAAAIDAEgAABAYDK6CMRiFUw0aVLzNPbt2xfbpqqqKtbWtGnTWJtVuPG73/3uG99PsotHysvLY23W6iDNmjWLtUXP0+q/7wof1jUDAADh4gkgAABAYAgAAQAAAkMACAAAEBgCQAAAgMBkTBGIVQhhFT5YfIocrG327t0ba/v6668P2TeryMRa4cM6J6tww6f/1nu2bt061rZjx45DHgsAAISNJ4AAAACBIQAEAAAIDAEgAABAYAgAAQAAApMxRSA5OTmxNqs4wiqGSFayK2RYhRxWW3Z2dqwt2f5bfbVWLQEAADgUngACAAAEhgAQAAAgMASAAAAAgSEABAAACEzGFIFYq2ZYhQ/WdtYqHOkU7YdvEYi1kkmyK4Hs2bMn1mZdiyZN4h9pOgtnAABAw8cTQAAAgMAQAAIAAASGABAAACAwGZMDmOykzNa+Vh5cKqy8vUP14XCOZeXoRbez8gktVl4gOYAAAOBgPAEEAAAIDAEgAABAYAgAAQAAAkMACAAAEJh6KQKxChUaI5/ikcPZzkfLli1jbbt27Urb8QEAQMPHE0AAAIDAEAACAAAEhgAQAAAgMASAAAAAgamXIhDflTpSWR0kVDk5ObG2LVu21ENPAABApuIJIAAAQGAIAAEAAAJDAAgAABAYAkAAAIDA1EsRiLVahbUahtW2b9++WFtdF4v4HN/axip2adq0aaytsrLykPta+1nXIp2rigAAgMaJaAEAACAwBIAAAACBIQAEAAAIDAEgAABAYDJ6JZBQWAUk0WKOvXv3xrbJzs6OtbVq1Sp9HQMAAI0STwABAAACQwAIAAAQGAJAAACAwNRLMp41WbFPHlzIrOtj5QBabS1atIi1WZNPAwCAMBBhAQAABIYAEAAAIDAEgAAAAIEhAAQAAAhMxszInEoRSEMvFvEpimnevLnXsb7++utYm7UvRSAAAISrYUdOAAAAOGwEgAAAAIEhAAQAAAgMASAAAEBgMqYIxCqEaNq0aaxt9+7dXvseab6rm1iaNDn0x2BtYxV87Nu3L9bWoUOHWFt5eblX3wAAQONT/5ETAAAAjigCQAAAgMAQAAIAAASGABAAACAw9VIEkp2dHWvzLZjA/7Guo1UEsnfv3iPRHQAA0EDwBBAAACAwBIAAAACBIQAEAAAIDAEgAABAYOqlCMRawcJ3NY9UVtyoS1Yf0rlCie85VlVVxdqaN2+etn4AAICGjyeAAAAAgSEABAAACAwBIAAAQGAIAAEAAAJ
TL0UgTZr4va1V+JAJBR+WVAo+fApbrNU8rJVAmjZtGmtr0aJFUu8JAAAaJ54AAgAABIYAEAAAIDAEgAAAAIGplxzAPXv2xNpatmwZa7MmNbb45K6lMtF0svvt27cv1mZNgm2de3Q7K7fPN0fSasvJyYm1lZeXx9oAAEDjwxNAAACAwBAAAgAABIYAEAAAIDAEgAAAAIHJ6ImgrYmOM3WyYt9+JTsBs3XNrCKTioqKWFuzZs1ibdbk0BSBAAAQBp4AAgAABIYAEAAAIDAEgAAAAIEhAAQAAAhMvRSBWIUQvsURya7UkYp0rjSS7Hv6rvBhFXxkynUEAACZgSgAAAAgMASAAAAAgSEABAAACAwBIAAAQGDqpQjEWuFj7969Xvvu2bMnbf2wCiGsvjVt2rTGa6uooqqqKtZmrdTh24/oyh/Wsaxr5lvc0bx5c6/tAABA48MTQAAAgMAQAAIAAASGABAAACAwBIAAAACBqZcikGiBgyRt3bo11paTkxNr81lFxGflDsku+EhnQYYvq7/R9+jatWtsm9atW8faSktLY227d++OtbVs2fJwuggAABoRngACAAAEhgAQAAAgMASAAAAAgSEABAAACEy9FIFYRQ+7du2KtVmFCu3atYu17dy585DH910hw9q3vLw8qWNZ2/kWqHTq1KnG640bN8a2+fLLL2Nt1vVp1qyZV98AAEAYiAIAAAACQwAIAAAQGAJAAACAwNRLDqCVk2bZsmVLrM2aRDqaz5ZKfpu1b3TCZWvSZ2tyaCvfz2pr1arVIY9n5fv5iuYTSnbOJQAACANPAAEAAAJDAAgAABAYAkAAAIDAEAACAAAEpl6KQEpLS5Pet6ysLNaWk5NT47VVyOHbZolul52dHdtmz549sTarYKVp06axNqsgI9lrVFFREWuzilYoAgEAIFw8AQQAAAgMASAAAEBgCAABAAAC45UD6JxL65umcjxrX2tyZZ9tfPZL5VipvGey18jaL92fn+/7AgCAzOQVAFqFBanYvXt30vtaBQ1WYUiorEAslevtq6KiQu3atavz9wEAAKnLch6Pbqqrq1VaWqqcnBxlZWUdiX6hgXDOqaKiQrm5uSktwQcAAI4crwAQAAAAjQePbAAAAAJDAAgAABAYAkAAAIDAEAACAAAEhgAQAAAgMASAAAAAgSEABAAACAwBIAAAQGAIAAEAAAJDAAgAABCY/w/qXjtGKAiroAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Label issues\n", + "\n", + "Let's first inspect mislabeled examples in the dataset. Such errors occur when the given label for an image is incorrect, usually due to mistakes made by data annotators. Cleanlab automatically detects mislabeled data that you can correct to improve your dataset.\n", + "\n", + "For each type of issue that Cleanlab detects, you can use the `get_issues` method to see which examples in the dataset exhibit this type of issue (and how severely). Let's see which images in our dataset are estimated to be mislabeled:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:14.899117Z", + "iopub.status.busy": "2024-06-25T23:02:14.898594Z", + "iopub.status.idle": "2024-06-25T23:02:14.960639Z", + "shell.execute_reply": "2024-06-25T23:02:14.960116Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " is_label_issue label_score given_label predicted_label\n", + "0 False 0.166980 T - shirt / top Dress\n", + "1 False 0.986195 T - shirt / top T - shirt / top\n", + "2 False 0.997205 Sandal Sandal\n", + "3 False 0.948781 Sandal Sandal\n", + "4 False 0.999358 Dress Dress" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issues = lab.get_issues(\"label\")\n", + "label_issues.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above DataFrame contains a `label_score` for each example in the dataset. These numeric quality scores lie between 0 and 1, where lower scores indicate examples more likely to be mislabeled. It also contains a boolean column `is_label_issue` specifying whether each example appears to have a label issue (i.e. is likely mislabeled).\n", + "\n", + "Filter the `label_issues` DataFrame to keep only examples with label issues, then sort by `label_score` (in ascending order) to see the most likely mislabeled examples first." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:14.962765Z", + "iopub.status.busy": "2024-06-25T23:02:14.962425Z", + "iopub.status.idle": "2024-06-25T23:02:14.971039Z", + "shell.execute_reply": "2024-06-25T23:02:14.970570Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " is_label_issue label_score given_label predicted_label\n", + "11262 True 0.000003 Coat T - shirt / top\n", + "19228 True 0.000010 Dress Shirt\n", + "53564 True 0.000018 Pullover T - shirt / top\n", + "54078 True 0.000022 Pullover Dress\n", + "17371 True 0.000025 Pullover T - shirt / top" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issues_df = label_issues.query(\"is_label_issue\").sort_values(\"label_score\")\n", + "label_issues_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
We define a helper method plot_label_issue_examples to visualize results. **(click to expand)**\n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "def plot_label_issue_examples(label_issues_df, num_examples=15):\n", + " ncols = 5\n", + " nrows = int(math.ceil(num_examples / ncols))\n", + "\n", + " _, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(1.5 * ncols, 1.5 * nrows))\n", + " axes_list = axes.flatten()\n", + " label_issue_indices = label_issues_df.index.values\n", + "\n", + " for i, ax in enumerate(axes_list):\n", + " if i >= num_examples:\n", + " ax.axis(\"off\")\n", + " continue\n", + " idx = int(label_issue_indices[i])\n", + " row = label_issues.loc[idx]\n", + " ax.set_title(\n", + " f\"id: {idx}\\n GL: {row.given_label}\\n SL: {row.predicted_label}\",\n", + " fontdict={\"fontsize\": 8},\n", + " )\n", + " ax.imshow(dataset[idx][\"image\"], cmap=\"gray\")\n", + " ax.axis(\"off\")\n", + " plt.subplots_adjust(hspace=0.7)\n", + " plt.show()\n", + "```\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:14.973075Z", + "iopub.status.busy": "2024-06-25T23:02:14.972744Z", + "iopub.status.idle": "2024-06-25T23:02:14.978127Z", + "shell.execute_reply": "2024-06-25T23:02:14.977569Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "def plot_label_issue_examples(label_issues_df, num_examples=15):\n", + " ncols = 5\n", + " nrows = int(math.ceil(num_examples / ncols))\n", + "\n", + " _, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(1.5 * ncols, 1.5 * nrows))\n", + " axes_list = axes.flatten()\n", + " label_issue_indices = label_issues_df.index.values\n", + "\n", + " for i, ax in enumerate(axes_list):\n", + " if i >= num_examples:\n", + " ax.axis(\"off\")\n", + " continue\n", + " idx = int(label_issue_indices[i])\n", + " row = label_issues.loc[idx]\n", + " ax.set_title(\n", + " f\"id: {idx}\\n GL: {row.given_label}\\n SL: {row.predicted_label}\",\n", + " fontdict={\"fontsize\": 8},\n", + " )\n", + " ax.imshow(dataset[idx][\"image\"], cmap=\"gray\")\n", + " ax.axis(\"off\")\n", + " plt.subplots_adjust(hspace=0.7)\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View most likely examples with label errors\n", + "\n", + "In the plots below:\n", + "\n", + "- `GL`: given label in the original dataset\n", + "- `SL`: suggested alternative label from cleanlab" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:14.980350Z", + "iopub.status.busy": "2024-06-25T23:02:14.979967Z", + "iopub.status.idle": "2024-06-25T23:02:15.480861Z", + "shell.execute_reply": "2024-06-25T23:02:15.480294Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_label_issue_examples(label_issues_df, num_examples=15)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Outlier issues\n", + "\n", + "Datalab also detects atypical images lurking in our dataset. Such outliers differ significantly from the majority of the dataset and may have an outsized impact on how models fit to this data.\n", + "\n", + "As in the previous section, we filter the outlier issues DataFrame (`outlier_issues_df`) to keep only examples flagged as outliers. We then sort the filtered results by `outlier_score`, where the lowest-scoring examples are those that appear least typical relative to the rest of the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:15.483156Z", + "iopub.status.busy": "2024-06-25T23:02:15.482820Z", + "iopub.status.idle": "2024-06-25T23:02:15.490945Z", + "shell.execute_reply": "2024-06-25T23:02:15.490499Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " is_outlier_issue outlier_score\n", + "27080 True 3.873833e-07\n", + "40378 True 6.915575e-07\n", + "25316 True 1.390277e-06\n", + "2090 True 3.751164e-06\n", + "14999 True 3.881301e-06" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "outlier_issues_df = lab.get_issues(\"outlier\")\n", + "outlier_issues_df = outlier_issues_df.query(\"is_outlier_issue\").sort_values(\"outlier_score\")\n", + "outlier_issues_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View most severe outliers\n", + "\n", + "In this visualization, the first image in every row shows the potential outlier, while the remaining images in the same row depict typical instances from the corresponding class." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
We define a helper method plot_outlier_issues_examples to visualize results. **(click to expand)**\n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "def plot_outlier_issues_examples(outlier_issues_df, num_examples):\n", + " ncols = 4\n", + " nrows = num_examples\n", + " N_comparison_images = ncols - 1\n", + "\n", + " def sample_from_class(label, number_of_samples, index):\n", + " index = int(index)\n", + "\n", + " non_outlier_indices = (\n", + " label_issues.join(outlier_issues_df)\n", + " .query(\"given_label == @label and is_outlier_issue.isnull()\")\n", + " .index\n", + " )\n", + " non_outlier_indices_excluding_current = non_outlier_indices[non_outlier_indices != index]\n", + "\n", + " sampled_indices = np.random.choice(\n", + " non_outlier_indices_excluding_current, number_of_samples, replace=False\n", + " )\n", + "\n", + " label_scores_of_sampled = label_issues.loc[sampled_indices][\"label_score\"]\n", + "\n", + " top_score_indices = np.argsort(label_scores_of_sampled.values)[::-1][:N_comparison_images]\n", + "\n", + " top_label_indices = sampled_indices[top_score_indices]\n", + "\n", + " sampled_images = [dataset[int(i)][\"image\"] for i in top_label_indices]\n", + "\n", + " return sampled_images\n", + "\n", + " def get_image_given_label_and_samples(idx):\n", + " image_from_dataset = dataset[idx][\"image\"]\n", + " corresponding_label = label_issues.loc[idx][\"given_label\"]\n", + " comparison_images = sample_from_class(corresponding_label, 30, idx)[:N_comparison_images]\n", + "\n", + " return image_from_dataset, corresponding_label, comparison_images\n", + "\n", + " count = 0\n", + " images_to_plot = []\n", + " labels = []\n", + " idlist = []\n", + " for idx, row in outlier_issues_df.iterrows():\n", + " idx = row.name\n", + " image, label, comparison_images = get_image_given_label_and_samples(idx)\n", + " labels.append(label)\n", + " 
idlist.append(idx)\n", + " images_to_plot.append(image)\n", + " images_to_plot.extend(comparison_images)\n", + " count += 1\n", + " if count >= nrows:\n", + " break\n", + "\n", + " ncols = 1 + N_comparison_images\n", + " fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(1.5 * ncols, 1.5 * nrows))\n", + " axes_list = axes.flatten()\n", + " for i, ax in enumerate(axes_list):\n", + " if i % ncols == 0:\n", + " ax.set_title(f\"id: {idlist[i // ncols]}\\n GL: {labels[i // ncols]}\", fontdict={\"fontsize\": 8})\n", + " ax.imshow(images_to_plot[i], cmap=\"gray\")\n", + " ax.axis(\"off\")\n", + " plt.subplots_adjust(hspace=0.7)\n", + " plt.show()\n", + "```\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:15.493081Z", + "iopub.status.busy": "2024-06-25T23:02:15.492759Z", + "iopub.status.idle": "2024-06-25T23:02:15.775300Z", + "shell.execute_reply": "2024-06-25T23:02:15.774747Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "def plot_outlier_issues_examples(outlier_issues_df, num_examples):\n", + " ncols = 4\n", + " nrows = num_examples\n", + " N_comparison_images = ncols - 1\n", + "\n", + " def sample_from_class(label, number_of_samples, index):\n", + " index = int(index)\n", + "\n", + " non_outlier_indices = (\n", + " label_issues.join(outlier_issues_df)\n", + " .query(\"given_label == @label and is_outlier_issue.isnull()\")\n", + " .index\n", + " )\n", + " non_outlier_indices_excluding_current = non_outlier_indices[non_outlier_indices != index]\n", + "\n", + " sampled_indices = np.random.choice(\n", + " non_outlier_indices_excluding_current, number_of_samples, replace=False\n", + " )\n", + "\n", + " label_scores_of_sampled = label_issues.loc[sampled_indices][\"label_score\"]\n", + "\n", + " top_score_indices = np.argsort(label_scores_of_sampled.values)[::-1][:N_comparison_images]\n", + "\n", + " top_label_indices = sampled_indices[top_score_indices]\n", + "\n", + " sampled_images = [dataset[int(i)][\"image\"] for i in top_label_indices]\n", + "\n", + " return sampled_images\n", + "\n", + " def get_image_given_label_and_samples(idx):\n", + " image_from_dataset = dataset[idx][\"image\"]\n", + " corresponding_label = label_issues.loc[idx][\"given_label\"]\n", + " comparison_images = sample_from_class(corresponding_label, 30, idx)[:N_comparison_images]\n", + "\n", + " return image_from_dataset, corresponding_label, comparison_images\n", + "\n", + " count = 0\n", + " images_to_plot = []\n", + " labels = []\n", + " idlist = []\n", + " for idx, row in outlier_issues_df.iterrows():\n", + " idx = row.name\n", + " 
image, label, comparison_images = get_image_given_label_and_samples(idx)\n", + " labels.append(label)\n", + " idlist.append(idx)\n", + " images_to_plot.append(image)\n", + " images_to_plot.extend(comparison_images)\n", + " count += 1\n", + " if count >= nrows:\n", + " break\n", + "\n", + " ncols = 1 + N_comparison_images\n", + " fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(1.5 * ncols, 1.5 * nrows))\n", + " axes_list = axes.flatten()\n", + " for i, ax in enumerate(axes_list):\n", + " if i % ncols == 0:\n", + " ax.set_title(\n", + " f\"id: {idlist[i // ncols]}\\n GL: {labels[i // ncols]}\", fontdict={\"fontsize\": 8}\n", + " )\n", + " ax.imshow(images_to_plot[i], cmap=\"gray\")\n", + " ax.axis(\"off\")\n", + " plt.subplots_adjust(hspace=0.7)\n", + " plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:15.777400Z", + "iopub.status.busy": "2024-06-25T23:02:15.777083Z", + "iopub.status.idle": "2024-06-25T23:02:16.225557Z", + "shell.execute_reply": "2024-06-25T23:02:16.224959Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_outlier_issues_examples(outlier_issues_df, num_examples=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Near duplicate issues\n", + "\n", + "Datalab also detects which examples are (near) duplicates of other examples in the dataset. Near duplicate images in a dataset can lead to model overfitting and have an outsized impact on evaluation metrics (especially when you have duplicates between training and test splits).\n", + "\n", + "The `near_duplicate_issues` DataFrame tells us which examples are considered to be nearly duplicated in the dataset (including exact duplicates as well). We can sort all images via the `near_duplicate_score` which quantifies how severe this issue is for each image (lower values indicate more severe instances of a type of issue, in this case, how similar the image is to its closest neighbor in the dataset).\n", + "\n", + "This allows us to visualize examples in the dataset that are considered nearly duplicated, along with their highly similar counterparts." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:16.227925Z", + "iopub.status.busy": "2024-06-25T23:02:16.227567Z", + "iopub.status.idle": "2024-06-25T23:02:16.243687Z", + "shell.execute_reply": "2024-06-25T23:02:16.243205Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
       is_near_duplicate_issue  near_duplicate_score  near_duplicate_sets  distance_to_nearest_neighbor
30659  True                     0.001267              [30968]              0.000022
30968  True                     0.001267              [30659]              0.000022
3370   True                     0.001454              [47824]              0.000026
47824  True                     0.001454              [3370]               0.000026
9762   True                     0.001854              [54565, 258, 47139]  0.000033
\n", + "
" + ], + "text/plain": [ + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets \\\n", + "30659 True 0.001267 [30968] \n", + "30968 True 0.001267 [30659] \n", + "3370 True 0.001454 [47824] \n", + "47824 True 0.001454 [3370] \n", + "9762 True 0.001854 [54565, 258, 47139] \n", + "\n", + " distance_to_nearest_neighbor \n", + "30659 0.000022 \n", + "30968 0.000022 \n", + "3370 0.000026 \n", + "47824 0.000026 \n", + "9762 0.000033 " + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "near_duplicate_issues_df = lab.get_issues(\"near_duplicate\")\n", + "near_duplicate_issues_df = near_duplicate_issues_df.query(\"is_near_duplicate_issue\").sort_values(\n", + " \"near_duplicate_score\"\n", + ")\n", + "near_duplicate_issues_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View sets of near duplicate images" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
We define a helper method plot_near_duplicate_issue_examples to visualize results. **(click to expand)**\n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "def plot_near_duplicate_issue_examples(near_duplicate_issues_df, num_examples=3):\n", + " nrows = num_examples\n", + " seen_id_pairs = set()\n", + "\n", + " def get_image_and_given_label_and_predicted_label(idx):\n", + " image = dataset[idx][\"image\"]\n", + " label = label_issues.loc[idx][\"given_label\"]\n", + " predicted_label = label_issues.loc[idx][\"predicted_label\"]\n", + " return image, label, predicted_label\n", + "\n", + " count = 0\n", + " for idx, row in near_duplicate_issues_df.iterrows():\n", + " image, label, predicted_label = get_image_and_given_label_and_predicted_label(idx)\n", + " duplicate_images = row.near_duplicate_sets\n", + " nd_set = set([int(i) for i in duplicate_images])\n", + " nd_set.add(int(idx))\n", + "\n", + " if nd_set & seen_id_pairs:\n", + " continue\n", + "\n", + " _, axes = plt.subplots(1, len(nd_set), figsize=(len(nd_set), 3))\n", + " for i, ax in zip(list(nd_set), axes):\n", + " label = label_issues.loc[i][\"given_label\"]\n", + " ax.set_title(f\"id: {i}\\n GL: {label}\", fontdict={\"fontsize\": 8})\n", + " ax.imshow(dataset[i][\"image\"], cmap=\"gray\")\n", + " ax.axis(\"off\")\n", + " seen_id_pairs.update(nd_set)\n", + " count += 1\n", + " if count >= nrows:\n", + " break\n", + "\n", + " plt.show()\n", + "```\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:16.245847Z", + "iopub.status.busy": "2024-06-25T23:02:16.245511Z", + "iopub.status.idle": "2024-06-25T23:02:16.252358Z", + "shell.execute_reply": "2024-06-25T23:02:16.251902Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "def plot_near_duplicate_issue_examples(near_duplicate_issues_df, num_examples=3):\n", + " nrows = num_examples\n", + " seen_id_pairs = set()\n", + "\n", + " def get_image_and_given_label_and_predicted_label(idx):\n", + " image = dataset[idx][\"image\"]\n", + " label = label_issues.loc[idx][\"given_label\"]\n", + " predicted_label = label_issues.loc[idx][\"predicted_label\"]\n", + " return image, label, predicted_label\n", + "\n", + " count = 0\n", + " for idx, row in near_duplicate_issues_df.iterrows():\n", + " image, label, predicted_label = get_image_and_given_label_and_predicted_label(idx)\n", + " duplicate_images = row.near_duplicate_sets\n", + " nd_set = set([int(i) for i in duplicate_images])\n", + " nd_set.add(int(idx))\n", + "\n", + " if nd_set & seen_id_pairs:\n", + " continue\n", + "\n", + " _, axes = plt.subplots(1, len(nd_set), figsize=(len(nd_set), 3))\n", + " for i, ax in zip(list(nd_set), axes):\n", + " label = label_issues.loc[i][\"given_label\"]\n", + " ax.set_title(f\"id: {i}\\n GL: {label}\", fontdict={\"fontsize\": 8})\n", + " ax.imshow(dataset[i][\"image\"], cmap=\"gray\")\n", + " ax.axis(\"off\")\n", + " seen_id_pairs.update(nd_set)\n", + " count += 1\n", + " if count >= nrows:\n", + " break\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:16.254324Z", + "iopub.status.busy": "2024-06-25T23:02:16.254134Z", + "iopub.status.idle": "2024-06-25T23:02:16.646922Z", + "shell.execute_reply": "2024-06-25T23:02:16.646200Z" + } + }, + "outputs": [ + { + "data": { + 
"image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_near_duplicate_issue_examples(near_duplicate_issues_df, num_examples=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Learn more about handling near duplicates detected in a dataset from [the FAQ](../faq.html#How-to-handle-near-duplicate-data-identified-by-cleanlab?)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Dark images\n", + "\n", + "Datalab can also detect low-quality images in the dataset, such as those that are abnormally dark. It can be challenging for both annotators and models to assign a proper class label for low-quality data, which can hamper model training and testing.\n", + "\n", + "The `dark_issues` DataFrame reveals which examples are considered to be abnormally dark. We can sort them via the `dark_score` which quantifies how severe this issue is for each image (lower values indicate more severe instances of a type of issue). This allows us to visualize images in the dataset considered to be too dark (you might consider omitting such low-quality examples from a training dataset)." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:16.649430Z", + "iopub.status.busy": "2024-06-25T23:02:16.649242Z", + "iopub.status.idle": "2024-06-25T23:02:16.658002Z", + "shell.execute_reply": "2024-06-25T23:02:16.657331Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
       dark_score  is_dark_issue
34848  0.203922    True
50270  0.204588    True
3936   0.213098    True
733    0.217686    True
8094   0.230118    True
\n", + "
" + ], + "text/plain": [ + " dark_score is_dark_issue\n", + "34848 0.203922 True\n", + "50270 0.204588 True\n", + "3936 0.213098 True\n", + "733 0.217686 True\n", + "8094 0.230118 True" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dark_issues = lab.get_issues(\"dark\")\n", + "dark_issues_df = dark_issues.query(\"is_dark_issue\").sort_values(\"dark_score\")\n", + "dark_issues_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View top examples of dark images" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
We define a helper method plot_image_issue_examples to visualize results. **(click to expand)**\n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "def plot_image_issue_examples(issues_df, num_examples=15):\n", + " ncols = 5\n", + " nrows = int(math.ceil(num_examples / ncols))\n", + "\n", + " _, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(1.5 * ncols, 1.5 * nrows))\n", + " axes_list = axes.flatten()\n", + " issue_indices = issues_df.index.values\n", + "\n", + " for i, ax in enumerate(axes_list):\n", + " if i >= num_examples:\n", + " ax.axis(\"off\")\n", + " continue\n", + " idx = int(issue_indices[i])\n", + " label = label_issues.loc[idx][\"given_label\"]\n", + " predicted_label = label_issues.loc[idx][\"predicted_label\"]\n", + " ax.set_title(\n", + " f\"id: {idx}\\n GL: {label}\\n SL: {predicted_label}\",\n", + " fontdict={\"fontsize\": 8},\n", + " )\n", + " ax.imshow(dataset[idx][\"image\"], cmap=\"gray\")\n", + " ax.axis(\"off\")\n", + "\n", + " plt.subplots_adjust(hspace=0.7)\n", + " plt.show()\n", + "```\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:16.660364Z", + "iopub.status.busy": "2024-06-25T23:02:16.660180Z", + "iopub.status.idle": "2024-06-25T23:02:16.665460Z", + "shell.execute_reply": "2024-06-25T23:02:16.664906Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "def plot_image_issue_examples(issues_df, num_examples=15):\n", + " ncols = 5\n", + " nrows = int(math.ceil(num_examples / ncols))\n", + "\n", + " _, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(1.5 * ncols, 1.5 * nrows))\n", + " axes_list = axes.flatten()\n", + " issue_indices = issues_df.index.values\n", + "\n", + " for i, ax in enumerate(axes_list):\n", + " if i >= num_examples:\n", + " ax.axis(\"off\")\n", + " continue\n", + " idx = int(issue_indices[i])\n", + " label = label_issues.loc[idx][\"given_label\"]\n", + " predicted_label = label_issues.loc[idx][\"predicted_label\"]\n", + " ax.set_title(\n", + " f\"id: {idx}\\n GL: {label}\\n SL: {predicted_label}\",\n", + " fontdict={\"fontsize\": 8},\n", + " )\n", + " ax.imshow(dataset[idx][\"image\"], cmap=\"gray\")\n", + " ax.axis(\"off\")\n", + "\n", + " plt.subplots_adjust(hspace=0.7)\n", + " plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:16.667773Z", + "iopub.status.busy": "2024-06-25T23:02:16.667544Z", + "iopub.status.idle": "2024-06-25T23:02:16.846978Z", + "shell.execute_reply": "2024-06-25T23:02:16.846384Z" + } + }, + "outputs": [ + { + "data": { + "image/png": 
"
A4Nq1a655s7lZh4b1sufPn3emzTAjFZzZ9K+vNfOaBYCEhARneuPGja6yHj16ONOnT592lf397393zZshQR1O0qGKS5cuOdM6tKTPh5KqTJkyzrR+069WrVp5LpcX83rS4Rrzre7WrVu7ytatW+eaN98k1+Ejfe8vX758vt9pnif6nlJSmdcr4D6e5nHPi3lt6RD+mjVrnOmnn346388BQO3atZ3pBg0aWL/TrEN9/Zr3bFt4EPAWIvSKLVlEREREPuBDFhEREZEP+JBFRERE5IOI6sIhKSnJNW/GW728uqlzPeLj451p/Ur/xYsXC71es2fpXbt2ucp0nNkP0fZ6sO4uw8ylCpb/Yu6r7jrDzCPQZTrWbp5Hujd4nVNi5onpnsP9jOHfEC1dOJjHqWHDhq4ys250vpzuTsGc1zl6e/bsKfD2NG/ePN/v0MfAzAPTzJ6nf/jhhwJ/vxeReA3rV/V1no7JzGnT14S+nsyudMzPAe6uF/Tn9L3BzAPTuZK2bdDfae6n/ly4crQisX418ziY1w7gznPS9aCPmflbeuHCBVfZAw884Ex//vnn+X4HADRp0iTPdebFdm5+//33+ZbpHK3C3s/ZhQMRERFRMeFDFhEREZEPir0Lh7JlyzrTofTmaoZ+dBjIpMMHXpbVoSizmVO/2lwU4cJoo4+12fyckZHhKrOdC7orALNpWjcD6yZ2L/VthpnNV8GB3KGHkswMEepjaIaBQmmir1+/fr6f06FF8zq11Sng7g5CL2uGpXSoJJavb/3avBk21ee9+Vq/TqfQYT/zfqnLzPunDh/pe6tZF17OIX3+mevRYSez93K9bKwx76c6PGeG+PWxNtM9AHd3D/qaNK8lHZLUozLYuuTR22CeK2Y4GnCfm8V5v2ZLFhEREZEP+JBFRERE5AM+ZBERERH5oNhzsszYrY7jmnFUW7cMRUXHoM1tCtfo4LFM52GcOHHCmdY5EeYwKgBw8OBBZ3rbtm2uMj2Uhknn2ZjnmC1HQ9PfUZJzsvS1Z3aLol/dtg21YhvqQudemPWoy2x5OcHuE+Y1rLfHHAamJOVc6uNg1qEeWsU8vvr60fdsc942TJX+HdDXsLl9wYbysV3T5nmjzyG9nzpHK1bp42DmsOrfOFsOnv6tfPPNN/NcJ5C7vs1jrfOgzRxuvb16e7zcX/zEliwiIiIiH/Ahi4iIiMgHfMgiIiIi8kGx52SZdEzVjAHrGLltSA5bHoatnyRNf6fOwzFjybZcBb1fJYkZF9f1YsbFJ06c6CpLTEx0zU+ePNmZ1rkB5np0HF5/pzkMgl6P7lPLXK/OGyjJatas6Zo3rz1bHoQegkLnW9iW9bJe2/AjtvXY8nt0P2mxTNeLeQ3pY2ReIzr/qbB9S3nJl9HL6vo15205mGb+HRDb92yd12TbV1td6DLzeOrfZ1t/mLovLHNZL33r6fXackWLEluyiIiIiHzAhywiIiIiHxR7uNAMs4XSLYMfXTrYXkEG3KFHW5gqlpuegzHDC7pZ2Bwq5dNPP3WV7d271zU/bNgwZ3rWrFn5fl84R703m591E3tJpruzsL0KH67rsiheudbXqW2/YokOAeqUCvPeFsrwUoU9F/T22bqN0PMFDXVqxdFFUFGxXb+hdEVU0GvEFtIF3Mfey/YEG/anuLAli4iIiMgHfMgiIiIi8gEfsoiIiIh8UOw5WZEW+za3R7+GaltWM4fhOHbsWOgbFqXM/A7bsEmbNm1ylQ0ZMsQ1P2XKFGfalpMVCp0bYBtypSSzdXWh83lsOXJ6WVsOlO346+3xUlfmsvo7zRyeYMOuRPMwS8GOly3XxnxNXuc82YbOsX2HlxxW/Z3B5vOjc7liucsW29A0ukzn0RaWrZsdL/dd27loG4JH32uKchg8/nIQERER+YAPWUREREQ+KPZwodnsbusioTgEa7Y2w4n6tVi+8n+d2dxrCx3p4zd16tR85/Vr5GY96fCAbm62bYNtVHmGC/9Hn9t
mb9m2ZnlbT+yA/RjbwgS2zxU2dKjn9ffbQt/RTodSzOtCXyPmtaivLVuXGH7R179ZT7YeyrVIS2MJp8L2qB/K73G4rm19bprltt9rfc9iuJCIiIgoyvEhi4iIiMgHfMgiIiIi8kGx52RFGi+vD9uWjeWYvhe242Dm6Oh8HbMLDC0Shimy5evEOltula5v2+vYhR0CKZT619tg7outTIulOtd1puvFzF3yst96vQW9J3r5nK4zW5cNtnxNWzcC0S7a8oPDlWNp0kPsFGUOJVuyiIiIiHzAhywiIiIiH/Ahi4iIiMgHUZOTpfvoCDbkTTh4yavS/W5UrVo13/XEUrw/GDOvpbB9HQHuPBEvQy34xcxz8DJ8SCwyz+dQhrixfa6wdewlz8rWx5f+/uLuwy+c9DHS/UfZ+pMyPxuu4Y2Cfc5Lzo6tnsz7sm0fo12wczWW8gvzo/thPHXqVJF9N1uyiIiIiHzAhywiIiIiH0RUm7ctBBisG3wzZOElPOdldG69XnNZPVq5+cqoXyObRwMzhKBfDTfLdB0VNtQQzqZvc116vebr4LEeLjSHvsqL7ZgXtj5snwslLOVlWCU/wqDRQO+bOXSOfvXdXDbYeVLQLjv0fdbW9YI+T/RnzXuv3nZzvbpMh5fOnDnjTEdbaFF3X2AbPkr//oXrfurlHmH+Juu61/da27LmvhTn9Rq7dwoiIiKiYsSHLCIiIiIf8CGLiIiIyAdFnpOl48MmHQ82461eumyw5WDpsnLlyrnmbd9j69JBx/TN/dRDxMRyTpYt3l8cdC6NmRcSLN/ALNf7UZKGTQpWh7ZXxM3jbes+IRS6Hr2ccwUdIkjfN2zDt0Q7nXNkHk99fzx37pwzXblyZVeZl3zXwvJS1+a2AkBCQoIzHcv1GUvdjdiudX3emtdvsHxBP7Eli4iIiMgHfMgiIiIi8kGRtyPqpkuzSc/WnFwcPb4HY2uGNUOCOlx45MgR37Yp0ti6cDD5FVYs6GvjwcRyj9/B6NCobpb3IwwUCWxdjMRSeEnXr62+bb3B665qLly4UOBt8NItgnkt6nQPG53SUa9ePWc62MgAxZ32EIpgoydE876ZbN156HPa7JYE8Lcbntg4ukREREQRhg9ZRERERD7gQxYRERGRD4o8sUTHRgvaTYPO+/AydI5fzG2y5Yx5yRuIdrauDbwMraDXU9jcKi/dBng5p0rCyPU3BMvpMM/9YLkthf1O2/eHK6dE17fOMYpV+vV2fTzN/DNdZuZS6eNnu0a8LOuFbQgeW9cU+nP6fmOWR0I+cCi8XD+Rlq9luxfZfmd1nellmZNFREREFGX4kEVERETkAz5kEREREfmgyHOybN3b67wmM24aLKZqU9h8rWDfYeZk2foK0rkdemihWBpmR++rmdtgy6vSZbZcHl1my+ewrSdY/pBtvWaOhm3onligr0tbXdlyOLwcFy/1VhT0ueDXEEHFQd8fdX9SZp9CXvKsdJ6TWR7K0GcmvT1eljXv2Xpbdf6WeQy89P8ViWw5d/q3qCiutWA5nyZb/4S2vGj9+6z7rtRDLoUTW7KIiIiIfMCHLCIiIiIfFHm4UDcFX7x48X8b42HUdv26vW1Z8zttXQxoep1ehlIxl83MzHSVVa1a1TUfS+FCrbChPBu/hsqxnRv6vDW3XZ8XXoYIiQZ6CAp9LIo6PBrOLhzMbdf1aBtWJ5aGVQrWdYk5r0OJ4fpOW5lt2WBhPtu54WU4qOLoIihc9PErKd3P2IbsK8p7NFuyiIiIiHzAhywiIiIiH/Ahi4iIiMgHRZ5YoLsvMHMbvMTp/VLY/C2do2F+Vucx1KpVyzV/7NgxL5sY0WzHzNbVQbD8LC/L5ve5UOicDNuQHLGWkxVsiCMz30HnwBT2+Hv5XLheM/dy7kbacCOh0N0e6C4KCjuMjL4uzGNoy4fSx9Z2rPW26/xXnU9o8nKORXMek64/L8PoFMV+hyvH0kv3IkU51F3s3CmIiIiIIggfsoi
IiIh8wIcsIiIiIh8Uez9ZZn7SyZMn8/2cznmyxVh1DNoc6iVYfyfmsD9e+lHR32n2hRXNfax45SXeb8vlsS1r688o2PA8tjwML/3z2PrJijX6OOi8F/P81nk4plCGojHzLfS5YesHyMv5YOsvSq8nlnKygvXzZstfMY+Dzom5cuVKvp+z5c/oerDlOOp8V70vtu8x16uHe9PrrVy5sjPt5xAsfgh2fzKPUXHknnm5lmw5Y/o8Mc8jfV/SuXq28zhUsXOnICIiIoogfMgiIiIi8kGRxzl007MZBtJhCLMJN9jQD2aTqC4zv0N/v25KNcszMjJy70ABv9PsqkI3VUbz68DB2Lo68EtRDOui66ywXX1EI920rq9Tc/9tXZl4Cd3a6LrQ87YQoG0bzLQCwL2f+juK8hVwv3kJJ2nm8dXnhZdX6s31BLuHmOvVaRq6vm33HzM9pUWLFq4yHS6M5m5ZdL3obpRs6Sy2OvSSGmJjSw3R9au33Sy3/c4Gu2dUqVLFmQ53OJgtWUREREQ+4EMWERERkQ/4kEVERETkgyLPydLdIpj5AGZcFHDHXy9fvuwq07Fac722GLOOT9sE63rBzMvQuQDnz5/PczkAOHPmTIG3IdrY8pMiIRfNzMEJlhNkyyswz7eSlpOlcxXNa1jnPNnyffT1ZauPwr5irbfHVlf6+81t95LHEu30eW/bV3NZfW/XXTjYuvcwBatfcz1627zUi867sq0nmnOyguXcmddEcdyjveR92ZbV+2mej+bvMZA7/9K8xzEni4iIiCgK8CGLiIiIyAdFHi48e/ZsgZc1Q4KVKlVylelmf7NcN1vbmpB1T7+2Eedt27Bv3z5XmdmTfShN2tHGS2/s4Qq52XoS1+s1tyFYD+S2sIQp1nt8103tOvxtXhc6lGjWcbBQYkHrRl+jtm5hvJxXFy5ccM2b6Qs6ZBpLdEjGS5cztu47bD1w20Z4CHZ/NJcNFk4y5/V+2UKA+rxJSEhwpqMt3SPYaBq20RTCpbBdQXhZr63udXhQL+tnV0NsySIiIiLyAR+yiIiIiHzAhywiIiIiHxR5MomZqwS4Rzc/ePCgq+zUqVPOtI6RJyUluebNmKqOmZuf1fkbOhZr5hXUqFHDVaZf+T1w4IAzrbuGMPNG9Hp0zlgs0fk7tqGRTMHyo4qCjtObcXzbUE06lyfW2I4L4K47nZNlXt+hDH9ky72wXcPBvtPcdp2jY9ax3ueiGMqpuHjJXTKvCy+5iboOze/Q+W963uwawsswOrYyfX3rXN1ozrvUx8hWn166ONLM46u/0zynbEOUAfbfR9u1rj9n7mewnCs/h8liSxYRERGRD/iQRUREROQDPmQRERER+aDIA81mHhPgztEyc7A0HbfV6zHpvrhuvfVWZ1oP3aP7tzI/q/Mw9NA+Nua+2PYr1ug+jMy4eM2aNfP9nK0/Ky/05/S8mYPjpW8UHe/XuWexTF8HeogZk772zDrXuSA6N6OgdeMlJytY/qO5rB5Ow8z9uemmm1xltiFZoo3Ow9E5hgWtC72cLQfTy7K2oZGC9UFY0GFijh8/bi03+8mKNjqf7PTp06558zfR1qdWsGNp6wetoLlTwehtMNelr1Gz/74jR464yvRvue2eFiq2ZBERERH5gA9ZRERERD4o8nChbprWzXgF5WV4HrMJWTcT2kJ5XsKDlDdbqMEWItDN/rbwoZfQorlssOE7zHI/X/GNdDo0ppv7q1at6kzbhtzR174ZjstrvSbzPApl+Bb9Hear+vq1fbP+dRhcb3s002F8XU8FDeforh/8HKokv+/U9WR2IWK73+g0Er3PuuuhaKJDbPp6NkNl1apVc5VdvHgx3/V4Cembofhg54WXEKV57uqQ79GjR51pvV86BYLD6hARERFFGT5kEREREfmAD1lEREREPojesQI8MHMtdCyWio6XvCrbsjoPw8wh0a9724bD0DlZtteXbUPJxPIQK4A7rwpw52kA7mNsG3pK52v
ZXpu2De+hc6c0s171evQQLeYQXIcOHXKVNWnSxJnWOTpFkW9UVI4dO+aa1/tq667Elq+l60nnS5nM4+klH1Pn7OjcG7O+bTmYej/0PSaaBfvNM/OS9f3SrBcv57xe1ryH2IavAuxDaOlzyvysmYMVjM5D9LNLlti5UxARERFFED5kEREREfmgRIQLzebwW265pRi3pGTToTwbW9O+Xo9tvbop2lyvDkPopmmzWVu//l2SwoUnTpxwzeswnxlO0qGAH374wb8N85nZhYs+x2KpC4f9+/cXeFkdBjKvmV27drnKdLcH5md12NYMJQcL1ZnnmO6dXoc2zRE9bOFK2wgiALBnzx5reSTz0k2SGT4HCt4bfF7z+ZXpe6kO6ZshS30u6LBeQfdNnxf6PsVwIREREVGU4UMWERERkQ/4kEVERETkgziJ9YQSIiIiomLAliwiIiIiH/Ahi4iIiMgHfMgiIiIi8gEfsoiIiIh8wIcsIiIiIh/wIYuIiIjIB3zIIiIiIvIBH7KIiIiIfMCHLCIiIiIf/D9jCw+O0q9MWQAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_image_issue_examples(dark_issues_df, num_examples=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see from the examples above that overly dark images can also lead to label errors, since it is difficult to see their contents clearly." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Low information images\n", + "\n", + "Other types of low-quality images that Datalab can automatically detect include images whose information content is low. Low information images can hamper model generalization if they are present disproportionately in some classes.\n", + "\n", + "The `lowinfo_issues` DataFrame reveals which images are considered to be low information. We can sort them by `low_information_score`, which quantifies how severe this issue is for each image (lower values indicate more severe instances of the issue). This allows us to visualize the images in our dataset containing the least amount of information (you might consider omitting such low-quality examples from a training dataset)." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:16.849486Z", + "iopub.status.busy": "2024-06-25T23:02:16.849298Z", + "iopub.status.idle": "2024-06-25T23:02:16.858899Z", + "shell.execute_reply": "2024-06-25T23:02:16.858348Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_low_information_issuelow_information_score
53050True0.067975
40875True0.089929
9594True0.092601
34825True0.107744
37530True0.108516
\n", + "
" + ], + "text/plain": [ + " is_low_information_issue low_information_score\n", + "53050 True 0.067975\n", + "40875 True 0.089929\n", + "9594 True 0.092601\n", + "34825 True 0.107744\n", + "37530 True 0.108516" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lowinfo_issues = lab.get_issues(\"low_information\")\n", + "lowinfo_issues_df = lowinfo_issues.query(\"is_low_information_issue\").sort_values(\n", + " \"low_information_score\"\n", + ")\n", + "lowinfo_issues_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:16.860956Z", + "iopub.status.busy": "2024-06-25T23:02:16.860619Z", + "iopub.status.idle": "2024-06-25T23:02:17.057206Z", + "shell.execute_reply": "2024-06-25T23:02:17.056569Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_image_issue_examples(lowinfo_issues_df, num_examples=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we can see that many of the low information images belong to the Sandal class." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Easy Mode \n", + "\n", + "Cleanlab is most effective when you run this code with a good ML model. Try to produce the best ML model you can for your data (instead of the toy model from this tutorial). If you don't know the best ML model for your data, try [Cleanlab Studio](https://cleanlab.ai/blog/data-centric-ai/), which will automatically produce one for you. Super easy to use, [Cleanlab Studio](https://cleanlab.ai/blog/data-centric-ai/) is a no-code platform for data-centric AI that automatically: detects data issues (more types of issues than this cleanlab package), helps you quickly correct these data issues, confidently labels large subsets of an unlabeled dataset, and provides other smart metadata about each of your data points -- all powered by a system that automatically trains/deploys the best ML model for your data. 
[Try it for free!](https://cleanlab.ai/signup/)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:17.059755Z", + "iopub.status.busy": "2024-06-25T23:02:17.059317Z", + "iopub.status.idle": "2024-06-25T23:02:17.063925Z", + "shell.execute_reply": "2024-06-25T23:02:17.063369Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "assert set([53050, 40875, 9594, 34825, 37530]).issubset(lowinfo_issues_df.index.values.tolist())\n", + "assert set([34848, 50270, 3936, 733, 8094]).issubset(dark_issues_df.index.values.tolist())\n", + "assert set([47824, 3370, 3952, 37119]).issubset(near_duplicate_issues_df.index.values.tolist())\n", + "assert set([38093, 22628, 44031, 25316, 40329]).issubset(outlier_issues_df.index.values.tolist())\n", + "assert set([45561, 11262, 54078, 53564]).issubset(label_issues_df.index.values.tolist())" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "0287f50750924a50808aa26a580014fe": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + 
"IPY_MODEL_c0a788ec352d4d44892fa2536705f4dc", + "IPY_MODEL_7a970c131e874da996251416c225b32d", + "IPY_MODEL_09b1f64bf95042ea8c1ecac3dda78501" + ], + "layout": "IPY_MODEL_5f60314851a64a0fb5b1c9404a45b1e5", + "tabbable": null, + "tooltip": null + } + }, + "036431e7b0eb4772a3a6f1fefe212d9c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "036e9414f4e54bb2ad3ea1437ef80f09": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + 
"IPY_MODEL_1fa3784843874f89ba3804ef2c1994d1", + "IPY_MODEL_bf5bbd5595f146d69a145a65e34f8a50", + "IPY_MODEL_cb97d3bcaebf43a49c832bd9c9c6fd78" + ], + "layout": "IPY_MODEL_1118875ca9bf492d860849655cdaba79", + "tabbable": null, + "tooltip": null + } + }, + "071720f40df34d7e8cf1f89eb41c7639": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "093403dfc3b64cf1983221741b5fcb0b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": 
"success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_bf7d22d257844656898d70bcfd15fa51", + "max": 10000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_ccf67a40246c43cebf26af4e175838b6", + "tabbable": null, + "tooltip": null, + "value": 10000.0 + } + }, + "0957f4aa54a647f1b3aba75b8cfd6463": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "09b1f64bf95042ea8c1ecac3dda78501": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_415d0efe01bc49ea8b1d6d84dc5a54b3", + "placeholder": "​", + "style": "IPY_MODEL_99b9a00b6f534241af709d22c0961707", + "tabbable": null, + "tooltip": null, + "value": " 29.5k/29.5k [00:00<00:00, 4.39MB/s]" + } + }, + "0b5c4213230749ddb8206b27c8b39146": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + 
"align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0be4194b581f41ecb7e2a135468a0715": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": 
null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0e221f2e84a64665a92b4dd92afadf8c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c5386e95ad74411db396e3aed529bbb9", + "placeholder": "​", + "style": "IPY_MODEL_753cea66fb774bb3bb5808e039aabc8b", + "tabbable": null, + "tooltip": null, + "value": " 5.15k/5.15k [00:00<00:00, 905kB/s]" + } + }, + "0e403b4137d64c92b4e2e1b5cf28c86b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f9af5bd782bc4390a94d9e9012aa2485", + "placeholder": "​", + "style": "IPY_MODEL_89451fe622ec44f4bd95a4a64885a893", + "tabbable": null, + "tooltip": null, + "value": "Generating test split: 100%" + } + }, + "0faf68c3a0094c88ab2178eceb37973e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + 
"align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "103a2144ed2f4c469fe4045caba316f3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e62c4e6b1a2145518b7ca20c5af34a2e", + "placeholder": "​", + "style": "IPY_MODEL_ff2c15cc372d4b1b9add158008f1c134", + "tabbable": null, + "tooltip": null, + "value": " 10000/10000 [00:01<00:00, 8897.16 examples/s]" + } + }, + "1118875ca9bf492d860849655cdaba79": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "132f49243383464090725afefab32353": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + 
"object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "13862db5b2154f0a8841e67f27c56688": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "14afaa7e6f6b4bd68221bd65c0679738": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "15de07958ce340bb96187e30d950657b": { + "model_module": 
"@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "18a9d2b10aca4dc287ebb4957c4494a3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "18c006e806824839924c1b17e4aa827d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": 
"LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "198ad5e2828840aaad9d73385da882ab": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_3e32e3d8f8a54441ae62ebfbbc2f80e7", + "placeholder": "​", + "style": "IPY_MODEL_ed6334dc0dcb4f83b2e722f609ed7c35", + "tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "1a733db6518e46878e5bce814db1a9af": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": 
null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1b1f77cff7414ad682ba630b95ef40dd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + 
"max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1eec23a43c9a405f8ea876f4f6943470": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "1fa3784843874f89ba3804ef2c1994d1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e9fbd011278a4e449456a1a586a2f6cf", + "placeholder": "​", + "style": "IPY_MODEL_a7748675b7b9444884112e01bba88439", + "tabbable": null, + "tooltip": null, + "value": "Downloading data: 100%" + } + }, + "1fb935317d234ca2ae7d9781c1684c2f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + 
"border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "225de54aaa1d4cbaacb8d8ab90fd3021": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_1a733db6518e46878e5bce814db1a9af", + "placeholder": "​", + "style": "IPY_MODEL_aac3b752786d4a7b80d6dfa7f66058d5", + "tabbable": null, + "tooltip": null, + "value": "Downloading builder script: 100%" + } + }, + "23b1da1a40e241baa2e6a4579a259516": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + 
"border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "25217fbc084b47a1ab89a5004feb4d39": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": 
null, + "visibility": null, + "width": null + } + }, + "25cac02f16d14945998ac7a792d2cfef": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "26587ec3d7e84b609a1989f675943eb7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "26a8a71f7c5a452899f59dd7b5a176f5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d785a96e121e4c91af0d7aa4e0a392d4", + "IPY_MODEL_b056defa36db4cd6ac08a83efd46c2a0", + "IPY_MODEL_e688178b054b4b4888ab3629fc34bbdb" + ], + "layout": "IPY_MODEL_ef33c35c1b59415f9e93f6103a41936c", + "tabbable": null, + "tooltip": null + } + }, + "27d695a44aa84bb2a15f2ef1f03c8f1e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_18c006e806824839924c1b17e4aa827d", + "max": 40.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_4fb31e4d401f4689b2cf7379c8452943", + "tabbable": null, + "tooltip": null, + "value": 40.0 + } + }, + "2d19891161064c219b8c40779deef687": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f9955a20317e43ab9c3e5c36da2a66fc", + "max": 40.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_5805c4226f014e0b9cd3e41df0138ae0", + "tabbable": null, + "tooltip": null, + "value": 40.0 + } + }, + "2d9799122db642cda546c95a11862621": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_4447772c7acc411baf5380bb9a6cf901", + "IPY_MODEL_43e5d7f69703471899af4a648c2cbcc2", + "IPY_MODEL_ddc119ad2f1847ac870992cd1d347297" + ], + "layout": 
"IPY_MODEL_a48a1bc164b0442abb0142e1ead6cfc9", + "tabbable": null, + "tooltip": null + } + }, + "2ec5926ac6d04307b2b5107a9a857c20": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "31a4aa99f7534a138ff8b644a0c21694": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c5ce3f599ef14bddb6ec490396d61a52", + "placeholder": "​", + "style": "IPY_MODEL_b1daeae4531d4f6fb3123af5fb5e6071", + 
"tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "348557cd32f24dd3b1dfd17a5eb60d52": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_fbd066b6c9674d549d3726ae29ec577c", + "placeholder": "​", + "style": "IPY_MODEL_deb49a4beb0b4055a0fec51184b27bef", + "tabbable": null, + "tooltip": null, + "value": "Computing checksums: 100%" + } + }, + "34d7ca353f004ac9b8ca4c34cab01827": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "395189a3965f407a828569f3d21e9633": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ade5be2d0e2c4282ba14fb59df03b8a1", + "placeholder": "​", + "style": "IPY_MODEL_4aa79b3ad37b43f7b204c8bfe98f00a6", + "tabbable": null, + "tooltip": null, + "value": "Downloading data: 100%" + } + }, + 
"3b894c234eb84d67884cb862157014fb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "3c1fae6297624b9ca75eabb24ef64904": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_198ad5e2828840aaad9d73385da882ab", + "IPY_MODEL_4ed0ebcd7d1f4941be42c4ce40594222", + "IPY_MODEL_46a846ceadbf405a9e29674cef50e766" + ], + "layout": "IPY_MODEL_e4e8c01ad62d49ef87cb6c05755631fe", + "tabbable": null, + "tooltip": null + } + }, + "3d4af5c0b11f42f98cb9ebb33a854b32": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_54ab7d5ce76b4175961417f0e932111f", + "max": 4833.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_b8746042170745e384b9b50907a0e9bc", + "tabbable": null, + "tooltip": null, + "value": 4833.0 + } + }, + 
"3e32e3d8f8a54441ae62ebfbbc2f80e7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3e7ccfd3557646a6a207ceb93231fdae": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3ec410442997448992f4b70a1c3a38a8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3eead3d419fc4f948a03577fd477b348": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + 
"_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ef5767f724db4d65a6b011279ce6832a", + "placeholder": "​", + "style": "IPY_MODEL_73e6ce8c0d0d4f07ac31a5dd5cf712a8", + "tabbable": null, + "tooltip": null, + "value": " 4.83k/4.83k [00:00<00:00, 610kB/s]" + } + }, + "410e0fbb31d74938bf582bbc5fcd2b76": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_8c229a1d8e8a459fbdc3e3b0b31c799a", + "max": 40.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_34d7ca353f004ac9b8ca4c34cab01827", + "tabbable": null, + "tooltip": null, + "value": 40.0 + } + }, + "415d0efe01bc49ea8b1d6d84dc5a54b3": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + 
"grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "42e399a8c2ed49eca7aaf9d97757a29f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "43db7cb50d2b44948ece8343cc759bb5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "43e5d7f69703471899af4a648c2cbcc2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_2ec5926ac6d04307b2b5107a9a857c20", + "max": 8845.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_13862db5b2154f0a8841e67f27c56688", + "tabbable": null, + "tooltip": null, + "value": 8845.0 + } + }, + "4447772c7acc411baf5380bb9a6cf901": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_be6bbebc572743f290aa49a01551700f", + "placeholder": "​", + "style": "IPY_MODEL_912b630c3cd441bb9a97ff21e073f8f1", + "tabbable": null, + "tooltip": null, + "value": "Downloading readme: 100%" + } + }, + "46a846ceadbf405a9e29674cef50e766": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_a113a4c3dc1f4bf9bac198ac1044ad5f", + "placeholder": "​", + "style": "IPY_MODEL_9b1dedc9f66f4949b5981dedb7a5f91e", + "tabbable": null, + "tooltip": null, + "value": " 60000/60000 [00:36<00:00, 1664.15it/s]" + } + }, + "481c02c0fee2405bb4d29585907d7055": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + 
"model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4aa79b3ad37b43f7b204c8bfe98f00a6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "4b85d57847b2433daa31c01a84dfc707": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4c7e3260a9684fb8a7fabff754f7ed43": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_8cc2070ccaea4f14a84805ac33421094", + "IPY_MODEL_784f22d505ea4abf9888396498f8d92e", + "IPY_MODEL_f9f29cc92d464e58a8da4ccd243bdc49" + ], + "layout": "IPY_MODEL_23b1da1a40e241baa2e6a4579a259516", + "tabbable": null, + "tooltip": null + } + }, + "4c820a64219b436d8f50e8c3715bb9b2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": 
null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_902aa41e71bd4414ab7c0a7987566d82", + "placeholder": "​", + "style": "IPY_MODEL_a4dd0603dbb243c89c59386ecf843db7", + "tabbable": null, + "tooltip": null, + "value": " 40/40 [00:00<00:00, 59.91it/s]" + } + }, + "4ed0ebcd7d1f4941be42c4ce40594222": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_481c02c0fee2405bb4d29585907d7055", + "max": 60000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_3b894c234eb84d67884cb862157014fb", + "tabbable": null, + "tooltip": null, + "value": 60000.0 + } + }, + "4fb31e4d401f4689b2cf7379c8452943": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "512363e3d1404aa1aaadf61056731b75": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "51dbfdfbcde54d49b0ef5ea2238c2eff": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + 
"max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "51f69153a021496182e63b86731bf29d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_31a4aa99f7534a138ff8b644a0c21694", + "IPY_MODEL_a09c7c2b9c9f47568a841e2378c7f762", + "IPY_MODEL_6fff842f96d14d4781c6e4a550ed1ead" + ], + "layout": "IPY_MODEL_803258d411f64e1c9de5d7829d137ade", + "tabbable": null, + "tooltip": null + } + }, + "54ab7d5ce76b4175961417f0e932111f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + 
"max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "55e9a8ed89d44a03af331f7d0c48840b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "58043aa4d6d749c7a5cc698ecfe9e971": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_51dbfdfbcde54d49b0ef5ea2238c2eff", + "placeholder": "​", + "style": "IPY_MODEL_99beb5a1292a43089406c1438b1fd8ce", + "tabbable": null, + "tooltip": null, + "value": " 40/40 [00:00<00:00, 60.96it/s]" + } + }, + "5805c4226f014e0b9cd3e41df0138ae0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "59787573aff042db91034386e12e1865": { + "model_module": 
"@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "59a395d15e1b4deda81da9717d461443": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_d7f42a56ec6440a692ccddde6aee5229", + "placeholder": "​", + "style": "IPY_MODEL_59787573aff042db91034386e12e1865", + "tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "5b3e23b0893647989d42500ee1912492": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7f774fb50bfc40cea0b27a584092278f", + "IPY_MODEL_61308e050bc04381adb11248ca588da3", + "IPY_MODEL_4c820a64219b436d8f50e8c3715bb9b2" + ], + "layout": "IPY_MODEL_9c435335f131413fb9a70782f223746b", + "tabbable": null, + "tooltip": null + } + }, + "5f60314851a64a0fb5b1c9404a45b1e5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + 
"model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "606015349d1544048e0648e814b8b6c2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "606bcc8d005a4ac98ff74c4c85664f05": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "61308e050bc04381adb11248ca588da3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + 
"_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_90ccf00da7b34fdeb9e7ee88b7ba34de", + "max": 40.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_88b3e7768c694c8f88c59a70b8d5cca2", + "tabbable": null, + "tooltip": null, + "value": 40.0 + } + }, + "613ad35ddcdc41cb9a04897632e6a366": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e1ad64061c914a9b836d50926f80db90", + "max": 5148.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_950c7546e7dc45fba74f16b6dc071871", + "tabbable": null, + "tooltip": null, + "value": 5148.0 + } + }, + "65c0f6f709984c2a93ef9673219c973c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6fff842f96d14d4781c6e4a550ed1ead": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_071720f40df34d7e8cf1f89eb41c7639", + "placeholder": "​", + "style": "IPY_MODEL_c3b9e14ad86c4827b65d8197ee188e90", + "tabbable": null, + "tooltip": null, + "value": " 40/40 [00:00<00:00, 66.57it/s]" + } + }, + "7161d21784934cb3b29461b6c666c807": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "73e6ce8c0d0d4f07ac31a5dd5cf712a8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "74731a8fee8c4d6a86ae2bd8d05739ca": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a09edbf9d7ad4dcf9f03f55bff80b8eb", + "IPY_MODEL_cc387063d7614f70a4047ea1eb3fbdb5", + "IPY_MODEL_d9536c56d9da45c8825cd66ff5b02b42" + ], + "layout": "IPY_MODEL_1fb935317d234ca2ae7d9781c1684c2f", + "tabbable": null, + "tooltip": null + } + }, + "753cea66fb774bb3bb5808e039aabc8b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "7700727980cd488e926cb66ce464c31c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "784f22d505ea4abf9888396498f8d92e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_7161d21784934cb3b29461b6c666c807", + "max": 40.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_8e393eb50092481d8dab13a80b5301fe", + "tabbable": null, + "tooltip": null, + "value": 40.0 + } + }, + "7869ccf74bed4c61b3aa713028a2949a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", 
+ "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7a970c131e874da996251416c225b32d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_1b1f77cff7414ad682ba630b95ef40dd", + "max": 29515.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_26587ec3d7e84b609a1989f675943eb7", + "tabbable": null, + "tooltip": null, + "value": 29515.0 + } + }, + "7ab9b746b68043889091cbff21fed157": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7f774fb50bfc40cea0b27a584092278f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_132f49243383464090725afefab32353", + "placeholder": "​", + "style": "IPY_MODEL_18a9d2b10aca4dc287ebb4957c4494a3", + "tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "803258d411f64e1c9de5d7829d137ade": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8303fa1bdf524ad4813f8da15129d961": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "84bcd28a7f0240109f586df39724eea2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "88b3e7768c694c8f88c59a70b8d5cca2": { + "model_module": 
"@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "89451fe622ec44f4bd95a4a64885a893": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "8c229a1d8e8a459fbdc3e3b0b31c799a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + 
"min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8c66081bfb8f4f0fb90209d3890703e5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "8cc2070ccaea4f14a84805ac33421094": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ea33ad342a3a42d7be59b86b330e0a72", + "placeholder": "​", + "style": "IPY_MODEL_1eec23a43c9a405f8ea876f4f6943470", + "tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "8dc0baccac7d49fc8e984ff7783e521f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "8e3767f38a304cfd8bbb8226a6b1e6b3": { + "model_module": "@jupyter-widgets/base", 
+ "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8e393eb50092481d8dab13a80b5301fe": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "8ea03b5685e34845bd0e9788ff1690ea": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f3e35882b32047faa84600d92bc51de1", + "placeholder": "​", + "style": "IPY_MODEL_dd85ed2c420b4e818c1bcb7b95c2bae3", + "tabbable": null, + "tooltip": null, + "value": "Downloading data: 100%" + } + }, + "8ecb1808a9c946e4bc3fae85b53e04fa": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "902aa41e71bd4414ab7c0a7987566d82": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9051d6f71e8e4103a68ff5a972b1f5ef": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_348557cd32f24dd3b1dfd17a5eb60d52", + "IPY_MODEL_c7ac0409def249e8aea703cdcc22dc3e", + "IPY_MODEL_d5fbff666b0d43569b08868b91a12901" + ], + "layout": "IPY_MODEL_7869ccf74bed4c61b3aa713028a2949a", + "tabbable": null, + "tooltip": null + } + }, + "90ccf00da7b34fdeb9e7ee88b7ba34de": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "912b630c3cd441bb9a97ff21e073f8f1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "950c7546e7dc45fba74f16b6dc071871": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "97bf04a7597a475cbbe1a3674d9f8831": 
{ + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "990637c9b17b4cd29ed450019c374fcb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "99b9a00b6f534241af709d22c0961707": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "99beb5a1292a43089406c1438b1fd8ce": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": 
null, + "text_color": null + } + }, + "99cbd9d8e3c8481cb1756a4469db2097": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "9b1dedc9f66f4949b5981dedb7a5f91e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "9c435335f131413fb9a70782f223746b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9de079b2c5174ef986102eb3c5c47057": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_dba0bb97827c4098a5a9ca4335fd1b88", + "placeholder": "​", + "style": "IPY_MODEL_8dc0baccac7d49fc8e984ff7783e521f", + "tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "9ef1cb8ab03340fcb1000dded8e0576e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + 
"justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a09c7c2b9c9f47568a841e2378c7f762": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_15de07958ce340bb96187e30d950657b", + "max": 40.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_55e9a8ed89d44a03af331f7d0c48840b", + "tabbable": null, + "tooltip": null, + "value": 40.0 + } + }, + "a09edbf9d7ad4dcf9f03f55bff80b8eb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_512363e3d1404aa1aaadf61056731b75", + "placeholder": "​", + "style": "IPY_MODEL_990637c9b17b4cd29ed450019c374fcb", + "tabbable": null, + "tooltip": null, + "value": "Generating train split: 100%" + } + }, + "a113a4c3dc1f4bf9bac198ac1044ad5f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a34c2ac5b25b4ee9b3271f7d1cb58d09": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b1e9d8c7944b4d5d9e4e763ff8111c23", + "IPY_MODEL_410e0fbb31d74938bf582bbc5fcd2b76", + "IPY_MODEL_b907dc449b2e4a7fb9486cd32d25c6ee" + ], + "layout": "IPY_MODEL_8e3767f38a304cfd8bbb8226a6b1e6b3", + "tabbable": null, + "tooltip": null + } + }, + "a48a1bc164b0442abb0142e1ead6cfc9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a4dd0603dbb243c89c59386ecf843db7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "a7748675b7b9444884112e01bba88439": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": 
"2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "aac3b752786d4a7b80d6dfa7f66058d5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "ade5be2d0e2c4282ba14fb59df03b8a1": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "adfe07df4fa34ee4aaa545ed25fb09a5": { + "model_module": 
"@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_0be4194b581f41ecb7e2a135468a0715", + "placeholder": "​", + "style": "IPY_MODEL_97bf04a7597a475cbbe1a3674d9f8831", + "tabbable": null, + "tooltip": null, + "value": " 40/40 [00:00<00:00, 59.96it/s]" + } + }, + "ae6a30ccb0c14d74a979a0380deb3f43": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_225de54aaa1d4cbaacb8d8ab90fd3021", + "IPY_MODEL_3d4af5c0b11f42f98cb9ebb33a854b32", + "IPY_MODEL_3eead3d419fc4f948a03577fd477b348" + ], + "layout": "IPY_MODEL_9ef1cb8ab03340fcb1000dded8e0576e", + "tabbable": null, + "tooltip": null + } + }, + "afb898a15f25446dbf02acad9451d7f9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_395189a3965f407a828569f3d21e9633", + "IPY_MODEL_b56cf93df6844c34a6b2bf73d7beecf1", + "IPY_MODEL_fb10059183f949b1aa58679b6f5e68d9" + ], + 
"layout": "IPY_MODEL_c35769250aee4c8a8d1426ec1e661d44", + "tabbable": null, + "tooltip": null + } + }, + "b056defa36db4cd6ac08a83efd46c2a0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c44c9aba7fe4478796d828e89261ca5c", + "max": 60000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_84bcd28a7f0240109f586df39724eea2", + "tabbable": null, + "tooltip": null, + "value": 60000.0 + } + }, + "b1daeae4531d4f6fb3123af5fb5e6071": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "b1e9d8c7944b4d5d9e4e763ff8111c23": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f7039b557e22405592de942ab2af6374", + "placeholder": "​", + "style": 
"IPY_MODEL_99cbd9d8e3c8481cb1756a4469db2097", + "tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "b56cf93df6844c34a6b2bf73d7beecf1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_7ab9b746b68043889091cbff21fed157", + "max": 26421880.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_f6e32bbd0e904e5482d47af23a90e68c", + "tabbable": null, + "tooltip": null, + "value": 26421880.0 + } + }, + "b57d8df09b744b9ba40ccb2f493ac41b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + 
"object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b7017b8c69034c44a655a1d1180d4f6b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9de079b2c5174ef986102eb3c5c47057", + "IPY_MODEL_2d19891161064c219b8c40779deef687", + "IPY_MODEL_adfe07df4fa34ee4aaa545ed25fb09a5" + ], + "layout": "IPY_MODEL_bdc6a04d9729483692936e3bf58a5987", + "tabbable": null, + "tooltip": null + } + }, + "b8746042170745e384b9b50907a0e9bc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b907dc449b2e4a7fb9486cd32d25c6ee": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_0b5c4213230749ddb8206b27c8b39146", + "placeholder": "​", + "style": 
"IPY_MODEL_42e399a8c2ed49eca7aaf9d97757a29f", + "tabbable": null, + "tooltip": null, + "value": " 40/40 [00:00<00:00, 65.66it/s]" + } + }, + "b9ab02dd79fd4a7cba0f4f657f8e67b8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b9c0dad36ba649998e0a03292f55fa10": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "bb779f9e262449559414273be74e1f9f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bcad1842e9b44d0d836e546327b06f75": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bdc6a04d9729483692936e3bf58a5987": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "be6bbebc572743f290aa49a01551700f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": 
null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "bf5bbd5595f146d69a145a65e34f8a50": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_606bcc8d005a4ac98ff74c4c85664f05", + "max": 4422102.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_fb48987f314c4ad5b856f55902a46dc7", + "tabbable": null, + "tooltip": null, + "value": 4422102.0 + } + }, + "bf7d22d257844656898d70bcfd15fa51": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c0a788ec352d4d44892fa2536705f4dc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_4b85d57847b2433daa31c01a84dfc707", + "placeholder": "​", + "style": "IPY_MODEL_dbc1da52d2cc4f469798e65e58c06d24", + "tabbable": null, + "tooltip": null, + "value": "Downloading data: 100%" + } + }, + "c0f95a03646e434191b916c4b037adbd": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c35769250aee4c8a8d1426ec1e661d44": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c3b9e14ad86c4827b65d8197ee188e90": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c44c9aba7fe4478796d828e89261ca5c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + 
"flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c5386e95ad74411db396e3aed529bbb9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c5ce3f599ef14bddb6ec490396d61a52": { + 
"model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c7ac0409def249e8aea703cdcc22dc3e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_65c0f6f709984c2a93ef9673219c973c", + "max": 4.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_b9ab02dd79fd4a7cba0f4f657f8e67b8", + "tabbable": null, + "tooltip": null, + "value": 4.0 
+ } + }, + "cb97d3bcaebf43a49c832bd9c9c6fd78": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_3ec410442997448992f4b70a1c3a38a8", + "placeholder": "​", + "style": "IPY_MODEL_7700727980cd488e926cb66ce464c31c", + "tabbable": null, + "tooltip": null, + "value": " 4.42M/4.42M [00:00<00:00, 108MB/s]" + } + }, + "cc387063d7614f70a4047ea1eb3fbdb5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_606015349d1544048e0648e814b8b6c2", + "max": 60000.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_8c66081bfb8f4f0fb90209d3890703e5", + "tabbable": null, + "tooltip": null, + "value": 60000.0 + } + }, + "cc75429289e142bca67437b899acc9d8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + 
"border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ccf67a40246c43cebf26af4e175838b6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "cd7e2132d359467393e919631a8e1063": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_8ea03b5685e34845bd0e9788ff1690ea", + "IPY_MODEL_613ad35ddcdc41cb9a04897632e6a366", + "IPY_MODEL_0e221f2e84a64665a92b4dd92afadf8c" + ], + "layout": "IPY_MODEL_25217fbc084b47a1ab89a5004feb4d39", + "tabbable": null, + "tooltip": null + } + 
}, + "d5fbff666b0d43569b08868b91a12901": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_b57d8df09b744b9ba40ccb2f493ac41b", + "placeholder": "​", + "style": "IPY_MODEL_c0f95a03646e434191b916c4b037adbd", + "tabbable": null, + "tooltip": null, + "value": " 4/4 [00:00<00:00, 1262.20it/s]" + } + }, + "d785a96e121e4c91af0d7aa4e0a392d4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_cc75429289e142bca67437b899acc9d8", + "placeholder": "​", + "style": "IPY_MODEL_43db7cb50d2b44948ece8343cc759bb5", + "tabbable": null, + "tooltip": null, + "value": "Map (num_proc=4): 100%" + } + }, + "d7f42a56ec6440a692ccddde6aee5229": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": 
null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d9536c56d9da45c8825cd66ff5b02b42": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_036431e7b0eb4772a3a6f1fefe212d9c", + "placeholder": "​", + "style": "IPY_MODEL_e3ffd020d418403d80011e99dab8abde", + "tabbable": null, + "tooltip": null, + "value": " 60000/60000 [00:06<00:00, 8824.70 examples/s]" + } + }, + "dba0bb97827c4098a5a9ca4335fd1b88": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + 
"border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dbc1da52d2cc4f469798e65e58c06d24": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "dd85ed2c420b4e818c1bcb7b95c2bae3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "ddc119ad2f1847ac870992cd1d347297": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_bb779f9e262449559414273be74e1f9f", + "placeholder": "​", + "style": "IPY_MODEL_0957f4aa54a647f1b3aba75b8cfd6463", + "tabbable": null, + "tooltip": null, + "value": " 8.85k/8.85k [00:00<00:00, 1.49MB/s]" + } + }, + "deb49a4beb0b4055a0fec51184b27bef": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "e1ad64061c914a9b836d50926f80db90": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + 
"margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e3ffd020d418403d80011e99dab8abde": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "e4417397f4504e97ac6455b54cd8cecd": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_59a395d15e1b4deda81da9717d461443", + "IPY_MODEL_27d695a44aa84bb2a15f2ef1f03c8f1e", + "IPY_MODEL_58043aa4d6d749c7a5cc698ecfe9e971" + ], + "layout": "IPY_MODEL_0faf68c3a0094c88ab2178eceb37973e", + "tabbable": null, + "tooltip": null + } + }, + "e4e8c01ad62d49ef87cb6c05755631fe": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, 
+ "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e62c4e6b1a2145518b7ca20c5af34a2e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + 
"right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e688178b054b4b4888ab3629fc34bbdb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_8ecb1808a9c946e4bc3fae85b53e04fa", + "placeholder": "​", + "style": "IPY_MODEL_25cac02f16d14945998ac7a792d2cfef", + "tabbable": null, + "tooltip": null, + "value": " 60000/60000 [00:10<00:00, 7126.68 examples/s]" + } + }, + "e9fbd011278a4e449456a1a586a2f6cf": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + 
"padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ea33ad342a3a42d7be59b86b330e0a72": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ed6334dc0dcb4f83b2e722f609ed7c35": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "ef33c35c1b59415f9e93f6103a41936c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": 
"2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ef5767f724db4d65a6b011279ce6832a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": 
null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f35a2857866f47469b1e75d72668b89d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_0e403b4137d64c92b4e2e1b5cf28c86b", + "IPY_MODEL_093403dfc3b64cf1983221741b5fcb0b", + "IPY_MODEL_103a2144ed2f4c469fe4045caba316f3" + ], + "layout": "IPY_MODEL_3e7ccfd3557646a6a207ceb93231fdae", + "tabbable": null, + "tooltip": null + } + }, + "f3e35882b32047faa84600d92bc51de1": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": 
null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f6e32bbd0e904e5482d47af23a90e68c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f7039b557e22405592de942ab2af6374": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + 
"min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f9955a20317e43ab9c3e5c36da2a66fc": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f9af5bd782bc4390a94d9e9012aa2485": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": 
null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f9f29cc92d464e58a8da4ccd243bdc49": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_14afaa7e6f6b4bd68221bd65c0679738", + "placeholder": "​", + "style": "IPY_MODEL_8303fa1bdf524ad4813f8da15129d961", + "tabbable": null, + "tooltip": null, + "value": " 40/40 [00:00<00:00, 59.34it/s]" + } + }, + "fb10059183f949b1aa58679b6f5e68d9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": 
"IPY_MODEL_bcad1842e9b44d0d836e546327b06f75", + "placeholder": "​", + "style": "IPY_MODEL_b9c0dad36ba649998e0a03292f55fa10", + "tabbable": null, + "tooltip": null, + "value": " 26.4M/26.4M [00:00<00:00, 131MB/s]" + } + }, + "fb48987f314c4ad5b856f55902a46dc7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "fbd066b6c9674d549d3726ae29ec577c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"ff2c15cc372d4b1b9add158008f1c134": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/tabular.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/tabular.ipynb new file mode 100644 index 000000000..d4633178b --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/tabular.ipynb @@ -0,0 +1,1383 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Detecting Issues in Tabular Data (Numeric/Categorical columns) with Datalab\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this 5-minute quickstart tutorial, we use Datalab to detect various issues in a classification dataset with tabular (numeric/categorical) features. Tabular (or *structured*) data are typically organized in a row/column format and stored in a SQL database or in file formats such as CSV, Excel, or Parquet. Here we consider a Student Grades dataset, which contains over 900 individuals, each with three exam grades, some optional notes, and an assigned letter grade (their class label). cleanlab automatically identifies _hundreds_ of examples in this dataset that were mislabeled with an incorrect final grade. 
You can run the same code from this tutorial to detect incorrect information in your own tabular classification datasets.\n", + "\n", + "**Overview of what we'll do in this tutorial:**\n", + "\n", + "- Train a classifier model (here scikit-learn's HistGradientBoostingClassifier, although any model could be used) and use this classifier to compute (out-of-sample) predicted class probabilities via cross-validation.\n", + "\n", + "- Create a K nearest neighbours (KNN) graph between the examples in the dataset.\n", + "\n", + "- Identify issues in the dataset with cleanlab's `Datalab` audit applied to the predictions and KNN graph.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have (out-of-sample) `pred_probs` from a model trained on your original data labels? Have a `knn_graph` computed between dataset examples (reflecting similarity in their feature values)? Run the code below to find issues in your dataset.\n", + "\n", + "
\n", + " \n", + "```ipython3 \n", + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(data=your_dataset, label_name=\"column_name_of_labels\")\n", + "lab.find_issues(pred_probs=your_pred_probs, knn_graph=knn_graph)\n", + "\n", + "lab.get_issues()\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Install required dependencies\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install \"cleanlab[datalab]\"\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:20.872952Z", + "iopub.status.busy": "2024-06-25T23:02:20.872489Z", + "iopub.status.idle": "2024-06-25T23:02:21.974284Z", + "shell.execute_reply": "2024-06-25T23:02:21.973713Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "dependencies = [\"cleanlab\", \"datasets\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + 
"execution": { + "iopub.execute_input": "2024-06-25T23:02:21.976885Z", + "iopub.status.busy": "2024-06-25T23:02:21.976507Z", + "iopub.status.idle": "2024-06-25T23:02:21.994963Z", + "shell.execute_reply": "2024-06-25T23:02:21.994531Z" + } + }, + "outputs": [], + "source": [ + "import random\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "from sklearn.model_selection import cross_val_predict\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.ensemble import HistGradientBoostingClassifier\n", + "from sklearn.neighbors import NearestNeighbors\n", + "\n", + "from cleanlab import Datalab\n", + "\n", + "SEED = 100 # for reproducibility\n", + "np.random.seed(SEED)\n", + "random.seed(SEED)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Load and process the data\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We first load the data features and labels (which are possibly noisy).\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:21.997095Z", + "iopub.status.busy": "2024-06-25T23:02:21.996836Z", + "iopub.status.idle": "2024-06-25T23:02:22.023323Z", + "shell.execute_reply": "2024-06-25T23:02:22.022824Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
stud_IDexam_1exam_2exam_3notesletter_grade
0f48f7353.0077.009.003C
10bd4e781.0064.0080.00great participation +10B
20bd4e781.0064.0080.00great participation +10B
3cb9d7a0.610.940.78NaNC
49acca448.0090.009.001C
\n", + "
" + ], + "text/plain": [ + " stud_ID exam_1 exam_2 exam_3 notes letter_grade\n", + "0 f48f73 53.00 77.00 9.00 3 C\n", + "1 0bd4e7 81.00 64.00 80.00 great participation +10 B\n", + "2 0bd4e7 81.00 64.00 80.00 great participation +10 B\n", + "3 cb9d7a 0.61 0.94 0.78 NaN C\n", + "4 9acca4 48.00 90.00 9.00 1 C" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "grades_data = pd.read_csv(\"https://s.cleanlab.ai/grades-tabular-demo-v2.csv\")\n", + "grades_data.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:22.025506Z", + "iopub.status.busy": "2024-06-25T23:02:22.025194Z", + "iopub.status.idle": "2024-06-25T23:02:22.028596Z", + "shell.execute_reply": "2024-06-25T23:02:22.028157Z" + } + }, + "outputs": [], + "source": [ + "X_raw = grades_data[[\"exam_1\", \"exam_2\", \"exam_3\", \"notes\"]]\n", + "labels = grades_data[\"letter_grade\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we preprocess the data. Here we apply one-hot encoding to columns with categorical values and standardize the values in numeric columns." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:22.030765Z", + "iopub.status.busy": "2024-06-25T23:02:22.030439Z", + "iopub.status.idle": "2024-06-25T23:02:22.037728Z", + "shell.execute_reply": "2024-06-25T23:02:22.037318Z" + } + }, + "outputs": [], + "source": [ + "cat_features = [\"notes\"]\n", + "X_encoded = pd.get_dummies(X_raw, columns=cat_features, drop_first=True)\n", + "\n", + "numeric_features = [\"exam_1\", \"exam_2\", \"exam_3\"]\n", + "scaler = StandardScaler()\n", + "X_processed = X_encoded.copy()\n", + "X_processed[numeric_features] = scaler.fit_transform(X_encoded[numeric_features])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Bringing Your Own Data (BYOD)?\n", + "\n", + "Assign your data's (preprocessed) features to variable `X_processed` and its labels to variable `labels` instead.\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Select a classification model and compute out-of-sample predicted probabilities\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we use a simple histogram-based gradient boosting model (similar to XGBoost), but you can choose any suitable scikit-learn model for this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:22.039761Z", + "iopub.status.busy": "2024-06-25T23:02:22.039428Z", + "iopub.status.idle": "2024-06-25T23:02:22.042031Z", + "shell.execute_reply": "2024-06-25T23:02:22.041594Z" + } + }, + "outputs": [], + "source": [ + "clf = HistGradientBoostingClassifier()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To find potential labeling errors, cleanlab requires a probabilistic prediction from your model for every datapoint. However, these predictions will be _overfitted_ (and thus unreliable) for examples the model was previously trained on. For the best results, cleanlab should be applied with **out-of-sample** predicted class probabilities, i.e., on examples held out from the model during the training.\n", + "\n", + "K-fold cross-validation is a straightforward way to produce out-of-sample predicted probabilities for every datapoint in the dataset by training K copies of our model on different data subsets and using each copy to predict on the subset of data it did not see during training. 
Make sure that the columns of your `pred_probs` are ordered consistently with the class ordering Datalab expects, i.e., class names sorted lexicographically.\n", + "We can implement this via the `cross_val_predict` function from scikit-learn.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:22.044127Z", + "iopub.status.busy": "2024-06-25T23:02:22.043814Z", + "iopub.status.idle": "2024-06-25T23:02:24.976069Z", + "shell.execute_reply": "2024-06-25T23:02:24.975522Z" + } + }, + "outputs": [], + "source": [ + "num_crossval_folds = 5 \n", + "pred_probs = cross_val_predict(\n", + " clf,\n", + " X_processed,\n", + " labels,\n", + " cv=num_crossval_folds,\n", + " method=\"predict_proba\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Construct K nearest neighbours graph" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The KNN graph reflects how close each example is to the other examples in our dataset (in the numerical space of preprocessed feature values). This similarity information is used by Datalab to identify issues like outliers in our data. For tabular data, think carefully about the most appropriate way to define the similarity between two examples.\n", + "\n", + "Here we use the `NearestNeighbors` class in sklearn to easily compute this graph (with similarity defined by the Euclidean distance between feature values). The graph should be represented as a sparse matrix whose nonzero entries indicate each example's nearest neighbors and the distances to them."
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:24.978810Z", + "iopub.status.busy": "2024-06-25T23:02:24.978406Z", + "iopub.status.idle": "2024-06-25T23:02:24.988167Z", + "shell.execute_reply": "2024-06-25T23:02:24.987722Z" + } + }, + "outputs": [], + "source": [ + "KNN = NearestNeighbors(metric='euclidean')\n", + "KNN.fit(X_processed.values)\n", + "\n", + "knn_graph = KNN.kneighbors_graph(mode=\"distance\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Use cleanlab to find label issues\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Based on the given labels, predicted probabilities, and KNN graph, cleanlab can quickly help us identify suspicious values in our grades table.\n", + "\n", + "We use cleanlab's `Datalab` class, which supports several ways of loading the data. In this case, we’ll simply wrap the dataset (features and noisy labels) in a dictionary and use it to instantiate a `Datalab` object that can audit our dataset for various types of issues." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:24.990119Z", + "iopub.status.busy": "2024-06-25T23:02:24.989943Z", + "iopub.status.idle": "2024-06-25T23:02:26.966161Z", + "shell.execute_reply": "2024-06-25T23:02:26.965466Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding label issues ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding outlier issues ...\n", + "Finding near_duplicate issues ...\n", + "Finding non_iid issues ...\n", + "Finding class_imbalance issues ...\n", + "Finding underperforming_group issues ...\n", + "\n", + "Audit complete. 
358 issues found in the dataset.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/sklearn/neighbors/_base.py:246: EfficiencyWarning: Precomputed sparse input was not sorted by row values. Use the function sklearn.neighbors.sort_graph_by_row_values to sort the input by row values, with warn_when_not_sorted=False to remove this warning.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "data = {\"X\": X_processed.values, \"y\": labels}\n", + "\n", + "lab = Datalab(data, label_name=\"y\")\n", + "lab.find_issues(pred_probs=pred_probs, knn_graph=knn_graph)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:26.968453Z", + "iopub.status.busy": "2024-06-25T23:02:26.968058Z", + "iopub.status.idle": "2024-06-25T23:02:26.987633Z", + "shell.execute_reply": "2024-06-25T23:02:26.987073Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset Information: num_examples: 941, num_classes: 5\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + " issue_type num_issues\n", + " label 294\n", + " outlier 46\n", + "near_duplicate 17\n", + " non_iid 1\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. 
due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 294\n", + "Overall dataset quality in terms of this issue: 0.7109\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "3 True 0.000005 C F\n", + "886 True 0.000059 D B\n", + "709 True 0.000104 F C\n", + "723 True 0.000169 A C\n", + "689 True 0.000181 B D\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 46\n", + "Overall dataset quality in terms of this issue: 0.3590\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "3 True 3.051882e-07\n", + "7 True 7.683133e-05\n", + "0 True 6.536582e-04\n", + "4 True 8.406589e-04\n", + "8 True 5.324246e-03\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. 
feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 17\n", + "Overall dataset quality in terms of this issue: 0.6165\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "12 True 0.0 [2, 1, 6, 9] 0.0\n", + "582 True 0.0 [185] 0.0\n", + "185 True 0.0 [582] 0.0\n", + "187 True 0.0 [27] 0.0\n", + "898 True 0.0 [637] 0.0\n", + "\n", + "\n", + "---------------------- non_iid issues ----------------------\n", + "\n", + "About this issue:\n", + "\tWhether the dataset exhibits statistically significant\n", + " violations of the IID assumption like:\n", + " changepoints or shift, drift, autocorrelation, etc.\n", + " The specific violation considered is whether the\n", + " examples are ordered such that almost adjacent examples\n", + " tend to have more similar feature values.\n", + " \n", + "\n", + "Number of examples with this issue: 1\n", + "Overall dataset quality in terms of this issue: 0.0000\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_non_iid_issue non_iid_score\n", + "865 True 0.515002\n", + "837 False 0.556480\n", + "622 False 0.593068\n", + "329 False 0.593207\n", + "920 False 0.618041\n", + "\n", + "Additional Information: \n", + "p-value: 1.4386345844794593e-05\n" + ] + } + ], + "source": [ + "lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Label issues\n", + "\n", + "The above report shows that cleanlab identified many label issues in the data. We can see which examples are estimated to be mislabeled (as well as a numeric quality score quantifying how likely their label is correct) via the `get_issues` method." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:26.989833Z", + "iopub.status.busy": "2024-06-25T23:02:26.989382Z", + "iopub.status.idle": "2024-06-25T23:02:26.997397Z", + "shell.execute_reply": "2024-06-25T23:02:26.996968Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_label_issuelabel_scoregiven_labelpredicted_label
0True0.000842CF
1False0.555944BB
2False0.555944BB
3True0.000005CF
4True0.004374CD
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_score given_label predicted_label\n", + "0 True 0.000842 C F\n", + "1 False 0.555944 B B\n", + "2 False 0.555944 B B\n", + "3 True 0.000005 C F\n", + "4 True 0.004374 C D" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "issue_results = lab.get_issues(\"label\")\n", + "issue_results.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To review the most severe label issues, sort the DataFrame above by the `label_score` column (a lower score indicates that the label is less likely to be correct).\n", + "\n", + "Let's review some of the most likely label errors:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:26.999393Z", + "iopub.status.busy": "2024-06-25T23:02:26.999216Z", + "iopub.status.idle": "2024-06-25T23:02:27.009012Z", + "shell.execute_reply": "2024-06-25T23:02:27.008461Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
exam_1exam_2exam_3notesgiven_labelpredicted_label
30.610.940.78NaNCF
88689.0095.0073.00NaNDB
70964.0070.0086.00NaNFC
72353.0089.0078.00NaNAC
68977.0051.0070.00NaNBD
\n", + "
" + ], + "text/plain": [ + " exam_1 exam_2 exam_3 notes given_label predicted_label\n", + "3 0.61 0.94 0.78 NaN C F\n", + "886 89.00 95.00 73.00 NaN D B\n", + "709 64.00 70.00 86.00 NaN F C\n", + "723 53.00 89.00 78.00 NaN A C\n", + "689 77.00 51.00 70.00 NaN B D" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sorted_issues = issue_results.sort_values(\"label_score\").index\n", + "\n", + "X_raw.iloc[sorted_issues].assign(\n", + " given_label=labels.iloc[sorted_issues], \n", + " predicted_label=issue_results[\"predicted_label\"].iloc[sorted_issues]\n", + ").head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The DataFrame above shows the original label (`given_label`) for examples that cleanlab finds most likely to be mislabeled, as well as an alternative `predicted_label` for each example.\n", + "\n", + "These examples are likely labeled incorrectly and should be carefully re-examined: a student with grades of 89, 95, and 73 surely does not deserve a D!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Outlier issues\n", + "\n", + "According to the report, our dataset contains some outliers. We can see which examples are outliers (and a numeric quality score quantifying how typical each example appears to be) via `get_issues`. We sort the resulting DataFrame by cleanlab's outlier quality score to see the most severe outliers in our dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:27.011197Z", + "iopub.status.busy": "2024-06-25T23:02:27.010813Z", + "iopub.status.idle": "2024-06-25T23:02:27.018956Z", + "shell.execute_reply": "2024-06-25T23:02:27.018514Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
exam_1exam_2exam_3notes
30.610.940.78NaN
7100.00100.001.00NaN
053.0077.009.003
448.0090.009.001
80.0056.0096.00<p style=\"font-size: 18px; color: #ff00ff; bac...
\n", + "
" + ], + "text/plain": [ + " exam_1 exam_2 exam_3 notes\n", + "3 0.61 0.94 0.78 NaN\n", + "7 100.00 100.00 1.00 NaN\n", + "0 53.00 77.00 9.00 3\n", + "4 48.00 90.00 9.00 1\n", + "8 0.00 56.00 96.00

\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_near_duplicate_issuenear_duplicate_scorenear_duplicate_setsdistance_to_nearest_neighbor
12True0.0[2, 1, 6, 9]0.0
582True0.0[185]0.0
185True0.0[582]0.0
187True0.0[27]0.0
898True0.0[637]0.0
\n", + "" + ], + "text/plain": [ + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets \\\n", + "12 True 0.0 [2, 1, 6, 9] \n", + "582 True 0.0 [185] \n", + "185 True 0.0 [582] \n", + "187 True 0.0 [27] \n", + "898 True 0.0 [637] \n", + "\n", + " distance_to_nearest_neighbor \n", + "12 0.0 \n", + "582 0.0 \n", + "185 0.0 \n", + "187 0.0 \n", + "898 0.0 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "duplicate_results = lab.get_issues(\"near_duplicate\")\n", + "duplicate_results.sort_values(\"near_duplicate_score\").head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The results above show which examples cleanlab considers nearly duplicated (rows where `is_near_duplicate_issue == True`). Let's view some of these flagged examples to see how similar they are.\n", + "\n", + "Taking one of the lowest-scoring examples, let's compare it against the examples in its identified near-duplicate set." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:27.031675Z", + "iopub.status.busy": "2024-06-25T23:02:27.031332Z", + "iopub.status.idle": "2024-06-25T23:02:27.038748Z", + "shell.execute_reply": "2024-06-25T23:02:27.038299Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "

\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
exam_1exam_2exam_3notes
181.064.080.0great participation +10
281.064.080.0great participation +10
1281.064.080.0great participation +10
681.064.080.0great participation +10
981.064.080.0great participation +10
\n", + "
" + ], + "text/plain": [ + " exam_1 exam_2 exam_3 notes\n", + "1 81.0 64.0 80.0 great participation +10\n", + "2 81.0 64.0 80.0 great participation +10\n", + "12 81.0 64.0 80.0 great participation +10\n", + "6 81.0 64.0 80.0 great participation +10\n", + "9 81.0 64.0 80.0 great participation +10" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Identify the row with the lowest near_duplicate_score\n", + "lowest_scoring_duplicate = duplicate_results[\"near_duplicate_score\"].idxmin()\n", + "\n", + "# Extract the indices of the lowest scoring duplicate and its near duplicate sets\n", + "indices_to_display = [lowest_scoring_duplicate] + duplicate_results.loc[lowest_scoring_duplicate, \"near_duplicate_sets\"].tolist()\n", + "\n", + "# Display the relevant rows from the original dataset\n", + "X_raw.iloc[indices_to_display]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These examples are exact duplicates! Perhaps the same information was accidentally recorded multiple times in this data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Similarly, let's take a look at another example and the identified near-duplicate sets:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:27.040852Z", + "iopub.status.busy": "2024-06-25T23:02:27.040530Z", + "iopub.status.idle": "2024-06-25T23:02:27.048198Z", + "shell.execute_reply": "2024-06-25T23:02:27.047651Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
exam_1exam_2exam_3notes
2786.080.089.0NaN
18786.080.089.0NaN
\n", + "
" + ], + "text/plain": [ + " exam_1 exam_2 exam_3 notes\n", + "27 86.0 80.0 89.0 NaN\n", + "187 86.0 80.0 89.0 NaN" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Identify the next row not in the previous near duplicate set\n", + "second_lowest_scoring_duplicate = duplicate_results[\"near_duplicate_score\"].drop(indices_to_display).idxmin()\n", + "\n", + "# Extract the indices of the second lowest scoring duplicate and its near duplicate sets\n", + "next_indices_to_display = [second_lowest_scoring_duplicate] + duplicate_results.loc[second_lowest_scoring_duplicate, \"near_duplicate_sets\"].tolist()\n", + "\n", + "# Display the relevant rows from the original dataset\n", + "X_raw.iloc[next_indices_to_display]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We identified another set of exact duplicates in our dataset! Including near/exact duplicates in a dataset may have unintended effects on models; be wary about splitting them across training/test sets. Learn more about handling near duplicates detected in a dataset from [the FAQ](../faq.html#How-to-handle-near-duplicate-data-identified-by-cleanlab?)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This tutorial highlighted a straightforward approach to detect potentially incorrect information in any tabular dataset. Just use Datalab with any ML model -- the better the model, the more accurate the data errors detected by Datalab will be!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Easy Mode \n", + "\n", + "Cleanlab is most effective when you run this code with a good ML model. Try to produce the best ML model you can for your data (instead of the basic model from this tutorial). If you don't know the best ML model for your data, try [Cleanlab Studio](https://cleanlab.ai/blog/data-centric-ai/) which will automatically produce one for you. 
Super easy to use, [Cleanlab Studio](https://cleanlab.ai/blog/data-centric-ai/) is a no-code platform for data-centric AI that automatically: detects data issues (more types of issues than this cleanlab package), helps you quickly correct these data issues, confidently labels large subsets of an unlabeled dataset, and provides other smart metadata about each of your data points -- all powered by a system that automatically trains/deploys the best ML model for your data. [Try it for free!](https://cleanlab.ai/signup/)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:27.050359Z", + "iopub.status.busy": "2024-06-25T23:02:27.050042Z", + "iopub.status.idle": "2024-06-25T23:02:27.058741Z", + "shell.execute_reply": "2024-06-25T23:02:27.058173Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "identified_label_issues = issue_results[issue_results[\"is_label_issue\"] == True]\n", + "label_issue_indices = [3, 723, 709, 886, 689] # check these examples were found in label issues\n", + "if not all(x in identified_label_issues.index for x in label_issue_indices):\n", + " raise Exception(\"Some highlighted examples are missing from identified_label_issues.\")\n", + " \n", + "identified_outlier_issues = outlier_results[outlier_results[\"is_outlier_issue\"] == True]\n", + "outlier_issue_indices = [3, 7, 0, 4, 8] # check these examples were found in outlier issues\n", + "if not all(x in identified_outlier_issues.index for x in outlier_issue_indices):\n", + " raise Exception(\"Some highlighted examples are missing from identified_outlier_issues.\")\n", + " \n", + "identified_duplicate_issues = duplicate_results[duplicate_results[\"is_near_duplicate_issue\"] == True]\n", + "duplicate_issue_indices = [690, 246, 185, 582] # check these examples were found in
duplicate issues\n", + "if not all(x in identified_duplicate_issues.index for x in duplicate_issue_indices):\n", + " raise Exception(\"Some highlighted examples are missing from identified_duplicate_issues.\")\n", + " \n", + "# check that the near duplicates shown are actually flagged as near duplicate sets\n", + "if not duplicate_results.iloc[690][\"near_duplicate_sets\"] == 246:\n", + " raise Exception(\"These examples are not in the same near duplicate set\")\n", + " \n", + "if not duplicate_results.iloc[185][\"near_duplicate_sets\"] == 582:\n", + " raise Exception(\"These examples are not in the same near duplicate set\")\n", + "\n", + "# Function to check if all rows are identical\n", + "def are_rows_identical(df):\n", + " first_row = df.iloc[0]\n", + " return all(df.iloc[i].equals(first_row) for i in range(1, len(df)))\n", + "\n", + "# Test to ensure all displayed rows are identical\n", + "if not are_rows_identical(X_raw.iloc[indices_to_display]):\n", + " raise Exception(\"Not all rows are identical! These examples should belong to the same EXACT duplicate set\")\n", + "\n", + "# Repeat the test for the next set of indices\n", + "if not are_rows_identical(X_raw.iloc[next_indices_to_display]):\n", + " raise Exception(\"Not all rows are identical! 
These examples should belong to the same EXACT duplicate set\")" + ] + } + ], + "metadata": { + "interpreter": { + "hash": "cda20062bc42cfdcaa0f9720c0b28e880bba110e9dfce6c1689934eec9b595a1" + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/text.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/text.ipynb new file mode 100644 index 000000000..96ecfd1b8 --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/text.ipynb @@ -0,0 +1,1515 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Detecting Issues in a Text Dataset with Datalab\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this 5-minute quickstart tutorial, we use Datalab to detect various issues in an intent classification dataset composed of (text) customer service requests at an online bank. We consider a subset of the [Banking77-OOS Dataset](https://arxiv.org/abs/2106.04564) containing 1,000 customer service requests which are classified into 10 categories based on their intent (you can run this same code on any text classification dataset). Cleanlab automatically identifies bad examples in our dataset, including mislabeled data, out-of-scope examples (outliers), or otherwise ambiguous examples. 
Consider filtering or correcting such bad examples before you dive deep into modeling your data!\n", + "\n", + "**Overview of what we'll do in this tutorial:**\n", + "\n", + "- Use a pretrained transformer model to extract the text embeddings from the customer service requests\n", + "\n", + "- Train a simple Logistic Regression model on the text embeddings to compute out-of-sample predicted probabilities\n", + "\n", + "- Run cleanlab's `Datalab` audit with these predictions and embeddings in order to identify problems like: label issues, outliers, and near duplicates in the dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have (out-of-sample) `pred_probs` from a model trained on an existing set of labels? Maybe you have some numeric `features` as well? Run the code below to find any potential label errors in your dataset.\n", + "\n", + "
\n", + " \n", + "```ipython3 \n", + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(data=your_dataset, label_name=\"column_name_of_labels\")\n", + "lab.find_issues(pred_probs=your_pred_probs, features=your_features)\n", + "\n", + "lab.report()\n", + "lab.get_issues()\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Install required dependencies\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install sentence-transformers\n", + "!pip install \"cleanlab[datalab]\"\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:29.985772Z", + "iopub.status.busy": "2024-06-25T23:02:29.985300Z", + "iopub.status.idle": "2024-06-25T23:02:32.708857Z", + "shell.execute_reply": "2024-06-25T23:02:32.708280Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs.cleanlab.ai).\n", + "# If running on Colab, may want to use GPU (select: Runtime > Change runtime type > Hardware accelerator > GPU)\n", + "# Package versions we used: scikit-learn==1.2.0 sentence-transformers==2.2.2\n", + "\n", + "dependencies = [\"cleanlab\", \"sentence_transformers\", \"datasets\"]\n", + "\n", + "# Suppress outputs that may appear if tensorflow happens to be improperly installed: \n", + "import os \n", + "\n", + "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\" # disable parallelism to avoid deadlocks with huggingface\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else 
dependency.split('=')[0] for dependency in
dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:32.711490Z", + "iopub.status.busy": "2024-06-25T23:02:32.711052Z", + "iopub.status.idle": "2024-06-25T23:02:32.714638Z", + "shell.execute_reply": "2024-06-25T23:02:32.714078Z" + } + }, + "outputs": [], + "source": [ + "import re \n", + "import string \n", + "import pandas as pd \n", + "from sklearn.metrics import accuracy_score, log_loss \n", + "from sklearn.model_selection import cross_val_predict \n", + "from sklearn.linear_model import LogisticRegression\n", + "from sentence_transformers import SentenceTransformer\n", + "\n", + "from cleanlab import Datalab" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:32.716772Z", + "iopub.status.busy": "2024-06-25T23:02:32.716455Z", + "iopub.status.idle": "2024-06-25T23:02:32.719650Z", + "shell.execute_reply": "2024-06-25T23:02:32.719116Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This cell is hidden from docs.cleanlab.ai \n", + "\n", + "import random \n", + "import numpy as np \n", + "\n", + "pd.set_option(\"display.max_colwidth\", None) \n", + "\n", + "SEED = 123456 # for reproducibility\n", + "np.random.seed(SEED)\n", + "random.seed(SEED)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. 
Load and format the text dataset\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:32.721723Z", + "iopub.status.busy": "2024-06-25T23:02:32.721536Z", + "iopub.status.idle": "2024-06-25T23:02:32.745834Z", + "shell.execute_reply": "2024-06-25T23:02:32.745235Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
textlabel
0i accidentally made a payment to a wrong account. what should i do?cancel_transfer
1i no longer want to transfer funds, can we cancel that transaction?cancel_transfer
2cancel my transfer, please.cancel_transfer
3i want to revert this mornings transaction.cancel_transfer
4i just realised i made the wrong payment yesterday. can you please change it to the right account? it's my rent payment and really really needs to be in the right account by tomorrowcancel_transfer
\n", + "
" + ], + "text/plain": [ + " text \\\n", + "0 i accidentally made a payment to a wrong account. what should i do? \n", + "1 i no longer want to transfer funds, can we cancel that transaction? \n", + "2 cancel my transfer, please. \n", + "3 i want to revert this mornings transaction. \n", + "4 i just realised i made the wrong payment yesterday. can you please change it to the right account? it's my rent payment and really really needs to be in the right account by tomorrow \n", + "\n", + " label \n", + "0 cancel_transfer \n", + "1 cancel_transfer \n", + "2 cancel_transfer \n", + "3 cancel_transfer \n", + "4 cancel_transfer " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data = pd.read_csv(\"https://s.cleanlab.ai/banking-intent-classification.csv\")\n", + "data.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:32.748092Z", + "iopub.status.busy": "2024-06-25T23:02:32.747728Z", + "iopub.status.idle": "2024-06-25T23:02:32.751473Z", + "shell.execute_reply": "2024-06-25T23:02:32.750947Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This dataset has 10 classes.\n", + "Classes: {'card_about_to_expire', 'supported_cards_and_currencies', 'change_pin', 'lost_or_stolen_phone', 'apple_pay_or_google_pay', 'getting_spare_card', 'card_payment_fee_charged', 'beneficiary_not_allowed', 'visa_or_mastercard', 'cancel_transfer'}\n" + ] + } + ], + "source": [ + "raw_texts, labels = data[\"text\"].values, data[\"label\"].values\n", + "num_classes = len(set(labels))\n", + "\n", + "print(f\"This dataset has {num_classes} classes.\")\n", + "print(f\"Classes: {set(labels)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's view the i-th example in the dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + 
"iopub.execute_input": "2024-06-25T23:02:32.753615Z", + "iopub.status.busy": "2024-06-25T23:02:32.753281Z", + "iopub.status.idle": "2024-06-25T23:02:32.756361Z", + "shell.execute_reply": "2024-06-25T23:02:32.755826Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Example Label: cancel_transfer\n", + "Example Text: i no longer want to transfer funds, can we cancel that transaction?\n" + ] + } + ], + "source": [ + "i = 1 # change this to view other examples from the dataset\n", + "print(f\"Example Label: {labels[i]}\")\n", + "print(f\"Example Text: {raw_texts[i]}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The data is stored as two numpy arrays:\n", + "\n", + "1. `raw_texts` stores the customer service request utterances in text format\n", + "2. `labels` stores the intent categories (labels) for each example" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Bringing Your Own Data (BYOD)?\n", + "\n", + "You can easily replace the above with your own text dataset and continue with the rest of the tutorial.\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we convert the text strings into vectors better suited as inputs for our ML models. \n", + "\n", + "We will use numeric representations from a pretrained Transformer model as embeddings of our text. The [Sentence Transformers](https://huggingface.co/docs/hub/sentence-transformers) library offers simple methods to compute these embeddings for text data. Here, we load the pretrained `electra-small-discriminator` model and then run our data through the network to extract a vector embedding of each example." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:32.758544Z", + "iopub.status.busy": "2024-06-25T23:02:32.758190Z", + "iopub.status.idle": "2024-06-25T23:02:36.431104Z", + "shell.execute_reply": "2024-06-25T23:02:36.430539Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "No sentence-transformers model found with name /home/runner/.cache/torch/sentence_transformers/google_electra-small-discriminator. Creating a new one with MEAN pooling.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n", + " return self.fget.__get__(instance, owner)()\n" + ] + } + ], + "source": [ + "transformer = SentenceTransformer('google/electra-small-discriminator')\n", + "text_embeddings = transformer.encode(raw_texts)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our subsequent ML model will directly operate on elements of `text_embeddings` in order to classify the customer service requests." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Define a classification model and compute out-of-sample predicted probabilities" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A typical way to leverage pretrained networks for a particular classification task is to add a linear output layer and fine-tune the network parameters on the new data. However, this can be computationally intensive. Alternatively, we can freeze the pretrained weights of the network and only train the output layer without having to rely on GPU(s). Here we do this conveniently by fitting a scikit-learn linear model on top of the extracted embeddings.\n", + "\n", + "To identify label issues, cleanlab requires a probabilistic prediction from your model for each datapoint. However, these predictions will be _overfit_ (and thus unreliable) for datapoints the model was previously trained on. cleanlab is intended to only be used with **out-of-sample** predicted class probabilities, i.e. on datapoints held out from the model during training.\n", + "\n", + "Here we obtain out-of-sample predicted class probabilities for every example in our dataset using a Logistic Regression model with cross-validation.\n", + "Make sure that the columns of your `pred_probs` are properly ordered with respect to the ordering of classes, which for Datalab is lexicographic (sorted by class name)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:36.433873Z", + "iopub.status.busy": "2024-06-25T23:02:36.433656Z", + "iopub.status.idle": "2024-06-25T23:02:37.315376Z", + "shell.execute_reply": "2024-06-25T23:02:37.314794Z" + }, + "scrolled": true + }, + "outputs": [], + "source": [ + "model = LogisticRegression(max_iter=400)\n", + "\n", + "pred_probs = cross_val_predict(model, text_embeddings, labels, method=\"predict_proba\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Use cleanlab to find issues in your dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Given feature embeddings and the (out-of-sample) predicted class probabilities obtained from any model you have, cleanlab can quickly help you identify low-quality examples in your dataset.\n", + "\n", + "Here, we use cleanlab's `Datalab` to find issues in our data. Datalab offers several ways of loading the data; we’ll simply wrap the training features and noisy labels in a dictionary. " + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:37.318331Z", + "iopub.status.busy": "2024-06-25T23:02:37.317919Z", + "iopub.status.idle": "2024-06-25T23:02:37.320854Z", + "shell.execute_reply": "2024-06-25T23:02:37.320356Z" + } + }, + "outputs": [], + "source": [ + "data_dict = {\"texts\": raw_texts, \"labels\": labels}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All that is needed to audit your data is a call to `find_issues()`. We pass in the predicted probabilities and the feature embeddings obtained above, but depending on which types of issues you are interested in, you may not need to provide all of this information. The more inputs you provide, the more types of issues `Datalab` can detect in your data.
Using a better model to produce these inputs will ensure cleanlab more accurately estimates issues." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:37.323996Z", + "iopub.status.busy": "2024-06-25T23:02:37.323058Z", + "iopub.status.idle": "2024-06-25T23:02:39.331700Z", + "shell.execute_reply": "2024-06-25T23:02:39.331073Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding null issues ...\n", + "Finding label issues ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding outlier issues ...\n", + "Finding near_duplicate issues ...\n", + "Finding non_iid issues ...\n", + "Finding class_imbalance issues ...\n", + "Finding underperforming_group issues ...\n", + "\n", + "Audit complete. 85 issues found in the dataset.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/sklearn/neighbors/_base.py:246: EfficiencyWarning: Precomputed sparse input was not sorted by row values. 
Use the function sklearn.neighbors.sort_graph_by_row_values to sort the input by row values, with warn_when_not_sorted=False to remove this warning.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "lab = Datalab(data_dict, label_name=\"labels\")\n", + "lab.find_issues(pred_probs=pred_probs, features=text_embeddings)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After the audit is complete, review the findings using the `report` method:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.335819Z", + "iopub.status.busy": "2024-06-25T23:02:39.334538Z", + "iopub.status.idle": "2024-06-25T23:02:39.362346Z", + "shell.execute_reply": "2024-06-25T23:02:39.361791Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset Information: num_examples: 1000, num_classes: 10\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + " issue_type num_issues\n", + " label 42\n", + " outlier 38\n", + "near_duplicate 4\n", + " non_iid 1\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. 
due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 42\n", + "Overall dataset quality in terms of this issue: 0.9710\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "981 True 0.000005 card_about_to_expire card_payment_fee_charged\n", + "974 True 0.000146 beneficiary_not_allowed change_pin\n", + "982 True 0.000224 apple_pay_or_google_pay card_about_to_expire\n", + "971 True 0.000507 beneficiary_not_allowed change_pin\n", + "980 True 0.000960 card_about_to_expire card_payment_fee_charged\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 38\n", + "Overall dataset quality in terms of this issue: 0.3584\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "994 True 0.009642\n", + "999 True 0.013067\n", + "81 True 0.013841\n", + "433 True 0.014722\n", + "989 True 0.018224\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. 
feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 4\n", + "Overall dataset quality in terms of this issue: 0.6070\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "160 True 0.095724 [148] 0.006237\n", + "148 True 0.095724 [160] 0.006237\n", + "546 True 0.099341 [514] 0.006485\n", + "514 True 0.099341 [546] 0.006485\n", + "481 False 0.123418 [] 0.008165\n", + "\n", + "\n", + "---------------------- non_iid issues ----------------------\n", + "\n", + "About this issue:\n", + "\tWhether the dataset exhibits statistically significant\n", + " violations of the IID assumption like:\n", + " changepoints or shift, drift, autocorrelation, etc.\n", + " The specific violation considered is whether the\n", + " examples are ordered such that almost adjacent examples\n", + " tend to have more similar feature values.\n", + " \n", + "\n", + "Number of examples with this issue: 1\n", + "Overall dataset quality in terms of this issue: 0.0000\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_non_iid_issue non_iid_score\n", + "313 True 0.564102\n", + "13 False 0.572258\n", + "28 False 0.574915\n", + "31 False 0.575507\n", + "40 False 0.575874\n", + "\n", + "Additional Information: \n", + "p-value: 0.0\n" + ] + } + ], + "source": [ + "lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Label issues\n", + "\n", + "The report indicates that cleanlab identified many label issues in our dataset. We can see which examples are flagged as likely mislabeled and the label quality score for each example using the `get_issues` method, specifying `label` as an argument to focus on label issues in the data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.365992Z", + "iopub.status.busy": "2024-06-25T23:02:39.365066Z", + "iopub.status.idle": "2024-06-25T23:02:39.374228Z", + "shell.execute_reply": "2024-06-25T23:02:39.373792Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_label_issuelabel_scoregiven_labelpredicted_label
0False0.792090cancel_transfercancel_transfer
1False0.257611cancel_transfercancel_transfer
2False0.698710cancel_transfercancel_transfer
3False0.182121cancel_transferapple_pay_or_google_pay
4False0.771619cancel_transfercancel_transfer
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_score given_label predicted_label\n", + "0 False 0.792090 cancel_transfer cancel_transfer\n", + "1 False 0.257611 cancel_transfer cancel_transfer\n", + "2 False 0.698710 cancel_transfer cancel_transfer\n", + "3 False 0.182121 cancel_transfer apple_pay_or_google_pay\n", + "4 False 0.771619 cancel_transfer cancel_transfer" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issues = lab.get_issues(\"label\")\n", + "label_issues.head() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This method returns a dataframe containing a label quality score for each example. These numeric scores lie between 0 and 1, where lower scores indicate examples more likely to be mislabeled. The dataframe also contains a boolean column specifying whether or not each example is identified to have a label issue (indicating it is likely mislabeled)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can get the subset of examples flagged with label issues, and also sort by label quality score to find the indices of the 5 most likely mislabeled examples in our dataset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.376516Z", + "iopub.status.busy": "2024-06-25T23:02:39.376160Z", + "iopub.status.idle": "2024-06-25T23:02:39.380109Z", + "shell.execute_reply": "2024-06-25T23:02:39.379723Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "cleanlab found 42 potential label errors in the dataset.\n", + "Here are indices of the top 5 most likely errors: \n", + " [981 974 982 971 980]\n" + ] + } + ], + "source": [ + "identified_label_issues = label_issues[label_issues[\"is_label_issue\"] == True]\n", + "lowest_quality_labels = label_issues[\"label_score\"].argsort()[:5].to_numpy()\n", + "\n", + "print(\n", + " f\"cleanlab found {len(identified_label_issues)} potential label errors in the dataset.\\n\"\n", + " f\"Here are indices of the top 5 most likely errors: \\n {lowest_quality_labels}\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's review some of the most likely label errors. \n", + "\n", + "Here we display the top 5 examples identified as the most likely label errors in the dataset, together with their given (original) label and a suggested alternative label from cleanlab.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.381995Z", + "iopub.status.busy": "2024-06-25T23:02:39.381685Z", + "iopub.status.idle": "2024-06-25T23:02:39.387966Z", + "shell.execute_reply": "2024-06-25T23:02:39.387438Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" ], + "text/plain": [ + " text \\\n", + "981 i was charged for getting cash. \n", + "974 can i change my pin on holiday? \n", + "982 will i be sent a new card before mine expires? \n", + "971 please tell me how to change my pin. \n", + "980 why do i see extra charges for withdrawing my money? \n", + "\n", + " given_label suggested_label \n", + "981 card_about_to_expire card_payment_fee_charged \n", + "974 beneficiary_not_allowed change_pin \n", + "982 apple_pay_or_google_pay card_about_to_expire \n", + "971 beneficiary_not_allowed change_pin \n", + "980 card_about_to_expire card_payment_fee_charged " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_with_suggested_labels = pd.DataFrame(\n", + " {\"text\": raw_texts, \"given_label\": labels, \"suggested_label\": label_issues[\"predicted_label\"]}\n", + ")\n", + "data_with_suggested_labels.iloc[lowest_quality_labels]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "scrolled": true + }, + "source": [ + "These are very clear label errors that cleanlab has identified in this data! Note that the `given_label` does not correctly reflect the intent of these requests; whoever produced this dataset made many mistakes that are important to address before modeling the data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Outlier issues\n", + "\n", + "According to the report, our dataset contains some outliers.\n", + "We can see which examples are outliers (and a numeric quality score quantifying how typical each example appears to be) via `get_issues`. We sort the resulting DataFrame by cleanlab's outlier quality score to see the most severe outliers in our dataset."
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.389955Z", + "iopub.status.busy": "2024-06-25T23:02:39.389790Z", + "iopub.status.idle": "2024-06-25T23:02:39.396274Z", + "shell.execute_reply": "2024-06-25T23:02:39.395806Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " is_outlier_issue outlier_score\n", + "994 True 0.009642\n", + "999 True 0.013067\n", + "81 True 0.013841\n", + "433 True 0.014722\n", + "989 True 0.018224" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "outlier_issues = lab.get_issues(\"outlier\")\n", + "outlier_issues.sort_values(\"outlier_score\").head()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.398247Z", + "iopub.status.busy": "2024-06-25T23:02:39.398071Z", + "iopub.status.idle": "2024-06-25T23:02:39.403758Z", + "shell.execute_reply": "2024-06-25T23:02:39.403283Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " text \\\n", + "994 (A AND NOT B) OR (C AND NOT D) OR (B AND NOT C AND D) \n", + "999 636C65616E6C616220697320617765736F6D6521 \n", + "81 cancel transaction \n", + "433 phone is gone \n", + "989

File not found.
Press F1 to continue

\n", + "\n", + " label \n", + "994 change_pin \n", + "999 cancel_transfer \n", + "81 cancel_transfer \n", + "433 lost_or_stolen_phone \n", + "989 supported_cards_and_currencies " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lowest_quality_outliers = outlier_issues[\"outlier_score\"].argsort()[:5]\n", + "\n", + "data.iloc[lowest_quality_outliers]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We see that cleanlab has identified entries in this dataset that do not appear to be proper customer requests. Outliers in this dataset appear to be out-of-scope customer requests and other nonsensical text which does not make sense for intent classification. Carefully consider whether such outliers may detrimentally affect your data modeling, and consider removing them from the dataset if so." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Near-duplicate issues\n", + "\n", + "According to the report, our dataset contains some sets of nearly duplicated examples.\n", + "We can see which examples are (nearly) duplicated (and a numeric quality score quantifying how dissimilar each example is from its nearest neighbor in the dataset) via `get_issues`. We sort the resulting DataFrame by cleanlab's near-duplicate quality score to see the text examples in our dataset that are most nearly duplicated." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.405738Z", + "iopub.status.busy": "2024-06-25T23:02:39.405577Z", + "iopub.status.idle": "2024-06-25T23:02:39.413963Z", + "shell.execute_reply": "2024-06-25T23:02:39.413526Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" ], + "text/plain": [ + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets \\\n", + "160 True 0.095724 [148] \n", + "148 True 0.095724 [160] \n", + "546 True 0.099341 [514] \n", + "514 True 0.099341 [546] \n", + "481 False 0.123418 [] \n", + "\n", + " distance_to_nearest_neighbor \n", + "160 0.006237 \n", + "148 0.006237 \n", + "546 0.006485 \n", + "514 0.006485 \n", + "481 0.008165 " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "duplicate_issues = lab.get_issues(\"near_duplicate\")\n", + "duplicate_issues.sort_values(\"near_duplicate_score\").head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The results above show which examples cleanlab considers nearly duplicated (rows where `is_near_duplicate_issue == True`). Here, we see that examples 160 and 148 are nearly duplicated, as are examples 546 and 514.\n", + "\n", + "Let's view these examples to see how similar they are." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.415988Z", + "iopub.status.busy": "2024-06-25T23:02:39.415681Z", + "iopub.status.idle": "2024-06-25T23:02:39.420859Z", + "shell.execute_reply": "2024-06-25T23:02:39.420441Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " text \\\n", + "160 why was i charged an additional fee when paying with card? \n", + "148 why was i charged an extra fee when paying with card? \n", + "\n", + " label \n", + "160 card_payment_fee_charged \n", + "148 card_payment_fee_charged " + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.iloc[[160, 148]]" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.422954Z", + "iopub.status.busy": "2024-06-25T23:02:39.422568Z", + "iopub.status.idle": "2024-06-25T23:02:39.427845Z", + "shell.execute_reply": "2024-06-25T23:02:39.427357Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" ], + "text/plain": [ + " text label\n", + "546 do i have to go to the bank to change my pin? change_pin\n", + "514 do i have to go into the bank to change my pin? change_pin" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.iloc[[546, 514]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We see that these two pairs of requests are indeed very similar to one another! Near duplicates in a dataset may have unintended effects on models, so be wary of splitting them across training/test sets. Learn more about handling near duplicates in a dataset from [the FAQ](../faq.html#How-to-handle-near-duplicate-data-identified-by-cleanlab?)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Non-IID issues (data drift)\n", + "According to the report, our dataset does not appear to be Independent and Identically Distributed (IID). The overall non-IID score for the dataset (displayed below) corresponds to the `p-value` of a statistical test for whether the ordering of samples in the dataset appears related to the similarity between their feature values. A low `p-value` strongly suggests that the dataset violates the IID assumption, which is a key assumption required for conclusions (models) produced from the dataset to generalize to a larger population."
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.429922Z", + "iopub.status.busy": "2024-06-25T23:02:39.429605Z", + "iopub.status.idle": "2024-06-25T23:02:39.433185Z", + "shell.execute_reply": "2024-06-25T23:02:39.432630Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "p_value = lab.get_info('non_iid')['p-value']\n", + "p_value" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here, our dataset was flagged as non-IID because the rows happened to be sorted by class label in the original data. This may be benign if we remember to shuffle rows before model training and data splitting. But if you don't know why your data was flagged as non-IID, then you should be worried about potential data drift or unexpected interactions between data points (their values may not be statistically independent). Think carefully about what future test data may look like (and whether your data is representative of the population you care about). You should not shuffle your data before the non-IID test runs, since doing so will invalidate its conclusions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As demonstrated above, cleanlab can automatically shortlist the most likely issues in your dataset to help you better curate your dataset for subsequent modeling. With this shortlist, you can decide whether to fix these label issues or remove nonsensical or duplicated examples from your dataset to obtain a higher-quality dataset for training your next ML model. cleanlab's issue detection can be run with outputs from *any* type of model you initially trained.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Easy Mode \n", + "\n", + "Cleanlab is most effective when you run this code with a good ML model. 
Try to produce the best ML model you can for your data (instead of the basic model from this tutorial). If you don't know the best ML model for your data, try [Cleanlab Studio](https://cleanlab.ai/blog/data-centric-ai/), which will automatically produce one for you. Super easy to use, [Cleanlab Studio](https://cleanlab.ai/blog/data-centric-ai/) is a no-code platform for data-centric AI that automatically: detects data issues (more types of issues than this cleanlab package), helps you quickly correct these data issues, confidently labels large subsets of an unlabeled dataset, and provides other smart metadata about each of your data points -- all powered by a system that automatically trains/deploys the best ML model for your data. [Try it for free!](https://cleanlab.ai/signup/)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:39.435254Z", + "iopub.status.busy": "2024-06-25T23:02:39.434956Z", + "iopub.status.idle": "2024-06-25T23:02:39.440265Z", + "shell.execute_reply": "2024-06-25T23:02:39.439708Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai; if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "label_issue_indices = [981, 974, 982] # check these examples were found in label issues\n", + "if not all(x in identified_label_issues.index for x in label_issue_indices):\n", + " raise Exception(\"Some highlighted examples are missing from identified_label_issues.\")\n", + " \n", + "identified_outlier_issues = outlier_issues[outlier_issues[\"is_outlier_issue\"] == True]\n", + "outlier_issue_indices = [994, 989, 999] # check these examples were found in outlier issues\n", + "if not all(x in identified_outlier_issues.index for x in outlier_issue_indices):\n", + " raise Exception(\"Some highlighted examples are missing from identified_outlier_issues.\")\n", + "\n", + "identified_duplicate_issues = 
duplicate_issues[duplicate_issues[\"is_near_duplicate_issue\"] == True]\n", + "duplicate_issue_indices = [160, 148, 546, 514] # check these examples were found in duplicates\n", + "if not all(x in identified_duplicate_issues.index for x in duplicate_issue_indices):\n", + " raise Exception(\"Some highlighted examples are missing from identified_duplicate_issues.\")" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "Text x TensorFlow", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/workflows.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/workflows.ipynb new file mode 100644 index 000000000..0a779c5ae --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/datalab/workflows.ipynb @@ -0,0 +1,3757 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Miscellaneous workflows with Datalab\n", + "\n", + "This tutorial demonstrates various useful things you can do with `Datalab` that may not be covered in other tutorials. First get familiar with `Datalab` via the [quickstart](datalab_quickstart.html)/[advanced](datalab_advanced.html) tutorials before going through this one." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "## Accelerate Issue Checks with Pre-computed kNN Graphs\n", + "\n", + "By default, `Datalab` will detect certain types of issues by constructing a k-nearest neighbors graph of your dataset using the [scikit-learn](https://scikit-learn.org/stable/modules/neighbors.html) package. 
Here we demonstrate how to use your own pre-computed k-nearest neighbors (kNN) graphs with `Datalab`. This allows you to use more efficient approximate kNN graphs to scale to bigger datasets.\n", + "\n", + "Using pre-computed kNN graphs is optional: `Datalab` can automatically compute these graphs for you.\n", + "\n", + "While we use a toy dataset for demonstration, these steps can be applied to any dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Load and Prepare Your Dataset\n", + "\n", + "Here we'll generate a synthetic dataset, but you should replace this with your own dataset loading process." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:42.866402Z", + "iopub.status.busy": "2024-06-25T23:02:42.865897Z", + "iopub.status.idle": "2024-06-25T23:02:43.293660Z", + "shell.execute_reply": "2024-06-25T23:02:43.293078Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "from sklearn.datasets import make_classification\n", + "\n", + "# Set seed for reproducibility\n", + "np.random.seed(0)\n", + "\n", + "# Replace this section with your own dataset loading\n", + "# For demonstration, we create a synthetic classification dataset\n", + "X, y = make_classification(\n", + " n_samples=5000,\n", + " n_features=5,\n", + " n_informative=5,\n", + " n_redundant=0,\n", + " n_repeated=0,\n", + " n_classes=2,\n", + " n_clusters_per_class=2,\n", + " flip_y=0.02,\n", + " class_sep=2.0,\n", + " shuffle=False,\n", + " random_state=0,\n", + ")\n", + "\n", + "\n", + "# Example: Add a near-duplicate of the second-to-last example to the dataset\n", + "X[-1] = X[-2] + np.random.rand(5) * 0.001" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. 
Compute kNN Graph\n", + "\n", + "We will compute the kNN graph using [FAISS](https://github.com/facebookresearch/faiss), a library for efficient similarity search. This step involves creating a kNN graph that represents the nearest neighbors for each point in your dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:43.296234Z", + "iopub.status.busy": "2024-06-25T23:02:43.295841Z", + "iopub.status.idle": "2024-06-25T23:02:43.425173Z", + "shell.execute_reply": "2024-06-25T23:02:43.424588Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import faiss\n", + "import numpy as np\n", + "\n", + "# Faiss uses single precision, so we need to convert the data type\n", + "X_faiss = np.float32(X)\n", + "\n", + "# Normalize the vectors for inner product similarity (effectively cosine similarity)\n", + "faiss.normalize_L2(X_faiss)\n", + "\n", + "# Build the index using FAISS\n", + "index = faiss.index_factory(X_faiss.shape[1], \"HNSW32,Flat\", faiss.METRIC_INNER_PRODUCT)\n", + "\n", + "# Add the dataset to the index\n", + "index.add(X_faiss)\n", + "\n", + "# Perform the search to find k-nearest neighbors\n", + "k = 10 # Number of neighbors to consider\n", + "D, I = index.search(X_faiss, k + 1) # Include the point itself during search\n", + "\n", + "# Remove the first column (self-distances)\n", + "D, I = D[:, 1:], I[:, 1:]\n", + "\n", + "# Convert cosine similarity to cosine distance\n", + "np.clip(1 - D, a_min=0, a_max=None, out=D)\n", + "\n", + "# Create the kNN graph\n", + "from scipy.sparse import csr_matrix\n", + "\n", + "\n", + "def create_knn_graph(distances: np.ndarray, indices: np.ndarray) -> csr_matrix:\n", + " \"\"\"\n", + " Create a K-nearest neighbors (KNN) graph in CSR format from provided distances and indices.\n", + "\n", + " Parameters:\n", + " 
distances (np.ndarray): 2D array of shape (n_samples, n_neighbors) containing distances to nearest neighbors.\n", + " indices (np.ndarray): 2D array of shape (n_samples, n_neighbors) containing indices of nearest neighbors.\n", + "\n", + " Returns:\n", + " scipy.sparse.csr_matrix: KNN graph in CSR format.\n", + " \"\"\"\n", + " assert distances.shape == indices.shape, \"distances and indices must have the same shape\"\n", + "\n", + " n_samples, n_neighbors = distances.shape\n", + "\n", + " # Convert to 1D arrays for CSR matrix creation\n", + " indices_1d = indices.ravel()\n", + " distances_1d = distances.ravel()\n", + " indptr = np.arange(0, n_samples * n_neighbors + 1, n_neighbors)\n", + "\n", + " # Create the CSR matrix\n", + " return csr_matrix((distances_1d, indices_1d, indptr), shape=(n_samples, n_samples))\n", + "\n", + "\n", + "knn_graph = create_knn_graph(D, I)\n", + "\n", + "# Ensure the kNN graph is sorted by row values\n", + "from sklearn.neighbors import sort_graph_by_row_values\n", + "sort_graph_by_row_values(knn_graph, copy=False, warn_when_not_sorted=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. Train a Classifier and Obtain Predicted Probabilities\n", + "\n", + "Predicted class probabilities from a model trained on your dataset are used to identify label issues." 
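+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For intuition about how `pred_probs` feed into label issue detection, the sketch below computes one simple per-example label quality score, the *self-confidence*: the predicted probability of each example's given label. Lower values mark examples the model doubts. The arrays are made-up toy values, and this is an illustration only, not a description of everything `Datalab` does internally." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "# Hypothetical predicted probabilities for 4 examples and 2 classes (toy values)\n", + "toy_pred_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])\n", + "toy_labels = np.array([0, 1, 1, 0])  # given (possibly noisy) labels\n", + "\n", + "# Self-confidence: probability the model assigns to each example's given label\n", + "self_confidence = toy_pred_probs[np.arange(len(toy_labels)), toy_labels]\n", + "print(self_confidence)  # [0.9 0.8 0.3 0.4]\n", + "\n", + "# Most suspicious examples first (lowest self-confidence)\n", + "print(np.argsort(self_confidence))  # [2 3 1 0]"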
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:43.427519Z", + "iopub.status.busy": "2024-06-25T23:02:43.427141Z", + "iopub.status.idle": "2024-06-25T23:02:43.450209Z", + "shell.execute_reply": "2024-06-25T23:02:43.449619Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.model_selection import cross_val_predict\n", + "\n", + "# Obtain predicted probabilities using cross-validation\n", + "clf = LogisticRegression()\n", + "pred_probs = cross_val_predict(clf, X, y, cv=3, method=\"predict_proba\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. Identify Data Issues Using Datalab\n", + "Use the pre-computed kNN graph and predicted probabilities to find issues in the dataset using `Datalab`." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:43.452790Z", + "iopub.status.busy": "2024-06-25T23:02:43.452408Z", + "iopub.status.idle": "2024-06-25T23:02:46.180314Z", + "shell.execute_reply": "2024-06-25T23:02:46.179694Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/runner/work/cleanlab/cleanlab/cleanlab/datalab/internal/issue_finder.py:116: UserWarning: Both `features` and `knn_graph` were provided. Most issue managers will likely prefer using `knn_graph` instead of `features` for efficiency.\n", + " warnings.warn(\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding null issues ...\n", + "Finding label issues ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding outlier issues ...\n", + "Finding near_duplicate issues ...\n", + "Finding non_iid issues ...\n", + "Finding class_imbalance issues ...\n", + "Finding underperforming_group issues ...\n", + "\n", + "Audit complete. 
523 issues found in the dataset.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/sklearn/neighbors/_base.py:246: EfficiencyWarning: Precomputed sparse input was not sorted by row values. Use the function sklearn.neighbors.sort_graph_by_row_values to sort the input by row values, with warn_when_not_sorted=False to remove this warning.\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " issue_type score num_issues\n", + "0 null 1.000000 0\n", + "1 label 0.991400 52\n", + "2 outlier 0.356958 362\n", + "3 near_duplicate 0.619565 108\n", + "4 non_iid 0.000000 1\n", + "5 class_imbalance 0.500000 0\n", + "6 underperforming_group 0.651929 0" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " is_null_issue null_score is_label_issue label_score \\\n", + "0 False 1.0 False 0.999827 \n", + "1 False 1.0 False 0.998540 \n", + "2 False 1.0 False 0.942721 \n", + "3 False 1.0 False 0.999816 \n", + "4 False 1.0 False 0.997703 \n", + "... ... ... ... ... \n", + "4995 False 1.0 False 0.998646 \n", + "4996 False 1.0 False 0.894230 \n", + "4997 False 1.0 False 0.999100 \n", + "4998 False 1.0 False 0.986792 \n", + "4999 False 1.0 False 0.986776 \n", + "\n", + " is_outlier_issue outlier_score is_near_duplicate_issue \\\n", + "0 True 0.031217 False \n", + "1 False 0.530909 False \n", + "2 False 0.332824 False \n", + "3 False 0.474031 False \n", + "4 False 0.131466 False \n", + "... ... ... ... \n", + "4995 False 0.504755 False \n", + "4996 False 0.340986 False \n", + "4997 False 0.428545 False \n", + "4998 False 0.273710 True \n", + "4999 False 0.273524 True \n", + "\n", + " near_duplicate_score is_non_iid_issue non_iid_score \\\n", + "0 0.933716 False 0.627345 \n", + "1 0.296974 False 0.646765 \n", + "2 0.803246 False 0.625202 \n", + "3 0.706253 False 0.655108 \n", + "4 0.912389 False 0.639200 \n", + "... ... ... ... \n", + "4995 0.746777 False 0.680033 \n", + "4996 0.816472 False 0.640711 \n", + "4997 0.592421 False 0.658949 \n", + "4998 0.000000 False 0.618033 \n", + "4999 0.000000 False 0.618084 \n", + "\n", + " is_class_imbalance_issue class_imbalance_score \\\n", + "0 False 0.5 \n", + "1 False 0.5 \n", + "2 False 0.5 \n", + "3 False 0.5 \n", + "4 False 0.5 \n", + "... ... ... \n", + "4995 False 1.0 \n", + "4996 False 1.0 \n", + "4997 False 1.0 \n", + "4998 False 1.0 \n", + "4999 False 1.0 \n", + "\n", + " is_underperforming_group_issue underperforming_group_score \n", + "0 False 1.0 \n", + "1 False 1.0 \n", + "2 False 1.0 \n", + "3 False 1.0 \n", + "4 False 1.0 \n", + "... ... ... 
\n", + "4995 False 1.0 \n", + "4996 False 1.0 \n", + "4997 False 1.0 \n", + "4998 False 1.0 \n", + "4999 False 1.0 \n", + "\n", + "[5000 rows x 14 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from cleanlab import Datalab\n", + "\n", + "# Initialize Datalab with the dataset\n", + "lab = Datalab(data={\"X\": X, \"y\": y}, label_name=\"y\", task=\"classification\")\n", + "\n", + "# Perform issue detection using the kNN graph and predicted probabilities, when possible\n", + "lab.find_issues(knn_graph=knn_graph, pred_probs=pred_probs, features=X)\n", + "\n", + "# Collect the identified issues and a summary\n", + "issues = lab.get_issues()\n", + "issue_summary = lab.get_issue_summary()\n", + "\n", + "# Display the issues and summary\n", + "display(issue_summary)\n", + "display(issues)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Explanation:\n", + "\n", + "**Creating the kNN Graph:**\n", + "\n", + "- Compute the kNN graph using FAISS or another library, ensuring the self-points (points referring to themselves) are omitted from the neighbors.\n", + " - Some distance kernels or search algorithms (like those in FAISS) may return negative distances or suffer from numerical instability when comparing\n", + " points that are extremely close to each other. This can lead to incorrect results when constructing the kNN graph.\n", + " - **Note**: kNN graphs are generally poorly suited for detecting exact duplicates, especially when the number of exact duplicates exceeds the number of requested neighbors. The strengths of this data structure lie in the assumption that data points are similar but not identical, allowing efficient similarity searches and proximity-based analyses.\n", + " - If you are comfortable with exploring non-public API functions in the library, you can use the following helper function to ensure that exact duplicate sets are correctly represented in the kNN graph. 
Please note, this function is not officially supported and is not part of the public API:\n", + "\n", + " ```python\n", + " from cleanlab.internal.neighbor.knn_graph import correct_knn_graph\n", + "\n", + " knn_graph = correct_knn_graph(features=X_faiss, knn_graph=knn_graph)\n", + " ```\n", + "- You may need to handle self-points yourself with third-party libraries.\n", + "- Construct the CSR (Compressed Sparse Row) matrix from the distances and indices arrays.\n", + " - `Datalab` can automatically construct a kNN graph from a numerical `features` array if one is not provided, in an accurate and reliable manner.\n", + "- Sort the entries within each row of the kNN graph by their distance values, in ascending order.\n", + "\n", + "When using approximate kNN graphs, it is important to understand their strengths and limitations to apply them effectively." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "## Data Valuation\n", + "\n", + "In this section, we will show how to use `Datalab` to estimate how much each data point contributes to a trained classifier model. Data valuation helps you understand the importance of each data point, so you can identify which data points are more or less valuable for your machine learning models.\n", + "\n", + "We will use a text dataset for this example, but this approach can be applied to any dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Load and Prepare the Dataset\n", + "We will use a subset of the 20 Newsgroups dataset, which is a collection of newsgroup documents suitable for text classification tasks.\n", + "For demonstration purposes, we'll classify documents from two categories: \"alt.atheism\" and \"sci.space\"."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:46.182870Z", + "iopub.status.busy": "2024-06-25T23:02:46.182457Z", + "iopub.status.idle": "2024-06-25T23:02:54.903770Z", + "shell.execute_reply": "2024-06-25T23:02:54.903219Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
TextLabel
0: \\n: >> Please enlighten me. How is omnipote...alt.atheism
1In <19APR199320262420@kelvin.jpl.nasa.gov> baa...sci.space
2\\nHenry, I made the assumption that he who get...sci.space
3\\n\\n\\nNo. I estimate a 99 % probability the Ge...sci.space
4\\nLucky for them that the baby didn't have any...alt.atheism
\n", + "
" + ], + "text/plain": [ + " Text Label\n", + "0 : \\n: >> Please enlighten me. How is omnipote... alt.atheism\n", + "1 In <19APR199320262420@kelvin.jpl.nasa.gov> baa... sci.space\n", + "2 \\nHenry, I made the assumption that he who get... sci.space\n", + "3 \\n\\n\\nNo. I estimate a 99 % probability the Ge... sci.space\n", + "4 \\nLucky for them that the baby didn't have any... alt.atheism" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.datasets import fetch_20newsgroups\n", + "import pandas as pd\n", + "\n", + "# Load the 20 Newsgroups dataset\n", + "newsgroups_train = fetch_20newsgroups(subset='train', categories=['alt.atheism', 'sci.space'], remove=('headers', 'footers', 'quotes'))\n", + "\n", + "# Create a DataFrame with the text data and labels\n", + "df_text = pd.DataFrame({\"Text\": newsgroups_train.data, \"Label\": newsgroups_train.target})\n", + "df_text[\"Label\"] = df_text[\"Label\"].map({i: category for (i, category) in enumerate(newsgroups_train.target_names)})\n", + "\n", + "# Display the first few samples\n", + "df_text.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. Vectorize the Text Data\n", + "We will use a `TfidfVectorizer` to convert the text data into a numerical format suitable for machine learning models." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:54.905910Z", + "iopub.status.busy": "2024-06-25T23:02:54.905726Z", + "iopub.status.idle": "2024-06-25T23:02:55.048850Z", + "shell.execute_reply": "2024-06-25T23:02:55.048302Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.feature_extraction.text import TfidfVectorizer\n", + "\n", + "# Initialize the TfidfVectorizer\n", + "vectorizer = TfidfVectorizer()\n", + "\n", + "# Transform the text data into a feature matrix\n", + "X_vectorized = vectorizer.fit_transform(df_text[\"Text\"])\n", + "\n", + "# Convert the sparse matrix to a dense matrix\n", + "X = X_vectorized.toarray()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. Perform Data Valuation with Datalab\n", + "\n", + "Next, we will initialize `Datalab` and perform data valuation to assess the value of each data point in the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:55.051166Z", + "iopub.status.busy": "2024-06-25T23:02:55.050987Z", + "iopub.status.idle": "2024-06-25T23:02:56.380032Z", + "shell.execute_reply": "2024-06-25T23:02:56.379440Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding data_valuation issues ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Audit complete. 147 issues found in the dataset.\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_data_valuation_issuedata_valuation_score
0False0.500047
1False0.500093
2False0.500000
3False0.500047
4True0.499953
.........
1068False0.500000
1069False0.500000
1070False0.500047
1071False0.500000
1072False0.500000
\n", + "

1073 rows × 2 columns

\n", + "
" + ], + "text/plain": [ + " is_data_valuation_issue data_valuation_score\n", + "0 False 0.500047\n", + "1 False 0.500093\n", + "2 False 0.500000\n", + "3 False 0.500047\n", + "4 True 0.499953\n", + "... ... ...\n", + "1068 False 0.500000\n", + "1069 False 0.500000\n", + "1070 False 0.500047\n", + "1071 False 0.500000\n", + "1072 False 0.500000\n", + "\n", + "[1073 rows x 2 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from cleanlab import Datalab\n", + "\n", + "# Initialize Datalab with the dataset\n", + "lab = Datalab(data=df_text, label_name=\"Label\", task=\"classification\")\n", + "\n", + "# Perform data valuation\n", + "lab.find_issues(features=X, issue_types={\"data_valuation\": {}})\n", + "\n", + "# Collect the identified issues\n", + "data_valuation_issues = lab.get_issues(\"data_valuation\")\n", + "\n", + "# Display the data valuation issues\n", + "display(data_valuation_issues)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. (Optional) Visualize Data Valuation Scores\n", + "Let's visualize the data valuation scores across our dataset.\n", + "\n", + "Cleanlab's Shapley scores are transformed to lie between 0 and 1 such that a score below 0.5 indicates a negative contribution to the model's training performance, while a score above 0.5 indicates a positive contribution.\n", + "\n", + "By examining the scores across different classes, we can identify whether positive or negative contributions are disproportionately concentrated in a single class. This can help detect biases in the training data."
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:56.382251Z", + "iopub.status.busy": "2024-06-25T23:02:56.382062Z", + "iopub.status.idle": "2024-06-25T23:02:56.822231Z", + "shell.execute_reply": "2024-06-25T23:02:56.821635Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Prepare the data for plotting\n", + "plot_data = (\n", + " data_valuation_issues\n", + " # Optionally, add a 'given_label' column to distinguish between labels in the strip plot\n", + " .join(pd.DataFrame({\"given_label\": df_text[\"Label\"]}))\n", + ")\n", + "\n", + "# Plot strip plots of data valuation scores for each label\n", + "sns.stripplot(\n", + " data=plot_data,\n", + " x=\"data_valuation_score\",\n", + " hue=\"given_label\", # Comment out if no labels should be used in the visualization\n", + " dodge=True,\n", + " jitter=0.3,\n", + " alpha=0.5,\n", + ")\n", + "\n", + "plt.axvline(lab.info[\"data_valuation\"][\"threshold\"], color=\"red\", linestyle=\"--\", label=\"Issue Threshold\")\n", + "\n", + "plt.title(\"Strip plot of Data Valuation Scores by Label\")\n", + "plt.xlabel(\"Data Valuation Score\")\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Learn more about the data valuation issue type [here](issue_type_description.html#data-valuation-issue)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "## Find Underperforming Groups in a Dataset\n", + "\n", + "Here we will demonstrate how to use `Datalab` to identify subgroups in a dataset over which the ML model is producing consistently worse predictions than for the overall dataset.\n", + "\n", + "`Datalab` will automatically find underperforming groups if you provide numerical embeddings and predicted probabilities from any model.\n", + "For this section, we'll determine which data subgroups to consider ourselves, such as by using clustering.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.
Generate a Synthetic Dataset\n", + "\n", + "First, we will generate a synthetic dataset with blobs. This dataset will include some noisy labels in one of the blobs." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:56.824635Z", + "iopub.status.busy": "2024-06-25T23:02:56.824255Z", + "iopub.status.idle": "2024-06-25T23:02:56.833532Z", + "shell.execute_reply": "2024-06-25T23:02:56.832991Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import make_blobs\n", + "import numpy as np\n", + "\n", + "# Generate synthetic data with blobs\n", + "X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42, cluster_std=1.0, shuffle=False)\n", + "\n", + "# Add noise to the labels\n", + "n_noisy_labels = 30\n", + "y[:n_noisy_labels] = np.random.randint(0, 2, n_noisy_labels)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. Train a Classifier and Obtain Predicted Probabilities\n", + "\n", + "Next, we will train a basic classifier (you can use any type of model) and obtain predicted probabilities for the dataset using cross-validation." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:56.835660Z", + "iopub.status.busy": "2024-06-25T23:02:56.835332Z", + "iopub.status.idle": "2024-06-25T23:02:56.853730Z", + "shell.execute_reply": "2024-06-25T23:02:56.853276Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.model_selection import cross_val_predict\n", + "\n", + "# Obtain predicted probabilities using cross-validation\n", + "clf = LogisticRegression(random_state=0)\n", + "pred_probs = cross_val_predict(clf, X, y, cv=3, method=\"predict_proba\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. 
(Optional) Cluster the Data\n", + "\n", + "Datalab identifies meaningful data subgroups by automatically clustering your dataset.\n", + "You can optionally provide your own clusters to control this process. Here we show how to use KMeans clustering, but this manual clustering is entirely optional." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:56.856026Z", + "iopub.status.busy": "2024-06-25T23:02:56.855677Z", + "iopub.status.idle": "2024-06-25T23:02:57.073020Z", + "shell.execute_reply": "2024-06-25T23:02:57.072396Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.cluster import KMeans\n", + "from sklearn.metrics import silhouette_score\n", + "from sklearn.model_selection import GridSearchCV\n", + "\n", + "\n", + "# Function to use in GridSearchCV for silhouette score\n", + "def silhouette_scorer(estimator, X):\n", + " cluster_labels = estimator.fit_predict(X)\n", + " return silhouette_score(X, cluster_labels)\n", + "\n", + "\n", + "# Use GridSearchCV to determine the optimal number of clusters\n", + "param_grid = {\"n_clusters\": range(2, 10)}\n", + "grid_search = GridSearchCV(KMeans(random_state=0), param_grid, cv=3, scoring=silhouette_scorer)\n", + "grid_search.fit(X)\n", + "\n", + "# Get the best estimator and predict clusters\n", + "best_kmeans = grid_search.best_estimator_\n", + "cluster_ids = best_kmeans.fit_predict(X)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. Identify Underperforming Groups with Datalab\n", + "\n", + "We will use `Datalab` to find underperforming groups in the dataset based on the predicted probabilities and optionally the cluster assignments." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.075536Z", + "iopub.status.busy": "2024-06-25T23:02:57.075339Z", + "iopub.status.idle": "2024-06-25T23:02:57.094151Z", + "shell.execute_reply": "2024-06-25T23:02:57.093661Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding underperforming_group issues ...\n", + "\n", + "Audit complete. 11 issues found in the dataset.\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_underperforming_group_issueunderperforming_group_scoregiven_labelpredicted_label
3True0.32830800
6True0.32830810
7True0.32830800
8True0.32830810
13True0.32830810
14True0.32830810
15True0.32830810
21True0.32830810
22True0.32830810
28True0.32830801
31True0.32830801
\n", + "
" + ], + "text/plain": [ + " is_underperforming_group_issue underperforming_group_score given_label \\\n", + "3 True 0.328308 0 \n", + "6 True 0.328308 1 \n", + "7 True 0.328308 0 \n", + "8 True 0.328308 1 \n", + "13 True 0.328308 1 \n", + "14 True 0.328308 1 \n", + "15 True 0.328308 1 \n", + "21 True 0.328308 1 \n", + "22 True 0.328308 1 \n", + "28 True 0.328308 0 \n", + "31 True 0.328308 0 \n", + "\n", + " predicted_label \n", + "3 0 \n", + "6 0 \n", + "7 0 \n", + "8 0 \n", + "13 0 \n", + "14 0 \n", + "15 0 \n", + "21 0 \n", + "22 0 \n", + "28 1 \n", + "31 1 " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from cleanlab import Datalab\n", + "import pandas as pd\n", + "\n", + "# Initialize Datalab with the dataset\n", + "lab = Datalab(data={\"X\": X, \"y\": y}, label_name=\"y\", task=\"classification\")\n", + "\n", + "# Find issues related to underperforming groups, optionally using cluster_ids\n", + "lab.find_issues(\n", + " # features=X, # Uncomment this line if 'cluster_ids' is not provided to allow Datalab to run clustering automatically.\n", + " pred_probs=pred_probs,\n", + " issue_types={\n", + " \"underperforming_group\": {\n", + " \"threshold\": 0.75, # Set a custom threshold for identifying underperforming groups.\n", + " # The default threshold is lower, optimized for higher precision (fewer false positives),\n", + " # but for this toy example, a higher threshold increases sensitivity to underperforming groups.\n", + " \"cluster_ids\": cluster_ids # Optional: Provide cluster IDs if clustering is used.\n", + " # If not provided, Datalab will automatically run clustering under the hood.\n", + " # In that case, you need to provide the 'features' array as an additional argument.\n", + " },\n", + " },\n", + ")\n", + "\n", + "# Collect the identified issues\n", + "underperforming_group_issues = lab.get_issues(\"underperforming_group\").query(\"is_underperforming_group_issue\")\n", + "\n", + "# Display the issues along with
given and predicted labels\n", + "display(underperforming_group_issues.join(pd.DataFrame({\"given_label\": y, \"predicted_label\": pred_probs.argmax(axis=1)})))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 5. (Optional) Visualize the Results\n", + "\n", + "Finally, we will optionally visualize the dataset, highlighting the underperforming groups identified by `Datalab`." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.096346Z", + "iopub.status.busy": "2024-06-25T23:02:57.095904Z", + "iopub.status.idle": "2024-06-25T23:02:57.238370Z", + "shell.execute_reply": "2024-06-25T23:02:57.237758Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "# Plot the original data points\n", + "plt.scatter(X[:, 0], X[:, 1], c=y, cmap=\"tab10\")\n", + "\n", + "# Highlight the underperforming group (if any issues are detected)\n", + "if not underperforming_group_issues.empty:\n", + " plt.scatter(\n", + " X[underperforming_group_issues.index, 0], X[underperforming_group_issues.index, 1],\n", + " s=100, facecolors='none', edgecolors='r', alpha=0.3, label=\"Underperforming Group\", linewidths=2.0\n", + " )\n", + "else:\n", + " print(\"No underperforming group issues detected.\")\n", + "\n", + "# Add title and legend\n", + "plt.title(\"Underperforming Groups in the Dataset\")\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Learn more about the underperforming group issue type [here](issue_type_description.html#underperforming-group-issue)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Predefining Data Slices for Detecting Underperforming Groups\n", + "\n", + "Instead of clustering the data to determine what data slices are considered when detecting underperforming groups, you can define these slices yourself.\n", + "For example, in a tabular dataset, you can use the values of a categorical column as cluster IDs to predefine the relevant data subgroups/slices to consider. This allows you to focus on meaningful slices of your data defined by domain knowledge or specific attributes." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Load and Prepare the Dataset\n", + "\n", + "We'll work with a toy tabular dataset with several categorical and numerical columns, just to illustrate how to use predefined data slices for detecting underperforming groups."
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.240722Z", + "iopub.status.busy": "2024-06-25T23:02:57.240396Z", + "iopub.status.idle": "2024-06-25T23:02:57.250400Z", + "shell.execute_reply": "2024-06-25T23:02:57.249939Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
AgeGenderLocationEducationExperienceHighSalary
060OtherIndianaPhD210
150MaleIndianaBachelor's210
236FemaleIndianaPhD210
364MaleKansasHigh School371
429MaleKansasPhD140
.....................
7044OtherIndianaPhD291
7161MaleKansasHigh School10
7242MaleOhioPhD270
7337OtherIndianaPhD220
7439OtherKansasMaster's210
\n", + "

75 rows × 6 columns

\n", + "
" + ], + "text/plain": [ + " Age Gender Location Education Experience HighSalary\n", + "0 60 Other Indiana PhD 21 0\n", + "1 50 Male Indiana Bachelor's 21 0\n", + "2 36 Female Indiana PhD 21 0\n", + "3 64 Male Kansas High School 37 1\n", + "4 29 Male Kansas PhD 14 0\n", + ".. ... ... ... ... ... ...\n", + "70 44 Other Indiana PhD 29 1\n", + "71 61 Male Kansas High School 1 0\n", + "72 42 Male Ohio PhD 27 0\n", + "73 37 Other Indiana PhD 22 0\n", + "74 39 Other Kansas Master's 21 0\n", + "\n", + "[75 rows x 6 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Define the dataset as a multi-line string\n", + "dataset_tsv = \"\"\"\n", + "Age\tGender\tLocation\tEducation\tExperience\tHighSalary\n", + "60\tOther\tIndiana\tPhD\t21\t0\n", + "50\tMale\tIndiana\tBachelor's\t21\t0\n", + "36\tFemale\tIndiana\tPhD\t21\t0\n", + "64\tMale\tKansas\tHigh School\t37\t1\n", + "29\tMale\tKansas\tPhD\t14\t0\n", + "42\tMale\tOhio\tPhD\t7\t0\n", + "60\tMale\tKansas\tHigh School\t26\t0\n", + "40\tOther\tOhio\tBachelor's\t25\t0\n", + "44\tMale\tIndiana\tHigh School\t29\t0\n", + "32\tMale\tOhio\tPhD\t17\t0\n", + "32\tMale\tKansas\tBachelor's\t17\t0\n", + "45\tOther\tOhio\tPhD\t30\t0\n", + "57\tMale\tCalifornia\tHigh School\t27\t1\n", + "61\tMale\tKansas\tHigh School\t32\t0\n", + "45\tOther\tIndiana\tPhD\t4\t0\n", + "24\tOther\tKansas\tBachelor's\t9\t0\n", + "43\tOther\tOhio\tMaster's\t3\t0\n", + "23\tMale\tOhio\tHigh School\t8\t0\n", + "45\tOther\tKansas\tHigh School\t16\t0\n", + "51\tOther\tOhio\tMaster's\t27\t0\n", + "59\tMale\tOhio\tMaster's\t29\t0\n", + "23\tOther\tIndiana\tBachelor's\t8\t0\n", + "42\tMale\tKansas\tPhD\t5\t0\n", + "54\tFemale\tKansas\tMaster's\t34\t0\n", + "33\tOther\tKansas\tPhD\t18\t0\n", + "43\tFemale\tKansas\tPhD\t23\t0\n", + "46\tMale\tOhio\tBachelor's\t28\t0\n", + "48\tOther\tOhio\tPhD\t30\t0\n", + "63\tMale\tKansas\tHigh School\t34\t0\n", + "49\tFemale\tKansas\tPhD\t32\t1\n", + "37\tMale\tKansas\tPhD\t20\t0\n", + 
"36\tOther\tIndiana\tMaster's\t21\t1\n", + "24\tOther\tIndiana\tHigh School\t9\t0\n", + "58\tFemale\tKansas\tPhD\t32\t0\n", + "28\tMale\tCalifornia\tMaster's\t2\t0\n", + "42\tOther\tKansas\tBachelor's\t17\t0\n", + "30\tFemale\tCalifornia\tPhD\t15\t1\n", + "60\tOther\tOhio\tPhD\t30\t0\n", + "39\tOther\tKansas\tBachelor's\t2\t0\n", + "25\tMale\tOhio\tMaster's\t10\t0\n", + "46\tOther\tIndiana\tPhD\t23\t0\n", + "35\tMale\tIndiana\tBachelor's\t20\t0\n", + "30\tOther\tOhio\tHigh School\t15\t0\n", + "47\tFemale\tOhio\tMaster's\t22\t0\n", + "23\tOther\tOhio\tHigh School\t1\t0\n", + "41\tMale\tOhio\tHigh School\t26\t0\n", + "49\tMale\tKansas\tBachelor's\t1\t0\n", + "28\tFemale\tOhio\tMaster's\t13\t0\n", + "29\tOther\tKansas\tBachelor's\t14\t0\n", + "56\tOther\tIndiana\tBachelor's\t39\t1\n", + "35\tFemale\tOhio\tBachelor's\t20\t0\n", + "38\tOther\tCalifornia\tBachelor's\t8\t1\n", + "57\tOther\tOhio\tMaster's\t38\t1\n", + "61\tMale\tIndiana\tPhD\t28\t0\n", + "25\tOther\tIndiana\tHigh School\t10\t0\n", + "23\tOther\tKansas\tHigh School\t8\t0\n", + "27\tFemale\tOhio\tMaster's\t12\t0\n", + "63\tFemale\tIndiana\tHigh School\t23\t0\n", + "25\tMale\tIndiana\tMaster's\t10\t0\n", + "50\tOther\tOhio\tHigh School\t6\t0\n", + "39\tOther\tKansas\tBachelor's\t24\t0\n", + "47\tOther\tIndiana\tHigh School\t19\t0\n", + "55\tMale\tIndiana\tPhD\t0\t0\n", + "31\tMale\tOhio\tPhD\t7\t0\n", + "57\tFemale\tKansas\tPhD\t15\t0\n", + "35\tMale\tCalifornia\tPhD\t13\t0\n", + "52\tOther\tOhio\tPhD\t11\t0\n", + "36\tOther\tOhio\tMaster's\t21\t0\n", + "29\tMale\tIndiana\tMaster's\t14\t0\n", + "35\tOther\tIndiana\tHigh School\t20\t0\n", + "44\tOther\tIndiana\tPhD\t29\t1\n", + "61\tMale\tKansas\tHigh School\t1\t0\n", + "42\tMale\tOhio\tPhD\t27\t0\n", + "37\tOther\tIndiana\tPhD\t22\t0\n", + "39\tOther\tKansas\tMaster's\t21\t0\n", + "\"\"\"\n", + "\n", + "# Import necessary libraries\n", + "from io import StringIO\n", + "import pandas as pd\n", + "\n", + "# Load the dataset into a DataFrame\n", + "df = 
pd.read_csv(\n", + " StringIO(dataset_tsv),\n", + " sep='\\t',\n", + ")\n", + "\n", + "# Display the original DataFrame\n", + "display(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Optional**: The categorical features of the dataset can be encoded to numerical values for easier modeling. For simplicity, we will use `OrdinalEncoder` from [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.252430Z", + "iopub.status.busy": "2024-06-25T23:02:57.252097Z", + "iopub.status.idle": "2024-06-25T23:02:57.261286Z", + "shell.execute_reply": "2024-06-25T23:02:57.260785Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
AgeGenderLocationEducationExperienceHighSalary
060213210
150110210
236013210
364121371
429123140
.....................
7044213291
716112110
7242133270
7337213220
7439222210
\n", + "

75 rows × 6 columns

\n", + "
" + ], + "text/plain": [ + " Age Gender Location Education Experience HighSalary\n", + "0 60 2 1 3 21 0\n", + "1 50 1 1 0 21 0\n", + "2 36 0 1 3 21 0\n", + "3 64 1 2 1 37 1\n", + "4 29 1 2 3 14 0\n", + ".. ... ... ... ... ... ...\n", + "70 44 2 1 3 29 1\n", + "71 61 1 2 1 1 0\n", + "72 42 1 3 3 27 0\n", + "73 37 2 1 3 22 0\n", + "74 39 2 2 2 21 0\n", + "\n", + "[75 rows x 6 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from sklearn.preprocessing import OrdinalEncoder\n", + "\n", + "# Encode the categorical columns\n", + "columns_to_encode = [\"Gender\", \"Location\", \"Education\"]\n", + "encoded_df = df.copy()\n", + "encoder = OrdinalEncoder(dtype=int)\n", + "encoded_df[columns_to_encode] = encoder.fit_transform(encoded_df[columns_to_encode])\n", + "\n", + "# Display the encoded DataFrame\n", + "display(encoded_df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. Train a Classifier and Obtain Predicted Probabilities\n", + "\n", + "Next, we will train a basic classifier (you can use any type of model) and obtain predicted probabilities for the dataset using cross-validation."
" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.263463Z", + "iopub.status.busy": "2024-06-25T23:02:57.263131Z", + "iopub.status.idle": "2024-06-25T23:02:57.290442Z", + "shell.execute_reply": "2024-06-25T23:02:57.290007Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.model_selection import cross_val_predict\n", + "\n", + "# Split data into features and labels\n", + "X = encoded_df.drop(columns=[\"HighSalary\"])\n", + "y = encoded_df[\"HighSalary\"]\n", + "\n", + "# Obtain predicted probabilities using cross-validation\n", + "clf = LogisticRegression(random_state=0)\n", + "pred_probs = cross_val_predict(clf, X, y, cv=3, method=\"predict_proba\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. Define a Data Slice\n", + "\n", + "For a tabular dataset, you can use a categorical column’s values as pre-computed data slices, so that Datalab skips its default clustering step and directly uses the encoded values for each row in the\n", + "dataset.\n", + "\n", + "For this example, we'll focus our attention on the `\"Location\"` column, which has 4 unique categorical values." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.292376Z", + "iopub.status.busy": "2024-06-25T23:02:57.292199Z", + "iopub.status.idle": "2024-06-25T23:02:57.295361Z", + "shell.execute_reply": "2024-06-25T23:02:57.294949Z" + } + }, + "outputs": [], + "source": [ + "cluster_ids = encoded_df[\"Location\"].to_numpy()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. Identify Underperforming Groups with Datalab\n", + "\n", + "Now use `Datalab` to detect underperforming groups in the dataset based on the model's predicted probabilities and our predefined data slices."
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.297209Z", + "iopub.status.busy": "2024-06-25T23:02:57.297040Z", + "iopub.status.idle": "2024-06-25T23:02:57.315750Z", + "shell.execute_reply": "2024-06-25T23:02:57.315304Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding underperforming_group issues ...\n", + "\n", + "Audit complete. 5 issues found in the dataset.\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_underperforming_group_issueunderperforming_group_scoregiven_labelpredicted_label
12True0.57368111
34True0.57368100
36True0.57368110
51True0.57368110
65True0.57368100
\n", + "
" + ], + "text/plain": [ + " is_underperforming_group_issue underperforming_group_score given_label \\\n", + "12 True 0.573681 1 \n", + "34 True 0.573681 0 \n", + "36 True 0.573681 1 \n", + "51 True 0.573681 1 \n", + "65 True 0.573681 0 \n", + "\n", + " predicted_label \n", + "12 1 \n", + "34 0 \n", + "36 0 \n", + "51 0 \n", + "65 0 " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from cleanlab import Datalab\n", + "\n", + "# Initialize Datalab with the dataset\n", + "lab = Datalab(data=df, label_name=\"HighSalary\", task=\"classification\")\n", + "\n", + "# Find issues related to underperforming groups, optionally using cluster_ids\n", + "lab.find_issues(\n", + " # features=X # Uncomment this line if 'cluster_ids' is not provided to allow Datalab to run clustering automatically.\n", + " pred_probs=pred_probs,\n", + " issue_types={\n", + " \"underperforming_group\": {\n", + " \"threshold\": 0.75, # Set a custom threshold for identifying underperforming groups.\n", + " # The default threshold is lower, optimized for higher precision (fewer false positives),\n", + " # but for this toy example, a higher threshold increases sensitivity to underperforming groups.\n", + " \"cluster_ids\": cluster_ids # Optional: Provide cluster IDs if manual data-slicing is used.\n", + " # If not provided, Datalab will automatically run clustering under the hood.\n", + " # In that case, you need to provide the 'features' array as an additional argument.\n", + " },\n", + " },\n", + ")\n", + "\n", + "# Collect the identified issues\n", + "underperforming_group_issues = lab.get_issues(\"underperforming_group\").query(\"is_underperforming_group_issue\")\n", + "\n", + "# Display the issues along with given and predicted labels\n", + "display(underperforming_group_issues.join(pd.DataFrame({\"given_label\": y, \"predicted_label\": pred_probs.argmax(axis=1)})))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + 
}, + "source": [ + "## Detect if your dataset is non-IID\n", + "\n", + "Here we demonstrate how to discover when your data might violate the foundational IID assumption that underpins most machine learning and analytics.\n", + "Common violations (that can be caught with `Datalab`) include: data drift, or lack of statistical independence where different data points affect one another.\n", + "For this demonstration, we'll work with a 2D dataset where the data points are not independent." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Load and Prepare the Dataset\n", + "\n", + "For simplicity, we'll just work with a numerical dataset. If your data are not numerical, we recommend using numeric model embeddings of the data.\n", + "\n", + "This issue check is automatically run by `Datalab` whenever you provide numerical data embeddings or predicted probabilities." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.317849Z", + "iopub.status.busy": "2024-06-25T23:02:57.317529Z", + "iopub.status.idle": "2024-06-25T23:02:57.321663Z", + "shell.execute_reply": "2024-06-25T23:02:57.321247Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "# Set seed for reproducibility\n", + "np.random.seed(0)\n", + "\n", + "\n", + "def generate_data_dependent(num_samples):\n", + " a1, a2, a3 = 0.6, 0.375, -0.975\n", + " X = [np.random.normal(1, 1, 2) for _ in range(3)]\n", + " X.extend(a1 * X[i-1] + a2 * X[i-2] + a3 * X[i-3] for i in range(3, num_samples))\n", + " return np.array(X)\n", + "\n", + "\n", + "X = generate_data_dependent(50)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. 
Detect Non-IID Issues Using Datalab" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.323592Z", + "iopub.status.busy": "2024-06-25T23:02:57.323421Z", + "iopub.status.idle": "2024-06-25T23:02:57.353261Z", + "shell.execute_reply": "2024-06-25T23:02:57.352757Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding non_iid issues ...\n", + "\n", + "Audit complete. 1 issues found in the dataset.\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_non_iid_issuenon_iid_score
0False0.796474
1False0.842432
2False0.922562
3False0.820759
4False0.873136
5False0.887373
6False0.825101
7False0.855875
8True0.751795
9False0.835796
\n", + "
" + ], + "text/plain": [ + " is_non_iid_issue non_iid_score\n", + "0 False 0.796474\n", + "1 False 0.842432\n", + "2 False 0.922562\n", + "3 False 0.820759\n", + "4 False 0.873136\n", + "5 False 0.887373\n", + "6 False 0.825101\n", + "7 False 0.855875\n", + "8 True 0.751795\n", + "9 False 0.835796" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from cleanlab import Datalab\n", + "\n", + "# Initialize Datalab with the dataset\n", + "lab = Datalab(data={\"X\": X})\n", + "\n", + "# Perform data valuation\n", + "lab.find_issues(features=X, issue_types={\"non_iid\": {}})\n", + "\n", + "# Collect the identified issues\n", + "non_iid_issues = lab.get_issues(\"non_iid\")\n", + "\n", + "# Display the non-iid issues\n", + "display(non_iid_issues.head(10))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. (Optional) Visualize the Results\n", + "\n", + "Finally, we'll visualize the dataset and highlight the non-iid issues detected by `Datalab`.\n", + "\n", + "Note that only the dataset as a whole can be considered to be non-iid, but no individual data point can be considered non-iid.\n", + "\n", + "To be compatible with `Datalab`, the point with the lowest non-iid score is assigned the `is_non_iid_issue` flag if the entire dataset\n", + "is considered non-iid." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.355690Z", + "iopub.status.busy": "2024-06-25T23:02:57.355207Z", + "iopub.status.idle": "2024-06-25T23:02:57.723727Z", + "shell.execute_reply": "2024-06-25T23:02:57.723145Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "# Plot the non-iid scores\n", + "non_iid_issues[\"non_iid_score\"].plot()\n", + "\n", + "# Highlight the point assigned as a non-iid issue\n", + "idx = non_iid_issues.query(\"is_non_iid_issue\").index\n", + "plt.scatter(idx, non_iid_issues.loc[idx, \"non_iid_score\"], color='red', label='Non-iid Issue', s=100)\n", + "plt.title(\"Non-iid Scores\")\n", + "plt.xlabel(\"Sample Index\")\n", + "plt.ylabel(\"Non-iid Score\")\n", + "plt.legend()\n", + "plt.show()\n", + "\n", + "# Visualize dataset ordering\n", + "plt.scatter(X[:, 0], X[:, 1], c=range(len(X)), cmap='coolwarm', s=100)\n", + "plt.title(\"Dataset with data-dependent ordering\")\n", + "plt.xlabel('Feature 1')\n", + "plt.ylabel('Feature 2')\n", + "\n", + "# Add colorbar\n", + "plt.colorbar(label='Sample Index')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These plots help visualize the non-iid scores for each data point and the dataset ordering, highlighting potential dependencies and issues.\n", + "\n", + "After detecting non-iid issues, you might be interested in quantifying the likelihood that your dataset is non-iid.\n", + "\n", + "To check if your data is non-iid, `Datalab` computes a p-value. A low p-value (close to 0) indicates strong evidence against the null hypothesis that the data is iid, either because the data appear to be drifting in distribution or inter-dependent across samples." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.726015Z", + "iopub.status.busy": "2024-06-25T23:02:57.725658Z", + "iopub.status.idle": "2024-06-25T23:02:57.728742Z", + "shell.execute_reply": "2024-06-25T23:02:57.728215Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "p-value: 0.0\n" + ] + } + ], + "source": [ + "print(\"p-value:\", lab.get_info(\"non_iid\")[\"p-value\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Learn more about the non-iid issue type [here](issue_type_description.html#non-iid-issue)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "## Catch Null Values in a Dataset\n", + "\n", + "Here we demonstrate how to use `Datalab` to catch null values in a dataset and visualize them. Models may learn incorrect patterns if null values are present, and may even error during model training. Dealing with null values can mitigate those risks. \n", + "\n", + "While `Datalab` automatically runs this check by default, this section dives deeper into how to detect the effect of null values in your dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Load the Dataset\n", + "\n", + "First, we will load the dataset into a Pandas DataFrame. For simplicity, we will use a dataset in TSV (tab-separated values) format.\n", + "Some care is needed when loading the dataset to ensure that the data is correctly parsed.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.730871Z", + "iopub.status.busy": "2024-06-25T23:02:57.730541Z", + "iopub.status.idle": "2024-06-25T23:02:57.743286Z", + "shell.execute_reply": "2024-06-25T23:02:57.742843Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
AgeGenderLocationAnnual_SpendingNumber_of_TransactionsLast_Purchase_Date
056.0OtherRural4099.623.02024-01-03
1NaNFemaleRural6421.165.0NaT
246.0MaleSuburban5436.553.02024-02-26
332.0FemaleRural4046.663.02024-03-23
460.0FemaleSuburban3467.676.02024-03-01
525.0FemaleSuburban4757.374.02024-01-03
638.0FemaleRural4199.536.02024-01-03
756.0MaleSuburban4991.716.02024-04-03
8NaNNaNNaNNaNNaNNaT
9NaNMaleRural4655.821.0NaT
1040.0FemaleRural5584.027.02024-03-29
1128.0FemaleUrban3102.322.02024-04-07
1228.0MaleRural6637.9911.02024-04-08
13NaNMaleUrban9167.474.02024-01-02
14NaNMaleRural6790.463.0NaT
15NaNOtherRural5327.968.02024-01-03
\n", + "
" + ], + "text/plain": [ + " Age Gender Location Annual_Spending Number_of_Transactions \\\n", + "0 56.0 Other Rural 4099.62 3.0 \n", + "1 NaN Female Rural 6421.16 5.0 \n", + "2 46.0 Male Suburban 5436.55 3.0 \n", + "3 32.0 Female Rural 4046.66 3.0 \n", + "4 60.0 Female Suburban 3467.67 6.0 \n", + "5 25.0 Female Suburban 4757.37 4.0 \n", + "6 38.0 Female Rural 4199.53 6.0 \n", + "7 56.0 Male Suburban 4991.71 6.0 \n", + "8 NaN NaN NaN NaN NaN \n", + "9 NaN Male Rural 4655.82 1.0 \n", + "10 40.0 Female Rural 5584.02 7.0 \n", + "11 28.0 Female Urban 3102.32 2.0 \n", + "12 28.0 Male Rural 6637.99 11.0 \n", + "13 NaN Male Urban 9167.47 4.0 \n", + "14 NaN Male Rural 6790.46 3.0 \n", + "15 NaN Other Rural 5327.96 8.0 \n", + "\n", + " Last_Purchase_Date \n", + "0 2024-01-03 \n", + "1 NaT \n", + "2 2024-02-26 \n", + "3 2024-03-23 \n", + "4 2024-03-01 \n", + "5 2024-01-03 \n", + "6 2024-01-03 \n", + "7 2024-04-03 \n", + "8 NaT \n", + "9 NaT \n", + "10 2024-03-29 \n", + "11 2024-04-07 \n", + "12 2024-04-08 \n", + "13 2024-01-02 \n", + "14 NaT \n", + "15 2024-01-03 " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Define the dataset as a multi-line string\n", + "dataset_tsv = \"\"\"\n", + "Age\tGender\tLocation\tAnnual_Spending\tNumber_of_Transactions\tLast_Purchase_Date\n", + "56.0\tOther\tRural\t4099.62\t3\t2024-01-03\n", + "NaN\tFemale\tRural\t6421.16\t5\tNaT\n", + "46.0\tMale\tSuburban\t5436.55\t3\t2024-02-26\n", + "32.0\tFemale\tRural\t4046.66\t3\t2024-03-23\n", + "60.0\tFemale\tSuburban\t3467.67\t6\t2024-03-01\n", + "25.0\tFemale\tSuburban\t4757.37\t4\t2024-01-03\n", + "38.0\tFemale\tRural\t4199.53\t6\t2024-01-03\n", + "56.0\tMale\tSuburban\t4991.71\t6\t2024-04-03\n", + "NaN\n", + "NaN\tMale\tRural\t4655.82\t1\tNaT\n", + "40.0\tFemale\tRural\t5584.02\t7\t2024-03-29\n", + "28.0\tFemale\tUrban\t3102.32\t2\t2024-04-07\n", + "28.0\tMale\tRural\t6637.99\t11\t2024-04-08\n", + "NaN\tMale\tUrban\t9167.47\t4\t2024-01-02\n", + 
"NaN\tMale\tRural\t6790.46\t3\tNaT\n", + "NaN\tOther\tRural\t5327.96\t8\t2024-01-03\n", + "\"\"\"\n", + "\n", + "# Import necessary libraries\n", + "from io import StringIO\n", + "import pandas as pd\n", + "\n", + "# Load the dataset into a DataFrame\n", + "df = pd.read_csv(\n", + " StringIO(dataset_tsv),\n", + " sep='\\t',\n", + " parse_dates=[\"Last_Purchase_Date\"],\n", + ")\n", + "\n", + "# Display the original DataFrame\n", + "display(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2: Encode Categorical Values\n", + "\n", + "The `features` argument to `Datalab.find_issues()` generally requires a numerical array.\n", + "Therefore, we need to numerically encode any categorical values. A common workflow is to encode categorical values in the dataset before passing it to the `find_issues` method (or provide model embeddings of the data instead of the data values themselves).\n", + "However, some encoding strategies may lose the original null values.\n", + "\n", + "Here's a strategy to encode categorical columns while keeping the original DataFrame structure intact:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.745202Z", + "iopub.status.busy": "2024-06-25T23:02:57.745027Z", + "iopub.status.idle": "2024-06-25T23:02:57.758585Z", + "shell.execute_reply": "2024-06-25T23:02:57.758128Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
AgeAnnual_SpendingNumber_of_TransactionsLast_Purchase_DateGender_encodedLocation_encoded
056.04099.623.02024-01-030.00.0
1NaN6421.165.0NaT1.00.0
246.05436.553.02024-02-262.01.0
332.04046.663.02024-03-231.00.0
460.03467.676.02024-03-011.01.0
525.04757.374.02024-01-031.01.0
638.04199.536.02024-01-031.00.0
756.04991.716.02024-04-032.01.0
8NaNNaNNaNNaTNaNNaN
9NaN4655.821.0NaT2.00.0
1040.05584.027.02024-03-291.00.0
1128.03102.322.02024-04-071.02.0
1228.06637.9911.02024-04-082.00.0
13NaN9167.474.02024-01-022.02.0
14NaN6790.463.0NaT2.00.0
15NaN5327.968.02024-01-030.00.0
\n", + "
" + ], + "text/plain": [ + " Age Annual_Spending Number_of_Transactions Last_Purchase_Date \\\n", + "0 56.0 4099.62 3.0 2024-01-03 \n", + "1 NaN 6421.16 5.0 NaT \n", + "2 46.0 5436.55 3.0 2024-02-26 \n", + "3 32.0 4046.66 3.0 2024-03-23 \n", + "4 60.0 3467.67 6.0 2024-03-01 \n", + "5 25.0 4757.37 4.0 2024-01-03 \n", + "6 38.0 4199.53 6.0 2024-01-03 \n", + "7 56.0 4991.71 6.0 2024-04-03 \n", + "8 NaN NaN NaN NaT \n", + "9 NaN 4655.82 1.0 NaT \n", + "10 40.0 5584.02 7.0 2024-03-29 \n", + "11 28.0 3102.32 2.0 2024-04-07 \n", + "12 28.0 6637.99 11.0 2024-04-08 \n", + "13 NaN 9167.47 4.0 2024-01-02 \n", + "14 NaN 6790.46 3.0 NaT \n", + "15 NaN 5327.96 8.0 2024-01-03 \n", + "\n", + " Gender_encoded Location_encoded \n", + "0 0.0 0.0 \n", + "1 1.0 0.0 \n", + "2 2.0 1.0 \n", + "3 1.0 0.0 \n", + "4 1.0 1.0 \n", + "5 1.0 1.0 \n", + "6 1.0 0.0 \n", + "7 2.0 1.0 \n", + "8 NaN NaN \n", + "9 2.0 0.0 \n", + "10 1.0 0.0 \n", + "11 1.0 2.0 \n", + "12 2.0 0.0 \n", + "13 2.0 2.0 \n", + "14 2.0 0.0 \n", + "15 0.0 0.0 " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Define a function to encode categorical columns\n", + "def encode_categorical_columns(df, columns, drop=True, inplace=False):\n", + " if not inplace:\n", + " df = df.copy()\n", + " for column in columns:\n", + " # Drop NaN values or replace them with a placeholder\n", + " categories = df[column].dropna().unique()\n", + "\n", + " # Create a mapping from categories to numbers\n", + " category_to_number = {category: idx for idx, category in enumerate(categories)}\n", + "\n", + " # Apply the mapping to the column\n", + " df[column + '_encoded'] = df[column].map(category_to_number)\n", + "\n", + " if drop:\n", + " df = df.drop(columns=columns)\n", + "\n", + " return df\n", + "\n", + "\n", + "# Encode the categorical columns\n", + "columns_to_encode = [\"Gender\", \"Location\"]\n", + "encoded_df = encode_categorical_columns(df, columns=columns_to_encode)\n", + "\n", + "# Display the encoded 
DataFrame\n", + "display(encoded_df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. Initialize Datalab\n", + "\n", + "Next, we initialize `Datalab` with the original DataFrame, which will help us discover all kinds of data issues." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.760490Z", + "iopub.status.busy": "2024-06-25T23:02:57.760309Z", + "iopub.status.idle": "2024-06-25T23:02:57.769925Z", + "shell.execute_reply": "2024-06-25T23:02:57.769513Z" + } + }, + "outputs": [], + "source": [ + "# Import the Datalab class from cleanlab\n", + "from cleanlab import Datalab\n", + "\n", + "# Initialize Datalab with the original DataFrame\n", + "lab = Datalab(data=df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. Detect Null Values\n", + "We will use the find_issues method from `Datalab` to detect null values in our dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.771977Z", + "iopub.status.busy": "2024-06-25T23:02:57.771682Z", + "iopub.status.idle": "2024-06-25T23:02:57.780979Z", + "shell.execute_reply": "2024-06-25T23:02:57.780451Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding null issues ...\n", + "\n", + "Audit complete. 1 issues found in the dataset.\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_null_issuenull_score
0False1.000000
1False0.666667
2False1.000000
3False1.000000
4False1.000000
5False1.000000
6False1.000000
7False1.000000
8True0.000000
9False0.666667
10False1.000000
11False1.000000
12False1.000000
13False0.833333
14False0.666667
15False0.833333
\n", + "
" + ], + "text/plain": [ + " is_null_issue null_score\n", + "0 False 1.000000\n", + "1 False 0.666667\n", + "2 False 1.000000\n", + "3 False 1.000000\n", + "4 False 1.000000\n", + "5 False 1.000000\n", + "6 False 1.000000\n", + "7 False 1.000000\n", + "8 True 0.000000\n", + "9 False 0.666667\n", + "10 False 1.000000\n", + "11 False 1.000000\n", + "12 False 1.000000\n", + "13 False 0.833333\n", + "14 False 0.666667\n", + "15 False 0.833333" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Detect issues in the dataset, focusing on null values\n", + "lab.find_issues(features=encoded_df, issue_types={\"null\": {}})\n", + "\n", + "# Display the identified issues\n", + "null_issues = lab.get_issues(\"null\")\n", + "display(null_issues)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 5. Sort the Dataset by Null Issues\n", + "\n", + "To better understand the impact of null values, we will sort the original DataFrame by the `null_score` from the `null_issues` DataFrame.\n", + "\n", + "This score indicates the severity of null issues for each row." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.783021Z", + "iopub.status.busy": "2024-06-25T23:02:57.782725Z", + "iopub.status.idle": "2024-06-25T23:02:57.786338Z", + "shell.execute_reply": "2024-06-25T23:02:57.785895Z" + } + }, + "outputs": [], + "source": [ + "# Sort the issues DataFrame by 'null_score' and get the sorted indices\n", + "sorted_indices = (\n", + " null_issues\n", + " .sort_values(\"null_score\")\n", + " .index\n", + ")\n", + "\n", + "# Sort the original DataFrame based on the sorted indices from the issues DataFrame\n", + "sorted_df = df.loc[sorted_indices]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6. 
(Optional) Visualize the Results\n", + "\n", + "Finally, we will create a nicely formatted DataFrame that highlights the null values and the issues detected by `Datalab`.\n", + "\n", + "We will use Pandas' styler to add custom styles for better visualization." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.788349Z", + "iopub.status.busy": "2024-06-25T23:02:57.788056Z", + "iopub.status.idle": "2024-06-25T23:02:57.847199Z", + "shell.execute_reply": "2024-06-25T23:02:57.846616Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + 
" \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    Age        Gender  Location  Annual_Spending  Number_of_Transactions  Last_Purchase_Date   |  is_null_issue  null_score
8   nan        nan     nan       nan              nan                     NaT                     True           0.000000
1   nan        Female  Rural     6421.160000      5.000000                NaT                     False          0.666667
9   nan        Male    Rural     4655.820000      1.000000                NaT                     False          0.666667
14  nan        Male    Rural     6790.460000      3.000000                NaT                     False          0.666667
13  nan        Male    Urban     9167.470000      4.000000                2024-01-02 00:00:00     False          0.833333
15  nan        Other   Rural     5327.960000      8.000000                2024-01-03 00:00:00     False          0.833333
0   56.000000  Other   Rural     4099.620000      3.000000                2024-01-03 00:00:00     False          1.000000
2   46.000000  Male    Suburban  5436.550000      3.000000                2024-02-26 00:00:00     False          1.000000
3   32.000000  Female  Rural     4046.660000      3.000000                2024-03-23 00:00:00     False          1.000000
4   60.000000  Female  Suburban  3467.670000      6.000000                2024-03-01 00:00:00     False          1.000000
5   25.000000  Female  Suburban  4757.370000      4.000000                2024-01-03 00:00:00     False          1.000000
6   38.000000  Female  Rural     4199.530000      6.000000                2024-01-03 00:00:00     False          1.000000
7   56.000000  Male    Suburban  4991.710000      6.000000                2024-04-03 00:00:00     False          1.000000
10  40.000000  Female  Rural     5584.020000      7.000000                2024-03-29 00:00:00     False          1.000000
11  28.000000  Female  Urban     3102.320000      2.000000                2024-04-07 00:00:00     False          1.000000
12  28.000000  Male    Rural     6637.990000      11.000000               2024-04-08 00:00:00     False          1.000000
\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create a column of separators\n", + "separator = pd.DataFrame([''] * len(sorted_df), columns=['|'])\n", + "\n", + "# Join the sorted DataFrame, separator, and issues DataFrame\n", + "combined_df = pd.concat([sorted_df, separator, null_issues], axis=1)\n", + "\n", + "\n", + "# Define functions to highlight null values and Datalab columns\n", + "def highlight_null_values(val):\n", + " if pd.isnull(val):\n", + " return 'background-color: yellow'\n", + " return ''\n", + "\n", + "\n", + "def highlight_datalab_columns(column):\n", + " return 'background-color: lightblue'\n", + "\n", + "\n", + "def highlight_is_null_issue(val):\n", + " if val:\n", + " return 'background-color: orange'\n", + " return ''\n", + "\n", + "\n", + "# Apply styles to the combined DataFrame\n", + "styled_df = (\n", + " combined_df\n", + " .style.map(highlight_null_values) # Highlight null and NaT values\n", + " .map(highlight_datalab_columns, subset=null_issues.columns) # Highlight columns provided by Datalab\n", + " .map(highlight_is_null_issue, subset=['is_null_issue']) # Highlight rows with null issues\n", + ")\n", + "\n", + "# Display the styled DataFrame\n", + "display(styled_df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Learn more about the null issue type [here](issue_type_description.html#null-issue)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true + }, + "source": [ + "## Detect class imbalance in your dataset\n", + "\n", + "Here we consider class imbalance, a common issue when working with datasets where one or more classes is significantly rarer than the others. Class imbalance can cause models to become biased towards frequent classes, but detecting this issue can help inform adjustments for fairer and more reliable predictions." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Prepare data\n", + "\n", + "Here we work with a fixed toy dataset with randomly generated labels. For this issue type, providing the labels alone is enough; no additional features of the dataset are needed." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.849591Z", + "iopub.status.busy": "2024-06-25T23:02:57.849139Z", + "iopub.status.idle": "2024-06-25T23:02:57.854947Z", + "shell.execute_reply": "2024-06-25T23:02:57.854415Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "labels = np.array(\n", + " ['c', 'c', 'c', 'b', 'b', 'c', 'c', 'b', 'c', 'b', 'b', 'b', 'b',\n", + " 'c', 'c', 'b', 'c', 'b', 'c', 'b', 'b', 'b', 'a', 'c', 'b', 'c',\n", + " 'c', 'b', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'b', 'c', 'a', 'b',\n", + " 'c', 'b', 'b', 'b', 'c', 'b', 'c', 'b', 'c', 'b', 'b', 'c', 'c',\n", + " 'b', 'c', 'b', 'b', 'b', 'b', 'c', 'c', 'b', 'b', 'b', 'b', 'b',\n", + " 'c', 'c', 'c', 'b', 'b', 'c', 'b', 'b', 'c', 'b', 'c', 'c', 'b',\n", + " 'c', 'c', 'c', 'b', 'c', 'b', 'b', 'b', 'c', 'b', 'b', 'c', 'b',\n", + " 'b', 'b', 'b', 'c', 'b', 'b', 'c', 'b', 'c', 'b', 'b', 'b', 'b',\n", + " 'c', 'c', 'c', 'c', 'c', 'b', 'c', 'b', 'b', 'a', 'b', 'c', 'b',\n", + " 'c', 'b', 'c', 'c', 'b', 'b', 'c', 'c', 'b', 'c', 'c', 'b', 'b',\n", + " 'c', 'c', 'c', 'c', 'c', 'b', 'b', 'c', 'c', 'b', 'c', 'c', 'b',\n", + " 'c', 'b', 'b', 'b', 'c', 'b', 'b', 'c', 'b', 'b', 'c', 'b', 'b',\n", + " 'b', 'b', 'b', 'c', 'c', 'b', 'b', 'b', 'c', 'a', 'b', 'b', 'c',\n", + " 'c', 'c', 'c', 'b', 'b', 'c', 'b', 'c', 'c', 'c', 'c', 'c', 'c',\n", + " 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'b', 'c', 'b', 'b', 'c', 'b',\n", + " 'b', 'b', 'b', 'b', 'c'],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.
Detect class imbalance with Datalab" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.857123Z", + "iopub.status.busy": "2024-06-25T23:02:57.856800Z", + "iopub.status.idle": "2024-06-25T23:02:57.867522Z", + "shell.execute_reply": "2024-06-25T23:02:57.866974Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding class_imbalance issues ...\n", + "\n", + "Audit complete. 4 issues found in the dataset.\n" + ] + } + ], + "source": [ + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(data={\"label\": labels}, label_name=\"label\", task=\"classification\")\n", + "\n", + "lab.find_issues(issue_types={\"class_imbalance\": {}})\n", + "\n", + "class_imbalance_issues = lab.get_issues(\"class_imbalance\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. (Optional) Visualize class imbalance issues" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:57.869699Z", + "iopub.status.busy": "2024-06-25T23:02:57.869301Z", + "iopub.status.idle": "2024-06-25T23:02:58.084427Z", + "shell.execute_reply": "2024-06-25T23:02:58.083862Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.figure(figsize=(8, 6))\n", + "\n", + "# Plot the distribution of labels in the dataset\n", + "ax = sns.countplot(x=\"given_label\", data=class_imbalance_issues, order=[\"a\", \"b\", \"c\"], hue=\"is_class_imbalance_issue\")\n", + "plt.title(\"Distribution of Labels\", fontsize=16)\n", + "plt.ylabel(\"Count\", fontsize=14)\n", + "plt.xlabel(\"Given Label\", fontsize=14)\n", + "plt.xticks(fontsize=14, rotation=0)\n", + "plt.yticks(fontsize=14, rotation=0)\n", + "\n", + "# Annotate plot with score of each issue class\n", + "for i, given_label in enumerate([\"a\", \"b\", \"c\"]):\n", + " filtered_df = class_imbalance_issues.query(\"given_label == @given_label\")\n", + " score = filtered_df[\"class_imbalance_score\"].mean()\n", + " y = len(filtered_df)\n", + " plt.annotate(f\"{round(score, 5)}\", xy=(i, y), ha=\"center\", va=\"bottom\", fontsize=14, color=\"red\")\n", + "\n", + "# Add textual annotation to explain the scores\n", + "plt.text(0.1, max(ax.get_yticks()) * 0.35, \"Numbers on top of\\nbars indicate class\\nimbalance scores\", ha='center', fontsize=12, color='red')\n", + "\n", + "# Adjust the legend\n", + "handles, labels = ax.get_legend_handles_labels()\n", + "ax.legend(handles, [\"No Class Imbalance\", \"Class Imbalance\"], title=\"Class Imbalance Issue\", fontsize=12, title_fontsize='14')\n", + "\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:02:58.086814Z", + "iopub.status.busy": "2024-06-25T23:02:58.086463Z", + "iopub.status.idle": "2024-06-25T23:02:58.094082Z", + "shell.execute_reply": "2024-06-25T23:02:58.093532Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or 
Colab, please ignore it.\n", + "\n", + "\n", + "# Only one example should suffer from null issues (others just have low scores)\n", + "assert set(null_issues.query(\"is_null_issue\").index) == {8}, \"Null issues are not as expected.\"\n", + "\n", + "# Ensure that the tutorial dataset finds underperforming group based on clustering results\n", + "assert underperforming_group_issues[\"is_underperforming_group_issue\"].sum() > 0, \"No underperforming group issues detected.\"\n", + "\n", + "# At least one of the top-ranked non-iid issues should be flagged\n", + "assert non_iid_issues.head(10).is_non_iid_issue.sum() > 0, \"No non-iid issues detected at the top of the non-iid issues.\"\n", + "\n", + "# Pre-computed knn-graph section looks for the following issue types, except non-iid\n", + "assert {\"null\", \"label\", \"outlier\", \"near_duplicate\", \"non_iid\", \"class_imbalance\", \"underperforming_group\"}.issuperset(issue_summary[\"issue_type\"]), \"Issue types are not as expected.\"\n", + "\n", + "# Ensure that the class imbalance scores are correct\n", + "assert all(class_imbalance_issues.query(\"is_class_imbalance_issue\")[\"class_imbalance_score\"] == 0.02), \"Class imbalance issue scores are not as expected\"\n", + "assert all(class_imbalance_issues.query(\"not is_class_imbalance_issue\")[\"class_imbalance_score\"] == 1.0), \"Class imbalance issue scores are not as expected\"" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/dataset_health.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/dataset_health.ipynb new file mode 100644 index 000000000..993f219c3 --- /dev/null
+++ b/v2.6.6/.doctrees/nbsphinx/tutorials/dataset_health.ipynb @@ -0,0 +1,3112 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "uKlKumjJyIAL" + }, + "source": [ + "# Understanding Dataset-level Labeling Issues\n", + "\n", + "This 5-minute quickstart tutorial shows how `cleanlab.dataset.health_summary()` helps you automatically:\n", + "\n", + "- Score and rank the overall label quality of each class, useful for deciding whether to remove or keep certain classes.\n", + "- Identify overlapping classes that you can merge to make the learning task less ambiguous. Alternatively, use this information to refine your annotator instructions (e.g. more precisely defining the difference between two classes).\n", + "- Generate an overall dataset and label quality health score to track improvements in your labels over time as you clean your datasets.\n", + "\n", + "This tutorial does not study issues in individual data points, but rather global issues across the dataset. Much of the functionality demonstrated here can also be accessed via `Datalab.get_info()` when using Datalab to detect label issues." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have (out-of-sample) `pred_probs` from a model trained on your dataset? Run the code below to evaluate the overall health of your dataset and its labels.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.dataset import health_summary\n", + "\n", + "health_summary(labels, pred_probs)\n", + " \n", + "\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install dependencies and import them" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use pip to install all packages required for this tutorial as follows:\n", + "\n", + "```\n", + "!pip install requests\n", + "!pip install cleanlab\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:01.350610Z", + "iopub.status.busy": "2024-06-25T23:03:01.350436Z", + "iopub.status.idle": "2024-06-25T23:03:02.471635Z", + "shell.execute_reply": "2024-06-25T23:03:02.471073Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "# Package versions used: requests==2.28.0\n", + "\n", + "dependencies = [\"cleanlab\", \"requests\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + 
"cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:02.474395Z", + "iopub.status.busy": "2024-06-25T23:03:02.473977Z", + "iopub.status.idle": "2024-06-25T23:03:02.476834Z", + "shell.execute_reply": "2024-06-25T23:03:02.476395Z" + }, + "id": "_UvI80l42iyi" + }, + "outputs": [], + "source": [ + "import requests\n", + "import io\n", + "import cleanlab\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wd2FlGn4sL0V" + }, + "source": [ + "## Fetch the data (can skip these details)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
See the code for fetching data **(click to expand)**\n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "mnist_test_set = [\"0\", \"1\" ,\"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"]\n", + "imagenet_val_set = [\"tench\", \"goldfish\", \"great white shark\", \"tiger shark\", \"hammerhead shark\", \"electric ray\", \"stingray\", \"cock\", \"hen\", \"ostrich\", \"brambling\", \"goldfinch\", \"house finch\", \"junco\", \"indigo bunting\", \"American robin\", \"bulbul\", \"jay\", \"magpie\", \"chickadee\", \"American dipper\", \"kite\", \"bald eagle\", \"vulture\", \"great grey owl\", \"fire salamander\", \"smooth newt\", \"newt\", \"spotted salamander\", \"axolotl\", \"American bullfrog\", \"tree frog\", \"tailed frog\", \"loggerhead sea turtle\", \"leatherback sea turtle\", \"mud turtle\", \"terrapin\", \"box turtle\", \"banded gecko\", \"green iguana\", \"Carolina anole\", \"desert grassland whiptail lizard\", \"agama\", \"frilled-necked lizard\", \"alligator lizard\", \"Gila monster\", \"European green lizard\", \"chameleon\", \"Komodo dragon\", \"Nile crocodile\", \"American alligator\", \"triceratops\", \"worm snake\", \"ring-necked snake\", \"eastern hog-nosed snake\", \"smooth green snake\", \"kingsnake\", \"garter snake\", \"water snake\", \"vine snake\", \"night snake\", \"boa constrictor\", \"African rock python\", \"Indian cobra\", \"green mamba\", \"sea snake\", \"Saharan horned viper\", \"eastern diamondback rattlesnake\", \"sidewinder\", \"trilobite\", \"harvestman\", \"scorpion\", \"yellow garden spider\", \"barn spider\", \"European garden spider\", \"southern black widow\", \"tarantula\", \"wolf spider\", \"tick\", \"centipede\", \"black grouse\", \"ptarmigan\", \"ruffed grouse\", \"prairie grouse\", \"peacock\", \"quail\", \"partridge\", \"grey parrot\", \"macaw\", \"sulphur-crested cockatoo\", \"lorikeet\", \"coucal\", \"bee eater\", 
\"hornbill\", \"hummingbird\", \"jacamar\", \"toucan\", \"duck\", \"red-breasted merganser\", \"goose\", \"black swan\", \"tusker\", \"echidna\", \"platypus\", \"wallaby\", \"koala\", \"wombat\", \"jellyfish\", \"sea anemone\", \"brain coral\", \"flatworm\", \"nematode\", \"conch\", \"snail\", \"slug\", \"sea slug\", \"chiton\", \"chambered nautilus\", \"Dungeness crab\", \"rock crab\", \"fiddler crab\", \"red king crab\", \"American lobster\", \"spiny lobster\", \"crayfish\", \"hermit crab\", \"isopod\", \"white stork\", \"black stork\", \"spoonbill\", \"flamingo\", \"little blue heron\", \"great egret\", \"bittern\", \"crane (bird)\", \"limpkin\", \"common gallinule\", \"American coot\", \"bustard\", \"ruddy turnstone\", \"dunlin\", \"common redshank\", \"dowitcher\", \"oystercatcher\", \"pelican\", \"king penguin\", \"albatross\", \"grey whale\", \"killer whale\", \"dugong\", \"sea lion\", \"Chihuahua\", \"Japanese Chin\", \"Maltese\", \"Pekingese\", \"Shih Tzu\", \"King Charles Spaniel\", \"Papillon\", \"toy terrier\", \"Rhodesian Ridgeback\", \"Afghan Hound\", \"Basset Hound\", \"Beagle\", \"Bloodhound\", \"Bluetick Coonhound\", \"Black and Tan Coonhound\", \"Treeing Walker Coonhound\", \"English foxhound\", \"Redbone Coonhound\", \"borzoi\", \"Irish Wolfhound\", \"Italian Greyhound\", \"Whippet\", \"Ibizan Hound\", \"Norwegian Elkhound\", \"Otterhound\", \"Saluki\", \"Scottish Deerhound\", \"Weimaraner\", \"Staffordshire Bull Terrier\", \"American Staffordshire Terrier\", \"Bedlington Terrier\", \"Border Terrier\", \"Kerry Blue Terrier\", \"Irish Terrier\", \"Norfolk Terrier\", \"Norwich Terrier\", \"Yorkshire Terrier\", \"Wire Fox Terrier\", \"Lakeland Terrier\", \"Sealyham Terrier\", \"Airedale Terrier\", \"Cairn Terrier\", \"Australian Terrier\", \"Dandie Dinmont Terrier\", \"Boston Terrier\", \"Miniature Schnauzer\", \"Giant Schnauzer\", \"Standard Schnauzer\", \"Scottish Terrier\", \"Tibetan Terrier\", \"Australian Silky Terrier\", \"Soft-coated Wheaten 
Terrier\", \"West Highland White Terrier\", \"Lhasa Apso\", \"Flat-Coated Retriever\", \"Curly-coated Retriever\", \"Golden Retriever\", \"Labrador Retriever\", \"Chesapeake Bay Retriever\", \"German Shorthaired Pointer\", \"Vizsla\", \"English Setter\", \"Irish Setter\", \"Gordon Setter\", \"Brittany\", \"Clumber Spaniel\", \"English Springer Spaniel\", \"Welsh Springer Spaniel\", \"Cocker Spaniels\", \"Sussex Spaniel\", \"Irish Water Spaniel\", \"Kuvasz\", \"Schipperke\", \"Groenendael\", \"Malinois\", \"Briard\", \"Australian Kelpie\", \"Komondor\", \"Old English Sheepdog\", \"Shetland Sheepdog\", \"collie\", \"Border Collie\", \"Bouvier des Flandres\", \"Rottweiler\", \"German Shepherd Dog\", \"Dobermann\", \"Miniature Pinscher\", \"Greater Swiss Mountain Dog\", \"Bernese Mountain Dog\", \"Appenzeller Sennenhund\", \"Entlebucher Sennenhund\", \"Boxer\", \"Bullmastiff\", \"Tibetan Mastiff\", \"French Bulldog\", \"Great Dane\", \"St. Bernard\", \"husky\", \"Alaskan Malamute\", \"Siberian Husky\", \"Dalmatian\", \"Affenpinscher\", \"Basenji\", \"pug\", \"Leonberger\", \"Newfoundland\", \"Pyrenean Mountain Dog\", \"Samoyed\", \"Pomeranian\", \"Chow Chow\", \"Keeshond\", \"Griffon Bruxellois\", \"Pembroke Welsh Corgi\", \"Cardigan Welsh Corgi\", \"Toy Poodle\", \"Miniature Poodle\", \"Standard Poodle\", \"Mexican hairless dog\", \"grey wolf\", \"Alaskan tundra wolf\", \"red wolf\", \"coyote\", \"dingo\", \"dhole\", \"African wild dog\", \"hyena\", \"red fox\", \"kit fox\", \"Arctic fox\", \"grey fox\", \"tabby cat\", \"tiger cat\", \"Persian cat\", \"Siamese cat\", \"Egyptian Mau\", \"cougar\", \"lynx\", \"leopard\", \"snow leopard\", \"jaguar\", \"lion\", \"tiger\", \"cheetah\", \"brown bear\", \"American black bear\", \"polar bear\", \"sloth bear\", \"mongoose\", \"meerkat\", \"tiger beetle\", \"ladybug\", \"ground beetle\", \"longhorn beetle\", \"leaf beetle\", \"dung beetle\", \"rhinoceros beetle\", \"weevil\", \"fly\", \"bee\", \"ant\", \"grasshopper\", 
\"cricket\", \"stick insect\", \"cockroach\", \"mantis\", \"cicada\", \"leafhopper\", \"lacewing\", \"dragonfly\", \"damselfly\", \"red admiral\", \"ringlet\", \"monarch butterfly\", \"small white\", \"sulphur butterfly\", \"gossamer-winged butterfly\", \"starfish\", \"sea urchin\", \"sea cucumber\", \"cottontail rabbit\", \"hare\", \"Angora rabbit\", \"hamster\", \"porcupine\", \"fox squirrel\", \"marmot\", \"beaver\", \"guinea pig\", \"common sorrel\", \"zebra\", \"pig\", \"wild boar\", \"warthog\", \"hippopotamus\", \"ox\", \"water buffalo\", \"bison\", \"ram\", \"bighorn sheep\", \"Alpine ibex\", \"hartebeest\", \"impala\", \"gazelle\", \"dromedary\", \"llama\", \"weasel\", \"mink\", \"European polecat\", \"black-footed ferret\", \"otter\", \"skunk\", \"badger\", \"armadillo\", \"three-toed sloth\", \"orangutan\", \"gorilla\", \"chimpanzee\", \"gibbon\", \"siamang\", \"guenon\", \"patas monkey\", \"baboon\", \"macaque\", \"langur\", \"black-and-white colobus\", \"proboscis monkey\", \"marmoset\", \"white-headed capuchin\", \"howler monkey\", \"titi\", \"Geoffroy's spider monkey\", \"common squirrel monkey\", \"ring-tailed lemur\", \"indri\", \"Asian elephant\", \"African bush elephant\", \"red panda\", \"giant panda\", \"snoek\", \"eel\", \"coho salmon\", \"rock beauty\", \"clownfish\", \"sturgeon\", \"garfish\", \"lionfish\", \"pufferfish\", \"abacus\", \"abaya\", \"academic gown\", \"accordion\", \"acoustic guitar\", \"aircraft carrier\", \"airliner\", \"airship\", \"altar\", \"ambulance\", \"amphibious vehicle\", \"analog clock\", \"apiary\", \"apron\", \"waste container\", \"assault rifle\", \"backpack\", \"bakery\", \"balance beam\", \"balloon\", \"ballpoint pen\", \"Band-Aid\", \"banjo\", \"baluster\", \"barbell\", \"barber chair\", \"barbershop\", \"barn\", \"barometer\", \"barrel\", \"wheelbarrow\", \"baseball\", \"basketball\", \"bassinet\", \"bassoon\", \"swimming cap\", \"bath towel\", \"bathtub\", \"station wagon\", \"lighthouse\", \"beaker\", 
\"military cap\", \"beer bottle\", \"beer glass\", \"bell-cot\", \"bib\", \"tandem bicycle\", \"bikini\", \"ring binder\", \"binoculars\", \"birdhouse\", \"boathouse\", \"bobsleigh\", \"bolo tie\", \"poke bonnet\", \"bookcase\", \"bookstore\", \"bottle cap\", \"bow\", \"bow tie\", \"brass\", \"bra\", \"breakwater\", \"breastplate\", \"broom\", \"bucket\", \"buckle\", \"bulletproof vest\", \"high-speed train\", \"butcher shop\", \"taxicab\", \"cauldron\", \"candle\", \"cannon\", \"canoe\", \"can opener\", \"cardigan\", \"car mirror\", \"carousel\", \"tool kit\", \"carton\", \"car wheel\", \"automated teller machine\", \"cassette\", \"cassette player\", \"castle\", \"catamaran\", \"CD player\", \"cello\", \"mobile phone\", \"chain\", \"chain-link fence\", \"chain mail\", \"chainsaw\", \"chest\", \"chiffonier\", \"chime\", \"china cabinet\", \"Christmas stocking\", \"church\", \"movie theater\", \"cleaver\", \"cliff dwelling\", \"cloak\", \"clogs\", \"cocktail shaker\", \"coffee mug\", \"coffeemaker\", \"coil\", \"combination lock\", \"computer keyboard\", \"confectionery store\", \"container ship\", \"convertible\", \"corkscrew\", \"cornet\", \"cowboy boot\", \"cowboy hat\", \"cradle\", \"crane (machine)\", \"crash helmet\", \"crate\", \"infant bed\", \"Crock Pot\", \"croquet ball\", \"crutch\", \"cuirass\", \"dam\", \"desk\", \"desktop computer\", \"rotary dial telephone\", \"diaper\", \"digital clock\", \"digital watch\", \"dining table\", \"dishcloth\", \"dishwasher\", \"disc brake\", \"dock\", \"dog sled\", \"dome\", \"doormat\", \"drilling rig\", \"drum\", \"drumstick\", \"dumbbell\", \"Dutch oven\", \"electric fan\", \"electric guitar\", \"electric locomotive\", \"entertainment center\", \"envelope\", \"espresso machine\", \"face powder\", \"feather boa\", \"filing cabinet\", \"fireboat\", \"fire engine\", \"fire screen sheet\", \"flagpole\", \"flute\", \"folding chair\", \"football helmet\", \"forklift\", \"fountain\", \"fountain pen\", \"four-poster bed\", 
\"freight car\", \"French horn\", \"frying pan\", \"fur coat\", \"garbage truck\", \"gas mask\", \"gas pump\", \"goblet\", \"go-kart\", \"golf ball\", \"golf cart\", \"gondola\", \"gong\", \"gown\", \"grand piano\", \"greenhouse\", \"grille\", \"grocery store\", \"guillotine\", \"barrette\", \"hair spray\", \"half-track\", \"hammer\", \"hamper\", \"hair dryer\", \"hand-held computer\", \"handkerchief\", \"hard disk drive\", \"harmonica\", \"harp\", \"harvester\", \"hatchet\", \"holster\", \"home theater\", \"honeycomb\", \"hook\", \"hoop skirt\", \"horizontal bar\", \"horse-drawn vehicle\", \"hourglass\", \"iPod\", \"clothes iron\", \"jack-o'-lantern\", \"jeans\", \"jeep\", \"T-shirt\", \"jigsaw puzzle\", \"pulled rickshaw\", \"joystick\", \"kimono\", \"knee pad\", \"knot\", \"lab coat\", \"ladle\", \"lampshade\", \"laptop computer\", \"lawn mower\", \"lens cap\", \"paper knife\", \"library\", \"lifeboat\", \"lighter\", \"limousine\", \"ocean liner\", \"lipstick\", \"slip-on shoe\", \"lotion\", \"speaker\", \"loupe\", \"sawmill\", \"magnetic compass\", \"mail bag\", \"mailbox\", \"tights\", \"tank suit\", \"manhole cover\", \"maraca\", \"marimba\", \"mask\", \"match\", \"maypole\", \"maze\", \"measuring cup\", \"medicine chest\", \"megalith\", \"microphone\", \"microwave oven\", \"military uniform\", \"milk can\", \"minibus\", \"miniskirt\", \"minivan\", \"missile\", \"mitten\", \"mixing bowl\", \"mobile home\", \"Model T\", \"modem\", \"monastery\", \"monitor\", \"moped\", \"mortar\", \"square academic cap\", \"mosque\", \"mosquito net\", \"scooter\", \"mountain bike\", \"tent\", \"computer mouse\", \"mousetrap\", \"moving van\", \"muzzle\", \"nail\", \"neck brace\", \"necklace\", \"nipple\", \"notebook computer\", \"obelisk\", \"oboe\", \"ocarina\", \"odometer\", \"oil filter\", \"organ\", \"oscilloscope\", \"overskirt\", \"bullock cart\", \"oxygen mask\", \"packet\", \"paddle\", \"paddle wheel\", \"padlock\", \"paintbrush\", \"pajamas\", \"palace\", \"pan 
flute\", \"paper towel\", \"parachute\", \"parallel bars\", \"park bench\", \"parking meter\", \"passenger car\", \"patio\", \"payphone\", \"pedestal\", \"pencil case\", \"pencil sharpener\", \"perfume\", \"Petri dish\", \"photocopier\", \"plectrum\", \"Pickelhaube\", \"picket fence\", \"pickup truck\", \"pier\", \"piggy bank\", \"pill bottle\", \"pillow\", \"ping-pong ball\", \"pinwheel\", \"pirate ship\", \"pitcher\", \"hand plane\", \"planetarium\", \"plastic bag\", \"plate rack\", \"plow\", \"plunger\", \"Polaroid camera\", \"pole\", \"police van\", \"poncho\", \"billiard table\", \"soda bottle\", \"pot\", \"potter's wheel\", \"power drill\", \"prayer rug\", \"printer\", \"prison\", \"projectile\", \"projector\", \"hockey puck\", \"punching bag\", \"purse\", \"quill\", \"quilt\", \"race car\", \"racket\", \"radiator\", \"radio\", \"radio telescope\", \"rain barrel\", \"recreational vehicle\", \"reel\", \"reflex camera\", \"refrigerator\", \"remote control\", \"restaurant\", \"revolver\", \"rifle\", \"rocking chair\", \"rotisserie\", \"eraser\", \"rugby ball\", \"ruler\", \"running shoe\", \"safe\", \"safety pin\", \"salt shaker\", \"sandal\", \"sarong\", \"saxophone\", \"scabbard\", \"weighing scale\", \"school bus\", \"schooner\", \"scoreboard\", \"CRT screen\", \"screw\", \"screwdriver\", \"seat belt\", \"sewing machine\", \"shield\", \"shoe store\", \"shoji\", \"shopping basket\", \"shopping cart\", \"shovel\", \"shower cap\", \"shower curtain\", \"ski\", \"ski mask\", \"sleeping bag\", \"slide rule\", \"sliding door\", \"slot machine\", \"snorkel\", \"snowmobile\", \"snowplow\", \"soap dispenser\", \"soccer ball\", \"sock\", \"solar thermal collector\", \"sombrero\", \"soup bowl\", \"space bar\", \"space heater\", \"space shuttle\", \"spatula\", \"motorboat\", \"spider web\", \"spindle\", \"sports car\", \"spotlight\", \"stage\", \"steam locomotive\", \"through arch bridge\", \"steel drum\", \"stethoscope\", \"scarf\", \"stone wall\", \"stopwatch\", 
\"stove\", \"strainer\", \"tram\", \"stretcher\", \"couch\", \"stupa\", \"submarine\", \"suit\", \"sundial\", \"sunglass\", \"sunglasses\", \"sunscreen\", \"suspension bridge\", \"mop\", \"sweatshirt\", \"swimsuit\", \"swing\", \"switch\", \"syringe\", \"table lamp\", \"tank\", \"tape player\", \"teapot\", \"teddy bear\", \"television\", \"tennis ball\", \"thatched roof\", \"front curtain\", \"thimble\", \"threshing machine\", \"throne\", \"tile roof\", \"toaster\", \"tobacco shop\", \"toilet seat\", \"torch\", \"totem pole\", \"tow truck\", \"toy store\", \"tractor\", \"semi-trailer truck\", \"tray\", \"trench coat\", \"tricycle\", \"trimaran\", \"tripod\", \"triumphal arch\", \"trolleybus\", \"trombone\", \"tub\", \"turnstile\", \"typewriter keyboard\", \"umbrella\", \"unicycle\", \"upright piano\", \"vacuum cleaner\", \"vase\", \"vault\", \"velvet\", \"vending machine\", \"vestment\", \"viaduct\", \"violin\", \"volleyball\", \"waffle iron\", \"wall clock\", \"wallet\", \"wardrobe\", \"military aircraft\", \"sink\", \"washing machine\", \"water bottle\", \"water jug\", \"water tower\", \"whiskey jug\", \"whistle\", \"wig\", \"window screen\", \"window shade\", \"Windsor tie\", \"wine bottle\", \"wing\", \"wok\", \"wooden spoon\", \"wool\", \"split-rail fence\", \"shipwreck\", \"yawl\", \"yurt\", \"website\", \"comic book\", \"crossword\", \"traffic sign\", \"traffic light\", \"dust jacket\", \"menu\", \"plate\", \"guacamole\", \"consomme\", \"hot pot\", \"trifle\", \"ice cream\", \"ice pop\", \"baguette\", \"bagel\", \"pretzel\", \"cheeseburger\", \"hot dog\", \"mashed potato\", \"cabbage\", \"broccoli\", \"cauliflower\", \"zucchini\", \"spaghetti squash\", \"acorn squash\", \"butternut squash\", \"cucumber\", \"artichoke\", \"bell pepper\", \"cardoon\", \"mushroom\", \"Granny Smith\", \"strawberry\", \"orange\", \"lemon\", \"fig\", \"pineapple\", \"banana\", \"jackfruit\", \"custard apple\", \"pomegranate\", \"hay\", \"carbonara\", \"chocolate syrup\", 
\"dough\", \"meatloaf\", \"pizza\", \"pot pie\", \"burrito\", \"red wine\", \"espresso\", \"cup\", \"eggnog\", \"alp\", \"bubble\", \"cliff\", \"coral reef\", \"geyser\", \"lakeshore\", \"promontory\", \"shoal\", \"seashore\", \"valley\", \"volcano\", \"baseball player\", \"bridegroom\", \"scuba diver\", \"rapeseed\", \"daisy\", \"yellow lady's slipper\", \"corn\", \"acorn\", \"rose hip\", \"horse chestnut seed\", \"coral fungus\", \"agaric\", \"gyromitra\", \"stinkhorn mushroom\", \"earth star\", \"hen-of-the-woods\", \"bolete\", \"ear\", \"toilet paper\"]\n", + "cifar10_test_set = [\"airplane\", \"automobile\", \"bird\", \"cat\", \"deer\", \"dog\", \"frog\", \"horse\", \"ship\", \"truck\"]\n", + "cifar100_test_set = ['apple', 'aquarium_fish', 'baby', 'bear', 'beaver', 'bed', 'bee', 'beetle', 'bicycle', 'bottle', 'bowl', 'boy', 'bridge', 'bus', 'butterfly', 'camel', 'can', 'castle', 'caterpillar', 'cattle', 'chair', 'chimpanzee', 'clock', 'cloud', 'cockroach', 'couch', 'crab', 'crocodile', 'cup', 'dinosaur', 'dolphin', 'elephant', 'flatfish', 'forest', 'fox', 'girl', 'hamster', 'house', 'kangaroo', 'keyboard', 'lamp', 'lawn_mower', 'leopard', 'lion', 'lizard', 'lobster', 'man', 'maple_tree', 'motorcycle', 'mountain', 'mouse', 'mushroom', 'oak_tree', 'orange', 'orchid', 'otter', 'palm_tree', 'pear', 'pickup_truck', 'pine_tree', 'plain', 'plate', 'poppy', 'porcupine', 'possum', 'rabbit', 'raccoon', 'ray', 'road', 'rocket', 'rose', 'sea', 'seal', 'shark', 'shrew', 'skunk', 'skyscraper', 'snail', 'snake', 'spider', 'squirrel', 'streetcar', 'sunflower', 'sweet_pepper', 'table', 'tank', 'telephone', 'television', 'tiger', 'tractor', 'train', 'trout', 'tulip', 'turtle', 'wardrobe', 'whale', 'willow_tree', 'wolf', 'woman', 'worm']\n", + "caltech256 = [\"ak47\", \"american-flag\", \"backpack\", \"baseball-bat\", \"baseball-glove\", \"basketball-hoop\", \"bat\", \"bathtub\", \"bear\", \"beer-mug\", \"billiards\", \"binoculars\", \"birdbath\", \"blimp\", \"bonsai\", 
\"boom-box\", \"bowling-ball\", \"bowling-pin\", \"boxing-glove\", \"brain\", \"breadmaker\", \"buddha\", \"bulldozer\", \"butterfly\", \"cactus\", \"cake\", \"calculator\", \"camel\", \"cannon\", \"canoe\", \"car-tire\", \"cartman\", \"cd\", \"centipede\", \"cereal-box\", \"chandelier\", \"chess-board\", \"chimp\", \"chopsticks\", \"cockroach\", \"coffee-mug\", \"coffin\", \"coin\", \"comet\", \"computer-keyboard\", \"computer-monitor\", \"computer-mouse\", \"conch\", \"cormorant\", \"covered-wagon\", \"cowboy-hat\", \"crab\", \"desk-globe\", \"diamond-ring\", \"dice\", \"dog\", \"dolphin\", \"doorknob\", \"drinking-straw\", \"duck\", \"dumb-bell\", \"eiffel-tower\", \"electric-guitar\", \"elephant\", \"elk\", \"ewer\", \"eyeglasses\", \"fern\", \"fighter-jet\", \"fire-extinguisher\", \"fire-hydrant\", \"fire-truck\", \"fireworks\", \"flashlight\", \"floppy-disk\", \"football-helmet\", \"french-horn\", \"fried-egg\", \"frisbee\", \"frog\", \"frying-pan\", \"galaxy\", \"gas-pump\", \"giraffe\", \"goat\", \"golden-gate-bridge\", \"goldfish\", \"golf-ball\", \"goose\", \"gorilla\", \"grand-piano\", \"grapes\", \"grasshopper\", \"guitar-pick\", \"hamburger\", \"hammock\", \"harmonica\", \"harp\", \"harpsichord\", \"hawksbill\", \"head-phones\", \"helicopter\", \"hibiscus\", \"homer-simpson\", \"horse\", \"horseshoe-crab\", \"hot-air-balloon\", \"hot-dog\", \"hot-tub\", \"hourglass\", \"house-fly\", \"human-skeleton\", \"hummingbird\", \"ibis\", \"ice-cream-cone\", \"iguana\", \"ipod\", \"iris\", \"jesus-christ\", \"joy-stick\", \"kangaroo\", \"kayak\", \"ketch\", \"killer-whale\", \"knife\", \"ladder\", \"laptop\", \"lathe\", \"leopards\", \"license-plate\", \"lightbulb\", \"light-house\", \"lightning\", \"llama\", \"mailbox\", \"mandolin\", \"mars\", \"mattress\", \"megaphone\", \"menorah\", \"microscope\", \"microwave\", \"minaret\", \"minotaur\", \"motorbikes\", \"mountain-bike\", \"mushroom\", \"mussels\", \"necktie\", \"octopus\", \"ostrich\", \"owl\", 
\"palm-pilot\", \"palm-tree\", \"paperclip\", \"paper-shredder\", \"pci-card\", \"penguin\", \"people\", \"pez-dispenser\", \"photocopier\", \"picnic-table\", \"playing-card\", \"porcupine\", \"pram\", \"praying-mantis\", \"pyramid\", \"raccoon\", \"radio-telescope\", \"rainbow\", \"refrigerator\", \"revolver\", \"rifle\", \"rotary-phone\", \"roulette-wheel\", \"saddle\", \"saturn\", \"school-bus\", \"scorpion\", \"screwdriver\", \"segway\", \"self-propelled-lawn-mower\", \"sextant\", \"sheet-music\", \"skateboard\", \"skunk\", \"skyscraper\", \"smokestack\", \"snail\", \"snake\", \"sneaker\", \"snowmobile\", \"soccer-ball\", \"socks\", \"soda-can\", \"spaghetti\", \"speed-boat\", \"spider\", \"spoon\", \"stained-glass\", \"starfish\", \"steering-wheel\", \"stirrups\", \"sunflower\", \"superman\", \"sushi\", \"swan\", \"swiss-army-knife\", \"sword\", \"syringe\", \"tambourine\", \"teapot\", \"teddy-bear\", \"teepee\", \"telephone-box\", \"tennis-ball\", \"tennis-court\", \"tennis-racket\", \"theodolite\", \"toaster\", \"tomato\", \"tombstone\", \"top-hat\", \"touring-bike\", \"tower-pisa\", \"traffic-light\", \"treadmill\", \"triceratops\", \"tricycle\", \"trilobite\", \"tripod\", \"t-shirt\", \"tuning-fork\", \"tweezer\", \"umbrella\", \"unicorn\", \"vcr\", \"video-projector\", \"washing-machine\", \"watch\", \"waterfall\", \"watermelon\", \"welding-mask\", \"wheelbarrow\", \"windmill\", \"wine-bottle\", \"xylophone\", \"yarmulke\", \"yo-yo\", \"zebra\", \"airplanes\", \"car-side\", \"faces-easy\", \"greyhound\", \"tennis-shoes\", \"toad\"]\n", + "twenty_news_test_set = ['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc']\n", + 
"amazon = ['Negative', 'Neutral', 'Positive']\n", + "imdb_test_set = [\"Negative\", \"Positive\"]\n", + "\n", + "ALL_CLASSES = {\n", + " 'imagenet_val_set': imagenet_val_set,\n", + " 'caltech256': caltech256,\n", + " 'mnist_test_set': mnist_test_set,\n", + " 'cifar10_test_set': cifar10_test_set,\n", + " 'cifar100_test_set': cifar100_test_set,\n", + " 'imdb_test_set': imdb_test_set,\n", + " '20news_test_set': twenty_news_test_set,\n", + " 'amazon': amazon,\n", + "}\n", + "\n", + "\n", + "def _load_classes_predprobs_labels(dataset_name):\n", + " \"\"\"Helper function to load data from the labelerrors.com datasets.\"\"\"\n", + "\n", + " base = 'https://github.com/cleanlab/label-errors/raw/'\n", + " url_base = base + '5392f6c71473055060be3044becdde1cbc18284d'\n", + " url_labels = url_base + '/original_test_labels/{}_original_labels.npy'\n", + " url_probs = url_base + '/cross_validated_predicted_probabilities/{}_pyx.npy'\n", + " NUM_PARTS = {'amazon': 3, 'imagenet_val_set': 4} # pred_probs files broken up into parts for larger datasets\n", + "\n", + " response = requests.get(url_labels.format(dataset_name))\n", + " labels = np.load(io.BytesIO(response.content), allow_pickle=True)\n", + " if dataset_name in NUM_PARTS:\n", + " pred_probs_parts = []\n", + " for i in range(1, NUM_PARTS[dataset_name] + 1):\n", + " url = url_probs.format(dataset_name).replace(\n", + " '.npy',\n", + " f'.part{i}_of_{NUM_PARTS[dataset_name]}.npy',\n", + " )\n", + " response = requests.get(url)\n", + " pred_probs_parts.append(\n", + " np.load(io.BytesIO(response.content), allow_pickle=True))\n", + " pred_probs = np.vstack(pred_probs_parts)\n", + " else:\n", + " response = requests.get(url_probs.format(dataset_name))\n", + " pred_probs = np.load(io.BytesIO(response.content), allow_pickle=True)\n", + " print(f\"\\nLoaded the '{dataset_name}' dataset with predicted \"\n", + " f\"probabilities of shape {pred_probs.shape}\\n\")\n", + "\n", + " return pred_probs, labels, ALL_CLASSES[dataset_name]\n", + 
+ "```\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:02.478903Z", + "iopub.status.busy": "2024-06-25T23:03:02.478719Z", + "iopub.status.idle": "2024-06-25T23:03:02.491294Z", + "shell.execute_reply": "2024-06-25T23:03:02.490814Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# names of classes in each dataset -- SCROLL DOWN!!!\n", + "mnist_test_set = [\"0\", \"1\" ,\"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"]\n", + "cifar10_test_set = [\"airplane\", \"automobile\", \"bird\", \"cat\", \"deer\", \"dog\", \"frog\", \"horse\", \"ship\", \"truck\"]\n", + "cifar100_test_set = ['apple', 'aquarium_fish', 'baby', 'bear', 'beaver', 'bed', 'bee', 'beetle', 'bicycle', 'bottle', 'bowl', 'boy', 'bridge', 'bus', 'butterfly', 'camel', 'can', 'castle', 'caterpillar', 'cattle', 'chair', 'chimpanzee', 'clock', 'cloud', 'cockroach', 'couch', 'crab', 'crocodile', 'cup', 'dinosaur', 'dolphin', 'elephant', 'flatfish', 'forest', 'fox', 'girl', 'hamster', 'house', 'kangaroo', 'keyboard', 'lamp', 'lawn_mower', 'leopard', 'lion', 'lizard', 'lobster', 'man', 'maple_tree', 'motorcycle', 'mountain', 'mouse', 'mushroom', 'oak_tree', 'orange', 'orchid', 'otter', 'palm_tree', 'pear', 'pickup_truck', 'pine_tree', 'plain', 'plate', 'poppy', 'porcupine', 'possum', 'rabbit', 'raccoon', 'ray', 'road', 'rocket', 'rose', 'sea', 'seal', 'shark', 'shrew', 'skunk', 'skyscraper', 'snail', 'snake', 'spider', 'squirrel', 'streetcar', 'sunflower', 'sweet_pepper', 'table', 'tank', 'telephone', 'television', 'tiger', 'tractor', 'train', 'trout', 'tulip', 'turtle', 'wardrobe', 'whale', 'willow_tree', 'wolf', 'woman', 'worm']\n", + "caltech256 = [\"ak47\", \"american-flag\", \"backpack\", \"baseball-bat\", \"baseball-glove\", \"basketball-hoop\", \"bat\", \"bathtub\", \"bear\", \"beer-mug\", \"billiards\", \"binoculars\", \"birdbath\", \"blimp\", \"bonsai\", \"boom-box\", \"bowling-ball\", 
\"bowling-pin\", \"boxing-glove\", \"brain\", \"breadmaker\", \"buddha\", \"bulldozer\", \"butterfly\", \"cactus\", \"cake\", \"calculator\", \"camel\", \"cannon\", \"canoe\", \"car-tire\", \"cartman\", \"cd\", \"centipede\", \"cereal-box\", \"chandelier\", \"chess-board\", \"chimp\", \"chopsticks\", \"cockroach\", \"coffee-mug\", \"coffin\", \"coin\", \"comet\", \"computer-keyboard\", \"computer-monitor\", \"computer-mouse\", \"conch\", \"cormorant\", \"covered-wagon\", \"cowboy-hat\", \"crab\", \"desk-globe\", \"diamond-ring\", \"dice\", \"dog\", \"dolphin\", \"doorknob\", \"drinking-straw\", \"duck\", \"dumb-bell\", \"eiffel-tower\", \"electric-guitar\", \"elephant\", \"elk\", \"ewer\", \"eyeglasses\", \"fern\", \"fighter-jet\", \"fire-extinguisher\", \"fire-hydrant\", \"fire-truck\", \"fireworks\", \"flashlight\", \"floppy-disk\", \"football-helmet\", \"french-horn\", \"fried-egg\", \"frisbee\", \"frog\", \"frying-pan\", \"galaxy\", \"gas-pump\", \"giraffe\", \"goat\", \"golden-gate-bridge\", \"goldfish\", \"golf-ball\", \"goose\", \"gorilla\", \"grand-piano\", \"grapes\", \"grasshopper\", \"guitar-pick\", \"hamburger\", \"hammock\", \"harmonica\", \"harp\", \"harpsichord\", \"hawksbill\", \"head-phones\", \"helicopter\", \"hibiscus\", \"homer-simpson\", \"horse\", \"horseshoe-crab\", \"hot-air-balloon\", \"hot-dog\", \"hot-tub\", \"hourglass\", \"house-fly\", \"human-skeleton\", \"hummingbird\", \"ibis\", \"ice-cream-cone\", \"iguana\", \"ipod\", \"iris\", \"jesus-christ\", \"joy-stick\", \"kangaroo\", \"kayak\", \"ketch\", \"killer-whale\", \"knife\", \"ladder\", \"laptop\", \"lathe\", \"leopards\", \"license-plate\", \"lightbulb\", \"light-house\", \"lightning\", \"llama\", \"mailbox\", \"mandolin\", \"mars\", \"mattress\", \"megaphone\", \"menorah\", \"microscope\", \"microwave\", \"minaret\", \"minotaur\", \"motorbikes\", \"mountain-bike\", \"mushroom\", \"mussels\", \"necktie\", \"octopus\", \"ostrich\", \"owl\", \"palm-pilot\", \"palm-tree\", 
\"paperclip\", \"paper-shredder\", \"pci-card\", \"penguin\", \"people\", \"pez-dispenser\", \"photocopier\", \"picnic-table\", \"playing-card\", \"porcupine\", \"pram\", \"praying-mantis\", \"pyramid\", \"raccoon\", \"radio-telescope\", \"rainbow\", \"refrigerator\", \"revolver\", \"rifle\", \"rotary-phone\", \"roulette-wheel\", \"saddle\", \"saturn\", \"school-bus\", \"scorpion\", \"screwdriver\", \"segway\", \"self-propelled-lawn-mower\", \"sextant\", \"sheet-music\", \"skateboard\", \"skunk\", \"skyscraper\", \"smokestack\", \"snail\", \"snake\", \"sneaker\", \"snowmobile\", \"soccer-ball\", \"socks\", \"soda-can\", \"spaghetti\", \"speed-boat\", \"spider\", \"spoon\", \"stained-glass\", \"starfish\", \"steering-wheel\", \"stirrups\", \"sunflower\", \"superman\", \"sushi\", \"swan\", \"swiss-army-knife\", \"sword\", \"syringe\", \"tambourine\", \"teapot\", \"teddy-bear\", \"teepee\", \"telephone-box\", \"tennis-ball\", \"tennis-court\", \"tennis-racket\", \"theodolite\", \"toaster\", \"tomato\", \"tombstone\", \"top-hat\", \"touring-bike\", \"tower-pisa\", \"traffic-light\", \"treadmill\", \"triceratops\", \"tricycle\", \"trilobite\", \"tripod\", \"t-shirt\", \"tuning-fork\", \"tweezer\", \"umbrella\", \"unicorn\", \"vcr\", \"video-projector\", \"washing-machine\", \"watch\", \"waterfall\", \"watermelon\", \"welding-mask\", \"wheelbarrow\", \"windmill\", \"wine-bottle\", \"xylophone\", \"yarmulke\", \"yo-yo\", \"zebra\", \"airplanes\", \"car-side\", \"faces-easy\", \"greyhound\", \"tennis-shoes\", \"toad\"]\n", + "twenty_news_test_set = ['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc']\n", + "\n", + "ALL_CLASSES = {\n", + " 
'caltech256': caltech256,\n", + " 'mnist_test_set': mnist_test_set,\n", + " 'cifar10_test_set': cifar10_test_set,\n", + " 'cifar100_test_set': cifar100_test_set,\n", + " '20news_test_set': twenty_news_test_set,\n", + "}\n", + "\n", + "\n", + "def _load_classes_predprobs_labels(dataset_name):\n", + " \"\"\"Helper function to load data from the labelerrors.com datasets.\"\"\"\n", + "\n", + " base = 'https://github.com/cleanlab/label-errors/raw/'\n", + " url_base = base + '5392f6c71473055060be3044becdde1cbc18284d'\n", + " url_labels = url_base + '/original_test_labels/{}_original_labels.npy'\n", + " url_probs = url_base + '/cross_validated_predicted_probabilities/{}_pyx.npy'\n", + "\n", + " response = requests.get(url_labels.format(dataset_name))\n", + " labels = np.load(io.BytesIO(response.content), allow_pickle=True)\n", + "\n", + " response = requests.get(url_probs.format(dataset_name))\n", + " pred_probs = np.load(io.BytesIO(response.content), allow_pickle=True)\n", + " print(f\"\\nLoaded the '{dataset_name}' dataset with predicted \"\n", + " f\"probabilities of shape {pred_probs.shape}\\n\")\n", + "\n", + " return pred_probs, labels, ALL_CLASSES[dataset_name]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7PixDik8JFiX" + }, + "source": [ + "## **Start of tutorial:** Evaluate the health of 8 popular datasets\n", + "\n", + "This tutorial shows the output of running `cleanlab.dataset.health_summary()` on 8 popular datasets below:\n", + "\n", + "- 5 image datasets: ImageNet, Caltech256, MNIST, CIFAR-10, CIFAR-100\n", + "- 3 text datasets: IMDB Reviews, 20 News Groups, Amazon Reviews\n", + "\n", + "`cleanlab.dataset.health_summary()` works with several kinds of inputs (see docstring). In this tutorial, we input:\n", + "\n", + "1. out-of-sample predicted probabilities (e.g. computed via [cross-validation](https://docs.cleanlab.ai/master/tutorials/pred_probs_cross_val.html))\n", + "2. 
labels (can contain label errors and various issues)\n", + "\n", + "For the 8 datasets, we've precomputed and loaded these for you. See [labelerrors.com](https://labelerrors.com/) for more info about the label issues in these datasets." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Want more interpretability?\n", + "\n", + "Pass in a list of class names ordered by their indices into the `class_names` argument in `cleanlab.dataset.health_summary()`.\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:02.493229Z", + "iopub.status.busy": "2024-06-25T23:03:02.493054Z", + "iopub.status.idle": "2024-06-25T23:03:07.162100Z", + "shell.execute_reply": "2024-06-25T23:03:07.161589Z" + }, + "id": "dhTHOg8Pyv5G" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "🎯 Caltech256 🎯\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Loaded the 'caltech256' dataset with predicted probabilities of shape (29780, 256)\n", + "\n", + "-------------------------------------------------------------\n", + "| Generating a Cleanlab Dataset Health Summary |\n", + "| for your dataset with 29,780 examples and 256 classes. |\n", + "| Note, Cleanlab is not a medical doctor... yet. |\n", + "-------------------------------------------------------------\n", + "\n", + "Overall Class Quality and Noise across your dataset (below)\n", + "------------------------------------------------------------ \n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Class Name | Class Index | Label Issues | Inverse Label Issues | Label Noise | Inverse Label Noise | Label Quality Score
0 | tennis-shoes | 254 | 37 | 33 | 0.359223 | 0.333333 | 0.640777
1 | skateboard | 184 | 37 | 23 | 0.359223 | 0.258427 | 0.640777
2 | chopsticks | 38 | 29 | 20 | 0.341176 | 0.263158 | 0.658824
3 | drinking-straw | 58 | 28 | 18 | 0.337349 | 0.246575 | 0.662651
4 | yo-yo | 248 | 33 | 37 | 0.330000 | 0.355769 | 0.670000
... | ... | ... | ... | ... | ... | ... | ...
251 | raccoon | 167 | 0 | 0 | 0.000000 | 0.000000 | 1.000000
252 | hummingbird | 112 | 0 | 0 | 0.000000 | 0.000000 | 1.000000
253 | hourglass | 109 | 0 | 2 | 0.000000 | 0.022989 | 1.000000
254 | starfish | 200 | 0 | 0 | 0.000000 | 0.000000 | 1.000000
255 | saturn | 176 | 0 | 5 | 0.000000 | 0.049505 | 1.000000
\n", + "

256 rows × 7 columns

\n", + "
" + ], + "text/plain": [ + " Class Name Class Index Label Issues Inverse Label Issues \\\n", + "0 tennis-shoes 254 37 33 \n", + "1 skateboard 184 37 23 \n", + "2 chopsticks 38 29 20 \n", + "3 drinking-straw 58 28 18 \n", + "4 yo-yo 248 33 37 \n", + ".. ... ... ... ... \n", + "251 raccoon 167 0 0 \n", + "252 hummingbird 112 0 0 \n", + "253 hourglass 109 0 2 \n", + "254 starfish 200 0 0 \n", + "255 saturn 176 0 5 \n", + "\n", + " Label Noise Inverse Label Noise Label Quality Score \n", + "0 0.359223 0.333333 0.640777 \n", + "1 0.359223 0.258427 0.640777 \n", + "2 0.341176 0.263158 0.658824 \n", + "3 0.337349 0.246575 0.662651 \n", + "4 0.330000 0.355769 0.670000 \n", + ".. ... ... ... \n", + "251 0.000000 0.000000 1.000000 \n", + "252 0.000000 0.000000 1.000000 \n", + "253 0.000000 0.022989 1.000000 \n", + "254 0.000000 0.000000 1.000000 \n", + "255 0.000000 0.049505 1.000000 \n", + "\n", + "[256 rows x 7 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Class Overlap. In some cases, you may want to merge classes in the top rows (below)\n", + "-----------------------------------------------------------------------------------\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Class Name A | Class Name B | Class Index A | Class Index B | Num Overlapping Examples | Joint Probability
0 | sneaker | tennis-shoes | 190 | 254 | 66 | 0.002216
1 | frisbee | yo-yo | 78 | 248 | 29 | 0.000974
2 | duck | goose | 59 | 88 | 26 | 0.000873
3 | beer-mug | coffee-mug | 9 | 40 | 22 | 0.000739
4 | frog | toad | 79 | 255 | 22 | 0.000739
... | ... | ... | ... | ... | ... | ...
32635 | cormorant | covered-wagon | 48 | 49 | 0 | 0.000000
32636 | conch | toad | 47 | 255 | 0 | 0.000000
32637 | conch | tennis-shoes | 47 | 254 | 0 | 0.000000
32638 | conch | greyhound | 47 | 253 | 0 | 0.000000
32639 | tennis-shoes | toad | 254 | 255 | 0 | 0.000000
\n", + "

32640 rows × 6 columns

\n", + "
" + ], + "text/plain": [ + " Class Name A Class Name B Class Index A Class Index B \\\n", + "0 sneaker tennis-shoes 190 254 \n", + "1 frisbee yo-yo 78 248 \n", + "2 duck goose 59 88 \n", + "3 beer-mug coffee-mug 9 40 \n", + "4 frog toad 79 255 \n", + "... ... ... ... ... \n", + "32635 cormorant covered-wagon 48 49 \n", + "32636 conch toad 47 255 \n", + "32637 conch tennis-shoes 47 254 \n", + "32638 conch greyhound 47 253 \n", + "32639 tennis-shoes toad 254 255 \n", + "\n", + " Num Overlapping Examples Joint Probability \n", + "0 66 0.002216 \n", + "1 29 0.000974 \n", + "2 26 0.000873 \n", + "3 22 0.000739 \n", + "4 22 0.000739 \n", + "... ... ... \n", + "32635 0 0.000000 \n", + "32636 0 0.000000 \n", + "32637 0 0.000000 \n", + "32638 0 0.000000 \n", + "32639 0 0.000000 \n", + "\n", + "[32640 rows x 6 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " * Overall, about 7% (2,051 of the 29,780) labels in your dataset have potential issues.\n", + " ** The overall label health score for this dataset is: 0.93.\n", + "\n", + "Generated with <3 from Cleanlab.\n", + "\n", + "\n", + "🎯 Mnist_test_set 🎯\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Loaded the 'mnist_test_set' dataset with predicted probabilities of shape (10000, 10)\n", + "\n", + "------------------------------------------------------------\n", + "| Generating a Cleanlab Dataset Health Summary |\n", + "| for your dataset with 10,000 examples and 10 classes. |\n", + "| Note, Cleanlab is not a medical doctor... yet. |\n", + "------------------------------------------------------------\n", + "\n", + "Overall Class Quality and Noise across your dataset (below)\n", + "------------------------------------------------------------ \n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
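Class Name | Class Index | Label Issues | Inverse Label Issues | Label Noise | Inverse Label Noise | Label Quality Score
0 | 5 | 5 | 2 | 2 | 0.002242 | 0.002242 | 0.997758
1 | 6 | 6 | 2 | 1 | 0.002088 | 0.001045 | 0.997912
2 | 8 | 8 | 2 | 0 | 0.002053 | 0.000000 | 0.997947
3 | 3 | 3 | 2 | 1 | 0.001980 | 0.000991 | 0.998020
4 | 7 | 7 | 2 | 3 | 0.001946 | 0.002915 | 0.998054
5 | 2 | 2 | 2 | 3 | 0.001938 | 0.002904 | 0.998062
6 | 0 | 0 | 1 | 1 | 0.001020 | 0.001020 | 0.998980
7 | 4 | 4 | 1 | 2 | 0.001018 | 0.002035 | 0.998982
8 | 9 | 9 | 1 | 2 | 0.000991 | 0.001980 | 0.999009
9 | 1 | 1 | 0 | 0 | 0.000000 | 0.000000 | 1.000000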
\n", + "
" + ], + "text/plain": [ + " Class Name Class Index Label Issues Inverse Label Issues Label Noise \\\n", + "0 5 5 2 2 0.002242 \n", + "1 6 6 2 1 0.002088 \n", + "2 8 8 2 0 0.002053 \n", + "3 3 3 2 1 0.001980 \n", + "4 7 7 2 3 0.001946 \n", + "5 2 2 2 3 0.001938 \n", + "6 0 0 1 1 0.001020 \n", + "7 4 4 1 2 0.001018 \n", + "8 9 9 1 2 0.000991 \n", + "9 1 1 0 0 0.000000 \n", + "\n", + " Inverse Label Noise Label Quality Score \n", + "0 0.002242 0.997758 \n", + "1 0.001045 0.997912 \n", + "2 0.000000 0.997947 \n", + "3 0.000991 0.998020 \n", + "4 0.002915 0.998054 \n", + "5 0.002904 0.998062 \n", + "6 0.001020 0.998980 \n", + "7 0.002035 0.998982 \n", + "8 0.001980 0.999009 \n", + "9 0.000000 1.000000 " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Class Overlap. In some cases, you may want to merge classes in the top rows (below)\n", + "-----------------------------------------------------------------------------------\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
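Class Name A | Class Name B | Class Index A | Class Index B | Num Overlapping Examples | Joint Probability
0 | 2 | 7 | 2 | 7 | 3 | 0.0003
1 | 5 | 6 | 5 | 6 | 2 | 0.0002
2 | 4 | 9 | 4 | 9 | 2 | 0.0002
3 | 3 | 5 | 3 | 5 | 2 | 0.0002
4 | 2 | 8 | 2 | 8 | 1 | 0.0001
5 | 4 | 6 | 4 | 6 | 1 | 0.0001
6 | 3 | 7 | 3 | 7 | 1 | 0.0001
7 | 0 | 2 | 0 | 2 | 1 | 0.0001
8 | 8 | 9 | 8 | 9 | 1 | 0.0001
9 | 0 | 7 | 0 | 7 | 1 | 0.0001
10 | 1 | 2 | 1 | 2 | 0 | 0.0000
11 | 3 | 8 | 3 | 8 | 0 | 0.0000
12 | 4 | 5 | 4 | 5 | 0 | 0.0000
13 | 0 | 5 | 0 | 5 | 0 | 0.0000
14 | 4 | 7 | 4 | 7 | 0 | 0.0000
15 | 4 | 8 | 4 | 8 | 0 | 0.0000
16 | 0 | 4 | 0 | 4 | 0 | 0.0000
17 | 0 | 3 | 0 | 3 | 0 | 0.0000
18 | 5 | 7 | 5 | 7 | 0 | 0.0000
19 | 5 | 8 | 5 | 8 | 0 | 0.0000
20 | 5 | 9 | 5 | 9 | 0 | 0.0000
21 | 6 | 7 | 6 | 7 | 0 | 0.0000
22 | 6 | 8 | 6 | 8 | 0 | 0.0000
23 | 6 | 9 | 6 | 9 | 0 | 0.0000
24 | 7 | 8 | 7 | 8 | 0 | 0.0000
25 | 7 | 9 | 7 | 9 | 0 | 0.0000
26 | 3 | 9 | 3 | 9 | 0 | 0.0000
27 | 0 | 6 | 0 | 6 | 0 | 0.0000
28 | 1 | 3 | 1 | 3 | 0 | 0.0000
29 | 2 | 3 | 2 | 3 | 0 | 0.0000
30 | 1 | 4 | 1 | 4 | 0 | 0.0000
31 | 1 | 5 | 1 | 5 | 0 | 0.0000
32 | 1 | 6 | 1 | 6 | 0 | 0.0000
33 | 1 | 7 | 1 | 7 | 0 | 0.0000
34 | 1 | 8 | 1 | 8 | 0 | 0.0000
35 | 1 | 9 | 1 | 9 | 0 | 0.0000
36 | 2 | 4 | 2 | 4 | 0 | 0.0000
37 | 3 | 6 | 3 | 6 | 0 | 0.0000
38 | 2 | 5 | 2 | 5 | 0 | 0.0000
39 | 2 | 6 | 2 | 6 | 0 | 0.0000
40 | 0 | 9 | 0 | 9 | 0 | 0.0000
41 | 0 | 8 | 0 | 8 | 0 | 0.0000
42 | 2 | 9 | 2 | 9 | 0 | 0.0000
43 | 3 | 4 | 3 | 4 | 0 | 0.0000
44 | 0 | 1 | 0 | 1 | 0 | 0.0000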
\n", + "
" + ], + "text/plain": [ + " Class Name A Class Name B Class Index A Class Index B \\\n", + "0 2 7 2 7 \n", + "1 5 6 5 6 \n", + "2 4 9 4 9 \n", + "3 3 5 3 5 \n", + "4 2 8 2 8 \n", + "5 4 6 4 6 \n", + "6 3 7 3 7 \n", + "7 0 2 0 2 \n", + "8 8 9 8 9 \n", + "9 0 7 0 7 \n", + "10 1 2 1 2 \n", + "11 3 8 3 8 \n", + "12 4 5 4 5 \n", + "13 0 5 0 5 \n", + "14 4 7 4 7 \n", + "15 4 8 4 8 \n", + "16 0 4 0 4 \n", + "17 0 3 0 3 \n", + "18 5 7 5 7 \n", + "19 5 8 5 8 \n", + "20 5 9 5 9 \n", + "21 6 7 6 7 \n", + "22 6 8 6 8 \n", + "23 6 9 6 9 \n", + "24 7 8 7 8 \n", + "25 7 9 7 9 \n", + "26 3 9 3 9 \n", + "27 0 6 0 6 \n", + "28 1 3 1 3 \n", + "29 2 3 2 3 \n", + "30 1 4 1 4 \n", + "31 1 5 1 5 \n", + "32 1 6 1 6 \n", + "33 1 7 1 7 \n", + "34 1 8 1 8 \n", + "35 1 9 1 9 \n", + "36 2 4 2 4 \n", + "37 3 6 3 6 \n", + "38 2 5 2 5 \n", + "39 2 6 2 6 \n", + "40 0 9 0 9 \n", + "41 0 8 0 8 \n", + "42 2 9 2 9 \n", + "43 3 4 3 4 \n", + "44 0 1 0 1 \n", + "\n", + " Num Overlapping Examples Joint Probability \n", + "0 3 0.0003 \n", + "1 2 0.0002 \n", + "2 2 0.0002 \n", + "3 2 0.0002 \n", + "4 1 0.0001 \n", + "5 1 0.0001 \n", + "6 1 0.0001 \n", + "7 1 0.0001 \n", + "8 1 0.0001 \n", + "9 1 0.0001 \n", + "10 0 0.0000 \n", + "11 0 0.0000 \n", + "12 0 0.0000 \n", + "13 0 0.0000 \n", + "14 0 0.0000 \n", + "15 0 0.0000 \n", + "16 0 0.0000 \n", + "17 0 0.0000 \n", + "18 0 0.0000 \n", + "19 0 0.0000 \n", + "20 0 0.0000 \n", + "21 0 0.0000 \n", + "22 0 0.0000 \n", + "23 0 0.0000 \n", + "24 0 0.0000 \n", + "25 0 0.0000 \n", + "26 0 0.0000 \n", + "27 0 0.0000 \n", + "28 0 0.0000 \n", + "29 0 0.0000 \n", + "30 0 0.0000 \n", + "31 0 0.0000 \n", + "32 0 0.0000 \n", + "33 0 0.0000 \n", + "34 0 0.0000 \n", + "35 0 0.0000 \n", + "36 0 0.0000 \n", + "37 0 0.0000 \n", + "38 0 0.0000 \n", + "39 0 0.0000 \n", + "40 0 0.0000 \n", + "41 0 0.0000 \n", + "42 0 0.0000 \n", + "43 0 0.0000 \n", + "44 0 0.0000 " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + 
"text": [ + "\n", + " * Overall, about 0% (15 of the 10,000) labels in your dataset have potential issues.\n", + " ** The overall label health score for this dataset is: 1.00.\n", + "\n", + "Generated with <3 from Cleanlab.\n", + "\n", + "\n", + "🎯 Cifar10_test_set 🎯\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Loaded the 'cifar10_test_set' dataset with predicted probabilities of shape (10000, 10)\n", + "\n", + "------------------------------------------------------------\n", + "| Generating a Cleanlab Dataset Health Summary |\n", + "| for your dataset with 10,000 examples and 10 classes. |\n", + "| Note, Cleanlab is not a medical doctor... yet. |\n", + "------------------------------------------------------------\n", + "\n", + "Overall Class Quality and Noise across your dataset (below)\n", + "------------------------------------------------------------ \n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
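Class Name | Class Index | Label Issues | Inverse Label Issues | Label Noise | Inverse Label Noise | Label Quality Score
0 | cat | 3 | 71 | 67 | 0.071 | 0.067269 | 0.929
1 | dog | 5 | 46 | 59 | 0.046 | 0.058243 | 0.954
2 | bird | 2 | 35 | 32 | 0.035 | 0.032096 | 0.965
3 | truck | 9 | 31 | 12 | 0.031 | 0.012232 | 0.969
4 | deer | 4 | 22 | 26 | 0.022 | 0.025896 | 0.978
5 | frog | 6 | 20 | 13 | 0.020 | 0.013092 | 0.980
6 | automobile | 1 | 18 | 13 | 0.018 | 0.013065 | 0.982
7 | airplane | 0 | 16 | 31 | 0.016 | 0.030542 | 0.984
8 | ship | 8 | 13 | 21 | 0.013 | 0.020833 | 0.987
9 | horse | 7 | 12 | 10 | 0.012 | 0.010020 | 0.988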
\n", + "
" + ], + "text/plain": [ + " Class Name Class Index Label Issues Inverse Label Issues Label Noise \\\n", + "0 cat 3 71 67 0.071 \n", + "1 dog 5 46 59 0.046 \n", + "2 bird 2 35 32 0.035 \n", + "3 truck 9 31 12 0.031 \n", + "4 deer 4 22 26 0.022 \n", + "5 frog 6 20 13 0.020 \n", + "6 automobile 1 18 13 0.018 \n", + "7 airplane 0 16 31 0.016 \n", + "8 ship 8 13 21 0.013 \n", + "9 horse 7 12 10 0.012 \n", + "\n", + " Inverse Label Noise Label Quality Score \n", + "0 0.067269 0.929 \n", + "1 0.058243 0.954 \n", + "2 0.032096 0.965 \n", + "3 0.012232 0.969 \n", + "4 0.025896 0.978 \n", + "5 0.013092 0.980 \n", + "6 0.013065 0.982 \n", + "7 0.030542 0.984 \n", + "8 0.020833 0.987 \n", + "9 0.010020 0.988 " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Class Overlap. In some cases, you may want to merge classes in the top rows (below)\n", + "-----------------------------------------------------------------------------------\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Class Name A Class Name B Class Index A Class Index B \\\n", + "0 cat dog 3 5 \n", + "1 automobile truck 1 9 \n", + "2 bird cat 2 3 \n", + "3 airplane ship 0 8 \n", + "4 bird deer 2 4 \n", + "5 deer dog 4 5 \n", + "6 cat frog 3 6 \n", + "7 bird frog 2 6 \n", + "8 cat deer 3 4 \n", + "9 airplane cat 0 3 \n", + "10 airplane truck 0 9 \n", + "11 ship truck 8 9 \n", + "12 bird dog 2 5 \n", + "13 dog horse 5 7 \n", + "14 cat horse 3 7 \n", + "15 airplane bird 0 2 \n", + "16 airplane automobile 0 1 \n", + "17 automobile ship 1 8 \n", + "18 cat ship 3 8 \n", + "19 horse truck 7 9 \n", + "20 deer horse 4 7 \n", + "21 deer frog 4 6 \n", + "22 bird ship 2 8 \n", + "23 bird horse 2 7 \n", + "24 dog truck 5 9 \n", + "25 automobile frog 1 6 \n", + "26 airplane horse 0 7 \n", + "27 airplane frog 0 6 \n", + "28 cat truck 3 9 \n", + "29 airplane dog 0 5 \n", + "30 automobile dog 1 5 \n", + "31 bird truck 2 9 \n", + "32 deer truck 4 9 \n", + "33 dog frog 5 6 \n", + "34 frog ship 6 8 \n", + "35 horse ship 7 8 \n", + "36 frog truck 6 9 \n", + "37 frog horse 6 7 \n", + "38 automobile deer 1 4 \n", + "39 dog ship 5 8 \n", + "40 airplane deer 0 4 \n", + "41 automobile horse 1 7 \n", + "42 automobile bird 1 2 \n", + "43 automobile cat 1 3 \n", + "44 deer ship 4 8 \n", + "\n", + " Num Overlapping Examples Joint Probability \n", + "0 73 0.0073 \n", + "1 20 0.0020 \n", + "2 20 0.0020 \n", + "3 16 0.0016 \n", + "4 15 0.0015 \n", + "5 14 0.0014 \n", + "6 13 0.0013 \n", + "7 13 0.0013 \n", + "8 12 0.0012 \n", + "9 10 0.0010 \n", + "10 8 0.0008 \n", + "11 7 0.0007 \n", + "12 7 0.0007 \n", + "13 6 0.0006 \n", + "14 6 0.0006 \n", + "15 5 0.0005 \n", + "16 5 0.0005 \n", + "17 4 0.0004 \n", + "18 3 0.0003 \n", + "19 3 0.0003 \n", + "20 3 0.0003 \n", + "21 3 0.0003 \n", + "22 3 0.0003 \n", + "23 3 0.0003 \n", + "24 2 0.0002 \n", + "25 1 0.0001 \n", + "26 1 0.0001 \n", + "27 1 0.0001 \n", + "28 1 0.0001 \n", + "29 1 0.0001 \n", + "30 1 0.0001 \n", + "31 1 0.0001 \n", + 
"32 1 0.0001 \n", + "33 1 0.0001 \n", + "34 1 0.0001 \n", + "35 0 0.0000 \n", + "36 0 0.0000 \n", + "37 0 0.0000 \n", + "38 0 0.0000 \n", + "39 0 0.0000 \n", + "40 0 0.0000 \n", + "41 0 0.0000 \n", + "42 0 0.0000 \n", + "43 0 0.0000 \n", + "44 0 0.0000 " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " * Overall, about 2% (244 of the 10,000) labels in your dataset have potential issues.\n", + " ** The overall label health score for this dataset is: 0.98.\n", + "\n", + "Generated with <3 from Cleanlab.\n", + "\n", + "\n", + "🎯 Cifar100_test_set 🎯\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Loaded the 'cifar100_test_set' dataset with predicted probabilities of shape (10000, 100)\n", + "\n", + "-------------------------------------------------------------\n", + "| Generating a Cleanlab Dataset Health Summary |\n", + "| for your dataset with 10,000 examples and 100 classes. |\n", + "| Note, Cleanlab is not a medical doctor... yet. |\n", + "-------------------------------------------------------------\n", + "\n", + "Overall Class Quality and Noise across your dataset (below)\n", + "------------------------------------------------------------ \n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "
\n", + "

100 rows × 7 columns

\n", + "
" + ], + "text/plain": [ + " Class Name Class Index Label Issues Inverse Label Issues Label Noise \\\n", + "0 boy 11 54 38 0.54 \n", + "1 girl 35 53 40 0.53 \n", + "2 seal 72 49 56 0.49 \n", + "3 man 46 45 47 0.45 \n", + "4 shark 73 43 46 0.43 \n", + ".. ... ... ... ... ... \n", + "95 road 68 5 11 0.05 \n", + "96 skunk 75 5 3 0.05 \n", + "97 orange 53 3 12 0.03 \n", + "98 motorcycle 48 3 5 0.03 \n", + "99 wardrobe 94 3 5 0.03 \n", + "\n", + " Inverse Label Noise Label Quality Score \n", + "0 0.452381 0.46 \n", + "1 0.459770 0.47 \n", + "2 0.523364 0.51 \n", + "3 0.460784 0.55 \n", + "4 0.446602 0.57 \n", + ".. ... ... \n", + "95 0.103774 0.95 \n", + "96 0.030612 0.95 \n", + "97 0.110092 0.97 \n", + "98 0.049020 0.97 \n", + "99 0.049020 0.97 \n", + "\n", + "[100 rows x 7 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Class Overlap. In some cases, you may want to merge classes in the top rows (below)\n", + "-----------------------------------------------------------------------------------\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "
\n", + "

4950 rows × 6 columns

\n", + "
" + ], + "text/plain": [ + " Class Name A Class Name B Class Index A Class Index B \\\n", + "0 girl woman 35 98 \n", + "1 boy man 11 46 \n", + "2 maple_tree willow_tree 47 96 \n", + "3 maple_tree oak_tree 47 52 \n", + "4 otter seal 55 72 \n", + "... ... ... ... ... \n", + "4945 cattle whale 19 95 \n", + "4946 cattle willow_tree 19 96 \n", + "4947 cattle woman 19 98 \n", + "4948 cattle worm 19 99 \n", + "4949 woman worm 98 99 \n", + "\n", + " Num Overlapping Examples Joint Probability \n", + "0 34 0.0034 \n", + "1 32 0.0032 \n", + "2 26 0.0026 \n", + "3 25 0.0025 \n", + "4 25 0.0025 \n", + "... ... ... \n", + "4945 0 0.0000 \n", + "4946 0 0.0000 \n", + "4947 0 0.0000 \n", + "4948 0 0.0000 \n", + "4949 0 0.0000 \n", + "\n", + "[4950 rows x 6 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " * Overall, about 18% (1,846 of the 10,000) labels in your dataset have potential issues.\n", + " ** The overall label health score for this dataset is: 0.82.\n", + "\n", + "Generated with <3 from Cleanlab.\n", + "\n", + "\n", + "🎯 20news_test_set 🎯\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Loaded the '20news_test_set' dataset with predicted probabilities of shape (7532, 20)\n", + "\n", + "-----------------------------------------------------------\n", + "| Generating a Cleanlab Dataset Health Summary |\n", + "| for your dataset with 7,532 examples and 20 classes. |\n", + "| Note, Cleanlab is not a medical doctor... yet. |\n", + "-----------------------------------------------------------\n", + "\n", + "Overall Class Quality and Noise across your dataset (below)\n", + "------------------------------------------------------------ \n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " Class Name Class Index Label Issues Inverse Label Issues \\\n", + "0 alt.atheism 0 11 3 \n", + "1 comp.os.ms-windows.misc 2 12 8 \n", + "2 comp.sys.ibm.pc.hardware 3 11 14 \n", + "3 comp.windows.x 5 10 2 \n", + "4 misc.forsale 6 8 20 \n", + "5 talk.religion.misc 19 5 11 \n", + "6 rec.autos 7 7 2 \n", + "7 comp.sys.mac.hardware 4 5 2 \n", + "8 sci.electronics 12 5 10 \n", + "9 talk.politics.guns 16 4 3 \n", + "10 comp.graphics 1 4 11 \n", + "11 talk.politics.misc 18 3 0 \n", + "12 sci.space 14 3 4 \n", + "13 sci.crypt 11 2 2 \n", + "14 sci.med 13 2 2 \n", + "15 rec.motorcycles 8 2 0 \n", + "16 rec.sport.hockey 10 2 0 \n", + "17 soc.religion.christian 15 0 0 \n", + "18 talk.politics.mideast 17 0 0 \n", + "19 rec.sport.baseball 9 0 2 \n", + "\n", + " Label Noise Inverse Label Noise Label Quality Score \n", + "0 0.034483 0.009646 0.965517 \n", + "1 0.030457 0.020513 0.969543 \n", + "2 0.028061 0.035443 0.971939 \n", + "3 0.025316 0.005168 0.974684 \n", + "4 0.020513 0.049751 0.979487 \n", + "5 0.019920 0.042802 0.980080 \n", + "6 0.017677 0.005115 0.982323 \n", + "7 0.012987 0.005236 0.987013 \n", + "8 0.012723 0.025126 0.987277 \n", + "9 0.010989 0.008264 0.989011 \n", + "10 0.010283 0.027778 0.989717 \n", + "11 0.009677 0.000000 0.990323 \n", + "12 0.007614 0.010127 0.992386 \n", + "13 0.005051 0.005051 0.994949 \n", + "14 0.005051 0.005051 0.994949 \n", + "15 0.005025 0.000000 0.994975 \n", + "16 0.005013 0.000000 0.994987 \n", + "17 0.000000 0.000000 1.000000 \n", + "18 0.000000 0.000000 1.000000 \n", + "19 0.000000 0.005013 1.000000 " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Class Overlap. In some cases, you may want to merge classes in the top rows (below)\n", + "-----------------------------------------------------------------------------------\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "
\n", + "

190 rows × 6 columns

\n", + "
" + ], + "text/plain": [ + " Class Name A Class Name B Class Index A \\\n", + "0 alt.atheism talk.religion.misc 0 \n", + "1 comp.os.ms-windows.misc comp.sys.ibm.pc.hardware 2 \n", + "2 misc.forsale sci.electronics 6 \n", + "3 misc.forsale rec.autos 6 \n", + "4 comp.os.ms-windows.misc comp.windows.x 2 \n", + ".. ... ... ... \n", + "185 comp.sys.mac.hardware rec.motorcycles 4 \n", + "186 comp.sys.mac.hardware rec.sport.baseball 4 \n", + "187 comp.sys.mac.hardware rec.sport.hockey 4 \n", + "188 comp.sys.mac.hardware sci.crypt 4 \n", + "189 talk.politics.misc talk.religion.misc 18 \n", + "\n", + " Class Index B Num Overlapping Examples Joint Probability \n", + "0 19 14 0.001859 \n", + "1 3 10 0.001328 \n", + "2 12 7 0.000929 \n", + "3 7 7 0.000929 \n", + "4 5 5 0.000664 \n", + ".. ... ... ... \n", + "185 8 0 0.000000 \n", + "186 9 0 0.000000 \n", + "187 10 0 0.000000 \n", + "188 11 0 0.000000 \n", + "189 19 0 0.000000 \n", + "\n", + "[190 rows x 6 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " * Overall, about 1% (55 of the 7,532) labels in your dataset have potential issues.\n", + " ** The overall label health score for this dataset is: 0.99.\n", + "\n", + "Generated with <3 from Cleanlab.\n", + "\n" + ] + } + ], + "source": [ + "DATASETS = ['caltech256', 'mnist_test_set', 'cifar10_test_set', 'cifar100_test_set', '20news_test_set']\n", + "\n", + "for dataset_name in DATASETS:\n", + "\n", + " print(\"\\n🎯 \" + dataset_name.capitalize() + \" 🎯\\n\")\n", + "\n", + " # load class names, given labels, and predicted probabilities from already-trained model\n", + " pred_probs, labels, class_names = _load_classes_predprobs_labels(dataset_name)\n", + "\n", + " # run 1 line of code to evaluate the health of your dataset\n", + " _ = cleanlab.dataset.health_summary(labels, pred_probs, class_names=class_names)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], 
+ "name": "cleanlab_dataset_tutorial.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/faq.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/faq.ipynb new file mode 100644 index 000000000..699fe23e5 --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/faq.ipynb @@ -0,0 +1,2361 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "ffe0d62e", + "metadata": {}, + "source": [ + "# FAQ\n", + "\n", + "Answers to frequently asked questions about the [cleanlab](https://github.com/cleanlab/cleanlab) open-source package.\n", + "\n", + "The code snippets in this FAQ come from a fully executable notebook you can run via Colab or locally by downloading it [here](https://github.com/cleanlab/cleanlab/blob/master/docs/source/tutorials/faq.ipynb).\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "2a4efdde", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:09.205522Z", + "iopub.status.busy": "2024-06-25T23:03:09.205337Z", + "iopub.status.idle": "2024-06-25T23:03:10.329095Z", + "shell.execute_reply": "2024-06-25T23:03:10.328538Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This cell is hidden on docs.cleanlab.ai. 
Execute it to ensure all other cells below can be executed in your own notebook\n", + "\n", + "import os \n", + "import logging \n", + "import numpy as np \n", + "import sklearn \n", + "import cleanlab \n", + "\n", + "np.random.seed(123)\n", + "\n", + "# Toy dataset:\n", + "N = 50\n", + "K = 3\n", + "num_errors = 4\n", + "labels = np.random.randint(low=0, high=K, size=N)\n", + "pred_probs = np.random.random_sample(N*K).reshape((N,K))\n", + "pred_probs[np.arange(N),labels] += 4 # make pred_probs accurate\n", + "pred_probs = pred_probs/pred_probs.sum(axis=1)[:, np.newaxis]\n", + "data = np.array([[label+np.random.uniform(), label+np.random.uniform()] for label in labels])\n", + "# introduce label errors in last few examples:\n", + "og0_indices = labels[-num_errors:] == 0\n", + "labels[-num_errors:] = 0\n", + "labels[-num_errors:][og0_indices] = 1\n", + "\n", + "your_classifier=sklearn.linear_model.LogisticRegression() # toy classifier" + ] + }, + { + "cell_type": "markdown", + "id": "d504ec58", + "metadata": {}, + "source": [ + "### What data can cleanlab detect issues in?" + ] + }, + { + "cell_type": "markdown", + "id": "5e70efbc", + "metadata": {}, + "source": [ + "Currently, cleanlab can be used to detect label issues in any classification dataset, including datasets with multiple annotators per example (multi-annotator) or multiple labels per example (multi-label). This includes data from any modality, such as image, text, tabular, or audio data. For text data, cleanlab also supports NLP tasks like entity recognition, in which each word is individually labeled (token classification). We're [working to add support](https://github.com/orgs/cleanlab/projects/2) for all other common supervised learning tasks. If you have a particular task in mind, [let us know](https://github.com/cleanlab/cleanlab/issues?q=is%3Aissue)!" 
+ ] + }, + { + "cell_type": "markdown", + "id": "eca36874", + "metadata": {}, + "source": [ + "### How do I format classification labels for cleanlab?" + ] + }, + { + "cell_type": "markdown", + "id": "38c50875", + "metadata": {}, + "source": [ + "**With Datalab**:\n", + "\n", + "Datalab simplifies label management by accepting both string and integer labels directly. Internally, unique labels are sorted alphanumerically and mapped to integers, facilitating seamless integration with lower-level cleanlab methods. Below are the supported label formats:\n", + "\n", + "- **List of strings or integers**: Directly pass labels as a list of strings or integers without manual encoding.\n", + "\n", + "- **Using** `datasets.Dataset` **with** `ClassLabel`: For advanced use cases, you can structure your dataset using HuggingFace's `datasets.Dataset` object, specifying label columns as `ClassLabel` feature objects for formatting the labels. Refer to the [datasets documentation](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.ClassLabel) for detailed guidance.\n", + "\n", + "```python\n", + "from cleanlab import Datalab\n", + "from datasets import Dataset, Features, Value, ClassLabel\n", + "\n", + "# Example 1: Labels as a list of strings\n", + "labels_str = ['cat', 'dog', 'cat', 'dog']\n", + "datalab_str = Datalab(data={\"text\": [\"a\", \"b\", \"c\", \"d\"], \"label\": labels_str}, label_name=\"label\")\n", + "print(\"String labels:\", datalab_str.labels)\n", + "\n", + "# Example 2: Labels as a list of integers\n", + "labels_int = [1, 2, 2, 1] # These will be remapped to [0, 1] internally\n", + "datalab_int = Datalab(data={\"text\": [\"a\", \"b\", \"c\", \"d\"], \"label\": labels_int}, label_name=\"label\")\n", + "print(\"Integer labels:\", datalab_int.labels)\n", + "\n", + "# Example 3: Advanced - Dataset with ClassLabel feature\n", + "my_dict = {\"pet_name\": [\"Spot\", \"Mittens\", \"Rover\", \"Rocky\", \"Pepper\", \"Socks\"], \"species\": 
[\"dog\", \"cat\", \"dog\", \"dog\", \"cat\", \"cat\"]}\n", + "features = Features({\"pet_name\": Value(\"string\"), \"species\": ClassLabel(names=[\"dog\", \"cat\"])})\n", + "dataset = Dataset.from_dict(my_dict, features=features)\n", + "datalab_dataset = Datalab(data=dataset, label_name=\"species\")\n", + "print(\"ClassLabel feature:\", datalab_dataset.labels)\n", + "```\n", + "\n", + "Datalab lets you work directly with raw class-name labels in your dataset while ensuring compatibility with the label-encoding requirements of lower-level cleanlab methods, which we cover in the next section.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d5d0fbb3", + "metadata": {}, + "source": [ + "**Without Datalab**:\n", + "\n", + "Outside of Datalab, cleanlab offers various lower-level methods that operate directly on labels to diagnose issues, such as ``get_label_quality_scores()`` and ``find_label_issues()``. These lower-level methods only work with integer-encoded labels: the `labels` array must have shape `(N,)`, where `N = total_number_of_data_points`, and contain only integer values in the range `{0, 1, ..., K-1}`, where `K = number_of_classes`.\n", + "Do not pass in `labels` where some classes are entirely missing or extremely rare, as cleanlab may not perform as expected. It is better to remove such classes from the dataset first (also dropping the corresponding columns from `pred_probs` and then renormalizing its rows).\n", + "\n", + "**Text or string labels** should be mapped to integers, one integer per possible value. For example, if your original data labels look like this: `[\"dog\", \"dog\", \"cat\", \"mouse\", \"cat\"]`, you should feed them to cleanlab like this: `labels = [1,1,0,2,0]` and keep track of which integer uniquely represents which class (classes were ordered alphabetically in this example). 
\n", + "\n", + "**One-hot encoded labels** should be integer-encoded by finding the argmax along the one-hot encoded axis. An example of what this might look like is shown below." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "239d5ee7", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:10.331905Z", + "iopub.status.busy": "2024-06-25T23:03:10.331372Z", + "iopub.status.idle": "2024-06-25T23:03:10.334743Z", + "shell.execute_reply": "2024-06-25T23:03:10.334200Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "# This example arr has 4 labels (one per data point) where \n", + "# each label can be one of 3 possible classes\n", + "\n", + "arr = np.array([[0,1,0],[1,0,0],[0,0,1],[1,0,0]])\n", + "labels_proper_format = np.argmax(arr, axis=1) # How labels should be formatted when passed into the model" + ] + }, + { + "cell_type": "markdown", + "id": "4181cac7", + "metadata": {}, + "source": [ + "### How do I infer the correct labels for examples cleanlab has flagged?" + ] + }, + { + "cell_type": "markdown", + "id": "6d4db5e1", + "metadata": {}, + "source": [ + "If you have a classifier that is compatible with [CleanLearning](../cleanlab/classification.html) (i.e. 
follows the sklearn API), here's an easy way to see predicted labels alongside the label issues:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "28b324aa", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:10.336788Z", + "iopub.status.busy": "2024-06-25T23:03:10.336459Z", + "iopub.status.idle": "2024-06-25T23:03:13.546031Z", + "shell.execute_reply": "2024-06-25T23:03:13.545314Z" + } + }, + "outputs": [], + "source": [ + "cl = cleanlab.classification.CleanLearning(your_classifier)\n", + "issues_dataframe = cl.find_label_issues(data, labels)" + ] + }, + { + "cell_type": "markdown", + "id": "6d4db5e2", + "metadata": {}, + "source": [ + "Alternatively, if you have already computed out-of-sample predicted probabilities (`pred_probs`) from a classifier:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "28b324ab", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.549216Z", + "iopub.status.busy": "2024-06-25T23:03:13.548611Z", + "iopub.status.idle": "2024-06-25T23:03:13.584683Z", + "shell.execute_reply": "2024-06-25T23:03:13.584103Z" + } + }, + "outputs": [], + "source": [ + "cl = cleanlab.classification.CleanLearning()\n", + "issues_dataframe = cl.find_label_issues(X=None, labels=labels, pred_probs=pred_probs)" + ] + }, + { + "cell_type": "markdown", + "id": "b386dfc8", + "metadata": {}, + "source": [ + "Otherwise, if you have already found issues via:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "90c10e18", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.587329Z", + "iopub.status.busy": "2024-06-25T23:03:13.587034Z", + "iopub.status.idle": "2024-06-25T23:03:13.617892Z", + "shell.execute_reply": "2024-06-25T23:03:13.617201Z" + } + }, + "outputs": [], + "source": [ + "issues = cleanlab.filter.find_label_issues(labels, pred_probs)" + ] + }, + { + "cell_type": "markdown", + "id": "ad9ca03e", + "metadata": {}, + "source": [ + 
"then you can see your trained classifier's class prediction for each flagged example like this: " + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "88839519", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.620523Z", + "iopub.status.busy": "2024-06-25T23:03:13.620216Z", + "iopub.status.idle": "2024-06-25T23:03:13.623194Z", + "shell.execute_reply": "2024-06-25T23:03:13.622733Z" + } + }, + "outputs": [], + "source": [ + "class_predicted_for_flagged_examples = pred_probs[issues].argmax(axis=1)" + ] + }, + { + "cell_type": "markdown", + "id": "a668b74b", + "metadata": {}, + "source": [ + "You can also see the classifier's class prediction for every example via:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "558490c2", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.625218Z", + "iopub.status.busy": "2024-06-25T23:03:13.624906Z", + "iopub.status.idle": "2024-06-25T23:03:13.627424Z", + "shell.execute_reply": "2024-06-25T23:03:13.626997Z" + } + }, + "outputs": [], + "source": [ + "class_predicted_for_all_examples = pred_probs.argmax(axis=1)" + ] + }, + { + "cell_type": "markdown", + "id": "f9450eed", + "metadata": {}, + "source": [ + "We caution against blindly accepting the predicted labels as correct: many of these suggestions may be wrong! \n", + "You will be able to produce a much better version of your dataset interactively using [Cleanlab Studio](https://cleanlab.ai/studio/?utm_source=github&utm_medium=docs&utm_campaign=clostostudio), which helps you efficiently fix issues like this in large datasets." + ] + }, + { + "cell_type": "markdown", + "id": "bcc97591", + "metadata": {}, + "source": [ + "### How should I handle label errors in train vs. test data?\n", + "\n", + "If you do not address label errors in your test data, you may not even know when you have produced a better ML model because the evaluation is too noisy. 
For the best-trained models and most reliable evaluation of them, you should fix label errors in both training and testing data.\n", + "\n", + "To do this efficiently, first use cleanlab to automatically find label issues in both sets. You can simply merge these two sets into one larger dataset and run cross-validation training. On the merged dataset, you can do either of the following to detect label issues:\n", + "\n", + "\n", + "\n", + "**With Datalab**: Run `Datalab.find_issues()` on the merged dataset, then call `Datalab.report()` to see the label issues (and other types of data issues).\n", + "\n", + "```python\n", + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(data = merged_dataset, label_name = \"label_column_name\")\n", + "\n", + "# Run proper cross-validation when computing predicted probabilities\n", + "lab.find_issues(pred_probs = pred_probs, issue_types = {\"label\": {}})\n", + "\n", + "lab.report()\n", + "```\n", + "\n", + "You can fetch the label issues DataFrame from the `Datalab` object by calling:\n", + "\n", + "```python\n", + "label_issues = lab.get_issues(\"label\")\n", + "```\n", + "\n", + "**Without Datalab**: Run cleanlab's lower-level `find_label_issues()` method on the merged dataset. Calling the [CleanLearning.find_label_issues()](../cleanlab/classification.html) method on your merged dataset both runs cross-validation training and finds label issues for you with any scikit-learn compatible classifier you choose.\n", + "\n", + "---\n", + "\n", + "After finding label issues, be **wary** about auto-correcting the labels for test examples. Instead, manually fix the labels for your test data via careful review of the flagged issues. You can use [Cleanlab Studio](https://cleanlab.ai/studio/) to fix labels efficiently.\n", + "\n", + "Auto-correcting labels for your training data is fair game and should improve ML performance (if properly evaluated with clean test labels). 
You can boost ML performance further by manually fixing the training examples flagged with label issues, as demonstrated in this article:\n", + "\n", + "[**Handling Mislabeled Tabular Data to Improve Your XGBoost Model**](https://cleanlab.ai/blog/label-errors-tabular-datasets/)" + ] + }, + { + "cell_type": "markdown", + "id": "21f42f24", + "metadata": {}, + "source": [ + "### How can I find label issues in big datasets with limited memory? " + ] + }, + { + "cell_type": "markdown", + "id": "089f505e", + "metadata": {}, + "source": [ + "For a dataset with many rows and/or classes, there are more efficient methods in the `label_issues_batched` module. These methods read data in mini-batches, and you can reduce the `batch_size` to control how much memory they require. Below is an example of how to use the `find_label_issues_batched()` method from this module, which can load mini-batches of data from `labels` and `pred_probs` saved as .npy files on disk. You can also run this method on Zarr arrays loaded from .zarr files. Try adjusting the `n_jobs` argument for further multiprocessing speedups. If you need greater flexibility, check out the `LabelInspector` class from this module."
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "41714b51", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.629496Z", + "iopub.status.busy": "2024-06-25T23:03:13.629181Z", + "iopub.status.idle": "2024-06-25T23:03:13.655246Z", + "shell.execute_reply": "2024-06-25T23:03:13.654711Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "mmap-loaded numpy arrays have: 50 examples, 3 classes\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d94b77d0f6ad40dfa13bb6ce614c0556", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "number of examples processed for estimating thresholds: 0%| | 0/50 [00:00 0.95, \"issue indices differ in batched mode\"" + ] + }, + { + "cell_type": "markdown", + "id": "438b424d", + "metadata": {}, + "source": [ + "**To use less memory and get results faster if your dataset has many classes:** Try merging the rare classes into a single \"Other\" class before you find label issues. The resulting issues won't be affected much, since cleanlab does not have enough data to accurately diagnose label errors in rarely seen classes anyway. To do this, sum all the probability assigned to the rare classes in `pred_probs` into a single new column of `pred_probs_merged` (where this new array no longer has columns for the rare classes). 
Here is a function that does this for you, which you can also modify as needed:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "6983cdad", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.669834Z", + "iopub.status.busy": "2024-06-25T23:03:13.669450Z", + "iopub.status.idle": "2024-06-25T23:03:13.672813Z", + "shell.execute_reply": "2024-06-25T23:03:13.672382Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This cell is hidden on docs.cleanlab.ai\n", + "# Add two additional rare classes to the dataset:\n", + "\n", + "num_rare_instances = 3\n", + "small_prob = 1e-4\n", + "pred_probs = np.hstack((pred_probs, np.ones((len(pred_probs),2))*small_prob))\n", + "pred_probs = pred_probs / np.sum(pred_probs, axis=1)[:, np.newaxis]\n", + "labels[:num_rare_instances] = 3\n", + "labels[num_rare_instances:(2*num_rare_instances)] = 4" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "9092b8a0", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.674781Z", + "iopub.status.busy": "2024-06-25T23:03:13.674459Z", + "iopub.status.idle": "2024-06-25T23:03:13.680711Z", + "shell.execute_reply": "2024-06-25T23:03:13.680281Z" + } + }, + "outputs": [], + "source": [ + "from cleanlab.internal.util import value_counts # use this to count how often each class occurs in labels\n", + "\n", + "def merge_rare_classes(labels, pred_probs, count_threshold = 10):\n", + " \"\"\" \n", + " Returns: labels, pred_probs after we merge all rare classes into a single 'Other' class.\n", + " Merged pred_probs has fewer columns. 
Rare classes are any occurring fewer than `count_threshold` times.\n", + " Also returns: `class_mapping_orig2new`, a dict mapping each class in the original labels to its class \n", + " in the merged labels, useful for interpreting outputs from `dataset.health_summary()` or `count.confident_joint()`.\n", + " \"\"\"\n", + " num_classes = pred_probs.shape[1]\n", + " num_examples_per_class = value_counts(labels, num_classes=num_classes)\n", + " rare_classes = [c for c in range(num_classes) if num_examples_per_class[c] < count_threshold]\n", + " if len(rare_classes) < 1:\n", + " raise ValueError(\"No rare classes found at the given `count_threshold`; merging is unnecessary unless you increase it.\")\n", + "\n", + " num_classes_merged = num_classes - len(rare_classes) + 1 # one extra class for all the merged ones\n", + " other_class = num_classes_merged - 1\n", + " labels_merged = labels.copy()\n", + " class_mapping_orig2new = {} # key = original class in `labels`, value = new class in `labels_merged`\n", + " new_c = 0\n", + " for c in range(num_classes):\n", + " if c in rare_classes:\n", + " class_mapping_orig2new[c] = other_class\n", + " else:\n", + " class_mapping_orig2new[c] = new_c\n", + " new_c += 1\n", + " labels_merged[labels == c] = class_mapping_orig2new[c]\n", + "\n", + " merged_prob = np.sum(pred_probs[:, rare_classes], axis=1, keepdims=True) # total probability over all merged classes for each example\n", + " pred_probs_merged = np.hstack((np.delete(pred_probs, rare_classes, axis=1), merged_prob)) # assumes new_class is as close to original_class in sorted order as is possible after removing the merged original classes\n", + " # check a few rows of probabilities after merging to verify they still sum to 1:\n", + " num_check = 1000 # only check a few rows for efficiency\n", + " ones_array_ref = np.ones(min(num_check,len(pred_probs)))\n", + " if np.isclose(np.sum(pred_probs[:num_check], axis=1), ones_array_ref).all() and (not 
np.isclose(np.sum(pred_probs_merged[:num_check], axis=1), ones_array_ref).all()):\n", + " raise ValueError(\"merged pred_probs do not sum to 1 in each row, check that merging was correctly done.\")\n", + " \n", + " return (labels_merged, pred_probs_merged, class_mapping_orig2new)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "b0a01109", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.682727Z", + "iopub.status.busy": "2024-06-25T23:03:13.682409Z", + "iopub.status.idle": "2024-06-25T23:03:13.715580Z", + "shell.execute_reply": "2024-06-25T23:03:13.715011Z" + } + }, + "outputs": [], + "source": [ + "from cleanlab.filter import find_label_issues # can alternatively use find_label_issues_batched() shown above\n", + "\n", + "labels_merged, pred_probs_merged, class_mapping_orig2new = merge_rare_classes(labels, pred_probs, count_threshold=5)\n", + "examples_w_issues = find_label_issues(labels_merged, pred_probs_merged, return_indices_ranked_by=\"self_confidence\")" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "8b1da032", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.718256Z", + "iopub.status.busy": "2024-06-25T23:03:13.717832Z", + "iopub.status.idle": "2024-06-25T23:03:13.751531Z", + "shell.execute_reply": "2024-06-25T23:03:13.750843Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This cell is hidden on docs.cleanlab.ai, and is only for internal testing. 
You can ignore it.\n", + "\n", + "rare_classes = [c for c in class_mapping_orig2new.keys() if class_mapping_orig2new[c] == pred_probs_merged.shape[1]-1]\n", + "og_examples_w_issues = find_label_issues(labels, pred_probs, return_indices_ranked_by=\"self_confidence\")\n", + "examples_of_interest = [x for x in examples_w_issues if labels[x] not in rare_classes]\n", + "og_examples_of_interest = [x for x in og_examples_w_issues if labels[x] not in rare_classes]\n", + "assert set(examples_of_interest) == set(og_examples_of_interest), \"merged label issues differ from non-merged label issues\"" + ] + }, + { + "cell_type": "markdown", + "id": "3868ee8b", + "metadata": {}, + "source": [ + "### Why isn’t CleanLearning working for me?" + ] + }, + { + "cell_type": "markdown", + "id": "d13c9cd0", + "metadata": {}, + "source": [ + "At this time, CleanLearning only works with data formatted as NumPy arrays or pandas DataFrames, \n", + "and with models that are compatible with the `sklearn` API \n", + "(check out [skorch](https://github.com/skorch-dev/skorch) for PyTorch compatibility and [scikeras](https://github.com/adriangb/scikeras) for TensorFlow/Keras compatibility). \n", + "You can still use cleanlab with other data formats, though! Just separately obtain predicted probabilities (`pred_probs`) from your model via cross-validation and pass them as inputs. \n", + "\n", + "\n", + "If CleanLearning is running successfully but not improving the predictive accuracy of your model, here are some tips:\n", + "\n", + "1. Use cleanlab to find label issues in your test data as well (we recommend pooling `labels` across both training and test data into one input for `find_label_issues()`). Then manually review and fix label issues identified in the test data so that your accuracy measurements are actually meaningful.\n", + "\n", + "2. 
Try different values for `filter_by`, `frac_noise`, and `min_examples_per_class`, which can be set via the `find_label_issues_kwargs` argument when initializing `CleanLearning()`.\n", + "\n", + "3. Try to find a better model (e.g. via hyperparameter tuning or switching to another classifier). `CleanLearning` can find better label issues by leveraging a better model, which allows it to produce better-quality training data. This can form a virtuous cycle in which better models -> better issue detection -> better data -> even better models! \n", + "\n", + "4. Try jointly tuning both model hyperparameters and `find_label_issues_kwargs` values.\n", + "\n", + "5. Does your dataset have a *junk* (or *clutter*, *unknown*, *other*) class? If you have bad data, consider creating one (cf. Caltech-256).\n", + "\n", + "6. Consider merging similar/overlapping classes found via ``cleanlab.dataset.find_overlapping_classes``.\n", + "\n", + "Other general tips to improve label error detection performance:\n", + "\n", + "1. Try creating more restrictive filters by intersecting the boolean masks from several filters (e.g. `combined_boolean_mask = mask1 & mask2`, where `mask1` and `mask2` are the boolean masks created by running `find_label_issues` with different values of the `filter_by` argument).\n", + "\n", + "2. If your `pred_probs` are obtained via a neural network, try averaging the `pred_probs` over the last K epochs of training instead of just using the final `pred_probs`. Similarly, you can try averaging `pred_probs` from several models (remember to re-normalize) or using ``cleanlab.rank.get_label_quality_ensemble_scores``.\n" + ] + }, + { + "cell_type": "markdown", + "id": "9ae3899c", + "metadata": {}, + "source": [ + "### How can I use different models for data cleaning vs. final training in CleanLearning?"
+ ] + }, + { + "cell_type": "markdown", + "id": "a2ce1518", + "metadata": {}, + "source": [ + "The code below demonstrates CleanLearning with 2 different classifiers: `LogisticRegression()` and `GradientBoostingClassifier()`.\n", + "A `LogisticRegression` model is used to detect label issues (via cross-validation run inside CleanLearning) and a `GradientBoostingClassifier` model is finally trained on a clean subset of the data with issues removed.\n", + "This can be done with any two classifiers." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "4c9e9030", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.754166Z", + "iopub.status.busy": "2024-06-25T23:03:13.753936Z", + "iopub.status.idle": "2024-06-25T23:03:13.873698Z", + "shell.execute_reply": "2024-06-25T23:03:13.873064Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "LogisticRegression()\n", + "GradientBoostingClassifier()\n" + ] + } + ], + "source": [ + "from cleanlab.classification import CleanLearning\n", + "import numpy as np\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.ensemble import GradientBoostingClassifier\n", + "\n", + "# Make example data\n", + "data = np.vstack([np.random.random((100, 2)), np.random.random((100, 2)) + 10])\n", + "labels = np.array([0] * 100 + [1] * 100)\n", + "\n", + "# Introduce label errors\n", + "true_errors = [97, 98, 100, 101, 102, 104]\n", + "for idx in true_errors:\n", + " labels[idx] = 1 - labels[idx]\n", + "\n", + "# CleanLearning with 2 different classifiers: one classifier is used to detect label issues \n", + "# and a different classifier is subsequently trained on the clean subset of the data.\n", + "\n", + "model_to_find_errors = LogisticRegression() # this model will be trained many times via cross-validation\n", + "model_to_return = GradientBoostingClassifier() # this model will be trained once on clean subset of data\n", + "\n", + "cl0 = 
CleanLearning(model_to_find_errors)\n", + "issues = cl0.find_label_issues(data, labels)\n", + "\n", + "cl = CleanLearning(model_to_return).fit(data, labels, label_issues=issues)\n", + "pred_probs = cl.predict_proba(data) # predictions from GradientBoostingClassifier\n", + "\n", + "print(cl0.clf) # will be LogisticRegression()\n", + "print(cl.clf) # will be GradientBoostingClassifier()" + ] + }, + { + "cell_type": "markdown", + "id": "b71fef02", + "metadata": {}, + "source": [ + "### How do I hyperparameter tune only the final model trained (and not the one finding label issues) in CleanLearning?" + ] + }, + { + "cell_type": "markdown", + "id": "e7ec1956", + "metadata": {}, + "source": [ + "The code below demonstrates CleanLearning using a `GradientBoostingClassifier()` with no hyperparameter-tuning to find label issues but with hyperparameter-tuning via `RandomizedSearchCV(...)` for the final training of this model on the clean subset of the data.\n", + "This is a useful trick to avoid expensive hyperparameter-tuning for every fold of cross-validation (which is needed to find label issues)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "8751619e", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:13.876347Z", + "iopub.status.busy": "2024-06-25T23:03:13.875839Z", + "iopub.status.idle": "2024-06-25T23:03:16.938708Z", + "shell.execute_reply": "2024-06-25T23:03:16.938088Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GradientBoostingClassifier()\n", + "RandomizedSearchCV(estimator=GradientBoostingClassifier(),\n", + " param_distributions={'learning_rate': [0.001, 0.05, 0.1, 0.2,\n", + " 0.5],\n", + " 'max_depth': [3, 5, 10]})\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "from cleanlab.classification import CleanLearning\n", + "from sklearn.ensemble import GradientBoostingClassifier\n", + "from sklearn.model_selection import RandomizedSearchCV\n", + "\n", + "# Make example data\n", + "data = np.vstack([np.random.random((100, 2)), np.random.random((100, 2)) + 10])\n", + "labels = np.array([0] * 100 + [1] * 100)\n", + "\n", + "# Introduce label errors\n", + "true_errors = [97, 98, 100, 101, 102, 104]\n", + "for idx in true_errors:\n", + " labels[idx] = 1 - labels[idx]\n", + "\n", + "# CleanLearning with no hyperparameter-tuning during expensive cross-validation to find label issues\n", + "# but hyperparameter-tuning for the final training of model on clean subset of the data:\n", + "\n", + "model_to_find_errors = GradientBoostingClassifier() # this model will be trained many times via cross-validation\n", + "model_to_return = RandomizedSearchCV(GradientBoostingClassifier(),\n", + " param_distributions = {\n", + " \"learning_rate\": [0.001, 0.05, 0.1, 0.2, 0.5],\n", + " \"max_depth\": [3, 5, 10],\n", + " }\n", + " ) # this model will be trained once on clean subset of data\n", + "\n", + "cl0 = CleanLearning(model_to_find_errors)\n", + "issues = cl0.find_label_issues(data, labels)\n", + "\n", + "cl = CleanLearning(model_to_return).fit(data, labels, 
label_issues=issues) # CleanLearning with hyperparameter tuning for the final training\n", + "pred_probs = cl.predict_proba(data) # predictions from hyperparameter-tuned GradientBoostingClassifier\n", + "\n", + "print(cl0.clf) # will be GradientBoostingClassifier()\n", + "print(cl.clf) # will be RandomizedSearchCV(estimator=GradientBoostingClassifier(),...)" + ] + }, + { + "cell_type": "markdown", + "id": "d228decd", + "metadata": {}, + "source": [ + "### Why does regression.learn.CleanLearning take so long?" + ] + }, + { + "cell_type": "markdown", + "id": "de5c984b", + "metadata": {}, + "source": [ + "To effectively identify errors in a regression dataset, the methods in [regression.learn.CleanLearning](../cleanlab/regression/learn.html#cleanlab.regression.learn.CleanLearning) estimate each datapoint's aleatoric uncertainty (by fitting a second copy of the regression model to predict the residuals’ magnitudes), as well as its epistemic uncertainty (by fitting multiple copies of the regression model with bootstrap resampling). These uncertainty estimates help provide a robust quality score that accounts for the model's imperfect predictions. \n", + "\n", + "These estimates produce better results but require longer runtimes. Here are a few options to speed up these methods:\n", + "\n", + "- Reduce the number of bootstrap resampling rounds by decreasing the `n_boot` argument (default value is 5; set it to 0 to skip the epistemic uncertainty estimation entirely).\n", + "\n", + "- Set `include_aleatoric_uncertainty=False` to skip the aleatoric uncertainty estimation.\n", + "\n", + "- Include fewer elements in the `coarse_search_range` argument of [regression.learn.CleanLearning.find_label_issues](../cleanlab/regression/learn.html#cleanlab.regression.learn.CleanLearning.find_label_issues). 
This is the overall set of values initially considered when estimating the fraction of data that have label issues.\n", + "\n", + "- Reduce the `fine_search_size` argument of [regression.learn.CleanLearning.find_label_issues](../cleanlab/regression/learn.html#cleanlab.regression.learn.CleanLearning.find_label_issues). A higher number represents a more thorough search to precisely estimate the fraction of data that have label issues.\n", + "\n", + "Below is sample code showing how to pass in these arguments." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "623df36d", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:16.941276Z", + "iopub.status.busy": "2024-06-25T23:03:16.940849Z", + "iopub.status.idle": "2024-06-25T23:03:16.998699Z", + "shell.execute_reply": "2024-06-25T23:03:16.998084Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
CleanLearning(include_aleatoric_uncertainty=False, model=LinearRegression(),\n",
+       "              n_boot=1)
" + ], + "text/plain": [ + "CleanLearning(include_aleatoric_uncertainty=False, model=LinearRegression(),\n", + " n_boot=1)" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "from cleanlab.regression.learn import CleanLearning\n", + "\n", + "X = np.random.random(size=(30, 3))\n", + "coefficients = np.random.uniform(-1, 1, size=3)\n", + "y = np.dot(X, coefficients) + np.random.normal(scale=0.2, size=30)\n", + "\n", + "# pass optional arguments to reduce runtime\n", + "cl = CleanLearning(n_boot=1, include_aleatoric_uncertainty=False)\n", + "cl.find_label_issues(X, y, coarse_search_range=[0.05, 0.1], fine_search_size=2)\n", + "\n", + "# you can also pass coarse_search_range and fine_search_size as kwargs to CleanLearning.fit\n", + "cl.fit(X, y, find_label_issues_kwargs={\"coarse_search_range\": [0.05, 0.1], \"fine_search_size\": 2})" + ] + }, + { + "cell_type": "markdown", + "id": "1677ba25", + "metadata": {}, + "source": [ + "**With Datalab**:\n", + "\n", + "Datalab runs CleanLearning under the hood when looking for label issues in regression datasets. Here's how to achieve the same behavior as calling `CleanLearning.find_label_issues()` in the code above, using Datalab:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "af3052ac", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:17.000966Z", + "iopub.status.busy": "2024-06-25T23:03:17.000615Z", + "iopub.status.idle": "2024-06-25T23:03:17.042881Z", + "shell.execute_reply": "2024-06-25T23:03:17.042387Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding label issues ...\n", + "\n", + "Audit complete. 
3 issues found in the dataset.\n" + ] + } + ], + "source": [ + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(data = {\"X\": X, \"y\": y}, label_name = \"y\", task=\"regression\")\n", + "\n", + "issue_types = {\n", + " \"label\": {\n", + " \"clean_learning_kwargs\": {\"n_boot\": 1, \"include_aleatoric_uncertainty\": False},\n", + " \"coarse_search_range\": [0.05, 0.1],\n", + " \"fine_search_size\": 2,\n", + " },\n", + "}\n", + "lab.find_issues(features=X, issue_types = issue_types)" + ] + }, + { + "cell_type": "markdown", + "id": "bd3ada64", + "metadata": {}, + "source": [ + "### How do I specify pre-computed data slices/clusters when detecting the Underperforming Group Issue?" + ] + }, + { + "cell_type": "markdown", + "id": "6d6c88c3", + "metadata": {}, + "source": [ + "The instructions for specifying pre-computed data slices/clusters when detecting underperforming groups in a dataset are now covered in detail in the Datalab workflows tutorial.\n", + "\n", + "- [Using clustering algorithms](./datalab/workflows.html#Find-Underperforming-Groups-in-a-Dataset).\n", + "- [Using categorical columns in a tabular dataset](./datalab/workflows.html#Predefining-Data-Slices-for-Detecting-Underperforming-Groups)." + ] + }, + { + "cell_type": "markdown", + "id": "7351cf56", + "metadata": {}, + "source": [ + "### How to handle near-duplicate data identified by Datalab?\n", + "\n", + "cleanlab may identify near-duplicate examples in your dataset: examples that are very similar to each other and can cause issues in model training and analytics. When near-duplicates are present, models may unexpectedly emphasize these examples, especially if they were accidentally duplicated. In such cases, it is important to remove the (near) duplicate copies from your dataset to ensure accurate and reliable results. A common strategy is to remove all but one of the duplicates from your dataset. 
Here's how you can achieve this with results from cleanlab's `Datalab` class:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "3949f4d1", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:17.045068Z", + "iopub.status.busy": "2024-06-25T23:03:17.044793Z", + "iopub.status.idle": "2024-06-25T23:03:17.052454Z", + "shell.execute_reply": "2024-06-25T23:03:17.051891Z" + } + }, + "outputs": [], + "source": [ + "from typing import Callable\n", + "import pandas as pd\n", + "\n", + "\n", + "def merge_duplicate_sets(df, merge_key: str):\n", + " \"\"\"Generate group keys for each row, then merge intersecting sets.\n", + " \n", + " :param df: DataFrame with columns 'is_near_duplicate_issue' and 'near_duplicate_sets'\n", + " :param merge_key: Name of the column to store the merged sets\n", + " \"\"\"\n", + "\n", + " df[merge_key] = df.apply(construct_group_key, axis=1)\n", + " merged_sets = consolidate_sets(df[merge_key].tolist())\n", + " df[merge_key] = df[merge_key].map(\n", + " lambda x: next(s for s in merged_sets if x.issubset(s))\n", + " )\n", + " return df\n", + "\n", + "def construct_group_key(row):\n", + " \"\"\"Convert near_duplicate_sets into a frozenset and include the row's own index.\"\"\"\n", + " return frozenset(row['near_duplicate_sets']).union({row.name})\n", + "\n", + "def consolidate_sets(sets_list):\n", + " \"\"\"Merge sets if they intersect.\"\"\"\n", + " \n", + " # Convert the input list of frozensets to a list of mutable sets\n", + " sets_list = [set(item) for item in sets_list]\n", + " \n", + " # A flag to keep track of whether any sets were merged in the current iteration\n", + " merged = True\n", + "\n", + " # Continue the merging process as long as we have merged some sets in the previous iteration\n", + " while merged:\n", + " merged = False\n", + " new_sets = []\n", + "\n", + " # Iterate through each set in our list\n", + " for current_set in sets_list:\n", + " # Skip empty sets\n", + " if not 
current_set:\n", + " continue\n", + "\n", + " # Find all sets that have an intersection with the current set\n", + " intersecting_sets = [s for s in sets_list if s & current_set]\n", + "\n", + " # If more than one set intersects, set the merged flag to True\n", + " if len(intersecting_sets) > 1:\n", + " merged = True\n", + "\n", + " # Merge all intersecting sets into one set\n", + " merged_set = set().union(*intersecting_sets)\n", + " new_sets.append(merged_set)\n", + "\n", + " # Empty the sets we've merged to prevent them from being processed again\n", + " for s in intersecting_sets:\n", + " sets_list[sets_list.index(s)] = set()\n", + "\n", + " # Replace the original sets list with the new list of merged sets\n", + " sets_list = new_sets\n", + "\n", + " # Convert the merged sets back to frozensets for the output\n", + " return [frozenset(item) for item in sets_list]\n", + "\n", + "def lowest_score_strategy(sub_df):\n", + " \"\"\"Keep the row with the lowest near_duplicate_score.\"\"\"\n", + " return sub_df['near_duplicate_score'].idxmin()\n", + "\n", + "\n", + "def filter_near_duplicates(data: pd.DataFrame, strategy_fn: Callable = lowest_score_strategy, **strategy_kwargs):\n", + " \"\"\"\n", + " Given a dataframe with columns 'is_near_duplicate_issue' and 'near_duplicate_sets',\n", + " return a series of boolean values where True indicates the rows to be removed.\n", + " The strategy_fn determines which rows to keep within each near_duplicate_set.\n", + "\n", + " :param data: DataFrame with is_near_duplicate_issue and near_duplicate_sets columns\n", + " :param strategy_fn: Function to determine which rows to keep within each near_duplicate_set\n", + " :return: Series of boolean values where True indicates rows to be removed.\n", + " \"\"\"\n", + " \n", + " # Filter out rows where 'is_near_duplicate_issue' is True to get potential duplicates\n", + " duplicate_rows = data.query(\"is_near_duplicate_issue\").copy()\n", + "\n", + " # Generate group keys for each row 
and merge intersecting sets\n", + " group_key = \"sets\"\n", + " duplicate_rows = merge_duplicate_sets(duplicate_rows, merge_key=group_key)\n", + "\n", + " # Use the strategy function to determine the indices of the rows to keep for each group\n", + " to_keep_indices = duplicate_rows.groupby(group_key).apply(strategy_fn, **strategy_kwargs).explode().values\n", + "\n", + " # Produce a boolean series indicating which rows should be removed\n", + " to_remove = ~data.index.isin(to_keep_indices)\n", + "\n", + " return to_remove" + ] + }, + { + "cell_type": "markdown", + "id": "d65b3d94", + "metadata": {}, + "source": [ + "The functions above collect sets of near-duplicate examples. Within each\n", + "collection, a single example is chosen to be kept in the dataset. The rest of the examples in the collection are removed.\n", + "Examples that are not near-duplicates of any other examples are kept in the dataset as well.\n", + "\n", + "The choice of which example to keep in each set of near-duplicate examples can be made in a variety of ways. Here, the example with the lowest near-duplicate score is chosen.\n", + "You can use any strategy that best suits your application by defining the strategy as a function and passing it as the `strategy_fn` argument to `filter_near_duplicates()`.\n", + "Below is an example of how this is applied to a dataset.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "1a8e41fa", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:17.054544Z", + "iopub.status.busy": "2024-06-25T23:03:17.054095Z", + "iopub.status.idle": "2024-06-25T23:03:17.073533Z", + "shell.execute_reply": "2024-06-25T23:03:17.072954Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding near_duplicate issues ...\n", + "\n", + "Audit complete. 
3 issues found in the dataset.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_7666/1995098996.py:88: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.\n", + " to_keep_indices = duplicate_rows.groupby(group_key).apply(strategy_fn, **strategy_kwargs).explode().values\n" + ] + } + ], + "source": [ + "from cleanlab import Datalab\n", + "import numpy as np\n", + "\n", + "# Assume you have a dataset with a set of 3 near-duplicate examples\n", + "features = np.random.random(size=(15, 3))\n", + "for neighbor in range(1, 3):\n", + " # Make examples 0, 1, and 2 near-duplicates of each other\n", + " features[neighbor] = features[0] + np.random.normal(scale=0.001, size=3)\n", + "\n", + "# Identify near-duplicate examples with Datalab\n", + "your_dataset = {\n", + " \"features\": features,\n", + "}\n", + "lab = Datalab(data=your_dataset)\n", + "lab.find_issues(features = features, issue_types={\"near_duplicate\": {}})\n", + "\n", + "# Pick out ids of near-duplicate examples to remove\n", + "near_duplicate_issues = (\n", + " lab.get_issues(\"near_duplicate\")\n", + " .query(\"is_near_duplicate_issue\")\n", + " .sort_values(\"near_duplicate_score\")\n", + ")\n", + "ids_to_remove_series = filter_near_duplicates(near_duplicate_issues)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "9202a50a", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:17.075937Z", + "iopub.status.busy": "2024-06-25T23:03:17.075748Z", + "iopub.status.idle": "2024-06-25T23:03:17.079353Z", + "shell.execute_reply": "2024-06-25T23:03:17.078732Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"Near-duplicate examples to keep: [0]\n", + "Near-duplicate examples to remove: [1, 2]\n" + ] + } + ], + "source": [ + "print(\"Near-duplicate examples to keep:\", np.where(~ids_to_remove_series)[0].tolist())\n", + "\n", + "print(\"Near-duplicate examples to remove:\", np.where(ids_to_remove_series)[0].tolist())" + ] + }, + { + "cell_type": "markdown", + "id": "3a28168h", + "metadata": {}, + "source": [ + "### What ML models should I run cleanlab with? How do I fix the issues cleanlab has identified?" + ] + }, + { + "cell_type": "markdown", + "id": "1a117547", + "metadata": {}, + "source": [ + "These questions are automatically handled for you in [Cleanlab Studio](https://cleanlab.ai/blog/data-centric-ai/) -- our platform for no-code data improvement.\n", + "While this open-source library **finds** data issues, an interface is needed to efficiently **fix** these issues in your dataset. [Cleanlab Studio](https://cleanlab.ai/blog/data-centric-ai/) is a no-code platform to **find and fix** problems in real-world ML datasets. Cleanlab Studio automatically runs the data quality algorithms from this library on top of AutoML models fit to your data, and presents detected issues in a smart data editing interface. Think of it like a data cleaning assistant that helps you quickly improve the quality of your data (via AI/automation + streamlined UX). [Try it for free!](https://cleanlab.ai/signup/) \n", + "\n", + "![Stages of modern AI pipeline that can now be automated with Cleanlab Studio](https://raw.githubusercontent.com/cleanlab/assets/master/cleanlab/ml-pipeline.png)" + ] + }, + { + "cell_type": "markdown", + "id": "3a28168f", + "metadata": {}, + "source": [ + "### What license is cleanlab open-sourced under?" 
+ ] + }, + { + "cell_type": "markdown", + "id": "1a117546", + "metadata": {}, + "source": [ + "[AGPL-3.0 license](https://github.com/cleanlab/cleanlab/blob/master/LICENSE)\n", + "\n", + "**What does this mean?** If you're working at a company, you can use this open-source library to clean up your internal datasets. You can also use this open-source library to clean up a dataset used to train a model that is deployed in a commercial product.\n", + "For non-commercial purposes, feel free to release altered versions of the source code as long as you include the same license.\n", + "\n", + "Please email `team@cleanlab.ai` to discuss licensing needs if you would like to offer a commercial product that utilizes any cleanlab source code." + ] + }, + { + "cell_type": "markdown", + "id": "1520a93f", + "metadata": {}, + "source": [ + "### Can't find an answer to your question?\n", + "\n", + "If your question is not addressed in these tutorials, please refer to the [Cleanlab GitHub issues](https://github.com/cleanlab/cleanlab/issues?q=is%3Aissue), the [Cleanlab Code Examples](https://github.com/cleanlab/examples), or our [Slack Community](https://cleanlab.ai/slack).\n", + "\n", + "If your question is not addressed anywhere, please open a [new GitHub issue](https://github.com/cleanlab/cleanlab/issues/new/choose). Our developers may also provide personalized assistance in our [Slack Community](https://cleanlab.ai/slack). 
\n", + "\n", + "Professional support and services are also available from our [ML experts](https://cleanlab.ai/about/), learn more by emailing: `team@cleanlab.ai`" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "00dbc1c370d44fcbb44dcd60cb587aa2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "057e5016b7cc481088238185528f6bb7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_09ddf2620ab045ba91ea88a16210bec8", + "placeholder": "​", + "style": "IPY_MODEL_8e2fd49885b84c60bc2e9fcfb5e98db7", + "tabbable": null, + "tooltip": null, + "value": " 10000/? 
[00:00<00:00, 987848.04it/s]" + } + }, + "09ddf2620ab045ba91ea88a16210bec8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "10df8f9b40c643cf87c77be707aaff6c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + 
"grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "17ba4eaa15484930b21f20b8e3100f80": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_10df8f9b40c643cf87c77be707aaff6c", + "placeholder": "​", + "style": "IPY_MODEL_dc46b4249b59419880a39e89ba31d50c", + "tabbable": null, + "tooltip": null, + "value": "number of examples processed for estimating thresholds: " + } + }, + "1b9ff3a07c084feaa7dcdf58d8f05eb9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + 
"grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "23452e56da464c4b96d01d0f60ec04e1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_fffa451d4f684d3f9363c09010ee2f4a", + "max": 50.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_82f30b9f0ff545c88ff32f3391d1e015", + "tabbable": null, + "tooltip": null, + "value": 50.0 + } + }, + "28dcb3fb4f3c41cdaa2349635fb636a1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": 
"IPY_MODEL_1b9ff3a07c084feaa7dcdf58d8f05eb9", + "max": 50.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_00dbc1c370d44fcbb44dcd60cb587aa2", + "tabbable": null, + "tooltip": null, + "value": 50.0 + } + }, + "2b9aae05181e474484115c9349a41c05": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "34f3df63612d4043b2df853787de9a24": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_644073654c1c4ebeb0fe93c87e53f5cd", + "placeholder": "​", + "style": "IPY_MODEL_7ec797f6a3fb47079e43a27a9eec2d81", + "tabbable": null, + "tooltip": null, + "value": " 10000/? 
[00:00<00:00, 1416946.72it/s]" + } + }, + "51814402f95e474e8aeae2cae568ab2c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c17c8d47e81944caa1b6ef709e543f28", + "IPY_MODEL_28dcb3fb4f3c41cdaa2349635fb636a1", + "IPY_MODEL_34f3df63612d4043b2df853787de9a24" + ], + "layout": "IPY_MODEL_d0998743cd97424ba48d23dcecb9f23e", + "tabbable": null, + "tooltip": null + } + }, + "644073654c1c4ebeb0fe93c87e53f5cd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + 
"visibility": null, + "width": null + } + }, + "7ec797f6a3fb47079e43a27a9eec2d81": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "82f30b9f0ff545c88ff32f3391d1e015": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "8e2fd49885b84c60bc2e9fcfb5e98db7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "c17c8d47e81944caa1b6ef709e543f28": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + 
"description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f1c5d3314d494929895dd366f7104095", + "placeholder": "​", + "style": "IPY_MODEL_2b9aae05181e474484115c9349a41c05", + "tabbable": null, + "tooltip": null, + "value": "number of examples processed for checking labels: " + } + }, + "d0998743cd97424ba48d23dcecb9f23e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d94b77d0f6ad40dfa13bb6ce614c0556": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": 
"HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_17ba4eaa15484930b21f20b8e3100f80", + "IPY_MODEL_23452e56da464c4b96d01d0f60ec04e1", + "IPY_MODEL_057e5016b7cc481088238185528f6bb7" + ], + "layout": "IPY_MODEL_daf8847cc3a84bc19a021a7474c4ca09", + "tabbable": null, + "tooltip": null + } + }, + "daf8847cc3a84bc19a021a7474c4ca09": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dc46b4249b59419880a39e89ba31d50c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + 
"background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "f1c5d3314d494929895dd366f7104095": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fffa451d4f684d3f9363c09010ee2f4a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + 
"flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/indepth_overview.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/indepth_overview.ipynb new file mode 100644 index 000000000..8df766d1d --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/indepth_overview.ipynb @@ -0,0 +1,2420 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Sfmml1VCqCHm" + }, + "source": [ + "# The Workflows of Data-centric AI for Classification with Noisy Labels\n", + "\n", + "In this tutorial, you will learn how to easily incorporate [cleanlab](https://github.com/cleanlab/cleanlab) into your ML development workflows to:\n", + "\n", + "- Automatically find issues such as label errors, outliers, and near duplicates lurking in your classification data.\n", + "- Score the label quality of every example in your dataset.\n", + "- Train robust models in the presence of label issues.\n", + "- Identify overlapping classes that you can merge to make the learning task less ambiguous.\n", + "- Generate an overall label health score to track improvements in your labels as you clean your datasets over time.\n", + "\n", + "This tutorial provides an in-depth survey of the many different ways cleanlab can be used for Data-Centric AI. 
If you have a different use case in mind that is not supported, please [tell us about it](https://github.com/cleanlab/cleanlab/issues)!\n", + "While this tutorial focuses on standard multi-class (and binary) classification datasets, cleanlab also supports other tasks, including [data labeled by multiple annotators](multiannotator.html), [multi-label classification](../cleanlab/filter.rst#cleanlab.filter.find_label_issues), and [token classification of text](token_classification.html).\n", + "\n", + "**cleanlab is grounded in theory and science**. Learn more:\n", + "\n", + "[Research Publications](https://cleanlab.ai/research) | [Label Errors found by cleanlab](https://labelerrors.com/) | [Examples using cleanlab](https://github.com/cleanlab/examples)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XBK4cAOUyLgW" + }, + "source": [ + "## Install dependencies and import them" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can use pip to install all packages required for this tutorial as follows:\n", + "\n", + "```\n", + "!pip install matplotlib \n", + "!pip install cleanlab[datalab]\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. 
if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:20.450063Z", + "iopub.status.busy": "2024-06-25T23:03:20.449598Z", + "iopub.status.idle": "2024-06-25T23:03:21.615240Z", + "shell.execute_reply": "2024-06-25T23:03:21.614685Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "# Package versions used: matplotlib==3.5.1 \n", + "\n", + "dependencies = [\"cleanlab\", \"matplotlib\", \"datasets\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")\n", + "\n", + "%config InlineBackend.print_figure_kwargs={\"facecolor\": \"w\"}" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:21.617725Z", + "iopub.status.busy": "2024-06-25T23:03:21.617310Z", + "iopub.status.idle": "2024-06-25T23:03:21.794940Z", + "shell.execute_reply": "2024-06-25T23:03:21.794390Z" + }, + "id": "avXlHJcXjruP" + }, + 
"outputs": [], + "source": [ + "import numpy as np\n", + "import cleanlab\n", + "from cleanlab import Datalab\n", + "from cleanlab.classification import CleanLearning\n", + "from cleanlab.benchmarking import noise_generation\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.model_selection import cross_val_predict\n", + "from numpy.random import multivariate_normal\n", + "from matplotlib import pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "I6VuupksjruQ" + }, + "source": [ + "## Create the data (can skip these details)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
See the code for data generation **(click to expand)**\n", + "\n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai; if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "SEED = 0\n", + "\n", + "def make_data(\n", + " means=[[3, 2], [7, 7], [0, 8], [0, 10]],\n", + " covs=[\n", + " [[5, -1.5], [-1.5, 1]],\n", + " [[1, 0.5], [0.5, 4]],\n", + " [[5, 1], [1, 5]],\n", + " [[3, 1], [1, 1]],\n", + " ],\n", + " sizes=[100, 50, 50, 50],\n", + " avg_trace=0.8,\n", + " seed=SEED, # set to None for non-reproducible randomness\n", + "):\n", + " np.random.seed(seed=seed)\n", + "\n", + " K = len(means) # number of classes\n", + " data = []\n", + " labels = []\n", + " test_data = []\n", + " test_labels = []\n", + "\n", + " for idx in range(K):\n", + " data.append(\n", + " np.random.multivariate_normal(\n", + " mean=means[idx], cov=covs[idx], size=sizes[idx]\n", + " )\n", + " )\n", + " test_data.append(\n", + " np.random.multivariate_normal(\n", + " mean=means[idx], cov=covs[idx], size=sizes[idx]\n", + " )\n", + " )\n", + " labels.append(np.array([idx for i in range(sizes[idx])]))\n", + " test_labels.append(np.array([idx for i in range(sizes[idx])]))\n", + " X_train = np.vstack(data)\n", + " y_train = np.hstack(labels)\n", + " X_test = np.vstack(test_data)\n", + " y_test = np.hstack(test_labels)\n", + "\n", + " # Compute p(y=k), the prior distribution over true labels.\n", + " py_true = np.bincount(y_train) / float(len(y_train))\n", + "\n", + " noise_matrix_true = noise_generation.generate_noise_matrix_from_trace(\n", + " K,\n", + " trace=avg_trace * K,\n", + " py=py_true,\n", + " valid_noise_matrix=True,\n", + " seed=seed,\n", + " )\n", + "\n", + " # Generate our noisy labels using the noise_matrix.\n", + " s = noise_generation.generate_noisy_labels(y_train, noise_matrix_true)\n", + " s_test = noise_generation.generate_noisy_labels(y_test, noise_matrix_true)\n", + " ps = np.bincount(s) / float(len(s)) # Prior distribution over noisy 
labels\n", + "\n", + " return {\n", + " \"data\": X_train,\n", + " \"true_labels\": y_train, # You never get to see these perfect labels.\n", + " \"labels\": s, # Instead, you have these labels, which have some errors.\n", + " \"test_data\": X_test,\n", + " \"test_labels\": y_test, # Perfect labels used for \"true\" measure of model's performance during deployment.\n", + " \"noisy_test_labels\": s_test, # With IID train/test split, you'd have these labels, which also have some errors.\n", + " \"ps\": ps,\n", + " \"py_true\": py_true,\n", + " \"noise_matrix_true\": noise_matrix_true,\n", + " \"class_names\": [\"purple\", \"blue\", \"seafoam green\", \"yellow\"],\n", + " }\n", + "\n", + "\n", + "data_dict = make_data()\n", + "for key, val in data_dict.items(): # Map data_dict to variables in namespace\n", + " exec(key + \"=val\")\n", + "\n", + "# Display dataset visually using matplotlib\n", + "def plot_data(data, circles, title, alpha=1.0):\n", + " plt.figure(figsize=(14, 5))\n", + " plt.scatter(data[:, 0], data[:, 1], c=labels, s=60)\n", + " for i in circles:\n", + " plt.plot(\n", + " data[i][0],\n", + " data[i][1],\n", + " \"o\",\n", + " markerfacecolor=\"none\",\n", + " markeredgecolor=\"red\",\n", + " markersize=14,\n", + " markeredgewidth=2.5,\n", + " alpha=alpha\n", + " )\n", + " _ = plt.title(title, fontsize=25)\n", + "```\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:21.797461Z", + "iopub.status.busy": "2024-06-25T23:03:21.797073Z", + "iopub.status.idle": "2024-06-25T23:03:21.808904Z", + "shell.execute_reply": "2024-06-25T23:03:21.808481Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "SEED = 0\n", + "\n", + "def make_data(\n", + " means=[[3, 2], [7, 7], [0, 8], [0, 10]],\n", + " covs=[\n", + " [[5, -1.5], [-1.5, 1]],\n", + " [[1, 0.5], [0.5, 4]],\n", + " [[5, 1], [1, 5]],\n", + " [[3, 1], [1, 1]],\n", + " ],\n", + " sizes=[100, 50, 50, 50],\n", + " avg_trace=0.8,\n", + " seed=SEED, # set to None for non-reproducible randomness\n", + "):\n", + " np.random.seed(seed=seed)\n", + "\n", + " K = len(means) # number of classes\n", + " data = []\n", + " labels = []\n", + " test_data = []\n", + " test_labels = []\n", + "\n", + " for idx in range(K):\n", + " data.append(\n", + " np.random.multivariate_normal(\n", + " mean=means[idx], cov=covs[idx], size=sizes[idx]\n", + " )\n", + " )\n", + " test_data.append(\n", + " np.random.multivariate_normal(\n", + " mean=means[idx], cov=covs[idx], size=sizes[idx]\n", + " )\n", + " )\n", + " labels.append(np.array([idx for i in range(sizes[idx])]))\n", + " test_labels.append(np.array([idx for i in range(sizes[idx])]))\n", + " X_train = np.vstack(data)\n", + " y_train = np.hstack(labels)\n", + " X_test = np.vstack(test_data)\n", + " y_test = np.hstack(test_labels)\n", + "\n", + " # Compute p(y=k), the prior distribution over true labels.\n", + " py_true = np.bincount(y_train) / float(len(y_train))\n", + "\n", + " noise_matrix_true = noise_generation.generate_noise_matrix_from_trace(\n", + " K,\n", + " trace=avg_trace * K,\n", + " py=py_true,\n", + " valid_noise_matrix=True,\n", + " seed=seed,\n", + " )\n", + "\n", + " # Generate our noisy labels using the noise_matrix.\n", + " s = noise_generation.generate_noisy_labels(y_train, 
noise_matrix_true)\n", + " s_test = noise_generation.generate_noisy_labels(y_test, noise_matrix_true)\n", + " ps = np.bincount(s) / float(len(s)) # Prior distribution over noisy labels\n", + "\n", + " return {\n", + " \"data\": X_train,\n", + " \"true_labels\": y_train, # You never get to see these perfect labels.\n", + " \"labels\": s, # Instead, you have these labels, which have some errors.\n", + " \"test_data\": X_test,\n", + " \"test_labels\": y_test, # Perfect labels used for \"true\" measure of model's performance during deployment.\n", + " \"noisy_test_labels\": s_test, # With IID train/test split, you'd have these labels, which also have some errors.\n", + " \"ps\": ps,\n", + " \"py_true\": py_true,\n", + " \"noise_matrix_true\": noise_matrix_true,\n", + " \"class_names\": [\"purple\", \"blue\", \"seafoam green\", \"yellow\"],\n", + " }\n", + "\n", + "\n", + "data_dict = make_data()\n", + "for key, val in data_dict.items(): # Map data_dict to variables in namespace\n", + " exec(key + \"=val\")\n", + "\n", + "# Display dataset visually using matplotlib\n", + "def plot_data(data, circles, title, alpha=1.0):\n", + " plt.figure(figsize=(14, 5))\n", + " plt.scatter(data[:, 0], data[:, 1], c=labels, s=60)\n", + " for i in circles:\n", + " plt.plot(\n", + " data[i][0],\n", + " data[i][1],\n", + " \"o\",\n", + " markerfacecolor=\"none\",\n", + " markeredgecolor=\"red\",\n", + " markersize=14,\n", + " markeredgewidth=2.5,\n", + " alpha=alpha\n", + " )\n", + " _ = plt.title(title, fontsize=25)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:21.810884Z", + "iopub.status.busy": "2024-06-25T23:03:21.810555Z", + "iopub.status.idle": "2024-06-25T23:03:22.044953Z", + "shell.execute_reply": "2024-06-25T23:03:22.044360Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "true_errors = np.where(true_labels != labels)[0]\n", + "plot_data(data, circles=true_errors, title=\"A realistic, messy dataset with 4 classes\", alpha=0.3)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AM6E7tNS9pZn" + }, + "source": [ + "The figure above represents a toy dataset we'll use to demonstrate various cleanlab functionality. In this data, the features *X* are 2-dimensional and examples are colored according to their *given* label above.\n", + "\n", + "Like [many real-world datasets](https://labelerrors.com/), the given label happens to be incorrect for some of the examples (**circled in red**) in this dataset!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## **Workflow 1:** Use Datalab to detect many types of issues " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Datalab offers an easy interface to detect all sorts of common real-world issue in your dataset. Internally it uses many data quality algorithms, and these methods can also be directly invoked — as demonstrated in some of the subsequent workflows here." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:22.047376Z", + "iopub.status.busy": "2024-06-25T23:03:22.047040Z", + "iopub.status.idle": "2024-06-25T23:03:22.073136Z", + "shell.execute_reply": "2024-06-25T23:03:22.072686Z" + } + }, + "outputs": [], + "source": [ + "# Datalab offers several ways of loading the data\n", + "# we’ll simply wrap the training features and noisy labels in a dictionary. 
\n", + "data_dict = {\"X\": data, \"y\": labels}\n", + "\n", + "# get out of sample predicted probabilities via cross-validation.\n", + "yourFavoriteModel = LogisticRegression(verbose=0, random_state=SEED)\n", + "pred_probs = cross_val_predict(\n", + " estimator=yourFavoriteModel, X=data, y=labels, cv=3, method=\"predict_proba\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All that is need to audit your data is initalize a Datalab object with your dataset and call `find_issues()`. \n", + "\n", + "Pass in the predicted probabilities and feature embeddings for your data and Datalab will do all the work!\n", + "You do not necessarily need to provide all of this information depending on which types of issues you are interested in, but the more inputs you provide, the more types of issues `Datalab` can detect in your data. Using a better model to produce these inputs will ensure cleanlab more accurately estimates issues.\n", + "Make sure that the columns of your `pred_probs` are properly ordered with respect to the ordering of classes, which for Datalab is: lexicographically sorted by class name." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:22.075253Z", + "iopub.status.busy": "2024-06-25T23:03:22.074934Z", + "iopub.status.idle": "2024-06-25T23:03:24.123614Z", + "shell.execute_reply": "2024-06-25T23:03:24.122984Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding null issues ...\n", + "Finding label issues ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding outlier issues ...\n", + "Finding near_duplicate issues ...\n", + "Finding non_iid issues ...\n", + "Finding class_imbalance issues ...\n", + "Finding underperforming_group issues ...\n", + "\n", + "Audit complete. 
78 issues found in the dataset.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/sklearn/neighbors/_base.py:246: EfficiencyWarning: Precomputed sparse input was not sorted by row values. Use the function sklearn.neighbors.sort_graph_by_row_values to sort the input by row values, with warn_when_not_sorted=False to remove this warning.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "lab = Datalab(data_dict, label_name=\"y\")\n", + "lab.find_issues(pred_probs=pred_probs, features=data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After the audit is complete, review the findings using the `report` method:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:24.125919Z", + "iopub.status.busy": "2024-06-25T23:03:24.125543Z", + "iopub.status.idle": "2024-06-25T23:03:24.144436Z", + "shell.execute_reply": "2024-06-25T23:03:24.143950Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset Information: num_examples: 250, num_classes: 4\n", + "\n", + "Here is a summary of various issues found in your data:\n", + "\n", + " issue_type num_issues\n", + " label 64\n", + " outlier 7\n", + "near_duplicate 6\n", + " non_iid 1\n", + "\n", + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n", + "See which examples in your dataset exhibit each issue via: `datalab.get_issues()`\n", + "\n", + "Data indices corresponding to top examples of each issue are shown below.\n", + "\n", + "\n", + "----------------------- label issues -----------------------\n", + "\n", + "About this issue:\n", + "\tExamples whose given label is estimated to be potentially incorrect\n", + " (e.g. 
due to annotation error) are flagged as having label issues.\n", + " \n", + "\n", + "Number of examples with this issue: 64\n", + "Overall dataset quality in terms of this issue: 0.7560\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_label_issue label_score given_label predicted_label\n", + "99 True 5.637318e-08 1 0\n", + "8 True 3.896262e-07 1 0\n", + "64 True 3.548391e-05 1 0\n", + "107 True 7.923417e-05 3 1\n", + "10 True 9.375075e-05 2 1\n", + "\n", + "\n", + "---------------------- outlier issues ----------------------\n", + "\n", + "About this issue:\n", + "\tExamples that are very different from the rest of the dataset \n", + " (i.e. potentially out-of-distribution or rare/anomalous instances).\n", + " \n", + "\n", + "Number of examples with this issue: 7\n", + "Overall dataset quality in terms of this issue: 0.3454\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_outlier_issue outlier_score\n", + "147 True 0.014051\n", + "10 True 0.020451\n", + "249 True 0.042594\n", + "132 True 0.043859\n", + "189 True 0.045954\n", + "\n", + "\n", + "------------------ near_duplicate issues -------------------\n", + "\n", + "About this issue:\n", + "\tA (near) duplicate issue refers to two or more examples in\n", + " a dataset that are extremely similar to each other, relative\n", + " to the rest of the dataset. The examples flagged with this issue\n", + " may be exactly duplicated, or lie atypically close together when\n", + " represented as vectors (i.e. 
feature embeddings).\n", + " \n", + "\n", + "Number of examples with this issue: 6\n", + "Overall dataset quality in terms of this issue: 0.6120\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor\n", + "3 True 0.023714 [58] 0.007136\n", + "58 True 0.023714 [3] 0.007136\n", + "119 True 0.107266 [103] 0.033738\n", + "103 True 0.107266 [119] 0.033738\n", + "238 True 0.119505 [236] 0.037843\n", + "\n", + "\n", + "---------------------- non_iid issues ----------------------\n", + "\n", + "About this issue:\n", + "\tWhether the dataset exhibits statistically significant\n", + " violations of the IID assumption like:\n", + " changepoints or shift, drift, autocorrelation, etc.\n", + " The specific violation considered is whether the\n", + " examples are ordered such that almost adjacent examples\n", + " tend to have more similar feature values.\n", + " \n", + "\n", + "Number of examples with this issue: 1\n", + "Overall dataset quality in terms of this issue: 0.0000\n", + "\n", + "Examples representing most severe instances of this issue:\n", + " is_non_iid_issue non_iid_score\n", + "222 True 0.614915\n", + "122 False 0.624422\n", + "126 False 0.625965\n", + "119 False 0.626079\n", + "118 False 0.627675\n", + "\n", + "Additional Information: \n", + "p-value: 0.0\n" + ] + } + ], + "source": [ + "lab.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZmUd-5tljruT" + }, + "source": [ + "## **Workflow 2:** Use CleanLearning for more robust Machine Learning\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:24.146673Z", + "iopub.status.busy": "2024-06-25T23:03:24.146347Z", + "iopub.status.idle": "2024-06-25T23:03:25.602101Z", + "shell.execute_reply": "2024-06-25T23:03:25.601516Z" + }, + "id": "AaHC5MRKjruT" + }, + "outputs": [ + { + "data": { + 
"text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_label_issuelabel_qualitygiven_labelpredicted_labelsample_weight
0False0.695223001.323529
1False0.523015001.323529
2True0.013720300.000000
3False0.675727001.323529
4False0.646521001.323529
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_quality given_label predicted_label sample_weight\n", + "0 False 0.695223 0 0 1.323529\n", + "1 False 0.523015 0 0 1.323529\n", + "2 True 0.013720 3 0 0.000000\n", + "3 False 0.675727 0 0 1.323529\n", + "4 False 0.646521 0 0 1.323529" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "yourFavoriteModel = LogisticRegression(verbose=0, random_state=SEED)\n", + "\n", + "# CleanLearning: Machine Learning with cleaned data (given messy, real-world data)\n", + "cl = cleanlab.classification.CleanLearning(yourFavoriteModel, seed=SEED)\n", + "\n", + "# Fit model to messy, real-world data, automatically training on cleaned data.\n", + "_ = cl.fit(data, labels)\n", + "\n", + "# See the label quality for every example, which data has issues, and more.\n", + "cl.get_label_issues().head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "78udGSU6jruT" + }, + "source": [ + "### Clean Learning = Machine Learning with cleaned data\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:25.605010Z", + "iopub.status.busy": "2024-06-25T23:03:25.604337Z", + "iopub.status.idle": "2024-06-25T23:03:25.617886Z", + "shell.execute_reply": "2024-06-25T23:03:25.617451Z" + }, + "id": "Wy27rvyhjruU" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy using yourFavoriteModel: 83%\n", + "Accuracy using yourFavoriteModel (+ CleanLearning): 86%\n" + ] + } + ], + "source": [ + "# For comparison, this is how you would have trained your model normally (without Cleanlab)\n", + "yourFavoriteModel = LogisticRegression(verbose=0, random_state=SEED)\n", + "yourFavoriteModel.fit(data, labels)\n", + "print(f\"Accuracy using yourFavoriteModel: {yourFavoriteModel.score(test_data, test_labels):.0%}\")\n", + "\n", + "# But CleanLearning can do anything 
yourFavoriteModel can do, but enhanced.\n", + "# For example, CleanLearning gives you predictions (just like yourFavoriteModel)\n", + "# but the magic is that CleanLearning was trained as if your data did not have label errors.\n", + "print(f\"Accuracy using yourFavoriteModel (+ CleanLearning): {cl.score(test_data, test_labels):.0%}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rtEh09G7764o" + }, + "source": [ + "Note! *Accuracy* refers to the accuracy with respect to the *true* error-free labels of a test set, i.e. what we actually care about in practice, because that's what real-world model performance is based on. If you don't have a clean test set, you can use cleanlab to make one :)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_b8O6_J2jruU" + }, + "source": [ + "## **Workflow 3:** Use CleanLearning to find_label_issues in one line of code\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:25.620009Z", + "iopub.status.busy": "2024-06-25T23:03:25.619680Z", + "iopub.status.idle": "2024-06-25T23:03:25.692767Z", + "shell.execute_reply": "2024-06-25T23:03:25.692197Z" + }, + "id": "Db8YHnyVjruU" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
is_label_issuelabel_qualitygiven_labelpredicted_label
0False0.69522300
1False0.52301500
2True0.01372030
3False0.67572700
4False0.64652100
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_quality given_label predicted_label\n", + "0 False 0.695223 0 0\n", + "1 False 0.523015 0 0\n", + "2 True 0.013720 3 0\n", + "3 False 0.675727 0 0\n", + "4 False 0.646521 0 0" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# One line of code. Literally.\n", + "issues = CleanLearning(yourFavoriteModel, seed=SEED).find_label_issues(data, labels)\n", + "\n", + "issues.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8OOsvMoMjruU" + }, + "source": [ + "### Visualize the twenty examples with lowest label quality to see if Cleanlab works.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:25.695187Z", + "iopub.status.busy": "2024-06-25T23:03:25.694818Z", + "iopub.status.idle": "2024-06-25T23:03:25.904277Z", + "shell.execute_reply": "2024-06-25T23:03:25.903729Z" + }, + "id": "iJqAHuS2jruV" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "lowest_quality_labels = issues[\"label_quality\"].argsort()[:20]\n", + "plot_data(data, circles=lowest_quality_labels, title=\"The 20 lowest label quality examples\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wdtPREswG2fe" + }, + "source": [ + "Above, the top 20 label issues circled in red are found automatically using cleanlab (no true labels given).\n", + "\n", + "If you've already computed the label issues using ``CleanLearning``, you can pass them into `fit()` and it will train **much** faster (skips label-issue identification step)." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:25.906574Z", + "iopub.status.busy": "2024-06-25T23:03:25.906190Z", + "iopub.status.idle": "2024-06-25T23:03:25.923183Z", + "shell.execute_reply": "2024-06-25T23:03:25.922697Z" + }, + "id": "PcPTZ_JJG3Cx" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
CleanLearning(clf=LogisticRegression(random_state=0),\n",
+       "              find_label_issues_kwargs={'confident_joint': array([[68,  0,  8,  8],\n",
+       "       [ 5, 46,  3,  0],\n",
+       "       [15,  3, 31, 14],\n",
+       "       [ 2,  1, 12, 34]]),\n",
+       "                                        'min_examples_per_class': 10},\n",
+       "              seed=0)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" + ], + "text/plain": [ + "CleanLearning(clf=LogisticRegression(random_state=0),\n", + " find_label_issues_kwargs={'confident_joint': array([[68, 0, 8, 8],\n", + " [ 5, 46, 3, 0],\n", + " [15, 3, 31, 14],\n", + " [ 2, 1, 12, 34]]),\n", + " 'min_examples_per_class': 10},\n", + " seed=0)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# CleanLearning can train faster if issues are provided at fitting time.\n", + "cl.fit(data, labels, label_issues=issues)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XYFkRMk-jruV" + }, + "source": [ + "## **Workflow 4:** Use cleanlab to find dataset-level and class-level issues\n", + "\n", + "- Did you notice that the yellow and seafoam green classes above are overlapping?\n", + "- How can a model ever know (or learn) what's ground truth inside the yellow distribution?\n", + "- If these two classes were merged, the model could learn more accurately from 3 classes (versus 4).\n", + "\n", + "cleanlab automatically finds dataset-level issues like this, in one line of code. Check this out!\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:25.925372Z", + "iopub.status.busy": "2024-06-25T23:03:25.925037Z", + "iopub.status.idle": "2024-06-25T23:03:25.934927Z", + "shell.execute_reply": "2024-06-25T23:03:25.934374Z" + }, + "id": "0lonvOYvjruV" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Class Name AClass Name BClass Index AClass Index BNum Overlapping ExamplesJoint Probability
0seafoam greenyellow23260.104
1purpleseafoam green02230.092
2purpleyellow03100.040
3blueseafoam green1260.024
4purpleblue0150.020
5blueyellow1310.004
\n", + "
" + ], + "text/plain": [ + " Class Name A Class Name B Class Index A Class Index B \\\n", + "0 seafoam green yellow 2 3 \n", + "1 purple seafoam green 0 2 \n", + "2 purple yellow 0 3 \n", + "3 blue seafoam green 1 2 \n", + "4 purple blue 0 1 \n", + "5 blue yellow 1 3 \n", + "\n", + " Num Overlapping Examples Joint Probability \n", + "0 26 0.104 \n", + "1 23 0.092 \n", + "2 10 0.040 \n", + "3 6 0.024 \n", + "4 5 0.020 \n", + "5 1 0.004 " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cleanlab.dataset.find_overlapping_classes(\n", + " labels=labels,\n", + " confident_joint=cl.confident_joint, # cleanlab uses the confident_joint internally to quantify label noise (see cleanlab.count.compute_confident_joint)\n", + " class_names=class_names,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZXkMIKlGjruV" + }, + "source": [ + "Do the results surprise you? Did you expect the purple and seafoam green to also have so much overlap?\n", + "\n", + "There are two things happening here:\n", + "\n", + "1. **Distribution Overlap**: The green distribution has huge variance and overlaps with other distributions.\n", + " - Cleanlab handles this for you: read the theory behind cleanlab for overlapping classes here: https://arxiv.org/abs/1705.01936\n", + "2. 
**Label Issues**: A ton of examples (which actually belong to the purple class) have been mislabeled as \"green\" in our dataset.\n", + "\n", + "### Now, let's see what happens if we merge classes \"seafoam green\" and \"yellow\"\n", + "* The top two classes found automatically by ``cleanlab.dataset.find_overlapping_classes()``" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:25.936927Z", + "iopub.status.busy": "2024-06-25T23:03:25.936746Z", + "iopub.status.idle": "2024-06-25T23:03:26.022563Z", + "shell.execute_reply": "2024-06-25T23:03:26.021909Z" + }, + "id": "MfqTCa3kjruV" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Original classes] Accuracy of yourFavoriteModel: 83%\n", + "[Modified classes] Accuracy of yourFavoriteModel: 94%\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Modified classes] Accuracy of yourFavoriteModel (+ CleanLearning): 96%\n" + ] + } + ], + "source": [ + "yourFavoriteModel1 = LogisticRegression(verbose=0, random_state=SEED)\n", + "yourFavoriteModel1.fit(data, labels)\n", + "print(f\"[Original classes] Accuracy of yourFavoriteModel: {yourFavoriteModel1.score(test_data, test_labels):.0%}\")\n", + "\n", + "merged_labels, merged_test_labels = np.array(labels), np.array(test_labels)\n", + "\n", + "# Merge classes: map all yellow-labeled examples to seafoam green\n", + "merged_labels[merged_labels == 3] = 2\n", + "merged_test_labels[merged_test_labels == 3] = 2\n", + "\n", + "# Re-run our comparison. 
Re-run your model on the newly labeled dataset.\n", + "yourFavoriteModel2 = LogisticRegression(verbose=0, random_state=SEED)\n", + "yourFavoriteModel2.fit(data, merged_labels)\n", + "print(f\"[Modified classes] Accuracy of yourFavoriteModel: {yourFavoriteModel2.score(test_data, merged_test_labels):.0%}\")\n", + "\n", + "# Re-run CleanLearning as well.\n", + "yourFavoriteModel3 = LogisticRegression(verbose=0, random_state=SEED)\n", + "cl3 = cleanlab.classification.CleanLearning(yourFavoriteModel3, seed=SEED)\n", + "cl3.fit(data, merged_labels)\n", + "print(f\"[Modified classes] Accuracy of yourFavoriteModel (+ CleanLearning): {cl3.score(test_data, merged_test_labels):.0%}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bi53hnRxjruW" + }, + "source": [ + "While that's a huge improvement, remember that choosing among three classes is an easier task than choosing among four, so it's not fair to compare these numbers directly.\n", + "\n", + "Instead, the big takeaway is...\n", + "if you get to choose your classes, combining overlapping classes can make the learning task easier for your model. But if you have lots of classes, how do you know which ones to merge? That's when you use `cleanlab.dataset.find_overlapping_classes`.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BxI7bgn8L_1K" + }, + "source": [ + "## **Workflow 5:** Clean your test set too if you're doing ML with noisy labels!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iZ43QfbrNk0K" + }, + "source": [ + "If your test and training data were randomly split (IID), then be aware that your test labels are likely noisy too! It is thus important to fix label issues in them before we can trust measures like test accuracy.\n", + "\n", + "* More about what can go wrong if you don't use a clean test set [in this paper](https://arxiv.org/abs/2103.14749)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.024948Z", + "iopub.status.busy": "2024-06-25T23:03:26.024646Z", + "iopub.status.idle": "2024-06-25T23:03:26.156644Z", + "shell.execute_reply": "2024-06-25T23:03:26.156020Z" + }, + "id": "9ZtWAYXqMAPL" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " Noisy Test Accuracy (on given test labels) using yourFavoriteModel: 69%\n", + " Noisy Test Accuracy (on given test labels) using yourFavoriteModel (+ CleanLearning): 71%\n", + "Actual Test Accuracy (on corrected test labels) using yourFavoriteModel: 83%\n", + "Actual Test Accuracy (on corrected test labels) using yourFavoriteModel (+ CleanLearning): 86%\n" + ] + } + ], + "source": [ + "from sklearn.metrics import accuracy_score\n", + "\n", + "# Fit your model on noisily labeled train data\n", + "yourFavoriteModel = LogisticRegression(verbose=0, random_state=SEED)\n", + "yourFavoriteModel.fit(data, labels)\n", + "\n", + "# Get predicted probabilities for test data (these are out-of-sample)\n", + "my_test_pred_probs = yourFavoriteModel.predict_proba(test_data)\n", + "my_test_preds = my_test_pred_probs.argmax(axis=1) # predicted labels\n", + "\n", + "# Find label issues in the test data\n", + "issues_test = CleanLearning(yourFavoriteModel, seed=SEED).find_label_issues(\n", + " labels=noisy_test_labels, pred_probs=my_test_pred_probs)\n", + "\n", + "# You should inspect issues_test and fix issues to ensure high-quality test data labels.\n", + "corrected_test_labels = test_labels # Here we'll pretend you have done this perfectly :)\n", + "\n", + "# Fit more robust version of model on noisily labeled training data\n", + "cl = CleanLearning(yourFavoriteModel, seed=SEED).fit(data, labels)\n", + "cl_test_preds = cl.predict(test_data)\n", + "\n", + "print(f\" Noisy Test Accuracy (on given test labels) using yourFavoriteModel: 
{accuracy_score(noisy_test_labels, my_test_preds):.0%}\")\n", + "print(f\" Noisy Test Accuracy (on given test labels) using yourFavoriteModel (+ CleanLearning): {accuracy_score(noisy_test_labels, cl_test_preds):.0%}\")\n", + "print(f\"Actual Test Accuracy (on corrected test labels) using yourFavoriteModel: {accuracy_score(corrected_test_labels, my_test_preds):.0%}\")\n", + "print(f\"Actual Test Accuracy (on corrected test labels) using yourFavoriteModel (+ CleanLearning): {accuracy_score(corrected_test_labels, cl_test_preds):.0%}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GluE5XAAjruW" + }, + "source": [ + "## **Workflow 6:** One score to rule them all -- use cleanlab's overall dataset health score\n", + "\n", + "This score can be fairly compared across datasets or across versions of a dataset to track overall dataset quality (a.k.a. *dataset health*) over time.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.159138Z", + "iopub.status.busy": "2024-06-25T23:03:26.158680Z", + "iopub.status.idle": "2024-06-25T23:03:26.162549Z", + "shell.execute_reply": "2024-06-25T23:03:26.161995Z" + }, + "id": "0rXP3ZPWjruW" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " * Overall, about 28% (71 of the 250) labels in your dataset have potential issues.\n", + " ** The overall label health score for this dataset is: 0.72.\n" + ] + } + ], + "source": [ + "# One line of code.\n", + "health = cleanlab.dataset.overall_label_health_score(\n", + " labels, confident_joint=cl.confident_joint\n", + " # cleanlab uses the confident_joint internally to quantify label noise (see cleanlab.count.compute_confident_joint)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "M85Fta_bjruW" + }, + "source": [ + "### How accurate is this dataset health score?\n", + "\n", + "Because we know the true labels (we created this toy 
dataset), we can compare with ground truth." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.164454Z", + "iopub.status.busy": "2024-06-25T23:03:26.164277Z", + "iopub.status.idle": "2024-06-25T23:03:26.168289Z", + "shell.execute_reply": "2024-06-25T23:03:26.167818Z" + }, + "id": "-iRPe8KXjruW" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Percentage of label issues guessed by cleanlab: 28%\n", + "Percentage of (ground truth) label errors: 20%\n", + "\n", + "Question: cleanlab seems to be overestimating. How do we account for this 8% difference?\n", + "Answer: Data points that fall in between two overlapping distributions are often impossible to label and are counted as issues.\n" + ] + } + ], + "source": [ + "label_error_rate = sum(labels != true_labels) / len(labels)\n", + "print(f\"Percentage of label issues guessed by cleanlab: {1 - health:.0%}\")\n", + "print(f\"Percentage of (ground truth) label errors: {label_error_rate:.0%}\")\n", + "\n", + "offset = (1 - label_error_rate) - health\n", + "\n", + "print(\n", + " f\"\\nQuestion: cleanlab seems to be overestimating.\"\n", + " f\" How do we account for this {offset:.0%} difference?\"\n", + ")\n", + "print(\n", + " \"Answer: Data points that fall in between two overlapping distributions are often \"\n", + " \"impossible to label and are counted as issues.\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8hxY5lxJjruW" + }, + "source": [ + "## **Workflow(s) 7:** Use count, rank, filter modules directly\n", + "\n", + "- Using these modules directly is intended for more experienced cleanlab users. But once you understand how they work, you can create numerous powerful workflows.\n", + "- For these workflows, you **always** need two things:\n", + " 1. Out-of-sample predicted probabilities (e.g. computed via cross-validation)\n", + " 2. 
Labels (can contain label errors and various issues)\n", + "\n", + "#### cleanlab can compute out-of-sample predicted probabilities for you:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.170152Z", + "iopub.status.busy": "2024-06-25T23:03:26.169977Z", + "iopub.status.idle": "2024-06-25T23:03:26.207286Z", + "shell.execute_reply": "2024-06-25T23:03:26.206660Z" + }, + "id": "ZpipUliyjruW" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "pred_probs is a (250, 4) matrix of predicted probabilities\n" + ] + } + ], + "source": [ + "pred_probs = cleanlab.count.estimate_cv_predicted_probabilities(\n", + " data, labels, clf=yourFavoriteModel, seed=SEED\n", + ")\n", + "print(f\"pred_probs is a {pred_probs.shape} matrix of predicted probabilities\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ftWk9CTrjruW" + }, + "source": [ + "### **Workflow 7.1 (count)**: Fully characterize label noise (noise matrix, joint, prior of true labels, ...)\n", + "\n", + "Now that we have `pred_probs` and `labels`, advanced users can compute everything in `cleanlab.count`.\n", + "\n", + "- `py: prob(true_label=k)`\n", + " - For all classes K, this is the distribution over the actual true labels (which cleanlab can estimate for you even though you don't have the true labels).\n", + "- `noise_matrix: p(noisy|true)`\n", + " - This describes how errors were introduced into your labels. 
It's a conditional probability matrix: for each true class, it gives the probability that the given label is each possible class (i.e. the label-flipping rates).\n", + "- `inverse_noise_matrix: p(true|noisy)`\n", + " - For each given label, this tells you the probability that the true label is each possible class (including classes other than the given one).\n", + "- `confident_joint`\n", + " - This is an unnormalized (count-based) estimate of the number of examples in our dataset with each possible (true label, given label) pairing.\n", + "- `joint: p(true label, noisy label)`\n", + " - The joint distribution of noisy (given) and true labels is the most useful of all these statistics. From it, you can compute every other statistic listed above. One entry from this matrix can be interpreted as: \"The proportion of examples in our dataset whose true label is *i* and given label is *j*\".\n", + "\n", + "These five tools fully characterize class-conditional label noise in a dataset.\n", + "\n", + "#### Use cleanlab to estimate and visualize the joint distribution of label noise and noise matrix of label flipping rates:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.209735Z", + "iopub.status.busy": "2024-06-25T23:03:26.209531Z", + "iopub.status.idle": "2024-06-25T23:03:26.252694Z", + "shell.execute_reply": "2024-06-25T23:03:26.252052Z" + }, + "id": "SLq-3q4xjruX" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " Joint Label Noise Distribution Matrix P(given_label, true_label) of shape (4, 4)\n", + " p(s,y)\ty=0\ty=1\ty=2\ty=3\n", + "\t---\t---\t---\t---\n", + "s=0 |\t0.27\t0.0\t0.03\t0.03\n", + "s=1 |\t0.02\t0.18\t0.01\t0.0\n", + "s=2 |\t0.06\t0.01\t0.12\t0.06\n", + "s=3 |\t0.01\t0.0\t0.05\t0.14\n", + "\tTrace(matrix) = 0.72\n", + "\n", + "\n", + " Noise Matrix (aka Noisy Channel) P(given_label|true_label) of shape (4, 4)\n", + " p(s|y)\ty=0\ty=1\ty=2\ty=3\n", + "\t---\t---\t---\t---\n", + "s=0 
|\t0.76\t0.0\t0.15\t0.14\n", + "s=1 |\t0.06\t0.92\t0.06\t0.0\n", + "s=2 |\t0.17\t0.06\t0.57\t0.25\n", + "s=3 |\t0.02\t0.02\t0.22\t0.61\n", + "\tTrace(matrix) = 2.86\n", + "\n" + ] + } + ], + "source": [ + "(\n", + " py, noise_matrix, inverse_noise_matrix, confident_joint\n", + ") = cleanlab.count.estimate_py_and_noise_matrices_from_probabilities(labels, pred_probs)\n", + "\n", + "# Note: you can also combine the above two lines of code into a single line of code like this\n", + "(\n", + " py, noise_matrix, inverse_noise_matrix, confident_joint, pred_probs\n", + ") = cleanlab.count.estimate_py_noise_matrices_and_cv_pred_proba(\n", + " data, labels, clf=yourFavoriteModel, seed=SEED\n", + ")\n", + "\n", + "# Get the joint distribution of noisy and true labels from the confident joint\n", + "# This is the most powerful statistic in machine learning with noisy labels.\n", + "joint = cleanlab.count.estimate_joint(\n", + " labels, pred_probs, confident_joint=confident_joint\n", + ")\n", + "\n", + "# Pretty print the joint distribution and noise matrix\n", + "cleanlab.internal.util.print_joint_matrix(joint)\n", + "cleanlab.internal.util.print_noise_matrix(noise_matrix)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fKEsc-rBBbuW" + }, + "source": [ + "In some applications, you may have a priori knowledge regarding some of these quantities. 
In this case, you can pass them directly into cleanlab, which may be able to leverage this information to better identify label issues.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.255179Z", + "iopub.status.busy": "2024-06-25T23:03:26.254838Z", + "iopub.status.idle": "2024-06-25T23:03:26.349093Z", + "shell.execute_reply": "2024-06-25T23:03:26.348513Z" + }, + "id": "g5LHhhuqFbXK" + }, + "outputs": [], + "source": [ + "cl3 = cleanlab.classification.CleanLearning(yourFavoriteModel, seed=SEED)\n", + "_ = cl3.fit(data, labels, noise_matrix=noise_matrix_true) # CleanLearning with an a priori known noise_matrix" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cfeJAGyxFFQN" + }, + "source": [ + "### **Workflow 7.2 (filter):** Find label issues for any dataset and any model in one line of code\n", + "\n", + "Features of ``cleanlab.filter.find_label_issues``:\n", + "\n", + "* Versatility -- Choose from several [state-of-the-art](https://arxiv.org/abs/1911.00068) label-issue detection algorithms using ``filter_by=``.\n", + "* Works with any model, since it only needs predicted probabilities (no model object required).\n", + "* One line of code :)\n", + "\n", + "Remember ``CleanLearning.find_label_issues``? It uses this method internally."
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.351751Z", + "iopub.status.busy": "2024-06-25T23:03:26.351554Z", + "iopub.status.idle": "2024-06-25T23:03:26.442325Z", + "shell.execute_reply": "2024-06-25T23:03:26.441715Z" + }, + "id": "p7w8F8ezBcet" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 99, 8, 64, 45, 83, 213, 212, 218, 152, 197, 196, 170, 167,\n", + " 214, 164, 198, 21, 191, 107, 16, 51, 63, 2, 175, 10, 121,\n", + " 117, 24, 95, 82, 76, 26, 90, 25, 62, 22, 92, 49, 97,\n", + " 206, 68, 115, 7, 48, 43, 193, 184, 249, 194, 186, 201, 174,\n", + " 188, 163, 150, 190, 169, 151, 168, 54])" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Get out of sample predicted probabilities via cross-validation.\n", + "# Here we demonstrate the use of sklearn cross_val_predict as another option to get cross-validated predicted probabilities\n", + "pred_probs = cross_val_predict(\n", + " estimator=yourFavoriteModel, X=data, y=labels, cv=3, method=\"predict_proba\"\n", + ")\n", + "\n", + "# Find label issues\n", + "label_issues_indices = cleanlab.filter.find_label_issues(\n", + " labels=labels,\n", + " pred_probs=pred_probs,\n", + " filter_by=\"both\", # 5 available filter_by options\n", + " return_indices_ranked_by=\"self_confidence\", # 3 available label quality scoring options for rank ordering\n", + " rank_by_kwargs={\n", + " \"adjust_pred_probs\": True # adjust predicted probabilities (see docstring for more details)\n", + " },\n", + ")\n", + "\n", + "# Return dataset indices of examples with label issues\n", + "label_issues_indices" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4-ANXupQJPH8" + }, + "source": [ + "\n", + "#### Again, we can visualize the twenty examples with lowest label quality to see if Cleanlab works." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.444656Z", + "iopub.status.busy": "2024-06-25T23:03:26.444353Z", + "iopub.status.idle": "2024-06-25T23:03:26.654538Z", + "shell.execute_reply": "2024-06-25T23:03:26.653933Z" + }, + "id": "WETRL74tE_sU" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_data(data, circles=label_issues_indices[:20], title=\"Top 20 label issues found by cleanlab.filter.find_label_issues()\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BcekDhvFLntB" + }, + "source": [ + "### Workflow 7.2 supports lots of methods to ``find_label_issues()`` via the ``filter_by`` parameter.\n", + "* Here, we evaluate precision/recall/f1/accuracy of detecting true label issues for each method." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.656939Z", + "iopub.status.busy": "2024-06-25T23:03:26.656565Z", + "iopub.status.idle": "2024-06-25T23:03:26.842129Z", + "shell.execute_reply": "2024-06-25T23:03:26.841534Z" + }, + "id": "kCfdx2gOLmXS" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
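\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>filter_by algorithm</th>\n", + " <th>precision</th>\n", + " <th>recall</th>\n", + " <th>f1</th>\n", + " <th>accuracy</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n",
+ " <tr>\n", + " <th>0</th>\n", + " <td>prune_by_noise_rate</td>\n", + " <td>0.718750</td>\n", + " <td>0.92</td>\n", + " <td>0.807018</td>\n", + " <td>0.912</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>2</th>\n", + " <td>both</td>\n", + " <td>0.733333</td>\n", + " <td>0.88</td>\n", + " <td>0.800000</td>\n", + " <td>0.912</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>3</th>\n", + " <td>confident_learning</td>\n", + " <td>0.721311</td>\n", + " <td>0.88</td>\n", + " <td>0.792793</td>\n", + " <td>0.908</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>1</th>\n", + " <td>prune_by_class</td>\n", + " <td>0.676923</td>\n", + " <td>0.88</td>\n", + " <td>0.765217</td>\n", + " <td>0.892</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>4</th>\n", + " <td>predicted_neq_given</td>\n", + " <td>0.567901</td>\n", + " <td>0.92</td>\n", + " <td>0.702290</td>\n", + " <td>0.844</td>\n", + " </tr>\n",
+ " </tbody>\n", + "</table>\n", + "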
" + ], + "text/plain": [ + " filter_by algorithm precision recall f1 accuracy\n", + "0 prune_by_noise_rate 0.718750 0.92 0.807018 0.912\n", + "2 both 0.733333 0.88 0.800000 0.912\n", + "3 confident_learning 0.721311 0.88 0.792793 0.908\n", + "1 prune_by_class 0.676923 0.88 0.765217 0.892\n", + "4 predicted_neq_given 0.567901 0.92 0.702290 0.844" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.metrics import precision_score, recall_score, f1_score\n", + "import pandas as pd\n", + "\n", + "yourFavoriteModel = LogisticRegression(verbose=0, random_state=SEED)\n", + "\n", + "# Get cross-validated predicted probabilities\n", + "# Here we demonstrate the use of sklearn cross_val_predict as another option to get cross-validated predicted probabilities\n", + "pred_probs = cross_val_predict(\n", + " estimator=yourFavoriteModel, X=data, y=labels, cv=3, method=\"predict_proba\"\n", + ")\n", + "\n", + "# Ground truth label issues to use for evaluating different filter_by options\n", + "true_label_issues = (true_labels != labels)\n", + "\n", + "# Find label issues with different filter_by options\n", + "filter_by_list = [\n", + " \"prune_by_noise_rate\",\n", + " \"prune_by_class\",\n", + " \"both\",\n", + " \"confident_learning\",\n", + " \"predicted_neq_given\",\n", + "]\n", + "\n", + "results = []\n", + "\n", + "for filter_by in filter_by_list:\n", + "\n", + " # Find label issues\n", + " label_issues = cleanlab.filter.find_label_issues(\n", + " labels=labels,\n", + " pred_probs=pred_probs,\n", + " filter_by=filter_by\n", + " )\n", + "\n", + " precision = precision_score(true_label_issues, label_issues)\n", + " recall = recall_score(true_label_issues, label_issues)\n", + " f1 = f1_score(true_label_issues, label_issues)\n", + " acc = accuracy_score(true_label_issues, label_issues)\n", + "\n", + " result = {\n", + " \"filter_by algorithm\": filter_by,\n", + " \"precision\": precision,\n", + " \"recall\": 
recall,\n", + " \"f1\": f1,\n", + " \"accuracy\": acc\n", + " }\n", + "\n", + " results.append(result)\n", + "\n", + "# summary of results\n", + "pd.DataFrame(results).sort_values(by='f1', ascending=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vNkStbegYk7y" + }, + "source": [ + "### **Workflow 7.3 (rank):** Automatically rank every example by a unique label quality score. Find errors using `cleanlab.count.num_label_issues` as a threshold.\n", + "\n", + "cleanlab can analyze every label in a dataset and provide a numerical score gauging its overall quality. Low-quality labels indicate examples that should be more closely inspected, perhaps because their given label is incorrect, or simply because they represent an ambiguous edge-case that's worth a second look." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.844331Z", + "iopub.status.busy": "2024-06-25T23:03:26.844101Z", + "iopub.status.idle": "2024-06-25T23:03:26.850172Z", + "shell.execute_reply": "2024-06-25T23:03:26.849654Z" + }, + "id": "-uogYRWFYnuu" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 99, 8, 64, 107, 10, 16, 51, 63, 121, 213, 212, 218, 117,\n", + " 2, 152, 197, 196, 170, 45, 24, 167, 83, 95, 82, 76, 26,\n", + " 90, 214, 164, 25, 62, 22, 198, 92, 21, 191, 49, 97, 68,\n", + " 115, 7, 48, 43, 193, 184, 194, 186, 174, 188, 163, 155, 150,\n", + " 190, 169, 156, 151, 168, 54, 172, 176, 157])" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Estimate the number of label issues\n", + "label_issues_count = cleanlab.count.num_label_issues(\n", + " labels=labels,\n", + " pred_probs=pred_probs\n", + ")\n", + "\n", + "# Get label quality scores\n", + "label_quality_scores = cleanlab.rank.get_label_quality_scores(\n", + " labels=labels,\n", + " pred_probs=pred_probs,\n", + " method=\"self_confidence\"\n", + 
")\n", + "\n", + "# Rank-order by label quality scores and get the top estimated number of label issues\n", + "label_issues_indices = np.argsort(label_quality_scores)[:label_issues_count]\n", + "\n", + "label_issues_indices" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Qe-nGjdeYu3J" + }, + "source": [ + "#### Again, we can visualize the label issues found to see if Cleanlab works." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:26.852232Z", + "iopub.status.busy": "2024-06-25T23:03:26.851968Z", + "iopub.status.idle": "2024-06-25T23:03:27.064495Z", + "shell.execute_reply": "2024-06-25T23:03:27.063881Z" + }, + "id": "pG-ljrmcYp9Q" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_data(data, circles=label_issues_indices[:20], title=\"Top 20 label issues using cleanlab.rank with cleanlab.count.num_label_issues()\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ol57ouSTNAfZ" + }, + "source": [ + "#### Not sure when to use Workflow 7.2 or 7.3 to find label issues?\n", + "\n", + "* Workflow 7.2 is the easiest to use, since it's just one line of code.\n", + "* Workflow 7.3 is modular and extensible. As we add more label and data quality scoring functions to ``cleanlab.rank``, Workflow 7.3 will continue to work with them.\n", + "* Workflow 7.3 also suits users who have a custom way to rank their data by label quality and just need to know the cut-off, which ``cleanlab.count.num_label_issues`` provides." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gRfHlDlEKyRD" + }, + "source": [ + "## **Workflow 8:** Ensembling label quality scores from multiple predictors" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:27.066898Z", + "iopub.status.busy": "2024-06-25T23:03:27.066558Z", + "iopub.status.idle": "2024-06-25T23:03:28.119015Z", + "shell.execute_reply": "2024-06-25T23:03:28.118393Z" + }, + "id": "wL3ngCnuLEWd" + }, + "outputs": [], + "source": [ + "from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier\n", + "\n", + "# 3 models in ensemble\n", + "model1 = LogisticRegression(penalty=\"l2\", verbose=0, random_state=SEED)\n", + "model2 = RandomForestClassifier(max_depth=5, random_state=SEED)\n", + "model3 = GradientBoostingClassifier(\n", + " n_estimators=100, learning_rate=1.0, max_depth=3, random_state=SEED\n", + ")\n", + "\n", + "# Get cross-validated predicted probabilities from each model\n", + "cv_pred_probs_1 = cross_val_predict(\n", + " estimator=model1, X=data, y=labels, cv=3, method=\"predict_proba\"\n", + ")\n",
+ "cv_pred_probs_2 = cross_val_predict(\n", + " estimator=model2, X=data, y=labels, cv=3, method=\"predict_proba\"\n", + ")\n", + "cv_pred_probs_3 = cross_val_predict(\n", + " estimator=model3, X=data, y=labels, cv=3, method=\"predict_proba\"\n", + ")\n", + "\n", + "# List of predicted probabilities from each model\n", + "pred_probs_list = [cv_pred_probs_1, cv_pred_probs_2, cv_pred_probs_3]\n", + "\n", + "# Get ensemble label quality scores\n", + "label_quality_scores_best = cleanlab.rank.get_label_quality_ensemble_scores(\n", + " labels=labels, pred_probs_list=pred_probs_list, verbose=False\n", + ")\n", + "\n", + "# Alternative approach: create single ensemble predictor and get its pred_probs\n", + "cv_pred_probs_ensemble = (cv_pred_probs_1 + cv_pred_probs_2 + cv_pred_probs_3)/3 # uniform aggregation of predictions\n", + "\n", + "# Use this single set of pred_probs to find label issues\n", + "label_quality_scores_better = cleanlab.rank.get_label_quality_scores(\n", + " labels=labels, pred_probs=cv_pred_probs_ensemble\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z-ghgvqVcOJa" + }, + "source": [ + "While ensembling different models' label quality scores (`label_quality_scores_best`) will often be superior to getting label quality scores from a single ensemble predictor (`label_quality_scores_better`), both approaches produce significantly better label quality scores than just using the predictions from a single model." 
+ ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "tutorial_cleanlab_2_0.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/multiannotator.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/multiannotator.ipynb new file mode 100644 index 000000000..f74ce1484 --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/multiannotator.ipynb @@ -0,0 +1,1584 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4c7436b8", + "metadata": {}, + "source": [ + "# Estimate Consensus and Annotator Quality for Data Labeled by Multiple Annotators" + ] + }, + { + "cell_type": "markdown", + "id": "4b432513", + "metadata": {}, + "source": [ + "This 5-minute quickstart tutorial shows how to use cleanlab for classification data that has been labeled by *multiple* annotators (where each example has been labeled by at least one annotator, but not every annotator has labeled every example). Compared to existing crowdsourcing tools, cleanlab helps you better analyze such data by leveraging a trained classifier model in addition to the raw annotations. With one line of code, you can automatically compute:\n", + "\n", + "- A **consensus label** for each example (i.e. 
*truth inference*) that aggregates the individual annotations (more accurately than algorithms from crowdsourcing like majority-vote, Dawid-Skene, or GLAD).\n", + "- A **quality score for each consensus label**, which measures our confidence that this label is correct (via well-calibrated estimates that account for the number of annotators who have labeled this example, the overall quality of each annotator, and the quality of our trained ML models).\n", + "- An analogous **label quality score** for each individual label chosen by one annotator for a particular example (to measure our confidence in alternate labels when annotators differ from the consensus).\n", + "- An **overall quality score for each annotator**, which measures our confidence in the overall correctness of labels obtained from this annotator.\n", + "\n", + "**Overview of what we'll do in this tutorial:**\n", + "\n", + "- Obtain initial consensus labels of multiannotator data using majority vote.\n", + "- Train a classifier model on the initial consensus labels and use it to obtain out-of-sample predicted class probabilities.\n", + "- Use cleanlab's `multiannotator.get_label_quality_multiannotator` function to get improved consensus labels that more accurately reflect the ground truth.\n", + "- View other information about your multiannotator dataset, such as consensus and annotator quality scores, agreement between annotators, detailed label quality scores and more!\n", + "\n", + "**Consensus labels** represent the best guess of the true label for each example and can be used for more reliable modeling/analytics. Cleanlab automatically produces enhanced estimates of consensus through the use of machine learning.\n", + "**Quality scores** help us determine how much trust we can place in each consensus label, each individual annotator, and each particular label from a particular annotator. 
These quality scores can help you determine which annotators are best/worst overall, as well as which current consensus labels are least trustworthy and should perhaps be verified via additional annotation. \n", + "\n", + "This tutorial uses a toy *tabular* dataset labeled with multiple annotators, but **these steps can easily be applied to image or text data**." + ] + }, + { + "cell_type": "markdown", + "id": "03385f84", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have `multiannotator_labels` and (out-of-sample) `pred_probs` from a model trained on an existing set of consensus labels? Run the code below to get improved consensus labels and more information about the quality of your labels and annotators.\n", + "\n", + "
\n", + " \n", + "```ipython3 \n", + "from cleanlab.multiannotator import get_label_quality_multiannotator\n", + "\n", + "get_label_quality_multiannotator(multiannotator_labels, pred_probs)\n", + "\n", + "```\n", + "\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "e6a48d31", + "metadata": {}, + "source": [ + "## 1. Install and import required dependencies" + ] + }, + { + "cell_type": "markdown", + "id": "6c6e5b15", + "metadata": {}, + "source": [ + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install cleanlab\n", + "\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a3ddc95f", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:31.627016Z", + "iopub.status.busy": "2024-06-25T23:03:31.626842Z", + "iopub.status.idle": "2024-06-25T23:03:32.738771Z", + "shell.execute_reply": "2024-06-25T23:03:32.738132Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "dependencies = [\"cleanlab\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": 
"markdown", + "id": "dd0148e6", + "metadata": {}, + "source": [ + "Let’s import some of the packages needed throughout this tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "c4efd119", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:32.741556Z", + "iopub.status.busy": "2024-06-25T23:03:32.740987Z", + "iopub.status.idle": "2024-06-25T23:03:32.744226Z", + "shell.execute_reply": "2024-06-25T23:03:32.743675Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.model_selection import cross_val_predict\n", + "\n", + "from cleanlab.multiannotator import get_label_quality_multiannotator, get_majority_vote_label" + ] + }, + { + "cell_type": "markdown", + "id": "345b6678", + "metadata": {}, + "source": [ + "## 2. Create the data (can skip these details)" + ] + }, + { + "cell_type": "markdown", + "id": "82aeedc8", + "metadata": {}, + "source": [ + "For this tutorial we will generate a toy dataset that has 50 annotators and 300 examples. There are three possible classes, `0`, `1` and `2`. \n", + "\n", + "Each annotator annotates approximately 10% of the examples. We also synthetically made the last 5 annotators in our toy dataset have much noisier labels than the rest of the annotators.\n", + "\n", + "Solely for evaluating cleanlab's consensus labels against other consensus methods, we here also generate the true labels for this example dataset. However, true labels are not required for any cleanlab multiannotator functions (and they usually are not available in real applications).\n", + "To generate our multiannotator data, we define a `make_data()` method (can skip these details)." + ] + }, + { + "cell_type": "markdown", + "id": "69b5ddaa", + "metadata": {}, + "source": [ + "
See the code for data generation **(click to expand)**\n", + " \n", + "```ipython3\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + " \n", + "from cleanlab.benchmarking.noise_generation import generate_noise_matrix_from_trace\n", + "from cleanlab.benchmarking.noise_generation import generate_noisy_labels\n", + "\n", + "SEED = 111 # set to None for non-reproducible randomness\n", + "np.random.seed(seed=SEED)\n", + "\n", + "def make_data(\n", + " means=[[3, 2], [7, 7], [0, 8]],\n", + " covs=[[[5, -1.5], [-1.5, 1]], [[1, 0.5], [0.5, 4]], [[5, 1], [1, 5]]],\n", + " sizes=[150, 75, 75],\n", + " num_annotators=50,\n", + "):\n", + " \n", + " m = len(means) # number of classes\n", + " n = sum(sizes)\n", + " local_data = []\n", + " labels = []\n", + "\n", + " for idx in range(m):\n", + " local_data.append(\n", + " np.random.multivariate_normal(mean=means[idx], cov=covs[idx], size=sizes[idx])\n", + " )\n", + " labels.append(np.array([idx for i in range(sizes[idx])]))\n", + " X_train = np.vstack(local_data)\n", + " true_labels_train = np.hstack(labels)\n", + "\n", + " # Compute p(true_label=k)\n", + " py = np.bincount(true_labels_train) / float(len(true_labels_train))\n", + " \n", + " noise_matrix_better = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=0.8 * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=SEED,\n", + " )\n", + " \n", + " noise_matrix_worse = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=0.35 * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=SEED,\n", + " )\n", + "\n", + " # Generate our noisy labels using the noise_matrix for specified number of annotators.\n", + " s = pd.DataFrame(\n", + " np.vstack(\n", + " [\n", + " generate_noisy_labels(true_labels_train, noise_matrix_better)\n", + " if i < num_annotators - 5\n", + " else generate_noisy_labels(true_labels_train, noise_matrix_worse)\n", + " for i in range(num_annotators)\n", + 
" ]\n", + " ).transpose()\n", + " )\n", + "\n", + " # Each annotator only labels approximately 10% of the dataset\n", + " # (unlabeled points represented with NaN)\n", + " s = s.apply(lambda x: x.mask(np.random.random(n) < 0.9)).astype(\"Int64\")\n", + " s.dropna(axis=1, how=\"all\", inplace=True)\n", + " s.columns = [\"A\" + str(i).zfill(4) for i in range(1, num_annotators+1)]\n", + "\n", + " row_NA_check = pd.notna(s).any(axis=1)\n", + "\n", + " return {\n", + " \"X_train\": X_train[row_NA_check],\n", + " \"true_labels_train\": true_labels_train[row_NA_check],\n", + " \"multiannotator_labels\": s[row_NA_check].reset_index(drop=True),\n", + " }\n", + "```\n", + " \n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "c37c0a69", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:32.746401Z", + "iopub.status.busy": "2024-06-25T23:03:32.746075Z", + "iopub.status.idle": "2024-06-25T23:03:32.754505Z", + "shell.execute_reply": "2024-06-25T23:03:32.753935Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "from cleanlab.benchmarking.noise_generation import generate_noise_matrix_from_trace\n", + "from cleanlab.benchmarking.noise_generation import generate_noisy_labels\n", + "\n", + "SEED = 111 # set to None for non-reproducible randomness\n", + "np.random.seed(seed=SEED)\n", + "\n", + "def make_data(\n", + " means=[[3, 2], [7, 7], [0, 8]],\n", + " covs=[[[5, -1.5], [-1.5, 1]], [[1, 0.5], [0.5, 4]], [[5, 1], [1, 5]]],\n", + " sizes=[150, 75, 75],\n", + " num_annotators=50,\n", + "):\n", + " \n", + " m = len(means) # number of classes\n", + " n = sum(sizes)\n", + " local_data = []\n", + " labels = []\n", + "\n", + " for idx in range(m):\n", + " local_data.append(\n", + " np.random.multivariate_normal(mean=means[idx], cov=covs[idx], size=sizes[idx])\n", + " )\n", + " labels.append(np.array([idx for i in range(sizes[idx])]))\n", + " X_train = np.vstack(local_data)\n", + " true_labels_train = np.hstack(labels)\n", + "\n", + " # Compute p(true_label=k)\n", + " py = np.bincount(true_labels_train) / float(len(true_labels_train))\n", + " \n", + " noise_matrix_better = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=0.8 * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=SEED,\n", + " )\n", + " \n", + " noise_matrix_worse = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=0.35 * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=SEED,\n", + " )\n", + "\n", + " # Generate our noisy labels using the noise_matrix for specified number of annotators.\n", + " s = pd.DataFrame(\n", + " np.vstack(\n", + " [\n", + " 
generate_noisy_labels(true_labels_train, noise_matrix_better)\n", + " if i < num_annotators - 5\n", + " else generate_noisy_labels(true_labels_train, noise_matrix_worse)\n", + " for i in range(num_annotators)\n", + " ]\n", + " ).transpose()\n", + " )\n", + "\n", + " # Each annotator only labels approximately 10% of the dataset\n", + " # (unlabeled points represented with NaN)\n", + " s = s.apply(lambda x: x.mask(np.random.random(n) < 0.9)).astype(\"Int64\")\n", + " s.dropna(axis=1, how=\"all\", inplace=True)\n", + " s.columns = [\"A\" + str(i).zfill(4) for i in range(1, num_annotators+1)]\n", + "\n", + " row_NA_check = pd.notna(s).any(axis=1)\n", + "\n", + " return {\n", + " \"X_train\": X_train[row_NA_check],\n", + " \"true_labels_train\": true_labels_train[row_NA_check],\n", + " \"multiannotator_labels\": s[row_NA_check].reset_index(drop=True),\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "99f69523", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:32.756525Z", + "iopub.status.busy": "2024-06-25T23:03:32.756129Z", + "iopub.status.idle": "2024-06-25T23:03:32.809037Z", + "shell.execute_reply": "2024-06-25T23:03:32.808594Z" + } + }, + "outputs": [], + "source": [ + "data_dict = make_data()\n", + "\n", + "X = data_dict[\"X_train\"]\n", + "multiannotator_labels = data_dict[\"multiannotator_labels\"]\n", + "true_labels = data_dict[\"true_labels_train\"] # used for comparing the accuracy of consensus labels" + ] + }, + { + "cell_type": "markdown", + "id": "4a705e28", + "metadata": {}, + "source": [ + "Let's view the first few rows of the data used for this tutorial. 
Here are the labels selected by each annotator for the first few examples (rows) in the dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8f241c16", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:32.811247Z", + "iopub.status.busy": "2024-06-25T23:03:32.810843Z", + "iopub.status.idle": "2024-06-25T23:03:32.827851Z", + "shell.execute_reply": "2024-06-25T23:03:32.827353Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
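\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>A0001</th>\n", + " <th>A0002</th>\n", + " <th>A0003</th>\n", + " <th>A0004</th>\n", + " <th>A0005</th>\n", + " <th>A0006</th>\n", + " <th>A0007</th>\n", + " <th>A0008</th>\n", + " <th>A0009</th>\n", + " <th>A0010</th>\n", + " <th>...</th>\n", + " <th>A0041</th>\n", + " <th>A0042</th>\n", + " <th>A0043</th>\n", + " <th>A0044</th>\n", + " <th>A0045</th>\n", + " <th>A0046</th>\n", + " <th>A0047</th>\n", + " <th>A0048</th>\n", + " <th>A0049</th>\n", + " <th>A0050</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n",
+ " <tr>\n", + " <th>0</th>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>...</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>1</th>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>0</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>...</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>2</th>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>...</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>0</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>2</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>3</th>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>2</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>4</th>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>...</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>2</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>0</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " <td>&lt;NA&gt;</td>\n", + " </tr>\n",
+ " </tbody>\n", + "</table>\n", + "<p>5 rows × 50 columns</p>\n", + "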
" + ], + "text/plain": [ + " A0001 A0002 A0003 A0004 A0005 A0006 A0007 A0008 A0009 A0010 ... \\\n", + "0 ... \n", + "1 0 ... \n", + "2 ... \n", + "3 2 ... \n", + "4 ... \n", + "\n", + " A0041 A0042 A0043 A0044 A0045 A0046 A0047 A0048 A0049 A0050 \n", + "0 \n", + "1 \n", + "2 0 2 \n", + "3 0 \n", + "4 2 0 \n", + "\n", + "[5 rows x 50 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "multiannotator_labels.head()" + ] + }, + { + "cell_type": "markdown", + "id": "4a705e29", + "metadata": {}, + "source": [ + "Here are the corresponding features for these examples:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "4f0819ba", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:32.829943Z", + "iopub.status.busy": "2024-06-25T23:03:32.829624Z", + "iopub.status.idle": "2024-06-25T23:03:32.833501Z", + "shell.execute_reply": "2024-06-25T23:03:32.833060Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 5.60856743, 1.41693214],\n", + " [-0.40908785, 2.87147629],\n", + " [ 4.64941785, 1.10774851],\n", + " [ 3.0524466 , 1.71853246],\n", + " [ 4.37169848, 0.66031048]])" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X[:5]" + ] + }, + { + "cell_type": "markdown", + "id": "0cb8131d", + "metadata": {}, + "source": [ + "`multiannotator_labels` contains the class label that each annotator chose for each example in the dataset, with examples that a particular annotator did not label represented using `np.nan`. \n", + "`X` contains the features for each example, which happen to be numeric in this tutorial but any feature modality can be used with ``cleanlab.multiannotator``." + ] + }, + { + "cell_type": "markdown", + "id": "946726ad", + "metadata": {}, + "source": [ + "
\n", + "Bringing Your Own Data (BYOD)?\n", + "\n", + "You can easily replace the above with your own multiannotator labels and features, then continue with the rest of the tutorial.\n", + " \n", + "`multiannotator_labels` should be a numpy array or pandas DataFrame with each column representing an annotator and each row representing an example. Your labels should be represented as integer indices 0, 1, ..., num_classes - 1, where examples that are not annotated by a particular annotator are represented using `np.nan` or `pd.NA`. If you have string labels or other labels that do not fit the required format, you can convert them to the proper format using `cleanlab.internal.multiannotator_utils.format_multiannotator_labels`. \n", + " \n", + "Your features can be represented however you like (since these are not inputs to `cleanlab.multiannotator` methods) as long as you are able to fit a classifier to them and obtain its predicted class probabilities! \n", + "\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "51335def", + "metadata": {}, + "source": [ + "## 3. Get initial consensus labels via majority vote and compute out-of-sample predicted probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "c1857cc7", + "metadata": {}, + "source": [ + "Before training a machine learning model, we must first obtain initial consensus labels from the data annotations, representing a crude guess of the best label for each example. The most straightforward way to obtain an initial set of consensus labels is via simple majority vote." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "d009f347", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:32.835547Z", + "iopub.status.busy": "2024-06-25T23:03:32.835233Z", + "iopub.status.idle": "2024-06-25T23:03:32.849533Z", + "shell.execute_reply": "2024-06-25T23:03:32.848983Z" + } + }, + "outputs": [], + "source": [ + "majority_vote_label = get_majority_vote_label(multiannotator_labels)" + ] + }, + { + "cell_type": "markdown", + "id": "7287b733", + "metadata": {}, + "source": [ + "Majority vote consensus labels may not be very reliable, particularly for examples that were only labeled by one or a few annotators. To more reliably estimate consensus, we can account for the features associated with each example (based on which the annotations were derived in the first place). Fitting a classifier model is a natural way to account for these feature values; here we train a simple logistic regression model to get significantly more accurate estimates of consensus labels and associated quality scores.\n", + "\n", + "We fit the model to our initial consensus labels and then obtain (out-of-sample) predicted class probabilities for each example in the dataset from the trained model. 
These predicted probabilities help us estimate the best consensus labels and associated confidence values in a statistically optimal manner that accounts for all the available information." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "cbd1e415", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:32.851548Z", + "iopub.status.busy": "2024-06-25T23:03:32.851229Z", + "iopub.status.idle": "2024-06-25T23:03:32.876984Z", + "shell.execute_reply": "2024-06-25T23:03:32.876557Z" + } + }, + "outputs": [], + "source": [ + "model = LogisticRegression()\n", + "\n", + "num_crossval_folds = 5 \n", + "pred_probs = cross_val_predict(\n", + " estimator=model, X=X, y=majority_vote_label, cv=num_crossval_folds, method=\"predict_proba\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "4eab5188", + "metadata": {}, + "source": [ + "## 4. Use cleanlab to get better consensus labels and other statistics" + ] + }, + { + "cell_type": "markdown", + "id": "4d392ce5", + "metadata": {}, + "source": [ + "Using the annotators' labels and the (out-of-sample) predicted class probabilities from the model, cleanlab can estimate **improved consensus labels** for our data that are more accurate than our initial consensus labels were.\n", + "\n", + "Having accurate labels provides insight on each annotator's label quality and is key for boosting model accuracy and achieving dependable real-world results." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "6ca92617", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:32.879167Z", + "iopub.status.busy": "2024-06-25T23:03:32.878774Z", + "iopub.status.idle": "2024-06-25T23:03:34.781620Z", + "shell.execute_reply": "2024-06-25T23:03:34.780988Z" + } + }, + "outputs": [], + "source": [ + "results = get_label_quality_multiannotator(multiannotator_labels, pred_probs, verbose=False)" + ] + }, + { + "cell_type": "markdown", + "id": "98042e7f", + "metadata": {}, + "source": [ + "Here, we use the `multiannotator.get_label_quality_multiannotator()` function which returns a dictionary containing three items:\n" + ] + }, + { + "cell_type": "markdown", + "id": "76d7c0e2", + "metadata": {}, + "source": [ + "1. `label_quality` which gives us the improved consensus labels using information from each of the annotators and the model. The DataFrame also contains information about the number of annotations, annotator agreement and consensus quality score for each example.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "bf945113", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:34.784282Z", + "iopub.status.busy": "2024-06-25T23:03:34.783833Z", + "iopub.status.idle": "2024-06-25T23:03:34.790729Z", + "shell.execute_reply": "2024-06-25T23:03:34.790184Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " consensus_label consensus_quality_score annotator_agreement \\\n", + "0 0 0.736118 0.5 \n", + "1 0 0.757751 1.0 \n", + "2 0 0.782232 0.6 \n", + "3 0 0.715565 0.6 \n", + "4 0 0.824256 0.8 \n", + "\n", + " num_annotations \n", + "0 2 \n", + "1 3 \n", + "2 5 \n", + "3 5 \n", + "4 5 " + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results[\"label_quality\"].head()" + ] + }, + { + "cell_type": "markdown", + "id": "984d65c4", + "metadata": {}, + "source": [ + "2. `detailed_label_quality` which returns the label quality score for each label given by every annotator" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "14251ee0", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:34.792847Z", + "iopub.status.busy": "2024-06-25T23:03:34.792477Z", + "iopub.status.idle": "2024-06-25T23:03:34.804987Z", + "shell.execute_reply": "2024-06-25T23:03:34.804442Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " quality_annotator_A0001 quality_annotator_A0002 quality_annotator_A0003 \\\n", + "0 NaN NaN NaN \n", + "1 NaN NaN NaN \n", + "2 NaN NaN NaN \n", + "3 NaN NaN NaN \n", + "4 NaN NaN NaN \n", + "\n", + " quality_annotator_A0004 quality_annotator_A0005 quality_annotator_A0006 \\\n", + "0 NaN NaN NaN \n", + "1 NaN NaN NaN \n", + "2 NaN NaN NaN \n", + "3 NaN NaN NaN \n", + "4 NaN NaN NaN \n", + "\n", + " quality_annotator_A0007 quality_annotator_A0008 quality_annotator_A0009 \\\n", + "0 NaN NaN NaN \n", + "1 0.757751 NaN NaN \n", + "2 NaN NaN NaN \n", + "3 0.216078 NaN NaN \n", + "4 NaN NaN NaN \n", + "\n", + " quality_annotator_A0010 ... quality_annotator_A0041 \\\n", + "0 NaN ... NaN \n", + "1 NaN ... NaN \n", + "2 NaN ... NaN \n", + "3 NaN ... 0.715565 \n", + "4 NaN ... NaN \n", + "\n", + " quality_annotator_A0042 quality_annotator_A0043 quality_annotator_A0044 \\\n", + "0 NaN NaN NaN \n", + "1 NaN NaN NaN \n", + "2 0.782232 NaN NaN \n", + "3 NaN NaN NaN \n", + "4 NaN NaN 0.119188 \n", + "\n", + " quality_annotator_A0045 quality_annotator_A0046 quality_annotator_A0047 \\\n", + "0 NaN NaN NaN \n", + "1 NaN NaN NaN \n", + "2 NaN NaN NaN \n", + "3 NaN NaN NaN \n", + "4 NaN NaN 0.824256 \n", + "\n", + " quality_annotator_A0048 quality_annotator_A0049 quality_annotator_A0050 \n", + "0 NaN NaN NaN \n", + "1 NaN NaN NaN \n", + "2 0.070564 NaN NaN \n", + "3 NaN NaN NaN \n", + "4 NaN NaN NaN \n", + "\n", + "[5 rows x 50 columns]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results[\"detailed_label_quality\"].head()" + ] + }, + { + "cell_type": "markdown", + "id": "db02e63d", + "metadata": {}, + "source": [ + "3. `annotator_stats` which gives us the annotator quality score for each annotator, alongside other information such as the number of examples each annotator labeled, their agreement with the consensus labels, and the class they perform worst at. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "efe16638", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:34.806990Z", + "iopub.status.busy": "2024-06-25T23:03:34.806665Z", + "iopub.status.idle": "2024-06-25T23:03:34.812916Z", + "shell.execute_reply": "2024-06-25T23:03:34.812465Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " annotator_quality agreement_with_consensus worst_class \\\n", + "A0050 0.244981 0.208333 2 \n", + "A0047 0.295979 0.294118 2 \n", + "A0049 0.324197 0.310345 1 \n", + "A0046 0.355316 0.346154 1 \n", + "A0048 0.439732 0.480000 2 \n", + "A0031 0.523205 0.580645 2 \n", + "A0034 0.535313 0.607143 2 \n", + "A0021 0.606999 0.718750 1 \n", + "A0015 0.609526 0.678571 2 \n", + "A0011 0.621103 0.692308 1 \n", + "\n", + " num_examples_labeled \n", + "A0050 24 \n", + "A0047 34 \n", + "A0049 29 \n", + "A0046 26 \n", + "A0048 25 \n", + "A0031 31 \n", + "A0034 28 \n", + "A0021 32 \n", + "A0015 28 \n", + "A0011 26 " + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results[\"annotator_stats\"].head(10)" + ] + }, + { + "cell_type": "markdown", + "id": "a0d09bfa", + "metadata": {}, + "source": [ + "The `annotator_stats` DataFrame is sorted by increasing `annotator_quality`, showing us the worst annotators first.\n", + "\n", + "Notice that in the table above, annotators A0046 to A0050 have the lowest annotator quality scores. This is expected because we made the last 5 annotators systematically worse than the rest." + ] + }, + { + "cell_type": "markdown", + "id": "20ca8dd2", + "metadata": {}, + "source": [ + "### Comparing improved consensus labels" + ] + }, + { + "cell_type": "markdown", + "id": "1b49657d", + "metadata": {}, + "source": [ + "We can get the improved consensus labels from the `label_quality` DataFrame shown above."
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "abd0fb0b", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:34.815004Z", + "iopub.status.busy": "2024-06-25T23:03:34.814624Z", + "iopub.status.idle": "2024-06-25T23:03:34.817354Z", + "shell.execute_reply": "2024-06-25T23:03:34.816825Z" + } + }, + "outputs": [], + "source": [ + "improved_consensus_label = results[\"label_quality\"][\"consensus_label\"].values" + ] + }, + { + "cell_type": "markdown", + "id": "1fd7a5fd", + "metadata": {}, + "source": [ + "Since our toy dataset is synthetically generated by adding noise to each annotator's labels, we know the ground truth labels for each example. Hence we can compare the accuracy of the consensus labels obtained using majority vote, and the improved consensus labels obtained using cleanlab." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "cdf061df", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:34.819538Z", + "iopub.status.busy": "2024-06-25T23:03:34.819165Z", + "iopub.status.idle": "2024-06-25T23:03:34.822725Z", + "shell.execute_reply": "2024-06-25T23:03:34.822177Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy of majority vote labels = 0.8581081081081081\n", + "Accuracy of cleanlab consensus labels = 0.9797297297297297\n" + ] + } + ], + "source": [ + "majority_vote_accuracy = np.mean(true_labels == majority_vote_label)\n", + "cleanlab_label_accuracy = np.mean(true_labels == improved_consensus_label)\n", + "\n", + "print(f\"Accuracy of majority vote labels = {majority_vote_accuracy}\")\n", + "print(f\"Accuracy of cleanlab consensus labels = {cleanlab_label_accuracy}\")" + ] + }, + { + "cell_type": "markdown", + "id": "2c20b2c9", + "metadata": {}, + "source": [ + "We can see that the accuracy of the consensus labels improved as a result of using cleanlab, which not only takes the annotators' labels 
into account, but also uses a model to compute better consensus labels." + ] + }, + { + "cell_type": "markdown", + "id": "f82dd4d5", + "metadata": {}, + "source": [ + "### Inspecting consensus quality scores to find potential consensus label errors" + ] + }, + { + "cell_type": "markdown", + "id": "fddb5453", + "metadata": {}, + "source": [ + "We can get the consensus quality score from the `label_quality` DataFrame shown above." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "08949890", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:34.824723Z", + "iopub.status.busy": "2024-06-25T23:03:34.824361Z", + "iopub.status.idle": "2024-06-25T23:03:34.826875Z", + "shell.execute_reply": "2024-06-25T23:03:34.826464Z" + } + }, + "outputs": [], + "source": [ + "consensus_quality_score = results[\"label_quality\"][\"consensus_quality_score\"]" + ] + }, + { + "cell_type": "markdown", + "id": "5f150a08", + "metadata": {}, + "source": [ + "Besides obtaining improved consensus labels, cleanlab also computes a consensus quality score for each example. Lower scores indicate potential consensus label errors in the dataset.\n", + "\n", + "Here, we extract the 15 examples with the lowest consensus quality scores and compare the accuracy of their consensus labels against the true labels. For comparison, we also compute the accuracy for the remaining examples."
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "6948b073", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:34.828877Z", + "iopub.status.busy": "2024-06-25T23:03:34.828554Z", + "iopub.status.idle": "2024-06-25T23:03:34.832657Z", + "shell.execute_reply": "2024-06-25T23:03:34.832132Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy of 15 worst quality examples = 0.8\n", + "Accuracy of better quality examples = 0.9893238434163701\n" + ] + } + ], + "source": [ + "sorted_consensus_quality_score = consensus_quality_score.sort_values()\n", + "worst_quality = sorted_consensus_quality_score.index[:15]\n", + "better_quality = sorted_consensus_quality_score.index[15:]\n", + "\n", + "worst_quality_accuracy = np.mean(true_labels[worst_quality] == improved_consensus_label[worst_quality])\n", + "better_quality_accuracy = np.mean(true_labels[better_quality] == improved_consensus_label[better_quality])\n", + "\n", + "print(f\"Accuracy of 15 worst quality examples = {worst_quality_accuracy}\")\n", + "print(f\"Accuracy of better quality examples = {better_quality_accuracy}\")" + ] + }, + { + "cell_type": "markdown", + "id": "4fdf4d91", + "metadata": {}, + "source": [ + "We observe that the 15 examples with the worst consensus quality scores have lower accuracy than the rest of the examples. Cleanlab automatically determines which consensus labels are least trustworthy (you may want to have another annotator look at that data). Here we see that these trustworthiness estimates really do correspond to the true quality of the consensus labels (which we know in this toy dataset because we have the true labels, unlike in your applications)." + ] + }, + { + "cell_type": "markdown", + "id": "06cae16a", + "metadata": {}, + "source": [ + "## 5. 
Retrain model using improved consensus labels" + ] + }, + { + "cell_type": "markdown", + "id": "8d4e31ab", + "metadata": {}, + "source": [ + "After obtaining the improved consensus labels, we can now retrain a better version of our machine learning model using these newly obtained labels. " + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "6f8e6914", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:34.834651Z", + "iopub.status.busy": "2024-06-25T23:03:34.834347Z", + "iopub.status.idle": "2024-06-25T23:03:34.863829Z", + "shell.execute_reply": "2024-06-25T23:03:34.863275Z" + } + }, + "outputs": [], + "source": [ + "model = LogisticRegression()\n", + "\n", + "num_crossval_folds = 5 \n", + "improved_pred_probs = cross_val_predict(\n", + " estimator=model, X=X, y=improved_consensus_label, cv=num_crossval_folds, method=\"predict_proba\"\n", + ")\n", + "\n", + "# alternatively, we can treat all the improved consensus labels as training labels to fit the model \n", + "# model.fit(X, improved_consensus_label)" + ] + }, + { + "cell_type": "markdown", + "id": "e59f7d4f", + "metadata": {}, + "source": [ + "## Further improvements \n", + "You can also repeat this process of getting better consensus labels using the model's out-of-sample predicted probabilities and then retraining the model with the improved labels to get even better predicted class probabilities in a virtuous cycle!\n", + "For details, see our [examples](https://github.com/cleanlab/examples) notebook on [Iterative use of Cleanlab to Improve Classification Models (and Consensus Labels) from Data Labeled by Multiple Annotators](https://github.com/cleanlab/examples/blob/master/multiannotator_cifar10/multiannotator_cifar10.ipynb).\n", + "\n", + "If possible, the best way to improve your model is to collect additional labels for both previously annotated data and extra not-yet-labeled examples (i.e. *active learning*). 
To decide which data is most informative to label next, use `cleanlab.multiannotator.get_active_learning_scores()` rather than the methods from this tutorial. This is demonstrated in our examples notebook on [Active Learning with Multiple Data Annotators via ActiveLab](https://github.com/cleanlab/examples/blob/master/active_learning_multiannotator/active_learning.ipynb).\n", + "\n", + "While this notebook focused on analyzing the labels of your data, cleanlab can also check your data features for various issues. Learn how to do this by following our [Datalab tutorials](../tutorials/datalab/index.html), except you do not need to pass in `labels` now that you've already analyzed them with this notebook (or you can provide `labels` to Datalab as the consensus labels estimated here).\n", + "\n", + "\n", + "## How does cleanlab.multiannotator work?\n", + "\n", + "All estimates above are produced via the CROWDLAB algorithm, described in this paper that contains extensive benchmarks which show CROWDLAB can produce better estimates than popular methods like Dawid-Skene and GLAD:\n", + "\n", + "[CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators](https://arxiv.org/abs/2210.06812)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "b806d2ea", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:34.865875Z", + "iopub.status.busy": "2024-06-25T23:03:34.865697Z", + "iopub.status.idle": "2024-06-25T23:03:34.870213Z", + "shell.execute_reply": "2024-06-25T23:03:34.869778Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "if majority_vote_accuracy >= cleanlab_label_accuracy: # check cleanlab has improved prediction accuracy\n", + " raise Exception(\"Cleanlab training failed to improve consensus label accuracy\")\n", + "\n", + "if 
worst_quality_accuracy > better_quality_accuracy: # check bad consensus quality score corresponds to bad consensus\n", + " raise Exception(\"Cleanlab consensus quality score failed to detect bad consensus labels\")\n", + " \n", + "annotator_stats = results[\"annotator_stats\"]\n", + "bad_annotator_idx = [\"A0046\", \"A0047\", \"A0048\", \"A0049\", \"A0050\"]\n", + "bad_annotator_mask = annotator_stats.index.isin(bad_annotator_idx)\n", + "\n", + "avg_annotator_quality_bad = np.mean(annotator_stats[bad_annotator_mask][\"annotator_quality\"])\n", + "avg_annotator_quality_good = np.mean(annotator_stats[~bad_annotator_mask][\"annotator_quality\"])\n", + "\n", + "if avg_annotator_quality_bad >= avg_annotator_quality_good: # check bad annotator get bad quality scores \n", + " raise Exception(\"Low quality annotators have higher quality scores than good quality annotators\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + }, + "vscode": { + "interpreter": { + "hash": "50292dbb1f747f7151d445135d392af3138fb3c65386d17d9510cb605222b10b" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/multilabel_classification.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/multilabel_classification.ipynb new file mode 100644 index 000000000..37edb7913 --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/multilabel_classification.ipynb @@ -0,0 +1,795 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "64053c0f-3582-465b-9e4c-a83da332da88", + "metadata": {}, + "source": [ + "# Find Label Errors in Multi-Label Classification Datasets\n", + "\n", + "This 5-minute quickstart tutorial demonstrates how 
to find potential label errors in multi-label classification datasets. In such datasets, each example is labeled as belonging to one *or more* classes (unlike in *multi-class classification* where each example can only belong to one class). For a particular example in such multi-label classification data, we say each class either applies or not. We may even have some examples where *no* classes apply. Common applications of this include image tagging (or document tagging), where multiple tags can be appropriate for a single image (or document). For example, an image tagging application could involve the following classes: [`copyrighted`, `advertisement`, `face`, `violence`, `nsfw`]." + ] + }, + { + "cell_type": "markdown", + "id": "adaefc8b-b639-4bdf-af0d-337519e37ffc", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "cleanlab finds data/label issues based on two inputs: `labels` formatted as a list of lists of integer class indices that apply to each example in your dataset, and `pred_probs` from a trained multi-label classification model (which do not need to sum to 1 since the classes are not mutually exclusive). Once you have these, run the code below to find issues in your multi-label dataset:\n", + "\n", + "
\n", + " \n", + "```ipython3 \n", + "from cleanlab import Datalab\n", + "\n", + "# Assuming your dataset has a label column named 'label'\n", + "lab = Datalab(dataset, label_name='label', task='multilabel')\n", + "# To detect more issue types, optionally supply `features` (numeric dataset values or model embeddings of the data)\n", + "lab.find_issues(pred_probs=pred_probs, features=features)\n", + "\n", + "lab.report()\n", + "```\n", + "\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "6a6261a3-6ea1-44a6-ac91-d375c8aa5535", + "metadata": {}, + "source": [ + "## 1. Install required dependencies and get dataset\n", + "\n", + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install matplotlib\n", + "!pip install \"cleanlab[datalab]\"\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "7383d024-8273-4039-bccd-aab3020d331f", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:37.657857Z", + "iopub.status.busy": "2024-06-25T23:03:37.657384Z", + "iopub.status.idle": "2024-06-25T23:03:38.829128Z", + "shell.execute_reply": "2024-06-25T23:03:38.828570Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs.cleanlab.ai).\n", + "# Package versions we used: matplotlib==3.5.1\n", + "\n", + "dependencies = [\"cleanlab\", \"matplotlib\", \"datasets\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", 
\")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "bf9101d8-b1a9-4305-b853-45aaf3d67a69", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:38.831803Z", + "iopub.status.busy": "2024-06-25T23:03:38.831382Z", + "iopub.status.idle": "2024-06-25T23:03:39.025149Z", + "shell.execute_reply": "2024-06-25T23:03:39.024611Z" + } + }, + "outputs": [], + "source": [ + "import random\n", + "import numpy as np\n", + "import sklearn\n", + "from sklearn.multiclass import OneVsRestClassifier\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.model_selection import StratifiedKFold\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from cleanlab import Datalab\n", + "from cleanlab.internal.multilabel_utils import int2onehot, onehot2int" + ] + }, + { + "cell_type": "markdown", + "id": "6fe047ed", + "metadata": {}, + "source": [ + "Here we generate a small multi-label classification dataset for a quick demo. To see cleanlab applied to a real image tagging dataset, check out our [example](https://github.com/cleanlab/examples) notebook [\"Find Label Errors in Multi-Label Classification Data (CelebA Image Tagging)\"](https://github.com/cleanlab/examples/blob/master/multilabel_classification/image_tagging.ipynb)." + ] + }, + { + "cell_type": "markdown", + "id": "6b283ecc-ba52-4bd7-81d8-5397966b1621", + "metadata": {}, + "source": [ + "
Code to generate dataset (can skip these details) **(click to expand)**\n", + " \n", + "```ipython3\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + " \n", + "from cleanlab.benchmarking.noise_generation import (\n", + " generate_noise_matrix_from_trace,\n", + " generate_noisy_labels,\n", + ")\n", + "\n", + "def make_multilabel_data(\n", + " means=[[-5, 3.5], [0, 2], [-3, 6]],\n", + " covs=[[[3, -1.5], [-1.5, 1]], [[5, -1.5], [-1.5, 1]], [[3, -1.5], [-1.5, 1]]],\n", + " boxes_coordinates=[[-3.5, 0, -1.5, 1.7], [-1, 3, 2, 4], [-5, 2, -3, 4], [-3, 2, -1, 4]],\n", + " box_multilabels=[[0, 1], [1, 2], [0, 2], [0, 1, 2]],\n", + " sizes=[100, 80, 100],\n", + " avg_trace=0.9,\n", + " seed=1,\n", + "):\n", + " np.random.seed(seed=seed)\n", + " num_classes = len(means)\n", + " m = num_classes + len(\n", + " box_multilabels\n", + " ) # number of classes by treating each multilabel as 1 unique label\n", + " n = sum(sizes)\n", + " local_data = []\n", + " labels = []\n", + " test_data = []\n", + " test_labels = []\n", + " for i in range(0, len(means)):\n", + " local_data.append(np.random.multivariate_normal(mean=means[i], cov=covs[i], size=sizes[i]))\n", + " test_data.append(np.random.multivariate_normal(mean=means[i], cov=covs[i], size=sizes[i]))\n", + " test_labels += [[i]] * sizes[i]\n", + " labels += [[i]] * sizes[i]\n", + "\n", + " def make_multi(X, Y, bx1, by1, bx2, by2, label_list):\n", + " ll = np.array([bx1, by1]) # lower-left\n", + " ur = np.array([bx2, by2]) # upper-right\n", + "\n", + " inidx = np.all(np.logical_and(X.tolist() >= ll, X.tolist() <= ur), axis=1)\n", + " for i in range(0, len(Y)):\n", + " if inidx[i]:\n", + " Y[i] = label_list\n", + " return Y\n", + "\n", + " X_train = np.vstack(local_data)\n", + " X_test = np.vstack(test_data)\n", + "\n", + " for i in range(0, len(box_multilabels)):\n", + " bx1, by1, bx2, by2 = boxes_coordinates[i]\n", + " multi_label = box_multilabels[i]\n", + 
" labels = make_multi(X_train, labels, bx1, by1, bx2, by2, multi_label)\n", + " test_labels = make_multi(X_test, test_labels, bx1, by1, bx2, by2, multi_label)\n", + "\n", + " d = {}\n", + " for i in labels:\n", + " if str(i) not in d:\n", + " d[str(i)] = len(d)\n", + " inv_d = {v: k for k, v in d.items()}\n", + " labels_idx = [d[str(i)] for i in labels]\n", + " py = np.bincount(labels_idx) / float(len(labels_idx))\n", + " noise_matrix = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=avg_trace * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=seed,\n", + " )\n", + " noisy_labels_idx = generate_noisy_labels(labels_idx, noise_matrix)\n", + " noisy_labels = [eval(inv_d[i]) for i in noisy_labels_idx]\n", + " return {\n", + " \"X_train\": X_train,\n", + " \"true_labels_train\": labels,\n", + " \"X_test\": X_test,\n", + " \"true_labels_test\": test_labels,\n", + " \"labels\": noisy_labels,\n", + " \"dict_unique_label\": d,\n", + " 'labels_idx': noisy_labels_idx,\n", + "\n", + " }\n", + "\n", + "def get_color_array(labels):\n", + " \"\"\"\n", + " This function returns a dictionary mapping multi-labels to unique colors\n", + " \"\"\"\n", + " dcolors ={'[0]': 'aa4400',\n", + " '[0, 2]': '55227f',\n", + " '[0, 1]': '55a100',\n", + " '[1]': '00ff00',\n", + " '[1, 2]': '007f7f',\n", + " '[0, 1, 2]': '386b55',\n", + " '[2]': '0000ff'}\n", + "\n", + " return [\"#\"+dcolors[str(i)] for i in labels]\n", + "\n", + "def plot_data(data, circles, title, alpha=1.0,colors = []):\n", + " plt.figure(figsize=(14, 5))\n", + " done = set()\n", + " for i in range(0,len(data)):\n", + " lab = str(labels[i])\n", + " if lab in done:\n", + " label = \"\"\n", + " else:\n", + " label = lab\n", + " done.add(lab)\n", + " plt.scatter(data[i, 0], data[i, 1], c=colors[i], s=30,alpha=0.6, label = label)\n", + " for i in circles:\n", + " plt.plot(\n", + " data[i][0],\n", + " data[i][1],\n", + " \"o\",\n", + " markerfacecolor=\"none\",\n", + " markeredgecolor=\"red\",\n", + " 
markersize=14,\n", + " markeredgewidth=2.5,\n", + " alpha=alpha\n", + " )\n", + " _ = plt.title(title, fontsize=25)\n", + " plt.legend()\n", + "```\n", + " \n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e8ff5c2f-bd52-44aa-b307-b2b634147c68", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:39.027771Z", + "iopub.status.busy": "2024-06-25T23:03:39.027330Z", + "iopub.status.idle": "2024-06-25T23:03:39.040641Z", + "shell.execute_reply": "2024-06-25T23:03:39.040209Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "from cleanlab.benchmarking.noise_generation import (\n", + " generate_noise_matrix_from_trace,\n", + " generate_noisy_labels,\n", + ")\n", + "\n", + "def make_multilabel_data(\n", + " means=[[-5, 3.5], [0, 2], [-3, 6]],\n", + " covs=[[[3, -1.5], [-1.5, 1]], [[5, -1.5], [-1.5, 1]], [[3, -1.5], [-1.5, 1]]],\n", + " boxes_coordinates=[[-3.5, 0, -1.5, 1.7], [-1, 3, 2, 4], [-5, 2, -3, 4], [-3, 2, -1, 4]],\n", + " box_multilabels=[[0, 1], [1, 2], [0, 2], [0, 1, 2]],\n", + " sizes=[100, 80, 100],\n", + " avg_trace=0.9,\n", + " seed=1,\n", + "):\n", + " np.random.seed(seed=seed)\n", + " num_classes = len(means)\n", + " m = num_classes + len(\n", + " box_multilabels\n", + " ) # number of classes by treating each multilabel as 1 unique label\n", + " n = sum(sizes)\n", + " local_data = []\n", + " labels = []\n", + " test_data = []\n", + " test_labels = []\n", + " for i in range(0, len(means)):\n", + " local_data.append(np.random.multivariate_normal(mean=means[i], cov=covs[i], size=sizes[i]))\n", + " test_data.append(np.random.multivariate_normal(mean=means[i], cov=covs[i], size=sizes[i]))\n", + " test_labels += [[i]] * sizes[i]\n", + " labels += [[i]] * sizes[i]\n", + "\n", + " def make_multi(X, Y, bx1, by1, bx2, by2, label_list):\n", + " ll = np.array([bx1, by1]) # lower-left\n", + " ur = np.array([bx2, by2]) # upper-right\n", + "\n", + " inidx = np.all(np.logical_and(X.tolist() >= ll, X.tolist() <= ur), axis=1)\n", + " for i in range(0, len(Y)):\n", + " if inidx[i]:\n", + " Y[i] = label_list\n", + " return Y\n", + "\n", + " X_train = 
np.vstack(local_data)\n", + " X_test = np.vstack(test_data)\n", + "\n", + " for i in range(0, len(box_multilabels)):\n", + " bx1, by1, bx2, by2 = boxes_coordinates[i]\n", + " multi_label = box_multilabels[i]\n", + " labels = make_multi(X_train, labels, bx1, by1, bx2, by2, multi_label)\n", + " test_labels = make_multi(X_test, test_labels, bx1, by1, bx2, by2, multi_label)\n", + "\n", + " d = {}\n", + " for i in labels:\n", + " if str(i) not in d:\n", + " d[str(i)] = len(d)\n", + " inv_d = {v: k for k, v in d.items()}\n", + " labels_idx = [d[str(i)] for i in labels]\n", + " py = np.bincount(labels_idx) / float(len(labels_idx))\n", + " noise_matrix = generate_noise_matrix_from_trace(\n", + " m,\n", + " trace=avg_trace * m,\n", + " py=py,\n", + " valid_noise_matrix=True,\n", + " seed=seed,\n", + " )\n", + " noisy_labels_idx = generate_noisy_labels(labels_idx, noise_matrix)\n", + " noisy_labels = [eval(inv_d[i]) for i in noisy_labels_idx]\n", + " return {\n", + " \"X_train\": X_train,\n", + " \"true_labels_train\": labels,\n", + " \"X_test\": X_test,\n", + " \"true_labels_test\": test_labels,\n", + " \"labels\": noisy_labels,\n", + " \"dict_unique_label\": d,\n", + " 'labels_idx': noisy_labels_idx,\n", + "\n", + " }\n", + "\n", + "def get_color_array(labels):\n", + " \"\"\"\n", + " This function returns a dictionary mapping multi-labels to unique colors\n", + " \"\"\"\n", + " dcolors ={'[0]': 'aa4400',\n", + " '[0, 2]': '55227f',\n", + " '[0, 1]': '55a100',\n", + " '[1]': '00ff00',\n", + " '[1, 2]': '007f7f',\n", + " '[0, 1, 2]': '386b55',\n", + " '[2]': '0000ff'}\n", + "\n", + " return [\"#\"+dcolors[str(i)] for i in labels]\n", + "\n", + "def plot_data(data, circles, title, alpha=1.0,colors = []):\n", + " plt.figure(figsize=(14, 5))\n", + " done = set()\n", + " for i in range(0,len(data)):\n", + " lab = str(labels[i])\n", + " if lab in done:\n", + " label = \"\"\n", + " else:\n", + " label = lab\n", + " done.add(lab)\n", + " plt.scatter(data[i, 0], data[i, 1], 
c=colors[i], s=30,alpha=0.6, label = label)\n", + " for i in circles:\n", + " plt.plot(\n", + " data[i][0],\n", + " data[i][1],\n", + " \"o\",\n", + " markerfacecolor=\"none\",\n", + " markeredgecolor=\"red\",\n", + " markersize=14,\n", + " markeredgewidth=2.5,\n", + " alpha=alpha\n", + " )\n", + " _ = plt.title(title, fontsize=25)\n", + " plt.legend()" + ] + }, + { + "cell_type": "markdown", + "id": "672bfc2a", + "metadata": {}, + "source": [ + "Some of the labels in our generated dataset purposely contain errors. The examples with label errors are circled in the plot below, which depicts the dataset. This dataset contains 3 classes, and any subset of these may be the given label for a particular example. We say this example has a label error if it is better described by an alternative subset of the classes than the given label." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "dac65d3b-51e8-4682-b829-beab610b56d6", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:39.042688Z", + "iopub.status.busy": "2024-06-25T23:03:39.042372Z", + "iopub.status.idle": "2024-06-25T23:03:41.714424Z", + "shell.execute_reply": "2024-06-25T23:03:41.713863Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "num_class = 3\n", + "dataset = make_multilabel_data()\n", + "labels = dataset['labels']\n", + "true_errors = np.where(np.sum(int2onehot(dataset['true_labels_train'],3)!=int2onehot(dataset['labels'],3),axis=1)>=1)[0]\n", + "plot_data(dataset['X_train'], circles=true_errors, title=f\"True label errors in multi-label dataset with {num_class} classes\", colors = get_color_array(labels),alpha=0.5)" + ] + }, + { + "cell_type": "markdown", + "id": "144ad4c2-49bb-4147-a743-a83ed1656a11", + "metadata": {}, + "source": [ + "## 2. Format data, labels, and model predictions\n", + "\n", + "In multi-label classification, each example in the dataset is labeled as belonging to one **or more** of *K* possible classes (or none of the classes at all). To find label issues, cleanlab requires predicted class probabilities from a trained classifier. \n", + "Here we produce out-of-sample `pred_probs` by employing cross-validation to fit a multi-label **RandomForestClassifier** model via sklearn's [OneVsRestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html) framework. \n", + "Make sure that the columns of your `pred_probs` are properly ordered with respect to the ordering of classes, which for Datalab is: lexicographically sorted by class name.\n", + "`OneVsRestClassifier` offers an easy way to apply any multi-class classifier model from sklearn to multi-label classification tasks. 
We use it here for simplicity, but we advise against this approach in practice because it does not properly model dependencies between classes.\n", + "\n", + "To instead train a state-of-the-art PyTorch neural network for multi-label classification that properly accounts for dependencies between classes, and to produce `pred_probs` on a real image dataset, see our [example](https://github.com/cleanlab/examples) notebook [\"Train a neural network for multi-label classification on the CelebA dataset\"](https://github.com/cleanlab/examples/blob/master/multilabel_classification/pytorch_network_training.ipynb). " + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "b5fa99a9-2583-4cd0-9d40-015f698cdb23", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:41.716796Z", + "iopub.status.busy": "2024-06-25T23:03:41.716356Z", + "iopub.status.idle": "2024-06-25T23:03:43.059849Z", + "shell.execute_reply": "2024-06-25T23:03:43.059304Z" + } + }, + "outputs": [], + "source": [ + "SEED = 0\n", + "random.seed(SEED)\n", + "y_onehot = int2onehot(labels, K=num_class) # labels in a binary format for sklearn OneVsRestClassifier\n", + "single_class_labels = [random.choice(i) for i in labels] # used only for stratifying the cross-validation split \n", + "clf = OneVsRestClassifier(RandomForestClassifier(random_state=SEED))\n", + "pred_probs = np.zeros(shape=(len(labels), num_class))\n", + "kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)\n", + "\n", + "for train_index, test_index in kf.split(X=dataset['X_train'], y=single_class_labels):\n", + " clf_cv = sklearn.base.clone(clf)\n", + " X_train_cv, X_test_cv = dataset['X_train'][train_index], dataset['X_train'][test_index]\n", + " y_train_cv, y_test_cv = y_onehot[train_index], y_onehot[test_index]\n", + " clf_cv.fit(X_train_cv, y_train_cv)\n", + " y_pred_cv = clf_cv.predict_proba(X_test_cv)\n", + " pred_probs[test_index] = y_pred_cv" + ] + }, + { + "cell_type": "markdown", + "id": "41c1efab", +
"metadata": {}, + "source": [ + "`pred_probs` should be 2D array whose rows are length-*K* vectors for **each** example in the dataset, representing the model-estimated probability that this example belongs to each class. Since one example can belong to multiple classes in multi-label classification, these probabilities need not sum to 1. For the best label error detection performance, these `pred_probs` should be out-of-sample (from a copy of the model that never saw this example during training, e.g. produced via cross-validation).\n", + "\n", + "`labels` should be a list of lists, whose *i*-th entry is a list of (integer) class indices that apply to the *i*-th example in the dataset. If your classes are represented as string names, you should map these to integer indices. The label for an example that belongs to none of the classes should just be an empty list `[]`.\n", + "\n", + "Once you have `pred_probs` and `labels` appropriately formatted, you can find/analyze label issues in any multi-label dataset via `Datalab`!\n", + "\n", + "Here's what these look like for the first few examples in our synthetic multi-label dataset: " + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "ac1a60df", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:43.062372Z", + "iopub.status.busy": "2024-06-25T23:03:43.061966Z", + "iopub.status.idle": "2024-06-25T23:03:43.066123Z", + "shell.execute_reply": "2024-06-25T23:03:43.065645Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "labels for first 3 examples in format expected by cleanlab:\n", + "[[0], [0, 2], [0]]\n", + "pred_probs for first 3 examples in format expected by cleanlab:\n", + "[[1. 0. 0. ]\n", + " [0.96 0.09 0.88]\n", + " [1. 
0.01 0.22]]\n" + ] + } + ], + "source": [ + "num_to_display = 3 # increase this to see more examples\n", + "\n", + "print(f\"labels for first {num_to_display} examples in format expected by cleanlab:\")\n", + "print(labels[:num_to_display])\n", + "print(f\"pred_probs for first {num_to_display} examples in format expected by cleanlab:\")\n", + "print(pred_probs[:num_to_display])" + ] + }, + { + "cell_type": "markdown", + "id": "5a973506-c30e-4409-ac65-495537d13730", + "metadata": {}, + "source": [ + "## 3. Use cleanlab to find label issues \n", + "\n", + "Based on the given `labels` and `pred_probs` from a trained model, cleanlab can quickly help us find label errors in our dataset.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "d09115b6-ad44-474f-9c8a-85a459586439", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:43.068054Z", + "iopub.status.busy": "2024-06-25T23:03:43.067736Z", + "iopub.status.idle": "2024-06-25T23:03:45.057364Z", + "shell.execute_reply": "2024-06-25T23:03:45.056802Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding label issues ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Audit complete. 30 issues found in the dataset.\n" + ] + } + ], + "source": [ + "lab = Datalab(\n", + " data={\"labels\": labels},\n", + " label_name=\"labels\",\n", + " task=\"multilabel\",\n", + ")\n", + "\n", + "lab.find_issues(\n", + " pred_probs=pred_probs,\n", + " issue_types={\"label\": {}}\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "439c003e", + "metadata": {}, + "source": [ + " Here we request that the indices of the examples identified with label issues be sorted by cleanlab’s self-confidence score, which is used to measure the quality of individual labels. The returned `issues` are a list of indices corresponding to the examples in your dataset that cleanlab finds most likely to be mislabeled. 
These indices are sorted by the *self-confidence* label quality score, with the lowest quality labels at the start." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "c18dd83b", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:45.059919Z", + "iopub.status.busy": "2024-06-25T23:03:45.059381Z", + "iopub.status.idle": "2024-06-25T23:03:45.067082Z", + "shell.execute_reply": "2024-06-25T23:03:45.066539Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Indices of examples with label issues:\n", + "[275 267 225 72 171 234 165 44 6 29 227 188 102 262 263 35 266 139\n", + " 143 172 53 216 265 176 164 73 75 10 159 107]\n" + ] + } + ], + "source": [ + "label_issues = lab.get_issues(\"label\")\n", + "\n", + "issues = label_issues.query(\"is_label_issue\").sort_values(\"label_score\").index.values\n", + "\n", + "print(f\"Indices of examples with label issues:\\n{issues}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d6af5833", + "metadata": {}, + "source": [ + "Let's look at the samples that cleanlab thinks are most likely to be mislabeled. You can see that cleanlab was able to identify most of `true_errors` in our small dataset (despite not having access to this variable, which you won't have in your own applications)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fffa88f6-84d7-45fe-8214-0e22079a06d1", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:45.069167Z", + "iopub.status.busy": "2024-06-25T23:03:45.068908Z", + "iopub.status.idle": "2024-06-25T23:03:47.645930Z", + "shell.execute_reply": "2024-06-25T23:03:47.645335Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_data(dataset['X_train'], circles=issues, title=f\"Inferred label issues in multi-label dataset with {num_class} classes\", colors = get_color_array(labels), alpha = 1)" + ] + }, + { + "cell_type": "markdown", + "id": "32465521", + "metadata": {}, + "source": [ + "### Label quality scores\n", + "\n", + "The above code identifies which examples have label issues and sorts them by their label quality score. We can also take a look at this label quality score for each example in the dataset, which estimates our confidence that this example has been correctly labeled. These scores range between 0 and 1 with smaller values indicating examples whose label seems more suspect." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "c1198575", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:47.648174Z", + "iopub.status.busy": "2024-06-25T23:03:47.647863Z", + "iopub.status.idle": "2024-06-25T23:03:47.651580Z", + "shell.execute_reply": "2024-06-25T23:03:47.651116Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Label quality scores of the first 10 examples in dataset:\n", + "[1. 0.888 0.8224 0.9632 0.968 0.6512 0.0444 1. 
0.76 0.774 ]\n" + ] + } + ], + "source": [ + "scores = label_issues[\"label_score\"].values\n", + "\n", + "print(f\"Label quality scores of the first 10 examples in dataset:\\n{scores[:10]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d65af827-aeda-4b6b-9ae7-b1f0b84700d6", + "metadata": {}, + "source": [ + "### Data issues beyond mislabeling (outliers, duplicates, drift, ...)\n", + "\n", + "While this tutorial focused on label issues, cleanlab's `Datalab` object can automatically detect many other types of issues in your dataset (outliers, near duplicates, drift, etc.).\n", + "Simply remove the `issue_types` argument from the above call to `Datalab.find_issues()`, and `Datalab` will audit your dataset more comprehensively.\n", + "Refer to our [Datalab quickstart tutorial](./datalab/datalab_quickstart.html) to learn how to interpret the results (the interpretation remains mostly the same across different types of ML tasks)." + ] + }, + { + "cell_type": "markdown", + "id": "d65af827-aeda-4b6b-9ae7-b1f0b84700d5", + "metadata": {}, + "source": [ + "### How to format labels given as a one-hot (multi-hot) binary matrix?\n", + "\n", + "For multi-label classification, cleanlab expects labels to be formatted as a list of lists, where the *i*-th inner list contains the integer indices of the classes that apply to the *i*-th example. Here are some functions you can use to easily convert labels between this format and the binary matrix format commonly used to train multi-label classification models."
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "49161b19-7625-4fb7-add9-607d91a7eca1", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:47.653727Z", + "iopub.status.busy": "2024-06-25T23:03:47.653344Z", + "iopub.status.idle": "2024-06-25T23:03:47.657183Z", + "shell.execute_reply": "2024-06-25T23:03:47.656651Z" + } + }, + "outputs": [], + "source": [ + "labels_binary_format = int2onehot(labels, K=num_class)\n", + "labels_list_format = onehot2int(labels_binary_format)" + ] + }, + { + "cell_type": "markdown", + "id": "a58200c8", + "metadata": {}, + "source": [ + "### Estimate label issues without Datalab \n", + "If you prefer to directly run the same lower-level mathematical functions Datalab uses to detect label issues, you can do so outside of Datalab via the methods in the `cleanlab.multilabel_classification` module, such as [multilabel_classification.filter.find_label_issues](../cleanlab/multilabel_classification/filter.html#cleanlab.multilabel_classification.filter.find_label_issues) and [multilabel_classification.rank.get_label_quality_scores](../cleanlab/multilabel_classification/rank.html#cleanlab.multilabel_classification.rank.get_label_quality_scores).\n", + "\n", + "### Application to Real Data \n", + "\n", + "To see cleanlab applied to a real image tagging dataset, check out our [example](https://github.com/cleanlab/examples) notebook [\"Find Label Errors in Multi-Label Classification Data (CelebA Image Tagging)\"](https://github.com/cleanlab/examples/blob/master/multilabel_classification/image_tagging.ipynb). That example also demonstrates how to use a state-of-the-art PyTorch neural network for multi-label classification with image data."
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d1a2c008", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:47.659155Z", + "iopub.status.busy": "2024-06-25T23:03:47.658851Z", + "iopub.status.idle": "2024-06-25T23:03:47.661968Z", + "shell.execute_reply": "2024-06-25T23:03:47.661427Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "A = set(issues)\n", + "B = set(true_errors)\n", + "jaccard = len(A.intersection(B)) / len(A.union(B))\n", + "if not jaccard > 0.7:\n", + " raise Exception(\"issues does not overlap much with the true errors\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/object_detection.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/object_detection.ipynb new file mode 100644 index 000000000..11367571a --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/object_detection.ipynb @@ -0,0 +1,1395 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d299c1e8", + "metadata": {}, + "source": [ + "# Finding Label Errors in Object Detection Datasets\n", + "\n", + "This 5-minute quickstart tutorial demonstrates how to find potential label errors in object detection datasets. In object detection data, each image is annotated with multiple bounding boxes. Each bounding box surrounds a physical object within an image scene, and is annotated with a given class label. 
\n", + "\n", + "Using such labeled data, we train a model to predict the locations and classes of objects in an image. An example notebook to train the object detection model whose predictions we rely on in this tutorial is available [here](https://github.com/cleanlab/examples/blob/master/object_detection/detectron2_training.ipynb). These predictions can then be passed to cleanlab to identify mislabeled images and to compute a quality score quantifying our confidence in the overall annotations for each image. \n", + "\n", + "After correcting these label issues, **you can train an even better version of your model without changing your training code!**\n", + "\n", + "This tutorial uses a subset of the [COCO (Common Objects in Context)](https://cocodataset.org/#home) dataset, which contains images of everyday scenes, and considers objects from its 5 most popular classes: car, chair, cup, person, traffic light.\n", + "\n", + "**Overview of what we'll do in this tutorial**\n", + "\n", + "- Score images based on their overall label quality (i.e., our confidence that each image is correctly labeled) using `cleanlab.object_detection.rank.get_label_quality_scores`\n", + "- Estimate which images have label issues using `cleanlab.object_detection.filter.find_label_issues`\n", + "- Visually review images + labels using `cleanlab.object_detection.summary.visualize`\n", + "\n", + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have `labels` and `predictions` in the proper format? Just run the code below to find label issues in your object detection dataset.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.object_detection.filter import find_label_issues\n", + "from cleanlab.object_detection.rank import get_label_quality_scores\n", + "\n", + "# To get boolean vector of label issues for all images\n", + "has_label_issue = find_label_issues(labels, predictions)\n", + "\n", + "# To get label quality scores for all images\n", + "label_quality_scores = get_label_quality_scores(labels, predictions)\n", + " \n", + " \n", + "```\n", + "\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "8d552ab9", + "metadata": {}, + "source": [ + "## 1. Install required dependencies and download data\n", + "You can use `pip` to install all packages required for this tutorial as follows\n", + "```ipython\n", + "!pip install matplotlib\n", + "!pip install cleanlab\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "0ba0dc70", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:50.074123Z", + "iopub.status.busy": "2024-06-25T23:03:50.073945Z", + "iopub.status.idle": "2024-06-25T23:03:51.234793Z", + "shell.execute_reply": "2024-06-25T23:03:51.234221Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "dependencies = [\"cleanlab\", \"matplotlib\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "c90449c8", + 
"metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:51.237429Z", + "iopub.status.busy": "2024-06-25T23:03:51.236931Z", + "iopub.status.idle": "2024-06-25T23:03:52.383433Z", + "shell.execute_reply": "2024-06-25T23:03:52.382696Z" + } + }, + "outputs": [], + "source": [ + "%%capture\n", + "\n", + "!wget -nc 'https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/tutorial_obj/predictions.pkl'\n", + "!wget -nc 'https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/tutorial_obj/labels.pkl'\n", + "!wget -nc 'https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/tutorial_obj/example_images.zip' && unzip -q -o example_images.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "df8be4c6", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:52.385963Z", + "iopub.status.busy": "2024-06-25T23:03:52.385754Z", + "iopub.status.idle": "2024-06-25T23:03:52.389001Z", + "shell.execute_reply": "2024-06-25T23:03:52.388572Z" + } + }, + "outputs": [], + "source": [ + "import pickle\n", + "from cleanlab.object_detection.filter import find_label_issues\n", + "from cleanlab.object_detection.rank import (\n", + " _separate_label,\n", + " _separate_prediction,\n", + " get_label_quality_scores,\n", + " issues_from_scores,\n", + ")\n", + "from cleanlab.object_detection.summary import visualize " + ] + }, + { + "cell_type": "markdown", + "id": "2506badc", + "metadata": {}, + "source": [ + "## 2. Format data, labels, and model predictions\n", + "\n", + "We begin by loading `labels` and `predictions` for our dataset, which are the only inputs required to find label issues with cleanlab. Note that the predictions should be **out-of-sample**, which can be obtained for every image in a dataset via K-fold cross-validation. 
\n", + "\n", + "In a separate [example](https://github.com/cleanlab/examples) notebook ([link](https://github.com/cleanlab/examples/blob/master/object_detection/detectron2_training.ipynb)), we trained a Detectron2 object detection model and used it to obtain predictions on a held-out validation dataset whose `labels` we audit here.\n", + "\n", + "**Note:** If you want to find all the mislabeled images across the entire COCO dataset, you can first execute our [other example notebook](https://github.com/cleanlab/examples/blob/master/object_detection/detectron2_training-kfold.ipynb) that uses K-fold cross-validation to produce **out-of-sample** predictions for every image, then use those labels and predictions below." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "2e9ffd6f", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:52.391018Z", + "iopub.status.busy": "2024-06-25T23:03:52.390684Z", + "iopub.status.idle": "2024-06-25T23:03:52.396916Z", + "shell.execute_reply": "2024-06-25T23:03:52.396495Z" + } + }, + "outputs": [], + "source": [ + "IMAGE_PATH = './example_images/' # path to raw image files downloaded above\n", + "predictions = pickle.load(open(\"predictions.pkl\", \"rb\"))\n", + "labels = pickle.load(open(\"labels.pkl\", \"rb\"))" + ] + }, + { + "cell_type": "markdown", + "id": "35d49e5d", + "metadata": {}, + "source": [ + "In object detection datasets, each given label is made up of bounding box coordinates and a class label. A model prediction is also made up of a bounding box and predicted class label, as well as the model confidence (probability estimate) in its prediction. To detect label issues, cleanlab requires the given labels for each image and the corresponding model predictions for that image (but not the image itself).\n", + "\n", + "Here’s what an example looks like in our dataset. 
We visualize the given and predicted labels (in red and blue, respectively) for this image using the `cleanlab.object_detection.summary.visualize` function." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "56705562", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:52.398805Z", + "iopub.status.busy": "2024-06-25T23:03:52.398543Z", + "iopub.status.idle": "2024-06-25T23:03:52.885436Z", + "shell.execute_reply": "2024-06-25T23:03:52.884887Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "image_to_visualize = 8 # change this to view other images\n", + "image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']\n", + "visualize(image_path, label=labels[image_to_visualize], prediction=predictions[image_to_visualize], overlay=False)" + ] + }, + { + "cell_type": "markdown", + "id": "ff36d97f", + "metadata": {}, + "source": [ + "The required format of these `labels` and `predictions` matches what popular object detection frameworks like [MMDetection](https://github.com/open-mmlab/mmdetection) and [Detectron2](https://github.com/facebookresearch/detectron2/) expect. Recall that the 5 possible class labels in our dataset are: car, chair, cup, person, traffic light. These classes are represented as (zero-indexed) integers 0,1,...,4.\n", + "\n", + "`labels` is a list where, for the i-th image in our dataset, `labels[i]` is a dictionary containing: key `labels` -- a list of class labels, one for each bounding box in this image, and key `bboxes` -- a numpy array of the bounding boxes' coordinates. Each bounding box in `labels[i]['bboxes']` is in ``[x1,y1,x2,y2]`` format with respect to the image matrix, where `(x1,y1)` corresponds to the top-left corner of the box and `(x2,y2)` the bottom-right (E.g.
[XYXY in Keras](https://keras.io/api/keras_cv/bounding_box/formats/), [Detectron 2](https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box)).\n", + "\n", + "\n", + "Let's see what `labels[i]` looks like for our previous example image:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "b08144d7", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:52.888185Z", + "iopub.status.busy": "2024-06-25T23:03:52.887837Z", + "iopub.status.idle": "2024-06-25T23:03:52.893217Z", + "shell.execute_reply": "2024-06-25T23:03:52.892779Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'bboxes': array([[201.96, 101.71, 334.78, 334.68]], dtype=float32),\n", + " 'labels': array([3]),\n", + " 'bboxes_ignore': array([], shape=(0, 4), dtype=float32),\n", + " 'masks': [[[290.44,\n", + " 200.04,\n", + " 286.59,\n", + " 213.5,\n", + " 285.63,\n", + " 224.08,\n", + " 290.44,\n", + " 231.77,\n", + " 293.32,\n", + " 235.62,\n", + " 289.48,\n", + " 251.97,\n", + " 282.74,\n", + " 266.39,\n", + " 281.78,\n", + " 271.2,\n", + " 280.82,\n", + " 277.93,\n", + " 279.86,\n", + " 287.55,\n", + " 277.93,\n", + " 299.09,\n", + " 276.97,\n", + " 307.75,\n", + " 276.97,\n", + " 321.21,\n", + " 281.78,\n", + " 326.02,\n", + " 290.44,\n", + " 330.83,\n", + " 286.59,\n", + " 333.71,\n", + " 263.51,\n", + " 334.68,\n", + " 261.59,\n", + " 319.29,\n", + " 257.74,\n", + " 295.25,\n", + " 251.97,\n", + " 290.44,\n", + " 251.97,\n", + " 283.7,\n", + " 250.05,\n", + " 283.7,\n", + " 243.31,\n", + " 303.9,\n", + " 243.31,\n", + " 316.4,\n", + " 243.31,\n", + " 319.29,\n", + " 247.16,\n", + " 323.14,\n", + " 251.01,\n", + " 326.02,\n", + " 249.08,\n", + " 328.91,\n", + " 227.93,\n", + " 327.94,\n", + " 226.0,\n", + " 323.14,\n", + " 226.96,\n", + " 313.52,\n", + " 226.96,\n", + " 303.9,\n", + " 226.0,\n", + " 293.32,\n", + " 216.39,\n", + " 283.7,\n", + " 226.0,\n", + " 236.58,\n", + " 228.89,\n", + " 
226.96,\n", + " 232.73,\n", + " 219.27,\n", + " 239.47,\n", + " 216.39,\n", + " 240.43,\n", + " 209.65,\n", + " 242.35,\n", + " 202.92,\n", + " 240.43,\n", + " 185.61,\n", + " 230.81,\n", + " 198.11,\n", + " 219.27,\n", + " 215.42,\n", + " 218.31,\n", + " 224.08,\n", + " 220.23,\n", + " 229.85,\n", + " 217.35,\n", + " 237.54,\n", + " 213.5,\n", + " 238.5,\n", + " 207.73,\n", + " 239.47,\n", + " 204.84,\n", + " 239.47,\n", + " 201.96,\n", + " 237.54,\n", + " 201.96,\n", + " 228.89,\n", + " 205.81,\n", + " 224.08,\n", + " 206.77,\n", + " 220.23,\n", + " 218.31,\n", + " 191.38,\n", + " 219.27,\n", + " 185.61,\n", + " 223.12,\n", + " 180.8,\n", + " 226.0,\n", + " 175.03,\n", + " 229.85,\n", + " 167.34,\n", + " 231.77,\n", + " 159.64,\n", + " 236.86,\n", + " 153.25,\n", + " 240.46,\n", + " 151.71,\n", + " 253.35,\n", + " 149.13,\n", + " 254.9,\n", + " 147.07,\n", + " 250.26,\n", + " 143.46,\n", + " 247.16,\n", + " 140.88,\n", + " 244.59,\n", + " 124.39,\n", + " 244.59,\n", + " 115.11,\n", + " 246.65,\n", + " 109.44,\n", + " 249.74,\n", + " 104.81,\n", + " 256.44,\n", + " 102.23,\n", + " 262.11,\n", + " 101.71,\n", + " 268.29,\n", + " 101.71,\n", + " 273.96,\n", + " 101.71,\n", + " 277.06,\n", + " 101.71,\n", + " 283.76,\n", + " 108.41,\n", + " 284.79,\n", + " 110.48,\n", + " 287.88,\n", + " 119.24,\n", + " 286.85,\n", + " 122.33,\n", + " 286.85,\n", + " 126.97,\n", + " 286.85,\n", + " 132.64,\n", + " 286.85,\n", + " 136.76,\n", + " 285.82,\n", + " 145.52,\n", + " 284.27,\n", + " 150.16,\n", + " 286.33,\n", + " 151.71,\n", + " 290.97,\n", + " 155.83,\n", + " 293.03,\n", + " 173.35,\n", + " 297.67,\n", + " 180.05,\n", + " 317.25,\n", + " 190.87,\n", + " 319.32,\n", + " 191.9,\n", + " 326.53,\n", + " 192.42,\n", + " 329.62,\n", + " 192.93,\n", + " 332.2,\n", + " 196.03,\n", + " 334.26,\n", + " 201.18,\n", + " 334.78,\n", + " 207.88,\n", + " 329.11,\n", + " 209.94,\n", + " 326.53,\n", + " 205.82,\n", + " 324.47,\n", + " 203.24,\n", + " 323.44,\n", + " 202.21,\n", + " 
320.86,\n", + " 202.21,\n", + " 316.22,\n", + " 203.76,\n", + " 314.68,\n", + " 203.24,\n", + " 313.65,\n", + " 200.67,\n", + " 307.46,\n", + " 199.63,\n", + " 297.67,\n", + " 198.6,\n", + " 294.58,\n", + " 197.06,\n", + " 291.49,\n", + " 197.06,\n", + " 290.97,\n", + " 196.03]]],\n", + " 'seg_map': '000000481413.jpg'}" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "labels[image_to_visualize]" + ] + }, + { + "cell_type": "markdown", + "id": "8f62da67", + "metadata": {}, + "source": [ + "`predictions` is a list containing the model's predictions for each image in our dataset: `predictions[i]` is a list/array of shape `(K,)` holding the predictions for the i-th image. Here `K` is the number of classes in the dataset (same for every image) and `predictions[i][k]` is of shape `(M,5)`, where `M` is the number of bounding boxes predicted to contain objects of class `k` in image i (`M` differs between images). The five columns of `predictions[i][k]` follow the ``[x1,y1,x2,y2,pred_prob]`` format with respect to the image matrix for each bounding box predicted by the model. Here `(x1,y1)` corresponds to the top-left corner of the box and `(x2,y2)` the bottom-right (E.g. [XYXY in Keras](https://keras.io/api/keras_cv/bounding_box/formats/), [Detectron 2](https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box)). The last column, `pred_prob`, is the model's confidence in its predicted label of class `k` for this box.
Since our dataset has `K = 5` classes, we have: `predictions[i].shape = (5,)`.\n", + "\n", + "Let's see what `predictions[i]` looks like for our previous example image:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "3d70bec6", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:52.895298Z", + "iopub.status.busy": "2024-06-25T23:03:52.894877Z", + "iopub.status.idle": "2024-06-25T23:03:52.898598Z", + "shell.execute_reply": "2024-06-25T23:03:52.898169Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([array([], shape=(0, 5), dtype=float32),\n", + " array([], shape=(0, 5), dtype=float32),\n", + " array([], shape=(0, 5), dtype=float32),\n", + " array([[204.42398 , 103.44503 , 337.29968 , 336.21005 , 0.9978472]],\n", + " dtype=float32) ,\n", + " array([], shape=(0, 5), dtype=float32)], dtype=object)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "predictions[image_to_visualize]" + ] + }, + { + "cell_type": "markdown", + "id": "cf95ea28", + "metadata": {}, + "source": [ + "\n", + "Once you have `labels` and `predictions` in the appropriate formats, you can **find label issues with cleanlab for any object detection dataset**!" + ] + }, + { + "cell_type": "markdown", + "id": "3daff923", + "metadata": {}, + "source": [ + "## 3. Use cleanlab to find label issues\n", + "Given `labels` and `predictions` from our trained model, cleanlab can automatically find mislabeled images in the dataset. 
In object detection, we consider an image mislabeled if **any** of its bounding boxes or their class labels are incorrect (including if the image contains any overlooked objects that should have been annotated with a box).\n", + "\n", + "Images may be mislabeled because annotators:\n", + "\n", + "- overlooked an object (forgot to annotate a bounding box around a depicted object)\n", + "- chose the wrong class label for an annotated box in the correct location\n", + "- imperfectly drew the bounding box such that its location is incorrect\n", + "\n", + "\n", + "Cleanlab is expected to flag images that exhibit **any** of these annotation errors as having label issues. More severe annotation errors are expected to receive lower label quality scores (closer to 0). Let's first estimate which images have label issues:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "4caa635d", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:52.900546Z", + "iopub.status.busy": "2024-06-25T23:03:52.900257Z", + "iopub.status.idle": "2024-06-25T23:03:53.842348Z", + "shell.execute_reply": "2024-06-25T23:03:53.841684Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Pruning 0 predictions out of 138 using threshold==0.0.
These predictions are no longer considered as potential candidates for identifying label issues as their similarity with the given labels is no longer considered.\n" + ] + }, + { + "data": { + "text/plain": [ + "array([50, 16, 31, 29, 45])" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issue_idx = find_label_issues(labels, predictions, return_indices_ranked_by_score=True)\n", + "\n", + "num_examples_to_show = 5 # view this many images flagged with the most severe label issues\n", + "label_issue_idx[:num_examples_to_show]" + ] + }, + { + "cell_type": "markdown", + "id": "66d5fae1", + "metadata": {}, + "source": [ + "The above code identifies *which* images have label issues and returns their indices. Because we specified the `return_indices_ranked_by_score` argument, these indices are sorted by the estimated label quality of each image, with the most severe issues first. Below we describe how to directly estimate the label quality score of each image.\n", + "\n", + "**Note:** You can omit the `return_indices_ranked_by_score` argument for `find_label_issues()` to instead return a Boolean mask over the entire dataset (`True` entries in this mask correspond to images with label issues)." + ] + }, + { + "cell_type": "markdown", + "id": "5b501dc9", + "metadata": {}, + "source": [ + "### Get label quality scores\n", + "Cleanlab can also compute scores for each image to estimate our confidence that it has been correctly labeled. These label quality scores range between 0 and 1, with *smaller* values indicating examples whose annotation is *more* likely to be wrong in some way.\n", + "\n", + "Each image in the dataset receives a label quality score. These scores are useful for prioritizing which images to review; if you have limited time, first review the images with the lowest label quality scores."
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a9b4c590", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:53.844530Z", + "iopub.status.busy": "2024-06-25T23:03:53.844300Z", + "iopub.status.idle": "2024-06-25T23:03:54.071152Z", + "shell.execute_reply": "2024-06-25T23:03:54.070546Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Pruning 0 predictions out of 138 using threshold==0.0. These predictions are no longer considered as potential candidates for identifying label issues as their similarity with the given labels is no longer considered.\n" + ] + }, + { + "data": { + "text/plain": [ + "array([0.97489622, 0.70610878, 0.98764951, 0.88899237, 0.99085805])" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "scores = get_label_quality_scores(labels, predictions)\n", + "scores[:num_examples_to_show]" + ] + }, + { + "cell_type": "markdown", + "id": "349521e0", + "metadata": {}, + "source": [ + "We can also use the label quality scores to flag *which* images have label issues based on a threshold. 
Here we convert these per-image scores into an array of indices corresponding to images flagged with label issues, sorted by label quality score, in the same format as returned by `find_label_issues()`." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "ffd9ebcc", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:54.073329Z", + "iopub.status.busy": "2024-06-25T23:03:54.072979Z", + "iopub.status.idle": "2024-06-25T23:03:54.077336Z", + "shell.execute_reply": "2024-06-25T23:03:54.076893Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([50, 16, 31, 29, 45]),\n", + " array([6.95569726e-05, 9.03354841e-05, 8.57510169e-04, 1.58447666e-03,\n", + " 2.39755858e-01]))" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "issue_idx = issues_from_scores(scores, threshold=0.5) # a lower threshold returns fewer (but more confident) label issues\n", + "issue_idx[:num_examples_to_show], scores[issue_idx][:num_examples_to_show]" + ] + }, + { + "cell_type": "markdown", + "id": "5a3b8aa0", + "metadata": {}, + "source": [ + "## 4. Use ObjectLab to visualize label issues\n", + "Finally, we can visualize images with potential label errors via cleanlab's `visualize()` function. To enhance the visualization, you can supply a `class_names` dictionary to include as a legend and turn off `overlay` to see the given and predicted labels side by side."
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "4dd46d67", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:54.079292Z", + "iopub.status.busy": "2024-06-25T23:03:54.078966Z", + "iopub.status.idle": "2024-06-25T23:03:54.525899Z", + "shell.execute_reply": "2024-06-25T23:03:54.525283Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000009483.jpg | idx 50 | label quality score: 6.95569726168054e-05 | is issue: True\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "issue_to_visualize = issue_idx[0] # change this to view other images\n", + "class_names = {\"0\": \"car\", \"1\": \"chair\", \"2\": \"cup\", \"3\":\"person\", \"4\": \"traffic light\"}\n", + "\n", + "label = labels[issue_to_visualize]\n", + "prediction = predictions[issue_to_visualize]\n", + "score = scores[issue_to_visualize]\n", + "image_path = IMAGE_PATH + label['seg_map']\n", + "\n", + "print(image_path, '| idx', issue_to_visualize , '| label quality score:', score, '| is issue: True')\n", + "visualize(image_path, label=label, prediction=prediction, class_names=class_names, overlay=False)" + ] + }, + { + "cell_type": "markdown", + "id": "de0d7205", + "metadata": {}, + "source": [ + "The visualization depicts the given label (original image annotation which cleanlab identified as problematic) in red on the left and the model-predicted label in blue on the right. Each bounding box contains a class-index number in the top corner indicating which object class that bounding box was annotated/predicted to contain.\n", + "\n", + "This image has a **low** label quality score and is marked as an error. On closer inspection we notice the annotator missed the reflection of the person in the mirror that the model identified. 
Additionally, the chairs visible in the reflection were not annotated.\n", + "\n", + "Notice that examples whose predictions and given labels are more similar receive higher quality scores than mismatched ones and are less likely to be flagged as issues; the score is agnostic to the number of boxes in an image.\n", + "\n", + "Better-trained models lead to better label error detection, but you don't need a near-perfect model to identify label issues.\n", + "\n", + "\n", + "### Different kinds of label issues identified by ObjectLab\n", + "Now let's view a few more images in our validation dataset that are flagged as issues and see what inconsistencies between the `given` and `predicted` labels we can spot." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "ceec2394", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:54.528945Z", + "iopub.status.busy": "2024-06-25T23:03:54.528568Z", + "iopub.status.idle": "2024-06-25T23:03:54.861197Z", + "shell.execute_reply": "2024-06-25T23:03:54.860670Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000395701.jpg | idx 16 | label quality score: 9.033548411774308e-05 | is issue: True\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "issue_to_visualize = issue_idx[1]\n", + "label = labels[issue_to_visualize]\n", + "prediction = predictions[issue_to_visualize]\n", + "score = scores[issue_to_visualize]\n", + "\n", + "image_path = IMAGE_PATH + label['seg_map']\n", + "print(image_path, '| idx', issue_to_visualize , '| label quality score:', score, '| is issue: True')\n", + "visualize(image_path, label=label, prediction=prediction, class_names=class_names, overlay=False)" + ] + }, + { + "cell_type": "markdown", + "id": "9b5c87fa", + "metadata": {}, + "source": [ + "Notice the armchair to the left of the TV is missing an annotation." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "94f82b0d", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:54.864160Z", + "iopub.status.busy": "2024-06-25T23:03:54.863692Z", + "iopub.status.idle": "2024-06-25T23:03:55.198943Z", + "shell.execute_reply": "2024-06-25T23:03:55.198346Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000154004.jpg | idx 62 | label quality score: 0.38300759625496356 | is issue: True\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "issue_to_visualize = issue_idx[9]\n", + "label = labels[issue_to_visualize]\n", + "prediction = predictions[issue_to_visualize]\n", + "score = scores[issue_to_visualize]\n", + "\n", + "image_path = IMAGE_PATH + label['seg_map']\n", + "print(image_path, '| idx', issue_to_visualize , '| label quality score:', score, '| is issue: True')\n", + "visualize(image_path, label=label, prediction=prediction, class_names=class_names, overlay=False)" + ] + }, + { + "cell_type": "markdown", + "id": "05610be0", + "metadata": {}, + "source": [ + "Similarly, the woman in a red jacket in the foreground is missing an annotation." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "1ea18c5d", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:55.201751Z", + "iopub.status.busy": "2024-06-25T23:03:55.201207Z", + "iopub.status.idle": "2024-06-25T23:03:55.639825Z", + "shell.execute_reply": "2024-06-25T23:03:55.639225Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000448410.jpg | idx 31 | label quality score: 0.0008575101690203273 | is issue: True\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "issue_to_visualize = issue_idx[2]\n", + "label = labels[issue_to_visualize]\n", + "prediction = predictions[issue_to_visualize]\n", + "score = scores[issue_to_visualize]\n", + "\n", + "image_path = IMAGE_PATH + label['seg_map']\n", + "print(image_path, '| idx', issue_to_visualize , '| label quality score:', score, '| is issue: True')\n", + "visualize(image_path, label=label, prediction=prediction, class_names=class_names, overlay=False)" + ] + }, + { + "cell_type": "markdown", + "id": "05c9229d", + "metadata": {}, + "source": [ + "The people in this image should have had an individual bounding box around each person (the COCO guidelines state that only groups of 10+ objects of the same type can be a \\\"crowd\\\" bounded by a single box). Individuals in the back are missing annotations.\n", + "\n", + "All of these examples received low label quality scores, reflecting their low annotation quality in the original dataset." + ] + }, + { + "cell_type": "markdown", + "id": "03d5a521", + "metadata": {}, + "source": [ + "### Other uses of visualize\n", + "The `visualize()` function can also depict non-issue images, labels or predictions alone, or just the image itself. Let's explore this with a few images in our dataset.\n", + "\n", + "We can save a visualization to file via the `save_path` argument. Note that the label quality score is high for this example, and it is marked as a non-issue. The given and predicted labels closely resemble each other, contributing to the high score."
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "7e770d23", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:55.645471Z", + "iopub.status.busy": "2024-06-25T23:03:55.645026Z", + "iopub.status.idle": "2024-06-25T23:03:56.095854Z", + "shell.execute_reply": "2024-06-25T23:03:56.095255Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000499768.jpg | idx 0 | label quality score: 0.9748962231208227 | is issue: False\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "image_to_visualize = 0\n", + "image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']\n", + "print(image_path, '| idx', image_to_visualize , '| label quality score:', scores[image_to_visualize], '| is issue:', image_to_visualize in issue_idx)\n", + "visualize(image_path, label=labels[image_to_visualize], prediction=predictions[image_to_visualize], class_names=class_names, save_path='./example_image.png')" + ] + }, + { + "cell_type": "markdown", + "id": "6c9464e8", + "metadata": {}, + "source": [ + "For the next example, notice how we are only passing in the given labels to visualize. We can limit visualization to either labels, predictions, or neither." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "57e84a27", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:56.099144Z", + "iopub.status.busy": "2024-06-25T23:03:56.098766Z", + "iopub.status.idle": "2024-06-25T23:03:56.313247Z", + "shell.execute_reply": "2024-06-25T23:03:56.312626Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000521141.jpg | idx 3 | label quality score: 0.8889923658893665 | is issue: False\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "image_to_visualize = 3\n", + "image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']\n", + "print(image_path, '| idx', image_to_visualize , '| label quality score:', scores[image_to_visualize], '| is issue:', image_to_visualize in issue_idx)\n", + "visualize(image_path, label=labels[image_to_visualize], class_names=class_names)" + ] + }, + { + "cell_type": "markdown", + "id": "d8744ab9", + "metadata": {}, + "source": [ + "For completeness, let's just look at an image alone." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "0302818a", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:56.315501Z", + "iopub.status.busy": "2024-06-25T23:03:56.315140Z", + "iopub.status.idle": "2024-06-25T23:03:56.514073Z", + "shell.execute_reply": "2024-06-25T23:03:56.513486Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000143931.jpg | idx 2 | label quality score: 0.9876495074395956 | is issue: False\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "image_to_visualize = 2\n", + "image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']\n", + "print(image_path, '| idx', image_to_visualize , '| label quality score:', scores[image_to_visualize], '| is issue:', image_to_visualize in issue_idx)\n", + "visualize(image_path)" + ] + }, + { + "cell_type": "markdown", + "id": "46d6282a-4601-4cc3-b8a8-187ea6d5f8bc", + "metadata": {}, + "source": [ + "## Exploratory data analysis\n", + "\n", + "This bonus section considers techniques to uncover annotation irregularities through exploratory data analysis. Specifically, we consider anomalies in object sizes, detect images with unusual object counts, and examine the distribution of class labels.\n", + "\n", + "Let's first consider the number of objects per image, and inspect the images with the largest values (which might reveal something off in our dataset):" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "5cacec81-2adf-46a8-82c5-7ec0185d4356", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:56.516373Z", + "iopub.status.busy": "2024-06-25T23:03:56.515951Z", + "iopub.status.idle": "2024-06-25T23:03:56.519025Z", + "shell.execute_reply": "2024-06-25T23:03:56.518442Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "from cleanlab.internal.object_detection_utils import calculate_bounding_box_areas\n", + "from cleanlab.object_detection.summary import (\n", + " bounding_box_size_distribution,\n", + " class_label_distribution,\n", + " object_counts_per_image,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "3335b8a3-d0b4-415a-a97d-c203088a124e", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:56.520943Z", + "iopub.status.busy": "2024-06-25T23:03:56.520640Z", + "iopub.status.idle": "2024-06-25T23:03:57.489599Z", + "shell.execute_reply": 
"2024-06-25T23:03:57.488999Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000430073.jpg | idx 100\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000183709.jpg | idx 102\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAb4AAAGVCAYAAACB5pQcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8WgzjOAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOz9WbNk15XnB/7W3vscd79TzBFAYB5IAgSnHEimMpM5qaoya1KpNHVJ3damlpke+qE/gB5a1u/dL2ozWZv1S/dDST2qWqqsqixlDVnJHIpkkkwSIEjMEYgAAog57uDTOXuv1Q9rH78XIEhekkEyM+GLBALh16/78XOO77XXWv9BzIx1rGMd61jHOj4sEX7WB7COdaxjHetYx08z1olvHetYxzrW8aGKdeJbxzrWsY51fKhinfjWsY51rGMdH6pYJ751rGMd61jHhyrWiW8d61jHOtbxoYp14lvHOtaxjnV8qGKd+NaxjnWsYx0fqlgnvnWsYx3rWMeHKtJxn7h/92V7/dV3mOVbPP2Rn+fs2ceQ+jMR+b6/u46fXKgqIoKZsVwu+doL32LZjhnHyGeeeorJeISIq/Osr9NPN1wVafjH/+hz4dVLb3Dp5rvMZ3NyKRQtqBmmiqqSVSmlEEKEADl35NyxORqxMR6xNKMr0Ggh0ZGlkLNwcLDPZKNla2sbTJCcSdogIWGxB1Oo31qzgmFoUZIYIQgQQJSimRAEpaFrT6OTC6QT58kISTOtCZmCWMAULIKhxKJkE4gRMFQVRYkqiAXUFiznd5CDW+j+DTaCYZJR7cgEivjn7RYHLBdTmnaDVhKNJYwx/+V//r9b38DruC9x7MQ33jzDo0/t8PY7sLlxAlNFwrpg/FnHkMzM6kKjBWoiVNOf8dGtA8AwrBgSIvf29rh24zqL5YKcM6UUTHwDY6qYGWaGiCBBUFNUjRCCJyfN9H3HeLJFmS4wyVgIqApmYKaYKahgpihGRDEUQfEmjz8XhvvniGyhmqdGw++fIWerYfWphhAs+utg/r5HXsfUVu9hJqgZoT6G1cfVAEWkYMHqZwcKWDakRNAAwcAysPzJX6h1fGji2ImPuMn2icgzJz4NNqk7xHX8rGLQWF0tkkOyUwUzinkSXBd5P+vwSsrwBHb57beYdx1N09AtO0TkPRWf1T8lBGKMWPHE4tcYiJHECDOhqBHbBIo/T0EIoP7fqKKS8SzWI2FIPPXewVbdAE9aiqe1ITmBZzv/HalJUxH/3yrRCmr19RBPjkEQAgH1RCqCmBARCp4kzQpIplhBJYKqJz0NBE2gERWlWHlPbl7HOn7cOHbii9IAYNIitBjFb/J1/IWIIfGJ4LtkVW9nDY+t42cXBmrGnb1dbty9g6aIafEqb9isDE81Q82IQBhKLwCEEAIpRcYxMZ0uaFNCpPdEpQYqBCIQwRQFYvDOow3VGIoahwkKA7FV52BIfWZACJgIEgJCWH3fFbAAogWlQE2EVp9jppSSMTEC/tmDGcEKphnVHkMpMrxfJBAxM09y9WgNUAl+5BJ/opdoHR+uOHavUgxEQCSBsU55f2GitjrxBbMvBVQRgRDikeu0vmI/jRhalUddTwxf/C9fvULBCCGyXC6PbFYCqkd+78hs0JOiV/QpRtpgNPS01nNyoyFah1dP9XlEn73Z0DZV1HqvrobnfYAjy+EjXtEVwVvmMlR35muA+bOLFYooJgXB26NmUhMkBDGEAhREvLITjBAgRJAYUBEyQ5UKmBFj8FUpGERvr3ryW49V1nH/4vit
ztraFJLflPVGXAMmftYh+KqhZBVUWhKKitJD3Zqvq76fRRy2FJXZdMq7d26ToxAWHVZ6zASIaCk+Q6szNREhBCFGwUpBxEgh0MQGMSUvMm3TIsHI6hWcV0oZtCcQKRSflwUHmHgy9FanSMCsUMybjjHUKs0gECgoaoaYETR58i0dIKA9IAQJBDO8fCy1bQnFOlSVEMXnzWa1AZEpYiyXHbEm+RxALIH65/cuvaK6JBuMbEh34u+1jnXcpzh+4vseMcyY1vGzCEUIvrMHZrMFKTWoQoqR6XSfU5Mt1uPYn14c/T6s5q5FeePNywQRRI2+62qiey/yc5ijBSCmCEEIMSAGIfjr9bkwW/acPLED0hHEgSKlZE+cQVFqFSiCDchLBIY5nVTgkxlIOFKd1sHe0B6vHdjFbI52ClaIufPn68ifbgqidNqjGCEFMAh44hMCUXzGGIKR85KIJ0KpdZzZkTY9daaopQ4Zk//9p3cJ1/EhiGMnPqvfgtXNafXPdfzMwsEJ3n4qqhwcHBBjpFgGicxmM5CKlkOG/9dYX7ufRBxF2YKjNff29rlx964/oRRUzMEgtsJB1jGC1JGCtxdLKU53KD3jJgKZg9mc0cYWxEAp2X+vJjWCX2DFE5vZMB8LXvkfbWgOX2GDo3Ngb1OCDm1OiTRtC01LFCVpwIoAI4o5MMfIlGVhnBJpPPJjUU/4ghBNMY0QzFNYnuOInOEzS509evJXhmMN9ewYyuEcdB3r+HHj+InPBr4Y+E0pqxt3HT+bEBxSjgiqSp87wLlgWpT5vKOokuJRyLq878913O8Ykt7w3Xjz6lssrVAw+r6j17Jq4ZkWSimA1ukYTl0w6EqGoEiBtkksuzm9ZmJNcEWNkq0mL8AgF0hFavO7Hk+tKD3B2Orri0EphpgiUQa2gVMoJPhzg5CaEZoaVHuIDRKamvACaPLj7gJNjKSmcURxMUSUiH8WLT5vFimr8xQERL2lOtyeSm15SsIkYmKYqCNS17GO+xTHTnxa4ctB/IsdZN1++AsR5guWz4QCat7y0mI+31HD4pFKpK5662Xk/oa9fwZlUEphb2+Pm7dukdpEzh3FHLBR1OH8Pgdz4IkjF42A1TlfA3ROJs+ZXo12NGZ/b4/2xAaNgEjEinP3DBAJiCS0KLW+8nldbTseHqbW6s5biwNv0PTwDsniBIYQIjEkLNTKTCMSQNSrtQAEhRgiIUbUxNcJOUJ1sKGlGWrL1Y/BAa0KUmeBZpXjFzGJKBnEZ5TrWMf9imMnvqvX73L7xjUuXjjL6VOnadJoPTv6WYbD6CrMzuqCd7hL97lQwPf9ayj4TysGJR3wudrrly/TmwNN+q4nl/Ie9OaAthQ8WWhUmhSJKdKVQu4KTSV4x9AwnS6hKKm2KE0qTNK8wvKNqW98AEyiV1RkjMKqfVgrRKtDjILfSmqKVBAMQQhixGBIwOeBw+zPAlKO0CLMCfoSIlFBUbRSIYa2pWRP8qUUVCpxIkayFYIpQcWJ7maU0mNljAVIQRBrf6rXcR1/tePYie/Lf/oV2rRktn+DZ5/5BDs7Z2nTsKAe3e2us+FPPw7BFGqCophoBQe8/5nr6/OTjAHQYsDNvXtcuXODmBJ97tHiLc5SCrkqsjiNLhEFApkgSmoDnWb6LtOmEYlMEKcCRGBn5xSTNjBbzClFvYVqRpRIWDXAvToz1ap+MhDMD3l6A38wiPk80PxnAVA1JIiTy/slMii3UOpztaIwIeSMaY8Vpe8c1SnqUzkrxRutRRFTdOCWmt+J6oRBDCg40hRRUoiIeBWrGhDLP/VruY6/unHsxBe6O2yMIou9Kdff2uDkszsYrbdPgMMldb2w/nTjqLKGc7DUvM2kVurCso6fagjkXLh05U2yZzT6kuuIINCbOXgEw8wTlZoRIjQpQvTkOGkbsgElE4PRtIm8O+eXf+PzvPj8V4giWAGLsOx7RjFSSzMEq1Wd8/kCuiKFGz5RXNV6cpj0vAqt8z4z
cin0+/soCRVz4QoNpCj0xShqJCvoYs6+BfqZV5zRvGHqlaehWvz9tKMphWRSN2oDaE789cUIVHk2y5QSar24lixbx/2L4yu3xHtMD5TTp8/zxJNPEOL7E9wwMV/HTzeOQOGDJz2tMlRmur4iP4MwNXbv7XL79m1im+g6r/bEqNX44SYxiCHmXD1SIEZXS4kh0oaG3dk+owhaMlYKD5w9z/nTZ3mh7+t87lB705A6QxxaqVBLPU86tRWuwyxNaq1VFVQGjl+snYMQIDWRJBNMWooYWCZoIARozGeVoTioKrYjUvKWreRcgTSRAGTtKZbJ2WDhRPeqxebJtnYp1MwRqKZgPSITPAuXDz7Z61jHjxDHTnxPPvsYt24v+NQnf4l3rt/hwYsbTCbNCiUmRFgjPH/KMQgaVyRgiJScsWhkgiPzBrrWSuEFHMzwszzuv4pRW5xV3/LNq1dQK6BG6TsHn6gnChnUWcy1MgUlRKNtExEhSAsi9Ms5o5EQFUJMYMbZMw8A4sjJECrpXYkS/RqLrap8EfOqUgALBAZFFz9eYajslCDOl8N8EmhaMB0hEgipqd2dqsYpPreLwd0cgikWEhKFGNNKO1ZMXWgaiKKVtxfoa7IFb79afVwsIVboS0cgkCT5RNIKtt7CreM+xrET392bt3jo4U9StMEQsvb1J2t4/M8sKgfKUXFCStHnNlEoBnnFpfKqYB0/2RgUUXb39nj35k2kDZS+R4aKhlhRlkoZrktwKkoMDUaDRrDS0yQwMm1MdLOyqoTSaEIuHWqFvlaKnuA8kQXE0ZY4PsXEBpwomGBSsZ66yjyu7BNwlKllb5dKJEhD0fo6q0TpkmKCJziyE+WLQdMmRCIKDnIxwUolyKsT27W4Vqd/bm+rGkLuFdFB1towIlETIfjxmazBLeu4f3HsxDdXY1mMq1ff5tyZk0wmIw7rh6Pk2PUC+1ON4XQPKE6xKvzrDg2EsFbX+QmHozT9vwvGG29dIYvSxEQ/7xAduHae/hzRKJV8DiEkzAIaGiQWsB6zTAgVoCQFFUHxqr7vFpTSI0lcqcWgZIWYoKgzXMTFo/34HHlZhFpZ4cm4zoUlClJpBVLTjqknyyjpsJMjwz3mCjSIEZOQK2+wQFVyca0VCRyhISg5+yZNta8Vb23V1jkkOrQ6nc7gL1CqQcSazrCO+xfHTnzNqXO8fPl1Nhlx463XOX/ub2CjQQJpIEiv53w/mxgWlEOjUVVzvUY+gGO2jvseg7jD/sEB715/l5AS0/mcUEnq7sLg3nSEQUBcvUVNhBhILeRuwUYz4e6dGakJtGNXRxGJWIik1JBiXCFH3Uw2Y1Yc2iRDUlOnOhy59CvXhVWr1edqYtEfryrUQxVmami/xCRBTdaIzwqjuT6nAf2yp+sz0vfOPxQwy/RFQb0yLCWTyxIrGVOlqDo3UAqqtW1qoH3nq0io74XPRkXXlJx13L84duK78dq7nDxxgumdd9gabWFl4I/BSgZiHT/1sGFOJAPAoT4OdH1fVUH8Mg9Q+3XcvzjqxqBqvPXmm+45Z5VPp1rRkwIW3JanDl5DEJq2ZTlfsDXZglxIkljMF0wPZpw8fQIqJaFpG4zAZDRmMHOt9SJiiktkFgh2RP3ECMEToItPSDWorQkO19kUUp2hhZr06mubUfoeLTNPciaYFK/UdIk7LwSKOlBldjBz2oUoooWiBtYQQhXF1iXBsmMCqkaoWvY2sDkSVVZgF/dtEDJBI6LNz+gKr+OvYhw78V08cZJzH3mUxe0t3nntHTKB1oZdmEOSzdagiZ9a1Pwlwkptw1CyFqIlr/tiYJF7NmX83l89kvzWLdAfPUolq4tBMeP63Ttcvn0dC5BzRjJQICNOVBc3Bo4SgZ42NbSN0C8hRoPSEZNxZ2+OEpEkiDVEesDosyu9qCoSG0+u6sjdoplCwCR4pWaCSaAhOXFBvIUJVF7f4d9NFCMTBGJVfem1R5oAzQaSRn7MDv3EgoCNVpy8
BhjVhDmck6GVaeD2RQqhF0R7OgEsOzhHQYLP9QauoGqhqXnONKKkNf90Hfc1jp34kvS8del1HjhxmjMnTzNKLeu25s865AgkvM6QVAmlkAWKBNTKBya39dzvx48wVNA1+V2++iadens5l+zIy0GerGidZYHESGxAktMbQjQ0L9kYCXfu7rKYKpPNiVc+6m3raIeXOuclIRhWCmiBHkJ9TrSqfnL0u1n/OozURMCKUeoGKJiBFK/M1MEvQdw73SRhwed8IuaPhQqeKVUOzaS2JnGi+iC2rU5KN1fgJqrLmWkpLrBQxbhNlVCrQBkcKqrQtcVK1wjrGd867l8cO/E984nPceXaNZ597ln6+ZKmHciy6/hZxvunqqESgQGKllpdrON+xnsqZkBLYe/ggDt37hCCsFwuUTMyRn+ktVmtz12BJQW3HqrzL5FCKcZ0lknjzfpUB40NHE3M7X6WiwWQV+1KJbp5bM2MpYJtnManVXHF54uudKer9mysBHvTKnOmA3Eg1Q5OIEhwnIlEotRpvqnPJtWP0yRgQZFQXRcKnpwr0AULSAVamWo1iT/0IMTEk3ExYmjdwsmMoFTi/LpFv477F8dOfFtnH+K5kw8RGmE02qSY+pfgfUr0x50hrauN+xh1kYghMni6GW5rE8M68d3PeP/9rWb0Wvj2q6+gtcXpAMXiyUSrRZAaot7SjAIjSS5CHSIiQtMIu7u7mDSYKZvjhoDiSJghiQkhRkzyiqguIVCic+NigBKdwmLFVrw+D6/Shr8OIvOm/h5GqOLSKy8HECNQSII/Z4ChipCLEqUKo1PFsGtWD1arRrdPp1SOI1qr1ErjWxn1mnMPgzg6tDf1JIsDZQKHogzrWMf9iGMnvsvXX2Z27YCnPvZxRpubjCytPLIGwurReP8c6YP+vk5+P35IRdSaQYwR04CYL6aHgsleF/5VxrW8fwP2k3r9w78DFbZy72Cfm3duo01Aa7t5qM580YYVfUHc8XwSRhQSffFZ2Hw2Z7a/Twg7nDq5TSNLJm1i3hWnF5gRYiDERDE3fc2mKAHNnljDcM3r8QoOYsG8pZjM9TltkAkrVcdTBCSiElHNqCpFQHNGFzM0+7yw1KQYRCgYot66LeCmt35mvEVbCg2BIEIeXBiWC/KyX5nwVmgOFtweSVEKSraChDEi3irF8krabB3ruB9x7MRXZrA73+Wdd98lticwMg+dO02fO3a2toBICC6PO8yaVJWU0koM92isk979CIOqxiFBfYdMpGiPlcBOM3I9R46S2P8KZ7+fUqyoZhIopef11y8RUqSYUkpZoTnL0GZUIUWhdIXQRELSCtgwsCUxKsuuJ7RjcjaWBwt+7hc+w7Ubl1h09xhobEU7AkbJPbl0WHRLInWUC0FiBZTkWis6j7NoLdQoPodzlrx7CQ2ejlXFpQTDLDJbBHLIlOYuEvcpKr6pqmhMs3ovySCV52T1LJWqbz47FC1EE3KAnDt0OYccsBLZHIEjgAARumD00WvHpkCMlZcqBQtrkep13L/4IdwZ/pCnn32at2+/ydtv3aQJwosx8tGPfoytjz6DFfGZQJ1FDBXHnTt32NzcJKVECOEDq8N1/PAxnMIB8ABSYeNOX6e2j95bqVSu5Xflv8N22F/G+GlSNLxT4bNtd1ff4/ad21gTV/f90c1fMa+wQoyEpiFGaBovxIoWmiahJdFboRk19Nn4wq9+gYtnT3PpyksrBwfME6epUTQTU6JTrceihOqd5152MEhhlqIECxX9Wy2JKjndyNW6ymd2fuyRZZfZ35+ztBk5QpDDOZyYYVrARoCSS18l0YQQxM2PVAlmriKDQTEWZIpkohixy8goMEmbxBgJwWkNWHWTWN2g3qk47FqsYx33J46d+DZCz85kxOVr13jw7IRuuqQZj9k5scnBYsrN63d5+OHHGI/a1fKZmoZlv+Sbf/oNfvGzv8jW5ta6zXkfY1jvHWxgVe0++xxHoYjb3xx5ljfnBnjf
6vfeD5H5ix7DZ3l//OQ/w7AIa00wly5fhhhQU7qucwBLPRQRIeI/U60VGU5nUOtJbWTZL/ApWaFtG6b7CzZGY2+ZFq2ctkFxxVuHXd8RxXVZwZMb5rNDwarxa63KtPI7RavNT5Uwq5uiIN6CVVW0eKIKpSP0+8Qy+OpVcYRiK3++oO6OLrlb8fIgIYMzgzmRPtWbdCSFvikQhRhg1G6QomtwltqCFVVXniEQohDiYF3kHMh1rON+xbET3yOPPsDF8w/wkSef4fqNG8z2O5rxhOe/8zLPfepTbGxv8cf/9k94/JHHeOrxJxARulIoRfnoxz7C5uZGJarCqrWyjvsWpupAB8skC4gGLLwfFFAzIr6zH3bRQwr5y8eVOkwyrO6rn8K7mkt33d3d5cbtW9BGuvkCKbYirBdzuoCYCzDnooRgjOIYTFHrwWB/f48TWxuut5mhDa7MImJYVERtVd3HEGhSYlGlvUpxuTJTrZQA80qyOKgmSnJqgAAoItWMttoQ6YD+NHdAd73rwqQVmjMTTIS+6Aps4tQDf42ogCjFhlYpiJSqTuMnSSwSygjwSrFPhT4ZjQijEJHB269+wCRCMJdwQwJW9URXuqDrWMd9imMnvuv37nJxntk5MeH02Yf40y//C371V36NG7e+wfzr32Bnc5ubt64zPThg794un/rUp7l+8wZtm9jamDiqbUCF/aVbYP/ixdHunqP3HGEXogsVW91FH/Xjq+vRCljgPCpd5Q/jvZX4cYnuA9CjvuSRzc2RxFSP01/z/a91lB5wCJL4rgbXex6w9/3mgBQcjuH+gV3e30oVga7refWN17AUyKWAGcGcwF2q952IEMX1NbXvMYvE2FDynPG4Ye/ggOVyQTq5hanSNmNmZVo/phKSoMtCtXwgCqRUKQ56iIg0QIuiJbtTh0RXT6nuHMOJ8fbmIA7thD6zw/ZtCIHqmlf3EUaMATEXj/Z6MmFEkoJZcV+/ygtE2lUlOaQsxaWzGzViSIgoaejDWoX/VNAPpbY6q1OE38N+MGvm1DruZxw78f2Nv/6/5mC2x6uvvsl8uUBD4ctffp5bN26xuZHYnc/R/i735rtMD/a4cfsOe/sHTOe7fO6zP89Hn3yaNjW+E6w38brl+ePGIWIzSKJpAgWlDQELghV14rQqxJasPV1xR4cYlCABF8iIdaH+8ecohw3Io3Ma/+PQ863+fJUD7chvezvQ1Fb5cUgivigeeeXaxhteY2jb/SRmfp7b3UIoF2V/PufajXdpxy3W+YJdyBRRykqui2r349knBLzFGYVcevZv7zPZmtDnTJAGA3IZPpDQ9dmrMjUInrTa2ND3vcuRGahU2kGIQKxtS1xEOlagifhmSImebMTTW4AqZh1RDVgwpKbtUEFpQcRpCauoOqGSXYBa3Cg2BXd+0HqN6qV2srtBTwYNtJooFDoRGgYb64Ec6FVeTGA5V3Hq6NuIv8qQ5HX81OPYie8rf/4CZ041TGe7PHLxI5zYucD/93f/KRcfPkFqhVEynn7qM5w6cZ4//MN/w2QESTZ4/NEH+OpX/oxT2yd44MIDxNjUmcBP8mN9SMMgqJCrKHKyWCH07nTtAAjj+e98B2ki58+c5eGzp3Cx5O/N9/vhNieHs0Q5mqXq4iWrrbtVu5uhOqgk5RW5GgQnPaPVYSIkJ4ZroWQl50zXd6gWkginTp7iMIne51BXGtEqD/fy668Sm0QuStf19H3vszADVPxcB0OkEGJDP1vQjsaYKU2TuHnjNiG2iEY2RlssZ3cpqozGo+o+DiUXpLYESwHVQGxaig2tage25OJwfwmuinJ0+2EcaWlqWI0brEqWBYkMdAnUH5MKjHr/LBhsVSEGqbzA+mXWep1X256hytdQc1bw+acBEupc02eIpTj9QdRWnL6ihRCd4G/rBWMd9zmOb0t071W2zz/Hkx//Bf70K1/nqcefohm3bGxtkrMxm+2C3uaBC4/z1FNP8plPf5p/+S/+DVfe
vISS+erXvsLjjz/Bxz/+KZqh8lsDXX7seO/582SnUlCUJA1BVhpVIL6gvPXOO4x2Nrm3u8f5U59mnD5YAPh7cTPf2w59/wG9r+qzIa0N/67Wo+avszpyBadmUBc+I1uhaEYLTKdTlosF837JbLngYD4nxkTTJFIUzp44wamTp1atzvt+Tw09XAns7u9y8+4tYtswP5j78QJaBmBJnZslJTWBGEKtxHradsR8NmO56Alhg2effo6bN6/4Ai9Uq6EBQKMrCbAQIkXdvX3edRjBk3AxzylU53UcRapmVTEGrKKpLQ89xZpMhmtlRyyMhuQnnvSGZLm6noeX2Y9LBB3M1MHndfX61SdVKk2pKi713WtL1Cr9AQaOob93jNGd6gWCBH+9dazjPsWxE9/nPv8rdFPl1u6MG7t3kSsv8bGPP8S3nv86n/vcvwNygXv3bvPK5dd4+iPPcP3mLl/4td/ArOPOvRvcvXeXa+9c5bFHH+fkidN1vrBOevcrvBVo0Bt9W2clFr0SkAjmLTg9Urn0DEr5cKTv+IFxtH34/VqJhjsDePv18PVUCxKkLpD+P1WhL4VF1zOdz1h2PdODBVmNg9mMWdfT5eJSXH3H5qRhsjnBotFJoZst2JhsMG6b71oY7/e9pfVEmRpXrr6FmouBD+czJaciiBU3vlMHtTSNiwrEGEgpkMuSvb27tO2YrA2b403e7Ya2HvS5c+J2yfRdT0wJ1R6jIBFmyz1eu/QyzTh54lMjF11pcIYQKmikXs/a9lYDQqlZebAYqlsRGbL1oO5i9doFry6HqqueV58rqs8J8V9VXC/UT1PwFqZIvfek1vWZQeNlGCTq0BUIsmqMSyXIh9q1SBKIawL7Ou5jHDvxffPSghe/+gc8/tQT3Lh5g9neu2ycjpw+c4rnv/UCamNEZrxz7R1eef0SFy88xhOPPMzpnTEPP/wIRGFra4d3r13j5M6p+qVYJ70fNb6X8LSWvCI0q2Ws6EqzM0rk1u3bUOHsDEID5jD1HwQ6Oqz4jlR6lch8CFwKdXGLbpCqhflizmLZATCdTZnNZiyXSxZZWZRCwVzqKkawRLHA7XsHFKOCK7QCRSBSsKLVYkeZ7u0Tt7cqoEfvaxfhvZqcQlZjfjDjyptXGG2PmM4XGC5CvZpVDm1d/NhDiEjwll0IwsHBPlqUokvOnr/oGwXLqGWCjNxxvJ7PFJJTFSioFULT8OWvf6m2VBWqALkObcGhUhycF4Lzag2vQAfoilLcFUFqK7m2HT0B+u+WKlY9/Hv4TIfn1WeLg50RVONZpVZqwVuneogZdj/Asvp9E89/xcRp9PW1nf5hxJhWfOC1ndY67mccO/H92+e/xMXTE26/8xZntjZ59+YlvvBbf5NXXr7J7vQe1992UruFwkG3xyuXXuXtK5d44OwJfuu3fotuLuzuznn2Y48ycJzgJ9CS+hDG0YVeUaT39lOpO2uR4IuHwKgd0S87bAy67Lnx7g0ee+KxY47GpO74h/lMIKuiua8JQJnNl+wfHLDIPbP5gmXuyarkAjE19Hnp/Kyg7iqepC6UbtiaLdOXwFIzKUQsL0GNHAKdKDY7IC96NABdxoIwW8yrU4LPKn8Sa6QgoIU3r1whNQ1ajJJLpRSo2wWV4ujH4DJcEgMptXRdXok/NzHSTjYoXeGRCxcoOZPzkkiPWSEEWf2Tc4bGUMnkTklty8uvv8TF82cQAkpP0YBlW1V/jvZ0ncxgIKViOc1wgmf16quu7FL5fiIBrDop1Ba1Ye6/V8uz9yCJTShSnGOnsZrPGpjbIwmBoMXblF4f+jGbYpYwgyyFwRzXqrqTWqCRRBR3nlAJ9GaU9TKxjvsYx058jz0SyHdnZOs4e2KbWzcji32YzTtOntjg7Tf3ufjow1ho2J/ukfOMzY0tcl5wcDDl6ac+QtM2pBhYk1HvXwxJ71DBf0ABOvy8qMtUSQgc7N3j3u3bNMHnTSKwt7tPzkYbf/A1GRCTZnD91k1u
3ttl1vVMlzOH7GsFOiBkhJSiqy+KIgkymU47tBgpJaIFCGGlwxiHGqNWR207gpxYdH1drAN9zqCFGN0NQAWacXPftRy/S5tTlcV8waV33iKME9185lQQnJPnLU9HSgZp0JCITYIYiKKIFZrgYKOYEs14hySCaCYSgIZCZDRuiTFRyoIQjWVZsrTI6c0TvP3GFU5uJEZt3eAE58BZ1prTHOlJNpevUyNKIVANcSWtZqs2zF5r29OrvGphNciRDZ9PDzdWK/qDGUalM5g6Ili0Pq4gBaRftUm90st1wptwA11AhdY8GS/U6Ai02hDS8N5OA5G12Po67mMcO/Ed3FwS2efUuVM88sgjLKdTrr9zHQIs5nMee+RJ0thRd2XZ04TMmVM7fO6zX2Bz8xQpppVMUwx1V7mO+xoD2MAwQgje8hRvv5WS2djc4NGHH+I7r7yMEclZefKpJ33n/0O8h6py8/Yd9pcdC1VyqlWWeGutFONgNkW1MGoaX+wj5H5J1/Wk2DKfLShqbG5uEarcl1YSvput9iATmtEY0ohF58/fmIxJE4HSsz+fsnVixxfyn9A5HWZaiPDG5UsUU3LfO1+vDHSDKksmnkKaJtIVR2/mWrl2yyU72y1aXMEltGN3WIhCFxq2zz1IiRN0ugBxq6KSC6FpEYlcv34X08LJnS1SEpZ9IecewthFqgcUJYKoUCrgZWhB+2eo1Vud/3mudJqFV12GxCEpWm1lHr05ZLXJMnF9ThkeG9qhqxFGAamCFea/61XmAGg6CnpyIe8+4nqdVkjqyi1qQ5W6nvGt4/7FsUuvg4N3eOqpj3Pj9px7+7s89OCD/MKnP8/NW/e4dvUtHnvoEYoq+/f2aaVlc7TFyRPnmUxO8MIL3+L2rTsVyr5Oej+JWKHhgrh/mnjLLdQqUIIL/sbo5z4lt7+JSYjph7seZkYphcWi487tu8zmM58NaUFzpvRLuvmUJgil79nb22P/YJ/96QEGdH1H3/eUPns7b4BCAqaCFiPnzMF0n/l8xrKb10XeKEHoMe7u7TIaj2iaFgpovn8L41G7nCGm0wPeeecd2pAoXU8pZdVeHO5nkUCI0LTi1Y8YIShdl71iqdJmhoB5xTMaTTj74GPs55a9JcyWxXl6uUNEyL23/BbzXc49eJI4SuR86ACRuw6rTueSHMgi1epo4Pa5RudACh9QlqEmvfdpdQqIeDvbBrDKIfb2SIeB6vSOt0WV+vzhfAwE9HIE0VvTbVFMfa5pUiiW0dIjuSNoB7ZENaCaVlzN9YxvHfczjp34miZw9eoNDu7t87WvfIllp3z7lcs8eOEiITRcunKJFCZ85CPPEUPg7t17vPDtbzOdz+ly5lsvP8+161cwLZi5rJKtoN8fAItfx7FiQGQ6BDytUHFWeXtmh2CPMKiIGJS+kEJcSWMcy+hzALfEgBVlupwholjJ9H32eZ8Zfe6RoEg0urKkyx1dzqRRSzal4PB/CaEmXQfiFBzBGRSSCZp7un5O0SWxMWIKWDEsFzbHGzTNmPmiZ7aYc3RIeb/mxoInBDN48623WZQeKORuSZ8zXcl0uaDF5cbEjBDx5CNGtJ6Isr8/I5BoUISIitDGxHIxBxVe+c7rPP/1b/DSC8/z8osv0nddzR3m2pel44FzJ4hJIQhZYRCMTm1ApXIhzehVyeIOeUGUoEawWGeUAhYxCxWS4l+8YW47cC49ybiQ9XB3gDhhfZj7BUXSofP6IM4tlpxqERJIi1Undx3Maqku7BTUekxL7QgkkiVGJjTBVm11rYIHcY0FWMd9jGO3Orvbe2yfehQ5/QDvvvs2r77xCg88Fvn2t14kpp5lt0Cv3+TypUtsJOPzn/sVnn/xRa7dfJur717G8hItcy698Rqf+fTn2NzYrje38h6k2Dp+uPDVGfAdvgoVkABYFSOusPUwIPRCclTfMKsZKsIf+F6+Mhouj0Yb0C4zn/ZsnRg7gRogRozAdDojpcRkPCJGIaUEKRAl
kVKDW1kNUMBDCa3GAic2NugoDviIipaObnfqRqalEAksQoQmuVGqHHLD7kfiU1WCq2mxP51y+epVmsmY2WwfgnllWhNGsDqlC3XzIcEXdFPu3L5HKS3RAlujlr7rUaAZJZbLJX2fufzGa2ycO4XpgqYJpJjQcpjKowijKvcllV7gGqDVuw6f7TW4RJlSfE5numpF+gx0qDatImX93gkhoOIVs1P/hvZlGX6jmsvKMA2kBE+LUgJmpaZIryLVXKtUCJhFXA9GiHEA/rrAtRmImt+zVJ6iub5skN43xMGBMj+5ZvY6Poxx7MQ3CYVnPvpxZn3LJz71aQ72r/OtVy4xGQVyVozMxqbPjTqN7HcdvfT8+bf/hLIofPKpT7O9tcOin3FwsM/W5vYRo9R13K/QIzv4rA6DFxEUVzqZzRZ0XcfGziZaKQY/TPhc6FDcOITAKCa0ZKI0dVEObIy3GbQeQQmxQt6rfuSgD5mL0ueOPvdeieYMXXUdiI5IbCIkxSvUtsGkJWZHiY42NyBnxu3oxz5379UmtVVn4tKlyxRzR/WsChJA88pkdgD9pxRJTU3uTWK57JjNM0EDzz37Sab7lwnBkbCmipZC2ySaaFiZM9nY4uNPP8f29g57ewuKQa+ZFV1gwJuA4yTrdZAy1GewItXVlqKjXCspQWwY+a4S4OHnldU1ORwYejU3UCQd1IJz88TFuEVWCqBe3Q2cQFWKVm6fAOJ1voQqO2cJMN/UmK4OZWizHs4KB0WfH/vyrmMdqzh24vutv/Of88aVK7zy+ut86pln0R42J2PKckqzscXGiTF39+9iIdFr5vmXv8HGRJnN77EZT9CklpM7Z3j39oKdHV8UhUPe0WDZso4fL6y2o1bWNKu5jKsixhDZ2dkhtJEyX1YOX/ihKqXhaW07YhQbMrHWBj7zElh5A+YuM8hjmenKOcDNWjlUG8Fbnyk2NOPkIAsZKoJAkugyYE2iyz0Hs3vE0JDahqRKG+/f/TMkQFVjsey48vZVQhM5WMwpAIdjPZCABVecSamhaRp3VdfiNAsCarnOVoVSipPNrYpJR2Fzo+X8xXM07Razgymqxu7BvvPqRJ3rxjBbG5CkAxk9sOx7JiPXwAyDPMAKgWurc3vkE66QmEMrfDXXpCb24XUIld/nn9lWSXhInIfgGbOCBq/OklUZs3qqAlLb6lrnn9U5AkMzFW0aCMEfV3Wn90EJZt0NWsf9jGMnvqtvv87bl67y6LkH+dKX/oyCkYIxGWUmozMs50rQFsvuqSUqaA7s7GwRLbFz8iQXHnqYsw9ccEpDRYtZXXjXcT9CgKpvqAoSHcxQUXcxRlIyZrMZKTVsjto6B7TDbPb9wg4Rfl3nGpnaZ3rNZDJ9LpTeZbaMXBF9rt4RowMpQgi0TXMImxeQUBO0ikt25Z6smRATFGhHDRml10xblGjKxtaE5WzJdH+X7bYlhfu3MA6nwky4dPlNTISuX1aH9Vp6VJShhYSrKjsCM5iQgjuukwRJLQfTJRJg2S29glLFCsTYECUxaUeIFpaLJVs7pyhqvPbaq7RtJGdq1WUuSK2ltnVdF1Q1UEKki6ApeAVqTmfRIC5WXs+zllA/X0FEK6/vvRuGgldg4SjlSIY5H06fICAFzJyH5/JonnSLQFQHsfh8UAmhIZhgJddEBu68nqsGqVCyodkTeYwNVAAOA290nffWcR/j2InvU899nMcvPsxDDz3NQw8/xj/+J/+Yjc3A5uZJ7t65y+apsygN0+U9JpMNNmLClh2UwFMf+RiPP/oEWCHFanxZodDg6vDr+OHisC13dEUwQmhRmznAoNQRYOX1iTjAQGKEYpQ6Uwscb0dt4HMjVZbdgnuzPWKn9LGpclnKeJQwa5FQKz9xMWVBfHEMh+7lg3YjSp3jCFbRkjEmtChd3xGaiOaelCL9YoaW4m3cnFl2c8bsVBRj+KFb5+/XHx0q31xgvlhw5epbSEiUvnPHA6vtZKW2At2YVcz5
eVRD1SBKbATrhVYiOShjbdmjQ0ogpsD03gIE0miEpECgIYTAS6++xOuXX+XshU3EXHrO+Y5W04GABFR93ufOimV1XYMZQY3kz6zXLQJ1cDggO6sQweo54EovpWBE0AbTjAttA/Q13zc4ZdNl51SEQiSq1bfw5OZzxIyUUhOwYcXfTaR38IoKVnwl6KzUJCs0EunBXdutIHrspWod6/iBcey76ezpR1nObvIvv/h1ts7s8PgzT3P19W/z0EOP8PDDgbev32RZes4++ABvvfUWzVZge2vCjXt3aa5e42MfmbK9fYLFYkGTIkLCKLSN60l+P3eAdfygGFpgVnf3wyzGE0yUQQsfZrM5TdMSk5EkOEncs98Pfpdhs27Qti3baYe2h8VAUtYOAuReHMCAUkpxEE2tLAc+ltXq0TEYw9Qp1LYo5K5nNpv571SQiVphejCl65akGBmlxu8bEXcrr/GjAFyOzveG2fPlN9+kyz2dGF3xmZxlP99Uqa9RihBg3I4oOSMSuHvnDhtbDTFFSs9KpLp0SsZIlmhjQ9cIS82EtkWtCjEH419/8V9x7tyYPveYerKLIRCrd2KuZP1DF4ceU1fOkSI+WwveQFbLiLkdkZ/nUEFPR0kKh4k/IGCFnD1JGQKSQHOlxAiahdKDxOQITYWiuY4v3DewDhKJwduofZUqS4N8GxV0NZxKqnOEGqX6CmqIRBESslJ2Wcc67kccO/H9t//wf+Tu7i362V3OnTjF/u4MDYGvfO3rnNzehhjocmF6cECDQZlRcmF7a8JbVy/xxT/6FzzzsU/xxBNP8ZWvfIkL5y/SNA0PXjzHaDT+SX7GD02YQS6Z0FSRKS3M5jMkHO7wZ/MZZoWiSsYReCnEYcjyA96g7vKBRiINsSIIq7pHCuTSQRzRF2U+3WdzMkEG5GMI70kwplpnfJ70gg2WNEruBwHtQFs1G4MIJ0+c8FmlBO7ducP29nZNsj9aHK30Ds+jMV92XHn7LUowutyRBwBGnU8WjJSgEUgxYrkQxw2z6YzcZ9o4Yb5YgrgPIlrozCkkWQ3NSjblzv4usU1YCUwmE+7ceZet7YZ2nPA0uWLduf+eHU1XVapMIiEeVm69+EwvBCHYQCao3oV15msKBG9ThmEWPNxExegrxqjgGqCl8t6juON6l/uVc4RUhW2zts7j3DXC54cuMl3IoEu2WqNtvM9jqiseoCu5DFy/unlRp2OIeTt0Heu4X3F8B/bX/pS4/SC56+j6O0x1j6cf/yhvX73KvFuyuTkmmvKJZz/J8y/8GUkjG+MtFh00ofDW1Zd45MFH+fOv/RmnT29z4cJZxuNNmja8Byzw3W7X6xv+2FFnMarqUDyhzqXcGSGESikAT1ZDtdBEJB7nVvAKwxA0K7P9A1LTMJ0tIDWY9KQA89kuRmAxnRHVmM8XbG/tMB6PVxXf0JZUUYfM12MWcXBE245JqWW5WDKS1ueJVgjJ/eNu3bnN5njCeLIBXf8jJz6/3d5/jwUuX73CIncUcfFl7TKmdVZW59MxCKNRg5mRs6Nob9+9y9ZoghWfXfel0I7GRIRcK9dsBWkjcZER7QlSaNIm1nfMDm7zwIWT/j69JxUVpWiuKbAChxg0Qs3dH/DKqCJJqpSYYz0HV/uBa6daGPRWzQ5FtgcpMkFI7QQbn0Dr7yZxyoYVp5g0jOnx865qRMuESqMZvrNFjYLzBmPomYSOMr3lLe6BX4q5go0pFE++WoBQ27q1Hd+vK7513Mc4duLbau6wLJs0qqSSaaQnKGguWBJChL7M2Z/e45Of+DRnd87w0Y99im+8+C0uv/lN2hQ4dfYsBwcdqen49ne+yc//3OdZCeRyNOkdZsIPSoSrx44MJ9YJEgaErJmLA5u6eLJUaLiq0jYti8WcNgqWS23rDYZqxzmHskKHdvMFsR1RcmG5yGRd0gbIi56NjQ1GCHnekZcZNlkdx6rlKUOlcYgkXH2Syi8bjVp6zb5IxkCuiX08mrCx
sekmptl5Yz/yWTvSijWD+XLBq5cvIW2kX8zJfUdQRTWQQ0AJtGI0KRDb4JJhxbh19w5NM2IynrC9MeHm7RmpmRAIRKLPuYqiUSkJ8nJBWArWL8hxg/27d3j0kdOYdWgWxJzbFqKg4kLNwapbwVEqkDUrbU6r15/cIim6wMAwcwv1e6WDrdCA0hzuHm9zmkEYjRhtnASJqAqBxis7LQgZzYE+dGhwi6tQHdaH76OD1nzOKxkSS5oqUEDV6rSaVkVwYJPVGa3EOnusTn4SKMfQkl3HOo4bx058/9E/+C/53d/9n1n0+yTZ4slHn+blV75D045Z5p6b9+bsnNji0ltXuHjhEd64cpU7OXPj+m32FsrJUxv8sz/4p7Rs8slPPo0EYdkvCGmLlZecOWk2BDANDGDP9xqfrgk93ysEoVgGjaj1ROywjeQIEk6c2GJjNGFhC2IulIq0E46T9JxIJmKUUAiKu4+XjISWUTshz/cxUbJmJEaa1CIE2hico1erEidUF4oGQmiJQTA6zJIDSLQ4QIdIpwUtivXVRqkosRi3FgeIJELfsZwvvwuocrwwoNTqyGdYV65eWdk1DTB7JKBkVHtCM0JMaCTRhMhSA1pmUDpmB4Vf+rlf5O0rrxAEupyR0KCloCXjZG8Yjya8u7xK0Q3aYEz3bnHm7Ja7VhQIMZJzTzBPBGh1HRJBNZKLVIWcEZkpIiOXEFMHtkioVTS6AkiKFozeNyAhUCwT6y3iFaGBBEJ08eueiIStqv9aQOs9JS1ER68WyzQGaFMl2Sr1QpUoQjFIljHryFmZaMBKRoMikepdWGXJan2qCk0EKz19cMBL1HXiW8f9i2MnvunBNhcfeZR2dIZPfuKzPP+ddyBewcRIMbglzXTGZHKCd29ep2jPn3/ja6i0EIz5zTtsjzc4fXLMbDalbVouX36D8+cf5OzZswQJqAX6viPnJZPJFqKHWoIfHEfxaB+u+ODNgIFlr/bwnbgODqcMwlO+gA7EY/8t4VjLikhtqPUuLdYEFssFzSih0pAaaNKmS2k1rbcGEVIZIylUwWEqqtCAFrFA1yloj5WOLi/IXU8umWJgRJRQofv+AZIENoIjEGN0Me6+73+Mk8nqNprOprz2xmu0oxH39nYZ6GoOjG1ocF8+BFITKaXDASOFcTsiTIRHHnqYa1dfRU1Z5iVbGxNUCyV3EL0daBpIo5ZZzpQgjDYa2rbxNmNI9NktirIWvJFZtyZVkcVUXfCG4hsGHLU5PM/ZHQP9pH6AWl8XrX57QgWkUAUJqmXRgDryT+Z0k+CI1lACKgFKWGnDRqsUmorkFdM6kxOiCaIFKVZbrPWeq2AXo7a7rWDSVw++DiXWzcZAm1jHOu5fHN+P70//mCeffYitzYssZg2vvnyJM2fPM5vuobqkbcf02U1HY3RYNqaUMqdtWhLGbO828eQOl16/ya/+6m/w5S99lS984TcJ586ipsxmc77x/Nd5990r/M5v/y02JydW7/+BSL0PefX3XqURPzfCisPgiDurC1yt6LSqeoQgh61HqWi/H/R+1CSQlT5nNk/skHKht0JP4wCLFEBdi9ME+j4jElkujNJ71VNKoZTCsjN6PdqeLcDEgS5h7KAKjrTGKhIxNInNjYa+X7jMVXH6w4967uryjpbC5SuXKSjd7AARo89+vEIkSCKGSLFCbBMScG3OridGnI6QEluTDR8BBEjjlnbUEsUtiJBMvzSEiAa4ev1d4niMVtcSR3EOs1RdCY0HwcEfxduRrsXZkLNXkSEoEpQyoE5rGzmEKjNmkCt53pG/gRCrk4cI0ZzU0mOoBVdnqQTzgKDBZegGq/iYXI9U1RuSXltmBs+/EnyOJ5XIHvGE6IprcjhTFCdjqLlCTYgBCa4CIyGiuF2SxfUoYx33L45vSzR/ixu3G15+ueNv/Y3n+PSnPsUX/+SLtGNh2fWoJroe0khJTU+QhlGTGMUOFgek2PLpT/0c21sbbG2e4qGHHuEX
fiEwHo9ZLpaklBiPRuzv7nLm9EkW8z1yFrY2t0hNYvANG3r/slJ04JizqQ9DCDFU01nzuZkNPnWrNiC+c/e174c8dYaJ6z/2udAZlAxLjHnX05ee3C2xXsmlUAx3F0BAo+s6is8ULURX3her3dghEY1BBxShuZcfvgT7tS9ov2De9Wxvjem6zGKm7tLwI8SQzN1UtvDG5TewAKXPqBVUvTLRkgmhCmX3wyLdMJ/vMzuYsrkxomRogrsgDAa1EoTZcgGY0x2CuzKkELi3d5MuByZRiRUdW4Gjlb5hWPA/K22cTtW1Qj1loZod5GKlonWdW4cNz5MB74IOyi8Sq7OCeeuyil7bwK0VQUKDhaaCjfxnilRha7y6M59DDolWbJgpVlEKq+a2was+zP0TsSpxpkqhiqZroJTg5HfxCtPPhet8GmtwyzruXxw78c1VePK5j3Hpjev83r/5fcwyUYR7d++xsbVJKQGhoZQeohLESKXQRgevPPn4U5w6c4F79+4y2YAvfenLnDxx1rUEAVDMhIP9Pe7tHnD37g1+69/9+8wWC0IHQmA0GuFK7b4IrhPee0PqbnqQIROMUoEhVBDBys2BQ3uZAbBy7LOphf39A24ulvRLZU5h2btqSBQQFazEOh9rMMSFlykUcpVTE2JIFAagRZUiNiUSfOEMVuH27s4eRRgnJYRc54iwsblB3j+S4H9AvJ+2AFQJNeON11+n73s0UUEi5sCSbF7JSM9otME8101EiOztTmma6lpQFVPAyezS4ZqVlolBKH0hNA1t03Ll7UvsTu+ytXmeQPFSvN7bVKTjAOs3V47mYH8fk4YwEYq6G4YBMUYGhZwQBOsUS0MbW6szfK34K3VBHY7pK0ClDIS6USrmfnsmvjuyuuHxW6XqftYbTgYkqZmDU8wI5i4gJlLnhs7Ds5Ldqb1CarQClawmTvfvs9WGzCrCNISw0gtdxzruRxw78d07mPK7//T3ubO74LGHT9PtLvhP/6P/mFvTe/zzf/VPWMx2Gbc7zOdLUomkCSyXQmwS4xZMer705a+QrOfembN89nNf4OpbN7j2zrv84R/9G/7u3/27NDHxO7/zN/nDP/p9eua8/vYbfOnPXmBnu+XE5lnOnj3Bk488xqmtLcaTMW6vEj7E+c+XD/+nIhRKoUSpWolC7r36k+D7frMOLFIUh4+LtxCBH5j53OTU0GBoVyg0FDKm1TkARwlmVSKJ6HUMJhlCR9sGYo6UvrCzvcH+/oKclRgCaEFlhIZcEYF9HTw17raOt/y2tic0TW19BbfHQfJ77oHjENiPeu4d7B/w59/4c969+S7jjQ36bkERrzG0ODQopOTzueC6oa3A7r1bdMvMqE1ezRBJyeelour8Qu1owgSzjLTQo2xubnL12ss8dPFBdu/1TpdQF/BU3KZnSDqqhUhkPuuhJD7zmc/w59/+c4IkkI6sgVJ6TLf8Neh9JlcTKOYVdUHqvDeA5do1ccFw9ysurETGSvTjF630jVC5mMXrLglEq+cnaE1+hkoBapWqjuA0gBzcLb7PhKBk68mAmdtiWXHyfa0VSeZO8Yph0iNWCPrji5CvYx1DHBsqtZzdZb53kxPjhjvXrvHEo4/zxa9+jXfeueHtnaD0OmfcJJrazhFxNYo2nefk9lPMpguee+7jPPnER3no4mM8+vDjhJi4d7DHfN5zcLAgpglnzz/Mm1eu82//5E9RLczmB9zdvc1Lb7zMP/69f8ytO7coKwWQD28L5LCCWz3iih2W6XtX3yhDdTLA9bEKkY/kXHymJMfDdA5YCUMoxT3UMGhjYtQkTu5sMm6FyQRGY295nzg1YrwhjDcSG+PExiQyamEyDsQIGkbMmdCFcZVQk9UsyYoj/JbW0YXCwgp3Z3Nu7e5x794umhsWcyOXH74cGCgVqsobl17nYHZAahuWuVtxH322hydekZUbepSIZaNfzEkBRs3Ia+jgVaABfe5c6cWMQEugpRQhNS23bl3n
/NkdkhhaieDec/2uKwwWiGHE3bsznnj8OR648DDdcglopaJQ57XUdqWtEJpmBTU3fNVa+bu7faU8WJUOlFg3RsUru0opUHEaxSB0ftT2ySvCWrUN7Wqo9fuR8yyKBXUQTumw2vL0VrBV2op5i9QcYqrFzYht6FaYeVW8jnXcpzh2xZeI9LMlm40w7wofe/pZ/vv/6R/x8AMXGI0TjzzxFK+/doWEkRrX2YPAYpnpN+Z8+Rv/DOKSBx96kqcefwazwLlzF+is0G6M+c4r32F78wTjyYTrN29y7sx57t2d0eU5uZ9jI0Nab4fcuPkuiHDu7AWa1PzETs5fxhhGn2JA9WeD+tgRlRIXWw41IRaOvQcaeJQmTCYTLBckJiwElrlnMk4UhRQTi70ZKZXKLzOatmWxt09sAlkLoyaii0wmIgESmWLuOTfgcQz3ghuAicsuE8icOrFF6YXpYon15YfaAA0cwVIKy+WSN69eIaRI1kLWvCKou8KJITGSYvKRVRBKVpoA46bBgvDwg49w5e03KaXQWaYUB3mEIDQhOVmbQJ+V2f4em5sTxq2QRi2aDwghHpLIqSAWXPhZKRTriSmS0phuuaRNkV471y6d9wyqK6qHrdFSXLas3ggV3OnVv1NFYp2ThwHr6T+jAkdxKgIDqX0ApdQt0kCBKGqr1w5WZdKo0nk2VLC11b76x1V41FyGTkzJufPNRohU6Ws0OtXGO7LrxLeO+xfHrvjMxoTQMp0vmfUdf/TlPyaUjhs3r/Prv/lbXHzgIVIQjJ6NjTGxabg3nTJfLlkuO576yGf4hc/9La7v3uNb33me27duEGLghW+/yKuXL/Gv/ugPmOcFs37mi4YJH3/m4ywXC6xfsji4RzfbZ2ujZX/3HmdPnyHE+APoDh+yEKr/ms+ttCh9n1ctTNVaWYgvwlKBMD/M61MhJoYSk9GOAOkJCfYPDrz9HBIxtk7bDi2j0RaEDfbnEEZbhHaLe9OeXGdK0XpC6T2z2eotDlGnxdzxoe7+U0oEEfYP7mHWOzXjmP3uoWIZPvdbb72FlkLfLTEtqBld17FYLCpisrrXh0AMgSBCn5eE6BVv227SLdxZAlPaUazzMCXnQskBYkbJSBRSNLY3G4IESlZEotM8jn5e6vXTOu8KAxVByX3ntAhTloulI5Qq4ARwwFBFSw5WVC5adkT4ob7acD6HjcwAZGHVEhVYJdNDykwphVLRrtRZZGVG1HPMgHc5vHZaFVpq+1Xw9mfl1qw2ZqHqmlpQLHYUUbqsdH0+/n26jnX8gDh2xbc03IokdIy3R1y78Rbbkwm7iwX/+J/+z5w/dZq+E4TCdLFg1hVMIGdl2k/55ovfJI02OH2qpemMczsP89lf+gK7s3uEUUIk8j9/8V/SNi0ffexxPvOZX2Bn5zQvvvwdzpw5wZOPP8PXnv8Sp05s41qKC3baiROC5cOq3D4sRrL6d5CIaOcanBX15zw0b9/tz2bMuwVEY7lccvPWbXZOnnzPFuh7EsGt/qtyvvqcvc0Vha4vnDx5ltliQS6ZNghoy62be3SABJcfS2GoOAsizgkLpo7alIjYAnANTxGnsBPUK8Lo87WUAn3umWyMaBogjt4Hfhgq2+/GPx1VaFksFlx+8zIxBbrS1wqmOPKyVHWZ5E4KhlYjWCfnm7SVC+cuCVpnlRuTMTFEgjgSdbro2NgRvv3atxiNW4yOYOoISYkO4pFDdR2poJqBjuCcR68IY2rplwtSjMy6JU3T0s+Wld9YISdH+JlaK02rFWyoOpugtTzU1XV3qbsjuqlE/7xOFqzoV617k9pWGFrf6olNasV3NL3WYq/WcN5CHe6fofXum7T6eqHSa4JfPJGAREeHrmMd9yuOnTEW0jHpI5ujSN872TSXQpOA3HPjrXd54KmneOWllyhhSUojEpnNE4kcCtkStlxy73bHZz7+OKc3TvKnX/rXbJwck1iymB/QbI7Z7/f5xsu3eeW1b7E5PkXuO2aL
wMtvXmdy5hyPffQJbGE8/+1v8cuf/VVi+rDDvYaVvS5uZkgpaDSiKd18ChLolkuuvnuF2wdz4kaiW+wTxNg9ODhMaO+rCt4fVttvvpol5jNhf6pY0zCfzjCW9D5gIlhP1ISIa1NmW1aJu4yFniaCSCKFQhNHoJEUjRBH9A3EqIzECAHUWicxWwcZItG1MftCUcOWHbP57Mj5+N4YVU8qgRDg8uXLLMuSPvT0WtuUOhxXolgmxEwaQ98VSlF2p3NUFCvFSeLRE0IMwryrCMZi9PMZJUMYj9id3uPu3l3ObG0RzVVIcjC0+OytZEdIC4NPZUSkoLYAnaF9wmwTIZC7JUEjgRFYj+qCvm8qYnbhnDd80+AGsgP1B0RbsOJ0eOkrR8/vm2xWffVcWNoTVQUYqVQjW3nPTC8MzH6oAJr6HwyJzV3XixaSCMWEYrGClzJSxb5LdJ/GVBSJQhbzjVDZIKiQwgC8Wcc67k8cO/GNraMpiVMnzvHu7m26XBg31e85Ktsbm7RNYHtnTOnnjFJDFGExmyFNi8REitC0mXNnH+f05AzTg0y72XBu5xRnnjjPd156kUkbkCCcPneC6d4e5x54gJwLfX/AbLbg337lzzgZA7/8C3+HGHzjeh/Nt//SxfsTlR4B/VgIZM2+OMXIuQcucCGMeOONK3RaSBjbJ3ZwPGJcvd77vemGEBkg7/4+s5k7cPR9IYkj8WI1SY0a3Z4mGk0DGhzl2YQWSb7gigVPDqFxwrcZfVZyp3TaM8/Z7Wh0jNETohHVK6vNnU1KmZIXI2yRGI0G1N9A2P/g9udA95hPZ1y5coWUEtPFrM6ehlkYhOB+gDE17g2YlMV0l/ncrYe0FCeXA4tuSbbi6MfgMmUmARPY3jrB1TdfZmtj4lSDbO4/GIdzfKi5qWaUSvh3qTNwiTi/Fim1mC4ZRm0Fw4JA8BovxEhnfUXqupvDinRuh4AeG+a/NrQlB9d2r7BLfdyLukOjaKuzvoiT0sVf5kjjwWkSVvceXnEntGRiSJ6Ej7yuiv9T6gxYihKKoUlAC2bZxbYD2Ie2q7OOn0Qc+27qOiMQefvGXboUEUlkKYwmG5gEHn38ce6WBW0bMGloQiC0Ix565BFev/ymVyIoIoUvP/81nnjgMR5+8BFOnDlHMz7FK6++yCOPPMre3l00CT2GROjKks3Nk9y4+S7taAPtC4899iTnz593Tc/jjyn/ysUHCXinlLDeKFZAXGVkWChT01JybTPGhODk8qHl9IOwnXY4SmI0GjPWQEw94whREpr9mpmoo/JydK1NU0qR6rXWscxLTA2zQB5cwYMRGBE0U+pMK5hD6Rtx3c7ceTUUJCJibG+2dNPAIkObWg6P7uh5+eDPNBqNOHfuHFffveqw/IqARBx40xJBfJ6IeeUZozBqG7a2dtgZB+7du+vAlVLo+w61BAz+c0o7nnDvzi0oRhTfhLTJ1Ui8ZDqcbQmGak+Mgdl0zmSz4WAx9+q5zrFHo5Yy17rxyM6nU/PWap0/HmazYZZmNaWDE9xzPT92eA9pWLVXh3987HhUwPoQuVkngEdfBcydQHToHDAgQF0s3eXMdJV8CU6dCKvXWr0SgzqMC7yIC3N/3ztzHev44eLYiW9j5wzloLDMBUmBpmkp0rN7MGdraxuKcOv6TXbv7XJqe4vJaMTd3X1u3rpDLkaxOUkCMTcsprt857UXeefaDc4+8AgvvvQNfumzn+OrX/8qzz33cV565WUO9u8ybhPxxAjM9fsSwkPnH+TTn/gVYhsdmfaTPDt/iWKozkajETb3WYlS6PKSTh2W34RIJ4X5QCSuJqcirioSY/N9QSIDlSGEhOAqMUuFZd9hpVDU25rFekoxJCf3ehOhUxDNJJQiStCIhUwKE28phkzfBUIRNBgxVBxghF6rc3swLCjZOuYLYdKMmUwUneY65zKfSb3niL/7PJk5wfzxxx/n6rtXSdk5iG4d5BqRITq6MkaXJeu6jib4VmFr
a5NusUebklcpBFKIXr2qYap0/ZIuZ9rUMG4nXtVIJhBQ8U1GjAPx3XNVIyNu3bjNJz/xaa5df5NAAg2oumzXeNJwb7duGnIhqhGyIerHbqW46bDiTgrmgBiRIam4m/pAg0BSRV9KTVhaFXSGc1ito0wp5vQFAaQ4d6/gn3XIl35THdIe1LTaHBU0d/66okidG4KQLCClsFSlq/PJUApKbdHWBPohZi2t4ycQx058k42W8bhl7+A2nQakzMkSSCnRdx0vvPACC4HJ5AR7d/eY7s4JTcvBQUQRGlFyUe7uZXLfMY4Nn//FX+XP/uzrNI1y5851RiNhNr9HmyI74xNM85w333qHJt5h1HZQ5hzc2yCmsS+CHyLn9g+avR3KTuOzJTWK9q4lqQYB5trTaWGSGkyUvhQsOok5RidIB9wpYyRVccMciv9diaNWFFqU3f0DlnHkJq0opRPUIhmhL+IVv0KU4tVcbAkhQb9k0rTE2DIaKYslFAtsbI+5eyvTa6rIQlsJREtIZDPaJJw6tUnbClEyDUYbhWXkEEpoR475A3L4UF2ZKadPn+R3/sZv082mXHv7LV555VWmfUZiZY0FtwRaLJcs5nM2Rg6smS0WkBeIBVdNESNaAlFSbBxQVIRihZ3tCbdv9IxDQ5scrRhSQEzIxdVXSvbjunXzgGc++imee+bnefPqFdAWCclnb6WnaVJF6QqlpqqgRhIhinhlaQOYxWdy3kqtCMuV2IFLrg0nabiLDKC6UgxoTb8UVS6u8j3VoZ7u2hCCyygUN7tFopv2qr+qa4oOrctKmxkq05qg3XnBNzlDxScxYaIESUhfJd3WsY77FMdOfPvTfcwSH/3YI3zz+RtonpHaCcWMTjOj1JI7ZVkSnbY0QRjFlnv7MzY2Wt89Nw2lN5569COcPHuSf/Un/5pGEn1WXnrtZTY3E2+99RrRNph3C2SS2Nga083nlM4YbW1w8eLj3Nu7w+nTZ1Y72Q+rF9/wqVUrqMIMzc6Fa7NBK2zubHP99g0eOfegIyRV3RMuKH3u2Tmx40hAGajH8bvfoMawlIFbr43byFKUpAG1zNakoVdltoQTJ1pCdjh8HAt3dpdsn9yizDONBKazGWlrg/nelI2tBolawRDuRGC1WqkfDREnzU9nnXs6hiXnTmyw6BqWS69ahxbZ4Wzyg2d8AVwj1GDUtDQ7iac3t3nokcf5gz/6IvfmMywkB32Uwv7ePqNm7MckrkPaikEJhDZiDnf2jVhKvPDiN2hHE6JAapM7DiD0fYcQ6PtCEzxpWIXyzxcLTCM///O/SO6Uon1116ASyo0mJq/mYiC7pzmlanQOLUbNVikC/vEDgSBDFTYgPz0JqrpBsX99PCmKuvJKrwVZuBGv4ddx5a4xVKnV9ijGeGR2KPWVDEwJRemWHZOY0cLKjFeqo7rP96BU7qbUa9eXQmoCXZ9ppaGUxY/3ZVnHOo7E8SfG2nDQzXnq6Sfp+g2uvPkqfS70y44YW/anB7STMQfdLtCg1hBDS4oF1GjaEU1oMCtsjM9gVcl+uegxfD509uyDnDl5ildfucT5C+e5/M5VTu5sM2qFVCJPPfExfu7nP88obf7AedSHIaxWLiKh7uq99bSxucWF02c4c/4MW00DuGlNlMB8b4pmRUYObnAbmEDbjCpy7/tw4uq6Z6bEaDSbiT43SB6z199jc1PpM6Q0olvM0AyjkS+8IRaX72qjK/0sOlQDXbfk9GQDoadtG3IekmuorhFOwwgVfjKbLjB6Tp2ImAT2pwu0HNtY6T1UDTfgpZ6HwGg0JqVEjJFOlZQSpVtiJRNHUtGIbtorMdCbkopLihGUFCL7s7tceeNVHjh/lmU3YxzGnmBHDcUSmLcil2WO2ahy3ZQYRuzNCkhCpEf7nhgG1wIh1GpcrSDVVSEXJdcNQqXcrbhyrjXqmqciwaXlGPRZq/9dbUMOMzrPVdUtQ5V+6aAdP3FSpc38+qsafe4JUTgKhhogQl49
FpKB9T2MHPkKLmwtQ1vUU3BFlDq1gWju5df31Q2jW7szrOO+xvET39LRYpcvv82tG9eYTMb0d24i2b8xGxstqTUkF2ZZITm+uQk+2Nde6ZZL8rxw6dXXKK0R27oQBaGEwGTnNC9depNmPGHWzxilwGIxpY1C0yQQ4Utf+Sq//LlfZdTEuhD4l+TDEu8nE/sm23fIs77jsY88zeM7W5S+MM0Lrl+/xTOPP+mE6b5j2S2ITQIp5FINXs2qZJWvf0Hkg4qlw/dcCThHKIFiSywVpFGiGLEYPUJMIwdKUIgYUVoW8wWMGyabO0iYcPJkZDpbuqaojsEcNTm8V8Dtd4LWEZ4YoQ2MJi27B1NKiVh+r1bnMc/k6r8GlGaM7lpg6pTvJjgicnO8QZQI0teWnZOxi0HbNvTdHA3KqBlx9a3XOH3mJH1ZEKPPv0opSMjeigwto1FDlzMxjil5l67viDIha1nN+5yTOVzvaskkXokNXDuRQJdBkmBh4Ni9ByaCiVtE6ZDovGfp55LB9f6IJ58KEiNt0xBDohkSmQrZqhShFtSEFCMhCQOHUEvVjMW9IE0DoX4/RQ+82xCEojrAX7zyDnV+KAkJCWKl8ZWAiDLrZjz73Cd/2Au8jnV8zzh24pP5FBlv8vWvvMDPfebj7M3nPLi9xcOPPsVs2fGN73yLRbcAC4zHDYgynx0wbpK39UNC1IhqbE4gbG6Q2hG3btwghEJC+M6LLzFqW5b9HpQlpUxpwoT5ovDRTz5BCR2X33iH86fP8ewzH0MIlZT74Qw1YbZYcLBccrCYM+87FlHYv3GTftZxEDKbFtm7d8D57VOEFDnopsRxYrnsSUN7VF0vEawamDL0Nd8T3k50+H3JkGfG7nRJ0yRie4LZQhAr9F1mPNqiWwYWi45Gjdy13LuV6bOxf+Ccu1L2UBNyTeRROqINjhFenUgISO5J+CIYG8OicTD1mdLGZINoRwEZP0TMZrBY+mKPkfsO7t5loj0jK6TgcltdBomZEJb0fWS0dZqiS0owUptYzgvWRHYP7jEeCeM2up6m9YgKOS9RjJR8Blm04ALUXpFNxi25F2IYkM9G1kyx4vM/apUp7rrulk1u+5OzQBpMXW2lAjOUcKWCfrT+XKh2QhIQcePboeoLGMUCZokUGgdI1TZz1kiw6K1INaTg2qpFiW3jG9Doup+eYJ2qkEsmRneMEJO6qTq8yRSlqJKLz5UHqoOpEiWx7KY8+9xHOPfA+R/++q5jHd8jjp34HnnsAjfvGXMtPProQ7zwze/w8NNP89d+5++Qu57TD1zgd//572ES6XSOSKRNiVKMHDqQzINnHmG+t8/Tzz7Gc5/6LP/P/9f/m5R8iBOsEFWwBTQhcHLrBHumzJYdbTvitZdf4e/9rb/HZnOBkydPrMAcLuf04aj41HyxVzPu7O9ye2/Kzdu3GW1uEkcNN27foisd1mUkuHda1p7bd+/w5MOPEAxObp9wUIJV0IKI2/+Ik4wPQeryAcmv7tPVKJpYLjLdLNE13rbMSyWII/l0/4CiQoiB6bQnMsb95lzpRMTNY8VctktipJVAij0ivbduQ8RCS6MQzRDJSOOQ+agNMdYqJvAeB/bvxUM8+jNmM+z//N/CtWuOiDQjqfLclTfJ2h9JHkJWJcQAFGZpxO6v/zpdXLCYGjsn91gu94nLGeNuycntTeT6LsvlAsSYbEy5eO06O4sGUSVIZGNzk4P9JadOn2Vx6W12ThZK37I1L4RvfAPtMydffxMVJS0OGI9nxDtKev4bbL36Emf27hL29+iLUK4fYLHjTJ6yPdqgm7st2Kht3ZuxzghLTVhowY1sIcURISZC0FX7MvdKTlPS3R4LFWASQGnRkECz67MWWHRziIkQkwNXKmZlmAeqZp8aZ2XU30EXt4ipevRRKprUk95y0SNEUttAghZBO+Xs+ZOcvr7L29e+CZ//ez+Jr9U6PoRx7MR35/Y+s/mSQuEf/aPf5fTJk7y5tcl/
9b//rwhF+Py/88vM9xds7Wyt9t5RAo89/hiX33yVrEacJPZv7fH8yy/z4qV3SZOG/b27pLCBFW9htclV2efTGX/zb/wd/j//v/+BNA6EYlgRtje3uXjxYtUxdM1I4cNR9YXakrtz9w4vvPEq26dPEyYJGpguD5gtDpi0EwqF7Z0dJAjj0PDgydOVj9a7k0DljsUmMptNgW1y7pAo7gLO0TbYe8PMUXlFM4v+LjEJWXukCFFqRVDRkG0K9fmbpJCIoZLrSd4CFUOKkTEK3iJVVdQyZurmpBqI6kamIoqJklJku4FpP0fCmNTN2djY/OFATq+/Af/1/wGpCXPgpj15nN/9H3/v+O/zA+I33//A//G/AeDf/6An/zf/Defu2zv/5Yplk+B/+1//rA9jHX9F4vh+fHfnxPGUc2cf4OLFj/DaG98mbow4c+4U/+Hf/g+4c3ufB09fZDa/x0MXznNn94C8LLx7/QYaoGknXL/9NmlTuLM7IzZKtCXjBJI72mYbVV8wwZPfzRu3+fzP/xLzxQF7t2/z+uuX+NVf/etQOVA+HvxwENhX4soYt+/dowQ4WEzZPzjg1OlTSAhMJiMm4w02zpyp4G+hn87pF3N3DEB58ZVXWCyXaOmQJnDrzk0efeRCdUMHgnzPpAdeA6k5GX4yVkIW1EZoiWhx+LyRGVJcUehKZrFcEugdwZejk53FCFpbdFKt9QRXdUFcgYRI0UIQIYaIWaAJLeNJw3gjMZ1ndG5H5oLHjNu3kb7H/ov/AnvyCfyeg729e7x97U0kWXUP8HlViBGZz5HdKXu//TvsMiNnGI3HzGb3aGOkaQOpgl8W3RI1Y3N7k9dee5fT5zdd57OHrheeeORJun7G5UvvMtmeEmSLxdT4T/7+v0e/XPLFP/nndNYzX06ZbJzi9m3l7/y13+SF5/+I/fmUW3f3MBnx7o0DUmucPrPNRpNYLpZoFJqmIYVY25qHczoHlCwBI6WRi0CItxbN/Nr2YZs+nUQtVpK9U1V8FpgRAy2FZXbKEox87FAJ6gPQyDU8fe47ll10edcBLtZ79W/B25y6ZDbraZoJIURiSmxttnz6mc/w9rV3iJde5W//X//7H/o7s451fK84/oxvMqYZ99y6eZN339mj3Yx8+c++zva44U//+Iv8g//F/4YnnniG//b/8n/i9ImTvH3tJm0zYTqb004ilgOqHV1ZMpJNymLKU089wrtvXeNTH/8000Xm6luXMcskEWZ7e7z4woucP3eO7c0tzj95ngsXHnSEXwUdyIfImWFQ7QdhOptxb2+XB7Ye5uSJUy4FFhtGJ89gqWG57NidHnD9+g3mt24z+uRznNg5ydbGhL3dfShGzh3atFy7/i6f7J5lNJ44gOP7tI09+XpKXXZCp5HF0q1ncpcrLL3Ql4wJpBAq8hBMIoIDHVQDRQSVwa/OW65BIBdPOHGYBYmDTwgFlYxU14N79/Y598AOG6FhNj0K+Pkh48EH4PEnACX8u3+dk6dOcSJG8nSfO9/6Bm/8439E6TtiSMh8CqHh4OmnuZvvksYTrty5yanHz7K77BhNEpPUoFk5mM/8vJw5ydtT4MnTzGYHSGnZ3zcuPPMxpvM9ri2Fk+fnaN5isSfYz32afjrlztvfZEHPbL7L5sY5bp6K5E8/x3R5lbv7e9y8cZfeEldsREo9iwtn2RglFssDNLWMxhPakIiVoG7mmxAB1JaIGG07ISZHcepAdzChb0/RN2cpmoBIMUM1YpaBzmd1wLLMkJTIZeT8FjJWKsqYai7c52pA/A6hizQxgXY+U7ZE1kLXz5hOF4Q4YWNjB0z523/zt7n67h2uR2Ezz77PBVzHOn74OHbi29U5J23EaAxjDRSByWSb3B9w+tQJ/vnv/XOefOYT/NIXfpU7+zcIMbLss1dkfaZ0DTl2NG3yyqIpfOaTH+etEw+Q2hO8+o1/y4mTE6S48PCZk2eIbcPunbssD5Z84jd+jgsPPugD8GqVEsKH
g8M32MdQCcSbkwmnT5wkiSNl241Nlgf7TKczrt3e5c7dO8z6BVGESQi88fab3Lu7y6effY5TWye4ev0WceRztel8Sc4wwZPegBj8HkcCOGBhOp0imy39UuvCWAnneHWkgJXCKARGMTAvhlpwqLwVLFL1HmU1VlQzYiV3h0q4dgRpi0kmJeXk9gajmGijISEz2RizjD/6fXA0Xdrdu3SvvMTb167y0G/9dR74ld9kev067/zJHw7sRa9ggoAVpnv32N5sSEHocd84OMRumECIQs4O4Oi0ZxRGK9PXphFK6WtFO5DEQUtPKT3ZFC0J0USME0RGlF7QLIQQccNzJQpujqtOdO9MiCG5G3xoHFACXjGrMnj2eTK0FUrVdFAsDUhMBGkdmSlagSeRQkItIL3W6qwBqTQNA5I54EwBFSf4x0Lf9YwG6OhKkUUPtWVxfuJ0us/v/PW/xtvXrjO9t8+t2++y8SPuadaxju8Vx/fjG8PSlEXf0+UZzz37Uba2WlIbuHHtGk8+8QQvv32ZN668zbe/+S2wAilhoafvwbopW5s75KIkg2ee+wRxYwdCw/Mv/AmffO7jmC2JCk3pefLhx/n7/95/yG/8+m9z4uRZzp0/RwjRlRzEFTM+BDnvsJIxYWBzj8YtSY2mN2TRMb11m5044mMPP8HtN6+QF3Os6xBT0taI/dkeL732CnsH+yQzsnUUK66tSYWxc5TC8L0QkkIgIQRiA5PtwM6JgDSFcxc3aSYLdk6N2NhqOX1qm42NMSdPRc6d3iCUglTEoCIus1VBLjI0yqJVMEUhSvGbUwIhaEWb+u2ai7JcFrQPLLsRffa23o8SgzoJAnz9q/SXLrH7yiss7t6tP/esbEGhWvyMmgYr7iC/vbHhoCDtkVC97yoqNVb1lGKZUCKNtN7eNVyqK7Re+ZREsd496Exd6qtS+f31AvNaUedixCCoFp+15o4QRsSUiBFiSLRNQwgJMyPnOX1ekEtP1g4lo+a+jSK28tATXBPUoCa7iNUNhZjL00mISBghsXXwlDSYCSm6TmiIkRiS/644RamPLjIdFXdoqBbVh6a5QO/brbLM/MJnP0szgoODXV69dBk14+rlyz/StV3HOr5XHJ/HtzBKVYLQILz0xmvENnBqY5t3bt3iH/4P/x1zMcZROLlhLHdnpFFLKUqgQcqS5z72ab70lT/j1JlTXL16hWLCrbfu8Ztf+E12786IssHGuCF3c2JquHP7Ng9dfIK2bQmxqruHw2z3Yaj2ViFUGSijW3TQ9Wxsj3nwkbOcOXOathlx/cYt5suOmEbk3DNqfXELoQXp2dzc5sypJTkXmrFfei2lthN9PvfeGui7D0IrF0yC0baulbohSmyU1AaakZFa6JdzTDpim8h0tOOW+cLVVYJUI1uk+rWJVwchuNi1ujNBJBIVihiIVxr39jNYYWdD2IgT7t7bdbi9fXe783j3x6EUggGb/8k/4OcnEwBufv3LXP/yHzsKNfj5CTFy+dqb6IkRqUlICBgFMaOR6AjRECpJQDiY9Wxv7PDg2Qe5dOUN4nhEL0ZI4onflFiEIlqVXKrOpQha6oUXw3Bx6a53x/VSE2QI4vw3nOsHiSQNbUqkKFBc57MvPVaqNqhmhISaEsVnf06D6Ck4ErdXQ4ORgvsGEkEqL8+KVq/NQpMCKYh7EobKD6wzRVHBckeM1f9PXABAshARSu6JphQVcl/4xMc+wpNPPMGlV/+cK5euc+fulLY5RchrubJ13N84duKL0zmjjQ1SiizLgulizlbY5D/+j/4zvvqNF/jjr32ZPi9IGyOm3Yznnn6WK9f26OggCrGNXLv2JkLDrZvXOHnhBCc3T/Dm7lskRnz+s5/kd3773+eL/+afcfv2t3n6Y8/w0KNPgbU88shj5JI/1G7rhqNYSy488ehjPPHoY8TQEIJV3pdy884tNEYkBE6fPs14lGjHI1oZM2VGTJE0GruAdOXwIa4K4kVK7TlWPcvvJvKx
ojiIWF0ge19gibTtVt3ouHRa0zbIYBLbKqVzfze35HHfuhC8bWe4uohaR0g40qW4MHmUhMtpJ4oZ43HDyR1h/95d2jKpPLQj5+oDaAzHPcuz3/893r59i4f++m9z9tO/yO0Xvsmt578JOPctpcjdvdtMTpz14z7C+ogGUdwuqKiy6CFutOxsTdhqWkJfsJgxhEYCpcqFFQELgZS8AlbcqLYomLiZsJPse7q+Q4OtrkUphaIdRXqSeGVsYlCFCQxFYiDFAAW6zlur1nuCTWmESFsZ470LWJvQdR0luEA1Er2VbUqwAFnJFLqSkSxYXiCWKMGpLAGnGqGGdXMsQs5L2mRAIBevjGNoyP0USYkT2yf4lc9/jq9+53nK/l3u3bjN6XMPMO/mnBq3P8K1XMc6vnccH9wiC7qlEsOEoEoqwv7dGf+3/+6/4z/9T/9X3JvOuHH3XeZ7d/jUs5/gV7/w1/j9f/lnvPzKV9icnMQIvHvrKsvOOH96m1//5V/nD/7gT/mtX/81Pv/5X2Iy2qJI4td/429y9+5HOH3mYed3VdHkGMJKs/HDGM6TElKT6oKnXm2Yi3/3Juwuei5cvMhoewPBCBVo0psSmoiJsljMiU2zIq7LSrzDVtrB3/c4htYfsD/vuLu34MRkwu69hTtsq2K5YNnNXm/PZnRaILSkYEBBpHidFdymyv2+EyEaQRPkQgyBVHlkSeaIeZtbQyGi2F7PhglNVHIqpPT9TXS/f8gq8fdvv8XN77xAHCWe/l/+F5z/7C9x85vfdOK3BPquY3Nr01uwBNBKtMdb0WWVcFzSq2lHjMYjSvZ598b2DkWUOJr455y0ZHpUYRRbQNHSY1bIzl5HpGEyHoO5w0Epbi1l5oAhpbgVFMXtlXLGVk4bwc1jFYIFYkwEcdWVWLsoRZfk4m3WTMC6HqWnhMoBLK7ZWUyJFojmJrWqymK5QFSI0pBDQDS7aLV6xaeD00JwIWsNEJpELsqi9MgosZgu+c/+43/AV7/0Jb72+ot8/LEHufzqq3zyobMu7fYjXtV1rON7xbETXx75LrZfdARgMzS0TcPtxR3+x//p/8GN67dpT+0QSselK29w8Pv/jEef+AU+/+88zauvXeXffvFrpBFcuPgAT1y4iC6Vv/k3fpuA8Pql1/n4M59ByEzGO4weeAZCRPChucjQgvuLS1b/fovuqrpa/X31X+97/PDv7/+UIkIxt8BRMyxEupLpu56rb7/N629dY3exYGt7m3npQX33vegLyUWryGXJwf6eVxmlQPIF1KWm/F0HD7QPOs/OdnAQRs5KXmSyJmYHhaUuUNwWh15AEym5Fua4bSi5ZyMOfnXqMnaxEIKSEBogUEhpzgiIWpAm0W5tst2eJoWG8ail3QhIgo3NHUbthCY2hAhb2xsfeE2OvVF68CI89jjtlcs8cPoUD/3mXwNgeu0tB+NYIOfezXXVCBYI6jCgAlUfexBo9iQUkzslF4tIGPPMx3+edPIEL7x0iW5zk9wpWydP0U/fJeJJSdTou4WTzIO7HJQcSKElRAjRIDt4pctGKRlXYm3cuscasIasPSEk4pCgEU+KpTopiJASxMZIoRBLoevcKLdbLJHRtif16OjclIJvYKp0XCISYkuxXHU3A4iLbmOG4J2HENxEWOuN75KbLkIQUmLa7fObv/ZrXL/2Nl/9ypc58eTDLJY9/aIQGqEsMqrN8a7hOtZxzDh24ttE3CfLEtpM6Hqj7TMbbeLdW28z2dpiebDHyUnD6ZPb/OKvfY7Z/pg3r1zmgfNP8vf+/kVOnd9getDy2re/RgiJz3zmlyjF2N27jVohhRYEIhuVmC4MDtQ/zDjyL3YMie77L8jvd0KnLl+Ysrt7wFs3bvHqm2/yzq0bTDY2GI/GLC1TeqNkFzm2KHR9T2jHpDThO998hXPnL8BsiqaEyhIdFPsZSOsV6MIHVNfmzuuK0ooQFoET8YBmLmwFBSlgGYlC04xICVITCFFp2xHj
tiXFCW2TGE8aUgq8evkSO9tbfOJjHydEOHFih/Fo7Ek2Rtq2XW0UXFi6/vfKNkm+/1jyGNfDzGCxQE6eYuPhR3gC6Pd2eetf/x5Xf/+fEGLymWP0+ZfqkqhG1OTwk+TWT6URSjdnHBtUA30xmj4Q5oHHnv0k18oIQs/m6B3evXWb8akTaDCaRph3iZ0t/3uIg2u6kExAMtYvMRr63jmPJRtaGooIJEW0A1F6Ke7GoA5SMYmU6oNnokgoBJQmtaSKwtUSfX4XDEmBdjJhifsAUjogQBdJYpAMoiJMCH3j2qU16TfiSFwpESviSjEmxOydhzJSoBALNKFhuSx8+mMf5+Spln/6u/+S9sRpPvrQY/zZn/4hOw+e58GLF8jfukWv8x/nAq9jHd8Vx84mmnewMkeaSJedgKoKmoXR9pizF85w/fptnnn2E5w51fJHf/Qdbl7f5ZMff5pF9yYPnX+Am1dvcfHCE3z8mZ/j2Wc/RSmBtm05e/oBkLCC67vc4LD4Uv/8i1fpfVCV993VnXzg4x/0e+97wGvcI4LUIsJ0tuAP/vhPmRavLU6dOknX9Rzs72LJvdBi8PZWlMDGxoRRk5hO98kI0rRkC5US4jNDrDhgIQxI2Qo5X0HPDyMgJISPPvoATTuh3VA2mhEiMNkYs7W1QZNamljBHzh5TIKbvwap8ybJLJfKiy+9zMWPXuSppx4HUUIIR87FMFT8HudrALy+7/z9cPdKfe6dO9g/+2fs7t3jxe98k7SxgdngFK8+rzIc+WgKklGJQKRkA4ks5sa5Uxd58Oxprn3ty4QYsb5gRQg0zPcXxAmktiHlAy69coX5/l0mLIkCwSJtbBzoI9FFBiQ5qb+2+YuWCp5xYJIr9nk3RLVQrCBW3CxYBnunUmkknuiGZrVTGPwUBINgWq9wJKUJGxIQ7apmaO+i5qouIh0CKfmxZjWI0WeRZqQoTvy3nqI9SUAskDR6a73OAk+fOskDFy7wJ3/6FZa5cO6BsxTtiWnMgw+f48TODq/3mdSuZ3zruL9x7MRn5nqbaoGNrVNYr/TTKcFGLA6W7N7bI+eOF196gdn0Fp/69N9m/+Aqf/jH/4K2NZ564ime/egv0bXwmU/+Cndu3Wb/4B0effRRqErzNe3VtfZHT3TDV/u9r/LB7Tv/bEcWV3nfQmvHPRY78t7D38MHJLv6jPe/zZHnqWol6dvq7Yd8FIKQUbSJlK7nYG/GqGkpOUPwn7dtA6Nm1Rk2K0gT2D5zkgXGibPnOFjukdWdGAbTUMR5dlJ3H+9PKA7AFDa2Nvmd3/61ygXrkOTvNUBi/JCtwu8zGHS5kFUZN9E5YSSCKFubO4xGY4Ba6VDz7pHk9/5r8p6QH3tztMqhbk+wev+cXdOyzz0bkzG5WH0/bxuGAKUYRiSXzGJ3xi//wm+w2L/tguxNRBWKKa9+55v8T1/8ChtnNzm3OeZTj10k9MpylhmPghvSFsUqiCb3fXV1iP6n4Aa3ll0Grh6HYGgpqznj4Rnwim1Vw4th9TlWW+xqVSGnOlKgICGS6ww5pkAQR9da62o7WevIIy8oeem0hOht7eDuTN5tUCEGP3c5F0yFoAFTtzJqxw2f/ORzfOelF9nbPeDU6bNs7Wxx6c1L7M+WfP7XPs2VS2+yXGa2T5/6ka7rOtbxveLYie/8+Q1uvjtla+sMt3cPaARSEFdz0J7TJ08xO9hnY9Sws/UA0913mB/cY7IFD148BTHy4uUXefXmZV749jd5+OxFTp88wwMPnGU02uRQb/NwjvcjV3mmrLzH6kPH8+87TJjvje/9u4f+bv77hzO6o4aoRxKbfP80OiAttcLzhSMJRR2+Pp/vs2xbQoDR5gjJymRjhI0aYlVL6fqebrmkX3akJJw7e4Yr776DNC1p3KIzRTOUOl2zEAdnNhoRt+b5LhRt1dcUQS272gotA79ewVVWzJyCAEhM3L23y//9H/5Dulz4wmd/iV/+/Oc8yUlP07TEVGkB
6rOnQX5MTSv5+nucsSHh/RibpCHpVV/2KlOmaMlEcfPYyWiEYMQAmJFiu/K9Q1ztJCToptnbi+atxd4KvSV6U0p3l6cujtFJYBQiozjipe+8xiMXzlFsCaFQpEcpdH2H51jfiIAxmYwpmglSPfhEUBFyMDdxlejAIqs1+REJNwkBxLDevTEdjBKcyoALCTjIJdCpkGMH8YAUGmKBJP6ZihWnPVTXsaH13fUz5rMDUhOxbs4oTGjSxIn5ObsbhSWvFsU3Z48//ijLbsG9O3c5e+YMWztnWHaZaZf57b/7t7l7b5f5bM75By6S9tbKLeu4v3H8iq8ERs0Opg0pzLE8hxAYjUdI7njz8qt0i8zHf/GXefPqG9y4doWHH7iAbMx5+qO/wMsvfZsLD2xz+c3nuSMtv/iZ5/jDf/VvOHfuLI8/+dyq2oPDLtsHJo5VVTgsdpUEy5G/MyDztSa/w5999+u973Niq9y3Sjh2BP7PkOSOtC5rC0mrGgZUT7ta8R091vce7/Behz9XNd586wq37tzm0Yce4tyZc6vfHI5PMELIFAv0faaVSIyR+WJOXnbunh0To6Zlo53QNIFcjGyFYJllt+CB8+d56qnHmV6/y817e9yazelLZnvU8PCF89iRszaESN2e1EXZTBzEIIcbDXAFlkSDibffgsGy79g8ucON2zdqBaee1KVKz6kgK/CLPz5YTn2POv0DH/1hY0WiXrV3h2tYKm8vOEilzslKKUh0NVHFZcD6rqeJrFqSaEGiYRLIkuizUroFk9Cxv+yJ401CE1mWjEVDc0+xJeNJS5+XWDAkRspyuTo/o3ZMrCT9Umdmi1zorDDBwSW5qJsvxMPPptUuwdBqVOw0A0ej1vuvIoNns8ydbkEeBeJoQQoB6QIRsFAcPQo4n3bkibPUylGUkhX6zO50StHIZNzQBGW/m2LLA+IGbE4i29tbXLz4IK+89AZPP/UUoY0s+8DBdJ/P//Kvslwu6EvPdHqAdoXm1t59udbrWMcQx058l6++S9vuEMseF85v8fjDH2Vn+ywZ5Tsvvsje9B5L6/nkc5/mkcefYnc65fLVy9y4fY0vfvFfcebEBe5dv01cKBcfPc+f/NmX2Vvss3fvViXeUnfP3z1aem+SstVjVnlncojJBxSxgJaMkf//7P1prG3peeeH/d5prbWHM9255iqSzZnFQSRFjS1ZUk9xd6cHJbbRiWMjyIA4HccOEARB4DQC2x8SGAi6O4MRww4SZEJstKN0q1tzS6RIkRQpikMVa666Q93xDHtca73Dkw/Pu0+VWnLrUl3oD/F9gYNbdc++++yz9trv8z7/5z+opVIx5x41fwBSfBccet41lF3XVaGidz13KVkzynL9mdXAeEdfPzs7Y7la0jaBixcv4pzTsFZBN1G7gz5NLe6aRuBdOP97EBbLFUPsWW8WXDy6wL0HD0hDZLo3h9DgwgyRSMmJHBO9RIyzOO8JbUfnrcbFGIvNOreJMdeCZAjW8tGPfwac5969M/6/v/1bfPzTn+fCwR4XJq1eQ/tHlxttWbWov1PP6/tmQIrquJQiYymyxRqdkY0pExGwTjuQUshjJhsViBcx2kmYf3Im+k++Fvkj/u5PtqTfwnrF+X21XuH6Xl8zRd1vBJCMGaKmrjtLnxPbYWDWzsnjBscIpdOOJgf9YGXtBIUeZErOI7gZzSRwcnoPxaYHrG0IpiW4Dm93InOHLZniLbFk0rZgiiVFPWxghOIEKx6LVQi07EyioxanCi8YhLyDaS1AtY1Du1WKShU2/YKzByuSOaYYhzEKdSIqqyhVZqIf0ICIJWePWIP1I0UigQlkR8yZRY6EnBkQbACTPc53fPTjH2e96Tk8OkSK48HZGbfv3ef9H/gAN29cxxT4xtd/l+3Y080amger9+S9frQerd166ML3qc9+AWs6vvGNrzCZXODlV17hL//FH+bF66+zGDPve+5DLJf3+D//v/6P3H+w4qf/7F/glVtv
8dyTj9Ovl7gmcev+A55+8jlunyzJp0uIhW9/+9t86tM/+a6C8E8Xqb/TdeljRXQ2Ys43SwMUrAOMo0j+A5v47nSvj81Y48/hRan+gmKcPtA6UlEDX2cNwxhZrVYMaeRsueTk5JjNesvQD3in8GAIgRgj3nu8azg8OmR/T5mKbdcwm0yYTCoMZDWIdOdZqB2TcOnyBeyyUGzmK1/7CsOgzz8MA8W3nG3XjE6z+bx1dE2L9U7Lfsnq+lGELImShTH2mnptDXEcKQKr9Zq7p8csVsfk1jCUQr9N5EmHdqoFRS1r8dls4NVX4cED/qhuy9TOyZ4XLIOEBj8JuMWSK9ffotvf4/BsCd/4unaIfeTglVdorCDLBfbyZXj66Xee878Q6n4Pil6ohIlhxJwc13tDMKentOstTUnkFMEI7dgAQj9GhvmM0TqgcHKy4APPf4JXXzlGgqtQryHmWMkdozJAnSHlUWXpxuCtJcaBEPR5RDJjikixBNcQx7E6wgjOOT0cBcgkxEJMSQtXEkgalitkrDfYKDW5HQ2ArvZzZgexGyFTiNW6zFtfZ7eFrnNc2jNQBhIqn0k5kpM+zji1OLMGUjmjYMkSSBEkDRSTMcWryL2kqis02GhoTIuMDc89+yGapmWxWHJyesZ6OfDF3/kqP/yjP8JLL77E/bffpl8sefvWTVzTUJixWCz/2d/vR+vRetd66MJ3+85dSrbs7++x3fa8733v58u/81XKtONssWEy22d/f8KVyxc4PVnz8vdeYFid8dZrPZbIZjimnTsenN0lDo6LF49YLE/xYa5dE0W7q9rx7Vxa3t2haYHS7qyUXee3+6669rsKl5UiYKQWx/rYcyKg+iVaLDkljHH0sSemQp8iy9WGWBL9OHLv3j2MNfT9lu1mzaRtdJ4zDlhjcdbSzlu85TzaxbtQ8+8MJ4tTTs5OVLybEwDOOdqm5fDwkPl8xtXLV5lMJ+ScWG82LFcLJrMJ4zCwHgcuXDhgvTqlmwRGGwizCVKSsjdjYhgjAcCpK74UJQgN/Ugc1f3E1u8579mcrfjFX/gHuNbw5JOX6fuecYxsy0A5mJ33a7sOVNZr+Nt/B/Pv/C/P8+v+yfVPK0WHwH/vj/j7APyFd/2/TKfw3e/AM0//EY9+j9dECTXmr/4V5Pnnz//67M3XePG3f4v9yzOGfoOxwnQ6QSSxWg8sksU2AcrAEKv+UQpiQ2UjgzBQxJIlQY7ajZGrYbW612TJxDji7Bw1iVaHFUQh1pR3HRmkIpiggvixZHLOOOMIltrZKTRr6gz2/D6vbE9qNJGUKmuokKUYUbi2ohHzvSnTzmKyQrjZiLrpJAdYTC44qc/jenINJBYc4rrqGLMj0ESyZKzpsNHQhgm2C3zusz/E9etvcnZ2xvHxCXfePqGb7DGOI7du3oAR+k1P1wRSgTjkc8j70Xq03qv18Hl8D+5x7dpTxMGyONvw7fsvsBkLF64+zqyd8vVvfIO9gw7/giFt4VOf/BQ/9PzneP31V7n59uskH/GNwZnEfD7jycuHfPBHvsDnPvszjOOW0DTo3KycG+z+4RO/fpDV0F3TBEopDMOWm7feYhi25Fy4fPkaly9fVqjG7uAeFRrnymCzdUN5cHrC8ckxr19/g1QK675QJGtH5iwxaobcbNLRNo71uIZRaHyoHVFBJJFSUo1bAWcDpRSySZWtKrhgabuuFhRhzJE7x3e5dTfz2huvcXh4SN8PrNdrLl69yLydkwoMKVFE2OTI1LfElEgCYz/QNA0paT6a8U7nirvOmZ3mTp0yrLd473H1dU8aRy4REyMT15DGkQikHCvUvDuECNL3mFu3MDFS/vV/XaN83jV1RHazPUilsNn0lNWCzsLtH/lJvvTyi6zyBu8MbS584Nn38cQTT3Fh/5AvfvErPPexj/Dh7Rr73/hXkfv3/vkUvt3a38NcOEJqdE96sM84mzHOOmKjFmJx2iGS2IqwXkQueM8wRsRqtJIxmSSu
ysQzhYFUwFg9xTlvGdNIkQRSCNbV6CVPjhlSBmtogkFkZL1aYKxVg+qsV7ppA99+4buE0CFpS4mFcVBhvUO/Yo6aYejU3cfs2JzwDhwt6ptpjFMvTbuDTlWiYX2DsQ5yQQ3Es1rMid5TakdWKMWp4YDs7Oa0Q3Qu1MLnda5IwLWOnAo/9OnnuXf3Dpv1lrEfiGNiudpw7dI1bt+6wXa7ZdZe4OjKVVbXF+RtQcZC7v/ow9aj9Wj9SdfDq8Ilcuutt4BC2znAM5lZbBqwWWNVNsPAhb0jfuon/jTPPfk+VutTDj9xwCsvv8q8m+MoNFhO7t9ifXwbS+T6rQf0mxV/6v0f4XOf/TEm3VSp4u865Ump8zijrv7GGBarFTeu3+DB/fvcunWLy1cuYF1kvd7y1vWbhLZj/+CAp555llyE9WbDarvl+OSEk5MTtssN2cJ6WGODoesC3lUPkaIEkjJGglXneVJETKFtHILCSUao86xCLklngHh1zsdQnFIgDGpGHFM+p4xYa9Xh3jq895wuFrVLhePTU0wXlNaeM/cePABv6GMkEUg5YbDklM8p61IZdrZiWoLgQ8A3Dd473eic1qiYE/24RUhY47DWk/LIMAp937/r0KHFLeeMLZqWsLh8Efe+5wjeE3zgHdlC4eTslBdefpm79x7w9HzGB44O+Xtv3mJ8+nH84g6Hx2c0Ai9877v8/vdeZP9DH+BkPuXahz9MXq/rGPa9md39sbez7I4geoOVooUhl4JxFequsUiYQimaSbfeLPnMJz/NCy9+E1FXUrAFKR6pnwMRdTdyWIK3pJgR6iEsG6z3EKGI3ifOeYZcsE4Yx0Gju3IhZfA2cLh3ieVixXff+D5XruwDieBCNYS2eKp5fFFRnpha4M6Nq0HvCM3cQyymGCyGfD46sOggvIBYLAFnCpak6RKu/owa6mdMhylREZVaEp1TmzLlzPjqOGqxznFwMOfK5Uvcvn2X5WKJc4H1esu2H2j6DcvlCV3TsU2JyVFH7iONbylY2qb953JPPFr/5VkPXfgyULK6tQ+jzpF+9id+lF/+B/+QS4dXif2Wz37m09y5e49f/41f4dez53/0N//7XLh4xGI98Bu//RuMcY0w8thjT/EzP/Uz9MOag6On+fLv/BqLxV28t8QkeE9lihkkF0oln22GkbOzBd9/6QXevn+bkgrBGHIxnK22pHJCCFOMDSzXC876Na/cusVYhPVqRcoZ7z1t15Gd4HJiNm0Qm/GSaTGIBawKiE1oMCI1AslhXdVOuUqFKap3GsYBixadgp7qnTXErPBtsLtNBepRnFKhryyZ1A9Y056fysdx4K23ruOtQ3Jks42MeUTIZD9hu9oyDpEQnIrEnaZjW2PftemB97pRlaQbfd8vEEaWZ6fn1mTjmBhkZNMvaboDxjG9Mw8SfR+KFE6PT7gCfOP3f5/N2QmCYdJM+NBf++s89ZM/gQO++T/7n7JcLbBtw3oc2dvbx5ZAN5zy0XHLR595jL3GYZuWV27f5ds3XyfML2PG4V0BWe8NW/OhV2Xkgkb1pKLmy1iPZKnmzSrSztmTktD6jpg2GFrNwJOEkSlGoh5IZMQDRRzew3y6Ty4bchGC83jv6NeRUjKxJKRknFcnmyZMFHosRZm57YQHd+7x+muv8dz7ruG8J0tfc/o0AwIRDfsFnBTIiUwhSwfFYuw7BzNV7nn904hCqyhSIFLTMrAYG8BGhTV3KItFZRMIxhtMqSzTQj3wARS9ZpVFagukbc8nv/AF7t25S9/3BN9w6+ZtHtw7IafEkAaa0DJpZ3gs1gvOeMJkguumHLaPCt+j9d6uh09nMB2h84gUhhLZDj2/8Zu/irWeftvjTeE7v/cNfupf+Dnado///O/9I37p136TL/zI53Eh8IEPfoDvff8b+OmMN9+6xXe/+xJf+OEf5YmnPsRv/uNfI6aek5MTLl18mpIT3u/Ykhri+dXf+RovvPIyy22PdVC8JaeCFxX9
Hq9O8U3BuUiwLdPZlL6PnK5WWBcwkumCUtO9zbTe0eGwwWAc5yfgZLIGdu4GJTWLzFXySBbBFKun9awdnoh2TRoRoySZsttQDWr0KwqxOqQig4ZcoSdX+8YkmYwaODfGIqkw8Q02tGzGLWMayBW+aicN08mUcRw4DxXNBet0Eyu5MIwj/XaLisIs3cQzm045lRPV6RlDHCN9GklxJPuxdqv6+jDarXrXcOvt21wB+hIZjFCM5fBDH+SJH/tR0jjim4beGfocaYrn5OQU9wEYyornxfCjf+UnCI/vQ+rxm8iHvu158JsPuHl6xvr2bexFFSmLeUjF5Q/igvPHrH67pWm6c5YtCDhLBpxtGJIlpcqcFJRUUgTJuRYGc96tG7N77wtDhCFvscaTDERnGdY9th+JI1imdVbt6fstkhPGdYzjqGL1PLJa3OP0eM3HPvoc3qu4PuVApmhQbTUv1/mixxj10xSjELzOuHnnfq6nGmPVNMJSKJWNi9GiqI/JiNUIKFMt1HZMaqksUF/n8LkU5N3EMqMHBimFVAzP/an3c7o8Y+h7xpR4660bnD5YYpKl9JlJ1ymxxwcmIXDjrVfpZgdMDg6ZzCf0r772A72fj9aj9cethy5869XA3p5jMpmyPj3l4GCPphFM22BzYW824aknn+a1l2/wZ//FP8/8aMJ3X/oW33rhS3zy+efxjWe+N2N/fo3JLONCw2IRCXfv8vN/7a8zDA84OpwAG4zLpFwwptWN0EIIlsO9Odt+Q06JVAzONhjvESNkY9Q0t+hAvl8sGcZRWXQOvHU4796xfTKQJVBiJvWRlDQmxbeF7EztBAAM5JHUb4kpAYYxCSKJYA1GHMZKlRM49XO0ajjsjaZvDzJq/A4GJ1nnbs4q305EBcJkSoWjDAZnTHVwUdit8R7IiA10E2Gz2ZCiFknnLTlnxrEnxpFxHBERptMpk8kE13ic9UpqsJZcDHlM2OC0oyuFlEZSGtW8utaNHds1l8x2o36JQ79lGNd0+xf47N/4G3z7H/wC7/+RH2N+6TIx9pQ81k3RURCm48jHn3iS8KeOCC5QisccKcT7/rPM2//4e4xDj9kRkOQP6wf/uPVupu/5czxk8ROB1WrF4WHAuaCHCFSblguUWOjHLSE4dkx+hZR1BpqTMmeLEVLJZEmsttualtCQiVg3YaxyEinQhYbTbEA6CluyFGIeqnSgoU8Z56FsRoJvePrpy1iTq6NOxtmWYtZIJVeJCN42pBIpqeCcRj/t8vYUkLeU3Sy29rjKDzPKyK16VxWoF0UuiCqUL+ruYkRz9IxVAoutULwYAVewtbiWnU7QBQqGi5euslov2DvY5zvf+Q5d23K477h1/WXEWC4eHJBKJsaB7eIESYkwn9PuT1mvl1z+54N+P1r/JVoPXfhCCPTjkn5YI7GQ+ozkwLTtcKGAMbz0/dcZS8//4z895s//xT/HL/7C3+fZZ97Ht775TX78T/8E3nmOji7z3Rde4O1br3Jyb+TO/Rv89b/6c7z/A8/hnFBkpbMLG0i5P4dmPvrRD/L4k4/xS7/xKyxWSslHEt60IBFrLDnqgH6II8aG6kDvMGRiGtkOGU3R1kiW0ReMjWgCmscZT4xC3PY1vDMxjlGhQ1H5gDWBbCyuEYKiStXmS3Vt1mknZa1lQkMTGtXzWfWoFMn1wlcSgECxgnM14FOKyihqEcw50vhAyeCdJ1dtlfdqZRVjz7DcklKmaSbM51P29vaq3EN3jCxCipnQaKadKQZbCpKtElIkk1KkpFTnTu8U/ZwSD+7fox+08IkIaTvww//D/zaLt9/mG//P/zvv+8KP6k0yFkpSXV7OekiQxRo7dSQX8XjlwtuAefwShx82jP/w98hNw85c6wd1YfknWb8/aLdnDFy4eAFrVTYgIjsOlbJ0DYxDxLuWIkm/X+dYRkWf6nJmwTlbr7tTCDsJY9bIH+cEWxmRNjj67VqhRZ9IKVOMitZfeu0633/19epk
ozC783poKkUoOdZiO5BHp3Y5aGJ6KWCNw4iG0+4QE51VVrlPlc/ITnZSirrxWEep2j9nAathuDkafW+M6jPN7jqLzg9zKojTOacVDRKW7AHDMGSee9/7KcXSNlNef/1NNusN+9MDbt66SSbxpz70IUyOIBr51K/PONjbg8kBB5cPufnmGc4/YnU+Wu/tenjnlhK10LQN1lhaA9t1z3Ic2Z85ju+dAB0Xr815663vc+Xiz3N0eMR/9S/+Fb7821+iHyLPPPkUd27fYm9SeN9HvsBrL1/n8z/8CU5OVsxnT2CweF/FttIgWVhte37vO9/glRde5Oz+CW4+IUw906BEFBGLjArtjTkiRLAOyRCsIw8jpUSMFdUsSU2xNpYhn2F9hpIwBDKOkgBjdZ5nLaVEvOUc1tF8O4FcasdncNbRBt0IdLyWSCMs4rJSzQXvPdZ5NZHG0TYB797Rfk3boE4YIvpzpdQ5p/o42ixIGclN4P6De4wx423LZDLh8PCChr4ai+Rcu7Sim6y1pDGRcyElZQ2WHaQqQE5ITDpLRdgMPbmAtVp4fTU7xujcMKXEk5/7Ua599GP84r/3twgHh+fWZt3+Acu7bxPHhCrfoO83pPtLJqc9pgyU/YbcBYVDl2uOjfCpq4+DqNTDvLvbhErU4Q9bqIrOfq21O3tNLZ1FoTtTO+U/XAd3lBY5/39nayK81Q07xpHt1tfibcmlkFJSAX5UI+YUS9XqqSwhO3U/MRb6fmQYRvAN235k0nhSTqS+x7BHMSqPSblXun4qeNtw/fZ9vvzVr9G0kb0GYkx4n7W4GCFloZREij0WweSIFSGlxDAOjEOvxgtGM/OcU9ZpSrmOlh1FoppJl6ChxUahSmsL2TgkFgKFKD2rfk0/GpxzdPX+dLYSWEoAWxjLqKSgqBaG/eaE1169jjENe7M9fuxHf4KTk3tYZ7l96y5PPH6Vl154mWHbE4JlKFvSxtC4Pfa6GRvbgvO4tkHywLTzNOFRLNGj9d6uhw+iLZoALqXgQ0DGjCWQ0c5It5LMZrPiYNbxv/s7f5uSHX/3f/+/5ad/5mf4pV/5NT7/+c/ykQ9e4Xu//2X+m3/jXyPHgPcJF2ZAQ0YoMXPn7n2uv3Wb1159nfurE2gse7MZV7s5xVmyjTqXkFBJB+pB6FsHxiNG539Ss8dwliiJTd9TckGMxTinZBdnaFzAGvWstCbhnK8wI7gQMKXgDDjrMdafz+a8BckRStKNyDWUWjCssVjb4h3aFaAQUzAOSwMixDQS62xl6NeQc+0O1f/K7Yg4eTd70q8LFy4wxELXdBhjSDEiBS1mBnLVeqWcq51WYBg0acAHTywjmYItQhx6pOj1SyVBvXa+seSqC5vtz/nwRz4E3/ous2nD7MIhLjT8i//Ov/sH7pG/9Lf+V/z9f/dv8fZXvsaYIhbLcUl89Vvf5+rzj2GemKujTEpsHmz50le+xnD5kFRhXb1UUgvWrgLq9eOfsIwrksF67h6fsN5s6SZTJt5zNJ+hZIxcxfR/PASaUsQ7FXLHqL22/sl5qkEWJXiI2fEjNY3Ae68WZrkoq9dYtuuIMYHNZoPQcXjk2Z5EKJb1SlmzMUYcCdIUWwqXL13ihRd+j9X6lPc/+4R6dBY1NtiRjHJ9/iFH+j4Ro4rQxapZgWoDE5Mw0fu/Og/tCCk5ZT1B1FndzjQhF6FkIdtMFkOqs2nfTZi1LbYI3qkRd0yCwyJZZTopG7xtGfrErfsnNE547InnGIbMn/3pn8JRuHRhnxu37vDcc+/j7r3bdNMDNpvCpLFs1hsa32JswkliiJkuzHDGkfqe5ckpKf1zJjw9Wv9/vx668HljyFkH4yklXK7Uf3GMw0jrAwVoJ5aPffwzvPDCDVKCJAveuPEKR0cXuP7mde6Ywoc/9iHevP4G1658kPnkIuuYOTs94/qNt3jhxRc4OV1grWfaTWn252RJDKWQXO26ojCkTN9vGeIIOVNSxnaBlEaKUyiq
kWr27NWA2FoD3unmLoJzlpgizjqMrQa6phB8IYRW5xjOYihYA8F5DEp8EYTWe5xJiEScczX7rJJDxFKS/luKavGoYnqVc2UEFSK7NmCTECVWAX5VhGVhs9nWeQrAiJEW6wPL0wUheI3Ak9olG0sxiTiOrFaruscJYWpx1mC9pes6zozBiRbimNN51JS1jhwVNpas/o6xZIo3zPYPAPjcZz7Lql/z7f/T3+Hs5IzNuucn/s1/i8nhEf/4//B3uPvCi0hKpJwoUli1nu+kwv7/+7f5zE98gu6ZSyz6Bb/9K1/mpXsLNvNrrMee4/UpV6k1Tt5JhnjHmu6dpZBf4frN6xyvlmADw4P7TL3n8CMfqZabGWN9JXa8Mz+EP9wFxjjUrs9XPaY+XhmtepAopRoh1Kg8fYxlqKkTWWoEbRFW6xW5FLxr2W4y86caTh4oimGsoZjEmHsaZ0mD4KzjlRe+w6VLE65dvkpwQh6HihYoGxNjyUV/1hAzxXgtylhwpqawa0ENuRBcAZR0ZUBdaNADm0LkkNBZa9bBHMlCLNpVClENwws4EYYknByf4Z2n6yYIhlQGCpnVeqAfhNPTBQezKX4b+fSnP8mlywdslmckM+IbR9tMuNY+SzfbcnTlGiluOFtsMMXSuKBzSyLbIdLMIm+8/gazdsJ2dfKw29Sj9Wg91HrowmebRImBcRwxpua3qbyVaIVpG7Am89iTV5jMJ7z+xqtcuXYV1yTe//5neXDv27zyyosc7LcsN/f53W+/xuNX/xTv/8BHuXWy4uzsLogwm89p9zo2m577qzNWZaQxhmGzJlmorZ6eUtFOoLEOyYLLFqwO63Mp9FKwzuNwmJxVp2dVV2SN6p2KtJiYmLQTshGcbwgh4H3AGo/3oUbQCCEEGt/gq91VcI42gLURG5UoMsREyZDzQCYgJWEk16y0UgkRQhr19MyOSm70ZxjzDtxXqvmzNcow9MYy9gN91QmmGHHGsdls2ZtrYbICpIIXQ/AKU3rvqyTDnM+hVGCv17BpGpy1DMOgui6jUToPTk5ZrJasT+9ycb1hH9gmWD04xbnCE13Di6+8Rex7JsAbX/8amwfHSoocRxabLUPK3D3c5x8+uM83/q//iKu+Y1kMd+aB8dIl+pzZpp5N0hlikaK6xFJdUc4dR0yVgSi0t92sOVsv8I1HLMzbKZf29/Wgg0KgKaXz2d0fdgJ6h0QTgubEGXZ5e3Le2RnsORxtDJW0ove+c2oOLhWSbqcTjDNMpy12kxRuDx3rzQLfBMoYyWiqgviA5BVtY1kdL3juiYs0nRKfRIqSU4gY0+r7JfpexRSVtOJaIJ7PjVMZMWR8sDgreKcdYBJRQkqFqneuPrv7QozBo0i2KnkSOAOmwYrBJEEkU5xw7fKRwrlBI5s65ylYfHBYO+HalT29XsbyUz/9Oe5cf404GpabU5bbgc0Yca7h4Ogae3tPMPRLrm0jQz+yWQ2slmcMY4/1LUU6Gj8hD0L7KI/v0XqP10MXvl4W5HxINpa2ywSrJ8di/LmvoPGem28f00wO6OaHxCKMW+EX/vNf4eLRJX7mZ3+Or339d1j2GRfWvH77e7zwxkvsX3sfwWqXcff+qQZsWkd2gdwY2phpnWMMgKAkASPgNIUgjgqr9ds1TeOY+xYJnuwhuEBHTXZvHNlA8F7ZbsbgnWdzsuDC0RF4h28CzqkYeByFcVBYEBGKVYKI905hQ+toggrfCTo/GTbqllIk66ZlPcFYrNX4HZzFiiGnnjEmYnKs+g253zKbTwm+A1FGXq4BokbUjJgsOKPJ1scPHlBipGs7xmFkNlMavjGCbTxdmCtlHaBo0R6Gge1qhYBCnmlkHCPtdIJkLQ5ZCmerJbfv3eZsfcqk8ZihR0btGG7dvcOxZAQh+Ia1dPxH/9q/RClC5zvVeyJaSNOI8ZbeOVI759QIL0lWC6wBuL/AtzOKyWzTCMCDk2PyvbuM40Cs9mjb7ZZ+HOtsVpPZg/e4
xlW4MFOs4eRkJC43YD3dpKX1HoOhbVudsVpbC6Cck2iMoZIndnQeLbDqkalWXS4XvPeMMeOC53RxhvcBbwwxDYQm0I8Qx4wRh3MNzjsCAQisNz1jHkmM7B1eZdVnTle2ogiZg3nL4aSl+ILYhjElMpmmDXRtw6TrGFNBnMF5x2a7JY2GkjIlJ4KfMJt0bOJIcIa21fm0Nw3YBidAnf2mrLIM710tfLUIZu2ss9XCXozXey4KIg5nGyWPOShW47GKKJIxm3gMDcZ6EM/Hnv8Mb735Mqe3rtOYA968/irrfqCPI30aeOYDz3Pr7cDlS0f0y1Nu3rhB4+YM/QrnA/uHc7y3zOeHTENLPn3pn3Wfe7QerT+wHn7GFxucRIITzCikUoANiMc0liQWUwLZzDhZRlb9FttamtThXcO9e/c5vDVjtVww9Xus3RZ32NLtH5GGLVsjlRUJztWTqQWTVBgbTcYhuKDMNec8vumIsSBGXfS7Zs5YEu7wACeBJniCB4/BW48L+tWPg5IPrNBOJ2z7nm5vjnpZBjCJ48WKfpvJ/Ug/bpnMG4YUcV1H6z0BpzCm0VmQQQXH22HNZjNibEYkVlJQpgkNoYHQeI7251irFm25GPY2LacPElcuXSZnYUxq/yRSdC7DSCwJRyDGzOZ0zYWjOQeXLpL7qAzDUmXYvorYjTDGREqRsV9RNlswhmIc3YUD+n6NFcuWyL4xSsUvCQnwvTdeofWByWxOCJYJjps3b/EMcP2N17mXt/qzoiHiMNMpNte0myEjZahFemRsDN4VtssVwUaKSeqOkjtKGRjHxHazpd/2ANy6/zbb2xfwYeciogXLtI62cxpqKkIiUdJIQSONrHUkC2d5rYzFTcFKnXNZx9npKavlgieefIrD/QNmJw+4hIqvszSY7HAOhrgiZ+04bSlkrLJsk4a9piLcW94DbylJuHN8SjQBTGRk5PrpTcY4YEaHmJ5797e4ztM5j2/mNNMJ3/zyV1klg6HDuwLBEIOrWYQJJ46YG/q+0E0d4gSXe5xkstiqubMkk0m2gDWUogYHxXRgGpxHiV+V2Wm8xgZZBwWL+ADe19T1jDjNzKNMEBkrCgHlPJ2kgPdVxJ4RMXjjKBU18JJJqXDpyh7TMHL77btKxLGnHBxNMItI6KEthtN7byB5wt1XX+L0+JSjS5cZzYLp3gWeee4JmsmETc48ccGTNz3drdl7uOU9Wo/WDyJnsA3FKOEjJQGjxAdnDNYZtkNP7kdONqfcuH6PLhwgqSe4FpFIkpEbN26wt7ePiGc+aQjTPaxtyWOieNVQNd7hDHgsxnkyDiuWrvW4Tm23rLW0oQWBsN8RvEGy+iUuhy37+/sMayHlRDv1HM33IFfLJmvYbBPrzRJjLU1ozmHIdpckbiD4hnuLO5AyYxxoOs84DBwdWYWdnM7dxOk/cDjEQBNa7q5PCAHSOJDzQEyJowtXWPcr9vem58J2AOcMIThWmzVPT59RiHbRs+2H89+zm0xoWzWkhsA23eV4nUhDJhiP82CKCv2XZwuWmyUxJ5omMJ1MODo45MJjTxK6jjdvvY0JqrHLsdDMGjbbLZODi2qsbYSx9Ejc0o8KeZrTBeuTBwCkmCo7sOCd+jQiHhxISnXWqY4289kclgPDasmQB2Vd1lSNIvqeDcPAnbfv8MFOmXvG6r8fxkH1kUbwJqhtmt2xKCPGGLpOZ3Jx0Kgg19TnReFRodC0U/WcnDY0TFmPW4bjkf3TYy4Bx6sz3n7jTVo3YRLg7Tu3VMpgdsG0VQqwmxWKdvOlFLquox8HdB7b41vH/bP74A05ZSQk+iEiZJDM8uyYe2+8wYcef4x0UthuDTEu6VynnpmmSiiKyliKQMx6yLRkxJTzmZ06pKAuQk7lIGI0I7AUna9yLk0pdR64m5nq90oxWBGUFqXkHBG1I/S7GC+Dzo5FNP3BKmPYmog3QV2cEjz11DN88lMf49oTl1n3S9733FPEYWCIPavNQEyF
FBPWGYZk+coXv8GNV15nvem5fbrgo598nsmFI0I70UO1ZFb9hmG74bnDg3/Wfe7RerT+wHrowhdjxDrBuhZTVLeTi8U7T2gabBHG9ajcESl0BeJpRPYL0sq57syZFulaUhL8sicZyyiB/at7Om8Kgc4aAgYbWmzTMGk6vA9sc2RMhZTUBHjWdMzn+zgv5BQR41mul3QEHpzeI7mM62ZImak3ijMV7oksFwtm05mKaZ96isY4Si6VvabzkHHomU8mTGYTurZlc7xh6hq8rUxBY6oDBlAUHjPV1sxax7xJxGIYXMAYw3K15mBvSvABayojD4PxjqeefhqcwVjPZuhZbTZQY5OOmDOd7DGZdUgJBHvKtO3I6xXr9ZqSMn0cSAiz/QscXrzIdNrpIcJZSjbEJMQhsd707M8n2FjwxpFjwQfHxAf25/u03QTShpQGgrWUFGkQfc8BP4w0m5FiDZQBI+BLUeOAFGm8EjhSykixtLllM/RQFGoLrbJiYxxUb2Ytt27ewn/w/Xqfpcg4jqQ00HaBkhNDGeucUoX6UgqhadQuLBVKzoizlAJjyioHSYkoiWw9p2dnLE/PKDVN/drVq8ydlo+bx3e5NQ80dATGalem1nVZasgqWpRKzucsS3UvUYJV43QuGbwn5qj2YFUQfnR0yMm9e3SHAc/AB5+7wtHc89b9NSLhHQJOSeB2UGsCiXivRuwiYJzDFqnemvp+SMxQmZXqLGPVhsxESlEGa8mqTXRePTUlZ0DTao0ULXwGsiirdSdb0QBbte7D7MKdMwanB75UsAE+9fzHef75T7LfTVhuTjlbHDOkgXEcGYbEGAXXzDnbLHj5pTd5/bXXefmV1zm+c8J+07DZ9my955u//y0+8Ynn6doZ681I1zVMZi2ElhIezfgerfd2PbxJNU41cqhnJXbAJiALTWhYrbYE2+EYmc4Maej5l3/+b3Dzzl2++PVfI3SeWAolO/pc8MXRNnpKB8uFoz2KsQyDzmysMRwdXUIcNN6TsyX2mWHMbPu+dkCZvf0jSimEpiPFwlOPP84wFPrthkjk4oUZ1gQ0eFPDW4wLtE3HpYuXVOTudGZnncU4FQNjMtZbxGawqhck747BKE0cA9XE2FgN37RV6IsIk2BJfaLrJhhEE62rbgwxOKM55JO2YxICxlq1HTOQ00jJ2imIzGmaQEmCtTD2S8b1A/b3Z+x3Mx48OEV84OjgENvs6XwKLXY5qeF3n1TPVcrIerllnmHrPCMww+DJ2JzV/d85fDujMdCWTMpnLGuHOht60vEpCYOxlcGa87mfqncDyIDNhuFgn/XNE8ZVj6k+pCklZQ5iMWIYonB4dFGLI9pRxqgzxFJENWjs8hOVgl+kkLPm11ljCR4MQklq16WJHIYGB0nh6jSMlBTZlMx6u2Z/1J+3XC04Xc25dDDB+ECxSiAxRXS2ajTRTotfdTkpOvNNOdH6DkpEsrCNPcvFkrGPiFguX7rCKy89YLM549rhRaZdy7QJSA25FWv1nswJa8GIwrbOJKREZfOiQcZ6gWtavBgVi6NMT5OpGXhVNVqvNWgMUs4FLwEokGLN61MCGCVrZy0NzgqFqNZjtrq1ZHWBsUZwzrPZrDg8POBzP/bjPPf000gQhtRzf3lKTIVh7Ek54axn3I5cv3mPr33967x1/QZnJyvSmFhuetKYWMeRISY2y4zLwu/+5m8xjInp9JCDw31ssOxfuMTTP6ApwaP1aP1x6+EF7BlySTgrhG6KdYWmZFLpib1aIxUrSFNYjYKzga9++3c5PV5ydHiJ09P7NKEjx4JG5hVWmxVPXn6KVizTyYRsW85OHzAstxy0LRcuu5oYbbFOAy+3m5HT1RlHR3NSGrDWITVdoG20wIyigvWZC3QuIHjNrnMKBVkb8K5RFtu55ZPOMhKai5aNaIdYdXkpC2JdZT0anSsZq7l+oglnpjImmxCgJCZtYLFJeGcIDk0Xr3DZzlllR993ztdYFyVudE1L
zomxTzRV4mC9+s588vkP8tnmowxD4jvf+T6L0xVudoCxU5BEyZrWLUWUKGENJRuSJKSMWCccuICdBlabBRcvXiKlnn61IKdr5NKQijB66Kae6f4Bb33mk/Diq3z92gXevPoY0s0QMyq5xBnG5EhZ8MbjzJrtpuULf+0vcO9b/wFHzlcNnBDHTDEOKWrQLQaeff/7yHduAGrl5r0nF5UGeO//gKTBVUNuU6+hcxXqNYUiUdmJthZOMYxJ08NjyYxxpDWOzXqlxZGaR5AL2+2W7BIxRT0kjQlSQWwhx0K22vntjAGcszijzF9TMlLUsWXYaCebRuH7L7xMv2q4cPlQ/102BNvinAHWgHZyvqgZu3GhivJdhT3VIMEWIOvvXLIgeh6iRDU7CAiNMYzFkjCqZS2lSkJUFyoFLKlqJK12xVW+g+j9Yo3Cw7kowcei74Xkkc12zTNPP8XP/tRP8vRTTyNmZLPdMKTMNkXymDDFI6Zltdjya7/8S7z+8uscn5wwjgMH+wf4fmTYbOl8g5s1eCOIi+zvzckx0vcbGutZHd/j3s23mO7NaN++y0X7yLnl0Xpv10MXvpRAvNCnkby1TGYO8YbWtDrzMZm2g1wMfTYY57l1+21i2jKf7iNjw5gcBbXF8q6l5Mx6vWDv4IhUMmIUSolDYdtEXLCVa+fA2ipvE6ZhSiCQGShOT7jehtp9VW9Cr16KpliwQizqQmJz1kwxAVMcJgm2eYe+XonqWDGUnMilBQRrYj2FW1xRCr1UhqYVoTg1bi5ogkUeDVEcxjb0Y8RawYkgObKL8rHWKrnA6EndWYdDMCXRmCmjfYA1DUZC7TYAiYAQBxhjBh8ITUe/7bl3/y6+Ccy7CdYIw7bHY0gZWqMkJIbMh9//Ya7f+QaXrj7H7VdfwLSRYOe68RlhG0dkLLTB0k4d98+W/Om/8Jfh//af8aEf/XHujQtu3biLHTKPXZzS2sLdaNmagNgCxTCGjvWeo6lCfdd4xpi0i3CiekdvmE0PqsVWlS+wMzrWzttUGQNwbh3nLJhicBSsqP+oiMGYgHca5IozSMps4wjWkkftZGLMGGvYbDTVOzQe+pF1usdKFDIsRQ0BSimqnROn/qwUUiyI8cSsc6hh2DK3MJTC3v4FTK8BrOt+xVNXr/I2pzgvlBLwtsGWgk2ekgzF9pQBJFiCAdA09ZINJIfJA1aUNZRNQ5KoxuPjgESvnrMUTQkpI+IKkjRVwohBvNomFGPI1hJEZ+Y+GKwUAmr4UHLCyFjN1416kiLkuCWanieuPc0XfvjPc/nqEbGMLPszzRcsMEY1blhvBm68/Aq/+au/yivf/x6huQK+B5eY7hv2DlssU/YuzFj1mX6zYTJp2d8/ZHE2MKaMN4aSIzZbujZQUs+4yRT/KJ3h0Xpv1w/g1WkQF8B6UipsNlUjZQqzSYeIwTZeBbbjgDHCdrtl0gY26xXFFbZjpLUBSQ1jsrTesVieMp1fRIaEnUwx4mjaDush67EXZ1B4ByUZBN/i8JQ0aD5ftQ4rRWnnzgREHMaIOlhAPfVWM96SkZKIMdJk1SCVWoCMKZQslc6uIuWTk2MuX75CzllhKaenZdVz6b5sROoGbumHnmAcqzHSxwzekZIaTuus1FYI7x1XkXeWqRt+JgTPdp0QSRp6mw3OaD5gEQPW4ptAziNODLFfM+8uEdc9Ce0efWiYTDzWOwiOs8Uxt964xVmBn/nhz/HWdXW+dyXTtupvaq1llKzMyZzUxLvWpQ9+4nk++PlP8fbN29x9/Trf/soX6UoibrbkYIk+6/uF+jkaEbwPjDFC0aJkUUmC6xqMcaSdwP8PXI8dXFdp9qilmTX10CA1O88KFJ2Fmeo+Y6wBNHliGHr6PmoRSzpHW67W6ugDnJ2cMdoG42o3lxLeaBLHzozZFIU8hxix1b0oZ3VV6aYTSJHpZMJ2s+bu3Ts8+fQFJjNP16qFnCsFUzSdTsNr1dbMzJpqim3r76v6VBFH
LtULUwrJ6GzvZLFh3PbE7QakU5/XIowURiNEKWRjGEumNYaUkxJbrGdL4dg2mMYyc4n9PNCIJoSosYIhyS4TcIsIfPiDH+Fzn/whDi/OSblntT0ll8IwJHK2eNvSr7Z86cu/xRe//CXu375O2VKDbjd0jfAv/OyP03SGO28/gMctX/+9r5OBS5ePGPvIcnHGZp0I1d7NGMNs5qE4YuyZdx0h7w5Fj9aj9d6sh+/4ck9wDYhTm7DR4BtHihFrBnwXcE1H17SM4wM2qyXeQjZW08EbhQhTn5DiEFPjVLIjDSPD8ozWBrrG0wtYr0XUmV3XJ+oTWB1QUiqaO52VqYm1SBadxYk2fwR1aNGZHKSSMd6rjyPKRJ0fTGvlshV6LDhra5FVuHC9XnHt2mMVnqybFGr3ZFTUoCQI0aE/GJbbFfPZIVGMdlvGaT6fC5rO7fQ6qtdkxtmdSbMa/Ro7KMPRComohUEA47QBEqn+xJZUMnvTOa6rpthtgOARb3C50DpHb5T2nsXw03/mz3Nje8bk4kWGbc/0whGtn3Dp8kWs0Y51RMjiSKOhsVOcnwJQujlnpcMcPc6Hrz3LcjvypV/6+2xp8M4R04D1Vv8sHt94JVzkRKo+j9r1UfWfOz3dO39ajG7GpdR4HVONwHkntsjuJAc7VxstlBYlUW23PevNiu04sO0TBhVsp5zo+5HFQqHGg9mcfOVxMI6bb79RzQYqm5Wdi4nF4HDGEccR0ENSQY2picK9O7cI7Yynn3wCMWusTYQgBAMNqDRCNKHBWF9NrkX9NItK5RHBGkesszxT7/3VMLJYrbDiOJjNSZsN/WB1ltZ6vXfFIqOmplsx5wkMxlpiKayKcNx6UoFL2bBvHaYkjcSqioV+6Gmt5XOf+hyf/PTzTLqGOAws1mdghGGIGLE4abl/75hf/kd/j6999RucLpa005aSLGM0TLrAU8/O+KHPfoL50R42FA4uHLJZ9hxe/Wlisbzy4g2+860XoFTdrOjhITQBMWtA6IqllCX+kYD90XqP18NbljVKc06jUp+d90q2QAkafcwM25FWtKjMZ1PiOCDGEoeEawM+GFLuSTLQTVuaJnCwt0+OkdPj12njCjFH2u2J6vWsqfqwKo0mR4oxjEXorLLUbAjaHdRPsDVqNSYVuoNdMnmVEVhHjsJGNuRySJZagETnKtZX02KnTvbeO2IcaZqGnFSgLjVZ3VpfNy1l+nmjydN9GpiKQbIQtwOTdu/c7NhaW+n5VUxdyRj1iRW2s4Xlak1KQec07OaduuGKFVqjCQ9Hly6RUyFMpvreuCqIrM4v2+2W037DsOkZxsK4PyfMOnKB28fHmMmM0HZIpafGYcAZwSm3Fu8DN964DsAv/vKv8uYrN8AZ2jyQlyv6wwvKnPeeJjbknPCtJdDqYNMGrCTSGJWNmBJxjIw519BRw7BRskm/3bDdbnEugCm1qNna9SnJxViLq8QPEVedcjJpHBnitjJDE+O4YT1s2QwZ0k57CptSWK3UKWazXDGZ7pESbDcbCn3Vrkkl0RT9mIitMzehVDu2gwtzVstT5iJMJ46rVw9wJTJWONoZS4kjwXXsAs6tdXjrFL4sEWuDMntNACIxF1LKDLknlp6y9WxXkUk3ZdYaGqPXwQaH8RUWtkb9ZLNqYYNYitPCajGYLDTOsTo7ZghTbOp5ak+7XEQoJTLtWj7/2R/mYx/7KIJhzCOL5Rk5RcYCGE9OgeuvX+fXfuU3+P1vfUdJYKUwaQJ5HJDRUBLsH+3xgY9cIbTC62+8wfxgQkqWi4dXaLopL333FV55+S26dg8pSuhabx4wne0znU44ONrnsacucXgwZW9vn/lLN+CL3/8TbnGP1qP1h9fDJ7AXoc7BMVYNc73RofOm7zFtQy6R1bpn1hhSHFWAXROeTVRY1ASrWjCjJtHjuEHKQBDBTTy4jpgMjuk72ikn57l85ARBHVByThRTVEou5dyCCqObVoxKVLBm58Npqr5J
fUeNyYxVEybnp2xXKeBZE7GbwMVLl8g5M5lOSSnVmZzqocToFmyrADDGSFCaISLCpGkZa0KE1I1vHBPOVSivEl52/1mrGz5YUsyU0lBNp9SOTESPAAYlRFRRtbGeKFBsZrM+Y73uWa2XxM2KPBQ++8Of58Of/yi2nXKaE9Z42mSIOSPG6xzMgCtC7LcMfU9oOho/I8fE177yW/w8cHbvDg9mB0QjhNJjY8ZN9zCpMMaB4i2pgC2RW7duM+aIs571ekPOGV8ZsSJCHEbiOrFZLjk7UT/G1WrF/QcPkGKBynY0GutEdW1xIRB8Q7CGZF3NIoxITmCVbRljJMVETpmSIpLNuZ2ZeptWjVuVCBjbsH90xNli0FilUig5nVudFYGUUZJUyQzjhuM3T7h4tIeIMJ81OBtxBiSmWhQMlIxzRg3A2XX4RqOKjKcUHRnkpAblfb/lwckxMW3IJdO4Q6y0xFEwPiEWhZ2rJVvrHdYrmmCqM02WokXaOs3rK4ngPG61pnQ1qsi0DOPI1cuP8UM/9CmefvJphMI4LoklEjOkqI5BURKvvvYyv/gLv8grL76CZCH4TntKyXinEIsxgrSe+f6UVBLf+uaL+BCIw8jh4RXeeOU63/r2t/DNlNl0zmqxYRi3INB2lguX5xweXOKZZ6/RTAzjsOb1V1/jwo0Hf/Id7tF6tP6I9dCFr0jAgcJWUsgSsdJounSxkATLADkxpIKhOl3EpGR5U0kHFLrW0m9GOuNYbh/Q7nv29y4Q2il+csTZZgEp62yrnrZN7eKKMeRc1OwXdZYPaJ6YlAwm4byWwJydFjIEb5ySUKSoq37naL0npkQuoto80Fw1oz9zb2+PUp0/QvBM2j2mk1klwFALH9VU2QAZ30T29hpWi7nCa92EyXQGGC5cvISRwnY7Mp9Pa4KAXlMN/lQn/WiE5XqjDa6UOpPMOguydZZIte4KjjJElsen0Fhev3mdOPSUDG3T0HVwbe8CH/vgh3CXLpF60KRuoxE7znJ2dsJ+c5WcMsYLcbthXG0o7cBx2rJaLBm2SgaR5QrsgBGPJEPfWHXXIVFy5GA+Z3u2YrNZ88ad10AsKQ4q00ATJwyZNCZstDBmXnvpJS4377oVxVJKwtmCw1dBs876UkrIOCCiRttGCka0IKkxgFQpRKaIMi+DFwiO1EeGnFRHmNUObSyFWAom98xmM+7fT0rxryeRkjNjLJhpQJqWmPQQM25XPPXYE0x9w3qxUCs18RSjzjJGsiIGxp0jCSqKL4hYBjIz20JJOJewNpFzQ8yJvf0ZOTo22wExOp/19T4RqXZhO7ShaAakFEuOgB8R4ykpgYdoLdlDHHrsao1/cI/oCk9//Of4kU9+liefeIJU1vTDGRRhiANDHmnaCc5ZvvPN7/CPfvU3+P4LL9O5lsa0JJsYx+GccGWsEEJLcQ5PwJqBu7fu8eDtMy5ducpi7PnGV36b9VoN5H2zoXUN1gneW1KMBDehHxLL9X2++Y23CKElSSLnRHM2/rPuc4/Wo/UH1g8w43NgkqYY4DCi0T3FahBlieotGYLDBk9O1D6oYE1GjOajOePwxjNm0cw702LsBOcuMt9/nNUQyKOQXGJEo39cpa+nlFRTpJb7OOdINVSz5Lo5VncP7yzOe8bYV/2XfmE0mqdpG5wthBDOTYCNCGSh2EI76bjUNKzXK3IZ8c4zn83p2k6H95VAsYt2MShLc39/n/39C9y9/YrOcEx1VcmFtm0pUjg+ucdkco2mUUcSUx2S9XkEYzI5jbRdwBPU+kzecXvRpblxqjUrXLp8yDMfeJYHN26zimsKBp8jrqgWMWUNUTUuMKZCCJaSM6EYbr11i6cuXkVKIRLZDivM2NP30J9pPE6/3QCQZYQ06CHGqEMJacRkhQBjX+ialvVqgbdqmDxu1+eFuybkIKVgUGnLnXt3ME88pr+VFO3cROjHkVJGurY999kstdMuOWKM0/R6lbCh8KSaAmQc1gpt0O7Z
+cCaLcO2IAVSTWA4XS548NarxD5BWQF60IBCzpCL13xHrHa16y0mZebTlknw2LybPapcQ/+jDs1Mrgc/qeQnFZhnsyTlEaHFGBhHw4PjnvV2ATZxuDeFrBCzmlYLxleGq0IXKto3FXHYoRn1sVIyxVpKUqJRyOgc/dZtfu6nfpq/8pf+Eo9fOWCMa842D8hloEhmGEaCa5Bo+d1vfoN/+A9+mTdef4umndJY/bxLLkog2nXtKWHIWKsWd6vVltOTtbJVi+POrXvcvbem6/Y1N9GotjSWUQ9+rn4+EI5PjlksKyHKNjSTVhNUHtW9R+s9Xg+v4zNKjzYmYYrDlgBmJJdYgz8hGAhNqEGweiptvHYl/TBqdFCCgOAQQvC00wbbTrHuCMyMcRQNBM2JUbJCKJgagmkJTcBmZSy2rdeuD6oWz+l+U4SDvRmlFNrW451XHV4B4wyTruXgaB9rYDLt3gWRWmWLGtWThWDwwdZUdYuVps4MgWqNlYr6H9oqyDbGYp2nH0c+8YEPUPpB51HW4Nu27lsK0+pBYJdrXWoXmWkbTxMq489ZmsbjqpZpt/nt3EQoiQ+8/znSWGg7x9Q3xBBYx0E9GRHGNKijv1iGccRN1DZtHEdcEe7dvUdWt2LVceWRQMEUCN7SV5YuoPD02ZokDVhPZgSTtUOPkbEYUlGCisUheSSnXue1mHNRvv4yKh+ZHx5URxKgyidzyeqRWTRQWCRDKZqFiEokjGvI9QAiBqisXY2dyqSieYveNgoleyUCjTEzJGWJDnHk9PQ+03bOZNpwdnqmWtBxq7IWEwhtIOeB5emS+bTFG2gseKOyC/XIFHAWSYmcdrFFmRgzzgdi1ueLSfAhk7NgDdXhREjREropXSiARYpXl5VADSZWyU0umSIZMXW6bpyyZa1qT42tBumx0BhLXGyZtA0/9eM/yr/xr/63uHp5zjCuOVvfJkqEVGFdge1Q+MrXvsqv/9Kvc+/efUQM0+6QXCLGGsYx6z1bta/GGpq2oWs9OQ6cnZ7qPTII62I0TzAPtG1LLpm2dfU1Bi3ORVnSyk4F64wiKjETk9BvIsZY1ptHle/Rem/XD1D4Ejgh5UJOQw1aVaYlRinqTjxxCwQUliuZlERjfYLFOUPe9rTdFBl7cIILLfgG51qdVXnLZNrQ2jp3008EBpjNZoQQWC9XeO85ODykmagUIXiPpKyUEwOXL11QmKTVBG1jvDpmFCF4x/7BHhhDqLM4YwBR7RbWVsG1up6UghZgjwrQKVrojH6JSUiNSwLtOJxzhBCIMdG4oLqoGo6aUeYo5R1HEtCiJkXYm89pnFe4zhUmQfPPzLsidUAPAm0IlDHjO4tvwAdHKrrZjqN2L+P6DqvlhgtXDG3TKBu0unqkMbMeE/fXK468w/dbuq5jsTrFGS0Ys+mE5z/xUfitL0NJrO7doWsPEWcJrqBpUIZh6Bmt1RgojSFks17SOnO+sYHO1STrKV9cw/TwiM3yrP5mRQtiitjQ4JtA4wPjGOuVtVjf4MXgfdDNVUrNJVTnnSzKYBXv6VMiVNPpnTAeMecs2pz1njBWSTMpJXVDsZ40JrabDS5kuqnhYN+zWiV9/63T4NmSyVKZn1VDKlKjiqzmJIooiznlRDYdw7ClSKgSF4NrAlEc0XjmDTifiVLOJR6y4zULSMpVLqCQvYhXRMRAElFdqGTiqufStcf5sZ/9OT7x/IfIbFmNxyy2kWHU+KzgPcFazs7O+PVf/01++0tf4cG9E/bnBzjbknKkjyNCqjC71UQS58g5aUizs8R+ozrJqIkNy5RYe48RS7AF6zKh9QxRRfwlqS1cSsLQj5RiCK3DO8dmsVQTgNCci/m32+2fYGt7tB6t/+L18Hl8VrPtxAjkgqXgHBi7K3oOZzzDkEljxjYObwpGMsY1OOMIDQyjYxgL1rekMjDZP8C0R+r6IUITDHSeztVsOTif7zjvmPop88kE7736IYomqqeUquGuwpZtCCRn
8c5owRH1zxTJ1bCXc8JEqcnnqtqrJ+Aqn8j1OXU8GTXPzqp8we5gtyK4mg8otm7sABLU01R0/mi9zgGdUy/JnYi9+lCRs1L3fWi0kJNViJ/knXw6qPNEZapeuXyJdJQZxoHJ/lSrDYIUNc4ORgXJMUWc0aQBEJIUYh4Z40heW27eukNJidY3kEe2OSNFPRMlRp544nEAfuInf4z9puXmjTvcvXOLrmuwzR5ni7Pa0RWMaIiuOo3kypp15zAcIjjvsd7TTPbQ3I135AyFKr0oBRvqjGx3ravjTXEB6xvysEVypuAoWW3LEpn53hw7bTk5OSOmouxf0QBbwZyHzVqDur9Yy9D3OBuwVv02lbFouHTpgOXmAbYLQCYbIRWNztIbo855q05RStGfUwopZqwLxO1AzHDn/oLNuMbYicoOZNT3pmmwjQFfEBlJZa1WafXwZyqasbuGxnqNkhKUdTxGnQcbzQ/8y3/5v8InPvkJxhI57U9JozJ6Mep1KmPixRde4Utf/CLf+/aLjENm0k3Ym82IqVeHGARbiWUp68kwxhGJpmZERuIm0tgdgxR1OjKWcdB7OYrgJ5koAyIW70KdWasptnNeDZJQFyTJmUkTGI3qJYN3TKfTH3hje7QerX/aenhyy5gZBbAOZz0m1SBXKzgTyFHAZXxrIGbdUerMw2SD9QISGUtC+lEh0cYyYrDicRikWJwxHB3t4asjSzHqz+iNIRnB4hGrsKKts7Ud+mhMfbwxOLFkZ1UiYJNuxqVUE18hDYVcBtrgaILHGgulho5SIAknqxXb7YZxHLDO0VnHbG/G/uFRdRpR3aDBUKoua7FasdpEYhx54/p1Yo50LtBg6KYT2qAQrfpJ7k7zpSYW6NwTQbshAx6nUKvWRi3MtRPwzmvOn8uEiWfSTUhEkhVK0dcjFop1arnlHGOB1Pestyt+/2tfppsEGmPxfU/cLMluhnUj33/jDYYkfPojH8ZgGcuuMLU8+eSzPPX0s+SS+Z3f+R1OT5bVd1RhaRU2AkU7cSMqvzCiUByChv62mgo/9iMGhXIVIISEZUwJk5esTYP3NXmAjMjAoAROLFm7dTR9vojBOq/3n+iGGkIgSiE0OmPMY2So+YLGaeErMaqpwRgZ4oY9s4e10IaAE4eJE1wTkOw1tggt4NqdZlwXMJQK2421g+zIAqvVllLnWsv1grPFijFNCcWRQbsmUwgUSvHnCRXFCL52+saIyiG8WqxZ4xVlKEKSjK860ojwc3/2z/HBD7yP0+EeYiBuCjbpzPS119/g97/xu7z44gscn57hxdE1E6xE0qg5gFIlRFIEXwDRSCxlxOqsVsTQ91s1ezDKwE3FId5iJeOKoYhDpDBmQ2c1f9AAznpy3lLyqFrIegA1WBrn8b6hCZ5uMsUai7t1/Cfd3x6tR+uPXA9d+NrgNETVN+rYsB5VkJvrKTxbEmCCpXUqPC4u4MTiTCaNK3o2ZGOZTjq8F0xowLQ43yFFSDlj24Zsg5IZjK1uK7WTK1CsocSoydLGEEKgHobJVidL2Rhu3bvPOg7MfODC5YuqbSvqbbhcLbh16x4gHB3sc+3qFdKIdiNeJ2M5C2fHC+Kw5XR9RtO2zJuWbtKiKNmOGm9UNG8VwkxFWK+3DGPkzt27xDzSiGHiG7rpFI/w9NOPn/t6aveYMKJ+nCkVxjHS9wPWerw3zGYGXzfyXCn5GChFpRXW+Cp+7ijeUfTojFpgOcJkwne/+3t859U3OV1ueOull5AxcfFowuyg47CxPHnyNvv3buD9HOssZ6/fQErhQtfSz/ewx6cANPcXzLt7RCcMkwnPXnqMO5stWdUcnB2fkY1j7TyrszWf+OinuXnzdRbLY0wxpMHy2OOP8dgzV4kMvP7qTczQk6pp9LYYNtmoxELAO0c/ZGY+IFmLXCmJYGAcFX7v+5HGdnhjddYmsNxskM1GI4xcULnApKVrO/rN6XnaRIlCiQPJWBoP5MT+dIIvnpQj1jcI
LTDQGY/EBEVw1mGcICkCCYzXn20MxkEUuH+6pi+FMVsET9M4yipiosdKTVgQVYRKZfRqZ6sHRqmOMaYeeAxAUYF6dS9ArKWIzjS9GEosvPjSSyyOT1n3JyxXKxbHC+7evMX3X3mNVYIQ4OqFfT7xqU9CSmxWp2zXW5bLnmEbiTEpXCzvEMLEKEqirFSDkHCumkoUtSokBC3QyWoPb4t6xKbMKIXgLc55rFiM1T9zSUgtdi44vFhcCGRJPHhwQhwK89UjqPPRem/XwwvYa1fjcJRscaHFNQMxJ2WxebDeEaZT3GatlHJ2ac8JFzzFHLEZEylnQtswnR9RmNCPicaq6XUcB45PlwTnuHjpQDsGo7O3zdBzvFgjeVCvQec5Ojjk6PCAnYEwxhEL3D8+4Wyz4Ggy5/DiRYpRDVUqEUE4PTslDSOzrkOyDuyLEcQqFd4Yx9npAmuEfqNao9UQOTw6OH89ItoFFWVW6L83lvWmZ7PZcu3yjLvHp3TdlPVqy2bTM5u0xJhwrjl3itkx5HYsm8ViwYP7x8Qx0jQdBxemPHb1Eq4SRHYswpgzy+WKvh+xtuHgAIxvFDbcDS0dzGYN4+aEuO0hC3szsFOjCRti+Klvvshf/I3fxr/LGuov1z8zcOPJJ/C1y/zCf/qfEbuGHDzHj10lGUNpPMN0grOWN966ydJbfv3Zp/hHf///Q+oTRQZs64gR2tDxvve9n2wHyFv2Jw4ZFmxXyhq9ffsOt1wgWI8VaJugLOFtwlCwTrWY1hqysWQKxjmWqw2ttVr0naGeTrBAPyacdazXA0eHF4hjZDLpABiHVE3CDeSMqZKbUgZlK7oG3xiKxKrLqy4DRg8hFocUB9KqHGEcyVLY9iOLTU/MWWeGuHOxuFQrN8zOGcZUraY9f+5SFPouWS30cq6zYFGz7BizxmcZoBTVY9ai+OrLb/DCd15gsbxPGhPeWtrGcXjlMWQ78vi1C1w9mmBKpvGB+Z4hpz22feH0wYrtcstitWCzURhZSql1VtQUoZh6AEtYtANOxhBFod5GGpwRYKv60JSw6AwPIyTR9Iu26TBpxFglb+UsDCUT+8QQE+OYdTZf7A++sz1aj9Y/ZT104XNqbgRZGY+2sWTX04QWSqs5dMHimpayydVAVxMLpFiGDXTTOV27xbop3WRG0x7g/T4DhRgzGEhx5MG9OwQX2Nvv8N7qh00MJQur5Zr1+gQpkb3ZHtPZXGc21XVeiQuOzWrJ8ckD5lebc1gxl2rkZAxjjHgrKnCv/otgKiSqRVLDVu05iy0E7Rxi0jN6NrvDAFAlF94GvA2UlLh0OGV9ZpjPWu71S6zp6Dd6zawx58L1XcVTGzRAMiVtWC7uA4GmewpNHaU+TmeMfYo8ODtTJqLxLNcbrAtqCZdRggIwGssHn3yax5/+IF//vW9z/95d8hCxbYASOVys8LnwjZ/5KV7abmisIfUDT50u+JFXXuOZGzfP74NrN9/572dffBmAZC2/9ud+mmHacfWpJ3kyRvqPf4TNbMbqdMV3vvciSSwuGErekMYTpnsTGuv5yPvfhxHDLGvH15meqeu1shtLihs8U5abpcJqwWM1EJFsHTZAFzqGzZZNHNUk2xlcaHR+REUGfKfuPzExxsxsbw7A3sEeZ8bWnL2iEHGBMSWCBesLoNq9QtDMRKNkGWfUS7Nk2GwGNtuknw9fcCYQc1LyS+7x5z6tBmMVxTA7J6HMuUzDGBVjqF7P1rpWtDgbS6yoBeJgB/cnxX0HSbjSsDnpibawXRUO54dcONrHNpnff+VVcvZ0RuFIbMEQgA4otJ3l8tUp+cLIbNGxPFuwWfVsVxvGpAcCFU1UNMZYneMKCA7ra5jtLh7JJLxraAgYsTX1weIbQxqVERvqZ6/kRN9vcdmQxFIIGGtwLlFKfPgd7dF6tB5iPXwQrXgsHZmMo1DSjv7ukESlfmsoq9/rMGPGxRGfqR/UzLbfYt0G0+yzHhzNrKUkZahRMilG
shiG9YbtmDHPPKGGwEZ1cyKF7WqlobMlk9JILqMKwEU3B6utEF03Y39aU7/LO+QQ4x0ihkkTFOIaNXFht9GUDDrBKfjGqlN9UXG5a4wKn3fGwmKxIpoQUM2TNUEbTDXJjnGAYKF1+OAJVb+W045so69LXCUT1HP/4XxCYzu2vXY+xkjVQelsVbsNhbzGbUJYMwyDFv9sFVaUAWcDBqEfLN/67mvcunEHm2eYbIkZdd+pG8vNLnBj7xJmGGkvX+VJext4jZcuXWQ1m/KZN6/zzWefJU6nTGLk9lOPcSWOfPK3vsLUF8rBFNsn2tWao/mMyV7L1YM9zk5OuPngjMl8wtF+y+OPzcgxY2XGGIXJdEpzT6v6xWmD7CubdMyZUTzkTOscFohS3WpyIeVIHLass2BHNUvIojCd85WIMkZs8Spz8FYJT8Zy59ZtAE5OT9juzWlDQ+NhHDOtQ9FLI5ga5SQiGvnjVKcn2bEdhWEYySazWgxsU2Q+mbHvwFVrs1QEsupKDR5TdP5rzFR9bNNO46eBujlnslVDbRGFz3NRslBKmhUYk3ZhRhSW71zgdHumsGc2GBm1MFvLvbv3ufP2Ldop7O9P2Z8eMWsCqR+w3pBNQbIWUmMdPjhC4zjqHNP9OdvFhs1yw9npGZv1kpJ07hirA1KpmkgRJbtplqWmozjjNb6rWHKBlKI+f/AK3FrIUT/3FoNNhZxEB5lGcDZDieTySM7waL236+FZna4l0VJsBFsgG0zqEAwej29airXklBGTlM4vLV1j6YcV+IxtDcHMmOwdUNo5Yj3kgqSMuGohJYbWd4zbDTYXSsxk3glwbZuANZq+PW0bLJmU4nmuHmQslmw9zjTnJIudoXTMgncNrbMsVkNl3+nErkimTtkB7RBDaFRPFmMlVmgsTi5qDGyzhsNGAayewL23WOcYU0ayqVE2BmsLOUb0RfkKeSkkWSSfQ1zWWoL3OIm0zVSP1OrQSBFDLvp/3jicWGbNlG7iuXfWg69z0ZxwQZi2hqb0dDbg54es5/d4a30MBtrgISYmuZJlTo/JsxnOCKn3vO/f/p/Af+3n+SDw1T/7Z+DN65Aieb3k2o//OJ/4238Xjo7gwx/m4mrFpPHYvtCsNly9eZtxPsFg+dCmZ77acKGZcM127L15C49FimMYM13Xcnj/FIBJmDHzUwp6IInZIS20gKfgpSjZps/MG8+QoDghWUsKgcPDKwjC2XLBmHqKA1fUkk2CIRYYN7HOyGAcRsaUcd5gStSuhKwdZ3XKwQrGecR4kiTiOLLdJJbDAusStiSc81AKYneBucouHUVwriGlLc6Br/O5IgkXOkIUIqmG03plhdb7wmLUBMBoEKypJgwKjWqWnoimwjsy3uo8zTqdgbaNw0wbmjBhOnfgDKHOf41VJyOREWs9Rf30lImaM7hAO/O0oWU2ndG0gcWpZ3W6Ig2qJaRYJQ95KMVQUlFzAsna7TpDSoU4FkLo8M5QkppMWNSc3VqnM/oU8Tid4VtXGcgGxNO2k3+WPe7RerT+0Hr4wmeUZq2Qoho559xgnUbl5JTJGFxwpCGTks71egMSPI0ofV/8lCiFrlErsVzUbqzIQCnKVgu+YWRDEbXsKkUFwgWdafjQMm4jKaI6KahEE6mbj544pTr+W1HGobNedWL66TyPGcJoxIxBGX4KIXkQjzWBXExlK+7obrvNTVRYXGE52W1KotZpm6Q+la4oUFwAciblrOSIem2NtRijRIcduaHkCrsZ+04nKajgXcx5moF6kg7M92p4qYfp/hxyRvKGYAxTb1me3GWxXrC8d4xHcDj8qCSkbQVc+7MFLY6NF579iZ/kyl/9K7DdwmSihQX49I2b8PGPw0c/rt87OgLgI7/2tT9wv3zsN778A9+MsW0Y5lOsd5RUcK5h2IyEFnzraRxq7pwN4tCMRd/gHJTs2WxHpCSMtXRNYG8+V+lKSRgsRRyzgwNef+MtnQejkHGq6fSWSB2aIUYo
1pNNoNiOoayJmw3bYSDKSBK4e/eUo8OGSVPh51Ih1TKQq5lBMAYTdMPf5fsZh2bmZfWR1XQCfb/r6O98lleymqSH1oCNlWSi966cm71WPamMKJc34Iwwm06ZtlWy4bUQq+OQvhYoauZu9WfnkiAldUOqKSXeWFzXYi9fYLI3o5mdsjhZwmZDjAkrel/u5AwFcKFCt5WZm7KyPptQD8Zh97qrz60pGGdUyoOj4IlJkKKv0/lHM75H671dDy9nIGK8wVTmobcJ8RbfBUwpbNYbfDOHApIjwQeGMiLJMvUBhhHrGmIJiJLxsCK4evR2Vu2/xFbz6aJCYmcKpbITVXRc1ODXa/p1jrkWzh0Djcp/MyTJOiswpjrzF4y3512etVBKPv8NDZZcFDZ1Tj/IOccqcNcPZs5ZLbUINQxCVD+Fmm4j+pzWZoYYGZPo8H9M+JDf0adVfYJQqvhdfT8xCmemlPV07yClvJP68c5UsEKrkjVI1Gh+4Hw6IzzWcPbgPuTEpcN9TBqAxDw0jM2EVb+k1Igiax2bJgDQX9hH/IRuOuPP/1v/Njf/k/+Ep372Z+HZZxmC45effYbHrlzl4//Kv8L6d79Gt16xiwh98c/8abYXD3GbDX615o3PfJy0N2UYCq/evM/pJnGwf8ATVw7YbwutNRSxrLY9k67Fe8t2tk+6vIfzjlwKeYTtNpGNwUxarFPtZMrCehjYrNY8/eRljCRlRpbMdrMmhEAbAsF31d814o1hWI+c3rtLqsnvuux5MbDV8No5pzZxBvpkGJY9GDg+WTBESxb9EvEYFxAGSs40vqV1AeJG71djIaq1mlRbOpwjS9LO3ntSTY1w1iqxpuSKLOghKJfIZN7ig77PxRSKjrI1qR2rxavofW1twrtJVburIYRzDkykpOoxi7JFrVH9ZMy5uuiUeh0sxe0GyhZjNS6pnU+40DRM5vucHT9gfbYg9SMx6usyeJxT/amUyHYbialgTQf1UOycfjVOt55YMi5FcgbjnEqKkh5CjVE3Gu8fJbA/Wu/tevh0BpPAWqydIKkgOUI2xDKQBBrnsc4xm3TYrmG92mJcINmI7xpMaBikID7gmhkxGYITilHoTz+UKnL2jSM0munWGK9EBtHuzjpNiUhjhqIWWOVd4m5Tha+2ZufFkogV2pEqrHY2IEULba4elkDVL1UpRolgEmnMmFp4s6hIXWFJe154FL1S0swuDLdIUUgLg2auGxrj6FOPMZw7thijNlS2xsikKn7PuQanmlJFzFrQVcydK8NO6e/OG01GKMK42vD6629Cjjz1+IXKpJtig8dLg/OWKFukCJ1v2ZYBU6FO5nNGO+Ov/c//F5y9fYvr/9F/qIUPwEDcn/P4X/qLrBYLXv72d/kInBe+W7Zh4Rsee3wff++Y+08/Rtqfs1hEbpsJtxcbDvYOsY9fJu5BazKpGNbbgdlsirdQiqNDDxrOqRWY856cd96QGjIrBhIFnCZddEEz+bxVI/Paf7HYrCjWsenXTFvH1FJNCcr5e+6cA3IlUKiwvWRHHAYImf5EX4t3WS33ygzJjWrSXCGVwrSxOqMqjlRGshnJ3jOmamqAoVT0NBYhih4go7GMxoLx2AKtU2OIXQlCCkEyrS1QIqaG3+ai97yiIYVcEqU4nGlwwaqtIAFr9eOtLjIWTzg3ukacfgaoEhCUsYk1FKNfVO9bTc/S+7XtWrqmw7eOpgmsTxesVisdF5QKtZqIpIw3HtdaDA0+KBGpaTzBh8pqLVhTyNEQR0MTGmXRFoPxI7FkjHM4E/4ke9uj9Wj9F66HxhCKzTUMVbVnxgqegok663I2YIthHDb0cYtYw5iSaq+soZvPaaczpnt7uMbi8diyc1IRdUt518C+SNbEgqIEmupBrVTqVDAl0WSHGSv6qDQycspYsQSrhsp9GvV5rYrbRRTmE7Hn7h0576KJ1NnC4lW861VnZHYsdqPBtuqPqBZkCVNhKSX4KAXCaadoHU4M29S/
a8ZYKjdOExcoRnVNaENgxJKTEBHGkqBEhFFLpzoh1xmPIQkYAmIyY9Zi7EOHtYZYIqMVonOsk+BcYG8yIRZD8IKVqC4eUudOwHC25aM//pM8+5nP8KX/y39M9/jj4HXzbK5e4ZlP/xAXn36Gl37rH9dg2NoVPP00tukYhoh3oY5VdYPtY4+YXXK8EkZclWPsOIJU/1EnIGp8WqN6CjrNzEr0NxaxjmDVe1XhMHAOPQiIwYnHW6vXII9Kjy+iUohgtRiU3TxXu+keSCT6vtAPkWHINL7BYhljJlVXHus6hR9rMcOj3WIpZArbYdAYLiMkRkXGrSWlSMkGktrnxZzJGawYeloEYW8WmE4crfPn8LYtnmDApAGXBS8BJwFvgl5fa8nWUGphMqLxTeLUY9VaV784f79Mjb0y1mJs0MOs0a1AgLLT66mTG9nqQUv9agu2sqa76YSDKxc5evwyR1eO2JtPsRiipDoOUSKZ85bQQtsaus4riSi0GB/Aad4mVvWapVCdeYQQHC7UYGV/3p4/Wo/We7IeXs7gGlIOlKT6qYJTOzJjSdXeCjFMg6dpWprg6DdbYhy4dHQRP5my2vT044jYjJUEWY2vMY4Q1PtzlEgsmjsWc6YIDGOia0LthAzBerbDhsF7GqOJB9rh6elWs9563eysI6WsDDFrMEVnBwh03RRbaeZNYyq5RAf0VMePUhKTyQRjPD54nHfEmPA2VDnCjlQjUNmhuWR88Iwx0njPduhpbGCMmr02xkRwTint9frKrnhTafzJELMmDATR1+UqBV79PjX6xpQCZaQJU2K/pGkCkrXbvHnzDvcenDDrply8cJmcI3uzA95e3tCiZQzBeWXdAtuTM/YvXsA3DT//7/9v/sD7/2P/4X/M9b/7d3He8/mf/6/D8TFcvKTf/KVf4pn/4H9Nc+stSq96PK2J6sySS2SzXjOfzXQTzqmmNFR4uwjFgNt1ybngXaiWZ4lM1C6+0QPHTvxhrME6tc8yViUmUgreaaJCrqnnkgsS1f5NilLxXYXashGSydpdZM1g3BXlqkfXjgmLJAEGCj1Z86v0VpEqbbEQmoZ+WzDFQFbWo/eWMmpa+2w65UBm3L6zosiGwoB3gbaBHLfYUrBWIUdj3pU8r+NrnZGJfnRLMTivhBRTrdyk3j9SoftdZ2upz2XrHWeoxhA1n68o7FmKeqVqTsgOxtfnbozHGk9fEs57WqsG6l3TsmkWtNKTTk8ocVsTJNVBKHghNI4maAiv9406MeWBISUN0g2GMSp7tvWOXLTAWm9Jm/XDblOP1qP1UOsHmPFpQkIRDUKNRXU4znqsMYhVVxeMZb3aqM2TwOWDi7ShZZuzRsekgg0GGLR7yQ7jYBhHptVcOEoiW+1+SlboM8vOOQJcLkwmDdI5kud8BgfUDV2hSm8NoaaoG6idmc4BrfPMmn3AnoeNSj0VK3KqsUfe66nUB6cMO+d0N6wb+65wFSlAqh1IpmunpGEEa/DW442l6zp8QDPdaHF1rqKddHmHoCOwHYQxNgqdttUdxlAfV+c3VcC9HjKyyeAchqhRQeIgAdlRjDCdTLnkL3D3wSkWT5FIzhpfs0M6J/szXvjSr3L7tZcI7YyP9SMf+h//m3DlCt/59/89Hnzxy5xe2GNmhHY7cPlf/pdojo7gb/5Nbt6/w1txw/uvXKwsWg2kTcWw3W7ZrFdIuaxJAyTt7ihKRpKMOoAbpfVLjQ2qTF1b3XhCReOwDlN22kvON3sMqskUIVQdHz4w9lsMluCUbFWqd6jeMMLBwSE+ZspmWzWdOuNyxuKqVZukgsRKsbeRJCqstqjIP4rgO7U1o1coswkqzPbeE61BjLDebDg5iZXEBMb0eNsq29Ki0qCaKShmN3WjnozqjNdYcnHkpB6xiBp1ixhM0ah367XwYNUFpxSjJB+jOY6merhi9d7Sf3seqYxxqLCm6CyYJDRqHsq5zl7QIjbXz0UTImbasTg7pl8tMVGLr/eO
6bTVhBLX6LZjDF0IeugR9UW1ZH2NvsGUtt7uGc7n8I/Wo/XerIcvfMbgEYxEnDUkm4kRitid7zElJXwMpJiqrq7QTS9gjccUPXl774mi5I+CxRRf5w515iaCtYZJN6EMmWCU4We07aENOutpmoBrPU3X6EyksuAUsszszWcgkTp6003eFNWBG890b04pPT74yj7LKi6u43/vA1euXKbfrthOOooIs/m+GuZWI2Bb1JB557pSJDKZzLj2mOHB/SWzicWFhkEMEyyzi3O65oi2a5XRmkc15zW2eh2bKmwuepoPc5yzSpKo0JM1cj4jjMUwFsE0czaj4eLVxzk9e+18Y9dcQhXU52FAiArLFT3Ju5yVfl/f42Z/xo3bt7j7+ptMJ4dcOz6D/+5/B4A7X/oSm1df5+y2ZeYa5gkOhpEG4Jd+ifHZJ7EHU6yz2o0aELEcHF5m/zTDzROC1w2viCir0dSMuR2TsYBx+hixgVgGsoDdbb6iRsZRLGksqALEUYplO4w8WG64dNhqV5iVqSlWasp8ndEWoe97hkG1i5Iyp3fPmPmGiatSBtNSSHhUoG3U26ASl5Rxa0CT1nNT33uNbOrHQdPdRbujnDL9MFCqpnNMEcTgjcMUT5sNV4+Pufj2iWpBE1gSLm05OziiNY6WdzSb003PcoycpKTQY6lemaIHAFsPEM5oMaRC7FqAAqCuNzuUGqh63B1crxpRU80VJO9Yy2bX/lbRemWd1iioMO2w3ZToLbnN+Jlle7olDpmuc3RtwLoG9WRVIX5MCesC3reUGiE1psQ4DCrncJaU07uOl4/Wo/XerIcufH3yTNEUdr3zwXhRokV1MJGY8JOWoRQK6qxeLGz7QU/jBVajRo4Ydro1nZEImlYdvGfStHhnaF3AZHBeN3/vDY9fuchqu8QA08mM/YPDynSsOxOGUjL7+zPa1lX4R5UIO+NfRLh45RKIzt5yKTRNW23ICqCRM947ZvMZs2mHMRbjvM73eAdSPSfG1IKdU8EFAMtjj12mMYEklolYYqPaO+OcQpeuEggqLV0DaQ1t1zCbz0jJ421mPmmxJtQNaAfpgTdC47UzaSczNusFQ47a7I0R32jx60JLjiNv3HqVbd/rid14hbJSqt0qxHVE2pbGtQoJS4HnnuN3r1zi7OJFde9Jltj3bMaRv/dv/A/42Nzx6e9/H/O+p7RLKeWckWt8yzgYhl4IvqVyUTVE1VYma01r2G3PuYg6d2TD4888x/3vvYQV1dfFoWBtIISW+WSP4/6YFAvJWFbbzFmfODSO9fKUaYEYE6vtmcJ3zmq+o9NTWonVDSQmfMpIHmCSSHmBmKCuJBWuMxUpSCmDNBgJen1x5wcz7zty1LxDlRsUTs/OKmrgGfpNRQlUjuJKS1hb/vS3Xucv/OZX8WV3/NCVrOE3fvhjxPmctvH1PoGh77kn8AuPXQGXKJma6qGJHFLUBNqYuGvMzuffu/sUdocnW2FMqfCmwqox1SmmMTXlA4qDZIAh672xi6g35zbrFJs5urDP/oUpw3bJdnrGsE060/ceY7xuOVZZzrt65mrShslRf59YsDZjEbytIdOP1qP1Hq6HZ3XmQjQjKWkkisXWoblQzplfoid012Ak452nMZqnth0HDdZ0kNEPpatFxIinhB6xG0qZcrh/iUBiMpnhorIXhhRpraOzATc7RGyhCa1G0aC6LpcE5xWKbWrxNNbigoplaxgRBhRmMWBM1pNlfOeDbn014s1J5y1WDaVLpYHnqq1ip+XLVN8ybS8tDSVrXEywlpAtQpVjiCZxwy4CSfPdMO48f61pJzz5VM10y4Lf5TPV78ecEWcZNytIA09cO+LO7ds8fe0am1XPd7LF40gmI/3IatNz9+yMkCBbT8FhC4gVkonYuunGswWyNyUaMEEz9uCdI4XNA+2mUGzgwnxC88Ql5OxMX5sxUEzt3Ap521PmQl8KtpswPTwiyUjfn9G2gskGnJBz0kDWEqtwudAn
RzSZWbCaP9d40jiwTUKYd9iuxe615ONCKp6UPG8/OOFkcUYTDDYYNpsVxjXcuP4WFw/niEy5dzpyePEyJTpcNZqzacTFDXStMghdi8GSYk8MLbkUvDMkEXKAYrWTmkw6nasFxxhBhsJ2vaWbGBXnS6kSk6wG0tYRszCd7bFY9PSpkNcr/P1TfBG+8vnnWR7uQ04cnS347Ne/RzaWzXRKChVUz4Y4bJluI9PsOC5Ao8XL+l1qhGD9gFHrGf1dRI0PitVZuGCxtjIlrd6PtjoCiWSct+czWrEaE6UMUKmWarp0tq5wtDKdd52jZdJNmV6E7bpHXAMSKMYRc1H0ISVMMpQckawuNdiGtjEaYRQtMSY1As/pYbepR+vReqj18AJ2yVgDodHUc4ynyKizhWwwuHfMflHnim2fuH98nyNzAec1eshlnRsYdmOLndFuRnImlkFhI8mkVEhNq95+wSmRoQg+BFKJVb+np1hlc5oqiPeUUigm4K0lbEZcHMkUiqt+nsbXaUbGmAFnPaUojGODJboGM50g1ZWjFHWjKaVgrWeXgF4qzKksUaWRKxtRIDqStzin3aTJghiHTk9USqGOM/Vi1A3GWKedhknYEACl4NtafI2zbLZbjk9PkDJy7fASR7NnaWcHvH79LsFpjFM0IL7BSiGXSEqFTSmIhTSOeGNwJZ9rKZ/OEb9e6PvrE5PtAPXVuizMDFxqGk6jsDeZcEIiuN2cjXP9oc5MFQLeQYqhbTHO0seBedcw5EROCeMNsRSwgWyUvTsmWOeelBMpjbjGE1Nhsx6ZdHNaIAs43xCLUPqB48WSxWYgxXuErmHot8ynczbLJZcvzFhvVxyfbCiiXfC6pkEgkIcNzhlKbpWRmsGKq2xJBTgE/n/s/Xm0ZfdV34t+fs1qdnfaqjrVqlqVVOpsybLkHtvCGGywwSZAwPASCE0AkwR8L5DcC4TcwMu7CQkmQGgSwDgB04Nb3MiWJXfq+65Uqr7q1Ol3u5pfd//4rXNKHiH3Wu/qjTveGDU1NEo6tdc+a6+91m/+5pzfBpEIskwTQqyKJoVF6+gt6Ah4GQioSBVA0um0mjkm8RmRCdaAdVDWEzrdbGs2/azPuWQyXF1w69wsAKXS1GkKaYjPjJLUaYosLLgQPfF0iALsmxQZBEEIREMBCKJpFIbNtihoKZsuTexwRBGGTS1YsWXzFSu6aIsU5+SiGW83Ppjes6kiExoFIiEi2EaRkOZdUp1TWE/wmnFhsMbjjGtoSJtzdZoZrY6bD61ia1vEefCVgu9KvNTxNSe+FCBELT0pk+if17g/B0D4CHTxtkYrQVFZ8ryFcZZRMUai4y5fKJTWYKNBqVA+koWtohhYKmeoig0SmdCZ6zGyFVOpJFdtAp7C1JTDGqUCSWJJkpS8mwONioqLD/W4qFkfbpCXBW+7/yGmNvpbBHEfoCyjU4GUkrzVCFkTZ0heQd3ucvzYjQzqmsp6grAkiaLVbtHtdNENFDtLBFo3ElLSEaxmNB6x/fwyZriCTXWE4AvIVJz7tHKNkpsgBhrZMhHbqIE40ZEO76u4kEqNlgLhN1F5gjAesfb8CaYltM+ej6oXczvYeX6ZQ2trdFVg7CvarR7JGLZhmGovMKgczy4tgquRzrIjTWnXFVYIvuGp43/nd9+TkrnRiMwZfJbQRiOGipn+gKw/AKCz0We6KukFT+iPmTp3iWq9hhKSQcV0VTBnO0wZiVgsaKUaayzBQpa3SfMODoVwltonzHjDvK05tLoCazXtVBO8Iq88ab9PvrpOa2mF2apkvLHBno0VpicTcrmpHxIIfome98wL6PU6tAuPPbfBPgWzg3jec5eWwVU4B525aeSgz1RnQiKIcz6dUM90WJpOmVSuAXZoyspTWQdSU1QO61NavRzrajACZwWFqTHO0O+PUA4KbxmUInoJSoVKwbq4uRgPx1SJotdJoh7s+9/PO9/7XgA++s/+EcNzZ0mUpnKBzMu4uQpQN5uv
0MiZCaGbuZuKiiyxAY0goBwgPMF5nIyI6SgA30iNuSjULaRsZu5RIQbhGwSzaFRa4KsslISMbVZhcA1hMWu3SZzFUVO4CmMNtjaNW1Yj6ebCZdSqiPZKwRHFATQNbciAuiJSfSVe2viaE1/ccUaZMXycDbkIj4w3LlHwVkShd2Zn5rHG02q3CISothI0NnjSRgdRECW4QjA4kxBqiRYVQnmcjdJeNjh61tISCus8SyurjPpDtPRkeZtWu8dCth2dqAg6UBqH4MLSMsVwxOG1ZV71oT9F+xe/bXwFf/Kij/l/Ol4DvOclfs8ji4sc+bv+4vGntv7z9i/e+1V/deNnX7xk2f8T8Y8+9dn/y9fUScK//en3sq5nKQYF3kZSvUqjKXJVe5ROKUpL7UtaUlEVlomx5O08dhp8hPYb49Eqw8kaa8xWxdeZypE75xBuxOy1N8GP/Ai2KtFZjnWBunbIzOMakKN1IFSGli18HXl3Ymtj11BriB2VKOXXgFdEw9fzsXMQq7qmirM0urA07gs0en9RtcgLGf86vj3QcGGbeajSMUdpnZDoCMgqyhrrHcaamDwb8QvvbWPnpaLUn4/VcvMjNiHccf54RbLsSry08TUnPmvFlkO4NQbpJWJT1oj4QMnGCFYnGVNTs1SlQUioXImtHVoEvJIoH9XqAfA0grREJJods2OmRzGCjZV1ZLdN8BGZZyUMB0OK0Rhfl+StimJcsbB7O/hNMS+Jc46WSli5tIo/fxbtPY+9+Y30dyxEOH1wrCwvU4zGZGmbhV1zJKkCn6ClwFUTsrLiS4eOcHEwpq5qvKuZm5tBCMGRI4fJ0qyBeweyLCXgUGqCKWBldZ3nT59harrLuD9CIUALZuZmCDawb/d2ZuemYqtHCKLui4ytHR9YXd9gY30daQQCTZILdu6co91uMRyPKMYj1lYXEWtDdH/C3unZCE7opJwta/o6RQG1H9Fu5yzM9Zib2cbJUwW1Vdz3+P0E4cl8IEm6LMzk3Pay6xhMAovDisJXpDLw8qri9l/7TR4+dowzxRjtodYenbQ4tH8PaXeKvD9k72c+yyM3Xseo1WJ++xxjJTh9/VEuDWoKq5lU0QaolWsSbajLDVINWud4D512Dy3TSOaXhtHYsbaxhqAm2BB1JoPHWYlqMP/BOkaTippoCjvEgBXIOiKQTfCoVNPKWoREYpVGu5yqdLzmFdex8rk7+eGPf5Rff8vrWe7OUU4Cr7zjDtKZHplIwXlqWzJ34QKv+aVf4NbDt7Bjfi8f+9PfpZ+NGNaWKtQIDOVkgjMBJyBQEtKMugokrRypJaPJGF9aHJI8naEYOawvGAxcbDECVem4cH6F6bkOL//pn4df+iXKH/xHdHfvoSwK+sMRRVHiJwXaBlyQCJXhK8EwjEmTlIYciVJRes15T9CbWE2JCZakoeKERuzbh6hjK7cyJ1gb1YycEDhvGuBWElu2AWrjUCKgdZTnix0KjXORv5ikmtpUJA3PtawrTG1xVkRRem8bFxNJVGsSL1BPEg1wpjnrRvzgSlyJlzK+5sQXhCUEjTNxRiBEiO4DmwkMSZbkCBdI2ileeoSM6iauqhEIqmBJVQ4hYC3Efp9HBocPKhJxfWChE9X5z5+7SM4c9dxsNBh1Hh+iWWhwvpkTRP8vC9GHDUeCQgmNrdnawY5mpxns3E4I8Kof/wm6O3ehs4yi32fl8cc48fEPR+AOkFZd7GjCpb1XcXzpQtyNWsdgdhofLJ0Du+l0pghWkCeaREftTSEFk0nB+U6L58Yj9uzbycriGmmo8EYz2b2DxfNLtI/uJ8zORa6dsITgqMqADwqdevqjHZx8/nlWzl/Eh4JDC3vpXnuYPoGVpTVcr0Nn/zb6T50ktMasjsZ0EsFM8Iw6GW5mH0sabDqm1U4p2i1aOw7w7EN/hUlbnNi1G2xFjkOIwLf8+Hs5t3iR9ZFj0fVYSzO6dsS2tbPcDvzt7TfzxcE6cuRo792G7ua86fab
2dW7iqmzp9n7mc9y9tvewvreg5xJMzYm62xMLJOeZGIUXmR0Du5npVgnmDVGfWi3NZlsIbMuavtuOirH4GhpEEVNPh4yWFtlMqkJ2qHzDFfWaBzGVkz6Y/ojR7+cYPB42cU7SSoSslYblymCUtGSyELtPE5BMbK8/nWv4eSDDwJwpjvNxZkZth25ivUb3oAVBTJtk4RAXVeUoQ3AelEwLONGotUV1JMEZxQz09P0XYVqT1MbQ5K2aCvNSNfITodBP4n3ipxQ1wqjogB2knjanZwQGpSwLWhlOd/+r36Zyflz5L/4i4gf+kEAWq02namc6V6XjapAW48PCmsUSgqSNEEqhTENJ1QGHALvBdI3xHUVNWMVARWi4wONdFsIkeohAKEUztVYo6hqy7hYYzgZctXeq7lw6SLz22Y5c3aZqVaHmRmJ1hlnz15gYdcsOpNMSklLSyrnUCGCcYKL1kreV0iqmD2tjHzSBnIWRGxRNxR8EA1ewNOMIa7ElXjp4mtOfHs3JmyvDNYaIKBFJO1a76Mag1RIIWh32mTdnKyVMaoqkl4PhWQwGuOVQBJJ4QEFQqGVQAtP6QKq14XJiPbE4KTgyMY6oqrZk7dI+iO0ECSrGxTjEXVVoiaKTrvL3PHnSPIcMCgBiBy1OsGvbzC/ug5Aa20Dk6RIISmfPc6ZD3+YujIc+/vfxYE3vglOn+TiJz8FUpJUNWo8Yde585iNNQC8sczUlrVUYuqAywACSIfSFuEVzgequkJIgdaaJEupvWPXwgLLFwZIUkTQWOPwjV4jDa9p8eIF7n/wId729jdRFiWJ1rjg6c1OMbIlpTUIBMVkQifLee65JzEXlnBrEzo6xXckZtInn+uhVE0eEpb7nl57ltpLLq5vUPf72NyCnEF4gRfQmerw/MVzTMYDLpxdJJk/SCJm6Y9qFs+sAHBmMGGYtskyT10FhPbsvPZ66oGkbsXEkO/cD/O7WF5bZ2ltiDMWdBcXJCrXDMuS/voG3cyTZS3SRKGEQsiYlFQj4eaEJM1SrInfVVXVeNoc2nktqRS0kprjzz1K7TxVXZGmKalSrPdj9Z52OnihCMZjaxvnV42Nk9JRzsw6yyaW3toSi6c/7lNV6/SmugTpo5RWkpLlabx/EAhhsViQUTGlpTRSRH1Nj2Nhz04mw2VcWVPXdcNV8+Aa9KMPjEcjgm0hakEqJUJEcnar3eGWd347e172cu796X/Kqw8ejKLNQHfbdgZL5zHGNNO6gHMGnecgTJznhUhWF0LgTGhmyBLXzNFCiF5/SsTnzziDDBYVNDIECJFT6xoB7EtLF7HGoaRmqjuDTiRlWeDsLJ12j0lR06pz+utDEpnjKsXaSp/5+RlSFSjrGh+iq7o3l+2KnHfR7SQqxENwDd8QhNgEuvim9bmJsrlS8V2Jlza+tsQnRPsPaAAu/38aL//sXZf/548+xJ7ZWZiZgZe/DPbv58BX7ufAf/vjrzrm5o997L97nypJ+MChQ4yFIgiFtwmucihhcXiqqkI2WpGNcjJWBoLSIDOUzBE0btyNeoYxhl279tJuHafVahE2RtHOSEp0mmIRVMYx2+txYP9BHrr3Qc6cvoRYH5EaQY2nsmBGQzIp6GVDKpFy6tnjaC85dfJ57njDq1FBUuMxBFohIcsSZCL50gN3c/VV+8hlSRidZzbRrBlDUUeqwt6ZHhtJYBhGVJOK2akZQOG1apIIOCFpT0+xur4StSq9J5EKgWZ1sM7qYMB0O8EmAW8dfhLl3KQRlK2ayhraEkbjionztLIMKRWVcczs3M3Utqt4+tFHEW6dNJ/CihFeOWpT4WqDlklsw7moeBJMFEiQOrb2JFEbUwnBk08+i7Ux8WmV0MpSbDHhwa98iqJvmJ6bJWkLRnXFkaV1vgU4/tC9PLV9B142MzMhsM4hiFB+6wzLq0tgxnR1GoHKHlxjr6OaFrto5m3BerSIykQAZWFJelOoJOXVv/Ib8CvQbu65N//cL/Pp
X/rnLD1yP8AWfSAaMBto3EkCEuejnqwXodG1jVxZqRQBF/0ovcCjoyuE9zGtBBrD2+g0sn3nDjKlWbkwQSdQ2ZpOp0NZGCbjCUmeMCkrqrLEm0DwbWpnOHbdYYaLywSZNejShlzvwpaQdUA0CjsRhOQbJ4poGdb8K+SWvq280uq8Ei9xfK0VX5YCfzMzzaW8RUDGSg8ut0xEIzElJEpLtHMkBD43O0s6NUNnepqJKZDeopMElSaINAHVkGhdgtctZJAM11aiA7rwhDRhYd9VZGlOcI7K1Kz31/HOkkhJjmDn/kOYRJLUY8ZDi7WQtXPOnj/LkUtn+YkvfJn7X/caBjt3bqlvvOH3fo90ehqAi5/7HM+uL+O//V0IIUmrkmRS8uhNL+PUxmrkQlWGI8WY7/+LvyBZX2NtZgYXJN5oSGm4Xr6hO0SdSC00UkkmpsSqgFURum+8pb+xgWzU8oUCERwLCwv0B32ctY0bvKOTdZkYQ20E/Y0xF88s8ujDT2PsBGECefBslENSoclUi7VLIxbEJXrT89x0ww2MiiGHDuxkx/QsF3WLRAjmgsOlguAMthBMLcwz1ZlhvD7C1RMmZ59jPKxI6oh8vDWVvOm1t6Mqzz2fuptpBLMPPoKzCvVsBLhMVtewC32O7tlBsXiaoq4R9YhSZAQR6E5FInZtNlAhoBJNmuTkaY+p3iwyBNzKClnqaU116c3OcfHiJZzQyFaLIjgOHTuGMBPuf+BLlL6FSATjyTAiDUNKosDVbku5Riq1JWEmUHgjSDTcd9+9XNNIlqVpGs2DZSBLSgpb4StF5auIYi5ixe+qPtUgI0lSvAmU44pBXXLVwvQmvRJr7FYiscZCpps2XXROdwhQMqqeZCn2sjcSipoL997NZzcW6QyH3PbU85S/9Vvk8/M88J9/g+XjT29Zb0mxqTDUgFQQW0IMk7rGeWj3IqViNDFkSR5Vl/BYX1GUJZVPsAh6bYXysRo0KHAWnEMlGuMsxaTCWsNc2ibPW8iQMjPdpT3VppqUzE/PMx6PsM4yNzvFnl07+fjnH2D3kYPU4xWsrfHWYqqminOb1Ae/hRaVUYI1IlB9I+YQ4hxyc/Z3Ja7ESxlfO6oTuJDnnG61sC7ww3/8R+y+7hhpq8VodZXHP/FxPvyvfxGcQ2pFYgNT1vG4ThAqJXEBqxJSrUizhLSVk7RzRCvFS4GoJWl7jrH19FOHrgxKSIwQTLYvMDM7S/CWwbigPzXFZDCkozUdJTG79xBaGZkZsbZsSHRK0k44LwTZeAhAMTfHcGFHbP/YwJ3v/484JXnFu9/Nwutfz8Xnn+XiI48CAm8sdqPPpf1XcTpPkEiqoqA7iU7QQkg63S4uePIEui2BKQN16RrAgEfrFIkiTXOElDgK0CFWCUqhFHSz1pYJjbcJ/f6Aup6PKhdEr7y6tuRZh+efPc1gbYUTzz5PXRicjRxK6SvSBHTQtPMpyqJmY30AImdcK9KO4siRfYxF4LQOzGybInMSbSy+FrggGa/VfGXlSbyNQAWHZC5vkZaWOkl4y3/6A/hPfwDAK/+O+8IKwb4PfwI+/0W0FEyP+hilGeo2g6zNPceuJ09TismQdktQFaaB1SfUdUlhzpMnitngUUIwHg3YGA7xAmrv6U7nbN8+xefvvJvTz52M0mdBUlvQaYtiMokzVhtIlUBKjSPqxHgXtrzwBBCkIckVfhArVYlgXFWRA2cmGCyDwtHJcmxZsbp2CYDSFThTo1OQyKiP6aITgdKaujLkrR7B1xjb0B6I/Drf6I4KDWYcFYtKJ1jbGLBJmjvaEoxPP0N59mnC0jo8/Bz2V38VgIuPPMh4ZYU0i9qtPgRcCCCjiHn0ahcoGagCVBbaUiGkZFJ60rzFcDIkmBFTbYGUjnGpGJeOTp6R6QykZ2NYEmzJTDvDecfS0hJO5kidsrY2QHmFloAQDAcTsIFhXeC8I01Tvv2d7+CTH/0w48EEX5UM15fxIYlTOy+aak82
0mdNe/OrirmoGyrEJtI00p3klYLvSrzE8aISn7W20eEMnHvicR78m78G5/i6H/xBXvcPv5+lE8f50h98gBAgixL32NpSj8ek3qAyTakE0koYTfDGkoceKs2jUK2z5LpDvwJRG2Q7J5cKPKwPRmgddQRjm0gRrMcmmtW1EWCYn5JMT2cgwYQSxOVZjilrJpMicpS8YPjMM6ysriIIvO2f/3N2vuI2Tt57H3ioi4q8qqIVUgOUkbJGNjOX2jtW++topTFaMlivsYWLO1sMSbfL3v2HSfOUbfOaNCgWx4+BDOgQsNaipaByBVIlOCuoi4rVpSXwh7B1FZVyAKlTVpf7XDpxmnIyZFQUWGNxddQxDSLBudiiKushQjmWN2oqPyYwod3OwMH0bE175yGcDiTrK3hn6eY562vrpHOz1IWhI9uMpaSSNbKeUJQFH/ixH+XSeMjuVsp1X/gyr3jo0f+OGqJD4NavfDWdwUjBR1/7GnIPbQeDtXW0sPhMkbWn0DIlybq0O704F5aSlYuLDMZD9uzfw8LuBda+dC8Iz+lnH6MjYPH8GZJ2ii0rpGxQ9aqF1rB7bYM500cnSdTWjHkBh8ZJBcozbrW41Gqh8XQaArupPV4leO3QBtIsMDETMHWE9zda1sY5vIKkEWl2oUKFEu+jY7mUhv7qgFYbRGYxyiOCI1MqgjhEwIci2nfpgE4EnW4L32lhpeB1Dzz1VdevFoL/+q3fjNo5RZpmBAfSAS4Ke9vgqYIlMRZPiyQEEhlwSjOYCKTLULlB6ozC1KAF26fnaauAFRCMwouE/sSxWo5ptbt42qgsxYohGselS+s4p3HCEXyU3Qt2FSMTvA7IUBNMRlGv8pZveCNPPvUw/X6fkKfIoNC0mdQl1io2Lf88ouGqAkLgmtljfE4bW6VGJ1RIFxOjupL5rsRLGy8q8UUtRUcA/uoX/yWdmRlavR43fdPbWLj6aggKqaIZZoRtRu3KqigwpkTnmqSVIrwhRyFrx7i20OkgpcDWQ5K2QAePN5agFEFEzzYZFKpBVwrr8dZSY+llXXSSQpDRNDNVoBXCBIKwIBu5IxkVSHZefxP7b7+NxaefYddoyM3veAcAgwvnIwdKBpSiccR28ZkTnqKYNG2Y6OyAjE7SeEt/eQkp23RaLSpTU5uaRGU4I5EixTtHlrZQMsE7h0BgakNwEodBheibdvDwvgbirbHW0ulMc/7MMpcWN7DDPnU1wDtHVVZINDYEtIrmuuDiXK0BxWxsDAh4RmPN+nBA1r7I3qv2Mj8/Q3+wRFlUKKfoZV3mU8mF06c4Oa6oWpI0b+PTafZcfTX3nl3lbFmxrRywp6rQ3vPJm25gNU0ZbwxJlSTLFarVBp0SBMyPJ7z5wYcRzoEGUxtsltBqJWQ6QyWSqrKM1/qcPbvIxsYYaoH2ihtvmOfS6jqXLi1FayYPYTBk2N9Aas+4mKCbzYtrnBx2D4f82sc+TPp/0RYzQvA3L7uVcaqZmzT2SaZkONyAXCKVQHnF3vUBs0tjgoEdDTjq2rUBebGI9RM67YwFIxmN1tgfJPODdSZeMrYpZq7FZGEKgiMEFz+79QglsNYihMbamtldM6R6RJFmfPLNryVUFVIrgnRUVc2liWXUyplms5UZhdQbLfOmhSqiqXLjzOAJpEmClpbBZEBL5ug0o6wrZmc1wtcIDzrVmLqg25unOy1YLcZoHcEnWvpI6TGBmdlteK8pjUGnrbjh8TVWSrw3qGAIvs32Tpcd83OsnTpPK83xwmJChU5AVOBdBPAEovC0dw3YxW8aLMf2plQKU3tCAKXEltWSuFLyXYmXOF5U4gvBY2qDkJGz97Ofu4vu3BwA9//FX/DlP/kQQsYHSEiPVAGZK2rjEM5TVzWZcgSXQJBIHZUiglQIoVBUVKGPFh4vJFIHJsWIJPFYX+ItYB15miMIZGkzAJcG72tqI6lCQNQZxlhaWZdOpwPEh8+amslGn+k9e9h7880IKRmvrfLE
R/6Gx/76r2h8YfGlRZU1deXI8w7W1LTyNmEcfcEmZcXS8irdbptenuCkoK4LUq0pyxrvHCsrZ7lqYQ8X11ZRzkOoyKoxeTEmHxWRkhFMnEMRaR37Z6aQwwI5qhg8f45Jv8/Kxjpja3GTCdZU2Ma/rqxqUp1GDUXrUAo8Emck1pkGZAOiUsjKoauK2taYejeHDxxhbWWVlZUht776lbzqVdczvfse8naXC2fOc+r5ZfqiR+e1ryT87b2kJ/qIpGZ6agoAM9tjnKVcKiu+53f/M7uuO4bOW0w21jn9pS/y4L/6hfg6YwmJx1hDt9thx8wUctznqS/dy8zsNF4opIMdSYJMJXUx4cjOa9i9YwfSOr5w71foVGOqqmTPju3c9/BDJIki1Ibg4gzPEZirDGkIfObQEZanplDQ6E06pEyoq8DceMgdx59GuUAto4MBQFoUtLN4P/qi4mBp+Jd//Yn/Dsj1PZ+8+2t6Riqt+YUfejcbJLGF3fBKEx0XcaWiP+GB/cdILz4KwTNKJVWnhwgKkSpGwwF9M0H4y23M4D3IqPzjfUVVVXhvwesG8xjhIamW5KlEKk1dWvK0zaBfIebyqKUpJUFKskxTlGPWQ6Az1SXgSBQIYhK1TtKdnmd1dYOiNuRpj227dlKVA0g1xcYGxdoyQsPBPbspNjZwlSFRChksJTWksXJzwTStTseWDcfWmgI0SNVIczKIxi9RyrgZ+SoniStxJV6CeFGJzzkPOkKnrXf8zg/8ALMLC7zpR36Ym9/xDh79xCd44hOfQIrGecE5kjylLPskRO+wGoNu9fBJRuEMoXRIJKnKUbnDuCLykAQkJgpHGFNibHy0a+vwProkCB9i63U8pN3qUFcOaxy1KaNjgU7IdcTGOePwpmb5+JP8zc/+LM5CXZVIHcjyFKUzEBoXiA+wsRhTo7IMYzzTU3MkZQ1ANZxQhKjMn0mN1DndVoo3UULNKkmW5mgqem1JKlPk0HLdPV/itrNLdI8/2nTQRHTAlmqrktZSMUbRXVunKiesBsendm6j8rHK8c5HQm+SRJFra7G+RmvZmK8Gon+7x8vG2d3aaEVkai54RS/rsG/vVQS1gmhlLE/GtLfP0VEpSgQuXDzL1I6DyPE6ZbFMYIyrJ1Q2ymvVpmIcHPuvPky1vMT9v3MXQimue9ff44Z3fCvDxx6D+x+Ji6jw1HXB9oXt7N8+z5HPPMaxpx6n7R1CaaSK0mVpqkmShHDyKerpKSSBl9c1R4qSMk04+8QT1JOSXAr2bgxJ1gcYAkIn7FteBqCUsNjKyXWCNRXGVkz3Zqllzf7VJQBuXL3EYB06Vfwu73j2BDekKVWeUeYpe0cTUuC5qSkuakX36iPc/JX7ePaON/LkxgZBWKSWeAPWxIRuxwOUDZybneeH7vkyndEEm3ebiiXqxgoRfRyrIho299c36AlB8IEWEltVJCqDWpDUjtxFf/rcWDJf4cqaJJVgPVExJQqle2dj2gg0GqkeLQOpyhgON5jqtVBGYkuHzWOb03uPc4Zuu4cLDu8Nxld08hZlUUOQoNrILGNuZ06vqiCfQXamSLMcmaRYl6JEipc5Mm+xeOEck5VR5D4SmJ1rU+MI0kdbFhF1Ql2j+xl1ekVjthuTu/eOPE+33ELEVsa7kvmuxEsbLyrxQSRp++AhCE585QvRochbvv+3f5fb/9538OQnPhkV6RF4K2i3pkhHNco6rKkQoaZ0Y1QnQvW996iqRmqoS02dSEwQEdrsQDhBORnjSbEhELRGqNC4bgucscxMTyNURtLpUpUbtDJHt5MjnKN76UJz6g4pPNYBQqOEQGqF0lG5XjSQ62hKBEoKvK8pa4FUGamOWqQA1WgQARZVh2R2HuMLNu26rTEonZO3OlhraCUZeE87T5ldWefV99yDdv/nxppGCJ689hgl0KsMWRWYSBn95bxoWGMB50UU97ZmC7KuZQYixBnYpgUN4OsIiNhY2+DZZ08wKsccOnI13VYb
HVLWF0fc98STHLz6AAeuuYrMOlYf+xLzSQULXY5tP8BV6Rm492Gmd+zkhsNH8EFx/p670abGJgmT1VVm9l0VwRw0IuQhkGUZl1YuoYbr7Hvqab75iWdI/gdtSSMl5/7xjzHJU/qDDYpz59ipBe19ezh7aZlXf+nLfNsXv/R3Hv/W555jvqqotebGb/5mOvPz6CTBFCXmzGn4oz/i8IWzX3XM687F+6MG/vDm6xk2XbWy1yFIzWh2Hu68k4O33srhPGeyscbJL97NXb/x69RVSZ5Fg9XWpOBkngHRmDZ6QNZUpkSg4z0mIsI0zXIWT5+iM2cZtdp0K0MneCQFxcjQweDLiiRVdMpAqhyUBm1jRTRMcowSTKWRR0jwWBfVf5LgSbRDC8XCthmUlExPRT/Goa+ppMJ6jxQKESztrM1gOKTXTnDe4RHULhLK01YHZw2hsnSmp7AiRaGZOIfP2mgktfWs98eI8RAzHuArQ1kYxtumaLUSxqEksigbNGrj9eca9GYM0VR58fpIAVo17iQBtgatV+JKvETx4hJf49F27ZvfzCvf/W6ev+9eEII3fv8PAHD+8SexxmK9RQVJK0BVG7I0I0kl/UnABUcwlvGkoKcThIicNiWgripsSLECEiHIdEo5KZnKWhjfRkkwvqauCpIkI1WC6W27uPGW17NRGAprmckdu+a7XLVnB2fOnabT8MySTOODI0174BPqugAiz0pqhfGeYX/E7NQMSmvGq0NEEAidUlc12ktss9ZKX5OFilzFmWfeyqjrCZ3WFF5IVJ4yLAxJmpCpnDr4qCpTGbRz/OWBg4y370KlrcYsVqAkdNs56aWLvPnBh/GVYaAkM8Hhnd+UAicgsN7iG7PaTTsZfEBKj/P1C3b/EILF+bJxHHcQLMOR5OKiwFmHtzWHD+7n1KmzXLi0Qn8y4fDVRyg2VjDaIBINdc72q27g7Oce5QCQdufpBx1RenXgrf/qX5M2LeXn7/ocz/3ZX/BaIvoVoij5mTMn6e7YQRgNSULg7pcf43hVkwuNzFOyrM1CZXjNgw9y5q5PM0w1OsvpBce8zrjw8U+yf2WNgxfOk4TAZ7fNczrNecUtr6D/xBO87uQJNBE8MVEpa5OS02ceIdGSA9dfT/e66+B7v5fTH/sbBnPzVOcucutgwN03HGNQ17z92RNkUkArInfrxmJnefESPPwwT3z5i4zaCdd969/jpnd+OyvPn+T+P/8QY+toORBCb82iytKQTEUeJp0cL2BUFKATprZNce7CBnnlsVnKXYf3sm/HTqgdZ54/QcgypmYTVodD0vYUO3dldPM2i+cHdDs5K6uLVCGjWh0SXEBnmkSlgNmyB8tzQbQlChjvSNtRVMFahZeSRKcU/TGZMtTjDdpZVFtx1iGkoK4cSE+ioSoCYSIoBwX5XBvnHEJ4sjQlINHU4CxOJAQVGIz71ENDWdR0Wz0QAinARHvi5p9GmDpc5lISokWXbEj4EGfy3tFoel6JK/HSxYub8TUtlcHqKruvPcbLvultSKXoLy7yyfe/n4/97/8O52yEqsu4izPFhF4nxwdDR+aMh7FesZVBjkuyPEMIKMsoRtvJM7pCoKxDb6ww7z312dMoPYVUnoXds9zwyptJ0hbOelYuXIL+WYrRhP76mAMvO4qphtRlG2fGFKMITpif7cGeBc6cWSJvTZPmGZqscXk2oDXBTcAHUp2w3h+wtHiR+RtfwVjX5Jmk1cz4kBlOZ6gkwVmLbloztTGkaUoQgrqqkJ0cled453EmD38rygABAABJREFU+gsCDOdnKQ/sYceRY9z697+DnUePYKuaE5//PPf87M/wZsB5i2+4Zh6iW7UUhGAxvkQogSNaHAklGzBFQCmPdwn4BITH+yq+gQoR5SoEVSnpr3pylTDc6HPi2eNIKVnrDwi6xfOnV0hdTdKa55Xf9C3Mzu/ho3/2h+yvIhJyY73PIGtH5RkhuOe3fpu80+GGb3orB177Os686Y3w
6CNxP99YRk1N9xDCkei4e1hJUlZ1RqgM3/Xb/5mdR6+NSiSXLvH6v/xL5PveB3W9de/deM9Xi16/fmWVbGEXgyee5GUf+H24/nrodHjF6ioX77yLR/71v0EpjUolC0eO0mnY4LY3jV1Y4Oi7/x5IxWt/9EcpCfChPyH/0z/ADSLoxboQzYudg5/8Sdbe/U4GO7YxuLTI3P6D+OCj2S4RaBmL+Pg4zV/aoF0LvJWkWUK9NCbLMwywa+8M00cPocqCxbNPMxiWzJPw/PmL7Kgrep0UxmPSUUEnSWhtlHQzqDZGtKoJYdgHlyJWh0wT6LUFM0UR52iJIB9NwEceaSC6R1SVYyPVrHZbaASJ0uxe2IlEUJkChCSRHp1IglRgPV5EbV7jJFneo6492nqkTuikLZTwuNpQVzVUAZH2CKqPCqC1pOgPCbNdlFRIqUF4nPcoGRoXCACJEAopEpyP2qFaKWhEvWnufnUl712JlzheJLgFgvecefBBfumON6GUisRg16jMBwnCIRtCtkDiqgqTCowwtFpdqknAmDHBe8oyynC120kEZDhBr5zw9lPn2eE9M/Pbmer1aBUWJSekicQcP4995iwhKKSQZGkCPIib7jH5ujuwgwEiEehn+xxxFTON2e3cyhKZEky5mtQlVFaytr7GaDBg21SP4CXp2pCp4Yi2FtRlycbSKnuTlMnY4K3A2WiP0pvfzURJtvXmqa2l126zsrRB0m2RpSlWBKqyxCUdJpWhcJJZkbPVsgkOYya88cd/iO627Xzlgx9g59FruP6b30519jT86I9FnzQnMNZhrMNKH01ZvUU3Mvwh+EYFg4jKAZxzeCejY7wMDTDAN7Y1PiJIgwAbWF1cZrixwcrSJWbn5uh1OgxGFSbU3Hj1AcYqsFgE1s6u4Y2mUdfCFgXFcIBONMLDhfsfxwZDquHWH/ghjnzrO+FXfxXRgBiMMYSg6W+sb6lwlLVDdqeYX5hh9cRzrN3zWWYGBQe+/TuQ730vK+fPcfaP/hupFtjRmKdbbVTS4qa1dY6uraCBpN1mGRg9/gS9P/iDCI746Z/hwPd8F2tPH+e6/bvIOx0YjSjuuovWhz7E9M2vYPvR68hrC3f+Lc9qxe73vAfe+14OVUOe++AH44ORtXC1QWUxY772d3+fbGYGgOc++zmWH3yYVrvHqD+MiEprcY3V0T+968svdobw//MoteZffO+30p+bQuOpihEeSNsJSiR4Y/DCUzuHlJEor5KUmR0dUplg8RgJxntUIMqkYRFKUTuJqQVpq0N7ZobJpKIqyoh8lhJnQQqNCw04K/im9bvpzhJVaJSURL8IgdACa+Jmz7krtkRX4qWNF/d8Nj13LeJMTDgakdtNi5OYHL0TiBB79NIKjPHUClTwpKnCmWi+6ipL7R2tlMiJ84b59XV+4JlnGq7YMy/u0/zGb/wP/2rfZ778ot6qFoLPXnsdxjrOnjzL2tIinXbchVYodu7bT20sWZpgqopxUbB9285o+eIcM9MznF5ZQdJlMjCUdc1wIy6MSiXsvO5GZnbv4fjdd/H0X/8VD1aWH/3IR7j+Pd8bE5/3jXNDwOEQUhGcQyLQIokLkwzN9IQt7hNBoXT0BowzlNgAlDJBSE/D2UA0bVKsY315lVQEdu/YxumlAqFSShN44ze9kRDmGVeagzv2R4AGEIzFW8vCzTdx5DWvYfmZ57HBcPU3fhMAa08/w15oFjiPtQZfJ7R1i7qO13DPwSMs9UesDcY88aEPMC89pl9z4JW3w7FjFEpzUeVMpZBOKc6lKULl7JP9re9oUFZUec7Jn/95dpkaZmYovvs9dK4+QvCOL37wj2n1Ohy75Ramr78evvEbMYsrXLq0TufxZzjw+OOcXdjD9Ne/hakDBzE2iqUTrxi9qWkmOj4iX/jZ/5Vy315e8e5v49DrX8fTn7+L0XDAxuoGjtimw8UKVQMfP3AVq3knojlVFCkRqWZufo6bj11Hp62xxhBEiqor
zjxzjt8Z1RSdLl4pIrQlmi5HJxMTOyU20M27kVOqBTiDDIEsTWjlKdaMUBhSAU4EQh1YuLDEP/j0XewdDpFT7ej6IyUhaCbrY4KzZDhIIEsUA2OZ+DY6yEj2Fx7pAr6qAIV0gqASZGjMj7zDGE+atJjdvZfcWdywRMqUVtaimFgsPnIbECAUQkTZMimjOwQhukrIQGODFPEEwfsrrkRX4iWPFw1uoQG2CIg3bQgQNqsPmv9W2EBc9LzFWoWxAZ1WpKnAVQqhGv4RlqIsUUlC3mrRHo/Q3vPRa67nXGceawxTU12qMkofdbptbnzFK3nu+Gk2VtfirrEYsT1xPPiym/H5FMJmLG2sMrdzG2+Y7fDmf/k+Hv+5n6P3suvY3uuCNZw8fZoH73+UJ048y97Dh6lEG5nO0l+6wBFf8J6/+BAzeYsgNYmQCGmbgXxDyneRURxKixSGHdvmowmnDDhvKSZj6jSweOo0jCXj8ZCyiK3SQ9ccZe5Vr45XdDShHJb019eZrK/T274dduxoPNWaxOcsQWokjSFtAOENSkbgitvsCr1AqNh7Fys8JZtZWwwh5BboxDqHTjKwjnK0znR3BuUHhLrizPlTZEmXpUdOoQ5fzcKBg0zPzwKx7YqQTPoDZvbtY/8rX4mQivH6Ko/9+Z/x4Pvfz01EsFA1KaL6jEoJBMoGGXv+1BmKRLJvz046vQ53vP+3SKeihBwf/CDnPvCH1ChccBz95ndy4/veR9pqYX/t1+C55+BP/zSamVoTYfLPPgvbtrENOP0Xf8XJP/pjvKmBQFKW3Pr1b4bv/m6KX/q3KA8rlxY58L738ZZ/9s9gdhY++EGe+OM/QjeO8VpIRuWExmeX8/fex8Vnn8Rbwzf/3P/C1W96M6e+8AXa7S5+1Cd4uyUjBrDYaXOpPU2SB4IMKBGpF/tvvwW/cy9iOkU6w2hY8fSDz/LQU6d55tpXM5yZojPTQaYpKE2rFSsjFyqClLg6ZSnY+Gw5iQwOJ2om/T523dFpzbKw3TPTFqSm5lVffJw9i4sAvPaxZyieO930HaJ0GiK2PoXzCB15sJeKikf2HKFer9CthHa7w2q3y2J3imADk/VVZNZBJYrJYAONIW0pUtUm7bYRwZPOK9JWgsRCGDQdoGYH0GzkJDI6xzekdYmAEE1qvaRBIlt8M6e/ElfipYoXlfhaPtDahFn4AFiiEVAAF2JrrdHYS31AywDS4n2KlJHPIxGkrRxjQ2yRSo0VIcr2GYduFo9+q83K7AxJomntv4q3/PA/ZPd11+CNZeXpE/zVz/6/odUFoK0S0BXPbd9Jmc+gtWK1C7t27+FCkgNw9tCr6e8+wo4ZyVULKXve8moO/MN/yKtOX+LOu+/hwadPUYs2G/MjinFs0XlnWLm0SCtX7LtqgbRxagi2pq4npFnkg4UQEImK3LpUo3RKnrcoyiEzrahgL10d5Z6Aypgt3fm6MgwGA2prtvzQIO4vNlFvAoVodsFBxOVVSYHb5EN54mZEyghx9/4ywCU06dNF8d9Ea6RUWGsBR5JG+Lh3Hu9rUlUzKTcIPuG5Z59n7eGnue2ao9z75LPs7cfEbV1MNisnTvCXP/3TaC2wRU1lC7ZNT+Gaysdbi8paUanGBdCOTDeOA3XN7Owsk3pI0a948Fd+kW0jw5HveA9813cxc8/nKT75OUJlGC5d4r7f/E1C8LwO4M1vhtVV/MOPEIKL86B3vQt27qT/Mz/LVe/8FvLhgLMf/ghCSK5++U3xOj3+OLUpMeUoIpP/8A/51PlFbvonP87Cd30Xe7/8hejQAVRVRUgke26/HX76f+LQ6bNs7+Tc+m3vBmDx6afp94fsWNjGxmTw3+lK1tZgnAfrmTtwFW/5yZ9m300vIxjD+sMPwamnCMFgnOKL9z7G9FSLMre4Kc1A1HjjmOrMUWRVFGgQEVgSjKIYjXCANZI0VbTmc3IRW5FlVTCqDFpbDiyv8c67v4RuNmy3PXPqa37Wv/mh
r1aSMUrx29/9HdTzM+SpwIUaYTUmsThnqBNBsB4xLhgMh2RZSj6WlLJmLddUkzrO6jbFuomKNlpJ0NFEWvqA57LcW6SBhBc+FlfiSrwk8bUmvhbAq+ua65o5V8x+oqkgGti8ZEtkVgNOCJ7IMlaUBS0QOspsaZlHJ+lIekAEhxYJShmuarhiMxvL7G+3ETblHf/L+2hv3869v/5bHHn17ex59W287Tu+iUd/54MEF0hMyawyHFxewrUrlPRsX19ithwz18xoOqeexqlAgedMUlP0NPM9R3vieX1Xsm0msLxygQ1RsDCJ2/xrq4L1i6cBkMqxMIq6n7uWz6FkCUKgEKT45krqWF2JQBgW9Mo1ymGFFhktUTHdyGRJnTBejpY/2cwMwRhUK6M9PU01GJAtLRF27cZ7i/MeXxtc1iTYxrYFHELQtB9D03IWl1vPAqIprwcZ6Q0CgbWeVkuTtVKqYoQxhqJwJElG0q6Znk1YPbnCbL6Hi+fOMcZzZlyy7hw2NEM+CVILfAAlUkRwUR/TB4IUJGmE9XvnohpKqhlVJQtT08x042Zl3sElaxgODflMl/UTTxAePcmRM4vwJ3/Cznd9O4sf/TjGeZ6787M8f/A0erpHffhwdC0IAbzFuYaMfnckmJ+amuFlv/vbzL7utcx5h5CScnGR4d330PvEx+nPL3C6fJ7rt22HB+7hzOc+j++0eOv7f5Wj7/p2LvztpwG2qAeT9XW48UZe+e53I7RmtLzMF37vD/jsr/8GxnsuLa8y3W7ji8nWTBMiutTLyKj89n/z75laWOCBP/5Dth28msOv/zrcVIfyK5/hwceOU4UEi0UlGp2kdDotyklNKjVKB/CRr2mKgsFSCY0bulABEyS7d8ywXE0IuULpHk44nJzQLiu0DzxwaC+veP4cn7/xGuaXV7h2cfVFEwQS5/ixP/yjF3kU1InmZ9/1dZxLk4a719A90EgZSepIFR3ZMQQRDXG3Nnc+RDnDK3ElXsJ4URXfy+yLvwFff3HxRR8D8LZTp+DUKXjb22DvHvjzP+fVP/UT0G7Dxga3fsc7uPWHvu+rjvmWh//Hc7w3/MI/+x/+3R7gur/j5z/8kQ//na//f/3JH/+dP/+aY2WV8w88zNG3L3Hw1bdz23u+h23XHkUlCQ/85m9yG4B3OC8jA8H72O6kcR0gVtVKNi1noXDCY5yPUlIKNgV/o/VoRCAGEUEGtXV4GRGeIoCxjtI45HhMnmckUpCrlOF4g9n5Xdz5p5/DTlaoGlSnD9GBQskEESRS6Gh748CbWMUCVHXJZGMdCVTGYr1A6wQAbT1uUnLg9a/l2re/HXPhJPbgefjO7wZg8sSTWO9xwaOA7/3QH9OemYHf/m348pfhs59FzE6z801v4ug7vgXuvBOE4PA/jd/z8Y9+nMf/P/+OQKDrLa/YWKPXzEh3HT3KwvVXg6q59vVv5Nof+kcArD399FZ1EfAIIVh56im45Rb+8MabWOx04rVvDIu9dVRliehmICXGv/D5iHSIw7e9mrl9V3H883fx7Cf+kqdsm4OveR3i6PUUn/k4zz57iizvQDUiqA4h7aA6LYS3eO3xSiKCRglNvTHE2YpM53gfCKJGAYnQWAcyUZjgmdSC+aRDUcVKb5jHzd9oaoojR46ifuyfEF4bZdLM3Z9j8vu/iXSei4t9Bitj3EYfe8vLmN4zA+M+M4urXPX5L/HM//YvKA4ewNjAaNRndWWN9Y0Bzz33HEevvw6dpUgdP7eSgm3nz/HW3/ggs0FwXmuEdYggoEEDCyEIIs6aZbyRcchIc3AWSXRuCV8tDXslrsT/7fhaE18B8Mkk5dFURRWGkDak7csKC5HYHiuLVEA7BD7R7TBSMnrT4RAkKJVdtmfxllSBIsN7wQ1mzM+fO8tH9uxkdWaWG7/+rdwCnGtN8fx3/gAkileNJ6Qz0zzwEz9DPRqQ1iXpZMizb3gjvjPHTG86OsTXFfmzz/Lmf/Uz3Pnz/47B
VYcjOV0JpAxIINEO8/gjPPD5z7DvtS9jeiZn9vwar/uPv89dP/kTjPfsw3kDIjB97gJv+A+/zl3/5Mfp790VkZNBIDE46ZEya5STA55NaxWHEhJd11zzX/+M/Y8+xvDseU57wZ//7M/wdT/8w3zdT7wXM5nwhQ98gAf+/a9wGw1YpVFzaTCcuOBxDT5Fieg84IXHCwWoy4CAhtoQ7XgkHhAyIuUQkrKu0d6Rqjjv80T/vKoWpIkmkRlKSKba06yfO49Z7oOto7ksgJBolRGkRhB95QTxfMvJiDSJFd+mHmMiBQmCcjxkU67Ke4cVno3FJeb3H2b2dW+MM7Jz5+CXf5kT7/81imZqloTAR//nn6a1ew9vFNC97TZ47DH8iRNUa2t0rjsWN0haIxcXeeLX/iNP/Mr7CdHopkEc0/xeQz0ZkM/NwNvfzu0/+IOMqhp++Zf5ygf+gFbzukTJeP0bRGEItqHzBExdR+i9UFAbTC3IhKAyL0h8AfCe2d174wO0tkZQDlfWVIMBrdlZljcKisEQGaKQeDAT6nJCUQSC91GvVbeRIqVcH2BHJbYaYScTJAqZGlLVZfH0JXwp0ZnEUzKpLKbWOB/rOtMkjjRrsfD+X4dt21n5nd+id+sryL/5nVglUMdPcPXsNuz6Biv338epqZy1NJC1NGF1wFXA5xbPcV470rzDwo4Fdl57LQfyFgvWsDEecOnSEnVdo4QkTzOmk7jJkUKh00YWzoXo4dggk6WSW3No7yS2DlgLIkiCdY2W5xUC+5V4aeNFVXxLSvKc1iAk/+xvPsy+G28kbbcZLi/z6Mc+xp///M9jqooQAnkIbAvwVJIxTGTTwshAKKRMQGtQAuFqEm9J82g82u3HVueyFFwUjt3NQjkUgvGu3fgkxzcuCSvbF6iynLSc0NIZF7cfZigFD997DwCvv+OtdKdXAbg4v8Dywj5SrRHK46Uhswq9cYmVxx5lLbdcSjVTSc7V0zmvA4p9exgcPgjCEIIji88x1aG9lEeObA3jEX5LaNcFQRBxyU2URoQIyTaDCaYheRejMRsrSwxXL/H73/8Pqa3B+4C1jr1l1VztOAWRwUZV++AjCjNEyanQ7CV88NGM1EdVjOAhDlOSqM4hPEF6vNRYIrJWN2ou1gcUDhkU3oOpAyaxSBG1TktjGE/WMe0xuZERvAO0kFSlAWlAOFIR7aamvGMh1WQzsZ3Z1TmFNVhrmHjLnqOH0I8/AUTlLWcdG6dP8F//0fexsG+B8NBzfM+TTwLgd+6kkprUWwiBcw8/Dk+e5LrZNt3rr4M3vAF56iTLjzzCg298E29Yi/PXu7fvYD3JmxbwphLIC/QevWPt/Bke/O3f4w3nTvDAVx7k7ocf5H0XzsLVV0PzGY2tI5fNbZYbUf6NRu4rCI0nIIJnMpwwk7XIXrA+6yxF6xcICRA3i4PReGuWawcrbNcZpnQEDG48xi4tsbYckyxCIdMc5xWqLKmrMcVolVSlKJ1gfMFwTZGcWyTrzCKzBC8sQULKHDvavfi7G6Tq0W95G2rvPvjzP6f+6z9j6eEvsO+Vr6J9xzcSVj7EyS/eR5op9l1/PfLYNYTFU6RVSVoH4PO85o13cHZhjvXVDVbXVnn6mafodqepfE1rusuhQ0fo5in9lTUuXbpA0qwuIhgS1cWrqMMpkyR6EyJIpIocVBc1QqNZbQAfmtG1ijYbV+JKvITxoukMCon3gXOPPcZ9f/ZnEAJf/2M/ztf94A9y4fhx7vrPv8Pm9joE0fiH+cYctJHP8tGyKHjIlALnsLZGqZTQQBS9j27ml56JQ/aQalYuLtLeto203aKeFGycO4cIAVOMkWVJf3mVvvS4sqDd7tJfWcOtRQL7Rn+ZteEcWiRo75HOEtodyqdP8Pp3fxPrzzzJX7uCybqAZoyZSolWHqFUrHKaJ1kniizdFAf2BNFw44hahH5r5uYJwWFdQNoa2Xiv7du3i2frEmcMFksIAed8bAM1FYeI
9PCGQiKaOaoneB0dHILASRNVW4Jo+E+RfxWduWOLUGsQTSUeXCAEG6tRKfAINALhBN5F650yqTDeMZgMWS42GAQ49Mqvp392QHHmPPAYMz6wMdpANvJ1aZKSpQkzZcXUaEzetMS7acrxABNbk6qcs6fPsbIY/e0qQEtNa6rDoKgYD4e0zGXCuvUOZMKOq6/mwKFD+J17QKXsdFFxhzNnondbkDRoiPidBUh99HoLm5qP1kQ7dEB5jzIVNG3A7lRvUyk5vktz7sIapJDkDZ0hNZasqvDWkdqYmOKME7QPFL5ET7W3zqOuajbKkovPnQCgPbcN6RWXLq6Qdbu4ssCeWyWpY1vVe0vemYZ2DyEj1208HGGrCUpkaCc4cPAoB4/u4c5P34nWCalLUEqTkDKxgbydIZ1AuUA4s8JCGlVobB03U2r7wta183jKwQA/HqGmZxiur/HcF+9jtpey5+3fyMJtt3Pxb05gXY0gbgbyXperjh7iGp1x73338+iDj3P8mVN8wze8kbX1Szy/8QydJOXI/sNcc/VRLn74L+PvFaAasW3nA1IL0iRFCom3Fm8D3oG1HuebTYrcvF83cQNX4kq8dPHiEp9KCCHOjD70L36W7swsnalpbn7HO9h5zVHiKqyjKj4OcARRE4SOC2/jSJAkEmNNFPB1nkxkGG/QkmYGAM4JKh947vOfZf3sWY6+/jWMzi8yd/AqpFI89cnPUPTXEQKSukQVJf3+GlUnZ+/hwwgkZ8+f45oGCj3d7lF3ewzHQ5ZXl0ilQNYjVs+eZHXXPu6/+27atx5h39EF5vtxlqWUjir/splpNbQAGS3Taeykm3ahbvTxoxlcpBREtQrvXeQlNgtsezTgxrk5zp09hzUG71x0ngiwvW6yrg9boCEg0kSaxBmBFwITHARQQW39biEi9ykEHwEWLipfNFKi8TM0bdAgYgtUSRkrRRHBMB5PbQp0IqHV47nzQ0SdcN/+Y7zngU/z2HU3cu7gbhASaQU2Vcy0M1aeeZKbrtrL8qc+zW3A33ameWiuQ5oIugrSTmz9AowlqCxBJpLUpxSjEZ0X3muNc3k9Luhs384bfuLHohvDr74fPv5x+Ou/RswtxAr4BYfl3tN2riGZCryztHx0RgfIAnScj8kReOCJp6HhJ2pvubqI3/2+wYBaaWYbMevrV1fYP0gbgedNn8fmXhXwfLdNXVdb5yEdtDsdzjzwIOvnznHw9ldx5sHXc913/gBSay587jMsn7lI1pshzwR51UZiI01FCZRK6JuAswYdahyapeVLXNq4ADrBOo9Cg02wIZB2W+w/eIjpNOPZRx4DnbK2vhHv4wZOPOiP2NOcX1FVlEXCJkmuGkVrIldNCNaishyR5djRANVcYecttq7oD8bc/bl7mIxq3nTHW7j5tlsITLi4vMbi2UXOnD3Fvfc9zL5LEQwmQhSoTxKN0pukxoZH2tBrvIsb4tBQeEKw+Kb6vgLqvBIvdbyoxKfbHWztI2lVSn7x/gfozs8D8JU/+RPu+cAfEltCEbsVW34O72PbSUkVF11jSXWG9SE6onuPF44g/NbuzhN5gc45PvQT7+Vt/8vPc8t3vQtbVjz6kY/zxT/8r1GfEsBbjJlQlQOGdkKSxIXhyOEjhOXzAHzkT/+c9cNHOHbD1dx8282cOHuaDV+y6823Mrt/O9/w8+9jxY4QqSZ5NBLnbbBAnPX44PGbBqwBZNh8cEGi8JuqNQLioyrRSoBMMMbQsqC6sSK48ZHHufH/5DpXCMZabTlRO0DIFCk9Eh9nT42PmQyggkSIACJez0BMaASi43tkBQNRHSO2SwNB0pDk40wxVtkGKRSp1kinaWcpFzfW6SEReay2Tl1a5HQukGlKL+mgZIdiMibZtodDr76Do1kXPvcFfLfN2As6wTFNwexgjGzK6YGsqbRHKuh0uozXS5I8v3wRGgTg+tIS9/+3D/HgRz4JQvL6pQvcvh4RsbGdKPHucqX4ZG+aC2kK
0hOcYKY3Rbqxwp6qYLsxnJ2d54F2zt7a86q1RfT0dnrOwoXTCAfdZp+xq6xwVIymkub6bIoExH/iHkSgvSMjMJBQvCADJ2lCu93Ges+HfuqneOtP/RRv+Mc/hpkUPPChD2G+8gWWijEz118LvqZXjFm9cJyi1UNmCbVRFGqefUeuo2WWSZKE/nhCcJ5Orxs3EB6KMUzWF0nxPHbfvaTes3/3HrZdfYDsXOyWZI14dv/55+PJ7d/PZFzA7Ayy04G6plxbR+HZtWNu6zOYsgYTttq99aREOsGnPnkn3aRHaBUcu/kGVqsh3pfk7R4HD3dZ7V3k4vKAjVHzvTTzfakEQmqCs3E+jsQJhxQyas4i8CF2f0Kw+BDBSJsJ8EpciZcqXlziy1p0tKIer4P3/Kf3vIeZnQt8/Y+/l1vf9S4e+uiHeeQjHwPicJogCSGJIJZGly9423DSPFJHOHPwHimibxlNReC9i2oRwbNy4iQf/MEfQWctau+gadPJZkCe+5pOUXD8icew7R55u8Ox62/gcx//OPLee/hJwJXLLD0/4vkHPscz917He37oH3P83CIrLvDM0gimUy6dXuPI9n1UG3GBF9bG2YOI7cNNnpZ3Hte0FCNhrqnLQsNDAjaTTNQnjNw614oL+/FveA3V3AKf/9wXSJTAeg9ObLV1CqUohSB1FhcCjmZOE4jJVTq8cKjNBNxQS0IDwaf5U4QG2CGjowM+4KVHNLjy0MieOQdCNa1ZL0hVhnHgSEjbPdJJhTSGa264FoCVc6e5MBoxsjWpj3B0kYIvJV/6zFd482zKG5q7S1tBq6zZK2uOmRF58/2mLUmiAtpYpjKJKQ3ZFrsRUiCXMs5kGwNW+VW1Xbx5M6Xo5pfd84J3OFeBs3gnSdMZAg6/KXzsiW3PpozQtScNserJQiBpNl5ns5x9Vcn53jTXDzZ4Ym6eS3kLFxzWlGDAW0/mLVPC87fb5jnsLyfg+fk5Tg82sNbBqef43X/wfXjjmJ7awWvecDNf/ORd7FI5qydOE8KYYTFEp12cABdyWt0dpNuOcMPXvY3DsxtkWnD+zBJTvTZCSUbFiDRvcf5cn0/98e8ym0zhnSARAh8qRK4hiZVsljSdipPHMWfOkLztbexdPEd63fWxjf/006TdDnkrYXpuCqk1rioJxQhbV2jbVF5KsXhpheeeO001snzb338Xxo7JpEIYjUwTCIG1wTqtXndL1B0hSdOEoANOSKTRMek5H58vPMHHjaX3jhBMQ8lpJK39V3/vV+JK/N+NF9fqtIGpnbtZPDtG+5Lnv/wFQBOC5Ad//7/w2r//PTz5N5/A+gjIkHik92ivCTJQe4uRkgSFECHu/oVB6YCgjQ2OLVU+K8AFvHLUrkILCQZ0lqF0Gk0qZUApjTYJ3USw6+AsZdbG1SkPPfwgTz3zMG/Y1oMT8IpX3sLszm185ctf5tSFs/yX97+f173trQw3Kh4QgVSW1INV+hc3uGYQWzRBBEyw6Aapuvkkh+BiS1BCwEb7JJEAHoWKMz5p8E5Hh4fgCQnU7QYd0+pAt0upE4yQGGFiYm14cIjAlItsp4HWVGikAS89RsUK022CLQAh4sJgLbH9hUVpi5eBoCU1IKRDIfChRgvASwIBVzt0Fp3hlQ4QJDqVFOOYuNeHfXp5i8HKCudPxt8nlSDNEzpWkCoBFhKdQCY5dsN1FI8+AsD03Ax+zZDKEduKFWZsBXk81ykhoHYkypC6PtNBMPUCAnjLWUwSk/FYSapm7jV5Qd+rZT01FV15+bhesEybSOYXeIoL58mUYNSATNLgmLWOjoktzWy8QShHzfuV6GbGZ7IMqpJKAXfeyXfdcgu63Wa8tsZTn/pbPvyvfwlDwNlA4sGrHP2C8++vLWONx0vBcFxgnWVmagbrSk6fOMn6YEhoGTrjIcIVSGdR23VENGpDNQmopcd5/KNjnutYWt02dd+iZE3W62CFotWd4uLFZVoqoM0AKVN87dg4t8FT
Fx4nX44ehM7E81q6uMTZH/lhDr3vf2LuB34YVxSc+q8fZFt/wLZDh8neUJFkCUJKRo8/BA0m2DfXxJshn/3MszjTo9czLFw1S209xpvo/VgNqcYVzzx2mvn5o4xtvK4iJAThGtGLCpBoDwSPE4ZJiBWexCJVk/xsQsBj/AuUia7ElXiJ4kUlPh8CVW141be9i2Nv+TpOfOlLIARv+qEfAuDcY4/jhGsAF/EYQVRqIAi8jzJnsYMnoiJ7AKlkVJIXYWseARBEVGyXSuKdjZWPAhccQur4emHIraOuS8brQ6rUgVFcOnWKTpaysC26hk/Nddm95wC33dbj8Qfu58LSOU6efAqVtwjB09aWxJWkUy1Uq6kOUo1OEgQiKp80pybkJjAvbInq+kYr04eosSikiFqJPqqV6OkOo9e9Cv74w6y/+XWsH76GP73vIVppRlUWUaYpRMBEQNMUaBgBpdCoWJ8hbIhE6STO6nwISBENX5QQCGkj9F40I9cG/EIzPwli81zD1mfwziGkJ5Gy0eVxcWOBADMmTRzSDzj5dGwbS50ipCfLMxKpmLgxSga8N0zPdNh2VZwk7d21gL1wnNBLKIxEW83F3TvgwdPcf/gqzu1aoD01S6YUe+Z2sHTX3bzx8QbVedMxnlwaI/McZy2qmuBC4PRUD9Ziq/Pk9A7OUjEyky0e5me2z3M8a6FVjnOO2hqyVLKuEm4ZjXhwx34+PT/DgfVFvuP0aT6/92qC2eA7Tz7LvQf2Mnd6maODDZZnd3Bo0I/a3w8/zH2f/RTDVHHb9/wDbn/P97F29iSf/q3fIwuNyLKKhsObURsT0cuNrF+qU+qqYm52lmeeO47PE0YSautIXEBaKCuL8RXBFFhVkWyMWb14FtGWVBq8k6hgkUGihKYwNV5rMjNheTKKMzTpabUUiA5G6K1zATC1Yfj4Y3DHHdxz1V4u+djy3nft1Rx54x1sO3YdYThg8syTrJpVTKJJgkfG/j2rK+ucOXWKjWXDj/+TH6AqJ0BOHUqEUrSyLl++/15uOHY9zz+31ujtxiqc5n4TwkfQSoO09S7gLBAkzgucCxCiALZzHkLCFT++K/FSx4tKfCpJqZ3DFDX7briRl7/tbUit6F+8yN/++1/hw//ml/DCNZktzvNQEi9c40YtEB6cs7FS07HNaZxF62hO6WzT1pARqp9IFc1WfaNZ6T1ojUUglGqElg1pXTNZHjJJajSSxDlCUHQ6EVo/mjiGkwqZQLvbZmM158Szz3Jg/wFsVTAo+whjeO13fxsvkxL4L1x77CY2rj7E2toa42F/C1Ztg0fmKd1Om14vp9eeJqDw1lKVFf1Bn1GxAbWJiUZKZDvHz8QkXLWnqKbmKHsz1NbilIktLmKyCz66dTsCBN3UdhYhA9LH9uomDIBGzFiJBqQiqmb2uMnEjnO/zZlK8B6/udmQcR4YgtvaVUshYutWOELw2EmfvFNy9cEFJktRxSZTKVhHkme0W23KuiQogVSCu+/5HL/w9rcDsHPHPHXxMGQLDDqznApjuk37rcxSbEsxEDDd7lL3cnYdOwIf/iQAbaXQzpE4iUSToXAKprPLc0AdStrSby3sAKVOWROBNPE4r3BSUmlBmcSkZDtzFJ15qipWI+OkxdR0nIHphQXkUpRlczoiIpW08JM/yeo3vJLxwhzF2jfBwcP0ei22792FqgMzxrLv0CF6yxe2ziNxjo7OSDJNXVu8c5iqIOm2kZMJLdFGpQleSLyW1MIhsjapilqVwhucKalNQIUWVZM0rA2kIsHVLsq1pVB6RyI0phQNiT1llNXsaBLPeBQ/66A/YDiMqkTD0YhCavK8xfriJb70gT/ABsee7fPs7XWw87difJxlJk17+uGHHwE0U9OaqYWcSbUWaTQi4GvH6to5jLe0pyRJMuGqA7vitQyuuf88Sii8jNQdaxzeBrAxAXsfN37e2DijhWY0faXkuxIvbby4xJfl6CzjmXsf5Oz3/SDri2eo
xn0UFikVIQikbhZTH2H9PvjYIQweFWScR22J0foItW+4DfIFfB1tLK0sQdYeHSIcXQHSWeqJI0sSgoiuDomtqWvDYP0iQ6XxoaYsfFTabxB7QiQ4YyjGG+StFCkV5WDEc489ynjQ5+d+4V9w4NB+sjRl496vAPDFL36JC6eeI00zZqenOdiNvKjrj92IeOXtWGtYW7/E0uo6G2sbjMdDgg1keYvOVM6ebfuY3j5L0skRSuI3YsvIIai9wwZPskn9EKK5bh5nbSTBi4aXR6RCOCDogPQC6TzexzmdICA2Z47CIVWUhxIhNDibEEntIkL/N3UlfQjNRkKhhCD4gFABi8MnYL1h5/w29u3fjayg2n0RiPPXqjRc9/JjnDz1PDIosiSj1clYXrnA0lKkLORZircVAxc415ql3LadnauxWgtBUouU2mQwFLzq1gNMl8XW9//g4YN8PFmAfAYtFaYcYqXhVqn4tobi8vTNN/DZxdMc2Uj4jnPRWd0aS9AptbOx/SwFKNFUsiBUikeRZTHZGS+omlZe8C62JABk4623fQpOnueb/uRvSadnATj35c9w8SufozPdQ5QBORyyMRwxW18msHedZ15AO00pa0vtPYUxVBcuMIXDm5KZXg/RllgBY52iptpQDVFovHXILMNaR+VSvAwoJkitG36hiOhiLEKrCLCyAuc023YdJt02zc5Ll+Dh+zAN2lQmik0ZlFbaRjrL7NwcxpWkiUB6zWg04cTyMmc7LQ7deCM9PGlD4n/+5FlWZ7bz9re/neVyHcoR0iuC8nSyNo898iivuOUVjPvn6OQFK4unt66HswEShXcW7xXW1NSVwdYOZy3eRgS0EAIRREyIwTVryBVwy5V4aePFzfiEotXuMqkNk9Kwbec+zp2aRGkh5wnIiFSmWW8FUZpIaqy1yEAksEoZUZveR/fxBnaPktRNoZIGmHYBgUMHgxQCbyuOXruPzvQUUukokhLN6ChbbfJX34rNUhwVadLls5/5MosnTgFw4ezznKkrUplx9uRppChwlaGcjDly5AC1Lbj34QfAS3acPcPtgMCSJQoZPMV4zEqjr/nYw4+y1h9RVNEAVKkUISVSBLRO0Ykmz9sMjWX9wiK1ralGE7JHnuAg0N9YpRr1SRtHdC8itSA0WBkCUZw3QlYRIfoWxs1xIBFiyxtPNNXdpkO7YJMLKCL6Vgh8iIuNFBKEbuajETEamuJQIpCNy4bHk3aiO/x4MuHOz34eHTT7l5bjL1UKkWTk0z2uffkN3Pu3X0S2FUondNodho2mqZCKNE9xSrMhW1iZkSdxEQ4u58SaYNdVh1BJSpp0aKfJ1q320InnKaYPE7IcIdp4lVC5gtC9THrYd+w6KCcM1y9evkWFhdAha02TJxlLlxbpZt0tYJK1Ub8zbdQIgtR0p2M1uG/nbvITy837NICQJN6Qj/ziT1Hu3cf17/xudr/yjRx8/UOsffSzpBpaWc7Cwi6me5fP7e5dO7m4ew9JplEWimLC8yfPsLBnG5cWz5HIGUSSIGuY+IALAjcZk+aKEBKSVkrAkwLW5TgdeRsZCVJqQlAoAlp4kCmqIYNXAUozplx0jNdjdeeaym9u2zzdwQAasfE0z5mZnuHAod2MB2usrK0zWh8ipWRjbcSX7rmPGw/uZrppNXqnOXX2En/5kU/y1vY3cGjnLJPBBmkXvB1jhhEFmgnJ8089xr5i1BznkVJT2QqHwFYeaz3BR4BQVASKow/nHSI4vAsEbLx3r7gzXImXOF5U4pNIEC3ancBosooLgt70Nsr1laaisI2fWezRg8eGGksWOXve4oVAoOP8CUcQKYEEGuf2SsaH9JNTXU5PTSE1pHmCVppiMuHq6R42zajxZCpHelA6oFWKffwZRJ6QphKtM/YfOszJ+yLQwvQL+uEii4uX0CEgXYURjpfdcgvf+Na3MCkK0kRDSBFN5SmlJEvyrWRgGy1G7wwuGJJURZ3ERkU+hOhcMBj2Wd9Y
w4boqSekQAvJwmbVIaGalBjjSbQiNC1IESJHMCjXkLMFCIej4fBFVDhuU0iggYLHUV2I/x/AygZVa6NmohQygmGEZDMvCiFpFMziz7zAecjzDOsEWdqinbUpuhULu3tkMmXe9ePnt67Z4Dg+d9fn2daZJVOa2157Gx/+kz+m2ySnsrK8+vWvZHllQFlZICXPtwEwKnJaM7vYvvsIk+VL1FVF2GxzA6+89XaM6PLU84uojgSvEaWnesEaODM7w8GjR5Dr462f7dq9k8eWKxYWFtixbZ6zZ89w7Npj+OdONvdwpHikmyo0aZvduyPNJM2yxgGcrVax0vHP9cceZbi8TNZqc9uP/DT7X3MHz37+XrQKpCIwPTtH9wU2AoXMKFstbCLIeimrxYQwv5ts9yzF8iqVbqbDiW7kugzUlsp6gqyohhYhGm1TOyYKJdSY4JEkCJFgnCFgGtECQRqitqVNUrRULK5F1SK5KQox7JM0PNlisMHMvn2M1ldx5QwzWcbM7l2M2m0Gy6tM5bA4mPDAV+4HJbmK2AYvzIiT58/wG7/6X5htt3jFzTfwjW/7ep549HH27T9AWY84f+Yso2KAUI1kmQzU9ZigY1fIWY9zAWstIUSUsWtQ3MFFuo4j4IMA59BXxDqvxEscLw7c4gxpluBEC1G1GE4Mnel5zGSMrUzD3wOCiwCQENCykZptIPYRwOGRSiCFjnB6YZEyVjWugU7v9o6yKtFO0lIKXE1Zl/TW1kl6s3gJ3laxilSRSO6cwwSLD4HaCyqvuLUbW1q7h5dwsss1XU0qNQQNiUQuXeThP/h9Zman6PQ6dFo9upNJ/Bjeg2iEoZWIvDxANaCXZgqBFiLSMjYVQKSMBHa3OdsEAmSN8kue5fSHGwQ8dQ1CNUmuAZ8EEWJLykfuHQ14ZQsz1FAXIhGPOPuT8XytVEihwUUPOO/isVKqaAlD5PgJJaNrg5ZIreKCLxo7IwQLC3soK8Mdb3o911+9m/FwwvTx0/Dhj1GbErIeMhX85D99L//hf/t1diwcYM/uXXQ7HWYbp/LxeMyhQ/tZ33iCJGRUIaFsACBBzdDuLOBCC5V2ObO8QjKabN1rs9MzTFmNJgGdI7QDobHm8rznvoceoNoxx7Frr4a77wSgMz1NWFmiKCecOTMiS1NGkwl7W5uzwYASngYIi06a6xW/OcRmq7NZbKdvey18///M7vXzjDszHHvbNwMwPH+GunJo5BYQyb5gFmWb+xyg9pJOOsXMgTmm51Pa4gylLOJ5BAdBIjQ4nSBCnGNnUsVReYhkbiECQnlCUFgbIBi0AqTY6hIoF53NtXAgKnyTtBPXUIRWN7CjuEmYFgpZ1fiyoDq/iEwkSkq6SUJ73x5effQQSxsDLjx/nOJUnF2Oxo5WPo3MU/Kpad7+lrewvHyRL3z5QVqp5upt26nqdapqSHd2Crlum+chRHm7oBtRhwrnJKZqFIa8x1eWYOLGLQSBVHHmZ73HvoDmciWuxEsRLyrxhXEfMdqgl3dod6dYWV9FlJZtvRnWJn1ks5pIIci5LDckhCf4GqkEBIXDNInQxeSgQckUKQTrQjIRgn974eLffRJPnfz/6oP+8/sefFGvN1mKnZ2DRkBXiMszdhca8WM2q6dIst0E8GzKlenGXiUCSi4vhN28jRkP6eYZo9JgbdUAVuL7+QYtGn8WeU5eSGTYJAxertoI4JvFUmxqhTbeZlJIXIiVejSvDXgVmtdF7mQ0AY3zPCUVKkDwJYPli5jEYIuaXKdkMxmoiOpMkCgh+PynP43pj1Aq4dzFC/z1R/6KH/3xH+FIEflsN1x/jGtuvomHHn4aKRyEGiliNahUTqgt0jpG/XV23biNzuDS1vW3lKg8UlxkqPCiBmmwl3EsHLpqH2F+hnZ1dutnnV4PwiVamQYfYgs6keg03ure1eAryknRnEegbFqzwTnkpsNCc51Nfw1uvJFrr3k3KE25scbJT/0lz3/8r7B1zbi0uKJg
ONgg7a9vnUeaJnR6HWw1oioniCSQ9nKqetK8t4rPh4v3kg0CK0B5h/QO4SMVgqYNDh7hfEPyjhJ1m/J0OBNVd0JU9LGuBldSNPfbfVNTvGMw4M7ZKV4ZAi8bDLg43WPUbpO3WqwExXUHD6MjvIogNXllODjVYc8NN3DKCTh+Fu9Kpmd7lKbAFIa/+NM/Ymp6O0IkSOv49Ec+wxvffCsvv/4qNtaGCNXYcKm4ubK1j4AW73Emnr8LNppVR1OiuNncer4ESqjLQK0rcSVeovhaE18FkHhLZ7xBO0p/kOc56+urdFOJ0BnBFjTQFgiwIRWVB6UjEtI3IBfShtdgia8VPqp/CcVFrfiGA/uZa5CK3kOSanbt2cmlS+e48aZjbFu4Ci8EidZxYy4BHedW0ge8ElHi2Qm2Ly3xLf/lD/nr7/9uNvbtRIS4wxciVnNKKkSIeoJSSgIaJRVFr8twfi5ON4JHBIn3jTqnY2sQD1FdJlZlMdFvkm9DrAfiAifFlkrK8Uef4CP3fhmhO9FZQUVZMW8bmTHiQTpsXp8G2ALIEP8VACo05F8XtUJFrD4cDhE8Nkiciko6Knh0I1+mtG6o4BGAZJ1FEEhkgraGb3nr67jpNW/mB378f2XxpgOsHNgJQtEpGmK/lOyc385tr/tG7r/3frIkp6oHfO/3vYdDuxaQTx0HIEkUMk2w1iF1gk6Ic03AektRDPBmDKbi2sNH6VWXW5ZHrz3COJ/ms/ecpuNtrBJsSfkCPc9WpplqZ+zcfllt5NojR5APHWd+dhprojTWq157G8dUgAcewlcV+JpJ4424Y3aaN73xavgPcMebb2f27Hl4+PKNP3rySbjlFs79k3/McOcOlpdOgZlgrWb79lkyA+2i4NjRg8ysdLeOUwq0lmhSWkowzDytqQyzMaAWASEU3kYTXY+iCvF+FN4hXOxaBCGbRAf4CGbx+K1NUiTia6QTBNUkPiIFNvER+AJQZPEx38hyPrtD8Z5zF7nn0FWc2raTPOtireXV113PTcf2IYMhOBlRvaomyTvsuOVlcOeXefe7voGzu2cZj8c8cN9jPPrQMyyuFKi0hfJQTNapTcKkzihrUJeZpiB0RGZbH8XQbYXzNc5ZSmuwvtFqkg3gzTeefELyQmf7K3ElXor42hJfCBOE4IvtKT49s5vO9Dzbt2+HtMX58+cpR0u4Xpv+4ilSYWJlJBRepNTYRqJI4L3YElJ2HoSXSL05B2g0JBGcEoJTSqBsIM1TkiznrHWYmR67D+1lkPdI8zZKxZaeFwInY7JNGti7lppEqIZDBEsL21jeuxstUggCrUBJEEKhpEQJ38h45WiREMU8XCNXFhenzYaLJ3KzRKPTuclzYjPpAyGEploRDYpyqxHMUw89gioqXCtHqJgYcAJvfXzY4fIszsddsJcyVoGxCNhUNIwcsqaV6ZxAJpETKRoBaYTANuR4GQIaETmTm0owjWyckKBCQFQln/nwX3Ho4D7eecdtrK0uc27xIt56eufPcjtQeRAq48DVh5iam+X3/sMf0Z3r4Kzl6RPHOdJA5qMMmiPRKYNxRTvT7N29I954rRQvIhVhUo1BehSXwS0KzZGrDpGJHI2ktpHuEuxlhJ9O5WWgzuZ3Yy11XdHrdTh75gLdbhepJJNys43afJPNnV8XBSdPnOblwNramHRSMk1sXwM0XUI6vSmSuTmEKzl75gRZq82Bw4fon7+AuVjQzjSd7PLjtLq6ynNPOySeHM16qDlw9cs4d/o0Ximk8HEGHAJOwqapsVIqipmH+J35AMFGH0WCA+Xw2ChAniRkeRtfVlTG4VzAS4ENDmU1IcRNws5mwzJzaW1zW8V8UVGNRoR+HyEzzn3hHq7LbiUloFQC3iJUoHQwORtBXXv37iY7tMDG2iqDi0MeuvthjJlQy/WoEoTiueNn8KaCEqTcnM0prHE4Z/HWEZyFUAMmorkbZDLCIhskc/BNT0VcfqauxJV4qeJFtTonIWW5
9IyyghAMmZom37mf5fMlrbZguLJE6id4LE4GEjwaiQ8RY6hkhpAu9vCDjIAWLREkQASMeGTUvQyKYCweG90NpGR6Pqc2FpHWhEqiVUIiFTrRaK2RPh4rZJxXgN+CQjtncMZFBZOGehHPK+6eU5nE5AdbMzUpRVPVyVitbRLY2bS6ifMV2bg3iGAguMZKJRAdheK5e+FJdYOOm4xIfKAsa0QOISic8Y0rhEUJhxcq6sAIiRTRuWHTVi4081OaeYgIkfpAaIAnMhCdaALC+0gxkXEmGKTAhUBKQGwmEUmkNChHKCo0hhNfuYtbrr6Ze85O2LV9B6aq6cxFXdZts9t46OIlEgndVgrOoEWHbtZhFCrKKi60RVlhiwm2KklVirAe10DrU+VJEfjasXv3DqRMqarL1VxtDTpJ6LZbsSqtI3pWU269xhmDVpq6vnzcYDAAKZieneaJJ55ifm4WiW6UdWK128l6pDrOGrM8Ich4/JmLF1AbfXYD2ahBsDbzP+fitc6znGguIuj0erT2LuCVZHVhO8XKZR4fHgYbI+546zcyP9tjtb+M0JLheBxFHILCuJrgHLUrUFkCJlb2xjk2/ReD1CASrC9RMnYXlE4iSloIykkRk0WoIQh8kHhT40OLDZ0ykZL3XYgt5F89c2br9H7ugYf++wf8Lz/ydz/4QKUVpwvLcK1kx7ZdfMt3HuMNX38Hjz/8MA89dB/bty3wsb/5DAeu2sUTDz7MK29+GaHaAMAYi0tjhRtsjfAW6WwEudSO4AJKgMNG+TyfNhu7Rs7sCrjlSrzE8SKVWyS2MhT9PivSM7tNs31+H2HXYRbPPcm27ftYO/MMUgWUSKK4cnDgHLJpuQUHidza48XZVDPEdi6KPMtGF9MHiWqqNi08ykp85Ug7SeMALtE6wSNiGdbkps3HJHLd44LXSjI6aU6iM6SI/oBWBpTQUQ5NbLZWLivLCDRC6Ih8DMTXQfy9SkZQipBYH3fnmxUgTStYN/JlUklq67ZanZgSYQOSgPaSwlgwPlq3CIlr2sJbyU0GZJOotxwY8BGkAo2mYUNtoAHFBCJgRkIIFZAQhI4qGVo1rTSHCoHgLdamyKyDlCM6QbL+7PPks3vpdWZpJSmJUHQ6Ef14zdGD3PXg/ezesZ26N4XWgtWlRZ589HEOX7sP1xDKrSkpywnOW5AZWif0WhFsZCfLpFlOR41IleTSUsmCvzzAU1pgq5KyHNLNO6RCUDcblc0IHtpZSqovV3y2rvnWd34TU+2MPJNI6amKgmIYKz5XFwyHI8omf5pinTSJdj3GBHxDqbjpwmmsUphGOLuuDOWkoComDdpS45zDO0+iEw4cOszwBTM+EcAbx8L8DnQG21t7GI4syyvraJnibU2rlSOFI+9M8fXf+I2YyiClpDYG6ywaTV2WVPUYYyoGa0MeeuTBqGRDCsKTtgJlUVG5Euc03qToJEFKWMxS3n7d9dxalvzvzx3n37/lraRK8WOf+Bg/d+N11FcfxhYOa0tEApOiQEnJrp272b1zOxfOncPo6Ov42KlzhLvvx8jA/OwU07Ntjh6+hptfcxs3v+ZGzp05x/rGiB0LKQ8UIxYWtmGPN2LvxuO9oK4s1kQqg226Pt6DcDJ6TLroxyeFRDYqRkiBCldanVfipY0XR2AXiqSpFOyoYCRWmMrm6HWmKOb2EJIOy2dPkyiPdQ23TMZkEJ3PFS40qSlE8rV3UZQ2Vl5yCx0ZhCeg8SaQ5RJTlbg6xVYV1tRkOt2q1oSUBBkaE1GHRG/Z+mwqu3tjMWVJUB6lG63QPIuk+CAaF+jIeVNbsmpcni8IvzVj1w3iDhzOe4yNYrsKt+WdFzDIBs7tnCcisxuUWwhI79Desq3XYzzux6QmQvTzC5IgZONh1nAfAQIYPF6IKGEWxBbaUxBnfNH2Z1PKTIGP5rNehDgT9AHhI49SyShXBp5JaTi8ay/1xiq5TJCFx5YTett28eTjj9Fqddi2FLUfD7kx
P/Kam+nfdRfSB370dS8jSTK4cJ4ZKlrPx8piPBlTFSNAoFSksDgTq8FXtwyHpyrMxYcRQnPqT55gSl2e8Q2/9GXOPv4sr82GqKqgkjWVrLkqjLZeM3f+IrlWzDTnBbB3OCDLU8L5Pt95cD8+aHjqaVpnIzBnx7aMI0d3cn05DV+6k+Wl4xx/MprYPvSVe8nWVrgZ+NIrXsFw1wxJFSvUM888znhtkVQ6uknGZHWDB89+Hj8aIUvDF41g28YSr2vOY6rbZnue8cmPfQytFC5JmJ+ZwRtFrhIqaubn55meamFMxRc+fw8BBQQ67Q5Zq4XymkRITCgiECwI8qxL9X+w9+dxu2VnXSf8vdZae+97eKbznHmqqtSQqlRqoBKSgEASCASF1ojYLY0KfFrbV/zIR2heaT++GEFbW2xfkVaRIIINGJSp1dcXMUHDYJgqRZLKUGNS45nPM9/T3mt6/7jWvp9TQEMOVr8KnpVPKqlznuce9r3vda3rd/2GmXpZ3n3vvZy7fYOUEweTXebzwMsvXGPr2rYydo3jojFcO3oUnn2GJ32HKazY297xDj5wcJW1289z210neezRXyGGmkU342NbB+Sr12nqGrdakVNHt7aCv3qZjRMniMGwd23Bzzzz8zz58Se46+4znDl1mv/2q/4owUz5xEeewbhAH5qRUqZrPSn18UOZEBMhqiOTRKN+t7HcyUYRIlNEwU5uWZbdWq/uuqnCJwpEkkLGzzxW5uxev8yqPcra+knM6Chm5TGybzFJc/lS8qUzgpgi2Rh6m60iQCsRLwbrpPh2apfkgiui7gTZEaNq9obDGttojritrJI1TMSV03c9GC4Zbu5ayVpzQk4RnxdgKp1peXDNCCvQdR1daAmpg6Skl6auGQyGNE1ToNpDH1GVTwTVGmFKDJDGxzpjy6Aua+kygksJZw7JMHVlkcowmeyTQtacvOJGj1ENk2RDyomYi4g9C4hS1sul0y6v995cFj8V/2IyJidIBnF9Dp/6pmbU9d5HZZ8ikSMrNXs5YIJnf5o4Vgn33HsXKSYkWyrfEQYNX/pP/o/f9l6Jg4ZwZIPKVVhXYUzFoHHkzXXCoOG//ZF//pv/XlOTgTd9x/fwJuCP/BbP8SXf809+w5+983t/69f2OXee4Y67TzF64WUA/rvX3bU8kHz+F34epx79CDzxFDvtjJc/dcCp0r0OuxaZz7l++TLXtvc5euIYx48dJQyGLDaPc/L8OTblUGT4ls99M2eOb+KjMJ9O2Y2JS59+mi51+G5GNolr29fYuh7JPqmMFWXgXvGenCD6oGCfzWoJZzS+x8kA42ouXdxiZ3eLQd3QjBSVbQaWlFqIDTHpAS8GLd7XLr1ELC4so/GIe07dzi/90vN8wR94OzEmHv2FDyMpMmgaKjtAYqSdz6nqmoGrsWRm0wmT/QOGYhkMR+zsTZgtAleu7ZLCNbp2zl1330nI++SZFlnvPZ1DiS2+I6ZIJC4dhExhL1unuZLq2GIRDOSknp231q31Kq6bc24xjiiOnDrSwpPDBIOFJjM6dorh+jFe++CbefLRf4tLCxpbEcQSsxaTbExxandkKey0XIyrjWCMK0a2Otw/d/YMly5dJif1+psv5tSNYfXIEGsaUnTEpJ2lKTBgSp6cHcY4NcUuh0VjLM6ZomHTLLrQzpktWkCIoj6H1lqyibjKMB4Pqat62XX1LE6gSBZ0xieUYNeEvi9RjaKg7DwkEZJXyyi0wxwMLCm0LBYdJiUQ7WCNUVZeyV/HJ43kcag20kpf9LS7y1mtxowoYcKY3s7a0KfPSqq0s0XF9JTOFASSJUtmOKzYufYi63WGhdD5wOap4+qhSoOETHvqJD/3g99Ns7enTi/GQYqkqtioGbsU0y/W1/EnjjJ2NWIrxFmsqYhnTvPBH/4HyPaOMm97+21JWHG0a7qDN9tTUg5EOqX0Z4XAUoLh7h7kRHtks0DcQr21TRZhvrGGsQUiTqYcBBLNzjZv
+ev/G5/1nh94xS39eX/nfwcgNA2z48e5+MaHeMNP/GvCV3455tQp2mc+DX/3u/n0mx7hypnTfPzDn2SyP+f07ae47767yCFAM2TsajYmq8vHvWN2wEo70IBhJ8xI+HMnmZ88XowSDN53GDFMJjO2tnbwvl3qKL3vNJ08qaNRLtC0VkQpEoY9XDSY2RzTZkbDIYP1NeLr71XSlrNghPMHKtf47Dd/Fl2I8OwLfPLJj/HB2ZSnn7jC93/fDzDeGLPIgco4bIpYkwkpUolBfCKESDLg25bKOnxMhOmUjsBzL17k+NGjHNtc58r2NeLCMKuusRL0fvfeE0v4dM6x5El6ettCMGUPKMFTKUPqzRbyLW7LrfWqr5sqfDFBEotYBynRxSl5Fsm7Hhk1JFOxcfYOxs/fzmzrRdrQYZwgwYFYcla6fbaJjH65rDgwOnNTKF9z9lIKHEwmDEcjJpNdcgK/8Jw7fyenzp7lwQffSIiOlITBoAKTCaHF5qSRNGJJKTPYVKr7Q488yF333g0ZuhxVN3SwICw6UowsQse0ndMtOnxsSdGzv79LCgFnLU0zYOR7KK4DWmJUB3kVrguVs9R1RVX1fBFXIoqEZuQY7GoxC8YQTMZVwkqAmVXPyEwmm3L6jRmbhWyEkKUw33QGoqM/3Q30oFD0XoTS7bHM5ssCiYhJgKnISRALYstjGYvgaazQ7l5HiPgEtgs0o1Xa4KnsiEoywWbimTNMz55GREoxBS8akmRugGQxFpsiPVKcDPgIzlrymXN0Z04jGHIuPFZRVp+UDnp2xmpHoJlPS3MAEDog5767spCFGeUAolTVckipCps2ssiRD/zTf0R9sI8YYeWFC7zhf/nf+LW//C0c3H6GxfoIe/wswwLFnuw8g9CyWr4hp9qOxnfc8eC91Maw8J64s0stDpm2QGJjR2d8oa54x3v+6c18tf5vX23lSMeOcGSh3Z+VyMUXr1IZeO4TT3H67jtYW9kgtR3WZSWaOU2SR8A6S5KM5ESKivpgMu2e58n9PZ4xwvETJ0m55d4772a6k7BTnat2ncc7Q4jq1JJyKpZk+t1IPVwvSsaKWSVQ6v4USO6WV+et9equm7Mscxlj1Q0kmg5MR5JMOzdMdrfAVnjnuP/NX8Ljv/R+/P6LmLQohczgMzQugYnFf89oERVHlk79KtEZgLEVW5MDckg4a+lSoqlqnn7mRbZbz8GBp6lXoBAN6kGFtVAbhUt7espmmf9Mpgd07RxntYMzYtg8dpzG2TI/8bS+0xO6wGw2YTGb49tW4ZlQkucB52A0sIxNjbV2yfw0Ji/ngMs0CjmkY5jimbi1WMBgDW8SQxEqijE0iVjE7yTdHIxxuFIcBMFkpfVTnhMK4YVcCC+awiA9ISBnxKicRIwrh4JSbAyYbKisZSSZjWyJMWAqx2IeSc2Y0WiFnAwuJ41gElGpR0nNFgGby2uTQwg2SSEqGGXHUqlTjhs4bGURscVnlVKccvnMbJlTaroGuS4+psXMnFTmr82hXjLn5eFDly2En1q1jRLIBNJdZwicQzCEWkk23T13EF93L5UJrOea5twJ4qDhLf/rd73i3v+C//d3f0bfkTBo+ODf/jZy7crLkeJ7nYlRCVwhBLXlCkE/K6P/DTERgi9FoHyyKRG8Gj7EEAnRE4POxhZtq16rKRTRtxCigt5qhYdKiIxlOhzQHl3nyEs6f60EcueV7IVl7/IOwzMjkgkMawexSHlML52RAj3q/Vw5dUoiqdjck3npwjXa+YzbTh2DuWW20E4z+ECMlWZYLsNn9SAd++8NakKv8DxE8ZA8pqAgt9at9Wqum3NuyRFEM8aDaOhq8B2SK+ZyQOUahmvHWKSG07fdz3OPX6AiIjmr5ZehiK1VQK0uDTrrExFCCIg4nLMMhyMe+dzP482f/Rb8/IAnn36K5597Di+WjMNmS5rNwFo6Y+kWM51Bin6pxGpBlB31K7zw8iV26waHwboKrMVaw2BQYUwm
E0sygs77EBhUDmsMTT3EDt0y4sgah7Wa0ye9p2a/6fahqMUEMxcHFb1++lf3P/QQP/PxJzErNQu/oBaLt4k2FvFwVrTHOqPsy1ycYXIhukivxipqvpRU4iBFMwhFrqEsT+NKUSlV2WCw2WjhI7NaWcYSGC0y0TryYEgaHWXj7O3Mol4nIZcAYUhGtBgl1TP2kbgiffErW2WmkIYgpEhMCr8ak3F1Q4q56LT0vWiuYT+jVPIOWZAi5AZZXsNDi7h8aO5Nz2wtrwezPHiICOIUYkxJsAUDt7bAbGLIGLpTx/nlH/1eqt0DMpnhcy/x0F/923zsr/4lpnfcTpKOHCIiDowlFlmBFNPvbn2N+anjyj4VZQDnkmknop6sBLXn8l1X5l6+GB+gAnUyVV0tiUsxxvJ+oTKWyc421jmGgyHNcMTAZWLyLNoFKWdm0ynz6ZS2bQmhyGlyZsUKq8VH9dSpk1jzKZK1uPGY219zJ5cuX+TomWO0foHEUFzmFV1ISf2EciGl9YgNOROTTv9zClQus7u9h4u71MVsIKdeu5vVMLuwjll2+8p6SVJufASiwVAjOWFvmVTfWq/yurkZH+UL7FS3RlCnFBshTWe0ezuM146BqxhtHuPehx7h04/9AjUZn1qSGJxUEJSubARV5yaFP6qqUesyY+gWHb/6C7/IYx/8NR586D7e8Y638bV//E9iho5LO1vs7OzQzueIyTS1JSYPCM5W5BS00+h9MoHa1YybAU0xIV74lhwNbYq42iEmF/ZkZtCMSmJBxBpKV6fzNSiCdekt2QoWVDqenKKy1bKoS4pVpqqzjljguS/68t/P3t138fM/+f9hMKxos5pNOzEYDDEX6UeK5XCgs8mcSsdjlPQiKfW8z7LhZw3ylFw6MtUbGly51mUj08wiNWCOkWZQYUNH1QY6E8krI77uL38z88pRl+4pCZikOkAphUKF9JrSrY9ZulwyKSYVYxvt5tQhJ1O50ilisLa/jrmkcxcINhtyPhSzU+BPLegA+YZ5a28nV44CIstkhdyzXgvhhyzFpccs/TmtMVhbevlkEHHEc2fxZ3MJFNbHau+8nfm99xAk4kQPGyElfIhYqwehnAUjUPUzX4rpsjXapWf1lBWEqq4wlSWEQNWqxk0dgMpsTjRJIhW4N+ZEjEL2kUtXrhBS4tS58yx2JyymM5DEysqQykDyvowARhgjxBj0fjZZTR+AI5sbnL/9DJ+6sM09n/UAd9x2nq3pdTaPbXD7mTN0k32eefZZptOFHiDLQUYKMUqsOg2JiJLLEtiYtUtHWLSJtuhEQwg650slXT1FbDLKD8AWcovoZy9aWCli/GTMK+LKbq1b69VYN6fjSxafigF10c3lnFl0E5pqiJ8Ie9evsHpGWJBZOfkAR+6ec/1TT2BkgdSGFAySRDtGZ3BYiMXOy2ZEAlBhXY1raqwIzz79NE8/+QSjlTGnz5zhbW99K/c/8HrGR9a5tHUFn1pcpe4Qk70pllQKlmc4UEhrdWVAGDqEhO861CXCEENksZgXqCngQ4szVkNEhwNkOCLERIodg6IzW7QtKekJP0vWdPgYsVGWMg1XOf2SZz04B++pC5V8spjzhV/xZRw9fYJ//v0/SGUNLrZMY9JkBaNffJNqYnJ6sg5RC6uoRMOQVXaRDRlLimCsFKJNUEYtFWSHyWoZJgaiZIxNGJPI2TAaVKwbYewN163jni/4fXzZn/zvOZDCtjMRMOogI5QNUAocm0mAcT2phbJRg4ilytol5jTC2IDJDVYMVhqF+pK6y2jRKoUvHnbHpXfVP5cbijaHbjl9w7js7pYdpPpYKknIoYJGFEPMvYpUf/Nw1hjISRmuZD0w9JBxJpIJ2AwpqcsPWQ8rWlgpfqx6IBLRWKiUFdqT7MrjhNKZWyQLtavAVqSY8CHQtS1d1xFSYT2mhE9RiSFRCVRHNzeYL2bsX79MiBGCjiAO2inWCMPBAOsqmrpW1KBAoZLVIB3AVo43
/b57efJHf5Xzd99N13k2jxzjQ7/4y3y0GfK6u+7iofsf4hd/7oMMN4ZEIj6qjV4CQlTdqrrMBLXMS5ahGzC7vq/3aem8U0zEEPWei74Qs0wBtouVocIC2kmLeqhmEUjqcHNr3Vqv5rpJAXvPZkzawYRICAlrBbHqytHt7dGOG0brR8i54d6H3sb21gRpr2H8hIRRZxGBJH1Sg84UUkhk0blWRuhyBKOn+6ZyxBS5eOElfuD7vx/jKs7ddhtv+8K38wVv/XzECvvTGedPjliEBbu718h+tuxyQjvHGZ2zVFVFVdfEGHHJUNcGH5UY0lFmRilB0iIngJi01ARaq+ny+toNVV3hsvpyqixBjXVzmen1XY8r6QxVBVN/wMOf9wjrmyv8+I/8EJ//yOtoNjdIJpOTYbLd8divfpxPP3eZNnjqrB2pT6nk9Cl0rN2c0/mddUju6DtzsiGn3iPUakSSKMznk2dlaDg5tFR+Qa4zj3zx2/ncP/qHuR4iA9vonIdcaOdKPijIdElqP2S5lhGdMi8lqy4rq2FBjBZrtbmXMvuT/rqkflZ5CNsunwS937TI5d8wN+1fx2FxKo9xAw1Q66R2kfozUfWd9PBZBrQL1eJmESk24aU46To0IBcxWFM6U8rJpsw3Y9GPkrPey5ZDAg+HObcFCX3FY9auwTqn87qcSSFqOkGKdMGTgnaMXdsW8XuZZ4cCBJfONSX9TuWcsU6h3JSizstNbxFoefCh13Pi564g4rj99tOMxfPErz3Koss89cQzPP/000x2Jxw/fpR6MGbeeiazlrYL5foYQiGg6CHQ0cbA5WtbrB1dUTcItFc3OUGMWFFrvb6L19dS5EBJikORsnxTLKxvuTXku7Ve3XVThU9PkA4fVQJgjSOJCtJD6HRwzx5mv2bl6FFEHNf2Z9z3li/gk7/00zSLKdEEQo5ko2fxTMLYhJPEynhFN3ev20RMmWASzlYkEl3X6objDFHg6pUtfuiH/hk/8uM/ypvf+Aa+7Eu/lPVTIzoRXnPbPcwP9umeeh7QuVyK6GOVjcaYqpApoM41w+GQUV7BxEidjQa/przUAPbNRAiB6WyKqywSi0QCUWG7MYhkko7Xbpj1HMYaVc5RNzXzdsHK8aO85uHP4iB6nv/0BRaLBTF4iMKJ88cZHT2OXyzw+3usjEZUgyHGOXwITPfn7O/uMp9PiMzoklf2bAzkLmJxNFVDKoQEgzJyTRaq2pHNDCc1X/eNf5oT99xFlyyTRWTgBqQYNI5JjM5nYvp1RJLD7ktyX5QKZJi129JU+VKUUm8m0P++ajUR3UCl2OOIi6V8acfX/4ecl8UOymxRjLJV+8bwMLyiFB/1SL1BhYJGOZUf1ldPusFw/BAqNa/43R6yPlyF6JEhSSm3OZd5Yb/h6+OLsZDs8tXnoqeUfg5cJnsAxhksVh2IKjU0CCliO0vXdqQQGY1GS/hWRJC6HMqK64Lq3gzee3znEatXtKoqTFUgZKlIKXP8xAZGEi889xzbFy5S24rgM3VjObKxxtrqEZ5/8QUGg4bx6gYr43UGTWLe7hECeF+umwGRlqoxRCdEsh5oAWtFYe2sDFxxOjMU079//QDFSoFSsyaKKP/q1133W+vW+k9fN0luUb1YVTm6zqvpsbWEHAjB40QIcc58NuPqlascP3OSZAJ1M+DcHffy0se3SWmCtQ3ZWEwRqKYUtcPrgpIOxOkMK8LAVeQukMl0OeNcha1qBMNisYChZd9P+dBHPsQLn/4U6ytr/KEv/8Ns3P86QhCOnDwDQDU6yrQVnFNKfSj5NoIWtJwTPnq6HKiAxcKz8JHRxnoJZq2xlfo7OlfjbKVONCmRo87SUtZAzYRKM3rSR0KQJPgStDqfebpF4rEPf5SPf/xjxDQnh5YkDX6RWEwX+LBg0V1F3BgTYW1gqesMriPQEXJgOK5YWTlCXa+zfnTI6sYqxq7iF57YeiZ7e0wnB/g4J6RATBFrM85m1lfHvOaO
O/miz/1c4orwzO5l1oYnaVINbQankUb0DNKisTJlU08pLQufQrA3VJcUMabSIiQGIZFSwFItZ4FkJW1ktDM1JoNE5BXlTYHIvpr1ZA/VSRaYsyAHfUxUf58WSkb53+UjIdKLMG6shnH528vJoclLZPQ3Wykl7aYpsUGqK6HPoqQQc/T9lTeQOTRfLkQRPUz0Bu25vBp9vL7ZdJXea5VRON+HWIhg+pQp6twh9Y9funMRIaZM8nqEaLtAXeQM88WCRedphhVVJVzf3+PylesIFSksmE2mGBOpxyvUwxGZzPbODjt7BzSDISurgnWVQskRGge1SdhKOHrsLJPJnMaqnGHz6CZjgUWnIwFJQRMYRNNI1EA9FtZofxXyMhCa/4vP4Na6tX6n6+bkDAacc7Sh1YF38mDAidN4FR+wNhJmnm4yI7X7OGPZu7LPyWN3snPyIrOtp7FZCMkWuEfIJpOxhGIErKFwUTeK+YLKWXWwdxafIc07nLGYpqHznioJ27vb2M6AGfLtf/2vc/r0cb7knV/EHzp/GwCrR07y2vsf4JOf/CjiPXVlFIorX7MQAleuXsY0hpVmwEAcw1GDsQbrFEbMS2jGMRqtknLCLWdK6jkq6Nwn3nCKB6M6xFhg16hkj0cf/RVC5yF4cpoRMiRfxOAhUEmg8/vgMwezyP6+4IYV4gzGGgauXkKtIXRcuXyFkK6Rg6g36bjh5JnTrG3AaGXE+sYK6+ur1EPLqHbIAibtVey+xaYad8KSx6foUo1LFiGqBlDUNSSlG97RDadwRTYVphTRDDUjjmxiub4RY1hqEVViIcVUtTffDuTsdRbWI2EAOWMLm7X867KQpDJvylK6rHzD45Xid8ga7etnwVnpfUEjy1TaAnMjib6BusH59RUs0p7QBAppp8ImRpRkVZys0YLUYUVdiGJShqakHnotj5VzgbCV+Rh1WKxSD9Fu3cgAayI+t4gk2naGiGph9bm1uxRBUYNCPMnp8HDpY084ycRosGZIDJnxuGF1veZK7vApQY74kAizOWLBWSWZxByZzg44mGjqxupowJE1x7DybDZDqnHNaLVhz2aqq8UgPiTssNGOj4jiOUW6g8G5jMmBkri0PEip+YO5AR++tW6tV2fdVOELIakjvoEkhiAVYhOSWzJqw0QwyHzCYlfYHmWOHjnDpJtzsZtx14Nv5OM//xwpBIiZqnhmWtNoVyRKu5ZUdGkGOpSp6ALUAQ0WtY4QPDFHbFVhY2TVNlx96RLdouPUmdPsT/b54R/5MZ4Rw/cDddTT5YMPv5krl17mYPcChgQmY02FtYbbzp0ll43COUfKQucj0+mM0WhI1x2SW3b29hBxVJV+mSUrIw9jcc4tYShICntidEYDiHNIPeD48bM8+dGPEdqFZuwJuMYSJCkLMOmsU3IgYQg+sfAd2UDVVARJDAaORdsStwXEIYOaQVMxrhKDUSBbT6ZmPj8A6RATMBOYGMvaaMyoqRkOVxlVY7oclCwjFjEZa4SYrc6bStFToxRZAoWkw/y0ZA2GhMRISh0xB5JdIUsmR08WIeAJucVlV+DBuhSbsiEXH1cjpV/OeZlP+OtXymqcDn13VIJUc+KGGv2K1dtkLR3/5dDPdfkzsddjeiWxoPBh3+XmApVmCswpqUCX5XUW4kvGkjLEaMDqfM6IKV20Fuc+YkoyKuouDWJMCvfRXwPVESDGMhg0II56YPF+QY4VAvjQIr1bewGMvfdQumObjA5aUeg2ZkjWU0mF+A7navzcYzGkHNidTBgPK4YrA4yx+G5ezCYEE2usybTTbTZOneTEkXUWk32cqTnYmTCZ7LFmVc7g2xmxSVhbs5h7Lf5i6Lqsn1NlsQxwBk2izwD6PUQg3mr5bq1Xed2cnCHNMXmOUCGpIuZANo3Of7LF9bTxFPHzGfN9wY+PU4/WoBLmixlWhhgz0yJRYCHdW0whIKSi31JAqCeSdDGSSTibsDljHGSJpFIMo4VqWLF1/SrD
pmJY1/i2ZT5TuOV7v+ef8LZv+Z84d/ttrK5sglh2t17C5JboAwHDvG1xi0jKkdl8UQqdYX9/FyOZO7d2AZjNW0KEpnY3zGlEZxpGDaZFlNKfU1JPz5QZBd10KmtZ+MBwMGJrd5vxsCZFIXSJNPOF1m2LvEF1fGT1sPShIyK0rWWRIrMqYx3YuqZpGiprccMRlobpJDDbnvBSt4Wrhbo2DEeOjbVVThzdJG0EuqFhOpuyOj7GYGWIIyGY4pzfkXJcbvY5ZxaLBSEEjh09iu86UoqkOEc3qkp1lEV471Mi0y5hSGsslatou44gJV0ia4ciRf/YszJzTsvC10PGh4cJWHZby4KonaXqGHsyzm/WKbxy5hd7W7D+b8tf9o48y4Jf0kOWc01YwrvSP33OkEuyOlrMc//6l3Dn4Xw0p/SKeh7zIXysBBx1vumfRdmkej8ZC3VVU7uKFHU+ar0cCuSJTKdTZvM5rnJYox2nK16dUSIhBVJsSb7j6U88QTtvqYzgk2brde2CN7/pPl54/lPETrBSgalYtIEVG1kfW8ZDR+p22Lm+w7Ejx8kkpt2MYTNgfXUNgJXxmJg6MBHnLKENLLoFwUPXJRbGqeORVT3kovWE3JXPPTOdHqZ23Fq31quxbqrw+RBoQ4cTW+YnQsRizAArkeznhOhxLmIJ5HaG39tn7chJfG2pXMP49GuYvPwkFaIFK3qs610hepKAnrpNIVQkIBvBk/E+4LAMjMNZy9xHokPnQxWYNnPh4gVec8+dVLFhtr8HwO7BhO//x/+It77t8/m8t/9+fHaY1eO89MzHqFNmOm9JomDmkh0XM5Wt2NjY1M50Ty3LFq1nMm3pfFQdmNHwV+s0OUGBW93Y1FjY4CqnRtGArSoG4xXe8Llv5u5772Kyu8tzn3qZF5+/yNUrWyQf6UKiQ3BJ5zU+eI33ERVPG9E4Jt9FMhFMwDmvcJnZpnGGunJIUzFqVmgGBlcltrfmXLFbvNzssrYinD+3zonTJ8i5oRkfI6cpSRwZh5QiJErnJeWMtRbvPfsHB9pNkBBp8YtAwNJUNY1xJBKLboobUjodgZS5fv06jc1srB9hOBzSsyWh1IR+cMehFyo30NkPi58sWyWNsiktc5kZlmGgvv5S6KR8NrlnXqLTvmUx+00Geod/d5gcQv+by6KlhUdfrC2/p92nIWseYlY2a4xK8JFfTxAqjx1jVN0oLOUMNxb5mCLGWoxADGoMkXIsBVpNG4wxmJjx3hGio2sD826GFWE01xnfdGeb+YkVYjsh+Y5xPWDnpSukHPSbnWpEhNc/cA+b6462zXzsY0/jF57hcESTO46MBowGhlEtECMDZ4g5kvyMlGoWk7bsG4lOkrqxGAdG55HGWWwJ241GMxiNMSy6Dpuhi5mYhfZW3bu1XuV1c3IGLCFovMxoxZBEoEsF72mIErGVWg4RA4uDGVe750kIK0dPIK7i1OvexLNblwnTfUzOWFeVIuoL7Tsf0sSDx5I1+0xygZ0swWfm0TMYGaxxhBiworoyW9XM2pYXL1zk+OlTSDGG3t66xsW646knn+C5ly7xNX/qf8SsnmX3+haXn3+RoWtw4smm2IcVIg9kglfheE9O8THRhk4dLsrPCeh7MboJmqJiyjkzGo3xISwJNe97/8/w9K99mPseeD2716/x8Y98nGefeQ7BEruACZmAI4jFhUh0xY4sCc7WpJhJLNBeQrEykzMhe2IWjBXwiW6eyQ66WkkMzdBRNWssFnMWk8zO9cD+/h6RdQZNplu0NKOWkPaJuSGLwySwxSsUI0hV0TQNKScGw4FmB8aGYSNa9DNqsSYWXKIejbQWpQxGGI9X2Fhp1KxALEvHz9IlL6tJRhl+iXJ/cEOBLNBoIYeI7VmcQk52SY44JIz0xTIXGDQtOy+1WjOFyHLIWn3lHA9uqHJLqFSlGOW70RfnnAqtJqBmA5TnsyxT73PWDrfM/JK6mxNjfkWXqUSftHTv6Qk4QVNx
qawtLMhECIU4UtrPjKOpRxhxTNI+3TywaAPziXqRVgjdfMHauObCCy9AynTTVmeLxmAE1lZW2N85wErF6rjmjW94kO2da3TtlDtPn2W6u0Oct2BqRsMBi3aXqhrg246UEm0JPVy0gTwosH2RQcRoMEYYDIeEEInJ46PHJos1Al0mhkgytiBCt9at9eqtmyO35IBJFoLBtx1GnIZEJoOpRxgrRB/IRf/l2zmLnIl7l8FVnKiPE5ojnLvr9bz08cc0OTxpwrpzoCLk4iWZhRQ6ckpU9HRyKVTtTPCeyaRjsLKmeV3BI2LUcit2+FnLbO+ApjAxReDy5es8d/QyG2st3/kdf4tv/bZv422/7wv4+fRBkj/AMifEyN70oDAO0USBlEtc0o0bXS4Sh8L+ywLEMu/R/9ZNxXzeUjcj3A0dyNbVbS7uzfnV//go3aIlo/ZXIbQ0Vhg0DdQjZjFjQmawUmkh9RrmKQScCxBN8W1Um1/pO2a0Q7Gi8gpyx2zecTCPVAPHaNAwdiuECBcu7HDp6uP8Sv0Rjm8e5fydd/DIWz6PI8fPkk3fqRTrZ9Ein4o/Yyxdhkiz9KM01uCMIZGpbV3muHrCTzFRWUddN4g43eCTfp5SdAlWMkvT1iJHyORXdGPLrq10U9DPyxy9Xq8vcr0zTC7iQWVj9oUNYoh473V2XLquJfQpLMlPMWpCgZTPsbfh6p/70Dc0lFeY6F0eciyfD7bAobl8R7SriylCEkL5Ocrr1fCuVA4FJTy4dM+5REpFMYTUFrcgfd8xBjqfSDFqvqQYRqMhu36i6QzA9t4eO3sDmtryicefoJvOtOtMFnIixI5zx89w/eoufjFhvpgjNnL82IhTJ84xMB3P783p2gVdBVWljE6XazY3jzFvI2xpdmJIGUyl94FPWFfT1EYLeH/wyBpElKNqP71AMNo/1zezSd1at9ZnsG4uiDZDih6xEJPRGVY2iLEqJHYNIVSENCUTScnisMwPdpg0A2IIDE+dYfPcvTz39JNUocOkTFPXRNS6LKdEFksqzg4xR1LocNTUVUWKnigdtqroQmAy2efY0TW8Mcy6jBWhrmpiO2e2tQ2DEgZroKpHPPGJT/DFX/olbO3v8pf+n9/EX/qL38pDDzzI089+jAEDQo6sbayxiJGda7vEVhl7qbiMAATf4VuPqyxVpSdUk6FpKmzlAIuzA4wTjh+ziAFXGWbPPAvAhUtXeNFaJEKNo2rG2NVVBuMhNgdmezNaHxmPGioGtHmKKW7Bda2kCWMc2VnqRru+2hpSCNgy5wpFxK780kA9OsLqygohTpgtOib7+6yMHE29hk+Ovck+adGyde1xXnrxOb7+G/9nDhaddvUUzWVMJCq1klpKGczy+pIhxUiIgZiFKEBYkEVw9ZjcBcQEUnSoPXEq8Lbqu7KkYl3Wi8AF3fr6WV4PgxqdDxf6p87LMpmg7IjiUfrKLg21gBN5RRENwWtRzCwJMcsCmMOSBJOTkJMrxeqwmPZ/37NchcJtWRJT4FDRpzCnESHkIqNJCtViipdMKmzmG1DX3sWl/8OYMqbAzx1Zs+363jYlQsy00WtActIYoxRhMBrSDLSMrF66zu605UwS6A7Y2dsjRLXdCzEgYlm7coV04QVqI6wOK4xJjK8bhtdqGpu4aw5bOzPGXWB4kBkMGxwd2CFtFDZavTa2Ur9QsiEb1bQOhg1d53VmmaIawccAYkoos9OU+ZzB3mJ13lqv7ro5cktOmDwjeUekYTBaIUfBe88iJ0T0KxhzQpw+eogdTsDP1al9dj0zOnuas3fezYWnHmUASFhgxPbnZCCXRHUhJKMWXbEDHFZqfJ5BDohxdD6wu7vL2uoaKbaao2KUJNB2HQt6FlufpW556qlnWN1YZWM45n/96+/mO77zOzl3/naef/7TjGxi0XmSwNr6OuJ1kzfGsHZ9H4C11TXaI+sqNhaDFYUX67peOpMYHGJ1BpMROp+ojW46i0VLGI1xVcXG
5ibZWlw1JHWB2cGUbrrANTV+MSelDjewJDoSnrjcAB05R01qN5ZkDFSOytWkrC7/gaxG1E5jdOxszvr6CuubY4yt2br2Mte3dxmvNaxtnKDCsLoSOXK8IeUZpEanlUuavBaU3o9TheWpQJl6ixhEBe2lKyIJ+1tbxDgiBfAdxEHEFUcTJbjoZ07fPUvxPV1CnaKC+MMb8XDWF/MNNmWyLGL9DLrPSlwWzVwKV/9oIhirhJSU8hJt0CJiyDcQPm+USciyg+9nb2n52lSUr8VPzwNqmJBiYnJwwHQ6JUjppFNCXAUCzlkdExTtpD5M1I4pC6ZIG9xSonAovO/1gkkNUg47337sKYJxlnjsKF1V8Wfe9x9v6qv/O10La9hvVIdoil4zpUwXtJjHkjah6LOaAsSUiVJCaquKcAOkfGvdWq/GusnCFxDTIQLdImKsMBiN6JLHB6/ZdSLYpiFLp6y0nDHB0E13sU3F4mCb61fh7B23c+H5TxJ8osqeHLWQiRjtsMg465CcMAIpRBaLlpXRGFcZfLdArKFylsVsTmUqhqNVppMDjKgLRIxRw0gBV7myRxsuX7jM5tEjvPzyy2ysrvIP/t538j//1b+GuJrLL3yC0WjAPLTa7QgItpgu62WoK4erdUhvit1VJDHvOt1qYybnOZXR5PNsDILjwrPPAVBVA1bXNxiurOqQ3xqmkwMW+1NsSNRVTTCZlCPOqYQhxjkptDrHwWFMXWTYQrBKYjDG4MwAUznq2mFFGaQYwdiKKgu+DZgq0pjE7efP0R6NXLl6kctXr7O5volUmQdOn2M2myFoIb9BCqdM2gxZ8rJgGWNLB1gKS9noFe0LbF+5xMF0wGw6YzZ7G0fWk0LapvfPzKWoHLI0l9O+Yk92+Kf6v5m8DMLQMaeUbq4Xs//G2ZzCaqWQlr83xmqBLp1tDL3UIRd3lBu8Q9Mhw/U3uNjc8Nqkf/2pl8YbWt9px4nCrbvTGb7r+gkkGEPtLMRI8IVpaiBJoq4HrI5XST4yX8yxlaOqa6qqIqWkkHZxnBGxS7jZ2OJ7kyw2qNlCe/Yc3/NXv5nmYEqMicn+AVvXr7OYa8RRih2hC0UknwkhErzHEBnXNQMb2dyw1FVisT8nBMFaS+0MthKaakA0hkjGd5Er0bA1rKmSGlmHmIkpEIPCnEmzq0hBg4x9SIhxjOtKpS/kAoneWrfWq7duTseXVb8XyXShg0WHqzSTyy8WGJx2e6jNUrIBkUp9OBcZv2ipmwG7Fy+zOTzHsVPnuPz8EyQCDepInwRScbGPlJOqqOA1+JbZPNAMaoypCNGTk2DFMDmYsNaMqWtHDJ260CfwRXs3OThgcOQ4YeaZTeeELjLvOpqceO75F/k3P/ETvO0PfAlXhmsc3VjB2EyKkeneAV3X0fmOaqCboM+eLgZyVBKiyQaRXuOlm6uxQu686uKysDE+xq9+8FG96M2Qqhnqlz/ra8NHol9QDwZEAvN5S8gtbe6IuUFSgeoIGKPaRlucLkQ6fGFH5skM4wzitDszRhCrMU9VMZdeiYJdF1JlGA1G3HPXXbRhyrPPPMd0uuCDH/wob33blzOfeWyxl0qpp4mULb5vmLJKAjKHcF+OmWQTCYtxlhQ6bGrI0SM0aAQUS0szW7o78isJLCkphNknJByWQzmcv+WerdkXzUNJwI0r953eIXdG/9dox5d6yBFbvEFLcS5Q7o2Ez/41Lt1rpLdX6y8I9PBm7OHGXFiYIozHY+YpLY0Bsggh55Je4WiaQhoqhVKMwcdEVTeMrMWHThmRVYWJcflcvdav72hTSekQqTEYckzYyjA/VTM5kUrWX4TJlP1rV7myfZnGOWKn88LJZE4M+vk7IsZ3nDwyZjpYUFeB+thRtq7uksVRGQMuUNUDpKnwMRIWAe8t1gopmNKlBjrf4aMm8EVRgk/OCeMcVoQUBd95ams1/cTfiiW6tV7ddXOsTlGKsboxJXy7YIbG5tSm1vFEzhrHY0p8
iSm6phRYTA5oYk0OLRdfeIHVIxtcH6+Q575oATUKRvPNkt70yRKzTkysScTYEbzBVg2kQGhb7chy5mB/j/HqCJwjdC0xq78nQIpBnVbqms53bF29Rj0cceHKFg/cdQ8/+/738drX3cPxU6/h737H3+K2syc5feYkd99+Oyvrq6weaRhfV7j2xOnTpJPHOTiYEnzAYojdgsMpjv7DWkMkMhiM+OVf+EUmBzrsx5Y0ah+YT2dIF0nRE3JgN0yRBCZbEp6U9PXHXJELikaOkDKCUxKEEi7VESRmYsiILYxAEQLCZHeH2liwjss7e1TOcurEGutrR2iamtVVy9mzJ7l84SovPH+dv/d3/jHf8Bf+HLP5HlVd48Oh/2aiN92SMgMsK7O0cIuo9MSbBkT1as6pniwV02uKLKYXKOcYC/TYk5y4wQj7htlc6bQSqCygNzw2Un6uxDNxKH3oC5/++mGnxhK+7E2oyxu5oc/89c/fE2D6otdbsAGlgKr1mubvHVpwqRmzfo6bRzb1cBejzrUo5KFlN6msTkHwJZWcbHA50naGVJx0TOk8U08SKv2mzs4SsdwrKaTiUZs1ANdAjgkjhsFwyNlzZxmu1exubTNLLb7zGt/kMl3QAjlylrabY1zHaNhAgGYwIEXRkarJiNX30oVQ5r2ZFLM6NSWdAXch0Cad4fdwtzGC9x1VXVM3NSn0TkBQ1TeZnnZr3Vq/zbp5qDMHhc+ksNJSRKxO57I4jBSoKGWMqSCVKUfy2G5OyDuK8fuIGw85cuw82y/uEeHQAUQEMSopUK5XSanWozV+NqcbGJrK4CTjMfgUyPMJnYOqblTLZByiNhBUCO18gXUNxlh2rm5z9t478de3eOmlF3j49ffxL370R/hz3/QtvONtX8qP/fN/wSfrp/nZ+hdpakc1qHhDhM8HPvgLH+boyiYrqyscObXBeDzCL+b4do7vWrxPzNsOYmQmE5rBiJ//uf/Isd4aK3v2t7eIbSTFiCw81kB0CV8iXpyxIJmULSQNBRWxSnFPXuGtnJc+lQASEyYp7CcSMTZDtrjir2pRZxSJnuw9l19asDfc48jmBt2soa4b1jeOMp8HPvqRp3jvD/84f/JrvoKDxZQsDVZPMEsQMZXC0rMNBSEGjZnJ3hCDpxOHcZlsLCkZBE/rM1U1IIZOZ4dGC3g2YMka55O0vBrTpxf0GzsoRFbKibAkmUg6LF49rZ/y8z2jti94PXPz0DxbSTRyw7wwpcNEjhu9SW9MiDgUvxdJBMpQTDktH6sX8PcSCTGOyhpC8FROI5NCzIhEvR6lkwVKNJQlWC0c0heblEqHl8FYFbZkTTInFxtBCcoyRlm+/YlMrCuOSZYYS1huhKObx0idx/uIWMewGdO1LVUMxK4h+5ZmJKyNa5qcaNuO6cGELBX1YETtGkzVgLXQJbqUEbG0i67INiAEUXPrlKhygqAXxlQOG4Wu80iIOByJjE8JH39jB/+Zrscee2wVOM2hK/mt9XtzJeDSG9/4xoPP5IdvqvA5MlVsMZKo8PShk1HzZnSWVFjcgsWKI1GpBVnOagVlvQp6sexe22dzc51u9QQHu1c0Nw6FLlMq1lkZ7XAMGKduMSkmZOGV+VU3dJ3BOpAQ8PMpjavAWJJkbIHJTM7EdkFlaypT0c5bkled4HRywO5kHzuoef+P/SRf+6f/NFcuv0yaTNjvDphMJnS+Y7Gv1/TCS5f4qf/jx7EVjEY1o1HNsaPHOHFkg1PHj3PkxFFaILbC5tlzHBsfo12EpbRi//oOvmqQpHBu7QzEhEmOoa0pwjwSunmKiOoDpX8vael7acRRXLZ1e3dKgjAljoZU4UyFd5GcI1US6pwhRXJ0eN+ytbjMQV1z4vQpXFNRNY5hPsL73/cLvOGRh7njtbfhU5nXljLbW3+pMsuonV3O+AJRWnEs2sDCz5i2C7xo53owmTBYGzFPC2qnUVMhBpx1JQ0i0wvCc5G76P9X5x4pnU+vcuvJ
MIfNYDokvhT9YB+r1HeANzrA5HRYzHqiyo3F8dC5JS/JJMv0BaMSBWWAspwb3pj0AEBMh68TMNYqSQnKa8nFom1pY1RikhQaTFDgUiX9qEzFIuX1pRs8T/XaRIy68ZGslENjRqLVA0IuSRhGUKs1V8J4K87fdhubx44ymbYcXN/jICUqY/FimYWWqh7i8PjdHbq2xVqDt4aFTaRsidlgaTDWkfKU2XROSkJIZeaZDCk7JUDlPsqpHApSKc7JqCtU1utt/i/s536r9dhjjxngL1trv0ZEKg4FmbfW782Vc87+scce+0Hgb77xjW/8Le+am8zjE1IydG3AVlZNa6NKCDRHzgPu8HSZIBIIKeF6PVjs9AseLDbNWcwdRzZv4+Bgj5xm5bQfda4jlAdJulGIgFQk65EYCG0EaUAsMQWNcuk6FtMpg+G4HwXpihFTodCktfg2M59N1VDaR65d22Lj5FH+3X94P3e/7m5+/zu+iF/+6Ie4a2DooqZHb376RQBee995utGQvYMJvovMZws+tf0yz/MyxEg1GuBWRtx97k7mT0/ZaNbY2d4jlo4vhUgWNSt2lZ7W66ommxIBlGJhrvfFTYjq66WJ3hxu+7YPQi1vVAQqV2FMLJtxBcYg1kDx3RQTsY1FpKLtWkLsyIvAtcvXWT++Tt0MODg4oKoG/KN/+P383X/4HXTtPr0jijIle3hPVHfVRrqcSEbTxmPXUVUD9qYT/ps/8hU4t8apzSM0jdAuItEkvPdYq3o+L5aqaqicbtKptG25SBPUxDpq4GkRvffkl3Lf8wqKSc/uzKLwY0ah2hvgTspvKJszvRLZXK7Dx3+Fxg9K8GxvbZZLh3rYWNzIugTUseSG37e2d3npP+m+KywFW4Ayx01lTmgLyKzQqaIhWuAPu1NZmrzd+DZM6QJVG5iySiCsdaXzM8TkISXqakhlI7s72+zt7LMyHlEPBnQLy+7ehHqtpg2RtvVYa2nqmi4Js+mUOJ1TVyNIDt96shgiAWONzvZCxFUNxgnig8YXYYucMy4PG0YiMWkH3Hvc3uT6y1VVff2pU6e68Xg8E5Hf9NO9tX5vrJyzTKfT0eXLl79eHaX4X36rn7+5whcTMaqPJEaZjpKtxhNFQ0yiMIe2BBrUajJIwmSjVGsiKQqVQLfYY3d7zh23PcDG5mm2r35arc6ywTnN4BMSxlIif3T+4xuDWQA+KL17OCBlh/EBorI/sZWmKpRok5CS+i+aSHHAZH9/gmRDaANtG9iftXTO8He/42/x97/n+zmytkGYb5cTfdGJAZubq9x7+21EMXSdJ847FvMFre+YzqfMZ3Nm0zkf++iH2Qv7mM6w8B3e6Rd4/ehRTq6uKrFlMcMmwEd81EyHmGMR6ivQS1KpRxZbYExbrLbKhpkVBMzGIC5hrBZLgwVbYxzY2iGiriYpBMR3hNBia8hpQPSe1nfs7k5YW9/AFHXY9vY+H/rQr/HAI68lt4W3kWVJpADAJ1JMtF1gMl+wfzBRKDMZZn5BNAYITGc7+C7CXkXTWMZ1TVVZum6BtZaNjQ1MYZKy7KwKmaX0ln1WXkr9Z3K4buy0FAUs6d657+bSbyh8Uv5cN9786zq6wyJ2o1/pIavz8PF66LUvdj002msC+04zxHjD6+wf+wbSUOnetdCCFrm0LOT9ey5nDr0qxh5eh1RSIkwqUkZDxkFUzaKyKDNZ9PuQopBsMYYQQbLm9DXNgNe/7nXsbu9yZfs604MpK4MRpJZZzuTRKnUzgM6z6BLiGppqhI+ZGMD7jnbh8Zni3VrIPAhd1NxIsUIUs5yJagJErchBKBdoeQ985uuxxx5bs9Z+zalTp7oTJ05s3dQv31q/a9d4PJ4DRy9evPg1jz322Hf9VrDnTRU+EUO2TkkJKSGlAELGZoVpYi5u6iaDWJyEIkvKWIprhrWk3JHjHHxi79plzr/mLhaLfeLBJZx0RJ9LIom+REPWbkeEytVILeSFUq8xc6q6IuZM
9EqvDr7DVra3TiykjAgpIKYCC8HHMmsJbF/fZpQ849GInIXve8/38ue/6S/wE+//16zUDSHG5WsRcVhxCBlbO0xtObKhJtJJjmONQ1KmDTCZTbn+8jWe+vDj+MJO29ubcWG2IOMRydRJvRUjatysGXZl4yt6LicJLTsFEhONc1r2f0Y1XjkkujjHSAJxCg8mhblc1VAPRjSjFYbW4sOU/d0d2kXEVFY35VlLa6bYnPDBs7aywk/86P/JI296N1u71xmYNTCJRG/9ZdARmaMZKPQ8Go7pfGTetdhmgGt14zeF+uPEYqkAhzUVq6sNdaPeqz2Dszc7N73kYZmhd1jceiLHjfCk6uWKBrCHHZZL00BYFsP+8FBE8hwSV/pC9YpNt0gGcjqcm9mSSu+slCJ6o2tLebyoDFGWhS5psVoWUt3fk6TS4UqPai/nghZHzKrpE+kLpZSuXh2EdNygB89UfstmnR9ikupKMyUwy+IqQxQljBmjySiLTkOmU7RgMuvHjzM4ukFoPe1kSrc44Nr1C5qVaRt8CHQpkf2cxhjqasii05T2uQckQsx4r3pP/Y6Cjx4kK/kmJIVBC7yp/7UYW+l9XlIebmKdEpFqPB7PbvYXb63f3at09yN0rvvqFL4MBCxIBTngTCRKJGFUXFvmL9lEhIhJGWsCQYTOZayoq0RKHVmU0WZyx97WyzRHLcdO3cXL+/tYdqlMIuOIMZevaYHYYsIktZeSqsaHFt+2VCZDBXQQo8Ji4g83m5iELJngW8Tpltl2HlPpCX8+O2A23WWvqtk4epRf/KX/yNvf8XbOnbmdq5cv4SpHvwmG4JWaX7w89QStYmvJKri3ziA1rFdrtLOWL/yyd/A5roH3/GPuvP0uXlxMOJhtQwwaFGo4zFVD/7+tHIIQQgBU2yVoJxz6mZ7JUET0yzmJUW0UolR56yAlT7vwtO2cCTV1VTMe12wcO05Kkd3dPWb7E6JvCW2lBKMM3nuuXpzwxOPPcP6uU2RfhOnlUJJLUnlVKbxmfcQZSz1IDHJFCJ6V1mBtjZWKyjms0TxBa6x2pyaV93JjJ8bSyacE7imkV8pZH1C7pKIUMov+ehHb56WfCX3H2HeCOfeShF7wnV5hWdZ3aT00GWMiBu2Kcu+lKT3zsje7TqU/oxBbyj1hDNwg17iRJHOjnRoZUirFvViW9aJ6ayxJDKm/VpSYXaOOPZR5GAUiz1YPZi4ZQm5JtBjrMDjIiZxqRLpyMCzXKOZCiILK1cSsEiZrVD7kmgFreY1mVHHt4mXa+RwfNJhZJGGKL6kzBmvUOFwTJtTCLqTIYuHJsWRbGtQGLpnlaCOmrkC7EUlSoNKbHs8Zvby34M3/2pb0X7zfhsx0c0wnibiUdKYHRLRryWo4Rq4qUpXBqTiVmJDQQe7wEggGsjhS1i+BGCXEdKHl+pWrjDeOsXLsDF22epotJ2hbTIQTOv8KoSPEDmNBrCXExKJtFVp1ooUh6gxiabsolpyMOsSUzSWkSExF64ZAUq3g/mQP01i+57v/Aa+77TW4pqFZWaGpBwDUdYV16kZhjcVZh3M1uczaxIDkxEgajFi251NO33MHNNp+Hjt1gnPn7+Dc+deweewMzcYmZmUVBiOoh4hryGLpQkfr54TU6dwsqlzgcK6khAFJUGEYGItxuumF7OiiwcfMIgS6nMhGsM5iK0cUy8G85fruHp7EqbOnOH3+NIFEGztiCnp9YsTg+MDP/BzHN08jgxZbZarslCZPoKoytROczQwaR+0sTdMwaAY0zYDRcIVBPaSpB1SuwjmLtWrj5lwpaFnDgH3nCd4TYyRFFUGnEIlBRc8xRFJMaosW47IwiVEv1Z7WDyyLSir6sRDVFuvG38v5kLEZi33WkjF6o+yB3ufzhg6yZ1Xqg5RC3TNcTSm0uThuJiJR9XnkZcxR/1pSTBALa7XAyD1ZJqekms9irr0k8OS8DG1d2ptCGRFoYTXWaNpDgQz7Y8ONCfD934so
m7iyFZV1VK7ShA+UIZ0yhCzUwzXWjx6hGTdUTYX3idnMs3uwYH/qmc51a7FWA30XradtA6G4IEU6fJjjfVeimVKZZatpecwGzdrSgsmN1/zWurVehXWTUKfFmESWDkOAkg6QRWnHzkaiCXr6kwExAmmOEaiMICFhs9UO0EftopJCMO1+y861y/yhr/gS9i6+hp99/y/QthOcrbDWEYIy54wRUoqE4KltQ1VZUrLa5QXBmQpsLmbZaen6oKf5BMaSs/5+Ekeyyji1zhGzJ8ZMHTM+Bq5ev8rP/bt/xx1veJgrO1dpnLIyK1dRN3WZOUHKXpmsxhC1lmNJzCdTruzscGRtjRBhPldn/Jgy9XBInQMiDWYwIvpA17bEzoP35NRBn78WlBzkxIIUeEtsMZEuuX8mIla1g2I1RNf2obgEMJnKVVR1g3VDrK2pa8fa+oi1jRHjQYUTw9r6Ok9+/AkGgyHJJFIQBnXDSy9c4NjmKTaOj8nBENtM27VgM9PZnBgCoVNDgcl0xvrRY1R1zbydEzvNLyQJOWn3aqxoiLGoRCUVfE5hxoykgEEPKRqVkw/nftKzIdWVJmc0SH05/4tLRqC6r8SirzuEH2/M8evnZctw2lfc8zdAq8b02Gh5nMPCubzHSsxOD7EDhampPyylX0v5RsZoPysUsBRpj7I/Lap9yykVScuhxVsucK8xRotmKYoRLTi2aCCtddiqUb1dLgG0LEfx9F6nAjjryDpph5LKMWgquuBxVDqfHzo2KwM5sZhOWF9bZW/vgOk8EqTD0BB9JEZPTgpfp6R6Po1SUr1fTOqwlHwiLhYa2CzKIYgxYy1YZ4uBxa11a7166+bSGYhkow6ENlckrLq/O0MwCsO5VJNEI3Oyy/hkyclhkmCzJ+cAOVOJIEndUawRcp4wvX6day9e5k/9D1/Fn/jaP8bP/vsP8B8/8Es89/xL5eRaLd1LhIj3Ct9U1iEJwsLj6lpnL0ZP+SSdqwkURw61AjNGv+yx8xgrhODpUsIaBwF8F6gHA/7Vv/yXfPsXfD5Xdq7qPBGF/0KMGPIS7jJiiZW2e3Pfce3yVQ4OFpAMYbej85l4XfP89vcPmORMGwIpZi0IWfVdxmRwFpJDstNrVQkUGUcWzXlLyQBl1poiPiRSiNpV24gYh7WRFDxGHGKhjYEuQD0wjEe1dq2uZtCMGa/UHN3c4A2f/UaufP5V/vl7f5Sum+GsJebIlWvX+Tvf+V2MmhFuNGB1Y43V0Zi18ZhqWLO2tsZwMObKtWss5h1d3GZlbYXxyoBIxAGD4VBTCXIgi4qqBZBwQwdm+uKWiAmFcW0unU4+jNrL/T96IglF+tEzTqXM1cwrZmn9TDCWA1EsSSLkw4IoRZDeszXLbxV3lryc5fU4ay4wc8p5mZBB0kIsojo5ctTf78cBN8wOl04wUoyxy7wthYiJPTFGlozevrfLWZEWI2qSrvM7vQYUqUIJ0ABU+pL770EuFm/ZLifH+v4jCYc63ZUKGaESh1SGFA3BB4yssHnsFLlbsHP1CuNRQ1PVbO1dJ8SESRWkSp8na3guxWXJlJl0Eu1Ojc2YrB6vIesc32CWCSmHzN1b69Z6ddZNWiIIJgKxBRFMZYhluF9lozduqgg5YiUq1GMiJIMUZyWRCNlqx9Lnh5VZRzvf5uMf+QSPPno3D7zxPF/2ri/kD/3hP8Czz17kX/3kT/KhX/kYPngG2RFxxCI+dk6oqprgNWKmcrWezr3HllO8j3NsMyIBIQZMzlRqJbEMAc0YchGOZ7EsfMbkBR/55Q/y+je+mf2PPw2wLN4C5BQQp7DiLCzYnc2ZzlpSm7BuTDdv2d3ZZ3/SwbYWvvmio6sCfZo2MUHw5K4rvpFlA8uaLIHR+ZR1UbscqRSuLX8vJCRGrCQ95WeHMQ5bQcoBZ4ZUdQWVhcrQ1A2jwZCqqbHOlS5A4a+ubXnNa27nz/65/5F/8L//fdp5VBllTjzzqedYXz+hcivJ1MYx
qGo6ifhFh6SMszXnbzuPSCZ0LcNhxWh1xPqRdU6eOMFo2LC6MqRuamIIDJoBOWe6riOEoBuhUdlG8AHvAyGFUpAKwUUM1unrNaLpICGEQjrRxJCcBCkRUSKQ4o2ZfHHZ3WWgN4XWTkxLS5GRL1dOfQL7jWzMvkj2c8lDFmr/6KpXU/cU03uRipqLm15LKEZJK0aDa3V8aUqXmZYjzuUcfUnwSdicyVLSHyUhWaiyIg86LlDGqil6Qyk6QOMSeRmTJFAQETXP6btqhzM6G3dZY4R86VhtSOR6TDNawdXXmWzvY6ox48GY7Z25GrsDKSo6YUUtBp0Irm7wPiDZUoXS0YuOPoSkDjZ9+HG6ZVf2u329+c1vvvfRRx9dedOb3jT51V/91af+c78euMnCF42BKhN9RwgwcDWuEmIXsSUtAGvKDR+WAa1EvYmj5N64vrDQIhlPzOpQUleZrZ0dfuqnf4FYfQ7nz445cWSNUyeP8i3/r2/i4otX+N7v/n4+8isfRWytw60cCQU2bZyl861G5GCK2LknMURCF5SVarLOp7JGIUmudJMIypCLJiMxkTC4puHf/puf4g+86yt53BXoqLEMBgMGdsxsvs/2wS57+3MWnSdQ4T1M9g+4cmGL+WxC8JGYLeszJZntTw/YjUWqgaa1S4zYrPNGjCGRyoaGFuJkCpuvTH5EEGrdsyzYOuGkJxMWy69ayKbCSIWzFmqDqS2OEnrqO2Se1TUmDbBYhs2Iyk0wVeJP/amv43v+wQ8wnc4YDR1+5nn4ix5ge2+HGBOLgxnRB9oQGI4G7G3vkKrEE08/gUGfR0SIFJhahNXxiMoJqytjRitjNo9tsr6mRXF1dZXxao114Byk2NG2CySssbO7i289MXa4uiIUyYZQCBW2AVS0n5NVlmtOy8T4nrhiComi19CZwypWhrM3+F7esPoZmMghAceUDD/hBvZm+WzUHDph+s5Qeg1kAini7QJFpjIykN4jNOtnmGFplN7/SQ+VZgHJgjWCT/19of+eizGCiNF7PcYlmSfGqCbRUY0BVHNb2K6icL0YtaXLQWfKZI08rq0rJBQpzNDM+uZx5ge7bF3dp13MyL6jDZEuBz1UJotzDdk4WqOzW7+zg1QOYu+a4zAYhk2DzeBI6vqEynpyvGlW5+/q1ReKM2fOdBcuXPjYf+7X83tx3VzhS4GUu5JSUBFCxjhXsH9bxLERYw9ZgUYahV1CIKRC9c4OwSps12enZU/XGZpmzIVLWzz++Esc7I2YnBiweWTB3sxRDyr+0rd+Iz//73+F97znBwheT7gxAjGWeXgmdAtcM8K4CrqeqCAF3urF9oFMxlhHTDoTk8JOtQih89ja4BcdF7e3+bUP/QqvOXcGgHvvfS3Xz7+Gqxf32Tu4yu7+lMl8wXTm2b6+YG9vSogz/KLVTiRn2naOb7Xwha4jijpvxFxsygoFIhvBZNXr5f70L0mDeMVqZp2IskejI+W4pK7HDHWZ+ySBnA2YiiwVMSdMEogKJyr0ZYgxMJtPSSEyn7Ts7+5x772vYWW1YTAe8D/9xW9gfbTOz/3sB3j2uU/xwP23sYinyAguQQqZQEXtKq5dv6YzRmOY7k3ZurbNxQtX2NrZpwtBBdd5hpA52JuSDKRnPl0Kht4vTdNgneH4iaOcOH6cE8ePc+TIgM2jRzh+YkyOXqFF15BiYD6bEjqv7iAhkrzOlSgQY+c17SLlhLE6IyWlZYHTBPeitQNcn8YOZSbYw2wlQqkE5eYsHGY86N8bawsSUJIjBHVHuTFCXARE5T+95MLSw7Fa9PS5C8HFaLtnCoFmKXfIvbBFCgyrjFLJvelBL3ov7F9uYMDSM0r13xX61OfOhQnsyERTAN6SDC/GYDFU1hKMilOMa3CDNarhOu3OlK6Nes87B2JwrqZtA9QVr3n9/Zw6f4bVZsjmeJUQAruLKXs7+1x5+RIXX3qZ+WymyR8mE9qOGJIK62+tW+tVXDcHdYbirWcc
6oihg+nGWIVqCvREKp6dKZGyho5KmbukpKfeiEAymGRxIqQUMMbRtguiGJ577hKOY8y297nzzgEn6xViO2C+8zxv/tyHyNX/wHf9vX9EZStlIIaAS3Fp2puTMt5iv+lkML0ZcCqQkgJKao6cDU4srhySQ4xKOMkJJ5Z/+WM/zv/j970FgKee/BSffOE6sbPMFnO29nfZn3Us5i1+GiHkpY1UzpnO+8LMU9jGxAShI0UtiilH1biVOVDRCpBRVp5aj5mSr1fYd9np70jU7g6F95IkRJk2hfRhSQgWo91DtojJ5DgnJ5UTpFYINlJVA6azGdu7v8Znf/YDLCYd050dvvIrvpLF576FP/7f/xHO3n2GetwQkrIMkw/Mu3I15c4yb9P3mhO0beTgYMrBZMp0OsW33XJD3p8e0HUeI5Yrl69oakaYoj6jgeee+zSPPfZR9WrMiWFTs76+gjEwXhtrUdw8wrGjmxzdPEZsAlVlGAxWWCzm+KAb5+7uXoEnRYke1hQTa2WSxiVL8lCnJ7b37iw2cUZTBnook2x1Ppt74kshuhCXOXsqp4AlzLH83WI6IMs/Ua8VpQMv5SL6ivXPJUeFj4vhd/94KUvReSvFV0u1lkNKUkqRht4An6JSgZyxVsp3Vl9QmSBjEZK1BKOSAgr0aEUUIjVJPUNN4sTp88CQa8+/zIUXDxgNRnTZMm/nDNbX+Py3fQ5m0GBHDZ989glemL4Is8B4tEqzusrm0XXe/iX3E2Pk2vUtXr56mY31VSQFukXHsWefhcsXbmqr+r28Yoz8zb/5N0/84A/+4PEXX3yxaZomfd7nfd7+d37nd7583333Ldvja9eu2a/7uq+7/T/8h/+wvrGxEb7xG7/x8k/8xE9s/nrY8TN5vLNnzz548eLF+s/+2T97eTqd2n/9r//1pjEmv+td79r+3u/93peqqlo+59d+7dfe/oEPfGB9Y2MjfNM3fdPl/ywX6bdZN1f4IqTckESIJumpMHt8VJNZrMFkA1Tk7DEkojiwYEvCds6ZZDo9xUZBooDPOFuRgjqPpLTg0sULbI7HLPZq2vg8pjrOifEd1I1nZ+cib3jLA3zFH/0jvPcHf5R62KgsQXJxh9BkgMbVS2abDu8LhCU6qHGoF2gqugARSxYV2TaDMbPWs7+Ycfb8GUYbq1zb3gHgwstbbB+1TPanTGZzFtEya/Vkn8IC7yMxK/nGRz0Bh7YQWYCQE132RPQeFVCf0hyh+ErmbIqcQ42bc9I4qGy1MyU7tXczGZMsKVc4M8AbLaCVLbZeUYuqztwASWRjMC4SvKGua0xVKZU/dlSDhrb1PP3Ui7T7nksvfIrP//wv5Pt+4Ic4MrTcfscd3P/Q/Tz0WQ9y1123MRgNcWbGwXyBSEXTNPqeRK/1KrC5uUJOiUW7oG072k7h7enBhHbhSSHx2rvvVNlKEep3viUlQUytWs4QVP9HYj6fE0Pi2tYWF166wDPPfAofA+O1MVVlCSExGg4Yj0Y09YDZfM5gMMB3HQcHU5IP3LO7C0BV1QxGI+3cks7Rgvc3ZJqXWz+ojCZGr/rFZEmhD6TVNyzFRKDvsrjBeaTvu5QUaktRVMKMkCmxdCBCIKrNXGFrHs70cpk/Q+84Qz/HK4VRzQ3KgW7pWJOLQuKQvOOcKw8bsZJLIHAJu0WwPdMWyE6KN6xCuiaphZzEDIOGGCOjjSMcOdWRjOf6zj57u3Pe8SV/ADsY88LVy3zyQ0/TuIpz589w2x33QNvyxMce58rHf40UEuPxKqfOnOHht7yJz33728lOqK3BRrjtzjvh/T97U1vV7+X1tV/7tbf9s3/2z44D3H333Yvr16+7n/7pnz7yoQ99aOUjH/nIJ8+ePRsA/sSf+BN3vO9979sAGAwG6du//dvP/ac8HsD3fd/3nRyPx6lpmnT16tXqn/7Tf3rigQcemH/zN3/z9d/sOb/t277tN33O/9zrpgqftRS4xekAvHw5xWecEZIz+BgL
5JIx1uhwWjIhZRIdySy0+CRllGlUi+3Z3vpljRG/2OfK1h4nj5wiXdymsR3cvsLx4yOyZPb2r/FlX/FFXLz4Er/887+KVHqS1lSI3qYs0hu7GwDR10YQJBZzZ6OgIiJYSbggzJ0wdpEHXncvD73xjWwcP0Lbtlz6yBMAXLq0z6VFRY4635qHjoVP5LAgdZ2SbFJLlxYkn0k+I1EZrKBhnCZryKk3AUkdErPCl7lXWkXtjCPEMoMREWxQnxEj6oNIEV9nEwhprrIKZzRGKLUY6RDjSNaBsYi1JAcqbs7gA13KWJdxFYQuk7Lh4uVdmMPLV3d5/uoVQlXz8pUZ1y4/yUcef4Ef+8n3c/edp7jtzDEeeOQR7r7vdkzt2d+fMZnMsULRPar8xFV6zzgxVIMRxhjGVa36ywTtvKNtO8gRIxW1FaTqvS5VphBDBLEMhmuA5djJoxhbwkvL7C0VGUOM6jzjQ6Sqa1KMrAyGnD99BiOG4Sf0s2x9x/7BPkSwWQNc1UXF0D/z8l4KRn0o0ZkatUKEfQSQ74JmSUrRqyVV1Bn0cJKzFnZLRKgIyau+L2ZsOaSJ0ddtTCHuiCWJR3Ii+kASy4nTJ5UAlZOaVYvgg1eNYgiY8j3qu+8+misWeU/MCoNaY8jL4F6DjTp2yGXO5zK4HEkU5ESEkCPGCDEZcgr4tiN4nec5AY9j48QZ3vrOz+bJJ5/iuec/xEIcb3/nl3Hm1Fkaa9jauc6infHOu+7iYOcqH/nlR7l+ZYsLL17gxZcvcNvdr+GNX/AW1taO4ySzP/+MDPf/q1hPPvlk/d73vvc4wN//+3//+T//5//81t7enrn33nsfuHLlSvW3//bfPvFd3/VdFz/xiU80fQH6M3/mz1x5z3ve8/KHP/zhwZve9Kb7fyeP1//8yZMn/Uc/+tFPDofDdMcddzx47dq16gMf+MDaN3/zN1+/8Tm//uu//vJ3f/d3X/joRz/avPGNb3z9/7+uz2e6bjroKtNB0C8cBow4jM10wWOiCmDVm8+oVRaAlDToqHq+EAckzZNRgokU38USQYPXn926conV4QauabhwYZtRdYFmcBI3aKANBB/4iq/8Uq5evMSnn3kRJ7XSwbNaRPVnaihU9RgRKzipFK4tOWXGaucTfIuXxANv+Gzuf/B+Wu+5cHmHJ56/QNsueO0V7fgODubsmhk5RkJMdCnRxZYUW6KPBA+JoGSakCDoqV56nVhWuxCTk34APfQF6k6asxYwydgsSn9PeXmdRDStIqUyqyFrJyeQU8Rmp3PPDLVzkEtqO0Y7iyQquheLJJZzxpgCJlqMaTAOduYHHPgF//b97+OtX/yFpE64duEyly5f4dq1qzz24Sd54mPC+3/mYxw9MeC1993Gfffdy+vuf4Dd/V0uTy8jYogB6FPClxMn1T8aI6SYix2ckFIHxEIc0bmc3OCzkBPasRq1msto8YmpF0EX1qEx2HpAVZXZXsrkoPB1FGFQPov5YspsdkAOikBYY7HOgjE4a1WrCLx84SL59ElWV8fklNWWDoXEU4pL2UMvBk8CxEAMHTlXKOHIlm6tkGuKEbVVLwBSTtRWn18TypUQo8SsjoOdHa5t7/HLjz1OCJGVlRXqZqA5h84yGA5YW1ll2NRYowfAuq4YjdZwzqpZAIKpKj0chEjKKtonQ/KJrtMIIclC9J4UfTGS1g7QiJqK5xgR55AYWdBhbYW4EbfdeR9uvMq/+fc/i6sHPPz5b+eL/+AfohqMiG0gLBYcPXeW6XzK8889TWpGfO4XfRHPP/Usly9eZrGY8fKzzzA/2OfhN30em6ePszeZ3+w29Xt2/eIv/uK4J1d9wzd8wx3f8A3fcMeNf//oo4+OAT7ykY8M+j/76q/+6m2ARx55ZHHvvffOP/nJT45u9vH69c53vnP36NGjEeD8+fPttWvXqmvXrrlf/5x/7I/9sR2Ahx9+uP31z/lfwrq5BPaYyHmBlUGh01tyTLSlwJkQEFdp
qKwxOtOyvaGxCsXxGVtiWDJRfR6TIYovuielMrss5G7OlesXGK7dQZrP2N7eZjgWNjY2yZKY7m5jBjX/zVd8CT/4Pf+C2b4nFgmFUtZZFj7tBotnYj/fM9rx2cqwCDNuu+s23vrOdyDVCtvXtjjYm7CzP8NHTUvo5jpkny8WzBr1CQ0+4GMkppacPd5nvNdZT53UJzPHnslXXEYIWNGEeHX30OKUi2wqUTrmLJishIKlJ6ckJa5Amd0oxT/FnrofdX6VtRuPuTiKIKqBk0xA5344S2WtHkoEDfmNATEZ8RnjPatZeOaXP8wzv/YJTp4+y21338H9D97HYPgGJgd7PP/pT3Pt8gGffmmL5168yAc/+DhvedNz/PGv+yom02e0UJsKI44QuqVg2hglecSgMJxCiCo+FzGF7aefk7W9r2Wv19O5lrWWWFxeYkq4qsKU+24pMDCinXYuuXNJ5RLWKSrgKqGuij1csgQfCDGSUyAGwZe5bN00eKsF2Dmd2eWUqSpHLEHImiJQZtsUET2ZlD05eRKGEDpszoxXVulCS+cDVVVTVRUiauXmnCM57cJSSrjsSK7CZGhjphmvUNcD5osW7yPz+aIkH5R8vxQLkUrZuykLMSWswHw2Zz5fEFEj6kHlcLVjMBwwHK+wurLKoG4YD0dUzpEIDEYNw7HCwdY5RqMRsZszn00ZNjU+ZyatZ3T0NNcPDhjVQ/7CX343J06dwSfY25+QFhEnDnENhEhdG+6777N49ulPsL97jdvvu5e1I5tcv3iRlaaim0ee+ujjvEYe5Fh7y7nlN1v33XffvK7rV1yc8+fP/44psJ/J421sbCyjMg7TRfpsmN896+ZYnTkUPVT5QseItTXRijq5RB3KG+vo09cpbDS1T1LbMPUZ9DrzM9rR9FEqJqM6Iww+LJgcXGHr2hpH14ds7c8Ybk0ZNCtYk/Hek6ctMWW+/F1fzE/+2E8RvRZoRKnafYC3nmqK00kOJfAzgIUzt9/Gu/67r8TUjk+9eIXLL7zIYueA7L3GBBHJRvC+QEaLOQun6es+REL06q4hQb0Wo9F5XfJKWE2RbFhqkqxIIdr085deeqHFK8kNrLuSJ6c0xSJORl1LchFMi7FaykR9ESns2RAzPkaqkmsoZJxLJGMJSfApU1cVlUELIOotmVNSW7jsqVfGxFBjGsfFq5e4cuVFjKvZOHqc2+++k9tfeyf3f9Yq8/mMrWtbXHz5Is9dusrl7R0G4yMImfFgwHhlhC32Wb2NGCgTMyeNKNKCWOk9FTuMSaSk7jPOOYXOi4jbWRVZ63ws0gWF5FKxdPOhCP4VyNb5YXFAiTGxurICwPraBn5lhaYa4KQmRI33ySgasHJNzf2N0zSBtluQg7KRrXHgBF8KshPVsqq3ptMDhLGYnhNjLTFYXMpkidSDhqpuyvvWH+m6TsNYRf09vS9kIAExjiNHjmhsVRbGowEZIUbN7DNG5RnJB2IsB0lTkzHaEWd1NCLDrO0IKWi8V+zYPdjjyrXLXLx8GYnlvRghmURVW1JKzBdzus6Tc+LYkXU21jdw9YA2ZJrRGufveS2/787XUA3HLLrA7mSGMRWVsaSo7OMcYVCNGNTCYjHl/te/gatXXub5T3+Ken2Tk071fnUT2J1P+NRTT/FZt53/ne9wv4tXzpnZbPaKovKWt7xl1nu8fvVXf/X1v/JX/spVUJ7C+973vpUjR45EgEceeWTZJv/Yj/3Ykbe97W2zD3/4w4OnnnpqeOPjfc7nfM70M3m8z2Q9/PDDi1//nI8//njz65/zv4R1czM+EXLW1Oaq7nVNVoNlg36hWu+xTbM81UtMajBttCvRQ3ythcj20oieoNF3aUowSSZh4pyda1c5snE3Mw9b1ydsHFlnOKjJyeGnAVxk88QKb/3it/Dv/r8/i3O1ip5VmAfl8RQWK2w6oxv8u77ij+CN8OmLl1nd2CBKA2aAMQt8aOmCJxqIorM5gG6xwMsEHzMhZ2Lq
VISbO1ISbLLYHNXVJh+SanqyhP57iRGyUk7SKqbPUdT02vQWVMXxSnQWqVR5hUYTFPaqUtIzGvRpy3wqFgZgzMXZo8hNSPqYPmbVWmVRpxjJS12aFaEyllmIuMEQcYaN1QpyJHSR6fVdHrv4S+AMoyOrnDx9httvv523fMGbGY5qHn/iSeYHC86ePMH5s0cJccHa2hGGwyHD4QDn7LJ7izEWiFDvDe3UPMZkrO3JMvrl7L0tpXRz04P94p3qiEjptvS6GTFlBhiQDM4Yhe+AeqpmAseOH2d0/vaSaK8aNVeVmaJP5Ou7ADz08MPMX/9aJQ4ZiwRhtliwiB0+J/b299nf2cUkvYYxZWWL9rBuyuDUULqKEF0AemG33qdGZOmqIqZSGzh/eODOQNd6fIHnrbPE6JUokxLZ6ozYClCYq+pWZKichdzSVArx26YhpBorKhrfPHFE9aPZYrNTSU/0RBNLFy5UVbU0EoghM5nM6JJw7+vu5/T522lWxrS+Y9a25bsuhBzIyRfLtYqQkx520DnqwncM149x531jXvz0M+Sc2Txzlpeff556WLGYz7l84crNbFO/Z9alS5fq8Xj8hhv/7Nu//dtf+qqv+qrrP/IjP3Ls3e9+9/n3vOc9J0ejUbx06VI9mUzsd33Xdz3/lre8ZX7//fd373znO3ff9773bfzDf/gPT/3UT/3UxuXLl+uqqnKMcVlM77///u4zebzP5PU+8MAD7Rd/8Rfv/szP/MwrntNau3RK+i9l3dyMT2TZQcUYccaBTdgUFJYRwRJxIRGtzgSsCDlkTWm3giRLSh5sBWJ1zlO0VUKh9/c2VckRO8dB2OPK9WucOXOCWZiyu7VHdXKdFA2Cmvt2kykPPnAPVy5t86FfeZyqMkvaPBSYawl9CieOHeH1b36A8cl1fvVDTyJe2FxtGG6coBmNCAsDHXgPOSRCCockgS7iTUfOURl6aJGNqD7QRk2Hz1k0JZtISkIu3KiUPDl0KlmIRjVaKANQnCAFIuuhT5FMNhq067J6M0Y5VGcJfURUgyRLJGKsntpDiJqEIdoFSup01pdBTEXykUUEV6nBd20dNgqxhKbGFBnWVlUspi5MvowFRmlElEicz3nxqSd5/pNPkq1l7egmm+tr3H72PCtuwJWLF4kEBoMBt912ljNnTqn5sTWcOnlShdHl/hLS0sTcZsHgEVeXrlffg3N6CPAhsegCo5UVFr7FYGhnc+rxSCOOyIjN1E45idnreySXFAKgrmvyoCmeqFF9TskYElIJ43UdccxnE2xlaaoaky2py6w0DStO2ZXHjx5le32bZ596qhR0q+JvI6RQOrJYIQiTRUvMntrVIAEfAtkU4o9VX8ssXokqUrw/y/2htgeaeuBDVBQhej0ARU1m9xkkOf1exUwbZ/qeot7Drfd0XlGYlfGYq5cvq6NP01CbCqjU4EAyw3Eh4aREbRvEOrqQOHbsOLedew1n7rgbTIXvPJODhWZnhqDz8wQiiv5oWHRvdwhdTkSTSVh8FrKFzePnuPzis+Sq5sjpE2xf2kJiYufy9u9ga/u9u37oh37ohfvuu2/+wz/8w8deeOGFQV3X9syZM91b3/rW/S/90i9dMoF++Id/+PleWjCdTu23fuu3vvze97732Cc+8YnRYDBIN/t4n8nqn/Nnf/ZnNyaTif2Wb/mWiz/1Uz+18eijj668mtfgP3XdVOFzWXApgWmR5JFiKKzszgpTGBeSk7q2iEWSgDhSavEBqmJ0HWNS6rok7TzK/ISkJ18xmToJyYOYzNbLFzm6uo4ZRuaL4vZifHGJUceTrp3yOZ/zCBdeusiVy1epq4oUtPtOErHOYaUiiuHBNz1CvXqEbuGIC2XkXWpnrLbQrG5iaRAaclKnC++DhqiiDD/PDMnKcM3Z03t29m4xuf9Pmcepj2OZPGWWBVtMfyovHWFGr2NOSyZ7LoxFJb9AjiU4VXFOYhHfg1HzZ1OrF2bWAtjT0DWOBnCJ
uqqWr1WywftIGbspAaaI5B0Wm1XbVXZ0rBEasRAMAYMbQW3GmGwIKWGcGgL4mPjk088SoxImrDUs2kDOwnBUk1LiyU8+yetf9zpGoyExJ2weaJRSbUnBI2hXbY2hclo4JOtnUNU1vou8fOESx44eIybhyac+xUOPvEH9ScucjSIPMeIwVW9zp7oja2p1fbGZLnu6YjhgSuGvSse1vrFOW1XM5nOmkxnj4QqDwagU5IQPC9Y3VnjdA/fz+OMfJ3UlSzAquSaT8SnQdYHpZEbsIjlN9XsAVIMBrnY4gdippjXGABKwVUUIkbbTxHMpH1QKkcoKxjlyhqr3uAwByZGQI9knkkSSZKBCrCsz9ASoa0sMmd39PZw1jOoGg0Osox40iE3YymHdgNYLxzbP8trXPcDq6iptSPhkiJ2aUoBXl6KY1DKv9wTNqp+MZcadRNQoO+s9W+VIZSAMxmyeu43nX+7Is5b1jaMs/PXliOG/lvWZ2Hq9+93vvvrud7/76m/1M3t7e+bHf/zHnxuNRhngE5/4RPPX/tpfOw/w4IMPLrMKrbW/7eP9Zg4yv9nrPHnyZPzpn/7pT9/4Z9/2bd/2X1zLftOsTiSVvLeIixZyUHKGKSG1WWENyaaoCzTKJUkmierflb8HZZ9fEjQkFSFvVoxZk/4clUmk5Hn55cu89t4zbO/MOHfHaQYD6GZziJnKWWLoGI9G/ME/+KX8+I/9S/Z2D5bO7sZqMQohc+frX8ukWzBoAznPaWpD2y6YLTq87xgvWobDdTJQ14b5oo8D0sIVU8THjC3sSyUqlnlbcbRXQk8J2jQ6Y+rTwXLR64moLo3U9zJ6LVyho/dBsxk1KNbrJctZaO7D7kWDYZ0YkkEf11ggksRjUyqCc50XhpQxOVNbwbkaUzUkSYS8wMegrFeMSk5iJneBZLToi7UKtSqfXj/b4qBvgGEzIFcaVbO3P9HZnTGsroxZH66waOHli9c4dnyd0WDEbBH45BNPcGRzg9l8znNPX9AZscn4tqWpa0xllgcCZ90SRhRBU7xDUCh2fZ29vT2eeelFVodjZcVahSjUXlVTPEIIbD73Al8DfOADH2Tr088jRkrED4Bu1Bk48eIFXgv82q99lKtb17DOMplMOThYcOLYCU6dOo51mRBa9R4drTIar3H96lbRviWyBAIZU1WElGmGA6TO+C6wmC8IKeFjoOs8jTEMqlpZlQlaH/FzDYctynMESl5dUseaBNbpNffdQg9Y2ROSmsqLVSlDl9W/dGmuEDOz6Yy1jSOsHT+GpEwtmn7XatAebQ7EZDm6eYQz5+7kyLHTODdi5hMpm6UZevCdFjtrqSq1j0tRoWkRjcfqvPqxLrqosLQEchIqcWo2P1rBxsip03fx7KV9nM242mne4K110+u9733vke/8zu88/frXv34mIjz22GMrbdvK0aNHw1/8i3/xtyyav9fXTQfRKuX8BveTmEheExNU/0PJ7KOY3WYwOumwRo2pU9m+lV+pkyujzZ8WPdENPzjBJCGlKdY5DiYHHOy1jDfGXL865eSZVZx1dL4jpcxg0ECKVFXmi97xBfyr//On6YV8EhMYx0Nv/myOnz3Dy1cuMZ0JGxsbhLAAAtlnprMD4qQjbgj1oCKnjhC9wprLVO1enZWW4mAllSQkqb1UllhYm6VYZSjompIvcsBRCr4YdarPasFGIQaZ3gGFuLz6qQxBrWjyeZZUZouRnCy2asomrqf2nMFElTsQEzl6dbppPVIZsnG4XNG4ikFt8DHRzaN2syUxIYZIMolImQmScTYq5IeQgrIG+y40doGmcUwmB6r/sjXDMbimpguJ3YMZrrbsbE8gBXLseP0D93PpyhUGAxWUJ4HKbCA3yAQy4OqKGCKLrsXHUBzIMrNZy97uBUUfak/qwFntgGvrirxFSFHnoD4UpMcYFcpnwTgNFlZSiH41Qr/pZgghMZvNyRnGwzEXL1zm6aeexvt5iakq4m9TEXzJclga
Iwi2soToIUWCV/JNjJGrW9tEEq4yjJqGHCLj4RrNoGFldYWV1RVyIQKFGPTgI1LMuT0+ZXLXYYHGObX6MhCSdmsS0Hmf6GcXo0Z26Sw8ElIkSQQf8EnTM+p6yKLzHD13lrPn7+To8TNgK1KytEHT0wWDSYrONMNGmd6i90vbKvGr9Z6DSct8PifngI8di6iFWkyGpFFiTVMj2TAariHJcu6uB3jpmY9Rj0cM6vpmtqlbq6yHH354fv78+fajH/3oeD6fm2PHjoUv//Iv3/kbf+NvXLzjjjv+q/aBu7nCl6J2AWKRLCSLDtZzn8pdjJWzRuWkQuwQEYh6wpQSFBpTwpBUUpAzMQkStVAkkwklh85kzW+TZEAil16+wrH1O9k7mDPcM2ysNjSDAV3X0nWBpm6wFo4dW+Guu19D/Niz5bXDyTNnuOf+h3j6U5/iYPeAo0fX2N+bYqsBYV+1XDEIi3ZOSnusbK5Q2RL2mYNuWkBOOrw3ZfaS0NllYZtol0sgJ1NifZLavd2Q99azDYXSFWt0hcKZqff5MMVcWX/DANKHnZoSV4N2kErqCCQiytRXAbR1rhgPJ5KNpAA5GFKMtCHQpinWR8xopLo+Y6lqQ8iBlEp8UEqEHNWWTjSeqrMJyU7d+lMRxudELuG8ufO0uUNcxerqkNWVMeTE/sEBbm7p/IL93QPuvP0ci3bOCy+/xPbWHjF7BsOKNnRYV6yyAlo0UiARwCSqxlFRcttcpR1FFxkMVsjWMJsHxqOBxkdVGu1knQMTGNQNq2tKNBuNHatra3qASUm1jM7q+46RqtGviLFC0zSMR0NyNoQuUtma4WhEiB2LxUITsFJk0XbKDs1qyh580MNZDgwbtUxTIf6QlBIbQZGB6XRCaCMrozFRM5lKYY9Lk+9iZFccY8qYQYTKOnKMxBjUkk00+UOy1VlzNsv0it5M24jeO0kSGCWppVbIybLwwp2ve4Dj5++gGqzgNe5BCWuFaGSMLabVmS4GYquykrbraGcdrc9szabMpy2+9QS/YOE7fHlNVrIyuDPE4Dl6/CjNaIg3NW60ztqJE+w/c01NAW6tm17vete7Dt71rnc9+Z/7dfyXuG4uiNbY8m0pnn9ZGWWSVGMkuSPnocb8pI4ognHq6WgRhUZdV+jXRUuWooZtikNocFnF3yIJL4JUGmwZgsWZwHx6wHPXLnDXxnlmbWA8tBjrcNWAmOf4pBq5xfQCd9x9lq2PaZTQcG2Dz3rz53D56h5ta7GmYjaf6JcXwVJhbItJhuCn5NkVTDNlOBpDrvSEnYqxMQGbayRVSsiQWCjsAlaU/Zkd2SSCBLxRkkXUSqneiK7SrjmXjD2pFcbL2t9Z1D6rdk43rNABCWOUvBIFJcHkUkJFmaJBAlkHKBgzBOewdYOTpJtcqPBtROaemFu66Ml+xiREqtFAA3Irg4QI0au4uiQGGFHwzvY+rSkoYcFVCr1VmRCgdkNySIizDOoBTe3wsynbe7v4nKlHQxZ+xGT3gJ2NPa7v7rN59iR7W/vcc9drOHnqOBubR+miJ6bMoG4UAfABiRHJGVc3GFOVbjvjg+dTzz1H5cYMmgEvX3yRuq45dvQYlXXKBoVl7t/GRGf2J4+fojl3QjupmOhaTxDBVQ7vPePtXQBWV1aIR9YKhKzQa0Rog2c6m7G/OyEsAiG01FH1dDFGhsMhIQTatsU5RwiBruuwVUVVKalmZW3M2toq169fJ/rAkfVNhuMxs3kLJmFdojY1OQQGtS1yHdUnGlcxENV5ZudIJdXcFD9XQZm/PfoiJTHFOoX+q7rW+a2FZjDGm4yrLPe+/nWMjpxm1wvrOLJvSSkwGA7IWQg+ElDrwSx6YGjn2vlnNB9yNk9M5i1t2zGftUz3Z8zaBSknXM7U1mJMLqGzlnnnceMxzeoao2HNaPMItj5BqJ//T9jibq1b6zeumyp8KZdUZRUokCVibCKxICaDyQ71jhKSGDKmCMm1Uyz8Ct08rSVF
3UDVbiphnTqUdFlPuJIjOQnR2BLSuaBKDVcu7XHu3N1sS2R1lFgf62wmtD2kY3DWMB43bBWI8J4HH+DxHU+KMO8OODiY4xYtddWwMlpRCneOiImEqqJLnv3phCDgZJU6O1yvjYDC2CxOKlbF4fQG2fTApCWljoyH3HduhagS09JiK5uESWXukfX6LD0gleeiOsCYyMWvUQrc2ou9M7l4LapO0adWHU7QcE/nLJWtsU5wRoi2w3sgQO4C3rd0U+1MVkYjKiO0BWQ1ywIrZVa5dJjU9xwz2SppJBlhHubYCI1rMGRmuwdcn+6rHVeG0coK0wPV9W1t7TFfwN7uhPlin9l8zt7+hJXVdU6dOoWrirVc0WBKzNrRpKgWYIj6onYdj6yuc+3qNnu7E97yxjfx0Y89zl133clo1OAqNdKOXUQy1IVie8/dd9A9/ACLblHSzsGDavSA0VANJx56+EG6hx6ijzbKPRRswIfI88+/jGT1Mn30sUc5fvw4p8+cYXtrixeef569vX1SSrSLhUo4UiwElqhU75w4dnSTFDJ7exNlXdYWZ3PxW4WQMzhHXdV47+kF5ToTDyrfqJtyXyaqSqMeFWlRP10RR4ralcUisRFjyFELWD0acu/9D1LVY5577gorm5vsdxHfzkEicXuHph5QGUvImc6rY5CrKhCNrkrRA46UM6PRKoIaogefNLrLBybzObHtMIIyuAsLXFzNkWMnmY6GuGaDo3c/RHf5Eze1qd1at9Zvt26S3BIVl1egpDC5gOXETqndpid6LB1GTEkNKH9AKoxIQS2YE0aSOlyAWpeJzv16V5KkTRLkBd0ic+nCZVbvOcVs7jmyNsTHDmv1yxZywFWWtpsxGA/gYIJPmbaNtLM5YT5HYkIGFt95WjpqWxNNok1qx5ZSRzdvMWIY1BXGmqVTf5FNq0YpZi30MRZGZR9Vk4lJOyRBoZ3eN1R1VyVNXLQ45RT1UCDqX0rZlPpiKWIQpzPSmBLOHNp4IUV4nlVDmHIm5Q7fCZZERdJ/GoNQYV1FbWtGzYAmdnTzBYt2TsgBv2hZZLCDSuG+UmStsRD/f+z9ebhtWVnfj35GM5vV7O70XbWn+iooigILEAxRULHBIPkZY683N8SIf6jxRo0/HtF7E/WXJ3a5T6K/eE3UiEYEbKM0CgLSCFhQfd+eU6fdzdqrmXOO7v7xjrnOwUTgKGKMNaCeqnP22nvPNdda4x3v9/02QvOPKRJC/ztBmSSFPUO93jUURU1ZWBazKd5B61qUlS6wa1t8JyG059I22o6Zzxtc17Iz3WE4HvDwgw+SkufoZYeXhB6rrZhp5/sn2jexprPWYAvDwSP72NrZZG3PGs997q184uMf50UveYG4CGUjafkZmaGbAk07JyRhI1pTkJIQXbRRF+a6pBwc7D6p+MVOpApHDh9gsjvnF3/pv3D4yBHuuusuvuiVr2B1dZXn3Horjzz8MPfdex9G9wcVYSvbbFGWkviCBhJr62vMpgt2t7cZjAuq3MlpC6HrZF6Z0hJy7FfKFGBJZlBZFiLenaQk88rkiUls4mKMKCti+Bgt6wc2OHrldWDHtB5JWNdyCCurIUoFNrc2abtAZQyt9wQEPUkLj08dJlUQIz50JFOilKWuB1TlgLIsiVqxuTMhuY4wE+KO0o6QOkLsSCkxO32Ceu0Ag/U9jPccYM/+v5sC9mfXX9+6pMJ3ZYzcGhqZs2AxROgUZF9FfELpgUQS4TMUIjozEbgncTNJDSmWUhh0RCUvj0kxd0s5Ey1lcgyJqDV4JUkD2mMeupcDtmHfpGatmVJVEDxEFwgkFFPM5i7zDG0dPL/JVXbBYtaRfIsLU7q5wiaDZYd6MCApxcw1xGZO8B0uOMrOU5QBXRmudDLjk1g7SY+PWjaZ3KdltTky61NzdC7u8jyky+hndlkAgUoGoXYGlu2iEm1Tn2Stc9FNRme5hMwde8F/H2OkIrlPiygvM7LoHMEX2c+ywGhHGYX0
YQtDsTai8iVN2zCfL2i6hWyQOvbhAtK59wnoCFNW0gwgGo9WJSpCmSTM1ShFM5/SdYEQhPVrbUlhxTOUkIhBMZ8tqEcV29sTut0Z63tbZrOW2pbcdeddjMdD1tf2Y/tZUOhEGtB5nOuoakmXL8sCaw2d8Rw9dISPfvij3PGCF/HBs5vix0qiMCW9BbjVImeY787wQZi3NpOBAGJwQijJWXCdW+DDIstClCTGx17wn+jajrIyjIYDXvzCz+PW5zyXP3jnO3jJ57+EA4cOcOPNN/LgQw8w351SD2oWbYfrZpRlyXg8lo9GFANptKKsDANf0kwaZrZjfa1iYBQ+ePHFVSK3kM9EPihqgTV1zqgMBpQtcG0L3uUDXVw65GgMtixog+PYZddy5LJDTBaRpx57ksuvPMpkNqesalSAsjS0bUNdDxiNRsybBS50KGOYLxYorwgp0fQ2bMBgYIm1RxuDKS3VYJ2VjVXWd3bZPr/JTn2exfYEt5jjO9kHYjuj63ahbVmcP8t0us3+i7xan13Prs/GuqTC968WM/7VYvbXdS2Xvk49/Bk/9A0f+dPPyq+ca81mYeV4HpMIjMkFOsgcrVf5G+tI3kgnpBJBySbaC9RDyibUkPUdOdc6pSzD0Jm0IscMjSaZlOFNkTjELItA9TEyJkNdgRg8Kka0lyw/rzTGylyxc4F55/EmUA0qyqJiNBoxGA7YnezQdY7oPdZonNB45fciJsyGvtMwJDpSLAVy88K0nEefPTQVSg8oigpVFBLdIzwoFAYXPKulYntzh+nWNisbe6ntjH1rJUqVfOADHyF6jWsd1li0Fbeb4WAkMz+l6D3BUoiZ9ah55sQZ1oZrHDx0lI/+6V3s27uORjPI88I95yccAdwiMNttUFYxrMWDVrrCgHOR2MphJcYkAndl0LrAahHgm6LApkhpPPfd/wCPPfQYP/wnP8xwZcxoZYX3vPuP+bpv+Foinpe97PP5w7f/IefPb1IOa+btgnnbkIx0/UWKmKJg4VrKumLIUEhMpeVP3vNBBqViff9e1tb34vxCIFxbUBY1aEVdlqioCS4Ik1pr6uGAJvUMU4HNfRBru0gkOcPBw8e46rqb2G1nzJpd9u49wGS2iyMwn81YrcfM54amaRiPB0y2Z+wspgxXh9iiwHVzppMZzSIwDwVBWTaGlWge546qKKnqCmPkPbw+HFIXBcPhgK3xJr7x7Gxvsbt9DmLCdxNoZpiy4cyJbSYXB/k+u55dn4V1SYXvn4zGPKItSRX4pDA6iqZLZ/NZJSyvpGrR9xBIGb70yQPC0tQxn27JUKhQDZFqIrBP71+I9ki2tBXWKAHFAIWmsopjlx3jquvW2L93lfF4yHw+x3tFUI6zp7cYPHqK7/vIXfy/bryR+1SEaCEooppSmA3m8wlWCQMvFTlNPjmCl40/hYDWhsIOicayU2qesQoTJfA1qkjSMn9JPmBUlOQJQIUim3mT4alcEBG/05RAJXMhxw1EIhGz4DhZkg8kI5pIqbUX7J6U1lmzleekZH0bfYybpGdHEtEpUmpQJZhhCVn87dqOrm0oygptNHVds7q6ims7trbPE3JBFW9veb6lstKhgyR2pyhM1JzkHZKj8wmURtlKSBelkUJhND568AHtvXRUybN9fkrbdpw6cZbaDrFFxXhcYIxiPt1ldzInBGFWomDT7IqRt5YZn49Q2Er0jijmjedd7343IeX8ZL+gtJq2dYxWV7nm7CY3AO9/34c5ffIUScelZZ6JYv3VJDh88iQ3An/8R+/j6fseoqxGaGVZTKfoskRbS200zXxB6zwHDx6RcNjKMF80hDPn+a23/g4rGwNiK23Y7mRO4Rp8FKLYdDoXXWfX0XUd5cqYSENsPc4vWJydUUXL9mSHp0+f48D+Ixy94gCdc3IoaBb0EbSlreg6CdcVqcYMohdHFReW0LsPAZ88h/Yf5fhVN/LOd74PpRWXX34lq+s19WBF4pB0YnNzi/k0MhqN
mUx3CUFhi4IUDbu7LT4oVvfuwU5mPHr/oxw4dIy2jQwHBZ2LzCc7lIOatdURKiWispT1gEOHDrM6XmUy20WPK+zKgHZrk1Zb5ru7NIstklZ03bOF79n12V2XVPgeMpb7ypqYLG2AwrKE8pIps9WYRZkSbYpM2C+W8zAVhcRB8CQkWywmjXIdqA6x/zJ5dhQxNhCUGBWTCpE+WIjBQtSURnPv5i6fpw9x+cqIwwf3QdK4RizFJusbnDgpJsOP16s8oBXdosWU0AYgOHxVYpOWTSQaKlOiigJnW7wXdxqbNNaUpMJiTJJuJ+cRikg/GyQrtQy21WhU1JlsIoSUPigiIH/uu72Uk7972xSlkti6qYjRIqT2vX2V1G2JszH2Qjp2Lox4maUqI9AniNYvBSezUg+xqClXayoSad7iOk+3EDu+ZjanKEvKi7VTCpKSoiZFt5dsCOmEBDF1OUUCfPAEFFZbjClI2qIUhM5J4kHy6KokaY/V8lov5g0+RGabDWeGE4rxCEzJaq1ZXV1BYVgsXI7B0iRLhj6F2FOVhejUjMozsxIfNTEanE8kxgQfKCqPKoxkAQHWFChlkWDDQF0VaKVpHagukFtxmtYTIyyaBUYXlFW5zK1rmlYYz0pe3X0H9lOPBtxz3/0YXXDX3XdRDixVMaBIpWg0dUGZw4JLWzKbTtFRETzsbG0zHI2wKEJQrIzW6I4a1swa2+c2mWzvsL1ZMK8LTHYXQsHO9i5VLTFFRDmQaCOvTVnK58rk2DCfEskqlC34T//3r/DAo1tYazl+/ARnN89RVyV79q5z043XcsU1V7N+eMhsukvXBprWsbo2xHeJ2TRQVhVNG7BVxQ03Xk3berppx4nzW9R7VhgORzS+4/TpswzrEl3UtD5SlhVKKdbXNihGY8qVNbqVDWZmBOoMs50pcbGF7zWXz65n12dpXbJzS8zkApX1ehI8lmQ+lTcdlad0SecEARUw9Hoi6RXUkv0iGqOgLiS19cHVKXqSlsfplCB6ggeDXVpK7ezs8uQT59i7Z0yzGhhYQ4odSnUMR5owlA08eI8L4BcLWtXiaKTwaEvColMm0USxqErEnEptxO0+CrkgRaRDQzLXUp496ajEHDgG8SkkopPNQZ99HJGsBk9Lh1GWiKZIfRhRf7IVcoto5hI+E4KiYunVqJIYAWitBQIlEhMYnTIrNL+0mTlKipLk7mE62WRYROrhihBLdhOpi/hOnntoGhZtgypyClvWf5FfG1I/K4p53qhJOGIqIBoSCqtLtCqz/ZWmMIoyKAplcCSSiswXU9ZWxnSupXUdWhfEqJlOG7a3p5R2hPKK4dBiyoIK6Zq1MSSdKLLjj4+Bpm1pnbiEmBQxtZAuCjOgaT1t14CPdLtePCxtjhiqheyDUuB8dgwKKKMZDA3DgcwC67rClKU43WiD71p5z2eSUoygjaEoCnQwuM6zOl7FdYGyGJCUFM96MKQqLW2S1IzOOdrFAq0lC9BqRTvbpbSWQVFSaoMqS+qNDTa3znLNVVdy390P8dgTj9G5Fk3BsB5QlhXD8ZjVtRVW1lYY1kPahcMUltXxGotmhjIFMQoZp6hGDFbWaBrNyvo+XvU1r+Tq66+j2T3PbHeXGCMf/vAHeetv/3fqeoXjV17GtddcxeVXXI0tK85MzvH0kyew5ZCiDWxtblIPSg4fOYgtHM1sIuHMuzP0EExh0UXBYjbHpYakCqwtGY0qDIqp95iyYrhaUDhDVQ85HafMmx3axeRSt6ln17PrU65LKnz9iDmAxLHnOBYga8lS5g/qbMPVB29K7xFjkm4t+5mIpkoCOmXmADqlZfHr2XzSXAmFMGZ+fw8baqM48eQZDuxfYf/6PqpRoLSamRNT573rewBomxkdmuAWONUSNaTQYkrR3fXaJ5D9TGu1FAqnLLBNTnR60WqB+LIfqQ5e/kkiOo8qW69loXlSHsgBskjHRoromIhKoEtS/2wNRGEUgkTkkJmL+WHoJJZj
IdtrQc+izW44S+stmf0JSzEh3pWJ2HUstkTIPFpfQduCZnsmXSoJgiMERwxa7nd26VFZtGzRxAsG75LsoFsxNibDstFitJU4oBRRzosnaISoIzbIHC0C86bBJxGEJ52I3jPZ3MUQ6YaWkUsMhzVRR174gjsoq1ICYoPM3brgGVQjKlNniUgg4nBdS/DiJNL5lsF4jcceeYxd12B2tvMnQFENSowuMKqS1A4Qun8IqJAJSdlTUxIMIsYaMTbIh4+YoNKl5AsaIGgOHjzE2dOCOLQ4NvZvUAFVpShTIvrEeDzGuQYfHNYYkksM44CYLcOSFnZvUJ71lTUWswW3Pv82vvhVr2R35zynnznFiafPMByuMB6tMFgZYGvLfLpLUQyx1tB2GcpWJQlP5yKD1b2s7DnCcG0Pt6wfY9IlFn6CLhL7Du9ja2fK81/8Im697bl86IOf4J6HHuHEyRNY9VGec+tzufzm67ji+JjdyYSdrSlVOWRlZY3z52ecPnuO0XiNQ5cfI7YLzm1uUo9FIlMVJX7R0TnHInRMJzsMqpqgNKqUgFtdGIrRgGK8l0G7i9p9NoH94nX06NHnvO51rzv96bw6/zrXAw88UN5www3Pef/733/vS17yks8oveG1r33tlTs7O+ad73znI3/Z3/s7v/M7K1/5lV953dmzZ+/ct2/fX9rZ4BJ1fEnYY0n8KsFmVxPRCZGS0KSVkqw0hXQeyZMUOJ0otAGXQBtMkenwQWZrXsbtABgUURXoqIkqEbTMrvqoHqXJkSqaeeN5+OFTXHbsKmwJtvCYJBvTaLySLz4IaUHJ5hx8okQRWkdpLNoomtCisOLAoUAXJssILN51xK5FFwWBCMaKObQWvaLkDWpIAR0l1Txl2LNvZVNmZhbJomJBRIswXvVUdFAxYJRMMn3UaExmEpLJM/J0NFrSIJTc797pBQoinYhLsqem1oVAlIi/o0Lhm4WktCvNeGUFu7bKfDbFNQ1lUZPa7DSTdZHyAyQpIOpATDIvSgmiFvcWSYAHpQvxbMym0jEEsRmL+bpVIHhFVBHnHbQQgyKZRNd0YBboUlPsRFQcEUKicYGN1TFVqdm/fw1bSHxOSEqibQYDCmWwOWw3JoePCq1KQhISTEiK/Qf3UNQV48uOwP/nx7nttuey+kVfKIHBXnSMPkacl8zA6hN3Az/Ni+64g50bricic1WdtZOt71BGZ41ei3Ne5CQRFo3jzP7zTHenVCsVjz3yBFcfvxpddnIA8EnMpxeWtm3FRzXBykpFXdbE4BgUlq7z4CH4SFCwO92iLCI6OdbGA9pDB1HK4L1nZ3eHOtT4GCi1pSoKTKHoosegcNFg6ppivMr9jz3F828/zPbOKfYcPoxyczZWxtz74KPM2sSgskzOnaEaj/nW1/xTBjry1OOn+cCHPsKdD97PLddewzW33cTe9b2cO7PJ9u4um9sTBiurDFfHqBQZ6AGD1RGT2ZyREVMHpSSxwruA8wtSjBTVgHYyETJXCGjtqYsB9WAPuvobCO9+6KGS06f/4v3x4EHPtdf+pUNf/yrrT//0T+9bWVl5Fv/9K6xL9OoU4oZWiqQjBJP1SAARnSExOaVGIayoQsTrQaymYiEmzCppCbZNAYXFZFNrr7MZc4QUJYE86ZANnI24hqQM7YVISBZT1exO5jz++AkGw8Ns7CsxKdJ2LfVAnqLRlrIc5pnMLAdyKpQB13YolVDGkJRAdUklUpQppVUShppiJDpHjBpTCLmE4PMs80LR7t2kwjJAtM/Kk/eq5FjITE8c7JGiEqLAuXgcCq8y9T8nOcQU8Bn6DcqKJ6fO8GNfXXOAqFWKFHLHl8Fn6OeIRqDSzrN97jxd07K+scb6xhrbW4n5dCrzMGNJUcgtKaW+H8zG2SkX2v69oZfPXZLEFTGJj2uMgUgQC7WoJCsxaKpRDSHSLVp8MMQoFmHRGZiLNRoq4eKQgVLUVcfdd93F/oMbHNp/gKOHr8BoxbAuSMrT
RYfS4oCqraGgIAaB0mOS4OLRSCzUVB5hrq+vYIqMVRpDORgSiWLXlqDcI5vu2saY4tB+lDJy4IoiGYlJyEMoMTQI+b9jhKIoOXduk9JWnNk8y+bZ89x+++1sHBzRdYGuleghBXzwgx9iZ2eHQV2zWLQ03WKpIcVK0gUGClNSFpbJZJvtc+cY1gOsHTNfNMxnMwajIT4mCizOeUKYoq1YvrnO0TpYO3gAXRR83ovu4PTp83iXMCZRFyWPPPIYk8mUWRtZGMXO+V2uuOwqmsWC4Yrl+PVHueG263n6yae47yN38pZf/w2ed9NzuO666xlu1KxsDGibxPkz22gMa2uWQ4cPsLeD6faEzbOnaDrH6uoBogkYVQiUnrWpnfcQJA6rGFQMuhGj6nOcY/rQQyW33HILXaf+wseUZeLuu+/+myh+R44ceda1+6+4Lk0go5K4dIQoFheId2LMAafBd5n2DiAJ5CF6UAYbDVWyODzeZDguJvq08Jj/W2UDbJc8JvZ8j7T8cMg4TDQ/4NEaYhBm4oMPPMTuzkKE6gmhviMSAgn7NGhVUpUDhoMhdT2mrAdS8KIiBXCuWxJMkvd4JyfT/hqid8TO4ZqO6MRgWfkWHb2kTMeYkxyCwKn5+mOSWRAAKaCiB+/o2gXzZkHbdjjvJUUBjTYlxhis0VijRKBtDMaIa04kdwCdI3YtoWvxncM5h3eyoaYcI5RUzPCnJmkt16ElyzAGz2y6y/lzZ1nMpwwGNWtr6yRliVGL2DmJn2gW9AnrVF6uLJJ2WVYhEK3RYmQQQieOIlpnvZtkLurkMd5hYiK4TmaLwaGCJ0QvAvMgLh/TRUcXhKXqOsdi0fH0U6f59V95Kz/9E/9fzp/eEtu30JGCw7kO5zxdF3Ctk2ijECVhovV88L3vBxeXjv8pRKxWGKMwlQYbwSaUKTG2QmW9nzYlxma9qg70b05xxDE5U1EOPhK6GZjPd6jKxGRyjqaZ8uqv/FI+8IH3cuKpUwQPpSmoqwKjE5/3ebexZ32F4Bvq2rC6sUo1qNHKYmyBLUrKekhdDSAkRoMx+w4dRVUDOSQGGI9XWFlZZTQaU2mLMYpoI8NBSYUihMh0ETl6+bVszxZMZhO2dydoa1kZDtjc3mZ7t2VnuyEFaNuOYrDGnj1rpNgynU85s3mWE2dOsL53nS/+yi/jVa96Ffd+9E7e/F9/jROPnaNUmr3rJetrmsEKlEOL9o646FBY9h88zNrKmLabMV6pWVlZobYl0XlRpcRI1zQstqdsLea4rsH6zzGr8/Rp+ymLHkDXqU/ZEf4l19bWln71q1991WAwuG3//v3PfeMb33jg8z7v867/tm/7tqWK/+jRo8/54R/+4QMAX/mVX3nVl3/5l1998c9o21ZtbGzc+u///b/fC/J+/P7v//5DR48efU5d18+//vrrb/qFX/iFjf7xv/M7v7OilLr9N3/zN1duueWWGweDwW233XbbDR//+Merz/S6vfd8zdd8zRX977jyyitv+ZEf+ZED/7PHfs/3fM/hjY2NW8fj8W1f93Vfd3nTNMt7/emu9bO1Ls2rM29cKhnRY6t+/qSERYaSAqISESekFJ0JJF6jQyTZSMqBtCbmbkZdoH8oKYP0vpQpR/aQlpPBrAAPwqRLEaOkC+0WLffd+zBHr/h8idhpF8QcHqtsQFup1wZFYUusKnB+DkZRmpKEpg0dITXiNakVRkm2HUQh6Rgh6PgQCLqDkLAqu2jkS1t2RzlgN4WYk9bl9XWuy51BTobXJpMbVNb89Xc8ZQ9QS+4rckcl3Z0xkvSuM5U9IgeKfLnyemiZqSUsShXZMECBl0NDYTRdjCzmC2LwlGUlbDtj5KCRQpYpqGV6RG+rlpL8ThG666XpcYjC9jRa5Ckxyjwv+FagXK0xuhDGay7QKuf/gaQHdI1o/dCWsvMMY6JrHY8/8hQf/tCfsnn6HCEpHnviBD/+f72RolYS/ht66FdB8lgFKQRM0gTfMRis
cubMNgejFLTtnQnThx7m8isuA6MkA4+EVlHe21nATvJoL3NjpeQw4LwUOKNNZrbKqxS8PMdSF0Sj2Tx/nn0HD7O6OubA/j2cO7PJoYNHULmrtJXF2MTttz+Pd73zD0kp4b3DlAWxE5hYPlupZxeRlGI0GlENhsyn3fJ9RdaI5o8TWit2d2eUpqJNmuuf+wLOzwKmHDJbTNmdT7n6ymvxbcfp05tMZhHvDcZaIpF6ZcBgCIOhpWs9PlbYNGB32rCbFozW1vgnr3sdjz71JO9+93sIbsEdL34x+w8dZnM6Y2vzDCpqUhjgg2Lf3hX2XX0557e2OXvmHFZZSJrFYgauWBLciBrfdWgnUWF/V9a3f/u3X/bRj350/Ku/+qsPHzlyxP2rf/Wvjt57773DW265Zf4/e/zXf/3Xb37rt37r1Ts7O3ptbS0CvOUtb1ltmkZ//dd//RbAD/zADxz69V//9b0//dM//cSNN97YvPOd71z5Z//sn1114MAB9+Vf/uXT/me94Q1vOPpjP/ZjTx06dMi/7nWvu+Jbv/Vbr/rYxz72GZlchxDU0aNH3Zve9KZHDhw44P/oj/5o/N3f/d1XHD582P2Tf/JPtvrHfeADH1it6zq94x3veODhhx+u/vk//+dXfu/3fm/4mZ/5mROXcq1/1XXJMz60R6VSZjqmAywhKkyySMqyIoWAoqMsNNoauighpzpGCoSNGVPKUgiZySSTSRBwUepBT/zsmaPiVamSzNLEq1fyugKRGAoef+wpnnziNMevu4KmaShzHt/e/atsnzyDxWITtJ2X3DArBtIpnyq1stkGS35+UhEpshfl4AWZb/mQJFU+gc6+or0OUemULb5ko/R5dpF/CZDnm0pni7GUc9IERFTZKko2AWGQRp1ndlLh5PWIEZUhwUwVynNDSfIOQXR1ErUDWiWM0mgjsUdRJ4xSRA+uE1uvzrlM5HAE5wAjtnMxCrElisA+IcQZVH94ER2hSP5k8yZptCrzJiwF3lpDUdUUZYHzrUBzxpKiJMbrrkPHSEATqsB8MSedW3Bya5N7PnY3zkNlLBv7DjCddrz3fR9gfWOF4CWbL8RAPRyCCpL9GBPnt3aYdZ5qMODtf/Qujp44yT8ChlXFBz9+F7uTGfsO7GNjfUMyAI2YB9gse9C5iBiV0NqSlBUD7+hFxhHEk9ZqQ2mkkxR3lcDxK6/j7M4Os+kuz33uLZw4tc14PKCbzWgWHh87imHNynrNl3zJl/DWt/wWqiwJqhMzAhdy1yws0OAdLrZCpErC4B2vDJnNEltb51iNawyrAaltiF3A2pqFS+w9cpRUjfn43Q9w++0389ij94NWrK3v5czTDzObeRYtMhf18rk9ds0eTCnza1NY6tGYkBJdo3AxMdneYdKcp1gb8dX/+P/g3j/9M976K2/m8uuu4UUvfykrBw9x+twmGMtKtUI9rNlpJzgE+j9/4hTr+/agjUGVwq4ulWHqIyNXEpQm2b8bzi1bW1v6N37jN/b+3M/93GNf9VVftQvwpje96fFjx4499y/6nte+9rU73/7t3x5/+Zd/ef07vuM7NgF+5Vd+Zc8XfdEX7WxsbMTFYqF++qd/+vBv//ZvP/iKV7xiBnDTTTedf//73z/+j//xP+6/uJj88A//8In+z9/7vd976mu/9muvmc/nqg+y/VSrqqr0Ez/xEyf7P99www2bH/jAB8ZvfvObNy4ufEVRpDe96U2Pr6ysxBe84AXNk08+efKNb3zjsZ/8yZ880XXdZ3ytf9V1aaxOZUUUrR0xmUy517KRprz56kRKGjEXKwkpFwIjm3GREl1qIJRoVaPYJWGEFq8UOikKpQgKvEqoGETrhRZXjeSItsOEESRNUhJ6q7WQbNoO7vrEw+w/fBjTn5KBK665nPDEKbQqWCRHcokQHaUqUSriCdhkUHk2laJ0MCiPIhDzn72R5xiDFNQUPV5DGSMqqBywKbCSTr2LvhTIZSxRfq5Z0SfC/57l
mqQ7BHLHYYhKkbTMO3WUaaKPXmQMScKIcqWSeajKHXmSzkMpLSbYIUgShI5COlIF/XTOZEa/VmppmpxSFN9QJcVDxUQyUcycCYTs9i+FMEoyRZ5jamVzoZN/SzcrnV1E0sjLFOVAgKINAZ8MvvDYKDNZpwLKtYQdOLd9inPPPElVlqytjcRpZXXE4aOXM501nN88B1hhDqeItQUG8M7hvMOUJUVRsTudYaxmPhcHov/+jj/g5MHLeeyJU5RlmQ86sL6+wfraiOreh/gG4J3v+EM2H38SbTVlNcCoghgSzneSMpAiXkV0Kd2fjpL913WOohyjikhZGebTBQcPHOT0MydJrhPbORKLrkNpTWUqxisrbE52GYyGxOTFEjfJ4SDERNQSQ1XWhYTyhkjrFujsaNPMGrxrKKsCXZSgDKsrq6zuO8Aff+xjlPWYWdPQODiw7yBdM+Hk2QldmxFcUxKNYWPfQepBjbEe13ZYU9M0jZhEBGh95Pz5GSklRouG6b7A4Ztv4B8ePsS5k0/zB299G7e95A72HN5PwhBmE3YnU07vzjl58hwHN9bZuKyimU+xthTTgKYF5yi0YaYSCwMHPzXo+L/Nuv/++yvvvXrpS1+6tMfau3dvuOqqq5q/6HuKouArvuIrtn71V39173d8x3dsTiYT/c53vnP953/+5x8FSV1vmka/+tWvvu7i73POqRtvvPGTusgXvvCFS3bmsWPHOoATJ04U136Gc8x/82/+zf5f/uVf3nfy5MmybVvtnFM33HDDJzE+b7jhhvnFxJyXvexl0/l8rh955JFyMpnoz/Ra/6rr0jBqpXBkTZ0SLVbq/SWVpAOoENHa5ny6nNAQZcYmQawJoyGa3v1fY6OwLb0SE2SNeGAqLXBjikkU21HCN7OSjYQYDhu1dI9EG8ujjzzBvo/v4Y7nX7s0ht5YG3PwwF7OP7OdBeId0US800SjiCmgQ5dNfwMKiW9JKkhBzozMkEJW3GWdQ/IZhu1nYeL7KMPKgA/SzWldsrzK7O2ps9VW1FLgc/CCpJ5HjzIGo0wmD5WC8CYpMoUVh5eUjNiDpl4iItCcVkqgRg0ojc1xUoLJRVzsMLqEwmaNpIigU4jZUk1+ZorZ3TIlDFpyFgGvIRh5L6iYQPdkHkVRllhTyv2JiRhbevPklOFAZQtCAIWFGIVxqiWuKqUCHxTGRpyfsbW1iZvvsP/AfupygLYVprScOnOWF1/+QrQR2LUoROzfNC0uiK4xodBlgS0rnIuQLDEohsM1APbu3c9i7/qy4CulcM6TYsgFpffqDMxnjUDHu3NCJ+kYxhohGAWPT4FoDKWRmVWMgbZt8SEwXeyyaAKr4zWeePwJvOuoqxKrJe085e4wJTBFCSrRtW2WTEj6gfOetnUUhbCBXdstDw7B9dIgzerqOk07Y3e3ZdZ4jl11HYeuuJb3fvheFnPFno0VdjbPs2haNvYe4elnnmJnEem8oTCS4aiVpioLknMUgwKCvNcmsxkJw6JrmC1aGu9F0zqboq2ClTGj9RUO7LuVne0d/vB3/4ArrryCW++4AzNa5dFHHsUWA44cPUKlYH1tlTOnT3Hm1Cm6Lsp73hqGpmJl7x7MLGBOFZe0Tf1dW9/0Td90/lWvetX1J06csL/1W7+1Wtd1fO1rXzsBmEwmBuDXf/3XH7riiis+KXy2rutPYoaWZbns7FSWdvUxXp9u/dzP/dzGG9/4xst+6Id+6KmXvexl07W1tfiv//W/PvSxj31s9Jk+j0u51r/qujSoU4nEgCT636Qz/EgePeRaIJTvTEtWJUZp6RTxokFTGmcCjgTJYgIYrSR6KEpnFGMQWFHJTMkolTcHTcJIuoMWrWDIBZUkXVNd19x/70NcffQwhwthhBkcN918nHc//SGx+VLS1XhfoLRsNL1dmMoxOEsdXXZIUSlH8Vw0QzFKYVJaziKNCvgU8clTJJ2tvuTm2CwqlwzAHG2jhAkbUsRcFO0UlRi6aZvB
z6jRyqC0/H1cmooYCfnNc1GlghBpYn4KSXSSUaf856xPJAqDToHVkiIhWrXeSK6fKOY4pTzbs1nHlpYQdPbd1AplDNoatDG44ET/p/LcNirpHhGTZGPEtBoXILbSfNqQEzAsMWi6tqXZPsvW1pzV8RBTDYjG4FPELea88CUv4Mrjx/jDt7+dF73wRbRNSz2osYXA7yE6jFW44Em+wdgCnQqc6/B5bpRSAh3Q1jIcDWgWHd7DZHuHZ04+w6GtcwBMFy2TyQxTFgIT53l0iCLiL/JrG4OmCxGrQOlINbSYLpD0UJimxQBrg8gVYqTzEk/kvc9FVJ5fb5gtwnhBNOa+lcNVCPLeyZpXWwrT07We4BLnzp4FnUjaUo83OHLljdz36FNMthfMO8+etRU2z51kXK8QnGc6bfBBi6OMFVZwYUUaUheG0pZYbZlMO1zSNK1ja3tC0pp6PKC0Jd7PwSea2RzlPd56No5dwcHzc3ZPPcO73vK7XHfb8/i8F93BU8+cZmt3l9H6OsF5irKW+KK0YGVjHacVRYBUQNvuYvXfjQT2G264obXWpve///3Dvss6f/68efzxx+sXvehFfyHM98pXvnJ26NAh95//83/e8/a3v331y77sy7aqqkoAt91226Isy/T444+Xn02o8M+v97///ePbbrtt+n3f931n+797/PHH/wdyzP333z+cTqdqPB4ngPe9732j4XAYjx8/3u3fv99/Lq4VLrXwxYRREiqrFMQgnh3Cpheyhuh0splVzqxTMiAheTlB+JjhM/oCEyRLTgn7zyQ5wQefvTCVEi0WGkk7V2g8YnEm0oaUQhZ9a1rX0jUdn7jzUW654QgArplx9fWX84H3KuLC4mKUIF0CWiVCUiQMWsdM1uFC9h3C+ExRTvnahCzF0MtOVzpA0X5FlWd/9HM1sqfmhXvZb/AkhQp5h0u9skG6X5UiXXDEJN4sRmuU1SglMb6SZVhI54ewbFOfcpENtFOMGK1JSrLyNKLDVFqCgH0SMobOZJuUHWNSz7JRkrgtZBGW80ipFxEdpFslZKu14EnG0+sSsz+3/LdSJEK2eXOYQhGDOJYoW6AKLV1oAh86Fosd5pMtQquYK83JcI6NfXu46qorufqqq5nPZ9z18btZW12jWTS5u3LSBSkL3pNwtL6jrocYHYhhgXOORbZoa9uWLluTNbGVzMOQGAyH1KMVhvMdALQqCD6xaBtccAzrSgwPjMj+rdIkrXAhUBclIQVsoWhdh4marmuYzRec2T3L6mjA+toq1pZ0IWB1IoSADwFTyN8ZYzEIq7d1C7oYMdaiNMTgid7L+0FrQvQ0XYeKol1dHY+ZTeecnkx52Zf+PU6ePc+ZZ86SYstoVJIILNrAVVccpt3dZTGZEzxYa4ihI1FQ2gJbGKrSolAsFg3zuaNrHW3nsLZCW9GxjgcDNBqj5TO9aBYsVEQNVnj5q7+CA1XBJ973Xt73B29n88mTvOzLX0VVaiY7W8Su4ImnTrN33wZDUzBdzNFFgVaWFBVVPaYejP+y+9vfqrWxsRFf+9rXnv/BH/zBy/bu3RsOHz7sfvAHf/CI1nrZgf1F66u/+qvP/8Iv/ML+xx9/vPrd3/3dBy/+ma973etO/eAP/uBlMUb1hV/4hdOtrS3znve8Z7y6uhq+8zu/8/xn49qvvfba9i1vecve3/iN31i99tpr25//+Z/fe9dddw2PHj36STCpc07943/8j6/8oR/6oWcefvjh6sd+7MeOfsu3fMsZY8zn7FrhUqHOGClVh0oGH3I/8OdaYqMVOsQMecZsXZYDVJcdhMGQJNIGjTaRkC6QNwhCGLFo2hDACsRoNMJOjAqilyBQlbu/6DK3UTpDkywP3vcEj+bDokqJqBfcctt1fPSPHwBbCaSnIyhPiqIP7EXZvTOMQHVe3E+SlueXN/CexSesxQtsOikSBpLGOwcWlAqZMZiLi2CQUvDJjjVCAoV8b+ReZGBVXwjyRSPdn4oo
ZXJhgpQhWvmObCcWe2NrYTuSoVGFdGlyAFH0obhK/iK/nlljqJQoD3NqQcwOAgYwXqBajSY5cYbBZHhYW0BcTVR2n0kpghaoOYSGGMBqQ0KjdEEKFp86OjejnW+jIxSmoqpGHL7sGIeO7se5ho9/7GOYLPm4+qojON8BaikJUUajoyH4SIols11PUWiUSlnuIO/XxTwQU8FkMqOuDCpqZvMGjKIqhwxyEK1CYPZCF2hr8d5x/vxZRiuraK1pZwswBmtLtpzYosWMWLjZjJmf0BFYG+9jZ2tHCkVRSHeKJ7hWXgMfAIMqrERjxQgqYqsCtGK+mEMKqDyH1doQrSAhoWlQMVKVBcNijb37N3Cm5oknHsPNHCGUrIzX2dndpRqOGK6u8MyjT6NMhQst2gqLdVAb6qqkqmQG7FzHYrEQeUsIWDRFNQQTqaoBw7KgUJqkvDjqoJm1juFwRNHucHbhOHrDNbxqbT+/+5bf4vfe8jbu+KIXU1rF9k7DyngNUw5YWa0odid0swWltURjsFXElp/jju/gQU9Zpk+r4zt48LOup/sP/+E/PPXN3/zNV3zN13zNNePxOLz+9a8/dfLkyfLTQX3f8i3fsvkzP/Mzh48cOdK98pWv/KRu6Sd/8idP7t+/3/+7f/fvDn33d393tbKyEm6++eb5D/zADzzz2bru7/me7zl75513Dr/lW77laqUUr371qze/6Zu+6ey73vWutYsf9+IXv3hyzTXXtK94xSuu77pOv/rVr978t//23y5JMZ+LawVQFwdZfupHqvTS8Qr3FQV42cR8dk/RSqOTUOeTgkKXxGSJ2qLqkTAnVcR7hQ8SalqgCdERVcKEIJq0JEG3KsjcLuSNOEWhvCfRZ2O8EGiClXmawQIJDyRbQLIUWqGD5qUDy28+fg//9v/xZTx5ZIVmVvGbv/5H+G6B9g5vQOsRNmqSbiUjsM++w0uKgoI+7bx3MXEh4pM8L5sCOsmHM6qQGZgaGyI25qQEa7nVe951/hyvPHiYe2yZ9X6RpDwpenRyEA0qiDg9KpFgCAlUo0wlGLPRpFTmgqPABUqViL4ToXaUg4TOLjuaIFCnKVBUQgZSLUk5QrRYW2LLgkXbEII4fOjQS1Vilr4ryOGtUQmJhx7wM9J7q77kxkx6MhooAXFlSb2VnfKMV1YZrWzgmk6KQFlKWmGMxDhjsZiy6BLVcMTqxh5WxkM2NlbQBUwm29B0GFVRGEs1sHz5V7yKF7/kxSxCQyQym8wpTSnhxDFijKFpWnZ3d9DWsPHI47z0m/6fvOvn/wO7N16PAprFgrZppAM1BSEE1h9+mK/8zu/m1/6v/zdbx69GY5hPZygt89S2adjZmvD002eBAV3bAhFtBe5tuxZb1IS4zaAcUap1ppMZukhUlcUWUFUlRVXmjk7uadd0y6R0rSSPsSwKTpw/x3g8pkweU0I0BUaJuYJSEZ3AdR1dMebWl3whn3joSU4/vcNsS97rx44fQCnN3r17qVfmnH3YM517ZrGjHhRoFPXKkPGo5tB4wPqwZLedM5s3pFTSuETQiuRhPBLzdqMrutCgIvgkc9G6rgmdYzyoiQqM1uzMZ7jQ8ae/9x62T53lC7/2VdTVmHMnd4jjFeq6YNg1PPzQI8x9y+rqARZuylVPfpxf/O1fuOg0+qnXRz/60Rustb9/7bXXTofD4V9IDPmU638R55bJZKKPHj363B/+4R9++ru+67vO/XX/vr/taz6f1w899NDYe/+lt99++18oxbhEHZ90aGSPwqTV0i4r9PBYku4MZZBNsEUn2QBlRtVmZ3/5MMvsr+8FEVkf9OoFmb0ltexqRCAvXV3SWf8V9VLnFmPKvp6SN7Y7zT5/aUiKlpW1IZdfeZQH73sEm3VoRKGLB1jKAJYp26jls1cI/EhK6JS9PZUSr9EAfchp0sIjEYmEzobcLPV5MfWTtJzukK3XEvI8sdmbszcKzWSiGKNwaVJEPD0TyojAue82DU6S
tZMI/nvii4qStB2MzjrMSDQiqu9SxPuWoipIjcw5tdWEEJddvWj0dP7dF6904e8S+XVKyHw05hlj774jsghNvj7v0ApxmYmermuJXYcPM7x3WFVSqoSNjtAEdjbnXHvTcWxhOH/iaVJw+C4QFnPe+mu/QV2V/L0vfDmtaziwuhdrRbzf38cEOL8XYwyVk/3w2quO4G8+ns2+Fc47uq5DmYKyKCjHQqy44wXPo3verXjnBcoMHdHDffc+xB+8/Y+ohyt80zd+A8omvO945JFHaNqOKy6/guQCk9k27//QR5h3iefdfj06eObzKfP5FKW1sBm1IWQZSEqJwhbyHs1+rTvTOfv37OXYkWOcPfUUbZjhXEM0K5TWUFhJd6jqdcxwwPb2jFNPnyO0oPBU9ZjxcA9GLbjsyH4ef/pButQQojBebaEptKUqK6rKUlaGmAIhRmxZ44PFAl3XsjoaUhYINJwiRhnxLkpQliWFLRhVtUhmZOOgUJad3W2OXn6Ic48/zLv+25u57Uu+hGJlDZoFs90Jk7bBO4XvPN1ijjGK4P/Slox/+XXttd3fhCvL+9///sHdd989eOlLXzrb2toyP/RDP3QY4Gu/9mu3P9fX8r/zuqTCZ5NChX6TztEmgM8yO0XK5IcknZuWLkGlnlFo0dpJfpsXNqiyIotXWn8yuzAJBBbJ7NFcdklBYNE8EFMpiTNKrpyis3NEXQi7NJsp7+5E1OERzjXcdMtx7r/nQTnVRy8zuygFNsQgmqLMphMxeiZB9HOwXFiErKFBGZEJ5PvRE0HIieVCPQmkeHECu8yHZP6VMfzos8BBTMG04JREougCkydFnXWC8rNT0jmmSGzJtCoorCWmQOc60IkQEzbInDAg9z8Rc+EUQ2sXstNLn+cXeqMAOciknoWaMqO2N/RO+d6oC1orhWj+khLbOmM0Mfls2aawOqGiJwSHToqua2hdi0/isBL8QpLIU2DhW5JrGQwCyVY8cO+DvPyLXsz5Pes8ePcDUiCNsDl/9U2/ynNueQ4be9epSktMoks0xuZrhLoeEELAGrnegS1oyO+jFLGArSrpTlNCBynqlRbz8XJQ0i3mVMawu5jz4Q9+iP0H9/PyL3o51ShhS4X3hquvOcqhg4eIMfHI/Q/y5ONP8vznPo8Xv/RFRO3QSWZzxhicExF8SEKWSVHmxq7t2N3dpSik+H7wgx/i3o/fzZmTT6ONsFiFeayxNpBiiw+axhkOX3GYc+cmFKrCxxYUDEZDSluxb6Ni69wZFjNFsglTGkxRUBQiw9BKUWhJXV/M5riQ8FFJqLDRrK2vUmkwytD6lq5rMwktSvhtXeE6x3Q2J0XP7u6UbtZQDQa0sxmh0gyv2M/5R09w1/s+xA0v+jxWB2tsn9vmzM42K9UGlYLp9g7aKrr53y2Hrp/+6Z8++F3f9V11URTp5ptvnr3rXe964PDhw3+3bsJf87pkAbuT9CGUyjAbmZChUhaeJ0I+0esUUckK1qfDEkKMKcl8SEnRi0lKg8qszJ4N0csfLm4Dl0nSSX4HKntCKtmoZW5nCF6SD2T2Aw/c+zj7jx8jqgVrayVXX3WIJx47A9qCkk1YUWNUWs7vUH3OuAKVlprE2Efx5MIgd0KCdlUUR0udUtaUQVCRFCUwFch2a/GiQimOnBibEwtingDm2pM7bBWjdIcqkfCkpHEZVtRaYax0n3IGyC7hWh4b8v1SISyNvlOOK1Jai8wwS05iuGBCDT3zUeU5Hfnas+sOSpx2hH6ai2L/zpDZaYiBpCSjkJDw0TOfz1FdQEWyBjTQuBYTeiMDQQyUc7h5g2JKYE7X7vD7v/l7/P1XvJTnPe9W7rvnPnwIJK0YjIZ85KMf4Su+7FW43NEppZfJFSlBcPLampDlKRExCOhfi3znDRIQrLMEJSVQMXtIGoubdfzC//2LXHvzLbzopS/C0VFaObG1Xcfqynj5OrnkOXbZZbzw+bfh2wka0S0qLQHG4hlriMnn
CCkhVxkiG+sDeX+nyN//ohdzx+3P4f/3n36Jg0eO4LyiBGaNz0kVCZc0ZjQiqSEnn3mKiMYFz2h9L+sbGwxHCWMUk/PbTKYJW1YMR0PEfyJS6ZrCKKpC5tOdj6Sk8ZnYs7o2ZmVY0LmWEAw+xozwFCIF8Y7p+Smhc8x3d9na2hSdaEgsZjOC0ti1MWtXXstgeIjJ009x17vfw8u/4iuo12tWGTMoV+h2FU41RO/Quhcr/e+/Pv/zP39xzz333Pc3fR3/u69LskQIFmKhRMOl0hKe7OE/VHZ3gcxglE4vRoVWEWNkZqeV2G313ZMxks6tC9HwGGvRhQEl0gKlybO2DCYaLT6WSmDEqIUwYbSikL0erQupV1kMvn1+mzMnzws1OnY89znXyAZoZT6oTU44yO4mSvUFWP6ttUEbYZUGBUHnTi/pZXFSS3mDEiNulZmhyEHhAjErZpamQluFthaUXhJVLqp4fdmX4rf8UoLocwET8k0MLdF3uNAIjb8wmEKuzZiSoK1sOghpSCkt3p+AjmmJxKbg0TFlG20l4nNr8mt7kQ23Iqs0EtoI21QXFl1YlLFgLMqWKGtJRoa/JioxO1Di0NI2Ld53xCBp9zpFsefKHaTK/2gMzWJGjA0pOHbObfEHb/1dZps73HrLc9izsUFZSfjxx+78BIvZQq5esHFi6HCukfgf18r7NeaIqCDQpvMdTTNntpixaBa0zQLvO0mPoO/8RRuZguJ9f/wBSAUveclLmTZTNjc3ue/u+zl78izr43XqosZ14t96y023YJRiupgQlMOlC05ASiW0ERZrCCFDwSxdfs6eOcPmufNsnd/k5NMn2NrZ5PqbruWJp54ghJTJQZbSDijNCqgBew4dZXOnAW3ofEs5GqLKIVVdUtWR3dmMnekcZTVNEwnJY61kZFZVwXBQMRpWNG1DiIq2cWilWFsdUZhE9A0udDRdhw+BLjh2ZzPm8wW7u7tMJrtMdiacPXuWyfY28+ku2+dOM5/tiERl1zMu9nDVDc/j6HXX4qdzPvSe92BMZHVYQ/JENPVwQNfJ+/nZ9ez6bK5Lc27JLipR584jRfGKVBAQJxCVBKa50CEoNDY3B53o1JLC4dBlns95Afc0gJbOKCkwSqNSpuITsVEKjzi8RHESsQm0QwWbWYu5LzSdBJbmz0x0Lfd84h42DryQoijZc3CDg0cPcObUeZTRQurIc8qEynP0XiwnXaBk7mSTYqVJscysxyiRQynIHDBK0RZ6jJh2q35WJw8n+kgqI0FDUAaNxmLQvodMe7g0ZjanRmtIyUEgRyC1eFUJjJocMZYShmsTyYuOwhZGHGWiFhZtivQE25QNsZUSaYNKAa1iZpcallIOyPmD0tmbKIQbpyWjzwRIKhKNdHsS+y1i7KTEuFtFnVm4AtNGlcTgnMyISoro5OdoZXLnDT5rDtHiEOOczAXbNvKed7+PY5dfzvNf+AJC9Jw8f5ozky3u/NhHeclLP48uiTQhIXZj3juMNdlMQZCj2WwuHWNoid7T+UCIBoOirC11IyYa58+c49GP3cl4XBMU7D92kJuV5RMf/zNuecHNbGyscvTgYR596CFOnTrJZZddKehvgNY77rn/HlY3huzZu04Mcgjy3tN1HcYYYoo43xFdpGk6Fk3DomswxrK+uk5hCwZVjYsdL7zjdq664jgf/MAHWbRTUDXTpiGkwDRYNnTB7myXeZtoveD/9QhskaisZauJdLHA+20SBUlXhKCwpSHSUZdDMZlQisZ5lAGtA6XpSCnQNYbposV1M0iKLgQ65/HzlpA8TePxXUvTLiA6Fjs7mCz07ZoFVbWKtRWDomBl9SDrl1/DM48/yIEnnmD1iiuY706YzRa4bo42jhSXRibPrmfXZ2VdmpxBFxnKQtz9fVjCX30un7j0a0hR5mdaYZBA0hA8RPFzDMQlLKeMMDBlI5b5YEwBr6RzSvSFAxSFzH+ipPfhs3VXMGLDpT0xdSTvSKnIqQIQ
iJw5t8PDD53ixpuP4sIONz73Os6cfLe49xeWJnrqPKvoSS1SkGWOmHKHJdpBnYf68m+dfRNT1uOhQybn5MK5/F4prjE6CDmaSIlPTOrnbCi5P4hnJ3nuorPMIpH6NjBLI5TIGGJOyYhCdze2kI5WwcgUJO9xKucYJUmRUErE5IVReB/yrDBm0pHC5xQDdBbjxwyTxjyIzeze4BM6CQtWgXhM5o699z7t5RIqO/Do3J2HHCgLBlQnSR2AMoWQbBDXdpPJRDHJHDiEkscfe4aHH/t1jl5xiBtuvYnjR67kkScf5opTh1AYiVcqCrSxFGVJbLJcY9HmN7ViNtklRUcInqZp8Z7soGLYMxe93/bONpfd+lwO7N9D0opzp84xKIZ85ON/xtFzh6gHFSujFa648hhPP/00d911J6N6xGMPP0Y1HvAPX/MambDGLs9zI4pIXQ1omkbeB9bKfXLgdaKoK+qqohzWy2R4YyzOdQxHFXe85IX88fveTQyKwXAovqtmRMDQdQmwQlgZDhmtjxkOLM4F5o3DFGO6uacsSmJQcpgNimKosKUE8cZ8QIspkSxstzNicmhV0QWP0Zb5vMlaUo33kc53LOZCVBmXmulsgYtOPstofIxYFZnt7oDvUMOCY8+5EWzk3nse4I4jx1jbu0apLStrh9BqwWj65CVtU8+uZ9enW5eWxxezj2WvSAZ6JxCNOK/0GQT9eC6liA+eFI2QObAUKmCTJUVJHTBaZmYxQ6Y9ZBiUlrQB1RPqTc7iS/nSlZhDJ/EtsUlo9krnYiUeMnKdWhGj5e67HuHYZQcYrBTs2b/G6mpFt9vShpwjmMjCdInvUbmzhF5jrpcicPKsE6RQpzwXhL406uyoL+zT5Uw0peVMUwUwWp5dSLlL7l1iMtEhJpUL4oXohp4gI/CxzqbhEe+9aAO9J/lALAo0hqKfmWVdHfGC+4rpf2rK4HVamrIBSPFFDAtANJZ9ILFKilRosacLYpjda/WTkXujYi7UOeWhd7lRGRoXolN+nXV+42Q+VMzwrjFGSEZRHGq64NGpICUZOj/2yJM88tiT7N+zh5e/+A7+OP4Jg0HFfDFDW3nvtV1Yvm8OPvU01wAf+MCHmG5t0bYNiURVVpAUs0WDj4GVBx7kNuDhRx/lGS9RVK13THenXH3ZlVx99VU8c+IZqrrC2LNYk3BdR2FK2sZx7PIrmC6m/N7v/h63POcWNvauUxQ1VVFSlwNSSqyMVkkpUhSaECMuC9h1RjCazknIbTPHR0fXOazReNfwgttv5977nmAy2SIqxerho0xmHZ1DunxrWV1boao042HF7s42nYt0PjGqN7I6JrN+faSyGq0Ss3mLa8UHNGbGsktKiGXeobTAz8ZoFvM5rgPfSUfoW0dtC6JrGVQVfuEgJaJ3YEpiaAnzxMJ5yn0bFMMR+y+/ksnWFg984m5e8JKXwrzDdwuC32Gdz6pb1bPr2XWprE5NqRAZQ2bo5dHeMnFguWMpLTIEsn9lntWQN3USJK8gCvElZxKIV2MQMoKOTmBDo0nKEDHEZDPVP1trJZP1gx7lhVCSjJKw0JQLFOC9UPdVUHzizvt56ctvI5YLbr71Bj783o+iTYmKDaiS3nczoZYzO5VlCSmxnP8l4rIAZSk7KJvvQUZGtRSTFIMEzUL+Po8KWlxotHTMSpeZCJOy/q6n4kv8kjjJCIFFDhYJrTw6WTk4qCzcDgJHphCkO9YFne7ZomRejchRVKbQai+aw56gIk4uKd/rXmyf8n1Q+eXOdm1J2KVSSeV1SfSwccovVf7vlOOHen18CnL4MQatkjCE0eLyksQxp7AK8Y6WohdVBAph4GoNyaDMgOGgZmBXuOeeh3ng3gdIyXHo2AFuuPlGhuMV0JbBYEyMCVvUAFT1kCkWnySxYrZwRB/YnS3QxrKvEteQmDRRWabTKaYs2LP3ALvzBZ+46y5a54SRmcDghY3sA6as2Z3PSSlQ6Jr3vfdPCMEzHq6h
rMg9UkwUZSE2ZNFTlCVKQWksrnPM5g2D8Zi27YhtQzkoJdWi68QTtSpZWR1hbKRLMFrd4LFHd3BBdJNyLS3DakyhNaGDwhicj4LaG0NVlcTYMahrxvWA0DZ0bWK+8ETnGNiarutoUkfE4x2UaoCxBU07w3UR3yZC19C0C5RP1KbCRUcsSwZpRDObEn0i+AbVzIihRdcWs4AmdpSjVS678mruvfNjPP3gg6yv7md76zSL6RnWd3YvZZt6dj27Pu26NB1flAw3gf0CSRlxUVFqyWZPknIqjiEK0EG8OqMmKElwUFoLUSXl+VEuFCLeJRdLnzdaRQx9J6Dy/3PHmf0KA4mkk5gFJ4MkDIVsfdZfl6Rqd23kxFOnuO/ux7ji+GEOHTvCcO0BFtOWgiQBrnmeJTIDpDsi/8qUSMpjjFxPL9bu3TKF/Zdyp9r3vvkx5gKXSLwWkVw6ecZSVJJ0VIWSfIuQ+65+0kim3Uscjcxde52jzsVZZaINCfHfjAGXHWp0MsuuKQVIOuFxclDIz6d/fZXJbFLBmukdXkKKS8KNSeTfK/dM2LbpgiwjQ7J9xU0xs4LJ14bYqvkY0DoRNVmbqZbzwBQk01BEfz29RgsqEAKqqqjGA1ZWhlRFybnN88xnE0gdDz/xGPfc/xAvf/kXcvU11+FjYrGY02XSinOe6XyO856d6TZd27GxuspoPKbzSca8wImTp9hdWcMkYXz6LuCjwPWDQUXjHKPBgOBadIykIoExbAwHBBcwGImzWhKJtMDISoELAklrw+5iATFRmYLgHSEpJjszrLWgLL4LLKYNCsWMhWAadS1QaVGzOZkzXzic88RkKJVYn60MK3wnGk8XOlJSVHYo9z85Ip7BsKIwGtcmmqZjsjtjPByhlCKGgClk1GFIdI0D53FOrATb+QJFh1EJZXvfVyNJIcowKGta5UjRE2YzLDXgaLdmmD1jTFGwvm8flx09xs7TJ9i4bpV2dxe3aHHN3508vmfX52ZdEqsT40nG4VWXo21CtvyKWeuVobCevqgiqICOYKOi0BqrhDggVl0JrRMp58MpVM/yl5mWtsJ/DAaCJuJBRfGgVDHPj0BmaKKaSIi1mAVQYSmuVgaUDoTQ4X3kwQefJC4sg3qFY1dehicQyDl1Wjqa/KNJJHwMWWrgJWhXicVZylTLTKGgb5D6sFKQvU2pdCGVKM8JY8zzrRjExSZlL1AtxtkJhVNCMujxY0mqkJmn/GoxDU8qYk2B1VqcR5TJjMhM1Q9BCnhCum6tMuFA7neMYsgtLFpFUuYigk3/bsnFOxsIKKVIWmfVoQQMC6kpAl5kGyku/T37gixl0ROjIwUvBgA9OKxF5G5NBoZTzEGyGW7ubyjS/dZ1TT0esLo6RoXI6dPPsL29g6eAoqYcjJnstLzv3R/gwQce5KEHH2B3ssN0OgFga3OLc5vnOXP6BJAo6pqua9md7dI4R2hlFliPRhSFwZSWorSioytLjC6kKzUlvgvZc07E56W1WOT1sNZQlYXcPyMFVyBYRSDRtC3Oe2IKxMgy1QMkqT0FCSz2MaBVQaLEFgOKYkDXeubzjrJeZ9HJOKGqS4wVS7FBMcCQmM/mNO0clyJRadquQwFdKwkM4xVNQtIkUoKVleEytcJaS/SR0Cm8UygLJinCAqY7jUg0fMu8mZGUF6aoKTFRvEWNFthVlQUxQWEKidNSkcpaVIzYsuLI5cc4e+oEk+3TJJNQVYWunk1nuHgdPXr0OUqp27/7u7/7yN/0tfxtXZfU8YUUCakXZWuBKVXIAvN+UpSW7v0qm9YK5icf6JBalGpJyS5hwZQJMUprlCoIXe4SlTicSERR/nXIhg9CKydpbAIXwSPBqjpqdLQiss75ajoJoUMb0ZFtb8+4986HeMFLrue664/z4P0P4do8awTpYvtNPxcDlXQucjEXIJlPXujrkgjZc0+KUTlwVx6rLtL9SRx8IoYkJdf08UO5q9Np2QkHhOEqbE/5uhBZ
FMnItaYYSC6JMDvPJ2PKUCcs4dOliwyaPgsuBw8K6QM5rdO/mkoA0pCNr4VGI9cS8wEletkoQ5REctsnXKSESmE5I1VkFCB3xTKHFYsupfo0goDWeZacO8zUA8kp4pN0ttYYojYUY0lj6CbbzHanVFXNvgOHKMYjCutomhntvKULnvsefIg77riDtlvQ5IJ28pmTTNfW2FhfJ2lLSobFdMJ02mLqIZNsZu1CoGlbCmtpO/FmLYoSlRQ+eKwtKJXBlgO6tpG5swuUdYVrHY5AYS3ayByvf/20FbnIopvTuRalA1UxwrUB7704CqmIixGjCkpl8THQ+pa6KLCmEmatHhBZYbFwkkFIYDAYUQ9LxisVpIjzkTYkiYNKhkDEJ4dVllE5xBJomynzeUCliqIEUgfB0HYtMQZ8C02j0KpDtY7FPLNinUcZhUmaZjrDUkimomsI3qOzw4S1BjXSuKLEliVBR9p2TmkUDk81GGGHJbvTXQZr64SpdNLPrmfXZ3NdYh5fidaVhI1GMMgGZzJxI2ViScxFUMgPvdDZk7Bom7BRNnWvpEOQNIGAT+LggspzPpswOc2aFPPv6PtKkCsQ8oRSGfJUUWaJUQmmRu4UQiDlTScpMFbzyImTHDq9nwP7hhw9eJgTj58laIEBpeMzIqpPniQVRqBIcleqc9eTQoavMnRHT8Dph/J5Lqbzn5V4GkoRUYTcCsYASlk5HWtI2mI9tEgRM0ZsymLMrjG6d5cRlqNKCaJdpgaEPMsTqzWNSoFAIKExRCE0KE/AkvTFnauYZscMq/Z6OqWzaF8qlMwIlcgUerNuSHgvkLaEuub3gcrdPJndmueJKhsGoC0hBxhribOQ79cSQJyESYTWpXi9lhZMQVSwWMzwbcPq6gr1aJXB6gqHLjvCgf0rKALPPH2CE088xbnNczShY7yxRr0lMWEHjx3FDVdIUdxrjBGt3mi4gq0rVjfW5DVL4EOgrCrCUuyesJUYZackhw5dWOnstMrdH1iTxwGGrFPMbGClsUWJ1hIJVBqN0Vm8ryLaCMvTIwSeorDE1jFdzEk2EVpHcAt0rRms7MMpcfGxRUKZmrIeEqJnz55VXDsnYolaoqC6pmU0HjIaWwpjKK2imy8EjdGGUstn1YWINpoSS+oirmvxzlJaOUwF71Fa0XYtmih+6gwAAQAASURBVECKgUJbYelmMb7SBue9HCqjYrS2Qqs0qTT5QOfl86o1th5z5IojnH3iGTY29jJrHCwZuM+uZ9dnZ10a1BmSQC5JkbQhKIsDXH86jx4fFhKomQNFVSpkfpTZ7wGHCR4bwkX0fQmcVYXBafDWEowhaEXsXcH6jR2EaZnnP7IpZ8akyt1lhtKIMc+zQJmYi4GRyBPf4UPDx++8j9k8cOU116AKmzd/T0oiCg9BIMgY+w+y3IosP87yhfyX0sqAluLac1EVuVPMkKEnCEEjE2QE3oqE4PHeiSBd9QnnGlUYkjE9hrp8OVKUDimmPoqoJ0xASiI+txm6vdCVZmPslGd3USzNQoz4FPPhAdmo+4lrSuADyWc4O8V8zJHrN9rkzVv0hQLnyb/VEoqOuXu7cPfkLRUJSlLGVXFBtN7nCyaAzMrUxlKYksLUkCzKQ+ykax2sjjHDGjscUdQ1plCMRiMOHT7ES156B1/6pV/IlVce5fz502Id18OItmJ9Y4PReIzWirK0HDy4j5XxABU8ppCPyMbKGnv27MGWBdWgpq5rkUfEPLdW4lDjfZD52GSX6XROt1iIDCVGfBalG60prBTwFBJdl9M/dEEM0DYNIQSSBltYUAnftrjZgq7tsFVJMSgECtSarlPYckwbEp0PFLZCWemQV9fGkknYBpxPmHrAymiF0aDCFIrCKlZXK1ZXawpTo82A4XDAcFigdUCbAm0iZWGEHVwU6CLQdY7FvJHU9K4hELLoXtN18pnxweO8kH4kWkzISO2iY2BLamNQOjEaDGkWc5yPhGi54rrr6JoZMTh0
gsn5nUvapv62rxgjP/qjP7r/xhtvvKmu6+ePRqPbnvOc59z4J3/yJ//T1veNb3zjgRtuuOGmtbW151lrn7+xsXHrF3/xFx//xCc+8Ul5eL/2a7+29rznPe+GlZWV5w0Gg9suv/zyW778y7/86rNnz5rP5OshBH7kR37kwLXXXntzVVXPX11dfd6rXvWqq++///7yUn7P/wrr0sgtRgm9PFtoBRQRi9LSIVltKLTCe7F2Is+5gk5YJV6RikjUOtscZdanVsQuZJJI3u6MMCJVEIcTlUymf0jszpIzrwXa0wkRtStQqp835gKIdBoxOqypSV5hTcB7x3RSct99j3Djzdcw3Fhh59xp+hIBkjyRcseSckfUZ9OpvJFrpZcOLfQEGMBkHV8kF6l8H7VME5fPnSgHCoxFESCCR+zRKlsQtcLHAhU8hBarNCHk50/fdUqYrYS6h9xhypzUqyhRQUHlMqyWekBFkK48k2ogLrWEIjlQkquIXH9IImJPefapjKKwJSmIyBmjSNYSgpdOTgWiF51jMheMCoQIk7LVqUYbhbYFyUkau7YSiYNWlFWFyTZe6ESKDt95qqqiKCqSMdTjIUpbsJpqWIvIfTGjKxVWlQxXal54+/N45plzNPM5XSNdhDUFxmici9RViVaJtmuphiWNmwsFH6iqGqMLtidTIdHYCq3zrC4fgLRSqCDp7UYJWqAxQiByDcFLjFBhShZtJ/fRpQzJK1znszG6dPcyN4fCGgrALTpal2Qe3bbUgypD1QOSrtndmeN9oBoOAEehLfv2bjCfz3E+CdEk82lGqwOKyjKsCsoCrEmYQYXzoGlBJXQo0Al8aJnPGyYTj4+GsjTM5jt4cqBwujCHDt7L50X32YsRKIlxRgiO5DTDeiARVkpQg+3JRNx+KJg2M/btrRitDNmdbmO0xv1NmFT/Da5v/dZvvewXf/EXDwCsr6/7ffv2+QceeGDwyCOPVC95yUsWf/7x733ve1eefPLJ6vDhw93BgwfTo48+OnjHO96x/sVf/MWjRx999K7hcJhOnjxpv/Ebv/G4c04dPny4W1lZCc8880z5e7/3exubm5tPO+fUp/r6/v37wzd/8zdf/l//63/dD3DNNdc0586ds7//+7+/8ZGPfGR855133nv06FH/6X7P/v37/5d4MS9Nx5e7K5nbBfHhxORzf8CHDEtisDrIJpV6jV4ix0czN4ZoQIeA8VrIL0oTfaRUF2y6guwH8jOynk3rgGjFUrYwyzl4KTMBY3ZRySG1fWpBinrJLNRYYWOWlhDgxIlTHDm2n+PHj/ORMycprFnCmSnn7CXITv/SASw7qEyqoZ9rChVRri8m0JHehFtF+V6tCgpVSI6fApV3o5g7Sp1S3iTF5SaEkDsxBbGf9ZGjcXptXyYVJYEdFTLrVDkRQYoOy7YzpUBMmSuavLxmPZmFvuzLEkG5WnaBkZRHggJbhhjluqJYjoVkpBPSAUPK2XiK5Ts+CowqhVl6xxgFPjVWDhEhxjz3FDZkjPJayAEmopUFLZFEti5Ba8q6YjgaU1c1oQ1snd+ma1o29qyiteeej99DDJaXfsHLuGm4AsBVx69k5erLiTFQVpb5bMrOzjYpWk48cxLv5KrbToqlUQKzWlMxXeziY8LmgNiiqqRLCZJs0blOOn+d6KKTbsc52i4yn3cYm2VB2lDYAu87gUvzfDkGcSryzmPQdEGDhegDi6kjBY0PC+r1VZQeMJ3u4rpEUFAUlsKI3KV1nnnraTFyUEhizF6XhrqwVKUheHmc9xFb5AOrd6gU8CGxO13gnBTnbu7xXSMz8a6jrg0habROYjWYNN47KZ7GQSwwRU2goHMR33RoU6MLIyJ5JYnzqISpFEEpiqpkMZ8yLEYo+79Mo/DXvh544IHyl37plw4AvPKVr9z+rd/6rUfruk4nT560i8VC/c++58d+7MdO3HLLLY/2qetve9vbVl7zmtdcd/r06eId73jH+Ku+6qt2H3nk
kdI5p0ajUXzwwQfvHo/HKcbIe9/73uHhw4f9xz/+8fpTff3+++8vf+VXfmU/wM/8zM88/vrXv/78zs6Ovv766285ffp08eM//uMHfuqnfurkp/s9n7s7+anXpc34ABUDWvkMrRXZsSU79yNWVxJOGyC6/CtKAlHcKnwkFdLNkKEx6AXaedeNERVjX7ukwMQsBscBaUnbD0kRLmIMqiyniFFnfZl8aFQyaFUSo2xEJEPnElELQ/W+ex/iObc8h9XVNRbTqdiwxbAkePQwYS9gT1nbJx6OGdBMLMX2CVA5PV46xLicn5EsBCNFOR8OslcyvTBSZeF7k7wYvKgkXa8eCoXfBMkBzOSbfvSWUiaWYEjK5DmakaJpbU538PK4GHPae8QqhYT6anrGZFJBilR+7ZOS0FzihfsdvZfk8KRR/WsTC7SCSK89lGSDnqxD6qUfEVG49HPRgCoVZVlKsoREtWeNW0Lhhc2LXGLTzLALzcCO8DEwne4y2Z5QlyPWV1cZDi1t40hKs2/vmLIe8tD9jzEerXLrNdcDcPTIPg7ffB1FXWEKzXQ6JQZPVQ8ZjwccfFwyMr/0S/8+95dr/M4fvJ2bbrme49dcz2AkHWAv+SjLMvts9nNo6U5dzkmcTecEnyiKmhTJPpQhz75gsrvN7u4Os9053kVKZUFrYj1gNp9jRyW2iMRZZM/wEDF4lOpYP3AET0GioB5WBCKF1uzdGNF2cxatY945Fr5hNFjJgbweks8oQI7E0hZjvXiWTheUxZAYE7u7HVrV+HaOjprZbAJESjMiKA8+oryCQlx6vBNj6xQTrmshCRSulaE0hcSUefDOE42mHBQZmdFYU9CPgF3XURcFIf3lIvX+Nq73v//9o97h6V/8i39xqq7rBHDkyJG/sGg88sgj5T/9p//0igceeGA4n8/1xRmrTz/9dAFw++23L44dO9Y+/fTT1cGDB593xRVXNDfccMPita997dbf+3t/b/7pvv7mN795eV3f+Z3feeV3fud3XnnxNfzpn/7p6DP5PZ/du/WXX5fW8WVLMq1FRpC0CIkTeqlPSkqGWRJjozMTUk6QPQmmcuC8RxlNUpaWKLo1LRl6AoGJq34kZiaizj+74AJo2DPbJTNMp4RJMUciZTeQ1L9fAikISUQ+5lYKrxZx++bmlPPnJxy5/Dj3f+LPUEvTar3sNAgC1Yq0XIZ4fUCtQjbnED2oXo/YMxYzOSMvg8JohY9BNnst3W4gp9JHiD6hCkU0OhNTskBeW0giAzHpwp2IXLA7S3nOGaOQT7QS6Fhew+zKoi4UIa1NZob2cGcGnP8cG7OXW/TvBZWyOYHJEobklgxNkhEmbX6sHHSkkNEfVDJZh6gyOcYTokNR5g4oLjtBnVLuNaU7DiFBgK1zLefOngatMNpS2AFVvYLvPMOBZWPPOqYoGA5L9h88yGy24Nf+269y2Rf8fY7J3ULFQLdYkJpEO1/w4AP3c/jYIa47fpRT9z4AwNnNM/z3T3yAQ0eOce7cOdrWccP1V1CQcJ2jaVv0eMRksotShnpQA4nONWhl8D6ws7PDYDBksXsO71xmwYLzkfHKOgcPHWB9fSRXFcUubjQeCWxupHik1MkM1xdZz9lwbj7gI594jM4J3F0Na6zxrK4NaOctbSc2fsNqiNWa0Dl0nUTykLItnUp4L9ZzwZVopWlbT9cFUpB4Ips0zWJBCh7feSmUGul4lSVqL9Z1OpO/VEVwjpgMEAgpYoohppAUEoKnHozz5zlgUsA1Pst+Et45UlAE96yA/S9a9957b/kN3/AN1/Rd1k033TQPIaj7779/ABCC5LINh8P0Z3/2Z/f97M/+7N4Pf/jDo4ceeqh+29vetvetb33r3rZtH/22b/u2rU/19Yt/5w033LAoy/KT7HQuu+yy7jP9PZ+re/Op1qXN+JRQkgNCU099FFAUVw8UJAMhisMHUWCbC71cni9FsUkyKUOGSWZ0JBn8J6VziKmSdPI+oeEi
QfqyO0wWUiGzKuWJOOkYVSZ19GI6pMuxBlLyqFhIwrkhdxYFDz/8BDfefD31cBW32JWOpL/qlJPI81zKKwnhlTmgPEdl0nJOJg2bFE0JuL1wClNk2DPpJV2/F4P0D5OmOZK0xsSETkoIMcZLYU8Ce+psTaOSvUDCQWBT1QvVETcbEMKNFC1x0umd0JQxKFVI1VUedJRCn51ZLqy0fL5aC8t16abTT/BUJTpGIkQvJ/z8+kqobe4A+1uS8vNRoINAYapMwu5NMuwXsb8S5mivIRQ8DqvE01QZQAWInrZrsLZiOpnivKesYf++Pdz43Ot56rGnefSxxwF5r7o8w9XKMBqvcfllV7Nz/jxnds8yMjK3H443eMWXfQm///t/yO3Pfw4WzdlnTlOWNWvra5TjmrIqqfZUtK0jZHi40Iq2W1BVNZddtor3HYuFIvgapRWj4Ziz57fy84PRaIBzDYv5HKUSPsOLxpj8/D1aGxaLjhBFLzjZFaan0gNihKIcMBzJQSh6DSFirSN1Dd6ztI4LCUqVkzeiwzlH8NC04LpETB1N61nMOwiebt7hOoetDO18QdAd2sLCdVg7RmFy2oWT90NyJBXQNiMtdoAuKpIVR6KiMGgdaduO8UjmiQRoZ46N9TUWzzxDWCxQ/u9Ox/f5n//5M5nZJ37iJ37i4Bd8wRc8Vtd1OnXqlJnNZvr48eOfFFXx4Q9/eOicUwBve9vbHnzFK14x+7mf+7mN173udVdf/LjNzU1955131t///d9/RnJE4WUve9m173vf+1bf8573jP/BP/gHO5/q6//yX/7L0/11fd3Xfd25//P//D/PgHw23/72t483NjbCZ/J7/lYWvqCgsQoVFGUSfVFSLWgRjauoJbtIs5xzSR6dIioNqUPFlqALki5QKWCYo+0I5cSBRDD/3O3EiEolyXsgkLTP0sFA0vKB1bETNxJEBxeSJ3mN1q1Q9pebKwKV9WiqbjAxoTqTITiNbxY8+MjDHLjuZk587M8olaPVYsul8vPpnVSiitIxkqRbUykXlJQ3Zk3S0iUak7WB/ffi6NRFBJhM+RS4MAvfU7bn8h0B2auUMZB6s+qYoVwlhBgVySr9DHmKYTVEQlAkJ6xIhcxSVP4a5MBYbcXoW4aZMj/Nmr8UUq7EankoUQmUsSijcVE6MRVrtE4ovSCFMmsnTb5fyLzOe/FTzZ1wjD1ZyGOCg1TiQiJ4hy0UyYhkJSKzxt7Zxqg+DsrkubPQ5pUxAi1OtvFzQzscYHdLnGupipqNNcVlR/YQnpHNtB6UmEoRg+TUaaXYs2eN1dUBTz3a8cADfwxAVSnWxxXRB8qy4vCBPRilmM0aSltJer0OoDUjUxMD+BjR5YB1u0b0QQ4MoWBc13IY0nIIOVIdICRFVVZ454hhxOoo0nYtTbOgGlaMhgO2d7bZnEwxWIZlCRqCtjy9OafRA1xUjAYlcbHD6p49xE6YpC4mPBWdD5TaM6hKxmWBiYlkIkWKOK/FyScGfBdpQiQk8Al805K8YrpYMBoVLNoFRaFJsc3zbUUKDT4bcIuG0+N8Q1GIkYJ3BmNKMUZQimgEig+uZVhYOm+IIVAbRRsLir2H0GdOMW8bGvV3Z8Z3/fXXd9/4jd945hd/8RcP/P7v//7G4cOHV/bv3++eeOKJ+j/9p//06PHjx7cvfvytt97aGGMIIfCa17zm2sOHD3dnz579HxT/J0+eLF75ylfesLq6Gg4ePNg559Tjjz9eAzz3uc9dfLqv33TTTd3Xfu3XnnvTm9607w1veMNlP/uzP3twOByGZ555ppxOp+anfuqnHr/jjjs+7c/5HNzCz2hdUuEziMN6HmaJHySSQqCSkeKndE5oyPxHCbgDK3T7pKyQGfrNPXmSKoSR15M0hDJJUglDkdPGxSdQNHEmf4ACzgS8AhM0FulyRPMl8Ini4g+NwGriOiJaKnqeY4oQC7a25wz3wWD/UeZbp0hpvmQwSvE0S/cRUk/X6LP0
8nNi2eDIbI60NOAGUBm+zWM8KZhKs3Q0yzDi0i6s1wWmTPDI3RvJZMF6hiFTypuOgmR7r2mU7n1FoxwWsLnDlmtA9Xc9Lbt6jRZnjXShW+8ty5Q4E4BSaGMyDBfRsZDbnmURCpYHoN73M/UFNAn5hiVRJtNmVJdnqioTZCxKSyca+tlv7vgE3tQX5o0piZTGB0IKhErR+ZaqGKBQPH7iFOPBEQZlSWGE1akiEsCrrVitBYeW1p6jxy7nkUI+IjrBuWdOsTocsrG2jjaKPXvWOXK0JEZhgroYJeNQF0sWbxdc7mYNhbV0oUNrTVlVIquApUWe7zqqsqCwox7PwHUdBGGM2rJkbf8eQutx8wVN6NhpPckaiqoWlmiMlNawf+8+JpNdXOekw3YJW5SUpqAo5P1vrc1RXEnkE8hc21hQIeKcw7cehaRnGAW+a9AR8c1VmcGpNM51OQml9+dNErcUpEtV2oJOhNhhdIFSBW0bGQ0jeI9XiqJU+OCIlFilMSmKgfklqq7+tq9f+IVfeOrGG29s/st/+S/7H3vssfrpp5/W11133eL48eP/g6Dxtttua37yJ3/y8R/90R89cu7cuWJjY8P/+I//+FOvec1rrrv4cQcPHvSvfe1rz3/sYx8bnThxoooxctVVVzX/6B/9o/Pf9V3fde78+fPmU30d4Jd+6ZeeuOGGGxa//Mu/vO+JJ56oy7I0R44c6b7gC75g8iVf8iW7n8nv+dzcwU+/Lo3cEhJlSgQjsJtN/f6XmX5EtHJ5EzVCWOiJLvQQXEVK1ZLOnqIGnwglvefYUvNni1KG5sTlqT56ckqEIhtFLr0iZdZmciaczgLsP/cccjd28YcpJemKYgrYqmDyzNOM9x1mOt9Gd7sC38UESaOMyRRtxNMSgVVjzFIKoB8Ch5T9QmPW2eVr0RmGTSnPRHM80FJKkIuXwHzZwSaKUD5p6fTQGhuTbPy5e5L4H7n/xEIIPiQSTSaE5NT6zJIF8uFEYL6lR0rKMGefpK6gNw5IGdrVmYIfUwKTZ4QEiCWkGui4OIopqguHBNkszZItGlO2oUOKtMlEKZUiMWVbM8wSSgfECUXbHrslBIntScqQvWXwUaOTwSTFbOppT55n38Yqlx9eBS3+j107p13MiAFCVAwHQxZNI5GCKXHkyCEAFrsth668kltvTvzmb7yFv/9FL2Wx2KXNbi4b63vEhUX3kVuiYayt0PFd15KiwQVPWVWUhcxBneuIRMxFhgQ+SPG3uoAkiIH3HqUUhbKAxwxqUiiZT6aSNp+ioBbec2DPHkKKzGZzOufQyWIj2NJSGMVgUFJWibLSWGVRSWGJ6AbarmXRepom4tqWxUzkMykFjIm4bkHy4HyiqqolUUoY1EZml0EOhdoIu9V1HlMobK2oq4rOdXgfWFlZBybMFx3l0BJcg9GWYaFROxPCYkY7m6H8n/8Q/++9tNZ83/d939nv+77vO/s/+/qJEyfuuvjPr3/968+//vWvP3/x36WUPnrxn/fv3x/e/OY3P/4X/c5P93WQOfMb3vCGM294wxvO/FV+zv8K69IKX8qn+NTLi8n6KpXH01GIG5mOndD0Vl06xNxweUIocwSQRnQNWU+mRdogmX+K4BMmCRtMKS+QG+LGLyxARxETuFwwiwtzNxJZSrEk5ed/enhQusKUtVIKS4oB1QYWTaQyc9ZW1zh/6pxoEKNAgzF6+iSKPvlhmd6nMtkkXkRlycVNibs3kEXbse82M4Eme1qSC+RyNqml5IRUoCPCxgvCno0xiQOMCiidkASMQk4OWovJdOq7bIixL3iBQAfaSweYNXKpT4BIwiBVOovUU2/SnehlFzLDjcIwXV67R0ez7AwB+rmA7rtiLcQGrRQ+pJxekZ1nIBdlm0k0/QBSOiM5UGkhTSmxVvN9arnW8n7KhVnl7t4UFaiSFBRx7nj44SdYX72WG1bWALjvgXuYpgVal6Qg/pVaa2IUp6HKCTrziT+7h3nXQYAD
e/fxzDPPcOa8pV10DAc1T9tTlGVFiI7RaMxgMJB7Fz3aaLSVeWXTtaysrnKuPYfVBltYAoGVtTHTyZyqqmjajtW1NeazGbPdKVvnN2naBl0UhCCfsabpmMWCrXbE7sQRhFCLNXDw4DqL6YwYo9iGdXKPSHI9xljqQUFZydjCuUTnW5yLxGTFN9QlQgRtS5wL+KyasUY0pDommvmUhOQdyiFJujthmwobV5GwthSYOwlcXVQDFAU+G9FHrXHeU5QWYwyFbplONjOJpkP3JrfPrmfXZ2ldUuGLSiThwt9I+KRRMSelG3E9cWQXiiQwn1Yp+/ZqSIGkW9noTQIsMRiSdlLgsKSUI2mED3+B1afjkhxhTLGUMQQCqHwy7h3BlgWpL4QgKeWZcdoL3FOQLk0rEb3HBBRoNWZ2fpc9R/ayPTpAnG1htUIjiemRDLdBhgt782f6uyMwqlZSyLX6JAuroJJ8DQNZUqC12KzFJJ2GVkLETEruo0geEjb1kKuwPH30wtoMORbHSqzNku/Zw7Mm6xyzi4xfZidqVDDLrhGdC28ypOh74if9M+vNuH0IZNNHTHTL2WFQDo1ZFrz+H4UCn11LCkNMXU6nl26iZ34qdSH5vT+2qFzMiKJv0bkLVkZeT5Dvs9qSNBiTDQ2MGJsHDYQWNfPMwgp3fuAejl9xlbwvXGAxnzEYyj0wWjSIru2YTCfsdRkSVQU7sznKd+xsbaKKyMb+vdSDITE62qbDuQ5jCxZNiylKYvBMtjaJQD0YMJ/PGQwG7O7MmU4yBBmDEHV1xAXHsB4QY2JjbYO19TXaVjrTlGAymRCipypKFq0nViM2J4GuU3gvDNuNfWOqOrG706CSJiaHUpqiKIk6MhpWGBNJoSMFS0qKznlccHSdZ7HwxFigtcYFB9riE3QBiIHkHSoZKmto25aAx7U+J65EGYZo0Fq6v57tbMsSoy2LLjKsLKhIFxpSDJS2ApXQqkABlZpycvOceJqmtDQReHY9uz5b65IKn85zFRLoZPMGKjliRmmwCmlxOoIqsQqSioS+8zICCyJ8QjQFURsiDhccwVqMNRA9PjqglEdGTySJNVaUWV9C5W6wJsqgAp0MCk3QQeTYMZGSfGjEP9SgokfpSOw7CmWyvyighRlpEUPfyc6U/XsPc65rcW5KpaUrisqwTIuXwV7uciQI12idBzwQVJTbnDJTFfK1yvf1qQVCBEqZ8UCGNcUVJERDwmNSjufJHR3Iw0ULloCW4HNG3RLKle5bhYjP8g0VxTLNRJ1BQSE2KCMkFa0MBi8Hi8ymNVHnOVpPXgj05Yk+WcJYEp6owyfBuj2Mq7XMs+bBc2jfPkajMY88+ghKWZKWhHWHF5YwkjmoAgStMrkmP69kUDqIzlArVMydkLXEVOIjVCmScCQTiMGTYqSLAdt5XFPxgZMfBOCxx56iWR0znHXUxQDXRcrK4nzAR0fTCgmmaVqazhG6loOH9vDMqdMUtqI0NWWR0EWiTeIS053fpXUeE6PAzUbCZNfX14kh0rYtVxw7SDWoxO4rwebWNivFGjFGxuMRK+MVovd08wVd26K0GBl4FyF2xGRxoebczjmiA6ULChM4cmAPk9mctpNQYjHMNhgr0gNdyKEyJkUMhs41tM4xn3mahaPrGjqvaOfQtRDiDLqEVfI6hAREJ6bT2SCdkIhe2MZOewwDUpeypMEgsPeYZLJkSUxkKW2BKYYC/5YVptToUBK6TbZ2dgnJEVxYplQ8u55dn611aR1fZkv0cTYoiMaI+Va2kTQqYbgQ2bMkRujMCEwRk0FRYbBHyt7dJWfHaQWmkLR1pTXG2qwTS8LcS9mTJaplmrfWfdBpZhH2pJGLCkDfrwi8yNKEWWeKfEoJpSMpOJRSNIsZth4w3HuI2bkniG6emZW9B2dvV6Yv5BKmfnZG1h/2xtoXhvQy/BfiTU/5FdLIEk2U71MIuJciyfazu561KTIPEQZnBm0UJqdsFGHZDelMfulf
D60EehRYUV470U92kKLID9SFzjmzeZbdLAqU1Rk+yx1lJjYtZ5PZNk3l2Z7JhxNlE1dfcyXf8A3/B7s7c37jv72Zp598YkkgSkplUoTMtkKMF362zocXrdFG45OTzVVn2+0UgAUpgQ8CwfuQYdKMFkQWBN8yb+RA9KGP3s3BwwdZHw9ZG0Oz6EBJEQ8pZGM5aNo588WcGBw2JaJRnNvaYmCHDCpLoCOZkqhmGOdIM0OFmE+30TOoaxbThUDDSrG73YCKtK7B5263LC1FUTKZ7DAajUg+QEjsTCdUg1pQjTy7LsoR5yYNTeswyuBpGVaK4dhyemsbdMminUsEkDVAkrmi0HbJShd8l0TEngyubQkeiJG2mzGohyzmSoJEoiMFR38GkaSWkH1dU36dsqsOHo3B2oKUOuqiBJNJYWLBg9EG7x0hQDWos+ZXNMK7WzNSUviuFaTGLgcHz65n12dlXWI6A0v4iSD2XVrGKviYNWd9EGoyGTLTKJvp/Vh86khKLLKShiJFrJeolJiZmGgJ+0xJHFDIIm7vBUpJ2dQaBSHIxp1UJARhkqKswInE5RxOyAYG8Hkflw0AJRIKyZATWEUSAgwpJnZ2dtl79BDGz5idPwF5tiVOKipfY9+RsCS4gHRQJAv0iQSfTKhJZKMwpTA6z+Dop5FyUFBJ3DBCciKaT2Wu3JmkktPu5X+RqFwusj1x5QLZBjS6J5nELCXo53Vymrhwff0/fZFXZP1eyrKRJPo5le3o+uePyjNOldmdIbNdDXVZUwwKjl19iJ1ui6DhH37dP+T3fuu3ueuuu6ltiTZWbLO0xqeIKQr5edpId9eTmFRCYQTwzdBujB4dWqJWtK4AZSTSKsO4RhlicgxthckUeR8sjzx2gqsvP4xRJXU1wDnpnL332YUIFs2U3d0Jg9GAc1vbDFZXUMoKaqEMzjticETlGWjDZLrLQBcMqhERzdbWNPv5CCnEtZEQPWVVSs5jEjZlCGIc4L14tmqFMEGNsKZDivgEq/trJnOwtqDrIsnCVVdfhvfSvQWnWTQOled+SiWs1RRaDhZVaWmbjrbx+BiZ7M6Zt46kCmazLcpSsbt7nkJVeN9RFsLgDVkH2CYxEkgxkYJ080vtbfK4GNAUWWEjXqw+BclbNPmNFSO2LuWzEA3GKghetJraYLXGrtSoNLukberZ9ez6dOvSBOxkfnzuTmI/v5EWRLoHshtHlgoIeSIKBVpZjCpQRvouFUR0HjKtXmVBtBaeBFZyjYTtFm324IyAB51EPmFFKC1kDCF6kCQo1qCxse+ocjdFhisTxNypCTNTwjL7YKGUAnhQ3rFz5hyrqxvMplNCt0Arn2HUDPXl7rGf+fXdGsqTcuGTgnhBWqGzhkuIH3KfjEbS5hMZQhWmqYpKCCoqkVSQexSFNBMiJJUoMwod6WdyvXdnLsb9fA8p+qq/pr6R0oZEIXOcJPdBmSyk6GctSHGu6hrh7Rq5Nl1kElOQepyyEXXq2b2AtuiyxMXA8euOM20bVqoV5rOGV3/1V3P5Vdfw3ne/l8VijtVZ8C5D3Ysi27OvaMqHAqXFIoyETymn1ktEUNJaNJcKJOrJopSmsnWWs0gnWJc1k0VkZ2eKTorxSmQwqAmdoyjEQQagrgvqQYmLkel8wWhlTFVXNDPP+XObBN+yd88GtiiF1agNtqiYzxtMVWG0pbQGYxRt10gYbEykTog93nt0uODRmpLIRNpugUuB6B2VHUg6iilIdiBzPhWxtsIODAf27uHsiZMsZonkOqwuiSS6IMSwUbJCTNEGqw1dAp8S09mcWdcSbEHsPFVRLy3jom/RyRPaSHQO0PjYkUgYpYku4DvR2Sqj0VokEgBJFxhjcCFhtXT8RWVF6xnk/RS8o3UB7JCqEDu9rg1Ug1UWk9MURUZOnl3Prs/iusTCBz0zkqUdWcrbec/SJBfGDEkRM0wJKsq8J5kg2qSUUCagjDARlTGycUYgJvGI0VpgSzQg7M6e
jUkie4QqSWlHkfBSoJUifZL850LB7okpWhkpuEqhc3afyrM3pcRst0iRMJ+xqCzDjcPsnjshG3ymXfSuJin/XQ91yvPpH0P++3wHY8w8FyOu/kYvOYxKK9HCRcDkFHQSZNsnVMg1wSCUnawF0zp3ZdmNWvUejJGUBHpSiJMMStib0sxoIftk4k9vTCrszgw/xpBnkgprLNaWAgMmMcAWUyQHXmQrfep9yq+b0II0ISX27tvLxvpedpsJi0WDRjNbLLjx5lu4/IqrufvOO7n74x9nOtnFFoUUXRVBG3TPDs26y55JqHoHHKVRVALHlVaeszZobbHaYoxFmwJTWcogbM19G2vs2JrZdEGKicYHNtQehmWNUtB1mVyixFS71CX79x1gujMRaL8oGYxGbG8t2N7epqwHWK1JtcwJY4oQPCEEjCqZzedorbCFoQ0xoxbZXg+dYV5omg5jAmS0RKOJMVFXA6Yu4aJh0TUE57DGcmDvOu1il+jFPN51HUVRClpSQlVLCrwtLFVl8d7jvNimKS2sWNebgIeEdzCoRrjZhK5ZUNpCjOJ7nSxy4DX/f/b+PNa27K7vRT+jmc1auzl9nTpV5epslwt34KIPXCdl7BAB4SV5KDc3hGeBHgqdEwFKLCSERGRi2YmILlIUOySOQl5IIPcG8kKcEJzLu4+Ach1MiMEPG2xsV5WrO3Wavfdq5pxjjN/v/fEbc+1DcoGcSxnHcEappHP22XuvueZaa/zG7/v7NngyFszbttFm5pKrnMjhXQtzd50zvimor9aFXUNG2T84JG8zm/WKvu1QHP1iScliJK55Nn5n3Vkv0rpNVmedGeEpdYbnBIOgooNql+ViwKultRcFp6lCS8b2Q4xm7zHHFxUjwMzuKI5qpRS8CWWr/ZYngBaKz6AFJy3qWnAJJWFzLZtP+WKb+aTzE43gbf7lJOCdkr2aNRcG+6lUUbszNmnAEiZElXF9wuHZu9k/c4XVzaeImnDeMWmhEduUb9UbGmwacH6snVe3mxexm3xWI2knoLbJu6D4TpEiOKmQK4WkmdOCbIxO0+2ZXtA2JF+t3WoyhStAJouaZKQeSiwtw6qr94Kfi2rV1c0nbJ2hYl+1dzhihLLZgDPRcdGW7H0lPEy74oRW5x1vhxMvnpwyr33stUzbgVYUQjT9oAjqE20vfMEbvpgv/WNfzC//hw/wS7/4S6xvnkDwhGg2ec5bMQPdyRogVoasHXu6tiH0xhJUPItuny720Clt29Euei5uzP9xTJmj1RHd4EhaWKWM0tJcbmnUIHiAgmecBkJO+NAiBI6ur1gsl6gK589fZHOyIU+KtNBh8GcXHPtnWtbrzDhOtZhm0na0IofNmVPKRI14DO6MITCNEwfLffr9A1abDWUUyjjRHV7iJDua2NMVwCdecvdZ1psjRhp8AyHWPEevaMnshx7nQ81mNmh1ShPTIIgEondoygxFOB42hGyB0QAEz1QmY3RKhmIEJHFCrlFLoYm4AuoyyQlNbGi9wZyx6cma2FscMk5bpkmIoadvPV2MjNuBxoM2S6ap4MUSI0LfQJn+kMnX76zfj3V7rE6oEN+cOmBETpU5LcHqjjB7PNZCqEZUEbI5/IszVqhXJgQJ0Rw51ETTvv4O83v21Z4q30JKqWkIPtYiamw/7y0KyKnDE8kqlVU2w39GswZPKRlyIfhQOxtD1bwH0Vyjb6zIeA85Tdy8eZUrV+4mpw3jjatEnWijoi4Q1J8+CMaElFnKUe/HrXdyzsND5+6vVE0bzCYvKlrTyE+hT2tc6/cVzCjYNbYRVZmDx2qYAdPu9GfdXHC1urxYMfPqdsNJ3V2r2nxS4bQFLZQc7QJ9si5O4m52tSMSVaarn4NoteBoONjb4zWvfTXDNDEMEzkLrfNM44iWif3FEknCdhj40je8nocffTn/r3f/A4IEylTwwSE64Z0SXKgenYHGB3xoQSs7VRWy0raB/cMDgu+IoUPawjBsObp2jeeO
jwG4fv0GW93H9w25Wpddzy9wz+W7CbE62wDON4SuR7UwpZG26dhuNqhzdMveiFbBoaXQxJbtZlON3APl6ATosMOI2xGRnPPIVNgOEyVnlp1JNHywg6QW2I4DBGh8R84j0+Ro4x65hgLH2HHufEcTLe3A0yLjwDgNtP0CB3RNV7WNDucLbdeyOspMY2QYMrkoU1LyYJFeMQYohTRNlJLZZUdqQrKAKOJLlTpk8y+gQuka6utvRgLBm1NTiGaqHXxTi/ssVVGCi/Rdh4ow5C0EOxD3IZrJ953Sd2e9yOv2oE61wuFdrCkNVMG5bd7OixU4ncXdtcdR8+sDy4BTb2GyTux3uEJlZhqcqVpsniVWGIxIYv6QKvMM5NS1JIuvtmYmhDdVgBH15xC4ef42h8mqh8ZZARIxBmGInizmeznP5OaPsEihTCtu3niBM2cv8cJqTfDKKKO5ocykn/leOYNBTw2eZ7oIVed2WiRqGTOqd+2sPI5cWaHOeyMT6TxLnH9ntXDTmbhjQz6DPLV26HNHaY+hNT3dVx9Tr7fAhTNrc5ecV82gmecs9XGcwzmzmFNnZuM6/3x1YPEh1NeuPjdnd2izXbMdR/YWPettomRl0S/oDxZozsgoZO9Jmrlw5SJ/9A3/A/+fn/73FqgrBnOXKSM+QhHTj/mAa0Y7xNQaHVxkzAmHcObMBcbNyrq9JjAilGl2bhnZrD2RnsVeR5ERGYXrV69z9uJeJc6AFCGlQvQVwhaz8fIh2HtGHT56tsMaN9lL4UJD7Dq242AHoSI4n5EsBrt6E/W0rcc14LUKv7UgosTWOsLVycZGBRoR39J0ezCaabc65crl86Q0MSVhfbLBC/TLJSFGfHA00aQ9w7Cl71tyymw2IyLRINjYULZrcoJMoetahnFliSCzNra6y7gQSVnRkEjJCGmuJJrQGVTqmjrLNmg9p2JB0s6RUqIJLQ5H0wamaSCPsL9/tpohZFxwRmhqepaLlrQxR5o76856MddtFb5cLPbEIZYEUINTXe0WnNQNVSy5Dh/qFm7mydSNXBSir24opRBoKGFWhc0uMDVF3AmhwjVz2VANFa7J1oxgZJKiVF2a2W6F4OGWFCtXLbZEpeqisnU1WNdXKjxKJZ4UkSpoFpsB+kQa1uS45Mp9D/LxT/wasVsQsgmw5+cHNsebY4lO17yL2EFBKncDNVanWR2GysL0dRO0Kd9cIG91RLEWzdw4pNqoWSqF7uQRznmy2N9nGYqr7izU+20pEfOUEWbKO26+5/Wx8IhMoA3eN7gg+FBJQH6eKZb5spgzFudDhxMlhsA4DnRtS8qF/bajawJ5mpiGAY/NktarE7xXXvV5n8P//jP/vjasFrFkWYIFKQl1jgmhjAnzX4275+NxhFVks96yvzhkNd3gwsW7ODhzlm5rwvS+i7Rtj6ozFmSEBsennvoEh5deAdEOAd4py2ZJyRO+i2QKbWzJUyY0VvS7rsW7JTlPdN3Ccve8J8aOkkyCoSqEatzsnGMYbRa3XCwo08B23NQDkLGjRYTgTIcX24b9wwuMrWdzNNA2C5ZLz/7BkuevPc92LIaGaMZ5s0cLeNrojSEZGryLbDYDpWSb74WCSGEaN5TsK4EpIznhnaNprdNEhFRzEWO3gFgoxdNGC6OVSSmuEGOP9w1STO7gG5vxKUoTDR0I3ubT3lXGsJr1e5HJDq2hMRSocZQ07t6Wd9ad9WKt25Mz1ERsJxB8oPg6q3NABbo8vorWzffRioo3O7BomyPZwlJDUIg2VD+1tjKRq9nz1XgX1co8bMyzkupLKZN1Vb4BF1G8jfgkoZrsQ+VPPzW+skItEd3cVUpxOPGVUi02iNdTCjwYjTtXqFWmiZObN+iv3MXBpbtZHV2n0QEouxkotQuep33sXExq4ROpGXZV/KigKpRKSlGMsRiCWUCpWhTQLJansjZVoxE5vP6WoqiVWTp7mKpXm2HWjs/ugC2dxYPVwLqCnRXinOqTaYDG
HtpnvIbqISoWO+MriWX+XVC7Iqvm5swibIctR8fHzMG7TYwWXFoKSeyxhmmipEQaEgTlV3/5P5swvdrASTGak/euFnbLNjRyjzd2LwHnalSSg5Pjm0gqxKXnuWef4b77X8r5uy4CsFj0tN2C4JWinoiniHLtxrUqObHnM2w3NgPbbOn391kuO5jsemNsiW3DsB2t082Otu04PDwk5cI0JsZihgcQatq4vedi9DgnbLcr+rZhdwur5jLU9AlcoahnNW7w3QHLxT6b1YaXXL6HaRwZp8I4ZiR7Sh6J3R446/ZijKhCbEIlrpj5gCC0refo6KZB/E5wYmkMXdsimim5ksBcqoxnj5Y6b1Uj3eAatCihsfen1plE8DajdiGAF2KAaRjRUCyPzyux7ZmmDU1s6JrIQCE0Ad9GqC46uM9sx/fcc8+Ff/2v//XBwcGBfO3Xfu1x0/xX4Qf/Xa5hGNwcZHtn/dZ1W+8oRW+xLZs1c/a3OqLbfV/wEGY5gisoVoiq1KdKF05p9fZht+4FrV6RtXsRN5FIJK8UX1mIZOPhuxoUmyFiMEuuEJxR3+vFOysINgOshUYFdXW256TadWndqO1ELChZEkWh5IAvDlLi2gtXOTh7nrbdr3ZtajO9WuBMXld1Z/XxZ9gMamPoqOzE+X5WyHRusGrpjK6G8s43uUpHzO2lMinrz6raCV6KUMRc9x23HE7qPQObQyrVn9Nn1KUKi5qgWior1B7zFs2cS+Ayoh6kqa+45TPOxcgHqdq7Bldz5JYHexyeOU/XLNisB7abNavVMavNlmFKiECZhFysoKh6nG+ZUoYilDShQRFvhm3FCUntYBDUEWgIobfuyNXcP7HYpdW4IpeJVCY++eQTldwDy8N99g/3aff2CE1fjyrCZr3h5tVjnHaA6eW0TMQQUVH6rmWx6PHqGTcjebAOaZoy2yExDBPPPfc8x0dHrNYniJowPouQxoFh2BCCo2kaUs7knBmHqULjjnHMlEkRilmuOWW7nRjzHilFhm3CxcK5sws2qw3DplCSzXcXewf19TUI3wfzSDXC0UQpZhJdkiMNiWm7ZbPe4gMUTeQkpCmZD6y3sYWZtAR7T8lIzqBlhFLIGZImkGh3z9X3iQcXPbkkSklIyfRtYxZvaUvsGlQLWTJjyqizrM/RTRSfUe2ZTRE+U+sf/IN/cO7uu+/+vG/8xm986dd93de9/IEHHnjNr/zKr3SfrscrpfC93/u9l++///5Xt2372JUrV17z1re+9W6Ab/3Wb733wQcffPVisXjdfffd95q//Jf/8j3jOO5uznd913fd8+ijj77yB3/wBy/ee++9r1kul499uq7zs33dJrll3mgduGDmxKj9OURjHzrFibe4HASRybpEODU39vYGF6kG0b6F6uvp0Coadkgxo+XZtQWqUN7JTvNnHpzZuijxeLUPOszEhJmhaJ2i09qFSdWKOayIkq2UV69IUAvLc+YmYmdbxYsjONhutmyGLZdech/PfOyYhYvVxsyZR6SvIN/sUuZ2l4Lz80zS/l0lWyKBXRgzCcKGdfa4zkHOBRcCudLgQ4VFpc6YnFnnVJKQVt2fq5ISj4pRzHEGQYuz6wrUgl/v0YxsGmxocKliXTrO4nt2bjTObNoENRKLQqxwow/Vqi1ncIHFYsnTn/oU4zDVzRi0Ebw2bMaBPgQ2q8LhuTPsLTqUzP0PvYQPfeBDyJBtruiqfbUUshM0eHxRfKkMRj/Z4NkFgos00dtEUx1pGtg7OIOLHetxA8DeQc/e3p5BfzkzbbegBU/ghas3SMsee1Uc2zHV1waOVidoUhZ7e6w3J5SczVuy2Pxus9kQQqjp5rDZbmj6jthEcyQRi/2JsSGPdkhwTQPOING2s/eHEUoKBIjLJWfP383NQdFYOH++hWmkFE+WQBYlho5UTYJiG+m6Bs2Jbrmgax3jOOAxmD+lKtIP5m9aioXCxsZcXLwaYiLVMzc6by4uJZGztxlnxPR7
9fCHK4BpJqWOIHxoasEVmjYiKE3XUNJE13QUF5iS0ixbwBH8Etnu4WJnIbbplnnF7+O6ceOG/4t/8S8+dOvXnnnmmfZbvuVb7v/5n//53/h0POZ3fMd33PujP/qjl972trc9+YY3vGH11FNPNR/60Id6gIODA/nhH/7hj99///3pAx/4wOIv/aW/9ODBwUF529ve9tz880888UT3kz/5k+d+/Md//KMx3h6g94dp3eadKeg8r9sBeeZcUlyNNlUhqMNlrWMkMzI2g+G5lanZaq5S8is5xPLfCsEZBOaxDxe+diyqlvDujDHpBYqzIuMrnFh2syVXO5p5uV1x8DqHmNbipA60q2QU20B9wBw0NBlr01etnQjFeQjK6toNzi8OObe8wPr4aXCVLclp9zt3bqfUFvsdUmoxFvAYiUB3cOZc6KvJ8y2TNu/NgLmUUwu5U4JMlfCpwWqCkWsC5s7vagHX+QDjPeKdaSpn7d/u/Kg1z6+xx3HZpBc7U+g6K3Ra8+RM72edQb3PIVZI1MyQH/uCz+dP/en/G5uTLSlnjk6OaXzkyaee5t4HXsLhYsHqZOCZ558jdJ7f/PhHcTHw59/8P/HjP/JjyJTJau+R6L0dfrwnqOJLQNXy3JBEMdASshjztemQLEwnGy7ddY4wGquz9Y79/QVpGJlUkOAsCNc3xO6A8xfPALBYHrLN0LaBlBLTMBEF2rYhkRgm4cyZc3QC69WG559/jn7RcvHiRbbbieBbpHhGKQQNZJmqI02kqFoMVHU5GYeBlCbatmVMW7xfgEaahWc9rtgMikjgniv3c3x9w8nJ2mALZ4xkPIRo7EnNieigaz2Spxo2a0hKCLBdT6xOtoTQUopB4t4JfdeThwmnZg6QcyFQUClIscOmD2ppK95GFUVW+GLFyvkFPrTV7aUYU1SF9WZDaDyhDXbYKBYuvdzfg6I0vqFre266htD0KM4M0T8D6x//4398bk43f9/73vfhf/Nv/s3h3/ybf/OeX/iFXzj8xCc+0Tz44IMvqnv2jRs3/Hve857Lb3/72594y1vecg3gVa961fiVX/mVK4B3vvOdz8zf+4pXvGL68Ic//Ow//+f//PythS+l5P7pP/2nH7/nnns+M6eFz5J1W4XPRktGWdYaIe18JAsmpA6mgwsaQSpb0I8G/WlNXlDMNstJnb/ZfElDA34mmCgR6/SsxrpKafYEAqVCfr6SUsyxZdrBbUG8CdF9qAJ3CASTRVQmpBWdaPZpt0B0xtepJBdnNlHgTMZRW8SM+RK2Atev3uDSvfexGq4heUtEa07fqR7OzX/W0xs5jx6DnhajXa2cmZT1f6jepD4YK9O5HVdo91214Htf2ZS1fytSubXB8vykMi29M7Ni70zbNVu27VpUauKEtnbg8GJzM1eqTMHtrlmpkK26nZdj8NTOpWP0Vihe+bmv4j/+p/eTh4J3gSZ2rI5WFIXf+MivM65POHNmn80w8cBLX8r1a9dJ04g/A/c9cB+/9uGPIu70kNPUmCU3vy9dTVegGDF1hs9VGdOEb4HjE67lpxg6u9s3r11n6Pbt/iYjWahT2j6AZr7o818LwJc+9vk88MA9iCbagyVXn73Kg1fupQmBMU38H7/4i/zGb36SC+cucc/d9/DAA/fzy//5P1FK4Z4rd9OElrjoee7qcwQV9g/u4tnnnjeUoXE89NBD+BhZnRyx6DuefPJJ9g/2OWzOcPOFgfXJmouHZymaLAMw7NEE5WYyq77j1Ql9f4bgzMQheE/bNXgyfRuITpmKosUzbC1/crOZWJ0kSnE0TWBK9t4oYiSh4CNaEuINxqcYI8vgcjNeGKeCEG2EIUozz169IJrxyTScbduwHQdSMtmOpoKPke040vVmQ7gdB2LXI0nwvqFp+yrIv51d6sVbv/7rv94BdF2njz/++FpV+Zt/82/eA/DhD3+4e7EL3y//8i/30zS5r/qqrzr+P/v3H/7hHz73d/7O37n8xBNPdJvN
xpdS3N7e3m85Fdxzzz3TnaL3u6/blDMYbGadiHUmdroXIxtoMMjJFZuTScRh7L86uiCqGrTmHMWJ+feR0OxxTbUlc7WQuYCZD3sGLRXyrGxPjTjLmzFyRwEw6EvF3GSkQnhQi5lNqVDCLqTWCos5/RsL06QSdZhV+R6lPmfrynztckSFk81N/Kbn3N33c/VTH67dV4PqBkdjT9q56jcyL/ubF+rAs3aaUOd2lUhTpQPeBYJWj8xss0m82xU+K9JG+NCiuxkdfmbJOpzYpu5mRFXVWK1OoTjzWLUbxamWD+vobChbJ8JaNZw130Ixp3+oAbVWuKNr8OrwrdJ3e6iP3Fgfc/7Kea4dXaVtA3lKHByeY5q2xNY605ITmjP/+7/7WWLXcddL7sZPA6/+3M/h13/jN/EukmXEoZZDPENqQWtBDHg6XDA3EZzFPQVvFnloYXNyzLVj27M+9anneGr0HOzvEZpgc18RgmSef+Yp/t37/i2vBX7mp9/LjVe+FJHM+QvnCeJYP/Ms23Hg4OxZzh0c8Ce+4o/xkV//KKITsVH+yB/5It7//v/I2YMziC+89vM/lwen+zl/9gxNEH79Y7/O6qTw5JPPsFgErly5B9VL9H3Hwf4eiiM0gUdfts/5s4fc2Ez80q89S3KO+19ygeMbJ9w82hBjz8Few5S1knxsFjeOA12jdH1PLomcMrlkNuNIqkBM23W42BtpqM5+vbcw3MZ7S17QDNEkRZIyk2STHmlNZu9a89YVRxtj9XSdAMWHhr73jOOA00jf9EzDim6vQbInhj0InuIF78zizPuO0HeoV4IYi/wzsa5duxYBlstl8d5z7ty5XZG5evXqi44j7u3t/bYl/n3ve9/et37rtz783d/93Z/66q/+6uNz586VH/mRHzn/rne96/Kt37dYLO7Y3Pw3rNuMJfJEZzM29Tb1tu4hQw2VdXP3EDDoj8aIIs4Mcx22OYl68I4QjCLtyhy5YzZmNi9aYF64AaJDG1+7GV9P+C2oudx7F+vHI1C8fUjxUu20DGysfZgVaRGyL+a4or5m8dnGjavG06pQZ1vOURPDtbISewQheli98AJ7d99N11+hjDcprHbekFoLyWwFBrWvmkXouzHgLQN8Mdd7y/FroG7ifu5uq8O/SSAqBZ05R++0C/O1s1Q3yxsU6rzLOaww1iR4wTxSqZ0clWmLUxNU+wac1CaqyjfUmKteAuKlCtnt8UoCFwqb7cDewTkOzyx5+OGXsS1rXnLlPvbPnOXp559hdXzCtWvXLEXBB87stzz59JPcc+99vPax1xFi4Jd/7t9Tpi0Xzh5y9YUVIXq7b2KFPeNwKgRXiK615+DN6kzEWcyRau3z6+Ginok3q4Fr+hyr445uuSBE23yjd8RGaPb2Abjngfto7r7CyckKLQGP5+bRmqkUTrZXKZq4dv15bh6dcHR8wmq1Zm+5z4Vzl/nkE0+zt7fk5/6/P8f9D90PUmjdBCSOTlY4F3nh+ee4ee0ql+++gA+e5557Du86lss9jtw1nn46szx3L7k4+oM9+v2G7fNrQlhwsl4jztF03Q5Z8EHxXjk43CMER54KKSXGPCJOKTU8OOeRnI1YFHxDjIGcqquMFkpJBFdoggdxlMYhvkdkQsfEYtFQXAbNBq+6U/WrOktgSWVrtnxquYp93xvZRxu6fmHWbEWJweGSzQ1j1zKVyZLOPkPclgsXLmSAzWYTRIQbN27szHYvXbr0ondVr371q4e+7+W9733v4aOPPvrCrf/2cz/3c/tXrlwZ3/GOdzw7f+2JJ55oX+xr+MOybg/qrKx30cRs0KwzM1BnBZ4VtgIGO9Z0cEMsbZ5W6u/yYpuRFEtkEKc7b2tRLLXdBTRYkS2l0GIf2lyqEF4zgflk7/ESEQ9FRxNNz3BjJXSYl2Qt2N4ze1K5Oms8hRylskfneWYl1FRij1adnVPBJeHGteucv/BSnnnqQ8R4hM/NvM3a/+qZhbhSOZi7eVqV
KTjqpuWq4FstIgmpfpi1Q40VzixVguBwBCKiSg2r2An+w9ytOfNYFFchUcdvKcxOawfpd26stUJLvQelpsLn3XsBAO+M5FKZsvb8ChOJMQ1oEk7SDV7/+q/g817zWjbpBnmb+djHnyR2kUdf+yj3P/AgfbuAlLh5cpWf/d/+HS9/6KWcXzT84vt/kZ/5tz/Nous5unmzMjWN9GOaUOuYgwuWV+grnV4NhTBPWXuPFkKF4OygYM9DkZIZBmNchtiA95aO3i+55977AOj6ji603Pc5r+Tk+Jg+tkxjoqi5rUxpy3LRctd9gY9/4pN0N455yb0Pce3qEeIcJ8dHPHD2Jdx98aIdTnwA19C2kYcfvoyUgTKObLYbzl84z8WLF7h5Y0MRYX9/n2EY0BJxsXDX2XMsaSgh1iDm6jgk2fxEmxZlpGns8DVOE5oSbQwMo3W0pSSGYayHUHNMyiVRsgXXeueRnMwk3aAJigqp2JzVRTWDeBVCDGg2Xa13kVKjxWIbzZR7m+naCC7jo6NkJYSAukzRLUGjEaKK6W/VGZoQgydPIzvX69/n9fKXv3wEGMfR/ezP/uzeT//0Tx/O//boo4+OL/bjLZdL/bZv+7Znv//7v/++tm318ccfXz377LPxgx/84OKRRx4Znnnmmfbv/t2/e+7LvuzLNj/xEz9x5qd/+qfPvdjX8Idl3Z5Xp9gH1nk7TfoKxZkDf6gaqorx19idIrmKVqtLCpYw4B14LZAzgWhkl1okZO42vEecwSBSzZst5UGqfi0TnNmTMXcnougcb+T0ltOiZfrZfm2sz1OHEnYFxOj7oMxOJh4hWlZY7ZxA0ZBtw1Al+oayyaS9I648cB/PP7XBcuGiMQ1Vd3NEeygjf+yoPo6d9t2ZbgEtGeeKCcVrEoFd1RwIq4gP1cx7/r2zDVR9IDulMAvhBcEAXW8embAL+fQ1Akm0GliHOeioai+Zf9cpecWKqNsVT53HmKXOfF2h0YhME4990Wv5zx/8RXycOHv2Anffe4HPufAq9vZadJrQYcW/+sl/hms7zi16nvj1X2O9f5Gf/Mf/K0d5yzPP3EBEGCUTY29dnPd4byxbJxa8qqFGVs1zZXRm/OzkFlLkdG5U4Trn6uxJzTS9XyxIQ+HM/lkALl28QLjrLvpFz97eEknZhPbDgA+eveVFHIVhGvnc1z7KYrHkJ//5e7ly6SW8+rWPcvPGDdYnK7Yna2IIHF464Kye59yZSzVXUrn63PNcOHOOwzNnuHihJ03CdrNhs1lx8eJdPHU903YNl86eoYwjw5QRdcaWjA2CSUEs5R6WyyUpFfomolHYbrdMqbAdUjVnt8PbNJqMIkaD5lWN3YkknCayVns0Bw0espBUaGgRhTQJTYgWKTVmk35gxg9JtrTdEikbVCdKVsZR6EKLhkwbGsaUGUumbRZ4lCmPeB/o+p6ma6r5+e//+gt/4S/c+O7v/u4HUkrujW9846Pz17/0S7/0+MWe783rne985zMxRn37299+z3d+53c2ly5dSm9+85uvfud3fucLP/dzP/f8W9/61vunafKPP/740Xd913c9/Tf+xt+459NxHX/Q120VvuBradJaXHR26zCnDKexdg2TzYBcY3uOtxOlFBNTe+ewFibU4tmaxi+bZZELtbPKpYpYMwHwWlO5qaxLJhrfoOIoFEIA23UzvoZjzizEombhpQRUZmZIqV2oZaAZB2T2/twZe9nvJFXShm1SooKPnqAezQqYtu/Svfdy/sJDHD/zSfAmkzBPtnJKVNl1ZVSSDsyMF3VaO0sM8iwwi8P9rlSqHfF9BMlW4CgoFpzqXQ3lRVA/V1vTFEatko4K28YQLVV7prnOHaoLiNhkVKtFV6zfpnVuWHB15ulRL3atatZr+3t7ZBnxyXN44RDCiEsrXnbfw2TxXLv+LNkLz3zyKifXTvixH/8pLj94mYuHl/jAf/ggL334ZfyzH/2XHB2tGHJGpcGRCKqQ7SBRxNiC
obJRc7AThIj5sEqwjjAiEATxis760PpqODGBtjqtkoSM14KWQ7x6Gm9oUpHCOG1gcrRtY51VH+mW+4TQEEOgpInDvQOL7fGwXLRspy133X2eBx64gpfI//Zvf5aXv+wRzt51yKLtCMHmbNO2cP4VDzHlLU0T6fslq+MtlIZxC88++yyjP8eZ8wc4Jwy5kLwxTF0lNoXoQO1ri77F05AkkUtB8sSmTGTnCLFje7IhKNap7i2ZBpO8+GhSBylaSVRCk3OdE9snQYLDi0HuwTd26Klp8yH0OBft58lIWYM2lsQCjMNAbHpjs8pECA251LlqiIzTRGyAEGi6ntC05GlzO9vUi7bOnTsn7373uz/+Td/0TQ/PX7ty5cr07ne/+4lP12OGEHjHO97x7K2Q5rze9a53PfWud73rqVu/9n3f933Pz3/+wR/8wad/8Ad/8OlP17X9QVq3SW4xnZsP3uYn2YTfsyekfQBNfiAIaMZrg2LzBIqJtM28tkbJYPoynQ0ONSDZfA/VGXQT1XEa4qo1i45aXAuFxsyAXSYKhN0sx9c4Haomr7EC5KuLikr9Ple5LDYrMnPtGn/jMLhWza3C8nej6e6kkEo13nUNQTI3nr/OxbvuRZdb8vg8zpm8I3iHzP5pt5An5w5ZARMYW3CrXXbttNyA1xatnep8+CgidQM3g27RQpGMTVmMKVdcrt2luZpYh26dr/cWgmusWyE7ez1d7XgdpzCwoZ1af7YWYAcueiSrzROrBjIE0zQulnvkMXN8cszff/ffIzbKyx96GZ1f8OV/9It4+onfZFpv+bf/6mf5+JPPsNXEtf4629XIP/ux/zfTVChJkawEb4/b+hapxbWUUm3noPGe4CuM7s00PDhn4m8Fakdrhdtg7Pk9nbMQgqspEQbDrU9WPPYlX4DHDvaqmcOzhzz7/HN0TUPXNKy9Bx+IMdK1HScnx3hMK6cK991zP//x/f+Z+x+8h0t3nePw4JB77r+Xj3zsN2jOOPZ7m88Nw4SPPW3fcu3Gc5w9e4a9xUhKBRmU1itJFQktXfRM48Bqs6VUhmXsOzNvT5nQRpom0MZ2Z/clYikU4wCrtTCWgVwGSrFDY56M6BSiJyWzjQtEUlrbAS17HKMFeBSHhkL0ARkSGhXXWTJDnhQNQr8IBN8gJdfD05bYRqYx0/UtU64SHheZJiEAaTYokIwnkrPS9Iek1hE2pzmWv9/rG7/xG298zdd8zS+/973vPdjf3/+scm65s377ddsJ7DYTkR20RSVTmCGxCbiF1v7dqc1UPDTqqzDdRM8i9vMxOhyFohnvGssFK45clGaGLDn1+5wLRVV+I6UgLlCcErxt1+qs9Pm6fQO72Z3OxA7MGi2okUOKKEW1ygzUoLFaIIoWi1OteXUyI6jG20adkP1EcIGUR26sb/Cy13wev/mhDzCML+CdRSbt5kr1ZmqFIOcIADfDrdXRZmaEGqGmVJsuVzWQVe5R55hzfd8J37W61ziDl73O/VwlEHm/ez1d7ep2gnc5ZYyal0C9Fn/qk29fMe9T9XYYiTVJwwe/e3+MeaIJAU2e5OCDv/JhXvuK1/C//i//krPn9vnob3yCYXAMW+EXf+GD9F1ExZNSjQRSh/daCUxhx+9TUXwMFElmwRUNZlfv6ji1Hhp2M9tQIWBFySb6x4QtlqhVPUa9JxdHHrd84Zd9AXm4AUATGkqMPPjAg3RtS0mJtu2MiVwL8ZmzZ0CV1ckJTWy5554HuPviFX7pV/8Tj7z8jeQy8erXfA7rzZqcMs1By/XNNVQcy72eru+4fNfd4JScLOF8f3+fzabQ73k0LDjY63jh6k1Symy3E7k0ZrAeTGYkovSLlq5tKDKBwDiOTNNEmhTvOzQNlut3lAxKp5DYWoRQgDxkoossup40bJk02+eiVDG7ChqMuLJjzyrEtpKqghHITKsacEEZhg19H62zqy4yzjVIcfR7kb1mj/VmYx18aMhS6Bb71l3OqMVnaF26
dKm8+c1vvvkZvYg760VdtznjKxBsgzC7SdsgzStS8V6qzCFWR5diszAiSjQySIUXxUWk2IbjfTa/RzF2GS5aoxNmZiW7mCAjaBiDUOdZnICL1kaZOXRXtzjrdsAKgnkO1vZKHaUkrKiYuN6p7gqNSTWqf+XMlFMrvsXN26mrxsOKZ4S0IIbCcHKVmy+c4/KVl/Pxj5+gbgMqle1mczytNmUmV3DcuknPH/O5Joo6gjNJxazxA2em006tk9Qqhq8xTFLnmY44jz+Zmx+tBcu7spsfKs4iZlStROyIPhWart/nxFf/RNvsyIq4Yl78zmZuqkIuigwD4zBSvHl7jicTwXt+6QMfwgTxiaZdsknKsE0sm4aS7bXOeXZqYSfIB9ONRmpahmZDBoInZSHUPCfxxvr0Oh8aMOJE7WItI9DucamECudc3cwd+MDBXefYO7vHcx/8NQCef+4qNz75SbIoi8WCYb3l7NlDppw4OT6m6zoUSFOiaRpu3LxBmhLDkBinkU9+8gm0Eqvuve8yzz33LB//6Me5eOEiUuDS5QXOC2f390ELpRZ/J0q32KeXTNFYkycs2SCVAsxhvdC0kVLnl6v1itAo3kU7NKG0vZLWI4FAShbYbF67BU/BqQUONx7T7+VibNkYzK4v2mfU48hp3EHyXgDMJzd480l13mQNXiZyTiAZKcYMRdWIM64ltAEfYJg2xOgpWchJcK0n9h2Eqtu9s+6sF3HddseHn1l+gJqXpRS3E0ELYkSO2kmYuUeNHvDJCo5QWZVGxLfuzdcZonWNRU2Tpjrn1FkMDt6RndCIswKpajOZShgRqDM6quTdyoh39QNZi5o6Ifj2VGxeHUCMmTh3iQZN+kqHr7QW+/lZ51cZdREL39VsBJTVtaucvecBLl25jxvPrispxLoM89P0u0xDnTtYZ13lbOJs6KuDGlYrokZmnLu1uonMGkljrlJJLHVVtmXQ045Xq3H1HFE0i/VNTG/PVSRXqznFyezOahFIczeMmim50f9tLpq1kKXgY/2aOtt8k93nKVUCkwoheqYpm7NHUHIZdwek2e7Oe5PIqKoxGKX6n1aCTcnZinsNKaa6/xjgXQ8S1rbac6OaANQW2c8V0JvlFwTaEHn0ta/g6vVrxBpftBm2XL9+nabruXHjCMnCycmKto1M04jDsdlumVKmaav/phRSmbj2wlWef/Yq/aJjmkaGcWCYRp761LMcrUYODw6ZPvoxDg8XRJdp20jKgSElVicv0C8uUBbnaJYHbNYT6yGzncSQjpTBK95HUrI0kViFG0ZqynX+CM4XckmE0HKyOa5WdyA1c6+MGVXLZyw5VdvWQskTEJimrcmPipLTiIZIkz2eBo3Gso2xQ9WMvn0MBA0UNdg7TYkYGjODkIRiCR1Frbh1TWQcB3Iu4CKNCk1sdwfBO+vOerHW7c34fGCOtHHeQ/WInJd1EoBLlW3ZEIoni0MZUT8h9ECLCc+TwRhymvkmkir7bt7M5oJgGjx1FvjpNNgJWDNRa84fQKj6IanpCPX6zG/QMde1uu3ZKdwZTX82ej717vaWGVjmAjTPFqnFQRCSpUnQEGUymybXM04nXLv5LK945GXcfO5J0yiqbaIOarfmqq1YPSyc/mu9n3PXZg9srENzhvEh4GmMOFOhWTBZwpwwNHeG1afGNH3qUResS9wldDtckSpAt9Bfu55gBxi5ZTYG1XZG6sGihgKrEUDmErnsehbLPTarASmCt9A+RAo+tkzjhkykIxKzHW6SFnJqaJoG77OFrYptqFpZqc7l+voEdrwduyi7LslolnoA8DvZw9zB+l1WnN1jy8Sjwr6ONrZIVl76igeqfZf5EU95YBwmVpsB5yNarFvMudC2LdttwrtICDO0aMzE69ePuOvSXQzDwGq1JaVEiIFuf0l/cEh2zjw6vWezHsnDMZv1mqY/y1gG+sXAtSPH+cO72GzXDENgkoBvLMYoBI8nW7ZdUS6eO4/kTJKMC9g9
T8KwmVgNE1Iim9WEdy0p3yQ20RiyYjO4tmspMlh6QvF1Ri3kZGSVIiuTGoUGifa+9OIsTcVX2jXJ0hyKEMRQmVwga4NkONxrGMaBGFua/QU4O+CkmmCRy0jvO1wG5PTwemfdWS/Wuj0dnxSkzK4lAY/R9RtnmjXrj1qcJoOaBKJ4SjBoz+dICTCRCVKp0VTYEzEfxsp4lCxEH9DQkMW6SacTrkDjHBI7SlYal8ElI25oY91ItCBPX2yDPF1mBmwQYsIV3RUTMCjUoxR11tlgOj6zRMNcQeZCU2d0VEadcxkJ5vxBWlsxvHnC008+x9mLD3L0zCewPi5ZwXAWKxR2OoaadoDNDG1GaWVQtFAw1qTDIGCbr86RP8UaRqJZi81p8MZ7qRq9U2E7GMM2uspQFcGJqzlpM3Vl9lCldsV2OBAa0z762iViXNIpCwGlaT39wYLlYp/jm5s6Ey2MWoOanIXM+tDaLE4L6jxJrOvwwUhRUvWVUmUUc4k9jfutDNbQ7Q4DEHC+hxqDFJyrHqS1T1epLOP5zkJ2kJ2vWY4RLcrnvO5z+dIv/GJunhyx3DcBe/QdXd8RxHF45iwnR0ecOThDSgOlFHKauyAlpUJKmVy25Fzo+p6Ucw11rZKRrBz2+9y8eUzpRrYpEwPEELh05UFublf0rSeWRPEtvj8gHk9sx8xqmxBJ7PdQdA+dNuQp03YdoQkEBCcBis0eV8MJkwSGrVKy5R6WojTdHiojbQysh4JrPClt65zZV/H6CM4jZESrfAIhF1jEthKDbAIiOVAciC9oGUAyxfXEtmE7JnyIuKZhVCHu7VNU6PpAzh4Vsx+MTUHdiGzXDIEqiL+z7qwXd/nf/VtOl/nxma1X8A3Oi3VMO9G5EU7mLnBmH87th5odCTP1XlCKKFmoMTvzxmaMyixSY2VMKGsAm8GoiO46GVylXrsqDq+Zc6Lmlg+V2KHFzJmd4oOJ5XfwZY0HEues96ubZUHq1dbrr5uX1i5jftydHZqKBXlKwWfh5tF1XH8W19xFFvsIJ8S6G02UGiOkte3QSq5xu8dwu0Lk6n+ozcFEErPBt84dqAiz/ZnBzPXe127Z9vtaVmtsT223a1Wtf67QsfPzfG+uolXqUbu3YkM5nDeR98H+PiE23Di6aSLlXdGsxUeUXEwDZxT4OcTXirWKVDu52W3U7a5pNvWen7PU2KVSyu79ocxQqJ3pRO2ezDFEFkR8izOqqjnYFKWNEQI89qWfz8c//jGOj47mkwKI5dg55+jbjn7RM04DzkHTGCS8XC7oupbzFy5weHgGKULbNuzv75Fz4v7772O5XJBzZhonPMq4XTGNGwwpdKyHiSef/BQn159nfeM5nnvhOqHtEWBMhdV6vSPTSC4g5mtZiklTzADeuk/vaohy7WqncQtS6NvGYoeStVRTmmha89K1GduEyIDKVCOKDGZXCRa0myLult+di+JCIAbTjHo8fdynJJsLi0DbtIQQaKKlvhexz+wwDKSUdrPpnDNd36MJdMpEDTv96511Z71Y67YKXwiB4FscsRa3YvlozCdpt4PsVD0FR/ZGEgku4GJD8IHgMTcIX3+NzqxN01iBxwVP8YGyKyymKSsiFociQoPtW1ktr86r7AqQaf2cJadDnYEVQhTTALo6Z6eWWYeRCtSZKW81vMY5xDlzg3H1tF6kwpXeZkRuzhXUqlkE25i3DNMRk2bueehlaDBNmKI00RGDw2NBrt6bqNwRKtmldpW18M15etbd2EunWCcagoWNhuDnp2q/w7zIED3dv21jUkIlf0QXaaoODVflANV83Lm5iHibM3rzUvXO5Bk++OoHWgjRXrNpKtw8OmG5v8/+/oFJBPSU4ev8KXEoAIgldehc9Cq7Vrw7nSkye67O3qn/RZGfr2d+HWpAqneBOYFiDgnevZdmBXtNeNhfLgDhy7/i9Xz04x/jqU88wdnDM1y6ZIG1XdcTQkMgcP36dY5u3uT6zeuE6NlbLrj3
3iuoE/pFz2azYb3ZmvAjOFKeuHDhHKv1MSF6Dg8PaBpjXi4WLcO4JpWJ1WpFEWXvYMkD916gj8r1E+Huex6m5MIqF3xs6Bc9PkSasMBJoRRlb2+fs2cPcAi5JMZxMhKyKrkI41RoYkRSwanjYP8Q5yw2KWuqM+VomYZecW5CdazYvlSJknWygYY2BqSUHbydSkbKaLmWqlA8bVhWspkdGKcxWSSV83RNX7V+EGPLdrNl2g5oEcYh4ZueJsK4zUy3Gk3cWXfWi7BuM52hFhGtm1dF34soXg3ysAgiV5n/jkzBSxWw1w9AmM/b1QPTYTpspxCi37lKFEwo62s+2o4FieBE6lyGOmcQ8DbPEmdFwWjQddJVOykpGXG1sGab+ZhBc6oSC8VHVwkQWt1LDN4leHY5fjq7qTiD1ubbQp3dqVLcCNqyPv4UZy979s5ehJPnCS6YJCQXy0GrKfOzA4qfuxNXe7zZn3Tumufn48Sy/ZjdU25hv1VhvFSmqjrMv1TtAGDXWwv33MFVy7NQH8v707miyEwestgaawqDUeC9MklmOxRK8cT9nnMXzjMejbuuMUtB/EyeqS9Lmd9Tvt62qimkJlHIXPx1R1CZPVdPIU9brob1aiXz2Puqdpm7b61kogr/gnVrbdMwTSOX7r+Xl7/qEW5ce57jF67xH9//fm7cPOb1wLUXXuBG2yHFIOQxD7Rdx8c/8Un22s60qAEW+z3TKFy9+gIuCMu9A1JOrNbHHB4cME1bnO9IKSE50QZPThkRR9v1iBam6YTnbw58/JOf5Oz9X0qSHskjw2QsXKkmC1oUV2wW2jTBcg+NDm0p5h5KUcy0oWG7BSc1Ry84Qog411N0YMoZ71piEEpZo8UYnd4bEWyaxnm4bQSuGM3eTgXfUEloLVkywUESS1IPobP9oZKGtBihpmTBE0nTRNM2dG1HHgdiiEya0OhpmCiTUD5DlmV31h/cdVsd37zczn+zsV1F6hyoDqddLQwWW+TAmfbPHBrmtGxXCTCO4ARXkxhQj4juRMpGeDQihcGbjsabhRiopW37aHSHat/l5w1UbjkpamV2+sqErDoiNMIcbzRT/ufHUzUYj5nYUuNZakcmYjpCZhsznTuTykoMAc0R0prjo6c5vHA3YMYj6gOEWfZRuxk3d30zMGib/Q4yduyK7SzeFik72zFXSSDz//PX5pfZar9U4k+9TzLPx6xj994OBWD6LJFKsHH2bz5YQbR77BBx5DxRJCMoTdty9sw5zp4/x7Xr1+2aKtRo11p2UOVMWJpfZ3uOys7ereb9+UpQ4ZZiNz+/U5jXdH9mJC51rjf/hx2KgnWrMzwNkHJiu95y17338MY/+VVcv3mdg77n4YcepI3RqPhgBs+TQcspzaJ2e7yjoxOObhwzTYkbN25w9dpVci4s9/YRlM1mw+pkxZNPPclmu2G9PjFm5zBwcrQiJWG1Grh54yY3bzzPyeoaT19/nhubxH0Pv4aTVWaz2lY0xJGSkUGoTFbnHP1igaiQsxKjdZLOw5QmttuR7ZARackSzVhALAdPJSIEYmyrnNTGA5KpMOVIygnvYzU3DzSxRcRGBTEGcAXfAK7F+UAhUxgoOpJLMUMDNWSlazu8g5ILTYiEEEjF7mnbxPoZssNxWd8kZPm/tkn9AVsf+chHWufc5//CL/zC4vfye77oi77oFd/0Td/0khfruj5b1207t4AznVQpFFUaNQOr7EzPZvR6486VWgjVaGBmvDwXIwUnMwu0YKVQrPXDURR8sZNkqIJs22whiTEQjfVpZcKLt43ae4jOzKpFKTJzJSM7Q208WsATEHFIyDiKFUGnxGI5A+LngjdvuNm+lk264FyCOncTZ5Ci01ognBXGedM/OdmyccbqnKOTomtqcKiANtU6VCnqca4Y0aN2RKJzEZ+fj3XRNkOr98+ZgHu2KK1Nbj002PxVcdVYGqhFQOsc0Kui1TnGe09Rta7CAV7JYsSZ4gWPabKKAMEj
RWm7yGKv4fLFM9x47ionN4+qpVwhBrfr+HEgVT5S1EJszTS7ofhKTFLzD8U58i2pH66C0zgoLuGp7FY1WNORoRgsTnC4LDRNJKur4ut6eKi3c5wSj3zpY/wPX/lGjp9/gcO+Q6Pj5lQT06vcYTuOVoC94+67r1A08cLzVwlNj/c9jUIqidV2w3q95uLFi5SpkFUq+asn5ZGiI1o2dP2SUmC5f4aTk2OWuRCahtX6hOhPeOHkJhfufw2x2ScfbblxtGFMdr9iuw++JaUVBGVvb4+ua2iiYxrUGM8l1S4e+kXDME6ELpCSsF0dm0a19Wj2SLL3tgOG8cgkBtLSt1umbM48lsJhM0QXHFIckq1jRkyy4n0PJaGSmNJI1zhiFxjGNX27j6q3pAhnh66IMOEoKNF7kzjohDgjaQ3DirVsq07wznox1r/8l//yo23b6u/+nX+w1+0VPrHODR8qtKR1jOQq7OVqqoAZUvvaBTnvqgyidnbOILhSxGZ4AYKmW6zEzCEk+Erxdw5x5nwyZ4ZZbJCxzOZiqrAjTThmsWAtWt5XXZrplERL9edUK6raEG65tqST6aMiUAqSaufhPaFmkZ2mo/tbTqXmYeiDY6GOrBPOQZ4S25MTu4+uRWXc2ZI5rR4z86XuyCQGXzp/S7mbU+Kdh1K7T48lWwCNC7W7snKIdwQRdgwkZ8zR06nf3GmZlKGRuaie5vyZSN6+34y5Z0KQPbwIoBHnWkKIDNsNT/7mJ+lC3BFXBHbau7l3m4k8Wpy5v1TNntNTopNzDhd8JbXUbtpVq1fnqaCtAeGumKrBxUpiEdqmOe2dBZrgkeARC9bmC7789XRv/HJuHl/n/IUzpPWaPBba0BKbhq61A3bT92RNDOuR/HyNgHKOLJORP5zHNeYJ60ODFOugDw4OSUNmGoVF35HyZKkiNRl+O66JnSWVC+DbzDZtef7ayJe84Y+wPh7QoVA2QvHGiE7jSIww5QQiHLTQd/bZCVHpe1PyDaOSRs84TOQpIWSa6FgNJ8QIzieasEC8UiSj4urc1LS0Wmxe6pBKPFLzj61jheBjhdIjWRwN1Y0p9Djx1iUWS23PJdE2HaUkshgRSDxMKRG6jrZpkVRAA10bCQ6u33gBRzo1FP8MrFIKH/rQh7oPfOADy3Ec3dd93dcdXbx48bN26Hj58uXf8dqHYXB93/+BL4y3hSI0Ao0ooShBjGafRczkmFuYe3oKW4k3OyyLxbHiNQNQ4jzZQXIFcYlSpqrVosbqzOy9UqG0XCORZPd4FcMkO2dpBdWZn3nzrht2KQUjwRl5xs87ocsmpRBHVtkRR4w8oqhmtGQ85ospuRgTEhPXlxnelNoRS0Ylgyhakn1wsaI/TSsAHMaGK9JQtEE1nCa/1/w/W/UxisXDiJSd1daOuWl/qVE79ucZztNKHpq7Q+fifzUbmyFDUanBrfZ1ycXs4Gr8kRT7f4YVpeiOLaniaRojfzRNw1NPPolmMRH0LaBtrVm7a8bNHqLUwmaHppngBDPL1bq0ueMyMNOYhM6bdVbRQiaTNDM52aky7R1XEyYEgmuQopw7dwaA+x++zGp7Qt8GhmmF7wJ7yz2ij8ZwrMkAPkZccCz3F0xp4tlnnuf5565ydHJEQZhy4sbxEakUmrZlmiZUlXEYTwu8CMvFkr5foOpZbba4JnB49gCcMJUV6+kFrt68Tr93FwdnrlCSkKdMKYEhCakI6/WG45tHbNYbfGxonIMiDMNgLFkRI/sQcDSU7Bi3CUnKsJrYPzhH0/R0LqBpxInFa5U8IiUTvBKDmL6zuhqJVC2oc7ecmew5zablUxpOD0yhodTPUYzmLlMko6h5qObMKJnQRCLV4CI4xAXaJkLasj25CSK798Dv90opcc8997z2cz/3c1/9Td/0TQ9/67d+60N/+2//7QufzscspfC93/u9l++///5Xt2372JUrV17z1re+9e7533/jN36j++Iv/uJHFovF617xile88n3v
e9/e/G/PPvts+JN/8k8+dNddd712sVi87pFHHnnlu9/97vO3/v7/Euq89957X/NX/spfufKn//SffnB/f/91X//1X//Ap/P5/feybo/c4uQWuycH4pGgdf5jJs5ODc9X53DRSBG4UjuXQEHJauE4eF+LRyE6X5MErCh5Z5tbcLEWqTpfQ4DIrHKzTdRR3MzgLLdQ+Gs3CqifB3ihitgzzmVcLKAN4jyQ67zL40JjeXzOHChm+Z4PwU6/ZNRr1ceZK4z3DnW5FgytnbDBv+IKSSzCa/ZlkVlmWBRT696i3ptnVMque67t764z3M3srGlFsRmWzeLs9ahDUn6LnvEWcgfUTkzsgDI5iKLEtPPUYTZTm2ePVixPDdC863Cu0PXKanVU89Yi4ucDil23v6Xbo3aNWmntPoSatWdi9lLmimfFec4PNLBadhAkODM8zwUnIMHuU4zBAnxzAgQfWsSD9B1f+IWfxxvvvgi/8VFUBroQaB2kMlFUIeeKLKgJuTHCTxt7VDOhDXRtx3LZ0vaBJi6RLGSvtH1LxLM+PmG9XtHmDh8iwXXkKduc0HmmlCAGDg72aSikYQS3Zr3ecPOG8Ef/+OOsT7Y4J6zzlkEdTbfAV3OBcZzMUHr/gHEcyduMbxq8gyZ6pikxpELxBRecsbGTMKxH0rKlqBXLkuw1MO3dZOiJMw9Prw0lTyY1os5ea8J70oTL0MRoiE6MeAKxbWsen0lPpHhi06Dq7RDhhb5f4LA0Da/gcoYg+OBxNKxPTlhqZtxsUPW3kJN+f9cwDP7555//LY7UKaVP69V8x3d8x70/+qM/eultb3vbk294wxtWTz31VPOhD32on//9+7//++/963/9rz/1yle+cnjrW99675vf/OaHP/GJT/xK0zRst1v/ute9bvM93/M9z549e7b8xE/8xNlv//Zvf+iRRx4ZHn/88d824uJd73rX5e/6ru965m1ve9szn87n9t/Tuq3Cl4MnebUCiLfNqgZHGvRVMMeVKjzmNFzVHs3VYz2Vn2cOJt55nATTJs3kh1CJLpW+bxlxVdtWWZNmHWbzQw2O5JVQhFMZ9ulyM0miGvlCqB1fNWdW6+mop2crKsbi1Np57Fib9d983ftFjNk6FwdXBYIOt4NdAxCkOreoeUkSEuoyuJqDtvt5296Zi0J1zDm1U5v/XK+nFhQL8Qy7TcrXbtRuWbXpcuB9ZKebw6BL782DtFSmbVsddXZZfPMhA6hl2CBY72hCR2gT69UxhoJF6/Sjr/E21c90hy+Y9MBkFgJVo2gklPl7rPuc3XrsAY204qjvmZnRWubu0ZkoX8VeOrHuSwTwnkdf+0q+7E88TgnK+qMfA6AJS8pks9sYI1kgpwKuUHU3u9tcanxOkULbRvb29sliwa3Bw7Jf4oLNqxZ7i10XmyWB86Q0ERoz4e6XezRdCyqsT1Y0obA6foHVzcKyfwkve9nn8cILR4gogxY0NjX+yvxAVQpta7M755TYBMiJRRcRSaxWE9tpYDsOlGxhvHmaWPQ9kxjKkYaBJvbEpqPkAdWED1jQcx5ppEWKGLklGjnI7XR70Q5YMe5m4M47kiS885Rqc9c1Hc7BNA7E6On6JXMAshHbQHMCZwaBwTdoiaiODOsNzreoflqi737XdXBwIO95z3t+s5TCN3/zNz/86X68Gzdu+Pe85z2X3/72tz/xlre85RrAq171qvErv/IrVx/5yEdagLe85S3P/bk/9+eOAN72trc9/QVf8AWv+tVf/dX+da973fDQQw+lv/bX/tpz8+975Stf+fz73ve+w3/yT/7J+d+p8H3Jl3zJyfd///c/99v9+x/EdVuFr1TJuEl1PMUbfdqpmFuUONRniEabD8UoK8ZpCGhRvGbElV2B8a4QNEDuIBRaX6oMwpFF6Hwk+tp51O7HUtkVgqAZcI7SOFQLfS4Wc4Kzmd68kYrp1swOTVD1oC0Uy7PzqggtOKVjMFjUBTyCuIy63mZKOtoH
Vht8MZDHI+DNNk1UiVVyoZKMCWdpPUSZnVBAQ8BJZjadFtehdZ7hZBZv10DeeW4q83yzGoVrYbYzmyuGEXzcrjtUcWTvcFpoZnlA1xF8Q0oTORccajo8BZ8Lznum4BCaajqs9bCRoZgfJF6hMXWHyASDIMXjowexlO6MhxBqVJRBwRkBtTBjqrZPnaNU15oqqDTTcrWOOroIaI2p8uBaCy6tIm47x3ii97gcIAQmrzRtpG0aQhP48i/7Izz44P3kaUvJ06lkJCe8COIDaT2iYkG9qMGbTWcH/uDh+PgmZ8+eRyRzcFeH4tEhAIVURiNiaUBjQ9P1SPZ4Ag3W6Yfg64ExEtsW/IbgenLsic2a5576FEfrjjf8j1/NtdWaYbPi2jqTpSEibNaDHVrwNMsOSGgbkADDaMSwZbvg+HjDNBjrc1xtCNrb2yEqGkFToe86Ri8UMYJUGzxCIumIqiXaKxmJYiC/dxRJNssP5jjkXCQJtG0LBGL04ApBHLgWbQXNA9PU0C06vFfymAltS2gcOphrTJKJcQS/MOceHTzbPOFLqr/vdnapF3d94zd+443j42P/zd/8zZ/2x/rlX/7lfpom91Vf9VXHv933PPbYY7sCdv/99yeAZ599NoKJ/7/ne77nyr/4F//i3HPPPdemlNw0TW6xWPyOd/Cxxx5bv1jP4bNl3WY6g2XvhWizp3l4MxMvXM2Emwf/Fp8zi5ANIjuVExgkNcsG1BmkMpsgZx1R11DmQTvzXMe6DRWzzRKCMSmTEL1BnllKJXPc8np7RWQySFPnAmLpDDPl38JZzR7sVN+GdZ7Fug0wn9I54cBY9q6yIW0OpU5Jkiqhx+zCnHeIzDR4qcbSs6nyro88hTONDWL3HZhp6/YNYTcUc95XWr09Vx9CdT6Zf5+viRS1mDjIZaJRS1hwQWpXZb8vxkreqTKHXZfpqvdpjNbpVSamvRYmezCChtQ5kP1/K6wpdmcrFYUdGWXXwTmPsUJmYk+dGssExIoaCzgzuDZxtCcGT9bCmDPRmy2Wa6BrW+6+9wqveuxVXDh/SM4bcC3mRGO3Z0qJkjIOJTYNpWC6uGjQdRqtaN28eYRe7ig5E7wxIUPjCE7JKdE0HS4Gxu3WkAuElEacelKe6Drz/CxixXvarIhR8G7kMHqOTjash0h2Cy5evsK1G0d4VWLTMZ6s0JQhRlIpqHeknDnY7+mWS/K4xnthub9HbAPDVJjywMl6wzQpUjaM42Dv+Gmos1lX3VIc3k2kYcW4GcA7pjzR+R5RZeOFEkz43hAIzghCiO58a70W6wadp6iYGXXTWJeZEk1rnrhdd8ioE55MnpS268iqqHfkUmjEjLKDmvl8TgmYw6D/4K+9vb3fdZrZNM3ue2YCXCnFAXzf933f3X/v7/29u37gB37gyde97nXbg4MDectb3vKSaZp+xxu4t7f3h443e3vpDBiTy/lSOST1x3cEBK2Uc9ugBarThxU+J3U4ToMlCRi0JyXgXMZUeIE5My6EgJ/rDRVqLCCu4LwQnMEsqoorQsiQYjBKf20Rd4ggWiOEKiHeBZM6zJgdzq7YFSRkojOifBZX6fJ1I543aGyOOaOf3pmsIEvCt4H+YEkZJjQZ2cXGbadzq3lbrzfu9MvzV93p/zuI0c3QY70XLtZrsS7WzV1eZeA5NYjLOyCY04av2krvrTg66uszR8z4gKdmEFY9nxU6g4FtDitmBJDq9M85fJW6qFQ4ctdqe7u9tRIHFyhEROs1h7m42/V7PYXGTT/pa+zVHKJbKqHCI+KIsUUjXL58L1/4pV8GMdAtIsLAXows9w/YlIwSKAIiCXFiszwgxEiIFtNTdELU0TQ9yUGehh2ZSJxjsehp6sxRC6Q82RxVPWnKkBXnGqYp4aPZczUx0pRI27bknBnHkRAtmUNcw1QK+73n1/6PD1FK5GWPvJqDfp+1TxQKKY20Xc82neCbyHYaKDjapjHkVywKLMRI03g22y3Hqy1a
tqSZ0DU5kIiGidh6IJJzpmBdqORMKiPiIKtSnNkFBlG6GJkqc9jV94uI4JrIMCVi0zOVQuMjmgvqIDYt3jnGcUPrlRAGoGGzXdvrWEwPKMUzThlthNj3VT9aiE3P9vqIqElobmHT/IFer371q4e+7+W9733v4aOPPvrC7f78f/gP/2H/TW96081v+7Zvuw5GlPn4xz/ev+xlL9u++Ff72b1uT86AQzVQUqkuKVj3VGc1lgdncxEToVatlzsVVJdipA9fNXXOuxrxUyBYJzl7emqutPE6x7JSWRMDtJiDhCrFOdQLmuwDTrD4Ic2nHxoV0x5pFW2LM92e2kVjtlhCEE8Ovvowm0OG1wCVVIDuXB53naIgFFdomsi5w3Ocu3SevcM9PvKff42UM00l8cwFLKghhSrGHJ3zBetN3rneOK3enDKnCswsx0o50aaSQGaWKeaSU/kxYPCiLyC+HjzibIFmRd95e/1Ea+dYDwIWeluHmN7tGKVSsAKkjlAJM+Ixgglm4+aQyvatP69YbBVWIJ2I3X/M5k2dsS+dtblGglFnqQE7f9dQoc46X3YLfNPiu8jhpUO++mu/mqJKdgquULYj6/UKHxuzGgue0PWImiB7LsQzEYs609xf7JOzkHKha4JpSIF+saTrOpyYa9Ci69lmRbKvHaOjX3aGdNQZbqydkZsLo1NitNilECJFIu0iMOUjVuPEyRT4gi9/A9thw94iMtT75hH295cUgc3oib4mWDiQKTGMicWiYZxGNusRzYHjVSFNGa+eNE5oEsRnmqY1g4YCOU/4RiEpaVvTFJBZ4WOK1wxRCiXbZ37QjG8iriSqMSCuGtZLKRQHbdOYfVyyezuNdY7rHU1n4vUuBiQlglOm6hcrNQVlVt7gqUjEZ6164LbWcrnUb/u2b3v2+7//++9r21Yff/zx1bPPPhs/+MEPLn4n+HNeDz/88PCv/tW/OvczP/MzexcuXCjvfOc7L1+7di2+7GUv+/24/M+qdVtyBmPWaU0xcCgZNOPcLCh31oVwy87LadHbuX/UbD1j6Zk/pQuB6Mxz0Vz/G4JGfO1qPODUYE284ojVa7JUMCSirgFtcRKNXbrL97NN16klhIcQ6gcrWMRM1cZZAGuPKx1Ih9OeUGd7PnhCiOxCY53pnYRC0zZcvnKZR1/1Ch586H4Wyx5F2dvfNxYodhiYjZKtbwsVaoqYHQo1qNaZT2U9QOx6PDf7ZoJ1eFqF8vZ/8IEQzELM7mrYdb1OlCCV3i4BVxyUShoS0KKUnClFaofld3ZvIgXJ6RatpFQ3G4+qGQBkMZGKzeNstiTeSDnB2czNicOrDWudg6bayxl6LDtPz4KivhJXagHEB8R5fOggNoQ+slzu4WLkoZe/lK/+2q9hGI8p6Rgk0RBYhJ6zhxfREpFJoBTGtCEXI1LMmZKiSiqpHoiobjgZlxIkIW8G6jey2WzYDBvatmWvjRx0HeAIzRylhDka1QR6c0Yxf9kpGQmmjQ0Ui49qQsd2vWWzGTk5Sdzz0Cs5f/k+YnQ4nysw7HEqHCz30FRouwWhet62Tct2HEgiiPoaXpuIDvLUgrRILizaQBOEZVwQCpCEUJQOh44jMk600lCmiahSTSm0juEzHpPzxCbivYJmJE+0jRlhew1oMoJUjNH8e1XwGhBp0NLSdwe0XU/Omdg4UtmSZQJXaL3gczF0Jxd0tMBiUZsZ7pCSz8D64i/+4kfOnDnzuvnvP/ADP3BfCOH37KDy2613vvOdz3zLt3zLs29/+9vv+bzP+7xXfcM3fMPDzz///H9Tg/IDP/ADz7zqVa/a/Kk/9aceedOb3vSKy5cvpze96U03Px3X+dm+bhPqzPaG9qcuLbjZJss6N1cJGbcWu1v/PruEnOrVPOIE0UKs3o3FC04KzgXTCLpcy6g5mogHXyKWID7htCU7b0nlGojZyB/Z+Z3wei4aWiykVZyxUG0mZuQL7wKS7UQtzrR93jtE6yzQSuPu2hd9
w2Kxz97+Ht1ewElBEjQhUJKyt1yy8keUJLvwWGC3IVrAuN/5UqrIjlnp6+01z1BuYXVSZSBaoafaxWKFyogw9RCCdVO+tpHGZlVi8DXB3FneYIUm1TmKmJ+p02wHDWr0EQ71fjZNsYOPmqG0rxZgoc71tB6Lojle2/ORAMGjwcg2lELXNEwpE9vW7qqK8VtUEQEX5gOW0jQNKRf6fkmWLb7LfMEXfhEPP/IgyIDHMgpTcdw4OmK/dah4sjq6vsEHT54SJSeapq/JBNDGQBusWMe2ZRxHe5+osJoSB3V2moZk1P8QOF4XNicrS+Fomyqzccho780YI2mcILbklJmKELx5cs4C/rFk0C3OK/3+edp4nv/7n/lTXLv2Ak4z66ORZ68OnKxHfPSsbo6cHK3QrmHv3KHNs7OF/nZ7CxyOzXpLycVYkpLNGal4pjQyTgNdawL12ESGYcA7C5XKOqEy4Zxn0lzlJJkgSnGTMX2JuJRoo0C2KCpETCaEOf4EOzdR0pbgGpp2yThtCR6m6YS27/F4gou2b4SmnglH0/dW4/KgjqZ4QgqVT3yb29SLuD760Y/+VwVORHj66acb4EWHEEMIvOMd73j2He94x7P/5b+p6gdu/fvFixfLrV+7fPlyed/73vex3+n3v//97//IrX//1Kc+9Su/12v+bFy3B3XeSpF0Nq+BW0TRfiaKwA5i1NNODyoJwyigp2QXVwje7RiOOGN/oooLDcEb47CIabi8tARxBF8IzuYvky/ECE0Bs8aOUAJO4+5qyi2elMFVZqI36DOpiZ4bZ56SvkY4lJJrx1dhxjrc75rIwV5H20ZCKORpJGmmjR14+2AvF3u75y/U4sdp5zxzMZU5Xsnhdn+eR39V4M3sQ+mYqf++EoTEXhzbRGRmyugtBxSj9jtHhTDNCxWxGZpQtYAoWnK9ruqXWmHNHRqop928846gnuykdjbVN9MZlNqFiIivpt+m5fTeZr8h2v3tos2ICOaaE3TElYyPbTUlsEcracSHSL/Y4zWv/QI+/0u+kOs3V2QZmMZk99abVOXwzCELL6xOtrRti/eOIpmmDQTfIOKJoasfgEAfe0QxFnHfsd1u0QwbErSWqLFZbbh+7Yhu2XOimegdi0Uw7ZsadOm8t6zKIuiYWG8GhpIhRPYWS0uLdw4tghIRGdGYubl2vOFr/xyTLjm+cZ31+oTVUebmkcN3HZIyqjaXjM46el9dkhaLBb6J6JRxAsG1DNsNeRxpm4btsCV6I7KUUgjBnH1mNxxzZRnRUGhcT9bBDB20xoYpNtf1zhi5RdCiNHEBGolEXNW2QsYXY4k2BwumooQmUNJEKZlSMs5F2rahbVvG0SKpcjGJlM/VhFuV1jl8LjSfSUon8BM/8RO/8au/+qv9rV9bLBb6NV/zNSefqWu6s37v6//CUcoG0wZRWSGRutnOQnNbpx3frUiFd64aPRsbUJ3HR2MpKs48P1GCFLwW6tTBfCQrjdRmX+YJKJhQ2RilJlae41hmBxO7HCu4c5F2Yhv07P+oMaBqlP3ZE9SYft7MtVWZzY37PtJGZ92nKJqE2DoQMXq8F2LTog3EYB2eadbmwZtBYlbNFK9lx5ALbmajmsjfV4KHafrmOV8xgbGftX9VYFxhYHu6VeheC9psQG0PaffRimAwWLUyMC0JAnAB8b66gHq8t/vn1U7hRSE4JQRn9mHMBAir2C5ESlF8C+fuOs/xtTXD0QkeIbSR2DT0sePm0Qm+7xCU6CNIITYB17aEvmfv4ICzZ87y6KOPcnB4wMVLF9msVxytTlit13S9IzpPmswmzmHzyNVmQ4zNTh4hxZxpxm1CxLojgOvXj3kim89mv+xouo5hmCB5XG+elwBnz53lrrsvmr1eu7DXPU/E6Gmblr2m58bJMdsp1TlgZj1siX1H00ZySXY/mg7nDeaOITBIZlR48IGHuH7jiHE9ce3GimFsSNqw8KZ13Ww2xCbQt60dllyN3So2o+3b
Di1KmjJSIlFhWK9wCNttom87RMQ8PJ110+CQnHFimZrROXK2w1hJiezNCKBiBzjnSaVg8UURJ94IxmZ8i9aOue87iiYjyDhH09jhFRypCEUTadwQQ0vOIxo6FGe+n0VRX8AXmtbboYjPXPF7/etfv3n961//22rg7qzPznWbcgZQgkkIHAQVys47ktq+/Nd4/ExQZP7nKlpHbdBt8zW1JPYaSGl5avb7syhRjb0GERcKqmNlfLWoDzVfziPOZgPWOSbcLsvLGJve2wzSKyS1yJ+ZeCEONMKYEtG11X0i473ZdIXgabqIC9lSw+vJ3auHrMQ620CNAOE87O933LxxYtZaVYAsamGrFjWkdor27Eg9Vozrh72G186ieGsMgx0EqiuMQ6sTi5IptSf0Fn/kZlqsEURMzA3O5VokMN2fga7MEkB1FdoUI8A47xCdrPt1Vmjn6Cat3aFTV0ODa1H3woWX3M0f/x//NPlYee6JZ/nNj/0ag2Qe+8LP454Ll3j/+z/As1evcv2FG4yrLbHbY9EvicueSw/ex0sfeYTWR0qZyD7xyac/gU2JjdE7TQWaQN/sMW4miA39wZKT1QnTNOGivcWLFJPXQCU52WsxqbA4OOTcoiWVkVKEwzNniK4lNo5z26P6SZmlKoUiE0WxNIEsrMctOWTaxR4xCOM0on3Hmb0F4zgSvUUBiSibcURETGPoF4g4Lt97BW1he7xFkmM7wYQjtJE8TZTtlnG7JSwXsDDCEs5y9prYWYq8KN5HfHQUHVl0LdOwoaTEol/WWCljovpoU+AiBU0G4SZVkC2NVyQrrfcMXnAhoONE9E0lmwRwLQVvxJ9KQFHvKKMnNgu6RcewPiZmJdMQ28Y+994TYqxaXIPb2zYyloYkBdVcczgT2Qea5SHbk+3/2ZZyZ91Zv6d1ex3fbl5nMyrvFVeTFYzJeWoXZt1FZU8AEBCzKzG4U2yTDVXzRLHZljoh+oDSkGXESTbphLMPnaoZ8Frz41Ff53g1IVppjHlZrNDtIFfvjTmodcM3tLUaQAdC1UpkLZYrWDJaw029msO9j1ZaTDeneOfJauw9dQWJ1m0aK3AidB1N1xi0VDxRZ6jY6PB2fyqsWTteB0ZKqYVvTnzAzSniddPS+o218PlKRkiuQsu1M6x8W7Myq2QZKBSnBKnepD5aw1YhzbkjDVKLLB5HqtpMjwvGKlWp2sroKsvXCDgRhwv25+effp7p5hHe99z30pdw78su4xvPerViksLnfv7n8ro2cHx0wuHeIc88d5Pee86c2yceLtgMW/P9TC17fc+y7St816PevFSnqTBMx6RcGAVWw4Yzi4bD/fPEzoyzj4+PGceRUgpTnpCalLG/PKTve6aS0AJdbHBADIE0bpHBvs87y68by1Sdb2DYbgjF03VLfPCkPJKLQaYilvixv7+k1YZJCtM01fkYgDNiisKFi1c43g6MBa5t1oR2j1ggEAgS0W2hb311fRXGYUsaM8u+IYQOxFPyxDRNbDcTTd8wrdU0t00kBs+YS00MyeQc8NogKSNpssOLSJ0zg6ZMDA4NgSyFiNI4LIvQByTUoN8aF6Va0OwIsaFpPdtNIpRISQPNckkuQgiBLBnvIzE0BLdgezIQDxaIGhN0HDf0bWezy9JAXOJ1sUMx7qw768Vat1X4jD0pO/G5+rzbMKmECgcUTUaMcLEO/m0epMn6Chd0HhiZN22MhCpsLk5BEiLByI7VdFd93DEkVUeKGBnFuwpFYk4SFp1kMJAJouerD9X6y8yaQWlS2en6VATvzAWUqg90WvChEH0gBGNy2njT4B9RNRaqQvCRnM2Y2gfboMqU6PqFFRw1KBSMODGzCqESXdDdoWIOO2U376v/U+UeUjV4njrtqzE8qlUIrlU8X+FT74khWqGfXw91+GCi8MLcqVZhhKo91yLgLK9t9hK1x0/2u0uokLVBpwpQLBEc75iKzaX+f//xP/HH3vg4R+stIdrGugjRWIRt
RMrE3l4P3nHXpbMsY2DMW0pa0Xil3bdNctwO9IuW/b0l6hrWw4qmMW/X7TgSY+DgzCHNokOngXFKjLkQo68m3EpsO/aWHZeqSbVzEGooa7+3wBeLvaKxCJ+hAgZTSuRSXYWKEGOgaRpKMTPvk9UxLkJY9DWOKtPEBkrg+OQIaQLDMNLiaWJrUGHbIKGlbc/Adg2uoV0u0OSIxRABmQqjJkIX6PoI0QhaqWRcbBlyMqs9ClPKlFLsNfKRpu9xUqoFmzFWu7ZlnQQtQhMCE37nrUnwpJyYpoGma2naDs0TIpA1VzJVY59zV1m7IUKBlAquLbh6oCy+ISy9uf3Uw7CZUycyZnm26JYm6SmJdtnhfWuuOSXQNwvaRY/3Yp+HO+vOehHX7XV8KibYrcwLrew3J3OX5ypJxTbqIg510cyeRSo93SJQLAE61O7RBLM+VN0XBR8LztkHVufHU0F1MAswApZCXekelQAjNYcsxLZ2OXbpdkmzQB7QQqzzr4zudHOxOJKPiA80DLQBYotdSy1YnkAI5tZi6QkT5OpZ6GdIzApfE7rTWeitmE11U7E/3pKiroCGUw3d3O3KLAtxuyJJnXrW/ssy/upzkWAd2C5s1llUkQDB1ww7caCFULtFlF0XV7SAFAKpzhapsVHFumF1eDXWpasFXdDarZpeM/rAou34zY98lD/+pseJmnASKaJ0rsEvHaE1Z5k0CSqO/b0OnUb6g31KhNXqhGmzwrtAv7dANVvETWzZa1q8V4Y0oRT29w/o9yK+i3TL8wzrAbwy5YFUCou9AxQhajFaPyBOGGVEnSMl0CTW1WgiNi15frmCZygZF02S0Ebzk2wWZiHWs0BcwfJhPUiPJEGxA1zWwnK5JIoRplQmTiRz/0sfgdCgSdFUaGKDesdmGOmbhtUw4ZYNtJ7YBKIzx5NpKmYdp5Yi4FUoU6FxkWFcMw4TUqyTy9ler+AdpQh9t7TQ2vUJOWeCcybzICMUXGuGBjpWqVKM9bNbQ4mzQ1qD2i2s2A6LoXVIGiE7SoxMpdCqyV9EMi407Pd7bMcBUceya1AKXfSkYUu/aJiGXJPkC82io/ihZlDdWXfWi7duG+r0wWBCM/dvMHFptM1RrCsIwfLXREqdtVk3EUJDcJ5URd+i1S5Lb01uNz2YQ2sUCjhfOz0xmy0VXyn8BnX5EKvAG2aWjcc+9HOXZZR+08ah5jDifaiRdtW1hcp0C7ZJhDpwN5mdVkmAr2J4OWVJQp0Bmj5RRGpxdpSUCM5XE+v5Nt7i/qImnHeuRr84rQcAatG38ibzz1GLuLPst1nf11T/TK8W7CmV4DM3jJakrvUAUeeJzowAvNgjiFQ+p7fuu1pRn77+6vFqzEWHdfoeqtWbFdwZKw0VBrMi63nqqacpwRNih/MRySMuCMuDPfb3D9msV3SxYTtOpO1EI44kmZIUH5Su76pBcqYNDS4lYlbWmxVE2D844PDsWZou0iwXeIGShaZv6KSj65esTka8g+noZmWnwnJ/n365wDlHHxqCBjLKXRcucnx8xJV77wGgj5HFckHXt/TNAlcyNzYnhLYHrSSoOjNGlUVrOXfOOfb2D7i5GfCupwViNObosj/g/OW72Qwj6oSu6yhJGDaJLraMQ2K9GvBdQ9t0NNHhagbfhXPnKIy44sglU6ZMWU00viGdnJCTzRGnUkxiIBbQ63FQqLNrIZctvo4EUk4oCR8bchG8GqztvKEpSKxjjXpQAjPedg3LvicVk4sEWmOPJmUaJ7oYmLMUt+sBqTmXuWS61j5fCaEUMSlImZA0EruOonprrsiddWe9KOv2yC0KWVydSVWoUsXYjVqAXHP6LErEB0vs9s6Sm0075tFYBeAzDcPP6QKnHYOxFU3YLTrbadncsCKr9gSCwS6lmHVU0Jq2UMklwc++nJV946q3JzZTrB4o1WHeQYwEnwheabyniR1QKJJMMO/nkNO62UtlEjpBsOeHK6hOeG/dXhPM35LyW1PU
Z22jD7oT6euc74cVZGb3GKsjBpCKGvvTzW4pbgc/6VwqPfX1sa8779CS7R4HC6mVSioKdT43yyW8D6hn19HN5dfIOBEz+dYdgTfUjtzV9lorpNp2rRkch56PP/kMj7zmc1htRiRbOOl+33PzaM12UqYhcZLXtAc90zRQhoG+ayFnlAShw0XYDgN7ez0uKmUU1psJOk/XtNw4OSGuOvLVm3idcG3HybUNZ8/s8dRTT9O4Pfb3FmyHxGowtmYaJ3ISDg/26V1gdXxEcfCpT32SpmnZjEbocyq4ksmjgmtJaWCx6Egp00RLgNhb7LFeHdG3HTknKDZXG4YtsVkQXIuXQtER1+5x7uw9pMEzToLGyPZkxZQdORWa2FPSZKJxUXo1MDkHhzaB7Oc4ISCYP+2UMsM0ktKEK8q0MQ2gi4GUhzoKsHSFGDyTJppWkDGRklhwrEyIJHKORMxAOkZBfESlrdhwqjZ+hmvGqERv5DPfRrwGUk4EUXxsGKaBZb+gbRbkomRNxBiNZTvlSsgpNr0Opvf0WVDfEOIe5DukyjvrxV232fHNgT/BOIBqA3jvigV/OsjF3ZKLJggBX8QYoK76cJaq36MK211BiOAUXwub+AAFE7KLdX26a4TqLGz2BdWCAJWWgNmLlZ0GDaji72CdjBa8ejKpwoEtRdTmCShOIq0fCQhFm0ogaRCMeWZQaY1EQk3E64KJuVUgGjQmxcJrQ9uwXU/MRJudzMLXLtCFOrBT0GyCcm/F2NESqgRkVxD9HOHjzQdUghV/G6KCBILY7MrV0lWqJEOdqyJ58DV4dgZQTwutg0y1raoz3CqSdw4owViRHryYUUAIYhAttUgHm/W2fkETejajR9wBvol0jaNpA50P7O05NtuJw4Oe5XLJ8bBhf/8M29WGw/1DVicn9HsN/cEeMTaU9Rmee+pZjrcnvPKVn8Pe3Xfz0le8nL1lw5nDA/q+Z7U6IYYGnMGqOY+8cPUaTex5/vmrDGnivv0lAGfPn+euy/ewOT7mKA2MOLbrDctuCcExro3cMmYhj0rIwqasWSx6pmlEK0M3hMg0TPTdgpIz4Gi6ju16g9OermuYpongW9qmJ0nD2bOXmbaZMU81lLlhvV1bh6ZbNuMJ22mk9S1dCgQfIXhCE8gIbbuA0Q434pTQG9Er0BIdrDZrkgo+T8wyn1KEEOx9lHMhkwheCAqIx3uLU2qDo5SRGCLeRzuIhmJQv2tN0kS29A4fWQ8j6iaatqOLDs1V/6rColsikhmGNT50xBBoNOCDI2dBsyPnxN6ZfaRkmuiRjSErjY91on5n3Vkv3rqtwhdUaTWQvVHdQ7H5nm2oxrr03tRnNlOSOmMrFLSGuxpL0izEXA1OxeaBc7yOmgWT1Qat9lneOp9adJ2yg2hyNTs2txMzYlaVajw5szoz6sxUWH2mhEwu0NTBmtPKavQRNBGjq2zSKgSvSfOh4ozGyjQcUWoeYVDzZSwF8EoQi+lpmo4QCn6OFaoIove+wpve3GOqlZpD8MGSMLR6aFL1dTtHdlWKsyKInBYl532N1jGfRxHrTi3gtRpUO0dQJTgj5YiUyuS0RPOqragJEvU2UusyYnB37S69d4g/desJdVhoxc9Xb0r4f7z5G7h47300beSjH/sId999FzePrnHx4iWGYUJEuHDhPLHzLDsLKu3bBSpCKmNlJ0Y6jTzxsSf5X37iX/A//Zmv5ShtWG021nmEgEjh4sEh2+1I27W0XUPOHXefP4MIPPzgPWzyyNlfM03yq1/zCA++9nNZNq2FxAZP33Rcu3mTEDwXP/ab8O6/x6sfex0vecUrGIYtm+2WaUocHd/kxrUb4Dx5zPRtT7/XcuPGdVSV2DgWB3t4Apv1lmEUmv2G7bDhzP0PkaLn6OQE8ZFhELxvgQGRkTFtGcYNY8nsLZeUPFn31Td416ClEGJDzoU0VUJVEFxjVmg+mxNLG1vKNNG4BheDpYygiCRzSJk8IhnfNgyi1dvV
LPokmPGC5hofFqz4ekIlV9UQ4Gjm5k4a1HuyFBPbNwE/JGK7x2aY6JfekhtCSwhm9xfbljwlQxOyIqWQxg0lT2zGwcJr5zfgnXVnvUjrtgpfwgb/TtXgtmJ5fFq7BXGgrmb2OY/TFieOgsyBbFBPl5Y5R2V8ORyZOomjZvLY1yvxQ9SZFikGgyWlEGoXl6nJ405tvua0QqLmkgF1vIeJxcULKQiqEcQCW03fB+Iizg+VGYl1ebWFcZX96GaSinMQHFJd5QUobjZYjkiu/paVSeqdUSXczFhx7pRlWV1rnAu1Vt8yj6t3xtXH1Pn7a6FD6/N0t8Tv1j/sNJac/r3kRHDV9FsVCNa1zjNWTGqSZ0jZzQcVm+epn42EfSUwVY/NakwQfHVcccYY/Zqv/RN8/he/lkESXdfx4EP/A1NKiD5Q3yvBEsOHDW00n1BVZTMco0VYLBZM48CUha0oTz/7FC9cu8lmu2JKa1yBaRzxfVcZtKax65pIzgmVjPORkkyd3QdPN0PgkvBNoTDS9AEfGoJTzl1Y4r2jedLu24MP3cvq5feZQwuekoUisN0OpHFExaKMNustfd8Rmsgzzz6NczBOE4vFGaZJ2KzXnKyP8OdfwvNXb1C0sF1n1iux+1jmCCuhW/a0VXc6Ry+JFNIwEJyyTUKeTAtbNJEkoXhiDKT1QB87UEs/t9cQM/kOkFPGO/PTzAU02OdHs5KnTGwiLhohy0t9bb1W0wNH9LWzj4HiiqEvwTx301QoRRDNLEM1C4iBSSbwDd57xrQ2SzofrMiqZ9jaiCGniW3aMmnNXyyfOQF7zpmf//mfX37kIx/p7r777vz444+vDw4OPrN2MnfW73ndXhBtUBIFJzMLsXZjfmb8SaXVzzCdeVvm4Cge6xA17wiN1abYwNOZrIGiJJRiGqWZoIFt+FO113JOaKEWiVAlArbDiwi5yi7yTkfoqw6wpkEo5i4/zylDnWJ58HFOCjAXlbngeBdMhlHmxPlbSCO7rkcIEogSEMZ6T6Q6t8xYrbfr2bmkYLCnqXwx82e7nlnqMPvWzKnsuFqI6iHEkugFKaePY/+kO3u4euOYU9ktU9By9JrQVBq+7izM7Cm63abpwDZHnR1m7IUUKqwq5uWJd8TY4b1w75W7+X/+xTdTwkAXILiRNI04oKnM3OqIxd7BfiXYWKQKvmGcJoKL9NE6ixAdDzz6MvK/+Bk+8uGP8vCjL0NIHC66U52j9zTRE/sOL9GE2iI0vcFmRUo104b9/TOwWNCqw4dgtnal4Cgg0FQ9TOsgSjYrNxGi80ie6KPipdAtFkwlca7bN+mIU15y/z2IKlMacL6tvKuLrFeFX3vyBiXZbHMaBDSyHQabFxZIkxXWZhktiLcouWRCzemKMZBTYRgz4zbhWsc0JgINq+MNbkx4Arkkur0F27Q1E2iUnBPDuCENayv8weGkEIodOHP1ARXJxM7gVee8FTbvTYju7HVTrUbqChoCRYTGmbF2DMHQky6QJFv4sYB6g4ZVCyVtiMEIT94HRIyA5LzNnnU3T//9Xx/4wAf6N77xja+4efPmbp88d+5c/rEf+7GPvulNb/pDF976B2ndVjpDh6Oo1KLm0GAUfqnbsvM2Pwh+1pEZpldwJDktYFDnVPX/giI5oiUCAe+ayuSsCQhSatq4wYk7qsVOt5coklGKuaw4SzcQ7ytphco4LWZTVk1wY3UhcSjizHDZZoFTZYzCTmNgCFHVMt1yAxVcUZzYQExLRsYJhoQTs1VThKyy85DZtXH18KBSzEoKe55ZIGusDFJjlP5WZqfJQoIovkg1mq66RWfwk7FEBSmldsGz36Zt8IRQm06TZUguZuAtxrbVW4rdrhOsLNvgsfijIiZyxxGcdSW+FmSKY9yMPPLyl7J30NpzmJRQPK2L6JjJaQItlojusdmYWEpG9LWLByQnckqkNJJ84sLdZ/njb/hy/ucf/FsMq4FF1xKimUPHGC2xwnvyOOC04CSj
JdMET2w8TRt3XXDTRBrfGDxf7bu8Kp7TlA/A0uV9h6oneHMiiY2RlhZ9RxRhj0DjBDSBJKJ3xOBZhJZlCPRBaBGuP3OdYdoyjIkxCWNJiM+IFDbrNcMwsV4PDNuJNBZyEqYibKaJ0LT1vGEEriknVAPTkEEhD4myzaSpkLPgYyCT7fcX8/ycA3xtHJDMbBszii9iWkUfArM3nSioN0THbP6s24vBU1LBFU9DUzMgsUSIrkNKYSoZFwTRiXFI7O8fksn40JCTktOAuGLHSx/xIdp4AXNUAk4/w7/P64Mf/ODi1qIHcOPGjfjn//yff2nZEdVe3FVK4Xu/93sv33///a9u2/axK1euvOatb33r3T/1Uz914Jz7/BdeeGFHcv2FX/iFhXPu8z/ykY+0AD/0Qz904eDg4PP+0T/6R2cfeOCBV3dd99iXf/mXv/yjH/1o82m52M/idXsCdlHaEFFqd1bhulid2qk6oV1Yq5prSCOepjo8SKZm9lm3Ms+GCqMN0Ss70oslA3jXYoJyxflsJ0lp8C7gfMG5TKuBooHRJaI4pImg0IgN0QECkdY3lJQRZzMqr42Bq756Y3pPKRtCmMwM27fEEFEK0Vk3U8TKgQueSYqJwguE4ixP0NnsbUKgNASESIFySwZcnTtS5QMWdmokH++lMlE9UmZI1OOdgka0BLPNClsQT3DmoI8LNmPFoToQfTDSDuC8bXwxmsO+p4HQ1Me9xdLMeQJz5ykUFy3DD0t/Fx8obqq9t8MTUR/oVIhTxvtCdp4skTQdkyVz70P389zNazx3/YiFDxwf3eSZp57mta99FYuDfVLZoApd1xOCeWN6zy5RvkgG35njikykq1vGmyccv/AcB5fu5sff+5P8mT/zNTQ+0ISe7Wa0bhhlu90yTRMHB/scH5+wXq85Xp1w6dIl7rp5BMAH/9MH2UzC+QuHhOg5un7C8c0VD3/OK7hw9izB256hKmixdHgTiSt4TxPs2qgHwFmbGlwglWz5gyGCn3Ah8vwNeG4sHG0dU7HPS1DPsMmkdaH1nlSmqndN5GEyqYQ49vYugkYaIm7YEDBIchKYXtiQV8c0bUsZJ+Iiol4peaCbGppSaU71gGPWaY5l7CiaWW8EF82rFucRH1Bp0eSJobHZd9Ngzkj97sDpwxYflVIiMdtnK2pHnrbEXkDPUJqCc5E+N7itst/vM242eNcS+5ZJRlrfIJMieEqOeAksSJwwMn2GOr43vOENq2/4hm+4+vVf//XXv+IrvmL1V//qX73nb/2tv3XlhRdeaD72sY+1jzzyyPRiP+Z3fMd33PujP/qjl972trc9+YY3vGH11FNPNR/60If63/0nbQ3D4N/xjndc+ft//+9/vOs6/fZv//b7/+yf/bMv/aVf+qUPv9jX+tm8bqvwzUJvnUkOsxYNI2YkEQM3nbeIH2/xMru5UTWqnWOGzEnEg5s9H9nNt3RnTSLgTaqANJZyoAbrTQ5cbCB7KwIFo89rrPlursKKAB4tuktNdy4gFXYVtTGCD44QresL0ZvaUMbTzY+5u7ViUcmPlWCC2bDVOYh6wRXrqkSsW53DZkUE9dkgTDVpRzX8MssodAe1uhlCrRCw6Gj3UKu0o+oxTjWFVpRUdKeTNPJO9Xipr8HsCCMqu4OISqnQreAo+DLPFau/qLOgV6cWAOxVajxuYfLm0OHVPBw1GhFptR4YhsRisWTZBto+cuHiXYTgK+PSHPmHYWA2NJ/SRN/3JvRWCFOxqKToWbZ7HK2u8Su/8iFUlAvnzrPo+xrBY/FFqhV6dZ69vX1ElKbpuHRpn3vvu4+nn3maYRyhvm+DU4L3nDk4ZFqPXHz4QRa9YnF7JnsoMpFruoL33kJpvfXiaUqkya4559Pvk1Jo24ainti1JI088amPkwVW641pSH20z5KHxiekJDarFRKhiRFfhCkPFIW47PHOM8lEbJTNak1JymZtsUJzynvbmPdlkYn9poMymfjczRFbmWCA
NkWFXDIxNkx5sEDn+vmMTWPZjU7w3sTqeI9zDdk5pmSuSyV7oEGDnQI9mTxlyDUBJRe62JFFcCRCNRzouo6ihSZ2uKI0jTBMY0198eQhkYZE9J8ZJd8DDzyQfuRHfuQJsM/sr/7qry7ASGkXL17Mv/NP3/66ceOGf8973nP57W9/+xNvectbrgG86lWvGr/yK79y9VM/9VMH/y2/I+fsfuiHfuiJN7zhDWuAf/gP/+EnHnvssVf97M/+7PLxxx+/owup6/ZiiaowXVWqmbEwy8PLLDbHnc6WZlizbhaiszelOZ7gfBV6g4qvWnNzYQnOZnFFE5BwdKgGXIi4qGQxl3mb1QmNdzRaZ4DZ4WoK9jwf8DimnHDBmTOHiDnKYHTs4KnzBKEJPWnMBI9ZYomYXaAPdfPAvEoLNFWGYZ2qaeNcsWF/IxGiM3NkD5rmMFdjnyqlkk8qqQdfXVKsmoVg322s2Qofu8lKqDbziM2Kptqc1KBaR/CxJtubpELwiJi+MuCqNRs7+zPVQpbJvEERK2xixBnxVRfo7Rr//+z9ebBt+VnXj7+ez7DW2nufc+7U3be7k3SGpiGQJp3hRyIgX0tAEUspISkFSggBiQoCCpYgRbCAEhIEIUwlQwkylFUEKAgpqoRYlIKgGUzIgBk66XR3On2773jO2cNan/H3x/NZ5zYxKle7acH+dN3k3nP22XudtfZez+d5P++hVjUXUGqrMv6y0YI4NG1gMQZwTNuJveWKXcnsNkcMQ4fvB6BinW2bEEMIGu+TUiKEoDqvnDk8PMR6S+87Ft0KsY53/7f3kTK88AUv5M992kupmbapKljrSEkLuDEW5zR77uDggPV6zW5M9H2vbEFguVwynD2NMYZ77/0gD9z3ALfcfAt2sIQQuOvaMc9FL0kpKrD2XmN1QgxK/pgjtWpVDaTaCTWIGTqn0OHlKyPHxyOxqn1cyhBzYRwjedLNpBjRGaGpLUYrN2MEKCUiohvQqQTGPFG2E4LBDw6XoOwqznWavm7Vacg7R04KpSOFnCdKiidRXzFFrHhM1VBh/XAXcol0Vr1OQwyYJhHCVIo9QKSnVrVJyyVget0E1Zrpe882KWnNiyZ8RCkEIkyBYTmAMQxDTwiJNCU6l3HWAQuYrpKmEdOE9E/2+qZv+qbbf+u3fus0wN/+23/70bNnzz7uBJd3vOMdQwhB/jhp6/+jZa2tf+Ev/IWT+eMLX/jCcX9/P7/73e9ePFX4rq8bK3wz4tDkAzS7IvXX02H0XPj0cZyQQGqbjVXQG7KAaOwkygwtjdpPI28o3MZMwS86Ayl0hGIwxtHVTE2Z6jqSVHw7FnXlmI2U9QZXasH1HRkVcRvrtdjm2gbpVW/g6EzFmw6MIeQJ12ZNzRdbi20GMjhTqTVqiCqto9VqTG6ellPJzQ1F5tPS/tc0Vqc8Zo7RrKplPuHNQLiiNySa3s+k1n2r4Fz+SKcsiLEqY5DWkYrKQVSbJ1p0aSbWpYJphZjmgdoKW0G0M2nmAlYes5mZ2Z6Apmm0VHlx7XGF48MrLHvDuA7UXOk7TwzaYcwbImNs69TU9WcYerzvqLVq3pxzxGliypFhseDpdzyD97/vQY7XG86eO8tmPMaKbXNGOYFMVVZRGRYdOSWsBes1Osf71sWXwqOXLrNc7dEvVqz2T7HZTewvzxFL4aZbbtXHVe0Ou66jaxl9c6FOSbvdGCNd17VNgXZ9u3HLarFknCr3fuhhxuQ43Oy0aBhlO+c6qVTGe/IUEO+xRvDesFgsEdsRU6brlXHqvGO93WGNoVsuCbtIjSpgL6WSS9ANpa2kmpXRajoshVQCU07qiSuGmAo5K+FFEIa+J1MJMeKtxxgljZWSqClROqvymeDouo5cEmIrQtRRRtCIoySVWjs6L2r4HQJD1xPCpLImhClG7OAoJuN615jbFeeAuqOEDZ2zlPTkQJ3z+qf/9J/e+oM/+IO3AbzoRS9a/+iP/uhHnojXWa1W/8Nf
1DbxZa3XHxJjfHKGn38G1o1tpRqRQUQoVd/gKvkqSkZgJkTISdGbPSdz0TmILu05EHsCM5bW6amRdIPnjN7wS1ZHTTEjhUmH7FUwOTNgkWKJxRCrwjDzUTQCGkCDJxuMZxTWo1qs7XSmliZq2iB1h3Pa+aWSqc4QjCFYQzKGbCrZCtV5MCokrlKuk3lKbSQX7QIThSnGxsjWgzFtg8CsWzRWI2XM7M855zTMhaxBjRlqVZ2doF2ljhyvn28955qknopS5OcUer0y5kQKgag8BNNCYI0WTDXxbrKR5vI2s1xMqZii7Fzm4mnAFsFVtTpLlaarcxxevUpnhNWiZ7XY0w4yZ2W0Cs3o2NL3/cmfvb0V1up7IafKtNnRu569Ycm03vLcT3guY4xcPrxGRU3KpxDIOesNXDTo1jlDyoEYR8TA/sEeB/sqch8GDaLt+h7rdRb66MXLrHc7usUS2/UMe3vK5AFs+/8YI+M4Mo6jGlSXfEIkqlXniqUUqIKxBt91VCwXr2y4tiusk+M4qpFCiIkQA6lmQo6EUghFZ4LGOoz1qNqtYL2j7z1UTS/pxKskSBS2V0lIolT1WfV9BhvwA2rU4CzOdEqYapB9zAXB0vULNIXDYovBFYMXq05CJEoNDL3H4DAsMKLM1VpmaY/O+r0Ret+33127bQtMcaLvO5ziK3i/JGf9mXEaEZvBJWKOmE4QN5HCmhxH/ew+SeQWgG//9m8//5rXvOZpAJ/6qZ+6edOb3vSBvb29J6QS33333eMwDOU3fuM3Dj72e+fPn08ADzzwwAlR5a1vfevyYx+Xc5b/+B//48nX/+AP/qA/Pj62d9999+OeFv+ned2gc0vT82Aa1R1sldlHC9eSvSueTKQU06AwYU5W1ppoGsPzOtMzCxgpzTGzzaPEYlHRbbUKsdlq6EylSiY1aNHWgq+WYqy6kZR6MnerJ+LX1OAe1RoZ0SKRzYQlASNiA6DxMlIHXFWYU+HI2VlGyR6lGIpxar9UahM+aDGt1QIJpOCNb7NGuT5unOeXbaswYybzPE3txSq1pVnXklXbNZNKGjFICT9VX0rmOYheHyPgBFKt7cYoqp+s0lxh5sBetDjOm4bcuvLGJS01UmtqBdGrCF7UUG7WPJRqsFXp/3RzEoXB9J77Hryfa5eP2eaJYXkaYw2uU/aadx1dp8ng0CzpirIT503W0PfU4jFOUxyWiwXveu97OTw84mh3zDhGOtsjbp6Fqn5MUAaytZrKnnNmGAYq5qRzAxiWA2dvOktMkU/8pE/g6GjNdrslpcRNZ8/irl7Us3rirVpPjpeZFVuV1KJnv2CNQ8QQ045UhJw7HnzgUaAjxh05V6awpZbCZr2hIrjFAFVaV52ap6ZlGiPVTCyWe4DFOs/xeiKMCcFQYsRVTTWpVpCshc9ZT92pZZjJCZFAjpkuQWrhrgaVIUi1eGvIcU0MAeN6NWIouhHSQbils5ZaKlmyugnlmbXcEjzMQF8ji0XH0S5huiUpB6wt5LIl5ULXrdhNyuItJmG7DpFKypYins5UTIiE3YRzA5t0hMiN7c8fr/WGN7xh/7u+67uePv/7Xe961+rs2bMv3Nvby+985zvf8+xnPzs+nq+3XC7r13zN11z4ju/4jqd3XVf/4l/8i+sLFy64d77znYuv+ZqvuXzrrbeGV7/61be/9rWvfeg973nP8CM/8iPnP/Y5nHP1G77hG+74wR/8wQe99/Xrvu7r7rjnnns2T8Gcf3Td2DtKWocy93YNmihZQKx2Nc2Oy4gSNE42a3J91nfSNc4G0kalBbMribSII2XPNRZoNVC0D7JUvYljQJzetOc5ixhlNOp3Tzo+FQ+a68cvGqYrLlDNGswIkltSuaPWQkqRnAuminqApohpsgox2rmZ1gkpyDp3ssp2NBaMmeUdmdq8/uc0d2NUczZrIrUJu94J1jp3ykohgaKv2YqQa6bR0sguFAsytKI6J1dfv1ZqoqlfFxS21PkUlGop
1bXXvd49aufs1JpKZt2juX4uxVLwVOu10kqhmkrxBnGOw8NjPvrIRXXuMZUcIiUVSi6EMLHbbtWtI2qqhhFLDkkF+VlnwmIdoUF5Vw+PuPOuu3jGM57GbU+7DXEg7vqMrbm/anc8uwq1YqV/EqvViuVqBUDJhWkcKSnzgQ+8H+eE/f0Fe0vPxYce4PjKZb1mzURW55tFnyu1yJ82/6tVyTXOWjpnWfQ9zlguXr3K8S6SkicGhf5cQzc632ExdMZhS8ZJoXMGSiblrInqqWKwlASCI0cwxSHZYbNFcmHaaSHNVIp1hF3Cmx7b3udQSFmRCVMLhICkESqkkJRdKxMhTUwxniSNFJxa4InHuQGDb6kP8wZFoAacndR4HMN2F+j7nloyxjoN/i2CMQoDD0OHcVBIpDLp2KCxh/vOEaZArAMJC0YlTU/GevDBBz+uDGC9XtsLFy7cWNPwx1zf+73f+/Df+3t/78L3fM/33P6CF7zgeV/2ZV/2nEcffdT1fV//zb/5Nx+69957h0/7tE973vd93/fd+u3f/u0f/difH4ahfNM3fdOFV7ziFc/+3M/93Ocul8v8+te//kNPxLH+aV43JmDPzadFLTx0RieikExjUZ4kB5QM6Ie7hZI3gkZpo7vrIuucK6VR5qm1kTc0CWK+qWs35BCqmi2jPEljfNsptxmVuMYMLSfkkZMXglasWxda0LlFCVhqK7ACeMQok00zMPP1LLsT5r8WQ2sMRjwmN8Znbd1wQUX9JVFzpBaFhtpBqP2XaeekzT9nVqOG/KrOSpMQdK5nTuZ4GkZrTyQSrUhV/V5prFD9jqKvXlrtb7FS82vNE0e9Wk3rh+r9Sp1nsdcJMCAtGHeex1ZqVePuVBJWkhbq6onTiMnC+97zXj7/i/4q22kEBO8cJavAOYRAqhXfe6CQSjqZp5VaGactvlPPzVxGSg088JFHOd4ccsedzzqBTKnXw3ZnnaR1ls1mTd/3eK9emaDXP04Ko61WS6aDPQDG6QzeW46ONtRUGfzAo/d/uF0y7SS141YYWsTiRE0ApM1jU5vHqg9mRvyKR65eJJsFu22mFp1BhjwRo9rrrYaecR0pKSHNfUi7Mq+JJk4F5ViLuECOW2pypF0lrBPi26bSCtMu4Hyvwce2I05J33ftOGsZ9fxWyCmA+Nbi62t3LT0DhFwFKaKxRK6nGp3DSdVsP4PgnZJ5ao0qehed24/TDt+vyK0+5qzz22mKLA+WdN6T2rw9RZCSWS46ctqyOV5jbEdI6iX7ZHl1ftmXfdm1rus+9LGztP39/fKSl7zkCYEOrbW89rWvvfDa1772wsd+7y//5b+8ef/73/+Hj/3aK1/5yrd97ONe8YpXXHvFK15x7Yk4vj8r68bILeKwOIoEquQ2y2s5edS2u9Pdte/mWRuN9m1aB5GV3VlmeK6RPBrl4vp8S07mB0rF166uimBmOYGTZp1G6y4rRhwGpzMx4CTuod34pVH4wdCLOpXkYhFLE8lr4TSmzRgRhQUrSqueC0Kro9Z47bRcQWrB5FaoRY13jTMn6QVzooQ6p2QdB1bw1s/faEQg1XCVRpDIVaHU2fGGucNrNyvtRubNCKiTjaNIothMpbnqN4G8iOBRbaQWS9U1aqah7lIqVg0G6jy1ra0otqzAtrkQGkO0wdNFVfLkGuhNT+c63vGOd/IX/9pfJOWIKZYUlVpvpOKsFtdrh1fJOXOwt48R4dqVqxhr2N/fRwFEDUGmJu5933shq9j8vns/wNOfdjudNwrZFnXXtsafyBtErneESngRJaSgUhPn4Ph4zcHeHmGKrA83bDeRUuGMVwlVmgLjOOrzOXWDsTMxqWbNL5SCRNXwhTCB9xwej1w+TGwnWO92IMK0DRr7IwbjO/a6JXlzTMgQwpbFyuO7nt2kkUDWKoJhrRDDSJhGDD2b8ZhhcBztgn6CcqKkQOd7TKe/v4ghJ/UNFRKxjsScCLEiJej5L3o9S3EtGBqMqyoh
MepCk1DfVwScM1jrcFLxBkKwlGpZdBBD0igvKZQUyEWYwsRquWI3jljj1Jpub4XzC2otxHGHNxVjAtNujXeWiGp2TTFPUtmDg4OD8tVf/dVXn6SXf2o9geuGCt9sdTXPopoPF6FEnDFIrljjEWvJaVQ9jm07QlRaINLo3w1y1O6GhuOXmd/Jib2XWpcA6qxS0MfTilZNgpdeBd7MTMRMbrOreTBuaD9zMj8MlBSoTKjRtBJoXO1a19q6otbtzJR17ZaUOWoEnLPUrISN3HosS9PtWd8IAvqzc0SSGNrMktZJtCqataOqrctUraTTDqMV2zlxfb4WtGshaOcjtVCrWrNhpekEdd41byCooqw+Spu3tnPXujuZg2tt+/1BXVuExgy1bXPSjLNropqZyKSvXYxex1wSD3zk/pbsPTGtA5vtyNWr11j2nsVyYBcm9k+dZlj1vOcd/5W91R7DsGSz2fJgCkypcPamm0kp8pH7H+S++x5guVxy+tQBp/b3mXY7jHR4bzheb/Cup/NKyw8xknPGGGFYDAx9T8mFrtfNhjWGvYMFMU1sN1MLSbaEaUOkcunwGgBHVw+5dOUKZ0+fJkwTXdfhhx7brO5SBGs7rj56iR7DhYsX6fbPcXnnWO8qGaOWcjUzjjswhjFWrBS2U+bq5ctECvunV3RLTwiV1f5AqYmuc3TW4wysN5Xe77GbNsS6pXce8VrkjDgWw0AMIySLsw32rjBtJ0qNJ9dFRLDWIzVrioix1GKIjclrrRApJ8EhRQrVWGVzywyPJ0JIUBy9X1BkS7fqybngTU+OGSuG5XJgM21ZLQ4gF6yvOKuM4irK6PUmQdoxbY4Zhp7dYQCSEm543CVzT63/x9eNdXymsQQrGkhKwYoDb5t9VSFLJIkOy6Wq3s4apbnXPHcsai6tH7hGz2+dX5U5YLWiwGYC0VG8Qm9WJ1dicHPyQOs+a6PQq9FlK3qtzSpl3u039iIZakYkQrWkqhTQWfumZipVC6o4MK0sZxDbte5RJRPOwZzzV0UwpZBbF2Qbxb9gW3o9gOrcSlafx4IWEydtc1FN6xCFmloBb76nlYxIbRo/4cTrFGnFSWeDYhQuzcWedJiCA3QHj1XRtClOMxRJrav1WMlk4cTn1IicOPGrU752ejpXrdQTCNCjPqPNS3KKLBcLrh0e8eEHHuDapUucWp4hpMzBckXfeVIJUCGnynve/W4uPfxRDD1TqGx3O3X4d5azV66xv3+A4On7BZeuXqHrPClFSik89NCD7O0t+NAHH+D06TNYa5nixDAsCCEQY6AfOlII7O3t8dxDRaouXbzMQ+/5QxbdwKJfYkVwXrj7nrtIMdKjkOgjFx/l/TZzbv8Uly48SsiJzWZUgkce2VscEHYquZBc2TjPnXe9GIZzhALjFAkhk3JU550UCdGy6CzXrl7WWCHr6FcLYk1aaETZnN53at4tmRQKiCGnCe88MaDv5WKR6nFOKCYBllpEDbrbrD2HTCoVSlQ7sOrIOZJzRIpVtqiHWGEKEeus+q+KQMoUA84NeGsxpiiRC8AYvY4OkgkIlnGb2/NZYo0a8ZQje8OyyZMyhURIhq7r6H0lHE0M9HS+o+s77GCpa08enyIk/nHW13/911/++q//+stP9nH8aVg3rOMrorIAJDe3q4xN6mAyiVLdwTZPv9o6nqzzv5Obst7CrZjrc6diNEdnzqUTgSxtbqgkEGdUgK2dXiPpN3FhLu1nKtgaMQ36nNMPjBFMLTqUL1CZqHmk+kTB4YrTeQ0W6kyUUJsyaf9RWlRPE4lTZ8anafl2ekKzCKMBSYV90yNRZ2ixoa4VWsq8lvtZ65hPksybu0qB2RYOmX1Ec9s86GPmZlHh5DaLNO1Zm2OM9iTl5FxphlzW68kJatnkEbpRSDljnJwYEZxAhXWe8c4OPkrcoS6gdq2zzpRY6HzHlCoH3ZL3feA+FsbwkQ9+VDtoINVMtYJ1PWfP3YL1wuLgDONOvUO7
hWe1WmKt4eZbboFaWC33GT76Ubqt4eLFR4gpcHi4ZrvdcebMPpXMLbec5+DgJq4dXuaWW27i8uXLXL58mbPuDDkJH33oUQ6OVeO73Y0cX9txlDeM649gnSPWwtvf/Qec2zvN7Q8qf+DCIxcZnn47Q7fg1Jkz3P/AAyAd1jkwjimC6xa44kk4VudvxZ05z+ZINzLr9YZUIjEr3JlKBuNAIBlhsb+HYDVtoVY65zX8GJX8GNsTp4mQQiMWCd4OlKngSmbcBZzv2O02pBRxrqPvPEWaV24OKiMJem1MLaQY1Su3zZNFKjVXvNXwZWcMpoKtpuVuWqQWctghXsOXjbekVLBYpnFkMEtIiVwzxRklNMVA3zsNPDbqTqTEbsE6oessMUzspoKJQokTU6hYZwii/rpPrafW47lurPChkJoFHEI2FilKk8cYilV5g6mOLIVaC9bornAWp1vbBlVScVZvnCkpCcVicEZaOjhKnGmGw/nkBq4ZcqboDK5UOTHOVdZhKw7CCQEGOOmYZCZDiFqUVbG6W8Ygs3dlm2fJY/+bbWKM+nSaCrWYxhqd50da0E1jqxhr8L5BlYUTyruIPqczBieeVIWMaeeqkGhszwJS1fZJ0OKtQnROiClG5vNbGtw71/+ih9sS6RUtbufdzMdSqVnvKk70HBdBsw2N1Q1KIwLNa9YaykmX3opik46IVZcONf9WN569U2dYj4HD7Y5OPN3QIRQGY+iGgd0u4f2CCxfux+91hJLYP7PPzTed58EHHmAKgfd/4DJ959jtdmy2a7zvCEFZt9733HbbWUIYmcKWP3zv+zg+egf7+yuO18fstjuMOB76yAUEy9HhETc1y7IP3vshHi2JNEV886ENOZHFcvHSEftXlQVupOPaI1d58IMP4DqvYLatlJpwvWXwS6ZpS06VKo7T526jVEMc18SQtNMylu16ZPCeFCPeLyi5sFyuuHY4sr/ax9AkORWGweMHj6HgvMKp/bBgs9nq+y1VajRQJlwzKPB+Scmbdq0T0CHiqTZhJOFy6/pEbekKTh2JaibXpIzqauh7T2lOLV3XQa2qJxTBOSgxUJwh0hjNtdKZHiJIKSyXC5Kp7HZbnc1X/R1Kae42UlQXWAslbEjThPVLvLOMx5cZp8qi7znO6URK8tR6aj1e6wY7vqIOHblishCtp5hCh8oLStYgSVsd0alrBkUoxSrvUPTDOIevisypDtDi1lX/VY2mfreoo1I1osU4oGrRaTRDKpXcaNrWgLQ094ojF8hZPzQnDiRzYaQlRJcG2xohmQJ5nuXBdSkACvGZVpBzUTap1BbsqRSPQiKiiQO2Vrw4ck1UL0iS2fYRaF1r1TleaQJ+lR4WUikYsaqRZE5waOnqJwUHhWZnAtFc9Kx22DRBR0UZr0phbTPO2TxZo75RSFY3EPkxbjG0WSe0wtrMs5XF1xSXok4eKpOYqMU16zaDa4Sh02f2CSHSmY4YMlPesX+wIqTA5vAqlI77PngfxkZFDqaJ7dGW7dEx3ntiq/TrzZZaKqvlPtN4zPp4x8FB5KabznJ8fExKkRSVrNP3nmmaePCBBxmGgXGc6LoFqbm2jNeu6Tm0wm6KmArLYVCpQgAb1KggtZY6hont0THGGkLrlMxs5FAs05RIUcimY7E8xelT51lf3oKFEEbdvIne9ClQp4iQGxQJB/v7LIaeaUpNuyknIb7U6zKW6840hTDuIDhyDbjugBC0WFgr1BqJcUfFqKyGSmaHKUmZs5LwFihZWcJiVebQPse2oQnFQCpBQ5q9UzSh6uc8S1XtbIYaCh7PmBN93xFTJNasn+OiTxRTZOFWgJDiiDcWKYE4rTk83nHTzc+gtz3T7irDsEfZFYWSnzR6y1Prz+q6MXILUCQhFjI9mJ4qkVAjbpZht2gcU3WmVaog+BN4VIwGuqoYXmNoRDyYSUkZRTVjMxSiL9wIMU2ULlKpTeaVH0PsUBgwklmRpZGg243bStUbf4nNJss3
yFElA9UopaaTctI9ihSFELP6JtrmcFJnh5Y6W0+bxkttbEVgaJq3WALGC5Ie25FZLSKlNJmEQp+1FfIZRqag5wlO6q+hzVyk0UtyS1af55lSmiVca2vnv5YWFEyDaZ0yBUtLeI9FJSLGuFZcm35wxmONtOsx69lmI+jW7UpWokfRnTwtfPfZd97B3/u6V3Hp0Y9y5cJVLjz8KF3fs1lvEKM+kYt+oGt6szQWSjT0C8+42+F8R82ZYbHk1lvPcOGjFwhT5Myp01w9vMRme8wYjogxM+4SIh6LpV86xjBiq2MKgRg1jWDwPdVqVwswxYjpPJISR7tjUikMtsM6qzZd7ZqVmjG2o5iCFUM1KoRf+A6qIUel/pdquOm2Z3Pt6pqUlNSSbUUcpJhZrPYxMeCMYWkNYy6EnOmGjs3uEO+U6WisR4yQs3pdmoY4hGnU7gpLLkIIAelaty2oeN10mu7g9P1Rs877TM1kAtFEsiRsNSdzdhFFNVTbqnY9ttNUhln6cuIVbTW4uJZKLwfEHEhljSmWbtjDmJ6UNnQtxw8jTEXnwOIMXpx+FsNEGNdsN4cthzAwnN7neF1ZrRwX7tvgjW2s4afWU+vxWzem42sPrxYyHbled/2fg181rRytSC0gViRqOjqzVZVrIzLVNWmBUq9HaTSyk3QBoXWLKjaWYtQhpZFdOtcBwhiDOr1Y24rrrOFrXp15Ais6j5LcikyD6lCvRQUqtQA1K0MoGREN4KxNukGpZJKmObRk+NKgXfU15KRbqkDfdcRRM88A1TzmjFT9HWub9FFBjFVPSWC2FzOm0HzadE4KzBVNJRw06FOdck6cRZWOSa4FU4Q6q0eqNAu1Gbyu1+d3+fpMr1QlD2l4gznRcKmptjron8wY2yEY08zGDfTLnvNPu4Xf+b3fxUwTYRcZp0jPUp8zVcouYGwhp4p1NCutxPZww2rRc7zeMm13iPF8+P6HoFSGboBcWA4LHv7oBW4+fxN7iwFLZBMSy+VpxmlDKZWYNBInhKBOKXWHWzQ/TcAaS0kRl7JCwiLUmNmGqcVm6Xs+lUIsOt+OLX5HEohZYErF2Y7l6dOM2eEWK44vHlIbqaWWQu86jtYj3juOrm0wdiBlIcSM6TVVwVnDGCZ839MPPTllrBVOn9pjc7zm6tVDEo5QCykmbO+xSdm7OSmCkVM+eVcb5ykpa3ByKtSkvC87fz5TbmSten3+iyIiJVXU1kVjiZy1undNhSwV73WzklPF2Z4sgZxGvE2KYJSAFcHiEO/BdlRr2vsdVv2SxDFHmw0xVZa9Z5xGLl67yPG05sBrSoWhpaw8tZ5aj+O6MfcB0+yoUoYaGjmlnszRMvqhOIEwUQNaa9RCytRGgG906FotxlgMnpylOXDMMKClNh2b2nW1OZnYxtRslOqs6dpKROnaBDIhKIRk7LxjTwgOY5ThWGf6fjVI1Q91FVEz6aqaNtcIHFkqnbUqtagF10S/pUSoc6BQwUlRn0oMqZntWqvhqNStsu9Q55lmCKalVq5TUGaJQ5XSXGE4IelUSkN4pdWsBoM2/R/ArCEE7UwbYjmfsgbeimJYYho2mrWMVYU0aXINTGPHNmafdsE64yvtNbX5tOoTU5p5ixVkYXn6s28n1cDh1TXhcM1qf0kVGMet+jgWTXJfHx/juwWhTOSccf1ADllv/tWwWq1YH2+wviPGADUgJbI+XrPbRh5+6AoL69k/vWKUwvbqRfYXjq7z1OKYpoCxVsNPM0wxMwalyIfdjmmtFmWxZrrFklILtirBJxV93BgDIQY6210/l8YxjZHB9sSU2MYdtz/neTx88ZDj9ZbeO2JMSIGYAoPvVb9nBiYctggYRywZaxwWi3XQ9b4JxDWVXkzleHNMmJREVoxhSgnJmSyRTjyVjLUWb3sswjgmtS2qEUqmpEQUo45CyUHMSAYvllzb5wg1cRDJiGSsdDpjb65EtSqKgRRihr63eBeYIuRq8csFIe7IOeGMBkvn
lKDrkL7HUvHG46xBpLDbbhBjWOyfIlXtxqecscsOu75GSmuMGPTd+dR6aj1+6wbJLcrYRDKYQE2meTi2nb8YvZcaFbRrgEEzRkZmLYPGtZyQQgqVqLMP6nUxfGN/VdGCRHN/UZq33oC1C9MiV4slZXCua0w0QAqmzoO1rN3Y7HTSIM1W/3RVhYoKOmdUOYN2ISFGnFPorzRmqP6u6kyfSmqjwJbYJ0oMSamoPI/HzA4rbU7WHFGYk+LVa3Kua/pQLXbSnFL0GzO7RU5gKpoN2rwhmIk/pqAswCaEh4pTzJNqZ9h0nlNKY2w2w7IZY50PGk3SNsY02KsqUcI6aksbTzmyWvQsbjnDdtqyKnsghv3TZwhhh22eqwJMNWO8xiVpNyrsLVesR2WExhARazXh3A2MMUNLTx9jYEqRlDKnz+5BFjbbhD1YUPOOysgUbHOj0eSQWitd7xnHHbtRZQpxihyujznoemItjNs1SzrcoMG4UyPBxBgpRYgxEnLA9R1TbLMvI4Ri8Qf75G7J7tohznbUUkkFvHjCuKVYx7gLGOtx3jIdj1ib1W9VDFYGTUsn6zzdOlIKXD3cKiNYHF6M+lgmJZRY6xBxLFeOGAthUqegUjOFSCk7JEVqLUypmXhn7bqi6BS4ZI340veSQ9/cGWMWSFHiVa4G5xylBIxUvDOUHEESBcF3exgrEFMjQwm2G6jGshsTpqoo3hmBanAUttOOZA3DYoGESpwyY92yOuO59qGHMK62jeRThe+p9fiuGyx8ghUPRp3fFQqyraCoCFbV6JnqCkXUM7GIYE/0X/ohV9eXeZbUMn5O4LOIuIIVT6bl9s0C8kbwSFVdSGwpeGMw1jaIL1OzMguFjKRZO5coVXDiWqeUm2zBnNClixRyzVTjlBBQDLU0Vw5jICsBxZgm2aiOUjJZtghOo4Ba54gIrrnY5KS6Opkty2ZhfcMHS+u+VJ6vxT8XvUHUUim20LeCRGNLqqMMnLiStL9rJS/zeO8kukjUdqURlNSxpIoBO5s6g2hwjbrWSNauG9PGpAUktS5vbi/1d8mmYDqP9Z5l17G/WkFvGKdEzOB6x/HVI811G0cO9vepqVJMoRqFfvvOIwauXTskVc9iMZDDBuMsoRa2U8D6gTBOiLXgLDefv5krly/jFoLvOoah15idsmBzfMSwdFgveO/bzdgx7Uacqyczvq7v2VsuKeOE944zp08xXtsQUmQYehbDAqB1Qvo+jzmRxqx6RbG4zrGZCp/0nE9kW0Q3AjEQY9KsPduMyA1gCoPvyakQQ6RfOPxqIEyFFDXXsDOCE0tOge0YiSVTpCPUwL63bHdbalI3FeksJCg50nUDY0wakSWJXBNIJJQd4OhyPTFtL0ZnjyVXcsqaM2gdKTcpjxRiUrjAiRa9mbdsEHIqGPGIgLOGLIZxd0RNWQN0rTCVgBvO0OdCPzgCW7IokeiRi5eIuXJw821Y4wlhg+RCDDuWw5KPXHgYjFUk6EZuUo/zunLlivm93/u91UMPPeTPnz8fP/uzP3tzcHBQ/tc/+dT6v3nd2IzPZJKddJeYmg+n6K5PaOGndTaKNhhRykvOhSLqAkEVilhs6SBPiE1KhkD1S8YZjHSUUtRJqXpKUlaaM5ClECm4ZmQNhdmfsiJtTmiIjFjZIVV39lLBmfYTYpjzFGSe94mQUCNdqiA1NZhLEAseMEXIoVI0MAyIZCkEBGcyTsufxtVUSy6Bwe+xa4GaMovpq9q2FaPsU9sEvVCos2RBlEwitrFeiyO347RNwlDnczAzL6t2oNoeK71d4wjV3sw1ekw26otpjACxkXJonYbOEgsNKau6WTmpqQI5qYTA2JYO0C+wXh04pPPQ9Ziq0HAqiZwC+2dOYTLkaKk5Y5xn0WmSuaNn3E2EpCkQJWTEGWy/Im4nktgmvNbctpgVR+gXHXavR7qeaRcoKXFqtU9yhboayI2cchzUzSdsd3Smw2Ho2+wOYylZ8MtTbKYt4eox
HRr/Q004p/c42w3EWlh6w97egpwNEgWJlSodZ267lc6u2G5GpJlu5yyI6bl2vMF3nhwTy36BGSM+FuzSk4xgU4WwYyyGbCxDUEQi54y1HSkVwm6iD9rhUh0igZgmCNpx5liRZi7tVpYSwQahpoDFkPJErRGfTQvBnRBfKGSc77E4LbwoMc1lT6me6CviAp1VZXspFbyhktDkeKez5zphigPrwXuVfITSXIAixmR63SJz7dq1xozusKbTzYkxTFPgYG9JuXxMmiK19FgKhsc1BOGPvT784Q/7Zz/72c9/7NdWq1V+/etff+/nf/7nr5+Ug7rBlZKOi2Yp1VNL1w0JZMREBDV0dtJhcDTVlv4nohIEQMRhsHg34F2nej9RCnidKYvSdHRitfsQ9besjc5vqNi2o7TNtkxEsM2I2DQ9mjQm5UxcMSYhprEXmwbI2XZ882xMYBZj6x+LqRZbaityWZ/fKbFlhgKNOJ2jVe2epFmBSVUTaQMKQVW0S7WGKQYtLjMTpBWremIhNv+cQrQOg5OWi9eg5IKhWk8VdUaRBmnOwvZ6IlBX95QZWp3NwKnNb5N5nqgMHGv0HM9QqkLSM5BaMMwemRlKOZmn2kbLt6bRcBr0nAqkCp3v6LuO48NDbBV26zUpBP09naHob0TejeTNDomZXhzOeayx7MaRzW5LyZWQEv3QUtut0A8eby277ZbVckVYj5Qp4sRwfHREjKHF6czCbJUGGO8wxjJ0HdY1fac1VO8p3rM6vc9iNVBMaWkRmtChp0/INbHbbdhstux2US2/nCVSufUZz2A3Tkxj0KSGNrvtnKPmSgiJcQwM3UBKhZQncNAth8azrVhTkFLIMbLdbNSoOxaVnORCyZkwjhjJuK7ivdDZBWJF/TRj1Plw0ngsb3vdfNZKTGoeXmoCm9v8NmNtxXlFTEo1eD8g+FZcUXJUDlAi3mpM2GzwENNIiBMlJ5wB53tc11OtpVpH1/dAQgykEJUBOwWm3YQf9jl99jzjeksJQe0IO8fBmQMu3Xc/JWY1dGgd55Oxjo6O/rsX3mw29pWvfOVznqjXfMlLXvJJX/7lX37Hl3/5l9+xv7//gjNnztzzDd/wDbfPSTa73U5e9apXPf2WW255/mKxeOHzn//8577xjW/cn3/+h37oh87t7++/4Bd+4RdO3Xnnnc8bhuHF9957b/fGN75x/1M/9VM/ebFYvHB/f/8FL3rRi577/ve/v5t/7rWvfe3Nz3jGM+723r/oWc961t0/+qM/evaxxyUiL/6X//Jf3vSX/tJfunOxWLzwmc985t2/8Au/cOqJOg9P9Lqhd5SlYrLBJI+pC4SOWnxLHpjjbJpZr6BzC+fouwErQk36RrZURAJIIWUhRtWulao3Q+edenjmDDkiNVBrboSKqky2UlvquSj5olYtOKVgyogpiZqF2uYDGp+kN49SEnO6AMxWZ0Zt0EylpqRFGiEFdZWJuTLlrO77syVaKZAKLldMLMo8yGgRJCucZoVc8kmgqK5GHa8g5fq8jkYAsrkiZZ75GWq16q5hBJp+7ATWpDRNXeN6VqdpEY17XmphlkRo3Z0h0xYb1aSKpdm/zcG3xugGZba70kLbuvxaiTGTklLhrTPtGFqBgCYh0Fy19bU1vhEtdruthrNKIYVIjplYEjEFdruRXEoLMG0hT87hrD2ZZXbO462jWGGqCg/2nYbXrlYLlnsLxHqgV69U61h0Pd4YTh/sc8tNZzAWYtaCVij0w0IDeaVSJdD1sFwsGYZBiUloEO2w6HBOWA6n8HbQ+CRjuOn8raz6FVIMKVemVHD9AhEhjCODdYRpYujU2cYMHrvwxKLzsZy105dSkJiIMeA6rykWIVKz+mlOMTKFHSltKHmrGrxkWuq6FuuSCjUk0hioRcOKU4nEFLWjq4FCQEylJCjBUZKmtmPUTq1WJQK5zuCd6PnOSUN3jaAm4Pr+yjlpvNRuVM1tKYSgc+CUA67XznnZLwmbHZuj
Y3o/YNyCYbmPFyFvNqS4o185loOweeQBRR4MJxKmJ2M9//nPn974xje+78Mf/vA7a61ve+UrX/kowCOPPOLX6/UThsD+8i//8jnnXP3d3/3d//bd3/3dD/7ET/zE+R/4gR+4CeCVr3zlHW9961v3fvZnf/ZDb33rW//wb/yNv3H1ZS972V3vete7+vnnx3E03//933/bv/pX/+rDb3vb29598803py/5ki+58zM+4zOO3/rWt/7hf/gP/+G9r3zlKy/OnIOf/dmfPf1t3/Ztz/jar/3aR972tre95yu+4isufsM3fMOzf/3Xf33/scf1vd/7vbe//OUvv/qWt7zlDz/7sz/78FWvetVzHnnkkT+VreSNsTqLx4hXF3eU+aishHpC1jAnGrOqjhMVHYoPA2GaqKliLGpsLJpELVYLmGndSykZ4z02FxJqfwRNApEzLgnFepLMImuUxj+zOyW3zvMxXVadVYFFOybrkNJ0ai27rhal7BtjKblJHKpQshbcKmCrZsnVx5BMrIiK7lECyWzjJVhi81ZkfjgzSUh/5iTHl9YBzjZrMxeosSfVZab5cTaSznVfGU5Ylrl5eSrkyQnpRbXJ+nUtaK3Ytuw/aVT+E97MfEzz11pHb53OW5311FpJOSIxsRh6Tu3tM1XBWsduu9WEhFLJIbO4aUktiZIrKZb284acExiwi07ZjyXSS6+FyVj2hiW7caJf9IiIPq9zSCcs9vepU8LhEKlsthuOj9d4etziFNkISSoLY8lhIshEtJ5+uTh5X9gi2FLouo6YlalpcPiuw5jrobMxKrRHztSkG5cYJ/rViuWZc1y9csx6C1cOjxj2lmqRUJRMYkToh57OOeJuRyYTyqS/RzXkAlL0PeWNIet0Wjd/1ugGsLRNW65NaqAwf047fc9g6TqrDi45qVF6KVAiUrNCjjlhjXqY1logg7EdOWUVtPeWSsa5BdYaClFlNBh1d2lG5DEkaPrREEZCSOytTpNjplCxvSPHpF1qiBpgmyfGKbLdJc6dvwm/0PeDtUIcM9M4sji7ouQtdVrrhs+ouP7J9Cz7vM/7vPUnf/InP+9DH/rQMHddn/AJn7B7olLYAW699dbwUz/1Uw8aY7jnnnumd73rXYsf+7EfO/8FX/AFR7/0S79007333vvOZz3rWRHgO7/zOx9505vedOrHf/zHb/qRH/mRhwBSSvKjP/qj93/6p3/6DuCRRx6x6/XafsEXfMG15z3veRPAi170onF+vR/8wR+89eUvf/nlb/mWb7kI8PznP/+RN7/5zavv//7vP//X//pfP54f97f+1t+69Hf/7t+9AvC6173uoZ/5mZ+55Xd+53dWL3/5y4+eqHPxRK0b6vgMDmuNzgZsIzpUwRmHd07tjMTSGYs3ht47ltbQWUO/v8KvVljxSho5gecMxipdg1KwVsg1E0ohlhmQa2R/ESyZAYs3CqUaKWBGxOwQs8PIdF1bJum6aFzbGoW9jGkcGc0Oq2j1UWOSDkQDWU3zBzXojaaa2jLZynXIFJrYvDbCiW4IrBWc7Zh2AalzSK1+gFXqqHCvETWTxhhlWTrtElVpoMbQnTGYUjFF+1epFWkifNVKqiRDkxxa5NFJcry0Q22MT+Z/K5FIQ0LV5UWazMRYFWcXMRSxVOMQ1+GGJcZ39IsltutwfY+xajsVx5ESE6ZlLdkqLPuBxbCgFDg4fapJWRwlVuKYmLYjNek1Nw1OLkW7hs5rx7Pb7Zim8aSDHLqe3ljMlPBiWJqODs34S6Vy6swZnHNMmx3O6rx22ow44+hsx+HxMUebNSmqTKGvhkEqZTtSQ6ZGw+Zo5OjoiPVm03IlwTtP3/fsH+xTMnjrOHXTWWRvj7xYcGU3cnk7YmyHEY2vyrUgVrWh4oXeO8i5/T4jQz9QUpurGstUMs51LVlEZ16lFTx1cyl0bsDUAbJuFFyXWmq7wsTG6rjBmUzKW2LYUGPE5ERJI6ZaanJ44+kcWDMhksi5
EoMlZ+32UylMYcJgFfa0vqW8g/GCtZ1GGFlH3y2xtqP3nuUw0BlHjhEvRmfbecdmc431bsL2B4SYMERq3JJLYVcM/f45/OIU6zEyGtH3XksgMXJj+/PHc6WU5N577z0pegB7e3tPaA/6ohe9aHMyogA+4zM+Y3P//ff3b3vb2xY5Zz7lUz7l7uVy+cL5z5vf/Oa9++6776Tj897Xl770pSfO3ufPn88ve9nLLn/RF33RJ372Z3/2J3zXd33XLffff/9JyO4HP/jB4TM+4zP+yMzy0z/909f33nvv4rFfu+eee06e8+DgoOzt7eULFy583LDe/9vXDVuWUQzVNpF30ZuzGh5f9+KcYSkj2hBWyfTeMpglmxE2cYOU5vPpCp3xhDJRilc49ETAVpWggIbN2pIRAzuXqAVcTRijqQJmvsnXSpSsH0pl0wAqSSgyN1AKu5YalY6SwOWq6j9b1MbMqIo854BK2trsyxh1emkG0LVWitgmgYiIDBgbKLUyDD3XrhzhxZHqDM3O7mczWzI3CMkrOQXX2JMFNXpT8o7SP5sTjtEiXqqmNdBIPZDU9aMlM5RWn5XEY9D5X6FIwmKoxVKNp0qgksnVIkU9U6kFcZ36dmK1E/JqpN2VShJLqJnewCZM9Mt91scTtgdvLAu7ZDftOHXmLJcuXeH8uKbzHokQs7BarqjJMtVIyJEQdnRGnflzTlgcZB3Kr5aemBOu07zHGCKlgAmVKSSkZrphwbC3xOEwS08lqp+OF3pvqTEScsJbSz902E7vE7u85cruEGc6rAHXOfYOBkqFUCOgY5BcJo63W/YHp++PruPR4yOe9wnP4/BYGBMcbtecOzinCeVkaktR32ToKCR2xJwZd1v84ChppAajyVuDxzcZCdZhO0/ZrQkhM02ZcdSEddNS5UUMOXekKWA7ZT8aiaQ6cYzQ9WoOIKWcREpZ69RdzjjiFDBGw5vVOUklCws6nEmMpgILNA7X6WuaTEpJyW3GUlLBe0eyhiSVriTcckUJAWcqtqukkFkf75ToMqzoDhY4BxcefoSnPf00KR+Ry8DZs6fZbHctycRQJGOzpROjm9snaQ3DUN/whje8/7777ut+9Vd/9cxv//Zvn3rHO96xeuc739k///nPn/4kj+X4+NhYa/n93//9P/xYssrBwcFJTH3f9+WxhRPgl37plz78n/7Tf3rkjW9846lf+ZVfOfua17zmaW94wxve/zmf8zmbP+7re+//uy73sRuCP03rhjq+efevN0uDmk+m1rU08TmACMZ4nO+Umemba0hvWJzZwzh73Xg2J1IKUBtBohFd5rgdnSuJ+nCaSinNsNpWMJmquT1oVSoUdE4hjWST6/VrVdvcERpxRErzSqya9v4YScAsDDcIvlRsmbA14JuYV3mm85RNMNa1FHr9qvdqy5TLDNXO8CKUDBWLMR4NFFLb71wNqRhynuemLUUeQUQ5o8Y4jPFtjqedpVqHtRR7rgf5ziJFOSmyjZVXG629zQZFZpK6dtelVJzVDqlzHuc9xmowrXGGmDXmh5Y27volvhtwncc51XKGkEgFhqXmw937gfczDJ26s9RMjIExbBEpOGdwTjWAsRYSBeMdKWfizErT7Cf10mzzYMTQ9wNdN5BiIUchpUI1OrMKYUutmSnuwID3Du97Yoy6+wCc86wWe3jXYYwjxcJqdYrl3im9vq3jy9NESZUYND9xs9tx9qbbsWafOBnCtjLYJbtdYpwSKYMxzdYveUzt2O0yUyikBK4a4pTISS3zBMfS9+zGHX2nmYHVqAvQNAZSUOJKSBty3ZGZQBLOW2LKxDAy7TaEaaSEESkw5cgkRYtdcwlKKbT3on5O27haBf5GFKKd0cUCxIKIJaVKSpCSISUloVlrsGKwxpBixHWGKWxwriCmEOLINO2YpglnOvaXB3jpyQFWi4GLFy6wWW842F/hnWez3in6IB5jEkYi1hSdyT8J68KFC/bHfuzHzv65P/fntv/g
H/yDy5/3eZ93OH/v8PDwCZttvf3tb1899t+///u/v3rmM585vfSlL93mnHn44Yf93XffPT32zx133PG/PEmf+Zmfufue7/meC29/+9vfe9ddd+1+7ud+7izAnXfeOf7e7/3e3se85t5dd931ZzYP6oY6PluSdgStkzK2OTmUqhR9UesmY5zOL4xT9p4xiLMUBLfvGeKKzWZNDgGL3gjFeCWj1BYwizTh9mwkNvuA+WZenSgS9QPc7LdqczdRdqPan52IxmUuAm1mgYArVFHPQ6FQrVWornEf57lZmTVsYnWG9xgB/kwrkZKx1qhZmDUMw5LdLmCsEm8QmiH0DN3Yli+IagqJOoszehPUMvRYf1Np0gu1a5sDdalJi6c47Py1Rp7hJMZhTlTQ37/gtPOrCTG5zfVacTR6tg26o9fzJO0Y1O6qOoOrzcLMaT6e1tM5B7BgbMfq1D7VZKQYcoKUdIYmkompstmNLMyCEgMpVTo3gFUY/HB9zLTZauq92WO2guucoe877fSMkGNh3GzpVgvEGGUgAou9gZgj6/U14hiwVRhWe7iuA3RmBhCmRNglUi4Mi47Ndsvx4cPYRYft52ujDicOx24T8b2niOHg9C1cubZmDEW7VOOpFELYUoJm4sWoTj/LhWd9dFVTyClISkwhkEvH6vQZYpgouZByZqjK3pSaSE1LJ7WSW8K6MVWvh7HkWPG+x5dInFTzOohBsvrAavEs1JjIKVFrxZqeXCE1Czzr3IlgoDqjUVFYbLZQEpWE85BzxEjz7sylqXJFg32dw3eWadxqtp/17MaRw8MjBr+k6xfqAmMruWZWywUf/eghYlfs7x9wdO0anetxpqicKYtmfWJPrP7+pNfP/uzPnv3mb/7mO772a7/2j3z94OAgv/jFL37CisLDDz/c/Z2/83ee/nVf93UX//N//s+rn/7pn77lO77jOx58/vOfP33BF3zBla/6qq969nd/93c/+NKXvnT78MMPu9/8zd88uOeee3Zf/MVffPjxnu+9731v98M//MM3f+EXfuG1O+64I7773e8e7r///v5LvuRLLgP8o3/0jy585Vd+5XNe8IIXbP/qX/2rR7/8y798+jd/8zfP/Oqv/ur7n6jf8cleN1T4HBVD07ZpL3Ti0lBNAluoVh9Rm2OINwYvCqslCsVkVgenSNayvnoVVyvV0Ab0YKpghQbjqVyAmY1oVDsntWqH2EgZWer1GRUZa5SQUnO+XgoaS5Mm4AYag1GaHZMWTM3x09lfacdWANvcYmqpJz6EcxdFM7MuJeG7BSKW5eKARx+5v+noWsr8POOzzQGmQZDWqj2YklLacVWVMKhTjko/aN6jqqFq0GUjqOhzW0RO7G4+buE7gUULyjwVKHVOcqgndly1ySrmglNLpuakZBzrcKajFIsY3QjNaQUxRvCFoVsScmE9blmwoq4n1psN42YN0rMclpw6dY6YE0dXj9jbPyDljC+Gg9WeUvf7Xue+3rFeTxgM3loomVP7+8QqHB0e0vWDZstJmxtjsLbDmnYGnSNsRvplZQqRzouSl4BxDGw3I9UY/NKTqQzLgVwzKYXmugO2OsjQ+xVjmjh9660s987w8KNH7KbCcn/J8WaipMy4O6brrF6fImAym3GkkHCi3XSKzbMWGLcbIJMC+KGjxozkSsiRnBIl6YZTquoCRXSjadVEle12x+Ay1lmmbOisJ+dJi9PsxylGw4Fj0Pe4t1jpSaGQc8X4BmeLQKp4a9V9xgiJCVNr22QBLRezFLUopELvHCVHrBRqjIzbiSlkjFth+xWm7wh5JGyPOHPqNBcfvchqdQt7Z28hl0nZrwdncG6i7z1prRvoInJ99PEnvJ71rGf9d1Dm2bNn08///M9/cLlcPmHkli/6oi+6vNvtzJ//83/+k40xfNVXfdWj3/RN33QJ4Bd/8Rc//C3f8i23feu3fuszHn30UX/mzJn0ghe8YPOFX/iFH7foAaxWq/L+979/+NIv
/dI7r1275m6++eb4FV/xFRf/8T/+xxcBvuzLvuzaRz/60Qd/5Ed+5Py3fdu3PeNpT3taeN3rXnffX/trf+34f/Scf9rXDZpUG7I0t5NiVOg6C8GtwTRYTWd8ak1FqsrqTK3rq5Uk0C2XLFOm7nZkW8klqq1UUGc+vQm3fqrtzkUqNPaoao3m8NXSdIOAWChFi5yI6s+Ama4ozYS3VjChgNGkaDAtoSc1Wr898e7U2qlQ0AxlGrRT1eLpyDnTdUIuEW8c281InJJ2Qqj915xrp7ZmpiVQQGlG3VLRGQ6JIrYxZi3VZCi+/Xue2xliqXPvep3QY2aYU7tJUS+5Bt3OijFzffZHwYhuaVTw3Rx42nNI6zYav0ejeE7uQ4ZqKjY3ZWZJmIXHWsNu2lDtHmCYxkDshFIn+oWllsoU10zTlq5fKswZd6QyMQZHjSpd0bwM2Gy3DIt9Ot9T44QVYXt8zCNXD1ksVkybNcv9AamZlLImTqAw7enTZyhF2JhjhZgdjLuR3U5JbXurfRbLPaQz7MIa44UQR2KY2Duz12DoFm/VXHnEOp52xzM53iWmACFV6IRUoabKuEnUpJBsrgOut1SSwqybQkkVMQMpRYbFijFssaZgfU+1BikFL5ZQK6VkSlYCS0qVki3eW4wrGKubEmMSMeww3lNNRy5CShmToAT1so0544xiBbUkYstOdM3kPaNzbW9M02rWBn2qji6FCW9MM/e+HltFBecsOUViSeQ0IrWy20aqW9IvDhiGATGGGNdkm9lsIik5huXAsLJcuXwFsrJRIwHrW5yW8drtPUmF72/+zb959P/9f//fO373d393dfXqVfvMZz4zfNZnfdZ2sVg8YUUPdJb2r//1v34QeOBjv9f3ff2BH/iBj/7AD/zARz/ez368FPZnPOMZ6bd+67c++D97zW/+5m+++M3f/M0X/0ffr7W+7WO/dnx8/I7/2XP+37xuqPAlgUgHzXuziiGbiBWhsx1FYrPiaka3GIqtTHlHTR1UT+0qziS6ZOgWCyZXee6Lnke3XFIT/OF/ejubK4dkm3GSOdErlKI3AKNi+Uohi1G9lyg8R41o92lUNlGvGz3nohZZ3nqgJT9YpbWn+TVqQXKlmEi1uRUf0UijEzmRUcJHC9cLIojpMCWTS8HajLFLLl19tHWQKmKvuVDz3H1aFaJXNQPWDmpBzlV36s3lRqQoVFwrUiOlODB6EzFSyCFgrNM5Zzs21ecpAcYUlaAzd8+zJ2cj2eAEYzM5NTsqKczGb9hKEsEbp1orKqERmPzQkaNgcqJrET21FJ3dpsj22kSVnh6FO/OiY9hbEFOhTAEjBtstWO3tEYLqPkPZgRSW/oBlt8cUJsY0kWrGiGMaJ3JILLwlpUg/9Nx8803ELHTe4owQiwWJ5JyRmthsAjVCzQVXYFgsMFbwyyXnb9Fz1nmhXzgymTPL01y+fJkzp08jsVJMwM1IeQ9+EDbrDRtxDPunuHDtEjGrBjXvRmoIhKlQS8dmPbFYeYyJiDhyjg2S9lAilQnvB5x1LAZHjFtCtgx2haEw7rZ0w0DapuY6NCAmkatu0Ioo89faTD9UTFlqUkQ1xLQjTREbIU66KS21kFPBWXVFsmIoVZm7uTYyURViLogrOgZoUGhJiWo6MA6XEkl5qngBawtJMjFmmEachWmckK6j39vD+yU1bTGyh3EdRiYODy+zWt3GLbed4eKlj0Ay+NUelB2mX1DNiuoeJRerhhcNvn4y1q233pr/NNL1n1r/83VDha/QHP1pRAojVKNswNCE1LVZXokUzZgrDrEWY9VtPgXIhNZJeCRXemtwxuP2B+74lE/knf/pLbhiKHZCmscHonOmuejSZlSIUI0joHwX23RuShzJJ+QEg1L+yQ1KNFAkN4/Rxxg9m+5EB6jGZiqMByjSooykMKcoKODaQlqruuyHENltd1g080+brqZ7bEuoDS5VcTpSNPvNaNeslagVzhmxrLXJFLQTVuam
sjqtNc2HcdZCzgG9OiuUP8Jj0g40Re3M9TkyRpTNmXOmmoQZfOu5OEmdt8wm41bT300l1nwiF8khkXKmuorIxHZ0nL3lrEJdxeLcAmc8OSppYbFY4PsOzIjG4VhiCjhnsAmWy1VzHtFIqylH9lcLasjEkLDDoMxikqbdG0eMQaFlY9jGEWsdORd26yMG3+NWK6agKNZm2pFTpJTM8XaHE8/lS1eouTCFkdVa0R7bT1hbWCwXPOeuuzk+2gGW9WbDlHUW2Ikn1IDpQZIlx0w/LEkpYWslxx3Gd0TbEaZDuq5iO0sJiRATeIsQiCnq/LzN5IoICaBzOCwpJ4wRps2G3tgTr82Us/qRmwIlUBskH2LReCCjnXQIqpG01lDjqC42pVKkgPNqk1bBe6E6IW4n3ZglxTpU36evKVUNHySqyP7aWpmZ/WJJzgXfT+AcR8eHHBwMrCfB+tMsz3q28ZDNceTMuXPgLGmqUCZsV6lWOIkPkye0wXpq/T+4bqjwGWOb51tub8bSCCX6wW+RqXTisJKhJjL6Ya01klPQGUXRWZQVRycLrj16lb2bO/aXSxY3H7A4d0C4eKTOJ7M5c2N4ilhomV4zCzHVQm6doBUht52sEYV5gJN4lVJqo/wLqcS2q7XMFiYKcSo700ptNBPtClMjzhijMKQUMNVgi6Y4GOvw1nP16jFW1MotN6gRkRNWpwhYoySbKtJyyrImRhhRSl0j9SjEaFVjOAObQttk1CZj0OJaSm0MT5jZmuq0gm5ImtnACXtVrBIySkR3BLq5qKhwulKaYWdjUaISkVQLznlqyIQpYPsBg9489bdyarJtNCrnypUr3Lw6j3OWg+VKZ1hAEUfFYkVA1BVFhgGLJU0RZ/Qc55rZ21sRxh0h7tiMa7rakWsmp0AIBSswDD1Uy7gL7B/0LPYOsDt9nNvvuPLoFTY7FcDPdga5VO1QBHrvwViMJCDSLzrkso5OduPElBL9cJpz52/noYtXuXptTYiaIJBixA09xgqxZLrOa3ZkVsOGGqsSw7yQ6bBlhfWiWr1sQHqcE/K41XPTtKFSC9Z7rPXkrAW6pIitsBgGSlAINWd1CjJGbeaKq0zblhpSahPKx5Nuv1a1LHNGma3FODIF8Q5jBqiV7IqiJMOAr57tqJ8CkUjvOiwQ4qRzyDESUyICe3t7eLdglxPGFNa7wP5yYHO8JnUdt912Bz5e4PDSEd6fJgtYyUg1WJldgyrWqoPRx1Lz/yyvN7/5ze97so/h/4V1Q++oWjM1heYGUTjxQ64KhVUpkBM1Jg25rGhHIFUhmizEGEgRUlL391qEzgqrvRXWwOr0gk/5tOeTrUGK08F8sypLOZFqfoxsQGcuRZQU46oWxlKl6eZyIztAIRFr0PmaNNJ/NVijEE/NKl/QWCJAmui8FYwqHjFeu84KNTeyQWOClJoxxnB8rAGoMxll3qzOobR60E2WX5XEk7POOGttXouPkSPUklVz2JIeVOB9vfO7vhlocpJmDHCStM48l5Q/8ke7y0KRqN3tjAnrqSblpLqLRseU9sdldS9R0kSms5ojJ7UlEHiLW/SIE0LZsLe/Iu5GSgic3t/HWUffdaQ0EVMglcLp0zexWBxQS0cKUR1eaqbrHLlE1ps1OQU1Ga+F1XJgKhPiYbM5bhmNju0mstms8Z123ZtNYNqptOB43HHTzec4e/NZxBmuHmpB864nZCV4HB+vlYBrRSFriRzsK8tbI4MMbrnP1eMdlw83XLl2jHUdpRhcN1AR8hQxBWKIanFnDEIiJBgTjNMGySOLxQLBM46V3ZQxzuJKxU6BcbNuUhqwVjczuSSMJIxkLWwpqiuMUzPylJs0pWZMmYhJo4bKzHOiavhsRhMlRAlNpTb3JLFqFWcMJQq2WGLJxJyJ0TJuA5AoXcV2HXNmZIyBFCYohSlmhr0DjO905m0t0yawsB1VMsF4+tU5rL3K0dU1cbR0fSUnyGGLL468i7hiMcXoPH5W3Ty1
nlqP47rBrVTBSNFkZaOFwYoywRoVgto8/dSaKpFzJJVAyEVnCrVQraZ+GzJSI85VusGTUiCGiVufcRu33/lMahFqFqQ4ZR3W2WdT3UXMnGJQCy5rYgNVIS71vK4nhBKg0fWrBts2hYER2xiqtekFdT5WRchFyEnI1UCx6gmahZrnD2PzzTQqtJ7CpDBS1c7IWo9xeq7qLIuA1p6pXEJDQAuUx1JOmgVZKSdxONJmd2qHpjCnklqa6LldkzmmCJET78160rGZ9nytuy2BUiZoM5vaiqlpDFhTFMaqOSmMXJs9WkhIrXjr1HAg6szIe484oUjGDRbrDTln4jRx+aOPsvALpikoRR+jWXvLBY9evMjmeGLc6fPEmIgxMoXAbtzhvCOGSI6JEhPjbqTWzGLhOHf2AOccnR1Y9CusNfS9I8bM5nikFEtKmcViIISJrvN0veX8LTcD0Hc9+3v7LJYLVvv7HB4fkVJib7Fib29ARCHRg1P7WL/kac++i2vrLamC8wOI4DvL3t4emOb3Kg6xDtt3SO+xXhEIbEfne0xJCLExajW5JJeEiYWw3jG7zZWSG4MzUUpkmnbkacIZCFOk1kyumvlnvNXPRkqk3cQ47jSqyprmOKSmDqYhCVaMJiu4jtTUHUbUvL2zjk57eGqumOrJuZDyjlxHCpkqEEIg5QgorO76nlSEEBM5BkqMdMVSdiOHR9cYzpzh7NlT7I6PuBoinD4NYlh6R297LJaaCzlCZwecUTTgyQ0memr9WVw3ZlJtLW2ER0lqpcQfSSbQ6JyxTAoLUlU/FCfCNBJ2mZgKU5mIqSgUlBLGdJSY2BxP1CDEbeCO5zyH7IVusaCieh7jjTK85qTWos9hq1LzldmvmXpSS4Mb1XnD1h5Dp4XyBHIUSsvnEykqG0CLWqnNwbOZNec6QUkwzwqrIReVBhhniTkxhahuMKLOMjFXYo1KLEFOMuD05njCx9Q5YgGyRiNlaRmBcj1NYT5ejRPKJ+dcpJBJZFJDVK8bUleK/jJSyaJek6XUJqOQpnO0rSOgSTJUK2iqYHN7Dh2pkjFsphHvOyqFWDK5GFJOxBQpqL7M+Y7FcmBveZrNZoKc2V46IuQJszSEaVQKfDewWx9jpbK/2muItiMVcH1Ht+ipwKmDAxbLJW5Y0K/2W66iwdRCqROYCC5jrEoFYkx4Z9nfG8g5IRnODUuWbsDjSLstcTMbViSczZpLaDMHpw4wCEdHO46PxxNI9JELlxC/orgFCctul4jZMI4ji95S4oil0vc9mYJbeGxvwGSOxwmxmd4bUlCUQ6jkHLDO4pylcz0lC7RoL+8d4zgxbbYQA6RCiUm7tRYYW0oiJ0UJxCp5yWBIsdI7Rw5BPTOram7VBWLWpzpi0A1TbdZgYlXOQessKQVJ6kIjXj/nPmRMCuRppMZMjpFUMrZb0HcdYcysj7fkmghxy+b4EuPxxGr/Zs6eWRJ31/jI/VdZDGegF4o1TNNIyZ5YE9JbxHcaeSVg7CzDeWo9tR6/dYMdn2mzonbDN6r9EqO7slqqJlKXSoyBECI5R2rJGKk4B6Z6uqKem0kyxasfpEywlAW+OqiVW552K3d9yt1spkk1ggZq7YGOLEY/lDVjam0RPiouLyeTv0bwaE3W3Am1Zghp7hS5qnWFaaSQaCErcqhuNFYoNlPMpNZeWeFL57TDMUZTumPI1DLH84jq25wgZs7uy9Q8w66KOhkKjoIT3YGLVVGyWDUpzmJPkiNS1fBdrWgZKwlM0ZkLqmssRQk084xv1lhqPJJVur1Ri7nZt7RiKWWmqet6LEmmVqXM5/Z9cbbZXhls1yms5S3JANZgnMf6Du96Ft2CsIuUVJl2E5cuXKBQsAvP0PVMx1tqzJw+2Gdvf8GwsHqd27WxnWP/9D7r9SG7aaf9clUIfBwjJTlKVLOEXTjCuMy5c6dZDgv6vkeksL+3YDUMXL10lbANXHj4YQRNFQDYjSM333wz+3t7WGvZ7baA
EJIj147trrE/h1M87VmfzOFRJEwgeFKIdNbgasWUTNgcUQHnO7x3WKtSA2nXMaZEmHakcUtJmRLVSFoajB2AWFWKYzPYJPjq1LYrZQarpt/TFBh8j7cdOYIxFWu0gIqoE5IvFZsyLhV8MdhccVYjgmKpTEUay7LiOgfOMLaw6FwmktXPhQ2VzfaIMY9QKy7DkIQuVTbX1upk4xx2NWClxxpPNwyEmCnWsTGJ1a03c/r0Wdgc8vCHP0C/3Mf2C3o/cHCw3z6tFdd7Eolu1VFqxjmjiSbu/50Z31PrT2bdGKuz6A3JtDT0ivpPSoNJTLM/wrZ9srTcuqLzwZxHQnYYn7E95Bqh3ZBDCojtsAK7uGVXEuduux33nj+kq0mjW4rVxG4n13U+s8Jc5oSC2vxCrSYSzLPAWq53QXNxFNP0b62LQ8jSxOYVpM3SilWmpZoBS9MrapxMjEEjZWRmhs6aQw3NVc2jMlrNrAlDSEWZnbMY/8SJpZlXK7RryHXC1LkrFb3x65CxQZa1qbC0d0xtNlJLhmqUfCTtdxUlCsybl1rmXYBey9qumWbtVaqt6vcpDmc92Jb03nSa1nnEObB6zBXonEfEtjxF26aNhuXBAY8+8ggMBkuHjAFjhWGx5NKlS5irl6lU9k6dokY974dHh3Rdx2Kx1E4w6eZB5RqecRtZ7C3JJdP1PSFNbC8eq/ZOAFPwzmien++YtpHV3j7eC8Ni0d4X8MEPfojV/oopjOwfrOjcwBQz290xi9UAwN7+AXm5YnMtkWJSSUXv8d6qTm7UwlBSJVdBQlG2ZFXoMKRGarIeUiCEgBoONMQEhaadMfTOUnYBk4Vi1blEzZLUtk0kqfFDrXjbkUqiJEMtAfJIKZPmCRa1pks5Y0U3WKFknPfENBOHWzfoHNV6JdjEwFgyhkrnvCIkDmoRQgmEsMNsg3rt2gGcJ9TEWMAtByQF4i4ismB5cBOnbzHYcMQD73+Avltiup6zp04hEkm7nd4bGloRKHSn9qhWqNaSJRHKiQ3lU+up9bisGya3NF69/mluJq2OYKuabVk4gSdKLjoTywVqxBlHNlbLT85Iyaz2l9jeKZ0+wcot2V/u0Z/d59xtdxCDVmhfMn3TkpUKsTpis3LGCMZanacVJYPU2o5Nj171b9ackEDmyBmNx7GU7JCUkZQxKWNK07Vlg0kDti6xRhmpOWctxlUe8/facv+0YORaVEzdurTauiadbhpojExllCr0aWvGlYCXNvMzsz2adpoiltqE7bRQUE1Ibwzb2mOkQ5Mr2nQkF3JuBb+W5t6mqQ7qOKeeoJzUbT3HczE8Kcoi+ObA78WSo+6EVt0SWyvWgHc6j3POsVguwQjFGVh4drsNC+OwgO87xFmSVIbVAu87nPXKTqS2WCgt4NqcGYxzzRZO2Ns7YG9/YBjUHisXwxQSRiydc/R9j7ctNcRYsIZY1F/Sdx1dM6n2zrO3t6fzXau+sF1nMVYhvxRUr7kaVpoIUgw5VlKqiDOIN+xiZIqVUgVrC50rLDrLwnu6ahlsR46FMKmC3nst1gWd1Z7IaaQyLD3FFfBCqo2VXDPOeXJST9ih7zRmabclxYgVD6XH4ElpxIi+BylAUjKMt6YRxbS791bwVmfdOSclUZVMCBPVtk1ZToQaqQ2R6LoOK6YJ4BPGdfR+D1IljyM57ChlQmxVuYQZOLvYh/GQCx/8EDV7TDdQTaRIoMQdcdriKfiSsClhMriq6RFSYHA9pv6pjHx7av1fvG4MQzDgvG0SBmV8JbSIGAqYSpHG/Gj0+toE1CJCrpFct1STmGJpQF8m24rrPNY5nDXkGEjbiEjl1LlzOL+iiqW3YIjYWnHZIsU09mU9Ib7Ugn5Q5nTzufIJgLpmlKIdoTOerpk8Yx0YM4cUKYzoCsVq/gHF4E0H1ahZbyPuYA3WeC2QtUG91bT2uLSC4NTRZp4t1uYeQ3MoKSj5Ad39SymYqtq6bC2lyStyg4SqiMoMDMyB
ukbU6k2fxbQuTou6iMFTcLO8o6rrjog+h2maRaRoKpNRmUZJEckJyRMp7sghQiwQC8agz5cnNsdXcVZTBrbbLdTC0bVDYkjs7R2wWC2IaSKtN3gc1vQMy4Gbzt+sGwaxrKeIc72W81JY2p5Twz4UQ04JamKzvkqIkxbTksg1cvnyZS5dvERNWfV6YrDea/EscLTeMOXA8foIY5V9uj0aOV5vAVgsBoVdxy05JrbrkQ/f9wBODHtuICWFOg9ufhqX1humnJjGgBOvcgsqcQrUWjVjsKhBs/OFWifGOHK8O8Yai62GNAVA2F/uaSGthpwK425HkUTtLREhSsUvXZvZGgoavUTOxFKJZJzr6b2n6zplG9vKNG0phaaNLRQmqIEUtkis1KQbnyqFECdqFox0WGPpLNhcqAFSbIzoOmFdAacetCZE/DogWegGT7cwHOyfJedIb6Fc3RDGwOKWA25/9k0MNfGR/3aBsBEW+3swCHSZa4eX2G42xLoj14myXVN3G/oYkVjonYeQsQXcY4zmn1o3to6Pj83nfd7n3bm3t/dCEXnxpUuX7Mf72tOe9rRP/c7v/M5b/ndfR0Re/HM/93OnAd73vvd1IvLi3/u931v8L37sZM3J8f+7r3+j6waDaJP6NqJzCLG93oCLMtRydcSsforGCNWo3VCZZ2imgk2ax2d6sJacJ9SPPzOmCesL/TCw20x47+h9T8kWsR0lH5OlYLLHlUohUq2jGqM3DYGKOxF8KxNSa3s9EX8LRpwy5lBXiNpE8lUakw3IojvySgUPkjUlvFZPrV6daaSSqlLETUXF9nUuPBUpprmlNO/ROV6lOWW0FELl6dCy/LAYmXusQqpa2I3YBoXOOe7NXBo94NJ8PJ1pYvLZQ7VaJSXMzL5mLk3boJQmk5zZgyK+wb9NVFEyRhIZyLkyRk3y9r3q+wwd1UFM6pjjnEbehKkg9hL7B/uEMVBy5uqFSzz44AM87dmfwDSNuM5wan/FeoocnD7N0lo6nzmMO2yJqEs/UCs57FgNnnFMCiUKLLxjNJa9xUILlPUnHX0uCWsdy2Eg5IkyKjtxXI+sVgu6XklPh+s1JVfOnDnD1eM1OVvEey5duYDLldPtExKKsBsjlw+3FCuNCFWZtoFKJZaM2KrhxqUQ0kScItug7NcUNRW9GzzVG+Juwholkzjn1IXFB4WPg5BDptREsuD8gjFmjMmYWChGwGlOSDEQ8w6qpZYdKUVysrgciERKV6mi7i+a3gHWqC0ezmGdurqUnFj1hhArWd/JFDJd1Q3uehw5GJZQEmMMiN0nlcL68BKrvds4d+YceUo8uj7kVLdk0RVCuMK1Bx8ihZ5u7xTFtiSHErChKJmrC+SaGaLaH+ZpZApbnfXmQmqJK0/2unz5sn39619/quu6+iVf8iXXnijbspe85CWfdPfdd2+bZdn/8fqxH/uxc295y1v2fvu3f/u/nT9/Pp09ezb/i3/xL27+2K+95S1v+W/7+/uPi3DkzjvvDPfff/8f3HbbbY9rrMbLXvayZx0eHto3velN/1P7tT/OujEBe0VZmE7NoVUO0GYETVBtjc6jcvMZNE1bp3dR9TnB9eBQi7NccGi0iusNGIPm7yVSmkilgO0oLlNKk04YC0wYyaRaiNWol7MRSgnq0qJtW3M50XgiHYkpi80Y2zrSlklmDMY6bFU7qNRu/rXJ13OZILeuTARRozNAySwnsz3T/DWlzCRRruvndFVtBptDSytytZxMHw3Xi7XJXJdePFbfB02GISeBnQpQVWpRCAtpnqC1JURUTbzIVIxUuqrxOKmhYlSh5EYA8hXJtmk0U0uchyRVqeoibYOj5yDFxLC3IEcVMo8xsMKxixtC0OfN08hDDz/Abc++DessuWTtTGoh7DZE55k2Eznp+2K116mWsFRyTUrozZUzN53jytXL1JRZOE8sluQMOMPC94yHh2RTWCw0LSHH1Gzu1D7OdoL1enEWqxXbznO4uYbp
HDVbDoZThO0h3jmc103rtJ2YfGC7G7EtjSKHSIoRsR6/6NQ0oVY6r7E6MwsTGpPSWVznKSUzbUesT7hhUOccA12/YBy3LIc9DjfXdANWisofpBJzwNeKrY7crkWVgjiLq5WwidjqKMYwTSp2l06IRZ2FEipDkVoQZd7oLC0XahZiUCZzdTo/tM7Qew17jilRdxtiSrhzN1GTZTNuSXHEl2MWYtnGwsEdN+MlYuOOhz7yINNRZHH2aaR+wdD3eJuxaYcZrxJ3mdFbUm+RqAhOJJCS/skl4ZoBw5O5Sil8/ud//p1vectb9gGGYbj3S7/0Sw+fzOPJOeP9/zoD9oMf/GB/5513jp/2aZ82/s++dvvttz9uRco5xx8nJunJXDcEdRrReYo1qn3TucB1IXatGXET4hJiwViPMx7fHEKccTjrKNYRbCU7g/iBbjHg/EC37NlNU7ODUvJF33msb9IC69SlJWVCzVRb8QK9CL1YetHgShEwzetybouM9Y0Qo5CNrtJmOzpfgYKUgJBbknkhpUwMag7sbI+zrkkfFIpEMiKpjem008qiKQfSiEBzEvq8lHwjjQlUEVuwpuBNxQpN46fH54sAejPTWlxOjKdz1d35zLg0jc3ZOD6I6NyKkxlJCyQVUflJVZs3VzmRaMjc7VmDuAFsT3Ud4p1qwh4rxJemiS6ZVCIpTE0rZjm1v8KWRMqBftEjVUNJLz9ymb6zjOPIdrdlHEfVj5XM8fEROVnGqfLQhYustxNWeiqV5WqJ77pGZOnwzhFiwfdLrHMMnaUzlpwzxlpCCBwdH7LdbJCmDZzSRHHCLo1MKVw/Z0ZO2L3eenbHW0zw5NDhu1MAXDvecRQz1nV66acCEciGPCYOhiU9hr7vMd5DgjxFyIVdmEgG+uXAtNlipkTKlVQivhNKM2gvubIcVlhxLJfXdYFS2/vDqZtO3I3kEAmlqGYuZ1KaGlxbyWE6IU6VmOnEYovQYXFVmh+tJ7ueWlVv2nVLctak9YaAU5uA3ZqCJ5B2a6iGvTO34k8t8EuH0QkBl65cZKqJW/YWnD3Vc+XSBXaHa/zBAcFmhj2LtQlTAmFzlaNrFzi8eoHd8RFxF5liYh0D692OcbtThCirScKT7dzyute97qa56AHEGJ8QgcXLXvayZ73lLW/Z++mf/ulbROTFIvLi973vfd0b3/jGfRF58S/+4i8ePO95z/vkvu9f9O/+3b/bf8973tN/zud8zp3nzp27Z7lcvvDuu+/+5F/91V89Oc6XvOQln/STP/mT59/61rfuiciLX/KSl3zSx/sawMdCnZcuXbJf+qVf+sxz587d0/f9i+66667n/dt/+29P/XF+j48Hdf7CL/zCqWc+85l3933/ope+9KWf+MM//MPnZpj1sT/7y7/8ywfPec5znrdcLl/4WZ/1WXfNSfHf+I3fePuv/MqvnPv3//7fn57PzRvf+Mb9j33tP+66oY6vIhjjyCmqRZdSItTmqyisZjA6uK9zUrMn4ylGA0KtgC9gCuRkqVHNpFOo+MGyf3DAdFQ5e3CGzXbk3DnHB0SjiMR4jNNOJEkh1YxDYadaRV3vmcXbpQGJLUi0avGb/Tu1OIp6Iqq6XEkuxVKwJKEJ7hXaLdmqO4bdtt5N9VYFd/J8c6JEoc3NmtxwZpb+UXKaJlrraK9Qc9KbXFE5QbWmpTVAtqLhrFJxLUa+kkkCrt0U9EY3p7Vrfh8npBVDqbO1sLI0SzVsJYHRc1jLbGemnWyqnlJd03Vpgc81q21dk0zEqk4h0uaDMUal1RvHwdCzW18jWstwYJgOJyRVNtcmLJ69/Y6pZqYpMvQdq9WK7DpiKqzOnKLrHOMYqKGyd3bJYjmw2WwZ9lZcunwJK4VNFWICSyVtjjHOUfpeTb/Fs1wuyaKSG9MJoarzTzIG64Z23mA3rZXBKI48FRgLw8GKxMDZ02cBSAilVPb8gsOjNTjPcZgYuh7fNG6DNexKZrVcEnc7SoBYIqYW
rOmoFtzQIwgmdbiukOIOcgSxxFywfacdoHM431OLWuelGNpupqoO1HqiMZAKNhtiCsp+rEAYKa6QqcScsMk0+N3jirKgJfRU0+E6ZXEaUf2cdVa9QhtSUATGayPjNmBzZLn02BJYrFZIha7r2R5ZVmduZhgMaTwkXXmU3cVrpMUZFrZnr4NuuoSrwnZzzOX1BRXw5w4XBbOL6pjkKqXs6NLILozknHDOnqApT8a67777/Dd+4zc+80/itX7iJ37iwQ996EPDc5/73N1rX/vah0A7sQ984AM9wKtf/eqnv+Y1r/nIJ37iJ0433XRT+tCHPtT9lb/yVw5f85rXPDQMQ/2pn/qpc1/8xV9817ve9a5333XXXeHXf/3X7/2H//AfPv29733v4td+7dfu7fu+Any8rz125Zz53M/93Ls2m439qZ/6qfs+6ZM+afyDP/iDhbX2f6v3fu9739t9xVd8xZ1f+ZVf+ejXfM3XXPwv/+W/LF/96lc/42MfN46j+f7v//7zP/MzP3OfMYZXvOIVz/66r/u6p7/hDW+475/9s3924X3ve99wfHxsf/7VqnEMAABRR0lEQVTnf/4+gFtuueV/m+57Y0G0NN9HY0k0b8raxN+2aR1Eb0QWHcwba1WQbSoiHhHBUUjFY/GQN8oEHAWZAsthQTaRo+PLDHv75NKib4xHxqQ6NAsGj+CRKkqsaRIKyZDxRArZJERHORhJlOY0Y72jpIgxOk+rRou1b50VjTmpZTQ1k+raduD2hNcKBlOUuFKJKt+wzVFeKlJS+3sDKkW7/9paMrHmuoygqTJ0R2HAWUwuuIo6b4hAzTozzbNeT6FLYwG9IkqwQB9qjcZAiSvUqonu0hikRTQeysgs8QCprnW+EakG7yq5BgTBlgX4jDOCzXrzxVmMFXYh4UsklMyZs2dZ9R3H1x5VGDzDwneM1rCtiTwZNus1pvdU6VjtHxB3kZIS1kLnFRVwq31KXDOFyPo4EsaRvl/iF726iIRMrkIIW7yBrvfkWtR3tEI/9HjvSVNiu9vhO4OzRhPdrZCT/s4x7dgcXyOnSu0dJWss0Rg6/P4B3bAE9NpUKUjMujnJBQkJrKOKYTvt1MbPZmyqHI87chVqUlSkHwxTDiz2OuI2YPse3xctAFT6zrPZbdnVwOAttSRc5xCzoMZIbxUI3zRzbSuCzQVxXgN5cyCXqrCnq8Ss/q/WCKZq169XtuKsBZKGzbpBdZzJULxgKyy7jqnohLvkSf1Si1CWC4oxXLn4IHbvDH3niBlsbzi1t6S6NdsrV7jy0MPQn6Mzp+g7287RyJivce3iIxgPYhzGWCRPMCmRJoZIIBC3h2Qy4nSO/1i05E9ylVJ4xSte8SdS9ADOnTuXvfd1sViUjwcVvvrVr/7oF37hF54kRZw/f3736Z/+6SeBuK973es++hu/8RtnXv/615/61m/91ovnz5/Pi8WieO/rY5/v433tsevXfu3XDt71rnet3v72t7/7+c9//gTwKZ/yKf/bERk/9EM/dPOzn/3s8cd//Mc/AnDPPfdM7373uxc//MM/fNtjH5dSkp/8yZ984HnPe94E8NVf/dWPft/3fd/tAKdOnSrDMJRpmuTxgFFvUM7QzKGNWjCduEq2e7Xm5RWM5MYw9M1yrGByUblDFWy12OJw1eCdYQyBrusRLDkVhsUCu/BMJXF1fUzBkhuEhhhyqdQEZBVtI56KA3EgHcYOGNuD8WC0tptGtRdrqZQGARZNvW6+oSkVYlbRPbniq+CKQkYitdmjqS/krIWzYk5mnRYwtSqTkybqn305yyxypOnxgHqdn0IrQAW1Fws5kXJSNw5J2KoMUQNKgbe2QUBzAK8+b8qBWuMJg5TGIG0mZHoNW5aaoHCnFmLaEauesZIIcUMuO41FKhVJkZr1Zqpm5WrOLKViSuFgsaQzhmuXrxBjJSUhTZlxN5Kqdog5QQqFvltysH+gxIpacMZycHBK44dyYX2kEOX5W27h
1NlTiO0JSbh85RpX14ccbjf0C8+5m8+wd3of1y+00FRl31px5AydX7C3POBg7xTLbsmyWyFZY6v0tAunz5yl6xZY21EMnL3tPEmEbrlPbAWyNLr/ZtzgO8807lgOg0pUqt4krTV4gfXhEeMYFK7OhRgC43ZNDjsoEdKEs4aSC2G7w5SCxMj+coW1gjWFEkdKzg2ed+SYqfN82gliMpaJrp8RBkOpkRjUZCFPgTCqgbTBkLMycautJJMIdQcmKvGsGxBrdUMRMyUoCmCqkENgihkxHd3qLDH15Kljd7ilTAlvDOdv3sOWHXm95sqFyyS3ItqOoeuIaUuaIr1zHF27wLQ9Znd4xHj1mHB8xHR4henwMuHoCvH4GnFzRBh3jOOkIwIKpTw5Hd+P//iPn/2d3/mdUwD/5J/8k4eelIN4zPrMz/zMzWP/fXh4aF71qlc9/TnPec7z9vf3X7BcLl/4oQ99aHjggQf6/5PX+a//9b8uzp8/H+ai93+6PvCBDwz33HPP9rFfe+lLX7r52McNw1Dmogdw++23xytXrtwYAfOPuW4siLaJlzHSbpgtkaHSpAO27c5UrWtEXVhKSVTJSiTMBus14kaM6gJN32G7gWEYGMPImCN+6Dl1cJYr144Bh7XqAwpWIcSqXUcGUrXUlFsBUkJGKoaKVT0TQHVQO2YTMDFVo1uqenKKdOSqPBBrBFtaxFGpZAPVNJ5lbenz8xytFUUeI3qv0my+MDBbt510fo9ZVeHQ2fatlIK0mze0GKSSFVIqTr8nVY2Ha9HXxCItQd0gVKvC9pqr7uQrSGmdZntNDfPVXY+1VqHhpsXUw7HElgloshKGrLH0ppkaF5hi0FDirHCr6z2UytUrV6kVHHo+BztQss4AjVUZyDROnOsGhZGrUmJjiqwPNTmh6zt2dsu0nQhx4uLVR1kNZ8kxsr9agcmEkCBFbNGbr9tb4TpPypWFOMbNlvVujbU9B/sHUANh2rFY7FNRtidATponub9cEKaEdcJ22lDsglue/iw273w3AClpuniyaBZiLVRTsM7jO08MCXLGOM80TaRmrl6AEhMxbqkZjtMxBPCnPL13bNYJSYqWuJLYTWuk6xTitB0hFGrKuK4jTKnNYhXCNlZnYCkum4F5oKSCywI1segHphDJZHzXq7YVC7ZFPYshxIRbViwVV1ryiFOC07TVWWISw+mD0yTjyQJDvyLUwNHlDU+74zydDYT1IUcXPkJKhro30C87vI+IE7qa6TJsH3mUMrkmYUnUuoOSKWQCBttBNZkS6gmaoQEhT07H9+pXv/rp899/6Id+6KQ7+aqv+qrnTNN036te9aqrf5LH87Gsy7//9//+03/nd37n4J//83/+kec+97nTcrksL3/5y+8MIfwfnbAnOmj3f7Scc3/kdUVmg/0n4LVu6NGihaA2OrnMZaS28FcxLa9NI4CMM0r7bpx52yj2WSqiny+c7XjRi/9/VDnDwx/5CKfOnuam8zcRcsEPPVMM/EH9LwhCqAVKRtPX1YeyzqSPpmCrYsko89NUc2L8rGGgloIaUeecwfSteimZhZI0XLdkKBFbLaZkLTSNuMJM/zdAVvG3NYaCUfeaKu3DXCjVUxuDzmI0tQIUEpW2cch6TnQm11ieWc2pi4FUK6YI0tz3q5HrgvaWwK7XozFOoRlT6/MaMaoVFGU12qb5ox0vFbWAM1Bz1KIpntwquy9eCTeutkJ/HZ611usbUyqxJvJuRyoO2/WNRCqkMWA7oeRArZOyfslcvnaZ1eo0XdeT0pY77riDw8uH1Kzmx0Uqw/6C4mDhl3TWslwtSSWTS8V2AzmOlN2I75dcuXyZbm8BxhFrJUqmSMF7g6gRDW5wZBLOygwwM6yWNAEKvTNsN2uSCOdufxZJPLtmWSZOZ8kTWXVmQwdWcEYhRe88tQRCLKSEMmvL/L40WOdIdWLa7Vi6Jb7zZMlIv0BMVkP03Q6RQgojISSm9Ya9vTPs8lbZqU3M
n2tBTMUbRxgD5EoeEw5PlKAuSIAUwWObO1A8kbIYcZScdbTY62dUpNAByWi4sjd6zOMOTp0+hY+FGLeYzjB0jrgN7O3vk3MglS1XHnmYabNB/AG293Q9pLBWXWvnyTURN1ukLBQuF0vNCZGMRe8RZcpUE8DsAYW+HxAKNTw5QbTjONrH/P0EHcs5y/ve977hiXhN732ZjTX+V+utb33r3hd/8Rdf/vIv//JroB3gQw891P2fHsMLXvCC7SOPPNK9853v7B+Pru+uu+4a3/SmN/0RYsyb3/zm1Y0+T9d1tZTyuOyCbgjqFNEbhOSi0FebVVURirGtOFSoLW0d1YbVVqSU1g3BJMY6kkvEiPDIhUfYP3vAGCauXr3MhYceZHt8jc3xJf78Z76E3iltvqDibqFq6KgzVFOoNWNoRI/6mIzAIiduKcoUSU1/B1hDoSfmNvivCStFZ4oiOjezKsq3qNi71Pb7iOaSYSvGqS1XxlDFNvG6FkCFFptIerH8mHNpTmzPUjEK06Hmw1Zs66hbIS9WE+Qpeg4aacaUOpNRlcAiQs4C1WKtO9EwqkbQUY2nitO0CZpsIreCZkQDTGvRRAM1K8XiybkylUaeKNdNjTHqBmJj851MBaLqA6cyUu3/v70/jbVty/I6sd8cc8611t77NLd5fbx4L140VUEWpIOKTKTMpLCrUDUCCgsJfykQCKySwIIQxpZT8MUSsrOQUkK2kCjLlYnATcpgLEwmVbJlqVIqCjBFBVR2EEFERkbz4jW3P83ee6015xzDH8Y8N0CYzBeRD6eTOCP09OLde+65++xmzTXG+P9/fxhTZABPeCASSMQ4sjbIw4ZhnEg58eTpI7DGNI5M2y0qgbibqCkwnZxxPV9weXhM1ZWrw8Kixumdc+69+BJLU6bNCWWBi4uF4wprGNidvUApgcfPDlwfG5a2FA2sLfDs4J/nq73y4GLhugB5y/bkHifnL3J2fpe1Gof+dUUrDSVvxw5sgJyMebmklpl1XVxmrk7JGcati4hUCGmEmPx90kfeUioUZTudIrIhhpEkiSElhpiY8kCSwLIcQJRqBaSRR5ABSOqsWRVSAKtGK4I2CLE54afCJo4kgzZfQ9uDebxQzg67vrFbSFCnBglgFV0Xyn7mZDojkDleHTnuLyH6mNRy4+zeOSkKj95/j/3FnhC3TNMZQxqoc6WtHuJcAdmMHe6gWCueDBIaGoymldZ5sFEii3q4dIjJ4TP8mjQg/MRP/MSXf/RHf/RrP/qjP/q1P/7H//i7N7/++37f73v4J/7En3j4L+PvfOONN9Z/8A/+wckXv/jF4d13302/3CH4sY99bPmbf/Nv3v07f+fvbP7u3/27m9/ze37Px83sV30w/M7f+Tuvv+/7vu/q9/7e3/uJv/7X//rZF77wheGv/tW/evbX/tpfO/tOvt/nPve5h7/0S780/dE/+kc/8rM/+7Pjj/3Yj939K3/lr7wAfFv72zfffHP5whe+sPmZn/mZ8d13303LsnzHP+u31fF9sirBCkE6/FkTIuoJ2e2IYSQTko3I0KDWPmpsGLXT9wMyQ4gLppF/482X+EyDt//zn+KTacvZ3S0fOTnh6dMDX/ylf0x5/31+0zqjWilagIjqwUUfKtTgj8FHjp7Fp211VmcLfKqDoT9ejGZuWxHxTuWV1z7CulaePXzESiBrQy25jkWdktKkZ801ex4MGqx1c3wBZjf1WyTHbmcgEEoAKTRThuORYJG3ug9Xm49PAxCT9HuCm/FOw8T9eaF3tEGNFm/ILp36YrjCT4qzHC269kW8M6lakZCx5vEz+jwYEMdxtdAZnoEoCasNxamfSiWFwCZkmjQf77YEUrFYSDVAisSUeipEIKyBvBkJ0T1kMmSGJMSceHp1TV2VgR0WK3IycHJ+F7GJ/eVTsMZaE0MOTCdbPvmxT3G1v+RQjhyuZzabLWV5hSwREF546WUOZeb9b7zdweCBN9/6GK+99gYhDqRx
4OnhGZcXz7BinJ+fs9aVPI7MxwOH44HvffV1+LEf57f/O/823/zIa0geWI5HD23NE+9fXvHg8uC8TSDMkWxbqiqaE22u7PJIWasTVfBRciueTG49C9K0cX6y47BeoxoZ0h3ncu4PzFrJuy1DTh5ga0bQBlIRMU6mkblWUkysa2A5Lv0m0BPaa6k+srcDqY9LRQ01JyJpWAl9cywSILnAiwCSogMbogOxJUy0sAKFbY3Ml3tIysnJCUri2q4ZZMMiSpLA+ckdmlzw8P2vsn92xZhPaCk5vFyTM3g5YhIpzVgAHQOs4fnESPodXFDxm2YxCpWUR2QaWHGYuvzanHv87t/9u6+AK4C///f//nQjxvgdv+N3XHz0ox/9l+JT+1N/6k+99wf+wB946zOf+cy/Mc+zfOELX/i5f9HX/vk//+e/8Qf/4B/82G//7b/903fu3Kmf+9zn3ru+vv5Q+G4/+ZM/+Yt/7I/9sY/+4T/8h986Ho/xjTfemP/Mn/kz39Ge89Of/vT6l/7SX/rFP/2n//RH/+Jf/Isvf+Yzn7n+k3/yT777wz/8w29sNpsPvMD93Oc+9+hv/a2/dfqDP/iD33M4HOSnfuqn/snv+l2/6+o7eUzhg85Q9yHYt92b3tY/V3vg+7YnfCP6LtRHnt6ZRnHA9A2wWG/EKwrmBj/Q2seFPSk9hN45A+Y5gC247ChY9gzBUFBpmCUigRjMiS4hOHNxyND8TlwloEFIY2Aw8XFYp3arrYglBhOYRmoMDCoc15XtNDGMA0rg+vrA6WbDbtoQJPDo2RMqxmAjuzs7/qf/6z/J+4+e8MbLr/P02TuMeeDJowP3773E+at3GGXg4aMHVCrXz/Z87I23eP+9b/Lk0fvMy8xrH/0om90JTx894dGDB+x2J3z842+hqixL5ezOOXmTuLzcey6i+MV+2EzsTnws+qlHj/mh/+h/zN/7q3+Jr7/yEmszDtfXbDdbDjXy7rPCu89W7nzhy/wn/5s/xZ/+Y/8JX379LYoqdSnUuTAN2VMjDMq6spsmdC0egoxx2M+MMtIWB0e7xzERrBLmFR0TcbtBwuhj41ShHqAunbUaOc6H7kP00aQqrHV2aLglYMXqnvp0z2U5EsrCsl4TUdSK73qLkpIRs085WhgZpg2pk25sjEjMZFPmciSWRJtXwmliHDdIyhwe7WnAMa6cn55ysjWO+/d5/+1vEMpIki0hRciOMpuGqUMPDM07Xnhl4p/89N9AFqE2Q0ImxuSKYlVUAxYrJgXsjN3ZHVpQ6vyM791f8V8+ewYfsJv5/Oc//+mU0v/jU5/61PV2u51/5T/xK5eq8hM/8RN31nUN/zLJLd8t9cM//MOv/OW//Jdfeu+99372w/y+h8Nh+tKXvnRSa/0PPvvZz37hX/R1H7jj+60nJ9zrtHnUaCGg4plogtGaL8YDvn8gB6iZoNml/rIiOSIyksRQjRQb+I3fc8Zv/KF/l28+fMzJySmvvPQiJ9uJEAbee/cRcRz4z/+z/yvX80OWcugfdhfHaPBAzGje3ZkF39vF5qNRhU+VlR/bP+E/PnuJrwyJVhuGY9V25/BDv/vf42z7Eg/ee5vP//T/m2dPn6E6e9hni+4dtNq95u2fmg071gn4VmCsVk7OzrEhU60yhIS2lfl4pBbvfh8F4e2bENn+3W5ikkL4VmKCJ9wHWvOvs9AIosQY+14VAqMHzAZFohMdPUleCZL8e5kLYVLvRm+I/Rb0OZrsueMdJXaRTOg7QrNKaH10HGJnrirJ3D/WrEEaGE/PXC2qjWkzkqJQa+H6sKeqIjFhBnkaubrec77b0Oqexw8esd1t+2NeOR6uePTsyDBkdrsNuzzxS1/9iu/lJDEXePzskml/zac++Ule/8graGvEJORpS7u85uzOKY8evs+6rKxLZbfdstluaerCGm2N4+wis0ePHpA++ipxSpyeTjx9esXlfqGWDfvra067OCrm5Kkah4JVZTNNGJ2IUt3f
eLjeMwSQJCzFf23a7lgFdFasdQXmILCvDJKotVIkIsMEWUlxQ5BE0EgrlTGOrOuB4+UlY94hIbLdbCnF0KaMQ2A1aLoguC8zRIjBOvEmeChuFISEyICQ/DDMkWjGSoUg5KpcHVdKDWzHHWkUSqtAhSETbeTF80SOleXya1w+fBeZlXHaoTF3yNmM6EJdCm5cHLHRsBgYx1Pacu03aCkikihlhUC/4UoQhCQBqpGiR2rl/z9gVIsIv//3//5nv9aP49dr/dk/+2df/IEf+IH9iy++WH/6p3/65C/8hb/wyh/6Q3/owa/V4/nAB9+DLHzDl2M+elGjiDCIEmldQOD7sZgFHQxKJi1dUDFFLAZC2PjBiFFtS0gj3/ObP8O9mLjeX/Hzj57w0v0X0BgYX3mZvDvn5/6PP8XTZpQwkFp2BWJrtC44yeoZduooZj/4oisOu8KAL0rk56I4moyBGAIvnIy8upuoMbH5732S0Tb8d3/jbxDiSApQJVAD0DqAmm7yxk+q2veWbuZXpiCcxgybE446Y81IacvVUlisErPbIG7CZUOIdCbN832cewt5TsP51ql4k67g+Yf0w6+pYV1R6krNm9zBgCTpeW2e04Y17CZ9/QaRJv76tA7FFsM9DnT/FQUoCIEoGYm+u2xrIcUEEsmbkavjTNBGDEZOQqFyfT2731CC3yKEypufeJPP/Obv58n7b9OKUnXiwaNrNjvh8dVjPnr/LU5PM5dXV4RBONnsmFfvjEkTr775MjFFNrny3/3czxEJlOPC933/b0FDIk8b3n/8mM321MftYc/p3btcXF5x9/Qea2nszu5g7/uK5vTsLu9UdV8njXG7o11es8xu6k7JsVDWlLYUogwc64EYtUPX3dx+enLK/skTDO/cCZXNdgOYq5Vb7T7DRkrBo3zWFUluS6mmjuNL2eN++oFqnb6TouMialWGAST5eD+JcV0X1uWIhYbV+pzbqtqwEJGUMDzPUawhKdFEKGaIdg+lBGz17MwYIyEmyvWRSiNmtxBtNxNnA6zHC64ePeF4XcnxjJi2bDaZaoXFGqrlW5mPMnoivU1Mu1d5cvkl6Dl7wfwG1W+rFAsZa6CiaHF/4w0H+LZ+fdeXvvSl6c/9uT/36sXFRXr11VfXP/JH/sj7P/IjP/Lur/wn/+XUB9/xtUYcEs0tYJ63ZsnvMs09TFo9KFUZXcZNcaNzaFRzNaYQiCSQA0kibU68eO8ej+cFLTNnr72GxIQmI64LQzCSOdOxrLV3Il1cEhIibplwgYZBUFrQjlCjyy/99xwibL1DS2iLvP7Jt3j0jZnrB+/xG37Tb+C//Js/har4HWtqNCukG22oWVc0+sHRbkQmYo5YMeE4zwybrQs/grGuFW2ugFS7AUsb/if7wUb/NXO1oXZVaghCDE7mxzqKLThDsZbGjYciSOyevA6ddrIzURxp7Rl6PfOv2rdCZztA3PrNgfULJuagbw0BE/++TpQJtB6sO+VMNuGwLNhm6+nyPZooWGNZKq1VMhkxQyNYMGTK/Owv/AJ6vGQYRl796Ju8+957FDtydTQurvZsZMe42XF295QYhLfe/NcZBmFtB15+/WViShyu9rzw0is8eu8B+ycXaGmspXBxecHZ+Tm77Yb9Yc+d8xMunz7h5GTHfH3FOI6cbndM2cVvpvjPkjPH/SXzsrIcZ1SzRxtl/4hoVaIK+3XteYqNZZmRcfAdMEYeBqyu7I8z87oSZaBRPH1DBIYIxVjmA4MIkjzPspXFY4eaMg1DH6ooMUcOa6PVlThESlFUE+tSaCxMeSLHTK2NmCLrsvZ0iIjW4jdG5p9EwYEJDnwW8jBRrJKtYWWlVljmQtpkTjcnlFVZro+MU+bi0SUnd+8zpEooV1w9eYfrfSVu75M2d5CYO3zeGGViXfz9I+KQC2mG6MjZ3Td59OhttMyuJMV3mZh6WkoesZbReqCUhZz8fa2/huSW2/pw6sd//Me/AXwo4O0Poz64qjNk
j7LBxRVohVy5QSsX9WV+6OnoQ00MlmihdCHEQIvNdyTmSsMYjf31wi9+9RcZU+Bst0MGIW0zClxcXlKvn2GiaBCGELt3LaDinZIrSwJiGQmVIAtm1ceIYXh+8FVd0boSRLHoUTjLUlgvD7zysVcIaWC3i+Q0EDt3M7b+4Q0eNSTiO7SUJ2/EQvO09iRUjGOIXJtSbEE5dieH+lgR6/Ds4EQbcXOuC2bcOG7g56HrMF1kY4ZQyTSXugel1doF+B4wGkOg1sZa1V/SbuMwvelGwTT54QhAJER5ntf3/LKiwSEAN6QarUQTaOJJ8mGklAi1sbSFMiWGsy0hZ8IUyClS55WqlbWsSM6YBM/ps0DMI//Wv/0/4N1vvM1mu+Hqek/IMN05YXP6IrWNnJ/dZ9xN3H/hHvfu3iVPwnQSsFSQDOvximfvf5NsK9dPnnL97JrDvLCfF/I4cXZ+h3un56Tgo7TjPJNy4uzklLKubKaRlOT5T92sQnS/Z2kzl/tGNWEpx+6b68i7G7uMNc53pzQTQhrYWORuytT9tXfbMTHPC0lcNam497SZMgz+a40Ome6q5oQwAcNakVoxDcRh6KIUQ6IDC4IIaRC0Ca1OVDWO9ZpQKuRE6jSfGALtxpepPqpsBFBYQkGzJz0Mwd+3Mkbq5cp8EO6c3mMYIiuKbRJrrQyy4/xkg7Qjzx5/gyfvPyPFc8bxRcbtGTYKrSswVStiESPRIqz4qLzMjd1LL3N672MQG9GaB9yGiqSMPGf5RsZ8SkNZ5sUtGPpry+q8rX/16gN3fNVGQmmk6H+qBO+4Su+2UoxIjC71jqDmd/zepfhuUFvyIMtanKU5CruzLR/92Js8u96zyRNtWRCMbJmlCI3A9eFA1YDpwA3iKUvyrq5L+ZFGa36XrQjaXM9mwS9cowmD+ohljSuWIkEHnjy54Etff8JwkvjGs6esoSGyErIh6pgv1UAO6fkIsZlfmJq1fmiZH1wCxIqGmWRCHKqT793PjlIxleffx1WcblgX6SBrc9GKhtpTF7yrc2+ei1b+6a26mhJah0/n+PzromSqGhZBrPI89BZXQd742GK/oGpTQuePGopYJYeMhuiRRZin0wchdVFCWVZiEKQcqbKSYiIH4XC1p7d+nv6tjWma2Jyf8cYbr1NCY7M75ez8nErghZfvcn5+x83KWhELzjVtDSGw2Q7UKkybM64vL3n88BkP6mPee/cBy7zywv37qDkh5fzszBW9Eji5e8rdF86JQCuVe3fPyDH6jUPvtKdxw2VTllJZW+Byv3K9VvYlUFe9Qb0ScqTkwBAnWlGk0eOwhBQjOSttbeQojMNA6tl5OURKVUTcFhLSgGlDxgRakRhopVLWI2mKzPOeadwAK6ozao1ajLIqVRvnJwP740yU0QNnVahrQdfV8wi1oq0h1dmerrJ2W80ahJgyIhNmkVVgzJFUG8cUSWngYr1mmpTtJjIXJxqcv3SXaHuW+Zs8fnBEw4bt5gQNSqsHchIokErlGBYqQmAiMcIYUS3YspJ2W+698irr/E306TXBoIi/fyXF5wg/iZEcR9b5wFobv3qB/m3d1j9bH/jga6E6kisILTh/s9aVHJOnjROcjm+lJxMYN3l4LusWJE1OGQmFHAe/gIaGAfv9kSVUdtMprTaWslAI/MIXv0RpCjH3IeDqBJPmEK6m0ELBoqsRaT4C9Pgiuvwf97GFgdB3cjEI+8sDP/RbfpDHh8qT41NefPEVXnvjdR587UvdxC6+4zEj4KbfGAOt+Z166GKP5+nq0mHVNRLYYulAjpkcN7S29jGpd8jfUtPemM9bPwj7QXfjSTRncIab0WgXo1hPaAjPR6d+jY4SEKKvNiXQtHULhhNaQpAb1nEfifYuMMjzcaQFH3kG9cG0Et0vGV32bm32TWdM/rO0goRCMfPnXrx7bard8+9d6naauLq65NXXXiWngGllO45Mm4lWVnIaCNZoVXn65Bn37t9HglBbxVDefvub
BAucnd7jn3zhi8yHmRDg9OyE0irJlMvLS4aY2JydsBwOpCgkgyFlnj55zDhObLdbdh7rztXVJfura5CIycTl8chclaKRGIfnO+JaGzS/qWtUNMN2t8OWwrpWagORxLI4icl6ioatlVoDm7MtZZlppoy7HXq1J+bBR5pRaGUmhMR0Et2HZxBobvK2RCkLko1lvUa1kpJ/7pIkamnUtRAxtLbOlu25i0RQVwsj1V/PptS20G4M+NcLJW0Im8xxv0caxCasRZlOtuSgDOued957mxLeJN07YdyMVGu0eqQVIdbsCtKcCINQS0RrIFolxIG2zqxDYjq7x3j6AvPeSTEt4hYihKVVJCT35ObEWpw81GG03065QehD8LTd1q+v6q/5jbv5X1gfeIYgCTQbJUMJnm239VtpEEMFNzgH6Wiu4PzA1hAxYkqYRu9Q+n/XVtlsBqZpYhwmomTGYWKY/EB85Y0X2Z2f0YgdWguSsistg/8zxEhM6gcxo4eEiu81fOfnh4KJuarRXAYuJkirXB8uyJPwkVdf4ZUXX+TVj3zEPb7kDlKpz7mUAE0dXSWtIdqZH8EpUBlhssggye0HFlFtnJ+fMG0mchrIeSB2zqZ3ffRMu9QPvYJa8Y7VcLOzdSJLH3/e7OVU1Y9D64rQ7tnCfKzVrfk4fzN5N2k8P3QDYGqdhdiZlPTXz9Ef/pwF6XgxP/zyNDCMfiNSaD6KprJQOGrvLk0orT3fYYokdien/Guf/tc5Pz+lLDOmxt07d5nSyLKfWY8rT55c8PTiGcM0eLJAzEzTREqZs9Nzzs7uMI4Td+/d48UX7jMOma+//U1WVWQYEBGiRMriRJNogRSEWgqtNYYhkZPfxICzRsdpYnt6wvWhcXFZXKFswfd7PVSxHleCGa2uGCshFGo9EMXFRhaFOHlsFiI0U9ayOntUXFgyjCMpJdZ18RxKCQzJd6ZRAlqL20a0OkRbA636zVbOkSCR/dXe1b5aUXOKDVEISTz/shm5mXtcNVKIrMHQ6H+HqfnzgjFIoMwrlhInJydYW0nN0MPKfNwTopBGIaY9Tx6+TW07ti/cZTzJxFzACkkyoQXqslKCK42tgrGQspFUadIgQ9WGxZF8/hJ2cheV0YVmMmAhsGhzRm1VlqUQ8G76hr70bdR7Zlb2+/32V/7S2/pXqfb7/dbMCvDLCmc+cMenNVKkEaKSEtha/QMUAvpPdyuIh7mqiz1iClRraFmxmJDYukctduCyd2Anmw0pjT4ejcYL989YkzHm0XmC0dyLVhVrHgLWAKwQrNJColkihpWgFVVxOX8fVYkWp5zJgCG0qKg1hu3IFZV1Wbhb7zgCqiZXlolHAfkAphGzd1DcjOKCX7Sa+QGR1BhSIKZKkwqrp10fjo/7yNItBO7ds5575rQbb0x95xGCuIin49d8hur+vt7n9bR2v8FQ+xYA28Rz+hKCFodcB0lg4qZm1ZsTz5MbAn5AdwA5IfpjtZ76Lj2oVA2oWFyYy9EpOa5eIOABwTcGf2uOR/OO08fARaHl3LmRkSEndrst8+FAa76nzDmy2e48DXy7YT/P0CKmSl0rdS1sNhtaqVxeXnDn/JSrwzXzXNicnjBuJ5LxrZRyM3QpaEo8ffqUlEbWtbCuF+z67i7lRGkuILm4Wohp4mr/BFGhxeKHN1DKzNV6ICeweSUGYy3GvPhzNpxuASOkhK7VV61VsWoMZ5llnYlDRpKSAv66BDehmyVUjc2Unx/Iy/URK5WgUFmICSfBDBMhZrTBOLrp/mYWQB+fa6mE5Dcz1ndp1l9nV/k6rCDEyH4tpDFzJsJgxhFjqQckC6ebU5IeeXD1Dvv5gvH8DaatEtUnEaLau9HV9+qyUOeFIYv//bUSlglLytoKYx6JKZK3Z4TdJXZYiGHwa4H4xUiLUVrrU4T2HKD+7dRnP/vZy89//vP/h/fee++PAvd3u90hhPBtfpfb+vVUZhb2+/32vffeG1prP/7Zz372lzW2
f/CDTxpjGpBWEQ3MLdGiIQZJW4+3ySTzu0llYNXGsPELZwgZkeJ+NBuxcYKqXB+veHb5AFt9zNas+cFVEw+fPOPrX/k6Ibpa7MAVodWe0gCWtZu08b2fOeUhI31HhoszAB/BBQYRakuEAlkEijBMAcKW+Tjz0U9+nJ//23+fNMCMsdEMWnxcS3AptvUD1JKrRKXnsVsjS/bDI3jCgUjsHrae8NBvEsCVpc7t9N7JUN9XGtC9VSbdhM5N9mCgaffgxeAHXQskYAjKag0NI0X9BiTh6d8Bc5EKgSBdmxoNKERGIGGtAYGC37SEFlFziHC86QjX4qrQZETzTu5Iw+oN+1O9220uww8CQQZqygxnO66Ol9S6cn5+TtTI8ThzenrCcV0wC6SUWGrpknz6vlJ4+uyCIY0cr665ur5EkiCbDS+99Safee01WlNSEE+xyJG5rNRaIbi5vKkxbnfPGakS/a0fo6Cm1GPhyZOFQgI2aFSqLiyli1uCkaJy3C8kc08ftZDUSTHaFiASQ2acInM9YlFJ24jqwpSE+XCJ5IAERTZCzJm6rKylMuTRn7egzNczuUXqcQUrLrQCTyUPiSwbxpCIqgwSiEVRdTapRkOHRhEI4vtya4nQJjQ2YgRh8fzJdUskoSHx7PIZZ8PItDvl6uBQgE0eOM6Pefj4khjvMt7fYPPCYYEhJULZQ/EUhZIqthaCNqxuMB1oVknByIeKhIZuGlL2TDkxbhKaPc+yhkgOmUzAUqPZ6ti3CEkG5DtLXfuRUgrvvPPOHwghOJD1tv5VLjOz0lr7ceBHfqUv/sAHX7JKrE7+WKlock+b75n8Tt9R/YG1LmADVSA1j9NpbSUSkZBJaaCViq6NISTKcuD07GXKrIwx8XS/58HjBzx78oTv/f5/k//i//aT6PFIqN2crZWoSjZnZK5B+7zQQKMv3dVQ/daP57LoSmtCShvq6oIBCUKZZ85Od5g1PvkbPsEqrlode1q1BSGQoYmPmUKh4d1uDB6NpKZEogsNDn4g2ez4qtb8wlRrAdT5jSH20a/n5om4T0u1PR+vmfrNQBQHDd90dm4m76+2+qit1YZEiE2fj0SDOJYKpVswfFzqwbmOYquhEpIRV4PaUysEB1MHj53yA81/xmChJwQEYszUorTgNpZIT3YPoNL/bb7ryzliS2W+OhCzcFwOfiMVYL8/QBIC/rjGaev7qz4OX0sjiFBbY7vbkIbEo4dPmA8HXnj5JS6ePOXFF190Jae2noHYX3Vz+8pLL7+EBiilcnV5Re1xQ601BOP6AMUmSoIaV7Zp5OpQKD2+aDOekSRT1wPb3QnWnE9aykJLgi6NsC6kkKEVJgvMRWEcqa26+EMT87Mjw2ZAZWWp1Q+16LmGQSClTtQplVIKRqUZbIYth7mQ8gZpiZQDtcy0OlN0ptravZwenRWooD6Cb7YSEVJNlGRgFZozRTdnL6DbDfuLPW1JLPPKtBnZ7CZsXnn8zgOGcWJ7/gJzU6iFOlc0JoLNaJspNRJSZmhQY2VtM1H9JmBZCk0Dm6ES10bTQgiNzTRxTA5uTyGQ4tDFTM56JQh6owD/DtwMn/3sZxX4X33+85//3wKv8m1yiW/r110p8O6v1Ond1Ac++MwCFgfnNkYXG2QdERoxNJI0iq0QBh+34OIJx2cplnCkkQyIJAxIBDbjhnt37/MLX/gKqQlRKxeXFzQGLq+u+Jmf+xnO7p1z/c5Mlswq2dOitXo0EfQLXfZ8Oo0u0oguSLGbBHaqj6GCAh5Cm3Pi+vqa6e6OISZqVe7fuUsW714i4tDh5BenVv2KLsmIVKfWhEAKkdUM1C+s2OJdXA3Uon5YB/Ck+ugKzBg9gb2BEZ9HOIWgz29NRZwcg/r3APruLfwzt6+Ch/VKhCS5H6gRxA8y1Upr/nqkGCEYrVZaix1cPTEnZ6vmfoBlBYk35BffI0pwQZGFCBKp6heoACSJROs3GHpzeLoFRFL07Lra2IwjS3Xwc0qJ
PGWq+g62aEVIHiBbSk9/CNS1cO+F+9SlknLidDzl+7//+/iv/s5/zf0X7nOy2xFDcK+aODEmhEDKkf1+jzXl6vpAHBJaK8fj0V8nf2ez2Z7yjXefsdSAZqfMYCNgPcnCA2vXw8qYJtbZO6lalBSzX6CbMLSI6QymDBJZCUhKxAbrfk+OA5sw0Y5G2Pmkoqylq6L9tdVWSdFH8YjfsIiGnoMYGUZD5EgUoa0rYpFxTDAHavP9rwti1dWsrSFJ0FZ8fNtCH7n36YKC6cpuJ9jhioZy5+wU0cL1xVPW1Tg7v8MmjMTWkOXQAQX+fgzBzfbWpwvQ/Ma4BnLcoDIAxrIsHlydEoRGTgNls6WWxT2HyfUBoX8uUzQXyinPFcjfSfUL4XfEc7ytf3XrAx980yisAdZqxNYYrPmdfvemraqeEh48niiYEM1FES00JBkSfQwWJHhng/D44VP+8T/+MmPYsqxHvv7O13j4+CHjdM6+VF7+yBlnmw0XzSkWUn0cSvRxnzU/jMxAgvr+zgIhus/vOVZMEmuLpGHj3VabQQr7wyX3P/YydVW0KqXiYpIYqfQYJhrNzCXt1jqd3z1d2gIxiQt5unnY6OkV4SbARxynZvJ8v6fqd95GN9/juyz3Qt50WKBBSb3z6u46r+CKzniDIQNMFR9W+mNWcx+aBHMxjcTnulINASG7jUKij0JjdA+W9SgmfEV6Y+tDjYYLVqr5xVNCDy0N9MPZd5Puv/KDo2ohCFxdPmMcMyH7mBqhE0sS02bkuOy9AxL8uYrRR3F9BHvv3j2Oy9Ff92C89MpLHOcDp2cntFqJeaS2wv54IKVETIHdyYZ1rqy1sRwXkhgvvfQS9t4jfxolcLUvPH56IIQRLQtDiKhWjsdrSvWA61pmhMA4jSzH2bv3IWBWYFWsRRZbyRRiSGhKpGHAVIkaCBpIMbKUlSCCLcYwjozbibkcXN1sCWu+857rzGE9kIdMTpH5eIAotJ4coloIQSnHBWtGjiPLYUaDUMUnFW56jWiA1RqDNsQCIokWIG1OfBS8v2TKkdoW4m4HQVmvL3j/m99kOH2ZGDfEGc/Vs8i1rSA+tVCy3xXpTLWKrpUUEmaVqkdSitSyMgyZ+XhANls0BoxITQlDUC1UrSAjnispRC3etYoQv21R523d1i9fH7zjq8W9bRixrESprBmMCYmZJhkh9gu83+1DxkKlxUYaIiMZ64q3/l05LN6Zff2rX+fsdMfJC+ccc+OFOy/z7MmB2gpl77aAYtXvKkVZdaEhhDCBJVJIpOAJ6iGK01+0Pe/4REbMIjBRW8O5HcoL9+9TWmFZBCmVkzvnDLJlZqaGRki+0xSMSnVDtkqfNSrWqlM5UheCVMNqj2IS6daDgITBd3UyI3EghNQfT+vihNBVp/3EuQmXVSB5N4ne+O/MbzD8dtgPWfGxogXHp9GzFsz0+aGLBVStqzYjISQ3EEtlaN8S2LhpxIjWPHg23GQNCpr7TqytnhCgRpaNi1rwhAwzIZC4sXlIDC6KmhKlFa6uL7lzdrdL/gMpJJbjwrwc2J5s3Q4RhcPxCCKspTDtth03B9vthlorn/rkp1BgyLnjsWBdK3kaCQFaW13lmYQpZjQIIVT2+2tOu0Q+xsRxNUqDccys8x4hs7QDOX1LUWgVUOM4L4zjCNpYy0xKypiCj+kG3PZSjLquPt1Qfz8EjLUufbeq5BhoZSUFYRighdDVyh51JGJsT7YsqwMBWlCGcaJqpFYf/6UYqesBbZF1OSLRY7OC+G5bSyOkRNGCN5SGLD6mt+1EHgfKXGilcjwW1AJnr+wo5cD1wydspx1hu4XsNx/SjIqQppGijUh24ZOtmChrbYQq5OjisyABrQspO35tGDKtrSCZFEckO6jCaKx17tmIEdYGFd8XB7q95rZu68Orb8PHl6jm+6GchdrHEtb3PSGClB6sKr4HLDNErdikPmLRgOwGUN+hrRQuDo02H7n74l2unz2ljYFxc8bT
95/y0suvcP8TH+Vn/1//NQClFLIKiCPJnLzl3aXUioauKlSlaegKfR/tm0KIRtPi/6E9WshT6chxJEe4c+9FX4xYRIt3QShoa1hK3pk1V0RKD961Ct5rJkKouIAs9WR354a6iT/0QFj6Y/c7X4e1+IHnKkvHlEkwMsbac/dCj3NBbr6Hgamr4QhdEBIIaHcRuKfP54+e3O2TUhe7NPWRtdW+ixMHKccwIEEJUaD2zEAxskVKc/xVCt1nKIKRCJS+CxTAx7h+qCcIMIyJqo15WRETWvGw3u0wsR698zYptDKyrtU5pv2MH8cNpVQOHCjzkYiy3x+5PM4c68o0Dty7e06yRBShrCtgzMtCLYX7915kLdWTLJoxbkZMb5zpA4+fXBDzhqoOG1jaymFZuH5WEPUDUtVHvDfipEhECpTDTFNPGwhElmBMw0Q7zrRaSAHKvCcO4grLtiApMy+rj4wtIFMgjhskCMf9jKmb8bU1ckosy+Lvr2Z9yhIpLdBiZK7K0nwpKxRSM39Pmu+miZlIJAZXxWZxYU4eMqEtbpUgczwe2J1tmYLy+OFj5stCHs8Zxg0WjGM9IipozETxMSpWPVPPKlr7XtkCRR1VmMgQGlqKd5Mhs2jHDkYXm4UopGJYq6CNGhN5Gmgtw+Kczltk2W192PXBF76xga1Yqqy5uZcv5k4EcWl96B6xRYwSQbIwxMgmZlJXjzUNtFYhQcs+/vy5/+a/4Wtf/TLDdiAMmSgDzx5d8cUvf4l5f+Tq6oqKuiItKkhAYiQGcf9TKKgcIfoHGXzPdAOEBh9pDcPgFoJBaElpwXj67BkByDlRWkVpvPSJVyAWJEYkTISw4QZ/JihJlJz8jtsjgsTHrdqIN/Ev0jsqpI82K9pjiEL3yIWeuEDUrupsfaHmXrogtV9k/ALZZSs+9jQ/tEOQPnbyceM/u/3ro9GbEafPVP0ArX4hjbinMiTx7D1pPbvPH6hEIfW8uBaa/3LPCxSEJPH543LqjHRbRPFRrznXta4Nm1eSRIZhoKlS1sq8rBzXmaqNy6sr9ocjx+PMsixukWn+WNZ15TDPaAg8fPqMxxcXvPvgfV544T6b3YacXTl7429c15VaXE1canW7hlW/mUgD0nP2jotxeV2QNLA2z7kjeOe4rKWbwAHJHJaVsh4QKgUgJgLinrvgnbGqsr++phqQI0181G9BqXVGQqWWAxpa38G6WnPMkTEn3xfPfmBDh41LZJg2IKGPlx2eYCKU6ErOkBIh+GdHLGCrZ2BWM8wyOWSgsdTVjedamOcrWj1Sm8EwsdlNhOM17cmRKZ0zbe8gMtCas3aPtdEkEWT0/bdBC4JIZgyRbInUO+lGQ3GkYQiBQTxEurRAbeYA7j7aF1WSFkwrMU9oEuqU0ShE9YP8tm7rw6wPfPBFYDRFdKVp9Rie0DPkumE9qI84Q4+48fAEI1RFKp304WqJeVn9Aq+BtjZMnAh/dXHN1eWe3e6Mtz7xaU43d7i6XB31pX3ih4BFF1yYOh2r+wWbNpL4IUDwXR1ACh5lhBlrrWj0rurR48dOo9FGpfHoySN+w2/+TWDdm9i0w5q7mEMVtKK1eMTR85PI/+XaRGN11YA/ydLpNfgFWNX9VdoTEcxa54uqX0xQCM05kerdRsP/HjXHgLmoIHr8kEjfFfZBaBekePC9x9d49+cqR2vmXbI1TD0stzUfOgUxLDSKNhf21Ia2Rm3FrSZUMDfZe3J2657C2O3ydBO9EMJATjty2pLTyLCZPILIny5HcxlOXRkHzs/vkceRPGQk0i/yPrJrtVDKiiJ85Wtv87W3v8nLL7/Cuiy01QUSYupA7NKwomzHqY86I2uZidF/xuM8+04JuLhcKZZoIdCscVxmjscCxfeu67r6z6SVYUhsp4laqnv/AhSMIpHVArNWTCuBBsH6iG+mNTfPW+ss1eBqRlP3gJpBKQvleEBrQ6uRYnLDeQMJ6Xl2XpYER0WqQG3kpITkE4hWzUeP
rUHILjwzI7aErf01kci43SIhcNzvqYuxzEqcNgiBw8PHhCoQT4njhvU4E5p6cG0SLLo9ZxwnkmSX7ypObTEhhOT2ndAH7qaeF1kPBF3812qlrauvT4I/j9UqWldSgc2qnugSHcp+a0S4rQ+7PvDBV0w6cT2CJGLIiK5E3D+VY3KlZBBGE7Li9gYgEhmTR9qgBQtuKWANRDIPnjyjSOLhgye0uXDn7IyPfuJNmhi6Vso8kzAGhBgHQhgJIZNjIgku9W8JCyMmQhRzkzBg3RoQcBVpNB+NGpFVhdc/+jGWZaa2FUS5urjk+/6t30YMo3sUQ/NuN7pnLSAOqk6JGGJHclVg9Ttt865WRT0ZHse2BXHQcJTsz2Fnqnxr7um7t9bHaRLWjh6LVLPnYhNC8I7rRkhkfbfXf8rw/Pt2Hqj0cZr585R6z6hBkCjEGIgx+eiu+fCVoJ5kge8PpfsA3WScOxB8ABmo2rFS3Py1BZN6A2zpisfixJfdBlLyPaQItayYNVJ2lWgIidJ9c2ZGLYVpGP2QqJUcheP+yKc//T289urrBAs8fPc9Lp8+4cmjhxz2e772S1/l0YNHSBDOTk8YshATDGNkWY6utlUjRn/rP3pyTVHfDS91YS2N1gK69Lusvl/KYpxtJk5256wFtHgqibp9E2tGmVdyBqiYVebjHi1Hcm7U9QqsIBHmspBDJnSs2DBtCCHRakGr8i00gUOIgga03tx0BQIOjG9lpbUDKSkidIiE+U1KyOQgTC2Q1TtnBwvk5/g/kUgtyunpxDiBznv2l8+wYSJszyBHcmhYKQQzxmkgD0JKDsGWkBjIJLz7bEEoJijJ/zHpj6lh9RrVAyFUhIZoIbRKCo5dkxBIBllhEwOjKkMwJIcbtNFt3daHVt/Gji9SupIrNEVNyWqQDZWAVSUj7klqgYHg6jJJNGvkoERWrAqrKiFHdFXimNlOZ/yWH/wB3vknX+Hu9oxFG2f3T8i1er5Xqw72BVRcPEHrXjjMLwIIIYOoQluR5Ebhb5mAjCbelU4hYzIwDQFJnmlXlgMqFcF45aOv8vInP8nDL/wiB6nEVmmId1Y97qhWh2Vrq5C0bwr9rtckIqnREH884I+5P5QbPqn2jL8o3gk8bx7F1aSKUbN3ptHACS/dSK763GJgPbfP1aBG63gYuRlJBoeHK37Rd65oonoqGmj1vWBnJvr0NDkVX5oLI8LIjYlfEUQFxDWiUQP0CCSNve8zRWgEWR1eLkIeN1zsr50rucloKVhUpu2GEDzjb60rh+ORPHhMz+FwYLfZUnqmnNaVMsOLL92n1ZWf/YdfZztt+I3f+z3kPPLCCy9Qi3fEFxcX7M52XO2vuL6+glbJeSLn7XMMnZEYpi1rNcpSiBrRUrGlsrkRFwGI+c1RUEKIxDhizZMfXPmyEIOx389kCSQpiDbKUmmy+HslmCdoaGVzco/rdUWJqGQHPYuAes6hGrTiWXmSh841UDQ00jTSJDEU6Qg1QXUFivNiLRDaiIVMVrDWqMFYmpI37jNMOdLihDUYh0aVCy4uLrg6CrsXd8joGZS1KWYFDYYERbUCg4dQh9ltTGSI6jvi6gHFEU9UxwyiUaMR2kxYM/T3Vmn6/H1s1VcQYo2iRygr4Gi28J0Y+W7rtn6Z+jZ2fEKNULsJWvvYyMqCdLWdYhD9zrOpOd5LjCBKQIlaGUUZLNMWzy9rVpBaeP3FU55cvEuxwsNHD1nXmd1ux4P3L1AbWFtAU0Zj9+kZtOp36oiSByHEjmlSx6eFIN/aOVhAJbpIpRbPDiTw9PIph8MRCZDSwPnpDmPl3/kd/z61rG4qFoCKWu34p0BMw3OBikhEpH9IxYHGWs2TK25u3UPnZAbtBvXWI5Xca6VNCd3sL+pdnG8FG0mN2Hprh5vYg2N4Ab8g3oxLpUcqIX0sqs0tGDnQslAFWoQWjRKMSh/jttpN9Q6mxpITYsIKodLVNajNVFmpsRJCJYWujOnfx4KPoLGK
2UrTmVoXWi3cOzsnD4Or/oBps0Fy6ike3SpBf/80D0R98uQJz549RbUx5MzZySnzccaactxfk4eJq/2ChURTY7fd4b6xmffeeY+f//lf4J2332XMI/fO7vrrFWBdfNRZWmVpjcPxQGiQEK4vL6hNGToIGmAYRpZl5eLpE7SsaKssZfHnSBpzOQDGkLdd5OTj6igQWsSKsM7NvaBVWNYZQiEl8Sinw56gM02PfRcspJS+FeTbgd9BDLXCakcCSgojagnVgES3sdzAyNcglBhcYKINic7EnA8HynFxos8QaazEeU9ZV6a7LyHDSG171vnK31MRxmlExLFyjgo0Ai7cIkWIEYuKxoaESqQAtT/e1gHmQF1oxz3leMBa69Oam4BkI1D8EK+NVn2PfbO3v63b+rDqA3d8Ykq2vmYOhiUPZM0rUApFHGTtH5QONQ4Ozh1SAPXIm6aFoIExCs0aljIP3n7Az/ztv80bb7zKV995h/t3XyXFCbFGbNA0ErdjHykKmCeFRzVCFlowWuuHEvjoxaIbdPELnAfcRB8jhUY0JQ3CKy+/zGoNU4gaGPLI1ZNnfPbf/Cz/9zt3uZ6fUXs8DH0f02qnuQQnrpR1QTJIjLRWSZK74CN2T50LU/zwKESJaOvnof+OA6XNx5RqgsXs8S5NGVT8+5mDhkMQ7EbhZ25rCB3MTXBT/w3L025WJOaexoD1xPWCp/Ylsg3UenSiS0oumLHoxvnO5Lzx5dGBx4YRqyEa0OBioqbuzbsZuTUzv7DRoDZOxpGgnkZgZpRaXcYf3aReSyPnxCiBWgvbaYOd4KM1bVwf9jx58oxnzy4YUmYcRj75qX+Nr33tmy6qsR7tBNRaOb1zh/ub0Ue+KK16iryZOdwZeOGb30DzyPFwYJ2dfnL6+AnYQAqRV54+AOCNx+/6DUpZsWbM6+I2g7KS8J+zaSCmkc0YWevB/56jv/NMlRQjrTSCBML1I2LuuXstsJsSbT5wrBUlu1+1FufU5thvZPwVHOLAGlbCcuReN5lnGrpeE+ldGpUSZ6JAXA2thZg2gFLXA7YoxEw62bCdQZ8843JRwuZIniGGgs5HJAkxR6QkqoZuOfALR5sXluNMqX4zt6qnSWRzhaZ2c7sGtyXEVRENJIm+aqh7KLOLvCUw2BFZCmW9RI6FkBNE5eOl/Gqucbd1W/9cfeCDLyoEFTKRFpSVhiHEmIkpezdkfuGkeRdyg+IKQTyLTBKqiWILOSQ/RDQzH4Uv//wX+dT3fy+vvv46u/E+ZVkZT5V6/YwhaQ+0bL6DUu3UFJz6gasrm1UP49TeZvXdGjiFJDRXxGnw8M+75ztyEi4urjnfbam1cn19YHN+wsX+wP3X3+DZF64JJFeWiWE97FUkUqvvNyXeBPS6GTz6l2LmiDO11u0KNwb1RDDpnZ/2PZ/bQG7+16yn/KnbyD2bz1MDpKcnhOeHEoj0bD+9yesLvTNUUodoh9T3f9q7kRAJKrRD8YR1OoCchmnthJ0uWlDBmvg+0Drzs7dDQdwYjThBJhJoNroZuwOyy9L4xMff4sagn4fsqs9aaa2RciJPI6UpV5ce6prJpNgTPlICVcbNxKdfe422zCDKbrfl5HRHrQVLmZSS8z6XBWJkWdYOS/BooqrKtBHa/RdZhoE/9H/6Cx/o/f8//6n/7IN+VH6d11d/rR/A/9c6hMBt1MJtfVj1wQ3sMblHD+mkj4owUJ/P8gzVxtxal/gbSUCiH4oiDQ1C04TlAQxSCJACpVWuLy/IwTBd2I7CYS6cnbzMxeN/hARXrGXNxJxQMR+5djWmmEIriLg3z/O9MorSunqP0BCtEAKrBnbDhldefYnTsxOWpmhTtFYunz4ljI2C8N//H/77fPXLX4ImtLrHdO2CFqPW0vdgCgFqNYZxQKQHnQJoIPlACD8uA2IJqxHtpnMwD+ftmlANfmAGc9N8wlijK/IkuBdPVYl0Cgzmvj+1TkrJ/uNiBGvd9tBfwz6SlOAp
Gjeq17kUogip2yzouzzHk3XPIm5it2ZdNdu5njF09WrfM5pzNaP4a+OCX2NZjnziU58gDslB3gjRGsM0uD2l/5nS4GSzpc1rt3Y4t5IQUFZOz09ourKUI4gxDWfsdlsfNbfGokaMkXGaiGMmDplaK/O8EMZMrEYpC+vrr/Nn/mf/S6a1ggWWnql32C+0quSQKa3w5uGC/8VP/u/5s7/zD/Kl6QUmRoLNXB2ve0ZfIUUjxgHJO//ZgyLJuLi+JLUex6Pd+1b7PjgO5G1waLUMhCDsr49YbViDaRBKbUx5x3G5RK1CHFwYFhJFFsK659njhyxtcNJJOfi+lUiQggnoevAsvzT6DjclEi4c0jww7UbW62ccj8o0vUBMG0KqpLiCNTRmzISIkIcRE7+Bam3FDhXKijJT5j2hDqw4Ug1taN0QkxHSCmS/Smhxb6kalYIV35lajpCVVhakBkYL1CQ0M7LC02D8ve/4Mndbt/XP1gc++EJQhp0QlkoOLq+3BiFsUW0soZL7hTm0Dl4OTk5REST6zifkTEgB1UySS9/nDCNXV3vmtZLChofvvsMrr7/Kfn/J04fPKO1IVF/4k5QihViVVZQixnlPnq6tYaijxiz5wdpVnTSFoaLqysdq8Im33iBLYEonzO3ItNtwoo3WlDE2fuC3/hZ+4b/9rfzD/+d/5QiqMBOWTO2eOUFRK318mNFmtHr0LteEWQoBv4CuYmQiKTS3RUQhWnAvXXRSSnPvAmqe7C3SPB4pCNgMdew7nwVTH9n58BJAuzG97+PMLyiIIEFcSm+4DaXvXgkNxDFripKCumFdBJPZR5WaiawEWTGLRMu+m0kua2+6YM99fRmIhOCRT2rSWarGMCY2Zzvm6z2rJDZxwNSIN/F9BqUEyk2WYIpduKOEmGgtMIpQ28w4JrLsqNXTBj7y2ouYuphmWQvzWt1Ko8rTJ0947fXXkbintca4jTBPRFt5Z3yJ+NLEXFbKXBCLPDzObGsGU/brTNo74+crZy/xj3evMKUNo8CxXHB9ODKmiK4LQRq1jey2ZzQWSjvShi169EMvJOe+Wq0kIA4JJDDtBoLsUFGWrSGHhWiNypGqwnY4R/UMpNDIEISmkTQlri++xvW6stgJoa3YfEEWoa5Glj3BVma26HifmAKtzJh2xeg0EjYD4zRw+XTPOu3Ynd4jhYGUFdNrQobASNBI3g5IHMlxgiTMywXkynq4cNBDE0JM7MPKSIU1sQ47YlRiHpG2EEzQ5CIpKDRttJAJsmGaAsVmjqrEGNiZryuOrRA3jtK7rdv6sOrb2PFltFVyCpRmINmBsyy0KjTNZGrnSgZMbmTuzjQp1UBWgpiP+sLkxmoDs0hhx9fefcpbb71JK1ckUWLMXF5ekLPbGtzK4KrBm49BFkX1SKtKyJmWxt5tAS36SBH4VCkEVUKKNA3Y1ZHfWBbiP/iH7ObEeQqstXIiwrrMTJuEfe1d/uPf/L38X/7bL/LOw6+TgKzFl/sd7wXmHimRnuRQ+m7N0JCI3cbbAsSubrUQnCTTTehWXXFpTZ03auJeOPWLpIbmo1Rt7hcM1c82eK6yDH235ym6N/pQnJYjPdfNfKco5inrCcXqQtWMaiPiCd2t81bNKkIgWO0CJVepmlVaW53sEtSjquyGEGrE5B1gCAO6HBmngZfOd5z/4pe42j8hp4FWmz/6lCjrTE4JkUSSiSQRwf1ftRY22xO0Na6urnn84CGf+uSnMHNv3LKuxOTwYwmBpRT27z3k9OyMs7MTrr7yS9w/LIzX1ww5u+BqFUiVV7/xHvLY+Z4BaKVyVgublmkr5HHkpQfv9Pd/YsxK02tKS5gUZExUFd9FoQ4biEZZHUM3pIEihdoFJ4gSU3CMmTXEvDuvpUJ06DqhOOmuuijkcLwiJSUGR78pholSy+IAh2BMwWjdF5jowbgaaAtIGMnDFrWVGnra+9qIQ2IzJNbrQgqRdJIIY7nRMDGkDRqEdW0Mg1LtSAzG
mAZMGrlCWSoWAjoIrQphVTRUVyXHgFjBWqU9F79pBwm4UluCsUkDLSZqmSnr4rvQYaCFgtCYxI3483PPzG3d1q++PvjBF7YErghRfSHfE941raBbKBG1QkwjJpmqnlSdQiOQaRYBp0cMFt1cnQYMyGEgTxt+4Lf9NkoZaFcrKax85Zce8ZUvfxlYSWkAc0Nu7vs9U8AiVQwZEkPM7FvvKk2wBs8kcQiB/92z9//5H+rH/8/w47/yz/4/+kDP0PxBn8pfRa3/1P//dbbwf+dd+Hv/vx9W/bu/zO/9ex/wexzTwOV4ypCERqMcG5eHPSdnL7IcFrDQvaqRtlz7KFgy61yIkgjjQNNCVB//h+bTCYcye6yXtZWgjSC1+9+EacxOcamN0grDJvVJgJFCoDRQbbS276Z1t5TU1lha9ZFyFAoz2tbu+ZwBYcgJaXt0Ls4+HSKaIjE0wjJjbUsp6q7QoFRtDHGLoZRaeB5CbD7+jil2WlJl1UYSIVnBtLAW8ZFuxHXK5oP/6LN6qq605YioC+ikGmEAzEjNHIMWb718t/Xh1QcPog3GLmYsVkKOUBvRBooNRAvk1IjWY24kosllyEmqUx3YQBhpqkgUGsGjVVsloYTlEQ+++SUujgNxueTycsP9+x+nrA469jQDY4pKUM9ciJpgVUoyTBojlW2orIx997jydjJ+6KVXOUeINrgvLRqT7Pj4J8/5D/+j/5DHT4t/6MaBMCaCGe+/8z4vvnifqpWSJp68e+Cv/af/KYGFWq2rJ7X7+gSVrq6kRyURgOpxSDYAQksrsblEPnQhCwGU0n1vLj4xC1hI+GZFu8y/PVeIhj5YdJiT7xjFwGqDdJPXJ55Cb5Cid5lBXT+q8aZPdNO4Gqg1soBV8ddFXEBkoWJADBlTh1aH4Gi4SvIeMWVyW3EZTt9dWuoKysKd8zuM9+/wR/7E/4T90yfkOGAxsLZCq5XdZsc/+rl/xEde/SjTvdEN/r4+I8aEsfDuO+9w994ddtude0l9btt7WzfoNxqSIk+eHoiSEDOW+cA0TqgZwzQypkipK5ca+YUvPXOM2+J+xv1+oeTIuDTWcvDnRuHpdMq7ecKOCzIGSknsprtMMmIsuNg1EfFRbNBAM6N1pSohU5snlnumoT5/DVvxTr6uM0EGYoz+PBd1n6PVnmvYKOVISCMpiu8gF0WGEZuVFtyu06zSpFLVnaHD6MKm4yx9L6zkcWKYBo7LA46twXDCEEaqCibN1clrAYkedRUTkkZSmAiayFFYo9NvYvePWqvMpSKhYJivQdoRpBBk8n189wPexFqF6FOSWla0LgR1ha8GI6nSQva9cYUmt6PO2/rw6gMffIhCcwpJrQubnFjnhHFC5JoQjxiNwpFmAUKiaaKVBM3l+DG6qlKCUPCBXDQlWWNYr2G54iOv/CYefxMeP9oj4+oXBgSRnm5gK9QVq4EUBmRdkNg4RiimnqOTBlc5hgNk4+2c+VockbIhSiUmZbQdxzzwWz/+Ma7nTBTznVVsrIcD8/1X2b/4AvPhgmvJPJ4KvzCe0GxBQ8bMRTPWAmoZx2lGqF14giHJ1ZjDGmkWWbMiYQPWnFcajNgtIBHfZxkJQqDh/jgxQ2NBQvRRZhCCTtCCA+ypRHGYdbBIiYYKmESafwEx+QhyaB7QusRG0IDoSAwZkgOTxz5ibSF61FKpEJSqEIKQc2ANmUhBMFYN1DzS0sCmGVEdPC0imHna+TSOnIwTwzDy/usf4XK3IQ8jaUzcIMJrHtHxlAd5w/bOhloWznY7coqspigZe+0tnga4Goyz0wmtDYvRYQHdGqm2oBJYlkiUEYnG5bOnXKpyducOSxRqKFSUrz4sfKPdZz5ewaViDFzmlTkldgVWe0aU5GPLnNFZfTdNZZkb201mmY9OuImRugaO1W/iIkJZFgQYBlhaRVpzDBcB6YzPmAckNgJCacYQd6zrJTF7juGyOqS5afX3SSgIA1k6FrDvgqMZgYgKlFY8
laIOxGDEVj0NRE6p9QlEv5lophzKAR3PSdt75GS0dkBbRuIJKoUYA6UsTENkTMnfg913KqUxdHXxvqzOqb0hx2ggVCG0gg4LhguMYEYEomWCjIgKVtzjKaHjyQQsNcQqa4AmgUETIvnDvfLd1nd1BbPb2flt3dZt3dZtfffU7eD8tm7rtm7rtr6r6vbgu63buq3buq3vqro9+G7rtm7rtm7ru6puD77buq3buq3b+q6q24Pvtm7rtm7rtr6r6vbgu63buq3buq3vqro9+G7rtm7rtm7ru6puD77buq3buq3b+q6q24Pvtm7rtm7rtr6r6v8DUCq6UXxBk8oAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000189475.jpg | idx 101\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "num_imgs_to_show = 3\n", + "lab_object_counts,pred_object_counts = object_counts_per_image(labels,predictions)\n", + "for image_to_visualize in np.argsort(lab_object_counts)[::-1][0:num_imgs_to_show]:\n", + " image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']\n", + " print(image_path, '| idx', image_to_visualize)\n", + " visualize(image_path, label=labels[image_to_visualize], class_names=class_names)" + ] + }, + { + "cell_type": "markdown", + "id": "e5ddd4fe-4477-4b68-ba79-e5cbb62822eb", + "metadata": {}, + "source": [ + "Next let's study the distribution of class labels in the overall annotations, comparing the distribution in the given annotations vs. in the model predictions. This can sometimes reveal that something's off in our dataset or model." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "9d4b7677-6ebd-447d-b0a1-76e094686628", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:57.491973Z", + "iopub.status.busy": "2024-06-25T23:03:57.491650Z", + "iopub.status.idle": "2024-06-25T23:03:57.632307Z", + "shell.execute_reply": "2024-06-25T23:03:57.631806Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Frequency of each class amongst annotated | predicted bounding boxes in the dataset:\n", + "\n", + "car : 0.08 | 0.06\n", + "person : 0.68 | 0.7\n", + "cup : 0.11 | 0.11\n", + "chair : 0.1 | 0.09\n", + "traffic light : 0.03 | 0.04\n" + ] + } + ], + "source": [ + "label_norm,pred_norm = class_label_distribution(labels,predictions)\n", + "print(\"Frequency of each class amongst annotated | predicted bounding boxes in the dataset:\\n\")\n", + "for i in label_norm:\n", + " print(f\"{class_names[str(i)]} : {label_norm[i]} | {pred_norm[i]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "200cdebf-b24c-4c2b-8914-6a2fce218daf", + "metadata": {}, + "source": [ + "Finally, let's 
consider the distribution of bounding box sizes (aka object sizes) in the given annotations for each class label. The idea is to review any anomalies in bounding box areas for a given class (which might reveal problematic annotations or abnormal instances of this object class). The following code determines such anomalies by assessing each bounding box's area vs. the mean and standard deviation of areas for bounding boxes with the same class label." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "59d7ee39-3785-434b-8680-9133014851cd", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:57.634634Z", + "iopub.status.busy": "2024-06-25T23:03:57.634175Z", + "iopub.status.idle": "2024-06-25T23:03:57.844094Z", + "shell.execute_reply": "2024-06-25T23:03:57.843461Z" + } + }, + "outputs": [], + "source": [ + "lab_area,pred_area = bounding_box_size_distribution(labels,predictions)\n", + "lab_area_mean = {i: np.mean(lab_area[i]) for i in lab_area.keys()}\n", + "lab_area_std = {i: np.std(lab_area[i]) for i in lab_area.keys()}\n", + "\n", + "max_deviation_values = []\n", + "max_deviation_classes = []\n", + "\n", + "for label in labels:\n", + " bounding_boxes, label_names = _separate_label(label)\n", + " areas = calculate_bounding_box_areas(bounding_boxes)\n", + " deviation_values = []\n", + " deviation_classes = []\n", + "\n", + " for class_name, mean_area, std_area in zip(lab_area_mean.keys(), lab_area_mean.values(), lab_area_std.values()):\n", + " class_areas = areas[label_names == class_name]\n", + " deviations_away = (class_areas - mean_area) / std_area\n", + " deviation_values.extend(list(deviations_away))\n", + " deviation_classes.extend([class_name] * len(class_areas))\n", + "\n", + " if deviation_values==[]:\n", + " max_deviation_values.append(0.0)\n", + " max_deviation_classes.append(-1)\n", + " else:\n", + " max_deviation_index = np.argmax(np.abs(deviation_values))\n", + " 
max_deviation_values.append(deviation_values[max_deviation_index])\n", + " max_deviation_classes.append(deviation_classes[max_deviation_index])\n", + "\n", + "max_deviation_classes, max_deviation_values = np.array(max_deviation_classes), np.array(max_deviation_values)" + ] + }, + { + "cell_type": "markdown", + "id": "b260142e-b760-490c-818e-c037fab5c6c8", + "metadata": {}, + "source": [ + "In our dataset here, this analysis reveals certain abnormally large bounding boxes that take up most of the image." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "47b6a8ff-7a58-4a1f-baee-e6cfe7a85a6d", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:57.846369Z", + "iopub.status.busy": "2024-06-25T23:03:57.845973Z", + "iopub.status.idle": "2024-06-25T23:03:58.522920Z", + "shell.execute_reply": "2024-06-25T23:03:58.522398Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000422886.jpg | idx 103 | class person\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000341828.jpg | idx 104 | class person\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./example_images/000000461009.jpg | idx 105 | class person\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "num_imgs_to_show_per_class = 3\n", + "\n", + "for c in class_names.keys():\n", + " class_num = int(c)\n", + " sorted_indices = np.argsort(max_deviation_values)[::-1]\n", + " count = 0\n", + "\n", + " for image_to_visualize in sorted_indices:\n", + " if max_deviation_values[image_to_visualize] == 0 or max_deviation_classes[image_to_visualize] != class_num:\n", + " continue\n", + " image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']\n", + " print(image_path, '| idx', image_to_visualize, '| class', class_names[c])\n", + " visualize(image_path, label=labels[image_to_visualize], class_names=class_names)\n", + "\n", + " count += 1\n", + " if count == num_imgs_to_show_per_class:\n", + " break # Break the loop after visualizing the top 3 instances for the current class" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "8ce74938", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:03:58.525706Z", + "iopub.status.busy": "2024-06-25T23:03:58.525376Z", + "iopub.status.idle": "2024-06-25T23:03:58.529090Z", + "shell.execute_reply": "2024-06-25T23:03:58.528552Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "expected_values = {0: 50, 1: 16, 2: 31, 9: 62}\n", + "\n", + "for idx, value in expected_values.items():\n", + " assert value in issue_idx and issue_idx[idx] == value, f\"Assertion error at index {idx}: Expected {value}, got {issue_idx.get(idx, None)}\"\n", + "\n", + "assert all(i not in issue_idx for i in [0, 2, 3]), \"Unexpected values found in issue_idx\"" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python",
+ "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/outliers.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/outliers.ipynb new file mode 100644 index 000000000..95f9ad783 --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/outliers.ipynb @@ -0,0 +1,1524 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1043b220", + "metadata": {}, + "source": [ + "# Detect Outliers with Cleanlab and PyTorch Image Models (timm)\n", + "\n", + "This quick tutorial shows how to detect outliers (out-of-distribution examples) in image data, using the [cifar10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset as an example. You can easily replace the image dataset + neural network used here with any other PyTorch dataset + neural network (e.g. to instead detect outliers in text data with minimal code changes). \n", + "\n", + "**Overview of what we'll do in this tutorial:**\n", + "\n", + "Detect outliers using `feature_embeddings`\n", + "\n", + "- Pre-process [cifar10](https://www.cs.toronto.edu/~kriz/cifar.html) into PyTorch datasets where `train_data` only contains images of animals and `test_data` contains images from all classes.\n", + "\n", + "- Use a pretrained neural network model from [timm](https://github.com/rwightman/pytorch-image-models) to extract feature embeddings of each image.\n", + "\n", + "- Use cleanlab to find naturally occurring outlier examples in the `train_data` (i.e.
atypical images).\n", + "\n", + "- Find outlier examples in the `test_data` that do not stem from the training data distribution (including out-of-distribution non-animal images).\n", + "\n", + "- Explore threshold selection for determining which images are outliers vs. not.\n", + "\n", + "Detect outliers using `pred_probs` from a trained classifier\n", + "\n", + "- Adapt our [timm](https://github.com/rwightman/pytorch-image-models) network into a classifier by training an additional output layer using the (in-distribution) training data.\n", + "\n", + "- Use cleanlab to find out-of-distribution examples in the dataset based on the probabilistic predictions of this classifier, as an alternative to relying on feature embeddings." + ] + }, + { + "cell_type": "markdown", + "id": "70016f64", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have numeric **feature embeddings** for your data? Just run the code below to score how out-of-distribution each example is.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.outlier import OutOfDistribution\n", + " \n", + "ood = OutOfDistribution()\n", + "\n", + "# To get outlier scores for train_data using feature matrix train_feature_embeddings\n", + "ood_train_feature_scores = ood.fit_score(features=train_feature_embeddings)\n", + "\n", + "# To get outlier scores for additional test_data using feature matrix test_feature_embeddings\n", + "ood_test_feature_scores = ood.score(features=test_feature_embeddings)\n", + " \n", + " \n", + "```\n", + "\n", + "
\n", + " \n", + "Already have `pred_probs` and `labels` for your classification dataset? Just run the code below to score how out-of-distribution each example is.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.outlier import OutOfDistribution\n", + " \n", + "ood = OutOfDistribution()\n", + "\n", + "# To get outlier scores for train_data using predicted class probabilities (from a trained classifier) and given class labels\n", + "ood_train_predictions_scores = ood.fit_score(pred_probs=train_pred_probs, labels=labels)\n", + "\n", + "# To get outlier scores for additional test_data using predicted class probabilities\n", + "ood_test_predictions_scores = ood.score(pred_probs=test_pred_probs)\n", + " \n", + " \n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "45cb0f90", + "metadata": {}, + "source": [ + "## 1. Install the required dependencies\n", + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install matplotlib torch torchvision timm\n", + "!pip install cleanlab\n", + "...\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "2bbebfc8", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:00.888869Z", + "iopub.status.busy": "2024-06-25T23:04:00.888701Z", + "iopub.status.idle": "2024-06-25T23:04:03.644396Z", + "shell.execute_reply": "2024-06-25T23:04:03.643891Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "# If running on Colab, may want to use GPU (select: Runtime > Change runtime type > Hardware accelerator > GPU)\n", + "# Package versions we used: matplotlib==3.5.1, torch==2.1.2, torchvision==2.1.2, timm==0.6.12\n", + "\n", + "dependencies = [\"matplotlib\", \"torch\", \"torchvision\", \"timm\", \"cleanlab\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if 
len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "markdown", + "id": "41733949", + "metadata": {}, + "source": [ + "Let's first import the required packages" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4396f544", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:03.647313Z", + "iopub.status.busy": "2024-06-25T23:04:03.646689Z", + "iopub.status.idle": "2024-06-25T23:04:03.981647Z", + "shell.execute_reply": "2024-06-25T23:04:03.981087Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from pylab import rcParams\n", + "import torch\n", + "import torchvision\n", + "import timm\n", + "from sklearn import preprocessing\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.ensemble import BaggingClassifier\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "from cleanlab.outlier import OutOfDistribution\n", + "from cleanlab.rank import find_top_issues" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3792f82e", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:03.984374Z", + "iopub.status.busy": "2024-06-25T23:04:03.983765Z", + "iopub.status.idle": "2024-06-25T23:04:03.988160Z", + "shell.execute_reply": "2024-06-25T23:04:03.987737Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This (optional) cell is hidden from docs.cleanlab.ai \n", + "# Set some seeds for reproducibility. 
\n", + "\n", + "SEED = 42\n", + "np.random.seed(SEED)\n", + "torch.manual_seed(SEED)\n", + "torch.backends.cudnn.deterministic = True\n", + "torch.backends.cudnn.benchmark = False\n", + "torch.cuda.manual_seed_all(SEED)" + ] + }, + { + "cell_type": "markdown", + "id": "be38283d", + "metadata": {}, + "source": [ + "## 2. Pre-process the Cifar10 dataset\n", + "\n", + "Each image in the original [cifar10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) belongs to 1 of 10 classes: `[airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck]`. \n", + "After loading the data and processing the images, we manually remove some classes from the training dataset, thereby making images from these classes outliers in the test dataset. Here we remove all classes that are not animals, so that test images from the following classes are out-of-distribution: `[airplane, automobile, ship, truck]`." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "fd853a54", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:03.990325Z", + "iopub.status.busy": "2024-06-25T23:04:03.990039Z", + "iopub.status.idle": "2024-06-25T23:04:08.181851Z", + "shell.execute_reply": "2024-06-25T23:04:08.181275Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\r", + " 0%| | 0/170498071 [00:00See the implementation of `plot_images` and `visualize_outliers` **(click to expand)**\n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "txt_classes = {0: 'airplane', \n", + " 1: 'automobile', \n", + " 2: 'bird',\n", + " 3: 'cat', \n", + " 4: 'deer', \n", + " 5: 'dog', \n", + " 6: 'frog', \n", + " 7: 'horse', \n", + " 8:'ship', \n", +
" 9:'truck'}\n", + "\n", + "def imshow(img):\n", + " npimg = img.numpy()\n", + " return np.transpose(npimg, (1, 2, 0))\n", + "\n", + "def plot_images(dataset, show_labels=False):\n", + " plt.rcParams[\"figure.figsize\"] = (9,7)\n", + " for i in range(15):\n", + " X,y = dataset[i]\n", + " ax = plt.subplot(3,5,i+1)\n", + " if show_labels:\n", + " ax.set_title(txt_classes[int(y)])\n", + " ax.imshow(imshow(X))\n", + " ax.axis('off')\n", + " plt.show()\n", + "\n", + "def visualize_outliers(idxs, data):\n", + " data_subset = torch.utils.data.Subset(data, idxs)\n", + " plot_images(data_subset)\n", + " \n", + "```\n", + "" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "9b64e0aa", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:08.184024Z", + "iopub.status.busy": "2024-06-25T23:04:08.183835Z", + "iopub.status.idle": "2024-06-25T23:04:08.188504Z", + "shell.execute_reply": "2024-06-25T23:04:08.188064Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "txt_classes = {0: 'airplane', \n", + " 1: 'automobile', \n", + " 2: 'bird',\n", + " 3: 'cat', \n", + " 4: 'deer', \n", + " 5: 'dog', \n", + " 6: 'frog', \n", + " 7: 'horse', \n", + " 8:'ship', \n", + " 9:'truck'}\n", + "\n", + "def imshow(img):\n", + " npimg = img.numpy()\n", + " return np.transpose(npimg, (1, 2, 0))\n", + "\n", + "def plot_images(dataset, show_labels=False):\n", + " plt.rcParams[\"figure.figsize\"] = (9,7)\n", + " for i in range(15):\n", + " X,y = dataset[i]\n", + " ax = plt.subplot(3,5,i+1)\n", + " if show_labels:\n", + " ax.set_title(txt_classes[int(y)])\n", + " ax.imshow(imshow(X))\n", + " ax.axis('off')\n", + " plt.show()\n", + "\n", + "def visualize_outliers(idxs, data):\n", + " data_subset = torch.utils.data.Subset(data, idxs)\n", + " plot_images(data_subset)" + ] + }, + { + "cell_type": "markdown", + "id": "eb28f354", + "metadata": {}, + "source": [ + "Observe how there are only animals left in our `train_data`:" + ] + }, + { + 
"cell_type": "code", + "execution_count": 6, + "id": "a00aa3ed", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:08.190424Z", + "iopub.status.busy": "2024-06-25T23:04:08.190236Z", + "iopub.status.idle": "2024-06-25T23:04:08.732815Z", + "shell.execute_reply": "2024-06-25T23:04:08.732265Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_images(train_data, show_labels=True)" + ] + }, + { + "cell_type": "markdown", + "id": "df819e85", + "metadata": {}, + "source": [ + "If we consider `train_data` to be representative of the typical data distribution, then non-animal images in `test_data` become outliers:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "41e5cb6b", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:08.735175Z", + "iopub.status.busy": "2024-06-25T23:04:08.734752Z", + "iopub.status.idle": "2024-06-25T23:04:09.224819Z", + "shell.execute_reply": "2024-06-25T23:04:09.224213Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_images(test_data, show_labels=True)" + ] + }, + { + "cell_type": "markdown", + "id": "92caec8a", + "metadata": {}, + "source": [ + "## 3. Use cleanlab and feature embeddings to find outliers in the data\n", + "\n", + "\n", + "### Represent each image as a numeric feature embedding vector\n", + "\n", + "We can pass images through a neural network to generate vector embeddings via its hidden layer representation. Here we use a `resnet50` network from [timm](https://timm.fast.ai/), which has been pretrained on a large corpus of other images. Note that cleanlab's outlier detection can be applied to numeric feature embeddings generated from any model (or to the raw data features if they are already numeric vectors). Outlier detection works best with feature vectors whose values along each dimension are of a similar scale. " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1cf25354", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:09.227103Z", + "iopub.status.busy": "2024-06-25T23:04:09.226743Z", + "iopub.status.idle": "2024-06-25T23:04:09.230110Z", + "shell.execute_reply": "2024-06-25T23:04:09.229675Z" + } + }, + "outputs": [], + "source": [ + "# Generates 2048-dimensional feature embeddings from images\n", + "def embed_images(model, dataloader):\n", + " feature_embeddings = []\n", + " for data in dataloader:\n", + " images, labels = data\n", + " with torch.no_grad():\n", + " embeddings = model(images)\n", + " feature_embeddings.extend(embeddings.numpy())\n", + " feature_embeddings = np.array(feature_embeddings)\n", + " return feature_embeddings # each row corresponds to embedding of a different image" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "85a58d41", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:09.232164Z", + "iopub.status.busy": "2024-06-25T23:04:09.231845Z", + 
"iopub.status.idle": "2024-06-25T23:04:22.241089Z", + "shell.execute_reply": "2024-06-25T23:04:22.240417Z" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "aa545006254f4fc6b53bddaa1b160a6e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/102M [00:00" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "ood = OutOfDistribution()\n", + "train_ood_features_scores = ood.fit_score(features=train_feature_embeddings)\n", + "\n", + "top_train_ood_features_idxs = find_top_issues(quality_scores=train_ood_features_scores, top=15)\n", + "visualize_outliers(top_train_ood_features_idxs, train_data)" + ] + }, + { + "cell_type": "markdown", + "id": "756333f7", + "metadata": {}, + "source": [ + "For fun, let's see what cleanlab considers the least likely outliers in the dataset! We can do this by calling `find_top_issues` on the negated outlier scores. These examples look quite homogeneous as each one is similar to many other training images." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "089d5860", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:24.319144Z", + "iopub.status.busy": "2024-06-25T23:04:24.318841Z", + "iopub.status.idle": "2024-06-25T23:04:24.565435Z", + "shell.execute_reply": "2024-06-25T23:04:24.564770Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "bottom_train_ood_features_idxs = find_top_issues(quality_scores=-train_ood_features_scores, top=15)\n", + "visualize_outliers(bottom_train_ood_features_idxs, train_data)" + ] + }, + { + "cell_type": "markdown", + "id": "2521aefb", + "metadata": {}, + "source": [ + "### Scoring outliers in additional test data\n", + "\n", + "Now suppose we want to find outlier images in some never-before-seen test data, in particular images unlikely to stem from the same distribution as the training data. We can use our already-fitted `OutOfDistribution` estimator to score how typical each new test example would be under the training data distribution and visualize the most severe outliers in this additional data." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "78b1951c", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:24.568629Z", + "iopub.status.busy": "2024-06-25T23:04:24.568075Z", + "iopub.status.idle": "2024-06-25T23:04:25.242375Z", + "shell.execute_reply": "2024-06-25T23:04:25.241791Z" + } + }, + "outputs": [ + { + "data": { + "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAs0AAAIICAYAAACVatOGAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8WgzjOAAAACXBIWXMAAA9hAAAPYQGoP6dpAADVGUlEQVR4nOz9Z5RsV37dCV4XPiIz0r983uIBD74AAihvWFWsIqliUaRoJFFqSi1p1OpRS5o1MiO1WT1reqbXSEuakdZIo5Yhm5ToXZGsElksxzKoAgoeeACewfMvfWZkRoa/Zj5QE+fsfZFx8wHIrCK1f5/uP0/ENeeee+7JOPvsv5skSeIIIYQQQgghdsT7Tp+AEEIIIYQQ3+1o0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZBDs+pP/t78IYdfxIc574XDbjTFfiuu5ECcYOo4VJ/Rdh2P+LhV7kflD7Cb0UfxyPKBDbZo/dBeXoSxsbeO+ymWIC2N5iP3Q1Ic3NYbf9bDaGwsbEOeKWLfVyGx36bM3F9YhbjohxH6tMNw+fHAWyibyMcSln/+ms1f8q9/5EsSjMuqkbjHl3/E8+l/Pda1N/DbvK3Us2lfge2+67TiOE+RyEOcoDnz7vlGbTyKK8Zr6A7xv9v+zUYj3KcbQSQZ9iEOKY+sLqXrnP6QqzH446Xmi7/6VH/oQf/kd5bPfugaxR893IWffLyyLuB8h7Gvhz6ariCqJPpAk9g2idkCxm6rvnc+T23YW9sf5m9SEHIf66OuXzkNcyJl+5MSZe6Gs38P25vDzaR09K5fWxx45NLL87fATf/dX6S8732d+1j0Xrymm64gj83zzffJ97M+xfThOEuOzH1n7CkMsc+k8Un2hVdfpR3n0eXnuzm3Vo9/X+C7yNcXUSdmP1CB1TakT3XFf3H64rn/9n/4472zPsO+T+JMBPxNvhn5pFkIIIYQQIgMNmoUQQgghhMhAg2YhhBBCCCEy2L2muYia3d4AtR+RpTUq9NtQ5uXwMEmOdCOgHU2p7QjSt9LnY8fojDLkc45bpMufmx5uludQsxxfXYK4dW0R4uZqF/d9eGq4PV0qQJkX4fWXt1AbtfLKVYhX582+Jk7OQdnZ4xj3L+N5Xb1htNmvrt6AsvxkCeL3O3tHtVyEOGYNXGTilAaeNKas84stvXDYp7IY65Y1cJ6H96Lv+nYhlDkJtmsW4wW+aU98DQHpb3P0DLguP4rm+6xhZk0t67BcB/WYg4HR6vf7pH8OWUvN3JmOdi/JBXidvod1mrfqNIyw0iJqb6xRtfuVxCXNacJaT4LXcLg76+JSemguh+LR+s0s7E8n9PsINU/n1eefhPif/y//I8Tv+cj3Drf/67vuw+N4rI3FfYOOm9cc3OE1vS18fIe59O4IrOedHteURpefb9QW87oKvmbSB/v47Nt9EvdPYThaR+tb71Ku27QeGL/L1+SPaMepa6TPcr8bxZY+mtb08HklrDW3ni9+b+xr+yF2o38Vf/LQL81CCCGEEEJkoEGzEEIIIYQQGWjQLIQQQgghRAa71zQHqB1qRzjeHrRNfGS5hd8lgVhMXsJuwWgwE9b/UpyQjpH9b+1jeRma5oT9IttGl5zUUe/rn5yGuDaDmufq7QbE/b5lAt1HQ+gOefK6pJ8+UD6Mn1803sybT78BZY3pCsT1gxMQnzxivJmPkMfz0q0VZ78oFrCpsc7Ps7V47NOZ8jVNdixP+YOytpW8NT3SpdmWyAPSD/K+eV/9XmfHcwxD1BKzrynr/Gzdo8t6VNJAFljrO0I3Wijg88TXMNp79Durb2ZfZi+lfbTWM7AGNeE6cSg2+/JJcxrFrHEe7Vc9stvJWGhxZ5rmDNN6qzwKcc3Fy89+A+Kf+Zf/FOLLb7wK8cGjR4bbzU3sR8qVcYijCPs3W+/LdZf39+93m4BeCHy
ffcc+T7znYWoNBt8b811eCpFqt95oz+N4hKc4twHWDjv2M8Drf9jjmQ9DfWXi2t7TO68BSEdv4lVtaZ5T7XiUBt5xnMTSnlO1OxFf/z6S5Rkt/mSiX5qFEEIIIYTIQINmIYQQQgghMti9PKPdgTBIcJr3xpaRZLy6jdMW765PQjwW4VRhvLJqtsNVKHNqNQi9GlqXJUWaErHs7RKeBiOJicsyicjIKJIKyhzcrU2Mqebco3iNOWvqJnZJErCN57H1OtrZJTWsv/IZYyt3uIdTaJtXUWKx9sxliJdnjHyjdvwglB05gpKTvaTbJhtCtkkbMbWVSqOdSmkb2x+m7+K+Ypo6jkOUzviBubFBgMdhWzjXRWs3G07Dy3qAAckgel2Ub/Qt67xuuwdlrVYT4g5VXTjAz9uyksnJKSjjadeYUvpCivLRwoM9hy3nWIJhz/Tnc2gvFtG8Lhtb2uVRRnvjeojSSYXfdPOPvpw1hWu+cKfTvQl9PuybdvDklz4HZf/x3/4LiBdu34T4wDxKxN71hDGkDPLYB4fUlvmSfeu02KarWtr5GXqncfsN/ANLonzzTktitpvk3PVsdWqeG9chK7KUHRtN63M6a0s3kbK6Y5tLtvtzdu4L+cb4bDFH52FLteIE+8mUvC6VzpsOZpXHLMfIkCzZyipO583WgELsNfqlWQghhBBCiAw0aBZCCCGEECIDDZqFEEIIIYTIYNea5vg26pCDOuqfvnDx+nD73z53Cco+fnge4p+46yzEj0waO6NCCzXNyRbqN50Gxm6R9GKlvFVGaVMrGIPYznGcxNIAOlvb+N0AdXxOiBpvh2z0QBJHGq78OJ7H9FHUT7duoMZ582WT/ro9OwZltdOoUz7aRu1Z+6pJo7309EUoa87hvu539o5uSDq+aOf01mkpJ6el5ZTI9jZr3kj3yilsybIwGthp2MlijuyNOKXtqGvwyFqrkEctZ4U0uHN10yYGlBq818Fn0c/jvpeWsf2sLJp4c40t90ifSim4PdB1j9Z17jWlAumv2W3SqvQ8pSnnnq4fkqVYZN/3Oz0zatt2Su530KYvZXFFP3l0mlsQf/WLvzPc/tWf+7dQtnwLNcyHTxyH+Ad/5M9B/NEf+DFzHiltP9vIcXp5s12h/nqqhmtj9pK5Gj5H65tojToIjc2ol0MrT5+0w2xl6UJZuGOZ4ziOT2trYrI+hdpMrdEYoZ930IaR+xzeF1sDxhG+OzyrY3X5u/TwuRlDCdB8e7yexaF4hCUdWUd6rD0XYo9RixNCCCGEECIDDZqFEEIIIYTIQINmIYQQQgghMti1prlHGq5F0hZ/e2VtuN3oojbqP7x6AeJV8kD+2+/6/uH2qSamaB1fvQVxbhM1z1FrDWLPSoXtkg9nVCQtXpk0z2XLp/nAHJTFB9CH2Wmjb7PXJ40XeHGywBV9dHPHqhCPz6GerrRktHedm3i9jdtX8TynUCNYO2uu40QPdWlt2tdeMgg5bTSldLX0c6zby0qj7dhepgmnT8b7kiOdH3v/unaKatbP0X3MkW52YPl+9wfou8xptNf6pEum/1/LRaM15rqqlDDF+9FZXDMwVUVdcmvaaNeXlvD5WVnH5zhO8BnpWHXgkff0fqeNrRTw+H3SevctjXmf9OYsv+6TR7vt05zWNGfpSpmdte13opdO+ZOTj+7mBt7L3//ML0H8mV/+34fbzU3UO5+65xzEn/qxPw/xhz7xp/HYrtVX0nmV89hmAp/SVVvnPV7C/qlS3H2qgLfLhx+/F+KrCw2IL1ie9xvbqHdOPDzvXIBtMbLej+kU7uy1jOfFPs22JjrhV0cq5zSns7Y9nndec+E4juNTX8ga54HVh2Vpqfukh2bPZ9c6VhRhX8j7znE/Y/dBpKfPBbROSYg9Rr80CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZ7FpQtlBDT9/la1chfm3R6GPnDh6AsicefRzij//AJyFesfb95B/8BpSdIc3
lB+cOQzxG5xWtLQ63ky5+19smY9c2aqtcv22CLnoau4fq+N0O+jS77Bdpmwezv6ozmqSIn8gfMxrw/AE8j8rNdYg3Li9AvHLbeGa7x2egbO7coYwzeefodFDHHQTs9WquOYrYDxm1dgxo4li8mvJpxuJCjjRyOevYZITrZWp4zXmwr2uhjDr1WgVjPs+8pbX26LABNbXGCt7zAvkpT9bMsSoHUZs/UUWtZjciz+dN085bPdIB37mh8dsiRxXRpcPb5zOgNQZRTN641ExAFk96TQ4ZbhXJjsGbsbNWlLXsK4vorfy53/iPEP/eb/8KxNtbRsf8wMPYB//wn/tpiB985H10HqSztfSxJdIwF8h32KP7VCnl3nTbcZzszvAdpLGC9XfvyZMQnzlq+scLl69C2Ws3cA3LFrUvz3qV8n1zSSucRLRmgzzaA6v+YjqOQyH/6uVbx06tDaG1IHyf8nQesdUEBrQmI9XNZq07cez+jHz3XWxrCa1HgH4mIQ9sD7XUQuw1+qVZCCGEEEKIDDRoFkIIIYQQIoNdyzPWHv0QxJdu/BrEjz72vcPt93zfx6Ds8GGUVOTJOuriK88Mt5+/eh3KXqNpoTduY/zusTrE5+ZODbdLbbRUizYWIfb6JM+w5mi9pQaUJZsoxyBnLifycdrIr1rpicuYqpinn1yy/vHIzsieC0scSnV6eBzimSlKs73YGG5v3Mb6WLt9GeK6s3d4HssxWPrg2QF+mfQIyQiZBNs7cZpsLmeJQdI39Zu2xdv9/5gRp8bt43GqJUrLTtOZZcuKq1pGCUWJnp88W0nRNRdyprxAdmAlsgBz8mhnN9Yw1lvtLu632Wo7+0mW7VVk1WEU8WczdADJm27u6qucXRyy/mZ8lbGtum5dw+fzN3/x30P8lc9/FuJOB23SHnzX9wy3/+xf/htQds+DKNdI2NssJRcybZ9TYRfIerFI8o26JQHiFNKdHukN9pDbt1Gesb2NNnzT09PD7buOooxpvF6G+OoiSv8WVs2z0OpRP0G2lj5dcnEbY9d6RiPSEYUxvrNSEil4MbEMAj+akr2FnKbdnCg/AmwDmmU/GVnvPP6s647e1yiLxyTev/YjhOPol2YhhBBCCCEy0aBZCCGEEEKIDDRoFkIIIYQQIoNda5qjUycgLn/iRyD+RMWkgi6SXrPTwZTBWw1M/9ppLg23D1P66vU1tPp5qova4pdvok7tXZbe82PzuK9TB89A7KyjxjncMufhkrWNE6BWOMnhNbqbuK942zrPOUwb7lBK8phir4q6UjdvrimVCpV0aEkB4+IZo9ObP4SWc/19TKPd66IWj9OscgyQpi1yd06zncR03zilLalMWYrnW3/wOY0x7Ystm/CwlMa5g1p01tzmc/j/a9VK531kahrKyqRD9kgfPuijvZ+TmLoPSB++TdaJsY9dQsUz9Rm4uN/E3V+7p7TFHd0fe5tubJzSfu6XXR4dh1MoU3zx/AvDbdYwf+trX4K436P7Qbr4I8eMpdpd9z0MZWzrGNC6ijLp5m1dfKngURk+u2Nl/K79+TbZFjY72C84DlkxvoMcOXIE4o2NDYjX1kx/2O3iO6teQTu2R07jWpKFORNfuYXvrPUGas3jBOuvM8A6sLsVehwdn6wTkwHeR+x3RpohptaVpNYIWDrkiPrVOOa1I7xvPHYQ7PwOy8L+PH+Xz0OIvUa/NAshhBBCCJGBBs1CCCGEEEJkoEGzEEIIIYQQGexa07y1jPrXwgSmyo4io43c3kIPS06x2dtGLdnBSaOHPlCvQdkL569C3F0kr1gP4y+2jM7v0uUrUPahcdShPTqJXpxTFePF6a0vQ5lD5+wW8TxdF6sytjyg3RX0A3XJd9f2BnYcx4lJ5+dVjMbZHa9CmUN6wpi8N11Lu5iU8X8k/wxe/14y4FSyrJ+zz5N1yCSBixKsr8T2+eT0rRRyemtW10WW9s4j7aFHOmWf4sAy7HXJK9n3SA9NesKQtJ5XXje
psfuNBpRNT9QhHoSobQ1D1omaSpibRV07awL7fWyrE1be3laMx2n2UA+914QRtxnS84/4LqcQTqXKvpN8zhlyaFvfmVqDQD9TvPDtr0P8Kz/3b4bbLz//bSgLB9hGWFMfR3jfL198dbi9uY7rSA6Rd36NvJfrFfKWt+ovpLospPTQtF7BKu6TBrfd3T9d/PHjxyGemMC1Js2mMUwu0bqcuIPPRXdzHeL5ulk/M+ZhH72aoMY5mMJ9X9zG52p12Xw+CvFd4TvoFx0EWB6DjzO1+VTXSB729Az4lr90zhs9VBhQ2wxDeqfZORA89mke/eyNKr8T73wh3gnU4oQQQgghhMhAg2YhhBBCCCEy0KBZCCGEEEKIDHatad7soq4oClG3HPaMr2UYolaqP0DN1uYq6sG8ihFbxQ7p9si+txCgMGtAfsqRpem9EeA5/+rWNsTPdtoQf6+lcX507hiUFdvotRmuLUHskNemawn5vA7peQesd6X/XVzy3gwtH2L6rkcaZ9dHj1Tb0tjr03ncmV3m24J9PEP217R1a1QdrD+N+bytXXmk2wvoIl2u+5RezmjxXPIlzmd52QbBzp8lX+9eG31gu9uoARxYHryvv/YqlN0kn+YwxH0Vi3heVctDfX1tBc+5gPuK6L6Uy0a7v9zA52W5ifFe0+7hvRuE5HdutZMMafsdNv7R3t4pD2hbs0n1+c2vfAHiX/65/w3iS6+ft746WtvPa0U8MvW9cvH14fbNSy9C2ROP3AtxJUf+7jncV7trjtXt433I07qKwB/lX07P451oyd8mV69chbhaw76zWDTPQi6Hz9BggFriGq1Lse9NMcBnueThu7K0hc/NgRiP9UZsnueFAd7j1QjreuCif7Rr+ao7yYg+9k3gdmyv6eAmHgTYPopF7N/YB9zO1cDrCxh+V9hxlh+0EHuNfmkWQgghhBAiAw2ahRBCCCGEyECDZiGEEEIIITLYtaa5OyAdUsS+jCbukla420XN5fo6+lZubhj9ZruNfpirC+gvurSA/skN0lUemhkbbvukrV4nfVh/DH06lzeM5vlSFz1o31+fgvjYPGmHlxYhTixfT9ZduQ55K7t4Xgn5nLqB+d8mbmFdssbLm0Av6sTyA3ZZ4raPesKUTo01zZYOlPVzccpvlHRt1r580svlyA/Z7aP2PKB7U7M0vrUx1DGOkQayVsHyvqXr77axXXZIE7+6hNriTos0833zTOSK+Jj2Y9JMksb54OF5iF17YYBL+tM8fneribr/16/dGm6vb6I288FHHnL2kw55wSYjtJEp3SSHaZHzcMvltkpthLXErk/Pq6Xn/NLv/xaU/crP/1uIb11DL3lbv5+Sa6YuCT/gkZfuVtPcr5ef/RaU/fRf+EmIq0Xy+6U6CHwT50mznCP9fkR1b/s6RxHWXYH00HvJ+sb6yNh+h1UrFSg7efwExPNHj+B3LR/2hDyc/QrqjnuvvgZx6cY1iN286cPr9TEou0xrMm618ZmMHPNecmmtDOuMU+8l1upbPvQJedYn5JXP+/bpmbDXTvCzydp9jkfB7wIh9hr90iyEEEIIIUQGGjQLIYQQQgiRwa7lGT2SK4RdnH4O+2ZqejDAqZseyTNaJN/odMzU9PYmpqu+fQNlD7cWcOqrQulOC1UzrdYkq7dri7chrjZQClK86/Rw+1s+Tqm9fPsWxO8u4/Td+2YxLe3MwEybRRuYgtyl9MM8vZuQrCQJrDhH06hNtEXzHZS+uAUjIUgofakTjE4p/U4Scgpq+n8NnJLo+nM0b+iRLWHPmqKsVFEyMVHD+xSU8LhjJLGYnKwPt/NsO0WpYZvr2FbX1sx93t5GmcOA7P7ikKcz8RorE2ZadmYWpUEBTYdz/eRy2Ha3tk17W1xBedPyWgPiWwtopVi02sgD990NZbNTWLd7TSptNst0rO2UciMz9fWb74fLHMdxfLJ267Rwivx3f/3nzfav/SKUra0sQJyeIresKklPFZFdW0q
eQj+BFAqmHeTIfjNHn2XbyyRiWzlL5pWqTDzPMMJj9SxrQJZ9cLvfS2Zn5yBuNrH/71u2oWxzOTuHUr7pOZTBtaz7VpzH4wxWML65je+/wizu++iYsT4dJ4u56Ca+D8vbWPdvrJr3QS+kdOYB9mdpD0OSSVjPW0qSRDIv3hWPAWxJBssCXXov5cjOzpaZsAwky75uT2EZ16iPUsyt3q6TTDfMzEtO3mQrfRzHcRzXHdWn3tmzOfoaR2vNUn0KN027KEOmNoqUWy3bdrLH8ZugX5qFEEIIIYTIQINmIYQQQgghMtCgWQghhBBCiAx2rWnm0fWA7J96PaMHa7U26bNo8zUgu7pOx5Rvk6Xa+gZacfXINu59901D/OPvfmi4vdBE8csXptEm6BvffAri5PLV4faHH38EyrbyqBP9DzdRm/j8Jl7znzthbL/OHkNNarT0BsROCzXOPtswWdqymAQ9Lmlykg3UV7qelb6UrI8cslXaS/OniO65S1q0omf0doMWav76FPfaWNfbTaNzr59Ga6hJ0iJyyuntJloarq8aTe9mA+tym2zhWFNrXxKXlcjS6/gR1MDn8vgoepbN4MREHcoC0qJvN/C8Ll28AfGFK8bS6tYiWt1ttVETXyyjrd4D9xodc3X6AJStNLEu95qUi1xK02wp1lK6vwz/NrvEZd0t3ptuE9dV/MK///9A/LnP/Npwu0NtlzXMaStGo9kMyHoyn+fU1tiv8r5/5Ed+dLj9t/7W38Z9kV4/JK0on5etifbJ2i4M2ULMoXLLjg2LnH64e3uxt8vaMvYbQQGv+eRp80xOT+N7ZXpuFuKcj89z1bqMAtVPVEPbuIkzZyH2yFLStd41Aa3/WdvG/qoSY/ncIXOsl5ewfSxiF+wEHraBfIKfdzw7LT3d44htF/HdEpBoPrHf22wpSm0vovUeoHn2WJf9nfvdj/WwqBcerejlSoitEVZKG8xyX9rZqDUYrM3v0Nq05hbq+qtV0//nacyTI00873t0mvbRi0y4LlMLSdyd65YZjLBWZDtRtmXcDfqlWQghhBBCiAw0aBZCCCGEECIDDZqFEEIIIYTIYNea5pB8F7tt1DPavozs0ch6OY90SLb/LX+3PUDN5aES6ln++n33QPzYBx4fbneefB3K7m/hvurvfTfEX33ZpDe9eB19mR94AHVo7Hn8yiJqzf5fFy8Ptz8xhZrmd88eg3hqErVkUQ99nW3ZDfsIJm3S/rRII2np0hzSVzpt0rDtISXyVi54dN5942s8jnJBp1BGLZUfHIQ4lzc65iJph5tb6Je8soKaXtZa21rFchm1iJUqasBZDtW29MG3bqHmndNok9WyE5KPs+0pu7mG391soDbz9i28psvXbkK8ZvkIV8ZIXzk1A3G5gL7nXSvtuK2NdhzHOUB+tHsPi5p31imnLGgzdmXj+djeFm9iquuf/Vf/GOIn//DLEPMaDjjsHfhFe+ydzDpA0pXmC9j2P/CB9w+37z6LHts9OsdU6noSkHtWhbK3suvjZwc97O97fXOenT6ec6uH/f1est1uQHyEtMYVS+85S3U7Rr7EnCrcXlsS0bobn9Lczxw6CvELL70C8eWbZk1CQP1khdp8jrTD8zXTdv06raW5gfrVDfL45/aE7216oOj91++SLzPrfa368lLp3/EaUopV6zRiXnnj718a9iVaD9IgL/7rN02/WyqgHvjc3WcgHhvHtudZ9elldBIJLRropTyxTXmRzmOB8k387P/+8xCfucuMcx544D4oq9frEM9Moe6/kMd23gtN++IcB0V6zwTsze1y32eIaB3Exga+D6/cwjU9gbV+o0i67OYmPhPvobVsb4Z+aRZCCCGEECIDDZqFEEIIIYTIQINmIYQQQgghMti1pjkg7dAoG1SfNIGDAXtx7qxhYr1zkqA+7mHyDrw/hwqo0PIpzm2h7vpdA/xsj/Sv23efHm4/f/45KDt4GH065+dQC+rEeN4ba6a+fnFpCcq
+fAXjP/vEo3ie83WIk7bRxyasl/RQG8Raa9fyfHYH5F+4iZ6Ne8ljZ05CnCct3vZWwwQxXmNMeqjtJmqzc1Yrbm9vQFmLdNylPLbjyUm8j9XauPlsGevW9bD+fBImX79m9GKNRgPKWMOVI/1XFGFbPf/axeH2DdLXb5FOu0N+vblSCeJStTzcPnTsEJTVqujLnCdP4vqE8bl2qW1truN57D2ss8XSnOVt7ZK5KUnGUxpe25e4UMT6e/bpb0D8h1/4PJ0H//Zg+RKTBpW9lBm7OCQv/Ij9R0kL2iIf8d///d8bbn/60z8EZaVyDeIB9Q3syxuAbhmfoZA0hh3SNHd65jq6dCPY03kvKZZIT72G6w5u3DSa/cEl1O9H78J+pHIU11X0rTrJWX2I4zhOq4vPdo/WAy0v4RqWf/dvfna4vbmBZTM1bJv3nEB99Mc+9rHh9skjdTznSbznz7yC/crSGmqc/bxZwxGGeN8Saos+PYx5WnsTJ+b7/T5pqUkfzv0qaJ7ZvndPswsg//Af/g8QF8cnIR6z/PQLNLZ4+fJliB+gtVjnzhjNc4neFc029rPLK9gmlpfRN96uryrlYuD+5+CR4xDni+Zdwe328vM4Jjp+4jTEpVIZ4o41Vmn3cF816mNnJ7EuazV8L9n6+hs3bkPZ1ZuLeNwQ34dd6zwWruN6n1eeex5iaZqFEEIIIYR4B9CgWQghhBBCiAx2n0abUoMGOZRJRNY8mxdgme/jtFiBrJHsaV83R8chi5ApSiHsr6DdiPessY3zGyRlILujR8ju6CUrxfK3aBrxtfOXIJ79IMo1xutoIdPaNtKHxMFUzl+5iVM1z332yxD/3fc9DPHH561pj5jSUXP6W0rJalvyeBWc9nHyu779b5uzh9CebJumnGLLDmp1HdNXr9MUZY8kFwdmzL7nZ7Cuc3Rcnhxvd1CiEkcm3iQZRLGE7bpWxenO2piJJ2m6aUBt7zpJLi5ewjaxtGIsDBP633Z8Avc9cwzbItsZ2VZkFZr2qtJnJ8keMRqY+dDNdbTnaTZRDrDXeDQFnKOpxnLR6itInpF08Lnpc05ui7CP/ca5+x6A+PARtIy8deM6xO6IlK9Z8gzbVm7AqYppKtqnKXDPw/6s2Wxa23ivghxOj/ZZukXSD3vGfEBpjttdjLu0r9C2Y8uw3NtLZqaxrXfXsB+JLDnMG4toM7j+JL5n6lcwpXx1Zn64feZhlNsFCdZlj+SK99+L1l5nrGnv//CNb0LZqwO8j6+88hLElan6cPun7kFJ3MQEWWaSrOjpV1A2uN40z0GRZJExtYGIpEQ+tZ/Amo4vliiFO+0rJDmebYWXUKpvd/dDmLfN2btRjvC1bz4F8VTb9MOHjh+Hsu0WyhOefvZ5iK/fNFKh2Wnsz5stfB+2STbhU79gj6/WSbI1OYHvx3MPYN9Wq5mxRo3GC1FMEi7qY9caJBPJ2f0x9nvbbXz2el2U7HC/2emYa2bpWK6M48m1RRwvvPrS88Pt1aVlKPv4937AuVP0S7MQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkcGuBUExeQOxNq9cMnopn+zEel3UjbKWyrVSlAY+aqfKVbIxIZ1fvIlxMml0N50EtT95svXKkR74eyyd40k67u0l1LQtL2Ha7DqllAws2y/WYU3V0ZLo1UtoffRP/uBJiOc+YlJbPnwCdbRhj+x5WljXfn7n/4uSItb1aLXl2+OZZ56GuEWaJlunxJZDB+cOQzw1hbqsqqXTzZP2jtspW41xGu0+6OsoVS5puOy02Y7jOEsLpk28+MJrUPb666iJX11H3VVMtT82abTFrGGenkPNW6GMWkXO7x1aQlLPIZvFFuoH17bQkieKzHklEetz9/d/7oDuZZ7WN+RszTOdm+fis56+t+a7vR4+Q0eO3wXxw489AfHiAurT7TbG/SaTJGyjBwacWEbrSnJ51IZ+8ge+D+L/4b//H4fbk1PYZvqUejeVE57aY2S
dJ3WbKe0178p+tiOqj3Jh/zSptVod4okx1HWvLpm271XxvHotfNYXr+IahPyq0XMWC/juOHgaU5jnCnjfvAoe66f/3E8Mt9+4+CqUPf/iM3he1H997vf+03D73D0noOyD73s/xHcdwfULk5PYRl57w9TH1WuYmrjVJltQeudH1FcMLLvXgNYt5ckGNE9twrZeTBxa07OXLy3i7/ydvwXxR15EPflnP2csHm9SKueQxkCHDuM7rbNt6nOxj+m687ROqVzE9lUu7PweHyOLwskJHD+sreI4Jh+Yh7tYwHvKawLKFVwfMwjxWK1tM94q5lEfHVKf0e1jfxSRzt22PG5soK3sZbLzu379KsRHj5j1Bj/9Z38Sys6cOuXcKfqlWQghhBBCiAw0aBZCCCGEECIDDZqFEEIIIYTIYPeCMhIPBQF+1dbm5XKo2fIpBXeR0i0WCkaT2eugtqVCuplbq+gFuJRHPefkuQeH243XUf81t41+h/0Ej3XE0h7fV8Lre3UJ/Q5XF9HTcmIW0zHnrDSaW1v4v0mhQpqkItbPy+RT/NuXTNrIh07fC2XJALVSSYTX5AbmWKzndUZ41b7TXLiCvqfT06gBPzRv0jvnyZu7SCk3A9K8RZZ+tZNKPUz5kx32n8W6D3yjvdreRp/my2+gdurrX0Pt+XPPnR9ury6jBj5KUMNVGcN2ffgY6g9rlo65RyLSHpndbjfQPzlPPulHD5q6PXQYj3PhKt6XW6uotfasNQa5APXQnre/prvVIraLQoD30k6j3SUv3CijrY/yVg7ouj/wvd8P8ZNf/RLE6yu2JpF04HRcz8e/FCy9J/tzt0hXe/8D6O/7D/4v/wji48eNryxrmD0SHqerB/9ga1R77MNMabQj+q6tX6yQnneiinW7l1y8gn320cOo0Zw9bJ6TdrMBZckm9skOeVO3Oubdce3qRSjr9PC+HTqGqa/HJnGNxn13GR/w/+t///eh7OWLr+C+u9hHhdZ5lCt1KOuRTjQeYL9x1EoD7TiOM/+4OY+bR/E9+9L5NyC+RWt++hGlWo/Nfe8PUA/Nun5eh2Kn8OacD66/f7/78TjmwQfuh/iu0+Z5e/rbOPZ4+ulvQ3z1wusQ1yfM+/Cuu1ADP1HDdlogX+IC1cHqivEi3qL2MU0+/QmlNK8VzHqrEq0PWqP07z6tG1lcwnTWtrZ/hvyhub+pksZ5kcZXL1kpvJeobJZyC/zZH/1RiO+519RnkXKEcNvbDfqlWQghhBBCiAw0aBZCCCGEECIDDZqFEEIIIYTIYNea5iJ78lEe8k7H+O66MY7FWcM88FDTZPv9bTW3qQz1PNeX0Vfw5TxewkPnjH5lMEAdWvc25h2PQ9QDO4m5piM+eUWS9mWVdKQnSRsUWFrFmH2pc6irCVhn4+O+Xt80cWsTz7lGueUd0hvGls7TJX9HN4carb3kXY8+CHFI2mNbm2fr3xzHcVo9vI/FHmqcx8ZNGyGJsjMgLWdI9rSrK6jFe/554735AvlwLixi+1lbR79IW5aVL2NdT03WIa7VUeMVVNE/c71l9GMRPWsHqqgvHKthXKV9FWvmXF68iFq6ZgefRS/A87bXMgxIW+0l+2iS6jhOpYTtwiXtrGOdD/sBs3RtlJaNf0lISI9+5p4HID5338MQf+UPPjvczvEzRoflGowtsd+AGmsU4b3qtPC5CPv4TA0srTH7orKWmh5HJyHRYWKdOGtj+xHHeCz72NUi1m6psH990EYLz6v12jWID02a9QwHp+tQFszic9GlfiWxfIu3tvC+dG6iR/sgxj57ooE+7LW6WR9z3z3oI3v2vuO4L2ojRdf0jQ3ywt/cakDs9rF8YeEqxHXLl/fuo4egbG4a9bzPvoJ1ef4y6lubbdN+SiX07I1ojQY/u/Z6hJCeASdk//V9hPqQmtUvf+RDH4Syhx/E998zz6LG+amnTR6DJ7/2ZSg7eRp94u97EOt+cg41vasr5rxu3roNZbOzcxDfvI7
lrtWHlirYPq5cvg7xvfeh9nptBdfDHDtmxnVjdRw/riyhF/VT38L6eOWV8xBPWXkLPvbRj0AZa8t5HZxj9d/c67N+fjfol2YhhBBCCCEy0KBZCCGEEEKIDDRoFkIIIYQQIoNda5pZG+qRZ2HJ0nD6pBzxPdSNNMn/tjphvAGdNdTFFMuoz9yaQv3XV69hjveTq0Z3Gs5jfvcaaadc0k8PBqa80Kbrpe82tlCX1mmjDjklnrGPS/Xhky8xScKdtZ7RbbXJH3TMwThJUJyYWLq2hO6ZTznr95JKGXVGrCWyI6461p/aXsqOgzrQtQ1sPxdevwDx5Uuoy7p0CbV412+Y8ojqdkBazRz5Bh+YMXqxPOnUXQ8/myN9eT/G+hifmh1uU3NxquTxXKnivlo91P3duGyusdXFdu3nsS498nj2HXPNnoPXH8f7q2l2SNs9IN2t3Q6iiLWOo/04bc3uKJ9Yx3GcUnUc4sfe+yGIv/X1L5pzJI9e9nrl0+r3rfOm6mUtaK6E2v61TezPjlm3i5puat+DcGcdsuM4cOf7IT8XeBHdPvdJZpuk1E4xv3+a5iSoQ9zpY0d7ccHU3+XbC1A2VsHPHjmCvvwzUweG26wFbTdx3cTCMu67S1r0des+ejl8HqcpHwCvndhyzfNdKONah9k89kmtDVwf1HNwjcbKyy8Ot9vXcC3E7F2oI33fw2cgro2hD/9TLxhd98YG1k8UU19J1xyUTZzEWBb3SeO8j6SeZavfiEmXPV7HtVkf/ehHIX7gAbNO4plnnoWyr3z5axC//hp6dT/x3schPnv27HD7zInjUDZBXty/8Au/APFv/NZvDbc538bhw/MQ/+Cf+gSVo+692TTa/m9+/RtQ9qUvfQXicgnfaR/58Ichfuihh4bb9XodyriueQ2K7Un/Tryx9EuzEEIIIYQQGWjQLIQQQgghRAa7lmcUipS6sYjTJD3Lci0i+zWWH+QoPWOxbKYZy2Sn1evhdPLE7AGIn7qFU133/O7vDbcfPXkSytqLOP3U6+B0ZmJNN290MWVkl6Z7B5Smt7mNVnCx5X3GUpaYrLp4mqdAKcrX++ZYazTNesAla6iEbMGsa0q2cVqMZ2z38j8ozyULP5rytutgcxOnMzc2GhC3W9gmbHua1y+8CmW3bt6CeLuJ9zWK2A7Q1OfEDNrCTZGVTYXSm45bqa9dF+9p4uIz0I+xvJ9gXB6rD7dbJGdaXsf68ZrY9nopuzCrDXj47IUksfBT2hirlVDb2t8k2g5MfzqO44Q0LWenc45TqXlH7gokBPx8srXUoI/t79yDj0J88oyZHn3ZSv/qOI7jkidikkqzbY5VLKIc46FHH4P4x/78fwXx3NHTEG9YfVIpx2mNOdV1NLLclsKEIX+W7wuETs5K82unCXccx8nn9u93m8YWPvucbj7I1Yfb2220jWssrkO8QPGMZVF38jhOU4+RVKFD6Ygb2xg7VlytoTwxRxKLCUrxXhgzcYGm1z1Kt1yZw3fpJNlgbk0ZGdLa69ivLj+HlmD1U3jT7zt6DOJa1ViuPfUsSj2u3WxA3OnSs2tdB6eWzwUoUfpugdPUs8SQ5a6zs0aO98lPouzh0Uewf/nyV74M8e9+9nchfvbbxr7tUz/4A1A2P4/3/MABlFx89rOfG267NLZ4/wf+G4hz9Py8/BK2iV/5lV8Zbq+uohToU5/6IYg/8AG06Bsbw3erLcHgugtovMTyOrvu34rFHKNfmoUQQgghhMhAg2YhhBBCCCEy0KBZCCGEEEKIDHataWYLItbO2tpj1rz5ZL9VJl2JZ8UH5mahbHMT01XnfNQ8Dw6gRucXvv7N4fb2sy9C2fdQytEC6fhWrGt8qomWcrFDdm2kLWt2KR2xb8rZGqtPoj83wP9dpmbQVi8ZmH0vbaC+9d5x1k5R3dqXSPY88T6mIH3uuRcgXiNrwaWlpeH2q6+Sfm4ZU272uqiH7tt6+hjL2Hos56MGrlxG+7DZQ0bjVZv
Ash7lGvZ8tuwzsU+p0h367Ooq6pJ7Cd7HTmzqh1VYvkvth20GSU8HjxunjyZdMKdht/3uUq5l++w4x22brYYw/S7WgUf6PMfl/mzn46Ys6EhTN3PgIMSPv8+keb1AbTkmOyROBT5m2Sm954OYLvbH//xPQ3zizD0QsyViz7Iycx0+Ll5TlLDGeec+i7WOnot1zRajBctWjtNm+/7+/W7juvgMsjbSsdYVFAqoqXQ8fFUOeqh5vr5o3hdL65g2e3qK1kbUUadcIl133lrz06B1KOuUonuK3lMz0+Z9ODmJx63W0J6ONc45iieP3zfcHp9G+9bmjSsQd9ZxvZBP7Wt+yvSrj5E9XaWMqZwvXcd9bQ/MviLqA9ju8LsF1jAzrMO131Pcr83Moib+z/yZH4X43e9+AuLf/I3fGG7/zM/8LJQ9+yza2U1M4L5/8id+Yridp2e1Xsf34T//F/8C4pdeRE3z93yP0WL/jb/xN6Hs0CHsM5mUxbH1TmMrPK5rjt8JHTOcyzu6NyGEEEIIIf4EokGzEEIIIYQQGWjQLIQQQgghRAa71jSzNjSfx6+WrJSugxC1MEGAcYf0Tp6lFZ2YRI3NxARqXxcXMR4fw7SiG9a+/+NN1F29Tl6lh0m+ecHyX32GrrdSQo1qvoS6mhZ5S3qe5VUdoT4nnRYaz8sjjbOXM9e0to36VaeCOjWXjgVuunRcr8uf3Tv+yT/5ZxB3OqjV61t175EmskZ+yL0u+oAHdm7eGMtK5Cc+NzcH8XgdNfSx5afc7aK2rFqrQzx/6AjEa5af9JWr6B8elPA+heTj7JJOK7J08OzjnVD98LPpcfuyvXHp3+SENc0pP3LrPMhvnb1I9xr2DuZTt3W4Caf8ztK92RWToUdkiZznY1/4+PtMitwv/d7vQNm1K5dxX1SHj733A8PtP/9XUQc4N4c6wAGtSWCdMnibRqN1ff0MH/rAanPstRxSdVWovU5UzDNYLuJ3OUX8XhLFpL0mjbN9Y1O67Ty1/YD0wQWzXqQ/QN/lmysYL66hDrlCyx+qVp8+VsXjjNXQK35zC/vRXtvog1dX8V05Rqmc5+bx3VmmlNyR1b4C8qgv330XxH4LrylZx2u2151M1fCC770Lfa1j8qy/cNWsaQnpHvbj705Nc5aOlvsfu4/n/j7Vv9NDc/gw6s3/+n9j/JTPn0ed8ec//3mIn3rqWxDb71qfct5fuXIN4pMnT0H8t//234L47Nm7dzxn1m0zrFu22WvNchb6pVkIIYQQQogMNGgWQgghhBAiAw2ahRBCCCGEyGDXmua0fpF1JJafa8yen+SJSt+1PW3rk6h3OnMXejq22+g1vN1GDeukpVFNxtFX8MWbNyF+fhs9oLct3U2OfJh90vEVS6gP65PUOHAtHVKMmqSYvCYd0kQ6VF+RVT9XO/TdDh6Y9+1a58FSTbau3Uu2yF+UNU45q3490ijlydOyNI76ugOzxtf62FHUKN9F7WerjXV7axH9kr1ccbi93sD20SYv7ktXsD1tWxrxmDSP3Rh1ag55ouZT4k5znvw8uexB7JHmmW70wPJeZl/clLSMPdTtfVPbivbR59txHKdPHtLsS2z7obNmLi2ho/oe9TDwc0O+xKw5PHzi9HD7XY+/F8quk6aZ71WhaNpfdRz92rt0/T7rbqmP9uxrJk0z+912uQOjay7kLA9y0jpyuyiS7/C4pWnO575zv9O0aR1FjtcKJLkdywp50j+nPNrtPgr1mK6L3vBh1IF4rYV948Kq0fDCe8RxnAKtD5ok79zpSdNmJqhsu4drWNa30PO/UKDz9s2xx8mzvlZCr+lCDttqeBD3FXcaw+38APvRiRr27+dOY+6FxPLsvXZtEc9xf7ugd4w70eGm1rRQn8H9j/35+++/H8pOnDgB8UsvvQTxN79p8lxwH/qn/tSnIH7wwQchrlQwh4Z9nmldNucS2H197LeGmdEvzUIIIYQQQmSgQbMQQgghhBAZaNAshBB
CCCFEBrvWNJMs1yErRSewPFx7fdQs5XKo6fI8OqxrNF4BaS6nZ1Cjevou1IM98+zzEG+2jDZvrI4am8kp9OjdJl1NvN0YbvukWauQV3C5XIe4T/rO0DHnkZCeh/WDLmmaWfcXW/rLC9u4r7U87muqQFrOvNlX4pE2Ntm//5nydF8D8vwtW96kc3OoaZuYRL1croCa5pkZ4+1dH8N7Pkjws7cWlyC+cAW9TLuRqc/YJSGsj+fMskY3MH/wAir0WDM72jfYFqCzhos/m9bv0hqCxNb6kj80a5hJr5tYOuHIiahstNfmO02nj5pM9m22va0T9nROuA5HaZr5XvFnaV/UOeYtXfIj7/4glH35938X4tVlbI/PPW00hdcuX4SyU3ejPnEQYn2wP7d92nFC6yTY35ZCnzT2ifUB/qpHXy6SH37O6s9ScsR9XFcxiLG+Yoqj2Dz7OXpHDQZ44iX27S+Yd1xM3408uk8OfjdXxHeLnzNa405zA8q21tYhXlhqQOwExku3Sr7Lk6RLPjiH79a5OcyREHqmPspXl6GsXkBN85F70Lc5N43HCvLWGIAaQezgeGESd+3cf8a8t+MmXf8K6rL/SyDVZ/M4Jo7fdNtxHKdo9U2O4ziPP/44xLYGmt8zozTLb3asUef4x5k/OVcihBBCCCHEHqFBsxBCCCGEEBnsWp6RpKaTsdy2skpbgoy2RgosSzGeVvVcnBKfnZ2H+MEHcUrg1ddM2sitTUztmSMZRG4Cp6Ps6awK2VnxObPdFafpda1rDkmekZrGYAsZmkp1rPTAF8j67ud6ONX3qRAt+I52rFSoRZpOoUyVe2nk8uM/9hMQBzk8b9eWPpDFzs1btyG+fAOt3q4uGoumbbrnPK3q52nasIxxbE8tkwQpX6B8tzmUYNgyG49TTrssoSCrwBESi1EpV//zB/A8Rkzf8XH4sxzHVjtPl+2v31N3sLPshOGy1CdTfZQdj5bSsMUaW9DZKeFPn70Pys498DDEX/vC70G8cPPGcPuZb34Vyk6cOYensbOix3EctPrMulV+hqWoLecYUN7sAsnJ8iTP8EAmwkem/n70ab4tyhWUCbL6ypYmBXRP3QSf13Yb+xnXsoLjZ5+fV9fFPtx3sbxYNJKyahmlG808Si6alqTQcRxnOzTyxeUtPMe1dbTXvHV9AeIK9W+zkybN9vxBfO8u51EmudhCe867D2Nq7Ik5I7Hwythv5n2Ur+SLZDlqKeyqA5Rn5Lcazn/pjLJgS70rCH4flEqlHT555/se1T9/p23j3g76pVkIIYQQQogMNGgWQgghhBAiAw2ahRBCCCGEyGDXmmbWnoVkd2TrV9i2isV2XO5bWlC2RcuRPi4hLcyRY0cgrlSNJufF589D2doq6qECSs9sp1COYrTBcVlnTJpUj847tFKFDgb42SJpxwYZ/7oEBVMHgwj39TshXsPLPSz/hKVr/N4OpS8t4z3cS9p0qAZZJ91eNtZvqxuojxuQELJP+sLAsngqFFADWKqgxY5XQM1WTG0AmmZK9rpzqmvHcRzfagNeQhfMz4TDFmi7994aZe3zZozSj2XZ1cFjn6Wt3mNYl8vrLOzrzK7OEXq7OzyvlIWWZUFXqdWh7N0f/DjELzz9JMTNhtGdPv2Nr0DZBz/+gxDPHTwGcRSypZrZ5qv1+ZxT+miqBWtdhUvtvpAnzS4dC+4b75ZSgQd38Ea6U9ptWqdCtRLYj2ie1t1QW/f5vWRb8vH6H37GIlx34tFaG+zd8DjlsSksLaKlZr5nLNjCMqYN75PueNBBu7bmNpZvWum9Ly7jOpIxatdzdF4L11E/nS+Yd/HkNPbRs9NoKTpZphTclu1ecOVVPO7WPnoW/jHh7aTo/k6dxx8n9EuzEEIIIYQQGWjQLIQQQgghRAYaNAshhBBCCJHB7n2aOUMr6VXs9IyFAvow9npdiDmNtu9b6W8TFC7mUl64eNzeJurDpmeMn+QDD6Je59VXUQ+1vIipQcP
Q8sYlHRpLfwJKM+uQhjWxtHo++XbmC5yOmTRtA9xX4FkHD/H/nHYPr/9l0sC9ZtX1KxGe809uocYZE6G+s/zu5/8Q4pi8SV1LT54vYR7VgDTgBdKP23rDPOW2TkhLHPukN/QopblV1+y/67os/MS26llpyVm3H5No1OdU8ikdpPVM0HFdEoa6Hmtqd/YBz/JlZu31KI/nO9VWv11SXvHpDxhG+jA7uxE9755U37izMP7BR98D8Zl7HoD4208ab+al27egbHkB/coPHT0FcUzrHaAOOOWtQ3B90L3NW89Y4GMbCcmzntOb2572Ebddui2UnfodJaHXXURrI5LQPM8xpUb3qd/gPjyXM7HLWvvU+gU8bp/6cPguVVDAuQaKGFcsH/+kiOs3+vQu7dBLbdBD7+W+a86zTwsK1tYwvXd7G98ltQr631eqpn+/uYFaavcSppIvunisMd+0n2IR99vzpGkW+4t+aRZCCCGEECIDDZqFEEIIIYTIQINmIYQQQgghMti1prlWrUPMGl9bxjfKw/mP/uDsWM46yZSmi/TBhUIFYs/Sdx48eJA+i4K5xQXUDC7cNprBjXXUbLF7ZkCaXN9DP+BczVStS5/tkYbNjfGa+mELj2zVV0A+w5GL+yrmSB9t+at+vo91eaWPn/0ZZ+/IV6ch9nN4zV5g9HYu6X09Mm/l9uTZ/rz0XbabTUhf6DmsW975/8i0xpm0nbHZN/vgeh49MHQNqWuydcopuSlq/tyYNJP4cbjizOeL/GdjNkcesa+9JuV/m2oHdiFpZ0f4Mv/nnZnNzDMZ3Z/ZFR6Rznhy5gDEj7/vQxBfumD8bD/xQz8CZcdPnYGY14qM0mmn9d8sxMaQPYyL1vOaC/DDEe1rEGKbsdd/sCss6333kpCeG75Gu5biEO9bTOtBHHrmYuuVl6O+jd9ZDvfR1IdH1jMXc93S82ivOXAczDXgUF+Yp9r3aP1Hr4vvHWdgNM4uvdOjAWqY++Q9vb61AnErqg+3S+MzUOYH+F7eIk38UtccK4rrUNb39i/XwH8JjMoX8CfVd/lO0S/NQgghhBBCZKBBsxBCCCGEEBnsPmmpy7Y5ZDPUN9NZEVlx9WlqJ6AZgFzeTF8N+my9hZ/lqS7b6o7PKybrmloNpRyOM4/lYya95+oqTi81m5gWtED2PT5dc2hNXfdDsupycRosomnuSoD77vfNNFmejmvXneM4TruDtkF+YE3J0bTX+X2cXi+QjZzDlnOWlZLrjU6d7nokMbAssJKA/g9MfZdiTo9rSS54MoqnQnm6yi7nsmiEzMFxnNR0uf39lE1cxpS2P+I8s9J1j5JjpD67z/IMl66L68x7O1P9I6YeR8lddlM+qvSRd38Q4pnZueH2A2RPV6qi3Rb3wdw+7WN5qesbIQdyHKdAMp0gZ/bt02c9mvbn1Lx2zG15P5uQR+fFDpL2y4b7nIhs9RJ6pzl56/Mk3Rj0yEaV046TjZxv9Y3cb7Bcg+vPt2wvHZde7yTHKFQwroxNQNwPjeSiP0ApUJ+kHJ12E+JeB1N4b1kpups9PGlu16Uqnkfkm/d2NyYZZP47Jxn4kyhX+JN4Te80+qVZCCGEEEKIDDRoFkIIIYQQIgMNmoUQQgghhMhg15rm5naT/kJpRfPGNiYgi7CxGmqUWJuIlnOkBxtQqlPSoQ0o5bRt8VQpof6J9V/dLn53YsLYudVqY1C2vr6Ox6GdzU3hNY5VytY5kU40wevv93dOwe04aOezvorn4ZFOOU8pzLdbRns2oOOMV6vOfuH5qL1mHR94FpKmmS37XJ9slywLrJTulfSXrFVMCVCt8jvVyNr3+U61Yfx5+5lI2UoFo7XVjL0v1jRnpcYetW+2vttrRqUHdxzHid+Gphl3tXepedlucmruMMTTVsza/pDtxlK6250t1NLpmPGTvC831U6sPpqO4tMag4C0w3ZfGQ1wv5xWey/JkeVoHOK9sFPb5zz8bED1EfWwL40tPTRrvrk
9dWjdSUKpoAPLsi5rDYJPKc1d2KZ7nsf3Yb+P1x8O8D76OfM+rJbwXZGQDrnSxdTY7RaOF7a3jKa53cHPdjfxPBKyaUzy5tgeaZrLLr7v9pPMdSrijx28HuPN0C/NQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGbpIlmhJCCCGEEOK/cPRLsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZKBBsxBCCCGEEBkEu/3gb/7ub0Dsui7EuVxuuO37PpQVfDyM53k7xqPK3izm84B9ufRZDz8bY+i4d3AefNwkSXBn1v8jScyfpX05HCNREpttB4/DcUjnEUYmjqIIyuI4hvh7v+ddzl6xtrkO8fjYBMRRaM4tR//KbW5u4h/oGsfHzb6iCK+JqtppdtsQF8s5iNdXbg+3q6VxPMcE23EYD3DfzeZwu9/vQ9n0xBTEGwuXIF6+8QrEcWzVx+QJKAtmTkGcKxcg5uctF5jncbOLdbd89SLE45tXIX7jtSeH20vLy1DWb2Nd/8N//mvOfsLtmZ/Jt0r6Wb6zcpswDCFut1oQbzQaWN427TOi73Z7PYi5jUURfj4ITDvgU87lsI1UK1WI3QD7cK+YH26XCkUoG6fvVkpliH3P7Iv7YH5X7CV/6a99GmKPzqVYNNflUi987ep1iOfnD0J8+PCx4XY7xHbpFfIQB1T3MfXhBeu+BQmex2CAfU5/RJuolCtQVi7jfbGv13Ecx6d3nO+a5zug+8Tnwc8EnTaMD/iZYGJqq5H1h5iOw0/83/xr/2Dkvt8ODzxwkv6C/Z/vmnPL57EuA5/GKbSnxGqLW+1tKOsM8B67/IIcEVITT409HBfbouua+5waA/kJxXj9nodxYO3r0OwxKLv/3gcg5n6A2+rBgwd3/Cz3bTyuwRg/3O11If4rf+kfOlnol2YhhBBCCCEy0KBZCCGEEEKIDHYtz7gT2QT/fO5RzNNAb0eeMUo2MUq68UcxltvyjKzv3pE8I/XZLHkG7cua60qoLOEpIp4mG3GO79R09m4YdHEapOmg5KLd6gy3x8YnoWy9gdNVnofXEeRKw+0eTVH7Bfq/kM4j2WxCHDVMvBp0oKwX4XF7NLVTtKaUPJrebrfxOL0+Tm/GDspEBtZ9LdBtmhnH6fDtLp5
nwad9bZu6ztEjX6GdtwdbEHu5seH29CTudyW85nwn2aup/Uz5BU950nxyq20kGJ0O3putLazftTWULfV7Znqdp6J5WntA8oyQ5Cr2vGWBpuJ5CnxlFeUH640NiL2CaTfjtTEoq1N87z3nIJ6cMPIp7kf3k34f66dIMpNOxzyTiYv96pGTKJGqlGv4Xet59gr4nLAkhbv3cEAyo9jUUezRVLSLca6EEoy8FZdIBmJLJBwnPY3N7wNbJpJ6p7OUI6ChBF2yLaXKGg/0BtjOO9a7ga+Bv7uXRDE9X/Tu9WwJBr/C6VnGbzpObD2QfZa+jG4+KdydT8NxSWLhuHgs1wntYOcdO28iyaGP53JGNlgfr0MZS8u4b+P21rBkbLkcyp1SbZHkiUFg2kw+j+3H8+78HaJfmoUQQgghhMhAg2YhhBBCCCEy0KBZCCGEEEKIDHataU5pckgPZZdnffZOlLRZOtxR+sNsbSL9zzDiGu74PGBffMV8XmTlQjUUW5ZzbE8UkzqKdZB21Wfdl71ku4lazgHZ6Kyvrwy3gwT1TnEP9ZWRgzos20WuPcBrihZRdxzfWoP4wquvQvzKtTeG20sbDTxOD/WpEVnO5ctGW+3md7bycRzHqddQE1mqoG3cwDXnfewY1tWANJAs7e3ncN+B1UZYl51nm7LJoxDnekZf3kG3PicuoG3QH2dG9V8csyZzq4V69ZUV05ZZgzlg3THp8ULrUGyf2O5gO9gkuzrWCXYs/X6lgtpXjttttMLr9nBfgbtzX3Hzxg2IacmB8+gjjwy3WVu9nxrnhOy1vBw+c7bGOVfGzxZI/xyTjWjJst0r1XDNQbOFazJKRbTTCrpkBWrdd591yaQdTmmJLd0tL+fgdsy6UX4fhHb7o3cl60pT78OUnav5A7mvgc2i4zjO4vIKxC1L03zw4DyUFatY13sJr11IrYly7bVH9N2M8cPA6ocjfi8HXJnOyBjqnr+aui98TYlVxt/ldV50/fQbbD5nnpkq3ae0bRxZ6VI/ub5u1n7wd7kt5gJ8rvMFEwcBWydiP7cb9EuzEEIIIYQQGWjQLIQQQgghRAa7lmfwT/OjYs7Ex7wd6zO2XBtFar8cj7CFueNpwxHXwNml+H+VlOVcKuuR/f3dX/8fndb+2cqNYmtjCeIkIes3awrz6hJN947jVHKfpo6//tRLw+32Ah7nRIRTN20Pp8tfXMIsd3krU+Gtl1+Dsq3Fm3helLUoyptjNTZQUtKlqfQizZ2WJzH74KZlZzdeR0uv6flDEB+Zn4P49Nm7If6exx8zZUcxq1VMco2XX3od4ks3TX0G0/dB2fPbOA32J4UsWVdjC+0SGyQ9su0Gm5QBcJO+29pGyU+7bSQY/X6PyvCzXbJP5Nie0pwPyKqLpjQjmk8vkJWZLYnivnFyCrNdXr16FeL5A2ZK/cQJtG7b159tSCLV6ZIMzKoDnyRgbepzxmr4vNpSBt5vs4X3zaXMoh5JPezseyzvCUiOwb27fV/zJNvKslEdJdeL6btdtsyk73p0no3NxnC7UMB+g20Z8ySFcTyzr5AkEoOQbeD2DrZ0DFISTd8OoCxmj0cXY1talVJzkgyCLWxTr3h358KsIZFrnZfr8jiNjpOwtTDe17GaeZdWqyQZpPaRJR9DuQa3YzyPwKcMnJYFK2dN7Xaxj90N+qVZCCGEEEKIDDRoFkIIIYQQIgMNmoUQQgghhMjgLVvOjdLK2hZpjpO2UHM9POwoDWHKyoa0QWy5Btpq1iWzTR5bpljprd+O1d0ffWCUlni0pRXroRKoz9GpK9/E62bns8i6hneQ8dkjEEcd1Pw220YjN3FwGsrmD6MO9+tffBLiX/hXvzjcfoL0lfW7T0P8lVu3IH7ofU9A/OD3PDrcnptBjdaFp74G8aWbqHEemzfHPnE/WrfdvHIF4pKH+rgC6S03XzG2Sxst1GHVajMQX72CKZCf+tofQvyZX/7l4fY9Dz0KZT/ww5+GeHsTNd7
llrlG7iymS+ecP4kMKI3t+gamuu6QTVHiYz/TtOzbmltoR9ch374OaWVtrWhK70zfZYu5rU3UVm9bWlrWLPfJtjAIyLaJBLG23R3bQVWrqO3nfndxcWG4fWD+AH13/yzDJqYnIeb6wz6e+kbSHXP/nrfs60LSem5u4n0bK6MeOolJW22lMo6wKTruiPed4ziO55hjhxndO+tK2crLTkfM792ErMpcerem0kRb22xnyProGqVlH8ubuuV3Fmuc9xKXxjUJyalD+yqpffg+x/iM2FaUPC7hLOwutU1eM2V/3mNdMj2bqWGcVcznwVpin95ZebJ6m6ib561cKkGZS+ve6uP4TExMTDjIzs8mD2MC0lbb7ToMsa2Vitjv7Qb90iyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZLBrTXMWttYopvSvnOfXG6EPzvKDHnXcOyn7o/Kd//C29b7w/dFps9MaZtJt2+d1p7bLIz6/nx7OxQLqlDdD1Okm4yZ2ffQmvX4VvZf/xb/8GYiDvtGQPnj4QSh7mbyVwxbqC9sbmLL1tZdeHG5vrDegbLyGeunJMmpOB6Fp5yeOoQ776Bymf126fg3Pg/x8x6aMHqzdQfHcdgM1tgcm0QPz2Bjqw15/4+pw+5dfQR/mb7/wLMQP3XUWYm/DaMCXl78AZeMPfsJB3u18t3Cnz6/9LGyQx/ar5zHVemkMdbiFEvrKNreNjnm7iSmUt7ZQd9wjT9+wb+51f4DPSEha624PfZk5HXG5ZLTGCflx+7SuxPZSdhzHaVAbi3rmvOIBpd6l18iBOfQRz1npdBsN9KnO5/Mj43eSUhm1juN1fG5s/+CI6z7Ed9rWJt7Xes08r8UC7ndm5jDESUKp6326N5YO1SPhbEC/cxULqBXN503fGWZ4KbOmO/XMWLpTn9cHZb2n2ct70vSdvGYgl+eU3Phd1KTSe2Mf1+XkKaX5qJTUnE8i4rVYpI8OnZ3f8WnN8mj/ZNA0s/acvpvygLZiTrHN+SQ80jQXyfu9bmnTfbqnrKefJA1zmXIg2O2L11TwGgo7fTeXpzXNdz4E1i/NQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGuxZ0uKQdIrmLk7M0K+zLnGPPQpI8R7b+xx2t7+U45cU84rMp8c8If+Q4LXimmPdF/od2eaaZMvk/sqbZsc+Lv4nfjVI573c+bhyT0eQe8m9+9Q8gfuPF34P4ifuN52/NR93ac+dR/1tqo5b4+86eGW5PdlAzem8Tc8vf46AG0HkN9dJFS+Jc38bvxtuo4TpTPQ5x4pny+FnUUjtd1J8ep0cvyqGm6/Gzxot5K48arTUXvxsXyZfSwzYRTJrzPEja6fVVvP5nv4Ea54lxc+y1dfxu7YVvON9J7mQ9A2suWd/Zte7P1atXoeyFF16AeLuN9VCuVjCumLjTwba6vIw+2F26HzXru2Pj6CvskpdyRPo87guPHT0+3L586ZIzijzpEVtN9JeenZgbbk9P4PqEYgHbZyvEa1pYMNe8sHgbyh5//DGI5+dRW/1OMhjg88z6RlsrWSniPS2SnnVtGTXft28Y7f+h07imoFhEfWYyQD00v5Z817RNzyH9poN9UCHAe56z/H/dHK4NSa0X8kbrTO36YB1yh/rgiJ4n1sra6z0S/izBmtVOz9w31vfu57ocfn+mPLKt/j+lHabvhlFIxZaO/Q714ykfZ+vYmWvEUvGI49B5sU9zrYZa/rExo2kulfG9y31GsYgxt8VR+TdGaeD5u7k8PhN+cOftR780CyGEEEIIkYEGzUIIIYQQQmRwB/IM+iL9JF60bMLikKaUIpYF4NSEa00xJXSgyB2dNpRTY9tTTiwp4c+mpkGsj6dkIc6OH/2jfcUeld/J/yNcP5SS1LrkiOsnHi3PwGviabH9s+u59s3fgvgPfvszEN/65qnh9sOP3wdluRCnW35kZg7i2Qs3htuczvUgKVAKebQLa5J0aMFKe7x
6EWUhY32y9CrgVM8Bz0xxB3QeayFODTcHOL2ZK+D0+JGqsWgqjuP0bo9ShTe7+Dy1aWrr5JiZ8u6X8bMbEdZts4XT8t3QsvqZR+usTVSFfMexp3V5io6tqlZW0GpwacnIVDY30RbtzJkzEAdkkcX3zrYu42lHfjzDHrYLuzV6Pu43pj6lQ5ZiW5t473JWGuSLFy5AWaVCkhKyeJqbQQlGYtXf6soqlDVJyrG6iuXdrmnr5Qo+Mw88cL+zX0QkR0t8vDc5z/QNG9v4/OY9kif0sPxFy5Zw7thxKCtQqt64iPKNvIN9kjcw8hZOmx0l+Gy3W9h+AsfIjHySbbEcIxfgvfDI6tNWfuTodVbma6K67fWprkdIH3v0DIQsZbBkNDFJN1KykD0lSxpiW9bSN3nYQtcx6k18pwqUkZKVDLmGHdIQz/EyZBDVKrbjwOp/IhoTViax/2GrSd63PVbj8RHLM0bZ7Nrn5DiU+nyX6JdmIYQQQgghMtCgWQghhBBCiAw0aBZCCCGEECKDXWuaU3oWEo4kVupsn61KSJ8yID2POyp9dUqHS8el87I1rfxNTt/NF8+mcXgcOi06Luu04JoyrGqybPWSEWV83NR3Y7vMIfbPrqeztADxp3/wByFO1kzq4lkX9U5TlPK3vHwe4mhg2bmRztOlG0fOSc76EpZfu2XO8y4XbeAqLuquzm/cgLiYN7rQHrWumy20qJoLUE+5mkPNbWNtbbh9X6EOZTn3OsTT9HzFebKOsjyJIg8/2+PUsCGm1u1bDSjp4Dl6ZdSwfacZlQqb4xZZvdkWR8ePH4eyCUrxWqU05ckIOzteRxBzOt0B9oVh39Rxn8raPWy8m028BtdBTeqgZ7TEjz76KJTlyI6M4/V1rK/VZUvz3WhAWY/SeUcRXnNi9cTNbXwOOJXzXtKhtREVK/W14zhOUKwPt/uU2rpPmt2ZY3WITz78QfPZAdpeJtQn5cpjEDe2URNey5v+r0hi4jylzQ7INs+PzD2PyNaM+75eF7XEnM46n7NsvlJZsjN+b6N2P7DSkrM1WZX09SGND0LrxcVrE9gKbz9J6YEhyFinRP2A64PXG+13Z93xm8VYxqmwqXzEvtOZ0UkTT+8O22LOcRynXDHvw3yA+npuA/Y6EMcZbTnH+KRT5sGaXfc+WTQG0Z3/bqxfmoUQQgghhMhAg2YhhBBCCCEy0KBZCCGEEEKIDN5yGm3O55xYOiT2MU18HJsP+uRNaqWgDsgrkoXJCYmrRqW75lTfKWFySuEzwi2R/73I0hKntNg7fzarHDTNzujj8GETq265LNq/LNrO7XVMHfsjP/JpPJflxeF29zrqHvPLqBGMSugp6xw1/sHhDUwL3VtFz11OO764tgjxqQPGL/rjd38Cyrobb0B85es/A/FSx2g741n0uX3XgfdCfP+hhyG+ehlTUv/O1T8cbp/2KT2pz3p68j0n/VzJ0ilvF/G769OHIPY91HmO5Y0O1F/Auspfv+x8J+HnZH3dtJubNzGNOevtZmZmILZ9itkzNJ1CGDW8fdZgWrpLW8vpOGmdqROSJ23f1v+iL/h2B/vNAXUjrDu1vU5Zs7ywgGsMtrfx+exS+u94YNoQpznuD1CXzDrJZtM8vxcuvgJln/70p5z9IgpQm764jvfi9VfNM3j3w++GsgKlsu9RKvWZqmkzteIUfRb7L7+H2upbC6hpLlnFp08dhbJWG9tAxcW2WQtsHTKtLaL8CJzaOKS2asu4B9Ru0165/F7GfsbWuW9tYX2wZ7iT0s6atsvP5v6S8d4e8dGUpjkZUX934KWcFafL7sSneXTa9RKlh69WcI1LyXpP12j9C3s6l6kNcH9l+1rzOedz1CZ4vZmVJySXp/U/3B/vAv3SLIQQQgghRAYaNAshhBBCCJGBBs1CCCGEEEJksGtNc4N8O1njfHD2wHC7vY16r6X1NYyXUXd66KDRVbInar6I/n1uwH7IOO73LB9asih0EvLadOl/hiQ
x5SFpDR3y92O9j0dxYun6Mn2YM+LYtbfZ/5F9m/G07S+nZOnRaI3WO0lCueev/yFqeA/PWLrdKmviUdObS1APtRabe+6No0ap0sNrzPexvNBYxvLQituoKW37qB8skkZwYso8A8tdfAbm66hxroyjVjEKsD7mA6MHK07PQZlH2kOPNM0Jifn9yDxDBfK0LG6jHrV3HM8zapjrKLZQT+n7O3tn7gesNV5aMv0Ka5jZ95M1htuWV26rhffdJx/sIIda0AE9R72eqacOaYNZY9gn7+t203y+RRrmgHS1Xh77RrbOzVn61tXVVSiz68pxHKdHfskh6ZQdeH7xell/yP1Xt2s04Hwf2Hd3L0lyqNdvbOK9qc+eGG77pTqUtbrYJpptbHut26Z+Tx09CGXbbfKKb6ImvjQ2j5/fMuswXr2K6zsmqlh/XZ/aatnUfYHaB/vqJhFeQxCQhtVq9zG9O9lXl5+R3gCfr/7A1DV7c7NG3qM2Yq+RYj0rt729JKH1MAmtkXLhs0hIz0TECxCs6+I1AWkjZtr7yI9n6KG9nbXVHn03cPEeF/PYD0b0rt3aMH1qpYSaZZ994kmrzu3Jt9pESqtP7zReu+VabZfHPHG489qzndAvzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBrvWNMekO+o0UbN5u2/Ku6QFZZ/m9hb6Up5ffmG4ffb0GShjjbNXRl1fkEdtTBCYmDVcrJOJWBtqnSfnWU9ILByRHsxJ6fp2r2lmfSUDmmaHNcyc0x6/m1h/YE/rDLvod5RiQN6kz7wIcadv2sx6CbVS7jxqeh89cRfE/bbRy5FVpBOto35wsIa+zdMV1EsvLb4+3P7qN/817ow9Uem7JyaNHjhZQg/jly98BuJW8xLEq5sXcF+n7h5uVw6i/tnfxGvwItKfutQmCqZSygF6a57q47MYNXBftlzaH8PrjfO77j72BPZ7feMN46M9NYVeuV/80pfwy9T4jx41ddxuo9a1UGT9JsaFEtbL5CT2WTb1OpZ1u1jfsf2w03oNlzSFHsXr66hbvvDa8+Y4PXwOej08bp+887m/t49UKmEfzLrJRqNB5aauT548CSX2GpS9Zn0Tr7FYRo3zo99jnrkLly9C2eriLYhL9N7pt8xztLiE98EjLWwuwPobq89C3LFujZunuiXP53wZ62/N6htKBeyvKiVst0XyrM3Rb2iR9YzwO4rXE7DWuJDSqAY7fpY1zT3at62J58+yRn5PSUmLWeNs6ivhtUcJJUWgdSe2tbDLZeylTD91pr2Yk50/m/Je3tmLmbXVqWeVxg/2Wg7HcZymNc7rUv+yTetGpqdwLU25gu+pWs30sbanvuM4To7WufHYLWe9pwLOGeJK0yyEEEIIIcQ7jgbNQgghhBBCZLDruY3pSZzufPHaDYhfv31+uD1O07h5sqdpNXB6ObSmCjcXVqCslJCVzQZNTRdwmqxcNFP7RZpGTE1z5MjWxJpyK1JaR04xmnCqRofkGiP+H8m0nONU2dZ5RzTVF6XSc9I12lNGPBORSiu+d9SmcCo0R1M70W0z/Xkjjyl9vVlMefz+M0cgjq2pUadJabPfoPtClnN2e3Ecxzl80kzRtrp4HpWxwxAfcMnGzGrH84fvhbIbK/i8XN28DvHkqfsgnpo209jBKtpOsbVY4uEzwFZBsdXOE7IYios01V7B++QkZlotT3Y9UY6n6feXZhOlJfZzdP061u9LL7wE8YWLKI+pVU2fxf0E28YNqH6feM/7If7ABz403K5W0Pqu2cL2GCfYhrb7RvbGU7xOgpKKTgenOD/zmV+B+LOf+fXh9mOPPQZl586dgzjgKXM69sCaXu2S/RpbiHH92XacLLfjdLp7ST7A56RI8qqtrcZwe3IM+//2Gj50gcN2k6aPKo+PQ9n6Glqu+gE+g2ur+M5zrXfLIBwtCStSBurOhiVVo8Iu7Wt7G/vKgGQBJesdmLIAo5itA2OynHNtK1hqHyW
S47HkcmCl4GaLuf20LIzJhjYmy037slg26ThkdztCYpGdJnv3abXvfF+WPMPjVNZ40q1tlHzl53GcNz1tnonbtxegbHXpNYhZcsGWhkXrPcV9RoWkHOUKvpeqVdOOiyS14zTa9z3gZKJfmoUQQgghhMhAg2YhhBBCCCEy0KBZCCGEEEKIDHataR6r1SH2abx986KxzGqTHtgpYDw9h2lDjxy0LMVGZ4h0bl27guU+p9w0mpUB2ZzkKBV2bRL1m9Nz5jx6bbJP6WJcI+uofJmq0hKeeh6WRWw5F6H+if+TiS1bFNY0s40cWFb90QfMJu03otSoe0mJ2kR7CXV+oXV2PdKaz1Aa6W28FY5bN3r77iZaITqbqLvizM8eWc64lt/TtIuaLk7B7fVQUzvwTNuLx1HLyhrnYkQWYDHqDa9eNtZ3A7Y2cskOjG4sP9RRydg/lSn1qU+pwH06Vs665mIHzzmJWce/t7D237aYcxzH2d42GvTpabQw+tjHvg/ie+9F8Zqty2VbK7bXCkgHfzftC23lsC0PBqRtpHLfsskcDLC+W2Q39uqrr0D87LNPQexZuuTbt25C2ckTx/GzJJRnvbhdJ/xZ1inb9lCO4zjXr1/bsWx2Fu3W9pJzZ09B3A3xPq4sGx38BLWfhx56COLm5gbEi6uN4fbkLNqtNTdQs3zw1AmIN7exfdltgK0ASyU8rz7ZTeYrpjwoUqpiWs+Qr6E2dNDFvrPX61jb9C4lbXGeUnZzyve8ZW/HzzHrkrt8LEvf+p1Mo73dxLouFfE+5wvmvvlkscrjlMSndUu25VxKeEzvKBokcYxlHNN6A35bJFZ9xnRPA3xeigV8xxVonDczZd7b0QCPe+3qNYiXF3A8YNsMOg62mRxp3ot0H4Ic9k8Fa93bGL2XSxW8pk9+v5OJfmkWQgghhBAiAw2ahRBCCCGEyECDZiGEEEIIITLYtaZ5Ywv1TsUAdSRHykYbkttEX9mVGH091yjrr2N5II/NoM64PEB90zqlEG5so4eqb2mpinR1k2XSP/XJZ9DSFlfG0Zd6cwuPE3VQR5QfQ0/fsqXdK5RRxxemUl2ztzLqHm0dc8gpkln+RBpJWws0IFXzINk/j8vvfQRTQd92UMN0/P5Hh9vhMqVopf/tLv/270Fcsa5rkjxjKw3UEyZsckz6qLht2nkUki+uRylsR6QkTdp4fV4e9YMJafFypCd0LT3rUx1s85daDYi3yTM2JA/VXtnSmlHq5TynpaVyWwPuJnhfiqQ9RJfgdx7W+l2+jKnKv/nNbw63z549C2VBgPo81kLOzBhP0Xq9DmWckrtG5R7tOwxNnQ76lCK4S/rNPvYrti670cA21KE28uLzz0GcIx3lqVOnrX01oGyT+lHWmXJ6YluLzD6prFN+8cUXIb55y3iUP/b4I1A2NoYaw72k1yJfYkoDn/PMvVlexbq/5+57II7pOekvmb6BU5A/cO4MxBXKLbDZwH7FsdZSTI6j5nLhFqbzLufwPby1dnu4fXAe79NdJ9DfPiTf4RxpUu08B+EA+0LWOEfkAd0n7f6gb8rzlGKbYR2uXZ8htVOX+/M9ZG0NxzGlPNZJtWZ5CY+R9zT1EV4OrzKw6sSjVM8ueaZznDi8Nsl6z9N4wWO9NK23KhTNs3zqFHq5330a4zHyAe9tY/+0bY2Z8j5ef5XGRLzgqkblts59QG1x0MPr7/c4xbulNffxvrQ7dz4G0i/NQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGu9Y0L62gxuuNq9chji1d8pkD6Ks720Pt8GoHdaarF5aG28u36lB24ypquBpd1Kv0aNw/OWn0YEfn8TzOHp2BuE1asqtXjO+r599wENQCRduoc3THyKNw/qApIxFzmOA55yjPekia5tAzGqaEvKYj0jf5pLMNrGN1yW82Sti5ee8Ym0R/0eXKOMTlw6Y+Z8dR89eOKA/97WWI3TeuDrf7LdSd5V3yIXYDilFTat+
riLyUExJeuRFpmi09eRCh5tHpoc6/T5raEn1+Pl8ebn//DLa15QP42fUYNV5r83Usf5fxiN7qY3uJSHPb72P99R2rzTTxGvLffMH5TsJaWlsP/Mwzz0DZxjo+64MBtovpaVPHhQJ6zB4/fhziu+9BfWuefJtz1nqPXg+P0yH/d9aGrq2ZfrbVQq/khLTrS7fRe/no4UN43kePDbefego9nAvU5xw9imsOWLds68fZH7vTwWfoqafxWJWyact18rcvlbDu9pK1q6i1HptGv+Rzp4wO/uIG+jC/fOFliP2Q1gIk5hm86xDeh+lJ7OtyJXz2X3njNyGuVix9K2lhXz3/GsSHj56E2HWNRvzWBr4riuPY91XydYjHyuQ7HJv3dI7030XSEvv0zvJc3Fff8mLu03hg0MX+K4ywnZcqpv3kSUe7n++wKMZjtWn9TNi03h30Xq64eB/ztPbIzqfA6y2CPH42F1B5DvsrzzOf9+k+eT5+1/XxPh0+YrzMP/KRT0JZpYh9QmOFxnG3MG5Z74uE/MTZ479WLUPsuHjes+UDw+1uD/fVaqGWOiBNeNXyI2dH6wsX33DuFP3SLIQQQgghRAYaNAshhBBCCJHBruUZvT7+nL66idOdzRUznTVHNl7HaVroIE2/r7bNj+YLDfypfeMaWrkNAvwZ36uQlZeV7ppmU5zIw6nAJI8/1rf75gt5H6eQynmciu82FvG82nheYctMWU5MoTTB8/D6K+M4fdf1yGLGspLKU9pHn1JOp9J3WqmyY7IFStyd02++0zxzHaehX38Dp57bfTOtOKjiVFZhBuu2NIFTdLUHzbRqZEk1HMdx4kW8T1GI/yeyfY1vTeCwDZdHM2pRhOX2FGRI+bp92pdDtot9cgIKBqb9VGkqvTKH1lFHapiWvoWujc71Rx8zx/HJViqHz08U432Ke+Z5LKzjlPXRdZpS22c4BbNtZdVuo8zkFll1dSgl+I2bRo7l0dTg4hK2ofV1lKpNTqJ8ZsKKI7L16pM9Essz7PSxMVlCtimNdquJ8b3n7ob4kCUTYBu9kydxWn9+HtvQk08+CfHTTz9tnQf20T6lCH7i8Scg/vSnf2i4/YlPfALKUimD95BecwniF69gKt/rljzvPR/F8zw8Xof44sUrEN+wLOp6NPW8PUD5SruJ7eddD2Ma9saGua/tNu7r8CGU0fQ6WF6qmH40pPfM5Zv4/BYDrPvZqTrEM3VLFkFtMe/htH6B3lleTLH1ri2Q41xCv90FMR6rZ03H99pY5vr7J8/w6N3qk9TBtvNsUmr0NklQamMoqZicMP2w65M8kySpDr3D4gE+f4EluXBJupFKre5j+cSYGascOYTypa0NfGdvbZJNKkkfyyWr/eToHVbB95DrsSQFxznn7jXPyK1bC1D2+oULEIfUp/Yt2+L2Nvb7vjfa/vDN0C/NQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGu9Y0O2RdUhhD4eSyZ7Qxry82oKzXQu3UoSnU8E5baX7nSL/Z76I+ZTFETdNqH/VhW6smVeoF0o5duXgV4rExtD8aqxlLutPHUONXLuA1bG+hPmxlDfWFW5ZFX2MJrVjKZLM0PkkiVNL/1KqWTm0TtZoR5Qr3yIbLC0zdeqSr7ZNOfS9ZWMUUti+TbdVYxZz3iQOHoYxThS/cRG1iJzQ6pake6qzypHlzfbbswzqI7ZSbdOA82Rtx8lLQj1Ma2ZgE0QGlvw05Hbplu0RuTs6A7Jy602hxtZ1D7VnfskYaFCmNdh51wVGM5Z3Iqk/S9Sfe/mni3wxOb/2+971vuD0ge8VqFfucpUW0LWw2TZ1tbmFbvXkDrd1u3UQ7SrakO3PGaOzbZDHnxPQ7BemnE+tmV2uoGe+StVtIaxRisuqyreBC6jefew5TcH/5y1+G+OZNvObYWiBy1LKycxzH+chHPgzxxz72cYjvu8+k3y2XUUO5n8wfwH42itFC8fbN88PtL/wc6iY/8n0
/BPF7H8Z04LOW/nOV0p174/TMFTAeD3BdQWyl9u1uNqCsQlrPMgmEI8salfsnthkslzDuUJ91ddWcRzFP1q41fJ4CStdcIKtPzyqPabERL/eIqQ/KF82xkohSJCf4TOwlVbJF83iRi/UbJKdS7/VRS7vZwLhUNM9FocgpuOkdT4f1yGY2sd5TvGYnIdM1XrfT3GqY88jhcR94ALX3Bw/gu+Py669CvLxg1gzkKW04p1IPycLx1OmzEJ87d99wOwiwD+mSXpyqw5mwxldjZHkZRneuidcvzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBrvWNLsefrQ2RR6pkyZl9UajAWXtLdSN3CaP1KMVE5+po15llrKsTge4r1aCApblnokvb6K2c2Ed40YOtcbjE/XhtksawGMn0Bu3Pnsc4lqdvDktD9WLr2Pq0z55k64toka320KNzmzdaHKKpHeqHsLU4BHVx1bfnEdxGv2i99Gm2amST2zdSjPuOI7z2pLREJ545FH87NQYxFdIAJxbMtrEeoR6wtgjzRL7egbkiV0y+sIBaUS9LurJXdKDJb4Vk/7ZZd9SlsNRm3AtjVeRPuyRXjfsogY3pn15F4wGl6w0nXiM9IWshOwYj2KfUpRH9BzvN2fPou5t2mrfv/mbvwllnQ6e+8QErauYNs9Yl3w+OU30oM/eudg3HJgz6yF43QB7exeL2N8V2MTWYiGglLjULrpdTtFtznttHfu6K1cvQ5wnjf3cgTmIH3zgweH2Rz7yESj7wAfeDzF7Qts6ypieC/bE3kumaC1NkdITT9fMw7F5C9eo/M4v/muIH/oQphi+9z2mTrrUxyT0LrH9fP8oxLhgrfFZXbsIZV3SxpbHyCvd6v8np1HDHZDPcLOB63Ji6qOcvNE8t6nP3aZnYqpOKZY3UC8+6Ju2mfBxEq4fCq3qrBexnR6a2T+v+ENHUA/LFuP2VcWkle128bzbbay/VsvkoxgMsK+qjeE7v17HdUt5SnHuWr+FejRuY300hU63Y85jdRXHJUePHIf4GK3lmJzA9/SlC+Z5W1vBNSQRPRPdPubj6JKf/fqaecdtrDegLAiwbs/eg371J0+a1OCsF+/3qe3tAv3SLIQQQgghRAYaNAshhBBCCJGBBs1CCCGEEEJksGtNc0QanaCMuprKrNGoDgLUGfW6+N3FLupX2puWXqWN+sz5McxJPjseUIzHOl02OizWQ6+18DyW2qg3XN+4Pdx+YwP1PEu3b0NcmkC92PxB9HUetzwdD584jd8lLd3K8iLErW3UNG1bvrEt8u1sB6jJmSiix/HlC1eG2zP9k1BWnUd/373k5HHMY++973sh/uY3vjrcXmusQplbwv/tuvSvXpIzN7pNerA8aeADhz0dsX35lla/h0WOf/MaxEGI+4oTc2I+yfY80vF1ItS0BT6268Qz+44jfF5y63ge4yU8jyqJAiffMG0kJC1dHDyFMdfHwFoHQFrqsTae135TLuOzf+nSpeH2Zz/7WSg7f/48xJVKlWKzL5/vBelwnQSf36vX0NO40zX1NDuLaw7GxlBXmy9gfbfaRpN//Tre59dfvwDx8gr60Lvkpb+6ZnSEm5uoX50kb/j7778f4k996lMQP/HEE8PtaVobkcvRg0Lsp255FAXyRy762H6ma+Y6OjP4fiuS1/KTX/hViG9cNO3rgx/Hups6jL7WbfLK7ZIffGQJTbk/i3hND/Ujrm+u4daNK1B29OhRiI8dx3htDdvI6lpjuN3voZa6QProsIvlHfYMtzSqboTXO1GvQDw1VYe4VjLl8Tae4xjpffeSHF0z64HtJQa+T8+Ei20tjvH5i63qIqm049E95z4jRx7IriW25nUPdAlOLqC2NzD38Zlnnoay+gSuYzt7F64pmZjCfuGhdxkv87VV1Lgv3MYxD+vDfaqF1VVz3yfreJwyjUXn5nBc4wVmTJjQmi/Pu/O+6bujNxNCCCGEEOK7GA2ahRBCCCG
EyECDZiGEEEIIITLYtaY5l8M89XnyuyvXp4bbsYefjfuoyen3UIPS3jZ6n4tt9FK+3EDv0bEt1G+ea6OW6v4po1mZiVGDeaCAepZTFbz8RmJ0jjfaqDu72kCfwasLCxBfegP1Y9Wa2deRgwegrFZBb9YcebPOnD4OcdkxQqRt8p6+td2AuN3A+5IvmHvRbOJ3kzJ6ke4ltSpqugp5bCOHDhsvxXwVdZ+bTdTOthL0cExy5j6uY3Nx3BD/L6zk8L76LvkpW2Jkdwx1rxHp1KKQ9ISWVjHnkXaKdIwueZPGZBcZ259vov7L38J9hWvoueuSQCywdFw50k8GHu7L1mU7juO4ljYxJF/q1Zi0vt9hClZbP3IEvZOXlnCNAj8Lq6uoHR0JaZpXVlCvd+XKG8PtfB7bTJ7avU+aur6lG9/awvUdt2hdRZJgY3/uuecgPnLErLP46Ec/CmUf/OAHIbY1y47jOCdO4BoE3/ISjiLy8ia4/X23wPrFPHn+BtbrsDKNz36e1s7MTKEm9YVvvTzc/qV//o+h7J5H3ocx1XX1MGowE+uZfOA+1I0ukDa072H72tg0fXrYxHfj7etvQDzoYf81MTmFcdWc19oq+nxvrON5tLdwX1GRdbem/lhfPzGG98HnvrFvdP45WiziO/iO30s6LtaPT32nZ62X8WN8RvL082SJ3vn16fpwm9dBxBHex+Y21n23h32XfVYuaZp9Ss7A1xBa/c8Va42I4zjO14LPQzxewr7s2AlcM1W0tMaHj6CH8+Qk6qObWzgWWaC+zrHeNRPjOD7oh9gGiiXUyAeBaYusS38rXZV+aRZCCCGEECIDDZqFEEIIIYTIYNfyjF4fpwI9mkKp1ozkYJDDsXg8wM+WYrKka5uf01tbOG26udGAeLuDaZK3buPU/dqmOc9T4yiDOFgh+zofp5SKljXSeA1lDkcKWFWreBrOdZKJNDbNVM0WTQm81sa6jAo4zXH8KE7XjVfMNEenh/MJrRC/u/AG2lANrDSRYzX87snq/k2vbzQbEOdKeC+mZs10zem774GyxVtXIb4Q4tRXy7o1YRXvebuNny23sC1WKC1t5ZqZhux2KI32GE7Jej5O18UDa3qO5BgxpafmdMIOWTp6IIXA6aeEZBIpi6IRdnf8WU7vHZH0JbakQTHJMXoD/O7Hnf2F6+H0aWPt+Pf+3t+Dsq9+9asQP//88xDfvGls47ZoqrBHabU59WoYYr3YKWLDkD+L8jLKTgzprA8fwX7gxInjEBeLND16DC3EPvn93zfcZvnF7CxOj3J6Xb7X9nX4Pk9xfnfKMZheD+9Froz1Z6tuAh/7kSq9O2onUWI4ZcnxXnweU18/843fgfipb+E09/G774X44cfNvZo5hBaipRnsc3qUknvCahPtEkoAmmRl2mmh/Ke51YC4YKUnHh+vQ9npM2fwuyR36pK1Z61i+s5BHdNRV0t4XwoBpYNvGjnCWA3HDt0WHncv2TrxHohdSmvvJ2Y84bZRgpJro41qJSI5WNf0u+NkyTc5hvVVIsu5jXWUznSt91BCPX6eLedI3uNZdpsHKEX7kSL2t/3FqxB3yMayNGGs4biHqFZw3/kcnliPpEOtbUuiQ7KQcp7TipM0yBp/cT/H75DdoF+ahRBCCCGEyECDZiGEEEIIITLQoFkIIYQQQogMdq1p7nRQY5Ij65KqpSXtkNYn7KMmM0dpVQuWRifno/olR1qXRot0Wpt4Hucty7qbpNmaLeLl3jWJurVDE0ZnM0bXcKCI1zBPqSuPT6CuZiMxx9qgawpI492g9MQ+pSfuWOU3Fsk6q4OaXJKZOrFlcXXgcbSESUp4/XtJl9KsOmT/5BdMXCOt4S2yDlwjXei4b77bclGT6zmozZyMsQ1UHGxPdn01xtG6Jp5C25ygSJZ0lu4/pjUAcUK65Biv37bFcRzH6XSN5q3X43TdpEMm3WxEmm+0CMPvsqYrJu+7yLKcY53rgI7zD5z9hbW0ttb
21KlTUHboEOqDP/axj0G8tmbWAjQaDSjb3CTtJ1kgct9ox33q+1jjHJA+uDZm2tjUFOpXp6cxnp2dg3iKbNAmJurDbdYhM9wO2J7traSb/W6jR1r0hNrPpFVfThf7oKKDa1ySIj6T9UOmX3mYLNSOnEH9+PULNyC+ceFZiH/z6W8Mt2tHTkPZ0TNoQXfo1F0QVy1daWUM+696DePOAJ/nTo/62YbRkb5xDc957sA8xEVKaV+hNUCdpnmG2hv4PAVj2PYqdXpf5s19W1/G1PJrg/2zTd0uo3WsX2ZrM/OMeCFqlrtsE9fF8252GsPt9UXsTya3sN3W83jf/Jgs+ywb1RwtcMn5/N7Bdj5eNe+0B8/geOHRe8/hd+n912/hO75YM8f2KF03a5wLBV6fgannWy1aRDYCj+xeR625eCvrMf7494RCCCGEEELsMRo0CyGEEEIIkYEGzUIIIYQQQmSwa00zaz/clObN0q9wGl/yxsv5O8cpH1nyOA5LGCeUFjmxNM8bLdQerndRZ3p9GY92sGN0avdMoU7o9DhpYV3U75QT1CFV8kYvfJi01CdyqF9daeF5LW2hbrltefgeopSRHdyV0yO/31zF6I7GxvA8Nhw8572k38JjheSZakvIlxZuQdniMqbUXCIPx6NHjX61T565LdKbruFtdaIJ1CrmDteH26zDmi+jBjzifVkfb5OmmVPH9rr0jOTwPC5eeXW43Wrhd1mGxXEujyfW7Zt23Sf9vEsatySkNKt2OR0oYa/p72LyVCcHDx6EmDXPNqz35TTSHMfWMxhTOt2IdOGsFc5ZfQP3m1m64pR/9zus5fvjzvgYrklgL9jNDfOc1cuoD3coHTE/Y4mVqn6c8hSMkZZ4oo7nce99qEu+eXN5uP3CK5j6+pWn0eP52Sd/H+Jxyxt39iDqQmcPoo/3xAw+AxMVXJczecjUwWYL+6elNfQGbmxSbgbqG25cuWIdh4Ydfazbko/nUSsYvbTv4X4TWguyl8S8XijBa448U0d+Ge+xV0I9dET9cNRtDLe7bdQ/N0PUP6/2MK5GtL7MqqKqT3kJfGy3RR/v6wHrGTk5j1r8Sgl167ka+kcHVdQ427kKXId1xhA6bJfM/VOlYp4hXhfC/d4oH/l3ot/TL81CCCGEEEJkoEGzEEIIIYQQGWjQLIQQQgghRAa71jSn/VxRWxRHO+sb2YuU9YWxpQnMJagbTchzr+qTfoW0Mv280d1se7ivfg51RK0+6g03m8ZbcaOLPou3tvAa5klbfZTywU9Y/47UqK7yEZ5HmXRaEyS76efMbWoX8Lgdui9dMmqOLf9Dl47bitBrdC8JSVvc7+K5xANzLu1t9GTsd/A8t8jz+eKG8dgdJ/3gdh91VgNqi5VZ1Jqtra2ac+7jcbbbqxDnyKvbs+4Feyc3yWcyIm/NOEHv0s0to2srFdHDskyeqO027rs+Voe41zPXsbVN/qDbqPv3XGzH87NG91ig5zYIvrP/c6c0vFZfkLgskuPvYhxZ2mP+akJe3wl5XXukC/esenHd0ZpL1vo5ia2HxjYUx9TeUutKdtbyCcdp0RqXKduX2XGcTst4/Pci7P/HqvhZ9kr3rftcKJDeuYD9F0lQna0tfCbni8YD+dDJw3iOHVwrsXAb17/cvLkw3F68/BKUXXwZ/aDdPPaNY9OoYZ09aOL5o8eh7NgkeoQ3u9hWN9t4zXedMHrqWhHfu62tKxBvNai+8qb/GoR4/QcO4HnsJYHHYyBaH+KYftoPyDuZ1qy49HtlUqkPt3vFOpS1O/hu6PSxvXQjjNuDxnB7O6Z2TDkOKmU8z2OHzPtwbgLbR5DHvswv4bs2V8L3lGP5RXN/m/V7bWoNnRWnxo/x6LU19nuC3xkc7wb90iyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZLBrTXM4YH0d6fxGaENYt5fSlVjFAenBSMbnVMjfsEAa33Zivu9GrMFEjWqrjf63SWQ0OUsR/j+
xtkVeiNt4YncXsX6OWx6OB/OoZ/LponwHr4lSyzueZRDpUV2STbVTJR/G0Dc6pDbpzqq1Xd/+t02coJaKPX59S4+5ttaAsgVLp+c4jhNHqGu7dP314XbK55vbTxU9QFdvoyf0+obREvcH2F4uX8dzHuU1yZos30M9WLWKPp5hhPem1zX1VR9HP8xDh9BvtdfD+mCt8dKS8bk+fBg1kgPSua+ton5uYmJquF0uYzv2/DvXg70t6L5TlwSawkEe25tHfY7Xx7bfzZv2V0XJpeP6+IfQp/Jk598eWP/McN/oe+a8fJ9bM+9LmuU7gd9Z7TbqPScnzHPWJR/5ThvXZIyPj0Pct7zQPVqHE5KIuVQib3jy7d9smGcwpnU3E5N1iKu0huOA5a07GOB315u49uHydfS/X1i6CfHL33xtuP3iN76Oxzl6AuL7HnoUz6uI64mmZ815rS3jcfIuPVBkiNzYbAy3KxWsu/Fx7Ef3kpA8jRMPNbyu9V7n97TvYh/h0vvAsXJV5Au4334b31n9Lt7HzV4T4s7AaPc7MbbbnIv9+/gc1v3MrHkG/IDWjOTxszF7lQe8xsLZkTvVFo/yWr6TtRt8mLcgadYvzUIIIYQQQmShQbMQQgghhBAZ7Hp+fkDpm2OafrDTw6ZTIo7el/07vss/8Ts4jZEvot2W71I6Svv/AJoS8ckizPVweq7X96xtnHrokFyjxZZVlAp7MTHxwQTLajTNUaT6KZHVlC1doCzRDmXwdTya6tvqWHY9azg1k6tNO/sFpxNOyKLQTcw1XrtxA8ouvXER4m4fraMGlr1bGFJKY5qS3djGqSye4natNpOeMhod29NE/N2QbMv6G2hfl54mMn9od3A6bn0dv9vtUUp3mt5zrLq+du0aFFVpetMn+7QbN68Ot/m59TPSOr/T9Lfw3kUkc3KtOo7y+Gx7Dj0oIdbRIGemuSOeLuapVTqvHMkz7mTG785s4STHeDsU6LnodnHqemvT3LmxOqbRbrWwPa2v4TNYt6Qd3NdxH8T3vEhShmDKtKcOWVX26VkvlbG/TxwjIQjpea1OUDrvmTrE2y2sj+0l875ISeaWMY32lz/3qxBHdI2HLcu62jjKDaYPoNTFLZI9myXl45TkSULP9R5Cbq6Oz/2EpTMd0HvIIx2lT/Vj21b6JLEs5MmGN8L6C7dxTNTbtm1m8Z6WaAx0ZAbbT9myleuQTWxCMsBaQMNHrg/HxCmLPZZnxKy1o8q2ZGsJy30pZmkM7JalHW/hd2P90iyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZLD7NNpkAzMg3VYY2elf8bsR2YvFpPqzU2UnrA8kPWESoG6G02zbTi5+AY8TdPE8cj7q1mJL/5u4qOfxeqTpJnHxGul52pbeklOM5nr4Xc/F8oAsruo1o/niVLkR+W4NKH3pysDo0sZLqF+dGsP0m3tJTJZqCZ13YjXFXI70plTXHU7Bbe2bJaIpu0PWSnFbhLaaMrDDaJS/XYYe+k7SdzYaayNjJp9DTdxYzaRD7bZRE9kkG6pUfd2RQndvWfzGNyAuudh9Bbn6cHvg4XXFZJ+U91AXWHr4PhMUMH2s61CfQ1UyeBvpqlkFqF8x9g6X3iWlEmpBY0sf22o1R362R9pie51BfRzbT0IvRNZWp3Xtpl2Xa6jvLZLlWmubLMUi8wL0A9Y743EKRdQHF/J4XpWcOY8DRzFd9Wmywrt+C607r1y8DPG1Cy8Mt6ukF3c9tK/z3Rk8b6v++PnotVvOfuG79N5OSIdrj11CLOt3sb0USvjke6DZJXvMHH7WDfA+Bi7pvD1zngE2D2eCUl0fqOC+y9baj6SLOv7O+jrEQYXs6yj9t2vptDntfExadJfeyx7ZbdrDPB5f8tg04THAiO45obVGu0F9tBBCCCGEEBlo0CyEEEIIIUQGGjQLIYQQQgiRwa41zf0eimN6lKIzsvViLuuO8TCsm7TjhPUqNKzvURyTYCWyBC0
x6WJi0kN7edSp5WJzni7ZPwaUCjVMUFfbJX1lz/p/ZL2zswb3zfYV5/G8y6HRbeWobuM+aZpJS9W16iChayh00XdxL+mQPsod7KydnZ5B/+h77n4Q4kIR9ai3bhr9XLeDOqtMTW5K75TssJ3+sJcSUJuY3UPdt6UNTgm1R8IpuQdWit9aHnWOXkTPRMpz2NITki/zgHPc7zGDL3we4mhjA+KgZLSSEaWP7Xuo83YLsxBPzRgdZe4kaQRJuhiS/i6ifibBfOpQxm0mJYsfkYpdvD3WSZN58AC2gdh6p21v03oPuue1GvZB29vGO549nCcn6rgruq++n8rLbs6JtJ/s7ztex30Hloi12UAdbUJiz8DHd1atjM937Jnz6JPnc6GM3z1ZOw7xsSMHIN7eNOfVpz7m1gqm815axHh2yjybdar3XLB/v/vx+9PxSYdr1Rf399EA37WUqcLxHWssQuu07BwGjuM4rssaZ+zb/LLpvwKP9PQBavULCa6BCgamzRRo3NbZbkC8ev0NiJME3y3jc1Yb4PwbpMvm9QbplAjm81TtqbpO3J3X0CUxjzuoc3fyThb6pVkIIYQQQogMNGgWQgghhBAiAw2ahRBCCCGEyGDXmmaX/Ox8b2e9XczedxmetL41dk/lKKddhcloTaCd8j1i4VDE/oZ4LN/yqfTJ+y+iqgrIZ9iPUHsWW2aCfI6p3Omks4npvLcsTSrnaPf43x7WQwVG3xSSJqvTQZ3xXhKR1prvcyFn9FDzB9ETdG29AXGSQ13b5PT8cHtjbQHKNrdQx9jpbEMcRag1s2tvMBjQZ/Eec7O25YbsM5lwQybYy5VKR4bs3c062m7PXGOV9HABe6qTGtt+3HLk/5z3UcO219QaqMeLl/Beh2Wj/S+RBtNzyGe3iPtutRrD7TUX9ayei9dZ5j6JPNzteooj9rPHuErrKjx35y5ZGue3R6WCdb2wgO3H1jj7MT4n/T72E90u6YEtL/Q26Z8bjQbEMzPoQ5zSNNtrfOhZ79MalkoZG3K1Ztpezsey1RXUWnv0/stT0wvyZl9hxrKKXI7WSpDnc33SXHMvxIuqTuIaAjbibW+b53pxAf2gDx856OwXUUjv+ATbhGflE/Aot0BCL+qY2pMt8eX2wN9N/J09nh3HcZycdd9JE789wPu00sbzOFg15110Ue/s+TheCLewPa1cI423a/rF8VlcP+C6+C6JUut2WMe9cw4Ebosx/cX2iOYuNLpzm2b90iyEEEIIIUQWGjQLIYQQQgiRwe4t5/r4U32fph8iW2JBVjYD1knQlFMQBNY2WarRZ9nKy6Gpafu/AJ7mSP+MzxYy5tuBg1NbMR0npKn7gOQHoTWVwxZgHk0veDQdlZJvWOfJs/gpKQzbolnTwRFNWbt3kMr57dLvYftxQpZnmPou0zQqN5gCtZGZ6cPD7fExlHZ0Ol2KcYqpH2Ia1pyVOpZTXTebOG0fk10NfJy+G8U01UUyGm6rzS1jnddu43HzlGa8WsX06DOHDuGxrTYR9fAJGlBK8jbZCvnWs1wu4zRqpTru7CfR7FGI82Q/FY2Zegiof/IDSl1MtoXPxcvD7V/+9gtQNj6G9VunbnOFrO+mp41lIltArm2gXOhjhx6A+AeOvMs6Z5qmdRCJNe6MahXv+e1bNyAeq5p+pzaGqbBjklf1ejitbT+/1Sp+d5s67bW1NYjn5rDPsmFJXaGA76UeyTUKeVNeKGEfO3cAn9dt6s9abewbA9fUR4EsG6vUR3Nb7JEMqdM151mt4/NUGsNnM0/jh6L1bmi1UaJ1+zba0+0l/L70+Kqta2bphuvxuAb31bX25ZHsLfXLJqniWAroWLIbtqNb62FdX2ziPZ8ZM+dZCLBtFV28ppyDUsfBFpYvXjTlUe8MlE0ewr7cK2J7ikiSEloDQZckJywzimKMmy1jd9jpoBVpl9KbnzmB5/Vm6JdmIYQQQgghMtCgWQghhBBCiAw0aBZCCCGEECKDXWu
aI079nLJUsXdF1iykb+KRemj5fpAE0CGHOSekb3ukBY2tfaW0L5xukU7E1ih5lPYxjlPebrQvTm1p7Yus7jiFJOfUjGjfdsZyl1JEJpTjN6XTjs2Xe5TKukep0fcS1nWzh1HP0sxvbaHWjiVbPt9Xqz7Zlsul1J75HOr6EncSYvv75TLqrPpkM9gjCyJbx8wppzkNLzs2sqY5tPTngwHeJ7Z38n3UrZXKpBEsmvJBH8+53UZNVxyi9trWdUekrxwM3oJfz9ugfQZ1cSGlgE0sjXPIVnoB1lk7j/V93bIefOnqJSgr1FDLHRRwX7ebqGke7y4Otzk97NLaCsSzAbbHTxwyKeN9Z/SaDHFn8DqCc+fuhviNSxeG28XyKSjL51FLHIX43PS6HbsQyioV1Djz+qDFxSWIx8fr5pxJwOqxFRflFI4T028kbBNLlpFj9QmII9JtN1Yaw+1SAa9hqlaHuEtrfHyyRh3E5vliS7AarY0oFbD/sl5hzphVN47jOLMzaGW2l6RsU/leWH0+dfcp/TOnfg7bRv8bkl2f7/O7BMM4lWbbeh9SH9Kg/uZyD8cE8w3Tjqd8SuFOXptsdZqjsUjf0jgvXMQ2Hw4wnj6OfbtbwvUH9to2XtsRsUVhF/e9vW2e1Yg85uL4zntV/dIshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmTgJmxGK4QQQgghhAD0S7MQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBho0CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZBLv94DOtBOJmfwBxrZIfbiftNpR96xsvQHzp8m2ID8xND7cPHpzHsgNzENcnxiAuV1yIfd9s9zo9KMsF/Fn8n2GrY65xEEKRU8n7EAdJjB+IYyqPzHE8rLuQ/leJPNx3j3Ydxea8kxCvIYrws90B/iGy9hUEeLtdD8/jgwedfYMu0XHf9FNvXsbftWuXPztqv/tJkv2Rd+y772Qd2Mfmeu9TXH4bx9kNf/df/jrExRI+N/l8Ybjd7VB/1WxBHIZ49uVy2SrDh9+j56RYLEHcbXUgnp+uDrePHZmBso1mA78b4XkmA3O3Vlc2oKxQHYe4Nol944D6lfW1teH2+Bj2m80W9tHLa3isxMW+wr73CbeoZGToOElibVIp7epn/ub38bffMb7
Y+U2I11Y3IV66ZuprdQnr59DZByDuJNsQ31x4bbi9fPt1KJuYmoZ4Y3Md4miAcS1fH2673TqU3XXwLMQHD09BvNUz93GqhGVnp87gcdfw+qNBA+Ji0dz19fVVKGt38Bmp0DVue/jurVRM+ytF9Pz0mhA7fhe/65tnqLeJDWa7dRPiD334/+DsFV9/EccxNJxwoJvw8D2cD3IQF32MXdfsrEcv9X5I7/SExgD0SMXW2CSmcQk/fy79bGpfg0uFvo/9rU/X7/KzbZe5+OGArqGQw/ooF/MQB9bB4hjrY0D1NeAXldXJBNThcN9+7NAJ/nIK/dIshBBCCCFEBho0CyGEEEIIkcGu5RnbMcoxNrs4PdWNzM/r/TZOr6yt49TfrYsXIP72158cbkc0ji9UcFrx4DxOSd51/ADEZ+8+Ptw+ceoQlAXVIsSuh5c/W7COHfBUA05HJSzHKOB5x9bHA5+mOmlKoBfzlAnuu9c3OwsTOg7pSIo0rRoUzH3haY0wNZ3iO3vFgOQsPAVuT9/wVM7b4Z3c150waor6j0K65+/gecZ09MjaNR8lPaHGc+0mZhlRm56Bso9Tau80+TzuP4pQYtHr2VPCON1Xq1YhjhN8FmzpEt8bnuLM5+h5zmOtlsumnwnp+QwHdG9IXpW3rmm6VoCyToL7InWZ0wtx372e2Vcc4XcHA+zPOx2UmCQu3mt7z57DbRnPI9WS3Z1LPW//ns9f+dlfgfjiK9cgfulZ8146fd9dUPZYEd9h2+tYn7mBeU8Ve6ehbOMC933YFvu9GsSVupE65BIs27pehzhaxbaYq82a/eawgeTb2NgmnAmICy62t8h6j88W8bidPLaX9T5KPZIAn81Wx0hfutRPXO8sQuw5OLY4UzPnXRnD938rxHf6XsJdtEf6hHLBXFeeH06PJRXYJuw7E1PfFcck5+Tnjc7Lt97z/F5JvWdc3Jn9PHr0WY/kGQHvO9UvmJiP65P0I0d9u09yDfvQQYLn4UYkQSE5i13qeqOvaTfol2YhhBBCCCEy0KBZCCGEEEKIDDRoFkIIIYQQIoNda5q9GDVcc+OoJQotq7OwgpYyP/ipj0L8E3/64xDfvGnsbNY3UCv1yvmLEP/uZ/4TxL/3q78BcW3caMvmD6H+6eiJIxCfvgsteO65y9iNHD44C2VzM3WIq2XU4Hgu+ZxY/454ZPVWDEjPQ7rHSoTlXt7UdT9hTSTum2SPjm9pCHtd0le+BT3PWyVlNXUHn70T/e93SsPMpHWdo8/rTuonvWvWqbGObed9e3Sm/MnYOq+YBLh+wmK60ef5dlncxLUSiYO2Vr69doDq0yeLp5SUtm+uLdXe6KMu2W0W86gF9SxruJUWWt2tN/EZ7NPzOuOb/o9tLre20QZtYx2tynou9slBzvRRbDeZI81gQH0B2zbZazjimNd3YLvgpjyqbe/n4/q7X/oixF6HdO/z5r6tdpeh7MVvPQ3x4eLdEN++YDTPt66gRrdYRg1zoVSBeGICtcXV3NHh9ngV1+wUEny3Oj28b2Hf1HWPNP/PX12BOM+6/wrex6m6uTmH5iahzC3gebhFslAbrEHc3DDHZpuzsSK2zZtbeJ4vhKbd5yO0mDtQw/PaSxIy3YxT7ylzL3xaW+TS8xXTizqydLlJSqSMFZYaanjc39tl9F1+N6Qs53ZeW8T7Yss5L7W4Yed9+bQ+xgvo5UHnZdvjsl7c5+6FOi97jQ8PedhGbzfol2YhhBBCCCEy0KBZCCGEEEKIDHYtz+ht0zSQh9MLJctmyffx5/EtyszXoZ/mS1UzXTVYwKmtN157BeJ2A6duSpQVrLltpkPXXkVLoVfO34A4+Nw3MC6Z6czxScy+dewITpOdOX0U4rvPnoJ4cspcU4usoMZruO+HTqAUpELyFvv7AWUXDOj/Hp5C8qz5F7bK+k4KGdgWDRUEo1OMjbLRSWfDw7+kMxHyVLz5RJTSG5A9j8OMyk2Yddydp7Y4A1uWJVra3se
KU1Noo2UhdvvhbEolntvbY167iplEeYrTs/oVl6wZPTpXnmq05QssZeDPRhHKM6YmcLp9cqw+3K5SE5ogKYdbILuklrmmTg+ny9kiszdA6UcvQWlbPjDH6vepD27jd/s9LCcnPGjaHrWZVAvi6WU7I2Dqo/vXCz36gcchDpskN8qbuFatQ1n3GvbJl59EmUC4YaQQ/Q2UzWw7mE3vwOFjEFemsf9fvmb2vRgvQdn4JGaYrNVQ+tHrGLu2Xhfbg+uRpDBAO9c826Ze2Rpu5/JXoIxcVJ1ebwvi+Vmsr/Ex0xb9HD4U1Tq2vZVFzBC47ph3fpigDOTUPEouP+nsHfzc06vYsRweHZdsZHMO1j0r22LLAjJ28DicfNghO0gneev9ML8f78T6NUuCOGpfCfUEIb3DnJD1Yeb7Pg1bE2e0BCWwJCcFksayLG036JdmIYQQQgghMtCgWQghhBBCiAw0aBZCCCGEECKDXWuaB2Sh0qdcjknXxGznNOiTBjNEK5wLL7883P6dX/t1KPv2N56EeHsLtVMx6U6D2tRwuzA+BWVsr1UpoM5ocrI+3G7QcS5feAPi61cWIP7yF5+BuFyz0lcXKB1uGzVLB/IYv/u9qL2765F3DbcfvucwlNXKqB0L6b7Y6TpZvrqfmmbWcPXpZHzb6oY/TIzMBJoSgFHbYw0XlQfWzoKUxdDofSeWbjbJ0PumtWSkSx5RB1yU1pRiCGfC2rIMOz/bsjAlVWUd2j7/Dx6MSNvNmjlu/C7d2ygcWNvYP7E9UkBrA6Ym0PaqaAk+cyHa5BU9smeLSHdqtbFyrgxlJ2axP4tJH91ooza0YfW7EZ1zwK5W3DlQaGsOB2wxF3ML3NkyMo5Ht7e95MTduC6lFmCK6o1FYxu3dRv75Je/8jrE7WVMG52ztf95rOvxcbSUm5lBXXJIKc1fee289Vm0TS2WsE30e6hNX7h9ebjd2sZ32OQk7qs2cRBiv4OWhba14MraLShbX0Ot9foSrTcgjfOpMyeH23fdew+W3YP3oR7g+qCtFbMWqT3ANr6af+tWnXdKai0JHdq2oOvTMxKHuP6CtbTwPmDf2Iz3dtY6ntGkVhns+Mms46RsPK1nm1Nyp99D1B+zxNk6Vsiee6lT5vUs1nnQPXwrbyz90iyEEEIIIUQGGjQLIYQQQgiRgQbNQgghhBBCZLD7NNrkUZgjDZytrHVJV1skX8FiFbVTDz/44HB7axk9La+dfxni3ham5/Sr6FNZrZszKY+jp2NE5+GRrsizfDqDAHVF26QX9MgzNaTUjQPLF7VQQo1kcx11aNfWGhB/6zn0l/6zP2k+/8Q9fxbKyBLbGYSkN7Q8ZkMq80K8p45TdvaLlE9jZNU3p/4ksVRKFwkesqzJxbrPkx5qM8T7uNY2dTRTxrIS+Yt6KbNNs++Uly17/bL3sov6XFtzm/JwpuP6e6gLHbnnfTb6HvTa9BdOlW3ul0/1zV7XnD/W9mJmn+Y4IT9f2letjNpi2ws0R89nifSuY2PTuO+kbs6Z9JuBj23ZIZ3yUht1pGFkylkLWyqQ1ykdq9sjT1qrr/RZvMhtPaX9NJXA98Xx7twn9a0SOagf32yiH/DX/5PJCdC7jd7bPhlXT5JOebPRGG4PHEqRPMB2u3z9KsTdHn5+YPlzHzowD2UzM9heFpZRS9zYMO/HWhHfs4PWNsZljH3KeZBE5jzyAb5L+1387mYD8ytEVP7Ki+Yd1u7i9U5OvRvixx/HNT1H68bXeruB9+zAHPo07ynUz0as+7d/g6T+hZfDxPQYJNZzkFD/nrbS58Ul7FOcWNtZXsu8xsfdVdmbnIXjsV/yyH2N7jNcrj9r37yejt+1nGc8tvaFvdxbe4Xpl2YhhBBCCCEy0KBZCCGEEEKIDDRoFkIIIYQQIoNda5rjLupGcijjcyYtPVSedDTL6ygIu4UWj87NRaN3ev0maunq8+jpuNlArdR2axni9qAx3O41UJeWlFCHNvBR57d
i6UirFfQ/LpZRH9YnfSV7lXqu0YA1l1HvFZMmJ1/D83rssfsg/ss/9aeG29M11HBvbpHOk7RBjmdi1jsXWV+4j6S8JG2bSvpslLD+Ga/Dtere9bFJ50m33u5g+/r5338a4q9eN7rHsw89AGX3HBiD+MFZbBPHJ8yxign7LpMuNmZNF4ROaOlG2fM5YH9M/GpqX7FVo6wdY1IarxGauP302HUcx/HJW5m9PeOB0Uqyhyr7orKmPLLCOBzhoeo4jp/HfqVcRD26Z53nyy8+B2XTVew4f/THfxjiuGD6jYC0sfkYNc1fffJZiK/c3IC4bPnylsg3dnoC2/JyBdtyyOsd3J393uOUbzO1fWvbo+dz/1x2HacaoM91lMO+89SxM8Ptb9JamrvvPQFx3MQ20nzNvJcGEXpzl6h9HCDv5eU11KKfuee0Oae7sQ+KXNaNYt23thrD7UKE7TR08bxC0qYfPITvvHLBxJ0WafEr6K3cGsP25BRpaGH1FTdI0/3kN6hfve9eiJ946NHhtt/Ha4h5mcOekuE1bJXzGoqEPpuyNnd27u+5g3dZw5xxmqPgPtxeEzSq7M0Ow+X2OpGUx3/qPFiXvbMbdbru2IMfia3+iO9LxML0XaBfmoUQQgghhMhAg2YhhBBCCCEy2LU8w01oerOPU4ebG2bapOjjT94LyzjVd6NBUxW+SUP7wHs+DmVzBw5BvLGGUodbL1yEuGRZK+VinAoNyIrEoyy8rmX70tqiaaAE7Yo6XYwjnpIcmDookF1Rnqa2zp07DfHf+G9/AuLTp43tkEspyfM03enQVKlryTMqZbxg390/u6cUqVTPO///xiUu2VTZFlhBgNf4xi20MPw3v/oViD/73ArEyeGHhtuXL+L05hdfJhuu1csQP37OyGze9+AslJ0gKcc8zoQ6Yy5OvdvVk8qozalAE576I+nHHcgzUtOIsW19t7OkZj+IKb21R9PLtp1ZyvYsZAs1tNCy9xQOsG9juUHLI7kQTRmHA/MMrq/chLKDM2ch3trEqflcwciv+pROl6fi43Xsg9rL2JYnpi15BtnTRSlbK7ymfg/rGiRRqZTbO7c3Lk9Z/3EnvIc0btAzRjZqiXVdD7z7KJT509gGFl9HKUzbNfdxdgqffbbvG9D0+rkHHoX46Klzw20vhx1FMsB7HuS4D7dSp1Pq5qlpTPfe6OG+Fm5ex89PmOvoNDFteEjXFJBl5tLqDYgT13w+X8Z+9do1TNH925/7Iu7bN3KN+05hCvKpyXFnv8iSo6VSRd8B2F+x3ID1GfTspmQTthxhNHckscuQZ6TOY4TUwyWpnZshFQWJF9vXZVwDVi3WiOQZQgghhBBC7AEaNAshhBBCCJGBBs1CCCGEEEJksGtNc5k1cH3SkUSWnjDAsqkq6QdzbOFkyuMItS6Nq6gXrM+gXuzYOUzBObC0d5yeNCLLNSdEvxpb3xLSeaRiTouJoePFtoUM6s7uu+9uiP+7v/6TEB+bQ93WworRy7G23HEw5hTTiaU9z+XJ1iXG+tnTNNopCzZOM53YAZTZdek4jpOn+rTvzde+/RKU/bvfQluub91AbXH+0PtwX67RELY28fFYbWB9LVzGdv3SDWOd+PzKApS992G8p3dP4TXcNYbXXCsaDWCpgJ+t5VA/WEylDkc8W9NMOuCE/2+muo8sHW1CFmh+RN/dY4l8n9YRsD7NTn8dkF7VJ+1/zGsQoA5JU0h6u/EKWmRVHbwf0cq14fa9Ie7r4TYet/kSrsm4VDTrHQ6fxBTKpwL87ixd/0SISWJrvrmmgo/XsNWmhLLcv7tUX1adxDHZ0aXglLg7axuTiPugveP3/hNqZT1qv90N074efeQhKHviQ++CuHEv6se/mPzhcLt5Beu26NUhnj+MqZ+PHzsFcWA9SN0Wpo2OqM/Ok23qoSNGM18uYbvMUwr3cqsF8RbZuV60rFK7HdRwR6SvDyl2C/RM9E2bcRNsP+E2rju59gbW7Ze+Zmx
lvT5qvN9/3/5pmn2yrUzpdEfocnmNRSpttNUvpy0daS0WHccd2afz+gIMR+mQs8j6rn2NaQe5DF32KKvTjHTeo8+T0nOnFgxlo1+ahRBCCCGEyECDZiGEEEIIITLQoFkIIYQQQogMdu/TTKmL8z7qbIp5E5OcyckX8LNlim0vyi6l637Pg5hS+sHTxyG+fOs2xFevXhhuLy1cg7LbS/jZ5VXUUjVWjaYraqF2rEA+pw55uUac29HKM/7AQ+eg6O///b8I8SMPYIrWxjrquB3LAzMir2lODxwPdtYV5Uh0us26xj3ETyi9dcwqcBNH5DXtBPjdhQ6e969/yaQq/sUvoC/uYhPbT2kCdaI9B/V1UdvUvRfgPR1QyvJKnrRVVhu4/Drqb0sB+vFuH8b/V1dr2J7GfBMHlD75yDTq+OYnUOc3WUF/ct8z9RmRHjVmjXyMdW2nU44C/G6hh7rhudk99tylhQQx6eTtZzCm5zOgNsSaZ/ZZt3FJDzw9hvVfpm60bXkvPzjAshPnsU96fX0N4m/UDwy3v/8setQHXfSoLzVQN3/EQf/bQdvc91yedPEBfrZYonUn1M8k1jOZWo+Q4QZr3yf2vE7egk/qW+XsA/dA7PZI571q9MM3V9ah7MVvX4X4fe9Bb+XjJ4wO9/pKA8oeOPMExNNT2AetLWPf4LrmPApFyo9Az6uX4LM+OXVwuN0iPfQG9V/Uyzq5EraJbt/0Oz16H4YhvqNcerfUqhNYHpl1AB6tADo+fwBiO3234zjOyoo50y99Fb3x7z2OenDMgLC3sC65YA18eA0Fa3RD8tDuWXXN/Vq5VIWY11ex5tn+KTStu87SEjvf/aT6HySV1sBK0e2l0tDf+e/G+qVZCCGEEEKIDDRoFkIIIYQQIgMNmoUQQgghhMhg15rm9RbqSPM+6hvrjvG/Dcnr1Xfxs7UK6p/aA6PJCUkbXKygD+X4JKqWalPomfrud50ebh+cwe9uNDch/tlf+nWIf/1X/5M5py30rIzIT9QjGZHr4TXdf685j3/0d/4KlL3r5GGI/W081oEaapgG1v82/QTPg6SHTr+P5ZWC0SqWSLd4e7B/AqaElEY+6bYC6767Hmpjn75+C+J/9zX0Xn7uttHADQ6in+qDNWwvp2ewbX7zZfQfPX/JaMsGvTegrJjDtubnsX35Y5amljTwF27gfVnewPgZ0laPF0x8eA41fsf6qEY8TRrByR5qoG1ZZODjd4vkmb7VQH2lrZfzAn4GUE85hxbq7zge+1GTX7fto8p6w7CPdcL+wLbHKutuq0VsQzOzdYj7HfSV9baNTjk8hG15KUd9H+lOZzbNeQdr16FssYVrMjYi7DfapKNMEqOrz5M2NuyRry51JD4v4di2tP5s/Up1zRpLLofj7FjyznP3vWfxDyEefWPVtP2oRxrUFl7Day+inrzkzw23HyHt9L1nsE+iqnc2G+iXvLpmfIkTl9bWUFvM5dB3vlI25YM+9ikbDezrBgk+E6Ui9mdjk/XhdquD6396m3jOboTHKuXJI7pk+orjR1Gr/1f+wo9B/NSLqFu+sGDaXpLg9UcuXv9eElM7fulFzAlw/tXzw+3JiSkoi0jDvEbrqVqWZ3a5ivfh3nO4LueBBx6EuD45iedpveJSzybrgVNDAPdNN/8opj9QmNIWuyPKKL6TlQ2pa6Byfk+49poKvuAR3to7oV+ahRBCCCGEyECDZiGEEEIIITLQoFkIIYQQQogMdq1pni6T/om0eWNVo5eNBqSbJE/UYkCak9B8PnZROx2Shre5hl6TAe1rsmr0n1W6us8/+XWIv/GVr+C+LHWMT9KXdhd9KYMcarYevA91R//o//RXh9sffPAuKPNIc9rpoh6sF+E1blm+jM0OeTSGeKKBi3U9sL7bJx1xt7ezN+07zWaM+ssOaeAGoanPJ1++AmWff/YixPHYEYjPHjdi2kEH78t8HtvTyRnUB//hNjaS5pY5LzdsQFmphuopf4D
PQCE0ur1iH+u2RtrEPJuk0jqATce0gdUbqG29kMe2+NpBvKa5KfQRnp4wOsADB9E/dXoa41ubWF+3Fo2vcK2Ibe12A+/pe059v7OXuOTp67HnqKWL91mrRnEcYvsDv3P67OHDuAZhbhzXHNRvYvs8MG78bssPnYayRoA3vvpt9BV/14LxYq7dwv1uVVG/mTuLHrWHc+Rv2zfXwZrumLxewz7e9wHFjuXTHNN9SGgdCnu/osaQPpvsXx909AD6Afse1udU3WjAQ8oX0FpB3/Vnn8Z1FY4lTW9feRWKNlfRi/vAkeMQHz50BuKVVVPXm/SM1aeofw/xvvb6RhsbRnjOiYOfHXRxjY8fURvomT6o00QNbtRHTTM/T9Oks33kXeYd+KEPPAZlp05gu721iH34pWtmTcv9J/FdOlvZ9RDmbeP7WPdT0zMQv/yS0TQ/+8xzUJawL3MH+3C7LyuXca3I56rY3zz88MMQf/B7vxfis/eascjcgTkoK5dRL+3SepjYWusRUR/B6x7ilOczfd76TZbl0KlfazNFzcmbbP3/49EaZ99SULvc/7yFn431S7MQQgghhBAZaNAshBBCCCFEBrue28hzykgq39owUz1dnIlwqjTdcHsNp4061sxOnmxwul3+LJ7H/CRORd+8YaaR/vW/+z0o+9wX/gDijS2010osqcOAplOCHE7NPHQOp4n+p//zX4P48QeMvVG/icep1+sQ98iiaXUNp+T8vJlGnCGbM6eA//d06bwTK1nqVhunhAZdnsjYO/ue8zfRluvqFh77yfOm/fzuU2iNVK/hVN97ptGy6MIl04Cu3qImTdOIxRJOSd5ewfbjR2aeNaC0zY/fg23zkfvQgm5rfWm4fe3181A2WEDJSXMd463NZSy3HoptktG4lK36BZ6GJSstNzJtd/wAWiEdOYPtePIIpnSfPmxS/n7y/TiteqN5wdlP2NYqIYlLYs0BxjwfSHIyn+flLIlYcQqnQ+cPYnsb65Kl2hq27c4xIwPolaahrL+JfYHfwn2VXDNVXfIwrXHPx2c/KGBDCPPYR62vmbafI/lPgezFxsto5bWexz6ob1mMRjSNz9Oj3ggbJ5ZyxPGdmE29Pa5eRQvJuZljEB+eM7Iv18O6XC1jWu3c17HxdTbNM3h7iawCb2Psklzx0EGUZ8wdOD7cnpzE9hM7WPdXr6EUJLCmzNtt7Pvamyix8Eiu0eth37ixbj7vkUwyT+9DP4ft6eMf/ijEP/pn3jPcTug5zlOK+3vvxuet3zPHfvTsQSgbhPiu2EvYSvH48eMQ/9W/aiSZ/4//+/8KZedfeQV3xpaYlmSz38H7trGC9+3GlasQf/lLX8LzOmUkYY89hn32E09gSvdDh7GuJyaNXK9KspA8tdtUym2SWkF98XPPfUbKCY76EOtg6VTYfB5sq7dzHxNFrJPMRr80CyGEEEIIkYEGzUIIIYQQQmSgQbMQQgghhBAZ7FrT7JGOLSKt8dS40Xe6OdQwL62hVqrRRh1J0xJBF2K0W+u0MW6TxdrlK6hT+/JXvjzcfuFlTHPZIRulDu2rZ50Hp8Xm1Kh/92/+VYgfPHsS4mbbaJRaHdTUDBzULAUk6KmXsP5KY0Zv2CcbmF4fdcq1CuqOyiWje9xq4nFz7v7Z9fzyV9CC5w8XUFt7bak+3O6HqK/cWsH6uPkZsuSzNL9RgHo5x6O4iTrQYMC2OaYNeDF+94c/jJZV3/8BzBv9P//Dnxtuf+bn/zUet9cdGfuUt9gNzPNUP4O2ZXNH0XKvdhfqkjdeQZ3j1We+NdzOHUCd7PnbqLXrXMdnZiZv7hOnl27uXxb2/3x80utzitiRX8bfB3Ie1ndk6dfr43Uom6DncW0BUyj3+mQ9OGn6gp6Pln6FImroNwY38Dyapm0vPo9aWL+O1lylIl5DN4c1ENVN+2QNauLjOReLuJ6BPx9YGsOYNYIeHZd0gna74RS4+8n1i5ie+YVvYJ90zkpXfO+9D0D
ZWB6fm6qHbaJeN33WZhU1qBurqIeuV3EtRDTAd6nnmvqbnkZNM2WOd8pFbBPrq2ZdRbVMqawnaQ3GBmrxV601GY7jOJ220d+XKH13QGtrJqvYN87WsY/yCqazyOfrUNZuY0eSTGB7euxD5t076OE77HoX6xaftncW7m96PRxPnDhpnvv/6i/9NJT9k//nP4b49k20moQaYAvHVL5qLN9aQUvDF5dNfP5ZbOO//Su/CvHswXmIDx419ppHDuM9PHbsKMRH6T10YB7bgN2PVirYXgp57Lt86o9c0i3bdR/H2L+k11BQe7L6nzDe/fqLndAvzUIIIYQQQmSgQbMQQgghhBAZaNAshBBCCCFEBrsWtXYojTRJAp2wZ3RZUadNZfjdLqUYtnMZkpTYCXI4rn/+WUxf+pWvfAHitTWjJ9tqNp1R5PKoS3MtjeB9d6N35v/09/5biN/zLkybvbqIerC+Zabb55TcA6yPCul72Itza7sx3O7RLWOb02YX637M0nImZPDb6+FnHafi7BW3G6gPXnBRfebPG73UWAt1a90N1LJuUhpfx7XuY8h6S9ZKYf15LqWWzVnaO6qvSg5vZLSN92n1uqUt2yCv7RxqRscPYgrkKfLLbNwy7anVRC/SjTU8r3oO9fYOaU5n7jGa5/v/wk/hOXdQ1+h08Zr820bXeLGF1+QG9LDuMYlDPs2cPnWUXpb83aMYNbuVcaMdHaug9jPo4XX3trDOPNJAuzWjJQ5zqGFOYtS7trH5OYmV+rqzhoWlAfaFkYv9yEKvAfHUfeYZY5/5mMSx+Ty2qVwO48jSEfb75AvO/qycGjt5003HcdIpt/eSS1/H9S/FItbJU8vfGG53t7APuvceTF28Qemr89umjzpM6brdGO9bpYR9X30c13ecOm60sSfJC3iriff8xBHUd/p21SfYv7sFWr+RYNtcb+A7zL43rRbuq5rDd8VkGa8h52LdXuuavrFWwP58QIbrrTK+t8fLZl9tSmceU//9oLN3cP/C8WBgzuXx97wbyv7cT/15iP/f//Sf4XettVv8TMS8RoCOy5+3Q07fvUzrMZZuY/zSt63xFf2kGhRIxz9Zh3iWUnYfOmg8tU+cQP//u07jOp3jx9AzfZ601nZuixr5R3Onwn7aiVUf7Mo88p2xA/qlWQghhBBCiAw0aBZCCCGEECIDDZqFEEIIIYTIYNea5nYftR8BSdFWVo1mKU9a4fEqevQtk3Yq5xvNkhuRZok0um6EcaWE2ql1x2h42P/SZ61QghrJhx40iqj/7q//JSi7767jEG+sozdiRNrZvuWfzN6SedJMFslbOSQf680tc80d0jUOIow7PbxPF28aPaZPOsW56v5pUsukoeQ20bTagO/XoSzZxnvu5/C+gqYpxOMkMTVUlz0csa5dq/2wZJek504+wLo/PG+0rD65BkekY++Rrn+lgdfYXDO62dI4Pk/9DTyxtW8/BfHCefReHj9n9Pm9CfJ17WJ9VBw8ll+w7gv5Vhd5YcMeE5FPM/uVjtLHJqQLDEn7XikbH9qZqRn8bLMB8dq1CxDPHUc9Xmfd6AT7tL7DJS/cweYixJ51nm6E9yo/wPueS7D+az76hpcsPbVHXvAxLYbY2kQv/TDEe+375vthhDpS1jRnaS6R/dM0X/jWLfyDh23ikY+9b7h96MRDULZJ51kfQ83l9Ia55lwN97uUa+Bxqc+pj6E+uFIy97nbxvUMYR+/mydv7rtOm7Z48/pFKFtZQS1+QB1clbSinY713iLpZ7WE77CDJWx7p6YPQ7zsGz/y1gDfnUUP3wVbAa2BGphnJiY/3wH3CftISg9rtfOQtMSf+P5PQnz9Gnqw//ov/ZIJqK+KSYmbfp7oPKzzSj2L9DOpy8bftm8xfTfs4XO/uoB92eoSxueff3G47dOainIZ29rExCTEc7Oojz5srfl58KFzUPbJT2Ldzs1iW4zs/onf/9I0CyGEEEII8c6jQbMQQgghhBAZ7Hp
+tdmmFLZko+NYNh8VHz87VcLD3DWPP83bU9WbNE1dIFugDz+CpjJ3k5XJZWva4+Ibl6Ds2lWMJ2s4/fk3furTw+13n8Mp17i7BbFtL+M4jlOpYZpR20YuJrsxh6ZuNlcwFagXoIyiak9tDLB+vIimzAt4LDv9rZPDqRjP27+UtueO45T3S9fwvm50rGvuYv24JAsIfJrytmUSlC6Z1Rmpdst2PqGxXAtyeM9zBZqWLuDU6F/8P/6V4fahc2gh96U/+DLE3/rq0xA3Xn0dYj9v9u0HOH3pjWG8cAXbtcs2etZ0+sLVq1AWkp3arZvLEOfaZqr07gJOycZbOKW/10Th6PSp9rQlT7uFJE/wCvSMWdODRZKXbd7EaceL3/46xMcdnE6eSEz7rbLHEdlv1j1sy1HVnFfkY/saK+Mz0/dImraJ7bUQmPJ8Ea8pF+EzNlHFPnp1jfp7q58J+/g8svTjTmDbwL1kvIZ1MEjI+iw2z0l3gNaLVR/fWT/x6KcgfuyksSx85vq3oaxPdfvGJbS+K+Wx/laXrw63V1auQlmY4Gc7LZIyWHKgrQZKOyZpCny7i89+i6bfB5YUJKBnophg2/ueo2jBes/JuyG+2b863A5z2H7CLsZ9n/p7y3Y2R+OBAdmL7SV8JI5tp0+Wg7HlI1vQLdwyabX/8PO/j/vl5yu6A1layo8NQ5fHABF4FuJnSaLk+tT/ktzHTncdUZ/RDNFWsEkSzOs3b0P8wksvDbefevpJKCuW8H344z/2Y3geVn1xXcZvof/RL81CCCGEEEJkoEGzEEIIIYQQGWjQLIQQQgghRAa71jRXSV/XIyulUsVoaZsd1FndXsLUsWNFSudsfX6LdHljtTrEAZ1HIY86ywNTRkv1vnehNUlzDfWaU1XUaT1493FzXA81OPkS2fF0UaO0toS65Lyl75kYx3OMOMtsn9JZk94wZ2mH6tOUQpI0XX3Sh21a9nVbXdQRNXosuDzg7BXvefhRiH/j1SsQR1tGw+R2UWfkk+1gEFAaX0urHVB65Jisxfp9zv+OYWKlkfZJ7zXAqnVevYIWTssr5l58+k//BJT90A/+EMRf+8pXIP7NX/t1PE/X7Mujtnclxuvv5NFKq93A895cM20zevo5KKtNoa6/u4rXNFkz6XE/cu4slP3B51B7t9ck9OCwbZpj3a9Ualn6eaBSQR1cztI49yhd9/gsphu+735MW/6QlYLbcRxn3rG1fdSWC3hc9zCuhYisNQkdltuV0ZpszcdrXHZRw5p4pi2zxjsgPSLbJ9bHsc2tbBj9ekpBmSELBF0lnUf8Fiyf3ipulVKnU1rg25dfG27/wS/hc/Dhx78P4nN3oc3VmKU1XmhiP1uYxBTTH/jwGYgniljXr79iUhm3u/ju7FFldyk9vOeZdjwzh/35ZgPfrddvXIO41UIdt2/puOt0jh+efwTij594L8Rbq7ivtQ2jny6fo3POU8fawT46ck0fRc6uzvYWvTv3EO5uYtb4Wm2Z11uwBd3kNLaJ//qv/7Xh9q0bV6Hs8vlX8Tj8BNKJsRJ5JDu75qW/69Jx+NGlcU3J6q+OUjr4+gz2meNjdYhn6Zk5YNnIzc1i2d13o35+EPK4xs4rPvqcd4N+aRZCCCGEECIDDZqFEEIIIYTIQINmIYQQQgghMti9pnkb9U8e+VgWIqPFC0g40mvh2HyZdKax5Xk8mUfR0piL+rB+jMfNu6gVKlqH8hPc17F59Kks0b8M7WWT0ra3gak+vQD9jwchfnmridqqoqWR3Kb0pWS16fjkn+zn8LZ4lv4wbFEaSE65GZGPZ9PoxZot1I6lPVLPOnvF8m30Ju1c/iLEuYbRP+Vc1JB6AWoP44A0b3lzb3KU2pmvMHFIE16kVLJjRuRU7KI2P08+lB5ri5umHV+7hlrE4+RN/kN/CjWS733PwxCvbDSG2+OTmBb0egP9kV/89rMQ/7t//79B3LhlNM/e7QUoqxSOQ1wPSRO+1LA+i3r
ch87c6+wnbkp/lqFxtiBLY+fwHPqG53zbyxP3W6ujlvj0Q+hJW2lB6PQs7XWQ0jZSGmQSadq9TEwe7H1qb+VxvB9n7j8C8ZJ1TRsbqHcer6KWvdFoQLxJabVBo5lKRU/3YR91yndCbQ6vOQppfUzB3OfNVdT/VpvY/x9s4L6Ki6a+HjmMz8W1W9+CeKPXgHi7iv7I+bp5T4138Tjnr74G8WobvZYLRaORX93Asq029mf9Pt6ncfKwL1RNP/zBI++Hsp9+4qcgnq5jH9Xo4njh5a+a8z6VOwhl83fje7kzwLYXWf19t0dji826s1/wugBu98mIz/Iai14PxzHHrXwTP23pmx3Hcf7Z//K/Qrx6Az2MUxrnO7BNT12TFY/yf/7Pn+adQViwvL1/+Id/GMre/Z4nIO52sW9rN7Ht5nzTgd919i4oq1GOjCjaOe14aq1L5jWm0S/NQgghhBBCZKBBsxBCCCGEEBlo0CyEEEIIIUQGu9Y0r5Iut1xBr+VqYDQnUYwardU11Puub6C2amLM6LYqHmrHwm3UugQu6lVqlNM9yRkdzYC002EHdTIdZ4QmibUubBBJWkUu77XMeYfsy0ziTC+Px/IC9pg1+46Tncscx3H6ZEh72/Kx3MLb4syVdn373zbhOmrccjf/EOLuK0Z3O3BQQxp7qGn2KqhhKowbTVwwhlpVp4BxzUN/yEIFtcZ+aDwxox76H5dL6E165Ahq8Q4cMn7cX/siXt/iNWzHH3o/arpiB+/j4UNGn5rP4fUfOoCawEPkZf7Z3/pNiPttc+xH77sfj3MKdexLi0sYX7o43K5u4nN7bhx1jHuNm9DzOkI3yGUDB/uNHnl/+5aGt3oYfUArffxugbzQnRDvXWhpHT0fy1gHyeV9S5Pf8fC4CXkph+Qj7gTkSW6tO2GdX0DrAnzybe52UXNpa6Ij8pxluO5H8hY0hW+VycO4ViKh9R/5onnOajH2Me+d+B6Ia9ewDuJF057OHp6DsveVj0P8SoTrO3pl3NetbdMWL5Fn78uvoWdvo4X9SrU2MdyenJ6HsrGxCYjb9N1ivg7xRw++e7j9l9/9l6Bspo77jqvUvor4Xr55yayl6Bawbc0cwD469unZ9Uz9DNrYTnMRXtNeErN2P97Z5DdLO8tx30oC8MR73wdlP/lTuA7l//vP/jl+t4192Tv1RI26vjc7EF9Tw+ozfuPXfg3KvvX/a+/cYuy66jO+9z57n/s5czwXz4w943vixLmQmJDgAE2hSKUCBCJURZUoKoKiQqGP9Kmij1Wf+tCiCvUBtdCWSkChKrRQSJrEiR1ytxPHl8QeX2bs8XjO3M51X/pQ6az1fcc+e4w9Q9V+v6f99zr7vi7bs771/Q8/DfH8PObQWFvDbzW7S3nkkUNQ9oUvoAZ8YgL9ye37kKZZCCGEEEKITUAfzUIIIYQQQqSgj2YhhBBCCCFSWLeo9fg51JzUSqhZaraMYLZC3rf1a+h5udpA/VMYGj1Pp4v6rgr5iVZLqHctkqbXs7R6PudKz6B+JSZtni3h8VjDzN7K5NnLnsdhYu4xZt0MhezdGrik4bWuK0vawzrpti+uoK9109JKlXP4zgo+mdduIHvuOADx+w69A+K1i0ard+3SaShbWkbtXSZAPf1qZJ5BUCEvbtbduuht282ghn750vHe9vAYvodwAb0mvXHUPZaK5llvHcZ6ungFNVqtLr7zRkiVwpLNkvW00+qQRzr5fD9yED2fpyaMxrKQwYM1z6HWvBBhm3jn/UbzfHDPDigLyAd9o0m6qCVmPRqsSSA9XibEfVt1bCfjW3f2tmt5fDelJv529iR55Z4+D3EuMW00Q2sfWO6bK5Lf76jR3Yak5XeKqMmNp1HPWX5gO8SedR3FLLaZ+iJ64c7Po49zcw11kmt1U87acvZrdQdImrko7vuXDSRH9TUmj/9hU5/uHdsLZTsb2I8kV3CBiL3WJHcR69qBRg3iN1/F+nJ85RTErzaND++
5Fo6dHdKmF4vYz1SGTJ3I03qNboz7Dvu474cmUUv76Ts+3tve7uE6iiUX7788gf3q1fO4BqprjXEzp+eg7PyZCxDvu2cnxI1FU0e6LWxPhSrW041kkKfxzdLXd1mHyiR4jx/92MchPn9uBuLv/9N38NhW33cr1+h5N75Gx7mOHphPZfnVnzh2HIpOvHIMf8tDSRbHqS3DZlz3eD0ZMeiepWkWQgghhBBiE9BHsxBCCCGEECmsW55R8XFKrkS2aKvXrOm9Ak4r1mgKMhfg9FVoySQimnpurJDNC01rd4o47ViyzsXT1py+OqZpxtiaTuDpXYbTV/OUQGxZXCVkDUWzZA7PdiaUDjhjWY61yL/u/BLm8G2R9qNkSVKGPZyO9DqU/3cDudzE6y5O7YN45A4jA1ioX4OyhNJ/ZyjVetQ0VmhxCx+u5+H0Jk/HRD5OK3qJmVZcrqN042t/+mcQ33EX2rXd+6CRnNx1D0okHn4PpqEtUr3N5LCNtCyZUrON761KU+0ByY4KnO67baZSo5DrLYRORJUxrJpzvXQG7a4O3L25abQTbjiDptb4PmKsQ/UO2uftGjLPf7mD9W91Bu/75IlXIF46h9Pt+/ff1dvOkZwsJou5xSU81+XnzLE8H6VGQQ3jEbLjfHgSyzMVqy8ki7lOB5/H/FW8jnNnUbZj1xP3FiQVfW9sE1Nu2ymmHcdxShU8d5Az9SsfYnushVWI2SowtProTBvr3s4symZ+765PQHzm9IsQXxgzfdDhFbSne+LKSYiv5fBcq20zDrdmsX/fXUSJye9Pvx/iD1c+CHE2MrKutRyl3K5ivV65iBKmJ3/6DMTNNWtsxW7VWaGxIfCwfxuvGJvQuQbKihab2DY3Eq6qtyJ94H0zdl8WYVm+gGPUpz7zGYgvXcS02kefeKK3zVKGm7nmOB7825u6f5KVOvRtlqVvxAceQnvXxz/5eG/7sXejXWuljDIkxn4G/DwkzxBCCCGEEGID0EezEEIIIYQQKeijWQghhBBCiBTWrWkeylPaVdKzeJZard1G7aHroX6Orc8Sy16ky7ZSpEVMyJprbRWtb7qWJjpP5wk8tkjhtLzm/xBpUpcwJZWsZ+luuxGexyXbL1JqOo0u3tNaw9zTYps0OeRHVsli+bCVKjuboF48s4kpbENKrT68De3LHv3Ib/e2a6OYVvX1o0cgPn8GLelsK7Kt2/dAmZtBfVyB0r9v2XMHxHVLmz82sQvK1pqop/v+M89B/LMXjNZ1bPQnUPbQO1D/+8df+hzEo8NoJzZzwWhbWW86RWlnSeLtVEkfFlht0ydtWcSpqMnCsBibgy+soj780iJe16+aeIDVUobSyy9dRkus+oxJVVteRK1w/ZkXIA5Wse13A9Qc1gvm+WfHMG17cRjjy29jqvYXTp3pba+FC1AWhfi8d+XwHtyjqGHd/8Cj5rwlvMZjb6Bt3tmzZyHukvVnxuqz+rWMt6BL3kRNc3sZbR+9CBtOoWq1wTXUSQZZbFNOBscWzzW/93Io2o1XsN8YzmP7re1Bjeb0nLGg21NFnfrOAHXZzy+T9rxtruMdY/dD2XvHMRX4gcqdEAeUZru919xjYQeOM14Dx475uasQv3wMddq2rWylgmnqa9vRYq7t4XjZvGr65JyP72U5ZRy+nSSkyE+oT3FvsO04juPSgqo+W0Z7zQCNyyGt5dg2hRr5P/ijL0I8a40d50+fgbKMy1a6eCF+1mjzi1Uck2IPryNfwHperaDuv1KtXnfbcRwnIE3zzm3YJj7x+OMQ77n7vt521MLvGH7avN7Afpz8ycO2eutBf2kWQgghhBAiBX00CyGEEEIIkYI+moUQQgghhEhh3Zrmq8uoHRqiNNoFS0sbcGrrhDTOpPHNWumc8wU8bkSapYg1zuTpaOsa223UvkSkX/FITxda/ogu6ZUyA3Qy/3MhGC5auu7lmHSkLj6PLvkhNshD1bXOHbjk2UvPdiKH/pkF69QRi6c3MYV
tRHkyPXqAY+MTve0HP/QxKJt+4CGIz7z6EsSvP/dsb3tyfBzK2iuoTy2VUVs1VEPd1lvHjNaz28RrLg3jvnlKI7108s3edmPmLJRdm0Xt4dQ06tI+++lPQRxbmtJWA1PFnp/BNKrbJlAjWB1C3V/G8nGO2C+T7sF3sT11Le3m+x59D5TtvwNTDW80aWlsB8ljPdK9uR3qkxpGc7+ygL6nV46jhv7TD/8anjeDJ57Pm2e6RjrRoW27IR4vTUD8gYLRPLvLqIVNCvhuzl5Ef+i5J45CXLr7UG/bD7BfvXQJ77HRQD207Q3vOI4TWZ0H9323tDIixQv2djIxjLrurFfDS1k0fecaeQeHmEXacTKU4tzWaGbxifiknQ5X6hBHNTzWSMH0M6V51K1P5XDNxke20FgxZDTQI0OTUFYOsC6GDp43jrCvdM+fNdsu9leLNB7+4IV/g/jt+psQR5YvfZn70RrW61aGxkfH8qxfxLUxY9vQK38j6UvBnLDnr9n2OLU8H2vAeeiw/bkF6Jto/4G7If78l7/U2/7LP/8LKFucvTLgzI7jWd9in/9D1Erf9857IQ5oMU2piONOwVrbkcuj/tknv+Sii99quRzWTXuNBX+L8bcEPy+7nIe/tJTc10N/aRZCCCGEECIFfTQLIYQQQgiRgj6ahRBCCCGESGHdmuaT59EjdKiCvpW1gtGZTA6jrjbwUZ+S9PkQGp2SG2NZEOAlst9hQvrpwBKteKQddtkckfb1rf9CJHSekPLBt9qoK7pwYRbiFc9oeLxh1C0mPimafPy/S1DA5xXErd72sI/nnayQjog8LuPI/D5hX+p48/7PFJOKi3VJtkzLpWo5uR09nbcOo3fpvv3Gb/Qk6Z1fe/K/IPYiql8XUWvsLBod6bVrqCmde4ueXxt9X92m5WPcRp3eaoTP+u+//g2I3/zF8xBPbJ/qbY+QTnvvPtQ1LpP+cn4F447lC+v5qOvsNPC3KyuoLbtWNG35Shfb08gyerOOlvA93W5i0u8PFDFTH9OJsd2EPr6fZUvTO3sK68R0FTXjZdKMe9Q3BJan8UXq+/JV3HdLjOXbxs11FIrYDvJl1N+36lg/j5xFz+fQ8vidv4rv6tIs9ldZ0jxH/KxtEg7Xr2p2eedN1DRP70RN7+oVHKcWLlqe5Avokd0cw3UFThaPlXHM80tC6idIH+5U8LzdJvYjvm/ee2U7aonj7i6Iy218fs2W0SXHyy0oa3uoY8/QeqCIxh0/NNfZfXsFyr576kcQf+fEP+B15LG/ywZmPMyh1bQTJlg3V1r4/PyiOVZjCTXN5+foYBtIn8fvAG1/mv1v37qAW1gYENG4/tgHP9DbvnIZNczf+Ku/xn1p3Vezaer5NfLhP3jwIO4b0foyasruAINkl/5e6yf4XrlXcD3TvjJUys+adcoZK5amWQghhBBCiE1AH81CCCGEEEKkoI9mIYQQQgghUli3ptnN4E+XGqhnabSMFiZxUWPjkW9l2aM89tZ2hgQqBdKcZEmLmAtIDxxYXsuUZj2hY3k+egcudc2+i+Tvu9pATeT8VdSh1dfwuos1SztKudLzRbywLQXUh/sheqbmI/Nsx0lLXsqRNsjHZ9u1XlOnhe+s08F72kjYO5G16l7G3BfXtRy9xy7tWyyafYtl1AvmSG95/MgvIL5wGj14A0vzHBSwfgQu6wexDuy+22irH3jwYSh7+mdPQnz1CuoLf/gj1KMmidFw1YZQyzq2A01ji9VRiJ3cGITLLaNVXJxFj+dOC+uAn0PN7X3TxgN0idYb1FvYRjaamDzaGdu3metbEqNGN46wTS5cMX64RRd1knEGfWV/+tKLEEcetknH0jF3dm2Fop2T6J27cgL10+fnjb6z2EFNarWKz3uePLaTUVw7ERVMW5iduwxlFy6gx3McoaZwsK/s7ftbC3ttbyRjE9guOigJd2bnLvS2Mx2sH50YNc2tFu7sWZrmoEua5g7um4nwvRbW6NjW2oh
rMb5zj/r3QhG11eVqrbcdBbj2w6W1MyHlIgio3nuWX/K3jn0byr7+0t9CXC/idWazUxDni+a6tk5hH+MkdQjjLvkSZ8zzLI+j9r696GwaXOsTXiOFpQOP1a9hvvHvuY1w39Z3bGt8/MTvfBLKZmawv/nRv/7whhd25OgRKJonz/DhYax7g+7BZZ98epquh3r6fl2yuaeMQ2vVWONM66Uy8mkWQgghhBBic9FHsxBCCCGEECmsW54xRK45NFPrtK0pqYVlnLYOyGItiXB6Jp8zl1GgufgOpbv1HZxG9DySGNjWSZTqNAzxz/jnZ3CK8sySOdcq2bGFXdw3n6cppjKe6+R5Mw3eaqNdz6GDD0KcWcV7qlBa3mFLctBt4lTeElnGlEooT3AtuYtHNno+TWNsJn1yDd9+bzTtHOLzCVyyRnLNvtum0I5t/KNo13b/wUcgfv7osxC/9tTh3vbCJZQy4MRgf2rmHXeaNKN73nsIyo6fOgHxtXMXIH7PY++H+NKFud72UBVlIrvuxbSpYaYG8Ssn0MJptWHqddSmtLsetq8wwn3nThsrvGj1AJQ5o3jejeZmpikjyhnPdovtiGz5LKvGiR04tXzi8imITy+g1CEkd7bZnLnOKQ/bZ2EMpTX//txTEOeXTV+RI+mMF2Dbdmpohbf/PZjee2zbdG/76SOHoWx1FfvofvpMn1J+/8uxmfKMmTfwnmdex36la1mddWjMCun2Ly+iPKMbGUnFLvrtGj3rmNIg5wJs34WKkWMFMUqzGnW0AVtexjEsWzZT6CH3kx6OWdUKTq8v+nWI//GEsZX7mze+BWXXArynoo8Spoim0LNWank/h/cbxdjfZ7FpOpY6w8l7uO/kNFpobiQeyTEG1l2SqF7naPT7QcdKuy6SPlg2jsUi9hmf/dznIL4yj33ZkcNGkjFzAceos2fPQjwxgdIzO9W142D/zNfIXx4BSTLdAbIJlnr0icf6/PxueKhfCv2lWQghhBBCiBT00SyEEEIIIUQK+mgWQgghhBAihXVrmstl1BJlM6jZWWsarSRJh51OB7Vja2SxE3aNwsVz0FKuNHxj/bPjOH3616XIlF9dQP3m3BymRp2fRQuVXM3YQXXJnm1hHtNR7ttD2mHSN/mh0TLevWcnlGVJ4xw3UPeYrdUgtmXcPmnt2DKl3WxRuWW3QrqhPu3PBuKSxs0ja0HIuMmaNrKY4wpmW854VKVbAf52dMduiH9jCtPU7jtwT2/72f/8MZSdfhFTdHfrqOtbs7T8b55BHexKg367WIf46M/Rki5MzHs9cPBeKMuVUMcXhVgHch7W3aVlo48u+Vj3RkewfV2eRx1bwdLHjdB5nYRV3htLHA/WCdptoU9vSLv6pO+/esVo+3ZOkqXfvnsgbhXwmW2j6rk6+3ZvuxBg/1QskV1iB626hiwNoj9Sg7I57Eadkb13QPzoRz4M8eyC0acfOYL2USHpam/Oemnwe0izxPpVcezIGYjdZg1/YOn9i7RGpdlCG1CSqjuOpYn3sjiG5WmdyeoKtsHFNVoDZPmEVragbr20+06Ii11sk5nQ1Dc3i++4Q23/9atot/nNF/4Z4u+ef6K3vURrmrJkEdbo4gMZKaIN48ioZbWY4DUvN5Yhbvg4hmWsdr+1ht8hy+wbuIFwCxmQaD6Vm1kx0GfX1peSmvTCVt+W0NqNnTvxW+TLX/kKxPPzX+ttnzn2BpQdfvoZiB9++KGB12X3Kfyt4aekwnZZ422vSXFT+ip+PpDNm8tkOSeEEEIIIcRtRx/NQgghhBBCpKCPZiGEEEIIIVJYt6b53CXU/+6eRP/I8S0mBadHuQrZDzibIX2LJY10M6gUimLUN3U91FFebeOxzy0avXS9RfqVDKYVzQyjPmpuweiW223UXeeK+KhePYbpmIcqqFsbHzWpjfOkHyySS+HoEOmjKV1wbHlGZxJ8thEdm3XLtr90q43P0vXX/fpvmSzrIBN8zxnr/29Fyn8eu3i
dSYDHilxzLJ90rxXSE8ak8Qq6KBSd2mc8kB8bRe/Rqf37IX7tyachPv7i0d726TPHoay5UIe4OoTHHt2JftIX3jYa2yOHX4ay0zOYcntsDNNoz53BVKmRtabgAx/7TSh74NfRH7qb4LPeO20034Ut2OY7SZoX6cYS96XGNjH3QbZ3qeM4jh/itbfaRlf56suYJvtd70LtXr2Ies78ZUyJ/ti40RovuHiNl+bRV/de8lrOuUbvednH/unA/ahh/t0vfRHimVk89re/9Xe9bfZlDoKb1aMP8lPGfjZNe77+495eJsZx/cLsW+h53LV0uRkao7oJ9hP5ANuCnfbepTEqW0INfK2A41Cb0tE3LS/+lVVKwd0ljXwZ+7dO0ZRfamE/8czbqEn95qvfg/iVVVyz0c6b8aIaUT0l1/rCJN7TyFZ81oFvrjPqYNvsNnG8K1WxDyqVrHZAuRe60eblGmANczww9TXeY//6IY7N7/stm+k7htYfuH2ux671W0pJTn32vffdD/FX/+Srve3vfe/7UMba/G4X31tAa49sXTJfR8J9hsvrlAhr/35dMv50kMdz35GVRlsIIYQQQojbjz6ahRBCCCGESEEfzUIIIYQQQqSwblHragM1J6dnFiEu5833d62CHo3D5K1YyOO3ejEwOr6VDp6nsYya3dYSek0udPHYiWduqVjA62DPwqjNXoFG01Yto9fm1hHUnIYN1AdH5KFa8Mx1tUmXtkr/VSmS/rJMXtRJbJ6By1pE8ptdI5/mTtfcc0g6zmx+8/7P5Maou/JIE2drmhMWdfVpmOi9+ebYtq71er9lrWJE/28sWBaiNQff+T0PHYJ4ehdqTI89bzTNx8l3ubuEPq8d0mkXy6h7nJ4y13Xq2Ak81hW8x4UV1Ga25lmraerP0edRi784glrFu9/1bognfdMO6m2q5N7m+jTb7cBx+v1JY0snn5CGOSINfUIKxWzG6PMWL89B2VP/8TbE0zvR69utjkFcK5nn0lhAnfGp596E+M5hrGNDE+bYkzumoGxs5wTEP/gB+uoePYwae7u78wuoR4z62sn/Tm/l20mYIe1wjGOJb2k2I9Jgtru4b62C7XW5bsaObhu1nq5bgdgPMC7i8ganuMVsJyH258sNXFv00rWfQ/zjOdMHPXfpGJS9RXVxrnsZYi/AOpGPzBiWK+FYOjqOdbM8gd7mhWIN4tCqjNwHT0/thbjVQu9lLzJtuRhsgbIIu4QNJSENc8xe8ABpeFP9gd0bbF/vyCn6aPvY/YJfCPkeDh0yY9z996PemfsIP21N1EB/5ME67YGHZVPnAed1HFw10S8Xv/l+T39pFkIIIYQQIgV9NAshhBBCCJHCuuUZGR+nbkL6c3orMFYwKz5O5UQR2sQkKzh9lWmYmFMme+Smkng0H5NB6UNi2R11aNqa5Rkhpfcu58x0FNvkNZZwymhyDKe1m2s4jbaybH4/VMW0od0IJwmu0tR9WELJSdGSNlToPVSrOE1YLKCspGXJSNo0vV4Ywn03FLIHcynGqRx8PgnZ5PD0lF0TI5qv61LM6ZUTssfyrGNXCzhvWs7hexwZRau3sWkzRTm5dweUvXb4OYgvnXwL4hefRdmEZ11X1MG6VZ/HtLN9UKt2q2bauRFh2+vUcYo6XMA2s1wyU8lFatfFzXN7ui48s2anbe2TH/DE3IC0rTx9PD+P09iXL89CvHvvXRCvVky9aJAlpF+ehLhyJ06Bli3rzqMvY51464coE+mQNeXoCNp+FSvGFo1t4NKmS/8vkgtIikWpoVttU2daDXxvi2207HNL+Oy316y+oIFjUkjt16VxqJvBY822F3rbx66cgrKfXHwK4icvYXr0s20jLermyPbMxwbLqeTdBB9IuWTGuOFp7M8K1RrE7MeWiXEcCrNG3pLJYweVj8k2r0nSqthcZ0yyyYy/eZ0Qjx391ormOj2Px7DBMZT1pZhmGVqKPdvAssEJvO1+s1jE8c5jCYXL90/
Xldz4yjayvxn0bGPu9wZKbK6P/tIshBBCCCFECvpoFkIIIYQQIgV9NAshhBBCCJHCujXNfoBakKFhtH7JWalCvQB1R21KVRyihMtxLUuspIGFrOfJki4tk+H0zFZM2rEcXRfJjhzfstcKfLTTCsgWrtVAHXJAabZLZaOH5fSbq03ct03/dfEC0nRZGsvVedTLLa+h9m5sGNO7li1dUonS/2ay+Dw2knIJn89qwpZz5h59l3VW+B5Z02XbzIVkqRP5GLOEyX7njuM4niWiJ+m5E5KvYJLgs99SNprfbb/1ISg79L73QvzWa28MjC9cMKmwV+uoM/YofalHGtyhbWiBtm2/SQ2+e9cBKJsgq6hKFfWFRUv4WQ5QP5gLNlfUPEirlkafxVOfa5P5B04xXSxS+tg22o/NnEUbuYvB2d72RAUt5Xbv2AfxyYuY8vysZUnXWkXtejbA9rulipp77ndD63mlmQPejOb5Zt/DoN9vppb6pSNnIO7QOhU3Mnpyv4b6/X/xUEv85hwea9wx/a5L62FWumjPerWO6a3nVlAzf2bZlJ9bu4K/jfCaOYt0Lm/Ww2TpOrgf7dKYliuhFV5lzFgc5gJsA3GXtPqky45pzC8Om3OFCbaf+gLe/9gotpls3lxXSOmp/RwJ0zeQm1kHwP1Naj23yvvOQ++JLdf6XOWsc6f+li7D1i33XXPfedxBxXDutPvvK72JfiHtvXiDnofSaAshhBBCCHH70UezEEIIIYQQKeijWQghhBBCiBTc5FaEgkIIIYQQQvw/QH9pFkIIIYQQIgV9NAshhBBCCJGCPpqFEEIIIYRIQR/NQgghhBBCpKCPZiGEEEIIIVLQR7MQQgghhBAp6KNZCCGEEEKIFPTRLIQQQgghRAr6aBZCCCGEECKF/waIIWclnvT9dAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "test_ood_features_scores = ood.score(features=test_feature_embeddings)\n", + "\n", + "top_ood_features_idxs = find_top_issues(test_ood_features_scores, top=15)\n", + "visualize_outliers(top_ood_features_idxs, test_data)" + ] + }, + { + "cell_type": "markdown", + "id": "2c645c58", + "metadata": {}, + "source": [ + "Many outliers identified in `test_data` depict (non-animal) classes not present in the training set. These non-animal images have very different feature embeddings from the animal-only images in the training data." + ] + }, + { + "cell_type": "markdown", + "id": "0b5de6f6", + "metadata": {}, + "source": [ + "### Deciding which test examples are outliers\n", + "\n", + "Given outlier scores, how do we determine how many of the top-ranked examples in `test_data` should be marked as outliers? \n", + "\n", + "This inevitably involves a trade-off between true positives and false positives, so let's suppose we want at most about 5% false positives. We can use the 5th percentile of the distribution of `train_ood_features_scores` (assuming the training data are in-distribution examples without outliers) as a hard score threshold below which to consider a test example an outlier.\n", + "\n", + "Let's plot the 5th percentile of the training outlier score distribution (shown as a red line)." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "e9dff81b", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:25.245309Z", + "iopub.status.busy": "2024-06-25T23:04:25.245007Z", + "iopub.status.idle": "2024-06-25T23:04:25.583556Z", + "shell.execute_reply": "2024-06-25T23:04:25.583085Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "fifth_percentile = np.percentile(train_ood_features_scores, 5) # 5th percentile of the train_data distribution\n", + "\n", + "# Plot outlier_score distributions and the 5th percentile cutoff\n", + "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))\n", + "plt_range = [min(train_ood_features_scores.min(),test_ood_features_scores.min()), \\\n", + " max(train_ood_features_scores.max(),test_ood_features_scores.max())]\n", + "axes[0].hist(train_ood_features_scores, range=plt_range, bins=50)\n", + "axes[0].set(title='train_outlier_scores distribution', ylabel='Frequency')\n", + "axes[0].axvline(x=fifth_percentile, color='red', linewidth=2)\n", + "axes[1].hist(test_ood_features_scores, range=plt_range, bins=50)\n", + "axes[1].set(title='test_outlier_scores distribution', ylabel='Frequency')\n", + "axes[1].axvline(x=fifth_percentile, color='red', linewidth=2)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "74c39ab1", + "metadata": {}, + "source": [ + "All test examples whose `test_ood_features_scores` fall to the left of the red line will be marked as outliers.\n", + "\n", + "Let's plot the least-certain outliers in our `test_data` (i.e., 15 images with outlier scores right along the threshold). These are the images immediately to the left of the cutoff threshold (red line). Most of them are still truly out-of-distribution non-animal images, but a few atypical-looking animals are now erroneously flagged as outliers."
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "616769f8", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:25.585904Z", + "iopub.status.busy": "2024-06-25T23:04:25.585527Z", + "iopub.status.idle": "2024-06-25T23:04:25.828572Z", + "shell.execute_reply": "2024-06-25T23:04:25.827942Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "sorted_idxs = test_ood_features_scores.argsort()\n", + "ood_features_scores = test_ood_features_scores[sorted_idxs]\n", + "ood_features_indices = sorted_idxs[ood_features_scores < fifth_percentile] # Images in test data flagged as outliers\n", + "\n", + "visualize_outliers(ood_features_indices[::-1], test_data)" + ] + }, + { + "cell_type": "markdown", + "id": "cb4c0a06", + "metadata": {}, + "source": [ + "### How does cleanlab detect outliers from feature values?\n", + "\n", + "Outlier scores are defined relative to the average distance (computed over feature values) between each example and its K nearest neighbors in the training data. Such scores have been found to be particularly effective for out-of-distribution detection; see this paper for more details:\n", + "\n", + "[Back to the Basics: Revisiting Out-of-Distribution Detection Baselines](https://arxiv.org/abs/2207.03061)\n", + "\n", + "\n", + "Internally, cleanlab uses the `sklearn.neighbors.NearestNeighbors` class (with *cosine* distance) to find the K nearest neighbors, but you can easily use [another KNN estimator](https://github.com/cleanlab/examples/blob/master/outlier_detection_cifar10/outlier_detection_cifar10.ipynb) with cleanlab's `OutOfDistribution` class." + ] + }, + { + "cell_type": "markdown", + "id": "937c7e97", + "metadata": {}, + "source": [ + "## 4. Use cleanlab and `pred_probs` to find outliers in the data\n", + "\n", + "We sometimes wish to find outliers in classification datasets for which we do not have meaningful numeric feature representations. In this case, cleanlab can detect unusual examples in the data using only the predicted probabilities from a trained classifier.\n", + "\n", + "To get `pred_probs` here, a Logistic Regression classifier is fit on the already generated `train_feature_embeddings` (from our pretrained timm network) and the given label for each training image. 
We use a simple classifier here to quickly generate `pred_probs`, but in practice [fine-tuning the entire neural network for classification](https://github.com/cleanlab/examples/blob/master/outlier_detection_cifar10/outlier_detection_cifar10.ipynb) will be more effective (our approach here is equivalent to training only an extra output layer appended on top of the pretrained network)." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "40fed4ef", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:25.831324Z", + "iopub.status.busy": "2024-06-25T23:04:25.830758Z", + "iopub.status.idle": "2024-06-25T23:04:25.913339Z", + "shell.execute_reply": "2024-06-25T23:04:25.912839Z" + } + }, + "outputs": [], + "source": [ + "# Preprocess data\n", + "train_labels = np.array(train_data.dataset.targets)[train_data.indices]\n", + "train_labels = np.unique(train_labels, return_inverse=True)[1] # MAKE SURE to zero-index training labels for sklearn\n", + "test_labels = np.array(test_data.dataset.targets)[test_data.indices]\n", + "\n", + "scaler = preprocessing.StandardScaler().fit(train_feature_embeddings)\n", + "train_feature_embeddings_scaled = scaler.transform(train_feature_embeddings)\n", + "test_feature_embeddings_scaled = scaler.transform(test_feature_embeddings)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "89f9db72", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:25.915598Z", + "iopub.status.busy": "2024-06-25T23:04:25.915418Z", + "iopub.status.idle": "2024-06-25T23:04:35.984857Z", + "shell.execute_reply": "2024-06-25T23:04:35.984226Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model accuracy on train_data (in-sample): 0.9702\n" + ] + } + ], + "source": [ + "# Our classifier employs bagging to better account for epistemic uncertainty\n", + "model = BaggingClassifier(LogisticRegression(max_iter=500), random_state=1, n_jobs=-1)\n", + 
"model.fit(train_feature_embeddings_scaled, train_labels)\n", + "\n", + "train_pred_probs = model.predict_proba(train_feature_embeddings_scaled)\n", + "train_pred_labels = train_pred_probs.argmax(1)\n", + "accuracy = np.mean(train_pred_labels == train_labels)\n", + "print(f\"Model accuracy on train_data (in-sample): {accuracy}\")" + ] + }, + { + "cell_type": "markdown", + "id": "03e3f7b7", + "metadata": {}, + "source": [ + "We can again use cleanlab's `OutOfDistribution` class, this time with these `pred_probs`, to compute out-of-distribution scores for each image in our dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "874c885a", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:35.987072Z", + "iopub.status.busy": "2024-06-25T23:04:35.986845Z", + "iopub.status.idle": "2024-06-25T23:04:38.178455Z", + "shell.execute_reply": "2024-06-25T23:04:38.177824Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Fitting OOD estimator based on provided pred_probs ...\n" + ] + } + ], + "source": [ + "ood = OutOfDistribution()\n", + "train_ood_predictions_scores = ood.fit_score(pred_probs=train_pred_probs, labels=train_labels)" + ] + }, + { + "cell_type": "markdown", + "id": "dcff8e5a", + "metadata": {}, + "source": [ + "We can repeat this for additional test data to identify test images that do not stem from the training data distribution."
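, + "\n", + "\n", + "For intuition about how predicted probabilities alone can flag unusual inputs, the sketch below computes a generic confidence score: one minus the normalized entropy of each row of `pred_probs`. It is a simplified stand-in and not necessarily the formula `OutOfDistribution` uses (its `fit` step, for example, also receives `labels`, which this sketch ignores); `entropy_ood_score` is a hypothetical helper.\n", + "\n", + "
```python
import numpy as np


def entropy_ood_score(pred_probs):
    # Hypothetical helper: 1 - (entropy / maximum possible entropy).
    # Confident predictions score near 1; near-uniform predictions score near 0.
    pred_probs = np.clip(np.asarray(pred_probs, dtype=float), 1e-12, 1.0)
    num_classes = pred_probs.shape[1]
    entropy = -np.sum(pred_probs * np.log(pred_probs), axis=1)
    return 1.0 - entropy / np.log(num_classes)


pred_probs = np.array([
    [0.98, 0.01, 0.01],  # confident: looks in-distribution
    [0.34, 0.33, 0.33],  # near-uniform: looks out-of-distribution
    [0.70, 0.20, 0.10],  # somewhere in between
])
scores = entropy_ood_score(pred_probs)
print(np.round(scores, 3))
```
", + "\n", + "\n", + "Near-uniform rows score close to 0 and therefore rank as the most outlier-like, while confident rows score close to 1."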
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "e110fc4b", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:38.181124Z", + "iopub.status.busy": "2024-06-25T23:04:38.180742Z", + "iopub.status.idle": "2024-06-25T23:04:38.381286Z", + "shell.execute_reply": "2024-06-25T23:04:38.380674Z" + } + }, + "outputs": [], + "source": [ + "test_pred_probs = model.predict_proba(test_feature_embeddings_scaled)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "85b60cbf", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:38.383643Z", + "iopub.status.busy": "2024-06-25T23:04:38.383433Z", + "iopub.status.idle": "2024-06-25T23:04:38.386599Z", + "shell.execute_reply": "2024-06-25T23:04:38.386060Z" + } + }, + "outputs": [], + "source": [ + "test_ood_predictions_scores = ood.score(pred_probs=test_pred_probs)" + ] + }, + { + "cell_type": "markdown", + "id": "702aa162", + "metadata": {}, + "source": [ + "Detecting outliers based on feature embeddings works for arbitrary unlabeled datasets, but requires a meaningful numerical representation of the data. Detecting outliers based on predicted probabilities applies mainly to labeled classification datasets, but works with any effective classifier. The effectiveness of the latter approach depends on three factors: how much auxiliary information captured in the feature values is lost in the predicted probabilities (determined by the particular set of labels in the classification task), the accuracy of the classifier, and how well its predictions reflect epistemic uncertainty. Read more in [this article](https://pub.towardsai.net/a-simple-adjustment-improves-out-of-distribution-detection-for-any-classifier-5e96bbb2d627)."
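, + "\n", + "\n", + "To make the feature-based approach concrete, the sketch below follows the K-nearest-neighbor idea described earlier: it averages the cosine distance from each query point to its K nearest training points using scikit-learn's `NearestNeighbors`. The final `exp(-distance)` step is an illustrative assumption that maps distances into (0, 1] so that lower means more outlier-like; cleanlab's exact transformation may differ. The helper `knn_outlier_scores` and the synthetic embeddings are hypothetical.\n", + "\n", + "
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_outlier_scores(train_features, query_features, k=10):
    # Fit a cosine-distance KNN index on the training features only.
    knn = NearestNeighbors(n_neighbors=k, metric='cosine').fit(train_features)
    # distances has shape (n_queries, k): distance to each of the k nearest neighbors.
    distances, _ = knn.kneighbors(query_features)
    avg_distance = distances.mean(axis=1)
    # Illustrative (assumed) transform: larger average distance gives a lower score.
    return np.exp(-avg_distance)


rng = np.random.default_rng(0)
direction = np.ones(16)
# Synthetic training embeddings clustered around one direction.
train_features = direction + 0.1 * rng.normal(size=(200, 16))
# Test set: 5 points like the training data, then 5 pointing the opposite way.
test_features = np.vstack([
    direction + 0.1 * rng.normal(size=(5, 16)),
    -direction + 0.1 * rng.normal(size=(5, 16)),
])

scores = knn_outlier_scores(train_features, test_features, k=10)
print(np.round(scores, 3))
```
", + "\n", + "\n", + "If the training points themselves were scored this way, each would find itself as its own nearest neighbor at distance 0, so in that case one would typically query K + 1 neighbors and discard the first."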
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "17f96fa6", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:38.388700Z", + "iopub.status.busy": "2024-06-25T23:04:38.388398Z", + "iopub.status.idle": "2024-06-25T23:04:38.396902Z", + "shell.execute_reply": "2024-06-25T23:04:38.396445Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai; if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "# Verify that the top-ranked test outliers are mostly non-animal images\n", + "top_ood_features_subset = torch.utils.data.Subset(test_data, top_ood_features_idxs)\n", + "num_animals = len([i for i in range(len(top_ood_features_subset)) if top_ood_features_subset[i][1] in animal_classes])\n", + "non_animal_frac = 1 - (num_animals / len(top_ood_features_subset))\n", + "if non_animal_frac < 0.81:\n", + " raise Exception(f\"Not enough non-animal images amongst top-ranked outliers in test_data, only: {non_animal_frac}\")\n", + "\n", + "top_ood_predictions_idxs = (test_ood_predictions_scores).argsort()[:15]\n", + "top_ood_predictions_subset = torch.utils.data.Subset(test_data, top_ood_predictions_idxs)\n", + "num_animals = len([i for i in range(len(top_ood_predictions_subset)) if top_ood_predictions_subset[i][1] in animal_classes])\n", + "non_animal_frac = 1 - (num_animals / len(top_ood_predictions_subset))\n", + "if non_animal_frac < 0.50:\n", + " raise Exception(f\"Not enough non-animal images amongst top-ranked ood datapoints in test_data, only: {non_animal_frac}\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + }, + 
"widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "24d684323edc4a1983e3d36f626813ea": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_2c6649d08a3741ccaa3875fa9897b05b", + "max": 102469840.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_468e0bb8a3584bf48d9a209cfe1ba29f", + "tabbable": null, + "tooltip": null, + "value": 102469840.0 + } + }, + "2c6649d08a3741ccaa3875fa9897b05b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": 
null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3b464bd557e54de294a2d7b552708128": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "468e0bb8a3584bf48d9a209cfe1ba29f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "6f2a847b203b42d2b622adcc0398f41b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_9875ef0b112f4f15b37ab2b2a39546ae", + "placeholder": "​", + "style": "IPY_MODEL_e6c6988d2500492887ac8603988aa804", + "tabbable": null, + "tooltip": null, + "value": "model.safetensors: 100%" + } + }, + "82a8864b3ecf45ee8c0f45ae76d8db01": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": 
"HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c84e2ced9a1148de83443d39d384c650", + "placeholder": "​", + "style": "IPY_MODEL_3b464bd557e54de294a2d7b552708128", + "tabbable": null, + "tooltip": null, + "value": " 102M/102M [00:00<00:00, 131MB/s]" + } + }, + "9875ef0b112f4f15b37ab2b2a39546ae": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "aa545006254f4fc6b53bddaa1b160a6e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": 
"HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6f2a847b203b42d2b622adcc0398f41b", + "IPY_MODEL_24d684323edc4a1983e3d36f626813ea", + "IPY_MODEL_82a8864b3ecf45ee8c0f45ae76d8db01" + ], + "layout": "IPY_MODEL_d821ef5a1f354429939318d2e7725cbb", + "tabbable": null, + "tooltip": null + } + }, + "c84e2ced9a1148de83443d39d384c650": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d821ef5a1f354429939318d2e7725cbb": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": 
"LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e6c6988d2500492887ac8603988aa804": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/regression.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/regression.ipynb new file mode 100644 index 000000000..ad30b1152 --- /dev/null +++ 
b/v2.6.6/.doctrees/nbsphinx/tutorials/regression.ipynb @@ -0,0 +1,1444 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "ea0a577e", + "metadata": {}, + "source": [ + "# Find Noisy Labels in Regression Datasets" + ] + }, + { + "cell_type": "markdown", + "id": "e15b9f2f", + "metadata": {}, + "source": [ + "This 5-minute quickstart tutorial uses cleanlab to find potentially incorrect numeric values in a dataset column by means of a regression model. Unlike classification models, regression models predict numeric quantities such as price, income, or age. Response values in regression datasets may be corrupted by data entry or measurement errors, noise from sensors or other processes, or broken data pipelines. To find corrupted values in a numeric column, we treat it as the target value (i.e., the label) to be predicted by a regression model, and then use cleanlab to decide when a model prediction that deviates from the observed label is trustworthy enough to suggest that the label itself is wrong.\n", + "\n", + "In this tutorial, we consider a student grades dataset, which records three exam grades and some optional notes for over 900 students, each of whom has been assigned a final score. 
Combined with any regression model of your choosing, cleanlab automatically identifies examples in this dataset whose final scores are likely incorrect.\n", + "\n", + "**Overview of what we’ll do in this tutorial:**\n", + "\n", + "- Fit a simple Gradient Boosting model (any other model could be used) on the exam scores and notes (covariates) in order to compute out-of-sample predictions of the final grade (the response variable in our regression).\n", + "- Use cleanlab's `CleanLearning.find_label_issues()` method to identify potentially incorrect final grade values based on outputs from this regression model.\n", + "- Use `CleanLearning` to train a more robust version of the same model after dropping the identified label errors.\n", + "- Run an alternative workflow to detect errors via cleanlab's `Datalab` audit, which can simultaneously estimate **many other types of data issues**." + ] + }, + { + "cell_type": "markdown", + "id": "612a355a", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "Already have an sklearn-compatible regression `model`, features/covariates `X`, and a label/target variable `y`? Run the code below to train your `model` and identify potentially incorrect `y` values in your dataset.\n", + "\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.regression.learn import CleanLearning\n", + "\n", + "cl = CleanLearning(model)\n", + "cl.fit(X, y)\n", + "label_issues = cl.get_label_issues()\n", + "preds = cl.predict(X_test) # predictions from a version of your model trained on auto-cleaned data\n", + "```\n", + " \n", + "
\n", + " \n", + "Is your model/data not compatible with `CleanLearning`? You can instead run cross-validation on your model to get out-of-sample `predictions`. With that, run the code below to find data and label issues in your regression dataset:\n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab import Datalab\n", + "\n", + "# Assuming your dataset has a label column named 'label'\n", + "lab = Datalab(dataset, label_name='label', task='regression')\n", + "# To detect more data issue types, optionally supply `features` (numeric dataset values or model embeddings of the data)\n", + "lab.find_issues(pred_probs=predictions, features=features)\n", + "\n", + "lab.report()\n", + " \n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "f9a290d6", + "metadata": {}, + "source": [ + "## 1. Install required dependencies" + ] + }, + { + "cell_type": "markdown", + "id": "8430ca39", + "metadata": {}, + "source": [ + "You can use `pip` to install all packages required for this tutorial as follows:\n", + "\n", + "```ipython3\n", + "!pip install matplotlib\n", + "!pip install cleanlab[datalab]\n", + "# Make sure to install the version corresponding to this tutorial\n", + "# E.g. if viewing master branch documentation:\n", + "# !pip install git+https://github.com/cleanlab/cleanlab.git\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "2e1af7d8", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:42.661691Z", + "iopub.status.busy": "2024-06-25T23:04:42.661298Z", + "iopub.status.idle": "2024-06-25T23:04:43.832366Z", + "shell.execute_reply": "2024-06-25T23:04:43.831756Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "\n", + "dependencies = [\"cleanlab\", \"matplotlib>=3.6.0\", \"datasets\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = \" \".join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before 
running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4fb10b8f", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:43.835062Z", + "iopub.status.busy": "2024-06-25T23:04:43.834658Z", + "iopub.status.idle": "2024-06-25T23:04:43.851743Z", + "shell.execute_reply": "2024-06-25T23:04:43.851297Z" + } + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from sklearn.ensemble import HistGradientBoostingRegressor\n", + "from sklearn.model_selection import cross_val_predict\n", + "from sklearn.metrics import r2_score\n", + "\n", + "from cleanlab.regression.learn import CleanLearning" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "284dc264", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:43.853855Z", + "iopub.status.busy": "2024-06-25T23:04:43.853470Z", + "iopub.status.idle": "2024-06-25T23:04:43.856588Z", + "shell.execute_reply": "2024-06-25T23:04:43.856130Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This cell is hidden from docs.cleanlab.ai \n", + "\n", + "import random \n", + "import numpy as np \n", + "\n", + "SEED = 111 # for reproducibility \n", + "\n", + "np.random.seed(SEED)\n", + "random.seed(SEED)" + ] + }, + { + "cell_type": "markdown", + "id": "2035042e", + "metadata": {}, + "source": [ + "## 2. Load and process the data" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "0f7450db", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:43.858545Z", + "iopub.status.busy": "2024-06-25T23:04:43.858248Z", + "iopub.status.idle": "2024-06-25T23:04:43.922391Z", + "shell.execute_reply": "2024-06-25T23:04:43.921830Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
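\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>exam_1</th>\n", + " <th>exam_2</th>\n", + " <th>exam_3</th>\n", + " <th>notes</th>\n", + " <th>final_score</th>\n", + " <th>true_final_score</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n",
+ " <tr>\n", + " <th>0</th>\n", + " <td>72</td>\n", + " <td>81</td>\n", + " <td>80</td>\n", + " <td>NaN</td>\n", + " <td>73.3</td>\n", + " <td>73.3</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>1</th>\n", + " <td>89</td>\n", + " <td>62</td>\n", + " <td>93</td>\n", + " <td>NaN</td>\n", + " <td>83.8</td>\n", + " <td>83.8</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>2</th>\n", + " <td>97</td>\n", + " <td>0</td>\n", + " <td>94</td>\n", + " <td>NaN</td>\n", + " <td>73.5</td>\n", + " <td>73.5</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>3</th>\n", + " <td>80</td>\n", + " <td>76</td>\n", + " <td>96</td>\n", + " <td>missed class frequently -10</td>\n", + " <td>78.6</td>\n", + " <td>78.6</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>4</th>\n", + " <td>67</td>\n", + " <td>87</td>\n", + " <td>95</td>\n", + " <td>missed homework frequently -10</td>\n", + " <td>74.1</td>\n", + " <td>74.1</td>\n", + " </tr>\n",
+ " </tbody>\n", + "</table>\n", + "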
" + ], + "text/plain": [ + " exam_1 exam_2 exam_3 notes final_score \\\n", + "0 72 81 80 NaN 73.3 \n", + "1 89 62 93 NaN 83.8 \n", + "2 97 0 94 NaN 73.5 \n", + "3 80 76 96 missed class frequently -10 78.6 \n", + "4 67 87 95 missed homework frequently -10 74.1 \n", + "\n", + " true_final_score \n", + "0 73.3 \n", + "1 83.8 \n", + "2 73.5 \n", + "3 78.6 \n", + "4 74.1 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_data = pd.read_csv(\"https://s.cleanlab.ai/student_grades_r/train.csv\")\n", + "test_data = pd.read_csv(\"https://s.cleanlab.ai/student_grades_r/test.csv\")\n", + "train_data.head()" + ] + }, + { + "cell_type": "markdown", + "id": "aa0165ef", + "metadata": {}, + "source": [ + "In the DataFrame above, `final_score` represents the noisy scores and `true_final_score` represents the ground truth. Note that ground truth is usually not available in real-world datasets, and is just added in this tutorial dataset for demonstration purposes." + ] + }, + { + "cell_type": "markdown", + "id": "82285102", + "metadata": {}, + "source": [ + "We show a 3D scatter plot of the exam grades, with the color hue corresponding to the final score for each student. Incorrect datapoints are marked with an **X**." + ] + }, + { + "cell_type": "markdown", + "id": "c8173840", + "metadata": {}, + "source": [ + "
See the code to visualize the data. **(click to expand)**\n", + " \n", + "```ipython3\n", + "# Note: This pulldown content is for docs.cleanlab.ai; if running on local Jupyter or Colab, please ignore it.\n", + " \n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "\n", + "def plot_data(train_data, errors_idx):\n", + " fig = plt.figure()\n", + " ax = fig.add_subplot(111, projection='3d')\n", + "\n", + " x, y, z = train_data[\"exam_1\"], train_data[\"exam_2\"], train_data[\"exam_3\"]\n", + " labels = train_data[\"final_score\"]\n", + "\n", + " img = ax.scatter(x, y, z, c=labels, cmap=\"jet\")\n", + " fig.colorbar(img)\n", + "\n", + " ax.plot(\n", + " x.iloc[errors_idx],\n", + " y.iloc[errors_idx],\n", + " z.iloc[errors_idx],\n", + " \"x\",\n", + " markeredgecolor=\"black\",\n", + " markersize=10,\n", + " markeredgewidth=2.5,\n", + " alpha=0.8,\n", + " label=\"Label Errors\"\n", + " )\n", + " ax.legend()\n", + "```\n", + " \n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "55513fed", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:43.924711Z", + "iopub.status.busy": "2024-06-25T23:04:43.924376Z", + "iopub.status.idle": "2024-06-25T23:04:44.107678Z", + "shell.execute_reply": "2024-06-25T23:04:44.107082Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "\n", + "def plot_data(train_data, errors_idx):\n", + " fig = plt.figure()\n", + " ax = fig.add_subplot(111, projection='3d')\n", + "\n", + " x, y, z = train_data[\"exam_1\"], train_data[\"exam_2\"], train_data[\"exam_3\"]\n", + " labels = train_data[\"final_score\"]\n", + "\n", + " img = ax.scatter(x, y, z, c=labels, cmap=\"jet\")\n", + " fig.colorbar(img)\n", + "\n", + " ax.plot(\n", + " x.iloc[errors_idx],\n", + " y.iloc[errors_idx],\n", + " z.iloc[errors_idx],\n", + " \"x\",\n", + " markeredgecolor=\"black\",\n", + " markersize=10,\n", + " markeredgewidth=2.5,\n", + " alpha=0.8,\n", + " label=\"Label Errors\"\n", + " )\n", + " ax.legend()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "df5a0f59", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:44.110202Z", + "iopub.status.busy": "2024-06-25T23:04:44.109907Z", + "iopub.status.idle": "2024-06-25T23:04:44.348653Z", + "shell.execute_reply": "2024-06-25T23:04:44.348060Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "errors_mask = train_data[\"final_score\"] != train_data[\"true_final_score\"]\n", + "errors_idx = np.where(errors_mask == 1)\n", + "\n", + "plot_data(train_data, errors_idx)" + ] + }, + { + "cell_type": "markdown", + "id": "add939ae", + "metadata": {}, + "source": [ + "Next we preprocess the data by applying one-hot encoding to features with categorical data (this is optional if your regression model can work directly with categorical features)." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "7af78a8a", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:44.351210Z", + "iopub.status.busy": "2024-06-25T23:04:44.350779Z", + "iopub.status.idle": "2024-06-25T23:04:44.355202Z", + "shell.execute_reply": "2024-06-25T23:04:44.354700Z" + } + }, + "outputs": [], + "source": [ + "feature_columns = [\"exam_1\", \"exam_2\", \"exam_3\", \"notes\"]\n", + "predicted_column = \"final_score\"\n", + "\n", + "X_train_raw, y_train = train_data[feature_columns], train_data[predicted_column]\n", + "X_test_raw, y_test = test_data[feature_columns], test_data[predicted_column]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "9556c624", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:44.357328Z", + "iopub.status.busy": "2024-06-25T23:04:44.356872Z", + "iopub.status.idle": "2024-06-25T23:04:44.364060Z", + "shell.execute_reply": "2024-06-25T23:04:44.363629Z" + } + }, + "outputs": [], + "source": [ + "categorical_features = [\"notes\"]\n", + "X_train = pd.get_dummies(X_train_raw, columns=categorical_features)\n", + "X_test = pd.get_dummies(X_test_raw, columns=categorical_features)" + ] + }, + { + "cell_type": "markdown", + "id": "1ce924cf", + "metadata": {}, + "source": [ + "
\n", + "Bringing Your Own Data (BYOD)?\n", + "\n", + "Assign your data's features to variable `X` and the target values to variable `y` instead, then continue with the rest of the tutorial.\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "4b14309d", + "metadata": {}, + "source": [ + "## 3. Define a regression model and use cleanlab to find potential label errors" + ] + }, + { + "cell_type": "markdown", + "id": "81ee2349", + "metadata": {}, + "source": [ + "We'll first demonstrate regression with noisy labels via the `CleanLearning` class that can wrap any scikit-learn compatible regression model you have. `CleanLearning` uses your model to estimate label issues (i.e. noisy `y`-values) and train a more robust version of the same model when the original data contains noisy labels.\n", + "\n", + "Here we define a `CleanLearning` object with a histogram-based gradient boosting model (sklearn version of XGBoost) and use the `find_label_issues` method to find potential errors in our dataset's numeric label column. Any other sklearn-compatible regression model could be used, such as `LinearRegression` or `RandomForestRegressor` (or you can easily wrap arbitrary custom models to be compatible with the sklearn API)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "3c2f1ccc", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:44.366088Z", + "iopub.status.busy": "2024-06-25T23:04:44.365912Z", + "iopub.status.idle": "2024-06-25T23:04:44.368463Z", + "shell.execute_reply": "2024-06-25T23:04:44.368030Z" + } + }, + "outputs": [], + "source": [ + "model = HistGradientBoostingRegressor()\n", + "cl = CleanLearning(model)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "7e1b7860", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:44.370335Z", + "iopub.status.busy": "2024-06-25T23:04:44.370153Z", + "iopub.status.idle": "2024-06-25T23:04:52.997167Z", + "shell.execute_reply": "2024-06-25T23:04:52.996527Z" + } + }, + "outputs": [], + "source": [ + "label_issues = cl.find_label_issues(X_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "id": "43bd6c7f", + "metadata": {}, + "source": [ + "`CleanLearning` internally fits multiple copies of our regression model via cross-validation and bootstrapping in order to compute predictions and uncertainty estimates for the dataset. These are used to identify label issues (i.e. likely corrupted `y`-values).\n", + "\n", + "`find_label_issues` returns a DataFrame containing a label quality score (between 0 and 1) for each example in your dataset. Lower scores indicate examples that are more likely to have an erroneous `y` value. The DataFrame also contains a boolean column specifying whether each example is identified as having a label issue (i.e. its `y`-value appears potentially corrupted). 
" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "f407bd69", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.000185Z", + "iopub.status.busy": "2024-06-25T23:04:52.999553Z", + "iopub.status.idle": "2024-06-25T23:04:53.006676Z", + "shell.execute_reply": "2024-06-25T23:04:53.006114Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   is_label_issue  label_quality  given_label  predicted_label
0           False       0.385101         73.3        76.499503
1           False       0.698255         83.8        82.776647
2            True       0.109373         73.5        63.170547
3           False       0.481096         78.6        75.984759
4           False       0.645270         74.1        75.795928
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_quality given_label predicted_label\n", + "0 False 0.385101 73.3 76.499503\n", + "1 False 0.698255 83.8 82.776647\n", + "2 True 0.109373 73.5 63.170547\n", + "3 False 0.481096 78.6 75.984759\n", + "4 False 0.645270 74.1 75.795928" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issues.head()" + ] + }, + { + "cell_type": "markdown", + "id": "4ab5acf3", + "metadata": {}, + "source": [ + "We can get the subset of examples flagged with label issues, and also sort by label quality score to find the indices of the 10 most likely mislabeled examples in our regression dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "f7385336", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.009002Z", + "iopub.status.busy": "2024-06-25T23:04:53.008671Z", + "iopub.status.idle": "2024-06-25T23:04:53.012194Z", + "shell.execute_reply": "2024-06-25T23:04:53.011768Z" + } + }, + "outputs": [], + "source": [ + "identified_issues = label_issues[label_issues[\"is_label_issue\"] == True]\n", + "lowest_quality_labels = label_issues[\"label_quality\"].argsort()[:10].to_numpy()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "59fc3091", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.014198Z", + "iopub.status.busy": "2024-06-25T23:04:53.013886Z", + "iopub.status.idle": "2024-06-25T23:04:53.017016Z", + "shell.execute_reply": "2024-06-25T23:04:53.016517Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "cleanlab found 141 potential label errors in the dataset.\n", + "Here are indices of the top 10 most likely errors: \n", + " [659 367 56 318 305 560 657 688 117 160]\n" + ] + } + ], + "source": [ + "print(\n", + " f\"cleanlab found {len(identified_issues)} potential label errors in the dataset.\\n\"\n", + " f\"Here are indices of the 
top 10 most likely errors: \\n {lowest_quality_labels}\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "aa2c1fec", + "metadata": {}, + "source": [ + "Let’s review some of the values most likely to be erroneous. To help us inspect these datapoints, we define a method to print any example from the dataset, together with its given (original) label and the suggested alternative label predicted by your regression model." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "00949977", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.019136Z", + "iopub.status.busy": "2024-06-25T23:04:53.018707Z", + "iopub.status.idle": "2024-06-25T23:04:53.021885Z", + "shell.execute_reply": "2024-06-25T23:04:53.021330Z" + } + }, + "outputs": [], + "source": [ + "def view_datapoint(index):\n", + " given_labels = label_issues[\"given_label\"]\n", + " predicted_labels = label_issues[\"predicted_label\"].round(1)\n", + " return pd.concat(\n", + " [X_train_raw, given_labels, predicted_labels], axis=1\n", + " ).iloc[index]" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "b6c1ae3a", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.023931Z", + "iopub.status.busy": "2024-06-25T23:04:53.023669Z", + "iopub.status.idle": "2024-06-25T23:04:53.032049Z", + "shell.execute_reply": "2024-06-25T23:04:53.031590Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
     exam_1  exam_2  exam_3                        notes  given_label  predicted_label
659      67      93      93                          NaN         17.4             84.1
367      78       0      86                          NaN          0.0             56.7
56       75      83      69                          NaN          8.9             71.7
318      41      88      98  missed class frequently -10          0.0             71.9
305      97       0      90                          NaN         19.1             61.6
\n", + "
" + ], + "text/plain": [ + " exam_1 exam_2 exam_3 notes given_label \\\n", + "659 67 93 93 NaN 17.4 \n", + "367 78 0 86 NaN 0.0 \n", + "56 75 83 69 NaN 8.9 \n", + "318 41 88 98 missed class frequently -10 0.0 \n", + "305 97 0 90 NaN 19.1 \n", + "\n", + " predicted_label \n", + "659 84.1 \n", + "367 56.7 \n", + "56 71.7 \n", + "318 71.9 \n", + "305 61.6 " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "view_datapoint(lowest_quality_labels[:5])" + ] + }, + { + "cell_type": "markdown", + "id": "f2be7a93", + "metadata": {}, + "source": [ + "These are very clear errors that cleanlab has identified in this data! Note that the `given_label` does not correctly reflect the final grade that these students should be getting. \n", + "\n", + "cleanlab has shortlisted the most likely label errors to speed up your data cleaning process. With this list, you can decide whether to fix these label issues or remove erroneous examples from the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "9131d82d", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.034121Z", + "iopub.status.busy": "2024-06-25T23:04:53.033797Z", + "iopub.status.idle": "2024-06-25T23:04:53.036428Z", + "shell.execute_reply": "2024-06-25T23:04:53.035992Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This cell is hidden from docs.cleanlab.ai \n", + "\n", + "label_issues_cl = label_issues.copy()" + ] + }, + { + "cell_type": "markdown", + "id": "e2761486", + "metadata": {}, + "source": [ + "## 4. 
Train a more robust model from noisy labels" + ] + }, + { + "cell_type": "markdown", + "id": "043bfb52", + "metadata": {}, + "source": [ + "Fixing the label issues manually may be time-consuming, but cleanlab can filter these noisy examples and train a model on the remaining clean data for you automatically.\n", + "\n", + "To establish a baseline, let’s first train and evaluate our original Gradient Boosting model." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "31c704e7", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.038403Z", + "iopub.status.busy": "2024-06-25T23:04:53.038069Z", + "iopub.status.idle": "2024-06-25T23:04:53.159037Z", + "shell.execute_reply": "2024-06-25T23:04:53.158418Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "r-squared score of original model: 0.838\n" + ] + } + ], + "source": [ + "baseline_model = HistGradientBoostingRegressor() \n", + "baseline_model.fit(X_train, y_train)\n", + "\n", + "preds_og = baseline_model.predict(X_test)\n", + "r2_og = r2_score(y_test, preds_og)\n", + "print(f\"r-squared score of original model: {r2_og:.3f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "0d01f715", + "metadata": {}, + "source": [ + "Now that we have a baseline, let’s check if using `CleanLearning` improves our test accuracy.\n", + "\n", + "`CleanLearning` provides a wrapper that can be applied to any scikit-learn compatible model. The resulting model object can be used in the same manner, but it will now train more robustly if the data has noisy labels.\n", + "\n", + "We can use the same `CleanLearning` object defined above, and pass the label issues we already computed into `.fit()` via the `label_issues` argument. This accelerates things; if we did not provide the label issues, then they would be re-estimated via cross-validation. 
After the issues are estimated, `CleanLearning` simply removes the examples with label issues and retrains your model on the remaining clean data." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "0bcc43db", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.161459Z", + "iopub.status.busy": "2024-06-25T23:04:53.161110Z", + "iopub.status.idle": "2024-06-25T23:04:53.265546Z", + "shell.execute_reply": "2024-06-25T23:04:53.264942Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "r-squared score of cleanlab's model: 0.926\n" + ] + } + ], + "source": [ + "found_label_issues = cl.get_label_issues()\n", + "cl.fit(X_train, y_train, label_issues=found_label_issues)\n", + "\n", + "preds_cl = cl.predict(X_test)\n", + "r2_cl = r2_score(y_test, preds_cl)\n", + "print(f\"r-squared score of cleanlab's model: {r2_cl:.3f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "3aea51da", + "metadata": {}, + "source": [ + "We can see that the coefficient of determination (r-squared score) of the test set improved as a result of the data cleaning. Note that this will not always be the case, especially when we are evaluating on test data that are themselves noisy. The best practice is to run cleanlab to identify potential label issues and then manually review them, before blindly trusting any evaluation metrics. In particular, the most effort should be made to ensure high-quality test data, which is supposed to reflect the expected performance of our model during deployment." + ] + }, + { + "cell_type": "markdown", + "id": "167fca90", + "metadata": {}, + "source": [ + "## 5. Other ways to find noisy labels in regression datasets" + ] + }, + { + "cell_type": "markdown", + "id": "5b4f8e14", + "metadata": {}, + "source": [ + "The `CleanLearning` workflow above requires a sklearn-compatible model. 
If your model or data format is not compatible with the requirements for using `CleanLearning`, you can instead run [cross-validation on your regression model to get out-of-sample predictions](https://docs.cleanlab.ai/stable/tutorials/pred_probs_cross_val.html), and then use the `Datalab` audit to estimate label quality scores for each example in your dataset.\n", + "\n", + "This approach requires two inputs:\n", + "\n", + "- `labels`: numpy array of given labels in the dataset. \n", + "- `predictions`: numpy array of predictions for each example in the dataset from your favorite model (these should be out-of-sample predictions to get the best results)." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "7021bd68", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.268198Z", + "iopub.status.busy": "2024-06-25T23:04:53.267762Z", + "iopub.status.idle": "2024-06-25T23:04:53.752233Z", + "shell.execute_reply": "2024-06-25T23:04:53.751615Z" + } + }, + "outputs": [], + "source": [ + "# Get out-of-sample predictions using cross-validation:\n", + "model = HistGradientBoostingRegressor()\n", + "predictions = cross_val_predict(estimator=model, X=X_train, y=y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "d49c990b", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.755037Z", + "iopub.status.busy": "2024-06-25T23:04:53.754559Z", + "iopub.status.idle": "2024-06-25T23:04:53.826716Z", + "shell.execute_reply": "2024-06-25T23:04:53.826019Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding label issues ...\n", + "\n", + "Audit complete. 
50 issues found in the dataset.\n" + ] + } + ], + "source": [ + "from cleanlab import Datalab\n", + "\n", + "lab = Datalab(\n", + " data=train_data.drop(columns=[\"true_final_score\"]),\n", + " label_name=\"final_score\",\n", + " task=\"regression\",\n", + ")\n", + "\n", + "lab.find_issues(\n", + " pred_probs=predictions,\n", + " issue_types={\"label\": {}}, # specify we're only interested in label issues here \n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "dbab6fb3", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.829119Z", + "iopub.status.busy": "2024-06-25T23:04:53.828699Z", + "iopub.status.idle": "2024-06-25T23:04:53.837304Z", + "shell.execute_reply": "2024-06-25T23:04:53.836821Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
     is_label_issue   label_score  given_label  predicted_label
318            True  1.968627e-09          0.0        78.228799
659            True  2.646674e-08         17.4        86.402962
56             True  4.323818e-08          8.9        75.952758
160            True  2.422144e-07          0.0        60.456908
367            True  8.465815e-07          0.0        55.753968
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_score given_label predicted_label\n", + "318 True 1.968627e-09 0.0 78.228799\n", + "659 True 2.646674e-08 17.4 86.402962\n", + "56 True 4.323818e-08 8.9 75.952758\n", + "160 True 2.422144e-07 0.0 60.456908\n", + "367 True 8.465815e-07 0.0 55.753968" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issues = lab.get_issues(\"label\")\n", + "\n", + "label_issues.sort_values(\"label_score\").head()" + ] + }, + { + "cell_type": "markdown", + "id": "3a0db9b2", + "metadata": {}, + "source": [ + "As before, these label quality scores are continuous values in the range [0,1], where 1 represents a clean label (the given label appears correct) and 0 represents a dirty label (the given label appears corrupted, i.e. the numeric value may be incorrect). You can sort examples by their label quality scores to inspect the most likely corrupted datapoints.\n", + "\n", + "If possible, we recommend wrapping your regression model with `CleanLearning` (rather than providing its pre-computed predictions) for the most accurate label error detection, since this properly accounts for aleatoric and epistemic uncertainty in the regression model. 
To understand how these approaches work, refer to our paper: **[Detecting Errors in Numerical Data via any Regression Model](https://arxiv.org/abs/2305.16583)**" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "5b39b8b5", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.839543Z", + "iopub.status.busy": "2024-06-25T23:04:53.839102Z", + "iopub.status.idle": "2024-06-25T23:04:53.841883Z", + "shell.execute_reply": "2024-06-25T23:04:53.841412Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# This cell is hidden from docs.cleanlab.ai\n", + "np.random.seed(SEED) # for reproducibility\n", + "random.seed(SEED)" + ] + }, + { + "cell_type": "markdown", + "id": "4366346a", + "metadata": {}, + "source": [ + "You can alternatively provide `features` to `Datalab` instead of pre-computed predictions. These are (preprocessed) numeric dataset covariates, aka independent variables to the regression model (such as neural network embeddings of your raw data). Internally, this is equivalent to using `CleanLearning` to find label issues if you also provide your sklearn-compatible regression model to `Datalab.find_issues`. In addition, Datalab can simultaneously detect many more types of issues in your dataset beyond mislabeling (simply drop the `issue_types` argument below)." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "df06525b", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:53.843864Z", + "iopub.status.busy": "2024-06-25T23:04:53.843543Z", + "iopub.status.idle": "2024-06-25T23:04:59.338790Z", + "shell.execute_reply": "2024-06-25T23:04:59.338160Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Finding label issues ...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Audit complete. 
141 issues found in the dataset.\n" + ] + } + ], + "source": [ + "lab = Datalab(\n", + " data=train_data.drop(columns=[\"true_final_score\"]),\n", + " label_name=\"final_score\",\n", + " task=\"regression\",\n", + ")\n", + "\n", + "lab.find_issues(\n", + " features=X_train,\n", + " issue_types={ # Optional drop this to simultaneously detect many types of data/label issues \n", + " \"label\": {\n", + " # Optional: Specify which type of sklearn-compatible regression model is used to find label errors\n", + " \"clean_learning_kwargs\": {\"model\": HistGradientBoostingRegressor()}\n", + " }\n", + " },\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "05282559", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:59.341180Z", + "iopub.status.busy": "2024-06-25T23:04:59.340793Z", + "iopub.status.idle": "2024-06-25T23:04:59.349286Z", + "shell.execute_reply": "2024-06-25T23:04:59.348832Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
     is_label_issue   label_score  given_label  predicted_label
659            True  5.791186e-12         17.4        84.110719
367            True  6.485156e-10          0.0        56.670640
56             True  1.225300e-09          8.9        71.749976
318            True  1.499679e-09          0.0        71.947007
305            True  4.067882e-08         19.1        61.648396
\n", + "
" + ], + "text/plain": [ + " is_label_issue label_score given_label predicted_label\n", + "659 True 5.791186e-12 17.4 84.110719\n", + "367 True 6.485156e-10 0.0 56.670640\n", + "56 True 1.225300e-09 8.9 71.749976\n", + "318 True 1.499679e-09 0.0 71.947007\n", + "305 True 4.067882e-08 19.1 61.648396" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_issues = lab.get_issues(\"label\")\n", + "\n", + "label_issues.sort_values(\"label_score\").head()" + ] + }, + { + "cell_type": "markdown", + "id": "c1353758", + "metadata": {}, + "source": [ + "While this tutorial focused on label issues, cleanlab's `Datalab` object can automatically detect many other types of issues in your dataset (outliers, near duplicates, etc.).\n", + "Simply remove the `issue_types` argument from the above call to `Datalab.find_issues()`, and `Datalab` will audit your dataset more comprehensively (a default regression model will be used if you don't specify the model type).\n", + "Refer to our [Datalab quickstart tutorial](./datalab/datalab_quickstart.html) to learn how to interpret the results (the interpretation remains mostly the same across different types of ML tasks).\n", + "\n", + "**Summary:** To detect many types of issues in your regression dataset, we recommend using `Datalab` with provided `features` plus the best regression model you know for your data. If your goal is to train a robust regression model with noisy data rather than detect data/label issues, then use `CleanLearning`. Alternatively, if you don't have a sklearn-compatible regression model or already have pre-computed predictions from the model you'd like to rely on, you can pass these predictions into `Datalab` directly to find issues based on them instead of providing a regression model."
+ ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "95531cda", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:04:59.351331Z", + "iopub.status.busy": "2024-06-25T23:04:59.350998Z", + "iopub.status.idle": "2024-06-25T23:04:59.419687Z", + "shell.execute_reply": "2024-06-25T23:04:59.419197Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "from sklearn.metrics import roc_auc_score\n", + "from cleanlab.regression.rank import get_label_quality_scores\n", + "\n", + "if r2_cl <= r2_og:\n", + " raise ValueError(\"CleanLearning did not improve r2 score\")\n", + "\n", + "label_quality_score_cl = label_issues_cl[\"label_quality\"]\n", + "label_quality_scores_residual = get_label_quality_scores(labels=y_train, predictions=predictions, method=\"residual\")\n", + "\n", + "label_quality_scores = get_label_quality_scores(labels=y_train, predictions=predictions)\n", + "\n", + "auc_outre = roc_auc_score(errors_mask, 1 - label_quality_scores)\n", + "auc_cl = roc_auc_score(errors_mask, 1 - label_quality_score_cl)\n", + "auc_residual = roc_auc_score(errors_mask, 1 - label_quality_scores_residual)\n", + "\n", + "if auc_outre <= 0.5 or auc_cl <= 0.5:\n", + " raise ValueError(\"Label quality scores did not perform well enough\")\n", + "\n", + "if auc_outre <= auc_residual:\n", + " raise ValueError(\"Outre label quality scores did not outperform alternative scores\")\n", + " \n", + "if auc_cl <= auc_residual:\n", + " raise ValueError(\"CL label quality scores did not outperform alternative scores\")\n", + "\n", + "# Test that CleanLearning label issues and Datalab label issues match\n", + "pd.testing.assert_frame_equal(\n", + " # CleanLearning DataFrame\n", + " label_issues_cl.rename(columns={\"label_quality\": \"label_score\"}), \n", + " # Datalab DataFrame\n", + " label_issues,\n", + ")" + ] + } + ], + "metadata": { + 
"kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/segmentation.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/segmentation.ipynb new file mode 100644 index 000000000..fe58276a2 --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/segmentation.ipynb @@ -0,0 +1,2489 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d0d2e007", + "metadata": {}, + "source": [ + "# Find Label Errors in Semantic Segmentation Datasets\n", + "\n", + "This 5-minute quickstart tutorial shows how you can use cleanlab to find potentially mislabeled images in semantic segmentation datasets. In semantic segmentation, our data consists of images each annotated with a corresponding mask that labels each pixel in the image as one of K classes. Models are trained on this labeled mask to predict the class of each pixel in an image. However in real-world data, this annotated mask often contains errors. \n", + "Here we apply cleanlab to find label errors in a variant of the [SYNTHIA](https://synthia-dataset.net) segmentation dataset, which consists of synthetic images generated via graphics engine." + ] + }, + { + "cell_type": "markdown", + "id": "07936a54", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "cleanlab uses two inputs to find label issues in semantic segmentation data:\n", + "- `labels`: Array of dimension (N,H,W), where N is the number of images and H and W are the height and width of each image. Each pixel is assumed to hold an integer-encoded class label. If your labels are one-hot encoded with dimension (N,K,H,W), where K is the number of classes, convert them via `np.argmax(labels_one_hot, axis=1)`.\n", + "- `pred_probs`: Array of dimension (N,K,H,W) holding the predicted class probabilities for each pixel.\n", + "\n", + "With these inputs, you can find and review label issues via this code: \n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.segmentation.filter import find_label_issues \n", + "from cleanlab.segmentation.summary import display_issues\n", + " \n", + "issues = find_label_issues(labels, pred_probs)\n", + "display_issues(issues, pred_probs=pred_probs, labels=labels,\n", + " top=10)\n", + "\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "1da020bc", + "metadata": {}, + "source": [ + "## 1. Install required dependencies and download data\n", + "\n", + "You can use `pip` to install all packages required for this tutorial as follows: \n", + "\n", + " !pip install cleanlab " + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ae8a08e0", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:05:02.478978Z", + "iopub.status.busy": "2024-06-25T23:05:02.478799Z", + "iopub.status.idle": "2024-06-25T23:05:04.055130Z", + "shell.execute_reply": "2024-06-25T23:05:04.054408Z" + } + }, + "outputs": [], + "source": [ + "%%capture\n", + "!wget -nc 'https://cleanlab-public.s3.amazonaws.com/ImageSegmentation/given_masks.npy' " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "58fd4c55", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:05:04.057619Z", + "iopub.status.busy": "2024-06-25T23:05:04.057413Z", + "iopub.status.idle": "2024-06-25T23:05:55.802403Z", + "shell.execute_reply": "2024-06-25T23:05:55.801688Z" + } + }, + "outputs": [], + "source": [ + "%%capture\n", + "!wget -nc 'https://cleanlab-public.s3.amazonaws.com/ImageSegmentation/predicted_masks.npy' " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "439b0305", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:05:55.805275Z", + "iopub.status.busy": "2024-06-25T23:05:55.804881Z", + "iopub.status.idle": "2024-06-25T23:05:56.945713Z", + "shell.execute_reply": "2024-06-25T23:05:56.945159Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "\n", + "dependencies = [\"cleanlab\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + 
"else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + " print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "a1349304", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:05:56.948326Z", + "iopub.status.busy": "2024-06-25T23:05:56.948012Z", + "iopub.status.idle": "2024-06-25T23:05:56.951599Z", + "shell.execute_reply": "2024-06-25T23:05:56.951146Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "from cleanlab.segmentation.filter import find_label_issues \n", + "from cleanlab.segmentation.rank import get_label_quality_scores, issues_from_scores \n", + "from cleanlab.segmentation.summary import display_issues, common_label_issues, filter_by_class \n", + "np.set_printoptions(suppress=True)" + ] + }, + { + "attachments": { + "image-2.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "9ad75b45", + "metadata": {}, + "source": [ + "## 2. Get data, labels, and pred_probs\n", + "\n", + "This tutorial just loads `labels` and `pred_probs` for our dataset, which are the only inputs required to find label issues and score the label quality of each image with cleanlab. For your own dataset, you will need to properly format its `labels` and train your own semantic segmentation model to produce `pred_probs` (pixel-level predicted class probabilities, which should be out-of-sample such as computed via cross-validation). 
Our example [training notebook](https://github.com/cleanlab/examples/blob/master/segmentation/training_ResNeXt50_for_Semantic_Segmentation_on_SYNTHIA.ipynb) demonstrates code to train a Pytorch segmentation model on the SYNTHIA dataset, produce such `pred_probs` for each image, and save them in a `.npy` file (which we simply load in this tutorial via `np.load`).\n", + "\n", + "Here's what an image looks like in the SYNTHIA dataset. For every image there is a `label` mask provided in which each pixel is integer-encoded as one of the SYNTHIA classes: sky, building, road, sidewalk, fence, vegetation, pole, car, traffic sign, person, bicycle, motorcycle, traffic light, terrain, rider, truck, bus, train, wall, and unlabeled (annotated for pixels not belonging to the other classes). \n", + "\n", + "![image-2.png](attachment:image-2.png)" + ] + }, + { + "cell_type": "markdown", + "id": "dc888c2a", + "metadata": {}, + "source": [ + "In semantic segmentation tasks `labels` and `pred_probs` are formatted with the following dimensions:\n", + "\n", + " N - Number of images in the dataset\n", + " K - Number of classes in the dataset\n", + " H - Height of each image\n", + " W - Width of each image\n", + "\n", + "Each pixel in the dataset is labeled with one of *K* possible classes. The `pred_probs` contain a length-*K* vector for **each** pixel in the dataset (which sums to 1 for each pixel). This results in an array of size `(N,K,H,W)`. \n", + "\n", + "Note that cleanlab requires **only** `pred_probs` from any trained segmentation model and `labels` in order to detect label errors. The `pred_probs` should be **out-of-sample**, which can be obtained for every image in a dataset via K-fold cross-validation." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6c2202be", + "metadata": {}, + "source": [ + "**pred_probs**\n", + "dim: (N,K,H,W)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "07dc5678", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:05:56.953559Z", + "iopub.status.busy": "2024-06-25T23:05:56.953382Z", + "iopub.status.idle": "2024-06-25T23:05:56.957330Z", + "shell.execute_reply": "2024-06-25T23:05:56.956815Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(30, 20, 1088, 1920)\n" + ] + } + ], + "source": [ + "pred_probs_filepaths ='predicted_masks.npy'\n", + "pred_probs = np.load(pred_probs_filepaths, mmap_mode='r+')\n", + "print(pred_probs.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "f2eff12e", + "metadata": {}, + "source": [ + "The `labels` contain a class label for each pixel in each image, which must be an integer in `0, 1, ..., K-1`. This results in an array of size `(N,H,W)`." + ] + }, + { + "cell_type": "markdown", + "id": "1e625c33", + "metadata": {}, + "source": [ + "**labels**\n", + "dim: (N,H,W)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "25ebe22a", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:05:56.959639Z", + "iopub.status.busy": "2024-06-25T23:05:56.959252Z", + "iopub.status.idle": "2024-06-25T23:05:56.962811Z", + "shell.execute_reply": "2024-06-25T23:05:56.962296Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(30, 1088, 1920)\n" + ] + } + ], + "source": [ + "label_filepaths ='given_masks.npy'\n", + "labels = np.load(label_filepaths, mmap_mode='r+')\n", + "print(labels.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "9b71eb4a", + "metadata": {}, + "source": [ + "Note that these correspond to the labeled mask from the dataset, and the extracted probabilities of a trained classifier. 
If using your own dataset, you may want to iterate over memory-mapped numpy arrays.\n", + "\n", + "- `labels`: Array of dimension (N,H,W), where N is the number of images and H and W are the height and width of each image. We assume integer-encoded masks. One-hot encoded labels of dimension (N,K,H,W) can be converted via `np.argmax(labels_one_hot, axis=1)`.\n", + "- `pred_probs`: Array of dimension (N,K,H,W), where `K` is the number of classes.\n", + "\n", + "**class_names**\n", + "dim: (K,)\n", + "\n", + "Some of our functions optionally use the class names to improve visualization. Here are the class names in our dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "3faedea9", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:05:56.964844Z", + "iopub.status.busy": "2024-06-25T23:05:56.964433Z", + "iopub.status.idle": "2024-06-25T23:05:56.967174Z", + "shell.execute_reply": "2024-06-25T23:05:56.966734Z" + } + }, + "outputs": [], + "source": [ + "SYNTHIA_CLASSES = ['unlabeled','sky', 'building', 'road', 'sidewalk', 'fence', 'vegetation','pole','car', \\\n", + " 'traffic sign','person','bicycle','motorcycle','traffic light', 'terrain', \\\n", + " 'rider', 'truck', 'bus', 'train','wall']" + ] + }, + { + "attachments": { + "synthia_errors-2.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "1dc3150f", + "metadata": {}, + "source": [ + "## 3. Use cleanlab to find label issues \n", + "\n", + "In segmentation, we consider an image mislabeled if the given mask does not match what truly appears in the image being segmented. More specifically, a pixel is mislabeled when it is labeled as class `i` but _really_ belongs to class `j`. 
This generally happens when an image is annotated manually by human annotators.\n", + "\n", + "Below are examples of three types of annotation errors common in segmentation datasets.\n", + "\n", + "![synthia_errors-2.png](attachment:synthia_errors-2.png)\n", + "\n", + "\n", + "Based on the given `labels` and out-of-sample `pred_probs`, cleanlab can quickly help us identify such label issues in our dataset by calling `find_label_issues()`. \n", + "\n", + "By default, the indices of the identified label issues are sorted by cleanlab’s self-confidence score, which measures the quality of each given label via the probability assigned to it by our trained model. The returned `issues` is a boolean mask of dimension `(N,H,W)`, where `True` corresponds to a detected error sorted by image quality with the lowest-quality images coming first." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "2c2ad9ad", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:05:56.969256Z", + "iopub.status.busy": "2024-06-25T23:05:56.968825Z", + "iopub.status.idle": "2024-06-25T23:06:31.975355Z", + "shell.execute_reply": "2024-06-25T23:06:31.974796Z" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1a9ba29fef264479aeebe59f09be2cf7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "number of examples processed for estimating thresholds: 0%| | 0/30 [00:00" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display_issues(issues,top=2)" + ] + }, + { + "cell_type": "markdown", + "id": "717b3b7d", + "metadata": {}, + "source": [ + "We can also input `pred_probs`, `labels`, and `class_names` as auxiliary inputs to see more information." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "57fed473", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:06:32.647536Z", + "iopub.status.busy": "2024-06-25T23:06:32.647114Z", + "iopub.status.idle": "2024-06-25T23:06:35.351082Z", + "shell.execute_reply": "2024-06-25T23:06:35.350598Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display_issues(issues, labels=labels, pred_probs=pred_probs, class_names=SYNTHIA_CLASSES,top=2)" + ] + }, + { + "cell_type": "markdown", + "id": "116fff37", + "metadata": {}, + "source": [ + "After additionally inputting `pred_probs`, `labels`, and `class_names`, we see more information:\n", + " - The `labels` and `pred_probs` inputs generate the first two columns, which segment the image by the class in the given label and by the class the model predicts for those pixels.\n", + " - The `class_names` input creates the legend that color-codes the segmentation.\n", + "\n", + "\n", + "In the leftmost plot, we can see that the dark brown area (the `unlabeled` class, as shown in the legend) was the given label. The middle plot shows that our model believes this area is in fact `sky`, a light brown shade in the legend. The rightmost plot highlights the discrepancy between these classes in red to indicate which area of the image is likely mislabeled.\n", + "\n", + "These plots clearly highlight the part of the sky that was mislabeled by annotators of this image." + ] + }, + { + "cell_type": "markdown", + "id": "d213b2b2", + "metadata": {}, + "source": [ + "### Classes which are commonly mislabeled overall \n", + "\n", + "We may also wish to understand which classes tend to be most commonly mislabeled throughout the entire dataset by calling `common_label_issues()`. 
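To build intuition for what such a summary contains, here is a conceptual numpy/pandas sketch. It is not cleanlab's implementation, and cleanlab's internals may differ. For each pixel flagged in the boolean `issues` mask, it pairs the given label with the model's argmax prediction and counts how often each (given, predicted) pair occurs. The column names mirror those of the DataFrame returned by `common_label_issues()`. The helper name `count_issue_pairs` is hypothetical.

```python
import numpy as np
import pandas as pd


def count_issue_pairs(issues, labels, pred_probs, class_names):
    # Hypothetical helper: a conceptual sketch, NOT cleanlab's implementation.
    issues = np.asarray(issues, dtype=bool)  # (N,H,W) boolean mask
    given = np.asarray(labels)[issues]  # given label of each flagged pixel
    # Predicted label = class with the highest probability (axis 1 is K).
    predicted = np.asarray(pred_probs).argmax(axis=1)[issues]
    # Count each distinct (given, predicted) pair among flagged pixels.
    pairs, counts = np.unique(
        np.stack([given, predicted], axis=1), axis=0, return_counts=True
    )
    df = pd.DataFrame(
        {
            'given_label': [class_names[g] for g in pairs[:, 0]],
            'predicted_label': [class_names[p] for p in pairs[:, 1]],
            'num_pixel_issues': counts,
        }
    )
    # Most frequent label swaps first.
    return df.sort_values('num_pixel_issues', ascending=False).reset_index(drop=True)
```

Computing the argmax over the full `(N,K,H,W)` array materializes an `(N,H,W)` integer array. For a dataset of this size, that is a few hundred megabytes.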
" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "e4a006bd", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:06:35.353386Z", + "iopub.status.busy": "2024-06-25T23:06:35.353025Z", + "iopub.status.idle": "2024-06-25T23:07:08.308710Z", + "shell.execute_reply": "2024-06-25T23:07:08.308229Z" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5b3005f45257497bb0cb47a947558588", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + " 0%| | 0/4997683 [00:00\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "" + ], + "text/plain": [ + " given_label predicted_label num_pixel_issues\n", + "0 unlabeled sky 3263230\n", + "1 unlabeled car 783381\n", + "2 pole building 275110\n", + "3 unlabeled building 255917\n", + "4 traffic light building 78225\n", + "5 person building 55990\n", + "6 unlabeled sidewalk 54315\n", + "7 pole sidewalk 33591\n", + "8 building car 24645\n", + "9 wall building 21054\n", + "10 person sidewalk 15045\n", + "11 wall sidewalk 14171\n", + "12 building sky 13832\n", + "13 road car 13498\n", + "14 fence building 11490\n", + "15 car road 9164\n", + "16 car building 8769\n", + "17 wall vegetation 6999\n", + "18 wall car 6031\n", + "19 traffic sign building 5011" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "common_label_issues(issues, labels=labels, pred_probs=pred_probs, class_names=SYNTHIA_CLASSES)" + ] + }, + { + "cell_type": "markdown", + "id": "a35ef843", + "metadata": {}, + "source": [ + "The printed information above is also stored in a returned pandas DataFrame, which summarizes which classes are overall least reliably labeled in the dataset.\n", + "\n", + "### Focusing on one specific class\n", + "\n", + "We can also focus on issues within a specific class of interest, say the class `car`, by using `filter_by_class` to look only at the estimated label errors in the `car` class. \n", + "Here the color-coding reveals that the pixels depicting a car in the image were mistakenly left as the `unlabeled` class in the given label."
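A minimal sketch of class filtering follows. It assumes an issue involves a class when either the given label or the model's argmax prediction at that pixel equals the class index. cleanlab's own `filter_by_class` may apply additional logic. Both helper names below are hypothetical. The second helper ranks images by how many flagged pixels involve the class, which is one way to decide which images to review first.

```python
import numpy as np


def issues_involving_class(class_index, issues, labels, pred_probs):
    # Hypothetical helper, not cleanlab's filter_by_class: keep flagged pixels
    # whose given label OR predicted (argmax) label equals class_index.
    issues = np.asarray(issues, dtype=bool)
    given_is_class = np.asarray(labels) == class_index
    predicted_is_class = np.asarray(pred_probs).argmax(axis=1) == class_index
    return issues & (given_is_class | predicted_is_class)


def rank_images_by_issue_count(subset):
    # Count flagged pixels per image and order images from most to fewest
    # (stable sort, so ties keep their original image order).
    per_image = subset.reshape(subset.shape[0], -1).sum(axis=1)
    order = np.argsort(-per_image, kind='stable')
    return order, per_image[order]
```

With this tutorial's arrays, the first helper could be called as `issues_involving_class(SYNTHIA_CLASSES.index('car'), issues, labels, pred_probs)`. Its result can then be passed to `rank_images_by_issue_count`.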
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "c8f4e163", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:08.310775Z", + "iopub.status.busy": "2024-06-25T23:07:08.310562Z", + "iopub.status.idle": "2024-06-25T23:07:22.565922Z", + "shell.execute_reply": "2024-06-25T23:07:22.565290Z" + } + }, + "outputs": [], + "source": [ + "class_issues = filter_by_class(SYNTHIA_CLASSES.index(\"car\"), issues,labels=labels, pred_probs=pred_probs)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "716c74f3", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:22.568373Z", + "iopub.status.busy": "2024-06-25T23:07:22.568170Z", + "iopub.status.idle": "2024-06-25T23:07:26.286056Z", + "shell.execute_reply": "2024-06-25T23:07:26.285490Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAEZkAAAGFCAYAAACBR0rlAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8WgzjOAAAACXBIWXMAAA9hAAAPYQGoP6dpAACNaUlEQVR4nOzdd5RV1fk/4M/QBZQi9oq9kYhiiyLYG9h7xdijJpoY89WQ2GJMojGxYVdU7L3FriiCRqVYorErKlhQQBHp/P7gd29mmMLMMAN4eZ61shbeU+6+k3PO3vs9e7+7bObMmTMDAAAAAAAAAAAAAAAAAAAAAEBJajK/CwAAAAAAAAAAAAAAAAAAAAAAQOORZAYAAAAAAAAAAAAAAAAAAAAAoIRJMgMAAAAAAAAAAAAAAAAAAAAAUMIkmQEAAAAAAAAAAAAAAAAAAAAAKGGSzAAAAAAAAAAAAAAAAAAAAAAAlDBJZgAAAAAAAAAAAAAAAAAAAAAASpgkMwAAAAAAAAAAAAAAAAAAAAAAJUySGQAAAAAAAAAAAAAAAAAAAACAEibJDAAAAAAAAAAAAAAAAAAAAABACZNkBgAAAAAAAAAAAAAAAAAAAACghEkyAwAAAAAAAAAAAAAAAAAAAABQwiSZAQAAAAAAAAAAAAAAAAAAAAAoYZLMAAAAAAAAAAAAAAAAAAAAAACUMElmAAAAAAAAAAAAAAAAAAAAAABKmCQzAAAAAAAAAAAAAAAAAAAAAAAlTJIZAAAAAAAAAAAAAAAAAAAAAIASJskMAAAAAAAAAAAAAAAAAAAAAEAJk2QGAAAAAAAAAAAAAAAAAAAAAKCESTIDAAAAAAAAAAAAAAAAAAAAAFDCJJkBAAAAAAAAAAAAAAAAAAAAAChhkswAAAAAAAAAAAAAAAAAAAAAAJQwSWYAAAAAAAAAAAAAAAAAAAAAAEqYJDMAAAAAAAAAAAAAAAAAAAAAACVMkhkAAAAAAAAAAAAAAAAAAAAAgBImyQwAAAAAAAAAAAAAAAAAAAAAQAmTZAYAAAAAAAAAAAAAAAAAAAAAoIRJMgMAAAAAAAAAAAAAAAAAAAAAUMIkmQEAAAAAAAAAAAAAAAAAAAAAKGGSzAAAAAAAAA
AAAAAAAAAAAAAAlDBJZgAAAAAAAAAAAAAAAAAAAAAASpgkMwAAAAAAAAAAAAAAAAAAAAAAJUySGQAAAAAAAAAAAAAAAAAAAACAEibJDAAAAAAAAAAAAAAAAAAAAABACZNkBgAAAAAAAAAAAAAAAAAAAACghEkyAwAAAAAAAAAAAAAAAAAAAABQwiSZAQAAAAAAAAAAAAAAAAAAAAAoYZLMAAAAAAAAAAAAAAAAAAAAAACUMElmAAAAAAAAAAAAAAAAAAAAAABKmCQzAAAAAAAAAAAAAAAAAAAAAAAlTJIZAAAAAAAAAAAAAAAAAAAAAIASJskMAAAAAAAAAAAAAAAAAAAAAEAJk2QGAAAAAAAAAAAAAAAAAAAAAKCESTIDAAAAAAAAAAAAAAAAAAAAAFDCJJkBAAAAAAAAAAAAAAAAAAAAAChhkswAAAAAAAAAAAAAAAAAAAAAAJQwSWYAAAAAAAAAAAAAAAAAAAAAAEqYJDMAAAAAAAAAAAAAAAAAAAAAACVMkhkAAAAAAAAAAAAAAAAAAAAAgBImyQwAAAAAAAAAAAAAAAAAAAAAQAmTZAYAAAAAAAAAAAAAAAAAAAAAoIRJMgMAAAAAAAAAAAAAAAAAAAAAUMIkmQEAAAAAAAAAAAAAAAAAAAAAKGGSzAAAAAAAAAAAAAAAAAAAAAAAlDBJZgAAAAAAAAAAAAAAAAAAAAAASpgkMwAAAAAAAAAAAAAAAAAAAAAAJUySGQAAAAAAAAAAAAAAAAAAAACAEibJDAAAAAAAAAAAAAAAAAAAAABACZNkBgAAAAAAAAAAAAAAAAAAAACghEkyAwAAAAAAAAAAAAAAAAAAAABQwiSZAQAAAAAAAAAAAAAAAAAAAAAoYZLMAAAAAAAAAAAAAAAAAAAAAACUMElmAAAAAAAAAAAAAAAAAAAAAABKmCQzAAAAAAAAAAAAAAAAAAAAAAAlTJIZAAAAAAAAAAAAAAAAAAAAAIASJskMAAAAAAAAAAAAAAAAAAAAAEAJk2QGAAAAAAAAAAAAAAAAAAAAAKCESTIDAAAAAAAAAAAAAAAAAAAAAFDCJJkBAAAAAAAAAAAAAAAAAAAAAChhkswAAAAAAAAAAAAAAAAAAAAAAJQwSWYAAAAAAAAAAAAAAAAAAAAAAEqYJDMAAAAAAAAAAAAAAAAAAAAAACVMkhkAAAAAAAAAAAAAAAAAAAAAgBImyQwAAAAAAAAAAAAAAAAAAAAAQAmTZAYAAAAAAAAAAAAAAAAAAAAAoIRJMgMAAAAAAAAAAAAAAAAAAAAAUMIkmQEAAAAAAAAAAAAAAAAAAAAAKGGSzAAAAAAAAAAAAAAAAAAAAAAAlDBJZgAAAAAAAAAAAAAAAAAAAAAASpgkMwAAAAAAAAAAAAAAAAAAAAAAJUySGQAAAAAAAAAAAAAAAAAAAACAEibJDAAAAAAAAAAAAAAAAAAAAABACZNkBgAAAAAAAAAAAAAAAAAAAACghEkyAwAAAAAAAAAAAAAAAAAAAABQwiSZAQAAAAAAAAAAAAAAAAAAAAAoYZLMAAAAAAAAAAAAAAAAAAAAAACUMElmAAAAAAAAAAAAAAAAAAAAAABKmCQzAAAAAAAAAAAAAAAAAAAAAAAlTJIZAAAAAAAAAAAAAAAAAAAAAIASJskMAAAAAAAAAAAAAAAAAAAAAEAJk2QGAAAAAAAAAAAAAAAAAAAAAKCESTIDAAAAAAAAAAAAAAAAAAAAAFDCJJkBAAAAAAAAAAAAAAAAAAAAAChhkswAAAAAAAAAAAAAAAAAAAAAAJQwSWYAAAAAAAAAAAAAAAAAAAAAAEqYJDMAAAAAAAAAAAAAAAAAAAAAACVMkhkAAAAAAAAAAAAAAAAAAAAAgBImyQwAAAAAAAAAAAAAAAAAAAAAQAmTZAYAAAAAAAAAAAAAAAAAAAAAoIRJMgMAAAAAAAAAAAAAAAAAAAAAUMIkmQEAAAAAAAAAAAAAAAAAAAAAKGGSzAAAAA
AAAAAAAAAAAAAAAAAAlDBJZgAAAAAAAAAAAAAAAAAAAAAASpgkMwAAAAAAAAAAAAAAAAAAAAAAJUySGQAAAAAAAAAAAAAAAAAAAACAEibJDAAAAAAAAAAAAAAAAAAAAABACZNkBgAAAAAAAAAAAAAAAAAAAACghEkyAwAAAAAAAAAAAAAAAAAAAABQwiSZAQAAAAAAAAAAAAAAAAAAAAAoYZLMAAAAAAAAAAAAAAAAAAAAAACUMElmAAAAAAAAAAAAAAAAAAAAAABKmCQzAAAAAAAAAAAAAAAAAAAAAAAlTJIZAAAAAAAAAAAAAAAAAAAAAIASJskMAAAAAAAAAAAAAAAAAAAAAEAJk2QGAAAAAAAAAAAAAAAAAAAAAKCESTIDAAAAAAAAAAAAAAAAAAAAAFDCJJkBAAAAAAAAAAAAAAAAAAAAAChhkswAAAAAAAAAAAAAAAAAAAAAAJQwSWYAAAAAAAAAAAAAAAAAAAAAAEqYJDMAAAAAAAAAAAAAAAAAAAAAACVMkhkAAAAAAAAAAAAAAAAAAAAAgBImyQwAAAAAAAAAAAAAAAAAAAAAQAmTZAYAAAAAAAAAAAAAAAAAAAAAoIRJMgMAAAAAAAAAAAAAAAAAAAAAUMIkmQEAAAAAAAAAAAAAAAAAAAAAKGGSzAAAAAAAAAAAAAAAAAAAAAAAlDBJZgAAAAAAAAAAAAAAAAAAAAAASpgkMwAAAAAAAAAAAAAAAAAAAAAAJUySGQAAAAAAAAAAAAAAAAAAAACAEibJDAAAAAAAAAAAAAAAAAAAAABACZNkBgAAAAAAAAAAAAAAAAAAAACghEkyAwAAAAAAAAAAAAAAAAAAAABQwiSZAQAAAAAAAAAAAAAAAAAAAAAoYZLMAAAAAAAAAAAAAAAAAAAAAACUMElmAAAAAAAAAAAAAAAAAAAAAABKmCQzAAAAAAAAAAAAAAAAAAAAAAAlTJIZAAAAAAAAAAAAAAAAAAAAAIASJskMAAAAAAAAAAAAAAAAAAAAAEAJk2QGAAAAAAAAAAAAAAAAAAAAAKCESTIDAAAAAAAAAAAAAAAAAAAAAFDCJJkBAAAAAAAAAAAAAAAAAAAAAChhkswAAAAAAAAAAAAAAAAAAAAAAJQwSWYAAAAAAAAAAAAAAAAAAAAAAEqYJDMAAAAAAAAAAAAAAAAAAAAAACVMkhkAAAAAAAAAAAAAAAAAAAAAgBImyQwAAAAAAAAAAAAAAAAAAAAAQAmTZAYAAAAAAAAAAAAAAAAAAAAAoIRJMgMAAAAAAAAAAAAAAAAAAAAAUMIkmQEAAAAAAAAAAAAAAAAAAAAAKGGSzAAAAAAAAAAAAAAAAAAAAAAAlDBJZgAAAAAAAAAAAAAAAAAAAAAASpgkMwAAAAAAAAAAAAAAAAAAAAAAJUySGQAAAAAAAAAAAAAAAAAAAACAEibJDAAAAAAAAAAAAAAAAAAAAABACZNkBgAAAAAAAAAAAAAAAAAAAACghEkyAwAAAAAAAAAAAAAAAAAAAABQwiSZAQAAAAAAAAAAAAAAAAAAAAAoYZLMAAAAAAAAAAAAAAAAAAAAAACUMElmAAAAAAAAAAAAAAAAAAAAAABKmCQzAAAAAAAAAAAAAAAAAAAAAAAlTJIZAAAAAAAAAAAAAAAAAAAAAIASJskMAAAAAAAAAAAAAAAAAAAAAEAJk2QGAAAAAAAAAAAAAAAAAAAAAKCESTIDAAAAAAAAAAAAAAAAAAAAAFDCJJkBAAAAAAAAAAAAAAAAAAAAAChhkswAAAAAAAAAAAAAAAAAAAAAAJQwSWYAAAAAAAAAAAAAAAAAAAAAAEqYJDMAAAAAAAAAAAAAAAAAAAAAACWsWX0PnDlzZqZOnZoZM2Y0ZHkAAAAAAAAAAAAAAAAAAAAAAKhC06ZN06xZs5SVldXpuDonmZk4cWLGjx+f7777LtOnT6/r4QAAAAAAAAAAAAAAAAAAAAAA1FPLli3Tvn37dOjQodbJZuqUZO
a7777Lp59+mubNm6d9+/Zp06ZNmjRpUufMNgAAAAAAAAAAAAAAAAAAAAAA1N7MmTMzbdq0jB8/Pl988UWmTJmSpZdeulbH1jrJzMSJE/Ppp59mscUWy7LLLiuxDAAAAAAAAAAAAAAAAAAAAADAPLboootm7Nix+fzzz7PIIoukXbt2czymSW1PPn78+DRv3lyCGQAAAAAAAAAAAAAAAAAAAACA+ahDhw5p3bp1vv3221rtX6skMzNnzsx3332XxRZbTIIZAAAAAAAAAAAAAAAAAAAAAID5rG3btpk4cWJmzJgxx31rlWRm6tSpmT59etq0aTPXhQMAAAAAAAAAAAAAAAAAAAAAYO60atUqM2bMyLRp0+a4b62SzBSy1TRpUqvdAQAAAAAAAAAAAAAAAAAAAABoRIVcMIXcMDXuW5cTl5WV1a9EAAAAAAAAAAAAAAAAAAAAAAA0mLrkgqlTkhkAAAAAAAAAAAAAAAAAAAAAAH5cJJkBAAAAAAAAAAAAAAAAAAAAAChhkswAAAAAAAAAAAAAAAAAAAAAAJQwSWYAAAAAAAAAAAAAAAAAAAAAAEqYJDNU66OPPkpZWVnKysrSv3//RvmOM888s/gd81vPnj1TVlaWnj17zu+iFP8mZ5555vwuCgsB1xsLmgWlbphTPdi/f//i9o8++qje39OnT5+UlZVl5ZVXrnK7exQWTHO6d6GhrLzyyikrK0ufPn3qfY550bdb0NXU3/P3obGNHDkyxxxzTFZdddW0atWqeL3dd99987toAA1C2xjmXkO0+wFYME2fPj0XXXRRNt544yy22GLFPuHuu+9eYb+vv/46p5xyStZee+0sssgixf3++c9/Jllw3p3UlbgLAA2lIerCBWlsWENS3wLzi5gWVK1UYgFzev8zL54B3kHBgseYZn4MjLmEH7+BAwcW78GBAwfO7+JQIuZ3HwsWRvoPUNmCEHMraNYYJ/185MiMGzOmMU69QGjfqVOWXnHF+V0MKCljvhyZb8eX7nNjsXad0mlJz42FxahxIzP2+9K8nju06ZRl27uWWXiMnPBVxkz6dn4Xo9F0arVYVmy7xPwuBj8SI8dOypjvp8zvYjSaTm1aZMUOreZ3MWCBMnrktIwbM31+F6NRtO/UNMus2ChhsQpGjhyZDTfcMGNKOE7GHIycnoyZOb9L0Tg6lSUrNp3fpQC+HJmUcFw17Tol4qrUYFpGZkZK8x5okk5pFtd/qRuZkfm6RK/hxdMpK7qGc8ABB+TOO++scZ/x48dns802y7vvvjuPSsXCYmSmZEymze9iNJpZNWWL+V0MqLWRY5Ix383vUjSOTosmK3aa36WAuhs/MplYms3xJEnrTkk7TXJYMI38PhkzeX6XonF0apms2GZ+l2K+EgtggTN5fDJ14vwuReNp3jpp2W5+lwJqNCXjMy2lex82S+u0iPuQuhufCZmYEm0XJ2mdlmmXtvO7GJSQ8TOSiSU6FLF1WdKuyfwuBfPL+PHJxB/mdykaR+tFknaaSdTDtPHJjNLtQqRJ66SZe2O+a/DZNJ+PHJk911wzUyZNauhTLzBatGqVe95+W6IZaCBjvhyZXx2xZqZOLd3nRvPmrXLRtW9LNLMQGDVuZLa/YM1MmVaa13OLZq3y+ClvSzTDQmHkhK+y5u3HZ9L0qfO7KI2mVdPmeXu/yySaYY5Gjp2UNf82JJOmzZjfRWk0rZo1ydun/kyiGfj/Ro+clt3X/CRTJpXmG6kWrcpy39srNHqimT/96U8ZM2ZMmjVrlnPPPTdbbrll2rad9eJ4pZVWatTvZgEwcnqy5vikNLuHSaskb7eTaGYe6d+/fw4//PAkyYcfftjoKzaeeeaZOeuss5IkM2eWZl1QEr4cmRyxZlLCcdU0b5Vc+7ZEM1RpWkZmdNZMKVe2y+RtiWZK2MiMzE+yZiaX6DXcMq3yWt5u1EQz87qNVFdDhgwpTirbZZddctJJJ2WppZZKWVlZFltsseJ+l112WX
FS2amnnprevXunffv2SZJllllmnpeb0jAyU7Jm/pNJKd32fKuU5e2sK9EMPwojxyRrnpxMKtHXjq2aJ2//Q6IZflzGj0wuXzOZXprN8SRJ01bJcW9LNJMs+H0HFjIjv0/WfCiZVKLjL1o1Sd7u1WiJZhb0+1ksYMG08sor5+OPP85hhx2W/v37z+/izFuTxyfDL01mlm4S2pQ1S7qe0KiJZhb0Zw8LtikZn//k0sws4WTQZWmWdXOCRDPUyfhMyKW5N9NSmovgJUmzNM0J2aNRE82ooxYe42ckl36fkq1NmiU5oY1EMwuj8eOTS69KppVoddCsaXLC0Y2baEZdUHqmjU++vDSl+9BPkmbJkidINDO/NfhMmnFjxpR0gpkkmTJpUsaNGSPJDDSQb8ePKekEM0kydeqkfDt+jCQzC4Gx348p2QQzSTJl2qSM/X7MQpFkZuWVV14gJtItCGVYWI2Z9G1JJ5hJkknTp2bMpG8lmWGOxnw/paQTzCTJpGkzMub7KQtckpmPPvpofheBhdS4MdNLNsFMkkyZNDPjxkxv9CQzTz75ZJJk9913z6mnntqo38UCaMzM0p3znsz6bWNmxrx3+vfvv/ANyF1QjB9T2glmklm/b/wYSWao0oyMSalXtrN+o+u/VH2dMSWbYCZJJmdSvs6YRk0ys6Ar9AmbNm2aW265pcJksqr269atW/76179Wuc+ZZ56ZM888s1HK2ZgWlHctC6MxmVbSCWaSZFJmZkymSTLDj8KY70o3wUwy67eN+W7BTzIzcODA+V0EFiATx5R2gplk1u+bOEaSGVjgjJlcuglmklm/bczkRksys6BbmGIBxrP8SEydWNoJZpJZv2/qxEZNMsP/iPXV3bRMLOkEM0kyM9MyLRMXmCQz6qgfh4mZXNIJZpJkWqZnYiY3apIZFh4TZ5Z2roFpmfUbF4yahHlp4g+lm2AmmfXbJv7QuElmFmT6D/UzY2JK+6GfJNP+/+9cSO+NBYXcbgAAAADQCD777LMkyRprrDGfSwIAAMC8VugTLrXUUtVOKiu/n74jAAAA/LiJBQAAAAAAPwaSzAAAAABAI5gyZUqSpHnz5vO5JAAAAMxrkydPTjLnPmFt9wMAAAAWbGIBAAAAAMCPgSQzC6A+ffqkrKwsK6+8co379e/fP2VlZSkrK8tHH31UYdvKK6+csrKy9OnTJ0ny9ttv56ijjsrKK6+cli1bZqmllsoee+yRF198ca7K+sYbb+RPf/pTdthhhyy//PJp2bJl2rZtm9VXXz2HHXZYnc8/bty4nHHGGVl33XXTtm3bdOzYMVtttVVuvfXWWh0/adKkXHrppdlmm22y9NJLp0WLFllyySWz7bbb5tprr820adPq8zMr+Pzzz/P73/8+3bp1S8eOHdOyZcussMIK2XffffPkk0/W6hy33HJLevbsmQ4dOqRt27ZZb731csYZZ2TcuHFzXT4WbqNGjcr//d//ZYMNNki7du3SvHnzLLXUUunSpUsOOOCA9O/fP99++22dzjljxowcd9xxxefNCSeckIsuuqj437W5z/faa6+UlZWlY8eOmTRpUn1/Hgup+tQNhevzzDPPrPHcPXv2TFlZWXr27Flp20cffVQ8T//+/etd/rfeeit9+vTJCiuskFatWmWFFVbIgQcemJdffrlWx9f0W2ZvC8yYMSNXXXVVfvazn6VDhw5p06ZNfvKTn+Tcc8/NxIkT5/hdb7zxRg499NAsv/zyadWqVVZcccUcfPDBGTZsWJLat1GgvDPPPLN4nSbJ+PHjc84556Rr165p3759pXtswoQJ+ctf/pLNNtus2NZafvnls/fee+ehhx6q8btmzJiRp59+Oqeccko233zzdOrUKc2bN0/79u2z/vrr55RTTsnIkSNrVe65vXehKvVpq83et6vK9OnT069fv2yyySZZbLHF0q5du2ywwQa54IILigOTauu+++7LPvvskxVXXDGtWrVK+/bt061bt5x11l
kZO3Zslcest956KSsry/7771/l9vL11frrr1/lPi+++GJxn0cffbTCtilTpuTBBx/MCSeckI022igdOnRI8+bNs/jii2eTTTbJmWeemTFjxtTpd9bVpEmTsttuuxXL+Ne//rVRv48ft/LXfMFZZ51V/Kyqe3r69Om54YYb0qtXryy77LJp2bJlFl988WyxxRa58MIL88MPP1T7fbO3aT/77LP8+te/zmqrrZZFFlkkiy++eHbYYYc88sgjtSr/V199lbPPPjubb755llxyyTRv3jwdOnTIJptsklNPPTWvvfZatcfOi7gMC4eJEydm0UUXTVlZWQ466KA57v/CCy8U769+/fpV2t4Q8bwbb7wxPXr0KMbzunTpkrPPPrtYd9e2H/rMM8/ksMMOyyqrrJLWrVtnscUWS5cuXfLb3/42o0aNqrT/wIEDU1ZWlsMPP7z4WefOnSs8U8rKyjJw4MAKx7344ovp27dvevbsWbwfF1tssayzzjo57rjj8uabb1ZZvsIz7Kyzzip+Nvt3zR6Prm1f8fXXX8/RRx+d1VdfPa1bt86iiy6addddNyeffHKl+HZ5VfXPn3jiifTu3TtLL710WrZsmc6dO+e4447Lp59+WmMZYHaz91nn5h3FRx99lJNPPjnrrrtuFl100bRu3Tqrr756jjnmmLz++usNUt733nsvJ598crp06ZJ27dplkUUWySqrrJI+ffrklVdeaZDvYOE1ePDgHHnkkVlzzTWz2GKLpUWLFll++eXTq1evXHbZZZXeYY0ePTr9+vXL3nvvndVXXz1t2rRJy5Yts9xyy2W33XbL7bffnhkzZlT7fYU6rlCPzZgxI9ddd1222mqrLLXUUmnSpEmNfWEWbvVpI83eZhk9enR+97vfFZ/bs+8/duzYXH/99Tn44IOzzjrrpG3btmnRokWWXnrp7LDDDrnqqquKiUVnV/j+G264IUny8ccfV1m2wr8//vjjJMkNN9xQYZ/y7y5mr7OqM3ny5Fx11VXZZZddstxyy6Vly5Zp06ZN1l133Rx55JF57LHHMnPmzDr8tf9n6NChOeKII7LGGmukTZs2xbjphhtumOOPPz4PPPBApXPX9l3L119/nVNPPTVrrrlmFllkkSy11FLZbrvtcu+99yapeWxEMu/GR7Dwacj24tzETWa/xocOHZo+ffqkc+fOadmyZaVnw7hx43Luuedms802K8ZTl1hiiayzzjrZY489cvnll+eLL76o9vvmpm07e//45ZdfzgEHHFAcU7TccsvlkEMOyVtvvTXHvxnUpD73Y03jA8qrS4z0gQceKF73t9122xzL/Zvf/CZlZWVp1qxZlbGgpO5t87qoa2wKajJ7Pfntt9/mzDPPTJcuXdK2bdssueSS2XnnnTNkyJAKx3355Zfp27dv1l133bRp0yaLL754dttttwwfPrzG75sxY0YGDBiQnXfeuViXLrHEEtlqq63Sr1+/Ktvn9Y2vJnM3dmH25827776bE044oRgbrapdW5e2/JgxY4ptgGOPPbbGsiTJgw8+WPytd9xxR5X7fPTRR/nd736XDTfcMIsvvniaN2+eTp06pXv37jnzzDPzwQcfzPF7qiOmRUMQC6heY8YCalKb8SzTpk3LxRdfnI033jiLLbZYcQzKP/7xj0yZMqXOYzTHjRuXP/7xj8U6pH379tlyyy1z8803V7l/4Xlc3d+7Nm1DFm5z05aozxispG7Prnn5nCuo6b14Q49phvLqMhb5xzDmMqn7PQzl1bWOqsv1Nvt73JrUdrxUY8S7xo4dm8022yxlZWVp3rx5tW1CqM78mquWzHpndPHFF6dnz55ZYokl0rx583Ts2DFrrrlmdtppp1x44YU1jqeDpHHrgkT/gR+vXr16paysLJtuummV28u3dTp27Fjl+LbPP/+8uM8VV1xR/Lwh5y0uCJrN7wLQ+O69994cfPDBFR6kX375Ze677748+OCDufnmm7PffvvV+bwDBw7MVl
ttVenzKVOm5L333st7772XG2+8Mf/3f/+X8847b47n+/DDD7Pddtvl/fffL372/fffZ+DAgRk4cGDuu+++3HzzzWnWrOrL9tVXX81uu+1WDAQXfPXVV3nqqafy1FNP5corr8yDDz6YpZZaqo6/dpabb745xxxzTL7//vsKn3/66ae58847c+edd+aII47IFVdcUWU5p02blgMPPDB33nlnhc//85//5D//+U8GDBhQ64ktMLtBgwalV69elSYmf/nll/nyyy/zxhtv5LbbbkunTp3Sq1evWp1z6tSpOfTQQ4sDYfr27Ztzzjkn33zzTX73u99l8uTJ6d+/f7UVbjLrpfaDDz6YJDnwwAPTqlWrev5CFkZzWzfMb3fccUcOPfTQCsHmTz/9NLfeemvuvPPOCo3MuTVx4sRsv/32eeqppyp8/vrrr+f111/PAw88kKeffjpt2rSp8vgBAwbk5z//eaZOnVr87JNPPsnNN9+cO+64I1dffXWDlZWF17vvvpvtt9++2oDX8OHD06tXr0oDGD/77LPcfffdufvuu7Pnnnvm5ptvrrI+OfvssytMhi0YP358Xn311bz66qu5/PLLM2DAgOyxxx7VlnNe3rssPBqjrZbMGty48847Z9CgQRU+Hz58eIYPH55bb70111xzzRzPM3bs2Oy99955+umnK3w+efLkDB06NEOHDk2/fv1y//33V2r79ejRI//5z3/y7LPPVnnu8p+/9tpr+eabb9KxY8cq92nWrFm22GKLCtuOPvro4kCs8r755pu89NJLeemll3LppZfm/vvvz+abbz7H31pX3333XXbdddcMHDgwTZo0yRVXXJGjjjqqwb+HhdfIkSOz66675tVXX63w+TfffJPBgwdn8ODBufzyy/Pwww9njTXWqPFcgwcPzu67714h8dKkSZPy+OOP5/HHH8/555+fU045pdrjq4t7jBs3rni/3XHHHVXW5fMiLsPCo3Xr1tl9990zYMCA3H///fn++++r7cskKQ4SaNasWfbdd99K2+Ymnjd16tTss88+uf/++yt8/sYbb+SNN97IgAED8sQTT8zxN02aNCmHH354lZONCue6/PLLc+utt6Z3795zPF9N+vfvX+HFYcHUqVPz1ltv5a233srVV1+diy++OL/4xS/m6rtq67zzzkvfvn0rvYx588038+abb+byyy/PVVddlUMPPXSO5zrttNPyl7/8pcJnH330Ua644orcfffdefbZZ7P22ms3aPlZOMxNHOrGG2/M0UcfXWnAYeFdybXXXptzzjknp512Wr3Ld8EFF+T000+vELsplPvDDz/MjTfemL59++bss8+u93ewcPrhhx9yxBFHVDlQ6rPPPstnn32Whx9+OF999VVx0Mf06dOz/PLLV/mSfdSoUXnggQfywAMP5Nprr80999yTtm3b1liGSZMmZYcddvCOjHnmxRdfTO/evWtMWtu1a9dK/Zsk+eKLL4p9rCuuuCL/+te/svTSSzdmcWttxIgR2XPPPfPhhx9W+HzKlCnFdte1116bDz/8sM7J5P/xj3/klFNOqXTff/rpp/n0008zbNiw9OvXL999990c7/nZvf7669luu+0qJLyYNGlSnnzyyTz55JM5+uijs9lmm9X6fI01PgLmpr3YkHGTK664IieeeGK1SWneeuutbLvttpXetYwZMyZjxozJW2+9lfvuuy/Tp0/PCSecUOn4hmzb9uvXL7/61a8qlHXUqFEZMGBA7rnnnjzyyCPZcsst53gemF1jjiOoa4x0l112yTLLLJPRo0enf//+1SbkT2aNGxswYECSZMcdd8yyyy5bYXt92ua1NS9jUyycPvnkk2y77bZ55513ip99//33eeSRR/L444/n1ltvzT777JPXXnstO++8cz777LPifhMnTswDDzyQxx57LI888kiVY1K/+eab7Lrrrhk8eHCFz8eMGVO89y+99NI88sgjWWmlleb698zt2IXy7r///hx00E
GVnivl1bUt36lTp+y222658847c/vtt+ef//xnjeW4/vrrkyQdO3bMbrvtVml7dXGnr7/+Os8//3yef/754t+5rsS0WFCIBdQ9FjC3vv322+ywww6VEr4WxqDcdtttufLKK2t9vrfffjs77rhjpXfVgwYNyqBBg/LCCy/k0ksvbYiiw1yZmzFYs6vNs6su+87L59zcjmmGmsxpLPKcLAhjLmdXl/sd5ta8vN4aK941atSo7LDDDnnjjTeyyCKL5M4778wuu+zSgCWn1M3PuWqjR4/OtttuW2mRuLFjx2bs2LF555138uijj2bUqFG54IILGvz7IdF/oLT16NEjDz/8cIYOHZoJEyZUGsNSfn7T2LFj89prr1VaTLv8PuUThjXUvMUFxYI5I5sG8/rrr+f222/PMsssk9/85jfp1q1bZs6cmcceeyx/+ctfMmnSpBx99NHZeuuts8QSS9Tp3NOmTUubNm2yyy67ZOutt85aa62VxRZbLF9++WX+85//5OKLL87HH3+cv/zlL1ljjTWqnNhQ3n777ZcPP/wwxx57bPbee++0a9cur732Wv7617/mnXfeyR133JFll102//jHPyod+95776VHjx4ZP358FltssRx//PHZeOONs8IKK+Trr7/OAw88kCuvvDIvv/xydttttwwaNCjNmzev0++94447csghh2TmzJlZZZVVcsIJJ2SdddbJEksskY8++ijXXntt/vWvf+Xaa6/NYostlgsvvLDSOU455ZRigpk111wzp556an7yk59k/PjxufPOO3P11Vcb0Ea9TJ48Ofvvv3++/fbbLLroojnuuOOy1VZbZckll8yUKVPy4YcfZsiQIcXV/mpj4sSJ2WuvvfLoo4+mrKwsF154YU466aQks14477nnnrn11ltz22235R//+EcWWWSRKs9z8803F18S//znP5/r38rCZW7qhvnt5ZdfzkEHHZRp06alZcuWOfnkk7PzzjunZcuW+fe//50///nPOe6447LOOus0yPcdddRRefHFF3PYYYdl3333zdJLL52RI0fmb3/7W1544YW89NJL+dOf/lRl4rchQ4akT58+mT59elq3bp1f//rX2X777dOyZcu88sorOe+883L00Udn3XXXbZCysvDae++989lnn+XEE0/Mrrvumg4dOuTdd9/NSiutlM8++yzbbLNNxo4dW1xBYP/998/iiy+eN998M3//+9/z6quv5p577kmfPn2qHAA5bdq0LLPMMtljjz2y2WabZZVVVkmrVq3yySefZMiQIenXr18mTJiQAw88MMOGDaty4um8vndZODRGW63g4IMPLr7s3HjjjXPyySdn9dVXzxdffJH+/fvnzjvvzDHHHDPH8m277bYZNmxYmjZtmgMPPDA777xzOnfunKlTp+a5557LhRdemC+//DI777xzhg8fXmFwZs+ePdOvX798/vnn+e9//5u11lqrwvnLDzacOXNmnnvuuey+++5V7rPBBhtUCuJMmzYtq6yySvbYY49svPHGWXHFFdOsWbN8/PHHefLJJ3Pdddfl66+/zh577JE33ngjSy65ZB3/itX76quvstNOO2Xo0KFp0aJFbrrppkrJC2B2u+++e7p165Yk6dKlS5LkuOOOq5DIoUOHDklmDc7dYost8sknn6Rly5Y56qij0qNHj6y88sqZMGFCHn/88Vx00UV57733stNOO2XYsGFp165dld87evTo7L777mnSpEn+8pe/ZIsttkiLFi3y/PPP5+yzz864ceNy2mmnZaeddqqyXXfTTTcVkzu0atUqRx11VHbaaacsvfTSmTBhQl577bU88MADeffddysdOy/iMix8DjrooAwYMCDff/997r///hx44IFV7jdt2rRivG2HHXZIp06ditsaIp73q1/9qphgZt11180pp5yS9dZbL99++23uvffeXH755XOM582cOTN77713Hn744SRJ7969s++++2aVVVZJkyZN8tJLL+Xvf/
97Ro4cmb333juDBw8uPkc22mijvP7667n//vvTt2/fJMljjz1WaRJS586dK/xNOnTokN122y1bbrllVl999bRp0yajRo3KsGHDcvHFF2fMmDE54YQTstZaa2XrrbcuHlt4hvXr1y+XX355klS5Sv1yyy1X428ur1+/fjn99NOTJEsssUR+97vfZfPNN8/06dPz5JNP5vzzz8/333+fPn36pFOnTtl5552rPdfVV1+dIUOGpEePHjnmmGOyxhprZNy4cbnxxhtz44035quvvsrPf/7zvPDCC7UuHxTUNw718MMPp0+fPpk5c2batm2b3/zmN9l2223TrFmzDBkyJOedd17GjBmT008/Pe3bt89xxx1X57Kdf/75OfXUU5MkP/nJT3Lcccdl9dVXT/v27fP222/n0ksvzQsvvJBzzjknnTp1yi9/+csG+ZtQ+mbMmJHddtutmDBt9dVXzy9+8Yt069YtrVu3zujRozNkyJBKq5wXVj7eeuuts9NOO6VLly5ZYokl8t133+WDDz7I1VdfnRdeeCFPPPFEjj/++CoTh5b3u9/9Lq+99lp23XXX9OnTJyuttFK++OKLSolaoaA+baSCCRMmZK+99sqkSZPy+9//Ptttt11at26d119/Pcsss0xxv+nTp2eTTTZJr1690rVr1yy11FLFOM6AAQPy6KOPZvjw4dl///0rTbQstJ/69u2b+++/P8suu2wee+yxSmUr7LfDDjtk1KhR2W233fKnP/2puE9dBki99dZb6d69eyZMmJAk2WOPPbL//vtnlVVWyfTp0/POO+/k8ccfr1f86bXXXismmOncuXNOOOGErL/++unYsWO+++67vP3223nmmWcqJWasjXHjxmXHHXcsJpg55JBDcuCBB2aJJZbIe++9l4suuihXXXVVpcSs1WnM8RFQ3/ZiQ8ZNXn755QwYMCArrLBCTjnllHTr1i3Tpk2rMCHmkEMOyahRo9K8efMK8Z0ZM2bk008/zYsvvljts6Ah27aPPfZYXnrppXTp0iW/+tWv0qVLl/zwww+59957c9FFF2XixIk55JBD8u6776ZFixb1+H+EhVljjSOoT4y0adOm6dOnT84777w88cQT+fTTT7P88stXef6HH344X375ZZLK42fq2zavjbmJTUFt7bPPPvn0009z2mmnZccdd0zr1q3z/PPP54wzzsi3336bI444It26dUuvXr3yww8/5Nxzz02PHj3SvHnzPProozn33HMzefLk9OnTp1LdMH369PTq1asY8+vRo0dOOOGEdO7cOaNGjcp1112X++67L2+99Va22WabjBgxovi+sT59h4YYu1AwcuTIHHzwwWndunX+8Ic/pHv37mnatGlefvnlYhnr25Y/8sgjc+edd2bcuHG59957c8ABB1RZhq+++ioPPfRQkllx95YtW1bYfs455+SPf/xjkqR9+/b5xS9+ka222iqLL754xo0bl2HDhuWee+5JWVlZtb+zOmJaNCSxgMoaMxbQEPbff/9igpnNN988J554YlZbbbV89dVXGTBgQG6++eYce+yxtTrXxIkT07t373z99dfp27dvtt1227Rt2zbDhw/PWWedlU8//TSXXXZZevfunR122KF43PXXX5/vv/++2r93Ure/OQufuj575nYMVnm1fXbVZd+5ec7V1dyMaYY5qWkscm0sCGMuy6vL/Q4F9W0fz8vrrbHiXe+//3622267fPjhh1lsscXy4IMPSmZOnc3PuWonnnhiMcHMwQcfnD333DPLLrtsmjZtmtGjR+eVV16p13tXFj6NXRfoP/BjVUgKM23atDz//PPZcccdK2yf/XodOHBgpSQzhX2WWmqpCvOjGmLe4oJEkpkSN2zYsGy44YZ5+umns9hiixU/33TTTbPaaqvl4IMPzrfffpsBAwbk5JNPrtO5119//Xz66adp3759pW077LBDTjjhhPTq1StPPPFEzjrrrBx66KFp2rRpted7+eWXc8stt1
R42dWtW7fss88+6d69e1599dVcfPHFOeKII7LeeutVOPawww7L+PHj07Vr1zz++OMVJpMkyfbbb59evXpll112yb///e/079+/TivPjxkzJkcffXRmzpyZn//857nyyisrZCLcYIMNsueee+b3v/99/vznP+eiiy7KMccckzXXXLO4z+uvv55LLrmkuP+zzz5bYfLkNttsk5/97Gc57LDDal0uKBg8eHBx5ZRbbrklvXr1qrB90003zQEHHJB//OMfFVbtq864cePSq1evDB48OE2bNs0111yTPn36VNjnyCOPzK233prx48fn3nvvrXayV2EllJ/+9KfZYIMN6vHrWJjNTd0wv/3iF7/ItGnT0rx58zz++OMVAlcbb7xx9txzz2y66aa1Hhg9J0OGDMlNN92Ugw8+uPjZBhtskJ122indunXLG2+8kauvvjrnnHNOpWy6xx9/fKZPn56WLVvm6aefziabbFKhrHvvvXc222yzDB8+vEHKysLrjTfeyCOPPJLtt9+++NmGG26YZNags7FjxyaZNXH0iCOOqLDPvvvum5122inPPPNMbr/99hx22GHZaaedKpz/yCOPzBlnnFFp8PUGG2yQ3XbbLSeeeGI23XTTfPbZZ/nzn/+cm266qVIZ5/W9y8KhodtqBQ8//HAxiLzzzjvn/vvvr/CM33nnnXP22WfnjDPOqPE8Z599doYNG5b27dvnySefLN6XBVtssUUOOuigbLbZZhk9enROP/303HzzzcXtPXr0KP574MCBFYIoI0eOzEcffZSysrLssssueeihhzJw4MAKSWamT59eXHWwfJbfgrPOOiurrLJKpYGL3bp1y1577ZVf/OIX+dnPfpavvvoql1xySc4555waf29tffLJJ9luu+3y9ttvp3Xr1rnnnnsqDEaC6rRv375SrGTJJZessr36y1/+Mp988klWWmmlPPPMM5UC+T179iy2fT/44IP87W9/y7nnnlvl977zzjtZaaWVMnjw4ArJHzbaaKNstNFG2XLLLTNt2rRcddVVueiiiyocO3r06OIAviWXXDJPPfVUpfJ27949xx9/fD755JNK393YcRkWTttuu22WXHLJfPnll7nllluqjTs8+eSTxck5Bx10UPHzhojnDR8+PFdccUWSZLPNNstTTz1VIcnu1ltvnR49emSfffap8bdcc801efjhh9O8efM88MADlV5YbLrppjnkkEPSvXv3/Oc//8lJJ52U559/Psmsga3rrbdeXnnlleL+a6yxRo2rTe6000458MAD07p16wqfd+3aNbvsskt++ctfZsstt8xrr72WM844o0KSmcIzrHzStrnpb3/11Vf57W9/myRZdtll8+KLL2aFFVYobt98882z6667pnv37vn+++9z9NFH58MPP6x2QuWQIUNy1FFH5corr6zQNthmm23SokWLXHPNNXnxxRczfPjwdO3atd7lZuFUnzjU1KlTi8+atm3bZtCgQRVeOG666abZa6+9im3pU045Jfvss0+lurImb775Zn7/+98nSc4444ycccYZFa7/DTfcMPvvv38OO+ywDBgwIL///e9zyCGHFJPaQU0uvfTS4qC+PfbYI7feemulyWa77LJLzjnnnIwePbr4WdOmTfP2229ntdVWq3TOHj165PDDD88ZZ5yRs88+OzfddFP69u2b1VdfvdpyvPbaa+nbt2+D9ecoffVpIxV8/fXXadu2bZ5//vn89Kc/LX6+0UYbVdjv6aefrvK6/dnPfpaDDjoo119/fX7+85/n2WefzVNPPZVtttmmuE+hnij0DZs3b15lm6rwWaHt0759+3q3vQ4++OBMmDAhTZo0yc0335z999+/wvZNNtkkhxxySL7++utK7cQ5ueuuuzJjxoy0adMmL7zwQpZaaqkK27t3754jjzwy48ePr/O5zzrrrGLc7J///Gd+9atfFbdtuOGG2XvvvbPXXnvVeiBlY46PgPq+t2zIuMmbb76ZLl265L
\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display_issues(class_issues, pred_probs=pred_probs, labels=labels, top=3, class_names=SYNTHIA_CLASSES)" + ] + }, + { + "cell_type": "markdown", + "id": "1759108b", + "metadata": {}, + "source": [ + "### Get label quality scores\n", + "\n", + "Cleanlab can provide an overall label quality score for each image to estimate our confidence that it is correctly labeled. These scores range from 0 to 1, such that lower scores indicate images more likely to contain some mislabeled pixels.\n", + "\n", + "**Note:** To automatically estimate *which* pixels are mislabeled (and the number of label errors) rather than ranking the images, use `find_label_issues()` instead.\n", + "\n", + "The label quality scores are most useful if you only have time to review a limited number of images and want to prioritize which ones to look at, or if you're specifically aiming to detect label errors with high precision (or high recall) rather than estimating the overall set of mislabeled images and pixels." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "db0b5179", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:26.288290Z", + "iopub.status.busy": "2024-06-25T23:07:26.288107Z", + "iopub.status.idle": "2024-06-25T23:07:27.688143Z", + "shell.execute_reply": "2024-06-25T23:07:27.687506Z" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "853bac5cbd4f4898be61293580d9bfb3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "images processed using softmin: 0%| | 0/30 [00:00" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "
x++OEAnHDCCWyzzTacffbZ3HfffVx88cXKOh15fDONl8WN3+/H7/d7as/n82V8DBgM3Z0999yTZcuW0adPn0TaaaedxqRJk7jsssvaHSy74oorGDt2LP/9739T5tI1a9ZslcwGd7bZZpsk2+yVhoYGioqKUtIz4RttrW0KhUIsXLgw5e44y7I6ZN4vKytLOoannXYao0eP5pZbbuGqq67ybJfaw8yZM6moqOC+++7jqquu6rB+2oN5DNOQc1x//fXU19dzzz33pATKwA6MnX322QwaNCiRJu9ZNn78+KQofZxoNMqAAQMSzno87aabbmLcuHEUFBRQU1PDqaeeysaNG5PqxvdBeeONN5gyZQoFBQUMHz6cv/71r5lQG7B/DZs9ezb9+/cnPz+fESNG8Nvf/pZIJKIs/8EHH7DrrrtSWFjIsGHDuPPOO1PKNDc3c/nllzNy5Ejy8/MZNGgQF110Ec3Nza6yhEIhrrzySkaNGkVBQQG9e/dmt9124/nnn8+IrgaDwTsLFy6koqKC2bNnc/jhh7Nw4cKUMvFHCX//+99z0003MWLECPLz8/niiy8A+1G4HXfckYKCAkaMGMFdd92l3O8x/sj4gw8+yNixYyksLGTq1Kl8+umnANx1112MHDmSgoICZsyYkfK43+uvv84RRxzB4MGDE3POeeedR2NjY6LMmjVrqKqqYsaMGQghEunffPMNxcXFHHXUUYk0r3NYc3Mz5513HlVVVfTq1YuDDz6YH3/8sX0HPMZee+0F2MFKaH2E6dVXX+WMM86gurqagQMHJso//fTT7L777hQXF9OrVy9mz57N559/ntLuY489xvjx4ykoKGD8+PE8+uijyv5Ve5YtX76ck046KWEnhg0bxumnn05LSwv33nsvRxxxBGAvGOOPXMT3vFLtWbZmzRpOOukkampqKCgoYOLEidx3331JZZxj649//GNibO2000689957no+njHx84+Pxiy++4Nhjj6WiooLddtsNsBc3v/3tbxN9Dx06lEsuuURry5577jkmTZpEQUEBY8eO5ZFHHknK37BhAxdeeCETJkygpKSE0tJS9t9/fz755BNle5FIhEsuuYS+fftSXFzMwQcfzA8//JBURrVnmYy8Z9nQoUP5/PPPefXVVxPnK36OdHuWvfPOO+y3336UlZVRVFTE9OnTefPNN5PKbN68mXPPPZehQ4eSn59PdXU1++yzDx9++KGrfIaex7x58ygpKWHZsmUceOCBlJSUMGDAAG677TYAPv30U/baay+Ki4sZMmRIyiOHbbmWli5dysEHH0xxcTHV1dWcd955PPvss+0e5yrGjRuXFCgDyM/P54ADDuDHH39k8+bNifRQKMSiRYtYuXJl2na//fZbdtppJ2WApbq6OvFZd93q9qKK21qnPVDNJevXr+e4446jtLSU8vJy5s6dyyeffKJsc9GiRRx++OFUVlZSUFDAjjvuyBNPPJFUJp2fP2/evMQYcD7CF8fr+kkIwdVXX83AgQMpKipizz33VNrFrSW+n/MHH3zAHnvsQVFREZdcckla3+ill15K2O3y8nIOOeQQvvzyy6S23WzTqlWrOOGEExg4cCD5+fn069ePQw45JO1WCG+88Qbr1q1j5syZSemqcRK/RpcvX86cOXMoKSmhqqqKCy+8ULtGTEdBQQE77bQTmzdvTgn2/v3vf2fy5MkUFhZSWVnJ0UcfnWLvgIQ/UFhYyJQpU3j99deVfQWDQWbMmMHjjz/eLlk7EnOrjSHneOqppxg5cuRW7ctz1FFHccUVV7Bq1Sr69u2bSH/jjTdYsWIFRx99dCLt1FNP5d577+WEE07g7LPPZsmSJdx666189NFHvPnmm0m/FH/zzTccfvjhnHTSScydO5e//OUvzJs3j8mTJzNu3Lh2yxvn3nvvpaSkhPPPP5+SkhJeeuklLrvsMurq6rjhhhuSym7cuJEDDjiAI488kmOOOYZ//etfnH766eTl5XHiiS
cCtiE7+OCDeeONNzjllFMYM2YMn376Kf/3f//H119/zWOPPaaV5YorrmDBggWcfPLJTJkyhbq6Ot5//30+/PDDpI0oDQZDx7Nw4UIOPfRQ8vLyOOaYY7jjjjt477332GmnnVLK3nPPPTQ1NXHKKaeQn59PZWUlH330Efvttx/9+vXjyiuvJBKJcNVVV1FVVaXs7/XXX+eJJ55g/vz5ACxYsIADDzyQiy66iNtvv50zzjiDjRs3cv3113PiiSfy0ksvJeo++OCDNDQ0cPrpp9O7d2/effddbrnlFn788UcefPBBwF5Y3HHHHRxxxBHccsstnH322USjUebNm0evXr24/fbbgbbNYSeffDJ///vfOfbYY9l111156aWXmD179lYd92+//RaA3r17J6WfccYZVFVVcdlll1FfXw/A3/72N+bOncusWbO47rrraGho4I477mC33Xbjo48+Six8nnvuOQ477DDGjh3LggULWL9+fcLRTseKFSuYMmUKmzZt4pRTTmH06NEsX76chx56iIaGBvbYYw/OPvts/vCHP3DJJZcwZswYgMRfmcbGRmbMmME333zDmWeeybBhw3jwwQeZN28emzZt4pxzzkkqf//997N582ZOPfVULMvi+uuv59BDD+W7775r1yMjuuN7xBFHMGrUKK699tpEMPXkk0/mvvvu4/DDD+eCCy7gnXfeYcGCBXz55ZcpwcbFixdz1FFHcdpppzF37lzuuecejjjiCJ555pmE/fruu+947LHHOOKIIxg2bBirV6/mrrvuYvr06XzxxRcpj4Vec801WJbFL3/5S9asWcNNN93EzJkz+fjjjyksLGyz7nFuuukmzjrrLEpKSvj1r38NQE1Njbb8Sy+9xP7778/kyZO5/PLL8fl83HPPPey11168/vrrTJkyBbDvGnjooYc488wzGTt2LOvXr+eNN97gyy+/ZIcddmi3vIbuSSQSYf/992ePPfbg+uuvZ+HChZx55pkUFxfz61//mp/+9Kcceuih3HnnnRx//PFMnTqVYcOGAd6vpfr6evbaay9WrlzJOeecQ9++fbn//vt5+eWXU+TxOs7bwqpVqxJ7IMdZvnw5Y8aMYe7cuWn3DYw/8fLjjz96mq+98O9//5ujjjqKCRMmsGDBAjZu3MhJJ53EgAEDkspFo1EOOugg3n33XU4//XRGjx7N448/zty5c1Pa/Pzzz5k2bRoDBgzgV7/6FcXFxfzrX/9izpw5PPzww/zkJz8B0vv5p556KitWrOD555/nb3/7W0o/XtdPl112GVdffTUHHHAABxxwAB9++CH77rsvLS0tno9TU1MT69atS0kvLS1NCl6uX7+e/fffn6OPPpqf/exnSXOpyjd64YUX2H///Rk+fDhXXHEFjY2N3HLLLUybNo0PP/wwJWCpsk2HHXYYn3/+OWeddRZDhw5lzZo1PP/88yxbtsz1x5O33noLy7LYfvvtPR2DSCTCrFmz2Hnnnfn973/PCy+8wI033siIESM4/fTTPbUhEw/MlZeXJ9KuueYaLr30Uo488khOPvlk1q5dyy233MIee+zBRx99lCj75z//mVNPPZVdd92Vc889l++++46DDz6YysrKpBta4kyePJnHH3+curo6SktL2yVvhyAMhhyitrZWAGLOnDkpeRs3bhRr165N/GtoaEjkXX755cI53L/66isBiFtuuSWpjTPOOEOUlJQk6r7++usCEAsXLkwq98wzz6SkDxkyRADitddeS6StWbNG5OfniwsuuCCtboCYP3++axmnTnFOPfVUUVRUJJqamhJp06dPF4C48cYbE2nNzc1i0qRJorq6WrS0tAghhPjb3/4mfD6feP3115PavPPOOwUg3nzzzST95s6dm/g+ceJEMXv27LR6GQyGjuX9998XgHj++eeFEEJEo1ExcOBAcc455ySVW7JkiQBEaWmpWLNmTVLeQQcdJIqKisTy5csTaYsXLxaBQEDIrgIg8vPzxZIlSx
Jpd911lwBE3759RV1dXSL94osvFkBSWdU8tmDBAmFZlli6dGlS+jHHHCOKiorE119/LW644QYBiMceeyyR73UO+/jjjwUgzjjjjKRyxx57rADE5ZdfniKTk/ixu/LKK8XatWvFqlWrxCuvvCK23357AYiHH35YCCHEPffcIwCx2267iXA4nKi/efNmUV5eLn7+858ntbtq1SpRVlaWlD5p0iTRr18/sWnTpkTac889JwAxZMiQpPqy7Mcff7zw+XzivffeS9EhGo0KIYR48MEHBSBefvnllDLTp08X06dPT3y/6aabBCD+/ve/J9JaWlrE1KlTRUlJSeJcx49P7969xYYNGxJlH3/8cQGIJ598MqUvJy+//LIAxF/+8hexdu1asWLFCvHvf/9bDB06VFiWldAnbsuPOeaYpPrx83vyyScnpV944YUCEC+99FIiLW6r4+dMCNu36Nevn9h+++0TaU1NTSISiSS1t2TJEpGfny+uuuqqFNkHDBiQNPb/9a9/CUDcfPPNibS5c+emPYfxMeS8ZsaNG5d0XuS+4+cyGo2KUaNGiVmzZiXOtxD2NTds2DCxzz77JNLKysrS+hyGnkd8/DnnkLlz5wpAXHvttYm0jRs3isLCQmFZlnjggQcS6YsWLUoZ016vpRtvvDFljm9sbBSjR49u9zj3yuLFi0VBQYE47rjjUuQEkvxfHX/+858FIPLy8sSee+4pLr30UvH666+n6C5ft3Jf99xzTyJtwoQJYuDAgWLz5s2JtFdeeSXFHjz88MMCEDfddFMiLRKJiL322iulzb333ltMmDAhad0QjUbFrrvuKkaNGpVI8+Lnz58/P8VHEML7+mnNmjUiLy9PzJ49O+lcXnLJJZ6PO6D9949//CNRLr42uvPOO5Pqu/lG8XXT+vXrE2mffPKJ8Pl84vjjj0+k6WzTxo0bBSBuuOGGtHrI/OxnPxO9e/dOSVeNk/g16ryehBBi++23F5MnT07b1/Tp08Xo0aMTa+hFixaJX/ziFwJIGgPff/+98Pv94pprrkmq/+mnn4pAIJBIb2lpEdXV1WLSpEmiubk5Ue6Pf/yjAJT27P777xeAeOedd9LK25mYxzANOUVdXR0AJSUlKXkzZsygqqoq8S9+a7CKbbbZhkmTJvHPf/4zkRaJRHjooYc46KCDEr8CP/jgg5SVlbHPPvuwbt26xL/JkydTUlKS8mvX2LFj2X333RPfq6qq2HbbbTP2Ni/nr9ObN29m3bp17L777jQ0NLBo0aKksoFAgFNPPTXxPS8vj1NPPZU1a9bwwQcfJPQbM2YMo0ePTtIv/uiL6te8OOXl5Xz++ecsXrw4I7oZDIb2sXDhQmpqahKPlluWxVFHHcUDDzygvP3+sMMOS7pjLBKJ8MILLzBnzpyku2VGjhzJ/vvvr+xz7733TvpFNH6n72GHHUavXr1S0p1zoHMeq6+vZ926dey6664IIfjoo4+S+rn11lspKyvj8MMP59JLL+W4447jkEMOSeR7ncP+85//APYG8k7OPfdcpX46Lr/8cqqqqujbty8zZszg22+/5brrruPQQw9NKvfzn/88aX+P559/nk2bNnHMMcckyen3+9l5550Tcq5cuZKPP/6YuXPnUlZWlqi/zz77MHbsWFfZotEojz32GAcddBA77rhjSr78OK0X/vOf/9C3b1+OOeaYRFowGOTss89my5YtvPrqq0nljzrqKCoqKhLf4/bQqw088cQTqaqqon///syePZv6+nruu+++FH3kPVzi5/f8889PSr/gggsA++4MJ/3790/cPQH23QfHH388H330EatWrQLsx7Lie59GIhHWr19PSUkJ2267rfJRxeOPPz5p7B9++OH069cvIVtn8PHHH7N48WKOPfZY1q9fnxhn9fX17L333rz22muJFwyUl5fzzjvvsGLFik6Tz5DbnHzyyYnP5eXlbLvtthQXF3PkkUcm0rfddlvKy8uTrnmv19IzzzzDgAEDOP
jggxNpBQUF/PznP0+Soy3j3AsNDQ0cccQRFBYW8rvf/S4pb+jQoQghPL2N9sQTT+SZZ55hxowZvPHGG/z2t79l9913Z9SoUbz11lue5YmzYsUKPv30U44//vikdc/06dOZMGFCUtlnnnmGYDCYdKx8Pl/i7u84GzZs4KWXXuLII49MrCPWrVvH+vXrmTVrFosXL2b58uXA1vn5XtdPL7zwAi0tLZx11llJNqqttvmQQw7h+eefT/knb7mTn5+v3ZNO9o3i9njevHlUVlYm0rfbbjv22Wcf5dwu26bCwkLy8vJ45ZVXUh4/Tcf69euT7KkX5P533313z/Z30aJFiTX06NGjueGGGzj44IOTxv4jjzxCNBrlyCOPTDqvffv2ZdSoUYnz+v7777NmzRpOO+20pDv75s2bl+TbOInrqrpDsCsxj2Eacoq4I7ply5aUvLvuuovNmzezevVqT5s8HnXUUVxyySUsX76cAQMG8Morr7BmzZqkvXAWL15MbW1t0l4DTuRnuAcPHpxSpqKios0TpI7PP/+c3/zmN7z00kuJwGGc2trapO/9+/dP2Vgy/laY77//nl122YXFixfz5Zdfah+1ctuQ9KqrruKQQw5hm222Yfz48ey3334cd9xxbLfddu1RzWAwtINIJMIDDzzAnnvumdjXCewg1Y033siLL77Ivvvum1Qn/mhMnDVr1tDY2Kh8e6bujZryXBd3fuRb6+Ppzjlw2bJlXHbZZTzxxBMpc6M8j1VWVvKHP/yBI444gpqaGv7whz8k5Xudw5YuXYrP50t5y9K2226rrKfjlFNO4YgjjsDn81FeXs64ceOS3loYRz7G8cVGPIgnE3/kYOnSpQCMGjUqpYwuSBNn7dq11NXVMX78eG/KeGDp0qWMGjUq6YU50PrYZlzeOPK4iDu/Xm3gZZddxu67747f76dPnz6MGTNG+XIe+fjGz688Xvv27Ut5eXmKnCNHjkwJHjrtY9++fYlGo9x8883cfvvtLFmyJCnwLD8WCqnnzLIsRo4cmXZfmkwSH2eqR6/i1NbWUlFRwfXXX8/cuXMZNGgQkydP5oADDuD4449n+PDhnSWuIYcoKChImWfLysoYOHBgyrVUVlaWdM17vZaWLl3KiBEjUtqTr+u2jPN0RCIRjj76aL744guefvrprX7r7qxZs5g1axYNDQ188MEH/POf/+TOO+/kwAMPZNGiRdr1hIr4vKWzzU57sHTpUvr165eyWb1c95tvvkEIwaWXXsqll16q7HfNmjUMGDBgq/x8r+snnc2rqqpqU6Bo4MCBKXt7qRgwYIB2036VXQG1nzBmzBieffbZlE385Tby8/O57rrruOCCC6ipqWGXXXbhwAMP5Pjjj0/aBkiHcOzZmg7VNdqWNejQoUMTb2z+9ttvueaaa1i7dm3SywQWL16MEELpo0DrS2x05zUYDGptTFzX9vyw15GYYJkhpygrK6Nfv3589tlnKXnxOxi8OqZHHXUUF198MQ8++CDnnnsu//rXvygrK2O//fZLlIlGo1RXVys3ywZSJiXdm0LaMtnp2LRpE9OnT6e0tJSrrrqKESNGUFBQwIcffsgvf/nLNv2KFicajTJhwgT+3//7f8p81TPlcfbYYw++/fZbHn/8cZ577jnuvvtu/u///o8777wz6ddHg8HQcbz00kusXLmSBx54gAceeCAlf+HChSnBsq3ZPymObq5LNwdGIhH22WcfNmzYwC9/+UtGjx5NcXExy5cvZ968ecp57NlnnwXsgMuPP/6YtHfG1sxh7WHUqFGeHHL5GMf1+tvf/qZ0kLvL25q31gZOmDChXcc3Tiad7GuvvZZLL72UE088kd/+9rdUVlbi8/k499xz22VvO4O4XDfccAOTJk1SlonfoXLkkUey++678+ijj/Lcc89xww03cN111/HII49o7yg19FzaO+dD5q+ltozzdPz85z/nqaeeYuHChdofM9pDUVERu+++O7vvvjt9+vThyi
uv5Omnn2bu3Lnaeaq9G7G3hfixu/DCC5k1a5ayTDzAtjV+flvXT52Fm/+TCd9I1ca5557LQQcdxGOPPcazzz7LpZdeyoIFC3jppZdc9yPr3bt3m2622Nq3VRYXFyfZ32nTprHDDjtwySWXJH6ojEajWJbF008/rezP63WnIq6r/PKNrqZ7eGeGHsXs2bO5++67effdd9u1gWecYcOGMWXKFP75z39y5pln8sgjjzBnzpykuwRGjBjBCy+8wLRp0zIyiW4Nr7zyCuvXr+eRRx5hjz32SKQ77yZxsmLFipRfPL7++muAxONTI0aM4JNPPmHvvfdu1yKjsrKSE044gRNOOIEtW7awxx57cMUVV5hgmcHQSSxcuJDq6mrlY+ePPPIIjz76KHfeeafr/FVdXU1BQQHffPNNSp4qbWv49NNP+frrr7nvvvs4/vjjE+m6t+g+88wz3H333Vx00UUsXLiQuXPn8s477ySCS17nsCFDhiR+LXX+SvzVV19lSDN34ne0VVdXuwaDhgwZAqB87CWdrFVVVZSWlip/THLSlrl+yJAh/O9//yMajSbdXRZ/7D8ub1cTP7+LFy9OelnB6tWr2bRpU4qc8bsrnMdCto8PPfQQe+65J3/+85+T6m7atEnpzMvnTAjBN998k5G7rb2es/g4Ky0t9RR07NevH2eccQZnnHEGa9asYYcdduCaa64xwTJDRvF6LQ0ZMoQvvvgi5dqU7VBbx7mOX/ziF9xzzz3cdNNNSY+aZ5r4Y+TxN2rG75jatGlTUjn5Dtj4vOXFNg8ZMoSXX36ZhoaGpLvL5HLxu3qCwaCnY5fOz9fNTV7XT06b57zjaO3atRl7Kqe9xGVT2d5FixbRp0+flCd4dIwYMYILLriACy64gMWLFzNp0iRuvPFG/v73v2vrjB49moULF1JbW6t9dLEj2W677fjZz37GXXfdxYUXXsjgwYMZMWIEQgiGDRuWuBtbhfO8OoPQoVCIJUuWMHHixJQ6S5YswefzubbbFZg9yww5x0UXXURRUREnnngiq1evTslvy11cRx11FP/973/5y1/+wrp165IewQT7l9dIJMJvf/vblLrhcDjF0HUk8Qi+U7+WlpbEW+FkwuEwd911V1LZu+66i6qqKiZPngzY+i1fvpw//elPKfUbGxsTb3FTsX79+qTvJSUljBw5kubmZu9KGQyGdtPY2MgjjzzCgQceyOGHH57y78wzz2Tz5s0pr4OX8fv9zJw5k8ceeyxp/6JvvvmGp59+OqMyq+YxIQQ333xzStlNmzYl3sJ17bXXcvfdd/Phhx9y7bXXJsp4ncPii3/5Mc6bbrppq3XywqxZsygtLeXaa68lFAql5K9duxawgxeTJk3ivvvuS3ok9fnnn0+8xl6Hz+djzpw5PPnkk7z//vsp+fFjHnfuvdivAw44gFWrViXt7xkOh7nlllsoKSlh+vTpadvoDA444AAg9XzG7ziU33q6YsWKpDdk1tXV8de//pVJkyYl7vzz+/0p/sSDDz6Y2M9H5q9//SubN29OfH/ooYdYuXJlRgJPxcXFns7X5MmTGTFiBL///e+V21XEx1kkEkl55Lm6upr+/fsbG27IOF6vpVmzZrF8+fIkm9XU1JQyv3sd527ccMMN/P73v+eSSy5Jeauvk1AoxKJFixKBLjdefPFFZXp8b6v4DzVDhgzB7/fz2muvJZWT/fn+/fszfvx4/vrXvybp+eqrr/Lpp58mlZ01axahUCjpWEWj0ZQf0qqrq5kxYwZ33XWXUifnsfPi5+vsidf108yZMwkGg9xyyy1JY6SzbLMbTnvs1O+zzz7jueeeS9gdNxoaGmhqakpKGzFiBL169Uo7106dOhUhRGKf6a7goosuIhQKJWzpoYceit/v58orr0y5poUQiTGz4447UlVVxZ133pn0VtN7771Xa8s++OADxo0b1yWBQTfMnWWGnGPUqFHcf//9HHPMMWy77bb89Kc/ZeLEiQghWLJkCf
fffz8+n8/Ta5uPPPJILrzwQi688EIqKytTfmWZPn06p556KgsWLODjjz9m3333JRgMsnjxYh588EFuvvlmDj/88Izp9v7773P11VenpM+YMYNdd92ViooK5s6dy9lnn41lWfztb3/TBgf79+/Pddddx/fff88222zDP//5Tz7++GP++Mc/Jp4pP+644/jXv/7Faaedxssvv8y0adOIRCIsWrSIf/3rXzz77LPKjaLBfpnBjBkzmDx5MpWVlbz//vuJ19AbDIaO54knnmDz5s1JmyE72WWXXaiqqmLhwoUpPwTIXHHFFTz33HNMmzaN008/nUgkwq233sr48eP5+OOPMybz6NGjGTFiBBdeeCHLly+ntLSUhx9+WPkL8jnnnMP69et54YUX8Pv97Lfffpx88slcffXVHHLIIUycONHzHDZp0iSOOeYYbr/9dmpra9l111158cUXM37nnI7S0lLuuOMOjjvuOHbYYQeOPvpoqqqqWLZsGf/+97+ZNm0at956KwALFixg9uzZ7Lbbbpx44ols2LCBW265hXHjxikXhk6uvfZannvuOaZPn84pp5zCmDFjWLlyJQ8++CBvvPEG5eXlTJo0Cb/fz3XXXUdtbS35+fnstddeyr1lTjnlFO666y7mzZvHBx98wNChQ3nooYd48803uemmm5I2tO9KJk6cyNy5c/njH/+Y2LLg3Xff5b777mPOnDkpmzxvs802nHTSSbz33nvU1NTwl7/8hdWrV3PPPfckyhx44IFcddVVnHDCCey66658+umnLFy4ULvfSmVlJbvtthsnnHACq1ev5qabbmLkyJEpm5O3h8mTJ3PHHXdw9dVXM3LkSKqrq5WPjPl8Pu6++272339/xo0bxwknnMCAAQNYvnw5L7/8MqWlpTz55JNs3ryZgQMHcvjhhzNx4kRKSkp44YUXeO+997jxxhu3Wl6DwYnXa+nUU0/l1ltv5ZhjjuGcc86hX79+LFy4MLFnUvwuJq/jXMejjz7KRRddxKhRoxgzZkzK3T377LMPNTU1ACxfvpwxY8Ywd+7ctJv8H3LIIQwbNoyDDjqIESNGUF9fzwsvvMCTTz7JTjvtxEEHHQTYW8occcQR3HLLLViWxYgRI3jqqaeU+wRfe+21HHLIIUybNo0TTjiBjRs3Jmyz0x7MmTOHKVOmcMEFF/DNN98wevRonnjiCTZs2JB07ABuu+02dtttNyZMmMDPf/5zhg8fzurVq3n77bf58ccf+eSTTwBvfn78x/ezzz6bWbNm4ff7Ofrooz2vn6qqqrjwwgtZsGABBx54IAcccAAfffQRTz/9dJsex/v666+Vd2nV1NSwzz77eG5H5oYbbmD//fdn6tSpnHTSSTQ2NnLLLbdQVlbGFVdc4UmuvffemyOPPJKxY8cSCAR49NFHWb16NUcffbRr3d12243evXvzwgsvZPQR4bYwduxYDjjgAO6++24uvfRSRowYwdVXX83FF1/M999/z5w5c+jVqxdLlizh0Ucf5ZRTTuHCCy8kGAxy9dVXc+qpp7LXXntx1FFHsWTJEu655x6lDQ2FQrz66qucccYZXaBlGjrprZsGQ8b55ptvxOmnny5GjhwpCgoKRGFhoRg9erQ47bTTxMcff5xUNv5KXxXTpk1TvnLeyR//+EcxefJkUVhYKHr16iUmTJggLrroIrFixYpEmSFDhihfsTx9+nTlK3JlcHn18W9/+1shhBBvvvmm2GWXXURhYaHo37+/uOiii8Szzz6b8grq6dOni3Hjxon3339fTJ06VRQUFIghQ4aIW2+9NaXflpYWcd1114lx48aJ/Px8UVFRISZPniyuvPJKUVtbm6Sf8xXOV199tZgyZYooLy9PHPtrrrlGtLS0pNXVYDBsPQcddJAoKCgQ9fX12jLz5s0TwWBQrFu3LvG6cd0rzF988UWx/fbbi7y8PDFixAhx9913iwsuuEAUFBQklQPE/Pnzk9J0bb/88ssCEA8++GAi7Y
svvhAzZ84UJSUlok+fPuLnP/+5+OSTT5Jehf74448LQNx4441J7dXV1YkhQ4aIiRMnJuYar3NYY2OjOPvss0Xv3r1FcXGxOOigg8QPP/wgAHH55Zdrj6GbfjL33HOPAMR7772nzH/55ZfFrFmzRFlZmSgoKBAjRowQ8+bNE++//35SuYcffliMGTNG5Ofni7Fjx4pHHnlEzJ07VwwZMiSpnEr2pUuXiuOPP15UVVWJ/Px8MXz4cDF//vyk17f/6U9/EsOHDxd+vz/Jfqjs1erVq8UJJ5wg+vTpI/Ly8sSECROSXlmf7vh4Ob6qcaIibsvXrl2bkhcKhcSVV14phg0bJoLBoBg0aJC4+OKLRVNTU1K5uK1+9tlnxXbbbSfy8/PF6NGjU/puamoSF1xwgejXr58oLCwU06ZNE2+//XbKMYrL/o9//ENcfPHForq6WhQWForZs2eLpUuXJrXp5RzGx9CSJUsSaatWrRKzZ88WvXr1EkCi/3jfTvsvhBAfffSROPTQQ0Xv3r1Ffn6+GDJkiDjyyCPFiy++KIQQorm5WfziF78QEydOFL169RLFxcVi4sSJ4vbbb3c5+oaegGoOmzt3riguLk4pG/c1ZWR/2Ou1JIQQ3333nZg9e7YoLCwUVVVV4oILLhAPP/ywAMR///vfpLLpxrmO+Dyi++e8nuJzm9P/1fGPf/xDHH300WLEiBGisLBQFBQUiLFjx4pf//rXoq6uLqns2rVrxWGHHSaKiopERUWFOPXUU8Vnn32WZAfjPPDAA2L06NEiPz9fjB8/XjzxxBPisMMOE6NHj05p89hjjxW9evUSZWVlYt68eeLNN98UgHjggQeSyn777bfi+OOPF3379hXBYFAMGDBAHHjggeKhhx5KlPHi54fDYXHWWWeJqqoqYVlWylrLy/opEomIK6+8MjE+ZsyYIT777LOUdYcOt3PpHF+68ZrOvr/wwgti2rRporCwUJSWloqDDjpIfPHFF0lldLZp3bp1Yv78+WL06NGiuLhYlJWViZ133ln861//SquXEEKcffbZYuTIkUp5neNEd426rX+d6I6NEEK88sorKXbq4YcfFrvttpsoLi4WxcXFYvTo0WL+/Pniq6++Sqp7++23i2HDhon8/Hyx4447itdee0153T/99NMCEIsXL04ra2djCZGBnccNBoPBYDB0O+bMmdPuV8cbDAaDwbC13HTTTZx33nn8+OOPDBgwoKvFyQomTZpEVVWVdr/POI899hg/+clPeOONN5g2bVonSWfIFN999x2jR4/m6aefZu+99+5qcTqMOXPmYFlW0vYI2YLZs8xgMBgMBgONjY1J3xcvXsx//vMfZsyY0TUCGQwGg6FHIduhpqYm7rrrLkaNGtUjA2WhUIhwOJyU9sorr/DJJ5+k2Gb52EUiEW655RZKS0vZYYcdOlpUQwcwfPhwTjrpJH73u991tSgdxpdffslTTz2l3N8uGzB3lhkMBoPBYKBfv37MmzeP4cOHs3TpUu644w6am5v56KOPGDVqVFeLZzAYDIZuzv7778/gwYOZNGkStbW1/P3vf+fzzz9n4cKFHHvssV0tXqfz/fffM3PmTH72s5/Rv39/Fi1axJ133klZWRmfffYZvXv3TpQ9+eSTaWxsZOrUqTQ3N/PII4/w1ltvce2113LxxRd3oRYGQ+5iNvg3GAwGg8HAfvvtxz/+8Q9WrVpFfn4+U6dO5dprrzWBMoPBYDB0CrNmzeLuu+9m4cKFRCIRxo4dywMPPJD2JTXdlYqKCiZPnszdd9/N2rVrKS4uZvbs2fzud79LCpQB7LXXXtx444089dRTNDU1MXLkSG655Rbz4i2DYSvI6jvLbrvtNm644QZWrVrFxIkTueWWW5gyZUpXi2UwGAyGboSxNQaDwWDoSIydMRgMhtwja/cs++c//8n555/P5ZdfzocffsjEiROZNWuW8rW6BoPBYDC0B2NrDAaDwdCRGDtjMBgMuUnW3lm288
47s9NOO3HrrbcCEI1GGTRoEGeddRa/+tWvulg6g8FgMHQHjK0xGAwGQ0di7IzBYDDkJlm5Z1lLSwsffPBB0maEPp+PmTNn8vbbbyvrNDc309zcnPgejUbZsGEDvXv3xrKsDpfZYDAYujtCCDZv3kz//v3x+bL2xmTPtNXWGDtjMBgMHYuxM8bOGAwGQ0fSFjuTlcGydevWEYlEqKmpSUqvqalh0aJFyjoLFizgyiuv7AzxDAaDoUfzww8/MHDgwK4WY6tpq60xdsZgMBg6B2NnDAaDwdCReLEzWRksaw8XX3wx559/fuJ7bW0tgwcPZgL2xmwWoHveNP47jXB8F450HPmW5rNbe3KeIFkeZz86GZxtqerKbar6dWvf2b/RUd2enGd0VMuQTTqGxkLTDCBPI4ibACph0x1IVT2VsroDKaM7Map8L3Lq2vCoY7QZvr8DevXqpWm4e6OzM7P3gEDAfYzK6E67l7GOy3ddnXTDpq1yuKXJ/Rgd1fLi8l1Xx+iYfTrWl8APAyEUlISVFUCR72zQ2BljZ9DbmR+A0q4Ty2AwdDW1tV0tQbehrq6OQYMGebIzWRks69OnD36/n9WrVyelr169mr59+yrr5Ofnk5+fn5Luj/2T8epTqHwQnQ8h58n5bsECr4EJXVsqjI76fr32rco3OuaGjuGREJoFvjypgG5R0RYhdOVUfaQ7OCoF5XZ05dxWch2oY3d5FKSttkZnZ4IB+59qoe8WBEh3bbldL6q1pttZ0Q0zXbtu7RgdU/OMjun76W46CqCpEFYMg0ge+NwMJy5pzga9lFP1YexM1pIpO1OKCZYZDD2W7NxiPufxYmeycjOAvLw8Jk+ezIsvvphIi0ajvPjii0ydOnWr24/b+PjhkQ+TpUhzpseHq7Mdp9/h/Ce3p/M9VG2q+pZl0GF0NDr2VB0BIoOhaV8Qzl/6vazsVAdIJYRczlne0tRxpsl1nfmqdCdyH0KR15E6diMyaWvk0wCph1OgPoS68S8PLd3hd14Lch1VH7r2VUNGpZfzs9FR3b/RUU130rG5AJYOgZY8R0FjZ9SyGzvTYWsag8HQzTGBsi4lK+8sAzj//POZO3cuO+64I1OmTOGmm26ivr6eE044oU3tqBbSThuvc7jkRbjzr2qB7sXn0PkBKj9EllPVnqp9o2N6mYyOerqDjpHB0HggREsUjcnKxCu5Of0qIZxtqARJ14bbQkZIefKqTTWp6egIHbsZmbI18vovjhwEcKvnrKNrL/5dt8h3K6MbAqprVNemro5OVl2a0dHo6FYmm3UEO1D2/VBoLFBUUDVo7EyqfHIbxs4YDAZDMiZQ1uVkbbDsqKOOYu3atVx22WWsWrWKSZMm8cwzz6RskOkF3cJfl6ZKl22724JfF0DQ+QKqAIIguT83X8b52ejonm507N46RgZJgTI3JZ2VZeHlgyQrpBJQpaxbmlsZN9pyIlV9bo2O3ZBM2RrdtRHH7TpSrVtVp9F5mlUBgHTDUz7FbmtwXf9GR31Zo6Nal+6ko0ARKDN2JrVPY2eSyOSaxmAw9BBMoCwrsITonmeirq6OsrIyJpK8Z5nsAOH4LueraIvT6JbmRNWXrh+5XZ2Mqr6Mjun7Njrmpo4A0T7QcAREe2kaTqdkew6M1xVZutUXZOZEdbCO0Rb47iZ7w+HSUrN7StzOzNnL3rMsjmo4yKdahds1p7qunJ+9XLte8nWyu8lqdNTXc6bJdYyOuaWjAMJB+G44NBRqhDR2xtiZDBO3M7WYPcsMhh5D9wzPZA2JedWDncnKPcsyjWoRHv8sO0vydzcnytmG7JM4fQBnmtyv7O9YpMqr8o2cdWQ5jY5GRyc9QcdoJTQeFAuUqTqVG1EpKXeSLk1ehDjbsUhVwtknUl1ZNudnlbxdqaMhBd1hVn1WLfzjqA61PJScbbrV8SKTXMdZ1ymv21AyOhode4qO4QAsGxwLlCEVlD8bO2PsjMFgMLQHEyjLKrp9sMzpByD9lR
0l2WmS/QlnWyq7j5QvSPVr5LZk0vknOl2MjkbHnqpjtBIaD4FIlYuAuoWE3LjOWVet7OLpKuHlxYpKId1qTbeQUdGVOhoSqMa+6rOOdMNFt/Z1qyNfWzrcTreqD7mO0TG5XTd5jY65qaMAIgFYNgTq5B9kjJ1Rt2vsjMFgMLQdEyjLOrJ2z7KOQA4WqPKcZeRAgexLqOy5zknT+StyOV37Kl/FrX+jY6pMRkd1O/HPuahjUqBMpYTKGXdbFMj58sFQraKc5dM5+V4XASo5ZRm6SkdDCs5DpDotuvUrJB9iJ6rr0+3U6k6rW7tyObmu3LZKlnTtOusYHY2OchlZdl27crnO1DESsN96WdeL5MaMnTF2xmAwGDKFCZRlJd3+zjInbs5SPF+3cHf6BvI/i2QfwNmeygFz9iP7PrI/4uaYqmQ1Ohode4qOohCaZjkCZfFMleOfDjf7JC8aVALp0uSDl659FaqVXjboaEhCXhuqrjO5LCRfl6p6qiGkuraR8lXrat06WCWfm9xGx+Syzn6Mjt1LR+GDFf0dgTJVY8bOpPaTTgZjZwwGg6EVEyjLWrr9nWXOwIH8XeXkQapP4bbgl9Odfcn9utWVgxwyThnkYIiqL6Oj0dFNTrmvXNIRQBRC42wID1IIYinS0imkWpnJyIq7rRSd+TrlVHnp+nZ+70odDUmoDpszT1fH+VcVRJDr69aYzvLytSmfejnYIffjJfhgdExNNzp2Px2FD5YPgA2VkgBOYY2dMXbGYDAYtgYTKMtqun2wTIXKhrst/FWBAdmhi39WLfJVzpmqLVVfXpxTldxGx2SMjt1DR6A1UDZcI4jbAkInvOywy0LLwshtqw6mnKY6WLq6qjLZpKNBi24xrsJ5eL1ct25tewlUpEtzC3Cka0eH0VFdxku/Rse21/PSjo6k6c4Hy/vD+t4gjJ0xdsZgMBg6AhMoy3q6/WOYKtsup7s5dyrnT2Xz5XKqss5/6frUyaCS1+joXlYluyy3rpzRMct0TBco0znoKuQFg2rxoRIo3aJDPrjpVo268vLiI5t0NKSQbq0nH9J0i33V6XVzqVSn2nntq+rKZdLpYHQ0OrqV7y46RuOBsj6oA2XZMAcbO2MwGAy5jQmU5QTdPlimcq6cWFJZ3WJebjOepyqvci5VPkI6edzyZX/E6Jgsu9GxtVw8L+d1DEDz9FigTCeEF7ujc+zltnSrLV0dp8Byuk4G3QpRd3CzQUeDK14CBhbqUykfdt1aUod8fbVlDe3sI91a1eiY3LbRsfvoKCxYWxULlDkLqBp0w9iZ1s/GzhgMBoMhR+n2wTKVE+bFD1A5Zul++JLLgLqO029QyeHm/AnUshgd1Rgdu4eOBKBpT2iZoBAqvnjwEtWT01WLCacCTqFV/Tj7V63KdKtJlbyqBUU26mhIQXW6nX/lz6q6qnLOdtzWnEjlVO27BTRU17tOTqOjGqNj69+c1dGyA2Wr+tpBs0Qj2TYHGztjMBgMuY25qyxn6PbBMlD7BHKe02arHDhdwMH5V3YWnWlyWTT5croKlZxGx9a/RsdUcllHAtA0A1omkepgy4617gDIHciLB/m7zoYpBZTy3NAdZNXqMFd17IG4rS91yKfJeS3In3V15Hxnmm696vzcltNpdFRjdOweOkZ9sLoaVvaDqLEz2a+jwWAw5ComUJZT9IgN/r3YbpVD15Yf1pz10l0Ccd9AkOwnqIIQ6fyIeL7RsTVdbsPomEqu6Cj8sUDZ9qSG9r0q7TxIbs67LJiqvqyk3J4cHXTm6yKHujbkz25pnamjwTPy6ZVPneq76vDLn1XXWro6XgIHzmvUa7DB6Gh0dJM3F3QUFqyusf+JdIrr0oydSd+2sTMGg6EnY4JkOUm3D5bJv1qms/E6W676pVMu3xY5VG3LPoWXflS6GB2NjnIZt75V7WaDjsKClskQmoQdKHNbEcnRP7eFgNxhupWW6sCq8mVZvCIf4FzQ0ZCCahikW0+mCySo2p
Pr6q5Nt1PuNmTdghBGR3U/Rkd1e3LdbNRRWPYbL9dUk7yZv4psmoONnTEYDAaDocPpEY9hyk6bM92ZJjteKnT+gypYoGo3XVuyfEjfnYEJVVmjo9HRTa5c0TEeKGveHYRfI6hOMJ2QsoBOQVRlVcLL+fKB060ohaKOTtZs19GQgtthtNBfq/J14ta2fMrdAhe6UyX35Wzb+c+tntHR6Kgj53S07CDZ8gGORy9x+Zstc7CxMwaDwZBbmLvKcpZuHyxz+hXOv5Bqt1WL93R2W/YR4ggpT9WX5VJW7lv+nq6O0TH1u9Ex+W+cbNRRFEPzriDc7n11W4WpHH+kPLd2VXXl1ZlqYaA6Mc52VStRt5OWjToaXHE7LXKZOKqFv1s78SGkaidd3zpZ2nKKjY7q9oyOqWSzjhE/rOtt71emFSQb52BjZwwGgyF3MIGynKbbB8ucuNl62aanCx4468n/ZL9F5+Dp8uS+5XIqZ1WuK/fj/Gt0TE0zOmaXjsQXL15XYe2xQ/KB0eWrHHkh5esWFnIZ1YHMNR0NKagW/l5Pl+q6dV6bumvR7ZS4nUa573T15bJGRzVGx9zT0RcFn1un8bRsm4ONnTEYDAaDoVPo9sEyNxvutMuyE6gLKFiKNLk9VTDB+VkVA1A5dboyqqAEGB3lfuV2jI6p7WWrjkmohJQFkzuUO5cPkpDactZX9SH/1SnpthiQV42q9nNBR4MrXhb1ujWvqrzbdSq3qyujmhdU7etOb3sCF0bH1DyjY9tk6iwdUxKyfQ42dsZgMBgMhk6h22/wD2p7LufL6emcPlXbQlFOlaaTSy6rC17I7avaMjom11H1p5PT6OheV1U20zoKVYKjcz9QFoGAgM1+aLYgqiqr6lhOl1eB8kFS/XWRLaVv1UF1a19V163PztRRN0AMymsiXeBAdTidp0B32HXXnlxfle9Mk9vXrYF1MhsdjY65rKOrnZEbyZY5WKWcsTMGg8FgMHQIPSJYJtvxOCr/wK2+W5pbG7qyOrmcabIjqevH6Gh0TCeDW5l0ZbNCR2GX2aEeTl0DYxsgT8C6AHxaBG+UQpMFHxfDhkCrDBFnxzpHXVZEJ7QuTf7sbF+lqFcZdAfDbYXZkTqmG4A9mHQLfbe1rjNNt/iX0a15VQECXVu6wIbb2luXZnRMldHoqG4jW3TUCisX1gnUFXOw87OxMwaDwZDbWIqJ0uxxllV0+2CZJf1NVy6O7APIPoXKUXPzKeS2VflyoMHtBziVnEZHo6NODmc5VZ5b26r8jtYxhViFbRrh1u+hd6i1jYowjGyCn2y0vz9RAcvy7GfMe0VgUwBKIrAqCD/kwzcF9udmC4QsoPxZPmi6VZ+zjupgpDvQzrR0q1V5paqSWzfxba2OhnahW0/qggW6oeOs4zZ0VYGMdKdRbrutp93omNqu0VGdLpfJBh2TBM3mOdjYGYPBYOgeqAJlbulC2HkmmNapdPtgmVdUC31VvpzmzJN9GUj1Y3QBA5U/IeehyGsLRkejY67oqGu0yQctUsNhC97uBWGgLgB31MD3+XZenoBgFKrDMKIJpmyBE9bYj3F+VggvlsEnRdDgfBuayqnXIa8gnfV1Cxi5vtyH3I7balA1ENItflSf26qjQYvuulEFG1T1VOVV163uNHs5Xaohqlpn64ad0TG5vqqeqrzRMbt0FHJBWZhsnoONnTEYDIbujSooFk+TA2omgNZhdPtgmduCP/5d5UDpHEQUeaqybnWElO8WcJDlU9UxOqa2YXRMlVf1PV2drtIxRcZY5rI8uHAIHL4e+oTtu8qeLYd7qyAUKxN11GkEGgN2EO2bfHiuDAoEjGmA/TfBgmX2nWcPVcILZbA6iH23meqgORccXhYHuoPhtiBxlrOkzzKqvoQiz5ku6yR/96KjwRXdKXGmudWL13FbO7pdx05U15WqLbc6brIaHfX15H5U9YyO6nY6S0dt4Wyfg42dMRgMhtwnfqeY83tb6saxrNZ2TNAs43
T7YBmoF/5yXrp0N5svf5eDDqogQLoAgiC5Pzd/xfnZ6OiebnTMMR2lBgTwXgm8X2xnBYV9p1nS45QqYWNtCQGNPviwBD4qhvuq4Kj1cPpq+PkauLsaHq+AOr+jfy8rMvmAaORXtuFpFaco7+VEqvpULZK86mjQors24rhdR6p1sOo0Ok+zao2abnjKp1g1dOQ6qnaMjuqyRke1Ltmqo3B+kQu40ZVzsLEzBoPB0H3IRHDLebeZCZplHF9XC9CZWNI/XD7LdWS/QOWzOMupnDTZCVT17ww8qBxGlWxGR6Njd9QxSQEpU1j2HWTNPlLvBEunpGgt9mM+/L9+cNxI+KwIfrUC/vwd7Fgf60pIbcY/CylNdaDkNLmuTmYviwiPOib91a1S26OjIQXV+JeHgnwKdHWcf3XXGY58OV3Xj5wvt+nlOlbJa3RU13H+NTpmn47KjnJlDjZ2xmAwGAxOhFAHzgxbRY8IlsnOn/OzbKfdcNZRteksJwcskL7r+lWVcZZNV8/omB6jY6psWaujyvF2Ci0rICsjl3Hmx/4KC77Lh/OHwA39YHgT3LEEjllv73mmXQnq+pMVcivXSTpq5VKtkNPpaHAl3XXqtoZV4bauRZPntg5WrXN1MuvyjI5GRxW5qGOikCxMNs/Bxs4YDAaDQYccNDNsFd0+WCbbdGd6/K9sw51pQkpDURZSbb+zrlxObsNZV9WX7ICqdDE6Gh27k45JGap5XufEo/gr1xeKf5b9dsy/VsE5Q+0N/3+9HM5bCflRRzvpbI68KvNiozpRx0Se3F46WQyeUI1t1fUmX8de2lX9VZVxQ3VtquYKZxmVPkZHo6OOXNLRtbJKiGyZg42dMRgMBkM6zF1mGSHjwbIFCxaw00470atXL6qrq5kzZw5fffVVUpmmpibmz59P7969KSkp4bDDDmP16tVJZZYtW8bs2bMpKiqiurqaX/ziF4TD4a2STbXol+24LlAQT3MLBMhpct221pHrC8VnXVtGR71MRsf0Mra1TkfpqK2YzvlWOerOg6JaCQJRC97sBecNhXUBmLcWzl0J+aqVn27Flk52rwubDtIxUcciuR83fbysbjuRbLMzznWi6rSo1phIeW7XgNyu7tp01pOvQXkt6yynqyu3bXRMbRcpz+iYezoqyeY52NiZTiHb7IzBYDC0GflFAIY2k/Fg2auvvsr8+fP573//y/PPP08oFGLfffelvr4+Uea8887jySef5MEHH+TVV19lxYoVHHrooYn8SCTC7NmzaWlp4a233uK+++7j3nvv5bLLLtsq2dL5GrqFusrhkh06lW8i+xMqB9OS/spOoM4ZlcuiSDM6prZpdMwdHZMK6Zxo2dlXOexeEMkfPyqCqwbaLwOYtxbmrwa/28pPl+ZlddYFOrrK4FXHLiQb7YwcHHBbrzpPqZDKqtaqKNJU16gzX75enZ/be7qNjuqyzn6MjrmnY87OwcbOdCjZaGcMBoOhzZiA2VZhCdGxr0tYu3Yt1dXVvPrqq+yxxx7U1tZSVVXF/fffz+GHHw7AokWLGDNmDG+//Ta77LILTz/9NAceeCArVqygpqYGgDvvvJNf/vKXrF27lry8vJR+mpubaW5uTnyvq6tj0KBBTMR+5afsA8i22u0gyD6EG86y7amnq6PyiVQ6qdqLY3Q0OurKZpuOkb5Q/1MQ8ff1qjrUHRRZgHTCa9rxASeuhQtX2G/c/M0geLKC1hcKyPZGVtgph25FqSrXiTqm1JPlkHSMNsF3N0NtbS2lpaWahjqfrrYzc/aCQEB72DwhnyrdqdO1rbvG0tX3ss6W6xsd3es7+zI66uvH63SVjqEgfLWt/TelklNAneBymjOvg+ZgY2e6jq62M7VA9hwNg8HgmWx5M6V5U2aCuro6ysrKPNmZDt+zrLa2FoDKykoAPvjgA0KhEDNnzkyUGT16NIMHD+btt98G4O2332bChAkJwwIwa9Ys6urq+Pzzz5X9LFiwgLKyssS/QY
MGaWUSjr/xfzpnSuevWI4052fn8BNSviWVdbal68typOvaVmF0TO3f6Jg7OoogCJ/UmJtzr1skyErphJcPAPbbNu/vDa/3goIonL8SBrRIbcj1VQrJaSr5u0jHpDbkE+emY5aRTXbGy0I+ju50x7+7rWlV/TrruF2/qnqyPG4YHZMxOuamjsKCaBfbmaQ22jIHGzvT6WSTnTEYDIZ2Y+4uaxMdGiyLRqOce+65TJs2jfHjxwOwatUq8vLyKC8vTypbU1PDqlWrEmWchiWeH89TcfHFF1NbW5v498MPPwDui323tDhCka8LCKgCEs6yqoCHrk+dDCp5jY7uZVWyy3Lryhkdu07HlAKqlVK8Q9mpd37XOd86h92R3uCDO/rCFj/0b4FT1sQex9S14SazCvkEdIGOStnT6ZhFZIOdgfSLePmQyteKrrzbadO1L1/LurpymXQ6GB2Njm7lc01HS4AvCjk5Bxs706lki50xGAyGdmMex2wXgfRF2s/8+fP57LPPeOONNzqyGwDy8/PJz89PSVc5V04sqazsD7gFCHTtqfLj6U7HsC31ZXnk9tzkNToaHTOlo3AW9tv/hB9EYSy/F62PTgJWI1hNUr8t9j+nfCICRBR9yI68F8HToXLs5cWG4/PHRfBcORy6Hg7cCP/sDZ8XKvqXZVT1J5eVFwtdpKNyUaPq30vfnUw22Bkn8vXjdi3L+fKpSrn20uA8ZV7cIN1llq6u0TG1baNjZnUUVuvfqGNeFBY0FcTuCIvhi0JBE0R8rY35I2BFScKKlY0T9dH6WL2sZDbPwcbOdDrZZmcMBoPB0Dl0WLDszDPP5KmnnuK1115j4MCBifS+ffvS0tLCpk2bkn6NWb16NX379k2Ueffdd5Pai79dJl7GKypHzIsfINfR/UDmVkZOk/0TXSDDTT5VerpAiA6jo9FRl6bTMTwaWibZGaIQRD7gi/3FDpw5O7Si2M8zOgmBJb0IymoiKagmCiTBVSs2naPu5QA46zjbUixKohY8WmkHykoicOR6uGJgrIpcT155OvPTyduFOrrK39aVeyeSLXZGF6TQrRnlurpyqnZUddz6cAtoyO3rAiPONKOjGqOjum5bdQTYXAprqkD47IBWONCaLyyIKDzXQNgR/CIWLJPnSQHBUKxvARE/hP0KobJ5DtaVkfvSyZsLOho7YzAYOoOeuHeXECTuKrOsnnkM2kjGH8MUQnDmmWfy6KOP8tJLLzFs2LCk/MmTJxMMBnnxxRcTaV999RXLli1j6tSpAEydOpVPP/2UNWvWJMo8//zzlJaWMnbs2LbLJH2Wv8f/ttXZ0/kGKgdULosmX05XoZLT6Nj61+iYSiZ0jNZA00wID4bwIIhUQbQUoiUgAvY/uTHhw95/LJYvgiAKY/V6xf6V2m3F2w0Pgmi5o2PZaZcPgMop16Fy5OU2FIuETwthSb6dtVcd9A4r+vFib3QHOQt0TKmvW2lnAdlmZ+QAhicdHHXj34Xms66OnO9MU8QJUj635XQaHdUYHTOrYygPfhgIm3vBlmJoKIKWPPtfKGgHzuJ1hNX6ORS07yyLWnbQLBSI1QvG/sXaqC+x291SAk2Fjo670xxs7ExGyDY7YzAYMogJEhk8kvE7y+bPn8/999/P448/Tq9evRLP5JeVlVFYWEhZWRknnXQS559/PpWVlZSWlnLWWWcxdepUdtllFwD23Xdfxo4dy3HHHcf111/PqlWr+M1vfsP8+fPbdWuyF9utcuhUfoIX3yHd5Rf3DQTJfoIq0JLOj4jnGx1b0+U2jI6ptFXHaCE07gPRIhcBZCVlBeSonDPfUS+wHIJfQONM7HC+qg35s1uasw83513XtoBGH7zVC7ZthKoQbNcAL5Up2vOio2pBkgU6JsnrdcHTRWSjnVEhn1751Km+qw6//Fk1n6Sr4yU44pyHvAZUjI5GRzd526Kj8MGqvnZQq6PtTFEjlG+Elf1JfhTTTUC3tM6cg42d6RRyxc4YDI
Y20pmBMnP3Vs5jCZHZM2hpNoy75557mDdvHgBNTU1ccMEF/OMf/6C5uZlZs2Zx++23J92SvHTpUk4//XReeeUViouLmTt3Lr/73e8IBLzF9+KvBJ2IvbVSQr7YX53PoLLlctl0AQwVbnVUzqmuTzf/Rdeurm2jo15eHT1SRx807Q0t27tUdlMsnu9BIP86KHoAopVQfzStTrmzDS9Ct1VJD6vJfTbBrd+DJeDOGvi/fiQe+UmSry0HXT5uXaxjuvMYbYLvbsbTq5Y7kmyzM3P2gqBUJd361FlG/qxCzld9d/bhNUjirCvL6VUmo6PRUVdW912WDwvW9YHlA6R5VW5YJZzcUZo5OBCGUYvtO9AWj4rtiZYjc7CxM51HttmZWqDrjobB0I3orOBVfA7JxmCZc37LRvk6mMS86sHOZDxYli04g2UBkv0L+UctXRq4+yS6H8nS9SW341ZH9VfuM91fo6PRcWt0DI2HplkkbdzvqpBKKFVZRb5/JRQvhMgAKVjmUkfZpyynDrcVoKMPCzhgE/x+KQQErAvCz0bCt/lt11ErYxfr6OU8Rlvgu5u6fhGTLcTtzCF7QV7A/VC7pcVRnUav7TjT3fKd/XgdOunmDy/yxjE6uqfJMvUEHcF+PHLJsNb9yVIKZdDOBMMwepF9B1tSsCwH5mBjZ3oeJlhmMGSQzr6rrLP79IoJlnkOlmV8z7Jsw82/kIeGmx/h1j6KMkLKU/Wl8it0voubc2t0TO3X6JhKe3SM1kDzdOyN+52CyI1YUprsBOuEVgmhElqFqo6bs4+U59auVNcC5mwAfyyvLGxv9p/Snhcd3cqoZOkkHRPf051Hgytup0UuE0c+7OnaiQ8hVTvp+tbJ0pZTbHRUt2d0TMWLjuGgfUdZuJPtTCCM/RKAXJuDjZ0xGAyGttMDg0JKNHfNGtR0+2CZEzdbL9v0dAESZz35n+y36JxYXZ7ct1xO5azKdeV+nH+NjqlpRsdU+UR8n7JiF2FUTq6Q8nVOt1xGZcO8rsLaY//kA5Omb+35aquO8onMIh09nUdDCqrghtfTpRr6zmtTN9+4nRK30yj3na6+XNboqMbo2D4doz5Y3h8aChWZHWRnfFGS35SZa3OwsTMGg8HQNkygzNBOun2wLFpi/1XZcKddlp1AXUDBUqTJ7akCJs7PzjJujquujOzTuPkpRkejY3t0xA/Ne0CkvySwSgj5L6iVdHOUdYLq2pUPvtyf3Lbcj6xLOh3daKuObsewK3X0eh4NKYQC7mtR0B9WXR1nnvY6RX1K3IIp8lpaNwy9tqeTxeiYmmd0lL5bsLYPbKqQOupgOxMI2wEzrXDZPgcbO2MwGAwGQ6fQ7YNlDT8BYi+cUdlzOV1Oc37XBR3if1X2XfZb5Lo6v0fuU5ZdDnboyqnklNNU/cnpTnlljI7dS0eA0HYQmiBlyA6vylHWOePOsvIB0QnsbEdIZWSZVAdQd/CcbcT/ptPRTTZnf1511B27rtTR63k0pLB0CAjJmjpPm2r+kE+3Kl0XoGjrKZdRzTvOPlXD0dm2sx2jo9HR2aZXHQVQVwqr+8bqGjuTeR2Nnele1NZ2tQQGQ25i7iozbAXeXsWSw4gC+59ltf/HKl1gwe2vMk0Aze515LqyI6zTQdeW7MjqcJPBrUy6skZHm1zSMdIfmnYjefGvWwl56VT+LDvRXpFl0NWVD7YqT3US3HRMJ1NbdGyLDJ2po5fzaPwNJeGA/S/lbpUuwBIxOYQ+WJIoq/nuFpzRpclBHV2QRl7vu+G2rnf2b3TUy5utOobyYGU/iBg7k5qnSzN2xmAwGNqGCZQZtpLuHywrhPqfkR3GN2K/7S/4Nfh/AKveTpb9FdkH0f0Ah6KcGzqHV+c3qZxlN79JbluVb3RUt5ctOooiaNrHvm5SBJQ/yw3qVn1yhzqB3fC6kpNXcSq5LemvnO/8rFsM6GSLf0+n49asVjtKR1Xfqv4NKbTkwdfbkBV2xhJQWgcVm6CwEX
yxF1GkO43OfK9zUVK/qNfMuqCP7vKQ5XCmy2VkOY2OuaGj8Nkb+jcWKDLlzz3Jzqj6VuXH2zV2pueRbQt/s1G4IZvJtuvFkJN0+2AZVmzfMq8Ok1sEJN1nD/Wj5RDaFnxbIPAdBL62A2hWk7o5ucmtMUvp1v1ufTv9JpVMOv9NdWiMjtmpI77YPmU1pCqlG/tudkheXamE93IgVAdOt5KU+/dyPcvfFTrmCagOafrQte1FR9VCpIt0dJVR9dmQRChI1tiZdX1gfR8oaoDyjVDcAIUNdr5uja1aZ+uGnW7+UwVUVPVU5VVzk059L0PS6JgsY7boCLC2CmrLNBVVn3uIndFi7IwhW4kHI0zQzGAwdFO6fbCsJAp+xy/rzT4Ixz6ndQ5A7Ui4eZ1yXTkvlh/tBS0T7b2hrDoILo4FzlaDkBblcnc638TNZ1E5vzonGEWeqqxbHSHlq3RwYnTseh3xgW8T5H1oj0sRVBTWNeh0xr04zukOhkxHXo8edPRFoVD3iF1bdVToagF+oCDamhUBmnwQVbWl62srdPR8Hg2ptOOcJ5XP9Li27DVMfZH9zxe1A2aVG6B4CwTDYEXdRVblyWLLIrjVkfPcAi7x725ra1U7ukNtdHSXtbN1LGiCqrV2UDfqUxTooXYmozp2hW9r7EzPxXkHjwmcGQyGbkS3D5btVQsNRfZnAYQtCFmwxQ+b/FAagQ0B+1/Iitl1rx5svKyzjrOuB69WAKIcmneElu3BtxECX9nBM9967BUz7v6K87NK7HQBIKdYOr9G/q5TUc5z9iWXFaTqYHTU69PhOoYh/78QLYbQGCCIWkG5gXQrMvmAOAVzU8xrGVX5dl6PWh3T0QYdLcAnoCIMfUN2QKxvi50eEPbf4iiMbIKyMHxWBG/2glVBEM4FTaZ19HIeDWqcF7qXyEMnj+uoDzb3gi0l9mOZ8cBZyRb7zYCI1KGualI3xym6TCmrWuvr+oi3pVqHp5tm5GGsOlRyHVU7Rkd12UzoWFoLxfWwsSIWLOsiO2MJ+1/W2JlM6tjFc46xMz0YEzgzZAuWhXkUMwuIzwM5ei66fbDsiQrwFTgSYouCPAFDmuHXy6EqDJt9sDIPvimAp8vh64LYwlSHymNM50mqHA6H4yD8EOlj/2vZGXxrIfglBL4FXy2IaLJ/4cU/ksul87Oc4rVFRWebbv6T7Mir9NHJ7rVcW/zl7qrj1pzHJHQrHmdeOoG8HIBYsEEUQHiU/Vcpg45MX49y3+nwoKMlYLt6OGwDDGqB4U2wNgi/HAxv94rFxR2y+oCqkF3+z9/Co5Vwfx+o9zn6yaSObTmPhmRUx9UxrlPSoUvGtbAg4oe6XnbwLBiygxblm6BkM/ijrfXc5iC34e4W0NG14zY8nW166UdVRyev0VFfrqN1FJbDx+pkO9OcD6trIBDSyKCjo+1MBnVMSXe2mSs6GtqGKjDV1YtTEzgzGLov8jWtm2/keaCr56U20u2DZQkcRlwAzRYsLoCTRsBOW+B3y2B8I+xTCwdvhONGwnLnnTWyU4D0WfVd5UQIKQ+pfLxYACL97H/WNPCvsl8MEPgGrC12XVkcncqK5pUiynXkNJWfJauhOlyqvlSXiZt/lykdRTGEh4AlILDI/utspzvoKNdpz3lMKegmRNw5d1vJyULKQsTSohXQcAj2I2JyOTdnXdVfBq9HLW3QsSwCV/8A2zTZgbEXy+DXg2CzP1bUWdeCqIDVeXBnDXxaBAuWweR6uGQwbPB3gI5ez6MhFd3FlW4i6MJxLYCWILSUw6YyyAvZLwSoWms/JucPq0+7mzi6PLc5UrWu1rXlEF9bR5ena9dLmbbqKHyxPbmwA5GW6H46bu15TEnoRDsT9sPKvi4ydJWdyaCOKeSajoa2o9o/LJsWpiZwZuhpZMu11xF4DZTJCEGu3WnW/YNlshNA63cBNFnwRi84YhsY0WTvGbQ2aD/y1K5IhtSHaxlZRpVDAoh8CA+1gzy+aeBfbu
9vFlgCVoPeB3IT2c1PEYp0VZpcNt3hSueT6eq7+XRt1VH0gpZdgIgdeCTU/XTc2vPoSpqxmlRGJ3AaR9yK2mM8PEjTR1ddj15w0XGzHy4dZD/63eiDLwpjd4ml0TGKPUf9ejDc9D1ctxQuHAK18uydwTknqYwhPR7GtXJh6/zchXZGWNCcZ/+rLYP8ZiirtR/TLNmCcn8zZ4BEpZbbd9Xcprt0nOW8qOhsy4ksr5e1ent1bAnCD4PsgGNpXfK+qd1FR9VfZ3tbo2On2BlhB4ib8zR9dLHfl1RG108OzzlKGQ2ZIRcWoCZwlhnMCxYMXcXWzDPxgFk2BfNd6P7BMjdic4sAVgRhRZ5LObfoBiQ7LyjSdPOYnO/0XJ3tx8pGiyA6CkKjwLcZAktjgbMfQDS7+zCyY6tykt18GlWaqq6ujpvPJsusc9qdae3R0dqSWrm76ehMU8mgSksauvEAjmql04axmuIkp7PlsXxrM/h/iAXLdOU6+3rU4MNKBBKc/QggikiqG7Hg42KFPB7kEJYdMHuyAo5aDyevgf/rB9FOmHPaHTg02Hg9btlgZyz7Y1M+NFXbby0saLLfqlm5wQ6i+cNqMeSu5UtfSOXlOk4Rdaro1HBrVy6nu7Tl+XtrdLRiH6J+iPhag2XdSUedLJ511NmFzrIzwt6vrzkH/L726piWbNbR0HMwgTODoeeSAwGz7h8sk421Kh3cPUZdBEJl2GVv1JkmOw+qttrQfrQXtIyH0DjwrYaiB4HG1OKyGmjSVL6Ryt9x/tWJLDvkbn6Xrh1ZPlW9eLpKH1Wa1Wy/OMH/Y6o83UXHrTmPwg8tk7H3DMvUWFWtzlRCx9KjZdA8VSGgTkEvMjjbaO/16GxW+Djgy104+cedKG8sSclv8Yd4f+BX/H3yc2zOa1TL18Y5J2LZe5YdvBGOWWfvx7i4MMM6pjuP2W3Puo5uameiFjQU2v82VNr7mw3/jqQ7zbyIqxPdSbp2VKLrVFfJkC7WoJO7PTr6ovb+b74oBCL6erms49acR2HZm/tH/C5Cd7Sd8UF9saIuUr0suh7bqmOKbKr+VXW6WkdjZ3ou5k6ptmOOlWFrSTeGOiKY5XwcM8vp/sEySDXwspMhG3ykPJ1DIJdL17ZKLtm50EU2XPIFEC0H/KlNqfCS7yaGW5pKXdVhdesnXT2vOijzBVjNYCl+6Vfpo0rLeh3T9K1LwwfN06BlR41gGRirSQK71XEqIMvQFdejhH/jKE744OdYUd1tCbDbku3Ykt/I33Z4Tt1mO+acbwvs/ct23gyzN8HNBbHqnTznGBR0czsTBcKB1mZksXWq4yFPnsN0h0gnriyTnJ927tPI214d429Z9DnuOu1uOrb3PGLBht6wqi/23WVdaWdy+HrsMToaDAZDe8nyO5Zyho6++yvL7y7zpS+S41iOf9DqMAiSnQeVpyinOY25sx25P/mzqo6bM6FLk50eoUiTRHJ2q3OcdX6JrhsLtQoqFWV55NOhUlFGJedW6Riy3y7qLNftdFS0k07HSA2EtwFCmkpy5+0dq7l4PcrNNle4BsrAfkRz7vv7MWrdwOR+nPK0UccwsKjQzt6jzn6rb6fPOYZUcnVct/Hadb4Qxa2qas3tFFdWSz50qmGtqqNTVa6vO+ROed0OXZt1FJDfBH3WtR6zbqejRt50OoaCsL43+CIuwvRUO2N0VPdtMBgMbSWLgy85SY7cBdYRdP9gmZuDITsaKqdD5826eZE6GVTOghPZ89V5zpLsVgMUvGn/9YrsT6nEcjtEsiOdro7qEmvLobQ06W6odATAbz/CGm+3u+nY3vPoXw3Ff4XgZ1IBlcDtGavpHP5svh7byeBNNZz634PtpjKkY17sbpVtmmCHek3HHTznGCRydVy34ZwHQ/a+ZZZGBt186YZKPd0wc0uT63gZpnIZr3W86Bj1QUs+1Bel5nUXHd3addMxGIKR30C/VYoGerqdMToaO2OgJy/KDRkkvoG8IXP00O
PZ/R/DVBlxp2F2lkuX5nQAdO2onIR4OfnXM1VbzjrOPBUCrDAUPQH+pclVZB8E9L6QXE4W0Sme12COqi1ZJfmwqtJUcsbz262jH6ymVHm7lY604zxGgRbs/Yg6YKwmCZ1r12M72VSwhVdHfJzcrlOWtuoI1ITs7LwolEfInI5tOY+GZHJ1XHs85z4BQ5bG3owpkquoxJO7lPN063jV4QP3PlTq6frzKpdqHm+LjlGf/chqS566rCwv5J6OqjZ0n5OmUmG/8CDxiKqxM8l1nHkqepKOhp5HD12MGww5QQ+9W6/7B8vipPXgFGVVzkG6crLzIefJddrinarGqB+IujvObs3r6qjK6JxpnaOcTh43XymdLLo209WJEy1Vl+kOOmbiPHbIWNV1lgvXYzt5ffj/eGrMW3bzmdJRprPnHIOeXBvXHs+5sEjsw6XqJt18qEO1Vpfb1rWny5dVSSdbujm6PTpams39dW3r2stmHVVl2qOjsTOKto2OBoPBYDBkBT0nWCY7ELroh1Dk41LO2b6qT2ddXduyY6Mrp+hbAPhSfQ1dEEXgTXVnd26BFjdnXqey01dyO8wqnbZWRwBawL+m++qYifPYEWNVKUA6YbLlemwnEStiV8+UjunohDnH4EKujes2nHNhpRenvcMk3XThbD+e56Uft7W6rkxGdLSgsBEG/EhC6G6no4ssXnVMFDB2xuio6ttgMBgMhi6m++9ZBmrvM26wvTgGcp5rlMHRn1xX7luV52xD166LjHJTcRXdulJ1J4vobEtVzvnPKaqzbZX4chxA7lPnG7ZXR8Des6xE3V930HFrz2OKUJkeq7l4PbaTkD9MMOp4TW1bdQQCAvq1wKAW+7OSLpxzDDFycVy38Zyr1sFyvjxX6crryrmhU0Gei1Wo+ld9zoSO/gjktWC/7bGN106u6KiTzQsph8TYmfTt9lQdDQaDwWDoQrp9sKwwElNS5W06Dbkc7XDW0UVHZMdC166QPuucBdnJkPuV+5LaUPlIKlF0TrOqa6d4Orz4YDpkVdz8LmdaW3UUeSCKY/8KgSCIoFqWtuqRLTqq5Gr3eeyIsepsL9euR5lwsUtmK3M+250Dv9g1uR8POloCCqIwfTPcsgQe/wr+8i1UhRWddOGcY4iRq+O6Hedc17QTlYq6uU7uSnXp6dLiddLNdc7+nXOqLIfcX1t1xIKI396vbHMvqCsjcUded9Ex0+fR2Bmjo7EzBoPB0AOIv3Qhx/Ym7PaPYfYJQ2kj5AnYGIAVeRCGVK8yjjNd5xG6RSZ0ebp6TgdEdoZUToelzvcasElXJt1hcWtL5ZjrfDGVas5+3A6jDp1cAmjeCVp2iDUShPAw8E2E4CIIfkryhv8i93SUy2zNedQKkomxqjuA2X49yjRVKhJTCUYD7Lp0PI+Nf92+yySNjn4Bw5qhKgTrY7PzeyXwv2J4rResDUJRVFFfpqPmHIOaXB3XbbQzzmrpkA9JusMhf9ZNG6o6qkOpm3/lOiraoyMWrK6BDZVgCWjOh+UDYEMFVG6Eio2OTe1jAuSajh1xHj3NOT3FzhgdjZ0xGLzg3Gg9/jnHAhCGHMSy2KpN/nP0BQHdPlj2Qx748iE/tgjdbxMsy4MviiCscjBwpLmhckrcnAndwlvOd352i75I8qqaV4moc7hVjq2XgJFOXGe7smzpfCed36XK96JjtAZCE7DvKIsXCkCkH0T62nlWY2ubvrVQ8CpYLdmvY6QKRMCW32oCX1NyvkpmnQxC/tIRY9VNsGy/HtvBiyM/4PfTH2htQqOjBfRvgV02w6JCeL8EQsDXhfBqaWudAgHVoTSddtKcY3DgNl6yfVy30c64ueOyiunmvI5A14/KxqFIa6+OLXmwrg+EAq2JwoL6EmgotoNoeS0Qit3R3GszVK21H9lsK52tY9Rv61fQRNJLHjqEnm5njI7GzhgMbugCDqp0E0AzZJIcDXZtLR3+GObvfvc7LMvi3HPPTaQ1NTUxf/58ev
fuTUlJCYcddhirV69Oqrds2TJmz55NUVER1dXV/OIXvyAcVj2D5AELmi17EfpcGVRE4NTV0LcFfdRI5SQI9E6Jqowu+qGRUemgOGVy9iG16ebLqBxkuVlBso8k0B8OnZ9jSf8EqX6QXEdWSf6cbrHlRUdRANFeUgeOTiJ9IDyo9V/L9tC4D4i87NcxWgGRgRCtAlGemfOYlvaOVZUz7WxL148suLPtzrwe28EHA79ida+Nrjr6gJ3q4dAN8GIZfF5oB8qSTqIbXTTnZBNZYWfi5Nq4bqOdcRuOqrnK+U9VRoU8hznF0fWrih2o6qv6t6T89uhY1ysWKEPKiP2pL4aNFbClxP63qi/8ONB+bDPbddxQCcsGk3ikVNeHTFvPY4oQPdHOGB2NnTEYMoUQPTbAYcgwPXgcdWiw7L333uOuu+5iu+22S0o/77zzePLJJ3nwwQd59dVXWbFiBYceemgiPxKJMHv2bFpaWnjrrbe47777uPfee7nsssvaLoQUHWjx2Y8z/acczl4Fkxoku+z07nQepwqVI5Nusatzapx1VXUUMuiKytVUARtnQEX+LJeRu7ekPHlRpHLwZTlkH8/5OZ2T7VXHRMOqz3KDFoTGQeO+IPKyW0dfLVhh8K8C35rMnEclmRqr6ero2ujq63FrcNHTEjCzFg7eAH+ugk1+UhcRXhcj8c+dNOdkC9loZ9KSLeO6Hedct75VdWVJ/5zl3OY8eU0t96tbs8fbdTukOnndDr8XHZWKupxHYdnBsx8H2nduycWzScdem6Figz1fZfI8ptDT7YzRMWvJCjvjhrmDyOBGDw50GDqZbjgXdViwbMuWLfz0pz/lT3/6ExUVFYn02tpa/vznP/P//t//Y6+99mLy5Mncc889vPXWW/z3v/8F4LnnnuOLL77g73//O5MmTWL//ffnt7/9LbfddhstLS26Lj0jLFiSD3fVwPxVMLked8MfRxcNkY28WzRFVydJQMV3VTRDIa+shi5Q4qWeG+kOl5sccZz+luxw+4CgZZcZVATbl8OufaDY334dkzpWFZacyNBYe2+zjOoYAOFXiwNAEKK9IVpO4s42ue3E3wLwrQeawNoMRDN4HjthrCr7E6R3xJ115D5V8mb6emwDuy+ZSGVjr9a+JR23aYKj18PN/aDBp5BB1lFHF845XUk22xkgt8a1x3PuzPa6vpXnLqe4qvy2tOWsI8eqdGnOvHj9qAUNRdBU0D4dlY07G1Cck3jAbHOv5KodpWO8LYH9WGU4mDw8ZQQgfJDfDFXrWhvKxHlM6iT+19iZ5O9Gx6wg6+0MgBDdcpFqMBiyBC/zy9buaZaldFiwbP78+cyePZuZM2cmpX/wwQeEQqGk9NGjRzN48GDefvttAN5++20mTJhATU1NosysWbOoq6vj888/V/bX3NxMXV1d0j+g1bgrDPjSPPhzNfxqBfQPof7lzInTSVD9xfFdt+jV1dGVlb1kVbk0RWQfJF0wKV5G54DLjreqP9mJVtWXT4kFlAdhhwoYVwYDiiBgwcgSOHwwzBsGNQUZ0NHreQSEP7M6igLsN3EqygJEKyE0BlrG2Y+OCiAyBMjHvlrjj/kEoGGO/ciobudB+TwGLCgNQNCXqrLyWHXEWNVdj05hnXWy6XpsB1OXjuOkd2fb+/xIOhYIOH8lPFoJa+P7HKXT0U3GLppzupKstzO5MK7bcM5Vc7amqPKzTgUv4jr7l/+1BbdDtrkE6kq9lVflJeHxPAog6kv+3lE6xj8LH3w/BJYOsT+DHbgTli2L8LUGyr4fYu/FlqiraT/JHsba0pE05I2dSS1jdMwqssbOpMMEzAy5SjcMsHRL3OaXbhoogw7a4P+BBx7gww8/5L333kvJW7VqFXl5eZSXlyel19TUsGrVqkQZp2GJ58fzVCxYsIArr7xSLZDTMEuOxLslsKgAzlwFlw6CiGocyM6HKnqiM/xCUcbpcep+WVO1L+eL1CJuxZ3iq0QWacro1u3yD5
e69p15MnkW7NIHIgI+q4W6UGvZl9fAa2vAZ0FYtF/HNp9HKXlrdYyXjxaCr9n+IvIAx4+LVgP46sG/AnwbITQeWibZj1uKEhBB8K+xG4tWQqgYaIZIf7BaIP9Nuz3Vcdi9Ck4eAV9vhhdXweqmVr3WNsHmsEOfjhirKkdZ5din66err8c2UFdQz4rS9Uo5pm2Gfi2xDfx1x0uno6qs83u8vw6cc7qaXLIzSWnZOq7bYGfSzXUybt2nOxxOEVT9eqnvpZwloHpN+nZ0+KKSoG20M5nUUQDCR/Jm/MJxDgSU1UEwZH/e0gvWVtkvG2jJs++yq9gE4QBsLrXz64uhqMFOq1md+mKChCyWHXAsbLRfaOB2HlOENnYmtf2eqmOWkHV2xgvdeNFqyHHkcekcq863eprAb3bSQ+eVjAfLfvjhB8455xyef/55CgoKMt28losvvpjzzz8/8b2uro5BgwalFpS8z6hl39lx2xK4rwq+KsTdYbCkdK8ovUQpz62szmESyq8pzap8IWdauvpeUPlRqu9yXp4FhwyEbzbDx5vUZSPYgbT26pj4spXn0VL8TXfMko6tBS072IEuokAeFDxjB8ZEEYgSRwXLvnMs0g8iA1rbi/RzNF4SS+sPvgbIewd8Leoh/NY6+HYLbGiB+nBreq8ADCm2/36yCZplZWVlVN+9jFUVqtWg/Ddbrsd28NfJz/CP7V+w77Jw6OjH3tD/nRLY7FP0odOxLXTUnJMF5JqdSUrLtnHt0c44Ay+6dbJbvrNMUh0r9l32oV3EkovrLtGk9Ji9t6Jpyilk9KKjsNJUSHMeddmybdY1mXR5WrCiv/1IqT9iy9Z/BeQ3tTZWsgX8MTuwqsZ+6YCzwYbi5MY3ltv//FHovT7WrkomAaV1dh15ylDqaOyMvqzRsUvJejujwgQZDJ2NM4DiNvZUgRavaYbsppvPORkPln3wwQesWbOGHXbYIZEWiUR47bXXuPXWW3n22WdpaWlh06ZNSb/GrF69mr59+wLQt29f3n333aR242+XiZeRyc/PJz8/PzXDuUhV/QW+LYBmn73R9lcFmrLyOFB6/bRGnuRyOnQRJnmVEc9XpcfKK6eX2ONdIgxE1X6J3F38c3gkhEY7uoiAby34V4Jvg922CIAoxH68MGDvtYXffkuj8PCQryVgTj183xvebnaJSwjwbwDfavuv1RBLD5M45rpTldyh5rvqPKZ+VInVulhSpCd16YPmnSEyGCJVrZWa98S+uywfe9+xAgiNAqsZon1IPTmqzq3kZEgdKi1RWNZAEgKoDdt38/XOg5l94bkgJL/LSdO3M93LWHW7Dr3q2FXXYztpCDYTtUSKjhVh2K4e/jOARBDVk45eogKdNed0IbloZ7JyXHs855awg0zyo3XxIhG/nReIzceqdbtcJ97F2j5QV2bf5RS/W6m4Hgqa7D6x7LbDgeT+m/PtPca8Eg7YcuY368tYAgob7LuiguGYPnH9RWwd6pDfqaM/4jhs7bQzcgxBZUeE4nNSmgWra2B979jjnaJV//xmO98Xte8gi/jtAFpDEXgdq74o+BSBsqTqmvnCKWc0XdAnXsGZ3t3tjNExq8g6O2PYOlR3NbWnnhte7+rrLsEFWVcT6Oq+pDu33fjcZzxYtvfee/Ppp58mpZ1wwgmMHj2aX/7ylwwaNIhgMMiLL77IYYcdBsBXX33FsmXLmDp1KgBTp07lmmuuYc2aNVRXVwPw/PPPU1paytixY9sulMpBoPVveRh6RWC3zfam//FH/dSeKMlOQLrFtZtzIa8adN6xTo/Y56Y9wWpU9G1hP7rnB1+d/bZE/2rwr7XLR6VfkhP4IFoV2+B+gJQXWzRZDSAC2LfI+EkOjCkWAjodB7TAkDVw/wAIq1YAjvZCMZ2tFrCaYln19ufAUrAUiyDfBods7TyPKn9Tp5YqvVUYe/+xSJ/kzNBITSe6sSCPG0lmt4Wq3JST9S3w8mo4ZFtYGIV654sIMjRWPb
eh07Err8f2otBx20aoCkPf+F6JbdHRS39xOmLOyQJy0c4A2TmuPZzzqAU/DG4NHsXxRcESdrBFWHaQqbDRDsrkN9vlw9ILTeK05EEkAGur7c9Oedf1oXWfv5hMykCd28SruW7q43dM6QILlXbfPtF651UwbAeVihrsdJnCxthjmM525c/geh5VJkA3f8siJ9W17DvAEoGyWIX6Iofujk4aCnUNKTqU7Ix8KNoSP4movE9jZ4yOxs5khlx9FLOtgamtabcrj49b37kSSMvF8WUwtIOMB8t69erF+PHjk9KKi4vp3bt3Iv2kk07i/PPPp7KyktLSUs466yymTp3KLrvsAsC+++7L2LFjOe6447j++utZtWoVv/nNb5g/f377fm2RjbV0fQeAoIDCaMwRlr1UpzPh/AupRt+ZLnu0bvOKm7Mh15WcmkgNaXWMlgOxu7itCPYdWUG1/IlFiKIvwN78Nx5oU8mp01Oh47TN8HFxLFDmomMCH4h87DvZBFBmJ4eHa7oMg28Tyee0LeeRVHW8+JmuqsgOq1vjqtWIrIfmHCaqx+rEHz9SqR1P2xyG/22BfWvh0YrUdpX9tWGspgjXHh274nqUCdanKWA33xwIKXUsFLH3NUgBgZT+VTrq6Iw5J0t8uFy0M1k5rttw7TYWklbHUCnUOV7+6osmb16v6jqlr9gaT8SPl5uOKl23dn7CtnERAZF8u0xLnh1s2lCp1iUQsXVNsZ1yfx7sjHyZqeyMSpWk06EKjMidZcDOJMnuKK+r6vweDCmUkGVI6UTRWPxzd7AzRkdjZwzeycZgTSZkyka9DIYeTIds8J+O//u//8Pn83HYYYfR3NzMrFmzuP322xP5fr+fp556itNPP52pU6dSXFzM3Llzueqqq9remcqJ8GLgnXXldtwCO7IjonPqVXk6J0FVVyWnm46OPkTsbjBtxKeTdLSA3evsYFlGdFT0K4KxO7nao6NIVUMlmkrE+F/NWkzdUBvPY0pdRdFoJTTvDkSh4DWwatXdO+X9rBCmNUGegBZ5sdrJY1VZV26ns69HgL5vw6optGwZQlhxq4lA8Fnf73hum/eS5cjUnCPL7qaHW157z2MOkHV2JtvHdabtjICIT9JHbicXdVT0Gw7Qfh2ldmWboTsM6Q6TsgFZNrlMG+yMk+YCWD4ArCgMWNH6qKtu2lI2Y+xMqhxyXZWc3UxHvwXbDITvFMWzkU61M4bsJlfv5sslnC8B8Io5L4atpFOCZa+88krS94KCAm677TZuu+02bZ0hQ4bwn//8Z+s7Vxl6OS9dXbdrTOX4tmWx6yzr5ljI5dJ5zXI9uQ9Vuzo6SEcL+26+8rCmXFfrKFWTu1GpLaR8JfEKHaGj9DUyEELb2p+DX0KwVt9EvF7Ugm2aYFQTfF6oaLQHjtUU8jfCxD/wl141PNJ/Y0p21BKsLd5EQ1B6NjgTc45MV8w5WUhO2JlsG9c94drNdh1dmvAqmvKwdKKdaShqvaOwpB6q1njrQkmunseeMFY7QcchfeHKY+E/F7rI1oV0qZ0xZD8mMGPIZXrS2C0r81y0S+4s61RkR0HljcroPFaVw6EbV6o+dXV0DofKY1b1n84Rkh0s2elJ11ZH6piun67W0YHu9MiHQae6tqFM6ih3E01OkGWR5RXYjyJXh6BY8dY4M1YdFGxgQ/UGllQoZHSTYWvmHDc6a84xpOLFzmTruO4J126260j7Li1dvMRy6z9DdsaZnN/cWlTe104V39EWkBvPtfPYE8ZqB+tYVgQFeRgMBoOhM+lJgbI24uF9hd0A569ZTsdA550682Vj7vRO5cgDpDoozjblOqp8VXtOWWQ95Hxnnqq+cJTJFh2d7WaTjgqxVGrJKjqblUXR6pEJHRVyCMedYdEKkq54+TQqT40Zq95oq46q+s56Oh1VdPacY1CTy+O6J1y72aqjJLosgiyOLFZKXRELWHWwnXHauAaHnakrtTOd+SodkgTvDuexJ4zVDtTR74
OAH5augduexGAwGAyGjqO21nPRnhEsU3lsWs9NqucsKzsXcrn4Z5WDIjsKcj2VN6yS0c0hSaejSo6u1DFd+8468fTO1FHRtLO6SgxVlxZQPURRoCN0dCaV2HuWFbwC+W9CeCiIglTfVuVLpzbm+NzTx6pMe3VU4VVHuU5XzDmGZHJxXPeEazfbdZSa1GQnNamLdaQcmg60Mwl5fbCxEqrWQulm2Nyr9e2mququU1oun8eeMFY7UMe4Khs2w3MfKto0GAwGy8I86mrobLp/sMwi2ZFwGmzdtaZyIJxeqrOuypvVtSP3rXJu3NJkT1N2lNLpKOvT1Tqq2sxCHS1AWBCthkiFumvVosWp5rZTad2PsqN0VODbBL514F8NVii5uvMQaJsxYzU97dVRJacXHePNWRCSB1pnzDmGVHJ1XPeEazdHdIzbmYZi+82jwtKLKKuTJKoFIecLBzrIziRUs+x/BU1Qvsn+HPanyis3mUQ3Oo89Yax2hI5RAZGIpm+DwWAwGLqInrFnmfOv18Wv04GIo3Iw5DpuMqicEpUD4qzjbFslkyyv3I6uXBboWBjKp6K+iG1Wb8s+W0K8OuITWvzhrNQxNAaa9gUrDIVPQmBpa3HZr1WJY6n6z7SOskqWvcF/tMy+w0ylYpLc+YC0H70Zqx5or466Oh51DFmwPOjSnrNOps6j12PS0/BqZ7JtXPeEazeHdNxYAT8MAl8UhiyF0jr9ped6KXakjgqiPljRHyJ+sIS976WuWWUz3ew8JsnbTcdqkrzGzhgMuUV73irZ1Zi7yQxdRPcPlqmiAjrnQ1XHWU5l5OV25GvZ6QjoHE6V0yPLq0J2ktLp6CWtE3Ucv2oYY5f3xv/lcexCHj+Z+2u+670y63QU2MEmkQ8ij8Q+YCrfUhbPkxnKlI6y6Bb41oB/LUQaIdoH+15SS73miVYAYYjWAC1yYwqZ4v32gLHqibbqiFQORTlZR7lLlfydMecYksnVcd0Trt0c0VEAzfl24Clq2YEnlZi6qSTtNJVhOxP/agkobISKjbGAmQ/84djcpCDih1DQflwzqR9Z1hw9j0nydtOxmiSvsTMGQzLZ/phgNsvmhjOw51UHIcipgKAhK+n+wbI4smOgMuxyWZVzkK6c7HzIeXIdncMQT3dzVnR1VLJ5kd1ruQzp6Bc+LGFhAT5h4RO+TtPREg7RdToKO3DUOByi5a350VKI9JaabAZrS3J1Z9ORkCSXG+3Q0ScgX0DAsl+AKfIhNApC4yDcZH/HDw1zwIqgROSB6AWBQmBx8nFIkUslbzceq+2io+Ycma6ccwzJ5Nq47gnXbrbrKOxN8aO+2Ob4sfymAthSAsGQba+EZW/c74ukqqKcsjpIR2FB3FRHArCh0g58NRS17lX2/TA7YBb1J7fhj9p6hYJ2QLBbnUenPN11rDrlMXbGYMgtcj1wJAfIcjXoZ8g5ek6wTHYg3BbFunyVU+Kso2rHWdfpTKgcBZXDo+rDmS7n63TUOVRdqaOKjtJRQHEECqJQGoH+IZheB3/rAz/k63UMD4NIn9Y2rAg07w7NuyaXC3wDhU8lq+v0Bxu3gBXfIbADdKwOwQlr4Pl8WO8HcQA09oV1eRApddQpSdUx5VxFXfLMWE2lPTqq6nvRMR2dMecY9OTauO4J126262jBxnL7EUxn3dU19j9ftLV+xUYY+KNDNqkp4Ud/DDKkY9Rn70nmC8C3w6GxiMQ+ZfE69UVpjkf8e3c6j6o0o6NaB2NnDIbM4vVuNq8Bp64OqrXnLjKDoYPoGcEyOXIByQ6GV9wcELk/XV1n3yrHx9mGzqFxK+/mRKlk0cm5lTpallTE67zbATr6BYxphGYLluXDhgAszYcvCqEu/su3xkmL9mpNt6LgawZRAlYQ8oKtVcr6QemI5LqWD/oMgvwiOPwSKCiDDeMgVAxra+0NbZ20hKGhKdavsL/HRRNpdFwVhN9X25ss44f6Gij3w6R6+LIQGuKLKC/nMUkJuv
1Y9aSjG+3RMV19Oc+5yFDRFXOOIZlcHNc94drNMR19sf+EsLMi/tYqfaugKpq8dvAHoLwP+IMwdhos+RSCBVBv2fakrBg2boFI7EeQugaIxj6n2BmVjJK8vigEw3bQrDmf1kfBzXnU13X2bXQ0dsZgyDSdGVDqjDvUcv0uOEO3ovsHy+RFpsIRaLKgWfVeUOfco3IIVe25/bImt6fqy+lkyG3Kfcl6ueiolbEjdLSgqgzW1KbRUUUH6LjjFliVB0vzWvOEBRtVgTKpvr8Ixg2BsYOhXyX0rbCL+X1QVQ6+WL3iPMiTVbEgrwAsn/35Fzu3LkhaQqTQHIKG2Ab7oTCsrbMLb9wCS9fAF8vg0yWxRY/iPEQcnwV2ULClEKZsgddLHfnpziNSWnceq0j1dDq60V4d5frOfp1tqPJ09WXZO2LOMaTiwc6klMuGcd0Trt0c0DHohzGDYLdxMKjKti9ra6E+9uNJn1Kob4bth0Pv4uS+LB8EY8bHsmDqvnBULKAWidpth2LvzREiFiyL1Xfamfom28589C38b4ld1vMc5EFHZbludh6NjhnU0WAwdB3pgm/mbi9DD6L7B8ssxWfJUNf5ocEZLJMXPnJ9lSFXORNe6rk5QCqnw1Lke9BRK2sH6GgBec6RpdGxMdhMxBfBD4R9EcK+SMZ1LIk9dvlunqJ+Gh2Dfpg7Ew7fLabPVjpwltXaRGF+an5hPpSXtH4fUuPIFHYw7b4X4eE3IBQh5TyKfGjaL/Y99pbELT5o9MGAFvuuOlnHVCHRO77dcKymoNPRjc7UUeWfdOacY1CTq+M6i69dy4L8oH0XVEukNc9y1EtqNgd1xILhNXDuT2BEPyjM05RvAxYQiP0QFH9Zrt+xd1hRQXJ52c78ZFe46TF48/PYcZd0jPhh2WB7L7XEnmQ9fKwaHRV5xs4YDIatwQTkDFlC9w+WCc1ncHdK012jKqfEzZnQLbxVi2ChSZOdI6HI0+mocpIUOsb9lWCAxB2wBUHIl2+bws7PC9hBHCcBn51eEPPUQ5HWxz6EQ54vq5fx2cB1bCleyAM1jfxYtjbjOo5rsB9r0Z5OzXn0AcfOgKP2SF5odBmWfQ5O3AfW1cILH5MyToQfQiNJOYZBYT+Oucy5EHMbqy4yOPtL+pzhsWpFQcgBbLl4GAg58vJjdTrqenSjrTo6UR1DFV79hs6YcwzJeLEzHufglHa7kZ1xPlURtzHhsD3H+hx5fUphr0mw+3jY3ABvfgHhWMBsWF9oaoGifPjqR/hxnX2H1I/rYEuTHSjqUwrr6hx38HaBLU1Ccx79Fhy7J4wfgvq8djYW9CqEXx0Jtz0FT/7XsV4RrX/qHNsTJOhmY1WJ0VHdh6qMsTOGXCFumDIVnDFBHoOhW9H9g2VOnAZZ90uZs6zsaKicApXDKNd3fk8nm8pRcbbrLKNyRtLp6Fys+KF3KQyrsR8xrCqDAX3sYFd1eeuv00X5UKAJlvms1r1QnF1saWoNom3YDI3NdtraWvvfl8tg9YYQ9UW1fFH5Os/37wAdBZRF7DvL2noeRw6AQ6dlSaDMQSAAP9sL3vsaahtIHS8KHfMFFMcf3XSiG6vp6ICxajWBFbIDflYYfHUQHqjpJ9aGfxUQe2zVvyb2Moa+HnVs6/XoxtaMVVk2L3OOqn+5Hed3HVtzHg162jA/dXc7Y2HfHdar0LYtO29rP2ZoWba9Cfpte1BWbJeLU1YMZY5N4idJ+0HGOXAKRIT9Y8y6Ovhhrd3XoCr497tw3wvQJD/y3gF2pr3nsaocJo9StN2VWHYgc/p4eOqdmBo9YKwaHck+HQ2GzsAEtwwGgwvdP1gmG3QXI1wWhsIotARIdQbk9mRURt7pRKjquTk1Onl1dXx+hD+AFWrWCNhKcT7sOAoOmWo/+lFcEPtVP4MOSnFh6+fB1VKmsDcV3rgFqt+E/z2fmt/m46IoHpBHt8fzaFmw/472oisbGdgHdtomdneZynFF8dlJurHqVs
dJe8eqoo5vMxQ9iP0mTgEtO8aCZSoHO74Qr4fCJ2N5AqKVjmBZpq9HN9o5VgWw0TnXtPU8qvI7a84xJNPG+and57yr7YxbHalo71LYexLMnGQHxooL7L245HpJjwG2Fcu+O8vvs3/w6VfZmnX47vZ+k899CIuX23eeNTQ7prk0Ovose28vv2VPS0E/+HyODfFFrK12nke/D/abDKVF7dK8w8kLxI6Bao/MbjRWk9owOurrGTtjMKTHBN4MQLve5mneAJq1dP9gmQ5FcKAwaj+y5loH1L/OOY2804nwWseJqj1dP7Hv9dvsSt1Oh9D3/ouxIqFUHWPO/3ZD4cRZ9mbCXXbXlGW/SbKmApgNvQdA8KHYPlwuOjrr+3120C/oGMEFQSiL7fnV3GKXKwlDcQQqS2BLo2MPFmfb0t8+pbDrWEW/WYLPB/vuAK/8D8JRTSGnjk68jlW3OunaSzNWlc0UARGwGuzvvjqFHBLR0liZ2DGwtrjIBFt3PXohnbOvmHPWpZuBVfLq+o6X7+g5x+AdeVHp5fhlsZ3RyuuQpTgfpo2FufvYL0Tpqhda+X0wfqgdMAuFYcUGeOEj+67cYTW2/Vu6Gr78wRa/rAh2Hm3//WKZfadbeTHsMcG2TQN623uKra+zy9c3wYr18MkSOxiXF7QDaqs3xQJqac7JxOFw5B5dd3zSMbyf/cKBVz51KZTjY7UnXI85q6OhZ2NZuRcw2Bp5dYYg146BIZVcHMuGJLp/sEyg/gXMq8OhqmM5ysjtqr6ryqbr2ymDrl2HsxLctIqW6mG0VA8jf+XXyfICFSX2/lsH7ZyZTYQzhgU7jYfx78FH39mBoJpye2FSWWovXOJ7ozW12Pt2Bf0wZrC9KOtbYS961tdBY0vyPmmhMPQrtN9UeWDQfkTn3+/a+980xoJpcRmcf6eNswNm2cy4IfbePYtXkH6sWmD57Dsk4mWiIja0vIyDDI9VVRkRAIKtzfhq4x+k9p11ikHkgS/2xjj/+tQyrYXBUgUBY38TWW7XbjrS6KiVTT6GbnNOur517erK6GTQtZst80a20RY7Y0mfc8jOKMvE5PXHAkCnHgDD+2bPI+yWZQeyhtbASbPguL0hP+b5bNgMT7xj/z1iN+jfx3E3FbY9siBJX3lD/CN2h9p6+4cbvx++WQEPvALvfhWL42vOyZRtW+1VNlKYb8v46ff28REex6rPSvx+0Tq0smysKus42+8G16NWtlzS0WDIRVSBL2ewRHUHkQmmGJyYAFtW0f2DZZBsmL0uPp3IdeTx62b4Vb/wOZ0VL7/G6fIdaaGKfoR6D6Jx+GTyV3ydkNeHvYA562Dbyc/GX7HzArDfjvDVcthmgO1shyPw2ff2L/3/WwJH7wHfrIQhVVDXYN8ZcPnf4MCd7T1f/vKcvdlzSaEd6Irvt7ZLFH6yAywut9NLCmHm9vDBYnsB0OTcIN6yFy/7bJ+dx8lJYZ59zFY8Zu9xH7JiajjGalUp7DkRpvSF/qWwc6C1zMoN9h0VL38Cm2PBJu2vuRkeq6rrUQQhPAD8sRlJ+CHvf0DE0UYeRMvAt86uZ4XAikK0CEQhWPV2wCzay24PC6xGyH8H/Mv1xxLs+o2zQeR70FHbiLuOGZ1zdP06y3bUnGNQk+6cpzvuOWBndDqWFMDP9oSDp8YCQFk6f8bfsBmnshTmzXQWsP94DvRZdtlKx48r2w2FkcfAXf+xH5VvapF8Xsv+EWjHbNurTMG+O9j+wy/utu/MS+A49wVB+y60fhVQ2QsmDLUfeS0qsO/m/vg725Y3yS9b6KKxmlK2G16PKWVzUUeDIVcRInURIQfI5GCI87vbAkRXx9Az8Lo4NeMio/SMYJkTp1OgG3NuDoMlpXtFVV/Xn5e+JPmFPwCWj7LyfI6abu+D0tRi31117J72YiZrHXPLftzjw2/g1U8C7DEhyrcrLWZsF2HJKkDY+56t2mg/3rLNQFhTa//qXZgXe3xmWzjjQHtj6IJg63
xiAVEL+m6yN4DefRy88xXMnWm/mODmx6A53CrK+KEwakCnH4G2Y9m6/PdTCNYDG+C/IVtXhL0/0G7j7X3XdhwI77wGz4btly5sO8BeMBbkwQFT4NE37eNqNWMHpxQvc0iQgbEq6+H869vQ+jfwTXIRAXb0Nypdki12UMy3AQJLQBRB887QMgmC30Leu6lrAVmcSAWtd7HpdHTDi46ZmHPaQgfPOQYXVOdc/puDdiZexm/BwGp7/8TaejhxXzuoku0/MijJtMyWHSg6+xD7x5x/v2vfveZk4nD7TrdsJ36n95ypdtDrh7XwwzoSx6xfhf1D3ORR9g9UFiQfT2Hf0b7oR7j/ZXh7UXIecnk5z61se+xMd7we5bLdWUdDz6I7B4Pc9PKqsyro1p2J66q6My/bj0N7nKNMv6XV0G66f7DM7Zcu3fhTlVUsGIBU4y+kdtNdH6pf6pwyIOXL6bG0aIG9Wdd2Uwdzij92jYlYdg44HoV5sNvIEvpRyZQd1+IXQUYNr6OhxVa5IA9G9W8tH4rA6IGxFxNgL0rc9Kwuhz5l9sLu/cX2Rs87jrJ/CV+5ERD2wmDWDvbbQHOBPqVwzcmAgB8/geUL4Yd8O0jWp9S+A2/6BPCFYMmX8GMFjBsMB+0Cb39pP4q5thamjoHX3oeif8UefxwP9Fd0mKGxqr0OfWAJ3H1lOVAWbyaeEAJqwYrtGRTtZf9NZ2tEKfajoDodXStLn9s65+jKquYcnUydOecYUvFyzuXjnoN2BmE/bnjaATBjO/vHiVAk9lhjDtiZzsTvt394mTYOnn6/dU/OvIAdfPLlyPHy+eDw3ey3Qz/9Hvy/R+3hMHog/OpIGFyFawDH77e3Ddh3sm17QxFNWUedBJm2M93weuy2OhoMuUxbA1kmIOJOumMpH+9cWPgacobuHywDtYMAemMvHH/lOvJKPt3i2s25SFnxp5FPLhf73FIzgo17HA+REC1WfuINXlg5tH6xYNoOW5i2Q3yX9kYAivLVxYPSyE2rp2UvTip6wc/3s5NawvYjmX9/CYQFA3vbj33mzEGzWh83rS+FcY2wLM++m+yI3e1HWftVQtiCn/wKDrFib3Sz7KAgln3n4dpa+OYr2LwBrGbwLQP6Sn1laKxq2xAQLU/Ndnavaso1sObIVF16SWnxt/TpFhQuJHz8dDqqBElqQFPHgwxJdPScY1Dj5ZjKxx9NnSy0MwVB+w7fPcbbP07E5+D8HPlxoavYfgQcOwPuf8UOFA2tsbcbyBk7Awn7ObDKfnFCYb59J7droExix1H2HqPL17fuCddRY9VzGzl8PXpuI1d1NPRcukugozMCYD0lyObUU6dzNh0L1aO4hpylZwTLdL+6uaFyGoT0Vy4n11f9MuelP2dZVV2H0xEtLGXD4ZfQr/l7Rv3v72yzbSG1wqKyJIsmjWwjduzygnDgFPuxzG9X2m89KynoWtHajLDvkqutg1HN8GWN/Yhpn7LkYkHp0UrLcQz697b3mXkLR9DH0X4K7RyrKWny9ZjXmqS6fGTc/HwR3/usCPAD4VRVnN1Hi6RMjzbu0A8OZFvRn6Xlq3lg0otsLNqS2ni6OUelqNuco6OD5hwrBFYzWE0gTGBET7pzHh/Y0PZz3sV2pigPzjwY9ppoB0uyZfP+XCDgh5/uBSP622/NnDpG/yNQtrPtQPvlOvvuYN8t1paAX1E+XHkcPPAqPPdhLLEDxmpKWje8HlPSuoGO8f0EBclbYxh6GCbAYAC26rFKM4bah3nsU0v3D5apnAgvBt5ZV25Hbk/3i5n8Sx5p8pyOjSxHvIhlP8bRv7e9qe7ocQEGD7yZ4aHFlPVvTNxpZPBGVRn8v1Ps/VSmjqFNzn+28OyH8Len4GdRmDW57W/ytCx7EfSWF+e4DWM1UU7O11yPIravnjxPO7sWvWJBm1Bqd4nPVutdaiIPe7N/p9xREq9rS7TtDJal09Eh+Ojl4xm9fgKNgRae3fZdO1jmYc4J6uYer3
NOshjJnzM05/g2Q94HEPgWrAawWiCcxW/v61K82Bld9DfL7Ex80er32XdBjR9iz43jh9qP5BnaTsBv78u529hYQg7aGbDvTK4qte/Abut6xLJgaLW9nyqQMZ8oUU7O7ybXY4r8cv+5rqMFpYVw8C4wYzs7ed06OPz/Kcobuj9mob71ZPv+XenY2mCXuatr68j18aNjK8ZE9w+WqQy9nJeurtuYkZ2Jti52nWU1joVl2Xc8bTsQdtoWth8OA/rY+3hZ1gZgg/um7AY9lv2L97EzUh/tzAksey+3+LxWWUK7FmJjhlhQWkrYZxHtA8LaTIRCIgTU4zZTaY7PLX1A7GAHxAjH3mpZbAfHgoshPNAO4Fj1EBoNwS8hUgOiBETAfntmeDgEvrffUxABogXQPDYMzQHCIyDSB3wNdhuEIbiiGeHzES0GQZj46zctIu0zFrrFBclpQ5o19b0G8nV9ZmLOEfbLEooes9886izuS7ffUE/Fq51xHnM5v4vtTFGB/ZbhPcbbdiY/AL1L7R9ncjW4k3Xk+HH8fBl8vQJWrG/7jzJgD736lkrCIkCEMvzUYdHcmqmqkIm0dNcWFmEqQfjx0YBPbEHgJ0Illohg0URU2PvCWjQhRD4+GvGJ1juJBQVEaD0olmjBzyZXOS1asERL7LPjV5zWAq315Ou7G8w5WPYesafsD/vv1LqWqSp2kctgMLjTHQMduYrqXMgvKtC9HbU97eryejpbGTzNxfBAm4iKQhDuBymKQNAEWISoIYTf3fmS87Tfo/hFLTFPCnsJXgb49G050iwryojqdRw0pYUpo+39PgLxvZUMGWNzSwC/XxAM5GYkYHhfqKopYNOGAL0JIUQzjWE/RYGI57FSWlPB98ffSF19MQF/hJc2/IMvrRksiwzsUNmT6A3siv3YpEXy2ymHYQeEW2JpedgvIQiS0NEaACIfGBtLiwA+gW/nENFgXmu75a1d5vVdRySv2J5Io81YsQ4CrMVHk/3ZWk2LqCPCs8AWVAh8NIrtaGCQp0BXWKwAfqRJbENDXCAPc05URInyGcTmq0YxjgaK1AtDR5oQfkKiPyLd3AZY0SjVr72Ib93GlLWRQYMgjZ2JrxIjCIKEqSLJDshtOf860sP0RmA/w+cTdfhj49EnGrFoJCqKEBRo2wpTGcuPEBBriFKGRYgx/b/jzANDjB8S2wfRnGyDgjGD4A+nQUlh++pvbCjl8UXX82OkkigF+GjCIhueubOIUoTAwiKEj2YEPqIUYRHFIkw09oukRRhBIFEujiBIlHxHixF8sb1Xdfisenw0AOCnFj91BK3lFPE+UXphiWYK+DIWhBNYQtDCQPsa9rgeymcRLQwhIsrwWS2e5pyw6I1FBD8b8bOJkBhAlEICrNfWiYoCQvTVzl8+mgmwCiuWIUSQvEAT5x3yKTO3bzE3ghgMhszRlsBTZ8igSnfbh609+7J1VZAs2yfvDMjX7YNlP0Rvxoq6e3al0UaWB64nLxrhh+hvWRspy1DvUfzWZpw/q0VEL1qjAHp8vihHTH6OS2b/mfIizOKlA2mO+An6osTvKso1ivLhhMOLeOzLwVQUr+OHzRt47JtBnLX9V6ypL+DhxYOoyLeDQD4LJlRtpCgQ4b1VlWxbuZnt+myCQJhI1RrCgVGEIyWc1/90hJCf6fUSOlH91Oxl8Ar7ssiLl3f0YdF652Seo488qY/4OiWQXDeivPztMo0VFY6k1nrNjGptQ0BLtJEI/0UfLAuyJnoOKyLenoNeYT3G8sCDrOBEVkS291QHoDDaTFjMB35AEGBt9ExWRIZ5rC3fiqA+j8HNaxjww3PKmub3KjUroldhRd02PLQ3zqv0/Y0tYjp10X3xYgdkhDM6TMS+AxLsoIPVRFQUJoJp6vqBRL8WLQgCFOU1csER5zCm3w/GzhhcCQagvCQ1PSrgue/7sb4xn7F9amkI+RHCYoeaDfxnSX+GldYzuWYDBcEmAsXridb1gYhFlGJHkF
l3+5KKrbAzSeVT+xAEiVCUyBMIRMIAWbFr0L6W7HK6noJEyFfI1dpnRJSn5gvBBo4BfFiEyLN+JCSqYkEmQQRvPqRNlIC1jrDoje3ue53BW8+BHRwMJr6nr+N2G1ry8a8uWM+O255FXnCdR7kMBkO3J9OPUZo7rTqWHvDYa4cEy5YvX84vf/lLnn76aRoaGhg5ciT33HMPO+64IwBCCC6//HL+9Kc/sWnTJqZNm8Ydd9zBqFGjEm1s2LCBs846iyeffBKfz8dhhx3GzTffTEmJwlNzIeIrA5/eoQHYZJXxkyFXYCFYF+yV7ka0NuAjTPsCb5OHfsXlh/yF4vymTAlj0NCnUPdMXI5gwdSBG9hlwAa2hAJ8sqaCloiPbzeV8Nm6cm54bwwRx87shYEwfkuwJRRk5pCV/GLHL/lhcxHNvd+AXh9CpBARDUC4l914qBSisUV2S+zOSLDTwtL1GA1CVLdYt2LtuF1gAqwwWNll3AI0EBRp7oDwhcDvLeB6X+WePF46lTV5JeBrSV8hjtWSdGyErwX8bajvgUDLBnwtTSlP8mSbKcwmO9Pk2wZ8sWeHhN/+p2BFZJwj4OX1iKoW+AABhOUHBBGfBc47ynzN9njUNulHRAqwrBbm7/1PRtWsyL4TbMgJWiI+nlnSj0vemERdS5A8fwSERWEgzDaVm/lgdSVjKms5a/uvWVpXzI/Fb8Hgj+1fMSJFEM2DlnL7r/BDcx/bBkQLYvbCQTRgp6c8i4hte+LXhzz2RSxQlHRdiqQ/STZHWMllEnkCiIKvg+6GE76YHradFPhpZoSjgO6ZRwWWwPZBq2OHyWM9Ydl1hUUiOOjJHsu3lMWPr+rZUZuJg76mqtdGb3J1EdlkZzxjggOGXEe+M8yM6eymm5+fjAfLNm7cyLRp09hzzz15+umnqaqqYvHixVQ47uC4/vrr+cMf/sB9993HsGHDuPTSS5k1axZffPEFBQW2w//Tn/6UlStX8vzzzxMKhTjhhBM45ZRTuP/++9sm0KCHodB9V2ohBDTFglIFBV0eIfVZgp/t9Q7Fee638BsMTiwLeuWF2W3AWqb2X8vj3w7kl69tnxQoQ0BjuPWyf2lZX177sRohLEJRH/ib7YW2BbAuqZ6iR1Jejyj82DvqKxAWhD1sRhJoACuavlwnUtGwhcpFG6RUwatjl7BuSwX5YT8Nw76ECm/BbREKIUIhyM9v26sFI2FYUhd7CjMC/Z+B0oq01dpCXt1yrIgdgMvGIBlkq52J3X0SyY8teCWieYhwCYTKIepPyUsJPMsE62KBZCBYC1YIgpvtNPmasUL2+NBiL8rz/FH2GP88AY9BXoNBZsWWQi59cyJ1LfaYb4n4Y3fj5vHeqt4AfL6+nNNf2AkRn00CjeBvBCs2pybZF8cdSvIvlwn7IgVfBBApbg30BOqT60VigWQ5+Bb/MUj4Y9dSyG4/5Nj809cCgS0k7J2vxZa/I4gUkLd5PBC1j2OojDxRQAsh8DfYMhSsto+dKyI2X7T1urbsY+Vvhk3j7YCmFYXSL1KPqdf24sc3RcQgs3d6gkAWb4SZdXZma/GytumIRa8Jdhi2BjN2DF2MJURmR+GvfvUr3nzzTV5//XVlvhCC/v37c8EFF3DhhRcCUFtbS01NDffeey9HH300X375JWPHjuW9995L/HrzzDPPcMABB/Djjz/Sv3//lHabm5tpbm69Q6iuro5BgwYx/u/T+Ly+T6uTBt7u4NetEp0+mnzk5GeVVN/lvhV91BQ18Z9DX6aqKMfveDJ0KWsb8tn/kT1Z06DY3yRDY1X71IyqHR1Zfj0iYFhDPR+/+DKloeQ7Cs7bbjw3jxyJJSyilti6p4Y86FgYjvDBSy8zZvMWmvw+puw5nU/LyjJ6Hss+WcmI299RntYI8DH2nF1a2o4dvjNEttkZ/x/2J5Iv3VHZlnMuVJkpWiW3sZXnvF9xI+dP/pKfjPqRPH92BacNuUNT2MdPHt+Dz9eXGzuj++
5VR2eb8TkhGruzywq1ZnSwnWmdj+IFo2r5t/I8Xr3bJxw/bklKsc11MK7M2Jk4cTvT1cfDYDB0E1SB864ISjrl6Kj+NT8S1AFleLMzGX8R/BNPPMGOO+7IEUccQXV1Ndtvvz1/+tOfEvlLlixh1apVzJw5M5FWVlbGzjvvzNtvvw3A22+/TXl5ecKwAMycOROfz8c777yj7HfBggWUlZUl/g0aNAiAO2a+y5jeda0FnQbd8QNmIi/+T0W8jiV9dvoUKuQ6Qqrj6K8iv4VTt1tM7wITKDNsHZUFLYyurO2wsZpA5Ryr2lGR5dejFwQQ9blUyqCOFsLbpL0151ESTf6cDWSbnTlvh6/wJT3GRdvOuUVsQRzLUH3O4LiuLmri9r3f48htl5lAmWGrKPBH2W/YSvuLsTPusnmxM075LGE/Zh9/rNTtcciM6yjAisb+ZVBHt2BalpFtdsZgMBgMnU/Gg2Xfffdd4nn9Z599ltNPP52zzz6b++67D4BVq1YBUFNTk1SvpqYmkbdq1Sqqq6uT8gOBAJWVlYkyMhdffDG1tbWJfz/88AMAFQWh1oCBzpnQOWGyg6HKj+NcVcpOlOrXNyex9B2qN/DAgW9w0oRv8WX8zBh6Gn5LMKXv+g4Zq8pfmJ3tyuUtKT8HrkfPTn0n6lgRCtGnWQqkZ/o8alR0ye50ss3OzBn1A/2LG7N/XAPbVNTx1/3fYoeaDV2940D2IrB/5MyWAZ/NWLDHgDX2S3KMneneOjrT5DZUMsn6SDpGhcWWlux9z1i22RmDwWAwdD4Zt1LRaJQdd9yRa6+9FoDtt9+ezz77jDvvvJO5c+dmursE+fn55MuPwWDb5vEVjTzS0rttDfqa9Zu4WmES+8MI6Vd52dGQHSid4wRM6bc++S44Q4JldUX8b215Svq6xgLWNTrPu+CA4SsYa44jWLBNxWYsBMK5KlY50/HvHseq8ldkN2QnXk6XFwFyHZU8sgMul3XWUckTF14Awoe9CbSGSBTlmz/CxdDcxrllK/A11+MXVqtKLeXQUpnZTkL23jTy2i2b4irZZmfK80McNGwtd3w4IXW/MmHF9iPzqceQE0vY9sXfSPK+fdHW/clU10obrt2fjvmesZV12XVCswQh4PvaYu5fNJTVDQXMGrqSoC/KpOqNVBc2m2OmoTgYJuCL2vtexunRdiYmcCT20oFo0N6XEFJfjONvbt2DLBpsfVFBNLY3W3CL/dnfpH6pjhW2y6iI5iW/QMYK2/3F5RABu91IAYl9xYQF4VISc040D8JF9v5lLRXp91aM42uBwObktOAW27eOEapfRii8GvI66IUJW0m22RlDD0D+Bcvs12XoiXj9JbeTro+MB8v69evH2LFjk9LGjBnDww8/DEDfvn0BWL16Nf369UuUWb16NZMmTUqUWbNmTVIb4XCYDRs2JOp7xoKxJXn4fzyUSLQNt2v5Qmg3Gfc1246ABQRim6gGa6FgHeRtsB0E+TyrfvmT8wxa3lpRxUWvba/OlBzx2pY8frvr/8wxBYaU1pPvj9IU9qvHn+oYeR2rujx5UeLlPKiCWrrFkyyHSm6wFwKRAtu5j+RDqCz2Zs/YosW58XA0z+UtnkDzGhDPA9Jb1jZNhGU/cdMso2yINjO/71DyRJhe0QaWr98TNmX4jVrr3gGeTzkd2eSyZZudsSwYRD9YdmTq5v1YpLwIw7Ux+daMWFqwDvI22ram+HsoWNUuO1OW32LmRg1vrejD2S/tyNpGe2Pup5f0Z/vqjfx0zBL2HryaupYgZfktFAYi9lRnjiMAFQUtFAQiNIYCPczOWLbdaKqG5kr75R3CZ9uceJAJn21nEi8XkOYDSwDR1rx4h/HAuhWNpUVT6yba0G2SLylsRVsf6RR+uy1fyA7GJdp19GFFFH2qDrjuBKiijq3c8cwUmrY8xll7/SMr56RsszOGbo7KoJiXIxi6M7qxnWVjPuPBsmnTpvHVV18lpX399dcMGTIEgGHDht
G3b19efPHFhDGpq6vjnXfe4fTTTwdg6tSpbNq0iQ8++IDJkycD8NJLLxGNRtl5553bLNOAijUUBVvY3FRCeu8nlhdxOzRFjjZqWutawn5j0YAn7aAZqc1qHYLsGhedg4CosG1B4jBpjo/r4ZF8saZwG94u2M3pXdhMYTBMUyR2THS/rst54D5WVQsVtwWGDtWCSncnmFt+vL6woGEQbBlhL2DCJakLgfYQf5tailzOO9K8LBjkNK8Hyy5XbwW4v3yv5DqZnjtix8o5RNoiaWeQjXbGwoJIHukjAGmOpFBcSELYdzA297bTN4+CQQ/ad4WgaDZbTlQ2ICAsLKLCIuiL2odGcXyEgH8sGpoIlAG0RHx8X1fM798fyw3vjaWuJUjvgmYGlDRy8MgfOXKbZeZYA0FflKJAhI3O8det7YwP6odA3WhorrLvMBZO4dqAsEjdEcV57cf9GV9qXqJ4QJ0uHzxB649FcZK+SwdAe7e13I9Ob7eIqWDd5kqe+HhPfrbLv6kozr6nAbLRzhi6MVkWIDAYDDYZD5add9557Lrrrlx77bUceeSRvPvuu/zxj3/kj3/8IwCWZXHuuedy9dVXM2rUqMSrlvv378+cOXMA+5eb/fbbj5///OfceeedhEIhzjzzTI4++mjlm2PSUVWykb5l62PBsjjtWVm4eWsxh8TXktqeXFzl0/QQh1sI2NwS4Pu6Et5cXsW7q3rjtwT5/gizhq7k4JHLlfVaIh7uzIgdw1EV2ed0dRXFgTAV+SE2NjnumnIba17HqlDUUX338uOzTgYVugVYPL1xAKzcL/YrvltBr0J5Edrr7aKZnnOcZWQyp6O8/s0GstHOjOn/LcFAmFDY+Rhmps95zMb0+tre+NuJl2u3hyAErG/M56uNpby/upLXfqimMeJnZPlmfjXlC/qXNCrrJD/SDwKLVfWFSWmbmvP4trYXNcWNdrDMQElemP4ljSzfUtSa2F3tDMDGSbB+SuyHBWdBncBtnYNVB6W72Rlbx69XDeWvbx3EOTMXZt18lY12xmAwGAydS8aDZTvttBOPPvooF198MVdddRXDhg3jpptu4qc//WmizEUXXUR9fT2nnHIKmzZtYrfdduOZZ56hoKD1F92FCxdy5plnsvfee+Pz+TjssMP4wx/+0C6ZCvOaGVm9jMWrh+DuRMiGX06T8i0BviZ7X4eCVVD0IxSusB/T1P2i6vwr+SdRQfdZ4Ahblaawnw1NeSzfUsSXG0p5Z2UfPl1Xxqr6Qpoj/iSntDw/xMEjlqf+aCrgi/VlSW3rfEnLEgwvq+8exzAD5PmjlOSF9AV0v6KnGasp41teXKjOj26B5LYmkC/DdHIhFL/wuy0AVKgEckMnlOqg6eo6+3H7NT6dDFujo7petgTInGSjnRlZ/QMDK1azZO1AMm9nGqFwFRSuhMLlkL+OxJvx2njtNoSyd0PtNiMgIiwaw37WN+Xx1YZS3l/dm9X1Bbyzqjer6wuJCCtxDD5fV8asoSvtlzE4DrMQsL4p3w72eJiffD7BjEHJj1b1ZAKWYFLVRt5bpdm/sTvZGYH95IAvZD/er+04HenmYF10sDvZGQsh4LnPd+W0Gf8iP+jiq3QB2WhnDAaDwdC5dIjXfOCBB3LggQdq8y3L4qqrruKqq67SlqmsrOT+++/PiDwWgpE1y+DTtkaiHF6ZFbE3Rg3WQXCT7Szlr7f3kPE32vkqh8yJzreJfV9aW9ItYmW1zUEeXTyQN1dUsaS2hFX1BTSEAoTjj8Kpfny1YPGmXoSiFkF/ske9uSXAOyt7pzqsCl8yzx9lYElDR6iVkwR8gkElDfxvbUXbKqYZq0nlAKWfni4W43IeU2TQ+dxy2cKVUPMKbJoAzX1I3WtFFsYtzSmkG17uCvBKujqq1V+6wODW67g1GnUU2WZnivIaqSyujQXL2kLs6FrCvmvM1wyBeihaDoEtMXtTaz/irwuQOUlz7S7e2Cv7TmY7WFpXxD8XDeHTde
V8X1fM+sZ8GkIBonHFFHZGYHHHx6OY2m8dlQX23m1RAbd+tA3//m4AK7YUJtfRzE8F/gjjetfm/DHMGBaU57ekLyeTq3am+Hvo/xSs29V+DDPqdKUzOQfLEb3uaGcETaE8om3Z17ETyTY7Y8hxzB5kBkPO0Y1+YnbBgvEDvsGyBEL5NjJHQMwXsveBCWyJLVTq7MBYXh3462NvEoom2k1pxukPuPkyirny+7pimiJ+CgO6zVpzg6eX9Ofyt7ZLPdYq/8lxfD5bV8aiDaVM6FObOHYtUR+PfTOIpXUl6mMrtdcrL0xlYevblno6FuCzFINtK8dqoo7cnq6eamGC4rPcfrqFTIpMUej1FZR8C001sGW4/WhmqNTxaKZKSbeL2UJYFhH8CKs1XqGvq1JYxu22hnS4LU5UCxudnKpbL9QY1y49PkvQr2wd/7+9Nw+Torz3vr/VM9PLbD0Mw2wsAyKiIKjB44hRc3KcR1FjNPo+RwzPcYmRLHIS36hRj3FJruRA9ByTmMeDOXlVfI5GY/KqOYmGBEFCVIKCLAKyOjAgswBD9+wzvdzPH9XdU3X3XdU9MEtX9/dzXX11973V71tV3d9f31VdlfKDFPeZ/C797nAFQX2/9TYDnqN6nSuMkfKZxo5iRJF8lSQnEYlqeGrzTLy8uy712UqAaf18dKwMj7w3Fw9d+BHG+/rROVCAPzbW4uN2f9rfT2WeAYynz6Qma30GgLcVmPjfQH8l0HmGfqfI/vGGs81O5Ts4nb7GMif6zGC7C6dvg7eAnyeS5fCOMIQ4ktyYLANwemUTvO4+/W5NrpB+5D6v13CmWFA/cp/Xq0+WaREM/riRBrM7mGeVuBtfqw7aAWju9qK9142JJcnXU3ESoagGEReVbnIsgK5QAb677jxcWHMM0IDeUD52tZdi53G//lcaGUWROy8CX56zJxtHjVPYVy3HMebBqQ5Oq5ZrF1+8TB5HdcBdC+t/V/N9Coh8fbKsvxLongT0V+kXZDbenUwZ1GDwR/LH47q672P6wBGURbpx24mVmNN3QApA9WNEFajd2QJyWTpnDqiWl84PI/uNO9SfVrmOSxO46/L/g3c+OROB3tjNZOI+423Vzxoz+UzsjOQ4o+QzB4JFCPYXYJw3s/7yNBTa+9xYc6hqsGAI308CGn63fyI+aC3H1NJutHZ70dhRPDiOjKJsdkUQxQXhk44/G7E8yJfVPhPRJ7m9zXpBqBQYKI8dqJmu3xgmcYOZ1D6jnsmTA8gunwGAuZP2JOYRwhGgoytFKIQ4EZ5RRjIF477ISdyU5Mxk2YTSYxg3/bfo7cmLTYaFoN+S27jDWHROlWek6i/3EdL7GF2hAry0ayq+OmcfStxh5LnG/otVCD1M1xA+S5GoZqnR9F4x5o7jZdhxrCx17qcamygpcSt+1J3ivmq7HVX5t12+nM7na6jtjX20MOBp188QLdml3/0rXKyfBdBXDYRKgFBZ7M6ZxhsDDAY/oBVgXdEcrCuaC0DAIwZQ37ML+z21NitCtRLklaFaIUMRrkL1IyjVSpN//JgjlV8TBRpQUnQUYvL/D/R7AAj9wEz8dBW7iQBY1I2AzxzsKMJjH8zCP57RhLkTAhnhM0PFlx9BcUEYbSf9/aThSGchjhgvSD+E76fmbh9CUc2R626kmFLaDQ1i8EAZkFs+A6FPhruDQFEjUL5JP8ssXAz01sYuC5AP9I8DwqWxCTSr72ArcXKQqsCc4jP664qSAC46fQsggOYTwIf7gAOH0lg8IYSQU0cITpilIGcmy0o8vZha+QmOfDrBOtewwirPsErG5F+XdomcoW1UaHhqywz8ds8UzK4I4HsXbsc0f3f6Ik+W2IRYVGjoCuWjucuHg51F2HHMj72BEvyPKS24/ow0sxcB7A+WWGpMKwdL58chJ8rSZmJxz7Dvq2O6HU/185g3AOS16xNoJXtjbQv0HzV9FfqZZ30TgIEyIOqRzgrQB1s2YaGFGDkgK5FWU0
92R+rtftCk+tE1lBkac4TxZ37kUlNYEMa4wk4EQ2L09+s0P7sCGl78eBpe3TsZ1884hAfrd6BYNaE+EsS8JhzVEOh3Y1+gBHtPlEDTBC6va0FVUV9aY2xpG4dDnYVj9v3U3udGKOKCNz+axoJyA19+JPlyPLnqM0Ds79Qh/a/W3pbB9lGPPnkW8usTaL3VQKRQuu6ZnQ9ki8/oMUwpb0a1/xgOHgUe+S/gyHFgUpnFMIQQQoYfnvVoS85MluVpAmeWd+C9IxPUDeQkyOo1kJzowaYtpHJVTmFYdkS4cKTLh+ZuL/7nGU0nNVk2EHGhayAf7rwo+iPJV6bpj+h3qGzp9qKtx4tDnUU40uXD0V4PDgSLcazXg/6IK3GEeNb4YNrJZkRo+gWkbTQCsE6eZYaY5Lpd0SGdBZf1jOC+mmirKrcacyhxj+rnMaTfrKPgRGwCzQVEfMCAH+ir1M9AGyg3nH0GaWC7Hyp2K0eeikoVuEqEvKx0+qt+LAllC06SpU+pO4Qrpx3B8q1nqBtkiM8A+t/cX9o1FQumNuNzQ72zowB6wnnoDechKjQc7fUgKjR48yIY7xtAiTuEfE1AAOgO5eNIlw9Hun1499MJ2BcoQaDfjaaOQgT63QhFXXC7ojhnQiCtybKBqIYVO05DKOpKqXGkvp+m+7vgdfi1RYebmqJeuF1R9IXz9AL6TLJGQL/ubVHj4DhRj35ZgFAp0FszeI3NiFcxcDb5jD5WbdlRnOiM4BdvAE1H9ZrjwRQhETIa8GL8hBDk0GQZtNikDwT0q3TbtLVKkmzGNj2rDrRZjWGRKApo2NRajgVTm4eU/AkB/OzDmXh5Vx2K3WF0DsT/VjZIOKqhO5SPSFQbvHOYDY1DuEtnW48He06UmgvtkmE70qmXgqoq7OOPGAnT+hiBfXUktqNt/Wh8HhHVrzeV3w34jugVIl//y+bAeKAn9reakD929lm6K20owRhJ5wdKqpWk+tFj1d/cyjhxRqzRNOC8yhPQELuZTKbt19LyokLD7vZSfG5S25B8pjOUj9v/dCH2BUogBNAVykdUaChwRVHiDuE0fxdqivvQF3ZhV7sfR7p9GIi4ELX4nISFhrYej7JO5livF/sCJWlrHO7vJw0C1884hAL+BdPEOO8ASj0hfbKMPmM/hrE+r19/eNr1u2yKPCDqBXom6ZcKiJ+BFi4CLO8Y6USf0dtFohqe+zOwYfdg0+Ao/KGCEEIISYfcmSwDcG7lCRTmR9ATjslWHfSKv7c6oGd8hlRv7K8a3+q9RZs1TVW489w9KEv3QswC2B8oxku76nCs14ujvRbLtMqFLDTuOOZHXzgPvoIUk1AC+OOBWhzvdZvHtNGoGiNRJ69n1XYy9HHnRdFQ16K++2MO84XTPsWvd9dhT3tsEnME9tUkTmE7ZubnUUA/+yz2983i2Nln4UL975r9lfrfaQbGAZFiIJqnCF7140NVp3qt+uWY6sdPql9/dsu1/s1K7Kkr7YYnL4q+SGwfyOj9GnizsRb/a1YjClN9vxti2Ha0DBtbyxGOn90VGy8UdaEnnI/Wbt+QfCYKDTuPl6FhSmvKnWyCrw8Vvn58Eigek++n8YUDuKDmOD8MEqWxSdK2bq9eQJ+xf2/VRosAWjdQujtWrukHZAbKgP4J+s0D+qqAsA8QBbEDNU70Gf397pbJ6MnzQaDXsA5SLJKQ0YBnlRFC4Oy7xw+ZqaVdOK/yhP0RyPhpFJr0Po7xF6ScoKn6aDCPKbe3Wo4A9gVKsGLHafoF81Mh9L/FLHt/No71eodV475ACbYcHWc/JoCWHi9WbD8NAlraGpW5mJxACpj7KhJXTRP48lkH8JWz9/M6hRI1RX24cebBEd1Xh2s7JpFBn0dTHBoARPXr0RQdBsZ9CNT+EZjyG2Dyb4HqNUDZdv0uiHl90G8mohKXakrK6hei/ItO9TqdD4JqhZlHEkjedMSa6WVdmFfV7pj9etuxMjy/4zT0h1OkA0I/K/
kvhyvx7xvP0ifKhksjgFUHqxHoL0g53rtHJuDj9tKx+X7SBG468wAmFfdYx5mjFLgELq9rps8Mp0YtVpjXB/haAP9HQNVbwJRXgLpXgNo3AP9OoPAQkN9jITJTfUYX23hsCnYdvTR5+xFCCCEZQE6dWeZ2Cfw/ZzThvSMVEFYGr0qqjGVyjiAnYla/PVUH6eQcRkrWBDQ8teUM+PIjuHX2J/DkRU0hRISG9j43GoPF2NhSjnWHK7GhpSJ1ojFEjaGoC499cBb+v8s3YLx3IElDf8SF/cFiPLHxLBzoKDaPm0Kjch1aJZg28U8u6cFdn9mFgjxmWUlosb9ijuC+OlzbURV7pnweU2sUgGsA8BzXHyW7AOTpZwCEyvQzz/piZ5+FCwERP/ssHawCMP7yMoqUV4AsBBZ91N+Lqp9aRI3bFcUlk9rwrtX1MYGM2q/Dse/3T7t8eLB+B3z5EQjo3+sDkTwc6/Vg5/FSbGwdj53H/dh6tAy94fxh/+xuO1qGp7fOwF3zdsGbF020jQr9b567T5Tit3umYO2hKnQOSJNqo+gzt87+hAdkVGjA7IogXJrQL+9Anxk5jVoIKAgBBR1AYZNeEfHFbhZQpD/3TtTvxpmxPqP3EcKF49H/hULXeuRpHUOIlRBCCBl5cmqyDBpwyaQ2VPj6cbTXq/7lJx88k3MAOUcwjJ1UpkqeVH2sEicA/eE8/Pj9WVh7qApzKgLQNIEjXYXoHMhHVygfnwRLEOgvGDzKb5XTnIpGAXzYWo5bV87H/Npj0AwD9IbzsaWtDHtOlOp/b5VzpDQ02sYql1to7I+4oPHvl5YI47WCRmhfNbW1GsO4vJHYV+V4VWONlkYA0CL62WcFXYDvsN4g6tavfdZfqd88oH+CfkHnqBuD1z5T/dCwEi63tfv1ZdXHuAxAv7AtIGB5WIFYoQF/V30cBa6ofhF6B+zXkagLL3w8DfsDJZhTEUBbrxc7jvnRHcpHe58bveG8wT1hhD67Qmj45Uen44OW8fjC9E9R5hlAW48X77eMx+72UrT2eNFv/GvrGPjMeG8/xnkHLAYgUdWkkhH6zAj5jNDPLCv5RH/v367fYTNcqN+YpnO6fnOaSKE+qZYRPjM4YaYhBCAv9bojhBBCRpncmiwDcLzXg+6QJFuVUAHWSYxqTsYqL1DVWb22SKDCURfe/XSC+UwFq4ROTlbl9iepUUDD1rZx2Hp0XJK8pNjlZaehUZkkDUFjbzgfXQMFGJfu9d1yjIMdRYNvRnBfPdXtaGqfoZ/HU9Mo9Is5u/oBzzGgdCcg8vQfMQPjBq9Hk3T2mdUPFbtfWKoVY/crZDD4ngnT0XLhl+E71gjvsYPwtTehoLsdoq8bELyBRir0u0RKhRm+X0ej2pj7TCjqwget4/FB6/gkeUmxy8sehe+nvkgewlEN+by4v5IP28oHb+KQsd/BUvsM/TyekkZN6DenyesGPEeBkj3Q7/DsBnomx25OUwb0TdC9x3LybGR9Jt42JCbiSPSH8GAvSl2rkCcOAeBZZoQQQoaB+PUHT/JvAbk1WSaAtw9VDV7gP06qdScfLIPitd0YcrKTqtx80M0+TznVcnmZqljSGWOMNfaF89ApT4ISHQHzusng7ZgyPuMyVbGkM0amadQigKsTKOgc/EtN1K2fbdZfof91sy929lnigs7pIv8Isjvyr78WmgsRXykCp1+EwIyLAQho4RDy+zpR0N4E/Oa+ISw/BxHABy3jEZHvXJdr+7WKdDTaMcYa23s96A3nw5vPs8tUdIfyzevcimzZVx2jUQCIAPm9QOkeQOzRKyO+2IGZEt1ruqfo3jPkv28O3WfMdRr6xQz0YwY6I/8DWqQVwFeGsHxCCCEkBcabdnR0AH5/Wt1yanahP+rCqoM1+hu7I4fGZMV41M7O/63GkVElQlbtrZanOrqYqo9qWVmkMSw0tHT7MGs8j0bKRETszDLVtsqw7WgZWxbtq5axJeqFfuaZ96h+VkDpx/
oPl3CxdPZZWeysAJdhEFmcvODUQXdNmoO+8VMAzYX4Shf5boSKKxAq8IHYExYa/nakQn/D/TqrNAqAN0izQgAd/QX0GadohADyeoDCHr2sBMD4At1rQiVA5xn6c8ivP6LGnwua4vXQfMaqvYAbAuUghBBCTplhSNpyarLsk0AxPj5eap/wqJD9PI5VgjLURMaYmKn6W42pGt9Yl0Mao0JDoN+tGJSEoi4Ejesmg7djWjh8X00LOXYtAhQE9UfRAb0i6tF/xPRNiF37rBIIFetnBlgGaRxY/iGjP0fdhRhwF0p9hhJ8btPc7cPeQAl9RoXDNXaGCtDc7UO5j2eWyUQFcKizcLAgg7djWjh8X02LJJ8JASIEePr0AzVAbPKsDOgvB6JeoLtO9xuRDwiX/kgKJLXPqPvQZwghhGQWuTNZJoANzRXmv2DKiUo6Pi0foVNNWKqSFmOeoGpjlTfICRcU5ca+cttc0agBjUHDdblIgv5IHk70uZ2xHY112bqvGutORmO80NUHePsAb2tMe/7gBZ3jf90cGA9EvID8l0ATKmFWvwBJKrYf9euftzjcr9U4UGNfOA9/bKzF7PFBfjQkeiN5ONRZ5IjtaKrL0n3VVHfSPgNAiwze4VkA8O/Q/74ZzdfPdu44Uz/jOeLV31t6DX2GEEKI88iZybKw0PBWU7X1QS5AnQAZsfL5VMh95L6qsVQH2lKNI/fNMY1tPV4IgZO9fl/W0trjRdDq7zEyGbAdc2FfHRGNWhhwd+iP+NlnEW/s7LMqffKsf4L+gyZaIAVjJ4CkS1QAbx+qhoDG/doOB2sMDhTwk6Ig0O/WJ4npM+nF5liNUf3mAYB+prPviN4hmg/01QC9NfrkWX+F/jdO+gwhhBAHkzOTZZ92FmJrW1myV6t8HNJrQJ3AqBKMVAmQql5ebvy9HIsqZgF1rDmoce2hKjR3+1Bb3AsSQwB/PlCTfEZlBm/HXNhXR1wjAP2OaL1AXi/gbYn1iZ0N0F+u/7DpmmY4G8AYoIxKGJE51uvB2sOV+hvu11mpcWVjDRbP2YfJpT0gMQSwpqkKgf4CU1kmb8dc2FdHzWcgANcAUHQQKDyol0cK9TOcB/xAz5TY2WfGGwfQZwghhGQ+uTFZJoC1hysRHCiwTzbkZECVNADmZASKsqEkIrCoU9XLyImRXJ5jGpu7ffjtnsn41nl70hs7Bzja68Fv9kwxF2b4dlT2UcXj4H1V2UcVz3Br1MKAO6A/ij8Bxn0YO/OsUj/7rH8CEC6KnXmmWjA/WJYI4MWPp6K125tZ21wmG/drmRHU2NrjxcoDNfjqnP08izlGe78b/7VzGoRx5WX4dlT2UcXj4H1V2UcVz0hozO/RPUYAGLdVnygLFwGdM/VLBQz49Qm1xI0DjAPyg0UIISQzyInJssNdPvxy2+kwGbBs/IB1IgPYHzlUJSIy6RyBVPVVHZ20SrpUseeSRgH8atdUXH3aEUz3dzHfEsAvPzpdv5abk7ajXWzZsq/axTaaGvN79UfizLOC2JlnFYN/3Rwo028mYHvdsxxHAI0dRfjNnrrkCYNM2+aqvtm2X6v6DpvPaPiPLWfg7Iog5tcco88I4JXdU7C7vdRZ29EutmzZV+1iG1WN0UGv8RzTC6MFgzcOCPn1AzV9lbEb1MgCCCGEkLEh6yfLQhHg37echabO2MXfVcav8mXVUTcZY6KjOpon18nLVy1PfrZLaKzIYY1Hugrx0Ltz8b8v24hy70DyEVOnkmr7KIgC+t1foTluO1ou06re2J8aT16jKwS4T+iP4r0AXPrFnAf8+l9qgqUWg+U2vWEXHn1/Lg7H7wbopG2eC/v1MGs83ufB/X89F/9nwXrUlXY7319OgSiAdz+dgMR1+gDHbEfLZVrVG/tT48lpBABNAHkDQF4b4GkbHCBcCIRL6DOEEEIyhqyfLFt1oAa/3z/JvpFdMmKVKKRzxE2uszsCJy9DVa96PZSjjVZkmcZ3Pq3ED9bPwe
1n78e2Y2WoLerB56e02ffNcFp6vPivndPQ1FGEqADOGt+Bb567B2md6+PQ7agky/ZVJRmlMXYx5/xu/ULO3rDNQnOXF3ZOxV/i1yqzwjHbXPE66/ZrxeshamwMFuPWP12IO8/dg7kTAqgu7EWpJwc/HwKICs2x21FJlu2rSjJOowAKuvUHDtkslBDieIzXMBBWs/aEZAZZ/7+af9s4CwORmMz451GVHBhfq46WyUfiUiUQqvpUY8l9jPFaxS6PRY0AgFf3TsaXfncJHnznHKyO3wU1DUIRDfsDxXi/eTy6BsZuLjkU0dAXzkM4qouvKuzDV+fsx6WT2nCsz4OVB2oQith/fEMRF473eRy9HXNhX3WMRg3QL+ZMZH750emIxj6rWbXN5TGzcb+WxxyCxv0nSnDPXz6D616/FK/vm5y2z2QTfZE8HOnyOXo75sK+6jiNhJDsRYjBSTJe/JNkOFl/Zllrjxfwxt7YHfmCoU5OKox1ctuhIB99sxrLLtFQxQSpnhoBDRiI6Hdd2n2iFGGhIT/VD30B/HzzTDy34zSEoy784UtrUezusu8zQjy7fTp+vbsOZ5Z34Gtz9+LsiiDGeQbwP89owhdPP4zvvXMOOgcK4MnvtxwjFHWho7/A8dsxF/ZVR2kkSXSHCgDjTd6ybZvnwn59khqjUQ1dAwV4de9k3DjzIDz50ZSxRxE7UjmWvxEE8EmwGDuO+wEAJe4Qzh4fRFhoiEQ1TCxJ767SAkBYuBy/HXNhX3WURkLIqROfiBrrs7c0TR2DEOBkGcl0sn6yzGTiULw2oiF1UnEyy1J9D1iNlSopEoq21Jg8vqH97vZSHOnyYUppj0VAOoH+Avzhk4kI9rnhKwiP6e+YY70e7AuUYF+gBG8fqsKV045gybl7MM3fBbdL/zF2tNeDikLrybKjvR6093nUlQ7cjrmwrzpGIzGTC9ucGpPHN7TfHyxGc7cPU+2uYSb0v9Q/vXUG5lQEMM3fhellXShxh4Z18ixxwN4QYvxNFMCJPjc2NFdg2fuzcKCjGACQpwmUeQYQBXDjzCZ8Z97HEEKDLz9iG1ckqk+uWel12nbMhX3VMRoJIdmBPFGWKZN4hKTBsE+WRSIRPProo3jhhRfQ0tKC2tpa3Hrrrfje974HLfbhEELgkUcewS9/+UsEAgF89rOfxfLlyzFjxozEOO3t7fjnf/5n/P73v4fL5cINN9yAn/3sZyguLh5aQCrTtUlklWUaUhu91Ria4rWcuKRKOIwJiqovNdrGF+gvwMu76nD3+buQ51J/MUeiGp7dPh37A/r+FY668MfGGkxQTEadX3UckwxH3SNCw18OVaJjoACevAiumNoMb34U4aiGVQer0TlQYBGoNbvbBy9w2x3Kx2/3TMGapip86fTDmDU+iFUHq3FhzTGcVd6hXA9CAO83j0d3KC9rtmOiLIv31URZJmvMAOgzNsvifm09zghqDPa78a8bZuOnn9+EwoKIsv3G1nL8yzvnYFe7H4BAgUugtqgXZ44PYlJxDyaV9KLEHYInLwpNEzizvAN+dwjHez0Y5+3H4a5CePKiqCrsgwYBd14Upe4QNA040uXD4a5CHOv1YN3hSnQN5KOqqA9Huny4sOYYwlEXDnQU4UCwGJ90FONIl0+/1liMiND0v+0DeHlXHf5yqBICwKMXfYT5NceUB//DUQ2v7K5DS7c3a7ZjoiyL99VEWSZrzAAyzmcIGSpjPRlld0aZ6jUhGciwT5b9+Mc/xvLly/H8889j9uzZ2LhxI2677Tb4/X5861vfAgA89thjePLJJ/H8889j2rRpeOihh3DFFVdg586d8Hr1/0wuWrQIzc3NWLVqFUKhEG677TYsXrwYv/rVr4YelOoommzictlQ28t97OIQ0nvVmFZjy22sEhNqNIyp4ZcfnY48l8DZFUFl2E0dhXh66wzE7+gVirrw4w9mK9tOLO5BqTtkWKSGxmAR+iN5KCkIYSCSh1JPCN2hfDz63h
\n", + "text/plain": [ + "
vIdRwnGAyKwsJCsXz58qT+o72OiRl6zcnjRJ8RIrO9hj5DnxlqzHHoM5kLfebUcKLXZLLPCOE8r3Gaz6SKORO9Jtt9Jusmy4QQ4uc//7mYMmWKcLvd4oILLhB/+9vfxiQOAMrHc889J4QQoqmpSVx66aWivLxceDwecfrpp4t7771XBINB0zgHDhwQV155pfD5fKKiokLcfffdIhQKjUjMN954o6ipqRFut1tMnDhR3HjjjWLfvn2J+t7eXvHNb35TjBs3ThQWFoovfelLorm5ecziFUKIP/3pTwKA2L17t6k8U9bv22+/rdwPbrnlFiGEfqvlhx56SFRVVQmPxyMuu+yyJC3Hjx8XN910kyguLhalpaXitttuE52dnaY2W7duFRdffLHweDxi4sSJYtmyZSMSc2Njo+W+/fbbbwshhNi0aZOor68Xfr9feL1ecdZZZ4l//dd/NX2RD2fMdvH29PSIyy+/XEyYMEEUFBSIuro6cccddyQlm5m0juP84he/ED6fTwQCgaT+o72OSTL0mpPDiT4jRGZ7DX2GPjPUmOPQZzIb+szJ40SvyWSfEcJ5XuM0n0kVcyZ6Tbb7jCaEEIoTzgghhBBCCCGEEEIIyTmy6pplhBBCCCGEEEIIIYScCpwsI4QQQgghhBBCCCEkBifLCCGEEEIIIYQQQgiJwckyQgghhBBCCCGEEEJicLKMEEIIIYQQQgghhJAYnCwjhBBCCCGEEEIIISQGJ8sIIYQQQgghhBBCCInByTJCCCGEEEIIIYQQQmJwsowQQgghhBBCCCGEkBicLCOEEEIIIYQQQgghJAYnywghhBBCCCGEEEIIifF/AQoonA1mXS9TAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display_issues(issues_from_score, pred_probs=pred_probs, labels=labels, top=5) " + ] + }, + { + "cell_type": "markdown", + "id": "eacdd73d", + "metadata": {}, + "source": [ + "We can see that the errors are dominated by label errors in the sky." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "86bac686", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:33.663615Z", + "iopub.status.busy": "2024-06-25T23:07:33.663161Z", + "iopub.status.idle": "2024-06-25T23:07:33.719276Z", + "shell.execute_reply": "2024-06-25T23:07:33.718801Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "top_2_issues = np.argsort(-np.sum(issues, axis=(1, 2)))[:2]\n", + "assert (top_2_issues == [1, 21]).all()\n", + "\n", + "top_3_class_issues = np.argsort(-np.sum(class_issues, axis=(1, 2)))[:3]\n", + "assert (top_3_class_issues == [17, 19, 0]).all()\n", + "\n", + "highlighted_indices = [ 1, 21, 2, 24, 4, 3, 12]\n", + "top_issues_from_scores = np.argsort(-issues_from_score.sum((1,2)))[:len(highlighted_indices)]\n", + "if not len(set(top_issues_from_scores).difference(highlighted_indices)) == 0:\n", + " raise Exception(f\"Some highlighted examples are missing from ranked_label_issues. 
Highlighted indices: {top_issues_from_scores[:len(highlighted_indices)]}\")\n", + " \n", + "lowest_image_scores = np.argsort(image_scores)[:15] \n", + "assert len(set(top_issues_from_scores).difference(lowest_image_scores)) == 0" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "019777cf20694ca1a16b31356c899c37": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"0e4c0b8b025944599d9337f0487205ff": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "12e8686f31244c96acd0add15cc5fafe": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "14e990e40e0048c8bab02470fce01cdb": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + 
"_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1a9ba29fef264479aeebe59f09be2cf7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6bfe9912818c4661bee7d3438ae8601a", + "IPY_MODEL_a38f91fe903746aebec7c949258d5078", + "IPY_MODEL_41356d24b0d5449d8acefaacebadb0c0" + ], + "layout": "IPY_MODEL_4db9516b032246d7a136f20090a3e7a0", + "tabbable": null, + "tooltip": null + } + }, + "1ded9057fe774d1298be2ca6127d1e8a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_b3e1c633bc35499c86c59f40f76830f6", + "placeholder": "​", + "style": "IPY_MODEL_6e1df7079c3147dfab5a0ed4e7c58e2c", + "tabbable": null, + "tooltip": null, + "value": " 30/30 [00:01<00:00, 21.66it/s]" + } + }, + "250760339324499f9f4e6b0984a34298": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_6872b3e51d114f06b69fa1a6e05cf20d", + "placeholder": "​", + "style": "IPY_MODEL_0e4c0b8b025944599d9337f0487205ff", + "tabbable": null, + "tooltip": null, + "value": " 30/30 [00:22<00:00,  1.36it/s]" + } + }, + "2c792b94d71a4edb8131889b6899327c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "3360e13c90b14f8bb9f02d9dca352126": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "33e55b9faa124a2da123a32c7cf39cd2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "3e2c883142a44129b435925e3a875033": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": 
false, + "layout": "IPY_MODEL_3360e13c90b14f8bb9f02d9dca352126", + "placeholder": "​", + "style": "IPY_MODEL_a963f364c0e043ae9fee60c12b02262c", + "tabbable": null, + "tooltip": null, + "value": " 4997683/4997683 [00:32<00:00, 152645.67it/s]" + } + }, + "41356d24b0d5449d8acefaacebadb0c0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_52d609ad11084e488a79ecdcdb9088e9", + "placeholder": "​", + "style": "IPY_MODEL_41389220daa745738b749a89d14a3f32", + "tabbable": null, + "tooltip": null, + "value": " 30/30 [00:00<00:00, 810.38it/s]" + } + }, + "41389220daa745738b749a89d14a3f32": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "41b5ff927c124880a253cbcb4700182c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + 
"description_allow_html": false, + "layout": "IPY_MODEL_019777cf20694ca1a16b31356c899c37", + "max": 4997683.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_abaf98692428499b9c0555d67d3756e5", + "tabbable": null, + "tooltip": null, + "value": 4997683.0 + } + }, + "4db9516b032246d7a136f20090a3e7a0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "52d609ad11084e488a79ecdcdb9088e9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + 
"align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "579a6fd51468469a897ad8e37ab2a4b3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f9671ac63586439e89d15c8f8335e3ac", + "IPY_MODEL_ef7dbda0c75c4bd89d019a5e5dcab69d", + "IPY_MODEL_250760339324499f9f4e6b0984a34298" + ], + "layout": "IPY_MODEL_e8237d5d5da946aca6aa7d3adc9f44dd", + "tabbable": null, + "tooltip": null + } + }, + "595ab7f81af447e2899da945eb15bf5f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + 
"bar_color": null, + "description_width": "" + } + }, + "5b3005f45257497bb0cb47a947558588": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7bc3aa857121480cb44097cf2648bcec", + "IPY_MODEL_41b5ff927c124880a253cbcb4700182c", + "IPY_MODEL_3e2c883142a44129b435925e3a875033" + ], + "layout": "IPY_MODEL_7a5f24fece084f0db212f8f0a4db3652", + "tabbable": null, + "tooltip": null + } + }, + "5ed030d0f00e47b1887ab648dc7ff453": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": 
null, + "visibility": null, + "width": null + } + }, + "6872b3e51d114f06b69fa1a6e05cf20d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6bfe9912818c4661bee7d3438ae8601a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_5ed030d0f00e47b1887ab648dc7ff453", + "placeholder": "​", + "style": "IPY_MODEL_d7d1826041c54eec950d83cff81c69cf", + "tabbable": null, + "tooltip": null, + 
"value": "number of examples processed for estimating thresholds: 100%" + } + }, + "6e1df7079c3147dfab5a0ed4e7c58e2c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "70061786cfe54b15a4eefb94068c121b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "7a5f24fece084f0db212f8f0a4db3652": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7bc3aa857121480cb44097cf2648bcec": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_7c41f9ec83344483ba6c8dea1579b5e7", + "placeholder": "​", + "style": "IPY_MODEL_8005331408814a3198801c2846538676", + "tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "7c41f9ec83344483ba6c8dea1579b5e7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8005331408814a3198801c2846538676": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "853bac5cbd4f4898be61293580d9bfb3": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a9430850372641e7933cdb8967a500e0", + "IPY_MODEL_b9f6611d1be34ee0a99836eabd14dc8c", + "IPY_MODEL_1ded9057fe774d1298be2ca6127d1e8a" + ], + "layout": "IPY_MODEL_14e990e40e0048c8bab02470fce01cdb", + "tabbable": null, + "tooltip": null + } + }, + "a38f91fe903746aebec7c949258d5078": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", 
+ "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_c1683469bb68465ba1e309ff9d30d589", + "max": 30.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_595ab7f81af447e2899da945eb15bf5f", + "tabbable": null, + "tooltip": null, + "value": 30.0 + } + }, + "a9430850372641e7933cdb8967a500e0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_cfa6412dc6334607b80c8f229abd6dcb", + "placeholder": "​", + "style": "IPY_MODEL_33e55b9faa124a2da123a32c7cf39cd2", + "tabbable": null, + "tooltip": null, + "value": "images processed using softmin: 100%" + } + }, + "a963f364c0e043ae9fee60c12b02262c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "abaf98692428499b9c0555d67d3756e5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ad6f606a7c394bbf8aeee6734cb251d0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b3e1c633bc35499c86c59f40f76830f6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, 
+ "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b9f6611d1be34ee0a99836eabd14dc8c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_ad6f606a7c394bbf8aeee6734cb251d0", + "max": 30.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_d3e16849a6da4c929101b72bad4b2944", + "tabbable": null, + "tooltip": null, + "value": 30.0 + } + }, + "c1683469bb68465ba1e309ff9d30d589": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + 
"border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cfa6412dc6334607b80c8f229abd6dcb": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + 
}, + "d3e16849a6da4c929101b72bad4b2944": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d7d1826041c54eec950d83cff81c69cf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "e55afb5a089146a4b3d76b9c560a136b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + 
"margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e8237d5d5da946aca6aa7d3adc9f44dd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ef7dbda0c75c4bd89d019a5e5dcab69d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + 
"_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_e55afb5a089146a4b3d76b9c560a136b", + "max": 30.0, + "min": 0.0, + "orientation": "horizontal", + "style": "IPY_MODEL_2c792b94d71a4edb8131889b6899327c", + "tabbable": null, + "tooltip": null, + "value": 30.0 + } + }, + "f9671ac63586439e89d15c8f8335e3ac": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_12e8686f31244c96acd0add15cc5fafe", + "placeholder": "​", + "style": "IPY_MODEL_70061786cfe54b15a4eefb94068c121b", + "tabbable": null, + "tooltip": null, + "value": "number of examples processed for checking labels: 100%" + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials/token_classification.ipynb b/v2.6.6/.doctrees/nbsphinx/tutorials/token_classification.ipynb new file mode 100644 index 000000000..c6bc72371 --- /dev/null +++ b/v2.6.6/.doctrees/nbsphinx/tutorials/token_classification.ipynb @@ -0,0 +1,1175 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d0d2e007", + "metadata": {}, + "source": [ + "# Find Label Errors in Token Classification (Text) Datasets\n", + "\n", + "This 5-minute quickstart tutorial shows how you can use cleanlab to find potential label errors in text datasets for token classification. 
In token classification, our data consists of a set of sentences (aka documents) in which every token (aka word) is labeled with one of K classes, and we train models to predict the class of each token in a new sentence. Example applications in NLP include part-of-speech tagging and entity recognition, the latter being the focus of this tutorial. Here we use the [CoNLL-2003 named entity recognition](https://deepai.org/dataset/conll-2003-english) dataset, which contains around 20,000 sentences and 300,000 individual tokens. Each token is labeled with one of the following classes:\n", + "\n", + "- LOC (location entity)\n", + "- PER (person entity)\n", + "- ORG (organization entity)\n", + "- MISC (miscellaneous other type of entity)\n", + "- O (other type of word that does not correspond to an entity)\n", + "\n", + "**Overview of what we'll do in this tutorial:** \n", + "\n", + "- Find tokens with label issues using `cleanlab.token_classification.filter.find_label_issues`. \n", + "- Rank sentences based on their overall label quality using `cleanlab.token_classification.rank.get_label_quality_scores`." + ] + }, + { + "cell_type": "markdown", + "id": "07936a54", + "metadata": {}, + "source": [ + "
\n", + "Quickstart\n", + "
\n", + " \n", + "cleanlab uses three inputs to handle token classification data:\n", + "\n", + "- `tokens`: List whose `i`-th element is a list of strings/words corresponding to the tokenized version of the `i`-th sentence in the dataset. \n", + " Example: `[..., [\"I\", \"love\", \"cleanlab\"], ...]`\n", + "- `labels`: List whose `i`-th element is a list of integers corresponding to the class labels of each token in the `i`-th sentence. Example: `[..., [0, 0, 1], ...]`\n", + "- `pred_probs`: List whose `i`-th element is an `np.ndarray` of shape `(N_i, K)` corresponding to the predicted class probabilities for each token in the `i`-th sentence (assuming this sentence contains `N_i` tokens and the dataset has `K` possible classes). These should be out-of-sample `pred_probs` obtained from a token classification model via cross-validation. \n", + " Example: `[..., np.array([[0.8,0.2], [0.9,0.1], [0.3,0.7]]), ...]`\n", + "\n", + "Using these, you can find/display label issues with this code: \n", + "\n", + "
\n", + " \n", + "```python\n", + "\n", + "from cleanlab.token_classification.filter import find_label_issues \n", + "from cleanlab.token_classification.summary import display_issues\n", + " \n", + "issues = find_label_issues(labels, pred_probs)\n", + "display_issues(issues, tokens, pred_probs=pred_probs, labels=labels,\n", + " class_names=OPTIONAL_LIST_OF_ORDERED_CLASS_NAMES)\n", + "\n", + "```\n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "1da020bc", + "metadata": {}, + "source": [ + "## 1. Install required dependencies and download data\n", + "\n", + "You can use `pip` to install all packages required for this tutorial as follows: \n", + "\n", + " !pip install cleanlab " + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ae8a08e0", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:35.984293Z", + "iopub.status.busy": "2024-06-25T23:07:35.984113Z", + "iopub.status.idle": "2024-06-25T23:07:37.018147Z", + "shell.execute_reply": "2024-06-25T23:07:37.017540Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2024-06-25 23:07:35-- https://data.deepai.org/conll2003.zip\r\n", + "Resolving data.deepai.org (data.deepai.org)... " + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "185.93.1.243, 2400:52e0:1a00::941:1\r\n", + "Connecting to data.deepai.org (data.deepai.org)|185.93.1.243|:443... connected.\r\n", + "HTTP request sent, awaiting response... 
" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "200 OK\r\n", + "Length: 982975 (960K) [application/zip]\r\n", + "Saving to: ‘conll2003.zip’\r\n", + "\r\n", + "\r", + "conll2003.zip 0%[ ] 0 --.-KB/s " + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\r", + "conll2003.zip 100%[===================>] 959.94K --.-KB/s in 0.1s \r\n", + "\r\n", + "2024-06-25 23:07:36 (7.41 MB/s) - ‘conll2003.zip’ saved [982975/982975]\r\n", + "\r\n", + "mkdir: cannot create directory ‘data’: File exists\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Archive: conll2003.zip\r\n", + " inflating: data/metadata \r\n", + " inflating: data/test.txt \r\n", + " inflating: data/train.txt \r\n", + " inflating: data/valid.txt \r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2024-06-25 23:07:36-- https://cleanlab-public.s3.amazonaws.com/TokenClassification/pred_probs.npz\r\n", + "Resolving cleanlab-public.s3.amazonaws.com (cleanlab-public.s3.amazonaws.com)... 16.182.39.153, 3.5.29.169, 3.5.28.38, ...\r\n", + "Connecting to cleanlab-public.s3.amazonaws.com (cleanlab-public.s3.amazonaws.com)|16.182.39.153|:443... connected.\r\n", + "HTTP request sent, awaiting response... 
" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "200 OK\r\n", + "Length: 17045998 (16M) [binary/octet-stream]\r\n", + "Saving to: ‘pred_probs.npz’\r\n", + "\r\n", + "\r", + "pred_probs.npz 0%[ ] 0 --.-KB/s " + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\r", + "pred_probs.npz 100%[===================>] 16.26M --.-KB/s in 0.1s \r\n", + "\r\n", + "2024-06-25 23:07:36 (132 MB/s) - ‘pred_probs.npz’ saved [17045998/17045998]\r\n", + "\r\n" + ] + } + ], + "source": [ + "!wget -nc https://data.deepai.org/conll2003.zip && mkdir data \n", + "!unzip conll2003.zip -d data/ && rm conll2003.zip \n", + "!wget -nc 'https://cleanlab-public.s3.amazonaws.com/TokenClassification/pred_probs.npz' " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "439b0305", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:37.020478Z", + "iopub.status.busy": "2024-06-25T23:07:37.020266Z", + "iopub.status.idle": "2024-06-25T23:07:38.256309Z", + "shell.execute_reply": "2024-06-25T23:07:38.255739Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Package installation (hidden on docs website).\n", + "\n", + "dependencies = [\"cleanlab\"]\n", + "\n", + "if \"google.colab\" in str(get_ipython()): # Check if it's running in Google Colab\n", + " %pip install cleanlab==v2.6.6\n", + " cmd = ' '.join([dep for dep in dependencies if dep != \"cleanlab\"])\n", + " %pip install $cmd\n", + "else:\n", + " dependencies_test = [dependency.split('>')[0] if '>' in dependency \n", + " else dependency.split('<')[0] if '<' in dependency \n", + " else dependency.split('=')[0] for dependency in dependencies]\n", + " missing_dependencies = []\n", + " for dependency in dependencies_test:\n", + " try:\n", + " __import__(dependency)\n", + " except ImportError:\n", + " missing_dependencies.append(dependency)\n", + "\n", + " if len(missing_dependencies) > 0:\n", + " print(\"Missing required dependencies:\")\n", + 
" print(*missing_dependencies, sep=\", \")\n", + " print(\"\\nPlease install them before running the rest of this notebook.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "a1349304", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:38.258805Z", + "iopub.status.busy": "2024-06-25T23:07:38.258494Z", + "iopub.status.idle": "2024-06-25T23:07:38.262814Z", + "shell.execute_reply": "2024-06-25T23:07:38.262290Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "from cleanlab.token_classification.filter import find_label_issues \n", + "from cleanlab.token_classification.rank import get_label_quality_scores, issues_from_scores \n", + "from cleanlab.internal.token_classification_utils import get_sentence, filter_sentence, mapping \n", + "from cleanlab.token_classification.summary import display_issues, common_label_issues, filter_by_token \n", + "\n", + "np.set_printoptions(suppress=True)" + ] + }, + { + "cell_type": "markdown", + "id": "9ad75b45", + "metadata": {}, + "source": [ + "## 2. Get data, labels, and pred_probs\n", + "\n", + "In token classification tasks, each token in the dataset is labeled with one of *K* possible classes.\n", + "To find label issues, cleanlab requires predicted class probabilities from a trained classifier. These `pred_probs` contain a length-*K* vector for **each** token in the dataset (which sums to 1 for each token). Here we use `pred_probs` which are out-of-sample predicted class probabilities for the full CoNLL-2003 dataset (merging training, development, and testing splits), obtained from a BERT Transformer fit via cross-validation. 
Our example notebook [\"Training Entity Recognition Model for Token Classification\"](https://github.com/cleanlab/examples/blob/master/entity_recognition/entity_recognition_training.ipynb) contains the code to produce such `pred_probs` and save them in a `.npz` file, which we simply load here via a `read_npz` function (you can skip these details)." + ] + }, + { + "cell_type": "markdown", + "id": "6cc832fd", + "metadata": {}, + "source": [ + "
See the code for reading the `.npz` file **(click to expand)** \n", + "\n", + "```python\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "def read_npz(filepath): \n", + " data = dict(np.load(filepath)) \n", + " data = [data[str(i)] for i in range(len(data))] \n", + " return data \n", + "\n", + "```\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "ab9d59a0", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:38.264742Z", + "iopub.status.busy": "2024-06-25T23:07:38.264563Z", + "iopub.status.idle": "2024-06-25T23:07:38.267474Z", + "shell.execute_reply": "2024-06-25T23:07:38.267010Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "def read_npz(filepath): \n", + " data = dict(np.load(filepath)) \n", + " data = [data[str(i)] for i in range(len(data))] \n", + " return data " + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "519cb80c", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:38.269420Z", + "iopub.status.busy": "2024-06-25T23:07:38.269245Z", + "iopub.status.idle": "2024-06-25T23:07:47.051311Z", + "shell.execute_reply": "2024-06-25T23:07:47.050784Z" + } + }, + "outputs": [], + "source": [ + "pred_probs = read_npz('pred_probs.npz') " + ] + }, + { + "cell_type": "markdown", + "id": "a8136f37", + "metadata": {}, + "source": [ + "`pred_probs` is a list of NumPy arrays, which we'll describe later. First, let's load the dataset and its labels. We collect sentences from the original text files, defining: \n", + "\n", + "- `tokens` as a nested list where `tokens[i]` is a list of strings corresponding to a (word-level) tokenized version of the `i`-th sentence\n", + "- `given_labels` as a nested list of the given labels in the dataset where `given_labels[i]` is a list of labels for each token in the `i`-th sentence. \n", + "\n", + "This version of CoNLL-2003 uses IOB2 formatting for tagging, where the `B-` and `I-` prefixes in the class labels indicate whether a token is at the beginning of an entity or inside it. 
We ignore these distinctions in this tutorial (as label errors that confuse `B-` and `I-` are less interesting), and thus work with two sets of entity classes: \n", + "\n", + "- `given_entities` = ['O', 'B-MISC', 'I-MISC', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC'] \n", + "- `entities` = ['O', 'MISC', 'PER', 'ORG', 'LOC']. These are our classes of interest for the token classification task.\n", + "\n", + "We use some helper methods to load the CoNLL data (you can skip these details)." + ] + }, + { + "cell_type": "markdown", + "id": "43a87745", + "metadata": {}, + "source": [ + "
See the code for reading the CoNLL data files **(click to expand)**\n", + "\n", + "```python\n", + "\n", + "# Note: This pulldown content is for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "\n", + "given_entities = ['O', 'B-MISC', 'I-MISC', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']\n", + "entities = ['O', 'MISC', 'PER', 'ORG', 'LOC'] \n", + "entity_map = {entity: i for i, entity in enumerate(given_entities)} \n", + "\n", + "def readfile(filepath, sep=' '): \n", + " lines = open(filepath)\n", + " data, sentence, label = [], [], []\n", + " for line in lines:\n", + " if len(line) == 0 or line.startswith('-DOCSTART') or line[0] == '\\n':\n", + " if len(sentence) > 0:\n", + " data.append((sentence, label))\n", + " sentence, label = [], []\n", + " continue\n", + " splits = line.split(sep) \n", + " word = splits[0]\n", + " if len(word) > 0 and word[0].isalpha() and word.isupper():\n", + " word = word[0] + word[1:].lower()\n", + " sentence.append(word)\n", + " label.append(entity_map[splits[-1][:-1]])\n", + "\n", + " if len(sentence) > 0:\n", + " data.append((sentence, label))\n", + "\n", + " tokens = [d[0] for d in data] \n", + " given_labels = [d[1] for d in data]\n", + " return tokens, given_labels\n", + "\n", + "```\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "202f1526", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:47.053629Z", + "iopub.status.busy": "2024-06-25T23:07:47.053443Z", + "iopub.status.idle": "2024-06-25T23:07:47.059010Z", + "shell.execute_reply": "2024-06-25T23:07:47.058561Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "given_entities = ['O', 'B-MISC', 'I-MISC', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']\n", + "entities = ['O', 'MISC', 'PER', 'ORG', 'LOC'] \n", + "entity_map = {entity: i for i, entity in enumerate(given_entities)} \n", + "\n", + "def readfile(filepath, sep=' '): \n", + " lines = open(filepath)\n", + " data, sentence, label = [], [], []\n", + " for line in lines:\n", + " if len(line) == 0 or line.startswith('-DOCSTART') or line[0] == '\\n':\n", + " if len(sentence) > 0:\n", + " data.append((sentence, label))\n", + " sentence, label = [], []\n", + " continue\n", + " splits = line.split(sep) \n", + " word = splits[0]\n", + " if len(word) > 0 and word[0].isalpha() and word.isupper():\n", + " word = word[0] + word[1:].lower()\n", + " sentence.append(word)\n", + " label.append(entity_map[splits[-1][:-1]])\n", + "\n", + " if len(sentence) > 0:\n", + " data.append((sentence, label))\n", + " \n", + " tokens = [d[0] for d in data] \n", + " given_labels = [d[1] for d in data] \n", + " return tokens, given_labels " + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a4381f03", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:47.061025Z", + "iopub.status.busy": "2024-06-25T23:07:47.060642Z", + "iopub.status.idle": "2024-06-25T23:07:47.406374Z", + "shell.execute_reply": "2024-06-25T23:07:47.405681Z" + } + }, + "outputs": [], + "source": [ + "filepaths = ['data/train.txt', 'data/valid.txt', 'data/test.txt'] \n", + "tokens, given_labels = [], [] \n", + "\n", + "for filepath in filepaths: \n", + " words, label = readfile(filepath) 
\n", + " tokens.extend(words) \n", + " given_labels.extend(label)\n", + " \n", + "sentences = list(map(get_sentence, tokens)) \n", + "\n", + "sentences, mask = filter_sentence(sentences) \n", + "tokens = [words for m, words in zip(mask, tokens) if m] \n", + "given_labels = [labels for m, labels in zip(mask, given_labels) if m] \n", + "\n", + "maps = [0, 1, 1, 2, 2, 3, 3, 4, 4] \n", + "labels = [mapping(labels, maps) for labels in given_labels] " + ] + }, + { + "cell_type": "markdown", + "id": "46cb7c93", + "metadata": {}, + "source": [ + "To find label issues in token classification data, cleanlab requires `labels` and `pred_probs`, which should look as follows: " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "7842e4a3", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:47.409012Z", + "iopub.status.busy": "2024-06-25T23:07:47.408584Z", + "iopub.status.idle": "2024-06-25T23:07:47.413050Z", + "shell.execute_reply": "2024-06-25T23:07:47.412509Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "sentences[0]:\tEu rejects German call to boycott British lamb.\n", + "labels[0]:\t[3, 0, 1, 0, 0, 0, 1, 0, 0]\n", + "pred_probs[0]:\n", + "[[0.00030412 0.00023826 0.99936208 0.00007009 0.00002545]\n", + " [0.99998795 0.00000401 0.00000218 0.00000455 0.00000131]\n", + " [0.00000749 0.99996115 0.00001371 0.0000087 0.00000895]\n", + " [0.99998936 0.00000382 0.00000178 0.00000366 0.00000137]\n", + " [0.99999101 0.00000266 0.00000174 0.0000035 0.00000109]\n", + " [0.99998768 0.00000482 0.00000202 0.00000438 0.0000011 ]\n", + " [0.00000465 0.99996392 0.00001105 0.0000116 0.00000878]\n", + " [0.99998671 0.00000364 0.00000213 0.00000472 0.00000281]\n", + " [0.99999073 0.00000211 0.00000159 0.00000442 0.00000115]]\n", + "\n", + "sentences[1]:\tPeter Blackburn\n", + "labels[1]:\t[2, 2]\n", + "pred_probs[1]:\n", + "[[0.00000358 0.00000529 0.99995623 0.000022 0.0000129 ]\n", + " [0.0000024 
0.00001812 0.99994141 0.00001645 0.00002162]]\n", + "\n", + "sentences[2]:\tBrussels 1996-08-22\n", + "labels[2]:\t[4, 0]\n", + "pred_probs[2]:\n", + "[[0.00001172 0.00000821 0.00004661 0.0000618 0.99987167]\n", + " [0.99999061 0.00000201 0.00000195 0.00000408 0.00000135]]\n" + ] + } + ], + "source": [ + "indices_to_preview = 3 # increase this to view more examples\n", + "for i in range(indices_to_preview):\n", + " print('\\nsentences[%d]:\\t' % i + str(sentences[i])) \n", + " print('labels[%d]:\\t' % i + str(labels[i])) \n", + " print('pred_probs[%d]:\\n' % i + str(pred_probs[i])) " + ] + }, + { + "cell_type": "markdown", + "id": "9b71eb4a", + "metadata": {}, + "source": [ + "Note that these correspond to the sentences in the dataset, where each sentence is treated as an individual training example (this could be a document instead of a sentence). If using your own dataset, both `pred_probs` and `labels` should each be formatted as a list with one entry per sentence, where: \n", + "\n", + "- `pred_probs` is a list whose `i`-th element is a np.ndarray of shape `(N_i, K)` containing predicted class probabilities for each token in the `i`-th sentence (assuming this sentence contains `N_i` tokens and the dataset has `K` possible classes). Each row of one np.ndarray corresponds to a token `t` and contains the model's predicted probability that `t` belongs to each of the K classes. The columns must be ordered such that the probabilities correspond to class 0, 1, ..., K-1. These should be out-of-sample `pred_probs` obtained from a token classification model via cross-validation. \n", + "\n", + "- `labels` is a list whose `i`-th element is a list of integers corresponding to the class label of each token in the `i`-th sentence. For a dataset with K classes, labels must take values in 0, 1, ..., K-1. " + ] + }, + { + "cell_type": "markdown", + "id": "1dc3150f", + "metadata": {}, + "source": [ + "## 3.
Use cleanlab to find label issues \n", + "\n", + "Based on the given labels and out-of-sample predicted probabilities, cleanlab can quickly help us identify label issues in our dataset. Here the indices of the identified label issues are sorted by cleanlab’s self-confidence score, which measures the quality of each given label via the probability assigned to it in our model’s prediction. The returned `issues` is a list of tuples `(i, j)`, each corresponding to the `j`-th token of the `i`-th sentence in the dataset. These are the tokens cleanlab thinks may be badly labeled in your dataset. " + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "2c2ad9ad", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:47.414989Z", + "iopub.status.busy": "2024-06-25T23:07:47.414808Z", + "iopub.status.idle": "2024-06-25T23:07:49.966577Z", + "shell.execute_reply": "2024-06-25T23:07:49.965753Z" + } + }, + "outputs": [], + "source": [ + "issues = find_label_issues(labels, pred_probs) " + ] + }, + { + "cell_type": "markdown", + "id": "7221c12b", + "metadata": {}, + "source": [ + "Let's look at the top 20 tokens that cleanlab thinks are most likely mislabeled. " + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "95dc7268", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:49.969486Z", + "iopub.status.busy": "2024-06-25T23:07:49.968932Z", + "iopub.status.idle": "2024-06-25T23:07:49.973148Z", + "shell.execute_reply": "2024-06-25T23:07:49.972703Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cleanlab found 2254 potential label issues.
\n", + "The top 20 most likely label errors:\n", + "[(2907, 0), (19392, 0), (9962, 4), (8904, 30), (19303, 0), (12918, 0), (9256, 0), (11855, 20), (18392, 4), (20426, 28), (19402, 21), (14744, 15), (19371, 0), (4645, 2), (83, 9), (10331, 3), (9430, 10), (6143, 25), (18367, 0), (12914, 3)]\n" + ] + } + ], + "source": [ + "top = 20 # increase this value to view more identified issues\n", + "print('Cleanlab found %d potential label issues. ' % len(issues)) \n", + "print('The top %d most likely label errors:' % top) \n", + "print(issues[:top]) " + ] + }, + { + "cell_type": "markdown", + "id": "65421a2d", + "metadata": {}, + "source": [ + "We can better decide how to handle these issues by viewing the original sentences containing these tokens.\n", + "Because the `O` and `MISC` classes (integers 0 and 1 in our class ordering) can sometimes be ambiguous, confusions between these two classes are excluded from the visualization below. This is achieved via the `exclude` argument, a list of tuples `(i, j)` such that tokens labeled as `entities[i]` but predicted as `entities[j]` are ignored."
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "e13de188", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:49.975217Z", + "iopub.status.busy": "2024-06-25T23:07:49.974803Z", + "iopub.status.idle": "2024-06-25T23:07:49.979841Z", + "shell.execute_reply": "2024-06-25T23:07:49.979336Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Sentence index: 2907, Token index: 0\n", + "Token: Little\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mLittle\u001b[0m change from today's weather expected.\n", + "\n", + "\n", + "Sentence index: 19392, Token index: 0\n", + "Token: Let\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mLet\u001b[0m's march together,\" Scalfaro, a northerner himself, said.\n", + "\n", + "\n", + "Sentence index: 9962, Token index: 4\n", + "Token: germany\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "3. 
Nastja Rysich (\u001b[31mgermany\u001b[0m) 3.75\n", + "\n", + "\n", + "Sentence index: 8904, Token index: 30\n", + "Token: north\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "The Spla has fought Khartoum's government forces in the south since 1983 for greater autonomy or independence of the mainly Christian and animist region from the Moslem, Arabised \u001b[31mnorth\u001b[0m.\n", + "\n", + "\n", + "Sentence index: 12918, Token index: 0\n", + "Token: Mayor\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mMayor\u001b[0m Antonio Gonzalez Garcia, of the opposition Revolutionary Workers' Party, said in Wednesday's letter that army troops recently raided several local farms, stole cattle and raped women.\n", + "\n", + "\n", + "Sentence index: 9256, Token index: 0\n", + "Token: Spring\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mSpring\u001b[0m Chg Hrw 12pct Chg White Chg\n", + "\n", + "\n", + "Sentence index: 11855, Token index: 20\n", + "Token: Prince\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "\" We have seen the photos but for the moment the palace has no comment,\" a spokeswoman for \u001b[31mPrince\u001b[0m Rainier told Reuters.\n", + "\n", + "\n", + "Sentence index: 18392, Token index: 4\n", + "Token: /\n", + "Given label: O, predicted label according to provided pred_probs: LOC\n", + "----\n", + "Danila 28.5 16\u001b[31m/\u001b[0m12 Caribs/ up W224 Mobil.\n", + "\n", + "\n", + "Sentence index: 19402, Token index: 21\n", + "Token: Wednesday\n", + "Given label: ORG, predicted label according to provided pred_probs: O\n", + "----\n", + "A Reuter consensus survey sees medical equipment group Radiometer reporting largely unchanged earnings when it publishes first half 19996/97 results next \u001b[31mWednesday\u001b[0m.\n", + "\n", + "\n", + "Sentence 
index: 83, Token index: 9\n", + "Token: Us\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "Listing London Denoms (K) 1-10-100 Sale Limits \u001b[31mUs\u001b[0m/ Uk/ Jp/ Fr\n", + "\n", + "\n", + "Sentence index: 10331, Token index: 3\n", + "Token: Maccabi\n", + "Given label: O, predicted label according to provided pred_probs: ORG\n", + "----\n", + "Hapoel Haifa 3 \u001b[31mMaccabi\u001b[0m Tel Aviv 1\n", + "\n", + "\n", + "Sentence index: 9430, Token index: 10\n", + "Token: hospital\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "The revered Roman Catholic nun was admitted to the Calcutta \u001b[31mhospital\u001b[0m a week ago with high fever and severe vomiting.\n", + "\n", + "\n", + "Sentence index: 6143, Token index: 25\n", + "Token: alliance\n", + "Given label: ORG, predicted label according to provided pred_probs: O\n", + "----\n", + "The embattled Afghan government said last week that the Kabul-Salang highway would be opened on Monday or Tuesday following talks with the Supreme Coordination Council \u001b[31malliance\u001b[0m led by Jumbish-i-Milli movement of powerful opposition warlord General Abdul Rashid Dostum.\n", + "\n", + "\n", + "Sentence index: 18367, Token index: 0\n", + "Token: Can\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mCan\u001b[0m/ U.s. 
Dollar Exchange Rate: 1.3570\n", + "\n", + "\n", + "Sentence index: 12049, Token index: 0\n", + "Token: Born\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mBorn\u001b[0m in 1937 in the central province of Anhui, Dai came to Shanghai as a student and remained in the city as a prolific author and teacher of Chinese.\n", + "\n", + "\n", + "Sentence index: 16764, Token index: 7\n", + "Token: (\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "1990 - British historian Alan John Percivale \u001b[31m(\u001b[0mA.j.p.) Taylor died.\n", + "\n", + "\n", + "Sentence index: 20446, Token index: 0\n", + "Token: Pace\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mPace\u001b[0m bowler Ian Harvey claimed three for 81 for Victoria.\n", + "\n", + "\n", + "Sentence index: 15514, Token index: 16\n", + "Token: Cotti\n", + "Given label: O, predicted label according to provided pred_probs: PER\n", + "----\n", + "But one must not forget that the Osce only has limited powers there,\" said \u001b[31mCotti\u001b[0m, who is also the Swiss foreign minister.\"\n", + "\n", + "\n", + "Sentence index: 7525, Token index: 12\n", + "Token: Sultan\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "Specter met Crown Prince Abdullah and Minister of Defence and Aviation Prince \u001b[31mSultan\u001b[0m in Jeddah, Saudi state television and the official Saudi Press Agency reported.\n", + "\n", + "\n", + "Sentence index: 2288, Token index: 0\n", + "Token: Sporting\n", + "Given label: ORG, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mSporting\u001b[0m his customary bright green outfit, the U.s. 
champion clocked 10.03 seconds despite damp conditions to take the scalp of Canada's reigning Olympic champion Donovan Bailey, 1992 champion Linford Christie of Britain and American 1984 and 1988 champion Carl Lewis.\n" + ] + } + ], + "source": [ + "display_issues(issues, tokens, pred_probs=pred_probs, labels=labels, \n", + " exclude=[(0, 1), (1, 0)], class_names=entities) " + ] + }, + { + "cell_type": "markdown", + "id": "96d04902", + "metadata": {}, + "source": [ + "More than half of the potential label issues shown above correspond to tokens that are incorrectly labeled. Some examples are ambiguous and may require more thoughtful handling. cleanlab has also discovered some edge cases, such as tokens that are simply punctuation marks like `/` and `(`. " + ] + }, + { + "cell_type": "markdown", + "id": "d213b2b2", + "metadata": {}, + "source": [ + "### Most common word-level token mislabels \n", + "\n", + "We may also wish to understand which tokens tend to be most commonly mislabeled throughout the entire dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "e4a006bd", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:49.981819Z", + "iopub.status.busy": "2024-06-25T23:07:49.981646Z", + "iopub.status.idle": "2024-06-25T23:07:50.007368Z", + "shell.execute_reply": "2024-06-25T23:07:50.006913Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Token '/' is potentially mislabeled 42 times throughout the dataset\n", + "---------------------------------------------------------------------------------------\n", + "labeled as class `O` but predicted to actually be class `LOC` 36 times\n", + "labeled as class `O` but predicted to actually be class `PER` 4 times\n", + "labeled as class `O` but predicted to actually be class `ORG` 2 times\n", + "\n", + "Token 'Chicago' is potentially mislabeled 27 times throughout the dataset\n", +
"---------------------------------------------------------------------------------------\n", + "labeled as class `ORG` but predicted to actually be class `LOC` 22 times\n", + "labeled as class `LOC` but predicted to actually be class `ORG` 3 times\n", + "labeled as class `MISC` but predicted to actually be class `ORG` 2 times\n", + "\n", + "Token 'U.s.' is potentially mislabeled 21 times throughout the dataset\n", + "---------------------------------------------------------------------------------------\n", + "labeled as class `LOC` but predicted to actually be class `ORG` 8 times\n", + "labeled as class `ORG` but predicted to actually be class `LOC` 6 times\n", + "labeled as class `LOC` but predicted to actually be class `O` 3 times\n", + "labeled as class `LOC` but predicted to actually be class `MISC` 2 times\n", + "labeled as class `MISC` but predicted to actually be class `LOC` 1 times\n", + "labeled as class `MISC` but predicted to actually be class `ORG` 1 times\n", + "\n", + "Token 'Digest' is potentially mislabeled 20 times throughout the dataset\n", + "---------------------------------------------------------------------------------------\n", + "labeled as class `O` but predicted to actually be class `ORG` 20 times\n", + "\n", + "Token 'Press' is potentially mislabeled 20 times throughout the dataset\n", + "---------------------------------------------------------------------------------------\n", + "labeled as class `O` but predicted to actually be class `ORG` 20 times\n", + "\n", + "Token 'New' is potentially mislabeled 17 times throughout the dataset\n", + "---------------------------------------------------------------------------------------\n", + "labeled as class `ORG` but predicted to actually be class `LOC` 13 times\n", + "labeled as class `LOC` but predicted to actually be class `ORG` 2 times\n", + "labeled as class `O` but predicted to actually be class `ORG` 1 times\n", + "labeled as class `MISC` but predicted to actually be class `LOC` 1 
times\n", + "\n", + "Token 'and' is potentially mislabeled 16 times throughout the dataset\n", + "---------------------------------------------------------------------------------------\n", + "labeled as class `ORG` but predicted to actually be class `O` 7 times\n", + "labeled as class `O` but predicted to actually be class `ORG` 5 times\n", + "labeled as class `O` but predicted to actually be class `LOC` 3 times\n", + "labeled as class `MISC` but predicted to actually be class `ORG` 1 times\n", + "\n", + "Token 'Philadelphia' is potentially mislabeled 15 times throughout the dataset\n", + "---------------------------------------------------------------------------------------\n", + "labeled as class `ORG` but predicted to actually be class `LOC` 14 times\n", + "labeled as class `LOC` but predicted to actually be class `ORG` 1 times\n", + "\n", + "Token 'Usda' is potentially mislabeled 13 times throughout the dataset\n", + "---------------------------------------------------------------------------------------\n", + "labeled as class `ORG` but predicted to actually be class `LOC` 7 times\n", + "labeled as class `ORG` but predicted to actually be class `PER` 5 times\n", + "labeled as class `ORG` but predicted to actually be class `MISC` 1 times\n", + "\n", + "Token 'York' is potentially mislabeled 12 times throughout the dataset\n", + "---------------------------------------------------------------------------------------\n", + "labeled as class `ORG` but predicted to actually be class `LOC` 11 times\n", + "labeled as class `LOC` but predicted to actually be class `ORG` 1 times\n", + "\n" + ] + } + ], + "source": [ + "info = common_label_issues(issues, tokens, \n", + " labels=labels, \n", + " pred_probs=pred_probs, \n", + " class_names=entities, \n", + " exclude=[(0, 1), (1, 0)]) " + ] + }, + { + "cell_type": "markdown", + "id": "9c417061", + "metadata": {}, + "source": [ + "The printed information above is also stored in pd.DataFrame `info`." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a35ef843", + "metadata": {}, + "source": [ + "### Find sentences containing a particular mislabeled word \n", + "\n", + "You can also focus only on the subset of potentially problematic sentences in which a particular token may have been mislabeled." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "c8f4e163", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:50.009284Z", + "iopub.status.busy": "2024-06-25T23:07:50.009112Z", + "iopub.status.idle": "2024-06-25T23:07:50.013172Z", + "shell.execute_reply": "2024-06-25T23:07:50.012636Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Sentence index: 471, Token index: 8\n", + "Token: United\n", + "Given label: LOC, predicted label according to provided pred_probs: ORG\n", + "----\n", + "Soccer - Keane Signs Four-year Contract With Manchester \u001b[31mUnited\u001b[0m.\n", + "\n", + "\n", + "Sentence index: 19072, Token index: 5\n", + "Token: United\n", + "Given label: LOC, predicted label according to provided pred_probs: ORG\n", + "----\n", + "The Humane Society of the \u001b[31mUnited\u001b[0m States estimates that between 500,000 and one million bites are delivered by dogs each year, more than half of which are suffered by children.\n", + "\n", + "\n", + "Sentence index: 19910, Token index: 5\n", + "Token: United\n", + "Given label: LOC, predicted label according to provided pred_probs: ORG\n", + "----\n", + "His father Clarence Woolmer represented \u001b[31mUnited\u001b[0m Province, now renamed Uttar Pradesh, in India's Ranji Trophy national championship and captained the state during 1949.\n", + "\n", + "\n", + "Sentence index: 15658, Token index: 0\n", + "Token: United\n", + "Given label: ORG, predicted label according to provided pred_probs: LOC\n", + "----\n", + "\u001b[31mUnited\u001b[0m Nations 1996-08-29\n", + "\n", + "\n", + "Sentence index: 19879, Token index: 1\n", + "Token:
United\n", + "Given label: ORG, predicted label according to provided pred_probs: LOC\n", + "----\n", + "1. \u001b[31mUnited\u001b[0m States Iii (Brian Shimer, Randy Jones) one\n", + "\n", + "\n", + "Sentence index: 19104, Token index: 0\n", + "Token: United\n", + "Given label: ORG, predicted label according to provided pred_probs: LOC\n", + "----\n", + "\u001b[31mUnited\u001b[0m Nations 1996-12-06\n" + ] + } + ], + "source": [ + "token_issues = filter_by_token('United', issues, tokens)\n", + "\n", + "display_issues(token_issues, tokens, pred_probs=pred_probs, labels=labels, \n", + " exclude=[(0, 1), (1, 0)], class_names=entities) " + ] + }, + { + "cell_type": "markdown", + "id": "1759108b", + "metadata": {}, + "source": [ + "### Sentence label quality score \n", + "\n", + "To best review label issues in a token classification dataset, look at sentences one at a time, with the sentences most likely to contain a label error ranked first. Cleanlab can provide an overall label quality score for each sentence (ranging from 0 to 1) such that lower scores indicate sentences more likely to contain some mislabeled token. We can also obtain label quality scores for each individual token and manually decide which of these are label issues by thresholding them. To automatically estimate which tokens are mislabeled (and the number of label errors), you should use `find_label_issues()` instead. `get_label_quality_scores()` is useful if you only have time to review a few sentences and want to prioritize which ones, or if you specifically aim to detect label errors with high precision (or high recall) rather than estimate the overall set of mislabeled tokens."
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "db0b5179", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:50.015176Z", + "iopub.status.busy": "2024-06-25T23:07:50.015001Z", + "iopub.status.idle": "2024-06-25T23:07:51.393532Z", + "shell.execute_reply": "2024-06-25T23:07:51.393052Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Sentence index: 2907, Token index: 0\n", + "Token: Little\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mLittle\u001b[0m change from today's weather expected.\n", + "\n", + "\n", + "Sentence index: 19392, Token index: 0\n", + "Token: Let\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mLet\u001b[0m's march together,\" Scalfaro, a northerner himself, said.\n", + "\n", + "\n", + "Sentence index: 9962, Token index: 4\n", + "Token: germany\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "3. 
Nastja Rysich (\u001b[31mgermany\u001b[0m) 3.75\n", + "\n", + "\n", + "Sentence index: 8904, Token index: 30\n", + "Token: north\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "The Spla has fought Khartoum's government forces in the south since 1983 for greater autonomy or independence of the mainly Christian and animist region from the Moslem, Arabised \u001b[31mnorth\u001b[0m.\n", + "\n", + "\n", + "Sentence index: 12918, Token index: 0\n", + "Token: Mayor\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mMayor\u001b[0m Antonio Gonzalez Garcia, of the opposition Revolutionary Workers' Party, said in Wednesday's letter that army troops recently raided several local farms, stole cattle and raped women.\n", + "\n", + "\n", + "Sentence index: 9256, Token index: 0\n", + "Token: Spring\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mSpring\u001b[0m Chg Hrw 12pct Chg White Chg\n", + "\n", + "\n", + "Sentence index: 11855, Token index: 20\n", + "Token: Prince\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "\" We have seen the photos but for the moment the palace has no comment,\" a spokeswoman for \u001b[31mPrince\u001b[0m Rainier told Reuters.\n", + "\n", + "\n", + "Sentence index: 18392, Token index: 4\n", + "Token: /\n", + "Given label: O, predicted label according to provided pred_probs: LOC\n", + "----\n", + "Danila 28.5 16\u001b[31m/\u001b[0m12 Caribs/ up W224 Mobil.\n", + "\n", + "\n", + "Sentence index: 19402, Token index: 21\n", + "Token: Wednesday\n", + "Given label: ORG, predicted label according to provided pred_probs: O\n", + "----\n", + "A Reuter consensus survey sees medical equipment group Radiometer reporting largely unchanged earnings when it publishes first half 19996/97 results next \u001b[31mWednesday\u001b[0m.\n", + "\n", + "\n", + "Sentence 
index: 83, Token index: 9\n", + "Token: Us\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "Listing London Denoms (K) 1-10-100 Sale Limits \u001b[31mUs\u001b[0m/ Uk/ Jp/ Fr\n", + "\n", + "\n", + "Sentence index: 10331, Token index: 3\n", + "Token: Maccabi\n", + "Given label: O, predicted label according to provided pred_probs: ORG\n", + "----\n", + "Hapoel Haifa 3 \u001b[31mMaccabi\u001b[0m Tel Aviv 1\n", + "\n", + "\n", + "Sentence index: 9430, Token index: 10\n", + "Token: hospital\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "The revered Roman Catholic nun was admitted to the Calcutta \u001b[31mhospital\u001b[0m a week ago with high fever and severe vomiting.\n", + "\n", + "\n", + "Sentence index: 6143, Token index: 25\n", + "Token: alliance\n", + "Given label: ORG, predicted label according to provided pred_probs: O\n", + "----\n", + "The embattled Afghan government said last week that the Kabul-Salang highway would be opened on Monday or Tuesday following talks with the Supreme Coordination Council \u001b[31malliance\u001b[0m led by Jumbish-i-Milli movement of powerful opposition warlord General Abdul Rashid Dostum.\n", + "\n", + "\n", + "Sentence index: 18367, Token index: 0\n", + "Token: Can\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mCan\u001b[0m/ U.s. 
Dollar Exchange Rate: 1.3570\n", + "\n", + "\n", + "Sentence index: 12049, Token index: 0\n", + "Token: Born\n", + "Given label: LOC, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mBorn\u001b[0m in 1937 in the central province of Anhui, Dai came to Shanghai as a student and remained in the city as a prolific author and teacher of Chinese.\n", + "\n", + "\n", + "Sentence index: 16764, Token index: 7\n", + "Token: (\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "1990 - British historian Alan John Percivale \u001b[31m(\u001b[0mA.j.p.) Taylor died.\n", + "\n", + "\n", + "Sentence index: 20446, Token index: 0\n", + "Token: Pace\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mPace\u001b[0m bowler Ian Harvey claimed three for 81 for Victoria.\n", + "\n", + "\n", + "Sentence index: 15514, Token index: 16\n", + "Token: Cotti\n", + "Given label: O, predicted label according to provided pred_probs: PER\n", + "----\n", + "But one must not forget that the Osce only has limited powers there,\" said \u001b[31mCotti\u001b[0m, who is also the Swiss foreign minister.\"\n", + "\n", + "\n", + "Sentence index: 7525, Token index: 12\n", + "Token: Sultan\n", + "Given label: PER, predicted label according to provided pred_probs: O\n", + "----\n", + "Specter met Crown Prince Abdullah and Minister of Defence and Aviation Prince \u001b[31mSultan\u001b[0m in Jeddah, Saudi state television and the official Saudi Press Agency reported.\n", + "\n", + "\n", + "Sentence index: 2288, Token index: 0\n", + "Token: Sporting\n", + "Given label: ORG, predicted label according to provided pred_probs: O\n", + "----\n", + "\u001b[31mSporting\u001b[0m his customary bright green outfit, the U.s. 
champion clocked 10.03 seconds despite damp conditions to take the scalp of Canada's reigning Olympic champion Donovan Bailey, 1992 champion Linford Christie of Britain and American 1984 and 1988 champion Carl Lewis.\n" + ] + } + ], + "source": [ + "sentence_scores, token_scores = get_label_quality_scores(labels, pred_probs)\n", + "issues = issues_from_scores(sentence_scores, token_scores=token_scores) \n", + "display_issues(issues, tokens, pred_probs=pred_probs, labels=labels, \n", + " exclude=[(0, 1), (1, 0)], class_names=entities) " + ] + }, + { + "cell_type": "markdown", + "id": "1759108c", + "metadata": {}, + "source": [ + "## How does cleanlab.token_classification work?\n", + "\n", + "The underlying algorithms used to produce these scores are described in [this paper](https://arxiv.org/abs/2210.03920)." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "a18795eb", + "metadata": { + "execution": { + "iopub.execute_input": "2024-06-25T23:07:51.395586Z", + "iopub.status.busy": "2024-06-25T23:07:51.395406Z", + "iopub.status.idle": "2024-06-25T23:07:51.399488Z", + "shell.execute_reply": "2024-06-25T23:07:51.398944Z" + }, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Note: This cell is only for docs.cleanlab.ai, if running on local Jupyter or Colab, please ignore it.\n", + "highlighted_indices = [(2907, 0), (19392, 0), (9962, 4), (8904, 30), (19303, 0), \n", + " (12918, 0), (9256, 0), (11855, 20), (18392, 4), (20426, 28), \n", + " (19402, 21), (14744, 15), (19371, 0), (4645, 2), (83, 9), \n", + " (10331, 3), (9430, 10), (6143, 25), (18367, 0), (12914, 3)] \n", + "\n", + "if not all(x in issues for x in highlighted_indices):\n", + " raise Exception(\"Some highlighted examples are missing from ranked_label_issues.\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 
3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_datalab_advanced_15_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_datalab_advanced_15_0.png new file mode 100644 index 000000000..f4e1cd479 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_datalab_advanced_15_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_datalab_quickstart_15_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_datalab_quickstart_15_0.png new file mode 100644 index 000000000..9d0daefb7 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_datalab_quickstart_15_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_30_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_30_1.png new file mode 100644 index 000000000..321b30602 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_30_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_30_3.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_30_3.png new file mode 100644 index 000000000..1a7b24908 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_30_3.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_38_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_38_0.png new file mode 100644 index 000000000..9c10ccac5 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_38_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_44_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_44_0.png new file mode 100644 index 000000000..3f475d975 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_44_0.png differ diff --git 
a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_0.png new file mode 100644 index 000000000..d95c6961d Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_1.png new file mode 100644 index 000000000..a9e4dc375 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_2.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_2.png new file mode 100644 index 000000000..1aa029df0 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_2.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_3.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_3.png new file mode 100644 index 000000000..5c236c697 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_3.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_4.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_4.png new file mode 100644 index 000000000..bf5f771b5 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_50_4.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_57_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_57_0.png new file mode 100644 index 000000000..2ed7b063d Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_57_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_61_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_61_0.png new file mode 100644 index 000000000..82c886c5e Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_image_61_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_19_0.png 
b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_19_0.png new file mode 100644 index 000000000..6611034e4 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_19_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_31_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_31_0.png new file mode 100644 index 000000000..ff2f2167b Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_31_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_50_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_50_0.png new file mode 100644 index 000000000..3191192bc Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_50_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_50_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_50_1.png new file mode 100644 index 000000000..d3c8f4eb0 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_50_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_74_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_74_0.png new file mode 100644 index 000000000..3c787126a Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_datalab_workflows_74_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_25_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_25_0.png new file mode 100644 index 000000000..4231ebeed Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_25_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_49_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_49_0.png new file mode 100644 index 000000000..e61be677e Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_49_0.png differ diff --git 
a/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_55_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_55_0.png new file mode 100644 index 000000000..3d41baa39 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_55_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_8_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_8_0.png new file mode 100644 index 000000000..ca118fcac Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_indepth_overview_8_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_multilabel_classification_19_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_multilabel_classification_19_0.png new file mode 100644 index 000000000..6992b1216 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_multilabel_classification_19_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_multilabel_classification_9_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_multilabel_classification_9_0.png new file mode 100644 index 000000000..2e409b9ff Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_multilabel_classification_9_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_22_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_22_1.png new file mode 100644 index 000000000..482e044ac Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_22_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_24_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_24_1.png new file mode 100644 index 000000000..512fc71c5 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_24_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_26_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_26_1.png new file mode 100644 index 000000000..e5cc313e5 Binary files /dev/null and 
b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_26_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_28_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_28_1.png new file mode 100644 index 000000000..38c29b07d Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_28_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_31_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_31_1.png new file mode 100644 index 000000000..178d07750 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_31_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_33_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_33_1.png new file mode 100644 index 000000000..d57dc95a6 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_33_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_35_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_35_1.png new file mode 100644 index 000000000..dfaf20952 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_35_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_38_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_38_1.png new file mode 100644 index 000000000..f9f15bfd3 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_38_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_38_3.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_38_3.png new file mode 100644 index 000000000..484169e5a Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_38_3.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_38_5.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_38_5.png new file mode 100644 index 000000000..23b5b864f Binary files /dev/null and 
b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_38_5.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_44_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_44_1.png new file mode 100644 index 000000000..a58ceca25 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_44_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_44_3.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_44_3.png new file mode 100644 index 000000000..2cd0cb83a Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_44_3.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_44_5.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_44_5.png new file mode 100644 index 000000000..fdc91223c Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_44_5.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_8_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_8_0.png new file mode 100644 index 000000000..62716ca2d Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_object_detection_8_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_13_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_13_0.png new file mode 100644 index 000000000..df4d5ad79 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_13_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_15_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_15_0.png new file mode 100644 index 000000000..358bc9593 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_15_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_20_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_20_1.png new file mode 100644 index 000000000..cf1eba332 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_20_1.png differ diff 
--git a/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_22_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_22_0.png new file mode 100644 index 000000000..02d2a08f6 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_22_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_24_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_24_0.png new file mode 100644 index 000000000..ed13a9ba8 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_24_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_27_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_27_0.png new file mode 100644 index 000000000..3757f037e Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_27_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_29_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_29_0.png new file mode 100644 index 000000000..c39af09e8 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_outliers_29_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_regression_14_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_regression_14_0.png new file mode 100644 index 000000000..63b616cfb Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_regression_14_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_19_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_19_0.png new file mode 100644 index 000000000..6cb6f47bb Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_19_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_19_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_19_1.png new file mode 100644 index 000000000..49067f19a Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_19_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_21_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_21_0.png new file mode 
100644 index 000000000..c041ae84a Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_21_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_21_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_21_1.png new file mode 100644 index 000000000..d08f45e79 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_21_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_21_2.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_21_2.png new file mode 100644 index 000000000..9a8cc6bce Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_21_2.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_0.png new file mode 100644 index 000000000..c041ae84a Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_1.png new file mode 100644 index 000000000..6a9558904 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_2.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_2.png new file mode 100644 index 000000000..bd9888ddd Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_2.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_3.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_3.png new file mode 100644 index 000000000..6c7795496 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_27_3.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_0.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_0.png new file mode 100644 index 000000000..4355002b1 Binary files /dev/null and 
b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_0.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_1.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_1.png new file mode 100644 index 000000000..3ac3b2b3c Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_1.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_2.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_2.png new file mode 100644 index 000000000..ef3cb4498 Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_2.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_3.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_3.png new file mode 100644 index 000000000..1213c142a Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_3.png differ diff --git a/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_4.png b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_4.png new file mode 100644 index 000000000..a37f2467f Binary files /dev/null and b/v2.6.6/.doctrees/nbsphinx/tutorials_segmentation_32_4.png differ diff --git a/v2.6.6/.doctrees/tutorials/clean_learning/index.doctree b/v2.6.6/.doctrees/tutorials/clean_learning/index.doctree new file mode 100644 index 000000000..9da31be2a Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/clean_learning/index.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/clean_learning/tabular.doctree b/v2.6.6/.doctrees/tutorials/clean_learning/tabular.doctree new file mode 100644 index 000000000..47c0945ad Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/clean_learning/tabular.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/clean_learning/text.doctree b/v2.6.6/.doctrees/tutorials/clean_learning/text.doctree new file mode 100644 index 000000000..3888ca439 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/clean_learning/text.doctree differ diff --git 
a/v2.6.6/.doctrees/tutorials/datalab/audio.doctree b/v2.6.6/.doctrees/tutorials/datalab/audio.doctree new file mode 100644 index 000000000..a2a2ccfc7 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/datalab/audio.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/datalab/datalab_advanced.doctree b/v2.6.6/.doctrees/tutorials/datalab/datalab_advanced.doctree new file mode 100644 index 000000000..28ce26001 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/datalab/datalab_advanced.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/datalab/datalab_quickstart.doctree b/v2.6.6/.doctrees/tutorials/datalab/datalab_quickstart.doctree new file mode 100644 index 000000000..07c0fe275 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/datalab/datalab_quickstart.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/datalab/image.doctree b/v2.6.6/.doctrees/tutorials/datalab/image.doctree new file mode 100644 index 000000000..dec7c0814 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/datalab/image.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/datalab/index.doctree b/v2.6.6/.doctrees/tutorials/datalab/index.doctree new file mode 100644 index 000000000..ac7c81ea7 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/datalab/index.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/datalab/tabular.doctree b/v2.6.6/.doctrees/tutorials/datalab/tabular.doctree new file mode 100644 index 000000000..bd9792dc6 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/datalab/tabular.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/datalab/text.doctree b/v2.6.6/.doctrees/tutorials/datalab/text.doctree new file mode 100644 index 000000000..36c0514c4 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/datalab/text.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/datalab/workflows.doctree b/v2.6.6/.doctrees/tutorials/datalab/workflows.doctree new file mode 100644 index 000000000..c28fe076d Binary files /dev/null and 
b/v2.6.6/.doctrees/tutorials/datalab/workflows.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/dataset_health.doctree b/v2.6.6/.doctrees/tutorials/dataset_health.doctree new file mode 100644 index 000000000..b562ec965 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/dataset_health.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/faq.doctree b/v2.6.6/.doctrees/tutorials/faq.doctree new file mode 100644 index 000000000..82cd0325b Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/faq.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/indepth_overview.doctree b/v2.6.6/.doctrees/tutorials/indepth_overview.doctree new file mode 100644 index 000000000..461ccbd59 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/indepth_overview.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/index.doctree b/v2.6.6/.doctrees/tutorials/index.doctree new file mode 100644 index 000000000..d7d4395a1 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/index.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/multiannotator.doctree b/v2.6.6/.doctrees/tutorials/multiannotator.doctree new file mode 100644 index 000000000..5e6ea21f4 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/multiannotator.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/multilabel_classification.doctree b/v2.6.6/.doctrees/tutorials/multilabel_classification.doctree new file mode 100644 index 000000000..7ef15d80e Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/multilabel_classification.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/object_detection.doctree b/v2.6.6/.doctrees/tutorials/object_detection.doctree new file mode 100644 index 000000000..55684e5a8 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/object_detection.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/outliers.doctree b/v2.6.6/.doctrees/tutorials/outliers.doctree new file mode 100644 index 000000000..d82629936 Binary files /dev/null and 
b/v2.6.6/.doctrees/tutorials/outliers.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/pred_probs_cross_val.doctree b/v2.6.6/.doctrees/tutorials/pred_probs_cross_val.doctree new file mode 100644 index 000000000..949de954a Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/pred_probs_cross_val.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/regression.doctree b/v2.6.6/.doctrees/tutorials/regression.doctree new file mode 100644 index 000000000..aa804a8f7 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/regression.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/segmentation.doctree b/v2.6.6/.doctrees/tutorials/segmentation.doctree new file mode 100644 index 000000000..695d49604 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/segmentation.doctree differ diff --git a/v2.6.6/.doctrees/tutorials/token_classification.doctree b/v2.6.6/.doctrees/tutorials/token_classification.doctree new file mode 100644 index 000000000..6bb752362 Binary files /dev/null and b/v2.6.6/.doctrees/tutorials/token_classification.doctree differ diff --git a/v2.6.6/_images/tutorials_datalab_datalab_advanced_15_0.png b/v2.6.6/_images/tutorials_datalab_datalab_advanced_15_0.png new file mode 100644 index 000000000..f4e1cd479 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_datalab_advanced_15_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_datalab_quickstart_15_0.png b/v2.6.6/_images/tutorials_datalab_datalab_quickstart_15_0.png new file mode 100644 index 000000000..9d0daefb7 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_datalab_quickstart_15_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_30_1.png b/v2.6.6/_images/tutorials_datalab_image_30_1.png new file mode 100644 index 000000000..321b30602 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_30_1.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_30_3.png b/v2.6.6/_images/tutorials_datalab_image_30_3.png new file mode 100644 
index 000000000..1a7b24908 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_30_3.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_38_0.png b/v2.6.6/_images/tutorials_datalab_image_38_0.png new file mode 100644 index 000000000..9c10ccac5 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_38_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_44_0.png b/v2.6.6/_images/tutorials_datalab_image_44_0.png new file mode 100644 index 000000000..3f475d975 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_44_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_50_0.png b/v2.6.6/_images/tutorials_datalab_image_50_0.png new file mode 100644 index 000000000..d95c6961d Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_50_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_50_1.png b/v2.6.6/_images/tutorials_datalab_image_50_1.png new file mode 100644 index 000000000..a9e4dc375 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_50_1.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_50_2.png b/v2.6.6/_images/tutorials_datalab_image_50_2.png new file mode 100644 index 000000000..1aa029df0 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_50_2.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_50_3.png b/v2.6.6/_images/tutorials_datalab_image_50_3.png new file mode 100644 index 000000000..5c236c697 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_50_3.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_50_4.png b/v2.6.6/_images/tutorials_datalab_image_50_4.png new file mode 100644 index 000000000..bf5f771b5 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_50_4.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_57_0.png b/v2.6.6/_images/tutorials_datalab_image_57_0.png new file mode 100644 index 000000000..2ed7b063d Binary files /dev/null and 
b/v2.6.6/_images/tutorials_datalab_image_57_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_image_61_0.png b/v2.6.6/_images/tutorials_datalab_image_61_0.png new file mode 100644 index 000000000..82c886c5e Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_image_61_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_workflows_19_0.png b/v2.6.6/_images/tutorials_datalab_workflows_19_0.png new file mode 100644 index 000000000..6611034e4 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_workflows_19_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_workflows_31_0.png b/v2.6.6/_images/tutorials_datalab_workflows_31_0.png new file mode 100644 index 000000000..ff2f2167b Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_workflows_31_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_workflows_50_0.png b/v2.6.6/_images/tutorials_datalab_workflows_50_0.png new file mode 100644 index 000000000..3191192bc Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_workflows_50_0.png differ diff --git a/v2.6.6/_images/tutorials_datalab_workflows_50_1.png b/v2.6.6/_images/tutorials_datalab_workflows_50_1.png new file mode 100644 index 000000000..d3c8f4eb0 Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_workflows_50_1.png differ diff --git a/v2.6.6/_images/tutorials_datalab_workflows_74_0.png b/v2.6.6/_images/tutorials_datalab_workflows_74_0.png new file mode 100644 index 000000000..3c787126a Binary files /dev/null and b/v2.6.6/_images/tutorials_datalab_workflows_74_0.png differ diff --git a/v2.6.6/_images/tutorials_indepth_overview_25_0.png b/v2.6.6/_images/tutorials_indepth_overview_25_0.png new file mode 100644 index 000000000..4231ebeed Binary files /dev/null and b/v2.6.6/_images/tutorials_indepth_overview_25_0.png differ diff --git a/v2.6.6/_images/tutorials_indepth_overview_49_0.png b/v2.6.6/_images/tutorials_indepth_overview_49_0.png new file mode 100644 index 000000000..e61be677e Binary 
files /dev/null and b/v2.6.6/_images/tutorials_indepth_overview_49_0.png differ diff --git a/v2.6.6/_images/tutorials_indepth_overview_55_0.png b/v2.6.6/_images/tutorials_indepth_overview_55_0.png new file mode 100644 index 000000000..3d41baa39 Binary files /dev/null and b/v2.6.6/_images/tutorials_indepth_overview_55_0.png differ diff --git a/v2.6.6/_images/tutorials_indepth_overview_8_0.png b/v2.6.6/_images/tutorials_indepth_overview_8_0.png new file mode 100644 index 000000000..ca118fcac Binary files /dev/null and b/v2.6.6/_images/tutorials_indepth_overview_8_0.png differ diff --git a/v2.6.6/_images/tutorials_multilabel_classification_19_0.png b/v2.6.6/_images/tutorials_multilabel_classification_19_0.png new file mode 100644 index 000000000..6992b1216 Binary files /dev/null and b/v2.6.6/_images/tutorials_multilabel_classification_19_0.png differ diff --git a/v2.6.6/_images/tutorials_multilabel_classification_9_0.png b/v2.6.6/_images/tutorials_multilabel_classification_9_0.png new file mode 100644 index 000000000..2e409b9ff Binary files /dev/null and b/v2.6.6/_images/tutorials_multilabel_classification_9_0.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_22_1.png b/v2.6.6/_images/tutorials_object_detection_22_1.png new file mode 100644 index 000000000..482e044ac Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_22_1.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_24_1.png b/v2.6.6/_images/tutorials_object_detection_24_1.png new file mode 100644 index 000000000..512fc71c5 Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_24_1.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_26_1.png b/v2.6.6/_images/tutorials_object_detection_26_1.png new file mode 100644 index 000000000..e5cc313e5 Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_26_1.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_28_1.png 
b/v2.6.6/_images/tutorials_object_detection_28_1.png new file mode 100644 index 000000000..38c29b07d Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_28_1.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_31_1.png b/v2.6.6/_images/tutorials_object_detection_31_1.png new file mode 100644 index 000000000..178d07750 Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_31_1.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_33_1.png b/v2.6.6/_images/tutorials_object_detection_33_1.png new file mode 100644 index 000000000..d57dc95a6 Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_33_1.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_35_1.png b/v2.6.6/_images/tutorials_object_detection_35_1.png new file mode 100644 index 000000000..dfaf20952 Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_35_1.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_38_1.png b/v2.6.6/_images/tutorials_object_detection_38_1.png new file mode 100644 index 000000000..f9f15bfd3 Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_38_1.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_38_3.png b/v2.6.6/_images/tutorials_object_detection_38_3.png new file mode 100644 index 000000000..484169e5a Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_38_3.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_38_5.png b/v2.6.6/_images/tutorials_object_detection_38_5.png new file mode 100644 index 000000000..23b5b864f Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_38_5.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_44_1.png b/v2.6.6/_images/tutorials_object_detection_44_1.png new file mode 100644 index 000000000..a58ceca25 Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_44_1.png differ diff --git 
a/v2.6.6/_images/tutorials_object_detection_44_3.png b/v2.6.6/_images/tutorials_object_detection_44_3.png new file mode 100644 index 000000000..2cd0cb83a Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_44_3.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_44_5.png b/v2.6.6/_images/tutorials_object_detection_44_5.png new file mode 100644 index 000000000..fdc91223c Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_44_5.png differ diff --git a/v2.6.6/_images/tutorials_object_detection_8_0.png b/v2.6.6/_images/tutorials_object_detection_8_0.png new file mode 100644 index 000000000..62716ca2d Binary files /dev/null and b/v2.6.6/_images/tutorials_object_detection_8_0.png differ diff --git a/v2.6.6/_images/tutorials_outliers_13_0.png b/v2.6.6/_images/tutorials_outliers_13_0.png new file mode 100644 index 000000000..df4d5ad79 Binary files /dev/null and b/v2.6.6/_images/tutorials_outliers_13_0.png differ diff --git a/v2.6.6/_images/tutorials_outliers_15_0.png b/v2.6.6/_images/tutorials_outliers_15_0.png new file mode 100644 index 000000000..358bc9593 Binary files /dev/null and b/v2.6.6/_images/tutorials_outliers_15_0.png differ diff --git a/v2.6.6/_images/tutorials_outliers_20_1.png b/v2.6.6/_images/tutorials_outliers_20_1.png new file mode 100644 index 000000000..cf1eba332 Binary files /dev/null and b/v2.6.6/_images/tutorials_outliers_20_1.png differ diff --git a/v2.6.6/_images/tutorials_outliers_22_0.png b/v2.6.6/_images/tutorials_outliers_22_0.png new file mode 100644 index 000000000..02d2a08f6 Binary files /dev/null and b/v2.6.6/_images/tutorials_outliers_22_0.png differ diff --git a/v2.6.6/_images/tutorials_outliers_24_0.png b/v2.6.6/_images/tutorials_outliers_24_0.png new file mode 100644 index 000000000..ed13a9ba8 Binary files /dev/null and b/v2.6.6/_images/tutorials_outliers_24_0.png differ diff --git a/v2.6.6/_images/tutorials_outliers_27_0.png b/v2.6.6/_images/tutorials_outliers_27_0.png new file mode 
100644 index 000000000..3757f037e Binary files /dev/null and b/v2.6.6/_images/tutorials_outliers_27_0.png differ diff --git a/v2.6.6/_images/tutorials_outliers_29_0.png b/v2.6.6/_images/tutorials_outliers_29_0.png new file mode 100644 index 000000000..c39af09e8 Binary files /dev/null and b/v2.6.6/_images/tutorials_outliers_29_0.png differ diff --git a/v2.6.6/_images/tutorials_regression_14_0.png b/v2.6.6/_images/tutorials_regression_14_0.png new file mode 100644 index 000000000..63b616cfb Binary files /dev/null and b/v2.6.6/_images/tutorials_regression_14_0.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_19_0.png b/v2.6.6/_images/tutorials_segmentation_19_0.png new file mode 100644 index 000000000..6cb6f47bb Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_19_0.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_19_1.png b/v2.6.6/_images/tutorials_segmentation_19_1.png new file mode 100644 index 000000000..49067f19a Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_19_1.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_21_0.png b/v2.6.6/_images/tutorials_segmentation_21_0.png new file mode 100644 index 000000000..c041ae84a Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_21_0.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_21_1.png b/v2.6.6/_images/tutorials_segmentation_21_1.png new file mode 100644 index 000000000..d08f45e79 Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_21_1.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_21_2.png b/v2.6.6/_images/tutorials_segmentation_21_2.png new file mode 100644 index 000000000..9a8cc6bce Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_21_2.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_27_0.png b/v2.6.6/_images/tutorials_segmentation_27_0.png new file mode 100644 index 000000000..c041ae84a Binary files /dev/null and 
b/v2.6.6/_images/tutorials_segmentation_27_0.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_27_1.png b/v2.6.6/_images/tutorials_segmentation_27_1.png new file mode 100644 index 000000000..6a9558904 Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_27_1.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_27_2.png b/v2.6.6/_images/tutorials_segmentation_27_2.png new file mode 100644 index 000000000..bd9888ddd Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_27_2.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_27_3.png b/v2.6.6/_images/tutorials_segmentation_27_3.png new file mode 100644 index 000000000..6c7795496 Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_27_3.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_32_0.png b/v2.6.6/_images/tutorials_segmentation_32_0.png new file mode 100644 index 000000000..4355002b1 Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_32_0.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_32_1.png b/v2.6.6/_images/tutorials_segmentation_32_1.png new file mode 100644 index 000000000..3ac3b2b3c Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_32_1.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_32_2.png b/v2.6.6/_images/tutorials_segmentation_32_2.png new file mode 100644 index 000000000..ef3cb4498 Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_32_2.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_32_3.png b/v2.6.6/_images/tutorials_segmentation_32_3.png new file mode 100644 index 000000000..1213c142a Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_32_3.png differ diff --git a/v2.6.6/_images/tutorials_segmentation_32_4.png b/v2.6.6/_images/tutorials_segmentation_32_4.png new file mode 100644 index 000000000..a37f2467f Binary files /dev/null and b/v2.6.6/_images/tutorials_segmentation_32_4.png differ diff --git 
a/v2.6.6/_modules/cleanlab/benchmarking/noise_generation.html b/v2.6.6/_modules/cleanlab/benchmarking/noise_generation.html new file mode 100644 index 000000000..bac5cc852 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/benchmarking/noise_generation.html @@ -0,0 +1,1182 @@ + cleanlab.benchmarking.noise_generation - cleanlab

Source code for cleanlab.benchmarking.noise_generation

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+
+"""
+Helper methods that are useful for benchmarking cleanlab’s core algorithms.
+These methods introduce synthetic noise into the labels of a classification dataset.
+Specifically, this module provides methods for generating valid noise matrices (for which learning with noise is possible),
+generating noisy labels given a noise matrix, generating valid noise matrices with a specific trace value, and more.
+"""
+
+from typing import Optional
+
+import numpy as np
+from cleanlab.internal.util import value_counts
+from cleanlab.internal.constants import FLOATING_POINT_COMPARISON
+
+
+
[docs]def noise_matrix_is_valid(noise_matrix, py, *, verbose=False) -> bool: + """Given a prior `py` representing ``p(true_label=k)``, checks if the given `noise_matrix` is a + learnable matrix. Learnability means that it is possible to achieve + better than random performance, on average, for the amount of noise in + `noise_matrix`. + + Parameters + ---------- + noise_matrix : np.ndarray + An array of shape ``(K, K)`` representing the conditional probability + matrix ``P(label=k_s|true_label=k_y)`` containing the fraction of + examples in every class, labeled as every other class. Assumes columns of + `noise_matrix` sum to 1. + + py : np.ndarray + An array of shape ``(K,)`` representing the fraction (prior probability) + of each true class label, ``P(true_label = k)``. + + verbose : bool, default=False + If ``True``, prints the intermediate quantities compared for each class. + + Returns + ------- + is_valid : bool + Whether the noise matrix is a learnable matrix. + """ + + # Number of classes + K = len(py) + + # Assume some number of training examples for code readability. + # The choice does not affect the result: N only scales the values printed when verbose=True. + N = float(10000) + + ps = np.dot(noise_matrix, py) # P(label=k) + + # P(label=k, true_label=k') + joint_noise = np.multiply(noise_matrix, py) # / float(N) + + # Check that joint_noise is a valid probability matrix (its entries sum to 1) + if not (abs(joint_noise.sum() - 1.0) < FLOATING_POINT_COMPARISON): + return False + + # Check that noise_matrix is a valid matrix + # i.e.
check p(label=k)*p(true_label=k) < p(label=k, true_label=k) + for i in range(K): + C = N * joint_noise[i][i] + E1 = N * joint_noise[i].sum() - C + E2 = N * joint_noise.T[i].sum() - C + O = N - E1 - E2 - C + if verbose: + print( + "E1E2/C", + round(E1 * E2 / C), + "E1", + round(E1), + "E2", + round(E2), + "C", + round(C), + "|", + round(E1 * E2 / C + E1 + E2 + C), + "|", + round(E1 * E2 / C), + "<", + round(O), + ) + print( + round(ps[i] * py[i]), + "<", + round(joint_noise[i][i]), + ":", + ps[i] * py[i] < joint_noise[i][i], + ) + + if not (ps[i] * py[i] < joint_noise[i][i]): + return False + + return True
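Below is a minimal NumPy sketch of the per-class learnability condition that `noise_matrix_is_valid` checks, assuming the columns of the noise matrix sum to 1. `is_learnable` is a hypothetical helper written for illustration and is not part of cleanlab. It omits the verbose printout, and the two example matrices are made up.

```python
import numpy as np


def is_learnable(noise_matrix, py):
    """Hypothetical helper mirroring the per-class check in noise_matrix_is_valid."""
    noise_matrix = np.asarray(noise_matrix, dtype=float)
    py = np.asarray(py, dtype=float)
    # joint[i, j] = P(label=i | true_label=j) * P(true_label=j) = P(label=i, true_label=j)
    joint = noise_matrix * py
    if not np.isclose(joint.sum(), 1.0):
        return False
    ps = noise_matrix @ py  # P(label=k)
    # Learnable if, for every class k, P(label=k) * P(true_label=k) < P(label=k, true_label=k)
    return bool(np.all(ps * py < np.diag(joint)))


py = np.full(3, 1 / 3)
mostly_correct = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
mostly_flipped = np.array([[0.2, 0.4, 0.4], [0.4, 0.2, 0.4], [0.4, 0.4, 0.2]])
print(is_learnable(mostly_correct, py), is_learnable(mostly_flipped, py))  # True False
```

In the second matrix, each class keeps only 20% of its examples and sends 40% to each of the other two classes. The diagonal joint mass then falls below what independence would give, so the check fails.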
+ + +
[docs]def generate_noisy_labels(true_labels, noise_matrix) -> np.ndarray: + """Generates noisy `labels` from perfect labels `true_labels`, + "exactly" yielding the provided `noise_matrix` between `labels` and `true_labels`. + + The Examples section below gives a for-loop implementation of what this function does. + We do not use that implementation because it is slow, but + it explains in Python pseudocode what happens in this function. + + Parameters + ---------- + true_labels : np.ndarray + An array of shape ``(N,)`` representing perfect labels, without any + noise. Contains K distinct natural number classes, 0, 1, ..., K-1. + + noise_matrix : np.ndarray + An array of shape ``(K, K)`` representing the conditional probability + matrix ``P(label=k_s|true_label=k_y)`` containing the fraction of + examples in every class, labeled as every other class. Assumes columns of + `noise_matrix` sum to 1. + + Returns + ------- + labels : np.ndarray + An array of shape ``(N,)`` of noisy labels. + + Examples + -------- + + .. code:: python + + # Generate labels (K = number of classes, py = P(true_label=k)) + count_joint = (noise_matrix * py * len(true_labels)).round().astype(int) + labels = np.array(true_labels) + for k_s in range(K): + for k_y in range(K): + if k_s != k_y: + idx_flip = np.where((labels == k_y) & (true_labels == k_y))[0] + if len(idx_flip): + labels[np.random.choice( + idx_flip, + count_joint[k_s][k_y], + replace=False, + )] = k_s + """ + + # Make true_labels a numpy array, if it is not + true_labels = np.asarray(true_labels) + + # Number of classes + K = len(noise_matrix) + + # Compute p(true_label=k) + py = value_counts(true_labels) / float(len(true_labels)) + + # Counts of pairs (labels, true_labels) + count_joint = (noise_matrix * py * len(true_labels)).astype(int) + # Remove diagonal entries as they do not involve flipping of labels.
+ np.fill_diagonal(count_joint, 0) + + # Generate labels + labels = np.array(true_labels) + for k in range(K): # Iterate over true_label == k + # Get the noisy labels that have non-zero counts + labels_per_class = np.where(count_joint[:, k] != 0)[0] + # Find out how many of each noisy label we need to flip to + label_counts = count_joint[labels_per_class, k] + # Create a list of the new noisy labels + noise = [labels_per_class[i] for i, c in enumerate(label_counts) for z in range(c)] + # Randomly choose y labels for class k and set them to the noisy labels. + idx_flip = np.where((labels == k) & (true_labels == k))[0] + if len(idx_flip) and len(noise) and len(idx_flip) >= len(noise): # pragma: no cover + labels[np.random.choice(idx_flip, len(noise), replace=False)] = noise + + # Validate that labels indeed produces the correct noise_matrix (or close to it) + # Compute the actual noise matrix induced by labels + # counts = confusion_matrix(labels, true_labels).astype(float) + # new_noise_matrix = counts / counts.sum(axis=0) + # assert(np.linalg.norm(noise_matrix - new_noise_matrix) <= 2) + + return labels
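The commented-out validation at the end of `generate_noisy_labels` checks how closely the noisy labels reproduce the requested noise matrix. It does this by column-normalizing a confusion matrix of `(labels, true_labels)`. The sketch below does the same with plain NumPy. `empirical_noise_matrix` is a hypothetical helper, and the toy labels are invented.

```python
import numpy as np


def empirical_noise_matrix(labels, true_labels, K):
    """Hypothetical helper: estimate P(label=i | true_label=j) from paired label arrays."""
    counts = np.zeros((K, K))
    # counts[i, j] = number of examples with label == i and true_label == j
    np.add.at(counts, (np.asarray(labels), np.asarray(true_labels)), 1)
    # Normalize each column so it sums to 1 (assumes every class occurs in true_labels)
    return counts / counts.sum(axis=0, keepdims=True)


true_labels = np.array([0] * 10 + [1] * 10)
labels = true_labels.copy()
labels[:2] = 1     # flip 2 of the 10 class-0 examples to class 1
labels[10:13] = 0  # flip 3 of the 10 class-1 examples to class 0
print(empirical_noise_matrix(labels, true_labels, K=2))
# [[0.8 0.3]
#  [0.2 0.7]]
```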
+ + +
[docs]def generate_noise_matrix_from_trace( + K, + trace, + *, + max_trace_prob=1.0, + min_trace_prob=1e-5, + max_noise_rate=1 - 1e-5, + min_noise_rate=0.0, + valid_noise_matrix=True, + py=None, + frac_zero_noise_rates=0.0, + seed=0, + max_iter=10000, +) -> Optional[np.ndarray]: + """Generates a ``K x K`` noise matrix ``P(label=k_s|true_label=k_y)`` with + ``np.sum(np.diagonal(noise_matrix))`` equal to the given `trace`. + + Parameters + ---------- + K : int + Creates a noise matrix of shape ``(K, K)``. Implies there are + K classes for learning with noisy labels. + + trace : float + Sum of the diagonal entries of the returned noise matrix. + + max_trace_prob : float + Maximum value of any diagonal entry of the returned matrix. + + min_trace_prob : float + Minimum value of any diagonal entry of the returned matrix. + + max_noise_rate : float + Maximum noise_rate (non-diagonal entry) in the returned np.ndarray. + + min_noise_rate : float + Minimum noise_rate (non-diagonal entry) in the returned np.ndarray. + + valid_noise_matrix : bool, default=True + If ``True``, returns a matrix having all necessary conditions for + learning with noisy labels. In particular, ``p(true_label=k)p(label=k) < p(true_label=k,label=k)`` + is satisfied. This requires that ``trace > 1``. + + py : np.ndarray + An array of shape ``(K,)`` representing the fraction (prior probability) of each true class label, ``P(true_label = k)``. + This argument is **required** when ``valid_noise_matrix=True`` and ``K > 2``. + + frac_zero_noise_rates : float + The fraction of the ``K*(K-1)`` noise rates + that will be set to 0. Note that if you set a high trace, it may be + impossible to also have a low fraction of zero noise rates without + forcing all non-1 diagonal values. Instead, when this happens we only + guarantee to produce a noise matrix with `frac_zero_noise_rates` *or + higher*. The opposite occurs with a small trace. + + seed : int + Seeds the random number generator for numpy.
+ + max_iter : int, default=10000 + The maximum number of attempts to produce a valid matrix before returning ``None``. + + Returns + ------- + noise_matrix : np.ndarray or None + An array of shape ``(K, K)`` representing the noise matrix ``P(label=k_s|true_label=k_y)`` with `trace` + equal to ``np.sum(np.diagonal(noise_matrix))``. This is a conditional probability matrix and a + left stochastic matrix. Returns ``None`` if `max_iter` is exceeded. + """ + + if valid_noise_matrix and trace <= 1: + raise ValueError( + "trace = {}. trace > 1 is necessary for a".format(trace) + + " valid noise matrix to be returned (valid_noise_matrix == True)" + ) + + if valid_noise_matrix and py is None and K > 2: + raise ValueError( + "py must be provided (not None) if the input parameter" + " valid_noise_matrix == True" + ) + + if K <= 1: + raise ValueError("K must be >= 2, but K = {}.".format(K)) + + if max_iter < 1: + return None + + np.random.seed(seed) + + # Special (highly constrained) case with faster solution. + # Every 2 x 2 noise matrix with trace > 1 is valid because p(y) is not used + if K == 2: + if frac_zero_noise_rates >= 0.5: # Include a single zero noise rate + noise_mat = np.array( + [ + [1.0, 1 - (trace - 1.0)], + [0.0, trace - 1.0], + ] + ) + return noise_mat if np.random.rand() > 0.5 else np.rot90(noise_mat, k=2) + else: # No zero noise rates + diag = generate_n_rand_probabilities_that_sum_to_m(2, trace) + noise_matrix = np.array( + [ + [diag[0], 1 - diag[1]], + [1 - diag[0], diag[1]], + ] + ) + return noise_matrix + + # K > 2 + for z in range(max_iter): + noise_matrix = np.zeros(shape=(K, K)) + + # Randomly generate noise_matrix diagonal.
+ nm_diagonal = generate_n_rand_probabilities_that_sum_to_m( + n=K, + m=trace, + max_prob=max_trace_prob, + min_prob=min_trace_prob, + ) + np.fill_diagonal(noise_matrix, nm_diagonal) + + # Randomly distribute number of zero-noise-rates across columns + num_col_with_noise = K - np.count_nonzero(1 == nm_diagonal) + num_zero_noise_rates = int(K * (K - 1) * frac_zero_noise_rates) + # Remove zeros already in [1,0,..,0] columns + num_zero_noise_rates -= (K - num_col_with_noise) * (K - 1) + num_zero_noise_rates = np.maximum(num_zero_noise_rates, 0) # Prevent negative + num_zero_noise_rates_per_col = ( + randomly_distribute_N_balls_into_K_bins( + N=num_zero_noise_rates, + K=num_col_with_noise, + max_balls_per_bin=K - 2, + # 2 = one for diagonal, and one to sum to 1 + min_balls_per_bin=0, + ) + if K > 2 + else np.array([0, 0]) + ) # Special case when K == 2 + stack_nonzero_noise_rates_per_col = list(K - 1 - num_zero_noise_rates_per_col)[::-1] + # Randomly generate noise rates for columns with noise. + for col in np.arange(K)[nm_diagonal != 1]: + num_noise = stack_nonzero_noise_rates_per_col.pop() + # Generate num_noise noise_rates for the given column. + noise_rates_col = list( + generate_n_rand_probabilities_that_sum_to_m( + n=num_noise, + m=1 - nm_diagonal[col], + max_prob=max_noise_rate, + min_prob=min_noise_rate, + ) + ) + # Randomly select which rows of the noisy column to assign the + # random noise rates + rows = np.random.choice( + [row for row in range(K) if row != col], num_noise, replace=False + ) + for row in rows: + noise_matrix[row][col] = noise_rates_col.pop() + if not valid_noise_matrix or noise_matrix_is_valid(noise_matrix, py): + return noise_matrix + + return None
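Below is a simplified sketch of the core construction used for `K > 2`. It draws a diagonal that sums to `trace`, then splits each column's remaining mass `1 - diag[col]` across the off-diagonal rows. `sketch_noise_matrix_from_trace` is hypothetical and assumes `0 <= trace <= K`. It uses NumPy's `Generator` API instead of the global seed. It also ignores `py`, `valid_noise_matrix`, `frac_zero_noise_rates`, and the min/max bounds that the real function enforces.

```python
import numpy as np


def sketch_noise_matrix_from_trace(K, trace, seed=0, max_iter=10000):
    """Hypothetical simplified generator: columns sum to 1 and the diagonal sums to `trace`."""
    rng = np.random.default_rng(seed)
    for _ in range(max_iter):
        # Diagonal: K values summing to `trace`; reject draws with any entry above 1.
        diag = rng.dirichlet(np.ones(K)) * trace
        if np.any(diag > 1.0):
            continue
        noise_matrix = np.diag(diag)
        for col in range(K):
            off_rows = [row for row in range(K) if row != col]
            # Spread the leftover column mass 1 - diag[col] over the off-diagonal rows.
            noise_matrix[off_rows, col] = rng.dirichlet(np.ones(K - 1)) * (1.0 - diag[col])
        return noise_matrix
    return None  # give up after max_iter attempts, like the documented behavior


nm = sketch_noise_matrix_from_trace(K=3, trace=2.4)
print(np.trace(nm), nm.sum(axis=0))
```

Rejection sampling keeps the sketch short. For traces close to `K`, however, few draws pass, and the sketch may return `None`.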
+ + +
[docs]def generate_n_rand_probabilities_that_sum_to_m( + n, + m, + *, + max_prob=1.0, + min_prob=0.0, +) -> np.ndarray: + """ + Generates `n` random probabilities that sum to `m`. + + When ``min_prob=0`` and ``max_prob = 1.0``, use + ``np.random.dirichlet(np.ones(n))*m`` instead. + + Parameters + ---------- + n : int + Length of array of random probabilities to be returned. + + m : float + Sum of array of random probabilities that is returned. + + max_prob : float, default=1.0 + Maximum probability of any entry in the returned array. Must be between 0 and 1. + + min_prob : float, default=0.0 + Minimum probability of any entry in the returned array. Must be between 0 and 1. + + Returns + ------- + probabilities : np.ndarray + An array of shape ``(n,)`` of probabilities that sum to `m`. + """ + + if n == 0: + return np.array([]) + if (max_prob + FLOATING_POINT_COMPARISON) < m / float(n): + raise ValueError( + "max_prob must be greater than or equal to m / n, but " + + "max_prob = " + + str(max_prob) + + ", m = " + + str(m) + + ", n = " + + str(n) + + ", m / n = " + + str(m / float(n)) + ) + if min_prob > (m + FLOATING_POINT_COMPARISON) / float(n): + raise ValueError( + "min_prob must be less than or equal to m / n, but " + + "min_prob = " + + str(min_prob) + + ", m = " + + str(m) + + ", n = " + + str(n) + + ", m / n = " + + str(m / float(n)) + ) + + # When max_prob = 1, min_prob = 0, the line below is equivalent in distribution to: + # intermediate = np.sort(np.append(np.random.uniform(0, 1, n-1), [0, 1])) + # result = (intermediate[1:] - intermediate[:-1]) * m + result = np.random.dirichlet(np.ones(n)) * m + + min_val = min(result) + max_val = max(result) + while max_val > (max_prob + FLOATING_POINT_COMPARISON): + new_min = min_val + (max_val - max_prob) + # This adjustment prevents the new max from always being max_prob.
+ adjustment = (max_prob - new_min) * np.random.rand() + result[np.argmin(result)] = new_min + adjustment + result[np.argmax(result)] = max_prob - adjustment + min_val = min(result) + max_val = max(result) + + min_val = min(result) + max_val = max(result) + while min_val < (min_prob - FLOATING_POINT_COMPARISON): + min_val = min(result) + max_val = max(result) + new_max = max_val - (min_prob - min_val) + # This adjustment prevents the new min from always being min_prob. + adjustment = (new_max - min_prob) * np.random.rand() + result[np.argmax(result)] = new_max - adjustment + result[np.argmin(result)] = min_prob + adjustment + min_val = min(result) + max_val = max(result) + + return result
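The comment in `generate_n_rand_probabilities_that_sum_to_m` notes an equivalence for `max_prob=1` and `min_prob=0`. In that case, the Dirichlet draw matches taking the gaps between sorted uniform cut points. Below is a short sketch of that commented-out alternative, the classic broken-stick construction, whose gaps follow a flat Dirichlet distribution. `broken_stick` is a hypothetical name.

```python
import numpy as np


def broken_stick(n, m, rng):
    """Split [0, 1] at n - 1 uniform cut points and scale the n gaps so they sum to m."""
    cuts = np.sort(np.concatenate(([0.0, 1.0], rng.uniform(0.0, 1.0, n - 1))))
    return np.diff(cuts) * m


rng = np.random.default_rng(0)
pieces = broken_stick(n=4, m=2.5, rng=rng)
print(pieces, pieces.sum())  # four non-negative values whose sum is 2.5 (up to rounding)
```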
+ + +
[docs]def randomly_distribute_N_balls_into_K_bins( + N, # int + K, # int + *, + max_balls_per_bin=None, + min_balls_per_bin=None, +) -> np.ndarray: + """Returns a uniformly random numpy integer array of length `K` that sums + to `N`. If ``max_balls_per_bin * K < N``, the sum is capped at ``max_balls_per_bin * K``. + + Parameters + ---------- + N : int + Number of balls. + K : int + Number of bins. + max_balls_per_bin : int + Ensure that each bin contains at most `max_balls_per_bin` balls. + min_balls_per_bin : int + Ensure that each bin contains at least `min_balls_per_bin` balls. + + Returns + ------- + int_array : np.array + Length `K` array that sums to `N`. + """ + + if N == 0: + return np.zeros(K, dtype=int) + if max_balls_per_bin is None: + max_balls_per_bin = N + else: + max_balls_per_bin = min(max_balls_per_bin, N) + if min_balls_per_bin is None: + min_balls_per_bin = 0 + else: + min_balls_per_bin = min(min_balls_per_bin, N / K) + if N / float(K) > max_balls_per_bin: + N = max_balls_per_bin * K + + arr = np.round( + generate_n_rand_probabilities_that_sum_to_m( + n=K, + m=1, + max_prob=max_balls_per_bin / float(N), + min_prob=min_balls_per_bin / float(N), + ) + * N + ) + while sum(arr) != N: + while sum(arr) > N: # pragma: no cover + arr[np.argmax(arr)] -= 1 + while sum(arr) < N: + arr[np.argmin(arr)] += 1 + return arr.astype(int)
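The final loop of `randomly_distribute_N_balls_into_K_bins` repairs the total after rounding. Below is a standalone sketch of that step, assuming `fractions` already sums to 1. `round_preserving_sum` is a hypothetical helper. Note that `np.round` rounds halves to the nearest even number. That is why both entries in the second example first round up to 2.

```python
import numpy as np


def round_preserving_sum(fractions, N):
    """Hypothetical helper: scale `fractions` (summing to 1) by N, round, then repair the total."""
    arr = np.round(np.asarray(fractions, dtype=float) * N)
    # Rounding can overshoot or undershoot N; move one ball at a time until the sum is exact.
    while arr.sum() > N:
        arr[np.argmax(arr)] -= 1
    while arr.sum() < N:
        arr[np.argmin(arr)] += 1
    return arr.astype(int)


print(round_preserving_sum([1 / 3, 1 / 3, 1 / 3], N=10))  # [4 3 3]
print(round_preserving_sum([0.5, 0.5], N=3))              # [1 2]
```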
+ + Made with Sphinx and @pradyunsg's + + Furo + +
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/classification.html b/v2.6.6/_modules/cleanlab/classification.html new file mode 100644 index 000000000..1ec449533 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/classification.html @@ -0,0 +1,1762 @@ + cleanlab.classification - cleanlab

Source code for cleanlab.classification

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+cleanlab can be used for learning with noisy labels for any dataset and model.
+
+For regular (multi-class) classification tasks,
+the `~cleanlab.classification.CleanLearning` class wraps an instance of an
+sklearn classifier. The wrapped classifier must adhere to the `sklearn estimator API
+<https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator>`_,
+meaning it must define four functions:
+
+* ``clf.fit(X, y, sample_weight=None)``
+* ``clf.predict_proba(X)``
+* ``clf.predict(X)``
+* ``clf.score(X, y, sample_weight=None)``
+
+where `X` contains data (i.e. features), `y` contains labels (with elements in 0, 1, ..., K-1,
+where K is the number of classes). The first index of `X` and of `y` should correspond to the different examples in the dataset,
+such that ``len(X) = len(y) = N`` (sample-size). Here `sample_weight` re-weights examples in
+the loss function while training (supporting `sample_weight` in your classifier is recommended but optional).
+
+Furthermore, your estimator should be correctly clonable via
+`sklearn.base.clone <https://scikit-learn.org/stable/modules/generated/sklearn.base.clone.html>`_:
+cleanlab internally creates multiple instances of the
+estimator, and if you, for example, manually wrap a PyTorch model, you must ensure that
+every call to the estimator's ``__init__()`` creates an independent instance of
+the model (for sklearn compatibility, the weights of neural network models should typically be initialized inside of ``clf.fit()``).
+
+Note
+----
+There are two new notions of confidence in this package:
+
+1. Confident *examples* --- examples we are confident are labeled correctly.
+   We prune everything else. Mathematically, this means keeping the examples
+   with high probability of belonging to their provided label class.
+
+2. Confident *errors* --- examples we are confident are labeled erroneously.
+   We prune these. Mathematically, this means pruning the examples with
+   high probability of belonging to a different class.
+
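Below is a deliberately simplified NumPy sketch of these two notions, loosely following the per-class thresholds of Northcutt et al. (2021), Section 3.1. A class's threshold is the average predicted probability of that class among examples given that label. An example counts as a confident error when its most likely above-threshold class differs from its given label. `confident_errors_sketch` is hypothetical and is not cleanlab's implementation; in practice use `cleanlab.filter.find_label_issues`. The toy inputs are made up.

```python
import numpy as np


def confident_errors_sketch(labels, pred_probs):
    """Hypothetical simplified illustration of confident errors (not cleanlab's implementation)."""
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    K = pred_probs.shape[1]
    # Per-class threshold t_k: mean predicted probability of class k over examples labeled k.
    thresholds = np.array([pred_probs[labels == k, k].mean() for k in range(K)])
    # Classes whose predicted probability reaches that class's threshold.
    above = pred_probs >= thresholds
    # Among those classes, take the most likely one as the confident guess.
    guess = np.argmax(np.where(above, pred_probs, -np.inf), axis=1)
    # Confident error: confidently assigned to a class other than the given label.
    return above.any(axis=1) & (guess != labels)


labels = [0, 0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
print(confident_errors_sketch(labels, pred_probs).tolist())
# [False, False, True, False, False, False]
```

The third example is labeled 0 but confidently predicted as class 1, so it is flagged. All other examples are either confidently in their labeled class or not confidently in any class.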
+Examples
+--------
+>>> from cleanlab.classification import CleanLearning
+>>> from sklearn.linear_model import LogisticRegression as LogReg
+>>> cl = CleanLearning(clf=LogReg()) # Pass in any classifier.
+>>> cl.fit(X_train, labels_maybe_with_errors)
+>>> # Estimate the predictions as if you had trained without label issues.
+>>> pred = cl.predict(X_test)
+
+If the model is not sklearn-compatible by default, it might be the case that
+standard packages can adapt the model. For example, you can adapt PyTorch
+models using `skorch <https://skorch.readthedocs.io/>`_ and adapt Keras models
+using `SciKeras <https://www.adriangb.com/scikeras/>`_.
+
+If an open-source adapter doesn't already exist, you can manually wrap the
+model to be sklearn-compatible. This is made easy by inheriting from
+`sklearn.base.BaseEstimator
+<https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html>`_:
+
+.. code:: python
+
+    from sklearn.base import BaseEstimator
+
+    class YourModel(BaseEstimator):
+        def __init__(self):
+            pass
+        def fit(self, X, y, sample_weight=None):
+            pass
+        def predict(self, X):
+            pass
+        def predict_proba(self, X):
+            pass
+        def score(self, X, y, sample_weight=None):
+            pass
+
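Below is a minimal filled-in version of the template above, assuming scikit-learn and NumPy are installed. `NearestCentroidProba` is a hypothetical toy classifier (a softmax over negative distances to class centroids) and is not part of cleanlab or scikit-learn. It stores only hyperparameters in `__init__()` and creates all learned state in `fit()`, so `sklearn.base.clone` returns an independent, unfitted copy.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


class NearestCentroidProba(BaseEstimator):
    """Hypothetical toy classifier: softmax over negative distances to class centroids."""

    def __init__(self, temperature=1.0):
        # Only store hyperparameters here, so sklearn.base.clone() can rebuild the estimator.
        self.temperature = temperature

    def fit(self, X, y, sample_weight=None):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight, dtype=float)
        self.classes_ = np.unique(y)  # sorted, so predict_proba columns follow 0, 1, ..., K-1
        # Learned state is created in fit(), never in __init__().
        self.centroids_ = np.stack(
            [np.average(X[y == k], axis=0, weights=w[y == k]) for k in self.classes_]
        )
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        logits = -dists / self.temperature
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

    def score(self, X, y, sample_weight=None):
        return float(np.average(self.predict(X) == np.asarray(y), weights=sample_weight))


X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = [0, 0, 1, 1]
model = NearestCentroidProba().fit(X, y)
print(model.predict([[0.0, 0.0], [5.0, 5.0]]))  # [0 1]
fresh = clone(model)  # same hyperparameters, no fitted centroids_
```

An instance such as `NearestCentroidProba()` defines the `fit`, `predict_proba`, and `predict` methods that `CleanLearning` checks for when it is constructed.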
+Note
+----
+
+* `labels` refers to the given labels in the original dataset, which may have errors
+* labels must be integers in 0, 1, ..., K-1, where K is the total number of classes
+
+Note
+----
+
+Confident learning is the state-of-the-art (`Northcutt et al., 2021 <https://jair.org/index.php/jair/article/view/12125>`_) for
+weak supervision, finding label issues in datasets, learning with noisy
+labels, uncertainty estimation, and more. It works with *any* classifier,
+including deep neural networks. See the `clf` parameter.
+
+Confident learning is a subfield of the theory and algorithms of machine learning with noisy labels.
+Among open-source implementations of confident learning, cleanlab achieves state-of-the-art
+performance across a variety of tasks, such as multi-class classification, multi-label
+classification, and PU learning.
+
+Given any classifier having the `predict_proba` method, an input feature
+matrix `X`, and a discrete vector of noisy labels `labels`, confident learning estimates the
+classifications that would be obtained if the *true labels* had instead been provided
+to the classifier during training. Here `labels` denotes the noisy labels, which the
+confident learning paper writes as :math:`\\tilde{y}`.
+"""
+
+from sklearn.linear_model import LogisticRegression as LogReg
+from sklearn.metrics import accuracy_score
+from sklearn.base import BaseEstimator
+import numpy as np
+import pandas as pd
+import inspect
+import warnings
+from typing import Optional, TYPE_CHECKING
+
+if TYPE_CHECKING:  # pragma: no cover
+    from typing_extensions import Self
+
+from cleanlab.rank import get_label_quality_scores
+from cleanlab import filter
+from cleanlab.internal.util import (
+    value_counts,
+    compress_int_array,
+    subset_X_y,
+    get_num_classes,
+    force_two_dimensions,
+)
+from cleanlab.count import (
+    estimate_py_noise_matrices_and_cv_pred_proba,
+    estimate_py_and_noise_matrices_from_probabilities,
+    estimate_cv_predicted_probabilities,
+    estimate_latent,
+    compute_confident_joint,
+)
+from cleanlab.internal.latent_algebra import (
+    compute_py_inv_noise_matrix,
+    compute_noise_matrix_from_inverse,
+)
+from cleanlab.internal.validation import (
+    assert_valid_inputs,
+    labels_to_array,
+)
+from cleanlab.experimental.label_issues_batched import find_label_issues_batched
+
+
+
[docs]class CleanLearning(BaseEstimator): # Inherits sklearn classifier + """ + CleanLearning = Machine Learning with cleaned data (even when training on messy, error-ridden data). + + Automated and robust learning with noisy labels using any dataset and any model. This class + trains a model `clf` with error-prone, noisy labels as if the model had been instead trained + on a dataset with perfect labels. It achieves this by removing the label errors and training + on the remaining cleaned data. This class is currently intended for standard (multi-class) classification tasks. + + Parameters + ---------- + clf : estimator instance, optional + A classifier implementing the `sklearn estimator API + <https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator>`_, + defining the following functions: + + * ``clf.fit(X, y, sample_weight=None)`` + * ``clf.predict_proba(X)`` + * ``clf.predict(X)`` + * ``clf.score(X, y, sample_weight=None)`` + + See :py:mod:`cleanlab.experimental` for examples of sklearn wrappers, + e.g. around PyTorch and FastText. + + If the model is not sklearn-compatible by default, it might be the case that + standard packages can adapt the model. For example, you can adapt PyTorch + models using `skorch <https://skorch.readthedocs.io/>`_ and adapt Keras models + using `SciKeras <https://www.adriangb.com/scikeras/>`_. + + Stores the classifier used in Confident Learning. + The default classifier is `sklearn.linear_model.LogisticRegression + <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`_. + The default classifier assumes that indexing along the first dimension of the dataset corresponds to + selecting different training examples. + + seed : int, optional + Set the default state of the random number generator used to split + the cross-validated folds. By default, uses the current random state of `np.random`.
+ + cv_n_folds : int, default=5 + This class needs out-of-sample (holdout) predicted probabilities for every data example; + if they are not provided, it computes them via cross-validation. + `cv_n_folds` sets the number of cross-validation folds used to compute + out-of-sample probabilities for each example in `X`. + + converge_latent_estimates : bool, optional + If ``True``, forces numerical consistency of latent estimates. Each is + estimated independently, but they are related mathematically with closed-form + equivalences. This will iteratively enforce consistency. + + pulearning : {None, 0, 1}, default=None + Only works for 2-class datasets. Set to the integer of the class that is + perfectly labeled (you are certain that there are no errors in that class). + + find_label_issues_kwargs : dict, optional + Keyword arguments to pass into :py:func:`filter.find_label_issues + <cleanlab.filter.find_label_issues>`. Particularly useful options include: + `filter_by`, `frac_noise`, `min_examples_per_class` (which all impact ML accuracy), + `n_jobs` (set this to 1 to disable multi-processing if it's causing issues). + + label_quality_scores_kwargs : dict, optional + Keyword arguments to pass into :py:func:`rank.get_label_quality_scores + <cleanlab.rank.get_label_quality_scores>`. Options include: `method`, `adjust_pred_probs`. + + verbose : bool, default=False + Controls how much output is printed. Set to ``False`` to suppress print + statements. + + low_memory : bool, default=False + Set to ``True`` if you have a big dataset with limited memory. + Uses :py:func:`experimental.label_issues_batched.find_label_issues_batched <cleanlab.experimental.label_issues_batched.find_label_issues_batched>` + to find label issues.
+ """ + + def __init__( + self, + clf=None, + *, + seed=None, + # Hyper-parameters (used by .fit() function) + cv_n_folds=5, + converge_latent_estimates=False, + pulearning=None, + find_label_issues_kwargs={}, + label_quality_scores_kwargs={}, + verbose=False, + low_memory=False, + ): + self._default_clf = False + if clf is None: + # Use logistic regression if no classifier is provided. + clf = LogReg(solver="lbfgs") + self._default_clf = True + + # Make sure the given classifier has the appropriate methods defined. + if not hasattr(clf, "fit"): + raise ValueError("The classifier (clf) must define a .fit() method.") + if not hasattr(clf, "predict_proba"): + raise ValueError("The classifier (clf) must define a .predict_proba() method.") + if not hasattr(clf, "predict"): + raise ValueError("The classifier (clf) must define a .predict() method.") + + if seed is not None: + np.random.seed(seed=seed) + + self.clf = clf + self.seed = seed + self.cv_n_folds = cv_n_folds + self.converge_latent_estimates = converge_latent_estimates + self.pulearning = pulearning + self.find_label_issues_kwargs = find_label_issues_kwargs + self.label_quality_scores_kwargs = label_quality_scores_kwargs + self.verbose = verbose + self.label_issues_df = None + self.label_issues_mask = None + self.sample_weight = None + self.confident_joint = None + self.py = None + self.ps = None + self.num_classes = None + self.noise_matrix = None + self.inverse_noise_matrix = None + self.clf_kwargs = None + self.clf_final_kwargs = None + self.low_memory = low_memory + +
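The `cv_n_folds` parameter controls how out-of-sample `pred_probs` are computed. One common way to compute them is scikit-learn's `cross_val_predict` with `method="predict_proba"`. Below is a sketch on made-up data. cleanlab's own cross-validation (`estimate_cv_predicted_probabilities` in `cleanlab.count`) may differ in details such as fold assignment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Toy two-class data (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), rng.normal(3.0, 1.0, size=(50, 2))])
labels = np.array([0] * 50 + [1] * 50)

# Each row of pred_probs comes from a model that never saw that example during training.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
pred_probs = cross_val_predict(
    LogisticRegression(), X, labels, cv=cv, method="predict_proba"
)
print(pred_probs.shape)  # (100, 2)
```

An array computed this way could then be passed as `pred_probs` to `CleanLearning.fit`, which uses it instead of running its own cross-validation.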
[docs] def fit( + self, + X, + labels=None, + *, + pred_probs=None, + thresholds=None, + noise_matrix=None, + inverse_noise_matrix=None, + label_issues=None, + sample_weight=None, + clf_kwargs={}, + clf_final_kwargs={}, + validation_func=None, + y=None, + ) -> "Self": + """ + Train the model `clf` with error-prone, noisy labels as if + the model had been instead trained on a dataset with the correct labels. + `fit` achieves this by first training `clf` via cross-validation on the noisy data, + using the resulting predicted probabilities to identify label issues, + pruning the data with label issues, and finally training `clf` on the remaining clean data. + + Parameters + ---------- + X : np.ndarray or DatasetLike + Data features (i.e. training inputs for ML), typically an array of shape ``(N, ...)``, + where N is the number of examples. + Supported `DatasetLike` types beyond ``np.ndarray`` include: + ``pd.DataFrame``, ``scipy.sparse.csr_matrix``, ``torch.utils.data.Dataset``, ``tensorflow.data.Dataset``, + or any dataset object ``X`` that supports list-based indexing: + ``X[index_list]`` to select a subset of training examples. + Your classifier that this instance was initialized with, + ``clf``, must be able to ``fit()`` and ``predict()`` data of this format. + + Note + ---- + If providing `X` as a ``tensorflow.data.Dataset``, + make sure ``shuffle()`` has been called before ``batch()`` (if shuffling) + and no other order-destroying operation (eg. ``repeat()``) has been applied. + + labels : array_like + An array of shape ``(N,)`` of noisy classification labels, where some labels may be erroneous. + Elements must be integers in the set 0, 1, ..., K-1, where K is the number of classes. + Supported `array_like` types include: ``np.ndarray``, ``pd.Series``, or ``list``. + + pred_probs : np.ndarray, optional + An array of shape ``(N, K)`` of model-predicted probabilities, + ``P(label=k|x)``. 
+ Each row of this matrix corresponds + to an example `x` and contains the model-predicted probabilities that + `x` belongs to each possible class, for each of the K classes. The + columns must be ordered such that these probabilities correspond to class 0, 1, ..., K-1. + `pred_probs` should be :ref:`out-of-sample, eg. computed via cross-validation <pred_probs_cross_val>`. + If provided, `pred_probs` will be used to find label issues rather than the ``clf`` classifier. + + Note + ---- + If you are not sure, leave ``pred_probs=None`` (the default) and it + will be computed for you using cross-validation with the provided model. + + thresholds : array_like, optional + An array of shape ``(K, 1)`` or ``(K,)`` of per-class threshold + probabilities, used to determine the cutoff probability necessary to + consider an example as a given class label (see `Northcutt et al., + 2021 <https://jair.org/index.php/jair/article/view/12125>`_, Section + 3.1, Equation 2). + + This is for advanced users only. If not specified, these are computed + for you automatically. If an example has a predicted probability + greater than this threshold, it is counted as having true_label = + k. This is not used for pruning/filtering, only for estimating the + noise rates using confident counts. + + noise_matrix : np.ndarray, optional + An array of shape ``(K, K)`` representing the conditional probability + matrix ``P(label=k_s | true label=k_y)``, the + fraction of examples in every true class that are labeled as every other class. + Assumes columns of `noise_matrix` sum to 1. + + inverse_noise_matrix : np.ndarray, optional + An array of shape ``(K, K)`` representing the conditional probability + matrix ``P(true label=k_y | label=k_s)``, + the estimated fraction of observed examples in each class ``k_s`` + that are mislabeled examples from every other class ``k_y``. + Assumes columns of `inverse_noise_matrix` sum to 1.
+ + label_issues : pd.DataFrame or np.ndarray, optional + Specifies the label issues for each example in dataset. + If ``pd.DataFrame``, must be formatted as the one returned by: + :py:meth:`CleanLearning.find_label_issues + <cleanlab.classification.CleanLearning.find_label_issues>` or + `~cleanlab.classification.CleanLearning.get_label_issues`. + If ``np.ndarray``, must contain either boolean `label_issues_mask` as output by: + default :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>`, + or integer indices as output by + :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` + with its `return_indices_ranked_by` argument specified. + Providing this argument significantly reduces the time this method takes to run by + skipping the slow cross-validation step necessary to find label issues. + Examples identified to have label issues will be + pruned from the data before training the final `clf` model. + + Caution: If you provide `label_issues` without having previously called + `~cleanlab.classification.CleanLearning.find_label_issues` + e.g. as a ``np.ndarray``, then some functionality like training with sample weights may be disabled. + + sample_weight : array_like, optional + Array of weights with shape ``(N,)`` that are assigned to individual samples, + assuming total number of examples in dataset is `N`. + If not provided, samples may still be weighted by the estimated noise in the class they are labeled as. + + clf_kwargs : dict, optional + Optional keyword arguments to pass into `clf`'s ``fit()`` method. + + clf_final_kwargs : dict, optional + Optional extra keyword arguments to pass into the final `clf` ``fit()`` on the cleaned data + but not the `clf` ``fit()`` in each fold of cross-validation on the noisy data. + The final ``fit()`` will also receive `clf_kwargs`, + but these may be overwritten by values in `clf_final_kwargs`. 
+ This can be useful for training differently in the final ``fit()`` + than during cross-validation. + + validation_func : callable, optional + Optional callable function that takes two arguments, `X_val`, `y_val`, and returns a dict + of keyword arguments passed into ``clf.fit()`` which may be functions of the validation + data in each cross-validation fold. Specifies how to map the validation data split in each + cross-validation fold into the appropriate format to pass into `clf`'s ``fit()`` method, assuming + ``clf.fit()`` can utilize validation data if it is appropriately passed in (eg. for early-stopping). + Eg. if your model's ``fit()`` method is called using ``clf.fit(X, y, X_validation=..., y_validation=...)``, + then you could set ``validation_func = f`` where + ``def f(X_val, y_val): return {"X_validation": X_val, "y_validation": y_val}`` + + Note that `validation_func` will be ignored in the final call to `clf.fit()` on the + cleaned subset of the data. This argument is only for allowing `clf` to access the + validation data in each cross-validation fold (eg. for early-stopping or hyperparameter-selection + purposes). If you want to pass in validation data even in the final training call to ``clf.fit()`` + on the cleaned data subset, you should explicitly pass in that data yourself + (eg. via `clf_final_kwargs` or `clf_kwargs`). + + y : array_like, optional + Alternative argument that can be specified instead of `labels`. + Specifying `y` has the same effect as specifying `labels`, + and is offered as an alternative for compatibility with sklearn. + + Returns + ------- + self : CleanLearning + Fitted estimator that has all the same methods as any sklearn estimator.
+ + + After calling ``self.fit()``, this estimator also stores extra attributes such as: + + * *self.label_issues_df*: a ``pd.DataFrame`` accessible via + `~cleanlab.classification.CleanLearning.get_label_issues` + of similar format as the one returned by: `~cleanlab.classification.CleanLearning.find_label_issues`. + See documentation of :py:meth:`CleanLearning.find_label_issues<cleanlab.classification.CleanLearning.find_label_issues>` + for column descriptions. + + + After calling ``self.fit()``, `self.label_issues_df` may also contain an extra column: + + * *sample_weight*: Numeric values that were used to weight examples during + the final training of `clf` in ``CleanLearning.fit()``. + `sample_weight` column will only be present if automatic sample weights were actually used. + These automatic weights are assigned to each example based on the class it belongs to, + i.e. there are only num_classes unique sample_weight values. + The sample weight for an example belonging to class k is computed as ``1 / p(given_label = k | true_label = k)``. + This sample_weight normalizes the loss to effectively trick `clf` into learning with the distribution + of the true labels by accounting for the noisy data pruned out prior to training on cleaned data. + In other words, examples with label issues were removed, so this weights the data proportionally + so that the classifier trains as if it had all the true labels, + not just the subset of cleaned data left after pruning out the label issues. + + Note + ---- + If ``CleanLearning.fit()`` does not work for your data/model, you can run the same procedure yourself: + * Utilize :ref:`cross-validation <pred_probs_cross_val>` to get out-of-sample `pred_probs` for each example. + * Call :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` with `pred_probs`. + * Filter the examples with detected issues and train your model on the remaining data. 
+ """ + + if labels is not None and y is not None: + raise ValueError("You must specify either `labels` or `y`, but not both.") + if y is not None: + labels = y + if labels is None: + raise ValueError("You must specify `labels`.") + if self._default_clf: + X = force_two_dimensions(X) + + self.clf_final_kwargs = {**clf_kwargs, **clf_final_kwargs} + + if "sample_weight" in clf_kwargs: + raise ValueError( + "sample_weight should be provided directly in fit() or in clf_final_kwargs rather than in clf_kwargs" + ) + + if sample_weight is not None: + if "sample_weight" not in inspect.signature(self.clf.fit).parameters: + raise ValueError( + "sample_weight must be a supported fit() argument for your model in order to be specified here" + ) + + if label_issues is None: + if self.label_issues_df is not None and self.verbose: + print( + "If you already ran self.find_label_issues() and don't want to recompute, you " + "should pass the label_issues in as a parameter to this function next time." + ) + label_issues = self.find_label_issues( + X, + labels, + pred_probs=pred_probs, + thresholds=thresholds, + noise_matrix=noise_matrix, + inverse_noise_matrix=inverse_noise_matrix, + clf_kwargs=clf_kwargs, + validation_func=validation_func, + ) + + else: # set args that may not have been set if `self.find_label_issues()` wasn't called yet + assert_valid_inputs(X, labels, pred_probs) + if self.num_classes is None: + if noise_matrix is not None: + label_matrix = noise_matrix + else: + label_matrix = inverse_noise_matrix + self.num_classes = get_num_classes(labels, pred_probs, label_matrix) + if self.verbose: + print("Using provided label_issues instead of finding label issues.") + if self.label_issues_df is not None: + print( + "These will overwrite self.label_issues_df and will be returned by " + "`self.get_label_issues()`. " + ) + + # label_issues always overwrites self.label_issues_df. 
Ensure it is properly formatted: + self.label_issues_df = self._process_label_issues_arg(label_issues, labels) + + if "label_quality" not in self.label_issues_df.columns and pred_probs is not None: + if self.verbose: + print("Computing label quality scores based on given pred_probs ...") + self.label_issues_df["label_quality"] = get_label_quality_scores( + labels, pred_probs, **self.label_quality_scores_kwargs + ) + + self.label_issues_mask = self.label_issues_df["is_label_issue"].to_numpy() + x_mask = np.invert(self.label_issues_mask) + x_cleaned, labels_cleaned = subset_X_y(X, labels, x_mask) + if self.verbose: + print(f"Pruning {np.sum(self.label_issues_mask)} examples with label issues ...") + print(f"Remaining clean data has {len(labels_cleaned)} examples.") + + if sample_weight is None: + # Check if sample_weight in args of clf.fit() + if ( + "sample_weight" in inspect.signature(self.clf.fit).parameters + and "sample_weight" not in self.clf_final_kwargs + and self.noise_matrix is not None + ): + # Re-weight examples in the loss function for the final fitting + # such that the "apparent" original number of examples in each class + # is preserved, even though the pruned sets may differ. + if self.verbose: + print( + "Assigning sample weights for final training based on estimated label quality." 
+ ) + sample_weight_auto = np.ones(np.shape(labels_cleaned)) + for k in range(self.num_classes): + sample_weight_k = 1.0 / max( + self.noise_matrix[k][k], 1e-3 + ) # clip sample weights + sample_weight_auto[labels_cleaned == k] = sample_weight_k + + sample_weight_expanded = np.zeros( + len(labels) + ) # pad pruned examples with zeros, length of original dataset + sample_weight_expanded[x_mask] = sample_weight_auto + # Store the sample weight for every example in the original, unfiltered dataset + self.label_issues_df["sample_weight"] = sample_weight_expanded + self.sample_weight = self.label_issues_df[ + "sample_weight" + ] # pointer to here to avoid duplication + self.clf_final_kwargs["sample_weight"] = sample_weight_auto + if self.verbose: + print("Fitting final model on the clean data ...") + else: + if self.verbose: + if "sample_weight" in self.clf_final_kwargs: + print("Fitting final model on the clean data with custom sample_weight ...") + else: + if ( + "sample_weight" in inspect.signature(self.clf.fit).parameters + and self.noise_matrix is None + ): + print( + "Cannot utilize sample weights for final training! " + "Why this matters: during final training, sample weights help account for the amount of removed data in each class. " + "This helps ensure the correct class prior for the learned model. " + "To use sample weights, you need to either provide the noise_matrix or have previously called self.find_label_issues() instead of filter.find_label_issues() which computes them for you." 
+ ) + print("Fitting final model on the clean data ...") + + elif sample_weight is not None and "sample_weight" not in self.clf_final_kwargs: + self.clf_final_kwargs["sample_weight"] = sample_weight[x_mask] + if self.verbose: + print("Fitting final model on the clean data with custom sample_weight ...") + + else: # pragma: no cover + if self.verbose: + if "sample_weight" in self.clf_final_kwargs: + print("Fitting final model on the clean data with custom sample_weight ...") + else: + print("Fitting final model on the clean data ...") + + self.clf.fit(x_cleaned, labels_cleaned, **self.clf_final_kwargs) + + if self.verbose: + print( + "Label issues stored in label_issues_df DataFrame accessible via: self.get_label_issues(). " + "Call self.save_space() to delete this potentially large DataFrame attribute." + ) + return self
+ +
[docs] def predict(self, *args, **kwargs) -> np.ndarray: + """Predict class labels using your wrapped classifier `clf`. + Works just like ``clf.predict()``. + + Parameters + ---------- + X : np.ndarray or DatasetLike + Test data in the same format expected by your wrapped classifier. + + Returns + ------- + class_predictions : np.ndarray + Vector of class predictions for the test examples. + """ + if self._default_clf: + if args: + X = args[0] + elif "X" in kwargs: + X = kwargs["X"] + del kwargs["X"] + else: + raise ValueError("No input provided to predict, please provide X.") + X = force_two_dimensions(X) + new_args = (X,) + args[1:] + return self.clf.predict(*new_args, **kwargs) + else: + return self.clf.predict(*args, **kwargs)
+ +
[docs] def predict_proba(self, *args, **kwargs) -> np.ndarray: + """Predict class probabilities ``P(true label=k)`` using your wrapped classifier `clf`. + Works just like ``clf.predict_proba()``. + + Parameters + ---------- + X : np.ndarray or DatasetLike + Test data in the same format expected by your wrapped classifier. + + Returns + ------- + pred_probs : np.ndarray + ``(N x K)`` array of predicted class probabilities, one row for each test example. + """ + if self._default_clf: + if args: + X = args[0] + elif "X" in kwargs: + X = kwargs["X"] + del kwargs["X"] + else: + raise ValueError("No input provided to predict, please provide X.") + X = force_two_dimensions(X) + new_args = (X,) + args[1:] + return self.clf.predict_proba(*new_args, **kwargs) + else: + return self.clf.predict_proba(*args, **kwargs)
+ +
[docs] def score(self, X, y, sample_weight=None) -> float: + """Evaluates your wrapped classifier `clf`'s score on a test set `X` with labels `y`. + Uses your model's default scoring function, or simply accuracy if your model has no ``"score"`` attribute. + + Parameters + ---------- + X : np.ndarray or DatasetLike + Test data in the same format expected by your wrapped classifier. + + y : array_like + Test labels in the same format as labels previously used in ``fit()``. + + sample_weight : np.ndarray, optional + An array of shape ``(N,)`` or ``(N, 1)`` used to weight each test example when computing the score. + If your model's ``score()`` method does not accept `sample_weight`, it is ignored. + + Returns + ------- + score : float + Number quantifying the performance of this classifier on the test data. + """ + if self._default_clf: + X = force_two_dimensions(X) + if hasattr(self.clf, "score"): + # Check if sample_weight in clf.score() + if "sample_weight" in inspect.signature(self.clf.score).parameters: + return self.clf.score(X, y, sample_weight=sample_weight) + else: + return self.clf.score(X, y) + else: + return accuracy_score( + y, + self.clf.predict(X), + sample_weight=sample_weight, + )
+ +
[docs] def find_label_issues( + self, + X=None, + labels=None, + *, + pred_probs=None, + thresholds=None, + noise_matrix=None, + inverse_noise_matrix=None, + save_space=False, + clf_kwargs={}, + validation_func=None, + ) -> pd.DataFrame: + """ + Identifies potential label issues in the dataset using confident learning. + + Runs cross-validation to get out-of-sample pred_probs from `clf` + and then calls :py:func:`filter.find_label_issues + <cleanlab.filter.find_label_issues>` to find label issues. + These label issues are cached internally and returned in a pandas DataFrame. + Kwargs for :py:func:`filter.find_label_issues + <cleanlab.filter.find_label_issues>` must have already been specified + in the initialization of this class, not here. + + Unlike :py:func:`filter.find_label_issues + <cleanlab.filter.find_label_issues>`, which requires `pred_probs`, + this method only requires a classifier and it can do the cross-validation for you. + Both methods return the same boolean mask that identifies which examples have label issues. + This is the preferred method to use if you plan to subsequently invoke: + `~cleanlab.classification.CleanLearning.fit`. + + Note: this method computes the label issues from scratch. To access + previously-computed label issues from this `~cleanlab.classification.CleanLearning` instance, use the + `~cleanlab.classification.CleanLearning.get_label_issues` method. + + This is the method called to find label issues inside + `~cleanlab.classification.CleanLearning.fit` + and they share mostly the same parameters. + + Parameters + ---------- + save_space : bool, optional + If True, then returned `label_issues_df` will not be stored as attribute. + This means some other methods like `self.get_label_issues()` will no longer work. + + + For info about the **other parameters**, see the docstring of `~cleanlab.classification.CleanLearning.fit`. 
+ + Returns + ------- + label_issues_df : pd.DataFrame + DataFrame with info about label issues for each example. + Unless `save_space` argument is specified, same DataFrame is also stored as + `self.label_issues_df` attribute accessible via + `~cleanlab.classification.CleanLearning.get_label_issues`. + Each row represents an example from our dataset and + the DataFrame may contain the following columns: + + * *is_label_issue*: boolean mask for the entire dataset where ``True`` represents a label issue and ``False`` represents an example that is accurately labeled with high confidence. This column is equivalent to `label_issues_mask` output from :py:func:`filter.find_label_issues<cleanlab.filter.find_label_issues>`. + * *label_quality*: Numeric score that measures the quality of each label (how likely it is to be correct, with lower scores indicating potentially erroneous labels). + * *given_label*: Integer indices corresponding to the class label originally given for this example (same as `labels` input). Included here for ease of comparison against `clf` predictions, only present if "predicted_label" column is present. + * *predicted_label*: Integer indices corresponding to the class predicted by trained `clf` model. Only present if ``pred_probs`` were provided as input or computed during label-issue-finding. + * *sample_weight*: Numeric values used to weight examples during the final training of `clf` in `~cleanlab.classification.CleanLearning.fit`. This column may not be present after `self.find_label_issues()` but may be added after call to `~cleanlab.classification.CleanLearning.fit`. 
For more precise definition of sample weights, see documentation of `~cleanlab.classification.CleanLearning.fit` + """ + + # Check inputs + assert_valid_inputs(X, labels, pred_probs) + labels = labels_to_array(labels) + if noise_matrix is not None and np.trace(noise_matrix) <= 1: + t = np.round(np.trace(noise_matrix), 2) + raise ValueError("Trace(noise_matrix) is {}, but must exceed 1.".format(t)) + if inverse_noise_matrix is not None and (np.trace(inverse_noise_matrix) <= 1): + t = np.round(np.trace(inverse_noise_matrix), 2) + raise ValueError("Trace(inverse_noise_matrix) is {}. Must exceed 1.".format(t)) + + if self._default_clf: + X = force_two_dimensions(X) + if noise_matrix is not None: + label_matrix = noise_matrix + else: + label_matrix = inverse_noise_matrix + self.num_classes = get_num_classes(labels, pred_probs, label_matrix) + if (pred_probs is None) and (len(labels) / self.num_classes < self.cv_n_folds): + raise ValueError( + "Need more data from each class for cross-validation. " + "Try decreasing cv_n_folds (eg. to 2 or 3) in CleanLearning()" + ) + # 'ps' is p(labels=k) + self.ps = value_counts(labels) / float(len(labels)) + + self.clf_kwargs = clf_kwargs + if self.low_memory: + # If needed, compute P(label=k|x), denoted pred_probs (the predicted probabilities) + if pred_probs is None: + if self.verbose: + print( + "Computing out of sample predicted probabilities via " + f"{self.cv_n_folds}-fold cross validation. May take a while ..." 
+ ) + + pred_probs = estimate_cv_predicted_probabilities( + X=X, + labels=labels, + clf=self.clf, + cv_n_folds=self.cv_n_folds, + seed=self.seed, + clf_kwargs=self.clf_kwargs, + validation_func=validation_func, + ) + + if self.verbose: + print("Using predicted probabilities to identify label issues ...") + + if self.find_label_issues_kwargs: + warnings.warn(f"`find_label_issues_kwargs` is not used when `low_memory=True`.") + arg_values = { + "thresholds": thresholds, + "noise_matrix": noise_matrix, + "inverse_noise_matrix": inverse_noise_matrix, + } + for arg_name, arg_val in arg_values.items(): + if arg_val is not None: + warnings.warn(f"`{arg_name}` is not used when `low_memory=True`.") + label_issues_mask = find_label_issues_batched(labels, pred_probs, return_mask=True) + else: + self._process_label_issues_kwargs(self.find_label_issues_kwargs) + # self._process_label_issues_kwargs might set self.confident_joint. If so, we should use it. + if self.confident_joint is not None: + self.py, noise_matrix, inv_noise_matrix = estimate_latent( + confident_joint=self.confident_joint, + labels=labels, + ) + + # If needed, compute noise rates (probability of class-conditional mislabeling). + if noise_matrix is not None: + self.noise_matrix = noise_matrix + if inverse_noise_matrix is None: + if self.verbose: + print("Computing label noise estimates from provided noise matrix ...") + self.py, self.inverse_noise_matrix = compute_py_inv_noise_matrix( + ps=self.ps, + noise_matrix=self.noise_matrix, + ) + if inverse_noise_matrix is not None: + self.inverse_noise_matrix = inverse_noise_matrix + if noise_matrix is None: + if self.verbose: + print( + "Computing label noise estimates from provided inverse noise matrix ..." 
+ ) + self.noise_matrix = compute_noise_matrix_from_inverse( + ps=self.ps, + inverse_noise_matrix=self.inverse_noise_matrix, + ) + + if noise_matrix is None and inverse_noise_matrix is None: + if pred_probs is None: + if self.verbose: + print( + "Computing out of sample predicted probabilities via " + f"{self.cv_n_folds}-fold cross validation. May take a while ..." + ) + ( + self.py, + self.noise_matrix, + self.inverse_noise_matrix, + self.confident_joint, + pred_probs, + ) = estimate_py_noise_matrices_and_cv_pred_proba( + X=X, + labels=labels, + clf=self.clf, + cv_n_folds=self.cv_n_folds, + thresholds=thresholds, + converge_latent_estimates=self.converge_latent_estimates, + seed=self.seed, + clf_kwargs=self.clf_kwargs, + validation_func=validation_func, + ) + else: # pred_probs is provided by user (assumed holdout probabilities) + if self.verbose: + print("Computing label noise estimates from provided pred_probs ...") + ( + self.py, + self.noise_matrix, + self.inverse_noise_matrix, + self.confident_joint, + ) = estimate_py_and_noise_matrices_from_probabilities( + labels=labels, + pred_probs=pred_probs, + thresholds=thresholds, + converge_latent_estimates=self.converge_latent_estimates, + ) + # If needed, compute P(label=k|x), denoted pred_probs (the predicted probabilities) + if pred_probs is None: + if self.verbose: + print( + "Computing out of sample predicted probabilities via " + f"{self.cv_n_folds}-fold cross validation. May take a while ..." + ) + + pred_probs = estimate_cv_predicted_probabilities( + X=X, + labels=labels, + clf=self.clf, + cv_n_folds=self.cv_n_folds, + seed=self.seed, + clf_kwargs=self.clf_kwargs, + validation_func=validation_func, + ) + # If needed, compute the confident_joint (e.g. occurs if noise_matrix was given) + if self.confident_joint is None: + self.confident_joint = compute_confident_joint( + labels=labels, + pred_probs=pred_probs, + thresholds=thresholds, + ) + + # if pulearning == the integer specifying the class without noise. 
+ if self.num_classes == 2 and self.pulearning is not None: # pragma: no cover + # pulearning = 1 (no error in 1 class) implies p(label=1|true_label=0) = 0 + self.noise_matrix[self.pulearning][1 - self.pulearning] = 0 + self.noise_matrix[1 - self.pulearning][1 - self.pulearning] = 1 + # pulearning = 1 (no error in 1 class) implies p(true_label=0|label=1) = 0 + self.inverse_noise_matrix[1 - self.pulearning][self.pulearning] = 0 + self.inverse_noise_matrix[self.pulearning][self.pulearning] = 1 + # pulearning = 1 (no error in 1 class) implies p(label=1,true_label=0) = 0 + self.confident_joint[self.pulearning][1 - self.pulearning] = 0 + self.confident_joint[1 - self.pulearning][1 - self.pulearning] = 1 + + # Add confident joint to find label issue args if it is not previously specified + if "confident_joint" not in self.find_label_issues_kwargs.keys(): + # however does not add if users specify filter_by="confident_learning", as it will throw a warning + if not self.find_label_issues_kwargs.get("filter_by") == "confident_learning": + self.find_label_issues_kwargs["confident_joint"] = self.confident_joint + + labels = labels_to_array(labels) + if self.verbose: + print("Using predicted probabilities to identify label issues ...") + label_issues_mask = filter.find_label_issues( + labels, + pred_probs, + **self.find_label_issues_kwargs, + ) + label_quality_scores = get_label_quality_scores( + labels, pred_probs, **self.label_quality_scores_kwargs + ) + label_issues_df = pd.DataFrame( + {"is_label_issue": label_issues_mask, "label_quality": label_quality_scores} + ) + if self.verbose: + print(f"Identified {np.sum(label_issues_mask)} examples with label issues.") + + predicted_labels = pred_probs.argmax(axis=1) + label_issues_df["given_label"] = compress_int_array(labels, self.num_classes) + label_issues_df["predicted_label"] = compress_int_array(predicted_labels, self.num_classes) + + if not save_space: + if self.label_issues_df is not None and self.verbose: + print( + 
"Overwriting previously identified label issues stored at self.label_issues_df. " + "self.get_label_issues() will now return the newly identified label issues. " + ) + self.label_issues_df = label_issues_df + self.label_issues_mask = label_issues_df[ + "is_label_issue" + ] # pointer to here to avoid duplication + elif self.verbose: + print( # pragma: no cover + "Not storing label_issues as attributes since save_space was specified." + ) + + return label_issues_df
+ +
[docs] def get_label_issues(self) -> Optional[pd.DataFrame]: + """ + Accessor. Returns `label_issues_df` attribute if previously already computed. + This ``pd.DataFrame`` describes the label issues identified for each example + (each row corresponds to an example). + For column definitions, see the documentation of + `~cleanlab.classification.CleanLearning.find_label_issues`. + + Returns + ------- + label_issues_df : pd.DataFrame + DataFrame with (precomputed) info about label issues for each example. + """ + + if self.label_issues_df is None: + warnings.warn( + "Label issues have not yet been computed. Run `self.find_label_issues()` or `self.fit()` first." + ) + return self.label_issues_df
+ +
[docs] def save_space(self): + """ + Clears non-sklearn attributes of this estimator to save space (in-place). + This includes the DataFrame attribute that stored label issues which may be large for big datasets. + You may want to call this method before deploying this model (i.e. if you just care about producing predictions). + After calling this method, certain non-prediction-related attributes/functionality will no longer be available + (e.g. you cannot call ``self.fit()`` anymore). + """ + + if self.label_issues_df is None and self.verbose: + print("self.label_issues_df is already empty") # pragma: no cover + self.label_issues_df = None + self.sample_weight = None + self.label_issues_mask = None + self.find_label_issues_kwargs = None + self.label_quality_scores_kwargs = None + self.confident_joint = None + self.py = None + self.ps = None + self.num_classes = None + self.noise_matrix = None + self.inverse_noise_matrix = None + self.clf_kwargs = None + self.clf_final_kwargs = None + if self.verbose: + print("Deleted non-sklearn attributes such as label_issues_df to save space.")
+ + def _process_label_issues_kwargs(self, find_label_issues_kwargs): + """ + Private helper function that is used to modify the arguments passed to + filter.find_label_issues via the CleanLearning.find_label_issues method. Because + this is a classification task, some default parameters change and some errors should + be thrown if certain unsupported (for classification) arguments are passed in. This method + handles those parameters inside of find_label_issues_kwargs and throws an error if you pass + in a kwargs argument to filter.find_label_issues that is not supported by the + CleanLearning.find_label_issues() function. + """ + + # Defaults for CleanLearning.find_label_issues() vs filter.find_label_issues() + DEFAULT_FIND_LABEL_ISSUES_KWARGS = {"min_examples_per_class": 10} + find_label_issues_kwargs = {**DEFAULT_FIND_LABEL_ISSUES_KWARGS, **find_label_issues_kwargs} + # Todo: support multi_label classification in the future and remove multi_label from list + unsupported_kwargs = ["return_indices_ranked_by", "multi_label"] + for unsupported_kwarg in unsupported_kwargs: + if unsupported_kwarg in find_label_issues_kwargs: + raise ValueError( + "These kwargs of `find_label_issues()` are not supported " + f"for `CleanLearning`: {unsupported_kwargs}" + ) + # CleanLearning will use this to compute the noise_matrix and inverse_noise_matrix + if "confident_joint" in find_label_issues_kwargs: + self.confident_joint = find_label_issues_kwargs["confident_joint"] + self.find_label_issues_kwargs = find_label_issues_kwargs + + def _process_label_issues_arg(self, label_issues, labels) -> pd.DataFrame: + """ + Helper method to get the label_issues input arg into a formatted DataFrame. + """ + + labels = labels_to_array(labels) + if isinstance(label_issues, pd.DataFrame): + if "is_label_issue" not in label_issues.columns: + raise ValueError( + "DataFrame label_issues must contain column: 'is_label_issue'. "
" + "See CleanLearning.fit() documentation for label_issues column descriptions." + ) + if len(label_issues) != len(labels): + raise ValueError("label_issues and labels must have same length") + if "given_label" in label_issues.columns and np.any( + label_issues["given_label"].to_numpy() != labels + ): + raise ValueError("labels must match label_issues['given_label']") + return label_issues + elif isinstance(label_issues, np.ndarray): + if not label_issues.dtype in [np.dtype("bool"), np.dtype("int")]: + raise ValueError("If label_issues is numpy.array, dtype must be 'bool' or 'int'.") + if label_issues.dtype is np.dtype("bool") and label_issues.shape != labels.shape: + raise ValueError( + "If label_issues is boolean numpy.array, must have same shape as labels" + ) + if label_issues.dtype is np.dtype("int"): # convert to boolean mask + if len(np.unique(label_issues)) != len(label_issues): + raise ValueError( + "If label_issues.dtype is 'int', must contain unique integer indices " + "corresponding to examples with label issues such as output by: " + "filter.find_label_issues(..., return_indices_ranked_by=...)" + ) + issue_indices = label_issues + label_issues = np.full(len(labels), False, dtype=bool) + if len(issue_indices) > 0: + label_issues[issue_indices] = True + return pd.DataFrame({"is_label_issue": label_issues}) + else: + raise ValueError("label_issues must be either pandas.DataFrame or numpy.array")
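The integer branch of `_process_label_issues_arg` turns a set of unique issue indices (for example, the output of ``filter.find_label_issues(..., return_indices_ranked_by=...)``) into the boolean ``is_label_issue`` mask that `CleanLearning` stores. A minimal NumPy-only sketch of that conversion is shown below. `indices_to_mask` is a hypothetical name used for illustration and is not part of cleanlab's API.

```python
import numpy as np


def indices_to_mask(issue_indices: np.ndarray, num_examples: int) -> np.ndarray:
    """Hypothetical helper: turn unique integer indices into a boolean issue mask."""
    issue_indices = np.asarray(issue_indices, dtype=int)
    # Mirror the uniqueness check: duplicated indices suggest malformed input
    if len(np.unique(issue_indices)) != len(issue_indices):
        raise ValueError("issue indices must be unique")
    mask = np.full(num_examples, False, dtype=bool)
    if len(issue_indices) > 0:
        mask[issue_indices] = True  # flag every listed example
    return mask


mask = indices_to_mask(np.array([3, 0]), num_examples=5)
print(mask.tolist())  # [True, False, False, True, False]
```

Order does not matter for the mask: indices ranked by severity and the same indices sorted ascending produce identical results.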
+ Made with Sphinx and @pradyunsg's Furo
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/count.html b/v2.6.6/_modules/cleanlab/count.html
new file mode 100644
index 000000000..9eddad89c
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/count.html
@@ -0,0 +1,2177 @@

Source code for cleanlab.count

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to estimate latent structures used for confident learning, including:
+
+* Latent prior of the unobserved, error-less labels: `py`: ``p(y)``
+* Latent noisy channel (noise matrix) characterizing the flipping rates: `nm`: ``P(given label | true label)``
+* Latent inverse noise matrix characterizing the flipping process: `inv`: ``P(true label | given label)``
+* Latent `confident_joint`, an un-normalized matrix that counts the confident subset of label errors under the joint distribution for true/given label
+
+These are estimated from a classification dataset. This module considers two types of datasets:
+
+* standard (multi-class) classification where each example is labeled as belonging to exactly one of K classes (e.g. ``labels = np.array([0,0,1,0,2,1])``)
+* multi-label classification where each example can be labeled as belonging to multiple classes (e.g. ``labels = [[1,2],[1],[0],[],...]``)
+"""
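The latent quantities listed above are related through marginalization and normalization. As an illustration only, the sketch below assumes a joint distribution `joint[i, j] = P(given label=i, true label=j)` that sums to 1. From it, the sketch derives ``p(y)``, the noise matrix ``P(given label | true label)``, and the inverse noise matrix ``P(true label | given label)``, storing each matrix so that its columns sum to 1. The helper name `latent_from_joint` is hypothetical. cleanlab's internal routines imported below are not reproduced here.

```python
import numpy as np


def latent_from_joint(joint: np.ndarray):
    """Hypothetical helper: derive p(y), noise matrix and inverse noise matrix from a joint.

    joint[i, j] = P(given label=i, true label=j); entries are assumed to sum to 1.
    """
    py = joint.sum(axis=0)  # p(true label = j): sum over given labels
    ps = joint.sum(axis=1)  # p(given label = i): sum over true labels
    # noise_matrix[i, j] = P(given label=i | true label=j); each column sums to 1
    noise_matrix = joint / py[np.newaxis, :]
    # inverse_noise_matrix[i, j] = P(true label=i | given label=j); each column sums to 1
    inverse_noise_matrix = joint.T / ps[np.newaxis, :]
    return py, noise_matrix, inverse_noise_matrix


joint = np.array([[0.3, 0.2],
                  [0.1, 0.4]])
py, noise_matrix, inverse_noise_matrix = latent_from_joint(joint)
print(py)
print(noise_matrix)
print(inverse_noise_matrix)
```

For this joint, ``p(y)`` is ``[0.4, 0.6]`` while ``p(labels)`` is ``[0.5, 0.5]``. The two matrices differ because the joint is not symmetric.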
+
+import warnings
+from typing import Optional, Tuple, Union
+
+import numpy as np
+import sklearn.base
+from sklearn.linear_model import LogisticRegression as LogReg
+from sklearn.metrics import confusion_matrix
+from sklearn.model_selection import StratifiedKFold
+
+from cleanlab.internal.constants import (
+    CONFIDENT_THRESHOLDS_LOWER_BOUND,
+    FLOATING_POINT_COMPARISON,
+    TINY_VALUE,
+)
+from cleanlab.internal.latent_algebra import (
+    compute_inv_noise_matrix,
+    compute_noise_matrix_from_inverse,
+    compute_py,
+)
+from cleanlab.internal.multilabel_utils import get_onehot_num_classes, stack_complement
+from cleanlab.internal.util import (
+    append_extra_datapoint,
+    clip_noise_rates,
+    clip_values,
+    get_num_classes,
+    get_unique_classes,
+    is_tensorflow_dataset,
+    is_torch_dataset,
+    round_preserving_row_totals,
+    train_val_split,
+    value_counts_fill_missing_classes,
+)
+from cleanlab.internal.validation import assert_valid_inputs, labels_to_array
+from cleanlab.typing import LabelLike
+
+
+
[docs]def num_label_issues( + labels: LabelLike, + pred_probs: np.ndarray, + *, + confident_joint: Optional[np.ndarray] = None, + estimation_method: str = "off_diagonal", + multi_label: bool = False, +) -> int: + """Estimates the number of label issues in a classification dataset. Use this method to get the most accurate + estimate of number of label issues when you don't need the indices of the examples with label issues. + + Parameters + ---------- + labels : np.ndarray or list + Given class labels for each example in the dataset, some of which may be erroneous, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + pred_probs : + Model-predicted class probabilities for each example in the dataset, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + confident_joint : + Array of estimated class label error statistics used for identifying label issues, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + The `confident_joint` can be computed using `~cleanlab.count.compute_confident_joint`. + It is only used when ``estimation_method='off_diagonal_custom'``; otherwise it is ignored and recomputed internally from the given (noisy) `labels` and `pred_probs`. + + estimation_method : + Method for estimating the number of label issues in dataset by counting the examples in the off-diagonal of the `confident_joint` ``P(label=i, true_label=j)``. + + * ``'off_diagonal'``: Counts the number of examples in the off-diagonal of the `confident_joint`. Returns the same value as ``sum(find_label_issues(filter_by='confident_learning'))`` + + * ``'off_diagonal_calibrated'``: Calibrates confident joint estimate ``P(label=i, true_label=j)`` such that + ``np.sum(cj) == len(labels)`` and ``np.sum(cj, axis = 1) == np.bincount(labels)`` before counting the number + of examples in the off-diagonal. The result will always be equal to or greater than the + count from ``estimation_method='off_diagonal'``.
Prefer this value over the count from ``estimation_method='off_diagonal'`` as the cutoff threshold for ranking/scoring + functions from :py:mod:`cleanlab.rank` in + two cases: + + #. As we add more label and data quality scoring functions in :py:mod:`cleanlab.rank`, this approach will always work. + #. If you have a custom score to rank your data by label quality and you just need to know the cut-off of likely label issues. + + * ``'off_diagonal_custom'``: Counts the number of examples in the off-diagonal of a provided `confident_joint` matrix. + + TL;DR: Use this method to get the most accurate estimate of number of label issues when you don't need the indices of the label issues. + + Note: ``'off_diagonal'`` may sometimes underestimate issues for data with few classes, so consider using ``'off_diagonal_calibrated'`` instead if your data has < 4 classes. + + multi_label : bool, optional + Set ``False`` if your dataset is for regular (multi-class) classification, where each example belongs to exactly one class. + Set ``True`` if your dataset is for multi-label classification, where each example can belong to multiple classes. + See documentation of `~cleanlab.count.compute_confident_joint` for details. + + Returns + ------- + num_issues : + The estimated number of examples with label issues in the dataset. + """ + valid_methods = ["off_diagonal", "off_diagonal_calibrated", "off_diagonal_custom"] + if isinstance(confident_joint, np.ndarray) and estimation_method != "off_diagonal_custom": + warn_str = ( + "The supplied `confident_joint` is ignored as `confident_joint` is recomputed internally using " + "the supplied `labels` and `pred_probs`. If you still want to use a custom `confident_joint`, call the function " + "with `estimation_method='off_diagonal_custom'`."
+ ) + warnings.warn(warn_str) + + if multi_label: + return _num_label_issues_multilabel( + labels=labels, + pred_probs=pred_probs, + confident_joint=confident_joint, + ) + labels = labels_to_array(labels) + assert_valid_inputs(X=None, y=labels, pred_probs=pred_probs) + + if estimation_method == "off_diagonal": + _, cl_error_indices = compute_confident_joint( + labels=labels, + pred_probs=pred_probs, + calibrate=False, + return_indices_of_off_diagonals=True, + ) + + label_issues_mask = np.zeros(len(labels), dtype=bool) + label_issues_mask[cl_error_indices] = True + + # Remove label issues if model prediction is close to given label + mask = _reduce_issues(pred_probs=pred_probs, labels=labels) + label_issues_mask[mask] = False + num_issues = np.sum(label_issues_mask) + elif estimation_method == "off_diagonal_calibrated": + calculated_confident_joint = compute_confident_joint( + labels=labels, + pred_probs=pred_probs, + calibrate=True, + ) + assert isinstance(calculated_confident_joint, np.ndarray) + # Estimate_joint calibrates the row sums to match the prior distribution of given labels and normalizes to sum to 1 + joint = estimate_joint(labels, pred_probs, confident_joint=calculated_confident_joint) + frac_issues = 1.0 - joint.trace() + num_issues = np.rint(frac_issues * len(labels)).astype(int) + elif estimation_method == "off_diagonal_custom": + if not isinstance(confident_joint, np.ndarray): + raise ValueError( + f""" + No `confident_joint` provided. For 'estimation_method' = {estimation_method} you need to provide pre-calculated + `confident_joint` matrix. Use a different `estimation_method` if you want the `confident_joint` matrix to + be calculated for you. + """ + ) + else: + joint = estimate_joint(labels, pred_probs, confident_joint=confident_joint) + frac_issues = 1.0 - joint.trace() + num_issues = np.rint(frac_issues * len(labels)).astype(int) + else: + raise ValueError( + f""" + {estimation_method} is not a valid estimation method! 
+ Please choose a valid estimation method: {valid_methods} + """ + ) + + return num_issues
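The ``'off_diagonal_calibrated'`` and ``'off_diagonal_custom'`` branches above share the same final arithmetic. They normalize the calibrated joint to sum to 1, read the off-diagonal mass ``1 - trace`` as the estimated fraction of mislabeled examples, and scale by the number of examples. A minimal sketch follows. It assumes an already calibrated count matrix with made-up values; in cleanlab that matrix would come from `compute_confident_joint` / `calibrate_confident_joint`. The helper name is hypothetical.

```python
import numpy as np


def estimate_num_issues_from_counts(calibrated_cj: np.ndarray, num_examples: int) -> int:
    """Hypothetical helper mirroring the off-diagonal arithmetic of num_label_issues."""
    joint = calibrated_cj / calibrated_cj.sum()  # normalize counts to a distribution
    frac_issues = 1.0 - joint.trace()  # off-diagonal mass = estimated error rate
    return int(np.rint(frac_issues * num_examples))


# Rows: given label, columns: estimated true label; 100 examples in total
calibrated_cj = np.array([[45, 5],
                          [10, 40]])
print(estimate_num_issues_from_counts(calibrated_cj, num_examples=100))  # 15
```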
+ + +def _num_label_issues_multilabel( + labels: LabelLike, + pred_probs: np.ndarray, + confident_joint: Optional[np.ndarray] = None, +) -> int: + """ + Parameters + ---------- + labels: list + Refer to documentation for this argument in ``count.calibrate_confident_joint()`` with `multi_label=True` for details. + + pred_probs : np.ndarray + Predicted-probabilities in the same format expected by the `~cleanlab.count.get_confident_thresholds` function. + + Returns + ------- + num_issues : int + The estimated number of examples with label issues in the multi-label dataset. + + Note: We set the filter_by method as 'confident_learning' to match the non-multilabel case + (analog to the off_diagonal estimation method) + """ + + from cleanlab.filter import find_label_issues + + issues_idx = find_label_issues( + labels=labels, + pred_probs=pred_probs, + confident_joint=confident_joint, + multi_label=True, + filter_by="confident_learning", # specified to match num_label_issues + ) + return sum(issues_idx) + + +def _reduce_issues(pred_probs, labels): + """Returns a boolean mask denoting correct predictions or predictions within a margin around 0.5 for binary classification, suitable for filtering out indices in 'is_label_issue'.""" + pred_probs_copy = np.copy(pred_probs) # Make a copy of the original array + pred_probs_copy[np.arange(len(labels)), labels] += FLOATING_POINT_COMPARISON + pred = pred_probs_copy.argmax(axis=1) + mask = pred == labels + del pred_probs_copy # Delete copy + return mask + + +
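`_reduce_issues` builds the mask that `num_label_issues` uses to drop off-diagonal examples whose model prediction already agrees with the given label. Adding a tiny constant to the given label's probability makes exact ties, such as 0.5/0.5 in binary classification, resolve in favor of the given label. Below is a standalone sketch. The tie-break value `eps=1e-6` is an assumed stand-in for cleanlab's `FLOATING_POINT_COMPARISON` constant, and the function name is hypothetical.

```python
import numpy as np


def prediction_agrees_with_label(pred_probs: np.ndarray, labels: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Hypothetical helper: True where the (tie-broken) argmax equals the given label.

    `eps` plays the role of FLOATING_POINT_COMPARISON; 1e-6 is an assumed value.
    """
    nudged = pred_probs.copy()  # leave the caller's array untouched
    nudged[np.arange(len(labels)), labels] += eps  # favor the given label on exact ties
    return nudged.argmax(axis=1) == labels


pred_probs = np.array([[0.5, 0.5],   # exact tie: plain argmax would pick class 0
                       [0.2, 0.8],   # confidently disagrees with given label 0
                       [0.9, 0.1]])
labels = np.array([1, 0, 0])
print(prediction_agrees_with_label(pred_probs, labels).tolist())  # [True, False, True]
```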
[docs]def calibrate_confident_joint( + confident_joint: np.ndarray, labels: LabelLike, *, multi_label: bool = False +) -> np.ndarray: + """Calibrates any confident joint estimate ``P(label=i, true_label=j)`` such that + ``np.sum(cj) == len(labels)`` and ``np.sum(cj, axis = 1) == np.bincount(labels)``. + + In other words, this function forces the confident joint to have the + true noisy prior ``p(labels)`` (summed over columns for each row) and also + forces the confident joint to add up to the total number of examples. + + This method makes the confident joint a valid counts estimate + of the actual joint of noisy and true labels. + + Parameters + ---------- + confident_joint : np.ndarray + An array of shape ``(K, K)`` representing the confident joint, the matrix used for identifying label issues, which + estimates a confident subset of the joint distribution of the noisy and true labels, ``P_{noisy label, true label}``. + Entry ``(j, k)`` in the matrix is the number of examples confidently counted into the pair of ``(noisy label=j, true label=k)`` classes. + The `confident_joint` can be computed using `~cleanlab.count.compute_confident_joint`. + If `multi_label` is True, then the `confident_joint` should be a one-vs-rest array of shape ``(K, 2, 2)``, and an array of the same shape will be returned. + + labels : np.ndarray or list + Given class labels for each example in the dataset, some of which may be erroneous, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + multi_label : bool, optional + If ``False``, dataset is for regular (multi-class) classification, where each example belongs to exactly one class. + If ``True``, dataset is for multi-label classification, where each example can belong to multiple classes. + See documentation of `~cleanlab.count.compute_confident_joint` for details.
+ In multi-label classification, the confident/calibrated joint arrays have shape ``(K, 2, 2)`` + formatted in a one-vs-rest fashion such that they contain a 2x2 matrix for each class + that counts examples which are correctly/incorrectly labeled as belonging to that class. + After calibration, the entries in each class-specific 2x2 matrix will sum to the number of examples. + + Returns + ------- + calibrated_cj : np.ndarray + An array of shape ``(K, K)`` representing a valid estimate of the joint *counts* of noisy and true labels (if `multi_label` is False). + If `multi_label` is True, the returned `calibrated_cj` is instead a one-vs-rest array of shape ``(K, 2, 2)``, + where for class `c`: entry ``(c, 0, 0)`` in this one-vs-rest array is the number of examples whose noisy label contains `c` confidently identified as truly belonging to class `c` as well. + Entry ``(c, 1, 0)`` in this one-vs-rest array is the number of examples whose noisy label contains `c` confidently identified as not actually belonging to class `c`. + Entry ``(c, 0, 1)`` in this one-vs-rest array is the number of examples whose noisy label does not contain `c` confidently identified as truly belonging to class `c`. + Entry ``(c, 1, 1)`` in this one-vs-rest array is the number of examples whose noisy label does not contain `c` confidently identified as actually not belonging to class `c` as well. + + """ + + if multi_label: + if not isinstance(labels, list): + raise TypeError("`labels` must be list when `multi_label=True`.") + else: + return _calibrate_confident_joint_multilabel(confident_joint, labels) + else: + num_classes = len(confident_joint) + label_counts = value_counts_fill_missing_classes(labels, num_classes, multi_label=False) + # Calibrate confident joint to have correct p(labels) prior on noisy labels.
+ calibrated_cj = ( + confident_joint.T + / np.clip(confident_joint.sum(axis=1), a_min=TINY_VALUE, a_max=None) + * label_counts + ).T + # Calibrate confident joint to sum to: + # The number of examples (for single labeled datasets) + # The number of total labels (for multi-labeled datasets) + calibrated_cj = ( + calibrated_cj + / np.clip(np.sum(calibrated_cj), a_min=TINY_VALUE, a_max=None) + * sum(label_counts) + ) + return round_preserving_row_totals(calibrated_cj)
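For single-label data, the calibration above performs two rescalings. First, each row is scaled to the number of examples carrying that given label. Then the whole matrix is scaled to the total number of examples. A minimal NumPy sketch of those two steps follows. It assumes integer labels in ``0..K-1``, uses `np.bincount` in place of `value_counts_fill_missing_classes`, and uses ``1e-16`` as an assumed stand-in for `TINY_VALUE`. It also skips the final integer rounding done by `round_preserving_row_totals`. The helper name is hypothetical.

```python
import numpy as np


def calibrate_counts(confident_joint: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of the two rescaling steps (no final integer rounding)."""
    num_classes = len(confident_joint)
    label_counts = np.bincount(labels, minlength=num_classes)
    # Step 1: rescale each row so it sums to the number of examples with that given label
    row_sums = np.clip(confident_joint.sum(axis=1), 1e-16, None)
    calibrated = confident_joint / row_sums[:, np.newaxis] * label_counts[:, np.newaxis]
    # Step 2: rescale the whole matrix so it sums to the number of examples
    return calibrated / np.clip(calibrated.sum(), 1e-16, None) * label_counts.sum()


labels = np.array([0] * 6 + [1] * 4)
confident_joint = np.array([[2, 1],
                            [1, 3]])
print(calibrate_counts(confident_joint, labels))
```

Without the rounding step the result is a float matrix, here approximately ``[[4, 2], [1, 3]]``. Its row sums match `np.bincount(labels)` and its total matches ``len(labels)``, which are the two guarantees stated in the docstring.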
+ + +def _calibrate_confident_joint_multilabel(confident_joint: np.ndarray, labels: list) -> np.ndarray: + """Calibrates the confident joint for multi-label classification data. Here + input `labels` is a list of lists (or list of iterable). + This is intended as a helper function. You should probably + be using `calibrate_confident_joint(multi_label=True)` instead. + + + See `calibrate_confident_joint` docstring for more info. + + Parameters + ---------- + confident_joint : np.ndarray + Refer to documentation for this argument in count.calibrate_confident_joint() for details. + + labels : list + Refer to documentation for this argument in count.calibrate_confident_joint() for details. + + Returns + ------- + calibrated_cj : np.ndarray + An array of shape ``(K, 2, 2)`` of integer type (``np.int64``) representing a valid + estimate of the joint *counts* of noisy and true labels in a one-vs-rest fashion.""" + y_one, num_classes = get_onehot_num_classes(labels) + calibrate_confident_joint_list: np.ndarray = np.ndarray( + shape=(num_classes, 2, 2), dtype=np.int64 + ) + for class_num, (cj, y) in enumerate(zip(confident_joint, y_one.T)): + calibrate_confident_joint_list[class_num] = calibrate_confident_joint(cj, labels=y) + + return calibrate_confident_joint_list + + +
[docs]def estimate_joint( + labels: LabelLike, + pred_probs: np.ndarray, + *, + confident_joint: Optional[np.ndarray] = None, + multi_label: bool = False, +) -> np.ndarray: + """ + Estimates the joint distribution of label noise ``P(label=i, true_label=j)`` guaranteed to: + + * Sum to 1 + * Satisfy ``np.sum(joint_estimate, axis = 1) == p(labels)`` + + Parameters + ---------- + labels : np.ndarray or list + Given class labels for each example in the dataset, some of which may be erroneous, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + pred_probs : np.ndarray + Model-predicted class probabilities for each example in the dataset, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + confident_joint : np.ndarray, optional + Array of estimated class label error statistics used for identifying label issues, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + The `confident_joint` can be computed using `~cleanlab.count.compute_confident_joint`. + If not provided, it is internally computed from the given (noisy) `labels` and `pred_probs`. + + multi_label : bool, optional + If ``False``, dataset is for regular (multi-class) classification, where each example belongs to exactly one class. + If ``True``, dataset is for multi-label classification, where each example can belong to multiple classes. + See documentation of `~cleanlab.count.compute_confident_joint` for details. + + Returns + ------- + confident_joint_distribution : np.ndarray + An array of shape ``(K, K)`` representing an + estimate of the true joint distribution of noisy and true labels (if `multi_label` is False). + If `multi_label` is True, an array of shape ``(K, 2, 2)`` representing an + estimate of the true joint distribution of noisy and true labels for each class in a one-vs-rest fashion.
+ Entry ``(c, i, j)`` in this array is the estimated fraction of examples falling into the ``(class c, noisy label=i, true label=j)`` bin, + where `i, j` are either 0 or 1 to denote whether this example belongs to class `c` or not + (recall examples can belong to multiple classes in multi-label classification). + """ + + if confident_joint is None: + calibrated_cj = compute_confident_joint( + labels, + pred_probs, + calibrate=True, + multi_label=multi_label, + ) + else: + if labels is not None: + calibrated_cj = calibrate_confident_joint( + confident_joint, labels, multi_label=multi_label + ) + else: + calibrated_cj = confident_joint + + assert isinstance(calibrated_cj, np.ndarray) + if multi_label: + if not isinstance(labels, list): + raise TypeError("`labels` must be list when `multi_label=True`.") + else: + return _estimate_joint_multilabel( + labels=labels, pred_probs=pred_probs, confident_joint=confident_joint + ) + else: + return calibrated_cj / np.clip(float(np.sum(calibrated_cj)), a_min=TINY_VALUE, a_max=None)
+ + +def _estimate_joint_multilabel( + labels: list, pred_probs: np.ndarray, *, confident_joint: Optional[np.ndarray] = None +) -> np.ndarray: + """Parameters + ---------- + labels : list + Refer to documentation for this argument in filter.find_label_issues() for details. + + pred_probs : np.ndarray + Refer to documentation for this argument in count.estimate_joint() for details. + + confident_joint : np.ndarray, optional + Refer to documentation for this argument in filter.find_label_issues() with multi_label=True for details. + + Returns + ------- + confident_joint_distribution : np.ndarray + An array of shape ``(K, 2, 2)`` representing an + estimate of the true joint distribution of noisy and true labels for each class, in a one-vs-rest format employed for multi-label settings. + """ + y_one, num_classes = get_onehot_num_classes(labels, pred_probs) + if confident_joint is None: + calibrated_cj = compute_confident_joint( + labels, + pred_probs, + calibrate=True, + multi_label=True, + ) + else: + calibrated_cj = confident_joint + assert isinstance(calibrated_cj, np.ndarray) + calibrated_cf: np.ndarray = np.ndarray((num_classes, 2, 2)) + for class_num, (label, pred_prob_for_class) in enumerate(zip(y_one.T, pred_probs.T)): + pred_probs_binary = stack_complement(pred_prob_for_class) + calibrated_cf[class_num] = estimate_joint( + labels=label, + pred_probs=pred_probs_binary, + confident_joint=calibrated_cj[class_num], + ) + + return calibrated_cf + + +
[docs]def compute_confident_joint( + labels: LabelLike, + pred_probs: np.ndarray, + *, + thresholds: Optional[Union[np.ndarray, list]] = None, + calibrate: bool = True, + multi_label: bool = False, + return_indices_of_off_diagonals: bool = False, +) -> Union[np.ndarray, Tuple[np.ndarray, list]]: + """Estimates the confident counts of latent true vs observed noisy labels + for the examples in our dataset. This array of shape ``(K, K)`` is called the **confident joint** + and contains counts of examples in every class, confidently labeled as every other class. + These counts may subsequently be used to estimate the joint distribution of true and noisy labels + (by normalizing them to frequencies). + + Important: this function assumes that `pred_probs` are out-of-sample + holdout probabilities. This can be :ref:`done with cross validation <pred_probs_cross_val>`. If + the probabilities are not computed out-of-sample, overfitting may occur. + + Parameters + ---------- + labels : np.ndarray or list + Given class labels for each example in the dataset, some of which may be erroneous, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + pred_probs : np.ndarray + Model-predicted class probabilities for each example in the dataset, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + thresholds : array_like, optional + An array of shape ``(K, 1)`` or ``(K,)`` of per-class threshold + probabilities, used to determine the cutoff probability necessary to + consider an example as a given class label (see `Northcutt et al., + 2021 <https://jair.org/index.php/jair/article/view/12125>`_, Section + 3.1, Equation 2). + + This is for advanced users only. If not specified, these are computed + for you automatically. If an example has a predicted probability + greater than this threshold, it is counted as having true_label = + k. 
This is not used for pruning/filtering, only for estimating the + noise rates using confident counts. + + calibrate : bool, default=True + Calibrates confident joint estimate ``P(label=i, true_label=j)`` such that + ``np.sum(cj) == len(labels)`` and ``np.sum(cj, axis = 1) == np.bincount(labels)``. + When ``calibrate=True``, this method returns an estimate of + the latent true joint counts of noisy and true labels. + + multi_label : bool, optional + If ``True``, this is multi-label classification dataset (where each example can belong to more than one class) + rather than a regular (multi-class) classifiction dataset. + In this case, `labels` should be an iterable (e.g. list) of iterables (e.g. ``List[List[int]]``), + containing the list of classes to which each example belongs, instead of just a single class. + Example of `labels` for a multi-label classification dataset: ``[[0,1], [1], [0,2], [0,1,2], [0], [1], [], ...]``. + + return_indices_of_off_diagonals : bool, optional + If ``True``, returns indices of examples that were counted in off-diagonals + of confident joint as a baseline proxy for the label issues. This + sometimes works as well as ``filter.find_label_issues(confident_joint)``. + + + Returns + ------- + confident_joint_counts : np.ndarray + An array of shape ``(K, K)`` representing counts of examples + for which we are confident about their given and true label (if `multi_label` is False). + If `multi_label` is True, + this array instead has shape ``(K, 2, 2)`` representing a one-vs-rest format for the confident joint, where for each class `c`: + Entry ``(c, 0, 0)`` in this one-vs-rest array is the number of examples whose noisy label contains `c` confidently identified as truly belonging to class `c` as well. + Entry ``(c, 1, 0)`` in this one-vs-rest array is the number of examples whose noisy label contains `c` confidently identified as not actually belonging to class `c`. 
+ Entry ``(c, 0, 1)`` in this one-vs-rest array is the number of examples whose noisy label does not contain `c` confidently identified as truly belonging to class `c`. + Entry ``(c, 1, 1)`` in this one-vs-rest array is the number of examples whose noisy label does not contain `c` confidently identified as actually not belonging to class `c` as well. + + + Note + ---- + If `return_indices_of_off_diagonals` is set as True, this function instead returns a tuple `(confident_joint, indices_off_diagonal)` + where `indices_off_diagonal` contains the indices of examples counted in off-diagonals of the confident joint (for multi-label data, it is a list with one such array per class). + + Note + ---- + We provide a for-loop based simplification of the confident joint + below. This implementation is not efficient, not used in practice, and + not complete, but covers the gist of how the confident joint is computed: + + .. code:: python + + # Confident examples are those that we are confident have true_label = k + # Estimate (K, K) matrix of confident examples with label = k_s and true_label = k_y + cj_ish = np.zeros((K, K)) + for k_s in range(K): # k_s is the class value k of noisy labels `s` + for k_y in range(K): # k_y is the (guessed) class k of true_label k_y + cj_ish[k_s][k_y] = sum((pred_probs[:,k_y] >= (thresholds[k_y] - 1e-8)) & (labels == k_s)) + + The following is a loop-based (non-vectorized) implementation of the + confident joint, again slow and simplified for understanding. + This implementation is 100% accurate; it's just not optimized for speed. + + ..
code:: python + + confident_joint = np.zeros((K, K), dtype = int) + for i, row in enumerate(pred_probs): + s_label = labels[i] + confident_bins = row >= thresholds - 1e-6 + num_confident_bins = sum(confident_bins) + if num_confident_bins == 1: + confident_joint[s_label][np.argmax(confident_bins)] += 1 + elif num_confident_bins > 1: + confident_joint[s_label][np.argmax(row)] += 1 + """ + + if multi_label: + if not isinstance(labels, list): + raise TypeError("`labels` must be list when `multi_label=True`.") + + return _compute_confident_joint_multi_label( + labels=labels, + pred_probs=pred_probs, + thresholds=thresholds, + calibrate=calibrate, + return_indices_of_off_diagonals=return_indices_of_off_diagonals, + ) + + # labels needs to be a numpy array + labels = np.asarray(labels) + + # Estimate the probability thresholds for confident counting + if thresholds is None: + # P(we predict the given noisy label is k | given noisy label is k) + thresholds = get_confident_thresholds(labels, pred_probs, multi_label=multi_label) + thresholds = np.asarray(thresholds) + + # Compute confident joint (vectorized for speed). + + # pred_probs_bool is a bool matrix where each row represents a training example as a boolean vector of + # size num_classes, with True if the example confidently belongs to that class and False if not. + pred_probs_bool = pred_probs >= thresholds - 1e-6 + num_confident_bins = pred_probs_bool.sum(axis=1) + at_least_one_confident = num_confident_bins > 0 + more_than_one_confident = num_confident_bins > 1 + pred_probs_argmax = pred_probs.argmax(axis=1) + # Note that confident_argmax is meaningless for rows of all False + confident_argmax = pred_probs_bool.argmax(axis=1) + # For each example, choose the confident class (greater than threshold) + # When there is 2+ confident classes, choose the class with largest prob. 
+ true_label_guess = np.where( + more_than_one_confident, + pred_probs_argmax, + confident_argmax, + ) + # true_labels_confident omits meaningless all-False rows + true_labels_confident = true_label_guess[at_least_one_confident] + labels_confident = labels[at_least_one_confident] + confident_joint = confusion_matrix( + y_true=true_labels_confident, + y_pred=labels_confident, + labels=range(pred_probs.shape[1]), + ).T # Guarantee at least one correctly labeled example is represented in every class + np.fill_diagonal(confident_joint, confident_joint.diagonal().clip(min=1)) + if calibrate: + confident_joint = calibrate_confident_joint(confident_joint, labels) + + if return_indices_of_off_diagonals: + true_labels_neq_given_labels = true_labels_confident != labels_confident + indices = np.arange(len(labels))[at_least_one_confident][true_labels_neq_given_labels] + + return confident_joint, indices + + return confident_joint
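The vectorized steps above can be condensed into a short standalone sketch for single-label data. It makes two simplifying assumptions that are not taken from this file. First, the per-class thresholds are computed as the average predicted probability of class k among examples labeled k; this is a simplified stand-in for `get_confident_thresholds`, which may apply extra safeguards. Second, every class appears at least once in `labels`. Counting uses `np.add.at` instead of `sklearn.metrics.confusion_matrix`, and calibration is omitted (as with ``calibrate=False``). The function name is hypothetical.

```python
import numpy as np


def confident_joint_sketch(labels: np.ndarray, pred_probs: np.ndarray):
    """Hypothetical sketch: (uncalibrated confident joint, indices of off-diagonal examples).

    Rows of the joint are given labels, columns are guessed true labels.
    """
    num_classes = pred_probs.shape[1]
    # Assumed threshold: mean predicted probability of class k among examples labeled k
    thresholds = np.array([pred_probs[labels == k, k].mean() for k in range(num_classes)])
    confident = pred_probs >= thresholds - 1e-6  # (N, K) bool: classes above their threshold
    num_confident = confident.sum(axis=1)
    # Exactly one confident class: use it; two or more: use the argmax of pred_probs
    true_guess = np.where(num_confident > 1, pred_probs.argmax(axis=1), confident.argmax(axis=1))
    keep = num_confident > 0  # rows with no confident class are not counted
    cj = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cj, (labels[keep], true_guess[keep]), 1)  # increment cj[given, guessed]
    # Guarantee at least one count on every diagonal entry, as the code above does
    np.fill_diagonal(cj, cj.diagonal().clip(min=1))
    off_diagonal = np.flatnonzero(keep & (true_guess != labels))
    return cj, off_diagonal


labels = np.array([0, 0, 0, 1, 1, 1])
pred_probs = np.array([[0.9, 0.1],
                       [0.8, 0.2],
                       [0.3, 0.7],   # labeled 0 but looks like class 1
                       [0.2, 0.8],
                       [0.1, 0.9],
                       [0.7, 0.3]])  # labeled 1 but looks like class 0
cj, off_diagonal = confident_joint_sketch(labels, pred_probs)
print(cj.tolist())            # [[2, 1], [1, 2]]
print(off_diagonal.tolist())  # [2, 5]
```

In this toy example, examples 2 and 5 are the ones that land in off-diagonal cells. These are the kind of indices that ``return_indices_of_off_diagonals=True`` reports as a baseline proxy for label issues.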
+ + +def _compute_confident_joint_multi_label( + labels: list, + pred_probs: np.ndarray, + *, + thresholds: Optional[Union[np.ndarray, list]] = None, + calibrate: bool = True, + return_indices_of_off_diagonals: bool = False, +) -> Union[np.ndarray, Tuple[np.ndarray, list]]: + """Computes the confident joint for multi-label data, where + input `labels` is a list of lists (or list of iterable). + This is intended as a helper function. You should probably + be using `compute_confident_joint(multi_label=True)` instead. + + The MAJOR DIFFERENCE from the single-label case + is that the total number of errors considered is based on the number + of labels, not the number of examples. So, the confident_joint + will have larger values. + + See `compute_confident_joint` docstring for more info. + + Parameters + ---------- + labels : list of list/iterable (length N) + Given noisy labels for multi-label classification. + Must be a list of lists (or a list of np.ndarrays or iterable). + The i-th element is a list containing the classes that the i-th example belongs to. + + pred_probs : np.ndarray (shape (N, K)) + P(label=k|x) is a matrix with K model-predicted probabilities. + Each row of this matrix corresponds to an example `x` and contains the model-predicted + probabilities that `x` belongs to each possible class. + The columns must be ordered such that these probabilities correspond to class 0, 1, 2,..., K-1. + `pred_probs` must be out-of-sample (ideally should have been computed using 3+ fold cross-validation). + + thresholds : iterable (list or np.ndarray) of shape (K, 1) or (K,) + P(label^=k|label=k). If an example has a predicted probability "greater" than + this threshold, it is counted as having true_label = k. This is + not used for filtering/pruning, only for estimating the noise rates using + confident counts. This value should be between 0 and 1. Default is None.
+ + calibrate : bool, default = True + Calibrates confident joint estimate P(label=i, true_label=j) such that + ``np.sum(cj) == len(labels) and np.sum(cj, axis = 1) == np.bincount(labels)``. + + return_indices_of_off_diagonals: bool, default = False + If true returns indices of examples that were counted in off-diagonals + of confident joint as a baseline proxy for the label issues. This + sometimes works as well as filter.find_label_issues(confident_joint). + + Returns + ------- + confident_joint_counts : np.ndarray + An array of shape ``(K, 2, 2)`` representing the confident joint of noisy and true labels for each class, in a one-vs-rest format employed for multi-label settings. + + Note: if `return_indices_of_off_diagonals` is set as True, this function instead returns a tuple `(confident_joint_counts, indices_off_diagonal)` + where `indices_off_diagonal` is a list of arrays (one per class) and each array contains the indices of examples counted in off-diagonals of confident joint for that class. 
+ """ + + y_one, num_classes = get_onehot_num_classes(labels, pred_probs) + confident_joint_list: np.ndarray = np.ndarray(shape=(num_classes, 2, 2), dtype=np.int64) + indices_off_diagonal = [] + for class_num, (label, pred_prob_for_class) in enumerate(zip(y_one.T, pred_probs.T)): + pred_probs_binary = stack_complement(pred_prob_for_class) + if return_indices_of_off_diagonals: + cj, ind = compute_confident_joint( + labels=label, + pred_probs=pred_probs_binary, + multi_label=False, + thresholds=thresholds, + calibrate=calibrate, + return_indices_of_off_diagonals=return_indices_of_off_diagonals, + ) + indices_off_diagonal.append(ind) + else: + cj = compute_confident_joint( + labels=label, + pred_probs=pred_probs_binary, + multi_label=False, + thresholds=thresholds, + calibrate=calibrate, + return_indices_of_off_diagonals=return_indices_of_off_diagonals, + ) + confident_joint_list[class_num] = cj + + if return_indices_of_off_diagonals: + return confident_joint_list, indices_off_diagonal + + return confident_joint_list + + +
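`_compute_confident_joint_multi_label` reduces multi-label data to K binary problems. Column k of the one-hot label matrix is the binary label for class k. `stack_complement` turns column k of `pred_probs` into a two-column binary `pred_probs`. Below is a minimal NumPy sketch of that decomposition. It assumes `stack_complement` puts the complement ``1 - p`` in column 0 and ``p`` in column 1. `one_hot_sketch` and `binary_pred_probs_sketch` are hypothetical helpers, not cleanlab functions.

```python
import numpy as np


def one_hot_sketch(labels, num_classes):
    """Hypothetical helper: list of label lists -> (N, K) 0/1 matrix."""
    y = np.zeros((len(labels), num_classes), dtype=int)
    for i, classes in enumerate(labels):
        y[i, list(classes)] = 1  # an empty list leaves the row all zeros
    return y


def binary_pred_probs_sketch(p):
    """Hypothetical helper: columns [1 - p, p], so column 1 means 'example has class k'."""
    return np.column_stack([1 - p, p])


labels = [[0], [0, 2], [1], []]
pred_probs = np.array(
    [[0.9, 0.1, 0.2], [0.7, 0.2, 0.8], [0.1, 0.9, 0.1], [0.2, 0.3, 0.1]]
)
y_one = one_hot_sketch(labels, pred_probs.shape[1])
# One binary problem per class, iterating the same way as zip(y_one.T, pred_probs.T) above.
for k, (label_k, p_k) in enumerate(zip(y_one.T, pred_probs.T)):
    binary = binary_pred_probs_sketch(p_k)
    print(k, label_k, binary.shape)
```

Each binary problem then gets its own ``(2, 2)`` confident joint. The stacked result has shape ``(K, 2, 2)``.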
def estimate_latent( + confident_joint: np.ndarray, + labels: np.ndarray, + *, + py_method: str = "cnt", + converge_latent_estimates: bool = False, +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + """Computes the latent prior ``p(y)``, the noise matrix ``P(labels|y)`` and the + inverse noise matrix ``P(y|labels)`` from the `confident_joint` ``count(labels, y)``. The + `confident_joint` can be estimated by `~cleanlab.count.compute_confident_joint` + which counts confident examples. + + Parameters + ---------- + confident_joint : np.ndarray + An array of shape ``(K, K)`` representing the confident joint, the matrix used for identifying label issues, which + estimates a confident subset of the joint distribution of the noisy and true labels, ``P_{noisy label, true label}``. + Entry ``(j, k)`` in the matrix is the number of examples confidently counted into the pair of ``(noisy label=j, true label=k)`` classes. + The `confident_joint` can be computed using `~cleanlab.count.compute_confident_joint`. + + labels : np.ndarray + A 1D array of shape ``(N,)`` containing class labels for a standard (multi-class) classification dataset. Some given labels may be erroneous. + Elements must be integers in the set 0, 1, ..., K-1, where K is the number of classes. + + py_method : {"cnt", "eqn", "marginal", "marginal_ps"}, default="cnt" + `py` is shorthand for the "class proportions (a.k.a. prior) of the true labels". + This argument defines how to compute the latent prior ``p(true_label=k)``. Default is ``"cnt"``, + which uses the matrix diagonals instead of all the probabilities and therefore works well + even when the noise matrices are estimated poorly. + + converge_latent_estimates : bool, optional + If ``True``, forces numerical consistency of estimates. Each is estimated + independently, but they are related mathematically with closed form + equivalences.
This will iteratively make them mathematically consistent. + + Returns + ------ + tuple + A tuple containing (py, noise_matrix, inv_noise_matrix). + + Note + ---- + Multi-label classification is not supported in this method. + """ + + num_classes = len(confident_joint) + label_counts = value_counts_fill_missing_classes(labels, num_classes) + # 'ps' is p(labels=k) + ps = label_counts / float(len(labels)) + # Number of training examples confidently counted from each noisy class + labels_class_counts = confident_joint.sum(axis=1).astype(float) + # Number of training examples confidently counted into each true class + true_labels_class_counts = confident_joint.sum(axis=0).astype(float) + # p(label=k_s|true_label=k_y) ~ |label=k_s and true_label=k_y| / |true_label=k_y| + noise_matrix = confident_joint / np.clip(true_labels_class_counts, a_min=TINY_VALUE, a_max=None) + # p(true_label=k_y|label=k_s) ~ |true_label=k_y and label=k_s| / |label=k_s| + inv_noise_matrix = confident_joint.T / np.clip( + labels_class_counts, a_min=TINY_VALUE, a_max=None + ) + # Compute the prior p(y), the latent (uncorrupted) class distribution. + py = compute_py( + ps, + noise_matrix, + inv_noise_matrix, + py_method=py_method, + true_labels_class_counts=true_labels_class_counts, + ) + # Clip noise rates to be valid probabilities. + noise_matrix = clip_noise_rates(noise_matrix) + inv_noise_matrix = clip_noise_rates(inv_noise_matrix) + # Make latent estimates mathematically agree in their algebraic relations. + if converge_latent_estimates: + py, noise_matrix, inv_noise_matrix = _converge_estimates( + ps, py, noise_matrix, inv_noise_matrix + ) + # Again clip py and noise rates into proper range [0,1) + py = clip_values(py, low=1e-5, high=1.0, new_sum=1.0) + noise_matrix = clip_noise_rates(noise_matrix) + inv_noise_matrix = clip_noise_rates(inv_noise_matrix) + + return py, noise_matrix, inv_noise_matrix
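The core of `estimate_latent` is two normalizations of the confident joint, indexed ``cj[noisy label, true label]``:

- Dividing by column totals gives ``noise_matrix[i, j] = P(label=i | true_label=j)``.
- Transposing, then dividing by row totals, gives ``inv_noise_matrix[j, i] = P(true_label=j | label=i)``.

The worked example below reproduces those two lines on a small 2x2 joint, with `TINY_VALUE` clipping omitted. It also adds a diagonal-only prior estimate that follows from Bayes' rule, ``p(y=k) = P(y=k | label=k) p(label=k) / P(label=k | y=k)``. That prior formula is an illustration consistent with the docstring's description of ``"cnt"``. cleanlab's `compute_py` may differ in details such as clipping, so treat it as an assumption.

```python
import numpy as np

# Confident joint: rows = given (noisy) label, columns = guessed true label.
cj = np.array([[8, 2], [1, 9]], dtype=float)
ps = np.array([0.5, 0.5])  # observed label proportions p(label=k), assumed for this example

# Same normalizations as estimate_latent (TINY_VALUE clipping omitted).
noise_matrix = cj / cj.sum(axis=0)        # noise_matrix[i, j] = P(label=i | true=j)
inv_noise_matrix = cj.T / cj.sum(axis=1)  # inv_noise_matrix[j, i] = P(true=j | label=i)

# Diagonal-only prior from Bayes' rule, then renormalized to sum to 1.
py = ps * inv_noise_matrix.diagonal() / noise_matrix.diagonal()
py = py / py.sum()
print(py)  # approximately [0.45 0.55]
```

Both matrices have columns that sum to 1, which is the property `clip_noise_rates` and `_converge_estimates` assume.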
+ + +
def estimate_py_and_noise_matrices_from_probabilities( + labels: np.ndarray, + pred_probs: np.ndarray, + *, + thresholds: Optional[Union[np.ndarray, list]] = None, + converge_latent_estimates: bool = True, + py_method: str = "cnt", + calibrate: bool = True, +) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]: + """Computes the confident counts + estimate of latent variables `py` and the noise rates + using observed labels and predicted probabilities, `pred_probs`. + + Important: this function assumes that `pred_probs` are out-of-sample + holdout probabilities. This can be :ref:`done with cross validation <pred_probs_cross_val>`. If + the probabilities are not computed out-of-sample, overfitting may occur. + + This function estimates the `noise_matrix` of shape ``(K, K)``. This is the + fraction of examples in every class, labeled as every other class. The + `noise_matrix` is a conditional probability matrix for ``P(label=k_s|true_label=k_y)``. + + Under certain conditions, estimates are exact, and in most + conditions, estimates are within one percent of the actual noise rates. + + Parameters + ---------- + labels : np.ndarray + A 1D array of shape ``(N,)`` containing class labels for a standard (multi-class) classification dataset. Some given labels may be erroneous. + Elements must be integers in the set 0, 1, ..., K-1, where K is the number of classes. + + pred_probs : np.ndarray + Model-predicted class probabilities for each example in the dataset, + in same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + thresholds : array_like, optional + An array of shape ``(K, 1)`` or ``(K,)`` of per-class threshold + probabilities, used to determine the cutoff probability necessary to + consider an example as a given class label (see `Northcutt et al., + 2021 <https://jair.org/index.php/jair/article/view/12125>`_, Section + 3.1, Equation 2). + + This is for advanced users only.
If not specified, these are computed + for you automatically. If an example has a predicted probability + greater than this threshold, it is counted as having true_label = + k. This is not used for pruning/filtering, only for estimating the + noise rates using confident counts. + + converge_latent_estimates : bool, optional + If ``True``, forces numerical consistency of estimates. Each is estimated + independently, but they are related mathematically with closed form + equivalences. This will iteratively make them mathematically consistent. + + py_method : {"cnt", "eqn", "marginal", "marginal_ps"}, default="cnt" + How to compute the latent prior ``p(true_label=k)``. Default is ``"cnt"`` as it often + works well even when the noise matrices are estimated poorly by using + the matrix diagonals instead of all the probabilities. + + calibrate : bool, default=True + Calibrates confident joint estimate ``P(label=i, true_label=j)`` such that + ``np.sum(cj) == len(labels)`` and ``np.sum(cj, axis = 1) == np.bincount(labels)``. + + Returns + ------ + estimates : tuple + A tuple of arrays: (`py`, `noise_matrix`, `inverse_noise_matrix`, `confident_joint`). + + Note + ---- + Multi-label classification is not supported in this method. + """ + + confident_joint = compute_confident_joint( + labels=labels, + pred_probs=pred_probs, + thresholds=thresholds, + calibrate=calibrate, + ) + assert isinstance(confident_joint, np.ndarray) + py, noise_matrix, inv_noise_matrix = estimate_latent( + confident_joint=confident_joint, + labels=labels, + py_method=py_method, + converge_latent_estimates=converge_latent_estimates, + ) + assert isinstance(confident_joint, np.ndarray) + + return py, noise_matrix, inv_noise_matrix, confident_joint
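This function and `compute_confident_joint` both default to ``calibrate=True``. Its stated goal is a joint whose row sums match the given label counts and whose total matches ``len(labels)``. One way to meet both constraints is to rescale each row, sketched below with NumPy. `calibrate_sketch` is a hypothetical name. It skips any integer rounding step, so it illustrates the stated constraints rather than reproducing cleanlab's `calibrate_confident_joint`.

```python
import numpy as np


def calibrate_sketch(confident_joint, labels):
    """Hypothetical: rescale each row so row sums equal the observed label counts."""
    label_counts = np.bincount(labels, minlength=len(confident_joint))
    row_sums = confident_joint.sum(axis=1, keepdims=True).astype(float)
    # Guard against empty rows (assumed behavior: they stay at zero).
    row_sums[row_sums == 0] = 1.0
    return confident_joint / row_sums * label_counts[:, None]


labels = np.array([0] * 8 + [1] * 2)
cj = np.array([[3, 1], [0, 2]])
calibrated = calibrate_sketch(cj, labels)
print(calibrated)  # [[6. 2.]
                   #  [0. 2.]]
```

Once every row sum equals its label count, the total automatically equals ``len(labels)``.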
+ + +
def estimate_confident_joint_and_cv_pred_proba( + X, + labels, + clf=LogReg(solver="lbfgs"), + *, + cv_n_folds=5, + thresholds=None, + seed=None, + calibrate=True, + clf_kwargs={}, + validation_func=None, +) -> Tuple[np.ndarray, np.ndarray]: + """Estimates ``P(labels, y)``, the confident counts of the latent + joint distribution of true and noisy labels + using observed `labels` and out-of-sample predicted probabilities `pred_probs`, + which this function computes via cross-validation. + + The first output of this function is the confident joint, an array of shape ``(K, K)``; + the second is the array of out-of-sample predicted probabilities, of shape ``(N, K)``. + + Under certain conditions, estimates are exact, and in many + conditions, estimates are within one percent of actual. + + Notes: There are two ways to compute the confident joint, each with pros and cons. + (1) For each holdout set, compute the confident joint, then sum them up. + (2) Compute pred_proba for each fold, combine them, then compute the confident joint. + (1) is more accurate because it correctly computes thresholds for each fold. + (2) is more accurate when you have only a little data because it computes + the confident joint using all the probabilities. For example, if you had 100 + examples, with 5-fold cross validation + uniform p(y) you would only have 20 + examples to compute each confident joint for (1). Such small amounts of data + are bound to result in estimation errors. For this reason, we implement (2), + but we implement (1) as a commented out function at the end of this file. + + Parameters + ---------- + X : np.ndarray or pd.DataFrame + Input feature matrix of shape ``(N, ...)``, where N is the number of + examples. The classifier, + ``clf``, must be able to ``fit()`` on and ``predict_proba()`` from data in this format. + + labels : np.ndarray or pd.Series + A 1D array of shape ``(N,)`` containing class labels for a standard (multi-class) classification dataset. 
+ Some given labels may be erroneous. + Elements must be integers in the set 0, 1, ..., K-1, where K is the number of classes. + All classes must be present in the dataset.
+ + clf : estimator instance, optional + A classifier implementing the `sklearn estimator API + <https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator>`_. + + cv_n_folds : int, default=5 + The number of cross-validation folds used to compute + out-of-sample predicted probabilities for each example in `X`. + + thresholds : array_like, optional + An array of shape ``(K, 1)`` or ``(K,)`` of per-class threshold + probabilities, used to determine the cutoff probability necessary to + consider an example as a given class label (see `Northcutt et al., + 2021 <https://jair.org/index.php/jair/article/view/12125>`_, Section + 3.1, Equation 2). + + This is for advanced users only. If not specified, these are computed + for you automatically. If an example has a predicted probability + greater than this threshold, it is counted as having true_label = + k. This is not used for pruning/filtering, only for estimating the + noise rates using confident counts. + + seed : int, optional + Set the default state of the random number generator used to split + the cross-validated folds. If None, uses np.random current random state. + + calibrate : bool, default=True + Calibrates confident joint estimate ``P(label=i, true_label=j)`` such that + ``np.sum(cj) == len(labels)`` and ``np.sum(cj, axis = 1) == np.bincount(labels)``. + + clf_kwargs : dict, optional + Optional keyword arguments to pass into `clf`'s ``fit()`` method. + + validation_func : callable, optional + Specifies how to map the validation data split in cross-validation as input for ``clf.fit()``. + For details, see the documentation of :py:meth:`CleanLearning.fit<cleanlab.classification.CleanLearning.fit>` + + Returns + ------ + estimates : tuple + Tuple of two numpy arrays in the form: + (joint counts matrix, predicted probability matrix) + + Note + ---- + Multi-label classification is not supported in this method. 
+ """ + + assert_valid_inputs(X, labels) + labels = labels_to_array(labels) + num_classes = get_num_classes( + labels=labels + ) # This method definitely only works if all classes are present. + + # Create cross-validation object for out-of-sample predicted probabilities. + # CV folds preserve the fraction of noisy positive and + # noisy negative examples in each class. + kf = StratifiedKFold(n_splits=cv_n_folds, shuffle=True, random_state=seed) + + # Initialize pred_probs array + pred_probs = np.zeros(shape=(len(labels), num_classes)) + + # Split X and labels into "cv_n_folds" stratified folds. + # CV indices only require labels: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html + # Only split based on labels because X may have various formats: + for k, (cv_train_idx, cv_holdout_idx) in enumerate(kf.split(X=labels, y=labels)): + try: + clf_copy = sklearn.base.clone(clf) # fresh untrained copy of the model + except Exception: + raise ValueError( + "`clf` must be clonable via: sklearn.base.clone(clf). " + "You can either implement instance method `clf.get_params()` to produce a fresh untrained copy of this model, " + "or you can implement the cross-validation outside of cleanlab " + "and pass in the obtained `pred_probs` to skip cleanlab's internal cross-validation" + ) + # Select the training and holdout cross-validated sets. + X_train_cv, X_holdout_cv, s_train_cv, s_holdout_cv = train_val_split( + X, labels, cv_train_idx, cv_holdout_idx + ) + + # dict with keys: which classes missing, values: index of holdout data from this class that is duplicated: + missing_class_inds = {} + is_tf_or_torch_dataset = is_torch_dataset(X) or is_tensorflow_dataset(X) + if not is_tf_or_torch_dataset: + # Ensure no missing classes in training set. 
+ train_cv_classes = set(s_train_cv) + all_classes = set(range(num_classes)) + if len(train_cv_classes) != len(all_classes): + missing_classes = all_classes.difference(train_cv_classes) + warnings.warn( + "Duplicated some data across multiple folds to ensure training does not fail " + f"because these classes do not have enough data for proper cross-validation: {missing_classes}." + ) + for missing_class in missing_classes: + # Duplicate one instance of missing_class from holdout data to the training data: + holdout_inds = np.where(s_holdout_cv == missing_class)[0] + dup_idx = holdout_inds[0] + s_train_cv = np.append(s_train_cv, s_holdout_cv[dup_idx]) + # labels are always np.ndarray so don't have to consider .iloc above + X_train_cv = append_extra_datapoint( + to_data=X_train_cv, from_data=X_holdout_cv, index=dup_idx + ) + missing_class_inds[missing_class] = dup_idx + + # Map validation data into appropriate format to pass into classifier clf + if validation_func is None: + validation_kwargs = {} + elif callable(validation_func): + validation_kwargs = validation_func(X_holdout_cv, s_holdout_cv) + else: + raise TypeError("validation_func must be callable function with args: X_val, y_val") + + # Fit classifier clf to training set, predict on holdout set, and update pred_probs. + clf_copy.fit(X_train_cv, s_train_cv, **clf_kwargs, **validation_kwargs) + pred_probs_cv = clf_copy.predict_proba(X_holdout_cv) # P(labels = k|x) # [:,1] + + # Replace predictions for duplicated indices with dummy predictions: + for missing_class in missing_class_inds: + dummy_pred = np.zeros(pred_probs_cv[0].shape) + dummy_pred[missing_class] = 1.0 # predict given label with full confidence + dup_idx = missing_class_inds[missing_class] + pred_probs_cv[dup_idx] = dummy_pred + + pred_probs[cv_holdout_idx] = pred_probs_cv + + # Compute the confident counts, a num_classes x num_classes matrix for all pairs of labels. 
+ confident_joint = compute_confident_joint( + labels=labels, + pred_probs=pred_probs, # P(labels = k|x) + thresholds=thresholds, + calibrate=calibrate, + ) + assert isinstance(confident_joint, np.ndarray) + assert isinstance(pred_probs, np.ndarray) + + return confident_joint, pred_probs
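`estimate_confident_joint_and_cv_pred_proba` implements option (2) from its Notes:

1. Fit a fresh copy of `clf` on each training split.
2. Fill in the holdout rows of `pred_probs`.
3. Only then compute one confident joint from all the out-of-sample probabilities.

The sketch below mirrors that loop in plain NumPy so it runs without scikit-learn. The round-robin stratified split and the nearest-centroid classifier are hypothetical stand-ins for `StratifiedKFold` and `clf`, chosen only to keep the example self-contained.

```python
import numpy as np


def stratified_fold_ids(labels, n_folds, seed=0):
    """Hypothetical stand-in for StratifiedKFold: deal each class round-robin into folds."""
    rng = np.random.default_rng(seed)
    fold_ids = np.empty(len(labels), dtype=int)
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        rng.shuffle(idx)
        fold_ids[idx] = np.arange(len(idx)) % n_folds
    return fold_ids


def centroid_predict_proba(X_train, y_train, X_test, num_classes):
    """Hypothetical stand-in for clf: softmax over negative squared distances to centroids."""
    centroids = np.stack([X_train[y_train == k].mean(axis=0) for k in range(num_classes)])
    logits = -((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def cv_pred_probs_sketch(X, labels, n_folds=5, seed=0):
    num_classes = int(labels.max()) + 1
    fold_ids = stratified_fold_ids(labels, n_folds, seed)
    pred_probs = np.zeros((len(labels), num_classes))
    for fold in range(n_folds):
        holdout = fold_ids == fold
        # Fit on the other folds, predict only on the held-out fold.
        pred_probs[holdout] = centroid_predict_proba(
            X[~holdout], labels[~holdout], X[holdout], num_classes
        )
    return pred_probs


rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(5.0, 1.0, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
pred_probs = cv_pred_probs_sketch(X, labels)
print(pred_probs.shape)  # (40, 2)
```

Every row of `pred_probs` comes from a model that never saw that example. This is the out-of-sample property the docstrings above require.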
+ + +
def estimate_py_noise_matrices_and_cv_pred_proba( + X, + labels, + clf=LogReg(solver="lbfgs"), + *, + cv_n_folds=5, + thresholds=None, + converge_latent_estimates=False, + py_method="cnt", + seed=None, + clf_kwargs={}, + validation_func=None, +) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]: + """This function computes the out-of-sample predicted + probability ``P(label=k|x)`` for every example x in `X` using cross + validation, then estimates the noise rates from the confident counts + computed over all of the combined out-of-sample predicted probabilities. + + This function estimates the `noise_matrix` of shape ``(K, K)``. This is the + fraction of examples in every class, labeled as every other class. The + `noise_matrix` is a conditional probability matrix for ``P(label=k_s|true_label=k_y)``. + + Under certain conditions, estimates are exact, and in most + conditions, estimates are within one percent of the actual noise rates. + + Parameters + ---------- + X : np.ndarray + Input feature matrix of shape ``(N, ...)``, where N is the number of + examples. The classifier, + `clf`, must be able to handle data with this shape. + + labels : np.ndarray + A 1D array of shape ``(N,)`` containing class labels for a standard (multi-class) classification dataset. + Some given labels may be erroneous. + Elements must be integers in the set 0, 1, ..., K-1, where K is the number of classes. + All classes must be present in the dataset. + + clf : estimator instance, optional + A classifier implementing the `sklearn estimator API + <https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator>`_. + + cv_n_folds : int, default=5 + The number of cross-validation folds used to compute + out-of-sample probabilities for each example in `X`.
+ + thresholds : array_like, optional + An array of shape ``(K, 1)`` or ``(K,)`` of per-class threshold + probabilities, used to determine the cutoff probability necessary to + consider an example as a given class label (see `Northcutt et al., + 2021 <https://jair.org/index.php/jair/article/view/12125>`_, Section + 3.1, Equation 2). + + This is for advanced users only. If not specified, these are computed + for you automatically. If an example has a predicted probability + greater than this threshold, it is counted as having true_label = + k. This is not used for pruning/filtering, only for estimating the + noise rates using confident counts. + + converge_latent_estimates : bool, optional + If ``True``, forces numerical consistency of estimates. Each is estimated + independently, but they are related mathematically with closed form + equivalences. This will iteratively make them mathematically consistent. + + py_method : {"cnt", "eqn", "marginal", "marginal_ps"}, default="cnt" + How to compute the latent prior ``p(true_label=k)``. Default is ``"cnt"`` as it often + works well even when the noise matrices are estimated poorly by using + the matrix diagonals instead of all the probabilities. + + seed : int, optional + Set the default state of the random number generator used to split + the cross-validated folds. If ``None``, uses ``np.random`` current random state. + + clf_kwargs : dict, optional + Optional keyword arguments to pass into `clf`'s ``fit()`` method. + + validation_func : callable, optional + Specifies how to map the validation data split in cross-validation as input for ``clf.fit()``. + For details, see the documentation of :py:meth:`CleanLearning.fit<cleanlab.classification.CleanLearning.fit>` + + Returns + ------ + estimates: tuple + A tuple of five arrays (py, noise matrix, inverse noise matrix, confident joint, predicted probability matrix). + + Note + ---- + Multi-label classification is not supported in this method. 
+ """ + confident_joint, pred_probs = estimate_confident_joint_and_cv_pred_proba( + X=X, + labels=labels, + clf=clf, + cv_n_folds=cv_n_folds, + thresholds=thresholds, + seed=seed, + clf_kwargs=clf_kwargs, + validation_func=validation_func, + ) + + py, noise_matrix, inv_noise_matrix = estimate_latent( + confident_joint=confident_joint, + labels=labels, + py_method=py_method, + converge_latent_estimates=converge_latent_estimates, + ) + + return py, noise_matrix, inv_noise_matrix, confident_joint, pred_probs
+ + +
def estimate_cv_predicted_probabilities( + X, + labels, + clf=LogReg(solver="lbfgs"), + *, + cv_n_folds=5, + seed=None, + clf_kwargs={}, + validation_func=None, +) -> np.ndarray: + """This function computes the out-of-sample predicted + probability ``P(label=k|x)`` for every example in `X` using cross + validation. Output is a np.ndarray of shape ``(N, K)`` where N is + the number of training examples and K is the number of classes. + + Parameters + ---------- + X : np.ndarray + Input feature matrix of shape ``(N, ...)``, where N is the number of + examples. The classifier, + `clf`, must be able to handle data with this shape. + + labels : np.ndarray + A 1D array of shape ``(N,)`` containing class labels for a standard (multi-class) classification dataset. + Some given labels may be erroneous. + Elements must be integers in the set 0, 1, ..., K-1, where K is the number of classes. + All classes must be present in the dataset. + + clf : estimator instance, optional + A classifier implementing the `sklearn estimator API + <https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator>`_. + + cv_n_folds : int, default=5 + The number of cross-validation folds used to compute + out-of-sample probabilities for each example in `X`. + + seed : int, optional + Set the default state of the random number generator used to split + the cross-validated folds. If ``None``, uses ``np.random`` current random state. + + clf_kwargs : dict, optional + Optional keyword arguments to pass into `clf`'s ``fit()`` method. + + validation_func : callable, optional + Specifies how to map the validation data split in cross-validation as input for ``clf.fit()``. + For details, see the documentation of :py:meth:`CleanLearning.fit<cleanlab.classification.CleanLearning.fit>` + + Returns + -------- + pred_probs : np.ndarray + An array of shape ``(N, K)`` representing ``P(label=k|x)``, the model-predicted probabilities.
+ Each row of this matrix corresponds to an example `x` and contains the model-predicted + probabilities that `x` belongs to each possible class. + """ + + return estimate_py_noise_matrices_and_cv_pred_proba( + X=X, + labels=labels, + clf=clf, + cv_n_folds=cv_n_folds, + seed=seed, + clf_kwargs=clf_kwargs, + validation_func=validation_func, + )[-1]
+ + +
def estimate_noise_matrices( + X, + labels, + clf=LogReg(solver="lbfgs"), + *, + cv_n_folds=5, + thresholds=None, + converge_latent_estimates=True, + seed=None, + clf_kwargs={}, + validation_func=None, +) -> Tuple[np.ndarray, np.ndarray]: + """Estimates the `noise_matrix` of shape ``(K, K)``. This is the + fraction of examples in every class, labeled as every other class. The + `noise_matrix` is a conditional probability matrix for ``P(label=k_s|true_label=k_y)``. + + Under certain conditions, estimates are exact, and in most + conditions, estimates are within one percent of the actual noise rates. + + Parameters + ---------- + X : np.ndarray + Input feature matrix of shape ``(N, ...)``, where N is the number of + examples. The classifier, + `clf`, must be able to handle data with this shape. + + labels : np.ndarray + An array of shape ``(N,)`` of noisy labels, i.e. some labels may be erroneous. + Elements must be integers in the set 0, 1, ..., K-1, where K is the number of classes. + + clf : estimator instance, optional + A classifier implementing the `sklearn estimator API + <https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator>`_. + + cv_n_folds : int, default=5 + The number of cross-validation folds used to compute + out-of-sample probabilities for each example in `X`. + + thresholds : array_like, optional + An array of shape ``(K, 1)`` or ``(K,)`` of per-class threshold + probabilities, used to determine the cutoff probability necessary to + consider an example as a given class label (see `Northcutt et al., + 2021 <https://jair.org/index.php/jair/article/view/12125>`_, Section + 3.1, Equation 2). + + This is for advanced users only. If not specified, these are computed + for you automatically. If an example has a predicted probability + greater than this threshold, it is counted as having true_label = + k.
This is not used for pruning/filtering, only for estimating the + noise rates using confident counts. + + converge_latent_estimates : bool, optional + If ``True``, forces numerical consistency of estimates. Each is estimated + independently, but they are related mathematically with closed form + equivalences. This will iteratively make them mathematically consistent. + + seed : int, optional + Set the default state of the random number generator used to split + the cross-validated folds. If None, uses np.random current random state. + + clf_kwargs : dict, optional + Optional keyword arguments to pass into `clf`'s ``fit()`` method. + + validation_func : callable, optional + Specifies how to map the validation data split in cross-validation as input for ``clf.fit()``. + For details, see the documentation of :py:meth:`CleanLearning.fit<cleanlab.classification.CleanLearning.fit>` + + Returns + ------ + estimates : tuple + A tuple containing arrays (`noise_matrix`, `inv_noise_matrix`).""" + + return estimate_py_noise_matrices_and_cv_pred_proba( + X=X, + labels=labels, + clf=clf, + cv_n_folds=cv_n_folds, + thresholds=thresholds, + converge_latent_estimates=converge_latent_estimates, + seed=seed, + clf_kwargs=clf_kwargs, + validation_func=validation_func, + )[1:-2]
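`estimate_noise_matrices` and `estimate_cv_predicted_probabilities` both delegate to `estimate_py_noise_matrices_and_cv_pred_proba` and slice its five-element result. The snippet below uses placeholder strings in place of the arrays, purely to show which positions ``[1:-2]`` and ``[-1]`` select.

```python
# Placeholder names standing in for the five arrays, in the documented return order.
estimates = ("py", "noise_matrix", "inv_noise_matrix", "confident_joint", "pred_probs")

noise_matrices = estimates[1:-2]  # what estimate_noise_matrices returns
cv_pred_probs = estimates[-1]     # what estimate_cv_predicted_probabilities returns
print(noise_matrices)  # ('noise_matrix', 'inv_noise_matrix')
print(cv_pred_probs)   # pred_probs
```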
+ + +def _converge_estimates( + ps: np.ndarray, + py: np.ndarray, + noise_matrix: np.ndarray, + inverse_noise_matrix: np.ndarray, + *, + inv_noise_matrix_iterations: int = 5, + noise_matrix_iterations: int = 3, +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + """Updates py := P(true_label=k) and both `noise_matrix` and `inverse_noise_matrix` + to be numerically consistent with each other, by iteratively updating their estimates based on + the mathematical relationships between them. + + Forces numerical consistency of estimates. Each is estimated + independently, but they are related mathematically with closed form + equivalences. This will iteratively make them mathematically consistent. + + py := P(true_label=k) and the inverse noise matrix P(true_label=k_y|label=k_s) specify one + another, meaning one can be computed from the other and vice versa. + When numerical discrepancy exists due to poor estimation, they can be made + to agree by repeatedly computing one from the other, + for a certain number of iterations (3-10 works fine). + + Do not set iterations too high or performance will decrease as small + deviations will get perturbed over and over and potentially magnified. + + Note that we have to first converge the inverse_noise_matrix and py, + then we can update the noise_matrix, then repeat. This is because the + inverse noise matrix depends on py (which is unknown/latent), but the + noise matrix depends on ps (which is known), so there will be no change in + the noise matrix if we recompute it when py and inverse_noise_matrix change. + + Parameters + ---------- + ps : np.ndarray (shape (K, ) or (1, K)) + The fraction (prior probability) of each observed, NOISY class P(labels = k). + + py : np.ndarray (shape (K, ) or (1, K)) + The estimated fraction (prior probability) of each TRUE class P(true_label = k).
+ + noise_matrix : np.ndarray of shape (K, K), K = number of classes + A conditional probability matrix of the form P(label=k_s|true_label=k_y) containing + the fraction of examples in every class, labeled as every other class. + Assumes columns of noise_matrix sum to 1. + + inverse_noise_matrix : np.ndarray of shape (K, K), K = number of classes + A conditional probability matrix of the form P(true_label=k_y|label=k_s) representing + the estimated fraction of observed examples in each class k_s that are + mislabeled examples from every other class k_y. + Assumes columns of inverse_noise_matrix sum to 1. + + inv_noise_matrix_iterations : int, default = 5 + Number of times to converge inverse noise matrix with py and noise mat. + + noise_matrix_iterations : int, default = 3 + Number of times to converge noise matrix with py and inverse noise mat. + + Returns + ------ + estimates : tuple + Three arrays of the form (`py`, `noise_matrix`, `inverse_noise_matrix`) all + having numerical agreement in terms of their mathematical relations.""" + + for j in range(noise_matrix_iterations): + for i in range(inv_noise_matrix_iterations): + inverse_noise_matrix = compute_inv_noise_matrix(py=py, noise_matrix=noise_matrix, ps=ps) + py = compute_py(ps, noise_matrix, inverse_noise_matrix) + noise_matrix = compute_noise_matrix_from_inverse( + ps=ps, inverse_noise_matrix=inverse_noise_matrix, py=py + ) + + return py, noise_matrix, inverse_noise_matrix + + +
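The closed-form equivalences that `_converge_estimates` iterates on all come from the joint ``P(label=i, true_label=j) = noise_matrix[i, j] * py[j] = inverse_noise_matrix[j, i] * ps[i]``. The NumPy sketch below converts between the two matrices through that joint and checks that consistent inputs survive a round trip. `inverse_from_noise` and `noise_from_inverse` are hypothetical names. The clipping done by cleanlab's own `compute_inv_noise_matrix` and `compute_noise_matrix_from_inverse` is not reproduced.

```python
import numpy as np


def inverse_from_noise(noise_matrix, py, ps):
    """Hypothetical: P(true=j | label=i) = P(label=i | true=j) P(true=j) / P(label=i)."""
    joint = noise_matrix * py[None, :]  # joint[i, j] = P(label=i, true_label=j)
    return (joint / ps[:, None]).T      # indexed [j, i]; columns sum to 1


def noise_from_inverse(inverse_noise_matrix, ps, py):
    """Hypothetical: P(label=i | true=j) = P(true=j | label=i) P(label=i) / P(true=j)."""
    joint = inverse_noise_matrix.T * ps[:, None]  # joint[i, j] = P(label=i, true_label=j)
    return joint / py[None, :]


noise_matrix = np.array([[0.9, 0.2], [0.1, 0.8]])  # columns sum to 1
py = np.array([0.6, 0.4])
ps = noise_matrix @ py  # a consistent observed-label prior, about [0.62, 0.38]

inv = inverse_from_noise(noise_matrix, py, ps)
print(np.allclose(noise_from_inverse(inv, ps, py), noise_matrix))  # True
print(np.allclose(inv @ ps, py))  # True: the inverse matrix maps ps back to py
```

When the three estimates are produced independently, these identities only hold approximately. Repeatedly recomputing one from the others, as the loop above does, pushes them toward agreement.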
def get_confident_thresholds( + labels: LabelLike, + pred_probs: np.ndarray, + multi_label: bool = False, +) -> np.ndarray: + """Returns expected (average) "self-confidence" for each class. + + The confident class threshold for a class j is the expected (average) "self-confidence" for class j, + i.e. the model-predicted probability of this class averaged amongst all examples labeled as class j. + + Parameters + ---------- + labels : np.ndarray or list + Given class labels for each example in the dataset, some of which may be erroneous, + in the same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + pred_probs : np.ndarray + Model-predicted class probabilities for each example in the dataset, + in the same format expected by :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` function. + + multi_label : bool, default = False + Set ``False`` if your dataset is for regular (multi-class) classification, where each example belongs to exactly one class. + Set ``True`` if your dataset is for multi-label classification, where each example can belong to multiple classes. + See documentation of `~cleanlab.count.compute_confident_joint` for details. + + Returns + ------- + confident_thresholds : np.ndarray + An array of shape ``(K, )`` where K is the number of classes. + If ``multi_label=True``, the array instead has shape ``(K, 2)``, in a one-vs-rest format. + """ + if multi_label: + assert isinstance(labels, list) + return _get_confident_thresholds_multilabel(labels=labels, pred_probs=pred_probs) + else: + # When all_classes != unique_classes the class threshold for the missing classes is set to + # BIG_VALUE such that no valid prob >= BIG_VALUE (no example will be counted in missing classes) + # REQUIRES: pred_probs.max() <= 1 + # TODO: if you want this to work for arbitrary softmax outputs where pred_probs.max() + # may exceed 1, change BIG_VALUE = 2 --> BIG_VALUE = 2 * pred_probs.max(). 
Downside of + # this approach is that there will be no standard value returned for missing classes.
+ labels = labels_to_array(labels) + all_classes = range(pred_probs.shape[1]) + unique_classes = get_unique_classes(labels, multi_label=multi_label) + BIG_VALUE = 2 + confident_thresholds = [ + np.mean(pred_probs[:, k][labels == k]) if k in unique_classes else BIG_VALUE + for k in all_classes + ] + confident_thresholds = np.clip( + confident_thresholds, a_min=CONFIDENT_THRESHOLDS_LOWER_BOUND, a_max=None + ) + return confident_thresholds
+ + +def _get_confident_thresholds_multilabel( + labels: list, + pred_probs: np.ndarray, +): + """Returns expected (average) "self-confidence" for each class. + + The confident class threshold for a class j is the expected (average) "self-confidence" for class j in a one-vs-rest setting. + + Parameters + ---------- + labels : list + Refer to documentation for this argument in ``count.calibrate_confident_joint()`` with ``multi_label=True`` for details. + + pred_probs : np.ndarray + Predicted class probabilities in the same format expected by the :py:func:`~cleanlab.count.get_confident_thresholds` function. + + Returns + ------- + confident_thresholds : np.ndarray + An array of shape ``(K, 2)`` where `K` is the number of classes, in a one-vs-rest format. + """ + y_one, num_classes = get_onehot_num_classes(labels, pred_probs) + confident_thresholds: np.ndarray = np.ndarray((num_classes, 2)) + for class_num, (label_for_class, pred_prob_for_class) in enumerate(zip(y_one.T, pred_probs.T)): + pred_probs_binary = stack_complement(pred_prob_for_class) + confident_thresholds[class_num] = get_confident_thresholds( + pred_probs=pred_probs_binary, labels=label_for_class + ) + return confident_thresholds +
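The per-class thresholds described above can be sketched in plain NumPy. This is an illustrative re-implementation, not the library code: `lower_bound` is a hypothetical stand-in for `CONFIDENT_THRESHOLDS_LOWER_BOUND` (its value is not shown in this listing), and the multi-label helper assumes the binary columns are ordered `[1 - p_c, p_c]`, i.e. `[not class c, class c]`.

```python
import numpy as np

BIG_VALUE = 2.0  # exceeds any valid probability, mirroring the listing above


def confident_thresholds_sketch(labels, pred_probs, lower_bound=0.0):
    """Mean self-confidence per class; classes with no labeled examples get BIG_VALUE.

    `lower_bound` is a hypothetical stand-in for CONFIDENT_THRESHOLDS_LOWER_BOUND.
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    thresholds = np.array([
        # average predicted probability of class k over examples labeled k
        pred_probs[labels == k, k].mean() if np.any(labels == k) else BIG_VALUE
        for k in range(pred_probs.shape[1])
    ])
    return np.clip(thresholds, a_min=lower_bound, a_max=None)


def multilabel_thresholds_sketch(label_lists, pred_probs, lower_bound=0.0):
    """One-vs-rest thresholds of shape (K, 2): column 0 for "not class c", column 1 for "class c"."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    num_classes = pred_probs.shape[1]
    thresholds = np.empty((num_classes, 2))
    for c in range(num_classes):
        # binary labels: 1 if class c is among the example's labels, else 0
        in_class = np.array([int(c in labs) for labs in label_lists])
        # assumed binary column order: [P(not c), P(c)]
        binary_probs = np.column_stack([1.0 - pred_probs[:, c], pred_probs[:, c]])
        thresholds[c] = confident_thresholds_sketch(in_class, binary_probs, lower_bound)
    return thresholds
```

A class that never appears as a given label keeps the threshold `BIG_VALUE`, so no example can reach it; in the multi-label case this happens per binary column, e.g. for a class that no example carries.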
+
+
+
+ + +
+
+ + Made with Sphinx and @pradyunsg's + + Furo + +
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
+
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/data_valuation.html b/v2.6.6/_modules/cleanlab/data_valuation.html
new file mode 100644
index 000000000..4ad84c5c2
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/data_valuation.html
@@ -0,0 +1,822 @@
+ cleanlab.data_valuation - cleanlab
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.data_valuation

+# Copyright (C) 2017-2024  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""
+Methods for quantifying the value of each data point in a Machine Learning dataset.
+Data Valuation helps us assess the contribution of individual training data points to an ML model's predictive performance.
+"""
+
+
+from typing import Callable, Optional, Union
+
+import numpy as np
+from scipy.sparse import csr_matrix
+
+from cleanlab.internal.neighbor.knn_graph import create_knn_graph_and_index
+
+
+def _knn_shapley_score(neighbor_indices: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
+    """Compute the Data Shapley values of data points using neighbor indices in a K-Nearest Neighbors (KNN) graph.
+
+    This function leverages equations (18) and (19) from the paper available at https://arxiv.org/abs/1908.08619
+    for computational efficiency.
+
+    Parameters
+    ----------
+    neighbor_indices :
+        A 2D array where each row contains the indices of the k-nearest neighbors for each data point.
+    y :
+        A 1D array of target values corresponding to the data points.
+    k :
+        The number of nearest neighbors to consider for each data point.
+
+    Notes
+    -----
+    - The training set is used as its own test set for the KNN-Shapley value computation, meaning y_test is the same as y_train.
+    - `neighbor_indices` are assumed to be pre-sorted by distance, with the nearest neighbors appearing first, and with at least `k` neighbors.
+    - Unlike the referenced paper, this implementation does not account for an upper error bound epsilon.
+      Consequently, K* is treated as equal to K instead of K* = max(K, 1/epsilon).
+        - This simplification implies that the term min(K, j + 1) will always be j + 1, which is offset by the
+          corresponding denominator term in the inner loop.
+        - Dividing by K in the end achieves the same result as dividing by K* in the paper.
+    - The pre-allocated `scores` array incorporates equation (18) for j = k - 1, ensuring efficient computation.
+    """
+    N = y.shape[0]
+    scores = np.zeros((N, N))
+
+    for y_alpha, s_alpha, idx in zip(y, scores, neighbor_indices):
+        y_neighbors = y[idx]
+        ans_matches = (y_neighbors == y_alpha).flatten()
+        for j in range(k - 2, -1, -1):
+            s_alpha[idx[j]] = s_alpha[idx[j + 1]] + float(
+                int(ans_matches[j]) - int(ans_matches[j + 1])
+            )
+    return np.mean(scores / k, axis=0)
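For intuition about the recursion the docstring attributes to equations (18) and (19) of https://arxiv.org/abs/1908.08619, the sketch below works with a single test point and compares that exact recursion against brute-force Shapley values on a tiny example. It assumes the unweighted KNN utility (the number of label matches among the up-to-k nearest points of a subset, divided by k even when the subset is smaller) and points pre-sorted by distance. The helper names are illustrative, not cleanlab APIs, and `_knn_shapley_score` above applies the simplifications listed in its Notes rather than this exact form.

```python
from itertools import permutations
from math import factorial

import numpy as np


def knn_utility(subset, match, k):
    """(1/k) * label matches among the (up to) k nearest points in `subset`.
    Points are identified by distance rank (0 = nearest), so sorting orders them by distance."""
    nearest = sorted(subset)[:k]
    return sum(match[i] for i in nearest) / k


def shapley_brute_force(match, k):
    """Exact Shapley values: average marginal contribution over all n! orderings."""
    n = len(match)
    values = np.zeros(n)
    for order in permutations(range(n)):
        included = []
        for i in order:
            before = knn_utility(included, match, k)
            included.append(i)
            values[i] += knn_utility(included, match, k) - before
    return values / factorial(n)


def shapley_recursive(match, k):
    """Recursion from the farthest point to the nearest one (single test point)."""
    n = len(match)
    s = np.zeros(n)
    s[n - 1] = match[n - 1] / n  # farthest point
    for i in range(n - 2, -1, -1):  # 0-based index i has distance rank i + 1
        rank = i + 1
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / k * min(k, rank) / rank
    return s


# match[i] = 1.0 if the i-th nearest training point has the test point's label
match = np.array([1, 0, 1, 1, 0], dtype=float)
exact = shapley_recursive(match, k=2)
```

Brute force needs n! orderings, so it only serves as a correctness check; the recursion is linear in n once the points are sorted. The values also sum to the utility of the full set (here 0.5), as Shapley values must.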
+
+
+
def data_shapley_knn( + labels: np.ndarray, + *, + features: Optional[np.ndarray] = None, + knn_graph: Optional[csr_matrix] = None, + metric: Optional[Union[str, Callable]] = None, + k: int = 10, +) -> np.ndarray: + """ + Compute the Data Shapley values of data points using a K-Nearest Neighbors (KNN) graph. + + This function calculates the contribution (Data Shapley value) of each data point in a dataset + for model training, either directly from data features or using a precomputed KNN graph. + + The examples in the dataset with the lowest data valuation scores contribute least + to a trained ML model’s performance (when this score is used in a Datalab audit, examples whose value falls below a threshold are flagged as an issue). + The data valuation score is an approximate Data Shapley value, calculated based on the labels of the top k nearest neighbors of an example. Details on this KNN-Shapley value can be found in these papers: + https://arxiv.org/abs/1908.08619 and https://arxiv.org/abs/1911.07128. + + Parameters + ---------- + labels : + An array of labels for the data points (only for multi-class classification datasets). + features : + Feature embeddings (vector representations) of every example in the dataset. + + Necessary if `knn_graph` is not supplied. + + If provided, this must be a 2D array with shape (num_examples, num_features). + knn_graph : + A precomputed sparse KNN graph. If not provided, it will be computed from the `features` using the specified `metric`. + metric : Optional[str or Callable], default=None + The distance metric for KNN graph construction. + Supports metrics available in ``sklearn.neighbors.NearestNeighbors``. + Default metric is ``"cosine"`` for ``dim(features) > 3``, otherwise ``"euclidean"`` for lower-dimensional data. + Euclidean distance is computed with an efficient implementation from scikit-learn when the number of examples is greater than 100.
+ When the number of examples is 100 or fewer, a more numerically stable version of the Euclidean distance from scipy is used. + k : + The number of neighbors to consider for the KNN graph and Data Shapley value computation. + Must be less than the total number of data points. + The value may not exceed the number of neighbors of each data point stored in the KNN graph. + + Returns + ------- + scores : + An array of transformed Data Shapley values for each data point, calibrated to indicate their relative importance. + These scores are rescaled to fall between 0 and 1. + Values closer to 1 indicate data points that are highly influential and positively contribute to a trained ML model's performance. + Conversely, scores below 0.5 indicate data points estimated to negatively impact model performance. + + Raises + ------ + ValueError + If neither `knn_graph` nor `features` are provided, or if `k` is larger than the number of examples in `features`. + + Examples + -------- + >>> import numpy as np + >>> from cleanlab.data_valuation import data_shapley_knn + >>> labels = np.array([0, 1, 0, 1, 0]) + >>> features = np.array([[0, 1, 2, 3, 4]]).T + >>> data_shapley_knn(labels=labels, features=features, k=4) + array([0.55 , 0.525, 0.55 , 0.525, 0.55 ]) + """ + if knn_graph is None and features is None: + raise ValueError("Either knn_graph or features must be provided.") + + # Use provided knn_graph or compute it from features + if knn_graph is None: + knn_graph, _ = create_knn_graph_and_index(features, n_neighbors=k, metric=metric) + + num_examples = labels.shape[0] + distances = knn_graph.indices.reshape(num_examples, -1) + scores = _knn_shapley_score(neighbor_indices=distances, y=labels, k=k) + return 0.5 * (scores + 1)
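The end-to-end scoring can be approximated in NumPy to see how the pieces fit together: neighbor indices from a brute-force Euclidean search (a hypothetical stand-in for `knn_graph.indices`), the truncated recursion over each example's k neighbors, the division by k, and the final `0.5 * (scores + 1)` rescaling. The `seed_farthest` flag is an assumption of this sketch. With it on, the k-th neighbor starts at its match indicator, so every neighbor's score reduces to its own match indicator. With it off, every score starts at zero, as in the `np.zeros` allocation above, so each neighbor ends with `match[j] - match[k - 1]`.

```python
import numpy as np


def knn_neighbor_indices(features, k):
    """Brute-force Euclidean neighbors: row i lists the k nearest other points, nearest first.
    Hypothetical stand-in for a precomputed KNN graph's `indices` (requires k < n)."""
    features = np.asarray(features, dtype=float)
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # an example is never its own neighbor
    return np.argsort(dists, axis=1, kind="stable")[:, :k]


def truncated_knn_shapley(labels, neighbor_indices, k, seed_farthest=True):
    """Each example acts as the test point for its own k neighbors; per-neighbor scores
    are divided by k, averaged over all test points, then rescaled via 0.5 * (raw + 1)."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    scores = np.zeros((n, n))  # scores[alpha, j]: value of example j for test point alpha
    for alpha in range(n):
        idx = neighbor_indices[alpha]
        match = (labels[idx] == labels[alpha]).astype(float)
        if seed_farthest:
            scores[alpha, idx[k - 1]] = match[k - 1]  # k-th neighbor starts at its match indicator
        for j in range(k - 2, -1, -1):  # walk from the farthest neighbor toward the nearest
            scores[alpha, idx[j]] = scores[alpha, idx[j + 1]] + match[j] - match[j + 1]
    raw = np.mean(scores / k, axis=0)
    return 0.5 * (raw + 1)
```

On the docstring's toy input (`labels = [0, 1, 0, 1, 0]`, one-dimensional features `0` through `4`, `k=4`), this sketch gives `[0.55, 0.525, 0.55, 0.525, 0.55]` with `seed_farthest=True` and `[0.5, 0.45, 0.5, 0.45, 0.5]` with `seed_farthest=False`.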
+
+
+
+
+ + +
+
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
+
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/datalab/datalab.html b/v2.6.6/_modules/cleanlab/datalab/datalab.html
new file mode 100644
index 000000000..ecd9ec5e8
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/datalab/datalab.html
@@ -0,0 +1,1316 @@
+ cleanlab.datalab.datalab - cleanlab
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.datalab.datalab

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""
+Datalab offers a unified audit to detect all kinds of issues in data and labels.
+
+.. note::
+    .. include:: optional_dependencies.rst
+"""
+from __future__ import annotations
+
+import warnings
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
+
+import numpy as np
+import pandas as pd
+
+import cleanlab
+from cleanlab.datalab.internal.adapter.constants import DEFAULT_CLEANVISION_ISSUES
+from cleanlab.datalab.internal.adapter.imagelab import create_imagelab
+from cleanlab.datalab.internal.data import Data
+from cleanlab.datalab.internal.display import _Displayer
+from cleanlab.datalab.internal.helper_factory import (
+    _DataIssuesBuilder,
+    issue_finder_factory,
+    report_factory,
+)
+from cleanlab.datalab.internal.issue_manager_factory import (
+    list_default_issue_types as _list_default_issue_types,
+    list_possible_issue_types as _list_possible_issue_types,
+)
+from cleanlab.datalab.internal.serialize import _Serializer
+from cleanlab.datalab.internal.task import Task
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+    from datasets.arrow_dataset import Dataset
+    from scipy.sparse import csr_matrix
+
+    DatasetLike = Union[Dataset, pd.DataFrame, Dict[str, Any], List[Dict[str, Any]], str]
+
+
+__all__ = ["Datalab"]
+
+
+
class Datalab: + """ + A single object to automatically detect all kinds of issues in datasets. + This is how we recommend you interface with the cleanlab library if you want to audit the quality of your data and detect issues within it. + If you have other specific goals (or are doing a less standard ML task not supported by Datalab), then consider using the other methods across the library. + Datalab tracks intermediate state (e.g. data statistics) from certain cleanlab functions that can be re-used across other cleanlab functions for better efficiency. + + Parameters + ---------- + data : Union[Dataset, pd.DataFrame, dict, list, str] + Dataset-like object that can be converted to a Hugging Face Dataset object. + + It should contain the labels for all examples, identified by a + `label_name` column in the Dataset object. + + Supported formats: + - datasets.Dataset + - pandas.DataFrame + - dict (keys are strings, values are arrays/lists of length ``N``) + - list (list of dictionaries that each have the same keys) + - str + + - path to a local file: Text (.txt), CSV (.csv), JSON (.json) + - or a dataset identifier on the Hugging Face Hub + + task : str + The type of machine learning task that the dataset is used for. + + Supported tasks: + - "classification" (default): Multiclass classification + - "regression" : Regression + - "multilabel" : Multilabel classification + + label_name : str, optional + The name of the label column in the dataset. + + image_key : str, optional + Optional key that can be specified for image datasets to point to the field (column) containing the actual images themselves (as PIL objects). + If specified, additional image-specific issue types will be checked for in the dataset. + See the `CleanVision package <https://github.com/cleanlab/cleanvision?tab=readme-ov-file#clean-your-data-for-better-computer-vision>`_ for descriptions of these image-specific issue types.
+ Currently, this argument is only supported for data formatted as a Hugging Face ``datasets.Dataset`` object. + + + verbosity : int, optional + The higher the verbosity level, the more information + Datalab prints when auditing a dataset. + Valid values are 0 through 4. Default is 1. + + Examples + -------- + >>> import datasets + >>> from cleanlab import Datalab + >>> data = datasets.load_dataset("glue", "sst2", split="train") + >>> datalab = Datalab(data, label_name="label") + """ + + def __init__( + self, + data: "DatasetLike", + task: str = "classification", + label_name: Optional[str] = None, + image_key: Optional[str] = None, + verbosity: int = 1, + ) -> None: + # Assume continuous values of labels for regression task + # Map labels to integers for classification task + self.task = Task.from_str(task) + self._data = Data(data, self.task, label_name) + self.data = self._data._data + self._labels = self._data.labels + self._label_map = self._labels.label_map + self.label_name = self._labels.label_name + self._data_hash = self._data._data_hash + self.cleanlab_version = cleanlab.version.__version__ + self.verbosity = verbosity + self._imagelab = create_imagelab(dataset=self.data, image_key=image_key) + + # Create the builder for DataIssues + builder = _DataIssuesBuilder(self._data) + builder.set_imagelab(self._imagelab).set_task(self.task) + self.data_issues = builder.build() + + # todo: check displayer methods + def __repr__(self) -> str: + return _Displayer(data_issues=self.data_issues, task=self.task).__repr__() + + def __str__(self) -> str: + return _Displayer(data_issues=self.data_issues, task=self.task).__str__() + + @property + def labels(self) -> Union[np.ndarray, List[List[int]]]: + """Labels of the dataset, in a [0, 1, ..., K-1] format.""" + return self._labels.labels + + @property + def has_labels(self) -> bool: + """Whether the dataset has labels, and that they are in a [0, 1, ..., K-1] format.""" + return self._labels.is_available + + @property + 
def class_names(self) -> List[str]: + """Names of the classes in the dataset. + + If the dataset has no labels, returns an empty list. + """ + return self._labels.class_names + +
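The `find_issues` docstring asks for `pred_probs` columns ordered lexicographically by class name. If a model reports its classes in some other order, a small helper like the hypothetical `reorder_pred_probs` below (not part of cleanlab) can permute the columns first. It assumes `model_classes` gives the class name of each `pred_probs` column and that the names are strings, so Python's `sorted` yields the lexicographic order.

```python
import numpy as np


def reorder_pred_probs(pred_probs, model_classes):
    """Permute pred_probs columns from the model's class order into sorted class-name order."""
    model_classes = list(model_classes)
    sorted_classes = sorted(model_classes)  # lexicographic order for string names
    # position of each sorted class name in the model's original column order
    column_order = [model_classes.index(name) for name in sorted_classes]
    return sorted_classes, np.asarray(pred_probs)[:, column_order]


# Hypothetical model whose columns are ordered ["dog", "cat", "bird"]
classes, reordered = reorder_pred_probs([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]], ["dog", "cat", "bird"])
```

Here `reordered` has its columns in the order `bird`, `cat`, `dog`, matching `classes`.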
def find_issues( + self, + *, + pred_probs: Optional[np.ndarray] = None, + features: Optional[npt.NDArray] = None, + knn_graph: Optional[csr_matrix] = None, + issue_types: Optional[Dict[str, Any]] = None, + ) -> None: + """ + Checks the dataset for all sorts of common issues in real-world data (in both labels and feature values). + + You can use Datalab to find issues in your data, utilizing *any* model you have already trained. + This method only interacts with your model via its predictions or embeddings (and other functions thereof). + The more of these inputs you provide, the more types of issues Datalab can detect in your dataset/labels. + If you provide a subset of these inputs, Datalab will output what insights it can based on the limited information from your model. + + NOTE + ---- + The issues are saved in the ``self.issues`` attribute of the ``Datalab`` object, but are not returned. + + Parameters + ---------- + pred_probs : + Out-of-sample predicted class probabilities made by the model for every example in the dataset. + To best detect label issues, provide this input obtained from the most accurate model you can produce. + + For classification data, this must be a 2D array with shape ``(num_examples, K)`` where ``K`` is the number of classes in the dataset. + Make sure that the columns of your `pred_probs` are properly ordered with respect to the ordering of classes, which for Datalab is: lexicographically sorted by class name. + + For regression data, this must be a 1D array with shape ``(num_examples,)`` containing the predicted value for each example. + + For multilabel classification data, this must be a 2D array with shape ``(num_examples, K)`` where ``K`` is the number of classes in the dataset. + Make sure that the columns of your `pred_probs` are properly ordered with respect to the ordering of classes, which for Datalab is: lexicographically sorted by class name.
+ + + features : Optional[np.ndarray] + Feature embeddings (vector representations) of every example in the dataset. + + If provided, this must be a 2D array with shape (num_examples, num_features). + + knn_graph : + Sparse matrix of precomputed distances between examples in the dataset in a k nearest neighbor graph. + + If provided, this must be a square CSR matrix with shape ``(num_examples, num_examples)`` and ``(k*num_examples)`` non-zero entries (``k`` is the number of nearest neighbors considered for each example), + evenly distributed across the rows. + Each non-zero entry in this matrix is a distance between a pair of examples in the dataset. Self-distances must be omitted + (i.e. diagonal must be all zeros, k nearest neighbors for each example do not include the example itself). + + This CSR format uses three 1D arrays (`data`, `indices`, `indptr`) to store a 2D matrix ``M``: + + - `data`: 1D array containing all the non-zero elements of matrix ``M``, listed in a row-wise fashion (but sorted within each row). + - `indices`: 1D array storing the column indices in matrix ``M`` of these non-zero elements. Each entry in `indices` corresponds to an entry in `data`, indicating the column of ``M`` containing this entry. + - `indptr`: 1D array indicating the start and end indices in `data` for each row of matrix ``M``. The non-zero elements of the i-th row of ``M`` are stored from ``data[indptr[i]]`` to ``data[indptr[i+1]]``. + + Within each row of matrix ``M`` (defined by the ranges in `indptr`), the corresponding non-zero entries (distances) of `knn_graph` must be sorted in ascending order (specifically in the segments of the `data` array that correspond to each row of ``M``). The `indices` array must also reflect this ordering, maintaining the correct column positions for these sorted distances. 
+ + This type of matrix is returned by the method: `sklearn.neighbors.NearestNeighbors.kneighbors_graph <https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html#sklearn.neighbors.NearestNeighbors.kneighbors_graph>`_. + + Below is an example to illustrate: + + .. code-block:: python + + knn_graph.todense() + # matrix([[0. , 0.3, 0.2], + # [0.3, 0. , 0.4], + # [0.2, 0.4, 0. ]]) + + knn_graph.data + # array([0.2, 0.3, 0.3, 0.4, 0.2, 0.4]) + # Here, 0.2 and 0.3 are the sorted distances in the first row, 0.3 and 0.4 in the second row, and so on. + + knn_graph.indices + # array([2, 1, 0, 2, 0, 1]) + # Corresponding neighbor indices for the distances from the `data` array. + + knn_graph.indptr + # array([0, 2, 4, 6]) + # The non-zero entries in the first row are stored from `knn_graph.data[0]` to `knn_graph.data[2]`, the second row from `knn_graph.data[2]` to `knn_graph.data[4]`, and so on. + + For any duplicated examples i, j whose distance is 0, there should be an *explicit* zero stored in the matrix, i.e. ``knn_graph[i,j] = 0``. + + If both `knn_graph` and `features` are provided, the `knn_graph` will take precedence. + If `knn_graph` is not provided, it is constructed based on the provided `features`. + If neither `knn_graph` nor `features` are provided, certain issue types like (near) duplicates will not be considered. + + .. seealso:: + See the + `scipy.sparse.csr_matrix documentation <https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html>`_ + for more details on the CSR matrix format. + + issue_types : + Collection specifying which types of issues to consider in the audit and any non-default parameter settings to use. + If unspecified, a default set of issue types and recommended parameter settings is considered.
+ + This is a dictionary of dictionaries, where the keys are the issue types of interest + and the values are dictionaries of parameter values that control how each type of issue is detected (only for advanced users). + More specifically, the values are constructor keyword arguments passed to the corresponding ``IssueManager``, + which is responsible for detecting the particular issue type. + + .. seealso:: + :py:class:`IssueManager <cleanlab.datalab.internal.issue_manager.issue_manager.IssueManager>` + + Examples + -------- + + Here are some ways to provide inputs to :py:meth:`find_issues`: + + - Passing ``pred_probs``: + .. code-block:: python + + >>> from sklearn.linear_model import LogisticRegression + >>> import numpy as np + >>> from cleanlab import Datalab + >>> X = np.array([[0, 1], [1, 1], [2, 2], [2, 0]]) + >>> y = np.array([0, 1, 1, 0]) + >>> clf = LogisticRegression(random_state=0).fit(X, y) + >>> pred_probs = clf.predict_proba(X) + >>> lab = Datalab(data={"X": X, "y": y}, label_name="y") + >>> lab.find_issues(pred_probs=pred_probs) + + + - Passing ``features``: + .. code-block:: python + + >>> from sklearn.linear_model import LogisticRegression + >>> from sklearn.neighbors import NearestNeighbors + >>> import numpy as np + >>> from cleanlab import Datalab + >>> X = np.array([[0, 1], [1, 1], [2, 2], [2, 0]]) + >>> y = np.array([0, 1, 1, 0]) + >>> lab = Datalab(data={"X": X, "y": y}, label_name="y") + >>> lab.find_issues(features=X) + + .. note:: + + You can pass both ``pred_probs`` and ``features`` to :py:meth:`find_issues` for a more comprehensive audit. + + - Passing a ``knn_graph``: + .. 
code-block:: python + + >>> from sklearn.neighbors import NearestNeighbors + >>> import numpy as np + >>> from cleanlab import Datalab + >>> X = np.array([[0, 1], [1, 1], [2, 2], [2, 0]]) + >>> y = np.array([0, 1, 1, 0]) + >>> nbrs = NearestNeighbors(n_neighbors=2, metric="euclidean").fit(X) + >>> knn_graph = nbrs.kneighbors_graph(mode="distance") + >>> knn_graph # Pass this to Datalab + <4x4 sparse matrix of type '<class 'numpy.float64'>' + with 8 stored elements in Compressed Sparse Row format> + >>> knn_graph.toarray() # DO NOT PASS knn_graph.toarray() to Datalab, only pass the sparse matrix itself + array([[0. , 1. , 2.23606798, 0. ], + [1. , 0. , 1.41421356, 0. ], + [0. , 1.41421356, 0. , 2. ], + [0. , 1.41421356, 2. , 0. ]]) + >>> lab = Datalab(data={"X": X, "y": y}, label_name="y") + >>> lab.find_issues(knn_graph=knn_graph) + + - Configuring issue types: + Suppose you only want to consider label issues. Just pass a dictionary with the key "label" and an empty dictionary as the value (to use the default label issue parameters). + + .. code-block:: python + + >>> issue_types = {"label": {}} + >>> # lab.find_issues(pred_probs=pred_probs, issue_types=issue_types) + + If you are an advanced user who wants greater control, you can pass keyword arguments to the issue manager that handles the label issues. + For example, if you want to pass the keyword argument "clean_learning_kwargs" + to the constructor of the :py:class:`LabelIssueManager <cleanlab.datalab.internal.issue_manager.label.LabelIssueManager>`, you would pass: + + + .. code-block:: python + + >>> issue_types = { + ... "label": { + ... "clean_learning_kwargs": { + ... "prune_method": "prune_by_noise_rate", + ... }, + ... }, + ... } + >>> # lab.find_issues(pred_probs=pred_probs, issue_types=issue_types) + + """ + + if issue_types is not None and not issue_types: + warnings.warn( + "No issue types were specified so no issues will be found in the dataset.
Set `issue_types` as None to consider a default set of issues." + ) + return None + issue_finder = issue_finder_factory(self._imagelab)( + datalab=self, task=self.task, verbosity=self.verbosity + ) + issue_finder.find_issues( + pred_probs=pred_probs, + features=features, + knn_graph=knn_graph, + issue_types=issue_types, + ) + + if self.verbosity: + print( + f"\nAudit complete. {self.data_issues.issue_summary['num_issues'].sum()} issues found in the dataset." + )
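The CSR layout required for `knn_graph` can also be produced without scikit-learn. The sketch below is a hypothetical helper (not part of cleanlab) that performs a brute-force Euclidean search, drops self-matches, keeps the k nearest neighbors per row sorted by ascending distance, and assembles `data`, `indices` and `indptr` directly. It needs O(n²) memory, so it only suits small datasets, and it assumes `k < n`. Because the three arrays are passed to `csr_matrix` as-is, a zero distance between duplicates stays stored as an explicit entry.

```python
import numpy as np
from scipy.sparse import csr_matrix


def knn_graph_from_features(X, k):
    """Build a CSR k-nearest-neighbor distance graph: exactly k stored entries per row,
    no self-distances, distances ascending within each row (requires k < n)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # never pick an example as its own neighbor
    nbr_idx = np.argsort(dists, axis=1, kind="stable")[:, :k]  # nearest first, ties -> lower index
    nbr_dist = np.take_along_axis(dists, nbr_idx, axis=1)
    indptr = np.arange(0, n * k + 1, k)  # row i owns data[i*k : (i+1)*k]
    return csr_matrix((nbr_dist.ravel(), nbr_idx.ravel(), indptr), shape=(n, n))


X = np.array([[0, 1], [1, 1], [2, 2], [2, 0]])
knn_graph = knn_graph_from_features(X, k=2)
```

On this four-point `X` with `k=2`, `knn_graph.toarray()` matches the dense matrix shown in the `kneighbors_graph` example above; ties (examples 2 and 3 are equally far from example 0) are broken toward the lower index by the stable sort.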
+ +
def report( + self, + *, + num_examples: int = 5, + verbosity: Optional[int] = None, + include_description: bool = True, + show_summary_score: bool = False, + show_all_issues: bool = False, + ) -> None: + """Prints an informative summary of all issues. + + Parameters + ---------- + num_examples : + Number of examples to show for each type of issue. + The report shows the top `num_examples` instances in the dataset that suffer the most from each type of issue. + + verbosity : + Higher verbosity levels add more information to the report. + + include_description : + Whether or not to include a description of each issue type in the report. + Consider setting this to ``False`` once you're familiar with how each issue type is defined. + + show_summary_score : + Whether or not to include the overall severity score of each issue type in the report. + These scores are not comparable across different issue types; + see the ``issue_summary`` documentation to learn more. + + show_all_issues : + Whether or not the report should show all issue types that were checked for, or only the types of issues detected in the dataset. + With this set to ``True``, the report may include issue types that were checked for but not detected in the dataset. + + See Also + -------- + For advanced usage, see documentation for the + :py:class:`Reporter <cleanlab.datalab.internal.report.Reporter>` class. + """ + if verbosity is None: + verbosity = self.verbosity + if self.data_issues.issue_summary.empty: + print("Please specify some `issue_types` in datalab.find_issues() to see a report.\n") + return + + reporter = report_factory(self._imagelab)( + data_issues=self.data_issues, + task=self.task, + verbosity=verbosity, + include_description=include_description, + show_summary_score=show_summary_score, + show_all_issues=show_all_issues, + imagelab=self._imagelab, + ) + reporter.report(num_examples=num_examples)
+ + @property + def issues(self) -> pd.DataFrame: + """Issues found in each example from the dataset.""" + return self.data_issues.issues + + @issues.setter + def issues(self, issues: pd.DataFrame) -> None: + self.data_issues.issues = issues + + @property + def issue_summary(self) -> pd.DataFrame: + """Summary of issues found in the dataset and the overall severity of each type of issue. + + Each type of issue has a summary score, which is usually defined as an average of + per-example issue-severity scores (over all examples in the dataset). + So these summary scores are not directly tied to the number of examples estimated to exhibit + a particular type of issue. Issue severity (i.e., the quality of each example) is measured differently for each issue type, + and these per-example scores are only comparable across different examples for the same issue type, but are not comparable across different issue types. + For instance, label quality might be scored via the estimated likelihood of the given label, + whereas outlier quality might be scored via distance to K-nearest-neighbors in feature space (fundamentally incomparable quantities). + For some issue types, the summary score is not an average of per-example scores, but rather a global statistic of the dataset + (e.g., for the `non_iid` issue type, the p-value of a hypothesis test that the data are IID). + + In summary, you can compare these summary scores across datasets for the same issue type, but never compare them across different issue types.
+ + Examples + -------- + + If checks for "label" and "outlier" issues were run, + then the issue summary will look something like this: + + >>> datalab.issue_summary + issue_type score + outlier 0.123 + label 0.456 + """ + return self.data_issues.issue_summary + + @issue_summary.setter + def issue_summary(self, issue_summary: pd.DataFrame) -> None: + self.data_issues.issue_summary = issue_summary + + @property + def info(self) -> Dict[str, Dict[str, Any]]: + """Information and statistics about the dataset issues found. + + Examples + -------- + + If checks for "label" and "outlier" issues were run, + then the info will look something like this: + + >>> datalab.info + { + "label": { + "given_labels": [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, ...], + "predicted_label": [0, 0, 0, 1, 0, 1, 0, 1, 0, 1, ...], + ..., + }, + "outlier": { + "nearest_neighbor": [3, 7, 1, 2, 8, 4, 5, 9, 6, 0, ...], + "distance_to_nearest_neighbor": [0.123, 0.789, 0.456, ...], + ..., + }, + } + """ + return self.data_issues.info + + @info.setter + def info(self, info: Dict[str, Dict[str, Any]]) -> None: + self.data_issues.info = info + +
def get_issues(self, issue_name: Optional[str] = None) -> pd.DataFrame: + """ + Use this after finding issues to see which examples suffer from which types of issues. + + Parameters + ---------- + issue_name : str or None + The type of issue to focus on. If `None`, returns the full DataFrame summarizing all of the types of issues detected in each example from the dataset. + + Raises + ------ + ValueError + If `issue_name` is not a type of issue previously considered in the audit. + + Returns + ------- + specific_issues : + A DataFrame where each row corresponds to an example from the dataset and columns specify: + whether this example exhibits a particular type of issue, and how severely (via a numeric quality score where lower values indicate more severe instances of the issue). + The quality scores lie between 0 and 1 and are directly comparable between examples (for the same issue type), but not across different issue types. + + Additional columns may be present in the DataFrame depending on the type of issue specified. + """ + + # Validate issue_name + if issue_name is not None and issue_name not in self.list_possible_issue_types(): + raise ValueError( + f"""Invalid issue_name: {issue_name}. Please specify a valid issue_name from the list of possible issue types. + Either specify one of the following: {self.list_possible_issue_types()} + or set issue_name as None to get all issue types. + """ + ) + return self.data_issues.get_issues(issue_name=issue_name)
+ +
def get_issue_summary(self, issue_name: Optional[str] = None) -> pd.DataFrame: + """Summarize the issues of a particular type found in the dataset, + including how severe this type of issue is overall across the dataset. + + See the documentation of the ``issue_summary`` attribute to learn more. + + Parameters + ---------- + issue_name : + Name of the issue type to summarize. If `None`, summarizes each of the different issue types previously considered in the audit. + + Returns + ------- + issue_summary : + DataFrame where each row corresponds to a type of issue, and columns quantify: + the number of examples in the dataset estimated to exhibit this type of issue, + and the overall severity of the issue across the dataset (via a numeric quality score where lower values indicate that the issue is overall more severe). + The quality scores lie between 0 and 1 and are directly comparable between multiple datasets (for the same issue type), but not across different issue types. + """ + return self.data_issues.get_issue_summary(issue_name=issue_name)
+ +
+    def get_info(self, issue_name: Optional[str] = None) -> Dict[str, Any]:
+        """Get the info dictionary for the issue type ``issue_name``.
+
+        Raises a ``ValueError`` if no info has been computed for that issue type yet.
+
+        Parameters
+        ----------
+        issue_name :
+            The issue name for which the info is required. If `None`, the info for all
+            issue types (together with the dataset statistics) is returned.
+
+        Returns
+        -------
+        :py:meth:`info <cleanlab.datalab.internal.data_issues.DataIssues.get_info>` :
+            The info for the issue_name.
+        """
+        return self.data_issues.get_info(issue_name)
+ +
+    def list_possible_issue_types(self) -> List[str]:
+        """Returns a list of all registered issue types.
+
+        Any issue type that is not in this list cannot be used in the :py:meth:`find_issues` method.
+
+        See Also
+        --------
+        :py:class:`REGISTRY <cleanlab.datalab.internal.issue_manager_factory.REGISTRY>` : All available issue types and their corresponding issue managers can be found here.
+        """
+        possible_issue_types = _list_possible_issue_types(task=self.task)
+        if self._imagelab is not None:
+            possible_issue_types.extend(DEFAULT_CLEANVISION_ISSUES.keys())
+        return possible_issue_types
+ +
+    def list_default_issue_types(self) -> List[str]:
+        """Returns a list of the issue types that are run by default
+        when :py:meth:`find_issues` is called without specifying `issue_types`.
+
+        See Also
+        --------
+        :py:class:`REGISTRY <cleanlab.datalab.internal.issue_manager_factory.REGISTRY>` : All available issue types and their corresponding issue managers can be found here.
+        """
+        default_issue_types = _list_default_issue_types(task=self.task)
+        if self._imagelab is not None:
+            default_issue_types.extend(DEFAULT_CLEANVISION_ISSUES.keys())
+        return default_issue_types
+ +
+    def save(self, path: str, force: bool = False) -> None:
+        """Saves this Datalab object to file (all files are in folder at `path/`).
+        We do not guarantee that a saved Datalab can be loaded by future versions of cleanlab.
+
+        Parameters
+        ----------
+        path :
+            Folder in which all information about this Datalab should be saved.
+
+        force :
+            If ``True``, overwrites any existing files in the folder at `path`. Use this with caution!
+
+        NOTE
+        ----
+        The dataset is not saved as part of the Datalab; save it separately if you need it on disk.
+        """
+        _Serializer.serialize(path=path, datalab=self, force=force)
+        save_message = f"Saved Datalab to folder: {path}"
+        print(save_message)
+ +
+    @staticmethod
+    def load(path: str, data: Optional[Dataset] = None) -> "Datalab":
+        """Loads Datalab object from a previously saved folder.
+
+        Parameters
+        ----------
+        path :
+            Path to the folder previously specified in ``Datalab.save()``.
+
+        data :
+            The dataset used to originally construct the Datalab.
+            The dataset is not saved as part of the Datalab,
+            so you must save and load it separately.
+
+        Returns
+        -------
+        datalab :
+            A Datalab object that is identical to the one originally saved.
+        """
+        datalab = _Serializer.deserialize(path=path, data=data)
+        load_message = f"Datalab loaded from folder: {path}"
+        print(load_message)
+        return datalab
+ + Made with Sphinx and @pradyunsg's + + Furo + +
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/data.html b/v2.6.6/_modules/cleanlab/datalab/internal/data.html
new file mode 100644
index 000000000..c65cb1d5e
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/datalab/internal/data.html
@@ -0,0 +1,1069 @@

Source code for cleanlab.datalab.internal.data

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""Classes and methods for datasets that are loaded into Datalab."""
+
+import os
+from typing import Any, Callable, Dict, List, Mapping, Optional, Union, cast, TYPE_CHECKING, Tuple
+
+from cleanlab.datalab.internal.task import Task
+
+try:
+    import datasets
+except ImportError as error:
+    raise ImportError(
+        "Cannot import datasets package. "
+        "Please install it and try again, or just install cleanlab with "
+        "all optional dependencies via: `pip install 'cleanlab[all]'`"
+    ) from error
+from abc import ABC, abstractmethod
+import numpy as np
+import pandas as pd
+from datasets.arrow_dataset import Dataset
+from datasets import ClassLabel
+
+from cleanlab.internal.validation import labels_to_array, labels_to_list_multilabel
+
+
+if TYPE_CHECKING:  # pragma: no cover
+    DatasetLike = Union[Dataset, pd.DataFrame, Dict[str, Any], List[Dict[str, Any]], str]
+
+
+
+class DataFormatError(ValueError):
+    """Exception raised when the data is not in a supported format."""
+
+    def __init__(self, data: Any):
+        self.data = data
+        message = (
+            f"Unsupported data type: {type(data)}\n"
+            "Supported types: "
+            "datasets.Dataset, pandas.DataFrame, dict, list, str"
+        )
+        super().__init__(message)
+ + +
+class DatasetDictError(ValueError):
+    """Exception raised when a DatasetDict is passed to Datalab.
+
+    Usually, this means that a dataset identifier was passed to Datalab, but
+    the dataset is a DatasetDict, which contains multiple splits of the dataset.
+
+    """
+
+    def __init__(self):
+        message = (
+            "Please pass a single dataset, not a DatasetDict. "
+            "Try specifying a split, e.g. `dataset = load_dataset('dataset', split='train')` "
+            "then pass `dataset` to Datalab."
+        )
+        super().__init__(message)
+ + +
+class DatasetLoadError(ValueError):
+    """Exception raised when a dataset cannot be loaded.
+
+    Parameters
+    ----------
+    dataset_type : type
+        The type of dataset that failed to load.
+    """
+
+    def __init__(self, dataset_type: type):
+        message = f"Failed to load dataset from {dataset_type}.\n"
+        super().__init__(message)
+ + +
+class Data:
+    """
+    Class that holds and validates datasets for Datalab.
+
+    Internally, the data is stored as a datasets.Dataset object. For single-label
+    classification tasks, the labels are integers (ranging from 0 to K-1, where K is
+    the number of classes) stored in a numpy array; see ``task`` below for other tasks.
+
+    Parameters
+    ----------
+    data :
+        Dataset to be audited by Datalab.
+        Several formats are supported, which will internally be converted to a Dataset object.
+
+        Supported formats:
+            - datasets.Dataset
+            - pandas.DataFrame
+            - dict
+                - keys are strings
+                - values are arrays or lists of equal length
+            - list
+                - list of dictionaries with the same keys
+            - str
+                - path to a local file
+                    - Text (.txt)
+                    - CSV (.csv)
+                    - JSON (.json)
+                - or a dataset identifier on the Hugging Face Hub
+
+            If the string is a path to a file that exists locally, that file is loaded;
+            otherwise, the string is treated as a dataset identifier on the Hugging Face Hub.
+
+    label_name : str, optional
+        Name of the label column in the dataset.
+
+    task :
+        The task associated with the dataset. This is used to determine how to
+        format the labels.
+
+        Note:
+
+            - If the task is a classification task, the labels will be mapped to
+              integers, e.g. [0, 1, ..., K-1] where K is the number of classes.
+              If the task is a regression task, the labels will not be mapped to integers.
+
+            - If the task is a multilabel task, the labels will be formatted as a
+              list of lists, e.g. [[0, 1], [1, 2], [0, 2]] where each sublist contains
+              the labels for a single example. If the task is not a multilabel task,
+              the labels will be formatted as a 1D numpy array.
+
+    Warnings
+    --------
+    Optional dependencies:
+
+    - datasets :
+        Dataset, DatasetDict and load_dataset are imported from datasets.
+        This is an optional dependency of cleanlab, but is required for
+        :py:class:`Datalab <cleanlab.datalab.datalab.Datalab>` to work.
+    """
+
+    def __init__(
+        self,
+        data: "DatasetLike",
+        task: Task,
+        label_name: Optional[str] = None,
+    ) -> None:
+        self._validate_data(data)
+        self._data = self._load_data(data)
+        self._data_hash = hash(self._data)
+        self.labels: Label
+        label_class = MultiLabel if task.is_multilabel else MultiClass
+        map_to_int = task.is_classification
+        self.labels = label_class(data=self._data, label_name=label_name, map_to_int=map_to_int)
+
+    def _load_data(self, data: "DatasetLike") -> Dataset:
+        """Check the type of ``data`` and convert it to a Dataset with the matching loader method."""
+        dataset_factory_map: Dict[type, Callable[..., Dataset]] = {
+            Dataset: lambda x: x,
+            pd.DataFrame: Dataset.from_pandas,
+            dict: self._load_dataset_from_dict,
+            list: self._load_dataset_from_list,
+            str: self._load_dataset_from_string,
+        }
+        if not isinstance(data, tuple(dataset_factory_map.keys())):
+            raise DataFormatError(data)
+        return dataset_factory_map[type(data)](data)
+
+    def __len__(self) -> int:
+        return len(self._data)
+
+    def __eq__(self, other) -> bool:
+        if isinstance(other, Data):
+            # Equality checks
+            hashes_are_equal = self._data_hash == other._data_hash
+            labels_are_equal = self.labels == other.labels
+            return all([hashes_are_equal, labels_are_equal])
+        return False
+
+    def __hash__(self) -> int:
+        return self._data_hash
+
+    @property
+    def class_names(self) -> List[str]:
+        return self.labels.class_names
+
+    @property
+    def has_labels(self) -> bool:
+        """Check if labels are available."""
+        return self.labels.is_available
+
+    @staticmethod
+    def _validate_data(data) -> None:
+        if isinstance(data, datasets.DatasetDict):
+            raise DatasetDictError()
+        if not isinstance(data, (Dataset, pd.DataFrame, dict, list, str)):
+            raise DataFormatError(data)
+
+    @staticmethod
+    def _load_dataset_from_dict(data_dict: Dict[str, Any]) -> Dataset:
+        try:
+            return Dataset.from_dict(data_dict)
+        except Exception as error:
+            raise DatasetLoadError(dict) from error
+
+    @staticmethod
+    def _load_dataset_from_list(data_list: List[Dict[str, Any]]) -> Dataset:
+        try:
+            return Dataset.from_list(data_list)
+        except Exception as error:
+            raise DatasetLoadError(list) from error
+
+    @staticmethod
+    def _load_dataset_from_string(data_string: str) -> Dataset:
+        if not os.path.exists(data_string):
+            try:
+                dataset = datasets.load_dataset(data_string)
+                return cast(Dataset, dataset)
+            except Exception as error:
+                raise DatasetLoadError(str) from error
+
+        factory: Dict[str, Callable[[str], Any]] = {
+            ".txt": Dataset.from_text,
+            ".csv": Dataset.from_csv,
+            ".json": Dataset.from_json,
+        }
+
+        extension = os.path.splitext(data_string)[1]
+        if extension not in factory:
+            raise DatasetLoadError(type(data_string))
+
+        dataset = factory[extension](data_string)
+        dataset_cast = cast(Dataset, dataset)
+        return dataset_cast
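The string branch of `Data._load_data` first checks whether the string names an existing local file, and only then dispatches on its extension. The sketch below mirrors that control flow using only the standard library. The loaders `load_text`, `load_csv` and `load_json` are hypothetical stand-ins for `Dataset.from_text`, `Dataset.from_csv` and `Dataset.from_json`: each one returns a tag instead of a Dataset. The Hugging Face Hub branch is likewise replaced by a tag, and `ValueError` stands in for `DatasetLoadError`.

```python
import os
import tempfile
from typing import Callable, Dict


# Hypothetical stand-ins for Dataset.from_text / from_csv / from_json:
# each returns a tag naming the loader that would have been used.
def load_text(path: str) -> str:
    return f"text:{path}"


def load_csv(path: str) -> str:
    return f"csv:{path}"


def load_json(path: str) -> str:
    return f"json:{path}"


LOADERS: Dict[str, Callable[[str], str]] = {
    ".txt": load_text,
    ".csv": load_csv,
    ".json": load_json,
}


def load_from_string(data_string: str) -> str:
    """Pick a loader the way _load_dataset_from_string does (sketch only)."""
    if not os.path.exists(data_string):
        # Not a local file: the real code calls datasets.load_dataset here.
        return f"hub:{data_string}"
    extension = os.path.splitext(data_string)[1]
    if extension not in LOADERS:
        raise ValueError(f"Unsupported file extension: {extension!r}")
    return LOADERS[extension](data_string)


# Usage: an existing .csv file is routed to the CSV loader.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    with open(csv_path, "w"):
        pass  # create an empty file
    result = load_from_string(csv_path)
print(result)
print(load_from_string("no/such/file.csv"))  # hub:no/such/file.csv
```

Because the existence check comes first, a missing local file with a `.csv` extension is treated as a Hub identifier rather than reported as a missing file.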
+ + +
+class Label(ABC):
+    """
+    Class to represent labels in a dataset.
+
+    It stores the labels as a numpy array and maps them to integers if necessary.
+    If a mapping is not necessary, e.g. for regression tasks, the mapping will be an empty dictionary.
+
+    Parameters
+    ----------
+    data :
+        A Hugging Face Dataset object.
+
+    label_name : str, optional
+        Name of the label column in the dataset. If `None`, no labels are extracted.
+
+    map_to_int : bool
+        Whether to map the labels to integers, e.g. [0, 1, ..., K-1] where K is the number of classes.
+        If False, the labels are not mapped to integers, e.g. for regression tasks.
+    """
+
+    def __init__(
+        self, *, data: Dataset, label_name: Optional[str] = None, map_to_int: bool = True
+    ) -> None:
+        self._data = data
+        self.label_name = label_name
+        self.labels = labels_to_array([])
+        self.label_map: Mapping[Union[str, int], Any] = {}
+        if label_name is not None:
+            self.labels, self.label_map = self._extract_labels(data, label_name, map_to_int)
+            self._validate_labels()
+
+    def __len__(self) -> int:
+        if self.labels is None:
+            return 0
+        return len(self.labels)
+
+    def __eq__(self, __value: object) -> bool:
+        if isinstance(__value, Label):
+            labels_are_equal = np.array_equal(self.labels, __value.labels)
+            names_are_equal = self.label_name == __value.label_name
+            maps_are_equal = self.label_map == __value.label_map
+            return all([labels_are_equal, names_are_equal, maps_are_equal])
+        return False
+
+    def __getitem__(self, __index: Union[int, slice, np.ndarray]) -> np.ndarray:
+        return self.labels[__index]
+
+    def __bool__(self) -> bool:
+        return self.is_available
+
+    @property
+    def class_names(self) -> List[str]:
+        """A list of class names that are present in the dataset.
+
+        Without labels, this will return an empty list.
+        """
+        return list(self.label_map.values())
+
+    @property
+    def is_available(self) -> bool:
+        """Check if labels are available."""
+        empty_labels = self.labels is None or len(self.labels) == 0
+        empty_label_map = self.label_map is None or len(self.label_map) == 0
+        return not (empty_labels or empty_label_map)
+
+    def _validate_labels(self) -> None:
+        if self.label_name not in self._data.column_names:
+            raise ValueError(f"Label column '{self.label_name}' not found in dataset.")
+        labels = self._data[self.label_name]
+        assert isinstance(labels, (np.ndarray, list))
+        assert len(labels) == len(self._data)
+
+    @abstractmethod
+    def _extract_labels(self, *args, **kwargs) -> Any:
+        """Extract labels from the dataset and format them."""
+        raise NotImplementedError
+ + +
+class MultiLabel(Label):
+    def __init__(self, data, label_name, map_to_int):
+        super().__init__(data=data, label_name=label_name, map_to_int=map_to_int)
+
+    def _extract_labels(
+        self, data: Dataset, label_name: str, map_to_int: bool
+    ) -> Tuple[List[List[int]], Dict[int, Any]]:
+        labels: List[List[int]] = labels_to_list_multilabel(data[label_name])
+        # label_map must be sorted lexicographically; np.unique returns sorted values.
+        unique_labels = np.unique([x for ele in labels for x in ele])
+        label_map = {label: i for i, label in enumerate(unique_labels)}
+        formatted_labels = [[label_map[item] for item in label] for label in labels]
+        inverse_map = {i: label for label, i in label_map.items()}
+        return formatted_labels, inverse_map
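`MultiLabel._extract_labels` gathers every label that appears in any example, sorts the unique values, and assigns each one an index. It also returns the inverse mapping so that indices can be turned back into labels. The sketch below reproduces that encoding in plain Python. `encode_multilabel` is a hypothetical name, and `sorted(set(...))` stands in for `np.unique`; both produce sorted unique values. The sketch skips the `labels_to_list_multilabel` preprocessing that the real method applies first.

```python
from typing import Any, Dict, List, Tuple


def encode_multilabel(labels: List[List[Any]]) -> Tuple[List[List[int]], Dict[int, Any]]:
    """Hypothetical stdlib version of the multilabel encoding above."""
    # Collect every label seen in any example, in sorted order (like np.unique).
    unique_labels = sorted({label for example in labels for label in example})
    label_map = {label: i for i, label in enumerate(unique_labels)}
    # Replace each label by its index, keeping one sublist per example.
    formatted = [[label_map[label] for label in example] for example in labels]
    inverse_map = {i: label for label, i in label_map.items()}
    return formatted, inverse_map


formatted, inverse_map = encode_multilabel([["cat", "dog"], ["dog"], ["bird"]])
print(formatted)    # [[1, 2], [2], [0]]
print(inverse_map)  # {0: 'bird', 1: 'cat', 2: 'dog'}
```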
+ + +
+class MultiClass(Label):
+    def __init__(self, data, label_name, map_to_int):
+        super().__init__(data=data, label_name=label_name, map_to_int=map_to_int)
+
+    def _extract_labels(self, data: Dataset, label_name: str, map_to_int: bool):
+        """
+        Picks out labels from the dataset and formats them to be [0, 1, ..., K-1]
+        where K is the number of classes. Also returns a mapping from the formatted
+        labels to the original labels in the dataset.
+
+        Note: This function is not meant to be used directly. It is used (via the
+        ``Label`` constructor) by ``cleanlab.datalab.internal.data.Data`` to extract the
+        formatted labels from the dataset and store them as attributes.
+
+        Parameters
+        ----------
+        data : datasets.Dataset
+            A Hugging Face Dataset object.
+
+        label_name : str
+            Name of the column in the dataset that contains the labels.
+
+        map_to_int : bool
+            Whether to map the labels to integers, e.g. [0, 1, ..., K-1] where K is the number of classes.
+            If False, the labels are not mapped to integers, e.g. for regression tasks.
+
+        Returns
+        -------
+        formatted_labels : np.ndarray
+            Labels in the format [0, 1, ..., K-1] where K is the number of classes
+            (or the original labels, unchanged, if ``map_to_int`` is False).
+
+        inverse_map : dict
+            Mapping from the formatted labels to the original labels in the dataset
+            (empty if ``map_to_int`` is False).
+        """
+
+        labels = labels_to_array(data[label_name])  # type: ignore[assignment]
+        if labels.ndim != 1:
+            raise ValueError("labels must be 1D numpy array.")
+
+        if not map_to_int:
+            # Don't map labels to integers, e.g. for regression tasks
+            return labels, {}
+        label_name_feature = data.features[label_name]
+        if isinstance(label_name_feature, ClassLabel):
+            label_map = {
+                label: label_name_feature.str2int(label) for label in label_name_feature.names
+            }
+            formatted_labels = labels
+        else:
+            label_map = {label: i for i, label in enumerate(np.unique(labels))}
+            formatted_labels = np.vectorize(label_map.get, otypes=[int])(labels)
+        inverse_map = {i: label for label, i in label_map.items()}
+
+        return formatted_labels, inverse_map
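For columns that are not a `ClassLabel` feature, `MultiClass._extract_labels` indexes the sorted unique labels as 0..K-1. When `map_to_int` is False, it returns the labels untouched with an empty map. The sketch below shows both paths in plain Python. `encode_multiclass` is a hypothetical name, and lists stand in for numpy arrays. The `ClassLabel` branch, which reuses the dataset's own integer encoding, is not modelled.

```python
from typing import Any, Dict, List, Tuple


def encode_multiclass(
    labels: List[Any], map_to_int: bool = True
) -> Tuple[List[Any], Dict[int, Any]]:
    """Hypothetical stdlib version of the non-ClassLabel branch above."""
    if not map_to_int:
        # e.g. regression targets: returned unchanged, with an empty mapping
        return list(labels), {}
    # Sorted unique values get indices 0..K-1, like enumerate(np.unique(labels)).
    label_map = {label: i for i, label in enumerate(sorted(set(labels)))}
    formatted = [label_map[label] for label in labels]
    inverse_map = {i: label for label, i in label_map.items()}
    return formatted, inverse_map


formatted, inverse_map = encode_multiclass(["spam", "ham", "spam", "eggs"])
print(formatted)                            # [2, 1, 2, 0]
print(inverse_map)                          # {0: 'eggs', 1: 'ham', 2: 'spam'}
print([inverse_map[i] for i in formatted])  # ['spam', 'ham', 'spam', 'eggs']
```

The inverse map is what later allows integer labels in issue reports to be shown as the original class names.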
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/data_issues.html b/v2.6.6/_modules/cleanlab/datalab/internal/data_issues.html
new file mode 100644
index 000000000..85c91ce1f
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/datalab/internal/data_issues.html
@@ -0,0 +1,1106 @@

Source code for cleanlab.datalab.internal.data_issues

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""
+Module for the :py:class:`DataIssues` class, which serves as a central repository for storing
+information and statistics about issues found in a dataset.
+
+It collects information from various
+:py:class:`IssueManager <cleanlab.datalab.internal.issue_manager.issue_manager.IssueManager>`
+instances and keeps track of each issue, a summary for each type of issue,
+related information and statistics about the issues.
+
+The collected information can be accessed using the
+:py:meth:`~cleanlab.datalab.internal.data_issues.DataIssues.get_info` method.
+We recommend accessing it through that method; this module is intended for internal use only.
+"""
+from __future__ import annotations
+
+import warnings
+from abc import ABC, abstractmethod
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Type, Union
+import numpy as np
+
+import pandas as pd
+
+if TYPE_CHECKING:  # pragma: no cover
+    from cleanlab.datalab.internal.data import Data
+    from cleanlab.datalab.internal.issue_manager import IssueManager
+    from cleanvision import Imagelab
+
+
+class _InfoStrategy(ABC):
+    """
+    Abstract base class for strategies that fetch information about data issues.
+
+    Subclasses must implement the `get_info` method, which takes a `Data` object, a dictionary of
+    information about data issues, and an optional issue name, and returns a dictionary of
+    information about the specified issue, augmented with information about the dataset as a whole.
+
+    This class also provides a helper method, `_get_info_helper`, which takes an information
+    dictionary and an optional issue name, and returns a copy of the information dictionary for
+    the specified issue. If the issue name is `None`, this method returns `None`.
+    """
+
+    @staticmethod
+    @abstractmethod
+    def get_info(
+        data: Data,
+        info: Dict[str, Dict[str, Any]],
+        issue_name: Optional[str] = None,
+    ) -> Dict[str, Any]:
+        """
+        Get information about a data issue from an information dictionary.
+
+        Parameters
+        ----------
+        data : Data
+            The dataset in which the issues were found. Strategies may use it, e.g. to map
+            integer labels back to class names.
+        info : dict
+            A dictionary of information about data issues.
+        issue_name : str or None, optional (default=None)
+            The name of the issue to get information about. If `None`, the full information
+            dictionary is returned.
+
+        Returns
+        -------
+        dict
+            A copy of the information dictionary for the specified issue, or the full
+            information dictionary if `issue_name` is `None`.
+
+        Raises
+        ------
+        ValueError
+            If the specified issue name is not found in the information dictionary.
+        """
+        pass  # pragma: no cover
+
+    @staticmethod
+    def _get_info_helper(
+        info: Dict[str, Dict[str, Any]],
+        issue_name: Optional[str] = None,
+    ) -> Optional[Dict[str, Any]]:
+        if issue_name is None:
+            return None
+        if issue_name not in info:
+            raise ValueError(
+                f"issue_name {issue_name} not found in self.info. These have not been computed yet."
+            )
+        info = info[issue_name].copy()
+        return info
+
+
+class _ClassificationInfoStrategy(_InfoStrategy):
+    """Strategy for computing information about data issues related to classification tasks."""
+
+    @staticmethod
+    def get_info(
+        data: Data,
+        info: Dict[str, Dict[str, Any]],
+        issue_name: Optional[str] = None,
+    ) -> Dict[str, Any]:
+        info_extracted = _InfoStrategy._get_info_helper(info=info, issue_name=issue_name)
+        info = info_extracted if info_extracted is not None else info
+        if issue_name in ["label", "class_imbalance"]:
+            if data.labels.is_available is False:
+                raise ValueError(
+                    "The labels are not available. "
+                    "Most likely, no label column was provided when creating the Data object."
+                )
+            # Labels that are stored as integers may need to be converted to strings.
+            label_map = data.labels.label_map
+            if not label_map:
+                raise ValueError("The label map is not available.")
+            for key in ["given_label", "predicted_label"]:
+                labels = info.get(key, None)
+                if labels is not None:
+                    info[key] = np.vectorize(label_map.get)(labels)
+            info["class_names"] = list(label_map.values())
+        return info
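`_ClassificationInfoStrategy.get_info` proceeds in two steps. First, `_get_info_helper` returns a shallow copy of the entry for one issue type. Then integer-coded `given_label`/`predicted_label` values in that copy are decoded through `data.labels.label_map`, which maps class index to class name. The sketch below traces those steps with plain dicts and lists. `get_info_sketch` is a hypothetical name, a list comprehension replaces `np.vectorize`, and the sketch leaves out the checks for missing labels or an empty label map.

```python
from typing import Any, Dict, Optional


def get_info_sketch(
    info: Dict[str, Dict[str, Any]],
    label_map: Dict[int, Any],
    issue_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for _ClassificationInfoStrategy.get_info."""
    if issue_name is None:
        # No issue selected: hand back the whole dictionary (not a copy).
        return info
    if issue_name not in info:
        raise ValueError(f"issue_name {issue_name} not found in info.")
    # Shallow copy, so reassigning keys below leaves the stored entry untouched.
    issue_info = info[issue_name].copy()
    if issue_name in ("label", "class_imbalance"):
        for key in ("given_label", "predicted_label"):
            labels = issue_info.get(key)
            if labels is not None:
                # Decode integer class indices back to the original class names.
                issue_info[key] = [label_map.get(label) for label in labels]
        issue_info["class_names"] = list(label_map.values())
    return issue_info


stored = {"label": {"given_label": [0, 1, 1], "predicted_label": [0, 0, 1]}}
label_map = {0: "cat", 1: "dog"}
decoded = get_info_sketch(stored, label_map, "label")
print(decoded["given_label"])          # ['cat', 'dog', 'dog']
print(decoded["predicted_label"])      # ['cat', 'cat', 'dog']
print(stored["label"]["given_label"])  # [0, 1, 1] (stored info is unchanged)
```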
+
+
+class _RegressionInfoStrategy(_InfoStrategy):
+    """Strategy for computing information about data issues related to regression tasks."""
+
+    @staticmethod
+    def get_info(
+        data: Data,
+        info: Dict[str, Dict[str, Any]],
+        issue_name: Optional[str] = None,
+    ) -> Dict[str, Any]:
+        info_extracted = _InfoStrategy._get_info_helper(info=info, issue_name=issue_name)
+        info = info_extracted if info_extracted is not None else info
+        if issue_name == "label":
+            for key in ["given_label", "predicted_label"]:
+                labels = info.get(key, None)
+                if labels is not None:
+                    info[key] = labels
+        return info
+
+
+class _MultilabelInfoStrategy(_InfoStrategy):
+    """Strategy for computing information about data issues related to multilabel tasks."""
+
+    @staticmethod
+    def get_info(
+        data: Data,
+        info: Dict[str, Dict[str, Any]],
+        issue_name: Optional[str] = None,
+    ) -> Dict[str, Any]:
+        info_extracted = _InfoStrategy._get_info_helper(info=info, issue_name=issue_name)
+        info = info_extracted if info_extracted is not None else info
+        if issue_name == "label":
+            if data.labels.is_available is False:
+                raise ValueError(
+                    "The labels are not available. "
+                    "Most likely, no label column was provided when creating the Data object."
+                )
+            # Labels that are stored as integers may need to be converted to strings.
+            label_map = data.labels.label_map
+            if not label_map:
+                raise ValueError("The label map is not available.")
+            for key in ["given_label", "predicted_label"]:
+                labels = info.get(key, None)
+                if labels is not None:
+                    info[key] = [list(map(label_map.get, label)) for label in labels]
+            info["class_names"] = list(label_map.values())
+        return info
+
+
+
+class DataIssues:
+    """
+    Class that collects and stores information and statistics on issues found in a dataset.
+
+    Parameters
+    ----------
+    data :
+        The data object for which the issues are being collected.
+    strategy :
+        Strategy used for processing info dictionaries.
+
+    Attributes
+    ----------
+    issues : pd.DataFrame
+        Stores information about each individual issue found in the data,
+        on a per-example basis.
+    issue_summary : pd.DataFrame
+        Summarizes the overall statistics for each issue type.
+    info : dict
+        A dictionary that contains information and statistics about the data and each issue type.
+    """
+
+    def __init__(self, data: Data, strategy: Type[_InfoStrategy]) -> None:
+        self.issues: pd.DataFrame = pd.DataFrame(index=range(len(data)))
+        self.issue_summary: pd.DataFrame = pd.DataFrame(
+            columns=["issue_type", "score", "num_issues"]
+        ).astype({"score": np.float64, "num_issues": np.int64})
+        self.info: Dict[str, Dict[str, Any]] = {
+            "statistics": get_data_statistics(data),
+        }
+        self._data = data
+        self._strategy = strategy
+
+    def get_info(self, issue_name: Optional[str] = None) -> Dict[str, Any]:
+        return self._strategy.get_info(data=self._data, info=self.info, issue_name=issue_name)
+
+    @property
+    def statistics(self) -> Dict[str, Any]:
+        """Returns the statistics dictionary.
+
+        Shorthand for self.info["statistics"].
+        """
+        return self.info["statistics"]
+
+    def get_issues(self, issue_name: Optional[str] = None) -> pd.DataFrame:
+        """
+        Use this after finding issues to see which examples suffer from which types of issues.
+
+        Parameters
+        ----------
+        issue_name : str or None
+            The type of issue to focus on. If `None`, returns the full DataFrame summarizing all of the types of issues detected in each example from the dataset.
+
+        Raises
+        ------
+        ValueError
+            If no issues have been found yet, or if `issue_name` is not a type of issue previously considered in the audit.
+
+        Returns
+        -------
+        specific_issues :
+            A DataFrame where each row corresponds to an example from the dataset and columns specify:
+            whether this example exhibits a particular type of issue and how severely (via a numeric quality score where lower values indicate more severe instances of the issue).
+
+            Additional columns may be present in the DataFrame depending on the type of issue specified.
+        """
+        if self.issues.empty:
+            raise ValueError(
+                """No issues available for retrieval. Please check the following before using `get_issues`:
+                1. Ensure `find_issues` was executed. If not, please run it with the necessary parameters.
+                2. If `find_issues` was run but you're seeing this message,
+                    it may have encountered limitations preventing full analysis.
+                    However, partial checks can still provide valuable insights.
+                    Review `find_issues` output carefully for any specific actions needed
+                    to facilitate a more comprehensive analysis before calling `get_issues`.
+                """
+            )
+        if issue_name is None:
+            return self.issues
+
+        columns = [col for col in self.issues.columns if issue_name in col]
+        if not columns:
+            raise ValueError(
+                f"""No columns found for issue type '{issue_name}'. Ensure the following:
+                1. `find_issues` has been executed. If it hasn't, please run it.
+                2. Check the `find_issues` output to verify that the issue type '{issue_name}' was included in the checks
+                    and was not accidentally excluded before the audit.
+                3. Review the `find_issues` output for any errors or warnings that might indicate the check for '{issue_name}' issues failed to complete.
+                    This can provide better insights into what adjustments may be necessary.
+                """
+            )
+        specific_issues = self.issues[columns]
+        info = self.get_info(issue_name=issue_name)
+
+        if issue_name == "label":
+            specific_issues = specific_issues.assign(
+                given_label=info["given_label"], predicted_label=info["predicted_label"]
+            )
+
+        if issue_name == "near_duplicate":
+            column_dict = {
+                k: info.get(k)
+                for k in ["near_duplicate_sets", "distance_to_nearest_neighbor"]
+                if info.get(k) is not None
+            }
+            specific_issues = specific_issues.assign(**column_dict)
+
+        if issue_name == "class_imbalance":
+            specific_issues = specific_issues.assign(given_label=info["given_label"])
+        return specific_issues
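`DataIssues.get_issues` selects columns with `issue_name in col`, which is a substring test rather than an exact match on a column prefix. Any column whose name merely contains `issue_name` is therefore selected. `Datalab.get_issues` first validates the name against the registered issue types, but `DataIssues.get_issues` does not. The sketch below shows the filter on plain lists. `select_issue_columns` is a hypothetical helper. The column names assume an `is_<name>_issue` / `<name>_score` pattern; only the first form appears in the source above.

```python
from typing import List


def select_issue_columns(columns: List[str], issue_name: str) -> List[str]:
    """Hypothetical helper mirroring the column filter in DataIssues.get_issues."""
    # Substring match: any column containing issue_name is kept.
    selected = [col for col in columns if issue_name in col]
    if not selected:
        raise ValueError(f"No columns found for issue type '{issue_name}'.")
    return selected


# Assumed column names, following an is_<name>_issue / <name>_score pattern.
columns = [
    "is_label_issue", "label_score",
    "is_near_duplicate_issue", "near_duplicate_score",
    "is_outlier_issue", "outlier_score",
]
print(select_issue_columns(columns, "label"))
# ['is_label_issue', 'label_score']
print(select_issue_columns(columns, "duplicate"))
# ['is_near_duplicate_issue', 'near_duplicate_score']
```

The second call shows the consequence of substring matching: a partial name such as `"duplicate"` still returns the `near_duplicate` columns.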
+ +
+    def get_issue_summary(self, issue_name: Optional[str] = None) -> pd.DataFrame:
+        """Summarize the issues of a particular type found in the dataset,
+        including how severe this type of issue is overall across the dataset.
+
+        Parameters
+        ----------
+        issue_name :
+            Name of the issue type to summarize. If `None`, summarizes each of the different issue types previously considered in the audit.
+
+        Returns
+        -------
+        issue_summary :
+            DataFrame where each row corresponds to a type of issue, and columns quantify:
+            the number of examples in the dataset estimated to exhibit this type of issue,
+            and the overall severity of the issue across the dataset (via a numeric quality score where lower values indicate that the issue is overall more severe).
+        """
+        if self.issue_summary.empty:
+            raise ValueError(
+                "No issues found in the dataset. "
+                "Call `find_issues` before calling `get_issue_summary`."
+            )
+
+        if issue_name is None:
+            return self.issue_summary
+
+        row_mask = self.issue_summary["issue_type"] == issue_name
+        if not any(row_mask):
+            raise ValueError(f"Issue type {issue_name} not found in the summary.")
+        return self.issue_summary[row_mask].reset_index(drop=True)
+ +
+    def collect_statistics(self, issue_manager: Union[IssueManager, "Imagelab"]) -> None:
+        """Update the statistics in the info dictionary.
+
+        Parameters
+        ----------
+        issue_manager :
+            An IssueManager (or Imagelab) instance. Any dictionary stored under
+            ``issue_manager.info["statistics"]`` is merged into ``self.info["statistics"]``.
+
+        Examples
+        --------
+
+        A common use case is to reuse the KNN-graph across multiple issue managers.
+        To avoid recomputing the KNN-graph for each issue manager,
+        we can pass it as a statistic to the issue managers.
+
+        >>> from scipy.sparse import csr_matrix
+        >>> weighted_knn_graph = csr_matrix(...)
+        >>> issue_manager_that_computes_knn_graph = ...
+
+        """
+        key = "statistics"
+        statistics: Dict[str, Any] = issue_manager.info.get(key, {})
+        if statistics:
+            self.info[key].update(statistics)
+
+    def _update_issues(self, issue_manager):
+        overlapping_columns = list(set(self.issues.columns) & set(issue_manager.issues.columns))
+        if overlapping_columns:
+            warnings.warn(
+                f"Overwriting columns {overlapping_columns} in self.issues with "
+                f"columns from issue manager {issue_manager}."
+            )
+            self.issues.drop(columns=overlapping_columns, inplace=True)
+        self.issues = self.issues.join(issue_manager.issues, how="outer")
+
+    def _update_issue_info(self, issue_name, new_info):
+        if issue_name in self.info:
+            warnings.warn(f"Overwriting key {issue_name} in self.info")
+        self.info[issue_name] = new_info
+
+    def collect_issues_from_issue_manager(self, issue_manager: IssueManager) -> None:
+        """
+        Collects results from an IssueManager and updates the corresponding
+        attributes of this object.
+
+        This includes:
+
+        - self.issues
+        - self.issue_summary
+        - self.info
+
+        Parameters
+        ----------
+        issue_manager :
+            IssueManager object to collect results from.
+        """
+        self._update_issues(issue_manager)
+
+        if issue_manager.issue_name in self.issue_summary["issue_type"].values:
+            warnings.warn(
+                f"Overwriting row in self.issue_summary with "
+                f"row from issue manager {issue_manager}."
+            )
+            self.issue_summary = self.issue_summary[
+                self.issue_summary["issue_type"] != issue_manager.issue_name
+            ]
+        issue_column_name: str = f"is_{issue_manager.issue_name}_issue"
+        num_issues: int = int(issue_manager.issues[issue_column_name].sum())
+        self.issue_summary = pd.concat(
+            [
+                self.issue_summary,
+                issue_manager.summary.assign(num_issues=num_issues),
+            ],
+            axis=0,
+            ignore_index=True,
+        )
+        self._update_issue_info(issue_manager.issue_name, issue_manager.info)
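`collect_issues_from_issue_manager` updates `issue_summary` in three steps. It drops any existing row for the issue type (with a warning), counts the `True` entries in the manager's `is_<name>_issue` column, and appends a new row. Because of this order, re-running an issue type moves its row to the end of the summary. The sketch below models the update with a list of dicts in place of the DataFrame. `upsert_summary_row` is a hypothetical name, and `is_issue` stands in for the manager's boolean column.

```python
import warnings
from typing import Any, Dict, List


def upsert_summary_row(
    summary: List[Dict[str, Any]],
    issue_name: str,
    score: float,
    is_issue: List[bool],
) -> List[Dict[str, Any]]:
    """Hypothetical list-of-dicts version of the issue_summary update."""
    if any(row["issue_type"] == issue_name for row in summary):
        warnings.warn(f"Overwriting row for issue type {issue_name!r}.")
        summary = [row for row in summary if row["issue_type"] != issue_name]
    # num_issues counts the True entries in the is_<name>_issue column.
    num_issues = int(sum(is_issue))
    return summary + [{"issue_type": issue_name, "score": score, "num_issues": num_issues}]


summary: List[Dict[str, Any]] = []
summary = upsert_summary_row(summary, "label", 0.8, [True, False, True])
summary = upsert_summary_row(summary, "outlier", 0.95, [False, False, False])
summary = upsert_summary_row(summary, "label", 0.7, [True, True, True])  # warns
print(summary)
```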
+ +
[docs] def collect_issues_from_imagelab(self, imagelab: "Imagelab", issue_types: List[str]) -> None: + pass # pragma: no cover
+ +
[docs] def set_health_score(self) -> None: + """Set the health score for the dataset based on the issue summary. + + Currently, the health score is the mean of the scores for each issue type. + """ + self.info["statistics"]["health_score"] = self.issue_summary["score"].mean()
+ + +
[docs]def get_data_statistics(data: Data) -> Dict[str, Any]: + """Get statistics about a dataset. + + This function is called to initialize the "statistics" info in all `Datalab` objects. + + Parameters + ---------- + data : Data + Data object containing the dataset. + """ + statistics: Dict[str, Any] = { + "num_examples": len(data), + "multi_label": False, + "health_score": None, + } + if data.labels.is_available: + class_names = data.class_names + statistics["class_names"] = class_names + statistics["num_classes"] = len(class_names) + return statistics
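The bookkeeping in `_update_issues`, `collect_issues_from_issue_manager` and `set_health_score` reduces to a few pandas operations: drop overlapping columns before an outer join, replace any existing summary row for the same issue type, and average the per-type scores. The following is a standalone sketch of that logic, assuming the issue tables share a common integer index; `merge_issue_columns` and `replace_summary_row` are hypothetical helper names, not part of cleanlab's API.

```python
import warnings

import pandas as pd


def merge_issue_columns(issues: pd.DataFrame, new_issues: pd.DataFrame) -> pd.DataFrame:
    """Outer-join new issue columns, letting the newer table win on name clashes.

    Hypothetical helper that mirrors the idea of ``DataIssues._update_issues``.
    """
    overlapping = sorted(set(issues.columns) & set(new_issues.columns))
    if overlapping:
        warnings.warn(f"Overwriting columns {overlapping}")
        issues = issues.drop(columns=overlapping)
    return issues.join(new_issues, how="outer")


def replace_summary_row(summary: pd.DataFrame, new_row: pd.DataFrame) -> pd.DataFrame:
    """Drop any existing row for the same issue type, then append the new row.

    Hypothetical helper that mirrors the summary update in
    ``collect_issues_from_issue_manager``.
    """
    issue_type = new_row["issue_type"].iloc[0]
    summary = summary[summary["issue_type"] != issue_type]
    return pd.concat([summary, new_row], axis=0, ignore_index=True)


# Usage: two issue managers report on the same two examples.
issues = pd.DataFrame({"is_label_issue": [True, False], "label_score": [0.1, 0.9]})
outliers = pd.DataFrame({"is_outlier_issue": [False, True], "outlier_score": [0.8, 0.2]})
issues = merge_issue_columns(issues, outliers)

summary = pd.DataFrame({"issue_type": ["label"], "score": [0.5]})
summary = replace_summary_row(summary, pd.DataFrame({"issue_type": ["outlier"], "score": [0.5]}))
# Re-running the label check replaces its row instead of duplicating it.
summary = replace_summary_row(summary, pd.DataFrame({"issue_type": ["label"], "score": [0.7]}))

# As in set_health_score: the mean of the per-issue-type scores.
health_score = summary["score"].mean()
```

Returning new frames instead of mutating in place (the original uses `drop(..., inplace=True)`) keeps the sketch free of hidden state.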
+ + Made with Sphinx and @pradyunsg's + + Furo + +
+ \ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_finder.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_finder.html new file mode 100644 index 000000000..bca0a2859 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_finder.html @@ -0,0 +1,1182 @@ + cleanlab.datalab.internal.issue_finder - cleanlab
Source code for cleanlab.datalab.internal.issue_finder

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""
+Module for the :class:`IssueFinder` class, which is responsible for configuring,
+creating and running issue managers.
+
+It determines which types of issues to look for, instantiates the IssueManagers
+via a factory, runs the issue managers
+(:py:meth:`IssueManager.find_issues <cleanlab.datalab.internal.issue_manager.issue_manager.IssueManager.find_issues>`),
+and collects the results to :py:class:`DataIssues <cleanlab.datalab.internal.data_issues.DataIssues>`.
+
+.. note::
+
+    This module is not intended to be used directly. Instead, use the public-facing
+    :py:meth:`Datalab.find_issues <cleanlab.datalab.datalab.Datalab.find_issues>` method.
+"""
+from __future__ import annotations
+
+import warnings
+from typing import TYPE_CHECKING, Any, Dict, Optional
+
+import numpy as np
+from scipy.sparse import csr_matrix
+
+from cleanlab.datalab.internal.issue_manager_factory import (
+    _IssueManagerFactory,
+    list_default_issue_types,
+)
+from cleanlab.datalab.internal.model_outputs import (
+    MultiClassPredProbs,
+    MultiLabelPredProbs,
+    RegressionPredictions,
+)
+from cleanlab.datalab.internal.task import Task
+
+if TYPE_CHECKING:  # pragma: no cover
+    from typing import Callable
+
+    import numpy.typing as npt
+
+    from cleanlab.datalab.datalab import Datalab
+
+
+_CLASSIFICATION_ARGS_DICT = {
+    "label": ["pred_probs", "features"],
+    "outlier": ["pred_probs", "features", "knn_graph"],
+    "near_duplicate": ["features", "knn_graph"],
+    "non_iid": ["pred_probs", "features", "knn_graph"],
+    # The underperforming_group issue type requires a pair of inputs: (pred_probs, <any_of_the_other_three>)
+    "underperforming_group": ["pred_probs", "features", "knn_graph", "cluster_ids"],
+    "data_valuation": ["features", "knn_graph"],
+    "class_imbalance": [],
+    "null": ["features"],
+}
+_REGRESSION_ARGS_DICT = {
+    "label": ["features", "predictions"],
+    "outlier": ["features", "knn_graph"],
+    "near_duplicate": ["features", "knn_graph"],
+    "non_iid": ["features", "knn_graph"],
+    "data_valuation": ["features", "knn_graph"],
+    "null": ["features"],
+}
+
+_MULTILABEL_ARGS_DICT = {
+    "label": ["pred_probs"],
+    "outlier": ["features", "knn_graph"],
+    "near_duplicate": ["features", "knn_graph"],
+    "non_iid": ["features", "knn_graph"],
+    "data_valuation": ["features", "knn_graph"],
+    "null": ["features"],
+}
+
+
+def _resolve_required_args_for_classification(**kwargs):
+    """Resolves the required arguments for each issue type intended for classification tasks."""
+    initial_args_dict = _CLASSIFICATION_ARGS_DICT.copy()
+    args_dict = {
+        issue_type: {arg: kwargs.get(arg, None) for arg in initial_args_dict[issue_type]}
+        for issue_type in initial_args_dict
+    }
+
+    # Some issue types (like class-imbalance) have no required args.
+    # This conditional lambda is used to include them in args dict.
+    keep_empty_argument = lambda k: not len(_CLASSIFICATION_ARGS_DICT[k])
+
+    # Remove None values from argument list, rely on default values in IssueManager
+    args_dict = {
+        k: {k2: v2 for k2, v2 in v.items() if v2 is not None}
+        for k, v in args_dict.items()
+        if (v or keep_empty_argument(k))
+    }
+
+    # Prefer `knn_graph` over `features` if both are provided.
+    for v in args_dict.values():
+        if "cluster_ids" in v and ("knn_graph" in v or "features" in v):
+            warnings.warn(
+                "`cluster_ids` have been provided with `knn_graph` or `features`. "
+                "Issue managers that require cluster labels will prefer "
+                "`cluster_ids` over computation of cluster labels using "
+                "`knn_graph` or `features`. "
+            )
+        if "knn_graph" in v and "features" in v:
+            warnings.warn(
+                "Both `features` and `knn_graph` were provided. "
+                "Most issue managers will likely prefer using `knn_graph` "
+                "instead of `features` for efficiency."
+            )
+
+    # Only keep issue types that have at least one argument
+    # or those that require no arguments.
+    args_dict = {k: v for k, v in args_dict.items() if (v or keep_empty_argument(k))}
+
+    return args_dict
+
+
+def _resolve_required_args_for_regression(**kwargs):
+    """Resolves the required arguments for each issue type intended for regression tasks."""
+    initial_args_dict = _REGRESSION_ARGS_DICT.copy()
+    args_dict = {
+        issue_type: {arg: kwargs.get(arg, None) for arg in initial_args_dict[issue_type]}
+        for issue_type in initial_args_dict
+    }
+    # Some issue types have no required args.
+    # This conditional lambda is used to include them in args dict.
+    keep_empty_argument = lambda k: not len(_REGRESSION_ARGS_DICT[k])
+
+    # Remove None values from argument list, rely on default values in IssueManager
+    args_dict = {
+        k: {k2: v2 for k2, v2 in v.items() if v2 is not None}
+        for k, v in args_dict.items()
+        if v or keep_empty_argument(k)
+    }
+
+    # Only keep issue types that have at least one argument
+    # or those that require no arguments.
+    args_dict = {k: v for k, v in args_dict.items() if (v or keep_empty_argument(k))}
+
+    return args_dict
+
+
+def _resolve_required_args_for_multilabel(**kwargs):
+    """Resolves the required arguments for each issue type intended for multilabel tasks."""
+    initial_args_dict = _MULTILABEL_ARGS_DICT.copy()
+    args_dict = {
+        issue_type: {arg: kwargs.get(arg, None) for arg in initial_args_dict[issue_type]}
+        for issue_type in initial_args_dict
+    }
+    # Some issue types have no required args.
+    # This conditional lambda is used to include them in args dict.
+    keep_empty_argument = lambda k: not len(_MULTILABEL_ARGS_DICT[k])
+
+    # Remove None values from argument list, rely on default values in IssueManager
+    args_dict = {
+        k: {k2: v2 for k2, v2 in v.items() if v2 is not None}
+        for k, v in args_dict.items()
+        if v or keep_empty_argument(k)  # Keep issue types that require no arguments
+    }
+
+    # Only keep issue types that have at least one argument
+    # or those that require no arguments.
+    args_dict = {k: v for k, v in args_dict.items() if (v or keep_empty_argument(k))}
+
+    return args_dict
+
+
+def _select_strategy_for_resolving_required_args(task: Task) -> Callable:
+    """Helper function that selects the strategy for resolving required arguments for each issue type.
+
+    Each strategy collects the provided inputs required by each issue type and
+    drops any issue type for which none of those inputs were provided
+    (issue types that require no inputs are always kept).
+
+    This does not consider custom hyperparameters for each issue type.
+
+    Parameters
+    ----------
+    task : Task
+        The type of machine learning task that the dataset is used for.
+
+    Returns
+    -------
+    selected_strategy : Callable
+        Function that accepts the available inputs as keyword arguments and returns
+        a dictionary of the provided required arguments for each issue type.
+
+    Raises
+    ------
+    ValueError
+        If no strategy exists for the given task.
+    """
+    strategies = {
+        Task.CLASSIFICATION: _resolve_required_args_for_classification,
+        Task.REGRESSION: _resolve_required_args_for_regression,
+        Task.MULTILABEL: _resolve_required_args_for_multilabel,
+    }
+    selected_strategy = strategies.get(task, None)
+    if selected_strategy is None:
+        raise ValueError(f"No strategy for resolving required arguments for task '{task}'")
+    return selected_strategy
+
+
+
[docs]class IssueFinder: + """ + The IssueFinder class is responsible for managing the process of identifying + issues in the dataset by handling the creation and execution of relevant + IssueManagers. It serves as a coordinator or helper class for the Datalab class + to encapsulate the specific behavior of the issue finding process. + + At a high level, the IssueFinder is responsible for: + + - Determining which types of issues to look for. + - Instantiating the appropriate IssueManagers using a factory. + - Running the IssueManagers' `find_issues` methods. + - Collecting the results into a DataIssues instance. + + Parameters + ---------- + datalab : Datalab + The Datalab instance associated with this IssueFinder. + + task : str + The type of machine learning task that the dataset is used for. + + verbosity : int + Controls the verbosity of the output during the issue finding process. + + Note + ---- + This class is not intended to be used directly. Instead, use the + `Datalab.find_issues` method which internally utilizes an IssueFinder instance. + """ + + def __init__(self, datalab: "Datalab", task: Task, verbosity=1): + self.datalab = datalab + self.task = task + self.verbosity = verbosity + +
[docs] def find_issues( + self, + *, + pred_probs: Optional[np.ndarray] = None, + features: Optional[npt.NDArray] = None, + knn_graph: Optional[csr_matrix] = None, + issue_types: Optional[Dict[str, Any]] = None, + ) -> None: + """ + Checks the dataset for all sorts of common issues in real-world data (in both labels and feature values). + + You can use Datalab to find issues in your data, utilizing *any* model you have already trained. + This method only interacts with your model via its predictions or embeddings (and other functions thereof). + The more of these inputs you provide, the more types of issues Datalab can detect in your dataset/labels. + If you provide a subset of these inputs, Datalab will output what insights it can based on the limited information from your model. + + Note + ---- + This method is not intended to be used directly. Instead, use the + :py:meth:`Datalab.find_issues <cleanlab.datalab.datalab.Datalab.find_issues>` method. + + Note + ---- + The issues are saved in the ``self.datalab.data_issues.issues`` attribute, but are not returned. + + Parameters + ---------- + pred_probs : + Out-of-sample predicted class probabilities made by the model for every example in the dataset. + To best detect label issues, provide this input obtained from the most accurate model you can produce. + + If provided for classification, this must be a 2D array with shape ``(num_examples, K)`` where K is the number of classes in the dataset. + If provided for regression, this must be a 1D array with shape ``(num_examples,)``. + + features : Optional[np.ndarray] + Feature embeddings (vector representations) of every example in the dataset. + + If provided, this must be a 2D array with shape (num_examples, num_features). + + knn_graph : + Sparse matrix representing distances between examples in the dataset in a k nearest neighbor graph. 
+ + For details, refer to the documentation of the same argument in :py:meth:`Datalab.find_issues <cleanlab.datalab.datalab.Datalab.find_issues>`. + + issue_types : + Collection specifying which types of issues to consider in the audit and any non-default parameter settings to use. + If unspecified, a default set of issue types and recommended parameter settings is considered. + + This is a dictionary of dictionaries, where the keys are the issue types of interest + and the values are dictionaries of parameter values that control how each type of issue is detected (only for advanced users). + More specifically, the values are constructor keyword arguments passed to the corresponding ``IssueManager``, + which is responsible for detecting the particular issue type. + + .. seealso:: + :py:class:`IssueManager <cleanlab.datalab.internal.issue_manager.issue_manager.IssueManager>` + """ + + issue_types_copy = self.get_available_issue_types( + pred_probs=pred_probs, + features=features, + knn_graph=knn_graph, + issue_types=issue_types, + ) + + if not issue_types_copy: + return None + + new_issue_managers = [ + factory(datalab=self.datalab, **issue_types_copy.get(factory.issue_name, {})) + for factory in _IssueManagerFactory.from_list( + list(issue_types_copy.keys()), task=self.task + ) + ] + + failed_managers = [] + data_issues = self.datalab.data_issues + for issue_manager, arg_dict in zip(new_issue_managers, issue_types_copy.values()): + try: + if self.verbosity: + print(f"Finding {issue_manager.issue_name} issues ...") + issue_manager.find_issues(**arg_dict) + data_issues.collect_statistics(issue_manager) + data_issues.collect_issues_from_issue_manager(issue_manager) + except Exception as e: + print(f"Error in {issue_manager.issue_name}: {e}") + failed_managers.append(issue_manager) + if failed_managers: + print(f"Failed to check for these issue types: {failed_managers}") + data_issues.set_health_score()
+ + def _set_issue_types( + self, + issue_types: Optional[Dict[str, Any]], + required_defaults_dict: Dict[str, Any], + ) -> Dict[str, Any]: + """Set necessary configuration for each IssueManager in a dictionary. + + While each IssueManager defines default values for its arguments, + the Datalab class needs to organize the calls to each IssueManager + with different arguments, some of which may be user-provided. + + Parameters + ---------- + issue_types : + Dictionary of issue types and argument configuration for their respective IssueManagers. + If None, then the `required_defaults_dict` is used. + + required_defaults_dict : + Dictionary of default parameter configuration for each issue type. + + Returns + ------- + issue_types_copy : + Dictionary of issue types and their parameter configuration. + The input `issue_types` is copied and updated with the necessary default values. + """ + if issue_types is not None: + issue_types_copy = issue_types.copy() + self._check_missing_args(required_defaults_dict, issue_types_copy) + else: + issue_types_copy = required_defaults_dict.copy() + # keep only default issue types + issue_types_copy = { + issue: issue_types_copy[issue] + for issue in list_default_issue_types(self.task) + if issue in issue_types_copy + } + + # Check that all required arguments are provided. + self._validate_issue_types_dict(issue_types_copy, required_defaults_dict) + + # Remove None values from argument list, rely on default values in IssueManager + for key, value in issue_types_copy.items(): + issue_types_copy[key] = {k: v for k, v in value.items() if v is not None} + + return issue_types_copy + + @staticmethod + def _check_missing_args(required_defaults_dict, issue_types): + for key, issue_type_value in issue_types.items(): + missing_args = set(required_defaults_dict.get(key, {})) - set(issue_type_value.keys()) + # Impute missing arguments with default values. 
+ missing_dict = { + missing_arg: required_defaults_dict[key][missing_arg] + for missing_arg in missing_args + } + issue_types[key].update(missing_dict) + + @staticmethod + def _validate_issue_types_dict( + issue_types: Dict[str, Any], required_defaults_dict: Dict[str, Any] + ) -> None: + missing_required_args_dict = {} + for issue_name, required_args in required_defaults_dict.items(): + if issue_name in issue_types: + missing_args = set(required_args.keys()) - set(issue_types[issue_name].keys()) + if missing_args: + missing_required_args_dict[issue_name] = missing_args + if any(missing_required_args_dict.values()): + error_message = "" + for issue_name, missing_required_args in missing_required_args_dict.items(): + error_message += f"Required argument {missing_required_args} for issue type {issue_name} was not provided.\n" + raise ValueError(error_message) + +
[docs] def get_available_issue_types(self, **kwargs): + """Returns a dictionary of issue types that can be used in :py:meth:`Datalab.find_issues + <cleanlab.datalab.datalab.Datalab.find_issues>` method.""" + + pred_probs = kwargs.get("pred_probs", None) + features = kwargs.get("features", None) + knn_graph = kwargs.get("knn_graph", None) + issue_types = kwargs.get("issue_types", None) + + model_output = None + if pred_probs is not None: + model_output_dict = { + Task.REGRESSION: RegressionPredictions, + Task.CLASSIFICATION: MultiClassPredProbs, + Task.MULTILABEL: MultiLabelPredProbs, + } + + model_output_class = model_output_dict.get(self.task) + if model_output_class is None: + raise ValueError(f"Unknown task type '{self.task}'") + + model_output = model_output_class(pred_probs) + + if model_output is not None: + # A basic trick to assign the model output to the correct argument + # E.g. Datalab accepts only `pred_probs`, but those are assigned to the `predictions` argument for regression-related issue_managers + kwargs.update({model_output.argument: model_output.collect()}) + + # Determine which parameters are required for each issue type + strategy_for_resolving_required_args = _select_strategy_for_resolving_required_args( + self.task + ) + required_args_per_issue_type = strategy_for_resolving_required_args(**kwargs) + + issue_types_copy = self._set_issue_types(issue_types, required_args_per_issue_type) + if issue_types is None: + # Only run default issue types if no issue types are specified + issue_types_copy = { + issue: issue_types_copy[issue] + for issue in list_default_issue_types(self.task) + if issue in issue_types_copy + } + drop_label_check = ( + "label" in issue_types_copy + and not self.datalab.has_labels + and self.task != Task.REGRESSION + ) + + if drop_label_check: + warnings.warn("No labels were provided. 
" "The 'label' issue type will not be run.") + issue_types_copy.pop("label") + + outlier_check_needs_features = ( + self.task == Task.CLASSIFICATION + and "outlier" in issue_types_copy + and not self.datalab.has_labels + ) + if outlier_check_needs_features: + no_features = features is None + no_knn_graph = knn_graph is None + pred_probs_given = issue_types_copy["outlier"].get("pred_probs", None) is not None + + only_pred_probs_given = pred_probs_given and no_features and no_knn_graph + if only_pred_probs_given: + warnings.warn( + "No labels were provided. " "The 'outlier' issue type will not be run." + ) + issue_types_copy.pop("outlier") + + drop_class_imbalance_check = ( + "class_imbalance" in issue_types_copy + and not self.datalab.has_labels + and self.task == Task.CLASSIFICATION + ) + if drop_class_imbalance_check: + issue_types_copy.pop("class_imbalance") + + required_pairs_for_underperforming_group = [ + ("pred_probs", "features"), + ("pred_probs", "knn_graph"), + ("pred_probs", "cluster_ids"), + ] + drop_underperforming_group_check = "underperforming_group" in issue_types_copy and not any( + all( + key in issue_types_copy["underperforming_group"] + and issue_types_copy["underperforming_group"].get(key) is not None + for key in pair + ) + for pair in required_pairs_for_underperforming_group + ) + if drop_underperforming_group_check: + issue_types_copy.pop("underperforming_group") + + return issue_types_copy
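Each `_resolve_required_args_for_*` strategy collects, for every issue type, the inputs listed in its table that are not `None`, and keeps the issue type only if something was collected or the type needs no inputs at all. `get_available_issue_types` then drops `underperforming_group` unless `pred_probs` is paired with one other input. The pure-Python sketch below reproduces just those two rules; `ARGS_DICT`, `resolve_required_args` and `has_required_pair` are hypothetical names, and the sketch omits the warnings, the per-task tables and the handling of user-supplied `issue_types`.

```python
from typing import Any, Dict, List

# A hypothetical subset of the classification table above, for illustration only.
ARGS_DICT: Dict[str, List[str]] = {
    "label": ["pred_probs", "features"],
    "near_duplicate": ["features", "knn_graph"],
    "class_imbalance": [],
}

# Inputs that must appear together for the underperforming_group check.
REQUIRED_PAIRS = [
    ("pred_probs", "features"),
    ("pred_probs", "knn_graph"),
    ("pred_probs", "cluster_ids"),
]


def resolve_required_args(args_dict: Dict[str, List[str]], **kwargs: Any) -> Dict[str, Dict[str, Any]]:
    """Keep, for each issue type, only the inputs that were actually provided."""
    resolved: Dict[str, Dict[str, Any]] = {}
    for issue_type, arg_names in args_dict.items():
        provided = {name: kwargs[name] for name in arg_names if kwargs.get(name) is not None}
        # Keep issue types that received at least one input,
        # or that need no inputs at all (such as class_imbalance).
        if provided or not arg_names:
            resolved[issue_type] = provided
    return resolved


def has_required_pair(args: Dict[str, Any]) -> bool:
    """True if at least one pair in REQUIRED_PAIRS is fully provided (non-None)."""
    return any(all(args.get(key) is not None for key in pair) for pair in REQUIRED_PAIRS)


# Usage: only features are available, so the pred_probs entries are dropped.
features = [[0.0, 1.0], [1.0, 0.0]]
resolved = resolve_required_args(ARGS_DICT, features=features)
```

Filtering out `None` rather than passing it through lets each issue manager fall back on its own defaults, which is the behavior the comments in the original strategies describe.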
+ \ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/data_valuation.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/data_valuation.html new file mode 100644 index 000000000..d4bcacb07 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/data_valuation.html @@ -0,0 +1,883 @@ + cleanlab.datalab.internal.issue_manager.data_valuation - cleanlab
Source code for cleanlab.datalab.internal.issue_manager.data_valuation

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+from __future__ import annotations
+
+from typing import (
+    TYPE_CHECKING,
+    Any,
+    Callable,
+    ClassVar,
+    Dict,
+    List,
+    Optional,
+    Union,
+)
+
+
+import numpy as np
+import pandas as pd
+from scipy.sparse import csr_matrix
+
+from cleanlab.data_valuation import data_shapley_knn
+from cleanlab.datalab.internal.issue_manager import IssueManager
+from cleanlab.datalab.internal.issue_manager.knn_graph_helpers import (
+    num_neighbors_in_knn_graph,
+    set_knn_graph,
+)
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+    import pandas as pd
+    from cleanlab.datalab.datalab import Datalab
+
+
+
[docs]class DataValuationIssueManager(IssueManager): + """ + Detect which examples in a dataset are least valuable via an approximate Data Shapley value. + + Examples + -------- + .. code-block:: python + + >>> from cleanlab import Datalab + >>> import numpy as np + >>> from sklearn.neighbors import NearestNeighbors + >>> + >>> # Generate two distinct clusters + >>> X = np.vstack([ + ... np.random.normal(-1, 1, (25, 2)), + ... np.random.normal(1, 1, (25, 2)), + ... ]) + >>> y = np.array([0]*25 + [1]*25) + >>> + >>> # Initialize Datalab with data + >>> lab = Datalab(data={"y": y}, label_name="y") + >>> + >>> # Creating a knn_graph for data valuation + >>> knn = NearestNeighbors(n_neighbors=10).fit(X) + >>> knn_graph = knn.kneighbors_graph(mode='distance') + >>> + >>> # Specifying issue types for data valuation + >>> issue_types = {"data_valuation": {}} + >>> lab.find_issues(knn_graph=knn_graph, issue_types=issue_types) + """ + + description: ClassVar[ + str + ] = r""" + Examples that contribute minimally to a model's training + receive lower valuation scores. + Since the original knn-shapley value is in [-1, 1], we transform it to [0, 1] by: + + .. math:: + 0.5 \times (\text{shapley} + 1) + + where shapley is the original knn-shapley value. + """ + + issue_name: ClassVar[str] = "data_valuation" + issue_score_key: ClassVar[str] + verbosity_levels: ClassVar[Dict[int, List[str]]] = { + 0: [], + 1: [], + 2: [], + 3: ["average_data_valuation"], + } + + DEFAULT_THRESHOLD = 0.5 + + def __init__( + self, + datalab: Datalab, + metric: Optional[Union[str, Callable]] = None, + threshold: Optional[float] = None, + k: int = 10, + **kwargs, + ): + super().__init__(datalab) + self.metric = metric + self.k = k + self.threshold = threshold if threshold is not None else self.DEFAULT_THRESHOLD
[docs] def find_issues( + self, + features: Optional[npt.NDArray] = None, + **kwargs, + ) -> None: + """Calculate the data valuation score with a provided or existing knn graph. + Based on the KNN-Shapley value described in https://arxiv.org/abs/1911.07128. + A larger score indicates a more valuable example, i.e. one that contributes more to the model's training. + + Parameters + ---------- + features : + Feature embeddings used to construct a knn graph if none is provided or already stored in the Datalab statistics. + + knn_graph : csr_matrix + A sparse matrix representing the knn graph (passed via ``**kwargs``). + """ + labels = self.datalab.labels + if not isinstance(labels, np.ndarray): + error_msg = ( + f"Expected labels to be a numpy array of shape (n_samples,) to use with DataValuationIssueManager, " + f"but got {type(labels)} instead." + ) + raise TypeError(error_msg) + + knn_graph, self.metric, _ = set_knn_graph( + features=features, + find_issues_kwargs=kwargs, + metric=self.metric, + k=self.k, + statistics=self.datalab.get_info("statistics"), + ) + + # TODO: Check self.k against user-provided knn-graphs across all issue managers + num_neighbors = num_neighbors_in_knn_graph(knn_graph) + if self.k > num_neighbors: + raise ValueError( + f"The provided knn graph has {num_neighbors} neighbors, which is less than the required {self.k} neighbors. " + "Please ensure that the knn graph you provide has at least as many neighbors as the required value of k." + ) + + scores = data_shapley_knn(labels, knn_graph=knn_graph, k=self.k) + + self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue": scores < self.threshold, + self.issue_score_key: scores, + }, + ) + self.summary = self.make_summary(score=scores.mean()) + + self.info = self.collect_info(issues=self.issues, knn_graph=knn_graph)
+ +
[docs] def collect_info(self, issues: pd.DataFrame, knn_graph: csr_matrix) -> dict: + issues_info = { + "num_low_valuation_issues": sum(issues[f"is_{self.issue_name}_issue"]), + "average_data_valuation": issues[self.issue_score_key].mean(), + } + + params_dict = { + "metric": self.metric, + "k": self.k, + "threshold": self.threshold, + } + + statistics_dict = self._build_statistics_dictionary(knn_graph=knn_graph) + + info_dict = { + **issues_info, + **params_dict, + **statistics_dict, + } + + return info_dict
+ + def _build_statistics_dictionary(self, knn_graph: csr_matrix) -> Dict[str, Dict[str, Any]]: + statistics_dict: Dict[str, Dict[str, Any]] = {"statistics": {}} + + # Add the knn graph as a statistic if necessary + graph_key = "weighted_knn_graph" + old_knn_graph = self.datalab.get_info("statistics").get(graph_key, None) + old_graph_exists = old_knn_graph is not None + prefer_new_graph = ( + not old_graph_exists + or knn_graph.nnz > old_knn_graph.nnz + or self.metric != self.datalab.get_info("statistics").get("knn_metric", None) + ) + if prefer_new_graph: + statistics_dict["statistics"][graph_key] = knn_graph + if self.metric is not None: + statistics_dict["statistics"]["knn_metric"] = self.metric + + return statistics_dict
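The description above maps a raw KNN-Shapley value s in [-1, 1] to 0.5 * (s + 1) in [0, 1], and `find_issues` flags every score strictly below `DEFAULT_THRESHOLD = 0.5`. Assuming the scores compared against the threshold are these rescaled values, an example is flagged exactly when its raw KNN-Shapley value is negative. The NumPy sketch below applies the rescaling and the threshold to made-up raw values; it does not call `data_shapley_knn`, and `rescale_shapley` is a hypothetical helper.

```python
import numpy as np

DEFAULT_THRESHOLD = 0.5  # same default as DataValuationIssueManager


def rescale_shapley(shapley: np.ndarray) -> np.ndarray:
    """Map raw KNN-Shapley values from [-1, 1] to [0, 1] via 0.5 * (shapley + 1)."""
    return 0.5 * (np.asarray(shapley, dtype=float) + 1.0)


# Made-up raw KNN-Shapley values, for illustration only.
raw = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
scores = rescale_shapley(raw)
is_issue = scores < DEFAULT_THRESHOLD  # strict inequality, as in find_issues
summary_score = scores.mean()  # the value passed to make_summary
```

Because the map is monotonic, thresholding the rescaled score at 0.5 ranks and flags examples identically to thresholding the raw value at 0.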
+ \ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/duplicate.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/duplicate.html new file mode 100644 index 000000000..6bdf11091 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/duplicate.html @@ -0,0 +1,922 @@ + cleanlab.datalab.internal.issue_manager.duplicate - cleanlab
Source code for cleanlab.datalab.internal.issue_manager.duplicate

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, Callable, ClassVar, Dict, List, Optional, Union
+import warnings
+
+import numpy as np
+import pandas as pd
+from scipy.sparse import csr_matrix
+
+
+from cleanlab.datalab.internal.issue_manager import IssueManager
+from cleanlab.datalab.internal.issue_manager.knn_graph_helpers import set_knn_graph
+from cleanlab.internal.constants import EPSILON
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+    from cleanlab.datalab.datalab import Datalab
+
+
+
[docs]class NearDuplicateIssueManager(IssueManager): + """Manages issues related to near-duplicate examples.""" + + description: ClassVar[ + str + ] = """A (near) duplicate issue refers to two or more examples in + a dataset that are extremely similar to each other, relative + to the rest of the dataset. The examples flagged with this issue + may be exactly duplicated, or lie atypically close together when + represented as vectors (i.e. feature embeddings). + """ + issue_name: ClassVar[str] = "near_duplicate" + verbosity_levels = { + 0: [], + 1: [], + 2: ["threshold"], + } + + def __init__( + self, + datalab: Datalab, + metric: Optional[Union[str, Callable]] = None, + threshold: float = 0.13, + k: int = 10, + **_, + ): + super().__init__(datalab) + self.metric = metric + self.threshold = self._set_threshold(threshold) + self.k = k + self.near_duplicate_sets: List[List[int]] = [] + +
[docs] def find_issues( + self, + features: Optional[npt.NDArray] = None, + **kwargs, + ) -> None: + knn_graph, self.metric, _ = set_knn_graph( + features=features, + find_issues_kwargs=kwargs, + metric=self.metric, + k=self.k, + statistics=self.datalab.get_info("statistics"), + ) + + N = knn_graph.shape[0] + nn_distances = knn_graph.data.reshape(N, -1)[:, 0] + median_nn_distance = max(np.median(nn_distances), EPSILON) # avoid threshold = 0 + self.near_duplicate_sets = self._neighbors_within_radius( + knn_graph, self.threshold, median_nn_distance + ) + + # Flag every example in a near-duplicate set as a near-duplicate issue + all_near_duplicates = np.unique(np.concatenate(self.near_duplicate_sets)) + is_issue_column = np.zeros(N, dtype=bool) + is_issue_column[all_near_duplicates] = True + temperature = 1.0 / median_nn_distance + scores = _compute_scores_with_exp_transform(nn_distances, temperature=temperature) + self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue": is_issue_column, + self.issue_score_key: scores, + }, + ) + + self.summary = self.make_summary(score=scores.mean()) + self.info = self.collect_info(knn_graph=knn_graph, median_nn_distance=median_nn_distance)
+ + @staticmethod + def _neighbors_within_radius(knn_graph: csr_matrix, threshold: float, median: float): + """Returns a list of lists of indices of near-duplicate examples. + + Each list of indices represents a set of near-duplicate examples. + + If the list is empty for a given example, then that example is not + a near-duplicate of any other example. + """ + + N = knn_graph.shape[0] + distances = knn_graph.data.reshape(N, -1) + # Create a mask for the threshold + mask = distances < threshold * median + + # Update the indptr to reflect the new number of neighbors + indptr = np.zeros(knn_graph.indptr.shape, dtype=knn_graph.indptr.dtype) + indptr[1:] = np.cumsum(mask.sum(axis=1)) + + # Filter the knn_graph based on the threshold + indices = knn_graph.indices[mask.ravel()] + near_duplicate_sets = [indices[indptr[i] : indptr[i + 1]] for i in range(N)] + + # Second pass over the data is required to ensure each item is included in the near-duplicate sets of its own near-duplicates. + # This is important because a "near-duplicate" relationship is reciprocal. + # For example, if item A is a near-duplicate of item B, then item B should also be considered a near-duplicate of item A. + # NOTE: This approach does not assure that the sets are ordered by increasing distance. + for i, near_duplicates in enumerate(near_duplicate_sets): + for j in near_duplicates: + if i not in near_duplicate_sets[j]: + near_duplicate_sets[j] = np.append(near_duplicate_sets[j], i) + + return near_duplicate_sets + +
[docs] def collect_info(self, knn_graph: csr_matrix, median_nn_distance: float) -> dict: + issues_dict = { + "average_near_duplicate_score": self.issues[self.issue_score_key].mean(), + "near_duplicate_sets": self.near_duplicate_sets, + } + + params_dict = { + "metric": self.metric, + "k": self.k, + "threshold": self.threshold, + } + + N = knn_graph.shape[0] + dists = knn_graph.data.reshape(N, -1)[:, 0] + nn_ids = knn_graph.indices.reshape(N, -1)[:, 0] + + knn_info_dict = { + "nearest_neighbor": nn_ids.tolist(), + "distance_to_nearest_neighbor": dists.tolist(), + "median_distance_to_nearest_neighbor": median_nn_distance, + } + + statistics_dict = self._build_statistics_dictionary(knn_graph=knn_graph) + + info_dict = { + **issues_dict, + **params_dict, + **knn_info_dict, + **statistics_dict, + } + return info_dict
+ + def _build_statistics_dictionary(self, knn_graph: csr_matrix) -> Dict[str, Dict[str, Any]]: + statistics_dict: Dict[str, Dict[str, Any]] = {"statistics": {}} + + # Add the knn graph as a statistic if necessary + graph_key = "weighted_knn_graph" + old_knn_graph = self.datalab.get_info("statistics").get(graph_key, None) + old_graph_exists = old_knn_graph is not None + prefer_new_graph = ( + not old_graph_exists + or knn_graph.nnz > old_knn_graph.nnz + or self.metric != self.datalab.get_info("statistics").get("knn_metric", None) + ) + if prefer_new_graph: + statistics_dict["statistics"][graph_key] = knn_graph + if self.metric is not None: + statistics_dict["statistics"]["knn_metric"] = self.metric + + return statistics_dict + + def _set_threshold( + self, + threshold: float, + ) -> float: + """Validates the nearest-neighbor distance threshold for near-duplicate detection, clipping negative values to 0.""" + if threshold < 0: + warnings.warn( + f"Computed threshold {threshold} is less than 0. " + "Setting threshold to 0. " + "This may indicate that either only a few examples are in the dataset, " + "or the data is heavily skewed." + ) + threshold = 0 + return threshold
+ + +def _compute_scores_with_exp_transform(nn_distances: np.ndarray, temperature: float) -> np.ndarray: + r"""Compute near-duplicate scores from nearest neighbor distances. + + This is a non-linear transformation of the nearest neighbor distances that + maps distances to scores in the range [0, 1]. + + Note + ---- + + This transformation is given by the following formula: + + .. math:: + + \text{score}(d, t) = 1 - e^{-dt} + + where :math:`d` is the nearest neighbor distance and :math:`t > 0` is a temperature parameter. + + Parameters + ---------- + nn_distances : + The nearest neighbor distances for each example. + + temperature : + The temperature :math:`t > 0`. Larger values make the scores approach 1 more quickly as the distance grows. + + Returns + ------- + scores : + The near-duplicate scores for each example. The scores are in the range [0, 1]. + A lower score indicates that an example is more likely to be a near-duplicate than + an example with a higher score. + A score of 0 indicates that an example has an exact duplicate. + + Raises + ------ + ValueError + If ``temperature`` is not greater than 0. + """ + if temperature <= 0: + raise ValueError("Temperature must be greater than 0.") + + scores = 1 - np.exp(-temperature * nn_distances) + + # Ensure that for nn_distances approximately equal to 0, the score is set to 0 + inds = np.isclose(nn_distances, 0) + scores[inds] = 0 + + return scores +
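The near-duplicate logic above can be illustrated with a small standalone sketch. It assumes plain NumPy and brute-force Euclidean distances over all pairs instead of the k-nearest-neighbor graph built by `set_knn_graph`, so it is only practical for small datasets; the function name `near_duplicate_sketch` and the `EPSILON` value are hypothetical. As in `find_issues`, the radius is `threshold` times the median nearest-neighbor distance, and scores use the exponential transform with temperature `1 / median`.

```python
import numpy as np

EPSILON = 1e-6  # hypothetical floor that keeps the median distance (and the radius) above 0


def near_duplicate_sketch(features, threshold=0.13):
    """Hypothetical helper: flag near duplicates and score every example.

    Uses brute-force pairwise Euclidean distances rather than a k-NN graph.
    """
    X = np.asarray(features, dtype=float)
    # Pairwise distance matrix; the diagonal is set to inf so no example matches itself
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)

    nn_distances = dists.min(axis=1)
    median_nn_distance = max(np.median(nn_distances), EPSILON)

    # Examples closer than threshold * median are near duplicates of each other
    radius = threshold * median_nn_distance
    near_duplicate_sets = [np.flatnonzero(row < radius) for row in dists]
    is_issue = np.array([len(s) > 0 for s in near_duplicate_sets])

    # Score = 1 - exp(-d * t) with temperature t = 1 / median; exact duplicates score 0
    scores = 1 - np.exp(-nn_distances / median_nn_distance)
    scores[np.isclose(nn_distances, 0)] = 0.0
    return is_issue, scores, near_duplicate_sets


features = np.array([[0.0, 0.0], [0.0, 0.001], [1.0, 1.0], [2.0, 0.5], [3.0, 3.0]])
is_issue, scores, near_duplicate_sets = near_duplicate_sketch(features)
print(is_issue)  # only the first two examples are flagged
```

Because the full distance matrix is symmetric, the near-duplicate relation comes out reciprocal automatically here; the second pass in `_neighbors_within_radius` is needed because a k-NN graph is not symmetric.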
+ + Made with Sphinx and @pradyunsg's + + Furo + +
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/imbalance.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/imbalance.html new file mode 100644 index 000000000..4d3540be6 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/imbalance.html @@ -0,0 +1,772 @@ + cleanlab.datalab.internal.issue_manager.imbalance - cleanlab
Source code for cleanlab.datalab.internal.issue_manager.imbalance

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, ClassVar
+
+import numpy as np
+import pandas as pd
+from cleanlab.datalab.internal.issue_manager import IssueManager
+
+if TYPE_CHECKING:  # pragma: no cover
+    from cleanlab.datalab.datalab import Datalab
+
+
+
[docs]class ClassImbalanceIssueManager(IssueManager): + """Manages issues related to class imbalance, i.e. examples belonging to an under-represented class. + + Parameters + ---------- + datalab : + The Datalab instance that this issue manager searches for issues in. + + threshold : + Relative threshold for detecting class imbalance. The rarest class is flagged if the fraction of + examples belonging to it is below ``threshold * (1 / K)``, where ``K`` is the number of classes, + i.e. below ``threshold`` times the fraction expected in a perfectly balanced dataset. + + """ + + description: ClassVar[str] = ( + """Examples belonging to the most under-represented class in the dataset.""" + ) + + issue_name: ClassVar[str] = "class_imbalance" + verbosity_levels = { + 0: ["Rarest Class"], + 1: [], + 2: [], + } + + def __init__(self, datalab: Datalab, threshold: float = 0.1, **_): + super().__init__(datalab) + self.threshold = threshold + +
[docs] def find_issues( + self, + **kwargs, + ) -> None: + labels = self.datalab.labels + if not isinstance(labels, np.ndarray): + error_msg = ( + f"Expected labels to be a numpy array of shape (n_samples,) to use with ClassImbalanceIssueManager, " + f"but got {type(labels)} instead." + ) + raise TypeError(error_msg) + K = len(self.datalab.class_names) + class_probs = np.bincount(labels) / len(labels) + rarest_class_idx = int(np.argmin(class_probs)) + # Only one class is identified as rarest; ties go to the class with the smaller integer index + scores = np.where(labels == rarest_class_idx, class_probs[rarest_class_idx], 1) + imbalance_exists = class_probs[rarest_class_idx] < self.threshold * (1 / K) + rarest_class_issue = rarest_class_idx if imbalance_exists else -1 + is_issue_column = labels == rarest_class_issue + rarest_class_name = self.datalab._label_map.get(rarest_class_issue, "NA") + + self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue": is_issue_column, + self.issue_score_key: scores, + }, + ) + self.summary = self.make_summary(score=class_probs[rarest_class_idx]) + self.info = self.collect_info(class_name=rarest_class_name, labels=labels)
+ +
[docs] def collect_info(self, class_name: str, labels: np.ndarray) -> dict: + params_dict = { + "threshold": self.threshold, + "Rarest Class": class_name, + "given_label": labels, + } + info_dict = {**params_dict} + return info_dict
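The imbalance rule in `find_issues` reduces to a few NumPy operations. The sketch below, a hypothetical standalone function `class_imbalance_sketch`, assumes integer labels `0..K-1` and takes `K` from the labels themselves, whereas the manager uses `len(self.datalab.class_names)`.

```python
import numpy as np


def class_imbalance_sketch(labels, threshold=0.1):
    """Hypothetical helper mirroring the rule in ClassImbalanceIssueManager.find_issues."""
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1  # assumption: classes are labeled 0..K-1
    class_probs = np.bincount(labels, minlength=num_classes) / len(labels)
    # np.argmin returns the first minimum, so ties go to the smaller class index
    rarest = int(np.argmin(class_probs))
    # Examples of the rarest class are scored by that class's frequency; all others get 1
    scores = np.where(labels == rarest, class_probs[rarest], 1.0)
    # Flag the rarest class only if it is below threshold times the balanced share 1 / K
    imbalance_exists = class_probs[rarest] < threshold * (1 / num_classes)
    is_issue = (labels == rarest) & imbalance_exists
    return is_issue, scores, rarest


labels = np.array([0] * 50 + [1] * 48 + [2] * 2)
is_issue, scores, rarest = class_imbalance_sketch(labels)
# Class 2 holds 2% of the examples, below 0.1 * (1 / 3), so its 2 examples are flagged
```

Note that the cutoff scales with `1 / K`: with the default `threshold=0.1`, a 3-class dataset flags the rarest class only when it holds less than about 3.3% of the examples.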
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/issue_manager.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/issue_manager.html new file mode 100644 index 000000000..2e9afe5bf --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/issue_manager.html @@ -0,0 +1,1024 @@ + cleanlab.datalab.internal.issue_manager.issue_manager - cleanlab
Source code for cleanlab.datalab.internal.issue_manager.issue_manager

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+from __future__ import annotations
+
+from abc import ABC, ABCMeta, abstractmethod
+from itertools import chain
+from typing import TYPE_CHECKING, Any, ClassVar, Dict, List, Optional, Set, Tuple, Type, TypeVar
+import json
+
+import numpy as np
+import pandas as pd
+
+if TYPE_CHECKING:  # pragma: no cover
+    from cleanlab.datalab.datalab import Datalab
+
+
+T = TypeVar("T", bound="IssueManager")
+TM = TypeVar("TM", bound="IssueManagerMeta")
+
+
+class IssueManagerMeta(ABCMeta):
+    """Metaclass for IssueManager that adds issue_score_key to the class.
+
+    :meta private:
+    """
+
+    issue_name: ClassVar[str]
+    issue_score_key: ClassVar[str]
+    verbosity_levels: ClassVar[Dict[int, List[str]]] = {
+        0: [],
+        1: [],
+        2: [],
+        3: [],
+    }
+
+    def __new__(
+        meta: Type[TM],
+        name: str,
+        bases: Tuple[Type[Any], ...],
+        class_dict: Dict[str, Any],
+    ) -> TM:
+        # Classes that list ABC directly among their bases (i.e. the abstract base class) are left unmodified
+        if ABC in bases:
+            return super().__new__(meta, name, bases, class_dict)
+
+        # Ensure that every verbosity level maps to a list of strings (keys of report items)
+        verbosity_levels = class_dict.get("verbosity_levels", meta.verbosity_levels)
+        for level, level_list in verbosity_levels.items():
+            if not isinstance(level_list, list):
+                raise ValueError(
+                    f"Verbosity levels must be lists. "
+                    f"Got {level_list} in {name}.verbosity_levels"
+                )
+            prohibited_keys = [key for key in level_list if not isinstance(key, str)]
+            if prohibited_keys:
+                raise ValueError(
+                    f"Verbosity levels must be lists of strings. "
+                    f"Got {prohibited_keys} in {name}.verbosity_levels[{level}]"
+                )
+
+        # Concrete classes need to have an issue_name attribute
+        if "issue_name" not in class_dict:
+            raise TypeError("IssueManagers need an issue_name class variable")
+
+        # Add issue_score_key to class
+        class_dict["issue_score_key"] = f"{class_dict['issue_name']}_score"
+        return super().__new__(meta, name, bases, class_dict)
+
+
+
[docs]class IssueManager(ABC, metaclass=IssueManagerMeta): + """Base class for managing data issues of a particular type in a Datalab. + + For each example in a dataset, the IssueManager for a particular type of issue should compute: + - A numeric severity score between 0 and 1, + with values near 0 indicating severe instances of the issue. + - A boolean `is_issue` value, which is True + if we believe this example suffers from the issue in question. + `is_issue` may be determined by thresholding the severity score + (with an a priori determined reasonable threshold value), + or via some other means (e.g. Confident Learning for flagging label issues). + + The IssueManager should also report: + - A global value between 0 and 1 summarizing how severe this issue is in the dataset overall + (e.g. the average severity across all examples in the dataset + or the fraction of examples where `is_issue=True`). + - Other relevant `info` about the issue and examples in the dataset, + and statistics estimated from the current dataset that may be reused + to score this issue in future data. + For example, `info` for label issues could contain the + confident_thresholds, confident_joint, predicted label for each example, etc. + For (near-)duplicate detection, `info` could contain + the sets of examples in the dataset that are (nearly) identical. + + Implementing a new IssueManager: + - Define the `issue_name` class attribute, e.g. "label", "duplicate", "outlier", etc. + - Implement the abstract method `find_issues` and override `collect_info`. + - `find_issues` is responsible for computing the `issues` and `summary` dataframes. + - `collect_info` is responsible for computing the `info` dict. It is called by `find_issues`, + once the manager has set the `issues` and `summary` dataframes as instance attributes. + """ + + description: ClassVar[str] = "" + """Short text that summarizes the type of issues handled by this IssueManager.
+ + :meta hide-value: + """ + issue_name: ClassVar[str] + """Key used to store issue summary results for the assigned Datalab.""" + issue_score_key: ClassVar[str] + """Key used to store issue score results for the assigned Datalab (set to ``f"{issue_name}_score"`` by the metaclass).""" + verbosity_levels: ClassVar[Dict[int, List[str]]] = { + 0: [], + 1: [], + 2: [], + 3: [], + } + """A dictionary mapping verbosity levels to lists of `info` keys (report items) to print. + + :meta hide-value: + + Example + ------- + + >>> verbosity_levels = { + ... 0: [], + ... 1: ["some_info_key"], + ... 2: ["additional_info_key"], + ... } + """ + + def __init__(self, datalab: Datalab, **_): + self.datalab = datalab + self.info: Dict[str, Any] = {} + self.issues: pd.DataFrame = pd.DataFrame() + self.summary: pd.DataFrame = pd.DataFrame() + + def __repr__(self): + class_name = self.__class__.__name__ + return class_name + + @classmethod + def __init_subclass__(cls): + required_class_variables = [ + "issue_name", + ] + for var in required_class_variables: + if not hasattr(cls, var): + raise NotImplementedError(f"Class {cls.__name__} must define class variable {var}") + +
[docs] @abstractmethod + def find_issues(self, *args, **kwargs) -> None: + """Finds occurrences of this particular issue in the dataset. + + Computes the `issues` and `summary` dataframes. Calls `collect_info` to compute the `info` dict. + """ + raise NotImplementedError
+ +
[docs] def collect_info(self, *args, **kwargs) -> dict: + """Collects data for the info attribute of the Datalab. + + NOTE + ---- + This method is called by :py:meth:`find_issues` after :py:meth:`find_issues` has set the `issues` and `summary` dataframes + as instance attributes. + """ + raise NotImplementedError
+ +
[docs] @classmethod + def make_summary(cls, score: float) -> pd.DataFrame: + """Construct a summary dataframe. + + Parameters + ---------- + score : + The overall score for this issue. + + Returns + ------- + summary : + A summary dataframe. + """ + if not 0 <= score <= 1: + raise ValueError(f"Score must be between 0 and 1. Got {score}.") + + return pd.DataFrame( + { + "issue_type": [cls.issue_name], + "score": [score], + }, + )
+ +
[docs] @classmethod + def report( + cls, + issues: pd.DataFrame, + summary: pd.DataFrame, + info: Dict[str, Any], + num_examples: int = 5, + verbosity: int = 0, + include_description: bool = False, + info_to_omit: Optional[List[str]] = None, + ) -> str: + """Compose a report of the issues found by this IssueManager. + + Parameters + ---------- + issues : + An issues dataframe. + + Example + ------- + >>> import pandas as pd + >>> issues = pd.DataFrame( + ... { + ... "is_X_issue": [True, False, True], + ... "X_score": [0.2, 0.9, 0.4], + ... }, + ... ) + + summary : + The summary dataframe. + + Example + ------- + >>> summary = pd.DataFrame( + ... { + ... "issue_type": ["X"], + ... "score": [0.5], + ... }, + ... ) + + info : + The info dict. + + Example + ------- + >>> info = { + ... "A": "val_A", + ... "B": ["val_B1", "val_B2"], + ... } + + num_examples : + The number of most severe examples to print. + + verbosity : + The verbosity level of the report. Each level also prints the `info` keys listed + at all lower levels; a level one above the highest key in `verbosity_levels` prints every key in `info`. + + include_description : + Whether to include a description of the issue in the report. + + info_to_omit : + Keys of `info` to leave out of the report. Columns of `issues` are always left out. + + Returns + ------- + report_str : + A string containing the report. + """ + + max_verbosity = max(cls.verbosity_levels.keys()) + top_level = max_verbosity + 1 + if verbosity not in list(cls.verbosity_levels.keys()) + [top_level]: + raise ValueError( + f"Verbosity level {verbosity} not supported. " + f"Supported levels: {list(cls.verbosity_levels.keys())}. " + f"Use verbosity={top_level} to print all info."
+ ) + if issues.empty: + print("No issues found") + + topk_ids = issues.sort_values(by=cls.issue_score_key, ascending=True).index[:num_examples] + + score = summary["score"].loc[0] + report_str = f"{' ' + cls.issue_name + ' issues ':-^60}\n\n" + + if include_description and cls.description: + description = cls.description + if verbosity == 0: + description = description.split("\n\n", maxsplit=1)[0] + report_str += "About this issue:\n\t" + description + "\n\n" + report_str += ( + f"Number of examples with this issue: {issues[f'is_{cls.issue_name}_issue'].sum()}\n" + f"Overall dataset quality in terms of this issue: {score:.4f}\n\n" + ) + + info_to_print: Set[str] = set() + _info_to_omit = set(issues.columns).union(info_to_omit or []) + verbosity_levels_values = chain.from_iterable( + list(cls.verbosity_levels.values())[: verbosity + 1] + ) + info_to_print.update(set(verbosity_levels_values) - _info_to_omit) + if verbosity == top_level: + info_to_print.update(set(info.keys()) - _info_to_omit) + + report_str += "Examples representing most severe instances of this issue:\n" + report_str += issues.loc[topk_ids].to_string() + + def truncate(s, max_len=4) -> str: + if hasattr(s, "shape") or hasattr(s, "ndim"): + s = np.array(s) + if s.ndim > 1: + description = f"array of shape {s.shape}\n" + with np.printoptions(threshold=max_len): + if s.ndim == 2: + description += f"{s}" + if s.ndim > 2: + description += f"{s}" + return description + s = s.tolist() + + if isinstance(s, list): + if all([isinstance(s_, list) for s_ in s]): + return truncate(np.array(s, dtype=object), max_len=max_len) + if len(s) > max_len: + s = s[:max_len] + ["..."] + return str(s) + + if info_to_print: + info_to_print_dict = {key: info[key] for key in info_to_print} + # Print the info dict, truncating arrays to 4 elements + report_str += "\n\nAdditional Information: " + for key, value in info_to_print_dict.items(): + if key == "statistics": + continue + if isinstance(value, dict): + report_str += f"\n{key}:\n{json.dumps(value, indent=4)}" + elif 
isinstance(value, pd.DataFrame): + max_rows = 5 + df_str = value.head(max_rows).to_string() + if len(value) > max_rows: + df_str += f"\n... (total {len(value)} rows)" + report_str += f"\n{key}:\n{df_str}" + else: + report_str += f"\n{key}: {truncate(value)}" + return report_str
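The metaclass at the top of this module (`IssueManagerMeta`) is what gives every concrete manager its `issue_score_key`. The standalone sketch below reproduces only that mechanism with hypothetical names (`ScoreKeyMeta`, `BaseManager`, `ToyManager`). It does not import cleanlab and is not its API. Concrete subclasses get `issue_score_key` derived from `issue_name`, and a subclass without `issue_name` is rejected when the class is created.

```python
from abc import ABC, ABCMeta, abstractmethod
from typing import Any, Dict, Tuple


class ScoreKeyMeta(ABCMeta):
    """Hypothetical metaclass: derives issue_score_key from issue_name on concrete classes."""

    def __new__(meta, name: str, bases: Tuple[type, ...], class_dict: Dict[str, Any]):
        # The abstract base lists ABC directly among its bases and is left untouched
        if ABC in bases:
            return super().__new__(meta, name, bases, class_dict)
        # Every other class must declare its own issue_name
        if "issue_name" not in class_dict:
            raise TypeError(f"{name} needs an issue_name class variable")
        class_dict["issue_score_key"] = f"{class_dict['issue_name']}_score"
        return super().__new__(meta, name, bases, class_dict)


class BaseManager(ABC, metaclass=ScoreKeyMeta):
    @abstractmethod
    def find_issues(self) -> None: ...


class ToyManager(BaseManager):
    issue_name = "toy"

    def find_issues(self) -> None:
        pass


print(ToyManager.issue_score_key)  # toy_score

try:
    class Broken(BaseManager):  # no issue_name, so the metaclass rejects it
        def find_issues(self) -> None:
            pass
except TypeError as err:
    missing_name_error = str(err)
    print(missing_name_error)  # Broken needs an issue_name class variable
```

Checking in the metaclass makes a missing `issue_name` fail at import time rather than when the manager is first used.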
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/label.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/label.html new file mode 100644 index 000000000..313774b70 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/label.html @@ -0,0 +1,967 @@ + cleanlab.datalab.internal.issue_manager.label - cleanlab
Source code for cleanlab.datalab.internal.issue_manager.label

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, ClassVar, Dict, Optional
+
+import numpy as np
+from sklearn.neighbors import KNeighborsClassifier
+from sklearn.preprocessing import OneHotEncoder
+
+from cleanlab.classification import CleanLearning
+from cleanlab.count import get_confident_thresholds
+from cleanlab.datalab.internal.issue_manager import IssueManager
+from cleanlab.internal.validation import assert_valid_inputs
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+    import pandas as pd
+
+    from cleanlab.datalab.datalab import Datalab
+
+
+
[docs]class LabelIssueManager(IssueManager): + """Manages label issues in a Datalab. + + Parameters + ---------- + datalab : + A Datalab instance. + + k : + The number of nearest neighbors to consider when computing pred_probs from features. + Only applicable if features are provided and pred_probs are not. + + clean_learning_kwargs : + Keyword arguments to pass to the :py:class:`CleanLearning <cleanlab.classification.CleanLearning>` constructor. + + health_summary_parameters : + Keyword arguments to pass to the :py:func:`health_summary <cleanlab.dataset.health_summary>` function. + """ + + description: ClassVar[ + str + ] = """Examples whose given label is estimated to be potentially incorrect + (e.g. due to annotation error) are flagged as having label issues. + """ + + issue_name: ClassVar[str] = "label" + verbosity_levels = { + 0: [], + 1: [], + 2: [], + 3: ["classes_by_label_quality", "overlapping_classes"], + } + + def __init__( + self, + datalab: Datalab, + k: int = 10, + clean_learning_kwargs: Optional[Dict[str, Any]] = None, + health_summary_parameters: Optional[Dict[str, Any]] = None, + **_, + ): + super().__init__(datalab) + self.cl = CleanLearning(**(clean_learning_kwargs or {})) + self.k = k + self.health_summary_parameters: Dict[str, Any] = ( + health_summary_parameters.copy() if health_summary_parameters else {} + ) + self._find_issues_inputs: Dict[str, bool] = {"features": False, "pred_probs": False} + self._reset() + + @staticmethod + def _process_find_label_issues_kwargs(**kwargs) -> Dict[str, Any]: + """Searches for keyword arguments that are meant for the + CleanLearning.find_label_issues method call. + + Examples + -------- + >>> from cleanlab.datalab.internal.issue_manager.label import LabelIssueManager + >>> LabelIssueManager._process_find_label_issues_kwargs(thresholds=[0.1, 0.9]) + {'thresholds': [0.1, 0.9]} + """ + accepted_kwargs = [ + "thresholds", + "noise_matrix", + "inverse_noise_matrix", + "save_space", + "clf_kwargs", +
"validation_func", + ] + return {k: v for k, v in kwargs.items() if k in accepted_kwargs and v is not None} + + def _reset(self) -> None: + """Reset the attributes of this manager based on the available datalab info + and the keyword arguments stored as instance attributes. + + This allows the builder to use pre-computed info from the datalab to speed up + some computations in the :py:meth:`find_issues` method. + """ + if not self.health_summary_parameters: + statistics_dict = self.datalab.get_info("statistics") + self.health_summary_parameters = { + "labels": self.datalab.labels, + "class_names": list(self.datalab._label_map.values()), + "num_examples": statistics_dict.get("num_examples"), + "joint": statistics_dict.get("joint", None), + "confident_joint": statistics_dict.get("confident_joint", None), + "multi_label": statistics_dict.get("multi_label", None), + "asymmetric": statistics_dict.get("asymmetric", None), + "verbose": False, + } + self.health_summary_parameters = { + k: v for k, v in self.health_summary_parameters.items() if v is not None + } + +
[docs] def find_issues( + self, + pred_probs: Optional[npt.NDArray] = None, + features: Optional[npt.NDArray] = None, + **kwargs, + ) -> None: + """Find label issues in the datalab. + + Parameters + ---------- + pred_probs : + The predicted probabilities for each example. + + features : + The features for each example. Only used when `pred_probs` is not provided, + in which case out-of-sample `pred_probs` are estimated with a k-nearest-neighbors classifier. + """ + if pred_probs is not None: + self._find_issues_inputs.update({"pred_probs": True}) + if pred_probs is None: + self._find_issues_inputs.update({"features": True}) + if features is None: + raise ValueError( + "Either pred_probs or features must be provided to find label issues." + ) + # produce out-of-sample pred_probs from features + labels = self.datalab.labels + if not isinstance(labels, np.ndarray): + error_msg = ( + f"Expected labels to be a numpy array of shape (n_samples,) to use in LabelIssueManager, " + f"but got {type(labels)} instead." + ) + raise TypeError(error_msg) + + knn = KNeighborsClassifier(n_neighbors=self.k + 1) + knn.fit(features, labels) + pred_probs = knn.predict_proba(features) + + encoder = OneHotEncoder() + label_transform = labels.reshape(-1, 1) + one_hot_label = encoder.fit_transform(label_transform) + + # Adjust pred_probs to be out-of-sample: each example counts among its own (k+1) nearest neighbors, + # so remove its own vote and renormalize over the remaining k neighbors + pred_probs = np.asarray( + (pred_probs - 1 / (self.k + 1) * one_hot_label) * (self.k + 1) / self.k + ) + + self.health_summary_parameters.update({"pred_probs": pred_probs}) + # Find examples with label issues + labels = self.datalab.labels + self.issues = self.cl.find_label_issues( + labels=labels, + pred_probs=pred_probs, + **self._process_find_label_issues_kwargs(**kwargs), + ) + self.issues.rename(columns={"label_quality": self.issue_score_key}, inplace=True) + + summary_dict = self.get_health_summary(pred_probs=pred_probs) + + # Get a 
summarized dataframe of the label issues + self.summary = self.make_summary(score=summary_dict["overall_label_health_score"]) + + confident_thresholds = get_confident_thresholds(labels=labels, pred_probs=pred_probs) + # Collect info about the label issues + self.info = self.collect_info( + issues=self.issues, + summary_dict=summary_dict, + confident_thresholds=confident_thresholds, + ) + + # Drop columns from issues that are in the info + self.issues = self.issues.drop(columns=["given_label", "predicted_label"])
+ +
[docs] def get_health_summary(self, pred_probs) -> dict: + """Returns a short summary of the health of this Lab.""" + from cleanlab.dataset import health_summary + + # Validate input + self._validate_pred_probs(pred_probs) + + summary_kwargs = self._get_summary_parameters(pred_probs) + summary = health_summary(**summary_kwargs) + return summary
+ + def _get_summary_parameters(self, pred_probs) -> Dict[str, Any]: + """Collects a set of input parameters for the health summary function based on + any info available in the datalab. + + Parameters + ---------- + pred_probs : + The predicted probabilities for each example. + + Returns + ------- + summary_parameters : + A dictionary of parameters to pass to the health summary function. + """ + if "confident_joint" in self.health_summary_parameters: + summary_parameters = { + "confident_joint": self.health_summary_parameters["confident_joint"] + } + elif all([x in self.health_summary_parameters for x in ["joint", "num_examples"]]): + summary_parameters = { + k: self.health_summary_parameters[k] for k in ["joint", "num_examples"] + } + else: + summary_parameters = { + "pred_probs": pred_probs, + "labels": self.datalab.labels, + } + + summary_parameters["class_names"] = self.health_summary_parameters["class_names"] + + for k in ["asymmetric", "verbose"]: + # Copy these options over from health_summary_parameters, if they are set + if k in self.health_summary_parameters: + summary_parameters[k] = self.health_summary_parameters[k] + + return ( + summary_parameters # will be called in `dataset.health_summary(**summary_parameters)` + ) + +
[docs] def collect_info( + self, issues: pd.DataFrame, summary_dict: dict, confident_thresholds: np.ndarray + ) -> dict: + issues_info = { + "num_label_issues": sum(issues[f"is_{self.issue_name}_issue"]), + "average_label_quality": issues[self.issue_score_key].mean(), + "given_label": issues["given_label"].tolist(), + "predicted_label": issues["predicted_label"].tolist(), + } + + health_summary_info = { + "confident_joint": summary_dict["joint"], + "classes_by_label_quality": summary_dict["classes_by_label_quality"], + "overlapping_classes": summary_dict["overlapping_classes"], + } + + cl_info = {} + for k in self.cl.__dict__: + if k not in ["py", "noise_matrix", "inverse_noise_matrix", "confident_joint"]: + continue + cl_info[k] = self.cl.__dict__[k] + + info_dict = { + **issues_info, + **health_summary_info, + **cl_info, + "confident_thresholds": confident_thresholds.tolist(), + "find_issues_inputs": self._find_issues_inputs, + } + + return info_dict
+ + def _validate_pred_probs(self, pred_probs) -> None: + assert_valid_inputs(X=None, y=self.datalab.labels, pred_probs=pred_probs)
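The out-of-sample adjustment in `find_issues` relies on each example being one of its own `k + 1` nearest neighbors when the k-NN model is evaluated on its own training features. The sketch below reproduces that arithmetic with brute-force NumPy distances instead of scikit-learn's `KNeighborsClassifier`. The function name is hypothetical, and the sketch assumes integer labels `0..K-1`, uniform neighbor weights, and that no example has more than `k` exact duplicates (so it always appears in its own neighbor list).

```python
import numpy as np


def out_of_sample_knn_pred_probs(features, labels, k=10):
    """Hypothetical helper: leave-one-out k-NN class probabilities."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    one_hot = np.eye(int(y.max()) + 1)[y]  # shape (N, K)

    # Indices of the k + 1 closest points per row; the example itself (distance 0) is among them
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbor_ids = np.argsort(dists, axis=1, kind="stable")[:, : k + 1]

    # In-sample probabilities: label frequencies among the k + 1 neighbors (uniform weights)
    in_sample = one_hot[neighbor_ids].mean(axis=1)

    # Remove the example's own vote and renormalize over the remaining k neighbors,
    # the same formula as (pred_probs - one_hot / (k + 1)) * (k + 1) / k in find_issues
    return (in_sample - one_hot / (k + 1)) * (k + 1) / k


features = np.array([[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]])
labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])  # example 3 sits in the class-0 cluster
pred_probs = out_of_sample_knn_pred_probs(features, labels, k=3)
print(pred_probs[3])  # its own label 1 gets no support from its 3 neighbors
```

Without the adjustment, every example would vote for its own given label, which biases the probabilities toward agreeing with the labels being audited.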
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/multilabel/label.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/multilabel/label.html new file mode 100644 index 000000000..a3f4cef4a --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/multilabel/label.html @@ -0,0 +1,829 @@ + cleanlab.datalab.internal.issue_manager.multilabel.label - cleanlab
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.datalab.internal.issue_manager.multilabel.label

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, ClassVar, Dict, List
+
+import pandas as pd
+
+from cleanlab.datalab.internal.issue_manager import IssueManager
+from cleanlab.internal.multilabel_utils import onehot2int
+from cleanlab.multilabel_classification.filter import find_label_issues
+from cleanlab.multilabel_classification.rank import get_label_quality_scores
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+    import pandas as pd
+
+    from cleanlab.datalab.datalab import Datalab
+
+
+
[docs]class MultilabelIssueManager(IssueManager): + """Manages label issues in Datalab for multilabel tasks. + + Parameters + ---------- + datalab : + A Datalab instance. + """ + + description: ClassVar[ + str + ] = """Examples whose given label(s) are estimated to be potentially incorrect + (e.g. due to annotation error) are flagged as having label issues. + """ + + _PREDICTED_LABEL_THRESH = 0.5 + """Internal variable specifying threshold for predicted label.""" + + issue_name: ClassVar[str] = "label" + verbosity_levels = { + 0: [], + 1: [], + 2: [], + 3: [], + } + + def __init__( + self, + datalab: Datalab, + **_, + ): + super().__init__(datalab) + + @staticmethod + def _process_find_label_issues_kwargs(**kwargs: Dict[str, Any]) -> Dict[str, Any]: + """Searches for keyword arguments that are meant for the + multilabel_classification.filter.find_label_issues method call. + + Examples + -------- + >>> from cleanlab.datalab.internal.issue_manager.multilabel.label import MultilabelIssueManager + >>> MultilabelIssueManager._process_find_label_issues_kwargs(frac_noise=0.9) + {'frac_noise': 0.9} + """ + accepted_kwargs = [ + "filter_by", + "frac_noise", + "num_to_remove_per_class", + "min_examples_per_class", + "confident_joint", + "n_jobs", + "verbose", + "low_memory", + ] + return {k: v for k, v in kwargs.items() if k in accepted_kwargs and v is not None} + + @staticmethod + def _process_get_label_quality_scores_kwargs(**kwargs: Dict[str, Any]) -> Dict[str, Any]: + """Searches for keyword arguments that are meant for the + multilabel_classification.rank.get_label_quality_scores method call. 
+ + Examples + -------- + >>> from cleanlab.datalab.internal.issue_manager.multilabel.label import MultilabelIssueManager + >>> MultilabelIssueManager._process_get_label_quality_scores_kwargs(method="self_confidence") + {'method': 'self_confidence'} + """ + accepted_kwargs = ["method", "adjust_pred_probs", "aggregator_kwargs"] + return {k: v for k, v in kwargs.items() if k in accepted_kwargs and v is not None} + +
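The two static methods above share one pattern. A single `**kwargs` dict passed to `find_issues` is split between two downstream functions. Each function keeps only the keys it accepts and drops values that are `None`. The sketch below isolates that pattern. `route_kwargs` is a hypothetical helper, and the accepted-name lists are copied from the methods above.

```python
from typing import Any, Dict, Iterable

# Accepted keyword names, copied from the two static methods above
FIND_LABEL_ISSUES_KWARGS = [
    "filter_by", "frac_noise", "num_to_remove_per_class", "min_examples_per_class",
    "confident_joint", "n_jobs", "verbose", "low_memory",
]
LABEL_QUALITY_SCORES_KWARGS = ["method", "adjust_pred_probs", "aggregator_kwargs"]


def route_kwargs(accepted: Iterable[str], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: keep accepted keys whose value is not None."""
    accepted_set = set(accepted)
    return {k: v for k, v in kwargs.items() if k in accepted_set and v is not None}


# One mixed kwargs dict, split between the two downstream functions
user_kwargs = {"frac_noise": 0.9, "method": "self_confidence", "n_jobs": None, "k": 10}
print(route_kwargs(FIND_LABEL_ISSUES_KWARGS, **user_kwargs))     # {'frac_noise': 0.9}
print(route_kwargs(LABEL_QUALITY_SCORES_KWARGS, **user_kwargs))  # {'method': 'self_confidence'}
```

Dropping `None` values means a caller cannot explicitly pass `None` to override a non-`None` default in the downstream function. In exchange, unset options can be forwarded blindly.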
[docs] def find_issues( + self, + pred_probs: npt.NDArray, + **kwargs, + ) -> None: + """Find label issues in a multilabel dataset. + + Parameters + ---------- + pred_probs : + The predicted probabilities for each example. + """ + predicted_labels = onehot2int(pred_probs > self._PREDICTED_LABEL_THRESH) + + # Find examples with label issues + assert isinstance(self.datalab.labels, List) # Type Narrowing + is_issue_column = find_label_issues( + labels=self.datalab.labels, + pred_probs=pred_probs, + **self._process_find_label_issues_kwargs(**kwargs), + ) + scores = get_label_quality_scores( + labels=self.datalab.labels, + pred_probs=pred_probs, + **self._process_get_label_quality_scores_kwargs(**kwargs), + ) + + self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue": is_issue_column, + self.issue_score_key: scores, + }, + ) + # Get a summarized dataframe of the label issues + self.summary = self.make_summary(score=scores.mean()) + + # Collect info about the label issues + self.info = self.collect_info(self.datalab.labels, predicted_labels)
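`find_issues` derives predicted labels by thresholding each per-class probability at `_PREDICTED_LABEL_THRESH = 0.5`. It then converts the boolean one-hot matrix into label lists with `onehot2int`. The sketch below assumes `onehot2int` returns, for each row, the list of column indices that are `True`. `predicted_label_lists` is a hypothetical stand-in, not the library helper.

```python
from typing import List

import numpy as np


def predicted_label_lists(pred_probs: np.ndarray, thresh: float = 0.5) -> List[List[int]]:
    """Hypothetical stand-in for onehot2int(pred_probs > thresh).

    In multilabel data each column is an independent per-class probability,
    so a row may yield zero, one, or several predicted classes.
    """
    onehot = pred_probs > thresh  # boolean (N, K) matrix; values equal to thresh are excluded
    return [np.flatnonzero(row).tolist() for row in onehot]


pred_probs = np.array([
    [0.9, 0.2, 0.7],   # classes 0 and 2
    [0.1, 0.4, 0.3],   # no class above 0.5
    [0.5, 0.51, 0.0],  # 0.5 is not strictly greater than 0.5
])
print(predicted_label_lists(pred_probs))  # [[0, 2], [], [1]]
```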
+ +
[docs] def collect_info( + self, given_labels: List[List[int]], predicted_labels: List[List[int]] + ) -> Dict[str, Any]: + issues_info = { + "given_label": given_labels, + "predicted_label": predicted_labels, + } + return issues_info
+
+
+
+
+ + +
+
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/noniid.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/noniid.html new file mode 100644 index 000000000..2a8a594ad --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/noniid.html @@ -0,0 +1,1125 @@ + cleanlab.datalab.internal.issue_manager.noniid - cleanlab
+
+
+ +
+
+
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.datalab.internal.issue_manager.noniid

+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, Callable, ClassVar, Dict, Optional, Union, cast
+import itertools
+
+from scipy.stats import gaussian_kde
+import numpy as np
+import pandas as pd
+from scipy.sparse import csr_matrix
+
+from cleanlab.datalab.internal.issue_manager import IssueManager
+from cleanlab.datalab.internal.issue_manager.knn_graph_helpers import knn_exists, set_knn_graph
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+    from cleanlab.datalab.datalab import Datalab
+
+
+
[docs]def simplified_kolmogorov_smirnov_test( + neighbor_histogram: npt.NDArray[np.float64], + non_neighbor_histogram: npt.NDArray[np.float64], +) -> float: + """Computes the Kolmogorov-Smirnov statistic between two groups of data. + The statistic is the largest difference between the empirical cumulative + distribution functions (ECDFs) of the two groups. + + Parameters + ---------- + neighbor_histogram : + Histogram data for the nearest neighbor group. + + non_neighbor_histogram : + Histogram data for the non-neighbor group. + + Returns + ------- + statistic : + The KS statistic between the two ECDFs. + + Note + ---- + - Both input arrays should have the same length. + - The input arrays are histograms, which means they contain the count + or frequency of values in each group. The data in the histograms + should be normalized so that they sum to one. + + To calculate the KS statistic, the function first calculates the ECDFs + for both input arrays, which are step functions that show the cumulative + sum of the data up to each point. The function then calculates the + largest absolute difference between the two ECDFs. + """ + + neighbor_cdf = np.cumsum(neighbor_histogram) + non_neighbor_cdf = np.cumsum(non_neighbor_histogram) + + statistic = np.max(np.abs(neighbor_cdf - non_neighbor_cdf)) + return statistic
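The KS statistic described above is the largest gap between two cumulative sums. The sketch below computes it for a small pair of histograms. `ks_from_histograms` is a hypothetical name. Unlike `simplified_kolmogorov_smirnov_test`, which expects inputs already normalized to sum to one, this sketch normalizes raw counts itself and checks that the lengths match.

```python
import numpy as np


def ks_from_histograms(hist_a, hist_b) -> float:
    """Largest absolute gap between the ECDFs of two histograms."""
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("histograms must have the same length")
    a, b = a / a.sum(), b / b.sum()  # make each histogram sum to one
    return float(np.max(np.abs(np.cumsum(a) - np.cumsum(b))))


# ECDFs [0.5, 0.8, 1.0] vs [0.2, 0.5, 1.0]: largest gap is 0.3, up to float rounding
print(ks_from_histograms([5, 3, 2], [2, 3, 5]))
```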
+ + +
[docs]class NonIIDIssueManager(IssueManager): + """Manages issues related to non-iid data distributions. + + Parameters + ---------- + datalab : + The Datalab instance that this issue manager searches for issues in. + + metric : + The distance metric used to compute the KNN graph of the examples in the dataset. + If set to `None`, the metric will be automatically selected based on the dimensionality + of the features used to represent the examples in the dataset. + + k : + The number of nearest neighbors to consider when computing the KNN graph of the examples. + + num_permutations : + The number of trials to run when performing permutation testing to determine whether + the distribution of index-distances between neighbors in the dataset is IID or not. + + seed : + Random seed for the permutation test. If `None`, the random number generator is not seeded. + + significance_threshold : + The dataset is considered non-IID if the permutation-test p-value is below this threshold. + + Note + ---- + This class will only flag a single example as an issue if the dataset is considered non-IID. This type of issue + is more relevant to the dataset as a whole than to individual examples. + + """ + + description: ClassVar[ + str + ] = """Whether the dataset exhibits statistically significant + violations of the IID assumption like: + changepoints or shift, drift, autocorrelation, etc. + The specific violation considered is whether the + examples are ordered such that nearby examples (in dataset order) + tend to have more similar feature values.
+ """ + issue_name: ClassVar[str] = "non_iid" + verbosity_levels = { + 0: ["p-value"], + 1: [], + 2: [], + } + + def __init__( + self, + datalab: Datalab, + metric: Optional[Union[str, Callable]] = None, + k: int = 10, + num_permutations: int = 25, + seed: Optional[int] = 0, + significance_threshold: float = 0.05, + **_, + ): + super().__init__(datalab) + self.metric = metric + self.k = k + self.num_permutations = num_permutations + self.tests = { + "ks": simplified_kolmogorov_smirnov_test, + } + self.background_distribution = None + self.seed = seed + self.significance_threshold = significance_threshold + + # TODO: Temporary flag introduced to decide on storing knn graphs based on pred_probs. + # Revisit and finalize the implementation. + self._skip_storing_knn_graph_for_pred_probs: bool = False + + @staticmethod + def _determine_optional_features( + features: Optional[npt.NDArray], + pred_probs: Optional[np.ndarray], + ) -> Optional[npt.NDArray]: + """ + Determines the feature array to use for constructing a knn-graph, prioritizing the original features array over pred_probs. + If neither is provided, returns None. + + Parameters + ---------- + features : + Original feature array or None. + + pred_probs : + Predicted probabilities array or None. + + Returns + ------- + features_to_use : + Either the original feature array or the predicted probabilities array, + intended for constructing the knn-graph. + + Notes + ----- + A knn-graph constructed from predicted probabilities should not be stored in the statistics, but this kind + of knn-graph may still be used to run the non-IID check. + """ + if features is not None: + return features + + if pred_probs is not None: + return pred_probs + + return None + +
[docs] def find_issues( + self, + features: Optional[npt.NDArray] = None, + pred_probs: Optional[np.ndarray] = None, + **kwargs, + ) -> None: + statistics = self.datalab.get_info("statistics") + + # Crucial when building knn graphs with pred_probs instead of features, where only the + # latter is preferred for storage. + self._determine_if_knn_graph_storage_should_be_skipped( + features, pred_probs, kwargs, statistics, self.k + ) + + knn_graph, self.metric, _ = set_knn_graph( + features=self._determine_optional_features(features, pred_probs), + find_issues_kwargs=kwargs, + metric=self.metric, + k=self.k, + statistics=statistics, + ) + + self.neighbor_index_choices = self._get_neighbors(knn_graph=knn_graph) + + self.num_neighbors = self.k + + indices = np.arange(self.N) + self.neighbor_index_distances = np.abs(indices.reshape(-1, 1) - self.neighbor_index_choices) + + self.statistics = self._get_statistics(self.neighbor_index_distances) + + self.p_value = self._permutation_test(num_permutations=self.num_permutations) + + scores = self._score_dataset() + issue_mask = np.zeros(self.N, dtype=bool) + if self.p_value < self.significance_threshold: + issue_mask[scores.argmin()] = True + self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue": issue_mask, + self.issue_score_key: scores, + }, + ) + + self.summary = self.make_summary(score=self.p_value) + + self.info = self.collect_info(knn_graph=knn_graph)
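The core quantity in `find_issues` is `np.abs(indices.reshape(-1, 1) - self.neighbor_index_choices)`. This is the distance in dataset order between each example and each of its k nearest neighbors, computed by broadcasting. The sketch below makes that concrete. The library obtains neighbors from a knn graph via `set_knn_graph`. Here, as an assumption for illustration, a brute-force Euclidean kNN (`knn_indices`, hypothetical) is swapped in. When a feature drifts with row order, neighbors are also close in index. Shuffling the rows destroys that pattern.

```python
import numpy as np


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k nearest neighbors by Euclidean distance, excluding self."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def neighbor_index_distances(neighbors: np.ndarray) -> np.ndarray:
    """|i - j| for every example i and each of its neighbors j (broadcasting)."""
    indices = np.arange(neighbors.shape[0])
    return np.abs(indices.reshape(-1, 1) - neighbors)


rng = np.random.default_rng(0)
X_drift = np.arange(20, dtype=float).reshape(-1, 1)  # feature drifts with row order
X_shuffled = X_drift[rng.permutation(20)]            # same values, order destroyed

ordered = neighbor_index_distances(knn_indices(X_drift, k=2))
shuffled = neighbor_index_distances(knn_indices(X_shuffled, k=2))
print(ordered.mean(), shuffled.mean())  # 1.05 for the ordered data; much larger once shuffled
```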
+ + def _determine_if_knn_graph_storage_should_be_skipped( + self, features, pred_probs, kwargs, statistics, k + ) -> None: + """Decide whether to skip storing the knn graph based on the availability of pred_probs. + + Storage should only be skipped when a new knn graph needs to be computed and + it can only be computed from pred_probs. + """ + sufficient_knn_graph_available = knn_exists(kwargs, statistics, k) + pred_probs_needed = ( + not sufficient_knn_graph_available and features is None and pred_probs is not None + ) + if pred_probs_needed: + self._skip_storing_knn_graph_for_pred_probs = True + +
[docs] def collect_info(self, knn_graph: csr_matrix) -> dict: + issues_dict = { + "p-value": self.p_value, + } + + params_dict = { + "metric": self.metric, + "k": self.k, + } + + statistics_dict = self._build_statistics_dictionary(knn_graph=knn_graph) + + info_dict = { + **issues_dict, + **params_dict, # type: ignore[arg-type] + **statistics_dict, # type: ignore[arg-type] + } + return info_dict
+ + def _build_statistics_dictionary(self, knn_graph: csr_matrix) -> Dict[str, Dict[str, Any]]: + statistics_dict: Dict[str, Dict[str, Any]] = {"statistics": {}} + + if self._skip_storing_knn_graph_for_pred_probs: + return statistics_dict + # Add the knn graph as a statistic if necessary + graph_key = "weighted_knn_graph" + old_knn_graph = self.datalab.get_info("statistics").get(graph_key, None) + old_graph_exists = old_knn_graph is not None + prefer_new_graph = ( + (knn_graph is not None and not old_graph_exists) + or knn_graph.nnz > old_knn_graph.nnz + or self.metric != self.datalab.get_info("statistics").get("knn_metric", None) + ) + if prefer_new_graph: + statistics_dict["statistics"][graph_key] = knn_graph + if self.metric is not None: + statistics_dict["statistics"]["knn_metric"] = self.metric + + return statistics_dict + + def _permutation_test(self, num_permutations) -> float: + N = self.N + + if self.seed is not None: + np.random.seed(self.seed) + perms = np.fromiter( + itertools.chain.from_iterable( + np.random.permutation(N) for i in range(num_permutations) + ), + dtype=int, + ).reshape(num_permutations, N) + + neighbor_index_choices = self.neighbor_index_choices + neighbor_index_choices = neighbor_index_choices.reshape(1, *neighbor_index_choices.shape) + perm_neighbor_choices = perms[:, neighbor_index_choices].reshape( + num_permutations, *neighbor_index_choices.shape[1:] + ) + neighbor_index_distances = np.abs(perms[..., None] - perm_neighbor_choices).reshape( + num_permutations, -1 + ) + + statistics = [] + for neighbor_index_dist in neighbor_index_distances: + stats = self._get_statistics( + neighbor_index_dist, + ) + statistics.append(stats) + + ks_stats = np.array([stats["ks"] for stats in statistics]) + ks_stats_kde = gaussian_kde(ks_stats) + p_value = ks_stats_kde.integrate_box(self.statistics["ks"], 100) + + return p_value + + def _score_dataset(self) -> npt.NDArray[np.float64]: + """This function computes a variant of the KS statistic for each 
datapoint. Rather than computing the maximum difference + between the CDF of the neighbor distances (foreground + distribution) and the CDF of all index distances + (background distribution), we compute the absolute difference + in area-under-the-curve of the two CDFs. + + The foreground distribution is computed by sampling the + neighbor distances from the KNN graph, but the background + distribution is computed analytically. The background CDF for + a datapoint i can be split up into three parts. Let d = min(i, + N - i - 1). + + 1. For 0 < j <= d, the slope of the CDF is 2 / (N - 1) since + there are two datapoints in the dataset that are distance j + from datapoint i. We call d the 'double distance + threshold'. + + 2. For d < j <= N - d - 1, the slope of the CDF is + 1 / (N - 1) since there is only one datapoint in the dataset + that is distance j from datapoint i. + + 3. For j > N - d - 1, the slope of the CDF is 0 and is + constant at 1.0 since there are no datapoints in the dataset + that are distance j from datapoint i. + + We compute the area differences on each of the k intervals on + which the foreground CDF is constant. The background CDF may + intersect the foreground CDF within such an interval; we do not + account for these crossings when computing the absolute AUC difference. + + The algorithm is as follows: sort the k sampled neighbor + distances. Then, for each of the k neighbor distances sampled, + compute the AUC for each CDF up to that point. Then, subtract + from each area the previous area in the sorted order to get + the AUC of the CDF on the interval between those two + points. Subtract the background interval AUCs from the + foreground interval AUCs, take the absolute value, and + sum. The algorithm is vectorized such that this statistic is + computed for each of the N datapoints simultaneously.
+ + The statistics are then normalized by their respective maximum + possible distance (N - d - 1) and then mapped to [0,1] via + tanh. + """ + N = self.N + + sorted_neighbors = np.sort(self.neighbor_index_distances, axis=1) + + # find the maximum distance that occurs with double probability + middle_idx = np.floor((N - 1) / 2).astype(int) + double_distances = np.arange(N).reshape(N, 1) + double_distances[double_distances > middle_idx] -= N - 1 + double_distances = np.abs(double_distances) + + sorted_neighbors = np.hstack([sorted_neighbors, np.ones((N, 1)) * (N - 1)]).astype(int) + + # the set of distances that are less than the double distance threshold + set_beginning = sorted_neighbors <= double_distances + # the set of distances that are greater than the double distance threshold but have nonzero probability + set_middle = (sorted_neighbors > double_distances) & ( + sorted_neighbors <= (N - double_distances - 1) + ) + # the set of distances that occur with 0 probability + set_end = sorted_neighbors > (N - double_distances - 1) + + shifted_neighbors = np.zeros(sorted_neighbors.shape) + shifted_neighbors[:, 1:] = sorted_neighbors[:, :-1] + diffs = sorted_neighbors - shifted_neighbors # the distances between the sorted indices + + area_beginning = (double_distances**2) / (N - 1) + length = N - 2 * double_distances - 1 + a = 2 * double_distances / (N - 1) + area_middle = 0.5 * (a + 1) * length + + # compute the area under the CDF for each of the indices in sorted_neighbors + background_area = np.zeros(diffs.shape) + background_diffs = np.zeros(diffs.shape) + background_area[set_beginning] = ((sorted_neighbors**2) / (N - 1))[set_beginning] + background_area[set_middle] = ( + area_beginning + + 0.5 + * ( + (sorted_neighbors + 3 * double_distances) + * (sorted_neighbors - double_distances) + / (N - 1) + ) + )[set_middle] + background_area[set_end] = ( + area_beginning + area_middle + (sorted_neighbors - (N - double_distances - 1) * 1.0) + )[set_end] + + # compute the 
area under the CDF between indices in sorted_neighbors + shifted_background = np.zeros(background_area.shape) + shifted_background[:, 1:] = background_area[:, :-1] + background_diffs = background_area - shifted_background + + # compute the foreground CDF and AUC between indices in sorted_neighbors + foreground_cdf = np.arange(sorted_neighbors.shape[1]) / (sorted_neighbors.shape[1] - 1) + foreground_diffs = foreground_cdf.reshape(1, -1) * diffs + + # compute the differences between foreground and background area intervals + area_diffs = np.abs(foreground_diffs - background_diffs) + stats = np.sum(area_diffs, axis=1) + + # normalize scores by the index and transform to [0, 1] + indices = np.arange(N) + reverse = N - indices + normalizer = np.where(indices > reverse, indices, reverse) + + scores = stats / normalizer + scores = np.tanh(-1 * scores) + 1 + return scores + + def _get_neighbors(self, knn_graph: csr_matrix) -> np.ndarray: + """ + Given a knn graph, returns an (N, k) array A in + which j is in A[i] if item j is one of the k nearest neighbors of item i. + """ + self.N = knn_graph.shape[0] + kneighbors = knn_graph.indices.reshape(self.N, -1) + return kneighbors + + def _get_statistics( + self, + neighbor_index_distances, + ) -> dict[str, float]: + neighbor_index_distances = neighbor_index_distances.flatten() + sorted_neighbors = np.sort(neighbor_index_distances) + sorted_neighbors = np.hstack([sorted_neighbors, np.ones((1)) * (self.N - 1)]).astype(int) + + if self.background_distribution is None: + self.background_distribution = (self.N - np.arange(1, self.N)) / ( + self.N * (self.N - 1) / 2 + ) + + background_distribution = cast(np.ndarray, self.background_distribution) + background_cdf = np.cumsum(background_distribution) + + foreground_cdf = np.arange(sorted_neighbors.shape[0]) / (sorted_neighbors.shape[0] - 1) + + statistic = np.max(np.abs(foreground_cdf - background_cdf[sorted_neighbors - 1])) + statistics = {"ks": statistic} + return statistics
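`_get_statistics` and `_permutation_test` work together. They compare observed neighbor index distances against the distribution of |i - j| for a uniformly random pair of positions, P(|i - j| = t) = (N - t) / (N(N - 1)/2). They then ask how unusual that KS statistic is across random reorderings of the dataset, smoothing the null statistics with a Gaussian KDE and integrating its upper tail. The sketch below follows that structure, with these differences from the library code:

* It uses a local `numpy.random.Generator` instead of the global `np.random.seed`.
* It evaluates both step-function CDFs at every integer distance instead of the library's ECDF convention.
* It integrates the KDE tail to infinity rather than to 100.

`ks_vs_background` and `permutation_p_value` are hypothetical names.

```python
import numpy as np
from scipy.stats import gaussian_kde


def ks_vs_background(index_distances: np.ndarray, n: int) -> float:
    """KS statistic between observed |i - j| values and the distribution of |i - j|
    for a uniformly random pair of distinct positions among n."""
    d = np.asarray(index_distances, dtype=int).ravel()
    # P(|i - j| = t) = (n - t) / (n (n - 1) / 2) for t = 1, ..., n - 1
    background_cdf = np.cumsum((n - np.arange(1, n)) / (n * (n - 1) / 2))
    # Both CDFs are step functions on integers, so comparing at t = 1..n-1 suffices
    empirical_cdf = np.cumsum(np.bincount(d, minlength=n)[1:n]) / d.size
    return float(np.max(np.abs(empirical_cdf - background_cdf)))


def permutation_p_value(neighbors: np.ndarray, num_permutations: int = 25, seed: int = 0) -> float:
    """Permutation test: how unusual is the observed statistic under random orderings?"""
    n = neighbors.shape[0]
    observed = ks_vs_background(np.abs(np.arange(n)[:, None] - neighbors), n)
    rng = np.random.default_rng(seed)
    null_stats = []
    for _ in range(num_permutations):
        perm = rng.permutation(n)  # perm[i] is example i's position in a shuffled order
        null_stats.append(ks_vs_background(np.abs(perm[:, None] - perm[neighbors]), n))
    # Smooth the null distribution with a KDE and integrate its upper tail
    kde = gaussian_kde(null_stats)
    return float(kde.integrate_box_1d(observed, np.inf))


# A drifting 1-D feature: nearest neighbors in feature space are also neighbors in row order
X = np.arange(30, dtype=float)
dist = np.abs(X[:, None] - X[None, :])
np.fill_diagonal(dist, np.inf)
neighbors = np.argsort(dist, axis=1)[:, :2]
print(permutation_p_value(neighbors))  # close to 0: the ordering is far from IID
```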
+
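The piecewise background CDF described in the `_score_dataset` docstring can be written out explicitly. The derivation below is a reconstruction from that docstring, with $d_i = \min(i, N - i - 1)$. Integrating $F_i$ gives the area function $A_i$, and its three branches match the `area_beginning`, `set_middle`, and `set_end` expressions used for `background_area` in the code.

```latex
F_i(j) =
\begin{cases}
\dfrac{2j}{N-1}, & 0 \le j \le d_i, \\[4pt]
\dfrac{j + d_i}{N-1}, & d_i < j \le N - d_i - 1, \\[4pt]
1, & j > N - d_i - 1,
\end{cases}
\qquad
A_i(j) = \int_0^{j} F_i(t)\,dt =
\begin{cases}
\dfrac{j^2}{N-1}, & 0 \le j \le d_i, \\[4pt]
\dfrac{d_i^2}{N-1} + \dfrac{(j + 3d_i)(j - d_i)}{2(N-1)}, & d_i < j \le N - d_i - 1, \\[4pt]
\dfrac{d_i^2}{N-1} + \dfrac{1}{2}\left(\dfrac{2d_i}{N-1} + 1\right)(N - 2d_i - 1) + \bigl(j - (N - d_i - 1)\bigr), & j > N - d_i - 1.
\end{cases}
```

Both functions are continuous at the breakpoints. At $j = d_i$ the first two branches of $F_i$ both equal $2d_i/(N-1)$. At $j = N - d_i - 1$ the middle branch reaches 1, and the middle branch of $A_i$ equals the constant part of the last branch.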
+
+
+
+ + +
+
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/null.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/null.html new file mode 100644 index 000000000..70ab2cc0d --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/null.html @@ -0,0 +1,889 @@ + cleanlab.datalab.internal.issue_manager.null - cleanlab
+
+
+ +
+
+
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.datalab.internal.issue_manager.null

+from __future__ import annotations
+
+from collections import Counter
+from typing import TYPE_CHECKING, Any, ClassVar, Dict, List, Optional
+
+import numpy as np
+import pandas as pd
+
+from cleanlab.datalab.internal.issue_manager import IssueManager
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+
+
+
[docs]class NullIssueManager(IssueManager): + """Manages issues related to null/missing values in the rows of features. + + Parameters + ---------- + datalab : + The Datalab instance that this issue manager searches for issues in. + """ + + description: ClassVar[ + str + ] = """Examples identified with the null issue correspond to rows that have null/missing values across all feature columns (i.e. the entire row is missing values). + """ + issue_name: ClassVar[str] = "null" + verbosity_levels = { + 0: [], + 1: [], + 2: ["most_common_issue"], + } + + @staticmethod + def _calculate_null_issues( + features: npt.NDArray[Any], + ) -> tuple[npt.NDArray[np.bool_], npt.NDArray[np.float64], npt.NDArray[np.bool_]]: + """Tracks the null values in each row of a feature array, + computes a quality score for each row as its fraction of non-null values, + and returns a boolean array indicating which rows contain only null values, + along with the scores and the boolean null tracker.""" + cols = features.shape[1] + null_tracker = pd.isna(features) + non_null_count = cols - null_tracker.sum(axis=1) + scores = non_null_count / cols + is_null_issue = non_null_count == 0 + return is_null_issue, scores, null_tracker + +
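The scoring rule in `_calculate_null_issues` gives each row a score equal to its fraction of non-null entries. A row is flagged only when that fraction is zero. The worked example below replicates the rule in a standalone function (`null_row_scores`, a hypothetical name) to show the resulting values. `pd.isna` is applied elementwise to the NumPy array and treats `np.nan` (and `None` in object arrays) as missing.

```python
import numpy as np
import pandas as pd


def null_row_scores(features: np.ndarray):
    """Replica of the scoring rule above: score = fraction of non-null entries per row."""
    null_tracker = pd.isna(features)            # elementwise boolean mask
    n_cols = features.shape[1]
    non_null_count = n_cols - null_tracker.sum(axis=1)
    scores = non_null_count / n_cols
    is_null_issue = non_null_count == 0         # flagged only if the whole row is null
    return is_null_issue, scores, null_tracker


X = np.array([
    [1.0, np.nan, 3.0],        # partially null: score 2/3, not flagged
    [np.nan, np.nan, np.nan],  # fully null: score 0, flagged
    [4.0, 5.0, 6.0],           # complete: score 1
])
flags, scores, tracker = null_row_scores(X)
print(flags.tolist(), np.round(scores, 3).tolist())  # [False, True, False] [0.667, 0.0, 1.0]
```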
[docs] def find_issues( + self, + features: Optional[npt.NDArray | pd.DataFrame] = None, + **kwargs, + ) -> None: + if features is None: + raise ValueError("features must be provided to check for null values.") + # Features are expected as a numpy array. For now, this issue check also converts a DataFrame to a numpy array. + if isinstance(features, pd.DataFrame): + features = features.to_numpy() + + is_null_issue, scores, null_tracker = self._calculate_null_issues(features=features) + + self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue": is_null_issue, + self.issue_score_key: scores, + }, + ) + + self.summary = self.make_summary(score=scores.mean()) + self.info = self.collect_info(null_tracker)
+ + @staticmethod + def _most_common_issue( + null_tracker: np.ndarray, + ) -> dict[str, dict[str, str | int | list[int] | list[int | None]]]: + """ + Identify and return the most common null value pattern among rows that contain at least + one null value, along with the rows that have this pattern and their count. + + Parameters + ---------- + null_tracker : np.ndarray + A boolean array of the same shape as features, where True indicates null/missing entries. + + Returns + ------- + Dict[str, Any] + A dictionary containing the most common null pattern, the indices of the rows with this pattern, and their count. + """ + # Convert the boolean null_tracker matrix into a list of strings. + most_frequent_pattern = "no_null" + rows_affected: List[int] = [] + occurrence_of_most_frequent_pattern = 0 + if np.any(null_tracker, axis=None): + null_row_indices = np.where(np.any(null_tracker, axis=1))[0] + null_patterns_as_strings = [ + "".join(map(str, null_tracker[i].astype(int).tolist())) for i in null_row_indices + ] + + # Use Counter to efficiently count occurrences and find the most common pattern. + pattern_counter = Counter(null_patterns_as_strings) + ( + most_frequent_pattern, + occurrence_of_most_frequent_pattern, + ) = pattern_counter.most_common(1)[0] + rows_affected = [] + for idx, row in enumerate(null_patterns_as_strings): + if row == most_frequent_pattern: + rows_affected.append(int(null_row_indices[idx])) + return { + "most_common_issue": { + "pattern": most_frequent_pattern, + "rows_affected": rows_affected, + "count": occurrence_of_most_frequent_pattern, + } + } + + @staticmethod + def _column_impact(null_tracker: np.ndarray) -> Dict[str, List[float]]: + """ + Calculate and return the impact of null values per column, represented as the proportion + of rows having null values in each column. 
+ + Parameters + ---------- + null_tracker : np.ndarray + A boolean array of the same shape as features, where True indicates null/missing entries.
+ + Returns + ------- + Dict[str, List[float]] + A dictionary containing the impact per column, with values being a list + where each element is the proportion (between 0 and 1) of rows having null values in the corresponding column. + """ + # Calculate proportion of nulls in each column + proportion_of_nulls_per_column = null_tracker.mean(axis=0) + + # Return result as a dictionary containing a list of proportions + return {"column_impact": proportion_of_nulls_per_column.tolist()} + +
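`_most_common_issue` and `_column_impact` summarize the same boolean null tracker in two ways. The first finds the most frequent row-level null pattern, encoded as a string such as `"010"`. The second reports the fraction of rows that are null in each column. The sketch below combines both into one hypothetical function, `summarize_null_patterns`, with a flatter return shape than the library's nested dictionaries.

```python
from collections import Counter

import numpy as np


def summarize_null_patterns(null_tracker: np.ndarray) -> dict:
    """Sketch combining the most-common-pattern and column-impact summaries."""
    summary = {"pattern": "no_null", "rows_affected": [], "count": 0}
    rows_with_nulls = np.flatnonzero(null_tracker.any(axis=1))
    if rows_with_nulls.size:
        # Encode each row's null mask as a string such as "010"
        patterns = ["".join(map(str, null_tracker[i].astype(int).tolist())) for i in rows_with_nulls]
        pattern, count = Counter(patterns).most_common(1)[0]
        summary = {
            "pattern": pattern,
            "rows_affected": [int(i) for i, p in zip(rows_with_nulls, patterns) if p == pattern],
            "count": count,
        }
    # Fraction of rows that are null in each column
    summary["column_impact"] = null_tracker.mean(axis=0).tolist()
    return summary


tracker = np.array([
    [False, True, False],
    [True, True, True],
    [False, True, False],
    [False, False, False],
])
print(summarize_null_patterns(tracker))
```

When several patterns are tied, `Counter.most_common` returns them in first-encountered order, so the reported pattern is the one whose first occurrence comes earliest.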
[docs] def collect_info(self, null_tracker: np.ndarray) -> dict: + most_common_issue = self._most_common_issue(null_tracker=null_tracker) + column_impact = self._column_impact(null_tracker=null_tracker) + average_null_score = {"average_null_score": self.issues[self.issue_score_key].mean()} + issues_dict = {**average_null_score, **most_common_issue, **column_impact} + info_dict: Dict[str, Any] = {**issues_dict} + return info_dict
+ +
[docs] @classmethod + def report(cls, *args, **kwargs) -> str: + """ + Return a report of issues found by the NullIssueManager. + + This method extends the superclass method by identifying and reporting + specific issues related to null values in the dataset. + + Parameters + ---------- + *args : list + Variable length argument list. + **kwargs : dict + Arbitrary keyword arguments. + + Returns + ------- + report_str : + A string containing the report. + + See Also + -------- + :meth:`cleanlab.datalab.Datalab.report` + + Notes + ----- + This method differs from other IssueManager report methods. It checks for issues + and prompts the user to address them to enable other issue managers to run effectively. + """ + # Generate the base report using the superclass method + original_report = super().report(*args, **kwargs) + + # Retrieve the 'issues' dataframe from keyword arguments + issues = kwargs["issues"] + + # Identify examples that have null values in all features + issue_filter = f"is_{cls.issue_name}_issue" + examples_with_full_nulls = issues.query(issue_filter).index.tolist() + + # Identify examples that have some null values (but not in all features) + partial_null_filter = f"{cls.issue_score_key} < 1.0 and not {issue_filter}" + examples_with_partial_nulls = issues.query(partial_null_filter).index.tolist() + + # Append information about examples with null values in all features + if examples_with_full_nulls: + report_addition = ( + f"\n\nFound {len(examples_with_full_nulls)} examples with null values in all features. " + f"These examples should be removed from the dataset before running other issue managers." + # TODO: Add a link to the documentation on how to handle null examples + ) + original_report += report_addition + + # Append information about examples with some null values + if examples_with_partial_nulls: + report_addition = ( + f"\n\nFound {len(examples_with_partial_nulls)} examples with null values in some features. 
" + f"Please address these issues before running other issue managers." + # TODO: Add a link to the documentation on how to handle partially null examples + ) + original_report += report_addition + + return original_report
+
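`NullIssueManager.report` splits flagged rows into two groups with two `DataFrame.query` filters. Rows with nulls in every feature match the boolean issue column. Rows with nulls in only some features have a score below 1.0 but are not flagged. The sketch below runs those two queries on a toy `issues` table. It assumes the score column is named `null_score` (standing in for `issue_score_key`). `split_null_rows` is a hypothetical name.

```python
import pandas as pd


def split_null_rows(issues: pd.DataFrame, issue_name: str = "null"):
    """Return (fully null row indices, partially null row indices).

    Assumes columns named is_<issue_name>_issue and <issue_name>_score.
    """
    issue_filter = f"is_{issue_name}_issue"
    score_key = f"{issue_name}_score"
    full = issues.query(issue_filter).index.tolist()
    partial = issues.query(f"{score_key} < 1.0 and not {issue_filter}").index.tolist()
    return full, partial


issues = pd.DataFrame({
    "is_null_issue": [False, True, False, False],
    "null_score": [2 / 3, 0.0, 1.0, 1 / 3],
})
print(split_null_rows(issues))  # ([1], [0, 3])
```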
+
+
+
+ + +
+
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/outlier.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/outlier.html new file mode 100644 index 000000000..02a843bba --- /dev/null +++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/outlier.html @@ -0,0 +1,989 @@ + cleanlab.datalab.internal.issue_manager.outlier - cleanlab
+
+
+ +
+
+
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.datalab.internal.issue_manager.outlier

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, ClassVar, Dict, Optional, Tuple
+
+from scipy.sparse import csr_matrix
+from scipy.stats import iqr
+import numpy as np
+import pandas as pd
+
+from cleanlab.datalab.internal.issue_manager import IssueManager
+from cleanlab.datalab.internal.issue_manager.knn_graph_helpers import knn_exists, set_knn_graph
+from cleanlab.internal.outlier import correct_precision_errors
+from cleanlab.outlier import OutOfDistribution, transform_distances_to_scores
+
+if TYPE_CHECKING:  # pragma: no cover
+    from sklearn.neighbors import NearestNeighbors
+    import numpy.typing as npt
+    from cleanlab.datalab.datalab import Datalab
+    from cleanlab.typing import Metric
+
+
+
[docs]class OutlierIssueManager(IssueManager): + """Manages issues related to out-of-distribution examples.""" + + description: ClassVar[ + str + ] = """Examples that are very different from the rest of the dataset + (i.e. potentially out-of-distribution or rare/anomalous instances). + """ + issue_name: ClassVar[str] = "outlier" + verbosity_levels = { + 0: [], + 1: [], + 2: ["average_ood_score"], + 3: [], + } + + DEFAULT_THRESHOLDS = { + "features": 0.37037, + "pred_probs": 0.13, + } + """Default thresholds for outlier detection. + + If outlier detection is performed on the features, an example whose average + distance to its k nearest neighbors is greater than + Q3_avg_dist + (1 / threshold - 1) * IQR_avg_dist is considered an outlier. + + If outlier detection is performed on the predicted probabilities, an example + whose average score is lower than threshold * median_outlier_score is + considered an outlier. + """ + + def __init__( + self, + datalab: Datalab, + k: int = 10, + t: int = 1, + metric: Optional[Metric] = None, + scaling_factor: Optional[float] = None, + threshold: Optional[float] = None, + **kwargs, + ): + super().__init__(datalab) + + ood_kwargs = kwargs.get("ood_kwargs", {}) + + valid_ood_params = OutOfDistribution.DEFAULT_PARAM_DICT.keys() + params = { + key: value + for key, value in ((k, kwargs.get(k, None)) for k in valid_ood_params) + if value is not None + } + + # Simplified API: directly specify k and metric instead of NearestNeighbors object + # This reduces dependency on OutOfDistribution and aligns with Datalab's approach + params["k"] = k + self.k = k + self.t = t + self.metric: Optional[Metric] = metric + self.scaling_factor = scaling_factor + + if params: + ood_kwargs["params"] = params + + # OutOfDistribution still used for pred-prob based outlier detection + self.ood: OutOfDistribution = OutOfDistribution(**ood_kwargs) + + self._find_issues_inputs: Dict[str, bool] = { + "features": False, + "pred_probs": False, + "knn_graph":
False, + } + + # Used for both methods of outlier detection + self.threshold = threshold + +
[docs] def find_issues( + self, + features: Optional[npt.NDArray] = None, + pred_probs: Optional[np.ndarray] = None, + **kwargs, + ) -> None: + statistics = self.datalab.get_info("statistics") + + # Determine if we can use kNN-based outlier detection + knn_graph_works: bool = self._knn_graph_works(features, kwargs, statistics, self.k) + knn_graph = None + knn = None + if knn_graph_works: + # Set up or retrieve the kNN graph + knn_graph, self.metric, knn = set_knn_graph( + features=features, + find_issues_kwargs=kwargs, + metric=self.metric, + k=self.k, + statistics=statistics, + ) + + # Compute distances and thresholds for outlier detection + distances = knn_graph.data.reshape(knn_graph.shape[0], -1) + assert isinstance(distances, np.ndarray) + ( + self.threshold, + issue_threshold, # Useful info for detecting issues in test data + is_issue_column, + ) = self._compute_threshold_and_issue_column_from_distances(distances, self.threshold) + + # Calculate outlier scores based on average distances + avg_distances = distances.mean(axis=1) + median_avg_distance = np.median(avg_distances) + self._find_issues_inputs.update({"knn_graph": True}) + + # Ensure scaling factor is not too small to avoid numerical issues + if self.scaling_factor is None: + self.scaling_factor = float(max(median_avg_distance, 100 * np.finfo(np.float64).eps)) + scores = transform_distances_to_scores( + avg_distances, t=self.t, scaling_factor=self.scaling_factor + ) + + # Apply precision error correction if metric is available + _metric = self.metric + if _metric is not None: + _metric = _metric if isinstance(_metric, str) else _metric.__name__ + scores = correct_precision_errors(scores, avg_distances, _metric) + elif pred_probs is not None: + # Fall back to pred_probs-based outlier detection + scores = self._score_with_pred_probs(pred_probs, **kwargs) + self._find_issues_inputs.update({"pred_probs": True}) + + # Set threshold for pred_probs-based detection + if self.threshold is None: + 
self.threshold = self.DEFAULT_THRESHOLDS["pred_probs"] + if not 0 <= self.threshold: + raise ValueError(f"threshold must be non-negative, but got {self.threshold}.") + issue_threshold = float( + self.threshold * np.median(scores) + ) # Useful info for detecting issues in test data + is_issue_column = scores < issue_threshold + + else: + # Handle case where neither kNN nor pred_probs-based detection is possible + if ( + kwargs.get("knn_graph", None) is not None + or statistics.get("weighted_knn_graph", None) is not None + ): + raise ValueError( + "knn_graph is provided, but not sufficiently large to compute the scores based on the provided hyperparameters." + ) + raise ValueError("Either features or pred_probs must be provided.") + + # Store results + self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue": is_issue_column, + self.issue_score_key: scores, + }, + ) + + self.summary = self.make_summary(score=scores.mean()) + + self.info = self.collect_info(issue_threshold=issue_threshold, knn_graph=knn_graph, knn=knn)
+ + def _knn_graph_works(self, features, kwargs, statistics, k: int) -> bool: + """Return True if kNN-based outlier detection is possible (features are provided or a sufficiently large kNN graph exists); otherwise find_issues falls back to pred_probs.""" + sufficient_knn_graph_available = knn_exists(kwargs, statistics, k) + return (features is not None) or sufficient_knn_graph_available + + def _compute_threshold_and_issue_column_from_distances( + self, distances: np.ndarray, threshold: Optional[float] = None + ) -> Tuple[float, float, np.ndarray]: + avg_distances = distances.mean(axis=1) + if threshold: + if not (isinstance(threshold, (int, float)) and 0 <= threshold <= 1): + raise ValueError( + f"threshold must be a number between 0 and 1, got {threshold} of type {type(threshold)}." + ) + if threshold is None: + threshold = OutlierIssueManager.DEFAULT_THRESHOLDS["features"] + + def compute_issue_threshold(avg_distances: np.ndarray, threshold: float) -> float: + q3_distance = np.percentile(avg_distances, 75) + iqr_scale = 1 / threshold - 1 if threshold != 0 else np.inf + issue_threshold = q3_distance + iqr_scale * iqr(avg_distances) + return float(issue_threshold) + + issue_threshold = compute_issue_threshold(avg_distances, threshold) + return threshold, issue_threshold, avg_distances > issue_threshold + +
[docs] def collect_info( + self, + *, + issue_threshold: float, + knn_graph: Optional[csr_matrix], + knn: Optional["NearestNeighbors"], + ) -> dict: + issues_dict = { + "average_ood_score": self.issues[self.issue_score_key].mean(), + "threshold": self.threshold, + "issue_threshold": issue_threshold, + } + pred_probs_issues_dict: Dict[str, Any] = {} + feature_issues_dict = {} + + if knn_graph is not None: + N = knn_graph.shape[0] + k = knn_graph.nnz // N + dists = knn_graph.data.reshape(N, -1)[:, 0] + nn_ids = knn_graph.indices.reshape(N, -1)[:, 0] + + feature_issues_dict.update( + { + "k": self.k, # type: ignore[union-attr] + "nearest_neighbor": nn_ids.tolist(), + "distance_to_nearest_neighbor": dists.tolist(), + "metric": self.metric, # type: ignore[union-attr] + "scaling_factor": self.scaling_factor, + "t": self.t, + "knn": knn, + } + ) + + if self.ood.params["confident_thresholds"] is not None: + pass # + statistics_dict = self._build_statistics_dictionary(knn_graph=knn_graph) + ood_params_dict = { + "ood": self.ood, + **self.ood.params, + } + knn_dict = { + **pred_probs_issues_dict, + **feature_issues_dict, + } + info_dict: Dict[str, Any] = { + **issues_dict, + **ood_params_dict, # type: ignore[arg-type] + **knn_dict, + **statistics_dict, + "find_issues_inputs": self._find_issues_inputs, + } + return info_dict
+ + def _build_statistics_dictionary( + self, *, knn_graph: Optional[csr_matrix] + ) -> Dict[str, Dict[str, Any]]: + statistics_dict: Dict[str, Dict[str, Any]] = {"statistics": {}} + + # Add the knn graph as a statistic if necessary + graph_key = "weighted_knn_graph" + old_knn_graph = self.datalab.get_info("statistics").get(graph_key, None) + old_graph_exists = old_knn_graph is not None + prefer_new_graph = ( + not old_graph_exists + or (isinstance(knn_graph, csr_matrix) and knn_graph.nnz > old_knn_graph.nnz) + or self.metric != self.datalab.get_info("statistics").get("knn_metric", None) + ) + if prefer_new_graph: + if knn_graph is not None: + statistics_dict["statistics"][graph_key] = knn_graph + if self.metric is not None: + statistics_dict["statistics"]["knn_metric"] = self.metric + + return statistics_dict + + def _score_with_pred_probs(self, pred_probs: np.ndarray, **kwargs) -> np.ndarray: + # Remove "threshold" from kwargs if it exists + kwargs.pop("threshold", None) + labels = self.datalab.labels + if not isinstance(labels, np.ndarray): + error_msg = ( + f"labels must be a numpy array of shape (n_samples,) to use the OutlierIssueManager " + f"with pred_probs, but got {type(labels)}." + ) + raise TypeError(error_msg) + scores = self.ood.fit_score(pred_probs=pred_probs, labels=labels, **kwargs) + return scores
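The two thresholding rules described in the `DEFAULT_THRESHOLDS` docstring can be illustrated with a small NumPy sketch. This is a simplified re-implementation for illustration only, not the cleanlab code path: it assumes the per-example average kNN distances and the pred_probs-based OOD scores are already computed, it computes the IQR with `np.percentile`, and the helper names `feature_outlier_mask` and `pred_probs_outlier_mask` are hypothetical. With the default feature threshold of 0.37037, the IQR multiplier `1 / threshold - 1` works out to roughly 1.7.

```python
import numpy as np


def feature_outlier_mask(avg_distances: np.ndarray, threshold: float = 0.37037) -> np.ndarray:
    """Hypothetical helper: flag examples whose average kNN distance exceeds
    Q3 + (1 / threshold - 1) * IQR, mirroring the features rule."""
    q1, q3 = np.percentile(avg_distances, [25, 75])
    # A threshold of 0 gives an infinite IQR multiplier, so nothing is flagged.
    iqr_scale = 1 / threshold - 1 if threshold != 0 else np.inf
    issue_threshold = q3 + iqr_scale * (q3 - q1)
    return avg_distances > issue_threshold


def pred_probs_outlier_mask(scores: np.ndarray, threshold: float = 0.13) -> np.ndarray:
    """Hypothetical helper: flag examples whose OOD score is below
    threshold * median(scores), mirroring the pred_probs rule."""
    return scores < threshold * np.median(scores)


# Toy data: nine similar examples and one far-away example.
avg_distances = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 1.0, 1.1, 8.0])
print(feature_outlier_mask(avg_distances))  # only the last example is flagged

# Toy OOD scores: lower means more atypical.
scores = np.array([0.9, 0.85, 0.95, 0.8, 0.05])
print(pred_probs_outlier_mask(scores))  # only the last example is flagged
```

Note the asymmetry between the two rules: a larger `threshold` makes the features rule *stricter* (smaller IQR multiplier, more flags), while for the pred_probs rule a larger `threshold` raises the cutoff and also flags more examples.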
+ Made with Sphinx and @pradyunsg's Furo
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/regression/label.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/regression/label.html
new file mode 100644
index 000000000..7e23dc398
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/regression/label.html
@@ -0,0 +1,936 @@
+ cleanlab.datalab.internal.issue_manager.regression.label - cleanlab
Source code for cleanlab.datalab.internal.issue_manager.regression.label

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, ClassVar, Dict, Optional
+import numpy as np
+import pandas as pd
+
+from cleanlab.regression.learn import CleanLearning
+from cleanlab.datalab.internal.issue_manager import IssueManager
+from cleanlab.regression.rank import get_label_quality_scores
+
+if TYPE_CHECKING:  # pragma: no cover
+    from cleanlab.datalab.datalab import Datalab
+
+
+
[docs]class RegressionLabelIssueManager(IssueManager): + """Manages label issues in a Datalab for regression tasks. + + Parameters + ---------- + datalab : + A Datalab instance. + + clean_learning_kwargs : + Keyword arguments to pass to the :py:class:`regression.learn.CleanLearning <cleanlab.regression.learn.CleanLearning>` constructor. + + threshold : + The threshold to use to determine if an example has a label issue. It is a multiplier + of the median label quality score that sets the absolute threshold. Only used when label + issues are computed from `predictions` passed to :py:meth:`~RegressionLabelIssueManager.find_issues` + (i.e. predictions are provided and no custom model is set up), not when they are computed + from `features`. Default is 0.05. + """ + + description: ClassVar[ + str + ] = """Examples whose given label is estimated to be potentially incorrect + (e.g. due to annotation error) are flagged as having label issues. + """ + + issue_name: ClassVar[str] = "label" + verbosity_levels = { + 0: [], + 1: [], + 2: [], + 3: [], # TODO + } + + def __init__( + self, + datalab: Datalab, + clean_learning_kwargs: Optional[Dict[str, Any]] = None, + threshold: float = 0.05, + health_summary_parameters: Optional[Dict[str, Any]] = None, + **_, + ): + super().__init__(datalab) + self.cl = CleanLearning(**(clean_learning_kwargs or {})) + # This is a field for prioritizing features only when using a custom model + self._uses_custom_model = "model" in (clean_learning_kwargs or {}) + self.threshold = threshold + +
[docs] def find_issues( + self, + features: Optional[np.ndarray] = None, + predictions: Optional[np.ndarray] = None, + **kwargs, + ) -> None: + """Find label issues in the datalab. + + .. admonition:: Priority Order for finding issues: + + 1. Custom Model: Requires `features` to be passed to this method. Used if a model is set up in the constructor. + 2. Predictions: Uses `predictions` if provided and no model is set up in the constructor. + 3. Default Model: Defaults to a standard model using `features` if no model or predictions are provided. + """ + if features is None and predictions is None: + raise ValueError( + "Regression requires numerical `features` or `predictions` " + "to be passed in as an argument to `find_issues`." + ) + if features is None and self._uses_custom_model: + raise ValueError( + "Regression requires numerical `features` to be passed in as an argument to `find_issues` " + "when using a custom model." + ) + # If features are provided and either a custom model is used or no predictions are provided + use_features = features is not None and (self._uses_custom_model or predictions is None) + labels = self.datalab.labels + if not isinstance(labels, np.ndarray): + error_msg = ( + f"Expected labels to be a numpy array of shape (n_samples,) to use with RegressionLabelIssueManager, " + f"but got {type(labels)} instead." 
+ ) + raise TypeError(error_msg) + if use_features: + assert features is not None # mypy won't narrow the type for some reason + self.issues = find_issues_with_features( + features=features, + y=labels, + cl=self.cl, + **kwargs, # function sanitizes kwargs + ) + self.issues.rename(columns={"label_quality": self.issue_score_key}, inplace=True) + + # Otherwise, if predictions are provided, process them + else: + assert predictions is not None # mypy won't narrow the type for some reason + self.issues = find_issues_with_predictions( + predictions=predictions, + y=labels, + **{**kwargs, **{"threshold": self.threshold}}, # function sanitizes kwargs + ) + + # Get a summarized dataframe of the label issues + self.summary = self.make_summary(score=self.issues[self.issue_score_key].mean()) + + # Collect info about the label issues + self.info = self.collect_info(issues=self.issues) + + # Drop columns from issues that are in the info + self.issues = self.issues.drop(columns=["given_label", "predicted_label"])
+ +
[docs] def collect_info(self, issues: pd.DataFrame) -> dict: + issues_info = { + "num_label_issues": sum(issues[f"is_{self.issue_name}_issue"]), + "average_label_quality": issues[self.issue_score_key].mean(), + "given_label": issues["given_label"].tolist(), + "predicted_label": issues["predicted_label"].tolist(), + } + + # health_summary_info and cl_info are kept only for consistency with classification; this method could simply return issues_info + health_summary_info: dict = {} + cl_info: dict = {} + + info_dict = { + **issues_info, + **health_summary_info, + **cl_info, + } + + return info_dict
+ + +
[docs]def find_issues_with_predictions( + predictions: np.ndarray, + y: np.ndarray, + threshold: float, + **kwargs, +) -> pd.DataFrame: + """Find label issues in a regression dataset based on predictions. + An example is flagged as having a label issue if its label quality score + is below ``threshold`` times the median label quality score. + + Parameters + ---------- + predictions : + The predictions from a regression model. + + y : + The given labels. + + threshold : + The threshold to use to determine if an example has a label issue. It is a multiplier + of the median label quality score that sets the absolute threshold. + + **kwargs : + Additional keyword arguments. Only ``method`` is forwarded to + :py:func:`get_label_quality_scores <cleanlab.regression.rank.get_label_quality_scores>`; + other keys are ignored. + + Returns + ------- + issues : + A dataframe of the issues. It contains the following columns: + - is_label_issue : bool + True if the example has a label issue. + - label_score : float + The quality score of the label. + - given_label : float + The given label. It is the same as the y parameter. + - predicted_label : float + The predicted label. It is the same as the predictions parameter. + """ + _accepted_kwargs = ["method"] + _kwargs = {k: kwargs.get(k) for k in _accepted_kwargs} + _kwargs = {k: v for k, v in _kwargs.items() if v is not None} + quality_scores = get_label_quality_scores(labels=y, predictions=predictions, **_kwargs) + + median_score = np.median(quality_scores) + is_label_issue_mask = quality_scores < median_score * threshold + + issues = pd.DataFrame( + { + "is_label_issue": is_label_issue_mask, + "label_score": quality_scores, + "given_label": y, + "predicted_label": predictions, + } + ) + return issues
+ + +
[docs]def find_issues_with_features( + features: np.ndarray, + y: np.ndarray, + cl: CleanLearning, + **kwargs, +) -> pd.DataFrame: + """Find label issues in a regression dataset based on features. + This delegates the work to the CleanLearning.find_label_issues method. + + Parameters + ---------- + features : + The numerical features from a regression dataset. + + y : + The given labels. + + cl : + The CleanLearning instance whose find_label_issues method computes the label issues. + + **kwargs : + Additional keyword arguments. Only ``uncertainty``, ``coarse_search_range``, + ``fine_search_size``, ``save_space`` and ``model_kwargs`` are forwarded to + CleanLearning.find_label_issues; other keys and None values are ignored. + + Returns + ------- + issues : + A dataframe of the issues. It contains the following columns: + - is_label_issue : bool + True if the example has a label issue. + - label_score : float + The quality score of the label. + - given_label : float + The given label. It is the same as the y parameter. + - predicted_label : float + The predicted label. It is determined by the CleanLearning.find_label_issues method. + """ + _accepted_kwargs = [ + "uncertainty", + "coarse_search_range", + "fine_search_size", + "save_space", + "model_kwargs", + ] + _kwargs = {k: v for k, v in kwargs.items() if k in _accepted_kwargs and v is not None} + return cl.find_label_issues(X=features, y=y, **_kwargs)
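`find_issues_with_predictions` flags an example when its label quality score falls below `threshold` times the median score. The sketch below reproduces only that final thresholding step and the shape of the returned dataframe. It assumes the quality scores are already available (in the module above they come from `get_label_quality_scores`); the helper `flag_label_issues` and the toy scores are hypothetical.

```python
import numpy as np
import pandas as pd


def flag_label_issues(
    quality_scores: np.ndarray,
    given_labels: np.ndarray,
    predictions: np.ndarray,
    threshold: float = 0.05,
) -> pd.DataFrame:
    """Hypothetical helper: apply the median-multiplier rule used by
    find_issues_with_predictions to precomputed label quality scores."""
    # An example is flagged when its score is below threshold * median score.
    cutoff = threshold * np.median(quality_scores)
    return pd.DataFrame(
        {
            "is_label_issue": quality_scores < cutoff,
            "label_score": quality_scores,
            "given_label": given_labels,
            "predicted_label": predictions,
        }
    )


# Toy values: the fourth label disagrees strongly with its prediction,
# so we assume it received a very low quality score.
y = np.array([1.0, 2.0, 3.0, 40.0, 5.0])
preds = np.array([1.1, 1.9, 3.2, 4.1, 5.0])
scores = np.array([0.9, 0.85, 0.8, 0.01, 0.95])

issues = flag_label_issues(scores, y, preds)
print(issues)
```

Because the cutoff is relative to the median, the rule adapts to the overall scale of the quality scores; with the default of 0.05, only examples scoring below 5% of the median are flagged.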
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/underperforming_group.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/underperforming_group.html
new file mode 100644
index 000000000..61344ce19
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager/underperforming_group.html
@@ -0,0 +1,1025 @@
+ cleanlab.datalab.internal.issue_manager.underperforming_group - cleanlab
Source code for cleanlab.datalab.internal.issue_manager.underperforming_group

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, Callable, ClassVar, Dict, Optional, Union, Tuple
+import warnings
+import inspect
+
+import numpy as np
+import pandas as pd
+from scipy.sparse import csr_matrix
+from sklearn.cluster import DBSCAN
+
+from cleanlab.datalab.internal.issue_manager import IssueManager
+from cleanlab.datalab.internal.issue_manager.knn_graph_helpers import set_knn_graph
+from cleanlab.rank import get_self_confidence_for_each_label
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+    from cleanlab.datalab.datalab import Datalab
+
+
+CLUSTERING_ALGO = "DBSCAN"
+CLUSTERING_PARAMS_DEFAULT = {"metric": "precomputed"}
+
+
+
[docs]class UnderperformingGroupIssueManager(IssueManager): + """ + Manages issues related to underperforming group examples. + + Note: The `min_cluster_samples` argument should not be confused with the + `min_samples` argument of sklearn.cluster.DBSCAN. + + Examples + -------- + >>> from cleanlab import Datalab + >>> import numpy as np + >>> X = np.random.normal(size=(50, 2)) + >>> y = np.random.randint(2, size=50) + >>> pred_probs = np.exp(X) / np.exp(X).sum(axis=1, keepdims=True) + >>> data = {"X": X, "y": y} + >>> lab = Datalab(data, label_name="y") + >>> issue_types = {"underperforming_group": {"clustering_kwargs": {"eps": 0.5}}} + >>> lab.find_issues(pred_probs=pred_probs, features=X, issue_types=issue_types) + """ + + description: ClassVar[ + str + ] = """An underperforming group refers to a cluster of similar examples + (i.e. a slice) in the dataset for which the ML model predictions + are particularly poor (the loss evaluated over this subpopulation is high). + """ + issue_name: ClassVar[str] = "underperforming_group" + verbosity_levels = { + 0: [], + 1: [], + 2: ["threshold"], + } + OUTLIER_CLUSTER_LABELS: ClassVar[Tuple[int]] = (-1,) + """Specifies labels considered as outliers by the clustering algorithm.""" + NO_UNDERPERFORMING_CLUSTER_ID: ClassVar[int] = min(OUTLIER_CLUSTER_LABELS) - 1 + """Constant to signify absence of any underperforming cluster.""" + + def __init__( + self, + datalab: Datalab, + metric: Optional[Union[str, Callable]] = None, + threshold: float = 0.1, + k: int = 10, + clustering_kwargs: Dict[str, Any] = {}, + min_cluster_samples: int = 5, + **_: Any, + ): + super().__init__(datalab) + self.metric = metric + self.threshold = self._set_threshold(threshold) + self.k = k + self.clustering_kwargs = clustering_kwargs + self.min_cluster_samples = min_cluster_samples + +
[docs] def find_issues( + self, + pred_probs: npt.NDArray, + features: Optional[npt.NDArray] = None, + cluster_ids: Optional[npt.NDArray[np.int_]] = None, + **kwargs: Any, + ) -> None: + labels = self.datalab.labels + if not isinstance(labels, np.ndarray): + error_msg = ( + f"Labels must be a numpy array of shape (n_samples,) for UnderperformingGroupIssueManager. " + f"Got {type(labels)} instead." + ) + raise TypeError(error_msg) + if cluster_ids is None: + statistics = self.datalab.get_info("statistics") + knn_graph, self.metric, _ = set_knn_graph( + features, kwargs, self.metric, self.k, statistics + ) + cluster_ids = self.perform_clustering(knn_graph) + performed_clustering = True + else: + if self.clustering_kwargs: + warnings.warn( + "`clustering_kwargs` will not be used since `cluster_ids` have been passed." + ) + performed_clustering = False + knn_graph = None + unique_cluster_ids = self.filter_cluster_ids(cluster_ids) + if not unique_cluster_ids.size: + raise ValueError( + "No meaningful clusters were generated for determining underperforming group." + ) + n_clusters = len(unique_cluster_ids) + worst_cluster_id, worst_cluster_ratio = self.get_worst_cluster( + cluster_ids, unique_cluster_ids, labels, pred_probs + ) + is_issue_column = cluster_ids == worst_cluster_id + scores = np.where(is_issue_column, worst_cluster_ratio, 1) + self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue": is_issue_column, + self.issue_score_key: scores, + }, + ) + self.summary = self.make_summary(score=worst_cluster_ratio) + self.info = self.collect_info( + knn_graph=knn_graph, + n_clusters=n_clusters, + cluster_ids=cluster_ids, + performed_clustering=performed_clustering, + worst_cluster_id=worst_cluster_id, + )
+ +
[docs] def perform_clustering(self, knn_graph: csr_matrix) -> npt.NDArray[np.int_]: + """Perform clustering of datapoints using a knn graph as distance matrix. + + Args: + knn_graph (csr_matrix): Sparse Distance Matrix. + + Returns: + cluster_ids (npt.NDArray[np.int_]): Cluster IDs for each datapoint. + """ + DBSCAN_VALID_KEYS = inspect.signature(DBSCAN).parameters.keys() + dbscan_params = { + key: value + for key, value in ((k, self.clustering_kwargs.get(k, None)) for k in DBSCAN_VALID_KEYS) + if value is not None + } + dbscan_params["metric"] = "precomputed" + clusterer = DBSCAN(**dbscan_params) + cluster_ids = clusterer.fit_predict( + knn_graph.copy() + ) # Copy to avoid modification by DBSCAN + return cluster_ids
+ +
[docs] def filter_cluster_ids(self, cluster_ids: npt.NDArray[np.int_]) -> npt.NDArray[np.int_]: + """Remove outlier clusters and return IDs of clusters with at least `self.min_cluster_samples` datapoints. + + Args: + cluster_ids (npt.NDArray[np.int_]): Cluster IDs for each datapoint. + + Returns: + unique_cluster_ids (npt.NDArray[np.int_]): Array of unique cluster IDs after + removing outlier clusters and clusters with fewer than `self.min_cluster_samples` + datapoints. + """ + unique_cluster_ids = np.array( + [label for label in set(cluster_ids) if label not in self.OUTLIER_CLUSTER_LABELS] + ) + frequencies = np.bincount(cluster_ids[~np.isin(cluster_ids, self.OUTLIER_CLUSTER_LABELS)]) + unique_cluster_ids = np.array( + [ + cluster_id + for cluster_id in unique_cluster_ids + if frequencies[cluster_id] >= self.min_cluster_samples + ] + ) + return unique_cluster_ids
+ +
[docs] def get_worst_cluster( + self, + cluster_ids: npt.NDArray[np.int_], + unique_cluster_ids: npt.NDArray[np.int_], + labels: npt.NDArray, + pred_probs: npt.NDArray, + ) -> Tuple[int, float]: + """Get ID and quality score of underperforming cluster. + + Args: + cluster_ids (npt.NDArray[np.int_]): Cluster ID for each datapoint. + unique_cluster_ids (npt.NDArray[np.int_]): IDs of the clusters to evaluate (outlier and small clusters already removed). + labels (npt.NDArray): Given label for each datapoint. + pred_probs (npt.NDArray): Predicted class probabilities for each datapoint. + + Returns: + Tuple[int, float]: (Underperforming cluster ID, or NO_UNDERPERFORMING_CLUSTER_ID if the ratio is not below the threshold; + ratio of the worst cluster's mean self-confidence to the dataset's mean self-confidence, capped at 1.0) + """ + worst_cluster_performance = 1 # Largest possible probability value + worst_cluster_id = min(unique_cluster_ids) - 1 + for cluster_id in unique_cluster_ids: + cluster_mask = cluster_ids == cluster_id + cur_cluster_labels = labels[cluster_mask] + cur_cluster_pred_probs = pred_probs[cluster_mask] + cluster_performance = get_self_confidence_for_each_label( + cur_cluster_labels, cur_cluster_pred_probs + ).mean() + if cluster_performance < worst_cluster_performance: + worst_cluster_performance = cluster_performance + worst_cluster_id = cluster_id + mean_performance = get_self_confidence_for_each_label(labels, pred_probs).mean() + worst_cluster_ratio = min(worst_cluster_performance / mean_performance, 1.0) + worst_cluster_id = ( + worst_cluster_id + if worst_cluster_ratio < self.threshold + else self.NO_UNDERPERFORMING_CLUSTER_ID + ) + return worst_cluster_id, worst_cluster_ratio
+ +
[docs] def collect_info( + self, + knn_graph: csr_matrix, + n_clusters: int, + cluster_ids: npt.NDArray[np.int_], + performed_clustering: bool, + worst_cluster_id: int, + ) -> Dict[str, Any]: + params_dict = { + "k": self.k, + "metric": self.metric, + "threshold": self.threshold, + } + + knn_info_dict = {} + if knn_graph is not None: + N = knn_graph.shape[0] + dists = knn_graph.data.reshape(N, -1)[:, 0] + nn_ids = knn_graph.indices.reshape(N, -1)[:, 0] + + knn_info_dict = { + "nearest_neighbor": nn_ids.tolist(), + "distance_to_nearest_neighbor": dists.tolist(), + } + statistics_dict = self._build_statistics_dictionary(knn_graph=knn_graph) + + cluster_stat_dict = self._get_cluster_statistics( + n_clusters=n_clusters, + cluster_ids=cluster_ids, + performed_clustering=performed_clustering, + worst_cluster_id=worst_cluster_id, + ) + info_dict = { + **params_dict, + **knn_info_dict, + **statistics_dict, + **cluster_stat_dict, + } + + return info_dict
+ + def _build_statistics_dictionary(self, knn_graph: csr_matrix) -> Dict[str, Dict[str, Any]]: + statistics_dict: Dict[str, Dict[str, Any]] = {"statistics": {}} + + # Add the knn graph as a statistic if necessary + graph_key = "weighted_knn_graph" + old_knn_graph = self.datalab.get_info("statistics").get(graph_key, None) + old_graph_exists = old_knn_graph is not None + prefer_new_graph = ( + not old_graph_exists + or (isinstance(knn_graph, csr_matrix) and knn_graph.nnz > old_knn_graph.nnz) + or self.metric != self.datalab.get_info("statistics").get("knn_metric", None) + ) + if prefer_new_graph: + if knn_graph is not None: + statistics_dict["statistics"][graph_key] = knn_graph + if self.metric is not None: + statistics_dict["statistics"]["knn_metric"] = self.metric + + return statistics_dict + + def _get_cluster_statistics( + self, + n_clusters: int, + cluster_ids: npt.NDArray[np.int_], + performed_clustering: bool, + worst_cluster_id: int, + ) -> Dict[str, Dict[str, Any]]: + """Get relevant cluster statistics. + + Args: + n_clusters (int): Number of clusters. + cluster_ids (npt.NDArray[np.int_]): Cluster IDs for each datapoint. + performed_clustering (bool): Set to True to indicate that clustering was performed on + `features` passed to `find_issues`. Set to False to indicate that `cluster_ids` were explicitly + passed to `find_issues`. + worst_cluster_id (int): Underperforming cluster ID.
+ + Returns: + cluster_stats (Dict[str, Dict[str, Any]]): Cluster statistics. + """ + cluster_stats: Dict[str, Dict[str, Any]] = { + "clustering": { + "algorithm": None, + "params": {}, + "stats": { + "n_clusters": n_clusters, + "cluster_ids": cluster_ids, + "underperforming_cluster_id": worst_cluster_id, + }, + } + } + if performed_clustering: + cluster_stats["clustering"].update( + {"algorithm": CLUSTERING_ALGO, "params": CLUSTERING_PARAMS_DEFAULT} + ) + + return cluster_stats + + def _set_threshold( + self, + threshold: float, + ) -> float: + """Clamp a negative threshold to 0 (with a warning) and return it.""" + if threshold < 0: + warnings.warn( + f"Threshold {threshold} is less than 0. " + "Setting threshold to 0. " + "This may indicate that either only a few examples are in the dataset, " + "or the data is heavily skewed." + ) + threshold = 0 + return threshold
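`perform_clustering` runs scikit-learn's `DBSCAN` with `metric="precomputed"` directly on the sparse kNN distance graph, so only stored neighbor entries can make two points neighbors. A minimal standalone sketch of that idea follows. Building the graph with `NearestNeighbors.kneighbors_graph(mode="distance")` is an assumption for illustration (Datalab builds or reuses its own graph via `set_knn_graph`), and the toy points, `eps` and `min_samples` values are illustrative only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Two tight, well-separated groups of 2-D points (toy data).
X = np.array(
    [
        [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.05, 5.05],
    ]
)

# Sparse kNN graph: row i stores distances to the 3 nearest other points.
knn_graph = NearestNeighbors(n_neighbors=3).fit(X).kneighbors_graph(mode="distance")

# DBSCAN treats the sparse graph as a precomputed distance matrix.
# The copy mirrors perform_clustering, which avoids in-place modification.
clusterer = DBSCAN(eps=0.5, min_samples=3, metric="precomputed")
cluster_ids = clusterer.fit_predict(knn_graph.copy())
print(cluster_ids)  # one cluster ID per point; -1 would mark outliers
```

Clustering on the kNN graph rather than a dense distance matrix keeps memory proportional to `N * k` instead of `N ** 2`, at the cost that points outside each other's kNN lists can never be direct neighbors.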
+
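`get_worst_cluster` scores each cluster by the mean self-confidence (the predicted probability of the given label) of its members, divides the lowest cluster mean by the dataset-wide mean, and reports that cluster only if the ratio is below `threshold` (0.1 by default). The sketch below re-implements this logic in plain NumPy, together with the outlier and minimum-size filtering done by `filter_cluster_ids`. The helper name `worst_cluster` is hypothetical, and self-confidence is computed directly as `pred_probs[np.arange(n), labels]` rather than by calling `get_self_confidence_for_each_label`.

```python
from typing import Tuple

import numpy as np


def worst_cluster(
    cluster_ids: np.ndarray,
    labels: np.ndarray,
    pred_probs: np.ndarray,
    threshold: float = 0.1,
    min_cluster_samples: int = 5,
    outlier_label: int = -1,
) -> Tuple[int, float]:
    """Hypothetical helper mirroring filter_cluster_ids + get_worst_cluster."""
    # Self-confidence: predicted probability of each example's given label.
    self_conf = pred_probs[np.arange(len(labels)), labels]
    # Keep non-outlier clusters that have enough members.
    candidates = [
        c
        for c in np.unique(cluster_ids)
        if c != outlier_label and np.sum(cluster_ids == c) >= min_cluster_samples
    ]
    if not candidates:
        raise ValueError("No meaningful clusters were generated.")
    no_cluster_id = outlier_label - 1  # signals "no underperforming cluster"
    # Mean self-confidence per cluster; the lowest one is the worst cluster.
    cluster_means = {c: self_conf[cluster_ids == c].mean() for c in candidates}
    worst_id = min(cluster_means, key=cluster_means.get)
    # Compare against the dataset-wide mean, capped at 1.0.
    ratio = min(cluster_means[worst_id] / self_conf.mean(), 1.0)
    return (int(worst_id) if ratio < threshold else no_cluster_id), float(ratio)


# Toy example: the model is confident and right on cluster 0,
# but assigns only 0.02 probability to the given label in cluster 1.
cluster_ids = np.array([0] * 5 + [1] * 5)
labels = np.array([0] * 5 + [1] * 5)
pred_probs = np.array([[0.9, 0.1]] * 5 + [[0.98, 0.02]] * 5)

worst_id, ratio = worst_cluster(cluster_ids, labels, pred_probs)
print(worst_id, round(ratio, 3))
```

Here the dataset-wide mean self-confidence is 0.46 and cluster 1 averages 0.02, so the ratio (about 0.043) is below the default threshold and cluster 1 is reported as the underperforming group.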
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager_factory.html b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager_factory.html
new file mode 100644
index 000000000..10d0e0b78
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/datalab/internal/issue_manager_factory.html
@@ -0,0 +1,960 @@
+ cleanlab.datalab.internal.issue_manager_factory - cleanlab
Source code for cleanlab.datalab.internal.issue_manager_factory

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""The factory module provides a factory class for constructing concrete issue managers
+and a decorator for registering new issue managers.
+
+This module provides the :py:func:`register` decorator for users to register new subclasses of
+:py:class:`IssueManager <cleanlab.datalab.internal.issue_manager.issue_manager.IssueManager>`
+in the registry. Each IssueManager detects some particular type of issue in a dataset.
+
+
+Note
+----
+
+The :class:`REGISTRY` variable is used by the factory class to keep track
+of registered issue managers.
+The factory class is used as an implementation detail by
+:py:class:`Datalab <cleanlab.datalab.datalab.Datalab>`,
+which provides a simplified API for constructing concrete issue managers.
+:py:class:`Datalab <cleanlab.datalab.datalab.Datalab>` is intended to be used by users
+and provides detailed documentation on how to use the API.
+
+Warning
+-------
+Neither the :class:`REGISTRY` variable nor the factory class should be used directly by users.
+"""
+from __future__ import annotations
+
+from typing import Dict, List, Type
+
+from cleanlab.datalab.internal.issue_manager import (
+    ClassImbalanceIssueManager,
+    DataValuationIssueManager,
+    IssueManager,
+    LabelIssueManager,
+    NearDuplicateIssueManager,
+    NonIIDIssueManager,
+    UnderperformingGroupIssueManager,
+    OutlierIssueManager,
+    NullIssueManager,
+)
+from cleanlab.datalab.internal.issue_manager.regression import RegressionLabelIssueManager
+from cleanlab.datalab.internal.issue_manager.multilabel.label import MultilabelIssueManager
+from cleanlab.datalab.internal.task import Task
+
+
+REGISTRY: Dict[Task, Dict[str, Type[IssueManager]]] = {
+    Task.CLASSIFICATION: {
+        "outlier": OutlierIssueManager,
+        "label": LabelIssueManager,
+        "near_duplicate": NearDuplicateIssueManager,
+        "non_iid": NonIIDIssueManager,
+        "class_imbalance": ClassImbalanceIssueManager,
+        "underperforming_group": UnderperformingGroupIssueManager,
+        "data_valuation": DataValuationIssueManager,
+        "null": NullIssueManager,
+    },
+    Task.REGRESSION: {
+        "label": RegressionLabelIssueManager,
+        "outlier": OutlierIssueManager,
+        "near_duplicate": NearDuplicateIssueManager,
+        "non_iid": NonIIDIssueManager,
+        "data_valuation": DataValuationIssueManager,
+        "null": NullIssueManager,
+    },
+    Task.MULTILABEL: {
+        "label": MultilabelIssueManager,
+        "outlier": OutlierIssueManager,
+        "near_duplicate": NearDuplicateIssueManager,
+        "non_iid": NonIIDIssueManager,
+        "data_valuation": DataValuationIssueManager,
+        "null": NullIssueManager,
+    },
+}
+"""Registry of issue managers that can be constructed from a task and issue type
+and used in the Datalab class.
+
+:meta hide-value:
+
+Currently, the following issue managers are registered by default for a given task:
+
+- Classification:
+
+    - ``"outlier"``: :py:class:`OutlierIssueManager <cleanlab.datalab.internal.issue_manager.outlier.OutlierIssueManager>`
+    - ``"label"``: :py:class:`LabelIssueManager <cleanlab.datalab.internal.issue_manager.label.LabelIssueManager>`
+    - ``"near_duplicate"``: :py:class:`NearDuplicateIssueManager <cleanlab.datalab.internal.issue_manager.duplicate.NearDuplicateIssueManager>`
+    - ``"non_iid"``: :py:class:`NonIIDIssueManager <cleanlab.datalab.internal.issue_manager.noniid.NonIIDIssueManager>`
+    - ``"class_imbalance"``: :py:class:`ClassImbalanceIssueManager <cleanlab.datalab.internal.issue_manager.imbalance.ClassImbalanceIssueManager>`
+    - ``"underperforming_group"``: :py:class:`UnderperformingGroupIssueManager <cleanlab.datalab.internal.issue_manager.underperforming_group.UnderperformingGroupIssueManager>`
+    - ``"data_valuation"``: :py:class:`DataValuationIssueManager <cleanlab.datalab.internal.issue_manager.data_valuation.DataValuationIssueManager>`
+    - ``"null"``: :py:class:`NullIssueManager <cleanlab.datalab.internal.issue_manager.null.NullIssueManager>`
+    
+- Regression:
+
+    - ``"label"``: :py:class:`RegressionLabelIssueManager <cleanlab.datalab.internal.issue_manager.regression.label.RegressionLabelIssueManager>`
+    - ``"outlier"``: :py:class:`OutlierIssueManager <cleanlab.datalab.internal.issue_manager.outlier.OutlierIssueManager>`
+    - ``"near_duplicate"``: :py:class:`NearDuplicateIssueManager <cleanlab.datalab.internal.issue_manager.duplicate.NearDuplicateIssueManager>`
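+    - ``"non_iid"``: :py:class:`NonIIDIssueManager <cleanlab.datalab.internal.issue_manager.noniid.NonIIDIssueManager>`
+    - ``"data_valuation"``: :py:class:`DataValuationIssueManager <cleanlab.datalab.internal.issue_manager.data_valuation.DataValuationIssueManager>`
+    - ``"null"``: :py:class:`NullIssueManager <cleanlab.datalab.internal.issue_manager.null.NullIssueManager>`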
+
+- Multilabel:
+
+    - ``"label"``: :py:class:`MultilabelIssueManager <cleanlab.datalab.internal.issue_manager.multilabel.label.MultilabelIssueManager>`
+    - ``"outlier"``: :py:class:`OutlierIssueManager <cleanlab.datalab.internal.issue_manager.outlier.OutlierIssueManager>`
+    - ``"near_duplicate"``: :py:class:`NearDuplicateIssueManager <cleanlab.datalab.internal.issue_manager.duplicate.NearDuplicateIssueManager>`
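+    - ``"non_iid"``: :py:class:`NonIIDIssueManager <cleanlab.datalab.internal.issue_manager.noniid.NonIIDIssueManager>`
+    - ``"data_valuation"``: :py:class:`DataValuationIssueManager <cleanlab.datalab.internal.issue_manager.data_valuation.DataValuationIssueManager>`
+    - ``"null"``: :py:class:`NullIssueManager <cleanlab.datalab.internal.issue_manager.null.NullIssueManager>`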
+
+Warning
+-------
+This variable should not be used directly by users.
+"""
+
+
+# Construct concrete issue manager with a from_str method
+class _IssueManagerFactory:
+    """Factory class for constructing concrete issue managers."""
+
+    @classmethod
+    def from_str(cls, issue_type: str, task: Task) -> Type[IssueManager]:
+        """Constructs a concrete issue manager class from a string."""
+        if isinstance(issue_type, list):
+            raise ValueError(
+                "issue_type must be a string, not a list. Try using from_list instead."
+            )
+
+        if task not in REGISTRY:
+            raise ValueError(f"Invalid task type: {task}, must be in {list(REGISTRY.keys())}")
+        if issue_type not in REGISTRY[task]:
+            raise ValueError(f"Invalid issue type: {issue_type} for task {task}")
+
+        return REGISTRY[task][issue_type]
+
+    @classmethod
+    def from_list(cls, issue_types: List[str], task: Task) -> List[Type[IssueManager]]:
+        """Constructs a list of concrete issue manager classes from a list of strings."""
+        return [cls.from_str(issue_type, task) for issue_type in issue_types]
+
+
+
+def register(cls: Type[IssueManager], task: str = str(Task.CLASSIFICATION)) -> Type[IssueManager]:
+    """Registers an issue manager class in the factory's registry.
+
+    Parameters
+    ----------
+    cls :
+        A subclass of
+        :py:class:`IssueManager <cleanlab.datalab.internal.issue_manager.issue_manager.IssueManager>`.
+
+    task :
+        Specific machine learning task like classification or regression.
+        See :py:meth:`Task.from_str <cleanlab.datalab.internal.task.Task.from_str>` for details
+        on which string corresponds to which task type.
+
+    Returns
+    -------
+    cls :
+        The same class that was passed in.
+
+    Example
+    -------
+
+    When defining a new subclass of
+    :py:class:`IssueManager <cleanlab.datalab.internal.issue_manager.issue_manager.IssueManager>`,
+    you can register it like so:
+
+    .. code-block:: python
+
+        from cleanlab import IssueManager
+        from cleanlab.datalab.internal.issue_manager_factory import register
+
+        @register
+        class MyIssueManager(IssueManager):
+            issue_name: str = "my_issue"
+            def find_issues(self, **kwargs):
+                # Some logic to find issues
+                pass
+
+    or in a function call:
+
+    .. code-block:: python
+
+        from cleanlab import IssueManager
+        from cleanlab.datalab.internal.issue_manager_factory import register
+
+        class MyIssueManager(IssueManager):
+            issue_name: str = "my_issue"
+            def find_issues(self, **kwargs):
+                # Some logic to find issues
+                pass
+
+        register(MyIssueManager, task="classification")
+    """
+
+    if not issubclass(cls, IssueManager):
+        raise ValueError(f"Class {cls} must be a subclass of IssueManager")
+
+    name: str = str(cls.issue_name)
+
+    try:
+        _task = Task.from_str(task)
+        if _task not in REGISTRY:
+            raise ValueError(f"Invalid task type: {_task}, must be in {list(REGISTRY.keys())}")
+    except KeyError:
+        raise ValueError(f"Invalid task type: {task}, must be in {list(REGISTRY.keys())}")
+
+    if name in REGISTRY[_task]:
+        print(
+            f"Warning: Overwriting existing issue manager {name} with {cls} for task {_task}. "
+            "This may cause unexpected behavior."
+        )
+
+    REGISTRY[_task][name] = cls
+    return cls
+
+
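The registry plus `register` and `_IssueManagerFactory.from_str` amount to a two-level dictionary keyed first by task, then by issue name. The stdlib-only sketch below reproduces that pattern. `Task`, `IssueManager`, `register`, and `from_str` here are simplified, hypothetical stand-ins, not cleanlab's objects. For example, the task string is resolved with the standard `Enum` value lookup rather than `Task.from_str`.

```python
from enum import Enum
from typing import Dict, Type


class Task(Enum):
    # Hypothetical stand-in for Datalab's Task enum (two members only).
    CLASSIFICATION = "classification"
    REGRESSION = "regression"


class IssueManager:
    # Hypothetical minimal base class: each subclass names the issue type it detects.
    issue_name: str = ""


# Two-level lookup table: task -> {issue name -> issue manager class}, same shape as REGISTRY.
REGISTRY: Dict[Task, Dict[str, Type[IssueManager]]] = {task: {} for task in Task}


def register(cls: Type[IssueManager], task: str = "classification") -> Type[IssueManager]:
    """Store cls under its issue_name for the given task and return it unchanged."""
    if not issubclass(cls, IssueManager):
        raise ValueError(f"Class {cls} must be a subclass of IssueManager")
    _task = Task(task)  # Enum lookup by value; raises ValueError for unknown task strings
    name = str(cls.issue_name)
    if name in REGISTRY[_task]:
        print(f"Warning: Overwriting existing issue manager {name} for task {_task.value}.")
    REGISTRY[_task][name] = cls
    return cls


def from_str(issue_type: str, task: Task) -> Type[IssueManager]:
    """Return the registered class, mirroring _IssueManagerFactory.from_str."""
    if issue_type not in REGISTRY[task]:
        raise ValueError(f"Invalid issue type: {issue_type} for task {task.value}")
    return REGISTRY[task][issue_type]


@register  # decorator form: uses the default "classification" task
class MyIssueManager(IssueManager):
    issue_name = "my_issue"


class MyRegressionIssueManager(IssueManager):
    issue_name = "my_issue"


register(MyRegressionIssueManager, task="regression")  # function-call form

print(from_str("my_issue", Task.CLASSIFICATION).__name__)  # MyIssueManager
print(from_str("my_issue", Task.REGRESSION).__name__)  # MyRegressionIssueManager
```

Because each task has its own inner dictionary, the same `issue_name` can map to different classes for different tasks. This is how `"label"` resolves to a different manager for classification, regression, and multilabel data.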
+def list_possible_issue_types(task: Task) -> List[str]:
+    """Returns a list of all issue types registered for the given task.
+
+    Any issue type that is not in this list cannot be used in the :py:meth:`find_issues` method.
+
+    Parameters
+    ----------
+    task :
+        Specific machine learning task supported by Datalab.
+
+    See Also
+    --------
+    :py:class:`REGISTRY <cleanlab.datalab.internal.issue_manager_factory.REGISTRY>` : All available issue types and their corresponding issue managers can be found here.
+    """
+    return list(REGISTRY.get(task, []))
+
+
+def list_default_issue_types(task: Task) -> List[str]:
+    """Returns a list of the issue types that are run by default
+    when :py:meth:`find_issues` is called without specifying ``issue_types``.
+
+    Parameters
+    ----------
+    task :
+        Specific machine learning task supported by Datalab.
+        Unrecognized tasks fall back to the classification defaults.
+
+    See Also
+    --------
+    :py:class:`REGISTRY <cleanlab.datalab.internal.issue_manager_factory.REGISTRY>` : All available issue types and their corresponding issue managers can be found here.
+    """
+    default_issue_types_dict = {
+        Task.CLASSIFICATION: [
+            "null",
+            "label",
+            "outlier",
+            "near_duplicate",
+            "non_iid",
+            "class_imbalance",
+            "underperforming_group",
+        ],
+        Task.REGRESSION: [
+            "null",
+            "label",
+            "outlier",
+            "near_duplicate",
+            "non_iid",
+        ],
+        Task.MULTILABEL: [
+            "null",
+            "label",
+            "outlier",
+            "near_duplicate",
+            "non_iid",
+        ],
+    }
+    if task not in default_issue_types_dict:
+        task = Task.CLASSIFICATION
+    default_issue_types = default_issue_types_dict[task]
+    return default_issue_types
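Comparing the registered issue types with the default list shows that registering an issue type does not, by itself, make it run by default. For classification, `data_valuation` appears in `REGISTRY` but not in the default list, so it presumably must be requested explicitly through `issue_types`. The stdlib-only sketch below uses hypothetical string copies of the two classification lists above. `opt_in_issue_types` is an illustrative helper, not part of cleanlab.

```python
from typing import Dict, List

# Hypothetical string copies of the classification entries shown above.
REGISTERED: Dict[str, List[str]] = {
    "classification": [
        "outlier", "label", "near_duplicate", "non_iid",
        "class_imbalance", "underperforming_group", "data_valuation", "null",
    ],
}
DEFAULTS: Dict[str, List[str]] = {
    "classification": [
        "null", "label", "outlier", "near_duplicate",
        "non_iid", "class_imbalance", "underperforming_group",
    ],
}


def opt_in_issue_types(task: str) -> List[str]:
    """Registered issue types that are not part of the default run for this task."""
    # Unknown tasks use the classification defaults, as list_default_issue_types does.
    defaults = set(DEFAULTS.get(task, DEFAULTS["classification"]))
    return [name for name in REGISTERED.get(task, []) if name not in defaults]


print(opt_in_issue_types("classification"))  # ['data_valuation']
```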
+Made with Sphinx and @pradyunsg's Furo
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/model_outputs.html b/v2.6.6/_modules/cleanlab/datalab/internal/model_outputs.html
new file mode 100644
index 000000000..03898168e
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/datalab/internal/model_outputs.html
@@ -0,0 +1,811 @@
Source code for cleanlab.datalab.internal.model_outputs

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""
+This module contains the ModelOutput class, which is used internally within Datalab
+to represent model outputs (e.g. predictions, probabilities, etc.) and process them
+for issue finding.
+These classes and associated naming conventions are subject to change and are not meant
+to be used by users.
+"""
+
+
+from abc import ABC, abstractmethod
+import numpy as np
+from dataclasses import dataclass
+
+
+
+@dataclass
+class ModelOutput(ABC):
+    """
+    An abstract class for representing model outputs (e.g. predictions, probabilities, etc.)
+    for internal use within Datalab. This class is not meant to be used by users.
+
+    It is used internally within the issue-finding process Datalab runs to assign
+    types to the data and process it accordingly.
+
+    Parameters
+    ----------
+    data : array-like
+        The model outputs. Not to be confused with the data used to train the model.
+        This is mainly intended for NumPy arrays.
+    """
+
+    data: np.ndarray
+
+    @abstractmethod
+    def validate(self):
+        """
+        Validate the data format and content.
+        E.g. a pred_probs object used for classification
+        should be a 2D array with values between 0 and 1 and sum to 1 for each row.
+        """
+        pass
+
+    @abstractmethod
+    def collect(self):
+        """
+        Fetch the data for issue finding.
+        Usually this is just the data itself, but sometimes it may be a transformation
+        of the data (e.g. a 1D array of predictions from a 2D array of predicted probabilities).
+        """
+        pass
+
+
+class MultiClassPredProbs(ModelOutput):
+    """
+    A class for representing a model's predicted probabilities for each class
+    in a multi-class classification problem. This class is not meant to be used by users.
+    """
+
+    argument = "pred_probs"
+
+    def validate(self):
+        pred_probs = self.data
+        if pred_probs.ndim != 2:
+            raise ValueError("pred_probs must be a 2D array for multi-class classification")
+        if not np.all((pred_probs >= 0) & (pred_probs <= 1)):
+            incorrect_range = (np.min(pred_probs), np.max(pred_probs))
+            raise ValueError(
+                "Expected pred_probs to be between 0 and 1 for multi-class classification,"
+                f" but got values in range {incorrect_range} instead."
+            )
+        if not np.allclose(np.sum(pred_probs, axis=1), 1):
+            raise ValueError("pred_probs must sum to 1 for each row for multi-class classification")
+
+    def collect(self):
+        return self.data
+
+
+class RegressionPredictions(ModelOutput):
+    """
+    A class for representing a model's predictions for a regression problem.
+    This class is not meant to be used by users.
+    """
+
+    argument = "predictions"
+
+    def validate(self):
+        predictions = self.data
+        if predictions.ndim != 1:
+            raise ValueError("predictions must be a 1D array for regression")
+
+    def collect(self):
+        return self.data
+
+
+class MultiLabelPredProbs(ModelOutput):
+    """
+    A class for representing a model's predicted probabilities for each class
+    in a multilabel classification problem. This class is not meant to be used by users.
+    """
+
+    argument = "pred_probs"
+
+    def validate(self):
+        pred_probs = self.data
+        if pred_probs.ndim != 2:
+            raise ValueError(
+                "Expected pred_probs to be a 2D array for multi-label classification,"
+                f" but got {pred_probs.ndim}D array instead."
+            )
+        if not np.all((pred_probs >= 0) & (pred_probs <= 1)):
+            incorrect_range = (np.min(pred_probs), np.max(pred_probs))
+            raise ValueError(
+                "Expected pred_probs to be between 0 and 1 for multi-label classification,"
+                f" but got values in range {incorrect_range} instead."
+            )
+
+    def collect(self):
+        return self.data
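The two `validate` methods differ in one respect. Multi-class probabilities must sum to 1 across each row, while multilabel probabilities need not, because each column is an independent binary probability. The sketch below restates both sets of checks as standalone functions to show that difference. `check_multiclass_pred_probs`, `check_multilabel_pred_probs`, and `is_valid` are hypothetical helpers, not cleanlab API.

```python
import numpy as np


def check_multiclass_pred_probs(pred_probs: np.ndarray) -> None:
    # Same three checks as MultiClassPredProbs.validate: 2D, values in [0, 1], rows sum to 1.
    if pred_probs.ndim != 2:
        raise ValueError("pred_probs must be a 2D array for multi-class classification")
    if not np.all((pred_probs >= 0) & (pred_probs <= 1)):
        raise ValueError("pred_probs values must lie in [0, 1]")
    if not np.allclose(pred_probs.sum(axis=1), 1):
        raise ValueError("pred_probs must sum to 1 for each row")


def check_multilabel_pred_probs(pred_probs: np.ndarray) -> None:
    # MultiLabelPredProbs.validate keeps the shape and range checks but has no row-sum check.
    if pred_probs.ndim != 2:
        raise ValueError("pred_probs must be a 2D array for multi-label classification")
    if not np.all((pred_probs >= 0) & (pred_probs <= 1)):
        raise ValueError("pred_probs values must lie in [0, 1]")


def is_valid(check, pred_probs: np.ndarray) -> bool:
    # Convenience wrapper: True if the check passes, False if it raises ValueError.
    try:
        check(pred_probs)
        return True
    except ValueError:
        return False


multilabel_like = np.array([[0.9, 0.8], [0.1, 0.3]])  # rows sum to 1.7 and 0.4
print(is_valid(check_multiclass_pred_probs, multilabel_like))  # False
print(is_valid(check_multilabel_pred_probs, multilabel_like))  # True
```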
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/report.html b/v2.6.6/_modules/cleanlab/datalab/internal/report.html
new file mode 100644
index 000000000..97a27320f
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/datalab/internal/report.html
@@ -0,0 +1,888 @@
Source code for cleanlab.datalab.internal.report

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""
+Module that handles reporting of all types of issues identified in the data.
+"""
+
+from typing import TYPE_CHECKING, List
+
+import pandas as pd
+
+from cleanlab.datalab.internal.adapter.constants import DEFAULT_CLEANVISION_ISSUES
+from cleanlab.datalab.internal.issue_manager_factory import _IssueManagerFactory
+from cleanlab.datalab.internal.task import Task
+
+if TYPE_CHECKING:  # pragma: no cover
+    from cleanlab.datalab.internal.data_issues import DataIssues
+
+
+
+class Reporter:
+    """Class that generates a report about the issues stored in a :py:class:`DataIssues` object.
+
+    Parameters
+    ----------
+    data_issues :
+        The :py:class:`DataIssues` object containing the issues to report on. This is usually
+        generated by the :py:class:`Datalab` class, stored in the :py:attr:`data_issues` attribute,
+        and then passed to the :py:class:`Reporter` class to generate a report.
+
+    task :
+        Specific machine learning task that the dataset is intended for.
+        See details about supported tasks in :py:class:`Task <cleanlab.datalab.internal.task.Task>`.
+
+    verbosity :
+        The default verbosity of the report to generate. Each :py:class:`IssueManager`
+        specifies the available verbosity levels and what additional information
+        is included at each level.
+
+    include_description :
+        Whether to include the description of each issue type in the report. The description
+        is included by default, but can be excluded by setting this parameter to ``False``.
+
+    show_summary_score :
+        Whether to include the ``score`` column for each issue type in the summary table.
+        The column is excluded by default.
+
+    show_all_issues :
+        Whether to include issue types with no detected issues in the report. By default,
+        only issue types with at least one detected issue are shown.
+
+    Note
+    ----
+    This class is not intended to be used directly. Instead, use the
+    `Datalab.report` method, which internally utilizes a Reporter instance.
+    """
+
+    def __init__(
+        self,
+        data_issues: "DataIssues",
+        task: Task,
+        verbosity: int = 1,
+        include_description: bool = True,
+        show_summary_score: bool = False,
+        show_all_issues: bool = False,
+        **kwargs,
+    ):
+        self.data_issues = data_issues
+        self.task = task
+        self.verbosity = verbosity
+        self.include_description = include_description
+        self.show_summary_score = show_summary_score
+        self.show_all_issues = show_all_issues
+
+    def _get_empty_report(self) -> str:
+        """This method is used to return a report when there are
+        no issues found in the data with Datalab.find_issues().
+        """
+        report_str = "No issues found in the data. Good job!"
+        if not self.show_summary_score:
+            recommendation_msg = (
+                "Try re-running Datalab.report() with "
+                "`show_summary_score = True` and `show_all_issues = True`."
+            )
+            report_str += f"\n\n{recommendation_msg}"
+        return report_str
+
+    def report(self, num_examples: int) -> None:
+        """Prints a report about identified issues in the data.
+
+        Parameters
+        ----------
+        num_examples :
+            The number of examples to include in the report for each issue type.
+        """
+        print(self.get_report(num_examples=num_examples))
+
+    def get_report(self, num_examples: int) -> str:
+        """Constructs a report about identified issues in the data.
+
+        Parameters
+        ----------
+        num_examples :
+            The number of examples to include in the report for each issue type.
+
+        Returns
+        -------
+        report_str :
+            A string containing the report.
+
+        Examples
+        --------
+        >>> from cleanlab.datalab.internal.report import Reporter
+        >>> from cleanlab.datalab.internal.task import Task
+        >>> reporter = Reporter(data_issues=data_issues, task=Task.CLASSIFICATION, include_description=False)
+        >>> report_str = reporter.get_report(num_examples=5)
+        >>> print(report_str)
+        """
+        report_str = ""
+        issue_summary = self.data_issues.issue_summary
+        should_return_empty_report = not (
+            self.show_all_issues or issue_summary.empty or issue_summary["num_issues"].sum() > 0
+        )
+
+        if should_return_empty_report:
+            return self._get_empty_report()
+        issue_summary_sorted = issue_summary.sort_values(by="num_issues", ascending=False)
+        report_str += self._write_summary(summary=issue_summary_sorted)
+
+        issue_types = self._get_issue_types(issue_summary_sorted)
+
+        def add_issue_to_report(issue_name: str) -> bool:
+            """Returns True if the issue should be added to the report.
+            It is excluded if show_all_issues is False and there are no issues of that type
+            found in the data.
+            """
+            if self.show_all_issues:
+                return True
+            summary = self.data_issues.get_issue_summary(issue_name=issue_name)
+            has_issues = summary["num_issues"][0] > 0
+            return has_issues
+
+        issue_reports = [
+            _IssueManagerFactory.from_str(issue_type=key, task=self.task).report(
+                issues=self.data_issues.get_issues(issue_name=key),
+                summary=self.data_issues.get_issue_summary(issue_name=key),
+                info=self.data_issues.get_info(issue_name=key),
+                num_examples=num_examples,
+                verbosity=self.verbosity,
+                include_description=self.include_description,
+            )
+            for key in issue_types
+        ]
+
+        report_str += "\n\n\n".join(issue_reports)
+        return report_str
+
+    def _write_summary(self, summary: pd.DataFrame) -> str:
+        statistics = self.data_issues.get_info("statistics")
+        num_examples = statistics["num_examples"]
+        num_classes = statistics.get(
+            "num_classes"
+        )  # This may not be required for all types of datasets in the future (e.g. unlabeled/regression)
+
+        dataset_information = f"Dataset Information: num_examples: {num_examples}"
+        if num_classes is not None:
+            dataset_information += f", num_classes: {num_classes}"
+
+        if not self.show_all_issues:
+            # Drop any items in the issue_summary that have no issues (any issue detected in data needs to have num_issues > 0)
+            summary = summary.query("num_issues > 0")
+
+        report_header = (
+            f"{dataset_information}\n\n"
+            + "Here is a summary of various issues found in your data:\n\n"
+        )
+        report_footer = (
+            "\n\n"
+            + "Learn about each issue: https://docs.cleanlab.ai/stable/cleanlab/datalab/guide/issue_type_description.html\n"
+            + "See which examples in your dataset exhibit each issue via: `datalab.get_issues(<ISSUE_NAME>)`\n\n"
+            + "Data indices corresponding to top examples of each issue are shown below.\n\n\n"
+        )
+
+        if self.show_summary_score:
+            return (
+                report_header
+                + summary.to_string(index=False)
+                + "\n\n"
+                + "(Note: A lower score indicates a more severe issue across all examples in the dataset.)"
+                + report_footer
+            )
+
+        return (
+            report_header + summary.drop(columns=["score"]).to_string(index=False) + report_footer
+        )
+
+    def _get_issue_types(self, issue_summary: pd.DataFrame) -> List[str]:
+        issue_types = [
+            issue_type
+            for issue_type, num_issues in zip(
+                issue_summary["issue_type"].tolist(), issue_summary["num_issues"].tolist()
+            )
+            if issue_type not in DEFAULT_CLEANVISION_ISSUES
+            and (self.show_all_issues or num_issues > 0)
+        ]
+        return issue_types
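`get_report` sorts the summary by `num_issues` in descending order. `_get_issue_types` then keeps only issue types that CleanVision does not handle and, unless `show_all_issues` is set, that have at least one detected issue. The sketch below reproduces that selection on plain `(issue_type, num_issues)` tuples instead of a DataFrame. `CLEANVISION_ISSUES` is a hypothetical placeholder for `DEFAULT_CLEANVISION_ISSUES`, and `issue_types_to_report` is illustrative only.

```python
from typing import List, Tuple

# Hypothetical placeholder for DEFAULT_CLEANVISION_ISSUES (the real set lives in cleanlab).
CLEANVISION_ISSUES = {"dark", "blurry"}


def issue_types_to_report(summary: List[Tuple[str, int]], show_all_issues: bool = False) -> List[str]:
    """Mimic get_report + _get_issue_types on (issue_type, num_issues) pairs."""
    # get_report sorts the summary by num_issues, descending, before selecting issue types.
    ordered = sorted(summary, key=lambda row: row[1], reverse=True)
    return [
        issue_type
        for issue_type, num_issues in ordered
        if issue_type not in CLEANVISION_ISSUES and (show_all_issues or num_issues > 0)
    ]


summary = [("label", 12), ("outlier", 0), ("dark", 3), ("near_duplicate", 5)]
print(issue_types_to_report(summary))  # ['label', 'near_duplicate']
print(issue_types_to_report(summary, show_all_issues=True))  # ['label', 'near_duplicate', 'outlier']
```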
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/datalab/internal/task.html b/v2.6.6/_modules/cleanlab/datalab/internal/task.html
new file mode 100644
index 000000000..bc4266666
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/datalab/internal/task.html
@@ -0,0 +1,823 @@
Source code for cleanlab.datalab.internal.task

+# Copyright (C) 2017-2024  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""
+This module contains the Task enum, which internally represents the tasks
+supported by Datalab, so that the appropriate task-specific logic can be applied.
+This class and associated naming conventions are subject to change and are not meant
+to be used by users.
+"""
+from enum import Enum
+
+
+
+class Task(Enum):
+    """
+    Represents a task supported by Datalab.
+
+    Datalab supports the following tasks:
+
+    * **Classification**: for predicting discrete class labels.
+    * **Regression**: for predicting continuous numerical values.
+    * **Multilabel**: for predicting multiple binary labels simultaneously.
+
+    Example
+    -------
+    >>> task = Task.CLASSIFICATION
+    >>> task
+    <Task.CLASSIFICATION: 'classification'>
+    """
+
+    CLASSIFICATION = "classification"
+    """Classification task."""
+    REGRESSION = "regression"
+    """Regression task."""
+    MULTILABEL = "multilabel"
+    """Multilabel task."""
+
+    def __str__(self):
+        """
+        Returns the string representation of the task.
+
+        Returns
+        -------
+        str :
+            The string representation of the task.
+        """
+        return self.value
+
+    @classmethod
+    def from_str(cls, task_str: str) -> "Task":
+        """
+        Converts a string representation of a task to a Task enum value.
+
+        Parameters
+        ----------
+        task_str :
+            The string representation of the task.
+
+        Returns
+        -------
+        Task :
+            The corresponding Task enum value.
+
+        Raises
+        ------
+        ValueError :
+            If the provided task_str is not a valid task supported by Datalab.
+
+        Examples
+        --------
+        >>> Task.from_str("classification")
+        <Task.CLASSIFICATION: 'classification'>
+        >>> print(Task.from_str("regression"))
+        regression
+        """
+        _value_to_enum = {task.value: task for task in Task}
+        try:
+            return _value_to_enum[task_str]
+        except KeyError:
+            valid_tasks = list(_value_to_enum.keys())
+            raise ValueError(f"Invalid task: {task_str}. Datalab only supports {valid_tasks}.")
+
+    @property
+    def is_classification(self):
+        """
+        Checks if the task is classification.
+
+        Returns
+        -------
+        bool :
+            True if the task is classification, False otherwise.
+
+        Examples
+        --------
+        >>> task = Task.CLASSIFICATION
+        >>> print(task.is_classification)
+        True
+        """
+        return self == Task.CLASSIFICATION
+
+    @property
+    def is_regression(self):
+        """
+        Checks if the task is regression.
+
+        Returns
+        -------
+        bool :
+            True if the task is regression, False otherwise.
+
+        Examples
+        --------
+        >>> task = Task.CLASSIFICATION
+        >>> print(task.is_regression)
+        False
+        """
+        return self == Task.REGRESSION
+
+    @property
+    def is_multilabel(self):
+        """
+        Checks if the task is multilabel.
+
+        Returns
+        -------
+        bool :
+            True if the task is multilabel, False otherwise.
+
+        Examples
+        --------
+        >>> task = Task.CLASSIFICATION
+        >>> print(task.is_multilabel)
+        False
+        """
+        return self == Task.MULTILABEL
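`from_str` builds a value-to-member dictionary by hand. The standard library's `Enum` call syntax, `Task("regression")`, already performs the same by-value lookup and raises `ValueError` for unknown values. An equivalent sketch can therefore delegate to it and only rewrite the error message. This is an alternative formulation on a hypothetical re-declared enum, not cleanlab's implementation.

```python
from enum import Enum


class Task(Enum):
    # Hypothetical re-declaration with the same values as Datalab's Task enum.
    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    MULTILABEL = "multilabel"

    def __str__(self) -> str:
        return self.value

    @classmethod
    def from_str(cls, task_str: str) -> "Task":
        # Enum(value) looks a member up by its value; re-raise with the supported task strings.
        try:
            return cls(task_str)
        except ValueError:
            valid_tasks = [task.value for task in cls]
            raise ValueError(
                f"Invalid task: {task_str}. Datalab only supports {valid_tasks}."
            ) from None


task = Task.from_str("regression")
print(task is Task.REGRESSION, str(task))  # True regression
```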
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/dataset.html b/v2.6.6/_modules/cleanlab/dataset.html
new file mode 100644
index 000000000..0034bfc5e
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/dataset.html
@@ -0,0 +1,1209 @@
Source code for cleanlab.dataset

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Provides dataset-level and class-level overviews of issues in your classification dataset.
+If your task allows you to modify the classes in your dataset, this module can help you determine
+which classes to remove (see `~cleanlab.dataset.rank_classes_by_label_quality`)
+and which classes to merge (see `~cleanlab.dataset.find_overlapping_classes`).
+"""
+
+from typing import Optional, cast
+import numpy as np
+import pandas as pd
+
+from cleanlab.count import estimate_joint, num_label_issues
+from cleanlab.internal.constants import EPSILON
+
+
+
+def rank_classes_by_label_quality(
+    labels=None,
+    pred_probs=None,
+    *,
+    class_names=None,
+    num_examples=None,
+    joint=None,
+    confident_joint=None,
+    multi_label=False,
+) -> pd.DataFrame:
+    """
+    Returns a Pandas DataFrame with all classes and three overall class label quality scores
+    (details about each score are listed in the Returns section). By default, classes are ordered
+    by "Label Quality Score", ascending, so the most problematic classes are reported first.
+
+    Score values are unnormalized and may tend to be very small. What matters is their relative
+    ranking across the classes.
+
+    This method works by providing any one (and only one) of the following inputs:
+
+    1. ``labels`` and ``pred_probs``, or
+    2. ``joint`` and ``num_examples``, or
+    3. ``confident_joint``
+
+    Only provide **exactly one of the above input options**, do not provide a combination.
+
+    Examples
+    --------
+    >>> from cleanlab.dataset import rank_classes_by_label_quality
+    >>> from sklearn.linear_model import LogisticRegression
+    >>> from sklearn.model_selection import cross_val_predict
+    >>> data, labels = get_data_labels_from_dataset()
+    >>> yourFavoriteModel = LogisticRegression()
+    >>> pred_probs = cross_val_predict(yourFavoriteModel, data, labels, cv=3, method="predict_proba")
+    >>> df = rank_classes_by_label_quality(labels=labels, pred_probs=pred_probs)
+
+    **Parameters**: For parameter info, see the docstring of `~cleanlab.dataset.find_overlapping_classes`.
+
+    Returns
+    -------
+    overall_label_quality : pd.DataFrame
+        Pandas DataFrame with columns "Class Index", "Label Issues", "Inverse Label Issues",
+        "Label Noise", "Inverse Label Noise", "Label Quality Score"
+        (preceded by a "Class Name" column if ``class_names`` is provided),
+        with a description of each of these columns below.
+        The length of the DataFrame is ``num_classes`` (one row per class).
+        Noise scores are between 0 and 1, where 0 implies no label issues
+        in the class. The "Label Quality Score" is also between 0 and 1 where 1 implies
+        perfect quality. Columns:
+
+        * *Class Index*: The index of the class in 0, 1, ..., K-1.
+        * *Label Issues*: ``count(given_label = k, true_label != k)``, estimated number of examples in the dataset that are labeled as class k but should have a different label.
+        * *Inverse Label Issues*: ``count(given_label != k, true_label = k)``, estimated number of examples in the dataset that should actually be labeled as class k but have been given another label.
+        * *Label Noise*: ``prob(true_label != k | given_label = k)``, estimated proportion of examples in the dataset that are labeled as class k but should have a different label. For each class k: this is computed by dividing the number of examples with "Label Issues" that were labeled as class k by the total number of examples labeled as class k.
+        * *Inverse Label Noise*: ``prob(given_label != k | true_label = k)``, estimated proportion of examples in the dataset that should actually be labeled as class k but have been given another label.
+        * *Label Quality Score*: ``p(true_label = k | given_label = k)``. This is the proportion of examples with given label k that have been labeled correctly, i.e. ``1 - label_noise``.
+
+        By default, the DataFrame is ordered by "Label Quality Score", ascending.
+    """
+    if multi_label:
+        raise ValueError(
+            "For multilabel data, please instead call: multilabel_classification.dataset.overall_multilabel_health_score()"
+        )
+
+    if joint is None:
+        joint = estimate_joint(
+            labels=labels,
+            pred_probs=pred_probs,
+            confident_joint=confident_joint,
+        )
+    if num_examples is None:
+        num_examples = _get_num_examples(labels=labels)
+    given_label_noise = joint.sum(axis=1) - joint.diagonal()  # p(s=k) - p(s=k,y=k) = p(y!=k, s=k)
+    true_label_noise = joint.sum(axis=0) - joint.diagonal()  # p(y=k) - p(s=k,y=k) = p(s!=k,y=k)
+    given_conditional_noise = given_label_noise / np.clip(
+        joint.sum(axis=1), a_min=EPSILON, a_max=None
+    )  # p(y!=k, s=k) / p(s=k) , avoiding division by 0
+    true_conditional_noise = true_label_noise / np.clip(
+        joint.sum(axis=0), a_min=EPSILON, a_max=None
+    )  # p(s!=k, y=k) / p(y=k) , avoiding division by 0
+    df = pd.DataFrame(
+        {
+            "Class Index": np.arange(len(joint)),
+            "Label Issues": (given_label_noise * num_examples).round().astype(int),
+            "Inverse Label Issues": (true_label_noise * num_examples).round().astype(int),
+            "Label Noise": given_conditional_noise,  # p(y!=k | s=k)
+            "Inverse Label Noise": true_conditional_noise,  # p(s!=k | y=k)
+            # Below could equivalently be computed as: joint.diagonal() / joint.sum(axis=1)
+            "Label Quality Score": 1 - given_conditional_noise,  # p(y=k | s=k)
+        }
+    )
+    if class_names is not None:
+        df.insert(loc=0, column="Class Name", value=class_names)
+    return df.sort_values(by="Label Quality Score", ascending=True).reset_index(drop=True)
+
+
[docs]def find_overlapping_classes( + labels=None, + pred_probs=None, + *, + asymmetric=False, + class_names=None, + num_examples=None, + joint=None, + confident_joint=None, + multi_label=False, +) -> pd.DataFrame: + """Returns the pairs of classes that are often mislabeled as one another. + Consider merging each of the top pairs of classes returned by this method into a single class. + If the dataset is labeled by human annotators, consider clearly defining the + difference between the classes prior to having annotators label the data. + + This method provides two scores in the Pandas DataFrame that is returned: + + * **Num Overlapping Examples**: The estimated number of examples where the two classes overlap. + * **Joint Probability**: ``num overlapping examples / total number of examples in the dataset``. + + This method works by providing any one (and only one) of the following inputs: + + 1. ``labels`` and ``pred_probs``, or + 2. ``joint`` and ``num_examples``, or + 3. ``confident_joint`` + + Only provide **exactly one of the above input options**, do not provide a combination. + + This method uses the joint distribution of noisy and true labels to compute ontological + issues via the approach published in `Northcutt et al., + 2021 <https://jair.org/index.php/jair/article/view/12125>`_. + + Examples + -------- + >>> from cleanlab.dataset import find_overlapping_classes + >>> from sklearn.linear_model import LogisticRegression + >>> from sklearn.model_selection import cross_val_predict + >>> data, labels = get_data_labels_from_dataset() + >>> yourFavoriteModel = LogisticRegression() + >>> pred_probs = cross_val_predict(yourFavoriteModel, data, labels, cv=3, method="predict_proba") + >>> df = find_overlapping_classes(labels=labels, pred_probs=pred_probs) + + Note + ---- + The joint distribution of noisy and true labels is asymmetric, so in general the joint + probability ``p(given="car", true="truck") != p(given="truck", true="car")``. + This is intuitive.
Images of trucks (true label) are much more likely to be labeled as a car + (given label) than images of cars (true label) are to be labeled as a truck (given + label). cleanlab captures these differences via the joint + distribution. Set ``asymmetric=True`` to report both directions of each class pair separately; with the default ``asymmetric=False``, the two directions are summed into a single score per pair. + + This method estimates how often the annotators confuse two classes. + This differs from just using a similarity matrix or confusion matrix, + as these summarize characteristics of the predictive model rather than the data labelers (i.e. annotators). + In contrast, this method works even if the model that generated `pred_probs` tends to be more confident in some classes than others. + + Parameters + ---------- + labels : np.ndarray or list, optional + An array_like (of length N) of noisy labels for the classification dataset, i.e. some labels may be erroneous. + Elements must be integers in the set 0, 1, ..., K-1, where K is the number of classes. + All the classes (0, 1, ..., and K-1) should be present in ``labels``, such that + ``len(set(labels)) == pred_probs.shape[1]`` for standard multi-class classification with single-labeled data (e.g. ``labels = [1,0,2,1,1,0...]``). + For multi-label classification where each example can belong to multiple classes (e.g. ``labels = [[1,2],[1],[0],[],...]``), + your labels should instead satisfy: ``len(set(k for l in labels for k in l)) == pred_probs.shape[1]``. + + pred_probs : np.ndarray, optional + An array of shape ``(N, K)`` of model-predicted probabilities, + ``P(label=k|x)``. Each row of this matrix corresponds + to an example `x` and contains the model-predicted probabilities that + `x` belongs to each possible class, for each of the K classes. The + columns must be ordered such that these probabilities correspond to + class 0, 1, ..., K-1. `pred_probs` should have been computed using 3 (or + higher) fold cross-validation.
+ + asymmetric : bool, optional + If ``asymmetric=True``, returns separate estimates for both pairs (class1, class2) and (class2, class1). Use this + for finding "is a" relationships where for example "class1 is a class2". + In this case, num overlapping examples counts the number of examples that have been labeled as class1 which should actually have been labeled as class2. + If ``asymmetric=False``, each pair of classes is returned only once (with Class Index A < Class Index B). + In this case, their estimated score is the sum: ``score(class1, class2) + score(class2, class1)``. + + class_names : Iterable[str], optional + A list or other iterable of the string class names. The list should be in the order that + matches the class indices. So if class 0 is 'dog' and class 1 is 'cat', then + ``class_names = ['dog', 'cat']``. + + num_examples : int or None, optional + The number of examples in the dataset, i.e. ``len(labels)``. You only need to provide this if + you use this function with the joint, e.g. ``find_overlapping_classes(joint=joint, num_examples=num_examples)``, otherwise + this is automatically computed via ``sum(confident_joint)`` or ``len(labels)``. + + joint : np.ndarray, optional + An array of shape ``(K, K)``, where K is the number of classes, + representing the estimated joint distribution of the noisy labels and + true labels. The sum of all entries in this matrix must be 1 (valid + probability distribution). Entry ``(i, j)`` in the matrix is the co-occurrence joint + probability of a noisy label and a true label, i.e. ``p(noisy_label=i, true_label=j)``. + **Important**: if you input the joint, you must also input `num_examples`. + + confident_joint : np.ndarray, optional + An array of shape ``(K, K)`` representing the confident joint, the matrix used for identifying label issues, which + estimates a confident subset of the joint distribution of the noisy and true labels, ``P_{noisy label, true label}``.
+ Entry ``(j, k)`` in the matrix is the number of examples confidently counted into the pair of ``(noisy label=j, true label=k)`` classes. + The `confident_joint` can be computed using :py:func:`count.compute_confident_joint <cleanlab.count.compute_confident_joint>`. + If not provided, it is computed from the given (noisy) `labels` and `pred_probs`. + + Returns + ------- + overlapping_classes : pd.DataFrame + Pandas DataFrame with columns "Class Index A", "Class Index B", + "Num Overlapping Examples", "Joint Probability" and a description of each below. + Each row corresponds to a pair of classes. + + * *Class Index A*: the index of a class in 0, 1, ..., K-1. + * *Class Index B*: the index of a different class (from Class A) in 0, 1, ..., K-1. + * *Num Overlapping Examples*: estimated number of labels overlapping between the two classes. + * *Joint Probability*: the *Num Overlapping Examples* divided by the number of examples in the dataset. + + By default, the DataFrame is ordered by "Joint Probability" descending. + """ + + def _2d_matrix_to_row_column_value_list(matrix): + """Create a list<tuple> [(row_index, col_index, value)] representation of matrix. + + Parameters + ---------- + matrix : np.ndarray<float> + Any valid np.ndarray 2-d dimensional matrix. + + Returns + ------- + list<tuple> + A [(row_index, col_index, value)] representation of matrix. 
+ """ + + return [(*i, v) for i, v in np.ndenumerate(matrix)] + + if multi_label: + raise ValueError( + "For multilabel data, please instead call: multilabel_classification.dataset.common_multilabel_issues()" + ) + + if joint is None: + joint = estimate_joint( + labels=labels, + pred_probs=pred_probs, + confident_joint=confident_joint, + ) + if num_examples is None: + num_examples = _get_num_examples(labels=labels, confident_joint=confident_joint) + if asymmetric: + rcv_list = _2d_matrix_to_row_column_value_list(joint) + # Remove diagonal elements + rcv_list = [tup for tup in rcv_list if tup[0] != tup[1]] + else: # symmetric + # Sum the upper and lower triangles and remove the lower triangle and the diagonal + sym_joint = np.triu(joint) + np.tril(joint).T + rcv_list = _2d_matrix_to_row_column_value_list(sym_joint) + # Provide values only in (the upper triangle) of the matrix. + rcv_list = [tup for tup in rcv_list if tup[0] < tup[1]] + df = pd.DataFrame(rcv_list, columns=["Class Index A", "Class Index B", "Joint Probability"]) + num_overlapping = (df["Joint Probability"] * num_examples).round().astype(int) + df.insert(loc=2, column="Num Overlapping Examples", value=num_overlapping) + if class_names is not None: + df.insert( + loc=0, column="Class Name A", value=df["Class Index A"].apply(lambda x: class_names[x]) + ) + df.insert( + loc=1, column="Class Name B", value=df["Class Index B"].apply(lambda x: class_names[x]) + ) + return df.sort_values(by="Joint Probability", ascending=False).reset_index(drop=True)
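The symmetric branch above folds the lower triangle of the joint onto its upper triangle before listing class pairs. The NumPy sketch below replays both branches on a made-up 3-class joint with an assumed `num_examples = 1000`. It omits the pandas DataFrame and the optional class-name columns.

```python
import numpy as np

# Hypothetical joint p(given_label=i, true_label=j) and dataset size.
joint = np.array([
    [0.30, 0.05, 0.00],
    [0.02, 0.28, 0.05],
    [0.00, 0.05, 0.25],
])
num_examples = 1000
K = len(joint)

# asymmetric=True: every off-diagonal cell (given=a, true=b) is its own row.
asym_pairs = [(a, b, joint[a, b]) for a in range(K) for b in range(K) if a != b]

# asymmetric=False: entry (a, b) with a < b becomes p(a, b) + p(b, a).
sym_joint = np.triu(joint) + np.tril(joint).T
sym_pairs = [(a, b, sym_joint[a, b]) for a in range(K) for b in range(K) if a < b]

# Sort by joint probability, descending, and convert to example counts.
sym_pairs.sort(key=lambda pair: pair[2], reverse=True)
num_overlapping = [int(round(p * num_examples)) for _, _, p in sym_pairs]
print([(a, b) for a, b, _ in sym_pairs], num_overlapping)
```

Here classes 1 and 2 overlap the most (about 100 examples in total across both directions), which is the pair the function would suggest considering for a merge.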
+ + +
[docs]def overall_label_health_score( + labels=None, + pred_probs=None, + *, + num_examples=None, + confident_joint=None, + joint=None, + multi_label=False, + verbose=True, +) -> float: + """Returns a single score between 0 and 1 measuring the overall quality of all labels in a dataset. + Intuitively, the score is the average correctness of the given labels across all examples in the + dataset. So a score of 1 suggests your data is perfectly labeled and a score of 0.5 suggests + half of the examples in the dataset may be incorrectly labeled. Thus, a higher + score implies a higher quality dataset. + + This method works by providing any one (and only one) of the following inputs: + + 1. ``labels`` and ``pred_probs``, or + 2. ``joint`` and ``num_examples``, or + 3. ``confident_joint`` + + Only provide **exactly one of the above input options**, do not provide a combination. + + Examples + -------- + >>> from cleanlab.dataset import overall_label_health_score + >>> from sklearn.linear_model import LogisticRegression + >>> from sklearn.model_selection import cross_val_predict + >>> data, labels = get_data_labels_from_dataset() + >>> yourFavoriteModel = LogisticRegression() + >>> pred_probs = cross_val_predict(yourFavoriteModel, data, labels, cv=3, method="predict_proba") + >>> score = overall_label_health_score(labels=labels, pred_probs=pred_probs) # doctest: +SKIP + + **Parameters**: For parameter info, see the docstring of `~cleanlab.dataset.find_overlapping_classes`. + + + Returns + ------- + health_score : float + A score between 0 and 1, where 1 implies all labels in the dataset are estimated to be correct. + A score of 0.5 implies that half of the dataset's labels are estimated to have issues. 
+ """ + if multi_label: + raise ValueError( + "For multilabel data, please instead call: multilabel_classification.dataset.overall_multilabel_health_score()" + ) + if num_examples is None: + num_examples = _get_num_examples(labels=labels, confident_joint=confident_joint) + + if pred_probs is None or labels is None: + if joint is None: + joint = estimate_joint( + labels=labels, + pred_probs=pred_probs, + confident_joint=confident_joint, + ) + joint_trace = joint.trace() + num_issues = (num_examples * (1 - joint_trace)).round().astype(int) + health_score = joint_trace + else: + num_issues = num_label_issues( + labels=labels, pred_probs=pred_probs, confident_joint=confident_joint + ) + health_score = 1 - num_issues / num_examples + + if verbose: + print( + f" * Overall, about {(1 - health_score):.0%} ({num_issues:,} of the {num_examples:,}) " + f"labels in your dataset have potential issues.\n" + f" ** The overall label health score for this dataset is: {health_score:.2f}." + ) + return health_score
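When only a joint (plus `num_examples`) is available, the function above takes its first branch: the health score is the trace of the joint, i.e. the estimated fraction of examples whose given label matches the true label. A minimal worked example with a made-up joint, reproducing the verbose message format:

```python
import numpy as np

# Hypothetical joint p(given_label=i, true_label=j) and dataset size.
joint = np.array([
    [0.30, 0.05, 0.00],
    [0.02, 0.28, 0.05],
    [0.00, 0.05, 0.25],
])
num_examples = 1000

health_score = joint.trace()  # sum over classes k of p(given = k, true = k)
num_issues = int(round(num_examples * (1 - health_score)))

print(
    f" * Overall, about {(1 - health_score):.0%} ({num_issues:,} of the {num_examples:,}) "
    f"labels in your dataset have potential issues.\n"
    f" ** The overall label health score for this dataset is: {health_score:.2f}."
)
```

When both `labels` and `pred_probs` are given, the function instead uses `1 - num_label_issues(...) / num_examples`, so the two branches need not produce identical scores for the same dataset.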
+ + +
[docs]def health_summary( + labels=None, + pred_probs=None, + *, + asymmetric=False, + class_names=None, + num_examples=None, + joint=None, + confident_joint=None, + multi_label=False, + verbose=True, +) -> dict: + """Prints a health summary of your dataset. + + This summary includes useful statistics like: + + * The classes with the most and least label issues. + * Classes that overlap and could potentially be merged. + * Overall label quality scores, summarizing how accurate the labels appear overall. + + This method works by providing any one (and only one) of the following inputs: + + 1. ``labels`` and ``pred_probs``, or + 2. ``joint`` and ``num_examples``, or + 3. ``confident_joint`` + + Only provide **exactly one of the above input options**, do not provide a combination. + + Examples + -------- + >>> from cleanlab.dataset import health_summary + >>> from sklearn.linear_model import LogisticRegression + >>> from sklearn.model_selection import cross_val_predict + >>> data, labels = get_data_labels_from_dataset() + >>> yourFavoriteModel = LogisticRegression() + >>> pred_probs = cross_val_predict(yourFavoriteModel, data, labels, cv=3, method="predict_proba") + >>> summary = health_summary(labels=labels, pred_probs=pred_probs) # doctest: +SKIP + + **Parameters**: For parameter info, see the docstring of `~cleanlab.dataset.find_overlapping_classes`. 
+ + Returns + ------- + summary : dict + A dictionary containing keys (see the corresponding functions' documentation to understand the values): + + - ``"overall_label_health_score"``, corresponding to `~cleanlab.dataset.overall_label_health_score` + - ``"joint"``, corresponding to :py:func:`count.estimate_joint <cleanlab.count.estimate_joint>` + - ``"classes_by_label_quality"``, corresponding to `~cleanlab.dataset.rank_classes_by_label_quality` + - ``"overlapping_classes"``, corresponding to `~cleanlab.dataset.find_overlapping_classes` + """ + from cleanlab.internal.util import smart_display_dataframe + + if multi_label: + raise ValueError( + "For multilabel data, please call multilabel_classification.dataset.multilabel_health_summary" + ) + if joint is None: + joint = estimate_joint( + labels=labels, + pred_probs=pred_probs, + confident_joint=confident_joint, + ) + if num_examples is None: + num_examples = _get_num_examples(labels=labels, confident_joint=confident_joint) + + if verbose: + longest_line = ( + f"| for your dataset with {num_examples:,} examples " + f"and {len(joint):,} classes. |\n" + ) + print( + "-" * (len(longest_line) - 1) + + "\n" + + f"| Generating a Cleanlab Dataset Health Summary{' ' * (len(longest_line) - 49)}|\n" + + longest_line + + f"| Note, Cleanlab is not a medical doctor...
 yet.{' ' * (len(longest_line) - 51)}|\n" + + "-" * (len(longest_line) - 1) + + "\n", + ) + + df_class_label_quality = rank_classes_by_label_quality( + labels=labels, + pred_probs=pred_probs, + class_names=class_names, + num_examples=num_examples, + joint=joint, + confident_joint=confident_joint, + ) + if verbose: + print("Overall Class Quality and Noise across your dataset (below)") + print("-" * 60, "\n", flush=True) + smart_display_dataframe(df_class_label_quality) + + df_overlapping_classes = find_overlapping_classes( + labels=labels, + pred_probs=pred_probs, + asymmetric=asymmetric, + class_names=class_names, + num_examples=num_examples, + joint=joint, + confident_joint=confident_joint, + ) + if verbose: + print( + "\nClass Overlap. In some cases, you may want to merge classes in the top rows (below)" + + "\n" + + "-" * 83 + + "\n", + flush=True, + ) + smart_display_dataframe(df_overlapping_classes) + print() + + health_score = overall_label_health_score( + labels=labels, + pred_probs=pred_probs, + num_examples=num_examples, + confident_joint=confident_joint, + joint=joint, + verbose=verbose, + ) + if verbose: + print("\nGenerated with <3 from Cleanlab.\n") + return { + "overall_label_health_score": health_score, + "joint": joint, + "classes_by_label_quality": df_class_label_quality, + "overlapping_classes": df_overlapping_classes, + }
+ + +def _get_num_examples(labels=None, confident_joint: Optional[np.ndarray] = None) -> int: + """Helper method that finds the number of examples from the parameters or throws an error + if neither parameter is provided. + + **Parameters:** For information about the arguments to this method, see the documentation of `dataset.find_overlapping_classes` + + Returns + ------- + num_examples : int + The number of examples in the dataset. + + Raises + ------ + ValueError + If both `labels` and `confident_joint` are None.""" + + if labels is None and confident_joint is None: + raise ValueError( + "Error: num_examples is None. You must provide labels, confident_joint, " + "or both num_examples and joint as input parameters." + ) + _confident_joint = cast(np.ndarray, confident_joint) + num_examples = len(labels) if labels is not None else cast(int, np.sum(_confident_joint)) + return num_examples +
+
+
+
+ + +
+
+ + Made with Sphinx and @pradyunsg's + + Furo + +
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/experimental/cifar_cnn.html b/v2.6.6/_modules/cleanlab/experimental/cifar_cnn.html new file mode 100644 index 000000000..e45db262f --- /dev/null +++ b/v2.6.6/_modules/cleanlab/experimental/cifar_cnn.html @@ -0,0 +1,787 @@ + cleanlab.experimental.cifar_cnn - cleanlab
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.experimental.cifar_cnn

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+A PyTorch CNN that can be used to find label issues in CIFAR-10 and to run CleanLearning with co-teaching.
+
+Code adapted from: https://github.com/bhanML/Co-teaching/blob/master/model.py
+
+You must have PyTorch installed: https://pytorch.org/get-started/locally/
+"""
+
+
+import torch.nn as nn
+import torch.nn.functional as F
+
+
+
[docs]def call_bn(bn, x): + return bn(x)
+ + +
[docs]class CNN(nn.Module): + """A CNN architecture shown to be a good baseline for a CIFAR-10 benchmark. + + Parameters + ---------- + input_channel : int + Number of channels in each input image (3 for RGB). + n_outputs : int + Number of classes, i.e. the length of each output logit vector. + dropout_rate : float + Probability of zeroing an entire feature-map channel (``dropout2d``) after each of the first two pooling stages. + top_bn : bool + If True, apply batch normalization to the output logits. + + Methods + ------- + forward + Forward pass in PyTorch; returns unnormalized logits of shape ``(N, n_outputs)``.""" + + def __init__(self, input_channel=3, n_outputs=10, dropout_rate=0.25, top_bn=False): + super(CNN, self).__init__() + self.dropout_rate = dropout_rate + self.top_bn = top_bn + self.c1 = nn.Conv2d(input_channel, 128, kernel_size=3, stride=1, padding=1) + self.c2 = nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1) + self.c3 = nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1) + self.c4 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1) + self.c5 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1) + self.c6 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1) + self.c7 = nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=0) + self.c8 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=0) + self.c9 = nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=0) + self.l_c1 = nn.Linear(128, n_outputs) + self.bn1 = nn.BatchNorm2d(128) + self.bn2 = nn.BatchNorm2d(128) + self.bn3 = nn.BatchNorm2d(128) + self.bn4 = nn.BatchNorm2d(256) + self.bn5 = nn.BatchNorm2d(256) + self.bn6 = nn.BatchNorm2d(256) + self.bn7 = nn.BatchNorm2d(512) + self.bn8 = nn.BatchNorm2d(256) + self.bn9 = nn.BatchNorm2d(128) + if top_bn: + # Batch norm over the output logits, applied in forward() when top_bn=True. + self.bn_c1 = nn.BatchNorm1d(n_outputs) + +
[docs] def forward( + self, + x, + ): + h = x + h = self.c1(h) + h = F.leaky_relu(call_bn(self.bn1, h), negative_slope=0.01) + h = self.c2(h) + h = F.leaky_relu(call_bn(self.bn2, h), negative_slope=0.01) + h = self.c3(h) + h = F.leaky_relu(call_bn(self.bn3, h), negative_slope=0.01) + h = F.max_pool2d(h, kernel_size=2, stride=2) + # training=self.training disables dropout when the model is in eval mode. + h = F.dropout2d(h, p=self.dropout_rate, training=self.training) + + h = self.c4(h) + h = F.leaky_relu(call_bn(self.bn4, h), negative_slope=0.01) + h = self.c5(h) + h = F.leaky_relu(call_bn(self.bn5, h), negative_slope=0.01) + h = self.c6(h) + h = F.leaky_relu(call_bn(self.bn6, h), negative_slope=0.01) + h = F.max_pool2d(h, kernel_size=2, stride=2) + h = F.dropout2d(h, p=self.dropout_rate, training=self.training) + + h = self.c7(h) + h = F.leaky_relu(call_bn(self.bn7, h), negative_slope=0.01) + h = self.c8(h) + h = F.leaky_relu(call_bn(self.bn8, h), negative_slope=0.01) + h = self.c9(h) + h = F.leaky_relu(call_bn(self.bn9, h), negative_slope=0.01) + # Global average pooling over the remaining spatial map. + h = F.avg_pool2d(h, kernel_size=tuple(h.shape[2:])) + + h = h.view(h.size(0), h.size(1)) + logit = self.l_c1(h) + if self.top_bn: + logit = call_bn(self.bn_c1, logit) + return logit
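To see why the network ends in `nn.Linear(128, n_outputs)`, it helps to trace the spatial size of the feature maps. The plain-Python sketch below applies the standard output-size formula for convolution and pooling (dilation 1, floor rounding) to the layer settings above. The helper names `conv_out` and `trace_spatial_size` are hypothetical, and the sketch does not need PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_spatial_size(size=32):
    """Spatial size after each stage of the CNN above, for a square input."""
    sizes = [size]
    for _ in range(2):  # two stages: three 3x3 convs with padding=1, then 2x2 max pooling
        for _ in range(3):
            size = conv_out(size, kernel=3, padding=1)  # size preserved
        size = conv_out(size, kernel=2, stride=2)       # halved
        sizes.append(size)
    for _ in range(3):  # c7, c8, c9: 3x3 with padding=0 shrink the map by 2 each
        size = conv_out(size, kernel=3)
        sizes.append(size)
    return sizes


print(trace_spatial_size(32))  # [32, 16, 8, 6, 4, 2]
```

With 32x32 CIFAR-10 inputs the map shrinks to 2x2 before the final average pooling, which collapses it to 1x1, so each image yields a 128-dimensional vector for `l_c1`. Under this formula, 28x28 is the smallest square input that still leaves a non-empty map after `c9`.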
+
+
+
+
+ + +
+
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/experimental/coteaching.html b/v2.6.6/_modules/cleanlab/experimental/coteaching.html new file mode 100644 index 000000000..d0cbab2a3 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/experimental/coteaching.html @@ -0,0 +1,925 @@ + cleanlab.experimental.coteaching - cleanlab
+
+
+ +
+
+
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.experimental.coteaching

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+
+"""
+Implements the co-teaching algorithm for training neural networks on noisily-labeled data (Han et al., 2018).
+This module requires PyTorch (https://pytorch.org/get-started/locally/).
+An example that uses this algorithm with cleanlab to achieve state-of-the-art results on CIFAR-10
+for learning with noisy labels is provided in: https://github.com/cleanlab/examples/
+
+``cifar_cnn.py`` provides an example model that can be trained via this algorithm.
+"""
+
+# Significant code was adapted from the following GitHub:
+# https://github.com/bhanML/Co-teaching/blob/master/loss.py
+# See (Han et al., 2018).
+
+import torch
+import torch.nn.functional as F
+from torch.autograd import Variable
+import numpy as np
+
+MINIMUM_BATCH_SIZE = 16
+
+
+# Loss function for Co-Teaching
+
[docs]def loss_coteaching( + y_1, + y_2, + t, + forget_rate, + class_weights=None, +): + """Co-teaching loss: each model selects its small-loss examples, which are then used to update the other model. + + Parameters + ---------- + y_1 : Tensor array + Output logits from model 1 + + y_2 : Tensor array + Output logits from model 2 + + t : Tensor array + Noisy labels (t means targets) as class indices. + + forget_rate : float + Fraction (between 0 and 1) of the highest-loss examples in the batch that each model + excludes from its peer's update. Typically set to ``forget_rate_schedule[epoch]``. + + class_weights : Tensor array, optional + 1-D tensor of length K (the number of classes) holding per-class loss weights. Default: None. + + Returns + ------- + tuple of Tensor + The losses used to update model 1 and model 2, respectively. + """ + + loss_1 = F.cross_entropy(y_1, t, reduction="none", weight=class_weights) + ind_1_sorted = np.argsort(loss_1.data.cpu()) + loss_1_sorted = loss_1[ind_1_sorted] + + loss_2 = F.cross_entropy(y_2, t, reduction="none", weight=class_weights) + ind_2_sorted = np.argsort(loss_2.data.cpu()) + + remember_rate = 1 - forget_rate + num_remember = int(remember_rate * len(loss_1_sorted)) + + ind_1_update = ind_1_sorted[:num_remember] + ind_2_update = ind_2_sorted[:num_remember] + # Share updates between the two models. + # TODO: these class weights should take into account the ind_mask filters. + loss_1_update = F.cross_entropy(y_1[ind_2_update], t[ind_2_update], weight=class_weights) + loss_2_update = F.cross_entropy(y_2[ind_1_update], t[ind_1_update], weight=class_weights) + + return ( + torch.sum(loss_1_update) / num_remember, + torch.sum(loss_2_update) / num_remember, + )
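The core of `loss_coteaching` is the small-loss selection followed by the cross update: each model keeps its `num_remember` lowest-loss examples, and those indices are used to train the *other* model. The NumPy sketch below isolates that index logic on made-up per-example losses. `coteaching_select` is a hypothetical helper for illustration, not part of this module.

```python
import numpy as np


def coteaching_select(loss_1, loss_2, forget_rate):
    """Return (indices to train model 1 on, indices to train model 2 on)."""
    num_remember = int((1 - forget_rate) * len(loss_1))
    ind_1 = np.argsort(loss_1)[:num_remember]  # model 1's small-loss examples
    ind_2 = np.argsort(loss_2)[:num_remember]  # model 2's small-loss examples
    # Cross update: each model learns from the examples its peer trusts.
    return ind_2, ind_1


loss_1 = np.array([0.1, 2.5, 0.3, 0.2])  # hypothetical per-example losses, model 1
loss_2 = np.array([0.4, 0.1, 3.0, 0.2])  # hypothetical per-example losses, model 2
train_idx_1, train_idx_2 = coteaching_select(loss_1, loss_2, forget_rate=0.25)
print(train_idx_1, train_idx_2)  # [1 3 0] [0 3 2]
```

Each model drops the example its peer found hardest (index 2 for model 1, index 1 for model 2). Note also that in `loss_coteaching` the cross-entropy on the selected examples uses PyTorch's default mean reduction, so the final division by `num_remember` rescales that mean rather than averaging a sum.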
+ + +
[docs]def initialize_lr_scheduler(lr=0.001, epochs=250, epoch_decay_start=80): + """Builds per-epoch learning rate and beta1 schedules for the Adam optimizer. + + The learning rate stays at ``lr`` through ``epoch_decay_start`` and then decays linearly. + Returns ``(alpha_plan, beta1_plan)``, two lists of length ``epochs``.""" + mom1 = 0.9 + mom2 = 0.9 # Original author had this set to 0.1 + alpha_plan = [lr] * epochs + beta1_plan = [mom1] * epochs + for i in range(epoch_decay_start, epochs): + alpha_plan[i] = float(epochs - i) / (epochs - epoch_decay_start) * lr + beta1_plan[i] = mom2 + return alpha_plan, beta1_plan
+ + +
[docs]def adjust_learning_rate(optimizer, epoch, alpha_plan, beta1_plan): + """Sets the learning rate and beta1 of every parameter group of an Adam optimizer to their scheduled values for ``epoch``.""" + for param_group in optimizer.param_groups: + param_group["lr"] = alpha_plan[epoch] + param_group["betas"] = (beta1_plan[epoch], 0.999) # Only change beta1
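A small worked example of the two scheduling helpers above, with illustrative values `lr=0.001`, `epochs=10`, `epoch_decay_start=6`. The learning rate stays at `lr` up to and including `epoch_decay_start`, then decays linearly and reaches `lr / (epochs - epoch_decay_start)` at the last epoch rather than zero. Because `mom1` and `mom2` are both 0.9 in this module, beta1 never actually changes. The optimizer is replaced by a `SimpleNamespace` stand-in (an assumption for this sketch) exposing only `param_groups`, the only attribute `adjust_learning_rate` uses.

```python
from types import SimpleNamespace


def lr_plan(lr, epochs, epoch_decay_start):
    """Same formula as the alpha_plan built by initialize_lr_scheduler."""
    return [
        lr if i < epoch_decay_start
        else float(epochs - i) / (epochs - epoch_decay_start) * lr
        for i in range(epochs)
    ]


alpha_plan = lr_plan(lr=0.001, epochs=10, epoch_decay_start=6)
beta1_plan = [0.9] * 10  # mom1 == mom2 == 0.9, so the beta1 plan is constant

# Stand-in for torch.optim.Adam: adjust_learning_rate only reads param_groups.
optimizer = SimpleNamespace(param_groups=[{"lr": 0.001, "betas": (0.9, 0.999)}])
epoch = 8
for param_group in optimizer.param_groups:  # same loop as adjust_learning_rate
    param_group["lr"] = alpha_plan[epoch]
    param_group["betas"] = (beta1_plan[epoch], 0.999)

print([round(a, 6) for a in alpha_plan])
```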
+ + +
[docs]def forget_rate_scheduler(epochs, forget_rate, num_gradual, exponent): + """Tells Co-Teaching what fraction of examples to forget at each epoch.""" + # Ramp from 0 to forget_rate**exponent over the first num_gradual epochs, then hold at forget_rate + forget_rate_schedule = np.ones(epochs) * forget_rate + forget_rate_schedule[:num_gradual] = np.linspace(0, forget_rate**exponent, num_gradual) + return forget_rate_schedule
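A worked example of `forget_rate_scheduler` with illustrative values. For `exponent=1` the forget rate ramps linearly from 0 to `forget_rate` over the first `num_gradual` epochs and then stays flat. For other exponents the ramp ends at `forget_rate**exponent`, so the schedule jumps to `forget_rate` at epoch `num_gradual`. The sketch repeats the same NumPy computation (in a hypothetical `forget_schedule` helper) to make that visible.

```python
import numpy as np


def forget_schedule(epochs, forget_rate, num_gradual, exponent):
    """Same computation as forget_rate_scheduler above."""
    schedule = np.ones(epochs) * forget_rate
    schedule[:num_gradual] = np.linspace(0, forget_rate**exponent, num_gradual)
    return schedule


print(forget_schedule(6, 0.2, 3, 1))  # ramps 0.0, 0.1, 0.2, then holds at 0.2
print(forget_schedule(6, 0.2, 3, 2))  # ramps 0.0, 0.02, 0.04, then jumps to 0.2
```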
+ + +# Train the Model +
[docs]def train( + train_loader, + epoch, + model1, + optimizer1, + model2, + optimizer2, + args, + forget_rate_schedule, + class_weights, + accuracy, +): + """Trains both co-teaching models for one epoch and returns the average per-batch top-1 training accuracy of each model. + + Parameters + ---------- + train_loader : torch.utils.data.DataLoader + epoch : int + Index of the current epoch, starting at 0. + model1 : PyTorch class inheriting nn.Module + Must define __init__ and forward(self, x,) + optimizer1 : PyTorch torch.optim.Adam + model2 : PyTorch class inheriting nn.Module + Must define __init__ and forward(self, x,) + optimizer2 : PyTorch torch.optim.Adam + args : parser.parse_args() object + Must contain print_freq, epochs, and batch_size (used for progress logging). + forget_rate_schedule : np.ndarray of length number of epochs + Tells the co-teaching loss what fraction of examples to forget at each epoch. + class_weights : Tensor array or None + 1-D tensor of length K (the number of classes) holding per-class loss weights. + accuracy : function + A function of the form accuracy(output, target, topk=(1,)) for + computing top-1 and top-5 accuracy given output and true targets.""" + + train_total = 0 + train_correct = 0 + train_total2 = 0 + train_correct2 = 0 + + # Prepare models for training + model1.train() + model2.train() + + for i, (images, labels) in enumerate(train_loader): + if i == len(train_loader) - 1 and len(labels) < MINIMUM_BATCH_SIZE: + # Edge case -- the last leftover batch is small (potentially size 1) + # This will happen if, for example, you train on 35101 examples with + # batch size of 450. The last batch will be size 1. + # If you update the weights based on the gradient from a single example + # and that example is mislabeled, you add substantial noise to the network + # and accuracy can go down with each epoch. + # To avoid this, do not train on the last batch if it's small.
+ continue + + images = Variable(images).cuda() + labels = Variable(labels).cuda() + + # Forward + Backward + Optimize + logits1 = model1(images) + prec1, _ = accuracy(logits1, labels, topk=(1, 5)) + train_total += 1 + train_correct += prec1 + logits2 = model2(images) + prec2, _ = accuracy(logits2, labels, topk=(1, 5)) + train_total2 += 1 + train_correct2 += prec2 + loss_1, loss_2 = loss_coteaching( + logits1, + logits2, + labels, + forget_rate=forget_rate_schedule[epoch], + class_weights=class_weights, + ) + optimizer1.zero_grad() + loss_1.backward() + optimizer1.step() + optimizer2.zero_grad() + loss_2.backward() + optimizer2.step() + if (i + 1) % args.print_freq == 0: + print( + "Epoch [%d/%d], Iter [%d/%d] Training Accuracy1: %.4F, " + "Training Accuracy2: %.4f, Loss1: %.4f, Loss2: %.4f " + % ( + epoch + 1, + args.epochs, + i + 1, + len(train_loader.dataset) // args.batch_size, + prec1, + prec2, + loss_1.data.item(), + loss_2.data.item(), + ) + ) + + train_acc1 = float(train_correct) / float(train_total) + train_acc2 = float(train_correct2) / float(train_total2) + return train_acc1, train_acc2
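The last-batch guard in `train` can be checked with simple arithmetic. The sketch below uses a hypothetical `batch_sizes` helper to list the batch sizes a `DataLoader` with `drop_last=False` would yield for the 35,101-example, batch-size-450 case mentioned in the comment.

```python
MINIMUM_BATCH_SIZE = 16  # same constant as the module


def batch_sizes(num_examples, batch_size):
    """Sizes of consecutive batches when the final partial batch is kept."""
    num_full, remainder = divmod(num_examples, batch_size)
    return [batch_size] * num_full + ([remainder] if remainder else [])


sizes = batch_sizes(35101, 450)
skip_last = sizes[-1] < MINIMUM_BATCH_SIZE
print(len(sizes), sizes[-1], skip_last)  # 79 1 True
```

Passing `drop_last=True` to `torch.utils.data.DataLoader` is an alternative that always discards the partial batch regardless of its size. Note that the progress message computes the iteration total as `len(train_loader.dataset) // args.batch_size`, which is 78 here even though the loader yields 79 batches.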
+ + +# Evaluate the Model +
[docs]def evaluate(test_loader, model1, model2): + """Returns the top-1 test accuracy (in percent) of each of the two co-teaching models. Requires a CUDA device.""" + print("Evaluating Co-Teaching Model") + model1.eval() # Change model to 'eval' mode. + correct1 = 0 + total1 = 0 + for images, labels in test_loader: + images = Variable(images).cuda() + logits1 = model1(images) + outputs1 = F.softmax(logits1, dim=1) + _, pred1 = torch.max(outputs1.data, 1) + total1 += labels.size(0) + correct1 += (pred1.cpu() == labels).sum() + + model2.eval() # Change model to 'eval' mode + correct2 = 0 + total2 = 0 + for images, labels in test_loader: + images = Variable(images).cuda() + logits2 = model2(images) + outputs2 = F.softmax(logits2, dim=1) + _, pred2 = torch.max(outputs2.data, 1) + total2 += labels.size(0) + correct2 += (pred2.cpu() == labels).sum() + + acc1 = 100 * float(correct1) / float(total1) + acc2 = 100 * float(correct2) / float(total2) + return acc1, acc2
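`evaluate` applies a softmax before taking the argmax, but softmax is strictly increasing within each row, so the predicted class equals the argmax of the raw logits. The NumPy sketch below checks this on made-up logits and labels, and computes accuracy as a percentage the same way `evaluate` does.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.5, 0.0]])  # hypothetical
labels = np.array([0, 2, 0])

pred_from_probs = softmax(logits).argmax(axis=1)
pred_from_logits = logits.argmax(axis=1)
accuracy = 100 * float((pred_from_logits == labels).sum()) / float(len(labels))
print(pred_from_logits, round(accuracy, 2))  # [0 2 1] 66.67
```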
+
+
+
+
+ + +
+
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/experimental/label_issues_batched.html b/v2.6.6/_modules/cleanlab/experimental/label_issues_batched.html new file mode 100644 index 000000000..d6bd9de98 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/experimental/label_issues_batched.html @@ -0,0 +1,1445 @@ + cleanlab.experimental.label_issues_batched - cleanlab
+
+
+ +
+
+
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.experimental.label_issues_batched

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Implementation of :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>`
+that does not need much memory by operating in mini-batches.
+You can also use this approach to estimate label quality scores or the number of label issues
+for big datasets with limited memory.
+
+With default settings, the results returned from this approach closely approximate those returned from:
+``cleanlab.filter.find_label_issues(..., filter_by="low_self_confidence", return_indices_ranked_by="self_confidence")``
+
+To run this approach, either use the ``find_label_issues_batched()`` convenience function defined in this module,
+or follow the example script in the docstring of the ``LabelInspector`` class if you require greater customization.
+"""
+
+import numpy as np
+from typing import Optional, List, Tuple, Any
+
+from cleanlab.count import get_confident_thresholds, _reduce_issues
+from cleanlab.rank import find_top_issues, _compute_label_quality_scores
+from cleanlab.typing import LabelLike
+from cleanlab.internal.util import value_counts_fill_missing_classes
+from cleanlab.internal.constants import (
+    CONFIDENT_THRESHOLDS_LOWER_BOUND,
+    FLOATING_POINT_COMPARISON,
+    CLIPPING_LOWER_BOUND,
+)
+
+import platform
+import multiprocessing as mp
+
+try:
+    import psutil
+
+    PSUTIL_EXISTS = True
+except ImportError:  # pragma: no cover
+    PSUTIL_EXISTS = False
+
+# global variables shared with worker processes for multiprocessing on Linux
+adj_confident_thresholds_shared: np.ndarray
+labels_shared: LabelLike
+pred_probs_shared: np.ndarray
+
+
+
+def find_label_issues_batched(
+    labels: Optional[LabelLike] = None,
+    pred_probs: Optional[np.ndarray] = None,
+    *,
+    labels_file: Optional[str] = None,
+    pred_probs_file: Optional[str] = None,
+    batch_size: int = 10000,
+    n_jobs: Optional[int] = 1,
+    verbose: bool = True,
+    quality_score_kwargs: Optional[dict] = None,
+    num_issue_kwargs: Optional[dict] = None,
+    return_mask: bool = False,
+) -> np.ndarray:
+    """
+    Variant of :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>`
+    that requires less memory by reading `pred_probs` and `labels` in mini-batches.
+    To avoid loading big `pred_probs` and `labels` arrays into memory,
+    provide them as memory-mapped objects such as Zarr arrays or memmap arrays instead of regular numpy arrays.
+    See: https://pythonspeed.com/articles/mmap-vs-zarr-hdf5/
+
+    With default settings, the results returned from this method closely approximate those returned from:
+    ``cleanlab.filter.find_label_issues(..., filter_by="low_self_confidence", return_indices_ranked_by="self_confidence")``
+
+    This function internally implements the example usage script of the ``LabelInspector`` class,
+    but you can further customize that script by running it yourself instead of this function.
+    See the documentation of ``LabelInspector`` to learn more about how this method works internally.
+
+    Parameters
+    ----------
+    labels: np.ndarray-like object, optional
+      1D array of given class labels for each example in the dataset, with (int) values in ``0,1,2,...,K-1``.
+      To avoid loading big objects into memory, you should pass this as a memory-mapped object such as a
+      Zarr array loaded with ``zarr.convenience.open(YOURFILE.zarr, mode="r")``,
+      or a memmap array loaded with ``np.load(YOURFILE.npy, mmap_mode="r")``.
+
+      Tip: You can save an existing numpy array to Zarr via ``zarr.convenience.save_array(YOURFILE.zarr, your_array)``,
+      or to a .npy file that can be loaded with mmap via ``np.save(YOURFILE.npy, your_array)``.
+
+    pred_probs: np.ndarray-like object, optional
+      2D array of model-predicted class probabilities (floats) for each example in the dataset.
+      To avoid loading big objects into memory, you should pass this as a memory-mapped object such as a
+      Zarr array loaded with ``zarr.convenience.open(YOURFILE.zarr, mode="r")``
+      or a memmap array loaded with ``np.load(YOURFILE.npy, mmap_mode="r")``.
+
+    labels_file: str, optional
+      Specify this instead of `labels` to have this method load the labels from file into a memmap array.
+      Path to the .npy file where the entire 1D `labels` numpy array is stored on disk (list format is not supported).
+      This is loaded using ``np.load(labels_file, mmap_mode="r")``,
+      so make sure this file was created via ``np.save()`` or another compatible method (.npz is not supported).
+
+    pred_probs_file: str, optional
+      Specify this instead of `pred_probs` to have this method load the predicted probabilities from file into a memmap array.
+      Path to the .npy file where the entire `pred_probs` numpy array is stored on disk.
+      This is loaded using ``np.load(pred_probs_file, mmap_mode="r")``,
+      so make sure this file was created via ``np.save()`` or another compatible method (.npz is not supported).
+
+    batch_size : int, optional
+      Size of mini-batches to use for estimating the label issues.
+      To maximize efficiency, use the largest `batch_size` your memory allows.
+
+    n_jobs: int, optional
+      Number of processes for multiprocessing (default value = 1). Only used on Linux.
+      If `n_jobs=None`, uses the number of physical cores if psutil is installed, otherwise the number of logical cores.
+
+    verbose : bool, optional
+      Whether to print progress bars and status messages.
+
+    quality_score_kwargs : dict, optional
+      Keyword arguments to pass into :py:func:`rank.get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`.
+
+    num_issue_kwargs : dict, optional
+      Keyword arguments to :py:func:`count.num_label_issues <cleanlab.count.num_label_issues>`
+      to control estimation of the number of label issues.
+      The only supported kwarg here for now is: `estimation_method`.
+
+    return_mask : bool, optional
+      Determines what is returned by this method: if `return_mask=True`, return a boolean mask.
+      If ``False``, return an array of indices of examples with label issues, sorted by label quality score.
+
+    Returns
+    -------
+    label_issues : np.ndarray
+      If `return_mask` is ``True``, returns a boolean **mask** for the entire dataset
+      where ``True`` represents a label issue and ``False`` represents an example that is
+      accurately labeled with high confidence.
+      If `return_mask` is ``False``, returns an array containing **indices** of examples identified to have
+      label issues (i.e. those indices where the mask would be ``True``), sorted by label quality score
+      so that the examples most likely to be mislabeled come first.
+
+    Examples
+    --------
+    >>> batch_size = 10000  # for efficiency, set this to as large of a value as your memory can handle
+    >>> # Just demonstrating how to save your existing numpy labels, pred_probs arrays to compatible .npy files:
+    >>> np.save("LABELS.npy", labels_array)
+    >>> np.save("PREDPROBS.npy", pred_probs_array)
+    >>> # You can load these back into memmap arrays via: labels = np.load("LABELS.npy", mmap_mode="r")
+    >>> # and then run this method on the memmap arrays, or just run it directly on the .npy files like this:
+    >>> issues = find_label_issues_batched(labels_file="LABELS.npy", pred_probs_file="PREDPROBS.npy", batch_size=batch_size)
+    >>> # This method also works with Zarr arrays:
+    >>> import zarr
+    >>> # Just demonstrating how to save your existing numpy labels, pred_probs arrays to compatible .zarr files:
+    >>> zarr.convenience.save_array("LABELS.zarr", labels_array)
+    >>> zarr.convenience.save_array("PREDPROBS.zarr", pred_probs_array)
+    >>> # You can load from such files into Zarr arrays:
+    >>> labels = zarr.convenience.open("LABELS.zarr", mode="r")
+    >>> pred_probs = zarr.convenience.open("PREDPROBS.zarr", mode="r")
+    >>> # This method can be directly run on Zarr arrays, memmap arrays, or regular numpy arrays:
+    >>> issues = find_label_issues_batched(labels=labels, pred_probs=pred_probs, batch_size=batch_size)
+    """
+    if labels_file is not None:
+        if labels is not None:
+            raise ValueError("only specify one of: `labels` or `labels_file`")
+        if not isinstance(labels_file, str):
+            raise ValueError(
+                "labels_file must be str specifying path to .npy file containing the array of labels"
+            )
+        labels = np.load(labels_file, mmap_mode="r")
+        assert isinstance(labels, np.ndarray)
+
+    if pred_probs_file is not None:
+        if pred_probs is not None:
+            raise ValueError("only specify one of: `pred_probs` or `pred_probs_file`")
+        if not isinstance(pred_probs_file, str):
+            raise ValueError(
+                "pred_probs_file must be str specifying path to .npy file containing 2D array of pred_probs"
+            )
+        pred_probs = np.load(pred_probs_file, mmap_mode="r")
+        assert isinstance(pred_probs, np.ndarray)
+        if verbose:
+            print(
+                f"mmap-loaded numpy arrays have: {len(pred_probs)} examples, {pred_probs.shape[1]} classes"
+            )
+    if labels is None:
+        raise ValueError("must provide one of: `labels` or `labels_file`")
+    if pred_probs is None:
+        raise ValueError("must provide one of: `pred_probs` or `pred_probs_file`")
+
+    assert pred_probs is not None
+    if len(labels) != len(pred_probs):
+        raise ValueError(
+            f"len(labels)={len(labels)} does not match len(pred_probs)={len(pred_probs)}. Perhaps an issue loading mmap numpy arrays from file."
+        )
+    lab = LabelInspector(
+        num_class=pred_probs.shape[1],
+        verbose=verbose,
+        n_jobs=n_jobs,
+        quality_score_kwargs=quality_score_kwargs,
+        num_issue_kwargs=num_issue_kwargs,
+    )
+    n = len(labels)
+    if verbose:
+        from tqdm.auto import tqdm
+
+        pbar = tqdm(desc="number of examples processed for estimating thresholds", total=n)
+    i = 0
+    while i < n:
+        end_index = i + batch_size
+        labels_batch = labels[i:end_index]
+        pred_probs_batch = pred_probs[i:end_index, :]
+        i = end_index
+        lab.update_confident_thresholds(labels_batch, pred_probs_batch)
+        if verbose:
+            pbar.update(len(labels_batch))  # the last batch may be smaller than batch_size
+
+    # Next evaluate the quality of the labels (run this on full dataset you want to evaluate):
+    if verbose:
+        pbar.close()
+        pbar = tqdm(desc="number of examples processed for checking labels", total=n)
+    i = 0
+    while i < n:
+        end_index = i + batch_size
+        labels_batch = labels[i:end_index]
+        pred_probs_batch = pred_probs[i:end_index, :]
+        i = end_index
+        _ = lab.score_label_quality(labels_batch, pred_probs_batch)
+        if verbose:
+            pbar.update(len(labels_batch))
+
+    if verbose:
+        pbar.close()
+
+    label_issues_indices = np.asarray(lab.get_label_issues())
+    label_issues_mask = np.zeros(len(labels), dtype=bool)
+    label_issues_mask[label_issues_indices] = True
+    mask = _reduce_issues(pred_probs=pred_probs, labels=labels)
+    label_issues_mask[mask] = False
+    if return_mask:
+        return label_issues_mask
+    # keep the ranking from get_label_issues() (sorted by label quality score)
+    return label_issues_indices[label_issues_mask[label_issues_indices]]
+
+
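The docstring above recommends passing memory-mapped arrays. The sketch below shows that workflow with plain NumPy, using small synthetic data and a temporary directory in place of your real `LABELS.npy` and `PREDPROBS.npy` files. It saves the arrays with `np.save`, reopens them with `np.load(..., mmap_mode="r")`, and reads consecutive slices the same way `find_label_issues_batched` does internally. Only the slice being processed is copied into RAM.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n, k = 1_000, 4
labels_array = rng.integers(0, k, size=n)
pred_probs_array = rng.dirichlet(np.ones(k), size=n).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    labels_path = os.path.join(tmp, "LABELS.npy")
    pred_probs_path = os.path.join(tmp, "PREDPROBS.npy")
    np.save(labels_path, labels_array)
    np.save(pred_probs_path, pred_probs_array)

    # mmap_mode="r" maps the files read-only instead of reading them into RAM
    labels = np.load(labels_path, mmap_mode="r")
    pred_probs = np.load(pred_probs_path, mmap_mode="r")
    is_memmap = isinstance(labels, np.memmap) and isinstance(pred_probs, np.memmap)

    batch_size = 300
    batch_sizes = []
    for start in range(0, len(labels), batch_size):
        # np.array(...) copies only the current slice into memory
        labels_batch = np.array(labels[start:start + batch_size])
        pred_probs_batch = np.array(pred_probs[start:start + batch_size, :])
        assert len(labels_batch) == len(pred_probs_batch)
        batch_sizes.append(len(labels_batch))
    del labels, pred_probs  # drop the memory maps before the directory is removed

print(batch_sizes)  # [300, 300, 300, 100]
```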
+class LabelInspector:
+    """
+    Class for finding label issues in big datasets where memory becomes a problem for other cleanlab methods.
+    Only create one such object per dataset and do not try to use the same ``LabelInspector`` across two datasets.
+    For efficiency, this class does little input checking.
+    You can first run :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>`
+    on a small subset of your data to verify your inputs are properly formatted.
+    Do NOT modify any of the attributes of this class yourself!
+    Multi-label classification is not supported by this class; it is only for multi-class classification.
+
+    The recommended usage demonstrated in the example script below involves two passes over your data:
+    one pass to compute `confident_thresholds`, another to evaluate each label.
+    To maximize efficiency, use the largest `batch_size` your memory allows.
+    To reduce runtime further, you can run the first pass on a subset of your dataset
+    as long as it contains enough data from each class to estimate `confident_thresholds` accurately.
+
+    In the example script below:
+
+    - `labels` is a (big) 1D ``np.ndarray`` of class labels represented as integers in ``0,1,...,K-1``.
+    - `pred_probs` is a (big) 2D ``np.ndarray`` of predicted class probabilities,
+      where each row is an example and each column represents a class.
+
+    `labels` and `pred_probs` can instead be stored in a file, from which you load chunks at a time.
+    Methods to load arrays in chunks include: ``np.load(..., mmap_mode='r')``, ``numpy.memmap()``,
+    HDF5 or Zarr files, see: https://pythonspeed.com/articles/mmap-vs-zarr-hdf5/
+
+    Examples
+    --------
+    >>> n = len(labels)
+    >>> batch_size = 10000  # you can change this in between batches, set as big as your RAM allows
+    >>> lab = LabelInspector(num_class=pred_probs.shape[1])
+    >>> # First compute confident thresholds (for faster results, can also do this on a random subset of your data):
+    >>> i = 0
+    >>> while i < n:
+    ...     end_index = i + batch_size
+    ...     labels_batch = labels[i:end_index]
+    ...     pred_probs_batch = pred_probs[i:end_index, :]
+    ...     i = end_index
+    ...     lab.update_confident_thresholds(labels_batch, pred_probs_batch)
+    >>> # See what we calculated:
+    >>> confident_thresholds = lab.get_confident_thresholds()
+    >>> # Evaluate the quality of the labels (run this on full dataset you want to evaluate):
+    >>> i = 0
+    >>> while i < n:
+    ...     end_index = i + batch_size
+    ...     labels_batch = labels[i:end_index]
+    ...     pred_probs_batch = pred_probs[i:end_index, :]
+    ...     i = end_index
+    ...     batch_results = lab.score_label_quality(labels_batch, pred_probs_batch)
+    >>> # Indices of examples with label issues, sorted by label quality score (most severe to least severe):
+    >>> indices_of_examples_with_issues = lab.get_label_issues()
+    >>> # If your `pred_probs` and `labels` are arrays already in memory,
+    >>> # then you can use this shortcut for all of the above:
+    >>> indices_of_examples_with_issues = find_label_issues_batched(labels, pred_probs, batch_size=10000)
+
+    Parameters
+    ----------
+    num_class : int
+      The number of classes in your multi-class classification task.
+
+    store_results : bool, optional
+      Whether this object will store all label quality scores, a 1D array of shape ``(N,)``
+      where ``N`` is the total number of examples in your dataset.
+      Set this to ``False`` if you encounter memory problems even for small batch sizes (~1000).
+      If ``False``, you can still identify the label issues yourself by aggregating
+      the label quality scores for each batch, sorting them across all batches, and returning the top ``T`` indices
+      with ``T = self.get_num_issues()``.
+
+    verbose : bool, optional
+      Whether to print status messages.
+
+    n_jobs: int, optional
+      Number of processes for multiprocessing (default value = 1). Only used on Linux.
+      If `n_jobs=None`, uses the number of physical cores if psutil is installed, otherwise the number of logical cores.
+
+    quality_score_kwargs : dict, optional
+      Keyword arguments to pass into :py:func:`rank.get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`.
+
+    num_issue_kwargs : dict, optional
+      Keyword arguments to :py:func:`count.num_label_issues <cleanlab.count.num_label_issues>`
+      to control estimation of the number of label issues.
+      The only supported kwarg here for now is: `estimation_method`.
+    """
+
+    def __init__(
+        self,
+        *,
+        num_class: int,
+        store_results: bool = True,
+        verbose: bool = True,
+        quality_score_kwargs: Optional[dict] = None,
+        num_issue_kwargs: Optional[dict] = None,
+        n_jobs: Optional[int] = 1,
+    ):
+        if quality_score_kwargs is None:
+            quality_score_kwargs = {}
+        if num_issue_kwargs is None:
+            num_issue_kwargs = {}
+
+        self.num_class = num_class
+        self.store_results = store_results
+        self.verbose = verbose
+        self.quality_score_kwargs = quality_score_kwargs  # extra arguments for ``rank.get_label_quality_scores()`` to control label quality scoring
+        self.num_issue_kwargs = num_issue_kwargs  # extra arguments for ``count.num_label_issues()`` to control estimation of the number of label issues (only supported argument for now is: `estimation_method`).
+        self.off_diagonal_calibrated = False
+        if num_issue_kwargs.get("estimation_method") == "off_diagonal_calibrated":
+            # store extra attributes later needed for calibration:
+            self.off_diagonal_calibrated = True
+            self.prune_counts = np.zeros(self.num_class)
+            self.class_counts = np.zeros(self.num_class)
+            self.normalization = np.zeros(self.num_class)
+        else:
+            self.prune_count = 0  # number of label issues estimated based on data seen so far (only used when estimation_method is not calibrated)
+
+        if self.store_results:
+            self.label_quality_scores: List[float] = []
+
+        self.confident_thresholds = np.zeros(
+            (num_class,)
+        )  # current estimate of thresholds based on data seen so far
+        self.examples_per_class = np.zeros(
+            (num_class,)
+        )  # current counts of examples with each given label seen so far
+        self.examples_processed_thresh = (
+            0  # number of examples seen so far for estimating thresholds
+        )
+        self.examples_processed_quality = 0  # number of examples seen so far for estimating label quality and number of label issues
+        # Determine number of cores for multiprocessing:
+        self.n_jobs: Optional[int] = None
+        os_name = platform.system()
+        if os_name != "Linux":
+            self.n_jobs = 1
+            if n_jobs is not None and n_jobs != 1 and self.verbose:
+                print(
+                    "n_jobs is overridden to 1 because multiprocessing is only supported for Linux."
+                )
+        elif n_jobs is not None:
+            self.n_jobs = n_jobs
+        else:
+            if PSUTIL_EXISTS:
+                self.n_jobs = psutil.cpu_count(logical=False)  # physical cores
+            if not self.n_jobs:
+                # switch to logical cores
+                self.n_jobs = mp.cpu_count()
+                if self.verbose:
+                    print(
+                        f"Multiprocessing will default to using the number of logical cores ({self.n_jobs}). To default to number of physical cores: pip install psutil"
+                    )
+
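The core-count logic at the end of `__init__` can be summarized as a small standalone function. `resolve_n_jobs` is a hypothetical helper (not part of cleanlab) that mirrors the branches above. Non-Linux platforms always get 1. On Linux, an explicit `n_jobs` is kept, and `None` falls back to the number of physical cores via the optional `psutil` package, or to the number of logical cores otherwise.

```python
import multiprocessing as mp
import platform
from typing import Optional


def resolve_n_jobs(n_jobs: Optional[int], os_name: Optional[str] = None) -> int:
    """Hypothetical helper mirroring the n_jobs branches in LabelInspector.__init__."""
    os_name = os_name or platform.system()
    if os_name != "Linux":
        return 1  # multiprocessing is only used on Linux
    if n_jobs is not None:
        return n_jobs  # explicit user choice wins
    physical = None
    try:
        import psutil  # optional dependency

        physical = psutil.cpu_count(logical=False)  # may return None
    except ImportError:
        pass
    return physical or mp.cpu_count()  # fall back to logical cores


print(resolve_n_jobs(8, os_name="Windows"))  # 1
print(resolve_n_jobs(4, os_name="Linux"))  # 4
```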
+    def get_confident_thresholds(self, silent: bool = False) -> np.ndarray:
+        """
+        Fetches already-computed confident thresholds from the data seen so far
+        in the same format as: :py:func:`count.get_confident_thresholds <cleanlab.count.get_confident_thresholds>`.
+
+        Returns
+        -------
+        confident_thresholds : np.ndarray
+          An array of shape ``(K, )`` where ``K`` is the number of classes.
+        """
+        if self.examples_processed_thresh < 1:
+            raise ValueError(
+                "Have not computed any confident_thresholds yet. Call `update_confident_thresholds()` first."
+            )
+        else:
+            if self.verbose and not silent:
+                print(
+                    f"Total number of examples used to estimate confident thresholds: {self.examples_processed_thresh}"
+                )
+            return self.confident_thresholds
+
+    def get_num_issues(self, silent: bool = False) -> int:
+        """
+        Fetches already-computed estimate of the number of label issues in the data seen so far
+        in the same format as: :py:func:`count.num_label_issues <cleanlab.count.num_label_issues>`.
+
+        Note: The estimated number of issues may differ from :py:func:`count.num_label_issues <cleanlab.count.num_label_issues>`
+        by 1 due to rounding differences.
+
+        Returns
+        -------
+        num_issues : int
+          The estimated number of examples with label issues in the data seen so far.
+        """
+        if self.examples_processed_quality < 1:
+            raise ValueError(
+                "Have not evaluated any labels yet. Call `score_label_quality()` first."
+            )
+        else:
+            if self.verbose and not silent:
+                print(
+                    f"Total number of examples whose labels have been evaluated: {self.examples_processed_quality}"
+                )
+            if self.off_diagonal_calibrated:
+                calibrated_prune_counts = (
+                    self.prune_counts
+                    * self.class_counts
+                    / np.clip(self.normalization, a_min=CLIPPING_LOWER_BOUND, a_max=None)
+                )  # avoid division by 0
+                return np.rint(np.sum(calibrated_prune_counts)).astype("int")
+            else:  # not calibrated
+                return self.prune_count
+
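With `estimation_method="off_diagonal_calibrated"`, `get_num_issues()` rescales per-class counts. As we read the code, `class_counts[k]` counts all examples with given label `k`. `normalization[k]` counts those whose predicted class's probability clears that class's confident threshold. `prune_counts[k]` counts the subset of those whose predicted class differs from `k`. The worked example below uses made-up counts and a hypothetical value for `CLIPPING_LOWER_BOUND`; the real constant lives in `cleanlab.internal.constants`.

```python
import numpy as np

CLIPPING_LOWER_BOUND = 1e-6  # hypothetical stand-in for the cleanlab constant

prune_counts = np.array([2.0, 1.0, 0.0])   # confident, off-diagonal examples per given label
class_counts = np.array([10.0, 5.0, 5.0])  # all examples per given label
normalization = np.array([8.0, 4.0, 0.0])  # confidently counted examples per given label

# fraction of confident examples that are off-diagonal, scaled up to the full class size
calibrated = prune_counts * class_counts / np.clip(normalization, a_min=CLIPPING_LOWER_BOUND, a_max=None)
num_issues = int(np.rint(np.sum(calibrated)))  # 2.5 + 1.25 + 0.0 = 3.75 -> 4
print(num_issues)  # 4
```

A class with no confidently counted examples (class 2 here) contributes 0 rather than causing a division by zero, because of the clipping.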
+    def get_quality_scores(self) -> np.ndarray:
+        """
+        Fetches already-computed estimate of the label quality of each example seen so far
+        in the same format as: :py:func:`rank.get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`.
+
+        Returns
+        -------
+        label_quality_scores : np.ndarray
+          Contains one score (between 0 and 1) per example seen so far.
+          Lower scores indicate more likely mislabeled examples.
+        """
+        if not self.store_results:
+            raise ValueError(
+                "Must initialize the LabelInspector with `store_results` == True. "
+                "Otherwise you can assemble the label quality scores yourself based on "
+                "the scores returned for each batch of data from `score_label_quality()`"
+            )
+        else:
+            return np.asarray(self.label_quality_scores)
+
+    def get_label_issues(self) -> np.ndarray:
+        """
+        Fetches already-computed estimate of indices of examples with label issues in the data seen so far,
+        in the same format as: :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>`
+        with its `return_indices_ranked_by` argument specified.
+
+        Note: this method corresponds to ``filter.find_label_issues(..., filter_by=METHOD1, return_indices_ranked_by=METHOD2)``
+        where by default: ``METHOD1="low_self_confidence"``, ``METHOD2="self_confidence"``
+        or if this object was instantiated with ``quality_score_kwargs = {"method": "normalized_margin"}`` then we instead have:
+        ``METHOD1="low_normalized_margin"``, ``METHOD2="normalized_margin"``.
+
+        Note: The estimated number of issues may differ from :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>`
+        by 1 due to rounding differences.
+
+        Returns
+        -------
+        issue_indices : np.ndarray
+          Indices of examples with label issues, sorted by label quality score.
+        """
+        if not self.store_results:
+            raise ValueError(
+                "Must initialize the LabelInspector with `store_results` == True. "
+                "Otherwise you can identify label issues yourself based on the scores from all "
+                "the batches of data and the total number of issues returned by `get_num_issues()`"
+            )
+        if self.examples_processed_quality < 1:
+            raise ValueError(
+                "Have not evaluated any labels yet. Call `score_label_quality()` first."
+            )
+        if self.verbose:
+            print(
+                f"Total number of examples whose labels have been evaluated: {self.examples_processed_quality}"
+            )
+        return find_top_issues(self.get_quality_scores(), top=self.get_num_issues(silent=True))
+
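`get_label_issues()` returns the `get_num_issues()` lowest-scoring examples, most severe first. Below is a minimal NumPy stand-in for that ranking step. `top_issue_indices` is hypothetical, breaks ties by original position, and is not necessarily how `rank.find_top_issues` is implemented.

```python
import numpy as np


def top_issue_indices(quality_scores: np.ndarray, top: int) -> np.ndarray:
    """Hypothetical stand-in: indices of the `top` lowest scores, lowest (most severe) first."""
    order = np.argsort(quality_scores, kind="stable")  # ascending scores, ties keep original order
    return order[:top]


scores = np.array([0.9, 0.1, 0.5, 0.05, 0.7])
print(top_issue_indices(scores, top=2))  # [3 1]
```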
+    def update_confident_thresholds(self, labels: LabelLike, pred_probs: np.ndarray):
+        """
+        Updates the estimate of confident_thresholds stored in this class using a new batch of data.
+        Inputs should be in the same format as for: :py:func:`count.get_confident_thresholds <cleanlab.count.get_confident_thresholds>`.
+
+        Parameters
+        ----------
+        labels: np.ndarray or list
+          Given class labels for each example in the batch, values in ``0,1,2,...,K-1``.
+
+        pred_probs: np.ndarray
+          2D array of model-predicted class probabilities for each example in the batch.
+        """
+        labels = _batch_check(labels, pred_probs, self.num_class)
+        batch_size = len(labels)
+        batch_thresholds = get_confident_thresholds(
+            labels, pred_probs
+        )  # values for classes missing from this batch may exceed 1, but they get weight 0 below because their batch class count is 0
+        batch_class_counts = value_counts_fill_missing_classes(labels, num_classes=self.num_class)
+        self.confident_thresholds = (
+            self.examples_per_class * self.confident_thresholds
+            + batch_class_counts * batch_thresholds
+        ) / np.clip(
+            self.examples_per_class + batch_class_counts, a_min=1, a_max=None
+        )  # avoid division by 0
+        self.confident_thresholds = np.clip(
+            self.confident_thresholds, a_min=CONFIDENT_THRESHOLDS_LOWER_BOUND, a_max=None
+        )
+        self.examples_per_class += batch_class_counts
+        self.examples_processed_thresh += batch_size
+
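`update_confident_thresholds()` merges each batch into a count-weighted running average. After all batches, the stored value for each class should equal the mean computed on the full data at once. The sketch below checks this on synthetic data. It assumes, as we understand `count.get_confident_thresholds`, that the threshold for class `k` is the mean of `pred_probs[i, k]` over examples `i` labeled `k`. It omits the lower-bound clipping applied in the real method, and `per_class_mean_self_confidence` is a hypothetical helper. Classes absent from a batch get 0 here, which is harmless because their weight in the average is 0.

```python
import numpy as np


def per_class_mean_self_confidence(labels, pred_probs, num_class):
    """Mean of pred_probs[i, k] over examples i with given label k (0 if class k is absent)."""
    out = np.zeros(num_class)
    for k in range(num_class):
        in_class = labels == k
        if in_class.any():
            out[k] = pred_probs[in_class, k].mean()
    return out


rng = np.random.default_rng(1)
K, n, batch = 3, 200, 64
pred_probs = rng.dirichlet(np.ones(K), size=n)
labels = rng.integers(0, K, size=n)

thresholds = np.zeros(K)
examples_per_class = np.zeros(K)
for start in range(0, n, batch):
    lab_b, pp_b = labels[start:start + batch], pred_probs[start:start + batch]
    batch_counts = np.bincount(lab_b, minlength=K)
    batch_thresholds = per_class_mean_self_confidence(lab_b, pp_b, K)
    # count-weighted running average, the same update rule as above
    thresholds = (examples_per_class * thresholds + batch_counts * batch_thresholds) / np.clip(
        examples_per_class + batch_counts, a_min=1, a_max=None
    )
    examples_per_class += batch_counts

full = per_class_mean_self_confidence(labels, pred_probs, K)
print(np.allclose(thresholds, full))  # True
```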
+    def score_label_quality(
+        self,
+        labels: LabelLike,
+        pred_probs: np.ndarray,
+        *,
+        update_num_issues: bool = True,
+    ) -> np.ndarray:
+        """
+        Scores the label quality of each example in the provided batch of data,
+        and also updates the number of label issues stored in this class.
+        Inputs should be in the same format as for: :py:func:`rank.get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`.
+
+        Parameters
+        ----------
+        labels: np.ndarray
+          Given class labels for each example in the batch, values in ``0,1,2,...,K-1``.
+
+        pred_probs: np.ndarray
+          2D array of model-predicted class probabilities for each example in the batch of data.
+
+        update_num_issues: bool, optional
+          Whether to update the number of label issues or only compute label quality scores.
+          For lower runtimes, set this to ``False`` if you only want to score label quality and not find label issues.
+
+        Returns
+        -------
+        label_quality_scores : np.ndarray
+          Contains one score (between 0 and 1) for each example in the batch of data.
+        """
+        labels = _batch_check(labels, pred_probs, self.num_class)
+        batch_size = len(labels)
+        scores = _compute_label_quality_scores(
+            labels,
+            pred_probs,
+            confident_thresholds=self.get_confident_thresholds(silent=True),
+            **self.quality_score_kwargs,
+        )
+        class_counts = value_counts_fill_missing_classes(labels, num_classes=self.num_class)
+        if update_num_issues:
+            self._update_num_label_issues(labels, pred_probs, **self.num_issue_kwargs)
+        self.examples_processed_quality += batch_size
+        if self.store_results:
+            self.label_quality_scores += list(scores)
+
+        return scores
+
+    def _update_num_label_issues(
+        self,
+        labels: LabelLike,
+        pred_probs: np.ndarray,
+        **kwargs,
+    ):
+        """
+        Update the estimate of num_label_issues stored in this class using a new batch of data.
+        Kwargs are ignored here for now (included for forwards compatibility).
+        Instead of being specified here, `estimation_method` should be declared when this class is initialized.
+        """
+
+        # whether to match the output of count.num_label_issues exactly
+        # default is False, which gives significant speedup on large batches
+        # and empirically matches num_label_issues even on input sizes of
+        # 1M x 10k
+        thorough = False
+        if self.examples_processed_thresh < 1:
+            raise ValueError(
+                "Have not computed any confident_thresholds yet. Call `update_confident_thresholds()` first."
+            )
+
+        if self.n_jobs == 1:
+            adj_confident_thresholds = self.confident_thresholds - FLOATING_POINT_COMPARISON
+            pred_class = np.argmax(pred_probs, axis=1)
+            batch_size = len(labels)
+            if thorough:
+                # add margin for floating point comparison operations:
+                pred_gt_thresholds = pred_probs >= adj_confident_thresholds
+                max_ind = np.argmax(pred_probs * pred_gt_thresholds, axis=1)
+                if not self.off_diagonal_calibrated:
+                    mask = (max_ind != labels) & (pred_class != labels)
+                else:
+                    # calibrated
+                    # should we change to above?
+                    mask = pred_class != labels
+            else:
+                max_ind = pred_class
+                mask = pred_class != labels
+
+            if not self.off_diagonal_calibrated:
+                prune_count_batch = np.sum(
+                    (
+                        pred_probs[np.arange(batch_size), max_ind]
+                        >= adj_confident_thresholds[max_ind]
+                    )
+                    & mask
+                )
+                self.prune_count += prune_count_batch
+            else:  # calibrated
+                self.class_counts += value_counts_fill_missing_classes(
+                    labels, num_classes=self.num_class
+                )
+                to_increment = (
+                    pred_probs[np.arange(batch_size), max_ind] >= adj_confident_thresholds[max_ind]
+                )
+                for class_label in range(self.num_class):
+                    labels_equal_to_class = labels == class_label
+                    self.normalization[class_label] += np.sum(labels_equal_to_class & to_increment)
+                    self.prune_counts[class_label] += np.sum(
+                        labels_equal_to_class
+                        & to_increment
+                        & (max_ind != labels)
+                        # & (pred_class != labels)
+                        # This is not applied in num_label_issues(..., estimation_method="off_diagonal_custom"). Do we want to add it?
+                    )
+        else:  # multiprocessing implementation
+            global adj_confident_thresholds_shared
+            adj_confident_thresholds_shared = self.confident_thresholds - FLOATING_POINT_COMPARISON
+
+            global labels_shared, pred_probs_shared
+            labels_shared = labels
+            pred_probs_shared = pred_probs
+
+            # good values for this are ~1000-10000 in benchmarks where pred_probs has 1B entries:
+            processes = 5000
+            if len(labels) <= processes:
+                chunksize = 1
+            else:
+                chunksize = len(labels) // processes
+            inds = split_arr(np.arange(len(labels)), chunksize)
+
+            if thorough:
+                use_thorough = np.ones(len(inds), dtype=bool)
+            else:
+                use_thorough = np.zeros(len(inds), dtype=bool)
+            args = zip(inds, use_thorough)
+            with mp.Pool(self.n_jobs) as pool:
+                if not self.off_diagonal_calibrated:
+                    prune_count_batch = np.sum(
+                        np.asarray(list(pool.imap_unordered(_compute_num_issues, args)))
+                    )
+                    self.prune_count += prune_count_batch
+                else:
+                    results = list(pool.imap_unordered(_compute_num_issues_calibrated, args))
+                    for result in results:
+                        class_label = result[0]
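+                        # class_label is an array of labels for one chunk and may repeat classes,
+                        # so accumulate with np.add.at (buffered `a[idx] += v` counts each index only once)
+                        np.add.at(self.class_counts, class_label, 1)
+                        np.add.at(self.normalization, class_label, result[1])
+                        np.add.at(self.prune_counts, class_label, result[2])
+
+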
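In the default (`thorough = False`) single-process path, `_update_num_label_issues()` counts an example as a label issue when two conditions hold. Its predicted class (the argmax of `pred_probs`) must differ from its given label, and the predicted probability must reach that class's confident threshold, lowered by a small tolerance. Below is a self-contained sketch of that rule on four hand-picked rows. `count_issues_fast` is hypothetical, and `FLOATING_POINT_COMPARISON` is an assumed stand-in value for the cleanlab constant.

```python
import numpy as np

FLOATING_POINT_COMPARISON = 1e-6  # hypothetical stand-in for the cleanlab constant


def count_issues_fast(labels, pred_probs, confident_thresholds):
    """Count rows whose argmax class differs from the label and whose argmax probability
    reaches that class's (slightly lowered) confident threshold."""
    adj = confident_thresholds - FLOATING_POINT_COMPARISON
    pred_class = np.argmax(pred_probs, axis=1)
    max_prob = pred_probs[np.arange(len(labels)), pred_class]
    return int(np.sum((max_prob >= adj[pred_class]) & (pred_class != labels)))


pred_probs = np.array([
    [0.8, 0.2],  # label 1, predicted 0 with 0.8 >= 0.7: counted
    [0.6, 0.4],  # label 1, predicted 0 but 0.6 < 0.7: not counted
    [0.1, 0.9],  # label 1, predicted 1: agrees with the label
    [0.3, 0.7],  # label 0, predicted 1 with 0.7 >= 0.7 - tolerance: counted
])
labels = np.array([1, 1, 1, 0])
thresholds = np.array([0.7, 0.7])
print(count_issues_fast(labels, pred_probs, thresholds))  # 2
```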
+def split_arr(arr: np.ndarray, chunksize: int) -> List[np.ndarray]:
+    """
+    Helper function to split array into chunks for multiprocessing.
+    """
+    return np.split(arr, np.arange(chunksize, arr.shape[0], chunksize), axis=0)
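`split_arr()` cuts an index array into consecutive chunks that the worker processes handle independently. The sketch below does three things. It reproduces the one-liner. It shows the chunk-size rule from the multiprocessing branch of `_update_num_label_issues()`, where the variable named `processes` actually sets a target number of chunks. It also illustrates a NumPy detail that matters when aggregating per-chunk results: with repeated indices, buffered fancy-index `+=` applies each index once, whereas `np.add.at` accumulates every occurrence.

```python
import numpy as np


def split_arr(arr: np.ndarray, chunksize: int):
    # Same one-liner as above: cut at chunksize, 2*chunksize, ...
    return np.split(arr, np.arange(chunksize, arr.shape[0], chunksize), axis=0)


chunks = split_arr(np.arange(10), 4)
print([c.tolist() for c in chunks])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Chunk-size rule from the multiprocessing branch
n, target_chunks = 12_000, 5000
chunksize = 1 if n <= target_chunks else n // target_chunks
num_chunks = len(split_arr(np.arange(n), chunksize))
print(chunksize, num_chunks)  # 2 6000

# A chunk of labels can repeat a class
chunk_labels = np.array([0, 0, 2])
fancy = np.zeros(3)
fancy[chunk_labels] += 1  # repeated index 0 is incremented only once
accumulated = np.zeros(3)
np.add.at(accumulated, chunk_labels, 1)  # every occurrence is added
print(fancy.tolist(), accumulated.tolist())  # [1.0, 0.0, 1.0] [2.0, 0.0, 1.0]
```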
+
+
+def _compute_num_issues(arg: Tuple[np.ndarray, bool]) -> int:
+    """
+    Helper function for `_update_num_label_issues` multiprocessing without calibration.
+    """
+    ind = arg[0]
+    thorough = arg[1]
+    label = labels_shared[ind]
+    pred_prob = pred_probs_shared[ind, :]
+    pred_class = np.argmax(pred_prob, axis=-1)
+    batch_size = len(label)
+
+    if thorough:
+        pred_gt_thresholds = pred_prob >= adj_confident_thresholds_shared
+        max_ind = np.argmax(pred_prob * pred_gt_thresholds, axis=-1)
+        prune_count_batch = np.sum(
+            (pred_prob[np.arange(batch_size), max_ind] >= adj_confident_thresholds_shared[max_ind])
+            & (max_ind != label)
+            & (pred_class != label)
+        )
+    else:
+        prune_count_batch = np.sum(
+            (
+                pred_prob[np.arange(batch_size), pred_class]
+                >= adj_confident_thresholds_shared[pred_class]
+            )
+            & (pred_class != label)
+        )
+    return prune_count_batch
+
+
+def _compute_num_issues_calibrated(arg: Tuple[np.ndarray, bool]) -> Tuple[Any, int, int]:
+    """
+    Helper function for `_update_num_label_issues` multiprocessing with calibration.
+    """
+    ind = arg[0]
+    thorough = arg[1]
+    label = labels_shared[ind]
+    pred_prob = pred_probs_shared[ind, :]
+    batch_size = len(label)
+
+    pred_class = np.argmax(pred_prob, axis=-1)
+    if thorough:
+        pred_gt_thresholds = pred_prob >= adj_confident_thresholds_shared
+        max_ind = np.argmax(pred_prob * pred_gt_thresholds, axis=-1)
+        to_inc = (
+            pred_prob[np.arange(batch_size), max_ind] >= adj_confident_thresholds_shared[max_ind]
+        )
+
+        prune_count_batch = to_inc & (max_ind != label)
+        normalization_batch = to_inc
+    else:
+        to_inc = (
+            pred_prob[np.arange(batch_size), pred_class]
+            >= adj_confident_thresholds_shared[pred_class]
+        )
+        normalization_batch = to_inc
+        prune_count_batch = to_inc & (pred_class != label)
+
+    return (label, normalization_batch, prune_count_batch)
+
+
+def _batch_check(labels: LabelLike, pred_probs: np.ndarray, num_class: int) -> np.ndarray:
+    """
+    Basic checks to ensure batch of data looks ok.
+    For efficiency, this check is quite minimal.
+
+    Returns
+    -------
+    labels : np.ndarray
+      `labels` formatted as a 1D array.
+    """
+    batch_size = pred_probs.shape[0]
+    labels = np.asarray(labels)
+    if len(labels) != batch_size:
+        raise ValueError("labels and pred_probs must have same length")
+    if pred_probs.shape[1] != num_class:
+        raise ValueError("num_class must equal pred_probs.shape[1]")
+
+    return labels
+
+ Made with Sphinx and @pradyunsg's Furo
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/experimental/mnist_pytorch.html b/v2.6.6/_modules/cleanlab/experimental/mnist_pytorch.html
new file mode 100644
index 000000000..c08484d6c
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/experimental/mnist_pytorch.html
@@ -0,0 +1,1064 @@
+ cleanlab.experimental.mnist_pytorch - cleanlab
Source code for cleanlab.experimental.mnist_pytorch

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+A cleanlab-compatible PyTorch ConvNet classifier that can be used to find
+label issues in image data.
+This is a good example to reference for making your own bespoke model compatible with cleanlab.
+
+You must have PyTorch installed: https://pytorch.org/get-started/locally/
+"""
+
+from sklearn.base import BaseEstimator
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import torch.optim as optim
+from torchvision import datasets, transforms
+from torch.autograd import Variable
+from torch.utils.data.sampler import SubsetRandomSampler
+import numpy as np
+
+
+MNIST_TRAIN_SIZE = 60000
+MNIST_TEST_SIZE = 10000
+SKLEARN_DIGITS_TRAIN_SIZE = 1247
+SKLEARN_DIGITS_TEST_SIZE = 550
+
+
+
+def get_mnist_dataset(loader):  # pragma: no cover
+    """Downloads MNIST as PyTorch dataset.
+
+    Parameters
+    ----------
+    loader : str (values: 'train' or 'test')."""
+    dataset = datasets.MNIST(
+        root="../data",
+        train=(loader == "train"),
+        download=True,
+        transform=transforms.Compose(
+            [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
+        ),
+    )
+    return dataset
+
+
+def get_sklearn_digits_dataset(loader):
+    """Downloads Sklearn handwritten digits dataset.
+    Uses the last SKLEARN_DIGITS_TEST_SIZE examples as the test set.
+    This split is hard-coded; do not change it.
+
+    Parameters
+    ----------
+    loader : str (values: 'train' or 'test')."""
+    from torch.utils.data import Dataset
+    from sklearn.datasets import load_digits
+
+    class TorchDataset(Dataset):
+        """Abstracts a numpy array as a PyTorch dataset."""
+
+        def __init__(self, data, targets, transform=None):
+            self.data = torch.from_numpy(data).float()
+            self.targets = torch.from_numpy(targets).long()
+            self.transform = transform
+
+        def __getitem__(self, index):
+            x = self.data[index]
+            y = self.targets[index]
+            if self.transform:
+                x = self.transform(x)
+            return x, y
+
+        def __len__(self):
+            return len(self.data)
+
+    transform = transforms.Compose(
+        [
+            transforms.ToPILImage(),
+            transforms.Resize(28),
+            transforms.ToTensor(),
+            transforms.Normalize((0.1307,), (0.3081,)),
+        ]
+    )
+    # Get sklearn digits dataset
+    X_all, y_all = load_digits(return_X_y=True)
+    X_all = X_all.reshape((len(X_all), 8, 8))
+    y_train = y_all[:-SKLEARN_DIGITS_TEST_SIZE]
+    y_test = y_all[-SKLEARN_DIGITS_TEST_SIZE:]
+    X_train = X_all[:-SKLEARN_DIGITS_TEST_SIZE]
+    X_test = X_all[-SKLEARN_DIGITS_TEST_SIZE:]
+    if loader == "train":
+        return TorchDataset(X_train, y_train, transform=transform)
+    elif loader == "test":
+        return TorchDataset(X_test, y_test, transform=transform)
+    else:  # pragma: no cover
+        raise ValueError("loader must be either str 'train' or str 'test'.")
+
+
+class SimpleNet(nn.Module):
+    """Basic Pytorch CNN for MNIST-like data."""
+
+    def __init__(self):
+        super(SimpleNet, self).__init__()
+        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
+        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
+        self.conv2_drop = nn.Dropout2d()
+        self.fc1 = nn.Linear(320, 50)
+        self.fc2 = nn.Linear(50, 10)
+
+    def forward(self, x, T=1.0):
+        x = F.relu(F.max_pool2d(self.conv1(x), 2))
+        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
+        x = x.view(-1, 320)
+        x = F.relu(self.fc1(x))
+        x = F.dropout(x, training=self.training)
+        x = self.fc2(x)
+        x = F.log_softmax(x, dim=1)
+        return x
+
+
+class CNN(BaseEstimator):  # Inherits sklearn classifier
+    """Wraps a PyTorch CNN for the MNIST dataset within an sklearn template
+
+    Defines ``.fit()``, ``.predict()``, and ``.predict_proba()`` functions. This
+    template enables the PyTorch CNN to flexibly be used within the sklearn
+    architecture -- meaning it can be passed into functions like
+    cross_val_predict as if it were an sklearn model. The cleanlab library
+    requires that all models adhere to this basic sklearn template and thus,
+    this class allows a PyTorch CNN to be used for learning with noisy
+    labels, among other things.
+
+    Parameters
+    ----------
+    batch_size: int
+    epochs: int
+    log_interval: int
+    lr: float
+    momentum: float
+    no_cuda: bool
+    seed: int
+    test_batch_size: int, default=None
+    dataset: {'mnist', 'sklearn-digits'}
+    loader: {'train', 'test'}
+        Set to 'test' to force fit() and predict_proba() on test_set
+
+    Note
+    ----
+    Be careful setting the ``loader`` param, it will override every other loader.
+    If you set this to 'test', but call .predict(loader = 'train')
+    then .predict() will still predict on test!
+
+    Attributes
+    ----------
+    batch_size: int
+    epochs: int
+    log_interval: int
+    lr: float
+    momentum: float
+    no_cuda: bool
+    seed: int
+    test_batch_size: int, default=None
+    dataset: {'mnist', 'sklearn-digits'}
+    loader: {'train', 'test'}
+        Set to 'test' to force fit() and predict_proba() on test_set
+
+    Methods
+    -------
+    fit
+        fits the model to data.
+    predict
+        get the fitted model's prediction on test data
+    predict_proba
+        get the fitted model's probability distribution over classes for test data
+    """
+
+    def __init__(
+        self,
+        batch_size=64,
+        epochs=6,
+        log_interval=50,  # Set to None to not print
+        lr=0.01,
+        momentum=0.5,
+        no_cuda=False,
+        seed=1,
+        test_batch_size=None,
+        dataset="mnist",
+        loader=None,
+    ):
+        self.batch_size = batch_size
+        self.epochs = epochs
+        self.log_interval = log_interval
+        self.lr = lr
+        self.momentum = momentum
+        self.no_cuda = no_cuda
+        self.seed = seed
+        self.cuda = not self.no_cuda and torch.cuda.is_available()
+        torch.manual_seed(self.seed)
+        if self.cuda:  # pragma: no cover
+            torch.cuda.manual_seed(self.seed)
+
+        # Instantiate PyTorch model
+        self.model = SimpleNet()
+        if self.cuda:  # pragma: no cover
+            self.model.cuda()
+
+        self.loader_kwargs = {"num_workers": 1, "pin_memory": True} if self.cuda else {}
+        self.loader = loader
+        self._set_dataset(dataset)
+        if test_batch_size is not None:
+            self.test_batch_size = test_batch_size
+        else:
+            self.test_batch_size = self.test_size
+
+    def _set_dataset(self, dataset):
+        self.dataset = dataset
+        if dataset == "mnist":
+            # pragma: no cover
+            self.get_dataset = get_mnist_dataset
+            self.train_size = MNIST_TRAIN_SIZE
+            self.test_size = MNIST_TEST_SIZE
+        elif dataset == "sklearn-digits":
+            self.get_dataset = get_sklearn_digits_dataset
+            self.train_size = SKLEARN_DIGITS_TRAIN_SIZE
+            self.test_size = SKLEARN_DIGITS_TEST_SIZE
+        else:  # pragma: no cover
+            raise ValueError("dataset must be 'mnist' or 'sklearn-digits'.")
+
+    # XXX this is a pretty weird sklearn estimator that does data loading
+    # internally in `fit`, and it supports multiple datasets and is aware of
+    # which dataset it's using; if we weren't doing this, we wouldn't need to
+    # override `get_params` / `set_params`
+
+    def get_params(self, deep=True):
+        return {
+            "batch_size": self.batch_size,
+            "epochs": self.epochs,
+            "log_interval": self.log_interval,
+            "lr": self.lr,
+            "momentum": self.momentum,
+            "no_cuda": self.no_cuda,
+            "test_batch_size": self.test_batch_size,
+            "dataset": self.dataset,
+        }
+
+    def set_params(self, **parameters):  # pragma: no cover
+        for parameter, value in parameters.items():
+            if parameter != "dataset":
+                setattr(self, parameter, value)
+        if "dataset" in parameters:
+            self._set_dataset(parameters["dataset"])
+        return self
+
+    def fit(self, train_idx, train_labels=None, sample_weight=None, loader="train"):
+        """This function adheres to sklearn's "fit(X, y)" format for
+        compatibility with scikit-learn. All inputs should be numpy
+        arrays, not PyTorch Tensors. `train_idx` is not X, but instead a list of
+        indices for X (and for y if `train_labels` is None). This function is a
+        member of the CNN class, which handles creation of X, y from
+        `train_idx` via the train_loader."""
+        if self.loader is not None:
+            loader = self.loader
+        if train_labels is not None and len(train_idx) != len(train_labels):
+            raise ValueError("Check that train_idx and train_labels are the same length.")
+
+        if sample_weight is not None:  # pragma: no cover
+            if len(sample_weight) != len(train_labels):
+                raise ValueError(
+                    "Check that train_labels and sample_weight " "are the same length."
+                )
+            class_weight = sample_weight[np.unique(train_labels, return_index=True)[1]]
+            class_weight = torch.from_numpy(class_weight).float()
+            if self.cuda:
+                class_weight = class_weight.cuda()
+        else:
+            class_weight = None
+
+        train_dataset = self.get_dataset(loader)
+
+        # Use provided labels if not None; otherwise use the dataset's own training labels
+        if train_labels is not None:
+            # Create sparse tensor of train_labels with (-1)s for labels not
+            # in train_idx. We avoid train_data[idx] because train_data may
+            # be very large, e.g. ImageNet
+            sparse_labels = (
+                np.zeros(self.train_size if loader == "train" else self.test_size, dtype=int) - 1
+            )
+            sparse_labels[train_idx] = train_labels
+            train_dataset.targets = sparse_labels
+
+        train_loader = torch.utils.data.DataLoader(
+            dataset=train_dataset,
+            # sampler=SubsetRandomSampler(train_idx if train_idx is not None
+            # else range(self.train_size)),
+            sampler=SubsetRandomSampler(train_idx),
+            batch_size=self.batch_size,
+            **self.loader_kwargs,
+        )
+
+        optimizer = optim.SGD(self.model.parameters(), lr=self.lr, momentum=self.momentum)
+
+        # Train for self.epochs epochs
+        for epoch in range(1, self.epochs + 1):
+            # Enable dropout and batch norm layers
+            self.model.train()
+            for batch_idx, (data, target) in enumerate(train_loader):
+                if self.cuda:  # pragma: no cover
+                    data, target = data.cuda(), target.cuda()
+                data, target = Variable(data), Variable(target).long()
+                optimizer.zero_grad()
+                output = self.model(data)
+                loss = F.nll_loss(output, target, class_weight)
+                loss.backward()
+                optimizer.step()
+                if self.log_interval is not None and batch_idx % self.log_interval == 0:
+                    print(
+                        "TrainEpoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
+                            epoch,
+                            batch_idx * len(data),
+                            len(train_idx),
+                            100.0 * batch_idx / len(train_loader),
+                            loss.item(),
+                        ),
+                    )
+
+    def predict(self, idx=None, loader=None):
+        """Get predicted labels from trained model."""
+        # get the index of the max probability
+        probs = self.predict_proba(idx, loader)
+        return probs.argmax(axis=1)
+
+    def predict_proba(self, idx=None, loader=None):
+        if self.loader is not None:
+            loader = self.loader
+        if loader is None:
+            is_test_idx = (
+                idx is not None
+                and len(idx) == self.test_size
+                and np.all(np.array(idx) == np.arange(self.test_size))
+            )
+            loader = "test" if is_test_idx else "train"
+        dataset = self.get_dataset(loader)
+        # Filter by idx
+        if idx is not None:
+            if (loader == "train" and len(idx) != self.train_size) or (
+                loader == "test" and len(idx) != self.test_size
+            ):
+                dataset.data = dataset.data[idx]
+                dataset.targets = dataset.targets[idx]
+
+        loader = torch.utils.data.DataLoader(
+            dataset=dataset,
+            batch_size=self.batch_size if loader == "train" else self.test_batch_size,
+            **self.loader_kwargs,
+        )
+
+        # sets model.train(False) inactivating dropout and batch-norm layers
+        self.model.eval()
+
+        # Run forward pass on model to compute outputs
+        outputs = []
+        for data, _ in loader:
+            if self.cuda:  # pragma: no cover
+                data = data.cuda()
+            with torch.no_grad():
+                data = Variable(data)
+                output = self.model(data)
+            outputs.append(output)
+
+        # Outputs are log_softmax (log probabilities)
+        outputs = torch.cat(outputs, dim=0)
+        # Convert to probabilities and return the numpy array of shape N x K
+        out = outputs.cpu().numpy() if self.cuda else outputs.numpy()
+        pred = np.exp(out)
+        return pred
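`CNN.fit` receives indices rather than features and gives every example outside `train_idx` a placeholder label of -1. It also derives one weight per class from `sample_weight`. `predict_proba` then exponentiates the network's log-softmax outputs. The sketch below reproduces those three steps with NumPy only, so no PyTorch install or dataset download is needed. The helper names are hypothetical, the data is toy data, and the code mirrors the logic above rather than calling it.

```python
import numpy as np


def make_sparse_labels(num_examples, train_idx, train_labels):
    """Full-length label vector: given labels at train_idx, -1 everywhere else."""
    sparse_labels = np.zeros(num_examples, dtype=int) - 1
    sparse_labels[train_idx] = train_labels
    return sparse_labels


def class_weights_from_sample_weight(train_labels, sample_weight):
    """One weight per class, taken from the first example of each class.
    np.unique(..., return_index=True) returns first-occurrence indices sorted by class,
    so this assumes sample weights are constant within each class."""
    first_idx = np.unique(train_labels, return_index=True)[1]
    return np.asarray(sample_weight)[first_idx]


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


train_idx = np.array([0, 2, 3])
train_labels = np.array([1, 0, 1])
print(make_sparse_labels(5, train_idx, train_labels))  # [ 1 -1  0  1 -1]

sample_weight = np.array([2.0, 0.5, 2.0])  # per-example weights, constant within a class
print(class_weights_from_sample_weight(train_labels, sample_weight))  # [0.5 2. ]

logits = np.array([[2.0, 0.0], [0.0, 1.0]])
pred_probs = np.exp(log_softmax(logits))  # analogous to predict_proba: rows sum to 1
print(pred_probs.argmax(axis=1))  # analogous to predict(): [0 1]
```

Passing indices instead of arrays lets `cross_val_predict`-style code split the data without materializing the images. The -1 entries are never read, because the `SubsetRandomSampler` only draws from `train_idx`.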
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/experimental/span_classification.html b/v2.6.6/_modules/cleanlab/experimental/span_classification.html
new file mode 100644
index 000000000..897896e08
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/experimental/span_classification.html
@@ -0,0 +1,785 @@
+ cleanlab.experimental.span_classification - cleanlab
Source code for cleanlab.experimental.span_classification

+"""
+Methods to find label issues in span classification datasets (text data), where each token in a sentence receives one or more class labels.
+
+The underlying label error detection algorithms are in `cleanlab.token_classification`.
+"""
+
+import numpy as np
+from typing import List, Tuple, Optional
+
+from cleanlab.token_classification.filter import find_label_issues as find_label_issues_token
+from cleanlab.token_classification.summary import display_issues as display_issues_token
+from cleanlab.token_classification.rank import (
+    get_label_quality_scores as get_label_quality_scores_token,
+)
+
+
+
+def find_label_issues(
+    labels: list,
+    pred_probs: list,
+):
+    """Identifies tokens with label issues in a span classification dataset.
+
+    Tokens identified with issues will be ranked by their individual label quality score.
+
+    To rank the sentences based on their overall label quality, use :py:func:`experimental.span_classification.get_label_quality_scores <cleanlab.experimental.span_classification.get_label_quality_scores>`
+
+    Parameters
+    ----------
+    labels:
+        Nested list of given labels for all tokens.
+        Refer to documentation for this argument in :py:func:`token_classification.filter.find_label_issues <cleanlab.token_classification.filter.find_label_issues>` for further details.
+
+        Note: Currently, only a single span class is supported.
+
+    pred_probs:
+        List of arrays, one per sentence. For a sentence with ``T`` tokens, the array has shape ``(T,)``
+        and holds the model-predicted probability that each token belongs to the span class.
+        These are converted internally to two-class token probabilities of shape ``(T, 2)``.
+        Refer to documentation for this argument in :py:func:`token_classification.filter.find_label_issues <cleanlab.token_classification.filter.find_label_issues>` for further details.
+
+    Returns
+    -------
+    issues:
+        List of label issues identified by cleanlab, such that each element is a tuple ``(i, j)``, which
+        indicates that the `j`-th token of the `i`-th sentence has a label issue.
+
+        These tuples are ordered in `issues` list based on the likelihood that the corresponding token is mislabeled.
+
+        Use :py:func:`experimental.span_classification.display_issues <cleanlab.experimental.span_classification.display_issues>`
+        to view these issues within the original sentences.
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from cleanlab.experimental.span_classification import find_label_issues
+    >>> labels = [[0, 0, 1, 1], [1, 1, 0]]
+    >>> pred_probs = [
+    ...     np.array([0.9, 0.9, 0.9, 0.1]),
+    ...     np.array([0.1, 0.1, 0.9]),
+    ... ]
+    >>> find_label_issues(labels, pred_probs)
+    """
+    pred_probs_token = _get_pred_prob_token(pred_probs)
+    return find_label_issues_token(labels, pred_probs_token)
+
+
+def display_issues(
+    issues: list,
+    tokens: List[List[str]],
+    *,
+    labels: Optional[list] = None,
+    pred_probs: Optional[list] = None,
+    exclude: List[Tuple[int, int]] = [],
+    class_names: Optional[List[str]] = None,
+    top: int = 20,
+) -> None:
+    """
+    See documentation of :py:meth:`token_classification.summary.display_issues<cleanlab.token_classification.summary.display_issues>` for description.
+    """
+    display_issues_token(
+        issues,
+        tokens,
+        labels=labels,
+        pred_probs=pred_probs,
+        exclude=exclude,
+        class_names=class_names,
+        top=top,
+    )
+
+
+def get_label_quality_scores(
+    labels: list,
+    pred_probs: list,
+    **kwargs,
+) -> Tuple[np.ndarray, list]:
+    """
+    See documentation of :py:meth:`token_classification.rank.get_label_quality_scores<cleanlab.token_classification.rank.get_label_quality_scores>` for description.
+    """
+    pred_probs_token = _get_pred_prob_token(pred_probs)
+    return get_label_quality_scores_token(labels, pred_probs_token, **kwargs)
+
+
+def _get_pred_prob_token(pred_probs: list) -> list:
+    """Converts pred_probs for span classification to pred_probs for token classification."""
+    pred_probs_token = []
+    for probs in pred_probs:
+        pred_probs_token.append(np.stack([1 - probs, probs], axis=1))
+    return pred_probs_token
+
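`_get_pred_prob_token` turns each sentence's 1D array of span-class probabilities into a two-column array whose columns are P(class 0) and P(class 1). That is the per-sentence `(T, K)` format, with `K = 2`, that the `token_classification` functions take. The standalone NumPy sketch below reproduces the conversion on the inputs from the `find_label_issues` docstring example. It also prints the probability of each token's given label. That printout is only an illustration and is not claimed to equal cleanlab's token quality score.

```python
import numpy as np


def span_to_token_pred_probs(pred_probs):
    """Sketch of the conversion done by _get_pred_prob_token: each 1D array of
    P(token is in a span) becomes a (T, 2) array with columns [P(class 0), P(class 1)]."""
    return [np.stack([1 - probs, probs], axis=1) for probs in pred_probs]


labels = [[0, 0, 1, 1], [1, 1, 0]]
pred_probs = [np.array([0.9, 0.9, 0.9, 0.1]), np.array([0.1, 0.1, 0.9])]

token_pred_probs = span_to_token_pred_probs(pred_probs)
print([p.shape for p in token_pred_probs])  # [(4, 2), (3, 2)]

# Illustration only: probability the model assigns to each token's given label
given_label_probs = [p[np.arange(len(lab)), lab] for p, lab in zip(token_pred_probs, labels)]
print(given_label_probs[0])
print(given_label_probs[1])
```

In this toy input most tokens receive low probability for their given label, so they would be natural candidates for review.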
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/filter.html b/v2.6.6/_modules/cleanlab/filter.html
new file mode 100644
index 000000000..b71d2026f
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/filter.html
@@ -0,0 +1,1645 @@
+ cleanlab.filter - cleanlab
Source code for cleanlab.filter
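The `find_label_issues` docstring below defines its ranking scores with short formulas. Two of them are self-confidence, `pred_probs[i][labels[i]]`, and the margin `p(label = k) - max(p(label != k))`. The NumPy sketch below implements those two formulas as written. It then ranks the lowest-scoring examples first, as the `low_*` filters do with `np.argsort(scores)[:num_errors]`. This illustrates the docstring's definitions only. `cleanlab.rank` may normalize these quantities differently, so use `rank.get_label_quality_scores` for the library's actual scores. Here `num_errors` is fixed by hand.

```python
import numpy as np


def self_confidence(labels, pred_probs):
    """pred_probs[i][labels[i]]: probability the model assigns to the given label."""
    return pred_probs[np.arange(len(labels)), labels]


def margin(labels, pred_probs):
    """p(label = k) - max(p(label != k)) for given label k, as written in the docstring.
    Lies in [-1, 1]; negative when another class is more likely than the given one."""
    others = pred_probs.copy()
    others[np.arange(len(labels)), labels] = -np.inf  # mask out the given label
    return self_confidence(labels, pred_probs) - others.max(axis=1)


labels = np.array([0, 1, 1, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.7, 0.2, 0.1],  # given label 1, but the model prefers class 0
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
])

scores = margin(labels, pred_probs)
num_errors = 2  # assumed here; find_label_issues derives it from count.num_label_issues
ranked = np.argsort(scores)[:num_errors]  # lowest-quality examples first
print(ranked)  # [1 3]
```

Sorting all scores costs O(n log n). The comment in `find_label_issues` explains why this is preferred over an O(n) partition: with many tied scores, a partition-based cutoff could return the wrong number of issues.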

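The `confident_joint` parameter and the `'confident_learning'` option below both count examples into (given label, true label) pairs. The simplified NumPy sketch below shows that counting. It assumes the usual confident-learning thresholds: for each class, the mean predicted probability of that class among examples given that label. It omits the calibration and edge-case handling done by `count.compute_confident_joint`, and it assumes every class appears in `labels`. The helper name `simple_confident_joint` is hypothetical. The examples landing off the diagonal are the ones that `filter_by='confident_learning'` would flag. The `'predicted_neq_given'` rule is shown for comparison.

```python
import numpy as np


def simple_confident_joint(labels, pred_probs):
    """Unnormalized confident joint: C[j, k] counts examples with given label j
    confidently counted as true label k. Returns (C, indices_of_off_diagonals)."""
    n, K = pred_probs.shape
    # Per-class threshold: mean predicted probability of class k among examples labeled k
    thresholds = np.array([pred_probs[labels == k, k].mean() for k in range(K)])
    above = pred_probs >= thresholds  # thresholds broadcast across rows
    masked = np.where(above, pred_probs, -np.inf)  # ignore classes below their threshold
    has_candidate = above.any(axis=1)  # examples with no confident class are not counted
    true_guess = masked.argmax(axis=1)  # most likely class among the confident ones

    C = np.zeros((K, K), dtype=int)
    np.add.at(C, (labels[has_candidate], true_guess[has_candidate]), 1)
    off_diag = np.flatnonzero(has_candidate & (true_guess != labels))
    return C, off_diag


labels = np.array([0, 0, 0, 1, 1, 1, 1])
pred_probs = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.2, 0.8],  # third example: labeled 0, looks like 1
    [0.1, 0.9], [0.3, 0.7], [0.4, 0.6], [0.55, 0.45],
])

C, off_diag = simple_confident_joint(labels, pred_probs)
print(C.tolist())  # [[2, 1], [0, 2]]
print(off_diag)  # [2]
print(np.flatnonzero(pred_probs.argmax(axis=1) != labels))  # predicted_neq_given: [2 6]
```

The last example is flagged by the argmax rule but not by the confident joint. Neither of its class probabilities clears the per-class threshold, which makes confident learning the more conservative filter here.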
+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to identify which examples have label issues in a classification dataset.
+The documentation below assumes a dataset with ``N`` examples and ``K`` classes.
+This module is for standard (multi-class) classification where each example is labeled as belonging to exactly one of K classes (e.g. ``labels = np.array([0,0,1,0,2,1])``).
+Some methods here also work for multi-label classification data where each example can be labeled as belonging to multiple classes (e.g. ``labels = [[1,2],[1],[0],[],...]``),
+but we encourage using the methods in the ``cleanlab.multilabel_classification`` module instead for such data.
+"""
+
+import numpy as np
+from sklearn.metrics import confusion_matrix
+import multiprocessing
+import sys
+import warnings
+from typing import Any, Dict, Optional, Tuple, List
+from functools import reduce
+import platform
+
+from cleanlab.count import calibrate_confident_joint, num_label_issues, _reduce_issues
+from cleanlab.rank import order_label_issues, get_label_quality_scores
+import cleanlab.internal.multilabel_scorer as ml_scorer
+from cleanlab.internal.validation import assert_valid_inputs
+from cleanlab.internal.util import (
+    value_counts_fill_missing_classes,
+    round_preserving_row_totals,
+    get_num_classes,
+)
+from cleanlab.internal.multilabel_utils import stack_complement, get_onehot_num_classes, int2onehot
+from cleanlab.typing import LabelLike
+from cleanlab.multilabel_classification.filter import find_multilabel_issues_per_class
+
+# tqdm is a package to print time-to-complete when multiprocessing is used.
+# This package is not necessary, but when installed improves user experience for large datasets.
+try:
+    import tqdm.auto as tqdm
+
+    tqdm_exists = True
+except ImportError as e:  # pragma: no cover
+    tqdm_exists = False
+
+    w = """To see estimated completion times for methods in cleanlab.filter, "pip install tqdm"."""
+    warnings.warn(w)
+
+# psutil is a package used to count physical cores for multiprocessing
+# This package is not necessary, because we can always fall back to logical cores as the default
+try:
+    import psutil
+
+    psutil_exists = True
+except ImportError as e:  # pragma: no cover
+    psutil_exists = False
+
+# global variable for find_label_issues multiprocessing
+pred_probs_by_class: Dict[int, np.ndarray]
+prune_count_matrix_cols: Dict[int, np.ndarray]
+
+
+
[docs]def find_label_issues( + labels: LabelLike, + pred_probs: np.ndarray, + *, + return_indices_ranked_by: Optional[str] = None, + rank_by_kwargs: Optional[Dict[str, Any]] = None, + filter_by: str = "prune_by_noise_rate", + frac_noise: float = 1.0, + num_to_remove_per_class: Optional[List[int]] = None, + min_examples_per_class=1, + confident_joint: Optional[np.ndarray] = None, + n_jobs: Optional[int] = None, + verbose: bool = False, + multi_label: bool = False, +) -> np.ndarray: + """ + Identifies potentially bad labels in a classification dataset using confident learning. + + Returns a boolean mask for the entire dataset where ``True`` represents + an example identified with a label issue and ``False`` represents an example that seems correctly labeled. + + Instead of a mask, you can obtain indices of the examples with label issues in your dataset + (sorted by issue severity) by specifying the `return_indices_ranked_by` argument. + This determines which label quality score is used to quantify severity, + and is useful to view only the top-`J` most severe issues in your dataset. + + The number of indices returned as issues is controlled by `frac_noise`: reduce its + value to identify fewer label issues. If you aren't sure, leave this set to 1.0. + + Tip: if you encounter the error "pred_probs is not defined", try setting + ``n_jobs=1``. + + Parameters + ---------- + labels : np.ndarray or list + A discrete vector of noisy labels for a classification dataset, i.e. some labels may be erroneous. + *Format requirements*: for dataset with K classes, each label must be integer in 0, 1, ..., K-1. + For a standard (multi-class) classification dataset where each example is labeled with one class, + `labels` should be 1D array of shape ``(N,)``, for example: ``labels = [1,0,2,1,1,0...]``. + + pred_probs : np.ndarray, optional + An array of shape ``(N, K)`` of model-predicted class probabilities, + ``P(label=k|x)``. 
Each row of this matrix corresponds + to an example `x` and contains the model-predicted probabilities that + `x` belongs to each possible class, for each of the K classes. The + columns must be ordered such that these probabilities correspond to + class 0, 1, ..., K-1. + + **Note**: Returned label issues are most accurate when they are computed based on out-of-sample `pred_probs` from your model. + To obtain out-of-sample predicted probabilities for every datapoint in your dataset, you can use :ref:`cross-validation <pred_probs_cross_val>`. + This is encouraged to get better results. + + return_indices_ranked_by : {None, 'self_confidence', 'normalized_margin', 'confidence_weighted_entropy'}, default=None + Determines what is returned by this method: either a boolean mask or list of indices np.ndarray. + If ``None``, this function returns a boolean mask (``True`` if example at index is label error). + If not ``None``, this function returns a sorted array of indices of examples with label issues + (instead of a boolean mask). Indices are sorted by label quality score which can be one of: + + - ``'normalized_margin'``: ``normalized margin (p(label = k) - max(p(label != k)))`` + - ``'self_confidence'``: ``[pred_probs[i][labels[i]] for i in label_issues_idx]`` + - ``'confidence_weighted_entropy'``: ``entropy(pred_probs) / self_confidence`` + + rank_by_kwargs : dict, optional + Optional keyword arguments to pass into scoring functions for ranking by + label quality score (see :py:func:`rank.get_label_quality_scores + <cleanlab.rank.get_label_quality_scores>`). + + filter_by : {'prune_by_class', 'prune_by_noise_rate', 'both', 'confident_learning', 'predicted_neq_given', 'low_normalized_margin', 'low_self_confidence'}, default='prune_by_noise_rate' + Method to determine which examples are flagged as having label issue, so you can filter/prune them from the dataset. 
Options: + + - ``'prune_by_noise_rate'``: filters examples with *high probability* of being mislabeled for every non-diagonal in the confident joint (see `prune_counts_matrix` in `filter.py`). These are the examples where (with high confidence) the given label is unlikely to match the predicted label for the example. + - ``'prune_by_class'``: filters the examples with *smallest probability* of belonging to their given class label for every class. + - ``'both'``: filters only those examples that would be filtered by both ``'prune_by_noise_rate'`` and ``'prune_by_class'``. + - ``'confident_learning'``: filters the examples counted as part of the off-diagonals of the confident joint. These are the examples that are confidently predicted to be a different label than their given label. + - ``'predicted_neq_given'``: filters examples for which the predicted class (i.e. argmax of the predicted probabilities) does not match the given label. + - ``'low_normalized_margin'``: filters the examples with *smallest* normalized margin label quality score. The number of issues returned matches :py:func:`count.num_label_issues <cleanlab.count.num_label_issues>`. + - ``'low_self_confidence'``: filters the examples with *smallest* self confidence label quality score. The number of issues returned matches :py:func:`count.num_label_issues <cleanlab.count.num_label_issues>`. + + frac_noise : float, default=1.0 + Used to only return the "top" ``frac_noise * num_label_issues``. The choice of which "top" + label issues to return is dependent on the `filter_by` method used. It works by reducing the + size of the off-diagonals of the `joint` distribution of given labels and true labels + proportionally by `frac_noise` prior to estimating label issues with each method. + This parameter only applies for `filter_by=both`, `filter_by=prune_by_class`, and + `filter_by=prune_by_noise_rate` methods and currently is unused by other methods. 
+ When ``frac_noise=1.0``, return all "confident" estimated noise indices (recommended). + + frac_noise * number_of_mislabeled_examples_in_class_k. + + num_to_remove_per_class : array_like + An iterable of length K, the number of classes. + E.g. if K = 3, ``num_to_remove_per_class=[5, 0, 1]`` would return + the indices of the 5 most likely mislabeled examples in class 0, + and the most likely mislabeled example in class 2. + + Note + ---- + Only set this parameter if ``filter_by='prune_by_class'``. + You may use with ``filter_by='prune_by_noise_rate'``, but + if ``num_to_remove_per_class=k``, then either k-1, k, or k+1 + examples may be removed for any class due to rounding error. If you need + exactly 'k' examples removed from every class, you should use + ``filter_by='prune_by_class'``. + + min_examples_per_class : int, default=1 + Minimum number of examples per class to avoid flagging as label issues. + This is useful to avoid deleting too much data from one class + when pruning noisy examples in datasets with rare classes. + + confident_joint : np.ndarray, optional + An array of shape ``(K, K)`` representing the confident joint, the matrix used for identifying label issues, which + estimates a confident subset of the joint distribution of the noisy and true labels, ``P_{noisy label, true label}``. + Entry ``(j, k)`` in the matrix is the number of examples confidently counted into the pair of ``(noisy label=j, true label=k)`` classes. + The `confident_joint` can be computed using :py:func:`count.compute_confident_joint <cleanlab.count.compute_confident_joint>`. + If not provided, it is computed from the given (noisy) `labels` and `pred_probs`. + + n_jobs : optional + Number of processing threads used by multiprocessing. Default ``None`` + sets to the number of cores on your CPU (physical cores if you have ``psutil`` package installed, otherwise logical cores). + Set this to 1 to *disable* parallel processing (if its causing issues). 
+ Windows users may see a speed-up with ``n_jobs=1``. + + verbose : optional + If ``True``, prints when multiprocessing happens. + + Returns + ------- + label_issues : np.ndarray + If `return_indices_ranked_by` left unspecified, returns a boolean **mask** for the entire dataset + where ``True`` represents a label issue and ``False`` represents an example that is + accurately labeled with high confidence. + If `return_indices_ranked_by` is specified, returns a shorter array of **indices** of examples identified to have + label issues (i.e. those indices where the mask would be ``True``), sorted by likelihood that the corresponding label is correct. + + Note + ---- + Obtain the *indices* of examples with label issues in your dataset by setting `return_indices_ranked_by`. + """ + if not rank_by_kwargs: + rank_by_kwargs = {} + + assert filter_by in [ + "low_normalized_margin", + "low_self_confidence", + "prune_by_noise_rate", + "prune_by_class", + "both", + "confident_learning", + "predicted_neq_given", + ] # TODO: change default to confident_learning ? + allow_one_class = False + if isinstance(labels, np.ndarray) or all(isinstance(lab, int) for lab in labels): + if set(labels) == {0}: # occurs with missing classes in multi-label settings + allow_one_class = True + assert_valid_inputs( + X=None, + y=labels, + pred_probs=pred_probs, + multi_label=multi_label, + allow_one_class=allow_one_class, + ) + + if filter_by in [ + "confident_learning", + "predicted_neq_given", + "low_normalized_margin", + "low_self_confidence", + ] and (frac_noise != 1.0 or num_to_remove_per_class is not None): + warn_str = ( + "frac_noise and num_to_remove_per_class parameters are only supported" + " for filter_by 'prune_by_noise_rate', 'prune_by_class', and 'both'. They " + "are not supported for methods 'confident_learning', 'predicted_neq_given', " + "'low_normalized_margin' or 'low_self_confidence'." 
+ ) + warnings.warn(warn_str) + if (num_to_remove_per_class is not None) and ( + filter_by + in [ + "confident_learning", + "predicted_neq_given", + "low_normalized_margin", + "low_self_confidence", + ] + ): + # TODO - add support for these filters + raise ValueError( + "filter_by 'confident_learning', 'predicted_neq_given', 'low_normalized_margin' " + "or 'low_self_confidence' is not supported (yet) when setting 'num_to_remove_per_class'" + ) + if filter_by == "confident_learning" and isinstance(confident_joint, np.ndarray): + warn_str = ( + "The supplied `confident_joint` is ignored when `filter_by = 'confident_learning'`; confident joint will be " + "re-estimated from the given labels. To use your supplied `confident_joint`, please specify a different " + "`filter_by` value." + ) + warnings.warn(warn_str) + + K = get_num_classes( + labels=labels, pred_probs=pred_probs, label_matrix=confident_joint, multi_label=multi_label + ) + # Boolean set to true if dataset is large + big_dataset = K * len(labels) > 1e8 + + # Set-up number of multiprocessing threads + # On Windows/macOS, when multi_label is True, multiprocessing is much slower + # even for faily large input arrays, so we default to n_jobs=1 in this case + os_name = platform.system() + if n_jobs is None: + if multi_label and os_name != "Linux": + n_jobs = 1 + else: + if psutil_exists: + n_jobs = psutil.cpu_count(logical=False) # physical cores + elif big_dataset: + print( + "To default `n_jobs` to the number of physical cores for multiprocessing in find_label_issues(), please: `pip install psutil`.\n" + "Note: You can safely ignore this message. `n_jobs` only affects runtimes, results will be the same no matter its value.\n" + "Since psutil is not installed, `n_jobs` was set to the number of logical cores by default.\n" + "Disable this message by either installing psutil or specifying the `n_jobs` argument." 
+ ) # pragma: no cover + if not n_jobs: + # either psutil does not exist + # or psutil can return None when physical cores cannot be determined + # switch to logical cores + n_jobs = multiprocessing.cpu_count() + else: + assert n_jobs >= 1 + + if multi_label: + if not isinstance(labels, list): + raise TypeError("`labels` must be list when `multi_label=True`.") + warnings.warn( + "The multi_label argument to filter.find_label_issues() is deprecated and will be removed in future versions. Please use `multilabel_classification.filter.find_label_issues()` instead.", + DeprecationWarning, + ) + return _find_label_issues_multilabel( + labels, + pred_probs, + return_indices_ranked_by, + rank_by_kwargs, + filter_by, + frac_noise, + num_to_remove_per_class, + min_examples_per_class, + confident_joint, + n_jobs, + verbose, + ) + + # Else this is standard multi-class classification + # Number of examples in each class of labels + label_counts = value_counts_fill_missing_classes(labels, K, multi_label=multi_label) + # Ensure labels are of type np.ndarray() + labels = np.asarray(labels) + if confident_joint is None or filter_by == "confident_learning": + from cleanlab.count import compute_confident_joint + + confident_joint, cl_error_indices = compute_confident_joint( + labels=labels, + pred_probs=pred_probs, + multi_label=multi_label, + return_indices_of_off_diagonals=True, + ) + + if filter_by in ["low_normalized_margin", "low_self_confidence"]: + # TODO: consider setting adjust_pred_probs to true based on benchmarks (or adding it kwargs, or ignoring and leaving as false by default) + scores = get_label_quality_scores( + labels, + pred_probs, + method=filter_by[4:], + adjust_pred_probs=False, + ) + num_errors = num_label_issues( + labels, pred_probs, multi_label=multi_label # TODO: Check usage of multilabel + ) + # Find label issues O(nlogn) solution (mapped to boolean mask later in the method) + cl_error_indices = np.argsort(scores)[:num_errors] + # The following is the O(n) 
fastest solution (check for one-off errors), but the problem is if lots of the scores are identical you will overcount, + # you can end up returning more or less and they aren't ranked in the boolean form so there's no way to drop the highest scores randomly + # boundary = np.partition(scores, num_errors)[num_errors] # O(n) solution + # label_issues_mask = scores <= boundary + + if filter_by in ["prune_by_noise_rate", "prune_by_class", "both"]: + # Create `prune_count_matrix` with the number of examples to remove in each class and + # leave at least min_examples_per_class examples per class. + # `prune_count_matrix` is transposed relative to the confident_joint. + prune_count_matrix = _keep_at_least_n_per_class( + prune_count_matrix=confident_joint.T, + n=min_examples_per_class, + frac_noise=frac_noise, + ) + + if num_to_remove_per_class is not None: + # Estimate joint probability distribution over label issues + psy = prune_count_matrix / np.sum(prune_count_matrix, axis=1) + noise_per_s = psy.sum(axis=1) - psy.diagonal() + # Calibrate so that
noise rates sum to num_to_remove_per_class + tmp = (psy.T * num_to_remove_per_class / noise_per_s).T + np.fill_diagonal(tmp, label_counts - num_to_remove_per_class) + prune_count_matrix = round_preserving_row_totals(tmp) + + # Prepare multiprocessing shared data + # On Linux, multiprocessing is started with fork, + # so data can be shared with global variables and copy-on-write (COW) + # On Windows/macOS, processes are started with spawn, + # so data will need to be pickled to the subprocesses through input args + chunksize = max(1, K // n_jobs) + if n_jobs == 1 or os_name == "Linux": + global pred_probs_by_class, prune_count_matrix_cols + pred_probs_by_class = {k: pred_probs[labels == k] for k in range(K)} + prune_count_matrix_cols = {k: prune_count_matrix[:, k] for k in range(K)} + args = [[k, min_examples_per_class, None] for k in range(K)] + else: + args = [ + [k, min_examples_per_class, [pred_probs[labels == k], prune_count_matrix[:, k]]] + for k in range(K) + ] + + # Perform Pruning with threshold probabilities from BFPRT algorithm in O(n) + # Operations are parallelized across all CPU processes + if filter_by == "prune_by_class" or filter_by == "both": + if n_jobs > 1: + with multiprocessing.Pool(n_jobs) as p: + if verbose: # pragma: no cover + print("Parallel processing label issues by class.") + sys.stdout.flush() + if big_dataset and tqdm_exists: + label_issues_masks_per_class = list( + tqdm.tqdm(p.imap(_prune_by_class, args, chunksize=chunksize), total=K) + ) + else: + label_issues_masks_per_class = p.map(_prune_by_class, args, chunksize=chunksize) + else: + label_issues_masks_per_class = [_prune_by_class(arg) for arg in args] + + label_issues_mask = np.zeros(len(labels), dtype=bool) + for k, mask in enumerate(label_issues_masks_per_class): + if len(mask) > 1: + label_issues_mask[labels == k] = mask + + if filter_by == "both": + label_issues_mask_by_class = label_issues_mask + + if filter_by == "prune_by_noise_rate" or filter_by == "both": + if n_jobs > 1: + with
multiprocessing.Pool(n_jobs) as p: + if verbose: # pragma: no cover + print("Parallel processing label issues by noise rate.") + sys.stdout.flush() + if big_dataset and tqdm_exists: + label_issues_masks_per_class = list( + tqdm.tqdm(p.imap(_prune_by_count, args, chunksize=chunksize), total=K) + ) + else: + label_issues_masks_per_class = p.map(_prune_by_count, args, chunksize=chunksize) + else: + label_issues_masks_per_class = [_prune_by_count(arg) for arg in args] + + label_issues_mask = np.zeros(len(labels), dtype=bool) + for k, mask in enumerate(label_issues_masks_per_class): + if len(mask) > 1: + label_issues_mask[labels == k] = mask + + if filter_by == "both": + label_issues_mask = label_issues_mask & label_issues_mask_by_class + + if filter_by in ["confident_learning", "low_normalized_margin", "low_self_confidence"]: + label_issues_mask = np.zeros(len(labels), dtype=bool) + label_issues_mask[cl_error_indices] = True + + if filter_by == "predicted_neq_given": + label_issues_mask = find_predicted_neq_given(labels, pred_probs, multi_label=multi_label) + + if filter_by not in ["low_self_confidence", "low_normalized_margin"]: + # Remove label issues if model prediction is close to given label + mask = _reduce_issues(pred_probs=pred_probs, labels=labels) + label_issues_mask[mask] = False + + if verbose: + print("Number of label issues found: {}".format(sum(label_issues_mask))) + + # TODO: run count.num_label_issues() and adjust the total issues found here to match + if return_indices_ranked_by is not None: + er = order_label_issues( + label_issues_mask=label_issues_mask, + labels=labels, + pred_probs=pred_probs, + rank_by=return_indices_ranked_by, + rank_by_kwargs=rank_by_kwargs, + ) + return er + return label_issues_mask
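For the `low_self_confidence` and `low_normalized_margin` filters, `find_label_issues` scores every example and flags the `num_errors` lowest-scoring ones via `np.argsort(scores)[:num_errors]`. The sketch below reproduces that ranking step for self-confidence only. It assumes the self-confidence score is the predicted probability of the given label (computed directly here instead of calling `get_label_quality_scores`) and takes `num_errors` as an argument instead of estimating it with `num_label_issues`; `flag_lowest_self_confidence` is a hypothetical name.

```python
import numpy as np


def flag_lowest_self_confidence(labels, pred_probs, num_errors):
    """Hypothetical helper: flag the `num_errors` examples whose given label
    receives the lowest predicted probability (self-confidence)."""
    labels = np.asarray(labels)
    # Self-confidence (assumed score): predicted probability of each example's given label
    scores = pred_probs[np.arange(len(labels)), labels]
    # O(n log n) ranking, lowest scores first
    issue_indices = np.argsort(scores)[:num_errors]
    # Map the selected indices back to a boolean mask over the whole dataset
    mask = np.zeros(len(labels), dtype=bool)
    mask[issue_indices] = True
    return mask


labels = np.array([0, 1, 1, 0])
pred_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
mask = flag_lowest_self_confidence(labels, pred_probs, num_errors=2)
print(mask)  # examples 2 and 3 (scores 0.3 and 0.4) are flagged
```

Ranking with a full sort keeps exactly `num_errors` flags even when many scores tie, which is the trade-off the comments in the listing describe for the O(n) `np.partition` alternative.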
+ + +def _find_label_issues_multilabel( + labels: list, + pred_probs: np.ndarray, + return_indices_ranked_by: Optional[str] = None, + rank_by_kwargs={}, + filter_by: str = "prune_by_noise_rate", + frac_noise: float = 1.0, + num_to_remove_per_class: Optional[List[int]] = None, + min_examples_per_class=1, + confident_joint: Optional[np.ndarray] = None, + n_jobs: Optional[int] = None, + verbose: bool = False, + low_memory: bool = False, +) -> np.ndarray: + """ + Finds label issues in multi-label classification data where each example can belong to more than one class. + This is done via a one-vs-rest reduction for each class and the results are subsequently aggregated across all classes. + Here `labels` must be formatted as an iterable of iterables, e.g. ``List[List[int]]``. + """ + if filter_by in ["low_normalized_margin", "low_self_confidence"] and not low_memory: + num_errors = sum( + find_label_issues( + labels=labels, + pred_probs=pred_probs, + confident_joint=confident_joint, + multi_label=True, + filter_by="confident_learning", + ) + ) + + y_one, num_classes = get_onehot_num_classes(labels, pred_probs) + label_quality_scores = ml_scorer.get_label_quality_scores( + labels=y_one, + pred_probs=pred_probs, + ) + + cl_error_indices = np.argsort(label_quality_scores)[:num_errors] + label_issues_mask = np.zeros(len(labels), dtype=bool) + label_issues_mask[cl_error_indices] = True + + if return_indices_ranked_by is not None: + label_quality_scores_issues = ml_scorer.get_label_quality_scores( + labels=y_one[label_issues_mask], + pred_probs=pred_probs[label_issues_mask], + method=ml_scorer.MultilabelScorer( + base_scorer=ml_scorer.ClassLabelScorer.from_str(return_indices_ranked_by), + ), + base_scorer_kwargs=rank_by_kwargs, + ) + return cl_error_indices[np.argsort(label_quality_scores_issues)] + + return label_issues_mask + + per_class_issues = find_multilabel_issues_per_class( + labels, + pred_probs, + return_indices_ranked_by, + rank_by_kwargs, + filter_by, + 
frac_noise, + num_to_remove_per_class, + min_examples_per_class, + confident_joint, + n_jobs, + verbose, + low_memory, + ) + if return_indices_ranked_by is None: + assert isinstance(per_class_issues, np.ndarray) + return per_class_issues.sum(axis=1) >= 1 + else: + label_issues_list, labels_list, pred_probs_list = per_class_issues + label_issues_idx = reduce(np.union1d, label_issues_list) + y_one, num_classes = get_onehot_num_classes(labels, pred_probs) + label_quality_scores = ml_scorer.get_label_quality_scores( + labels=y_one, + pred_probs=pred_probs, + method=ml_scorer.MultilabelScorer( + base_scorer=ml_scorer.ClassLabelScorer.from_str(return_indices_ranked_by), + ), + base_scorer_kwargs=rank_by_kwargs, + ) + label_quality_scores_issues = label_quality_scores[label_issues_idx] + return label_issues_idx[np.argsort(label_quality_scores_issues)] + + +def _keep_at_least_n_per_class( + prune_count_matrix: np.ndarray, n: int, *, frac_noise: float = 1.0 +) -> np.ndarray: + """Make sure every class has at least n examples after removing noise. + Functionally, for each column, this increases the diagonal term #(true_label=k,label=k) + of prune_count_matrix until it is at least n, distributing the amount + increased by subtracting uniformly from the rest of the terms in the + column. When frac_noise = 1.0, return all "confidently" estimated + noise indices, otherwise this returns frac_noise fraction of all + the noise counts, with diagonal terms adjusted to ensure column + totals are preserved. + + Parameters + ---------- + prune_count_matrix : np.ndarray of shape (K, K), K = number of classes + A matrix of counts of mislabeled examples in every class. + NOTE prune_count_matrix is transposed relative to confident_joint. + + n : int + Number of examples to make sure are left in each class. + + frac_noise : float, default=1.0 + Used to only return the "top" ``frac_noise * num_label_issues``.
The choice of which "top" + label issues to return is dependent on the `filter_by` method used. It works by reducing the + size of the off-diagonals of the `prune_count_matrix` of given labels and true labels + proportionally by `frac_noise` prior to estimating label issues with each method. + When frac_noise=1.0, return all "confident" estimated noise indices (recommended). + + Returns + ------- + prune_count_matrix : np.ndarray of shape (K, K), K = number of classes + This is the same as the confident_joint, but has been transposed and the counts are adjusted. + """ + + prune_count_matrix_diagonal = np.diagonal(prune_count_matrix) + + # Set diagonal terms less than n, to n. + new_diagonal = np.maximum(prune_count_matrix_diagonal, n) + + # Find how much diagonal terms were increased. + diff_per_col = new_diagonal - prune_count_matrix_diagonal + + # Count non-zero, non-diagonal items per column + # np.maximum(*, 1) makes this never 0 (we divide by this next) + num_noise_rates_per_col = np.maximum( + np.count_nonzero(prune_count_matrix, axis=0) - 1.0, + 1.0, + ) + + # Uniformly decrease non-zero noise rates by the same amount + # that the diagonal items were increased + new_mat = prune_count_matrix - diff_per_col / num_noise_rates_per_col + + # Originally zero noise rates will now be negative, fix them back to zero + new_mat[new_mat < 0] = 0 + + # Set diagonal terms (correctly labeled examples) to their new values + np.fill_diagonal(new_mat, new_diagonal) + + # Reduce (multiply) all noise rates (non-diagonal) by frac_noise and + # increase diagonal by the total amount reduced in each column + # to preserve column counts. + new_mat = _reduce_prune_counts(new_mat, frac_noise) + + # These are counts, so return a matrix of ints.
+ return round_preserving_row_totals(new_mat).astype(int) + + +def _reduce_prune_counts(prune_count_matrix: np.ndarray, frac_noise: float = 1.0) -> np.ndarray: + """Reduce (multiply) all prune counts (non-diagonal) by frac_noise and + increase diagonal by the total amount reduced in each column to + preserve column counts. + + Parameters + ---------- + prune_count_matrix : np.ndarray of shape (K, K), K = number of classes + A matrix of counts of mislabeled examples in every class. For this function, it + does not matter what the rows or columns are, but the diagonal terms + reflect the number of correctly labeled examples. + + frac_noise : float + Used to only return the "top" ``frac_noise * num_label_issues``. The choice of which "top" + label issues to return is dependent on the `filter_by` method used. It works by reducing the + size of the off-diagonals of the `prune_count_matrix` of given labels and true labels + proportionally by `frac_noise` prior to estimating label issues with each method. + When frac_noise=1.0, return all "confident" estimated noise indices (recommended). + """ + + new_mat = prune_count_matrix * frac_noise + np.fill_diagonal(new_mat, prune_count_matrix.diagonal()) + np.fill_diagonal( + new_mat, + prune_count_matrix.diagonal() + np.sum(prune_count_matrix - new_mat, axis=0), + ) + + # These are counts, so return a matrix of ints. + return new_mat.astype(int) + + +
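The two helpers above adjust a `(K, K)` matrix of prune counts while keeping each column total fixed. The standalone NumPy sketch below re-implements both steps on toy matrices to make the bookkeeping concrete. The first function raises each diagonal entry to at least `n` and subtracts the increase uniformly from the column's nonzero off-diagonal entries. The second scales off-diagonals by `frac_noise` and moves the removed mass back onto the diagonal. The function names are hypothetical, and the integer rounding done by `round_preserving_row_totals` is left out.

```python
import numpy as np


def keep_at_least_n(pcm, n):
    """Hypothetical sketch of _keep_at_least_n_per_class (no frac_noise step, no rounding)."""
    diag = np.diagonal(pcm)
    new_diag = np.maximum(diag, n)
    diff_per_col = new_diag - diag
    # Number of nonzero off-diagonal entries per column, never below 1
    num_offdiag = np.maximum(np.count_nonzero(pcm, axis=0) - 1.0, 1.0)
    # Broadcasting subtracts diff_per_col[j] / num_offdiag[j] from every entry of column j
    new_mat = pcm - diff_per_col / num_offdiag
    new_mat[new_mat < 0] = 0
    np.fill_diagonal(new_mat, new_diag)
    return new_mat


def reduce_prune_counts(pcm, frac_noise=1.0):
    """Hypothetical sketch of _reduce_prune_counts: scale off-diagonals, keep column totals."""
    new_mat = pcm * frac_noise
    np.fill_diagonal(new_mat, pcm.diagonal())
    # Add back to each diagonal entry whatever was removed from its column
    np.fill_diagonal(new_mat, pcm.diagonal() + np.sum(pcm - new_mat, axis=0))
    return new_mat.astype(int)


pcm = np.array([[2.0, 3.0], [6.0, 40.0]])
kept = keep_at_least_n(pcm, n=5)
print(kept.tolist())  # [[5.0, 3.0], [3.0, 40.0]]: column totals stay [8, 43]

reduced = reduce_prune_counts(np.array([[50.0, 4.0], [10.0, 36.0]]), frac_noise=0.5)
print(reduced.tolist())  # [[55, 2], [5, 38]]: column totals stay [60, 40]
```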
def find_predicted_neq_given( + labels: LabelLike, pred_probs: np.ndarray, *, multi_label: bool = False +) -> np.ndarray: + """A simple baseline approach that considers ``argmax(pred_probs) != labels`` as the examples with label issues. + + Parameters + ---------- + labels : np.ndarray or list + Labels in the same format expected by the `~cleanlab.filter.find_label_issues` function. + + pred_probs : np.ndarray + Predicted-probabilities in the same format expected by the `~cleanlab.filter.find_label_issues` function. + + multi_label : bool, optional + Whether each example may have multiple labels or not (see documentation for the `~cleanlab.filter.find_label_issues` function). + + Returns + ------- + label_issues_mask : np.ndarray + A boolean mask for the entire dataset where ``True`` represents a + label issue and ``False`` represents an example that is accurately + labeled with high confidence. + """ + + assert_valid_inputs(X=None, y=labels, pred_probs=pred_probs, multi_label=multi_label) + if multi_label: + if not isinstance(labels, list): + raise TypeError("`labels` must be list when `multi_label=True`.") + else: + return _find_predicted_neq_given_multilabel(labels=labels, pred_probs=pred_probs) + else: + return np.argmax(pred_probs, axis=1) != np.asarray(labels)
+ + +def _find_predicted_neq_given_multilabel(labels: list, pred_probs: np.ndarray) -> np.ndarray: + """Apply `find_predicted_neq_given` to each class in a one-vs-rest fashion and flag an example if any class disagrees. + + Parameters + ---------- + labels : list + List of noisy labels for multi-label classification where each example can belong to multiple classes + (e.g. ``labels = [[1,2],[1],[0],[],...]`` indicates the first example in dataset belongs to both class 1 and class 2). + + pred_probs : np.ndarray + Predicted-probabilities in the same format expected by the `~cleanlab.filter.find_label_issues` function. + + Returns + ------- + label_issues_mask : np.ndarray + A boolean mask for the entire dataset where ``True`` represents a + label issue and ``False`` represents an example that is accurately + labeled with high confidence. + + """ + y_one, num_classes = get_onehot_num_classes(labels, pred_probs) + pred_neq: np.ndarray = np.zeros(y_one.shape).astype(bool) + for class_num, (label, pred_prob_for_class) in enumerate(zip(y_one.T, pred_probs.T)): + pred_probs_binary = stack_complement(pred_prob_for_class) + pred_neq[:, class_num] = find_predicted_neq_given( + labels=label, pred_probs=pred_probs_binary + ) + return pred_neq.sum(axis=1) >= 1 + + +
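The multi-label baseline above reduces the problem to one binary task per class and flags an example if any class disagrees. The sketch below shows the same one-vs-rest idea in plain NumPy. It assumes that `stack_complement` stacks each class probability `p` as `[1 - p, p]`, so the binary argmax predicts "class present" exactly when `p > 0.5`. The function name is hypothetical.

```python
import numpy as np


def predicted_neq_given_multilabel(labels, pred_probs):
    """Hypothetical one-vs-rest baseline: flag an example if, for any class,
    thresholding its probability at 0.5 disagrees with the given labels."""
    num_examples, num_classes = pred_probs.shape
    # Multi-hot encode the list-of-lists labels (empty lists stay all zeros)
    y_one = np.zeros((num_examples, num_classes), dtype=int)
    for i, example_labels in enumerate(labels):
        for k in example_labels:
            y_one[i, k] = 1
    # argmax over [1 - p, p] is 1 exactly when p > 0.5 (ties go to "absent")
    pred_one = (pred_probs > 0.5).astype(int)
    return (pred_one != y_one).any(axis=1)


labels = [[0, 1], [1], [], [2]]
pred_probs = np.array(
    [[0.9, 0.8, 0.1], [0.2, 0.3, 0.1], [0.1, 0.2, 0.3], [0.1, 0.1, 0.9]]
)
print(predicted_neq_given_multilabel(labels, pred_probs).tolist())
# [False, True, False, False]: example 1 is labeled 1 but p(class 1) = 0.3
```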
def find_label_issues_using_argmax_confusion_matrix( + labels: np.ndarray, + pred_probs: np.ndarray, + *, + calibrate: bool = True, + filter_by: str = "prune_by_noise_rate", +) -> np.ndarray: + """A baseline approach that uses the confusion matrix + of ``argmax(pred_probs)`` and labels as the confident joint and then uses cleanlab + (confident learning) to find the label issues using this matrix. + + The only difference between this and `~cleanlab.filter.find_label_issues` is that it uses the confusion matrix + based on the argmax and given label instead of using the confident joint + from :py:func:`count.compute_confident_joint + <cleanlab.count.compute_confident_joint>`. + + Parameters + ---------- + labels : np.ndarray + An array of shape ``(N,)`` of noisy labels, i.e. some labels may be erroneous. + Elements must be in the set 0, 1, ..., K-1, where K is the number of classes. + + pred_probs : np.ndarray + An array of shape ``(N, K)`` of model-predicted probabilities, + ``P(label=k|x)``. Each row of this matrix corresponds + to an example `x` and contains the model-predicted probabilities that + `x` belongs to each possible class, for each of the K classes. The + columns must be ordered such that these probabilities correspond to + class 0, 1, ..., K-1. `pred_probs` should have been computed using 3 (or + higher) fold cross-validation. + + calibrate : bool, default=True + Set to ``True`` to calibrate the confusion matrix built from ``argmax(pred_probs)`` and the given labels. + This calibration adjusts the confusion matrix / confident joint so that the + prior (given noisy labels) is correct based on the original labels. + + filter_by : str, default='prune_by_noise_rate' + See `filter_by` argument of `~cleanlab.filter.find_label_issues`. + + Returns + ------- + label_issues_mask : np.ndarray + A boolean mask for the entire dataset where ``True`` represents a + label issue and ``False`` represents an example that is accurately + labeled with high confidence.
+ + """ + + assert_valid_inputs(X=None, y=labels, pred_probs=pred_probs, multi_label=False) + confident_joint = confusion_matrix(np.argmax(pred_probs, axis=1), labels).T + if calibrate: + confident_joint = calibrate_confident_joint(confident_joint, labels) + return find_label_issues( + labels=labels, + pred_probs=pred_probs, + confident_joint=confident_joint, + filter_by=filter_by, + )
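`find_label_issues_using_argmax_confusion_matrix` builds its confident joint as `confusion_matrix(np.argmax(pred_probs, axis=1), labels).T`, so rows index the given label and columns index the argmax prediction. Below is a minimal NumPy sketch of that counting step, assuming integer labels in `0..K-1`. The helper name is hypothetical. Unlike scikit-learn's `confusion_matrix`, which sizes the matrix from the labels it actually observes, this sketch always returns a `(K, K)` matrix.

```python
import numpy as np


def argmax_confusion_joint(labels, pred_probs):
    """Hypothetical helper: count (given label, argmax prediction) pairs."""
    labels = np.asarray(labels)
    num_classes = pred_probs.shape[1]
    predictions = np.argmax(pred_probs, axis=1)
    joint = np.zeros((num_classes, num_classes), dtype=int)
    # Unbuffered in-place add, so repeated (label, prediction) pairs are all counted
    np.add.at(joint, (labels, predictions), 1)
    return joint


labels = np.array([0, 0, 1, 1, 2])
pred_probs = np.array(
    [
        [0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7],
        [0.1, 0.1, 0.8],
    ]
)
joint = argmax_confusion_joint(labels, pred_probs)
print(joint.tolist())  # [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
```

The off-diagonal entries (one example labeled 0 but predicted 1, one labeled 1 but predicted 2) are what the pruning methods later treat as label-issue counts.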
+ + +# Multiprocessing helper functions: + +mp_params: Dict[str, Any] = {} # Globals to be shared across processes in multiprocessing + + +def _to_np_array( + mp_arr: bytearray, dtype="int32", shape: Optional[Tuple[int, int]] = None +) -> np.ndarray: # pragma: no cover + """Multiprocessing helper function to convert a multiprocessing + RawArray to a numpy array.""" + arr = np.frombuffer(mp_arr, dtype=dtype) + if shape is None: + return arr + return arr.reshape(shape) + + +def _init( + __labels, + __label_counts, + __prune_count_matrix, + __pcm_shape, + __pred_probs, + __pred_probs_shape, + __multi_label, + __min_examples_per_class, +): # pragma: no cover + """Shares memory objects across child processes. + ASSUMES none of these will be changed by child processes!""" + + mp_params["labels"] = __labels + mp_params["label_counts"] = __label_counts + mp_params["prune_count_matrix"] = __prune_count_matrix + mp_params["pcm_shape"] = __pcm_shape + mp_params["pred_probs"] = __pred_probs + mp_params["pred_probs_shape"] = __pred_probs_shape + mp_params["multi_label"] = __multi_label + mp_params["min_examples_per_class"] = __min_examples_per_class + + +def _get_shared_data() -> Any: # pragma: no cover + """Multiprocessing helper function to extract numpy arrays from + the shared RawArray objects used to share data across processes.""" + + label_counts = _to_np_array(mp_params["label_counts"]) + prune_count_matrix = _to_np_array( + mp_arr=mp_params["prune_count_matrix"], + shape=mp_params["pcm_shape"], + ) + pred_probs = _to_np_array( + mp_arr=mp_params["pred_probs"], + dtype="float32", + shape=mp_params["pred_probs_shape"], + ) + min_examples_per_class = mp_params["min_examples_per_class"] + multi_label = mp_params["multi_label"] + labels = _to_np_array(mp_params["labels"]) # type: ignore + return ( + labels, + label_counts, + prune_count_matrix, + pred_probs, + multi_label, + min_examples_per_class, + ) + + +# TODO figure out what the types inside args are.
+def _prune_by_class(args: list) -> np.ndarray: + """Multiprocessing helper function for find_label_issues() + that assumes globals and produces a mask over the examples with noisy label k by + removing the examples with the *smallest probability* of + belonging to their given class label. + + Parameters + ---------- + args : list + ``[k, min_examples_per_class, arrays]``, where k (between 0 and num classes - 1) is the + class of interest and arrays is either None (use the globals) or + ``[pred_probs of examples labeled k, column k of prune_count_matrix]``.""" + + k, min_examples_per_class, arrays = args + if arrays is None: + pred_probs = pred_probs_by_class[k] + prune_count_matrix = prune_count_matrix_cols[k] + else: + pred_probs = arrays[0] + prune_count_matrix = arrays[1] + + label_counts = pred_probs.shape[0] + label_issues = np.zeros(label_counts, dtype=bool) + if label_counts > min_examples_per_class: # No prune if not at least min_examples_per_class + num_issues = label_counts - prune_count_matrix[k] + # Get the indices of the smallest probs of class k for examples with noisy label k + # rank = np.partition(class_probs, num_issues)[num_issues] + if num_issues >= 1: + class_probs = pred_probs[:, k] + order = np.argsort(class_probs) + label_issues[order[:num_issues]] = True + return label_issues + + warnings.warn( + f"May not flag all label issues in class: {k}, it has too few examples (see argument: `min_examples_per_class`)" + ) + return label_issues + + +# TODO figure out what the types inside args are.
+def _prune_by_count(args: list) -> np.ndarray: + """Multiprocessing helper function for find_label_issues() that assumes + globals and produces a mask over the examples with noisy label k by + removing, for each other class j, the examples with noisy label k having the *largest margin*, + where + margin of example := prob of class j - prob of given label k + + Parameters + ---------- + args : list + ``[k, min_examples_per_class, arrays]``, where k (between 0 and num classes - 1) is the + given-label class of interest and arrays is either None (use the globals) or + ``[pred_probs of examples labeled k, column k of prune_count_matrix]``.""" + + k, min_examples_per_class, arrays = args + if arrays is None: + pred_probs = pred_probs_by_class[k] + prune_count_matrix = prune_count_matrix_cols[k] + else: + pred_probs = arrays[0] + prune_count_matrix = arrays[1] + + label_counts = pred_probs.shape[0] + label_issues_mask = np.zeros(label_counts, dtype=bool) + if label_counts <= min_examples_per_class: + warnings.warn( + f"May not flag all label issues in class: {k}, it has too few examples (see `min_examples_per_class` argument)" + ) + return label_issues_mask + + K = pred_probs.shape[1] + if K < 1: + raise ValueError("Must have at least 1 class.") + for j in range(K): + num2prune = prune_count_matrix[j] + # Only prune for noise rates, not diagonal entries + if k != j and num2prune > 0: + # Flag the num2prune examples with the largest p(class j) - p(given class k) + margin = pred_probs[:, j] - pred_probs[:, k] + order = np.argsort(-margin) + label_issues_mask[order[:num2prune]] = True + return label_issues_mask + + +# TODO: decide if we want to keep this based on TODO above. If so move to utils. Add unit test for this. +def _multiclass_crossval_predict( + labels: list, pred_probs: np.ndarray +) -> np.ndarray: # pragma: no cover + """Returns a numpy 2D array of one-hot encoded + multiclass predictions. Each row in the array + provides the predictions for a particular example. 
+ The threshold used to binarize predictions is the candidate in + ``np.arange(0.05, 0.9, 0.05)`` that maximizes the micro-averaged F1 score. + + Parameters + ---------- + labels : list of lists (length N) + These are multiclass labels.
Each list in the list contains all the + labels for that example. + + pred_probs : np.ndarray (shape (N, K)) + P(label=k|x) is a matrix with K model-predicted probabilities. + Each row of this matrix corresponds to an example `x` and contains the model-predicted + probabilities that `x` belongs to each possible class. + The columns must be ordered such that these probabilities correspond to class 0,1,2,... + `pred_probs` should have been computed using 3 (or higher) fold cross-validation.""" + + from sklearn.metrics import f1_score + + boundaries = np.arange(0.05, 0.9, 0.05) + K = get_num_classes( + labels=labels, + pred_probs=pred_probs, + multi_label=True, + ) + labels_one_hot = int2onehot(labels, K) + f1s = [ + f1_score( + labels_one_hot, + (pred_probs > boundary).astype(np.uint8), + average="micro", + ) + for boundary in boundaries + ] + boundary = boundaries[np.argmax(f1s)] + pred = (pred_probs > boundary).astype(np.uint8) + return pred +
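The two per-class pruning rules used by `find_label_issues` (`_prune_by_class` and `_prune_by_count`, defined above) can be sketched on the rows of a single given class `k`. In this simplified version, `pred_probs_k` holds the predicted probabilities of the examples labeled `k`, and `pcm_col` stands for column `k` of the transposed prune count matrix. The function names are hypothetical, and the `min_examples_per_class` warning is left out.

```python
import numpy as np


def prune_by_class_sketch(pred_probs_k, k, pcm_col, min_examples_per_class=1):
    """Keep pcm_col[k] examples labeled k; flag the rest, smallest p(class k) first."""
    n_k = pred_probs_k.shape[0]
    mask = np.zeros(n_k, dtype=bool)
    if n_k <= min_examples_per_class:
        return mask
    num_issues = n_k - int(pcm_col[k])
    if num_issues >= 1:
        order = np.argsort(pred_probs_k[:, k])  # ascending p(class k)
        mask[order[:num_issues]] = True
    return mask


def prune_by_count_sketch(pred_probs_k, k, pcm_col, min_examples_per_class=1):
    """For each other class j, flag the pcm_col[j] examples labeled k
    with the largest margin p(class j) - p(class k)."""
    n_k, num_classes = pred_probs_k.shape
    mask = np.zeros(n_k, dtype=bool)
    if n_k <= min_examples_per_class:
        return mask
    for j in range(num_classes):
        num2prune = int(pcm_col[j])
        if j != k and num2prune > 0:
            margin = pred_probs_k[:, j] - pred_probs_k[:, k]
            order = np.argsort(-margin)  # largest margin first
            mask[order[:num2prune]] = True
    return mask


# Four examples whose given label is k = 0
pred_probs_k = np.array(
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.3, 0.1, 0.6], [0.5, 0.4, 0.1]]
)
pcm_col = np.array([2, 1, 1])  # keep 2; 1 example likely belongs to class 1, 1 to class 2
print(prune_by_class_sketch(pred_probs_k, 0, pcm_col).tolist())  # [False, True, True, False]
print(prune_by_count_sketch(pred_probs_k, 0, pcm_col).tolist())  # [False, True, True, False]
```

Both rules agree on this toy input. In general they can differ: `prune_by_class` looks only at `p(class k)`, while `prune_by_count` assigns each removal to a specific other class `j`.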
+ + Made with Sphinx and @pradyunsg's + + Furo + +
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/internal/label_quality_utils.html b/v2.6.6/_modules/cleanlab/internal/label_quality_utils.html new file mode 100644 index 000000000..9b6182f13 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/internal/label_quality_utils.html @@ -0,0 +1,812 @@ + cleanlab.internal.label_quality_utils - cleanlab
cleanlab
Source code for cleanlab.internal.label_quality_utils

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""Helper methods used internally for computing label quality scores."""
+import warnings
+import numpy as np
+from typing import Optional
+from scipy.special import xlogy
+
+from cleanlab.count import get_confident_thresholds
+
+
+def _subtract_confident_thresholds(
+    labels: Optional[np.ndarray],
+    pred_probs: np.ndarray,
+    multi_label: bool = False,
+    confident_thresholds: Optional[np.ndarray] = None,
+) -> np.ndarray:
+    """
+    Return adjusted predicted probabilities by subtracting the class confident thresholds and renormalizing.
+
+    The confident class threshold for a class j is the expected (average) "self-confidence" for class j.
+    The purpose of this adjustment is to handle class imbalance.
+
+    Parameters
+    ----------
+    labels : np.ndarray
+      Labels in the same format expected by the `cleanlab.count.get_confident_thresholds()` method.
+      If labels is None, confident_thresholds needs to be passed in as it will not be calculated.
+    pred_probs : np.ndarray (shape (N, K))
+      Predicted-probabilities in the same format expected by the `cleanlab.count.get_confident_thresholds()` method.
+    confident_thresholds : np.ndarray (shape (K,))
+      Pre-calculated confident thresholds. If passed in, function will subtract these thresholds instead of calculating
+      confident_thresholds from the given labels and pred_probs.
+    multi_label : bool, optional
+      If ``True``, labels should be an iterable (e.g. list) of iterables, containing a
+      list of labels for each example, instead of just a single label.
+      The multi-label setting supports classification tasks where an example has 1 or more labels.
+      Example of a multi-labeled `labels` input: ``[[0,1], [1], [0,2], [0,1,2], [0], [1], ...]``.
+      The major difference in how this is calibrated versus single-label is that
+      the total number of errors considered is based on the number of labels,
+      not the number of examples. So, the calibrated `confident_joint` will sum
+      to the number of total labels.
+
+    Returns
+    -------
+    pred_probs_adj : np.ndarray (float)
+      Adjusted pred_probs.
+    """
+    # Get expected (average) self-confidence for each class
+    # TODO: Test this for multi-label
+    if confident_thresholds is None:
+        if labels is None:
+            raise ValueError(
+                "Cannot calculate confident_thresholds without labels. Pass in either labels or already calculated "
+                "confident_thresholds parameter. "
+            )
+        confident_thresholds = get_confident_thresholds(labels, pred_probs, multi_label=multi_label)
+
+    # Subtract the class confident thresholds
+    pred_probs_adj = pred_probs - confident_thresholds
+
+    # Re-normalize by shifting data to take care of negative values from the subtraction
+    pred_probs_adj += confident_thresholds.max()
+    pred_probs_adj /= pred_probs_adj.sum(axis=1, keepdims=True)
+
+    return pred_probs_adj
+
+
+
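`_subtract_confident_thresholds` does three things. It subtracts each class's confident threshold, shifts everything by the largest threshold so no entry is negative, and renormalizes each row. The sketch below walks through those steps. It computes the thresholds itself as the mean self-confidence per class, following the docstring's description rather than calling `get_confident_thresholds`, and it assumes every class appears in `labels`. The function name is hypothetical.

```python
import numpy as np


def subtract_confident_thresholds_sketch(labels, pred_probs):
    labels = np.asarray(labels)
    num_classes = pred_probs.shape[1]
    # Assumed threshold of class j: mean p(class j) over examples labeled j
    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(num_classes)]
    )
    adjusted = pred_probs - thresholds
    # Shift by the largest threshold so every entry is >= 0, then renormalize rows
    adjusted += thresholds.max()
    adjusted /= adjusted.sum(axis=1, keepdims=True)
    return thresholds, adjusted


labels = np.array([0, 0, 1, 1])
pred_probs = np.array([[0.9, 0.1], [0.7, 0.3], [0.4, 0.6], [0.2, 0.8]])
thresholds, adjusted = subtract_confident_thresholds_sketch(labels, pred_probs)
print(thresholds)  # approximately [0.8, 0.7]
print(adjusted.round(3))
```

Class 0 has the higher threshold, so its probabilities are pulled down relative to class 1. For example, the first row moves from `[0.9, 0.1]` to roughly `[0.818, 0.182]`, which is how the adjustment counteracts a model that is systematically more confident on one class.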
def get_normalized_entropy( + pred_probs: np.ndarray, min_allowed_prob: Optional[float] = None +) -> np.ndarray: + """Return the normalized entropy of pred_probs. + + Normalized entropy is between 0 and 1. Higher values of entropy indicate higher uncertainty in the model's prediction of the correct label. + + Read more about normalized entropy `on Wikipedia <https://en.wikipedia.org/wiki/Entropy_(information_theory)>`_. + + Normalized entropy is used in active learning for uncertainty sampling: https://towardsdatascience.com/uncertainty-sampling-cheatsheet-ec57bc067c0b + + Unlike label-quality scores, entropy only depends on the model's predictions, not the given label. + + Parameters + ---------- + pred_probs : np.ndarray (shape (N, K)) + Each row of this matrix corresponds to an example x and contains the model-predicted + probabilities that x belongs to each possible class: P(label=k|x) + + min_allowed_prob : float, default: None, deprecated + Minimum allowed probability value. If set (the default is `None`), + entries of `pred_probs` below this value will be clipped to this value. + + .. deprecated:: 2.5.0 + This keyword is deprecated and should be left to the default. + The entropy is well-behaved even if `pred_probs` contains zeros, + so clipping is unnecessary and (slightly) changes the results. + + Returns + ------- + entropy : np.ndarray (shape (N, )) + Each element is the normalized entropy of the corresponding row of ``pred_probs``. + + Raises + ------ + ValueError + An error is raised if any of the probabilities is not in the interval [0, 1].
+ """ + if np.any(pred_probs < 0) or np.any(pred_probs > 1): + raise ValueError("All probabilities are required to be in the interval [0, 1].") + num_classes = pred_probs.shape[1] + + if min_allowed_prob is not None: + warnings.warn( + "Using `min_allowed_prob` is not necessary anymore and will be removed.", + DeprecationWarning, + ) + pred_probs = np.clip(pred_probs, a_min=min_allowed_prob, a_max=None) + + # Note that dividing by log(num_classes) changes the base of the log which rescales entropy to 0-1 range + return -np.sum(xlogy(pred_probs, pred_probs), axis=1) / np.log(num_classes)
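`get_normalized_entropy` divides the Shannon entropy of each row by `log(K)`, which rescales it to the range [0, 1]. This NumPy-only sketch reproduces the computation. Instead of `scipy.special.xlogy` (used in the listing), it applies the convention `0 * log(0) = 0` with `np.where`. The function name is hypothetical.

```python
import numpy as np


def normalized_entropy_sketch(pred_probs):
    num_classes = pred_probs.shape[1]
    # p * log(p) with the convention 0 * log(0) = 0 (what xlogy does)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(pred_probs > 0, pred_probs * np.log(pred_probs), 0.0)
    # Dividing by log(K) changes the log base to K, so the maximum value is 1
    return -plogp.sum(axis=1) / np.log(num_classes)


pred_probs = np.array([[0.5, 0.5], [1.0, 0.0], [0.9, 0.1]])
entropy = normalized_entropy_sketch(pred_probs)
# uniform row -> 1, one-hot row -> 0, [0.9, 0.1] -> about 0.469
print(entropy)
```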
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/internal/latent_algebra.html b/v2.6.6/_modules/cleanlab/internal/latent_algebra.html new file mode 100644 index 000000000..562b8537e --- /dev/null +++ b/v2.6.6/_modules/cleanlab/internal/latent_algebra.html @@ -0,0 +1,1008 @@ + cleanlab.internal.latent_algebra - cleanlab
Source code for cleanlab.internal.latent_algebra

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+
+"""
+Contains mathematical functions relating the latent terms,
+``P(given_label)``, ``P(given_label | true_label)``, ``P(true_label | given_label)``, ``P(true_label)``, etc. together.
+For every function here, if the inputs are exact, the output is guaranteed to be exact.
+Every function herein is the computational equivalent of a mathematical equation having a closed, exact form.
+If the inputs are inexact, the error will of course propagate.
+Throughout `K` denotes the number of classes in the classification task.
+"""
+
+import warnings
+import numpy as np
+from typing import Tuple
+
+from cleanlab.internal.util import value_counts, clip_values, clip_noise_rates
+from cleanlab.internal.constants import TINY_VALUE, CLIPPING_LOWER_BOUND
+
+
+
[docs]def compute_ps_py_inv_noise_matrix( + labels, noise_matrix +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + """Compute ``ps := P(labels=k), py := P(true_labels=k)``, and the inverse noise matrix. + + Parameters + ---------- + labels : np.ndarray + A discrete vector of noisy labels, i.e. some labels may be erroneous. + *Format requirements*: for dataset with `K` classes, labels must be in ``{0,1,...,K-1}``. + + noise_matrix : np.ndarray + A conditional probability matrix (of shape ``(K, K)``) of the form ``P(label=k_s|true_label=k_y)`` containing + the fraction of examples in every class, labeled as every other class. + Assumes columns of noise_matrix sum to 1.""" + + ps = value_counts(labels) / float(len(labels)) # p(labels=k) + py, inverse_noise_matrix = compute_py_inv_noise_matrix(ps, noise_matrix) + return ps, py, inverse_noise_matrix
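A minimal NumPy sketch of what `compute_ps_py_inv_noise_matrix` computes first: `ps` is the normalized count of each given label, and `py` solves ``noise_matrix @ py = ps``. Here `np.bincount` stands in for cleanlab's internal `value_counts` (an assumption of equivalent behavior for integer labels), and the labels and noise matrix are made-up values.

```python
import numpy as np

def estimate_ps(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Return P(label=k): the fraction of examples carrying each given (noisy) label."""
    # np.bincount is used here as a stand-in for cleanlab.internal.util.value_counts
    return np.bincount(labels, minlength=num_classes) / float(len(labels))

labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
ps = estimate_ps(labels, num_classes=3)

# Made-up noise matrix P(label=k_s | true_label=k_y); every column sums to 1
noise_matrix = np.array([
    [0.8, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.9],
])

# Solve noise_matrix @ py = ps for the latent prior py = P(true_label=k)
py = np.linalg.solve(noise_matrix, ps)
print("ps =", ps)
print("py =", py.round(4))
```

`np.linalg.solve` gives the same result as `np.linalg.inv(noise_matrix).dot(ps)` without forming the inverse explicitly; both raise `LinAlgError` for a singular matrix. For these values `py` is approximately `[0.3571, 0.1429, 0.5]`.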
+ + +
[docs]def compute_py_inv_noise_matrix(ps, noise_matrix) -> Tuple[np.ndarray, np.ndarray]: + """Compute ``py := P(true_label=k)`` and the inverse noise matrix. + + Parameters + ---------- + ps : np.ndarray + Array of shape ``(K, )`` or ``(1, K)``. + The fraction (prior probability) of each observed, NOISY class ``P(labels = k)``. + + noise_matrix : np.ndarray + A conditional probability matrix (of shape ``(K, K)``) of the form ``P(label=k_s|true_label=k_y)`` containing + the fraction of examples in every class, labeled as every other class. + Assumes columns of noise_matrix sum to 1.""" + + # 'py' is p(true_labels=k) = noise_matrix^(-1) * p(labels=k) + # because in *vector computation*: P(label=k|true_label=k) * p(true_label=k) = P(label=k) + # Note: np.linalg.inv raises LinAlgError if noise_matrix is singular (not invertible). + py = np.linalg.inv(noise_matrix).dot(ps) + + # No class should have probability 0, so values are clipped below at CLIPPING_LOWER_BOUND + # and renormalized into valid probabilities that sum to 1.0 + py = clip_values(py, low=CLIPPING_LOWER_BOUND, high=1.0, new_sum=1.0) + + # The inverse noise matrix is computed by compute_inv_noise_matrix (below) + return py, compute_inv_noise_matrix(py=py, noise_matrix=noise_matrix, ps=ps)
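The clipping step in `compute_py_inv_noise_matrix` matters because an inaccurately estimated noise matrix can push ``inv(noise_matrix) @ ps`` outside the probability simplex. The sketch below uses `clip_and_renormalize`, a hypothetical simplified stand-in for `clip_values` whose exact behavior may differ, and assumes a lower bound of `1e-6` in place of `CLIPPING_LOWER_BOUND`.

```python
import numpy as np

LOWER_BOUND = 1e-6  # assumed value, standing in for CLIPPING_LOWER_BOUND

def clip_and_renormalize(v: np.ndarray, low: float = LOWER_BOUND, high: float = 1.0,
                         new_sum: float = 1.0) -> np.ndarray:
    """Hypothetical simplified clip_values: clip into [low, high], then rescale to new_sum."""
    clipped = np.clip(v, low, high)
    return clipped * new_sum / clipped.sum()

# A poorly estimated noise matrix that is inconsistent with the observed ps
noise_matrix = np.array([[0.6, 0.4],
                         [0.4, 0.6]])
ps = np.array([0.9, 0.1])

raw_py = np.linalg.inv(noise_matrix).dot(ps)  # contains a negative "probability"
py = clip_and_renormalize(raw_py)
print("raw py:", raw_py)
print("clipped py:", py)
```

Here `raw_py` is about `[2.5, -1.5]`: it sums to 1 but is not a distribution. After clipping, nearly all mass sits on class 0 while class 1 keeps a tiny nonzero prior.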
+ + +
[docs]def compute_inv_noise_matrix(py, noise_matrix, *, ps=None) -> np.ndarray: + """Compute the inverse noise matrix ``P(true_label=k_y|label=k_s)`` given ``py := P(true_label=k)``. + + Parameters + ---------- + py : np.ndarray (shape (K, 1)) + The fraction (prior probability) of each TRUE class label, P(true_label = k) + + noise_matrix : np.ndarray + A conditional probability matrix (of shape ``(K, K)``) of the form ``P(label=k_s|true_label=k_y)`` containing + the fraction of examples in every class, labeled as every other class. + Assumes columns of noise_matrix sum to 1. + + ps : np.ndarray + Array of shape ``(K, 1)`` containing the fraction (prior probability) of each NOISY given label, ``P(labels = k)``. + `ps` is easily computable from `py` and should only be provided if it has already been precomputed, to increase code efficiency. + + Examples + -------- + For-loop-based implementation: + + .. code:: python + + # Number of classes + K = len(py) + + # 'ps' is p(labels=k) = noise_matrix * p(true_labels=k) + # because in *vector computation*: P(label=k|true_label=k) * p(true_label=k) = P(label=k) + if ps is None: + ps = noise_matrix.dot(py) + + # Estimate the (K, K) inverse noise matrix P(true_label = k_y | label = k_s) + inverse_noise_matrix = np.empty(shape=(K,K)) + # k_s is the class value k of noisy label `label == k` + for k_s in range(K): + # k_y is the (guessed) class value k of true label y + for k_y in range(K): + # Bayes' rule: P(true_label|label) = P(label|true_label) * P(true_label) / P(label) + inverse_noise_matrix[k_y][k_s] = noise_matrix[k_s][k_y] * \ + py[k_y] / ps[k_s] + """ + + joint = noise_matrix * py + ps = joint.sum(axis=1) if ps is None else ps + inverse_noise_matrix = joint.T / np.clip(ps, a_min=TINY_VALUE, a_max=None) + + # Clip inverse noise rates P(true_label=k_y|label=k_s) into proper range [0,1) + return clip_noise_rates(inverse_noise_matrix)
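The vectorized lines at the end of `compute_inv_noise_matrix` apply Bayes' rule to the whole matrix at once. This sketch omits the final `clip_noise_rates` step and the `TINY_VALUE` guard, and checks on arbitrary made-up values that the vectorized form agrees with the loop from the docstring.

```python
import numpy as np

def inverse_noise_matrix_vectorized(py: np.ndarray, noise_matrix: np.ndarray) -> np.ndarray:
    # joint[k_s, k_y] = P(label=k_s | true_label=k_y) * P(true_label=k_y) = P(label=k_s, true_label=k_y)
    joint = noise_matrix * py   # py broadcasts along the columns (k_y)
    ps = joint.sum(axis=1)      # marginalize out the true label: P(label=k_s)
    return joint.T / ps         # entry [k_y, k_s] = P(true_label=k_y | label=k_s)

def inverse_noise_matrix_loop(py: np.ndarray, noise_matrix: np.ndarray) -> np.ndarray:
    # Same computation as the for-loop example in the docstring
    K = len(py)
    ps = noise_matrix.dot(py)
    inv = np.empty((K, K))
    for k_s in range(K):
        for k_y in range(K):
            inv[k_y][k_s] = noise_matrix[k_s][k_y] * py[k_y] / ps[k_s]
    return inv

py = np.array([0.5, 0.3, 0.2])
noise_matrix = np.array([[0.8, 0.1, 0.2],
                         [0.1, 0.7, 0.1],
                         [0.1, 0.2, 0.7]])  # columns sum to 1

vec = inverse_noise_matrix_vectorized(py, noise_matrix)
loop = inverse_noise_matrix_loop(py, noise_matrix)
print(np.allclose(vec, loop), vec.sum(axis=0))
```

Each column of the result (one per given label `k_s`) sums to 1, as a conditional distribution over true labels should.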
+ + +
[docs]def compute_noise_matrix_from_inverse(ps, inverse_noise_matrix, *, py=None) -> np.ndarray: + """Compute the noise matrix ``P(label=k_s|true_label=k_y)``. + + Parameters + ---------- + ps : np.ndarray + Array of shape ``(K, 1)`` containing the fraction (prior probability) of each observed NOISY label, ``P(labels = k)``. + + inverse_noise_matrix : np.ndarray + A conditional probability matrix (of shape ``(K, K)``) of the form ``P(true_label=k_y|label=k_s)`` representing + the estimated fraction of observed examples in each class `k_s` that are + mislabeled examples from every other class `k_y`. + Assumes columns of inverse_noise_matrix sum to 1. + + py : np.ndarray + Array of shape ``(K, 1)`` containing the fraction (prior probability) of each TRUE class label, ``P(true_label = k)``. + `py` is easily computable from `ps` and should only be provided if it has already been precomputed, to increase code efficiency. + + Returns + ------- + noise_matrix : np.ndarray + Array of shape ``(K, K)``, where `K` = number of classes, whose columns sum to 1. + A conditional probability matrix of the form ``P(label=k_s|true_label=k_y)`` containing + the fraction of examples in every class, labeled as every other class. + + Examples + -------- + For-loop-based implementation: + + .. code:: python
+ + # Number of classes + K = len(ps) + + # 'py' is p(true_label=k) = inverse_noise_matrix * p(label=k) + # because in *vector computation*: P(true_label=k|label=k) * p(label=k) = P(true_label=k) + if py is None: + py = inverse_noise_matrix.dot(ps) + + # Estimate the (K, K) noise matrix P(labels = k_s | true_labels = k_y) + noise_matrix = np.empty(shape=(K,K)) + # k_s is the class value k of noisy label `labels == k` + for k_s in range(K): + # k_y is the (guessed) class value k of true label y + for k_y in range(K): + # Bayes' rule: P(label|true_label) = P(true_label|label) * P(label) / P(true_label) + noise_matrix[k_s][k_y] = inverse_noise_matrix[k_y][k_s] * \ + ps[k_s] / py[k_y] + + """ + + joint = (inverse_noise_matrix * ps).T + py = joint.sum(axis=0) if py is None else py + noise_matrix = joint / np.clip(py, a_min=TINY_VALUE, a_max=None) + + # Clip noise rates P(label=k_s|true_label=k_y) into proper range [0,1) + return clip_noise_rates(noise_matrix)
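`compute_noise_matrix_from_inverse` is Bayes' rule run in the other direction, so with exact inputs it should undo `compute_inv_noise_matrix`. A NumPy round-trip sketch on made-up values (again without the clipping step) illustrates this.

```python
import numpy as np

def noise_matrix_from_inverse(ps: np.ndarray, inverse_noise_matrix: np.ndarray) -> np.ndarray:
    # joint[k_s, k_y] = P(true_label=k_y | label=k_s) * P(label=k_s)
    joint = (inverse_noise_matrix * ps).T  # ps broadcasts along the k_s columns before transposing
    py = joint.sum(axis=0)                 # marginalize out the given label: P(true_label=k_y)
    return joint / py                      # entry [k_s, k_y] = P(label=k_s | true_label=k_y)

# Build an exact inverse noise matrix from a made-up noise matrix and prior via Bayes' rule
py_true = np.array([0.5, 0.3, 0.2])
noise_matrix = np.array([[0.8, 0.1, 0.2],
                         [0.1, 0.7, 0.1],
                         [0.1, 0.2, 0.7]])
joint = noise_matrix * py_true
ps = joint.sum(axis=1)
inverse_noise_matrix = joint.T / ps

recovered = noise_matrix_from_inverse(ps, inverse_noise_matrix)
print(np.allclose(recovered, noise_matrix))
```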
+ + +
[docs]def compute_py( + ps, noise_matrix, inverse_noise_matrix, *, py_method="cnt", true_labels_class_counts=None +) -> np.ndarray: + """Compute ``py := P(true_labels=k)`` from ``ps := P(labels=k)``, `noise_matrix`, and + `inverse_noise_matrix`. + + This method is **robust** when ``py_method = 'cnt'``: + it may work well even when the noise matrices are estimated + poorly, because it uses only the diagonals of the matrices + instead of all the probabilities in the entire matrix. + + Parameters + ---------- + ps : np.ndarray + Array of shape ``(K, )`` or ``(1, K)`` containing the fraction (prior probability) of each observed, noisy label, ``P(labels = k)``. + + noise_matrix : np.ndarray + A conditional probability matrix (of shape ``(K, K)``) of the form ``P(label=k_s|true_label=k_y)`` containing + the fraction of examples in every class, labeled as every other class. + Assumes columns of noise_matrix sum to 1. + + inverse_noise_matrix : np.ndarray + A conditional probability matrix (of shape ``(K, K)``) of the form ``P(true_label=k_y|label=k_s)`` representing + the estimated fraction of observed examples in each class `k_s` that are + mislabeled examples from every other class `k_y`. + Assumes columns of `inverse_noise_matrix` sum to 1. + + py_method : str (Options: ["cnt", "eqn", "marginal", "marginal_ps"]) + How to compute the latent prior ``p(true_label=k)``. Default is "cnt" as it often + works well even when the noise matrices are estimated poorly, because it uses + the matrix diagonals instead of all the probabilities. + + true_labels_class_counts : np.ndarray + Array of shape ``(K, )`` or ``(1, K)`` containing the marginal counts of the confident joint + (e.g. ``cj.sum(axis=0)``). Required when ``py_method="marginal"``. + + Returns + ------- + py : np.ndarray + Array of shape ``(K, )`` or ``(1, K)``.
+ The fraction (prior probability) of each TRUE class label, ``P(true_label = k)``.""" + + if len(np.shape(ps)) > 2 or (len(np.shape(ps)) == 2 and np.shape(ps)[0] != 1): + w = "Input parameter np.ndarray ps has shape " + str(np.shape(ps)) + w += ", but shape should be (K, ) or (1, K)" + warnings.warn(w) + + if py_method == "marginal" and true_labels_class_counts is None: + msg = ( + 'py_method == "marginal" requires true_labels_class_counts, ' + "but true_labels_class_counts is None. " + ) + msg += " Provide parameter true_labels_class_counts." + raise ValueError(msg) + + if py_method == "cnt": + # Computing py this way avoids dividing by zero noise rates. + # More robust bc error est_p(true_label|labels) / est_p(labels|y) ~ p(true_label|labels) / p(labels|y) + py = ( + inverse_noise_matrix.diagonal() + / np.clip(noise_matrix.diagonal(), a_min=TINY_VALUE, a_max=None) + * ps + ) + # Equivalently: py = (true_labels_class_counts / labels_class_counts) * ps + elif py_method == "eqn": + py = np.linalg.inv(noise_matrix).dot(ps) + elif py_method == "marginal": + py = true_labels_class_counts / np.clip( + float(sum(true_labels_class_counts)), a_min=TINY_VALUE, a_max=None + ) + elif py_method == "marginal_ps": + py = np.dot(inverse_noise_matrix, ps) + else: + err = "py_method {}".format(py_method) + err += " should be in [cnt, eqn, marginal, marginal_ps]" + raise ValueError(err) + + # Clip py (0,1), s.t. no class should have prob 0, hence 1e-6 + py = clip_values(py, low=CLIPPING_LOWER_BOUND, high=1.0, new_sum=1.0) + return py
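Why ``py_method="cnt"`` only needs the diagonals: Bayes' rule gives ``inverse_noise_matrix[k][k] = noise_matrix[k][k] * py[k] / ps[k]``, so ``inverse_noise_matrix[k][k] / noise_matrix[k][k] * ps[k]`` recovers ``py[k]`` exactly when the matrices are exact. The NumPy sketch below checks this, together with the "eqn" and "marginal_ps" formulas from `compute_py`, on made-up values and without the final clipping step.

```python
import numpy as np

py_true = np.array([0.5, 0.3, 0.2])
noise_matrix = np.array([[0.8, 0.1, 0.2],
                         [0.1, 0.7, 0.1],
                         [0.1, 0.2, 0.7]])  # P(label=k_s | true_label=k_y)

# Exact ps and inverse noise matrix implied by py_true and noise_matrix
joint = noise_matrix * py_true
ps = joint.sum(axis=1)
inverse_noise_matrix = joint.T / ps

# "cnt": uses only the diagonals of the two matrices
py_cnt = inverse_noise_matrix.diagonal() / noise_matrix.diagonal() * ps
# "eqn": inverts the full noise matrix
py_eqn = np.linalg.inv(noise_matrix).dot(ps)
# "marginal_ps": marginalizes the inverse noise matrix over the given labels
py_marginal_ps = np.dot(inverse_noise_matrix, ps)

print(py_cnt, py_eqn, py_marginal_ps)
```

With estimated (inexact) matrices the formulas generally disagree; the source's rationale for "cnt" is that the ratio of estimated diagonals tends to stay close to the true ratio.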
+ + +
[docs]def compute_pyx(pred_probs, noise_matrix, inverse_noise_matrix): + """Compute ``pyx := P(true_label=k|x)`` from ``pred_probs := P(label=k|x)``, `noise_matrix` and + `inverse_noise_matrix`. + + This method is robust: it can work well even when the + noise matrices are estimated poorly, because it only uses their diagonals, + which tend to be easier to estimate correctly. + + Parameters + ---------- + pred_probs : np.ndarray + ``P(label=k|x)`` is a ``(N, K)`` matrix with K model-predicted probabilities. + Each row of this matrix corresponds to an example `x` and contains the model-predicted + probabilities that `x` belongs to each possible class. + The columns must be ordered such that these probabilities correspond to class 0,1,2,... + `pred_probs` should have been computed using 3 (or higher) fold cross-validation. + + noise_matrix : np.ndarray + A conditional probability matrix (of shape ``(K, K)``) of the form ``P(label=k_s|true_label=k_y)`` containing + the fraction of examples in every class, labeled as every other class. + Assumes columns of `noise_matrix` sum to 1. + + inverse_noise_matrix : np.ndarray + A conditional probability matrix (of shape ``(K, K)``) of the form ``P(true_label=k_y|label=k_s)`` representing + the estimated fraction of observed examples in each class `k_s` that are + mislabeled examples from every other class `k_y`. + Assumes columns of `inverse_noise_matrix` sum to 1. + + Returns + ------- + pyx : np.ndarray + ``P(true_label=k|x)`` is a ``(N, K)`` matrix of adjusted probabilities. + Each row of this matrix corresponds to an example `x` and contains the estimated + probabilities that `x` belongs to each possible class. + The columns are ordered such that these probabilities correspond to class 0,1,2,...
+ Each row is clipped into [0, 1] and renormalized to sum to 1.""" + + if len(np.shape(pred_probs)) != 2: + raise ValueError( + "Input parameter np.ndarray 'pred_probs' has shape " + + str(np.shape(pred_probs)) + + ", but shape should be (N, K)" + ) + + pyx = ( + pred_probs + * inverse_noise_matrix.diagonal() + / np.clip(noise_matrix.diagonal(), a_min=TINY_VALUE, a_max=None) + ) + # Clip each row into [0, 1] and renormalize it to sum to 1.0 + return np.apply_along_axis( + func1d=clip_values, axis=1, arr=pyx, **{"low": 0.0, "high": 1.0, "new_sum": 1.0} + )
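`compute_pyx` multiplies each class column of `pred_probs` by one factor, ``diag(inverse_noise_matrix) / diag(noise_matrix)``, and then renormalizes every row. The sketch below assumes a simple per-row clip-then-normalize in place of `np.apply_along_axis(clip_values, ...)` and a tiny constant in place of `TINY_VALUE`; the matrices are made up for illustration.

```python
import numpy as np

TINY = 1e-100  # assumed stand-in for TINY_VALUE

def adjust_pred_probs(pred_probs, noise_matrix, inverse_noise_matrix):
    """Simplified sketch of compute_pyx: reweight each class column, then renormalize rows."""
    # One factor per class: P(true_label=k | label=k) / P(label=k | true_label=k)
    weights = inverse_noise_matrix.diagonal() / np.clip(noise_matrix.diagonal(), TINY, None)
    pyx = pred_probs * weights  # (N, K) * (K,) broadcasts over rows
    pyx = np.clip(pyx, 0.0, 1.0)
    return pyx / pyx.sum(axis=1, keepdims=True)

pred_probs = np.array([[0.7, 0.3],
                       [0.4, 0.6]])

# With no label noise both matrices are the identity, so nothing changes
identity = np.eye(2)
unchanged = adjust_pred_probs(pred_probs, identity, identity)

# Made-up noise and inverse noise matrices
noise_matrix = np.array([[0.9, 0.3],
                         [0.1, 0.7]])
inverse_noise_matrix = np.array([[0.8, 0.2],
                                 [0.2, 0.8]])
adjusted = adjust_pred_probs(pred_probs, noise_matrix, inverse_noise_matrix)
print(adjusted)
```

Here the class-1 factor (0.8 / 0.7) exceeds the class-0 factor (0.8 / 0.9), so each row shifts some probability toward class 1.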
+
+
+
+
+ + +
+
+ + Made with Sphinx and @pradyunsg's + + Furo + +
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/internal/multiannotator_utils.html b/v2.6.6/_modules/cleanlab/internal/multiannotator_utils.html
new file mode 100644
index 000000000..37af0a155
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/internal/multiannotator_utils.html
@@ -0,0 +1,1047 @@
+ cleanlab.internal.multiannotator_utils - cleanlab
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.internal.multiannotator_utils

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Helper methods used internally in cleanlab.multiannotator
+"""
+
+import warnings
+from typing import Optional, Tuple
+
+import numpy as np
+import pandas as pd
+
+from cleanlab.internal.numerics import softmax
+from cleanlab.internal.util import get_num_classes, value_counts
+from cleanlab.internal.validation import assert_valid_class_labels
+from cleanlab.typing import LabelLike
+
+SMALL_CONST = 1e-30
+
+
+
[docs]def assert_valid_inputs_multiannotator( + labels_multiannotator: np.ndarray, + pred_probs: Optional[np.ndarray] = None, + ensemble: bool = False, + allow_single_label: bool = False, + annotator_ids: Optional[pd.Index] = None, +) -> None: + """Validate the format of multi-annotator labels and, if provided, pred_probs""" + # Check that labels_multiannotator is a 2D array + if labels_multiannotator.ndim != 2: + raise ValueError( + "labels_multiannotator must be a 2D array or dataframe, " + "each row represents an example and each column represents an annotator." + ) + + # Raise error if labels are not formatted properly + if any([isinstance(label, str) for label in labels_multiannotator.ravel()]): + raise ValueError( + "Labels cannot be strings, they must be zero-indexed integers corresponding to class indices." + ) + + # Raise error if labels_multiannotator has NaN rows + nan_row_mask = np.isnan(labels_multiannotator).all(axis=1) + if nan_row_mask.any(): + nan_rows = list(np.where(nan_row_mask)[0]) + raise ValueError( + "labels_multiannotator cannot have rows with all NaN, each example must have at least one label.\n" + f"Examples {nan_rows} do not have any labels." + ) + + # Raise error if labels_multiannotator has NaN columns + nan_col_mask = np.isnan(labels_multiannotator).all(axis=0) + if nan_col_mask.any(): + if annotator_ids is not None: + nan_columns = list(annotator_ids[np.where(nan_col_mask)[0]]) + else: + nan_columns = list(np.where(nan_col_mask)[0]) + raise ValueError( + "labels_multiannotator cannot have columns with all NaN, each annotator must annotate at least one example.\n" + f"Annotators {nan_columns} did not label any examples."
+ ) + + if not allow_single_label: + # Raise error if labels_multiannotator has <= 1 column + if labels_multiannotator.shape[1] <= 1: + raise ValueError( + "labels_multiannotator must have more than one column.\n" + "If there is only one annotator, use cleanlab.rank.get_label_quality_scores instead" + ) + + # Raise error if labels_multiannotator only has 1 label per example + if (np.sum(~np.isnan(labels_multiannotator), axis=1) == 1).all(): + raise ValueError( + "Each example only has one label, collapse the labels into a 1-D array and use " + "cleanlab.rank.get_label_quality_scores instead" + ) + + # Raise warning if there is no example on which 2 or more annotators agree + # TODO: might shift this later in the code to avoid extra compute + has_agreement = np.zeros(labels_multiannotator.shape[0], dtype=bool) + for i in np.unique(labels_multiannotator): + has_agreement |= (labels_multiannotator == i).sum(axis=1) > 1 + if not has_agreement.any(): + warnings.warn("Annotators do not agree on any example. Check input data.") + + # Check labels + all_labels_flatten = labels_multiannotator.ravel() + all_labels_flatten = all_labels_flatten[~np.isnan(all_labels_flatten)] + assert_valid_class_labels(all_labels_flatten, allow_one_class=True) + + # Raise error if number of classes in labels_multiannotator does not match number of classes in pred_probs + if pred_probs is not None: + if not isinstance(pred_probs, np.ndarray): + raise TypeError("pred_probs must be a numpy array.") + + if ensemble: + if pred_probs.ndim != 3: + error_message = "pred_probs must be a 3d array." + if pred_probs.ndim == 2: + error_message += " If you have a 2d pred_probs array, use the non-ensemble version of this function."
+ raise ValueError(error_message) + + if pred_probs.shape[1] != len(labels_multiannotator): + raise ValueError("Each pred_probs array in the ensemble must have the same length as labels_multiannotator.") + + num_classes = pred_probs.shape[2] + else: + if pred_probs.ndim != 2: + error_message = "pred_probs must be a 2d array." + if pred_probs.ndim == 3: + error_message += " If you have a 3d pred_probs array, use the ensemble version of this function." + raise ValueError(error_message) + + if len(pred_probs) != len(labels_multiannotator): + raise ValueError("pred_probs and labels_multiannotator must have the same length.") + + num_classes = pred_probs.shape[1] + + highest_class = np.nanmax(labels_multiannotator) + 1 + + # this allows for missing labels, but not missing columns in pred_probs + if num_classes < highest_class: + raise ValueError( + f"pred_probs must have at least {int(highest_class)} columns based on the largest class label " + "which appears in labels_multiannotator. Perhaps some rarely-annotated classes were lost while " + "establishing consensus labels used to train your classifier." + )
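The NaN-row, NaN-column, and agreement checks above can be tried in isolation. This minimal NumPy sketch runs them on a made-up annotation matrix, where NaN marks a missing annotation as in the function above, and reports the offending rows and columns instead of raising.

```python
import numpy as np

# Made-up annotations: rows are examples, columns are annotators, NaN = not annotated
labels_multiannotator = np.array([
    [0, 0, np.nan],
    [1, np.nan, 1],
    [np.nan, np.nan, np.nan],  # nobody labeled example 2
    [2, 1, np.nan],
])

# Rows / columns that are entirely NaN (the function above raises a ValueError for these)
nan_rows = np.where(np.isnan(labels_multiannotator).all(axis=1))[0]
nan_cols = np.where(np.isnan(labels_multiannotator).all(axis=0))[0]

# Agreement check: does any class appear more than once in a row?
# NaN never compares equal to anything, so it never counts as agreement.
has_agreement = np.zeros(labels_multiannotator.shape[0], dtype=bool)
for k in np.unique(labels_multiannotator):
    has_agreement |= (labels_multiannotator == k).sum(axis=1) > 1

print(nan_rows, nan_cols, has_agreement)
```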
+ + +
[docs]def assert_valid_pred_probs( + pred_probs: Optional[np.ndarray] = None, + pred_probs_unlabeled: Optional[np.ndarray] = None, + ensemble: bool = False, +): + """Validate format of pred_probs for multiannotator active learning functions""" + if pred_probs is None and pred_probs_unlabeled is None: + raise ValueError( + "pred_probs and pred_probs_unlabeled cannot both be None, specify at least one of the two." + ) + + if ensemble: + if pred_probs is not None: + if not isinstance(pred_probs, np.ndarray): + raise TypeError("pred_probs must be a numpy array.") + if pred_probs.ndim != 3: + error_message = "pred_probs must be a 3d array." + if pred_probs.ndim == 2: # pragma: no cover + error_message += " If you have a 2d pred_probs array (i.e. only one predictor), use the non-ensemble version of this function." + raise ValueError(error_message) + + if pred_probs_unlabeled is not None: + if not isinstance(pred_probs_unlabeled, np.ndarray): + raise TypeError("pred_probs_unlabeled must be a numpy array.") + if pred_probs_unlabeled.ndim != 3: + error_message = "pred_probs_unlabeled must be a 3d array." + if pred_probs_unlabeled.ndim == 2: # pragma: no cover + error_message += " If you have a 2d pred_probs_unlabeled array, use the non-ensemble version of this function." + raise ValueError(error_message) + + if pred_probs is not None and pred_probs_unlabeled is not None: + if pred_probs.shape[2] != pred_probs_unlabeled.shape[2]: + raise ValueError( + "pred_probs and pred_probs_unlabeled must have the same number of classes" + ) + + else: + if pred_probs is not None: + if not isinstance(pred_probs, np.ndarray): + raise TypeError("pred_probs must be a numpy array.") + if pred_probs.ndim != 2: + error_message = "pred_probs must be a 2d array." + if pred_probs.ndim == 3: # pragma: no cover + error_message += " If you have a 3d pred_probs array, use the ensemble version of this function."
+ raise ValueError(error_message) + + if pred_probs_unlabeled is not None: + if not isinstance(pred_probs_unlabeled, np.ndarray): + raise TypeError("pred_probs_unlabeled must be a numpy array.") + if pred_probs_unlabeled.ndim != 2: + error_message = "pred_probs_unlabeled must be a 2d array." + if pred_probs_unlabeled.ndim == 3: # pragma: no cover + error_message += " If you have a 3d pred_probs_unlabeled array, use the ensemble version of this function." + raise ValueError(error_message) + + if pred_probs is not None and pred_probs_unlabeled is not None: + if pred_probs.shape[1] != pred_probs_unlabeled.shape[1]: + raise ValueError( + "pred_probs and pred_probs_unlabeled must have the same number of classes" + )
+ + +
[docs]def format_multiannotator_labels(labels: LabelLike) -> Tuple[pd.DataFrame, dict]: + """Takes an array of labels and formats it such that labels are in the set ``0, 1, ..., K-1``, + where ``K`` is the number of classes. Class indices are assigned in sorted (e.g. lexicographic) order of the original labels. + + Returns + ------- + formatted_labels + A pd.DataFrame of shape ``(N, M)`` containing the reformatted labels, which can be passed to + cleanlab.multiannotator functions. + + mapping + A dictionary showing the mapping of new to old labels, such that ``mapping[k]`` returns the name of the k-th class. + """ + if isinstance(labels, pd.DataFrame): + np_labels = labels.values + elif isinstance(labels, np.ndarray): + np_labels = labels + else: + raise TypeError("labels must be 2D numpy array or pandas DataFrame") + + unique_labels = pd.unique(np_labels.ravel()) + + try: + unique_labels = unique_labels[~np.isnan(unique_labels)] + unique_labels.sort() + except TypeError: # np.isnan / np.sort cannot handle string values or pd.NA types + nan_mask = np.array([(l is np.NaN) or (l is pd.NA) or (l == "nan") for l in unique_labels]) + unique_labels = unique_labels[~nan_mask] + unique_labels.sort() + + # convert float labels (that arose because np.nan is float type) to int + if unique_labels.dtype == "float": + unique_labels = unique_labels.astype("int") + + label_map = {label: i for i, label in enumerate(unique_labels)} + inverse_map = {i: label for label, i in label_map.items()} + + if isinstance(labels, np.ndarray): + labels = pd.DataFrame(labels) + + formatted_labels = labels.replace(label_map) + + return formatted_labels, inverse_map
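The core of `format_multiannotator_labels` is a sorted mapping from class names to integer indices, plus its inverse. The sketch below illustrates that idea with a hypothetical `format_labels` helper; unlike the function above, it uses `None` as the missing-label marker and returns a float NumPy array instead of a `pd.DataFrame`, both simplifying assumptions.

```python
import numpy as np

def format_labels(labels):
    """Hypothetical helper: map class names to 0..K-1 in sorted order; None marks a missing label."""
    present = sorted({lab for row in labels for lab in row if lab is not None})
    label_map = {lab: i for i, lab in enumerate(present)}
    formatted = np.array(
        [[np.nan if lab is None else label_map[lab] for lab in row] for row in labels],
        dtype=float,  # float so that missing labels can be stored as NaN
    )
    inverse_map = {i: lab for lab, i in label_map.items()}
    return formatted, inverse_map

labels = [["cat", "dog", None],
          ["dog", "dog", "bird"],
          [None, "cat", "cat"]]
formatted, mapping = format_labels(labels)
print(formatted)
print(mapping)  # {0: 'bird', 1: 'cat', 2: 'dog'}
```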
+ + +
[docs]def check_consensus_label_classes( + labels_multiannotator: np.ndarray, + consensus_label: np.ndarray, + consensus_method: str, +) -> None: + """Check if any classes no longer appear in the set of consensus labels (established using the consensus_method stated)""" + unique_ma_labels = np.unique(labels_multiannotator) + unique_ma_labels = unique_ma_labels[~np.isnan(unique_ma_labels)] + labels_set_difference = set(unique_ma_labels) - set(consensus_label) + + if len(labels_set_difference) > 0: + print( + "CAUTION: Number of unique classes has been reduced from the original data when establishing consensus labels " + f"using consensus method '{consensus_method}', likely due to some classes being rarely annotated. " + "If training a classifier on these consensus labels, it will never see any of the omitted classes unless you " + "manually replace some of the consensus labels.\n" + f"Classes in the original data but not in consensus labels: {list(map(int, labels_set_difference))}" + )
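The situation `check_consensus_label_classes` warns about is easy to reproduce. In this sketch, `majority_vote` is a hypothetical helper (not cleanlab's consensus implementation) that picks the most frequent label per example; a class annotated only once is outvoted and disappears from the consensus labels, which the same set-difference logic then detects.

```python
import numpy as np

def majority_vote(labels_multiannotator: np.ndarray, num_classes: int) -> np.ndarray:
    """Hypothetical helper: most frequent non-NaN label per row (ties go to the lower class index)."""
    consensus = []
    for row in labels_multiannotator:
        observed = row[~np.isnan(row)].astype(int)
        consensus.append(np.bincount(observed, minlength=num_classes).argmax())
    return np.array(consensus)

labels_multiannotator = np.array([
    [0, 0, 2],  # the only annotation of class 2 is outvoted
    [1, 1, np.nan],
    [0, 1, 1],
])
consensus_label = majority_vote(labels_multiannotator, num_classes=3)

# Same set-difference logic as check_consensus_label_classes
unique_ma_labels = np.unique(labels_multiannotator)
unique_ma_labels = unique_ma_labels[~np.isnan(unique_ma_labels)]
dropped = set(unique_ma_labels) - set(consensus_label)
print(consensus_label, sorted(map(int, dropped)))
```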
+ + +
[docs]def compute_soft_cross_entropy( + labels_multiannotator: np.ndarray, + pred_probs: np.ndarray, +) -> np.ndarray: + """Compute, for each example, the soft cross entropy between the annotators' empirical label distribution and the model's pred_probs, normalized by ``log(K)``""" + num_classes = get_num_classes(pred_probs=pred_probs) + + empirical_label_distribution = np.full((len(labels_multiannotator), num_classes), np.NaN) + for i, labels in enumerate(labels_multiannotator): + labels_subset = labels[~np.isnan(labels)] + empirical_label_distribution[i, :] = value_counts( + labels_subset, num_classes=num_classes + ) / len(labels_subset) + + clipped_pred_probs = np.clip(pred_probs, a_min=SMALL_CONST, a_max=None) + soft_cross_entropy = -np.sum( + empirical_label_distribution * np.log(clipped_pred_probs), axis=1 + ) / np.log(num_classes) + + return soft_cross_entropy
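A NumPy sketch of the per-example score computed by `compute_soft_cross_entropy`, with `np.bincount` standing in for cleanlab's `value_counts` (an assumption of equivalent behavior for integer labels). Dividing by ``log(K)`` makes a uniform prediction score exactly 1 (up to floating-point rounding); a confident prediction scores lower when it matches all annotators and higher when it contradicts some of them.

```python
import numpy as np

def soft_cross_entropy(labels_multiannotator: np.ndarray, pred_probs: np.ndarray,
                       eps: float = 1e-30) -> np.ndarray:
    """Per-example cross entropy between annotator label frequencies and pred_probs, divided by log(K)."""
    num_examples, num_classes = pred_probs.shape
    empirical = np.zeros((num_examples, num_classes))
    for i, row in enumerate(labels_multiannotator):
        observed = row[~np.isnan(row)].astype(int)
        empirical[i] = np.bincount(observed, minlength=num_classes) / len(observed)
    clipped = np.clip(pred_probs, eps, None)  # avoid log(0)
    return -np.sum(empirical * np.log(clipped), axis=1) / np.log(num_classes)

labels_multiannotator = np.array([[0, 0, 1],       # annotators disagree
                                  [1, np.nan, 1]])  # annotators agree on class 1
uniform = np.full((2, 2), 0.5)
confident = np.array([[0.9, 0.1],
                      [0.1, 0.9]])

uniform_scores = soft_cross_entropy(labels_multiannotator, uniform)
confident_scores = soft_cross_entropy(labels_multiannotator, confident)
print(uniform_scores, confident_scores)
```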
+ + +
[docs]def find_best_temp_scaler( + labels_multiannotator: np.ndarray, + pred_probs: np.ndarray, + coarse_search_range: list = [0.1, 0.2, 0.5, 0.8, 1, 2, 3, 5, 8], + fine_search_size: int = 4, +) -> float: + """Find the best temperature scaling factor that minimizes the soft cross entropy between the annotators' empirical label distribution + and model pred_probs""" + + soft_cross_entropy_coarse = np.full(len(coarse_search_range), np.NaN) + log_pred_probs = np.log( + pred_probs, where=pred_probs > 0, out=np.full(pred_probs.shape, -np.inf) + ) + for i, curr_temp in enumerate(coarse_search_range): + scaled_pred_probs = softmax(log_pred_probs, temperature=curr_temp, axis=1, shift=False) + soft_cross_entropy_coarse[i] = np.mean( + compute_soft_cross_entropy(labels_multiannotator, scaled_pred_probs) + ) + + min_entropy_ind = np.argmin(soft_cross_entropy_coarse) + fine_search_range = _set_fine_search_range( + coarse_search_range, fine_search_size, min_entropy_ind + ) + soft_cross_entropy_fine = np.full(len(fine_search_range), np.NaN) + for i, curr_temp in enumerate(fine_search_range): + scaled_pred_probs = softmax(log_pred_probs, temperature=curr_temp, axis=1, shift=False) + soft_cross_entropy_fine[i] = np.mean( + compute_soft_cross_entropy(labels_multiannotator, scaled_pred_probs) + ) + best_temp = fine_search_range[np.argmin(soft_cross_entropy_fine)] + return best_temp
+ + +def _set_fine_search_range( + coarse_search_range: list, fine_search_size: int, min_entropy_ind: np.intp +) -> np.ndarray: + fine_search_range = np.array([]) + if min_entropy_ind != 0: + fine_search_range = np.append( + np.linspace( + coarse_search_range[min_entropy_ind - 1], + coarse_search_range[min_entropy_ind], + fine_search_size, + endpoint=False, + ), + fine_search_range, + ) + if min_entropy_ind != len(coarse_search_range) - 1: + fine_search_range = np.append( + fine_search_range, + np.linspace( + coarse_search_range[min_entropy_ind], + coarse_search_range[min_entropy_ind + 1], + fine_search_size + 1, + endpoint=True, + ), + ) + return fine_search_range + + +
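Temperature search in `find_best_temp_scaler` is two-stage: evaluate a coarse grid, then search a finer grid around the best coarse value. The sketch below re-implements the `_set_fine_search_range` logic with `np.concatenate` (a restructuring for illustration, not the library code) to show which temperatures the fine stage tries.

```python
import numpy as np

def fine_search_range(coarse, size, best):
    """Re-implementation of the _set_fine_search_range logic, for illustration."""
    parts = []
    if best != 0:
        # `size` points from the previous coarse value up to (excluding) the best one
        parts.append(np.linspace(coarse[best - 1], coarse[best], size, endpoint=False))
    if best != len(coarse) - 1:
        # `size + 1` points from the best coarse value up to (including) the next one
        parts.append(np.linspace(coarse[best], coarse[best + 1], size + 1, endpoint=True))
    return np.concatenate(parts)  # assumes len(coarse) >= 2

coarse = [0.1, 0.2, 0.5, 0.8, 1, 2, 3, 5, 8]  # default coarse_search_range
around_one = fine_search_range(coarse, 4, best=4)  # coarse minimum at temperature 1
left_edge = fine_search_range(coarse, 4, best=0)   # coarse minimum at the left edge
print(around_one)
print(left_edge)
```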
[docs]def temp_scale_pred_probs( + pred_probs: np.ndarray, + temp: float, +) -> np.ndarray: + """Scales pred_probs by the given temperature factor. Temperatures < 1 sharpen pred_probs, while temperatures > 1 smooth them.""" + # clip pred_probs to prevent taking log of 0, then renormalize + pred_probs = np.clip(pred_probs, a_min=SMALL_CONST, a_max=None) + pred_probs = pred_probs / np.sum(pred_probs, axis=1)[:, np.newaxis] + + # apply temperature scaling + scaled_pred_probs = softmax(np.log(pred_probs), temperature=temp, axis=1, shift=False) + scaled_pred_probs = ( + scaled_pred_probs / np.sum(scaled_pred_probs, axis=1)[:, np.newaxis] + ) # normalize + + return scaled_pred_probs
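Temperature scaling of probabilities amounts to raising each probability to the power ``1/temp`` and renormalizing, since ``softmax(log(p) / temp)`` is proportional to ``p ** (1 / temp)``. The sketch below assumes that cleanlab's internal `softmax(..., temperature=temp, shift=False)` divides its input by the temperature, and implements the scaling directly in NumPy.

```python
import numpy as np

def temperature_scale(pred_probs: np.ndarray, temp: float, eps: float = 1e-30) -> np.ndarray:
    """Sketch of temperature scaling: softmax(log(p) / temp), row by row."""
    p = np.clip(pred_probs, eps, None)           # avoid log(0)
    p = p / p.sum(axis=1, keepdims=True)
    logits = np.log(p) / temp
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability only; result unchanged
    scaled = np.exp(logits)
    return scaled / scaled.sum(axis=1, keepdims=True)

pred_probs = np.array([[0.7, 0.2, 0.1]])
same = temperature_scale(pred_probs, 1.0)    # unchanged
sharp = temperature_scale(pred_probs, 0.5)   # sharper: proportional to p**2
smooth = temperature_scale(pred_probs, 2.0)  # smoother: proportional to sqrt(p)
print(same, sharp, smooth)
```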
+
+
+
+
+ + +
+
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/internal/multilabel_scorer.html b/v2.6.6/_modules/cleanlab/internal/multilabel_scorer.html
new file mode 100644
index 000000000..cdd1d5b34
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/internal/multilabel_scorer.html
@@ -0,0 +1,1346 @@
+ cleanlab.internal.multilabel_scorer - cleanlab
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.internal.multilabel_scorer

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+"""
+Helper classes and functions used internally to compute label quality scores in multi-label classification.
+"""
+
+from enum import Enum
+from typing import Callable, Dict, Optional, Union
+
+import numpy as np
+from sklearn.model_selection import cross_val_predict
+
+from cleanlab.internal.label_quality_utils import _subtract_confident_thresholds
+from cleanlab.internal.multilabel_utils import _is_multilabel, stack_complement
+from cleanlab.internal.numerics import softmax
+from cleanlab.rank import (
+    get_confidence_weighted_entropy_for_each_label,
+    get_normalized_margin_for_each_label,
+    get_self_confidence_for_each_label,
+)
+
+
+class _Wrapper:
+    """Helper class for wrapping callable functions as attributes of an Enum instead of
+    setting them as methods of the Enum class.
+
+
+    This class is only intended to be used internally for the ClassLabelScorer or
+    other cases where functions are used for enumeration values.
+    """
+
+    def __init__(self, f: Callable) -> None:
+        self.f = f
+
+    def __call__(self, *args, **kwargs):
+        return self.f(*args, **kwargs)
+
+    def __repr__(self):
+        return self.f.__name__
+
+
+
[docs]class ClassLabelScorer(Enum): + """Enum for the different methods to compute label quality scores.""" + + SELF_CONFIDENCE = _Wrapper(get_self_confidence_for_each_label) + """Returns the self-confidence label-quality score for each datapoint. + + See also + -------- + cleanlab.rank.get_self_confidence_for_each_label + """ + NORMALIZED_MARGIN = _Wrapper(get_normalized_margin_for_each_label) + """Returns the "normalized margin" label-quality score for each datapoint. + + See also + -------- + cleanlab.rank.get_normalized_margin_for_each_label + """ + CONFIDENCE_WEIGHTED_ENTROPY = _Wrapper(get_confidence_weighted_entropy_for_each_label) + """Returns the "confidence weighted entropy" label-quality score for each datapoint. + + See also + -------- + cleanlab.rank.get_confidence_weighted_entropy_for_each_label + """ + +
[docs] def __call__(self, labels: np.ndarray, pred_probs: np.ndarray, **kwargs) -> np.ndarray: + """Returns the label-quality scores for each datapoint based on the given labels and predicted probabilities. + + See the documentation for each method for more details. + + Example + ------- + >>> import numpy as np + >>> from cleanlab.internal.multilabel_scorer import ClassLabelScorer + >>> labels = np.array([0, 0, 0, 1, 1, 1]) + >>> pred_probs = np.array([ + ... [0.9, 0.1], + ... [0.8, 0.2], + ... [0.7, 0.3], + ... [0.2, 0.8], + ... [0.75, 0.25], + ... [0.1, 0.9], + ... ]) + >>> ClassLabelScorer.SELF_CONFIDENCE(labels, pred_probs) + array([0.9 , 0.8 , 0.7 , 0.8 , 0.25, 0.9 ]) + """ + pred_probs = self._adjust_pred_probs(labels, pred_probs, **kwargs) + return self.value(labels, pred_probs)
+ + def _adjust_pred_probs( + self, labels: np.ndarray, pred_probs: np.ndarray, **kwargs + ) -> np.ndarray: + """Returns adjusted predicted probabilities by subtracting the class confident thresholds and renormalizing. + + This is used to adjust the predicted probabilities for the SELF_CONFIDENCE and NORMALIZED_MARGIN methods. + """ + if kwargs.get("adjust_pred_probs", False) is True: + if self == ClassLabelScorer.CONFIDENCE_WEIGHTED_ENTROPY: + raise ValueError(f"adjust_pred_probs is not currently supported for {self}.") + pred_probs = _subtract_confident_thresholds(labels, pred_probs) + return pred_probs + +
[docs] @classmethod + def from_str(cls, method: str) -> "ClassLabelScorer": + """Constructs an instance of the ClassLabelScorer enum based on the given method name. + + Parameters + ---------- + method: + The name of the scoring method to use. + + Returns + ------- + scorer: + An instance of the ClassLabelScorer enum. + + Raises + ------ + ValueError: + If the given method name is not a valid method name. + It must be one of the following: "self_confidence", "normalized_margin", or "confidence_weighted_entropy". + + Example + ------- + >>> from cleanlab.internal.multilabel_scorer import ClassLabelScorer + >>> ClassLabelScorer.from_str("self_confidence") + <ClassLabelScorer.SELF_CONFIDENCE: get_self_confidence_for_each_label> + """ + try: + return cls[method.upper()] + except KeyError: + raise ValueError(f"Invalid method name: {method}")
+ + +
[docs]def exponential_moving_average( + s: np.ndarray, + *, + alpha: Optional[float] = None, + axis: int = 1, + **_, +) -> np.ndarray: + r"""Exponential moving average (EMA) score aggregation function. + + For a score vector s = (s_1, ..., s_K) with K scores, the values + are sorted in *descending* order and the exponential moving average + of the last score is calculated, denoted as EMA_K according to the + note below. + + Note + ---- + + The recursive formula for the EMA at step :math:`t = 2, ..., K` is: + + .. math:: + + \text{EMA}_t = \alpha \cdot s_t + (1 - \alpha) \cdot \text{EMA}_{t-1}, \qquad 0 \leq \alpha \leq 1 + + We set :math:`\text{EMA}_1 = s_1` as the largest score in the sorted vector s. + + :math:`\alpha` is the "forgetting factor" that gives more weight to the + most recent scores, and successively less weight to the previous scores. + + Parameters + ---------- + s : + 2D array of scores with shape (N, K), one row of K scores per datapoint. + + alpha : + Discount factor that determines the weight of the previous EMA score. + Higher alpha means that the previous EMA score has a lower weight while + the current score has a higher weight. + + Its value must be in the interval [0, 1]. + + If alpha is None, it is set to 2 / (K + 1) where K is the number of scores. + + axis : + Axis along which the scores are sorted. + + Returns + ------- + s_ema : + 1D array of shape (N,) with the exponential moving average score of each row. + + Examples + -------- + >>> from cleanlab.internal.multilabel_scorer import exponential_moving_average + >>> import numpy as np + >>> s = np.array([[0.1, 0.2, 0.3]]) + >>> exponential_moving_average(s, alpha=0.5) + array([0.175]) + """ + K = s.shape[1] + s_sorted = np.fliplr(np.sort(s, axis=axis)) + if alpha is None: + # One conventional choice for alpha is 2/(K + 1), where K is the number of periods in the moving average.
+ alpha = float(2 / (K + 1)) + if not (0 <= alpha <= 1): + raise ValueError(f"alpha must be in the interval [0, 1], got {alpha}") + s_T = s_sorted.T + s_ema, s_next = s_T[0], s_T[1:] + for s_i in s_next: + s_ema = alpha * s_i + (1 - alpha) * s_ema + return s_ema
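A small worked example of the EMA recurrence may help when checking the docstring's expected output. The helper `ema_by_hand` below is a hypothetical, plain-Python restatement of the formula in the note (it is not part of cleanlab), and it assumes a single 1D score vector rather than the 2D input the library function takes.

```python
def ema_by_hand(scores, alpha):
    """Hypothetical restatement of the EMA recurrence for one 1D score vector."""
    # Sort in descending order so the largest score seeds the average (EMA_1 = s_1).
    s_sorted = sorted(scores, reverse=True)
    ema = s_sorted[0]
    # Fold in the remaining scores: EMA_t = alpha * s_t + (1 - alpha) * EMA_{t-1}.
    for s_t in s_sorted[1:]:
        ema = alpha * s_t + (1 - alpha) * ema
    return ema


# Same input as the docstring example: the sorted order is [0.3, 0.2, 0.1].
# The intermediate EMA after the second score is 0.25; the final value is ~0.175.
print(ema_by_hand([0.1, 0.2, 0.3], alpha=0.5))

# With alpha=None the library falls back to 2 / (K + 1); for K = 3 that is 0.5.
K = 3
print(2 / (K + 1))
```

Because the scores are sorted in descending order first, the smallest (worst) per-class score is folded in last and therefore receives the largest weight `alpha`.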
+ + +
[docs]def softmin( + s: np.ndarray, + *, + temperature: float = 0.1, + axis: int = 1, + **_, +) -> np.ndarray: + """Softmin score aggregation function. + + Parameters + ---------- + s : + Input array. + + temperature : + Temperature parameter. Too small values may cause numerical underflow and NaN scores. + + axis : + Axis along which to apply the function. + + Returns + ------- + Softmin score. + """ + + return np.einsum( + "ij,ij->i", s, softmax(x=1 - s, temperature=temperature, axis=axis, shift=True) + )
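Written out, `softmin` returns a weighted average of each row's scores with weights `softmax((1 - s) / temperature)`, so the lowest scores dominate as the temperature shrinks. The sketch below is a NumPy-only approximation. `softmin_sketch` is a hypothetical name, and the sketch assumes that `softmax(..., shift=True)` subtracts the row maximum before exponentiating, which is the usual numerical-stability trick.

```python
import numpy as np


def softmin_sketch(s, temperature=0.1, axis=1):
    """Hypothetical NumPy re-implementation of softmin aggregation (illustrative only)."""
    # Low scores become large logits: x = (1 - s) / temperature.
    x = (1 - s) / temperature
    # Assumed "shift": subtract the row maximum before exponentiating to avoid overflow.
    x = x - np.max(x, axis=axis, keepdims=True)
    weights = np.exp(x)
    weights /= np.sum(weights, axis=axis, keepdims=True)
    # Weighted average of the original scores, one value per row.
    return np.sum(s * weights, axis=axis)


scores = np.array([[0.1, 0.9], [0.5, 0.5]])
print(softmin_sketch(scores))                     # first row is pulled close to its minimum, 0.1
print(softmin_sketch(scores, temperature=100.0))  # a large temperature approaches the row mean
```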
+ + +
[docs]class Aggregator: + """Helper class for aggregating the label quality scores for each class into a single score for each datapoint. + + Parameters + ---------- + method: + The aggregation method that combines the per-class label quality scores into a single score for each datapoint. + If passed as a callable, your function should take in a 2D array of scores with shape (N, K) and an ``axis`` keyword argument, and return one aggregated score per row (i.e. a 1D array of shape (N,)). + See :py:func:`~cleanlab.internal.multilabel_scorer.exponential_moving_average` for an example of such a function. + Alternatively, this can be a str specifying a built-in function; possible values are the keys of the ``Aggregator``'s ``possible_methods`` attribute. + + kwargs: + Additional keyword arguments to pass to the aggregation function when it is called. + """ + + possible_methods: Dict[str, Callable[..., np.ndarray]] = { + "exponential_moving_average": exponential_moving_average, + "softmin": softmin, + } + + def __init__(self, method: Union[str, Callable], **kwargs): + if isinstance(method, str): # convert to callable + if method in self.possible_methods: + method = self.possible_methods[method] + else: + raise ValueError( + f"Invalid aggregation method specified: '{method}', must be one of the following: {list(self.possible_methods.keys())}" + ) + + self._validate_method(method) + self.method = method + self.kwargs = kwargs + + @staticmethod + def _validate_method(method) -> None: + if not callable(method): + raise TypeError(f"Expected callable method, got {type(method)}") + + @staticmethod + def _validate_scores(scores: np.ndarray) -> None: + if not (isinstance(scores, np.ndarray) and scores.ndim == 2): + raise ValueError( + f"Expected 2D array for scores, got {type(scores)} with shape {scores.shape}" + ) + +
[docs] def __call__(self, scores: np.ndarray, **kwargs) -> np.ndarray: + """Returns the label quality scores for each datapoint based on the given label quality scores for each class. + + Parameters + ---------- + scores: + The label quality scores for each class. + + Returns + ------- + aggregated_scores: + A single label quality score for each datapoint. + """ + self._validate_scores(scores) + kwargs["axis"] = 1 + updated_kwargs = {**self.kwargs, **kwargs} + return self.method(scores, **updated_kwargs)
+ + def __repr__(self): + return f"Aggregator(method={self.method.__name__}, kwargs={self.kwargs})"
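As the `__call__` method above shows, an `Aggregator` passes the whole 2D score matrix to its method together with `axis=1`, after merging the constructor's keyword arguments with the call-time ones (call-time values win). The sketch below illustrates that contract with a hypothetical custom aggregator, `mean_of_two_lowest`, which is not part of cleanlab. The trailing `**_` lets it ignore arguments such as `alpha` that are meant for other aggregators.

```python
import numpy as np


def mean_of_two_lowest(scores, *, axis=1, **_):
    """Hypothetical custom aggregator: average of the two lowest per-class scores.

    Follows the calling convention used by Aggregator.__call__: a 2D array,
    an ``axis`` keyword, and ``**_`` to swallow unrelated keyword arguments.
    """
    lowest_two = np.take(np.sort(scores, axis=axis), [0, 1], axis=axis)
    return lowest_two.mean(axis=axis)


# Mimic how Aggregator merges kwargs: call-time values override constructor values.
constructor_kwargs = {"alpha": 0.8}
call_kwargs = {"alpha": 0.5, "axis": 1}
merged = {**constructor_kwargs, **call_kwargs}
print(merged)  # {'alpha': 0.5, 'axis': 1}

scores = np.array([[0.9, 0.9, 0.3], [0.4, 0.9, 0.6]])
print(mean_of_two_lowest(scores, **merged))  # alpha is ignored by this aggregator
```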
+ + +
[docs]class MultilabelScorer: + """Aggregates label quality scores across different classes to produce one score per example in multi-label classification tasks. + + Parameters + ---------- + base_scorer: + The method to compute the label quality scores for each class. + + See the documentation for the ClassLabelScorer enum for more details. + + aggregator: + The method to aggregate the label quality scores for each class into a single score for each datapoint. + + Defaults to the EMA (exponential moving average) aggregator with forgetting factor ``alpha=0.8``. + + See the documentation for the Aggregator class for more details. + + See also + -------- + exponential_moving_average + + strict: + Flag for performing strict validation of the input data. + """ + + def __init__( + self, + base_scorer: ClassLabelScorer = ClassLabelScorer.SELF_CONFIDENCE, + aggregator: Union[Aggregator, Callable] = Aggregator(exponential_moving_average, alpha=0.8), + *, + strict: bool = True, + ): + self.base_scorer = base_scorer + if not isinstance(aggregator, Aggregator): + self.aggregator = Aggregator(aggregator) + else: + self.aggregator = aggregator + self.strict = strict + +
[docs] def __call__( + self, + labels: np.ndarray, + pred_probs: np.ndarray, + base_scorer_kwargs: Optional[dict] = None, + **aggregator_kwargs, + ) -> np.ndarray: + """ + Computes a quality score for each label in a multi-label classification problem + based on out-of-sample predicted probabilities. + For each example, the label quality scores for each class are aggregated into a single overall label quality score. + + Parameters + ---------- + labels: + A 2D array of shape (n_samples, n_labels) with binary labels. + + pred_probs: + A 2D array of shape (n_samples, n_labels) with predicted probabilities. + + base_scorer_kwargs: + Keyword arguments to pass to the base_scorer. + + aggregator_kwargs: + Additional keyword arguments to pass to the aggregator. + + Returns + ------- + scores: + A 1D array of shape (n_samples,) with the quality scores for each datapoint. + + Examples + -------- + >>> from cleanlab.internal.multilabel_scorer import MultilabelScorer, ClassLabelScorer + >>> import numpy as np + >>> labels = np.array([[0, 1, 0], [1, 0, 1]]) + >>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]]) + >>> scorer = MultilabelScorer() + >>> scores = scorer(labels, pred_probs) + >>> scores + array([0.9, 0.5]) + + >>> scorer = MultilabelScorer( + ... base_scorer = ClassLabelScorer.NORMALIZED_MARGIN, + ... aggregator = np.min, # Use the "worst" label quality score for each example. + ... ) + >>> scores = scorer(labels, pred_probs) + >>> scores + array([0.9, 0.4]) + """ + if self.strict: + self._validate_labels_and_pred_probs(labels, pred_probs) + scores = self.get_class_label_quality_scores(labels, pred_probs, base_scorer_kwargs) + return self.aggregate(scores, **aggregator_kwargs)
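To make the two stages of `MultilabelScorer.__call__` concrete, the sketch below recomputes the docstring's first example with plain NumPy. The first stage computes per-class self-confidence scores (the default `ClassLabelScorer.SELF_CONFIDENCE`), and the second applies the default EMA aggregation with `alpha=0.8`. The helper names are hypothetical. The sketch assumes that self-confidence on a two-column class slice is the predicted probability of the given binary label, i.e. `p` for a positive label and `1 - p` for a negative one.

```python
import numpy as np


def self_confidence_per_class(labels, pred_probs):
    # For each binary label column, self-confidence is p if the label is 1, else 1 - p.
    return np.where(labels == 1, pred_probs, 1 - pred_probs)


def ema_rows(scores, alpha):
    # Sort each row in descending order, then apply EMA_t = alpha*s_t + (1-alpha)*EMA_{t-1}.
    s_sorted = -np.sort(-scores, axis=1)
    ema = s_sorted[:, 0]
    for t in range(1, s_sorted.shape[1]):
        ema = alpha * s_sorted[:, t] + (1 - alpha) * ema
    return ema


labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

per_class = self_confidence_per_class(labels, pred_probs)
print(per_class)                       # rows ~[0.9, 0.9, 0.9] and ~[0.4, 0.9, 0.9]
print(ema_rows(per_class, alpha=0.8))  # ~[0.9, 0.5], matching the docstring
print(per_class.min(axis=1))           # stricter "worst label" aggregation: ~[0.9, 0.4]
```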
+ +
[docs] def aggregate( + self, + class_label_quality_scores: np.ndarray, + **kwargs, + ) -> np.ndarray: + """Aggregates the label quality scores for each class into a single overall label quality score for each example. + + Parameters + ---------- + class_label_quality_scores: + A 2D array of shape (n_samples, n_labels) with the label quality scores for each class. + + See also + -------- + get_class_label_quality_scores + + kwargs: + Additional keyword arguments to pass to the aggregator. + + Returns + ------- + scores: + A 1D array of shape (n_samples,) with the quality scores for each datapoint. + + Examples + -------- + >>> from cleanlab.internal.multilabel_scorer import MultilabelScorer + >>> import numpy as np + >>> class_label_quality_scores = np.array([[0.9, 0.9, 0.3],[0.4, 0.9, 0.6]]) + >>> scorer = MultilabelScorer() # Use the default aggregator (exponential moving average) with default parameters. + >>> scores = scorer.aggregate(class_label_quality_scores) + >>> scores + array([0.42, 0.452]) + >>> new_scores = scorer.aggregate(class_label_quality_scores, alpha=0.5) # Use the default aggregator with custom parameters. + >>> new_scores + array([0.6, 0.575]) + + Warning + ------- + Make sure that keyword arguments correspond to the aggregation function used. + I.e. the ``exponential_moving_average`` function supports an ``alpha`` keyword argument, but ``np.min`` does not. + """ + return self.aggregator(class_label_quality_scores, **kwargs)
+ +
[docs] def get_class_label_quality_scores( + self, + labels: np.ndarray, + pred_probs: np.ndarray, + base_scorer_kwargs: Optional[dict] = None, + ) -> np.ndarray: + """Computes separate label quality scores for each class. + + Parameters + ---------- + labels: + A 2D array of shape (n_samples, n_labels) with binary labels. + + pred_probs: + A 2D array of shape (n_samples, n_labels) with predicted probabilities. + + base_scorer_kwargs: + Keyword arguments to pass to the base scoring function. + + Returns + ------- + class_label_quality_scores: + A 2D array of shape (n_samples, n_labels) with the quality scores for each label. + + Examples + -------- + >>> from cleanlab.internal.multilabel_scorer import MultilabelScorer + >>> import numpy as np + >>> labels = np.array([[0, 1, 0], [1, 0, 1]]) + >>> pred_probs = np.array([[0.1, 0.9, 0.7], [0.4, 0.1, 0.6]]) + >>> scorer = MultilabelScorer() # Use the default base scorer (SELF_CONFIDENCE) + >>> class_label_quality_scores = scorer.get_class_label_quality_scores(labels, pred_probs) + >>> class_label_quality_scores + array([[0.9, 0.9, 0.3], + [0.4, 0.9, 0.6]]) + """ + class_label_quality_scores = np.zeros(shape=labels.shape) + if base_scorer_kwargs is None: + base_scorer_kwargs = {} + for i, (label_i, pred_prob_i) in enumerate(zip(labels.T, pred_probs.T)): + pred_prob_i_two_columns = stack_complement(pred_prob_i) + class_label_quality_scores[:, i] = self.base_scorer( + label_i, pred_prob_i_two_columns, **base_scorer_kwargs + ) + return class_label_quality_scores
+ + @staticmethod + def _validate_labels_and_pred_probs(labels: np.ndarray, pred_probs: np.ndarray) -> None: + """ + Checks that (multi-)labels are in the proper binary indicator format and that + they are compatible with the predicted probabilities. + """ + # Only allow dense matrices for labels for now + if not isinstance(labels, np.ndarray): + raise TypeError("Labels must be a numpy array.") + if not _is_multilabel(labels): + raise ValueError("Labels must be in multi-label format.") + if labels.shape != pred_probs.shape: + raise ValueError("Labels and predicted probabilities must have the same shape.")
+ + +
[docs]def get_label_quality_scores( + labels, + pred_probs, + *, + method: MultilabelScorer = MultilabelScorer(), + base_scorer_kwargs: Optional[dict] = None, + **aggregator_kwargs, +) -> np.ndarray: + """Computes a quality score for each label in a multi-label classification problem + based on out-of-sample predicted probabilities. + + Parameters + ---------- + labels: + A 2D array of shape (N, K) with binary labels. + + pred_probs: + A 2D array of shape (N, K) with predicted probabilities. + + method: + A scoring+aggregation method for computing the label quality scores of examples in a multi-label classification setting. + + base_scorer_kwargs: + Keyword arguments to pass to the class-label scorer. + + aggregator_kwargs: + Additional keyword arguments to pass to the aggregator. + + Returns + ------- + scores: + A 1D array of shape (N,) with the quality scores for each datapoint. + + Examples + -------- + >>> import cleanlab.internal.multilabel_scorer as ml_scorer + >>> import numpy as np + >>> labels = np.array([[0, 1, 0], [1, 0, 1]]) + >>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]]) + >>> scores = ml_scorer.get_label_quality_scores(labels, pred_probs, method=ml_scorer.MultilabelScorer()) + >>> scores + array([0.9, 0.5]) + + See also + -------- + MultilabelScorer: + See the documentation for the MultilabelScorer class for more examples of scoring methods and aggregation methods. + """ + return method(labels, pred_probs, base_scorer_kwargs=base_scorer_kwargs, **aggregator_kwargs)
+ + +# Probabilities + + +
[docs]def multilabel_py(y: np.ndarray) -> np.ndarray: + """Compute the prior probability of each label in a multi-label classification problem. + + Parameters + ---------- + y : + A 2d array of binarized multi-labels of shape (N, K) where N is the number of samples and K is the number of classes. + + Returns + ------- + py : + A 2d array of prior probabilities of shape (K,2) where the first column is the probability of the label being 0 + and the second column is the probability of the label being 1 for each class. + + Examples + -------- + >>> from cleanlab.internal.multilabel_scorer import multilabel_py + >>> import numpy as np + >>> y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]) + >>> multilabel_py(y) + array([[0.5, 0.5], + [0.5, 0.5]]) + >>> y = np.array([[0, 0], [0, 1], [1, 0], [1, 0], [1, 0]]) + >>> multilabel_py(y) + array([[0.4, 0.6], + [0.8, 0.2]]) + """ + + N, _ = y.shape + fraction_0 = np.sum(y == 0, axis=0) / N + fraction_1 = 1 - fraction_0 + py = np.column_stack((fraction_0, fraction_1)) + return py
+ + +# Cross-validation helpers + + +def _get_split_generator(labels, cv): + _, multilabel_ids = np.unique(labels, axis=0, return_inverse=True) + split_generator = cv.split(X=multilabel_ids, y=multilabel_ids) + return split_generator + + +
[docs]def get_cross_validated_multilabel_pred_probs(X, labels: np.ndarray, *, clf, cv) -> np.ndarray: + """Get predicted probabilities for a multi-label classifier via cross-validation. + + Note + ---- + The labels are reformatted to a "multi-class" format internally to support a wider range of cross-validation strategies. + If you have a multi-label dataset with `K` classes, the labels are reformatted to a "multi-class" format with up to `2**K` classes + (i.e. the number of possible class-assignment configurations). + It is unlikely that you'll see all `2**K` configurations in your dataset; only the configurations that actually occur become classes. + + Parameters + ---------- + X : + A 2d array of features of shape (N, M) where N is the number of samples and M is the number of features. + + labels : + A 2d array of binarized multi-labels of shape (N, K) where N is the number of samples and K is the number of classes. + + clf : + A multi-label classifier with a ``predict_proba`` method. + + cv : + A cross-validation splitter with a ``split`` method that returns a generator of train/test indices. + + Returns + ------- + pred_probs : + A 2d array of predicted probabilities of shape (N, K) where N is the number of samples and K is the number of classes. + + Note + ---- + The predicted probabilities are not expected to sum to 1 for each sample in the case of multi-label classification.
+ + Examples + -------- + >>> import numpy as np + >>> from sklearn.model_selection import KFold + >>> from sklearn.multiclass import OneVsRestClassifier + >>> from sklearn.ensemble import RandomForestClassifier + >>> from cleanlab.internal.multilabel_scorer import get_cross_validated_multilabel_pred_probs + >>> np.random.seed(0) + >>> X = np.random.rand(16, 2) + >>> labels = np.random.randint(0, 2, size=(16, 2)) + >>> clf = OneVsRestClassifier(RandomForestClassifier()) + >>> cv = KFold(n_splits=2) + >>> get_cross_validated_multilabel_pred_probs(X, labels, clf=clf, cv=cv) + """ + split_generator = _get_split_generator(labels, cv) + pred_probs = cross_val_predict(clf, X, labels, cv=split_generator, method="predict_proba") + return pred_probs
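The cross-validation helper above relies on `np.unique(..., axis=0, return_inverse=True)` to turn each distinct row of multi-label indicators into one "multi-class" id, which is then passed as `y` to `cv.split`. The standalone snippet below shows that mapping on a small, made-up label matrix.

```python
import numpy as np

labels = np.array([
    [0, 1],
    [1, 0],
    [0, 1],
    [1, 1],
    [1, 0],
])

# Each distinct row (label configuration) gets an integer id.
configs, config_ids = np.unique(labels, axis=0, return_inverse=True)
# Guard against NumPy versions that return a non-1D inverse when axis is given.
config_ids = np.asarray(config_ids).reshape(-1)

print(configs)     # the distinct configurations, sorted lexicographically
print(config_ids)  # [0 1 0 2 1]
```

With `K = 2` classes there are at most `2**2 = 4` configurations, but only the 3 that occur receive ids. Because these ids behave like ordinary class labels, a stratified splitter (for example scikit-learn's `StratifiedKFold`) can try to balance configurations across folds, although very rare configurations are hard to stratify.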
+
+
+
+
+ + +
+
+ + Made with Sphinx and @pradyunsg's + + Furo + +
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/internal/multilabel_utils.html b/v2.6.6/_modules/cleanlab/internal/multilabel_utils.html new file mode 100644 index 000000000..8c888da23 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/internal/multilabel_utils.html @@ -0,0 +1,784 @@ + + + + + + + + + + + cleanlab.internal.multilabel_utils - cleanlab +
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.internal.multilabel_utils

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Helper functions used internally for multi-label classification tasks.
+"""
+from typing import List, Optional, Tuple
+
+import numpy as np
+
+from cleanlab.internal.util import get_num_classes
+
+
+def _is_multilabel(y: np.ndarray) -> bool:
+    """Checks whether `y` is in a multi-label indicator matrix format.
+
+    Sparse matrices are not supported.
+    """
+    if not (isinstance(y, np.ndarray) and y.ndim == 2 and y.shape[1] > 1):
+        return False
+    return np.array_equal(np.unique(y), [0, 1])
+
+
+
[docs]def stack_complement(pred_prob_slice: np.ndarray) -> np.ndarray: + """ + Extends predicted probabilities of a single class to two columns: the probability that the class is absent (first column) and present (second column). + + Parameters + ---------- + pred_prob_slice: + A 1D array with predicted probabilities for a single class. + + Example + ------- + >>> import numpy as np + >>> from cleanlab.internal.multilabel_utils import stack_complement + >>> pred_prob_slice = np.array([0.1, 0.9, 0.3, 0.8]) + >>> stack_complement(pred_prob_slice) + array([[0.9, 0.1], + [0.1, 0.9], + [0.7, 0.3], + [0.2, 0.8]]) + """ + return np.vstack((1 - pred_prob_slice, pred_prob_slice)).T
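In `multilabel_scorer`, `get_class_label_quality_scores` scores each class separately: it expands that class's probability column with `stack_complement` and calls the base scorer on the resulting two-column matrix. The NumPy-only sketch below uses made-up labels to show why self-confidence on that matrix reduces to `p` for a positive label and `1 - p` for a negative one. It assumes self-confidence means "probability assigned to the given label".

```python
import numpy as np

pred_prob_slice = np.array([0.1, 0.9, 0.3, 0.8])  # P(class is present) for 4 examples
label_slice = np.array([0, 1, 1, 0])              # made-up binary labels for the same class

# Same construction as stack_complement: column 0 = P(absent), column 1 = P(present).
two_columns = np.vstack((1 - pred_prob_slice, pred_prob_slice)).T

# Picking the column indexed by each label gives the self-confidence for that class.
self_confidence = two_columns[np.arange(len(label_slice)), label_slice]
print(self_confidence)  # ~[0.9, 0.9, 0.3, 0.2]
```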
+ + +
[docs]def get_onehot_num_classes( + labels: list, pred_probs: Optional[np.ndarray] = None +) -> Tuple[np.ndarray, int]: + """Returns the one-hot (multi-hot) encoding of multi-label data, along with the number of classes.""" + num_classes = get_num_classes(labels=labels, pred_probs=pred_probs) + try: + y_one = int2onehot(labels, K=num_classes) + except TypeError: + raise ValueError( + "wrong format for labels, should be a list of list[indices], please check the documentation in find_label_issues for further information" + ) + return y_one, num_classes
+ + +
[docs]def int2onehot(labels: list, K: int) -> np.ndarray: + """Convert multi-label classification `labels` from a ``List[List[int]]`` format to a onehot matrix. + This returns a binarized format of the labels as a multi-hot vector for each example, where the entries in this vector are 1 for each class that applies to this example and 0 otherwise. + + Parameters + ---------- + labels: list of lists of integers + e.g. [[0,1], [3], [1,2,3], [1], [2]] + All integers from 0,1,...,K-1 must be represented. + K: int + The number of classes.""" + + from sklearn.preprocessing import MultiLabelBinarizer + + mlb = MultiLabelBinarizer(classes=range(K)) + return mlb.fit_transform(labels)
+ + +
[docs]def onehot2int(onehot_matrix: np.ndarray) -> List[List[int]]: + """Convert multi-label classification `labels` from a onehot matrix format to a ``List[List[int]]`` format that can be used with other cleanlab functions. + + Parameters + ---------- + onehot_matrix: 2D np.ndarray of 0s and 1s + A matrix representation of multi-label classification labels in a binarized format as a multi-hot vector for each example. + The entries in this vector are 1 for each class that applies to this example and 0 otherwise. + + Returns + ------- + labels: list of lists of integers + e.g. [[0,1], [3], [1,2,3], [1], [2]] + All integers from 0,1,...,K-1 must be represented.""" + + return [np.where(row)[0].tolist() for row in onehot_matrix]
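A quick round trip between the two label formats can serve as a sanity check. Because `int2onehot` depends on scikit-learn's `MultiLabelBinarizer`, the sketch below swaps in a hypothetical NumPy-only `int2onehot_sketch`. `onehot2int_sketch` uses the same `np.where` approach as `onehot2int`.

```python
import numpy as np


def int2onehot_sketch(labels, K):
    """Hypothetical NumPy-only stand-in for int2onehot (which uses MultiLabelBinarizer)."""
    onehot = np.zeros((len(labels), K), dtype=int)
    for row, classes in enumerate(labels):
        onehot[row, classes] = 1  # set every class that applies to this example
    return onehot


def onehot2int_sketch(onehot_matrix):
    # Same approach as onehot2int: the nonzero column indices of each row.
    return [np.where(row)[0].tolist() for row in onehot_matrix]


labels = [[0, 1], [3], [1, 2, 3], [1], [2]]
onehot = int2onehot_sketch(labels, K=4)
print(onehot)
print(onehot2int_sketch(onehot))  # [[0, 1], [3], [1, 2, 3], [1], [2]]
```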
+
+
+
+
+ + +
+
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/internal/neighbor/knn_graph.html b/v2.6.6/_modules/cleanlab/internal/neighbor/knn_graph.html new file mode 100644 index 000000000..821bcb9f5 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/internal/neighbor/knn_graph.html @@ -0,0 +1,1257 @@ + + + + + + + + + + + cleanlab.internal.neighbor.knn_graph - cleanlab +
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.internal.neighbor.knn_graph

+from __future__ import annotations
+from typing import List, Optional, TYPE_CHECKING, Tuple
+
+import numpy as np
+from scipy.sparse import csr_matrix
+from scipy.linalg import circulant
+from sklearn.neighbors import NearestNeighbors
+
+if TYPE_CHECKING:
+    from cleanlab.typing import FeatureArray, Metric
+
+from cleanlab.internal.neighbor.metric import decide_default_metric
+from cleanlab.internal.neighbor.search import construct_knn
+
+
+DEFAULT_K = 10
+"""Default number of neighbors to consider in the k-nearest neighbors search,
+unless the size of the feature array is too small or the user specifies a different value.
+
+This should be the largest desired value of k for all desired issue types that require a KNN graph.
+
+E.g. if near-duplicate detection needs k=1 but outlier detection needs k=10, then DEFAULT_K should be 10. This way, all issue types can rely on the same KNN graph.
+"""
+
+
+
[docs]def features_to_knn( + features: Optional[FeatureArray], + *, + n_neighbors: Optional[int] = None, + metric: Optional[Metric] = None, + **sklearn_knn_kwargs, +) -> NearestNeighbors: + """Build and fit a k-nearest neighbors search object from an array of numerical features. + + Parameters + ---------- + features : + The input feature array, with shape (N, M), where N is the number of samples and M is the number of features. + n_neighbors : + The number of nearest neighbors to consider. If None, a default value is determined based on the feature array size. + metric : + The distance metric to use for computing distances between points. If None, the metric is determined based on the feature array shape. + **sklearn_knn_kwargs : + Additional keyword arguments to be passed to the search index constructor. + + Returns + ------- + knn : + A k-nearest neighbors search object fitted to the input feature array. + + Examples + -------- + + >>> import numpy as np + >>> from cleanlab.internal.neighbor import features_to_knn + >>> features = np.random.rand(100, 10) + >>> knn = features_to_knn(features) + >>> knn + NearestNeighbors(metric='cosine', n_neighbors=10) + """ + if features is None: + raise ValueError("Both knn and features arguments cannot be None at the same time.") + # Use provided metric if available, otherwise decide based on the features. + metric = metric or decide_default_metric(features) + + # Decide the number of neighbors to use in the KNN search. + n_neighbors = _configure_num_neighbors(features, n_neighbors) + + knn = construct_knn(n_neighbors, metric, **sklearn_knn_kwargs) + return knn.fit(features)
+ + +
[docs]def construct_knn_graph_from_index( + knn: NearestNeighbors, + correction_features: Optional[FeatureArray] = None, +) -> csr_matrix: + """Construct a sparse distance-matrix representation of the KNN graph from a fitted NearestNeighbors search object. + + Parameters + ---------- + knn : + A NearestNeighbors object that has been fitted to a feature array. + The KNN graph is constructed based on the distances and indices of each feature row's nearest neighbors. + correction_features : + The input feature array used to fit the NearestNeighbors object. + If provided, the distances and indices of the neighbors are corrected based on exact + duplicates in the feature array. + If not provided, no correction will be applied. + + Warning + ------- + This function is designed to handle a specific case where a KNN index is used to construct a KNN graph by itself, + and there is a need to detect and correct for exact duplicates in the feature array. However, relying on this + function for such corrections is generally discouraged. Other functions in this module, such as + ``create_knn_graph_and_index`` and ``correct_knn_graph``, handle KNN graph construction with feature corrections + in a more flexible and robust manner. Use this function only + when there is a special need to correct distances and indices based on the feature array provided. + + Returns + ------- + knn_graph : + A sparse, weighted adjacency matrix representing the KNN graph of the feature array. + + Note + ---- + This is *not* intended to construct a KNN graph of test data. It is only used to construct a KNN graph of the data used to fit the NearestNeighbors object. + + Examples + -------- + >>> import numpy as np + >>> from cleanlab.internal.neighbor.knn_graph import features_to_knn, construct_knn_graph_from_index + >>> features = np.array([ + ... [0.701, 0.701], + ... [0.900, 0.436], + ... [0.000, 1.000], + ... 
]) + >>> knn = features_to_knn(features, n_neighbors=1) + >>> knn_graph = construct_knn_graph_from_index(knn) + >>> knn_graph.toarray() # For demonstration purposes only. It is generally a bad idea to transform to dense matrix for large graphs. + array([[0. , 0.33140006, 0. ], + [0.33140006, 0. , 0. ], + [0.76210367, 0. , 0. ]]) + """ + + # Perform self-querying to get the distances and indices of the nearest neighbors + distances, indices = knn.kneighbors(X=None, return_distance=True) + + # Correct the distances and indices if the correction_features array is provided + if correction_features is not None: + distances, indices = correct_knn_distances_and_indices( + features=correction_features, distances=distances, indices=indices + ) + + N, K = distances.shape + + # Pointers to the row elements distances[indptr[i]:indptr[i+1]], + # and their corresponding column indices indices[indptr[i]:indptr[i+1]]. + indptr = np.arange(0, N * K + 1, K) + + return csr_matrix((distances.reshape(-1), indices.reshape(-1), indptr), shape=(N, N))
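The CSR layout built at the end of `construct_knn_graph_from_index` can be checked by hand on the docstring's three points. This sketch replaces scikit-learn's self-query (`knn.kneighbors(X=None)`) with a brute-force NumPy search, an assumption that is only reasonable for tiny inputs. It then expands the `(data, indices, indptr)` triplet into a dense matrix for inspection.

```python
import numpy as np

features = np.array([
    [0.701, 0.701],
    [0.900, 0.436],
    [0.000, 1.000],
])
N, K = features.shape[0], 1  # one neighbor per point, as in the docstring example

# Brute-force Euclidean distances; exclude each point from its own neighbor list.
dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
np.fill_diagonal(dist, np.inf)
indices = np.argsort(dist, axis=1)[:, :K]              # (N, K) neighbor ids
distances = np.take_along_axis(dist, indices, axis=1)  # (N, K) neighbor distances

# CSR triplet: row i owns entries indptr[i]:indptr[i+1] of data / col_idx.
data = distances.reshape(-1)
col_idx = indices.reshape(-1)
indptr = np.arange(0, N * K + 1, K)

# Expand to dense only to inspect this tiny example.
dense = np.zeros((N, N))
for i in range(N):
    start, end = indptr[i], indptr[i + 1]
    dense[i, col_idx[start:end]] = data[start:end]
print(np.round(dense, 8))
```

Because every row has exactly `K` neighbors, `indptr` is simply a stride-`K` range, which is why the library can build it with `np.arange` instead of counting entries per row.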
+ + +
[docs]def create_knn_graph_and_index( + features: Optional[FeatureArray], + *, + n_neighbors: Optional[int] = None, + metric: Optional[Metric] = None, + correct_exact_duplicates: bool = True, + **sklearn_knn_kwargs, +) -> Tuple[csr_matrix, NearestNeighbors]: + """Construct a KNN graph and a fitted k-nearest neighbors search object from an array of features. + + Parameters + ---------- + features : + The input feature array, with shape (N, M), where N is the number of samples and M is the number of features. + n_neighbors : + The number of nearest neighbors to consider. If None, a default value is determined based on the feature array size. + metric : + The distance metric to use for computing distances between points. If None, the metric is determined based on the feature array shape. + correct_exact_duplicates : + Whether to correct the KNN graph to ensure that exact duplicates have zero mutual distance, and they are correctly included in the KNN graph. + **sklearn_knn_kwargs : + Additional keyword arguments to be passed to the search index constructor. + + Raises + ------ + ValueError : + If `features` is None, as it's required to construct a KNN graph from scratch. + + Returns + ------- + knn_graph : + A sparse, weighted adjacency matrix representing the KNN graph of the feature array. + knn : + A k-nearest neighbors search object fitted to the input feature array. This object can be used to query the nearest neighbors of new data points. + + Examples + -------- + >>> import numpy as np + >>> from cleanlab.internal.neighbor.knn_graph import create_knn_graph_and_index + >>> features = np.array([ + ... [0.701, 0.701], + ... [0.900, 0.436], + ... [0.000, 1.000], + ... ]) + >>> knn_graph, knn = create_knn_graph_and_index(features, n_neighbors=1) + >>> knn_graph.toarray() # For demonstration purposes only. It is generally a bad idea to transform large graphs into a dense matrix. + array([[0. , 0.33140006, 0. ], + [0.33140006, 0. , 0. ], + [0.76210367, 0. , 0. 
]]) + >>> knn + NearestNeighbors(metric=<function euclidean at ...>, n_neighbors=1) # For demonstration purposes only. The actual metric may vary. + """ + # Construct NearestNeighbors object + knn = features_to_knn(features, n_neighbors=n_neighbors, metric=metric, **sklearn_knn_kwargs) + # Build graph from NearestNeighbors object + knn_graph = construct_knn_graph_from_index(knn) + + # Ensure that exact duplicates found with np.unique aren't accidentally missed in the KNN graph + if correct_exact_duplicates: + assert features is not None + knn_graph = correct_knn_graph(features, knn_graph) + return knn_graph, knn
+ + +
[docs]def correct_knn_graph(features: FeatureArray, knn_graph: csr_matrix) -> csr_matrix: + """ + Corrects a k-nearest neighbors (KNN) graph by handling exact duplicates in the feature array. + + This utility function takes a precomputed KNN graph and the corresponding feature array, + identifies sets of exact duplicate feature vectors, and corrects the KNN graph to properly + reflect these duplicates. The corrected KNN graph is returned as a sparse CSR matrix. + + Parameters + ---------- + features : np.ndarray + The input feature array, with shape (N, M), where N is the number of samples and M is the number of features. + knn_graph : csr_matrix + A sparse matrix of shape (N, N) representing the k-nearest neighbors graph. + The graph is expected to be in CSR (Compressed Sparse Row) format. + + Returns + ------- + csr_matrix + A corrected KNN graph in CSR format with adjusted distances and indices to properly handle + exact duplicates in the feature array. + + Notes + ----- + - This function assumes that the input `knn_graph` is already computed and provided in CSR format. + - The function modifies the KNN graph to ensure that exact duplicates are represented with zero distance + and correctly updated neighbor indices. + - This function is useful for post-processing a KNN graph when exact duplicates were not handled during + the initial KNN computation. + + """ + N = features.shape[0] + distances, indices = knn_graph.data.reshape(N, -1), knn_graph.indices.reshape(N, -1) + + corrected_distances, corrected_indices = correct_knn_distances_and_indices( + features, distances, indices + ) + N = features.shape[0] + return csr_matrix( + (corrected_distances.reshape(-1), corrected_indices.reshape(-1), knn_graph.indptr), + shape=(N, N), + )
+ + +def _compute_exact_duplicate_sets(features: FeatureArray) -> List[np.ndarray]: + """ + Computes the sets of exact duplicate points in the feature array. + + This function groups indices of points that have identical feature vectors. + It returns a list of arrays, where each array contains the indices of points that are exact duplicates + of each other. + + Parameters + ---------- + features : np.ndarray + The input feature array, with shape (N, M), where N is the number of samples and M is the number of features. + + Returns + ------- + exact_duplicate_sets + A list of 1D arrays, where each array contains the indices of exact duplicate points in the dataset. + Only sets with two or more duplicates are included in the list. If no exact duplicates are found, an empty list is returned. + + Examples + -------- + >>> features = np.array([[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]) + >>> _compute_exact_duplicate_sets(features) + [array([0, 2]), array([1, 4])] # The row value [1, 2] appears in rows 0 and 2, and [3, 4] appears in rows 1 and 4. + + Notes + ----- + - This function uses `np.unique` to find unique feature vectors and their inverse indices. + - This function is intended to be used internally within this module. + """ + # Use np.unique to catch inverse indices of all unique feature sets + _, unique_inverse, unique_counts = np.unique( + features, return_inverse=True, return_counts=True, axis=0 + ) + + # Collect different sets of exact duplicates in the dataset + exact_duplicate_sets = [ + np.where(unique_inverse == u)[0] for u in set(unique_inverse) if unique_counts[u] > 1 + ] + + return exact_duplicate_sets + + +
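The grouping in `_compute_exact_duplicate_sets` rests on `np.unique(..., axis=0, return_inverse=True, return_counts=True)`. The sketch below is a minimal alternative; the helper name `exact_duplicate_groups` is hypothetical and not part of cleanlab. It gets the same groups from one stable sort of the inverse indices, instead of one `np.where` scan per unique row. Its groups come out ordered by the sorted unique rows, which may differ from the order the original returns.

```python
import numpy as np


def exact_duplicate_groups(features: np.ndarray) -> list:
    """Hypothetical helper: index arrays of rows that occur two or more times."""
    _, inverse, counts = np.unique(features, axis=0, return_inverse=True, return_counts=True)
    # Defensive flatten: the shape of the inverse has varied across NumPy versions
    inverse = np.asarray(inverse).reshape(-1)
    # A stable sort makes each group's rows contiguous, in ascending row order
    order = np.argsort(inverse, kind="stable")
    groups = np.split(order, np.cumsum(counts)[:-1])
    # Keep only groups with at least two identical rows
    return [g for g in groups if len(g) > 1]
```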
[docs]def correct_knn_distances_and_indices_with_exact_duplicate_sets_inplace( + distances: np.ndarray, + indices: np.ndarray, + exact_duplicate_sets: List[np.ndarray], +) -> None: + """ + Corrects the distances and indices arrays of k-nearest neighbors (KNN) graphs by handling sets + of exact duplicates explicitly. This function modifies the input arrays in-place. + + This function ensures that exact duplicates are correctly represented in the KNN graph. + It modifies the `distances` and `indices` arrays so that each set of exact duplicates + points to itself with zero distance, and adjusts the nearest neighbors accordingly. + + Parameters + ---------- + distances : + A 2D array of shape (N, k) representing the distances between each point of the N points and their k-nearest neighbors. + This array will be modified in-place to reflect the corrections for exact duplicates (whose mutual distances are explicitly set to zero). + indices : + A 2D array of shape (N, k) representing the indices of the nearest neighbors for each of the N points. + This array will be modified in-place to reflect the corrections for exact duplicates. + exact_duplicate_sets : + A list of 1D arrays, each containing the indices of points that are exact duplicates of each other. + These sets will be used to correct the KNN graph by ensuring that duplicates are reflected as nearest neighbors + with zero distance. + + High-Level Overview + ------------------- + The function operates in two main scenarios based on the size of the duplicate sets relative to k: + + 1. **Duplicate Set Size >= k + 1**: + - All nearest neighbors are exact duplicates. + - The `indices` array is updated such that the first k+1 entries for each duplicate set point are used to represent the nearest neighbors + of all points in the duplicate set. + - The rows of the `distances` array belonging to the duplicate set are set to zero. + + 2. 
**Duplicate Set Size < k + 1**: + - Some of the nearest neighbors are not exact duplicates. + - Non-duplicate neighbors are shifted to the back of the list. + - The `indices` and `distances` arrays are updated accordingly to reflect the duplicates at the front with zero distance. + + User Considerations + ------------------- + - **Input Validity**: Ensure that the `distances` and `indices` arrays have the correct shape and correspond to the same KNN graph. + - **In-Place Modifications**: The function modifies the input arrays directly. If the original data is needed, make a copy before calling the function. + - **Duplicate Set Size**: The function is optimized for cases where the number of exact duplicates can be larger than k. Ensure the duplicate sets are accurately identified. + - **Performance**: The function uses efficient NumPy operations, but performance can be affected by the size of the input arrays and the number of duplicate sets. + + Capabilities + ------------ + - Handles exact duplicate sets efficiently, ensuring correct KNN graph representation. + - Maintains zero distances for exact duplicates. + - Adjusts neighbor indices to reflect the presence of duplicates. + + Limitations + ----------- + - Assumes that the input arrays (`distances` and `indices`) come from a precomputed KNN graph. + - Does not handle near-duplicates or merge non-duplicate neighbors. + - Requires careful construction of `exact_duplicate_sets` to avoid misidentification. 
+ """ + + # Number of neighbors + k = distances.shape[1] + + for duplicate_inds in exact_duplicate_sets: + # Determine the number of same points to include, respecting the limit of k + num_same = len(duplicate_inds) + num_same_included = min(num_same - 1, k) # ensure we do not exceed k neighbors + + sorted_first_k_duplicate_inds = _prepare_neighborhood_of_first_k_duplicates( + duplicate_inds, num_same_included + ) + + if num_same >= k + 1: + # All nearest neighbors are exact duplicates + + # The first k + 1 duplicates get the circulant matrix of nearest neighbors + indices[duplicate_inds[: k + 1]] = sorted_first_k_duplicate_inds + # The remaining duplicates simply take the first k duplicate ids + indices[duplicate_inds[k + 1 :]] = duplicate_inds[:k] + + # Finally, set the distances between exact duplicates to zero + distances[duplicate_inds] = 0 + else: + # Some of the nearest neighbors aren't exact duplicates, move those to the back + + # Mark the neighbors that do not belong to this duplicate set + different_point_mask = np.isin(indices[duplicate_inds], duplicate_inds, invert=True) + + # Get the column positions of (k - num_same_included) non-duplicate neighbors in each row + true_indices = np.argsort(~different_point_mask, axis=1)[:, :-num_same_included] + + # Copy those neighbors and their distances to the last (k - num_same_included) columns + distances[duplicate_inds, -(k - num_same_included) :] = distances[ + duplicate_inds, true_indices.T + ].T + indices[duplicate_inds, -(k - num_same_included) :] = indices[ + duplicate_inds, true_indices.T + ].T + + # The circulant neighborhoods fill the first num_same_included columns + indices[duplicate_inds, :num_same_included] = sorted_first_k_duplicate_inds + + # Finally, set the distances between exact duplicates to zero + distances[duplicate_inds, :num_same_included] = 0 + + return None
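The sketch below makes the `num_same >= k + 1` branch concrete by re-implementing only that branch. The function name `fill_large_duplicate_set` and the toy arrays are hypothetical. It runs on a 10-point graph with k = 2 in which points 3, 5, 7 and 9 are exact duplicates. Like the source, it builds the neighborhoods with `scipy.linalg.circulant`.

```python
import numpy as np
from scipy.linalg import circulant


def fill_large_duplicate_set(distances, indices, duplicate_inds):
    """Hypothetical sketch of the len(duplicate_inds) >= k + 1 branch; modifies arrays in place."""
    k = distances.shape[1]
    # Circulant neighborhoods for the first k + 1 duplicates, without each point's own index
    base = duplicate_inds[: k + 1]
    neighborhoods = np.sort(circulant(base)[:, 1:], axis=1)
    indices[base] = neighborhoods
    # Any remaining duplicates simply point at the first k duplicates
    indices[duplicate_inds[k + 1 :]] = duplicate_inds[:k]
    # Exact duplicates are at zero distance from each other
    distances[duplicate_inds] = 0


# Toy graph: 10 points, k = 2; points 3, 5, 7 and 9 are exact duplicates
distances = np.ones((10, 2))
indices = np.zeros((10, 2), dtype=int)
fill_large_duplicate_set(distances, indices, np.array([3, 5, 7, 9]))
```

Rows outside the duplicate set are left untouched, which is why the function only needs the duplicate indices and not the full feature array.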
+ + +def _prepare_neighborhood_of_first_k_duplicates(duplicate_inds, num_same_included): + """ + Prepare a matrix representing the neighborhoods of duplicate items. + + This function constructs a matrix where each row corresponds to an item + and contains the indices of its nearest neighbors (excluding itself), up + to a specified number `k`. + + Parameters: + ----------- + duplicate_inds : list + A list of indices that represent duplicate items. + + num_same_included : int + An integer `k` representing the number of neighbors to include for + each item. + + Returns: + -------- + np.ndarray + A matrix where each row contains the sorted indices of the nearest + neighbors for the corresponding item. + + Explanation: + ------------ + 1. Extract the Base for the Circulant Matrix: + - The function extracts the first `k+1` elements from `duplicate_inds` + to form the base of the circulant matrix. This approach ensures that + even if the set of duplicate items is larger, we only need to consider + the first `k` duplicates as the nearest neighbors, avoiding conflicts + with the items themselves. + + 2. Create the Circulant Matrix: + - A circulant matrix is generated from the base, where each row is a + cyclic permutation of the previous row. + + 3. Slice the Matrix to Exclude the First Column: + - The first column is removed to ensure each row represents the neighbors + without including the item itself. + + 4. Sort the Neighborhood Indices: + - The rows of the sliced matrix are sorted to ensure a consistent order + of neighbors. + + Example: + -------- + Given a set of 5 duplicate items `[A, B, C, D, E]` and `k=2`, the function + processes this as follows: + + 1. `circulant_base` for `k=2` would be `[A, B, C]`. + 2. The `circulant_matrix` might look like: + ``` + [A B C] + [B C A] + [C A B] + ``` + 3. Removing the first column results in: + ``` + [B C] + [C A] + [A B] + ``` + 4. 
Sorting each row gives the final matrix: + ``` + [B C] + [A C] + [A B] + ``` + + This matrix indicates that: + - The nearest neighbors of `A` are `[B, C]`. + - The nearest neighbors of `B` are `[A, C]`. + - The nearest neighbors of `C` are `[A, B]`. + + For `k=2`, the neighbors of `D`, `E`, onwards could be any of the above. + + The function constructs a sorted matrix of nearest neighbors for a list of + duplicate items, ensuring an equal distribution of neighbors up to a specified + number `k`. This process is necessary for tasks requiring an understanding of + the local neighborhood structure among duplicate examples. By using only the first + `k+1` elements, the function avoids the need to construct a larger circulant + matrix, simplifying the computation and ensuring no conflicts among the rest of the items. + """ + circulant_base = duplicate_inds[: num_same_included + 1] + circulant_matrix = circulant(circulant_base) + sliced_circulant_matrix = circulant_matrix[:, 1:] + sorted_first_k_duplicate_inds = np.sort(sliced_circulant_matrix, axis=1) + return sorted_first_k_duplicate_inds + + +
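The circulant construction in `_prepare_neighborhood_of_first_k_duplicates` can be checked directly with `scipy.linalg.circulant`. Here, indices 0, 1 and 2 stand in for `A`, `B` and `C` with k = 2. Note that scipy uses the base as the first column of the matrix, so dropping that column removes each row's own index. After sorting each row, the neighborhoods match the final matrix in the walkthrough.

```python
import numpy as np
from scipy.linalg import circulant

# Stand-ins for A, B, C: the first k + 1 = 3 duplicate indices
base = np.array([0, 1, 2])
full = circulant(base)                  # the first column of the result is `base`
without_self = full[:, 1:]              # drop the column holding each row's own index
neighborhoods = np.sort(without_self, axis=1)
```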
[docs]def correct_knn_distances_and_indices( + features: FeatureArray, + distances: np.ndarray, + indices: np.ndarray, + exact_duplicate_sets: Optional[List[np.ndarray]] = None, +) -> tuple[np.ndarray, np.ndarray]: + """ + Corrects the distances and indices of a k-nearest neighbors (KNN) graph + based on all exact duplicates detected in the feature array. + + Parameters + ---------- + features : + The feature array used to construct the KNN graph. + distances : + The distances between each point and its k nearest neighbors. + indices : + The indices of the k nearest neighbors for each point. + exact_duplicate_sets : + A list of numpy arrays, where each array contains the indices of exact duplicates in the feature array. If not provided, it will be computed from the feature array. + + Returns + ------- + corrected_distances : + The corrected distances between each point and its k nearest neighbors. Exact duplicates (based on the feature array) are ensured to have zero mutual distance. + corrected_indices : + The corrected indices of the k nearest neighbors for each point. Exact duplicates are ensured to be included in the k nearest neighbors; if a point has more than k exact duplicates, all of its k nearest neighbors are exact duplicates. + + Example + ------- + >>> import numpy as np + >>> X = np.array( + ... [ + ... [0, 0], + ... [0, 0], # Exact duplicate of the previous point + ... [1, 1], # The distance from this point to each of the others is sqrt(2) (equally distant from both) + ... ] + ... ) + >>> distances = np.array( # Distance to the 1-NN of each point + ... [ + ... [np.sqrt(2)], # Should be [0] + ... [1e-16], # Should be [0] + ... [np.sqrt(2)], + ... ] + ... ) + >>> indices = np.array( # Index of the 1-NN of each point + ... [ + ... [2], # Should be [1] + ... [0], + ... [1], # Might be [0] or [1] + ... ] + ... 
) + >>> corrected_distances, corrected_indices = correct_knn_distances_and_indices(X, distances, indices) + >>> corrected_distances + array([[0.], [0.], [1.41421356]]) + >>> corrected_indices + array([[1], [0], [1]]) + """ + + if exact_duplicate_sets is None: + exact_duplicate_sets = _compute_exact_duplicate_sets(features) + + # Prepare the output arrays + corrected_distances = np.copy(distances) + corrected_indices = np.copy(indices) + + correct_knn_distances_and_indices_with_exact_duplicate_sets_inplace( + distances=corrected_distances, + indices=corrected_indices, + exact_duplicate_sets=exact_duplicate_sets, + ) + + return corrected_distances, corrected_indices
+ + +def _configure_num_neighbors(features: FeatureArray, k: Optional[int]): + # Error if the provided value is greater than or equal to the number of examples. + N = features.shape[0] + k_larger_than_dataset = k is not None and k >= N + if k_larger_than_dataset: + raise ValueError( + f"Number of nearest neighbors k={k} must be smaller than the number of examples N={N} passed into the estimator (knn)." + ) + + # Either use the provided value or select a default value based on the feature array size. + k = k or min(DEFAULT_K, N - 1) + return k +
+ + Made with Sphinx and @pradyunsg's + + Furo + +
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/internal/neighbor/metric.html b/v2.6.6/_modules/cleanlab/internal/neighbor/metric.html
new file mode 100644
index 000000000..4c7698512
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/internal/neighbor/metric.html
@@ -0,0 +1,786 @@
+ cleanlab.internal.neighbor.metric - cleanlab
cleanlab
Source code for cleanlab.internal.neighbor.metric

+from scipy.spatial.distance import euclidean
+
+from cleanlab.typing import FeatureArray, Metric
+
+HIGH_DIMENSION_CUTOFF: int = 3
+"""
+If the number of columns (M) in the `features` array is greater than this cutoff value,
+then by default, K-nearest-neighbors will use the "cosine" metric.
+The cosine metric is more suitable for high-dimensional data.
+Otherwise the "euclidean" distance will be used.
+
+"""
+ROW_COUNT_CUTOFF: int = 100
+"""
+Only affects settings where Euclidean metrics would be used by default.
+If the number of rows (N) in the `features` array is greater than this cutoff value,
+then by default, Euclidean distances are computed via the "euclidean" metric
+(implemented in sklearn for efficiency reasons).
+Otherwise, Euclidean distances are by default computed via
+the ``euclidean`` metric from scipy (slower but numerically more precise/accurate).
+"""
+
+
+# Metric decision functions
+def _euclidean_large_dataset() -> str:
+    return "euclidean"
+
+
+def _euclidean_small_dataset() -> Metric:
+    return euclidean
+
+
+def _cosine_metric() -> str:
+    return "cosine"
+
+
+
[docs]def decide_euclidean_metric(features: FeatureArray) -> Metric: + """ + Decide the appropriate Euclidean metric implementation based on the size of the dataset. + + Parameters + ---------- + features : + The input features array. + + Returns + ------- + metric : + A string or a callable representing a specific implementation of computing the euclidean distance. + + Note + ---- + A choice is made between two implementations + of the euclidean metric based on the number of rows in the feature array. + If the number of rows (N) in the feature array is greater than a predefined + cutoff value (ROW_COUNT_CUTOFF), the ``"euclidean"`` metric string is returned, so the distances + are computed by scikit-learn, which is faster on larger datasets. + Otherwise, the ``euclidean`` metric function from scipy is returned (slower, but numerically more precise). + + See also + -------- + ROW_COUNT_CUTOFF: The cutoff value for the number of rows in the feature array. + sklearn.metrics.pairwise.euclidean_distances: The euclidean metric function from scikit-learn. + scipy.spatial.distance.euclidean: The euclidean metric function from scipy. + """ + num_rows = features.shape[0] + if num_rows > ROW_COUNT_CUTOFF: + return _euclidean_large_dataset() + else: + return _euclidean_small_dataset()
+ + +# Main function to decide the metric +
[docs]def decide_default_metric(features: FeatureArray) -> Metric: + """ + Decide the KNN metric to be used based on the shape of the feature array. + + Parameters + ---------- + features : + The input feature array, with shape (N, M), where N is the number of samples and M is the number of features. + + Returns + ------- + metric : + The distance metric to be used for neighbor search. It can be either a string + representing the metric name ("cosine" or "euclidean") or a callable + representing the metric function from scipy (euclidean). + + Note + ---- + The decision of which metric to use is based on the shape of the feature array. + If the number of columns (M) in the feature array is greater than a predefined + cutoff value (HIGH_DIMENSION_CUTOFF), the "cosine" metric is used. This is because the cosine + metric is more suitable for high-dimensional data. + + Otherwise, a euclidean metric is used. + That is handled by the :py:meth:`~cleanlab.internal.neighbor.metric.decide_euclidean_metric` function. + + See Also + -------- + HIGH_DIMENSION_CUTOFF: The cutoff value for the number of columns in the feature array. + sklearn.metrics.pairwise.cosine_distances: The cosine metric function from scikit-learn + """ + if features.shape[1] > HIGH_DIMENSION_CUTOFF: + return _cosine_metric() + return decide_euclidean_metric(features)
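The two-level rule implemented by `decide_default_metric` and `decide_euclidean_metric` depends only on the shape of the feature array. The sketch below expresses that decision as a pure function of `(N, M)`, assuming the cutoff values shown above (`HIGH_DIMENSION_CUTOFF = 3`, `ROW_COUNT_CUTOFF = 100`). The function name and the label returned for the scipy branch are hypothetical; the real function returns the `scipy.spatial.distance.euclidean` callable itself.

```python
def sketch_default_metric(
    n_rows: int,
    n_cols: int,
    high_dimension_cutoff: int = 3,  # mirrors HIGH_DIMENSION_CUTOFF
    row_count_cutoff: int = 100,  # mirrors ROW_COUNT_CUTOFF
) -> str:
    """Hypothetical summary of decide_default_metric; returns a label, not the metric object."""
    if n_cols > high_dimension_cutoff:
        return "cosine"  # high-dimensional data
    if n_rows > row_count_cutoff:
        return "euclidean"  # string metric, computed by scikit-learn
    return "scipy.spatial.distance.euclidean"  # stands in for the scipy callable


for shape in [(50, 10), (500, 2), (100, 3)]:
    print(shape, "->", sketch_default_metric(*shape))
```

Both cutoffs use a strict `>` comparison, so an array with exactly 100 rows and 3 columns still falls through to the scipy implementation.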
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/internal/neighbor/search.html b/v2.6.6/_modules/cleanlab/internal/neighbor/search.html
new file mode 100644
index 000000000..66153bd36
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/internal/neighbor/search.html
@@ -0,0 +1,754 @@
+ cleanlab.internal.neighbor.search - cleanlab
Source code for cleanlab.internal.neighbor.search

+from __future__ import annotations
+from typing import TYPE_CHECKING
+
+from sklearn.neighbors import NearestNeighbors
+
+
+if TYPE_CHECKING:
+
+    from cleanlab.typing import Metric
+
+
+
[docs]def construct_knn(n_neighbors: int, metric: Metric, **knn_kwargs) -> NearestNeighbors: + """ + Constructs a k-nearest neighbors search object. You can implement a similar method to run cleanlab with your own approximate-KNN library. + + Parameters + ---------- + n_neighbors : + The number of nearest neighbors to consider. + metric : + The distance metric to use for computing distances between points. + See :py:mod:`~cleanlab.internal.neighbor.metric` for more information. + **knn_kwargs: + Additional keyword arguments to be passed to the search index constructor. + See https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html for more details on the available options. + + Returns + ------- + knn : + A k-nearest neighbors search object compatible with the scikit-learn NearestNeighbors class interface. + + Implements: + + - `fit` method: Accepts a feature array `X` to fit the model. + This enables subsequent neighbor searches on the data. + - `kneighbors` method: Finds the K-neighbors of a point, returning distances and indices of the k-nearest neighbors. Handles two scenarios: + 1. When a query array `features: np.ndarray` is provided, it returns the distances and indices for each point in the query array. + 2. When no query array is provided (`features = None`), it returns neighbors for each indexed point without considering the query point as its own neighbor. + Optionally, allows re-specification of the number of neighbors for each query point, defaulting to the constructor's value if not specified. + + Attributes: + + - `n_neighbors`: Number of neighbors to consider. + - `metric`: Distance metric used to compute distances between points. + - `metric_params`: Additional parameters for the distance metric function. + + Optional: + + - `kneighbors_graph` method: Not required but can be implemented for convenience. 
+ Responsibility shifted to :py:ref:`construct_knn_graph_from_index <cleanlab.internal.neighbor.neighbor.construct_knn_graph_from_index>`. + + Fitted Attributes: + + - `n_features_in_`: Number of features observed during fit. + - `effective_metric_params_`: Metric parameters used in distance computation. + - `effective_metric_`: Metric used for computing distances to neighbors. + - `n_samples_fit_`: Number of samples in the fitted data. + + Additional: + + - `__sklearn_is_fitted__`: Method returning a boolean indicating if the object is fitted, + useful for conducting an is_fitted validation, which verifies the presence of fitted attributes (whose names typically end with an underscore). + + + Implementing the interface specified above allows an alternative k-nearest neighbors library to be used in place of scikit-learn, or existing functionality to be modified, while remaining compatible with cleanlab. + + Note + ---- + The `metric` argument can be either a string naming a metric supported by scikit-learn (such as ``"cosine"`` or ``"euclidean"``) or a callable that takes two arguments (the two points) and returns the distance between them. + The additional keyword arguments (`**knn_kwargs`) are passed directly to the scikit-learn ``NearestNeighbors`` constructor. + + """ + sklearn_knn = NearestNeighbors(n_neighbors=n_neighbors, metric=metric, **knn_kwargs) + + return sklearn_knn
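To illustrate the interface described above, here is a minimal brute-force index that a custom replacement for `construct_knn` could return. The class name `BruteForceKNN` is hypothetical, and it supports only the Euclidean metric. It materializes all pairwise distances, so it is a sketch for small arrays, not a drop-in replacement for scikit-learn's `NearestNeighbors`.

```python
import numpy as np


class BruteForceKNN:
    """Hypothetical minimal k-NN index following the NearestNeighbors-style interface.

    Only Euclidean distances are supported in this sketch.
    """

    def __init__(self, n_neighbors: int = 5, metric: str = "euclidean", metric_params=None):
        self.n_neighbors = n_neighbors
        self.metric = metric
        self.metric_params = metric_params

    def fit(self, X):
        # Store the indexed points and set the fitted attributes listed in the docstring
        self._X = np.asarray(X, dtype=float)
        self.n_samples_fit_, self.n_features_in_ = self._X.shape
        self.effective_metric_ = self.metric
        self.effective_metric_params_ = dict(self.metric_params or {})
        return self

    def __sklearn_is_fitted__(self) -> bool:
        return hasattr(self, "n_samples_fit_")

    def kneighbors(self, X=None, n_neighbors=None, return_distance=True):
        k = self.n_neighbors if n_neighbors is None else n_neighbors
        query = self._X if X is None else np.asarray(X, dtype=float)
        # All pairwise Euclidean distances between query rows and indexed rows
        dists = np.linalg.norm(query[:, None, :] - self._X[None, :, :], axis=-1)
        if X is None:
            # Querying the indexed points themselves: never return a point as its own neighbor
            np.fill_diagonal(dists, np.inf)
        idx = np.argsort(dists, axis=1, kind="stable")[:, :k]
        if not return_distance:
            return idx
        return np.take_along_axis(dists, idx, axis=1), idx
```

With `X=None`, each indexed point is excluded from its own neighbor list, matching the second `kneighbors` scenario above. Passing an explicit query array returns neighbors among all indexed points.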
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/internal/outlier.html b/v2.6.6/_modules/cleanlab/internal/outlier.html
new file mode 100644
index 000000000..0a9ac91e5
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/internal/outlier.html
@@ -0,0 +1,807 @@
+ cleanlab.internal.outlier - cleanlab
Source code for cleanlab.internal.outlier

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Helper functions used internally for outlier detection tasks.
+"""
+
+from typing import Optional
+import numpy as np
+
+from cleanlab.internal.constants import EPSILON
+
+
+
[docs]def transform_distances_to_scores( + avg_distances: np.ndarray, t: int, scaling_factor: float +) -> np.ndarray: + """Returns an outlier score for each example based on its average distance to its k nearest neighbors. + + The transformation of a distance, :math:`d` , to a score, :math:`o` , is based on the following formula: + + .. math:: + o = \\exp\\left(-\\frac{dt}{s}\\right) + + where :math:`t` controls the sensitivity of the transformation and :math:`s` is the ``scaling_factor`` + (clamped below by ``EPSILON``). Since distances are non-negative, the scores lie in the range [0,1]. + + Parameters + ---------- + avg_distances : np.ndarray + An array of distances of shape ``(N,)``, where N is the number of examples. + Each entry represents an example's average distance to its k nearest neighbors. + + t : int + A sensitivity parameter that modulates the strength of the transformation from distances to scores. + Higher values of `t` result in more pronounced differentiation between the scores of examples + lying in the range [0,1]. + + scaling_factor : float + A scaling factor used to normalize the distances before they are converted into scores. A valid + scaling factor is any positive number. The choice of scaling factor should be based on the + distribution of distances between neighboring examples. A good rule of thumb is to set the + scaling factor to the median distance between neighboring examples. A lower scaling factor + results in more pronounced differentiation between the scores of examples lying in the range [0,1]. + + Returns + ------- + ood_features_scores : np.ndarray + An array of outlier scores of shape ``(N,)`` for N examples. + + Examples + -------- + >>> import numpy as np + >>> from cleanlab.internal.outlier import transform_distances_to_scores + >>> distances = np.array([[0.0, 0.1, 0.25], + ... 
[0.15, 0.2, 0.3]]) + >>> avg_distances = np.mean(distances, axis=1) + >>> transform_distances_to_scores(avg_distances, t=1, scaling_factor=1) + array([0.88988177, 0.80519832]) + """ + # Map ood_features_scores to range 0-1 with 0 = most concerning + return np.exp(-t * avg_distances / max(scaling_factor, EPSILON))
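The sketch below is a standalone numeric check of the transformation o = exp(-t * d / s), where `s` is `scaling_factor`. The `eps` guard stands in for cleanlab's `EPSILON` constant, whose actual value is not shown here, so treat it as an assumption. The first call reproduces the docstring example. The second applies the median rule of thumb for choosing the scaling factor.

```python
import numpy as np


def distances_to_scores(avg_distances: np.ndarray, t: float, scaling_factor: float, eps: float = 1e-6) -> np.ndarray:
    # o = exp(-t * d / s); the max() guard (assumed eps value) avoids dividing by ~0
    return np.exp(-t * avg_distances / max(scaling_factor, eps))


distances = np.array([[0.0, 0.1, 0.25], [0.15, 0.2, 0.3]])
avg_distances = distances.mean(axis=1)

# Same setting as the docstring example
scores_unit = distances_to_scores(avg_distances, t=1, scaling_factor=1)

# Rule of thumb from the docstring: scale by the median neighbor distance
scores_median = distances_to_scores(avg_distances, t=1, scaling_factor=float(np.median(avg_distances)))
```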
+ + +
[docs]def correct_precision_errors( + scores: np.ndarray, + avg_distances: np.ndarray, + metric: str, + C: int = 100, + p: Optional[int] = None, +): + """ + Set the score to one for every example whose average distance is below a tolerance threshold + derived from machine precision. Note that ``scores`` is modified in place and also returned. + + Parameters + ---------- + scores : + An array of scores of shape ``(N,)``, where N is the number of examples. + Each entry represents a score between 0 and 1. + + avg_distances : + An array of distances of shape ``(N,)``, where N is the number of examples. + Each entry represents an example's average distance to its k nearest neighbors. + + metric : + The metric used by the knn algorithm to calculate the distances. + It must be 'cosine', 'euclidean' or 'minkowski', otherwise this function does nothing. + + C : + Multiplier used to increase the tolerance of the acceptable precision differences. + It is a multiplicative factor of the machine epsilon that is used to calculate the tolerance. + The default of 100 should be sensible when the distances are small (below the order of 1). + + p : + This value is only used when metric is 'minkowski'. + A ValueError will be raised if metric is 'minkowski' and 'p' was not provided. + + Returns + ------- + fixed_scores : + An array of scores of shape ``(N,)`` for N examples with scores between 0 and 1. + """ + if metric == "cosine": + tolerance = C * np.finfo(np.float64).epsneg + elif metric == "euclidean": + tolerance = np.sqrt(C * np.finfo(np.float64).eps) + elif metric == "minkowski": + if p is None: + raise ValueError("When metric is 'minkowski' you must specify the 'p' parameter") + tolerance = (C * np.finfo(np.float64).eps) ** (1 / p) + else: + return scores + + candidates_mask = avg_distances < tolerance + scores[candidates_mask] = 1 + return scores
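The tolerance thresholds used by `correct_precision_errors` can be reproduced in isolation. The sketch below does so with a hypothetical helper, `precision_tolerance`. It uses `np.float64`, since the `np.float_` alias was removed in NumPy 2.0. Unlike the original, it snaps scores on a copy instead of modifying `scores` in place.

```python
import numpy as np


def precision_tolerance(metric: str, C: int = 100, p=None):
    """Hypothetical helper: distance below which a score is treated as exactly 1."""
    eps = np.finfo(np.float64).eps
    if metric == "cosine":
        return C * np.finfo(np.float64).epsneg
    if metric == "euclidean":
        return np.sqrt(C * eps)
    if metric == "minkowski":
        if p is None:
            raise ValueError("p is required for the minkowski metric")
        return (C * eps) ** (1 / p)
    return None  # other metrics: no correction


scores = np.array([0.2, 0.7])
avg_distances = np.array([1e-9, 0.5])  # the first example is a near-exact duplicate
tol = precision_tolerance("euclidean")

fixed = scores.copy()
fixed[avg_distances < tol] = 1.0
```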
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/internal/token_classification_utils.html b/v2.6.6/_modules/cleanlab/internal/token_classification_utils.html
new file mode 100644
index 000000000..272d643c6
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/internal/token_classification_utils.html
@@ -0,0 +1,970 @@
+ cleanlab.internal.token_classification_utils - cleanlab
Source code for cleanlab.internal.token_classification_utils

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Helper methods used internally in cleanlab.token_classification
+"""
+from __future__ import annotations
+
+import re
+import string
+import numpy as np
+from termcolor import colored
+from typing import List, Optional, Callable, Tuple, TypeVar, TYPE_CHECKING
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+
+    T = TypeVar("T", bound=npt.NBitBase)
+
+
+
[docs]def get_sentence(words: List[str]) -> str: + """ + Get sentence formed by a list of words with minor processing for readability + + Parameters + ---------- + words: + list of word-level tokens + + Returns + ---------- + sentence: + sentence formed by list of word-level tokens + + Examples + -------- + >>> from cleanlab.internal.token_classification_utils import get_sentence + >>> words = ["This", "is", "a", "sentence", "."] + >>> get_sentence(words) + 'This is a sentence.' + """ + sentence = "" + for word in words: + if word not in string.punctuation or word in ["-", "("]: + word = " " + word + sentence += word + sentence = sentence.replace(" '", "'").replace("( ", "(").strip() + return sentence
+ + +
[docs]def filter_sentence( + sentences: List[str], + condition: Optional[Callable[[str], bool]] = None, +) -> Tuple[List[str], List[bool]]: + """ + Filter sentences based on a condition and return the filter mask. + + Parameters + ---------- + sentences: + list of sentences + + condition: + function that returns True for sentences to keep. By default, sentences are kept if they + are longer than one character and contain no "#" + + Returns + --------- + sentences: + list of filtered sentences + + mask: + boolean mask such that `mask[i] == True` if the i'th sentence is included in the + filtered sentences, otherwise `mask[i] == False` + + Examples + -------- + >>> from cleanlab.internal.token_classification_utils import filter_sentence + >>> sentences = ["Short sentence.", "This is a longer sentence."] + >>> condition = lambda x: len(x.split()) > 2 + >>> long_sentences, _ = filter_sentence(sentences, condition) + >>> long_sentences + ['This is a longer sentence.'] + >>> document = ["# Headline", "Sentence 1.", "&", "Sentence 2."] + >>> sentences, mask = filter_sentence(document) + >>> sentences, mask + (['Sentence 1.', 'Sentence 2.'], [False, True, False, True]) + """ + if not condition: + condition = lambda sentence: len(sentence) > 1 and "#" not in sentence + mask = list(map(condition, sentences)) + sentences = [sentence for m, sentence in zip(mask, sentences) if m] + return sentences, mask
+ + +
[docs]def process_token(token: str, replace: List[Tuple[str, str]] = [("#", "")]) -> str: + """ + Replaces special characters in a token. + + Parameters + ---------- + token: + token which potentially contains special characters + + replace: + list of tuples `(s1, s2)`, where all occurrences of s1 are replaced by s2 + + Returns + --------- + processed_token: + token with its special characters replaced + + Note + ---- + All replacements are applied in a single pass, so each rule only applies to characters of the original input token (the output of one rule is never rewritten by another). + + Examples + -------- + >>> from cleanlab.internal.token_classification_utils import process_token + >>> token = "#Comment" + >>> process_token(token) + 'Comment' + + Specify custom replacement rules + + >>> replace = [("C", "a"), ("a", "C")] + >>> process_token("Cleanlab", replace) + 'aleCnlCb' + """ + replace_dict = {re.escape(k): v for (k, v) in replace} + pattern = "|".join(replace_dict.keys()) + compiled_pattern = re.compile(pattern) + replacement = lambda match: replace_dict[re.escape(match.group(0))] + processed_token = compiled_pattern.sub(replacement, token) + return processed_token
+ + +
[docs]def mapping(entities: List[int], maps: List[int]) -> List[int]: + """ + Maps each entity in a list to its corresponding entity as given by `maps` + + Parameters + ---------- + entities: + a list of given entities + + maps: + a list of mapped entities, such that the i'th indexed entity is mapped to `maps[i]` + + Returns + --------- + mapped_entities: + a list of mapped entities + + Examples + -------- + >>> from cleanlab.internal.token_classification_utils import mapping + >>> unique_entities = [0, 1, 2, 3, 4] # ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"] + >>> maps = [0, 1, 1, 2, 2] # ["O", "PER", "PER", "LOC", "LOC"] + >>> mapping(unique_entities, maps) # ["O", "PER", "PER", "LOC", "LOC"] + [0, 1, 1, 2, 2] + >>> mapping([0, 0, 4, 4, 3, 4, 0, 2], maps) # ["O", "O", "LOC", "LOC", "LOC", "LOC", "O", "PER"] + [0, 0, 2, 2, 2, 2, 0, 1] + """ + f = lambda x: maps[x] + return list(map(f, entities))
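In practice, `maps` is often derived from BIO-style tag names. The following minimal sketch assumes tags of the form `B-XXX`/`I-XXX` plus a bare `O`. The helper `build_maps` is hypothetical and not part of cleanlab. It builds the `maps` list used in the docstring and then applies it with the same indexing that `mapping` performs:

```python
from typing import List, Tuple


def build_maps(tags: List[str]) -> Tuple[List[int], List[str]]:
    # Hypothetical helper: strip "B-"/"I-" prefixes and number the entity
    # types in order of first appearance.
    entity_names: List[str] = []
    maps: List[int] = []
    for tag in tags:
        name = tag.split("-", 1)[-1]  # "B-PER" -> "PER", "O" -> "O"
        if name not in entity_names:
            entity_names.append(name)
        maps.append(entity_names.index(name))
    return maps, entity_names


tags = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
maps, names = build_maps(tags)
print(maps, names)  # [0, 1, 1, 2, 2] ['O', 'PER', 'LOC']

# Same element-wise lookup as mapping(entities, maps).
mapped = [maps[x] for x in [0, 0, 4, 4, 3, 4, 0, 2]]
print(mapped)  # [0, 0, 2, 2, 2, 2, 0, 1]
```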
+ + +
[docs]def merge_probs( + probs: npt.NDArray["np.floating[T]"], maps: List[int] +) -> npt.NDArray["np.floating[T]"]: + """ + Merges model-predicted probabilities according to the desired mapping + + Parameters + ---------- + probs: + A 2D np.array of shape `(N, K)`, where N is the number of tokens, and K is the number of classes for the model + + maps: + a list of mapped indices, such that the probability of the token being in the i'th class is mapped to the + `maps[i]` index. If `maps[i] == -1`, the i'th column of `probs` is ignored. If `-1 in maps`, the + returned probabilities are re-normalized so that each row sums to 1. + + Returns + --------- + probs_merged: + A 2D np.array of shape ``(N, K')``, where `K'` is the number of new classes. Probabilities are merged and + re-normalized if necessary. + + Examples + -------- + >>> import numpy as np + >>> from cleanlab.internal.token_classification_utils import merge_probs + >>> probs = np.array([ + ... [0.55, 0.0125, 0.0375, 0.1, 0.3], + ... [0.1, 0.8, 0, 0.075, 0.025], + ... ]) + >>> maps = [0, 1, 1, 2, 2] + >>> merge_probs(probs, maps) + array([[0.55, 0.05, 0.4 ], + [0.1 , 0.8 , 0.1 ]]) + """ + old_classes = probs.shape[1] + map_size = np.max(maps) + 1 + probs_merged = np.zeros([len(probs), map_size], dtype=probs.dtype.type) + + for i in range(old_classes): + if maps[i] >= 0: + probs_merged[:, maps[i]] += probs[:, i] + if -1 in maps: + row_sums = probs_merged.sum(axis=1) + probs_merged /= row_sums[:, np.newaxis] + return probs_merged
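The docstring example never exercises the `-1` branch. Below is a minimal pure-Python sketch of the same column-merging logic. `merge_probs_sketch` is a hypothetical name, and plain lists stand in for NumPy arrays. It shows a column being dropped and the remaining mass being re-normalized:

```python
from typing import List


def merge_probs_sketch(probs: List[List[float]], maps: List[int]) -> List[List[float]]:
    # Hypothetical pure-Python version of the column-merging logic.
    num_new_classes = max(maps) + 1
    merged = []
    for row in probs:
        new_row = [0.0] * num_new_classes
        for old_class, new_class in enumerate(maps):
            if new_class >= 0:  # columns mapped to -1 are dropped
                new_row[new_class] += row[old_class]
        if -1 in maps:  # dropped mass means the row no longer sums to 1
            total = sum(new_row)
            new_row = [p / total for p in new_row]
        merged.append(new_row)
    return merged


# Column 1 is dropped, so the remaining columns are re-normalized.
print(merge_probs_sketch([[0.375, 0.5, 0.125]], [0, -1, 1]))  # [[0.75, 0.25]]
# Without -1, columns are only summed.
print(merge_probs_sketch([[0.25, 0.25, 0.5]], [0, 0, 1]))  # [[0.5, 0.5]]
```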
+ + +
[docs]def color_sentence(sentence: str, word: str) -> str: + """ + Searches for a given token in the sentence and returns the sentence with every occurrence of the token colored red + + Parameters + ---------- + sentence: + a sentence where the word is searched + + word: + keyword to find in `sentence`. Assumes the word exists in the sentence. + + Returns + --------- + colored_sentence: + `sentence` where every occurrence of the word is colored red, using ``termcolor.colored`` + + Examples + -------- + >>> from cleanlab.internal.token_classification_utils import color_sentence + >>> sentence = "This is a sentence." + >>> word = "sentence" + >>> color_sentence(sentence, word) + 'This is a \x1b[31msentence\x1b[0m.' + + Also works for multiple occurrences of the word: + + >>> document = "This is a sentence. This is another sentence." + >>> word = "sentence" + >>> color_sentence(document, word) + 'This is a \x1b[31msentence\x1b[0m. This is another \x1b[31msentence\x1b[0m.' + """ + colored_word = colored(word, "red") + return _replace_sentence(sentence=sentence, word=word, new_word=colored_word)
+ + +def _replace_sentence(sentence: str, word: str, new_word: str) -> str: + """ + Searches for a given token in the sentence and returns the sentence where every occurrence of the token has been replaced by + `new_word`. + + Parameters + ---------- + sentence: + a sentence where the word is searched + + word: + keyword to find in `sentence`. Assumes the word exists in the sentence. + + new_word: + the word to replace the keyword with + + Returns + --------- + new_sentence: + `sentence` where every occurrence of the word is replaced by `new_word` + """ + + new_sentence, number_of_substitutions = re.subn( + r"\b{}\b".format(re.escape(word)), new_word, sentence + ) + if number_of_substitutions == 0: + # Fall back to plain substring replacement if no whole-word match was found + new_sentence = sentence.replace(word, new_word) + return new_sentence +
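The two-stage strategy in `_replace_sentence` can be sketched as follows. `replace_sentence_sketch` is a hypothetical, standard-library-only mirror of the logic above, not cleanlab's function. The first call shows whole-word matching leaving `concatenate` untouched. The second shows a token where `\b` can never match (it starts with a non-word character), so the substring fallback is used:

```python
import re


def replace_sentence_sketch(sentence: str, word: str, new_word: str) -> str:
    # Hypothetical mirror of _replace_sentence: prefer whole-word matches.
    pattern = r"\b{}\b".format(re.escape(word))
    new_sentence, n = re.subn(pattern, new_word, sentence)
    if n == 0:
        # \b needs a word character on one side, so a token such as "$5"
        # never matches; fall back to plain substring replacement.
        new_sentence = sentence.replace(word, new_word)
    return new_sentence


print(replace_sentence_sketch("cat concatenate cat", "cat", "dog"))
# dog concatenate dog
print(replace_sentence_sketch("costs $5 today", "$5", "five dollars"))
# costs five dollars today
```

Like the original, this passes `new_word` as a regex replacement string, so any backslashes in it would be interpreted as escapes.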
+
+
+
+ + +
+
+ + Made with Sphinx and @pradyunsg's + + Furo + +
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/internal/util.html b/v2.6.6/_modules/cleanlab/internal/util.html new file mode 100644 index 000000000..98f1d9d83 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/internal/util.html @@ -0,0 +1,1449 @@ + cleanlab.internal.util - cleanlab
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.internal.util

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Ancillary helper methods used internally throughout this package; mostly related to Confident Learning algorithms.
+"""
+
+import warnings
+from typing import Optional, Tuple, Union
+
+import numpy as np
+import pandas as pd
+
+from cleanlab.internal.constants import FLOATING_POINT_COMPARISON, TINY_VALUE
+from cleanlab.internal.validation import labels_to_array
+from cleanlab.typing import DatasetLike, LabelLike
+
+
+
[docs]def remove_noise_from_class(noise_matrix, class_without_noise) -> np.ndarray: + """A helper function in the setting of PU learning. + Sets all P(label=class_without_noise|true_label=any_other_class) = 0 + in noise_matrix for pulearning setting, where we have + generalized the positive class in PU learning to be any + class of choosing, denoted by class_without_noise. + + Parameters + ---------- + noise_matrix : np.ndarray of shape (K, K), K = number of classes + A conditional probability matrix of the form P(label=k_s|true_label=k_y) containing + the fraction of examples in every class, labeled as every other class. + Assumes columns of noise_matrix sum to 1. + + class_without_noise : int + Integer value of the class that has no noise. Traditionally, + this is 1 (positive) for PU learning.""" + + # Number of classes + K = len(noise_matrix) + + cwn = class_without_noise + x = np.copy(noise_matrix) + + # Set P( labels = cwn | y != cwn) = 0 (no noise) + x[cwn, [i for i in range(K) if i != cwn]] = 0.0 + + # Normalize columns by increasing diagonal terms + # Ensures noise_matrix is a valid probability matrix + for i in range(K): + x[i][i] = 1 - float(np.sum(x[:, i]) - x[i][i]) + + return x
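The effect on a small noise matrix can be checked by hand. This minimal pure-Python sketch, `remove_noise_from_class_sketch` (a hypothetical name), follows the steps above on a 2x2 matrix whose columns sum to 1. It first zeroes the noise rates into the noise-free class and then restores the column sums through the diagonal:

```python
from typing import List


def remove_noise_from_class_sketch(
    noise_matrix: List[List[float]], class_without_noise: int
) -> List[List[float]]:
    # Hypothetical pure-Python version; rows are given labels, columns are true labels.
    K = len(noise_matrix)
    cwn = class_without_noise
    x = [row[:] for row in noise_matrix]  # copy, like np.copy
    # P(label = cwn | true_label = j) = 0 for every other class j.
    for j in range(K):
        if j != cwn:
            x[cwn][j] = 0.0
    # Restore column sums of 1 by adjusting each diagonal term.
    for i in range(K):
        off_diagonal = sum(x[r][i] for r in range(K)) - x[i][i]
        x[i][i] = 1.0 - off_diagonal
    return x


print(remove_noise_from_class_sketch([[0.75, 0.25], [0.25, 0.75]], class_without_noise=1))
# [[1.0, 0.25], [0.0, 0.75]]
```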
+ + +
[docs]def clip_noise_rates(noise_matrix) -> np.ndarray: + """Clip all noise rates to the proper range [0,1), but + do not modify the diagonal terms because they are not + noise rates. + + ASSUMES noise_matrix columns sum to 1. + + Parameters + ---------- + noise_matrix : np.ndarray of shape (K, K), K = number of classes + A conditional probability matrix containing the fraction of + examples in every class, labeled as every other class. + Diagonal terms are not noise rates, but are the consistency probabilities P(label=k|true_label=k). + Assumes columns of noise_matrix sum to 1""" + + def clip_noise_rate_range(noise_rate) -> float: + """Clip noise rate P(label=k'|true_label=k) or P(true_label=k|label=k') + into proper range [0,1)""" + return min(max(noise_rate, 0.0), 0.9999) + + # Vectorize clip_noise_rate_range for efficiency with np.ndarrays. + vectorized_clip = np.vectorize(clip_noise_rate_range) + + # Preserve the diagonal because its entries are not noise rates. + diagonal = np.diagonal(noise_matrix) + + # Clip all noise rates (efficiently). + noise_matrix = vectorized_clip(noise_matrix) + + # Put unmodified diagonal back. + np.fill_diagonal(noise_matrix, diagonal) + + # Re-normalize noise_matrix so that columns sum to one. + noise_matrix = noise_matrix / np.clip(noise_matrix.sum(axis=0), a_min=TINY_VALUE, a_max=None) + return noise_matrix
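Here is a minimal pure-Python sketch of the clip-then-renormalize steps. `clip_noise_rates_sketch` is a hypothetical name. The value `1e-16` is an assumed stand-in for cleanlab's `TINY_VALUE` constant, whose actual value is not shown here. The input has a negative off-diagonal "noise rate" while its columns still sum to 1:

```python
from typing import List


def clip_noise_rates_sketch(
    noise_matrix: List[List[float]], upper: float = 0.9999
) -> List[List[float]]:
    # Hypothetical pure-Python version; 1e-16 stands in for TINY_VALUE (assumed).
    K = len(noise_matrix)
    # Clip off-diagonal noise rates into [0, upper]; diagonal entries are kept as-is.
    clipped = [
        [v if i == j else min(max(v, 0.0), upper) for j, v in enumerate(row)]
        for i, row in enumerate(noise_matrix)
    ]
    # Re-normalize so that every column sums to 1.
    col_sums = [max(sum(clipped[i][j] for i in range(K)), 1e-16) for j in range(K)]
    return [[clipped[i][j] / col_sums[j] for j in range(K)] for i in range(K)]


print(clip_noise_rates_sketch([[0.5, -0.25], [0.5, 1.25]]))
# [[0.5, 0.0], [0.5, 1.0]]
```

Because the diagonal is never clipped, the out-of-range diagonal entry `1.25` only comes back into range through the final column re-normalization.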
+ + +
[docs]def clip_values(x, low=0.0, high=1.0, new_sum=None) -> np.ndarray: + """Clip all values in x to range [low,high], then rescale them. + Preserves the sum of x (or rescales to ``new_sum`` if provided); + after rescaling, values may fall outside [low,high]. + + Parameters + ---------- + x : np.ndarray + An array / list of values to be clipped. + + low : float + values in x less than 'low' are clipped to this value + + high : float + values in x greater than 'high' are clipped to this value + + new_sum : float + if provided, x is rescaled after clipping to sum to new_sum instead of its original sum + + Returns + ------- + x : np.ndarray + A list of clipped values, summing to the same sum as x (or to new_sum).""" + + def clip_range(a, low=low, high=high): + """Clip a into range [low,high]""" + return min(max(a, low), high) + + vectorized_clip = np.vectorize( + clip_range + ) # Vectorize clip_range for efficiency with np.ndarrays + prev_sum = sum(x) if new_sum is None else new_sum # Target sum (previous sum unless new_sum is given) + x = vectorized_clip(x) # Clip all values (efficiently) + x = ( + x * prev_sum / np.clip(float(sum(x)), a_min=TINY_VALUE, a_max=None) + ) # Re-normalize values to sum to the target sum + return x
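The rescaling step means `high` is not a hard upper bound on the output. `clip_values_sketch` is a hypothetical pure-Python version of the logic above. As before, `1e-16` is an assumed stand-in for `TINY_VALUE`. Its output includes a value above the default `high=1.0`:

```python
from typing import List, Optional


def clip_values_sketch(
    x: List[float], low: float = 0.0, high: float = 1.0, new_sum: Optional[float] = None
) -> List[float]:
    # Hypothetical pure-Python version of clip_values.
    target = sum(x) if new_sum is None else new_sum  # sum to restore after clipping
    clipped = [min(max(v, low), high) for v in x]
    total = max(float(sum(clipped)), 1e-16)  # 1e-16 stands in for TINY_VALUE (assumed)
    # Rescale so the clipped values sum to `target`.
    return [v * target / total for v in clipped]


print(clip_values_sketch([-1.0, 0.25, 2.75]))  # [0.0, 0.4, 1.6]
```

Clipping gives `[0.0, 0.25, 1.0]` (sum 1.25). Restoring the original sum of 2.0 then scales every value by 1.6, which pushes the last entry back above `high`.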
+ + +
[docs]def value_counts(x, *, num_classes: Optional[int] = None, multi_label=False) -> np.ndarray: + """Returns an np.ndarray of shape (K,) with the + value counts for every unique item in the labels list/array, + where K is the number of unique entries in labels (or ``num_classes``, if provided). + + Works for both single-labeled and multi-labeled data. + + Parameters + ---------- + x : list or np.ndarray (one dimensional) + A list of discrete objects, such as integers or strings, for + example, class labels 'y' when training a classifier. + e.g. ["dog","dog","cat"] or [1,2,0,1,1,0,2] + + num_classes : int (default: None) + Setting this fills the value counts for missing classes with zeros. + For example, if x = [0, 0, 1, 1, 3] then setting ``num_classes=5`` returns + [2, 2, 0, 1, 0] whereas setting ``num_classes=None`` would return [2, 2, 1]. This assumes + your labels come from the set [0, 1, ..., num_classes-1] even if some classes are missing. + + multi_label : bool, optional + If ``True``, x should be an iterable (e.g. list) of iterables, containing a + list of labels for each example, instead of just a single label. + The multi-label setting supports classification tasks where an example has 1 or more labels. + Example of a multi-labeled `x` input: ``[[0,1], [1], [0,2], [0,1,2], [0], [1], ...]``. + The labels of all examples are flattened before counting, so the returned counts + sum to the total number of labels rather than the number of examples.""" + + # Efficient method if x is pd.Series, np.ndarray, or list + if multi_label: + x = [z for lst in x for z in lst] # Flatten + unique_classes, counts = np.unique(x, return_counts=True) + + # Early exit if num_classes is not provided or redundant + if num_classes is None or num_classes == len(unique_classes): + return counts + + # Else, there are missing classes + labels_are_integers = np.issubdtype(np.array(x).dtype, np.integer) + if labels_are_integers and num_classes <= np.max(unique_classes): + raise ValueError(f"Required: num_classes > max(x), but {num_classes} <= {np.max(x)}.") + + # Add zero counts for all missing classes in [0, 1,..., num_classes-1] + total_counts = np.zeros(num_classes, dtype=int) + # Fill in counts for classes that are present. + # If labels are integers, unique_classes can be used directly as indices to place counts + # into the correct positions in total_counts array. + # If labels are strings, use a slice to fill counts sequentially since strings do not map to indices. + count_ids = unique_classes if labels_are_integers else slice(len(unique_classes)) + total_counts[count_ids] = counts + + # Return counts with zeros for all missing classes. + return total_counts
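The three behaviours described above can be reproduced with `collections.Counter`. These are the plain counts, zero-filling via `num_classes`, and per-label counting in the multi-label case. `value_counts_sketch` is a hypothetical pure-Python version that only handles integer labels. It keeps the same early exit as the original:

```python
from collections import Counter
from typing import List, Optional, Sequence


def value_counts_sketch(
    x: Sequence, num_classes: Optional[int] = None, multi_label: bool = False
) -> List[int]:
    # Hypothetical pure-Python version, restricted to integer labels.
    if multi_label:
        x = [z for labels in x for z in labels]  # flatten: counts are per label, not per example
    counts = Counter(x)
    # Same early exit as value_counts: nothing to fill in.
    if num_classes is None or num_classes == len(counts):
        return [counts[k] for k in sorted(counts)]
    if max(counts) >= num_classes:
        raise ValueError(f"Required: num_classes > max(x), but {num_classes} <= {max(counts)}.")
    # Zero counts for classes in [0, num_classes - 1] that never occur.
    return [counts.get(k, 0) for k in range(num_classes)]


print(value_counts_sketch([0, 0, 1, 1, 3]))                          # [2, 2, 1]
print(value_counts_sketch([0, 0, 1, 1, 3], num_classes=5))           # [2, 2, 0, 1, 0]
print(value_counts_sketch([[0, 1], [1], [0, 2]], multi_label=True))  # [2, 2, 1]
```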
+ + +
[docs]def value_counts_fill_missing_classes(x, num_classes, *, multi_label=False) -> np.ndarray: + """Same as ``internal.util.value_counts`` but requires that num_classes is provided and + always fills missing classes with zero counts. + + See ``internal.util.value_counts`` for parameter docstrings.""" + + return value_counts(x, num_classes=num_classes, multi_label=multi_label)
+ + +
[docs]def get_missing_classes(labels, *, pred_probs=None, num_classes=None, multi_label=False): + """Find which classes are present in ``pred_probs`` but not present in ``labels``. + + See ``count.compute_confident_joint`` for parameter docstrings.""" + if pred_probs is None and num_classes is None: + raise ValueError("Both pred_probs and num_classes are None. You must provide exactly one.") + if pred_probs is not None and num_classes is not None: + raise ValueError("Both pred_probs and num_classes are not None. Only one may be provided.") + if num_classes is None: + num_classes = pred_probs.shape[1] + unique_classes = get_unique_classes(labels, multi_label=multi_label) + return sorted(set(range(num_classes)).difference(unique_classes))
+ + +
[docs]def round_preserving_sum(iterable) -> np.ndarray: + """Rounds an iterable of floats while retaining the original summed value. + + The while loop in this code was adapted from: + https://github.com/cgdeboer/iteround + + Parameters + ---------- + iterable : list<float> or np.ndarray<float> + An iterable of floats + + Returns + ------- + np.ndarray<int> + The iterable rounded to int, with the rounded sum of the original floats preserved.""" + + floats = np.asarray(iterable, dtype=float) + ints = floats.round() + orig_sum = np.sum(floats).round() + int_sum = np.sum(ints).round() + # Adjust the integers so that they sum to orig_sum + while abs(int_sum - orig_sum) > FLOATING_POINT_COMPARISON: + diff = np.round(orig_sum - int_sum) + increment = -1 if int(diff < 0.0) else 1 + changes = min(int(abs(diff)), len(iterable)) + # Order indices by rounding error and adjust the first `changes` of them by `increment`. + indices = np.argsort(floats - ints)[::-increment][:changes] + for i in indices: + ints[i] = ints[i] + increment + int_sum = np.sum(ints).round() + return ints.astype(int)
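The adjustment loop is a form of largest-remainder rounding. `round_preserving_sum_sketch` is a hypothetical single-pass simplification in pure Python, not the cleanlab code itself. After rounding each value, it nudges the entries whose rounding error points most strongly in the needed direction:

```python
from typing import List, Sequence


def round_preserving_sum_sketch(values: Sequence[float]) -> List[int]:
    # Hypothetical single-pass simplification of round_preserving_sum.
    ints = [round(v) for v in values]
    target = round(sum(values))  # rounded sum of the original floats
    diff = target - sum(ints)
    step = 1 if diff > 0 else -1
    # Largest (value - int) first when incrementing, smallest first when decrementing.
    order = sorted(range(len(values)), key=lambda i: values[i] - ints[i], reverse=step > 0)
    for i in order[: abs(diff)]:
        ints[i] += step
    return ints


print(round_preserving_sum_sketch([0.4, 0.3, 0.3, 2.0]))  # [1, 0, 0, 2]
```

Plain rounding gives `[0, 0, 0, 2]` (sum 2), while the floats sum to 3. The entry rounded down the most, `0.4`, is therefore bumped up to restore the total.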
+ + +
[docs]def round_preserving_row_totals(confident_joint) -> np.ndarray: + """Rounds confident_joint cj to type int + while preserving the totals of each row. + Assumes that cj is a 2D np.ndarray of type float. + + Parameters + ---------- + confident_joint : 2D np.ndarray<float> of shape (K, K) + See compute_confident_joint docstring for details. + + Returns + ------- + confident_joint : 2D np.ndarray<int> of shape (K,K) + Rounded to int while preserving row totals.""" + + return np.apply_along_axis( + func1d=round_preserving_sum, + axis=1, + arr=confident_joint, + ).astype(int)
+ + +
[docs]def estimate_pu_f1(s, prob_s_eq_1) -> float: + """Computes Claesen's estimate of f1 in the pulearning setting. + + Parameters + ---------- + s : iterable (list or np.ndarray) + Binary label (whether each element is labeled or not) in pu learning. + + prob_s_eq_1 : iterable (list or np.ndarray) + The probability, for each example, that it has label=1, i.e. P(label=1|x). + Examples with probability >= 0.5 are predicted positive. + + Returns + ------- + float + Claesen's estimate for f1 in the pulearning setting, or ``np.nan`` if no example is predicted positive.""" + + pred = np.asarray(prob_s_eq_1) >= 0.5 + true_positives = sum((np.asarray(s) == 1) & (np.asarray(pred) == 1)) + all_positives = sum(s) + recall = true_positives / float(all_positives) + frac_positive = sum(pred) / float(len(s)) + return recall**2 / (2.0 * frac_positive) if frac_positive != 0 else np.nan
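A tiny worked example makes the formula in the code above concrete. `estimate_pu_f1_sketch` is a hypothetical pure-Python transcription of the same arithmetic, including the factor of 2 in the denominator as written in the code. In the example, one of the two labeled examples is predicted positive, so recall is 0.5. Half of all examples are predicted positive, so the fraction predicted positive is 0.5. The result is 0.5² / (2 · 0.5) = 0.25:

```python
import math
from typing import Sequence


def estimate_pu_f1_sketch(s: Sequence[int], prob_s_eq_1: Sequence[float]) -> float:
    # Hypothetical pure-Python transcription of estimate_pu_f1.
    pred = [p >= 0.5 for p in prob_s_eq_1]  # threshold at 0.5
    true_positives = sum(1 for s_i, p_i in zip(s, pred) if s_i == 1 and p_i)
    recall = true_positives / float(sum(s))  # recall on the labeled (s == 1) examples
    frac_positive = sum(pred) / float(len(s))  # fraction predicted positive
    # Same formula as the code above: recall**2 / (2 * frac_positive).
    return recall**2 / (2.0 * frac_positive) if frac_positive != 0 else math.nan


print(estimate_pu_f1_sketch([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.25
```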
+ + +
[docs]def confusion_matrix(true, pred) -> np.ndarray: + """Implements a confusion matrix for true labels + and predicted labels. true and pred MUST BE the same length + and have the same distinct set of class labels represented. + + Results are identical (with similar computation time) to + ``sklearn.metrics.confusion_matrix``, except that the counts are returned as floats. + + Unlike sklearn's version, this function avoids the dependency on sklearn. + + Parameters + ---------- + true : np.ndarray 1d + Contains labels. + Assumes true and pred contain the same set of distinct labels. + + pred : np.ndarray 1d + A discrete vector of noisy labels, i.e. some labels may be erroneous. + *Format requirements*: for a dataset with K classes, labels must be in {0,1,...,K-1}. + + Returns + ------- + confusion_matrix : np.ndarray (2D) + matrix of confusion counts with true on rows and pred on columns.""" + + assert len(true) == len(pred) + true_classes = np.unique(true) + pred_classes = np.unique(pred) + K_true = len(true_classes) # Number of classes in true + K_pred = len(pred_classes) # Number of classes in pred + map_true = dict(zip(true_classes, range(K_true))) + map_pred = dict(zip(pred_classes, range(K_pred))) + + result = np.zeros((K_true, K_pred)) + for i in range(len(true)): + result[map_true[true[i]]][map_pred[pred[i]]] += 1 + + return result
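The mapping-then-counting approach works without NumPy as well. `confusion_matrix_sketch` is a hypothetical pure-Python version. Like the function above, it orders rows by sorted true labels and columns by sorted predicted labels. Unlike the function above, it returns integer counts:

```python
from typing import Hashable, List, Sequence


def confusion_matrix_sketch(true: Sequence[Hashable], pred: Sequence[Hashable]) -> List[List[int]]:
    # Hypothetical pure-Python version: rows = sorted true labels, columns = sorted predicted labels.
    assert len(true) == len(pred)
    true_index = {label: i for i, label in enumerate(sorted(set(true)))}
    pred_index = {label: j for j, label in enumerate(sorted(set(pred)))}
    result = [[0] * len(pred_index) for _ in true_index]
    for t, p in zip(true, pred):
        result[true_index[t]][pred_index[p]] += 1
    return result


print(confusion_matrix_sketch([0, 1, 1, 2], [0, 1, 2, 2]))
# [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
```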
+ + + + + + + + + + + + + + +
[docs]def compress_int_array(int_array, num_possible_values) -> np.ndarray: + """Compresses dtype of np.ndarray<int> if num_possible_values is small enough.""" + try: + compressed_type = None + if num_possible_values < np.iinfo(np.dtype("int16")).max: + compressed_type = "int16" + elif num_possible_values < np.iinfo(np.dtype("int32")).max: # pragma: no cover + compressed_type = "int32" # pragma: no cover + if compressed_type is not None: + int_array = int_array.astype(compressed_type) + return int_array + except Exception: # int_array may not even be numpy array, keep as is then + return int_array
+ + +
[docs]def train_val_split( + X, labels, train_idx, holdout_idx +) -> Tuple[DatasetLike, DatasetLike, LabelLike, LabelLike]: + """Splits data into training/validation sets based on given indices""" + labels_train, labels_holdout = ( + labels[train_idx], + labels[holdout_idx], + ) # labels are always np.ndarray + split_completed = False + if isinstance(X, (pd.DataFrame, pd.Series)): + X_train, X_holdout = X.iloc[train_idx], X.iloc[holdout_idx] + split_completed = True + if not split_completed: + try: # check if X is pytorch Dataset object using lazy import + import torch + + if isinstance(X, torch.utils.data.Dataset): # special splitting for pytorch Dataset + X_train = torch.utils.data.Subset(X, train_idx) + X_holdout = torch.utils.data.Subset(X, holdout_idx) + split_completed = True + except Exception: + pass + if not split_completed: + try: # check if X is tensorflow Dataset object using lazy import + import tensorflow + + if isinstance(X, tensorflow.data.Dataset): # special splitting for tensorflow Dataset + X_train = extract_indices_tf(X, train_idx, allow_shuffle=True) + X_holdout = extract_indices_tf(X, holdout_idx, allow_shuffle=False) + split_completed = True + except Exception: + pass + if not split_completed: + try: + X_train, X_holdout = X[train_idx], X[holdout_idx] + except Exception: + raise ValueError( + "Cleanlab cannot split this form of dataset (required for cross-validation). " + "Try a different data format, " + "or implement the cross-validation yourself and instead provide out-of-sample `pred_probs`" + ) + + return X_train, X_holdout, labels_train, labels_holdout
+ + +
[docs]def subset_X_y(X, labels, mask) -> Tuple[DatasetLike, LabelLike]: + """Extracts subset of features/labels where mask is True""" + labels = subset_labels(labels, mask) + X = subset_data(X, mask) + return X, labels
+ + +
[docs]def subset_labels(labels, mask) -> Union[list, np.ndarray, pd.Series]: + """Extracts subset of labels where mask is True""" + try: # filtering labels as if it is array or DataFrame + return labels[mask] + except Exception: + try: # filtering labels as if it is list + return [l for idx, l in enumerate(labels) if mask[idx]] + except Exception: + raise TypeError("labels must be 1D np.ndarray, list, or pd.Series.")
+ + +
[docs]def subset_data(X, mask) -> DatasetLike: + """Extracts subset of data examples where mask (np.ndarray) is True""" + try: + import torch + + if isinstance(X, torch.utils.data.Dataset): + mask_idx_list = list(np.nonzero(mask)[0]) + return torch.utils.data.Subset(X, mask_idx_list) + except Exception: + pass + try: + with warnings.catch_warnings(): + warnings.filterwarnings("ignore") + import tensorflow + + if isinstance(X, tensorflow.data.Dataset): # special splitting for tensorflow Dataset + mask_idx = np.nonzero(mask)[0] + return extract_indices_tf(X, mask_idx, allow_shuffle=True) + except Exception: + pass + try: + return X[mask] + except Exception: + raise TypeError("Data features X must be subsettable with boolean mask array: X[mask]")
+ + +
[docs]def extract_indices_tf(X, idx, allow_shuffle) -> DatasetLike: + """Extracts subset of tensorflow dataset corresponding to examples at particular indices. + + Args: + X : ``tensorflow.data.Dataset`` + + idx : array_like of integer indices corresponding to examples to keep in the dataset. + Returns subset of examples in the dataset X that correspond to these indices. + + allow_shuffle : bool + Whether or not shuffling of this data is allowed (e.g. must turn off shuffling for validation data). + + Note: this code only works on Datasets in which: + * ``shuffle()`` has been called before ``batch()``, + * no other order-destroying operation (e.g. ``repeat()``) has been applied. + + Indices refer to the original order of the Dataset (before shuffle was called), not to the shuffled order. + """ + import tensorflow + + idx = np.asarray(idx) + idx = np.int64(idx) # needed for Windows (reconsider if necessary in the future) + + og_batch_size = None + if hasattr(X, "_batch_size"): + og_batch_size = int(X._batch_size) + X = X.unbatch() + + unshuffled_X, buffer_size = unshuffle_tensorflow_dataset(X) + if unshuffled_X is not None: + X = unshuffled_X + + # Create index,value pairs in the dataset (adds extra indices that weren't there before) + X = X.enumerate() + keys_tensor = tensorflow.constant(idx) + vals_tensor = tensorflow.ones_like(keys_tensor) # Ones will be cast to True + table = tensorflow.lookup.StaticHashTable( + tensorflow.lookup.KeyValueTensorInitializer(keys_tensor, vals_tensor), + default_value=0, + ) # If index not in table, return 0 + + def hash_table_filter(index, value): + table_value = table.lookup(index) # 1 if index in arr, else 0 + index_in_arr = tensorflow.cast(table_value, tensorflow.bool) # 1 -> True, 0 -> False + return index_in_arr + + # Filter the dataset, then drop the added indices + X_subset = X.filter(hash_table_filter).map(lambda idx, value: value) + + if (unshuffled_X is not None) and allow_shuffle: + X_subset =
X_subset.shuffle(buffer_size=buffer_size) + + if og_batch_size is not None: # reset batch size to original value + X_subset = X_subset.batch(og_batch_size) + + return X_subset
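The enumerate, filter, and drop-index pattern above does not depend on TensorFlow. `extract_indices_sketch` is a hypothetical pure-Python analogue over an ordinary iterable, with a Python `set` playing the role of the `StaticHashTable` lookup. Selected items come back in dataset order, not in the order of `idx`:

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def extract_indices_sketch(items: Iterable[T], idx: Iterable[int]) -> List[T]:
    # Hypothetical pure-Python analogue of extract_indices_tf.
    keep = set(idx)  # plays the role of the StaticHashTable lookup
    # enumerate() attaches positions, the filter keeps wanted positions,
    # and the positions are dropped again.
    return [value for index, value in enumerate(items) if index in keep]


print(extract_indices_sketch(["a", "b", "c", "d"], [3, 1]))  # ['b', 'd']
```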
+ + +
[docs]def unshuffle_tensorflow_dataset(X) -> tuple: + """Applies iterative inverse transformations to dataset to get version before ShuffleDataset was created. + If no ShuffleDataset is in the transformation-history of this dataset, returns ``(None, None)``. + + Parameters + ---------- + X : a tensorflow Dataset that may have been created via a series of transformations, one being shuffle. + + Returns + ------- + Tuple (pre_X, buffer_size) where: + pre_X : Dataset that was previously transformed to get ShuffleDataset (or None), + buffer_size : int `buffer_size` previously used in ShuffleDataset, + or ``len(pre_X)`` if buffer_size cannot be determined, or None if no ShuffleDataset found. + """ + try: + from tensorflow.python.data.ops.dataset_ops import ShuffleDataset + + X_inputs = [X] + while len(X_inputs) == 1: + pre_X = X_inputs[0] + if isinstance(pre_X, ShuffleDataset): + buffer_size = len(pre_X) + if hasattr(pre_X, "_buffer_size"): + buffer_size = pre_X._buffer_size.numpy() + X_inputs = ( + pre_X._inputs() + ) # get the dataset that was transformed to create the ShuffleDataset + if len(X_inputs) == 1: + return (X_inputs[0], buffer_size) + X_inputs = pre_X._inputs() # returns list of input datasets used to create X + except Exception: + pass + return (None, None)
+ + +
[docs]def is_torch_dataset(X) -> bool: + try: + import torch + + if isinstance(X, torch.utils.data.Dataset): + return True + except Exception: + pass + return False # assumes this cannot be torch dataset if torch cannot be imported
+ + +
[docs]def is_tensorflow_dataset(X) -> bool: + try: + import tensorflow + + if isinstance(X, tensorflow.data.Dataset): + return True + except Exception: + pass + return False # assumes this cannot be tensorflow dataset if tensorflow cannot be imported
+ + +
[docs]def csr_vstack(a, b) -> DatasetLike: + """Takes in 2 csr_matrices and appends the second one to the bottom of the first one. + Alternative to scipy.sparse.vstack. Returns a sparse matrix. + """ + a.data = np.hstack((a.data, b.data)) + a.indices = np.hstack((a.indices, b.indices)) + a.indptr = np.hstack((a.indptr, (b.indptr + a.nnz)[1:])) + a._shape = (a.shape[0] + b.shape[0], b.shape[1]) + return a
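The key step in `csr_vstack` is shifting the second matrix's row pointers by the first matrix's number of stored entries. It also drops `b.indptr[0]`, which would duplicate the last pointer of `a`. Below is a minimal sketch using plain lists in place of `scipy.sparse.csr_matrix`. The `(data, indices, indptr, shape)` tuple, `csr_vstack_sketch`, and `to_dense` are hypothetical. Unlike the original, which modifies `a` in place, this version builds new lists:

```python
from typing import List, Tuple

# Hypothetical CSR representation: (data, indices, indptr, shape) as plain lists.
CSR = Tuple[List[float], List[int], List[int], Tuple[int, int]]


def csr_vstack_sketch(a: CSR, b: CSR) -> CSR:
    a_data, a_indices, a_indptr, a_shape = a
    b_data, b_indices, b_indptr, b_shape = b
    nnz_a = len(a_data)
    # Shift b's row pointers by a's nnz; b_indptr[0] == 0 would duplicate a's last pointer.
    indptr = a_indptr + [p + nnz_a for p in b_indptr[1:]]
    return a_data + b_data, a_indices + b_indices, indptr, (a_shape[0] + b_shape[0], b_shape[1])


def to_dense(m: CSR) -> List[List[float]]:
    data, indices, indptr, (n_rows, n_cols) = m
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for k in range(indptr[r], indptr[r + 1]):
            dense[r][indices[k]] = data[k]
    return dense


a = ([1.0, 2.0], [0, 2], [0, 2], (1, 3))     # [[1, 0, 2]]
b = ([3.0, 4.0], [1, 0], [0, 1, 2], (2, 3))  # [[0, 3, 0], [4, 0, 0]]
print(to_dense(csr_vstack_sketch(a, b)))
# [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [4.0, 0.0, 0.0]]
```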
+ + +
[docs]def append_extra_datapoint(to_data, from_data, index) -> DatasetLike: + """Appends an extra datapoint to the data object ``to_data``. + This datapoint is taken from the data object ``from_data`` at the corresponding index. + One place this could be useful is ensuring no missing classes after train/validation split. + """ + if not (type(from_data) is type(to_data)): + raise ValueError("Cannot append datapoint from different type of data object.") + + if isinstance(to_data, np.ndarray): + return np.vstack([to_data, from_data[index]]) + elif isinstance(from_data, (pd.DataFrame, pd.Series)): + X_extra = from_data.iloc[[index]] # type: ignore + to_data = pd.concat([to_data, X_extra]) + return to_data.reset_index(drop=True) + else: + try: + X_extra = from_data[index] + try: + return to_data.append(X_extra) + except Exception: # special append for sparse matrix + return csr_vstack(to_data, X_extra) + except Exception: + raise TypeError("Data features X must support: X.append(X[i])")
+ + +
[docs]def get_num_classes(labels=None, pred_probs=None, label_matrix=None, multi_label=None) -> int: + """Determines the number of classes based on information considered in a + canonical ordering. label_matrix can be: noise_matrix, inverse_noise_matrix, confident_joint, + or any other K x K matrix where K = number of classes. + """ + if pred_probs is not None: # pred_probs is number 1 source of truth + return pred_probs.shape[1] + + if label_matrix is not None: # matrix dimension is number 2 source of truth + if label_matrix.shape[0] != label_matrix.shape[1]: + raise ValueError(f"label matrix must be K x K, not {label_matrix.shape}") + else: + return label_matrix.shape[0] + + if labels is None: + raise ValueError("Cannot determine number of classes from None input") + + return num_unique_classes(labels, multi_label=multi_label)
+ + +
[docs]def num_unique_classes(labels, multi_label=None) -> int: + """Finds the number of unique classes for both single-labeled + and multi-labeled labels. If multi_label is set to None (default) + this method will infer if multi_label is True or False based on + the format of labels: labels are treated as multi-labeled if any entry is a list, + in which case every entry must be an iterable of labels, e.g. [[1], [1,2], [0], [0, 1], [2], [1]]""" + return len(get_unique_classes(labels, multi_label))
+ + +
[docs]def get_unique_classes(labels, multi_label=None) -> set: + """Returns the set of unique classes for both single-labeled + and multi-labeled labels. If multi_label is set to None (default) + this method will infer if multi_label is True or False based on + the format of labels: labels are treated as multi-labeled if any entry is a list, + in which case every entry must be an iterable of labels, e.g. [[1], [1,2], [0], [0, 1], [2], [1]]""" + if multi_label is None: + multi_label = any(isinstance(l, list) for l in labels) + if multi_label: + return set(l for grp in labels for l in list(grp)) + else: + return set(labels)
+ + +
[docs]def format_labels(labels: LabelLike) -> Tuple[np.ndarray, dict]: + """Takes an array of labels and formats it such that labels are in the set ``0, 1, ..., K-1``, + where ``K`` is the number of classes. The labels are assigned based on lexicographic order. + This is useful for mapping string class labels to the integer format required by many cleanlab (and sklearn) functions. + + Returns + ------- + formatted_labels + Returns np.ndarray of shape ``(N,)``. The return labels will be properly formatted and can be passed to other cleanlab functions. + + mapping + A dictionary showing the mapping of new to old labels, such that ``mapping[k]`` returns the name of the k-th class. + """ + labels = labels_to_array(labels) + if labels.ndim != 1: + raise ValueError("labels must be 1D numpy array.") + + unique_labels = np.unique(labels) + label_map = {label: i for i, label in enumerate(unique_labels)} + formatted_labels = np.array([label_map[l] for l in labels]) + inverse_map = {i: label for label, i in label_map.items()} + + return formatted_labels, inverse_map
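Sorting the unique labels is what fixes the class numbering. `format_labels_sketch` is a hypothetical pure-Python version of the mapping step. Python's `sorted` plays the role of `np.unique`, which also returns sorted values:

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def format_labels_sketch(labels: Sequence[Hashable]) -> Tuple[List[int], Dict[int, Hashable]]:
    # Hypothetical pure-Python version: classes are numbered in sorted order, like np.unique.
    label_map = {label: i for i, label in enumerate(sorted(set(labels)))}
    formatted = [label_map[label] for label in labels]
    inverse_map = {i: label for label, i in label_map.items()}
    return formatted, inverse_map


formatted, class_names = format_labels_sketch(["dog", "cat", "dog", "bird"])
print(formatted)    # [2, 1, 2, 0]
print(class_names)  # {0: 'bird', 1: 'cat', 2: 'dog'}
```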
+ + +
[docs]def smart_display_dataframe(df): # pragma: no cover + """Display a pandas dataframe if in a jupyter notebook, otherwise print it to console.""" + try: + from IPython.display import display + + display(df) + except Exception: + print(df)
+ + +
[docs]def force_two_dimensions(X) -> DatasetLike: + """ + Reshape a dataset to two dimensions for use with the default CleanLearning classifier, + which is `sklearn.linear_model.LogisticRegression + <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`_. + + Parameters + ---------- + X : np.ndarray or DatasetLike + + Returns + ------- + X : np.ndarray or DatasetLike + The original dataset reshaped to two dimensions, so that the dataset will have the shape ``(N, d)``, + where N is still the number of examples and d is the product of the remaining dimensions. + Datasets with at most two dimensions (or ``None``) are returned unchanged. + """ + if X is not None and len(X.shape) > 2: + X = X.reshape((len(X), -1)) + return X
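The shape produced by `X.reshape((len(X), -1))` keeps the first axis and multiplies together all remaining axes. `two_dim_shape` is a hypothetical helper that computes this shape with the standard library only (Python 3.8+ for `math.prod`):

```python
import math
from typing import Tuple


def two_dim_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    # Hypothetical helper: the shape force_two_dimensions would produce.
    if len(shape) <= 2:
        return tuple(shape)  # 1D/2D inputs are returned unchanged
    return (shape[0], math.prod(shape[1:]))  # trailing dimensions are flattened


print(two_dim_shape((100, 28, 28)))     # (100, 784)
print(two_dim_shape((100, 32, 32, 3)))  # (100, 3072)
print(two_dim_shape((100, 5)))          # (100, 5)
```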
+
+
+
+
+ + +
+
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/internal/validation.html b/v2.6.6/_modules/cleanlab/internal/validation.html new file mode 100644 index 000000000..5a11190ed --- /dev/null +++ b/v2.6.6/_modules/cleanlab/internal/validation.html @@ -0,0 +1,922 @@ + cleanlab.internal.validation - cleanlab
+
+
+ +
+
+
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.internal.validation

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Checks to ensure valid inputs for various methods.
+"""
+
+from cleanlab.typing import LabelLike, DatasetLike
+from cleanlab.internal.constants import FLOATING_POINT_COMPARISON
+from typing import Any, List, Optional, Union
+import warnings
+import numpy as np
+import pandas as pd
+
+
+
+def assert_valid_inputs(
+    X: DatasetLike,
+    y: LabelLike,
+    pred_probs: Optional[np.ndarray] = None,
+    multi_label: bool = False,
+    allow_missing_classes: bool = True,
+    allow_one_class: bool = False,
+) -> None:
+    """Checks that ``X``, ``labels``, ``pred_probs`` are correctly formatted."""
+    if not isinstance(y, (list, np.ndarray, np.generic, pd.Series, pd.DataFrame)):
+        raise TypeError("labels should be a numpy array or pandas Series.")
+    if not multi_label:
+        y = labels_to_array(y)
+        assert_valid_class_labels(
+            y=y, allow_missing_classes=allow_missing_classes, allow_one_class=allow_one_class
+        )
+
+    allow_empty_X = True
+    if pred_probs is None:
+        allow_empty_X = False
+        try:
+            import tensorflow
+
+            if isinstance(X, tensorflow.data.Dataset):
+                allow_empty_X = True  # length of X may differ due to batch-size used in tf Dataset, so don't check it
+        except Exception:
+            pass
+
+    if not allow_empty_X:
+        assert_nonempty_input(X)
+        try:
+            num_examples = len(X)
+            len_supported = True
+        except:
+            len_supported = False
+        if not len_supported:
+            try:
+                num_examples = X.shape[0]
+                shape_supported = True
+            except:
+                shape_supported = False
+        if (not len_supported) and (not shape_supported):
+            raise TypeError("Data features X must support either: len(X) or X.shape[0]")
+
+        if num_examples != len(y):
+            raise ValueError(
+                f"X and labels must be same length, but X is length {num_examples} and labels is length {len(y)}."
+            )
+
+        assert_indexing_works(X, length_X=num_examples)
+
+    if pred_probs is not None:
+        if not isinstance(pred_probs, (np.ndarray, np.generic)):
+            raise TypeError("pred_probs must be a numpy array.")
+        if len(pred_probs) != len(y):
+            raise ValueError("pred_probs and labels must have same length.")
+        if len(pred_probs.shape) != 2:
+            raise ValueError("pred_probs array must have shape: num_examples x num_classes.")
+        if not multi_label:
+            assert isinstance(y, np.ndarray)
+            highest_class = max(y) + 1
+        else:
+            assert isinstance(y, list)
+            assert all(isinstance(y_i, list) for y_i in y)
+            highest_class = max([max(y_i) for y_i in y if len(y_i) != 0]) + 1
+        if pred_probs.shape[1] < highest_class:
+            raise ValueError(
+                f"pred_probs must have at least {highest_class} columns, based on the largest class index which appears in labels."
+            )
+        # Check for valid probabilities.
+        if (np.min(pred_probs) < 0 - FLOATING_POINT_COMPARISON) or (
+            np.max(pred_probs) > 1 + FLOATING_POINT_COMPARISON
+        ):
+            raise ValueError("Values in pred_probs must be between 0 and 1.")
+        if X is not None:
+            warnings.warn("When X and pred_probs are both provided, the former may be ignored.")
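The `pred_probs` checks above can be sketched in plain NumPy for the single-label case. This is a simplified illustration, not cleanlab's code. `check_pred_probs` is a hypothetical name, and the tolerance `1e-6` is an assumed stand-in for `FLOATING_POINT_COMPARISON`, whose actual value is defined in `cleanlab.internal.constants`.

```python
import numpy as np

# Assumed tolerance; cleanlab reads its own value from
# cleanlab.internal.constants.FLOATING_POINT_COMPARISON.
TOLERANCE = 1e-6


def check_pred_probs(labels, pred_probs, tol=TOLERANCE):
    """Hypothetical, simplified version of the pred_probs checks above
    (single-label case only). Raises ValueError on invalid input."""
    labels = np.asarray(labels)
    if len(pred_probs) != len(labels):
        raise ValueError("pred_probs and labels must have same length.")
    if pred_probs.ndim != 2:
        raise ValueError("pred_probs array must have shape: num_examples x num_classes.")
    highest_class = labels.max() + 1  # number of classes implied by the labels
    if pred_probs.shape[1] < highest_class:
        raise ValueError(f"pred_probs must have at least {highest_class} columns.")
    if pred_probs.min() < 0 - tol or pred_probs.max() > 1 + tol:
        raise ValueError("Values in pred_probs must be between 0 and 1.")


labels = [0, 2, 1]
good = np.array([[0.8, 0.1, 0.1], [0.2, 0.2, 0.6], [0.3, 0.6, 0.1]])
check_pred_probs(labels, good)  # passes silently

try:
    check_pred_probs(labels, good[:, :2])  # only 2 columns, but label 2 needs 3
except ValueError as err:
    print(err)  # pred_probs must have at least 3 columns.
```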
+ + +
+def assert_valid_class_labels(
+    y: np.ndarray,
+    allow_missing_classes: bool = True,
+    allow_one_class: bool = False,
+) -> None:
+    """Checks that ``labels`` is properly formatted, i.e. a 1D numpy array where labels are zero-based
+    integers (not multi-label).
+    """
+    if y.ndim != 1:
+        raise ValueError("Labels must be 1D numpy array.")
+    if any([isinstance(label, str) for label in y]):
+        raise ValueError(
+            "Labels cannot be strings, they must be zero-indexed integers corresponding to class indices."
+        )
+    if not np.equal(np.mod(y, 1), 0).all():  # check that labels are integers
+        raise ValueError("Labels must be zero-indexed integers corresponding to class indices.")
+    if min(y) < 0:
+        raise ValueError("Labels must be non-negative integers corresponding to class indices.")
+
+    unique_classes = np.unique(y)
+    if (not allow_one_class) and (len(unique_classes) < 2):
+        raise ValueError("Labels must contain at least 2 classes.")
+
+    if not allow_missing_classes:
+        if (unique_classes != np.arange(len(unique_classes))).any():
+            msg = "cleanlab requires zero-indexed integer labels (0,1,2,..,K-1), but in "
+            msg += "your case: np.unique(labels) = {}. ".format(str(unique_classes))
+            msg += "Every class in (0,1,2,..,K-1) must be present in labels as well."
+            raise TypeError(msg)
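The function above runs its label checks in a fixed order. The sketch below shows which label arrays pass and which check rejects the others. `label_problems` is a hypothetical helper that returns a short description instead of raising; it follows the same order of checks but is not cleanlab's API.

```python
import numpy as np


def label_problems(y, allow_missing_classes=True, allow_one_class=False):
    """Hypothetical helper: return the first problem the checks in
    assert_valid_class_labels would report for y, or None if y is valid."""
    y = np.asarray(y)
    if y.ndim != 1:
        return "not 1D"
    if any(isinstance(label, str) for label in y):
        return "string labels"
    if not np.equal(np.mod(y, 1), 0).all():
        return "non-integer labels"
    if y.min() < 0:
        return "negative labels"
    unique_classes = np.unique(y)
    if not allow_one_class and len(unique_classes) < 2:
        return "fewer than 2 classes"
    if not allow_missing_classes and (unique_classes != np.arange(len(unique_classes))).any():
        return "missing classes"
    return None


print(label_problems([0, 1, 2, 1]))  # None
print(label_problems([0, 1.5, 2]))  # non-integer labels
print(label_problems([0, -1, 1]))  # negative labels
print(label_problems([0, 0, 0]))  # fewer than 2 classes
print(label_problems([0, 2, 2], allow_missing_classes=False))  # missing classes
```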
+ + +
+def assert_nonempty_input(X: Any) -> None:
+    """Ensures input is not None."""
+    if X is None:
+        raise ValueError("Data features X cannot be None. Currently X is None.")
+ + +
+def assert_indexing_works(
+    X: DatasetLike, idx: Optional[List[int]] = None, length_X: Optional[int] = None
+) -> None:
+    """Ensures we can do list-based indexing into ``X``.
+    ``length_X`` is an optional argument since sparse matrix ``X``
+    does not support ``len(X)`` and we want this method to work for sparse ``X``
+    (in addition to many other types of ``X``).
+    """
+    if idx is None:
+        if length_X is None:
+            length_X = 2  # pragma: no cover
+
+        idx = [0, length_X - 1]
+
+    is_indexed = False
+    try:
+        if isinstance(X, (pd.DataFrame, pd.Series)):
+            _ = X.iloc[idx]  # type: ignore[call-overload]
+            is_indexed = True
+    except Exception:
+        pass
+    if not is_indexed:
+        try:  # check if X is pytorch Dataset object using lazy import
+            import torch
+
+            if isinstance(X, torch.utils.data.Dataset):  # indexing for pytorch Dataset
+                _ = torch.utils.data.Subset(X, idx)  # type: ignore[call-overload]
+                is_indexed = True
+        except Exception:
+            pass
+    if not is_indexed:
+        try:  # check if X is tensorflow Dataset object using lazy import
+            import tensorflow as tf
+
+            if isinstance(X, tf.data.Dataset):
+                is_indexed = True  # skip check for tensorflow Dataset (too expensive)
+        except Exception:
+            pass
+    if not is_indexed:
+        try:
+            _ = X[idx]  # type: ignore[call-overload]
+        except Exception:
+            msg = (
+                "Data features X must support list-based indexing; i.e. one of these must work: \n"
+            )
+            msg += "1) X[index_list] where say index_list = [0,1,3,10], or \n"
+            msg += "2) X.iloc[index_list] if X is pandas DataFrame."
+            raise TypeError(msg)
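The core idea of the check above is to probe `X` with the default index list `[0, length_X - 1]`. The sketch below shows that idea for NumPy arrays, pandas objects, and plain Python lists. `supports_list_indexing` is a hypothetical simplification that skips the PyTorch and TensorFlow branches.

```python
import numpy as np
import pandas as pd


def supports_list_indexing(X, length_X):
    """Hypothetical helper: probe X with the same index list that
    assert_indexing_works builds by default, [0, length_X - 1]."""
    idx = [0, length_X - 1]
    if isinstance(X, (pd.DataFrame, pd.Series)):
        try:
            X.iloc[idx]  # positional indexing for pandas objects
            return True
        except Exception:
            return False
    try:
        X[idx]  # list-based (fancy) indexing, e.g. for numpy arrays
        return True
    except Exception:
        return False


print(supports_list_indexing(np.arange(12).reshape(4, 3), 4))  # True
print(supports_list_indexing(pd.DataFrame({"a": [1, 2, 3]}), 3))  # True
print(supports_list_indexing([[1, 2], [3, 4]], 2))  # False: Python lists reject list indices
```

This is why plain nested Python lists fail the check: convert them with `np.asarray` first.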
+ + +
+def labels_to_array(y: Union[LabelLike, np.generic]) -> np.ndarray:
+    """Converts different types of label objects to 1D numpy array and checks their validity.
+
+    Parameters
+    ----------
+    y : Union[LabelLike, np.generic]
+        Labels to convert to 1D numpy array. Can be a list, numpy array, pandas Series, or pandas DataFrame.
+
+    Returns
+    -------
+    labels_array : np.ndarray
+        1D numpy array of labels.
+    """
+    if isinstance(y, pd.Series):
+        y_series: np.ndarray = y.to_numpy()
+        return y_series
+    elif isinstance(y, pd.DataFrame):
+        y_arr = y.values
+        assert isinstance(y_arr, np.ndarray)
+        if y_arr.shape[1] != 1:
+            raise ValueError("labels must be one dimensional.")
+        return y_arr.flatten()
+    else:  # y is list, np.ndarray, or some other tuple-like object
+        try:
+            return np.asarray(y)
+        except:
+            raise ValueError(
+                "List of labels must be convertible to 1D numpy array via: np.asarray(labels)."
+            )
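The three branches of `labels_to_array` handle a Series, a single-column DataFrame, and everything else (converted with `np.asarray`). The sketch below reproduces those branches under the hypothetical name `to_label_array`, for illustration only.

```python
import numpy as np
import pandas as pd


def to_label_array(y):
    """Hypothetical helper following the same branches as labels_to_array."""
    if isinstance(y, pd.Series):
        return y.to_numpy()
    if isinstance(y, pd.DataFrame):
        values = y.values
        if values.shape[1] != 1:
            raise ValueError("labels must be one dimensional.")
        return values.flatten()  # (N, 1) -> (N,)
    return np.asarray(y)  # lists, tuples, numpy arrays


print(to_label_array([0, 1, 1]))  # [0 1 1]
print(to_label_array(pd.Series([2, 0])))  # [2 0]
print(to_label_array(pd.DataFrame({"label": [1, 0]})))  # [1 0]
```

A DataFrame with more than one column is rejected rather than silently flattened, because flattening it would interleave values from different columns.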
+ + +
+def labels_to_list_multilabel(y: List) -> List[List[int]]:
+    """Checks that multi-label labels are a nested list (a list of lists) and returns them unchanged.
+
+    Parameters
+    ----------
+    y : List
+        Labels to validate as a nested list. Supports only list type.
+
+    Returns
+    -------
+    labels_list : List[List[int]]
+        Nested list of labels.
+    """
+    if not isinstance(y, list):
+        raise ValueError("Unsupported Label format")
+    if not all(isinstance(x, list) for x in y):
+        raise ValueError("Each element in list of labels must be a list.")
+
+    return y
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/models/keras.html b/v2.6.6/_modules/cleanlab/models/keras.html
new file mode 100644
index 000000000..14364e967
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/models/keras.html
@@ -0,0 +1,964 @@
+cleanlab.models.keras - cleanlab


Source code for cleanlab.models.keras

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Wrapper class you can use to make any Keras model compatible with :py:class:`CleanLearning <cleanlab.classification.CleanLearning>` and sklearn.
+Use :py:class:`KerasWrapperModel<cleanlab.experimental.keras.KerasWrapperModel>` to wrap existing functional API code for ``keras.Model`` objects,
+and :py:class:`KerasWrapperSequential<cleanlab.experimental.keras.KerasWrapperSequential>` to wrap existing ``tf.keras.models.Sequential`` objects.
+Most instance methods of these classes work the same way as those of the wrapped Keras model;
+see the `Keras documentation <https://keras.io/>`_ for details.
+
+This module is a good example of how to make any bespoke neural network compatible with cleanlab.
+
+You must have `TensorFlow 2 installed <https://www.tensorflow.org/install>`_ (only compatible with Python versions >= 3.7).
+These wrapper classes are only fully compatible with ``tensorflow<2.11``. If you are using ``tensorflow>=2.11``,
+please replace your Optimizer class with the `legacy Optimizer <https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/legacy/Optimizer>`_.
+
+.. warning::
+
+    TensorFlow 2.16 and higher are not yet fully supported by these wrapper classes.
+    We are actively working to extend support to these newer versions.
+
+    In the meantime, we recommend using TensorFlow 2.15 or earlier to ensure stability and compatibility.
+    You can do this by pinning the TensorFlow version with your package manager, for example:
+
+    .. code-block:: bash
+
+        pip install "tensorflow<2.16"
+
+    This lets you keep using the full functionality of these wrapper classes until an update supporting newer TensorFlow versions is released.
+
+Tips:
+
+* If this class lacks certain functionality, you can alternatively try `scikeras <https://github.com/adriangb/scikeras>`_.
+* Unlike scikeras, our `KerasWrapper` classes can operate directly on ``tensorflow.data.Dataset`` objects (like regular Keras models).
+* To call ``fit()`` on a tensorflow ``Dataset`` object with a Keras model, the ``Dataset`` should already be batched.
+* Check out our example using this class: `huggingface_keras_imdb <https://github.com/cleanlab/examples/blob/master/huggingface_keras_imdb/huggingface_keras_imdb.ipynb>`_
+* Our `unit tests <https://github.com/cleanlab/cleanlab/blob/master/tests/test_frameworks.py>`_ also provide basic usage examples.
+
+"""
+
+import tensorflow as tf
+import keras  # type: ignore
+import numpy as np
+from typing import Callable, Optional
+
+
+
+class KerasWrapperModel:
+    """Takes in a callable function to instantiate a Keras Model (using Keras functional API)
+    that is compatible with :py:class:`CleanLearning <cleanlab.classification.CleanLearning>` and sklearn.
+
+    The instance methods of this class work in the same way as those of any ``keras.Model`` object; see the `Keras documentation <https://keras.io/>`_ for details.
+    To use the Keras Sequential API instead of the functional API, see the :py:class:`KerasWrapperSequential<cleanlab.experimental.keras.KerasWrapperSequential>` class.
+
+    Parameters
+    ----------
+    model: Callable
+        A callable function to construct the Keras Model (using functional API). Pass in the function here, not the constructed model!
+
+        For example::
+
+            def model(num_features, num_classes):
+                inputs = tf.keras.Input(shape=(num_features,))
+                outputs = tf.keras.layers.Dense(num_classes)(inputs)
+                return tf.keras.Model(inputs=inputs, outputs=outputs, name="my_keras_model")
+
+    model_kwargs: dict, default = {}
+        Dict of optional keyword arguments to pass into ``model()`` when instantiating the ``keras.Model``.
+
+    compile_kwargs: dict, default = {"loss": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)}
+        Dict of optional keyword arguments to pass into ``model.compile()`` for declaring loss, metrics, optimizer, etc.
+
+    params: dict, default = None
+        Dict of optional keyword arguments to pass into the Keras model's ``fit()``.
+        Keyword arguments passed directly to ``fit()`` take precedence over these.
+    """
+
+    def __init__(
+        self,
+        model: Callable,
+        model_kwargs: dict = {},
+        compile_kwargs: dict = {
+            "loss": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
+        },
+        params: Optional[dict] = None,
+    ):
+        if params is None:
+            params = {}
+
+        self.model = model
+        self.model_kwargs = model_kwargs
+        self.compile_kwargs = compile_kwargs
+        self.params = params
+        self.net = None
+
+    def get_params(self, deep=True):
+        """Returns the parameters of the Keras model."""
+        return {
+            "model": self.model,
+            "model_kwargs": self.model_kwargs,
+            "compile_kwargs": self.compile_kwargs,
+            "params": self.params,
+        }
+ +
+    def set_params(self, **params):
+        """Updates ``self.params``, the keyword arguments passed to the Keras model's ``fit()``."""
+        self.params.update(params)
+        return self
+ +
+    def fit(self, X, y=None, **kwargs):
+        """Trains a Keras model.
+
+        Parameters
+        ----------
+        X : tf.Dataset or np.array or pd.DataFrame
+            If ``X`` is a tensorflow dataset object, it must already contain the labels as is required for standard Keras fit.
+
+        y : np.array or pd.DataFrame, default = None
+            If ``X`` is a tensorflow dataset object, you can optionally provide the labels again here as argument `y` to be compatible with sklearn,
+            but they are ignored.
+            If ``X`` is a numpy array or pandas dataframe, the labels have to be passed in using this argument.
+        """
+        if self.net is None:
+            self.net = self.model(**self.model_kwargs)
+            self.net.compile(**self.compile_kwargs)
+
+        # TODO: check for generators
+        if y is not None and not isinstance(X, (tf.data.Dataset, keras.utils.Sequence)):
+            kwargs["y"] = y
+
+        self.net.fit(X, **{**self.params, **kwargs})
+ +
+    def predict_proba(self, X, *, apply_softmax=True, **kwargs):
+        """Predict class probabilities for all classes using the wrapped Keras model.
+        Keep the extra argument `apply_softmax` set to True (the default) if your network outputs logits rather than probabilities;
+        set it to False if your network already outputs probabilities.
+
+        Parameters
+        ----------
+        X : tf.Dataset or np.array or pd.DataFrame
+            Data in the same format as the original ``X`` provided to ``fit()``.
+        """
+        if self.net is None:
+            raise ValueError("must call fit() before predict()")
+        pred_probs = self.net.predict(X, **kwargs)
+        if apply_softmax:
+            pred_probs = tf.nn.softmax(pred_probs, axis=1)
+        return pred_probs
+ +
+    def predict(self, X, **kwargs):
+        """Predict class labels using the wrapped Keras model.
+
+        Parameters
+        ----------
+        X : tf.Dataset or np.array or pd.DataFrame
+            Data in the same format as the original ``X`` provided to ``fit()``.
+
+        """
+        pred_probs = self.predict_proba(X, **kwargs)
+        return np.argmax(pred_probs, axis=1)
+ +
+    def summary(self, **kwargs):
+        """Prints the summary of the Keras model, instantiating and compiling the model first if needed."""
+        if self.net is None:
+            self.net = self.model(**self.model_kwargs)
+            self.net.compile(**self.compile_kwargs)
+
+        return self.net.summary(**kwargs)
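With `apply_softmax=True`, `predict_proba` turns logits into probabilities, and `predict` then takes the row-wise argmax. The sketch below illustrates both steps without TensorFlow. It assumes a hand-written NumPy softmax as a stand-in for `tf.nn.softmax(..., axis=1)`, and the logits are made up.

```python
import numpy as np


def softmax(logits, axis=1):
    """Numerically stable softmax, standing in for tf.nn.softmax(..., axis=1)."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


# Made-up logits for 3 examples and 2 classes, as a final
# tf.keras.layers.Dense(num_classes) layer without activation would output.
logits = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]])

pred_probs = softmax(logits)  # analogous to predict_proba(..., apply_softmax=True)
pred_labels = np.argmax(pred_probs, axis=1)  # analogous to predict()

print(np.allclose(pred_probs.sum(axis=1), 1.0))  # True
print(pred_labels)  # [0 0 1]
```

Softmax preserves the ordering within each row, so the predicted labels are the same whether or not softmax is applied. Only the probabilities that cleanlab consumes change.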
+ + +
+class KerasWrapperSequential:
+    """Makes any ``tf.keras.models.Sequential`` object compatible with :py:class:`CleanLearning <cleanlab.classification.CleanLearning>` and sklearn.
+
+    `KerasWrapperSequential` is instantiated in the same way as a keras ``Sequential`` object, except for the optional extra `compile_kwargs` and `params` arguments.
+    Just instantiate this object in the same way as your ``tf.keras.models.Sequential`` object (rather than passing in an existing ``Sequential`` object).
+    The instance methods of this class work in the same way as those of any keras ``Sequential`` object; see the `Keras documentation <https://keras.io/>`_ for details.
+
+    Parameters
+    ----------
+    layers: list
+        A list containing the layers to add to the keras ``Sequential`` model (same as for ``tf.keras.models.Sequential``).
+
+    name: str, default = None
+        Name for the Keras model (same as for ``tf.keras.models.Sequential``).
+
+    compile_kwargs: dict, default = {"loss": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)}
+        Dict of optional keyword arguments to pass into ``model.compile()`` for declaring loss, metrics, optimizer, etc.
+
+    params: dict, default = None
+        Dict of optional keyword arguments to pass into the Keras model's ``fit()``.
+        Keyword arguments passed directly to ``fit()`` take precedence over these.
+    """
+
+    def __init__(
+        self,
+        layers: Optional[list] = None,
+        name: Optional[str] = None,
+        compile_kwargs: dict = {
+            "loss": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
+        },
+        params: Optional[dict] = None,
+    ):
+        if params is None:
+            params = {}
+
+        self.layers = layers
+        self.name = name
+        self.compile_kwargs = compile_kwargs
+        self.params = params
+        self.net = None
+
+    def get_params(self, deep=True):
+        """Returns the parameters of the Keras model."""
+        return {
+            "layers": self.layers,
+            "name": self.name,
+            "compile_kwargs": self.compile_kwargs,
+            "params": self.params,
+        }
+ +
+    def set_params(self, **params):
+        """Updates ``self.params``, the keyword arguments passed to the Keras model's ``fit()``."""
+        self.params.update(params)
+        return self
+ +
+    def fit(self, X, y=None, **kwargs):
+        """Trains a Sequential Keras model.
+
+        Parameters
+        ----------
+        X : tf.Dataset or np.array or pd.DataFrame
+            If ``X`` is a tensorflow dataset object, it must already contain the labels as is required for standard Keras fit.
+
+        y : np.array or pd.DataFrame, default = None
+            If ``X`` is a tensorflow dataset object, you can optionally provide the labels again here as argument `y` to be compatible with sklearn,
+            but they are ignored.
+            If ``X`` is a numpy array or pandas dataframe, the labels have to be passed in using this argument.
+        """
+        if self.net is None:
+            self.net = tf.keras.models.Sequential(self.layers, self.name)
+            self.net.compile(**self.compile_kwargs)
+
+        # TODO: check for generators
+        if y is not None and not isinstance(X, (tf.data.Dataset, keras.utils.Sequence)):
+            kwargs["y"] = y
+
+        self.net.fit(X, **{**self.params, **kwargs})
+ +
+    def predict_proba(self, X, *, apply_softmax=True, **kwargs):
+        """Predict class probabilities for all classes using the wrapped Keras model.
+        Keep the extra argument `apply_softmax` set to True (the default) if your network outputs logits rather than probabilities;
+        set it to False if your network already outputs probabilities.
+
+        Parameters
+        ----------
+        X : tf.Dataset or np.array or pd.DataFrame
+            Data in the same format as the original ``X`` provided to ``fit()``.
+        """
+        if self.net is None:
+            raise ValueError("must call fit() before predict()")
+        pred_probs = self.net.predict(X, **kwargs)
+        if apply_softmax:
+            pred_probs = tf.nn.softmax(pred_probs, axis=1)
+        return pred_probs
+ +
+    def predict(self, X, **kwargs):
+        """Predict class labels using the wrapped Keras model.
+
+        Parameters
+        ----------
+        X : tf.Dataset or np.array or pd.DataFrame
+            Data in the same format as the original ``X`` provided to ``fit()``.
+        """
+        pred_probs = self.predict_proba(X, **kwargs)
+        return np.argmax(pred_probs, axis=1)
+ +
+    def summary(self, **kwargs):
+        """Prints the summary of the Keras model, instantiating and compiling the model first if needed."""
+        if self.net is None:
+            self.net = tf.keras.models.Sequential(self.layers, self.name)
+            self.net.compile(**self.compile_kwargs)
+
+        return self.net.summary(**kwargs)
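Both wrappers store `set_params` values in `self.params` and merge them into the Keras `fit()` call as `{**self.params, **kwargs}`. Because later keys win in that merge, arguments given directly to `fit()` override stored ones. The sketch below shows that precedence without TensorFlow. `RecordingNet` and `ParamsDemo` are hypothetical classes that only mimic the storage and merge logic.

```python
class RecordingNet:
    """Hypothetical stand-in for a compiled Keras model: fit() just records its kwargs."""

    def __init__(self):
        self.fit_kwargs = None

    def fit(self, X, **kwargs):
        self.fit_kwargs = kwargs


class ParamsDemo:
    """Hypothetical minimal class mimicking how the wrappers store and merge fit params."""

    def __init__(self, params=None):
        self.params = {} if params is None else params
        self.net = RecordingNet()

    def set_params(self, **params):
        self.params.update(params)  # same as the wrappers' set_params
        return self

    def fit(self, X, y=None, **kwargs):
        if y is not None:
            kwargs["y"] = y
        # Later keys win: explicit fit() kwargs override stored params.
        self.net.fit(X, **{**self.params, **kwargs})


demo = ParamsDemo().set_params(epochs=5, batch_size=32)
demo.fit([[0.0], [1.0]], y=[0, 1], epochs=2)
print(demo.net.fit_kwargs)  # {'epochs': 2, 'batch_size': 32, 'y': [0, 1]}
```

The merge builds a new dict, so a one-off `epochs=2` passed to `fit()` does not overwrite the stored `epochs=5`.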
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/multiannotator.html b/v2.6.6/_modules/cleanlab/multiannotator.html
new file mode 100644
index 000000000..858d42157
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/multiannotator.html
@@ -0,0 +1,2618 @@
+cleanlab.multiannotator - cleanlab


Source code for cleanlab.multiannotator

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods for analysis of classification data labeled by multiple annotators.
+
+To analyze a fixed dataset labeled by multiple annotators, use the
+`~cleanlab.multiannotator.get_label_quality_multiannotator` function which estimates:
+
+* A consensus label for each example that aggregates the individual annotations more accurately than alternative aggregation via majority-vote or other algorithms used in crowdsourcing like Dawid-Skene.
+* A quality score for each consensus label which measures our confidence that this label is correct.
+* An analogous label quality score for each individual label chosen by one annotator for a particular example.
+* An overall quality score for each annotator which measures our confidence in the overall correctness of labels obtained from this annotator.
+
+The algorithms to compute these estimates are described in `the CROWDLAB paper <https://arxiv.org/abs/2210.06812>`_.
+
+If you have some labeled and unlabeled data (with multiple annotators for some labeled examples) and want to decide what data to collect additional labels for,
+use the `~cleanlab.multiannotator.get_active_learning_scores` function, which is intended for active learning.
+This function estimates an ActiveLab quality score for each example,
+which can be used to prioritize which examples are most informative to collect additional labels for.
+This function is effective in settings where some examples have been labeled by one or more annotators while other examples have no labels yet,
+as well as settings where new labels are collected either in batches of examples or one at a time.
+Here is an `example notebook <https://github.com/cleanlab/examples/blob/master/active_learning_multiannotator/active_learning.ipynb>`_ showcasing the use of this ActiveLab method for active learning with data re-labeling.
+
+The algorithms to compute these active learning scores are described in `the ActiveLab paper <https://arxiv.org/abs/2301.11856>`_.
+
+Each of the main functions in this module can be used with any trained classifier model.
+Variants of these functions are provided for settings where you have trained an ensemble of multiple models.
+"""
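The consensus label and `annotator_agreement` columns described above can be illustrated with plain majority vote, which is one of the `consensus_method` options (this is not CROWDLAB). The sketch below uses the same `(N, M)` layout, with `NaN` marking a missing annotation. `majority_vote_with_agreement` is a hypothetical helper. Its tie-breaking rule (pick the tied class with the highest `pred_probs`) is a simplifying assumption and may differ from cleanlab's `get_majority_vote_label`.

```python
import numpy as np


def majority_vote_with_agreement(labels_multiannotator, pred_probs):
    """Simplified majority-vote consensus (not CROWDLAB). NaN marks a missing
    annotation; assumes every example has at least one annotation. Ties are
    broken by the tied class with the highest model probability (an assumption)."""
    num_classes = pred_probs.shape[1]
    consensus = np.empty(len(labels_multiannotator), dtype=int)
    agreement = np.empty(len(labels_multiannotator))
    for i, row in enumerate(labels_multiannotator):
        given = row[~np.isnan(row)].astype(int)  # labels from annotators who labeled example i
        counts = np.bincount(given, minlength=num_classes)
        tied = np.flatnonzero(counts == counts.max())  # classes with the most votes
        consensus[i] = tied[np.argmax(pred_probs[i, tied])]
        agreement[i] = np.mean(given == consensus[i])  # fraction agreeing with consensus
    return consensus, agreement


nan = np.nan
labels = np.array(
    [
        [0, 0, 1],  # clear majority for class 0
        [1, nan, 2],  # tie between 1 and 2, broken by pred_probs
        [nan, 2, 2],  # one annotator skipped this example
    ]
)
pred_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]])

consensus, agreement = majority_vote_with_agreement(labels, pred_probs)
print(consensus)  # [0 2 2]
print(np.round(agreement, 3).tolist())  # [0.667, 0.5, 1.0]
```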
+
+import warnings
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+import numpy as np
+import pandas as pd
+
+from cleanlab.internal.constants import CLIPPING_LOWER_BOUND
+from cleanlab.internal.multiannotator_utils import (
+    assert_valid_inputs_multiannotator,
+    assert_valid_pred_probs,
+    check_consensus_label_classes,
+    find_best_temp_scaler,
+    temp_scale_pred_probs,
+)
+from cleanlab.internal.util import get_num_classes, value_counts
+from cleanlab.rank import get_label_quality_scores
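The `best_quality` branch of `get_label_quality_multiannotator` chooses, for each example, the class with the highest posterior probability (`MV_post_pred_probs`, computed from the majority-vote consensus). When several classes tie for that maximum, it falls back to the majority-vote label. Below is a vectorized sketch of that rule. `best_quality_consensus` is a hypothetical name, and the inputs are made up.

```python
import numpy as np


def best_quality_consensus(post_pred_probs, majority_vote_label):
    """Argmax of the posterior probabilities, falling back to the
    majority-vote label wherever the maximum is not unique."""
    row_max = post_pred_probs.max(axis=1, keepdims=True)
    num_at_max = (post_pred_probs == row_max).sum(axis=1)  # how many classes share the max
    argmax_label = post_pred_probs.argmax(axis=1)
    return np.where(num_at_max == 1, argmax_label, majority_vote_label).astype(int)


post = np.array([[0.1, 0.7, 0.2], [0.4, 0.4, 0.2], [0.2, 0.2, 0.6]])
mv = np.array([0, 1, 2])
print(best_quality_consensus(post, mv))  # [1 1 2]
```

The first example shows that `best_quality` can override the majority vote when the posterior favors a different class.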
+
+
+
+def get_label_quality_multiannotator(
+    labels_multiannotator: Union[pd.DataFrame, np.ndarray],
+    pred_probs: np.ndarray,
+    *,
+    consensus_method: Union[str, List[str]] = "best_quality",
+    quality_method: str = "crowdlab",
+    calibrate_probs: bool = False,
+    return_detailed_quality: bool = True,
+    return_annotator_stats: bool = True,
+    return_weights: bool = False,
+    verbose: bool = True,
+    label_quality_score_kwargs: dict = {},
+) -> Dict[str, Any]:
+    """Returns label quality scores for each example and for each annotator in a dataset labeled by multiple annotators.
+
+    This function is for multiclass classification datasets where examples have been labeled by
+    multiple annotators (not necessarily the same number of annotators per example).
+
+    It computes one consensus label for each example that best accounts for the labels chosen by each
+    annotator (and their quality), as well as a consensus quality score for how confident we are that this consensus label is actually correct.
+    It also computes similar quality scores for each annotator's individual labels, and the quality of each annotator.
+    Scores are between 0 and 1 (estimated via methods like CROWDLAB); lower scores indicate labels/annotators less likely to be correct.
+
+    To decide what data to collect additional labels for, try the `~cleanlab.multiannotator.get_active_learning_scores`
+    (ActiveLab) function, which is intended for active learning with multiple annotators.
+
+    Parameters
+    ----------
+    labels_multiannotator : pd.DataFrame or np.ndarray
+        2D pandas DataFrame or array of multiple given labels for each example with shape ``(N, M)``,
+        where N is the number of examples and M is the number of annotators.
+        ``labels_multiannotator[n][m]`` = label for n-th example given by m-th annotator.
+
+        For a dataset with K classes, each given label must be an integer in 0, 1, ..., K-1 or ``NaN`` if this annotator did not label a particular example.
+        If you have string or other differently formatted labels, you can convert them to the proper format using :py:func:`format_multiannotator_labels <cleanlab.internal.multiannotator_utils.format_multiannotator_labels>`.
+        If pd.DataFrame, column names should correspond to each annotator's ID.
+    pred_probs : np.ndarray
+        An array of shape ``(N, K)`` of predicted class probabilities from a trained classifier model.
+        Predicted probabilities in the same format expected by :py:func:`get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`.
+    consensus_method : str or List[str], default = "best_quality"
+        Specifies the method used to aggregate labels from multiple annotators into a single consensus label.
+        Options include:
+
+        * ``majority_vote``: consensus obtained using a simple majority vote among annotators, with ties broken via ``pred_probs``.
+        * ``best_quality``: consensus obtained by selecting the label with highest label quality (quality determined by method specified in ``quality_method``).
+
+        A List may be passed if you want to consider multiple methods for producing consensus labels.
+        If a List is passed, then the 0th element of the list is the method used to produce columns `consensus_label`, `consensus_quality_score`, `annotator_agreement` in the returned DataFrame.
+        The remaining (1st, 2nd, 3rd, etc.) elements of this list are output as extra columns in the returned pandas DataFrame with names formatted as:
+        `consensus_label_SUFFIX`, `consensus_quality_score_SUFFIX` where `SUFFIX` = each element of this
+        list, which must correspond to a valid method for computing consensus labels.
+    quality_method : str, default = "crowdlab"
+        Specifies the method used to calculate the quality of the consensus label.
+        Options include:
+
+        * ``crowdlab``: an ensemble method that weighs both the annotators' labels and the model's prediction.
+        * ``agreement``: the fraction of annotators that agree with the consensus label.
+    calibrate_probs : bool, default = False
+        Boolean value that specifies whether the provided `pred_probs` should be re-calibrated to better match the annotators' empirical label distribution.
+        We recommend setting this to True in active learning applications, in order to prevent overconfident models from suggesting the wrong examples to collect labels for.
+    return_detailed_quality: bool, default = True
+        Boolean to specify if `detailed_label_quality` is returned.
+    return_annotator_stats : bool, default = True
+        Boolean to specify if `annotator_stats` is returned.
+    return_weights : bool, default = False
+        Boolean to specify if `model_weight` and `annotator_weight` are returned.
+        Model and annotator weights are only applicable for ``quality_method == crowdlab``; setting ``return_weights=True`` with any other quality method raises a ``ValueError``.
+    verbose : bool, default = True
+        Important warnings and other printed statements may be suppressed if ``verbose`` is set to ``False``.
+    label_quality_score_kwargs : dict, optional
+        Keyword arguments to pass into :py:func:`get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`.
+
+    Returns
+    -------
+    labels_info : dict
+        Dictionary containing up to 5 entries with keys as below:
+
+        ``label_quality`` : pandas.DataFrame
+            pandas DataFrame in which each row corresponds to one example, with columns:
+
+            * ``num_annotations``: the number of annotators that have labeled each example.
+            * ``consensus_label``: the single label that is best for each example (you can control how it is derived from all annotators' labels via the argument: ``consensus_method``).
+            * ``annotator_agreement``: the fraction of annotators that agree with the consensus label (only considering the annotators that labeled that particular example).
+            * ``consensus_quality_score``: label quality score for consensus label, calculated by the method specified in ``quality_method``.
+
+        ``detailed_label_quality`` : pandas.DataFrame
+            Only returned if `return_detailed_quality=True`.
+            Returns a pandas DataFrame with columns `quality_annotator_1`, `quality_annotator_2`, ..., `quality_annotator_M` where each entry is
+            the label quality score for the labels provided by each annotator (is ``NaN`` for examples which this annotator did not label).
+
+        ``annotator_stats`` : pandas.DataFrame
+            Only returned if `return_annotator_stats=True`.
+            Returns overall statistics about each annotator, sorted by lowest annotator_quality first.
+            pandas DataFrame in which each row corresponds to one annotator (the row IDs correspond to annotator IDs), with columns:
+
+            * ``annotator_quality``: overall quality of a given annotator's labels, calculated by the method specified in ``quality_method``.
+            * ``num_examples_labeled``: number of examples annotated by a given annotator.
+            * ``agreement_with_consensus``: fraction of examples where a given annotator agrees with the consensus label.
+            * ``worst_class``: the class that is most frequently mislabeled by a given annotator.
+
+        ``model_weight`` : float
+            Only returned if `return_weights=True`. It is only applicable for ``quality_method == crowdlab``.
+            The model weight specifies the weight of the classifier model in the weighted averages used to estimate label quality.
+            This number is an estimate of how trustworthy the model is relative to the annotators.
+
+        ``annotator_weight`` : np.ndarray
+            Only returned if `return_weights=True`. It is only applicable for ``quality_method == crowdlab``.
+            An array of shape ``(M,)`` where M is the number of annotators, specifying the weight of each annotator in weighted averages used to estimate label quality.
+            These weights are estimates of how trustworthy each annotator is relative to the other annotators.
+
+    """
+
+    if isinstance(labels_multiannotator, pd.DataFrame):
+        annotator_ids = labels_multiannotator.columns
+        index_col = labels_multiannotator.index
+        labels_multiannotator = (
+            labels_multiannotator.replace({pd.NA: np.nan}).astype(float).to_numpy()
+        )
+    elif isinstance(labels_multiannotator, np.ndarray):
+        annotator_ids = None
+        index_col = None
+    else:
+        raise ValueError("labels_multiannotator must be either a NumPy array or Pandas DataFrame.")
+
+    if return_weights == True and quality_method != "crowdlab":
+        raise ValueError(
+            "Model and annotator weights are only applicable to the crowdlab quality method. "
+            "Either set return_weights=False or quality_method='crowdlab'."
+        )
+
+    assert_valid_inputs_multiannotator(
+        labels_multiannotator, pred_probs, annotator_ids=annotator_ids
+    )
+
+    # Count number of non-NaN values for each example
+    num_annotations = np.sum(~np.isnan(labels_multiannotator), axis=1)
+
+    # calibrate pred_probs
+    if calibrate_probs:
+        optimal_temp = find_best_temp_scaler(labels_multiannotator, pred_probs)
+        pred_probs = temp_scale_pred_probs(pred_probs, optimal_temp)
+
+    if not isinstance(consensus_method, list):
+        consensus_method = [consensus_method]
+
+    if "best_quality" in consensus_method or "majority_vote" in consensus_method:
+        majority_vote_label = get_majority_vote_label(
+            labels_multiannotator=labels_multiannotator,
+            pred_probs=pred_probs,
+            verbose=False,
+        )
+        (
+            MV_annotator_agreement,
+            MV_consensus_quality_score,
+            MV_post_pred_probs,
+            MV_model_weight,
+            MV_annotator_weight,
+        ) = _get_consensus_stats(
+            labels_multiannotator=labels_multiannotator,
+            pred_probs=pred_probs,
+            num_annotations=num_annotations,
+            consensus_label=majority_vote_label,
+            quality_method=quality_method,
+            verbose=verbose,
+            label_quality_score_kwargs=label_quality_score_kwargs,
+        )
+
+    label_quality = pd.DataFrame({"num_annotations": num_annotations}, index=index_col)
+    valid_methods = ["majority_vote", "best_quality"]
+    main_method = True
+
+ for curr_method in consensus_method: + # getting consensus label and stats + if curr_method == "majority_vote": + consensus_label = majority_vote_label + annotator_agreement = MV_annotator_agreement + consensus_quality_score = MV_consensus_quality_score + post_pred_probs = MV_post_pred_probs + model_weight = MV_model_weight + annotator_weight = MV_annotator_weight + + elif curr_method == "best_quality": + consensus_label = np.full(len(majority_vote_label), np.nan) + for i in range(len(consensus_label)): + max_pred_probs_ind = np.where( + MV_post_pred_probs[i] == np.max(MV_post_pred_probs[i]) + )[0] + if len(max_pred_probs_ind) == 1: + consensus_label[i] = max_pred_probs_ind[0] + else: + consensus_label[i] = majority_vote_label[i] + consensus_label = consensus_label.astype(int) # convert all label types to int + + ( + annotator_agreement, + consensus_quality_score, + post_pred_probs, + model_weight, + annotator_weight, + ) = _get_consensus_stats( + labels_multiannotator=labels_multiannotator, + pred_probs=pred_probs, + num_annotations=num_annotations, + consensus_label=consensus_label, + quality_method=quality_method, + verbose=verbose, + label_quality_score_kwargs=label_quality_score_kwargs, + ) + + else: + raise ValueError( + f""" + {curr_method} is not a valid consensus method!
+ Please choose a valid consensus_method: {valid_methods} + """ + ) + + if verbose: + # check if any classes no longer appear in the set of consensus labels + check_consensus_label_classes( + labels_multiannotator=labels_multiannotator, + consensus_label=consensus_label, + consensus_method=curr_method, + ) + + # saving stats into dataframe, computing additional stats if specified + if main_method: + ( + label_quality["consensus_label"], + label_quality["consensus_quality_score"], + label_quality["annotator_agreement"], + ) = ( + consensus_label, + consensus_quality_score, + annotator_agreement, + ) + + label_quality = label_quality.reindex( + columns=[ + "consensus_label", + "consensus_quality_score", + "annotator_agreement", + "num_annotations", + ] + ) + + # default variable for _get_annotator_stats + detailed_label_quality = None + + if return_detailed_quality: + # Compute the label quality scores for each annotator's labels + detailed_label_quality = np.apply_along_axis( + _get_annotator_label_quality_score, + axis=0, + arr=labels_multiannotator, + pred_probs=post_pred_probs, + label_quality_score_kwargs=label_quality_score_kwargs, + ) + detailed_label_quality_df = pd.DataFrame( + detailed_label_quality, index=index_col, columns=annotator_ids + ).add_prefix("quality_annotator_") + + if return_annotator_stats: + annotator_stats = _get_annotator_stats( + labels_multiannotator=labels_multiannotator, + pred_probs=post_pred_probs, + consensus_label=consensus_label, + num_annotations=num_annotations, + annotator_agreement=annotator_agreement, + model_weight=model_weight, + annotator_weight=annotator_weight, + consensus_quality_score=consensus_quality_score, + detailed_label_quality=detailed_label_quality, + annotator_ids=annotator_ids, + quality_method=quality_method, + ) + + main_method = False + + else: + ( + label_quality[f"consensus_label_{curr_method}"], + label_quality[f"consensus_quality_score_{curr_method}"], +
label_quality[f"annotator_agreement_{curr_method}"], + ) = ( + consensus_label, + consensus_quality_score, + annotator_agreement, + ) + + labels_info = { + "label_quality": label_quality, + } + + if return_detailed_quality: + labels_info["detailed_label_quality"] = detailed_label_quality_df + if return_annotator_stats: + labels_info["annotator_stats"] = annotator_stats + if return_weights: + labels_info["model_weight"] = model_weight + labels_info["annotator_weight"] = annotator_weight + + return labels_info
+ + +
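The ``best_quality`` branch above chooses, for each example, the class with the highest post-consensus probability in ``MV_post_pred_probs``. When that maximum is tied, it falls back to the majority-vote label. Below is a minimal vectorized NumPy sketch of that selection rule, not the library's own code. ``best_quality_consensus`` is a hypothetical helper name, and the toy arrays are made up for illustration.

```python
import numpy as np


def best_quality_consensus(post_pred_probs: np.ndarray, majority_vote_label: np.ndarray) -> np.ndarray:
    """Pick the highest-probability class per example; keep the majority vote on ties."""
    consensus = majority_vote_label.copy()
    row_max = post_pred_probs.max(axis=1, keepdims=True)
    # Number of classes reaching the row maximum (more than one means a tie).
    n_max = (post_pred_probs == row_max).sum(axis=1)
    unique = n_max == 1
    consensus[unique] = post_pred_probs[unique].argmax(axis=1)
    return consensus.astype(int)


post = np.array([
    [0.7, 0.2, 0.1],  # clear winner: class 0
    [0.4, 0.4, 0.2],  # tie between classes 0 and 1, so keep the majority vote
    [0.1, 0.3, 0.6],  # clear winner: class 2
])
mv = np.array([1, 1, 1])
print(best_quality_consensus(post, mv))
```

The loop in the source and this vectorized version should agree. The vectorized form simply avoids a Python-level loop over examples.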
def get_label_quality_multiannotator_ensemble( + labels_multiannotator: Union[pd.DataFrame, np.ndarray], + pred_probs: np.ndarray, + *, + calibrate_probs: bool = False, + return_detailed_quality: bool = True, + return_annotator_stats: bool = True, + return_weights: bool = False, + verbose: bool = True, + label_quality_score_kwargs: dict = {}, +) -> Dict[str, Any]: + """Returns label quality scores for each example and for each annotator, based on predictions from an ensemble of models. + + This function is similar to `~cleanlab.multiannotator.get_label_quality_multiannotator` but for settings where + you have trained an ensemble of multiple classifier models rather than a single model. + + Parameters + ---------- + labels_multiannotator : pd.DataFrame or np.ndarray + Multiannotator labels in the same format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + pred_probs : np.ndarray + An array of shape ``(P, N, K)`` where P is the number of models, consisting of predicted class probabilities from the ensemble models. + Each set of predicted probabilities with shape ``(N, K)`` is in the same format expected by :py:func:`get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`. + calibrate_probs : bool, default = False + Boolean flag with the same meaning as in `~cleanlab.multiannotator.get_label_quality_multiannotator`. + return_detailed_quality : bool, default = True + Boolean flag with the same meaning as in `~cleanlab.multiannotator.get_label_quality_multiannotator`. + return_annotator_stats : bool, default = True + Boolean flag with the same meaning as in `~cleanlab.multiannotator.get_label_quality_multiannotator`. + return_weights : bool, default = False + Boolean flag with the same meaning as in `~cleanlab.multiannotator.get_label_quality_multiannotator`. + verbose : bool, default = True + Boolean flag with the same meaning as in `~cleanlab.multiannotator.get_label_quality_multiannotator`.
+ label_quality_score_kwargs : dict, optional + Keyword arguments in the same format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + + Returns + ------- + labels_info : dict + Dictionary containing up to 5 entries with the keys below: + + ``label_quality`` : pandas.DataFrame + Analogous to the corresponding output of `~cleanlab.multiannotator.get_label_quality_multiannotator`. + + ``detailed_label_quality`` : pandas.DataFrame + Analogous to the corresponding output of `~cleanlab.multiannotator.get_label_quality_multiannotator`. + + ``annotator_stats`` : pandas.DataFrame + Analogous to the corresponding output of `~cleanlab.multiannotator.get_label_quality_multiannotator`. + + ``model_weight`` : np.ndarray + Only returned if ``return_weights=True``. + An array of shape ``(P,)``, where P is the number of models in the ensemble, specifying the weight of each classifier model in the weighted averages used to estimate label quality. + Each weight estimates how trustworthy that model is relative to the annotators. + + ``annotator_weight`` : np.ndarray + Only returned if ``return_weights=True``. + Analogous to the corresponding output of `~cleanlab.multiannotator.get_label_quality_multiannotator`.
+ + See Also + -------- + get_label_quality_multiannotator + """ + if isinstance(labels_multiannotator, pd.DataFrame): + annotator_ids = labels_multiannotator.columns + index_col = labels_multiannotator.index + labels_multiannotator = ( + labels_multiannotator.replace({pd.NA: np.NaN}).astype(float).to_numpy() + ) + elif isinstance(labels_multiannotator, np.ndarray): + annotator_ids = None + index_col = None + else: + raise ValueError("labels_multiannotator must be either a NumPy array or Pandas DataFrame.") + + assert_valid_inputs_multiannotator( + labels_multiannotator, pred_probs, ensemble=True, annotator_ids=annotator_ids + ) + + # Count number of non-NaN values for each example + num_annotations = np.sum(~np.isnan(labels_multiannotator), axis=1) + + # temp scale pred_probs + if calibrate_probs: + for i in range(len(pred_probs)): + curr_pred_probs = pred_probs[i] + optimal_temp = find_best_temp_scaler(labels_multiannotator, curr_pred_probs) + pred_probs[i] = temp_scale_pred_probs(curr_pred_probs, optimal_temp) + + label_quality = pd.DataFrame({"num_annotations": num_annotations}, index=index_col) + + # get majority vote stats + avg_pred_probs = np.mean(pred_probs, axis=0) + majority_vote_label = get_majority_vote_label( + labels_multiannotator=labels_multiannotator, + pred_probs=avg_pred_probs, + verbose=False, + ) + ( + MV_annotator_agreement, + MV_consensus_quality_score, + MV_post_pred_probs, + MV_model_weight, + MV_annotator_weight, + ) = _get_consensus_stats( + labels_multiannotator=labels_multiannotator, + pred_probs=pred_probs, + num_annotations=num_annotations, + consensus_label=majority_vote_label, + verbose=verbose, + ensemble=True, + label_quality_score_kwargs=label_quality_score_kwargs, + ) + + # get crowdlab stats + consensus_label = np.full(len(majority_vote_label), np.nan) + for i in range(len(consensus_label)): + max_pred_probs_ind = np.where(MV_post_pred_probs[i] == np.max(MV_post_pred_probs[i]))[0] + if len(max_pred_probs_ind) == 1: + 
consensus_label[i] =
+ max_pred_probs_ind[0] + else: + consensus_label[i] = majority_vote_label[i] + consensus_label = consensus_label.astype(int) # convert all label types to int + + ( + annotator_agreement, + consensus_quality_score, + post_pred_probs, + model_weight, + annotator_weight, + ) = _get_consensus_stats( + labels_multiannotator=labels_multiannotator, + pred_probs=pred_probs, + num_annotations=num_annotations, + consensus_label=consensus_label, + verbose=verbose, + ensemble=True, + label_quality_score_kwargs=label_quality_score_kwargs, + ) + + if verbose: + # check if any classes no longer appear in the set of consensus labels + check_consensus_label_classes( + labels_multiannotator=labels_multiannotator, + consensus_label=consensus_label, + consensus_method="crowdlab", + ) + + ( + label_quality["consensus_label"], + label_quality["consensus_quality_score"], + label_quality["annotator_agreement"], + ) = ( + consensus_label, + consensus_quality_score, + annotator_agreement, + ) + + label_quality = label_quality.reindex( + columns=[ + "consensus_label", + "consensus_quality_score", + "annotator_agreement", + "num_annotations", + ] + ) + + # default variable for _get_annotator_stats + detailed_label_quality = None + + if return_detailed_quality: + # Compute the label quality scores for each annotator's labels + detailed_label_quality = np.apply_along_axis( + _get_annotator_label_quality_score, + axis=0, + arr=labels_multiannotator, + pred_probs=post_pred_probs, + label_quality_score_kwargs=label_quality_score_kwargs, + ) + detailed_label_quality_df = pd.DataFrame( + detailed_label_quality, index=index_col, columns=annotator_ids + ).add_prefix("quality_annotator_") + + if return_annotator_stats: + annotator_stats = _get_annotator_stats( + labels_multiannotator=labels_multiannotator, + pred_probs=post_pred_probs, + consensus_label=consensus_label, + num_annotations=num_annotations, + annotator_agreement=annotator_agreement, + 
model_weight=np.mean(model_weight), # use average model weight when scoring
annotators + annotator_weight=annotator_weight, + consensus_quality_score=consensus_quality_score, + detailed_label_quality=detailed_label_quality, + annotator_ids=annotator_ids, + ) + + labels_info = { + "label_quality": label_quality, + } + + if return_detailed_quality: + labels_info["detailed_label_quality"] = detailed_label_quality_df + if return_annotator_stats: + labels_info["annotator_stats"] = annotator_stats + if return_weights: + labels_info["model_weight"] = model_weight + labels_info["annotator_weight"] = annotator_weight + + return labels_info
+ + +
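In the ensemble variant, ``pred_probs`` has an extra leading model axis of size P. The sketch below uses made-up probabilities to show the two reductions visible in the code above:

- averaging over axis 0 to get the ``(N, K)`` matrix passed to ``get_majority_vote_label``;
- collapsing the per-model weights to one scalar with ``np.mean`` before scoring annotators.

The weight values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical ensemble predictions: P=3 models, N=2 examples, K=2 classes.
pred_probs = np.array([
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.6, 0.4]],
    [[0.8, 0.2], [0.2, 0.8]],
])
num_models, num_examples, num_classes = pred_probs.shape

# Average over the model axis: one (N, K) matrix, as used for majority-vote tie-breaking.
avg_pred_probs = np.mean(pred_probs, axis=0)
print(avg_pred_probs.shape)           # (2, 2)
print(avg_pred_probs.argmax(axis=1))  # [0 1]

# Hypothetical per-model weights, collapsed to a single scalar for annotator scoring.
model_weight = np.array([0.5, 0.3, 0.4])
avg_model_weight = np.mean(model_weight)
print(round(float(avg_model_weight), 3))  # 0.4
```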
def get_active_learning_scores( + labels_multiannotator: Optional[Union[pd.DataFrame, np.ndarray]] = None, + pred_probs: Optional[np.ndarray] = None, + pred_probs_unlabeled: Optional[np.ndarray] = None, +) -> Tuple[np.ndarray, np.ndarray]: + """Returns an ActiveLab quality score for each example in the dataset, to estimate which examples are most informative to (re)label next in active learning. + + We consider settings where one example can be labeled by one or more annotators and some examples have no labels at all so far. + + The score is between 0 and 1, and can be used to prioritize which examples to collect additional labels for. + Lower scores indicate examples whose true label we are least confident about based on the current data; + collecting additional labels for these low-scoring examples will be more informative than collecting labels for other examples. + To use an annotation budget most efficiently, select a batch of examples with the lowest scores and collect one additional label for each example, + and repeat this process after retraining your classifier. + + You can use this function to get active learning scores for examples that already have one or more labels (specify ``labels_multiannotator`` and ``pred_probs`` + as arguments), for unlabeled examples (specify ``pred_probs_unlabeled``), or for both types of examples (specify all of the above arguments). + + To analyze a fixed dataset labeled by multiple annotators rather than collecting additional labels, try the + `~cleanlab.multiannotator.get_label_quality_multiannotator` (CROWDLAB) function instead. + + Parameters + ---------- + labels_multiannotator : pd.DataFrame or np.ndarray, optional + 2D pandas DataFrame or array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. Note that this function also works with + datasets where there is only one annotator (M=1).
+ Labels must be in the same format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + Examples that have no annotator labels should not be included in this DataFrame/array. + This argument is optional if ``pred_probs`` is not provided (e.g. when you only provide ``pred_probs_unlabeled`` to get active learning scores for the unlabeled examples). + pred_probs : np.ndarray, optional + An array of shape ``(N, K)`` of predicted class probabilities from a trained classifier model. + Predicted probabilities must be in the same format expected by :py:func:`get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`. + This argument is optional if you only want to get active learning scores for unlabeled examples (specify only ``pred_probs_unlabeled`` instead). + pred_probs_unlabeled : np.ndarray, optional + An array of shape ``(N, K)`` of predicted class probabilities from a trained classifier model for examples that have no annotator labels + (here N is the number of unlabeled examples). + Predicted probabilities must be in the same format expected by :py:func:`get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`. + This argument is optional if you only want to get active learning scores for already-labeled examples (specify only ``pred_probs`` instead). + + Returns + ------- + active_learning_scores : np.ndarray + Array of shape ``(N,)`` containing the ActiveLab quality score for each labeled example. + This array is empty if no already-labeled data was provided via ``labels_multiannotator``. + Examples with the lowest scores are those we should label next in order to maximally improve our classifier model. + + active_learning_scores_unlabeled : np.ndarray + Array containing the active learning quality score for each unlabeled example (one entry per row of ``pred_probs_unlabeled``). + This array is empty if no unlabeled data is provided.
+ Examples with the lowest scores are those we should label next in order to maximally improve our classifier model + (scores for unlabeled data are directly comparable with the ``active_learning_scores`` for labeled data). + """ + + assert_valid_pred_probs(pred_probs=pred_probs, pred_probs_unlabeled=pred_probs_unlabeled) + + # compute multiannotator stats if labeled data is provided + if pred_probs is not None: + if labels_multiannotator is None: + raise ValueError( + "labels_multiannotator cannot be None when passing in pred_probs. " + "Either provide labels_multiannotator to obtain active learning scores for the labeled examples, " + "or just pass in pred_probs_unlabeled to get active learning scores for unlabeled examples.", + ) + + if isinstance(labels_multiannotator, pd.DataFrame): + labels_multiannotator = ( + labels_multiannotator.replace({pd.NA: np.NaN}).astype(float).to_numpy() + ) + elif not isinstance(labels_multiannotator, np.ndarray): + raise ValueError( + "labels_multiannotator must be either a NumPy array or Pandas DataFrame." + ) + # check that labels_multiannotator is a 2D array + if labels_multiannotator.ndim != 2: + raise ValueError( + "labels_multiannotator must be a 2D array or dataframe, " + "each row represents an example and each column represents an annotator."
+ ) + + num_classes = get_num_classes(pred_probs=pred_probs) + + # if all examples are only labeled by a single annotator + if (np.sum(~np.isnan(labels_multiannotator), axis=1) == 1).all(): + optimal_temp = 1.0 # do not temp scale for single annotator case, temperature is defined here for later use + + assert_valid_inputs_multiannotator( + labels_multiannotator, pred_probs, allow_single_label=True + ) + + consensus_label = get_majority_vote_label( + labels_multiannotator=labels_multiannotator, + pred_probs=pred_probs, + verbose=False, + ) + quality_of_consensus_labeled = get_label_quality_scores(consensus_label, pred_probs) + model_weight = 1 + annotator_weight = np.full(labels_multiannotator.shape[1], 1) + avg_annotator_weight = np.mean(annotator_weight) + + # examples are annotated by multiple annotators + else: + optimal_temp = find_best_temp_scaler(labels_multiannotator, pred_probs) + pred_probs = temp_scale_pred_probs(pred_probs, optimal_temp) + + multiannotator_info = get_label_quality_multiannotator( + labels_multiannotator, + pred_probs, + return_annotator_stats=False, + return_detailed_quality=False, + return_weights=True, + ) + + quality_of_consensus_labeled = multiannotator_info["label_quality"][ + "consensus_quality_score" + ] + model_weight = multiannotator_info["model_weight"] + annotator_weight = multiannotator_info["annotator_weight"] + avg_annotator_weight = np.mean(annotator_weight) + + # compute scores for labeled data + active_learning_scores = np.full(len(labels_multiannotator), np.nan) + for i, annotator_labels in enumerate(labels_multiannotator): + active_learning_scores[i] = np.average( + (quality_of_consensus_labeled[i], 1 / num_classes), + weights=( + np.sum(annotator_weight[~np.isnan(annotator_labels)]) + model_weight, + avg_annotator_weight, + ), + ) + + # no labeled data provided so do not estimate temperature and model/annotator weights + elif pred_probs_unlabeled is not None: + num_classes = 
get_num_classes(pred_probs=pred_probs_unlabeled) + optimal_temp = 1 + model_weight = 1 + avg_annotator_weight = 1 + active_learning_scores = np.array([]) + + else: + raise ValueError( + "pred_probs and pred_probs_unlabeled cannot both be None, specify at least one of the two." + ) + + # compute scores for unlabeled data + if pred_probs_unlabeled is not None: + pred_probs_unlabeled = temp_scale_pred_probs(pred_probs_unlabeled, optimal_temp) + quality_of_consensus_unlabeled = np.max(pred_probs_unlabeled, axis=1) + + active_learning_scores_unlabeled = np.average( + np.stack( + [ + quality_of_consensus_unlabeled, + np.full(len(quality_of_consensus_unlabeled), 1 / num_classes), + ] + ), + weights=[model_weight, avg_annotator_weight], + axis=0, + ) + + else: + active_learning_scores_unlabeled = np.array([]) + + return active_learning_scores, active_learning_scores_unlabeled
+ + +
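Both scores above are weighted averages between a confidence term and the uniform baseline ``1 / num_classes``:

- For a labeled example, the confidence term is the consensus quality score. Its weight is the model weight plus the summed weights of the annotators who labeled that example.
- For an unlabeled example, the confidence term is the largest predicted probability, weighted by the model weight alone.

The sketch below restates that arithmetic with made-up weights. It skips the temperature scaling applied above (equivalent to a temperature of 1), and the helper names are hypothetical.

```python
import numpy as np


def activelab_labeled_sketch(consensus_quality, labels, annotator_weight, model_weight, num_classes):
    """Weighted average of consensus quality and 1/K for each labeled example."""
    avg_annotator_weight = np.mean(annotator_weight)
    scores = np.full(len(labels), np.nan)
    for i, row in enumerate(labels):
        # Trust in the consensus grows with the weights of annotators who labeled example i.
        trust = np.sum(annotator_weight[~np.isnan(row)]) + model_weight
        scores[i] = np.average(
            (consensus_quality[i], 1 / num_classes),
            weights=(trust, avg_annotator_weight),
        )
    return scores


def activelab_unlabeled_sketch(pred_probs_unlabeled, model_weight, avg_annotator_weight):
    """Weighted average of the max predicted probability and 1/K for each unlabeled example."""
    num_classes = pred_probs_unlabeled.shape[1]
    confidence = np.max(pred_probs_unlabeled, axis=1)
    baseline = np.full(len(confidence), 1 / num_classes)
    return np.average(
        np.stack([confidence, baseline]),
        weights=[model_weight, avg_annotator_weight],
        axis=0,
    )


labels = np.array([[0, 0], [1, np.nan]])  # example 1 has a single label
annotator_weight = np.array([1.0, 1.0])
labeled = activelab_labeled_sketch(np.array([0.9, 0.9]), labels, annotator_weight, 1.0, 2)
unlabeled = activelab_unlabeled_sketch(np.array([[0.9, 0.1]]), 1.0, 1.0)
print(labeled, unlabeled)
```

With equal confidence, the example with fewer labels scores lower and the unlabeled one scores lowest. It would therefore be prioritized in the next annotation round.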
def get_active_learning_scores_ensemble( + labels_multiannotator: Optional[Union[pd.DataFrame, np.ndarray]] = None, + pred_probs: Optional[np.ndarray] = None, + pred_probs_unlabeled: Optional[np.ndarray] = None, +) -> Tuple[np.ndarray, np.ndarray]: + """Returns an ActiveLab quality score for each example in the dataset, based on predictions from an ensemble of models. + + This function is similar to `~cleanlab.multiannotator.get_active_learning_scores` but accepts predictions from an + ensemble of multiple trained classifier models, which it aggregates to compute the ActiveLab quality score. + + Parameters + ---------- + labels_multiannotator : pd.DataFrame or np.ndarray, optional + Multiannotator labels in the same format expected by `~cleanlab.multiannotator.get_active_learning_scores`. + Note that this function also works with datasets where there is only one annotator (M=1). + This argument is optional if ``pred_probs`` is not provided (in cases where you only provide ``pred_probs_unlabeled`` to get active learning scores for unlabeled examples). + pred_probs : np.ndarray, optional + An array of shape ``(P, N, K)`` where P is the number of models, consisting of predicted class probabilities from the ensemble models. + Each set of predicted probabilities with shape ``(N, K)`` is in the same format expected by :py:func:`get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`. + This argument is optional if you only want to get active learning scores for unlabeled examples (pass in ``pred_probs_unlabeled`` instead). + pred_probs_unlabeled : np.ndarray, optional + An array of shape ``(P, N, K)`` where P is the number of models, consisting of predicted class probabilities from the ensemble models + for examples that have no annotated labels so far (but which we may want to label in the future, and hence compute active learning quality scores for).
+ Each set of predicted probabilities with shape ``(N, K)`` is in the same format expected by :py:func:`get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`. + This argument is optional if you only want to get active learning scores for labeled examples (pass in ``pred_probs`` instead). + + Returns + ------- + active_learning_scores : np.ndarray + Analogous to the corresponding output of `~cleanlab.multiannotator.get_active_learning_scores`. + active_learning_scores_unlabeled : np.ndarray + Analogous to the corresponding output of `~cleanlab.multiannotator.get_active_learning_scores`. + + See Also + -------- + get_active_learning_scores + """ + + assert_valid_pred_probs( + pred_probs=pred_probs, pred_probs_unlabeled=pred_probs_unlabeled, ensemble=True + ) + + # compute multiannotator stats if labeled data is provided + if pred_probs is not None: + if labels_multiannotator is None: + raise ValueError( + "labels_multiannotator cannot be None when passing in pred_probs. " + "You can either provide labels_multiannotator to obtain active learning scores for the labeled examples, " + "or just pass in pred_probs_unlabeled to get active learning scores for unlabeled examples.", + ) + + if isinstance(labels_multiannotator, pd.DataFrame): + labels_multiannotator = ( + labels_multiannotator.replace({pd.NA: np.NaN}).astype(float).to_numpy() + ) + elif not isinstance(labels_multiannotator, np.ndarray): + raise ValueError( + "labels_multiannotator must be either a NumPy array or Pandas DataFrame." + ) + + # check that labels_multiannotator is a 2D array + if labels_multiannotator.ndim != 2: + raise ValueError( + "labels_multiannotator must be a 2D array or dataframe, " + "each row represents an example and each column represents an annotator."
+ ) + + num_classes = get_num_classes(pred_probs=pred_probs[0]) + + # if all examples are only labeled by a single annotator + if (np.sum(~np.isnan(labels_multiannotator), axis=1) == 1).all(): + # do not temp scale for single annotator case, temperature is defined here for later use + optimal_temp = np.full(len(pred_probs), 1.0) + + assert_valid_inputs_multiannotator( + labels_multiannotator, pred_probs, ensemble=True, allow_single_label=True + ) + + avg_pred_probs = np.mean(pred_probs, axis=0) + consensus_label = get_majority_vote_label( + labels_multiannotator=labels_multiannotator, + pred_probs=avg_pred_probs, + verbose=False, + ) + quality_of_consensus_labeled = get_label_quality_scores(consensus_label, avg_pred_probs) + model_weight = np.full(len(pred_probs), 1) + annotator_weight = np.full(labels_multiannotator.shape[1], 1) + avg_annotator_weight = np.mean(annotator_weight) + + # examples are annotated by multiple annotators + else: + optimal_temp = np.full(len(pred_probs), np.NaN) + for i, curr_pred_probs in enumerate(pred_probs): + curr_optimal_temp = find_best_temp_scaler(labels_multiannotator, curr_pred_probs) + pred_probs[i] = temp_scale_pred_probs(curr_pred_probs, curr_optimal_temp) + optimal_temp[i] = curr_optimal_temp + + multiannotator_info = get_label_quality_multiannotator_ensemble( + labels_multiannotator, + pred_probs, + return_annotator_stats=False, + return_detailed_quality=False, + return_weights=True, + ) + + quality_of_consensus_labeled = multiannotator_info["label_quality"][ + "consensus_quality_score" + ] + model_weight = multiannotator_info["model_weight"] + annotator_weight = multiannotator_info["annotator_weight"] + avg_annotator_weight = np.mean(annotator_weight) + + # compute scores for labeled data + active_learning_scores = np.full(len(labels_multiannotator), np.nan) + for i, annotator_labels in enumerate(labels_multiannotator): + active_learning_scores[i] = np.average( + (quality_of_consensus_labeled[i], 1 / num_classes), + 
weights=( + np.sum(annotator_weight[~np.isnan(annotator_labels)]) + np.sum(model_weight), + avg_annotator_weight, + ), + ) + + # no labeled data provided so do not estimate temperature and model/annotator weights + elif pred_probs_unlabeled is not None: + num_classes = get_num_classes(pred_probs=pred_probs_unlabeled[0]) + optimal_temp = np.full(len(pred_probs_unlabeled), 1.0) + model_weight = np.full(len(pred_probs_unlabeled), 1) + avg_annotator_weight = 1 + active_learning_scores = np.array([]) + + else: + raise ValueError( + "pred_probs and pred_probs_unlabeled cannot both be None, specify at least one of the two." + ) + + # compute scores for unlabeled data + if pred_probs_unlabeled is not None: + for i in range(len(pred_probs_unlabeled)): + pred_probs_unlabeled[i] = temp_scale_pred_probs( + pred_probs_unlabeled[i], optimal_temp[i] + ) + + avg_pred_probs_unlabeled = np.mean(pred_probs_unlabeled, axis=0) + consensus_label_unlabeled = get_majority_vote_label( + np.argmax(pred_probs_unlabeled, axis=2).T, + avg_pred_probs_unlabeled, + ) + modified_pred_probs_unlabeled = np.average( + np.concatenate( + ( + pred_probs_unlabeled, + np.full(pred_probs_unlabeled.shape[1:], 1 / num_classes)[np.newaxis, :, :], + ) + ), + weights=np.concatenate((model_weight, np.array([avg_annotator_weight]))), + axis=0, + ) + + active_learning_scores_unlabeled = get_label_quality_scores( + consensus_label_unlabeled, modified_pred_probs_unlabeled + ) + else: + active_learning_scores_unlabeled = np.array([]) + + return active_learning_scores, active_learning_scores_unlabeled
+ + +
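For unlabeled examples, the ensemble variant averages the P temperature-scaled model predictions together with one extra row of uniform probabilities ``1 / num_classes``. The models are weighted by ``model_weight`` and the uniform row by ``avg_annotator_weight``. The sketch below reproduces that weighted average with made-up inputs, and simplifies two steps:

- It reads the score off as the modified probability of the consensus class. This assumes ``get_label_quality_scores`` uses its default self-confidence scoring.
- It takes the consensus as the argmax of the model-averaged probabilities, rather than the majority vote over per-model predictions used above.

The helper name is hypothetical.

```python
import numpy as np


def ensemble_unlabeled_scores_sketch(pred_probs_unlabeled, model_weight, avg_annotator_weight):
    """pred_probs_unlabeled has shape (P, N, K); returns one score per example."""
    num_models, num_examples, num_classes = pred_probs_unlabeled.shape
    uniform = np.full((1, num_examples, num_classes), 1 / num_classes)
    # Stack the P model predictions with one uniform row: shape (P + 1, N, K).
    stacked = np.concatenate((pred_probs_unlabeled, uniform), axis=0)
    weights = np.concatenate((model_weight, [avg_annotator_weight]))
    modified = np.average(stacked, weights=weights, axis=0)  # shape (N, K)
    # Simplified consensus: argmax of the plain model average.
    consensus = np.mean(pred_probs_unlabeled, axis=0).argmax(axis=1)
    # Self-confidence: modified probability assigned to the consensus class.
    return modified[np.arange(num_examples), consensus]


pred_probs_unlabeled = np.array([
    [[0.8, 0.2]],  # model 1, one unlabeled example
    [[0.6, 0.4]],  # model 2
])
scores = ensemble_unlabeled_scores_sketch(pred_probs_unlabeled, np.array([1.0, 1.0]), 1.0)
print(scores)
```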
def get_majority_vote_label( + labels_multiannotator: Union[pd.DataFrame, np.ndarray], + pred_probs: Optional[np.ndarray] = None, + verbose: bool = True, +) -> np.ndarray: + """Returns the majority vote label for each example, aggregated from the labels given by multiple annotators. + + Parameters + ---------- + labels_multiannotator : pd.DataFrame or np.ndarray + 2D pandas DataFrame or array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + Labels must be in the same format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + pred_probs : np.ndarray, optional + An array of shape ``(N, K)`` of model-predicted probabilities, ``P(label=k|x)``. + Predicted probabilities must be in the same format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + verbose : bool, optional + Important warnings and other printed statements may be suppressed if ``verbose`` is set to ``False``. + + Returns + ------- + consensus_label : np.ndarray + An array of shape ``(N,)`` with the majority vote label aggregated from all annotators. + + In the event of majority vote ties, ties are broken in the following order: + using the model ``pred_probs`` (if provided) and selecting the class with the highest predicted probability, + using the empirical class frequencies and selecting the class with the lowest frequency (i.e. the minority class, to avoid worsening class imbalance), + using an initial annotator quality score and selecting the class whose annotators have the highest average quality, + and lastly by random selection.
+ """ + + if isinstance(labels_multiannotator, pd.DataFrame): + annotator_ids = labels_multiannotator.columns + labels_multiannotator = ( + labels_multiannotator.replace({pd.NA: np.NaN}).astype(float).to_numpy() + ) + elif isinstance(labels_multiannotator, np.ndarray): + annotator_ids = None + else: + raise ValueError("labels_multiannotator must be either a NumPy array or Pandas DataFrame.") + + if verbose: + assert_valid_inputs_multiannotator( + labels_multiannotator, pred_probs, annotator_ids=annotator_ids + ) + + if pred_probs is not None: + num_classes = pred_probs.shape[1] + else: + num_classes = int(np.nanmax(labels_multiannotator) + 1) + + array_idx = np.arange(labels_multiannotator.shape[0]) + label_count = np.zeros((labels_multiannotator.shape[0], num_classes)) + for i in range(labels_multiannotator.shape[1]): + not_nan_mask = ~np.isnan(labels_multiannotator[:, i]) + # Get the indexes where the label is not missing for the annotator i as int. + label_index = labels_multiannotator[not_nan_mask, i].astype(int) + # Increase the counts of those labels by 1. 
+ label_count[array_idx[not_nan_mask], label_index] += 1 + + mode_labels_multiannotator = np.full(label_count.shape, np.nan) + modes_mask = label_count == np.max(label_count, axis=1).reshape(-1, 1) + insert_index = np.zeros(modes_mask.shape[0], dtype=int) + for i in range(modes_mask.shape[1]): + mode_index = np.where(modes_mask[:, i])[0] + mode_labels_multiannotator[mode_index, insert_index[mode_index]] = i + insert_index[mode_index] += 1 + + majority_vote_label = np.full(len(labels_multiannotator), np.nan) + label_mode_count = (~np.isnan(mode_labels_multiannotator)).sum(axis=1) + + # obtaining consensus using annotator majority vote + mode_count_one_mask = label_mode_count == 1 + majority_vote_label[mode_count_one_mask] = mode_labels_multiannotator[mode_count_one_mask, 0] + nontied_idx = array_idx[mode_count_one_mask] + tied_idx = { + i: label_mode[:count].astype(int) + for i, label_mode, count in zip( + array_idx[~mode_count_one_mask], + mode_labels_multiannotator[~mode_count_one_mask, :], + label_mode_count[~mode_count_one_mask], + ) + } + + # tiebreak 1: using pred_probs (if provided) + if pred_probs is not None and len(tied_idx) > 0: + for idx, label_mode in tied_idx.copy().items(): + max_pred_probs = np.where( + pred_probs[idx, label_mode] == np.max(pred_probs[idx, label_mode]) + )[0] + if len(max_pred_probs) == 1: + majority_vote_label[idx] = label_mode[max_pred_probs[0]] + del tied_idx[idx] + else: + tied_idx[idx] = label_mode[max_pred_probs] + + # tiebreak 2: using empirical class frequencies + # current tiebreak will select the minority class (to prevent larger class imbalance) + if len(tied_idx) > 0: + class_frequencies = label_count.sum(axis=0) + for idx, label_mode in tied_idx.copy().items(): + min_frequency = np.where( + class_frequencies[label_mode] == np.min(class_frequencies[label_mode]) + )[0] + if len(min_frequency) == 1: + majority_vote_label[idx] = label_mode[min_frequency[0]] + del tied_idx[idx] + else: + tied_idx[idx] = 
+ label_mode[min_frequency] + + # tiebreak 3: using initial annotator quality scores + if len(tied_idx) > 0: + nontied_majority_vote_label = majority_vote_label[nontied_idx] + nontied_labels_multiannotator = labels_multiannotator[nontied_idx] + annotator_agreement_with_consensus = np.zeros(nontied_labels_multiannotator.shape[1]) + for i in range(len(annotator_agreement_with_consensus)): + labels = nontied_labels_multiannotator[:, i] + labels_mask = ~np.isnan(labels) + if np.sum(labels_mask) == 0: + annotator_agreement_with_consensus[i] = np.NaN + else: + annotator_agreement_with_consensus[i] = np.mean( + labels[labels_mask] == nontied_majority_vote_label[labels_mask] + ) + + # impute average annotator accuracy for any annotators that do not overlap with consensus + nan_mask = np.isnan(annotator_agreement_with_consensus) + avg_annotator_agreement = np.mean(annotator_agreement_with_consensus[~nan_mask]) + annotator_agreement_with_consensus[nan_mask] = avg_annotator_agreement + + for idx, label_mode in tied_idx.copy().items(): + label_quality_score = np.array( + [ + np.mean( + annotator_agreement_with_consensus[ + np.where(labels_multiannotator[idx] == label)[0] + ] + ) + for label in label_mode + ] + ) + max_score = np.where(label_quality_score == label_quality_score.max())[0] + if len(max_score) == 1: + majority_vote_label[idx] = label_mode[max_score[0]] + del tied_idx[idx] + else: + tied_idx[idx] = label_mode[max_score] + + # if still tied, break by random selection + if len(tied_idx) > 0: + warnings.warn( + f"breaking ties of examples {list(tied_idx.keys())} by random selection, you may want to set a random seed for reproducibility" + ) + for idx, label_mode in tied_idx.items(): + majority_vote_label[idx] = np.random.choice(label_mode) + + if verbose: + # check if any classes no longer appear in the set of consensus labels + check_consensus_label_classes( + labels_multiannotator=labels_multiannotator, + consensus_label=majority_vote_label, + 
consensus_method="majority_vote",
+ ) + + return majority_vote_label.astype(int)
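The tie-breaking step above (labelled "tiebreak 3") can be tried out in isolation. The sketch below is a hypothetical, self-contained helper: `tiebreak_by_annotator_agreement` is not a cleanlab function. It assumes that labels are stored as floats, with `np.nan` for missing annotations, and that ties are passed as a dict mapping each example index to its candidate labels. It scores each tied candidate by the mean agreement (with the non-tied majority vote) of the annotators who chose it, and imputes the average agreement for annotators with no overlap.

```python
import numpy as np


def tiebreak_by_annotator_agreement(labels_multiannotator, majority_vote_label, tied_idx):
    """Hypothetical helper mirroring the "tiebreak 3" step shown above.

    labels_multiannotator: (N, M) array, np.nan where an annotator gave no label.
    majority_vote_label: (N,) int array; entries for tied examples are placeholders.
    tied_idx: dict mapping example index -> candidate labels that are still tied.
    Returns (labels with resolvable ties filled in, dict of ties still unresolved).
    """
    labels_multiannotator = np.asarray(labels_multiannotator, dtype=float)
    result = np.array(majority_vote_label, copy=True)
    remaining = {idx: np.asarray(c) for idx, c in tied_idx.items()}
    nontied = np.array([i for i in range(len(result)) if i not in remaining], dtype=int)

    # Agreement of each annotator with the majority vote on the non-tied examples.
    num_annotators = labels_multiannotator.shape[1]
    agreement = np.full(num_annotators, np.nan)
    for j in range(num_annotators):
        labels = labels_multiannotator[nontied, j]
        mask = ~np.isnan(labels)
        if mask.any():
            agreement[j] = np.mean(labels[mask] == result[nontied][mask])
    # Annotators with no overlap get the average agreement of the others.
    agreement[np.isnan(agreement)] = np.nanmean(agreement)

    for idx, candidates in list(remaining.items()):
        # Score each candidate by the mean agreement of the annotators who chose it.
        scores = np.array(
            [np.mean(agreement[labels_multiannotator[idx] == label]) for label in candidates]
        )
        best = np.flatnonzero(scores == scores.max())
        if len(best) == 1:
            result[idx] = candidates[best[0]]
            del remaining[idx]
        else:
            remaining[idx] = candidates[best]
    return result, remaining


labels = np.array([[0, 0, 1], [1, 1, 0], [np.nan, 1, 0]])
result, remaining = tiebreak_by_annotator_agreement(labels, np.array([0, 1, -1]), {2: [0, 1]})
print(result, remaining)  # annotator 2 never agreed, so label 1 (from annotator 1) wins
```

Any ties that survive this step are left in `remaining`. In the listing above, those are then broken with `np.random.choice`, which is why the warning suggests setting a seed.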
+ + +
def convert_long_to_wide_dataset( + labels_multiannotator_long: pd.DataFrame, +) -> pd.DataFrame: + """Converts a long format dataset to wide format which is suitable for passing into + `~cleanlab.multiannotator.get_label_quality_multiannotator`. + + The DataFrame must contain three columns named: + + #. ``task`` representing each example labeled by the annotators + #. ``annotator`` representing each annotator + #. ``label`` representing the label given by an annotator for the corresponding task (i.e. example) + + Parameters + ---------- + labels_multiannotator_long : pd.DataFrame + pandas DataFrame in long format with three columns named ``task``, ``annotator`` and ``label`` + + Returns + ------- + labels_multiannotator_wide : pd.DataFrame + pandas DataFrame of the proper format to be passed as ``labels_multiannotator`` for the other ``cleanlab.multiannotator`` functions. + """ + labels_multiannotator_wide = labels_multiannotator_long.pivot( + index="task", columns="annotator", values="label" + ) + labels_multiannotator_wide.index.name = None + labels_multiannotator_wide.columns.name = None + return labels_multiannotator_wide
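`convert_long_to_wide_dataset` is a thin wrapper around `pandas.DataFrame.pivot`. As a minimal sketch of what it produces, the standalone `long_to_wide` below repeats the same pivot on a small, made-up long-format table without importing cleanlab. The name `long_to_wide` is hypothetical. Annotators who did not label a task end up as NaN in the wide table.

```python
import pandas as pd


def long_to_wide(labels_long: pd.DataFrame) -> pd.DataFrame:
    # One row per task, one column per annotator; missing annotations become NaN.
    wide = labels_long.pivot(index="task", columns="annotator", values="label")
    # Drop the axis names left behind by pivot, as the function above does.
    wide.index.name = None
    wide.columns.name = None
    return wide


labels_long = pd.DataFrame(
    {
        "task": [0, 0, 1, 2, 2],
        "annotator": ["a", "b", "a", "a", "b"],
        "label": [1, 1, 0, 2, 1],
    }
)
wide = long_to_wide(labels_long)
print(wide)  # annotator "b" did not label task 1, so that cell is NaN
```

`pivot` requires each (`task`, `annotator`) pair to be unique. A long table in which an annotator labels the same task twice therefore raises a `ValueError`, and such duplicates need to be resolved before conversion.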
+ + +def _get_consensus_stats( + labels_multiannotator: np.ndarray, + pred_probs: np.ndarray, + num_annotations: np.ndarray, + consensus_label: np.ndarray, + quality_method: str = "crowdlab", + verbose: bool = True, + ensemble: bool = False, + label_quality_score_kwargs: dict = {}, +) -> tuple: + """Returns a tuple containing the annotator agreement, the quality of the consensus, the posterior predicted probabilities, and the model and annotator weights + + Parameters + ---------- + labels_multiannotator : np.ndarray + 2D numpy array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + pred_probs : np.ndarray + An array of shape ``(N, K)`` of model-predicted probabilities, ``P(label=k|x)``. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + num_annotations : np.ndarray + An array of shape ``(N,)`` with the number of annotators that have labeled each example. + consensus_label : np.ndarray + An array of shape ``(N,)`` with the consensus labels aggregated from all annotators. + quality_method : str, default = "crowdlab" (Options: ["crowdlab", "agreement"]) + Specifies the method used to calculate the quality of the consensus label. + For valid quality methods, view `~cleanlab.multiannotator.get_label_quality_multiannotator` + label_quality_score_kwargs : dict, optional + Keyword arguments to pass into ``get_label_quality_scores()``. + verbose : bool, default = True + Certain warnings and notes will be printed if ``verbose`` is set to ``True``. + ensemble : bool, default = False + Boolean flag to indicate whether the ``pred_probs`` passed are from ensemble models. + + Returns + ------- + stats : tuple + A tuple of (annotator_agreement, consensus_quality_score, post_pred_probs, model_weight, annotator_weight).
+ """ + + # compute the fraction of annotators agreeing with the consensus label for each example + annotator_agreement = _get_annotator_agreement_with_consensus( + labels_multiannotator=labels_multiannotator, + consensus_label=consensus_label, + ) + + # compute posterior predicted probabilities + if ensemble: + post_pred_probs, model_weight, annotator_weight = _get_post_pred_probs_and_weights_ensemble( + labels_multiannotator=labels_multiannotator, + consensus_label=consensus_label, + prior_pred_probs=pred_probs, + num_annotations=num_annotations, + annotator_agreement=annotator_agreement, + quality_method=quality_method, + verbose=verbose, + ) + else: + post_pred_probs, model_weight, annotator_weight = _get_post_pred_probs_and_weights( + labels_multiannotator=labels_multiannotator, + consensus_label=consensus_label, + prior_pred_probs=pred_probs, + num_annotations=num_annotations, + annotator_agreement=annotator_agreement, + quality_method=quality_method, + verbose=verbose, + ) + + # compute quality of the consensus labels + consensus_quality_score = _get_consensus_quality_score( + consensus_label=consensus_label, + pred_probs=post_pred_probs, + num_annotations=num_annotations, + annotator_agreement=annotator_agreement, + quality_method=quality_method, + label_quality_score_kwargs=label_quality_score_kwargs, + ) + + return ( + annotator_agreement, + consensus_quality_score, + post_pred_probs, + model_weight, + annotator_weight, + ) + + +def _get_annotator_stats( + labels_multiannotator: np.ndarray, + pred_probs: np.ndarray, + consensus_label: np.ndarray, + num_annotations: np.ndarray, + annotator_agreement: np.ndarray, + model_weight: np.ndarray, + annotator_weight: np.ndarray, + consensus_quality_score: np.ndarray, + detailed_label_quality: Optional[np.ndarray] = None, + annotator_ids: Optional[pd.Index] = None, + quality_method: str = "crowdlab", +) -> pd.DataFrame: + """Returns a DataFrame containing overall statistics about each annotator.
+ + Parameters + ---------- + labels_multiannotator : np.ndarray + 2D numpy array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + pred_probs : np.ndarray + An array of shape ``(N, K)`` of model-predicted probabilities, ``P(label=k|x)``. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + consensus_label : np.ndarray + An array of shape ``(N,)`` with the consensus labels aggregated from all annotators. + num_annotations : np.ndarray + An array of shape ``(N,)`` with the number of annotators that have labeled each example. + annotator_agreement : np.ndarray + An array of shape ``(N,)`` with the fraction of annotators that agree with each consensus label. + model_weight : float + float specifying the model weight used in weighted averages, + None if model weight is not used to compute quality scores + annotator_weight : np.ndarray + An array of shape ``(M,)`` where M is the number of annotators, specifying the annotator weights used in weighted averages, + None if annotator weights are not used to compute quality scores + consensus_quality_score : np.ndarray + An array of shape ``(N,)`` with the quality score of the consensus. + detailed_label_quality : np.ndarray, optional + An array of shape ``(N, M)`` containing the detailed label quality scores for all examples and annotators + annotator_ids : pd.Index, optional + Identifiers of the annotators, used as the index of the returned DataFrame. + quality_method : str, default = "crowdlab" (Options: ["crowdlab", "agreement"]) + Specifies the method used to calculate the quality of the consensus label. + For valid quality methods, view `~cleanlab.multiannotator.get_label_quality_multiannotator` + + Returns + ------- + annotator_stats : pd.DataFrame + Overall statistics about each annotator.
+ For details, see the documentation of `~cleanlab.multiannotator.get_label_quality_multiannotator`. + """ + + annotator_quality = _get_annotator_quality( + labels_multiannotator=labels_multiannotator, + pred_probs=pred_probs, + consensus_label=consensus_label, + num_annotations=num_annotations, + annotator_agreement=annotator_agreement, + model_weight=model_weight, + annotator_weight=annotator_weight, + detailed_label_quality=detailed_label_quality, + quality_method=quality_method, + ) + + # Compute the number of examples labeled by each annotator + num_examples_labeled = np.sum(~np.isnan(labels_multiannotator), axis=0) + + # Compute the fraction of labels annotated by each annotator that agree with the consensus label + # TODO: check if we should drop singleton labels here + agreement_with_consensus = np.zeros(labels_multiannotator.shape[1]) + for i in range(len(agreement_with_consensus)): + labels = labels_multiannotator[:, i] + labels_mask = ~np.isnan(labels) + agreement_with_consensus[i] = np.mean(labels[labels_mask] == consensus_label[labels_mask]) + + # Find the worst-labeled class for each annotator + worst_class = _get_annotator_worst_class( + labels_multiannotator=labels_multiannotator, + consensus_label=consensus_label, + consensus_quality_score=consensus_quality_score, + ) + + # Create multi-annotator stats DataFrame from its columns + annotator_stats = pd.DataFrame( + { + "annotator_quality": annotator_quality, + "agreement_with_consensus": agreement_with_consensus, + "worst_class": worst_class, + "num_examples_labeled": num_examples_labeled, + }, + index=annotator_ids, + ) + + return annotator_stats.sort_values(by=["annotator_quality", "agreement_with_consensus"]) + + +def _get_annotator_agreement_with_consensus( + labels_multiannotator: np.ndarray, + consensus_label: np.ndarray, +) -> np.ndarray: + """Returns the fraction of annotators that agree with the consensus label for each example.
 Note that the + fraction for each example only considers the annotators that labeled that particular example. + + Parameters + ---------- + labels_multiannotator : np.ndarray + 2D numpy array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + consensus_label : np.ndarray + An array of shape ``(N,)`` with the consensus labels aggregated from all annotators. + + Returns + ------- + annotator_agreement : np.ndarray + An array of shape ``(N,)`` with the fraction of annotators that agree with each consensus label. + """ + annotator_agreement = np.zeros(len(labels_multiannotator)) + for i in range(labels_multiannotator.shape[1]): + annotator_agreement += labels_multiannotator[:, i] == consensus_label + annotator_agreement /= (~np.isnan(labels_multiannotator)).sum(axis=1) + return annotator_agreement + + +def _get_annotator_agreement_with_annotators( + labels_multiannotator: np.ndarray, + num_annotations: np.ndarray, + verbose: bool = True, +) -> np.ndarray: + """Returns the average agreement of each annotator with other annotators that label the same examples. + + Parameters + ---------- + labels_multiannotator : np.ndarray + 2D numpy array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + num_annotations : np.ndarray + An array of shape ``(N,)`` with the number of annotators that have labeled each example. + verbose : bool, default = True + Certain warnings and notes will be printed if ``verbose`` is set to ``True``.
+ + Returns + ------- + annotator_agreement : np.ndarray + An array of shape ``(M,)`` where M is the number of annotators, with the agreement of each annotator with other + annotators that labeled the same examples. + """ + + annotator_agreement_with_annotators = np.zeros(labels_multiannotator.shape[1]) + for i in range(len(annotator_agreement_with_annotators)): + annotator_labels = labels_multiannotator[:, i] + annotator_labels_mask = ~np.isnan(annotator_labels) + annotator_agreement_with_annotators[i] = _get_single_annotator_agreement( + labels_multiannotator[annotator_labels_mask], num_annotations[annotator_labels_mask], i + ) + + # impute average annotator accuracy for any annotators that do not overlap with other annotators + non_overlap_mask = np.isnan(annotator_agreement_with_annotators) + if np.sum(non_overlap_mask) > 0: + if verbose: + print( + f"Annotator(s) {list(np.where(non_overlap_mask)[0])} did not annotate any examples that overlap with other annotators, \ + \nusing the average annotator agreement among other annotators as this annotator's agreement." + ) + + avg_annotator_agreement = np.mean(annotator_agreement_with_annotators[~non_overlap_mask]) + annotator_agreement_with_annotators[non_overlap_mask] = avg_annotator_agreement + + return annotator_agreement_with_annotators + + +def _get_single_annotator_agreement( + labels_multiannotator: np.ndarray, + num_annotations: np.ndarray, + annotator_idx: int, +) -> float: + """Returns the average agreement of a given annotator with other annotators that label the same examples. + + Parameters + ---------- + labels_multiannotator : np.ndarray + 2D numpy array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`.
+ num_annotations : np.ndarray + An array of shape ``(N,)`` with the number of annotators that have labeled each example. + annotator_idx : int + The index of the annotator we want to compute the annotator agreement for. + + Returns + ------- + annotator_agreement : float + A float representing the agreement of the given annotator with other annotators that labeled the same examples. + """ + adjusted_num_annotations = num_annotations - 1 + if np.sum(adjusted_num_annotations) == 0: + return np.NaN + + multi_annotations_mask = num_annotations > 1 + annotator_agreement_per_example = np.zeros(len(labels_multiannotator)) + for i in range(labels_multiannotator.shape[1]): + annotator_agreement_per_example[multi_annotations_mask] += ( + labels_multiannotator[multi_annotations_mask, annotator_idx] + == labels_multiannotator[multi_annotations_mask, i] + ) + annotator_agreement_per_example[multi_annotations_mask] = ( + annotator_agreement_per_example[multi_annotations_mask] - 1 + ) / adjusted_num_annotations[multi_annotations_mask] + + annotator_agreement = np.average(annotator_agreement_per_example, weights=num_annotations - 1) + return annotator_agreement + + +def _get_post_pred_probs_and_weights( + labels_multiannotator: np.ndarray, + consensus_label: np.ndarray, + prior_pred_probs: np.ndarray, + num_annotations: np.ndarray, + annotator_agreement: np.ndarray, + quality_method: str = "crowdlab", + verbose: bool = True, +) -> Tuple[np.ndarray, Optional[float], Optional[np.ndarray]]: + """Return the posterior predicted probabilities of each example given a specified quality method. + + Parameters + ---------- + labels_multiannotator : np.ndarray + 2D numpy array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`.
+ consensus_label : np.ndarray + An array of shape ``(N,)`` with the consensus labels aggregated from all annotators. + prior_pred_probs : np.ndarray + An array of shape ``(N, K)`` of prior predicted probabilities, ``P(label=k|x)``, usually the out-of-sample predicted probabilities computed by a model. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + num_annotations : np.ndarray + An array of shape ``(N,)`` with the number of annotators that have labeled each example. + annotator_agreement : np.ndarray + An array of shape ``(N,)`` with the fraction of annotators that agree with each consensus label. + quality_method : str, default = "crowdlab" (Options: ["crowdlab", "agreement"]) + Specifies the method used to calculate the quality of the consensus label. + For valid quality methods, view `~cleanlab.multiannotator.get_label_quality_multiannotator` + verbose : bool, default = True + Certain warnings and notes will be printed if ``verbose`` is set to ``True``. + + Returns + ------- + post_pred_probs : np.ndarray + An array of shape ``(N, K)`` with the posterior predicted probabilities.
+ + model_weight : float + float specifying the model weight used in weighted averages, + None if model weight is not used to compute quality scores + + annotator_weight : np.ndarray + An array of shape ``(M,)`` where M is the number of annotators, specifying the annotator weights used in weighted averages, + None if annotator weights are not used to compute quality scores + + """ + valid_methods = [ + "crowdlab", + "agreement", + ] + + # setting dummy variables for model and annotator weights that will be returned + # only relevant for quality_method == crowdlab, return None for all other methods + return_model_weight = None + return_annotator_weight = None + + if quality_method == "crowdlab": + num_classes = get_num_classes(pred_probs=prior_pred_probs) + + # likelihood that any annotator will or will not annotate the consensus label for any example + consensus_likelihood = np.mean(annotator_agreement[num_annotations != 1]) + non_consensus_likelihood = (1 - consensus_likelihood) / (num_classes - 1) + + # subset the dataset to only include examples with more than one annotation + mask = num_annotations != 1 + consensus_label_subset = consensus_label[mask] + prior_pred_probs_subset = prior_pred_probs[mask] + + # compute most likely class error + most_likely_class_error = np.clip( + np.mean( + consensus_label_subset + != np.argmax(np.bincount(consensus_label_subset, minlength=num_classes)) + ), + a_min=CLIPPING_LOWER_BOUND, + a_max=None, + ) + + # compute adjusted annotator agreement (used as annotator weights) + annotator_agreement_with_annotators = _get_annotator_agreement_with_annotators( + labels_multiannotator, num_annotations, verbose + ) + annotator_error = 1 - annotator_agreement_with_annotators + adjusted_annotator_agreement = np.clip( + 1 - (annotator_error / most_likely_class_error), a_min=CLIPPING_LOWER_BOUND, a_max=None + ) + # compute model weight + model_error = np.mean(np.argmax(prior_pred_probs_subset, axis=1) != consensus_label_subset) +
model_weight = np.max( + [(1 - (model_error / most_likely_class_error)), CLIPPING_LOWER_BOUND] + ) * np.sqrt(np.mean(num_annotations)) + + non_nan_mask = ~np.isnan(labels_multiannotator) + annotation_weight = np.zeros(labels_multiannotator.shape[0]) + for i in range(labels_multiannotator.shape[1]): + annotation_weight[non_nan_mask[:, i]] += adjusted_annotator_agreement[i] + total_weight = annotation_weight + model_weight + + # compute weighted average + post_pred_probs = np.full(prior_pred_probs.shape, np.nan) + for i in range(prior_pred_probs.shape[1]): + post_pred_probs[:, i] = prior_pred_probs[:, i] * model_weight + for k in range(labels_multiannotator.shape[1]): + mask = ~np.isnan(labels_multiannotator[:, k]) + post_pred_probs[mask, i] += np.where( + labels_multiannotator[mask, k] == i, + adjusted_annotator_agreement[k] * consensus_likelihood, + adjusted_annotator_agreement[k] * non_consensus_likelihood, + ) + post_pred_probs[:, i] /= total_weight + + return_model_weight = model_weight + return_annotator_weight = adjusted_annotator_agreement + + elif quality_method == "agreement": + num_classes = get_num_classes(pred_probs=prior_pred_probs) + label_counts = np.full((len(labels_multiannotator), num_classes), np.NaN) + for i, labels in enumerate(labels_multiannotator): + label_counts[i, :] = value_counts(labels[~np.isnan(labels)], num_classes=num_classes) + + post_pred_probs = label_counts / num_annotations.reshape(-1, 1) + + else: + raise ValueError( + f""" + {quality_method} is not a valid quality method! 
+ Please choose a valid quality_method: {valid_methods} + """ + ) + + return post_pred_probs, return_model_weight, return_annotator_weight + + +def _get_post_pred_probs_and_weights_ensemble( + labels_multiannotator: np.ndarray, + consensus_label: np.ndarray, + prior_pred_probs: np.ndarray, + num_annotations: np.ndarray, + annotator_agreement: np.ndarray, + quality_method: str = "crowdlab", + verbose: bool = True, +) -> Tuple[np.ndarray, Any, Any]: + """Return the posterior predicted class probabilities of each example given a specified quality method and prior predicted class probabilities from an ensemble of multiple classifier models. + + Parameters + ---------- + labels_multiannotator : np.ndarray + 2D numpy array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + consensus_label : np.ndarray + An array of shape ``(N,)`` with the consensus labels aggregated from all annotators. + prior_pred_probs : np.ndarray + An array of shape ``(P, N, K)`` where P is the number of models, consisting of prior predicted class probabilities from the ensemble models, usually out-of-sample. + Each set of predicted probabilities with shape ``(N, K)`` is in the same format expected by the :py:func:`get_label_quality_scores <cleanlab.rank.get_label_quality_scores>`. + num_annotations : np.ndarray + An array of shape ``(N,)`` with the number of annotators that have labeled each example. + annotator_agreement : np.ndarray + An array of shape ``(N,)`` with the fraction of annotators that agree with each consensus label.
+ quality_method : str, default = "crowdlab" (Options: ["crowdlab", "agreement"]) + Specifies the method used to calculate the quality of the consensus label. + For valid quality methods, view `~cleanlab.multiannotator.get_label_quality_multiannotator` + verbose : bool, default = True + Certain warnings and notes will be printed if ``verbose`` is set to ``True``. + + Returns + ------- + post_pred_probs : np.ndarray + An array of shape ``(N, K)`` with the posterior predicted probabilities. + + model_weight : np.ndarray + An array of shape ``(P,)`` where P is the number of models in this ensemble, specifying the model weight used in weighted averages, + ``None`` if model weight is not used to compute quality scores + + annotator_weight : np.ndarray + An array of shape ``(M,)`` where M is the number of annotators, specifying the annotator weights used in weighted averages, + ``None`` if annotator weights are not used to compute quality scores + + """ + + num_classes = get_num_classes(pred_probs=prior_pred_probs[0]) + + # likelihood that any annotator will or will not annotate the consensus label for any example + consensus_likelihood = np.mean(annotator_agreement[num_annotations != 1]) + non_consensus_likelihood = (1 - consensus_likelihood) / (num_classes - 1) + + # subset the dataset to only include examples with more than one annotation + mask = num_annotations != 1 + consensus_label_subset = consensus_label[mask] + + # compute most likely class error + most_likely_class_error = np.clip( + np.mean( + consensus_label_subset + != np.argmax(np.bincount(consensus_label_subset, minlength=num_classes)) + ), + a_min=CLIPPING_LOWER_BOUND, + a_max=None, + ) + + # compute adjusted annotator agreement (used as annotator weights) + annotator_agreement_with_annotators = _get_annotator_agreement_with_annotators( + labels_multiannotator, num_annotations, verbose + ) + annotator_error = 1 - annotator_agreement_with_annotators + adjusted_annotator_agreement = np.clip( + 1 -
(annotator_error / most_likely_class_error), a_min=CLIPPING_LOWER_BOUND, a_max=None + ) + + # compute model weight + model_weight = np.full(prior_pred_probs.shape[0], np.nan) + for idx in range(prior_pred_probs.shape[0]): + prior_pred_probs_subset = prior_pred_probs[idx][mask] + + model_error = np.mean(np.argmax(prior_pred_probs_subset, axis=1) != consensus_label_subset) + model_weight[idx] = np.max( + [(1 - (model_error / most_likely_class_error)), CLIPPING_LOWER_BOUND] + ) * np.sqrt(np.mean(num_annotations)) + + # compute weighted average + post_pred_probs = np.full(prior_pred_probs[0].shape, np.nan) + for i, labels in enumerate(labels_multiannotator): + labels_mask = ~np.isnan(labels) + labels_subset = labels[labels_mask] + post_pred_probs[i] = [ + np.average( + [prior_pred_probs[ind][i, true_label] for ind in range(prior_pred_probs.shape[0])] + + [ + ( + consensus_likelihood + if annotator_label == true_label + else non_consensus_likelihood + ) + for annotator_label in labels_subset + ], + weights=np.concatenate((model_weight, adjusted_annotator_agreement[labels_mask])), + ) + for true_label in range(num_classes) + ] + + return_model_weight = model_weight + return_annotator_weight = adjusted_annotator_agreement + + return post_pred_probs, return_model_weight, return_annotator_weight + + +def _get_consensus_quality_score( + consensus_label: np.ndarray, + pred_probs: np.ndarray, + num_annotations: np.ndarray, + annotator_agreement: np.ndarray, + quality_method: str = "crowdlab", + label_quality_score_kwargs: dict = {}, +) -> np.ndarray: + """Return scores representing the quality of the consensus label for each example. + + Parameters + ----------
+ consensus_label : np.ndarray + An array of shape ``(N,)`` with the consensus labels aggregated from all annotators. + pred_probs : np.ndarray + An array of shape ``(N, K)`` of posterior predicted probabilities, ``P(label=k|x)``. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + num_annotations : np.ndarray + An array of shape ``(N,)`` with the number of annotators that have labeled each example. + annotator_agreement : np.ndarray + An array of shape ``(N,)`` with the fraction of annotators that agree with each consensus label. + quality_method : str, default = "crowdlab" (Options: ["crowdlab", "agreement"]) + Specifies the method used to calculate the quality of the consensus label. + For valid quality methods, view `~cleanlab.multiannotator.get_label_quality_multiannotator` + label_quality_score_kwargs : dict, optional + Keyword arguments to pass into ``get_label_quality_scores()``. + + Returns + ------- + consensus_quality_score : np.ndarray + An array of shape ``(N,)`` with the quality score of the consensus. + """ + + valid_methods = [ + "crowdlab", + "agreement", + ] + + if quality_method == "crowdlab": + consensus_quality_score = get_label_quality_scores( + consensus_label, pred_probs, **label_quality_score_kwargs + ) + + elif quality_method == "agreement": + consensus_quality_score = annotator_agreement + + else: + raise ValueError( + f""" + {quality_method} is not a valid consensus quality method! + Please choose a valid quality_method: {valid_methods} + """ + ) + + return consensus_quality_score + + +def _get_annotator_label_quality_score( + annotator_label: np.ndarray, + pred_probs: np.ndarray, + label_quality_score_kwargs: dict = {}, +) -> np.ndarray: + """Returns quality scores for each datapoint. + Very similar functionality to ``_get_consensus_quality_score``, with additional support for annotator labels that contain NaN values. + For more info about parameters and returns, see the docstring of `~cleanlab.multiannotator._get_consensus_quality_score`.
+ """ + mask = ~np.isnan(annotator_label) + + annotator_label_quality_score_subset = get_label_quality_scores( + labels=annotator_label[mask].astype(int), + pred_probs=pred_probs[mask], + **label_quality_score_kwargs, + ) + + annotator_label_quality_score = np.full(len(annotator_label), np.nan) + annotator_label_quality_score[mask] = annotator_label_quality_score_subset + return annotator_label_quality_score + + +def _get_annotator_quality( + labels_multiannotator: np.ndarray, + pred_probs: np.ndarray, + consensus_label: np.ndarray, + num_annotations: np.ndarray, + annotator_agreement: np.ndarray, + model_weight: np.ndarray, + annotator_weight: np.ndarray, + detailed_label_quality: Optional[np.ndarray] = None, + quality_method: str = "crowdlab", +) -> pd.DataFrame: + """Returns the quality score of each annotator. + + Parameters + ---------- + labels_multiannotator : np.ndarray + 2D numpy array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + pred_probs : np.ndarray + An array of shape ``(N, K)`` of model-predicted probabilities, ``P(label=k|x)``. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + consensus_label : np.ndarray + An array of shape ``(N,)`` with the consensus labels aggregated from all annotators. + num_annotations : np.ndarray + An array of shape ``(N,)`` with the number of annotators that have labeled each example. + annotator_agreement : np.ndarray + An array of shape ``(N,)`` with the fraction of annotators that agree with each consensus label.
+ model_weight : float + float specifying the model weight used in weighted averages, + ``None`` if model weight is not used to compute quality scores + annotator_weight : np.ndarray + An array of shape ``(M,)`` where M is the number of annotators, specifying the annotator weights used in weighted averages, + ``None`` if annotator weights are not used to compute quality scores + detailed_label_quality : np.ndarray, optional + An array of shape ``(N, M)`` containing the detailed label quality scores for all examples and annotators + quality_method : str, default = "crowdlab" (Options: ["crowdlab", "agreement"]) + Specifies the method used to calculate the quality of the annotators. + For valid quality methods, view `~cleanlab.multiannotator.get_label_quality_multiannotator` + + Returns + ------- + annotator_quality : np.ndarray + An array of shape ``(M,)`` with the quality score of each annotator's labels + """ + + valid_methods = [ + "crowdlab", + "agreement", + ] + + if quality_method == "crowdlab": + if detailed_label_quality is None: + annotator_lqs = np.zeros(labels_multiannotator.shape[1]) + for i in range(len(annotator_lqs)): + labels = labels_multiannotator[:, i] + labels_mask = ~np.isnan(labels) + annotator_lqs[i] = np.mean( + get_label_quality_scores( + labels[labels_mask].astype(int), + pred_probs[labels_mask], + ) + ) + else: + annotator_lqs = np.nanmean(detailed_label_quality, axis=0) + + mask = num_annotations != 1 + labels_multiannotator_subset = labels_multiannotator[mask] + consensus_label_subset = consensus_label[mask] + + annotator_agreement = np.zeros(labels_multiannotator_subset.shape[1]) + for i in range(len(annotator_agreement)): + labels = labels_multiannotator_subset[:, i] + labels_mask = ~np.isnan(labels) + # case where annotator does not annotate any examples with any other annotators + # TODO: do we want to impute the mean or just return np.nan + if np.sum(labels_mask) == 0: + annotator_agreement[i] = np.NaN + else: +
annotator_agreement[i] = np.mean( + labels[labels_mask] == consensus_label_subset[labels_mask], + ) + + avg_num_annotations_frac = np.mean(num_annotations) / len(annotator_weight) + annotator_weight_adjusted = np.sum(annotator_weight) * avg_num_annotations_frac + + w = model_weight / (model_weight + annotator_weight_adjusted) + annotator_quality = w * annotator_lqs + (1 - w) * annotator_agreement + + elif quality_method == "agreement": + mask = num_annotations != 1 + labels_multiannotator_subset = labels_multiannotator[mask] + consensus_label_subset = consensus_label[mask] + + annotator_quality = np.zeros(labels_multiannotator_subset.shape[1]) + for i in range(len(annotator_quality)): + labels = labels_multiannotator_subset[:, i] + labels_mask = ~np.isnan(labels) + # case where annotator does not annotate any examples with any other annotators + if np.sum(labels_mask) == 0: + annotator_quality[i] = np.NaN + else: + annotator_quality[i] = np.mean( + labels[labels_mask] == consensus_label_subset[labels_mask], + ) + + else: + raise ValueError( + f""" + {quality_method} is not a valid annotator quality method! + Please choose a valid quality_method: {valid_methods} + """ + ) + + return annotator_quality + + +def _get_annotator_worst_class( + labels_multiannotator: np.ndarray, + consensus_label: np.ndarray, + consensus_quality_score: np.ndarray, +) -> np.ndarray: + """Returns the class in which each annotator makes the most errors. + + Parameters + ---------- + labels_multiannotator : np.ndarray + 2D numpy array of multiple given labels for each example with shape ``(N, M)``, + where N is the number of examples and M is the number of annotators. + For details, see the format expected by `~cleanlab.multiannotator.get_label_quality_multiannotator`. + consensus_label : np.ndarray + An array of shape ``(N,)`` with the consensus labels aggregated from all annotators.
+ consensus_quality_score : np.ndarray + An array of shape ``(N,)`` with the quality score of the consensus. + + Returns + ------- + worst_class : np.ndarray + The class that is most frequently mislabeled by a given annotator. + """ + + worst_class = np.apply_along_axis( + _get_single_annotator_worst_class, + axis=0, + arr=labels_multiannotator, + consensus_label=consensus_label, + consensus_quality_score=consensus_quality_score, + ).astype(int) + + return worst_class + + +def _get_single_annotator_worst_class( + labels: np.ndarray, + consensus_label: np.ndarray, + consensus_quality_score: np.ndarray, +) -> int: + """Returns the class a given annotator makes the most errors in. + + Parameters + ---------- + labels : np.ndarray + An array of shape ``(N,)`` with the labels from the annotator we want to evaluate. + consensus_label : np.ndarray + An array of shape ``(N,)`` with the consensus labels aggregated from all annotators. + consensus_quality_score : np.ndarray + An array of shape ``(N,)`` with the quality score of the consensus. + + Returns + ------- + worst_class : int + The class that is most frequently mislabeled by the given annotator. 
+ """ + labels = pd.Series(labels) + labels_mask = pd.notna(labels) + class_accuracies = (labels[labels_mask] == consensus_label[labels_mask]).groupby(labels).mean() + accuracy_min_idx = class_accuracies[class_accuracies == class_accuracies.min()].index.values + + if len(accuracy_min_idx) == 1: + return accuracy_min_idx[0] + + # tiebreak 1: class counts + class_count = labels[labels_mask].groupby(labels).count()[accuracy_min_idx] + count_max_idx = class_count[class_count == class_count.max()].index.values + + if len(count_max_idx) == 1: + return count_max_idx[0] + + # tiebreak 2: consensus quality scores + avg_consensus_quality = ( + pd.DataFrame( + {"annotator_label": labels, "consensus_quality_score": consensus_quality_score} + )[labels_mask] + .groupby("annotator_label") + .mean()["consensus_quality_score"][count_max_idx] + ) + quality_max_idx = avg_consensus_quality[ + avg_consensus_quality == avg_consensus_quality.max() + ].index.values + + # return first item even if there are ties - no better methods to tiebreak + return quality_max_idx[0] +
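The crowdlab branch of `_get_annotator_quality` combines each annotator's average label quality score with their agreement rate against the consensus, using a weight `w` that compares the model weight against the adjusted total annotator weight. The standalone numpy sketch below reproduces only that final blending arithmetic; the helper name `blend_annotator_quality` and all input numbers are hypothetical, not values cleanlab would produce.

```python
import numpy as np


def blend_annotator_quality(annotator_lqs, annotator_agreement, num_annotations,
                            model_weight, annotator_weight):
    """Hypothetical helper mirroring the final crowdlab blending step shown above."""
    # Average fraction of the M annotators who labeled each example.
    avg_num_annotations_frac = np.mean(num_annotations) / len(annotator_weight)
    # Total annotator weight, scaled down when examples receive few annotations.
    annotator_weight_adjusted = np.sum(annotator_weight) * avg_num_annotations_frac
    # Share of trust placed in the model-based label quality scores.
    w = model_weight / (model_weight + annotator_weight_adjusted)
    return w * annotator_lqs + (1 - w) * annotator_agreement


# Illustrative inputs for M = 2 annotators and N = 4 examples (made-up numbers).
quality = blend_annotator_quality(
    annotator_lqs=np.array([0.9, 0.6]),
    annotator_agreement=np.array([0.8, 0.4]),
    num_annotations=np.array([2, 2, 1, 1]),
    model_weight=1.0,
    annotator_weight=np.array([0.5, 0.5]),
)
print(quality)  # roughly [0.857, 0.514]
```

Here `w = 1 / (1 + 0.75) = 4/7`, so each annotator's score leans toward the model-based quality; raising `model_weight` shifts even more trust to the model.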
+ + Made with Sphinx and @pradyunsg's + + Furo + +
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/multilabel_classification/dataset.html b/v2.6.6/_modules/cleanlab/multilabel_classification/dataset.html
new file mode 100644
index 000000000..1c27e1572
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/multilabel_classification/dataset.html
@@ -0,0 +1,1021 @@
+cleanlab.multilabel_classification.dataset - cleanlab
Source code for cleanlab.multilabel_classification.dataset

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to summarize overall labeling issues across a multi-label classification dataset.
+Here each example can belong to one or more classes, or none of the classes at all.
+Unlike in standard multi-class classification, model-predicted class probabilities need not sum to 1 for each row in multi-label classification.
+"""
+
+import pandas as pd
+import numpy as np
+from typing import Optional, cast, Dict, Any  # noqa: F401
+from cleanlab.multilabel_classification.filter import (
+    find_multilabel_issues_per_class,
+    find_label_issues,
+)
+from cleanlab.internal.multilabel_utils import get_onehot_num_classes
+from collections import defaultdict
+
+
+
+def common_multilabel_issues(
+    labels=None,
+    pred_probs=None,
+    *,
+    class_names=None,
+    confident_joint=None,
+) -> pd.DataFrame:
+    """Summarizes which classes in a multi-label dataset appear most often mislabeled overall.
+
+    Since classes are not mutually exclusive in multi-label classification, this method summarizes the label issues for each class independently of the others.
+
+    Parameters
+    ----------
+    labels : List[List[int]]
+        List of noisy labels for multi-label classification where each example can belong to multiple classes.
+        Refer to documentation for this argument in :py:func:`multilabel_classification.filter.find_label_issues <cleanlab.multilabel_classification.filter.find_label_issues>` for further details.
+
+    pred_probs : np.ndarray
+        An array of shape ``(N, K)`` of model-predicted class probabilities.
+        Refer to documentation for this argument in :py:func:`multilabel_classification.filter.find_label_issues <cleanlab.multilabel_classification.filter.find_label_issues>` for further details.
+
+    class_names : Iterable[str], optional
+        A list or other iterable of the string class names. Its order must match the label indices.
+        If class 0 is 'dog' and class 1 is 'cat', then ``class_names = ['dog', 'cat']``.
+        If provided, the returned DataFrame will have an extra *Class Name* column with this info.
+
+    confident_joint : np.ndarray, optional
+        An array of shape ``(K, 2, 2)`` representing a one-vs-rest formatted confident joint.
+        Refer to documentation for this argument in :py:func:`multilabel_classification.filter.find_label_issues <cleanlab.multilabel_classification.filter.find_label_issues>` for details.
+
+    Returns
+    -------
+    common_multilabel_issues : pd.DataFrame
+        DataFrame with two rows per class (one for each direction of label error), described by the following columns:
+
+        - *Class Name*: The name of the class if class_names is provided.
+        - *Class Index*: The index of the class.
+        - *In Given Label*: Whether the class is originally annotated True or False in the given label.
+        - *In Suggested Label*: Whether the class should be True or False in the suggested label (based on the model's prediction).
+        - *Num Examples*: Number of examples flagged as a label issue where this class is True/False "In Given Label" but cleanlab estimates the annotation should actually be as specified "In Suggested Label". I.e. the number of examples in your dataset where this class was labeled as True but likely should have been False (or vice versa).
+        - *Issue Probability*: The *Num Examples* column divided by the total number of examples in the dataset; i.e. the relative overall frequency of each type of label issue in your dataset.
+
+        By default, the rows in this DataFrame are ordered by "Issue Probability" (descending).
+    """
+
+    num_examples = _get_num_examples_multilabel(labels=labels, confident_joint=confident_joint)
+    summary_issue_counts = defaultdict(list)
+    y_one, num_classes = get_onehot_num_classes(labels, pred_probs)
+    label_issues_list, labels_list, pred_probs_list = find_multilabel_issues_per_class(
+        labels=labels,
+        pred_probs=pred_probs,
+        confident_joint=confident_joint,
+        return_indices_ranked_by="self_confidence",
+    )
+
+    for class_num, (label, issues_for_class) in enumerate(zip(y_one.T, label_issues_list)):
+        binary_label_issues = np.zeros(len(label)).astype(bool)
+        binary_label_issues[issues_for_class] = True
+        true_but_false_count = sum(np.logical_and(label == 1, binary_label_issues))
+        false_but_true_count = sum(np.logical_and(label == 0, binary_label_issues))
+
+        if class_names is not None:
+            summary_issue_counts["Class Name"].append(class_names[class_num])
+        summary_issue_counts["Class Index"].append(class_num)
+        summary_issue_counts["In Given Label"].append(True)
+        summary_issue_counts["In Suggested Label"].append(False)
+        summary_issue_counts["Num Examples"].append(true_but_false_count)
+        summary_issue_counts["Issue Probability"].append(true_but_false_count / num_examples)
+
+        if class_names is not None:
+            summary_issue_counts["Class Name"].append(class_names[class_num])
+        summary_issue_counts["Class Index"].append(class_num)
+        summary_issue_counts["In Given Label"].append(False)
+        summary_issue_counts["In Suggested Label"].append(True)
+        summary_issue_counts["Num Examples"].append(false_but_true_count)
+        summary_issue_counts["Issue Probability"].append(false_but_true_count / num_examples)
+    return (
+        pd.DataFrame.from_dict(summary_issue_counts)
+        .sort_values(by=["Issue Probability"], ascending=False)
+        .reset_index(drop=True)
+    )
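The loop in `common_multilabel_issues` splits each class's flagged examples into two directions: the class was given but looks wrong (True to False), or was missing but looks present (False to True). A plain numpy sketch of that tally, assuming you already have an `(N, K)` boolean matrix of per-class issues (typed by hand here; the helper name `summarize_issue_directions` and the tuple layout are illustrative, not cleanlab API):

```python
import numpy as np


def summarize_issue_directions(y_one, issue_mask):
    """Return (class_index, in_given, in_suggested, num_examples, issue_probability) rows."""
    num_examples = y_one.shape[0]
    rows = []
    for k in range(y_one.shape[1]):
        given = y_one[:, k] == 1       # class k present in the given label
        flagged = issue_mask[:, k]     # class k looks mis-annotated
        true_but_false = int(np.sum(given & flagged))
        false_but_true = int(np.sum(~given & flagged))
        rows.append((k, True, False, true_but_false, true_but_false / num_examples))
        rows.append((k, False, True, false_but_true, false_but_true / num_examples))
    # Highest issue probability first; sorted() is stable, so ties keep class order.
    return sorted(rows, key=lambda r: r[4], reverse=True)


y_one = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
issue_mask = np.array([[True, False], [False, False], [True, True], [False, False]])
rows = summarize_issue_directions(y_one, issue_mask)
for row in rows:
    print(row)
```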
+
+
+def rank_classes_by_multilabel_quality(
+    labels=None,
+    pred_probs=None,
+    *,
+    class_names=None,
+    joint=None,
+    confident_joint=None,
+) -> pd.DataFrame:
+    """
+    Returns a DataFrame with three overall label quality scores per class for a multi-label dataset.
+
+    These numbers summarize all examples annotated with the class (details listed below under the Returns parameter).
+    By default, classes are ordered by "Label Quality Score", so the most problematic classes are reported first in the DataFrame.
+
+    Score values are unnormalized and may be very small. What matters is their relative ranking across the classes.
+
+    **Parameters**:
+
+    For information about the arguments to this method, see the documentation of
+    `~cleanlab.multilabel_classification.dataset.common_multilabel_issues`.
+
+    Returns
+    -------
+    overall_label_quality : pd.DataFrame
+        Pandas DataFrame with one row per class and columns: "Class Index", "Label Issues",
+        "Inverse Label Issues", "Label Noise", "Inverse Label Noise", "Label Quality Score".
+        Some entries are overall quality scores between 0 and 1, summarizing how good overall the labels
+        appear to be for that class (lower values indicate more erroneous labels).
+        Other entries are estimated counts of annotation errors related to this class.
+
+        Here is what each column represents:
+
+        - *Class Name*: The name of the class if class_names is provided.
+        - *Class Index*: The index of the class in 0, 1, ..., K-1.
+        - *Label Issues*: Estimated number of examples in the dataset that are labeled as belonging to class k but actually should not belong to this class.
+        - *Inverse Label Issues*: Estimated number of examples in the dataset that should actually be labeled as class k but did not receive this label.
+        - *Label Noise*: Estimated proportion of examples in the dataset that are labeled as class k but should not be. For each class k, this is computed by dividing the number of "Label Issues" for class k by the total number of examples in the dataset.
+        - *Inverse Label Noise*: Estimated proportion of examples in the dataset that should actually be labeled as class k but did not receive this label.
+        - *Label Quality Score*: Computed as ``1 - Label Noise``; higher values indicate fewer examples wrongly labeled as class k.
+
+        By default, the DataFrame is ordered by "Label Quality Score" (in ascending order), so the classes with the most label issues appear first.
+    """
+
+    issues_df = common_multilabel_issues(
+        labels=labels, pred_probs=pred_probs, class_names=class_names, confident_joint=joint
+    )
+    issues_dict = defaultdict(defaultdict)  # type: Dict[str, Any]
+    num_examples = _get_num_examples_multilabel(labels=labels, confident_joint=confident_joint)
+    return_columns = [
+        "Class Name",
+        "Class Index",
+        "Label Issues",
+        "Inverse Label Issues",
+        "Label Noise",
+        "Inverse Label Noise",
+        "Label Quality Score",
+    ]
+    if class_names is None:
+        return_columns = return_columns[1:]
+    for _, row in issues_df.iterrows():
+        if row["In Given Label"]:
+            if class_names is not None:
+                issues_dict[row["Class Index"]]["Class Name"] = row["Class Name"]
+            # use the exact count; int(probability * num_examples) can truncate down by one
+            issues_dict[row["Class Index"]]["Label Issues"] = int(row["Num Examples"])
+            issues_dict[row["Class Index"]]["Label Noise"] = row["Issue Probability"]
+            issues_dict[row["Class Index"]]["Label Quality Score"] = (
+                1 - issues_dict[row["Class Index"]]["Label Noise"]
+            )
+        else:
+            if class_names is not None:
+                issues_dict[row["Class Index"]]["Class Name"] = row["Class Name"]
+            issues_dict[row["Class Index"]]["Inverse Label Issues"] = int(row["Num Examples"])
+            issues_dict[row["Class Index"]]["Inverse Label Noise"] = row["Issue Probability"]
+
+    issues_df_dict = defaultdict(list)
+    for i in issues_dict:
+        issues_df_dict["Class Index"].append(i)
+        for j in issues_dict[i]:
+            issues_df_dict[j].append(issues_dict[i][j])
+    return (
+        pd.DataFrame.from_dict(issues_df_dict)
+        .sort_values(by="Label Quality Score", ascending=True)
+        .reset_index(drop=True)
+    )[return_columns]
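`rank_classes_by_multilabel_quality` merges the two rows per class from the issue summary into a single row: issue counts, noise proportions (count divided by the dataset size, as the code above does), and `Label Quality Score = 1 - Label Noise`. A plain-Python sketch of that merge, assuming hand-written `(class_index, in_given_label, count)` tuples; the function name and tuple layout are illustrative only:

```python
def rank_classes(issue_rows, num_examples):
    """Merge (class_index, in_given_label, count) rows into one dict per class."""
    per_class = {}
    for class_index, in_given_label, count in issue_rows:
        stats = per_class.setdefault(class_index, {"Class Index": class_index})
        if in_given_label:
            # Examples labeled with the class that likely should not have it.
            stats["Label Issues"] = count
            stats["Label Noise"] = count / num_examples
            stats["Label Quality Score"] = 1 - count / num_examples
        else:
            # Examples missing the class that likely should have it.
            stats["Inverse Label Issues"] = count
            stats["Inverse Label Noise"] = count / num_examples
    # Lowest quality first, so the most problematic classes are listed first.
    return sorted(per_class.values(), key=lambda s: s["Label Quality Score"])


ranked = rank_classes([(0, True, 1), (0, False, 1), (1, True, 3), (1, False, 0)], num_examples=10)
print([s["Class Index"] for s in ranked])  # [1, 0]
```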
+
+
+def _get_num_examples_multilabel(labels=None, confident_joint: Optional[np.ndarray] = None) -> int:
+    """Helper method that finds the number of examples from the parameters or throws an error
+    if neither parameter is provided.
+
+    Parameters
+    ----------
+    For parameter info, see the docstring of `~cleanlab.multilabel_classification.dataset.common_multilabel_issues`.
+
+    Returns
+    -------
+    num_examples : int
+        The number of examples in the dataset.
+
+    Raises
+    ------
+    ValueError
+        If both `labels` and `confident_joint` are None.
+    """
+
+    if labels is None and confident_joint is None:
+        raise ValueError(
+            "Error: num_examples is None. You must provide either labels or confident_joint "
+            "as an input parameter."
+        )
+    _confident_joint = cast(np.ndarray, confident_joint)
+    num_examples = len(labels) if labels is not None else cast(int, np.sum(_confident_joint[0]))
+    return num_examples
+
+
+def overall_multilabel_health_score(
+    labels=None,
+    pred_probs=None,
+    *,
+    confident_joint=None,
+) -> float:
+    """Returns a single score between 0 and 1 measuring the overall quality of all labels in a multi-label classification dataset.
+    Intuitively, the score is the average correctness of the given labels across all examples in the
+    dataset. So a score of 1 suggests your data is perfectly labeled and a score of 0.5 suggests
+    half of the examples in the dataset may be incorrectly labeled. Thus, a higher
+    score implies a higher quality dataset.
+
+    **Parameters**: For information about the arguments to this method, see the documentation of
+    `~cleanlab.multilabel_classification.dataset.common_multilabel_issues`.
+
+    Returns
+    -------
+    health_score : float
+        An overall score between 0 and 1, where 1 implies all labels in the dataset are estimated to be correct.
+        A score of 0.5 implies that half of the dataset's examples are estimated to have label issues.
+    """
+    num_examples = _get_num_examples_multilabel(labels=labels)
+    issues = find_label_issues(
+        labels=labels, pred_probs=pred_probs, confident_joint=confident_joint
+    )
+    return 1.0 - sum(issues) / num_examples
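The health score reduces to one minus the fraction of examples flagged by `find_label_issues` (called with its default boolean-mask output). A minimal numpy sketch of that final arithmetic on a hand-written mask; `health_score_from_mask` is a hypothetical helper, not part of cleanlab:

```python
import numpy as np


def health_score_from_mask(issue_mask):
    """Hypothetical helper: 1 minus the fraction of examples flagged with any label issue."""
    issue_mask = np.asarray(issue_mask, dtype=bool)
    return 1.0 - issue_mask.sum() / len(issue_mask)


# Made-up mask: 1 of 4 examples is flagged.
score = health_score_from_mask([False, True, False, False])
print(score)  # 0.75
```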
+
+
+def multilabel_health_summary(
+    labels=None,
+    pred_probs=None,
+    *,
+    class_names=None,
+    num_examples=None,
+    confident_joint=None,
+    verbose=True,
+) -> Dict:
+    """Prints a health summary of your multi-label dataset.
+
+    This summary includes useful statistics like:
+
+    * The classes with the most and least label issues.
+    * Overall label quality scores, summarizing how accurate the labels appear across the entire dataset.
+
+    **Parameters**: For information about the arguments to this method, see the documentation of
+    `~cleanlab.multilabel_classification.dataset.common_multilabel_issues`.
+
+    Returns
+    -------
+    summary : dict
+        A dictionary containing keys (see the corresponding functions' documentation to understand the values):
+
+        - ``"overall_multilabel_health_score"``, corresponding to output of `~cleanlab.multilabel_classification.dataset.overall_multilabel_health_score`
+        - ``"classes_by_multilabel_quality"``, corresponding to output of `~cleanlab.multilabel_classification.dataset.rank_classes_by_multilabel_quality`
+        - ``"common_multilabel_issues"``, corresponding to output of `~cleanlab.multilabel_classification.dataset.common_multilabel_issues`
+    """
+    from cleanlab.internal.util import smart_display_dataframe
+
+    if num_examples is None:
+        num_examples = _get_num_examples_multilabel(labels=labels)
+
+    if verbose:
+        longest_line = f"| for your dataset with {num_examples:,} examples "
+        print(
+            "-" * (len(longest_line) - 1)
+            + "\n"
+            + f"| Generating a Cleanlab Dataset Health Summary{' ' * (len(longest_line) - 49)}|\n"
+            + longest_line
+            + f"| Note, Cleanlab is not a medical doctor... yet.{' ' * (len(longest_line) - 51)}|\n"
+            + "-" * (len(longest_line) - 1)
+            + "\n",
+        )
+
+    df_class_label_quality = rank_classes_by_multilabel_quality(
+        labels=labels,
+        pred_probs=pred_probs,
+        class_names=class_names,
+        confident_joint=confident_joint,
+    )
+    if verbose:
+        print("Overall Class Quality and Noise across your dataset (below)")
+        print("-" * 60, "\n", flush=True)
+        smart_display_dataframe(df_class_label_quality)
+
+    df_common_issues = common_multilabel_issues(
+        labels=labels,
+        pred_probs=pred_probs,
+        class_names=class_names,
+        confident_joint=confident_joint,
+    )
+    if verbose:
+        print(
+            "\nCommon multilabel issues are" + "\n" + "-" * 83 + "\n",
+            flush=True,
+        )
+        smart_display_dataframe(df_common_issues)
+        print()
+
+    health_score = overall_multilabel_health_score(
+        labels=labels,
+        pred_probs=pred_probs,
+        confident_joint=confident_joint,
+    )
+    if verbose:
+        print("\nGenerated with <3 from Cleanlab.\n")
+    return {
+        "overall_multilabel_health_score": health_score,
+        "classes_by_multilabel_quality": df_class_label_quality,
+        "common_multilabel_issues": df_common_issues,
+    }
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/multilabel_classification/filter.html b/v2.6.6/_modules/cleanlab/multilabel_classification/filter.html
new file mode 100644
index 000000000..0fdd3ea53
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/multilabel_classification/filter.html
@@ -0,0 +1,998 @@
+cleanlab.multilabel_classification.filter - cleanlab
Source code for cleanlab.multilabel_classification.filter

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to flag which examples have label issues in multi-label classification datasets.
+Here each example can belong to one or more classes, or none of the classes at all.
+Unlike in standard multi-class classification, model-predicted class probabilities need not sum to 1 for each row in multi-label classification.
+"""
+
+import warnings
+import inspect
+from typing import Optional, Union, Tuple, List, Any
+import numpy as np
+
+
+
+def find_label_issues(
+    labels: list,
+    pred_probs: np.ndarray,
+    return_indices_ranked_by: Optional[str] = None,
+    rank_by_kwargs={},
+    filter_by: str = "prune_by_noise_rate",
+    frac_noise: float = 1.0,
+    num_to_remove_per_class: Optional[List[int]] = None,
+    min_examples_per_class=1,
+    confident_joint: Optional[np.ndarray] = None,
+    n_jobs: Optional[int] = None,
+    verbose: bool = False,
+    low_memory: bool = False,
+) -> np.ndarray:
+    """
+    Identifies potentially mislabeled examples in a multi-label classification dataset.
+    An example is flagged as having a label issue if *any* of the classes appear to be incorrectly annotated for this example.
+
+    Parameters
+    ----------
+    labels : List[List[int]]
+        List of noisy labels for multi-label classification where each example can belong to multiple classes.
+        This is an iterable of iterables where the i-th element of `labels` corresponds to a list of classes that the i-th example belongs to,
+        according to the original data annotation (e.g. ``labels = [[1,2],[1],[0],..]``).
+        This method will return the indices i where the inner list ``labels[i]`` is estimated to have some error.
+        For a dataset with K classes, each class must be represented as an integer in 0, 1, ..., K-1 within the labels.
+
+    pred_probs : np.ndarray
+        An array of shape ``(N, K)`` of model-predicted class probabilities.
+        Each row of this matrix corresponds to an example `x`
+        and contains the predicted probability that `x` belongs to each possible class,
+        for each of the K classes (along its columns).
+        The rows need not sum to 1, but the columns must be ordered such that
+        these probabilities correspond to class 0, 1, ..., K-1.
+
+        Note
+        ----
+        Estimated label quality scores are most accurate when they are computed based on out-of-sample ``pred_probs`` from your model.
+        To obtain out-of-sample predicted probabilities for every example in your dataset, you can use :ref:`cross-validation <pred_probs_cross_val>`.
+        This is encouraged to get better results.
+
+    return_indices_ranked_by : {None, 'self_confidence', 'normalized_margin', 'confidence_weighted_entropy'}, default = None
+        This function can return a boolean mask (if None) or an array of the example-indices with issues sorted based on the specified ranking method.
+        Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details.
+
+    rank_by_kwargs : dict, optional
+        Optional keyword arguments to pass into scoring functions for ranking by
+        label quality score (see :py:func:`rank.get_label_quality_scores
+        <cleanlab.rank.get_label_quality_scores>`).
+
+    filter_by : {'prune_by_class', 'prune_by_noise_rate', 'both', 'confident_learning', 'predicted_neq_given', 'low_normalized_margin', 'low_self_confidence'}, default='prune_by_noise_rate'
+        The specific Confident Learning method to determine precisely which examples have label issues in a dataset.
+        Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details.
+
+    frac_noise : float, default = 1.0
+        This will return the "top" frac_noise * num_label_issues estimated label errors, dependent on the filtering method used.
+        Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details.
+
+    num_to_remove_per_class : array_like
+        An iterable that specifies the number of mislabeled examples to return from each class.
+        Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details.
+
+    min_examples_per_class : int, default = 1
+        The minimum number of examples required per class below which examples from this class will not be flagged as label issues.
+        Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details.
+
+    confident_joint : np.ndarray, optional
+        An array of shape ``(K, 2, 2)`` representing a one-vs-rest formatted confident joint, as is appropriate for multi-label classification tasks.
+        Entry ``(c, i, j)`` in this array is the number of examples confidently counted into a ``(class c, noisy label=i, true label=j)`` bin,
+        where `i, j` are either 0 or 1 to denote whether this example belongs to class `c` or not
+        (recall examples can belong to multiple classes in multi-label classification).
+        The `confident_joint` can be computed using :py:func:`count.compute_confident_joint <cleanlab.count.compute_confident_joint>` with ``multi_label=True``.
+        If not provided, it is computed from the given (noisy) `labels` and `pred_probs`.
+
+    n_jobs : optional
+        Number of processing threads used by multiprocessing.
+        Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details.
+
+    verbose : optional
+        If ``True``, prints when multiprocessing happens.
+
+    low_memory : bool, default=False
+        Set as ``True`` if you have a big dataset with limited memory.
+        Uses :py:func:`experimental.label_issues_batched.find_label_issues_batched <cleanlab.experimental.label_issues_batched.find_label_issues_batched>`.
+
+    Returns
+    -------
+    label_issues : np.ndarray
+        If `return_indices_ranked_by` is left unspecified, returns a boolean **mask** for the entire dataset
+        where ``True`` represents an example suffering from some label issue and
+        ``False`` represents an example that appears accurately labeled.
+
+        If `return_indices_ranked_by` is specified, this method instead returns a list of **indices** of examples identified with
+        label issues (i.e. those indices where the mask would be ``True``).
+        Indices are sorted by the likelihood that *all* classes are correctly annotated for the corresponding example.
+
+        Note
+        ----
+        Obtain the *indices* of examples with label issues in your dataset by setting
+        `return_indices_ranked_by`.
+
+    """
+    from cleanlab.filter import _find_label_issues_multilabel
+
+    if low_memory:
+        if rank_by_kwargs:
+            warnings.warn("`rank_by_kwargs` is not used when `low_memory=True`.")
+
+        func_signature = inspect.signature(find_label_issues)
+        default_args = {
+            k: v.default
+            for k, v in func_signature.parameters.items()
+            if v.default is not inspect.Parameter.empty
+        }
+        arg_values = {
+            "filter_by": filter_by,
+            "num_to_remove_per_class": num_to_remove_per_class,
+            "confident_joint": confident_joint,
+            "n_jobs": n_jobs,
+            "frac_noise": frac_noise,
+            "min_examples_per_class": min_examples_per_class,
+        }
+        for arg_name, arg_val in arg_values.items():
+            default_val = default_args[arg_name]
+            # Compare None defaults by identity: `!=` on an array (e.g. confident_joint) is elementwise.
+            is_changed = arg_val is not None if default_val is None else arg_val != default_val
+            if is_changed:
+                warnings.warn(f"`{arg_name}` is not used when `low_memory=True`.")
+
+    return _find_label_issues_multilabel(
+        labels=labels,
+        pred_probs=pred_probs,
+        return_indices_ranked_by=return_indices_ranked_by,
+        rank_by_kwargs=rank_by_kwargs,
+        filter_by=filter_by,
+        frac_noise=frac_noise,
+        num_to_remove_per_class=num_to_remove_per_class,
+        min_examples_per_class=min_examples_per_class,
+        confident_joint=confident_joint,
+        n_jobs=n_jobs,
+        verbose=verbose,
+        low_memory=low_memory,
+    )
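Two ideas from the `find_label_issues` docstring can be illustrated with plain numpy: turning list-of-lists `labels` into a one-hot `(N, K)` matrix, and collapsing per-class flags into a single per-example mask, since an example is flagged if *any* of its classes looks wrong. This is a conceptual sketch with a hypothetical `to_onehot` helper and hand-written flags; cleanlab's own conversion (`get_onehot_num_classes`) and filtering (`_find_label_issues_multilabel`) may differ in detail.

```python
import numpy as np


def to_onehot(labels, num_classes):
    """Hypothetical helper: turn labels like [[1, 2], [1], [0], []] into an (N, K) 0/1 matrix."""
    y_one = np.zeros((len(labels), num_classes), dtype=int)
    for i, classes in enumerate(labels):
        for c in classes:
            y_one[i, c] = 1
    return y_one


labels = [[1, 2], [1], [0], []]  # the last example belongs to no class
y_one = to_onehot(labels, num_classes=3)

# Made-up per-class flags: True where class k looks mis-annotated for example i.
per_class_issues = np.array([
    [False, False, True],
    [False, False, False],
    [True, False, False],
    [False, False, False],
])
# An example is flagged if any of its classes is flagged.
per_example_issues = per_class_issues.any(axis=1)
print(per_example_issues.tolist())  # [True, False, True, False]
```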
+
+
[docs]def find_multilabel_issues_per_class( + labels: list, + pred_probs: np.ndarray, + return_indices_ranked_by: Optional[str] = None, + rank_by_kwargs={}, + filter_by: str = "prune_by_noise_rate", + frac_noise: float = 1.0, + num_to_remove_per_class: Optional[List[int]] = None, + min_examples_per_class=1, + confident_joint: Optional[np.ndarray] = None, + n_jobs: Optional[int] = None, + verbose: bool = False, + low_memory: bool = False, +) -> Union[np.ndarray, Tuple[List[np.ndarray], List[Any], List[np.ndarray]]]: + """ + Identifies potentially bad labels for each example and each class in a multi-label classification dataset. + Whereas `~cleanlab.multilabel_classification.filter.find_label_issues` + estimates which examples have an erroneous annotation for *any* class, this method estimates which specific classes are incorrectly annotated as well. + This method returns a list of size K, the number of classes in the dataset. + + Parameters + ---------- + labels : List[List[int]] + List of noisy labels for multi-label classification where each example can belong to multiple classes. + Refer to documentation for this argument in `~cleanlab.multilabel_classification.filter.find_label_issues` for further details. + This method will identify whether ``labels[i][k]`` appears correct, for every example ``i`` and class ``k``. + + pred_probs : np.ndarray + An array of shape ``(N, K)`` of model-predicted class probabilities. + Refer to documentation for this argument in `~cleanlab.multilabel_classification.filter.find_label_issues` for further details. + + return_indices_ranked_by : {None, 'self_confidence', 'normalized_margin', 'confidence_weighted_entropy'}, default = None + This function can return a boolean mask (if this argument is ``None``) or a sorted array of indices based on the specified ranking method (if not ``None``). + Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details. 
+ + rank_by_kwargs : dict, optional + Optional keyword arguments to pass into scoring functions for ranking by. + label quality score (see :py:func:`rank.get_label_quality_scores + <cleanlab.rank.get_label_quality_scores>`). + + filter_by : {'prune_by_class', 'prune_by_noise_rate', 'both', 'confident_learning', 'predicted_neq_given', 'low_normalized_margin', 'low_self_confidence'}, default = 'prune_by_noise_rate' + The specific method that can be used to filter or prune examples with label issues from a dataset. + Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details. + + frac_noise : float, default = 1.0 + This will return the "top" frac_noise * num_label_issues estimated label errors, dependent on the filtering method used, + Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details. + + num_to_remove_per_class : array_like + This parameter is an iterable that specifies the number of mislabeled examples to return from each class. + Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details. + + min_examples_per_class : int, default = 1 + The minimum number of examples required per class to avoid flagging as label issues. + Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details. + + confident_joint : np.ndarray, optional + An array of shape ``(K, 2, 2)`` representing a one-vs-rest formatted confident joint. + Refer to documentation for this argument in `~cleanlab.multilabel_classification.filter.find_label_issues` for details. + + n_jobs : optional + Number of processing threads used by multiprocessing. + Refer to documentation for this argument in :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>` for details. 
+ + verbose : optional + If ``True``, prints when multiprocessing happens. + + Returns + ------- + per_class_label_issues : list(np.ndarray) + By default, this is a list of length K containing the examples where each class appears incorrectly annotated. + ``per_class_label_issues[k]`` is a Boolean mask of the same length as the dataset, + where ``True`` values indicate examples where class ``k`` appears incorrectly annotated. + + For more details, refer to `~cleanlab.multilabel_classification.filter.find_label_issues`. + + Otherwise if `return_indices_ranked_by` is not ``None``, then this method returns 3 objects (each of length K, the number of classes): `label_issues_list`, `labels_list`, `pred_probs_list`. + - *label_issues_list*: an ordered list of indices of examples where class k appears incorrectly annotated, sorted by the likelihood that class k is correctly annotated. + - *labels_list*: a binary one-hot representation of the original labels, useful if you want to compute label quality scores. + - *pred_probs_list*: a one-vs-rest representation of the original predicted probabilities of shape ``(N, 2)``, useful if you want to compute label quality scores. + ``pred_probs_list[k][i][0]`` is the estimated probability that example ``i`` belongs to class ``k``, and is equal to: ``1 - pred_probs_list[k][i][1]``. 
+ """ + import cleanlab.filter + from cleanlab.internal.multilabel_utils import get_onehot_num_classes, stack_complement + from cleanlab.experimental.label_issues_batched import find_label_issues_batched + + y_one, num_classes = get_onehot_num_classes(labels, pred_probs) + if return_indices_ranked_by is None: + bissues = np.zeros(y_one.shape).astype(bool) + else: + label_issues_list = [] + labels_list = [] + pred_probs_list = [] + if confident_joint is not None and not low_memory: + confident_joint_shape = confident_joint.shape + if confident_joint_shape == (num_classes, num_classes): + warnings.warn( + f"The new recommended format for `confident_joint` in multi_label settings is (num_classes,2,2) as output by compute_confident_joint(...,multi_label=True). Your K x K confident_joint in the old format is being ignored." + ) + confident_joint = None + elif confident_joint_shape != (num_classes, 2, 2): + raise ValueError("confident_joint should be of shape (num_classes, 2, 2)") + for class_num, (label, pred_prob_for_class) in enumerate(zip(y_one.T, pred_probs.T)): + pred_probs_binary = stack_complement(pred_prob_for_class) + if low_memory: + quality_score_kwargs = ( + {"method": return_indices_ranked_by} if return_indices_ranked_by else None + ) + binary_label_issues = find_label_issues_batched( + labels=label, + pred_probs=pred_probs_binary, + verbose=verbose, + quality_score_kwargs=quality_score_kwargs, + return_mask=return_indices_ranked_by is None, + ) + else: + if confident_joint is None: + conf = None + else: + conf = confident_joint[class_num] + if num_to_remove_per_class is not None: + ml_num_to_remove_per_class = [num_to_remove_per_class[class_num], 0] + else: + ml_num_to_remove_per_class = None + binary_label_issues = cleanlab.filter.find_label_issues( + labels=label, + pred_probs=pred_probs_binary, + return_indices_ranked_by=return_indices_ranked_by, + frac_noise=frac_noise, + rank_by_kwargs=rank_by_kwargs, + filter_by=filter_by, + 
num_to_remove_per_class=ml_num_to_remove_per_class, + min_examples_per_class=min_examples_per_class, + confident_joint=conf, + n_jobs=n_jobs, + verbose=verbose, + ) + + if return_indices_ranked_by is None: + bissues[:, class_num] = binary_label_issues + else: + label_issues_list.append(binary_label_issues) + labels_list.append(label) + pred_probs_list.append(pred_probs_binary) + if return_indices_ranked_by is None: + return bissues + else: + return label_issues_list, labels_list, pred_probs_list
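The per-class loop above reduces the multi-label problem to K binary problems: column ``k`` of the one-hot label matrix becomes a 0/1 label vector, and column ``k`` of ``pred_probs`` is expanded to two columns before being passed to the binary ``find_label_issues``. The NumPy sketch below illustrates that decomposition. It assumes the expanded array is ``[1 - p, p]``, so that column 1 holds the probability that class ``k`` is present and lines up with binary label 1; ``to_onehot`` and ``one_vs_rest`` are hypothetical helpers, not cleanlab API.

```python
import numpy as np


def to_onehot(labels, num_classes):
    """Convert a list of per-example class lists into an (N, K) 0/1 matrix."""
    onehot = np.zeros((len(labels), num_classes), dtype=int)
    for i, classes in enumerate(labels):
        onehot[i, classes] = 1
    return onehot


def one_vs_rest(labels, pred_probs):
    """Yield (binary_labels, binary_pred_probs) for each class k.

    Assumption: binary_pred_probs has shape (N, 2) with column 0 = 1 - p
    (class absent) and column 1 = p (class present), so the column index
    matches the binary label.
    """
    onehot = to_onehot(labels, pred_probs.shape[1])
    for k in range(pred_probs.shape[1]):
        p = pred_probs[:, k]
        yield onehot[:, k], np.column_stack((1 - p, p))


labels = [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
pairs = list(one_vs_rest(labels, pred_probs))
binary_labels_0, binary_probs_0 = pairs[0]
print(binary_labels_0)  # [0 1]
print(binary_probs_0)   # approximately [[0.9, 0.1], [0.6, 0.4]]
```

Each ``(binary_labels, binary_pred_probs)`` pair is in the ordinary two-class format, which is why the code above can reuse the single-label ``cleanlab.filter.find_label_issues`` unchanged for every class.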
+ + Made with Sphinx and @pradyunsg's + + Furo + +
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/multilabel_classification/rank.html b/v2.6.6/_modules/cleanlab/multilabel_classification/rank.html new file mode 100644 index 000000000..bc9e2433a --- /dev/null +++ b/v2.6.6/_modules/cleanlab/multilabel_classification/rank.html @@ -0,0 +1,873 @@ + cleanlab.multilabel_classification.rank - cleanlab
cleanlab


Source code for cleanlab.multilabel_classification.rank

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to rank the severity of label issues in multi-label classification datasets.
+Here each example can belong to one or more classes, or none of the classes at all.
+Unlike in standard multi-class classification, model-predicted class probabilities need not sum to 1 for each row in multi-label classification.
+"""
+from __future__ import annotations
+
+import numpy as np  # noqa: F401: Imported for type annotations
+from typing import List, TypeVar, Dict, Any, Optional, Tuple, TYPE_CHECKING
+
+from cleanlab.internal.validation import assert_valid_inputs
+from cleanlab.internal.util import get_num_classes
+from cleanlab.internal.multilabel_utils import int2onehot
+from cleanlab.internal.multilabel_scorer import MultilabelScorer, ClassLabelScorer, Aggregator
+
+
+if TYPE_CHECKING:  # pragma: no cover
+    import numpy.typing as npt
+
+    T = TypeVar("T", bound=npt.NBitBase)
+
+
+def _labels_to_binary(
+    labels: List[List[int]],
+    pred_probs: npt.NDArray["np.floating[T]"],
+) -> np.ndarray:
+    """Validate the inputs to the multilabel scorer. Also transform the labels to a binary representation."""
+    assert_valid_inputs(
+        X=None, y=labels, pred_probs=pred_probs, multi_label=True, allow_one_class=True
+    )
+    num_classes = get_num_classes(labels=labels, pred_probs=pred_probs, multi_label=True)
+    binary_labels = int2onehot(labels, K=num_classes)
+    return binary_labels
+
+
+def _create_multilabel_scorer(
+    method: str,
+    adjust_pred_probs: bool,
+    aggregator_kwargs: Optional[Dict[str, Any]] = None,
+) -> Tuple[MultilabelScorer, Dict]:
+    """This function acts as a factory that creates a MultilabelScorer."""
+    base_scorer = ClassLabelScorer.from_str(method)
+    base_scorer_kwargs = {"adjust_pred_probs": adjust_pred_probs}
+    if aggregator_kwargs:
+        aggregator = Aggregator(**aggregator_kwargs)
+        scorer = MultilabelScorer(base_scorer, aggregator)
+    else:
+        scorer = MultilabelScorer(base_scorer)
+    return scorer, base_scorer_kwargs
+
+
+
def get_label_quality_scores( + labels: List[List[int]], + pred_probs: npt.NDArray["np.floating[T]"], + *, + method: str = "self_confidence", + adjust_pred_probs: bool = False, + aggregator_kwargs: Dict[str, Any] = {"method": "exponential_moving_average", "alpha": 0.8}, +) -> npt.NDArray["np.floating[T]"]: + """Computes a label quality score for each example in a multi-label classification dataset. + + Scores lie between 0 and 1, with lower scores indicating examples whose label is more likely to contain an error. + For each example, this method internally computes a separate score for each individual class + and then aggregates these per-class scores into an overall label quality score for the example. + + + Parameters + ---------- + labels : List[List[int]] + List of noisy labels for multi-label classification where each example can belong to multiple classes. + Refer to documentation for this argument in :py:func:`multilabel_classification.filter.find_label_issues <cleanlab.multilabel_classification.filter.find_label_issues>` for further details. + + pred_probs : np.ndarray + An array of shape ``(N, K)`` of model-predicted class probabilities. + Refer to documentation for this argument in :py:func:`multilabel_classification.filter.find_label_issues <cleanlab.multilabel_classification.filter.find_label_issues>` for further details. + + method : {"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default = "self_confidence" + Method to calculate separate per-class annotation scores for an example that are then aggregated into an overall label quality score for the example. + These scores are separately calculated for each class based on the corresponding column of `pred_probs` in a one-vs-rest manner, + and are standard label quality scores for binary classification (based on whether the class should or should not apply to this example).
+ + See also + -------- + :py:func:`rank.get_label_quality_scores <cleanlab.rank.get_label_quality_scores>` function for details about each option. + + adjust_pred_probs : bool, default = False + Account for class imbalance in the label-quality scoring by adjusting predicted probabilities. + Refer to documentation for this argument in :py:func:`rank.get_label_quality_scores <cleanlab.rank.get_label_quality_scores>` for details. + + + aggregator_kwargs : dict, default = {"method": "exponential_moving_average", "alpha": 0.8} + A dictionary of hyperparameter values to use when aggregating per-class scores into an overall label quality score for each example. + Options for ``"method"`` include: ``"exponential_moving_average"`` or ``"softmin"`` or your own callable function. + See :py:class:`internal.multilabel_scorer.Aggregator <cleanlab.internal.multilabel_scorer.Aggregator>` for details about each option and other possible hyperparameters. + + To get a score for each class annotation for each example, use the `~cleanlab.multilabel_classification.rank.get_label_quality_scores_per_class` method instead. + + Returns + ------- + label_quality_scores : np.ndarray + A 1D array of shape ``(N,)`` with a label quality score (between 0 and 1) for each example in the dataset. + Lower scores indicate examples whose label is more likely to contain some annotation error (for any of the classes). 
+ + Examples + -------- + >>> from cleanlab.multilabel_classification import get_label_quality_scores + >>> import numpy as np + >>> labels = [[1], [0,2]] + >>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]]) + >>> scores = get_label_quality_scores(labels, pred_probs) + >>> scores + array([0.9, 0.5]) + """ + binary_labels = _labels_to_binary(labels, pred_probs) + scorer, base_scorer_kwargs = _create_multilabel_scorer( + method=method, + adjust_pred_probs=adjust_pred_probs, + aggregator_kwargs=aggregator_kwargs, + ) + return scorer(binary_labels, pred_probs, base_scorer_kwargs=base_scorer_kwargs)
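To make the aggregation step concrete, here is a NumPy sketch that reproduces the docstring example above. It assumes that the ``"self_confidence"`` per-class score is the predicted probability of the given binary annotation (``p`` if the class is annotated, ``1 - p`` otherwise). It also assumes that the exponential moving average sorts each example's per-class scores from highest to lowest before folding them in, so the lowest scores receive the most weight. Both are assumptions about the internals, and the function names are hypothetical.

```python
import numpy as np


def per_class_self_confidence(onehot, pred_probs):
    """(N, K) matrix: probability the model assigns to each given class annotation."""
    return np.where(onehot == 1, pred_probs, 1 - pred_probs)


def exponential_moving_average(scores, alpha=0.8):
    """Aggregate each row of an (N, K) score matrix into one score per example.

    Scores are sorted from highest to lowest, so the smallest (most suspicious)
    per-class scores are folded in last and dominate the result.
    """
    sorted_desc = -np.sort(-scores, axis=1)
    total = sorted_desc[:, 0]
    for k in range(1, scores.shape[1]):
        total = alpha * sorted_desc[:, k] + (1 - alpha) * total
    return total


onehot = np.array([[0, 1, 0], [1, 0, 1]])  # one-hot form of labels [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
class_scores = per_class_self_confidence(onehot, pred_probs)
print(np.round(exponential_moving_average(class_scores), 3))  # [0.9 0.5]
```

For the second example the sorted per-class scores are ``[0.9, 0.9, 0.4]``, and the final step ``0.8 * 0.4 + 0.2 * 0.9 = 0.5`` shows how a single poorly supported class annotation pulls the overall score down.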
+ + +
def get_label_quality_scores_per_class( + labels: List[List[int]], + pred_probs: npt.NDArray["np.floating[T]"], + *, + method: str = "self_confidence", + adjust_pred_probs: bool = False, +) -> np.ndarray: + """ + Computes a quality score quantifying how likely each individual class annotation is correct in a multi-label classification dataset. + This is similar to `~cleanlab.multilabel_classification.rank.get_label_quality_scores` + but instead returns the per-class results without aggregation. + For a dataset with K classes, each example receives K scores from this method. + Refer to documentation in `~cleanlab.multilabel_classification.rank.get_label_quality_scores` for details. + + Parameters + ---------- + labels : List[List[int]] + List of noisy labels for multi-label classification where each example can belong to multiple classes. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.multilabel_classification.filter.find_label_issues>` for further details. + + pred_probs : np.ndarray + An array of shape ``(N, K)`` of model-predicted class probabilities. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.multilabel_classification.filter.find_label_issues>` for further details. + + method : {"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default = "self_confidence" + Method to calculate separate per-class annotation scores (that quantify how likely a particular class annotation is correct for a particular example). + Refer to documentation for this argument in `~cleanlab.multilabel_classification.rank.get_label_quality_scores` for further details. + + adjust_pred_probs : bool, default = False + Account for class imbalance in the label-quality scoring by adjusting predicted probabilities. + Refer to documentation for this argument in :py:func:`rank.get_label_quality_scores <cleanlab.rank.get_label_quality_scores>` for details.
+ + Returns + ------- + label_quality_scores : list(np.ndarray) + A list containing K arrays, each of shape ``(N,)``. Here K is the number of classes in the dataset and N is the number of examples. + ``label_quality_scores[k][i]`` is a score between 0 and 1 quantifying how likely the annotation for class ``k`` is correct for example ``i``. + + Examples + -------- + >>> from cleanlab.multilabel_classification.rank import get_label_quality_scores_per_class + >>> import numpy as np + >>> labels = [[1], [0,2]] + >>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]]) + >>> scores = get_label_quality_scores_per_class(labels, pred_probs) + """ + binary_labels = _labels_to_binary(labels, pred_probs) + scorer, base_scorer_kwargs = _create_multilabel_scorer( + method=method, + adjust_pred_probs=adjust_pred_probs, + ) + return scorer.get_class_label_quality_scores( + labels=binary_labels, pred_probs=pred_probs, base_scorer_kwargs=base_scorer_kwargs + )
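The ``aggregator_kwargs`` docstring of ``get_label_quality_scores`` also mentions a ``"softmin"`` option. As a hedged illustration only, one common way to define a softmin over per-class scores is a weighted average whose weights are a softmax of ``(1 - score) / temperature``, so the lowest score dominates as the temperature shrinks. The formula and the ``temperature=0.1`` default below are assumptions for this sketch and may differ from cleanlab's ``Aggregator`` implementation.

```python
import numpy as np


def softmin(scores, temperature=0.1):
    """Smooth minimum of each row of an (N, K) score matrix (illustrative formula)."""
    logits = (1 - scores) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax weights per row
    return (weights * scores).sum(axis=1)


# Per-class scores, e.g. one row per example and one column per class.
class_scores = np.array([[0.9, 0.9, 0.9], [0.4, 0.9, 0.9]])
print(np.round(softmin(class_scores), 3))
```

Lowering ``temperature`` pushes the result toward the plain row minimum, while a row of identical scores is returned unchanged.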
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/object_detection/filter.html b/v2.6.6/_modules/cleanlab/object_detection/filter.html new file mode 100644 index 000000000..595f3f521 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/object_detection/filter.html @@ -0,0 +1,1100 @@ + cleanlab.object_detection.filter - cleanlab


Source code for cleanlab.object_detection.filter

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""Methods to find label issues in an object detection dataset, where each annotated bounding box in an image receives its own class label."""
+
+from collections import defaultdict
+from multiprocessing import Pool
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+import numpy as np
+
+from cleanlab.internal.constants import (
+    ALPHA,
+    HIGH_PROBABILITY_THRESHOLD,
+    LOW_PROBABILITY_THRESHOLD,
+    OVERLOOKED_THRESHOLD_FACTOR,
+    BADLOC_THRESHOLD_FACTOR,
+    SWAP_THRESHOLD_FACTOR,
+    AP_SCALE_FACTOR,
+)
+from cleanlab.internal.object_detection_utils import assert_valid_inputs
+from cleanlab.object_detection.rank import (
+    _get_valid_inputs_for_compute_scores,
+    _separate_label,
+    _separate_prediction,
+    compute_badloc_box_scores,
+    compute_overlooked_box_scores,
+    compute_swap_box_scores,
+    get_label_quality_scores,
+    issues_from_scores,
+    _get_overlap_matrix,
+)
+
+
+
def find_label_issues( + labels: List[Dict[str, Any]], + predictions: List[np.ndarray], + *, + return_indices_ranked_by_score: Optional[bool] = False, + overlapping_label_check: Optional[bool] = True, +) -> np.ndarray: + """ + Identifies potentially mislabeled images in an object detection dataset. + An image is flagged with a label issue if *any* of its bounding boxes appears incorrectly annotated. + This includes images for which a bounding box: should have been annotated but is missing, + has been annotated with the wrong class, or has been annotated in a suboptimal location. + + Suppose the dataset has ``N`` images and ``K`` possible class labels. + If ``return_indices_ranked_by_score`` is ``False``, a boolean mask of length ``N`` is returned, + indicating whether each image has a label issue (``True``) or not (``False``). + If ``return_indices_ranked_by_score`` is ``True``, the indices of images flagged with label issues are returned, + sorted with the most likely-mislabeled images ordered first. + + Parameters + ---------- + labels: + Annotated boxes and class labels in the original dataset, which may contain some errors. + This is a list of ``N`` dictionaries such that ``labels[i]`` contains the given labels for the `i`-th image in the following format: + ``{'bboxes': np.ndarray((L,4)), 'labels': np.ndarray((L,)), 'image_name': str}`` where ``L`` is the number of annotated bounding boxes + for the `i`-th image and ``bboxes[l]`` is a bounding box of coordinates in ``[x1,y1,x2,y2]`` format with given class label ``labels[l]``. + ``image_name`` is an optional part of the labels that can be used to later refer to specific images. + + Note: Here, ``(x1,y1)`` corresponds to the top-left and ``(x2,y2)`` corresponds to the bottom-right corner of the bounding box with respect to the image matrix [e.g.
`XYXY in Keras <https://keras.io/api/keras_cv/bounding_box/formats/>`__, `Detectron 2 <https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box>`__]. + + For more information on proper label formatting, check out the `MMDetection library <https://mmdetection.readthedocs.io/en/dev-3.x/advanced_guides/customize_dataset.html>`_. + + predictions: + Predictions output by a trained object detection model. + For the most accurate results, predictions should be out-of-sample to avoid overfitting, e.g., obtained via :ref:`cross-validation <pred_probs_cross_val>`. + This is a list of ``N`` ``np.ndarray`` such that ``predictions[i]`` corresponds to the model prediction for the `i`-th image. + For each possible class ``k`` in 0, 1, ..., K-1: ``predictions[i][k]`` is a ``np.ndarray`` of shape ``(M,5)``, + where ``M`` is the number of predicted bounding boxes for class ``k``. Here the five columns correspond to ``[x1,y1,x2,y2,pred_prob]``, + where ``[x1,y1,x2,y2]`` are coordinates of the bounding box predicted by the model and ``pred_prob`` is the model's confidence in the predicted class label for this bounding box. + + Note: Here, ``(x1,y1)`` corresponds to the top-left and ``(x2,y2)`` corresponds to the bottom-right corner of the bounding box with respect to the image matrix [e.g. `XYXY in Keras <https://keras.io/api/keras_cv/bounding_box/formats/>`__, `Detectron 2 <https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box>`__]. The last column, ``pred_prob``, represents the predicted probability that the bounding box contains an object of class ``k``. + + See the `MMDetection package <https://github.com/open-mmlab/mmdetection>`_ for an example object detection library that outputs predictions in the correct format. + + return_indices_ranked_by_score: + Determines what is returned by this method (see description of return value for details).
+ + overlapping_label_check : bool, default = True + If ``True``, boxes annotated with more than one class label have their swap score penalized. Set this to ``False`` if you are not concerned when two very similar boxes exist with different class labels in the given annotations. + + + Returns + ------- + label_issues : np.ndarray + Specifies which images are identified to have a label issue. + If ``return_indices_ranked_by_score = False``, this function returns a boolean mask of length ``N`` (``True`` entries indicate which images have a label issue). + If ``return_indices_ranked_by_score = True``, this function returns a (shorter) array of indices of images with label issues, sorted by how likely the image is mislabeled. + + More precisely, indices are sorted by image label quality score calculated via :py:func:`object_detection.rank.get_label_quality_scores <cleanlab.object_detection.rank.get_label_quality_scores>`. + """ + scoring_method = "objectlab" + + assert_valid_inputs( + labels=labels, + predictions=predictions, + method=scoring_method, + ) + + is_issue = _find_label_issues( + labels, + predictions, + scoring_method=scoring_method, + return_indices_ranked_by_score=return_indices_ranked_by_score, + overlapping_label_check=overlapping_label_check, + ) + + return is_issue
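The control flow of ``find_label_issues`` (via ``_find_label_issues``) can be summarized as follows. Each box score is compared against a per-box threshold, which is the class's scaled average precision times an issue-type factor. A box is flagged if its score is at or below that threshold, and an image is flagged if any of its boxes is flagged for any issue type (overlooked, badly located, or swapped). When ``return_indices_ranked_by_score=True``, the flagged images are ordered by ascending image quality score. The NumPy sketch below mirrors that logic for a single issue type on toy inputs; the scores, thresholds and function names are hypothetical and do not call cleanlab.

```python
import numpy as np


def flag_boxes(box_scores, box_thresholds, factor=1.0):
    """Flag each box whose score is at or below its threshold times `factor`.

    Images with no boxes get a single False flag; NaN scores are treated as 1.0
    so they are never flagged.
    """
    flags = []
    for scores, thresholds in zip(box_scores, box_thresholds):
        if len(scores) == 0:
            flags.append(np.array([False]))
            continue
        scores = np.where(np.isnan(scores), 1.0, scores)
        flags.append(scores <= np.asarray(thresholds) * factor)
    return flags


def pool_per_image(box_flags):
    """An image has an issue if at least one of its boxes is flagged."""
    return np.array([bool(np.any(f)) for f in box_flags])


def ranked_issue_indices(is_issue, image_scores):
    """Indices of flagged images, lowest (most suspicious) quality score first."""
    order = np.argsort(image_scores)
    return order[is_issue[order]]


# Toy data: 3 images with per-box scores and per-box class thresholds.
box_scores = [np.array([0.9, 0.05]), np.array([]), np.array([0.3, np.nan])]
box_thresholds = [[0.2, 0.2], [], [0.4, 0.4]]
is_issue = pool_per_image(flag_boxes(box_scores, box_thresholds))
print(is_issue)  # [ True False  True]
image_scores = np.array([0.2, 0.95, 0.1])
print(ranked_issue_indices(is_issue, image_scores))  # [2 0]
```

In the full procedure this flagging is repeated once per issue type with its own threshold factor, and the per-image results are combined so that an image flagged by any type counts as an issue.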
+ + +def _find_label_issues( + labels: List[Dict[str, Any]], + predictions: List[np.ndarray], + *, + scoring_method: Optional[str] = "objectlab", + return_indices_ranked_by_score: Optional[bool] = True, + overlapping_label_check: Optional[bool] = True, +): + """Internal function to find label issues based on passed in method.""" + + if scoring_method == "objectlab": + auxiliary_inputs = _get_valid_inputs_for_compute_scores(ALPHA, labels, predictions) + + per_class_scores = _get_per_class_ap(labels, predictions) + lab_list = [_separate_label(label)[1] for label in labels] + pred_list = [_separate_prediction(pred)[1] for pred in predictions] + pred_thresholds_list = _process_class_list(pred_list, per_class_scores) + lab_thresholds_list = _process_class_list(lab_list, per_class_scores) + overlooked_scores_per_box = compute_overlooked_box_scores( + alpha=ALPHA, + high_probability_threshold=HIGH_PROBABILITY_THRESHOLD, + auxiliary_inputs=auxiliary_inputs, + ) + overlooked_issues_per_box = _find_label_issues_per_box( + overlooked_scores_per_box, pred_thresholds_list, OVERLOOKED_THRESHOLD_FACTOR + ) + overlooked_issues_per_image = _pool_box_scores_per_image(overlooked_issues_per_box) + + badloc_scores_per_box = compute_badloc_box_scores( + alpha=ALPHA, + low_probability_threshold=LOW_PROBABILITY_THRESHOLD, + auxiliary_inputs=auxiliary_inputs, + ) + badloc_issues_per_box = _find_label_issues_per_box( + badloc_scores_per_box, lab_thresholds_list, BADLOC_THRESHOLD_FACTOR + ) + badloc_issues_per_image = _pool_box_scores_per_image(badloc_issues_per_box) + + swap_scores_per_box = compute_swap_box_scores( + alpha=ALPHA, + high_probability_threshold=HIGH_PROBABILITY_THRESHOLD, + overlapping_label_check=overlapping_label_check, + auxiliary_inputs=auxiliary_inputs, + ) + swap_issues_per_box = _find_label_issues_per_box( + swap_scores_per_box, lab_thresholds_list, SWAP_THRESHOLD_FACTOR + ) + swap_issues_per_image = _pool_box_scores_per_image(swap_issues_per_box) + + issues_per_image 
= ( + overlooked_issues_per_image + badloc_issues_per_image + swap_issues_per_image + ) + is_issue = issues_per_image > 0 + else: + is_issue = np.full( + shape=[ + len(labels), + ], + fill_value=-1, + ) + + if return_indices_ranked_by_score: + scores = get_label_quality_scores(labels, predictions) + sorted_scores_idx = issues_from_scores(scores, threshold=1.0) + is_issue_idx = np.where(is_issue == True)[0] + sorted_issue_mask = np.in1d(sorted_scores_idx, is_issue_idx, assume_unique=True) + issue_idx = sorted_scores_idx[sorted_issue_mask] + return issue_idx + else: + return is_issue + + +def _find_label_issues_per_box( + scores_per_box: List[np.ndarray], threshold_classes, threshold_factor=1.0 +) -> List[np.ndarray]: + """Takes a list of length ``N`` whose ``n``-th entry is an array of scores for each bounding box in the ``n``-th example, + and the corresponding per-box thresholds. Each box whose score is at or below its threshold in ``threshold_classes`` (scaled by ``threshold_factor``) is marked as an issue. + + Returns a list of length ``N`` whose ``n``-th entry is a boolean array with one entry per box in example ``n``, + marking whether that box is an issue (``True``) or not (``False``).""" + is_issue_per_box = [] + for idx, score_per_box in enumerate(scores_per_box): + if len(score_per_box) == 0: # no boxes in this image, so the image is not an issue + is_issue_per_box.append(np.array([False])) + else: + score_per_box[np.isnan(score_per_box)] = 1.0 + issue_per_box = [] + for i in range(len(score_per_box)): + issue_per_box.append( + score_per_box[i] <= threshold_classes[idx][i] * threshold_factor + ) + is_issue_per_box.append(np.array(issue_per_box, bool)) + return is_issue_per_box + + +def _pool_box_scores_per_image(is_issue_per_box: List[np.ndarray]) -> np.ndarray: + """Takes a list of length ``N`` whose ``n``-th entry is a boolean array with one entry per box in image ``n``, + marking whether that box is an issue (``True``) or not (``False``).
+ + Returns an array of length ``N`` whose entries mark whether each image contains an issue (1) or not (0). + An image is marked as an issue if one or more of its bounding boxes is an issue.""" + is_issue = np.zeros( + shape=[ + len( + is_issue_per_box, + ) + ] + ) + for idx, issue_per_box in enumerate(is_issue_per_box): + if np.sum(issue_per_box) > 0: + is_issue[idx] = 1 + return is_issue + + +def _process_class_list(class_list: List[np.ndarray], class_dict: Dict[int, float]) -> List: + """ + Maps each array of class indices in ``class_list`` to a list of float values using ``class_dict``, + i.e. each class index is replaced by its corresponding float value from the dictionary. + + Args: + class_list (List[np.ndarray]): A list of arrays of class indices (e.g. one array per image). + class_dict (Dict[int, float]): A dictionary mapping class indices to their corresponding float values. + + Returns: + List[List[float]]: For each input array, the float values corresponding to its classes. + """ + class_l2 = [] + for i in class_list: + l3 = [class_dict[j] for j in i] + class_l2.append(l3) + return class_l2 + + +def _calculate_ap_per_class( + labels: List[Dict[str, Any]], + predictions: List[np.ndarray], + *, + iou_threshold: Optional[float] = 0.5, + num_procs: int = 1, +) -> List: + """ + Computes the average precision for each class based on provided labels and predictions. + It uses an Intersection over Union (IoU) threshold and supports parallel processing with a specified number of processes.
+ + """ + num_images = len(predictions) + num_scale = 1 + num_classes = len(predictions[0]) + if num_images > 1: + num_procs = min(num_procs, num_images) + pool = Pool(num_procs) + ap_per_class_list = [] + for class_num in range(num_classes): + pred_bboxes, lab_bboxes = _filter_by_class(labels, predictions, class_num) + if num_images > 1: + tpfp = pool.starmap( + _calculate_true_positives_false_positives, + zip(pred_bboxes, lab_bboxes, [iou_threshold for _ in range(num_images)]), + ) + else: + tpfp = [ + _calculate_true_positives_false_positives( + pred_bboxes[0], + lab_bboxes[0], + iou_threshold, + ) + ] + true_positives, false_positives = tuple(zip(*tpfp)) + num_gts = np.zeros(num_scale, dtype=int) + for j, bbox in enumerate(lab_bboxes): + num_gts[0] += bbox.shape[0] + pred_bboxes = np.vstack(pred_bboxes) + sort_inds = np.argsort(-pred_bboxes[:, -1]) + true_positives = np.hstack(true_positives)[:, sort_inds] + false_positives = np.hstack(false_positives)[:, sort_inds] + true_positives = np.cumsum(true_positives, axis=1) + false_positives = np.cumsum(false_positives, axis=1) + eps = np.finfo(np.float32).eps + recalls = true_positives / np.maximum(num_gts[:, np.newaxis], eps) + precisions = true_positives / np.maximum((true_positives + false_positives), eps) + recalls = recalls[0, :] + precisions = precisions[0, :] + ap = _calculate_average_precision(recalls, precisions) + ap_per_class_list.append(ap) + if num_images > 1: + pool.close() + return ap_per_class_list + + +def _filter_by_class( + labels: List[Dict[str, Any]], predictions: List[np.ndarray], class_num: int +) -> Tuple[List, List]: + """ + Filters predictions and labels based on a specific class number. 
+ """ + pred_bboxes = [prediction[class_num] for prediction in predictions] + lab_bboxes = [] + for label in labels: + gt_inds = label["labels"] == class_num + lab_bboxes.append(label["bboxes"][gt_inds, :]) + return pred_bboxes, lab_bboxes + + +def _calculate_true_positives_false_positives( + pred_bboxes: np.ndarray, + lab_bboxes: np.ndarray, + iou_threshold: Optional[float] = 0.5, + return_false_negative: bool = False, +) -> Union[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray, np.ndarray]]: + """Calculates true positives (TP) and false positives (FP) for object detection tasks. + It takes predicted bounding boxes, ground truth bounding boxes, and an optional Intersection over Union (IoU) threshold as inputs. + If ``return_false_negative`` is ``True``, it also returns an array of false negatives. + """ + num_preds = pred_bboxes.shape[0] + num_labels = lab_bboxes.shape[0] + num_scales = 1 + true_positives = np.zeros((num_scales, num_preds), dtype=np.float32) + false_positives = np.zeros((num_scales, num_preds), dtype=np.float32) + + if lab_bboxes.shape[0] == 0: + false_positives[...]
= 1 + if return_false_negative: + return true_positives, false_positives, np.array([], dtype=np.float32) + else: + return true_positives, false_positives + ious = _get_overlap_matrix(pred_bboxes, lab_bboxes) + ious_max = ious.max(axis=1) + ious_argmax = ious.argmax(axis=1) + sorted_indices = np.argsort(-pred_bboxes[:, -1]) + is_covered = np.zeros(num_labels, dtype=bool) + for index in sorted_indices: + if ious_max[index] >= iou_threshold: + matching_label = ious_argmax[index] + if not is_covered[matching_label]: + is_covered[matching_label] = True + true_positives[0, index] = 1 + else: + false_positives[0, index] = 1 + else: + false_positives[0, index] = 1 + if return_false_negative: + false_negatives = np.zeros((num_scales, num_labels), dtype=np.float32) + for label_index in range(num_labels): + if not is_covered[label_index]: + false_negatives[0, label_index] = 1 + return true_positives, false_positives, false_negatives + return true_positives, false_positives + + +def _calculate_average_precision( + recall_values: np.ndarray, precision_values: np.ndarray +) -> np.ndarray: + """Computes the average precision (AP) for a set of recall and precision values. 
It takes arrays of recall and precision values as inputs.""" + recall_values = recall_values[np.newaxis, :] + precision_values = precision_values[np.newaxis, :] + num_scales = recall_values.shape[0] + average_precision = np.zeros(num_scales, dtype=np.float32) + zeros_matrix = np.zeros((num_scales, 1), dtype=recall_values.dtype) + ones_matrix = np.ones((num_scales, 1), dtype=recall_values.dtype) + modified_recall = np.hstack((zeros_matrix, recall_values, ones_matrix)) + modified_precision = np.hstack((zeros_matrix, precision_values, zeros_matrix)) + + for i in range(modified_precision.shape[1] - 1, 0, -1): + modified_precision[:, i - 1] = np.maximum( + modified_precision[:, i - 1], modified_precision[:, i] + ) + + for i in range(num_scales): + index = np.where(modified_recall[i, 1:] != modified_recall[i, :-1])[0] + average_precision[i] = np.sum( + (modified_recall[i, index + 1] - modified_recall[i, index]) + * modified_precision[i, index + 1] + ) + + return average_precision + + +def _get_per_class_ap( + labels: List[Dict[str, Any]], predictions: List[np.ndarray] +) -> Dict[int, float]: + """Computes the Average Precision (AP) for each class in an object detection task. + It takes a list of label dictionaries and a list of prediction arrays as inputs. + It calculates AP values for different Intersection over Union (IoU) thresholds, averages them per class, and then scales the AP values. + """ + iou_thrs = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True) + class_num_to_iou_list = defaultdict(list) + for threshold in iou_thrs: + ap_per_class = _calculate_ap_per_class(labels, predictions, iou_threshold=threshold) + for class_num in range(0, len(ap_per_class)): + class_num_to_iou_list[class_num].append(ap_per_class[class_num]) + class_num_to_AP = {} + for class_num in class_num_to_iou_list: + class_num_to_AP[class_num] = np.mean(class_num_to_iou_list[class_num]) * AP_SCALE_FACTOR + return class_num_to_AP +
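The AP helpers above follow a standard detection recipe. Predictions of one class are visited in order of decreasing confidence; each is a true positive if its best-overlapping annotated box has IoU at or above the threshold and has not already been matched, and a false positive otherwise. AP is then the area under the precision-recall curve after precision is made monotonically non-increasing, and ``_get_per_class_ap`` averages it over the IoU thresholds 0.50, 0.55, ..., 0.95 before applying ``AP_SCALE_FACTOR``. Below is a compact single-image, single-class sketch of those steps (without the scale factor). It assumes ``[x1, y1, x2, y2]`` boxes with continuous coordinates and at least one annotated box, which may differ from cleanlab's ``_get_overlap_matrix``; the function names are hypothetical.

```python
import numpy as np


def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between (M, 4) and (L, 4) arrays of [x1, y1, x2, y2] boxes."""
    x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def average_precision(pred, gt, iou_threshold=0.5):
    """AP for one class in one image; `pred` rows are [x1, y1, x2, y2, confidence].

    Assumes `gt` contains at least one annotated box.
    """
    order = np.argsort(-pred[:, 4])  # most confident predictions first
    ious = iou_matrix(pred[order, :4], gt)
    matched = np.zeros(len(gt), dtype=bool)
    tp = np.zeros(len(pred))
    for i in range(len(pred)):
        j = int(np.argmax(ious[i]))  # best-overlapping annotated box
        if ious[i, j] >= iou_threshold and not matched[j]:
            matched[j] = True
            tp[i] = 1
    tp_cum = np.cumsum(tp)
    recall = tp_cum / len(gt)
    precision = tp_cum / np.arange(1, len(pred) + 1)
    # Pad, then make precision non-increasing from right to left (the "envelope").
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    steps = np.where(r[1:] != r[:-1])[0]  # points where recall changes
    return float(np.sum((r[steps + 1] - r[steps]) * p[steps + 1]))


gt = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
pred = np.array([[0, 0, 10, 10, 0.9], [50, 50, 60, 60, 0.8], [20, 20, 30, 30, 0.7]], dtype=float)
iou_thresholds = np.linspace(0.5, 0.95, 10)  # 0.50, 0.55, ..., 0.95
print(np.mean([average_precision(pred, gt, t) for t in iou_thresholds]))  # about 0.833
```

Here the spurious middle prediction drops precision to 0.5 at recall 0.5, but the envelope step lifts it back to 2/3 once the third prediction recovers the second box, giving ``0.5 * 1 + 0.5 * 2/3``.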
+
+
+
+ + +
+
+ + Made with Sphinx and @pradyunsg's + + Furo + +
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/object_detection/rank.html b/v2.6.6/_modules/cleanlab/object_detection/rank.html
new file mode 100644
index 000000000..1fcd3b08f
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/object_detection/rank.html
@@ -0,0 +1,1805 @@
+ cleanlab.object_detection.rank - cleanlab
+
+
+ +
+
+
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.object_detection.rank

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""Methods to rank and score images in an object detection dataset (object detection data), based on how likely they
+are to contain label errors. """
+
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, TypeVar
+import warnings
+import copy
+import numpy as np
+
+from cleanlab.internal.constants import (
+    ALPHA,
+    CUSTOM_SCORE_WEIGHT_BADLOC,
+    CUSTOM_SCORE_WEIGHT_OVERLOOKED,
+    CUSTOM_SCORE_WEIGHT_SWAP,
+    EPSILON,
+    EUC_FACTOR,
+    HIGH_PROBABILITY_THRESHOLD,
+    LOW_PROBABILITY_THRESHOLD,
+    MAX_ALLOWED_BOX_PRUNE,
+    TINY_VALUE,
+    TEMPERATURE,
+    LABEL_OVERLAP_THRESHOLD,
+)
+from cleanlab.internal.object_detection_utils import (
+    softmin1d,
+    assert_valid_aggregation_weights,
+    assert_valid_inputs,
+)
+
+
+if TYPE_CHECKING:  # pragma: no cover
+    from typing import TypedDict
+
+    AuxiliaryTypesDict = TypedDict(
+        "AuxiliaryTypesDict",
+        {
+            "pred_labels": np.ndarray,
+            "pred_label_probs": np.ndarray,
+            "pred_bboxes": np.ndarray,
+            "lab_labels": np.ndarray,
+            "lab_bboxes": np.ndarray,
+            "similarity_matrix": np.ndarray,
+            "iou_matrix": np.ndarray,
+            "min_possible_similarity": float,
+        },
+    )
+else:
+    AuxiliaryTypesDict = TypeVar("AuxiliaryTypesDict")
+
+
+
+def get_label_quality_scores( + labels: List[Dict[str, Any]], + predictions: List[np.ndarray], + *, + aggregation_weights: Optional[Dict[str, float]] = None, + overlapping_label_check: Optional[bool] = True, + verbose: bool = True, +) -> np.ndarray: + """Computes a label quality score for each image of the ``N`` images in the dataset. + + For object detection datasets, the label quality score for an image estimates how likely it has been correctly labeled. + Lower scores indicate images whose annotation is more likely imperfect. + Annotators may have mislabeled an image because they: + + - overlooked an object (missing annotated bounding box), + - chose the wrong class label for an annotated box in the correct location, + - imperfectly annotated the location/edges of a bounding box. + + Any of these annotation errors should lead to an image with a lower label quality score. This quality score is between 0 and 1. + + - 1 - clean label (given label is likely correct). + - 0 - dirty label (given label is likely incorrect). + + Parameters + ---------- + labels: + A list of ``N`` dictionaries such that ``labels[i]`` contains the given labels for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + predictions: + A list of ``N`` ``np.ndarray`` such that ``predictions[i]`` corresponds to the model predictions for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + verbose : bool, default = True + Set to ``False`` to suppress all print statements. + + aggregation_weights: + Optional dictionary to specify weights for aggregating the quality scores for each subtype of label issue into an overall label quality score for the image. + Its keys are: "overlooked", "swap", "badloc", and values should be nonnegative weights that sum to 1.
+ Increase one of these weights to prioritize images with bounding boxes that were either: + missing in the annotations (overlooked object), annotated with the wrong class label (class for the object should be swapped to another class), or annotated in a suboptimal location (badly located). + + overlapping_label_check : bool, default = True + If True, boxes annotated with more than one class label have their swap score penalized. Set this to False if you are not concerned about two nearly identical boxes with different class labels in the given annotations. + + Returns + --------- + label_quality_scores: + Array of shape ``(N, )`` of scores between 0 and 1, one per image in the object detection dataset. + Lower scores indicate images that are more likely mislabeled. + """ + method = "objectlab" + probability_threshold = 0.0 + + assert_valid_inputs( + labels=labels, + predictions=predictions, + method=method, + threshold=probability_threshold, + ) + aggregation_weights = _get_aggregation_weights(aggregation_weights) + + return _compute_label_quality_scores( + labels=labels, + predictions=predictions, + method=method, + threshold=probability_threshold, + aggregation_weights=aggregation_weights, + overlapping_label_check=overlapping_label_check, + verbose=verbose, + )
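The full argument format is documented under `find_label_issues`. Judging only from `_separate_label` and `_separate_prediction_single_box` in this module, each label is a dict with `"bboxes"` and `"labels"` arrays. Each prediction is a sequence of per-class arrays whose rows hold x1, y1, x2, y2 coordinates followed by a predicted probability in the last column. The sketch below builds one made-up image in that shape and flattens the prediction in the same way `_separate_prediction_single_box` does. The helper name `flatten_prediction` and all numbers are hypothetical.

```python
import numpy as np

# Given labels for one image: boxes (L, 4) in x1, y1, x2, y2 order and class ids (L,)
label = {
    "bboxes": np.array([[10.0, 10.0, 50.0, 50.0], [60.0, 60.0, 90.0, 90.0]]),
    "labels": np.array([0, 1]),
}

# Model predictions for one image: one array per class, rows are x1, y1, x2, y2, prob
prediction = [
    np.array([[11.0, 9.0, 49.0, 52.0, 0.95]]),  # class 0
    np.array([[61.0, 58.0, 91.0, 92.0, 0.80],
              [5.0, 70.0, 20.0, 85.0, 0.30]]),  # class 1
]


def flatten_prediction(prediction):
    """Flatten per-class prediction arrays into boxes, class ids and probabilities."""
    labels, boxes = [], []
    for class_id, class_preds in enumerate(prediction):
        # The position of the array in the list is the predicted class
        labels.extend([class_id] * len(class_preds))
        boxes.extend(class_preds.tolist())
    bboxes = np.array([box[:4] for box in boxes])
    probs = np.array([box[-1] for box in boxes])
    return bboxes, np.array(labels), probs


pred_bboxes, pred_labels, pred_probs = flatten_prediction(prediction)
print(pred_labels)  # [0 1 1]
```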
+ + +
+def issues_from_scores(label_quality_scores: np.ndarray, *, threshold: float = 0.1) -> np.ndarray: + """Converts label quality scores to an array of indices of images with issues, sorted from most to least severe and cut off at ``threshold``. + + Parameters + ---------- + label_quality_scores: + Array of shape ``(N, )`` of scores between 0 and 1, one per image in the object detection dataset. + Lower scores indicate images that are more likely to contain a label issue. + + threshold: + Label quality scores above the threshold are not considered to be label issues. The corresponding examples' indices are omitted from the returned array. + + Returns + --------- + issue_indices: + Array of indices of images whose label quality scores are at or below the threshold, sorted from most to least severe. + """ + + if threshold > 1.0: + raise ValueError( + f""" + Threshold is a cutoff of label_quality_scores and therefore should be <= 1. + """ + ) + + issue_indices = np.argwhere(label_quality_scores <= threshold).flatten() + issue_vals = label_quality_scores[issue_indices] + sorted_idx = issue_vals.argsort() + return issue_indices[sorted_idx]
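`issues_from_scores` keeps the images whose score is at or below `threshold` and orders them by ascending score, so the most severe image comes first. A minimal NumPy sketch of the same logic follows. The function name `issues_from_scores_sketch` and the scores are made up for illustration.

```python
import numpy as np


def issues_from_scores_sketch(scores: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    if threshold > 1.0:
        raise ValueError("threshold is a cutoff on scores in [0, 1], so it must be <= 1")
    # Indices of images whose score is at or below the threshold
    issue_indices = np.argwhere(scores <= threshold).flatten()
    # Order those indices from lowest (most severe) to highest score
    order = scores[issue_indices].argsort()
    return issue_indices[order]


scores = np.array([0.9, 0.05, 0.1, 0.5, 0.01])
print(issues_from_scores_sketch(scores))  # [4 1 2]
```

With the module itself, the equivalent call would be `issues_from_scores(scores, threshold=0.1)` from `cleanlab.object_detection.rank`.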
+ + +def _compute_label_quality_scores( + labels: List[Dict[str, Any]], + predictions: List[np.ndarray], + *, + method: Optional[str] = "objectlab", + aggregation_weights: Optional[Dict[str, float]] = None, + threshold: Optional[float] = None, + overlapping_label_check: Optional[bool] = True, + verbose: bool = True, +) -> np.ndarray: + """Internal function to prune extra bounding boxes and compute label quality scores based on passed in method.""" + + pred_probs_prepruned = False + min_pred_prob = _get_min_pred_prob(predictions) + aggregation_weights = _get_aggregation_weights(aggregation_weights) + + if threshold is not None: + predictions = _prune_by_threshold( + predictions=predictions, threshold=threshold, verbose=verbose + ) + if np.abs(min_pred_prob - threshold) < 0.001 and threshold > 0: + pred_probs_prepruned = True # the provided threshold is the threshold used for pre_pruning the pred_probs during model prediction. + else: + threshold = min_pred_prob # assume model was not pre_pruned if no threshold was provided + + if method == "objectlab": + scores = _get_subtype_label_quality_scores( + labels=labels, + predictions=predictions, + alpha=ALPHA, + low_probability_threshold=LOW_PROBABILITY_THRESHOLD, + high_probability_threshold=HIGH_PROBABILITY_THRESHOLD, + temperature=TEMPERATURE, + aggregation_weights=aggregation_weights, + overlapping_label_check=overlapping_label_check, + ) + else: + raise ValueError( + "Invalid method: '{}' is not a valid method for computing label quality scores. Please use the 'objectlab' method.".format( + method + ) + ) + return scores + + +def _get_min_pred_prob( + predictions: List[np.ndarray], +) -> float: + """Returns min pred_prob out of all predictions.""" + pred_probs = [1.0] # avoid calling np.min on empty array. 
+ for prediction in predictions: + for class_prediction in prediction: + pred_probs.extend(list(class_prediction[:, -1])) + + min_pred_prob = np.min(pred_probs) + return min_pred_prob + + +def _prune_by_threshold( + predictions: List[np.ndarray], threshold: float, verbose: bool = True +) -> List[np.ndarray]: + """Removes predicted bounding boxes whose pred_prob is below the cutoff threshold.""" + + predictions_copy = copy.deepcopy(predictions) + num_ann_to_zero = 0 + total_ann = 0 + for idx_predictions, prediction in enumerate(predictions_copy): + for idx_class, class_prediction in enumerate(prediction): + filtered_class_prediction = class_prediction[class_prediction[:, -1] >= threshold] + if len(class_prediction) > 0: + total_ann += 1 + if len(filtered_class_prediction) == 0: + num_ann_to_zero += 1 + + predictions_copy[idx_predictions][idx_class] = filtered_class_prediction + + p_ann_pruned = total_ann and num_ann_to_zero / total_ann or 0 # avoid division by zero + if p_ann_pruned > MAX_ALLOWED_BOX_PRUNE: + warnings.warn( + f"Pruning with threshold=={threshold} prunes {p_ann_pruned:.1%} of predictions. Consider lowering the threshold.", + UserWarning, + ) + if verbose: + print( + f"Pruning {num_ann_to_zero} predictions out of {total_ann} using threshold=={threshold}. These pruned predictions are no longer considered as candidates for identifying label issues."
+ ) + return predictions_copy + + +def _separate_label(label: Dict[str, Any]) -> Tuple[np.ndarray, np.ndarray]: + """Separates labels into bounding box and class label lists.""" + bboxes = label["bboxes"] + labels = label["labels"] + return bboxes, labels + + +# TODO: make object detection work for all predicted probabilities +def _separate_prediction_all_preds( + prediction: List[np.ndarray], +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + pred_bboxes, pred_labels, det_probs = prediction + return pred_bboxes, pred_labels, det_probs + + +def _separate_prediction_single_box( + prediction: np.ndarray, +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + """Separates predictions into class labels, bounding boxes and pred_prob lists""" + labels = [] + boxes = [] + for idx, prediction_class in enumerate(prediction): + labels.extend([idx] * len(prediction_class)) + boxes.extend(prediction_class.tolist()) + bboxes = [box[:4] for box in boxes] + pred_probs = [box[-1] for box in boxes] + return np.array(bboxes), np.array(labels), np.array(pred_probs) + + +def _get_prediction_type(prediction: np.ndarray) -> str: + if ( + len(prediction) == 3 + and prediction[0].shape[0] == prediction[2].shape[1] + and prediction[1].shape[0] == prediction[2].shape[0] + ): + return "all_pred" + else: + return "single_pred" + + +def _separate_prediction( + prediction, prediction_type="single_pred" +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + """Returns bbox, label and pred_prob values for prediction.""" + + if prediction_type == "all_pred": + boxes, labels, pred_probs = _separate_prediction_all_preds(prediction) + else: + boxes, labels, pred_probs = _separate_prediction_single_box(prediction) + return boxes, labels, pred_probs + + +def _mod_coordinates(x: List[float]) -> Dict[str, Any]: + """Takes in a list of xyxy coordinates and returns them in dictionary format.""" + + wd = {"x1": x[0], "y1": x[1], "x2": x[2], "y2": x[3]} + return wd + + +def _get_overlap(bb1: List[float], bb2:
List[float]) -> float: + """Takes in two bounding boxes `bb1` and `bb2` and returns their IoU overlap.""" + + return _get_iou(_mod_coordinates(bb1), _mod_coordinates(bb2)) + + +def _get_overlap_matrix(bb1_list: np.ndarray, bb2_list: np.ndarray) -> np.ndarray: + """Takes in two lists of bounding boxes and returns an IoU matrix where IoU[i][j] is the overlap between + the i-th box in `bb1_list` and the j-th box in `bb2_list`.""" + wd = np.zeros(shape=(len(bb1_list), len(bb2_list))) + for i in range(len(bb1_list)): + for j in range(len(bb2_list)): + wd[i][j] = _get_overlap(bb1_list[i], bb2_list[j]) + return wd + + +def _get_iou(bb1: Dict[str, Any], bb2: Dict[str, Any]) -> float: + """ + Calculate the Intersection over Union (IoU) of two bounding boxes. + The union area in the denominator is floored at ``EPSILON`` to avoid division by zero. + + Parameters + ---------- + bb1 : dict + Keys: {'x1', 'x2', 'y1', 'y2'} + The (x1, y1) position is at the top left corner, + the (x2, y2) position is at the bottom right corner + bb2 : dict + Keys: {'x1', 'x2', 'y1', 'y2'} + The (x1, y1) position is at the top left corner, + the (x2, y2) position is at the bottom right corner + Returns + ------- + float + in [0, 1] + """ + # determine the coordinates of the intersection rectangle + x_left = max(bb1["x1"], bb2["x1"]) + y_top = max(bb1["y1"], bb2["y1"]) + x_right = min(bb1["x2"], bb2["x2"]) + y_bottom = min(bb1["y2"], bb2["y2"]) + + if x_right < x_left or y_bottom < y_top: + return 0.0 + + # The intersection of two axis-aligned bounding boxes is always an + # axis-aligned bounding box + intersection_area = (x_right - x_left) * (y_bottom - y_top) + + # compute the area of both AABBs + bb1_area = (bb1["x2"] - bb1["x1"]) * (bb1["y2"] - bb1["y1"]) + bb2_area = (bb2["x2"] - bb2["x1"]) * (bb2["y2"] - bb2["y1"]) + + # compute the intersection over union by taking the intersection + # area and dividing it by the sum of prediction + ground-truth +
# areas - the intersection area + iou = intersection_area / np.clip( + float(bb1_area + bb2_area - intersection_area), a_min=EPSILON, a_max=None + ) # avoid division by 0 + # Possible hyperparameters to consider here: tile area or object area + return iou + + +def _has_overlap(bbox_list, labels): + """This function determines whether each labeled box overlaps (IoU >= LABEL_OVERLAP_THRESHOLD) with another box of a different class (i.e. virtually the same box having multiple conflicting annotations). It returns a boolean array.""" + iou_matrix = _get_overlap_matrix(bbox_list, bbox_list) + results_overlap = [] + for i in range(0, len(iou_matrix)): + is_overlap = False + for j in range(0, len(iou_matrix)): + if i != j: + if iou_matrix[i][j] >= LABEL_OVERLAP_THRESHOLD: + lab_1 = labels[i] + lab_2 = labels[j] + if lab_1 != lab_2: + is_overlap = True + results_overlap.append(is_overlap) + return np.array(results_overlap) + + +def _euc_dis(box1: List[float], box2: List[float]) -> float: + """Returns exp(-EUC_FACTOR * d), where d is the Euclidean distance between the centers of `box1` and `box2`.""" + x1, y1 = (box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2 + x2, y2 = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2 + p1 = np.array([x1, y1]) + p2 = np.array([x2, y2]) + val2 = np.exp(-np.linalg.norm(p1 - p2) * EUC_FACTOR) + return val2 + + +def _get_dist_matrix(bb1_list: np.ndarray, bb2_list: np.ndarray) -> np.ndarray: + """Returns a matrix of `_euc_dis` values between every box in bb1_list and every box in bb2_list.""" + wd = np.zeros(shape=(len(bb1_list), len(bb2_list))) + for i in range(len(bb1_list)): + for j in range(len(bb2_list)): + wd[i][j] = _euc_dis(bb1_list[i], bb2_list[j]) + return wd + + +def _get_min_possible_similarity( + alpha: float, + predictions, + labels: List[Dict[str, Any]], +) -> float: + """Gets the smallest nonzero similarity score between a labeled box and a predicted box across all images.""" + min_possible_similarity = 1.0 + for prediction, label in zip(predictions, labels): + lab_bboxes, lab_labels = _separate_label(label) +
pred_bboxes, pred_labels, _ = _separate_prediction(prediction) + iou_matrix = _get_overlap_matrix(lab_bboxes, pred_bboxes) + dist_matrix = 1 - _get_dist_matrix(lab_bboxes, pred_bboxes) + similarity_matrix = iou_matrix * alpha + (1 - alpha) * (1 - dist_matrix) + non_zero_similarity_matrix = similarity_matrix[np.nonzero(similarity_matrix)] + min_image_similarity = ( + 1.0 if 0 in non_zero_similarity_matrix.shape else np.min(non_zero_similarity_matrix) + ) + min_possible_similarity = np.min([min_possible_similarity, min_image_similarity]) + return min_possible_similarity + + +def _get_valid_inputs_for_compute_scores_per_image( + alpha: float, + *, + label: Optional[Dict[str, Any]] = None, + prediction: Optional[np.ndarray] = None, + pred_labels: Optional[np.ndarray] = None, + pred_label_probs: Optional[np.ndarray] = None, + pred_bboxes: Optional[np.ndarray] = None, + lab_labels: Optional[np.ndarray] = None, + lab_bboxes: Optional[np.ndarray] = None, + similarity_matrix: Optional[np.ndarray] = None, + iou_matrix: Optional[np.ndarray] = None, + min_possible_similarity: Optional[float] = None, +) -> AuxiliaryTypesDict: + """Returns valid inputs for computing scores, either passing through the provided values or computing missing ones internally.""" + if lab_labels is None or lab_bboxes is None: + if label is None: + raise ValueError( + f"Pass in either label, or both lab_labels and lab_bboxes, as auxiliary inputs. They cannot all be None." + ) + lab_bboxes, lab_labels = _separate_label(label) + + if pred_labels is None or pred_label_probs is None or pred_bboxes is None: + if prediction is None: + raise ValueError( + f"Pass in either prediction, or all of pred_labels, pred_label_probs and pred_bboxes, as auxiliary inputs. They cannot all be None."
+ ) + pred_bboxes, pred_labels, pred_label_probs = _separate_prediction(prediction) + + if similarity_matrix is None: + iou_matrix = _get_overlap_matrix(lab_bboxes, pred_bboxes) + dist_matrix = 1 - _get_dist_matrix(lab_bboxes, pred_bboxes) + similarity_matrix = iou_matrix * alpha + (1 - alpha) * (1 - dist_matrix) + + if iou_matrix is None: + iou_matrix = _get_overlap_matrix(lab_bboxes, pred_bboxes) + + if min_possible_similarity is None: + min_possible_similarity = ( + 1.0 + if 0 in similarity_matrix.shape + else np.min(similarity_matrix[np.nonzero(similarity_matrix)]) + ) + + auxiliary_input_dict: AuxiliaryTypesDict = { + "pred_labels": pred_labels, + "pred_label_probs": pred_label_probs, + "pred_bboxes": pred_bboxes, + "lab_labels": lab_labels, + "lab_bboxes": lab_bboxes, + "similarity_matrix": similarity_matrix, + "iou_matrix": iou_matrix, + "min_possible_similarity": min_possible_similarity, + } + + return auxiliary_input_dict + + +def _get_valid_inputs_for_compute_scores( + alpha: float, + labels: Optional[List[Dict[str, Any]]] = None, + predictions: Optional[List[np.ndarray]] = None, +) -> List[AuxiliaryTypesDict]: + """Takes in alpha, labels and predictions and returns a list of auxiliary input dictionaries, one per image, containing the separated parts of each label and prediction.""" + if predictions is None or labels is None: + raise ValueError( + f"Predictions and labels cannot be None. Both are needed to get valid inputs."
+ ) + min_possible_similarity = _get_min_possible_similarity(alpha, predictions, labels) + + auxiliary_inputs = [] + + for prediction, label in zip(predictions, labels): + auxiliary_input_dict = _get_valid_inputs_for_compute_scores_per_image( + alpha=alpha, + label=label, + prediction=prediction, + min_possible_similarity=min_possible_similarity, + ) + auxiliary_inputs.append(auxiliary_input_dict) + + return auxiliary_inputs + + +def _get_valid_score(scores_arr: np.ndarray, temperature: float) -> float: + """Drops NaN entries from `scores_arr` and returns the softmin of the remaining scores, or 1.0 if none remain.""" + scores_arr = scores_arr[~np.isnan(scores_arr)] + if len(scores_arr) > 0: + valid_score = softmin1d(scores_arr, temperature=temperature) + else: + valid_score = 1.0 + return valid_score + + +def _get_valid_subtype_score_params( + alpha: Optional[float] = None, + low_probability_threshold: Optional[float] = None, + high_probability_threshold: Optional[float] = None, + temperature: Optional[float] = None, +): + """This function returns valid params for subtype score.
If param is None, then default constant is returned""" + if alpha is None: + alpha = ALPHA + if low_probability_threshold is None: + low_probability_threshold = LOW_PROBABILITY_THRESHOLD + if high_probability_threshold is None: + high_probability_threshold = HIGH_PROBABILITY_THRESHOLD + if temperature is None: + temperature = TEMPERATURE + return alpha, low_probability_threshold, high_probability_threshold, temperature + + +def _get_aggregation_weights( + aggregation_weights: Optional[Dict[str, Any]] = None +) -> Dict[str, Any]: + """This function validates aggregation weights, returning the default weights if none are provided.""" + if aggregation_weights is None: + aggregation_weights = { + "overlooked": CUSTOM_SCORE_WEIGHT_OVERLOOKED, + "swap": CUSTOM_SCORE_WEIGHT_SWAP, + "badloc": CUSTOM_SCORE_WEIGHT_BADLOC, + } + else: + assert_valid_aggregation_weights(aggregation_weights) + return aggregation_weights + + +def _compute_overlooked_box_scores_for_image( + alpha: float, + high_probability_threshold: float, + label: Optional[Dict[str, Any]] = None, + prediction: Optional[np.ndarray] = None, + pred_labels: Optional[np.ndarray] = None, + pred_label_probs: Optional[np.ndarray] = None, + pred_bboxes: Optional[np.ndarray] = None, + lab_labels: Optional[np.ndarray] = None, + lab_bboxes: Optional[np.ndarray] = None, + similarity_matrix: Optional[np.ndarray] = None, + iou_matrix: Optional[np.ndarray] = None, + min_possible_similarity: Optional[float] = None, +) -> np.ndarray: + """This method returns one score per predicted box (above threshold) in an image. 
Score from 0 to 1 ranking how overlooked the box is.""" + + auxiliary_input_dict = _get_valid_inputs_for_compute_scores_per_image( + alpha=alpha, + label=label, + prediction=prediction, + pred_labels=pred_labels, + pred_label_probs=pred_label_probs, + pred_bboxes=pred_bboxes, + lab_labels=lab_labels, + lab_bboxes=lab_bboxes, + similarity_matrix=similarity_matrix, + min_possible_similarity=min_possible_similarity, + ) + + pred_labels = auxiliary_input_dict["pred_labels"] + pred_label_probs = auxiliary_input_dict["pred_label_probs"] + lab_labels = auxiliary_input_dict["lab_labels"] + similarity_matrix = auxiliary_input_dict["similarity_matrix"] + min_possible_similarity = auxiliary_input_dict["min_possible_similarity"] + iou_matrix = auxiliary_input_dict["iou_matrix"] + + scores_overlooked = np.empty(len(pred_labels)) # same length as num of predicted boxes + + for iid, k in enumerate(pred_labels): + if pred_label_probs[iid] < high_probability_threshold or np.any(iou_matrix[:, iid] > 0): + scores_overlooked[iid] = np.nan + continue + + k_similarity = similarity_matrix[lab_labels == k, iid] + + if len(k_similarity) == 0: # if there are no annotated boxes of class k + score = min_possible_similarity * (1 - pred_label_probs[iid]) + else: + closest_annotated_box = np.argmax(k_similarity) + score = k_similarity[closest_annotated_box] + + scores_overlooked[iid] = score + + return scores_overlooked + + +
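The `similarity_matrix` consumed above mixes IoU with a center-distance term. `_get_valid_inputs_for_compute_scores_per_image` sets `dist_matrix = 1 - _get_dist_matrix(...)` and then `similarity = alpha * iou + (1 - alpha) * (1 - dist_matrix)`. This simplifies to `alpha * IoU + (1 - alpha) * exp(-EUC_FACTOR * d)`, where `d` is the distance between box centers. The sketch below recomputes this matrix outside cleanlab. The values of `ALPHA`, `EUC_FACTOR` and `EPSILON` are placeholders, not cleanlab's constants, and the helper names are hypothetical.

```python
import numpy as np

ALPHA = 0.9       # placeholder, cleanlab's ALPHA constant may differ
EUC_FACTOR = 0.1  # placeholder, cleanlab's EUC_FACTOR constant may differ
EPSILON = 1e-6    # placeholder floor for the union area


def iou(b1, b2):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x_left, y_top = max(b1[0], b2[0]), max(b1[1], b2[1])
    x_right, y_bottom = min(b1[2], b2[2]), min(b1[3], b2[3])
    if x_right < x_left or y_bottom < y_top:
        return 0.0  # the boxes do not intersect
    inter = (x_right - x_left) * (y_bottom - y_top)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / max(area1 + area2 - inter, EPSILON)


def center_closeness(b1, b2):
    """exp(-EUC_FACTOR * distance between box centers), in (0, 1]."""
    c1 = np.array([(b1[0] + b1[2]) / 2, (b1[1] + b1[3]) / 2])
    c2 = np.array([(b2[0] + b2[2]) / 2, (b2[1] + b2[3]) / 2])
    return float(np.exp(-np.linalg.norm(c1 - c2) * EUC_FACTOR))


def similarity_matrix(lab_bboxes, pred_bboxes, alpha=ALPHA):
    """S[i, j] for annotated box i and predicted box j, shape (L, M)."""
    S = np.zeros((len(lab_bboxes), len(pred_bboxes)))
    for i, lb in enumerate(lab_bboxes):
        for j, pb in enumerate(pred_bboxes):
            S[i, j] = alpha * iou(lb, pb) + (1 - alpha) * center_closeness(lb, pb)
    return S


labels = np.array([[0.0, 0.0, 10.0, 10.0]])
preds = np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]])
S = similarity_matrix(labels, preds)
```

An identical box gets similarity 1.0. A disjoint box still gets a small positive value through the distance term, which is why `min_possible_similarity` is taken over nonzero entries.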
+def compute_overlooked_box_scores( + *, + labels: Optional[List[Dict[str, Any]]] = None, + predictions: Optional[List[np.ndarray]] = None, + alpha: Optional[float] = None, + high_probability_threshold: Optional[float] = None, + auxiliary_inputs: Optional[List[AuxiliaryTypesDict]] = None, +) -> List[np.ndarray]: + """ + Returns a list with one array of overlooked box scores per image. + This is a helper method mostly for advanced users. + + An overlooked box error is when an image contains an object that belongs to one of the given classes but has no annotated bounding box around it. + Each high-confidence predicted bounding box gets a score between 0 and 1, with lower values indicating boxes we are more confident were overlooked in the given label. Predicted boxes below the probability threshold, or overlapping any annotated box, get a score of NaN. + + Each image has ``L`` annotated bounding boxes and ``M`` predicted bounding boxes. + A score is calculated for each predicted box in each of the ``N`` images in the dataset. + + Note: ``M`` and ``L`` can differ between images, as the number of annotated and predicted boxes varies. + + Parameters + ---------- + labels: + A list of ``N`` dictionaries such that ``labels[i]`` contains the given labels for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + predictions: + A list of ``N`` ``np.ndarray`` such that ``predictions[i]`` corresponds to the model predictions for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + alpha: + Optional weighting between IoU and Euclidean distance when calculating similarity between predicted and annotated boxes. A higher alpha weights IoU more heavily relative to the Euclidean distance between box centers. If no alpha is provided, a good default is used.
+ + high_probability_threshold: + Optional probability threshold that determines which predicted boxes are considered high-confidence when computing overlooked scores. If not provided, a good default is used. + + auxiliary_inputs: + Optional list of ``N`` dictionaries containing keys for sub-parts of label and prediction per image. Useful to minimize computation when computing multiple box scores for a single set of images. For the `i`-th image, `auxiliary_inputs[i]` should contain the following keys: + + * pred_labels: np.ndarray + Array of predicted classes for the `i`-th image of shape ``(M,)``. + * pred_label_probs: np.ndarray + Array of predicted class probabilities for the `i`-th image of shape ``(M,)``. + * pred_bboxes: np.ndarray + Array of predicted bounding boxes for the `i`-th image of shape ``(M, 4)``. + * lab_labels: np.ndarray + Array of given label classes for the `i`-th image of shape ``(L,)``. + * lab_bboxes: np.ndarray + Array of given label bounding boxes for the `i`-th image of shape ``(L, 4)``. + * similarity_matrix: np.ndarray + Similarity matrix between labels and predictions for the `i`-th image, of shape ``(L, M)``. + * iou_matrix: np.ndarray + IoU matrix between labels and predictions for the `i`-th image, of shape ``(L, M)``. + * min_possible_similarity: float + Minimum possible similarity value greater than 0 between labels and predictions for the entire dataset. + + Returns + --------- + scores_overlooked: + A list of ``N`` numpy arrays where scores_overlooked[i] is an array of size ``M`` of overlooked scores per predicted box for the `i`-th image.
+ """ + ( + alpha, + low_probability_threshold, + high_probability_threshold, + temperature, + ) = _get_valid_subtype_score_params(alpha, None, high_probability_threshold, None) + + if auxiliary_inputs is None: + auxiliary_inputs = _get_valid_inputs_for_compute_scores(alpha, labels, predictions) + + scores_overlooked = [] + for auxiliary_input_dict in auxiliary_inputs: + scores_overlooked_per_box = _compute_overlooked_box_scores_for_image( + alpha=alpha, + high_probability_threshold=high_probability_threshold, + **auxiliary_input_dict, + ) + scores_overlooked.append(scores_overlooked_per_box) + return scores_overlooked
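The per-box rule in `_compute_overlooked_box_scores_for_image` can be restated compactly. A predicted box gets NaN if its probability is below `high_probability_threshold` or if it overlaps any annotated box. Otherwise it gets its best similarity to an annotated box of the same class, or `min_possible_similarity * (1 - prob)` when no annotated box of that class exists. The sketch below restates that rule on plain arrays. The function name and the toy matrices are hypothetical, and both matrices are laid out as (L, M) like in the module.

```python
import numpy as np


def overlooked_scores(pred_labels, pred_probs, lab_labels, similarity, iou,
                      min_possible_similarity, high_probability_threshold):
    """One score per predicted box; NaN marks boxes that are not overlooked candidates."""
    scores = np.full(len(pred_labels), np.nan)
    for j, k in enumerate(pred_labels):
        # Skip low-confidence predictions and any prediction overlapping an annotated box
        if pred_probs[j] < high_probability_threshold or np.any(iou[:, j] > 0):
            continue
        same_class = similarity[lab_labels == k, j]
        if len(same_class) == 0:
            # No annotated box of class k: a more confident prediction scores lower
            scores[j] = min_possible_similarity * (1 - pred_probs[j])
        else:
            # Otherwise use the similarity to the closest same-class annotated box
            scores[j] = np.max(same_class)
    return scores


# Made-up example: one annotated box of class 0 and three predicted boxes
lab_labels = np.array([0])
pred_labels = np.array([0, 1, 0])
pred_probs = np.array([0.99, 0.98, 0.5])
iou = np.zeros((1, 3))                      # no prediction overlaps the annotated box
similarity = np.array([[0.02, 0.01, 0.5]])
s = overlooked_scores(pred_labels, pred_probs, lab_labels, similarity, iou,
                      min_possible_similarity=0.01, high_probability_threshold=0.9)
# s[0] is 0.02, s[1] is about 0.0002, s[2] is NaN (below the probability threshold)
```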
+ + +def _compute_badloc_box_scores_for_image( + alpha: float, + low_probability_threshold: float, + label: Optional[Dict[str, Any]] = None, + prediction: Optional[np.ndarray] = None, + pred_labels: Optional[np.ndarray] = None, + pred_label_probs: Optional[np.ndarray] = None, + pred_bboxes: Optional[np.ndarray] = None, + lab_labels: Optional[np.ndarray] = None, + lab_bboxes: Optional[np.ndarray] = None, + similarity_matrix: Optional[np.ndarray] = None, + iou_matrix: Optional[np.ndarray] = None, + min_possible_similarity: Optional[float] = None, +) -> np.ndarray: + """This method returns one score per labeled box in an image. Score from 0 to 1 ranking how badly located the box is.""" + + auxiliary_input_dict = _get_valid_inputs_for_compute_scores_per_image( + alpha=alpha, + label=label, + prediction=prediction, + pred_labels=pred_labels, + pred_label_probs=pred_label_probs, + pred_bboxes=pred_bboxes, + lab_labels=lab_labels, + lab_bboxes=lab_bboxes, + similarity_matrix=similarity_matrix, + iou_matrix=iou_matrix, + min_possible_similarity=min_possible_similarity, + ) + pred_labels = auxiliary_input_dict["pred_labels"] + pred_label_probs = auxiliary_input_dict["pred_label_probs"] + lab_labels = auxiliary_input_dict["lab_labels"] + similarity_matrix = auxiliary_input_dict["similarity_matrix"] + iou_matrix = auxiliary_input_dict["iou_matrix"] + + scores_badloc = np.empty(len(lab_labels)) + + for iid, k in enumerate(lab_labels): + k_similarity = similarity_matrix[iid, pred_labels == k] + k_pred = pred_label_probs[pred_labels == k] + k_iou = iou_matrix[iid, pred_labels == k] + + if len(k_pred) == 0 or np.max(k_pred) <= low_probability_threshold: + scores_badloc[iid] = 1.0 + continue + + idx_at_least_low_probability_threshold = np.where(k_pred > low_probability_threshold)[0] + idx_at_least_intersection_threshold = np.where(k_iou > 0)[0] + combined_idx = np.intersect1d( + idx_at_least_low_probability_threshold, idx_at_least_intersection_threshold + ) + + k_similarity = 
k_similarity[combined_idx] + k_pred = k_pred[combined_idx] + + scores_badloc[iid] = np.max(k_similarity) if len(k_pred) > 0 else 1.0 + return scores_badloc + + +
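`_compute_badloc_box_scores_for_image` scores each annotated box by its best similarity to a same-class prediction that is both more confident than `low_probability_threshold` and overlapping the box (IoU > 0). When no such prediction exists, the score is 1.0. The sketch below restates that rule with a boolean mask instead of `np.intersect1d`. The function name and toy values are hypothetical.

```python
import numpy as np


def badloc_scores(lab_labels, pred_labels, pred_probs, similarity, iou,
                  low_probability_threshold):
    """One score per annotated box; 1.0 means no evidence that the box is badly located."""
    scores = np.ones(len(lab_labels))
    for i, k in enumerate(lab_labels):
        same_class = pred_labels == k
        k_sim = similarity[i, same_class]
        k_prob = pred_probs[same_class]
        k_iou = iou[i, same_class]
        # Keep same-class predictions that are confident enough and overlap this box
        keep = (k_prob > low_probability_threshold) & (k_iou > 0)
        if np.any(keep):
            scores[i] = np.max(k_sim[keep])
    return scores


# Made-up example: two annotated boxes (classes 0 and 1), three predictions
lab_labels = np.array([0, 1])
pred_labels = np.array([0, 0, 1])
pred_probs = np.array([0.9, 0.2, 0.9])
similarity = np.array([[0.7, 0.95, 0.3], [0.1, 0.2, 0.6]])
iou = np.array([[0.6, 0.8, 0.0], [0.0, 0.0, 0.0]])
scores_badloc = badloc_scores(lab_labels, pred_labels, pred_probs, similarity, iou,
                              low_probability_threshold=0.5)
```

The first box scores 0.7. The more similar prediction (0.95) is ignored because its probability is below the threshold. The second box scores 1.0 because its only same-class prediction does not overlap it.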
+def compute_badloc_box_scores( + *, + labels: Optional[List[Dict[str, Any]]] = None, + predictions: Optional[List[np.ndarray]] = None, + alpha: Optional[float] = None, + low_probability_threshold: Optional[float] = None, + auxiliary_inputs: Optional[List[AuxiliaryTypesDict]] = None, +) -> List[np.ndarray]: + """ + Returns a numeric score for each annotated bounding box in each image, estimating the likelihood that the edges of this box are not badly located. + This is a helper method mostly for advanced users. + + A badly located box error is when a box has the correct label but incorrect coordinates, so it does not correctly encapsulate the entire object it is for. + Each annotated bounding box gets a score between 0 and 1, with lower values indicating boxes we are more confident are badly located. Annotated boxes with no overlapping, sufficiently confident prediction of the same class get a score of 1. + + Each image has ``L`` annotated bounding boxes and ``M`` predicted bounding boxes. + A score is calculated for each annotated box in each of the ``N`` images in the dataset. + + Note: ``M`` and ``L`` can differ between images, as the number of annotated and predicted boxes varies. + + Parameters + ---------- + labels: + A list of ``N`` dictionaries such that ``labels[i]`` contains the given labels for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + predictions: + A list of ``N`` ``np.ndarray`` such that ``predictions[i]`` corresponds to the model predictions for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + alpha: + Optional weighting between IoU and Euclidean distance when calculating similarity between predicted and annotated boxes. A higher alpha weights IoU more heavily relative to the Euclidean distance between box centers. If no alpha is provided, a good default is used.
+ + low_probability_threshold: + Optional minimum probability threshold that determines which predicted boxes are considered when computing badly located scores. If not provided, a good default is used. + + auxiliary_inputs: + Optional list of ``N`` dictionaries containing keys for sub-parts of label and prediction per image. Useful to avoid redundant computation when computing several types of box scores for the same set of images. For the `i`-th image, `auxiliary_inputs[i]` should contain the following keys: + + * pred_labels: np.ndarray + Array of predicted classes for the `i`-th image of shape ``(M,)``. + * pred_label_probs: np.ndarray + Array of predicted class probabilities for the `i`-th image of shape ``(M,)``. + * pred_bboxes: np.ndarray + Array of predicted bounding boxes for the `i`-th image of shape ``(M, 4)``. + * lab_labels: np.ndarray + Array of given label classes for the `i`-th image of shape ``(L,)``. + * lab_bboxes: np.ndarray + Array of given label bounding boxes for the `i`-th image of shape ``(L, 4)``. + * similarity_matrix: np.ndarray + Similarity matrix between labels and predictions for the `i`-th image. + * min_possible_similarity: float + Minimum possible similarity value greater than 0 between labels and predictions for the entire dataset. + + Returns + --------- + scores_badloc: + A list of ``N`` numpy arrays where ``scores_badloc[i]`` is an array of ``L`` badly located scores, one per annotated box in the `i`-th image.
+ """ + ( + alpha, + low_probability_threshold, + high_probability_threshold, + temperature, + ) = _get_valid_subtype_score_params(alpha, low_probability_threshold, None, None) + if auxiliary_inputs is None: + auxiliary_inputs = _get_valid_inputs_for_compute_scores(alpha, labels, predictions) + + scores_badloc = [] + for auxiliary_input_dict in auxiliary_inputs: + scores_badloc_per_box = _compute_badloc_box_scores_for_image( + alpha=alpha, low_probability_threshold=low_probability_threshold, **auxiliary_input_dict + ) + scores_badloc.append(scores_badloc_per_box) + return scores_badloc
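As a rough illustration of the badly located score described above, the sketch below scores each annotated box by its best overlap with a same-class predicted box whose `pred_prob` exceeds `low_probability_threshold`, and returns 1.0 when no such prediction exists (mirroring the `np.max(k_similarity) if len(k_pred) > 0 else 1.0` step in the per-image helper). This is a simplified sketch, not cleanlab's implementation: plain IoU stands in for the alpha-weighted IoU/Euclidean similarity, and the helpers `box_iou` and `badloc_scores_sketch`, as well as the default threshold of 0.1, are hypothetical.

```python
import numpy as np


def box_iou(a, b):
    """Pairwise IoU between boxes a (L, 4) and b (M, 4), both in [x1, y1, x2, y2] format."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)  # guard against zero-area boxes


def badloc_scores_sketch(lab_bboxes, lab_labels, pred_bboxes, pred_labels, pred_probs,
                         low_probability_threshold=0.1):
    """Hypothetical helper: one score per annotated box; lower means more likely badly located."""
    similarity = box_iou(lab_bboxes, pred_bboxes)  # plain IoU, not cleanlab's alpha-weighted similarity
    scores = np.ones(len(lab_labels))  # default 1.0 when no usable prediction exists
    for i, k in enumerate(lab_labels):
        # Only same-class predictions with pred_prob above the threshold are considered.
        keep = (pred_labels == k) & (pred_probs > low_probability_threshold)
        if keep.any():
            scores[i] = similarity[i, keep].max()
    return scores


lab_bboxes = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
pred_bboxes = np.array([[0, 0, 10, 10], [25, 20, 35, 30]], dtype=float)
scores = badloc_scores_sketch(
    lab_bboxes, np.array([0, 1]), pred_bboxes, np.array([0, 1]), np.array([0.9, 0.9])
)
# First box matches its prediction exactly (1.0); the second is shifted 5 px, giving IoU 1/3.
```

Using the maximum similarity means a single well-aligned, confident prediction is enough to vouch for an annotated box.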
+ + +def _compute_swap_box_scores_for_image( + alpha: float, + high_probability_threshold: float, + label: Optional[Dict[str, Any]] = None, + prediction: Optional[np.ndarray] = None, + pred_labels: Optional[np.ndarray] = None, + pred_label_probs: Optional[np.ndarray] = None, + pred_bboxes: Optional[np.ndarray] = None, + lab_labels: Optional[np.ndarray] = None, + lab_bboxes: Optional[np.ndarray] = None, + similarity_matrix: Optional[np.ndarray] = None, + iou_matrix: Optional[np.ndarray] = None, + min_possible_similarity: Optional[float] = None, + overlapping_label_check: Optional[bool] = True, +) -> np.ndarray: + """Returns one score per labeled box in an image, between 0 and 1, where lower values indicate the box's class label is more likely to have been swapped.""" + + auxiliary_input_dict = _get_valid_inputs_for_compute_scores_per_image( + alpha=alpha, + label=label, + prediction=prediction, + pred_labels=pred_labels, + pred_label_probs=pred_label_probs, + pred_bboxes=pred_bboxes, + lab_labels=lab_labels, + lab_bboxes=lab_bboxes, + similarity_matrix=similarity_matrix, + min_possible_similarity=min_possible_similarity, + ) + + pred_labels = auxiliary_input_dict["pred_labels"] + pred_label_probs = auxiliary_input_dict["pred_label_probs"] + lab_labels = auxiliary_input_dict["lab_labels"] + lab_bboxes = auxiliary_input_dict["lab_bboxes"] + similarity_matrix = auxiliary_input_dict["similarity_matrix"] + min_possible_similarity = auxiliary_input_dict["min_possible_similarity"] + + if overlapping_label_check: + has_overlap_label_bboxes = _has_overlap(lab_bboxes, lab_labels) + else: + has_overlap_label_bboxes = np.array([False] * len(lab_labels)) + + scores_swap = np.empty(len(lab_labels)) + + for iid, k in enumerate(lab_labels): + not_k_idx = np.where(pred_labels != k)[0] + if has_overlap_label_bboxes[iid]: + scores_swap[iid] = min_possible_similarity + continue + if not_k_idx.size == 0 or np.all(pred_label_probs[not_k_idx] <= high_probability_threshold): + 
scores_swap[iid] = 1.0 + continue + + not_k_pred = pred_label_probs[not_k_idx] +
idx_at_least_high_probability_threshold = np.where(not_k_pred > high_probability_threshold)[ + 0 + ] + not_k_similarity = similarity_matrix[iid, not_k_idx][ + idx_at_least_high_probability_threshold + ] + + closest_predicted_box = np.argmax(not_k_similarity) + score = np.max([min_possible_similarity, 1 - not_k_similarity[closest_predicted_box]]) + scores_swap[iid] = score + + return scores_swap + + +
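The swap score computed by `_compute_swap_box_scores_for_image` can be restated in a few lines. This sketch takes a precomputed similarity matrix and, for each annotated box of class `k`, looks only at predictions of a different class whose probability exceeds `high_probability_threshold`; the score is `1 - max similarity`, floored at `min_possible_similarity`, and 1.0 when no such prediction exists. The function name and default values are hypothetical, and the overlapping-label penalty is left out for brevity.

```python
import numpy as np


def swap_scores_sketch(similarity_matrix, lab_labels, pred_labels, pred_label_probs,
                       high_probability_threshold=0.95, min_possible_similarity=1e-6):
    """Hypothetical helper: one score per annotated box; lower means more likely swapped."""
    scores = np.ones(len(lab_labels))  # 1.0 when no confident other-class prediction exists
    for i, k in enumerate(lab_labels):
        # Confident predictions whose class differs from the annotated class k.
        other = (pred_labels != k) & (pred_label_probs > high_probability_threshold)
        if not other.any():
            continue
        closest = similarity_matrix[i, other].max()
        # The closer the box is to a confident other-class prediction, the lower its score.
        scores[i] = max(min_possible_similarity, 1.0 - closest)
    return scores


similarity = np.array([[0.9, 0.1], [0.2, 0.8]])  # rows: annotated boxes, columns: predicted boxes
scores = swap_scores_sketch(similarity, np.array([0, 1]), np.array([0, 0]), np.array([0.99, 0.99]))
# Box 0 (class 0) has no other-class prediction, so it keeps 1.0;
# box 1 (class 1) sits close to a confident class-0 prediction, so it scores 1 - 0.8.
```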
[docs]def compute_swap_box_scores( + *, + labels: Optional[List[Dict[str, Any]]] = None, + predictions: Optional[List[np.ndarray]] = None, + alpha: Optional[float] = None, + high_probability_threshold: Optional[float] = None, + overlapping_label_check: Optional[bool] = True, + auxiliary_inputs: Optional[List[AuxiliaryTypesDict]] = None, +) -> List[np.ndarray]: + """ + Returns a numeric score for each annotated bounding box in each image, estimating the likelihood that the class label for this box was not accidentally swapped with another class. + This is a helper method mostly for advanced users. + + A swapped box error occurs when a bounding box should be labeled with a different class than its current label. + Score per annotated bounding box is between 0 and 1, with lower values indicating boxes we are more confident have a swapped class label. + + Each image has ``L`` annotated bounding boxes and ``M`` predicted bounding boxes. + A score is calculated for each annotated box in each of the ``N`` images in the dataset. + + Note: ``M`` and ``L`` can be different values for each image, as the number of annotated and predicted boxes varies. + + Parameters + ---------- + labels: + A list of ``N`` dictionaries such that ``labels[i]`` contains the given labels for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + predictions: + A list of ``N`` ``np.ndarray`` such that ``predictions[i]`` corresponds to the model predictions for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + alpha: + Optional weighting between IoU and Euclidean distance when calculating similarity between predicted and annotated boxes. High alpha means weighting IoU more heavily over Euclidean distance.
If no alpha is provided, a good default is used. + + high_probability_threshold: + Optional probability threshold that determines which predicted boxes are considered high-confidence when computing swap scores. If not provided, a good default is used. + + overlapping_label_check : bool, default = True + If True, boxes annotated with more than one class label have their swap score penalized. Set this to False if you are not concerned when two very similar boxes exist with different class labels in the given annotations. + + auxiliary_inputs: + Optional list of ``N`` dictionaries containing keys for sub-parts of label and prediction per image. Useful to avoid redundant computation when computing several types of box scores for the same set of images. For the `i`-th image, `auxiliary_inputs[i]` should contain the following keys: + + * pred_labels: np.ndarray + Array of predicted classes for the `i`-th image of shape ``(M,)``. + * pred_label_probs: np.ndarray + Array of predicted class probabilities for the `i`-th image of shape ``(M,)``. + * pred_bboxes: np.ndarray + Array of predicted bounding boxes for the `i`-th image of shape ``(M, 4)``. + * lab_labels: np.ndarray + Array of given label classes for the `i`-th image of shape ``(L,)``. + * lab_bboxes: np.ndarray + Array of given label bounding boxes for the `i`-th image of shape ``(L, 4)``. + * similarity_matrix: np.ndarray + Similarity matrix between labels and predictions for the `i`-th image. + * min_possible_similarity: float + Minimum possible similarity value greater than 0 between labels and predictions for the entire dataset. + + Returns + --------- + scores_swap: + A list of ``N`` numpy arrays where ``scores_swap[i]`` is an array of ``L`` swap scores, one per annotated box in the `i`-th image.
+ """ + ( + alpha, + low_probability_threshold, + high_probability_threshold, + temperature, + ) = _get_valid_subtype_score_params(alpha, None, high_probability_threshold, None) + + if auxiliary_inputs is None: + auxiliary_inputs = _get_valid_inputs_for_compute_scores(alpha, labels, predictions) + + scores_swap = [] + for auxiliary_input_dict in auxiliary_inputs: + scores_swap_per_box = _compute_swap_box_scores_for_image( + alpha=alpha, + high_probability_threshold=high_probability_threshold, + overlapping_label_check=overlapping_label_check, + **auxiliary_input_dict, + ) + scores_swap.append(scores_swap_per_box) + return scores_swap
+ + +
[docs]def pool_box_scores_per_image( + box_scores: List[np.ndarray], *, temperature: Optional[float] = None +) -> np.ndarray: + """ + Aggregates all per-box scores within an image to return a single quality score for the image rather than for individual boxes within it. + This is a helper method mostly for advanced users to be used with the outputs of :py:func:`object_detection.rank.compute_overlooked_box_scores <cleanlab.object_detection.rank.compute_overlooked_box_scores>`, :py:func:`object_detection.rank.compute_badloc_box_scores <cleanlab.object_detection.rank.compute_badloc_box_scores>`, and :py:func:`object_detection.rank.compute_swap_box_scores <cleanlab.object_detection.rank.compute_swap_box_scores>`. + + Score per image is between 0 and 1, with lower values indicating we are more confident the image contains an error. + + Parameters + ---------- + box_scores: + A list of ``N`` numpy arrays where ``box_scores[i]`` is an array of per-box scores (e.g. overlooked, badly located, or swap scores) for the `i`-th image. + + temperature: + Optional temperature of the softmin function where a lower value makes softmin behave more like min. If not provided, a good default is used. + + Returns + --------- + image_scores: + An array of size ``N`` where ``image_scores[i]`` represents the score for the `i`-th image. + """ + + ( + alpha, + low_probability_threshold, + high_probability_threshold, + temperature, + ) = _get_valid_subtype_score_params(None, None, None, temperature) + + image_scores = np.empty( + shape=[ + len(box_scores), + ] + ) + for idx, box_score in enumerate(box_scores): + image_score = _get_valid_score(box_score, temperature=temperature) + image_scores[idx] = image_score + return image_scores
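`pool_box_scores_per_image` delegates the actual pooling to `_get_valid_score`, whose body is not shown here. A common way to implement a temperature-controlled softmin, sketched below under that assumption, weights each box score by a softmax over the negated scores, so the lowest box scores dominate as the temperature shrinks and the result approaches the plain mean as it grows. The function name, the default temperature, and the choice to return 1.0 for an image with no boxes are all hypothetical.

```python
import numpy as np


def softmin_pool(box_scores, temperature=0.1):
    """Hypothetical helper: pool one image's per-box scores into a single score via softmin."""
    box_scores = np.asarray(box_scores, dtype=float)
    if box_scores.size == 0:
        return 1.0  # assumption: an image without boxes is treated as having no issue
    # Softmax over the negated scores: the smallest scores receive the largest weights.
    logits = -box_scores / temperature
    weights = np.exp(logits - logits.max())  # shift by the max for numerical stability
    weights /= weights.sum()
    return float(np.dot(weights, box_scores))


print(softmin_pool([0.1, 0.9], temperature=0.01))  # close to min(0.1, 0.9)
print(softmin_pool([0.1, 0.9], temperature=1000))  # close to the mean, 0.5
```

A softmin rather than a hard min keeps the image score sensitive to every box while still letting one very low box score flag the image.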
+ + +def _get_subtype_label_quality_scores( + labels: List[Dict[str, Any]], + predictions: List[np.ndarray], + *, + alpha: Optional[float] = None, + low_probability_threshold: Optional[float] = None, + high_probability_threshold: Optional[float] = None, + temperature: Optional[float] = None, + aggregation_weights: Optional[Dict[str, float]] = None, + overlapping_label_check: Optional[bool] = True, +) -> np.ndarray: + """ + Returns a label quality score for each of the ``N`` images in the dataset. + Score is between 0 and 1. + + 1 - clean label (given label is likely correct). + 0 - dirty label (given label is likely incorrect). + + Parameters + ---------- + labels: + A list of ``N`` dictionaries such that ``labels[i]`` contains the given labels for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + predictions: + A list of ``N`` ``np.ndarray`` such that ``predictions[i]`` corresponds to the model predictions for the `i`-th image. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + alpha: + Optional weighting between IoU and Euclidean distance when calculating similarity between predicted and annotated boxes. High alpha means weighting IoU more heavily over Euclidean distance. If no alpha is provided, a good default is used. + + low_probability_threshold: + Optional minimum probability threshold that determines which predicted boxes are considered when computing badly located scores. If not provided, a good default is used. + + high_probability_threshold: + Optional probability threshold that determines which predicted boxes are considered high-confidence when computing overlooked and swapped scores. If not provided, a good default is used. 
+ + temperature: + Optional temperature of the softmin function where a lower value makes softmin behave more like min. If not provided, a good default is used. + + aggregation_weights: + Optional dictionary of weights with keys ``"overlooked"``, ``"badloc"``, and ``"swap"`` used to combine the per-image subtype scores. If not provided, default weights are used. + + overlapping_label_check : bool, default = True + If True, boxes annotated with more than one class label have their swap score penalized. Set this to False if you are not concerned when two very similar boxes exist with different class labels in the given annotations. + + Returns + --------- + label_quality_scores: + As returned by :py:func:`get_label_quality_scores <cleanlab.object_detection.rank.get_label_quality_scores>`. See that function for more details. + """ + ( + alpha, + low_probability_threshold, + high_probability_threshold, + temperature, + ) = _get_valid_subtype_score_params( + alpha, low_probability_threshold, high_probability_threshold, temperature + ) + auxiliary_inputs = _get_valid_inputs_for_compute_scores(alpha, labels, predictions) + aggregation_weights = _get_aggregation_weights(aggregation_weights) + + overlooked_scores_per_box = compute_overlooked_box_scores( + alpha=alpha, + high_probability_threshold=high_probability_threshold, + auxiliary_inputs=auxiliary_inputs, + ) + overlooked_score_per_image = pool_box_scores_per_image( + overlooked_scores_per_box, temperature=temperature + ) + + badloc_scores_per_box = compute_badloc_box_scores( + alpha=alpha, + low_probability_threshold=low_probability_threshold, + auxiliary_inputs=auxiliary_inputs, + ) + badloc_score_per_image = pool_box_scores_per_image( + badloc_scores_per_box, temperature=temperature + ) + + swap_scores_per_box = compute_swap_box_scores( + alpha=alpha, + high_probability_threshold=high_probability_threshold, + auxiliary_inputs=auxiliary_inputs, + overlapping_label_check=overlapping_label_check, + ) + swap_score_per_image = 
pool_box_scores_per_image(swap_scores_per_box, temperature=temperature) + + scores = ( + aggregation_weights["overlooked"] * np.log(TINY_VALUE + overlooked_score_per_image) + + aggregation_weights["badloc"] *
np.log(TINY_VALUE + badloc_score_per_image) + + aggregation_weights["swap"] * np.log(TINY_VALUE + swap_score_per_image) + ) + + scores = np.exp(scores) + + return scores +
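The final step of `_get_subtype_label_quality_scores` combines the three per-image subtype scores as a weighted sum of logs followed by `np.exp`, i.e. a weighted geometric mean when the weights sum to 1. The sketch below reproduces that arithmetic in isolation; the equal weights and the `tiny` constant are illustrative stand-ins for the values returned by `_get_aggregation_weights` and for `TINY_VALUE`, which are not shown here.

```python
import numpy as np


def aggregate_subtype_scores(overlooked, badloc, swap, weights=None, tiny=1e-100):
    """Weighted geometric combination of per-image subtype scores (mirrors the log/exp step)."""
    if weights is None:
        weights = {"overlooked": 1 / 3, "badloc": 1 / 3, "swap": 1 / 3}  # hypothetical equal weights
    log_score = (
        weights["overlooked"] * np.log(tiny + np.asarray(overlooked, dtype=float))
        + weights["badloc"] * np.log(tiny + np.asarray(badloc, dtype=float))
        + weights["swap"] * np.log(tiny + np.asarray(swap, dtype=float))
    )
    return np.exp(log_score)


result = aggregate_subtype_scores([0.5, 1.0], [0.5, 1.0], [0.5, 0.0])
# Image 0: all three subtype scores are 0.5, so the combined score is 0.5.
# Image 1: a zero swap score drives the combined score close to 0 despite two perfect scores.
```

Because the combination is multiplicative, a single subtype score near 0 pulls the image score toward 0 even when the other two are perfect; the small additive constant only keeps `np.log` finite.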
+ + Made with Sphinx and @pradyunsg's + + Furo + +
\ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/object_detection/summary.html b/v2.6.6/_modules/cleanlab/object_detection/summary.html new file mode 100644 index 000000000..826ddf17c --- /dev/null +++ b/v2.6.6/_modules/cleanlab/object_detection/summary.html @@ -0,0 +1,1438 @@ + cleanlab.object_detection.summary - cleanlab
Source code for cleanlab.object_detection.summary

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to display examples and their label issues in an object detection dataset.
+Here each image can have multiple objects, each with its own bounding box and class label.
+"""
+from multiprocessing import Pool
+from typing import Optional, Any, Dict, Tuple, Union, List, TYPE_CHECKING, TypeVar, DefaultDict
+
+import numpy as np
+import collections
+
+from cleanlab.internal.constants import (
+    MAX_CLASS_TO_SHOW,
+    ALPHA,
+    EPSILON,
+    TINY_VALUE,
+)
+from cleanlab.object_detection.filter import (
+    _filter_by_class,
+    _calculate_true_positives_false_positives,
+)
+from cleanlab.object_detection.rank import (
+    _get_valid_inputs_for_compute_scores,
+    _separate_prediction,
+    _separate_label,
+    _get_prediction_type,
+)
+
+from cleanlab.internal.object_detection_utils import bbox_xyxy_to_xywh
+
+if TYPE_CHECKING:
+    from PIL.Image import Image as Image  # pragma: no cover
+else:
+    Image = TypeVar("Image")
+
+
+
[docs]def object_counts_per_image( + labels=None, + predictions=None, + *, + auxiliary_inputs=None, +) -> Tuple[List, List]: + """Return the number of annotated and predicted objects for each image in the dataset. + + This method can help you discover images with abnormally many/few object annotations. + + Parameters + ---------- + labels : + Annotated boxes and class labels in the original dataset, which may contain some errors. + This is a list of ``N`` dictionaries such that ``labels[i]`` contains the given labels for the `i`-th image in the following format: + ``{'bboxes': np.ndarray((L,4)), 'labels': np.ndarray((L,)), 'image_name': str}`` where ``L`` is the number of annotated bounding boxes + for the `i`-th image and ``bboxes[j]`` is a bounding box of coordinates in ``[x1,y1,x2,y2]`` format with given class label ``labels[j]``. + ``image_name`` is an optional part of the labels that can be used to later refer to specific images. + + Note: Here, ``(x1,y1)`` corresponds to the top-left and ``(x2,y2)`` corresponds to the bottom-right corner of the bounding box with respect to the image matrix [e.g. `XYXY in Keras <https://keras.io/api/keras_cv/bounding_box/formats/>`_, `Detectron 2 <https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box>`_]. + + For more information on proper labels formatting, check out the `MMDetection library <https://mmdetection.readthedocs.io/en/dev-3.x/advanced_guides/customize_dataset.html>`_. + + predictions : + Predictions output by a trained object detection model. + For the most accurate results, predictions should be out-of-sample to avoid overfitting, e.g. obtained via :ref:`cross-validation <pred_probs_cross_val>`. + This is a list of ``N`` ``np.ndarray`` such that ``predictions[i]`` corresponds to the model prediction for the `i`-th image.
+ For each possible class ``k`` in 0, 1, ..., K-1: ``predictions[i][k]`` is a ``np.ndarray`` of shape ``(M,5)``, + where ``M`` is the number of predicted bounding boxes for class ``k``. Here the five columns correspond to ``[x1,y1,x2,y2,pred_prob]``, + where ``[x1,y1,x2,y2]`` are coordinates of the bounding box predicted by the model + and ``pred_prob`` is the model's confidence in the predicted class label for this bounding box. + + Note: Here, ``(x1,y1)`` corresponds to the top-left and ``(x2,y2)`` corresponds to the bottom-right corner of the bounding box with respect to the image matrix [e.g. `XYXY in Keras <https://keras.io/api/keras_cv/bounding_box/formats/>`_, `Detectron 2 <https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box>`_]. The last column, ``pred_prob``, represents the predicted probability that the bounding box contains an object of class ``k``. + + For an example object detection library that outputs predictions in this format, see the `MMDetection package <https://github.com/open-mmlab/mmdetection>`_. + + auxiliary_inputs: optional + Auxiliary inputs to be used in the computation of counts. + The `auxiliary_inputs` can be computed using :py:func:`rank._get_valid_inputs_for_compute_scores <cleanlab.object_detection.rank._get_valid_inputs_for_compute_scores>`. + If not provided, they are computed internally from the given `labels` and `predictions`. + + Returns + ------- + object_counts: Tuple[List, List] + A tuple containing two lists of length ``N``. The first contains the number of annotated objects for each image in the dataset. + The second contains the number of predicted objects for each image in the dataset.
+ """ + if auxiliary_inputs is None: + auxiliary_inputs = _get_valid_inputs_for_compute_scores(ALPHA, labels, predictions) + return ( + [len(sample["lab_bboxes"]) for sample in auxiliary_inputs], + [len(sample["pred_bboxes"]) for sample in auxiliary_inputs], + )
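To make the `labels`/`predictions` format described above concrete, here is a minimal sketch that counts boxes directly from the raw inputs instead of going through `auxiliary_inputs`. It assumes every predicted box is counted regardless of its `pred_prob`; `count_objects` is a hypothetical helper, not part of cleanlab.

```python
import numpy as np


def count_objects(labels, predictions):
    """Hypothetical helper: per-image counts of annotated and predicted boxes."""
    # labels[i]["bboxes"] has shape (L, 4), so its length is the number of annotated boxes.
    lab_counts = [len(label["bboxes"]) for label in labels]
    # predictions[i] holds one (M_k, 5) array per class k; sum the rows over all classes.
    pred_counts = [sum(len(per_class) for per_class in pred) for pred in predictions]
    return lab_counts, pred_counts


labels = [
    {"bboxes": np.array([[0, 0, 10, 10], [5, 5, 20, 20]]), "labels": np.array([0, 1])},
    {"bboxes": np.zeros((0, 4)), "labels": np.array([], dtype=int)},
]
predictions = [
    [np.array([[0, 0, 10, 10, 0.9]]), np.array([[5, 5, 20, 20, 0.8], [1, 1, 3, 3, 0.2]])],
    [np.zeros((0, 5)), np.array([[2, 2, 8, 8, 0.7]])],
]
print(count_objects(labels, predictions))  # ([2, 0], [3, 1])
```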
+ + +
[docs]def bounding_box_size_distribution( + labels=None, + predictions=None, + *, + auxiliary_inputs=None, + class_names: Optional[Dict[Any, Any]] = None, + sort: bool = False, +) -> Tuple[Dict[Any, List], Dict[Any, List]]: + """Return the distribution over sizes of annotated and predicted bounding boxes across the dataset, broken down by each class. + + This method can help you find annotated/predicted boxes for a particular class that are abnormally big/small. + + Parameters + ---------- + labels: + Annotated boxes and class labels in the original dataset, which may contain some errors. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + predictions: + Predictions output by a trained object detection model. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + auxiliary_inputs: optional + Auxiliary inputs to be used in the computation of box sizes. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + class_names: optional + A dictionary mapping integer class labels back to their original class names in the format ``{"integer-label": "original-class-name"}``. + You can use this argument to control the classes for which the size distribution is computed. + + sort: bool + If True, the returned dictionaries are sorted by the number of instances of each class in the dataset in descending order. + + Returns + ------- + bbox_sizes: Tuple[Dict[Any, List], Dict[Any, List]] + A tuple containing two dictionaries. The first maps each class label to a list of the areas of the annotated bounding boxes for that class; the second maps each class label to a list of the areas of the predicted bounding boxes for that class.
+ """ + if auxiliary_inputs is None: + auxiliary_inputs = _get_valid_inputs_for_compute_scores(ALPHA, labels, predictions) + + lab_area: Dict[Any, list] = collections.defaultdict(list) + pred_area: Dict[Any, list] = collections.defaultdict(list) + for sample in auxiliary_inputs: + _get_bbox_areas(sample["lab_labels"], sample["lab_bboxes"], lab_area, class_names) + _get_bbox_areas(sample["pred_labels"], sample["pred_bboxes"], pred_area, class_names) + + if sort: + lab_area = dict(sorted(lab_area.items(), key=lambda x: -len(x[1]))) + pred_area = dict(sorted(pred_area.items(), key=lambda x: -len(x[1]))) + + return lab_area, pred_area
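`_get_bbox_areas` is not shown in this listing. A plausible sketch of what it computes is given below, assuming box area is the width times the height of the `[x1, y1, x2, y2]` box and that `class_names` both renames classes and restricts which ones are kept; `bbox_areas_by_class` is a hypothetical name.

```python
import collections

import numpy as np


def bbox_areas_by_class(class_ids, bboxes, class_names=None):
    """Hypothetical helper: group [x1, y1, x2, y2] box areas (in pixels) by class."""
    areas = collections.defaultdict(list)
    for k, (x1, y1, x2, y2) in zip(class_ids, bboxes):
        k = int(k)
        if class_names is not None:
            if k not in class_names:
                continue  # assumption: classes missing from class_names are skipped
            k = class_names[k]
        areas[k].append(float((x2 - x1) * (y2 - y1)))
    return dict(areas)


class_ids = np.array([0, 1, 0])
bboxes = np.array([[0, 0, 10, 10], [0, 0, 2, 3], [5, 5, 7, 9]], dtype=float)
print(bbox_areas_by_class(class_ids, bboxes, class_names={0: "car", 1: "person"}))
# {'car': [100.0, 8.0], 'person': [6.0]}
```

Calling it once per image on the annotated boxes (and once on the predicted boxes) and merging the lists would give per-class area distributions like the ones plotted by `plot_class_size_distributions`.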
+ + +
[docs]def class_label_distribution( + labels=None, + predictions=None, + *, + auxiliary_inputs=None, + class_names: Optional[Dict[Any, Any]] = None, +) -> Tuple[Dict[Any, float], Dict[Any, float]]: + """Returns the distribution of class labels associated with all annotated bounding boxes and all predicted bounding boxes in the dataset. + + This method can help you understand which classes are rare or over/under-predicted by the model overall. + + Parameters + ---------- + labels: + Annotated boxes and class labels in the original dataset, which may contain some errors. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + predictions: + Predictions output by a trained object detection model. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + auxiliary_inputs: optional + Auxiliary inputs to be used in the computation of the class distributions. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + class_names: optional + Optional dictionary mapping integer class labels back to their original class names in the format ``{"integer-label": "original-class-name"}``. + + Returns + ------- + class_distribution: Tuple[Dict[Any, float], Dict[Any, float]] + A tuple containing two dictionaries. The first maps each class label to its normalized frequency among the annotated bounding boxes in the dataset. + The second maps each class label to its normalized frequency among the model's predicted bounding boxes across all images in the dataset.
+ """ + if auxiliary_inputs is None: + auxiliary_inputs = _get_valid_inputs_for_compute_scores(ALPHA, labels, predictions) + + lab_freq: DefaultDict[Any, int] = collections.defaultdict(int) + pred_freq: DefaultDict[Any, int] = collections.defaultdict(int) + for sample in auxiliary_inputs: + _get_class_instances(sample["lab_labels"], lab_freq, class_names) + _get_class_instances(sample["pred_labels"], pred_freq, class_names) + + label_norm = _normalize_by_total(lab_freq) + pred_norm = _normalize_by_total(pred_freq) + + return label_norm, pred_norm
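Similarly, `_normalize_by_total` is not shown. Assuming it divides each class count by the total number of boxes, the per-class frequencies returned by `class_label_distribution` could be reproduced from a flat array of class ids as follows; `class_frequencies` is a hypothetical helper.

```python
from collections import Counter


def class_frequencies(class_ids, class_names=None):
    """Hypothetical helper: fraction of boxes belonging to each class (values sum to 1)."""
    counts = Counter(int(k) for k in class_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    names = class_names or {}
    # Fall back to the integer id when a class has no entry in class_names.
    return {names.get(k, k): n / total for k, n in counts.items()}


print(class_frequencies([0, 0, 1, 2], class_names={0: "car"}))
# {'car': 0.5, 1: 0.25, 2: 0.25}
```

To mirror the dataset-level result, concatenate the `labels[i]["labels"]` arrays across all images before calling it.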
+ + +
[docs]def get_sorted_bbox_count_idxs(labels, predictions): + """ + Returns image indices paired with their bounding box counts, sorted from the highest to the lowest number of bounding boxes. + + This method can help you discover images with abnormally many/few object annotations. + + Parameters + ---------- + labels: + Annotated boxes and class labels in the original dataset, which may contain some errors. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + predictions: + Predictions output by a trained object detection model. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + Returns + ------- + sorted_idxs: Tuple[List[Tuple[int, int]], List[Tuple[int, int]]] + A tuple containing two lists of ``(image_index, count)`` pairs, each sorted by count in descending order. The first list is based on the number of annotated objects in each image. + The second list is based on the number of predicted objects in each image. + """ + lab_count, pred_count = object_counts_per_image(labels, predictions) + lab_grouped = list(enumerate(lab_count)) + pred_grouped = list(enumerate(pred_count)) + + sorted_lab = sorted(lab_grouped, key=lambda x: x[1], reverse=True) + sorted_pred = sorted(pred_grouped, key=lambda x: x[1], reverse=True) + + return sorted_lab, sorted_pred
+ + +
[docs]def plot_class_size_distributions( + labels, predictions, class_names=None, class_to_show=MAX_CLASS_TO_SHOW, **kwargs +): + """ + Plots the size distributions of bounding boxes for each class. + + This plot can help you find annotated/predicted boxes for a particular class that are abnormally big/small. + + Parameters + ---------- + labels: + Annotated boxes and class labels in the original dataset, which may contain some errors. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + predictions: + Predictions output by a trained object detection model. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + class_names: optional + Optional dictionary mapping integer class labels back to their original class names in the format ``{"integer-label": "original-class-name"}``. + You can use this argument to control the classes for which the size distribution is plotted. + + class_to_show: optional + The maximum number of classes to plot. If provided, classes are sorted by their number of annotated instances in the dataset and only the first `class_to_show` classes are plotted. + Defaults to `MAX_CLASS_TO_SHOW` which is set to 10. + + kwargs: + Additional keyword arguments to pass to ``plt.show()`` (matplotlib.pyplot.show). + """ + try: + import matplotlib.pyplot as plt + except ImportError as e: + raise ImportError( + "This functionality requires matplotlib.
 Install it via: `pip install matplotlib`" + ) + + lab_boxes, pred_boxes = bounding_box_size_distribution( + labels, + predictions, + class_names=class_names, + sort=True if class_to_show is not None else False, + ) + + for i, c in enumerate(lab_boxes.keys()): + if class_to_show is not None and i >= class_to_show: + break + fig, axs = plt.subplots(1, 2, figsize=(10, 5)) + fig.suptitle(f"Size distributions of bounding boxes for class {c}") + for j, boxes in enumerate([lab_boxes, pred_boxes]): + # A class may be annotated but never predicted, so fall back to an empty list. + axs[j].hist(boxes.get(c, []), bins="auto") + axs[j].set_xlabel("box area (pixels)") + axs[j].set_ylabel("count") + axs[j].set_title("annotated" if j == 0 else "predicted") + + plt.show(**kwargs)
+ + +
[docs]def plot_class_distribution(labels, predictions, class_names=None, **kwargs): + """ + Plots the distribution of class labels associated with all annotated bounding boxes and predicted bounding boxes in the dataset. + + This plot can help you understand which classes are rare or over/under-predicted by the model overall. + + Parameters + ---------- + labels: + Annotated boxes and class labels in the original dataset, which may contain some errors. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + predictions: + Predictions output by a trained object detection model. + Refer to documentation for this argument in :py:func:`object_counts_per_image <cleanlab.object_detection.summary.object_counts_per_image>` for further details. + + class_names: optional + Optional dictionary mapping integer class labels back to their original class names in the format ``{"integer-label": "original-class-name"}``. + + kwargs: + Additional keyword arguments to pass to ``plt.show()`` (matplotlib.pyplot.show). + """ + try: + import matplotlib.pyplot as plt + except ImportError as e: + raise ImportError( + "This functionality requires matplotlib. Install it via: `pip install matplotlib`" + ) + + lab_dist, pred_dist = class_label_distribution(labels, predictions, class_names=class_names) + fig, axs = plt.subplots(1, 2, figsize=(10, 5)) + fig.suptitle("Distribution of classes in the dataset") + for i, d in enumerate([lab_dist, pred_dist]): + axs[i].pie(d.values(), labels=d.keys(), autopct="%1.1f%%") + axs[i].set_title("Annotated" if i == 0 else "Predicted") + + plt.show(**kwargs)
+ + +
[docs]def visualize( + image: Union[str, np.ndarray, Image], + *, + label: Optional[Dict[str, Any]] = None, + prediction: Optional[np.ndarray] = None, + prediction_threshold: Optional[float] = None, + overlay: bool = True, + class_names: Optional[Dict[Any, Any]] = None, + figsize: Optional[Tuple[int, int]] = None, + save_path: Optional[str] = None, + **kwargs, +) -> None: + """Display the annotated bounding boxes (given labels) and predicted bounding boxes (model predictions) for a particular image. + Given labels are shown in red, model predictions in blue. + + + Parameters + ---------- + image: + Image object loaded into memory or full path to the image file. If a path is provided, the image is loaded into memory. + + label: + The given label for a single image in the format ``{'bboxes': np.ndarray((L,4)), 'labels': np.ndarray((L,))}`` where + ``L`` is the number of bounding boxes for this image and ``bboxes[j]`` is in the format ``[x1,y1,x2,y2]`` with given label ``labels[j]``. + + Note: Here, ``(x1,y1)`` corresponds to the top-left and ``(x2,y2)`` corresponds to the bottom-right corner of the bounding box with respect to the image matrix [e.g. `XYXY in Keras <https://keras.io/api/keras_cv/bounding_box/formats/>`_, `Detectron 2 <https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box>`_]. + + prediction: + A prediction for a single image in the format ``np.ndarray((K,))`` where ``prediction[k]`` is of shape ``np.ndarray((M,5))``, + ``M`` is the number of predicted bounding boxes for class ``k``, and the five columns correspond to ``[x1,y1,x2,y2,pred_prob]``. Here + ``[x1,y1,x2,y2]`` are the bounding box coordinates predicted by the model and ``pred_prob`` is the model's confidence in the predicted class label for this bounding box. + + Note: Here, ``(x1,y1)`` corresponds to the top-left and ``(x2,y2)`` corresponds to the bottom-right corner of the bounding box with respect to the image matrix [e.g.
`XYXY in Keras <https://keras.io/api/keras_cv/bounding_box/formats/>`, `Detectron 2 <https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box>`]. The last column, pred_prob, represents the predicted probability that the bounding box contains an object of the class k. + + prediction_threshold: + All model-predicted bounding boxes with confidence (`pred_prob`) + below this threshold are omitted from the visualization. + + overlay: bool + If True, display a single image with given labels and predictions overlaid. + If False, display two images (side by side) with the left image showing the model predictions and the right image showing the given label. + + class_names: + Optional dictionary mapping one-hot-encoded class labels back to their original class names in the format ``{"integer-label": "original-class-name"}``. + + save_path: + Path to save figure at. If a path is provided, the figure is saved. To save in a specific image format, add desired file extension to the end of `save_path`. Allowed file extensions are: 'png', 'pdf', 'ps', 'eps', and 'svg'. + + figsize: + Optional figure size for plotting the image. + Corresponds to ``matplotlib.figure.figsize``. + + kwargs: + Additional keyword arguments to pass to ``plt.show()`` (matplotlib.pyplot.show). + """ + try: + import matplotlib.pyplot as plt + except ImportError as e: + raise ImportError( + "This functionality requires matplotlib. 
Install it via: `pip install matplotlib`" + ) + + # Load the image from disk if a file path was given + if isinstance(image, str): + image = plt.imread(image) + + if prediction is not None: + prediction_type = _get_prediction_type(prediction) + pbbox, plabels, pred_probs = _separate_prediction( + prediction, prediction_type=prediction_type + ) + + if prediction_threshold is not None: + keep_idx = np.where(pred_probs > prediction_threshold) + pbbox = pbbox[keep_idx] + plabels = plabels[keep_idx] + + if label is not None: + abbox, alabels = _separate_label(label) + + if overlay: + figsize = (8, 5) if figsize is None else figsize + fig, ax = plt.subplots(frameon=False, figsize=figsize) + plt.axis("off") + ax.imshow(image) + if label is not None: + fig, ax = _draw_boxes( + fig, ax, abbox, alabels, edgecolor="r", linestyle="-", linewidth=1 + ) + if prediction is not None: + _, _ = _draw_boxes(fig, ax, pbbox, plabels, edgecolor="b", linestyle="-.", linewidth=1) + else: + figsize = (14, 10) if figsize is None else figsize + fig, axes = plt.subplots(nrows=1, ncols=2, frameon=False, figsize=figsize) + axes[0].axis("off") + axes[0].imshow(image) + axes[1].axis("off") + axes[1].imshow(image) + + if label is not None: + fig, ax = _draw_boxes( + fig, axes[0], abbox, alabels, edgecolor="r", linestyle="-", linewidth=1 + ) + if prediction is not None: + _, _ = _draw_boxes( + fig, axes[1], pbbox, plabels, edgecolor="b", linestyle="-.", linewidth=1 + ) + bbox_extra_artists = None + if label is not None or prediction is not None: + legend, plt = _plot_legend(class_names, label, prediction) + bbox_extra_artists = (legend,) + + if save_path: + allowed_image_formats = set(["png", "pdf", "ps", "eps", "svg"]) + image_format: Optional[str] = None + if save_path.split(".")[-1] in allowed_image_formats and "."
in save_path: + image_format = save_path.split(".")[-1] + plt.savefig( + save_path, + format=image_format, + bbox_extra_artists=bbox_extra_artists, + bbox_inches="tight", + transparent=True, + pad_inches=0.5, + ) + plt.show(**kwargs)
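The prediction format and the `prediction_threshold` filtering described in the `visualize` docstring can be sketched as below. The helper `flatten_prediction` is hypothetical and only approximates what the private helper `_separate_prediction` and the thresholding step in `visualize` appear to do. Like `visualize`, it keeps only boxes whose confidence is strictly above the threshold.

```python
import numpy as np


def flatten_prediction(prediction, threshold=None):
    """Hypothetical helper: flatten a per-class prediction into parallel arrays.

    prediction[k] is an (M_k, 5) array whose rows are [x1, y1, x2, y2, pred_prob].
    """
    boxes, labels, probs = [], [], []
    for k, class_boxes in enumerate(prediction):
        class_boxes = np.asarray(class_boxes, dtype=float).reshape(-1, 5)
        boxes.append(class_boxes[:, :4])
        probs.append(class_boxes[:, 4])
        labels.append(np.full(len(class_boxes), k))
    boxes_arr = np.concatenate(boxes)
    labels_arr = np.concatenate(labels)
    probs_arr = np.concatenate(probs)
    if threshold is not None:
        # Keep only boxes whose confidence is strictly above the threshold.
        keep = probs_arr > threshold
        boxes_arr, labels_arr, probs_arr = boxes_arr[keep], labels_arr[keep], probs_arr[keep]
    return boxes_arr, labels_arr, probs_arr


prediction = [
    np.array([[10, 10, 50, 50, 0.9], [12, 8, 40, 60, 0.2]]),  # class 0
    np.array([[60, 20, 90, 80, 0.7]]),  # class 1
]
boxes, box_labels, box_probs = flatten_prediction(prediction, threshold=0.5)
```

With a threshold of 0.5, the low-confidence class-0 box is dropped and one box per class remains.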
+ + +def _get_per_class_confusion_matrix_dict_( + labels: List[Dict[str, Any]], + predictions: List[np.ndarray], + iou_threshold: Optional[float] = 0.5, + num_procs: int = 1, +) -> DefaultDict[int, Dict[str, int]]: + """ + Returns a confusion matrix dictionary for each class containing the number of True Positive, False Positive, and False Negative detections from the object detection model. + """ + num_classes = len(predictions[0]) + num_images = len(predictions) + pool = Pool(num_procs) + counter_dict: DefaultDict[int, dict[str, int]] = collections.defaultdict( + lambda: {"TP": 0, "FP": 0, "FN": 0} + ) + + for class_num in range(num_classes): + pred_bboxes, lab_bboxes = _filter_by_class(labels, predictions, class_num) + tpfpfn = pool.starmap( + _calculate_true_positives_false_positives, + zip( + pred_bboxes, + lab_bboxes, + [iou_threshold for _ in range(num_images)], + [True for _ in range(num_images)], + ), + ) + + for image_idx, (tp, fp, fn) in enumerate(tpfpfn): # type: ignore + counter_dict[class_num]["TP"] += np.sum(tp) + counter_dict[class_num]["FP"] += np.sum(fp) + counter_dict[class_num]["FN"] += np.sum(fn) + + # Release the worker processes once all classes have been processed + pool.close() + pool.join() + + return counter_dict + + +def _sort_dict_to_list(index_value_dict): + """ + Convert a dictionary whose keys are string indices into a list of its values, ordered by index. + + Parameters: + - index_value_dict (dict): The input dictionary where keys represent indices and values are the corresponding elements. + + Returns: + list: A list containing the values from the input dictionary, sorted by index. + + Example: + >>> my_dict = {'0': '0', '1': '1', '2': '2', '3': '3', '4': '4'} + >>> _sort_dict_to_list(my_dict) + ['0', '1', '2', '3', '4'] + """ + sorted_list = [ + value for key, value in sorted(index_value_dict.items(), key=lambda x: int(x[0])) + ] + return sorted_list + + +
[docs]def get_average_per_class_confusion_matrix( + labels: List[Dict[str, Any]], + predictions: List[np.ndarray], + num_procs: int = 1, + class_names: Optional[Dict[Any, Any]] = None, +) -> Dict[Union[int, str], Dict[str, float]]: + """ + Compute a confusion matrix dictionary for each class containing the average number of True Positive, False Positive, and False Negative detections from the object detection model across a range of Intersection over Union (IoU) thresholds. + + At each IoU threshold, the metrics are calculated as follows: + + - True Positive (TP): Instances where the model correctly identifies the class with IoU above the threshold. + - False Positive (FP): Instances where the model predicts the class, but IoU is below the threshold. + - False Negative (FN): Instances where the ground truth class is not predicted by the model. + + The average confusion matrix provides insights into the model's strengths and potential biases. + + Note: a lower TP count at certain IoU thresholds does not necessarily imply that all other detections are FP. Rather, it indicates that at those specific IoU thresholds the model is less successful at correctly identifying class instances. The other metrics (FP and FN) provide additional information about the model's behavior. + + Note: Since we average over many IoU thresholds, 'TP', 'FP', and 'FN' may contain float values representing the average across these thresholds. + + Parameters + ---------- + labels: + A list of ``N`` dictionaries such that ``labels[i]`` contains the given labels for the `i`-th image. + Refer to documentation for this argument in :py:func:`object_detection.filter.find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + predictions: + A list of ``N`` ``np.ndarray`` such that ``predictions[i]`` corresponds to the model predictions for the `i`-th image.
+ Refer to documentation for this argument in :py:func:`object_detection.filter.find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + num_procs: + Number of processes for parallelization. Default is 1. + + class_names: + Optional dictionary mapping integer-encoded class labels back to their original class names, in the format ``{"integer-label": "original-class-name"}``. + + + Returns + ------- + avg_metrics: dict + A dictionary mapping each class (or its name, if `class_names` is provided) to its average ``"TP"``, ``"FP"``, and ``"FN"`` counts. + + The default range of Intersection over Union thresholds is from 0.5 to 0.95 with a step size of 0.05. + """ + iou_thrs = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True) + num_classes = len(predictions[0]) + if class_names is None: + class_names = {str(i): int(i) for i in list(range(num_classes))} + class_names = _sort_dict_to_list(class_names) + avg_metrics = {class_num: {"TP": 0.0, "FP": 0.0, "FN": 0.0} for class_num in class_names} + + for iou_threshold in iou_thrs: + results_dict = _get_per_class_confusion_matrix_dict_( + labels, predictions, iou_threshold, num_procs + ) + + for class_num in results_dict: + tp = results_dict[class_num]["TP"] + fp = results_dict[class_num]["FP"] + fn = results_dict[class_num]["FN"] + + avg_metrics[class_names[class_num]]["TP"] += tp + avg_metrics[class_names[class_num]]["FP"] += fp + avg_metrics[class_names[class_num]]["FN"] += fn + + # Average each class's counts over the IoU thresholds + num_thresholds = len(iou_thrs) + for class_name in avg_metrics: + avg_metrics[class_name]["TP"] /= num_thresholds + avg_metrics[class_name]["FP"] /= num_thresholds + avg_metrics[class_name]["FN"] /= num_thresholds + return avg_metrics
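The averaging over IoU thresholds can be illustrated on a toy case with a single annotated box and a single predicted box of the same class. This is a simplified sketch under stated assumptions. The hypothetical helper `iou_xyxy` computes IoU for ``[x1,y1,x2,y2]`` boxes, and a prediction is assumed to count as TP when its IoU is at least the threshold, and otherwise as one FP plus one FN. cleanlab's actual matching of many boxes per image is more involved.

```python
import numpy as np


def iou_xyxy(a, b):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# Same threshold grid as get_average_per_class_confusion_matrix: 0.50, 0.55, ..., 0.95.
iou_thrs = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True)

annotated = [0, 0, 10, 10]
predicted = [0, 0, 10, 7.2]
iou = iou_xyxy(annotated, predicted)  # 72 / 100

totals = {"TP": 0.0, "FP": 0.0, "FN": 0.0}
for thr in iou_thrs:
    if iou >= thr:
        totals["TP"] += 1  # the prediction matches the annotation
    else:
        totals["FP"] += 1  # the prediction overlaps too little...
        totals["FN"] += 1  # ...so the annotation stays unmatched
avg = {key: value / len(iou_thrs) for key, value in totals.items()}
```

The IoU of 0.72 passes the five thresholds from 0.50 to 0.70 and fails the five from 0.75 to 0.95, so each averaged count is fractional (0.5).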
+ + +
[docs]def calculate_per_class_metrics( + labels: List[Dict[str, Any]], + predictions: List[np.ndarray], + num_procs: int = 1, + class_names=None, +) -> Dict[Union[int, str], Dict[str, float]]: + """ + Calculate the object detection model's precision, recall, and F1 score for each class in the dataset. + + These metrics can help you identify model strengths and weaknesses, and provide reference statistics for model evaluation and comparisons. + + Parameters + ---------- + labels: + A list of ``N`` dictionaries such that ``labels[i]`` contains the given labels for the `i`-th image. + Refer to documentation for this argument in :py:func:`object_detection.filter.find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + predictions: + A list of ``N`` ``np.ndarray`` such that ``predictions[i]`` corresponds to the model predictions for the `i`-th image. + Refer to documentation for this argument in :py:func:`object_detection.filter.find_label_issues <cleanlab.object_detection.filter.find_label_issues>` for further details. + + num_procs: + Number of processes for parallelization. Default is 1. + + class_names: + Optional dictionary mapping integer-encoded class labels back to their original class names, in the format ``{"integer-label": "original-class-name"}``. + + + Returns + ------- + per_class_metrics: dict + A dictionary mapping each class to its ``"average precision"``, ``"average recall"``, and ``"average f1"``, computed from the object detection model's average confusion matrix values across a range of Intersection over Union thresholds. + + The default range of Intersection over Union thresholds is from 0.5 to 0.95 with a step size of 0.05.
+ """ + avg_metrics = get_average_per_class_confusion_matrix( + labels, predictions, num_procs, class_names=class_names + ) + + avg_metrics_dict = {} + for class_name in avg_metrics: + tp = avg_metrics[class_name]["TP"] + fp = avg_metrics[class_name]["FP"] + fn = avg_metrics[class_name]["FN"] + + precision = tp / (tp + fp + TINY_VALUE) # Avoid division by zero + recall = tp / (tp + fn + TINY_VALUE) # Avoid division by zero + f1 = 2 * (precision * recall) / (precision + recall + TINY_VALUE) # Avoid division by zero + + avg_metrics_dict[class_name] = { + "average precision": precision, + "average recall": recall, + "average f1": f1, + } + + return avg_metrics_dict
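Because precision, recall, and F1 are ratios of the averaged counts, they can be sketched directly from TP, FP, and FN. The small constant `eps` below stands in for `TINY_VALUE`, whose actual value is not shown here, and the function name is hypothetical.

```python
def precision_recall_f1(tp: float, fp: float, fn: float, eps: float = 1e-12):
    """Hypothetical helper mirroring the per-class formulas in calculate_per_class_metrics."""
    precision = tp / (tp + fp + eps)  # eps avoids division by zero
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(tp=6.0, fp=2.0, fn=4.0)
# Scaling all counts by the same factor leaves the ratios (essentially) unchanged.
p_s, r_s, f1_s = precision_recall_f1(tp=0.6, fp=0.2, fn=0.4)
```

Dividing all three counts by the same number, for example the number of IoU thresholds, leaves the three metrics unchanged up to the effect of `eps`.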
+ + +def _normalize_by_total(freq): + """Helper function to normalize a frequency distribution.""" + total = sum(freq.values()) + return {k: round(v / (total + EPSILON), 2) for k, v in freq.items()} + + +def _get_bbox_areas(labels, boxes, class_area_dict, class_names=None) -> None: + """Helper function to compute the area of bounding boxes for each class.""" + for cl, bbox in zip(labels, boxes): + if class_names is not None: + if str(cl) not in class_names: + continue + cl = class_names[str(cl)] + class_area_dict[cl].append((bbox[2] - bbox[0]) * (bbox[3] - bbox[1])) + + +def _get_class_instances(labels, class_instances_dict, class_names=None) -> None: + """Helper function to count the number of class instances in each image.""" + for cl in labels: + if class_names is not None: + cl = class_names[str(cl)] + class_instances_dict[cl] += 1 + + +def _plot_legend(class_names, label, prediction): + colors = ["black"] + colors.extend(["red"] if label is not None else []) + colors.extend(["blue"] if prediction is not None else []) + + markers = [None] + markers.extend(["s"] if label is not None else []) + markers.extend(["s"] if prediction is not None else []) + + labels = [r"$\bf{Legend}$"] + labels.extend(["given label"] if label is not None else []) + labels.extend(["predicted label"] if prediction is not None else []) + + if class_names: + colors += ["black"] + ["black"] * min(len(class_names), MAX_CLASS_TO_SHOW) + markers += [None] + [f"${class_key}$" for class_key in class_names.keys()] + labels += [r"$\bf{classes}$"] + list(class_names.values()) + + try: + import matplotlib.pyplot as plt + except ImportError as e: + raise ImportError( + "This functionality requires matplotlib. 
Install it via: `pip install matplotlib`" + ) + + f = lambda m, c: plt.plot([], [], marker=m, color=c, ls="none")[0] + handles = [f(marker, color) for marker, color in zip(markers, colors)] + legend = plt.legend( + handles, labels, bbox_to_anchor=(1.04, 0.05), loc="lower left", borderaxespad=0 + ) + + return legend, plt + + +def _draw_labels(ax, rect, label, edgecolor): + """Helper function to draw labels on an axis.""" + + rx, ry = rect.get_xy() + c_xleft = rx + 10 + c_xright = rx + rect.get_width() - 10 + c_ytop = ry + 12 + + if edgecolor == "r": + cx, cy = c_xleft, c_ytop + else: # edgecolor == b + cx, cy = c_xright, c_ytop + + l = ax.annotate( + label, (cx, cy), fontsize=8, fontweight="bold", color="white", ha="center", va="center" + ) + l.set_bbox(dict(facecolor=edgecolor, alpha=0.35, edgecolor=edgecolor, pad=2)) + return ax + + +def _draw_boxes(fig, ax, bboxes, labels, edgecolor="g", linestyle="-", linewidth=3): + """Helper function to draw bboxes and labels on an axis.""" + bboxes = [bbox_xyxy_to_xywh(box) for box in bboxes] + + try: + from matplotlib.patches import Rectangle + except Exception as e: + raise ImportError( + "This functionality requires matplotlib. Install it via: `pip install matplotlib`" + ) + + for (x, y, w, h), label in zip(bboxes, labels): + rect = Rectangle( + (x, y), + w, + h, + linewidth=linewidth, + linestyle=linestyle, + edgecolor=edgecolor, + facecolor="none", + ) + ax.add_patch(rect) + + if labels is not None: + ax = _draw_labels(ax, rect, label, edgecolor) + + return fig, ax +
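`_draw_boxes` converts boxes from ``[x1,y1,x2,y2]`` to the ``(x, y, w, h)`` form expected by ``matplotlib.patches.Rectangle``, and `_get_bbox_areas` multiplies width by height. Both steps can be sketched in vectorized form as below. `xyxy_to_xywh` is a hypothetical stand-in for cleanlab's `bbox_xyxy_to_xywh`, whose exact implementation is not shown here.

```python
import numpy as np


def xyxy_to_xywh(boxes):
    """Hypothetical helper: convert [x1, y1, x2, y2] rows to [x, y, w, h] rows."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    out = boxes.copy()
    out[:, 2] = boxes[:, 2] - boxes[:, 0]  # width = x2 - x1
    out[:, 3] = boxes[:, 3] - boxes[:, 1]  # height = y2 - y1
    return out


xywh = xyxy_to_xywh([[10, 20, 50, 80], [0, 0, 5, 5]])
areas = xywh[:, 2] * xywh[:, 3]  # same formula as _get_bbox_areas
```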
+ + Made with Sphinx and @pradyunsg's + + Furo + +
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/outlier.html b/v2.6.6/_modules/cleanlab/outlier.html
new file mode 100644
index 000000000..611c03fe8
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/outlier.html
@@ -0,0 +1,1277 @@
+ cleanlab.outlier - cleanlab
Source code for cleanlab.outlier

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods for finding out-of-distribution examples in a dataset via scores that quantify how atypical each example is compared to the others.
+
+The underlying algorithms are described in `this paper <https://arxiv.org/abs/2207.03061>`_.
+"""
+
+import warnings
+from typing import Dict, Optional, Tuple, Union
+
+import numpy as np
+from sklearn.exceptions import NotFittedError
+from sklearn.neighbors import NearestNeighbors
+
+from cleanlab.count import get_confident_thresholds
+from cleanlab.internal.label_quality_utils import (
+    _subtract_confident_thresholds,
+    get_normalized_entropy,
+)
+from cleanlab.internal.neighbor.knn_graph import correct_knn_distances_and_indices, features_to_knn
+from cleanlab.internal.numerics import softmax
+from cleanlab.internal.outlier import correct_precision_errors, transform_distances_to_scores
+from cleanlab.internal.validation import assert_valid_inputs, labels_to_array
+from cleanlab.typing import LabelLike
+
+
+
[docs]class OutOfDistribution: + """ + Provides scores to detect Out Of Distribution (OOD) examples that are outliers in a dataset. + + Each example's OOD score lies in [0,1] with smaller values indicating examples that are less typical under the data distribution. + OOD scores may be estimated from either numeric feature embeddings or predicted probabilities from a trained classifier. + + To get indices of examples that are the most severe outliers, call the `~cleanlab.rank.find_top_issues` function on the returned OOD scores. + + Parameters + ---------- + params : dict, default = None + Optional keyword arguments to control how this estimator is fit. The effect of these arguments depends on whether the + `OutOfDistribution` estimator relies on `features` or `pred_probs`. They are stored as the instance attribute `self.params`. + + If `features` is passed in during ``fit()``, `params` may contain the following keys: + * knn: sklearn.neighbors.NearestNeighbors, default = None + Instantiated ``NearestNeighbors`` object that has been fitted on a dataset in the same feature space. + Note that the distance metric and `n_neighbors` are specified when instantiating this class. + You can also pass in a subclass of ``sklearn.neighbors.NearestNeighbors``, which allows you to use faster + approximate neighbor libraries as long as you wrap them behind the same sklearn API. + If you specify ``knn`` here, there is no need to later call ``fit()`` before calling ``score()``. + If ``knn is None``, then by default: + The knn object is instantiated as ``sklearn.neighbors.NearestNeighbors(n_neighbors=k, metric=dist_metric).fit(features)``. + - If ``dim(features) > 3``, the distance metric is set to "cosine". + - If ``dim(features) <= 3``, the distance metric is set to "euclidean". + The implementation of the euclidean distance metric depends on the number of examples in the features array: + - For more than 100 rows, it uses scikit-learn's "euclidean" metric. This is for efficiency reasons.
+ - For 100 or fewer rows, it uses scipy's ``scipy.spatial.distance.euclidean`` metric. This is for numerical stability reasons. + See: https://scikit-learn.org/stable/modules/neighbors.html + See: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html + See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html + * k : int, default=None + Optional number of neighbors to use when calculating the outlier score (average distance to neighbors). + If `k` is not provided, then by default ``k = knn.n_neighbors``, or ``k = 10`` if ``knn is None``. + If an existing ``knn`` object is provided, you can still specify that outlier scores should use + a different value of `k` than originally used in the ``knn``, + as long as your specified value of `k` is smaller than the value originally used in ``knn``. + * t : int, default=1 + Optional hyperparameter only for advanced users. + Controls the transformation of distances between examples into similarity scores that lie in [0,1]. + The transformation applied to distances `x` is ``exp(-x*t)``. + If you find your scores are all too close to 1, consider increasing `t`, + although the relative scores of examples will still have the same ranking across the dataset. + + If `pred_probs` is passed in during ``fit()``, `params` may contain the following keys: + * confident_thresholds: np.ndarray, default = None + An array of shape ``(K, )`` where K is the number of classes. + The confident threshold for a class j is the expected (average) "self-confidence" for that class. + If you specify `confident_thresholds` here, there is no need to later call ``fit()`` before calling ``score()``. + * adjust_pred_probs : bool, default = True + If True, account for class imbalance by adjusting predicted probabilities + via subtraction of class confident thresholds and renormalization. + If False, you do not have to pass in `labels` later to fit this OOD estimator.
+ See `Northcutt et al., 2021 <https://jair.org/index.php/jair/article/view/12125>`_. + * method : {"entropy", "least_confidence", "gen"}, default="entropy" + Method to use when computing outlier scores based on `pred_probs`. + Letting length-K vector ``P = pred_probs[i]`` denote the given predicted class-probabilities + for the i-th example in the dataset, its outlier score is one of: + + - ``'entropy'``: ``1 + sum_{j} P[j] * log(P[j]) / log(K)``, i.e. one minus the normalized entropy of ``P`` + - ``'least_confidence'``: ``max(P)`` (equivalent to the Maximum Softmax Probability method from the OOD detection literature) + - ``'gen'``: Generalized ENtropy score from the paper of Liu, Lochman, and Zach (https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_GEN_Pushing_the_Limits_of_Softmax-Based_Out-of-Distribution_Detection_CVPR_2023_paper.pdf) + + * M : int, default=100 + Parameter of the ``'gen'`` method; only used when ``method="gen"``. + * gamma : float, default=0.1 + Parameter of the ``'gen'`` method; only used when ``method="gen"``. + + """ + + OUTLIER_PARAMS = {"k", "t", "knn"} + OOD_PARAMS = {"confident_thresholds", "adjust_pred_probs", "method", "M", "gamma"} + DEFAULT_PARAM_DICT: Dict[str, Union[str, int, float, None, np.ndarray]] = { + "k": None, # param for feature based outlier detection (number of neighbors) + "t": 1, # param for feature based outlier detection (controls transformation of outlier scores to 0-1 range) + "knn": None, # param for features based outlier detection (precomputed nearest neighbors graph to use) + "method": "entropy", # param specifying which pred_probs-based outlier detection method to use + "adjust_pred_probs": True, # param for pred_probs based outlier detection (whether to adjust the probabilities by class thresholds or not) + "confident_thresholds": None, # param for pred_probs based outlier detection (precomputed confident thresholds to use for adjustment) + "M": 100, # param for GEN method for pred_probs based outlier detection + "gamma": 0.1, # param for GEN method for pred_probs based outlier detection + } + + def __init__(self, params: Optional[dict] = None) -> None: + self._assert_valid_params(params, self.DEFAULT_PARAM_DICT) + self.params =
self.DEFAULT_PARAM_DICT.copy() + if params is not None: + self.params.update(params) + if self.params["adjust_pred_probs"] and self.params["method"] == "gen": + print( + "CAUTION: GEN method is not recommended for use with adjusted pred_probs. " + "To use GEN, we recommend setting: params['adjust_pred_probs'] = False" + ) + + # scaling_factor internally used to rescale distances based on mean distances to k nearest neighbors + self.params["scaling_factor"] = None + +
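The two simplest `method` options for `pred_probs` can be sketched with NumPy as below, applied to raw probabilities (that is, as if ``adjust_pred_probs=False``). These helpers are hypothetical and are not cleanlab's internal implementation; the ``'gen'`` method is omitted.

```python
import numpy as np


def entropy_scores(pred_probs: np.ndarray) -> np.ndarray:
    """1 - normalized entropy of each row: 1 for a one-hot row, 0 for a uniform row."""
    num_classes = pred_probs.shape[1]
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(pred_probs > 0, pred_probs, 1.0)
    neg_entropy = np.sum(pred_probs * np.log(safe), axis=1)
    return 1.0 + neg_entropy / np.log(num_classes)


def least_confidence_scores(pred_probs: np.ndarray) -> np.ndarray:
    """Maximum predicted probability of each row."""
    return pred_probs.max(axis=1)


pred_probs = np.array([[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]])
ent = entropy_scores(pred_probs)
lc = least_confidence_scores(pred_probs)
```

Under both scores, the uniform second row looks the most atypical, consistent with smaller scores indicating likely outliers.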
[docs] def fit_score( + self, + *, + features: Optional[np.ndarray] = None, + pred_probs: Optional[np.ndarray] = None, + labels: Optional[LabelLike] = None, + verbose: bool = True, + ) -> np.ndarray: + """ + Fits this estimator to a given dataset and returns out-of-distribution scores for the same dataset. + + Scores lie in [0,1] with smaller values indicating examples that are less typical under the dataset + distribution (values near 0 indicate outliers). Exactly one of `features` or `pred_probs` needs to be passed + in to calculate scores. + + If `features` are passed in, a ``NearestNeighbors`` object is fit. If `pred_probs` and `labels` are passed in, a + `confident_thresholds` ``np.ndarray`` is fit. For details see `~cleanlab.outlier.OutOfDistribution.fit`. + + Parameters + ---------- + features : np.ndarray, optional + Feature array of shape ``(N, M)``, where N is the number of examples and M is the number of features used to represent each example. + Must be in the same format expected by the `~cleanlab.outlier.OutOfDistribution.fit` function; see its documentation for details. + + pred_probs : np.ndarray, optional + An array of shape ``(N, K)`` of predicted class probabilities output by a trained classifier. + Must be in the same format expected by the `~cleanlab.outlier.OutOfDistribution.fit` function; see its documentation for details. + + labels : array_like, optional + A discrete array of given class labels for the data of shape ``(N,)``. + Must be in the same format expected by the `~cleanlab.outlier.OutOfDistribution.fit` function; see its documentation for details. + + verbose : bool, default = True + Set to ``False`` to suppress all print statements. + + Returns + ------- + scores : np.ndarray + If `features` are passed in, `ood_features_scores` are returned. + If `pred_probs` are passed in, `ood_predictions_scores` are returned. + For details see the return value of the `~cleanlab.outlier.OutOfDistribution.score` function.
+ + """ + scores = self._shared_fit( + features=features, + pred_probs=pred_probs, + labels=labels, + verbose=verbose, + ) + + if scores is None: # Fit was called on already fitted object so we just score vals instead + scores = self.score(features=features, pred_probs=pred_probs) + + return scores
+ +
[docs] def fit( + self, + *, + features: Optional[np.ndarray] = None, + pred_probs: Optional[np.ndarray] = None, + labels: Optional[LabelLike] = None, + verbose: bool = True, + ): + """ + Fits this estimator to a given dataset. + + Exactly one of `features` or `pred_probs` must be specified. + + If `features` are passed in, a ``NearestNeighbors`` object is fit. + If `pred_probs` and `labels` are passed in, a `confident_thresholds` ``np.ndarray`` is fit. + For details see the `~cleanlab.outlier.OutOfDistribution` documentation. + + Parameters + ---------- + features : np.ndarray, optional + Feature array of shape ``(N, M)``, where N is the number of examples and M is the number of features used to represent each example. + All features should be **numeric**. For less structured data (e.g. images, text, categorical values, ...), you should provide + vector embeddings to represent each example (e.g. extracted from some pretrained neural network). + + pred_probs : np.ndarray, optional + An array of shape ``(N, K)`` of model-predicted probabilities, + ``P(label=k|x)``. Each row of this matrix corresponds + to an example `x` and contains the model-predicted probabilities that + `x` belongs to each possible class, for each of the K classes. The + columns must be ordered such that these probabilities correspond to + class 0, 1, ..., K-1. + + labels : array_like, optional + A discrete vector of given labels for the data of shape ``(N,)``. Supported `array_like` types include: ``np.ndarray`` or ``list``. + *Format requirements*: for a dataset with K classes, labels must be in 0, 1, ..., K-1. + All the classes (0, 1, ..., and K-1) MUST be present in ``labels``, such that: ``len(set(labels)) == pred_probs.shape[1]`` + If ``params["adjust_pred_probs"]`` was previously set to ``False``, you do not have to pass in `labels`. + Note: multi-label classification is not supported by this method. Each example must belong to a single class, e.g. ``labels = np.array([1,0,2,1,1,0...])``.
+ + verbose : bool, default = True + Set to ``False`` to suppress all print statements. + + """ + _ = self._shared_fit( + features=features, + pred_probs=pred_probs, + labels=labels, + verbose=verbose, + )
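What ``fit()`` estimates from `pred_probs` and `labels` can be sketched as follows. The confident threshold for class j is the average predicted probability of class j among examples labeled j, and the adjustment subtracts these thresholds and renormalizes each row. Both helpers are hypothetical. In particular, shifting by the largest threshold to keep every entry non-negative is an assumption, and cleanlab's internal `_subtract_confident_thresholds` may differ in detail.

```python
import numpy as np


def confident_thresholds(pred_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Average self-confidence per class (assumes every class appears in labels)."""
    num_classes = pred_probs.shape[1]
    return np.array([pred_probs[labels == k, k].mean() for k in range(num_classes)])


def adjust_pred_probs(pred_probs: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Subtract per-class thresholds, shift to non-negative values, and renormalize rows."""
    adjusted = pred_probs - thresholds
    # Assumption: shifting by the largest threshold guarantees non-negative entries.
    adjusted = adjusted + thresholds.max()
    return adjusted / adjusted.sum(axis=1, keepdims=True)


pred_probs = np.array([[0.9, 0.1], [0.7, 0.3], [0.4, 0.6], [0.2, 0.8]])
labels = np.array([0, 0, 1, 1])
thresholds = confident_thresholds(pred_probs, labels)  # per-class means: 0.8 and 0.7
adjusted = adjust_pred_probs(pred_probs, thresholds)
```

This also shows why every class must appear in `labels`: a class with no examples would have an empty mean.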
+ +
[docs] def score( + self, *, features: Optional[np.ndarray] = None, pred_probs: Optional[np.ndarray] = None + ) -> np.ndarray: + """ + Use the fitted estimator and the passed-in `features` or `pred_probs` to calculate out-of-distribution scores for a dataset. + + The score for each example reflects how likely it is that this example stems from the same distribution as the dataset previously specified in ``fit()`` (i.e. that it is not an outlier). + + If `features` are passed, returns an OOD score for each example based on its feature values. + If `pred_probs` are passed, returns an OOD score for each example based on the classifier's probabilistic predictions. + You may need to call ``fit()`` first (or call ``fit_score()`` instead), unless the required estimates were already supplied via `params`. + + Parameters + ---------- + features : np.ndarray, optional + Feature array of shape ``(N, M)``, where N is the number of examples and M is the number of features used to represent each example. + For details, see `features` in the `~cleanlab.outlier.OutOfDistribution.fit` function. + + pred_probs : np.ndarray, optional + An array of shape ``(N, K)`` of predicted class probabilities output by a trained classifier. + For details, see `pred_probs` in the `~cleanlab.outlier.OutOfDistribution.fit` function. + + Returns + ------- + scores : np.ndarray + Scores lie in [0,1] with smaller values indicating examples that are less typical under the dataset distribution + (values near 0 indicate outliers). + + If `features` are passed, `ood_features_scores` are returned. + The score is based on the average distance between the example and its k nearest neighbors in the dataset + (in feature space). + + If `pred_probs` are passed, `ood_predictions_scores` are returned. + The score is based on the uncertainty in the classifier's predicted probabilities. + """ + self._assert_valid_inputs(features, pred_probs) + + if features is not None: + if self.params["knn"] is None: + raise ValueError( + "OOD estimator needs to be fit on features first.
Call `fit()` or `fit_score()` before this function." + ) + scores, _ = self._get_ood_features_scores( + features, **self._get_params(self.OUTLIER_PARAMS) + ) + + if pred_probs is not None: + if self.params["confident_thresholds"] is None and self.params["adjust_pred_probs"]: + raise ValueError( + "OOD estimator needs to be fit on pred_probs first since params['adjust_pred_probs']=True. Call `fit()` or `fit_score()` before this function." + ) + scores, _ = _get_ood_predictions_scores(pred_probs, **self._get_params(self.OOD_PARAMS)) + + return scores
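The feature-based score described in the `score` docstring (average distance to the k nearest neighbors, transformed by ``exp(-x*t)``) can be sketched with brute-force Euclidean distances. This hypothetical helper skips the rescaling by mean neighbor distance (`scaling_factor`) that cleanlab applies internally. It is meant only for small arrays, since it builds the full N x N distance matrix.

```python
import numpy as np


def knn_outlier_scores(features: np.ndarray, k: int = 2, t: float = 1.0) -> np.ndarray:
    """exp(-t * average Euclidean distance to the k nearest other examples)."""
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # an example is not its own neighbor
    nearest = np.sort(dists, axis=1)[:, :k]
    avg_dist = nearest.mean(axis=1)
    return np.exp(-avg_dist * t)


features = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]])
scores = knn_outlier_scores(features, k=2)
sharper = knn_outlier_scores(features, k=2, t=5.0)
```

The isolated point ``[10, 10]`` gets the smallest score. Increasing `t` pushes all scores toward 0 without changing their ranking, as the `t` parameter description states.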
+ + def _get_params(self, param_keys) -> dict: + """Get function-specific dictionary of parameters (i.e. only those in param_keys).""" + return {k: v for k, v in self.params.items() if k in param_keys} + + @staticmethod + def _assert_valid_params(params, param_keys): + """Validate passed-in params, raising a ValueError if any key is not in param_keys.""" + if params is not None: + wrong_params = list(set(params.keys()).difference(set(param_keys))) + if len(wrong_params) > 0: + raise ValueError( + f"Passed in params dict can only contain {param_keys}. Remove {wrong_params} from params dict." + ) + + @staticmethod + def _assert_valid_inputs(features, pred_probs): + """Check whether features and pred_probs inputs are valid, throw error if not.""" + if features is None and pred_probs is None: + raise ValueError( + "Not enough information to compute scores. Pass in either features or pred_probs." + ) + + if features is not None and pred_probs is not None: + raise ValueError( + "Cannot use the OOD estimator with both features and pred_probs. Pass in either one or the other." + ) + + if features is not None and len(features.shape) != 2: + raise ValueError( + "Feature array needs to be of shape (N, M), where N is the number of examples and M is the " + "number of features used to represent each example." + ) + + def _shared_fit( + self, + *, + features: Optional[np.ndarray] = None, + pred_probs: Optional[np.ndarray] = None, + labels: Optional[LabelLike] = None, + verbose: bool = True, + ) -> Optional[np.ndarray]: + """ + Shared fit functionality between ``fit()`` and ``fit_score()``. + + For details, refer to `~cleanlab.outlier.OutOfDistribution.fit` + or `~cleanlab.outlier.OutOfDistribution.fit_score`.
+ """ + self._assert_valid_inputs(features, pred_probs) + scores = None # If no scores are returned, fit was skipped + + if features is not None: + if self.params["knn"] is not None: + # No fitting twice if knn object already fit + warnings.warn( + "A KNN estimator has already been fit; call score() to apply it to data, or create a new OutOfDistribution object to fit a different estimator.", + UserWarning, + ) + else: + # Get ood features scores + if verbose: + print("Fitting OOD estimator based on provided features ...") + scores, knn = self._get_ood_features_scores( + features, **self._get_params(self.OUTLIER_PARAMS) + ) + self.params["knn"] = knn + + if pred_probs is not None: + if self.params["confident_thresholds"] is not None: + # No fitting twice if confident_thresholds object already fit + warnings.warn( + "Confident thresholds have already been fit; call score() to apply them to data, or create a new OutOfDistribution object to fit a different estimator.", + UserWarning, + ) + else: + # Get ood predictions scores + if verbose: + print("Fitting OOD estimator based on provided pred_probs ...") + scores, confident_thresholds = _get_ood_predictions_scores( + pred_probs, + labels=labels, + **self._get_params(self.OOD_PARAMS), + ) + if confident_thresholds is None: + warnings.warn( + "No estimates need to be fit under the provided params, so you could directly call " + "score() as an alternative.", + UserWarning, + ) + else: + self.params["confident_thresholds"] = confident_thresholds + return scores + + def _get_ood_features_scores( + self, + features: Optional[np.ndarray] = None, + knn: Optional[NearestNeighbors] = None, + k: Optional[int] = None, + t: int = 1, + ) -> Tuple[np.ndarray, Optional[NearestNeighbors]]: + """ + Return outlier score based on feature values using `k` nearest neighbors.
+ + The outlier score for each example decreases as + the average distance between this example and its K nearest neighbors (in feature space) increases. + + Parameters + ---------- + features : np.ndarray + Feature array of shape ``(N, M)``, where N is the number of examples and M is the number of features used to represent each example. + Same format as `features` expected by the `~cleanlab.outlier.OutOfDistribution.fit` function. + + knn : sklearn.neighbors.NearestNeighbors, default = None + For details, see key `knn` in the params dict arg of `~cleanlab.outlier.OutOfDistribution`. + + k : int, default=None + Optional number of neighbors to use when calculating outlier score (average distance to neighbors). + For details, see key `k` in the params dict arg of `~cleanlab.outlier.OutOfDistribution`. + + t : int, default=1 + Controls transformation of distances between examples into similarity scores that lie in [0,1]. + For details, see key `t` in the params dict arg of `~cleanlab.outlier.OutOfDistribution`. + + Returns + ------- + ood_features_scores : Tuple[np.ndarray, Optional[NearestNeighbors]] + Return a tuple whose first element is the array of `ood_features_scores` and second is a `knn` Estimator object. + """ + DEFAULT_K = 10 + # If no knn is provided, build a default one from the features; a provided knn is fit below only if it has not been fit yet.
+ distance_metric = None + correct_knn = False + if knn is None: # set up default KNN estimator + # Make sure both knn and features are not None + knn = features_to_knn(features, n_neighbors=k) + correct_knn = True + features = None # features should be None in knn.kneighbors(features) to avoid counting duplicate data points + # Log knn metric as string to ensure compatibility for score correction + distance_metric = ( + metric if isinstance((metric := knn.metric), str) else str(metric.__name__) + ) + k = knn.n_neighbors + + elif k is None: + k = knn.n_neighbors + + max_k = knn.n_neighbors # number of neighbors previously used in NearestNeighbors object + if k > max_k: # if k provided is too high, use max possible number of nearest neighbors + warnings.warn( + f"Chosen k={k} cannot be greater than n_neighbors={max_k} which was used when fitting " + f"NearestNeighbors object! Value of k changed to k={max_k}.", + UserWarning, + ) + k = max_k + + # Fit knn estimator on the features if a non-fitted estimator is passed in + try: + knn.kneighbors(features) + except NotFittedError: + knn.fit(features) + + # Get distances to k-nearest neighbors. Note that the knn object contains the specification of distance metric + # and n_neighbors (k value). If our query set of features matches the training set used to fit knn, the nearest + # neighbor of each point is the point itself, at a distance of zero. + distances, indices = knn.kneighbors(features) + if ( + correct_knn + ): # This should only happen if knn is None at the start of this function. Will NEVER happen for approximate KNN provided by user. + _features_for_correction = ( + knn._fit_X if features is None else features + ) # Hacky way to get features (training or test). Storing np.unique results is a hassle.
ONLY WORKS WITH sklearn NearestNeighbors object + distances, _ = correct_knn_distances_and_indices( + features=_features_for_correction, + distances=distances, + indices=indices, + ) + + # Calculate average distance to k-nearest neighbors + avg_knn_distances = distances[:, :k].mean(axis=1) + + if self.params["scaling_factor"] is None: + self.params["scaling_factor"] = float( + max(np.median(avg_knn_distances), 100 * np.finfo(np.float_).eps) + ) + scaling_factor = self.params["scaling_factor"] + + if not isinstance(scaling_factor, float): + raise ValueError(f"Scaling factor must be a float. Got {type(scaling_factor)} instead.") + + ood_features_scores = transform_distances_to_scores( + avg_knn_distances, t, scaling_factor=scaling_factor + ) + distance_metric = distance_metric or ( + metric if isinstance((metric := knn.metric), str) else metric.__name__ + ) + p = None + if distance_metric == "minkowski": + p = knn.p + ood_features_scores = correct_precision_errors( + ood_features_scores, avg_knn_distances, distance_metric, p=p + ) + return (ood_features_scores, knn)
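For intuition, the feature-based score computed above can be approximated with a brute-force, standard-library sketch. It is not cleanlab's implementation: `knn_outlier_scores` is a hypothetical helper, it uses Euclidean distance, it excludes each point from its own neighbor list (the role the duplicate correction plays above), and it *assumes* an exponential transform `exp(-t * d / scaling_factor)` in place of `transform_distances_to_scores`. The default scaling factor mirrors the code above: the median average kNN distance, floored at `100 * eps`.

```python
import math
import statistics
import sys


def knn_outlier_scores(features, k=10, t=1.0):
    """Hypothetical sketch: map each point's average k-NN distance to a score in (0, 1]."""
    k = min(k, len(features) - 1)  # cannot use more neighbors than there are other points
    avg_dists = []
    for i, x in enumerate(features):
        # Distances to every *other* point, so a point is never its own neighbor.
        dists = sorted(math.dist(x, y) for j, y in enumerate(features) if j != i)
        avg_dists.append(sum(dists[:k]) / k)
    # Same default as above: median average distance, floored to avoid dividing by zero.
    scaling_factor = max(statistics.median(avg_dists), 100 * sys.float_info.epsilon)
    # Assumed transform: larger average distance -> score closer to 0.
    return [math.exp(-t * d / scaling_factor) for d in avg_dists]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(points, k=2)
most_outlying = min(range(len(scores)), key=scores.__getitem__)
print(most_outlying)  # 4, the isolated point
```

The four clustered points all get the same score here because each has two neighbors at distance 0.1, which is also the median used for scaling.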
+ + +def _get_ood_predictions_scores( + pred_probs: np.ndarray, + *, + labels: Optional[LabelLike] = None, + confident_thresholds: Optional[np.ndarray] = None, + adjust_pred_probs: bool = True, + method: str = "entropy", + M: int = 100, + gamma: float = 0.1, +) -> Tuple[np.ndarray, Optional[np.ndarray]]: + """Return an OOD (out-of-distribution) score for each example based on its pred_probs values. + + Parameters + ---------- + pred_probs : np.ndarray + An array of shape ``(N, K)`` of model-predicted probabilities, + `pred_probs` in the same format expected by the `~cleanlab.outlier.OutOfDistribution.fit` function. + + confident_thresholds : np.ndarray, default = None + For details, see key `confident_thresholds` in the params dict arg of `~cleanlab.outlier.OutOfDistribution`. + + labels : array_like, optional + `labels` in the same format expected by the `~cleanlab.outlier.OutOfDistribution.fit` function. + + adjust_pred_probs : bool, default=True + Account for class imbalance in the label-quality scoring. + For details, see key `adjust_pred_probs` in the params dict arg of `~cleanlab.outlier.OutOfDistribution`. + + method : {"entropy", "least_confidence", "gen"}, default="entropy" + Which method to use for computing outlier scores based on pred_probs. + For details see key `method` in the params dict arg of `~cleanlab.outlier.OutOfDistribution`. + + M : int, default=100 + For GEN method only. Hyperparameter that controls the number of top classes to consider when calculating OOD scores. + + gamma : float, default=0.1 + For GEN method only. Exponent applied to both ``p`` and ``1 - p`` for each of the top-M probabilities in the GEN score. + + + Returns + ------- + ood_predictions_scores : Tuple[np.ndarray, Optional[np.ndarray]] + Returns a tuple. The first element is the array of `ood_predictions_scores` and the second is an np.ndarray of `confident_thresholds`, or None if `confident_thresholds` is not calculated.
+ """ + valid_methods = ( + "entropy", + "least_confidence", + "gen", + ) + + if (confident_thresholds is not None or labels is not None) and not adjust_pred_probs: + warnings.warn( + "OOD scores are not adjusted with confident thresholds. If scores need to be adjusted, set " + "params['adjust_pred_probs'] = True. Otherwise passing in confident_thresholds and/or labels does not change " + "score calculation.", + UserWarning, + ) + + if adjust_pred_probs: + if confident_thresholds is None: + if labels is None: + raise ValueError( + "Cannot calculate adjust_pred_probs without labels. Either pass in labels parameter or set " + "params['adjust_pred_probs'] = False. " + ) + labels = labels_to_array(labels) + assert_valid_inputs(X=None, y=labels, pred_probs=pred_probs, multi_label=False) + confident_thresholds = get_confident_thresholds(labels, pred_probs, multi_label=False) + + pred_probs = _subtract_confident_thresholds( + None, pred_probs, multi_label=False, confident_thresholds=confident_thresholds + ) + + # Scores are flipped so OOD examples score closer to 0. Scores reflect confidence that the example is in-distribution. + if method == "entropy": + ood_predictions_scores = 1.0 - get_normalized_entropy(pred_probs) + elif method == "least_confidence": + ood_predictions_scores = pred_probs.max(axis=1) + elif method == "gen": + if pred_probs.shape[1] < M: # pragma: no cover + warnings.warn( + f"GEN with the default hyperparameter settings is intended for datasets with at least {M} classes. You can adjust params['M'] according to the number of classes in your dataset.", + UserWarning, + ) + probs = softmax(pred_probs, axis=1) + probs_sorted = np.sort(probs, axis=1)[:, -M:] + ood_predictions_scores = ( + 1 - np.sum(probs_sorted**gamma * (1 - probs_sorted) ** (gamma), axis=1) / M + ) # Use 1 + original gen score/M to make the scores lie in 0-1 + else: + raise ValueError( + f""" + {method} is not a valid OOD scoring method!
+ Please choose a valid scoring_method: {valid_methods} + """ + ) + + return ( + ood_predictions_scores, + confident_thresholds, + ) +
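The three scoring branches above can be sketched for a single row of probabilities with the standard library. Assumptions not taken from the code shown: `ood_prediction_score` is a hypothetical helper, it skips the confident-threshold adjustment (as if `adjust_pred_probs=False`), and it takes normalized entropy to be Shannon entropy divided by `log(K)`. The GEN branch follows the code above: softmax the row, keep the top `M` values, and return `1 - sum(p**gamma * (1 - p)**gamma) / M`.

```python
import math


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def ood_prediction_score(probs, method="entropy", M=100, gamma=0.1):
    """Hypothetical sketch of one example's in-distribution score (higher = more typical)."""
    if method == "entropy":
        # Assumed normalization: entropy / log(K), which lies in [0, 1].
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        return 1.0 - entropy / math.log(len(probs))
    if method == "least_confidence":
        return max(probs)
    if method == "gen":
        # Softmax the row, then keep the M largest values (all of them if K < M).
        top = sorted(softmax(probs))[-M:]
        return 1 - sum(p**gamma * (1 - p) ** gamma for p in top) / M
    raise ValueError(f"{method} is not a valid OOD scoring method!")


confident = [0.98, 0.01, 0.01]
uncertain = [0.34, 0.33, 0.33]
for method in ("entropy", "least_confidence"):
    print(method, ood_prediction_score(confident, method) > ood_prediction_score(uncertain, method))
```

With only three classes the default `M=100` is far larger than `K`, which is exactly the case the warning above targets; for toy inputs like these, pass a matching `M` (e.g. `M=3`) to the `gen` branch.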
+
+
+
+ + +
+
+ + Made with Sphinx and @pradyunsg's + + Furo + +
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/rank.html b/v2.6.6/_modules/cleanlab/rank.html new file mode 100644 index 000000000..c992ac00e --- /dev/null +++ b/v2.6.6/_modules/cleanlab/rank.html @@ -0,0 +1,1278 @@ + + + + + + + + + + + cleanlab.rank - cleanlab +
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.rank

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+
+"""
+Methods to rank examples in standard (multi-class) classification datasets by cleanlab's `label quality score`.
+Except for `~cleanlab.rank.order_label_issues`, which operates only on the subset of the data identified
+as potential label issues/errors, the methods in this module can be used on whichever subset
+of the dataset you choose (including the entire dataset) and provide a `label quality score` for
+every example. You can then do something like: ``np.argsort(label_quality_score)`` to obtain ranked
+indices of individual datapoints based on their quality.
+
+Note: multi-label classification is not supported by most methods in this module;
+each example must be labeled as belonging to a single class, e.g. format: ``labels = np.array([1,0,2,1,1,0...])``.
+For multi-label classification, instead see :py:func:`multilabel_classification.get_label_quality_scores <cleanlab.multilabel_classification.get_label_quality_scores>`.
+
+Note: Label quality scores are most accurate when they are computed based on out-of-sample `pred_probs` from your model.
+To obtain out-of-sample predicted probabilities for every datapoint in your dataset, you can use :ref:`cross-validation <pred_probs_cross_val>`, which is encouraged for better results.
+"""
+
+import numpy as np
+from sklearn.metrics import log_loss
+from typing import List, Optional
+import warnings
+
+from cleanlab.internal.validation import assert_valid_inputs
+from cleanlab.internal.constants import (
+    CLIPPING_LOWER_BOUND,
+)  # lower-bound clipping threshold to prevent 0 in logs and division
+
+from cleanlab.internal.label_quality_utils import (
+    _subtract_confident_thresholds,
+    get_normalized_entropy,
+)
+
+
+
[docs]def get_label_quality_scores( + labels: np.ndarray, + pred_probs: np.ndarray, + *, + method: str = "self_confidence", + adjust_pred_probs: bool = False, +) -> np.ndarray: + """Returns a label quality score for each datapoint. + + This is a function to compute label quality scores for standard (multi-class) classification datasets, + where lower scores indicate labels less likely to be correct. + + Score is between 0 and 1. + + 1 - clean label (given label is likely correct). + 0 - dirty label (given label is likely incorrect). + + Parameters + ---------- + labels : np.ndarray + A discrete vector of noisy labels, i.e. some labels may be erroneous. + *Format requirements*: for a dataset with K classes, labels must be in 0, 1, ..., K-1. + Note: multi-label classification is not supported by this method; each example must belong to a single class, e.g. format: ``labels = np.array([1,0,2,1,1,0...])``. + + pred_probs : np.ndarray + An array of shape ``(N, K)`` of model-predicted probabilities, + ``P(label=k|x)``. Each row of this matrix corresponds + to an example `x` and contains the model-predicted probabilities that + `x` belongs to each possible class, for each of the K classes. The + columns must be ordered such that these probabilities correspond to + class 0, 1, ..., K-1. + + **Note**: Returned label quality scores are most accurate when they are computed based on out-of-sample `pred_probs` from your model. + To obtain out-of-sample predicted probabilities for every datapoint in your dataset, you can use :ref:`cross-validation <pred_probs_cross_val>`. + Doing so is encouraged for better results. + + method : {"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default="self_confidence" + Label quality scoring method.
+ + Letting ``k = labels[i]`` and ``P = pred_probs[i]`` denote the given label and predicted class-probabilities + for datapoint *i*, its score can either be: + + - ``'normalized_margin'``: ``P[k] - max_{k' != k}[ P[k'] ]`` + - ``'self_confidence'``: ``P[k]`` + - ``'confidence_weighted_entropy'``: ``entropy(P) / self_confidence`` + + Note: the actual label quality scores returned by this method + may be transformed versions of the above, in order to ensure + their values lie between 0-1 with lower values indicating more likely mislabeled data. + + Let ``C = {0, 1, ..., K-1}`` be the set of classes specified for our classification task. + + The `normalized_margin` score works better for identifying class conditional label errors, + i.e. examples for which another label in ``C`` is appropriate but the given label is not. + + The `self_confidence` score works better for identifying alternative label issues + corresponding to bad examples that are: not from any of the classes in ``C``, + well-described by 2 or more labels in ``C``, + or generally just out-of-distribution (i.e. anomalous outliers). + + adjust_pred_probs : bool, optional + Account for class imbalance in the label-quality scoring by adjusting predicted probabilities + via subtraction of class confident thresholds and renormalization. + Set this to ``True`` if you prefer to account for class-imbalance. + See `Northcutt et al., 2021 <https://jair.org/index.php/jair/article/view/12125>`_. + + Returns + ------- + label_quality_scores : np.ndarray + Contains one score (between 0 and 1) per example. + Lower scores indicate more likely mislabeled examples. 
+ + See Also + -------- + get_self_confidence_for_each_label + get_normalized_margin_for_each_label + get_confidence_weighted_entropy_for_each_label + """ + + assert_valid_inputs( + X=None, y=labels, pred_probs=pred_probs, multi_label=False, allow_one_class=True + ) + return _compute_label_quality_scores( + labels=labels, pred_probs=pred_probs, method=method, adjust_pred_probs=adjust_pred_probs + )
+ + +def _compute_label_quality_scores( + labels: np.ndarray, + pred_probs: np.ndarray, + *, + method: str = "self_confidence", + adjust_pred_probs: bool = False, + confident_thresholds: Optional[np.ndarray] = None, +) -> np.ndarray: + """Internal implementation of get_label_quality_scores that assumes inputs + have already been checked and are valid. This speeds things up. + Can also take in pre-computed confident_thresholds to further accelerate things. + """ + scoring_funcs = { + "self_confidence": get_self_confidence_for_each_label, + "normalized_margin": get_normalized_margin_for_each_label, + "confidence_weighted_entropy": get_confidence_weighted_entropy_for_each_label, + } + try: + scoring_func = scoring_funcs[method] + except KeyError: + raise ValueError( + f""" + {method} is not a valid scoring method for rank_by! + Please choose a valid rank_by: self_confidence, normalized_margin, confidence_weighted_entropy + """ + ) + if adjust_pred_probs: + if method == "confidence_weighted_entropy": + raise ValueError(f"adjust_pred_probs is not currently supported for {method}.") + pred_probs = _subtract_confident_thresholds( + labels=labels, pred_probs=pred_probs, confident_thresholds=confident_thresholds + ) + + scoring_inputs = {"labels": labels, "pred_probs": pred_probs} + label_quality_scores = scoring_func(**scoring_inputs) + return label_quality_scores + + +
[docs]def get_label_quality_ensemble_scores( + labels: np.ndarray, + pred_probs_list: List[np.ndarray], + *, + method: str = "self_confidence", + adjust_pred_probs: bool = False, + weight_ensemble_members_by: str = "accuracy", + custom_weights: Optional[np.ndarray] = None, + log_loss_search_T_values: List[float] = [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 2e2], + verbose: bool = True, +) -> np.ndarray: + """Returns label quality scores based on predictions from an ensemble of models. + + This is a function to compute label-quality scores for classification datasets, + where lower scores indicate labels less likely to be correct. + + Ensemble scoring requires a list of pred_probs from each model in the ensemble. + + For each pred_probs in list, compute label quality score. + Take the average of the scores with the chosen weighting scheme determined by `weight_ensemble_members_by`. + + Score is between 0 and 1: + + - 1 --- clean label (given label is likely correct). + - 0 --- dirty label (given label is likely incorrect). + + Parameters + ---------- + labels : np.ndarray + Labels in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. + + pred_probs_list : List[np.ndarray] + Each element in this list should be an array of pred_probs in the same format + expected by the `~cleanlab.rank.get_label_quality_scores` function. + Each element of `pred_probs_list` corresponds to the predictions from one model for all examples. + + method : {"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default="self_confidence" + Label quality scoring method. See `~cleanlab.rank.get_label_quality_scores` + for scenarios on when to use each method. + + adjust_pred_probs : bool, optional + `adjust_pred_probs` in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. 
+ + weight_ensemble_members_by : {"uniform", "accuracy", "log_loss_search", "custom"}, default="accuracy" + Weighting scheme used to aggregate scores from each model: + + - "uniform": Take the simple average of scores. + - "accuracy": Take weighted average of scores, weighted by model accuracy. + - "log_loss_search": Take weighted average of scores, weighted by exp(t * -log_loss) where t is selected from log_loss_search_T_values parameter and log_loss is the log-loss between a model's pred_probs and the given labels. + - "custom": Take weighted average of scores using custom weights that the user passes to the custom_weights parameter. + + custom_weights : np.ndarray, default=None + Weights used to aggregate scores from each model if weight_ensemble_members_by="custom". + Length of this array must match the number of models: len(pred_probs_list). + + log_loss_search_T_values : List, default=[1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 2e2] + List of t values considered if weight_ensemble_members_by="log_loss_search". + We will choose the value of t that leads to weights which produce the best log-loss when used to form a weighted average of pred_probs from the models. + + verbose : bool, default=True + Set to ``False`` to suppress all print statements. + + Returns + ------- + label_quality_scores : np.ndarray + Contains one score (between 0 and 1) per example. + Lower scores indicate more likely mislabeled examples. + + See Also + -------- + get_label_quality_scores + """ + + # Check pred_probs_list for errors + assert isinstance( + pred_probs_list, list + ), f"pred_probs_list needs to be a list. Provided pred_probs_list is a {type(pred_probs_list)}" + + assert len(pred_probs_list) > 0, "pred_probs_list is empty." + + if len(pred_probs_list) == 1: + warnings.warn( + """ + pred_probs_list only has one element. + Consider using get_label_quality_scores() if you only have a single array of pred_probs. 
+ """ + ) + + for pred_probs in pred_probs_list: + assert_valid_inputs(X=None, y=labels, pred_probs=pred_probs, multi_label=False) + + # Raise ValueError if user passed custom_weights array but did not choose weight_ensemble_members_by="custom" + if custom_weights is not None and weight_ensemble_members_by != "custom": + raise ValueError( + f""" + custom_weights provided but weight_ensemble_members_by is not "custom"! + """ + ) + + # This weighting scheme searches over t in log_loss_search_T_values for the "best" log loss + if weight_ensemble_members_by == "log_loss_search": + # Initialize variables for log loss search + pred_probs_avg_log_loss_weighted = None + neg_log_loss_weights = None + best_eval_log_loss = float("inf") + + for t in log_loss_search_T_values: + neg_log_loss_list = [] + + # pred_probs for each model + for pred_probs in pred_probs_list: + pred_probs_clipped = np.clip( + pred_probs, a_min=CLIPPING_LOWER_BOUND, a_max=None + ) # lower-bound clipping threshold to prevent 0 in logs when calculating log loss + pred_probs_clipped /= pred_probs_clipped.sum(axis=1)[:, np.newaxis] # renormalize + + neg_log_loss = np.exp(-t * log_loss(labels, pred_probs_clipped)) + neg_log_loss_list.append(neg_log_loss) + + # weights using negative log loss + neg_log_loss_weights_temp = np.array(neg_log_loss_list) / sum(neg_log_loss_list) + + # weighted average using negative log loss + pred_probs_avg_log_loss_weighted_temp = sum( + [neg_log_loss_weights_temp[i] * p for i, p in enumerate(pred_probs_list)] + ) + # evaluate log loss with this weighted average pred_probs + eval_log_loss = log_loss(labels, pred_probs_avg_log_loss_weighted_temp) + + # check if eval_log_loss is the best so far (lower the better) + if best_eval_log_loss > eval_log_loss: + best_eval_log_loss = eval_log_loss + pred_probs_avg_log_loss_weighted = pred_probs_avg_log_loss_weighted_temp + neg_log_loss_weights = neg_log_loss_weights_temp.copy() + + # Generate scores for each model's pred_probs +
scores_list = [] + accuracy_list = [] + for pred_probs in pred_probs_list: + # Calculate scores and accuracy + scores = get_label_quality_scores( + labels=labels, + pred_probs=pred_probs, + method=method, + adjust_pred_probs=adjust_pred_probs, + ) + scores_list.append(scores) + + # Only compute if weighting by accuracy + if weight_ensemble_members_by == "accuracy": + accuracy = (pred_probs.argmax(axis=1) == labels).mean() + accuracy_list.append(accuracy) + + if verbose: + print(f"Weighting scheme for ensemble: {weight_ensemble_members_by}") + + # Transform list of scores into an array of shape (N, M) where M is the number of models in the ensemble + scores_ensemble = np.vstack(scores_list).T + + # Aggregate scores with chosen weighting scheme + if weight_ensemble_members_by == "uniform": + label_quality_scores = scores_ensemble.mean(axis=1) # Uniform weights (simple average) + + elif weight_ensemble_members_by == "accuracy": + weights = np.array(accuracy_list) / sum(accuracy_list) # Weight by relative accuracy + if verbose: + print("Ensemble members will be weighted by their relative accuracy") + for i, acc in enumerate(accuracy_list): + print(f" Model {i} accuracy : {acc}") + print(f" Model {i} weight : {weights[i]}") + + # Aggregate scores with weighted average + label_quality_scores = (scores_ensemble * weights).sum(axis=1) + + elif weight_ensemble_members_by == "log_loss_search": + assert neg_log_loss_weights is not None + weights = neg_log_loss_weights # Weight by exp(t * -log_loss) where t is found by searching through log_loss_search_T_values + if verbose: + print( + "Ensemble members will be weighted by log-loss between their predicted probabilities and given labels" + ) + for i, weight in enumerate(weights): + print(f" Model {i} weight : {weight}") + + # Aggregate scores with weighted average + label_quality_scores = (scores_ensemble * weights).sum(axis=1) + + elif weight_ensemble_members_by == "custom": + # Check custom_weights for errors + assert ( + 
+ custom_weights is not None + ), "custom_weights is None! Please pass a valid custom_weights." + + assert len(custom_weights) == len( + pred_probs_list + ), "Length of custom_weights array must match the number of models: len(pred_probs_list)." + + # Aggregate scores with custom weights + label_quality_scores = (scores_ensemble * custom_weights).sum(axis=1) + + else: + raise ValueError( + f""" + {weight_ensemble_members_by} is not a valid weighting method for weight_ensemble_members_by! + Please choose a valid weight_ensemble_members_by: uniform, accuracy, log_loss_search, custom + """ + ) + + return label_quality_scores
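The aggregation step of the ensemble function can be sketched in plain Python. `self_confidence_scores`, `accuracy`, `log_loss_weights`, and `weighted_average` are hypothetical helpers, not cleanlab functions. As in the code above, accuracy weights and `exp(-t * log_loss)` weights are normalized to sum to 1, while the `custom` branch multiplies by `custom_weights` as given. The search over `t` itself is omitted here.

```python
import math


def self_confidence_scores(labels, pred_probs):
    # Probability the model assigns to each example's given label.
    return [row[y] for row, y in zip(pred_probs, labels)]


def accuracy(labels, pred_probs):
    # Fraction of examples whose argmax class matches the given label.
    hits = sum(max(range(len(row)), key=row.__getitem__) == y for row, y in zip(pred_probs, labels))
    return hits / len(labels)


def log_loss_weights(log_losses, t):
    # Weight each model by exp(-t * log_loss), then normalize to sum to 1.
    raw = [math.exp(-t * loss) for loss in log_losses]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_average(scores_per_model, weights):
    # scores_per_model[m][i] is model m's score for example i.
    n_examples = len(scores_per_model[0])
    return [sum(w * s[i] for w, s in zip(weights, scores_per_model)) for i in range(n_examples)]


labels = [0, 1, 1]
pred_probs_list = [
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],  # model A: accuracy 2/3
    [[0.7, 0.3], [0.4, 0.6], [0.3, 0.7]],  # model B: accuracy 1
]
accs = [accuracy(labels, p) for p in pred_probs_list]
weights = [a / sum(accs) for a in accs]  # approximately [0.4, 0.6]
scores = weighted_average([self_confidence_scores(labels, p) for p in pred_probs_list], weights)
print([round(s, 2) for s in scores])  # [0.78, 0.68, 0.58]
```

The more accurate model B dominates the average, which is why example 2 (mislabeled in model A's view) still ends up with the lowest score rather than an extreme one.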
+ + +
[docs]def find_top_issues(quality_scores: np.ndarray, *, top: int = 10) -> np.ndarray: + """Returns the sorted indices of the `top` issues in `quality_scores`, ordered from smallest to largest quality score + (i.e., from most to least likely to be an issue). For example, the first value returned is the index corresponding + to the smallest value in `quality_scores` (most likely to be an issue). The second value in the returned array is + the index corresponding to the second smallest value in `quality_scores` (second-most likely to be an issue), and so forth. + + This method assumes that `quality_scores` shares an index with some dataset such that the indices returned by this method + map to the examples in that dataset. + + Parameters + ---------- + quality_scores : + Array of shape ``(N,)``, where N is the number of examples, containing one quality score for each example in the dataset. + + top : + The number of indices to return. If None or greater than N, all N indices are returned. + + Returns + ------- + top_issue_indices : + Indices of top examples most likely to suffer from an issue (ranked by issue severity).""" + + if top is None or top > len(quality_scores): + top = len(quality_scores) + + top_outlier_indices = quality_scores.argsort()[:top] + return top_outlier_indices
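A pure-Python mirror of `find_top_issues` makes the ordering concrete; `find_top_issues_sketch` is a hypothetical name, not part of cleanlab. One difference to keep in mind: Python's `sorted` is stable, whereas NumPy's default `argsort` kind (quicksort) is not guaranteed to be, so examples with tied scores may come back in a different order.

```python
def find_top_issues_sketch(quality_scores, top=10):
    """Indices of the `top` lowest scores, most likely issue first."""
    if top is None or top > len(quality_scores):
        top = len(quality_scores)
    # Sort indices by their score (ascending) and keep the first `top`.
    return sorted(range(len(quality_scores)), key=quality_scores.__getitem__)[:top]


scores = [0.9, 0.05, 0.6, 0.3]
print(find_top_issues_sketch(scores, top=2))  # [1, 3]
```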
+ + +
[docs]def order_label_issues( + label_issues_mask: np.ndarray, + labels: np.ndarray, + pred_probs: np.ndarray, + *, + rank_by: str = "self_confidence", + rank_by_kwargs: dict = {}, +) -> np.ndarray: + """Sorts label issues by label quality score. + + Default label quality score is "self_confidence". + + Parameters + ---------- + label_issues_mask : np.ndarray + A boolean mask for the entire dataset where ``True`` represents a label + issue and ``False`` represents an example that is accurately labeled with + high confidence. + + labels : np.ndarray + Labels in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. + + pred_probs : np.ndarray (shape (N, K)) + Predicted-probabilities in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. + + rank_by : str, optional + Score by which to order label error indices (in increasing order). See + the `method` argument of `~cleanlab.rank.get_label_quality_scores`. + + rank_by_kwargs : dict, optional + Optional keyword arguments to pass into `~cleanlab.rank.get_label_quality_scores` function. + Accepted args include `adjust_pred_probs`. + + Returns + ------- + label_issues_idx : np.ndarray + Return an array of the indices of the examples with label issues, + ordered by the label-quality scoring method passed to `rank_by`. 
+ """ + + allow_one_class = False + if isinstance(labels, np.ndarray) or all(isinstance(lab, int) for lab in labels): + if set(labels) == {0}: # occurs with missing classes in multi-label settings + allow_one_class = True + assert_valid_inputs( + X=None, + y=labels, + pred_probs=pred_probs, + multi_label=False, + allow_one_class=allow_one_class, + ) + + # Convert bool mask to index mask + label_issues_idx = np.arange(len(labels))[label_issues_mask] + + # Calculate label quality scores + label_quality_scores = get_label_quality_scores( + labels, pred_probs, method=rank_by, **rank_by_kwargs + ) + + # Get label quality scores for label issues + label_quality_scores_issues = label_quality_scores[label_issues_mask] + + return label_issues_idx[np.argsort(label_quality_scores_issues)]
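After input validation, `order_label_issues` reduces to two steps: keep the indices flagged in the mask, then sort them by their label quality scores. A plain-Python sketch of just those steps, where `order_flagged_by_score` is a hypothetical helper that takes precomputed scores instead of calling `get_label_quality_scores`:

```python
def order_flagged_by_score(label_issues_mask, quality_scores):
    # Convert the boolean mask to indices, as label_issues_idx does above.
    flagged = [i for i, is_issue in enumerate(label_issues_mask) if is_issue]
    # Lowest score first = most likely mislabeled first.
    return sorted(flagged, key=lambda i: quality_scores[i])


mask = [True, False, True, True]
scores = [0.4, 0.1, 0.05, 0.7]
print(order_flagged_by_score(mask, scores))  # [2, 0, 3]
```

Example 1 has a low score but is not flagged, so it is not returned: the ranking only orders the issues already in the mask and never adds new ones.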
+ + +
[docs]def get_self_confidence_for_each_label( + labels: np.ndarray, + pred_probs: np.ndarray, +) -> np.ndarray: + """Returns the self-confidence label-quality score for each datapoint. + + This is a function to compute label-quality scores for classification datasets, + where lower scores indicate labels less likely to be correct. + + The self-confidence is the classifier's predicted probability that an example belongs to + its given class label. + + Self-confidence can work better than normalized-margin for detecting label errors due to out-of-distribution (OOD) or weird examples + vs. label errors in which labels for random examples have been replaced by other classes. + + Parameters + ---------- + labels : np.ndarray + Labels in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. + + pred_probs : np.ndarray + Predicted-probabilities in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. + + Returns + ------- + label_quality_scores : np.ndarray + Contains one score (between 0 and 1) per example. + Lower scores indicate more likely mislabeled examples. + """ + + # To make this work for multi-label (but it will slow down runtime), return: + # np.array([np.mean(pred_probs[i, l]) for i, l in enumerate(labels)]) + return pred_probs[np.arange(labels.shape[0]), labels]
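The single indexing line above, `pred_probs[np.arange(N), labels]`, picks each row's probability for that row's given label. The same lookup in plain Python, using a hypothetical helper name:

```python
def self_confidence(labels, pred_probs):
    # Row i contributes pred_probs[i][labels[i]].
    return [row[y] for row, y in zip(pred_probs, labels)]


labels = [0, 1, 2]
pred_probs = [
    [0.8, 0.1, 0.1],  # model agrees with the given label
    [0.7, 0.2, 0.1],  # model prefers class 0 over the given label 1
    [0.3, 0.3, 0.4],  # ambiguous example
]
print(self_confidence(labels, pred_probs))  # [0.8, 0.2, 0.4]
```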
+ + +
[docs]def get_normalized_margin_for_each_label( + labels: np.ndarray, + pred_probs: np.ndarray, +) -> np.ndarray: + """Returns the "normalized margin" label-quality score for each datapoint. + + This is a function to compute label-quality scores for classification datasets, + where lower scores indicate labels less likely to be correct. + + Letting ``k`` denote the given label for a datapoint, the margin is + ``(p(label = k) - max(p(label != k)))``, i.e. the probability + of the given label minus the probability of the argmax label that is not + the given label (``margin = prob_label - max_prob_not_label``). + This gives you an idea of how likely an example is BOTH its given label AND not another label, + and therefore, scores its likelihood of being a good label or a label error. + The normalized margin is simply a transformed version of the margin, + to ensure values between 0-1 with lower values indicating more likely mislabeled data. + + Normalized margin works best for finding class conditional label errors where + there is another label in the set of classes that is clearly better than the given label. + + Parameters + ---------- + labels : np.ndarray + Labels in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. + + pred_probs : np.ndarray + Predicted-probabilities in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. + + Returns + ------- + label_quality_scores : np.ndarray + Contains one score (between 0 and 1) per example. + Lower scores indicate more likely mislabeled examples. + """ + + self_confidence = get_self_confidence_for_each_label(labels, pred_probs) + N, K = pred_probs.shape + del_indices = np.arange(N) * K + labels + max_prob_not_label = np.max( + np.delete(pred_probs, del_indices, axis=None).reshape(N, K - 1), axis=-1 + ) + label_quality_scores = (self_confidence - max_prob_not_label + 1) / 2 + return label_quality_scores
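Worked through by hand, the normalized margin is `(P[k] - max_{k' != k} P[k'] + 1) / 2`, which maps the raw margin from [-1, 1] onto [0, 1]. A plain-Python sketch of the same computation follows; the helper name is hypothetical, and the code above does the same thing in vectorized form by removing each row's label entry with `np.delete` before taking the row max.

```python
def normalized_margin(labels, pred_probs):
    scores = []
    for row, y in zip(pred_probs, labels):
        # Highest probability among all classes other than the given label.
        max_other = max(p for k, p in enumerate(row) if k != y)
        scores.append((row[y] - max_other + 1) / 2)
    return scores


labels = [0, 1, 2]
pred_probs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]
print([round(s, 2) for s in normalized_margin(labels, pred_probs)])  # [0.85, 0.25, 0.55]
```

The second example scores lowest because another class (0) clearly beats its given label, which is the class-conditional error case this score is designed to catch.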
+ + +
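The `np.delete(..., axis=None)` call above does three things. It flattens `pred_probs`, removes the entry at flat index `i * K + labels[i]` in each row, and reshapes the result to `(N, K - 1)`. The row-wise max is therefore the largest probability among the *other* classes.

The sketch below cross-checks that approach against an equivalent masking approach. The helper `normalized_margin` is hypothetical and not part of cleanlab.

```python
import numpy as np


def normalized_margin(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    # Hypothetical helper: (p_given - max p_other + 1) / 2, using a mask instead of np.delete.
    rows = np.arange(pred_probs.shape[0])
    self_conf = pred_probs[rows, labels]
    others = pred_probs.astype(float)  # astype returns a copy, so the input is untouched
    others[rows, labels] = -np.inf  # exclude the given label from the max
    return (self_conf - others.max(axis=1) + 1) / 2


pred_probs = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2])

# Reference computation mirroring the np.delete approach in the source above.
N, K = pred_probs.shape
max_not_label = np.max(
    np.delete(pred_probs, np.arange(N) * K + labels, axis=None).reshape(N, K - 1), axis=-1
)
reference = (pred_probs[np.arange(N), labels] - max_not_label + 1) / 2

scores = normalized_margin(labels, pred_probs)  # approximately [0.9, 0.75, 0.25]
```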
def get_confidence_weighted_entropy_for_each_label( + labels: np.ndarray, pred_probs: np.ndarray +) -> np.ndarray: + """Returns the "confidence weighted entropy" label-quality score for each datapoint. + + This is a function to compute label-quality scores for classification datasets, + where lower scores indicate labels less likely to be correct. + + The "confidence weighted entropy" is defined as the normalized entropy divided by "self-confidence". + The returned values are a transformed version of this score (``log(s + 1) / s`` for a raw score ``s``), in order to + ensure values between 0 and 1 with lower values indicating more likely mislabeled data. + + Parameters + ---------- + labels : np.ndarray + Labels in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. + + pred_probs : np.ndarray + Predicted probabilities in the same format expected by the `~cleanlab.rank.get_label_quality_scores` function. + + Returns + ------- + label_quality_scores : np.ndarray + Contains one score (between 0 and 1) per example. + Lower scores indicate more likely mislabeled examples. + """ + + self_confidence = get_self_confidence_for_each_label(labels, pred_probs) + self_confidence = np.clip(self_confidence, a_min=CLIPPING_LOWER_BOUND, a_max=None) + + # Divide entropy by self confidence + label_quality_scores = get_normalized_entropy(pred_probs) / self_confidence + + # Rescale + clipped_scores = np.clip(label_quality_scores, a_min=CLIPPING_LOWER_BOUND, a_max=None) + label_quality_scores = np.log(label_quality_scores + 1) / clipped_scores + + return label_quality_scores
+
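The function above relies on `get_normalized_entropy` and `CLIPPING_LOWER_BOUND`, which are defined elsewhere in cleanlab and not shown here. The sketch below is a standalone approximation built on two assumptions:

* Normalized entropy is the row entropy divided by `log(K)`.
* The clipping bound is `1e-6`.

Both assumptions may differ from cleanlab's actual definitions. The helper names are hypothetical.

```python
import numpy as np

EPS = 1e-6  # assumed value; cleanlab's CLIPPING_LOWER_BOUND may differ


def normalized_entropy(pred_probs: np.ndarray) -> np.ndarray:
    # Assumed definition: row entropy divided by log(K), so values lie in [0, 1].
    p = np.clip(pred_probs, a_min=EPS, a_max=None)
    return -np.sum(p * np.log(p), axis=1) / np.log(pred_probs.shape[1])


def confidence_weighted_entropy(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    # Hypothetical re-implementation of the transform in the function above.
    self_conf = pred_probs[np.arange(len(labels)), labels]
    self_conf = np.clip(self_conf, a_min=EPS, a_max=None)
    raw = normalized_entropy(pred_probs) / self_conf  # large when uncertain or the label is unlikely
    # log(1 + s) / s lies between 0 and 1 and decreases as s grows.
    return np.log(raw + 1) / np.clip(raw, a_min=EPS, a_max=None)


pred_probs = np.array([
    [0.98, 0.01, 0.01],  # confident, agrees with the label
    [0.40, 0.30, 0.30],  # uncertain
    [0.05, 0.90, 0.05],  # confident, disagrees with the label
])
labels = np.array([0, 0, 0])
scores = confidence_weighted_entropy(labels, pred_probs)
```

The confidently contradicted third example should receive the lowest score.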
+
+
+
+ + +
+
+ + Made with Sphinx and @pradyunsg's + + Furo + +
+
+
+ + + + + + +
+
+
+ + + + + +
+
+ +
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/v2.6.6/_modules/cleanlab/regression/learn.html b/v2.6.6/_modules/cleanlab/regression/learn.html new file mode 100644 index 000000000..c03a2e332 --- /dev/null +++ b/v2.6.6/_modules/cleanlab/regression/learn.html @@ -0,0 +1,1566 @@ + + + + + + + + + + + cleanlab.regression.learn - cleanlab
+
+
+ +
+
+
cleanlab
+
+
+
+ +
+ +
+
+ +
+
+
+ + + + + + +
+
+ +
+ +
+
+ + + + + + + + +

+ + + + + + +

Source code for cleanlab.regression.learn

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+cleanlab can be used for learning with noisy data for any dataset and regression model.
+
+For regression tasks, the :py:class:`regression.learn.CleanLearning <cleanlab.regression.learn.CleanLearning>`
+class wraps any instance of an sklearn model to allow you to train more robust regression models,
+or use the model to identify corrupted values in the dataset.
+The wrapped model must adhere to the `sklearn estimator API
+<https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator>`_,
+meaning it must define three functions:
+
+* ``model.fit(X, y, sample_weight=None)``
+* ``model.predict(X)``
+* ``model.score(X, y, sample_weight=None)``
+
+where ``X`` contains the data (i.e. features, covariates, independent variables) and ``y`` contains the target
+value (i.e. label, response/dependent variable). The first index of ``X`` and of ``y`` should correspond to the different
+examples in the dataset, such that ``len(X) = len(y) = N`` (sample-size).
+
+Your model should be correctly clonable via
+`sklearn.base.clone <https://scikit-learn.org/stable/modules/generated/sklearn.base.clone.html>`_:
+cleanlab internally creates multiple instances of the model. If you, for example, manually wrap a
+PyTorch model, ensure that every call to the estimator's ``__init__()`` creates an independent
+instance of the model (for sklearn compatibility, the weights of neural network models should typically
+be initialized inside of ``model.fit()``).
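A quick way to check this requirement is to clone a fitted estimator and confirm the result. The copy should keep the hyperparameters but none of the fitted state. This sketch uses scikit-learn's `LinearRegression` purely as an example model.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

fitted = LinearRegression(fit_intercept=True).fit(X, y)
cloned = clone(fitted)  # new, unfitted estimator built from fitted.get_params()

# Same hyperparameters, but no learned attributes such as coef_.
same_params = cloned.get_params() == fitted.get_params()
has_fitted_state = hasattr(cloned, "coef_")
```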
+
+Example
+-------
+>>> from cleanlab.regression.learn import CleanLearning
+>>> from sklearn.linear_model import LinearRegression
+>>> cl = CleanLearning(model=LinearRegression())  # Pass in any model.
+>>> cl.fit(X, y_with_noise)
+>>> # Estimate the predictions as if you had trained without label issues.
+>>> predictions = cl.predict(X_test)
+
+If your model is not sklearn-compatible by default, standard packages may be able to adapt
+it. For example, you can adapt PyTorch models using `skorch <https://skorch.readthedocs.io/>`_
+and adapt Keras models using `SciKeras <https://www.adriangb.com/scikeras/>`_.
+
+If an adapter doesn't already exist, you can manually wrap your
+model to be sklearn-compatible. This is made easy by inheriting from
+`sklearn.base.BaseEstimator
+<https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html>`_:
+
+.. code:: python
+
+    from sklearn.base import BaseEstimator
+
+    class YourModel(BaseEstimator):
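+        def __init__(self):
+            pass
+        def fit(self, X, y, sample_weight=None):
+            return self  # sklearn convention: fit() returns the fitted estimator
+        def predict(self, X):
+            pass
+        def score(self, X, y, sample_weight=None):
+            pass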
+            
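Below is a filled-in version of the skeleton above: a hypothetical least-squares regressor built on `BaseEstimator`. It implements the three required methods and survives `sklearn.base.clone`. The name `LeastSquaresRegressor` is illustrative only and is not part of cleanlab or scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.metrics import r2_score


class LeastSquaresRegressor(BaseEstimator):
    # Hypothetical example model: ordinary least squares with an optional intercept.

    def __init__(self, fit_intercept: bool = True):
        # Only store hyperparameters here; sklearn.base.clone relies on this.
        self.fit_intercept = fit_intercept

    def _design(self, X):
        X = np.asarray(X, dtype=float).reshape(len(X), -1)
        if self.fit_intercept:
            X = np.hstack([np.ones((X.shape[0], 1)), X])
        return X

    def fit(self, X, y, sample_weight=None):
        A = self._design(X)
        y = np.asarray(y, dtype=float)
        if sample_weight is not None:
            # Weighted least squares: scale each row by sqrt(weight).
            w = np.sqrt(np.asarray(sample_weight, dtype=float))
            A, y = A * w[:, None], y * w
        # Learned state is created in fit(), never in __init__().
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.coef_

    def score(self, X, y, sample_weight=None):
        return r2_score(y, self.predict(X), sample_weight=sample_weight)


X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 3.0 * X.ravel() - 1.0
model = LeastSquaresRegressor().fit(X, y)
fresh = clone(model)  # unfitted copy with the same hyperparameters
```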
+"""
+
+from typing import Optional, Union, Tuple
+import inspect
+import warnings
+
+import math
+import numpy as np
+import pandas as pd
+
+import sklearn.base
+from sklearn.base import BaseEstimator
+from sklearn.model_selection import KFold
+from sklearn.linear_model import LinearRegression
+from sklearn.metrics import r2_score
+
+from cleanlab.typing import LabelLike
+from cleanlab.internal.constants import TINY_VALUE
+from cleanlab.internal.util import train_val_split, subset_X_y
+from cleanlab.internal.regression_utils import assert_valid_regression_inputs
+from cleanlab.internal.validation import labels_to_array
+
+
+
class CleanLearning(BaseEstimator): + """ + CleanLearning = Machine Learning with cleaned data (even when training on messy, error-ridden data). + + Automated and robust learning with noisy labels using any dataset and any regression model. + For regression tasks, this class trains a ``model`` with error-prone, noisy labels + as if the model had been instead trained on a dataset with perfect labels. + It achieves this by estimating which labels are noisy (you may also use CleanLearning solely for this estimation) + and then removing examples estimated to have noisy labels, such that a more robust copy of the same model can be + trained on the remaining clean data. + + Parameters + ---------- + model : + Any regression model implementing the `sklearn estimator API <https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator>`_, + defining the following functions: + + - ``model.fit(X, y)`` + - ``model.predict(X)`` + - ``model.score(X, y)`` + + Default model used is `sklearn.linear_model.LinearRegression + <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html>`_. + + cv_n_folds : + This class needs out-of-sample (holdout) predictions for every example, which it computes + via cross-validation. This argument sets the number of cross-validation + folds used to compute out-of-sample predictions for each example in ``X``. Default is 5; must be at least 2. + Larger values may produce better results, but require longer runtimes. + + n_boot : + Number of bootstrap resampling rounds used to estimate the model's epistemic uncertainty. + Default is 5. Larger values are expected to produce better results but require longer runtimes. + Set as 0 to skip estimating the epistemic uncertainty and get results faster. + + include_aleatoric_uncertainty : + Specifies if the aleatoric uncertainty should be estimated during label error detection. 
+ ``True`` by default, which is expected to produce better results but requires longer runtimes. + + verbose : + Controls how much output is printed. Set to ``False`` to suppress print statements. Default ``False``. + + seed : + Seed passed to ``np.random.seed`` to set the state of NumPy's global random number generator, which is used to split + the data. By default, uses the current ``np.random`` state. + """ + + def __init__( + self, + model: Optional[BaseEstimator] = None, + *, + cv_n_folds: int = 5, + n_boot: int = 5, + include_aleatoric_uncertainty: bool = True, + verbose: bool = False, + seed: Optional[int] = None, + ): + if model is None: + # Use linear regression if no model is provided. + model = LinearRegression() + + # Make sure the given regression model has the appropriate methods defined. + if not hasattr(model, "fit"): + raise ValueError("The model must define a .fit() method.") + if not hasattr(model, "predict"): + raise ValueError("The model must define a .predict() method.") + + if seed is not None: + np.random.seed(seed=seed) + + if n_boot < 0: + raise ValueError("n_boot cannot be a negative value") + if cv_n_folds < 2: + raise ValueError("cv_n_folds must be at least 2") + + self.model: BaseEstimator = model + self.seed: Optional[int] = seed + self.cv_n_folds: int = cv_n_folds + self.n_boot: int = n_boot + self.include_aleatoric_uncertainty: bool = include_aleatoric_uncertainty + self.verbose: bool = verbose + self.label_issues_df: Optional[pd.DataFrame] = None + self.label_issues_mask: Optional[np.ndarray] = None + self.k: Optional[float] = None # frac flagged as issue + +
    def fit( + self, + X: Union[np.ndarray, pd.DataFrame], + y: LabelLike, + *, + label_issues: Optional[Union[pd.DataFrame, np.ndarray]] = None, + sample_weight: Optional[np.ndarray] = None, + find_label_issues_kwargs: Optional[dict] = None, + model_kwargs: Optional[dict] = None, + model_final_kwargs: Optional[dict] = None, + ) -> BaseEstimator: + """ + Train regression ``model`` with error-prone, noisy labels as if the model had been instead trained + on a dataset with the correct labels. ``fit`` achieves this by first training ``model`` via + cross-validation on the noisy data, using the resulting out-of-fold predictions to identify label issues, + pruning the data with label issues, and finally training ``model`` on the remaining clean data. + + Parameters + ---------- + X : + Data features (i.e. covariates, independent variables), typically an array of shape ``(N, ...)``, + where N is the number of examples (sample-size). + Your ``model`` must be able to ``fit()`` and ``predict()`` data of this format. + + y : + An array of shape ``(N,)`` of noisy labels (i.e. target/response/dependent variable), where some values may be erroneous. + + label_issues : + Optional already-identified label issues in the dataset (if previously estimated). + Specify this to avoid re-estimating the label issues if already done. + If ``pd.DataFrame``, must be formatted as the one returned by: + :py:meth:`self.find_label_issues <cleanlab.regression.learn.CleanLearning.find_label_issues>` or + :py:meth:`self.get_label_issues <cleanlab.regression.learn.CleanLearning.get_label_issues>`. The DataFrame must + have a column named ``is_label_issue``. + + If ``np.ndarray``, the input must be a boolean mask of length ``N`` where examples that have label issues + have the value ``True``, and the rest of the examples have the value ``False``. + + sample_weight : + Optional array of weights with shape ``(N,)`` that are assigned to individual samples. 
Specifies how to weight the examples in + the loss function while training. + + find_label_issues_kwargs : + Optional keyword arguments to pass into :py:meth:`self.find_label_issues <cleanlab.regression.learn.CleanLearning.find_label_issues>`. + + model_kwargs : + Optional keyword arguments to pass into the model's ``fit()`` method. + + model_final_kwargs : + Optional extra keyword arguments to pass into the final model's ``fit()`` on the cleaned data, + but not the ``fit()`` in each fold of cross-validation on the noisy data. + The final ``fit()`` will also receive the arguments in ``model_kwargs``, but these may be overwritten + by values in ``model_final_kwargs``. This can be useful for training differently in the final ``fit()`` + than during cross-validation. + + Returns + ------- + self : CleanLearning + Fitted estimator that has all the same methods as any sklearn estimator. + + After calling ``self.fit()``, this estimator also stores extra attributes such as: + + - ``self.label_issues_df``: a ``pd.DataFrame`` containing label quality scores, boolean flags + indicating which examples have label issues, and predicted label values for each example. + Accessible via :py:meth:`self.get_label_issues <cleanlab.regression.learn.CleanLearning.get_label_issues>`, + of similar format as the one returned by :py:meth:`self.find_label_issues <cleanlab.regression.learn.CleanLearning.find_label_issues>`. + See documentation of :py:meth:`self.find_label_issues <cleanlab.regression.learn.CleanLearning.find_label_issues>` + for column descriptions. + - ``self.label_issues_mask``: a ``np.ndarray`` boolean mask indicating if a particular + example has been identified to have issues.
+ """ + assert_valid_regression_inputs(X, y) + + if find_label_issues_kwargs is None: + find_label_issues_kwargs = {} + if model_kwargs is None: + model_kwargs = {} + if model_final_kwargs is None: + model_final_kwargs = {} + model_final_kwargs = {**model_kwargs, **model_final_kwargs} + + if "sample_weight" in model_kwargs or "sample_weight" in model_final_kwargs: + raise ValueError( + "sample_weight should be provided directly in fit() rather than in model_kwargs or model_final_kwargs" + ) + + if sample_weight is not None: + if "sample_weight" not in inspect.signature(self.model.fit).parameters: + raise ValueError( + "sample_weight must be a supported fit() argument for your model in order to be specified here" + ) + if len(sample_weight) != len(X): + raise ValueError("sample_weight must be a 1D array that has the same length as y.") + + if label_issues is None: + if self.label_issues_df is not None and self.verbose: + print( + "If you already ran self.find_label_issues() and don't want to recompute, you " + "should pass the label_issues in as a parameter to this function next time." + ) + + label_issues = self.find_label_issues( + X, + y, + model_kwargs=model_kwargs, + **find_label_issues_kwargs, + ) + else: + if self.verbose: + print("Using provided label_issues instead of finding label issues.") + if self.label_issues_df is not None: + print( + "These will overwrite self.label_issues_df and will be returned by " + "`self.get_label_issues()`. 
" + ) + + self.label_issues_df = self._process_label_issues_arg(label_issues, y) + self.label_issues_mask = self.label_issues_df["is_label_issue"].to_numpy() + + X_mask = np.invert(self.label_issues_mask) + X_cleaned, y_cleaned = subset_X_y(X, y, X_mask) + if self.verbose: + print(f"Pruning {np.sum(self.label_issues_mask)} examples with label issues ...") + print(f"Remaining clean data has {len(y_cleaned)} examples.") + + if sample_weight is not None: + model_final_kwargs["sample_weight"] = sample_weight[X_mask] + if self.verbose: + print("Fitting final model on the clean data with custom sample_weight ...") + else: + if self.verbose: + print("Fitting final model on the clean data ...") + + self.model.fit(X_cleaned, y_cleaned, **model_final_kwargs) + + if self.verbose: + print( + "Label issues stored in label_issues_df DataFrame accessible via: self.get_label_issues(). " + "Call self.save_space() to delete this potentially large DataFrame attribute." + ) + return self
+ +
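This is a simplified sketch of the last stage of `fit()`:

1. Drop the flagged examples.
2. Subset `sample_weight` with the same mask.
3. Refit the model.

Here a hand-made mask stands in for the output of `find_label_issues()`. `LinearRegression` is just an example model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X.ravel() + 1.0
y[:5] += 40.0  # corrupt a few targets

# Stand-in for label_issues_df["is_label_issue"]: flag exactly the corrupted rows.
label_issues_mask = np.zeros(len(y), dtype=bool)
label_issues_mask[:5] = True
sample_weight = np.ones(len(y))

keep = np.invert(label_issues_mask)  # same idea as X_mask in fit()
model = LinearRegression()
model.fit(X[keep], y[keep], sample_weight=sample_weight[keep])
```

Because the weights are subset with the same boolean mask as `X` and `y`, every remaining example keeps its own weight.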
    def predict(self, X: np.ndarray, *args, **kwargs) -> np.ndarray: + """ + Predict target values using your wrapped regression model. + Works just like ``model.predict()``. + + Parameters + ---------- + X : np.ndarray or DatasetLike + Test data in the same format expected by your wrapped regression model. + + Returns + ------- + predictions : np.ndarray + Predictions for the test examples. + """ + return self.model.predict(X, *args, **kwargs)
+ +
    def score( + self, + X: Union[np.ndarray, pd.DataFrame], + y: LabelLike, + sample_weight: Optional[np.ndarray] = None, + ) -> float: + """Evaluates your wrapped regression model's score on a test set `X` with target values `y`. + Uses your model's default scoring function, or the R-squared score if your model has no ``score`` attribute. + + Parameters + ---------- + X : + Test data in the same format expected by your wrapped model. + + y : + Test labels in the same format as labels previously used in ``fit()``. + + sample_weight : + Optional array of shape ``(N,)`` or ``(N, 1)`` used to weight each test example when computing the score. + + Returns + ------- + score : float + Number quantifying the performance of this regression model on the test data. + """ + if hasattr(self.model, "score"): + if "sample_weight" in inspect.signature(self.model.score).parameters: + return self.model.score(X, y, sample_weight=sample_weight) + else: + return self.model.score(X, y) + else: + return r2_score( + y, + self.model.predict(X), + sample_weight=sample_weight, + )
+ +
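Both `fit()` and `score()` check the wrapped model's method signature to decide whether to forward `sample_weight`. Below is a small sketch of that check. The helper `accepts_sample_weight` and the class `NoWeightModel` are hypothetical.

```python
import inspect

from sklearn.linear_model import LinearRegression


def accepts_sample_weight(method) -> bool:
    # True if the callable declares a parameter literally named "sample_weight".
    return "sample_weight" in inspect.signature(method).parameters


class NoWeightModel:
    # Hypothetical model whose score() takes no sample_weight argument.
    def score(self, X, y):
        return 0.0


lr_ok = accepts_sample_weight(LinearRegression().score)  # True
custom_ok = accepts_sample_weight(NoWeightModel().score)  # False
```

This check only detects an explicitly named parameter. A method that accepts `sample_weight` only through `**kwargs` would be reported as not supporting it.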
    def find_label_issues( + self, + X: Union[np.ndarray, pd.DataFrame], + y: LabelLike, + *, + uncertainty: Optional[Union[np.ndarray, float]] = None, + coarse_search_range: list = [0.01, 0.05, 0.1, 0.15, 0.2], + fine_search_size: int = 3, + save_space: bool = False, + model_kwargs: Optional[dict] = None, + ) -> pd.DataFrame: + """ + Identifies potential label issues (corrupted `y`-values) in the dataset, and estimates how noisy each label is. + + Note: this method estimates the label issues from scratch. To access previously-estimated label issues from + this :py:class:`CleanLearning <cleanlab.regression.learn.CleanLearning>` instance, use the + :py:meth:`self.get_label_issues <cleanlab.regression.learn.CleanLearning.get_label_issues>` method. + + This is the method called to find label issues inside + :py:meth:`CleanLearning.fit() <cleanlab.regression.learn.CleanLearning.fit>` + and the two methods share most of their parameters. + + Parameters + ---------- + X : + Data features (i.e. covariates, independent variables), typically an array of shape ``(N, ...)``, + where N is the number of examples (sample-size). + Your ``model`` must be able to ``fit()`` and ``predict()`` data of this format. + + y : + An array of shape ``(N,)`` of noisy labels (i.e. target/response/dependent variable), where some values may be erroneous. + + uncertainty : + Optional estimated uncertainty for each example. Should be passed in as a float (constant uncertainty throughout all examples), + or a numpy array of length ``N`` (estimated uncertainty for each example). + If not provided, this method will estimate the uncertainty as the sum of the epistemic and aleatoric uncertainty. + + save_space : + If ``True``, the returned ``label_issues_df`` will not be stored as an attribute. + This means some other methods like :py:meth:`self.get_label_issues <cleanlab.regression.learn.CleanLearning.get_label_issues>` will no longer work.
+ + coarse_search_range : + The coarse search range to find the value of ``k``, which estimates the fraction of data which have label issues. + More values represent a more thorough search (better expected results but longer runtimes). + + fine_search_size : + Size of fine-grained search grid to find the value of ``k``, which represents our estimate of the fraction of data which have label issues. + A higher number represents a more thorough search (better expected results but longer runtimes). + + + For info about the **other parameters**, see the docstring of :py:meth:`CleanLearning.fit() + <cleanlab.regression.learn.CleanLearning.fit>`. + + Returns + ------- + label_issues_df : pd.DataFrame + DataFrame with info about label issues for each example. + Unless `save_space` argument is specified, same DataFrame is also stored as `self.label_issues_df` attribute accessible via + :py:meth:`get_label_issues<cleanlab.regression.learn.CleanLearning.get_label_issues>`. + + Each row represents an example from our dataset and the DataFrame may contain the following columns: + + - *is_label_issue*: boolean mask for the entire dataset where ``True`` represents a label issue and ``False`` represents an example + that is accurately labeled with high confidence. + - *label_quality*: Numeric score that measures the quality of each label (how likely it is to be correct, + with lower scores indicating potentially erroneous labels). + - *given_label*: Values originally given for this example (same as `y` input). + - *predicted_label*: Values predicted by the trained model. 
+ """ + + X, y = assert_valid_regression_inputs(X, y) + + if model_kwargs is None: + model_kwargs = {} + + if self.verbose: + print("Identifying label issues ...") + + # compute initial values to find best k + initial_predictions = self._get_cv_predictions(X, y, model_kwargs=model_kwargs) + initial_residual = initial_predictions - y + initial_sorted_index = np.argsort(abs(initial_residual)) + initial_r2 = r2_score(y, initial_predictions) + + self.k, r2 = self._find_best_k( + X=X, + y=y, + sorted_index=initial_sorted_index, + coarse_search_range=coarse_search_range, + fine_search_size=fine_search_size, + ) + + # check if initial r2 score (ie. not removing anything) is the best + if initial_r2 >= r2: + self.k = 0 + + # get predictions using the best k + predictions = self._get_cv_predictions( + X, y, sorted_index=initial_sorted_index, k=self.k, model_kwargs=model_kwargs + ) + residual = predictions - y + + if uncertainty is None: + epistemic_uncertainty = self.get_epistemic_uncertainty(X, y, predictions=predictions) + if self.include_aleatoric_uncertainty: + aleatoric_uncertainty = self.get_aleatoric_uncertainty(X, residual) + else: + aleatoric_uncertainty = 0 + uncertainty = epistemic_uncertainty + aleatoric_uncertainty + else: + if isinstance(uncertainty, np.ndarray) and len(y) != len(uncertainty): + raise ValueError( + "If uncertainty is passed in as an array, it must have the same length as y." 
+ ) + + residual_adjusted = abs(residual / (uncertainty + TINY_VALUE)) + + # adjust lqs by the median (for more human-readable scores) + residual_median = max( + np.median(residual_adjusted), TINY_VALUE + ) # take the max to prevent median = 0 + label_quality_scores = np.exp(-residual_adjusted / residual_median) + + label_issues_mask = np.zeros(len(y), dtype=bool) + num_issues = math.ceil(len(y) * self.k) + issues_index = np.argsort(label_quality_scores)[:num_issues] + label_issues_mask[issues_index] = True + + # convert predictions to int if input is int + if y.dtype == int: + predictions = predictions.astype(int) + + label_issues_df = pd.DataFrame( + { + "is_label_issue": label_issues_mask, + "label_quality": label_quality_scores, + "given_label": y, + "predicted_label": predictions, + } + ) + + if self.verbose: + print(f"Identified {np.sum(label_issues_mask)} examples with label issues.") + + if not save_space: + if self.label_issues_df is not None and self.verbose: + print( + "Overwriting previously identified label issues stored at self.label_issues_df. " + "self.get_label_issues() will now return the newly identified label issues. " + ) + self.label_issues_df = label_issues_df + self.label_issues_mask = label_issues_df["is_label_issue"].to_numpy() + elif self.verbose: + print("Not storing label_issues as attributes since save_space was specified.") + + return label_issues_df
+ +
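Here is the scoring arithmetic from `find_label_issues()` on its own:

1. Divide each absolute residual by its uncertainty.
2. Divide by the median of these adjusted residuals, so the median example scores `exp(-1) ≈ 0.37` when `N` is odd.
3. Map through `exp(-x)`.
4. Flag the `ceil(N * k)` lowest-scoring examples.

This standalone sketch uses made-up residuals and a constant uncertainty. `TINY` stands in for cleanlab's `TINY_VALUE`, whose exact value is assumed here.

```python
import math

import numpy as np

TINY = 1e-100  # assumed stand-in for cleanlab.internal.constants.TINY_VALUE

residual = np.array([0.1, -0.2, 0.15, 3.0, -0.05])  # predictions - y
uncertainty = 0.5  # a float applies the same uncertainty to every example
k = 0.2  # estimated fraction of examples with label issues

residual_adjusted = np.abs(residual / (uncertainty + TINY))
median = max(np.median(residual_adjusted), TINY)  # guard against a zero median
label_quality = np.exp(-residual_adjusted / median)

num_issues = math.ceil(len(residual) * k)
is_label_issue = np.zeros(len(residual), dtype=bool)
is_label_issue[np.argsort(label_quality)[:num_issues]] = True
```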
    def get_label_issues(self) -> Optional[pd.DataFrame]: + """ + Accessor that returns the ``label_issues_df`` attribute if previously computed (otherwise it warns and returns ``None``). + This ``pd.DataFrame`` describes the issues identified for each example (each row corresponds to an example). + For column definitions, see the documentation of + :py:meth:`CleanLearning.find_label_issues<cleanlab.regression.learn.CleanLearning.find_label_issues>`. + + Returns + ------- + label_issues_df : pd.DataFrame + DataFrame with (precomputed) info about the label issues for each example. + """ + if self.label_issues_df is None: + warnings.warn( + "Label issues have not yet been computed. Run `self.find_label_issues()` or `self.fit()` first." + ) + return self.label_issues_df
+ +
    def get_epistemic_uncertainty( + self, + X: np.ndarray, + y: np.ndarray, + predictions: Optional[np.ndarray] = None, + ) -> np.ndarray: + """ + Compute the epistemic uncertainty of the regression model for each example. This uncertainty is estimated from the bootstrapped + variance of the model predictions, i.e. the standard deviation of out-of-fold predictions across ``n_boot`` runs of 2-fold + cross-validation, each with a random split. + + Parameters + ---------- + X : + Data features (i.e. training inputs for ML), typically an array of shape ``(N, ...)``, where N is the number of examples. + + y : + An array of shape ``(N,)`` of target values (dependent variables), where some values may be erroneous. + + predictions : + Optional model-predicted values of ``y``, used as one extra bootstrap iteration when calculating the variance. + + Returns + ------- + epistemic_uncertainty : np.ndarray + The estimated epistemic uncertainty for each example. + """ + X, y = assert_valid_regression_inputs(X, y) + + if self.n_boot == 0: # does not estimate epistemic uncertainty + return np.zeros(len(y)) + else: + bootstrap_predictions = np.zeros(shape=(len(y), self.n_boot)) + for i in range(self.n_boot): + bootstrap_predictions[:, i] = self._get_cv_predictions(X, y, cv_n_folds=2) + + # add a set of predictions from model that was already trained + if predictions is not None: + _, predictions = assert_valid_regression_inputs(X, predictions) + bootstrap_predictions = np.hstack( + [bootstrap_predictions, predictions.reshape(-1, 1)] + ) + + return np.sqrt(np.var(bootstrap_predictions, axis=1))
+ +
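The method above repeats 2-fold cross-validation `n_boot` times and takes the per-example standard deviation of the resulting out-of-fold predictions. This is a rough equivalent that makes two changes:

* It substitutes scikit-learn's `cross_val_predict` for cleanlab's internal `_get_cv_predictions`.
* It fixes the split seeds so the result is reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.3, size=40)

n_boot = 5
runs = np.zeros((len(y), n_boot))
for i in range(n_boot):
    # A different shuffle per round yields a different 2-fold split.
    cv = KFold(n_splits=2, shuffle=True, random_state=i)
    runs[:, i] = cross_val_predict(LinearRegression(), X, y, cv=cv)

# Per-example spread of the predictions across rounds.
epistemic_uncertainty = np.sqrt(np.var(runs, axis=1))
```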
    def get_aleatoric_uncertainty( + self, + X: np.ndarray, + residual: np.ndarray, + ) -> float: + """ + Compute the aleatoric uncertainty of the data. This uncertainty is estimated by training the model (via cross-validation) + to predict the residuals from ``X``, and taking the standard deviation of these predicted residuals. + + Parameters + ---------- + X : + Data features (i.e. training inputs for ML), typically an array of shape ``(N, ...)``, where N is the number of examples. + + residual : + The difference between the model-predicted value and the given value of each example, i.e. + ``predictions - y``. + + Returns + ------- + aleatoric_uncertainty : float + The overall estimated aleatoric uncertainty for this dataset. + """ + X, residual = assert_valid_regression_inputs(X, residual) + residual_predictions = self._get_cv_predictions(X, residual) + return np.sqrt(np.var(residual_predictions))
+ +
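Unlike the epistemic estimate, the aleatoric estimate is a single number for the whole dataset. The model is trained to predict the residuals from `X` out-of-fold, and the standard deviation of those predicted residuals is taken. This minimal sketch uses the same `cross_val_predict` substitution for cleanlab's internal helper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(60, 1))
y = 4.0 * X.ravel() + rng.normal(scale=0.2, size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
predictions = cross_val_predict(LinearRegression(), X, y, cv=cv)
residual = predictions - y  # same sign convention as the docstring above

# Regress the residuals on X out-of-fold, then summarize their predicted spread.
residual_predictions = cross_val_predict(LinearRegression(), X, residual, cv=cv)
aleatoric_uncertainty = float(np.sqrt(np.var(residual_predictions)))
```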
    def save_space(self): + """ + Clears non-sklearn attributes of this estimator to save space (in-place). + This includes the DataFrame attribute that stores label issues, which may be large for big datasets. + You may want to call this method before deploying this model (i.e. if you just care about producing predictions). + After calling this method, certain non-prediction-related attributes/functionality will no longer be available. + """ + if self.label_issues_df is None and self.verbose: + print("self.label_issues_df is already empty") + + self.label_issues_df = None + self.label_issues_mask = None + self.k = None + + if self.verbose: + print("Deleted non-sklearn attributes such as label_issues_df to save space.")
+ + def _get_cv_predictions( + self, + X: np.ndarray, + y: np.ndarray, + sorted_index: Optional[np.ndarray] = None, + k: float = 0, + *, + cv_n_folds: Optional[int] = None, + seed: Optional[int] = None, + model_kwargs: Optional[dict] = None, + ) -> np.ndarray: + """ + Helper method to get out-of-fold predictions using cross-validation. + This method also allows us to hold out the fraction ``k`` of examples with the largest residuals (those most likely + to have label errors) before training the cross-validation models; predictions for these held-out examples are the average + of the fold models' predictions (both ``sorted_index`` and ``k`` have to be provided for this). + + Parameters + ---------- + X : + Data features (i.e. training inputs for ML), typically an array of shape ``(N, ...)``, where N is the number of examples. + + y : + An array of shape ``(N,)`` of target values (dependent variables), where some values may be erroneous. + + sorted_index : + Index of each example sorted by the absolute value of their residuals in ascending order. + + k : + The fraction of examples to hold out from the training sets. Usually this is the fraction of examples that are + deemed to contain errors. + + """ + # set to default unless specified otherwise + if cv_n_folds is None: + cv_n_folds = self.cv_n_folds + + if model_kwargs is None: + model_kwargs = {} + + if k < 0 or k > 1: + raise ValueError("k must be a value between 0 and 1") + elif k == 0: + if sorted_index is None: + sorted_index = np.array(range(len(y))) + in_sample_idx = sorted_index + else: + if sorted_index is None: + # TODO: better error message + raise ValueError( + "You need to pass in the index sorted by prediction quality to use with k" + ) + num_to_drop = math.ceil(len(sorted_index) * k) + in_sample_idx = sorted_index[:-num_to_drop] + out_of_sample_idx = sorted_index[-num_to_drop:] + + X_out_of_sample = X[out_of_sample_idx] + out_of_sample_predictions = np.zeros(shape=[len(out_of_sample_idx), cv_n_folds]) + + if len(in_sample_idx) < cv_n_folds: + raise ValueError( + f"There are too few examples to conduct {cv_n_folds}-fold cross validation. 
" + "You can either reduce cv_n_folds for cross validation, or decrease k to exclude fewer examples." + ) + + predictions = np.zeros(shape=len(y)) + + kf = KFold(n_splits=cv_n_folds, shuffle=True, random_state=seed) + + for k_split, (cv_train_idx, cv_holdout_idx) in enumerate(kf.split(in_sample_idx)): + try: + model_copy = sklearn.base.clone(self.model) # fresh untrained copy of the model + except Exception: + raise ValueError( + "`model` must be clonable via: sklearn.base.clone(model). " + "You can either implement instance method `model.get_params()` to produce a fresh untrained copy of this model, " + "or you can compute out-of-sample predictions outside of cleanlab " + "and pass them to `cleanlab.regression.rank.get_label_quality_scores` to skip cleanlab's internal cross-validation" + ) + + # map the index to the actual index in the original dataset + data_idx_train, data_idx_holdout = ( + in_sample_idx[cv_train_idx], + in_sample_idx[cv_holdout_idx], + ) + + X_train_cv, X_holdout_cv, y_train_cv, y_holdout_cv = train_val_split( + X, y, data_idx_train, data_idx_holdout + ) + + model_copy.fit(X_train_cv, y_train_cv, **model_kwargs) + predictions_cv = model_copy.predict(X_holdout_cv) + + predictions[data_idx_holdout] = predictions_cv + + if k != 0: + out_of_sample_predictions[:, k_split] = model_copy.predict(X_out_of_sample) + + if k != 0: + out_of_sample_predictions_avg = np.mean(out_of_sample_predictions, axis=1) + predictions[out_of_sample_idx] = out_of_sample_predictions_avg + + return predictions + + def _find_best_k( + self, + X: np.ndarray, + y: np.ndarray, + sorted_index: np.ndarray, + coarse_search_range: list = [0.01, 0.05, 0.1, 0.15, 0.2], + fine_search_size: int = 3, + ) -> Tuple[float, float]: + """ + Helper method that conducts a coarse-grained and then fine-grained grid search to determine the best value + of k, the fraction of the dataset that contains issues. + + Returns a tuple containing the best value of k (i.e. 
the one that has the best r squared score), + and the corresponding r squared score obtained when dropping a fraction k of the data. + """ + if len(coarse_search_range) == 0: + raise ValueError("coarse_search_range must have at least 1 value of k") + elif len(coarse_search_range) == 1: + curr_k = coarse_search_range[0] + num_examples_kept = math.floor(len(y) * (1 - curr_k)) + if num_examples_kept < self.cv_n_folds: + raise ValueError( + f"There are too few examples to conduct {self.cv_n_folds}-fold cross validation. " + "You can either reduce self.cv_n_folds for cross validation, or decrease k to exclude fewer examples." + ) + predictions = self._get_cv_predictions( + X=X, + y=y, + sorted_index=sorted_index, + k=curr_k, + ) + best_r2 = r2_score(y, predictions) + best_k = coarse_search_range[0] + else: + # conduct coarse search + coarse_search_range = sorted(coarse_search_range) # sort to conduct fine search well + r2_coarse = np.full(len(coarse_search_range), np.nan) + for i in range(len(coarse_search_range)): + curr_k = coarse_search_range[i] + num_examples_kept = math.floor(len(y) * (1 - curr_k)) + # check if there are too few examples to do cross val + if num_examples_kept < self.cv_n_folds: + r2_coarse[i] = -1e30 # arbitrary large negative number + else: + predictions = self._get_cv_predictions( + X=X, + y=y, + sorted_index=sorted_index, + k=curr_k, + ) + r2_coarse[i] = r2_score(y, predictions) + + max_r2_ind = np.argmax(r2_coarse) + + # conduct fine search + if fine_search_size < 0: + raise ValueError("fine_search_size must be at least 0") + elif fine_search_size == 0: + best_k = coarse_search_range[np.argmax(r2_coarse)] + best_r2 = np.max(r2_coarse) + else: + fine_search_range = np.array([]) + if max_r2_ind != 0: + fine_search_range = np.append( + np.linspace( + coarse_search_range[max_r2_ind - 1], + coarse_search_range[max_r2_ind], + fine_search_size + 1, + endpoint=False, + )[1:], + fine_search_range, + ) + if max_r2_ind != len(coarse_search_range) - 1: + fine_search_range = 
np.append( + fine_search_range, + np.linspace( + coarse_search_range[max_r2_ind], + coarse_search_range[max_r2_ind + 1], + fine_search_size + 1, + endpoint=False, + )[1:], + ) + + r2_fine = np.full(len(fine_search_range), np.nan) + for i in range(len(fine_search_range)): + curr_k = fine_search_range[i] + num_examples_kept = math.floor(len(y) * (1 - curr_k)) + # check if there are too few examples to do cross val + if num_examples_kept < self.cv_n_folds: + r2_fine[i] = -1e30 # arbitrary large negative number + else: + predictions = self._get_cv_predictions( + X=X, + y=y, + sorted_index=sorted_index, + k=curr_k, + ) + r2_fine[i] = r2_score(y, predictions) + + # check the max between coarse and fine search + if max(r2_coarse) > max(r2_fine): + best_k = coarse_search_range[np.argmax(r2_coarse)] + best_r2 = np.max(r2_coarse) + else: + best_k = fine_search_range[np.argmax(r2_fine)] + best_r2 = np.max(r2_fine) + + return best_k, best_r2 + + def _process_label_issues_arg( + self, + label_issues: Union[pd.DataFrame, pd.Series, np.ndarray], + y: LabelLike, + ) -> pd.DataFrame: + """ + Helper method to process the label_issues input into a well-formatted DataFrame. + """ + y = labels_to_array(y) + + if isinstance(label_issues, pd.DataFrame): + if "is_label_issue" not in label_issues.columns: + raise ValueError( + "DataFrame label_issues must contain column: 'is_label_issue'. " + "See CleanLearning.fit() documentation for label_issues column descriptions." 
+ ) + if len(label_issues) != len(y): + raise ValueError("label_issues and labels must have same length") + if "given_label" in label_issues.columns and np.any( + label_issues["given_label"].to_numpy() != y + ): + raise ValueError("labels must match label_issues['given_label']") + return label_issues + + elif isinstance(label_issues, (pd.Series, np.ndarray)): + if label_issues.dtype is not np.dtype("bool"): + raise ValueError("If label_issues is numpy.array, dtype must be 'bool'.") + if label_issues.shape != y.shape: + raise ValueError("label_issues must have same shape as labels") + return pd.DataFrame({"is_label_issue": label_issues, "given_label": y}) + + else: + raise ValueError( + "label_issues must be either pandas.DataFrame, pandas.Series or numpy.ndarray" + )
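The coarse-to-fine search for `k` above builds its fine grid with `np.linspace(..., fine_search_size + 1, endpoint=False)[1:]`, which yields `fine_search_size` points strictly between the best coarse value and each of its immediate neighbors. The sketch below isolates just that grid construction; `fine_search_grid` is a hypothetical helper name (not part of cleanlab), and it assumes `coarse` is already sorted ascending, as the method sorts it before searching.

```python
import numpy as np


def fine_search_grid(coarse, best_idx, fine_search_size):
    """Hypothetical helper: candidate k values strictly between coarse[best_idx]
    and its immediate neighbors. Assumes `coarse` is sorted ascending."""
    grid = []
    if best_idx != 0:
        # fine_search_size points strictly between the left neighbor and the best value
        left = np.linspace(
            coarse[best_idx - 1], coarse[best_idx], fine_search_size + 1, endpoint=False
        )[1:]
        grid.extend(left)
    if best_idx != len(coarse) - 1:
        # fine_search_size points strictly between the best value and the right neighbor
        right = np.linspace(
            coarse[best_idx], coarse[best_idx + 1], fine_search_size + 1, endpoint=False
        )[1:]
        grid.extend(right)
    return np.array(grid)


coarse = [0.0, 0.1, 0.2]
print(fine_search_grid(coarse, best_idx=1, fine_search_size=2))
```

With `best_idx=1` and `fine_search_size=2` this yields four candidates, two between 0.0 and 0.1 and two between 0.1 and 0.2; the coarse values themselves are excluded because they were already scored. Each fine candidate is then scored by cross-validated R², and the best over both grids is kept.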
+ Made with Sphinx and @pradyunsg's Furo
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/regression/rank.html b/v2.6.6/_modules/cleanlab/regression/rank.html
new file mode 100644
index 000000000..dfd4ad292
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/regression/rank.html
@@ -0,0 +1,869 @@
+cleanlab.regression.rank - cleanlab
cleanlab
Source code for cleanlab.regression.rank

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+
+"""
+Methods to score the quality of each label in a regression dataset. These can be used to rank the examples whose Y-value is most likely erroneous.
+
+Note: Label quality scores are most accurate when they are computed based on out-of-sample `predictions` from your regression model.
+To obtain out-of-sample predictions for every datapoint in your dataset, you can use :ref:`cross-validation <pred_probs_cross_val>`. This is encouraged to get better results.
+
+If you have a sklearn-compatible regression model, consider using `cleanlab.regression.learn.CleanLearning` instead, which can more accurately identify noisy label values.
+"""
+
+from typing import Dict, Callable, Optional, Union
+import numpy as np
+from numpy.typing import ArrayLike
+
+from cleanlab.internal.neighbor.metric import decide_euclidean_metric
+from cleanlab.internal.neighbor.knn_graph import features_to_knn
+from cleanlab.outlier import OutOfDistribution
+from cleanlab.internal.regression_utils import assert_valid_prediction_inputs
+
+from cleanlab.internal.constants import TINY_VALUE
+
+
+
def get_label_quality_scores( + labels: ArrayLike, + predictions: ArrayLike, + *, + method: str = "outre", +) -> np.ndarray: + """ + Returns a label quality score for each example in the regression dataset. + + Each score is a continuous value in the range [0,1]: + + * 1 - clean label (given label is likely correct). + * 0 - dirty label (given label is likely incorrect). + + Parameters + ---------- + labels : array_like + Raw labels from the original dataset. + 1D array of shape ``(N, )`` containing the given labels for each example (aka. Y-value, response/target/dependent variable), where N is the number of examples in the dataset. + + predictions : np.ndarray + 1D array of shape ``(N,)`` containing the predicted label for each example in the dataset. These should be out-of-sample predictions from a trained regression model, which you can obtain for every example in your dataset via :ref:`cross-validation <pred_probs_cross_val>`. + + method : {"residual", "outre"}, default="outre" + String specifying which method to use for scoring the quality of each label and identifying which labels appear most noisy. + + Returns + ------- + label_quality_scores: + Array of shape ``(N, )`` of scores between 0 and 1, one per example in the dataset. + + Lower scores indicate examples more likely to contain a label issue.
+ + Examples + -------- + >>> import numpy as np + >>> from cleanlab.regression.rank import get_label_quality_scores + >>> labels = np.array([1,2,3,4]) + >>> predictions = np.array([2,2,5,4.1]) + >>> label_quality_scores = get_label_quality_scores(labels, predictions) + >>> label_quality_scores + array([0.00323821, 0.33692597, 0.00191686, 0.33692597]) + """ + + # Check if inputs are valid + labels, predictions = assert_valid_prediction_inputs( + labels=labels, predictions=predictions, method=method + ) + + scoring_funcs: Dict[str, Callable[[np.ndarray, np.ndarray], np.ndarray]] = { + "residual": _get_residual_score_for_each_label, + "outre": _get_outre_score_for_each_label, + } + + scoring_func = scoring_funcs.get(method, None) + if not scoring_func: + raise ValueError( + f""" + {method} is not a valid scoring method. + Please choose a valid scoring technique: {scoring_funcs.keys()}. + """ + ) + + # Calculate scores + label_quality_scores = scoring_func(labels, predictions) + return label_quality_scores
+ + +def _get_residual_score_for_each_label( + labels: np.ndarray, + predictions: np.ndarray, +) -> np.ndarray: + """Returns a residual label-quality score for each example. + + This is a function to compute label-quality scores for regression datasets, + where lower scores indicate labels less likely to be correct. + + Residual-based scores can work better for datasets where the independent variables + follow a normal distribution. + + Parameters + ---------- + labels: np.ndarray + Labels in the same format as expected by the `~cleanlab.regression.rank.get_label_quality_scores` function. + + predictions: np.ndarray + Predicted labels in the same format as expected by the `~cleanlab.regression.rank.get_label_quality_scores` function. + + Returns + ------- + label_quality_scores: np.ndarray + Contains one score (between 0 and 1) per example. + Lower scores indicate more likely mislabeled examples. + + """ + residual = predictions - labels + label_quality_scores = np.exp(-abs(residual)) + return label_quality_scores + + +def _get_outre_score_for_each_label( + labels: np.ndarray, + predictions: np.ndarray, + *, + residual_scale: float = 5, + frac_neighbors: float = 0.5, + neighbor_metric: Optional[Union[str, Callable]] = None, +) -> np.ndarray: + """Returns OUTRE-based label-quality scores. + + This function computes label-quality scores for regression datasets, + where a lower score indicates labels that are less likely to be correct. + + Parameters + ---------- + labels: np.ndarray + Labels in the same format as expected by the `~cleanlab.regression.rank.get_label_quality_scores` function. + + predictions: np.ndarray + Predicted labels in the same format as expected by the `~cleanlab.regression.rank.get_label_quality_scores` function. + + residual_scale: float, default = 5 + Multiplicative factor to adjust scale (standard deviation) of the residuals relative to the labels.
+ + frac_neighbors: float, default = 0.5 + Fraction of examples in the dataset to use as `n_neighbors` in the ``NearestNeighbors`` object used internally to assess outliers. + + neighbor_metric: Optional[str or callable], default = None + Distance metric passed to sklearn's ``NearestNeighbors``. + If None, the metric is chosen based on the number of features in the dataset. + + Returns + ------- + label_quality_scores: np.ndarray + Contains one score (between 0 and 1) per example. + Lower scores indicate more likely mislabeled examples. + """ + residual = predictions - labels + labels = (labels - labels.mean()) / (labels.std() + TINY_VALUE) + residual = residual_scale * ((residual - residual.mean()) / (residual.std() + TINY_VALUE)) + + # 2D features by combining labels and residual + features = np.array([labels, residual]).T + + neighbors = int(np.ceil(frac_neighbors * labels.shape[0])) + # Use provided metric or select a decent implementation of the euclidean metric for knn search + neighbor_metric = neighbor_metric or decide_euclidean_metric(features) + knn = features_to_knn(features, n_neighbors=neighbors, metric=neighbor_metric) + ood = OutOfDistribution(params={"knn": knn}) + + label_quality_scores = ood.score(features=features) + return label_quality_scores +
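Both scoring methods above reduce to a few array operations. The sketch below reproduces the `"residual"` score, exp(-|prediction - label|), and the two-column feature matrix that the OUTRE method builds from standardized labels and scaled, standardized residuals. It stops before the k-nearest-neighbor outlier step (`features_to_knn` and `OutOfDistribution` in the source above), so it does not produce OUTRE scores itself; the constant `EPS` is an assumed stand-in for `TINY_VALUE`, whose exact value is not shown here.

```python
import numpy as np

EPS = 1e-12  # assumed stand-in for cleanlab's TINY_VALUE (guards against zero std)


def residual_scores(labels, predictions):
    """exp(-|residual|): 1.0 when the prediction equals the label, decaying toward 0."""
    return np.exp(-np.abs(predictions - labels))


def outre_features(labels, predictions, residual_scale=5.0):
    """(N, 2) matrix of standardized labels and scaled, standardized residuals."""
    labels = np.asarray(labels, dtype=float)
    residual = np.asarray(predictions, dtype=float) - labels
    z_labels = (labels - labels.mean()) / (labels.std() + EPS)
    z_resid = residual_scale * (residual - residual.mean()) / (residual.std() + EPS)
    # one row per example: (standardized label, residual_scale * standardized residual)
    return np.column_stack([z_labels, z_resid])


labels = np.array([1, 2, 3, 4])
predictions = np.array([2, 2, 5, 4.1])
print(residual_scores(labels, predictions))
print(outre_features(labels, predictions).shape)
```

With the default `residual_scale=5`, the residual column has five times the spread of the label column, so Euclidean distances between rows are dominated by differences in residuals rather than in label values.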
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/segmentation/filter.html b/v2.6.6/_modules/cleanlab/segmentation/filter.html
new file mode 100644
index 000000000..0f57fbd1f
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/segmentation/filter.html
@@ -0,0 +1,913 @@
+cleanlab.segmentation.filter - cleanlab
Source code for cleanlab.segmentation.filter

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to find label issues in image semantic segmentation datasets, where each pixel in an image receives its own class label.
+
+"""
+
+from typing import Optional, Tuple
+
+import numpy as np
+
+from cleanlab.experimental.label_issues_batched import LabelInspector
+from cleanlab.internal.segmentation_utils import _check_input, _get_valid_optional_params
+
+
+
def find_label_issues( + labels: np.ndarray, + pred_probs: np.ndarray, + *, + batch_size: Optional[int] = None, + n_jobs: Optional[int] = None, + verbose: bool = True, + **kwargs, +) -> np.ndarray: + """ + Returns a per-pixel boolean mask for the entire dataset, where ``True`` marks + a pixel identified as having a label issue and ``False`` marks a pixel that appears correctly labeled. + + * N - Number of images in the dataset + * K - Number of classes in the dataset + * H - Height of each image + * W - Width of each image + + Tip + --- + If you encounter the error "pred_probs is not defined", try setting ``n_jobs=1``. + + Parameters + ---------- + labels: + A discrete array of shape ``(N,H,W,)`` of noisy labels for a semantic segmentation dataset, i.e. some labels may be erroneous. + + *Format requirements*: For a dataset with K classes, each pixel must be labeled using an integer in 0, 1, ..., K-1. + + Tip + --- + If your labels are one-hot encoded, you can do: ``labels = np.argmax(labels_one_hot, axis=1)`` assuming that `labels_one_hot` is of dimension ``(N,K,H,W)``, in order to get properly formatted `labels`. + + pred_probs: + An array of shape ``(N,K,H,W,)`` of model-predicted class probabilities, + ``P(label=k|x)`` for each pixel ``x``. The prediction for each pixel is an array corresponding to the estimated likelihood that this pixel belongs to each of the ``K`` classes. The 2nd dimension of `pred_probs` must be ordered such that these probabilities correspond to class 0, 1, ..., K-1. + + batch_size: + Optional size of image mini-batches used for computing the label issues in a streaming fashion (does not affect results, just the runtime and memory requirements). + To maximize efficiency, try to use the largest `batch_size` your memory allows. If not provided, a good default is used. + + n_jobs: + Optional number of processes for multiprocessing (default value = 1). Only used on Linux.
+ If `n_jobs=None`, uses the number of physical cores if psutil is installed, or the number of logical cores otherwise. + + verbose: + Set to ``False`` to suppress all print statements. + + **kwargs: + * downsample: int, + Optional factor to shrink labels and pred_probs by. Default ``1``. + Must evenly divide both the height H and width W of `labels` and `pred_probs`. Larger values of `downsample` produce faster runtimes but potentially less accurate results due to over-compression. Set to 1 to avoid any downsampling. + + Returns + ------- + label_issues: np.ndarray + Returns a boolean **mask** for the entire dataset of shape ``(N,H,W)`` + where ``True`` represents a pixel label issue and ``False`` represents a pixel that is correctly labeled. + """ + batch_size, n_jobs = _get_valid_optional_params(batch_size, n_jobs) + downsample = kwargs.get("downsample", 1) + + def downsample_arrays( + labels: np.ndarray, pred_probs: np.ndarray, factor: int = 1 + ) -> Tuple[np.ndarray, np.ndarray]: + if factor == 1: + return labels, pred_probs + + num_image, num_classes, h, w = pred_probs.shape + + # Check if possible to downsample + if h % factor != 0 or w % factor != 0: + raise ValueError( + f"Height {h} and width {w} not divisible by downsample value of {factor}. Set kwarg downsample to 1 to avoid downsampling."
+ ) + small_labels = np.round( + labels.reshape((num_image, h // factor, factor, w // factor, factor)).mean((4, 2)) + ) + small_pred_probs = pred_probs.reshape( + (num_image, num_classes, h // factor, factor, w // factor, factor) + ).mean((5, 3)) + + # We want to make sure that pred_probs are renormalized + row_sums = small_pred_probs.sum(axis=1) + renorm_small_pred_probs = small_pred_probs / np.expand_dims(row_sums, 1) + + return small_labels, renorm_small_pred_probs + + def flatten_and_preprocess_masks( + labels: np.ndarray, pred_probs: np.ndarray + ) -> Tuple[np.ndarray, np.ndarray]: + _, num_classes, _, _ = pred_probs.shape + labels_flat = labels.flatten().astype(int) + pred_probs_flat = np.moveaxis(pred_probs, 0, 1).reshape(num_classes, -1) + + return labels_flat, pred_probs_flat.T + + ## + _check_input(labels, pred_probs) + + # Added Downsampling + pre_labels, pre_pred_probs = downsample_arrays(labels, pred_probs, downsample) + + num_image, _, h, w = pre_pred_probs.shape + + ### This section is a modified version of find_label_issues_batched(), old code is commented out + # ranked_label_issues = find_label_issues_batched( + # pre_labels, pre_pred_probs, batch_size=batch_size, n_jobs=n_jobs, verbose=verbose + # ) + lab = LabelInspector( + num_class=pre_pred_probs.shape[1], + verbose=verbose, + n_jobs=n_jobs, + quality_score_kwargs=None, + num_issue_kwargs=None, + ) + n = len(pre_labels) + + if verbose: + from tqdm.auto import tqdm + + pbar = tqdm(desc="number of examples processed for estimating thresholds", total=n) + + # Precompute the size of each image in the batch + image_size = np.prod(pre_pred_probs.shape[1:]) + images_per_batch = max(batch_size // image_size, 1) + + for start_index in range(0, n, images_per_batch): + end_index = min(start_index + images_per_batch, n) + labels_batch, pred_probs_batch = flatten_and_preprocess_masks( + pre_labels[start_index:end_index], pre_pred_probs[start_index:end_index] + ) + 
lab.update_confident_thresholds(labels_batch, pred_probs_batch) + if verbose: + pbar.update(end_index - start_index) + + if verbose: + pbar.close() + pbar = tqdm(desc="number of examples processed for checking labels", total=n) + + for start_index in range(0, n, images_per_batch): + end_index = min(start_index + images_per_batch, n) + labels_batch, pred_probs_batch = flatten_and_preprocess_masks( + pre_labels[start_index:end_index], pre_pred_probs[start_index:end_index] + ) + _ = lab.score_label_quality(labels_batch, pred_probs_batch) + if verbose: + pbar.update(end_index - start_index) + + if verbose: + pbar.close() + + ranked_label_issues = lab.get_label_issues() + ### End find_label_issues_batched() section + + # Upsample carefully, preserving indices + label_issues = np.full((num_image, h, w), False) + + # Only flag a pixel if pred_probs doesn't match the label at that pixel + for i in range(0, ranked_label_issues.shape[0], batch_size): + issues_batch = ranked_label_issues[i : i + batch_size] + # Finding the right indices + image_batch, batch_coor_i, batch_coor_j = _get_indexes_from_ranked_issues( + issues_batch, h, w + ) + label_issues[image_batch, batch_coor_i, batch_coor_j] = True + if downsample == 1: + # check if pred_probs matches the label at those pixels + pred_argmax = np.argmax(pred_probs[image_batch, :, batch_coor_i, batch_coor_j], axis=1) + mask = pred_argmax == labels[image_batch, batch_coor_i, batch_coor_j] + label_issues[image_batch[mask], batch_coor_i[mask], batch_coor_j[mask]] = False + + if downsample != 1: + label_issues = label_issues.repeat(downsample, axis=1).repeat(downsample, axis=2) + + for i in range(0, ranked_label_issues.shape[0], batch_size): + issues_batch = ranked_label_issues[i : i + batch_size] + image_batch, batch_coor_i, batch_coor_j = _get_indexes_from_ranked_issues( + issues_batch, h, w + ) + # Upsample the coordinates + upsampled_ii = batch_coor_i * downsample + upsampled_jj = batch_coor_j * downsample + # 
Iterate over the upsampled region + for di in range(downsample): + for dj in range(downsample): + rows = upsampled_ii + di + cols = upsampled_jj + dj + pred_argmax = np.argmax(pred_probs[image_batch, :, rows, cols], axis=1) + # Check if the predicted class (argmax) at the identified issue location matches the given label + mask = pred_argmax == labels[image_batch, rows, cols] + # If they match, set the corresponding entries in the label_issues array to False + label_issues[image_batch[mask], rows[mask], cols[mask]] = False + + return label_issues
+ + +def _get_indexes_from_ranked_issues( + ranked_label_issues: np.ndarray, h: int, w: int +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + hw = h * w + relative_index = ranked_label_issues % hw + pixel_coor_i, pixel_coor_j = np.unravel_index(relative_index, (h, w)) + image_batch = ranked_label_issues // hw + return image_batch, pixel_coor_i, pixel_coor_j +
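Two array manipulations in `find_label_issues` are easy to misread: the block-mean downsampling done by reshaping each spatial axis into `(blocks, factor)`, and the decoding of flat pixel indices back into `(image, row, col)` triples. The sketch below mirrors both, assuming `(N,K,H,W)` probabilities and `(N,H,W)` integer labels; `downsample` and `decode_flat_indices` are hypothetical names, and `np.divmod` stands in for the `%`, `//` and `np.unravel_index` calls used in the source.

```python
import numpy as np


def downsample(labels, pred_probs, factor):
    """Block-average (N,H,W) labels and (N,K,H,W) pred_probs by `factor`."""
    n, k, h, w = pred_probs.shape
    if h % factor or w % factor:
        raise ValueError("factor must evenly divide both H and W")
    # labels: average each factor x factor block, then round (as the source does)
    small_labels = np.round(
        labels.reshape(n, h // factor, factor, w // factor, factor).mean(axis=(2, 4))
    )
    # pred_probs: average each block per class, then renormalize over the class axis
    small_probs = pred_probs.reshape(n, k, h // factor, factor, w // factor, factor).mean(
        axis=(3, 5)
    )
    small_probs = small_probs / small_probs.sum(axis=1, keepdims=True)
    return small_labels, small_probs


def decode_flat_indices(flat, h, w):
    """Map indices into a flattened (N,H,W) array back to (image, row, col)."""
    image, within_image = np.divmod(flat, h * w)
    row, col = np.divmod(within_image, w)
    return image, row, col


rng = np.random.default_rng(0)
probs = rng.random((2, 3, 4, 4))
probs /= probs.sum(axis=1, keepdims=True)
labels = probs.argmax(axis=1)
small_labels, small_probs = downsample(labels, probs, factor=2)
print(small_labels.shape, small_probs.shape)
print(decode_flat_indices(np.array([0, 5, 23]), h=4, w=4))
```

Averaging integer class ids and rounding, as the source does for `labels`, is a cheap heuristic rather than a majority vote: a 2x2 block labeled 0, 0, 2, 2 averages to 1.0 and becomes class 1.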
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/segmentation/rank.html b/v2.6.6/_modules/cleanlab/segmentation/rank.html
new file mode 100644
index 000000000..38694b978
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/segmentation/rank.html
@@ -0,0 +1,925 @@
+cleanlab.segmentation.rank - cleanlab
Source code for cleanlab.segmentation.rank

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to rank and score images in a semantic segmentation dataset based on how likely they are to contain mislabeled pixels.
+"""
+import warnings
+from typing import Optional, Tuple
+
+import numpy as np
+
+from cleanlab.internal.segmentation_utils import _check_input, _get_valid_optional_params
+from cleanlab.segmentation.filter import find_label_issues
+
+
+
def get_label_quality_scores( + labels: np.ndarray, + pred_probs: np.ndarray, + *, + method: str = "softmin", + batch_size: Optional[int] = None, + n_jobs: Optional[int] = None, + verbose: bool = True, + **kwargs, +) -> Tuple[np.ndarray, np.ndarray]: + """Returns a label quality score for each image. + + This is a function to compute label quality scores for semantic segmentation datasets, + where lower scores indicate labels less likely to be correct. + + * N - Number of images in the dataset + * K - Number of classes in the dataset + * H - Height of each image + * W - Width of each image + + Parameters + ---------- + labels: + A discrete array of noisy labels for a semantic segmentation dataset, in the shape ``(N,H,W,)``, + where each pixel must be an integer in 0, 1, ..., K-1. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.segmentation.filter.find_label_issues>` for further details. + + pred_probs: + An array of shape ``(N,K,H,W,)`` of model-predicted class probabilities. + Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.segmentation.filter.find_label_issues>` for further details. + + method: {"softmin", "num_pixel_issues"}, default="softmin" + Label quality scoring method. + + - "softmin" - Calculates the inner product between scores and softmax(1-scores). Faster than "num_pixel_issues". + - "num_pixel_issues" - Scores each image as 1 minus the fraction of its pixels flagged as label issues by :py:func:`find_label_issues <cleanlab.segmentation.filter.find_label_issues>`. + + batch_size : + Optional size of mini-batches to use for estimating the label issues for 'num_pixel_issues' only, not 'softmin'. + To maximize efficiency, try to use the largest `batch_size` your memory allows. If not provided, a good default is used. + + n_jobs: + Optional number of processes for multiprocessing (default value = 1). Only used on Linux.
For 'num_pixel_issues' only, not 'softmin'. + If `n_jobs=None`, uses the number of physical cores if psutil is installed, or the number of logical cores otherwise. + + verbose: + Set to ``False`` to suppress all print statements. + + **kwargs: + * downsample : int, + Factor to shrink labels and pred_probs by, for 'num_pixel_issues' only, not 'softmin'. Default ``1``. + Must evenly divide both the height H and width W of `labels` and `pred_probs`. Larger values of `downsample` produce faster runtimes but potentially less accurate results due to over-compression. Set to 1 to avoid any downsampling. + * temperature : float, + Temperature for softmin. Default ``0.1``. + + + Returns + ------- + image_scores: + Array of shape ``(N, )`` of scores between 0 and 1, one per image in the dataset. + Lower scores indicate images more likely to contain a label issue. + pixel_scores: + Array of shape ``(N,H,W)`` of scores between 0 and 1, one per pixel in the dataset, equal to the predicted probability of each pixel's given label. + """ + batch_size, n_jobs = _get_valid_optional_params(batch_size, n_jobs) + _check_input(labels, pred_probs) + + softmin_temperature = kwargs.get("temperature", 0.1) + downsample_num_pixel_issues = kwargs.get("downsample", 1) + + if method == "num_pixel_issues": + _, K, _, _ = pred_probs.shape + labels_expanded = labels[:, np.newaxis, :, :] + mask = np.arange(K)[np.newaxis, :, np.newaxis, np.newaxis] == labels_expanded + # Calculate pixel_scores + masked_pred_probs = np.where(mask, pred_probs, 0) + pixel_scores = masked_pred_probs.sum(axis=1) + scores = find_label_issues( + labels, + pred_probs, + downsample=downsample_num_pixel_issues, + n_jobs=n_jobs, + verbose=verbose, + batch_size=batch_size, + ) + img_scores = 1 - np.mean(scores, axis=(1, 2)) + return (img_scores, pixel_scores) + + if downsample_num_pixel_issues != 1: + warnings.warn( + f"The downsample kwarg is only used by method 'num_pixel_issues'; no downsampling is applied for method '{method}'." 
+ ) + + num_im, num_class, h, w = pred_probs.shape + image_scores = np.empty((num_im,)) + pixel_scores = 
np.empty((num_im, h, w)) + if verbose: + from tqdm.auto import tqdm + + pbar = tqdm(desc=f"images processed using {method}", total=num_im) + + h_array = np.arange(h)[:, None] + w_array = np.arange(w) + + for image in range(num_im): + image_probs = pred_probs[image][ + labels[image], + h_array, + w_array, + ] + pixel_scores[image, :, :] = image_probs + image_scores[image] = _get_label_quality_per_image( + image_probs.flatten(), method=method, temperature=softmin_temperature + ) + if verbose: + pbar.update(1) + return image_scores, pixel_scores
+ + +
def issues_from_scores( + image_scores: np.ndarray, pixel_scores: Optional[np.ndarray] = None, threshold: float = 0.1 +) -> np.ndarray: + """ + Converts scores output by `~cleanlab.segmentation.rank.get_label_quality_scores` + to issues in a format similar to that output by :py:func:`segmentation.filter.find_label_issues <cleanlab.segmentation.filter.find_label_issues>`. + + Only considers as issues those pixels (or, if `pixel_scores` is not provided, those images) with label quality score lower than `threshold`, + so this parameter determines the number of issues that are returned. + + Note + ---- + - This method is intended for converting the most severely mislabeled examples into a format compatible with ``summary`` methods like :py:func:`segmentation.summary.display_issues <cleanlab.segmentation.summary.display_issues>`. + - This method does not estimate the number of label errors, since the `threshold` is arbitrary. For that, use :py:func:`segmentation.filter.find_label_issues <cleanlab.segmentation.filter.find_label_issues>` instead, which estimates the label errors via Confident Learning rather than score thresholding. + + Parameters + ---------- + image_scores: + Array of shape ``(N, )`` of overall image scores, where `N` is the number of images in the dataset. + Same format as the `image_scores` returned by `~cleanlab.segmentation.rank.get_label_quality_scores`. + + pixel_scores: + Optional array of shape ``(N,H,W)`` of scores between 0 and 1, one per pixel in the dataset. + Same format as the `pixel_scores` returned by `~cleanlab.segmentation.rank.get_label_quality_scores`. + + threshold: + Quality score threshold (between 0 and 1) that determines which pixels are included in the result. Pixels with quality scores at or above the `threshold` are not + included in the result. Defaults to 0.1.
+ + Returns + ------- + issues: + Returns a boolean **mask** for the entire dataset + where ``True`` represents a pixel label issue and ``False`` represents a pixel that is + accurately labeled according to the `threshold`. + Use :py:func:`segmentation.summary.display_issues <cleanlab.segmentation.summary.display_issues>` + to view these issues within the original images. + + If `pixel_scores` is not provided, returns an array of integer indices (rather than a boolean mask) of the images whose label quality score + falls below the `threshold` (sorted by overall label quality score of each image). + + """ + + if image_scores is None: + raise ValueError("image_scores must be provided") + if threshold is None or threshold < 0 or threshold > 1: + raise ValueError("threshold must be between 0 and 1") + + if pixel_scores is not None: + return pixel_scores < threshold + + ranking = np.argsort(image_scores) + cutoff = np.searchsorted(image_scores[ranking], threshold) + return ranking[: cutoff + 1]
+ + +def _get_label_quality_per_image(pixel_scores, method=None, temperature=0.1): + """ + Computes an image's label quality score from its per-pixel scores, currently using the "softmin" method. + + Parameters + ---------- + pixel_scores: + Per-pixel label quality scores in flattened array of shape ``(N, )``, where N is the number of pixels in the image. + + method: + Method to use to calculate the image's label quality score. + Currently only supports "softmin"; any other value raises an exception. + temperature: default 0.1 + Temperature of the softmin. Values that are too small may cause numerical underflow and NaN scores. + + Lower values encourage this method to converge toward the label quality score of the pixel with the lowest quality label in the image. + + Higher values encourage this method to converge toward the average label quality score of all pixels in the image. + + Returns + ------- + image_score: + Float of the image's label quality score from 0 to 1, 0 being the lowest quality and 1 being the highest quality. + + """ + from cleanlab.internal.multilabel_scorer import softmin + + if pixel_scores is None or pixel_scores.size == 0: + raise Exception("Invalid Input: pixel_scores cannot be None or an empty list") + + if temperature == 0 or temperature is None: + raise Exception("Invalid Input: temperature cannot be zero or None") + + pixel_scores_64 = pixel_scores.astype("float64") + if method == "softmin": + if len(pixel_scores_64) > 0: + return softmin( + np.expand_dims(pixel_scores_64, axis=0), axis=1, temperature=temperature + )[0] + else: + raise Exception("Invalid Input: pixel_scores is empty") + else: + raise Exception("Invalid Method: Specify correct method. Currently only supports 'softmin'") +
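`get_label_quality_scores` scores each pixel by the predicted probability of its given label and, with `method="softmin"`, pools those pixel scores into one image score via the inner product of the scores with softmax(1 - scores) at the given temperature. The sketch below reimplements both steps in plain NumPy under that reading of the docstrings; `given_label_probs` and `softmin_pool` are hypothetical names, not cleanlab's internal `softmin`, and the max-subtraction before `np.exp` is a standard stabilization added here that the source may or may not perform.

```python
import numpy as np


def given_label_probs(labels, pred_probs):
    """(N,H,W) array holding pred_probs[n, labels[n, i, j], i, j] for every pixel."""
    idx = labels[:, np.newaxis, :, :].astype(np.intp)  # (N,1,H,W) class index per pixel
    return np.take_along_axis(pred_probs, idx, axis=1)[:, 0]


def softmin_pool(pixel_scores, temperature=0.1):
    """Inner product of scores with softmax((1 - scores) / temperature)."""
    s = np.asarray(pixel_scores, dtype=np.float64).ravel()
    logits = (1.0 - s) / temperature
    weights = np.exp(logits - logits.max())  # subtract the max logit to avoid overflow
    weights /= weights.sum()
    return float(np.dot(s, weights))


pred_probs = np.array([[[[0.9, 0.2]], [[0.1, 0.8]]]])  # N=1, K=2, H=1, W=2
labels = np.array([[[0, 0]]])  # both pixels labeled class 0
pixel = given_label_probs(labels, pred_probs)
print(pixel)  # scores 0.9 and 0.2 for the two pixels
print(softmin_pool(pixel, temperature=0.1))  # close to the minimum, 0.2
print(softmin_pool(pixel, temperature=100.0))  # close to the mean, 0.55
```

The two temperatures illustrate the trade-off described for `temperature`: a small value makes the image score track its worst pixel, while a large value approaches the plain average of all pixel scores.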
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/segmentation/summary.html b/v2.6.6/_modules/cleanlab/segmentation/summary.html
new file mode 100644
index 000000000..d01d1be88
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/segmentation/summary.html
@@ -0,0 +1,1045 @@
+cleanlab.segmentation.summary - cleanlab

Source code for cleanlab.segmentation.summary

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to display images and their label issues in a semantic segmentation dataset, as well as summarize the overall types of issues identified.
+"""
+
+from typing import Any, Dict, List, Optional
+
+import numpy as np
+import pandas as pd
+from tqdm.auto import tqdm
+
+from cleanlab.internal.segmentation_utils import _get_summary_optional_params
+
+
+
+def display_issues(
+    issues: np.ndarray,
+    *,
+    labels: Optional[np.ndarray] = None,
+    pred_probs: Optional[np.ndarray] = None,
+    class_names: Optional[List[str]] = None,
+    exclude: Optional[List[int]] = None,
+    top: Optional[int] = None,
+    **kwargs,  # Accepting additional kwargs for plt.show()
+) -> None:
+    """
+    Display semantic segmentation label issues, showing images with problematic pixels highlighted.
+
+    Can also show the given and predicted masks for each image identified to have label issues.
+
+    Parameters
+    ----------
+    issues:
+        Boolean **mask** for the entire dataset
+        where ``True`` represents a pixel label issue and ``False`` represents a pixel that is
+        accurately labeled.
+
+        Same format as output by :py:func:`segmentation.filter.find_label_issues <cleanlab.segmentation.filter.find_label_issues>`
+        or :py:func:`segmentation.rank.issues_from_scores <cleanlab.segmentation.rank.issues_from_scores>`.
+
+    labels:
+        Optional discrete array of noisy labels for a semantic segmentation dataset, in the shape ``(N,H,W,)``,
+        where each pixel must be an integer in 0, 1, ..., K-1.
+        If `labels` is provided, this function also displays the given label mask of each image shown.
+        Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.segmentation.filter.find_label_issues>` for more information.
+
+    pred_probs:
+        Optional array of shape ``(N,K,H,W,)`` of model-predicted class probabilities.
+        If `pred_probs` is provided, this function also displays the predicted (argmax) label mask of each image shown.
+        Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.segmentation.filter.find_label_issues>` for more information.
+
+        Tip
+        ---
+        If your labels are one-hot encoded with shape ``(N,K,H,W)``, convert them to integer labels with
+        ``np.argmax(labels_one_hot, axis=1)`` before passing them to this function.
+
+    class_names:
+        Optional list of strings, where each string represents the name of a class in the semantic segmentation problem.
+        The order of the names should correspond to the numerical order of the classes, and the list length should
+        equal the number of classes K.
+        If provided, this function will generate a legend
+        showing the color used for each class.
+
+        Example:
+            If there are three classes in your labels, represented by 0, 1, 2, then class_names might look like this:
+
+            .. code-block:: python
+
+                class_names = ['background', 'person', 'dog']
+
+    top:
+        Optional maximum number of images to display. If not provided, a good default is used.
+
+    exclude:
+        Optional list of given-label classes to ignore: flagged pixels whose given label is in this list are not highlighted.
+        Each element must be in 0, 1, ..., K-1, and `labels` must be provided.
+
+    kwargs:
+        Additional keyword arguments to pass to ``plt.show()`` (matplotlib.pyplot.show).
+    """
+    class_names, exclude, top = _get_summary_optional_params(class_names, exclude, top)
+    if labels is None and len(exclude) > 0:
+        raise ValueError("Provide labels to allow class exclusion")
+
+    top = min(top, len(issues))
+
+    correct_ordering = np.argsort(-np.sum(issues, axis=(1, 2)))[:top]
+
+    try:
+        import matplotlib.pyplot as plt
+        import matplotlib.patches as mpatches
+        from matplotlib.colors import ListedColormap
+    except ImportError:
+        raise ImportError('try "pip install matplotlib"')
+
+    output_plots = (pred_probs is not None) + (labels is not None) + 1
+
+    # Colormap for errors
+    error_cmap = ListedColormap(["none", "red"])
+    _, h, w = issues.shape
+    if output_plots > 1:
+        if pred_probs is not None:
+            _, num_classes, _, _ = pred_probs.shape
+            cmap = _generate_colormap(num_classes)
+        elif labels is not None:
+            num_classes = max(np.unique(labels)) + 1
+            cmap = _generate_colormap(num_classes)
+    else:
+        cmap = None
+
+    # Show a legend
+    if class_names is not None and cmap is not None:
+        patches = [
+            mpatches.Patch(color=cmap[i], label=class_names[i]) for i in range(len(class_names))
+        ]
+        legend = plt.figure()  # adjust figsize for larger legend
+        legend.legend(
+            handles=patches, loc="center", ncol=len(class_names), facecolor="white", fontsize=20
+        )  # adjust fontsize for larger text
+        plt.axis("off")
+        plt.show(**kwargs)
+
+    for i in correct_ordering:
+        # Show images
+        fig, axes = plt.subplots(1, output_plots, figsize=(5 * output_plots, 5))
+        plot_index = 0
+
+        # First image - Given labels
+        if labels is not None:
+            axes[plot_index].imshow(cmap[labels[i]])
+            axes[plot_index].set_title("Given Labels")
+            plot_index += 1
+
+        # Second image - Argmaxed pred_probs
+        if pred_probs is not None:
+            axes[plot_index].imshow(cmap[np.argmax(pred_probs[i], axis=0)])
+            axes[plot_index].set_title("Argmaxed Prediction Probabilities")
+            plot_index += 1
+
+        # Third image - Errors
+        if output_plots == 1:
+            ax = axes
+        else:
+            ax = axes[plot_index]
+
+        mask = np.full((h, w), True)
+        if labels is not None and len(exclude) != 0:
+            mask = ~np.isin(labels[i], exclude)
+        ax.imshow(issues[i] & mask, cmap=error_cmap, vmin=0, vmax=1)
+        ax.set_title(f"Image {i}: Suggested Errors (in Red)")
+        plt.show(**kwargs)
+
+    return None
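The ordering and exclusion steps of `display_issues` can be checked without matplotlib. The sketch below uses a hypothetical helper, `rank_images_and_mask` (not part of cleanlab), that mirrors what the function computes before plotting. It sorts images by their number of flagged pixels, keeps the first `top`, and hides flagged pixels whose given label is in `exclude`.

```python
import numpy as np


def rank_images_and_mask(issues, labels=None, exclude=(), top=20):
    """Hypothetical helper mirroring display_issues' pre-plotting steps.

    issues: boolean array of shape (N, H, W); labels: int array of shape (N, H, W).
    Returns the indices of the images to show and the masks that would be drawn in red.
    """
    top = min(top, len(issues))
    # Images with more flagged pixels come first.
    ordering = np.argsort(-np.sum(issues, axis=(1, 2)))[:top]
    masks = []
    for i in ordering:
        keep = np.full(issues.shape[1:], True)
        if labels is not None and len(exclude) != 0:
            # Hide flagged pixels whose given label belongs to an excluded class.
            keep = ~np.isin(labels[i], list(exclude))
        masks.append(issues[i] & keep)
    return ordering, masks


issues = np.array([
    [[True, False], [False, False]],  # 1 flagged pixel
    [[True, True], [True, False]],    # 3 flagged pixels
    [[True, True], [False, False]],   # 2 flagged pixels
])
labels = np.array([
    [[0, 0], [0, 0]],
    [[1, 0], [2, 0]],
    [[0, 1], [0, 0]],
])
ordering, masks = rank_images_and_mask(issues, labels, exclude=[1], top=2)
```

Here image 1 is shown first, followed by image 2. The pixels labeled class 1 are removed from each red mask.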
+
+
+def common_label_issues(
+    issues: np.ndarray,
+    labels: np.ndarray,
+    pred_probs: np.ndarray,
+    *,
+    class_names: Optional[List[str]] = None,
+    exclude: Optional[List[int]] = None,
+    top: Optional[int] = None,
+    verbose: bool = True,
+) -> pd.DataFrame:
+    """
+    Display how often each given label is swapped for each predicted label among the pixels flagged with label issues.
+
+    These may correspond to pixels that are ambiguous or systematically misunderstood by the data annotators.
+
+    * N - Number of images in the dataset
+    * K - Number of classes in the dataset
+    * H - Height of each image
+    * W - Width of each image
+
+    Parameters
+    ----------
+    issues:
+        Boolean **mask** for the entire dataset
+        where ``True`` represents a pixel label issue and ``False`` represents a pixel that is
+        accurately labeled.
+
+        Same format as output by :py:func:`segmentation.filter.find_label_issues <cleanlab.segmentation.filter.find_label_issues>`
+        or :py:func:`segmentation.rank.issues_from_scores <cleanlab.segmentation.rank.issues_from_scores>`.
+
+    labels:
+        A discrete array of noisy labels for a semantic segmentation dataset, in the shape ``(N,H,W,)``,
+        where each pixel must be an integer in 0, 1, ..., K-1.
+        Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.segmentation.filter.find_label_issues>` for more information.
+
+    pred_probs:
+        An array of shape ``(N,K,H,W,)`` of model-predicted class probabilities.
+        Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.segmentation.filter.find_label_issues>` for more information.
+
+        Tip
+        ---
+        If your labels are one-hot encoded with shape ``(N,K,H,W)``, convert them to integer labels with
+        ``np.argmax(labels_one_hot, axis=1)`` before passing them to this function.
+
+    class_names:
+        Optional length K list of names of each class, such that `class_names[i]` is the string name of the class corresponding to `labels` with value `i`.
+        If `class_names` is provided, display these string names for predicted and given labels, otherwise display the integer index of classes.
+
+    exclude:
+        Optional list of predicted classes to ignore when counting label swaps; each element must be in 0, 1, ..., K-1.
+
+    top:
+        Optional maximum number of label swaps (rows) to return and print. If not provided, a good default is used.
+
+    verbose:
+        Set to ``False`` to suppress all print statements.
+
+    Returns
+    -------
+    issues_df:
+        DataFrame with columns ``['given_label', 'predicted_label', 'num_pixel_issues']``
+        where each row contains information about a particular given/predicted label swap.
+        Rows are ordered by the number of pixel label issues inferred to exhibit this type of label swap.
+    """
+    try:
+        N, K, H, W = pred_probs.shape
+    except (AttributeError, ValueError):
+        raise ValueError("pred_probs must be of shape (N, K, H, W)")
+
+    assert labels.shape == (N, H, W), "labels must be of shape (N, H, W)"
+
+    class_names, exclude, top = _get_summary_optional_params(class_names, exclude, top)
+    # Find issues by pixel coordinates
+    issue_coords = np.column_stack(np.where(issues))
+
+    # Count issues per class (given label)
+    count: Dict[int, Any] = {}
+    for i, j, k in tqdm(issue_coords):
+        label = labels[i, j, k]
+        pred = pred_probs[i, :, j, k].argmax()
+        if label not in count:
+            count[label] = np.zeros(K, dtype=int)
+        if pred not in exclude:
+            count[label][pred] += 1
+
+    # Prepare output DataFrame
+    if class_names is None:
+        class_names = [str(i) for i in range(K)]
+
+    info = []
+    for given_label, class_name in enumerate(class_names):
+        if given_label in count:
+            for pred_label, num_issues in enumerate(count[given_label]):
+                if num_issues > 0:
+                    info.append([class_name, class_names[pred_label], num_issues])
+
+    info = sorted(info, key=lambda x: x[2], reverse=True)[:top]
+    issues_df = pd.DataFrame(info, columns=["given_label", "predicted_label", "num_pixel_issues"])
+
+    if verbose:
+        for _, row in issues_df.iterrows():
+            print(
+                f"Class '{row['given_label']}' is potentially mislabeled as class '{row['predicted_label']}' "
+                f"for {row['num_pixel_issues']} pixels in the dataset"
+            )
+
+    return issues_df
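`common_label_issues` counts (given label, predicted label) pairs with a per-pixel Python loop. For the same inputs, those counts can also be computed in vectorized form with `np.add.at`, as the sketch below does. `count_label_swaps` is a hypothetical name, not a cleanlab function. Like the original loop, it skips pixels whose predicted class is in `exclude`, and it returns a K x K count matrix rather than a DataFrame.

```python
import numpy as np


def count_label_swaps(issues, labels, pred_probs, exclude=()):
    """Hypothetical vectorized count of given/predicted label swaps among flagged pixels.

    issues: bool (N, H, W); labels: int (N, H, W); pred_probs: float (N, K, H, W).
    Returns a (K, K) matrix where entry [g, p] counts flagged pixels with given
    label g and predicted (argmax) label p.
    """
    num_classes = pred_probs.shape[1]
    given = labels[issues]                         # given labels of flagged pixels, shape (M,)
    predicted = pred_probs.argmax(axis=1)[issues]  # argmax over the class axis, shape (M,)
    keep = ~np.isin(predicted, list(exclude))      # exclusion is by predicted class, as in the loop
    counts = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(counts, (given[keep], predicted[keep]), 1)  # unbuffered, so repeated pairs accumulate
    return counts
```

The plain `counts[g, p] += 1` with fancy indexing would count a repeated (g, p) pair only once, which is why `np.add.at` is used here.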
+
+
+def filter_by_class(
+    class_index: int, issues: np.ndarray, labels: np.ndarray, pred_probs: np.ndarray
+) -> np.ndarray:
+    """
+    Return the label issues that involve a particular class: flagged pixels whose given label is the class of interest,
+    as well as flagged pixels whose predicted label is the class of interest.
+
+    Parameters
+    ----------
+    class_index:
+        The specific class you are interested in.
+
+    issues:
+        Boolean **mask** for the entire dataset where ``True`` represents a pixel label issue and ``False`` represents a pixel that is
+        accurately labeled.
+
+        Same format as output by :py:func:`segmentation.filter.find_label_issues <cleanlab.segmentation.filter.find_label_issues>`
+        or :py:func:`segmentation.rank.issues_from_scores <cleanlab.segmentation.rank.issues_from_scores>`.
+
+    labels:
+        A discrete array of noisy labels for a semantic segmentation dataset, in the shape ``(N,H,W,)``,
+        where each pixel must be an integer in 0, 1, ..., K-1.
+        Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.segmentation.filter.find_label_issues>` for further details.
+
+    pred_probs:
+        An array of shape ``(N,K,H,W,)`` of model-predicted class probabilities.
+        Refer to documentation for this argument in :py:func:`find_label_issues <cleanlab.segmentation.filter.find_label_issues>` for further details.
+
+    Returns
+    ----------
+    issues_subset:
+        Boolean **mask** of the same shape as `issues`, where ``True`` represents a pixel label issue that involves the
+        class of interest (as either the given or the predicted label) and ``False`` represents all other pixels.
+
+        Returned mask shows **all** instances that involve the particular class of interest.
+    """
+    issues_subset = (issues & np.isin(labels, class_index)) | (
+        issues & np.isin(pred_probs.argmax(1), class_index)
+    )
+    return issues_subset
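The following is a small check of the masking rule used by `filter_by_class`: a flagged pixel is kept when either its given label or its argmax prediction equals the class of interest. The rule is re-implemented here for illustration rather than imported from cleanlab, and the arrays are made up. It compares with `==`, which assumes `class_index` is a single integer. The original's `np.isin` would also accept a list of classes.

```python
import numpy as np


def issues_involving_class(class_index, issues, labels, pred_probs):
    """Re-implementation of the filter_by_class rule for illustration (not imported from cleanlab)."""
    predicted = pred_probs.argmax(axis=1)  # (N, K, H, W) -> (N, H, W)
    return issues & ((labels == class_index) | (predicted == class_index))


# One 1x3 image with 3 classes: every pixel is flagged.
labels = np.array([[[0, 1, 2]]])
pred_probs = np.zeros((1, 3, 1, 3))
pred_probs[0, [1, 0, 0], 0, [0, 1, 2]] = 1.0  # argmax predictions per pixel: [1, 0, 0]
issues = np.array([[[True, True, True]]])

subset = issues_involving_class(2, issues, labels, pred_probs)
```

For class 2, only the last pixel is kept because its given label is 2. For class 0, all three pixels are kept: the given label of the first pixel and the predictions of the other two are both 0.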
+
+
+def _generate_colormap(num_colors):
+    """
+    Generate `num_colors` visually distinct colors, suitable for coloring the classes of a semantic segmentation mask.
+
+    Parameters
+    ----------
+    num_colors:
+        How many unique colors you want.
+
+    Returns
+    -------
+    colors:
+        Array of shape ``(num_colors, 4)`` of RGBA colors, one per class.
+    """
+
+    try:
+        from matplotlib.cm import hsv
+    except ImportError:
+        raise ImportError('try "pip install matplotlib"')
+
+    num_shades = 7
+    num_colors_with_shades = -(-num_colors // num_shades) * num_shades
+    linear_nums = np.linspace(0, 1, num_colors_with_shades, endpoint=False)
+
+    arr_by_shade_rows = linear_nums.reshape(num_shades, -1)
+    arr_by_shade_columns = arr_by_shade_rows.T
+    num_partitions = arr_by_shade_columns.shape[0]
+    nums_distributed_like_rising_saw = arr_by_shade_columns.flatten()
+
+    initial_cm = hsv(nums_distributed_like_rising_saw)
+    lower_partitions_half = num_partitions // 2
+    upper_partitions_half = num_partitions - lower_partitions_half
+
+    lower_half = lower_partitions_half * num_shades
+    initial_cm[:lower_half, :3] *= np.linspace(0.2, 1, lower_half)[:, np.newaxis]
+
+    upper_half_indices = np.arange(lower_half, num_colors_with_shades).reshape(
+        upper_partitions_half, num_shades
+    )
+    modifier = (
+        (1 - initial_cm[upper_half_indices, :3])
+        * np.arange(upper_partitions_half)[:, np.newaxis, np.newaxis]
+        / upper_partitions_half
+    )
+    initial_cm[upper_half_indices, :3] += modifier
+    colors = initial_cm[:num_colors]
+    return colors
+
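The first step of `_generate_colormap` rounds `num_colors` up to a multiple of `num_shades` using ceiling division, `-(-n // k) * k`. It then reorders evenly spaced hues so that consecutive class indices do not get neighboring hues. The sketch below reproduces only that reordering with NumPy and leaves out matplotlib and the brightness adjustments. `interleaved_hues` is a hypothetical name used for illustration.

```python
import numpy as np


def interleaved_hues(num_colors, num_shades=7):
    """Reproduce the hue-ordering step of _generate_colormap (illustration only)."""
    # Ceiling division: round num_colors up to a multiple of num_shades.
    padded = -(-num_colors // num_shades) * num_shades
    hues = np.linspace(0, 1, padded, endpoint=False)
    # num_shades rows of consecutive hues; transposing and flattening reads them column by
    # column, so neighboring entries are about 1/num_shades of the hue range apart.
    order = hues.reshape(num_shades, -1).T.flatten()
    return padded, order


padded, order = interleaved_hues(10)
```

With 10 classes, the hues are padded to 14 slots. The resulting order visits slots 0, 2, 4, ..., 12 before returning to the odd slots.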
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/token_classification/filter.html b/v2.6.6/_modules/cleanlab/token_classification/filter.html
new file mode 100644
index 000000000..09952f546
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/token_classification/filter.html
@@ -0,0 +1,796 @@
+cleanlab.token_classification.filter - cleanlab

Source code for cleanlab.token_classification.filter

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to find label issues in token classification datasets (text data), where each token in a sentence receives its own class label.
+
+The underlying algorithms are described in `this paper <https://arxiv.org/abs/2210.03920>`_.
+"""
+
+import numpy as np
+from typing import List, Tuple
+import warnings
+
+from cleanlab.filter import find_label_issues as find_label_issues_main
+from cleanlab.experimental.label_issues_batched import find_label_issues_batched
+
+
+
+def find_label_issues(
+    labels: list,
+    pred_probs: list,
+    *,
+    return_indices_ranked_by: str = "self_confidence",
+    low_memory: bool = False,
+    **kwargs,
+) -> List[Tuple[int, int]]:
+    """Identifies tokens with label issues in a token classification dataset.
+
+    Tokens identified with issues will be ranked by their individual label quality score.
+
+    Instead use :py:func:`token_classification.rank.get_label_quality_scores <cleanlab.token_classification.rank.get_label_quality_scores>`
+    if you prefer to rank the sentences based on their overall label quality.
+
+    Parameters
+    ----------
+    labels:
+        Nested list of given labels for all tokens, such that `labels[i]` is a list of labels, one for each token in the `i`-th sentence.
+
+        For a dataset with K classes, each class label must be an integer in 0, 1, ..., K-1.
+
+    pred_probs:
+        List of np arrays, such that `pred_probs[i]` has shape ``(T, K)`` if the `i`-th sentence contains T tokens.
+
+        Each row of `pred_probs[i]` corresponds to a token `t` in the `i`-th sentence,
+        and contains model-predicted probabilities that `t` belongs to each of the K possible classes.
+
+        Columns of each `pred_probs[i]` should be ordered such that the probabilities correspond to class 0, 1, ..., K-1.
+
+    return_indices_ranked_by: {"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default="self_confidence"
+        Returned token-indices are sorted by their label quality score.
+
+        See :py:func:`cleanlab.filter.find_label_issues <cleanlab.filter.find_label_issues>`
+        documentation for more details on each label quality scoring method.
+
+    low_memory: bool, default=False
+        Set to ``True`` if you have a big dataset with limited memory. Token-level issues are then found with
+        :py:func:`experimental.label_issues_batched.find_label_issues_batched <cleanlab.experimental.label_issues_batched.find_label_issues_batched>`,
+        and any `kwargs` are ignored (a warning is issued for each one).
+
+    kwargs:
+        Additional keyword arguments to pass into :py:func:`filter.find_label_issues <cleanlab.filter.find_label_issues>`
+        which is internally applied at the token level. Can include values like `n_jobs` to control parallel processing, `frac_noise`, etc.
+
+    Returns
+    -------
+    issues:
+        List of label issues identified by cleanlab, such that each element is a tuple ``(i, j)``, which
+        indicates that the `j`-th token of the `i`-th sentence has a label issue.
+
+        These tuples are ordered in `issues` list based on the likelihood that the corresponding token is mislabeled.
+
+        Use :py:func:`token_classification.summary.display_issues <cleanlab.token_classification.summary.display_issues>`
+        to view these issues within the original sentences.
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from cleanlab.token_classification.filter import find_label_issues
+    >>> labels = [[0, 0, 1], [0, 1]]
+    >>> pred_probs = [
+    ...     np.array([[0.9, 0.1], [0.7, 0.3], [0.05, 0.95]]),
+    ...     np.array([[0.8, 0.2], [0.8, 0.2]]),
+    ... ]
+    >>> find_label_issues(labels, pred_probs)
+    [(1, 1)]
+    """
+    labels_flatten = [l for label in labels for l in label]
+    pred_probs_flatten = np.array([pred for pred_prob in pred_probs for pred in pred_prob])
+
+    if low_memory:
+        for arg_name in kwargs:
+            warnings.warn(f"`{arg_name}` is not used when `low_memory=True`.")
+        quality_score_kwargs = {"method": return_indices_ranked_by}
+        issues_main = find_label_issues_batched(
+            labels_flatten, pred_probs_flatten, quality_score_kwargs=quality_score_kwargs
+        )
+    else:
+        issues_main = find_label_issues_main(
+            labels_flatten,
+            pred_probs_flatten,
+            return_indices_ranked_by=return_indices_ranked_by,
+            **kwargs,
+        )
+
+    lengths = [len(label) for label in labels]
+    mapping = [[(i, j) for j in range(length)] for i, length in enumerate(lengths)]
+    mapping_flatten = [index for indices in mapping for index in indices]
+
+    issues = [mapping_flatten[issue] for issue in issues_main]
+    return issues
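`find_label_issues` flattens all tokens, runs the token-level search, and then maps each flat index back to a `(sentence, token)` pair through an explicit lookup list. The sketch below computes an equivalent mapping from cumulative sentence lengths with `np.searchsorted`, which avoids building that list. `flat_to_pairs` is a hypothetical helper, not part of cleanlab.

```python
import numpy as np


def flat_to_pairs(flat_indices, labels):
    """Map indices into the flattened token list back to (sentence, token) tuples."""
    lengths = np.array([len(sentence) for sentence in labels])
    # starts[i] is the flat index of the first token of sentence i.
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    flat_indices = np.asarray(flat_indices, dtype=int)
    # side="right" picks the last sentence whose start <= flat index, which skips empty sentences.
    sentence_ids = np.searchsorted(starts, flat_indices, side="right") - 1
    token_ids = flat_indices - starts[sentence_ids]
    return [(int(i), int(j)) for i, j in zip(sentence_ids, token_ids)]


labels = [[0, 0, 1], [0, 1]]
pairs = flat_to_pairs([4, 0, 2, 3], labels)
```

Flat index 4, the last token overall, maps back to `(1, 1)`, which matches the issue found in the docstring example.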
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/token_classification/rank.html b/v2.6.6/_modules/cleanlab/token_classification/rank.html
new file mode 100644
index 000000000..95970289c
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/token_classification/rank.html
@@ -0,0 +1,969 @@
+cleanlab.token_classification.rank - cleanlab

Source code for cleanlab.token_classification.rank

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to rank and score sentences in a token classification dataset (text data), based on how likely they are to contain label errors.
+
+The underlying algorithms are described in `this paper <https://arxiv.org/abs/2210.03920>`_.
+"""
+
+import pandas as pd
+import numpy as np
+from typing import List, Optional, Union, Tuple
+
+from cleanlab.rank import get_label_quality_scores as main_get_label_quality_scores
+from cleanlab.internal.numerics import softmax
+
+
+
+def get_label_quality_scores(
+    labels: list,
+    pred_probs: list,
+    *,
+    tokens: Optional[list] = None,
+    token_score_method: str = "self_confidence",
+    sentence_score_method: str = "min",
+    sentence_score_kwargs: dict = {},
+) -> Tuple[np.ndarray, list]:
+    """
+    Returns overall quality scores for the labels in each sentence, as well as for the individual tokens' labels in a token classification dataset.
+
+    Each score is between 0 and 1.
+
+    Lower scores indicate token labels that are less likely to be correct, or sentences that are more likely to contain a mislabeled token.
+
+    Parameters
+    ----------
+    labels:
+        Nested list of given labels for all tokens, such that `labels[i]` is a list of labels, one for each token in the `i`-th sentence.
+
+        For a dataset with K classes, each label must be in 0, 1, ..., K-1.
+
+    pred_probs:
+        List of np arrays, such that `pred_probs[i]` has shape ``(T, K)`` if the `i`-th sentence contains T tokens.
+
+        Each row of `pred_probs[i]` corresponds to a token `t` in the `i`-th sentence,
+        and contains model-predicted probabilities that `t` belongs to each of the K possible classes.
+
+        Columns of each `pred_probs[i]` should be ordered such that the probabilities correspond to class 0, 1, ..., K-1.
+
+    tokens:
+        Nested list such that `tokens[i]` is a list of tokens (strings/words) that comprise the `i`-th sentence.
+
+        These strings are used to annotate the returned `token_scores` object; see its documentation for more information.
+
+    sentence_score_method: {"min", "softmin"}, default="min"
+        Method to aggregate individual token label quality scores into a single score for the sentence.
+
+        - `min`: sentence score = minimum of token scores in the sentence
+        - `softmin`: sentence score = ``<s, softmax(1-s, t)>``, where `s` denotes the token label scores of the sentence, and ``<a, b> == np.dot(a, b)``.
+          Here parameter `t` controls the softmax temperature, such that the score converges toward `min` as ``t -> 0``.
+          Unlike `min`, `softmin` is affected by the scores of all tokens in the sentence.
+
+    token_score_method: {"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default="self_confidence"
+        Label quality scoring method for each token.
+
+        See :py:func:`cleanlab.rank.get_label_quality_scores <cleanlab.rank.get_label_quality_scores>` documentation for more info.
+
+    sentence_score_kwargs:
+        Optional keyword arguments for `sentence_score_method` function (for advanced users only).
+        Currently only ``temperature`` (default 0.05) is read, and only when `sentence_score_method` is ``"softmin"``.
+
+        See `~cleanlab.token_classification.rank._softmin_sentence_score` for more info about keyword arguments supported for that scoring method.
+
+    Returns
+    -------
+    sentence_scores:
+        Array of shape ``(N, )`` of scores between 0 and 1, one per sentence in the dataset.
+
+        Lower scores indicate sentences more likely to contain a label issue.
+
+    token_scores:
+        List of ``pd.Series``, such that `token_scores[i]` contains the
+        label quality scores for individual tokens in the `i`-th sentence.
+
+        If `tokens` strings were provided, they are used as index for each ``Series``.
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from cleanlab.token_classification.rank import get_label_quality_scores
+    >>> labels = [[0, 0, 1], [0, 1]]
+    >>> pred_probs = [
+    ...     np.array([[0.9, 0.1], [0.7, 0.3], [0.05, 0.95]]),
+    ...     np.array([[0.8, 0.2], [0.8, 0.2]]),
+    ... ]
+    >>> sentence_scores, token_scores = get_label_quality_scores(labels, pred_probs)
+    >>> sentence_scores
+    array([0.7, 0.2])
+    >>> token_scores
+    [0    0.90
+    1    0.70
+    2    0.95
+    dtype: float64, 0    0.8
+    1    0.2
+    dtype: float64]
+    """
+    methods = ["min", "softmin"]
+    assert sentence_score_method in methods, "Select from the following methods:\n%s" % "\n".join(
+        methods
+    )
+
+    labels_flatten = np.array([l for label in labels for l in label])
+    pred_probs_flatten = np.array([p for pred_prob in pred_probs for p in pred_prob])
+
+    sentence_length = [len(label) for label in labels]
+
+    def nested_list(x, sentence_length):
+        i = iter(x)
+        return [[next(i) for _ in range(length)] for length in sentence_length]
+
+    token_scores = main_get_label_quality_scores(
+        labels=labels_flatten, pred_probs=pred_probs_flatten, method=token_score_method
+    )
+    scores_nl = nested_list(token_scores, sentence_length)
+
+    if sentence_score_method == "min":
+        sentence_scores = np.array(list(map(np.min, scores_nl)))
+    else:
+        assert sentence_score_method == "softmin"
+        temperature = sentence_score_kwargs.get("temperature", 0.05)
+        sentence_scores = _softmin_sentence_score(scores_nl, temperature=temperature)
+
+    if tokens:
+        token_info = [pd.Series(scores, index=token) for scores, token in zip(scores_nl, tokens)]
+    else:
+        token_info = [pd.Series(scores) for scores in scores_nl]
+    return sentence_scores, token_info
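With the default `token_score_method="self_confidence"`, a token's score is the predicted probability of its given label. The default `sentence_score_method="min"` then takes the lowest token score in each sentence. The NumPy-only sketch below implements just those two defaults without calling cleanlab, and it should reproduce the numbers in the docstring example. `self_confidence_min_scores` is a hypothetical name, and the sketch does not cover the other scoring methods.

```python
import numpy as np


def self_confidence_min_scores(labels, pred_probs):
    """Token scores = probability of the given label; sentence score = min token score."""
    token_scores = [
        # Row r picks column sentence_labels[r]: the probability assigned to the given label.
        np.asarray(probs)[np.arange(len(sentence_labels)), sentence_labels]
        for sentence_labels, probs in zip(labels, pred_probs)
    ]
    sentence_scores = np.array([scores.min() for scores in token_scores])
    return sentence_scores, token_scores


labels = [[0, 0, 1], [0, 1]]
pred_probs = [
    np.array([[0.9, 0.1], [0.7, 0.3], [0.05, 0.95]]),
    np.array([[0.8, 0.2], [0.8, 0.2]]),
]
sentence_scores, token_scores = self_confidence_min_scores(labels, pred_probs)
```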
+
+
+def issues_from_scores(
+    sentence_scores: np.ndarray, *, token_scores: Optional[list] = None, threshold: float = 0.1
+) -> Union[list, np.ndarray]:
+    """
+    Converts scores output by `~cleanlab.token_classification.rank.get_label_quality_scores`
+    to a list of issues of similar format as output by :py:func:`token_classification.filter.find_label_issues <cleanlab.token_classification.filter.find_label_issues>`.
+
+    Issues are sorted by label quality score, from most to least severe.
+
+    Only considers as issues those tokens with label quality score lower than `threshold`,
+    so this parameter determines the number of issues that are returned.
+    This method is intended for converting the most severely mislabeled examples to a format compatible with
+    ``summary`` methods like :py:func:`token_classification.summary.display_issues <cleanlab.token_classification.summary.display_issues>`.
+    This method does not estimate the number of label errors since the `threshold` is arbitrary,
+    for that instead use :py:func:`token_classification.filter.find_label_issues <cleanlab.token_classification.filter.find_label_issues>`,
+    which estimates the label errors via Confident Learning rather than score thresholding.
+
+    Parameters
+    ----------
+    sentence_scores:
+        Array of shape `(N, )` of overall sentence scores, where `N` is the number of sentences in the dataset.
+
+        Same format as the `sentence_scores` returned by `~cleanlab.token_classification.rank.get_label_quality_scores`.
+
+    token_scores:
+        Optional list such that `token_scores[i]` contains the individual token scores for the `i`-th sentence.
+
+        Same format as the `token_scores` returned by `~cleanlab.token_classification.rank.get_label_quality_scores`.
+
+    threshold:
+        Tokens (or sentences, if `token_scores` is not provided) with quality scores at or above the `threshold` are not
+        included in the result.
+
+    Returns
+    ---------
+    issues:
+        List of label issues identified by comparing quality scores to threshold, such that each element is a tuple ``(i, j)``, which
+        indicates that the `j`-th token of the `i`-th sentence has a label issue.
+
+        These tuples are ordered in `issues` list based on the token label quality score.
+
+        Use :py:func:`token_classification.summary.display_issues <cleanlab.token_classification.summary.display_issues>`
+        to view these issues within the original sentences.
+
+        If `token_scores` is not provided, returns array of integer indices (rather than tuples) of the sentences whose label quality score
+        falls below the `threshold` (also sorted by overall label quality score of each sentence).
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from cleanlab.token_classification.rank import issues_from_scores
+    >>> sentence_scores = np.array([0.1, 0.3, 0.6, 0.2, 0.05, 0.9, 0.8, 0.0125, 0.5, 0.6])
+    >>> issues_from_scores(sentence_scores)
+    array([7, 4])
+
+    Changing the score threshold
+
+    >>> issues_from_scores(sentence_scores, threshold=0.5)
+    array([7, 4, 0, 3, 1])
+
+    Providing token scores along with sentence scores finds issues at the token level
+
+    >>> token_scores = [
+    ...     [0.9, 0.6],
+    ...     [0.0, 0.8, 0.8],
+    ...     [0.8, 0.8],
+    ...     [0.1, 0.02, 0.3, 0.4],
+    ...     [0.1, 0.2, 0.03, 0.4],
+    ...     [0.1, 0.2, 0.3, 0.04],
+    ...     [0.1, 0.2, 0.4],
+    ...     [0.3, 0.4],
+    ...     [0.08, 0.2, 0.5, 0.4],
+    ...     [0.1, 0.2, 0.3, 0.4],
+    ... ]
+    >>> issues_from_scores(sentence_scores, token_scores=token_scores)
+    [(1, 0), (3, 1), (4, 2), (5, 3), (8, 0)]
+    """
+    if token_scores:
+        issues_with_scores = []
+        for sentence_index, scores in enumerate(token_scores):
+            for token_index, score in enumerate(scores):
+                if score < threshold:
+                    issues_with_scores.append((sentence_index, token_index, score))
+
+        issues_with_scores = sorted(issues_with_scores, key=lambda x: x[2])
+        issues = [(i, j) for i, j, _ in issues_with_scores]
+        return issues
+
+    else:
+        ranking = np.argsort(sentence_scores)
+        cutoff = 0
+        # Check the bound first so the loop cannot index past the end when every score is below threshold.
+        while cutoff < len(ranking) and sentence_scores[ranking[cutoff]] < threshold:
+            cutoff += 1
+        return ranking[:cutoff]
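For sentence-level scores, `issues_from_scores` walks through the sorted scores until one reaches `threshold`. Because the scores increase along the ranking, the sentences below the threshold always form a prefix, so the same cutoff can be computed by counting them. That count can never run past the end of the array, even when every score is below the threshold. The sketch below uses a hypothetical helper name and is an illustration, not the library's implementation.

```python
import numpy as np


def sentence_issues(sentence_scores, threshold=0.1):
    """Indices of sentences scoring below threshold, most severe first."""
    sentence_scores = np.asarray(sentence_scores)
    ranking = np.argsort(sentence_scores)
    # Scores increase along `ranking`, so the ones below threshold form a prefix of it.
    cutoff = int(np.sum(sentence_scores < threshold))
    return ranking[:cutoff]


sentence_scores = np.array([0.1, 0.3, 0.6, 0.2, 0.05, 0.9, 0.8, 0.0125, 0.5, 0.6])
```

Calling `sentence_issues(sentence_scores)` should match the docstring example, and `threshold=1.0` returns all ten sentences.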
+
+
+def _softmin_sentence_score(
+    token_scores: List[np.ndarray], *, temperature: float = 0.05
+) -> np.ndarray:
+    """
+    Sentence overall label quality scoring using the "softmin" method.
+
+    Parameters
+    ----------
+    token_scores:
+        Per-token label quality scores in nested list format,
+        where `token_scores[i]` is a list of scores for each token in the `i`-th sentence.
+
+    temperature:
+        Temperature of the softmax function.
+
+        Lower values encourage this method to converge toward the label quality score of the token with the lowest quality label in the sentence.
+
+        Higher values encourage this method to converge toward the average label quality score of all tokens in the sentence.
+
+        A temperature of 0 returns the minimum token score, and ``np.inf`` returns the mean token score.
+
+    Returns
+    ---------
+    sentence_scores:
+        Array of shape ``(N, )``, where N is the number of sentences in the dataset, with one overall label quality score for each sentence.
+
+    Examples
+    ---------
+    >>> from cleanlab.token_classification.rank import _softmin_sentence_score
+    >>> token_scores = [[0.9, 0.6], [0.0, 0.8, 0.8], [0.8]]
+    >>> _softmin_sentence_score(token_scores)
+    array([6.00741787e-01, 1.80056239e-07, 8.00000000e-01])
+    """
+    if temperature == 0:
+        return np.array([np.min(scores) for scores in token_scores])
+
+    if temperature == np.inf:
+        return np.array([np.mean(scores) for scores in token_scores])
+
+    def fun(scores: np.ndarray) -> float:
+        return np.dot(
+            scores, softmax(x=1 - np.array(scores), temperature=temperature, axis=0, shift=True)
+        )
+
+    sentence_scores = list(map(fun, token_scores))
+    return np.array(sentence_scores)
+
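The softmin aggregation weights each token score `s` by a softmax over `(1 - s) / t`, so that low scores dominate as the temperature `t` shrinks. The sketch below re-derives it in plain NumPy for a single sentence and should reproduce the docstring example. It subtracts the maximum before exponentiating. That is a standard stabilization: it does not change the result, and the sketch assumes it is roughly what `shift=True` does in cleanlab's `softmax`. `softmin_score` is a hypothetical name.

```python
import numpy as np


def softmin_score(scores, temperature=0.05):
    """Softmin aggregation of one sentence's token scores: <s, softmax((1 - s) / t)>."""
    s = np.asarray(scores, dtype=float)
    if temperature == 0:
        return float(s.min())  # limiting case t -> 0
    z = (1 - s) / temperature
    z -= z.max()  # stabilize exp(); softmax is invariant to this shift
    weights = np.exp(z) / np.exp(z).sum()
    return float(np.dot(s, weights))


scores = [softmin_score(s) for s in [[0.9, 0.6], [0.0, 0.8, 0.8], [0.8]]]
```

A very large temperature makes the weights nearly uniform, so the score approaches the mean. For example, `softmin_score([0.9, 0.6], 1e6)` is close to 0.75.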
\ No newline at end of file
diff --git a/v2.6.6/_modules/cleanlab/token_classification/summary.html b/v2.6.6/_modules/cleanlab/token_classification/summary.html
new file mode 100644
index 000000000..9c774679a
--- /dev/null
+++ b/v2.6.6/_modules/cleanlab/token_classification/summary.html
@@ -0,0 +1,1040 @@
+cleanlab.token_classification.summary - cleanlab
cleanlab
Source code for cleanlab.token_classification.summary

+# Copyright (C) 2017-2023  Cleanlab Inc.
+# This file is part of cleanlab.
+#
+# cleanlab is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# cleanlab is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with cleanlab.  If not, see <https://www.gnu.org/licenses/>.
+
+"""
+Methods to display sentences and their label issues in a token classification dataset (text data), as well as summarize the types of issues identified.
+"""
+
+from typing import Any, Dict, List, Optional, Tuple
+
+import numpy as np
+import pandas as pd
+
+from cleanlab.internal.token_classification_utils import color_sentence, get_sentence
+
+
+
def display_issues( + issues: list, + tokens: List[List[str]], + *, + labels: Optional[list] = None, + pred_probs: Optional[list] = None, + exclude: List[Tuple[int, int]] = [], + class_names: Optional[List[str]] = None, + top: int = 20, +) -> None: + """ + Display token classification label issues, showing each sentence with the problematic token(s) highlighted. + + Can also show the given and predicted label for each token identified as having a label issue. + + Parameters + ---------- + issues: + List of tuples ``(i, j)`` representing a label issue for the `j`-th token of the `i`-th sentence. + + Same format as output by :py:func:`token_classification.filter.find_label_issues <cleanlab.token_classification.filter.find_label_issues>` + or :py:func:`token_classification.rank.issues_from_scores <cleanlab.token_classification.rank.issues_from_scores>`. + + tokens: + Nested list such that `tokens[i]` is a list of tokens (strings/words) that comprise the `i`-th sentence. + + labels: + Optional nested list of given labels for all tokens, such that `labels[i]` is a list of labels, one for each token in the `i`-th sentence. + For a dataset with K classes, each label must be in 0, 1, ..., K-1. + + If `labels` is provided, this function also displays the given label of each token identified as having an issue. + + pred_probs: + Optional list of np arrays, such that `pred_probs[i]` has shape ``(T, K)`` if the `i`-th sentence contains T tokens. + + Each row of `pred_probs[i]` corresponds to a token `t` in the `i`-th sentence, + and contains model-predicted probabilities that `t` belongs to each of the K possible classes. + + Columns of each `pred_probs[i]` should be ordered such that the probabilities correspond to class 0, 1, ..., K-1. + + If `pred_probs` is provided, this function also displays the predicted label of each token identified as having an issue. + + exclude: + Optional list of given/predicted label swaps (tuples) to be ignored.
For example, if `exclude=[(0, 1), (1, 0)]`, + tokens whose label was likely swapped between class 0 and 1 are not displayed. Class labels must be in 0, 1, ..., K-1. + + class_names: + Optional length K list of names of each class, such that `class_names[i]` is the string name of the class corresponding to `labels` with value `i`. + + If `class_names` is provided, display these string names for predicted and given labels, otherwise display the integer index of classes. + + top: int, default=20 + Maximum number of issues to be printed. + + Examples + -------- + >>> from cleanlab.token_classification.summary import display_issues + >>> issues = [(2, 0), (0, 1)] + >>> tokens = [ + ... ["A", "?weird", "sentence"], + ... ["A", "valid", "sentence"], + ... ["An", "sentence", "with", "a", "typo"], + ... ] + >>> display_issues(issues, tokens) + Sentence index: 2, Token index: 0 + Token: An + ---- + An sentence with a typo + <BLANKLINE> + <BLANKLINE> + Sentence index: 0, Token index: 1 + Token: ?weird + ---- + A ?weird sentence + """ + if not class_names and (labels or pred_probs): + print( + "Classes will be printed in terms of their integer index since `class_names` was not provided.\n" + "Specify this argument to see the string names of each class.\n" + ) + + top = min(top, len(issues)) + shown = 0 + is_tuple = isinstance(issues[0], tuple) + + for issue in issues: + if is_tuple: + i, j = issue + sentence = get_sentence(tokens[i]) + word = tokens[i][j] + + if pred_probs: + prediction = pred_probs[i][j].argmax() + if labels: + given = labels[i][j] + if pred_probs and labels: + if (given, prediction) in exclude: + continue + + if pred_probs and class_names: + prediction = class_names[prediction] + if labels and class_names: + given = class_names[given] + + shown += 1 + print(f"Sentence index: {i}, Token index: {j}") + print(f"Token: {word}") + if labels and not pred_probs: + print(f"Given label: {given}") + elif not labels and pred_probs: + print(f"Predicted label according 
to provided pred_probs: {prediction}") + elif labels and pred_probs: + print( + f"Given label: {given}, predicted label according to provided pred_probs: {prediction}" + ) + print("----") + print(color_sentence(sentence, word)) + else: + shown += 1 + sentence = get_sentence(tokens[issue]) + print(f"Sentence issue: {sentence}") + if shown == top: + break + print("\n")
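`display_issues` applies `exclude` only when both `labels` and `pred_probs` are given. In that case, an issue is skipped when its (given label, argmax of `pred_probs`) pair is listed in `exclude`. The standalone sketch below isolates that filtering step. The helper name `drop_excluded_swaps` is hypothetical and is not part of cleanlab.

```python
from typing import List, Sequence, Tuple

import numpy as np


def drop_excluded_swaps(
    issues: List[Tuple[int, int]],
    labels: List[List[int]],
    pred_probs: List[np.ndarray],
    exclude: Sequence[Tuple[int, int]] = (),
) -> List[Tuple[int, int]]:
    """Keep only issues whose (given, predicted) label pair is not in `exclude`."""
    excluded = set(exclude)
    kept = []
    for i, j in issues:
        given = labels[i][j]
        predicted = int(np.argmax(pred_probs[i][j]))  # same argmax rule as display_issues
        if (given, predicted) not in excluded:
            kept.append((i, j))
    return kept


labels = [[0, 1, 0], [1, 1]]
pred_probs = [
    np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]),
    np.array([[0.4, 0.6], [0.6, 0.4]]),
]
issues = [(0, 1), (0, 2), (1, 1)]
print(drop_excluded_swaps(issues, labels, pred_probs, exclude=[(1, 0)]))  # [(0, 2)]
```

Using a set makes each membership check constant-time, which matters only when `exclude` lists many label pairs.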
+ + +
def common_label_issues( + issues: List[Tuple[int, int]], + tokens: List[List[str]], + *, + labels: Optional[list] = None, + pred_probs: Optional[list] = None, + class_names: Optional[List[str]] = None, + top: int = 10, + exclude: List[Tuple[int, int]] = [], + verbose: bool = True, +) -> pd.DataFrame: + """ + Display the tokens (words) that most commonly have label issues. + + These may correspond to words that are ambiguous or systematically misunderstood by the data annotators. + + Parameters + ---------- + issues: + List of tuples ``(i, j)`` representing a label issue for the `j`-th token of the `i`-th sentence. + + Same format as output by :py:func:`token_classification.filter.find_label_issues <cleanlab.token_classification.filter.find_label_issues>` + or :py:func:`token_classification.rank.issues_from_scores <cleanlab.token_classification.rank.issues_from_scores>`. + + tokens: + Nested list such that `tokens[i]` is a list of tokens (strings/words) that comprise the `i`-th sentence. + + labels: + Optional nested list of given labels for all tokens in the same format as `labels` for `~cleanlab.token_classification.summary.display_issues`. + + If `labels` is provided together with `pred_probs`, this function also displays the given label of each token identified to commonly suffer from label issues. + + pred_probs: + Optional list of model-predicted probabilities (np arrays) in the same format as `pred_probs` for + `~cleanlab.token_classification.summary.display_issues`. + + If both `labels` and `pred_probs` are provided, this function also reports each type of given/predicted label swap for tokens identified to commonly suffer from label issues. + + class_names: + Optional length K list of names of each class, such that `class_names[i]` is the string name of the class corresponding to `labels` with value `i`. + + If `class_names` is provided, display these string names for predicted and given labels, otherwise display the integer index of classes.
+ + top: + Maximum number of tokens to print information for. + + exclude: + Optional list of given/predicted label swaps (tuples) to be ignored in the same format as `exclude` for + `~cleanlab.token_classification.summary.display_issues`. + + verbose: + Whether to also print out the token information in the returned DataFrame `df`. + + Returns + ------- + df: + If both `labels` and `pred_probs` are provided, DataFrame `df` contains columns ``['token', 'given_label', + 'predicted_label', 'num_label_issues']``, and each row contains information for a specific token and + given/predicted label swap, ordered by the number of label issues inferred for this type of label swap. + + Otherwise, `df` only has columns ['token', 'num_label_issues'], and each row contains the information for a specific + token, ordered by the number of total label issues involving this token. + + Examples + -------- + >>> from cleanlab.token_classification.summary import common_label_issues + >>> issues = [(2, 0), (0, 1)] + >>> tokens = [ + ... ["A", "?weird", "sentence"], + ... ["A", "valid", "sentence"], + ... ["An", "sentence", "with", "a", "typo"], + ... 
] + >>> df = common_label_issues(issues, tokens) + Token '?weird' is potentially mislabeled 1 times throughout the dataset + <BLANKLINE> + Token 'An' is potentially mislabeled 1 times throughout the dataset + <BLANKLINE> + >>> df + token num_label_issues + 0 An 1 + 1 ?weird 1 + """ + count: Dict[str, Any] = {} + if not labels or not pred_probs: + for issue in issues: + i, j = issue + word = tokens[i][j] + if word not in count: + count[word] = 0 + count[word] += 1 + + words = [word for word in count.keys()] + freq = [count[word] for word in words] + rank = np.argsort(freq)[::-1][:top] + + for r in rank: + print( + f"Token '{words[r]}' is potentially mislabeled {freq[r]} times throughout the dataset\n" + ) + + info = [[word, f] for word, f in zip(words, freq)] + info = sorted(info, key=lambda x: x[1], reverse=True) + return pd.DataFrame(info, columns=["token", "num_label_issues"]) + + if not class_names: + print( + "Classes will be printed in terms of their integer index since `class_names` was not provided. " + ) + print("Specify this argument to see the string names of each class. 
\n") + + n = pred_probs[0].shape[1] + for issue in issues: + i, j = issue + word = tokens[i][j] + label = labels[i][j] + pred = pred_probs[i][j].argmax() + if word not in count: + count[word] = np.zeros([n, n], dtype=int) + if (label, pred) not in exclude: + count[word][label][pred] += 1 + words = [word for word in count.keys()] + freq = [np.sum(count[word]) for word in words] + rank = np.argsort(freq)[::-1][:top] + + for r in rank: + matrix = count[words[r]] + most_frequent = np.argsort(count[words[r]].flatten())[::-1] + print( + f"Token '{words[r]}' is potentially mislabeled {freq[r]} times throughout the dataset" + ) + if verbose: + print( + "---------------------------------------------------------------------------------------" + ) + for f in most_frequent: + i, j = f // n, f % n + if matrix[i][j] == 0: + break + if class_names: + print( + f"labeled as class `{class_names[i]}` but predicted to actually be class `{class_names[j]}` {matrix[i][j]} times" + ) + else: + print( + f"labeled as class {i} but predicted to actually be class {j} {matrix[i][j]} times" + ) + print() + info = [] + for word in words: + for i in range(n): + for j in range(n): + num = count[word][i][j] + if num > 0: + if not class_names: + info.append([word, i, j, num]) + else: + info.append([word, class_names[i], class_names[j], num]) + info = sorted(info, key=lambda x: x[3], reverse=True) + return pd.DataFrame( + info, columns=["token", "given_label", "predicted_label", "num_label_issues"] + )
+ + +
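When both `labels` and `pred_probs` are provided, `common_label_issues` tallies how often each (given, predicted) label pair occurs for each flagged token and returns one row per (token, swap). The sketch below reproduces that tally with `collections.Counter` and pandas instead of per-token K×K matrices. The name `count_label_swaps` is hypothetical. Unlike the function above, the sketch neither prints anything nor applies `top`.

```python
from collections import Counter
from typing import List, Sequence, Tuple

import numpy as np
import pandas as pd


def count_label_swaps(
    issues: List[Tuple[int, int]],
    tokens: List[List[str]],
    labels: List[List[int]],
    pred_probs: List[np.ndarray],
    exclude: Sequence[Tuple[int, int]] = (),
) -> pd.DataFrame:
    """Count (token, given, predicted) triples among flagged issues (illustrative sketch)."""
    excluded = set(exclude)
    counts: Counter = Counter()
    for i, j in issues:
        given = labels[i][j]
        predicted = int(np.argmax(pred_probs[i][j]))  # model's predicted class for this token
        if (given, predicted) in excluded:
            continue
        counts[(tokens[i][j], given, predicted)] += 1
    rows = [[tok, g, p, n] for (tok, g, p), n in counts.items()]
    df = pd.DataFrame(
        rows, columns=["token", "given_label", "predicted_label", "num_label_issues"]
    )
    # mergesort is stable, so ties keep the order in which they were first seen
    return df.sort_values("num_label_issues", ascending=False, kind="mergesort").reset_index(
        drop=True
    )


tokens = [["New", "York", "is", "big"], ["York", "rocks"]]
labels = [[1, 1, 0, 0], [1, 0]]
pred_probs = [
    np.array([[0.2, 0.8], [0.9, 0.1], [0.7, 0.3], [0.6, 0.4]]),
    np.array([[0.8, 0.2], [0.3, 0.7]]),
]
issues = [(0, 1), (1, 0), (1, 1)]
print(count_label_swaps(issues, tokens, labels, pred_probs))
```

Keying the counter on the full triple yields the per-swap rows directly. The K×K matrix in the original code is more convenient when you also want a per-token total across all swaps.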
def filter_by_token( + token: str, issues: List[Tuple[int, int]], tokens: List[List[str]] +) -> List[Tuple[int, int]]: + """ + Return the subset of label issues involving a particular token (compared case-insensitively). + + Parameters + ---------- + token: + A specific token you are interested in. + + issues: + List of tuples ``(i, j)`` representing a label issue for the `j`-th token of the `i`-th sentence. + Same format as output by :py:func:`token_classification.filter.find_label_issues <cleanlab.token_classification.filter.find_label_issues>` + or :py:func:`token_classification.rank.issues_from_scores <cleanlab.token_classification.rank.issues_from_scores>`. + + tokens: + Nested list such that `tokens[i]` is a list of tokens (strings/words) that comprise the `i`-th sentence. + + Returns + ---------- + issues_subset: + List of tuples ``(i, j)`` representing a label issue for the `j`-th token of the `i`-th sentence, in the same format as `issues`, + but restricted to only those issues that involve the specified `token`. + + Examples + -------- + >>> from cleanlab.token_classification.summary import filter_by_token + >>> token = "?weird" + >>> issues = [(2, 0), (0, 1)] + >>> tokens = [ + ... ["A", "?weird", "sentence"], + ... ["A", "valid", "sentence"], + ... ["An", "sentence", "with", "a", "typo"], + ... ] + >>> filter_by_token(token, issues, tokens) + [(0, 1)] + """ + returned_issues = [] + for issue in issues: + i, j = issue + if token.lower() == tokens[i][j].lower(): + returned_issues.append(issue) + return returned_issues
+
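`filter_by_token` rescans the full `issues` list on every call. If you plan to look up many tokens, one option is to group the issues once by lower-cased token, which matches the case-insensitive comparison above. This is a sketch, not a cleanlab API, and `index_issues_by_token` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def index_issues_by_token(
    issues: List[Tuple[int, int]], tokens: List[List[str]]
) -> Dict[str, List[Tuple[int, int]]]:
    """Group issues by lower-cased token so later look-ups avoid rescanning `issues`."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for i, j in issues:
        index[tokens[i][j].lower()].append((i, j))
    return dict(index)


tokens = [
    ["A", "?weird", "sentence"],
    ["A", "valid", "sentence"],
    ["An", "sentence", "with", "a", "typo"],
]
issues = [(2, 0), (0, 1)]
index = index_issues_by_token(issues, tokens)
print(index.get("?weird", []))  # [(0, 1)]
```

Look-ups then use `index.get(token.lower(), [])`. Within each list, issues keep their original order in `issues`, just as in `filter_by_token`.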
\ No newline at end of file diff --git a/v2.6.6/_modules/index.html b/v2.6.6/_modules/index.html new file mode 100644 index 000000000..2b0b3795a --- /dev/null +++ b/v2.6.6/_modules/index.html @@ -0,0 +1,739 @@ + Overview: module code - cleanlab
All modules for which code is available

+ + + + + + + + + + + + \ No newline at end of file diff --git a/v2.6.6/_sources/cleanlab/benchmarking/index.rst b/v2.6.6/_sources/cleanlab/benchmarking/index.rst new file mode 100644 index 000000000..7a2e1607d --- /dev/null +++ b/v2.6.6/_sources/cleanlab/benchmarking/index.rst @@ -0,0 +1,12 @@ +benchmarking +============ + +.. automodule:: cleanlab.benchmarking + :autosummary: + :members: + :undoc-members: + :show-inheritance: + +.. toctree:: + + noise_generation diff --git a/v2.6.6/_sources/cleanlab/benchmarking/noise_generation.rst b/v2.6.6/_sources/cleanlab/benchmarking/noise_generation.rst new file mode 100644 index 000000000..d408ad79c --- /dev/null +++ b/v2.6.6/_sources/cleanlab/benchmarking/noise_generation.rst @@ -0,0 +1,8 @@ +noise_generation +================ + +.. automodule:: cleanlab.benchmarking.noise_generation + :autosummary: + :members: + :undoc-members: + :show-inheritance: diff --git a/v2.6.6/_sources/cleanlab/classification.rst b/v2.6.6/_sources/cleanlab/classification.rst new file mode 100644 index 000000000..cf4430548 --- /dev/null +++ b/v2.6.6/_sources/cleanlab/classification.rst @@ -0,0 +1,8 @@ +classification +============== + +.. automodule:: cleanlab.classification + :autosummary: + :members: + :undoc-members: + :show-inheritance: \ No newline at end of file diff --git a/v2.6.6/_sources/cleanlab/count.rst b/v2.6.6/_sources/cleanlab/count.rst new file mode 100644 index 000000000..33f743584 --- /dev/null +++ b/v2.6.6/_sources/cleanlab/count.rst @@ -0,0 +1,8 @@ +count +===== + +.. automodule:: cleanlab.count + :autosummary: + :members: + :undoc-members: + :show-inheritance: \ No newline at end of file diff --git a/v2.6.6/_sources/cleanlab/data_valuation.rst b/v2.6.6/_sources/cleanlab/data_valuation.rst new file mode 100644 index 000000000..8a05136b1 --- /dev/null +++ b/v2.6.6/_sources/cleanlab/data_valuation.rst @@ -0,0 +1,8 @@ +data_valuation +============== + +.. 
automodule:: cleanlab.data_valuation + :autosummary: + :members: + :undoc-members: + :show-inheritance: \ No newline at end of file diff --git a/v2.6.6/_sources/cleanlab/datalab/datalab.rst b/v2.6.6/_sources/cleanlab/datalab/datalab.rst new file mode 100644 index 000000000..8a38a27f9 --- /dev/null +++ b/v2.6.6/_sources/cleanlab/datalab/datalab.rst @@ -0,0 +1,9 @@ +datalab +======= + +.. automodule:: cleanlab.datalab.datalab + :autosummary: + :members: + :undoc-members: + :show-inheritance: + :ignore-module-all: \ No newline at end of file diff --git a/v2.6.6/_sources/cleanlab/datalab/guide/_templates/issue_types_tip.rst b/v2.6.6/_sources/cleanlab/datalab/guide/_templates/issue_types_tip.rst new file mode 100644 index 000000000..5b8ee6144 --- /dev/null +++ b/v2.6.6/_sources/cleanlab/datalab/guide/_templates/issue_types_tip.rst @@ -0,0 +1,10 @@ +.. tip:: + + This type of issue has the issue name `"{{issue_name}}"`. + + Run a check for this particular kind of issue by calling :py:meth:`Datalab.find_issues() ` like so: + + .. code-block:: python + + # `lab` is a Datalab instance + lab.find_issues(..., issue_types = {"{{issue_name}}": {}}) diff --git a/v2.6.6/_sources/cleanlab/datalab/guide/custom_issue_manager.rst b/v2.6.6/_sources/cleanlab/datalab/guide/custom_issue_manager.rst new file mode 100644 index 000000000..dd7ddcc09 --- /dev/null +++ b/v2.6.6/_sources/cleanlab/datalab/guide/custom_issue_manager.rst @@ -0,0 +1,225 @@ +.. _issue_manager_creating_your_own: + +Creating Your Own Issues Manager +================================ + + + +This guide walks through the process of creating your own +:py:class:`IssueManager ` +to detect a custom-defined type of issue alongside the pre-defined issue types in +:py:class:`Datalab `. + +.. seealso:: + + - :py:meth:`register `: + You can either use this function at runtime to register a new issue manager: + + .. 
code-block:: python + + from cleanlab.datalab.internal.issue_manager_factory import register + register(MyIssueManager) # Defaults to task="classification" + # register(MyIssueManagerForRegression, task="regression") # Alternative for regression tasks + + or add as a decorator to the class definition (currently only works for classification tasks): + + .. code-block:: python + + @register + class MyIssueManager(IssueManager): + ... + +Prerequisites +------------- + +As a starting point for this guide, we'll import what the next section needs and create a dummy dataset. + +.. note:: + + .. include:: ../optional_dependencies.rst + +.. code-block:: python + + + import numpy as np + import pandas as pd + from cleanlab import IssueManager + + # Create a dummy dataset + N = 20 + data = pd.DataFrame( + { + "text": [f"example {i}" for i in range(N)], + "label": np.random.randint(0, 2, N), + }, + ) + + +Implementing IssueManagers +-------------------------- + +.. _basic_issue_manager: + +Basic Issue Check +~~~~~~~~~~~~~~~~~ + + +To create a basic issue manager, inherit from the +:py:class:`IssueManager ` class, +name the issue via the class variable `issue_name`, and implement the +:py:meth:`find_issues ` method. + +The :py:meth:`find_issues ` +method should flag, with a boolean array, whether each example in the dataset exhibits the issue. +It should also provide a score for each example in the dataset that quantifies the quality of the example +with regard to the issue. + +.. code-block:: python + + class Basic(IssueManager): + # Assign a name to the issue + issue_name = "basic" + def find_issues(self, **kwargs) -> None: + # Compute scores for each example + scores = np.random.rand(len(self.datalab.data)) + + # Construct a dataframe where examples are marked for issues + # and the score for each example is included.
self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue" : scores < 0.1, + self.issue_score_key : scores, + }, + ) + + # Score the dataset as a whole based on this issue type + self.summary = self.make_summary(score = scores.mean()) + + +.. _intermediate_issue_manager: + +Intermediate Issue Check +~~~~~~~~~~~~~~~~~~~~~~~~ + + +To create an intermediate issue manager: + +- Perform the same steps as in the :ref:`basic issue check ` section. +- Populate the `info` attribute with a dictionary of information about the identified issues. + +Any of this information can be included in reports generated by :py:class:`Datalab `, +if you add its key to the `verbosity_levels` class attribute. +Optionally, you can also describe the type of issue this issue manager handles in the `description` class attribute. + +.. code-block:: python + + class Intermediate(IssueManager): + issue_name = "intermediate" + # Add a dictionary of information to include in the report + verbosity_levels = { + 0: [], + 1: ["std"], + 2: ["raw_scores"], + } + # Add a description of the issue + description = "Intermediate issues are a bit more involved than basic issues." + def find_issues(self, *, intermediate_arg: int, **kwargs) -> None: + N = len(self.datalab.data) + raw_scores = np.random.rand(N) + std = raw_scores.std() + threshold = min(0, raw_scores.mean() - std) + sin_filter = np.sin(intermediate_arg * np.arange(N) / N) + kernel = sin_filter ** 2 + scores = kernel * raw_scores + self.issues = pd.DataFrame( + { + f"is_{self.issue_name}_issue" : scores < threshold, + self.issue_score_key : scores, + }, + ) + self.summary = self.make_summary(score = scores.mean()) + + # Useful information that will be available in the Datalab instance + self.info = { + "std": std, + "raw_scores": raw_scores, + "kernel": kernel, + } + +Advanced Issue Check +~~~~~~~~~~~~~~~~~~~~ + +There could be different types of issues detected in a dataset.
A *local* issue affects individual data points and can be tracked via the `Datalab.issues` dataframe (to see which data points exhibit this type of issue). A *global* issue, by contrast, affects the dataset as a whole and is not easily attributable to individual data points (it is hard to say that one data point exhibits the issue while another does not). Even for global issues, we recommend assigning a per-data-point score (and boolean flag) where possible; see the Non-IID IssueManager for an example. Note that a global issue must have `num_issues` greater than 0 in its `issue_summary`, otherwise it won't show up in `Datalab.report()` by default. + + +Use with Datalab +---------------- + +We can create a +:py:class:`Datalab ` +instance and run issue checks with the custom issue managers we created like so: + + +.. code-block:: python + + from cleanlab.datalab.internal.issue_manager_factory import register + from cleanlab import Datalab + + + # Register the custom issue managers + for issue_manager in [Basic, Intermediate]: + register(issue_manager) + + # Create a Datalab instance + datalab = Datalab(data, label_name="label") + + # Run the issue check + issue_types = {"basic": {}, "intermediate": {"intermediate_arg": 2}} + datalab.find_issues(issue_types=issue_types) + + # Print report + datalab.report(verbosity=0) + + +The report will look something like this: + +.. code-block:: text + + Here is a summary of the different kinds of issues found in the data: + + issue_type score num_issues + basic 0.477762 2 + intermediate 0.286455 0 + + (Note: A lower score indicates a more severe issue across all examples in the dataset.)
+ + + ------------------------------------------- basic issues ------------------------------------------- + + Number of examples with this issue: 2 + Overall dataset quality in terms of this issue: 0.4778 + + Examples representing most severe instances of this issue: + is_basic_issue basic_score + 13 True 0.003042 + 8 True 0.058117 + 11 False 0.121908 + 15 False 0.169312 + 17 False 0.229044 + + + --------------------------------------- intermediate issues ---------------------------------------- + + About this issue: + Intermediate issues are a bit more involved than basic issues. + + Number of examples with this issue: 0 + Overall dataset quality in terms of this issue: 0.2865 + + Examples representing most severe instances of this issue: + is_intermediate_issue intermediate_score kernel + 0 False 0.000000 0.0 + 1 False 0.007059 0.009967 + 3 False 0.010995 0.087332 + 2 False 0.016296 0.03947 + 11 False 0.019459 0.794251 diff --git a/v2.6.6/_sources/cleanlab/datalab/guide/generating_cluster_ids.rst b/v2.6.6/_sources/cleanlab/datalab/guide/generating_cluster_ids.rst new file mode 100644 index 000000000..5209fc8b3 --- /dev/null +++ b/v2.6.6/_sources/cleanlab/datalab/guide/generating_cluster_ids.rst @@ -0,0 +1,29 @@ +Generating Cluster IDs +====================== + +The underperforming group issue manager provides the option for passing pre-computed +cluster IDs to `find_issues`. These cluster IDs can be obtained by clustering +the features using algorithms such as K-Means, DBSCAN, HDBSCAN etc. Note that + +* K-Means requires specifying the number of clusters explicitly. +* DBSCAN is sensitive to the choice of `eps` (radius) and `min_samples` (minimum samples for each cluster). + + +Example: + +.. 
code-block:: python + + from cleanlab import Datalab + from sklearn.cluster import KMeans + features, labels = your_data() # Placeholder: get features and labels + pred_probs = get_pred_probs() # Placeholder: get predicted probabilities for all samples + # Group features into 5 clusters + clusterer = KMeans(n_clusters=5) + clusterer.fit(features) + cluster_ids = clusterer.labels_ + lab = Datalab(data={"features": features, "y": labels}, label_name="y") + issue_types = {"underperforming_group": {"cluster_ids": cluster_ids}} + lab.find_issues(features=features, pred_probs=pred_probs, issue_types=issue_types) + + + diff --git a/v2.6.6/_sources/cleanlab/datalab/guide/index.rst b/v2.6.6/_sources/cleanlab/datalab/guide/index.rst new file mode 100644 index 000000000..de902196a --- /dev/null +++ b/v2.6.6/_sources/cleanlab/datalab/guide/index.rst @@ -0,0 +1,41 @@ +Datalab guides +============== + +Guides for using Datalab and understanding the issues it detects. + +.. note:: + + .. include:: ../optional_dependencies.rst + + +Types of issues +--------------- + +Guides for using Datalab with greater control: selecting which issues to search for and which non-default settings to use when detecting them. + +.. toctree:: + :maxdepth: 3 + + issue_type_description + +Customizing issue types +----------------------- + +Guides (for developers) on creating a custom issue type that Datalab audits for alongside its built-in issue types. + +.. toctree:: + :maxdepth: 3 + + custom_issue_manager + + +Cleanlab Studio (Easy Mode) +--------------------------- + +`Cleanlab Studio `_ is a fully automated platform that can detect the same data issues as this package, as well as `many more types of issues `_, all without you having to do any Machine Learning (or even write any code).
Beyond being 100x faster to use and producing more useful results, `Cleanlab Studio `_ also provides an intelligent data correction interface for you to quickly fix the issues detected in your dataset (a single data scientist can fix millions of data points thanks to AI suggestions). + +`Cleanlab Studio `_ offers a powerful AutoML system (with Foundation models) that is useful for more than improving data quality. With a few clicks, you can: find + fix issues in your dataset, identify the best type of ML model and train/tune it, and deploy this model to serve accurate predictions for new data. Also use the same AutoML to auto-label large datasets (a single user can label millions of data points thanks to powerful Foundation models). `Try Cleanlab Studio for free! `_ + +.. image:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanlab/ml-with-cleanlab-studio.png + :width: 800 + :alt: Stages of modern AI pipeline that can now be automated with Cleanlab Studio diff --git a/v2.6.6/_sources/cleanlab/datalab/guide/issue_type_description.rst b/v2.6.6/_sources/cleanlab/datalab/guide/issue_type_description.rst new file mode 100644 index 000000000..ee6f09fe8 --- /dev/null +++ b/v2.6.6/_sources/cleanlab/datalab/guide/issue_type_description.rst @@ -0,0 +1,748 @@ +Datalab Issue Types +******************* + + +Types of issues Datalab can detect +=================================== + +This page describes the various types of issues that Datalab can detect in a dataset. +For each type of issue, we explain: what it says about your data if detected, why this matters, and what parameters you can optionally specify to control the detection of this issue. + +In case you didn't know: you can alternatively use `Cleanlab Studio `_ to detect the same data issues as this package, plus `many more types of issues `_, all without having to do any Machine Learning (or even write any code). + + +.. 
include:: table.rst + + +Estimates for Each Issue Type +------------------------------ + +Datalab produces three estimates for **each** type of issue (called say `` here): + + +1. A numeric quality score `_score` (between 0 and 1) estimating how severely each example in the dataset exhibits this issue. Examples with higher scores are less likely to suffer from this issue. Access these via the :py:attr:`Datalab.issues ` attribute or the :py:meth:`Datalab.get_issues(\) ` method. +2. A Boolean `is__issue` flag for each example in the dataset. Examples where this has value `True` are those estimated to exhibit this issue. Access these via the :py:attr:`Datalab.issues ` attribute or the :py:meth:`Datalab.get_issues(\) ` method. +3. An overall dataset quality score (between 0 and 1), quantifying how severe this issue is overall across the entire dataset. Datasets with higher scores do not exhibit this issue as badly overall. Access these via the :py:attr:`Datalab.issue_summary ` attribute or the :py:meth:`Datalab.get_issue_summary(\) ` method. + +**Example (for the outlier issue type)** + +.. code-block:: python + + issue_name = "outlier" # how to reference the outlier issue type in code + issue_score = "outlier_score" # name of column with quality scores for the outlier issue type, atypical datapoints receive lower scores + is_issue = "is_outlier_issue" # name of Boolean column flagging which datapoints are considered outliers in the dataset + +**Dataset vs. data point level issues** + +Some issues are primarily about the overall dataset (e.g. non-IID, class imbalance, underperforming group), whereas others are primarily about individual examples (e.g. label issue, outlier, near duplicate, null, etc.). The former issue types should first be investigated via the global score from :py:meth:`Datalab.get_issue_summary `, as the per-example results for such issues from :py:meth:`Datalab.get_issues ` require more expertise to interpret.
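The triage order recommended above (dataset-level scores first, then per-example flags) can be sketched with plain pandas. Both frames below are toy stand-ins with made-up values, not Datalab output. The summary columns mirror the `issue_type`/`score`/`num_issues` layout of the Datalab report, and the per-example column names follow the outlier naming pattern above. With a real `Datalab` instance, these frames would come from `get_issue_summary` and `get_issues`.

```python
import pandas as pd

# Toy stand-in for a dataset-level summary (illustrative values, not real Datalab output)
summary = pd.DataFrame(
    {
        "issue_type": ["label", "outlier", "non_iid"],
        "score": [0.82, 0.95, 0.31],
        "num_issues": [7, 3, 1],
    }
)
# Lower dataset-level score means a more severe issue, so look at those first
triage_order = summary.sort_values("score")["issue_type"].tolist()
print(triage_order)  # ['non_iid', 'label', 'outlier']

# Per-example columns follow the naming pattern from the outlier example
issue_name = "outlier"
score_col = f"{issue_name}_score"  # "outlier_score"
flag_col = f"is_{issue_name}_issue"  # "is_outlier_issue"

# Toy stand-in for per-example results (illustrative values only)
per_example = pd.DataFrame(
    {flag_col: [False, True, False, True], score_col: [0.91, 0.05, 0.64, 0.12]}
)
# Flagged examples, most severe (lowest score) first
flagged = per_example[per_example[flag_col]].sort_values(score_col)
print(flagged.index.tolist())  # [1, 3]
```

Building the column names from `issue_name` lets the same code handle any issue type that follows this naming convention.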
+ +Inputs to Datalab +----------------- + +Datalab estimates various issues based on the four inputs below. +Each input is optional; if you do not provide one, Datalab will skip checks for the types of issues that require it. + +1. ``label_name`` - a field in the dataset that stores the annotated class for each example in a multi-class classification dataset. +2. ``pred_probs`` - predicted class probabilities output by your trained model for each example in the dataset (these should be out-of-sample, e.g. produced via cross-validation). +3. ``features`` - numeric vector representations of the features for each example in the dataset. These may be embeddings from a (pre)trained model, or just a numerically-transformed version of the original data features. +4. ``knn_graph`` - K nearest neighbor graph represented as a sparse matrix of dissimilarity values between examples in the dataset. If both `knn_graph` and `features` are provided, the `knn_graph` takes precedence, and if only `features` is provided, then a `knn_graph` is internally constructed based on the (either Euclidean or cosine) distance between different examples’ features. + + +Label Issue +----------- + +Examples whose given label is estimated to be potentially incorrect (e.g. due to annotation error) are flagged as having label issues. +Datalab estimates which examples appear mislabeled as well as a numeric label quality score for each, which quantifies the likelihood that an example is correctly labeled. + +For now, Datalab can only detect label issues in multi-class classification datasets, regression datasets, and multi-label classification datasets. +The cleanlab library has alternative methods you can use to detect label issues in other types of datasets (multi-annotator, token classification, etc.). + +Label issues are calculated based on provided `pred_probs` from a trained model.
If you do not provide this argument but do provide `features`, then a K nearest neighbor model will be fit to produce `pred_probs` based on your `features`. Otherwise, if neither `pred_probs` nor `features` is provided, this type of issue will not be considered. +For the most accurate results, provide out-of-sample `pred_probs`, which can be obtained for a dataset via `cross-validation `_. + +Having mislabeled examples in your dataset may hamper the performance of supervised learning models you train on this data. +For evaluating models or performing other types of data analytics, mislabeled examples may lead you to draw incorrect conclusions. +To handle mislabeled examples, you can either filter out the data with label issues or try to correct their labels. + +Learn more about the method used to detect label issues in our paper: `Confident Learning: Estimating Uncertainty in Dataset Labels `_ + +.. testsetup:: * + + import numpy as np + from cleanlab import Datalab + from sklearn.linear_model import LogisticRegression + from sklearn.model_selection import cross_val_predict + + # Generate a small synthetic dataset + np.random.seed(0) + + X = np.random.rand(100, 10) + X[-1] = X[-2] # Copy the features of example -2 into example -1 + y = np.random.randint(0, 3, 100) + + X[y == 1] -= 0.85 # Shift the features of class 1 away from the other classes + X[y == 2] += 0.85 # Shift the features of class 2 away from the other classes + + y[-3] = {0: 1, 1: 2, 2: 0}[y[-3]] # Change the label of the example at index -3 to a different class + + clf = LogisticRegression(random_state=0) + pred_probs = cross_val_predict(clf, X, y, cv=3, method="predict_proba") + + data = {"features": X, "labels": y} + + lab = Datalab(data, label_name="labels", task="classification") + +.. testsetup:: + + lab.find_issues(features=X, pred_probs=pred_probs) + lab.find_issues(features=X, pred_probs=pred_probs, issue_types={"data_valuation": {}}) + +Some metadata about label issues is stored in the `issues` attribute of the Datalab object. +Let's look at one way to access this information.
+ +.. testcode:: + + lab.get_issues("label").sort_values("label_score").head(5) + +The output will look something like this: + +.. testoutput:: + + is_label_issue label_score given_label predicted_label + 97 True 0.064045 0 2 + 58 False 0.680894 2 2 + 41 False 0.746043 0 0 + 4 False 0.794894 2 2 + 98 False 0.802911 1 1 + +``is_label_issue`` +~~~~~~~~~~~~~~~~~~ + +A boolean column that flags examples with label issues. +If `True`, the example is estimated to have a label issue. +If `False`, the example is estimated to not have a label issue. + +``label_score`` +~~~~~~~~~~~~~~~ + +A numeric column that gives the label quality score for each example. +The score lies between 0 and 1. +The lower the score, the less likely the given label is to be correct. + + +``given_label`` +~~~~~~~~~~~~~~~ + +A column of the actual labels as provided in the original dataset. + +``predicted_label`` +~~~~~~~~~~~~~~~~~~~ + +A column of the predicted labels for each example. The predicted label may differ from the given label, particularly for examples estimated to have a label issue. + +.. jinja :: + + {% with issue_name = "label" %} + {% include "cleanlab/datalab/guide/_templates/issue_types_tip.rst" %} + {% endwith %} + + +Outlier Issue +------------- + +Examples that are very different from the rest of the dataset (i.e. potentially out-of-distribution or rare/anomalous instances) are flagged with the outlier issue. + +Outlier issues are calculated based on provided `features`, `knn_graph`, or `pred_probs`. +If you provide none of these arguments, this type of issue will not be considered. +This article describes how outlier issues are detected in a dataset: `https://cleanlab.ai/blog/outlier-detection/ `_. + +When based on `features` or `knn_graph`, the outlier quality of each example is scored inversely proportional to its distance to its K nearest neighbors in the dataset.
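The feature-based scoring idea can be sketched with NumPy. The sketch computes each example's mean distance to its K nearest neighbors, then maps that distance to a score in (0, 1] that shrinks as the distance grows. The function name ``knn_outlier_scores`` and the ``exp(-t * distance)`` transform are illustrative assumptions rather than Datalab's exact internals. The brute-force pairwise distances also only suit small datasets.

```python
import numpy as np


def knn_outlier_scores(features: np.ndarray, k: int = 3, t: float = 1.0) -> np.ndarray:
    """Hypothetical sketch: score each row in (0, 1]; rows far from their k nearest neighbors score lower."""
    # Brute-force pairwise Euclidean distances, shape (n, n).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude each row from its own neighbors
    # Mean distance to the k closest other rows.
    avg_knn_dist = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    # Assumed transform: distance 0 maps to 1, large distances approach 0.
    return np.exp(-t * avg_knn_dist)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]], dtype=float)
scores = knn_outlier_scores(X, k=2)
print(int(np.argmin(scores)))  # 4, the isolated point
```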
+ +When based on `pred_probs`, the outlier quality of each example is scored inversely proportional to the uncertainty in its prediction. + +Modeling data with outliers may have unexpected consequences. +Closely inspect them and consider removing some outliers that may be negatively affecting your models. + + +Learn more about the methods used to detect outliers in our article: `Out-of-Distribution Detection via Embeddings or Predictions `_ + +Some metadata about outlier issues is stored in the `issues` attribute of the Datalab object. +Let's look at one way to access this information. + +.. testcode:: + + lab.get_issues("outlier").sort_values("outlier_score").head(5) + +The output will look something like this: + +.. testoutput:: + + is_outlier_issue outlier_score + 98 True 0.011562 + 62 False 0.019657 + 22 False 0.035243 + 1 False 0.040907 + 42 False 0.056865 + + + +``is_outlier_issue`` +~~~~~~~~~~~~~~~~~~~~ + +A boolean column, where `True` indicates that an example is identified as an outlier. + +``outlier_score`` +~~~~~~~~~~~~~~~~~ + +A numeric column with scores between 0 and 1. +A smaller value for an example indicates that it is less common or typical in the dataset, suggesting that it is more likely to be an outlier. + + +.. jinja :: + + {% with issue_name = "outlier" %} + {% include "cleanlab/datalab/guide/_templates/issue_types_tip.rst" %} + {% endwith %} + +(Near) Duplicate Issue +---------------------- + +A (near) duplicate issue refers to two or more examples in a dataset that are extremely similar to each other, relative to the rest of the dataset. +The examples flagged with this issue may be exactly duplicated, or lie atypically close together when represented as vectors (i.e. feature embeddings). +Near duplicated examples may record the same information with different: + +- Abbreviations, misspellings, typos, formatting, etc. in text data. +- Compression formats, resolutions, or sampling rates in image, video, and audio data. 
+- Minor variations which naturally occur in many types of data (e.g. translated versions of an image). + +Near Duplicate issues are calculated based on provided `features` or `knn_graph`. +If you provide neither of these arguments, this type of issue will not be considered. + +Datalab defines near duplicates as those examples whose distance to their nearest neighbor (in the space of provided `features`) in the dataset is less than `c * D`, where `0 < c < 1` is a small constant, and `D` is the median (over the full dataset) of such distances between each example and its nearest neighbor. +An example's near duplicate quality score increases with its distance to its nearest neighbor. + +Including near-duplicate examples in a dataset may negatively impact an ML model's generalization performance and lead to overfitting. +In particular, it is questionable to include examples in a test dataset which are (nearly) duplicated in the corresponding training dataset. +More generally, examples which happen to be duplicated can affect the final modeling results much more than other examples, so you should at least be aware of their presence. + +Some metadata about near-duplicate issues is stored in the `issues` attribute of the Datalab object. +Let's look at one way to access this information. + +.. testcode:: + + lab.get_issues("near_duplicate").sort_values("near_duplicate_score").head(5) + +The output will look something like this: + +.. testoutput:: + + is_near_duplicate_issue near_duplicate_score near_duplicate_sets distance_to_nearest_neighbor + 36 True 0.066009 [11, 80] 0.003906 + 11 True 0.066009 [36] 0.003906 + 80 True 0.093245 [36] 0.005599 + 27 False 0.156720 [] 0.009751 + 72 False 0.156720 [] 0.009751 + + +``is_near_duplicate_issue`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A boolean column, where `True` indicates that an example is identified as either a near- or exact-duplicate of other examples in the dataset.
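Below is a minimal NumPy sketch of the near-duplicate rule described in this section, which flags an example when its nearest-neighbor distance falls below ``c * D``. The sketch also builds a near-duplicate set for each example using the same radius. The function name ``find_near_duplicates``, Euclidean distance, and the value ``c = 0.13`` are illustrative assumptions. They are not necessarily Datalab's implementation or default.

```python
import numpy as np


def find_near_duplicates(features: np.ndarray, c: float = 0.13):
    """Hypothetical sketch: flag rows whose nearest-neighbor distance is below c * D (D = median NN distance)."""
    # Brute-force pairwise Euclidean distances (fine for small datasets).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbor
    nn_dist = dists.min(axis=1)      # distance from each row to its nearest neighbor
    radius = c * np.median(nn_dist)  # the c * D cutoff
    is_issue = nn_dist < radius
    # Near-duplicate set of row i: every other row closer to it than the cutoff.
    near_duplicate_sets = [np.flatnonzero(row < radius).tolist() for row in dists]
    return is_issue, near_duplicate_sets, nn_dist


X = np.array([[0, 0], [0, 0.01], [5, 5], [10, 0], [0, 10], [10, 10]], dtype=float)
is_issue, sets, nn_dist = find_near_duplicates(X)
print(is_issue.tolist())  # [True, True, False, False, False, False]
print(sets)               # [[1], [0], [], [], [], []]
```

Because ``D`` is a median, the cutoff adapts to the overall density of the data. Only pairs that are unusually close relative to typical neighbor distances get flagged.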
+ +``near_duplicate_score`` +~~~~~~~~~~~~~~~~~~~~~~~~ + +A numeric column with scores between 0 and 1. The lower the score, the more likely the example is to be a near-duplicate of another example in the dataset. + +Exact duplicates are assigned a score of 0, while near-duplicates are assigned a score close to 0. + +``near_duplicate_sets`` +~~~~~~~~~~~~~~~~~~~~~~~ + +A column of lists of integers. The i-th list contains the indices of examples that are considered near-duplicates of example i (not including example i). + +``distance_to_nearest_neighbor`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A numeric column that represents the distance between each example and its nearest neighbor in the dataset. +The distance is calculated based on the provided `features` or `knn_graph`, and is directly related to the `near_duplicate_score`. +A smaller distance indicates that the example is more similar to its nearest neighbor in the dataset. + +.. jinja :: + + {% with issue_name = "near_duplicate" %} + {% include "cleanlab/datalab/guide/_templates/issue_types_tip.rst" %} + {% endwith %} + +Non-IID Issue +------------- + +This issue type checks whether the overall dataset exhibits statistically significant violations of the IID assumption, such as changepoints or shift, drift, autocorrelation, etc. The specific form of violation considered is whether the examples are ordered within the dataset such that nearby examples tend to have more similar feature values. If you care about this check, do **not** first shuffle your dataset, since this check is entirely based on the sequential order of your data. Learn more via our blog: `https://cleanlab.ai/blog/non-iid-detection/ `_ + +The Non-IID issue is detected based on provided `features` or `knn_graph`. If you provide neither of these arguments, this type of issue will not be considered. While the Non-IID check produces per-example information, it is primarily about assessing the overall dataset rather than assessing individual examples.
So pay more attention to the overall dataset Non-IID score obtained via :py:meth:`Datalab.get_issue_summary("non_iid") ` than the per-example scores. + +The Non-IID issue is really a dataset-level check, not a per-datapoint level check (either a dataset violates the IID assumption or it doesn't). The per-datapoint scores returned for Non-IID issues merely highlight which datapoints you might focus on to better understand this dataset-level issue - there is not necessarily something specifically wrong with these specific datapoints. + +Mathematically, the **overall** Non-IID score for the dataset is defined as the p-value of a statistical test for whether the distribution of *index-gap* values differs between group A vs. group B defined as follows. For a pair of examples in the dataset `x1, x2`, we define their *index-gap* as the distance between the indices of these examples in the ordering of the data (e.g. if `x1` is the 10th example and `x2` is the 100th example in the dataset, their index-gap is 90). We construct group A from pairs of examples which are amongst the K nearest neighbors of each other, where neighbors are defined based on the provided `knn_graph` or via distances in the space of the provided vector `features` . Group B is constructed from random pairs of examples in the dataset. + +The Non-IID quality score for each example `x` is defined via a similarly computed p-value but with Group A constructed from the K nearest neighbors of `x` and Group B constructed from random examples from the dataset paired with `x`. Learn more about this method in our paper: `Detecting Dataset Drift and Non-IID Sampling via k-Nearest Neighbors `_ (or the associated `blogpost `_). + +The assumption that examples in a dataset are Independent and Identically Distributed (IID) is fundamental to proper modeling. Detecting all possible violations of the IID assumption is statistically impossible. 
This issue type only considers specific forms of violation where examples that tend to be closer together in the dataset ordering also tend to have more similar feature values. This includes scenarios where: + +- The underlying distribution from which examples stem is evolving/drifting over time (not identically distributed). +- An example can influence the values of future examples in the dataset (not independent). + +For datasets with low non-IID score, you should consider why your data are not IID and act accordingly. For example, if the data distribution is drifting over time, consider employing a time-based train/test split instead of a random partition. Note that shuffling the data ahead of time will ensure a good non-IID score, but this is not always a fix to the underlying problem (e.g. future deployment data may stem from a different distribution, or you may overlook the fact that examples influence each other). We thus recommend **not** shuffling your data to be able to diagnose this issue if it exists. + +Some metadata about non-IID issues is stored in the `issues` attribute of the Datalab object. +Let's look at one way to access this information. + +.. testcode:: + + lab.get_issues("non_iid").sort_values("non_iid_score").head(5) + +The output will look something like this: + +.. testoutput:: + + is_non_iid_issue non_iid_score + 24 False 0.681458 + 37 False 0.804582 + 64 False 0.810646 + 80 False 0.815691 + 78 False 0.834293 + +``is_non_iid_issue`` +~~~~~~~~~~~~~~~~~~~~ + +A boolean column, where `True` values indicate that the dataset exhibits statistically significant violations of the IID assumption. +If the overall dataset does not appear to be Non-IID (p-value > 0.05), then all entries in this column will be `False`. +If the dataset appears to be Non-IID (p-value < 0.05), then one entry will be `True`, specifically the example with the lowest `non_iid_score`. 
+We do not recommend interpreting the per-example boolean values, as the Non-IID check is more about the overall dataset. + +``non_iid_score`` +~~~~~~~~~~~~~~~~~ + +A numeric column with scores between 0 and 1, containing the Non-IID quality scores for each example. +Learn more via our `blogpost `_. + +Be cautious when interpreting the non-IID issue score for individual examples. +The dataset as a whole receives a p-value for our non-IID test (obtained via :py:meth:`Datalab.get_issue_summary("non_iid") `), which better indicates whether the dataset exhibits non-IID behavior. + +When this p-value is low, you can use the per-example non-IID scores to identify which examples to look at to better understand this non-IID behavior. + +.. jinja :: + + {% with issue_name = "non_iid" %} + {% include "cleanlab/datalab/guide/_templates/issue_types_tip.rst" %} + {% endwith %} + +Class Imbalance Issue +--------------------- + +Class imbalance is diagnosed using only the `labels` provided as part of the dataset. The overall class imbalance quality score of a dataset is the proportion `q` of examples belonging to the rarest class. If this proportion `q` falls below a threshold, then we say this dataset suffers from the class imbalance issue. + +In a dataset identified as having class imbalance, the class imbalance quality score for each example is set equal to `q` if it is labeled as the rarest class, and is equal to 1 for all other examples. + +Class imbalance in a dataset can lead to subpar model performance for the under-represented class. Consider collecting more data from the under-represented class, or at least take special care while modeling via techniques like over/under-sampling, SMOTE, asymmetric class weighting, etc. + +This issue type is more about the overall dataset than about individual data points. If severe class imbalance is detected, Datalab will flag the individual data points from the minority class.
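The class imbalance computation described above can be sketched in plain Python. The function name ``class_imbalance`` and the ``threshold=0.1`` default are illustrative assumptions, not documented Datalab values. If several classes tie for rarest, the sketch simply picks the first one that ``min`` encounters.

```python
from collections import Counter


def class_imbalance(labels, threshold=0.1):
    """Hypothetical sketch: return (q, per-example scores, per-example flags) for a list of class labels."""
    counts = Counter(labels)
    # Rarest class and its share q of the dataset.
    rarest_class, rarest_count = min(counts.items(), key=lambda item: item[1])
    q = rarest_count / len(labels)
    dataset_has_issue = q < threshold
    # Examples of the rarest class score q; all other examples score 1.
    scores = [q if y == rarest_class else 1.0 for y in labels]
    # Only flag minority-class examples when the dataset is imbalanced overall.
    flags = [dataset_has_issue and y == rarest_class for y in labels]
    return q, scores, flags


labels = [0] * 45 + [1] * 50 + [2] * 5
q, scores, flags = class_imbalance(labels)
print(q, sum(flags))  # 0.05 5
```

Every example of the same class receives the same score, which matches the ``testoutput`` below, where all shown examples with label 2 share the score 0.28.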
+ +Some metadata about class imbalance issues is stored in the `issues` attribute of the Datalab object. +Let's look at one way to access this information. + +.. testcode:: + + lab.get_issues("class_imbalance").sort_values("class_imbalance_score").head(5) + +The output will look something like this: + +.. testoutput:: + + is_class_imbalance_issue class_imbalance_score given_label + 27 False 0.28 2 + 72 False 0.28 2 + 75 False 0.28 2 + 33 False 0.28 2 + 68 False 0.28 2 + +``is_class_imbalance_issue`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A boolean column, where `True` indicates which examples belong to the minority class (rarest class) in a classification dataset that exhibits severe class imbalance. If the dataset is not considered to have severe class imbalance (i.e. the proportion of examples in the rarest class is not too small relative to the number of classes in the dataset), then all values will be `False`. + + +``class_imbalance_score`` +~~~~~~~~~~~~~~~~~~~~~~~~~ + +A numeric column with scores between 0 and 1. +Any example belonging to the most under-represented class is assigned a score equal to the proportion of examples in the dataset belonging to that class. +All other examples are assigned a score of 1. +All examples sharing the same label also share the same score. + +``given_label`` +~~~~~~~~~~~~~~~ + +A column of the actual labels as provided in the original dataset. + +.. jinja :: + + {% with issue_name = "class_imbalance" %} + {% include "cleanlab/datalab/guide/_templates/issue_types_tip.rst" %} + {% endwith %} + +Image-specific Issues +--------------------- + +Datalab can identify image-specific issues in datasets, such as images that are excessively dark or bright, blurry, lack detail, or have unusual sizes. +To detect these issues, simply specify the `image_key` argument in :py:meth:`~cleanlab.datalab.datalab.Datalab`, indicating the image column name in your dataset. +This functionality currently works only with Hugging Face datasets.
You can convert other local dataset formats into a Hugging Face dataset by following `this guide `_. +More information on these image-specific issues is available in the `CleanVision package `_. + +Underperforming Group Issue +--------------------------- + +An underperforming group refers to a cluster of similar examples (i.e. a slice) in the dataset for which the ML model predictions are poor. The examples in this underperforming group may have noisy labels or feature values, or the trained ML model may not have learned how to properly handle them (consider collecting more data from this subpopulation or up-weighting the existing data from this group). + +This issue type is more about the overall dataset than about individual data points. If an underperforming group is detected, Datalab will flag the individual data points from this group. + +Underperforming Group issues are detected based on one of: + +- provided `pred_probs` and `features`, +- provided `pred_probs` and `knn_graph`, or +- provided `pred_probs` and `cluster_ids`. (This option is for advanced users; see the `FAQ <../../../tutorials/faq.html#How-do-I-specify-pre-computed-data-slices/clusters-when-detecting-the-Underperforming-Group-Issue?>`_ for more details.) + +If you do not provide one of these combinations of arguments, this type of issue will not be considered. + +To find the underperforming group, Cleanlab clusters the data using the provided `features` and determines the cluster `c` with the lowest average model predictive performance. Model predictive performance is evaluated via the model's self-confidence of the given labels, calculated using :py:func:`rank.get_self_confidence_for_each_label `. Suppose the average predictive performance is `r` across the full dataset and `q` within a cluster of examples. This cluster is considered to be an underperforming group if `q/r` falls below a threshold. A dataset suffers from the Underperforming Group issue if there exists such a cluster within it.
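The sketch below implements the ratio test above with NumPy. It assumes cluster assignments are already available (e.g. supplied as ``cluster_ids``) and computes self-confidence as the predicted probability of each example's given label. The function name ``underperforming_group`` and the ``threshold=0.5`` default are illustrative assumptions. The sketch also skips details such as discarding clusters smaller than ``min_cluster_samples``.

```python
import numpy as np


def underperforming_group(pred_probs, labels, cluster_ids, threshold=0.5):
    """Hypothetical sketch: return (q/r ratio, per-example flags, per-example scores)."""
    labels = np.asarray(labels)
    cluster_ids = np.asarray(cluster_ids)
    # Self-confidence: predicted probability assigned to each example's given label.
    self_conf = pred_probs[np.arange(len(labels)), labels]
    r = self_conf.mean()  # average over the full dataset
    clusters = np.unique(cluster_ids)
    cluster_means = np.array([self_conf[cluster_ids == c].mean() for c in clusters])
    worst = clusters[np.argmin(cluster_means)]
    ratio = cluster_means.min() / r  # q / r for the worst cluster
    is_issue = (cluster_ids == worst) & (ratio < threshold)
    # Flagged examples share the score q / r; all other examples score 1.
    scores = np.where(is_issue, ratio, 1.0)
    return ratio, is_issue, scores


pred_probs = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1],
                       [0.2, 0.8], [0.8, 0.2], [0.2, 0.8]])
labels = [0, 1, 0, 0, 1, 0]
cluster_ids = [0, 0, 0, 1, 1, 1]
ratio, is_issue, scores = underperforming_group(pred_probs, labels, cluster_ids)
print(round(float(ratio), 3), is_issue.tolist())  # 0.364 [False, False, False, True, True, True]
```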
+The underperforming group quality score is equal to `q/r` for examples belonging to the underperforming group, and is equal to 1 for all other examples. +Advanced users: if you have pre-computed cluster assignments for each example in the dataset, you can pass them explicitly to :py:meth:`Datalab.find_issues ` using the `cluster_ids` key in the `issue_types` dict argument. This is useful for tabular datasets where you want to group/slice the data based on a categorical column. An integer encoding of the categorical column can be passed as cluster assignments for finding the underperforming group, based on the data slices you define. + +Some metadata about underperforming group issues is stored in the `issues` attribute of the Datalab object. +Let's look at one way to access this information. + +.. testcode:: + + lab.get_issues("underperforming_group").sort_values("underperforming_group_score").head(5) + +The output will look something like this: + +.. testoutput:: + + is_underperforming_group_issue underperforming_group_score + 0 False 1.0 + 72 False 1.0 + 71 False 1.0 + 70 False 1.0 + 69 False 1.0 + +``is_underperforming_group_issue`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A boolean column, where `True` indicates which examples belong to the subgroup (i.e. cluster/slice) for which model predictions are significantly worse than for the rest of the dataset. +If no such underperforming subgroup is detected, then all values will be `False`. + +``underperforming_group_score`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A numeric column with scores between 0 and 1. Only examples belonging to a detected underperforming group receive a score less than 1. +Every example in the underperforming group shares the same score, which is the ratio of the group's mean label quality score to the mean label quality score across the dataset. +The lower the score, the worse the model predictions are for this particular subgroup relative to the rest of the dataset. + +..
jinja :: + + {% with issue_name = "underperforming_group" %} + {% include "cleanlab/datalab/guide/_templates/issue_types_tip.rst" %} + {% endwith %} + +Null Issue +---------- + +Examples identified with the null issue correspond to rows that have null/missing values across all feature columns (i.e. the entire row is missing values). + +Null issues are detected based on provided `features`. If you do not provide `features`, this type of issue will not be considered. + +Each example's null issue quality score equals the proportion of feature values in this row that are not null/missing. The overall dataset null issue quality score +equals the average of the individual examples' quality scores. + +The presence of null examples in the dataset can lead to errors when training ML models. It can also +result in the model learning incorrect patterns due to the null values. + +Some metadata about null issues is stored in the `issues` attribute of the Datalab object. +Let's look at one way to access this information. + +.. testcode:: + + lab.get_issues("null").sort_values("null_score").head(5) + +The output will look something like this: + +.. testoutput:: + + is_null_issue null_score + 0 False 1.0 + 72 False 1.0 + 71 False 1.0 + 70 False 1.0 + 69 False 1.0 + +``is_null_issue`` +~~~~~~~~~~~~~~~~~ + +A boolean column, where `True` indicates that an example is identified as having null/missing values across all feature columns. +Examples with even a single non-null value across their feature columns are not flagged with this issue. + +``null_score`` +~~~~~~~~~~~~~~ + +A numeric column with scores between 0 and 1. The score represents the proportion of non-null (i.e. non-missing) values in each example. +Lower scores indicate examples with more null/missing values. + +..
jinja :: + + {% with issue_name = "null" %} + {% include "cleanlab/datalab/guide/_templates/issue_types_tip.rst" %} + {% endwith %} + +Data Valuation Issue +-------------------- + +The examples in the dataset with the lowest data valuation scores contribute the least to a trained ML model's performance (those whose value falls below a threshold are flagged with this type of issue). + +Data valuation issues can be detected based on provided `features` or a provided `knn_graph` (or one pre-computed during the computation of other issue types). If you provide neither of these two arguments and there isn't a `knn_graph` already stored in the Datalab object, this type of issue will not be considered. + +The data valuation score is an approximate Data Shapley value, calculated based on the labels of the k nearest neighbors of an example. The details of this KNN-Shapley value are described in the papers: `Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms `_ and `Scalability vs. Utility: Do We Have to Sacrifice One for the Other in Data Importance Quantification? `_. + +Some metadata about data valuation issues is stored in the `issues` attribute of the Datalab object. +Let's look at one way to access this information. + +.. testcode:: + + lab.get_issues("data_valuation").sort_values("data_valuation_score").head(5) + +The output will look something like this: + +.. testoutput:: + + is_data_valuation_issue data_valuation_score + 39 False 0.5 + 32 False 0.5 + 98 False 0.5 + 6 False 0.5 + 7 False 0.5 + +``is_data_valuation_issue`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A boolean column, where `True` indicates that an example does not appear to contribute positively to a model's training performance. + +``data_valuation_score`` +~~~~~~~~~~~~~~~~~~~~~~~~ + +A numeric column with scores between 0 and 1. The score reflects how valuable each individual example is in terms of improving the performance of the ML model trained on this dataset.
+Examples with higher scores more positively influence the resulting model's predictive performance, contributing to better learning. One would expect the model to get worse if many such examples were removed from its training dataset. + +.. jinja :: + + {% with issue_name = "data_valuation"%} + {% include "cleanlab/datalab/guide/_templates/issue_types_tip.rst" %} + {% endwith %} + +Optional Issue Parameters +========================= + +Here is the dict of possible (**optional**) parameter values that can be specified via the argument `issue_types` to :py:meth:`Datalab.find_issues `. +Optionally specify these to exert greater control over how issues are detected in your dataset. +Appropriate defaults are used for any parameters you do not specify, so no need to specify all of these! + +.. code-block:: python + + possible_issue_types = { + "label": label_kwargs, "outlier": outlier_kwargs, + "near_duplicate": near_duplicate_kwargs, "non_iid": non_iid_kwargs, + "class_imbalance": class_imbalance_kwargs, "underperforming_group": underperforming_group_kwargs, + "null": null_kwargs, "data_valuation": data_valuation_kwargs, + } + + +where the possible `kwargs` dicts for each key are described in the sections below. + +Label Issue Parameters +---------------------- + +.. 
code-block:: python + + label_kwargs = { + "k": # number of nearest neighbors to consider when computing pred_probs from features, + "health_summary_parameters": # dict of potential keyword arguments to method `dataset.health_summary()`, + "clean_learning_kwargs": # dict of keyword arguments to constructor `CleanLearning()` including keys like: "find_label_issues_kwargs" or "label_quality_scores_kwargs", + "thresholds": # `thresholds` argument to `CleanLearning.find_label_issues()`, + "noise_matrix": # `noise_matrix` argument to `CleanLearning.find_label_issues()`, + "inverse_noise_matrix": # `inverse_noise_matrix` argument to `CleanLearning.find_label_issues()`, + "save_space": # `save_space` argument to `CleanLearning.find_label_issues()`, + "clf_kwargs": # `clf_kwargs` argument to `CleanLearning.find_label_issues()`. Currently has no effect., + "validation_func": # `validation_func` argument to `CleanLearning.fit()`. Currently has no effect., + } + +.. attention:: + + ``health_summary_parameters`` and ``health_summary_kwargs`` can work in tandem to determine the arguments to be used in the call to :py:meth:`dataset.health_summary `. + +.. note:: + + For more information, view the source code of: :py:class:`datalab.internal.issue_manager.label.LabelIssueManager `. + +Outlier Issue Parameters +------------------------ + +.. code-block:: python + + outlier_kwargs = { + "threshold": # floating value between 0 and 1 that sets the sensitivity of the outlier detection algorithms, based on either features or pred_probs, + "ood_kwargs": { # dict of keyword arguments to constructor `OutOfDistribution()` + "params": { + # NOTE: Each of the following keyword arguments can also be provided outside "ood_kwargs" + + "knn": # `knn` argument to constructor `OutOfDistribution()`. Used with features, + "k": # `k` argument to constructor `OutOfDistribution()`. Used with features, + "t": # `t` argument to constructor `OutOfDistribution()`.
Used with features, + "adjust_pred_probs": # `adjust_pred_probs` argument to constructor `OutOfDistribution()`. Used with pred_probs, + "method": # `method` argument to constructor `OutOfDistribution()`. Used with pred_probs, + "confident_thresholds": # `confident_thresholds` argument to constructor `OutOfDistribution()`. Used with pred_probs, + }, + }, + } + +.. note:: + + For more information, view the source code of: :py:class:`datalab.internal.issue_manager.outlier.OutlierIssueManager `. + +Duplicate Issue Parameters +-------------------------- + +.. code-block:: python + + near_duplicate_kwargs = { + "metric": # string or callable representing the distance metric used in nearest neighbors search (passed as argument to `NearestNeighbors`), if necessary, + "k": # integer representing the number of nearest neighbors for nearest neighbors search (passed as argument to `NearestNeighbors`), if necessary, + "threshold": # `threshold` argument to constructor of `NearDuplicateIssueManager()`. Non-negative floating value that determines the maximum distance between two examples to be considered near duplicates, relative to the median distance to the nearest neighbors, + } + +.. attention:: + + `k` does not affect the results of the (near) duplicate search algorithm. It only affects the construction of the knn graph, if necessary. + +.. note:: + + For more information, view the source code of: :py:class:`datalab.internal.issue_manager.duplicate.NearDuplicateIssueManager `. + + +Non-IID Issue Parameters +------------------------ + +.. code-block:: python + + non_iid_kwargs = { + "metric": # `metric` argument to constructor of `NonIIDIssueManager`. String or callable for the distance metric used for nearest neighbors search if necessary. `metric` argument to constructor of `sklearn.neighbors.NearestNeighbors`, + "k": # `k` argument to constructor of `NonIIDIssueManager`. Integer representing the number of nearest neighbors for nearest neighbors search if necessary.
`n_neighbors` argument to constructor of `sklearn.neighbors.NearestNeighbors`, + "num_permutations": # `num_permutations` argument to constructor of `NonIIDIssueManager`, + "seed": # seed for numpy's random number generator (used for permutation tests), + "significance_threshold": # `significance_threshold` argument to constructor of `NonIIDIssueManager`. Floating value between 0 and 1 that determines the overall significance of non-IID issues found in the dataset. + } + +.. note:: + + For more information, view the source code of: :py:class:`datalab.internal.issue_manager.noniid.NonIIDIssueManager `. + + +Imbalance Issue Parameters +-------------------------- + +.. code-block:: python + + class_imbalance_kwargs = { + "threshold": # `threshold` argument to constructor of `ClassImbalanceIssueManager`. Non-negative floating value between 0 and 1 indicating the minimum fraction of samples of each class that are present in a dataset without class imbalance. + } + +.. note:: + + For more information, view the source code of: :py:class:`datalab.internal.issue_manager.imbalance.ClassImbalanceIssueManager `. + +Underperforming Group Issue Parameters +-------------------------------------- + +.. code-block:: python + + underperforming_group_kwargs = { + # Constructor arguments for `UnderperformingGroupIssueManager` + "threshold": # Non-negative floating value between 0 and 1 used for determining group of points with low confidence. + "metric": # String or callable for the distance metric used for nearest neighbors search if necessary. `metric` argument to constructor of `sklearn.neighbors.NearestNeighbors`. + "k": # Integer representing the number of nearest neighbors for constructing the nearest neighbor graph. `n_neighbors` argument to constructor of `sklearn.neighbors.NearestNeighbors`. + "min_cluster_samples": # Non-negative integer value specifying the minimum number of examples required for a cluster to be considered as the underperforming group.
Used in `UnderperformingGroupIssueManager.filter_cluster_ids`. + "clustering_kwargs": # Key-value pairs representing arguments for the constructor of the clustering algorithm class (e.g. `sklearn.cluster.DBSCAN`). + + # Argument for the find_issues() method of UnderperformingGroupIssueManager + "cluster_ids": # A 1-D numpy array containing cluster labels for each sample in the dataset. If passed, these cluster labels are used for determining the underperforming group. + } + +.. note:: + + For more information, view the source code of: :py:class:`datalab.internal.issue_manager.underperforming_group.UnderperformingGroupIssueManager `. + + For more information on generating `cluster_ids` for this issue manager, refer to this `FAQ Section <../../../tutorials/faq.html#How-do-I-specify-pre-computed-data-slices/clusters-when-detecting-the-Underperforming-Group-Issue?>`_. + +Null Issue Parameters +--------------------- + +.. code-block:: python + + null_kwargs = {} + +.. note:: + + For more information, view the source code of: :py:class:`datalab.internal.issue_manager.null.NullIssueManager `. + +Data Valuation Issue Parameters +------------------------------- + +.. code-block:: python + + data_valuation_kwargs = { + "k": # Number of nearest neighbors used to calculate data valuation scores, + "threshold": # Examples with scores below this threshold will be flagged with a data valuation issue + } + +.. note:: + For more information, view the source code of: :py:class:`datalab.internal.issue_manager.data_valuation.DataValuationIssueManager `. + +Image Issue Parameters +---------------------- + +To customize optional parameters for specific image issue types, you can provide a dictionary of parameters for each image issue type. The following code block shows how to specify optional parameters for all image issue types. Providing these parameters is optional, and you only need to include the issue types you want to customize.
If no specific parameters are provided, defaults will be used for those issues. + +.. code-block:: python + + image_issue_types_kwargs = { + "dark": {"threshold": 0.32}, # `threshold` argument for dark issue type. Non-negative floating value between 0 and 1; a lower value means fewer samples are flagged, and vice versa. + "light": {"threshold": 0.05}, # `threshold` argument for light issue type. Non-negative floating value between 0 and 1; a lower value means fewer samples are flagged, and vice versa. + "blurry": {"threshold": 0.29}, # `threshold` argument for blurry issue type. Non-negative floating value between 0 and 1; a lower value means fewer samples are flagged, and vice versa. + "low_information": {"threshold": 0.3}, # `threshold` argument for low_information issue type. Non-negative floating value between 0 and 1; a lower value means fewer samples are flagged, and vice versa. + "odd_aspect_ratio": {"threshold": 0.35}, # `threshold` argument for odd_aspect_ratio issue type. Non-negative floating value between 0 and 1; a lower value means fewer samples are flagged, and vice versa. + "odd_size": {"threshold": 10.0}, # `threshold` argument for odd_size issue type. Non-negative value starting from 0; unlike the other issue types, a higher value here means fewer samples are flagged. + } + +.. note:: + + For more information, view the cleanvision `docs `_. + + +Cleanlab Studio (Easy Mode) +--------------------------- + +`Cleanlab Studio `_ is a fully automated platform that can detect the same data issues as this package, as well as `many more types of issues `_, all without you having to do any Machine Learning (or even write any code).
Beyond being 100x faster to use and producing more useful results, `Cleanlab Studio `_ also provides an intelligent data correction interface for you to quickly fix the issues detected in your dataset (a single data scientist can fix millions of data points thanks to AI suggestions). + +`Cleanlab Studio `_ offers a powerful AutoML system (with Foundation models) that is useful for more than improving data quality. With a few clicks, you can: find + fix issues in your dataset, identify the best type of ML model and train/tune it, and deploy this model to serve accurate predictions for new data. Also use the same AutoML to auto-label large datasets (a single user can label millions of data points thanks to powerful Foundation models). `Try Cleanlab Studio for free! `_ + +.. image:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanlab/ml-with-cleanlab-studio.png + :width: 800 + :alt: Stages of modern AI pipeline that can now be automated with Cleanlab Studio diff --git a/v2.6.6/_sources/cleanlab/datalab/guide/table.rst b/v2.6.6/_sources/cleanlab/datalab/guide/table.rst new file mode 100644 index 000000000..a1f3d81af --- /dev/null +++ b/v2.6.6/_sources/cleanlab/datalab/guide/table.rst @@ -0,0 +1,186 @@ +.. tabs:: + + .. tab:: Classification task + + .. list-table:: + :widths: 20 10 20 50 + :header-rows: 1 + + * - Issue Name + - Default + - Column Name + - Required keyword arguments in :py:meth:`Datalab.find_issues ` + * - :ref:`label