FastADC implementation #470

Open
wants to merge 86 commits into
base: main
451c76f
Replace int with int64_t in Predicate class
ol-imorozko Sep 29, 2024
82f8437
Initial commit that adds dc folder and placeholder for dc.h
ol-imorozko Feb 25, 2024
3c50e82
Implement method to get value from TypedColumnData
ol-imorozko Sep 24, 2024
66c13dd
Implement IndexProvider class
ol-imorozko Mar 2, 2024
f348335
Implement functions to get similarities metrics between two columns
ol-imorozko Mar 6, 2024
9e057cf
Implement PredicateBuilder class
ol-imorozko Mar 7, 2024
fd64797
Implement tests for predicate space building
ol-imorozko Mar 8, 2024
e6d2fc3
Implement Pli and PliShardBuilder
ol-imorozko Mar 16, 2024
7753aed
Implement CommonClueSetBuilder
ol-imorozko Sep 21, 2024
effa8ba
Implement SingleClueSetBuilder
ol-imorozko Sep 21, 2024
c99e02c
Implement CrossClueSetBuilder
ol-imorozko Sep 21, 2024
623ab63
Add test that checks static fields of CommonClueSetBuilder
ol-imorozko May 1, 2024
7245877
Implement ClueSetBuilder
ol-imorozko Sep 21, 2024
eded4d8
Add test that checks ClueSet building
ol-imorozko Sep 24, 2024
e1ed44c
Add an ability to force kString type on TypedColumnData instead of kM…
ol-imorozko Sep 24, 2024
8d5edc1
Add initial EvidenceSetBuilder class that builds cardinality mask
ol-imorozko Sep 29, 2024
3943406
Add test that verifies CardinalityMask
ol-imorozko Sep 29, 2024
fbb6a86
Implement Evidence
ol-imorozko Sep 29, 2024
fc63418
Implement EvidenceSet
ol-imorozko Sep 29, 2024
37e1136
Implement EvidenceSetBuilder
ol-imorozko Sep 29, 2024
68b990b
Add test to verify evidence set
ol-imorozko Sep 29, 2024
1a4a57c
Fix wrong creating of inverted predicate, operands were swapped
ol-imorozko Oct 3, 2024
dee8405
Add type alias for bitset holding predicates
ol-imorozko Oct 1, 2024
4decf6e
Implement PredicateOrganizer class
ol-imorozko Oct 1, 2024
e200ba1
Add test that validates predicate organizer
ol-imorozko Oct 1, 2024
495ae73
Implement DCCandidateTrie class
ol-imorozko Oct 1, 2024
e9c9528
Implement PredicateSet class
ol-imorozko Mar 2, 2024
148e24c
Implement DenialConstraint class
ol-imorozko Oct 2, 2024
570c804
Return reference from GetImplications Predicate method
ol-imorozko Oct 2, 2024
3d05336
Implement Closure class
ol-imorozko Oct 2, 2024
e1f6f38
Implement NTreeSearch class
ol-imorozko Oct 2, 2024
e3804a8
Implement DenialConstraintSet
ol-imorozko Oct 2, 2024
860fafd
Implement ApproximateEvidenceInverter class
ol-imorozko Oct 2, 2024
0409803
Implement test for approximate denial constraints
ol-imorozko Oct 2, 2024
eca4790
Change namespace model to namespace algos::fastadc for FastADC files
ol-imorozko Oct 3, 2024
f774cc2
Split FastADC files into subfolders
ol-imorozko Oct 3, 2024
de78c4d
Correct includes paths after renaming and moving FastADC files
ol-imorozko Oct 3, 2024
8c26f9a
Refactor providers* structures
ol-imorozko Oct 3, 2024
bc1be2e
Adjust unittests after providers refactoring
ol-imorozko Oct 4, 2024
b87850b
Extract predicate packs and correction map building to a separate class
ol-imorozko Oct 5, 2024
8a0d876
Move cardinality mask building from Evidence set to a new structure
ol-imorozko Oct 5, 2024
d15ae01
Remove unused clue field from Evidence class
ol-imorozko Oct 5, 2024
1827d3d
Remove unused N field from SearchNode class
ol-imorozko Oct 5, 2024
b6a124d
Optimize AccumulateClues by hashing clue with zero value
ol-imorozko Oct 5, 2024
c9302c8
Increase performance of AccumulateClues by preallocating and utilizin…
ol-imorozko Oct 5, 2024
8ff5492
Optimize clues by moving allocations out of Build* methods
ol-imorozko Oct 5, 2024
036970f
Change predicate bitset size from 64 to 128
ol-imorozko Oct 6, 2024
33cddfe
Add missing csv file for DC mining testing
ol-imorozko Jan 13, 2025
c61f461
Fix clang-format header recommendation
ol-imorozko Dec 1, 2024
cc3c076
Add DependentFalse to namespace::details
ol-imorozko Dec 1, 2024
69ae4e8
Remove inline from template functions
ol-imorozko Dec 1, 2024
2318ca4
Remove redundant static variables
ol-imorozko Dec 1, 2024
65d930e
Don't define several variables on the same line
ol-imorozko Dec 1, 2024
00d8041
Do not use std::initializer_list to store anything, replace with cons…
ol-imorozko Dec 1, 2024
cec11af
NTreeSearch() = default Not needed
ol-imorozko Dec 1, 2024
cfc5039
Rename GetInverse/MutexMap to Take since we're moving them
ol-imorozko Dec 1, 2024
2ac5184
Capture only coverages instead of everything in lambda
ol-imorozko Dec 1, 2024
221ae1b
To mimic Java's behavior, return 0.0 when avg1=avg2=0 in GetAverageRatio
ol-imorozko Dec 1, 2024
80dcae1
Do not reopen namespace std when specializing std::hash
ol-imorozko Dec 1, 2024
b75cfe7
No need to define hash_value, declaration is enough to use boost::has…
ol-imorozko Dec 1, 2024
f6c3022
Use BetterEnum instead of bool to indicate tuple in ColumnOperand
ol-imorozko Dec 2, 2024
7aa1f06
Apply IWYU to src/core/algorithms/dc/FastADC
ol-imorozko Dec 2, 2024
a5c5b19
Don't use relative paths in includes
ol-imorozko Dec 2, 2024
fd91034
Apply clang-format after header changes
ol-imorozko Dec 2, 2024
6ed4ac7
Remove unnecessary assert, predicate_index_provider can't be null the…
ol-imorozko Dec 2, 2024
7ec6be3
Define operator!= of DenialConstraint
ol-imorozko Dec 2, 2024
ab25724
Move Initialize*Map in Operator to private section and add alias for map
ol-imorozko Dec 2, 2024
2879af7
Use default == and != operators from Operator
ol-imorozko Dec 2, 2024
b41fdff
Add TODO comments for classes that are both used for DC mining and ve…
ol-imorozko Jan 13, 2025
a562274
Use emplace back where needed
ol-imorozko Jan 13, 2025
c424a3f
Make PliShard fields private
ol-imorozko Jan 13, 2025
9d8d389
Do not explicitly delete PredicateSet constructor
ol-imorozko Jan 13, 2025
8ce8fab
Make GetIndex in IndexProvider accept references
ol-imorozko Jan 13, 2025
c02e8ab
Add inline to operator& and &= in aei.h to avoid violating ODR
ol-imorozko Jan 13, 2025
f98e9e1
Don't capture all variables by references in aei.h
ol-imorozko Jan 13, 2025
36dfea9
Remove shared_ptrs in aei.h and simplify Hit method
ol-imorozko Jan 13, 2025
d0c362d
Get rid of try-catch in clue builder by adding a helper function to c…
ol-imorozko Jan 13, 2025
26bfbd6
Split variable definitions into several lines
ol-imorozko Jan 13, 2025
0244e4d
Add missing braces around if-else
ol-imorozko Jan 13, 2025
2ba381e
Move BuildClueSet implementation to cpp file
ol-imorozko Jan 13, 2025
12b2881
Make CompareBitsets a static method of DenialConstraintSet
ol-imorozko Jan 13, 2025
11aedfe
Implement FastADC algorithm as a class derived from Algorithm
ol-imorozko Jan 13, 2025
88a7b1f
Add FastADC python bindings
ol-imorozko Jan 13, 2025
cdd22ed
Correct clang-format issues
ol-imorozko Jan 14, 2025
5e9835e
Bring back missing tmp_dc.cvs and rename it appropriately
ol-imorozko Jan 14, 2025
9f2a1d5
Add example of ADC mining
ol-imorozko Jan 14, 2025
1 change: 1 addition & 0 deletions examples/basic/README.md
@@ -20,6 +20,7 @@ These scenarios showcase a single pattern by discussing its definition and provi
+ [mining_set_od_1.py](https://github.com/Desbordante/desbordante-core/tree/main/examples/basic/mining_set_od_1.py) — a scenario showing how to discover order dependencies based on set axiomatization, part 1.
+ [mining_set_od_2.py](https://github.com/Desbordante/desbordante-core/tree/main/examples/basic/mining_set_od_2.py) — a scenario showing how to discover order dependencies based on set axiomatization, part 2.
+ [mining_ucc.py](https://github.com/Desbordante/desbordante-core/tree/main/examples/basic/mining_ucc.py) — a scenario showing how to discover exact unique column combinations.
+ [mining_adc.py](https://github.com/Desbordante/desbordante-core/tree/main/examples/basic/mining_adc.py) — a scenario showing how to discover approximate denial constraints.
+ [verifying_aucc.py](https://github.com/Desbordante/desbordante-core/tree/main/examples/basic/verifying_aucc.py) — a scenario showing how to verify an approximate unique column combination.
+ [verifying_fd_afd.py](https://github.com/Desbordante/desbordante-core/tree/main/examples/basic/verifying_fd_afd.py) — a scenario showing how to verify exact and approximate functional dependencies.
+ [verifying_gfd](https://github.com/Desbordante/desbordante-core/tree/main/examples/basic/verifying_gfd) — a scenario showing how to verify a graph functional dependency.
107 changes: 107 additions & 0 deletions examples/basic/mining_adc.py
@@ -0,0 +1,107 @@
import desbordante as db
import pandas as pd

RED = '\033[31m'
YELLOW = '\033[33m'
GREEN = '\033[32m'
CYAN = '\033[1m\033[36m'
ENDC = '\033[0m'

TABLE_1 = "examples/datasets/taxes.csv"
TABLE_2 = "examples/datasets/taxes_2.csv"

def print_table(filename: str, title: str = "") -> None:
    if title:
        print(f"{title}")
    data = pd.read_csv(filename, header=0)
    print(data, end="\n\n")

def main():
    print(f"""{YELLOW}This file demonstrates how to discover Approximate Denial Constraints (ADCs){ENDC}.

A DC {CYAN}φ{ENDC} is a negated conjunction of predicates of the following form:
{CYAN}∀s, t ∈ R, s ≠ t: ¬(p_1 ∧ ... ∧ p_m){ENDC}

DCs involve comparisons between pairs of rows within a dataset.
A typical DC example, derived from a Functional Dependency such as {CYAN}A -> B{ENDC},
is expressed as: {CYAN}∀s, t ∈ R, s ≠ t: ¬(t.A == s.A ∧ t.B ≠ s.B){ENDC}.
This denotes that for any pair of rows in the relation, it should not be the case
that while the values in column A are equal, the values in column B are unequal.

{YELLOW}Let's begin by looking at TABLE_1:{ENDC}
""")

    print_table(TABLE_1, "TABLE_1 (examples/datasets/taxes.csv):")

    print(f"""- The 'evidence_threshold' parameter specifies the maximum fraction of row pairs that may violate a DC for it to still be considered valid.
* evidence_threshold = 0 => exact DC mining, where every pair of rows must satisfy the DC.
* 0 < evidence_threshold < 1.0 => approximate DC mining, which tolerates that fraction of violating pairs.
- The 'shard_length' parameter splits the dataset into row "shards" for parallelization. Here,
we set it to 0 so all rows are processed in one shard.
""")

    print(f"""{YELLOW}Mining exact DCs (evidence_threshold=0) on TABLE_1{ENDC}""")

    algo = db.dc.algorithms.Default()
    algo.load_data(table=(TABLE_1, ',', True))
    algo.execute(evidence_threshold=0, shard_length=0)
    dcs_table1_exact = algo.get_dcs()

    print(f"{YELLOW}Discovered DCs:{ENDC}")
    for dc in dcs_table1_exact:
        print(f" {CYAN}{dc}{ENDC}")
    print()

    print(f"""Note the following Denial Constraint: {GREEN}¬{{ t.State == s.State ∧ t.Salary <= s.Salary ∧ t.FedTaxRate >= s.FedTaxRate }}{ENDC}.
It tells us that for all people in the same state the person with a higher salary has a higher tax rate.
""")

    print(f"""{YELLOW}Now let's raise the evidence_threshold to 0.5 on TABLE_1{ENDC}
This means the DC only needs to hold for at least half of the row pairs, thus allowing more approximate constraints.
""")

    print(f"""{YELLOW}Mining ADCs (evidence_threshold=0.5) on TABLE_1{ENDC}""")

    algo = db.dc.algorithms.Default()
    algo.load_data(table=(TABLE_1, ',', True))
    algo.execute(evidence_threshold=0.5, shard_length=0)
    dcs_table1_approx = algo.get_dcs()

    print(f"{YELLOW}Discovered ADCs:{ENDC}")
    for dc in dcs_table1_approx:
        print(f" {CYAN}{dc}{ENDC}")
    print()

    print(f"""{YELLOW}Let's take a look at TABLE_2:{ENDC}""")

    print_table(TABLE_2, "TABLE_2 (examples/datasets/taxes_2.csv):")

    print(f"""TABLE_2 is almost the same as TABLE_1, but we added a new record for Texas:
{GREEN}(State=Texas, Salary=5000, FedTaxRate=0.05){ENDC}.

This additional record violates one of the DCs that was valid in TABLE_1,
because it introduces a new pair of rows that breaks the constraint.
""")

    print(f"""{YELLOW}Mining exact DCs (evidence_threshold=0) on TABLE_2{ENDC}""")

    algo = db.dc.algorithms.Default()
    algo.load_data(table=(TABLE_2, ',', True))
    algo.execute(evidence_threshold=0, shard_length=0)
    dcs_table2_exact = algo.get_dcs()

    print(f"{YELLOW}Discovered DCs:{ENDC}")
    for dc in dcs_table2_exact:
        print(f" {CYAN}{dc}{ENDC}")
    print()

    print(f"""Now we can see that the same DC we examined on the previous dataset doesn't hold on the new one.
The reason is that for the last record {GREEN}(Texas, 5000, 0.05){ENDC} there are people in Texas with a lower salary
but a higher tax rate. Pairs of records like this that contradict a DC are called violations.
In this case the following pairs are violations: {RED}(6, 9), (7, 9), (8, 9){ENDC}, where each number is an index of a record.
""")

if __name__ == "__main__":
    main()

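The notion of a violation used in the example above can be illustrated with a brute-force check. This is only a sketch of the semantics, not how FastADC works internally (the PR builds evidence sets instead of enumerating pairs). The rows below are hypothetical and are not the contents of `taxes_2.csv`; the helper names `violates`, `find_violations` and `violation_fraction` are made up for illustration, and counting ordered pairs is an assumption.

```python
from itertools import permutations

# Hypothetical rows (State, Salary, FedTaxRate); NOT the contents of taxes_2.csv.
ROWS = [
    ("Texas", 3000, 0.10),
    ("Texas", 4000, 0.15),
    ("Texas", 5000, 0.05),
    ("Ohio", 3000, 0.20),
]


def violates(t, s):
    """Return True when the ordered pair (t, s) satisfies every predicate of
    not(t.State == s.State and t.Salary <= s.Salary and t.FedTaxRate >= s.FedTaxRate),
    i.e. when the pair contradicts the DC."""
    return t[0] == s[0] and t[1] <= s[1] and t[2] >= s[2]


def find_violations(rows):
    """Enumerate all ordered pairs of distinct row indices that violate the DC."""
    return [
        (i, j)
        for i, j in permutations(range(len(rows)), 2)
        if violates(rows[i], rows[j])
    ]


def violation_fraction(rows):
    """Fraction of ordered row pairs that violate the DC (assumed pair counting)."""
    total_pairs = len(rows) * (len(rows) - 1)
    return len(find_violations(rows)) / total_pairs if total_pairs else 0.0


if __name__ == "__main__":
    print(find_violations(ROWS))
    print(violation_fraction(ROWS))
```

Under this reading, the DC would be rejected by exact mining (any violating pair is enough) but could still be accepted as an approximate DC if the fraction stays within the chosen threshold.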
147 changes: 147 additions & 0 deletions src/core/algorithms/dc/FastADC/fastadc.cpp
@@ -0,0 +1,147 @@
#include "algorithms/dc/FastADC/fastadc.h"

#include <easylogging++.h>

#include "config/names_and_descriptions.h"
#include "config/option.h"
#include "config/option_using.h"
#include "config/tabular_data/input_table/option.h"
#include "dc/FastADC/model/pli_shard.h"
#include "dc/FastADC/util/approximate_evidence_inverter.h"
#include "dc/FastADC/util/evidence_aux_structures_builder.h"
#include "dc/FastADC/util/evidence_set_builder.h"
#include "dc/FastADC/util/predicate_builder.h"
#include "descriptions.h"
#include "names.h"
Comment on lines +14 to +15
Contributor

Suggested change
- #include "descriptions.h"
- #include "names.h"
+ #include "config/names_and_descriptions.h"


namespace algos::dc {

FastADC::FastADC() : Algorithm({}) {
    RegisterOptions();
    MakeOptionsAvailable({config::kTableOpt.GetName()});
}

void FastADC::RegisterOptions() {
    DESBORDANTE_OPTION_USING;

    config::InputTable default_table;

    RegisterOption(config::kTableOpt(&input_table_));
    RegisterOption(Option{&shard_length_, kShardLength, kDShardLength, 350U});
    RegisterOption(Option{&allow_cross_columns_, kAllowCrossColumns, kDAllowCrossColumns, true});
    RegisterOption(Option{&minimum_shared_value_, kMinimumSharedValue, kDMinimumSharedValue, 0.3});
    RegisterOption(
            Option{&comparable_threshold_, kComparableThreshold, kDComparableThreshold, 0.1});
    RegisterOption(Option{&evidence_threshold_, kEvidenceThreshold, kDEvidenceThreshold, 0.01});
}

void FastADC::MakeExecuteOptsAvailable() {
    using namespace config::names;

    MakeOptionsAvailable({kShardLength, kAllowCrossColumns, kMinimumSharedValue,
                          kComparableThreshold, kEvidenceThreshold});
}

void FastADC::LoadDataInternal() {
    typed_relation_ = model::ColumnLayoutTypedRelationData::CreateFrom(
            *input_table_, true, true);  // kMixed type will be treated as a string type

    if (typed_relation_->GetColumnData().empty()) {
        throw std::runtime_error("Got an empty dataset: DC mining is meaningless.");
    }
}

void FastADC::SetLimits() {
    unsigned all_rows_num = typed_relation_->GetNumRows();

    if (shard_length_ > all_rows_num) {
        throw std::invalid_argument(
                "'shard_length' (" + std::to_string(shard_length_) +
                ") must be less or equal to the number of rows in the table (total "
                "rows: " +
                std::to_string(all_rows_num) + ")");
    }
    if (shard_length_ == 0) shard_length_ = all_rows_num;
}

void FastADC::CheckTypes() {
    model::ColumnIndex columns_num = typed_relation_->GetNumColumns();
    unsigned rows_num = typed_relation_->GetNumRows();

    for (model::ColumnIndex column_index = 0; column_index < columns_num; column_index++) {
        model::TypedColumnData const& column = typed_relation_->GetColumnData(column_index);
        model::TypeId type_id = column.GetTypeId();

        if (type_id == +model::TypeId::kMixed) {
            LOG(WARNING) << "Column with index \"" + std::to_string(column_index) +
                                    "\" contains values of different types. Those values will be "
                                    "treated as strings.";
        } else if (!column.IsNumeric() && type_id != +model::TypeId::kString) {
            throw std::invalid_argument(
                    "Column with index \"" + std::to_string(column_index) +
                    "\" is of unsupported type. Only numeric and string types are supported.");
        }

        for (std::size_t row_index = 0; row_index < rows_num; row_index++) {
Contributor

Use prefix increment:
https://google.github.io/styleguide/cppguide.html#Preincrement_and_Predecrement:~:text=Use%20prefix%20increment/decrement%2C%20unless%20the%20code%20explicitly%20needs%20the%20result%20of%20the%20postfix%20increment/decrement%20expression.

Suggested change
- for (std::size_t row_index = 0; row_index < rows_num; row_index++) {
+ for (std::size_t row_index = 0; row_index < rows_num; ++row_index) {

            if (column.IsNull(row_index)) {
                throw std::runtime_error("Some of the value coordinates are nulls.");
            }
            if (column.IsEmpty(row_index)) {
                throw std::runtime_error("Some of the value coordinates are empty.");
            }
Comment on lines +86 to +91
Contributor

If distinguishing the two cases is not important, you can combine the checks with column.IsNullOrEmpty():

Suggested change
- if (column.IsNull(row_index)) {
-     throw std::runtime_error("Some of the value coordinates are nulls.");
- }
- if (column.IsEmpty(row_index)) {
-     throw std::runtime_error("Some of the value coordinates are empty.");
- }
+ if (column.IsNullOrEmpty(row_index)) {
+     throw std::runtime_error("Some of the value coordinates are null or empty.");
+ }

        }
    }
}

void FastADC::PrintResults() {
    LOG(INFO) << "Total denial constraints: " << dcs_.TotalDCSize();
    LOG(INFO) << "Minimal denial constraints: " << dcs_.MinDCSize();
    LOG(DEBUG) << dcs_.ToString();
}

unsigned long long FastADC::ExecuteInternal() {
    auto const start_time = std::chrono::system_clock::now();
    LOG(DEBUG) << "Start";

    SetLimits();
    CheckTypes();

    PredicateBuilder predicate_builder(&pred_provider_, &pred_index_provider_, allow_cross_columns_,
                                       minimum_shared_value_, comparable_threshold_);
    predicate_builder.BuildPredicateSpace(typed_relation_->GetColumnData());

    PliShardBuilder pli_shard_builder(&int_prov_, &double_prov_, &string_prov_, shard_length_);
    pli_shard_builder.BuildPliShards(typed_relation_->GetColumnData());

    EvidenceAuxStructuresBuilder evidence_aux_structures_builder(predicate_builder);
    evidence_aux_structures_builder.BuildAll();

    EvidenceSetBuilder evidence_set_builder(pli_shard_builder.pli_shards,
                                            evidence_aux_structures_builder.GetPredicatePacks());
    evidence_set_builder.BuildEvidenceSet(evidence_aux_structures_builder.GetCorrectionMap(),
                                          evidence_aux_structures_builder.GetCardinalityMask());

    LOG(INFO) << "Built evidence set";
    auto elapsed_milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now() - start_time);
    LOG(DEBUG) << "Current time: " << elapsed_milliseconds.count();

    ApproxEvidenceInverter dcbuilder(predicate_builder, evidence_threshold_,
                                     std::move(evidence_set_builder.evidence_set));

    dcs_ = dcbuilder.BuildDenialConstraints();

    PrintResults();

    elapsed_milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now() - start_time);
    LOG(INFO) << "Algorithm time: " << elapsed_milliseconds.count();
    return elapsed_milliseconds.count();
}

// TODO: mb make this a list?
std::vector<DenialConstraint> const& FastADC::GetDCs() const {
    return dcs_.GetResult();
}

} // namespace algos::dc
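As a reading aid for `FastADC::SetLimits`, the handling of `shard_length` can be sketched in Python. The validation mirrors the C++ above (a value of 0 means "all rows in one shard", a value larger than the row count is rejected). The second helper, `shard_ranges`, is hypothetical: it assumes shards are contiguous half-open row ranges, which this diff does not show, so treat it as an illustration only.

```python
def resolve_shard_length(shard_length, num_rows):
    """Mirror the checks in FastADC::SetLimits: reject values larger than the
    table, and treat 0 as 'use a single shard covering all rows'."""
    if shard_length > num_rows:
        raise ValueError(
            f"'shard_length' ({shard_length}) must be less or equal to the "
            f"number of rows in the table (total rows: {num_rows})"
        )
    return num_rows if shard_length == 0 else shard_length


def shard_ranges(shard_length, num_rows):
    """Hypothetical helper: split row indices into contiguous [begin, end) ranges.
    The actual splitting done by PliShardBuilder is an assumption here."""
    length = resolve_shard_length(shard_length, num_rows)
    if length == 0:  # empty table: nothing to split
        return []
    return [
        (begin, min(begin + length, num_rows))
        for begin in range(0, num_rows, length)
    ]


if __name__ == "__main__":
    print(shard_ranges(0, 10))  # one shard with all rows
    print(shard_ranges(4, 10))  # last shard is shorter
```

With the default `shard_length` of 350 registered in `RegisterOptions`, a table with fewer than 350 rows would hit the `ValueError` branch unless the caller passes a smaller value or 0, which is what the Python example does.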
61 changes: 61 additions & 0 deletions src/core/algorithms/dc/FastADC/fastadc.h
@@ -0,0 +1,61 @@
#pragma once

#include <memory>
#include <vector>

#include "algorithm.h"
Contributor

Use the full path for this include.

#include "dc/FastADC/providers/predicate_provider.h"
#include "dc/FastADC/util/denial_constraint_set.h"
#include "model/denial_constraint.h"
#include "table/column_layout_typed_relation_data.h"
#include "tabular_data/input_table_type.h"

namespace algos::dc {

using namespace fastadc;

class FastADC : public Algorithm {
private:
    unsigned shard_length_;
    bool allow_cross_columns_;
    double minimum_shared_value_;
    double comparable_threshold_;
    double evidence_threshold_;

    config::InputTable input_table_;
    std::unique_ptr<model::ColumnLayoutTypedRelationData> typed_relation_;

    PredicateIndexProvider pred_index_provider_;
    PredicateProvider pred_provider_;
    IntIndexProvider int_prov_;
    DoubleIndexProvider double_prov_;
    StringIndexProvider string_prov_;
    DenialConstraintSet dcs_;

    void MakeExecuteOptsAvailable() override;
    void LoadDataInternal() override;

    void SetLimits();
    void CheckTypes();
    void PrintResults();

    void ResetState() final {
        pred_index_provider_.Clear();
        pred_provider_.Clear();
        int_prov_.Clear();
        double_prov_.Clear();
        string_prov_.Clear();
        dcs_.Clear();
    }

    unsigned long long ExecuteInternal() final;

    void RegisterOptions();

public:
    FastADC();

    std::vector<DenialConstraint> const& GetDCs() const;
};

} // namespace algos::dc
53 changes: 53 additions & 0 deletions src/core/algorithms/dc/FastADC/misc/misc.h
@@ -0,0 +1,53 @@
#pragma once

#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <type_traits>

#include "model/table/typed_column_data.h"

namespace algos::fastadc {

namespace details {
// Helper to trigger a compile-time error for unsupported types
template <typename T>
struct DependentFalse : std::false_type {};
}  // namespace details

// TODO: look at performance, is returning by const reference here beneficial?
template <typename T>
[[nodiscard]] T GetValue(model::TypedColumnData const& column, size_t row) {
    model::Type const& type = column.GetType();

    if (!column.IsNullOrEmpty(row)) {
        return type.GetValue<T>(column.GetValue(row));
    }

    /*
     * Mimicking the Java behavior:
     * https://github.com/RangerShaw/FastADC/blob/master/src/main/java/de/metanome/algorithms/dcfinder/input/Column.java#L71
     *
     * public Long getLong(int line) {
     *     return values.get(line).isEmpty() ? Long.MIN_VALUE :
     *            Long.parseLong(values.get(line));
     * }
     *
     * public Double getDouble(int line) {
     *     return values.get(line).isEmpty() ? Double.MIN_VALUE :
     *            Double.parseDouble(values.get(line));
     * }
     *
     * public String getString(int line) {
     *     return values.get(line) == null ? "" : values.get(line);
     * }
     */
    if constexpr (std::is_same_v<T, std::string>) {
        return "";
Contributor

Suggested change
- return "";
+ return {};

    } else if constexpr (std::is_same_v<T, int64_t>) {
        return std::numeric_limits<int64_t>::min();
    } else if constexpr (std::is_same_v<T, double>) {
        return std::numeric_limits<double>::lowest();
    } else {
        static_assert(details::DependentFalse<T>::value,
                      "FastADC algorithm supports only int64_t, string, or double as column types. "
                      "This function should not be called with other types.");
    }
}

} // namespace algos::fastadc
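The fallback logic of `GetValue` can be sketched in Python for readers who want to experiment with it. The function `get_value` and the constants below are hypothetical stand-ins, not part of Desbordante. One detail worth noticing: the C++ code returns `std::numeric_limits<double>::lowest()` (the most negative finite double), whereas Java's `Double.MIN_VALUE` quoted in the comment is the smallest positive double, so the two sentinels are not the same value. Whether that difference matters here is unclear from this diff, especially since `FastADC::CheckTypes` throws on null or empty cells before mining starts.

```python
import math
import sys

INT64_MIN = -(2 ** 63)                          # std::numeric_limits<int64_t>::min()
DOUBLE_LOWEST = -sys.float_info.max             # std::numeric_limits<double>::lowest()
JAVA_DOUBLE_MIN_VALUE = math.ldexp(1.0, -1074)  # Java Double.MIN_VALUE (smallest positive)


def get_value(cell, kind):
    """Hypothetical analogue of GetValue<T>: parse a non-empty cell as `kind`
    (str, int or float), otherwise return a type-specific sentinel."""
    if cell is not None and cell != "":
        return kind(cell)
    if kind is str:
        return ""
    if kind is int:
        return INT64_MIN
    if kind is float:
        return DOUBLE_LOWEST
    # Runtime counterpart of the static_assert in the C++ template.
    raise TypeError("only str, int and float column types are supported")


if __name__ == "__main__":
    print(get_value("42", int), get_value("", int))
    print(get_value(None, float), JAVA_DOUBLE_MIN_VALUE)
```

The sentinel for doubles sorts below every finite value in the C++ version, while the Java constant would sort just above zero.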