docs: fix 404 links #17617

Merged
merged 2 commits into from
Jan 10, 2025
10 changes: 6 additions & 4 deletions docs/development/extensions-core/test-stats.md
@@ -1,6 +1,6 @@
---
id: test-stats
title: "Test Stats Aggregators"
title: "Test stats aggregators"
---

<!--
@@ -23,13 +23,14 @@ title: "Test Stats Aggregators"
-->


This Apache Druid extension incorporates test statistics related aggregators, including z-score and p-value. Please refer to [https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/](https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/) for math background and details.
The `druid-stats` extension for Apache Druid incorporates aggregators to compute test statistics, including z-scores and p-values.
Please refer to [Democratizing Experimentation Data for Product Innovations](https://medium.com/paypal-tech/democratizing-experimentation-data-for-product-innovations-8b6e1cf40c27) for math background and details.

Make sure to include the `druid-stats` extension to use these aggregators.

## Z-score for two-sample z-tests post aggregator

Please refer to [https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/](https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/) and [http://www.ucs.louisiana.edu/~jcb0773/Berry_statbook/Berry_statbook_chpt6.pdf](http://www.ucs.louisiana.edu/~jcb0773/Berry_statbook/Berry_statbook_chpt6.pdf) for more details.
Please refer to [Making Sense of the Two-Proportions Test](https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/) and [An Introduction to Statistics: Comparing Two Means](https://userweb.ucs.louisiana.edu/~jcb0773/Berry_statbook/427bookall-August2024.pdf) for more details.

z = (p1 - p2) / S.E. (assuming the null hypothesis is true)

@@ -41,6 +42,7 @@ S.E. = sqrt{ p1 * ( 1 - p1 )/n1 + p2 * (1 - p2)/n2) }
(p1 - p2) is the observed difference between two sample proportions.
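As a rough illustration of the formula above, the following Python sketch computes the two-sample z-score from raw success counts and sample sizes. The function name `zscore_two_sample` is hypothetical and not part of Druid; it simply mirrors the S.E. formula as written in these docs, assuming p1 = successCount1 / (sample size 1) and p2 = successCount2 / (sample size 2).

```python
import math


def zscore_two_sample(success1: int, n1: int, success2: int, n2: int) -> float:
    """Hypothetical helper (not a Druid API) mirroring the docs' formula:
    z = (p1 - p2) / S.E., S.E. = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1 = success1 / n1  # observed proportion (e.g. conversion rate) of sample 1
    p2 = success2 / n2  # observed proportion of sample 2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        # Each proportion is exactly 0 or 1, so the statistic is undefined.
        raise ValueError("standard error is zero")
    return (p1 - p2) / se


if __name__ == "__main__":
    # 300/500 successes vs. 200/500 successes: p1 = 0.6, p2 = 0.4
    print(zscore_two_sample(300, 500, 200, 500))
```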

### zscore2sample post aggregator

* **`zscore2sample`**: calculates the z-score using a two-sample z-test, converting binary variables (e.g., success or not) to continuous variables (e.g., conversion rate).

```json
@@ -74,7 +76,7 @@ p2 = (successCount2) / (sample size 2)
}
```

## Example Usage
## Example usage

In this example, we use the `zscore2sample` post aggregator to calculate the z-score, and then feed the z-score to the `pvalue2tailedZtest` post aggregator to calculate the p-value.
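The p-value step can be sketched in Python under the assumption that `pvalue2tailedZtest` returns the standard two-tailed p-value for a standard normal statistic, p = 2 * (1 - Φ(|z|)). The helper name below is hypothetical and this is not Druid's implementation; it uses the identity 2 * (1 - Φ(|z|)) = erfc(|z| / √2), which avoids subtracting two nearly equal numbers when |z| is large.

```python
import math


def pvalue_two_tailed(z: float) -> float:
    """Hypothetical helper (not a Druid API): two-tailed p-value for a z-score.

    Computes 2 * (1 - Phi(|z|)), where Phi is the standard normal CDF,
    via the equivalent form erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # z = 0 means no observed difference, so p = 1.
    print(pvalue_two_tailed(0.0))
    # z close to 1.96 gives p close to 0.05.
    print(pvalue_two_tailed(1.959964))
```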

2 changes: 1 addition & 1 deletion docs/querying/sql-translation.md
@@ -803,7 +803,7 @@ the query hits `maxStreamLength`: the maximum number of items to store in each s
See [GitHub issue 11544](https://github.com/apache/druid/issues/11544) for more details.
To work around the issue, increase the maximum stream length with the `approxQuantileDsMaxStreamLength` parameter
in the query context. Since it is set to 1,000,000,000 by default, you don't need to override it in most cases.
See [accuracy information](https://datasketches.apache.org/docs/Quantiles/OrigQuantilesSketch) in the DataSketches documentation for how many bytes are required per stream length.
See [accuracy information](https://datasketches.apache.org/docs/Quantiles/ClassicQuantilesSketch.html) in the DataSketches documentation for how many bytes are required per stream length.
This query context parameter is a temporary solution to avoid the known issue. It may be removed in a future release after the bug is fixed.

## Unsupported features