Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding avg() to select ... join leads to wrong results #2915

Open
5 tasks
sanikolaev opened this issue Jan 9, 2025 · 2 comments
Open
5 tasks

Adding avg() to select ... join leads to wrong results #2915

sanikolaev opened this issue Jan 9, 2025 · 2 comments
Assignees
Labels

Comments

@sanikolaev
Copy link
Collaborator

Bug Description:

This command creates 2 tables:

  • products - 10K docs
  • and reviews - 10M docs
manticore-load --drop --batch-size=1000 --threads=5 --total=10000 --init="CREATE TABLE products(name text, category multi)" --load="INSERT INTO products(id, name,category) VALUES(<increment>, '<text/2/10>',(<array/1/3/1/1000>))" --together --drop --batch-size=10000 --threads=5 --total=10000000 --init="create table reviews(product_id bigint, user_id int, rating int, comment text)" --load="insert into reviews(product_id, user_id, rating, comment) values(<int/1/10000>, <int/1/1000>, <int/1/100>, '<text/20/100>')"

(should take 1-2 minutes to load).

Alternatively, you can use this sql dump:

root@perf3 ~ # ls -lah /root/products_reviews.sql.tgz
-rw-r--r-- 1 root root 1.4G Jan  9 18:14 /root/products_reviews.sql.tgz

With this setup this query returns wrong results (too low count(*)):

root@perf3 ~ # mysql -P9306 -h0 -e "select name, count(*), avg(reviews.rating) avg_rating from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc"
+---------------------------------------------------------------------+----------+-------------+
| name                                                                | count(*) | avg_rating  |
+---------------------------------------------------------------------+----------+-------------+
| Write; broke? informally thick hold.                                |        7 | 54.28571429 |
| Publicly! should illegally yearly right 50 dawn long draw.          |        6 | 52.83333333 |
| Begin. Young soft slow; particularly! bring.                        |        6 | 65.66666667 |
| Ask not excited? tame dry.                                          |        6 | 55.66666667 |
| Clearly have! medically write; ask hold clever dusk.                |        6 | 28.16666667 |
| Internationally politically north 3. Called successfully? happily.  |        5 |   34.200000 |
| Doubtfully regionally angry careless; lazy.                         |        5 |   21.600000 |
| Loud begin narrow had will? outside evening know, frequently empty. |        5 |   36.200000 |
| Happy light foolish obviously fresh.                                |        5 |   65.400000 |
| We. Know spend.                                                     |        5 |   43.200000 |
| Serious and? cold evening later.                                    |        5 |   22.800000 |
| Brave gone strong angry 100! thick was.                             |        5 |   53.400000 |
| Do! easily. Feel loudly constantly put draw. Real subtly old.       |        5 |   49.000000 |
| Scientifically bring medically maybe found; generally.              |        5 |   50.000000 |
| Everywhere taken rise.                                              |        5 |   41.200000 |
| Got had irregularly! day hot west.                                  |        5 |   36.800000 |
| Dusk light spiritually! correctly thick. Happy usually think.       |        5 |   47.400000 |
| Evening doubtfully daily unsuccessfully.                            |        5 |   40.000000 |
| Short that; nowhere held.                                           |        5 |   49.800000 |
| Came began did I, musically tell.                                   |        5 |   58.800000 |
+---------------------------------------------------------------------+----------+-------------+

The max count(*) is actually 1000 which we can see if we remove avg(reviews.rating) avg_rating:

root@perf3 ~ # mysql -P9306 -h0 -e "select name, count(*) from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc"
+----------------------------------------------------------------------------------+----------+
| name                                                                             | count(*) |
+----------------------------------------------------------------------------------+----------+
| Dull artistically? occasionally evening artificially the 3; as autumn will.      |     1000 |
| Foolish loud worked generally possibly? wild! alive ancient brought.             |     1000 |
| Slow always dull have she sharp universally 50. Become from.                     |     1000 |
| Quiet tame slowly know keep.                                                     |     1000 |
| Led! could empty professionally politically happy yesterday.                     |     1000 |
| Always sour lazy have softly unhappily ask musically; back.                      |     1000 |
| Physically back rough short; certainly? bring above.                             |     1000 |
| Generally frequently informally weekly not back down try.                        |     1000 |
| I! fell spring never has dull; above correctly above.                            |     1000 |
| Leave; nightly take up.                                                          |     1000 |
| On in. Careless? above. Monthly, rich sometimes! grow unusually.                 |     1000 |
| Been understand early normally permanently mentally dead.                        |     1000 |
| Historically? dirty? careful serious taken expectedly, globally 5 personally on. |     1000 |
| Feel south perhaps.                                                              |     1000 |
| Generally hungry gradually tired.                                                |     1000 |
| Dry, brought clever. Unhappily 20 surely would.                                  |     1000 |
| North poor globally.                                                             |     1000 |
| Ask saw happily is kept.                                                         |     1000 |
| Sat of temporarily gentle wrote, got humble real technically.                    |     1000 |
| Send rich bad complexly early came! technically.                                 |     1000 |
+----------------------------------------------------------------------------------+----------+

Adding option join_batch_size=0 makes the query much slower, but helps:

root@perf3 ~ # mysql -P9306 -h0 -e "select name, count(*), avg(reviews.rating) avg_rating from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc option join_batch_size=0"
+----------------------------------------------------------------------------------+----------+------------+
| name                                                                             | count(*) | avg_rating |
+----------------------------------------------------------------------------------+----------+------------+
| Dull artistically? occasionally evening artificially the 3; as autumn will.      |     1000 |  49.808000 |
| Led! could empty professionally politically happy yesterday.                     |     1000 |  50.913000 |
| Always sour lazy have softly unhappily ask musically; back.                      |     1000 |  49.335000 |
| Quiet tame slowly know keep.                                                     |     1000 |  49.539000 |
| Foolish loud worked generally possibly? wild! alive ancient brought.             |     1000 |  50.710000 |
| Slow always dull have she sharp universally 50. Become from.                     |     1000 |  51.047000 |
| Physically back rough short; certainly? bring above.                             |     1000 |  50.617000 |
| Generally frequently informally weekly not back down try.                        |     1000 |  48.804000 |
| Leave; nightly take up.                                                          |     1000 |  50.235000 |
| I! fell spring never has dull; above correctly above.                            |     1000 |  50.026000 |
| Feel south perhaps.                                                              |     1000 |  49.637000 |
| Historically? dirty? careful serious taken expectedly, globally 5 personally on. |     1000 |  49.656000 |
| Generally hungry gradually tired.                                                |     1000 |  48.926000 |
| On in. Careless? above. Monthly, rich sometimes! grow unusually.                 |     1000 |  50.585000 |
| Been understand early normally permanently mentally dead.                        |     1000 |  52.532000 |
| Thought, sit right become.                                                       |     1000 |  50.970000 |
| Send rich bad complexly early came! technically.                                 |     1000 |  50.386000 |
| Understand a? might spiritually.                                                 |     1000 |  50.488000 |
| Sat of temporarily gentle wrote, got humble real technically.                    |     1000 |  50.401000 |
| Dry, brought clever. Unhappily 20 surely would.                                  |     1000 |  49.497000 |
+----------------------------------------------------------------------------------+----------+------------+

Manticore Search Version:

Manticore 6.3.9 f990808e1@25010913 dev (columnar 2.3.1 9d9a466@24122408) (secondary 2.3.1 9d9a466@24122408) (knn 2.3.1 9d9a466@24122408)

Operating System Version:

Ubuntu Impish (perf3)

Have you tried the latest development version?

Yes

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@sanikolaev
Copy link
Collaborator Author

MRE (2 + 1001 docs):

root@perf3 ~ # manticore-load --drop --total=2 --init="CREATE TABLE products(name text)" --load="INSERT INTO products(id) VALUES(<increment>)" --together --drop --batch-size=1000 --total=1001 --init="create table reviews(product_id bigint, rating int)" --load="insert into reviews(product_id, rating) values(<int/1/2>, <int/1/100>)" --quiet

Threads ; Batch     ; Time        ; Total Docs  ; Docs/Sec    ; Avg QPS     ; p99 QPS     ; p95 QPS     ; p5 QPS      ; p1 QPS      ; Lat Avg     ; Lat p50     ; Lat p95     ; Lat p99     ;
1       ; 1         ; 00:00       ; 2           ; 1242        ; 0           ; 0           ; 0           ; 0           ; 0           ; 0.4         ; 0.5         ; 0.5         ; 0.5         ;

Threads ; Batch     ; Time        ; Total Docs  ; Docs/Sec    ; Avg QPS     ; p99 QPS     ; p95 QPS     ; p5 QPS      ; p1 QPS      ; Lat Avg     ; Lat p50     ; Lat p95     ; Lat p99     ;
1       ; 1000      ; 00:00       ; 2000        ; 544361      ; 0           ; 0           ; 0           ; 0           ; 0           ; 3.1         ; 3.5         ; 3.5         ; 3.5         ;

root@perf3 ~ # mysql -v -P9306 -h0 -e "select name, count(*), avg(reviews.rating) avg_rating from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc; select name, count(*) from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc"
--------------
select name, count(*), avg(reviews.rating) avg_rating from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc
--------------

+------+----------+-------------+
| name | count(*) | avg_rating  |
+------+----------+-------------+
|      |      513 | 50.39376218 |
|      |      487 | 52.02464066 |
+------+----------+-------------+
--------------
select name, count(*) from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc
--------------

+------+----------+
| name | count(*) |
+------+----------+
|      |      514 |
|      |      487 |
+------+----------+

Notes:

@sanikolaev
Copy link
Collaborator Author

A little bit more compact MRE:

root@perf3 ~ # manticore-load --drop --total=2 --init="CREATE TABLE products" --load="INSERT INTO products(id) VALUES(<increment>)" --together --drop --batch-size=1000 --total=1001 --init="create table reviews(product_id bigint, rating int)" --load="insert into reviews(product_id, rating) values(<int/1/2>, <int/1/100>)" --quiet

Threads ; Batch     ; Time        ; Total Docs  ; Docs/Sec    ; Avg QPS     ; p99 QPS     ; p95 QPS     ; p5 QPS      ; p1 QPS      ; Lat Avg     ; Lat p50     ; Lat p95     ; Lat p99     ;
1       ; 1         ; 00:00       ; 2           ; 2601        ; 0           ; 0           ; 0           ; 0           ; 0           ; 0.4         ; 0.5         ; 0.5         ; 0.5         ;

Threads ; Batch     ; Time        ; Total Docs  ; Docs/Sec    ; Avg QPS     ; p99 QPS     ; p95 QPS     ; p5 QPS      ; p1 QPS      ; Lat Avg     ; Lat p50     ; Lat p95     ; Lat p99     ;
1       ; 1000      ; 00:00       ; 2000        ; 491165      ; 0           ; 0           ; 0           ; 0           ; 0           ; 2.9         ; 2.5         ; 2.5         ; 2.5         ;
root@perf3 ~ # mysql -v -P9306 -h0 -e "select id, count(*), avg(reviews.rating) avg_rating from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc; select id, count(*) from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc"
--------------
select id, count(*), avg(reviews.rating) avg_rating from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc
--------------

+------+----------+-------------+
| id   | count(*) | avg_rating  |
+------+----------+-------------+
|    1 |      513 | 50.39376218 |
|    2 |      487 | 52.02464066 |
+------+----------+-------------+
--------------
select id, count(*) from products left join reviews on reviews.product_id = products.id group by id order by count(*) desc
--------------

+------+----------+
| id   | count(*) |
+------+----------+
|    1 |      514 |
|    2 |      487 |
+------+----------+

products_reviews.sql.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants