Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misbehaviour when inserting 100K docs with float_vectors and HNSW #2883

Open
5 tasks
sanikolaev opened this issue Dec 27, 2024 · 1 comment
Open
5 tasks

Comments

@sanikolaev
Copy link
Collaborator

Bug Description:

If you try to insert 100K 512-dims float_vectors, some inserts take too long, and a DROP TABLE afterwards also takes too long and doesn't fully remove the table files.

Use the manticore-load tool or the cache it generates.

snikolaev@dev2:~$ git clone https://github.com/manticoresoftware/manticore-load.git
snikolaev@dev2:~$ cd manticore-load
snikolaev@dev2:~/manticore-load$ ./manticore-load --drop --batch-size=1000 --threads=1 --total=100000 --init="create table test ( v float_vector knn_type='hnsw' knn_dims='512' hnsw_similarity='l2' );" --load="insert into test values ( 0, (<array_float/512/512/0/1>) )"
Process 1: Dropping table: DROP TABLE IF EXISTS test
Process 1: Executing init command: create table test ( v float_vector knn_type='hnsw' knn_dims='512' hnsw_similarity='l2' )
Process 1: Using cached data from: /tmp/manticore_load_0165b492661e1bd874e67d49bcada3d6
----------------------------------------------------------------------------------------------------------------
Time      Elapsed   Progress    QPS      DPS      CPU     Workers    Chunks    Merging   Size growth  Size
----------------------------------------------------------------------------------------------------------------
08:52:31  1.0s      20%         19       18.8K    N/A     3          0                   0B           670B
08:52:32  2.0s      40%         20       19.6K    13%     3          0                   0B           670B
08:52:33  3.1s      58%         17       18.3K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B

----------------------------------------------------------------------------------------------------------------
Time      Elapsed   Progress    QPS      DPS      CPU     Workers    Chunks    Merging   Size growth  Size
----------------------------------------------------------------------------------------------------------------
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:52:34  4.1s      76%         18       18.5K    16%     3          0                   0B           670B
08:53:04  33.8s     77%         0        0.1K     11%     2          1                   0B           86MB

Process 1 Final Statistics:
Init command: create table test ( v float_vector knn_type='hnsw' knn_dims='512' hnsw_similarity='l2' );
Load command: insert into test values ( 0, (<array_float/512/512/0/1>) )
--------------------------------------------------------------------------------------
Write Statistics:
Total time:       35.2s
Total queries:    100
Total docs:       100000
Docs per sec avg: 2838
QPS avg:          15
QPS 1p:           0
QPS 5p:           0
QPS 95p:          20
QPS 99p:          20
Latency avg:      715.9 ms
Latency 50p:      115.0 ms
Latency 95p:      135.0 ms
Latency 99p:      29500.0 ms
--------------------------------------------------------------------------------------

Note, some inserts took up to 29 seconds.

If you attempt dropping the table it takes 49 seconds and can't remove the files:

snikolaev@dev2:~/manticore-load$ time mysql -P9306 -h0 -e "drop table test; show warnings"
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                                                       |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| warning | 1000 | index test directory '/var/lib/manticore/test' is not empty after table drop, clean up files manually: test.2.spa, test.2.spb, test.2.spidx.0.tmp, test.3.spa, test.3.spb, test.3.spidx.0.tmp |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

real	0m49.293s
user	0m0.004s
sys	0m0.004s

The same w/o HNSW works fine:

snikolaev@dev2:~/manticore-load$ ./manticore-load --drop --batch-size=1000 --threads=1 --total=100000 --init="create table test ( v float_vector );" --load="insert into test values ( 0, (<array_float/512/512/0/1>) )"
Process 1: Dropping table: DROP TABLE IF EXISTS test
Process 1: Executing init command: create table test ( v float_vector )
Process 1: Generating new data cache /tmp/manticore_load_f9b6bc63a688ee53bc8b695211e85104 ... 100%
----------------------------------------------------------------------------------------------------------------
Time      Elapsed   Progress    QPS      DPS      CPU     Workers    Chunks    Merging   Size growth  Size
----------------------------------------------------------------------------------------------------------------
08:55:34  1.0s      19%         18       17.7K    N/A     3          0                   0B           596B
08:55:35  2.1s      38%         19       18.2K    14%     3          0                   0B           596B
08:55:36  3.1s      58%         19       17.2K    16%     3          1                   51MB         55MB
08:55:37  4.1s      76%         18       18.2K    15%     2          1                   0B           55MB

Process 1 Final Statistics:
Init command: create table test ( v float_vector );
Load command: insert into test values ( 0, (<array_float/512/512/0/1>) )
--------------------------------------------------------------------------------------
Write Statistics:
Total time:       5.5s
Total queries:    100
Total docs:       100000
Docs per sec avg: 18193
QPS avg:          19
QPS 1p:           18
QPS 5p:           18
QPS 95p:          19
QPS 99p:          19
Latency avg:      109.6 ms
Latency 50p:      115.0 ms
Latency 95p:      125.0 ms
Latency 99p:      135.0 ms
--------------------------------------------------------------------------------------

snikolaev@dev2:~/manticore-load$ time mysql -P9306 -h0 -e "drop table test; show warnings"

real	0m0.102s
user	0m0.007s
sys	0m0.000s

Manticore Search Version:

Manticore 6.3.9 aa1c56713@24122619 dev (columnar 2.3.1 9d9a466@24122408) (secondary 2.3.1 9d9a466@24122408) (knn 2.3.1 9d9a466@24122408)

Operating System Version:

Ubuntu Jammy (dev2)

Have you tried the latest development version?

Yes

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@sanikolaev
Copy link
Collaborator Author

As discussed on today's dev call, this might have something to do with saving data to disk in chunks. I can confirm that. I did further testing. Here’s what it looks like after running the original query:

snikolaev@dev2:~/manticore-load$ ./manticore-load --drop --batch-size=1000 --threads=1 --total=100000 --init="create table test ( v float_vector knn_type='hnsw' knn_dims='512' hnsw_similarity='l2');" --load="insert into test values ( 0, (<array_float/512/512/0/1>) )"
======================================================================================
Process 1: Running with threads=1, batch-size=1000
Process 1: Dropping table: DROP TABLE IF EXISTS test
Process 1: Executing init command: create table test ( v float_vector knn_type='hnsw' knn_dims='512' hnsw_similarity='l2')
Process 1: Using cached data from: /tmp/manticore_load_0dafc0b92285d3e080bfbb649dc95811

----------------------------------------------------------------------------------------------------------------
Time      Elapsed   Progress    QPS      DPS      CPU     Workers    Chunks    Merging   Size growth  Size
----------------------------------------------------------------------------------------------------------------
10:42:50  1.0s      21%         20       19.7K    N/A     3          0                   0B           670B
10:42:52  2.1s      40%         18       18.6K    14%     3          0                   0B           670B
10:42:53  3.1s      59%         18       17.7K    17%     3          0                   0B           670B
...
Process 1 Final Statistics:
Init command: create table test ( v float_vector knn_type='hnsw' knn_dims='512' hnsw_similarity='l2');
Load command: insert into test values ( 0, (<array_float/512/512/0/1>) )
--------------------------------------------------------------------------------------
Write Statistics:
Total time:       36.8s
Total queries:    100
Threads:          1
Batch size:       1000
Total docs:       100000
Docs per sec avg: 2715
QPS avg:          15
QPS 1p:           1
QPS 5p:           1
QPS 95p:          20
QPS 99p:          20
Latency avg:      748.9 ms
Latency 50p:      115.0 ms
Latency 95p:      135.0 ms
Latency 99p:      31500.0 ms
--------------------------------------------------------------------------------------
snikolaev@dev2:~/manticore-load$ sudo tail -f /var/log/manticore/searchd.log
...
[Mon Dec 30 10:43:25.317 2024] [802854] rt: table test: diskchunk 0(1), segments 21  saved in 34.242173 (34.267058) sec, RAM saved/new 45766749/115506557 ratio 0.333333 (soft limit 44739197, conf limit 134217728)
[Mon Dec 30 10:43:25.966 2024] [802852] rt: table test: diskchunk 1(2), segments 21  saved in 33.721439 (33.756128) sec, RAM saved/new 45766749/93712867 ratio 0.333333 (soft limit 44739197, conf limit 134217728)
[Mon Dec 30 10:43:59.857 2024] [802842] rt: table test: diskchunk 3(3), segments 21  saved in 33.363645 (33.393256) sec, RAM saved/new 45766749/10896845 ratio 0.807692 (soft limit 108406626, conf limit 134217728)
[Mon Dec 30 10:44:26.280 2024] [802838] rt: table test: diskchunk 2(4), segments 32  saved in 60.919561 (60.958473) sec, RAM saved/new 69739808/10896845 ratio 0.864865 (soft limit 116080197, conf limit 134217728)
snikolaev@dev2:~/manticore-load$ mysql -P9306 -h0 -e "show table test status like '%chunk%'"
+--------------------------+----------+
| Variable_name            | Value    |
+--------------------------+----------+
| ram_chunk                | 10899125 |
| ram_chunk_segments_count | 5        |
| disk_chunks              | 4        |
+--------------------------+----------+
snikolaev@dev2:~/manticore-load$ mysql -P9306 -h0 -e "select id, indexed_documents from test.@status"
+------+-------------------+
| id   | indexed_documents |
+------+-------------------+
|    2 |             21000 |
|    3 |             21000 |
|    4 |             32000 |
|    1 |             21000 |
+------+-------------------+

If you attempt dropping the table after the insert:

snikolaev@dev2:~/manticore-load$ time mysql -P9306 -h0 -e "drop table test; show warnings"
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                                                       |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| warning | 1000 | index test directory '/var/lib/manticore/test' is not empty after table drop, clean up files manually: test.2.spa, test.2.spb, test.2.spidx.0.tmp, test.3.spa, test.3.spb, test.3.spidx.0.tmp |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The questions are:

  1. Is it acceptable that saving 21K 512-dimensional float vectors and building an HNSW index takes 33 seconds? How can we improve the performance?
  2. Is it expected that we cannot drop a table normally while a disk chunk is being saved to it? This seems like a poor user experience because if the user doesn’t have root access, they won’t be able to create a table with the same name afterward. We need to find a solution to this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants