Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add size to index-stats API response #492

Open
wants to merge 2 commits into
base: mainline
Choose a base branch
from

Conversation

hmacr
Copy link

@hmacr hmacr commented May 26, 2023

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    feature

  • What is the current behavior? (You can also link to an open issue here)
    Add Index Size (GB) in stats[ENHANCEMENT] #374

  • What is the new behavior (if this is a feature change)?
    Index size will be returned as a field named size in the index-stats API response

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
    No

  • Have unit tests been run against this PR? (Has there also been any additional testing?)
    Yes

  • Related Python client changes (link commit/PR here)
    There is no index/stats client method exposed in py-marqo. Do you want it to be added as part of this change?

  • Related documentation changes (link commit/PR here)

  • Other information:

  • Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes/features)
  • Docs have been added / updated (for bug fixes / features)

@pandu-k
Copy link
Collaborator

pandu-k commented May 31, 2023

Hi @hmacr ! Thanks for the PR!

  1. Is it possible to rework this using the OpenSearch stats endpoint?
GET /{index}/_stats

indices.{index}.total.store.size_in_bytes
  1. Can you add some a unit test for this please?
    See here for the existing get_stats tests
    https://github.com/marqo-ai/marqo/blob/mainline/tests/tensor_search/test_get_stats.py

  2. Can you run please run tests.
    Read here for how to do that: https://github.com/marqo-ai/marqo/blob/d39f2312e9e289b0c54f49e14cddd49095808ad4/CONTRIBUTING.md#unit-tests
    You can run just the tensor_search tests, which are faster and use less resources, with the following:
    tox -- tests/tensor_search
    If you are having a lot of trouble running tests, I can create branch in our repo from your fork and run our pipelines against it, and pass you the feedback

@hmacr
Copy link
Author

hmacr commented Jul 26, 2023

@pandu-k Apologies, I understand it's been a long time since I created the pull request; I'm trying to be more active on open source these days.

About the review comments, I've addressed them and ran the unit-tests as well (attaching the screenshots; there was one failure due to an unavailable remote repo). Please let me know your feedback.

Screenshot 2023-07-26 at 8 03 46 PM

Screenshot 2023-07-26 at 8 04 14 PM

assert tensor_search.get_stats(config=self.config, index_name=self.index_name_1)["numberOfDocuments"] == 3
index_stats = tensor_search.get_stats(config=self.config, index_name=self.index_name_1)
assert index_stats["numberOfDocuments"] == 3
assert len(index_stats["size"]) != 0
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that the computed size of the documents uploaded in this test was changing on each run; so figured len check would be a good assertion here, and added unit-tests for the bytes to human readable format function here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants