Snapshot metrics - not finding if the last backup was successful #961

Open
GeiserX opened this issue Nov 15, 2024 · 4 comments

Comments

@GeiserX

GeiserX commented Nov 15, 2024

I enabled --collector.snapshots to see which metrics I could get for the snapshots I take in my ES cluster.
However, these are the only related metrics I found:

elasticsearch_scrape_duration_seconds{collector="snapshots"} 0.006456413
elasticsearch_scrape_success{collector="snapshots"} 1

The following structure is then repeated many times, once per node:

elasticsearch_thread_pool_active_count{cluster="deduplicator-es",es_client_node="true",es_data_node="false",es_ingest_node="false",es_master_node="true",host="...",name="deduplicator-es-es-master-2",type="searchable_snapshots_cache_fetch_async"} 0
elasticsearch_thread_pool_active_count{cluster="deduplicator-es",es_client_node="true",es_data_node="false",es_ingest_node="false",es_master_node="true",host="...",name="deduplicator-es-es-master-2",type="searchable_snapshots_cache_prewarming"} 0
elasticsearch_thread_pool_active_count{cluster="deduplicator-es",es_client_node="true",es_data_node="false",es_ingest_node="false",es_master_node="true",host="...",name="deduplicator-es-es-master-2",type="snapshot"} 0
elasticsearch_thread_pool_active_count{cluster="deduplicator-es",es_client_node="true",es_data_node="false",es_ingest_node="false",es_master_node="true",host="...",name="deduplicator-es-es-master-2",type="snapshot_meta"} 0

I would simply like to check whether the last backup failed, but there seems to be no metric I can use for that. I have one daily backup configured.

I don't know if this is a feature request, a bug, or if I'm doing something wrong on my side, honestly.

Thank you

GeiserX changed the title from "Snapshot metrics" to "Snapshot metrics - not finding if the last backup was successful" on Nov 15, 2024
@Skunnyk

Skunnyk commented Dec 2, 2024

Once enabled, you should have lots of snapshot-related metrics, named like elasticsearch_snapshot_stats_xxxxxx. See https://github.com/prometheus-community/elasticsearch_exporter?tab=readme-ov-file#metrics

@sysadmind
Contributor

Can you provide the version of the exporter that you are running? Also, the exporter reads its snapshot metrics from the /_snapshot endpoint. Can you query that endpoint on your Elasticsearch cluster and check whether it returns any results?

@GeiserX
Author

GeiserX commented Jan 8, 2025

@Skunnyk I have all those metrics EXCEPT the elasticsearch_snapshot_stats_...

@sysadmind I'm running this version:

image: quay.io/prometheuscommunity/elasticsearch-exporter:v1.7.0

(I'm using the elastic-operator, version 2.16.0, with Elasticsearch version 8.6.2 for this ES cluster)

I tried forwarding the port locally and querying the endpoint:

~ ❯ curl -u "elastic:password" -k https://my-es-http:9200/_snapshot\?pretty
{
  "backup" : {
    "type" : "s3",
    "uuid" : "...",
    "settings" : {
      "bucket" : "my-bucket",
      "path_style_access" : "true",
      "endpoint" : "http://(...).rook-ceph.svc"
    }
  }
}

So a snapshot repository is registered, and I can confirm that it contains a lot of backups:

~ ❯ curl -u "elastic:password" -k https://my-es-http:9200/_snapshot/backup/_all\?pretty
{
  "snapshots" : [
    {
      "snapshot" : "daily-snapshots-...",
      "uuid" : "...",
      "repository" : "backup",
      "version_id" : 8060299,
      "version" : "8.6.2",
      "indices" : [
        "index1",
        ....
       ],
       ... and many other snapshots
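
The original question (did the last backup succeed?) can in principle be answered from this response directly, independently of the exporter. A minimal sketch, assuming each snapshot entry carries `state` and `start_time_in_millis` fields; the Elasticsearch get-snapshot API returns them, but they are elided in the output above. The function name and the sample data are hypothetical:

```python
from typing import Optional


def latest_snapshot_state(response: dict) -> Optional[str]:
    """Return the state of the most recently started snapshot, or None if there are none."""
    snapshots = response.get("snapshots", [])
    if not snapshots:
        return None
    # Assumption: every entry has "start_time_in_millis"; missing values sort first.
    latest = max(snapshots, key=lambda s: s.get("start_time_in_millis", 0))
    return latest.get("state")


# Hypothetical sample shaped like the /_snapshot/backup/_all response.
sample = {
    "snapshots": [
        {"snapshot": "daily-snapshots-a", "state": "SUCCESS", "start_time_in_millis": 1000},
        {"snapshot": "daily-snapshots-b", "state": "FAILED", "start_time_in_millis": 2000},
    ]
}

print(latest_snapshot_state(sample))
```

The parsed JSON from the curl call above could be passed in place of `sample`; a result other than `SUCCESS` would mean the most recent snapshot did not complete cleanly.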

I will try with other ES clusters in the same k8s cluster, but it's odd.

@GeiserX
Author

GeiserX commented Jan 9, 2025

I tried with another ES instance in another cluster, deployed similarly and with exactly the same versions, and there I also get many metrics:

elasticsearch_breakers_estimated_size_bytes{...}
... # 100+ more results with the same metric name
elasticsearch_breakers_overhead{...}
... # 100+ more results with the same metric name
elasticsearch_breakers_tripped{...}
... # 100+ more results with the same metric name
elasticsearch_cluster_health_active_primary_shards{...} 
elasticsearch_cluster_health_active_shards{...} 
elasticsearch_cluster_health_delayed_unassigned_shards{...}
elasticsearch_cluster_health_initializing_shards{...}
elasticsearch_cluster_health_number_of_data_nodes{...}
elasticsearch_cluster_health_number_of_in_flight_fetch{...}
elasticsearch_cluster_health_number_of_nodes{...} 
... # A lot more metrics

But again the only metrics available related to snapshots are:

# HELP elasticsearch_scrape_duration_seconds elasticsearch_exporter: Duration of a collector scrape.
# TYPE elasticsearch_scrape_duration_seconds gauge
elasticsearch_scrape_duration_seconds{collector="cluster-info"} 0.005485547
elasticsearch_scrape_duration_seconds{collector="snapshots"} 0.005945568
# HELP elasticsearch_scrape_success elasticsearch_exporter: Whether a collector succeeded.
# TYPE elasticsearch_scrape_success gauge
elasticsearch_scrape_success{collector="cluster-info"} 1
elasticsearch_scrape_success{collector="snapshots"} 1
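
To confirm this systematically rather than by eye, the scraped /metrics text can be filtered by metric-name prefix. A rough sketch, assuming the output has been saved as plain Prometheus exposition text; the helper name is hypothetical, and the `elasticsearch_snapshot_stats_` prefix is the one mentioned earlier in this thread:

```python
from typing import List


def metric_names_with_prefix(exposition: str, prefix: str) -> List[str]:
    """Return the sorted, de-duplicated metric names that start with the given prefix."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or whitespace (value).
        name = line.split("{", 1)[0].split()[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


# Sample taken from the output quoted above.
sample = """\
# HELP elasticsearch_scrape_success elasticsearch_exporter: Whether a collector succeeded.
# TYPE elasticsearch_scrape_success gauge
elasticsearch_scrape_success{collector="cluster-info"} 1
elasticsearch_scrape_success{collector="snapshots"} 1
elasticsearch_scrape_duration_seconds{collector="snapshots"} 0.005945568
"""

print(metric_names_with_prefix(sample, "elasticsearch_snapshot_stats_"))
print(metric_names_with_prefix(sample, "elasticsearch_scrape_"))
```

An empty list for the `elasticsearch_snapshot_stats_` prefix, alongside a successful `snapshots` scrape, matches the behaviour described here.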

I can also confirm that --collector.snapshots is enabled:

  - command:
    - elasticsearch_exporter
    - --log.format=logfmt
    - --log.level=info
    - --es.uri=https://my-es-http:9200
    - --es.all
    - --es.indices
    - --es.indices_settings
    - --es.indices_mappings
    - --es.shards
    - --collector.snapshots
    - --es.timeout=30s
    - --es.ca=/ssl/ca.crt
    - --web.listen-address=:9108
    - --web.telemetry-path=/metrics

The cluster also serves the /_snapshot endpoint:

~ ❯ curl -u "elastic:password" -k https://my-es-http:9200/_snapshot/backup\?pretty
{
  "backup" : {
    "type" : "s3",
    "uuid" : "....",
    "settings" : {
      "bucket" : "...",
      "path_style_access" : "true",
      "endpoint" : "http://(...).rook-ceph.svc"
    }
  }
}

Supposedly, --collector.snapshots already works in 1.7.0. I could try upgrading, but that doesn't seem to be the issue here, as the 1.8.0 changes do not mention anything in this regard.
