How to find and diagnose unassigned Elasticsearch shards


Identifying the unassigned shards in Elasticsearch and finding out why a shard can’t be allocated is critical when we want to get rid of the “NoShardAvailableActionException: No shard available” warning. Fortunately, the Elasticsearch APIs also provide helpful hints pointing to the possible cause of this issue.

Verify which Elasticsearch shards are unassigned

First, we need to find out whether any shards could not be assigned. Below you’ll find a few ways of learning about the issue: using monitoring dashboards, browsing log messages and, most usefully, calling the Elasticsearch cat shards API.

Shard overview in the ElasticHQ and Kibana dashboards

We can see that some shards were marked as unavailable in the ElasticHQ panel:

[Screenshot: shards marked as not available in ElasticHQ]

Furthermore, if we monitor our Elasticsearch cluster with Metricbeat and Kibana, we can visit the Overview tab in the dashboard to see how many shards are not allocated:

[Screenshot: unassigned shards count in the Kibana dashboard]

On the same page, we can list all indices to see which ones have unassigned shards. In the example, the .elastichq index has two unassigned shards.

After clicking on this index, we can see a short summary of its shards at the bottom of the page:

[Screenshot: unassigned shards for the chosen index]

In other words, the Kibana dashboard can show us how many shards are unassigned, what type they are and where to look for them.

The unavailable shard warning in logs

When you browse the Elasticsearch logs, you’ll see warning messages for faulty shards, such as the NoShardAvailableActionException mentioned above.
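If the log files are long, a small script can filter out only the shard warnings. Below is a minimal Python sketch; it assumes plain-text logs and assumes that the interesting lines contain the NoShardAvailableActionException class name. The function names are hypothetical, and the log location depends on your installation.

```python
from typing import Iterable, List

# Assumption: warnings about missing shards mention this exception class.
MARKER = "NoShardAvailableActionException"


def shard_warnings(lines: Iterable[str], marker: str = MARKER) -> List[str]:
    """Return the log lines that mention the marker, with trailing newlines stripped."""
    return [line.rstrip("\n") for line in lines if marker in line]


def scan_log(path: str) -> List[str]:
    """Read a plain-text log file and return its shard-related warnings."""
    with open(path, encoding="utf-8", errors="replace") as log_file:
        return shard_warnings(log_file)
```

For example, `scan_log("/var/log/elasticsearch/my-cluster.log")` returns the matching lines; the path is only an illustration.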

Details on the shards status returned by the Elasticsearch APIs

In order to see the state and details of all shards, we’re going to call the cat shards API with curl.

The curl request consists of the following parts:

  • -u – the curl option for specifying our user’s credentials in the username:password format;
  • localhost:9300 – the host and port of my Elasticsearch instance (by default, Elasticsearch serves HTTP requests on port 9200, while 9300 is the node-to-node transport port);
  • _cat/shards – the cat API endpoint that we can call in a terminal or a Kibana console (cat APIs are intended for human use, so for application consumption you should choose the corresponding JSON API);
  • v – a query parameter that includes column headings in the response (defaults to false);
  • h – a query parameter that lets us list the columns we want to see.
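For readers who prefer scripting the call, the same kind of request can be built with Python’s standard library. This is a sketch, not the exact request used here: the credentials, the host (Elasticsearch’s default HTTP port 9200) and the column list passed to h (index, shard, prirep, state, unassigned.reason) are assumptions to adjust, and the helper names are hypothetical.

```python
import base64
import urllib.parse
import urllib.request
from typing import List

# Hypothetical connection settings; replace them with your own.
ES_HOST = "http://localhost:9200"  # default HTTP port; the setup in this article uses localhost:9300
ES_USER = "elastic"
ES_PASSWORD = "changeme"

# Assumed column selection for the h parameter.
COLUMNS = ["index", "shard", "prirep", "state", "unassigned.reason"]


def cat_shards_url(host: str, columns: List[str]) -> str:
    """Build a _cat/shards URL with headings (v) and the chosen columns (h)."""
    query = urllib.parse.urlencode({"v": "true", "h": ",".join(columns)}, safe=",")
    return f"{host}/_cat/shards?{query}"


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header that curl's -u user:password sends."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_cat_shards(host: str = ES_HOST, user: str = ES_USER,
                     password: str = ES_PASSWORD) -> str:
    """Call the cat shards API and return its plain-text table."""
    request = urllib.request.Request(
        cat_shards_url(host, COLUMNS),
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `print(fetch_cat_shards())` should print the same kind of table as the curl request, provided the cluster is reachable with these settings.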

The curl request then returns all Elasticsearch shards along with their state:

[Screenshot: shard states with the unassigned shards listed]

As a result, we not only identified unavailable shards but also got the simplified reason for the error.
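To go from that table to a list of unassigned shards in a script, we can split each row on whitespace. A minimal sketch, assuming the table was requested with v and with the column list index,shard,prirep,state,unassigned.reason. Whitespace splitting only works when blank cells appear at the end of a row (here, the missing reason of assigned shards), so for other column selections the JSON output is a safer input. The function names are hypothetical.

```python
from typing import Dict, List


def parse_cat_table(text: str) -> List[Dict[str, str]]:
    """Turn a _cat table printed with headings (v) into a list of row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    headers = lines[0].split()
    rows = []
    for line in lines[1:]:
        # A blank trailing cell (e.g. no reason for an assigned shard) just yields
        # fewer values; maxsplit keeps the last column in one piece.
        values = line.split(None, len(headers) - 1)
        rows.append(dict(zip(headers, values)))
    return rows


def unassigned_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the rows whose state column is UNASSIGNED."""
    return [row for row in rows if row.get("state") == "UNASSIGNED"]
```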

On the other hand, if you only want to check the status, you can rely on the default set of columns, which doesn’t include the reason: send the same request without the h parameter (adjust the credentials, host and port).

If you’re working with a large number of shards, you can limit the response with the <target> path parameter. To do so, pass a comma-separated list of data streams, indices, or index aliases.
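These variations, as well as the JSON output recommended for applications, map onto a few query and path parameters. The sketch below relies on the cat APIs’ format=json parameter, builds the URL with an optional <target> list and an optional h column list (leaving h out returns the default columns), and filters the decoded entries by state. The host and the helper names are illustrative assumptions.

```python
import json
import urllib.parse
from typing import Iterable, List, Optional


def cat_shards_json_url(host: str, targets: Optional[Iterable[str]] = None,
                        columns: Optional[Iterable[str]] = None) -> str:
    """Build a _cat/shards URL that returns JSON, optionally limited to targets."""
    path = "/_cat/shards"
    if targets:
        # <target>: comma-separated data streams, indices, or index aliases
        path += "/" + ",".join(urllib.parse.quote(name, safe="") for name in targets)
    params = {"format": "json"}
    if columns:
        # Leaving h out returns the default set of columns.
        params["h"] = ",".join(columns)
    return f"{host}{path}?{urllib.parse.urlencode(params, safe=',')}"


def unassigned_from_json(body: str) -> List[dict]:
    """Keep only the entries whose state is UNASSIGNED."""
    return [shard for shard in json.loads(body) if shard.get("state") == "UNASSIGNED"]
```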

Diagnose the shard allocation issue

Elasticsearch provides the Cluster allocation explain API, which we can use to learn more about a particular shard. Thanks to listing the shard statuses above, I know that there are unassigned shards in three indices:

  • spring-boot-app-logs-2020.10.28,
  • metricbeat-7.7.0-2020.10.28-000001,
  • .elastichq.
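When the list is long, counting unassigned copies per index shows at a glance which indices need attention. A small sketch, assuming each shard is a dict with index and state keys, as produced by parsing the cat shards output (plain text or JSON); the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def unassigned_per_index(shards: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count UNASSIGNED shard copies per index name."""
    counts = Counter(
        shard["index"] for shard in shards if shard.get("state") == "UNASSIGNED"
    )
    return dict(counts)
```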

As an example, I’m going to call the explain API to gather data on the metricbeat-7.7… shard. In the request body, I have to provide the index name, the shard number and whether it’s a primary shard or a replica. Fortunately, all of this data was returned by the cat shards API that I called before. I just have to look at the values in the index, shard and prirep columns:

[Screenshot: cat shards response with the unassigned shards]

The values in the prirep (primaryOrReplica) column can be either p (primary) or r (replica).
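Mapping a cat shards row onto the explain request body is mechanical: the index name is copied, the shard number becomes an integer, and prirep becomes the boolean primary field. A sketch, assuming the row is a dict keyed by the column names index, shard and prirep; explain_body is a hypothetical helper.

```python
from typing import Dict, Mapping, Union


def explain_body(row: Mapping[str, str]) -> Dict[str, Union[str, int, bool]]:
    """Build the allocation explain request body from one _cat/shards row."""
    prirep = row["prirep"]
    if prirep not in ("p", "r"):
        raise ValueError(f"unexpected prirep value: {prirep!r}")
    return {
        "index": row["index"],
        "shard": int(row["shard"]),  # cat output is text, the API expects a number
        "primary": prirep == "p",    # p = primary, r = replica
    }
```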

Finally, I can send the request to the _cluster/allocation/explain endpoint, passing these values in the JSON body.
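The explain API accepts the body through either GET or POST on _cluster/allocation/explain. A minimal sketch with Python’s standard library, assuming basic authentication and a JSON body with the index, shard and primary fields; the host, the credentials and the function names are placeholders.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def explain_request(host: str, body: Dict[str, Any], user: str,
                    password: str) -> urllib.request.Request:
    """Prepare a POST request to the cluster allocation explain API."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{host}/_cluster/allocation/explain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def explain(host: str, body: Dict[str, Any], user: str, password: str) -> Dict[str, Any]:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(explain_request(host, body, user, password)) as response:
        return json.load(response)
```

A call could look like `explain("http://localhost:9200", {"index": "<index name>", "shard": 0, "primary": False}, "elastic", "changeme")`, with the placeholders replaced by the values from the cat shards output.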

From the response, I can extract the following data helpful in solving the issue:

  • "current_state" : "unassigned" – the current state of the shard;
  • "reason" : "CLUSTER_RECOVERED" – why the shard originally became unassigned;
  • "can_allocate" : "no" – whether the shard can be allocated;
  • "node_decision" : "no" – whether the shard can be allocated to the particular node;
  • "decider" : "same_shard" – which decider gave the no decision for the node;
  • "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists…" – the reason for the decision.
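When a script handles the response, the same fields can be pulled out programmatically. In the explain output, reason sits inside the unassigned_info object, while node_decision, decider and explanation live in the node_allocation_decisions list and its deciders entries. A sketch, assuming the response has already been decoded from JSON into a dict; summarize_explain is a hypothetical name.

```python
from typing import Any, Dict


def summarize_explain(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the fields listed above out of an allocation explain response."""
    summary: Dict[str, Any] = {
        "current_state": response.get("current_state"),
        "reason": response.get("unassigned_info", {}).get("reason"),
        "can_allocate": response.get("can_allocate"),
        "nodes": [],
    }
    for node in response.get("node_allocation_decisions", []):
        summary["nodes"].append({
            "node": node.get("node_name"),
            "node_decision": node.get("node_decision"),
            # Keep only the deciders that said NO: they explain the rejection.
            "deciders": [
                (decider.get("decider"), decider.get("explanation"))
                for decider in node.get("deciders", [])
                if decider.get("decision") == "NO"
            ],
        })
    return summary
```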

With this set of information, fixing the unassigned shard problem should be a lot easier.

More on debugging unassigned shards in Elasticsearch

Photo by Ekaterina Belinskaya from Pexels
