Elasticsearch Modules

Elasticsearch is composed of many modules, which are responsible for its functions. These modules have two types of settings, as shown below:

  • Static settings − These settings must be configured in the config file (elasticsearch.yml) before Elasticsearch is started. To change them, you need to update every affected node in the cluster.

  • Dynamic settings − These settings can be changed on a live, running Elasticsearch cluster.
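
To make the difference concrete, here is a minimal sketch of how a dynamic setting might be changed at runtime through the cluster settings API (`PUT /_cluster/settings`). The helper name `build_settings_update` is hypothetical, and the code only builds the request; it does not contact a cluster. The setting used in the example is described later in this chapter.

```python
import json


def build_settings_update(settings, persistent=True):
    """Describe a request that updates dynamic cluster settings.

    Hypothetical helper: nothing is sent over the network, the caller
    decides how to issue the request (curl, an HTTP library, a client).
    """
    # "persistent" settings survive a full cluster restart, "transient" ones do not.
    scope = "persistent" if persistent else "transient"
    return {
        "method": "PUT",
        "path": "/_cluster/settings",
        "body": json.dumps({scope: settings}, sort_keys=True),
    }


request = build_settings_update({"cluster.routing.allocation.enable": "none"})
print(request["method"], request["path"])
print(request["body"])
```

A static setting has no equivalent request: it has to be written to elasticsearch.yml on each node, followed by a node restart.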

We will discuss the different modules of Elasticsearch in the following sections of this chapter.

Cluster-level routing and shard allocation

Cluster-level settings determine how shards are allocated to the different nodes and how shards are redistributed to rebalance the cluster. The following settings control shard allocation.

Cluster-level shard allocation

Setting | Possible values | Description
cluster.routing.allocation.enable | all | This default value allows shard allocation for all kinds of shards.
cluster.routing.allocation.enable | primaries | This allows shard allocation only for primary shards.
cluster.routing.allocation.enable | new_primaries | This allows shard allocation only for the primary shards of new indices.
cluster.routing.allocation.enable | none | This does not allow any shard allocation.
cluster.routing.allocation.node_concurrent_recoveries | Numeric value (default is 2) | This limits the number of concurrent shard recoveries on a node.
cluster.routing.allocation.node_initial_primaries_recoveries | Numeric value (default is 4) | This limits the number of parallel initial primary recoveries.
cluster.routing.allocation.same_shard.host | Boolean value (default is false) | When enabled, this prevents more than one copy of the same shard from being allocated to the same physical host.
indices.recovery.concurrent_streams | Numeric value (default is 3) | This controls the number of network streams each node opens when recovering shards from peers.
indices.recovery.concurrent_small_file_streams | Numeric value (default is 2) | This controls the number of streams each node opens for small files (under 5 MB) during shard recovery.
cluster.routing.rebalance.enable | all | This default value allows rebalancing for all kinds of shards.
cluster.routing.rebalance.enable | primaries | This allows rebalancing only for primary shards.
cluster.routing.rebalance.enable | replicas | This allows rebalancing only for replica shards.
cluster.routing.rebalance.enable | none | This does not allow any kind of shard rebalancing.
cluster.routing.allocation.allow_rebalance | always | This default value always allows rebalancing.
cluster.routing.allocation.allow_rebalance | indices_primaries_active | This allows rebalancing when all primary shards in the cluster are allocated.
cluster.routing.allocation.allow_rebalance | indices_all_active | This allows rebalancing when all primary and replica shards are allocated.
cluster.routing.allocation.cluster_concurrent_rebalance | Numeric value (default is 2) | This limits the number of concurrent shard rebalances in the cluster.
cluster.routing.allocation.balance.shard | Floating-point value (default is 0.45f) | This defines the weight factor for the number of shards allocated to each node.
cluster.routing.allocation.balance.index | Floating-point value (default is 0.55f) | This defines the weight factor for the number of shards per index allocated to a specific node.
cluster.routing.allocation.balance.threshold | Non-negative floating-point value (default is 1.0f) | This is the minimum optimization value an operation must achieve before it is performed.
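
Several of the settings above accept only a fixed set of values. As a small illustration, the following sketch checks a value against those documented options before it is sent to the cluster. The names `ALLOWED_VALUES` and `check_setting` are hypothetical and not part of Elasticsearch.

```python
# Documented options for the enumerated allocation settings listed above.
ALLOWED_VALUES = {
    "cluster.routing.allocation.enable": {"all", "primaries", "new_primaries", "none"},
    "cluster.routing.rebalance.enable": {"all", "primaries", "replicas", "none"},
    "cluster.routing.allocation.allow_rebalance": {
        "always",
        "indices_primaries_active",
        "indices_all_active",
    },
}


def check_setting(name, value):
    """Return True if value is one of the documented options for name."""
    if name not in ALLOWED_VALUES:
        raise KeyError(f"no documented value list for {name}")
    return value in ALLOWED_VALUES[name]


print(check_setting("cluster.routing.allocation.enable", "primaries"))
print(check_setting("cluster.routing.rebalance.enable", "new_primaries"))
```

The second call is rejected because new_primaries is an option of the allocation setting, not of the rebalance setting.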

Disk-based shard allocation

Setting | Possible values | Description
cluster.routing.allocation.disk.threshold_enabled | Boolean value (default is true) | This enables or disables the disk allocation decider.
cluster.routing.allocation.disk.watermark.low | String value (default is 85%) | This is the low watermark for disk usage; once it is exceeded, no further shards are allocated to that disk.
cluster.routing.allocation.disk.watermark.high | String value (default is 90%) | This is the high watermark for disk usage; if it is reached, Elasticsearch allocates the shard to another disk.
cluster.info.update.interval | String value (default is 30s) | This is the interval between two disk usage checks.
cluster.routing.allocation.disk.include_relocations | Boolean value (default is true) | This determines whether shards that are currently being relocated are taken into account when calculating disk usage.
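
The two watermarks can be read as a simple three-way decision. The sketch below is a deliberately simplified model of that decision, not the allocator's actual logic; the function names and the returned labels are assumptions made for illustration.

```python
def parse_percent(text):
    """Convert a string such as '85%' to the float 85.0."""
    return float(text.strip().rstrip("%"))


def disk_decision(used_percent, low="85%", high="90%"):
    """Classify a disk's usage against the low and high watermarks (simplified)."""
    if used_percent > parse_percent(high):
        # Above the high watermark: shards should go to another disk.
        return "move shards elsewhere"
    if used_percent > parse_percent(low):
        # Above the low watermark: keep existing shards, accept no new ones.
        return "no new shards"
    return "can allocate"


for usage in (70.0, 87.5, 93.0):
    print(usage, disk_decision(usage))
```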

Discovery

This module helps the cluster discover its nodes and maintain their state. The cluster state changes whenever a node is added to or removed from the cluster. The cluster name setting is used to logically separate one cluster from another. Some discovery modules let you use the APIs provided by cloud vendors, as listed below:

  • Azure discovery

  • EC2 discovery

  • Google Compute Engine discovery

  • Zen discovery

Gateway

This module preserves the cluster state and the shard data across full cluster restarts. The following are the static settings of this module:

Setting | Possible values | Description
gateway.expected_nodes | Numeric value (default is 0) | The number of nodes expected to be in the cluster before local shards are recovered.
gateway.expected_master_nodes | Numeric value (default is 0) | The number of master nodes expected to be in the cluster before recovery starts.
gateway.expected_data_nodes | Numeric value (default is 0) | The number of data nodes expected to be in the cluster before recovery starts.
gateway.recover_after_time | String value (default is 5m) | The time to wait before the recovery process starts, regardless of the number of nodes that have joined the cluster.

gateway.recover_after_nodes
gateway.recover_after_master_nodes
gateway.recover_after_data_nodes
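
Read together, the expected-node counts and the wait time describe a simple gate on recovery. The following sketch models that gate under simplifying assumptions: it looks at a single node count only and ignores the recover_after_* settings. The function name and its parameters are hypothetical.

```python
def should_start_recovery(joined_nodes, expected_nodes, waited_seconds,
                          recover_after_seconds=300):
    """Simplified model of the gateway recovery gate.

    Recovery starts as soon as the expected number of nodes has joined,
    or once the wait time (5m = 300 seconds by default) has elapsed.
    """
    if joined_nodes >= expected_nodes:
        return True
    return waited_seconds >= recover_after_seconds


print(should_start_recovery(3, 3, 0))
print(should_start_recovery(2, 3, 60))
print(should_start_recovery(2, 3, 300))
```

With the default expected count of 0, the first condition is always met, so this model starts recovery without waiting.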

HTTP

This module manages communication between the HTTP client and the Elasticsearch API. This module can be disabled by changing the value of http.enabled to false.

The following settings control this module (they are configured in elasticsearch.yml):

No. | Setting | Description
1 | http.port | The port used to access Elasticsearch; it is taken from the range 9200-9300.
2 | http.publish_port | The port that HTTP clients use to reach the node; this is useful when the node is behind a firewall.
3 | http.bind_host | The host address that the HTTP service binds to.
4 | http.publish_host | The host address that HTTP clients connect to.
5 | http.max_content_length | The maximum size of the content of an HTTP request. The default value is 100mb.
6 | http.max_initial_line_length | The maximum size of the URL. The default value is 4kb.
7 | http.max_header_size | The maximum size of the HTTP headers. The default value is 8kb.
8 | http.compression | Enables or disables support for compression. The default value is false.
9 | http.pipelining | Enables or disables HTTP pipelining.
10 | http.pipelining.max_events | Limits the number of events that can be queued before the HTTP request is closed.
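
Because these are static settings, they end up as plain key and value lines in elasticsearch.yml. As a rough sketch, the snippet below renders a few of them in that flat form; `render_yaml_settings` is a hypothetical helper and the chosen values simply repeat the defaults listed above.

```python
def render_yaml_settings(settings):
    """Render flat 'key: value' lines in the style of elasticsearch.yml."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            # YAML booleans are lowercase, unlike Python's True/False.
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


http_settings = {
    "http.port": 9200,
    "http.max_content_length": "100mb",
    "http.compression": False,
}
print(render_yaml_settings(http_settings))
```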

Index

This module maintains the global settings for every index. The following settings are mainly related to memory usage:

Circuit breaker

This is used to prevent operations from causing an OutOfMemoryError. It mainly limits how much of the JVM heap operations may use. For example, the indices.breaker.total.limit setting defaults to 70% of the JVM heap.
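
To get a feel for what a 70% limit means in bytes, here is a small worked calculation. It assumes 1024-based size units, and the helper names `parse_size` and `breaker_limit_bytes` are hypothetical.

```python
def parse_size(text):
    """Convert a size such as '512mb' to bytes, using 1024-based units."""
    units = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
    text = text.strip().lower()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    raise ValueError(f"unsupported size: {text}")


def breaker_limit_bytes(heap_size, limit="70%"):
    """Return the number of heap bytes covered by a percentage limit."""
    percent = float(limit.rstrip("%"))
    return int(parse_size(heap_size) * percent / 100)


# A 10 GB heap with the default 70% limit leaves 7 GB for the breaker.
print(breaker_limit_bytes("10gb"))
```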

Field data cache

This is mainly used when aggregating on a field. It is recommended to have enough memory available for it. The indices.fielddata.cache.size setting controls the amount of memory used for the field data cache.

Node query cache

This memory is used to cache query results. The cache uses a least recently used (LRU) eviction policy. The indices.queries.cache.size setting controls the memory size of this cache.

Index buffer

This buffer holds newly indexed documents and flushes them when it is full. The indices.memory.index_buffer_size setting controls the amount of heap allocated to this buffer.

Shard request cache

This cache stores the local search results of each shard. It can be enabled when the index is created and disabled for an individual request with a URL parameter.

Disable cache - ?request_cache=false
Enable cache - "index.requests.cache.enable": true
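
The two switches live in different places: one is a URL parameter on the search request, the other an index setting. A minimal sketch of both follows; the index name `my_index` and the helper names are made up for the example, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def search_path(index, use_request_cache):
    """Build a search URL path that turns the request cache on or off."""
    flag = "true" if use_request_cache else "false"
    return f"/{index}/_search?{urlencode({'request_cache': flag})}"


def create_index_body(enable_request_cache):
    """Build an index creation body that sets the cache default for the index."""
    return json.dumps(
        {"settings": {"index.requests.cache.enable": enable_request_cache}}
    )


print(search_path("my_index", False))  # per-request: bypass the cache
print(create_index_body(True))         # per-index: enable the cache
```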

Index recovery

This module controls the resources used during the recovery process. The following settings are provided:

Setting | Default value
indices.recovery.concurrent_streams | 3
indices.recovery.concurrent_small_file_streams | 2
indices.recovery.file_chunk_size | 512kb
indices.recovery.translog_ops | 1000
indices.recovery.translog_size | 512kb
indices.recovery.compress | true
indices.recovery.max_bytes_per_sec | 40mb
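
The throughput cap gives a quick lower bound on how long a shard takes to recover. The arithmetic below is only a back-of-the-envelope sketch: it assumes 40mb means 40 × 1024² bytes per second and ignores compression, translog replay and network conditions. The function name is hypothetical.

```python
def min_recovery_seconds(shard_bytes, max_bytes_per_sec=40 * 1024 ** 2):
    """Lower bound on recovery time when throughput is capped."""
    return shard_bytes / max_bytes_per_sec


# A 10 GB shard at the default cap of 40 MB per second.
print(min_recovery_seconds(10 * 1024 ** 3))  # 256.0
```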

TTL Interval

A document's time to live (TTL) defines how long the document is kept before it is deleted. The following dynamic settings control this process:

Setting | Default value
indices.ttl.interval | 60s
indices.ttl.bulk_size | 1000

Node

Each node can choose whether it is a data node. This attribute can be changed by modifying the node.data setting. Setting this value to false defines the node as not a data node.