ElasticSearch Notes
Intro
- See: https://github.com/codingexplained/complete-guide-to-elasticsearch
- See: https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html
- Elasticsearch is the distributed search and analytics engine ... Elasticsearch provides near real-time search and analytics for all types of data.
- Often used for Application Performance Management (APM)
- Analyse application logs and system metrics, most often to detect errors or understand resource usage etc.
- Logstash
- Input > Filters > Output
- Each stage can have plugins
- E.g. reading in log files
- File is the input and Logstash treats each line as an event
- Filter stage parses the input data to make sense of it: i.e., structure unstructured data
- E.g. use of "grok pattern"
- Takes a raw line and turns it into fields, to send e.g. to Elasticsearch
- Allows separation of concerns - the apps sending the data don't need to know how it should be processed - Logstash handles that logic (see the pipeline sketch below).
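- A minimal pipeline sketch, assuming a hypothetical log file path, grok pattern and index name (adapt to your own setup):
input {
  file {
    path => "/var/log/my-app/access.log"
    start_position => "beginning"
  }
}
filter {
  # Parse each raw line ("message") into structured fields
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my-app-logs"
  }
}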
- X-Pack
- Adds user authentication and access control to the Elastic Stack
- Monitoring
- Reporting
- Elasticsearch SQL
- Normally you use the Query DSL - Elasticsearch SQL makes things easier for SQL-familiar developers
- Translates SQL to Query DSL.
- A translate API also exists to show the Query DSL a SQL query maps to
- Helper tool to get started - the Query DSL is probably best to use once familiar (see the example below)
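- A rough sketch of both endpoints (the index name "logs" and the fields are made up for illustration); the second call returns the equivalent Query DSL without executing the query:
curl -XPOST "http://localhost:9200/_sql?format=txt" -H 'Content-Type: application/json' -d'{ "query": "SELECT host, message FROM logs LIMIT 5" }'
curl -XPOST "http://localhost:9200/_sql/translate" -H 'Content-Type: application/json' -d'{ "query": "SELECT host, message FROM logs LIMIT 5" }'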
- Beats
- Collection of lightweight data shippers
- Single purpose - send data to Logstash or ES.
- e.g. Filebeat - sends log files to ES (see the config sketch below)
- e.g. Metricbeat - sends resource usage info to ES.
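- A minimal filebeat.yml sketch, assuming logs live under /var/log/my-app and a local ES node (both are placeholders):
filebeat.inputs:
  - type: log
    paths:
      - /var/log/my-app/*.log
output.elasticsearch:
  hosts: ["localhost:9200"]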
- Single node Docker image for learning:
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.16.3
docker run -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.16.3
- Took quite some time to start up for me on an i7 with 16GB RAM, although the system was doing other things.
- To check it is running okay:
curl http://127.0.0.1:9200
- Run Kibana and ES:
- See https://www.elastic.co/guide/en/kibana/current/docker.html
docker network create elastic
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.16.3
docker run --name es01-test --net elastic -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.16.3
docker pull docker.elastic.co/kibana/kibana:7.16.3
docker run --name kib01-test --net elastic -p 127.0.0.1:5601:5601 -e "ELASTICSEARCH_HOSTS=http://es01-test:9200" docker.elastic.co/kibana/kibana:7.16.3
- To access Kibana, go to http://localhost:5601
- Some ES directory structure:
- Notable bin utils:
|-- bin
|   |-- ...
|   |-- elasticsearch-cli
|   |-- elasticsearch-plugin      # Install plugins
|   |-- elasticsearch-sql-cli     # Do SQL-like queries instead of using Query DSL
- Config:
|-- config
|   |-- elasticsearch-plugins.example.yml
|   |-- elasticsearch.keystore
|   |-- elasticsearch.yml         #<<< This is the main config file
|   |-- jvm.options               #<<< Runs on JVM - heap size is the main thing to modify
|   |-- jvm.options.d
|   |-- log4j2.file.properties
|   |-- log4j2.properties
|   |-- role_mapping.yml
|   |-- roles.yml
|   |-- users
|   `-- users_roles
- In elasticsearch.yml:
- Settings are commented out by default so defaults are used
- cluster.name: Best practice to set this!
- node.name: Best practice to set this! (see the sketch below)
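- A sketch of what that might look like (the names and heap size are arbitrary examples, not recommendations):
cluster.name: my-cluster
node.name: node-1
- And heap size is set in jvm.options with the -Xms/-Xmx flags, e.g.:
-Xms4g
-Xmx4g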
- Basic Architecture
- Node: Essentially an ES instance
- Can run many nodes. Each node can store part of the data set: distributed storage = large storage.
- Node == ES instance so can run many nodes on one machine.
- Each node belongs to a cluster
- A node is always part of a cluster, even if it is the only node in that cluster.
- Cluster - collection of nodes.
- Splitting into multiple clusters is normally for logical separation.
- Document - A unit of data stored in a cluster.
- JSON object.
- Index: Every document is stored within an index.
- Groups documents together logically.
- Provides scalability and availability settings.
- Search queries are run against indices (see the indexing example below).
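- A quick sketch of indexing and fetching a document (the "products" index and its fields are made up for illustration):
curl -XPUT "http://localhost:9200/products/_doc/1" -H 'Content-Type: application/json' -d'{ "name": "Toaster", "price": 30 }'
curl -XGET "http://localhost:9200/products/_doc/1"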
- Basic cURL queries (Kibana makes this waaay easier!)
- Local ES
curl -XGET "http://localhost:9200/_cluster/health"
curl -XGET "http://localhost:9208/_cat/indicies?v"
curl -XGET "http://localhost:9208/.kibana/_search" -H 'Content-Type: applicaton/json' -d'{ "query": { "match_all": {} }}'
- Cloud ES - needs authentication
curl -XGET -u username:password "https://3aodfff....sa.eu-central.aws.cloud.es.io:9243/.kibana/_search" -H 'Content-Type: application/json' -d'{ "query": { "match_all": {} }}'
- Sharding & Scalability
- To store 1TB you can use e.g. 2 nodes of 0.5TB each - node storage can be aggregated
- Done using sharding.
- Sharding is a way to divide indices into smaller pieces, where each piece is a shard
- Done at the index level!
- Each shard lives on exactly one node and can be placed on any node; a node can hold many shards (many to one).
- Horizontally scale the data volume.
- Each shard is an Apache Lucene index
- Can improve performance by running queries on multiple shards at the same time (see the shard-count example below).
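- A sketch of setting the shard count when creating an index (the index name and counts are arbitrary; the shard count cannot be changed later without reindexing or splitting):
curl -XPUT "http://localhost:9200/my-index" -H 'Content-Type: application/json' -d'{ "settings": { "number_of_shards": 2, "number_of_replicas": 1 } }'
curl -XGET "http://localhost:9200/_cat/shards/my-index?v"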