Elasticsearch
Elasticsearch is a distributed full-text search engine based on Lucene. It stores JSON documents and is accessed through an HTTP (REST) interface.
Links
- https://www.elastic.co/
- https://en.wikipedia.org/wiki/Elasticsearch
- eBook Elasticsearch: The Definitive Guide
Getting a test installation via Docker
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:5.6.3
Elastic query
Introducing the query language
Your data in Elastic is stored in a so-called "index", which is not the same thing as an index in the database world but more like a table.
In this example the index is called "MY_INDEX" (real index names must be lowercase, so this is only a placeholder).
With this you can search for all documents in the index:
GET /MY_INDEX/_search
It starts with "GET" because you only give the URL and no extra payload.
In the results you will find an "_id" field. You can use this ID to access a document directly (in 5.x the URL also contains the document type, "_type" in the results):
GET /MY_INDEX/YOUR_TYPE/YOUR_ID
This also works with DELETE, to get rid of a document, and with PUT, to overwrite a document. For PUT you need to provide a body with the new content:
{
"name": "john"
}
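The same PUT can also be sent from a script. A minimal Python sketch using only the standard library, assuming a local instance on port 9200 as started above; put_document, BASE_URL and the values my_index, my_type and 1 are hypothetical placeholders:

```python
import json
import urllib.request

# Assumption: the local test instance started with the Docker command above.
BASE_URL = "http://127.0.0.1:9200"


def put_document(index, doc_type, doc_id, body):
    """Build (but do not send) a PUT request that overwrites one document.

    The /index/type/id layout is the one used by the 5.x image above.
    """
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # the new document content
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = put_document("my_index", "my_type", "1", {"name": "john"})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send the request.
```

Using method="GET" or method="DELETE" without a body gives the other two calls.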
If we want more complex filter criteria, we send a body to POST /MY_INDEX/_search (Elasticsearch also accepts a body on GET for _search, but many HTTP clients cannot send one, so POST is the safer choice). Normally Elastic returns the most relevant documents first and therefore calculates a score for each document to sort them. If you do not need this, use constant_score so every document gets the same score.
{
"from": 0,
"size":3,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"firstName": "John"
}
}
]
}
}
}
}
}
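So this is the equivalent of SELECT * FROM MY_INDEX WHERE firstName = 'John' LIMIT 3 in SQL, without any ordering by relevance. Note that term looks for the exact indexed value, so firstName has to be a keyword field; on an analyzed text field the indexed token is "john" and the term "John" finds nothing.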
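The "from" and "size" fields page through the hits, much like OFFSET and LIMIT. A hedged Python sketch that produces one body per page; page_bodies is a hypothetical helper, and 10000 is the default of the index setting index.max_result_window, which caps from + size:

```python
import copy

# Default value of index.max_result_window: Elasticsearch rejects
# searches where from + size exceeds it.
MAX_RESULT_WINDOW = 10000


def page_bodies(body, page_size, total_hits):
    """Yield one copy of a search body per page, with "from"/"size" set.

    total_hits would normally come from hits.total of a first search; it is
    a parameter here so the sketch runs without a cluster.
    """
    for start in range(0, total_hits, page_size):
        if start + page_size > MAX_RESULT_WINDOW:
            raise ValueError("from + size exceeds max_result_window; use the scroll API")
        page = copy.deepcopy(body)  # leave the caller's body untouched
        page["from"] = start
        page["size"] = page_size
        yield page


query = {"query": {"constant_score": {"filter": {"term": {"firstName": "John"}}}}}
for page in page_bodies(query, page_size=3, total_hits=7):
    print(page["from"], page["size"])
```

For result sets beyond that window, the scroll API is the usual alternative to ever larger "from" values.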
With more than one criterion that needs to be fulfilled (here exists additionally checks that the field userId exists and is not null):
{
"from": 0,
"size":3,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"firstName": "John"
}
},
{
"term": {
"secondName": "Doe"
}
},
{
"exists": { "field" : "userId" }
}
]
}
}
}
}
}
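When the criteria come from code, the must list can be generated. A minimal Python sketch; constant_score_filter is a hypothetical helper (not part of any client library) that rebuilds the body above:

```python
def constant_score_filter(terms, exists=(), from_=0, size=3):
    """Build a constant_score query whose bool filter requires every
    exact term (field -> value) and every field named in exists."""
    must = [{"term": {field: value}} for field, value in terms.items()]
    must += [{"exists": {"field": field}} for field in exists]
    return {
        "from": from_,  # trailing underscore because "from" is a Python keyword
        "size": size,
        "query": {"constant_score": {"filter": {"bool": {"must": must}}}},
    }


body = constant_score_filter({"firstName": "John", "secondName": "Doe"}, exists=["userId"])
print(body)
```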
You can also have
- conditions that must not match (must_not)
- a check whether a timestamp is in a certain range (range)
- a test whether an array contains a value (here via a script)
{
"from": 0,
"size":3,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"firstName": "John"
}
},
{
"script": {
"script": "doc['myArray'].values.contains('myValue')"
}
},
{
"range": {
"mytimestamp": {
"gte": "2017-10-22T00:00:00Z",
"lte": "2017-10-23T00:00:00Z"
}
}
}
],
"must_not": [
{
"term": {
"secondName": "Doe"
}
}
]
}
}
}
}
}
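Range bounds are usually computed rather than typed. A Python sketch, assuming UTC timestamps in the format used above; one_day_range and bool_filter are hypothetical helpers. It uses "lt" for the upper bound so that midnight of the following day is excluded, whereas "lte" as in the example above includes it:

```python
from datetime import date, datetime, time, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def one_day_range(field, day):
    """Range clause covering one whole UTC day."""
    start = datetime.combine(day, time(0, 0), tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {"range": {field: {"gte": start.strftime(ISO_FORMAT),
                              "lt": end.strftime(ISO_FORMAT)}}}


def bool_filter(must, must_not=()):
    """constant_score wrapper around a bool filter with must / must_not lists."""
    return {"constant_score": {"filter": {"bool": {
        "must": list(must),
        "must_not": list(must_not),
    }}}}


query = bool_filter(
    [{"term": {"firstName": "John"}}, one_day_range("mytimestamp", date(2017, 10, 22))],
    must_not=[{"term": {"secondName": "Doe"}}],
)
print(query)
```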
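The script keyword lets you use the result of a small program as a value in your query (Painless is the default scripting language since 5.0; other script languages can be selected). But with great power ... For the array check above, a plain term query on myArray would do the same without a script, because a term query matches a document if any element of an array field equals the value. Scripts can also produce the values for an aggregation: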
{
"from": 0,
"size": 1,
"query": {
...
},
"aggs": {
"foobar": {
"terms": {
"script": "def res=[]; def m=params['_source'].productDna; if(m!=null){for(x in m.keySet()){def f=m.get(x); if(f>0){res.add(x);}}} return res",
"size": 10
}
}
}
}
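To check what the aggregation script returns, its logic can be mirrored outside Elasticsearch. A Python sketch of the same loop (positive_keys is a hypothetical name); each key the script returns becomes a term, and the terms aggregation counts the documents per key:

```python
def positive_keys(source):
    """Python mirror of the Painless script above: the keys of the
    productDna map whose value is greater than 0."""
    m = source.get("productDna")  # params['_source'].productDna in the script
    res = []
    if m is not None:
        for key in m:
            if m[key] > 0:
                res.append(key)
    return res


print(positive_keys({"productDna": {"red": 3, "green": 0, "blue": 1}}))
```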
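Or regular expressions. Here the regexp sits in should: in a filter with only should clauses at least one of them has to match, so a single should clause behaves like must. regexp is compared against the indexed terms, so on an analyzed text field the pattern is applied to each lowercased token, not to the whole value.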
{
"from": 0,
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"regexp": {
"firstName": "s22.*"
}
}
]
}
}
}
}
}
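A rough Python emulation of what the regexp clause does on an analyzed text field. Assumptions: analysis is approximated by lowercasing and splitting on non-alphanumeric characters (the real standard analyzer is more elaborate), and Python's re syntax stands in for Lucene's regular expressions, which coincide for simple patterns like this one. Lucene regexps are anchored, hence fullmatch; regexp_matches is a hypothetical helper:

```python
import re


def regexp_matches(text, pattern):
    """Rough emulation of a regexp query against an analyzed text field.

    The pattern is tested against each token separately and must match
    the whole token, because Lucene regexps are anchored.
    """
    tokens = [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]
    return any(re.fullmatch(pattern, token) for token in tokens)


for name in ["S22abc", "John s22x", "xs22", "s2"]:
    print(name, regexp_matches(name, "s22.*"))
```

The uppercase input still matches because the token is lowercased, while an uppercase pattern such as "S22.*" would never match.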
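You can also aggregate results like with GROUP BY in the SQL world.
This returns the 10 most frequent values of the field "secondName" within the search results, each with its document count. None of the actual search results are shown to you: the top-level "size": 0 suppresses the hits, while "size": 10 inside terms sets the number of buckets.
Terms aggregations return only 10 buckets by default! They also need a keyword field (with the default dynamic mapping that is secondName.keyword).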
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"firstName": "John"
}
}
]
}
}
}
},
"aggs": {
"allSecondNames": {
"terms": {
"field": "secondName",
"size": 10
}
}
}
}
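In SQL terms this is roughly SELECT secondName, COUNT(*) ... GROUP BY secondName ORDER BY COUNT(*) DESC LIMIT 10. A local Python sketch of that bucketing, with top_terms as a hypothetical helper; note that on an index with several shards the real Elasticsearch counts can be approximate:

```python
from collections import Counter


def top_terms(docs, field, size=10):
    """Count documents per value of field and keep the size most frequent,
    shaped like the buckets of a terms aggregation."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


docs = [{"secondName": "Doe"}, {"secondName": "Roe"}, {"secondName": "Doe"}, {"firstName": "John"}]
print(top_terms(docs, "secondName"))
```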
This gets the average value of the field timeSpent and, again, does not return any documents:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"firstName": "John"
}
}
]
}
}
}
},
"aggs": {
"myResult": {
"avg": {
"field": "timeSpent"
}
}
}
}
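The avg result comes back under aggregations.myResult.value. A Python sketch that reads it and computes the same average locally; both helpers and the sample response are hypothetical, and the local version assumes, like the avg aggregation, that documents without the field are skipped:

```python
def agg_value(response, name="myResult"):
    """Read a single-value metric aggregation (such as avg) from a response."""
    return response["aggregations"][name]["value"]


def local_average(values):
    """Average that skips missing values (None), like avg skips documents
    without the field; None if there is nothing to average."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


response = {"hits": {"total": 3, "hits": []}, "aggregations": {"myResult": {"value": 200.0}}}
print(agg_value(response), local_average([100, None, 300]))
```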
Instead of just getting the average, you can also bucket all values of a field with a histogram; in this example the field is timeSpent and each bucket is 500 wide (interval):
{
"size" : 0,
"query" : {
"bool" : {
"must" : [
{
"term" : {
"firstName": "John"
}
},
{
"term" : {
"secondName": "Doe"
}
}
]
}
},
"aggs": {
"myResult": {
"histogram": {
"field": "timeSpent",
"interval" : 500
}
}
}
}
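The bucket a value lands in follows the formula from the histogram aggregation documentation, bucket_key = floor((value - offset) / interval) * interval + offset. A Python sketch (hypothetical helper names) that reproduces the keys locally; unlike Elasticsearch, which by default also returns empty buckets between the lowest and highest key, it only lists buckets that contain values:

```python
import math
from collections import Counter


def histogram_key(value, interval, offset=0):
    """Key of the bucket that value falls into."""
    return math.floor((value - offset) / interval) * interval + offset


def histogram(values, interval):
    """(key, doc_count) pairs for the non-empty buckets, sorted by key."""
    counts = Counter(histogram_key(v, interval) for v in values)
    return sorted(counts.items())


print(histogram([10, 499, 500, 1234], 500))
```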
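You can also talk directly to the REST endpoint of your Elastic instance via curl, which is very useful for post-processing results. This example prints the _id of every hit whose field "type" matches the regular expression from.*: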
curl -s -X POST 'http://127.0.0.1:9200/YOUR_INDEX/_search' -H 'Content-Type: application/json' -d '{
"from": 0,
"size":9999,
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"regexp": {
"type": "from.*"
}
}
}
}
}' | grep _id | sed 's/_id/\n_id/g' | grep ^_id | sed s/\",\".*// | sed s/^.*\"//
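The grep/sed chain depends on how the JSON happens to be laid out. A JSON-aware alternative in Python, assuming the response body is available as a string (for example the curl output); hit_ids is a hypothetical helper and the sample response is made up:

```python
import json


def hit_ids(response_text):
    """Return the _id of every hit in a _search response body."""
    response = json.loads(response_text)
    return [hit["_id"] for hit in response["hits"]["hits"]]


sample = ('{"hits": {"total": 2, "hits": ['
          '{"_index": "your_index", "_type": "t", "_id": "a1"}, '
          '{"_index": "your_index", "_type": "t", "_id": "b2"}]}}')
print("\n".join(hit_ids(sample)))
```

Wrapped in a small script that reads sys.stdin, this can replace everything after the first | in the curl command.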
Show Delete Entry
You need the type and the id of an entry to show or delete it; you can get both from a query result. So assume this is a hit from a query result:
{
...,
"_type": "YOUR_TYPE",
"_id": "YOUR_ID",
...
}
Try this (as a GET request):
/YOUR_INDEX/YOUR_TYPE/YOUR_ID
Or via curl
curl -X GET 'http://127.0.0.1:9200/YOUR_INDEX/YOUR_TYPE/YOUR_ID'
For deletion just replace GET by DELETE:
curl -X DELETE 'http://127.0.0.1:9200/YOUR_INDEX/YOUR_TYPE/YOUR_ID'
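Since every hit carries _index, _type and _id, the URL can be derived from the hit itself. A Python sketch assuming the local instance and the 5.x /index/type/id layout; document_url and BASE_URL are hypothetical names:

```python
from urllib.parse import quote

# Assumption: the local test instance from the Docker section.
BASE_URL = "http://127.0.0.1:9200"


def document_url(hit, base_url=BASE_URL):
    """URL of the document a search hit refers to.

    Send a GET to it to show the entry, or a DELETE to remove it.
    """
    parts = (hit["_index"], hit["_type"], hit["_id"])
    # Percent-encode each part so ids containing "/" or spaces stay intact.
    return base_url + "/" + "/".join(quote(part, safe="") for part in parts)


hit = {"_index": "your_index", "_type": "YOUR_TYPE", "_id": "YOUR_ID"}
print(document_url(hit))
```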
Maintenance
Get status
GET _cluster/health?pretty
GET _cluster/allocation/explain
GET _cluster/settings
GET /_cluster/settings?pretty&include_defaults
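The "status" field of GET _cluster/health is the first thing to check. A hedged Python sketch of how to interpret it; cluster_ok is a hypothetical helper and the health dict stands for the parsed response:

```python
def cluster_ok(health, allow_yellow=True):
    """Interpret the status of GET _cluster/health.

    green:  all primary and replica shards are assigned
    yellow: all primaries are assigned, some replicas are not
    red:    at least one primary shard is unassigned
    """
    status = health["status"]
    return status == "green" or (allow_yellow and status == "yellow")


# A single-node test cluster with replicated indices is usually yellow:
# a replica is never placed on the same node as its primary.
print(cluster_ok({"status": "yellow", "unassigned_shards": 5}))
```

If the status is not green, GET _cluster/allocation/explain tells you why a shard is unassigned.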
Exclude node from shard allocation
Exclude one node (by IP) so its shards are moved to the other nodes, and later allow all nodes again:
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.exclude._ip": "10.20.30.40"
}
}
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.exclude._ip": ""
}
}
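The setting accepts a comma-separated list, so several nodes can be excluded at once. A Python sketch that builds both bodies; exclude_ips is a hypothetical helper:

```python
def exclude_ips(ips):
    """Body for PUT _cluster/settings that moves shards away from the nodes
    with the given IPs; an empty list yields the body that allows all nodes."""
    return {"transient": {"cluster.routing.allocation.exclude._ip": ",".join(ips)}}


print(exclude_ips(["10.20.30.40"]))
print(exclude_ips([]))
```

Transient settings are lost after a full cluster restart; use "persistent" instead if the exclusion has to survive one.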
Templates
Elastic applies matching index templates when it creates an index. This lists all templates and prints one:
GET /_template
GET /_template/my_template_name