My Notes

 _____ _           _   _      ____                      _     
| ____| | __ _ ___| |_(_) ___/ ___|  ___  __ _ _ __ ___| |__  
|  _| | |/ _` / __| __| |/ __\___ \ / _ \/ _` | '__/ __| '_ \ 
| |___| | (_| \__ \ |_| | (__ ___) |  __/ (_| | | | (__| | | |
|_____|_|\__,_|___/\__|_|\___|____/ \___|\__,_|_|  \___|_| |_|
                                                              
 ____  _             _       
/ ___|| |_ _   _  __| |_   _ 
\___ \| __| | | |/ _` | | | |
 ___) | |_| |_| | (_| | |_| |
|____/ \__|\__,_|\__,_|\__, |
                       |___/ 

Index some test data

# Creation (C)
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d '{
'name': "John Doe"
}' | jq .
# jq . simply colorizes the output

Note 1: If you run the exact same command again, it shows version 2, meaning that the file is modified.

Explanation: This request automatically creates customer index and adds a document that has an ID of 1 and stores and indexes “name” field. The first response shows that version 1 was created. (See note)

Output

{
  "_index": "customer",
  "_type": "_doc",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}

To obtain it again,

# Retrieval (R)
curl "localhost:9200/customer/_doc/1" | jq .

Gives the result as:

{
  "_index": "customer",
  "_type": "_doc",
  "_id": "1",
  "_version": 2,
  "_seq_no": 1,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "John Doe"
  }
}

There is Bulk update API as well, for that as an example download accounts.json file, which has about 1000 account details and they will be added as follows.

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
# To view progress
curl "localhsot:9200/_cat/indices?v"

Output:

# Upon finishing of command


{
  "took": 2881,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "bank",
        "_type": "_doc",
        "_id": "1",
        "_version": 2,
        "result": "updated",
        "forced_refresh": true,
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 1000,
        "_primary_term": 1,
        "status": 200

...

And upon inspecting the beautified full index

vagrant@vagrant:~$ curl "localhost:9200/_cat/indices?v" 
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank      mXhxv0r3RyiLgiC5CYkY7Q   1   1       1000         1000    811.9kb        811.9kb
yellow open catalog   u7jc2R1mRdS7e8ISowUtdQ   1   1          2            0     12.5kb         12.5kb
yellow open vehicles  0lwv1otoQMiPSL8ueao8hw   1   1          1            0      4.4kb          4.4kb
yellow open vehibikes 0ikRWFEaT0STQuqcyy6M2A   1   1          1            0      4.4kb          4.4kb
yellow open customer  pBDz9u_KTa-w49GH5oq4dw   1   1          1            0      3.5kb          3.5kb
vagrant@vagrant:~$ 

Searching

Searches can be performed by sending requests to _search endpoint, for full suite access one can use the ElasticSearch Query DSL to specify search criteria.

To retrieve all the documents in the bank index sorted by account number,

curl -X GET "localhost:9200/bank/_search?pretty" _H "Content-Type: application/json" -d '{
  "query": { "match_all": {}  },
  "sort" : [
    { "account_number": "asc" }
  ]
}'

Which results:

{
  "took" : 41,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [
          0
        ]
      },
      {
        "_index" : "bank",

        ...

Response also provides following crucial stuffs,

key	Meaning
`took`	Time es took to run the query
`timed_out`	Whether request was timed out or not
`shards`	Number of shards that were searched
`max_score`	Score of most relevant document
`hits.total.value`	how many total matching docs were found
`hits.sort`	Sort position of documents
`hits._score`	Docs relevance score (not applicable on `match_all`)

Each search request is self contained, stateless basically so as to page through the search hits, one must specify the from and size params in your request.

For example, to get the hits 10 through 19,

curl -X GET "http://localhost:9200/bank/_search" -d '{
  "query": { "match_all": {} },
  "sort": [
      { "account_number": "asc" }
  ],
  "from" : 10,
  "size" : 10,
}'

Which basically results in

{
  "took" : 40,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : null,
        "_source" : {
          "account_number" : 10,
          "balance" : 46170,
          "firstname" : "Dominique",
          "lastname" : "Park",
          "age" : 37,
          "gender" : "F",
          "address" : "100 Gatling Place",
          "employer" : "Conjurica",
          "email" : "dominiquepark@conjurica.com",
          "city" : "Omar",
          "state" : "NJ"
        },
        "sort" : [
          10
        ]
      },
      {
        "_index" : "bank",

...

So as to perform a phrase search, instead of matching individual terms, one can use the match_phrase instead of match.

For example, the following request will only match addresses that contain phrase mill lane:

curl -X GET "http://localhost:9200/bank/_search" -d '{
"query": { "match_phrase": {
"address": "mill lane" 
} }
}'

Which results in following output:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      }
    ]
  }
}

To construct more complex queries, one can use a bool query to combine multiple query criteria, One can designate criteria as required (must match), desirable (should match), or undesireable (must not match)

For example, following query request searches the bank index for accounts that belong to customers who are40 years old, but excludes anyone who lives in Idaho (ID):

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      }
    ]
  }
}

Following request uses a range filter (gte and lte) to result the accounts with a balance between $20k and $30k

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
'

Output:

{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 217,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "49",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 49,
          "balance" : 29104,
          "firstname" : "Fulton",
          "lastname" : "Holt",
          "age" : 23,
          "gender" : "F",
          "address" : "451 Humboldt Street",
          "employer" : "Anocha",
          "email" : "fultonholt@anocha.com",
          "city" : "Sunriver",
          "state" : "RI"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "102",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 102,
          "balance" : 29712,
          "firstname" : "Dena",
          "lastname" : "Olson",
          "age" : 27,
          "gender" : "F",
          "address" : "759 Newkirk Avenue",
          "employer" : "Hinway",
          "email" : "denaolson@hinway.com",
          "city" : "Choctaw",
          "state" : "NJ"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "133",
        "_score" : 1.0,
        "_source" : {

Aggregations

Elasticsearch aggregations allow us to obtain metainfo about our search result and help us in answering questions like “How many account holders are in texas?” or “whats the average balance of accounts in Tennessee”

Example: Following request uses a “terms” aggregation to group all of the accounts in bank index by state and returns the 10 states with most accounts in descending order:

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'

Results into:

{
  "took" : 698,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {
          "key" : "TX",
          "doc_count" : 30
        },
        {
          "key" : "MD",
          "doc_count" : 28
        },
        {
          "key" : "ID",
          "doc_count" : 27
        },
        {
          "key" : "AL",
          "doc_count" : 25
        },
        {
          "key" : "ME",
          "doc_count" : 25
        },
        {
          "key" : "TN",
          "doc_count" : 25
        },
        {
          "key" : "WY",
          "doc_count" : 25
        },
        {
          "key" : "DC",
          "doc_count" : 24
        },
        {
          "key" : "MA",
          "doc_count" : 24
        },
        {
          "key" : "ND",
...

We can also combine the aggregations to build more complex summaries of our data, for example the following request nests an average aggregation within the previous group_by_state and calculates the average account balances for each of the states

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

Search format

Example is from udemy course.

GET /courses/_search
{
"query": {
"bool": {
"must":[
{"match": {"name": "Accounting"}},
{"mathc": {"room": "E3"}}
],
"must_not":[{"match": {"room": "e3"}}]
}
}
}