Elasticsearch Introduction

2020-10-23

Elasticsearch provides a powerful tool for analyzing unstructured data.

Getting started

Basic concepts

  • NRT (near real time)
  • Cluster
  • Node
  • Index
  • Document
  • Shards and Replicas

Data

Query DSL

  • _source: selects which fields are returned, similar to “SELECT COL1, COL2 FROM TABLE”
  • match_all: matches every document, like a SELECT with no WHERE clause
  • sort: like ORDER BY ... ASC/DESC
  • bool query: like AND/OR/NOT in SQL
  • size: like TOP/FETCH FIRST
  • _score: how relevant a document is to the query; the higher the score, the better the match
  • range: criteria1 <= FIELD <= criteria2
  • aggregation: like GROUP BY etc. Can be nested.
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

Roughly equivalent SQL:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
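
To show how the other Query DSL clauses fit together in one request, here is a minimal Python sketch that only builds the JSON body (it does not contact a cluster). The helper name `build_search_body` and the field names `account_number` and `balance` are assumptions made for illustration; only the `bank` index and the `state` field appear in the notes above.

```python
import json


def build_search_body(min_balance, max_balance, state, size=10):
    """Build a search body that combines _source, bool, range, sort and size."""
    return {
        # Like SELECT col1, col2: return only these fields (assumed field names).
        "_source": ["account_number", "balance"],
        "query": {
            "bool": {
                # must behaves like AND and contributes to _score.
                "must": [{"match": {"state": state}}],
                # filter also behaves like AND but does not affect _score.
                "filter": [
                    {"range": {"balance": {"gte": min_balance, "lte": max_balance}}}
                ],
            }
        },
        # Like ORDER BY balance DESC.
        "sort": [{"balance": {"order": "desc"}}],
        # Like TOP / FETCH FIRST n ROWS.
        "size": size,
    }


body = build_search_body(20000, 30000, "CA", size=5)
print(json.dumps(body, indent=2))
```

The printed JSON could be pasted as the body of a GET /bank/_search request.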

Setup Elasticsearch

FYI. MY COMPUTER SUCKS AND I COULD NOT DEMO WHILE STREAMING

  • JVM > 1.8.0_131
  • Installing with Homebrew works fine

API Conventions

Date math sample

Searches the Logstash indices for the past three days

# GET /<logstash-{now/d-2d}>,<logstash-{now/d-1d}>,<logstash-{now/d}>/_search
GET /%3Clogstash-%7Bnow%2Fd-2d%7D%3E%2C%3Clogstash-%7Bnow%2Fd-1d%7D%3E%2C%3Clogstash-%7Bnow%2Fd%7D%3E/_search
{
  "query" : {
    "match": {
      "test": "data"
    }
  }
}
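
The percent-encoded path in the request above does not have to be written by hand. As a small sketch, the hypothetical helper `date_math_path` (not part of any Elasticsearch client) builds the date math index names and encodes them with Python's standard `urllib.parse.quote`:

```python
from urllib.parse import quote


def date_math_path(prefix, days_back):
    """Return the URL path that searches daily indices from days_back days ago to today."""
    names = []
    for offset in range(days_back, -1, -1):
        # Today has no offset; earlier days get a "-Nd" suffix.
        suffix = f"-{offset}d" if offset else ""
        names.append(f"<{prefix}-{{now/d{suffix}}}>")
    # safe="" makes quote() encode "/" and "," as well as "<", ">", "{" and "}".
    return "/" + quote(",".join(names), safe="") + "/_search"


print(date_math_path("logstash", 2))
```

Called with "logstash" and 2, this reproduces the encoded path used in the sample request.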

Access control

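To prevent a request body from overriding the index given in the URL (for multi-search, multi-get and bulk requests), set this in the elasticsearch.yml file: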

rest.action.multi.allow_explicit_index: false

Documents

Write model

  • Replication group - primary shard vs replica shards
  • in-sync copies: available copies

Read model

  • coordinating node

  • Versioning
  • Operation type: op_type
  • Routing
  • timeout
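
As a rough illustration of the routing item above: a document is assigned to a primary shard by hashing its routing value (the document _id by default) and taking the result modulo the number of primary shards. The sketch below is a simplification under stated assumptions. Elasticsearch uses Murmur3 internally, while this example swaps in CRC32 only to stay within the standard library, and `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Map a routing value to a shard number in [0, num_primary_shards)."""
    # Stand-in hash (assumption): CRC32 instead of the Murmur3 hash Elasticsearch uses.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % num_primary_shards


# The same routing value always lands on the same shard.
for doc_id in ["1", "2", "3"]:
    print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the result depends on the number of primary shards, that number cannot be changed freely after an index is created.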

Misc

  • NDJSON: Newline Delimited JSON
  • Apache Lucene