Elasticsearch Introduction

2020-10-23

Elasticsearch提供了一种在非结构化数据中进行分析的有力工具。

Getting started

Basic concepts

NRT Cluster Node Index Document Shards and Replicas

Data

Query DSL

_source: similar to “SELECT FROM COL1, COL2”
match_all: where clause
sort: DESC/AESC
bool query: AND/OR/NOT in SQL
size: TOP/FETCH FIRST
score: evaluate that how relavant the field is, the higher the better matched
range: criteria1 <= FIELD <= criteria2
aggregation: just like GROUP BY etc. Can be nested.

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC

Setup Elasticsearch

FYI. MY COMPUTER SUCKS AND I COULD NOT DEMO WHILE STREAMING

JVM > 1.8.0_131
Brew will be fine

API Conventionis

Date math sample

Searches the Logstash indices for the past three days

# GET /<logstash-{now/d-2d}>,<logstash-{now/d-1d}>,<logstash-{now/d}>/_search
GET /%3Clogstash-%7Bnow%2Fd-2d%7D%3E%2C%3Clogstash-%7Bnow%2Fd-1d%7D%3E%2C%3Clogstash-%7Bnow%2Fd%7D%3E/_search
{
  "query" : {
    "match": {
      "test": "data"
    }
  }
}

Access control

Specified in elasticsearch.yml file

rest.action.multi.allow_explicit_index: false

Documents

Write model

Replication group - primary shard vs replica shards
in-sync copies: available copies

Read model

coordinating node
Versioning
Operation type: op_type
Routing
timeout

Misc

NDJSON: Newline Delimited JSON
Apache Lucene