Elasticsearch提供了一种在非结构化数据中进行分析的有力工具。
Getting started
Basic concepts
NRT Cluster Node Index Document Shards and Replicas
Data
Query DSL
- _source: similar to “SELECT FROM COL1, COL2”
- match_all: where clause
- sort: DESC/AESC
- bool query: AND/OR/NOT in SQL
- size: TOP/FETCH FIRST
- score: evaluate that how relavant the field is, the higher the better matched
- range: criteria1 <= FIELD <= criteria2
- aggregation: just like GROUP BY etc. Can be nested.
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
Setup Elasticsearch
FYI. MY COMPUTER SUCKS AND I COULD NOT DEMO WHILE STREAMING
- JVM > 1.8.0_131
- Brew will be fine
API Conventionis
Date math sample
Searches the Logstash indices for the past three days
# GET /<logstash-{now/d-2d}>,<logstash-{now/d-1d}>,<logstash-{now/d}>/_search
GET /%3Clogstash-%7Bnow%2Fd-2d%7D%3E%2C%3Clogstash-%7Bnow%2Fd-1d%7D%3E%2C%3Clogstash-%7Bnow%2Fd%7D%3E/_search
{
"query" : {
"match": {
"test": "data"
}
}
}
Access control
Specified in elasticsearch.yml file
rest.action.multi.allow_explicit_index: false
Documents
Write model
- Replication group - primary shard vs replica shards
- in-sync copies: available copies
Read model
-
coordinating node
- Versioning
- Operation type: op_type
- Routing
- timeout
Misc
- NDJSON: Newline Delimited JSON
- Apache Lucene