Elasticsearch SQL-like subquery aggregation

I'm playing around with ES to understand whether it can cover most of my scenarios. I'm at the point where I'm stuck on how to reach certain results that are pretty simple in SQL.

This is the example.

In Elastic I have an index with these documents:

{ "Id": 1, "Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160101, "BestBeforeDate": 20160102, "BiteBy": "John"}
{ "Id": 2, "Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160102, "BestBeforeDate": 20160104, "BiteBy": "Mat"}
{ "Id": 3, "Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160103, "BestBeforeDate": 20160105, "BiteBy": "Mark"}
{ "Id": 4, "Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160104, "BestBeforeDate": 20160201, "BiteBy": "Simon"}
{ "Id": 5, "Fruit": "Orange", "BoughtInStore": "Jungle", "BoughtDate": 20160112, "BestBeforeDate": 20160112, "BiteBy": "John"}
{ "Id": 6, "Fruit": "Orange", "BoughtInStore": "Jungle", "BoughtDate": 20160114, "BestBeforeDate": 20160116, "BiteBy": "Mark"}
{ "Id": 7, "Fruit": "Orange", "BoughtInStore": "Jungle", "BoughtDate": 20160120, "BestBeforeDate": 20160121, "BiteBy": "Simon"}
{ "Id": 8, "Fruit": "Kiwi", "BoughtInStore": "Shop", "BoughtDate": 20160121, "BestBeforeDate": 20160121, "BiteBy": "Mark"}
{ "Id": 9, "Fruit": "Kiwi", "BoughtInStore": "Jungle", "BoughtDate": 20160121, "BestBeforeDate": 20160121, "BiteBy": "Simon"}

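If I want to know how many different fruits, bought in different stores, people have bitten within a specific date range, in SQL I would write something like this: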

SELECT 
    COUNT(DISTINCT kpi.Fruit) as Fruits, 
    kpi.BoughtInStore, 
    kpi.BiteBy 
FROM 
    (
     SELECT f1.Fruit, f1.BoughtInStore, f1.BiteBy 
     FROM FruitsTable f1 
     WHERE f1.BoughtDate = (
      SELECT MAX(f2.BoughtDate) 
      FROM FruitsTable f2 
      WHERE f1.Fruit = f2.Fruit 
      and f2.BoughtDate between 20160101 and 20160131 
      and (f2.BestBeforeDate between 20160101 and 20160131) 
     ) 
    ) kpi 
GROUP BY kpi.BoughtInStore, kpi.BiteBy

The result would be something like this:

{ "Fruits": 1, "BoughtInStore": "Jungle", "BiteBy": "Mark"}
{ "Fruits": 1, "BoughtInStore": "Shop", "BiteBy": "Mark"}
{ "Fruits": 2, "BoughtInStore": "Jungle", "BiteBy": "Simon"}
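To make the intent of the SQL concrete, here is a minimal Python sketch that replays the same logic over the sample rows in memory. It is only an illustration of the semantics, not Elasticsearch code; the function names are made up for this example, and the Id field is left out because the query never uses it.

```python
from collections import defaultdict

# Sample rows from the question; dates kept as integers, as posted.
ROWS = [
    {"Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160101, "BestBeforeDate": 20160102, "BiteBy": "John"},
    {"Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160102, "BestBeforeDate": 20160104, "BiteBy": "Mat"},
    {"Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160103, "BestBeforeDate": 20160105, "BiteBy": "Mark"},
    {"Fruit": "Banana", "BoughtInStore": "Jungle", "BoughtDate": 20160104, "BestBeforeDate": 20160201, "BiteBy": "Simon"},
    {"Fruit": "Orange", "BoughtInStore": "Jungle", "BoughtDate": 20160112, "BestBeforeDate": 20160112, "BiteBy": "John"},
    {"Fruit": "Orange", "BoughtInStore": "Jungle", "BoughtDate": 20160114, "BestBeforeDate": 20160116, "BiteBy": "Mark"},
    {"Fruit": "Orange", "BoughtInStore": "Jungle", "BoughtDate": 20160120, "BestBeforeDate": 20160121, "BiteBy": "Simon"},
    {"Fruit": "Kiwi", "BoughtInStore": "Shop", "BoughtDate": 20160121, "BestBeforeDate": 20160121, "BiteBy": "Mark"},
    {"Fruit": "Kiwi", "BoughtInStore": "Jungle", "BoughtDate": 20160121, "BestBeforeDate": 20160121, "BiteBy": "Simon"},
]


def latest_rows_per_fruit(rows, start, end):
    """Inner query: rows whose BoughtDate equals the per-fruit MAX(BoughtDate),
    where the max only considers rows with both dates inside [start, end]."""
    max_date = {}
    for row in rows:
        if start <= row["BoughtDate"] <= end and start <= row["BestBeforeDate"] <= end:
            fruit = row["Fruit"]
            max_date[fruit] = max(max_date.get(fruit, 0), row["BoughtDate"])
    # A fruit with no qualifying row has no max date, so none of its rows match.
    return [row for row in rows if max_date.get(row["Fruit"]) == row["BoughtDate"]]


def distinct_fruits_by_store_and_person(rows, start, end):
    """Outer query: COUNT(DISTINCT Fruit) grouped by (BoughtInStore, BiteBy)."""
    groups = defaultdict(set)
    for row in latest_rows_per_fruit(rows, start, end):
        groups[(row["BoughtInStore"], row["BiteBy"])].add(row["Fruit"])
    return {key: len(fruits) for key, fruits in groups.items()}


RESULT = distinct_fruits_by_store_and_person(ROWS, 20160101, 20160131)
```

With the January 2016 range this yields the three groups listed above: Jungle/Mark with 1, Shop/Mark with 1, and Jungle/Simon with 2.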

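Do you have any idea how I can reach the same result in Elastic with aggregations?

In a few words, the problems I'm facing in Elastic are:

How to prepare a subset of data before aggregation (like in this example the latest row in the range per each fruit)
How to group results by multiple fields

Thank you.

Source

2016-06-24 Simone Belia

As far as I know, there is no way to reference an aggregation result in a filter of the same query. So: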

GET /purchases/fruits/_search 
{ 
    "query": { 
    "filtered":{ 
     "filter": { 
     "range": { 
      "BoughtDate": { 
      "gte": "2015-01-01", //assuming you have right mapping for dates 
      "lte": "2016-03-01" 
      } 
     } 
     } 
    } 
    }, 
    "sort": { "BoughtDate": { "order": "desc" }}, 
    "aggs": { 
    "byBoughtDate": { 
     "terms": { 
     "field": "BoughtDate", 
     "order" : { "_term" : "desc" } 
     }, 
     "aggs": { 
     "distinctCount": { 
      "cardinality": { 
      "field": "Fruit" 
      } 
     } 
     } 
    } 
    } 
}

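With a single query you can solve part of the puzzle: you get all the documents within the date range, plus aggregated bucket counts sorted by term, so the maximum date will be on top. The client can parse this first bucket (both count and value) and then fetch the documents for that date value. For the distinct fruit count, you just use a nested cardinality aggregation.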

Yep, the query returns more info than you need, but that's life :)
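The client-side step described above could look roughly like the following Python sketch. The helper names are hypothetical, the sample response is a hand-written fragment for illustration only (not real server output), and the follow-up body assumes a `term` filter on `BoughtDate` accepts the bucket key, which depends on your mapping and version.

```python
def top_bucket(response, agg_name="byBoughtDate"):
    """Return (date_key, doc_count, distinct_fruits) for the first bucket, or None."""
    buckets = response["aggregations"][agg_name]["buckets"]
    if not buckets:
        return None
    first = buckets[0]
    return first["key"], first["doc_count"], first["distinctCount"]["value"]


def documents_for_date(date_key):
    """Build a follow-up search body that fetches the documents for one date value."""
    return {"query": {"bool": {"filter": [{"term": {"BoughtDate": date_key}}]}}}


# Hand-written fragment shaped like a terms aggregation response (illustrative only).
SAMPLE_RESPONSE = {
    "aggregations": {
        "byBoughtDate": {
            "buckets": [
                {"key": 1453334400000, "doc_count": 2, "distinctCount": {"value": 1}},
                {"key": 1453248000000, "doc_count": 1, "distinctCount": {"value": 1}},
            ]
        }
    }
}

LATEST = top_bucket(SAMPLE_RESPONSE)
FOLLOW_UP = documents_for_date(LATEST[0])
```

Because the buckets are ordered by term descending, the first bucket carries the maximum date; the second request then costs one extra round trip.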

Source

2016-06-30 20:23:13 xeye

Obviously there's no direct route from SQL to the Elasticsearch DSL, but there are some pretty common correlations.

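For starters, any GROUP BY/HAVING is going to come down to an aggregation. The normal query semantics can generally be covered by the Query DSL.

"How to prepare a subset of data before aggregation (like in this example the latest row in the range per each fruit)"

So, you're kind of asking for two different things.

"How to prepare a subset of data before aggregation"

This is the query phase.

"(like in this example the latest row in the range per each fruit)"

You're technically asking it to aggregate to get the answer to this example, not to run a normal query. In your example, you're using MAX to get this, which is in effect using a GROUP BY.

"How to group results by multiple fields"

It depends. Do you want them tiered (generally, yes) or do you want them together?

If you want them tiered, you just use sub-aggregations to get what you want. If you want them combined, you generally just use a filters aggregation for the different groupings.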
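As a rough illustration of the two options, the Python sketch below builds both request fragments as plain dictionaries. The helper names and the `combined` aggregation name are made up for this example; the tiered form nests one terms aggregation per field, and the combined form uses a filters aggregation with one named bucket per field/value combination.

```python
def tiered_terms(fields):
    """Nest one terms aggregation per field, outermost field first."""
    if not fields:
        return {}
    head, rest = fields[0], fields[1:]
    agg = {"terms": {"field": head}}
    child = tiered_terms(rest)
    if child:
        agg["aggs"] = child
    return {"group_by_" + head: agg}


def combined_filters(groups):
    """Build a filters aggregation: each named bucket ANDs its field/value pairs."""
    named = {}
    for name, conditions in groups.items():
        clauses = [{"term": {field: value}} for field, value in sorted(conditions.items())]
        named[name] = {"bool": {"filter": clauses}}
    return {"combined": {"filters": {"filters": named}}}


TIERED = tiered_terms(["BoughtInStore", "BiteBy"])
COMBINED = combined_filters({"jungle_simon": {"BoughtInStore": "Jungle", "BiteBy": "Simon"}})
```

The tiered form discovers the groups for you, while the combined form needs every combination spelled out in advance, which is why tiering is usually the more convenient choice.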

Putting it all back together: you want the most recent purchase, per fruit, given a certain filtered date range. The date ranges are just normal queries/filters:

{ 
    "query": { 
    "bool": { 
     "filter": [ 
     { 
      "range": { 
      "BoughtDate": { 
       "gte": "2016-01-01", 
       "lte": "2016-01-31" 
      } 
      } 
     }, 
     { 
      "range": { 
      "BestBeforeDate": { 
       "gte": "2016-01-01", 
       "lte": "2016-01-31" 
      } 
      } 
     } 
     ] 
    } 
    } 
}

With that, no document will be included in the request unless both fields are within those date ranges (effectively an AND). Because I used a filter, it is unscored and cacheable.
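To see what the two range filters keep, here is a small Python sketch that applies the same AND condition to the sample documents in memory. It only mimics the filter semantics; the function names are hypothetical, and it relies on ISO dates (YYYY-MM-DD) comparing correctly as strings.

```python
DOCS = [
    {"Fruit": "Banana", "BoughtDate": "2016-01-01", "BestBeforeDate": "2016-01-02", "BiteBy": "John"},
    {"Fruit": "Banana", "BoughtDate": "2016-01-02", "BestBeforeDate": "2016-01-04", "BiteBy": "Mat"},
    {"Fruit": "Banana", "BoughtDate": "2016-01-03", "BestBeforeDate": "2016-01-05", "BiteBy": "Mark"},
    {"Fruit": "Banana", "BoughtDate": "2016-01-04", "BestBeforeDate": "2016-02-01", "BiteBy": "Simon"},
    {"Fruit": "Orange", "BoughtDate": "2016-01-12", "BestBeforeDate": "2016-01-12", "BiteBy": "John"},
    {"Fruit": "Orange", "BoughtDate": "2016-01-14", "BestBeforeDate": "2016-01-16", "BiteBy": "Mark"},
    {"Fruit": "Orange", "BoughtDate": "2016-01-20", "BestBeforeDate": "2016-01-21", "BiteBy": "Simon"},
    {"Fruit": "Kiwi", "BoughtDate": "2016-01-21", "BestBeforeDate": "2016-01-21", "BiteBy": "Mark"},
    {"Fruit": "Kiwi", "BoughtDate": "2016-01-21", "BestBeforeDate": "2016-01-21", "BiteBy": "Simon"},
]


def in_range(value, gte, lte):
    # ISO-8601 dates sort lexicographically, so plain string comparison is enough here.
    return gte <= value <= lte


def passes_filters(doc, gte="2016-01-01", lte="2016-01-31"):
    """Both range clauses must match, mirroring the bool filter array."""
    return in_range(doc["BoughtDate"], gte, lte) and in_range(doc["BestBeforeDate"], gte, lte)


KEPT = [doc for doc in DOCS if passes_filters(doc)]
DROPPED = [doc for doc in DOCS if not passes_filters(doc)]
```

Eight of the nine documents pass; the only one dropped is the banana bought on 2016-01-04, whose best-before date falls in February.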

Now you need to start aggregating to get the rest of the information. Let's start by assuming the documents have already been filtered using the above filter, in order to simplify what we're looking at. We'll combine it at the end.

{ 
    "size": 0, 
    "aggs": { 
    "group_by_date": { 
     "date_histogram": { 
     "field": "BoughtDate", 
     "interval": "day", 
     "min_doc_count": 1 
     }, 
     "aggs": { 
     "group_by_store": { 
      "terms": { 
      "field": "BoughtInStore" 
      }, 
      "aggs": { 
      "group_by_person": { 
       "terms": { 
       "field": "BiteBy" 
       } 
      } 
      } 
     } 
     } 
    } 
    } 
}

You want "size": 0 at the top level because you don't actually care about the hits; you only need the aggregated results.

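Your first aggregation was actually grouping by the most recent date. I changed it a little to make it more realistic (each day), but it's effectively the same. Given the way you use MAX, we could use a terms aggregation with "size": 1, but this is truer to how you'd want to do it when a date (and presumably a time!) is involved. I also asked it to ignore days that have no matching documents (since it goes from the start to the end, we don't actually care about those days).

If you really only wanted the last day, you could use a pipeline aggregation to drop everything except the max bucket, but a realistic usage of this type of request would want the full date range.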

So we then continue by grouping by store, which is what you want. Then we sub-group by person (BiteBy). This will give you the count implicitly.
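The day, store, person tiering can be pictured with this small Python sketch, which counts the already-filtered sample documents the same way the nested buckets would. It is an in-memory analogy with a made-up function name, not a substitute for the aggregation itself.

```python
# (BoughtDate, BoughtInStore, BiteBy) for the eight documents that pass the date filters.
FILTERED = [
    ("2016-01-01", "Jungle", "John"),
    ("2016-01-02", "Jungle", "Mat"),
    ("2016-01-03", "Jungle", "Mark"),
    ("2016-01-12", "Jungle", "John"),
    ("2016-01-14", "Jungle", "Mark"),
    ("2016-01-20", "Jungle", "Simon"),
    ("2016-01-21", "Shop", "Mark"),
    ("2016-01-21", "Jungle", "Simon"),
]


def tiered_counts(rows):
    """Count documents per day, then per store, then per person."""
    tree = {}
    for day, store, person in rows:
        people = tree.setdefault(day, {}).setdefault(store, {})
        people[person] = people.get(person, 0) + 1
    return tree


COUNTS = tiered_counts(FILTERED)
```

Days without documents never appear in the tree, which corresponds to what "min_doc_count": 1 asks of the date histogram.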

Putting it all back together:

{ 
    "size": 0, 
    "query": { 
    "bool": { 
     "filter": [ 
     { 
      "range": { 
      "BoughtDate": { 
       "gte": "2016-01-01", 
       "lte": "2016-01-31" 
      } 
      } 
     }, 
     { 
      "range": { 
      "BestBeforeDate": { 
       "gte": "2016-01-01", 
       "lte": "2016-01-31" 
      } 
      } 
     } 
     ] 
    } 
    }, 
    "aggs": { 
    "group_by_date": { 
     "date_histogram": { 
     "field": "BoughtDate", 
     "interval": "day", 
     "min_doc_count": 1 
     }, 
     "aggs": { 
     "group_by_store": { 
      "terms": { 
      "field": "BoughtInStore" 
      }, 
      "aggs": { 
      "group_by_person": { 
       "terms": { 
       "field": "BiteBy" 
       } 
      } 
      } 
     } 
     } 
    } 
    } 
}

Note: here's how I indexed the data.

PUT /grocery/store/_bulk 
{"index":{"_id":"1"}} 
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-01","BestBeforeDate":"2016-01-02","BiteBy":"John"} 
{"index":{"_id":"2"}} 
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-02","BestBeforeDate":"2016-01-04","BiteBy":"Mat"} 
{"index":{"_id":"3"}} 
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-03","BestBeforeDate":"2016-01-05","BiteBy":"Mark"} 
{"index":{"_id":"4"}} 
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-04","BestBeforeDate":"2016-02-01","BiteBy":"Simon"} 
{"index":{"_id":"5"}} 
{"Fruit":"Orange","BoughtInStore":"Jungle","BoughtDate":"2016-01-12","BestBeforeDate":"2016-01-12","BiteBy":"John"} 
{"index":{"_id":"6"}} 
{"Fruit":"Orange","BoughtInStore":"Jungle","BoughtDate":"2016-01-14","BestBeforeDate":"2016-01-16","BiteBy":"Mark"} 
{"index":{"_id":"7"}} 
{"Fruit":"Orange","BoughtInStore":"Jungle","BoughtDate":"2016-01-20","BestBeforeDate":"2016-01-21","BiteBy":"Simon"} 
{"index":{"_id":"8"}} 
{"Fruit":"Kiwi","BoughtInStore":"Shop","BoughtDate":"2016-01-21","BestBeforeDate":"2016-01-21","BiteBy":"Mark"} 
{"index":{"_id":"9"}} 
{"Fruit":"Kiwi","BoughtInStore":"Jungle","BoughtDate":"2016-01-21","BestBeforeDate":"2016-01-21","BiteBy":"Simon"}
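If you would rather generate that bulk body than type it by hand, a minimal Python sketch could look like this. The `bulk_body` helper is hypothetical; it only produces the newline-delimited text (an action line followed by a source line per document, ending with a newline) and does not send anything to a cluster.

```python
import json


def bulk_body(docs, start_id=1):
    """Return a newline-delimited bulk body with sequential string ids."""
    lines = []
    for offset, doc in enumerate(docs):
        lines.append(json.dumps({"index": {"_id": str(start_id + offset)}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


EXAMPLE_BODY = bulk_body([
    {"Fruit": "Kiwi", "BoughtInStore": "Shop", "BoughtDate": "2016-01-21",
     "BestBeforeDate": "2016-01-21", "BiteBy": "Mark"},
    {"Fruit": "Kiwi", "BoughtInStore": "Jungle", "BoughtDate": "2016-01-21",
     "BestBeforeDate": "2016-01-21", "BiteBy": "Simon"},
], start_id=8)
```

The resulting string can be posted as the request body of the bulk call shown above with whatever HTTP client you already use.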

It's critical that the string values you want to aggregate on (store and person) are not_analyzed strings (keyword in ES 5.0)! Otherwise it will use what's called fielddata, and that's not a good thing.

The mappings would look like this in ES 1.x/ES 2.x:

PUT /grocery 
{ 
    "settings": { 
    "number_of_shards": 1 
    }, 
    "mappings": { 
    "store": { 
     "properties": { 
     "Fruit": { 
      "type": "string", 
      "index": "not_analyzed" 
     }, 
     "BoughtInStore": { 
      "type": "string", 
      "index": "not_analyzed" 
     }, 
     "BiteBy": { 
      "type": "string", 
      "index": "not_analyzed" 
     }, 
     "BestBeforeDate": { 
      "type": "date" 
     }, 
     "BoughtDate": { 
      "type": "date" 
     } 
     } 
    } 
    } 
}
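For comparison, here is a hedged Python sketch that builds the same index definition for either string style. The `grocery_mapping` helper is made up for this example; it assumes ES 5.x still accepts the `store` mapping type in this shape and simply swaps the not_analyzed string for the keyword type mentioned above. Later major versions changed how mapping types work, so treat this as a 1.x/2.x/5.x illustration only.

```python
def grocery_mapping(es_major_version):
    """Build the index body, choosing the exact-match string type by major version."""
    if es_major_version >= 5:
        exact_string = {"type": "keyword"}
    else:
        exact_string = {"type": "string", "index": "not_analyzed"}
    # Copy the definition per field so the fields do not share one dict object.
    properties = {name: dict(exact_string) for name in ("Fruit", "BoughtInStore", "BiteBy")}
    properties["BestBeforeDate"] = {"type": "date"}
    properties["BoughtDate"] = {"type": "date"}
    return {
        "settings": {"number_of_shards": 1},
        "mappings": {"store": {"properties": properties}},
    }


MAPPING_2X = grocery_mapping(2)
MAPPING_5X = grocery_mapping(5)
```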

All of this together, and you get the answer as:

{ 
    "took": 8, 
    "timed_out": false, 
    "_shards": { 
    "total": 1, 
    "successful": 1, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 8, 
    "max_score": 0, 
    "hits": [] 
    }, 
    "aggregations": { 
    "group_by_date": { 
     "buckets": [ 
     { 
      "key_as_string": "2016-01-01T00:00:00.000Z", 
      "key": 1451606400000, 
      "doc_count": 1, 
      "group_by_store": { 
      "doc_count_error_upper_bound": 0, 
      "sum_other_doc_count": 0, 
      "buckets": [ 
       { 
       "key": "Jungle", 
       "doc_count": 1, 
       "group_by_person": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "John", 
         "doc_count": 1 
        } 
        ] 
       } 
       } 
      ] 
      } 
     }, 
     { 
      "key_as_string": "2016-01-02T00:00:00.000Z", 
      "key": 1451692800000, 
      "doc_count": 1, 
      "group_by_store": { 
      "doc_count_error_upper_bound": 0, 
      "sum_other_doc_count": 0, 
      "buckets": [ 
       { 
       "key": "Jungle", 
       "doc_count": 1, 
       "group_by_person": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "Mat", 
         "doc_count": 1 
        } 
        ] 
       } 
       } 
      ] 
      } 
     }, 
     { 
      "key_as_string": "2016-01-03T00:00:00.000Z", 
      "key": 1451779200000, 
      "doc_count": 1, 
      "group_by_store": { 
      "doc_count_error_upper_bound": 0, 
      "sum_other_doc_count": 0, 
      "buckets": [ 
       { 
       "key": "Jungle", 
       "doc_count": 1, 
       "group_by_person": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "Mark", 
         "doc_count": 1 
        } 
        ] 
       } 
       } 
      ] 
      } 
     }, 
     { 
      "key_as_string": "2016-01-12T00:00:00.000Z", 
      "key": 1452556800000, 
      "doc_count": 1, 
      "group_by_store": { 
      "doc_count_error_upper_bound": 0, 
      "sum_other_doc_count": 0, 
      "buckets": [ 
       { 
       "key": "Jungle", 
       "doc_count": 1, 
       "group_by_person": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "John", 
         "doc_count": 1 
        } 
        ] 
       } 
       } 
      ] 
      } 
     }, 
     { 
      "key_as_string": "2016-01-14T00:00:00.000Z", 
      "key": 1452729600000, 
      "doc_count": 1, 
      "group_by_store": { 
      "doc_count_error_upper_bound": 0, 
      "sum_other_doc_count": 0, 
      "buckets": [ 
       { 
       "key": "Jungle", 
       "doc_count": 1, 
       "group_by_person": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "Mark", 
         "doc_count": 1 
        } 
        ] 
       } 
       } 
      ] 
      } 
     }, 
     { 
      "key_as_string": "2016-01-20T00:00:00.000Z", 
      "key": 1453248000000, 
      "doc_count": 1, 
      "group_by_store": { 
      "doc_count_error_upper_bound": 0, 
      "sum_other_doc_count": 0, 
      "buckets": [ 
       { 
       "key": "Jungle", 
       "doc_count": 1, 
       "group_by_person": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "Simon", 
         "doc_count": 1 
        } 
        ] 
       } 
       } 
      ] 
      } 
     }, 
     { 
      "key_as_string": "2016-01-21T00:00:00.000Z", 
      "key": 1453334400000, 
      "doc_count": 2, 
      "group_by_store": { 
      "doc_count_error_upper_bound": 0, 
      "sum_other_doc_count": 0, 
      "buckets": [ 
       { 
       "key": "Jungle", 
       "doc_count": 1, 
       "group_by_person": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "Simon", 
         "doc_count": 1 
        } 
        ] 
       } 
       }, 
       { 
       "key": "Shop", 
       "doc_count": 1, 
       "group_by_person": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "Mark", 
         "doc_count": 1 
        } 
        ] 
       } 
       } 
      ] 
      } 
     } 
     ] 
    } 
    } 
}
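A response nested this deeply is easier to consume once flattened. The Python sketch below walks the three bucket levels and emits one row per day, store and person; the `flatten` helper is hypothetical, and the excerpt reuses only the last bucket of the response above, trimmed to the fields the function reads.

```python
def flatten(response):
    """Turn nested date/store/person buckets into (day, store, person, count) rows."""
    rows = []
    for day in response["aggregations"]["group_by_date"]["buckets"]:
        for store in day["group_by_store"]["buckets"]:
            for person in store["group_by_person"]["buckets"]:
                # Keep only the YYYY-MM-DD part of the formatted bucket key.
                rows.append((day["key_as_string"][:10], store["key"], person["key"], person["doc_count"]))
    return rows


# Last bucket of the response shown above, reduced to the fields used here.
EXCERPT = {
    "aggregations": {
        "group_by_date": {
            "buckets": [
                {
                    "key_as_string": "2016-01-21T00:00:00.000Z",
                    "group_by_store": {
                        "buckets": [
                            {"key": "Jungle", "group_by_person": {"buckets": [{"key": "Simon", "doc_count": 1}]}},
                            {"key": "Shop", "group_by_person": {"buckets": [{"key": "Mark", "doc_count": 1}]}},
                        ]
                    },
                }
            ]
        }
    }
}

ROWS = flatten(EXCERPT)
```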

Source

2016-06-30 23:34:08 pickypg

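As it currently stands, my little noted workaround with the bucket aggregation will not work with the 'date_histogram'. Ironically, it will work if you leave the numbers as numbers, like they were originally. – pickypg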
