Why does Postgres do a sequential scan when the index would return less than 1% of the data?

I have 19+ years of experience with Oracle and MySQL (as both DBA and developer), and I am new to Postgres, so I may be missing something obvious. Still, I cannot get this query to do what I want.

NOTE: This query is running on an EngineYard Postgres instance. I am not immediately aware of the parameters it has set up. Also, columns applicable_type and status in the items table are of extension type citext.

The following query can take more than 60 seconds to return rows:

SELECT items.item_id, 
     CASE when items.sku is null then items.title else concat(items.title, ' (SKU: ', items.sku, ')') END title, 
     items.listing_status, items.updated_at, items.id, 
     items.sku, count(details.id) detail_count 
FROM "items" LEFT OUTER JOIN details ON details.applicable_id = items.id 
            and details.applicable_type = 'Item' 
            and details.status = 'Valid' 
       LEFT OUTER JOIN products ON products.id = items.product_id 
WHERE "items"."user_id" = 3 
GROUP BY items.id 
ORDER BY title asc 
LIMIT 25 OFFSET 0

The details table contains 6.5 million rows. The LEFT OUTER JOIN does a sequential scan on applicable_id. Cardinality-wise, this column has 120,000 distinct values across 6.5M rows.
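One way to see how the planner itself views these cardinalities is to read the pg_stats view after an ANALYZE. This is only a sketch: the column list assumes the three details columns used in the join condition of the query above.

```sql
-- Refresh planner statistics for the table first.
ANALYZE details;

-- n_distinct > 0 is an estimated number of distinct values;
-- n_distinct < 0 is the negated fraction of rows that are distinct.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'details'
  AND attname IN ('applicable_id', 'applicable_type', 'status');
```

If the estimate for applicable_id is far from the 120,000 quoted above, the row estimates for the join are likely to be off as well.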

I have a BTREE index on details covering the following columns:

applicable_id
applicable_type
status

But really, applicable_id and applicable_type have low cardinality.

My explain analyze looks like this:

Limit (cost=247701.59..247701.65 rows=25 width=118) (actual time=28781.090..28781.098 rows=25 loops=1) 
    -> Sort (cost=247701.59..247703.05 rows=585 width=118) (actual time=28781.087..28781.090 rows=25 loops=1) 
     Sort Key: (CASE WHEN (items.sku IS NULL) THEN (items.title)::text ELSE pg_catalog.concat(items.title, ' (SKU: ', items.sku, ')') END) 
     Sort Method: top-N heapsort Memory: 30kB 
     -> HashAggregate (cost=247677.77..247685.08 rows=585 width=118) (actual time=28779.658..28779.974 rows=664 loops=1) 
      -> Hash Right Join (cost=2069.47..247645.64 rows=6425 width=118) (actual time=17798.898..28742.395 rows=60047 loops=1) 
       Hash Cond: (details.applicable_id = items.id) 
       -> Seq Scan on details (cost=0.00..220591.65 rows=6645404 width=8) (actual time=6.272..27702.717 rows=6646205 loops=1) 
         Filter: ((applicable_type = 'Listing'::citext) AND (status = 'Valid'::citext)) 
         Rows Removed by Filter: 942 
       -> Hash (cost=2062.16..2062.16 rows=585 width=118) (actual time=1.286..1.286 rows=664 loops=1) 
         Buckets: 1024 Batches: 1 Memory Usage: 90kB 
         -> Bitmap Heap Scan on items (cost=16.87..2062.16 rows=585 width=118) (actual time=0.157..0.748 rows=664 loops=1) 
          Recheck Cond: (user_id = 3) 
          -> Bitmap Index Scan on index_items_on_user_id (cost=0.00..16.73 rows=585 width=0) (actual time=0.141..0.141 rows=664 loops=1) 
            Index Cond: (user_id = 3)

Total runtime: 28781.238 ms

Asked 2013-10-17 by AKWF

Curious... what kind of plan were you expecting? A top-N sort over a large aggregate seems about right, doesn't it? (Also: shouldn't you be using count(distinct details.id), given the left join on products?)

You can temporarily disable hash joins with 'set enable_hashjoin = false;' to see a different query plan and compare it with the current one.
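A minimal sketch of that experiment, using a reduced form of the query from the question (the select list is shortened here for brevity). SET LOCAL keeps the change scoped to the transaction, so the setting reverts on ROLLBACK.

```sql
BEGIN;

-- Discourage hash joins for this transaction only.
SET LOCAL enable_hashjoin = off;

-- EXPLAIN ANALYZE actually runs the query, so expect it to take a while.
EXPLAIN ANALYZE
SELECT items.id, count(details.id) AS detail_count
FROM items
LEFT OUTER JOIN details
       ON details.applicable_id = items.id
      AND details.applicable_type = 'Item'
      AND details.status = 'Valid'
WHERE items.user_id = 3
GROUP BY items.id;

ROLLBACK;
```

Comparing the estimated cost and actual time of this plan with the hash join plan shows whether the planner's choice was reasonable or driven by a bad estimate.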

Also, to get better performance, you could apply the LIMIT to items in a subquery and then join that subquery with the other tables.
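That suggestion could look roughly like the following. It is a sketch under two assumptions: the join to products is dropped because no selected column uses it, and products.id is unique, so dropping it does not change the counts.

```sql
SELECT i.item_id, i.title, i.listing_status, i.updated_at, i.id, i.sku,
       count(d.id) AS detail_count
FROM (
    -- Pick the 25 items first, so only their details need to be looked up.
    SELECT items.id, items.item_id, items.listing_status,
           items.updated_at, items.sku,
           CASE WHEN items.sku IS NULL THEN items.title
                ELSE concat(items.title, ' (SKU: ', items.sku, ')')
           END AS title
    FROM items
    WHERE items.user_id = 3
    -- A bare name in ORDER BY resolves to the output column (the alias) first.
    ORDER BY title ASC
    LIMIT 25 OFFSET 0
) AS i
LEFT OUTER JOIN details d
       ON d.applicable_id = i.id
      AND d.applicable_type = 'Item'
      AND d.status = 'Valid'
GROUP BY i.id, i.item_id, i.title, i.listing_status, i.updated_at, i.sku
ORDER BY i.title ASC;
```

With only 25 rows on the outer side, a nested loop over an index on details.applicable_id becomes much more attractive to the planner than scanning all of details.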

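Do you have an index on the expression that produces title? Better yet, one on (user_id, title_expression).

If not, that might be something to add, so that Postgres can nested-loop through the first 25 rows of an index scan. As it stands, Postgres has no way to reasonably guess which 25 rows it will need, hence the seq scan it is currently doing on the joined table.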
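A sketch of such an expression index follows. The index name is hypothetical, and the casts assume that title and sku can be cast to text. Note that concat() is not marked IMMUTABLE in PostgreSQL and so cannot appear in an index expression; this sketch uses the || operator instead, which gives the same result here because sku is non-null in the ELSE branch.

```sql
-- Hypothetical index name; (user_id, title_expression) as suggested above.
CREATE INDEX index_items_on_user_id_and_display_title
    ON items (
        user_id,
        (CASE WHEN sku IS NULL THEN title::text
              ELSE title::text || ' (SKU: ' || sku::text || ')' END)
    );

-- The query has to sort by the identical expression for the index to apply.
SELECT id,
       CASE WHEN sku IS NULL THEN title::text
            ELSE title::text || ' (SKU: ' || sku::text || ')' END AS title
FROM items
WHERE user_id = 3
ORDER BY CASE WHEN sku IS NULL THEN title::text
              ELSE title::text || ' (SKU: ' || sku::text || ')' END
LIMIT 25;
```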

Answered 2013-10-17 21:50:28

I have a unique index on (user_id, sku, title, item_id, id). The seq scan still shows up even without the limit. – AKWF

@AKWF: I see the query, but the index you need in order to avoid the seq scan is on (user_id, CASE WHEN sku IS NULL THEN title ELSE concat(title, ' (SKU: ', sku, ')') END) ... otherwise Postgres has no way at all to know which rows will be the top 25. Alternatively, order by items.title, items.sku (that is, by the raw data), and then add the more useful index on (user_id, title, sku).
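The second option in this comment, sorting on the raw columns, might be sketched as follows. The index name is hypothetical and the select list is shortened.

```sql
-- Hypothetical index name; the column order matches the WHERE and ORDER BY below.
CREATE INDEX index_items_on_user_id_title_sku
    ON items (user_id, title, sku);

-- With user_id fixed by equality, the index already yields rows in
-- (title, sku) order, so the first 25 can be read without a sort.
SELECT items.id, items.title, items.sku
FROM items
WHERE items.user_id = 3
ORDER BY items.title, items.sku
LIMIT 25;
```

The display string with the SKU suffix can then be built in the select list or in the application, since it no longer drives the ordering.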

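I think you need an index on the applicable_id column only (without the applicable_type and status columns). You may also need to raise the relevant parameter (system-wide or only for the applicable column) so that PostgreSQL can better estimate the number of rows in the join.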
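A sketch of both suggestions follows. The answer does not name the parameter, so this assumes it means the statistics target (default_statistics_target system-wide, or SET STATISTICS per column). The index name and the value 1000 are illustrative, not recommendations.

```sql
-- Hypothetical index name; a single-column index as suggested.
CREATE INDEX index_details_on_applicable_id
    ON details (applicable_id);

-- Assumption: the unnamed parameter is the statistics target.
-- Raise it for this one column only, then re-gather statistics.
ALTER TABLE details ALTER COLUMN applicable_id SET STATISTICS 1000;
ANALYZE details;
```

Re-running EXPLAIN ANALYZE afterwards shows whether the estimated row count for the join (6425 in the plan above, against 60047 actual) has moved closer to reality.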

Answered 2013-10-18 02:46:21 by alexius
