2016-06-20 3 views

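Reducing the number of rows by using window functions to build date ranges

I am trying to reduce the number of rows in a table that holds a lot of redundant data. My first thought was to use window functions to define date ranges to store in a table and then, whenever I need this information, to use those date ranges as delimiters in a join condition. However, some of the resulting ranges overlap, so I am not sure which approach would work best.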

I am using Postgres 9.3.

select distinct  
    min(obs_date) over (partition by equipment, temperature) as beg_obs_date, 
    max(obs_date) over (partition by equipment, temperature) as end_obs_date, 
    equipment, 
    temperature 
from  
( select generate_series('2016-05-01', '2016-05-08', '1 day'::interval)::date as obs_date, 
     'FREEZER_1'::varchar as equipment, 
     -15.20::real as temperature 
    union all 
    select generate_series('2016-05-09', '2016-05-15', '1 day'::interval)::date as obs_date, 
     'FREEZER_1'::varchar as equipment, 
     -20.00::real as temperature 
    union all 

    select generate_series('2016-05-16', '2016-06-10', '1 day'::interval)::date as obs_date, 
     'FREEZER_1'::varchar as equipment, 
     -15.20::real as temperature 
) sq 

I get:

beg_obs_date end_obs_date equipment temperature 
2016-05-01  2016-06-10  FREEZER_1 -15,2 
2016-05-09  2016-05-15  FREEZER_1 -20 

What I want is:

beg_obs_date end_obs_date equipment temperature 
2016-05-01  2016-05-08  FREEZER_1 -15,2 
2016-05-09  2016-05-15  FREEZER_1 -20 
2016-05-16  2016-06-10  FREEZER_1 -15,2 

Any ideas?

Thanks.

Answer


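Use row_number() to tell consecutive series apart. Inside a run of identical temperatures both row numbers grow by one per row, so their difference stays constant; two runs of the same temperature that are separated by other readings get different values. The (slightly simplified) data with the additional column packet: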

with the_data as (
    select generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date, 
     'FREEZER_1'::varchar as equipment, 
     -15.20::real as temperature 
    union all 
    select generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date, 
     'FREEZER_1'::varchar as equipment, 
     -20.00::real as temperature 
    union all 
    select generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date, 
     'FREEZER_1'::varchar as equipment, 
     -15.20::real as temperature 
    ) 
select 
    *, 
    row_number() over (partition by equipment, temperature order by obs_date)- row_number() over (order by obs_date) as packet 
from the_data 

  obs_date  | equipment | temperature | packet 
------------+-----------+-------------+--------
 2016-05-01 | FREEZER_1 |       -15.2 |      0
 2016-05-02 | FREEZER_1 |       -15.2 |      0
 2016-05-03 | FREEZER_1 |       -15.2 |      0
 2016-05-04 | FREEZER_1 |         -20 |     -3
 2016-05-05 | FREEZER_1 |         -20 |     -3
 2016-05-06 | FREEZER_1 |       -15.2 |     -2
 2016-05-07 | FREEZER_1 |       -15.2 |     -2
 2016-05-08 | FREEZER_1 |       -15.2 |     -2
(8 rows)
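
To see where the packet values come from, it may help to show the two row numbers side by side. The following is only an illustrative sketch on the same sample data; the column names rn_same_temp and rn_overall are made up for this example, and the overall numbering is partitioned by equipment here, which makes no difference while there is a single freezer.

```sql
with the_data as (
    select generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
           'FREEZER_1'::varchar as equipment,
           -15.20::real as temperature
    union all
    select generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date,
           'FREEZER_1'::varchar,
           -20.00::real
    union all
    select generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date,
           'FREEZER_1'::varchar,
           -15.20::real
)
select
    obs_date,
    temperature,
    rn_same_temp,                       -- position among rows with the same temperature
    rn_overall,                         -- position among all rows of the equipment
    rn_same_temp - rn_overall as packet -- constant inside each consecutive run
from (
    select
        *,
        row_number() over (partition by equipment, temperature order by obs_date) as rn_same_temp,
        row_number() over (partition by equipment order by obs_date) as rn_overall
    from the_data
) s
order by obs_date;
```

For the eight sample days, rn_same_temp runs 1, 2, 3, 1, 2, 4, 5, 6 while rn_overall runs 1 through 8, which gives the packet values 0, 0, 0, -3, -3, -2, -2, -2.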

Use packet instead of temperature in the partitions of min() and max():

with the_data as (
    select generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date, 
     'FREEZER_1'::varchar as equipment, 
     -15.20::real as temperature 
    union all 
    select generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date, 
     'FREEZER_1'::varchar as equipment, 
     -20.00::real as temperature 
    union all 
    select generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date, 
     'FREEZER_1'::varchar as equipment, 
     -15.20::real as temperature 
    ) 
select distinct  
    min(obs_date) over (partition by equipment, packet) as beg_obs_date, 
    max(obs_date) over (partition by equipment, packet) as end_obs_date, 
    equipment, 
    temperature 
from (
    select 
     *, 
     row_number() over (partition by equipment, temperature order by obs_date)- row_number() over (order by obs_date) as packet 
    from the_data 
) s 
order by 1; 

 beg_obs_date | end_obs_date | equipment | temperature 
--------------+--------------+-----------+-------------
 2016-05-01   | 2016-05-03   | FREEZER_1 |       -15.2
 2016-05-04   | 2016-05-05   | FREEZER_1 |         -20
 2016-05-06   | 2016-05-08   | FREEZER_1 |       -15.2
(3 rows)
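
One possible hardening, offered as a sketch and not as part of the original answer: a packet value is only guaranteed to differ between runs of the same temperature, so two runs with different temperatures can end up with the same value (for example in the pattern A, B, A, B with one reading each). Keeping temperature in the grouping key and numbering the overall sequence per equipment should avoid that, and a plain GROUP BY can stand in for SELECT DISTINCT with window aggregates. The second freezer and the alternating readings below are made-up sample data, and temperature is a numeric column here to keep the VALUES list simple.

```sql
with the_data (obs_date, equipment, temperature) as (
    values
        ('2016-05-01'::date, 'FREEZER_1', -15.2),
        ('2016-05-02',       'FREEZER_1', -20.0),
        ('2016-05-03',       'FREEZER_1', -15.2),
        ('2016-05-04',       'FREEZER_1', -20.0),
        ('2016-05-01',       'FREEZER_2', -18.0),
        ('2016-05-02',       'FREEZER_2', -18.0),
        ('2016-05-03',       'FREEZER_2', -18.0)
)
select
    min(obs_date) as beg_obs_date,
    max(obs_date) as end_obs_date,
    equipment,
    temperature
from (
    select
        *,
        -- both numberings restart for every equipment
        row_number() over (partition by equipment, temperature order by obs_date)
      - row_number() over (partition by equipment order by obs_date) as packet
    from the_data
) s
-- temperature stays in the key because packet alone can repeat across temperatures
group by equipment, temperature, packet
order by equipment, beg_obs_date;
```

On this sample the query returns four one-day ranges for FREEZER_1 and a single range from 2016-05-01 to 2016-05-03 for FREEZER_2. If the ranges are stored in a table, a lookup could then join on a condition such as obs_date between beg_obs_date and end_obs_date, as the question suggests.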

Perfect! Great idea! Thanks, klin.