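How do I handle late arriving dimensions and NULL business keys in a Kimball data mart?
I'm trying to implement a Kimball data mart that uses -1 and -2 rows in the dimension tables for late arriving dimensions and NULL business keys. I have example code that creates one staging table for the fact and dimension data, plus two dimension tables and one fact table for the data mart. Here is the example code, with data, in SQL: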
--drop table stg_sales
--go
CREATE TABLE dbo.stg_sales
(
stg_sales_id INT IDENTITY(1, 1) NOT NULL,
sales_number INT NOT NULL,
sales_amt INT NULL,
cust_number INT NULL,
cust_firstname NVARCHAR(50) NULL,
cust_lastname NVARCHAR(100) NULL,
cust_address NVARCHAR(500) NULL,
salesperson_number INT NULL,
CONSTRAINT pk_stg_sales PRIMARY KEY (stg_sales_id)
)
go
INSERT stg_sales
(sales_number,
sales_amt,
cust_number,
cust_firstname,
cust_lastname,
cust_address,
salesperson_number)
VALUES (123,
434,
2342,
'Jim',
'Moriaty',
'something',
23)
INSERT stg_sales
(sales_number,
sales_amt,
cust_number,
cust_firstname,
cust_lastname,
cust_address,
salesperson_number)
VALUES (124,
234,
2342,
'Jim',
'Moriaty',
'something',
23)
INSERT stg_sales
(sales_number,
sales_amt,
cust_number,
cust_firstname,
cust_lastname,
cust_address,
salesperson_number)
VALUES (125,
434,
4545,
'Joe',
'Esk',
'someother',
24)
INSERT stg_sales
(sales_number,
sales_amt,
cust_number,
cust_firstname,
cust_lastname,
cust_address,
salesperson_number)
VALUES (126,
434,
5555,
'Daniel',
'Hart',
'Someaddr',
NULL) --salesperson_number business key missing here
INSERT stg_sales
(sales_number,
sales_amt,
cust_number,
cust_firstname,
cust_lastname,
cust_address,
salesperson_number)
VALUES (127,
333,
4444,
'Pat',
'Smith',
'Someaddr',
30)
SELECT *
FROM stg_sales
--create a dimension and fact tables
--drop table dbo.dim_customer
--go
CREATE TABLE dbo.dim_customer
(
customer_wid INT IDENTITY(1, 1) NOT NULL,
cust_number INT NULL,
cust_firstname NVARCHAR(50) NULL,
cust_lastname NVARCHAR(100) NULL,
cust_address NVARCHAR(500) NULL,
date_insert DATETIME2 NOT NULL DEFAULT (Getdate()),
date_update DATETIME2 NULL,
is_current BIT NOT NULL,
CONSTRAINT pk_dim_customer PRIMARY KEY (customer_wid),
CONSTRAINT chk_is_current CHECK (is_current IN (0, 1))
)
go
SET IDENTITY_INSERT dbo.dim_customer ON
INSERT dbo.dim_customer
(customer_wid,
cust_number,
cust_firstname,
cust_lastname,
cust_address,
date_insert,
date_update,
is_current)
VALUES (-1,
-1,
'unknown',
'unknown',
'unknown',
Getdate(),
Getdate(),
1)
INSERT dbo.dim_customer
(customer_wid,
cust_number,
cust_firstname,
cust_lastname,
cust_address,
date_insert,
date_update,
is_current)
VALUES (-2,
-2,
'Error',
'Error',
'Error',
Getdate(),
Getdate(),
1)
SET IDENTITY_INSERT dbo.dim_customer OFF
--insert data into dimension table
INSERT dbo.dim_customer
(cust_number,
cust_firstname,
cust_lastname,
cust_address,
is_current)
SELECT DISTINCT cust_number,
cust_firstname,
cust_lastname,
cust_address,
1 AS is_current
FROM dbo.stg_sales
WHERE cust_number <> 4444 --left one record out to simulate a missing corresponding row in the dim table (late arriving dimension)
SELECT *
FROM dbo.dim_customer
--create salesperson table
--drop table dbo.dim_salesperson
--go
CREATE TABLE dbo.dim_salesperson
(
salesperson_wid INT IDENTITY(1, 1) NOT NULL,
salesperson_number INT NULL,
salesperson_firstname NVARCHAR(50) NULL,
salesperson_lastname NVARCHAR(100) NULL,
salesperson_address NVARCHAR(500) NULL,
date_insert DATETIME2 NOT NULL DEFAULT (Getdate()),
date_update DATETIME2 NULL,
is_current BIT NOT NULL,
CONSTRAINT pk_dim_salesperson PRIMARY KEY (salesperson_wid),
CONSTRAINT chk_dim_salesperson_is_current CHECK (is_current IN (0, 1))
)
go
SET IDENTITY_INSERT dbo.dim_salesperson ON
INSERT dbo.dim_salesperson
(salesperson_wid,
salesperson_number,
salesperson_firstname,
salesperson_lastname,
salesperson_address,
date_insert,
date_update,
is_current)
VALUES (-1,
-1,
'Not available',
'Not available',
'Not available',
Getdate(),
Getdate(),
1)
INSERT dbo.dim_salesperson
(salesperson_wid,
salesperson_number,
salesperson_firstname,
salesperson_lastname,
salesperson_address,
date_insert,
date_update,
is_current)
VALUES (-2,
-2,
'Error',
'Error',
'Error',
Getdate(),
Getdate(),
1)
SET IDENTITY_INSERT dbo.dim_salesperson OFF
--insert data into salesperson
INSERT dbo.dim_salesperson
(salesperson_number,
salesperson_firstname,
salesperson_lastname,
salesperson_address,
is_current)
VALUES (23,
'John',
'Fox',
'something',
1)
INSERT dbo.dim_salesperson
(salesperson_number,
salesperson_firstname,
salesperson_lastname,
salesperson_address,
is_current)
VALUES (24,
'Hadley',
'Fox',
'something',
1)
INSERT dbo.dim_salesperson
(salesperson_number,
salesperson_firstname,
salesperson_lastname,
salesperson_address,
is_current)
VALUES (30,
'Ashley',
'Fox',
'something',
1)
SELECT *
FROM dbo.dim_salesperson
SELECT *
FROM dbo.stg_sales
--create and populate the fact table
--drop table dbo.f_sales
--go
CREATE TABLE dbo.f_sales
(
sales_number INT NOT NULL,
customer_wid INT NOT NULL,
salesperson_wid INT NOT NULL,
sales_amt INT NULL,
CONSTRAINT pk_f_sales PRIMARY KEY (sales_number),
CONSTRAINT fk_customer_wid FOREIGN KEY (customer_wid) REFERENCES
dbo.dim_customer(customer_wid),
CONSTRAINT fk_salesperson_wid FOREIGN KEY (salesperson_wid) REFERENCES
dbo.dim_salesperson(salesperson_wid)
)
--populate the fact table
INSERT dbo.f_sales
(sales_number,
customer_wid,
salesperson_wid,
sales_amt)
SELECT stg.sales_number,
Isnull(dimcust.customer_wid, -1) AS customer_wid,
--this is maybe correct way to assign -1 foreign key when there is no corresponding dimension row in the dim table
Isnull(dimsp.salesperson_wid, -2) AS salesperson_wid,
--NOT CORRECT, how to assign -2 foreign key when the business key is NULL in the source?
stg.sales_amt
FROM dbo.stg_sales AS stg
LEFT JOIN dbo.dim_customer AS dimcust
ON stg.cust_number = dimcust.cust_number
LEFT JOIN dbo.dim_salesperson AS dimsp
ON stg.salesperson_number = dimsp.salesperson_number
SELECT *
FROM dbo.f_sales
How can I assign -2 to the rows that have no business key in the source system?
This is basically what I'm trying to achieve:
Handling NULL in dimensions and measures
You can read more about the theory behind this implementation from Kimball.
Edit:
I think I can use COALESCE or ISNULL in the left join condition; it seems to return the correct result:
INSERT dbo.f_sales
(sales_number,
customer_wid,
salesperson_wid,
sales_amt)
SELECT stg.sales_number,
Isnull(dimcust.customer_wid, -1) AS customer_wid,
--this is maybe correct way to assign -1 foreign key when there is no corresponding dimension row in the dim table
Isnull(dimsp.salesperson_wid, -1) AS salesperson_wid,
--a NULL business key matches the -2 row through the COALESCE in the join; -1 covers a key with no dimension row
stg.sales_amt
FROM dbo.stg_sales AS stg
LEFT JOIN dbo.dim_customer AS dimcust
ON COALESCE(stg.cust_number, -2) = dimcust.cust_number
LEFT JOIN dbo.dim_salesperson AS dimsp
ON COALESCE(stg.salesperson_number, -2) = dimsp.salesperson_number
A pure SQL approach, but with the lookup technique:
-- add nullable keys to the staging table
alter table dbo.stg_sales ADD
sales_person_wid integer null
, customer_wid integer null
;
go
-- insert to staging table here (as in your example)
-- lookup sales person key
update s
set sales_person_wid = p.salesperson_wid
from dbo.stg_sales as s
join dbo.dim_salesperson as p
on s.salesperson_number = p.salesperson_number ;
-- decide what to do with missing business keys
update dbo.stg_sales
set sales_person_wid = -2
where sales_person_wid is null ;
-- do similar for customer
-- now all keys in staging table are not null
-- load to fact table
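If the fact load itself should tell the two cases apart, one option is a CASE expression that checks the staging business key before falling back to the lookup result. The following is only a sketch under some assumptions: it uses the tables and the -1/-2 rows from the question, it assumes dbo.f_sales is still empty (otherwise the primary key on sales_number is violated), and the is_current = 1 filter in the joins is my addition, not something the question requires.

```sql
-- Sketch only: -2 = business key is NULL in the source,
--              -1 = business key is present but has no dimension row yet.
INSERT dbo.f_sales
       (sales_number,
        customer_wid,
        salesperson_wid,
        sales_amt)
SELECT stg.sales_number,
       CASE
         WHEN stg.cust_number IS NULL THEN -2
         ELSE Isnull(dimcust.customer_wid, -1)
       END AS customer_wid,
       CASE
         WHEN stg.salesperson_number IS NULL THEN -2
         ELSE Isnull(dimsp.salesperson_wid, -1)
       END AS salesperson_wid,
       stg.sales_amt
FROM   dbo.stg_sales AS stg
       LEFT JOIN dbo.dim_customer AS dimcust
              ON stg.cust_number = dimcust.cust_number
             AND dimcust.is_current = 1   -- assumption: match current rows only
       LEFT JOIN dbo.dim_salesperson AS dimsp
              ON stg.salesperson_number = dimsp.salesperson_number
             AND dimsp.is_current = 1     -- assumption: match current rows only
```

With the sample data, sale 126 (NULL salesperson_number) should get salesperson_wid -2, and sale 127 (customer 4444, left out of dim_customer) should get customer_wid -1. Unlike COALESCE in the join condition, this does not depend on the -2 dimension row having -2 as its business key.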
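Thanks for your reply. I've never liked using dimension tables for lookups in the staging tables. I've used ETL tools in these situations before, and it's usual to solve this in the fact load rather than in the staging area. I think I can use the coalesce/isnull function in the left join condition to replace NULLs with -2 in the join operation. – jrara
I edited my question. – jrara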
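For the late arriving side of the question, the fact rows parked on -1 still have to be re-pointed once the dimension row shows up. A rough sketch of that follow-up step, assuming the staging rows are still available (dbo.f_sales does not store the business key, so dbo.stg_sales is used to find it again) and that new customers are simply inserted as current rows:

```sql
-- Hypothetical follow-up step: load customers that were missing from the dimension.
INSERT dbo.dim_customer
       (cust_number, cust_firstname, cust_lastname, cust_address, is_current)
SELECT DISTINCT stg.cust_number,
       stg.cust_firstname,
       stg.cust_lastname,
       stg.cust_address,
       1
FROM   dbo.stg_sales AS stg
WHERE  stg.cust_number IS NOT NULL
       AND NOT EXISTS (SELECT 1
                       FROM   dbo.dim_customer AS d
                       WHERE  d.cust_number = stg.cust_number);

-- Re-point fact rows from the -1 placeholder to the real surrogate key.
UPDATE f
SET    f.customer_wid = d.customer_wid
FROM   dbo.f_sales AS f
       JOIN dbo.stg_sales AS stg
         ON stg.sales_number = f.sales_number
       JOIN dbo.dim_customer AS d
         ON d.cust_number = stg.cust_number
        AND d.is_current = 1
WHERE  f.customer_wid = -1;
```

Rows with a NULL business key stay on -2, because the join on cust_number never matches a NULL and the WHERE clause only touches the -1 placeholder.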