2017-01-17

PyMC3: Hierarchical Rugby model posterior?

I just started reading through the PyMC3 documentation (I'm much more comfortable with sklearn) and came across the Rugby hierarchical model example:

# Imports and Rugby data setup -- model in next section 

import numpy as np 
import pandas as pd 
import pymc3 as pm 
import theano.tensor as tt 
import matplotlib.pyplot as plt 
import seaborn as sns 

games = [ 
    ['Wales', 'Italy', 23, 15], 
    ['France', 'England', 26, 24], 
    ['Ireland', 'Scotland', 28, 6], 
    ['Ireland', 'Wales', 26, 3], 
    ['Scotland', 'England', 0, 20], 
    ['France', 'Italy', 30, 10], 
    ['Wales', 'France', 27, 6], 
    ['Italy', 'Scotland', 20, 21], 
    ['England', 'Ireland', 13, 10], 
    ['Ireland', 'Italy', 46, 7], 
    ['Scotland', 'France', 17, 19], 
    ['England', 'Wales', 29, 18], 
    ['Italy', 'England', 11, 52], 
    ['Wales', 'Scotland', 51, 3], 
    ['France', 'Ireland', 20, 22], 
] 
columns = ['home_team', 'away_team', 'home_score', 'away_score'] 
df = pd.DataFrame(games, columns=columns) 

teams = df.home_team.unique() 
teams = pd.DataFrame(teams, columns=['team']) 
teams['i'] = teams.index 

df = pd.merge(df, teams, left_on='home_team', right_on='team', how='left')
df = df.rename(columns={'i': 'i_home'}).drop('team', axis=1)
df = pd.merge(df, teams, left_on='away_team', right_on='team', how='left')
df = df.rename(columns={'i': 'i_away'}).drop('team', axis=1)

observed_home_goals = df.home_score.values 
observed_away_goals = df.away_score.values 

home_team = df.i_home.values 
away_team = df.i_away.values 

num_teams = len(df.i_home.drop_duplicates()) 
num_games = len(home_team) 

g = df.groupby('i_away') 
att_starting_points = np.log(g.away_score.mean()) 
g = df.groupby('i_home') 
def_starting_points = -np.log(g.away_score.mean()) 
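
For reference, the merge-based indexing above assigns each team an integer ID in the order it first appears as a home team. A minimal pure-Python sketch of the same mapping follows; the `team_index` name is introduced here for illustration only, and the list of home teams is copied from `games`:

```python
# Home teams in the order they appear in `games` above.
home_teams = ['Wales', 'France', 'Ireland', 'Ireland', 'Scotland',
              'France', 'Wales', 'Italy', 'England', 'Ireland',
              'Scotland', 'England', 'Italy', 'Wales', 'France']

# dict.fromkeys keeps first-seen order (Python 3.7+), mirroring
# what df.home_team.unique() does in the pandas version.
team_index = {team: i for i, team in enumerate(dict.fromkeys(home_teams))}
print(team_index)
```

These IDs are what `home_team` and `away_team` hold, so they are also the values to use when indexing `atts` and `defs` by hand.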

Next comes the main PyMC3 model setup:

with pm.Model() as model: 
    # Global model parameters 
    home = pm.Normal('home', 0, tau=.0001) 
    tau_att = pm.Gamma('tau_att', .1, .1) 
    tau_def = pm.Gamma('tau_def', .1, .1) 
    intercept = pm.Normal('intercept', 0, tau=.0001) 

    # Team-specific model parameters 
    atts_star = pm.Normal('atts_star', mu=0, tau=tau_att, shape=num_teams) 
    defs_star = pm.Normal('defs_star', mu=0, tau=tau_def, shape=num_teams) 

    atts = pm.Deterministic('atts', atts_star - tt.mean(atts_star)) 
    defs = pm.Deterministic('defs', defs_star - tt.mean(defs_star)) 
    home_theta = tt.exp(intercept + home + atts[home_team] + defs[away_team]) 
    away_theta = tt.exp(intercept + atts[away_team] + defs[home_team]) 

    # Likelihood of observed data 
    home_points = pm.Poisson('home_points', mu=home_theta, observed=observed_home_goals) 
    away_points = pm.Poisson('away_points', mu=away_theta, observed=observed_away_goals) 

    start = pm.find_MAP() 
    step = pm.NUTS(state=start) 
    trace = pm.sample(20000, step, start=start)
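
Written out, the likelihood that the code above encodes looks roughly as follows. The notation is read off the code rather than taken from the PyMC3 docs: $h(g)$ and $a(g)$ are the home and away team indices of game $g$, and $T$ is `num_teams`.

```latex
\begin{aligned}
y^{\mathrm{home}}_{g} &\sim \operatorname{Poisson}\!\left(\theta^{\mathrm{home}}_{g}\right), &
\log \theta^{\mathrm{home}}_{g} &= \mathrm{intercept} + \mathrm{home} + \mathrm{att}_{h(g)} + \mathrm{def}_{a(g)},\\
y^{\mathrm{away}}_{g} &\sim \operatorname{Poisson}\!\left(\theta^{\mathrm{away}}_{g}\right), &
\log \theta^{\mathrm{away}}_{g} &= \mathrm{intercept} + \mathrm{att}_{a(g)} + \mathrm{def}_{h(g)},\\
\mathrm{att}_{t} &= \mathrm{att}^{*}_{t} - \frac{1}{T}\sum_{s=1}^{T}\mathrm{att}^{*}_{s}, &
\mathrm{def}_{t} &= \mathrm{def}^{*}_{t} - \frac{1}{T}\sum_{s=1}^{T}\mathrm{def}^{*}_{s}.
\end{aligned}
```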

I know how to plot the trace:

pm.traceplot(trace[5000:]) 

And how to generate posterior predictive samples:

ppc = pm.sample_ppc(trace[5000:], samples=500, model=model) 

What I'm not sure about is how to ask questions of the model/posterior. For example, I assume the following gives the distribution of scores for the Wales vs Italy matchup:

# Wales vs Italy is the first matchup in our dataset 
home_wales = ppc['home_points'][:, 0] 
away_italy = ppc['away_points'][:, 0] 
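
Each entry of `ppc` is a 2-D array with one row per posterior predictive draw and one column per game, so `[:, 0]` selects every draw for the first game. Below is a self-contained sketch of how paired draws like these can be summarized. It uses synthetic Poisson draws as a stand-in for the real `ppc`; the rates 25 and 14 are invented for illustration and are not model output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for ppc: 500 draws x 15 games. The rates are made up;
# a real ppc would come from pm.sample_ppc.
fake_ppc = {
    'home_points': rng.poisson(25, size=(500, 15)),
    'away_points': rng.poisson(14, size=(500, 15)),
}

home_draws = fake_ppc['home_points'][:, 0]  # all draws for game 0
away_draws = fake_ppc['away_points'][:, 0]

# Fraction of draws in which the home side outscores the away side.
p_home_win = (home_draws > away_draws).mean()
# Central 90% interval of the home score.
low, high = np.percentile(home_draws, [5, 95])
print(p_home_win, low, high)
```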

But what about matchups that are not recorded in the original data?

  • What is the distribution of scores if Italy plays France?
  • If Italy plays France at home, how often do both teams score under 15?

Thanks for any help/comments.

Answer

I'm fairly sure I was able to figure this out after reading the PyMC3 Hierarchical Partial Pooling example. Answering the questions in order:

  1. Yes, that would be the distribution for the Wales vs Italy matchup (since it is the first game in the observed data).

  2. To predict Italy vs France (Italy never hosted France in the original dataset), you need predictive thetas. Here is the code for the updated model:

    # Setup the model similarly to the previous one... 
    with pm.Model() as model: 
        # Global model parameters 
        home = pm.Normal('home', 0, tau=.0001) 
        tau_att = pm.Gamma('tau_att', .1, .1) 
        tau_def = pm.Gamma('tau_def', .1, .1) 
        intercept = pm.Normal('intercept', 0, tau=.0001) 
    
        # Team-specific model parameters 
        atts_star = pm.Normal('atts_star', mu=0, tau=tau_att, shape=num_teams) 
        defs_star = pm.Normal('defs_star', mu=0, tau=tau_def, shape=num_teams) 
    
        atts = pm.Deterministic('atts', atts_star - tt.mean(atts_star)) 
        defs = pm.Deterministic('defs', defs_star - tt.mean(defs_star)) 
        home_theta = tt.exp(intercept + home + atts[home_team] + defs[away_team]) 
        away_theta = tt.exp(intercept + atts[away_team] + defs[home_team]) 
    
        # Likelihood of observed data 
        home_points = pm.Poisson('home_points', mu=home_theta, observed=observed_home_goals) 
        away_points = pm.Poisson('away_points', mu=away_theta, observed=observed_away_goals) 
    
    # Now for predictions with no games played... 
    with model: 
        # IDs from `teams` DataFrame 
        italy, france = 4, 1 
        # New `thetas` for Italy vs France predictions 
        pred_home_theta = tt.exp(intercept + home + atts[italy] + defs[france]) 
        pred_away_theta = tt.exp(intercept + atts[france] + defs[italy]) 
        pred_home_points = pm.Poisson('pred_home_points', mu=pred_home_theta) 
        pred_away_points = pm.Poisson('pred_away_points', mu=pred_away_theta) 
    
    # Sample the final model 
    with model: 
        start = pm.find_MAP() 
        step = pm.NUTS(state=start) 
        trace = pm.sample(20000, step, start=start)
    

    Once the trace is complete, we can plot the predictions:

    # Use 5,000 as MCMC burn in 
    pred = pd.DataFrame({ 
        "italy": trace["pred_home_points"][5000:], 
        "france": trace["pred_away_points"][5000:], 
    }) 
    # Plot the distributions 
    sns.kdeplot(pred.italy, shade=True, label="Italy") 
    sns.kdeplot(pred.france, shade=True, label="France") 
    plt.show() 
    

    [Figure: Italy vs France Rugby distributions]
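
An alternative that avoids re-sampling the model is to compute the predictive rates from posterior draws of the parameters and then draw Poisson scores with NumPy. This is a sketch of that idea, not the approach taken above. The `predict_scores` helper is hypothetical, and the parameter arrays are random placeholders standing in for `trace['intercept']`, `trace['home']`, `trace['atts']`, and `trace['defs']`.

```python
import numpy as np

def predict_scores(intercept, home, atts, defs, home_id, away_id, rng):
    """Draw one (home, away) score pair per posterior draw.

    intercept, home: shape (n_draws,)
    atts, defs:      shape (n_draws, n_teams)
    """
    home_theta = np.exp(intercept + home + atts[:, home_id] + defs[:, away_id])
    away_theta = np.exp(intercept + atts[:, away_id] + defs[:, home_id])
    return rng.poisson(home_theta), rng.poisson(away_theta)

rng = np.random.default_rng(42)
n_draws, n_teams = 1000, 6

# Placeholder "posterior" draws: NOT real model output.
intercept = rng.normal(3.0, 0.05, n_draws)
home = rng.normal(0.2, 0.05, n_draws)
atts = rng.normal(0.0, 0.1, (n_draws, n_teams))
defs = rng.normal(0.0, 0.1, (n_draws, n_teams))

italy, france = 4, 1  # IDs from the `teams` DataFrame
italy_pts, france_pts = predict_scores(
    intercept, home, atts, defs, italy, france, rng)
print((italy_pts > france_pts).mean())
```

With a real trace, the same function could be called for any pair of team IDs without adding new variables to the model.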

How often does Italy win at home?

# 19% of the time 
(pred.italy > pred.france).mean() 

How often do both teams score 15 points or fewer?

# 0.7% of the time 
((pred.italy <= 15) & (pred.france <= 15)).mean()
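
Both answers are sample fractions, so they carry Monte Carlo error. A rough sketch for attaching a binomial standard error to such a fraction is shown below. The `event_fraction` helper is hypothetical, and the formula assumes independent draws, which MCMC draws generally are not, so treat the result as a rough guide only.

```python
import numpy as np

def event_fraction(mask):
    """Return (fraction of True entries, binomial standard error)."""
    mask = np.asarray(mask, dtype=bool)
    p = mask.mean()
    se = np.sqrt(p * (1.0 - p) / mask.size)
    return p, se

# Tiny made-up example: 3 of 10 draws satisfy the event.
mask = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0], dtype=bool)
p, se = event_fraction(mask)
print(p, se)
```

With the predictions above, the call would look like `event_fraction(pred.italy > pred.france)`.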
+1

This looks good to me. Why not add it to the PyMC3 documentation?