Accuracy of BMLiNGAM
This notebook evaluates the accuracy of BMLiNGAM for causal inference.
In [1]:
%matplotlib inline
%autosave 0
import sys, os
sys.path.insert(0, os.path.expanduser('~/work/git/github/taku-y/bmlingam'))
sys.path.insert(0, os.path.expanduser('~/work/git/github/pymc-devs/pymc3'))
import theano
theano.config.floatX = 'float64'
from copy import deepcopy
import hashlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import time
from expr1 import run_trial
from bmlingam import load_pklz, save_pklz
# from bmlingam import do_mcmc_bmlingam, InferParams, MCMCParams, save_pklz, load_pklz, define_hparam_searchspace, find_best_model
# from bmlingam.utils.gendata import GenDataParams, gen_artificial_data
Autosave disabled
Experimental condition
The experimental conditions are as follows.
- Parameters of artificial data
  - The number of observed samples (n_samples): [100]
  - The number of confounding factors (n_confs, denoted \(Q\)): [1, 3, 5, 10]
  - Scale of total noise: \(c = 0.25, 0.5, 1.0, 3.0\)
  - Scale of each confounding factor: \(c/\sqrt{Q}\) (see the sketch after this list)
  - Distribution of observation noise (data_noise_type): ['laplace', 'uniform']
  - Scale of observation noise: 3
- Hyperparameter range
  - Correlation coefficients of confounding factors (L_cov_21s): [[-.9, -.7, -.5, -.3, 0, .3, .5, .7, .9]]
  - Distribution of observation noise in the BMLiNGAM model (model_noise_type): ['gg']
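A minimal sketch of the confounder scaling, assuming (as the list above implies) that the total confounding scale \(c\) is split evenly over the \(Q\) confounders; conf_scale is an illustrative helper, not part of bmlingam:

import numpy as np

def conf_scale(totalnoise, n_confs):
    # Per-confounder scale c / sqrt(Q): variances of Q independent
    # confounders add up, so this keeps the total scale at c.
    return totalnoise / np.sqrt(n_confs)

print(conf_scale(1.0, 4))  # c = 1.0 over Q = 4 confounders -> 0.5 each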
In [2]:
conds = [
    {
        'totalnoise': totalnoise,
        'L_cov_21s': L_cov_21s,
        'n_samples': n_samples,
        'n_confs': n_confs,
        'data_noise_type': data_noise_type,
        'model_noise_type': model_noise_type
    }
    for totalnoise in [0.25, 0.5, 1.0, 3.0]
    for L_cov_21s in [[-.9, -.7, -.5, -.3, 0, .3, .5, .7, .9]]
    for n_samples in [100]
    for n_confs in [1, 3, 5, 10]
    for data_noise_type in ['laplace', 'uniform']
    for model_noise_type in ['gg']
]
Program
Identifier of trial
The identifier of a trial is determined by:
- Trial index (ix_trial)
- The number of samples (n_samples)
- The number of confounders (n_confs)
- Distribution of observation noise of the artificial data (data_noise_type)
- Distribution of observation noise of the BMLiNGAM model (model_noise_type)
- Correlation coefficients of the confounding effect (L_cov_21s)
- Total noise scale (totalnoise)
We use these identifiers to store the results of trials.
In [3]:
def make_id(ix_trial, n_samples, n_confs, data_noise_type,
            model_noise_type, L_cov_21s, totalnoise):
    L_cov_21s_ = ' '.join([str(v) for v in L_cov_21s])
    return hashlib.md5(
        str((L_cov_21s_, ix_trial, n_samples, n_confs,
             data_noise_type, model_noise_type, totalnoise)).encode('utf-8')
    ).hexdigest()

# Test
print(make_id(55, 100, 12, 'all', 'gg', [1, 2, 3], 0.3))
8eb07d5dcb68e8163a7a2a0582b59d1f
Append results to dataframe
In [4]:
def add_result_to_df(df, result):
    if df is None:
        return pd.DataFrame({k: [v] for k, v in result.items()})
    else:
        return df.append(result, ignore_index=True)
# Test
result1 = {'col1': 10, 'col2': 20}
result2 = {'col1': 30, 'col2': -10}
df1 = add_result_to_df(None, result1)
print('--- df1 ---')
print(df1)
df2 = add_result_to_df(df1, result2)
print('--- df2 ---')
print(df2)
--- df1 ---
col1 col2
0 10 20
--- df2 ---
col1 col2
0 10 20
1 30 -10
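Note that DataFrame.append was removed in pandas 2.0; the helper above targets the pandas available when this notebook was written. A sketch of an equivalent helper for a recent pandas, using pd.concat:

import pandas as pd

def add_result_to_df(df, result):
    # Same behavior for pandas >= 2.0, where DataFrame.append no longer exists.
    row = pd.DataFrame({k: [v] for k, v in result.items()})
    return row if df is None else pd.concat([df, row], ignore_index=True)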
Save and load data frame
In [5]:
def load_df(df_file):
    if os.path.exists(df_file):
        return load_pklz(df_file)
    else:
        return None

def save_df(df_file, df):
    save_pklz(df_file, df)
Perform experiment over conditions and trials
In [6]:
def df_exist_result_id(df, result_id):
    if df is not None:
        return result_id in np.array(df['result_id'])
    else:
        return False

def run_expr(conds, n_trials_per_cond=50):
    """Perform evaluation of BMLiNGAM given a set of experimental conditions.

    For each condition, several trials are executed.
    In a trial, BMLiNGAM is applied to causal inference for artificial data.
    The average accuracy is computed for each condition.
    """
    # Filename of dataframe
    data_dir = '.'
    df_file = data_dir + '/20160822-eval-bml-results.pklz'

    # Load results computed in previous runs
    df = load_df(df_file)

    # Loop over experimental conditions
    n_skip = 0
    for cond in conds:
        print(cond)

        # Loop over trials
        for ix_trial in range(n_trials_per_cond):
            # Identifier of a trial for (cond, ix_trial)
            result_id = make_id(ix_trial, **cond)

            # Check if the result has already been stored in the data frame
            if df_exist_result_id(df, result_id):
                n_skip += 1
            else:
                # `result` is a dict including the results of the trial.
                # `ix_trial` is used as the random seed of the corresponding trial.
                result = run_trial(ix_trial, cond)
                result.update({'result_id': result_id})
                df = add_result_to_df(df, result)
                save_df(df_file, df)

    print('Number of skipped trials = {}'.format(n_skip))

    return df
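Because each trial's result is keyed by its identifier and the data frame is saved after every trial, run_expr can be interrupted and re-run: trials already present in the data frame are skipped and counted in n_skip.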
Run program
In [7]:
df = run_expr(conds)
{'totalnoise': 0.25, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 1, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 0.25, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 1, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 0.25, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 3, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 0.25, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 3, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 0.25, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 5, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 0.25, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 5, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 0.25, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 10, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 0.25, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 10, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 0.5, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 1, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 0.5, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 1, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 0.5, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 3, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 0.5, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 3, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 0.5, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 5, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 0.5, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 5, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 0.5, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 10, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 0.5, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 10, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 1.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 1, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 1.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 1, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 1.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 3, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 1.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 3, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 1.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 5, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 1.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 5, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 1.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 10, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 1.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 10, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 3.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 1, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 3.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 1, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 3.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 3, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 3.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 3, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 3.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 5, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 3.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 5, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
{'totalnoise': 3.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 10, 'model_noise_type': 'gg', 'data_noise_type': 'laplace', 'n_samples': 100}
{'totalnoise': 3.0, 'L_cov_21s': [-0.9, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 0.9], 'n_confs': 10, 'model_noise_type': 'gg', 'data_noise_type': 'uniform', 'n_samples': 100}
Number of skipped trials = 1600
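All 1600 trials were skipped here: the grid above yields 32 conditions (4 total-noise scales × 4 confounder counts × 2 data noise types) with 50 trials each, and all of them were already stored in the cached data frame.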
Result table
In [8]:
import pandas as pd

df_file = './20160822-eval-bml-results.pklz'
df = load_pklz(df_file)

df = pd.concat(
    {
        '2log(bf)': df['log_bf'],
        'correct rate': df['correct_rate'],
        'totalnoise': df['totalnoise'],
        'data noise type': df['data_noise_type'],
        'n_confs': df['n_confs']
    }, axis=1
)

sg = df.groupby(['data noise type', 'n_confs', 'totalnoise'])
sg1 = sg['correct rate'].mean()
sg2 = sg['2log(bf)'].mean()

pd.concat(
    {
        'correct_rate': sg1,
        '2log(bf)': sg2,
    }, axis=1
)
Out[8]:
| data noise type | n_confs | totalnoise | 2log(bf) | correct_rate |
|---|---|---|---|---|
| laplace | 1 | 0.25 | 10.658543 | 0.84 |
| laplace | 1 | 0.50 | 9.956851 | 0.76 |
| laplace | 1 | 1.00 | 8.132585 | 0.78 |
| laplace | 1 | 3.00 | 6.067567 | 0.44 |
| laplace | 3 | 0.25 | 9.701838 | 0.82 |
| laplace | 3 | 0.50 | 8.965892 | 0.80 |
| laplace | 3 | 1.00 | 6.592364 | 0.64 |
| laplace | 3 | 3.00 | 5.585870 | 0.38 |
| laplace | 5 | 0.25 | 9.973692 | 0.88 |
| laplace | 5 | 0.50 | 9.525252 | 0.80 |
| laplace | 5 | 1.00 | 7.731706 | 0.70 |
| laplace | 5 | 3.00 | 6.341696 | 0.60 |
| laplace | 10 | 0.25 | 10.690053 | 0.86 |
| laplace | 10 | 0.50 | 10.349471 | 0.86 |
| laplace | 10 | 1.00 | 8.007714 | 0.74 |
| laplace | 10 | 3.00 | 5.491167 | 0.52 |
| uniform | 1 | 0.25 | 11.557302 | 0.88 |
| uniform | 1 | 0.50 | 10.521002 | 0.76 |
| uniform | 1 | 1.00 | 7.951135 | 0.68 |
| uniform | 1 | 3.00 | 5.253098 | 0.48 |
| uniform | 3 | 0.25 | 12.815920 | 0.90 |
| uniform | 3 | 0.50 | 11.660920 | 0.86 |
| uniform | 3 | 1.00 | 7.951725 | 0.74 |
| uniform | 3 | 3.00 | 6.008483 | 0.56 |
| uniform | 5 | 0.25 | 11.016579 | 0.86 |
| uniform | 5 | 0.50 | 10.214459 | 0.78 |
| uniform | 5 | 1.00 | 6.845886 | 0.66 |
| uniform | 5 | 3.00 | 5.512882 | 0.52 |
| uniform | 10 | 0.25 | 11.683218 | 0.88 |
| uniform | 10 | 0.50 | 10.147768 | 0.82 |
| uniform | 10 | 1.00 | 7.335032 | 0.62 |
| uniform | 10 | 3.00 | 6.205580 | 0.50 |
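In short, the correct rate is high (roughly 0.8–0.9) at the smallest total-noise scales and falls toward chance level (about 0.5) at \(c = 3.0\), while the number of confounders has comparatively little effect.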
Bayes factor and accuracy
In [9]:
import pandas as pd

def count(x): return np.sum(x.astype(int))  # defined but unused; aggfunc below uses np.sum

data_dir = '.'
df_file = data_dir + '/20160822-eval-bml-results.pklz'
df = load_pklz(df_file)

# Each trial's correct_rate is 0 or 1 (the causal direction is either
# inferred correctly or not), so summing it within a bin of 2log(bf)
# values (the 'count' column) gives the number of correct trials per bin.
df = pd.concat(
    {
        '2log(bf)': df['log_bf'],
        'correct rate': df['correct_rate'],
        'count': df['correct_rate'],
        'totalnoise': df['totalnoise'],
        'data noise type': df['data_noise_type']
    }, axis=1
)

df = df.pivot_table(values=['correct rate', 'count'],
                    index=['totalnoise', pd.cut(df['2log(bf)'], [0., 2., 6., 10., 100.])],
                    columns='data noise type',
                    aggfunc={'correct rate': np.mean, 'count': np.sum})
df
Out[9]:
| totalnoise | 2log(bf) | count (laplace) | count (uniform) | correct rate (laplace) | correct rate (uniform) |
|---|---|---|---|---|---|
| 0.25 | (0, 2] | 21.0 | 18.0 | 0.636364 | 0.642857 |
| 0.25 | (2, 6] | 26.0 | 31.0 | 0.722222 | 0.794872 |
| 0.25 | (6, 10] | 37.0 | 20.0 | 0.822222 | 0.869565 |
| 0.25 | (10, 100] | 86.0 | 107.0 | 1.000000 | 0.972727 |
| 0.50 | (0, 2] | 13.0 | 16.0 | 0.481481 | 0.533333 |
| 0.50 | (2, 6] | 37.0 | 27.0 | 0.711538 | 0.613636 |
| 0.50 | (6, 10] | 37.0 | 24.0 | 0.840909 | 0.800000 |
| 0.50 | (10, 100] | 74.0 | 94.0 | 0.961039 | 0.979167 |
| 1.00 | (0, 2] | 21.0 | 18.0 | 0.567568 | 0.600000 |
| 1.00 | (2, 6] | 36.0 | 39.0 | 0.631579 | 0.520000 |
| 1.00 | (6, 10] | 27.0 | 34.0 | 0.600000 | 0.723404 |
| 1.00 | (10, 100] | 59.0 | 44.0 | 0.967213 | 0.916667 |
| 3.00 | (0, 2] | 28.0 | 30.0 | 0.509091 | 0.600000 |
| 3.00 | (2, 6] | 34.0 | 36.0 | 0.465753 | 0.486486 |
| 3.00 | (6, 10] | 17.0 | 20.0 | 0.515152 | 0.500000 |
| 3.00 | (10, 100] | 18.0 | 17.0 | 0.461538 | 0.472222 |
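Except at the largest noise scale, trials with a larger Bayes factor are inferred more reliably, and the bin edges at 2, 6, and 10 match the conventional Kass–Raftery thresholds for interpreting \(2\log(\mathrm{BF})\). As a rough aid to reading the bins (a sketch assuming exactly two candidate models with equal prior probability; posterior_prob is an illustrative helper, not part of bmlingam):

import numpy as np

def posterior_prob(two_log_bf):
    # With two models and equal priors, p(selected model | data)
    # = BF / (1 + BF), where BF = exp(2log(BF) / 2).
    bf = np.exp(two_log_bf / 2.0)
    return bf / (1.0 + bf)

for x in [2.0, 6.0, 10.0]:
    print('2log(bf) = {:4.1f} -> p = {:.3f}'.format(x, posterior_prob(x)))
# 2log(bf) =  2.0 -> p = 0.731
# 2log(bf) =  6.0 -> p = 0.953
# 2log(bf) = 10.0 -> p = 0.993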