MDTerp module

MDTerp.neighborhood.py – Function for generating perturbed neighborhood samples.

generate_neighborhood(save_dir, numeric_dict, angle_dict, sin_cos_dict, np_dat, index, seed, num_samples, selected_features, periodicity_upper=3.141592653589793, periodicity_lower=-3.141592653589793)

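Function for generating a perturbed neighborhood around a selected sample. The perturbed samples are saved under save_dir, and the feature indices and names are returned.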

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_dir | str | Location to save MDTerp results. | required |
| numeric_dict | dict | Python dictionary in which each key is the name of a numeric (non-periodic) feature. Each value should be a single-element list holding the index of the corresponding column in np_dat. | required |
| angle_dict | dict | Python dictionary in which each key is the name of an angular feature in [-pi, pi]. Each value should be a single-element list holding the index of the corresponding column in np_dat. | required |
| sin_cos_dict | dict | Python dictionary in which each key is the name of an angular feature. Each value should be a two-element list holding the sine and cosine column indices in np_dat, in that order. | required |
| np_dat | np.ndarray | Numpy 2D array containing training data for the black-box model. Samples along rows and features along columns. | required |
| index | int | Row/sample of np_dat to analyze. | required |
| seed | int | Random seed. | required |
| num_samples | int | Size of the generated perturbed neighborhood. | required |
| selected_features | np.ndarray | If an empty array, all features/columns are perturbed. Otherwise, an array of integers giving the subset of features to perturb. | required |
| periodicity_upper | float | Upper bound of the periodic range of the angular features (Default: numpy.pi). | 3.141592653589793 |
| periodicity_lower | float | Lower bound of the periodic range of the angular features (Default: -numpy.pi). | -3.141592653589793 |

Returns:

| Type | Description |
| --- | --- |
| list | List of four np.ndarray holding the indices of the numeric, angular, sine, and cosine features, respectively. |
| list | List of the combined names of the features. |
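
As an illustration of the three feature dictionaries and the returned index lists, the following minimal sketch builds the dictionaries for a hypothetical five-column dataset and collects indices and names in the same order as the source listing on this page. The feature names (`distance`, `phi`, `psi`) and the column layout are assumptions made up for this example; they are not part of MDTerp, and plain Python lists stand in for the NumPy arrays.

```python
# Hypothetical column layout (an assumption for illustration only):
#   0: distance, 1: phi (raw angle), 2: unused, 3: sin(psi), 4: cos(psi)
n_columns = 5

numeric_dict = {"distance": [0]}   # one column index per numeric feature
angle_dict = {"phi": [1]}          # one column index per raw angle in [-pi, pi]
sin_cos_dict = {"psi": [3, 4]}     # [sine column, cosine column]

numeric_indices, angle_indices = [], []
sin_indices, cos_indices = [], []
indices_names = []

# Names are collected in the order numeric, angular, sine/cosine.
for name, cols in numeric_dict.items():
    assert 0 <= cols[0] < n_columns, "Invalid numeric index"
    numeric_indices.append(cols[0])
    indices_names.append(name)

for name, cols in angle_dict.items():
    assert 0 <= cols[0] < n_columns, "Invalid angle index"
    angle_indices.append(cols[0])
    indices_names.append(name)

for name, cols in sin_cos_dict.items():
    assert 0 <= cols[0] < n_columns, "Invalid sin index"
    assert 0 <= cols[1] < n_columns, "Invalid cos index"
    sin_indices.append(cols[0])
    cos_indices.append(cols[1])
    indices_names.append(name)

print([numeric_indices, angle_indices, sin_indices, cos_indices], indices_names)
```

Note that a sine/cosine pair contributes two column indices but only one name, so the list of names can be shorter than the total number of indexed columns.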

Source code in MDTerp/neighborhood.py
# Imports inferred from the names used below (sst is assumed to be scipy.stats).
import os

import numpy as np
import scipy.stats as sst


def generate_neighborhood(
    save_dir: str,
    numeric_dict: dict,
    angle_dict: dict,
    sin_cos_dict: dict,
    np_dat: np.ndarray,
    index: int,
    seed: int,
    num_samples: int,
    selected_features: np.ndarray,
    periodicity_upper: float = np.pi,
    periodicity_lower: float = -np.pi,
):
    """
    Function for generating a perturbed neighborhood around a selected sample.

    Args:
        save_dir (str): Location to save MDTerp results.
        numeric_dict (dict): Python dictionary in which each key is the name of a numeric (non-periodic) feature. Each value should be a single-element list holding the index of the corresponding column in np_dat.
        angle_dict (dict): Python dictionary in which each key is the name of an angular feature in [-pi, pi]. Each value should be a single-element list holding the index of the corresponding column in np_dat.
        sin_cos_dict (dict): Python dictionary in which each key is the name of an angular feature. Each value should be a two-element list holding the sine and cosine column indices in np_dat, in that order.
        np_dat (np.ndarray): Numpy 2D array containing training data for the black-box model. Samples along rows and features along columns.
        index (int): Row/sample of np_dat to analyze.
        seed (int): Random seed.
        num_samples (int): Size of the generated perturbed neighborhood.
        selected_features (np.ndarray): If an empty array, perturbs all the features/columns. Otherwise, an array of integers representing the subset of features to perturb.
        periodicity_upper (float): Upper bound of the periodic range of the angular features (Default: numpy.pi).
        periodicity_lower (float): Lower bound of the periodic range of the angular features (Default: -numpy.pi).
    Returns:
        list: List of np.ndarray holding the indices of the numeric, angular, sine, and cosine features, respectively.
        list: List of the combined names of the features.
    """

    numeric_indices = []
    angle_indices = []
    sin_indices = []
    cos_indices = []

    indices_names = []

    for i in numeric_dict:
        numeric_indices.append(numeric_dict[i])
        indices_names.append(i)
        assert numeric_dict[i][0] in np.arange(np_dat.shape[1]), 'Invalid numeric index'
    for i in angle_dict:
        angle_indices.append(angle_dict[i])
        indices_names.append(i)
        assert angle_dict[i][0] in np.arange(np_dat.shape[1]), 'Invalid angle index'
    for i in sin_cos_dict:
        sin_indices.append(sin_cos_dict[i][0])
        cos_indices.append(sin_cos_dict[i][1])
        indices_names.append(i)
        assert sin_cos_dict[i][0] in np.arange(np_dat.shape[1]), 'Invalid sin index'
        assert sin_cos_dict[i][1] in np.arange(np_dat.shape[1]), 'Invalid cos index'

    numeric_indices = np.array(numeric_indices).flatten()
    angle_indices = np.array(angle_indices).flatten()
    sin_indices = np.array(sin_indices).flatten()
    cos_indices = np.array(cos_indices).flatten()

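    # Per-feature spread: circular standard deviation for raw angles,
    # ordinary standard deviation for every other column.
    std_master = []
    for i in range(np_dat.shape[1]):
        if i not in angle_indices:
            std_master.append(np.std(np_dat[:,i]))
        else:
            # Use the documented periodicity arguments rather than hard-coded +/- pi.
            std_master.append(sst.circstd(np_dat[:,i], high = periodicity_upper, low = periodicity_lower))

    std_master = np.array(std_master).flatten()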
    if selected_features.shape[0]==0:
        # No subset given: perturb every feature.
        save_directory = save_dir + 'DATA'
        os.makedirs(save_directory, exist_ok = True)
        make_prediction_data, TERP_data = perturbation(np_dat, std_master, num_samples, index, seed)
    else:
        # Perturb only the selected columns; every other column keeps the
        # value of the analyzed sample in all generated rows.
        save_directory = save_dir + 'DATA_2'
        os.makedirs(save_directory, exist_ok = True)
        make_prediction_trimmed, TERP_data = perturbation(np_dat[:, selected_features], std_master[selected_features], num_samples, index, seed)
        make_prediction_data = np.ones((num_samples, np_dat.shape[1]))*np_dat[index,:]
        make_prediction_data[:, selected_features] = make_prediction_trimmed

    np.save(save_directory + '/make_prediction.npy', make_prediction_data)
    np.save(save_directory + '/TERP_dat.npy', TERP_data)

    return [numeric_indices, angle_indices, sin_indices, cos_indices], indices_names
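
The source above uses a circular standard deviation for raw angular columns and an ordinary standard deviation for everything else. The sketch below is a dependency-free illustration of why that distinction matters for angles near the ±pi boundary. The helper `circular_std` is a hypothetical name written for this example; it implements the textbook definition sqrt(-2 ln R), where R is the mean resultant length, which is the definition `scipy.stats.circstd` documents for its default settings on a 2·pi-wide range. The sample angles are made up.

```python
import math
import statistics


def circular_std(angles):
    """Circular standard deviation sqrt(-2 ln R) for angles in radians."""
    n = len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / n
    mean_cos = sum(math.cos(a) for a in angles) / n
    # Mean resultant length R; clamp to 1.0 to guard against rounding.
    resultant = min(1.0, math.hypot(mean_sin, mean_cos))
    return math.sqrt(-2.0 * math.log(resultant))


# Made-up angles tightly clustered around the +pi / -pi boundary.
angles = [3.10, 3.12, 3.14, -3.14, -3.12, -3.10]

linear = statistics.pstdev(angles)   # population std, the np.std convention
circular = circular_std(angles)

print(f"linear std:   {linear:.3f}")
print(f"circular std: {circular:.3f}")
```

The ordinary standard deviation treats the two halves of the cluster as far apart and reports a spread of roughly pi, while the circular version reports the small spread that the points actually have on the circle. Scaling perturbations by the former would push angular features far outside the local neighborhood.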

perturbation(data, std, num_samples, index, seed)

Function for generating perturbed samples.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | np.ndarray | Numpy 2D array containing training data for the black-box model. Samples along rows and features along columns. | required |
| std | np.ndarray | Numpy 1D array containing the standard deviation of each feature in data. | required |
| num_samples | int | Size of the generated perturbed neighborhood. | required |
| index | int | Row/sample of data to analyze. | required |
| seed | int | Random seed. | required |

Returns:

| Type | Description |
| --- | --- |
| np.ndarray | Perturbed samples to be passed to the black-box model to fetch state probabilities. |
| np.ndarray | Perturbed samples for constructing linear models. |

Source code in MDTerp/neighborhood.py
def perturbation(data: np.ndarray, std: np.ndarray, num_samples: int, index: int, seed: int):
    """
    Function for generating perturbed samples.

    Args:
        data (np.ndarray): Numpy 2D array containing training data for the black-box model. Samples along rows and features along columns.
        std (np.ndarray): Numpy 1D array containing the standard deviation of each feature in data.
        num_samples (int): Size of the generated perturbed neighborhood.
        index (int): Row/sample of data to analyze.
        seed (int): Random seed.
    Returns:
        np.ndarray: Perturbed samples to be passed to the black-box model to fetch state probabilities.
        np.ndarray: Perturbed samples for constructing linear models.
    """
    # Seed before any random draw so that the perturbation mask is reproducible too.
    np.random.seed(seed)

    make_prediction_data = np.zeros((num_samples, data.shape[1]))
    TERP_data = np.zeros((num_samples, data.shape[1]))

    # Binary mask: 1 keeps the original value, 0 perturbs the feature.
    perturb = np.random.randint(0, 2, num_samples * data.shape[1]).reshape((num_samples, data.shape[1]))
    # The first row is always the unperturbed sample.
    perturb[0,:] = 1

    for i in range(num_samples):
        for j in range(data.shape[1]):
            if perturb[i,j] == 1:
                make_prediction_data[i,j] = data[index,j]
            elif perturb[i,j] == 0:
                # Shift by a standard-normal draw scaled by the feature's spread;
                # TERP_data stores the unscaled draw.
                rand_data = np.random.normal(0, 1)
                make_prediction_data[i,j] = data[index,j] + std[j]*rand_data
                TERP_data[i,j] = rand_data

    return make_prediction_data, TERP_data
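
The perturbation scheme above can be summarized as: for each generated row, each feature is either kept at the value of the analyzed sample or shifted by a standard-normal draw scaled by that feature's standard deviation, and the first row is always left unperturbed. The following is a minimal, dependency-free sketch of that idea, not MDTerp's implementation. It uses Python's `random` module instead of NumPy, so its random stream differs from the listing above. The function name `perturb_sample` and the toy inputs are assumptions made for this example.

```python
import random


def perturb_sample(x0, std, num_samples, seed):
    """Return (make_prediction, terp) rows for one reference sample x0."""
    rng = random.Random(seed)  # seeded before any draw, mask included
    make_prediction, terp = [], []
    for i in range(num_samples):
        pred_row, terp_row = [], []
        for value, sigma in zip(x0, std):
            # Row 0 is never perturbed; otherwise keep or perturb with equal odds.
            keep = i == 0 or rng.randint(0, 1) == 1
            z = 0.0 if keep else rng.gauss(0.0, 1.0)
            pred_row.append(value + sigma * z)  # input for the black-box model
            terp_row.append(z)                  # unscaled draw, 0 if unchanged
        make_prediction.append(pred_row)
        terp.append(terp_row)
    return make_prediction, terp


# Toy reference sample and per-feature spreads (made up for illustration).
x0 = [1.0, -2.0, 0.5]
std = [0.1, 0.2, 0.3]

pred_a, terp_a = perturb_sample(x0, std, num_samples=4, seed=0)
pred_b, terp_b = perturb_sample(x0, std, num_samples=4, seed=0)

print(pred_a == pred_b and terp_a == terp_b)  # same seed, same neighborhood
```

Because the generator is seeded before the keep/perturb mask is drawn, two calls with the same seed produce identical neighborhoods. The two outputs are related elementwise by `make_prediction[i][j] = x0[j] + std[j] * terp[i][j]`.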