Modules¶

Model Architectures¶

This module contains all the functions that define various network architectures.

Functions:

BPNet: The network architecture for BPNet as described in
the paper: https://www.biorxiv.org/content/10.1101/737981v1.full.pdf

BPNetSumAll: A variation of BPNet in which each conv layer is added to all subsequent conv layers. In the paper version, each conv layer is added only to the next conv layer rather than to ALL subsequent conv layers.

basepairmodels.common.model_archs.BPNet(input_seq_len, output_len, num_bias_profiles, filters=64, num_dilation_layers=9, conv1_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)¶

BPNet model architecture as described in the BPNet paper https://www.biorxiv.org/content/10.1101/737981v1.full.pdf

Parameters

input_seq_len (int) – The length of input DNA sequence
output_len (int) – The length of the profile output
num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.
filters (int) – The number of filters in each convolutional layer of BPNet
num_dilation_layers (int) – The number of layers with dilated convolutions
conv1_kernel_size (int) – The kernel size for the first 1D convolution
dilation_kernel_size (int) – The kernel size in each of the dilation layers
profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network
num_tasks (int) – The number of output profile tracks

Returns

keras.models.Model

basepairmodels.common.model_archs.BPNet1000d8(input_seq_len=2114, output_len=1000, num_bias_profiles=2, filters=64, num_dilation_layers=8, conv1_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)¶

BPNet model architecture as described in the BPNet paper https://www.biorxiv.org/content/10.1101/737981v1.full.pdf, with defaults for a 2114 bp input, a 1000 bp profile output and 8 dilation layers

Parameters

input_seq_len (int) – The length of input DNA sequence
output_len (int) – The length of the profile output
num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.
filters (int) – The number of filters in each convolutional layer of BPNet
num_dilation_layers (int) – The number of layers with dilated convolutions
conv1_kernel_size (int) – The kernel size for the first 1D convolution
dilation_kernel_size (int) – The kernel size in each of the dilation layers
profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network
num_tasks (int) – The number of output profile tracks

Returns

keras.models.Model
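The default input_seq_len=2114 and output_len=1000 can be sanity-checked with simple length arithmetic. The sketch below rests on two assumptions that this page does not state: every convolution is unpadded ('valid'), and the i-th dilated layer uses a dilation rate of 2**i. The helper name is hypothetical and is not part of basepairmodels.

```python
def valid_conv_output_len(input_seq_len, num_dilation_layers=8,
                          conv1_kernel_size=21, dilation_kernel_size=3,
                          profile_kernel_size=75):
    """Length left after a stack of unpadded ('valid') 1D convolutions.

    Assumption (not stated in this documentation): dilated layer i uses
    dilation rate 2**i, for i = 1 .. num_dilation_layers.
    """
    # First convolution trims (kernel_size - 1) positions in total
    length = input_seq_len - (conv1_kernel_size - 1)
    # Each dilated convolution trims (kernel_size - 1) * dilation_rate
    for i in range(1, num_dilation_layers + 1):
        length -= (dilation_kernel_size - 1) * 2 ** i
    # First convolution of the profile head
    length -= profile_kernel_size - 1
    return length


print(valid_conv_output_len(2114))  # 1000 under the assumptions above
```

Under these assumptions the documented defaults are mutually consistent: 2114 - 20 - 1020 - 74 = 1000.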

basepairmodels.common.model_archs.BPNet500d7(input_seq_len, output_len, num_bias_profiles, filters=25, num_dilation_layers=7, conv_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)¶

BPNet model architecture with output size of 500 and a receptive field of 623

Parameters

input_seq_len (int) – The length of input DNA sequence
output_len (int) – The length of the profile output
num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.
filters (int) – The number of filters in each convolutional layer of BPNet
num_dilation_layers (int) – The number of layers with dilated convolutions
conv_kernel_size (int) – The kernel size for the first 1D convolution
dilation_kernel_size (int) – The kernel size in each of the dilation layers
profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network
num_tasks (int) – The number of output profile tracks

Returns

keras.models.Model

basepairmodels.common.model_archs.BPNetSumAll(input_seq_len, output_len, num_bias_profiles, filters=64, num_dilation_layers=9, conv1_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)¶

A variation of BPNet in which each convolutional layer is added to all subsequent convolutional layers. In the paper version each conv layer is added only to the subsequent conv layer rather than to ALL subsequent conv layers.

Parameters

input_seq_len (int) – The length of input DNA sequence
output_len (int) – The length of the profile output
num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.
filters (int) – The number of filters in each convolutional layer of BPNet
num_dilation_layers (int) – The number of layers with dilated convolutions
conv1_kernel_size (int) – The kernel size for the first 1D convolution
dilation_kernel_size (int) – The kernel size in each of the dilation layers
profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network
num_tasks (int) – The number of output profile tracks

Returns

keras.models.Model
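The difference between the two summation schemes can be illustrated with a toy example. This is only one possible reading of the description above, not the library's code: toy_conv is a hypothetical scalar stand-in for a dilated convolution, and the cropping that real layers of different lengths would need is ignored.

```python
def toy_conv(x, i):
    """Hypothetical stand-in for the i-th dilated convolution."""
    return 0.5 * x + i


def stack_paper_style(x0, num_layers):
    # Each conv output is summed only with the layer directly before it
    layers = [x0]
    for i in range(1, num_layers + 1):
        conv_out = toy_conv(layers[-1], i)
        layers.append(conv_out + layers[-1])
    return layers[-1]


def stack_sum_all(x0, num_layers):
    # Each conv output is summed with every earlier layer
    layers = [x0]
    for i in range(1, num_layers + 1):
        conv_out = toy_conv(layers[-1], i)
        layers.append(conv_out + sum(layers))
    return layers[-1]


print(stack_paper_style(1.0, 3))  # 11.625
print(stack_sum_all(1.0, 3))      # 16.625
```

The two stacks agree after the first layer and diverge from the second layer on, which is where "all subsequent layers" starts to include more than the immediate predecessor.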

Stats¶

This module contains functions to

basepairmodels.common.stats.get_recommended_counts_loss_weight(input_bigWigs, peaks, alpha=1.0)¶

This function computes the hyperparameter lambda (l) as suggested in the BPNet paper on pg. 28 https://www.biorxiv.org/content/10.1101/737981v2.full.pdf

If lambda l is set to 1/2 * n_obs, where n_obs is the average number of total counts in the training set, the profile loss and the total counts loss will be given roughly equal weight. We can use the alpha parameter to upweight the profile predictions relative to the total count predictions as shown below:

l = (alpha / 2) * n_obs

Parameters

input_bigWigs (list) – list of bigWig files with assay signal. n_obs will be computed as a global average across all the input bigWigs
peaks (list) – list of 3-column pandas DataFrames, with ‘chrom’, ‘start’ and ‘end’ columns, one for each input bigWig
alpha (float) – parameter to scale profile loss relative to the counts loss. A value < 1.0 will upweight the profile loss

Returns

counts loss weight (lambda)

Return type

float
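The formula l = (alpha / 2) * n_obs can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the helper name is hypothetical, and it assumes the total counts per peak region have already been extracted from the bigWigs into a plain list.

```python
def counts_loss_weight(total_counts_per_region, alpha=1.0):
    """Return lambda = (alpha / 2) * n_obs.

    total_counts_per_region: one total count per training region
    (hypothetical input; reading the bigWigs is not shown here).
    """
    # n_obs: average number of total counts across the regions
    n_obs = sum(total_counts_per_region) / len(total_counts_per_region)
    return (alpha / 2.0) * n_obs


toy_counts = [100, 200, 300]  # toy values, not real data
print(counts_loss_weight(toy_counts))             # 100.0
print(counts_loss_weight(toy_counts, alpha=0.5))  # 50.0
```

With alpha=0.5 the counts loss weight is halved, so the profile loss carries relatively more weight.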

Bigwig Utils¶

basepairmodels.cli.bigwigutils.prepare_BPNet_output_files(tasks, output_dir, chroms, chrom_sizes, model_tag, exponentiate_counts, other_tags=[])¶

Prepare output bigWig files for writing BPNet predictions:

a. Construct appropriate filenames
b. Add headers to each bigWig file

Parameters

tasks (collections.OrderedDict) – nested python dictionary of tasks. The predictions of each task will be written to a separate bigWig
output_dir (str) – destination directory where the output files will be created
chroms (list) – list of chromosomes for which the bigWigs will contain predictions
chrom_sizes (str) – the path to the chromosome sizes file. The chrom size is used in constructing the header of the bigWig file
model_tag (str) – the unique tag of the model that is generating the predictions
exponentiate_counts (boolean) – True if counts predictions are to be exponentiated before writing to the bigWigs. This will determine if the counts bigWigs have the ‘exponentiated’ tag in the filename
other_tags (list) – list of additional tags to be added as suffix to the filenames

Returns

(list of profile bigWig file objects, list of counts bigWig file objects)

Return type

tuple
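The header step relies on the chromosome sizes file. The sketch below shows one way to turn such a file into a header list for the requested chromosomes. It assumes the usual two-column, whitespace-separated format (name, size); the helper name is hypothetical. A list of (chrom, size) tuples is the shape that pyBigWig's addHeader accepts, but whether basepairmodels uses pyBigWig here is an assumption.

```python
import io


def bigwig_header(chrom_sizes_file, chroms):
    """Return [(chrom, size), ...] for the requested chromosomes, in order."""
    sizes = {}
    for line in chrom_sizes_file:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    # Keep only the chromosomes that will contain predictions
    return [(chrom, sizes[chrom]) for chrom in chroms]


# Toy chromosome sizes, not real genome values
toy_sizes = io.StringIO("chr1\t1000\nchr2\t800\nchrM\t50\n")
header = bigwig_header(toy_sizes, ["chr1", "chr2"])
print(header)  # [('chr1', 1000), ('chr2', 800)]
```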

basepairmodels.cli.bigwigutils.write_BPNet_predictions(profile_predictions, counts_predictions, profile_fileobjs, counts_fileobjs, coordinates, tasks, exponentiate_counts, output_window_size)¶

write one batch of BPNet predictions to bigWig files

Parameters

profile_predictions (np.ndarray) – 3 dimensional numpy array of size (batch_size, output_len, num_tasks*num_strands)
counts_predictions (np.ndarray) – 2 dimensional numpy array of size (batch_size, num_tasks*num_strands)
profile_fileobjs (list) – list of file objects that have been opened to write profile predictions
counts_fileobjs (list) – list of file objects that have been opened to write counts predictions
coordinates (list) – list of (chrom, start, end) for each prediction
tasks (collections.OrderedDict) – nested python dictionary of tasks
exponentiate_counts (boolean) – True if counts predictions are to be exponentiated before writing to the bigWigs
output_window_size (int) – size of the central window of the output
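The roles of output_window_size and exponentiate_counts can be sketched for a single prediction. This is a hedged illustration under stated assumptions, not the library's code: it assumes each coordinate spans the full output_len, that the window is centred on it, and that counts predictions are log counts. Both helper names are hypothetical.

```python
import math


def central_window(values, coordinate, output_window_size):
    """Crop per-position values to the central window and shift coordinates."""
    chrom, start, _end = coordinate
    offset = (len(values) - output_window_size) // 2
    cropped = values[offset:offset + output_window_size]
    return cropped, (chrom, start + offset, start + offset + output_window_size)


def counts_value(counts_prediction, exponentiate_counts):
    """Return exp(prediction) if requested, else the raw prediction."""
    return math.exp(counts_prediction) if exponentiate_counts else counts_prediction


profile = list(range(10))  # toy profile with output_len = 10
cropped, window = central_window(profile, ("chr1", 100, 110), 4)
print(cropped, window)  # [3, 4, 5, 6] ('chr1', 103, 107)
```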

Loss Functions¶

class basepairmodels.cli.losses.MultichannelMultinomialNLL(n)¶

Class to compute combined loss from ‘n’ tasks

Parameters

n (int) – the number of channels / tasks

basepairmodels.cli.losses.multinomial_nll(true_counts, logits)¶

Compute the multinomial negative log-likelihood

Parameters

true_counts – observed count values
logits – predicted logits values
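For reference, the multinomial negative log-likelihood for a single example can be written out with the standard library alone. This is a sketch of the textbook formula, not the library's tensor implementation. The function names are hypothetical, and combining channels by summing the per-channel losses is an assumption about what "combined loss" means.

```python
import math


def log_softmax(logits):
    """Numerically stable log of softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def multinomial_nll_single(true_counts, logits):
    """-log P(true_counts | N = sum(true_counts), p = softmax(logits))."""
    log_p = log_softmax(logits)
    total = sum(true_counts)
    # log of the multinomial coefficient N! / (x_1! * ... * x_k!)
    log_coeff = math.lgamma(total + 1) - sum(math.lgamma(c + 1) for c in true_counts)
    log_prob = log_coeff + sum(c * lp for c, lp in zip(true_counts, log_p))
    return -log_prob


def multichannel_nll(counts_by_channel, logits_by_channel):
    """Assumed combination rule: sum of the per-channel losses."""
    return sum(multinomial_nll_single(c, l)
               for c, l in zip(counts_by_channel, logits_by_channel))


# Two positions, one read each, uniform logits: the loss equals log(2)
print(multinomial_nll_single([1, 1], [0.0, 0.0]))
```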