Modules

Model Architectures

This module contains all the functions that define the various network architectures.

Functions:

BPNet: The network architecture for BPNet as described in the paper: https://www.biorxiv.org/content/10.1101/737981v1.full.pdf

BPNetSumAll: A variation of BPNet in which each conv layer is added to all subsequent conv layers. In the paper version each conv layer is added only to the subsequent conv layer rather than to ALL subsequent conv layers.

basepairmodels.common.model_archs.BPNet(input_seq_len, output_len, num_bias_profiles, filters=64, num_dilation_layers=9, conv1_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)

BPNet model architecture as described in the BPNet paper https://www.biorxiv.org/content/10.1101/737981v1.full.pdf

Parameters
  • input_seq_len (int) – The length of input DNA sequence

  • output_len (int) – The length of the profile output

  • num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.

  • filters (int) – The number of filters in each convolutional layer of BPNet

  • num_dilation_layers (int) – The number of layers with dilated convolutions

  • conv1_kernel_size (int) – The kernel size for the first 1D convolution

  • dilation_kernel_size (int) – The kernel size in each of the dilation layers

  • profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network

  • num_tasks (int) – The number of output profile tracks

Returns

keras.models.Model

basepairmodels.common.model_archs.BPNet1000d8(input_seq_len=2114, output_len=1000, num_bias_profiles=2, filters=64, num_dilation_layers=8, conv1_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)

BPNet model architecture as described in the BPNet paper https://www.biorxiv.org/content/10.1101/737981v1.full.pdf

Parameters
  • input_seq_len (int) – The length of input DNA sequence

  • output_len (int) – The length of the profile output

  • num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.

  • filters (int) – The number of filters in each convolutional layer of BPNet

  • num_dilation_layers (int) – The number of layers with dilated convolutions

  • conv1_kernel_size (int) – The kernel size for the first 1D convolution

  • dilation_kernel_size (int) – The kernel size in each of the dilation layers

  • profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network

  • num_tasks (int) – The number of output profile tracks

Returns

keras.models.Model
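The defaults input_seq_len=2114 and output_len=1000 can be checked with a little length arithmetic. The sketch below is not part of the library: the helper names are hypothetical, and it assumes that every convolution is unpadded ("valid") with stride 1 and that dilation layer i (counting from 1) uses a dilation rate of 2**i. Under those assumptions the default kernel sizes shrink a 2114 bp input to exactly 1000 positions.

```python
def valid_conv_output_len(input_len, kernel_size, dilation=1):
    """Length after a 1D convolution with no padding and stride 1."""
    return input_len - dilation * (kernel_size - 1)


def bpnet_output_len(input_seq_len, num_dilation_layers=8,
                     conv1_kernel_size=21, dilation_kernel_size=3,
                     profile_kernel_size=75):
    """Hypothetical helper: profile length implied by the kernel sizes."""
    # First convolution (assumed unpadded, stride 1).
    length = valid_conv_output_len(input_seq_len, conv1_kernel_size)
    # Assumption: dilation layer i (1-based) uses dilation rate 2**i.
    for i in range(1, num_dilation_layers + 1):
        length = valid_conv_output_len(length, dilation_kernel_size, 2 ** i)
    # First convolution of the profile head.
    return valid_conv_output_len(length, profile_kernel_size)


print(bpnet_output_len(2114))  # 1000
```

The first convolution removes 20 positions, the eight dilated layers remove 4 + 8 + ... + 512 = 1020, and the profile head convolution removes 74, for a total of 1114.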

basepairmodels.common.model_archs.BPNet500d7(input_seq_len, output_len, num_bias_profiles, filters=25, num_dilation_layers=7, conv_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)

BPNet model architecture with output size of 500 and a receptive field of 623

Parameters
  • input_seq_len (int) – The length of input DNA sequence

  • output_len (int) – The length of the profile output

  • num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.

  • filters (int) – The number of filters in each convolutional layer of BPNet

  • num_dilation_layers (int) – The number of layers with dilated convolutions

  • conv1_kernel_size (int) – The kernel size for the first 1D convolution

  • dilation_kernel_size (int) – The kernel size in each of the dilation layers

  • profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network

  • num_tasks (int) – The number of output profile tracks

Returns

keras.models.Model

basepairmodels.common.model_archs.BPNetSumAll(input_seq_len, output_len, num_bias_profiles, filters=64, num_dilation_layers=9, conv1_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)

A variation of BPNet in which each convolutional layer is added to all subsequent convolutional layers. In the paper version each conv layer is added only to the subsequent conv layer rather than to ALL subsequent conv layers.

Parameters
  • input_seq_len (int) – The length of input DNA sequence

  • output_len (int) – The length of the profile output

  • num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.

  • filters (int) – The number of filters in each convolutional layer of BPNet

  • num_dilation_layers (int) – The number of layers with dilated convolutions

  • conv1_kernel_size (int) – The kernel size for the first 1D convolution

  • dilation_kernel_size (int) – The kernel size in each of the dilation layers

  • profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network

  • num_tasks (int) – The number of output profile tracks

Returns

keras.models.Model

Stats

This module contains functions to

This function computes the hyperparameter lambda (l) as suggested in the BPNet paper on pg. 28: https://www.biorxiv.org/content/10.1101/737981v2.full.pdf

If lambda (l) is set to 1/2 * n_obs, where n_obs is the average number of total counts in the training set, the profile loss and the total counts loss are given roughly equal weight. The alpha parameter can be used to upweight the profile predictions relative to the total count predictions, as shown below:

l = (alpha / 2) * n_obs

Parameters
  • input_bigWigs (list) – list of bigWig files with assay signal. n_obs will be computed as a global average across all the input bigWigs

  • peaks (list) – list of 3-column pandas DataFrames, with ‘chrom’, ‘start’ and ‘end’ columns, one corresponding to each input bigWig

  • alpha (float) – parameter to scale profile loss relative to the counts loss. A value < 1.0 will upweight the profile loss

Returns

float: counts loss weight (lambda)
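The formula can be illustrated with a minimal sketch. The function name and its input are hypothetical: rather than reading bigWig files and peak tables as the library function does, this version assumes the total counts for each training region have already been extracted into a plain list.

```python
def counts_loss_weight(total_counts_per_region, alpha=1.0):
    """Return l = (alpha / 2) * n_obs for a list of per-region total counts."""
    if not total_counts_per_region:
        raise ValueError("need at least one region")
    # n_obs: average number of total counts per region.
    n_obs = sum(total_counts_per_region) / len(total_counts_per_region)
    return (alpha / 2.0) * n_obs


example_counts = [120.0, 80.0, 100.0]  # made-up values, n_obs = 100
print(counts_loss_weight(example_counts, alpha=1.0))  # 50.0
print(counts_loss_weight(example_counts, alpha=0.5))  # 25.0
```

Halving alpha halves the counts loss weight, which is what shifts the balance toward the profile loss.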

Bigwig Utils

basepairmodels.cli.bigwigutils.prepare_BPNet_output_files(tasks, output_dir, chroms, chrom_sizes, model_tag, exponentiate_counts, other_tags=[])

Prepare output bigWig files for writing BPNet predictions:

a. Construct appropriate filenames

b. Add headers to each bigWig file

Parameters
  • tasks (collections.OrderedDict) – nested python dictionary of tasks. The predictions of each task will be written to a separate bigWig

  • output_dir (str) – destination directory where the output files will be created

  • chroms (list) – list of chromosomes for which the bigWigs will contain predictions

  • chrom_sizes (str) – the path to the chromosome sizes file. The chrom size is used in constructing the header of the bigWig file

  • model_tag (str) – the unique tag of the model that is generating the predictions

  • exponentiate_counts (boolean) – True if counts predictions are to be exponentiated before writing to the bigWigs. This will determine if the counts bigWigs have the ‘exponentiated’ tag in the filename

  • other_tags (list) – list of additional tags to be added as suffix to the filenames

Returns

(list of profile bigWig file objects, list of counts bigWig file objects)

Return type

tuple

basepairmodels.cli.bigwigutils.write_BPNet_predictions(profile_predictions, counts_predictions, profile_fileobjs, counts_fileobjs, coordinates, tasks, exponentiate_counts, output_window_size)

Write one batch of BPNet predictions to bigWig files

Parameters
  • profile_predictions (np.ndarray) – 3 dimensional numpy array of size (batch_size, output_len, num_tasks*num_strands)

  • counts_predictions (np.ndarray) – 2 dimensional numpy array of size (batch_size, num_tasks*num_strands)

  • profile_fileobjs (list) – list of file objects that have been opened to write profile predictions

  • counts_fileobjs (list) – list of file objects that have been opened to write counts predictions

  • coordinates (list) – list of (chrom, start, end) for each prediction

  • tasks (collections.OrderedDict) – nested python dictionary of tasks

  • exponentiate_counts (boolean) – True if counts predictions are to be exponentiated before writing to the bigWigs

  • output_window_size (int) – size of the central window of the output

Loss Functions

class basepairmodels.cli.losses.MultichannelMultinomialNLL(n)

Class to compute combined loss from ‘n’ tasks

Parameters

n (int) – the number of channels / tasks

basepairmodels.cli.losses.multinomial_nll(true_counts, logits)

Compute the multinomial negative log-likelihood

Parameters
  • true_counts – observed count values

  • logits – predicted logits values
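For reference, the multinomial negative log-likelihood of a single count vector can be written out in plain Python. This is a sketch of the textbook quantity, not the library's implementation: the function name is hypothetical, it assumes the logits are converted to probabilities with a softmax, and it handles one example rather than a batch of tensors.

```python
import math


def multinomial_nll_single(true_counts, logits):
    """Negative log-likelihood of counts under Multinomial(N, softmax(logits))."""
    n = sum(true_counts)
    # Numerically stable log-softmax: subtract the max logit before exp.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_norm for x in logits]
    # log of the multinomial coefficient N! / (k_1! * ... * k_L!).
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in true_counts)
    log_lik = log_coeff + sum(k * lp for k, lp in zip(true_counts, log_probs))
    return -log_lik


# Two positions with equal logits and one read each: the probability is
# 2!/(1! * 1!) * 0.5 * 0.5 = 0.5, so the NLL equals log(2).
print(multinomial_nll_single([1, 1], [0.0, 0.0]))
```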