Modules¶
Model Architectures¶
This module contains all the functions that define various network architectures
Functions:
- BPNet: The network architecture for BPNet as described in
the paper: https://www.biorxiv.org/content/10.1101/737981v1.full.pdf
- BPNetSumAll: A variation of BPNet in which each conv layer
is added to all subsequent conv layers. In the paper version each conv layer is added only to the subsequent conv layer rather than to ALL subsequent conv layers.
-
basepairmodels.common.model_archs.BPNet(input_seq_len, output_len, num_bias_profiles, filters=64, num_dilation_layers=9, conv1_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)¶ BPNet model architecture as described in the BPNet paper https://www.biorxiv.org/content/10.1101/737981v1.full.pdf
- Parameters
input_seq_len (int) – The length of input DNA sequence
output_len (int) – The length of the profile output
num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.
filters (int) – The number of filters in each convolutional layer of BPNet
num_dilation_layers (int) – The number of layers with dilated convolutions
conv1_kernel_size (int) – The kernel size for the first 1D convolution
dilation_kernel_size (int) – The kernel size in each of the dilation layers
profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network
num_tasks (int) – The number of output profile tracks
- Returns
keras.models.Model
-
basepairmodels.common.model_archs.BPNet1000d8(input_seq_len=2114, output_len=1000, num_bias_profiles=2, filters=64, num_dilation_layers=8, conv1_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)¶ BPNet model architecture as described in the BPNet paper https://www.biorxiv.org/content/10.1101/737981v1.full.pdf
- Parameters
input_seq_len (int) – The length of input DNA sequence
output_len (int) – The length of the profile output
num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.
filters (int) – The number of filters in each convolutional layer of BPNet
num_dilation_layers (int) – The number of layers with dilated convolutions
conv1_kernel_size (int) – The kernel size for the first 1D convolution
dilation_kernel_size (int) – The kernel size in each of the dilation layers
profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network
num_tasks (int) – The number of output profile tracks
- Returns
keras.models.Model
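The default lengths of BPNet1000d8 (input 2114, output 1000, 8 dilation layers) are consistent with unpadded ("valid") convolutions whose dilation rate doubles at every layer, starting at 2. The sketch below only checks that arithmetic. The padding mode and dilation schedule are assumptions rather than facts taken from this documentation, and `required_input_len` is a hypothetical helper, not part of basepairmodels.

```python
def required_input_len(output_len, num_dilation_layers=8,
                       conv1_kernel_size=21, dilation_kernel_size=3,
                       profile_kernel_size=75):
    """Input length needed to produce a profile of length output_len.

    ASSUMPTION: every convolution uses 'valid' padding and the dilated
    layers use rates 2, 4, ..., 2**num_dilation_layers.
    """
    # An unpadded convolution trims (kernel_size - 1) * dilation positions.
    trimmed = conv1_kernel_size - 1
    for i in range(1, num_dilation_layers + 1):
        trimmed += (dilation_kernel_size - 1) * 2 ** i
    trimmed += profile_kernel_size - 1
    return output_len + trimmed


print(required_input_len(1000))  # 2114
```

Under these assumptions the network trims 1114 positions in total (20 from the first convolution, 1020 from the dilated layers, 74 from the profile head), which matches the gap between the default input and output lengths.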
-
basepairmodels.common.model_archs.BPNet500d7(input_seq_len, output_len, num_bias_profiles, filters=25, num_dilation_layers=7, conv_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)¶ BPNet model architecture with output size of 500 and a receptive field of 623
- Parameters
input_seq_len (int) – The length of input DNA sequence
output_len (int) – The length of the profile output
num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.
filters (int) – The number of filters in each convolutional layer of BPNet
num_dilation_layers (int) – The number of layers with dilated convolutions
conv_kernel_size (int) – The kernel size for the first 1D convolution
dilation_kernel_size (int) – The kernel size in each of the dilation layers
profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network
num_tasks (int) – The number of output profile tracks
- Returns
keras.models.Model
-
basepairmodels.common.model_archs.BPNetSumAll(input_seq_len, output_len, num_bias_profiles, filters=64, num_dilation_layers=9, conv1_kernel_size=21, dilation_kernel_size=3, profile_kernel_size=75, num_tasks=2)¶ A variation of BPNet in which each convolutional layer is added to all subsequent convolutional layers. In the paper version each conv layer is added only to the subsequent conv layer rather than to ALL subsequent conv layers.
- Parameters
input_seq_len (int) – The length of input DNA sequence
output_len (int) – The length of the profile output
num_bias_profiles (int) – The total number of control/bias tracks. In the case where original control and one smoothed version are provided this value is 2.
filters (int) – The number of filters in each convolutional layer of BPNet
num_dilation_layers (int) – The number of layers with dilated convolutions
conv1_kernel_size (int) – The kernel size for the first 1D convolution
dilation_kernel_size (int) – The kernel size in each of the dilation layers
profile_kernel_size (int) – The kernel size in the first convolution of the profile head branch of the network
num_tasks (int) – The number of output profile tracks
- Returns
keras.models.Model
Stats¶
This module contains functions to
-
basepairmodels.common.stats.get_recommended_counts_loss_weight(input_bigWigs, peaks, alpha=1.0)¶ This function computes the hyperparameter lambda (l) as suggested in the BPNet paper on pg. 28 https://www.biorxiv.org/content/10.1101/737981v2.full.pdf
If lambda (l) is set to 1/2 * n_obs, where n_obs is the average number of total counts in the training set, the profile loss and the total counts loss will be given roughly equal weight. The alpha parameter can be used to upweight the profile predictions relative to the total count predictions, as shown below
l = (alpha / 2) * n_obs
- Parameters
input_bigWigs (list) – list of bigWig files with assay signal. n_obs will be computed as a global average across all the input bigWigs
peaks (list) – list of 3-column pandas DataFrames, with ‘chrom’, ‘start’ and ‘end’ columns, one for each input bigWig
alpha (float) – parameter to scale profile loss relative to the counts loss. A value < 1.0 will upweight the profile loss
- Returns
float: counts loss weight (lambda)
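The formula l = (alpha / 2) * n_obs can be illustrated with a minimal sketch. It works on a plain list of per-region total counts, which is an assumption made for illustration: the documented function takes bigWig files and peak dataframes instead, and `counts_loss_weight` is a hypothetical name.

```python
def counts_loss_weight(total_counts, alpha=1.0):
    """Return lambda = (alpha / 2) * n_obs for a list of total counts.

    ASSUMPTION: total_counts holds one total read count per training
    region; how the real function derives these from bigWigs is not
    shown here.
    """
    # n_obs is the average number of total counts per training region.
    n_obs = sum(total_counts) / len(total_counts)
    return (alpha / 2.0) * n_obs


totals = [100, 200, 300]
print(counts_loss_weight(totals))             # 100.0
print(counts_loss_weight(totals, alpha=0.5))  # 50.0
```

Halving alpha halves the weight on the counts loss, which is what upweights the profile loss relative to it.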
Bigwig Utils¶
-
basepairmodels.cli.bigwigutils.prepare_BPNet_output_files(tasks, output_dir, chroms, chrom_sizes, model_tag, exponentiate_counts, other_tags=[])¶ Prepare output bigWig files for writing BPNet predictions: (a) construct appropriate filenames; (b) add headers to each bigWig file
- Parameters
tasks (collections.OrderedDict) – nested python dictionary of tasks. The predictions of each task will be written to a separate bigWig
output_dir (str) – destination directory where the output files will be created
chroms (list) – list of chromosomes for which the bigWigs will contain predictions
chrom_sizes (str) – the path to the chromosome sizes file. The chrom size is used in constructing the header of the bigWig file
model_tag (str) – the unique tag of the model that is generating the predictions
exponentiate_counts (boolean) – True if counts predictions are to be exponentiated before writing to the bigWigs. This will determine if the counts bigWigs have the ‘exponentiated’ tag in the filename
other_tags (list) – list of additional tags to be added as suffix to the filenames
- Returns
- (list of profile bigWig file objects,
list of counts bigWig file objects)
- Return type
tuple
-
basepairmodels.cli.bigwigutils.write_BPNet_predictions(profile_predictions, counts_predictions, profile_fileobjs, counts_fileobjs, coordinates, tasks, exponentiate_counts, output_window_size)¶ write one batch of BPNet predictions to bigWig files
- Parameters
profile_predictions (np.ndarray) – 3 dimensional numpy array of size (batch_size, output_len, num_tasks*num_strands)
counts_predictions (np.ndarray) – 2 dimensional numpy array of size (batch_size, num_tasks*num_strands)
profile_fileobjs (list) – list of file objects that have been opened to write profile predictions
counts_fileobjs (list) – list of file objects that have been opened to write counts predictions
coordinates (list) – list of (chrom, start, end) for each prediction
tasks (collections.OrderedDict) – nested python dictionary of tasks
exponentiate_counts (boolean) – True if counts predictions are to be exponentiated before writing to the bigWigs
output_window_size (int) – size of the central window of the output
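One plausible reading of output_window_size is that only the central portion of each predicted interval is written out. The sketch below shows that interpretation for a single (start, end) pair. It is an assumption about the behaviour, not the library's implementation, and `central_window` is a hypothetical helper.

```python
def central_window(start, end, output_window_size):
    """Return (start, end) of the central window of an interval.

    ASSUMPTION: the window is centred on the midpoint of the interval.
    """
    center = (start + end) // 2
    window_start = center - output_window_size // 2
    return window_start, window_start + output_window_size


print(central_window(1000, 2000, 500))  # (1250, 1750)
```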
Loss Functions¶
-
class basepairmodels.cli.losses.MultichannelMultinomialNLL(n)¶ Class to compute combined loss from ‘n’ tasks
- Parameters
n (int) – the number of channels / tasks
-
basepairmodels.cli.losses.multinomial_nll(true_counts, logits)¶ Compute the multinomial negative log-likelihood
- Parameters
true_counts – observed count values
logits – predicted logits values
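For reference, the multinomial negative log-likelihood of a single count vector can be written out in plain Python. This is a sketch of the textbook formula, assuming the logits are converted to probabilities with a softmax. It is not the library's implementation, which operates on batched tensors, and `multinomial_nll_sketch` is a hypothetical name.

```python
import math


def multinomial_nll_sketch(true_counts, logits):
    """Negative log-likelihood of true_counts under softmax(logits)."""
    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_norm = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_norm for x in logits]

    # Log of the multinomial coefficient N! / (k_1! * ... * k_m!).
    total = sum(true_counts)
    log_coeff = math.lgamma(total + 1) - sum(
        math.lgamma(k + 1) for k in true_counts)

    log_likelihood = log_coeff + sum(
        k * lp for k, lp in zip(true_counts, log_probs))
    return -log_likelihood


# Two positions with equal logits and one count each: P = 2 * 0.5 * 0.5 = 0.5
print(round(multinomial_nll_sketch([1, 1], [0.0, 0.0]), 4))  # 0.6931
```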