EXPERIMENTAL CNNVariantTrain

Train a CNN model for filtering variants

Category Variant Filtering

Overview

Train a Convolutional Neural Network (CNN) for filtering variants. This tool requires training data generated by CNNVariantWriteTensors.

Inputs

input-tensor-dir The training data created by CNNVariantWriteTensors.
The --tensor-type argument determines what types of tensors the model will expect. Set it to "reference" for 1D tensors or "read_tensor" for 2D tensors.
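The value for 2D tensors is spelled with an underscore (read_tensor), which is easy to mistype. As a minimal sketch, a small shell helper can pick the value from the model's dimensionality. The function name tensor_type_for is hypothetical and not part of GATK.

```shell
# Hypothetical helper (not part of GATK): map the CNN's dimensionality
# to the matching --tensor-type value described above.
tensor_type_for() {
  case "$1" in
    1) echo "reference" ;;      # 1D model trained on reference tensors
    2) echo "read_tensor" ;;    # 2D model trained on read tensors
    *) echo "unsupported dimensionality: $1" >&2; return 1 ;;
  esac
}

tensor_type_for 2   # prints: read_tensor
```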

Outputs

output-dir The model weights file and semantic configuration json are saved here. This defaults to the current working directory.
model-name The name for your model.

Usage example

Train a 1D CNN on Reference Tensors

 gatk CNNVariantTrain \
   -tensor-type reference \
   -input-tensor-dir my_tensor_folder \
   -model-name my_1d_model

Train a 2D CNN on Read Tensors

 gatk CNNVariantTrain \
   -input-tensor-dir my_tensor_folder \
   -tensor-type read_tensor \
   -model-name my_2d_model
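Both examples above use the default architecture. The sketch below shows how the architecture and training-length arguments listed in the table that follows might be combined in one run. The layer sizes, dropout rate, directory names and model name are hypothetical. It also assumes that list arguments such as --conv-layers are given by repeating the flag once per value, as is usual for GATK list arguments; check gatk CNNVariantTrain --help before relying on this. The run helper only prints the command, so the snippet can be tried without GATK installed.

```shell
# Dry-run helper (hypothetical): prints the command instead of executing it.
# Change the body to "$@" to actually launch GATK.
run() { printf '%s\n' "$*"; }

run gatk CNNVariantTrain \
  --input-tensor-dir my_tensor_folder \
  --tensor-type read_tensor \
  --conv-layers 64 --conv-layers 32 \
  --fc-layers 32 \
  --conv-dropout 0.1 \
  --epochs 20 \
  --training-steps 50 \
  --validation-steps 5 \
  --output-dir my_models \
  --model-name my_custom_2d_model
```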

CNNVariantTrain specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.

Argument name(s)	Default value	Summary
Required Arguments
--input-tensor-dir	null	Directory of training tensors created by CNNVariantWriteTensors.
Optional Tool Arguments
--annotation-shortcut	false	Shortcut connections on the annotation layers.
--annotation-units	16	Number of units connected to the annotation input layer
--arguments_file	[]	read one or more arguments files and add them to the command line
--conv-batch-normalize	false	Batch normalize convolution layers
--conv-dropout	0.0	Dropout rate in convolution layers
--conv-height	5	Height of convolution kernels
--conv-layers	[]	List of number of filters to use in each convolutional layer
--conv-width	5	Width of convolution kernels
--epochs	10	Maximum number of training epochs.
--fc-batch-normalize	false	Batch normalize fully-connected layers
--fc-dropout	0.0	Dropout rate in fully-connected layers
--fc-layers	[]	List of number of units to use in each fully-connected layer
--gcs-max-retries -gcs-retries	20	If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--gcs-project-for-requester-pays	""	Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.
--help -h	false	display the help message
--image-dir	null	Path where plots and figures are saved.
--model-name	variant_filter_model	Name of the model to be trained.
--output-dir	./	Directory where models will be saved, defaults to current working directory.
--padding	valid	Padding for convolution layers, valid or same
--spatial-dropout	false	Spatial dropout on convolution layers
--tensor-type	reference	Type of tensors to use as input: reference for 1D reference tensors, read_tensor for 2D tensors.
--training-steps	10	Number of training steps per epoch.
--validation-steps	2	Number of validation steps per epoch.
--version	false	display the version number for this tool
Optional Common Arguments
--gatk-config-file	null	A configuration file to use with the GATK.
--QUIET	false	Whether to suppress job-summary info on System.err.
--tmp-dir	null	Temp directory to use.
--use-jdk-deflater -jdk-deflater	false	Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater -jdk-inflater	false	Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity	INFO	Control verbosity of logging.
Advanced Arguments
--annotation-set	best_practices	Which set of annotations to use.
--channels-last	true	Store the channels in the last axis of tensors, tensorflow->true, theano->false
--showHidden	false	display hidden arguments

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see the Optional Common Arguments in the table above.

--annotation-set / -annotation-set

Which set of annotations to use.

String best_practices

--annotation-shortcut / -annotation-shortcut

Shortcut connections on the annotation layers.

boolean false

--annotation-units / -annotation-units

Number of units connected to the annotation input layer

int 16 [ [ -∞ ∞ ] ]

--arguments_file / NA

read one or more arguments files and add them to the command line

List[File] []
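As a rough sketch of how this could be used, the snippet below writes a few arguments to a file and passes it with --arguments_file. It assumes the file holds arguments exactly as they would be typed on the command line. The file name train.args and all values are hypothetical. The run helper only prints the command, so GATK is not needed to try it.

```shell
# Dry-run helper (hypothetical): prints the command instead of executing it.
# Change the body to "$@" to actually launch GATK.
run() { printf '%s\n' "$*"; }

# Write the shared training settings to a file (the file name is an assumption).
cat > train.args <<'EOF'
--tensor-type reference
--epochs 20
--training-steps 50
EOF

run gatk CNNVariantTrain \
  --input-tensor-dir my_tensor_folder \
  --arguments_file train.args
```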

--channels-last / -channels-last

Store the channels in the last axis of tensors, tensorflow->true, theano->false

boolean true

--conv-batch-normalize / -conv-batch-normalize

Batch normalize convolution layers

boolean false

--conv-dropout / -conv-dropout

Dropout rate in convolution layers

float 0.0 [ [ -∞ ∞ ] ]

--conv-height / -conv-height

Height of convolution kernels

int 5 [ [ -∞ ∞ ] ]

--conv-layers / -conv-layers

List of number of filters to use in each convolutional layer

List[Integer] []

--conv-width / -conv-width

Width of convolution kernels

int 5 [ [ -∞ ∞ ] ]

--epochs / -epochs

Maximum number of training epochs.

int 10 [ [ 0 ∞ ] ]

--fc-batch-normalize / -fc-batch-normalize

Batch normalize fully-connected layers

boolean false

--fc-dropout / -fc-dropout

Dropout rate in fully-connected layers

float 0.0 [ [ -∞ ∞ ] ]

--fc-layers / -fc-layers

List of number of units to use in each fully-connected layer

List[Integer] []

--gatk-config-file / NA

A configuration file to use with the GATK.

String null

--gcs-max-retries / -gcs-retries

If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection

int 20 [ [ -∞ ∞ ] ]

--gcs-project-for-requester-pays / NA

Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.

String ""

--help / -h

display the help message

boolean false

--image-dir / -image-dir

Path where plots and figures are saved.

String null

--input-tensor-dir / -input-tensor-dir

Directory of training tensors created by CNNVariantWriteTensors.

R String null

--model-name / -model-name

Name of the model to be trained.

String variant_filter_model

--output-dir / -output-dir

Directory where models will be saved, defaults to current working directory.

String ./

--padding / -padding

Padding for convolution layers, valid or same

String valid
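To illustrate the difference, the arithmetic below uses the usual convolution definitions with a stride of 1: valid padding shrinks each dimension by the kernel size minus one, while same padding keeps it unchanged. The stride and the input width of 128 are assumptions for illustration. The kernel width of 5 is the --conv-width default.

```shell
input_width=128   # assumed tensor width, for illustration only
kernel_width=5    # default value of --conv-width

# valid: no padding, so the kernel only fits in (input - kernel + 1) positions
valid_width=$((input_width - kernel_width + 1))
# same: the input is zero-padded so the output keeps the input width (stride 1)
same_width=$input_width

echo "valid: $valid_width, same: $same_width"   # prints: valid: 124, same: 128
```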

--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean false

--showHidden / -showHidden

display hidden arguments

boolean false

--spatial-dropout / -spatial-dropout

Spatial dropout on convolution layers

boolean false

--tensor-type / -tensor-type

Type of tensors to use as input: reference for 1D reference tensors, read_tensor for 2D tensors.

The --tensor-type argument is an enumerated type (TensorType), which can have one of the following values:

reference
read_tensor

TensorType reference

--tmp-dir / NA

Temp directory to use.

String null

--training-steps / -training-steps

Number of training steps per epoch.

int 10 [ [ 0 ∞ ] ]

--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean false

--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean false

--validation-steps / -validation-steps

Number of validation steps per epoch.

int 2 [ [ 0 ∞ ] ]
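Together with --epochs and --training-steps, this argument bounds the length of a run. As a worked example using the default values listed in the table above, and treating --epochs as an upper limit as its description states, the totals can be computed as follows:

```shell
epochs=10            # default --epochs (a maximum)
training_steps=10    # default --training-steps, per epoch
validation_steps=2   # default --validation-steps, per epoch

max_training_steps=$((epochs * training_steps))
max_validation_steps=$((epochs * validation_steps))

echo "at most $max_training_steps training and $max_validation_steps validation steps"
# prints: at most 100 training and 20 validation steps
```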

--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel INFO

--version / NA

display the version number for this tool

boolean false


GATK version 4.1.0.0 built at Tue, 29 Jan 2019 22:20:41 -0500.
