\documentclass[10pt]{article}
\usepackage[hmargin=1.25cm,top=1.5cm,bottom=1.5cm]{geometry}
\usepackage{multicol}
\setlength\columnsep{15pt}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{array}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage[auth-sc]{authblk}
\usepackage{longtable}
\usepackage{multirow}
\usepackage{hyperref}
\usepackage{enumerate}
\usepackage[labelfont=bf]{caption}
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage{mdframed}
\usepackage{graphicx}
\usepackage{rotating}
\usepackage{lscape}
\usepackage{breakurl}
\usepackage{todonotes}
\usepackage{hanging}
\usepackage[final]{pdfpages}
\usepackage[leftFloats,CaptionAfterwards]{fltpage}
\usepackage[numbers,super,sort&compress]{natbib}
\setlength{\bibsep}{0pt plus 0.3ex}
\usepackage{abstract}
\usepackage{enumitem}
\usepackage{soul}
\usepackage{titlesec}
\titleformat{\section}[block]{\large\bfseries\filcenter}{\thesection.}{0.4em}{}
\titleformat{\subsection}[block]{\normalsize\sc\bfseries\filcenter}{\thesubsection.}{0.4em}{}
\titleformat{\subsubsection}[block]{\normalsize\sc\itshape\filright}{\thesubsubsection.}{0.4em}{}
\setcounter{secnumdepth}{5}

\makeatletter
\def\@biblabel#1{\@ifnotempty{#1}{#1.}}
\makeatother

\newcommand{\filllastline}[1]{
\setlength\leftskip{0pt}
\setlength\rightskip{0pt}
\setlength\parfillskip{0pt}
#1}

\newenvironment{Figure}
{\par\medskip\noindent\minipage{\linewidth}}
{\endminipage\par\medskip}

\title{\bf Understanding Transcriptional Regulatory Logic Using Convolutional and Generative Deep Learning Models}
\renewcommand\Authfont{\scshape\normalsize}
\author[1,$\#$]{Nic Fishman}
\author[2]{Georgi K. Marinov}
\author[2,3]{Anshul Kundaje}
\renewcommand\Affilfont{\itshape\small}
\affil[1]{Department of Bioengineering, Stanford University, Stanford, California 94305, USA}
\affil[2]{Department of Genetics, Stanford University, Stanford, California 94305, USA}
\affil[3]{Department of Computer Science, Stanford University, Stanford, California 94305, USA}
\affil[$\#$]{To whom correspondence should be addressed: \href{mailto:njwfish@stanford.edu}{njwfish@stanford.edu} }
\date{}

\begin{document}
\maketitle

% \centerline{}
% \centerline{}
\begin{abstract}
\noindent {\normalsize \textbf{The concerted action of sequence-specific transcription factors onto \textit{cis}-regulatory elements in DNA is the core mechanism through which gene regulation is accomplished in most eukaryotes. Mechanistic dissection of the logic of these interactions is therefore of critical importance for understanding cellular responses to external and internal stimuli in a wide variety of contexts, such as organismal development and disease. However, despite many decades of intensive study, the goal of mapping out this \textit{cis}-regulatory logic is still far from achieved. Massively Parallel Reporter Assays (MPRAs) are a potentially extremely powerful tool that can be applied for this purpose, but extracting \textit{cis}-regulatory interactions from their output is not trivial for the human eye. Here we apply deep learning approaches to MPRA datasets to understand the individual contribution to gene expression activity of transcription factor (TF) motifs and the relationships between them, and show that convolutional neural networks (CNNs) are capable of learning both aspects of \textit{cis}-regulatory logic. We then build on our CNN models and develop generative adversarial networks (GANs) that can produce novel regulatory sequences with particular gene expression activities.} 
}
\end{abstract}

\begin{multicols}{2}

\section{Introduction}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Outline of deep learning strategies for predicting activity and \textit{de novo} generating functionally active DNA sequences}. 
(a) The regression training phase of the sequence generation pipeline takes a set of sequences and trains a number of models to predict the expression driven by these sequences.
(b) Structure of the GAN component of the sequence generation pipeline, which creates \textit{de novo} sequences predicted to be functionally active.
% \centerline{}
% \rightline{\textit{(legend continued on next page)}}
} 
\label{fig:yeast-truepred}
\end{figure*}

\begin{table*}[!hb]
\centering
\caption{\textbf{Architectural search of deep learning CNN models for predicting regulatory activity from sequence}. Each architectural property and the corresponding distribution it was drawn from in the random search are shown. Note that for properties relevant to multiple layers, such as units per dense layer, $n$ independent samples were drawn from the distribution corresponding to the relevant number of layers.}
\begin{tabular}{|r l|}\hline
       Architecture Property & Distribution Drawn From \\\hline\hline
       Number of Convolutional Layers &$\sim Uni(2, 4)$\\\hline
       Filters per Convolutional Layer & $\sim Uni(5, 50)$\\\hline
       Filter Size & $\sim Uni(4, 15)$\\\hline
       Number of Dense Layers & $\sim Uni(1, 5)$\\\hline
       Units per Dense Layer & $\sim Uni(5, 100)$\\\hline
       Regularize all Layers & $\sim Bern(p=0.5)$\\\hline\hline
\end{tabular}
\label{table:rand-search}
\end{table*}

Gene regulation in most eukaryotes is driven by the action of sequence-specific transcription factors, which recognize specific DNA sequence motifs/binding sites (TFBSs) within larger \textit{cis}-regulatory elements (cREs), and act in concert to affect the expression of their cognate genes. These cREs typically contain multiple binding sites for several different transcription factors; it is the integration of the activity of all these factors within an individual cRE that determines its activity, and the integration of the activity of all cREs that together act on a gene that drives changes in its expression. The complexity of \textit{cis}-regulatory logic is considerable -- each human gene is associated with on average about a dozen putative cREs\cite{ENCODE2012}, each of them several hundred bases in length and potentially containing dozens of TFBSs -- thus it is not surprising that despite decades of research in the field, understanding that logic in detail has remained an elusive target.

Two technological and analytical developments hold great promise for resolving that challenge. 

The first one is the development of MPRAs, which allow the regulatory activity of many thousands of both endogenous as well as synthetic DNA sequences to be assayed simultaneously on a large scale\cite{Patwardhan2012,Melnikov2012,Arnold2013,Sharon2012}. In a typical MPRA design, each tested sequence is either associated with a barcode placed within a reporter gene\cite{Patwardhan2012,Melnikov2012,Sharon2012}, or is directly located in the reporter itself\cite{Arnold2013}. High-throughput DNA sequencing is used to determine the relative abundance of barcodes or input sequences in both the populations of expressed RNA and input DNA molecules, thus providing information on the relative activity of each construct.
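The barcode-counting step described above reduces to a simple per-construct ratio. As an illustrative sketch only (the cited studies each use their own normalization; function names, the pseudocount, and the log2 summary are assumptions here, not the papers' exact pipelines), relative activity can be estimated from depth-normalized RNA and DNA barcode counts:

```python
import math

def barcode_activities(rna_counts, dna_counts, pseudocount=1.0):
    """Estimate per-construct regulatory activity as the log2 ratio of
    depth-normalized RNA to DNA barcode counts (a common MPRA summary).
    A pseudocount guards against barcodes absent from the RNA pool."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activities = {}
    for bc in dna_counts:
        rna = (rna_counts.get(bc, 0) + pseudocount) / rna_total
        dna = (dna_counts[bc] + pseudocount) / dna_total
        activities[bc] = math.log2(rna / dna)
    return activities
```

A barcode whose RNA abundance matches its DNA abundance scores near zero; enrichment in the RNA pool yields a positive activity.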

The second advance involves the application of deep learning techniques to discerning gene regulatory logic, with the hope that machine learning will prove to be more effective where humans have not succeeded so far. Deep learning approaches have already demonstrated remarkable performance on a wide variety of problems in genomics\cite{Ching2018}, such as predicting transcription factor binding, accessible chromatin regions, nucleosome positioning, RNA splicing outcomes, and many others, and they have more recently also seen initial applications to MPRAs involving endogenous\cite{Movva2018} and random sequences\cite{Cuperus2017}. Once successfully predictive deep learning models are built, methods of deep learning model interpretation can then be used to extract regulatory relationships encoded in the DNA sequence\cite{Shrikumar2017}. 

Beyond the simple development of predictive models, another highly promising deep learning-based approach in this area is the generation of novel DNA sequences (following the ``What I cannot create, I do not understand'' principle), the activity of which can subsequently be tested in an MPRA. Initial work in this area has shown the ability of generative deep learning methods to generate DNA sequence \textit{de novo}\cite{Killoran2017}, but the success of such approaches in specific biological contexts has not yet been demonstrated. 

In this work we first apply deep learning approaches (Figure \ref{fig:yeast-truepred}) to a yeast MPRA dataset that includes DNA sequences containing predefined regulatory grammars\cite{VanDijk2017} and their activities at various amino acid concentrations (``[AA]''). We find that CNN-based approaches can successfully capture these lexical grammars, identify relevant motifs, and predict regulatory activities in the context of this MPRA dataset. We build on our CNN models to develop a GAN-based algorithm for generating novel DNA sequences with desired regulatory activities. We then apply these approaches in the context of a much larger MPRA dataset in human cells, and plan on experimentally testing the activity of newly designed regulatory sequences in an MPRA. 

\section{Related Work}

Initial applications to MPRAs involving endogenous\cite{Movva2018} and random sequences\cite{Cuperus2017} have shown that deep learning frameworks are capable of capturing the underlying regulatory grammars encoded in DNA sequences. More broadly, approaches applying deep learning to DNA sequence are at this point well developed in the literature\cite{Movva2018,Zeng2016,deepsea}.

The problem of \textit{de novo} sequence generation has been less extensively explored. Killoran et al.\cite{Killoran2017} applied GANs to generating generic DNA sequences using the Wasserstein GAN, which optimizes the earth mover's distance between the generated and real samples\cite{arjovsky2017wasserstein}. In this approach, the generator was first pre-trained to produce DNA sequences, and then the discriminator was replaced with a differentiable analyzer oriented towards classifying whether a given sequence had a specified property. In a similar vein, Gupta and Zou\cite{fbgan} developed a GAN approach based on a feedback loop, generating DNA sequence in a generator-discriminator framework, but adding a selection of ``top'' sequences (evaluated according to a given criterion) to the ``real'' distribution each iteration. In this way they expand the ``real'' distribution and force the generator to construct sequences in line with the analyzer's selection criteria. More recently, Brookes et al. proposed a framework for design by adaptive sampling\cite{dbas}, but this approach, while powerful, is severely limited by computational complexity; the authors were only able to generate sequences $\leq$13 bp in length.

Here we present a novel framework for end-to-end generation of sequences that minimizes any developer-defined loss function over a set of DNA sequences with associated scores for one or more properties of interest.

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Performance of predictive models on yeast MPRA dataset}. 
(a) True ($x$-axis) and predicted ($y$-axis) expression levels for constructs with different numbers of GCN4 motifs.
(b) Deep learning models reproduce the experimentally observed behaviors of constructs with different numbers of GCN4 motifs in response to increasing [AA].
(c-d) DeepLIFT importance scores for a construct containing three weak GCN4 TFBSs (c) and another containing three strong GCN4 TFBSs (d) correctly identify the motifs driving regulatory activity.
% \centerline{}
% \rightline{\textit{(legend continued on next page)}}
} 
\label{Fig2}
\end{figure*}

\section{Methods}

The backbone of this project is an end-to-end sequence generation pipeline that takes a set of sequences and one or more corresponding properties to learn, then trains a GAN that can generate sequences designed to exhibit the given properties. Throughout this paper we examine this pipeline in the context of predicting the regulatory effects of DNA sequences on gene expression, guided by existing MPRA datasets.

\subsection{MPRA Datasets and data preprocessing}

We used two MPRA datasets in this study. The first one\cite{VanDijk2017} was carried out in the budding yeast \textit{Saccharomyces cerevisiae} and includes testing the activity of $\sim$5,000 synthetic sequences containing defined numbers of TFBSs for factors known to be important in regulating gene expression upon changes in nutrient availability. Regulatory activity was measured under a range of six different increasing amino acid concentrations. 

The Sharpr-MPRA dataset\cite{Ernst2016} is significantly larger, containing $\sim$500,000 endogenous 145 bp-long human sequences that tile $\sim$15,000 putative regulatory regions at 5-bp intervals. These sequences were assayed in two different human cell lines (the erythroid K562 and the hepatic HepG2) with two different promoters. 

DNA sequences were one-hot encoded following established practices. We also $\log(1+x)$-transformed the regression targets so that we could use the ReLU non-linearity as the output of our regression networks.
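The preprocessing above can be sketched in a few lines. This is an illustrative implementation only; the A/C/G/T column order, the handling of ambiguous bases, and the function names are assumptions, not the pipeline's actual code:

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string into a (len(seq), 4) array,
    with columns in A, C, G, T order."""
    arr = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:  # leave ambiguous bases (e.g. N) as all-zero rows
            arr[i, j] = 1.0
    return arr

def transform_targets(y):
    """log(1 + y) transform of expression targets, so a non-negative
    ReLU output layer can cover the target range."""
    return np.log1p(np.asarray(y, dtype=np.float32))
```

The inverse transform for reading out predictions would then be `np.expm1`.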

% \subsection{End-to-End Objective Oriented Sequence Generation}

\subsection{Predicting Gene Expression}

The first part of the pipeline involves building a neural net that learns to predict the effects on gene expression of a given set of regulatory sequences. We largely follow CNN-based approaches previously developed to tackle such tasks\cite{Movva2018,Zeng2016}. After one-hot encoding, the input sequences are split into training and test sets. Convolutional layers learn to capture transcription factor binding motifs, followed by a series of fully connected layers to predict output expression under a set of different conditions; in this case the regressor predicts a vector where each element corresponds to one of the conditions.
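The intuition that convolutional layers capture TF binding motifs can be made concrete: a first-layer 1D convolutional filter applied to a one-hot sequence is exactly a position weight matrix (PWM) scan. A minimal NumPy sketch (the helper names and the toy motif are illustrative assumptions, not the trained model's filters):

```python
import numpy as np

def onehot(s, order="ACGT"):
    """Minimal one-hot encoder: (len(s), 4) array in A,C,G,T column order."""
    m = np.zeros((len(s), 4))
    for i, b in enumerate(s):
        m[i, order.index(b)] = 1.0
    return m

def scan_pwm(onehot_seq, pwm):
    """Cross-correlate a PWM (a convolutional filter) along a one-hot
    sequence: onehot_seq is (L, 4), pwm is (k, 4); returns the
    (L - k + 1,) vector of per-position match scores."""
    L, k = len(onehot_seq), len(pwm)
    return np.array([float(np.sum(onehot_seq[i:i + k] * pwm))
                     for i in range(L - k + 1)])
```

With a one-hot "filter" for the GCN4-like motif TGACTC, the maximum score pinpoints the motif's position; a trained CNN learns such filters (with real-valued weights) from the data.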

Because it is not \textit{a priori} clear which architecture is best suited for a given prediction task, we implement a random search, training 100 models with random numbers of convolutional and dense layers, randomly sized kernels, and random numbers of hidden units (Table \ref{table:rand-search}). We train these models for 1,000 epochs using the Adam optimizer, keeping only the checkpoint with the lowest validation loss across the training history. Models are then evaluated on the test set, and a linear combination of the test error and the overfitting ratio (defined as $\frac{MSE_{test}}{MSE_{train}}$) is used to rank the regression models. Later in the process it becomes important that the regressor is actually learning to assign importance to TFBSs, and that overfitting is minimized as much as possible.
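Sampling a candidate architecture from the distributions in Table \ref{table:rand-search} can be sketched as follows; per-layer properties are drawn once per layer, as the table notes. The dictionary keys and inclusive integer ranges are assumptions for illustration (the model-building and training code is omitted):

```python
import random

def sample_architecture(rng=random):
    """Draw one candidate architecture from the random-search
    distributions: Uni(a, b) as inclusive integer ranges, Bern(0.5)
    for whether to regularize all layers."""
    n_conv = rng.randint(2, 4)
    n_dense = rng.randint(1, 5)
    return {
        "conv_filters": [rng.randint(5, 50) for _ in range(n_conv)],
        "filter_sizes": [rng.randint(4, 15) for _ in range(n_conv)],
        "dense_units": [rng.randint(5, 100) for _ in range(n_dense)],
        "regularize_all": rng.random() < 0.5,
    }
```

Each of the 100 search iterations would build and train a model from one such sample, then record its train/test MSE for ranking.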

\subsection{Feature Importance Scoring}

The DeepLIFT framework\cite{Shrikumar2017} was used to assign feature importance scores of MPRA prediction models to each nucleotide in input sequences.

\subsection{Generating Regulatory Sequence}

\filllastline{To \textit{de novo} generate regulatory sequences, we start with the improved Wasserstein GAN\cite{arjovsky2017wasserstein,wgangp}, the core architecture of which we modify for our purposes in two ways. First, in addition to the random seed $z$, we also input a continuous vector $t$, with the same shape as the output from the regressor trained in the previous phase. The vector $t$ is drawn from a normal distribution fit to the distribution of expressions in the MPRA dataset being trained on. The generator outputs a one-hot encoded sequence vector, making use of the}

\end{multicols}

\begin{center}
\begin{longtable}{|m{1cm}|m{2cm}|m{2cm}|m{3cm}|m{0.2cm}|m{1cm}|m{2cm}|m{2cm}|}
\caption[]{\textbf{Top models from random architecture search for the Yeast MPRA and Sharpr-MPRA datasets}. For feature importance scoring, we sum the DeepLIFT importance scores ascribed to TFBSs, and divide by the total importance of the input sequence. We sum this score over all sequences in both the training and test datasets.}\\
\hline
\multicolumn{4}{|c|}{Yeast MPRA} & \hspace{0.2cm} & \multicolumn{3}{|c|}{Sharpr-MPRA} \\
\hline
Model Rank & Training MSE & Test MSE & Motif Importance & \hspace{0.2cm} & Model Rank & Training MSE & Test MSE \\
\hline
\endfirsthead
\multicolumn{8}{c}%
{\tablename\ \thetable\ -- \textit{Continued from previous page}} \\
\hline
\multicolumn{4}{|c|}{Yeast MPRA} & \hspace{0.2cm} & \multicolumn{3}{|c|}{Sharpr-MPRA} \\
\hline
Model Rank & Training MSE & Test MSE & Motif Importance & \hspace{0.2cm} & Model Rank & Training MSE & Test MSE  \\
\hline
\endhead
\hline \multicolumn{8}{r}{\textit{Continued on next page}} \\
\endfoot
\hline
\endlastfoot
\hline
1 & 0.014043 & 0.026653 & 1087.418475 & \hspace{0.2cm} & 1 & 8674.099 & 34120.833\\\hline
\hline
2 & 0.002005 & 0.028467 & 1084.981529 & \hspace{0.2cm} & 2 & 7581.866 & 34191.550\\\hline
\hline
3 & 0.008275 & 0.030267 & 1177.912115 & \hspace{0.2cm} & 3 & 12927.603 & 34713.798\\
\label{table:yeast-reg}
\end{longtable}
\end{center}

\begin{multicols}{2}

\noindent Gumbel-Softmax activation function to remain differentiable. The discriminator is trained in classic WGAN-GP fashion, on both real sequences from the MPRA dataset and generated ones from the generator. Formally, this loss is calculated as:

\begin{equation}
L_D = \mathop{\mathbb{E}}_{\Tilde{x} \sim \mathop{\mathbb{P}}_g} [D(\Tilde{x})] - \mathop{\mathbb{E}}_{x \sim \mathop{\mathbb{P}}_r} [D(x)] + \lambda \mathop{\mathbb{E}}_{\hat{x} \sim \mathop{\mathbb{P}}_{\hat{x}}} [(||\nabla_{\hat{x}} D(\hat{x})||_2 - 1)^2]
\end{equation}

where the discriminator is updated by minimizing $L_D$. For the generator, we feed the generated sequence into both the discriminator and the regressor (from the previous phase), and the generator is updated by minimizing a weighted average of the two losses. These losses are weighted with a tunable $\gamma$ term as follows: 

\begin{equation}
\mathcal{L_G} = \gamma L_{D} + (1-\gamma)L_{R}
\end{equation}
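Two pieces of this construction can be sketched concretely: the Gumbel-Softmax relaxation that lets the generator emit (approximately) one-hot bases while remaining differentiable, and the $\gamma$-weighted generator objective. This is an illustrative NumPy sketch under assumed names; the actual model uses a deep learning framework's differentiable implementation:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Gumbel-Softmax relaxation: a differentiable approximation to
    sampling a one-hot base per position. logits: (L, 4) array; smaller
    tau pushes each row closer to a hard one-hot vector."""
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    e = np.exp(y - y.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)

def generator_loss(l_d, l_r, gamma=0.5):
    """Weighted generator objective L_G = gamma*L_D + (1-gamma)*L_R."""
    return gamma * l_d + (1 - gamma) * l_r
```

At $\gamma = 1$ the generator sees only the discriminator (plain WGAN-GP); at $\gamma = 0$ it sees only the regressor, matching the ablations reported in Table \ref{table:yeast-gan}.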

\subsection{Evaluating Generated Sequences}

\subsubsection{Motif Counts}

The simplest way to evaluate generated sequences is to check whether they contain TFBSs known to be relevant for gene regulation under the conditions tested in the training set. To do this, we simply count the frequency of each motif or motif combination in the generated sequences, and compare it to either the real data or randomly generated sequences.
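This motif-count check is a straightforward substring scan. A minimal sketch (counting overlapping occurrences; the function name and the per-sequence summary are illustrative choices, not the paper's exact scoring code):

```python
def count_motifs(sequences, motifs):
    """Count occurrences (including overlapping ones) of each motif
    string across a list of sequences, plus the mean total motif
    count per sequence."""
    totals = {m: 0 for m in motifs}
    for seq in sequences:
        for m in motifs:
            totals[m] += sum(1 for i in range(len(seq) - len(m) + 1)
                             if seq[i:i + len(m)] == m)
    per_seq = sum(totals.values()) / max(len(sequences), 1)
    return totals, per_seq
```

The per-sequence mean is the quantity compared against the real data in Table \ref{table:yeast-gan} ("Motif Count Per Sequence").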

\subsubsection{1-Nearest Neighbor}

The best method for evaluating the quality of a generated distribution against an actual distribution is to calculate the leave-one-out (LOO) accuracy of a 1-NN classifier in a learned feature space\cite{xu2018empirical}. We create a general framework for generating a learned feature space to compare regulatory sequences of a given length $k \leq 2000$ by starting with a CNN model\cite{deepsea}, altering the length of the input space, and then truncating the model after the last convolutional layer that produces a non-negative output size. To compute the LOO accuracy, we label real sequences $1$ and generated sequences $-1$, calculate the distances between all sequence pairs, and assign each sequence the label of its nearest neighbor (excluding itself). If the generated and real distributions were the same, i.e., if the generated sequences are of high quality, the 1-NN classifier would achieve 50\% accuracy.
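Given feature vectors from the truncated CNN, the LOO 1-NN accuracy itself is simple to compute. A sketch, assuming Euclidean distance in the learned feature space (the distance metric and function name are assumptions; the feature extractor is omitted):

```python
import numpy as np

def loo_1nn_accuracy(real_feats, gen_feats):
    """Leave-one-out 1-NN accuracy distinguishing real (+1) from
    generated (-1) feature vectors. ~0.5 means the two distributions
    are indistinguishable; values near 1.0 mean they are easily told
    apart (poor generation quality)."""
    X = np.vstack([real_feats, gen_feats])
    y = np.array([1] * len(real_feats) + [-1] * len(gen_feats))
    # Pairwise Euclidean distances between all feature vectors.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude each point itself
    pred = y[d.argmin(axis=1)]   # label of the nearest other point
    return float((pred == y).mean())
```

For two well-separated clusters this returns 1.0, the failure mode seen for $\gamma = 0$ in Table \ref{table:yeast-gan}.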

\subsubsection{Predicting Expression}

To ensure that the generated sequences are not simply overfitting the regressor network used to train the generator, we use an ensemble of several of the next-best regressors produced by the random architecture search to predict the expression of the generated sequences, and compare it to the initial target expression $t$.
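The ensemble check reduces to averaging the held-out regressors' predictions and scoring them against the target. A sketch under assumed names (each `regressor` stands in for a trained model's predict function; the unweighted mean is an illustrative choice):

```python
import numpy as np

def ensemble_expression_mse(regressors, inputs, target):
    """Average several held-out regressors' predictions for generated
    sequences and score the ensemble prediction against the target
    expression vector t via mean squared error."""
    preds = np.mean([reg(inputs) for reg in regressors], axis=0)
    return float(np.mean((preds - target) ** 2))
```

A large ensemble MSE despite a small training-regressor loss would indicate the generator has overfit the single regressor used during GAN training.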

\section{Results}

\begin{table*}[!ht]
\centering
\begin{tabular}{|c | c|}
\hline
$\gamma = 1.0 $ & 
\begin{tabular}{c | c  | c }\hline
       Metric & Optimal/Real Value & Generated \\\hline\hline
       1-NN LOO & 0.5 & 0.65\\\hline
       Motif Count Per Sequence & 17.54 & 14.114\\\hline
       Predicted Expression MSE & 0.0 & 0.801 \\\hline
\end{tabular}\\\hline
$\gamma = 0.5 $ & 
\begin{tabular}{c | c  | c }\hline
       Metric & Optimal/Real Value & Generated \\\hline\hline
       1-NN LOO & 0.5 & 0.77\\\hline
       Motif Count Per Sequence & 17.54 & 11.894\\\hline
       Predicted Expression MSE & 0.0 & 0.205\\\hline
\end{tabular}\\\hline
$\gamma = 0.0$  & 
\begin{tabular}{c | c  | c }\hline
       Metric & Optimal/Real Value & Generated \\\hline\hline
       1-NN LOO & 0.5 & 1.0\\\hline
       Motif Count Per Sequence & 17.54 & 8.336\\\hline
       Predicted Expression MSE & 0.0 & 0.792\\\hline
\end{tabular}\\\hline
\end{tabular}
\caption{Evaluation of sequences produced from 5000 epochs of training for various $\gamma$ on the Yeast MPRA dataset.}
\label{table:yeast-gan}
\end{table*}

\begin{table*}[!hb]
\centering
\begin{tabular}{|c | c  | c |}\hline
       Metric & Optimal/Real Value & Generated \\\hline\hline
       1-NN LOO & 0.5 & 0.89\\\hline
       Predicted Expression MSE & 0.0 & 64324.5176\\\hline
\end{tabular}
\caption{Evaluation of sequences produced from 5000 epochs of training for $\gamma = 0.5$ on the Sharpr-MPRA dataset.}
\label{table:sharpr-gan}
\end{table*}

\subsection{Hyperparameters}

Throughout training we used the default hyperparameters for the Adam optimizer, the default hyperparameters for the WGAN-GP algorithm,\cite{wgangp} and a standard batch size of 64, because these parameters are relatively well supported in the literature and returned reasonably good results in preliminary tests.

\subsection{Regression}

\subsubsection{Yeast MPRA}

After random architecture search on the yeast MPRA dataset, we arrived at an optimal model with a test MSE of 0.0266 (Table \ref{table:yeast-reg}). The performance of this model can be compared to the baseline established in the yeast MPRA study itself by physical thermodynamic models for predicting expression, the top test performance of which is $R^2 = 0.8$ across all sequences. The corresponding test $R^2$ for the top deep learning method is $R^2  = 0.91$ (Figure \ref{Fig2}a). Also, a key result of the original study is that combining large numbers of TFBSs in the same construct does not necessarily lead to higher regulatory activity, and can even decrease it, due to competition for binding between nearby sites. Deep learning models are well capable of capturing that behavior (Figure \ref{Fig2}b).

To confirm that the neural network models are not just learning to predict expression, but are also identifying the key regulatory sequences driving it, we applied DeepLIFT feature importance scoring\cite{Shrikumar2017} to the input sequences. The relevant motifs indeed tend to be the most highly scoring subsequences, with two examples shown in Figure \ref{Fig2}c-d. To quantify motif importance more generally, we also calculate a motif importance score, defined as the sum of the importance ascribed to base pairs within motifs divided by the total importance for a given sequence, summed over all sequences in both the training and test datasets. % The results here validate that we are overfitting the data only slightly, and that we are assigning signal to the motifs.

\subsubsection{Sharpr-MPRA}

The Sharpr-MPRA dataset is significantly more difficult to predict than the yeast MPRA dataset, as it exhibits poor reproducibility between the experimental replicates\cite{Movva2018}, and the between-replicate correlation sets an upper limit on the achievable model performance. We are still in the process of identifying the optimal performing models for this dataset.

\subsection{Sequence Generation}

\subsubsection{Yeast MPRA}

We were able to successfully train several GANs using the yeast MPRA regressor, but are still working on evaluating and understanding the resulting sequences. Current summary results are shown in Table \ref{table:yeast-gan}, where several interesting observations are evident. First, using just the WGAN architecture ($\gamma = 1$), the distribution of generated sequences is very similar to the real one, and the per-sequence motif occurrence is close to that in the real data; however, the predicted expression MSE is large. In contrast, using just the regressor ($\gamma = 0$) results in sequences with poor motif occurrence characteristics. It is only for the combined loss ($\gamma = 0.5$) that we see generated sequences approaching the desired properties. This supports the usefulness of the architecture we propose in Figure \ref{fig:yeast-truepred}.

An issue we have encountered in this context is that extremely extended training periods (10,000-20,000 epochs) begin to seriously degrade performance on all metrics. We are still investigating the sources of this degradation and possible solutions.

\subsubsection{Sharpr-MPRA}

We only have very preliminary sequence generation results on the Sharpr-MPRA dataset (Table \ref{table:sharpr-gan}), as we have not yet arrived at an optimal prediction model. % Thus the regressor essentially acts as a fuzzer on the discriminator loss, confounding the generators ability to learn from the discriminator, without providing real support for generating underlying motifs.

\section{Conclusions}

Here we show, using the yeast MPRA dataset, first, that our CNN regression architecture is capable of learning regulatory activity from MPRA datasets, and even surpasses the baseline established by van Dijk et al. using physical models, and second, that our proposed modified WGAN architecture is capable of generating DNA sequence \textit{de novo} using the measured regulatory activities of known sequences as input. As the Sharpr-MPRA dataset is much larger and more difficult to predict, we are still in the process of finding the optimal approach for predicting and generating sequences on it, and this is where the next steps for this project are primarily oriented. Training the WGAN with various $\gamma$ values, and annealing the $\gamma$ parameter over time, are among the strategies we are focusing on next. 

Additionally, the suite of evaluation tools we have built can be applied to any generated DNA sequences, and comparing against existing sequence generation approaches\cite{Killoran2017,fbgan,dbas} is also something we plan to do in the immediate future. We are also interested in establishing a more straightforward, non-GAN baseline by setting up an MLE method for sequence generation, based on the promise such approaches have shown in the language generation literature.

\section{Contributions}

N.F., A.K., and G.K.M. conceived the project. N.F. carried out deep learning model generation and data analysis with input from G.K.M. and A.K. N.F. prepared the manuscript with input and oversight from G.K.M. and A.K.

\begin{thebibliography}{100}

% \section*{References}

\input{references}

\end{thebibliography}

\end{multicols}

% \clearpage

% \setcounter{table}{0}
% \renewcommand{\tablename}{Supplementary Table}
% \setcounter{figure}{0}
% \renewcommand{\figurename}{Supplementary Figure}

% \setcounter{page}{1}
% \renewcommand\thepage{{SM }\arabic{page}}

% \begin{center}
% {\LARGE \textbf{Supplementary Materials}}
% \end{center}

% \section*{Supplementary Tables}

% \section*{Supplementary Figures}

\end{document}
