\documentclass[10pt]{article}
\usepackage{marginnote}
\usepackage[paperheight=25cm,paperwidth=18cm,lmargin=1.7cm,rmargin=1.7cm,top=2.2cm,bottom=2cm,marginparwidth=3.2cm,marginparsep=-3.2cm]{geometry}
% \usepackage[paperheight=25cm,paperwidth=18cm,lmargin=1.7cm,rmargin=1.7cm,top=2.2cm,bottom=2cm]{geometry}
\setlength\columnsep{30pt}
\usepackage{multicol}
\usepackage{amsmath}
\usepackage{mathtools}
\usepackage{amsthm}
\usepackage{array}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage[auth-sc]{authblk}
\usepackage{longtable}
\usepackage{multirow}
\usepackage{hyperref}
\usepackage{enumerate}
\usepackage[labelfont=bf]{caption}
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage{mdframed}
\usepackage{graphics}
\usepackage{multirow}
\usepackage{rotating}
\usepackage{array}
\usepackage{lscape}
\usepackage{caption}
\usepackage{breakurl}
\usepackage{todonotes}
\usepackage{hanging}
\usepackage[final]{pdfpages}
\usepackage[leftFloats,CaptionAfterwards]{fltpage}
\usepackage[numbers,sort&compress]{natbib}
\setlength{\bibsep}{3pt}
\usepackage{abstract}
\usepackage{enumitem}
\usepackage{titlesec}
\usepackage{etoolbox}
\usepackage{soul}
\patchcmd{\thebibliography}{\section*}{\section}{}{}
\titleformat{\section}[block]{\bf \fontfamily{phv}\selectfont{\Large\bfseries\filcenter}}{\thesection}{0.6em}{}
\titleformat{\subsection}[block]{\bf\fontfamily{phv}\selectfont{\normalsize\bfseries\filcenter}}{\thesubsection}{0.4em}{}

\hypersetup{
  colorlinks,
  citecolor=Blue,
  linkcolor=Red,
  urlcolor=Violet}
  
\usepackage{helvet}

\usepackage{titlesec}

% \titleformat{\section}[leftmargin]{\normalfont\sffamily\bfseries\filleft}{}{0pt}{}
% \titlespacing{\section}{4pc}{1.5ex plus .1ex minus .2ex}{1pc}

\makeatletter
\def\@biblabel#1{\@ifnotempty{#1}{#1.}}
\makeatother

\newenvironment{Figure}
{\par\medskip\noindent\minipage{\linewidth}}
{\endminipage\par\medskip}

\makeatletter
\renewcommand{\maketitle}{\bgroup\setlength{\parindent}{0pt}
\begin{flushleft}
  \textbf{\@title}
  \@author
\end{flushleft}\egroup
}
\makeatother


\title{\bf \begin{flushleft}\fontfamily{phv}\selectfont{\Large Genome-wide mapping of active regulatory elements using ATAC-seq}
\end{flushleft}}
\renewcommand\Authfont{\normalsize}
\author[1,*,$\#$]{\fontfamily{phv}\selectfont{\textbf{Georgi K. Marinov}}}
\author[1,*]{\fontfamily{phv}\selectfont{\textbf{Zohar Shipony}}}
\author[1,2]{\fontfamily{phv}\selectfont{\textbf{Anshul Kundaje}}}
\author[1,3,4,5,$\#$]{\fontfamily{phv}\selectfont{\textbf{William J. Greenleaf}}}
\renewcommand\Affilfont{\itshape\normalsize}
\affil[1]{Department of Genetics, Stanford University, Stanford, CA 94305, USA}
\affil[2]{Department of Computer Science, Stanford University, Stanford, CA 94305, USA}
\affil[3]{Center for Personal Dynamic Regulomes, Stanford University, Stanford, California 94305, USA}
\affil[4]{Department of Applied Physics, Stanford University, Stanford, California 94305, USA}
\affil[5]{Chan Zuckerberg Biohub, San Francisco, California, USA}
\affil[*]{These authors contributed equally to this work}
\affil[$\#$]{Corresponding authors}
\date{}

\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]}
\let\endchangemargin=\endlist 

\theoremstyle{definition}
\newtheorem{note}{}

\begin{document}
\maketitle

\renewcommand{\abstractname}{\noindent\fontfamily{phv}\selectfont{\centerline{}
Abstract}}

\renewenvironment{abstract}
 {\small
  \begin{flushleft}
  \bfseries \noindent{\large\abstractname}\par\nobreak\smallskip\vspace{-.5em}\vspace{0pt}
  \end{flushleft}
  \list{}{
    \setlength{\leftmargin}{.0cm}%
    \setlength{\rightmargin}{\leftmargin}%
  }%
  \item\relax}
 {\endlist}
 
% \renewenvironment{abstract}
%   {\small\quotation
%   {\bfseries\noindent{\large\abstractname}\par\nobreak\smallskip}}
%   {\endquotation}

\renewcommand{\figurename}{Fig.}

\centerline{}
\begin{abstract}
\noindent{\normalsize Active \textit{cis-}regulatory elements (cREs) in eukaryotes are characterized by nucleosomal depletion and, accordingly, higher accessibility. This property has turned out to be immensely useful for identifying cREs genome-wide and tracking their dynamics across different cellular states, and is the basis of numerous methods taking advantage of the preferential enzymatic cleavage/labeling of accessible DNA. ATAC-seq (\textbf{A}ssay for \textbf{T}ransposase-\textbf{A}ccessible \textbf{C}hromatin using \textbf{seq}uencing) has emerged as the most versatile and adaptable such method, and has been widely adopted as the standard tool for mapping open chromatin regions. Here, we discuss the current optimal practices and important considerations for carrying out ATAC-seq experiments, primarily in the context of mammalian systems. 
\centerline{}
\centerline{}
\indent\indent\textbf{Key words:} Enhancers, Promoters, Chromatin accessibility, ATAC-seq, High-throughput sequencing}
\end{abstract}
\centerline{}
\centerline{}
\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm}

\reversemarginpar\marginpar{\section{Introduction}}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig1.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Outline of the ATAC-seq assay}. Nuclei are isolated from cells and chromatin is incubated with an active Tn5 transposase carrying PCR amplification adapter sequences. Tn5 preferentially inserts into accessible chromatin, such as that found at active regulatory elements. After transposition, DNA is purified and PCR amplification is carried out from the primer landing sites deposited by Tn5.}
\label{Fig1}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig2.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Structure of an ATAC-seq library}. (A) After transposition, an original DNA fragment is flanked by two Tn5 molecules with their adapters. Note that all three possible configurations -- A-A, B-B, and A-B/B-A (where ``A'' and ``B'' indicate the two different adapters carried by the Tn5 molecules used for transposition; these sequences share a common ``ME'' segment) -- are produced, but only the A-B ones can be subsequently amplified and sequenced under conventional protocols. The A and B adapters are used as landing sites for the PCR primers that add the i5 and i7 barcodes and the P5 and P7 sequences needed for Illumina sequencing. (B) Typical sequences of the A and B adapters and of the i5 and i7 PCR primers. The [i7] and [i5] sequences are typically 8 bp long, and should be chosen so as to maximize the sequence distance between each pair of indexes.}
\label{Fig2}
\end{center}
\end{figure*}

Eukaryotic chromatin is generally packaged by nucleosomes, octamer particles composed of the four core nucleosomal histones H3, H4, H2A and H2B \cite{Luger1997}. Nucleosomal packaging has an inhibitory effect on transcriptional activity and prevents the binding of most transcription factors and other regulatory proteins. Active promoter and enhancer elements differ from the rest of the genome in that they usually exist in a nucleosome-depleted, open-chromatin state. This property is highly useful in practice because, just as regulatory factors can access active cREs, so can various enzymes whose action is otherwise inhibited by nucleosome particles. That enhancers and promoters exhibit this property was already appreciated nearly four decades ago, when their hypersensitivity to cleavage by DNase enzymes was first reported \cite{Wu1980,Keene1981,McGhe1981}. 

DNase remained the main tool for mapping active cREs into the genomic era, initially coupled to microarray readouts \cite{Dorschner2004,Sabo2004,Sabo2006}, and eventually adapted to a high-throughput sequencing format \cite{Crawford2006,Boyle2008,Thurman2012}. In parallel to these developments as well as more recently, a wide variety of alternative methods taking advantage of the preferential enzymatic/chemical cleavage/modification of accessible DNA were also developed, employing methyltransferases \cite{Kelly2012,Krebs2017,Shipony2018,Wang2019,Aughey2018}, restriction enzymes \cite{Chereji2019}, nicking enzymes \cite{Ponnaluri2017}, small molecules \cite{Umeyama2017}, viral integration \cite{Timms2019}, and others. 

ATAC-seq, which is based on the preferential insertion into unprotected DNA by a hyperactive mutant version of the Tn5 transposase \cite{Buenrostro2013} (Figure \ref{Fig1}), has emerged as the most convenient, widely adaptable and straightforward to execute method for profiling open chromatin. Treatment of chromatin with Tn5 results in the insertion into accessible DNA of adapters that then enable the direct amplification of open chromatin fragments. This eliminates much of the complex series of enzymatic steps that are unavoidable features of previous methods such as DNase-seq, allows the protocol to be completed in just a few hours, and dramatically lowers the input requirements, down to a few tens of thousands of cells in bulk reactions, while also enabling single-cell (scATAC) assays \cite{Buenrostro2015,Cusanovich2015}.

In this chapter, we describe the most important considerations for carrying out successful ATAC-seq experiments in the context of the Omni-ATAC protocol, an optimized version of the ATAC-seq assay that produces high-quality ATAC libraries for most mammalian cell lines and cell types, as well as for a number of other eukaryotes.

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section{Materials}}

Prepare a master stock of the ATAC-RSB buffer without detergents in a large volume (e.g. 50 mL) and store it at 4$\,^{\circ}\mathrm{C}$. 

Prepare master stocks of 2$\times$ TD buffer (e.g. in 2-mL tubes) and keep those at $-$20$\,^{\circ}\mathrm{C}$.

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Transposition buffers and reagents}}}

Prepare the ATAC-RSB-Lysis and ATAC-RSB-Wash buffers immediately before use by adding the necessary detergents; keep on ice.

\begin{enumerate}
\item IGEPAL CA-630 detergent (Sigma Cat\# 11332465001; supplied as a 10\% solution)
\item Tween-20 detergent (Sigma Cat\# 11332465001, supplied as a 10\% solution; store at 4$\,^{\circ}\mathrm{C}$)
\item Digitonin detergent (Promega Cat\# G9441, supplied as a 2\% solution in DMSO; store at $-$20$\,^{\circ}\mathrm{C}$)

\item ATAC-RSB buffer (master stock) \\
\hspace*{20pt}10 mM Tris-HCl pH 7.4 \\
\hspace*{20pt}10 mM NaCl \\
\hspace*{20pt}3 mM MgCl$_2$

\item ATAC-RSB-Lysis buffer \\
\hspace*{20pt}10 mM Tris-HCl pH 7.4 \\
\hspace*{20pt}10 mM NaCl \\
\hspace*{20pt}3 mM MgCl$_2$ \\
\hspace*{20pt}0.1\% IGEPAL CA-630\\
\hspace*{20pt}0.1\% Tween-20 \\
\hspace*{20pt}0.01\% Digitonin

\item ATAC-RSB-Wash buffer \\
\hspace*{20pt}10 mM Tris-HCl pH 7.4 \\
\hspace*{20pt}10 mM NaCl \\
\hspace*{20pt}3 mM MgCl$_2$ \\
\hspace*{20pt}0.1\% Tween-20

\item 2$\times$ TD buffer \\
\hspace*{20pt}20 mM Tris-HCl pH 7.6 \\
\hspace*{20pt}10 mM MgCl$_2$ \\
\hspace*{20pt}20\% Dimethyl Formamide

\item Tn5 transposase (\textit{see} \textbf{Note \ref{Tn5}})
\end{enumerate}
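As a worked example of the buffer preparation above, the stock volumes needed follow from $C_{\mathrm{stock}}V_{\mathrm{stock}} = C_{\mathrm{final}}V_{\mathrm{final}}$. The sketch below assumes a final volume of 1 mL of ATAC-RSB-Lysis buffer; this volume is an illustrative choice and is not prescribed by the protocol:

\begin{align*}
V_{\mathrm{IGEPAL}} &= \frac{0.1\% \times 1000~\mu\mathrm{L}}{10\%} = 10~\mu\mathrm{L}, \\
V_{\mathrm{Tween\text{-}20}} &= \frac{0.1\% \times 1000~\mu\mathrm{L}}{10\%} = 10~\mu\mathrm{L}, \\
V_{\mathrm{digitonin}} &= \frac{0.01\% \times 1000~\mu\mathrm{L}}{2\%} = 5~\mu\mathrm{L}.
\end{align*}

The remaining 975 $\mu$L is made up with the ATAC-RSB master stock. Under the same assumption, 1 mL of ATAC-RSB-Wash buffer requires 10 $\mu$L of 10\% Tween-20 and 990 $\mu$L of ATAC-RSB. Adding the detergents in this way dilutes the salts of the master stock slightly (by about 2.5\% in the lysis buffer).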

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Library building, sequencing and quality evaluation}}}

\begin{enumerate}
\item 200-$\mu$L PCR tubes
\item Sequencing primers/adapters (\textit{see} \textbf{Note \ref{Adapters}}) 
\item NEBNext High-Fidelity 2$\times$ PCR Master Mix (NEB, Cat\# M0541S)
\item Qubit fluorometer or equivalent
\item Qubit tubes
\item Qubit dsDNA HS Assay Kit
\item TapeStation (Agilent) or equivalent, e.g. BioAnalyzer (Agilent)
\item TapeStation D1000 tape and reagents (Agilent) 
\item 10 mM dNTP mix 
\item 25$\times$ SYBR Green (Thermo Fisher Cat\# S7563; supplied as 10,000$\times$)
\item Phusion High-Fidelity DNA Polymerase (NEB, Cat\# M0530L) 
\end{enumerate}
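The 25$\times$ SYBR Green working stock is obtained from the 10,000$\times$ commercial stock by a 400-fold dilution. As an illustration (the volumes are an example only, and DMSO is assumed as the diluent because the reaction tables in this protocol list the working stock as being in DMSO):

\[
\frac{10{,}000\times}{25\times} = 400, \qquad \text{e.g. } 1~\mu\mathrm{L}~\text{stock} + 399~\mu\mathrm{L}~\text{DMSO}.
\]

In the 15-$\mu$L qPCR used to determine the number of additional cycles, 0.24 $\mu$L of this working stock then corresponds to a final concentration of $0.24 \times 25 / 15 = 0.4\times$.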

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{General materials and equipment}}}

\begin{enumerate}
\item 1.5-mL microcentrifuge tubes, preferably low protein and DNA binding (\textit{see} \textbf{Note \ref{Tubes}})
\item 2-mL, 15-mL and 50-mL tubes
\item Incubator (37$\,^{\circ}\mathrm{C}$), or a Thermomixer.
\item Tabletop centrifuge
\item Thermal cycler
\item MinElute PCR Purification Kit (Qiagen Cat\# 28004/28006), Zymo DNA Clean and Concentrator Kit (Zymo Cat\#  D4013/D4014), or equivalent (\textit{see} \textbf{Note \ref{DNAPurification}}) 
\item Nuclease-free H$_2$O
\item 1$\times$ PBS buffer solution
\item qPCR machine (StepOne or equivalent)
\end{enumerate}

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section{Methods}}

The general outline of the ATAC-seq assay is shown in Figure \ref{Fig1}. Nuclei are isolated from cells, a transposition reaction is carried out, DNA is purified, and sequencing libraries are prepared. Here we discuss the Omni-ATAC protocol \cite{Corces2017} in its most widely applicable version, as it yields the best results in terms of reduced mitochondrial contamination (\textit{see} \textbf{Note \ref{Mitochondria}}) compared to other versions of the assay \cite{Buenrostro2013,Corces2016}. The Omni-ATAC protocol works as described for the great majority of mammalian and insect cell lines, as well as for many other eukaryotic cells without cell walls. Different protocols need to be applied for nuclei isolation from other sources, such as tissues (\textit{see} \textbf{Note \ref{tissue}}), plant cells (\textit{see} \textbf{Note \ref{plants}}), various small metazoan animals (\textit{see} \textbf{Note \ref{Celegans}}), yeast (\textit{see} \textbf{Note \ref{yeast}}), and others. 

It is also in principle possible to carry out ATAC-seq on crosslinked material but this generally produces suboptimal results and we advise against it (\textit{see} \textbf{Note \ref{xlink}}). 

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Removal of non-viable cells (optional)}}}

The presence of non-viable cells can negatively affect the quality of final ATAC-seq libraries as dead cells generate a general background of dechromatinized DNA, decreasing the enrichment for open chromatin regions. Two strategies are usually used to address this problem:

\begin{enumerate}
\item If the fraction of dead cells is not too high (i.e. $\sim$5--15\%), cells are treated with DNase (200 U/mL) in culture media, usually for 30 minutes at $37\,^{\circ}\mathrm{C}$. Cells are then washed thoroughly with 1$\times$ PBS to remove the DNase. 
\item If the fraction of dead cells is higher, live cells can be separated from dead cells using a Ficoll gradient (Sigma Cat\# GE17-1440-02), with the exact conditions varying depending on the cell type.
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Preparation of nuclei}}}

Once the quality of the input cells has been ensured, the next step is to prepare nuclei and transpose them. The empirically determined optimum input number of cells for a species with a mammalian-sized genome is 50,000 diploid cells. Scale appropriately according to expected genome size and ploidy, and also change other parameters, such as centrifugation speeds, if necessary.
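One simple way to scale the input, under the assumption that the aim is to keep the total amount of genomic DNA per reaction roughly constant (this is our assumption; the optimum for a new system should still be determined empirically), is

\[
N_{\mathrm{cells}} \approx 50{,}000 \times \frac{2\,G_{\mathrm{ref}}}{p\,G},
\]

where $G_{\mathrm{ref}}$ is the haploid size of a typical mammalian genome (roughly 3 Gb), $G$ is the haploid genome size of the species of interest, and $p$ is its ploidy. These symbols are introduced here for illustration only. For example, for a hypothetical diploid species with a 300-Mb haploid genome this gives $N_{\mathrm{cells}} \approx 500{,}000$.
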

\begin{enumerate}
\item Centrifuge 50,000 viable cells at 500 $g$ for 5 min at $4\,^{\circ}\mathrm{C}$.
\item Carefully aspirate the supernatant avoiding the pellet. 
\item Add 50 $\mu$L of cold ATAC-RSB-Lysis Buffer and pipette up and down several times.
\item Incubate on ice for 3 minutes.
\item Add 1 mL cold ATAC-RSB-Wash Buffer, and invert several times to mix well. 
\item Centrifuge at 500 $g$ for 5 min at $4\,^{\circ}\mathrm{C}$.
\item Carefully aspirate the supernatant as fully as possible while avoiding the pellet.
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Transposition}}}

Carry out transposition as follows:

\begin{enumerate}
\item Immediately resuspend the pellet in the transposase reaction mix:\\
\hspace*{20pt}25 $\mu$L 2$\times$ TD buffer\\
\hspace*{20pt}2.5 $\mu$L Tn5\\
\hspace*{20pt}22.5 $\mu$L nuclease-free H$_2$O

\item Incubate at $37\,^{\circ}\mathrm{C}$ for 30 min in a Thermomixer at 1000 RPM.
\end{enumerate}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig3.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Determination of additional PCR cycles (post pre-amplification) and library quantification using qPCR}. (a) Determination of additional PCR cycles; qPCR is performed to determine the number of extra cycles to perform on the pre-amplified ATAC material without reaching saturation. To determine the number of extra cycles, find the number of cycles needed to reach 1/3 of the maximum relative fluorescence, and then carry out this number of additional PCR cycles. (b) Quantification of libraries; qPCR quantification is performed on diluted ATAC-seq libraries (400$\times$) against a serial dilution of PhiX (200 pM to 1.56 pM). A standard curve is generated based on the PhiX dilutions, and used to calculate the molarity of the ATAC-seq library.}
\label{Fig3}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig4.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Evaluation of ATAC-seq library size distribution}. Shown is the fragment length distribution as evaluated using a TapeStation instrument and a D1000 TapeStation kit for an ATAC-seq library for the human GM12878 cell line. When a clear nucleosomal signature is observed, as in the example shown here, the library is most likely of high quality. Note that the nucleosomal signature can in some cases be obscured by the presence of high levels of mitochondrial contamination or some other source of highly accessible DNA (\textit{see} \textbf{Note \ref{riboDNA}} for further discussion).}
\label{Fig4}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig5.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Expected results from a successful ATAC-seq experiment}. (A) Shown is the insert length distribution of a typical sequenced mammalian ATAC-seq library, showing a prominent subnucleosomal peak, as well as a mononucleosomal and a less pronounced dinucleosomal peak; (B) Aggregate ATAC-seq signal profile around transcription start sites (TSSs); (C) ATAC-seq profile in a 212-kb neighborhood around the human \textit{MYC} gene. The ENCODE Consortium \cite{ENCODE2012} keratinocyte dataset with accession ID ENCSR798IJQ was used for this example. }
\label{Fig5}
\end{center}
\end{figure*}


\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{DNA purification}}}

\begin{enumerate}
\item Immediately stop the reaction using 250 $\mu$L (i.e. 5$\times$ the reaction volume) of PB buffer (if using MinElute) or DNA Binding Buffer (if using Zymo; also \textit{see} \textbf{Note \ref{PB}}). 
\item Purify samples following the kit instructions.
\item Elute with 10 $\mu$L of Elution Buffer.
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{PCR amplification and library generation}}}

Typically, a dual-indexing approach is used when amplifying ATAC-seq libraries. The general structure of an ATAC-seq library as well as the relevant adapter and primer sequences are shown in Figure \ref{Fig2}. \textit{See} \textbf{Note \ref{Adapters}} for further discussion.

\begin{enumerate}
\item Set up a PCR reaction as follows:\\
\hspace*{20pt}10 $\mu$L eluted transposition reaction\\
\hspace*{20pt}10 $\mu$L Nuclease-free H$_2$O\\
\hspace*{20pt}2.5 $\mu$L of Adapter 1\\
\hspace*{20pt}2.5 $\mu$L of Adapter 2\\
\hspace*{20pt}25 $\mu$L NEBNext High-Fidelity 2$\times$ PCR Master Mix (\textit{see} \textbf{Note \ref{HotStart}})

\item Optimization of PCR conditions, pre-amplification. Amplify DNA for 5 cycles as follows: \\
\hspace*{20pt}$72\,^{\circ}\mathrm{C}$ for 3 minutes \\
\hspace*{20pt}$98\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}5 cycles of: \\
\hspace*{40pt}$98\,^{\circ}\mathrm{C}$ for 10 seconds \\
\hspace*{40pt}$63\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{40pt}$72\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}Hold at $4\,^{\circ}\mathrm{C}$

\item Determining additional cycles using qPCR. Use 5 $\mu$L of the pre-amplified reaction in a total qPCR reaction of 15 $\mu$L as follows: \\
\hspace*{20pt}3.76 $\mu$L nuclease-free H$_2$O\\
\hspace*{20pt}0.5 $\mu$L of Adapter 1\\
\hspace*{20pt}0.5 $\mu$L of Adapter 2\\
\hspace*{20pt}0.24 $\mu$L 25$\times$ SYBR Green (in DMSO)\\
\hspace*{20pt}5 $\mu$L NEBNext High-Fidelity 2$\times$ PCR Master Mix\\
\hspace*{20pt}5 $\mu$L pre-amplified sample \\

\item Determining additional cycles using qPCR. Run the qPCR reaction with the following settings in a qPCR machine: \\
\hspace*{20pt}$98\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}20 cycles of: \\
\hspace*{40pt}$98\,^{\circ}\mathrm{C}$ for 10 seconds \\
\hspace*{40pt}$63\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{40pt}$72\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}Hold at $4\,^{\circ}\mathrm{C}$

\item Assess the amplification profiles and determine the required number of additional cycles to amplify. Typical results are shown in Figure \ref{Fig3}.

\item Carry out final amplification by placing the remaining 45 $\mu$L in a thermocycler and running the following program:\\
\hspace*{20pt}$N_{add}$ cycles of: \\
\hspace*{40pt}$98\,^{\circ}\mathrm{C}$ for 10 seconds \\
\hspace*{40pt}$63\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{40pt}$72\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}Hold at $4\,^{\circ}\mathrm{C}$

Where $N_{add}$ is the number of additional cycles.

In practice, we have found that 8-10 cycles are usually sufficient to amplify a standard mammalian ATAC library, and, if a very large number of samples are being processed at a time, the following reaction can be run:

\item Single-step PCR.  \\
\hspace*{20pt}$72\,^{\circ}\mathrm{C}$ for 3 minutes \\
\hspace*{20pt}$98\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}8-10 cycles of: \\
\hspace*{40pt}$98\,^{\circ}\mathrm{C}$ for 10 seconds \\
\hspace*{40pt}$63\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{40pt}$72\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}Hold at $4\,^{\circ}\mathrm{C}$

\item Purify the amplified library following the same procedure used for purifying the transposition reaction.
\end{enumerate}
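The rule used to choose $N_{add}$ from the qPCR amplification curve (Figure \ref{Fig3}a) can be written compactly as follows. The symbols $R_n$ and $R_{\max}$ are notation introduced here for illustration and are not part of the original protocol:

\[
N_{add} = \min\left\{\, n \;:\; R_n \geq \tfrac{1}{3} R_{\max} \right\},
\]

where $R_n$ is the relative fluorescence after $n$ qPCR cycles and $R_{\max}$ is the maximum (plateau) relative fluorescence. For example, for a hypothetical curve that plateaus at 3,000 units and first reaches 1,000 units at cycle 4, $N_{add} = 4$, and the library receives $5 + N_{add} = 9$ amplification cycles in total, counting the 5 pre-amplification cycles.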

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Library quantification and evaluation of library quality}}}

Before libraries can be sequenced, they need to be properly quantified and their quality evaluated. There are two components to this process -- first, evaluation of the insert distribution, and second, quantification. 

\begin{enumerate} 
\item Examination of library size distribution. This step can be carried out using a variety of instruments that are now available for this purpose, such as a TapeStation or a BioAnalyzer. In our practice we prefer to use a TapeStation (with the D1000 or HS D1000 kits) due to its ease of use, flexibility and rapid turnaround time. Typical results are shown in Figure \ref{Fig4}. A successful mammalian ATAC-seq library usually exhibits a clear nucleosomal signature (though the reverse is not always true; \textit{see} \textbf{Note \ref{riboDNA}} for further discussion). 

\item Quantification of library concentration. For most high-throughput sequencing applications, this step is routinely carried out using a Qubit fluorometer. This works well for most libraries as they exhibit a unimodal fragment length distribution, and the Qubit generally returns highly accurate and reliable measurements. 

However, ATAC libraries do not exhibit a unimodal fragment distribution and in fact often contain fragments longer than what can be sequenced on standard Illumina instruments. Thus the effective library concentration often differs from the apparent library concentration measured using Qubit (though Qubit measurements can still be used, with that caveat in mind, if no other information can be obtained).

\item Estimation of effective library concentration using qPCR. 

A standard curve is generated using Illumina PhiX standard (10 nM) by first making a 50$\times$ dilution to 200 pM, from which seven additional serial 2$\times$ dilutions are made (to 100 pM, 50 pM, 25 pM, 12.5 pM, 6.25 pM, 3.125 pM, and 1.56 pM).

Set up 20-$\mu$L qPCR reactions as follows: \\

\hspace*{20pt}7.9 $\mu$L nuclease-free H$_2$O\\
\hspace*{20pt}5 $\mu$L ATAC-seq 400$\times$ diluted library or PhiX standards\\
\hspace*{20pt}4 $\mu$L Phusion HF Buffer\\
\hspace*{20pt}1 $\mu$L 25 $\mu$M i7 primer\\
\hspace*{20pt}1 $\mu$L 25 $\mu$M i5 primer\\
\hspace*{20pt}0.4 $\mu$L 10 mM dNTP mix\\
\hspace*{20pt}0.5 $\mu$L 25$\times$ SYBR Green (in DMSO)\\
\hspace*{20pt}0.2 $\mu$L NEB Phusion HF \\

Run the qPCR reaction with the following settings in a qPCR machine: \\

\hspace*{20pt}$98\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}20 cycles of: \\
\hspace*{40pt}$98\,^{\circ}\mathrm{C}$ for 10 seconds \\
\hspace*{40pt}$63\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{40pt}$72\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}Hold at $4\,^{\circ}\mathrm{C}$ \\

Create a standard curve based on the PhiX dilutions and use it to estimate the true molarity of the ATAC-seq library.

Commercial kits such as NEBNext Library Quant Kit for Illumina or KAPA Library Quantification Kits can also be used, in a similar manner.

\end{enumerate}
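The standard-curve calculation can be sketched as follows, assuming a log-linear fit. The symbols $a$, $b$, $c$ and $C_t$ are notation introduced here, and the fitting procedure itself is not specified in the protocol. For the PhiX standards with known concentration $c$ (in pM) and measured threshold cycle $C_t$, fit

\[
C_t = a + b \log_2 c ,
\]

where the slope $b$ is expected to be close to $-1$ at near-perfect amplification efficiency, since each 2$\times$ dilution delays the threshold by about one cycle. The concentration of the undiluted ATAC-seq library is then estimated as

\[
c_{\mathrm{lib}} = 400 \times 2^{\,(C_{t,\mathrm{lib}} - a)/b},
\]

where the factor of 400 accounts for the dilution of the library before qPCR. For instance, a diluted library whose $C_t$ coincides with that of the 12.5 pM standard corresponds to an undiluted library of $400 \times 12.5~\mathrm{pM} = 5~\mathrm{nM}$.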

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Sequencing}}}

The protocol described here generates libraries designed to be sequenced on Illumina sequencers. 
A decision usually needs to be made regarding the format to be used when sequencing. 

We strongly advise against sequencing ATAC-seq libraries in a single-end format, for two reasons. First, analysis of the fragment length distribution is an important part of the quality evaluation of ATAC-seq datasets, and this is only truly possible in paired-end format. Second, many analyses of ATAC-seq data (e.g. transcription factor footprinting) operate at the level of examining insertion points rather than read coverage; paired-end reads produce twice as many such insertion points for the same cost.

In practice, we have observed that ATAC-seq insert length distributions peak around 50 to 60 bp (Figure \ref{Fig5}). Therefore it is most cost-effective to sequence ATAC libraries in 2$\times$36 bp or 2$\times$50 bp formats (depending on the exact sequencer and kits available), as sequencing kits with more cycles are usually priced significantly higher.

However, for some applications (for example, if aiming to study the effects of sequence variation on chromatin accessibility), longer reads can provide important additional information.

The sequencing format should therefore be chosen according to the specific needs of the study being carried out.

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\centerline{}

\reversemarginpar\marginpar{\section{Expected results}}

After sequencing, reads are mapped to the reference genome, and several quality evaluation metrics are considered before proceeding with downstream analysis. A typical ATAC-seq library exhibits a nucleosomal signature as shown in Figure \ref{Fig5}A. Enhancer, promoter and insulator regions should be strongly enriched relative to the rest of the genome (Figure \ref{Fig5}C shows the accessibility profile in the neighborhood of the \textit{MYC} gene, which highlights a number of candidate distal regulatory elements). A quick way to evaluate the degree of enrichment is to examine aggregate plots of ATAC-seq signal around annotated TSSs, as shown in Figure \ref{Fig5}B. This can be further formalized as a TSS ratio score, calculated by dividing the average number of fragments within $\pm$100 bp of the TSS by the sum of the average numbers of fragments within the two 100-bp windows located 2 kb upstream and 2 kb downstream of the TSS. The advantage of this metric is that it is independent of peak calling and sequencing depth; its disadvantage is that it is annotation-dependent and well calibrated only for the human and mouse genomes. For these two genomes, good ATAC-seq libraries usually exhibit TSS ratios $\geq$8. 
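
Written out explicitly, with notation introduced here for illustration only, the TSS ratio score described above is

\[
\mathrm{TSS~ratio} = \frac{\bar{F}_{\mathrm{TSS}}}{\bar{F}_{-2\,\mathrm{kb}} + \bar{F}_{+2\,\mathrm{kb}}},
\]

where $\bar{F}_{\mathrm{TSS}}$ is the number of fragments within $\pm$100 bp of the TSS, averaged over all annotated TSSs, and $\bar{F}_{-2\,\mathrm{kb}}$ and $\bar{F}_{+2\,\mathrm{kb}}$ are the corresponding averages in the 100-bp windows 2 kb upstream and downstream. The numerator window spans 200 bp, matching the combined length of the two flanking windows, so the score compares equal amounts of sequence. As a hypothetical example, $\bar{F}_{\mathrm{TSS}} = 40$ with flanking averages of 2 and 3 gives a TSS ratio of $40/(2+3) = 8$.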

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\centerline{}

\reversemarginpar\marginpar{\section{Notes}}

\begin{enumerate}

\item \label{Tn5}The Tn5 transposase can be obtained as part of the various Nextera DNA Library Prep kits offered by Illumina, and also from several other commercial vendors. It can also be made by individual laboratories following previously published protocols \cite{Picelli2014}. The latter approach is, although laborious, the most cost-effective, especially for large-scale projects. If homemade Tn5 is used, its activity should ideally be well characterized relative to standard enzymatic formulations.

\item \label{Adapters}PCR and indexing primers/adapters supplied with the Nextera DNA Library Prep kits offered by Illumina can be used. Alternatively, or if a larger number of indexing sequences is needed, custom-designed and synthesized oligos can also be used with equivalent success. The structure of an ATAC-seq library with the relevant sequences is shown in Figure \ref{Fig2}. The i7 primer sequence is:

\begin{verbatim}
5'-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3'
\end{verbatim}

The i5 sequence is:

\begin{verbatim}
5'-AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC-3'
\end{verbatim}

Where \verb|[i7]| and \verb|[i5]| are the index sequences (typically 8-bp long). Dissolve and dilute to 25 $\mu$M. 

\item \label{HotStart}The initial extension is very important when amplifying transposed DNA as it is needed to fill in the gap left from the transposition itself (see Figure \ref{Fig2}) and allow PCR primers to land in subsequent amplification cycles. For this reason, it is not recommended to use hot-start polymerase mixes, in which the polymerase is only activated by exposure to denaturation temperatures.

\item \label{Tubes}Low-binding tubes are preferable, though not absolutely required, as a low number of cells (only $\sim$50,000) is usually used as input to an ATAC reaction.

\item \label{DNAPurification}The MinElute kit can be replaced with other DNA purification kits; for example, we have also had equivalent success using the DNA Clean \& Concentrator from Zymo. The important variables regarding the DNA isolation procedure after transposition are the efficiency of recovery and the lower size limit of the recovered fragments. The insert length distribution of most ATAC-seq libraries peaks around 50-60 bp, i.e. even including Tn5 adapter sequences, many of the informative fragments are shorter than 90-100 bp, and should ideally be preserved during the DNA purification procedure.

\item \label{Mitochondria} Early versions of the ATAC-seq protocol \cite{Buenrostro2013} exhibited very high proportions of reads originating from the mitochondrial genome, often exceeding 80\% of the total. This is because the mitochondrial genome is not packaged by nucleosomes and is therefore highly accessible to transposase insertion. Decreasing the fraction of mitochondrial reads has been a key part of the improvement of the ATAC-seq protocol in its currently used variants relative to the original version, and has been achieved by adding a combination of digitonin, Tween-20 and IGEPAL CA-630 detergents during the cell lysis and nuclei preparation step. As a result, ATAC-seq libraries generated using modern protocols frequently contain $\leq$5\% mitochondrial reads for many cell types. 

We do note, however, that there are cells that simply contain an extremely high number of mitochondria due to very high levels of metabolic activity (e.g. some cancer cell lines), and even with the optimized Omni-ATAC protocol mitochondrial fractions are still quite high for them. These are special cases though.

We also note that high levels of mitochondrial contamination do not necessarily correspond to poor-quality ATAC-seq datasets in terms of signal-to-noise ratios in the nuclear genome. We have found no inverse correlation between the fraction of reads mapping to the mitochondrial genome and the levels of enrichment for open chromatin regions. The key benefit of eliminating mitochondrial reads is to reduce sequencing costs as fewer overall reads need to be sequenced to achieve the necessary coverage over the nuclear genome.

\item \label{tissue} Nuclei isolation from tissues, especially when frozen, is a multi-step procedure involving tissue homogenization by douncing followed by density gradient centrifugation. The reader is referred to \cite{Corces2017} for more details.

\item \label{plants} Plant cells are a challenging system to isolate nuclei from because of their thick cellulose cell walls. Nuclei are isolated by grinding tissue material in liquid nitrogen \cite{Lu2017,Maher2018,Bajic2018} and then sorting nuclei by sucrose sedimentation or by FACS, if the cell type of interest has been labeled accordingly, e.g. using the INTACT approach \cite{Deal2010}.

\item \label{Celegans} It is also often necessary to carry out homogenization followed by nuclei isolation when working with whole animals, e.g. \textit{C. elegans} \cite{Daugherty2017}, with the exact protocol optimized according to the specifics of the organism being studied.

\item \label{yeast} Yeast (and fungal cells in general) have thick cell walls composed of polysaccharides, lipids and chitin in various proportions. These walls present a barrier to the access of Tn5 to the nucleus; ATAC-seq protocols tailored to such cells therefore involve treatment with zymolyase or chitinase enzymes \cite{Schep2015}, with the exact details varying depending on the species studied.

\item \label{xlink} It is in principle possible to carry out ATAC-seq experiments and obtain enriched libraries from crosslinked sources. This is in fact how a number of scATAC-seq studies have been executed in recent years \cite{Cusanovich2018,Cao2018}. However, ATAC-seq libraries generated from crosslinked cells are generally suboptimal: they have a lower signal-to-noise ratio than standard ATAC-seq datasets, and they tend to display a pronounced loss of subnucleosomal fragments compared to the standard protocol. We thus advise against using fixed material for ATAC-seq except in special circumstances where this is the only available option.

\item \label{PB} This is also a possible stopping point if necessary. DNA can be stored in PB buffer at $-20\,^{\circ}\mathrm{C}$ before proceeding with the subsequent cleanup steps at a later time.

\item \label{riboDNA} A clear nucleosomal signature in a TapeStation profile indicates, in almost all cases, a successful ATAC-seq experiment. However, the absence of such a signature does not necessarily indicate failure, as the fragment distribution can be dominated by large amounts of strongly accessible DNA in the original sample. For example, libraries with high levels of mitochondrial contamination often exhibit an obscured nucleosomal signature, and so do nearly all yeast libraries (in the latter case because yeast genomes contain a large number of ribosomal DNA copies, which are nearly nucleosome-free when actively transcribed and often comprise half or more of yeast ATAC-seq libraries \cite{Shipony2018}). As discussed in Note \ref{Mitochondria}, high levels of mitochondrial contamination are undesirable in terms of the efficient utilization of sequencing resources but do not necessarily result in poor-quality datasets in the nuclear genome.

\end{enumerate}
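
As a rough illustration of the sequencing-cost argument in Note \ref{Mitochondria} (a sketch under the simplifying assumption that every non-mitochondrial read is a usable nuclear read, with duplicates and unmapped reads ignored; the symbols $f$, $N$ and $T$ are introduced here for illustration only): if a fraction $f$ of reads maps to the mitochondrial genome and $N$ nuclear reads are required, the total number of reads to be sequenced is approximately
\[
T = \frac{N}{1-f}.
\]
For $f = 0.8$ this gives $T = 5N$, whereas for $f = 0.05$ it gives $T \approx 1.05N$, i.e. a nearly five-fold difference in sequencing depth for the same nuclear coverage.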

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}
\centerline{}
\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section*{Acknowledgements}} 

The authors thank members of the Greenleaf and Kundaje labs for many helpful discussions. This work was supported by NIH grants UM1HG009436 and P50HG007735 (to W.J.G.). W.J.G. is a Chan Zuckerberg investigator. Z.S. is supported by EMBO Long-Term Fellowship EMBO ALTF 1119-2016 and by Human Frontier Science Program Long-Term Fellowship HFSP LT 000835/2017-L. G.K.M. was supported by the Stanford School of Medicine Dean's Fellowship. 

\end{changemargin} 

\begin{thebibliography}{100}

\begin{multicols}{2}
\begin{small}

\input{references}

\end{small}
\end{multicols}

\end{thebibliography}

\end{document}