 \documentclass[10pt]{article}
\usepackage{marginnote}
\usepackage[paperheight=25cm,paperwidth=18cm,lmargin=1.7cm,rmargin=1.7cm,top=2.2cm,bottom=2cm,marginparwidth=3.2cm,marginparsep=-3.2cm]{geometry}
% \usepackage[paperheight=25cm,paperwidth=18cm,lmargin=1.7cm,rmargin=1.7cm,top=2.2cm,bottom=2cm]{geometry}
\setlength\columnsep{30pt}
\usepackage{multicol}
\usepackage{amsmath}
\usepackage{mathtools}
\usepackage{amsthm}
\usepackage{array}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage[auth-sc]{authblk}
\usepackage{longtable}
\usepackage{multirow}
\usepackage{hyperref}
\usepackage{enumerate}
\usepackage[labelfont=bf]{caption}
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage{mdframed}
\usepackage{graphics}
\usepackage{multirow}
\usepackage{rotating}
\usepackage{array}
\usepackage{lscape}
\usepackage{caption}
\usepackage{breakurl}
\usepackage{todonotes}
\usepackage{hanging}
\usepackage[final]{pdfpages}
%\usepackage[leftFloats,CaptionAfterwards]{fltpage}
\usepackage[numbers,sort&compress]{natbib}
\setlength{\bibsep}{3pt}
\usepackage{abstract}
\usepackage{enumitem}
\usepackage{titlesec}
\usepackage{etoolbox}
\usepackage{soul}
\patchcmd{\thebibliography}{\section*}{\section}{}{}
\titleformat{\section}[block]{\bf \fontfamily{phv}\selectfont{\Large\bfseries\filcenter}}{\thesection}{0.6em}{}
\titleformat{\subsection}[block]{\bf\fontfamily{phv}\selectfont{\normalsize\bfseries\filcenter}}{\thesubsection}{0.4em}{}

\hypersetup{
  colorlinks,
  citecolor=Blue,
  linkcolor=Red,
  urlcolor=Violet}
  
\usepackage{helvet}

\usepackage{titlesec}

% \titleformat{\section}[leftmargin]{\normalfont\sffamily\bfseries\filleft}{}{0pt}{}
% \titlespacing{\section}{4pc}{1.5ex plus .1ex minus .2ex}{1pc}

\makeatletter
\def\@biblabel#1{\@ifnotempty{#1}{#1.}}
\makeatother

\newenvironment{Figure}
{\par\medskip\noindent\minipage{\linewidth}}
{\endminipage\par\medskip}

\makeatletter
\renewcommand{\maketitle}{\bgroup\setlength{\parindent}{0pt}
\begin{flushleft}
  \textbf{\@title}
  \@author
\end{flushleft}\egroup
}
\makeatother


\title{\bf \begin{flushleft}\fontfamily{phv}\selectfont{\Large Single-molecule mapping of chromatin accessiility using NOMe-seq/dSMF}
\end{flushleft}}
\renewcommand\Authfont{\normalsize}
\author[1,$\#$]{\fontfamily{phv}\selectfont{\textbf{Michaela Hinks}}}
\author[1]{\fontfamily{phv}\selectfont{\textbf{Georgi K. Marinov}}}
\author[1,2]{\fontfamily{phv}\selectfont{\textbf{Anshul Kundaje}}}
\author[3]{\fontfamily{phv}\selectfont{\textbf{Lacramioara Bintu}}}
\author[1,4,5,6,$\#$]{\fontfamily{phv}\selectfont{\textbf{William J. Greenleaf}}}
\renewcommand\Affilfont{\itshape\normalsize}
\affil[1]{Department of Genetics, Stanford University, Stanford, CA 94305, USA}
\affil[2]{Department of Computer Science, Stanford University, Stanford, CA 94305, USA}
\affil[3]{Department of Bioengineering, Stanford University, Stanford, CA 94305, USA}
\affil[4]{Center for Personal Dynamic Regulomes, Stanford University, Stanford, California 94305, USA}
\affil[5]{Department of Applied Physics, Stanford University, Stanford, California 94305, USA}
\affil[6]{Chan Zuckerberg Biohub, San Francisco, California, USA}
% \affil[*]{These authors contributed equally to this work}
\affil[$\#$]{Corresponding authors}
\date{}

\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]}
\let\endchangemargin=\endlist 

\theoremstyle{definition}
\newtheorem{note}{}

\begin{document}
\maketitle

\renewcommand{\abstractname}{\noindent\fontfamily{phv}\selectfont{\centerline{}
Abstract}}

\renewenvironment{abstract}
 {\small
  \begin{flushleft}
  \bfseries \noindent{\large\abstractname}\par\nobreak\smallskip\vspace{-.5em}\vspace{0pt}
  \end{flushleft}
  \list{}{
    \setlength{\leftmargin}{.0cm}%
    \setlength{\rightmargin}{\leftmargin}%
  }%
  \item\relax}
 {\endlist}
 
% \renewenvironment{abstract}
%   {\small\quotation
%   {\bfseries\noindent{\large\abstractname}\par\nobreak\smallskip}}
%   {\endquotation}

\renewcommand{\figurename}{Fig.}

\centerline{}
\begin{abstract}
\noindent\noindent{\normalsize The bulk of gene expression regulation in most organisms is accomplished through the action of transcription factors (TFs) on \textit{cis}-regulatory elements (CREs). In eukaryotes, these CREs are generally characterized by nucleosomal depletion and thus higher physical accessibility of DNA. Many methods exploit this property to map regions of high average accessibility, and thus putative active CREs, in bulk. However, these techniques do not provide information about coordinated patterns of accessibility along the same DNA molecule, nor do they map the absolute levels of occupancy/accessibility. SMF (\textbf{S}ingle \textbf{M}olecule \textbf{F}ootprinting) fills these gaps by leveraging recombinant DNA cytosine methyltransferases (MTase) to mark accessible locations on individual DNA molecules. In this chapter we discuss current methods and important considerations for performing SMF experiments.
\centerline{}
\centerline{}
\indent\indent\textbf{Key words:} Enhancers, Promoters, Chromatin accessibility, SMF, High-throughput sequencing}
\end{abstract}
\centerline{}
\centerline{}
\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm}

\reversemarginpar\marginpar{\section{Introduction}}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig1.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Outline of the NOMe-seq/dSMF assay}. As a first step, nuclei are isolated from cells and chromatin is incubated with the M.CviPI (GpC) and/or M.SssI (CpG) DNA methyltransferases (CpG can usually only be used in biological contexts in which there is no endogenous CpG DNA methylation). DNA is methylated where it is accessible, i.e., where it is not protected by nucleosomes and bound transcription factors. DNA is then purified and fragmented, and chemical or enzymatic conversion is carried out. Three different readout strategies can be applied subsequently -- unbiased whole-genome sequencing (left), targeted enrichment using probe-hybridization pulldown, or amplicon sequencing (see the text for more details). After sequencing, single-molecule accessibility maps are generated based on the methylation status of informative positions along DNA.}
\label{Fig1}
\end{center}
\end{figure*}

% \begin{figure*}[!ht]
% \begin{center}
% \includegraphics[width=15cm]{Fig2.png}
% \captionsetup{singlelinecheck=off,justification=justified}
% \caption{\small \textbf{Structure of NOMe-seq/SMF libraries}. (A). }
% \label{Fig2}
% \end{center}
% \end{figure*}

The development of assays such as ChIP-seq \cite{Johnson2007,Mikkelsen2007} enabled the direct mapping of genome-wide TF binding while methods such as ATAC-seq \cite{Buenrostro2013}, DNase-seq \cite{Crawford2006,Boyle2008}, and MNase-seq \cite{Schones2008} have provided unbiased global mapping of accessible DNA and nucleosome positioning, with open chromatin generally being a proxy indicator of TF occupancy. These methods have enabled identification of CREs and the profiling the average occupancy of TFs across the genome. While powerful, identifying genome-wide TF binding in bulk across tens of thousands of cells is insufficient to fully understand mechanisms of TF action. In contrast, single molecule methods such as NOMe-seq \cite{Kelly2012} (Nucleosome Occupancy and Methylome sequencing) and SMF \cite{Krebs2017} (single-molecule footprinting) enable profiling of accessible DNA and TF occupancy within individual molecules, thus potentially providing invaluable information about binding cooperativity and dependencies between individual accessibility states. The core principle underlying all SMF assays is the use of DNA methyltransferases to deposit methyl groups on accessible DNA, followed by detection of the methylation on individual molecules of interest. 

Several different versions of the SMF assays can be carried out, based on which DNA MTase, DNA methylation detection method, sequencing modality, and sequence enrichment strategy are used. In this chapter, we provide important considerations for performing SMF experiments intended for sequencing on Illumina instruments, either in an unbiased genome-wide or targeted manner.

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section{Materials}}

Prepare a master stock of the ATAC-RSB buffer without detergents in a large volume (e.g. 50 mL) and store it 4$\,^{\circ}\mathrm{C}$. 

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Methylation Buffers and Reagents}}}

Prepare the RSB-Lysis and RSB-Wash buffers immediately before use by adding the necessary detergents; keep on ice.

\begin{enumerate}
\item IGEPAL CA-630 detergent (Sigma Cat\# 11332465001; supplied as a 10\% solution).
\item Tween-20 detergent (Sigma Cat\# 11332465001, supplied as a 10\% solution; store at 4$\,^{\circ}\mathrm{C}$).
\item Digitonin detergent (Promega Cat\# G9441, supplied as a 2\% solution in DMSO; store at -20$\,^{\circ}\mathrm{C}$)).

\item RSB buffer (master stock) \\
\hspace*{20pt}10 mM Tris-HCl pH 7.4 \\
\hspace*{20pt}10 mM NaCl \\
\hspace*{20pt}3 mM MgCl$_2$.

\item RSB-Lysis buffer \\
\hspace*{20pt}10 mM Tris-HCl pH 7.4 \\
\hspace*{20pt}10 mM NaCl \\
\hspace*{20pt}3 mM MgCl$_2$ \\
\hspace*{20pt}0.1\% IGEPAL CA-630\\
\hspace*{20pt}0.1\% Tween-20 \\
\hspace*{20pt}0.01\% Digitonin.

\item Lysis Wash Buffer (RSB-wash) \\
\hspace*{20pt}10 mM Tris-HCl pH 7.4 \\
\hspace*{20pt}10 mM NaCl \\
\hspace*{20pt}3 mM MgCl$_2$ \\
\hspace*{20pt}0.1\% Tween-20.

\item GpC Methyltransferase (M.CviPI) Reaction Buffer (NEB Cat \# B0227SVIAL). This bugger is supplied with GpC Methyltransferase Cat \# M0227S as a 10$\times$ stock without the S-adenosylmethionine). It final composition (1$\times$) is as follows: \\
% \begin{enumerate}
\hspace*{20pt}50 mM NaCl \\
\hspace*{20pt}50 mM Tris-HCl (pH 8.5) \\
\hspace*{20pt}10 mM DTT \\
% \end{enumerate}
\hspace*{20pt}32 mM S-adenosylmethionine (SAM) (NEB Cat \# B9003SVIAL, supplied with all NEB DNA methyltransferase enzymes). SAM is to be added immediately prior to use. Avoid repeated freeze--thawing of SAM as it is an unstable reagent.

\item GpC MTase (M.CviPI) (NEB Cat \# M0227S, supplied at 4,000 units/mL).
\item CpG MTase (M.SssI) (NEB Cat \# M0226S, supplied at 4,000 units/mL).
\item MgCl$_2$ (Thermo Fisher Scientific Cat \# AM9530G). 
\item 2 M Sucrose solution (Sigma Aldrich Cat \# S0389).

\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Library \\ building, \\ sequencing \\ and quality \\ evaluation}}}

\begin{enumerate}
\item Monarch Genomic DNA Purification Kit (NEB, Cat \# T3010L) or equivalent.
\item NEBNext Enzymatic Methyl-seq Kit (EM-seq, NEB, Cat \# E7120L) and associated reagents or EZ-DNA Methylation-Gold Kit (Zymo Research Cat\# D5005 (or equivalent), depending on the exact type of SMF experiment being performed (see more details below).
\item \textit{Optional, required if doing probe hybridization enrichment of genomic locations}: SureSelectXT Methyl-Seq Library Preparation kit (Agilent, Cat\# G9651A) and associated reagents.
\item \textit{Optional, required if doing probe hybridization enrichment of genomic locations}: SureSelectXT Mouse Methyl-Seq target enrichment panel and associated reagents (Agilent, Cat\# 931052) (or equivalent).
\item \textit{Optional, required if doing probe hybridization enrichment of genomic locations} - Dynabeads MyOne Streptavidin T1 (Thermo Fisher Scientific Cat\# 65601)
\item Agencourt AMPure XP Kit (Beckman Coulter Genomics Cat\# A63880).
\item 10 M NaOH, molecular biology grade (Sigma Cat\# 72068).
\item 100\% Ethanol, molecular biology grade (Sigma-Aldrich Cat\# E7023).
\item 1$\times$ Low TE Buffer (10 mM Tris-HCl, pH 8.0, 0.1 mM EDTA) (Thermo Fisher Scientifi Cat\# 12090015).
\item 200-$\mu$L PCR tubes.
\item Sequencing primers/adapters. % (\textit{see} \textbf{Note \ref{Adapters}})
\item NEBNext High-Fidelity 2$\times$ PCR Master Mix (NEB, Cat\# M0541S).
\item Qubit fluorometer or equivalent.
\item QuBit tubes.
\item QuBit dsDNA HS Assay Kit (Thermo Fisher Scientific Cat\# Q328500).
\item TapeStation (Agilent) or equivalent, e.g. BioAnalyzer (Agilent).
\item TapeStation D1000 tape and reagents (Agilent).

\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{General \\ materials \\ and equipment}}}

\begin{enumerate}
\item 1.5-mL microcentrifuge tubes, preferably low protein and DNA binding.
\item 2-mL, 15-mL and 50-mL tubes.
\item Incubator (37$\,^{\circ}\mathrm{C}$), or a ThermoMixer.
\item Tabletop centrifuge.
\item Thermal cycler.
\item MinElute PCR Purification Kit (Qiagen Cat\# 28004/28006), Zymo DNA Clean and Concentrator Kit (Zymo Cat\#  D4013/D4014), or equivalent. % (\textit{see} \textbf{Note \ref{DNAPurification}}) 
\item Nuclease-free H$_2$O.
\item 1$\times$ PBS buffer solution.
\item qPCR machine (StepOne or equivalent).
\item Covaris E220 or equivalent method for shearing genomic DNA (gDNA).
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Software \newline packages}}}

\begin{enumerate}
\item UCSC Genome Browser \cite{Kuhn2013,Kent2010} utilities: \burl{http://hgdownload.cse.ucsc.edu/admin/exe/}.
\item R: \burl{https://www.r-project.org/}.
\item Python (version 2.7 or higher) \burl{https://www.python.org/}.
\item TGL Kmeans: \burl{https://github.com/tanaylab/tglkmeans}.
\item SciPy: \burl{https://www.scipy.org/}.
\item Matplotlib: \burl{https://matplotlib.org/}.
\item Trimmomatic \cite{trimmomatic}: \burl{http://www.usadellab.org/cms/?page=trimmomatic}.
\item Cutadapt \cite{cutadapt}: \burl{https://cutadapt.readthedocs.io/en/stable/}.
\item TrimGalore: \burl{https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/}.
\item \verb|bwa-meth| \cite{bwameth}: \burl{https://github.com/brentp/bwa-meth}.
\item \verb|samtools| \cite{Li2009a}:  \burl{http://www.htslib.org/}
\item \verb|PicardTools|: \burl{https://broadinstitute.github.io/picard/}
\item MethylDackel: \burl{https://github.com/dpryan79/MethylDackel}.
\item Additional scripts: \\
 \burl{https://github.com/georgimarinov/GeorgiScripts}. Contains python scripts used in the examples shown below; some of the scripts depend on having \verb|pysam| (\burl{https://pysam.readthedocs.io/en/latest/index.html}) and \verb|pyBigWig| (\burl{https://github.com/deeptools/pyBigWig}) installed.
\end{enumerate}

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section{Methods}}

The general outline of the dSMF assay is shown in Figure \ref{Fig1}. Nuclei are first isolated from cells, then chromatin is methylated using a 5mC methyltransferase and genomic DNA is purified. Next, base conversion of unmethylated cytosines are converted to uracils is carried out and sequencing libraries are prepared. In most cases a GpC methyltransferase is used, e.g. M.CviPI, which methylates cytosines in a GpC dinucleotide context. This is because the genomes of mammals, plants and many other species contain endogenous methylation in CpG context. However, if endogenous CpG methylation is not present in the samples being analyzed (e.g. yeast, \textit{Drosophila}, specially engineered mammalian cells that lack endogenous methylation \cite{mESCdSMF}, and others), an additional CpG methyltransferase can be used, e.g. M.SssI. This improves the resolution of the assay as the number of informative positions can be increased by a factor of two. Historically, the difference between NOMe-seq \cite{Kelly2012} and dSMF \cite{Krebs2017} (dual-enzyme SMF) has been that the latter uses both enzymes.

There are several ways to create a dSMF sequencing library, including via hybridization-based probe enrichment of genomic regions \cite{mESCdSMF}, targeted PCR amplification of specific loci, or by unbiased whole-genome sequencing of methylated DNA. Here we describe a generalized protocol for creating dSMF libraries following these approaches using commercially available kits. 

We also note that it is possible to carry out SMF on crosslinked material, but we advise that the exact parameters of any such protocol be individually optimized depending on the specifics of the experiment. The protocol described here is for native chromatin.

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Preparation of nuclei}}}

The first step of the SMF procedure is to prepare nuclei for methylation. The nuclei lysis delineated here is different from most previously published SMF protocols and identical to the Omni-ATAC cell lysis procedure \cite{Corces2017} as we have found that optimal and consistent results are obtained that way. It will work well for most mammalian and insect cell lines. Note that tissues and eukaryotic cells with cell walls (e.g. yeast and plant cells) will require different lysis and nuclei isolation procedures.

\begin{enumerate}
\item Count 1$\times$10$^6$ live cells (if working with a mammalian-sized genome) and aliquot into a microcentrifuge tube (\textit{see} \textbf{Note \ref{numbercells}}). 
\item Centrifuge cells at 500 $g$ for 5 min at $4\,^{\circ}\mathrm{C}$
\item Carefully aspirate the supernatant avoiding the pellet. 
\item Wash cells by resuspending in 200 $\mu$L ice cold 1$\times$ PBS.
\item Centrifuge cells at 500 $g$ for 5 min at $4\,^{\circ}\mathrm{C}$.
\item Aspirate supernatant. 
\item Add 200 $\mu$L of cold RSB-Lysis Buffer and pipette up and down several times.
\item Incubate on ice for 3 minutes
\item Add 1.2 mL cold RSB-Wash Buffer and invert several times to mix well. 
\item Centrifuge at 500 $g$ for 10 min at $4\,^{\circ}\mathrm{C}$.
\item Carefully aspirate the supernatant as fully as possible while avoiding the pellet. Proceed immediately to methylation.
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Methylation treatment}}}

Carry out methylation as follows:

\begin{enumerate}
\item Without resuspending, add 100 $\mu$L of CviPI Reaction Buffer to cells.
\item To each sample add 50 $\mu$L of GpC MTase M.CviPI (4 U/$\mu$L). Pipette gently $\sim$6$\times$ to mix (\textit{see} \textbf{Note \ref{MTase-concentration}}).
\item Incubate at $37\,^{\circ}\mathrm{C}$ for 7.5 min in a Thermomixer at 1,000 rpm.
\item Add more GpC MTase. Add additional 25 $\mu$L of low concentration M.CviPI (4 U/$\mu$L) and 2.4 $\mu$L more 32 mM SAM to the same tube, pipette 3$\times$ to mix, and return to $37\,^{\circ}\mathrm{C}$ with shaking for another 7.5 minutes. 
\item (\textit{Optional}, \textit{see} \textbf{Note \ref{CpG}} and discussion above). Add CpG MTase. Add 3 $\mu$L of high concentration (20 U/$\mu$L) M.SssI and 2.4 $\mu$L more SAM to the same tube, pipette 3$\times$ to mix, and return to $37\,^{\circ}\mathrm{C}$ with shaking for another 7.5 minutes.
\item At this point, proceed immediately to DNA purification or freeze cells in methylation solution at $-20\,^{\circ}\mathrm{C}$. 
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{DNA \\ purification}}}

% \begin{enumerate}
% \item Quench MTase by adding 175 $\mu$L of the lysis buffer from the NEB Monarch gDNA extraction kit, along with 3$\mu$L RNase A and 1$\mu$L Proteinase K (supplied with Monarch gDNA kit). (\textit{see} \textbf{Note \ref{gDNA-cleanup}}). 
% \item Incubate at $56\,^{\circ}\mathrm{C}$ for 10min to fully lyse nuclei and degrade protein.
% \item Add to the lysed cell mixture 700$\mu$L of gDNA binding buffer. Vortex to mix.
% \item Add 550$\mu$L to a Monarch gDNA column and centrifuge for 3 minutes at 1000g at room temperature. 
% \item Discard flow-through.
% \item Add remaining 550$\mu$L of lysed cells to a Monarch gDNA column and centrifuge for 3 minutes at 1000g at room temperature. 
% \item Discard flow-through.
% \item Perform one final spin of the column at maximum speed (or $>$10,000g) for 1 minute at room temperature. 
% \item Proceed as directed with manufacturer instructions.
% \item Following elution, quantify gDNA using Qubit. 
% \end{enumerate}

Quench MTase by adding 175 $\mu$L of the lysis buffer from the NEB Monarch gDNA extraction kit, along with 3$\mu$L RNase A and 1$\mu$L Proteinase K (supplied with Monarch gDNA kit). (\textit{see} \textbf{Note \ref{gDNA-cleanup}}). 

Purify gDNA following the NEB Monarch gDNA extraction kit instructions. 

Following elution, quantify gDNA using Qubit.

% \begin{figure*}[!ht]
% \begin{center}
% \includegraphics[width=15cm]{Fig3.png}
% \captionsetup{singlelinecheck=off,justification=justified}
% \caption{\small \textbf{XXX Show TapeStation profiles XXX}. }
% \label{Fig3}
% \end{center}
% \end{figure*}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Library \\ preparation -- \\ whole genome \\ SMF}}}

If carrying out whole-genome SMF, libraries are to be generated from this point using standard methods for carrying out whole genome cytosine methylation profiling, in which unmethylated cytosines are converted into uracils and final sequencing libraries are generated from the converted DNA. Two main approaches exist -- bisulfite conversion or enzymatic conversion.

For bisulfite conversion, we recommend the EZ-DNA Methylation-Gold Kit. However, bisulfite conversion leads to fragmentation of DNA, often to shorter fragments than what is desired for SMF experiments, where a key objective is to obtain molecules as long as can be sequenced on a short-read platform and thus maximize the information contained within each single molecule. Bisulfite conversion generally leads to fragments shorter than 200 bp, often considerably shorter, which has historically necessitated careful size selection of the subsequently generated libraries in order to maximize the coverage of long fragments \cite{Krebs2017}. 

Enzymatic conversion using the NEB EM-seq kit offers an attractive alternative as it does not degrade DNA and fragment size can be carefully controlled. As a first step before entering the EM-seq procedure, DNA needs to be sheared to the desired size. The Covaris E220 instrument allows a convenient solution for controlling fragment length, but other methods for shearing can be used too. 

Note that the EM-seq kit contains important positive and negative controls -- pUC19 and Lambda DNA, that are respectively fully methylated and unmethylated, and are invaluable for monitoring the efficiency of methylation conversion. Either add those to your samples as a $\sim$1\% spike-in before shearing, or maintain a stock of pre-sheared pUC19 and Lambda to be mixed with sonicated samples prior to conversion. If using bisulfite conversion, use unsheared controls.

Depending on the exact kit used, follow the manufacturer's instructions for final sequencing library generation.

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Library \\ preparation -- \\ probe \\ hybridization \\ enrichment}}}

A significant practical challenge to the application of single-molecule footprinting approaches to mammalian genomes is the very high sequencing depth that needs to be achieved in order to fully take advantage of the wealth of information contained in single molecules. These reads need to be as long as possible too (see further discussion below). Consequently, sequencing costs quickly become a major consideration when working with large genomes. 

However, given that most of the genome is inaccessible and active CREs comprise only a small portion of it, it is possible to greatly reduce costs by using hybridization capture to enrich for a desired subset of the genome. As an example, this approach has been previously successfully used to apply dSMF to many thousands of promoters and enhancers in the mouse genome \cite{mESCdSMF}, using the SureSelectXT Mouse Methyl-Seq target enrichment panel from Agilent. Other probe sets and enrichment protocols are likely to work as well. %  However, care must be taken when mixing and matching adaptor ligation and probe enrichment kits \textit{see} \textbf{Note \ref{probe-protocol}}). 

The exact details of the protocol will vary depending on the specifics of the kit used. A general outline of the procedure is as follows:

For a probe hybridization enrichment dSMF experiment, footprinted DNA is sheared using a Covaris sonicator, end-repaired and methylated adapters are ligated, creating a pre-capture library. Adapters need to be methylated in order to block their conversion during the subsequent steps and enable PCR amplification. The pre-capture library is then hybridized with a biotinylated set of target probes, and purified using streptavidin bead capture. The captured molecules are subjected to bisulfite conversion, and then PCR-amplified.

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Library \\ preparation -- \\ amplicon-targeted \\ SMF}}}

Even greater levels of enrichment and depth of coverage can be obtained by selectively amplifying individual loci. This approach works best together with the EM-seq conversion kit because, as discussed above, it provides better preserved DNA compared to bisulfite treatment. Footprinted whole-genome DNA is used as input and carried through the EM-seq procedure up to the last, final library amplification step. Then PCR primers specific for a locus (or loci) of interest are used to make the final targeted library. The challenge when using this approach is that PCR primers need to be selected and/or designed in such a way that they work on converted DNA; the exact specifics of that selection will vary depending on the particulars of the experiment carried out.

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Library \\ quantification \\ and evaluation \\ of library quality}}}

Before libraries can be sequenced, they need to be properly quantified, and their quality evaluated. There are two components to this process -- first, evaluation of the insert distribution, and second, quantification. 

\begin{enumerate} 
\item Examination of library size distribution. This step can be carried out using a variety of instruments that are now available for this purpose, such as a TapeStation or a BioAnalyzer. In our practice we prefer to use a TapeStation (with the D1000 or HS D1000 kits) due to its ease of use, flexibility and rapid turnaround time. % Typical results are shown in Figure \ref{Fig3}. 

\item Quantification of library concentration. For most high-throughput sequencing applications, where fragment size is unimodal, this step can be carried out with a sufficient degree of accuracy using a Qubit fluorometer. Typically, dSMF falls in that category. For libraries with complex fragment distributions, as well as for higher accuracy of quantification, qPCR can be used. Commercial kits, such as the NEBNext Library Quant Kit for Illumina or the KAPA Library Quantification Kits, exist for that purposes, and custom in-house quantification methods can also be used (see the first chapter in this book on ATAC-seq for details).

\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Sequencing}}}

The protocol described here generates libraries designed to be sequenced on Illumina sequencers. Because every molecule in an SMF library contains information about its unique accessibility state throughout the sequence, it is advisable to perform longer read-length sequencing than is necessary to simply align the fragments. It is best to sequence all molecules completely (e.g. a 300-bp insert would be sequenced with 150 cycles in Read 1 and 150 cycles in Read 2). Paired-end sequencing is preferable to single-end sequencing to improve quality, though single-end will also work provided the reads are sufficiently long. It is recommended to sequence SMF libraries to high depth, i.e. $\sim$1000$\times$ coverage of the size of the probed portion of the genome. This leads to, on average, 1000 unique molecules per genomic locus that are each read once. Sequencing depth can be adjusted based on the user probe set and the frequency of accessibility states observed. 

Due to the relatively high cost of longer-read Illumina sequencing, users may wish to perform quality control checks on the library prior to full sequencing. A useful way to verify that the library is complex and captures chromatin accessibility is to sequence it at a fraction of the optimal depth using shorter read-length sequencing (e.g. as 2$\times$36mers). This way, the user can check that methylation is detected at GpC locations and ensure that there is a diversity of probed regions represented. 

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig4.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Outline of NOMe-seq/dSMF computational processing}. Raw sequencing reads are first trimmed of adapters (note that it is important to do this properly depending on the type of conversion protocol used for making the libraries). They are then aligned against the target genome or amplicons in a methylation-aware manner. Subsequently, alignments are used to make aggregate methylation tracks (if data is to be used to evaluate bulk accessibility) and single-molecule plots (for actual footprinting).
}
\label{Fig4}
\end{center}
\end{figure*}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Computational \\ analysis}}}

The overall outline of NOMe-seq/dSMF data processing is shown in Figure \ref{Fig4}. Briefly, reads are trimmed of adapters and then aligned against the genome or a set of target amplicons. These alignments are then used to evaluate bulk-level accessibility and to carry out analysis at the level of individual single molecules.

\centerline{}
\reversemarginpar\marginpar{\subsubsection{\textit{Adapter \newline trimming}}}

If working with EM-seq datasets, Trimmomatic can be used to trim adapters as follows:

\begin{verbatim}
java -jar trimmomatic-0.36.jar PE 
EM-seq.read1.fastq.gz EM-seq.read1.fastq.gz 
EM-seq.read1.paired.fastq.gz  
EM-seq.read1.unpaired.fastq.gz  
EM-seq.read2.paired.fastq.gz  
EM-seq.read2.unpaired.fastq.gz  
ILLUMINACLIP:Trimmomatic-0.36/adapters/adapters.fa:2:30:10 
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
\end{verbatim}

For bisulfite data, we recommend clipping the first 9 bases off reads due to their usually lower quality in addition to adapter remova, using \verb|trim_galore|, as follows:

\begin{verbatim}
trim_galore 
--path_to_cutadapt ./cutadapt 
--clip_R1 9 --clip_R2 9 
--three_prime_clip_r1 6 
--three_prime_clip_r2 6 
--paired EM-seq.read1.fastq.gz 
         EM-seq.read1.fastq.gz
\end{verbatim}

\reversemarginpar\marginpar{\subsubsection{\textit{Read \newline mapping and \\ alignment \\ filtering}}}

We use BWAmeth for alignment of base-converted datasets. 

\begin{enumerate}
\item The first step is to prepare a reference, as follows:

\begin{verbatim}
python bwameth.py index bwameth-indexes/genome.fa
\end{verbatim}

\item Next, reads are mapped against the reference (while filtering out low quality alignments and unmapped reads):

\begin{verbatim}
python bwameth.py --reference bwameth-indexes/genome.fa 
EM-seq.read1.paired.fastq.gz EM-seq.read2.paired.fastq.gz 
| samtools view -F 1804 -q 30 -bT 
bwameth-indexes/genome.fa - | samtools sort - 
EM-seq.bwameth
\end{verbatim}

\item The next step is to remove potential PCR duplicates. Note that this step generally applies only to whole-genome and probe-capture libraries where there is diversity of fragment coordinates.

\begin{verbatim}
java -Xmx4G -jar picard-tools-1.99/MarkDuplicates.jar 
INPUT=EM-seq.bwameth.bam OUTPUT=EM-seq.bwameth.dedup.bam 
METRICS_FILE=EM-seq.bwameth.dedup.metric 
VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true 
REMOVE_DUPLICATES=true
\end{verbatim}

\item Use \verb|samtools| to index the final BAM file:

\begin{verbatim}
samtools index EM-seq.bwameth.dedup.bam
\end{verbatim}

\end{enumerate}

\reversemarginpar\marginpar{\subsubsection{\textit{Methylation \newline conversion \\ assessment }}}

In order to evaluate the efficiency of methylation conversion, use the same procedure as described above to map reads against the Lambda and pUC19 genomes.

Then use the custom \verb|MethylationPercentageContext.py| script to calculate the average methylation levels in GpC and CpG contexts, e.g. for pUC19:

\begin{verbatim}
python MethylationPercentageContext.py 
EM-seq.pUC19.dedup.bam pCU19.fa CG,GC 
EM-seq.pUC19.dedup.CG-GC-meth-perc
\end{verbatim}

The pUC19 plasmid is the methylated positive control and should show very high (90\%+) levels of specifically CpG methylation, while Lambda DNA is the unmethylated negative control and should exhibit minimal levels of methylation.

\reversemarginpar\marginpar{\subsubsection{\textit{Methylation \newline calling}}}

The next step is to extract methylation calls, using \verb|MethylDackel|:

\begin{verbatim}
MethylDackel extract --CHG --CHH genome.fa 
EM-seq.bwameth.dedup.bam
\end{verbatim}

Note the parameters used so that both CpG and GpC contexts are included in the output. However, further filtering is needed in order to specifically obtain GpC positions, described further below.

\reversemarginpar\marginpar{\subsubsection{\textit{Bulk \\ accessibility  \\ or methylation \\ profile \\ generation}}}

For the purpose of generating bulk accessibility profiles (this is often useful for genome browser visualization of results), execute the following steps:

\begin{enumerate}

\item Compress the \verb|MethylDackel| output:

\begin{verbatim}
gzip EM-seq.bwameth.dedup_CHG.bedGraph
gzip EM-seq.bwameth.dedup_CHH.bedGraph
gzip EM-seq.bwameth.dedup_CpG.bedGraph
\end{verbatim}

\item Extract GpC positions from the \verb|MethylDackel| output, for GpC positions, using the \verb|MethylationPercentageSmooth-dSMF.py| custom script:

\begin{verbatim}
python MethylationPercentageSmooth-dSMF.py 
EM-seq.bwameth.dedup_CHH.bedGraph.gz 
genome.fa GpC 10 -MethylDackel -minCov 10 > 
EM-seq.bwameth.dedup_CHH.GpC-only.minCov10.wig
\end{verbatim}

\item Do the same for CpG positions:

\begin{verbatim}
python MethylationPercentageSmooth-dSMF.py 
EM-seq.bwameth.dedup_CpG.bedGraph.gz 
genome.fa CpG 10 -MethylDackel -minCov 10 > 
EM-seq.bwameth.dedup_CHH.CpG-only.minCov10.wig
\end{verbatim}

Note that in this case we also apply a minimal coverage cutoff of 10 reads. This can be adjusted as needed.

\item These steps create bedGraph files from which bigWig files to be used on a genome browser can be generated:

\begin{verbatim}
UCSC-utils/wigToBigWig 
EM-seq.bwameth.dedup_CHH.GpC-only.minCov10.wig 
genome.chrom.sizes 
EM-seq.bwameth.hg38.dedup_CHH.GpC-only.minCov10.bigWig
\end{verbatim}

Note that for this step a \verb|chrom.sizes| file is needed. This file can be created using the \verb|makeChromSizesFromFasta.py| custom script

\item It is also often useful to know what the raw read coverage is along the genome. This can be generated using many different existing tools, in this case we use the custom \verb|makewigglefromBAM-NH.py| script:

\begin{verbatim}
python makewigglefromBAM-NH.py title 
EMs-eq.dedup.bam genome.chrom.sizes 
EMs-eq.dedup.coverage.wig -uniqueBAM
\end{verbatim}

Convert into a bigWig file as above.

\end{enumerate}

\reversemarginpar\marginpar{\subsubsection{\textit{Metaprofile \newline evaluation}}}

It is often useful to generate metaplots over a defined set of genomic features, for quality evaluation (e.g. assessing how strong the methylation levels are around active promoters) and for other analysis tasks (e.g. measuring average footprinting by TFs at their occupancy sites). 

\begin{enumerate}

\item As a first step, extract the wanted sequence contexts from the \verb|MethylDackel| output using the \verb|BismarckSequenceContextFilter.py| custome script, e.g. as follows for GpC:

\begin{verbatim}
python BismarckSequenceContextFilter.py 
EM-seq.bwameth.hg38.dedup_CHH.bedGraph.gz GC 
genome.fa | gzip > 
EM-seq.bwameth.hg38.dedup_CHH.GpC-only.bedGraph.gz
\end{verbatim}

\item Then generate the metaprofile using the \verb|signalAroundPeaks-nano.py| custom script. This script can be run with a wide variety of inputs and window lengths around the desired viewpoints. In this example, we use it to generate a metaprofile around annotated transcription start sites:

\begin{verbatim}
python signalAroundPeaks-nano.py 
annotation.TSS-0bp.bed 0 1 3 1000 10 
EM-seq.bwameth.hg38.dedup_CHH.GpC-only.bedGraph.gz 
EM-seq.bwameth.hg38.dedup_CHH.GpC-only.TSS_profile 
-bismark.cov
\end{verbatim}

The \verb|annotation.TSS-0bp.bed| can be generated from a GTF files using the \verb|TSS_bed_FromGTF.py| custom script.

\end{enumerate}

\reversemarginpar\marginpar{\subsubsection{\textit{Generating \newline single-molecule \newline maps}}}

Finally, we illustrate the generation of single molecule maps. This is done using the \verb|dSMF-footprints.py| script, which has as a dependency the \verb|heatmap.py| custom script, and has a wide variety of options for color adjustment, minimal read coverage filtering, and others.

It takes as input a BAM file, a BED file with the windows over which single molecules are to be plotted, the genome sequence, and the sequence context(s) (\verb|GC|, \verb|CG|, or \verb|both|).

\begin{verbatim}
python dSMF-footprints.py EM-seq.bwameth.hg38.dedup.bam 
genome.fa GC region.bed 0 1 2 3 EM-seq 
-heatmap heatmap.py 10 10 binary 10,100 -minCov 0.9 
\end{verbatim}

In this case we filter out all alignments that do not cover at least 90\% of the input regions, and plot the single molecules using the \verb|binary| Matplotlib colormap, meaning that methylated positions will be shown as light while protected unmethylated positions will be shown in black.

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section{Expected results}}

Figures \ref{Fig5}A shows bulk accessibility, CpG methylation maps and raw read coverage tracks for previously published \cite{mESCdSMF} probe capture dataset in mouse.

Figures \ref{Fig5}B and \ref{Fig5}C show typical CpG (endogenous methylation) and GpC (accessibility) metaprofiles around transcription start sites and around occupancy sites of the CTCF transcription factors (which is well known to be a strong driver of nucleosomal occupancy in the vicinity of its occupancy sites \cite{Fu2008}) for the same dataset. 

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig5.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Aggregate accessibility analysis of NOMe-seq/dSMF datasets}. 
(A) Read coverage and aggregate CpG/GpC methylation genome browser tracks show accessibility and/or endogenous methylation levels around the genome. In this case, reduced representation probe-capture dSMF dataserts (obtained from ArrayExpress accessions E-MTAB-9033 and E-MTAB-9123) is show, thus the uneven coverage; 
(B) Metaplot showing average accessibility levels around TSSs in the mouse genome; 
(C) Metaplot showing average footprinting levels at occupied CTCF motifs.
}
\label{Fig5}
\end{center}
\end{figure*}

Examples of single-molecule maps showing footprint protection around binding sites for the CTCF transcription factor are shown in Figure \ref{Fig6}.

\begin{figure*}
\begin{center}
\includegraphics[width=10cm]{Fig6.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Examples of single-molecule accessibility measurements}. Shown are dSMF single-molecule maps (obtained from ArrayExpress accessions E-MTAB-9033 and E-MTAB-9123 \cite{mESCdSMF}). 
(A) High levels of occupancy by the CTCF transcription factor (middle).
(B) CTCF (middle) and possible nucleosome (left) footprints.
}
\label{Fig6}
\end{center}
\end{figure*}

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\centerline{}

\reversemarginpar\marginpar{\section{Notes}}

\begin{enumerate}

% \item \label{nanopore} Long-read sequencing modalities such as Nanopore and PacBio allow for direct detection of any DNA methylation on molecules >1kb long, providing the most flexibility in DNA methyltransferase options and subsequent footprinting resolution. However, preserving all original DNA methylation makes traditional methods of enriching for specific sequences of interest, such as PCR or probe hybridization, impractical. As a result, obtaining deep coverage of these sequences is often challenging. In contrast, detection of DNA methylation using short-read Illumina sequencing is only possible for cytosines. All known cytosine methyltransferases require either CpG or GpC dinucleotide sequence context to function, which in turn limits the resolution of SMF with Illumina sequencing to the frequency of CpG or GpC dinucleotides. On the other hand, probe hybridization and PCR enrichment strategies are compatible with cytosine methylation detection using Illumina, which facilitates deep coverage of genomic regions of interest in SMF assays. 

\item \label{numbercells}Note that the efficiency of the methylation reaction is potentially dependent on the ratio between the amount of enzyme present and the quantity of input material. Therefore one should be careful to avoid using too many cells as this could lead to suboptimal level of methylation in open chromatin. The input cell number should be scaled according to genome size and ploidy, i.e. a fission yeast cell (a $\sim$12-Mbp haploid genome) contains $\sim$500$\times$ less chromatin than a typical human cell (a $\sim$3-Gbp diploid genome).

\item \label{MTase-concentration}We have found that the high concentration of glycerol in the final methylation reaction is important for maintaining cell solubility and has little or no adverse effect on methyltransferase function. As a result, we recommended to use the low concentration of methyltransferase supplied by NEB at the time of writing. If using higher concentrations of enzymes from another source, adding extra glycerol to the methylation reaction may help to prevent cells from clumping together. Low concentrations of non-ionic detergents such as Tween-20 may also prevent cell clumping, but further optimization would be required. 

\item \label{CpG}It is possible to do single molecule footprinting for intended use with Illumina sequencing with only one or both of GpC or CpG methyltransferases. The particular application will determine which option is advisable. When working in organisms such as \textit{Drosophila} that contain no endogenous DNA methylation, using both enzymes is recommended for maximal footprinting resolution. CpG methylation exists endogenously in mammalian cells, so users may opt to only use GpC methyltransferase in order to distinguish between natural and synthetic DNA methylation. 

\item \label{gDNA-cleanup}In our experience, using the Monarch Genomic DNA Purification Kit is the easiest way to obtain high quality, purified genomic DNA after methylation. However, other methods, such as phenol-chloroform extraction, have been demonstrated to work. It is likely other kits for genomic DNA extraction also perform well. If using a different column-based gDNA purification kit, care should be taken to increase the amount of DNA binding and cell lysis buffers to ensure the ratio of kit buffer volume to sample buffer volume remains the same as in the manufacturer's instructions. Inappropriate buffer volume ratios may lead to poor DNA binding to the column and subsequent low yield.

% \item \label{probe-protocol} [ EXPAND ] We suspect any methylation-compatible probe-hybridization enrichment kit or otherwise available commercial protocol will work well in place of SureSelectXT Methyl-Seq Library Preparation Kit, though it has not yet been demonstrated. 
\end{enumerate}

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}
\centerline{}
\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section*{Acknowledgements}} 

The authors thank members of the Greenleaf, Bintu and Kundaje labs for many helpful discussions.  This work was supported by NIH grants (P50HG007735, RO1 HG008140, U19AI057266 and UM1HG009442 to W.J.G., 1UM1HG009436 to W.J.G. and A.K., 1DP2OD022870-01 and 1U01HG009431 to A.K., and HG006827 to C.H.), the Rita Allen Foundation (to W.J.G.), the Baxter Foundation Faculty Scholar Grant, and the Human Frontiers Science Program grant RGY006S (to W.J.G). W.J.G is a Chan Zuckerberg Biohub investigator and acknowledges grants 2017-174468 and 2018-182817 from the Chan Zuckerberg Initiative. Fellowship support provided by the Stanford School of Medicine Dean's Fellowship (G.K.M.).

\end{changemargin} 

\begin{thebibliography}{100}

\begin{multicols}{2}
\begin{small}

\input{references}

\end{small}
\end{multicols}

\end{thebibliography}

\end{document}
