\documentclass[10pt]{article}
\usepackage[hmargin=1.5cm,top=2cm,bottom=2cm]{geometry}
\usepackage{multicol}
\setlength\columnsep{15pt}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{array}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage[auth-sc]{authblk}
\usepackage{longtable}
\usepackage{multirow}
\usepackage{hyperref}
\usepackage{enumerate}
\usepackage[labelfont=bf]{caption}
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage{mdframed}
\usepackage{graphics}
\usepackage{rotating}
\usepackage{lscape}
\usepackage{breakurl}
\usepackage{todonotes}
\usepackage{hanging}
\usepackage[final]{pdfpages}
\usepackage[leftFloats,CaptionAfterwards]{fltpage}
\usepackage[numbers,super,sort&compress]{natbib}
\setlength{\bibsep}{0pt plus 0.3ex}
\usepackage{abstract}
\usepackage{enumitem}
\usepackage{soul}
\usepackage{titlesec}
\titleformat{\section}[block]{\large\bfseries\filcenter}{\thesection.}{0.4em}{}
\titleformat{\subsection}[block]{\normalsize\sc\bfseries\filcenter}{\thesubsection.}{0.4em}{}
\titleformat{\subsubsection}[block]{\normalsize\sc\itshape\filright}{\thesubsubsection.}{0.4em}{}
\setcounter{secnumdepth}{5}

\makeatletter
\def\@biblabel#1{\@ifnotempty{#1}{#1.}}
\makeatother

\newcommand{\filllastline}[1]{
\setlength\leftskip{0pt}
\setlength\rightskip{0pt}
\setlength\parfillskip{0pt}
#1}

\newenvironment{Figure}
{\par\medskip\noindent\minipage{\linewidth}}
{\endminipage\par\medskip}

\title{\bf Direct profiling of genome-wide dCas9 and Cas9 specificity using ssDNA mapping (CasKAS)}
\renewcommand\Authfont{\scshape\normalsize}
\author[1,$\#$,*]{Georgi K. Marinov}
\author[2,*]{Samuel H. Kim}
\author[1]{S. Tansu Bagdatli}
\author[1]{Soon Il Higashino}
\author[3,4]{Alexandro E. Trevino}
\author[1]{Josh Tycko}
\author[5]{Tong Wu}
\author[4]{Lacramioara Bintu}
\author[1,6]{Michael C. Bassik}
\author[5,7,8]{Chuan He}
\author[1,9]{Anshul Kundaje}
\author[1,10,11,12,$\#$]{William J. Greenleaf}
\renewcommand\Affilfont{\itshape\small}
\affil[1]{Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA}
\affil[2]{Cancer Biology Program, School of Medicine, Stanford University, Stanford, CA 94305, USA}
\affil[3]{Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA}
\affil[4]{Department of Bioengineering, Stanford University, Stanford, CA 94305, USA}
\affil[5]{Department of Chemistry, The University of Chicago, Chicago, IL, 60637, USA}
\affil[6]{Chemistry, Engineering, and Medicine for Human Health (ChEM-H), Stanford University, Stanford, CA, 94305, USA}
\affil[7]{Department of Biochemistry and Molecular Biology and Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL, 60637, USA}
\affil[8]{Howard Hughes Medical Institute, The University of Chicago, Chicago, IL, 60637, USA}
\affil[9]{Department of Computer Science, Stanford University, Stanford, CA 94305, USA}
\affil[10]{Department of Applied Physics, Stanford University, Stanford, CA 94305, USA}
\affil[11]{Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, 94305, USA}
\affil[12]{Chan Zuckerberg Biohub, San Francisco, California, USA}
\affil[*]{These authors contributed equally to this work}
\affil[$\#$]{Corresponding author}
\date{}
\begin{document}
\maketitle

% \centerline{}
% \centerline{}
\begin{abstract}

\noindent {\normalsize \textbf{Detecting and mitigating off-target activity is critical to the practical application of CRISPR-mediated genome and epigenome editing. While numerous methods have been developed to map Cas9 binding specificity genome-wide, they are generally time-consuming and/or expensive, and not applicable to catalytically dead CRISPR enzymes. We have developed a rapid, inexpensive, and facile assay for identifying off-target CRISPR enzyme binding and cleavage by chemically mapping the unwound single-stranded DNA structures formed upon binding of an sgRNA-loaded Cas9 protein (``CasKAS''). We demonstrate this method in both \textit{in vitro} and \textit{in vivo} contexts.} 
}
\centerline{}
\centerline{}
\end{abstract}

\begin{multicols}{2}

\section*{Introduction}

CRISPR-based methods for editing the genome and epigenome have emerged as a highly versatile means of manipulating the genetic makeup and regulatory states of cells. CRISPR technologies hold the potential to transform medical practice by enabling direct elimination of pathogenic sequence variants or manipulation of aberrant gene expression programs. CRISPR has also become a standard tool for discovery in biomedical research, including its use in high-throughput, massively parallel genomic screens\cite{Wang2014}. 

Significant off-target effects are a universal concern for genome engineering technologies and present a major hurdle to fully realizing their potential utility. CRISPR tools have been shown to exhibit biochemical activity away from their intended target sites. This is particularly problematic for therapeutic applications, where the risk that activity at unintended sites harms patient health must be minimal. Understanding and mapping these effects is therefore an urgent need.

\filllastline{To this end, numerous experimental approaches have been developed to map off-target effects genome-wide. Methods such as Digenome-seq\cite{Kim2015} look for particular types of cut sites around target sequences in whole-genome sequencing data; however, deep whole-genome sequencing remains expensive. Assays such as BLESS\cite{BLESS}, GUIDE-seq\cite{Tsai2015}, HTGTS\cite{HTGTS}, DSBCapture\cite{DSBCapture}, BLISS\cite{Yan2017}, SITE-seq\cite{Cameron2017}, CIRCLE-seq\cite{Tsai2017}, TTISS\cite{TTISS}, INDUCE-seq\cite{INDUCE-seq}, and CHANGE-seq\cite{CHANGE-seq} aim instead to directly map Cas9 cleavage events. However, all these methods involve some combination of complex and laborious molecular biology protocols and non-standard reagents, and have not been widely adopted. Other methods, such as DISCOVER-seq\cite{Wienert2019}, which}

\end{multicols}

\renewcommand{\footnotesize}{\fontsize{10pt}{12pt}\selectfont}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{Fig1V5.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS maps dCas9- and Cas9-mediated strand invasion and cleavage events genome-wide \textit{in vitro} on purified DNA and \textit{in vivo} in cell lines}. 
(a) CasKAS is based on the KAS-seq assay for mapping ssDNA structures. N$_3$-kethoxal covalently modifies unpaired guanine bases (while having no activity for G bases paired within dsDNA). Strand invasion by Cas9/dCas9 carrying an sgRNA results in the formation of a ssDNA structure, which can be directly identified 
\rightline{\textit{(legend continued on next page)}}
} 
\label{Fig1}
\end{figure*}
\clearpage
\let\thefootnote\relax\footnotetext{using N$_3$-kethoxal. 
(b) Outline of \textit{in vivo} and \textit{in vitro} CasKAS. For \textit{in vitro} CasKAS, gDNA is incubated with a dCas9/Cas9 RNP, then N$_3$-kethoxal is added to the reaction; for \textit{in vivo} CasKAS, cells are transfected with an RNP, then treated with kethoxal. DNA is then purified, click chemistry is carried out, DNA is sheared, labeled fragments are pulled down with streptavidin beads, and sequenced.
(c and d) Mapping of dCas9 targets \textit{in vitro}. 
(c) Mouse gDNA was incubated with dCas9 RNPs carrying one of two sgRNAs targeting the mouse \textit{Hoxa} locus. Highly specific labeling is observed at the expected target location of each sgRNA. 
(d) Asymmetric strand distribution of \textit{in vitro} dCas9 CasKAS reads around the sgRNA target site. 
(e and f) Mapping of Cas9 targets \textit{in vitro}. 
(e) Mouse gDNA was incubated with Cas9 RNPs carrying one of the same two sgRNAs targeting the mouse \textit{Hoxa} locus. 
(f) The distribution of 5' read ends around target sites in \textit{in vitro} CasKAS datasets shows direct capture of the intermediate cleavage state.
(g) Reproducibility of \textit{in vitro} dCas9 CasKAS datasets. Shown are RPM values for 500 bp windows centered on the top $\sim$7,000 predicted target sites for ``sgRNA \#1'' in two \textit{in vitro} CasKAS experiments. Off-target sites are color-coded by the number of mismatches relative to the sgRNA.
(h) CasKAS requires a moderate sequencing depth of 10-20 $\times$ 10$^6$ reads to accurately rank potential off-targets. A total of 10 different sets of subsamplings were generated, and the fraction of off-targets within 20\% of their final quantification value was calculated for each. The mean $\pm$ SD is shown.
(i-k) \textit{In vivo} CasKAS maps Cas9 and dCas9 target sites. 
(i) Shown are CasKAS experiments with Cas9 and dCas9 and with the EMX1 sgRNA or with no sgRNA (negative control).
(j) Asymmetric 5' end distribution around target sites in dCas9 \textit{in vivo} CasKAS. 
(k) In \textit{in vivo} Cas9 CasKAS, a mixture distribution is observed between phased cleavage sites and broader ssDNA labeling.}

\begin{multicols}{2}

\noindent maps DNA repair activity by applying ChIP-seq against the MRE11 protein, as well as earlier applications of ChIP-seq to map catalytically dead dCas9 occupancy sites genome-wide\cite{Wu2014,Kuscu2014}, suffer from technical  issues associated with the ChIP procedure. Most recently, long-read sequencing has been adapted to the problem of Cas9 specificity profiling, in the form of SMRT-OTS and Nano-OTS\cite{OTS}, but the cost of these methods is relatively high while their throughput is comparatively low.

These existing methods have differing advantages and weaknesses. Some (e.g. ChIP-seq) capture dCas9 association with DNA as a snapshot in time; others (e.g. those mapping editing outcomes by sequencing) report on off-target activity that can accumulate over a broad period of time, generally with higher specificity. 

Various computational models have also been trained to predict off-targets genome-wide\cite{Doench2016,Perez2017}. However, these exhibit far from perfect accuracy, and thus in many situations, especially within clinical contexts, direct experimental evidence is needed to accurately identify potential unintended effects of CRISPR-based reagents.

A faster, more accessible, and more versatile method for mapping CRISPR off-targets is thus still a major need in the field. Here, we introduce CasKAS, a fast, inexpensive and straightforward method for mapping CRISPR off-targets that is applicable to both active and catalytically dead CRISPR enzymes. CasKAS takes advantage of the unwound single-stranded DNA associated with CRISPR occupancy of DNA, which can be very specifically mapped using kethoxal, as recently demonstrated by the KAS-seq assay. We demonstrate the application of CasKAS for profiling off-targets of active Cas9 and dCas9, \textit{in vitro} using purified genomic DNA and \textit{in vivo} in live cells, and we also show that CasKAS can be used to distinguish off-target sites where active Cas9 cleaves DNA from sites where it only binds. CasKAS is thus a highly versatile and adaptable tool for profiling CRISPR off-target sites, as well as for studying the dynamics of CRISPR association with the genome and of the editing process.

\section*{Results}

\subsection*{CasKAS for mapping the physical association of CRISPR enzymes with DNA}

When a Cas9-sgRNA ribonucleoprotein (RNP) is engaged with its target site, the sgRNA invades the DNA double helix, forming a single-stranded DNA (ssDNA) structure on the other strand (Fig. \ref{Fig1}a). We thus reasoned that ssDNA should be a sensitive biochemical signal of productive Cas9 binding. The recently developed KAS-seq assay (\textbf{k}ethoxal-\textbf{a}ssisted \textbf{s}sDNA sequencing\cite{KAS}) for mapping ssDNA is ideally suited for the purpose of identifying ssDNA generated by CRISPR protein binding to DNA (Fig. \ref{Fig1}a-b). KAS-seq is based on the specific covalent labeling of unpaired guanine bases with N$_3$-kethoxal, generating an adduct to which biotin can then be added using click chemistry. After shearing, biotinylated DNA, corresponding to regions containing ssDNA structure, can be specifically enriched and sequenced. 

To determine the feasibility of using KAS-seq to map regions of ssDNA generated by Cas9 binding, we carried out an initial \textit{in vitro} experiment using mouse genomic DNA (gDNA), purified dCas9 and two sgRNAs targeting the \textit{Hoxa} locus.

Strikingly, we observed strong peaks at the expected target sites for each sgRNA (Fig. \ref{Fig1}c). Detailed examination of dCas9 CasKAS profiles around the predicted sgRNA target sites revealed strand coverage asymmetry patterns similar to those observed for ChIP-seq around transcription factor binding sites\cite{Landt2012} (Fig. \ref{Fig1}d). In ChIP-seq, forward-strand reads cluster to the left of an occupancy site and reverse-strand reads to the right; this pattern arises because the occupancy site is crosslinked to the target protein and is thus always pulled down during immunoprecipitation, so that every enriched fragment contains the site somewhere in its interior. A strong pattern of this kind therefore suggests high specificity of enrichment. Its presence in CasKAS data indicates that enrichment derives from the sgRNA target site itself and confirms the utility of N$_3$-kethoxal for mapping dCas9 occupancy sites. We termed the assay ``CasKAS''. 

\subsection*{CasKAS for mapping active Cas9 nuclease cleavage sites}

We then reasoned that CasKAS should also capture active Cas9 complexed with DNA, as the enzyme is thought to remain associated with DNA for some time after cleavage\cite{Richardson2016}. We performed CasKAS experiments with the same sgRNAs and active Cas9 nuclease, and again observed enrichment at the expected on-target sites (Fig. \ref{Fig1}e). Examination of Cas9 CasKAS read profiles around the on-target site showed that the 5' ends of reads are precisely positioned around the expected cut site, with one cut position on the target strand (which binds the sgRNA and is cleaved by the HNH domain) and two to three such positions on the non-target strand (which is cleaved by the RuvC domain; Fig. \ref{Fig1}f), consistent with the previously known patterns of Cas9 cleavage \cite{Gisler2019,Jones2021}. CasKAS therefore provides target specificity profiles for both active and catalytically dead Cas9 enzymes.

\subsection*{CasKAS for mapping the activity of CRISPR enzymes \textit{in vivo}}

\textit{In vitro} CasKAS data was highly reproducible between replicates  (Fig. \ref{Fig1}g), and a modest sequencing depth of between 10 and 20 million mapped reads was sufficient to capture off-target specificity profiles (Fig. \ref{Fig1}h), which is an order of magnitude lower than required for resequencing the whole genome. 
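
The subsampling criterion behind Fig. \ref{Fig1}h can be written compactly. As a sketch in notation introduced here for illustration only (the symbols $N$, $q_i$ and $q_i^{(d)}$ are not taken from the Methods), let $q_i$ be the signal at off-target site $i$ in the full dataset and $q_i^{(d)}$ its value after subsampling to $d$ reads. The fraction of the $N$ sites quantified to within 20\% of their final value is then
\begin{equation*}
F(d) = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\,\left|q_i^{(d)} - q_i\right| \leq 0.2\,q_i\,\right],
\end{equation*}
where $\mathbf{1}[\cdot]$ equals 1 when the condition holds and 0 otherwise. The mean $\pm$ SD of $F(d)$ over the 10 subsampling sets is the quantity reported at each depth.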

We observed similar results with two mouse sgRNAs targeting the \textit{Nanog} locus (Supplementary Fig. \ref{FigS7}) and with two human sgRNAs (``EMX1'' and ``VEGFA''; Supplementary Fig. \ref{FigS4} and \ref{FigS5}). We found no enrichment using components of the RNP in isolation -- sgRNAs, dCas9 or Cas9 (Supplementary Fig. \ref{FigS4}). 

Next we tested the application of CasKAS \textit{in vivo} in cell culture. Living cells contain substantial ssDNA due to active transcription, DNA replication, and other processes\cite{KAS}, so \textit{in vivo} CasKAS signal derives from a mixture of Cas9-associated ssDNA and endogenous processes. We carried out KAS-seq experiments using both dCas9 and Cas9 in HEK293 cells transfected with RNPs targeting \textit{EMX1} or \textit{VEGFA}, as well as negative, no-guide controls, which provided a map of background endogenous ssDNA profiles. At \textit{EMX1}, which is not active in HEK293 cells, we observe strong peaks at the expected target site (Fig. \ref{Fig1}i), as well as an asymmetric read profile around it for dCas9 (Fig. \ref{Fig1}j), and a substantial degree of 5' end clustering at the cut site for active Cas9, similar to what is observed \textit{in vitro} (Fig. \ref{Fig1}k). The \textit{VEGFA} gene is active in HEK293 cells, but the dCas9/Cas9 CasKAS signal is still readily identifiable as an addition to the endogenous ssDNA enrichment pattern (Supplementary Fig. \ref{FigS18}). These results demonstrate the utility of CasKAS for profiling CRISPR specificity both \textit{in vitro} and \textit{in vivo}.

We then examined the temporal dynamics of Cas9 and dCas9 association with the genome \textit{in vivo} by carrying out a time course for the EMX1 and VEGFA sgRNAs with both dCas9 and Cas9, assaying at 6, 12, 24, 48 and 72 hours (Supplementary Fig. \ref{FigS27}--\ref{FigS30}). We find that association with DNA is not yet detectable at 6 hours, is strongest at 48 hours, and disappears for Cas9 at 72 hours but persists for dCas9 at that time point. This is likely explained by the fact that by the 72-hour time point cells have divided and DNA edits have been completed, thus disrupting Cas9's recognition of its cognate sequence. The 24-hour and 48-hour time points are therefore optimal for \textit{in vivo} CasKAS, with the caveat that this may depend on the growth dynamics of the cell lines/organisms being studied.

We further demonstrated the utility of CasKAS for profiling the association of Cas9 and dCas9 with the genome, both \textit{in vivo} and \textit{in vitro}, by carrying out CasKAS for pairs of guides targeting promoter regions of multiple human genes (\textit{CD2}, CD90/\textit{THY1}, CD45/\textit{PTPRC}, CD298/\textit{ATP1B3}) as well as a pair of ``safe'' sgRNAs targeting non-coding sequence. We observe similar patterns to those described above for the mouse sgRNAs \#1 and \#2, Nanog-sg2 and Nanog-sg3, and the human EMX1 and VEGFA sgRNAs (Supplementary Fig. \ref{FigS31}--\ref{FigS38}).

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{Fig2-V5.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS profiles sgRNA specificity genome-wide}. 
(a) Summary of \textit{de novo} peak calls for sgRNA \#1 (using MACS2).
(b) CasKAS signal is stronger over predicted off-target sites, but legitimate interactions are also found elsewhere in the genome.
(c) CasKAS profile over predicted (by Cas-OFFinder) off-target sites for sgRNA \#1 with dCas9 (all such sites and focusing only on the top 100 ranked by dCas9 CasKAS signal).
(d) CasKAS profile over peak calls outside predicted (by Cas-OFFinder) off-target sites for sgRNA \#1 with dCas9.
(e) Determinants of sequence specificity as measured by dCas9 CasKAS (for sgRNA \#1). PAM-distal regions of the sgRNA are less constrained than its PAM-proximal parts. The on-target sgRNA is highlighted in yellow. 
(f) Active Cas9 signal read profiles can be used to distinguish off-targets associated with cutting from those where only binding occurs. Shown are the same off-target sites as in (e) and the plus- and minus-strand active Cas9 5' end profiles around the sgRNA. In this case (sgRNA \#1), only the on-target site shows a Cas9 CasKAS pattern indicating cleavage; at the other sites even active Cas9 likely only binds but does not cut. A simple cutting score metric (``$C$-score'') based on multiplying the 5' end forward- and reverse-strand profiles can be used to quantify cutting vs. binding. 
(g and h) Comparison between \textit{in vitro} and \textit{in vivo} CasKAS signal over predicted off-target sites for the EMX1 sgRNA. \textit{In vivo} CasKAS is quantified as the difference in reads per million ($\pm$500 bp of the sgRNA site) between the sgRNA KAS-seq and the no-guide control KAS-seq (``RPM$_{diff}$''). The on-target site is shown in blue.
}
\label{Fig2}
\end{figure*}

\subsection*{Mapping CRISPR off-target sites using CasKAS}

We next examined the genome-wide specificity of sgRNAs as measured by CasKAS. We focus on the mouse sgRNA \#1 as it displayed a substantial number of off-targets yet that number was also sufficiently small for all of them to be examined directly. We first called peaks \textit{de novo} (see Methods for details) without relying on off-target prediction algorithms, then manually curated the resulting peak set, excluding peaks not exhibiting the canonical asymmetric read distribution around a fixed point on the two strands (Fig. \ref{Fig2}a; see also Supplementary Fig. \ref{FigS8} for illustration). Remarkably, while we found 32 peaks at predicted off-target sites, we also found 198 (i.e. $\sim$6$\times$ as many) additional manually curated peaks; while these peaks exhibit generally lower CasKAS signal (Fig. \ref{Fig2}b), they all display proper peak shape characteristics (see Supplementary Fig. \ref{FigS8} for details), suggesting that they are genuine sites of occupancy. Most of the predicted (in total $\sim$7,500) off-target sites for this sgRNA did not show substantial occupancy by dCas9 CasKAS (Fig. \ref{Fig2}c-d). 

Sequence comparison of the occupied predicted off-target sites allowed us to evaluate determinants of the specificity of dCas9 association and unwinding of DNA (Fig. \ref{Fig2}e). Consistent with previous reports\cite{Hsu2013,Semenova2011}, the PAM-distal region was much less sequence-constrained than the PAM-proximal seed region. We observed a similar pattern with the other sgRNAs we profiled, in both mouse and human (Supplementary Fig. \ref{FigS9}--\ref{FigS60}). 

When analyzing peaks not associated with predicted off-target sites (Supplementary Fig. \ref{FigS17}) we observed other telling patterns -- at numerous sites with strong dCas9 CasKAS signal, we observe a large number of mismatches to the sgRNA sequence as well as ``bulge'' regions wherein indels (relative to the sgRNA sequence) are observed in the target sequence. These mismatches and bulges were in general much larger than what is considered permissible by off-target prediction algorithms; we speculate that the lack of consideration of potential target sequences with large numbers of mismatches or substantial insertions could explain the much larger number of such sites compared to the set of occupied predicted off-targets. 

We next devised a simple metric for evaluating the degree of read clustering at cut sites (a ``$C$-score''; see Methods for details) to estimate the degree of cutting by Cas9. The on-target site exhibits the second highest dCas9 CasKAS signal genome-wide. However, strikingly, even though all CasKAS-identified off-target sites showed Cas9 binding, only the on-target site displayed strong cutting activity (Fig. \ref{Fig2}f). The behavior of other sgRNAs varies (Supplementary Fig. \ref{FigS13}--\ref{FigS48} and \ref{FigS21}--\ref{FigS70}), with some showing multiple clearly identifiable cut sites. Overall, these results are consistent with previous reports that Cas9 requires more successful RNA:DNA basepairing for cleavage activity than is necessary for binding\cite{Dahlman2015,Kiani2015}. Thus, interpreting the read distributions of Cas9 CasKAS at target sites enables simultaneous detection of binding specificity and the promiscuity of catalytic activity.
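
A toy calculation shows why multiplying the two strand profiles separates cutting from binding. The notation below is introduced here purely for illustration; it sketches the idea stated in Fig. \ref{Fig2}f and is not the exact $C$-score definition given in Methods. Let $f(x)$ and $r(x)$ be the fractions of forward- and reverse-strand 5' ends at position $x$ within a window of $w$ bp around the sgRNA site. If binding alone spreads the ends evenly over the window, while cleavage concentrates them at a single pair of positions $(x_f, x_r)$ flanking the cut, then
\begin{gather*}
f(x)\,r(x') \approx \frac{1}{w^{2}} \quad \text{(binding only)},\\
f(x_f)\,r(x_r) \approx 1 \quad \text{(cutting)}.
\end{gather*}
For a hypothetical window of $w = 100$ bp the two idealized cases differ by a factor of $\sim$10$^{4}$, so even partial clustering of 5' ends at the cut site produces a product well above the binding-only baseline.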

We then carried out amplicon sequencing over a set of 81 potential EMX1 and 52 potential VEGFA on- and off-targets on genomic DNA extracted from HEK293 cells transfected with each of the two sgRNAs. These experiments generally corroborated the \textit{in vitro} CasKAS results and identified no additional sites at which cutting occurs but for which cutting was not observed in \textit{in vitro} CasKAS data (Supplementary Fig. \ref{FigS51} and \ref{FigS52}).

Finally, we compared \textit{in vitro} and \textit{in vivo} CasKAS profiles using the difference between the signal in CasKAS and no-guide negative control libraries as a measure of \textit{in vivo} occupancy (Fig. \ref{Fig2}g-h). We find many fewer strongly enriched sites in \textit{in vivo} datasets than \textit{in vitro}, with the on-target site being either the top (for dCas9) or among the top (for Cas9) sites \textit{in vivo}. A potential explanation for this difference is the previously reported impediment of Cas9/dCas9 binding to DNA by the presence of nucleosomes\cite{Horlbeck2016}. %; this inhibitory effect need not be complete to generate the observed patterns as CasKAS measures the physical occupancy of DNA by CRISPR proteins at the moment of harvesting cells % , i.e. Cas9/dCas9 could still bind nucleosome-protected DNA but much more transiently than \textit{in vitro}. As their effect, in particularly cutting, but also base editing (in the case of dCas9 fused with base editing enzymes) is not necessarily dependent on constant physical association with DNA, the optimal strategy for off-target identification might include a combination of \textit{in vitro} experimentation on purified gDNA (generating a maximally permissible set of sites) combined with an \textit{in vivo} occupancy map (providing an estimation of the potentially most relevant sites \textit{in vivo}).
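
The \textit{in vivo} occupancy measure can be made explicit. Writing $n$ for the number of reads falling within $\pm$500 bp of an sgRNA site and $M$ for the total number of mapped reads in a library (symbols introduced here for illustration), the standard reads-per-million normalization and the resulting difference are
\begin{gather*}
\mathrm{RPM} = 10^{6} \times \frac{n}{M},\\
\mathrm{RPM}_{diff} = \mathrm{RPM}_{\mathrm{sgRNA}} - \mathrm{RPM}_{\text{no-guide}}.
\end{gather*}
As a purely hypothetical example, a site with $n = 300$ reads in a 15 million read sgRNA library (20 RPM) and $n = 45$ reads in a 15 million read no-guide control (3 RPM) would have $\mathrm{RPM}_{diff} = 17$. Sites dominated by endogenous ssDNA have similar RPM in both libraries and therefore a difference near zero.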

\section*{Discussion}

In conclusion, we have presented CasKAS, a simple and robust method for mapping the specificity of active and catalytically dead versions of CRISPR enzymes. CasKAS has numerous advantages over existing tools while also opening up new possibilities for studying CRISPR biology. CasKAS requires no specialized molecular biology protocols, takes just a few hours \textit{in vitro} (and a similar amount of time after harvesting cells \textit{in vivo}), and, due to the strong, active enrichment of target sequences, is inexpensive. In contrast to previously developed methods, it measures strand invasion by CRISPR, which is likely biochemically more specific and relevant to CRISPR function than DNA association (invasion is critical for cleavage by active Cas9, and it also ensures stable occupancy to drive epigenetic modulation and the various other effector functions of dCas9 fusions). We compared \textit{de novo} called CasKAS peaks to those generated by other means, and while we found a variable degree of concordance and large sets of peaks unique to some methods, those found only by CasKAS often contained higher fractions of predicted off-target sites than those unique to other methods (Supplementary Fig. \ref{FigS19}--\ref{FigS24}). 

CasKAS does not rely on measuring DNA cleavage or modification and can thus be used to profile the specificity of all types of DNA-targeting CRISPR proteins that generate a stable ssDNA structure. CasKAS also does not rely on cellular repair processes, cell division, or delivery of additional exogenous DNA (as in GUIDE-seq) to generate a detectable signal. These advantages, coupled with low cell input requirements, may increase the utility of the method in rare primary cell types, tissues from animal models, or even for direct assessment of specificity in edited patient cells (e.g. \textit{ex vivo} edited immune cells). A current limitation of CasKAS is the requirement that a G nucleotide is present within the sgRNA sequence, since kethoxal requires an exposed G to react with. However, only a small fraction ($\leq$5\%) of sgRNAs in the human genome lack any Gs for \textit{S. pyogenes} PAM sequences (Supplementary Fig. \ref{FigS3}). We also do not observe a strong correlation between the number of G bases in an off-target sgRNA match and CasKAS enrichment (Supplementary Fig. \ref{FigS22}). A minor limitation specific to \textit{in vivo} experiments is that high levels of ssDNA generated as a result of active transcription or other endogenous processes may obscure the CasKAS signal at certain loci in some situations. We have explored this issue with sgRNAs targeting the promoter of the CD298/\textit{ATP1B3} gene (Supplementary Fig. \ref{FigS38}), where we observe additional KAS signal well above the endogenous levels for dCas9 but not for the active Cas9; this suggests that dCas9, the association of which does not result in cuts to DNA, is likely able to reassociate with DNA if displaced by transcription (or other processes); in contrast, active Cas9 is not. Another minor limitation of the current \textit{in vitro} protocol is that labeling is carried out on high molecular weight (HMW) DNA and samples must be sheared serially. 
We have explored using pre-sheared and end-repaired DNA (to minimize kethoxal labeling of Gs on sticky ends generated by sonication), with comparable results to using HMW DNA (Supplementary Fig. \ref{FigS6}); we anticipate that further optimization or using other approaches, such as enzymatic fragmentation, should allow the parallel high-throughput plate-based profiling of the specificity of very large numbers of sgRNAs.

In addition to being highly valuable for off-target profiling \textit{in vitro} and in previously difficult to assay settings such as primary cells, we expect CasKAS to provide fruitful insights into the mechanisms and dynamics of \textit{in vivo} CRISPR action (taking advantage of finely controllable CRISPR systems such as vfCRISPR\cite{vfCRISPR}), and into the influence of transcriptional, regulatory, epigenetic, and other functional genomic contexts on CRISPR activity.

\section*{Methods}

\subsection*{Guide RNA sequences}

Guide RNAs were obtained from IDT (``sgRNA \#1'' and ``sgRNA \#2'') or from Synthego (all others).

The following sgRNA sequences were used in this study:

\begin{enumerate}
\item ``sgRNA \#1'': \verb|GCTTAATTAAGGTAAACGTC|
\item ``sgRNA \#2'': \verb|CCAACCTGGCGGCTCGTTGG| 
\item ``EMX1\_Tsai'': \verb|GAGTCCGAGCAGAAGAAGAA|
\item ``VEGFA-site1'': \verb|GGGTGGGGGGAGTTTGCTCC|
\item ``Nanog-sg2'': \verb|GATCTCTAGTGGGAAGTTTC|
\item ``Nanog-sg3'': \verb|GTCTGTAGAAAGAATGGAAG|
\item ``CD2-1'': \verb|ACATGGAAAGCTCATCTTAG|
\item ``CD2-2'': \verb|TACATGGAAAGCTCATCTTA|
\item ``CD90-1'': \verb|GCGGAAGACCCCAGTCCAGG|
\item ``CD90-2'': \verb|GTCCAGGTGGGAACTGGAGC|
\item ``CD45-1'': \verb|GTTTGTTCTTAGGGTAACAG|
\item ``CD45-2'': \verb|GAGTTTAAGCCACAAATACA|
\item ``CD298-1'': \verb|GACGGCAGTGAAGGGTGGGA|
\item ``CD298-2'': \verb|GAGTACTCCCCGTAACGAGG|
\item ``safe-1'': \verb|GTGCATTGTTGGTGGTTGTG|
\item ``safe-2'': \verb|GCTAAAGTATCAAAGGGAAT|
\end{enumerate}

Guide RNAs were dissolved to a concentration of 100 $\mu$M using nuclease-free 1$\times$ TE buffer and stored at $-20\,^{\circ}\mathrm{C}$.

\subsection*{\textit{In vitro} CasKAS}

\textit{In vitro} CasKAS experiments were executed as follows. 

First, 1 $\mu$L of each synthetic sgRNA was incubated at room temperature with 1 $\mu$L of recombinant purified dCas9 (MCLab dCAS9B-200, at 20 $\mu$M, i.e. a total of 20 pmol) for 20 minutes. The RNP was then incubated with 1 $\mu$g of gDNA at 37$\,^{\circ}\mathrm{C}$ for 10 minutes. 
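
As a quick check of the RNP stoichiometry, and assuming that the sgRNA is added directly from the 100 $\mu$M stock described above (an assumption made here; the working concentration is not restated in this step), the molar amounts are
\begin{gather*}
n_{\mathrm{sgRNA}} = 1\,\mu\mathrm{L} \times 100\,\mu\mathrm{M} = 100\ \mathrm{pmol},\\
n_{\mathrm{dCas9}} = 1\,\mu\mathrm{L} \times 20\,\mu\mathrm{M} = 20\ \mathrm{pmol},
\end{gather*}
i.e. roughly a 5:1 molar excess of sgRNA over protein under that assumption.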

The KAS reaction was then carried out by adding 1 $\mu$L of 500 mM N$_3$-kethoxal (ApeXBio A8793). DNA was immediately purified using the MinElute PCR Purification Kit (Qiagen 28006), and eluted in 87.5 or 175 $\mu$L 25 mM K$_3$BO$_3$.

\subsection*{\textit{In vivo} CasKAS}

For \textit{in vivo} CasKAS experiments, HEK293T cells were seeded at 400,000 cells/well into a 6-well plate the day before RNP transfection. Media was exchanged 2 hours before transfection. For each well, 6,250 ng of Cas9 (MCLAB CAS9-200) or dCas9 (MCLAB dCAS9B-200) and 1,200 ng sgRNA were complexed with CRISPRMAX (Thermo Fisher CMAX00008) reagent in Opti-MEM (Thermo Fisher 51985091) following the manufacturer's protocol. After incubation at room temperature for 15 minutes, the RNP solution was directly added to each well and gently mixed. The cells were incubated with the RNP complex for 14 hours at 37$\,^{\circ}\mathrm{C}$. To harvest and perform kethoxal labeling, media was removed and room temperature 1$\times$ PBS was used to wash the cells. Cells were then dissociated with trypsin, trypsin was quenched with media, and cells were pelleted at room temperature and then resuspended in 100 $\mu$L of media supplemented with 5 mM N$_3$-kethoxal (final concentration). Cells were incubated for 10 minutes at 37$\,^{\circ}\mathrm{C}$ with shaking at 500 rpm in a Thermomixer. Cells were then pelleted by centrifuging at 500 $\times$ $g$ for 5 minutes at 4$\,^{\circ}\mathrm{C}$. Genomic DNA was then extracted using the Monarch gDNA Purification Kit (NEB T3010S) following the standard protocol but with elution using 175 $\mu$L 25 mM K$_3$BO$_3$ at pH 7.0. 

\subsection*{Click reaction, biotin pull down and library generation}

The click reaction was carried out in one of two formats: either by combining 175 $\mu$L purified DNA, 5 $\mu$L 20 mM DBCO-PEG4-biotin (DMSO solution, Sigma 760749), and 20 $\mu$L 10$\times$ PBS in a final volume of 200 $\mu$L, or by combining 87.5 $\mu$L purified and sheared DNA, 2.5 $\mu$L 20 mM DBCO-PEG4-biotin (DMSO solution, Sigma 760749), and 10 $\mu$L 10$\times$ PBS in a final volume of 100 $\mu$L. The reaction was incubated at 37$\,^{\circ}\mathrm{C}$ for 90 minutes.
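
As a quick check of the reagent amounts listed above (simple dilution arithmetic rather than an additional protocol step), both reaction formats give the same final DBCO-PEG4-biotin concentration. With all volumes in $\mu$L, and with $c_{DBCO}$ used here only as shorthand for that final concentration:

\begin{equation*}
c_{DBCO} = \cfrac{5 \times 20\ \mbox{mM}}{200} = \cfrac{2.5 \times 20\ \mbox{mM}}{100} = 0.5\ \mbox{mM}
\end{equation*}

Likewise, the 10$\times$ PBS is diluted tenfold in both formats (20 $\mu$L in 200 $\mu$L, or 10 $\mu$L in 100 $\mu$L), giving 1$\times$ PBS in the final reaction.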

DNA was purified using AMPure XP beads (50 $\mu$L for a 100 $\mu$L reaction or 100 $\mu$L for a 200 $\mu$L reaction); the beads were washed twice with 80\% EtOH on a magnetic stand, and DNA was eluted in 130 $\mu$L 25 mM K$_3$BO$_3$.

Purified DNA was then sheared on a Covaris E220 instrument down to a size of $\sim$150--400 bp.

For streptavidin pulldown of biotin-labeled DNA, 10 $\mu$L of 10 mg/mL Dynabeads MyOne Streptavidin T1 beads (Life Technologies, 65602) were separated on a magnetic stand, then washed with 300 $\mu$L of 1$\times$ TWB (Tween Washing Buffer; 5 mM Tris-HCl pH 7.5; 0.5 mM EDTA; 1 M NaCl; 0.05\% Tween 20). The beads were resuspended in 300 $\mu$L of 2$\times$ Binding Buffer (10 mM Tris-HCl pH 7.5; 1 mM EDTA; 2 M NaCl), the sonicated DNA was added (diluted to a final volume of 300 $\mu$L if necessary), and the beads were incubated for $\geq$15 minutes at room temperature on a rotator. After separation on a magnetic stand, the beads were washed with 300 $\mu$L of 1$\times$ TWB, and heated at 55$\,^{\circ}\mathrm{C}$ in a Thermomixer with shaking for 2 minutes. After removal of the supernatant on a magnetic stand, the TWB wash and 55$\,^{\circ}\mathrm{C}$ incubation were repeated. 

Final libraries were prepared on beads using the NEBNext Ultra II DNA Library Prep Kit (NEB, $\#$E7645) as follows. End repair was carried out by resuspending beads in 50 $\mu$L 1$\times$ EB buffer, and adding 3 $\mu$L NEB Ultra End Repair Enzyme and 7 $\mu$L NEB Ultra End Repair Buffer, followed by incubation at 20$\,^{\circ}\mathrm{C}$ for 30 minutes (in a Thermomixer, with shaking at 1,000 rpm) and then at 65$\,^{\circ}\mathrm{C}$ for 30 minutes. 

Adapters were ligated to DNA fragments by adding 30 $\mu$L Blunt Ligation mix, 1 $\mu$L Ligation Enhancer and 2.5 $\mu$L NEB Adapter, incubating at 20$\,^{\circ}\mathrm{C}$ for 20 minutes, adding 3 $\mu$L USER enzyme, and incubating at 37$\,^{\circ}\mathrm{C}$ for 15 minutes (in a Thermomixer, with shaking at 1,000 rpm). 

Beads were then separated on a magnetic stand, and washed with 300 $\mu$L TWB for 2 minutes at 55$\,^{\circ}\mathrm{C}$, 1,000 rpm in a Thermomixer. After separation on a magnetic stand, beads were washed in 100 $\mu$L 0.1$\times$ TE buffer, then resuspended in 15 $\mu$L 0.1$\times$ TE buffer, and heated at 98$\,^{\circ}\mathrm{C}$ for 10 minutes. 

For PCR, 5 $\mu$L of each of the i5 and i7 NEBNext sequencing adapters were added together with 25 $\mu$L 2$\times$ NEB Ultra PCR Master Mix. PCR was carried out with a 98$\,^{\circ}\mathrm{C}$ incubation for 30 seconds and 12 cycles of 98$\,^{\circ}\mathrm{C}$ for 10 seconds, 65$\,^{\circ}\mathrm{C}$ for 30 seconds, and 72$\,^{\circ}\mathrm{C}$ for 1 minute, followed by incubation at 72$\,^{\circ}\mathrm{C}$ for 5 minutes. 

Beads were separated on a magnetic stand, and the supernatant was cleaned up using 1.8$\times$ AMPure XP beads. 

Libraries were sequenced in a paired-end format on an Illumina NextSeq instrument using NextSeq 500/550 high output kits (2$\times$36 cycles). 

\subsection*{CasKAS data processing}

Demultiplexed fastq files were mapped to the \verb|hg38| assembly of the human genome or the \verb|mm10| version of the mouse genome as 2$\times$36mers using Bowtie\cite{Bowtie2009} with the following settings: \verb|-v 2| \verb|-k 2| \verb|-m 1| \verb|--best| \verb|--strata| \verb|-X 1000|. Duplicate reads were removed using \verb|picard|\verb|-tools| (version 1.99). 

Browser track generation, fragment length estimation, TSS enrichment calculations, and other analyses were carried out as previously described\cite{MIMB1,MIMB2} using custom-written Python scripts (\burl{https://github.com/georgimarinov/GeorgiScripts}). The \verb|refSeq| set of annotations was used for evaluation of enrichment around TSSs.

\subsection*{CasKAS peak calling}

Peak calling on \textit{in vitro} binding datasets was carried out using version 2.1.0 of MACS2\cite{MACS2} with default settings.

Peaks were then compared against the ENCODE set of ``blacklisted'' regions\cite{BL2019} to filter out likely artifacts.

\subsection*{Sequence analysis}

Guide RNA off-target predictions were obtained from Cas-OFFinder\cite{CasOFFinder}.

Multiple sequence alignments of sgRNA sequences and their off-targets were generated using MUSCLE\cite{MUSCLE} and visualized using JalView\cite{JalView}.

\subsection*{\hl{CasKAS occupancy quantification}}

For \textit{in vitro} CasKAS datasets, we quantified occupancy by calculating Reads-Per-Million (RPM) values for the $\pm$500-bp regions around off-target sites using the standard RPM formula:

\begin{equation}
RPM_{OT} = \cfrac{|R_{OT}|}{\cfrac{|R|}{10^6}}
\end{equation}

Where $|R_{OT}|$ is the number of reads mapping to the $\pm$500-bp off-target region, and $|R|$ is the total number of mapped reads.
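
As an illustration with hypothetical numbers (not taken from any dataset in this study), a library with $|R| = 2 \times 10^7$ mapped reads, of which $|R_{OT}| = 250$ fall within the $\pm$500-bp region around a given off-target site, would give:

\begin{equation*}
RPM_{OT} = \cfrac{250}{\cfrac{2 \times 10^7}{10^6}} = \cfrac{250}{20} = 12.5
\end{equation*}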

For \textit{in vivo} CasKAS datasets, we estimated occupancy levels as the difference between \textit{in vivo} CasKAS RPM values and RPM values in a negative no-sgRNA control.
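
In symbols, this background-subtracted estimate can be written as shown here. The symbol $O_{OT}$ and the superscripts are notation introduced only for illustration, to distinguish the CasKAS sample from the no-sgRNA control:

\begin{equation*}
O_{OT} = RPM^{\mathrm{CasKAS}}_{OT} - RPM^{\mathrm{control}}_{OT}
\end{equation*}

For example, with hypothetical values of 12.5 RPM in the CasKAS sample and 2.0 RPM in the control over the same region, the estimated occupancy would be $12.5 - 2.0 = 10.5$.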

\subsection*{Cutting score calculation}

The Cas9 cutting $C$-score was calculated as follows. 

First, basepair-level RPM profiles for mapped read 5' ends were generated separately for the forward and reverse strands. Then the $C$-score was calculated by multiplying the forward and reverse strand profiles (each summed over a running window of 3 bp):

\begin{equation}
C\mbox{-score}_{c,i} = \left(\sum_{j=i-1}^{i+1} RPM^+_{c,j}\right) \times \left(\sum_{j=i-1}^{i+1} RPM^-_{c,j}\right)
\end{equation}

Where $c$ and $i$ denote the chromosome and the position on it, respectively, and $RPM^+$ and $RPM^-$ are the forward- and reverse-strand 5'-end profiles.
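
As a worked example with hypothetical values, suppose that at positions $i-1$, $i$ and $i+1$ the forward-strand 5'-end profile is $(0.5, 1.0, 0.0)$ RPM and the reverse-strand profile is $(0.0, 1.5, 0.5)$ RPM. Then:

\begin{equation*}
C\mbox{-score}_{c,i} = (0.5 + 1.0 + 0.0) \times (0.0 + 1.5 + 0.5) = 3.0
\end{equation*}

Because the score is a product, it is zero whenever either strand has no read 5' ends within the 3-bp window, so only positions with 5' ends on both strands receive a non-zero score.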

\subsection*{Amplicon sequencing}

Amplicon sequencing was performed according to the xGen Amplicon Panels (IDT) protocol. A custom panel of primers was designed for the predicted off-target sites of each sgRNA through IDT's amplicon sequencing panel design service. Multiplex amplicon PCR was performed using 17.5 ng of genomic DNA. For amplification, the standard xGen Amplicon Panels protocol was followed, with annealing at 63$\,^{\circ}\mathrm{C}$ and extension at 65$\,^{\circ}\mathrm{C}$. Libraries were quantified using the NEBNext Library Quantification Kit for Illumina (E7630S) and pooled for sequencing on a MiSeq.
	
\subsection*{Amplicon sequencing analysis}

Amplicon sequencing reads were aligned against the \verb|hg38| version of the human genome using \verb|bwa mem|\cite{BWA} (version 0.7.5a) with default settings. Indel frequencies per basepair were calculated as the fraction of reads containing an indel over a given position using custom-written scripts.
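
One way to write this per-position frequency is given here, under the assumption that the denominator is the number of reads covering the position. The symbols $f_p$, $N^{\mathrm{indel}}_p$ and $N_p$ are introduced only for illustration:

\begin{equation*}
f_p = \cfrac{N^{\mathrm{indel}}_p}{N_p}
\end{equation*}

Here $N^{\mathrm{indel}}_p$ is the number of reads containing an indel that overlaps position $p$, and $N_p$ is the total number of reads covering $p$. For example, if a hypothetical position is covered by 1,000 reads, 30 of which carry an indel over it, then $f_p = 30/1000 = 0.03$.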

\section*{Data availability}

Sequencing reads for the datasets described in this study are available from GEO accession GSE171962.

\section*{Author contributions}

G.K.M. conceptualized the study, performed initial \textit{in vitro} CasKAS experiments, analyzed data, and wrote the manuscript with input from all authors. S.H.K. developed the \textit{in vivo} CasKAS protocol, and performed \textit{in vivo} CasKAS experiments together with S.I.H. S.T.B. carried out \textit{in vitro} CasKAS optimization. A.E.T. and J.T. supplied sgRNAs and designed off-target profiling experiments. A.E.T. carried out off-target analysis for mouse sgRNAs. T.W. provided key reagents. W.J.G., A.K., C.H., M.C.B. and L.B. supervised the study.

\section*{Acknowledgments}

This work was supported by NIH grants (P50HG007735, R01HG008140, U19AI057266 and UM1HG009442 to W.J.G., 1UM1HG009436 to W.J.G. and A.K., 1DP2OD022870-01 and 1U01HG009431 to A.K., and HG006827 to C.H.), the Rita Allen Foundation (to W.J.G.), the Baxter Foundation Faculty Scholar Grant, and the Human Frontier Science Program grant RGY006S (to W.J.G.). W.J.G. is a Chan Zuckerberg Biohub investigator and acknowledges grants 2017-174468 and 2018-182817 from the Chan Zuckerberg Initiative. S.H.K. is supported by MSTP training grant T32GM007365 and the Paul and Daisy Soros Fellowship. J.T. is supported by the NIDDK F99/K00 fellowship of the National Institutes of Health (F99DK126120). M.C.B. is supported by a grant from Stanford ChEM-H and an NIH Director's New Innovator Award (1DP2HD08406901). Fellowship support was also provided by the Stanford School of Medicine Dean's Fellowship (G.K.M.), the Siebel Scholars, the Enhancing Diversity in Graduate Education Program and the Weiland Family Fellowship (A.E.T.). C.H. is a Howard Hughes Medical Institute Investigator. 

The authors would like to thank Zohar Shipony and members of the Greenleaf, Kundaje, and Bassik labs for helpful discussion and suggestions regarding this work.

\section*{Competing interests}

G.K.M., W.J.G., T.W. and C.H. have submitted a provisional patent application based on this work.


\begin{thebibliography}{100}

\input{references-V4}

\end{thebibliography}

\end{multicols}

\clearpage

\clearpage

\setcounter{table}{0}
\renewcommand{\tablename}{Supplementary Table}
\setcounter{figure}{0}
\renewcommand{\figurename}{Supplementary Figure}

\setcounter{page}{1}
\renewcommand\thepage{{SM }\arabic{page}}

\begin{center}
% {\LARGE \textbf{\begin{spacing}{1.1}XXXX. \\ Supplementary Materials\end{spacing} }}
{\LARGE \textbf{Supplementary Materials}}
\end{center}

% \section*{Supplementary Tables}

\section*{Supplementary Figures}

% \begin{figure*}[!ht]
% \begin{center}
% \includegraphics[width=8cm]{FigS1-correlations-V2.png}
% \end{center}
% \captionsetup{singlelinecheck=off,justification=justified}
% \caption{
% {\bf Correspondence between \textit{in vitro} dCas9 and active Cas9 CasKAS profiles for the mouse sgRNA \#1 guide}. 
% }
% \label{FigS1}
% \end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS7-Nanog.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles around the mouse \textit{Nanog} locus using the ``Nanog-sg2'' and ``Nanog-sg3'' sgRNAs}. 
}
\label{FigS7}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS4-EMX1-in-vitro-negative-controls.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vitro} is specific to the activity of the dCas9/Cas9 protein combined with its sgRNA}. CasKAS was carried out with the EMX1 sgRNA and with the following combinations of protein and sgRNA: dCas9 + sgRNA, Cas9 + sgRNA, dCas9 alone, Cas9 alone, or sgRNA alone.
}
\label{FigS4}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS5-VEGFA-in-vitro.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vitro} around the \textit{VEGFA} gene with the VEGFA sgRNA}. 
}
\label{FigS5}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS18-VEGFA-in-vivo.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vivo} around the \textit{VEGFA} gene with the VEGFA sgRNA}. 
}
\label{FigS18}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS27-in-vivo-time-course-dCas9-EMX1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Time course of \textit{in vivo} CasKAS signal around the \textit{EMX1} gene with the EMX1 sgRNA using dCas9}. HEK293 cells were harvested and KAS-seq carried out at the indicated time points after the initiation of the \textit{in vivo} CasKAS experiment.
}
\label{FigS27}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS28-in-vivo-time-course-dCas9-VEGF.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Time course of \textit{in vivo} CasKAS signal around the \textit{VEGFA} gene with the VEGFA sgRNA using dCas9}. HEK293 cells were harvested and KAS-seq carried out at the indicated time points after the initiation of the \textit{in vivo} CasKAS experiment.
}
\label{FigS28}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS29-in-vivo-time-course-Cas9-EMX1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Time course of \textit{in vivo} CasKAS signal around the \textit{EMX1} gene with the EMX1 sgRNA using active Cas9}. HEK293 cells were harvested and KAS-seq carried out at the indicated time points after the initiation of the \textit{in vivo} CasKAS experiment.
}
\label{FigS29}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS30-in-vivo-time-course-Cas9-VEGF.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Time course of \textit{in vivo} CasKAS signal around the \textit{VEGFA} gene with the VEGFA sgRNA using active Cas9}. HEK293 cells were harvested and KAS-seq carried out at the indicated time points after the initiation of the \textit{in vivo} CasKAS experiment.
}
\label{FigS30}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS31-in-vitro-CD2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vitro} around the \textit{CD2} gene with two different sgRNAs targeting the gene}. 
}
\label{FigS31}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS32-in-vivo-CD2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vivo} (HEK293 cells, harvested at 48 hours) around the \textit{CD2} gene with two different sgRNAs targeting the gene}. 
}
\label{FigS32}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS33-in-vitro-CD90.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vitro} around the \textit{CD90/THY1} gene with two different sgRNAs targeting the gene}. 
}
\label{FigS33}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS34-in-vivo-CD90.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vivo} (HEK293 cells, harvested at 48 hours) around the \textit{CD90/THY1} gene with two different sgRNAs targeting the gene}. 
}
\label{FigS34}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS35-in-vitro-CD45.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vitro} around the \textit{CD45/PTPRC} gene with two different sgRNAs targeting the gene}. 
}
\label{FigS35}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS36-in-vitro-CD45.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vivo} (HEK293 cells, harvested at 48 hours) around the \textit{CD45/PTPRC} gene with two different sgRNAs targeting the gene}. 
}
\label{FigS36}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS37-in-vitro-CD298.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vitro} around the \textit{CD298/ATP1B3} gene with two different sgRNAs targeting the gene}. 
}
\label{FigS37}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS38-in-vivo-CD298.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS signal \textit{in vivo} (HEK293 cells, harvested at 48 hours) around the \textit{CD298/ATP1B3} gene with two different sgRNAs targeting the gene}. Note that in this case the gene displays strong native KAS-seq signal around its promoter in HEK293 cells, overlapping the sgRNA targeting sites (see the profiles for ``safe'' sgRNAs not targeting this locus below). With dCas9, KAS-seq signal above control levels is observed, but not with active Cas9, suggesting that the processes generating ssDNA at this locus (e.g. association with RNA polymerases) might be displacing the active Cas9; in contrast, continuous reassociation with dCas9 (as the target sequence is not altered by cleavage) maintains the elevated KAS-seq signal signature.
}
\label{FigS38}
\end{figure*}

% \begin{figure*}[!ht]
% \begin{center}
% \includegraphics[width=18.5cm]{FigS39-in-vitro-safe.png}
% \end{center}
% \captionsetup{singlelinecheck=off,justification=justified}
% \caption{
% {\bf CasKAS signal \textit{in vitro} around the target sites for two different ``safe'' sgRNA not targeting any genes}. 
% (a) CasKAS signal around the ``safe'' sgRNA \#1 target site;
% (b) CasKAS signal around the ``safe'' sgRNA \#2 target site.
% }
% \label{FigS39}
% \end{figure*}

% \begin{figure*}[!ht]
% \begin{center}
% \includegraphics[width=18.5cm]{FigS40-in-vivo-safe.png}
% \end{center}
% \captionsetup{singlelinecheck=off,justification=justified}
% \caption{
% {\bf CasKAS signal \textit{in vivo} (HEK293 cells, harvested at 48 hours) around the target sites for two different ``safe'' sgRNA not targeting any genes}. 
% (a) CasKAS signal around the ``safe'' sgRNA \#1 target site;
% (b) CasKAS signal around the ``safe'' sgRNA \#2 target site.
% }
% \label{FigS40}
% \end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS8-peak-shape.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS identifies proper off-target sites that are missed by sgRNA prediction algorithms}. Shown is \textit{in vitro} dCas9 CasKAS for the ``sgRNA \#1'' sgRNA. Peaks were called \textit{de novo} using MACS2, then intersected with Cas-OFFinder off-target predictions, and the outersect (peaks not overlapping any prediction) was manually filtered to exclude obvious artifacts based on peak shape (e.g. arising from repetitive elements in the genome). 
(a) Aggregate forward- and reverse-strand profiles around off-target sites predicted by Cas-OFFinder (centered on the sgRNA);
(b) Aggregate forward- and reverse-strand profiles around sites not predicted by Cas-OFFinder (centered on the MACS2 peak summit);
(c) Example UCSC Genome Browser snapshot of a CasKAS read profile around an off-target site predicted by Cas-OFFinder;
(d) Example UCSC Genome Browser snapshot of a CasKAS read profile around an off-target site not predicted by Cas-OFFinder. Both the predicted sites and the sites identified through peak calling exhibit the expected asymmetric read distribution around a fixed occupancy point (the sgRNA-dCas9 RNP complexed with DNA).
}
\label{FigS8}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS13-heatmap-Nanog-sg2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``Nanog-sg2'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS13}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS14-heatmap-Nanog-sg3.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``Nanog-sg3'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS14}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS15-heatmap-EMX1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``EMX1\_Tsai'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS15}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS16-heatmap-VEGFA.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``VEGFA-site1'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS16}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS41-heatmap-CD2-1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``CD2-1'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS41}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS42-heatmap-CD2-2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``CD2-2'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS42}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS43-heatmap-CD45-1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``CD45-1'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS43}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS44-heatmap-CD45-2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``CD45-2'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS44}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=10cm]{FigS45-heatmap-CD90-1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} Cas9 CasKAS profiles for the ``CD90-1'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS45-heatmap-CD90-1}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS46-heatmap-CD90-2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``CD90-2'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS46-heatmap-CD90-2}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS47-heatmap-CD298-1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``CD298-1'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS47}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS48-heatmap-CD298-2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``CD298-2'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500-bp region around the sgRNA target site). 
}
\label{FigS48}
\end{figure*}

\clearpage

% \begin{figure*}[!ht]
% \begin{center}
% \includegraphics[width=18.5cm]{FigS49-heatmap-safe-1.png}
% \end{center}
% \captionsetup{singlelinecheck=off,justification=justified}
% \caption{
% {\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``safe-1'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500bp region around the sgRNA target site). 
% }
% \label{FigS49}
% \end{figure*}

% \clearpage

% \begin{figure*}[!ht]
% \begin{center}
% \includegraphics[width=18.5cm]{FigS50-heatmap-safe-2.png}
% \end{center}
% \captionsetup{singlelinecheck=off,justification=justified}
% \caption{
% {\bf \textit{In vitro} dCas9 and Cas9 CasKAS profiles for the ``safe-2'' sgRNA}. CasKAS profiles are shown for all off-target sites predicted by Cas-OFFinder as well as for the top 1000 sites (ranked by CasKAS RPM values over the $\pm$500bp region around the sgRNA target site). 
% }
% \label{FigS50}
% \end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS9-sgRNA-off-targets-Nanog-sg2.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``Nanog-sg2'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS9}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS10-sgRNA-off-targets-Nanog-sg3.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``Nanog-sg3'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS10}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS11-sgRNA-off-targets-EMX1.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``EMX1\_Tsai'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS11}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS12-sgRNA-off-targets-VEGFA.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``VEGFA-site1'' sgRNA}. Shown are all target sites with RPM $\geq$ 1.5 as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS12}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS53-sgRNA-off-targets-CD2-1.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``CD2-1'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS53}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS54-sgRNA-off-targets-CD2-2.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``CD2-2'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS54}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS55-sgRNA-off-targets-CD45-1.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``CD45-1'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS55}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS56-sgRNA-off-targets-CD45-2.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``CD45-2'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS56}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS57-sgRNA-off-targets-CD90-1.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} Cas9 CasKAS for the ``CD90-1'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS57}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS58-sgRNA-off-targets-CD90-2.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``CD90-2'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS58}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS59-sgRNA-off-targets-CD298-1.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``CD298-1'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS59}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=11.75cm]{FigS60-sgRNA-off-targets-CD298-2.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``CD298-2'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
}
\label{FigS60}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

% \begin{figure*}[!ht]
% \begin{center}
% \begin{minipage}[c]{0.70\linewidth}
% \includegraphics[width=11.75cm]{FigS61-sgRNA-off-targets-safe-1.png}
% \end{minipage}\hfill
% \begin{minipage}[c]{0.30\linewidth}
% \captionsetup{singlelinecheck=off,justification=justified}
% \caption{
% {\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``safe-1'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
% }
% \label{FigS61}
% \end{minipage}
% \end{center}
% \end{figure*}

% \clearpage

% \begin{figure*}[!ht]
% \begin{center}
% \begin{minipage}[c]{0.70\linewidth}
% \includegraphics[width=11.75cm]{FigS62-sgRNA-off-targets-dafe-2.png}
% \end{minipage}\hfill
% \begin{minipage}[c]{0.30\linewidth}
% \captionsetup{singlelinecheck=off,justification=justified}
% \caption{
% {\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``safe-2'' sgRNA}. Shown are the top 100 off-target sites as predicted by Cas-OFFinder and ranked by CasKAS signal. The on-target site (if within the top 100) is highlighted in yellow. The black bars on the bottom indicate the degree of sequence conservation for a given position within the multiple sequence alignment.
% }
% \label{FigS62}
% \end{minipage}
% \end{center}
% \end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.50\linewidth}
\includegraphics[width=5.25cm]{FigS17-sgRNA-off-targets-sgRNA1-other-peak-calls.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.50\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Multiple sequence alignment of off-target sites identified by \textit{in vitro} dCas9 and Cas9 CasKAS for the ``sgRNA \#1'' sgRNA outside the list of off-targets predicted by Cas-OFFinder}. MACS2 peak calls were manually filtered to exclude artifactual peaks, then the sequence of the $\pm$50-bp region around each peak summit was used as input to the multiple sequence alignment, together with the sgRNA itself.
\label{FigS17}}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=12cm]{FigS21-VEGFA-cutting.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Cutting profiles around on- and off-target sites for the VEGFA sgRNA}. Four sites where cleavage is observed are identified within the list of predicted off-targets.
}
\label{FigS21}
\end{minipage}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=12cm]{FigS63-CD2-1-cutting.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Cutting profiles around the top 100 on- and off-target sites for the ``CD2-1'' sgRNA}. 
}
\label{FigS63}
\end{minipage}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=12cm]{FigS64-CD2-2-cutting.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Cutting profiles around the top 100 on- and off-target sites for the ``CD2-2'' sgRNA}. 
}
\label{FigS64}
\end{minipage}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=12cm]{FigS65-CD45-1-cutting.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Cutting profiles around the top 100 on- and off-target sites for the ``CD45-1'' sgRNA}. 
}
\label{FigS65}
\end{minipage}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=12cm]{FigS66-CD45-2-cutting.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Cutting profiles around the top 100 on- and off-target sites for the ``CD45-2'' sgRNA}. 
}
\label{FigS66}
\end{minipage}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=12cm]{FigS67-CD90-1-cutting.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Cutting profiles around the top 100 on- and off-target sites for the ``CD90-1'' sgRNA}. 
}
\label{FigS67}
\end{minipage}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=12cm]{FigS68-CD90-2-cutting.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Cutting profiles around the top 100 on- and off-target sites for the ``CD90-2'' sgRNA}. 
}
\label{FigS68}
\end{minipage}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=12cm]{FigS69-CD298-1-cutting.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Cutting profiles around the top 100 on- and off-target sites for the ``CD298-1'' sgRNA}. 
}
\label{FigS69}
\end{minipage}
\end{center}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\begin{minipage}[c]{0.70\linewidth}
\includegraphics[width=12cm]{FigS70-CD298-2-cutting.png}
\end{minipage}\hfill
\begin{minipage}[c]{0.30\linewidth}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Cutting profiles around the top 100 on- and off-target sites for the ``CD298-2'' sgRNA}. 
}
\label{FigS70}
\end{minipage}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS51-Amplicon-EMX1.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Amplicon sequencing of DNA edits with the EMX1 sgRNA}. HEK293 cells were transfected (in replicates) with the EMX1 or the VEGFA sgRNAs. Genomic DNA was extracted and a total of 81 potential off-target sites for the EMX1 sgRNA were amplicon-sequenced. 
(a) The indel frequency profiles (the fraction of reads with an indel over a given position) over each such site identify the on-target site as the only position that is edited, concordant with the CasKAS results.
(b) Genome browser snapshot of indel frequencies over the on-target site.
(c) An additional site shows high indel frequency; however, it is present at the same rate in all datasets, including the no-guide negative control, indicating that this is an endogenous sequence variant and not an actual \textit{in vivo} off-target.
}
\label{FigS51}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS52-Amplicon-VEGFA.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Amplicon sequencing of DNA edits with the VEGFA sgRNA}. HEK293 cells were transfected (in replicates) with the EMX1 or the VEGFA sgRNAs. Genomic DNA was extracted and a total of 52 potential off-target sites for the VEGFA sgRNA were amplicon-sequenced. Very high \textit{in vivo} indel frequency is observed for the on-target (b) and one of the other sites (d) identified as active Cas9 cutting targets in the \textit{in vitro} CasKAS. Another site (c) also shows elevated indel frequency. The fourth site (e) does not appear to be a cutting target \textit{in vitro}.
}
\label{FigS52}
\end{center}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS19-GUIDE-seq-ChIP-seq-comparison.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Comparing \textit{in vitro} dCas9 results to using ChIP-seq and CHANGE-seq for off-target profiling}. Shown is the overlap of MACS2 peak calls for the Nanog-sg3 sgRNA with a Nanog ChIP-seq dataset (SRR1168384 from GEO accession ID GSE54745) in (a) and of MACS2 peak calls for the EMX1 sgRNA with EMX1 CHANGE-seq (SRA accession SRX8227890) in (b). The fraction of peaks common or unique to each assay that are predicted to be off-targets for each sgRNA by Cas-OFFinder is shown in (c).
}
\label{FigS19}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=14cm]{FigS25-Discover-seq-comparison-1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Comparing \textit{in vitro} dCas9 results to using DISCOVER-seq for off-target profiling}. Shown is the overlap of MACS2 peak calls for the EMX1 sgRNA with MACS2 peak calls on datasets from the original DISCOVER-seq publication\cite{Wienert2019}.
}
\label{FigS25}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=14cm]{FigS26-Discover-seq-comparison-2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Comparing \textit{in vitro} dCas9 results to using DISCOVER-seq for off-target profiling}. Shown is the overlap of MACS2 peak calls for the VEGFA sgRNA with MACS2 peak calls on datasets from the original DISCOVER-seq publication\cite{Wienert2019}.
}
\label{FigS26}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS23-GUIDE-seq-comparison.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Comparing \textit{in vitro} dCas9 results to using GUIDE-seq for off-target profiling}. Shown is the overlap of MACS2 peak calls for the EMX1 and VEGFA sgRNAs with off-target regions defined by the original GUIDE-seq publication\cite{Tsai2015}.
}
\label{FigS23}
\end{figure*}

\begin{figure*}[!hb]
\begin{center}
\includegraphics[width=18.5cm]{FigS24-Digenome-seq-comparison.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Comparing \textit{in vitro} dCas9 results to using Digenome-seq for off-target profiling}. Shown is the overlap of MACS2 peak calls for the VEGFA sgRNA with off-target regions defined by the original Digenome-seq publication\cite{Kim2015}.
}
\label{FigS24}
\end{figure*}

\clearpage

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS3-sgRNA-G-content.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Most sgRNAs in the human genome contain multiple G nucleotides and are thus subject to labeling by N$_3$-kethoxal}. Statistics were calculated for all valid sgRNAs as defined by GuideScan\cite{Perez2017}.
(a) Cumulative fraction of sgRNAs.
(b) Absolute number of sgRNAs.
}
\label{FigS3}
\end{figure*}

\clearpage

\begin{FPfigure}
\begin{center}
\includegraphics[width=18cm]{FigS22-GC-vs-RPM-V2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Absence of strong correlation between the number of G nucleotides in a sgRNA off-target site and CasKAS signal}. Highly enriched off-target sites do not show a strong preference for containing more G nucleotides than other predicted off-target sites.  \\
(a) sgRNA \#1 dCas9; Pearson $r = 0.00$, Spearman $R = 0.13$; \\
(b) Nanog-sg2 dCas9; Pearson $r = 0.10$, Spearman $R = 0.12$; \\
(c) Nanog-sg3 \#1 dCas9; Pearson $r = 0.11$, Spearman $R = 0.18$; \\
(d) VEGFA \#1 dCas9; Pearson $r = -0.07$, Spearman $R = -0.09$; \\
(e) EMX1 \#1 dCas9; Pearson $r = -0.02$, Spearman $R = -0.06$; \\
(f) CD90 \#2 dCas9; Pearson $r = 0.07$, Spearman $R = 0.13$; \\
(g) CD2 \#1 dCas9; Pearson $r = 0.08$, Spearman $R = 0.13$; \\
(h) CD2 \#2 dCas9; Pearson $r = -0.01$, Spearman $R = -0.01$; \\
(i) CD45 \#1 dCas9; Pearson $r = 0.07$, Spearman $R = 0.06$; \\
(j) CD45 \#2 dCas9; Pearson $r = 0.08$, Spearman $R = 0.08$; \\
(k) CD298 \#1 dCas9; Pearson $r = 0.22$, Spearman $R = 0.28$; \\
(l) CD298 \#2 dCas9; Pearson $r = 0.06$, Spearman $R = 0.04$.
}
\label{FigS22}
\end{FPfigure}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{FigS6-pre-sheared-EMX1.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf CasKAS can be performed on pre-sheared DNA}. CasKAS was performed \textit{in vitro} using the EMX1 sgRNA in two ways: conventionally, by carrying out the CasKAS reaction and then isolating and shearing the genomic DNA, and alternatively, by pre-shearing the DNA and carrying out the CasKAS reaction on the fragmented DNA. The concern in the latter case is that sticky ends containing Gs, which are not protected from the action of N$_3$-kethoxal, would raise the background. This problem can be addressed by carrying out end repair on the sheared DNA prior to the CasKAS reaction.
}
\label{FigS6}
\end{figure*}

\end{document}
