\documentclass[10pt]{article}
\usepackage{marginnote}
\usepackage[paperheight=25cm,paperwidth=18cm,lmargin=1.7cm,rmargin=1.7cm,top=2.2cm,bottom=2cm,marginparwidth=3.2cm,marginparsep=-3.2cm]{geometry}
% \usepackage[paperheight=25cm,paperwidth=18cm,lmargin=1.7cm,rmargin=1.7cm,top=2.2cm,bottom=2cm]{geometry}
\setlength\columnsep{30pt}
\usepackage{multicol}
\usepackage{amsmath}
\usepackage{mathtools}
\usepackage{amsthm}
\usepackage{array}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage[auth-sc]{authblk}
\usepackage{longtable}
\usepackage{multirow}
\usepackage{hyperref}
\usepackage{enumerate}
\usepackage[labelfont=bf]{caption}
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage{mdframed}
\usepackage{graphics}
\usepackage{rotating}
\usepackage{lscape}
\usepackage{breakurl}
\usepackage{todonotes}
\usepackage{hanging}
\usepackage[final]{pdfpages}
\usepackage[leftFloats,CaptionAfterwards]{fltpage}
\usepackage[numbers,sort&compress]{natbib}
\setlength{\bibsep}{3pt}
\usepackage{abstract}
\usepackage{enumitem}
\usepackage{titlesec}
\usepackage{etoolbox}
\usepackage{soul}
\patchcmd{\thebibliography}{\section*}{\section}{}{}
\titleformat{\section}[block]{\bf \fontfamily{phv}\selectfont{\Large\bfseries\filcenter}}{\thesection}{0.6em}{}
\titleformat{\subsection}[block]{\bf\fontfamily{phv}\selectfont{\normalsize\bfseries\filcenter}}{\thesubsection}{0.4em}{}

\hypersetup{
  colorlinks,
  citecolor=Blue,
  linkcolor=Red,
  urlcolor=Violet}
  
\usepackage{helvet}


% \titleformat{\section}[leftmargin]{\normalfont\sffamily\bfseries\filleft}{}{0pt}{}
% \titlespacing{\section}{4pc}{1.5ex plus .1ex minus .2ex}{1pc}

\makeatletter
\def\@biblabel#1{\@ifnotempty{#1}{#1.}}
\makeatother

\newenvironment{Figure}
{\par\medskip\noindent\minipage{\linewidth}}
{\endminipage\par\medskip}

\makeatletter
\renewcommand{\maketitle}{\bgroup\setlength{\parindent}{0pt}
\begin{flushleft}
  \textbf{\@title}
  \@author
\end{flushleft}\egroup
}
\makeatother


\title{\bf \begin{flushleft}\fontfamily{phv}\selectfont{\Large Simultaneous single-cell profiling of the transcriptome and accessible chromatin using SHARE-seq}
\end{flushleft}}
\renewcommand\Authfont{\normalsize}
\author[1,*,$\#$]{\fontfamily{phv}\selectfont{\textbf{Samuel H. Kim}}}
\author[2,*]{\fontfamily{phv}\selectfont{\textbf{Georgi K. Marinov}}}
\author[2]{\fontfamily{phv}\selectfont{\textbf{S. Tansu Bagdatli}}}
\author[2]{\fontfamily{phv}\selectfont{\textbf{Soon Il Higashino}}}
\author[2]{\fontfamily{phv}\selectfont{\textbf{Zohar Shipony}}}
\author[2,3]{\fontfamily{phv}\selectfont{\textbf{Anshul Kundaje}}}
\author[2,4,5,6]{\fontfamily{phv}\selectfont{\textbf{William J. Greenleaf}}}
\renewcommand\Affilfont{\itshape\normalsize}
\affil[1]{Cancer Biology Programs, School of Medicine, Stanford University, Stanford, CA 94305, USA}
\affil[2]{Department of Genetics, Stanford University, Stanford, CA 94305, USA}
\affil[3]{Department of Computer Science, Stanford University, Stanford, CA 94305, USA}
\affil[4]{Center for Personal Dynamic Regulomes, Stanford University, Stanford, California 94305, USA}
\affil[5]{Department of Applied Physics, Stanford University, Stanford, California 94305, USA}
\affil[6]{Chan Zuckerberg Biohub, San Francisco, California, USA}
\affil[*]{These authors contributed equally to this work}
\affil[$\#$]{Corresponding author}
\date{}

\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]}
\let\endchangemargin=\endlist 

\theoremstyle{definition}
\newtheorem{note}{}

\begin{document}
\maketitle

\renewcommand{\abstractname}{\noindent\fontfamily{phv}\selectfont{\centerline{}
Abstract}}

\renewenvironment{abstract}
 {\small
  \begin{flushleft}
  \bfseries \noindent{\large\abstractname}\par\nobreak\smallskip\vspace{-.5em}\vspace{0pt}
  \end{flushleft}
  \list{}{
    \setlength{\leftmargin}{.0cm}%
    \setlength{\rightmargin}{\leftmargin}%
  }%
  \item\relax}
 {\endlist}
 
% \renewenvironment{abstract}
%   {\small\quotation
%   {\bfseries\noindent{\large\abstractname}\par\nobreak\smallskip}}
%   {\endquotation}

\renewcommand{\figurename}{Fig.}

\centerline{}
\begin{abstract}
\noindent{\normalsize The ability to analyze the transcriptomic and epigenomic states of individual cells has in recent years transformed how we measure and understand biological processes. Recent advances have focused on increasing sensitivity and throughput to provide richer and deeper biological insights at the cellular level. The next frontier is the development of multiomic methods capable of analyzing multiple features of the same cell, such as the simultaneous measurement of the transcriptome and of the chromatin accessibility of candidate regulatory elements. In this chapter, we describe SHARE-seq (\textbf{\underline{S}}imultaneous \textbf{\underline{h}}igh-throughput \textbf{\underline{A}}TAC and \textbf{\underline{R}}NA \textbf{\underline{e}}xpression with \textbf{\underline{seq}}uencing), a method for simultaneous chromatin accessibility and transcriptome measurements in single cells, together with the experimental and analytical considerations for achieving optimal results.
\centerline{}
\centerline{}
\indent\indent\textbf{Key words:} scRNA-seq, scATAC-seq, multiomics, chromatin accessibility, transcriptomics, split-pool}
\end{abstract}
\centerline{}
\centerline{}
\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm}

\reversemarginpar\marginpar{\section{Introduction}}

The basic unit of biological organization is the individual cell. Within a multicellular organism, each cell integrates internal and external stimuli, including signals from its surrounding microenvironment, to maintain or alter its state in support of its biological function. Understanding cellular states at single-cell resolution is therefore critical to defining the regulatory processes that drive health and disease. A key advance toward understanding cellular states has been the development of transcriptomic methods. With the advent of high-throughput sequencing in the late 2000s, RNA-seq was developed to profile transcriptomes at base-pair resolution \cite{Mortazavi2008,Nagalakshmi2008,Sultan2008,Wilhelm2008}. Subsequent molecular biology advances that steadily improved the sensitivity of RNA-seq led to the development of single-cell RNA-seq (scRNA-seq) for measuring transcriptomes in individual cells. The first scRNA-seq methods \cite{Tang2009,Islam2011,Ramdskold2012,Hashimshony2012} had very low throughput and could measure only a few cells at a time. Further technical advances used microfluidics- and plate-based approaches to increase throughput to the $10^2$--$10^3$ range \cite{Shalek2013,Jaitin2014}, while droplet- and bead-based methods later boosted it to the $10^4$--$10^5$ range \cite{Klein2015,Macosko2015,10xRNA,Microwell-Seq}. However, the approach that holds the most promise for ultra-high-throughput single-cell measurements is combinatorial indexing. The core concept of these approaches is to assign barcodes through multiple rounds of splitting and pooling cells, creating a combinatorial set of barcodes that can uniquely identify each cell. Specifically, a set of cells is split across the wells of a 96- or 384-well plate, a well-specific barcode is added in each well, and the cells are then pooled back together and randomly split into another plate.
By iterating these split-pool rounds with an appropriate number of input cells, barcodes, and rounds, one can create sufficient barcode diversity that each cell is, with high probability, labeled by a unique barcode combination. Compared with physically isolating each cell in a droplet or a well, combinatorial indexing provides a readily scalable platform for single-cell measurements. This is the basis of combinatorial indexing methods such as sci-RNA-seq \cite{Cao2017} (``sci'' standing for single-cell combinatorial indexing) and SPLiT-seq \cite{Rosenberg2018}.
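
The effect of the number of wells and rounds on barcode uniqueness can be illustrated with a short simulation. The Python 3 sketch below is purely illustrative and is not part of the SHARE-seq protocol or of any published analysis pipeline: the function name \verb|split_pool| is hypothetical, and the model assumes that each cell lands in a uniformly random well in every round and that no cells are lost. It returns the fraction of cells whose barcode combination is shared with at least one other cell.

\begin{verbatim}
import random
from collections import Counter


def split_pool(n_cells, wells_per_round, n_rounds, seed=0):
    """Fraction of cells sharing a barcode combination.

    Illustrative model: every cell is sent to a uniformly
    random well in each round (no cell loss), and its
    barcode is the tuple of wells it passed through.
    """
    rng = random.Random(seed)
    combos = [
        tuple(rng.randrange(wells_per_round)
              for _ in range(n_rounds))
        for _ in range(n_cells)
    ]
    counts = Counter(combos)
    shared = sum(1 for c in combos if counts[c] > 1)
    return shared / n_cells


# 96 x 96 x 96 plate layout
print(96 ** 3, "possible combinations")  # 884736
print(split_pool(10_000, 96, 3))
\end{verbatim}

With three rounds of 96 wells there are $96^3 = 884{,}736$ possible combinations, so collisions remain rare as long as the number of cells is small relative to this number.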

While scRNA-seq measures the current transcript levels in a given cell, it does not reveal how that transcriptional state is established and maintained through regulation. Mapping active \textit{cis}-regulatory elements (cREs) provides key insight into this question. A common property of active cREs, first recognized more than four decades ago \cite{McGhe1981,Keene1981,Wu1980}, is that they are depleted of nucleosomes and exhibit an open, ``accessible'' conformation. This property has been the basis for the numerous methods developed over the years to profile these elements \cite{Minnoye2021}, which rely on the preferential enzymatic cleavage or labeling of open chromatin regions. ATAC-seq \cite{Buenrostro2013,Corces2017} (\textbf{\underline{A}}ssay for \textbf{\underline{T}}ransposase-\textbf{\underline{A}}ccessible \textbf{\underline{C}}hromatin using \textbf{\underline{seq}}uencing) has emerged as the most versatile of these assays. ATAC-seq takes advantage of the preferential insertion of a hyperactive Tn5 transposase \cite{Reznikoff2008}, preloaded with sequencing adapters, into open chromatin. Tn5 had previously been adapted and successfully used for generating high-throughput sequencing libraries from low-input DNA samples \cite{Adey2010}. The realization that it can also tag open chromatin regions with ready-to-amplify sequencing adapters in a single reaction allowed chromatin accessibility profiling to be carried out in bulk on very low-input samples (typically 50,000 cells, but also down to just a few thousand \cite{Buenrostro2013}) and, eventually, in single cells, in the form of scATAC-seq, in the mid-2010s \cite{Buenrostro2015}. 
As with scRNA-seq, the throughput of scATAC-seq has also been dramatically increased over the years, using combinatorial indexing (sciATAC-seq \cite{Cusanovich2015,Cusanovich2018,Preissl2018}), microwell plates ($\mu$ATAC-seq \cite{Mezger2018}), droplet-based methods \cite{10xATAC}, and combinations of combinatorial indexing and droplets (dsciATAC-seq \cite{Lareau2019}).

Techniques such as scRNA-seq and scATAC-seq have provided unprecedented insights into the diversity of cell types, their developmental dynamics, and cellular responses to external stimuli in a wide variety of contexts. Ideally, however, all relevant aspects of cellular state would be measured in the same cell. To this end, a variety of single-cell multiomic methods, which measure multiple such modalities in the same individual cells, have been under active development in recent years. These include methods for sequencing the genomes and transcriptomes of single cells (G\&T-seq \cite{G&T-seq}, PRDD-seq \cite{PRDD-seq}, DNTR-seq \cite{DNTR-seq}, sci-L3-RNA/DNA \cite{Yin2019}, TARGET-seq \cite{TARGET-seq}, and others), for sequencing methylomes and transcriptomes (scTrio-seq \cite{scTrio-seq}, scMT-seq \cite{scMT-seq}, and scM\&T-seq \cite{scM&T-seq}), for mapping accessible chromatin and methylomes (e.g., scNOMe-seq \cite{scNOMe-seq}), for measuring proteins and transcripts (REAP-seq \cite{REAP-seq}, CITE-seq \cite{CITE-seq}, QBC \cite{QBC}, inCITE-seq \cite{inCITE-seq}, iNS-seq \cite{iNS-seq}), for using methylation-based labeling of open chromatin to map accessible DNA together with transcripts (COOL-seq \cite{Guo2017}, scNMT-seq \cite{scNMT-seq}, scNOMeRe-seq \cite{scNOMeRe-seq}, snmC2T-seq \cite{snmC2T-seq}), for mapping protein occupancy and transcriptomes (CoTECH \cite{CoTECH}, Paired-Tag \cite{Paired-Tag}, scDam\&T-seq \cite{scDamT-seq}), for quantifying protein levels and mapping open chromatin (PHAGE-ATAC \cite{PHAGE-ATAC}, ASAP-seq \cite{Mimitou2021}), for quantifying proteins and transcripts while mapping open chromatin (DOGMA-seq \cite{Mimitou2021}, TEA-seq \cite{Swanson2021}), and others \cite{SUGAR-seq}. 

\begin{figure*}
\begin{center}
\includegraphics[width=8cm]{Fig1-outline.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Outline of the SHARE-seq assay}. Nuclei are isolated from cells or tissues and crosslinked. Transposition is then carried out on chromatin, followed by reverse transcription with a biotinylated RT primer. Three pool/split rounds of barcode oligo hybridization are then performed. Hybridized barcodes are ligated, and crosslinks are reversed. The ATAC and RNA fractions are separated by streptavidin pull-down. The ATAC fraction is amplified directly, while the RNA fraction is subjected to cDNA amplification, tagmentation, and final library amplification. }
\label{Fig1}
\end{center}
\end{figure*}

As regulatory elements and RNA levels are perhaps the two most informative modalities, joint scATAC-seq + scRNA-seq methods are among the most sought-after multiomic assays. A number of these have been developed in recent years, including sci-CAR-seq \cite{Cao2018}, Paired-seq \cite{Zhu2019}, ASTAR-seq \cite{ASTAR-seq}, SNARE-seq \cite{SNARE-seq}, and SHARE-seq \cite{Ma2020}. The ideal such assay should capture as many as possible of both the transcripts present in each cell and the open chromatin regions in its nucleus, with high specificity and little noise. The SHARE-seq assay, which is based on the combinatorial indexing strategy described above, provides high-quality, high-throughput transcriptome and accessible chromatin measurements in the same single cells. 

In this chapter, we describe in detail the SHARE-seq procedure and discuss the key optimization points and considerations for the generation of high-quality scATAC+scRNA-seq datasets.	

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section{Materials}}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{DNA oligos and primers}}}

All oligonucleotides can be ordered from IDT; the synthesis scale and purification method for each are listed below.

\begin{enumerate}
\item Round 1 linker (1 $\mu$mol scale, standard desalting): 

\verb|CCGAGCCCACGAGACTCGGACGATCATGGG|

\item Round 2 linker (1 $\mu$mol scale, standard desalting):

\verb|CAAGTATGCAGCGCGCTCAAGCACGTGGAT|

\item Round 3 linker (1 $\mu$mol scale, standard desalting):

\verb|AGTCGTACGCCGATGCGAAACATCGGCCAC|

\item Round 1 blocking (1 $\mu$mol scale, standard desalting):

\verb|CCCATGATCGTCCGAGTCTCGTGGGCTCGG|

\item Round 2 blocking (1 $\mu$mol scale, standard desalting):

\verb|ATCCACGTGCTTGAGCGCGCTGCATACTTG|

\item Round 3 blocking (1 $\mu$mol scale, standard desalting):

\verb|GTGGCCGATGTTTCGCATCGGCGTACGACT|

\item Read 1 (100 nmol scale, HPLC purified):

\verb|TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG|

\item Template Switching Oligo (TSO) (100 nmol scale, HPLC purified):

\verb|AAGCAGTGGTATCAACGCAGAGTGAATrGrG+G|

\item RNA PCR primer (100 nmol scale, standard desalting):

\verb|AAGCAGTGGTATCAACGCAGAGT|

\item P7 primer (100 nmol scale, standard desalting):

\verb|CAAGCAGAAGACGGCATACGAGAT|

\item Phosphorylated Read2 (100 nmol scale, HPLC purified):

\verb|/5Phos/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG|

\item Reverse transcription primer (RT primer) (100 nmol scale, HPLC purified):

\begin{verbatim}
/5Phos/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNNNNNNNNN
/iBiodT/TTTTTTTTTTTTTTVN
\end{verbatim}

\item Blocked\_ME\_Comp (100 nmol scale, HPLC purified):

\verb|/5Phos/C*T*G* T*C*T* C*T*T* A*T*A* C*A*/3ddC/|

\item Pool-split ligation Plate R1 (\textit{see} \textbf{Note \ref{plates}}):

\verb|/5Phos/CGCGCTGCATACTTG[8-bp-barcode]CCCATGATCGTCCGA|

\item Pool-split ligation Plate R2 (\textit{see} \textbf{Note \ref{plates}}):

\verb|/5Phos/CATCGGCGTACGACT[8-bp-barcode]ATCCACGTGCTTGAG|

\item Pool-split ligation Plate R3 (\textit{see} \textbf{Note \ref{plates}}):

\verb|CAAGCAGAAGACGGCATACGAGAT[8-bp-barcode]GTGGCCGATGTTTCG|

\item PCR Library indexing primers plate:

\begin{verbatim}
AATGATACGGCGACCACCGAGATCTACAC[8bp-index]
TCGTCGGCAGCGTCAGATGTGTAT
\end{verbatim}
\end{enumerate}
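
In the sequences listed above, each blocking oligo is the full reverse complement of the corresponding round linker. When re-ordering or modifying these oligos, this relationship can be checked with a few lines of Python 3, as in the illustrative sketch below; the helper name \verb|revcomp| is our own, and it handles only unmodified A/C/G/T sequences.

\begin{verbatim}
# Check that each blocking oligo is the reverse
# complement of its round linker
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unmodified A/C/G/T string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# round -> (linker, blocking oligo), as listed above
PAIRS = {
    "Round 1": ("CCGAGCCCACGAGACTCGGACGATCATGGG",
                "CCCATGATCGTCCGAGTCTCGTGGGCTCGG"),
    "Round 2": ("CAAGTATGCAGCGCGCTCAAGCACGTGGAT",
                "ATCCACGTGCTTGAGCGCGCTGCATACTTG"),
    "Round 3": ("AGTCGTACGCCGATGCGAAACATCGGCCAC",
                "GTGGCCGATGTTTCGCATCGGCGTACGACT"),
}

for name, (linker, blocking) in PAIRS.items():
    ok = revcomp(linker) == blocking
    print(name, "OK" if ok else "MISMATCH")
\end{verbatim}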

An example set of 96 barcodes is listed below:

\begin{verbatim}
AACGTGAT AAGGTACA CACTTCGA GATAGACA TGGAACAA ATCATTCC
AAACATCG ACACAGAA CAGCGTTA GCCACATA TGGCTTCA ATTGGCTC
ATGCCTAA ACAGCAGA CATACCAA GCGAGTAA TGGTGGTA CAAGGAGC
AGTGGTCA ACCTCCAA CCAGTTCA GCTAACGA TTCACGCA CACCTTAC
ACCACTGT ACGCTCGA CCGAAGTA GCTCGGTA AACTCACC CCATCCTC
ACATTGGC ACGTATCA CCGTGAGA GGAGAACA AAGAGATC CCGACAAC
CAGATCTG ACTATGCA CCTCCTGA GGTGCGAA AAGGACAC CCTAATCC
CATCAAGT AGAGTCAA CGAACTTA GTACGCAA AATCCGTC CCTCTATC
CGCTGATC AGATCGCA CGACTGGA GTCGTAGA AATGTTGC CGACACAC
ACAAGCTA AGCAGGAA CGCATACA GTCTGTCA ACACGACC CGGATTGC
CTGTAGCC AGTCACTA CTCAATGA GTGTTCTA ACAGATTC CTAAGGTC
AGTACAAG ATCCTGTA CTGAGCCA TAGGATGA AGATGTAC GAACAGGC
AACAACCA ATTGAGGA CTGGCATA TATCAGCA AGCACCTC GACAGTGC
AACCGAGA CAACCACA GAATCTGA TCCGTCTA AGCCATGC GAGTTAGC
AACGCTTA GACTAGTA CAAGACTA TCTTCACA AGGCTAAC GATGAATC
AAGACGGA CAATGGAA GAGCTGAA TGAAGAGA ATAGCGAC GCCAAGAC
\end{verbatim}
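
When substituting or extending the barcode set, it can be useful to confirm that all barcodes are distinct and that they differ from one another at several positions, so that a single sequencing error is unlikely to convert one valid barcode into another. The Python 3 sketch below is an illustrative helper (the names \verb|hamming| and \verb|barcode_qc| are hypothetical, not part of any published SHARE-seq tool) that reports the number of barcodes, the number of unique barcodes, the barcode length, and the minimum pairwise Hamming distance; to check the full set, paste all 96 barcodes from the list above into the input string.

\begin{verbatim}
from itertools import combinations


def hamming(a, b):
    """Mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def barcode_qc(text):
    """Summarize a whitespace-separated list of barcodes.

    Returns (number of barcodes, number of unique barcodes,
    barcode length, minimum pairwise Hamming distance).
    """
    codes = text.split()
    unique = sorted(set(codes))
    lengths = {len(c) for c in unique}
    if len(lengths) != 1:
        raise ValueError("barcodes must share one length")
    if len(unique) < 2:
        raise ValueError("need at least two unique barcodes")
    min_dist = min(hamming(a, b)
                   for a, b in combinations(unique, 2))
    return len(codes), len(unique), lengths.pop(), min_dist


# Replace with all 96 barcodes from the list above
example = """
AACGTGAT AAGGTACA CACTTCGA GATAGACA TGGAACAA ATCATTCC
"""
print(barcode_qc(example))
\end{verbatim}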

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{General  \newline  Equipment}}}

\begin{enumerate}
\item Eppendorf Thermomixer C (with 96-well plate adapter)
\item Tabletop centrifuge
\item Swing-bucket centrifuge with temperature control
\item Thermal cycler
\item Cold room
\item qPCR machine (QuantStudio 3)
\item Qubit fluorometer or equivalent
\item E-Gel electrophoresis system (Thermo Fisher Scientific)
\item TapeStation (Agilent) or equivalent, e.g., Bioanalyzer (Agilent)
\item Multichannel pipettes or liquid handling instruments
\item gentleMACS Dissociator (Miltenyi Biotec)
\item Automated cell counter, e.g., Countess 3 (Thermo Fisher Scientific), or equivalent
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{General \newline Reagents}}}

\begin{enumerate}
\item 1$\times$ PBS buffer solution (Thermo Fisher Scientific, Cat \#10010049)
\item Bovine Albumin Fraction V (7.5\% solution) (Thermo Fisher Scientific, Cat \#15260037)
\item Trypan Blue Stain (0.4\%) (Thermo Fisher Scientific, Cat \#T10282)
\item Enzymatics RNase inhibitor (RI) (Qiagen, Cat \#Y9240L)
\item SUPERase RI (Thermo Fisher Scientific, Cat \#AM2696)
\item Lucigen RI (Lucigen, Cat \#30281-2)
\item Protector RI (Sigma Aldrich, Cat \#3335399001)
\item 16\% formaldehyde (FA) (Thermo Fisher Scientific, Cat \#28906)
\item Glycine  (Sigma Aldrich, Cat \#50049)
\item 1 M Tris HCl pH 7.5 (Thermo Fisher Scientific, Cat \#15567027)
\item 1 M Tris HCl pH 8.0 (Thermo Fisher Scientific, Cat \#15568025)
\item 5 M NaCl (Thermo Fisher Scientific, Cat \#AM9760G)
\item 1 M MgCl$_2$  (Sigma Aldrich, Cat \#63069)
\item 1 M CaCl$_2$ (Sigma Aldrich, Cat \#21115-100ML)
\item DMF (Dimethyl Formamide) (Sigma, Cat \#227056)
\item 0.2 M Tris-acetate pH 7.8 (Bioworld, Cat \#40120265-2)
\item 5 M Potassium acetate (Sigma Aldrich, Cat \#95843-100ML-F)
\item 1 M Magnesium acetate (Sigma Aldrich, Cat \#63052-100ML)
\item 10\% NP-40 (Thermo Fisher Scientific, Cat \#28324)
\item Buffer EB (Qiagen, Cat \#19086) 
\item PEG 6000 (Sigma Aldrich, Cat \#528877)
\item Maxima H Minus Reverse Transcriptase with buffer (Thermo Fisher Scientific, Cat \#EP075)
\item 10 mM dNTPs (NEB, Cat \#N0447L)
\item T4 DNA Ligase (NEB, Cat \#M0202L)
\item Additional 10$\times$ T4 Ligase buffer (NEB, Cat \#B0202S)
\item Proteinase K (20 mg/mL) (NEB, Cat \#P8107S)
\item 20\% SDS (VWR, Cat \#97062-440)
\item 100 mM PMSF/IPA (Sigma Aldrich, Cat \# P7626)
\item cOmplete Protease Inhibitor Cocktail (Sigma Aldrich, Cat \# 11697498001)
\item 0.5 M EDTA  (Sigma Aldrich, Cat \#AM9260G)
\item Tween-20 (Sigma Aldrich, Cat \#P9416-100ML)
\item Digitonin (Promega, Cat \#G9441)
\item MyOne C1 Dynabeads (Thermo Fisher Scientific, Cat \#65001)
\item Ficoll PM-400 (20\%) (Sigma Aldrich, Cat \#F5415-25ML)
\item Kapa HiFi 2$\times$ mix (Fisher Scientific, Cat \#NC0295239)
\item SPRIselect beads (Beckman Coulter, Cat \#B23318)
\item 100\% EtOH
\item 100 mM DTT (Thermo Fisher Scientific, Cat \#707265ML)
\item NEBNext 2$\times$ Mix (NEB, Cat \#M0541L)
\item Glycerol (Thermo Fisher Scientific, Cat \#15514011)
\item TD buffer from Nextera kit
\item SYBR Green I Nucleic Acid Gel Stain (Thermo Fisher Scientific, Cat \#S7563)
\item EvaGreen Dye, 20$\times$ in water (Biotium, Cat \#31000)
\item Nuclease-free H$_2$O
\item 96-well plates (Eppendorf, Cat \#0030129300) (preferably low protein and DNA binding; \textit{see} \textbf{Note \ref{Tubes}}) 
\item 1.5-mL microcentrifuge tubes, preferably low protein and DNA binding (\textit{see} \textbf{Note \ref{Tubes}})
\item 2-mL, 15-mL and 50-mL tubes
\item gentleMACS M Tubes (Miltenyi Biotec, Cat \#130-093-236)
\item 30-$\mu$m sterile single-pack CellTrics filters (Sysmex, Cat \#04-004-2326)
\item 200-$\mu$L PCR tubes
\item Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Cat \# Q32851)
\item TapeStation D1000 and D5000 tape and reagents (Agilent) 
\item Tn5 transposase (\textit{see} \textbf{Note \ref{Tn5}})
\item MinElute PCR Purification Kit (Qiagen Cat\# 28004/28006), Zymo DNA Clean and Concentrator Kit (Zymo Cat\#  D4013/D4014), or equivalent % (\textit{see} \textbf{Note \ref{DNAPurification}}) 
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Buffers  \newline and \newline Reagents}}}

Make all buffers using ultrapure molecular biology-grade ddH$_2$O.

\begin{enumerate}

\item 2.5 M Glycine (50 mL) \\
 \hspace*{20pt}9.375 g Glycine (powder) \\
 \hspace*{20pt}1$\times$ PBS up to 50 mL 

Filter through a 0.22-$\mu$m filter. Store at room temperature.

% \item IGEPAL CA-630 detergent (Sigma Cat\# 11332465001; supplied as a 10\% solution)
% \item Tween-20 detergent (Sigma Cat\# 11332465001, supplied as a 10\% solution; store at 4$\,^{\circ}\mathrm{C}$)
% \item Digitonin detergent (Promega Cat\# G9441, supplied as a 2\% solution in DMSO; store at -20$\,^{\circ}\mathrm{C}$))

\item Tissue Dissociation (MACS) buffer \\
\hspace*{20pt}10 mM Tris-HCl pH 8.0 \\
\hspace*{20pt}5 mM CaCl$_2$ \\
\hspace*{20pt}5 mM EDTA \\
\hspace*{20pt}3 mM magnesium acetate \\
\hspace*{20pt}0.6 mM DTT \\
\hspace*{20pt}cOmplete protease inhibitor

Make fresh every time.

\item Nuclei Isolation Buffer (NIB) \\
\hspace*{20pt}10 mM Tris-HCl pH 7.4 \\
\hspace*{20pt}10 mM NaCl \\
\hspace*{20pt}3 mM MgCl$_2$ \\
\hspace*{20pt}0.1\% IGEPAL CA-630

Store at 4$\,^{\circ}\mathrm{C}$.

\item 2$\times$ TD buffer \\
\hspace*{20pt}20 mM Tris-HCl pH 7.6 \\
\hspace*{20pt}10 mM MgCl$_2$ \\
\hspace*{20pt}20\% Dimethyl Formamide

Store at -20$\,^{\circ}\mathrm{C}$.

\item 50\% PEG 6000

Mix equal masses of PEG 6000 and H$_2$O, heat to 65$\,^{\circ}\mathrm{C}$ for 4 minutes, then cool to room temperature.

\item 2$\times$ RCB buffer \\
\hspace*{20pt}100 mM Tris pH 8.0 \\
\hspace*{20pt}100 mM NaCl \\
\hspace*{20pt}0.40\% SDS

Store at room temperature.

\item 2$\times$ BW buffer \\
\hspace*{20pt}10 mM Tris pH 8.0 \\
\hspace*{20pt}2 M NaCl \\
\hspace*{20pt}1 mM EDTA

Store at 4$\,^{\circ}\mathrm{C}$.

\item 1$\times$ B\&W-T Buffer \\
\hspace*{20pt}5 mM Tris pH 8.0 \\
\hspace*{20pt}1 M NaCl \\
\hspace*{20pt}0.5 mM EDTA \\
\hspace*{20pt}0.05\% Tween-20

Store at 4$\,^{\circ}\mathrm{C}$.

\item Oligo resuspension buffer (IDTE) \\
\hspace*{20pt}10 mM Tris pH 8.0 \\
\hspace*{20pt}0.1 mM EDTA

Store at room temperature.

\item Oligo annealing buffer (STE) \\
\hspace*{20pt}10 mM Tris pH 8.0 \\
\hspace*{20pt}50 mM NaCl \\
\hspace*{20pt}1 mM EDTA

Store at room temperature.

\item Dilution Buffer \\
\hspace*{20pt}50\% glycerol \\
\hspace*{20pt}50 mM Tris pH 7.5 \\
\hspace*{20pt}100 mM NaCl \\
\hspace*{20pt}0.1 mM EDTA \\
\hspace*{20pt}0.1\% NP-40

Store at -20$\,^{\circ}\mathrm{C}$.
\end{enumerate}
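
The recipes above rest on two standard calculations: the mass of solid needed, $m = C \times V \times M_w$, and the volume of a stock solution needed, $V_{\mathrm{stock}} = C_{\mathrm{final}} V_{\mathrm{final}} / C_{\mathrm{stock}}$. The Python 3 sketch below illustrates both; the helper names are hypothetical, the molecular weight of glycine is taken as 75.07 g/mol (the 9.375 g in the 2.5 M glycine recipe corresponds to rounding this to 75 g/mol), and the 2$\times$ BW buffer example assumes the 1 M Tris-HCl pH 8.0, 5 M NaCl, and 0.5 M EDTA stocks from the reagent list.

\begin{verbatim}
def grams_needed(molar, volume_ml, mw_g_per_mol):
    """Grams of solid for a solution of given molarity."""
    return molar * (volume_ml / 1000) * mw_g_per_mol


def stock_volume(final_molar, final_ml, stock_molar):
    """Volume of stock (mL) from C1 * V1 = C2 * V2."""
    return final_molar * final_ml / stock_molar


# 2.5 M glycine, 50 mL (glycine MW taken as 75.07 g/mol)
glycine_g = grams_needed(2.5, 50, 75.07)
print(f"glycine: {glycine_g:.2f} g")

# 50 mL of 2x BW buffer; (final M, stock M) per component,
# assuming the stock solutions from the reagent list
recipe = {
    "1 M Tris pH 8.0": (0.010, 1.0),
    "5 M NaCl": (2.0, 5.0),
    "0.5 M EDTA": (0.001, 0.5),
}
total_ml = 50
used_ml = 0.0
for name, (final_m, stock_m) in recipe.items():
    v = stock_volume(final_m, total_ml, stock_m)
    used_ml += v
    print(f"{name}: {v:.2f} mL")
print(f"water to {total_ml} mL: {total_ml - used_ml:.2f} mL")
\end{verbatim}

For 50 mL of 2$\times$ BW buffer, this gives 0.5 mL of 1 M Tris, 20 mL of 5 M NaCl, and 0.1 mL of 0.5 M EDTA, made up to 50 mL with water.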

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Software \newline packages}}}

\begin{enumerate}
\item \verb|Bowtie| \cite{Langmead2009}: \burl{http://bowtie-bio.sourceforge.net/index.shtml}
\item \verb|samtools| \cite{Li2009a}: \burl{http://www.htslib.org/}
\item \verb|PicardTools|: \burl{https://broadinstitute.github.io/picard/}
\item UCSC Genome Browser \cite{Kuhn2013,Kent2010} utilities: \burl{http://hgdownload.cse.ucsc.edu/admin/exe/}
\item \verb|STAR| \cite{STAR}: \burl{https://github.com/alexdobin/STAR}
\item \verb|R|: \burl{https://www.r-project.org/}
\item \verb|Python| (version 2.7 or higher): \burl{https://www.python.org/}
\item \verb|ArchR| \cite{ArchR}: \burl{https://www.archrproject.com/}
\item \verb|Seurat| \cite{Seurat}: \burl{https://satijalab.org/seurat/}
\item Additional scripts: \\
 \burl{https://github.com/georgimarinov/GeorgiScripts}. Contains python scripts used in the examples shown below; some of the scripts depend on having \verb|pysam| (\burl{https://pysam.readthedocs.io/en/latest/index.html}) and \verb|pyBigWig| (\burl{https://github.com/deeptools/pyBigWig}) installed.
\end{enumerate}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig2-libraries-structure.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Structure of final SHARE-seq libraries}. ATAC (top) and RNA (bottom). Dots represent the actual library insert.}
\label{Fig2}
\end{center}
\end{figure*}

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section{Methods}}

The general outline of the SHARE-seq assay is shown in Figure \ref{Fig1}. Two basic ideas underlie SHARE-seq. The first, shared with other pool/split-based assays, is to label the molecules originating from each cell with a combination of barcodes added serially, by repeatedly pooling cells and randomly redistributing them across new sets of barcoded wells, so that, statistically, each cell can be identified by a unique combination of barcodes. The second is the separation of chromatin and transcriptome molecules through the use of a biotinylated reverse transcription (RT) primer, which allows the transcriptome to be pulled down with streptavidin. 

In brief, before a SHARE-seq experiment begins, the required barcode plates and transposomes are prepared and stored. The experiment itself begins with the isolation of nuclei from cultured cells or from tissues (\textit{see} \textbf{Note \ref{Tissues}}). Nuclei are then cross-linked, usually lightly (\textit{see} \textbf{Note \ref{xlinks}}). Transposition is then carried out, followed by reverse transcription using a biotinylated RT primer containing a random unique molecular identifier (UMI). Three rounds of pool/split hybridization and blocking are then carried out, after which the hybridized barcode oligos are ligated to each other and to the transposed chromatin fragments and the reverse-transcribed cDNA, forming single contiguous molecules. Crosslinks are then reversed, and a streptavidin pulldown is used to separate the chromatin from the transcriptome. ATAC libraries are amplified directly from the supernatant. The transcriptome is first amplified on-bead into cDNA, which is then tagmented into sequenceable fragments and PCR-amplified into final libraries. 

The resulting library structures for ATAC and RNA are shown in Figure \ref{Fig2}. ATAC libraries contain the three barcodes, while RNA libraries additionally contain the UMI. Note that in many Illumina sequencing configurations, the first barcode to be read is actually the third one added during the pool/split procedure.
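
Because the round 3 barcode is read first, barcodes must be reordered when cell identifiers are assembled from the sequencing reads. The Python 3 sketch below illustrates this under assumptions that should be verified against the actual sequencing configuration: the index read is assumed to start at the first base of the round 3 barcode; each 8-bp barcode is assumed to be followed by 30 bases of constant sequence before the next one (inferred from the pool-split ligation plate oligos listed in the Materials, where the 15-nt 3$'$ end of one barcode oligo is ligated to the 15-nt 5$'$ end of the next); and barcodes are assumed to appear in the same orientation as in the plate oligos. The function name, constants, and offsets are hypothetical and are not taken from any published SHARE-seq pipeline.

\begin{verbatim}
BC_LEN = 8    # length of each round barcode (bp)
SPACER = 30   # constant bases between barcodes (assumed)


def cell_barcode(index_read, start=0):
    """Return (round 1, round 2, round 3) barcodes.

    Assumes the read carries the round 3 barcode at
    `start`, then round 2, then round 1, each separated
    by SPACER bases, in plate-oligo orientation.
    """
    step = BC_LEN + SPACER
    bcs = [index_read[start + i * step:
                      start + i * step + BC_LEN]
           for i in range(3)]
    if any(len(b) != BC_LEN for b in bcs):
        raise ValueError("index read too short")
    r3, r2, r1 = bcs      # order in which they are read
    return r1, r2, r3     # order in which they were added


read = ("AACGTGAT" + "N" * SPACER +
        "CAGATCTG" + "N" * SPACER + "TGGAACAA")
print(cell_barcode(read))
# ('TGGAACAA', 'CAGATCTG', 'AACGTGAT')
\end{verbatim}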

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Determining the optimal cell number}}}

It is important to carefully track the number of cells entering the SHARE-seq assay and retained at each key step of the procedure. Pool/split assays rely on the statistical uniqueness of the barcode combinations each cell passes through, so having too many cells enter the pool/split procedure leads to an unacceptably high rate of doublets (two or more cells sharing the same barcode combination). At the same time, some reactions have an efficiency-imposed limit on the number of cells they can accommodate, so cells need to be distributed across parallel reactions for optimal results. This applies to the initial transposition and reverse transcription reactions, as well as to the final amplification: the current protocol is optimized for libraries of $\sim$20,000 cells, so after the final pooling, cells are split into separate subpools of that size and processed into individual sublibraries. 

Figure \ref{Fig3} shows the theoretical number of detected cells and doublet rate for different pool/split setups with three rounds, accounting for a certain level of cell loss during repeated handling. Based on these calculations and empirical experience, we usually start the pool/split rounds with $\sim$5$\times 10^5$ cells for a 96 $\times$ 96 $\times$ 96 pool/split experiment.
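
The relationship between input cell number, barcode space, and doublet rate can also be approximated analytically. The Python 3 sketch below is a simplified model and not necessarily the simulation used for Fig. \ref{Fig3}: it assumes that a constant fraction of cells is retained in each round (0.5, matching the 50\% per-round loss in the Fig. \ref{Fig3} caption) and that surviving cells receive barcode combinations uniformly at random, so that the number of cells per combination is Poisson-distributed with mean $\lambda = N/B$, where $N$ is the number of surviving cells and $B$ the number of possible combinations. The fraction of observed (non-empty) combinations that contain two or more cells is then $1 - \lambda e^{-\lambda} / (1 - e^{-\lambda})$. The function name is hypothetical, and the sublibrary count simply divides the surviving cells into subpools of at most 20,000 cells.

\begin{verbatim}
import math


def collision_stats(input_cells, barcodes_per_round=96,
                    n_rounds=3, retention=0.5,
                    sublibrary_size=20_000):
    """Poisson approximation of barcode collisions.

    retention: fraction of cells kept in each round
    (0.5 corresponds to a 50% loss per round).
    Returns (surviving cells, fraction of non-empty
    barcode combinations holding >= 2 cells, number of
    sublibraries of at most sublibrary_size cells).
    """
    combos = barcodes_per_round ** n_rounds
    cells = input_cells * retention ** n_rounds
    if cells <= 0:
        return 0.0, 0.0, 0
    lam = cells / combos            # mean cells per combo
    occupied = 1 - math.exp(-lam)   # P(k >= 1)
    single = lam * math.exp(-lam)   # P(k == 1)
    multiplet = 1 - single / occupied
    sublibraries = math.ceil(cells / sublibrary_size)
    return cells, multiplet, sublibraries


cells, rate, n_sub = collision_stats(500_000)
print(f"{cells:.0f} cells, {rate:.1%} multiplets, "
      f"{n_sub} sublibraries")
\end{verbatim}

For $5 \times 10^5$ input cells in a 96 $\times$ 96 $\times$ 96 experiment, this model leaves 62,500 cells after three rounds, which would be split into 4 sublibraries.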

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig3-doublet-rate-cell-number.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Combinatorial indexing and SHARE-seq's throughput}. Shown is the number of cells that can be detected at a given doublet rate; the pool/split process was simulated as random Poisson loading with a 50\% loss of cells during each pool/split round.}
\label{Fig3}
\end{center}
\end{figure*}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Annealing of oligo plates}}}

In this step, the barcode-containing oligonucleotides for each split-pool round are annealed to their linkers and distributed into 96-well plates ahead of the actual assay (\textit{see} \textbf{Note \ref{plates}}). These plates can be stored at -20$\,^{\circ}\mathrm{C}$ indefinitely. To save time, it is advisable to prepare enough plates in advance to support multiple experiments. It is critical to thaw the plates to room temperature before use.

\begin{enumerate}
\item Dilute the Round 1 linker oligo (120 $\mu$L at 1 mM) with 11,880 $\mu$L of STE buffer.
\item In each well of a 96-well plate, mix 90 $\mu$L of diluted Round 1 linker oligo with 10 $\mu$L of Round 1 barcode oligo (at 100 $\mu$M).
\item Dilute the Round 2 linker oligo (120 $\mu$L at 1 mM) with 9,480 $\mu$L of STE buffer.
\item In each well of a 96-well plate, mix 88 $\mu$L of diluted Round 2 linker oligo with 12 $\mu$L of Round 2 barcode oligo (at 100 $\mu$M).
\item Dilute the Round 3 linker oligo (144 $\mu$L at 1 mM) with 9,360 $\mu$L of STE buffer.
\item In each well of a 96-well plate, mix 86 $\mu$L of diluted Round 3 linker oligo with 14 $\mu$L of Round 3 barcode oligo (at 100 $\mu$M).
\item Anneal the Round 1, Round 2, and Round 3 plates in a thermocycler as follows: \\
\hspace*{20pt}2 minutes at 95$\,^{\circ}\mathrm{C}$ \\
\hspace*{20pt}Slow ramp at -1$\,^{\circ}\mathrm{C}$ per minute to 20$\,^{\circ}\mathrm{C}$ \\
\hspace*{20pt}2 minutes at 20$\,^{\circ}\mathrm{C}$ \\
\hspace*{20pt}Indefinitely at 4$\,^{\circ}\mathrm{C}$
\item Check the wells at the plate corners for significant evaporation; if present, add water to equalize volumes.
\item Aliquot 10 $\mu$L of the annealed oligos into new plates; this should be enough for 9 experiments. Store these plates at -20$\,^{\circ}\mathrm{C}$.
\end{enumerate}
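
The dilutions above can be checked with simple $C_1V_1 = C_2V_2$ arithmetic. The Python 3 sketch below (an illustrative calculation with hypothetical names) computes, for each round, the concentration of the diluted linker, the final linker and barcode oligo concentrations in each 100-$\mu$L well, and whether one batch of diluted linker covers all 96 wells; it takes the linker stock as 1 mM and the barcode oligos as 100 $\mu$M, as stated in the steps.

\begin{verbatim}
# Per-round volumes from the steps above:
# (linker stock uL, STE uL, linker per well uL,
#  barcode oligo per well uL)
ROUNDS = {
    1: (120, 11_880, 90, 10),
    2: (120, 9_480, 88, 12),
    3: (144, 9_360, 86, 14),
}
LINKER_STOCK_UM = 1000   # 1 mM linker stock
BARCODE_STOCK_UM = 100   # 100 uM barcode oligos
WELLS = 96


def plate_concentrations(stock_ul, ste_ul,
                         linker_ul, bc_ul):
    """Return (diluted linker uM, linker uM in well,
    barcode uM in well, enough linker for all wells)."""
    diluted_ul = stock_ul + ste_ul
    diluted_um = LINKER_STOCK_UM * stock_ul / diluted_ul
    well_ul = linker_ul + bc_ul
    linker_um = diluted_um * linker_ul / well_ul
    barcode_um = BARCODE_STOCK_UM * bc_ul / well_ul
    enough = diluted_ul >= WELLS * linker_ul
    return diluted_um, linker_um, barcode_um, enough


for rnd, vols in ROUNDS.items():
    d, lk, bc, ok = plate_concentrations(*vols)
    print(f"Round {rnd}: diluted {d:.2f} uM, "
          f"linker {lk:.2f} uM, barcode {bc:.2f} uM, "
          f"enough for {WELLS} wells: {ok}")
\end{verbatim}

For Round 1, for example, the diluted linker is at 10 $\mu$M, giving 9 $\mu$M linker and 10 $\mu$M barcode oligo in each well.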

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Annealing of adapter oligos}}}

In this step, Tn5 adapters are prepared for both transposition of chromatin and tagmentation during cDNA library preparation. 

\begin{enumerate}
\item Dilute the Phosphorylated Read2, Read1, and Blocked ME Comp oligos to a 100 $\mu$M concentration with the IDTE buffer. 
\item Prepare the transposition adapter mix in a PCR tube as follows: \\ 
\hspace*{20pt}6.5 $\mu$L 100 $\mu$M Phosphorylated Read2 oligo \\
\hspace*{20pt}6.5 $\mu$L 100 $\mu$M Read1 oligo \\
\hspace*{20pt}13 $\mu$L 100 $\mu$M Blocked ME Comp oligo \\
\hspace*{20pt}0.26 $\mu$L 1 M Tris pH 8.0 \\
\hspace*{20pt}0.26 $\mu$L 5 M NaCl

\item Prepare the tagmentation adapter mix in a PCR tube as follows: \\ 
\hspace*{20pt}13 $\mu$L 100 $\mu$M Read1 oligo \\
\hspace*{20pt}13 $\mu$L 100 $\mu$M Blocked ME Comp oligo \\
\hspace*{20pt}0.26 $\mu$L 1 M Tris pH 8.0 \\
\hspace*{20pt}0.26 $\mu$L 5 M NaCl

\item Anneal oligos as follows in a thermocycler:\\
\hspace*{20pt}2 minutes at 85$\,^{\circ}\mathrm{C}$ \\
\hspace*{20pt}Slow ramp at -1$\,^{\circ}\mathrm{C}$ per minute to 20$\,^{\circ}\mathrm{C}$ \\
\hspace*{20pt}2 minutes at 20$\,^{\circ}\mathrm{C}$ \\
\hspace*{20pt}Indefinitely at 4$\,^{\circ}\mathrm{C}$

\item Heat glycerol to 65$\,^{\circ}\mathrm{C}$, then equilibrate to room temperature
\item Mix 25 $\mu$L glycerol with 25 $\mu$L of annealed oligo
\end{enumerate}

The annealed adapters can be immediately used or stored at --20$\,^{\circ}\mathrm{C}$.
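The adapter recipes can be checked the same way. In the minimal Python sketch below, each component is a (stock concentration, volume) pair, and each final concentration is computed as stock $\times$ volume / total volume. The names are illustrative. Both mixes total 26.52 $\mu$L:
\begin{itemize}
\item The transposition mix contains about 24.5 $\mu$M each of Read1 and Read2 and about 49 $\mu$M Blocked ME Comp. The Blocked ME Comp is therefore equimolar to Read1 plus Read2.
\item The tagmentation mix contains about 49 $\mu$M each of Read1 and Blocked ME Comp.
\end{itemize}
Both mixes also contain about 10 mM Tris and 49 mM NaCl. Mixing with an equal volume of glycerol halves every concentration and yields a 50\% glycerol stock.

\begin{verbatim}
def final_conc(components):
    """components: {name: (stock conc, volume uL)}."""
    total = sum(v for _, v in components.values())
    return total, {name: c * v / total
                   for name, (c, v) in components.items()}

# Oligos in uM; Tris and NaCl in mM
transposition = {
    "Phos. Read2 (uM)": (100, 6.5),
    "Read1 (uM)": (100, 6.5),
    "Blocked ME Comp (uM)": (100, 13),
    "Tris pH 8.0 (mM)": (1000, 0.26),
    "NaCl (mM)": (5000, 0.26),
}
tagmentation = {
    "Read1 (uM)": (100, 13),
    "Blocked ME Comp (uM)": (100, 13),
    "Tris pH 8.0 (mM)": (1000, 0.26),
    "NaCl (mM)": (5000, 0.26),
}

for label, mix in [("transposition", transposition),
                   ("tagmentation", tagmentation)]:
    total, conc = final_conc(mix)
    print(f"{label} mix: {total:.2f} uL total")
    for name, c in conc.items():
        # 25 uL annealed oligo + 25 uL glycerol halves it
        print(f"  {name}: {c:.1f} annealed, "
              f"{c / 2:.1f} in 50% glycerol")
\end{verbatim}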

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Transposome assembly}}}

In this step, Tn5 transposomes are assembled together with the annealed adapter oligos.

\begin{enumerate}

\item Assemble Tn5 transposomes by mixing the following components: \\
\hspace*{20pt}0.625$\times N$ $\mu$L 1$\times$ home-made Tn5  \\
\hspace*{20pt}0.625$\times N$ $\mu$L Dilution Buffer  \\
\hspace*{20pt}1.25$\times N$ $\mu$L annealed transposition adapter with glycerol 

Total volume: 2.5$\times N$ $\mu$L, where $N$ is the number of transposition reactions as defined in the ATAC reaction step.

\item Incubate at room temperature for 30 minutes. 

The assembled transposome can be stored at --20$\,^{\circ}\mathrm{C}$ for up to 2 weeks.
\end{enumerate}



\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Tissue dissociation}}}

Here, we describe an example tissue dissociation protocol that has worked well in our hands for several human embryonic tissues. However, users should be aware that each tissue generally requires its own optimization of dissociation conditions. In most situations, this protocol will need to be adapted.

\begin{enumerate}
\item Pre-cool the swing-bucket centrifuge to 4$\,^{\circ}\mathrm{C}$ (Fast Temp) and thaw 1 M DTT.
\item Transfer tissue samples onto dry ice.
\item Prepare MACS buffer (2 mL per sample) as described above and keep it cold on ice.
\item Add 10 $\mu$L Protector RNase Inhibitor to each GentleMACS M-tube (10 $\mu$L per 1 mL of buffer). Then add 1 mL of MACS buffer to each M-tube and chill on ice.
\item Transfer 30--50 mg of tissue into each GentleMACS M-tube containing 1 mL MACS buffer.
\item Allow the tissue to thaw in the buffer, then move to a cold room.
\item Homogenize using a \verb|Protein_01_01| dissociation protocol on a GentleMACS Tissue Dissociator instrument.
\item Filter the homogenate through a 30 $\mu$m CellTrics filter into a 2 mL DNA LoBind tube. Pipette directly onto the top of the filter and tap gently to allow flow.
\item Wash the GentleMACS M-tube with 1 mL MACS buffer and pass the wash through the same 30 $\mu$m CellTrics filter.
\item Spin down the homogenate in a swing-bucket centrifuge at 500 $g$ for 5 minutes at 4$\,^{\circ}\mathrm{C}$ (ramp up and down both at 3/9).
\item Remove and discard the supernatant.
\item Resuspend in 1 mL PBS-2RI.
\item Count cells/nuclei and proceed with desired number of cells/nuclei.
\end{enumerate}



\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Fixation of cells in culture and of dissociated nuclei from tissue}}}

If starting from dissociated tissue, the next step is to fix the nuclei. If starting from cells in culture, fixation is the first step. The procedure is largely the same in both cases. The exception is that nuclei, which have already been washed and counted during dissociation, go directly to the fixation step.

\begin{enumerate}
\item Prepare PBS-2RI Buffer ($\sim$4 mL) by mixing the following: \\
\hspace*{20pt}4 mL 1$\times$ PBS \\
\hspace*{20pt}21.4 $\mu$L 7.5\% BSA \\
\hspace*{20pt}10 $\mu$L Enzymatic RI \\
\hspace*{20pt}5 $\mu$L SUPERase RI

Keep on ice.

\item Prepare NIB-RI Buffer ($\sim$8 mL) by mixing the following: \\
\hspace*{20pt}8 mL NIB \\
\hspace*{20pt}20 $\mu$L Enzymatic RI \\
\hspace*{20pt}20 $\mu$L SUPERase RI

Keep on ice.

\item Spin down cells at 500 $g$
\item Wash cells with 0.5 mL PBS-2RI.
\item Count cells with Trypan blue.
\item Resuspend cells in cold PBS-2RI at a concentration of 1$\times 10^6$ cells/mL.

\item For each 1 mL of cells in PBS-2RI, add one of the following, then mix and incubate at room temperature for 5 minutes:
\begin{itemize}
\item For cells: 66.7 $\mu$L of 1.6\% FA (final concentration 0.1\% FA).
\item For tissues: 66.7 $\mu$L of 3.2\% FA (final concentration 0.2\% FA).
\end{itemize}

\item Quench the reaction by adding the following to each 1 mL of cells in PBS-2RI: \\
\hspace*{20pt}56.1 $\mu$L 2.5 M Glycine \\
\hspace*{20pt}50 $\mu$L 1 M Tris pH 8.0 \\
\hspace*{20pt}13.3 $\mu$L of 7.5\% BSA

Mix well and incubate on ice for 5 minutes.

\item Spin down at 500 $g$. Remove the supernatant and add 0.5 mL PBS-2RI without disturbing the cell pellet.

\item Prepare RSB-RI by mixing the following: \\
\hspace*{20pt}2.5 $\mu$L 1 M Tris-HCl pH 7.5 \\
\hspace*{20pt}0.5 $\mu$L 5 M NaCl \\
\hspace*{20pt}0.75 $\mu$L 1 M MgCl$_2$ \\
\hspace*{20pt}2.5 $\mu$L 10\% Tween-20 \\
\hspace*{20pt}2.5 $\mu$L 10\% NP-40 \\
\hspace*{20pt}2.5 $\mu$L 1\% Digitonin \\
\hspace*{20pt}33.3 $\mu$L 7.5\% BSA \\
\hspace*{20pt}0.25 $\mu$L 1 M DTT \\
\hspace*{20pt}204 $\mu$L Ultrapure water \\
\hspace*{20pt}1.25 $\mu$L Enzymatic RI

\item Spin down again at 500 $g$. Remove the supernatant, resuspend the cells in 100 $\mu$L RSB-RI, and incubate on ice for 3 minutes.

\item Prepare RSB-T by mixing the following: \\
\hspace*{20pt}25 $\mu$L 1 M Tris-HCl pH 7.5 \\
\hspace*{20pt}5 $\mu$L 5 M NaCl \\
\hspace*{20pt}7.5 $\mu$L 1 M MgCl$_2$ \\
\hspace*{20pt}25 $\mu$L 10\% Tween-20 \\
\hspace*{20pt}333.3 $\mu$L 7.5\% BSA \\
\hspace*{20pt}2.5 $\mu$L 1 M DTT \\
\hspace*{20pt}2089.5 $\mu$L Ultrapure water \\
\hspace*{20pt}12.5 $\mu$L Enzymatic RI

\item Add 1 mL of RSB-T to the cells and mix. Spin down at 500 $g$ for 5 minutes. 

\end{enumerate}
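The fixation and lysis recipes can be verified with the same stock $\times$ volume / total volume arithmetic. The Python sketch below uses illustrative names; Enzymatic RI and water are included only as volumes. It confirms two points:
\begin{itemize}
\item Adding 66.7 $\mu$L of 1.6\% or 3.2\% FA per 1 mL of cells gives 0.1\% or 0.2\% FA, respectively.
\item The 250 $\mu$L RSB-RI recipe works out to about 10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl$_2$, 0.1\% Tween-20, 0.1\% NP-40, 0.01\% digitonin, 1\% BSA, and 1 mM DTT.
\end{itemize}

\begin{verbatim}
def final_pct(stock, added_uL, sample_uL):
    """Final concentration after adding stock to sample."""
    return stock * added_uL / (sample_uL + added_uL)

print(f"cells:  {final_pct(1.6, 66.7, 1000):.3f}% FA")
print(f"tissue: {final_pct(3.2, 66.7, 1000):.3f}% FA")

# RSB-RI: name -> (stock conc, volume uL);
# a stock of None means only the volume is tracked
rsb_ri = {
    "Tris-HCl (mM)": (1000, 2.5),
    "NaCl (mM)": (5000, 0.5),
    "MgCl2 (mM)": (1000, 0.75),
    "Tween-20 (%)": (10, 2.5),
    "NP-40 (%)": (10, 2.5),
    "Digitonin (%)": (1, 2.5),
    "BSA (%)": (7.5, 33.3),
    "DTT (mM)": (1000, 0.25),
    "Water": (None, 204),
    "Enzymatic RI": (None, 1.25),
}
total = sum(v for _, v in rsb_ri.values())
conc = {name: c * v / total
        for name, (c, v) in rsb_ri.items() if c is not None}
print(f"RSB-RI total: {total:.2f} uL")
for name, c in conc.items():
    print(f"  {name}: {c:.3g}")
\end{verbatim}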

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{ATAC reaction}}}

In this step, the entire sample is transposed after being split into 50-$\mu$L reactions of 10,000--20,000 cells each in a 96-well plate. The small reaction volume and the limited number of cells per reaction improve the quality of transposition. 

The cell lysis conditions described here are adapted from the omniATAC bulk ATAC protocol \cite{Corces2017} (\textit{see} \textbf{Note \ref{TBbuffer}}).

\begin{enumerate}

\item Prepare PBS-RI by mixing the following: \\
\hspace*{20pt}800 $\mu$L PBS \\
\hspace*{20pt}2 $\mu$L Enzymatic RI

\item After the last centrifugation, remove supernatant and resuspend the cells with PBS-RI to 2$\times 10^6$ cells/mL. 

\item Prepare 2$\times$ TB buffer (sufficient for 96 reactions) by mixing the following: \\
\hspace*{20pt}874.5 $\mu$L 0.2 M Tris-acetate  \\
\hspace*{20pt}70 $\mu$L 5 M Potassium acetate \\ 
\hspace*{20pt}53 $\mu$L 1 M Magnesium acetate  \\
\hspace*{20pt}53 $\mu$L 10\% Tween-20  \\
\hspace*{20pt}53 $\mu$L 1\% Digitonin  \\
\hspace*{20pt}848 $\mu$L 100\% DMF  \\
\hspace*{20pt}698.5 $\mu$L H$_2$O 

\item Prepare 1$\times$ TB buffer according to the number of reactions $N$ to be carried out. $N = 1$ corresponds to 10$^4$ input cells. \\
\hspace*{20pt}25$\times N$ 2$\times$ TB \\
\hspace*{20pt}16.45$\times N$ H$_2$O \\
\hspace*{20pt}0.2$\times N$ PIC  \\
\hspace*{20pt}0.85$\times N$ Enzymatic RI

Total volume: 42.5$\times N$.

\item Aliquot 5$\times N$ $\mu$L of the diluted cells to a new tube. For example, for 1$\times 10^5$ cells, $N = 10$, so aliquot 50 $\mu$L of cells to a new tube.
\item Add 42.5$\times N$ $\mu$L 1$\times$ TB to sample.
% \item Incubate at room temperature for 10 minutes.
\item Add 2.5$\times N$ $\mu$L of assembled Tn5 to sample. Mix well.
\item Aliquot 50 $\mu$L of sample into the wells of a 96- or 384-well plate. 
\item Seal the plate and incubate with shaking at 500 rpm for 30 minutes at 37$\,^{\circ}\mathrm{C}$.
\item Pool the reactions and spin down at 500 $g$.
\item Add 0.5 mL NIB-RI without disturbing the pellet and spin down again at 500 $g$
\item Resuspend the cells in 60 $\mu$L EB.
\end{enumerate}
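Every volume in this step scales with $N$, where $N = 1$ corresponds to $10^4$ cells. The set-up for a given cell number can therefore be computed directly. The Python sketch below does this using three inputs from the protocol: the 2$\times 10^6$ cells/mL concentration after resuspension, the 1$\times$ TB recipe, and the transposome assembly ratios. The function and key names are illustrative. Each unit of $N$ contributes 5 + 42.5 + 2.5 = 50 $\mu$L, so the number of 50-$\mu$L wells equals $N$.

\begin{verbatim}
import math

CELLS_PER_N = 10_000   # N = 1 corresponds to 10^4 cells
CELLS_PER_UL = 2_000   # 2 x 10^6 cells/mL in PBS-RI
WELL_UL = 50           # volume of one transposition well

def atac_setup(n_cells):
    """Volumes (uL) for transposing n_cells cells."""
    n = n_cells / CELLS_PER_N
    cells_ul = n_cells / CELLS_PER_UL      # equals 5 x N
    tb = {"2x TB": 25 * n, "H2O": 16.45 * n,
          "PIC": 0.2 * n, "Enzymatic RI": 0.85 * n}
    tn5 = {"1x home-made Tn5": 0.625 * n,
           "Dilution Buffer": 0.625 * n,
           "adapter + glycerol": 1.25 * n}
    total = cells_ul + sum(tb.values()) + sum(tn5.values())
    # small tolerance guards against float round-off
    wells = math.ceil(total / WELL_UL - 1e-9)
    return {"N": n, "cells_uL": cells_ul, "1x_TB": tb,
            "assembled_Tn5": tn5, "total_uL": total,
            "wells": wells}

for key, value in atac_setup(100_000).items():
    print(key, value)
\end{verbatim}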

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Reverse  \newline transcription}}}

In this step, reverse transcription is performed \textit{in situ}. The conditions are optimized for 1$\times$10$^5$ cells entering each 50-$\mu$L reaction.

\begin{enumerate}
\item Prepare the Reverse Transcription (RT) mix (sufficient for 6 reactions) as follows: \\
\hspace*{20pt}70 $\mu$L 5$\times$ RT buffer  \\
\hspace*{20pt}2.19 $\mu$L Enzymatics RNase Inhibitor  \\
\hspace*{20pt}4.38 $\mu$L SUPERase RI  \\
\hspace*{20pt}17.5 $\mu$L dNTPs  \\
\hspace*{20pt}35 $\mu$L RT Primer  \\
\hspace*{20pt}10.94 $\mu$L H$_2$O  \\
\hspace*{20pt}105 $\mu$L 50\% PEG  \\
\hspace*{20pt}35 $\mu$L Maxima H Minus Reverse Transcriptase (add right before RT reaction)

Total volume: 280 $\mu$L.

\item Add 240 $\mu$L RT mix to 60 $\mu$L cells in EB.
\item Aliquot 50 $\mu$L to 6 PCR wells. 
\item Start thawing the oligo plates while the RT is ongoing.
\item Run the reverse transcription reaction in a thermocycler as follows: \\
\hspace*{20pt}50$\,^{\circ}\mathrm{C}$ for 10 minutes \\
\hspace*{20pt}3 cycles of:\\
\hspace*{20pt}\hspace*{20pt}8$\,^{\circ}\mathrm{C}$ for 12 seconds \\
\hspace*{20pt}\hspace*{20pt}15$\,^{\circ}\mathrm{C}$ for 45 seconds \\
\hspace*{20pt}\hspace*{20pt}20$\,^{\circ}\mathrm{C}$ for 45 seconds \\
\hspace*{20pt}\hspace*{20pt}30$\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}\hspace*{20pt}42$\,^{\circ}\mathrm{C}$ for 2 minutes \\
\hspace*{20pt}\hspace*{20pt}50$\,^{\circ}\mathrm{C}$ for 3 minutes \\
\hspace*{20pt}50$\,^{\circ}\mathrm{C}$ for 5 minutes.
\item Pool samples and mix with 500 $\mu$L NIB-RI.
\item Spin down at 500 $g$.
\item Wash with 1000 $\mu$L NIB.
\item Spin down at 500 $g$.
\item Resuspend with 1,152 $\mu$L NIB-RI.
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Hybridization-ligation and  \newline pool-split}}}

In this step, cells/nuclei are iteratively split into individual wells to create combinatorial indexes that are statistically unique to each cell. All handling is performed at room temperature, so make absolutely sure that the oligo plates have been fully thawed before proceeding.

If different samples are multiplexed in a single run, they can be individually identified based on the first-round barcodes. If such a strategy is deployed, each sample needs to be processed through transposition and reverse transcription separately, then loaded into specified positions in the first-round plate(s).

\begin{enumerate}
\item Prepare 3,456 $\mu$L hybridization buffer as follows: \\
\hspace*{20pt}2761.9 $\mu$L H$_2$O \\
\hspace*{20pt}576 $\mu$L 10$\times$ T4 ligase buffer  \\
\hspace*{20pt}14.4 $\mu$L SUPERase RI 20 U/$\mu$L  \\
\hspace*{20pt}46.08 $\mu$L Enzymatics RI 40 U/$\mu$L  \\
\hspace*{20pt}57.60 $\mu$L 10\% NP40 

\item Mix 1,152 $\mu$L of sample with 3,456 $\mu$L hybridization buffer. Keep the sample at RT
\item Aliquot 40 $\mu$L of mixture to a Round 1 plate. 
\item Mix and shake at 300 rpm for 30 minutes at RT

\item Prepare 1,152 $\mu$L Blocking Oligo 1 mix as follows: \\
\hspace*{20pt}253.4 $\mu$L 100 $\mu$M Round 1 blocking oligo\\
\hspace*{20pt}211.2 $\mu$L 10$\times$ T4 DNA Ligase buffer \\
\hspace*{20pt}687.4 $\mu$L H$_2$O

\item Add 10 $\mu$L Blocking Oligo 1 mix to each well
\item Mix and shake at 300 rpm for 30 minutes at RT

\item Pool samples from all wells
\item Aliquot 50 $\mu$L of mixture to a Round 2 plate. 
\item Mix and shake at 300 rpm for 30 minutes at RT

\item Prepare 1,152 $\mu$L Blocking Oligo 2 mix as follows: \\
\hspace*{20pt}304.1 $\mu$L 100 $\mu$M Round 2 blocking oligo\\
\hspace*{20pt}211.2 $\mu$L 10$\times$ T4 DNA Ligase buffer \\
\hspace*{20pt}636.7 $\mu$L H$_2$O

\item Add 10 $\mu$L Blocking Oligo 2 mix to each well
\item Mix and shake at 300 rpm for 30 minutes at RT

\item Pool samples from all wells
\item Aliquot 60 $\mu$L of mixture to a Round 3 plate. 
\item Mix and shake at 300 rpm for 30 minutes at RT

\item Prepare 1,152 $\mu$L Blocking Oligo 3 mix as follows: \\
\hspace*{20pt}265.0 $\mu$L 100 $\mu$M Round 3 blocking oligo\\
\hspace*{20pt}11.5 $\mu$L 10\% NP-40  \\
\hspace*{20pt}875.5 $\mu$L H$_2$O

\item Add 10 $\mu$L Blocking Oligo 3 mix to each well
\item Mix and shake at 300 rpm for 30 minutes at RT

\item Pool samples from all wells
\item Spin down at 500 $g$ 5 minutes
\item Wash with 1 mL NIB-RI
\item Spin down at 500 $g$ 5 minutes
\item Resuspend in 80 $\mu$L NIB-RI

\item Prepare 320 $\mu$L Ligation mix as follows: \\
\hspace*{20pt}3.2 $\mu$L Enzymatics RI \\
\hspace*{20pt}1.00 $\mu$L SUPERase RI \\
\hspace*{20pt}40 $\mu$L 10$\times$ T4 DNA Ligase Ligation buffer \\
\hspace*{20pt}20 $\mu$L T4 DNA Ligase 400 U/$\mu$L \\
\hspace*{20pt}251.8 $\mu$L H$_2$O \\
\hspace*{20pt}4 $\mu$L 10\% NP40

\item Mix sample with the 320 $\mu$L Ligation mix
\item Aliquot 8$\times$ 50 $\mu$L in PCR tubes
\item Shake at 300 rpm for 30 minutes at RT

\item Pool samples from all tubes
\item Spin down at 500 $g$ 5 minutes
\item Wash with 1 mL NIB-RI
\item Spin down at 500 $g$ 5 minutes
\item Resuspend in 400 $\mu$L NIB-RI
\item Count the number of nuclei.
\end{enumerate}

Note: If fewer cells are preferred per sub-library, count cells to desired concentration and add more NIB to make the volume up to 50 $\mu$L per sub-library.
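A simple birthday-problem estimate shows what ``statistically unique'' means in practice. As an illustration only, assume 96 barcodes in each of the three rounds; this is an assumption, not a statement of this protocol's plate layout. That gives $96^3 = 884{,}736$ possible combinations. If each cell lands in a uniformly random well in every round, the expected fraction of $n$ cells that share their combination with at least one other cell is $1 - (1 - 1/96^3)^{n-1}$. This is about 1.1\% for 10,000 cells. Because the library index distinguishes sub-libraries, this estimate also illustrates why splitting barcoded nuclei into sub-libraries helps keep collisions low. The Python sketch below evaluates the formula and checks it with a seeded simulation.

\begin{verbatim}
import random

def expected_collision_fraction(n_cells, per_round=96,
                                rounds=3):
    """Fraction of cells whose combination is shared by
    >= 1 other cell, if wells are chosen uniformly."""
    combos = per_round ** rounds
    return 1.0 - (1.0 - 1.0 / combos) ** (n_cells - 1)

def simulated_collision_fraction(n_cells, per_round=96,
                                 rounds=3, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_cells):
        combo = tuple(rng.randrange(per_round)
                      for _ in range(rounds))
        counts[combo] = counts.get(combo, 0) + 1
    shared = sum(c for c in counts.values() if c > 1)
    return shared / n_cells

for n in (1_000, 10_000, 50_000):
    e_frac = expected_collision_fraction(n)
    s_frac = simulated_collision_fraction(n)
    print(f"{n:>6} cells: expected {e_frac:.2%}, "
          f"simulated {s_frac:.2%}")
\end{verbatim}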

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Reverse  \newline crosslinking}}}

In this step, crosslinks are reversed to release DNA from bound proteins so that the ATAC libraries can be amplified. Because the fixation is relatively gentle (0.1\% or 0.2\% FA), a mild reverse-crosslinking condition of 1 hour at 55$\,^{\circ}\mathrm{C}$ is generally sufficient. 

Further reverse crosslinking optimization might be needed if the crosslinking protocol has been modified.

\begin{enumerate}
\item For each of the $N$ 50-$\mu$L sub-libraries, add the following: \\
\hspace*{20pt}50 $\mu$L 2$\times$ RCB \\
\hspace*{20pt}2 $\mu$L Proteinase K \\
\hspace*{20pt}1 $\mu$L SUPERase RI 
\item Incubate at 55$\,^{\circ}\mathrm{C}$ for 1 hour.
\item Add 5 $\mu$L 100 mM PMSF/IPA 
\item Incubate at room temperature for 10 minutes
\end{enumerate}

Note: this is an optional stopping point. The reverse crosslinked product can be stored at --80$\,^{\circ}\mathrm{C}$ for a few days.

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Pull down}}}

In this step, the cDNA is separated from the transposition products by pulling down the biotin carried by the reverse transcription primer. The supernatant contains the transposition products and is processed separately from the cDNA.

\begin{enumerate}
\item Prepare 1$\times$ B\&W-T/RI buffer by mixing the following: \\
\hspace*{20pt}400$\times (N+1)$ $\mu$L 1$\times$ B\&W-T buffer \\
\hspace*{20pt}4$\times (N+1)$ $\mu$L SUPERase RI

\item Prepare 1$\times$ B\&W/RI buffer by mixing the following: \\
\hspace*{20pt}100$\times (N+1)$ $\mu$L 1$\times$ B\&W buffer \\
\hspace*{20pt}2$\times (N+1)$ $\mu$L SUPERase RI

\item Prepare 1$\times$ STE/RI buffer by mixing the following: \\
\hspace*{20pt}200$\times (N+1)$ $\mu$L 1$\times$ STE buffer \\
\hspace*{20pt}$N+1$ $\mu$L SUPERase RI

\item In a fresh tube, mix 10$\times N$ $\mu$L MyOne C1 Dynabeads with 100$\times N$ $\mu$L 1$\times$ B\&W-T 
\item Separate on a magnetic rack and remove supernatant
\item Wash twice with 100$\times N$ $\mu$L B\&W-T without RI.
\item Wash once with 100$\times N$ $\mu$L B\&W-T/RI
\item Resuspend beads in 100$\times N$ $\mu$L 2$\times$ B\&W/RI.
\item Add 100 $\mu$L beads to each sample
\item Incubate at room temperature on a rotator for 60 minutes
\item Place the tube on a magnetic rack
\item Transfer the supernatant (which contains chromatin fragments) to a new tube for ATAC library preparation. The ATAC fragments are stable for a few hours at room temperature and can be processed concurrently or after cDNA library construction is complete.
\item Wash cDNA/RNA-bound beads three times with 100 $\mu$L 1$\times$ B\&W-T/RI.
\item Wash with 100 $\mu$L 1$\times$ STE/RI without resuspending beads. 
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{ATAC library preparation}}}

In this step, ATAC fragments are purified and amplified into a final library ready for sequencing.

\begin{enumerate}
\item Clean up the ATAC part of the sample using Zymo DNA Clean \& Concentrate. Elute in 11 $\mu$L EB buffer, then elute again with additional 11 $\mu$L EB buffer (a total of 22 $\mu$L EB buffer).
\item Prepare ATAC PCR Master Mix by mixing the following: \\
\hspace*{20pt}225 $\mu$L 2$\times$ NEBnext Master Mix \\
\hspace*{20pt}9 $\mu$L P7 primer 25 $\mu$M \\
\hspace*{20pt}27 $\mu$L H$_2$O

\item Mix the following: \\
\hspace*{20pt}$\sim$20 $\mu$L sample \\
\hspace*{20pt}29 $\mu$L ATAC PCR Master Mix \\
\hspace*{20pt}1 $\mu$L of 25 $\mu$M Adapter 1 Primer (from the PCR Library indexing primers plate)

\item Run PCR for 5 cycles as follows: \\
\hspace*{20pt}72$\,^{\circ}\mathrm{C}$ for 5 minutes \\
\hspace*{20pt}98$\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}5 cycles of:  \\
\hspace*{20pt}\hspace*{20pt}98$\,^{\circ}\mathrm{C}$ for 10 seconds  \\
\hspace*{20pt}\hspace*{20pt}65$\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}\hspace*{20pt}72$\,^{\circ}\mathrm{C}$ for 30 seconds

\item Determine the number of additional cycles using qPCR. Add 5 $\mu$L of the pre-amplified reaction to 10 $\mu$L of qPCR master mix, for a total qPCR reaction volume of 15 $\mu$L. Prepare the qPCR master mix as follows: \\
\hspace*{20pt}5 $\mu$L NEBnext Master Mix \\
\hspace*{20pt}0.2 $\mu$L 25 $\mu$M Adapter 1.1 \\
\hspace*{20pt}0.2 $\mu$L 25 $\mu$M P7 \\
\hspace*{20pt}0.9 $\mu$L 10$\times$ SYBR Green \\
\hspace*{20pt}3.7 $\mu$L H$_2$O

\item Assess the amplification profiles and determine the required number of additional cycles. Please refer to Figure 2 in Buenrostro et al. (2015).


\item Carry out final amplification by placing the remaining 45 $\mu$L in a thermocycler and running the following program:\\
\hspace*{20pt}$N_{add}$ cycles of: \\
\hspace*{20pt}\hspace*{20pt}98$\,^{\circ}\mathrm{C}$ for 10 seconds  \\
\hspace*{20pt}\hspace*{20pt}65$\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}\hspace*{20pt}72$\,^{\circ}\mathrm{C}$ for 30 seconds

Where $N_{add}$ is the number of additional cycles.

\item Clean up the final library using Zymo DNA Clean \& Concentrate, eluting in 15 $\mu$L.
\end{enumerate}
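A common way to read the qPCR curve is to take the cycle at which fluorescence reaches a fixed fraction of its maximum. One-third is the threshold usually quoted for the Buenrostro et al. (2015) procedure, but check the exact rule against the cited figure before relying on it. The Python sketch below implements this rule on a list of per-cycle fluorescence readings. The function name, the crude baseline subtraction (the minimum reading), and the example readings are all hypothetical and do not come from any instrument export.

\begin{verbatim}
def additional_cycles(fluorescence, fraction=1 / 3):
    """First 1-based cycle at which baseline-subtracted
    fluorescence reaches `fraction` of its maximum.
    fluorescence[i] is the reading after cycle i + 1."""
    baseline = min(fluorescence)
    signal = [f - baseline for f in fluorescence]
    peak = max(signal)
    if peak <= 0:
        raise ValueError("no amplification detected")
    threshold = fraction * peak
    for cycle, value in enumerate(signal, start=1):
        if value >= threshold:
            return cycle
    # only reachable if fraction > 1
    raise ValueError("threshold not reached")

# Hypothetical readings for one sample (arbitrary units)
curve = [0.05, 0.06, 0.09, 0.16, 0.30, 0.55, 0.90,
         1.25, 1.50, 1.62, 1.68, 1.70, 1.70]
print("N_add =", additional_cycles(curve))
\end{verbatim}

The threshold fraction is a parameter so that the rule can be matched to whatever convention a lab has validated.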

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{RNA library  \newline preparation step 1.  \newline Template  \newline switching}}}

In this step, RNA library generation is initiated by carrying out template switching on the pulled down cDNA.

\begin{enumerate}
\item Prepare the Template switch mix by mixing the following: \\
\hspace*{20pt}11.25 $\mu$L H$_2$O \\
\hspace*{20pt}125 $\mu$L 50\% PEG 6000 \\
\hspace*{20pt}90 $\mu$L 5$\times$ Maxima RT buffer \\
\hspace*{20pt}90 $\mu$L Ficoll PM-400 (20\%) \\
\hspace*{20pt}45 $\mu$L 10 mM dNTPs \\
\hspace*{20pt}45 $\mu$L RNase inhibitor (Lucigen) \\
\hspace*{20pt}11.25 $\mu$L 100 $\mu$M TSO oligo \\
\hspace*{20pt}22.5 $\mu$L Maxima RT RNase H Minus (add last right before reaction)

\item Remove all supernatant. Be careful to avoid drying the beads
\item Resuspend beads in 50 $\mu$L Template switch mix
\item Incubate samples for 30 minutes at room temperature with rotation
\item Incubate samples for 90 minutes at 42$\,^{\circ}\mathrm{C}$ at 300 rpm. Resuspend every 30 minutes by pipetting up and down.

\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{RNA library preparation step 2. Amplification of cDNA}}}

The next step is to amplify the individual cDNA molecules.

\begin{enumerate}
\item Prepare cDNA PCR Mix by mixing the following: \\
\hspace*{20pt}247.5 $\mu$L Kapa Hifi 2$\times$ mix \\
\hspace*{20pt}7.92 $\mu$L 25 $\mu$M RNA PCR primer  \\
\hspace*{20pt}7.92 $\mu$L 25 $\mu$M P7 primer  \\
\hspace*{20pt}231.7 $\mu$L H$_2$O
\item Mix samples with 100 $\mu$L H$_2$O. 
\item Separate beads on magnet. Wash with 200 $\mu$L STE without resuspending the beads
\item Mix beads with 55 $\mu$L cDNA PCR Mix and transfer to PCR tubes/plates
\item Run PCR as follows: \\
\hspace*{20pt}95$\,^{\circ}\mathrm{C}$ for 3 minutes \\
\hspace*{20pt}5 cycles of:  \\
\hspace*{20pt}\hspace*{20pt}98$\,^{\circ}\mathrm{C}$ for 20 seconds  \\
\hspace*{20pt}\hspace*{20pt}65$\,^{\circ}\mathrm{C}$ for 45 seconds \\
\hspace*{20pt}\hspace*{20pt}72$\,^{\circ}\mathrm{C}$ for 3 minutes

\item Determine the number of additional cycles using qPCR. Add 2.5 $\mu$L of the pre-amplified reaction to 7.5 $\mu$L of qPCR master mix, for a total qPCR reaction volume of 10 $\mu$L. Prepare the qPCR master mix as follows: \\
\hspace*{20pt}3.75 $\mu$L Kapa Hifi 2$\times$ mix \\
\hspace*{20pt}0.12 $\mu$L 25 $\mu$M RNA PCR primer  \\
\hspace*{20pt}0.12 $\mu$L 25 $\mu$M P7 primer  \\
\hspace*{20pt}0.5 $\mu$L 20$\times$ EVAgreen  \\
\hspace*{20pt}3.01 $\mu$L H$_2$O

\item Determine the number of additional cycles ($N_{add}$) as described above for ATAC libraries. Then amplify the remaining reaction as follows:

\hspace*{20pt}$N_{add}$ cycles of:  \\
\hspace*{20pt}\hspace*{20pt}98$\,^{\circ}\mathrm{C}$ for 20 seconds  \\
\hspace*{20pt}\hspace*{20pt}65$\,^{\circ}\mathrm{C}$ for 45 seconds \\
\hspace*{20pt}\hspace*{20pt}72$\,^{\circ}\mathrm{C}$ for 3 minutes

\item Purify using SPRI beads. Mix the reaction with 0.8$\times$ volume of SPRI beads and incubate at room temperature for 10 minutes. Separate the beads on a magnet and wash twice with 200 $\mu$L freshly prepared 70\% EtOH. Make sure to remove all liquid, and elute in 20 $\mu$L.

\item Optional: check size of the cDNA using the D5000 TapeStation.
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{RNA library preparation step 3. Tagmentation}}}

The next step is to tagment the amplified cDNA, which will prepare it for the final library amplification step.

\begin{enumerate}
\item Quantify cDNA concentration using Qubit.
\item Dilute cDNA to a concentration of 5 ng/$\mu$L for tagmentation

Note: expect more than 50 ng of cDNA. If the cDNA yield is low, as little as 20 ng of cDNA can be tagmented; in this case, adjust the volumes of H$_2$O and cDNA accordingly.

\item Prepare tagmentation transposome by mixing the following: \\
\hspace*{20pt}11.25 $\mu$L 1$\times$ Tn5 \\
\hspace*{20pt}11.25 $\mu$L Dilution Buffer \\
\hspace*{20pt}22.5 $\mu$L annealed tagmentation adapter with glycerol

\item Mix the following: \\
\hspace*{20pt}10 $\mu$L 5 ng/$\mu$L cDNA \\
\hspace*{20pt}10 $\mu$L H$_2$O \\
\hspace*{20pt}25 $\mu$L 2$\times$ TD buffer \\
\hspace*{20pt}5 $\mu$L assembled Tn5

\item Incubate for 5 minutes at 55$\,^{\circ}\mathrm{C}$.

\item Purify tagmented library using the Zymo kit (use 250 $\mu$L binding buffer). Elute twice with 11 $\mu$L EB (a total of 22 $\mu$L)
\end{enumerate}
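In the standard reaction, 10 $\mu$L cDNA and 10 $\mu$L H$_2$O fill 20 $\mu$L, and 25 $\mu$L 2$\times$ TD buffer and 5 $\mu$L assembled Tn5 are then added. The Python sketch below computes how much cDNA and water to combine so that the chosen input fills those 20 $\mu$L. It is an illustration, and the function name is hypothetical. The default input is 50 ng, and the input may not fall below the 20 ng minimum given in the note above. For reference, the 45 $\mu$L of tagmentation transposome prepared above is enough for nine 5-$\mu$L reactions.

\begin{verbatim}
def tagmentation_volumes(cdna_ng_per_ul, input_ng=50.0,
                         min_ng=20.0, slot_ul=20.0):
    """Return (cDNA uL, H2O uL) filling the 20 uL slot
    (10 uL cDNA + 10 uL H2O in the standard 50 uL mix)."""
    if input_ng < min_ng:
        raise ValueError(f"input below {min_ng} ng")
    cdna_ul = input_ng / cdna_ng_per_ul
    if cdna_ul > slot_ul:
        raise ValueError("cDNA too dilute for this input")
    return cdna_ul, slot_ul - cdna_ul

# cDNA already diluted to 5 ng/uL, 50 ng input
print(tagmentation_volumes(5.0))
# low-yield sample: 20 ng from a 1.6 ng/uL stock
print(tagmentation_volumes(1.6, input_ng=20))
\end{verbatim}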

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{RNA library preparation step 4. Final Amplification}}}

Final libraries are generated by PCR.

\begin{enumerate}
\item Prepare Post-tagmentation PCR mix by mixing the following: \\
\hspace*{20pt}20 $\mu$L sample \\
\hspace*{20pt}25 $\mu$L 2$\times$ NEB Next Master Mix \\
\hspace*{20pt}1 $\mu$L 25 $\mu$M P7 primer \\
\hspace*{20pt}1 $\mu$L 25 $\mu$M Adapter 1 Primer (from the PCR Library indexing primers plate)\\
\hspace*{20pt}3 $\mu$L H$_2$O

\item Run PCR as follows: \\
\hspace*{20pt}72$\,^{\circ}\mathrm{C}$ for 5 minutes \\
\hspace*{20pt}9 cycles of:  \\
\hspace*{20pt}\hspace*{20pt}98$\,^{\circ}\mathrm{C}$ for 10 seconds  \\
\hspace*{20pt}\hspace*{20pt}65$\,^{\circ}\mathrm{C}$ for 30 seconds \\
\hspace*{20pt}\hspace*{20pt}72$\,^{\circ}\mathrm{C}$ for 60 seconds
\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Library  \newline quantification  \newline and evaluation  \newline of library quality}}}

Before sequencing, libraries need to be properly quantified and their quality evaluated. This involves, first, evaluating the insert size distribution and, second, quantifying the library. 

\begin{enumerate} 
\item Examination of library size distribution. This step can be carried out using several different instruments, such as a TapeStation or a BioAnalyzer. We prefer to use a TapeStation (with the D1000 or HS D1000 kits) due to flexibility, ease of use, and rapid turnaround time.

\item Quantification of library concentration. For most high-throughput sequencing applications, this step is routinely carried out using a Qubit fluorometer. This works well for libraries with a unimodal fragment length distribution. ATAC libraries, however, typically exhibit a multimodal fragment length distribution and often contain fragments longer than can be sequenced on standard Illumina instruments. As a result, the effective library concentration often differs from the apparent concentration measured by Qubit, and qPCR is the optimal way to estimate it.

\item Estimation of effective library concentration using qPCR. Standard Illumina library quantification kits can be used to quantify the concentration of sequenceable library molecules. Products from NEB or KAPA are appropriate for this purpose. 

\end{enumerate}
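Two conversions are commonly used at this stage:
\begin{itemize}
\item For mass-based measurements such as Qubit, molarity is estimated as $c_{\mathrm{nM}} = c_{\mathrm{ng}/\mu\mathrm{L}} \times 10^6 / (660 \times L)$. Here $L$ is the mean fragment length in bp, and 660 g/mol is the approximate mass of one base pair.
\item For qPCR, the sample concentration is read off a standard curve of Cq against $\log_{10}$ concentration and multiplied by the dilution factor. As many kits recommend, it is then scaled by the ratio of the standard's fragment length to the library's mean fragment length.
\end{itemize}
The Python sketch below illustrates both conversions. All standard concentrations, Cq values, and fragment lengths in it are hypothetical placeholders, and the kit manufacturer's instructions take precedence. For multimodal ATAC libraries, the mean fragment length is itself only an approximation.

\begin{verbatim}
import math

def ng_per_ul_to_nM(conc_ng_ul, mean_bp):
    """Mass concentration -> molarity (660 g/mol per bp)."""
    return conc_ng_ul * 1e6 / (660 * mean_bp)

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def qpcr_library_nM(std_pM, std_cq, sample_cq, dilution,
                    std_bp, library_bp):
    """Size-adjusted library concentration (nM)."""
    slope, intercept = fit_line(
        [math.log10(c) for c in std_pM], std_cq)
    efficiency = 10 ** (-1 / slope) - 1
    pM = 10 ** ((sample_cq - intercept) / slope)
    nM = pM * dilution * std_bp / library_bp / 1000
    return nM, efficiency

# Hypothetical standards (pM) and their Cq values
standards = [20, 2, 0.2, 0.02, 0.002, 0.0002]
cqs = [9.68, 13.00, 16.32, 19.64, 22.97, 26.29]
nM, eff = qpcr_library_nM(standards, cqs, sample_cq=15.0,
                          dilution=10_000, std_bp=452,
                          library_bp=500)
print(f"qPCR estimate: {nM:.2f} nM "
      f"(efficiency {eff:.0%})")
print(f"Qubit-based: {ng_per_ul_to_nM(2.0, 500):.2f} nM")
\end{verbatim}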

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Sequencing}}}

The protocol described here generates libraries designed to be sequenced on Illumina sequencers, the most widely available of which is the NextSeq. On a NextSeq, SHARE-seq libraries are to be run as follows using a 150-cycle kit:

For the RNA libraries, use a 50 bp $\times$ 10 bp $\times$ 99 bp $\times$ 8 bp configuration (Read 1 $\times$ Read 2 $\times$ Index1 $\times$ Index2, respectively). 

For the ATAC libraries, use a 30 bp $\times$ 30 bp $\times$ 99 bp $\times$ 8 bp configuration (Read 1 $\times$ Read 2 $\times$ Index1 $\times$ Index2, respectively). 

For RNA, the 10 bp of Read 2 capture the UMI, and the 50 bp of Read 1 capture the RNA sequence itself. 

For ATAC, fragments are sequenced in a 2$\times$30 bp format.

The 8 bp of Index 2 capture the library barcode (if more than one library is sequenced in a single run). The 99 bp of Index 1 capture the pool-split barcodes.

For other Illumina instruments, different configurations can be used. For example, using a 200-cycle kit on NovaSeq, run ATAC libraries in 55 bp $\times$ 55 bp $\times$ 99 bp $\times$ 8 bp configuration and RNA libraries in a 100 bp $\times$ 10 bp $\times$ 99 bp $\times$ 8 bp configuration.

An important consideration before sequencing is that the standard Illumina run recipes do not allow the 99-bp index read that SHARE-seq libraries require. Custom recipes are therefore needed, in which the limits on index read length are increased accordingly. However, the method for creating these custom recipes depends on the Illumina instrument and on the version of its control software. Resolving this issue may occasionally require help from Illumina's customer support service.

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\centerline{}

\reversemarginpar\marginpar{\section{Computational processing}}

At present, there is no standard tool for analyzing pool-split-based multiomics datasets. The pipeline presented here is the one we use in our practice. Its objective is to take the raw SHARE-seq reads and produce objects (e.g., sparse matrices and BAM files) that can be used for further analysis with established scRNA-seq/scATAC-seq tools such as \verb|Seurat| and \verb|ArchR|. The outline of the processing is shown in Figure \ref{Fig4}. For both ATAC and RNA, reads are first assigned their cellular barcodes. RNA reads are additionally annotated with the sequenced UMIs. RNA reads are then aligned against the genome, gene expression is quantified for each gene in each cell, and a final sparse matrix is created. ATAC reads are mapped against the genome, then filtered and deduplicated within each cell. A final BAM file is then created, with cellular barcodes appended to each alignment. 

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig4-Computational-processing-outline.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Outline of the SHARE-seq computational processing procedures}. As a first step, cell barcodes are annotated for all reads in both the ATAC and RNA FASTQ files. Subsequently, UMIs are extracted and assigned to reads in the RNA set. RNA reads are then aligned against the genome, and gene expression is quantified in single cells, resulting in a final data matrix that can be analyzed in Seurat (or other scRNA-seq tools). ATAC reads are aligned against the genome, filtered (removing reads mapping to the mitochondrial genome), and deduplicated within each barcode. Alignments are then annotated with their cell barcodes and can be used as input for further analysis in ArchR. Further joint analysis of the ATAC and RNA data can be carried out downstream. }
\label{Fig4}
\end{center}
\end{figure*}


\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{RNA}}}

\begin{enumerate}
\item As a first step in the RNA processing, annotate the barcodes of each read pair using the \verb|SHARE-seq-barcode-annotate.py| script. 

\begin{verbatim}
python SHARE-seq-barcode-annotate.py 
            BC1file fieldID pos1 lenBC1 BC2file 
            fieldID2 pos2 lenBC2 BC3file fieldID3 
            pos3 lenBC3 [-BCedit N] [-revcompBC]
\end{verbatim}

The script is flexible and can be used to assign barcodes to almost any kind of pool-split experiment in which the indexes are in Index Read 1. It takes as input files containing the barcodes for each round of pool-split and the column positions of the barcode sequences in each file (0-based), their position in Index Read 1 (0-based), their length, and their orientation (use the \verb|[-revcompBC]| option if the sequences are reverse complement, depending on the exact format of the sequencing). Use the \verb|[-BCedit]| option to increase/decrease the stringency of matching barcode sequences to the master list (the default value is 1). In this case the barcode files are in the following format:

\begin{verbatim}
#WellPosition   Name    Sequence
A1      Round1_01       AACGTGAT
B1      Round1_02       AAACATCG
C1      Round1_03       ATGCCTAA
[...]
\end{verbatim}

And barcodes are assigned in a single step as follows:

\begin{verbatim}
python PEFastqToTabDelimited.py RNA.end1.fastq.gz \
    RNA.end2.fastq.gz \
  | python SHARE-seq-barcode-annotate.py \
    Plate_R1.tsv 2 15 8 Plate_R2.tsv 2 53 8 \
    Plate_R3.tsv 2 91 8 -revcompBC \
  | python PEFastqToTabDelimited-reverse.py - \
    RNA.barcodes_annotated
\end{verbatim}

This will produce FASTQ files with headers looking as follows:

\begin{verbatim}
@[readID]:::[GTTAGCCT+TAGTCTTG+TACCGAGC] 1:N:0:
TGGGGNCACAGAGCCAAACCATATCAGCTG
+
AAAAA#EEEEEAEEEEEEEEEEEEEEEEEE
\end{verbatim}

In these files, the barcode combinations have been appended to the read headers. Any barcode for which no match was found (due to sequencing errors or other issues) is reported as \verb|nan|, e.g.:

\begin{verbatim}
@[readID]:::[GACGGATT+GATAGAGG+nan] 1:N:0:
ACCAANCTGTGCACAAGCGTGAATCAACCT
+
6AAAA#E/EEEEEEEEEAEEEEEEEEEEEE
\end{verbatim}

Note that it is considerably faster to split the FASTQ files into smaller pieces and process them in parallel.
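The Python example below is a simplified sketch of what the barcode-matching step does. It is not the code of \verb|SHARE-seq-barcode-annotate.py|, and the following details are assumptions:
\begin{itemize}
\item It extracts the three 8-bp windows at the 0-based positions 15, 53, and 91 of the 99-bp Index Read 1, as in the command above.
\item It compares each window with the reverse complement of every plate barcode, mirroring \verb|-revcompBC|.
\item It accepts a unique best hit within one mismatch, mirroring the default \verb|-BCedit| value of 1. Here the edit distance is taken to count substitutions only.
\item It reports unmatched or ambiguous windows as \verb|nan|, and it reports the plate-file sequence rather than the read sequence.
\end{itemize}

\begin{verbatim}
COMP = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    return seq.translate(COMP)[::-1]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def call_barcode(window, plate_seqs, max_mm=1, rc=True):
    """Unique best plate barcode within max_mm, else nan."""
    targets = {(revcomp(s) if rc else s): s
               for s in plate_seqs}
    scored = [(hamming(window, t), s)
              for t, s in targets.items()]
    best = min(d for d, _ in scored)
    hits = [s for d, s in scored if d == best]
    if best > max_mm or len(hits) != 1:
        return "nan"
    return hits[0]

def annotate_index(index1, plates, positions=(15, 53, 91),
                   length=8):
    calls = []
    for seqs, pos in zip(plates, positions):
        window = index1[pos:pos + length]
        if len(window) < length:
            calls.append("nan")   # read too short
        else:
            calls.append(call_barcode(window, seqs))
    return "+".join(calls)

# Example list from above, reused for all three rounds
round1 = ["AACGTGAT", "AAACATCG", "ATGCCTAA"]
# Hypothetical 99-bp read: filler bases plus the reverse
# complement of one barcode per round
read = ("A" * 15 + revcomp(round1[0]) + "C" * 30
        + revcomp(round1[1]) + "G" * 30
        + revcomp(round1[2]))
print(annotate_index(read, [round1] * 3))
# AACGTGAT+AAACATCG+ATGCCTAA
\end{verbatim}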

\item Compress the output files:

\begin{verbatim}
gzip RNA.barcodes_annotated.end1.fastq
gzip RNA.barcodes_annotated.end2.fastq
\end{verbatim}

\item Annotate UMIs using the \verb|SHARE-seq-RNA-UMI-Add.py| script. This script is also flexible and can extract UMIs of different lengths from either read of the pair:

\begin{verbatim}
python SHARE-seq-RNA-UMI-Add.py UMIlen read1|read2
\end{verbatim}

As follows:

\begin{verbatim}
python PEFastqToTabDelimited.py \
    RNA.barcodes_annotated.end1.fastq.gz \
    RNA.barcodes_annotated.end2.fastq.gz \
  | python SHARE-seq-RNA-UMI-Add.py 10 read2 \
  | python PEFastqToTabDelimited-reverse.py - \
    RNA.barcodes_annotated.RNA_UMI
\end{verbatim}

This step will append the UMI sequence to the cell barcodes in the read ID:

\begin{verbatim}
@[readID]:::[TGACCACT+GGTCGTGT+TGCTGATA+TTTATGATAG]
CCTCTNGCTCAGCCTATATACCGCCATCTTCAGCAAACCCTGATGAAGGC
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEE
\end{verbatim}
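
The annotated headers can later be parsed to recover the cell barcode and the UMI. A minimal Python sketch, assuming the layout shown above (three cell barcodes followed by the UMI, joined by \verb|+| within \verb|:::[...]|); the function name is hypothetical:

\begin{verbatim}
import re

# Matches "@<readID>:::[<field>+<field>+...]", optionally followed by text.
HEADER_RE = re.compile(r"^@(?P<read_id>.+?):::\[(?P<tags>[^\]]+)\]")


def parse_header(header, n_barcodes=3):
    """Split an annotated FASTQ header into (read_id, cell_barcode, umi).

    Assumes the bracketed field holds n_barcodes cell barcodes followed
    by the UMI; umi is None if no UMI has been appended yet.
    """
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError("unrecognized header: " + header)
    fields = m.group("tags").split("+")
    cell_barcode = "+".join(fields[:n_barcodes])
    umi = fields[n_barcodes] if len(fields) > n_barcodes else None
    return m.group("read_id"), cell_barcode, umi
\end{verbatim}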

\item Compress the output files:

\begin{verbatim}
gzip RNA.barcodes_annotated.RNA_UMI.end1.fastq
gzip RNA.barcodes_annotated.RNA_UMI.end2.fastq
\end{verbatim}

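\item If the input was split into chunks for parallel processing, merge the individual compressed files (concatenated \verb|gzip| files form a valid \verb|gzip| file):

\begin{verbatim}
cat RNA_*.barcodes_annotated.RNA_UMI.end1.fastq.gz > 
     RNA.barcodes_annotated.RNA_UMI.end1.fastq.gz
cat RNA_*.barcodes_annotated.RNA_UMI.end2.fastq.gz > 
     RNA.barcodes_annotated.RNA_UMI.end2.fastq.gz
\end{verbatim}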

\item Align the Read 1 FASTQ file against the genome using STAR as follows (the commands given here use the standard ENCODE Project Consortium\cite{ENCODE2012} STAR settings):

\begin{verbatim}
STAR --limitSjdbInsertNsj 10000000 --genomeDir genome/STAR 
--outFileNamePrefix RNA.end1.STAR/ 
--readFilesIn RNA.barcodes_annotated.RNA_UMI.end1.fastq.gz 
--runThreadN 20 --outSAMunmapped Within --outFilterType 
BySJout --outSAMattributes NH HI AS NM MD 
--outFilterMultimapNmax 50 --outSAMstrandField intronMotif 
--outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 
0.04 --alignIntronMin 10 --alignIntronMax 1000000 
--alignMatesGapMax 1000000 --alignSJoverhangMin 8 
--alignSJDBoverhangMin 1 --sjdbScore 1 --readFilesCommand 
zcat --outSAMtype BAM SortedByCoordinate --outWigStrand 
Stranded --twopassMode Basic --twopass1readsN -1 
--limitBAMsortRAM 500000000000
\end{verbatim}

\item Index the output BAM file:

\begin{verbatim}
samtools index 
 RNA.end1.STAR/Aligned.sortedByCoord.out.bam
\end{verbatim}

\item Calculate global mapping statistics:

\begin{verbatim}
python SAMstats.py 
    RNA.end1.STAR/Aligned.sortedByCoord.out.bam 
    SAMstats-RNA.end1.STAR.hg38 
    -bam genome.chrom.sizes samtools
\end{verbatim}

This script will output the number of mapped reads in various categories (uniquely mapping, spliced, etc.) as well as the molecular complexity of the alignment.

\item Calculate read distribution relative to the genome annotation:

\begin{verbatim}
python SAM_reads_in_genes3_BAM.py annotation.gtf 
     RNA.end1.STAR/Aligned.sortedByCoord.out.bam 
     genome.chrom.sizes 
     sam_reads_genes-RNA.end1.STAR -nomulti
\end{verbatim}

This script outputs the fractions of exonic, intronic, and intergenic reads. This information is important for single-cell assays, as it indicates to what extent the cytoplasm (which is enriched for exonic reads relative to the nucleus) is captured in the final libraries.

\item Make an RPM-normalized (Reads Per Million mapped reads) global coverage track:

\begin{verbatim}
python makewigglefromBAM-NH.py title 
    RNA.end1.STAR/Aligned.sortedByCoord.out.bam 
    genome.chrom.sizes 
    RNA.end1.STAR/Aligned.sortedByCoord.out.wig -RPM 
\end{verbatim}
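
RPM normalization divides raw coverage by the total number of mapped reads, expressed in millions. The arithmetic is shown below as a minimal Python sketch; how \verb|makewigglefromBAM-NH.py| counts mapped reads (e.g. its treatment of multi-mapping reads) is not specified here, so the total is simply passed in.

\begin{verbatim}
def rpm(count, total_mapped):
    """Convert a raw read count into reads per million mapped reads."""
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    return count * 1_000_000 / total_mapped


# 50 reads covering a base in a library of 25 million mapped reads
print(rpm(50, 25_000_000))  # 2.0
\end{verbatim}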

\item Evaluate read coverage along transcripts:

\begin{verbatim}
python gene_coverage_wig_gtf.py annotation.gtf 
   RNA.end1.STAR/Aligned.sortedByCoord.out.wig 
   1000 coverage-RNA -normalize -singlemodelgenes
\end{verbatim}

With these settings, the script outputs the average read profile over all genes that have only a single annotated transcript (to avoid confounding by the presence of multiple isoforms) and are $\geq$1,000 bp long. Use a simple annotation with few isoforms, such as RefSeq, to maximize the number of genes that meet these requirements.
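
The idea behind such a metaprofile can be sketched in Python as follows. This is an illustration, not the code of \verb|gene_coverage_wig_gtf.py|: it assumes that per-base coverage values for each single-isoform gene are already available, splits each gene into a fixed number of bins oriented from the 5' to the 3' end, scales each gene profile to sum to 1 (one possible reading of \verb|-normalize|, so that highly expressed genes do not dominate), and averages across genes. The function names are hypothetical.

\begin{verbatim}
def gene_profile(coverage, strand, n_bins=100):
    """Average per-base coverage of one transcript into n_bins bins,
    oriented 5'->3' and scaled to sum to 1 (None if no coverage)."""
    if len(coverage) < n_bins:
        raise ValueError("transcript shorter than the number of bins")
    # reverse minus-strand genes so that bin 0 is always the 5' end
    cov = list(coverage) if strand == "+" else list(reversed(coverage))
    length = len(cov)
    bins = []
    for i in range(n_bins):
        lo = i * length // n_bins
        hi = (i + 1) * length // n_bins
        bins.append(sum(cov[lo:hi]) / (hi - lo))
    total = sum(bins)
    if total == 0:
        return None
    return [b / total for b in bins]


def metagene(profiles):
    """Average the per-gene profiles, skipping genes without coverage."""
    kept = [p for p in profiles if p is not None]
    n = len(kept)
    return [sum(col) / n for col in zip(*kept)]
\end{verbatim}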

\item Calculate UMI counts per gene and per cell barcode combination using the \verb|SHARE-seq_RNA_counts.py| script. For faster processing, run it on each chromosome in parallel, as follows (shown here for chr1):

\begin{verbatim}
python SHARE-seq_RNA_counts.py 
   RNA.end1.STAR/Aligned.sortedByCoord.out.bam 
   annotation.gtf.chr1 genome.chrom.sizes 
   RNA.SHARE-seq_RNA_counts.chr1 -UMIedit 1
\end{verbatim}

The \verb|[-UMIedit]| option controls the stringency of UMI collapsing (here, UMIs within an edit distance of 1 of each other are collapsed into a single UMI).
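
To illustrate what UMI collapsing does, the following Python sketch greedily merges the UMIs observed for one gene in one cell, from the most to the least abundant. It shows one simple strategy, not necessarily the one used by \verb|SHARE-seq_RNA_counts.py|, and it uses the Hamming distance (mismatches only), which for equal-length UMIs is a simplification of the edit distance. The function names are hypothetical.

\begin{verbatim}
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umis, max_dist=1):
    """Greedy UMI collapsing for one gene in one cell.

    UMIs are visited from most to least abundant (ties broken
    alphabetically); a UMI within max_dist mismatches of an already
    accepted UMI is merged into it. Returns {representative: reads}.
    """
    counts = Counter(umis)
    reps = {}
    for umi, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in reps:
            if len(rep) == len(umi) and hamming(rep, umi) <= max_dist:
                reps[rep] += n  # likely a sequencing error of rep
                break
        else:
            reps[umi] = n  # no close representative: a new molecule
    return reps
\end{verbatim}

The number of molecules detected for that gene in that cell is then \verb|len(collapse_umis(umis))|.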

% \hl{XXX EXPLAIN OPTIONS, DEFAULT TREATMENT OF MULTIREADS AND OF INTRONIC READS XXX}

\item Calculate per-cell statistics by merging the individual per-chromosome outputs with the \verb|SHARE-seq-RNA-BC-sum-across-files.py| script as follows:

\begin{verbatim}
python SHARE-seq-RNA-BC-sum-across-files.py 
      list_of_per_chromosome_outputs 
      RNA.SHARE-seq_RNA_counts.UMIs_per_cell
\end{verbatim}

This outputs a tab-delimited file of the following form:

\begin{verbatim}
#BC1+BC2+BC3               rank3 UMIs3 Aligned Positions genes
GCCAATGT+CAGATCTG+TAACGCTG 1     64660 171969            8369
GTTGTCGG+TAAGCGTT+GATCAGCG 2     47079 123008            7864
TGACCACT+GGTCGTGT+TGCTGATA 3     45034 109960            7652
\end{verbatim}

For each cell barcode combination, this file lists its rank, the number of UMIs, and the number of detected genes.

\item Extract the cell barcode combinations above a desired threshold (e.g. $\geq$500 UMIs) into a separate file.
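
A minimal Python sketch of this filtering step is shown below. It assumes, for illustration, that the file passed to the next step is simply the per-cell table above restricted to barcodes with at least 500 UMIs in the third (UMI) column; the function name is hypothetical.

\begin{verbatim}
def filter_cells(lines, min_umis=500, umi_col=2):
    """Yield the header line and every barcode row whose UMI count
    (0-based column umi_col of the tab-delimited per-cell table)
    is >= min_umis."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            yield line  # keep the header
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[umi_col]) >= min_umis:
            yield line
\end{verbatim}

Writing the lines it yields for \verb|RNA.SHARE-seq_RNA_counts.UMIs_per_cell| into \verb|RNA.SHARE-seq_RNA_counts.UMIs_per_cell.min500| would produce the file named in the next command.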

\item Create the final sparse-matrix files, which can be used as input to Seurat for further analysis, with the \newline \verb|SHARE-seq-RNA-UMIs-sum-across-files.py| script:

\begin{verbatim}
python SHARE-seq-RNA-UMIs-sum-across-files.py 
   list_of_per_chromosome_outputs 
   RNA.SHARE-seq_RNA_counts.UMIs_per_cell.min500 0 
   RNA.SHARE-seq_RNA_counts.UMIs_per_cell.min500.sparse 
  -sparse
\end{verbatim}

\end{enumerate}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{ATAC}}}

The first steps of the ATAC processing are analogous to those of the RNA pipeline.

\begin{enumerate}
\item First, annotate the cell barcodes:

\begin{verbatim}
python PEFastqToTabDelimited.py 
  ATAC.end1.fastq.gz ATAC.end2.fastq.gz | 
  python SHARE-seq-barcode-annotate.py 
  Plate_R1.tsv 2 15 8 Plate_R2.tsv 2 53 8 Plate_R3.tsv 2 
  91 8 -revcompBC | 
  python PEFastqToTabDelimited-reverse.py - 
  ATAC.barcodes_annotated 
\end{verbatim}

Note as before that it is considerably faster to split the FASTQ files into smaller pieces and process them in parallel.

\item Compress the output files:

\begin{verbatim}
gzip ATAC.barcodes_annotated.end1.fastq
gzip ATAC.barcodes_annotated.end2.fastq
\end{verbatim}

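\item If the input was split into chunks for parallel processing, merge the individual compressed files (concatenated \verb|gzip| files form a valid \verb|gzip| file):

\begin{verbatim}
cat ATAC_*.barcodes_annotated.end1.fastq.gz > 
     ATAC.barcodes_annotated.end1.fastq.gz
cat ATAC_*.barcodes_annotated.end2.fastq.gz > 
     ATAC.barcodes_annotated.end2.fastq.gz
\end{verbatim}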

\item Align reads against the mitochondrial genome with \verb|Bowtie| as follows:

\begin{verbatim}
python PEFastqToTabDelimited.py 
  ATAC.barcodes_annotated.end1.fastq.gz
  ATAC.barcodes_annotated.end2.fastq.gz -trim 30 30 | 
  bowtie bowtie-indexes/chrM -p 20 -v 2 -a -t --best 
  --strata -q -X 1000 --sam --12 - |
  samtools view -F4 -bT genome.fa - | 
  samtools sort -o ATAC.2x30mers.chrM.bam -
\end{verbatim}

This step is for the purpose of evaluating the extent of mitochondrial contamination in the overall library.

\item Align reads against the full genome with \verb|Bowtie| and filter out mitochondrial reads as follows:

\begin{verbatim}
python PEFastqToTabDelimited.py 
 ATAC.barcodes_annotated.end1.fastq.gz
 ATAC.barcodes_annotated.end2.fastq.gz 
 -trim 30 30 | bowtie bowtie-indexes/genome 
 -p 20 -v 2 -k 2 -m 1 -t --best --strata -q 
 -X 1000 --sam --12 - | egrep -v chrM | 
 samtools view -F4 -bT genome.fa - | samtools sort 
 -o ATAC.2x30mers.unique.nochrM.bam -
\end{verbatim}

Adjust this command accordingly when working with a genome in which the mitochondrial chromosome/contigs are named differently, or in which multiple contigs have to be filtered out (e.g. in plants, which have a plastid genome in addition to the mitochondrial one).

\item Index the resulting BAM files.

\begin{verbatim}
samtools index ATAC.2x30mers.unique.nochrM.bam
samtools index ATAC.2x30mers.chrM.bam
\end{verbatim}

\item Calculate mapping statistics for the two sets of alignments.

\begin{verbatim}
python SAMstats.py ATAC.2x30mers.chrM.bam 
    SAMstats-ATAC.2x30mers.chrM 
   -bam genome.chrom.sizes samtools 
   -paired -noNHinfo 
python SAMstats.py ATAC.2x30mers.unique.nochrM.bam 
    SAMstats-ATAC.2x30mers.unique.nochrM 
    -bam genome.chrom.sizes samtools 
    -paired -uniqueBAM
\end{verbatim}

\item Calculate the mitochondrial reads fraction $MRF$ as follows:

\begin{equation}
MRF = \cfrac{|R_M|}{|R_M| + |R_N|}
\end{equation}

where $R_M$ is the set of reads that map to the mitochondrial genome, $R_N$ is the set of reads that map to the nuclear genome after filtering out mitochondria-mapping reads, and $|\cdot|$ denotes the number of reads in a set.
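
As a worked example, a minimal Python sketch of this calculation, with the two read counts taken from the \verb|SAMstats.py| outputs of the previous step (the function name is hypothetical):

\begin{verbatim}
def mito_read_fraction(n_mito, n_nuclear):
    """MRF = |R_M| / (|R_M| + |R_N|) from the two read counts."""
    total = n_mito + n_nuclear
    if total == 0:
        raise ValueError("no mapped reads")
    return n_mito / total


# e.g. 1.5 million chrM reads and 8.5 million nuclear reads
print(mito_read_fraction(1_500_000, 8_500_000))  # 0.15
\end{verbatim}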

\item Evaluate the fragment size distribution over the nuclear genome:

\begin{verbatim}
python PEInsertDistFromBAM.py 
    ATAC.2x30mers.unique.nochrM.bam 
    genome.chrom.sizes 
    ATAC.2x30mers.unique.nochrM.InsLen 
    -uniqueBAM -normalize
\end{verbatim}
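
The computation behind such a distribution can be sketched in Python as a normalized histogram of the template lengths (SAM column 9, \verb|TLEN|) of properly paired alignments, read as SAM text (e.g. from \verb|samtools view|). This is an illustration rather than the code of \verb|PEInsertDistFromBAM.py|; the 1,000-bp cap mirrors the \verb|-X 1000| Bowtie setting used above, and the function name is hypothetical.

\begin{verbatim}
from collections import Counter


def insert_size_distribution(sam_lines, max_len=1000):
    """Normalized fragment-length histogram from SAM text lines.

    Only properly paired (0x2), primary, mapped records are used, and
    each pair is counted once via the mate with a positive TLEN.
    """
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if not flag & 0x2:
            continue  # not a proper pair
        if flag & (0x4 | 0x100 | 0x800):
            continue  # unmapped, secondary or supplementary
        if 0 < tlen <= max_len:
            hist[tlen] += 1
    total = sum(hist.values())
    if total == 0:
        return {}
    return {size: n / total for size, n in sorted(hist.items())}
\end{verbatim}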

\item Create a normalized genome coverage track:

\begin{verbatim}
python makewigglefromBAM-NH.py title 
   ATAC.2x30mers.unique.nochrM.bam 
   genome.chrom.sizes ATAC.2x30mers.unique.nochrM.wig 
   -notitle -RPM -uniqueBAM
\end{verbatim}

\item Create a bigWig file using the \verb|wigToBigWig| program from the UCSC Genome Browser utilities suite.

\begin{verbatim}
wigToBigWig ATAC.2x30mers.unique.nochrM.wig 
    genome.chrom.sizes 
    ATAC.2x30mers.unique.nochrM.bigWig
\end{verbatim}

\item Calculate the global TSS enrichment. The TSS enrichment $TSS_E$ is the most informative ATAC-seq quality metric. It is based on generating an average read distribution profile around the annotated transcription start sites of protein-coding genes, and then calculating the ratio between the number of reads in the immediate neighborhood of the TSS and the number of reads falling in regions flanking the TSS peak. The advantage of the $TSS_E$ metric is that it is internal to the dataset and independent of peak calling. We use a TSS window of $\pm 100$ bp and a flank distance of 2,000 bp (with 100-bp flanking windows), i.e. $TSS_E$ is calculated as follows:

\begin{align}
\begin{split}
TSS_E = \cfrac{|R \in [TSS \pm 100]|}{|R \in [TSS - 2050, TSS - 1950]| + |R \in [TSS + 1950, TSS + 2050]|}
\end{split}
\end{align}

First, generate the TSS metaprofile:

\begin{verbatim}
python signalAroundCoordinate-BW.py 
   annotation.TSS-0bp.bed 0 1 3 4000 
   ATAC.2x30mers.unique.nochrM.bigWig 
   ATAC.2x30mers.unique.nochrM.TSS_profile -normalize
\end{verbatim}

Note that this requires a tab-delimited \verb|BED| file containing the start positions and strands of the annotated TSSs in the genome, e.g.:

\begin{verbatim}
#chr  TSS   TSS   strand  geneName
chr1  1000  1000  +       GENE1
\end{verbatim}

Second, calculate the TSS score:

\begin{verbatim}
python ATACTSSscore.py 
   ATAC.2x30mers.unique.nochrM.TSS_profile 
   100 2000 >> ATACTSSscore.txt
\end{verbatim}
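
The formula above can be written as a short Python sketch operating on a per-base metaprofile. For illustration only, we assume that the profile holds one aggregate value per offset from $-r$ to $+r$ around the TSS; this layout need not match the output of \verb|signalAroundCoordinate-BW.py|, and the function name is hypothetical. All intervals are treated as closed, as in the equation.

\begin{verbatim}
def tss_enrichment(profile, radius, window=100, flank=2000, flank_half=50):
    """TSS enrichment from a per-base metaprofile.

    profile[i] is the aggregate signal at offset i - radius from the TSS.
    Numerator: signal in [-window, +window]. Denominator: signal in
    [-flank - flank_half, -flank + flank_half] plus
    [flank - flank_half, flank + flank_half].
    """
    if len(profile) != 2 * radius + 1:
        raise ValueError("profile length must be 2 * radius + 1")
    if radius < flank + flank_half:
        raise ValueError("profile does not reach the flanks")

    def total(lo, hi):
        # sum of signal over offsets lo..hi, inclusive
        return sum(profile[lo + radius: hi + radius + 1])

    center = total(-window, window)
    flanks = (total(-flank - flank_half, -flank + flank_half)
              + total(flank - flank_half, flank + flank_half))
    return center / flanks
\end{verbatim}

For a flat profile without any enrichment, this gives $TSS_E \approx 1$.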

\item Deduplicate the BAM file. Note that this step differs from the typical deduplication carried out in most high-throughput sequencing pipelines with tools such as \verb|MarkDuplicates| in \verb|Picard|. Here, fragments are deduplicated only within the same cell barcode, i.e. for two fragments to be collapsed, they need to have the same coordinates, orientation, and cell barcode.

\begin{verbatim}
python SHARE-seq_ATAC_dedup.py 
   ATAC.2x30mers.unique.nochrM.bam 
   genome.chrom.sizes 
   ATAC.2x30mers.unique.nochrM.BC_dedup.bam 
   -addBC
\end{verbatim}

Use the \verb|[-addBC]| option to append the cell barcode to each alignment as a \verb|BC| tag, which makes the final files ready to use with \verb|ArchR|.
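
The difference from standard deduplication lies in the key used to define duplicates. A minimal Python sketch, assuming each fragment has already been reduced to a (cell barcode, chromosome, start, end, orientation) tuple; it is not the code of \verb|SHARE-seq_ATAC_dedup.py|, which operates on BAM records, and the function name is hypothetical.

\begin{verbatim}
def dedup_by_barcode(fragments):
    """Keep one fragment per (cell barcode, chrom, start, end, strand).

    Fragments with identical coordinates and orientation but different
    barcodes are all kept, since they come from different cells.
    """
    seen = set()
    kept = []
    for frag in fragments:
        if frag not in seen:
            seen.add(frag)
            kept.append(frag)
    return kept
\end{verbatim}

Dropping the barcode from the key would reproduce conventional deduplication and would wrongly merge fragments from different cells that happen to share coordinates.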

\item Index the deduplicated BAM file:

\begin{verbatim}
samtools index ATAC.2x30mers.unique.nochrM.BC_dedup.bam
\end{verbatim}

\item Calculate alignment stats for the deduplicated BAM file:

\begin{verbatim}
python SAMstats.py ATAC.2x30mers.unique.nochrM.BC_dedup.bam 
       SAMstats-ATAC.2x30mers.unique.nochrM.BC_dedup
       -bam genome.chrom.sizes samtools -paired -uniqueBAM
\end{verbatim}

\item Calculate fragment count and TSS enrichment statistics for each cell barcode. 

\begin{verbatim}
python SHARE-seq_ATAC_stats_per_cell.py 
  ATAC.2x30mers.unique.nochrM.BC_dedup.bam
  genome.chrom.sizes annotation.TSS-0bp.bed 0 1 2000 200 
  ATAC.2x30mers.unique.nochrM.BC_dedup.per_cell_stats
\end{verbatim}

This script will output a file containing information about the number of fragments and TSS enrichment for each barcode that can be used to filter barcodes for downstream analysis. 

More sophisticated filtering beyond these simple metrics, e.g. the removal of doublets, can be performed in \verb|ArchR| \cite{ArchR}.

\end{enumerate}

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 

\reversemarginpar\marginpar{\section{Expected results}}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Sequencing libraries}}}

Figure \ref{Fig5} shows the typical fragment profiles of ATAC and RNA SHARE-seq libraries. ATAC libraries are expected to show a nucleosomal signature, with prominent subnucleosomal and mononucleosomal peaks and possibly a dinucleosomal one, all shifted to the right by the length of the adapters and barcodes added to the original fragments. In contrast, RNA libraries are primarily unimodal in length.

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=14cm]{Fig5-TapeStation.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Typical fragment-length profiles of SHARE-seq libraries}. (A) BioAnalyzer profile of a SHARE-seq ATAC library; (B) BioAnalyzer profile of a SHARE-seq RNA library. }
\label{Fig5}
\end{center}
\end{figure*}

% \hl{XXX DISCUSS TYPICAL LIBRARY CONCENTRATIONS XXX}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Species mixing experiments}}}

A customary experiment when testing, adopting or developing any new single-cell protocol is the species mixing experiment. Cells from two different species, usually mouse and human, are mixed together, and the extent of crosstalk/contamination between barcodes and of doublet formation (in which two cells are processed together with the same barcode) is assessed based on how many reads in each barcode map to each species. Ideally, every barcode should contain reads from only one of the two species. Doublets arise from the loading of multiple cells into the same droplet or well (depending on the method used), or from physical clumping of cells early in the protocol, after which the clumped cells are processed together throughout the rest of the procedure.
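
The per-barcode assessment can be sketched in Python as follows. The thresholds (at least 1,000 reads per barcode and 90\% purity for a single-species call) are illustrative assumptions rather than values used in this protocol, and the function name is hypothetical.

\begin{verbatim}
def classify_barcode(human, mouse, min_reads=1000, purity=0.9):
    """Label a barcode from its human- and mouse-mapping read counts.

    Returns 'low' below min_reads in total, 'human' or 'mouse' if that
    species accounts for >= purity of the reads, and 'mixed' otherwise
    (a candidate doublet or a heavily cross-contaminated barcode).
    """
    total = human + mouse
    if total < min_reads:
        return "low"
    if human / total >= purity:
        return "human"
    if mouse / total >= purity:
        return "mouse"
    return "mixed"
\end{verbatim}

With the two species mixed in equal proportions, only about half of all doublets are mixed-species (the other half being human-human or mouse-mouse), so the overall doublet rate is roughly twice the observed fraction of \verb|mixed| barcodes.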

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig6-species-mixing.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Typical results from a species mixing SHARE-seq experiment}. Human HEK293 and mouse embryonic fibroblast (MEF) cells were mixed in equal proportions and carried through the SHARE-seq workflow. (A) ATAC fragments per cell. (B) RNA UMIs per cell.}
\label{Fig6}
\end{center}
\end{figure*}

Figure \ref{Fig6} shows typical species mixing results for a SHARE-seq experiment. In our hands, ATAC experiments usually show virtually no crosstalk between barcodes and very few doublets. In contrast, pool-split RNA experiments in general often exhibit a small fraction of reads resulting from ``leakage'', likely because some cells open up during handling and release their contents into the general reaction pool. This issue does not significantly affect most analyses, but it should be kept in mind in cases where it could be a confounding factor.

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{ATAC post-sequencing quality evaluation}}}

Figure \ref{Fig7} shows the key bulk-level ATAC-seq metrics. The fragment length distribution (Figure \ref{Fig7}A) usually shows strong subnucleosomal and mononucleosomal peaks as well as a weaker dinucleosomal one. High TSS enrichment is desirable; in this case (Figure \ref{Fig7}B) it is very high ($TSS_E \geq 25$). \textit{See} \textbf{Note \ref{tissue_tss}} for more details. Figure \ref{Fig7}C shows the fraction of mitochondrial reads in the human and mouse cells in the species mixing experiment. Note that this fraction can vary greatly depending on the properties of the cell type (cancer cell lines and highly metabolically active cells tend to have more mitochondria \cite{Marinov2014}), and not just because of experimental variation (which in this case is minimized, as the cells were processed together).

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig7-ATAC-QC.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Basic evaluation of bulk-level ATAC quality and enrichment}. (A) Fragment length distribution. (B) TSS enrichment. Shown are the same experiments as those featured in Figure \ref{Fig6}. (C) Mitochondrial read fraction for each species in this experiment.}
\label{Fig7}
\end{center}
\end{figure*}

Figure \ref{Fig8} shows the key scATAC metrics. One is the relationship between the number of fragments per cell barcode and the TSS enrichment within each cell barcode (Figure \ref{Fig8}A). Another is the curve of the number of fragments per cell barcode plotted against the rank of the cell barcodes, ranked by their number of fragments (Figure \ref{Fig8}B). Ideally, there should be a clear inflection point between the cell barcodes with high fragment counts and those with low fragment counts, indicating that a set of high-quality cells has been captured and preserved intact through the full pool-split procedure. A flatter, more diagonal curve can indicate loss of cell integrity during handling and, if the lack of an inflection point is extreme, raises concerns about the biological interpretability of the experiment.
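
One simple heuristic for locating the inflection point is to rank barcodes by decreasing count, take $\log_{10}$ of both rank and count, and pick the point farthest from the straight line joining the first and last points of the curve. The Python sketch below implements this heuristic; it is offered as an illustration and is not necessarily how cell-calling thresholds were chosen for the data shown here. The function name is hypothetical.

\begin{verbatim}
import math


def knee_point(counts):
    """Return (rank, count) at the knee of a barcode-rank curve.

    counts: per-barcode fragment (or UMI) counts, any order, all > 0.
    Barcodes are ranked by decreasing count; in log10-log10 space the
    knee is the point farthest from the line joining the end points.
    """
    ys = sorted(counts, reverse=True)
    if len(ys) < 3:
        raise ValueError("need at least three barcodes")
    pts = [(math.log10(r), math.log10(c)) for r, c in enumerate(ys, start=1)]
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    norm = math.hypot(x1 - x0, y1 - y0)
    best_i, best_d = 0, -1.0
    for i, (x, y) in enumerate(pts):
        # perpendicular distance from (x, y) to the line through the ends
        d = abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i + 1, ys[best_i]
\end{verbatim}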

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig8-scATAC-QC.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Basic evaluation of scATAC-seq-level quality and enrichment}. (A) Number of fragments per cell barcode vs. TSS enrichment; (B) Cell barcode rank (by fragment counts) vs. fragment counts per cell barcode.}
\label{Fig8}
\end{center}
\end{figure*}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{RNA post-sequencing quality evaluation}}}

Figure \ref{Fig9} shows the typical parameters to be evaluated for a bulk-level RNA-seq dataset. One is the distribution of reads along transcripts (Figure \ref{Fig9}A). Unlike some scRNA-seq approaches, SHARE-seq is not a 3'-tagging method: although UMIs are attached to the 3' end of transcripts, the cDNA is tagmented at random positions after amplification, so the Read 1 sequences of the RNA library can originate some distance away from the 3' end.

Another is the distribution of reads relative to the annotation (Figure \ref{Fig9}B). As is often observed in scRNA-seq datasets, SHARE-seq RNA libraries contain a significant fraction of reads originating from introns, presumably from unspliced transcripts present in the nucleus. This is likely because the ATAC reaction has to take place first in the workflow, so that a substantial portion of the cytoplasm is lost and the final libraries are enriched for nuclear material.

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig9-RNA-QC.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Basic evaluation of the bulk-level RNA-seq properties}. (A) Read distribution along transcript lengths; (B) Read distribution relative to the exonic, intronic and intergenic genomic spaces.}
\label{Fig9}
\end{center}
\end{figure*}

Figure \ref{Fig10} shows the key metrics for evaluating the success of the RNA portion of a SHARE-seq experiment. As for ATAC above, the curve of the number of UMIs per cell barcode plotted against the rank of the cell barcodes (ranked by their number of UMIs) should ideally feature a clear inflection point between the cell barcodes with high UMI counts and those with low UMI counts (Figure \ref{Fig10}A). There should also be concordance between the cell barcodes with high ATAC fragment counts and those with high UMI counts, i.e. the same cells should be of high quality in both modalities and thus usable for joint analysis (Figure \ref{Fig10}B).
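
Because the ATAC and RNA libraries of a SHARE-seq experiment carry the same cell barcodes, cells usable for joint analysis can be selected by requiring both modalities to pass their thresholds. A minimal Python sketch, in which the per-barcode counts are assumed to have been loaded into dictionaries, the 1,000-fragment ATAC threshold is an illustrative value, and the 500-UMI RNA threshold follows the example given in the data processing steps; the function name is hypothetical.

\begin{verbatim}
def joint_cells(atac_frags, rna_umis, min_frags=1000, min_umis=500):
    """Sorted barcodes passing both the ATAC and the RNA thresholds.

    atac_frags, rna_umis: dicts mapping cell barcode -> count.
    Barcodes missing from rna_umis are treated as having 0 UMIs.
    """
    return sorted(
        bc for bc, n in atac_frags.items()
        if n >= min_frags and rna_umis.get(bc, 0) >= min_umis
    )
\end{verbatim}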

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=12cm]{Fig10-RNA-UMIs.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Basic evaluation of SHARE-seq RNA single cell-level quality and enrichment}. (A) Cell barcode rank (by UMI counts) vs. UMI counts per cell barcode (B) UMI counts per barcode vs. ATAC fragment counts per barcode. % (C) UMI counts per barcode vs. ATAC TSS enrichment.
}
\label{Fig10}
\end{center}
\end{figure*}

\centerline{}
\reversemarginpar\marginpar{\subsection{\textit{Dimensionality reduction and cell type/cluster identification}}}

Following initial data processing, clusters and cell types can be identified using standard tools such as \verb|Seurat| \cite{Seurat} and/or \verb|ArchR| \cite{ArchR}. Figure \ref{Fig11} shows typical output of this kind in UMAP space for both the ATAC and RNA sides of a SHARE-seq experiment on a human embryonic lung tissue sample.

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig11-UMAPs.png}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{\small \textbf{Example SHARE-seq output on human embryonic lung samples}. (A) ArchR iterative LSI UMAP on the ATAC-seq dataset; (B) Seurat UMAP on the RNA dataset. Individual ArchR- and Seurat-defined clusters are colored separately.}
\label{Fig11}
\end{center}
\end{figure*}

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}

\begin{changemargin}{3.7cm}{0cm} 
\centerline{}

\reversemarginpar\marginpar{\section{Notes}}

\begin{enumerate}

\item \label{Tn5}The details of the production of hyperactive Tn5 transposase are beyond the scope of this chapter. However, detailed instructions for how to carry it out can be found in Picelli et al. 2014 \cite{Picelli2014}.

\item \label{Tissues}In this chapter we presented one of many available protocols for tissue dissociation and nuclei isolation, which has worked in our hands in some contexts. However, the variety of tissues, and of their properties, that can be encountered across organisms is vast, making it practically impossible to have a single protocol for all situations. Thus, optimal procedures for tissue dissociation often have to be devised or adapted empirically.

\item \label{xlinks}The protocol we described here used light 0.1\% FA crosslinking. This does not mean that optimal results will be obtained in all contexts with the same conditions, and crosslinking may have to be optimized depending on the specifics of the experimental system being studied.

\item \label{plates}The protocol described here is for 96 $\times$ 96 $\times$ 96 indexing. However, it can be expanded to more rounds and/or more barcodes, e.g. to 3-round 384 $\times$ 384 $\times$ 384 indexing, or to 4- or 5-round indexing with 96 or 384 barcodes per round. Pick the optimal design based on the availability of robotic liquid handlers (it is generally not practical to pipette 384-well plates by hand), the desired throughput, and other considerations. Note that additional barcodes and linkers would have to be designed so that they are compatible with each other and with further rounds of barcoding. Aim for as large a distance in sequence space as possible between the 8-bp barcodes (or increase their length, if the sequencing format allows it). The same set of 8-bp barcodes can be used in all rounds of indexing. % \hl{XXX explain about linkers and blocking oligos XXX}.
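
A quick way to check a candidate barcode set is to compute its minimum pairwise Hamming distance $d$: a set with minimum distance $d$ allows up to $\lfloor (d-1)/2 \rfloor$ substitution errors to be corrected unambiguously. A minimal Python sketch (function name hypothetical), applied here to the three Round 1 barcodes listed in the plate file example in the data processing section:

\begin{verbatim}
from itertools import combinations


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two equal-length barcodes."""
    if len(barcodes) < 2:
        raise ValueError("need at least two barcodes")
    if len({len(b) for b in barcodes}) != 1:
        raise ValueError("barcodes must all have the same length")
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(barcodes, 2))


d = min_pairwise_distance(["AACGTGAT", "AAACATCG", "ATGCCTAA"])
print(d, (d - 1) // 2)  # 5 2
\end{verbatim}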

\item \label{Tubes}Low-binding tubes are preferable for all reactions in order to ensure maximum yields.

\item \label{Plates}It is most efficient in terms of effort to anneal a sufficient amount of oligos for multiple experiments on many separate plates. These can then be used immediately when cells/tissues become available, saving a considerable amount of experimental time.

\item \label{TBbuffer} The TB buffer described here is modified from the original Omni-ATAC protocol by the addition of acetate. In our experience, this provides superior results compared to the traditional buffer formulation.

\item \label{tissue_tss}In our experience (and that of others), experiments in cell lines always produce much higher quality ATAC datasets than those obtained from tissues, especially frozen tissues. This is not specific to SHARE-seq but has been observed in numerous previous studies mapping chromatin accessibility in tissue samples in contexts such as cancer, development, and adult tissues \cite{Cusanovich2018,Preissl2018,Domcke2020,Corces2018}. This is likely due to the extensive handling and the freezing and thawing of tissues, which break up nuclei and release unprotected free DNA that is tagmented by Tn5, increasing the number of background fragments and decreasing the signal-to-noise ratio. Whether future protocol optimizations can resolve these issues, or whether they are fundamentally insurmountable, is not known at present. 

% \item \label{Mitochondria} Early versions of the ATAC-seq protocol \cite{Buenrostro2013} exhibited very high proportions of reads originating from the mitochondrial genome, often exceeding 80\% of the total. This is due to the fact that the mitochondrial genome is not packaged by nucleosomes, and is therefore highly accessible to transposase insertion. Decreasing the fraction of mitochondria has been a key part of the improvement of the ATAC-seq protocol in its the currently used variants relative to the original version, and has been achieved thanks to the addition of the combination of digitonin, Tween-20 and IgePAL detergents during the cell lysis and nuclei preparation step. As a result ATAC-seq libraries generated using modern protocols frequently contain as little as $\leq$5\% of reads for many cell types. 

\end{enumerate}

\end{changemargin} 

\noindent\makebox[\textwidth]{\rule{\textwidth}{1.5pt}}
\centerline{}
\begin{changemargin}{3.7cm}{0cm} 
\reversemarginpar\marginpar{\section*{Acknowledgements}} 

The authors thank Sai Ma and Jason Buenrostro for helpful discussion regarding the SHARE-seq protocol. This work was supported by NIH grants (P50HG007735, R01 HG008140, U19AI057266 and UM1HG009442 to W.J.G., 1UM1HG009436 to W.J.G. and A.K., 1DP2OD022870-01 and 1U01HG009431 to A.K., and HG006827 to C.H.), the Rita Allen Foundation (to W.J.G.), the Baxter Foundation Faculty Scholar Grant, and the Human Frontier Science Program grant RGY006S (to W.J.G.). W.J.G. is a Chan Zuckerberg Biohub investigator and acknowledges grants 2017-174468 and 2018-182817 from the Chan Zuckerberg Initiative. S.K. is supported by MSTP training grant T32GM007365 and the Paul and Daisy Soros Fellowship. Fellowship support was also provided by the Stanford School of Medicine Dean's Fellowship (G.K.M.) and by the EMBO Long-Term Fellowship EMBO ALTF 1119-2016 and the Human Frontier Science Program Long-Term Fellowship HFSP LT 000835/2017-L (Z.S.). 

\end{changemargin} 

\begin{thebibliography}{100}

\begin{multicols}{2}
\begin{small}

\input{references}

\end{small}
\end{multicols}

\end{thebibliography}

\end{document}
