\documentclass[10pt]{article}
\usepackage[hmargin=1.5cm,top=2cm,bottom=2cm]{geometry}
\usepackage{multicol}
\setlength\columnsep{15pt}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{array}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage[auth-sc]{authblk}
\usepackage{longtable}
\usepackage{multirow}
\usepackage{hyperref}
\usepackage{enumerate}
\usepackage[labelfont=bf]{caption}
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage{mdframed}
\usepackage{graphics}
\usepackage{multirow}
\usepackage{rotating}
\usepackage{array}
\usepackage{lscape}
\usepackage{caption}
\usepackage{breakurl}
\usepackage{todonotes}
\usepackage{hanging}
\usepackage[final]{pdfpages}
\usepackage[leftFloats,CaptionAfterwards]{fltpage}
\usepackage[numbers,super,sort&compress]{natbib}
\setlength{\bibsep}{0pt plus 0.3ex}
\usepackage{abstract}
\usepackage{enumitem}
\usepackage{soul}
\usepackage{titlesec}
\titleformat{\section}[block]{\large\bfseries\filcenter}{\thesection.}{0.4em}{}
\titleformat{\subsection}[block]{\normalsize\sc\bfseries\filcenter}{\thesubsection.}{0.4em}{}
\titleformat{\subsubsection}[block]{\normalsize\sc\itshape\filright}{\thesubsubsection.}{0.4em}{}
\setcounter{secnumdepth}{5}

\makeatletter
\def\@biblabel#1{\@ifnotempty{#1}{#1.}}
\makeatother

\newcommand{\filllastline}[1]{
\setlength\leftskip{0pt}
\setlength\rightskip{0pt}
\setlength\parfillskip{0pt}
#1}

\newenvironment{Figure}
{\par\medskip\noindent\minipage{\linewidth}}
{\endminipage\par\medskip}

\title{\bf The chromatin organization of a chlorarachniophyte nucleomorph genome}
\renewcommand\Authfont{\scshape\normalsize}
\author[1,$\#$]{Georgi K. Marinov}
\author[2]{Xinyi Chen}
% \author[1]{S. Tansu Bagdatli}
% \author[1]{Zohar Shipony}
\author[3]{Tong Wu}
\author[3,4,5]{Chuan He}
\author[6]{Arthur R. Grossman}
\author[1,7]{Anshul Kundaje}
\author[1,8,9,10,$\#$]{William J. Greenleaf}
\renewcommand\Affilfont{\itshape\small}
\affil[1]{Department of Genetics, Stanford University, Stanford, California 94305, USA}
\affil[2]{Department of Bioengineering, Stanford University, Stanford, California 94305, USA}
\affil[3]{Department of Chemistry and Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL, 60637, USA}
\affil[4]{Department of Biochemistry and Molecular Biology and Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL, 60637, USA}
\affil[5]{Howard Hughes Medical Institute, The University of Chicago, Chicago, IL, 60637, USA}
\affil[6]{Carnegie Institution for Science, Department of Plant Biology, Stanford, California 94305, USA}
\affil[7]{Department of Computer Science, Stanford University, Stanford, California 94305, USA}
\affil[8]{Center for Personal Dynamic Regulomes, Stanford University, Stanford, California 94305, USA}
\affil[9]{Department of Applied Physics, Stanford University, Stanford, California 94305, USA}
\affil[10]{Chan Zuckerberg Biohub, San Francisco, California, USA}
\affil[$\#$]{Corresponding author}
\date{}

\begin{document}
\maketitle

% \centerline{}
% \centerline{}
\begin{abstract}

\noindent {\normalsize \textbf{Nucleomorphs are remnants of secondary endosymbiotic events between two eukaryote cells wherein the endosymbiont has retained its eukaryotic nucleus. Nucleomorphs have evolved at least twice independently, in chlorarachniophytes and cryptophytes, yet they have converged on a remarkably similar genomic architecture, characterized by the most extreme compression and miniaturization among all known eukaryotes. Previous computational studies have suggested that nucleomorph chromatin likely exhibits a number of divergent features. In this work, we provide the first maps of open chromatin, active transcription, and three-dimensional genome architecture in the nucleomorph of the chlorarachniophyte \textit{Bigelowiella natans}. We find that the \textit{B. natans} nucleomorph genome exists in a highly accessible state, akin to that of ribosomal DNA in some other eukaryotes, and that it is highly transcribed throughout its length, with few signs of polymerase pausing at transcription start sites (TSSs). At the same time, most nucleomorph TSSs show very strong nucleosome positioning. Chromosome conformation (Hi-C) analysis reveals that nucleomorph chromosomes interact with one another at their telomeric regions, and that \textit{B. natans} mitochondria, which derive from the host, physically interact more strongly with the endosymbiont-derived plastid and with nucleomorph chromosomes than with the host nuclear genome. } 
}
\centerline{}
\centerline{}
\end{abstract}

\begin{multicols}{2}

\section*{Introduction}

Endosymbiosis, especially between a eukaryote host and a prokaryote, is a common event in the evolution of eukaryotes, and subsequent changes in the host and endosymbiont genomes often follow similar general trends. One such trend is the reduction of the endosymbiont's genome due to gene loss and endosymbiotic gene transfer\cite{Blanchard2000,Moran2014} (EGT) into the host's nucleus, the classic examples of which are the extremely reduced genomes of plastids and mitochondria. This trend is also strongly manifested in the fate of secondary endosymbionts (eukaryotes that become endosymbionts of other eukaryotes). Such endosymbiotic events have occurred on multiple occasions in the evolution of eukaryotes\cite{Keeling2010}, usually resulting in the retention of the plastid of the photosynthetic eukaryotic endosymbiont (as a secondary plastid) while the nucleus of the endosymbiont is lost entirely. However, several notable exceptions to this general rule do exist. One example is the dinotoms, the result of endosymbiosis between a dinoflagellate host and a diatom, in which the diatom has not been substantially reduced\cite{Dodge1971,Figueroa2009}. More striking are the nucleomorphs, which are best known from the chlorarachniophytes and the cryptophytes (but may in fact have arisen in other groups too, such as some dinoflagellates\cite{Nakayama2020,Sarai2020}). Nucleomorphs retain a highly reduced but still functional remnant of the endosymbiont's nucleus and genome\cite{Greenwood1974,Greenwood1977}.

A striking feature of chlorarachniophyte and cryptophyte nucleomorphs is that they have evolved independently, from a green and a red alga, respectively, yet their genomes exhibit remarkably convergent properties\cite{Archibald2007,Archibald2009}. In both cases, the genomes of their nucleomorphs are the smallest known among all eukaryotes, usually just a few hundred kilobases in size. All sequenced nucleomorph genomes are organized into three highly AT-rich chromosomes, in which arrays of ribosomal RNA genes form the subtelomeric regions. These genomes are also extremely compressed, exhibiting very little intergenic space between genes, even overlapping genes on occasion. The genes themselves are also often shortened\cite{Zauner2000,Douglas2001,Tanifuji2011,Moore2012,Tanifuji2014,Gilson2006,Lane2007}.

A number of important questions about nucleomorph biology remain unanswered, including the extent to which the chromatin organization and transcriptional mechanisms of these extremely reduced nuclei are conserved or have diverged relative to conventional eukaryotes. Previous computational analysis of nucleomorph genome sequences\cite{Marinov2016} has suggested that a considerable degree of deviation from the conventional eukaryotic state is likely to have developed in nucleomorphs. For example, histone proteins are ancestral to all eukaryotes, and the key posttranslational modifications (PTMs) that they carry also date back to the last eukaryotic common ancestor (LECA) and are extremely conserved in nearly all branches of the eukaryotic tree\cite{Postberg2010}, with the notable exception of dinoflagellates. This is likely because of the involvement of these PTMs in the so-called ``histone code''\cite{Jenuwein2001}, in which different PTMs are deposited in a highly regulated manner on specific residues of histone proteins, and are then read out by various effector proteins. The histone code plays key roles in practically all aspects of chromatin biology, such as the regulation of gene expression, the transcriptional cycle, the formation of repressive heterochromatin, mitotic condensation of chromosomes, DNA repair, and many others.

Nucleomorphs appear to be one of the few\cite{Marinov2015dino,Marinov2016} exceptions to this general rule. Inside nucleomorph genomes, in both chlorarachniophytes and cryptophytes, only two histone genes are encoded, one for H3 and one for H4, with H2A and H2B apparently imported from the host's nucleus\cite{Hirakawa2011}. Sequence analysis of the H3 and H4 proteins shows remarkable divergence from the typical amino acid sequence in eukaryotes, in particular in chlorarachniophytes, which have lost nearly all key histone code residues\cite{Marinov2016}. Furthermore, the heptad repeats in the C-terminal domain (CTD) tail of the Rpb1 subunit of RNA Polymerase II, which are key to the eukaryote transcriptional cycle and mRNA processing\cite{Eick2013} and are highly conserved in eukaryotes\cite{Yang2014}, have also been lost.

These observations suggest that nucleomorph chromatin and chromatin-based regulation may be unconventional compared to those of other eukaryotes. For example, nucleomorphs may organize and protect DNA differently than other eukaryotes; nucleomorph promoters may display atypical signatures of nucleosome depletion, nucleosome positioning, and histone modifications, and an atypical relation of these marks to transcriptional activity; or nucleomorphs may exhibit unique 3D genomic organization. However, none of these features of nucleomorph chromatin or gene expression regulation has been directly studied.

In this work we map chromatin accessibility, active transcription, and three-dimensional (3D) genome organization in the chlorarachniophyte \textit{Bigelowiella natans} to address these gaps in our knowledge of nucleomorph biology. We find that nucleomorph chromosomes exist in a highly accessible state, reminiscent of what is observed for ribosomal DNA (rDNA) in other eukaryotes, such as budding yeast, where it is thought to exist in a fully nucleosome-free state when actively transcribed\cite{Jones2007,Merz2008,Conconi1989}. However, nucleomorph promoters are associated with strongly positioned nucleosomes and exhibit a distinct nucleosome-free region upstream of the transcription start site (TSS). Active transcription is nearly uniformly distributed across nucleomorph genomes, with the exception of elevated transcription and chromatin accessibility at the subtelomeric rDNA genes. We find few signs of RNA polymerase pausing over promoters. Nucleomorph chromosomes form a network of telomere-to-telomere interactions in 3D space, and also fold on themselves, but centromeres do not interact preferentially with each other. Curiously, the \textit{B. natans} mitochondrion, which derives from the host, interacts physically more often with the endosymbiont compartments (the plastid and the nucleomorph) than with the host nuclear genome. These results shed light on chlorarachniophyte nucleomorph chromatin structure and provide a framework for future mechanistic studies of transcriptional and regulatory biology in nucleomorphs.

\section*{Results}

\subsection*{Chromatin accessibility in nucleomorphs}

To study the chromatin structure of the \textit{B. natans} nucleomorph genome, we carried out ATAC-seq experiments in \textit{B. natans} grown under standard conditions (see Methods). As \textit{B. natans} has four different genomic compartments (Figure \ref{Fig1}A) -- nucleus, nucleomorph, mitochondrion and plastid -- we first examined the fragment length distribution in each (Figure \ref{Fig1}B). The nucleus exhibits a subnucleosomal peak at $\sim$100 bp as well as a second, most likely nucleosomal, peak (or a ``shoulder'' in the curve) at $\sim$200 bp. In contrast, the nucleomorph displays two peaks, one at $\leq$100 bp and another at $\sim$220 bp, which we tentatively interpret as a subnucleosomal and a nucleosomal peak, respectively (see further below for details). The mitochondrion and the plastid fragment length distributions are unimodal, consistent with the open DNA structure expected from these compartments.

\end{multicols}

\renewcommand{\footnotesize}{\fontsize{10pt}{12pt}\selectfont}

\begin{FPfigure}
\begin{center}
\includegraphics[width=17cm]{Fig1-ATAC-V3.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf The chromatin accessibility landscape of the \textit{B. natans} nuclear and nucleomorph genomes}. 
(A) Schematic outline of the different genomic compartments in a \textit{B. natans} cell.
(B) ATAC-seq fragment length distribution in the different genomic compartments.
(C) Distribution of mapped ATAC-seq reads across genomic compartments.
(D) ATAC-seq read coverage metaplot around nuclear TSSs.
(E) Snapshot of an ATAC-seq profile at a typical nuclear locus.
(F) Distribution of ATAC-seq called peaks in the nucleus relative to TSSs. The ``random'' distribution was generated by splitting the genome in 500-bp bins and taking the boundary coordinates of each bin as ``peaks''.
(G) ATAC-seq profiles around all nuclear genes.
(H) ATAC-seq profiles over the NM1, NM2 and NM3 nucleomorph chromosomes.
(I) ATAC-seq read coverage metaplot around nucleomorph TSSs.
(J) ATAC-seq profiles around all nucleomorph genes.
(K) The nucleomorph genome is $\sim$10$\times$ enriched in ATAC-seq datasets relative to the nuclear genome. Shown is the ratio of normalized mapped ATAC-seq peaks for each of the compartments relative to the normalized mapped reads in an input sample (a Hi-C dataset mapped in a single-end format).
(L) Nucleomorph accessibility is comparable to the accessibility of rDNA loci in the budding yeast \textit{S. cerevisiae}, which exist in a fully nucleosome-free conformation when expressed.}
\label{Fig1}
\end{FPfigure}
\begin{multicols}{2}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{Fig4-nucleosome-positioning-V2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Nucleosome positioning in the \textit{B. natans} nuclear and nucleomorph genomes}. 
% (A) ATAC-seq read coverage metaplot around nucleomorph TSSs.
% (B) ATAC-seq profiles around all nucleomorph genes.
(A) Location of positioned nucleosomes (determined by NucleoATAC) relative to annotated TSSs in the \textit{B. natans} nucleus (shown are dyad positions extended by $\pm$5 bp).
(B) V-plot of ATAC-seq fragment distribution around positioned nucleosomes in the nucleus.
(C) Location of positioned nucleosomes (determined by NucleoATAC) relative to annotated TSSs in the \textit{B. natans} nucleomorph (shown are dyad positions extended by $\pm$5 bp).
(D) V-plot of ATAC-seq fragment distribution around positioned nucleosomes in the nucleomorph.
}
\label{Fig4}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{Fig2-KAS-V2.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf The active transcription landscape of the \textit{B. natans} nuclear and nucleomorph genomes as measured by KAS-seq}. 
(A) KAS-seq and ATAC-seq profiles at a typical nuclear locus.
(B) KAS-seq profiles over the top 10,000 (by KAS signal) nuclear genes.
(C) KAS-seq profiles over the NM1, NM2 and NM3 nucleomorph chromosomes.
(D) Average KAS-seq profile over nuclear gene TSSs.
(E) Average KAS-seq profile over nucleomorph TSSs.
(F) Relative enrichment of KAS-seq signal in the different \textit{B. natans} genomic compartments. Shown is the ratio of normalized mapped KAS-seq peaks for each of the compartments relative to the normalized mapped reads in an input sample (a Hi-C dataset mapped in a single-end format).
}
\label{Fig2}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=18.5cm]{Fig3-Hi-C.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Three-dimensional organization of \textit{B. natans} nucleomorph chromosomes}. 
(A) Hi-C maps (5-kbp resolution) of the three NM chromosomes reveal a network of telomere-to-telomere interactions as the main 3D organizational feature of the nucleomorph.
(B) High-resolution (1-kbp) maps of the individual NM chromosomes.
(C) and (D) Global scaffolding of the \textit{B. natans} genome.
(E) and (F) The \textit{B. natans} mitochondrion interacts physically more often with the endosymbiont compartments than with the nucleus.
}
\label{Fig3}
\end{figure*}

We then examined the distribution of reads across the compartments (Figure \ref{Fig1}C). As expected from the lack of nucleosomal protection over mitochondrial and plastid DNA, \textit{B. natans} ATAC libraries are dominated by reads mapping to those compartments. Curiously, however, nucleomorph-mapping reads represent a much larger fraction of mapped reads than expected from the fraction of total genomic sequence that the nucleomorph genome comprises, and also relative to what is seen in input samples, suggesting that the nucleomorph might exist in a preferentially accessible chromatin state.
%, even when compared to the genomes of other endosymbiont genomes.

We next turned our attention to ATAC-seq profiles in the nucleus, both to characterize accessibility in the \textit{B. natans} host genome, and to verify the quality of the ATAC-seq libraries. Figure \ref{Fig1}D shows the average ATAC-seq signal over annotated \textit{B. natans} TSSs; it is enriched over promoters, as expected from successful ATAC-seq experiments (we note that the shape of the metaplot is somewhat distorted by the fact that available annotations do not actually include the real TSSs, but only the sites of translation initiation, with most 5' UTRs missing).
Examination of browser tracks confirmed the enrichment over TSSs (Figure \ref{Fig1}E), and did not reveal obvious open chromatin sites outside promoters. We carried out peak calling using MACS2\cite{MACS2}, and the distribution of called peaks was also strongly centered on promoters, with almost no open chromatin regions outside the $\pm$2 kbp range around TSSs (Figure \ref{Fig1}F). Thus \textit{B. natans} appears to have a functional genomic organization typical of a eukaryote with a small compact genome such as yeast, with all regulatory elements located immediately adjacent to TSSs, and few to no distal regulatory elements that exhibit increased accessibility. In addition, in standard \textit{B. natans} culture conditions, the majority of promoters exhibit an open chromatin configuration (Figure \ref{Fig1}G).

Genome browser examination of ATAC-seq profiles over the nucleomorph genome (Figure \ref{Fig1}H) showed high levels of chromatin accessibility throughout all chromosomes, with numerous localized peaks and generally increased accessibility over the rDNA located near telomeres. Strikingly, the average ATAC-seq profile over nucleomorph TSSs (Figure \ref{Fig1}I) showed a strong increase in accessibility around the TSS, but also a clear signature of multiple positioned nucleosomes around each TSS (a clear +1 nucleosome immediately downstream of the TSS, as well as a putative +2 one, together with a --1 nucleosome upstream of the TSS).
This phasing is also clearly visible from the individual ATAC-seq profiles over each nucleomorph gene (Figure \ref{Fig1}J). 

We then quantified the extent of increased accessibility over the nucleomorph genome by calculating the enrichment of ATAC-seq signal relative to the total DNA mass as measured by an input sample. We find that the nucleomorph is $\sim$10$\times$ enriched in ATAC-seq libraries (Figure \ref{Fig1}K). Notably, this enrichment is comparable to what is observed for rDNA genes in the budding yeast \textit{S. cerevisiae} (Figure \ref{Fig1}L),
which are known to exist in an almost fully nucleosome-free configuration when actively transcribed, a state they are thought to occupy $\sim$50\% of the time\cite{Jones2007,Merz2008,Conconi1989,Zhipony2020}.
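
As a sketch of how such an enrichment can be computed, and assuming that each library is normalized by its total number of mapped reads (an assumption made here for illustration; the symbols $A_c$, $I_c$, and $E_c$ are likewise introduced only for this purpose), the enrichment of compartment $c$ can be written as
\begin{equation*}
E_c = \frac{A_c / \sum_{k} A_k}{I_c / \sum_{k} I_k},
\end{equation*}
where $A_c$ and $I_c$ are the numbers of ATAC-seq and input reads mapping to compartment $c$, and the sums run over all compartments. Under this definition, $E_c = 1$ corresponds to a compartment that is recovered in ATAC-seq libraries in proportion to its DNA mass, and values above 1 indicate preferential recovery.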

Thus the nucleomorph apparently exists in a highly accessible state.
Of note, this estimate is not driven by the rDNA genes within it. Although the rDNA arrays are indeed more accessible than the rest of the nucleomorph genome, the difference in accessibility is only on the order of $\sim$2$\times$, and the arrays occupy a minor ($\sim$11\%) portion of the genome.
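
A back-of-the-envelope calculation illustrates this point, under the simplifying assumptions that the two approximate figures above can be taken at face value and that ATAC-seq signal adds up linearly along the genome (the symbols below are introduced here only for illustration). If the rDNA arrays occupy a fraction $f \approx 0.11$ of the nucleomorph genome and are twice as accessible as the remainder, whose enrichment we denote $a$, then the overall nucleomorph enrichment is
\begin{equation*}
E_{\mathrm{NM}} = 2af + a(1-f) = a(1+f) \approx 1.11\,a .
\end{equation*}
With $E_{\mathrm{NM}} \approx 10$, this gives $a \approx 9$; that is, under these assumptions the rDNA arrays raise the genome-wide value by only about 11\%.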

However, nucleomorph TSSs show very strong nucleosome positioning. To analyze nucleosome positioning more accurately in both the nuclear and the nucleomorph compartments, we applied the NucleoATAC algorithm\cite{Schep2015} over the whole nucleomorph genome and over the 1-kb regions centered on annotated 5' gene ends in the nucleus. We identified 7,251 and 1,440 positioned nucleosomes in the nucleus and in the nucleomorph, respectively. The distribution of the nuclear nucleosomes peaked shortly downstream of TSSs (Figure \ref{Fig4}A), suggesting that nuclear TSSs are also associated with a positioned +1 nucleosome. A V-plot\cite{Henikoff2011} analysis showed that the ATAC-seq fragment lengths associated with these nucleosomes are in the 175--200 bp range, and that subnucleosomal fragments are located in the immediate vicinity (Figure \ref{Fig4}B). In contrast, in the nucleomorph we observe the three positioned nucleosomes (+1, +2, and --1) indicated above (Figure \ref{Fig4}C), but ATAC-seq fragment lengths associated with these nucleosomes are larger, in the 200--225 bp range (Figure \ref{Fig4}D).

\subsection*{Transcriptional activity in the nucleomorph genome}

Next, we studied the patterns of active transcription in the nucleomorph. To this end, we deployed the KAS-seq assay\cite{KAS}, which maps single-stranded DNA (ssDNA) by specifically labeling unpaired guanines with N$_3$-kethoxal, to which biotin can then be attached using click chemistry, allowing regions containing ssDNA to be specifically enriched. Most ssDNA in the cell is associated with RNA polymerase bubbles\cite{KAS}; KAS-seq is thus a good proxy for active transcription.

In the \textit{B. natans} nucleus, KAS-seq shows enrichment over promoters and over actively transcribed genes (Figure \ref{Fig2}A-B), as expected based on patterns observed in other eukaryotes\cite{KAS}, indicative of RNA polymerase spending more time near the TSS. However, we observe only a very weak correlation between promoter accessibility and active transcription (Supplementary Figure \ref{FigS1}), suggesting significant decoupling between the opening of nucleosome-depleted promoter-proximal regions and the regulation of active transcription in \textit{B. natans}.

In the nucleomorph, we see largely uniform levels of KAS-seq signal, with the exception of the rDNA genes and three localized peaks, one on the first and two on the second nucleomorph chromosome (Figure \ref{Fig2}C-E). The increased transcription of the rDNA genes is consistent with their higher accessibility observed in ATAC-seq data. We quantified the overall enrichment of active transcription in the different compartments relative to an input sample and found that the nucleomorph is $\sim$2-fold more enriched in KAS-seq datasets than the nucleus (Figure \ref{Fig2}F).

These observations, based on measuring actual active transcription, corroborate previous reports, based on transcriptomic analysis, of high, pervasive, and largely uniform transcriptional activity over most of the nucleomorph\cite{Tanifuji2014b,Suzuki2016,Sanita2017,Rangsrikitphoti2019}. However, rDNA genes were removed in some of these analyses\cite{Tanifuji2014b}, while we identify them as a transcriptional unit existing in a distinct state from the rest of the nucleomorph genome (in the analysis presented here, multimapping reads were retained and normalized, allowing us to measure accessibility and transcription levels over the rDNA genes; see the Methods section for more details).

\subsection*{Three-dimensional organization of the \textit{B. natans} nucleomorph genome}

Finally, we mapped the three-dimensional genome organization in \textit{B. natans} using \textit{in situ} chromosomal conformation capture (Hi-C\cite{Rao2014}). We employed a modified protocol for the highly AT-rich nucleomorph genomes (see Methods for details) and generated high-resolution 1-kbp maps, which allow us to investigate the fine features of the small nucleomorph chromosomes.

Hi-C maps reveal that the nucleomorph chromosomes often exist in a folded conformation, in which the two chromosome ends contact each other (Figure \ref{Fig3}A-B). In addition, the telomeric regions of all nucleomorph chromosomes physically associate with each other, forming a telomeric network of interactions (Figure \ref{Fig3}A). In many eukaryotes, a centromeric interaction network is also observed\cite{Hoencamp2021}, but enriched interchromosomal interactions in nucleomorphs appear to be only telomeric. We do not observe much internal structure inside individual nucleomorph chromosomes, with the exception of NM2, in which one potential loop interaction is seen; its mechanistic origins are currently unclear as its singular nature prevents the identification of sequence drivers of its formation. 

We also used our Hi-C data to generate a chromosome-level scaffolding\cite{Dudchenko2017} of the existing assembly of the \textit{B. natans} nuclear genome\cite{Curtis2012}, which originally consisted of 302 nuclear contigs. Our chromosome-level assembly identifies 79 pseudochromosomes; the smallest is $\sim$350 kbp,  and the largest is $\sim$3 Mbp. This assembly retains only 18 smaller unplaced contigs, the largest being 8,753 bp (Figure \ref{Fig3}D).

We made one surprising observation when manually finalizing the chromosome-level assembly: the mitochondrion, although topologically derived from the host (that is, it resides in the host cytoplasm, outside the additional membranes that enclose the plastid and the nucleomorph; Figure \ref{Fig1}A), exhibits frequent Hi-C interactions with both the plastid and the nucleomorph chromosomes. These interactions can be seen in the maps themselves (Figure \ref{Fig3}E) and were confirmed by a systematic analysis of chrM interactions with all other chromosomes (Figure \ref{Fig3}F). We also note that we obtain the same result with all available methods for normalizing Hi-C data (Supplementary Figure \ref{FigS2}). While both the plastid and the mitochondrial genomes exist in high copy numbers, the nucleomorph genome has the same copy number as the nuclear genome (as shown by our input samples, in which read coverage over the nucleomorph is the same as read coverage over the nucleus). We therefore conclude that these preferential interactions likely represent frequent physical proximity in the cell, which then leads to ligation events with nucleomorph chromosomes during the \textit{in situ} Hi-C procedure.

\section*{Discussion}

This study presents the first analysis of physical chromatin organization in a nucleomorph genome, in the chlorarachniophyte \textit{B. natans}, using a combination of ATAC-seq, Hi-C, and KAS-seq measurements. We also provide a near-complete chromosome-level scaffolding of the nuclear genome by taking advantage of the physical proximity information provided by Hi-C data, and assess the extent of physical interactions between the different genomic compartments. 


While it was previously suspected that nucleomorphs are very highly transcriptionally active, we demonstrate that this activity is also reflected at the level of chromatin structure, as nucleomorph chromosomes are much more highly accessible than those in the nucleus. Previous transcriptomic analyses also suggested pervasive, largely uniform transcription levels that do not change much between conditions\cite{Suzuki2016,Rangsrikitphoti2019,Tanifuji2014b}, and our KAS-seq measurements of active transcription under standard growth conditions show the same pervasive and largely uniform pattern,
with the notable exception of the rDNA genes, which are much more strongly transcribed than the rest of the nucleomorph (and also exhibit elevated accessibility). Taken together, these results suggest the possibility of limited transcriptional regulation in the nucleomorph. However, nucleomorph promoters exhibit a very prominent upstream nucleosome-depleted region and a strong degree of nucleosome positioning. How this promoter architecture is generated by sequence elements associated with each promoter is at present not known. It also remains unclear whether these elements merely indicate the location of transcription initiation or whether sequence elements with regulatory activity can influence the levels of transcription. To dissect the function of these elements, methods for the direct genetic manipulation of nucleomorphs will be needed. Surprisingly, strong nucleosome positioning at TSSs is not associated with promoter pausing by the polymerase; elucidating the mechanistic details of transcription initiation and initial nucleosome clearance will likely resolve this apparent contradiction.

The presence of strongly positioned promoter-proximal nucleosomes also suggests that nucleosomes in different locations in the nucleomorph may in fact exist in distinct chromatin states, but what these states might be, given the lack of the classical histone PTMs in the nucleomorph histones, is a mystery. There exist only limited studies of the nucleomorph proteome\cite{Hopkins2012}, and the posttranslational modifications of nucleomorph histones are yet to be studied. The difference in nucleosome protection fragment lengths between the nuclear and the nucleomorph compartments points to the possibility that the nucleomorph may also contain one or more distinct linker histones; these issues remain to be clarified in the future.

Finally, it will be instructive to compare chromatin organization across the different nucleomorph-bearing groups. Nucleomorph histones in cryptophytes are considerably closer to the conventional state of most eukaryotes, and thus determining whether these organisms too exhibit elevated accessibility, strong nucleosome positioning, and a lack of promoter polymerase pausing will be illuminating.

\section*{Methods}

\subsection*{\textit{B. natans} cell culture}

\textit{Bigelowiella natans} strain CCMP2755 starting cultures were obtained from NCMA (National Center for Marine Algae and Microbiota) and cultured in L1-Si media on a 12-h-light:12-h-dark cycle.

\subsection*{ATAC-seq experiments}

ATAC-seq experiments were performed following the omniATAC protocol\cite{omniATAC}. 

Briefly, $\sim$1M \textit{B. natans} cells were centrifuged at 1,000 $g$, then resuspended in 500 $\mu$L 1$\times$ PBS and centrifuged again. Cells were then resuspended in 50 $\mu$L ATAC-RSB-Lysis buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl$_2$, 0.1\% IGEPAL CA-630, 0.1\% Tween-20, 0.01\% Digitonin) and incubated on ice for 3 minutes. Subsequently, 1 mL ATAC-RSB-Wash buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl$_2$, 0.1\% Tween-20, 0.01\% Digitonin) was added, the tubes were inverted several times, and nuclei were centrifuged at 500 $g$ for 5 min at $4\,^{\circ}\mathrm{C}$.

Transposition was carried out by resuspending nuclei in a mix of 25 $\mu$L 2$\times$ TD buffer (20 mM Tris-HCl pH 7.6, 10 mM MgCl$_2$, 20\% dimethylformamide), 2.5 $\mu$L transposase (custom produced) and 22.5 $\mu$L nuclease-free H$_2$O, and incubating at $37\,^{\circ}\mathrm{C}$ for 30 min in a Thermomixer at 1,000 rpm.
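
As a consistency check on the volumes above (a worked calculation, assuming that the pelleted nuclei and the transposase storage buffer contribute negligibly to the buffer components), the total reaction volume and the resulting TD buffer concentration are
\begin{align*}
V_{\mathrm{total}} &= 25 + 2.5 + 22.5 = 50~\mu\mathrm{L}, \\
c_{\mathrm{TD}} &= \frac{25~\mu\mathrm{L}}{50~\mu\mathrm{L}} \cdot 2{\times} = 1{\times},
\end{align*}
which corresponds to approximately 10 mM Tris-HCl, 5 mM MgCl$_2$ and 10\% dimethylformamide in the final reaction.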

Transposed DNA was isolated using the MinElute PCR Purification Kit (Qiagen Cat\# 28004/28006), and PCR amplified as previously described\cite{omniATAC}. Libraries were purified using the MinElute kit, then sequenced on an Illumina NextSeq 550 instrument as 2$\times$36mers or as 2$\times$75mers.

% \subsection*{NOME-seq experiments}

% Briefly, 1$\times 10^6$ human GM12878 cells were washed with 1$\times$ PBS then resuspended in 200 $\mu$L ice-cold Nuclei Lysis Buffer (10 mM Tris pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1 mM EDTA, 0.5\% NP-40) and incubated on ice for 10 minutes. Nuclei were then centrifuged at 500 $g$ for 5 min at $4\,^{\circ}\mathrm{C}$, resuspended in 200 $\mu$L cold Nuclei Wash Buffer (10 mM Tris pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1 mM EDTA), and centrifuged again at 500 $g$ for 5 min at $4\,^{\circ}\mathrm{C}$. Finally, nuclei were resuspended in 200 $\mu$L reaction buffer (1$\times$ NEB CutSmart buffer, 0.3 M sucrose). Nuclei were then treated with EcoGII by adding 200 units of EcoGII (NEB) and SAM at 0.6 mM, and incubating at $37\,^{\circ}\mathrm{C}$ for 10 min. The reaction was stopped by adding 0.2\% SDS, and HMW DNA was immediately isolated as previously described.

\subsection*{KAS-seq experiments}

KAS-seq experiments were performed as previously described\cite{KAS} with some modifications. 

\textit{B. natans} cells were pelleted by centrifugation at 1000 $g$ for 5 minutes at room temperature, then resuspended in 500 $\mu$L of media supplemented with 5 mM N$_3$-kethoxal (final concentration). Cells were incubated at room temperature for 10 minutes, then centrifuged at 1000 $g$ for 5 minutes at room temperature to remove the media with the kethoxal, and resuspended in 100 $\mu$L cold 1$\times$ PBS. Genomic DNA was then extracted using the Monarch gDNA Purification Kit (NEB T3010S) following the standard protocol but with elution using 85 $\mu$L 25 mM K$_3$BO$_3$ at pH 7.0.

The click reaction was carried out by combining 87.5 $\mu$L purified and sheared DNA, 2.5 $\mu$L 20 mM DBCO-PEG4-biotin (DMSO solution, Sigma 760749), and 10 $\mu$L 10$\times$ PBS in a final volume of 100 $\mu$L. The reaction was incubated at 37$\,^{\circ}\mathrm{C}$ for 90 minutes.
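
As a worked check of the reaction composition stated above (simple dilution arithmetic, assuming that the DNA solution itself contributes no biotin reagent or PBS), the volumes sum to the stated final volume and give the following final concentrations:
\begin{align*}
V_{\mathrm{total}} &= 87.5 + 2.5 + 10 = 100~\mu\mathrm{L}, \\
c_{\mathrm{DBCO}} &= 20~\mathrm{mM} \cdot \frac{2.5~\mu\mathrm{L}}{100~\mu\mathrm{L}} = 0.5~\mathrm{mM}, \\
c_{\mathrm{PBS}} &= 10{\times} \cdot \frac{10~\mu\mathrm{L}}{100~\mu\mathrm{L}} = 1{\times}.
\end{align*}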

DNA was purified using AMPure XP beads (50 $\mu$L for a 100 $\mu$L reaction or 100 $\mu$L for a 200 $\mu$L reaction); beads were washed twice with 80\% EtOH on a magnetic stand, and DNA was eluted in 130 $\mu$L 25 mM K$_3$BO$_3$.

Purified DNA was then sheared on a Covaris E220 instrument to a size of $\sim$150--400 bp.

For streptavidin pulldown of biotin-labeled DNA, 10 $\mu$L of 10 mg/mL Dynabeads MyOne Streptavidin T1 beads (Life Technologies, 65602) were separated on a magnetic stand, then washed with 300 $\mu$L of 1$\times$ TWB (Tween Washing Buffer; 5 mM Tris-HCl pH 7.5; 0.5 mM EDTA; 1 M NaCl; 0.05\% Tween 20). The beads were resuspended in 300 $\mu$L of 2$\times$ Binding Buffer (10 mM Tris-HCl pH 7.5, 1 mM EDTA; 2 M NaCl), the sonicated DNA was added (diluted to a final volume of 300 $\mu$L if necessary), and the beads were incubated for $\geq$15 minutes at room temperature on a rotator. After separation on a magnetic stand, the beads were washed with 300 $\mu$L of 1$\times$ TWB, and heated at 55$\,^{\circ}\mathrm{C}$ in a Thermomixer with shaking for 2 minutes. After removal of the supernatant on a magnetic stand, the TWB wash and 55$\,^{\circ}\mathrm{C}$ incubation were repeated. 

Final libraries were prepared on beads using the NEBNext Ultra II DNA Library Prep Kit (NEB, $\#$E7645) as follows. End repair was carried out by resuspending beads in 50 $\mu$L 1$\times$ EB buffer, and adding 3 $\mu$L NEB Ultra End Repair Enzyme and 7 $\mu$L NEB Ultra End Repair Buffer, followed by incubation at 20$\,^{\circ}\mathrm{C}$ for 30 minutes (in a Thermomixer, with shaking at 1,000 rpm) and then at 65$\,^{\circ}\mathrm{C}$ for 30 minutes.

Adapters were ligated to DNA fragments by adding 30 $\mu$L Blunt Ligation mix, 1 $\mu$L Ligation Enhancer and 2.5 $\mu$L NEB Adapter, incubating at 20$\,^{\circ}\mathrm{C}$ for 20 minutes, adding 3 $\mu$L USER enzyme, and incubating at 37$\,^{\circ}\mathrm{C}$ for 15 minutes (in a Thermomixer, with shaking at 1,000 rpm).

Beads were then separated on a magnetic stand, and washed with 300 $\mu$L TWB for 2 minutes at 55$\,^{\circ}\mathrm{C}$, 1000 rpm in a Thermomixer. After separation on a magnetic stand, beads were washed in 100 $\mu$L 0.1 $\times$ TE buffer, then resuspended in 15 $\mu$L 0.1 $\times$ TE buffer, and heated at 98$\,^{\circ}\mathrm{C}$ for 10 minutes. 

For PCR, 5 $\mu$L of each of the i5 and i7 NEB Next sequencing adapters were added together with 25 $\mu$L 2$\times$ NEB Ultra PCR Master Mix. PCR was carried out with a 98$\,^{\circ}\mathrm{C}$ incubation for 30 seconds and 12 cycles of 98$\,^{\circ}\mathrm{C}$ for 10 seconds, 65$\,^{\circ}\mathrm{C}$ for 30 seconds, and 72$\,^{\circ}\mathrm{C}$ for 1 minute, followed by incubation at 72$\,^{\circ}\mathrm{C}$ for 5 minutes.

Beads were separated on a magnetic stand, and the supernatant was cleaned up using 1.8$\times$ AMPure XP beads. 

Libraries were sequenced in a paired-end format on an Illumina NextSeq instrument using NextSeq 500/550 high output kits (2$\times$36 cycles).

\subsection*{Hi-C experiments}

Hi-C was carried out using the previously described \textit{in situ} procedure\cite{Marinov2021} as follows:

\textit{B. natans} cells were first crosslinked using 37\% formaldehyde (Sigma) at a final concentration of 1\% for 15 minutes at room temperature. Formaldehyde was then quenched using 2.5 M Glycine at a final concentration of 0.25 M. Cells were subsequently centrifuged at 2,000 $g$ for 5 minutes, washed once in 1$\times$ PBS, and stored at -80$\,^{\circ}\mathrm{C}$. 
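
For illustration only, the stock volumes needed to reach these final concentrations follow from a simple dilution balance. The starting volume $V$ of the cell suspension is not specified here, so the 1 mL value used below is a hypothetical example rather than the volume actually used. Adding a volume $v_{\mathrm{F}}$ of 37\% formaldehyde to reach 1\%, and then a volume $v_{\mathrm{G}}$ of 2.5 M glycine to reach 0.25 M, gives
\begin{align*}
37\, v_{\mathrm{F}} = 1 \cdot (V + v_{\mathrm{F}}) \;&\Rightarrow\; v_{\mathrm{F}} = \frac{V}{36}, \\
2.5\, v_{\mathrm{G}} = 0.25 \cdot (V + v_{\mathrm{F}} + v_{\mathrm{G}}) \;&\Rightarrow\; v_{\mathrm{G}} = \frac{V + v_{\mathrm{F}}}{9}.
\end{align*}
For $V = 1$ mL this corresponds to $v_{\mathrm{F}} \approx 27.8~\mu$L and $v_{\mathrm{G}} \approx 114~\mu$L.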

Cell lysis was initiated by incubation with 250 $\mu$L of cold Hi-C Lysis Buffer (10 mM Tris-HCl pH 8.0, 10 mM NaCl, 0.2\% Igepal CA630) on ice for 15 minutes, followed by centrifugation at 2,500 $g$ for 5 minutes, a wash with 500 $\mu$L of cold Hi-C Lysis Buffer, and centrifugation at 2,500 $g$ for 5 minutes. The pellet was then resuspended in 50 $\mu$L of 0.5\% SDS and incubated at 62$\,^{\circ}\mathrm{C}$ for 10 minutes. SDS was quenched by adding 145 $\mu$L of H$_2$O and 25 $\mu$L of 10\% Triton X-100 and incubating at 37$\,^{\circ}\mathrm{C}$ for 15 minutes.

Restriction digestion was carried out by adding 25 $\mu$L of 10$\times$ NEBuffer 2 and 100 U of the MluCI restriction enzyme (NEB, R0538) and incubating for $\geq$2 hours at 37$\,^{\circ}\mathrm{C}$ in a Thermomixer at 900 rpm. The MluCI restriction enzyme was chosen as more suitable for the highly AT-rich nucleomorph genome. The reaction was then incubated at 62$\,^{\circ}\mathrm{C}$ for 20 minutes in order to inactivate the restriction enzyme. 

Fragment ends were filled in by adding 37.5 $\mu$L of 0.4 mM biotin-14-dATP (ThermoFisher Scientific, $\#$ 19524-016), 1.5 $\mu$L each of 10 mM dCTP, dGTP and dTTP, and 8 $\mu$L of 5U/$\mu$L DNA Polymerase I Large (Klenow) Fragment (NEB M0210). The reaction was then incubated at 37$\,^{\circ}\mathrm{C}$ in a Thermomixer at 900 rpm for 45 minutes.

Fragment end ligation was carried out by adding 663 $\mu$L H$_2$O, 120 $\mu$L 10$\times$ NEB T4 DNA ligase buffer (NEB B0202), 100 $\mu$L of 10\% Triton X-100, 12 $\mu$L of 10 mg/mL Bovine Serum Albumin (100$\times$ BSA, NEB), 5 $\mu$L of 400 U/$\mu$L T4 DNA Ligase (NEB M0202), and incubating at room temperature for $\geq$4 hours with rotation. 

Nuclei were then pelleted by centrifugation at 2,000 $g$ for 5 minutes; the pellet was resuspended in 200 $\mu$L ChIP Elution Buffer (1\% SDS, 0.1 M NaHCO$_3$), Proteinase K was added, and the sample was incubated at 65$\,^{\circ}\mathrm{C}$ overnight to reverse crosslinks.

After addition of 600 $\mu$L 1$\times$TE buffer, DNA was sheared using a Covaris E220 instrument. DNA was then purified using the MinElute PCR Purification Kit (Qiagen $\#$28006), with elution in a total volume of 300 $\mu$L 1$\times$ EB buffer.

For streptavidin pulldown of biotin-labeled DNA, 150 $\mu$L of 10 mg/mL Dynabeads MyOne Streptavidin T1 beads (Life Technologies, 65602) were separated on a magnetic stand, then washed with 400 $\mu$L of 1$\times$ TWB (Tween Washing Buffer; 5 mM Tris-HCl pH 7.5; 0.5 mM EDTA; 1 M NaCl; 0.05\% Tween 20). The beads were resuspended in 300 $\mu$L of 2$\times$ Binding Buffer (10 mM Tris-HCl pH 7.5, 1 mM EDTA; 2 M NaCl), the sonicated DNA was added, and the beads were incubated for $\geq$15 minutes at room temperature on a rotator. After separation on a magnetic stand, the beads were washed with 600 $\mu$L of 1$\times$ TWB, and heated at 55$\,^{\circ}\mathrm{C}$ in a Thermomixer with shaking for 2 minutes. After removal of the supernatant on a magnetic stand, the TWB wash and 55$\,^{\circ}\mathrm{C}$ incubation were repeated. 

Final libraries were prepared on beads using the NEBNext Ultra II DNA Library Prep Kit (NEB, $\#$E7645) as follows. End repair was carried out by resuspending beads in 50 $\mu$L 1$\times$ EB buffer, and adding 3 $\mu$L NEB Ultra End Repair Enzyme and 7 $\mu$L NEB Ultra End Repair Buffer, followed by incubation at 20$\,^{\circ}\mathrm{C}$ for 30 minutes and then at 65$\,^{\circ}\mathrm{C}$ for 30 minutes.

Adapters were ligated to DNA fragments by adding 30 $\mu$L Blunt Ligation mix, 1 $\mu$L Ligation Enhancer and 2.5 $\mu$L NEB Adapter, incubating at 20$\,^{\circ}\mathrm{C}$ for 20 minutes, adding 3 $\mu$L USER enzyme, and incubating at 37$\,^{\circ}\mathrm{C}$ for 15 minutes. 

Beads were then separated on a magnetic stand, and washed with 600 $\mu$L TWB for 2 minutes at 55$\,^{\circ}\mathrm{C}$, 1000 rpm in a Thermomixer. After separation on a magnetic stand, beads were washed in 100 $\mu$L 0.1 $\times$ TE buffer, then resuspended in 16 $\mu$L 0.1 $\times$ TE buffer, and heated at 98$\,^{\circ}\mathrm{C}$ for 10 minutes. 

For PCR, 5 $\mu$L of each of the i5 and i7 NEB Next sequencing adapters were added together with 25 $\mu$L 2$\times$ NEB Ultra PCR Master Mix. PCR was carried out with a 98$\,^{\circ}\mathrm{C}$ incubation for 30 seconds and 12 cycles of 98$\,^{\circ}\mathrm{C}$ for 10 seconds, 65$\,^{\circ}\mathrm{C}$ for 30 seconds, and 72$\,^{\circ}\mathrm{C}$ for 1 minute, followed by incubation at 72$\,^{\circ}\mathrm{C}$ for 5 minutes.

Beads were separated on a magnetic stand, and the supernatant was cleaned up using 1.8$\times$ AMPure XP beads. 

Libraries were sequenced in a paired-end format on an Illumina NextSeq instrument using NextSeq 500/550 high output kits (either 2$\times$75 or 2$\times$36 cycles).

\subsection*{ATAC-seq data processing}

Demultiplexed FASTQ files were mapped to the \verb|v1.0| assembly for \textit{Bigelowiella natans} CCMP2755 (with the nucleomorph sequence added) as 2$\times$36mers using Bowtie\cite{Bowtie2009} with the following settings: \verb|-v 2| \verb|-k 2| \verb|-m 1| \verb|--best| \verb|--strata| \verb|-X 1000|. Duplicate reads were removed using \verb|picard|\verb|-tools| (version 1.99). Reads mapping to the plastid, mitochondrion and the nucleomorph were filtered out for the analysis of accessibility in the nuclear genome.

Browser track generation, fragment length estimation, TSS enrichment calculations, and other analyses were carried out using custom-written Python scripts (\burl{https://github.com/georgimarinov/GeorgiScripts}).

For the purpose of the analysis of rDNA arrays in nucleomorphs, alignments were carried out with unlimited multimappers with the following settings: \verb|-v 2| \verb|-a| \verb|--best| \verb|--strata| \verb|-X 1000|. Normalization of multimappers was performed as previously described\cite{Marinov2015}.

\subsection*{ATAC-seq peak calling}

Peak calling was carried out using version 2.1.0 of MACS2\cite{MACS2} with default settings.

\subsection*{Analysis of positioned nucleosomes}

Positioned nucleosomes along the whole nucleomorph genome and in the $\pm$500 bp regions around annotated TSSs in the nucleus were identified using NucleoATAC\cite{Schep2015} as follows. We used the low-resolution nucleosome calling program \verb|nucleoatac occ| with default parameters, which requires ATAC-seq data and genomic windows of interest and returns a list of nucleosome positions based on the distribution of ATAC-seq fragment lengths centered at these positions. To cover the whole nucleomorph genome, sliding windows of 1 kbp in steps of 500 bp were taken as inputs, and redundant nucleosome positions were subsequently discarded. For nuclear TSSs, 1-kbp windows centered at the TSSs were used as inputs. V plots were made by aggregating uniquely mapping ATAC-seq reads centered around the positioned nucleosomes, and mapping the density of fragment sizes versus fragment center locations relative to the positioned nucleosomes as previously described\cite{Henikoff2011,Schep2015}.
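
To make the sliding-window scheme concrete, the following is a sketch of the window arithmetic only. It assumes that windows start at the first base of each chromosome and that only full-length windows are counted; how a trailing partial window is handled is not specified here. For a chromosome of length $L$ bp, window size $w = 1000$ bp and step $s = 500$ bp, the number of windows is
\begin{equation*}
N = \left\lfloor \frac{L - w}{s} \right\rfloor + 1 \qquad (L \geq w),
\end{equation*}
so that a hypothetical 10-kbp sequence would yield $N = \lfloor 9000/500 \rfloor + 1 = 19$ windows. Because $s = w/2$, every position away from the sequence ends falls within exactly two windows, which is why the same nucleosome can be called twice and redundant calls need to be removed.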

\subsection*{KAS-seq data processing}

Demultiplexed FASTQ files were mapped to the \verb|v1.0| assembly for \textit{Bigelowiella natans} CCMP2755 (with the nucleomorph sequence added) as 2$\times$36mers using Bowtie\cite{Bowtie2009} with the following settings: \verb|-v 2| \verb|-k 2| \verb|-m 1| \verb|--best| \verb|--strata| \verb|-X 1000|. Duplicate reads were removed using \verb|picard|\verb|-tools| (version 1.99).

Browser track generation, fragment length estimation, TSS enrichment calculations, and other analyses were carried out using custom-written Python scripts (\burl{https://github.com/georgimarinov/GeorgiScripts}).

For the analysis of rDNA arrays in nucleomorphs, alignments were carried out with unlimited multimappers with the following settings: \verb|-v 2| \verb|-a| \verb|--best| \verb|--strata| \verb|-X 1000|. Normalization of multimappers was performed as previously described\cite{Marinov2015}.

\subsection*{Hi-C data processing and assembly scaffolding}

As an initial step, Hi-C sequencing reads were processed against the previously published \textit{B. natans} assembly\cite{Curtis2012} using the Juicer pipeline\cite{Durand2016a} for analyzing Hi-C datasets (version 1.8.9 of Juicer Tools). 

The resulting Hi-C matrices were then used as input to the 3D DNA pipeline\cite{Dudchenko2017} for automated scaffolding with the following parameters: \verb|--editor-coarse-resolution| \verb|5000| \verb|--editor-coarse-region| \verb|5000| \verb|--polisher-input-size| \verb|100000| \verb|--polisher-coarse-resolution| \verb|1000| \newline \verb|--polisher-coarse-region| \verb|300000| \newline  \verb|--splitter-input-size| \verb|100000| \newline  \verb|--splitter-coarse-resolution| \verb|5000| \newline \verb|--splitter-coarse-region| \verb|300000| \verb|--sort-output| \verb|--build-gapped-map| \verb|-r 10| \verb|-i 5000|.

Manual correction of obvious assembly and scaffolding errors was then carried out using Juicebox\cite{Durand2016a}.

After finalizing the scaffolding, Hi-C reads were reprocessed against the new assembly using the Juicer pipeline. 

\section*{Author contributions}

G.K.M. conceptualized the study, performed cell culture, ATAC-seq, KAS-seq and Hi-C experiments and analyzed data. X.C. carried out nucleosome positioning analysis. T.W. and C.H. provided key reagents. W.J.G., A.K., and A.R.G. supervised the study. G.K.M. wrote the manuscript with input from all authors. 

\section*{Acknowledgements}

This work was supported by NIH grants (P50HG007735, R01 HG008140, U19AI057266 and UM1HG009442 to W.J.G., 1UM1HG009436 to W.J.G. and A.K., 1DP2OD022870-01 and 1U01HG009431 to A.K.), the Rita Allen Foundation (to W.J.G.), the Baxter Foundation Faculty Scholar Grant, and the Human Frontier Science Program grant RGY006S (to W.J.G.). W.J.G. is a Chan Zuckerberg Biohub investigator and acknowledges grants 2017-174468 and 2018-182817 from the Chan Zuckerberg Initiative. Fellowship support was provided by the Stanford School of Medicine Dean's Fellowship (G.K.M.). This work is also supported by NSF-IOS EDGE Award 1645164 to A.R.G. and Carnegie Venture grant 10907.

The authors would like to thank Alexandro E. Trevino and members of the Greenleaf, Kundaje, Grossman and Pringle laboratories for helpful discussion and suggestions regarding this work.

\section*{Data Availability}

Data associated with this manuscript have been submitted to GEO under accession number \hl{XXXX}

\section*{Code Availability}

Custom code used to process the data is available at \burl{https://github.com/georgimarinov/GeorgiScripts} and \burl{https://github.com/chenxy19/nucleomorph}.

\section*{Competing Interests}

The authors declare no competing interests.

\begin{thebibliography}{100}

% \section*{References}

\input{references}

\end{thebibliography}

\end{multicols}

\clearpage

\setcounter{table}{0}
\renewcommand{\tablename}{Supplementary Table}
\setcounter{figure}{0}
\renewcommand{\figurename}{Supplementary Figure}

\setcounter{page}{1}
\renewcommand\thepage{{SM }\arabic{page}}

\begin{center}
% {\LARGE \textbf{\begin{spacing}{1.1}XXXX. \\ Supplementary Materials\end{spacing} }}
{\LARGE \textbf{Supplementary Materials}}
\end{center}

% \section*{Supplementary Tables}

\section*{Supplementary Figures}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{FigS1-KAS-vs-ATAC.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Relationship between chromatin accessibility and active transcription as measured by KAS-seq in the \textit{B. natans} nuclear genome}. 
(A) Correlation between ATAC-seq signal over promoters and KAS-seq signal over promoters.
(B) Correlation between ATAC-seq signal over promoters and KAS-seq signal over gene bodies.
}
\label{FigS1}
\end{figure*}

\begin{figure*}[!ht]
\begin{center}
\includegraphics[width=15cm]{FigS2-Hi-C-norm.png}
\end{center}
\captionsetup{singlelinecheck=off,justification=justified}
\caption{
{\bf Impact of different Hi-C normalization methods on the quantification of interactions between different compartments}. 
(A) KR normalization.
(B) No normalization.
(C) Coverage normalization (VC).
(D) Coverage normalization (VC\_SQRT).
}
\label{FigS2}
\end{figure*}

\end{document}
