# Gene Expression Data Files

Generated: 2025-12-18 23:35:12.654007

## Dataset Information
- Total genes: 33355
- Total cells: 57868
- Clusters: 23 (c0, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22)
- PCW timepoints: 4 (pcw16, pcw20, pcw21, pcw24)
- Cluster×PCW combinations: 85

## Directory Structure

### 01_by_cluster/
Expression matrices with genes as rows and clusters as columns.
- mean_expression_by_cluster.txt: Mean expression per gene per cluster
- median_expression_by_cluster.txt: Median expression per gene per cluster
- pct_expressing_by_cluster.txt: % of cells expressing each gene per cluster

### 02_by_pcw/
Expression matrices with genes as rows and PCW timepoints as columns.
- mean_expression_by_pcw.txt: Mean expression per gene per PCW
- median_expression_by_pcw.txt: Median expression per gene per PCW
- pct_expressing_by_pcw.txt: % of cells expressing each gene per PCW

### 03_by_cluster_and_pcw/
Combined cluster×PCW analysis.
- mean_expression_cluster_pcw.txt: Mean expression per gene for each cluster×PCW combo
- pct_expressing_cluster_pcw.txt: % expressing for each cluster×PCW combo
- metadata_cluster_pcw.txt: Information about each combination (cell counts)

### 04_individual_pcw/
Separate files for each PCW timepoint showing cluster breakdowns.
- pcw16_mean_by_cluster.txt, pcw16_pct_by_cluster.txt
- pcw20_mean_by_cluster.txt, pcw20_pct_by_cluster.txt
- pcw21_mean_by_cluster.txt, pcw21_pct_by_cluster.txt
- pcw24_mean_by_cluster.txt, pcw24_pct_by_cluster.txt

### 05_gene_annotations/
Gene-level metadata and annotations.
- gene_metadata.txt: Overall statistics for each gene
- highly_variable_genes.txt: Top 2000 most variable genes

### 06_summary/
Summary statistics and top gene lists.
- cell_counts_per_condition.txt: Number of cells in each cluster×PCW combo
- top50_genes_per_cluster.txt: Top 50 genes for each cluster
- top50_genes_per_pcw.txt: Top 50 genes for each PCW timepoint

## File Format
All files are tab-delimited text (.txt) with:
- Header row with column names
- First column: gene_id (Ensembl ID)
- No quotes around strings
- Missing values: represented as 0

## Usage Examples

### R
```R
# Load cluster expression data (genes as rows, clusters as columns)
cluster_expr <- read.table('01_by_cluster/mean_expression_by_cluster.txt',
                           header=TRUE, row.names=1, sep='\t')

# Get expression of a specific gene by its Ensembl ID
gene_expr <- cluster_expr['ENSG00000133703', ]

# Find top genes in cluster c15
top_c15 <- cluster_expr[order(cluster_expr$c15, decreasing=TRUE), ]
head(top_c15, 20)
```

### Python (pandas)
```python
import pandas as pd

# Load cluster expression data (genes as rows, clusters as columns)
cluster_expr = pd.read_table('01_by_cluster/mean_expression_by_cluster.txt',
                             index_col=0)

# Get expression of a specific gene by its Ensembl ID
gene_expr = cluster_expr.loc['ENSG00000133703']

# Find top genes in cluster c15
top_c15 = cluster_expr.sort_values('c15', ascending=False)
print(top_c15.head(20))
```

## Notes
- Gene IDs are Ensembl gene IDs (format: ENSG followed by 11 digits, e.g. ENSG00000133703)
- Expression values are normalized (log-transformed)
- A mean expression of 0 means the gene was not detected in any cell of that group; a median of 0 only means the gene was undetected in at least half of the cells
- Percentage expressing: % of cells with expression > 0
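
Because the mean and percent-expressing matrices share the same genes-by-clusters layout, they can be combined, for example to rank genes per cluster while ignoring genes detected in only a few cells. The following is a minimal sketch, not the procedure used to build the top50 files. It assumes percentages are on a 0-100 scale (not confirmed here); the function name `top_genes_per_cluster`, the `min_pct` threshold, and the toy gene IDs and values are hypothetical.

```python
import io

import pandas as pd


def top_genes_per_cluster(mean_expr, pct_expr, n=5, min_pct=10.0):
    """Return {cluster: [gene_id, ...]} ranked by mean expression.

    A gene is kept for a cluster only if at least `min_pct` percent of
    that cluster's cells express it (assumes a 0-100 scale).
    """
    # Restrict both tables to the genes and clusters they have in common.
    mean_expr, pct_expr = mean_expr.align(pct_expr, join="inner")
    result = {}
    for cluster in mean_expr.columns:
        passing = pct_expr[cluster] >= min_pct
        ranked = mean_expr.loc[passing, cluster].sort_values(ascending=False)
        result[cluster] = ranked.head(n).index.tolist()
    return result


# Toy stand-ins for the two tab-delimited files (hypothetical IDs and values).
toy_mean = pd.read_table(io.StringIO(
    "gene_id\tc0\tc1\n"
    "ENSG00000000001\t2.0\t0.1\n"
    "ENSG00000000002\t0.5\t1.5\n"
    "ENSG00000000003\t3.0\t0.0\n"
), index_col=0)
toy_pct = pd.read_table(io.StringIO(
    "gene_id\tc0\tc1\n"
    "ENSG00000000001\t60.0\t5.0\n"
    "ENSG00000000002\t20.0\t80.0\n"
    "ENSG00000000003\t4.0\t0.0\n"
), index_col=0)

top = top_genes_per_cluster(toy_mean, toy_pct, n=2, min_pct=10.0)
print(top)
```

With the real data, the toy tables would be replaced by `pd.read_table(..., index_col=0)` calls on 01_by_cluster/mean_expression_by_cluster.txt and 01_by_cluster/pct_expressing_by_cluster.txt. If the percentages turn out to be stored as fractions, `min_pct` needs to be scaled accordingly.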