# WARNING: this file is not sorted!
# db id alt consensus E-value adj_p-value log_adj_p-value bin_location bin_width total_width sites_in_bin total_sites p_success p-value mult_tests
1 CCASYAGRKGGCRS MEME-1 CCASYAGRKGGCRS 2.1e-9693 4.8e-9695 -22322.00 0.0 43 207 20925 25255 0.20773 4.6e-9697 103
2 AGRKGGCR DREME-1 AGRKGGCR 1.8e-4092 4.2e-4094 -9425.36 0.0 39 213 8617 10952 0.18310 3.9e-4096 106
2 CCASYAGR DREME-2 CCASYAGR 2.2e-3249 5.1e-3251 -7484.07 0.0 45 213 7950 10222 0.21127 4.8e-3253 106
2 AGRKGGAG DREME-4 AGRGGGAG 9.3e-548 2.1e-549 -1263.37 0.0 37 213 2074 3935 0.17371 2.0e-551 106
2 BGTGGY DREME-5 KGTGGY 8.2e-973 1.9e-974 -2242.09 0.0 39 215 6447 15656 0.18140 1.7e-976 107
2 GCGCCBCC DREME-7 GCGCCBCC 1.2e-523 2.7e-525 -1207.85 0.0 33 213 1309 2068 0.15493 2.6e-527 106
2 CKBCCTCC DREME-8 CTBCCTCC 1.2e-022 2.6e-024 -54.30 0.0 31 213 746 3524 0.14554 2.5e-026 106
2 CCKCYAGG DREME-10 CCKCYAGG 3.0e-303 6.7e-305 -700.38 0.0 43 213 1317 2396 0.20188 6.4e-307 106
2 CTGYAGK DREME-11 CTGCAGK 2.7e-165 6.1e-167 -382.72 0.0 66 214 2621 5345 0.30841 5.8e-169 106
2 GTCRCTGY DREME-12 GTCRCTGY 1.1e-073 2.6e-075 -171.74 0.0 53 213 538 1037 0.24883 2.4e-077 106
2 GTGGHCA DREME-14 GTGGMCA 2.8e-388 6.4e-390 -896.15 0.0 42 214 1932 3848 0.19626 6.1e-392 106
2 GAKGGYGC DREME-15 GAKGGTGC 2.2e-607 5.0e-609 -1400.67 0.0 39 213 1567 2265 0.18310 4.7e-611 106
2 RGWGACA DREME-17 RGWGACA 8.7e-035 2.0e-036 -82.21 0.0 38 214 1192 4710 0.17757 1.9e-038 106
2 RAACWGCA DREME-18 RAACWGCA 1.1e-035 2.5e-037 -84.28 0.0 71 213 903 1882 0.33333 2.4e-039 106
2 CBGCCGCC DREME-19 CBGCCGCC 2.0e-022 4.6e-024 -53.73 0.0 49 213 387 1028 0.23005 4.4e-026 106
2 GSTGCTGY DREME-22 GSTGCTGY 2.2e-082 5.1e-084 -191.80 0.0 51 213 926 2138 0.23944 4.8e-086 106
2 CAGCRTC DREME-23 CAGCATC 1.6e-006 3.6e-008 -17.14 0.0 90 214 843 1703 0.42056 3.4e-010 106
2 CCGCTAGA DREME-24 CCGCTAGA 9.6e-038 2.2e-039 -89.02 0.0 37 213 103 154 0.17371 2.1e-041 106
2 ACTGACA DREME-25 ACTGACA 1.3e-022 3.0e-024 -54.18 0.0 46 214 365 1013 0.21495 2.8e-026 106
2 CGBCTCC DREME-26 CGSCTCC 6.5e-030 1.5e-031 -70.99 0.0 36 214 614 2283 0.16822 1.4e-033 106
2 GCRGCCRC DREME-27 GCAGCCGC 1.4e-031 3.1e-033 -74.85 0.0 59 213 557 1266 0.27700 2.9e-035 106
2 AGAGGCCA DREME-29 AGAGGCCA 1.7e-029 3.8e-031 -70.05 0.0 47 213 272 623 0.22066 3.6e-033 106
2 GGACRC DREME-31 GGACRC 2.0e-028 4.5e-030 -67.57 0.0 51 215 1252 3913 0.23721 4.2e-032 107
2 GCAGTWCC DREME-32 GCAGTWCC 2.4e-044 5.4e-046 -104.23 0.0 81 213 405 601 0.38028 5.1e-048 106
2 CCACTGGR DREME-36 CCACTGGR 2.1e-099 4.8e-101 -230.99 0.0 45 213 602 1231 0.21127 4.5e-103 106
2 CTCTGCWG DREME-37 CTCTGCWG 2.3e-325 5.1e-327 -751.31 0.0 33 213 1178 2375 0.15493 4.9e-329 106
2 CDCTTCC DREME-38 CWCTTCC 2.2e0000 4.9e-002 -3.01 0.0 130 214 2338 3687 0.60748 4.8e-004 106
2 GCWCCTCC DREME-39 GCWCCTCC 1.5e-017 3.4e-019 -42.52 0.0 59 213 427 1033 0.27700 3.2e-021 106
2 ATCACMGC DREME-41 ATCACAGC 4.8e-003 1.1e-004 -9.13 0.0 105 213 220 354 0.49296 1.0e-006 106
##
# Detailed descriptions of columns in this file:
#
# db: The name of the database (file name) that contains the motif.
# id: A name for the motif that is unique in the motif database file.
# alt: An alternate name of the motif that may be provided
#   in the motif database file.
# consensus: A consensus sequence computed from the motif.
# E-value: The expected number of motifs that would have at least one
#   region as enriched for best matches to the motif as the reported region.
#   The E-value is the p-value multiplied by the number of motifs in the
#   input database(s).
# adj_p-value: The probability that any tested region would be as enriched for
#   best matches to this motif as the reported region is.
#   By default the p-value is calculated by using the one-tailed binomial
#   test on the number of sequences with a match to the motif
#   that have their best match in the reported region, corrected for
#   the number of regions and score thresholds tested.
#   The test assumes that the probability that the best match in a sequence
#   falls in the region is the region width divided by the
#   number of places a motif
#   can align in the sequence (sequence length minus motif width plus 1).
#   When CentriMo is run in discriminative mode with a negative
#   set of sequences, the p-value of a region is calculated
#   using the Fisher exact test on the
#   enrichment of best matches in the positive sequences relative
#   to the negative sequences, corrected
#   for the number of regions and score thresholds tested.
#   The test assumes that the probability that the best match (if any)
#   falls into a given region
#   is the same for all positive and negative sequences.
# log_adj_p-value: Log of adjusted p-value.
# bin_location: Location of the center of the most enriched region.
# bin_width: The width (in sequence positions) of the most enriched region.
#   A best match to the motif is counted as being in the region if the
#   center of the motif falls in the region.
# total_width: The maximal window size that can be reached for this motif:
#   rounded(sequence length - motif length + 1)/2
# sites_in_bin: The number of (positive) sequences whose best match to the motif
#   falls in the reported region.
#   Note: This number may be less than the number of
#   (positive) sequences that have a best match in the region.
#   The reason for this is that a sequence may have many matches that score
#   equally best.
#   If n matches have the best score in a sequence, 1/n is added to the
#   appropriate bin for each match.
# total_sites: The number of sequences containing a match to the motif
#   above the score threshold.
# p_success: The probability of falling in the enriched window:
#   bin width / total width
# p-value: The uncorrected p-value, before it is adjusted for the
#   number of multiple tests to give the adjusted p-value.
# mult_tests: This is the number of multiple tests (n) done for this motif.
#   It was used to correct the original p-value of a region for
#   multiple tests using the formula:
#     p' = 1 - (1-p)^n where p is the uncorrected p-value.
#   The number of multiple tests is the number of regions
#   considered times the number of score thresholds considered.
#   It depends on the motif length, sequence length, and the type of
#   optimizations being done (central enrichment, local enrichment,
#   score optimization).
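#
# A minimal sketch (not part of the CentriMo output) of how some of the
# columns described above relate to one another, using only the Python
# standard library. The row is copied from the DREME-38 entry in this file.
# The helper names (parse_row, p_success, adjust_p, log_binomial_tail) are
# hypothetical and introduced only for illustration, and whitespace-separated
# fields are assumed. The binomial tail is a plain textbook computation on the
# rounded values printed here; it is not claimed to reproduce CentriMo's
# internal implementation exactly.

```python
import math

# Column names and one data row, copied from the table in this file.
HEADER = ("db id alt consensus E-value adj_p-value log_adj_p-value bin_location "
          "bin_width total_width sites_in_bin total_sites p_success p-value "
          "mult_tests").split()
ROW = ("2 CDCTTCC DREME-38 CWCTTCC 2.2e0000 4.9e-002 -3.01 0.0 130 214 "
       "2338 3687 0.60748 4.8e-004 106").split()


def parse_row(fields):
    """Map one whitespace-split data row onto the column names (hypothetical helper)."""
    return dict(zip(HEADER, fields))


def p_success(bin_width, total_width):
    """p_success as described above: bin width / total width."""
    return bin_width / total_width


def adjust_p(p, n):
    """Multiple-test correction p' = 1 - (1 - p)^n, evaluated stably for small p."""
    return -math.expm1(n * math.log1p(-p))


def log_binomial_tail(k, n, p):
    """Natural log of P(X >= k) for X ~ Binomial(n, p), summed with log-sum-exp."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))


rec = parse_row(ROW)

# p_success recomputed from bin_width and total_width.
ps = p_success(int(rec["bin_width"]), int(rec["total_width"]))

# Adjusted p-value recomputed from the uncorrected p-value and mult_tests.
adj = adjust_p(float(rec["p-value"]), int(rec["mult_tests"]))

# One-tailed binomial tail for sites_in_bin successes out of total_sites.
tail = math.exp(
    log_binomial_tail(int(rec["sites_in_bin"]), int(rec["total_sites"]), ps)
)

print("p_success      :", round(ps, 5), "(file:", rec["p_success"] + ")")
print("adjusted p     :", adj, "(file:", rec["adj_p-value"] + ")")
print("log adjusted p :", math.log(adj), "(file:", rec["log_adj_p-value"] + ")")
print("binomial tail  :", tail, "(file:", rec["p-value"] + ")")
```

# Small differences from the printed columns are expected, because the file
# reports rounded values and sites_in_bin may include fractional 1/n counts.
# Values such as 4.8e-9695 are below the range of a double and parse to 0.0
# in Python, so for the most strongly enriched rows the log_adj_p-value
# column is the safer one to work with.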