# WARNING: this file is not sorted!
# db id alt consensus E-value adj_p-value log_adj_p-value bin_location bin_width total_width sites_in_bin total_sites p_success p-value mult_tests
1 YNRCCASYAGRKGGCRSY MEME-1 YNRCCASYAGRKGGCRSY 6.4e-8636 1.6e-8637 -19886.97 0.0 43 203 18062 21203 0.21182 1.6e-8639 101
2 AGRKGGCR DREME-1 AGRKGGCR 1.5e-3787 3.7e-3789 -8723.18 0.0 39 213 7805 9785 0.18310 3.5e-3791 106
2 CCASYAGR DREME-2 CCASYAGR 9.7e-3055 2.4e-3056 -7035.84 0.0 45 213 7243 9146 0.21127 2.2e-3058 106
2 KGTGGY DREME-4 KGTGGY 3.8e-940 9.3e-942 -2166.81 0.0 35 215 5155 12569 0.16279 8.7e-944 107
2 AGRKGGAG DREME-5 AGRGGGAG 8.9e-492 2.2e-493 -1134.39 0.0 37 213 1797 3339 0.17371 2.1e-495 106
2 KHACTGCA DREME-7 KHACTGCA 1.0e-077 2.4e-079 -181.01 0.0 77 213 1057 1815 0.36150 2.3e-081 106
2 GCGCCBCC DREME-8 GCGCCBCC 2.6e-471 6.4e-473 -1087.26 0.0 35 213 1115 1632 0.16432 6.1e-475 106
2 CKBCCTCC DREME-9 CTBCCTCC 7.5e-018 1.8e-019 -43.14 0.0 31 213 594 2800 0.14554 1.7e-021 106
2 CCKCTAGR DREME-11 CCKCTAGR 4.6e-274 1.1e-275 -633.11 0.0 39 213 847 1367 0.18310 1.0e-277 106
2 MGTGGYCA DREME-12 AGTGGYCA 2.6e-178 6.4e-180 -412.61 0.0 51 213 757 1198 0.23944 6.0e-182 106
2 ACAGYGVC DREME-13 ACAGYGVC 4.5e-241 1.1e-242 -557.14 0.0 33 213 860 1715 0.15493 1.0e-244 106
2 ACAGCABC DREME-16 ACAGCASC 1.3e-025 3.2e-027 -60.99 0.0 43 213 366 1042 0.20188 3.1e-029 106
2 CACTAGAK DREME-17 CACTAGAK 5.7e-782 1.4e-783 -1802.59 0.0 47 213 1753 2094 0.22066 1.3e-785 106
2 RGWGACA DREME-18 RGWGACA 1.6e-031 3.8e-033 -74.65 0.0 38 214 1019 3981 0.17757 3.6e-035 106
2 GYGGCCGY DREME-19 GYGGCCGC 9.8e-023 2.4e-024 -54.39 0.0 51 213 219 471 0.23944 2.2e-026 106
2 GGACRC DREME-20 GGACRC 1.8e-026 4.5e-028 -62.98 0.0 47 215 988 3247 0.21860 4.2e-030 107
2 CTGYAG DREME-21 CTGYAG 4.6e-146 1.1e-147 -338.37 0.0 61 215 2677 6071 0.28372 1.0e-149 107
2 CGCCKCC DREME-22 CGCCKCC 5.4e-050 1.3e-051 -117.16 0.0 34 214 463 1430 0.15888 1.2e-053 106
2 GGGCAGYA DREME-24 GGGCAGCA 4.1e-325 1.0e-326 -750.63 0.0 39 213 917 1401 0.18310 9.5e-329 106
2 CCGCCAGR DREME-27 CCGCCAGG 1.2e-083 3.0e-085 -194.61 0.0 47 213 323 515 0.22066 2.9e-087 106
2 GTGGWA DREME-29 GTGGAA 7.0e-083 1.7e-084 -192.88 0.0 37 215 1438 5065 0.17209 1.6e-086 107
2 GTGGCWGA DREME-30 GTGGCWGA 2.0e-051 4.8e-053 -120.48 0.0 53 213 443 902 0.24883 4.5e-055 106
2 AAACTRCA DREME-32 AAACTGCA 1.9e-006 4.6e-008 -16.90 0.0 83 213 373 744 0.38967 4.3e-010 106
2 CAGCVTC DREME-33 CAGCCTC 1.6e-007 3.9e-009 -19.36 0.0 110 214 1763 3078 0.51402 3.7e-011 106
2 GWAAACA DREME-36 GWAAACA 2.5e0000 6.1e-002 -2.80 0.0 194 214 2097 2264 0.90654 5.9e-004 106
2 AGAKGGTG DREME-37 AGAKGGTG 1.5e-150 3.6e-152 -348.71 0.0 33 213 521 1018 0.15493 3.4e-154 106
2 GAYATTGC DREME-38 GAYATTGC 8.5e-008 2.1e-009 -20.00 0.0 75 213 156 286 0.35211 1.9e-011 106
##
# Detailed descriptions of columns in this file:
#
# db: The name of the database (file name) that contains the motif.
# id: A name for the motif that is unique in the motif database file.
# alt: An alternate name of the motif that may be provided
#   in the motif database file.
# consensus: A consensus sequence computed from the motif.
# E-value: The expected number of motifs that would have at least one
#   region as enriched for best matches to the motif as the reported region.
#   The E-value is the p-value multiplied by the number of motifs in the
#   input database(s).
# adj_p-value: The probability that any tested region would be as enriched for
#   best matches to this motif as the reported region is.
#   By default the p-value is calculated by using the one-tailed binomial
#   test on the number of sequences with a match to the motif
#   that have their best match in the reported region, corrected for
#   the number of regions and score thresholds tested.
#   The test assumes that the probability that the best match in a sequence
#   falls in the region is the region width divided by the
#   number of places a motif
#   can align in the sequence (sequence length minus motif width plus 1).
#   When CentriMo is run in discriminative mode with a negative
#   set of sequences, the p-value of a region is calculated
#   using the Fisher exact test on the
#   enrichment of best matches in the positive sequences relative
#   to the negative sequences, corrected
#   for the number of regions and score thresholds tested.
#   The test assumes that the probability that the best match (if any)
#   falls into a given region
#   is the same for all positive and negative sequences.
# log_adj_p-value: Log of adjusted p-value.
# bin_location: Location of the center of the most enriched region.
# bin_width: The width (in sequence positions) of the most enriched region.
#   A best match to the motif is counted as being in the region if the
#   center of the motif falls in the region.
# total_width: The window maximal size which can be reached for this motif:
#   rounded(sequence length - motif length +1)/2
# sites_in_bin: The number of (positive) sequences whose best match to the motif
#   falls in the reported region.
#   Note: This number may be less than the number of
#   (positive) sequences that have a best match in the region.
#   The reason for this is that a sequence may have many matches that score
#   equally best.
#   If n matches have the best score in a sequence, 1/n is added to the
#   appropriate bin for each match.
# total_sites: The number of sequences containing a match to the motif
#   above the score threshold.
# p_success: The probability of falling in the enriched window:
#   bin width / total width
# p-value: The uncorrected p-value before it gets adjusted to the
#   number of multiple tests to give the adjusted p-value.
# mult_tests: This is the number of multiple tests (n) done for this motif.
#   It was used to correct the original p-value of a region for
#   multiple tests using the formula:
#     p' = 1 - (1-p)^n where p is the uncorrected p-value.
#   The number of multiple tests is the number of regions
#   considered times the number of score thresholds considered.
#   It depends on the motif length, sequence length, and the type of
#   optimizations being done (central enrichment, local enrichment,
#   score optimization).
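#
# Worked example (a sketch, not part of CentriMo's output or source code):
# the snippet below illustrates the p_success ratio and the multiple-testing
# correction p' = 1 - (1-p)^n described above. Many p-values in this file
# (for example 1.6e-8639) underflow a double-precision float, so the sketch
# works with logarithms. The helper names ln_from_sci, ln_adjusted_p and
# p_success are hypothetical, the -700 cutoff is an arbitrary choice, and the
# snippet assumes that log_adj_p-value is a natural logarithm, which is
# consistent with the rows in this file but is not stated in the descriptions.

```python
import math


def ln_from_sci(text):
    """Natural log of a number written like '1.6e-8639', without underflow."""
    mantissa, _, exponent = text.lower().partition("e")
    return math.log(float(mantissa)) + int(exponent or 0) * math.log(10.0)


def ln_adjusted_p(ln_p, n_tests):
    """Natural log of p' = 1 - (1 - p)^n, given ln(p) and n (mult_tests)."""
    if ln_p < -700.0:
        # p is too small for a double; to first order p' is n * p.
        return math.log(n_tests) + ln_p
    p = math.exp(ln_p)
    if p >= 1.0:
        return 0.0
    # expm1/log1p keep precision when p is small but still representable.
    return math.log(-math.expm1(n_tests * math.log1p(-p)))


def p_success(bin_width, total_width):
    """Probability of falling in the enriched window: bin width / total width."""
    return bin_width / total_width


if __name__ == "__main__":
    # DREME-36 row: p-value 5.9e-004, mult_tests 106, reported log_adj_p-value -2.80
    print(round(ln_adjusted_p(ln_from_sci("5.9e-004"), 106), 2))
    # MEME-1 row: bin_width 43, total_width 203, reported p_success 0.21182
    print(round(p_success(43, 203), 5))
```

# Because the p-value column is rounded to two significant digits, values
# recomputed this way can differ from the reported log_adj_p-value in the
# second decimal place.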