# WARNING: this file is not sorted!
# db	id	alt	consensus	E-value	adj_p-value	log_adj_p-value	bin_location	bin_width	total_width	sites_in_bin	total_sites	p_success	p-value	mult_tests
1	SYGCCMYCTRSWGGY	MEME-1	SYGCCMYCTRSWGGY	2.9e-5334	7.1e-5336	-12284.64	0.0	50	206	18125	25595	0.24272	6.9e-5338	102
1	ATBRCTTGAACCTGGGARGYANARGTTGCA	MEME-2	ATBRCTTGAACCTGGGARGYANARGTTGCA	8.9e-003	2.2e-004	-8.43	0.0	139	191	136	154	0.72775	2.3e-006	95
1	STGTCAGCWRGAAGCMGTYA	MEME-3	STGTCAGCWRGAAGCMGTYA	2.0e-011	4.8e-013	-28.36	0.0	47	201	144	338	0.23383	4.8e-015	100
2	AGRDGGCR	DREME-1	AGRKGGCR	4.2e-1696	1.0e-1697	-3907.47	0.0	45	213	6317	10280	0.21127	9.6e-1700	106
2	CCASYAGR	DREME-2	CCASCAGR	4.4e-1062	1.1e-1063	-2447.57	0.0	45	213	4292	7287	0.21127	1.0e-1065	106
2	GCKCCMYC	DREME-3	GCKCCCYC	1.2e-982	2.9e-984	-2264.68	0.0	51	213	4066	6382	0.23944	2.7e-986	106
2	CCTSYAG	DREME-5	CCTSCAG	4.4e-354	1.1e-355	-817.34	0.0	44	214	3606	9251	0.20561	1.0e-357	106
2	CYGCCNCC	DREME-6	CYGCCNCC	6.1e-427	1.5e-428	-985.10	0.0	39	213	2987	7360	0.18310	1.4e-430	106
2	CRBCTCC	DREME-8	CASCTCC	2.4e-073	5.9e-075	-170.92	0.0	38	214	2365	9280	0.17757	5.6e-077	106
2	CCACAHGG	DREME-10	CCACAWGG	1.7e-089	4.3e-091	-208.09	0.0	57	213	733	1390	0.26761	4.0e-093	106
2	ACAGCRVC	DREME-11	ACAGCASC	2.2e-083	5.5e-085	-194.02	0.0	51	213	1041	2480	0.23944	5.2e-087	106
2	CTACTGVC	DREME-12	CTACTGVC	2.6e-178	6.4e-180	-412.60	0.0	63	213	850	1224	0.29577	6.1e-182	106
2	GGWGACA	DREME-13	GGWGACA	4.1e-009	1.0e-010	-23.03	0.0	66	214	1025	2760	0.30841	9.4e-013	106
2	CRCCRCC	DREME-16	CRCCRCC	1.8e-203	4.3e-205	-470.57	0.0	56	214	2467	5409	0.26168	4.1e-207	106
2	CTGYAG	DREME-18	CTGYAG	5.0e-143	1.2e-144	-331.37	0.0	79	215	4431	8823	0.36744	1.1e-146	107
2	AGRGGTCR	DREME-19	AGRGGTCA	3.8e-065	9.2e-067	-152.06	0.0	57	213	634	1270	0.26761	8.7e-069	106
2	CWGTGGY	DREME-21	CWGTGGY	2.4e-059	5.7e-061	-138.71	0.0	68	214	2796	6742	0.31776	5.4e-063	106
2	GCAGCAKC	DREME-22	GCAGCAGC	2.5e-031	6.1e-033	-74.18	0.0	71	213	878	1862	0.33333	5.7e-035	106
2	CCGCYAGG	DREME-24	CCGCCAGG	8.8e-053	2.2e-054	-123.57	0.0	71	213	466	758	0.33333	2.0e-056	106
2	GTGASAAC	DREME-26	GTGASAAC	2.0e-002	4.8e-004	-7.63	0.0	165	213	378	439	0.77465	4.6e-006	106
2	CTGCCMTC	DREME-27	CTGCCCTC	6.2e-220	1.5e-221	-508.46	0.0	49	213	1285	2425	0.23005	1.4e-223	106
2	GAAGGTGR	DREME-28	GAAGGTGR	4.1e-004	9.9e-006	-11.52	0.0	91	213	498	975	0.42723	9.4e-008	106
2	CAGCBTCC	DREME-29	CAGCCTCC	1.2e-001	2.8e-003	-5.87	0.0	163	213	1783	2225	0.76526	2.7e-005	106
2	CGCCCCCA	DREME-30	CGCCCCCA	5.0e-052	1.2e-053	-121.83	0.0	55	213	379	705	0.25822	1.2e-055	106
2	AGACATTK	DREME-31	AGACATTG	4.5e-002	1.1e-003	-6.81	0.0	125	213	412	613	0.58685	1.0e-005	106
2	CAGMAGAG	DREME-33	CAGMAGAG	1.2e-178	2.9e-180	-413.40	0.0	51	213	1229	2399	0.23944	2.7e-182	106
2	GGCCTCTA	DREME-34	GGCCTCTA	2.7e-025	6.6e-027	-60.28	0.0	49	213	190	388	0.23005	6.3e-029	106
2	GGAAGCCA	DREME-35	GGAAGCCA	7.0e0000	1.7e-001	-1.77	0.0	163	213	485	594	0.76526	1.8e-003	106
2	ATGTGGY	DREME-36	ATGTGGY	2.1e-002	5.1e-004	-7.58	0.0	68	214	614	1664	0.31776	4.8e-006	106
2	AGCGCCYC	DREME-37	AGCGCCYC	1.8e-243	4.3e-245	-562.67	0.0	51	213	967	1486	0.23944	4.1e-247	106
##
# Detailed descriptions of columns in this file:
#
# db: The database (motif file) that contains the motif. In this file it is
#     identified by number: 1 for the file holding the MEME motifs, 2 for the
#     file holding the DREME motifs.
# id: A name for the motif that is unique in the motif database file.
# alt: An alternate name of the motif that may be provided in the motif
#     database file.
# consensus: A consensus sequence computed from the motif.
# E-value: The expected number of motifs that would have at least one region
#     as enriched for best matches to the motif as the reported region.
#     The E-value is the adjusted p-value multiplied by the number of motifs
#     in the input database(s).
# adj_p-value: The probability that any tested region would be as enriched
#     for best matches to this motif as the reported region is.
#     By default the p-value is calculated using the one-tailed binomial test
#     on the number of sequences with a match to the motif that have their
#     best match in the reported region, corrected for the number of regions
#     and score thresholds tested. The test assumes that the probability that
#     the best match in a sequence falls in the region is the region width
#     divided by the number of places a motif can align in the sequence
#     (sequence length minus motif width plus 1).
#     When CentriMo is run in discriminative mode with a negative set of
#     sequences, the p-value of a region is calculated using the Fisher exact
#     test on the enrichment of best matches in the positive sequences
#     relative to the negative sequences, corrected for the number of regions
#     and score thresholds tested. The test assumes that the probability that
#     the best match (if any) falls into a given region is the same for all
#     positive and negative sequences.
# log_adj_p-value: Natural logarithm of the adjusted p-value.
# bin_location: Location of the center of the most enriched region.
# bin_width: The width (in sequence positions) of the most enriched region.
#     A best match to the motif is counted as being in the region if the
#     center of the motif falls in the region.
# total_width: The maximal window size that can be reached for this motif:
#     rounded(sequence length - motif length + 1)/2
# sites_in_bin: The number of (positive) sequences whose best match to the
#     motif falls in the reported region.
#     Note: This number may be less than the number of (positive) sequences
#     that have a best match in the region, because a sequence may have
#     several matches that score equally best. If n matches have the best
#     score in a sequence, 1/n is added to the appropriate bin for each match.
# total_sites: The number of sequences containing a match to the motif above
#     the score threshold.
# p_success: The probability of falling in the enriched window:
#     bin width / total width
# p-value: The uncorrected p-value, before it is adjusted for the number of
#     multiple tests to give the adjusted p-value.
# mult_tests: The number of multiple tests (n) done for this motif. It was
#     used to correct the original p-value of a region for multiple tests
#     using the formula
#         p' = 1 - (1-p)^n
#     where p is the uncorrected p-value. The number of multiple tests is the
#     number of regions considered times the number of score thresholds
#     considered. It depends on the motif length, sequence length, and the
#     type of optimizations being done (central enrichment, local
#     enrichment, score optimization).
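The one-tailed binomial test described under adj_p-value can be sketched in Python as below. This is a minimal sketch, not CentriMo's own code: it assumes the per-sequence success probability is p_success = bin_width / total_width, treats sites_in_bin and total_sites as integers (how CentriMo handles fractional counts from tied best matches is not covered), and works in natural-log space because many p-values in this file lie far below the double-precision range. The function names are hypothetical.

```python
import math


def ln_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(X = k) for X ~ Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def ln_binom_upper_tail(k: int, n: int, p: float) -> float:
    """Natural log of the one-tailed p-value P(X >= k), via log-sum-exp."""
    terms = [ln_binom_pmf(i, n, p) for i in range(k, n + 1)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def ln_region_p_value(bin_width: int, total_width: int,
                      sites_in_bin: int, total_sites: int) -> float:
    """Uncorrected ln p-value of one region (hypothetical helper).

    Assumes each sequence's best match lands in the region with
    probability p_success = bin_width / total_width.
    """
    p_success = bin_width / total_width
    return ln_binom_upper_tail(sites_in_bin, total_sites, p_success)


# MEME-2 row: bin_width 139, total_width 191, sites_in_bin 136, total_sites 154
ln_p = ln_region_p_value(139, 191, 136, 154)
print(f"p_success = {139 / 191:.5f}, p-value = {math.exp(ln_p):.1e}")
```

For the MEME-2 row the computed p-value comes out at about 2.3e-6, in line with that row's p-value column, and the log-sum-exp form keeps working for rows such as MEME-1, where the tail probability underflows a double.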
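The multiple-test correction p' = 1 - (1-p)^n from the mult_tests description, and the E-value, can be reproduced from a row as sketched below. Values such as 6.9e-5338 underflow to 0.0 as doubles, so the sketch parses the printed mantissa and exponent separately and stays in natural-log space; the rows are consistent with log_adj_p-value being a natural log. The helper names are hypothetical, and the E-value is taken as adjusted p-value times the number of input motifs, which is what the ratios in the rows show. The motif count of 40 is an assumption inferred from the ratio of E-value to adj_p-value in the rows (about 40), not a value stated in the file.

```python
import math

LN10 = math.log(10.0)


def ln_sci(text: str) -> float:
    """Natural log of a value written like '6.9e-5338'.

    float('6.9e-5338') underflows to 0.0, so the mantissa and the
    decimal exponent are handled separately.
    """
    mantissa, exponent = text.lower().split("e")
    return math.log(float(mantissa)) + int(exponent) * LN10


def ln_adjusted_p(ln_p: float, n_tests: int) -> float:
    """Natural log of p' = 1 - (1 - p)^n, computed without underflow."""
    p = math.exp(ln_p)  # 0.0 once p is below the double-precision range
    if p < 1e-12:
        # (1 - p)^n is 1 - n*p to first order, so p' is n*p when n*p << 1.
        return math.log(n_tests) + ln_p
    # log1p/expm1 keep precision when p is small but still representable.
    return math.log(-math.expm1(n_tests * math.log1p(-p)))


def ln_evalue(ln_adj_p: float, n_motifs: int) -> float:
    """Natural log of E-value = adjusted p-value * number of input motifs."""
    return ln_adj_p + math.log(n_motifs)


# MEME-1 row: p-value 6.9e-5338, mult_tests 102, log_adj_p-value -12284.64
ln_adj_meme1 = ln_adjusted_p(ln_sci("6.9e-5338"), 102)
# MEME-2 row: p-value 2.3e-006, mult_tests 95, log_adj_p-value -8.43
ln_adj_meme2 = ln_adjusted_p(ln_sci("2.3e-006"), 95)
# Assumed motif count of 40 (see lead-in); compare with E-value 8.9e-003.
ln_e_meme2 = ln_evalue(ln_adj_meme2, 40)
print(round(ln_adj_meme1, 2), round(ln_adj_meme2, 2), round(ln_e_meme2, 2))
```

For MEME-1 the result agrees with log_adj_p-value (-12284.64) even though neither its p-value nor its adjusted p-value fits in a double; small residual differences in other rows come from the two-digit mantissas printed in the file.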
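The 1/n weighting behind sites_in_bin, and why it can be smaller than the number of sequences with a best match in the region, can be illustrated with a small hypothetical example. The input format is invented for illustration: one list per sequence holding the centers of all matches tied for the best score, measured relative to the sequence center, with an empty list for a sequence that has no match above the score threshold. Treating the region as the closed interval [bin_lo, bin_hi] is also an assumption. Exact fractions keep the 1/n contributions readable.

```python
from fractions import Fraction


def weighted_sites_in_bin(best_match_centers, bin_lo, bin_hi):
    """Weighted count of best matches whose center lies in [bin_lo, bin_hi].

    best_match_centers holds one list per sequence with the centers of all
    matches tied for that sequence's best score (hypothetical input format).
    Each of the n tied matches contributes 1/n to the count.
    """
    total = Fraction(0)
    for centers in best_match_centers:
        if not centers:
            continue  # no match above the score threshold in this sequence
        weight = Fraction(1, len(centers))
        total += weight * sum(1 for c in centers if bin_lo <= c <= bin_hi)
    return total


def total_sites(best_match_centers):
    """Number of sequences with at least one match above the threshold."""
    return sum(1 for centers in best_match_centers if centers)


# Positions are relative to the sequence center (hypothetical data).
example = [[0], [-3, 40], [10, 12, 90], []]
print(weighted_sites_in_bin(example, -25, 25), total_sites(example))
```

Here all three sequences with a match have a best match inside [-25, 25], but the weighted count is 13/6 (about 2.17), because the second and third sequences split their weight across tied matches, some of which fall outside the region.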