Help file for darwin created by darwin on linneus78 on Tue Feb 19 10:54:00 2013. Do not edit this file; it is built automatically by: make darwinhelp
AAAToInt Function AAAToInt - convert a 3 letter amino acid code to an integer Option: builtin Calling Sequence: AAAToInt(aa) Parameters: Name Type Description ----------------------------------------------------- aa string three-letter amino acid abbreviation Returns: 1..20 Synopsis: This function converts a three-letter abbreviation for an amino acid to a posint between 1..20 according to the standard ordering of amino acids (see ?aminoacids). Examples: > AAAToInt('Val'); 20 See Also: ?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt ?AminoToInt ?BToInt ?CodonCode ?IntToAAA ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToAmino ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB ?AToInt ?CIntToAmino ?CodonToInt ?IntToBase ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
AC Class AC - Data structure for storing ACs (Accession numbers) of DB Template: AC(id) Fields: Name Type Description ------------------------------------------------------------------------ id {list,string,structure} ID(s) of Entries in the database DB PatEntry, Match or Entry data structure Returns: AC Methods: AC_type Entry Sequence Synopsis: AC is a data structure which holds accession numbers (ACs) contained between the opening and closing AC tags in a Darwin formatted database.
ACs can be used as arguments to other functions, e.g. Entry, Sequence, to indicate that the Entry or sequence desired is the one with the given AC. AC will attempt to convert its arguments when they are other entry descriptions to ACs. An AC can be given with or without the trailing ';'. The database contains the semicolon, so if the AC does not have it, one is added. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > ac := AC('Q62671'); ac := AC(Q62671) > Entry(ac); EDD_RATQ62671;Ubiquitin-- ..(1568).. V > Sequence(ac); ARRERMTAREEASLRTLEGRRRATLLSARQGMMSARGDFLNYALSLMRSH ..(920).. LAIKTKNFGFV > AC(Entry(2)); AC(Q43495;) > AC(PatEntry(10000..10002)); AC(P25623; P25622; Q96VH0;,Q9Z851; Q9JSE4;,P56926;) > AC(Sequence(Entry(1))); AC(P15711;) See Also: ?Entry ?Match ?SearchAC ?Sequence ?Species_Entry ?ID ?PatEntry ?SearchID ?SPCommonName ?SP_Species APC Function APC( MA:array(string), Pos:integer ) Returns an APC amino acid if all sequences in MA at Pos contain the same amino acid. If a third argument is given then the percentage of non indel is greater than or equal to a certain threshold. Deletions are ignored AToCInt Function AToCInt - One Letter Amino Acid Name to List of Codon Integers Calling Sequence: AToCInt(AA) Parameters: Name Type Description ---------------------------------------- AA string amino acid 1 letter code Returns: list Synopsis: This function converts an amino acid 1 letter code into a list of the corresponding codons. The amino acid 1 letter code for the stop codons is '$'. 
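The codon numbering used by AToCInt can be inferred from the examples in this help file. The following is a Python sketch, not Darwin source: it assumes the bases are ordered A, C, G, T (valued 0..3) and that a codon b1 b2 b3 maps to 16*b1 + 4*b2 + b3 + 1. The names a_to_cint, a_to_codon and codon_to_cint are hypothetical helpers chosen for illustration; the standard genetic code table is written in the conventional TCAG order with '$' for stop codons.

```python
# Sketch only: numbering scheme inferred from the AToCInt examples, not Darwin code.
BASES = "ACGT"   # assumed base order: A=0, C=1, G=2, T=3
TCAG = "TCAG"    # conventional order used to write the standard genetic code
CODE_TCAG = ("FFLLSSSSYY$$CC$W"   # TTT..TGG
             "LLLLPPPPHHQQRRRR"   # CTT..CGG
             "IIIMTTTTNNKKSSRR"   # ATT..AGG
             "VVVVAAAADDEEGGGG")  # GTT..GGG


def codon_to_cint(codon):
    """Map a codon such as 'TAA' to an integer in 1..64 (assumed scheme)."""
    b1, b2, b3 = (BASES.index(b) for b in codon)
    return 16 * b1 + 4 * b2 + b3 + 1


def build_table():
    """Group the 64 codons by the one-letter amino acid they encode."""
    table = {}
    for i, aa in enumerate(CODE_TCAG):
        codon = TCAG[i // 16] + TCAG[(i // 4) % 4] + TCAG[i % 4]
        table.setdefault(aa, []).append(codon)
    # Alphabetical order of codons coincides with the A, C, G, T order.
    return {aa: sorted(codons) for aa, codons in table.items()}


TABLE = build_table()


def a_to_codon(aa):
    """List of codons for a one-letter amino acid code ('$' for stop)."""
    return TABLE[aa]


def a_to_cint(aa):
    """List of codon integers for a one-letter amino acid code."""
    return [codon_to_cint(c) for c in TABLE[aa]]


print(a_to_cint("$"))   # [49, 51, 57]
print(a_to_codon("L"))  # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
```

Under these assumptions the sketch reproduces the values listed in the AToCInt and AToCodon examples of this help file.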
Examples: > AToCInt('$'); [49, 51, 57] > AToCInt(L); [29, 30, 31, 32, 61, 63] See Also: ?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?CIntToA ?CodonToA ?IntToAmino ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB ?AToInt ?CIntToAmino ?CodonToInt ?IntToBase ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
AToCodon Function AToCodon - One Letter Amino Acid Name to List of Codons Calling Sequence: AToCodon(AA) Parameters: Name Type Description ---------------------------------------- AA string amino acid 1 letter code Returns: list Synopsis: This function converts an amino acid 1 letter code into a list of the corresponding codons. The amino acid 1 letter code for the stop codons is '$'. Examples: > AToCodon('$'); [TAA, TAG, TGA] > AToCodon(L); [CTA, CTC, CTG, CTT, TTA, TTG] See Also: ?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?CIntToA ?CodonToA ?IntToAmino ?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB ?AToInt ?CIntToAmino ?CodonToInt ?IntToBase ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
AToInt Function AToInt - convert a 1 letter amino acid code to an integer Option: builtin Calling Sequence: AToInt(aa) Parameters: Name Type Description ------------------------------------------------------ aa string single letter amino acid abbreviation Returns: 0..21 Synopsis: This function converts a one letter abbreviation for an amino acid to a posint between 1..20 according to the standard ordering of amino acids (see ?aminoacids). If aa is not an amino acid abbreviation, the value 0 is returned. If aa is the unknown amino acid X, then the value 21 is returned.
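The mapping described above can be sketched in a few lines of Python. This is an illustration, not Darwin source: the ordering string below is an assumption (alphabetical by three-letter name), chosen because it is consistent with AToInt('V') = 20 and AminoToInt('Serine') = 16 elsewhere in this help file; ?aminoacids remains the authoritative list. The name a_to_int is hypothetical, and the sketch treats lower-case input as unrecognized, which is also an assumption.

```python
# Sketch only: the ordering is an assumption consistent with the help examples.
AMINO_ORDER = "ARNDCQEGHILKMFPSTWYV"  # Ala, Arg, Asn, Asp, Cys, ..., Tyr, Val


def a_to_int(aa):
    """Return 1..20 for an amino acid, 21 for the unknown X, 0 otherwise."""
    if len(aa) != 1:
        return 0                      # not a one-letter code
    if aa == "X":
        return 21                     # unknown amino acid
    return AMINO_ORDER.find(aa) + 1   # find() gives -1 when absent, hence 0


print(a_to_int("V"))  # 20
print(a_to_int("X"))  # 21
print(a_to_int("B"))  # 0
```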
Examples: > AToInt('V'); 20 > AToInt(X); 21 See Also: ?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?CIntToA ?CodonToA ?IntToAmino ?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
AaFreqNoPat Function AaFreqNoPat( DB:database ) Returns the count vector of all amino acids or bases in DB.
ActOut Function ActOut( MA:array(string), ActAA ) Reports the APC positions in which the amino acid is of the type ActAA.
AddDeviation Function AddDeviation - Perturbs the length of the outer branches of a tree. Calling Sequence: AddDeviation(t) Parameters: Name Type Description ------------------------- t Tree tree Returns: Tree Synopsis: The function AddDeviation perturbs the lengths of the outer branches of a tree by scaling each of them by an exponentially distributed factor, thus removing ultrametricity. Examples: > BDTree := BirthDeathTree(0.1, 0.01, 10, 50); BDTree := Tree(Tree(Tree(Tree(Leaf(S1,50),49.0331,Leaf(S2,50)),46.3559,Tree(Leaf(S3,50),48.4224,Leaf(S4,50))),41.6245,Tree(Leaf(S5,50),48.0734,Tree(Leaf(S6,50),48.4142,Leaf(S7,50)))),34.5821,Tree(Leaf(S8,50),42.2466,Tree(Leaf(S9,50),42.6260,Leaf(S10,50)))) > newTree := AddDeviation(BDTree); newTree := Tree(Tree(Tree(Tree(Leaf(S1,7.6316),7.1416,Leaf(S2,8.4040)),6.4383,Tree(Leaf(S3,8.6195),7.3261,Leaf(S4,10.7321))),5.4527,Tree(Leaf(S5,15.4179),11.0590,Tree(Leaf(S6,14.3074),11.2550,Leaf(S7,11.8044)))),0,Tree(Leaf(S8,10.8904),4.5808,Tree(Leaf(S9,10.3070),4.8952,Leaf(S10,6.9679)))) See also: ?BirthDeathTree ?ScaleTree ?Tree
AddSpecies Function AddSpecies( t:Tree, Species:list ) Species: List of species; this is used to distinguish between paralogous and orthologous changes. Every node of the tree contains the information of which species were on the left (t[6]) and on the right (t[7]) side of the branch.
If the tree length is less than 6, then the tree is expanded with 0 at position 4 and 5. (e.g. {MOUSE, YEAST, ECOLI}).
Align dynamic programming alignments Function Align - align sequences using various modes of dynamic programming Calling Sequence: Align(seq1,seq2,method,DayMat) Parameters: Name Type Description ------------------------------------------------------------------------------ seq1 {ProbSeq,string} peptide, nucleotide or probabilistic sequence seq2 {ProbSeq,string} peptide, nucleotide or probabilistic sequence method string the mode of dynamic programming to use DayMat {DayMatrix,list(DayMatrix)} Dayhoff matrices used for alignment Returns: Alignment Synopsis: Align does an alignment of two sequences using the similarity scores given in the DayMat and the given method. If a single DayMatrix is given, the alignment is done using it. If a list of DayMatrix is given, the best PAM matrix from the list is used. In this case Align will also compute the PamDistance and PamVariance between the two sequences. The method is optional; if it is not given, Local is assumed. The valid methods are:
Local A local alignment will be performed; this means that the best subsequences of seq1 and seq2 will be selected to be aligned. This type of alignment gives the highest possible similarity score of any alignment. This is sometimes called the Smith-Waterman algorithm.
Global A global alignment will be performed; this means that the entire seq1 is aligned against the entire seq2. This may result in a negative score if the sequences do not align very well. This is sometimes called the Needleman-Wunsch algorithm.
CFE A Cost-Free Ends alignment is done. This is like a Global alignment, but deletions at either end of one of the sequences are not penalized. In some sense it is between a Local and a Global alignment.
Shake A forward-backward alignment is performed. This alignment iterates forward and backwards until the score cannot be increased.
In its forward phase it starts at the given positions in seq1 and seq2 and finds the ends which give a maximal score. From these ends, it performs backward dynamic programming to find the optimal beginning, and so on until convergence. This type of alignment is quite similar to a Local alignment, but can be directed to focus on a particular alignment, even though it may not be the best one for the two sequences. If the DayMat is omitted, the global variable DM (if assigned a DayMatrix) is used; otherwise a PAM-250 matrix is constructed. If, in addition to the method, the keyword "NoSelf" is included when sequences of peptides or nucleotides are aligned (excluding ProbSeq), self-matches are not allowed. That is, if a sequence is aligned to itself (structurally the same string, which we call self-alignment), the trivial self-match is not allowed. This is done by giving the alignment of a position with itself a large penalty, which makes it possible to find repeated patterns: an alignment of a sequence with itself, with the identity ruled out, will show any repeated patterns. In particular, if the sequences align with an offset of k, then there is a k-long motif which is repeated in the sequence. The method used to find the approximate PamDistance and variance may not find the global maximum of the Score; it may find a local maximum. By using the argument "ApproxPAM=ppp", the search for the maximum is started at PAM distance ppp. This may help when an approximation of the distance is known, or may provide a way of exploring the existence of other local maxima.
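The difference between the Local and Global modes can be illustrated with a short Python sketch. This is not Darwin's implementation: it assumes a simple match/mismatch score and a linear gap penalty in place of a Dayhoff matrix with Darwin's gap costs, returns only the score rather than an Alignment structure, and omits the CFE and Shake modes. The function name align_score and its parameters are hypothetical.

```python
# Sketch only: toy scoring scheme, not Dayhoff matrices or Darwin's gap model.
def align_score(s1, s2, mode="Local", match=2, mismatch=-1, gap=-2):
    """Score of the best Local (Smith-Waterman) or Global (Needleman-Wunsch)
    alignment of s1 and s2 under the toy scoring parameters."""
    local = mode == "Local"
    m = len(s2)
    # Row 0: free start for Local, accumulated gap cost for Global.
    prev = [0 if local else j * gap for j in range(m + 1)]
    best = 0
    for i in range(1, len(s1) + 1):
        cur = [0 if local else i * gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score = max(prev[j - 1] + sub,  # align s1[i-1] with s2[j-1]
                        prev[j] + gap,      # gap in s2
                        cur[j - 1] + gap)   # gap in s1
            if local:
                score = max(score, 0)       # a local alignment may restart
                best = max(best, score)
            cur[j] = score
        prev = cur
    return best if local else prev[m]


print(align_score("TTACGTTT", "GGACGGG", "Local"))  # 6, from the shared ACG
print(align_score("AAAA", "TTTT", "Global"))        # -4, a negative global score
```

The second call shows the point made above for Global alignments: when the sequences do not align well, the score of the forced end-to-end alignment is negative, whereas a Local score is never below zero.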
Examples: > Align(AC(P00083),AC(P00091)); Alignment(Sequence(AC('P00083'))[14..92],Sequence(AC('P00091'))[19..97],177.7799,DM,0,0,{Local}) > Align(Entry(1),Entry(2),Local,DMS); Alignment(Sequence(AC('P15711'))[905..917],Sequence(AC('Q43495'))[13..25],45.1050,DMS[346],80,1153.8025,{Local}) > Align(AC(P13475),AC(P13475),Local,DMS,NoSelf); Alignment(Sequence(AC('P13475'))[128..178],Sequence(AC('P13475'))[137..188],279.9088,DMS[308],42.1286,98.4150,{Local,NoSelf}) See Also: ?Alignment ?CodonAlign ?DynProgStrings ?MAlign ?CalculateScore ?DynProgScore ?EstimatePam AlignNucPepAll Function AlignNucPepAll Calling Sequence: AlignNucPepAll(nuc,dm,division,goal,pEntries) Parameters: Name Type --------------------------- nuc NucleotideString dm DayMatrix division string goal numeric pEntries posint..posint Returns: list(NucPepMatch) Global Variables: DB OneAllMatch_SimilOnly Synopsis: Match nuc against a complete PepDB or the entries in the range given by pEntries of PepDB and return all matches reaching goal using dm and intron scoring according to division. Examples: See Also: ?Denormalize ?GlobalNucPepAlign ?NucPepMatch ?GetIntrons ?LocalNucPepAlign ?VisualizeGene ?GetPeptides ?LocalNucPepAlignBestPam ?VisualizeProtein ?GetPosition ?Normalize AlignNucPepMatch Function AlignNucPepMatch Option: builtin Calling Sequence: AlignNucPepMatch(npm,dm) Parameters: Name Type ------------------ npm NucPepMatch dm DayMatrix Returns: NucPepMatch Synopsis: Returns a new match with additional entries: NucGaps, PepGaps and Introns defining its alignment. 
Examples: See also: ?NucPepMatch AlignOneAll Function AlignOneAll Option: builtin Calling Sequence: AlignOneAll(seq,db,day,cutoff,entries) Parameters: Name Type Description ----------------------------------------------------------------------------- seq {posint,string} a sequence or an entry number db database a DNA or protein database day DayMatrix scoring matrix cutoff numeric only matches with score > cutoff will be reported entries posint..posint (optional) compare these entries in db only Returns: list(Match) Synopsis: Align seq against all members of the database db (or the subset of entries specified by the entries parameter when present) and return the list of matches which have a similarity score, using day, which exceeds cutoff. This function will return only one alignment per database sequence. If seq is a positive integer, then it is understood to be the sequence in that entry number. The alignments reported are Local alignments, that is the best subsequences are matched. This type of search is similar to what FASTA and BLAST (Basic Local Alignment Search Tool) do. The main difference between them and AlignOneAll is that AlignOneAll does not use approximations, it does rigorous dynamic programming against all the sequences in the database. Its speed is comparable to the other programs, so we see no reason to use shortcuts when the exact results are easy to obtain. 
Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > AlignOneAll('NKRSPAASQPPVSRVNPQEESYQKLAMETLEELDWCLD',DB,DM,110); [Match(168.3,9748355,71916164,38,38,250), Match(147.5,9749450,71916164,38,38,250), Match(122.2,9752627,71916164,38,38,250), Match(122.2,9754188,71916164,38,38,250)] See also: ?DB ?SearchFrag ?SearchSeqDb AlignedSeq Function AlignedSeq( MA:array(string) ) returns for all positions in a multiple alignment the number of alignable sequences Alignment Class Alignment - a protein or DNA pairwise sequence alignment Template: Alignment(Seq1,Seq2,Score,DayMatrix,PamDistance,PamVariance,modes) Fields: Name Type Description --------------------------------------------------------------- Seq1 string the first protein or DNA sequence Seq2 string the second protein or DNA sequence Score numeric score of the alignment DayMatrix DayMatrix Dayhoff matrix used PamDistance numeric estimate of the PAM distance or 0 PamVariance numeric variance of the PAM distance or 0 modes set(string) optional modes of alignment Identity numeric fraction identical positions (0..1) Length1 posint length of Seq1 Length2 posint length of Seq2 Offset1 integer database offset of Seq1 Offset2 integer database offset of Seq2 PamNumber numeric synonym of PamDistance Sim numeric synonym of Score Methods: Alignment_type AlSumm HTMLC LaTeXC lprint Match print Rand select Sequence string Synopsis: An Alignment stores the information of a pairwise alignment between two sequences (protein or DNA). It replaces the Match structure, which is now obsolete. If the mode for the alignment is just Local or unknown, it is omitted, otherwise it is a set with one of {Local,Global,CFE,Shake} and optionally NoSelf. 
See Also: ?Align ?DynProgScore ?EstimatePam ?CalculateScore ?DynProgStrings ?MAlignment
AllIndices Function AllIndices( ma:array(string), t:Tree ) Computes and prints the Kabat-Wu, Probabilistic and Scale indices.
AllRootedTrees Function AllRootedTrees - Returns all root variants from a tree Calling Sequence: AllRootedTrees(tree) Parameters: Name Type Description ------------------------------------------------------------- tree Tree the tree structure with arbitrary root position Returns: set(Tree) Synopsis: Returns all root variants from a tree, including the input tree itself. Examples: > t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15))); t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15))) > sAllRootVariants := AllRootedTrees(t); sAllRootVariants := {Tree(Leaf(A,5),0,Tree(Leaf(B,15),5,Tree(Leaf(C,25),21,Leaf(D,25)),100)),Tree(Leaf(C,2),0,Tree(Tree(Leaf(A,28),18,Leaf(B,28)),2,Leaf(D,6),100)),Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15))),Tree(Tree(Leaf(A,15),5,Tree(Leaf(C,25),21,Leaf(D,25)),100),0,Leaf(B,5)),Tree(Tree(Tree(Leaf(A,28),18,Leaf(B,28)),2,Leaf(C,6),100),0,Leaf(D,2))} See also: ?AllTernaryRoots ?RotateTree ?Tree
AllTernaryRoots Function AllTernaryRoots - returns a set of all trees with ternary roots Calling Sequence: AllTernaryRoots(tree) Parameters: Name Type Description ------------------------------------------------------------- tree Tree the tree structure with arbitrary root position Returns: set(Tree) Synopsis: Returns all possible trees with ternary roots. For each internal node of the tree (except the original root), a tree is returned where the root is at distance 0 above the internal node. For all practical purposes (e.g. reconstruction of ancestral sequences), this has the same effect as having a ternary root (which is not possible with the Tree data structure).
Examples: > t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15))); t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15))) > AllTernaryRoots(t); {Tree(Tree(Leaf(A,10),0,Leaf(B,10)),0,Tree(Leaf(C,20),16,Leaf(D,20))),Tree(Tree(Leaf(A,26),16,Leaf(B,26)),0,Tree(Leaf(C,4),0,Leaf(D,4)))} See also: ?AllRootedTrees ?PASfromTree ?RotateTree ?Tree AltGenCode Function AltGenCode - Use Alternative Translation Tables Calling Sequence: AltGenCode(transl_table,codon) Parameters: Name Type Description ------------------------------------------------------ transl_table integer alternative translation table codon string 3 DNA bases Returns: list Global Variables: AltGenCode_array Synopsis: AltGenCode takes a 3 letter codon as an input and returns a list of the amino acid(s) for which the triplet codes. A codon has more than one translation when, in addition to its normal translation, it is used as an alternative start codon (M). Absent codons are not designated as such. They will return the translation of the standard genetic code. The translation tables are the same as those of the reference website. Additional initiation codons may be possible. See the website for more information and a list of the organisms that use each code. 
table number   description
---------------------------------------------------------------------
 1   The Standard Code
 2   The Vertebrate Mitochondrial Code
 3   The Yeast Mitochondrial Code
 4   The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
 5   The Invertebrate Mitochondrial Code
 6   The Ciliate, Dasycladacean and Hexamita Nuclear Code
 7   deleted
 8   deleted
 9   The Echinoderm Mitochondrial Code
10   The Euplotid Nuclear Code
11   The Bacterial and Plant Plastid Code
12   The Alternative Yeast Nuclear Code
13   The Ascidian Mitochondrial Code
14   The Flatworm Mitochondrial Code
15   Blepharisma Nuclear Code
16   Chlorophycean Mitochondrial Code
17   not available
18   not available
19   not available
20   not available
21   Trematode Mitochondrial Code
22   Scenedesmus obliquus mitochondrial Code
23   Thraustochytrium Mitochondrial Code
References: www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c#SG4 Examples: > AltGenCode(11,TTG); [L, M] > AltGenCode(11,TTT); [F] > AltGenCode(12,CTG); [S, M] See Also: ?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt ?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB
AminoToInt Function AminoToInt - convert an amino-acid name to an integer Option: builtin Calling Sequence: AminoToInt(aa) Parameters: Name Type Description ------------------------------------------ aa string full names for amino acids Returns: 1..20 Synopsis: This function converts the full name for an amino acid to a posint between 1..20 according to the standard ordering of amino acids.
Examples: > AminoToInt('Serine'); 16 See Also: ?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToAmino ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB ?AToInt ?CIntToAmino ?CodonToInt ?IntToBase ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB ApproxSearchString Function ApproxSearchString Option: builtin Calling Sequence: ApproxSearchString(pat,txt,tol) Parameters: Name Type ------------------ pat string txt string tol {0, posint} Returns: {-1,posint} Synopsis: The tolerance tol specifies how many mismatches are allowed between the pattern pat and the body of text txt. If pat is found in txt (within tol mismatches), the offset in txt is returned. Otherwise, -1 is returned. Note, spaces count as mismatches and case differences do not count as mismatches. Examples: > txt := 'AAAAAAAAAHeLLoBBBBB'; txt := AAAAAAAAAHeLLoBBBBB > j := ApproxSearchString('hallo', txt, 1); j := 9 > j+txt; HeLLoBBBBB > ApproxSearchString('nothing', 'N.O.T.H.I.N.G.', 4); -1 See Also: ?BestSearchString ?MatchRegex ?SearchMultipleString ?CaseSearchString ?SearchApproxString ?SearchString ?HammingSearchString ?SearchDelim AsciiToInt Function AsciiToInt - convert a single character to its ascii ordinal number Option: builtin Calling Sequence: AsciiToInt(s) Parameters: Name Type Description ------------------------------------ s string a string of length 1 Returns: posint Synopsis: Converts a single character into its ascii ordinal number. This is useful when encoding/decoding symbols for dynamic programming. It is also useful in general for the analysis of raw input. 
Examples: > AsciiToInt('a'); 97 > AsciiToInt(' '); 32 See Also: ?AToInt ?IntToA ?SearchDelim ?BestSearchString ?IntToAscii ?SearchMultipleString ?CaseSearchString ?MatchRegex ?SearchString ?HammingSearchString ?SearchApproxString BBBToInt Function BBBToInt - Nucleic Acid Three Letter Code To Integer Option: builtin Calling Sequence: BBBToInt(nuc) Parameters: Name Type Description -------------------------------------------------- nuc string three letter code for nucleic acid Returns: 1..5 Synopsis: This function converts the following three letter codes for nucleic acids Ade, Cyt, Gua, Thy, Ura to the integers 1..5 respectively. Examples: > BBBToInt('Ade'); 1 See Also: ?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?CIntToA ?CodonToA ?IntToAmino ?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase ?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB BFGSMinimize Function BFGSMinimize Calling Sequence: BFGSMinimize(f,iniguess,epsini,epsfinal) Parameters: Name Type ------------------------- f procedure iniguess array(numeric) epsini numeric epsfinal numeric Returns: x, f(x) Synopsis: The Quasi-Newton approach of the BFGS method is used to find the (local) minimum of a function f. BFGSMinimize starts at iniguess and stops if either the distance between the two last points is smaller than epsfinal or after 1000 iterations without convergence. See Also: ?DisconMinimize ?MaxLikelihoodSize ?MinimizeBrent ?MinimizeSD ?MaximizeFunc ?Minimize2DFunc ?MinimizeFunc ?NBody BToInt Function BToInt - Nucleic Acid One Letter Code To Integer Option: builtin Calling Sequence: BToInt(nuc) Parameters: Name Type Description ------------------------------------------------ nuc string one letter code for nucleic acid Returns: 0..6 Synopsis: This function converts the following one letter codes for nucleic acids A, C, G, T, U, X to the integers 1..6 respectively. 
If nuc is not one of these symbols, then 0 is returned. Examples: > BToInt('A'); 1 > BToInt('R'); 0 See Also: ?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?CIntToA ?CodonToA ?IntToAmino ?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase ?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB
BackTranscribe Function BackTranscribe - RNA to DNA Calling Sequence: BackTranscribe(rna) Parameters: Name Type Description ------------------------------- rna string string of bases Returns: string Synopsis: Replaces all U with T in the string. Examples: > BackTranscribe('AUG'); ATG See also: ?Transcribe
BackTranslate Function BackTranslate - Protein to DNA Calling Sequence: BackTranslate(prot,method,k,db) Parameters: Name Type Description ----------------------------------------------------------- prot string protein sequence method {string,set(string)} the mode of codon selection k integer window size db database (opt) database to be used Returns: string Synopsis: Back-translates a protein into DNA. The following methods can be used:
Random - Select codons randomly
Freq - Select the most frequent codons
Least - Select the least frequent codons/motifs
Reuse - Choose codons favoring tRNA reuse
DynProg - Select codons based on favored motifs in coding DNA (default)
A combination of methods can be given as a set. Some methods require a database to be loaded. For methods based on codon frequency, DB must contain the DNA tag, and for DynProg the SEQ tag of DB must be DNA.
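The idea behind the Random method can be sketched in Python as follows. This is an illustration under stated assumptions, not Darwin's implementation: it uses the standard genetic code only, draws each codon uniformly from the synonymous codons, and ignores the window size k and any database. The names back_translate_random and translate are hypothetical helpers.

```python
# Sketch only: uniform random codon choice under the standard genetic code.
import random

TCAG = "TCAG"
CODE = ("FFLLSSSSYY$$CC$W" "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")  # '$' marks stop codons

# Codon -> amino acid, and the inverse amino acid -> list of codons.
CODON_TO_AA = {TCAG[i // 16] + TCAG[(i // 4) % 4] + TCAG[i % 4]: aa
               for i, aa in enumerate(CODE)}
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def back_translate_random(prot, rng=None):
    """Pick one synonymous codon at random for every residue of prot."""
    rng = rng or random.Random()
    return "".join(rng.choice(AA_TO_CODONS[aa]) for aa in prot)


def translate(dna):
    """Translate DNA codon by codon, ignoring an incomplete trailing codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


dna = back_translate_random("MAAAT", random.Random(0))
print(dna, translate(dna))  # translating the result gives back MAAAT
```

Whatever codons are drawn, translating the result recovers the input protein, which is the invariant every back-translation method has to preserve.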
Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > BackTranslate('MAAAT'); > BackTranslate('MAAAT','DynProg',7); See also: ?Translate
BaseCount Function BaseCount - Counts the number of DNA bases in a sequence Calling Sequence: BaseCount(sequ) Parameters: Name Type Description ---------------------------- sequ string DNA sequence Returns: list Synopsis: BaseCount counts the number of each base in a DNA sequence and returns a vector of length 6 with the number of each kind of base A, C, G, T, U, and X in positions 1 through 6 respectively. Examples: > BaseCount('ACCGGGTTTUUX'); [1, 2, 3, 3, 2, 1]
BaseToInt Function BaseToInt - Nucleic Acid Name To Integer Option: builtin Calling Sequence: BaseToInt(nuc) Parameters: Name Type ----------------------------------- nuc full name for a nucleic acid Returns: 1..5 Synopsis: This function converts the following full names for nucleic acids Adenine, Cytosine, Guanine, Thymine, Uracil to the integers 1..5 respectively. Examples: > BaseToInt('Adenine'); 1 See Also: ?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?CIntToA ?CodonToA ?IntToAmino ?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase ?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB
BestSearchString Function BestSearchString Calling Sequence: BestSearchString(pat,txt) Parameters: Name Type ------------- pat string txt string Returns: {0,posint} Global Variables: NumberErrors Synopsis: The BestSearchString function returns the offset of the best match of pat in the body of text txt. If no match is found, the first position (offset 0) is returned.
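One way to read "best match" is the offset with the fewest mismatching characters, with ties going to the earliest offset. The Python sketch below follows that reading; it is an assumption about the behavior, not Darwin's implementation, and it counts substitutions only (no insertions or deletions). The second return value stands in for the global variable NumberErrors, and the function name best_search_string is hypothetical.

```python
# Sketch only: minimum-mismatch sliding comparison, ties broken by first offset.
def best_search_string(pat, txt):
    """Return (offset, errors) for the window of txt closest to pat."""
    best_offset, best_errors = 0, len(pat) + 1
    for offset in range(len(txt) - len(pat) + 1):
        window = txt[offset:offset + len(pat)]
        errors = sum(p != t for p, t in zip(pat, window))
        if errors < best_errors:  # strict '<' keeps the earliest best offset
            best_offset, best_errors = offset, errors
    return best_offset, best_errors


print(best_search_string("CYIQNCPRG", "PPATBCYTQNCPLGFPTTSPS"))  # (5, 2)
print(best_search_string("CYIQNCPRG", "XXXXXXXXXXXXXXXXX"))      # (0, 9)
```

Under this reading the sketch returns the same offsets, 5 and 0, as the two BestSearchString examples in this help file.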
Examples: > BestSearchString('CYIQNCPRG', 'PPATBCYTQNCPLGFPTTSPS'); 5 > BestSearchString('CYIQNCPRG', 'XXXXXXXXXXXXXXXXX'); 0 See Also: ?CaseSearchString ?SearchApproxString ?SearchString ?HammingSearchString ?SearchDelim ?MatchRegex ?SearchMultipleString
Beta_Rand Function Beta_Rand - Generate random Beta distributed reals Calling Sequence: Rand(Beta(a,b)) Parameters: Name Type ------------------ a nonnegative b nonnegative Returns: nonnegative Synopsis: This function returns a random Beta distributed number with average a/(a+b) and variance a*b/((a+b)^2*(a+b+1)). When a and b are integers, the Beta distribution corresponds to the distribution of the a-th ordered random number (U(0,1)) out of a+b-1 numbers. Also, if X1 and X2 are Chi-square distributed numbers with parameters nu1 and nu2, X1/(X1+X2) is Beta(nu1,nu2) distributed. Beta_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.5 Examples: > Rand(Beta(3,4)); 0.5647 > Rand(Beta(2,100)); 0.02392550 See Also: ?Binomial_Rand ?FDist_Rand ?Normal_Rand ?StatTest ?ChiSquare_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score ?CreateRandSeq ?Geometric_Rand ?SetRand ?Student_Rand ?Cumulative ?Graph_Rand ?SetRandSeed ?Zscore ?Exponential_Rand ?Multinomial_Rand ?Shuffle
BinTree Function BinTree( g:Graph ) Converts a cycle free connected graph to a graph equivalent to a binary tree by introducing new nodes and edges.
Binomial_Rand Function Binomial_Rand - Generate random binomially distributed integers Calling Sequence: Rand(Binomial(n,p)) Binomial_Rand(n,p) Returns: integer Synopsis: This function returns a random integer binomially distributed with average n*p and variance n*p*(1-p). An example of a binomial distribution is the number of heads resulting from tossing n times a biased coin (that will give "heads" with probability p).
In mathematical terms, the probability that the outcome is i is binomial(n,i) * p^i * (1-p)^(n-i) (for 0 <= i <= n). Binomial_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.1.20 Examples: > Rand(Binomial(20,0.3)); 7 > Rand(Binomial(1000,0.01)); 13 See Also: ?Beta_Rand ?FDist_Rand ?Normal_Rand ?StatTest ?ChiSquare_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score ?CreateRandSeq ?Geometric_Rand ?SetRand ?Student_Rand ?Cumulative ?Graph_Rand ?SetRandSeed ?Zscore ?Exponential_Rand ?Multinomial_Rand ?Shuffle BipartiteGraph Function BipartiteGraph - generate a random bipartite graph Calling Sequence: BipartiteGraph(n1,n2,e) Parameters: Name Type Description ---------------------------------------------------------------- n1 integer optional number of nodes/vertices in first set n2 integer optional number of nodes/vertices in second set e integer optional number of edges Returns: Graph Synopsis: Generate a random bipartite graph with n1 nodes in one set and n2 nodes in another set and e edges connecting between the two. If e is not specified, it is chosen at random. If n1 and n2 are not specified, they are chosen at random between 5 and 20. A complete bipartite graph can be generated by requesting it to have n1*n2 edges. The edges are otherwise randomly chosen and have label 0. 
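The construction described above can be sketched in Python by sampling e distinct pairs from the n1*n2 possible edges. This is an illustration, not Darwin's implementation: the range from which a random e is drawn is an assumption (the text only says it is chosen at random), the return value is a plain pair of lists rather than a Graph structure, and the name bipartite_graph is hypothetical. Edges are written as (label, from, to) triples to mirror the Edge(0,1,7) form used in the example for this function.

```python
# Sketch only: random bipartite graph as (edges, nodes) with label-0 edges.
import random


def bipartite_graph(n1=None, n2=None, e=None, rng=None):
    """Nodes 1..n1 form the first set, n1+1..n1+n2 the second."""
    rng = rng or random.Random()
    n1 = rng.randint(5, 20) if n1 is None else n1
    n2 = rng.randint(5, 20) if n2 is None else n2
    pairs = [(u, v) for u in range(1, n1 + 1)
             for v in range(n1 + 1, n1 + n2 + 1)]
    if e is None:
        e = rng.randint(0, len(pairs))  # assumed range for a random edge count
    if e > len(pairs):
        raise ValueError("a bipartite graph has at most n1*n2 edges")
    chosen = sorted(rng.sample(pairs, e))   # distinct edges, no duplicates
    edges = [(0, u, v) for u, v in chosen]  # every edge carries label 0
    nodes = list(range(1, n1 + n2 + 1))
    return edges, nodes


edges, nodes = bipartite_graph(3, 4, 5, random.Random(1))
print(len(edges), nodes)  # 5 [1, 2, 3, 4, 5, 6, 7]
```

Requesting e = n1*n2 selects every pair and therefore yields the complete bipartite graph mentioned in the synopsis.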
Examples: > BipartiteGraph(3,4,5); Graph(Edges(Edge(0,1,7),Edge(0,2,5),Edge(0,3,4),Edge(0,3,6),Edge(0,3,7)),Nodes(1,2,3,4,5,6,7)) See Also: ?Clique ?Graph_Rand ?ParseDimacsGraph ?DrawGraph ?Graph_XGMML ?Path ?Edge ?InduceGraph ?RegularGraph ?EdgeComplement ?MaxCut ?ShortestPath ?Edges ?MaxEdgeWeightClique ?TetrahedronGraph ?FindConnectedComponents ?MinCut ?VertexCover ?Graph ?MST ?Graph_minus ?Nodes BipartiteSquared Function BipartiteSquared - Computes the distance between two trees Calling Sequence: BipartiteSquared(tree1,tree2,conf,mode) Parameters: Name Type Description -------------------------------------------------------------------- tree1 Tree first tree tree2 Tree second tree conf posint (optional, def=2) size of basic configuration mode string (optional, def=RF) mode of counting: RF or SizeDiff Returns: posint Synopsis: BipartiteSquared generalizes the Robinson and Foulds (RF) distance between two trees. The first generalization is with respect to the basic configurations matched. The RF inspects each internal edge, which separates the leaves in two (conf=2) sets. It can also inspect internal nodes, which separate the leaves in 3 groups (conf=3) or in quartets, which separate the leaves in 4 groups (conf=4). The second generalization is in the way that the differences are counted. The RF measure is like a Hamming distance, if the sets of leaves are different it counts 1 if they are the same it counts 0 (mode=RF). A second alternative is to count the size of the set differences, that is for each pair, count the number of leaves in one but not in the other (mode=SizeDiff). If the global variable MinLen is assigned a numerical value, any edge whose length is <= MinLen will be considered non-existent, that is it will not generate a difference. This is useful when comparing against trees which are not binary, but multifurcating, like trees derived from taxonomic information. 
The name BipartiteSquared comes from the algorithm to compute the distance which solves two nested weighted bipartite matching problems: the inner one for finding the minimum cost of a configuration against another and the outer one for matching the best configurations of each tree. Examples: > t1 := Tree(Tree(Leaf(a,2),1,Leaf(b,2)),0,Tree(Leaf(c,2),1,Leaf(d,2))): > t2 := Tree(Tree(Leaf(a,2),1,Leaf(d,2)),0,Tree(Leaf(c,2),1,Leaf(b,2))): > BipartiteSquared(t1,t2,2,RF); 1 > BipartiteSquared(t1,t2,2,SizeDiff); 4 See Also: ?BootstrapTree ?LeastSquaresTree ?RobinsonFoulds ?ComputeDimensionlessFit ?PhylogeneticTree ?SignedSynteny ?GapTree ?RBFS_Tree ?Synteny ?IntraDistance ?ReconcileTree BirthDeathTree Function BirthDeathTree - Generates a tree from a birth-death process Calling Sequence: BirthDeathTree(lambda,mu,N,h) Parameters: Name Type Description --------------------------------------------------- lambda nonnegative birth rate mu nonnegative death rate N posint number of leaves h positive distance from root to leaves Returns: Tree Synopsis: The function BirthDeathTree generates a tree with N leaves. The time points of the bifurcations are sampled from a birth-death process with birth rate lambda and death rate mu over a time span h. Note: - The resulting tree is ultrametric. - For mu > 0 the root will usually not be at time 0. References: Gernhard T. The conditioned reconstructed process. 
J Theor Biol, 2008, 253(4):769-778 Examples: > BDTree := BirthDeathTree(0.1, 0.01, 10, 100); BDTree := Tree(Tree(Tree(Tree(Tree(Leaf(S1,100),96.8853,Leaf(S2,100)),94.1321,Tree(Leaf(S3,100),99.2827,Tree(Leaf(S4,100),99.5675,Leaf(S5,100)))),88.6199,Leaf(S6,100)),81.4629,Tree(Tree(Leaf(S7,100),97.0925,Leaf(S8,100)),96.3985,Leaf(S9,100))),68.6093,Leaf(S10,100)) See also: ?AddDeviation ?ScaleTree ?Tree Block Data structure Block( GapList, Left, Right, Sum, NrGaps, NrAA, Score, Pos, Type ) Function: creates a Block data structure Selectors: GapList 1 - Left 2 - Right 3 - Sum 4 - NrGaps 5 - NrAA 6 - Score 7 - Pos 8 - Type 9 - gaps, left, right, sum, score, bestpos BootstrapTree Function BootstrapTree - assign confidence values to internal nodes or branches Calling Sequence: BootstrapTree(Ds,labels,bstype) BootstrapTree(Ds,labels,nrounds,bstype) BootstrapTree(treeofall,bstrees,bstype) Parameters: Name Type Description ---------------------------------------------------------- Ds array(matrix) Distance matrices labels array(anything) Labels nrounds posint (optional) number of rounds treeofall Tree tree of all data bstrees array(Tree) trees from bootstrapping bstype {Branches,Nodes} (optional) type Returns: Tree Synopsis: Depending on the value of 'bstype', this function computes confidence values for internal nodes (default) or branches. The values are integers between 0 and 100, denoting how often (in percent) a particular node or branch occurred during the bootstrapping. By default, 100 bootstrapping trees from randomly selected distance matrices (prob 2/3) are constructed and evaluated. Typically, each of the input matrices corresponds to one orthologous group. Alternatively, a tree from all data plus a list of trees from bootstrapping experiments could be given as arguments. 
The confidence values are stored in the fourth field of the Tree data structure and can be displayed using the option InternalNodes = ShowBootstrap or BranchDrawing = ShowBootstrap for the DrawTree function. To make the result more readable, only bootstrap values below 100 percent are displayed. Examples: > T1 := Tree(Tree(Tree(Leaf('A',-3),-2,Leaf('B',-3) ),-1,Leaf('C',-3)),0, Tree(Tree(Leaf('E',-3),-2,Leaf('F',-3) ),-1,Leaf('D',-3))): > T2 := Tree(Tree(Tree(Leaf('B',-3),-2,Leaf('C',-3) ),-1,Leaf('A',-3)),0, Tree(Tree(Leaf('D',-3),-2,Leaf('E',-3) ),-1,Leaf('F',-3))): > BS1 := BootstrapTree(T1, [T1,T2]); BS1 := Tree(Tree(Tree(Leaf(A,-3),-2,Leaf(B,-3),50),-1,Leaf(C,-3),50),0,Tree(Tree(Leaf(E,-3),-2,Leaf(F,-3),50),-1,Leaf(D,-3),50)) > DrawTree(BS1, InternalNodes=ShowBootstrap); > BS2 := BootstrapTree(T1, [T1,T2], Branches); BS2 := Tree(Tree(Tree(Leaf(A,-3),-2,Leaf(B,-3),50),-1,Leaf(C,-3),100),0,Tree(Tree(Leaf(E,-3),-2,Leaf(F,-3),50),-1,Leaf(D,-3),100)) > DrawTree(BS2, BranchDrawing=ShowBootstrap); See Also: ?ComputeDimensionlessFit ?LeastSquaresTree ?RBFS_Tree ?DrawTree ?PhylogeneticTree ?Tree BrightenColor Function BrightenColor - Brighten or darken a RGB color Calling Sequence: BrightenColor(color) BrightenColor(color,beta) Parameters: Name Type Description -------------------------------------------------------------------------------- color list(nonnegative) a RGB color beta numeric (optional) amount of increase/decrease in brightness Returns: nonnegative : nonnegative Synopsis: BrightenColor increases or decreases the color intensity. If 0 < beta < 1, the color gets brighter and if -1 < beta < 0 the color gets darker. The operation is not necessarily reversible (see example). 
The default value for beta is 0.3. Examples: > BrightenColor([1,0,0]); [1, 0.3000, 0.3000] > BrightenColor([0.5,0.5,0.5], -.2); [0.3000, 0.2400, 0.2400] > BrightenColor(BrightenColor([0.3,0.5,0.9],-0.4),0.4); [0.3600, 0.5400, 0.9000] See Also: ?ColorPalette ?DrawPointDistribution ?Set ?DrawDistribution ?DrawStackedBar ?SmoothData ?DrawDotplot ?DrawTree ?StartOverlayPlot ?DrawGraph ?GetColorMap ?StopOverlayPlot ?DrawHistogram ?Plot2Gif ?ViewPlot ?DrawPlot ?PlotArguments CIntToA Function CIntToA - Integer Codon Representation to Amino Acid Letter Calling Sequence: CIntToA(codon) Parameters: Name Type Description -------------------------------------- codon integer integer from 1 to 64 Returns: string Synopsis: This function converts the integer code for the Codons from 1 to 64 (see ?CodonCode) to the corresponding amino acid one letter code. The stop codon returns $. Examples: > CIntToA(37); A > CIntToA(1); K See Also: ?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?BToInt ?CodonToA ?IntToAmino ?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase ?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB CIntToAAA Function CIntToAAA - Integer Codon Representation to Amino Acid 3-Letter Code Calling Sequence: CIntToAAA(codon) Parameters: Name Type Description -------------------------------------- codon integer integer from 1 to 64 Returns: string Synopsis: This function converts the integer code for the Codons from 1 to 64 (see ?CodonCode) to the corresponding amino acid three letter code. The stop codon returns the string 'Stop'. 
Examples: > CIntToAAA(37); Ala > CIntToAAA(1); Lys See Also: ?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?BToInt ?CodonToA ?IntToAmino ?AToCInt ?CIntToA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase ?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB CIntToAmino Function CIntToAmino - Integer Codon Representation to Amino Acid Name Calling Sequence: CIntToAmino(codon) Parameters: Name Type Description --------------------------------------------------------- codon integer integer code for codon between 1 and 64 Returns: string Synopsis: This function converts the integer code for the Codons from 1 to 64 (see ?CodonCode) to the corresponding amino acid Name. The stop codon returns the string 'Stop'. Examples: > CIntToAmino(12); Serine > CIntToAmino(49); Stop See Also: ?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?BToInt ?CodonToA ?IntToAmino ?AToCInt ?CIntToA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase ?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB CIntToCodon Function CIntToCodon - convert an integer into 3-letter codon Calling Sequence: CIntToCodon(x) Parameters: Name Type Description ---------------------------------------- x integer an integer from 1 to 64 Returns: three nucleic bases (one letter each) Synopsis: The 64 different codons over the alphabet {A, C, G, T=U} are ordered from 1..64. This function converts a number between 1..64 to a codon. 
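The 1..64 codon ordering can be illustrated with a small Python sketch. It assumes a base-4 ordering with A=0, C=1, G=2, T=3 and the first base most significant; this ordering is inferred from the documented examples (CIntToCodon(15) gives ATG, CodonToCInt('TTT') gives 64) and is not taken from Darwin's source. The function names are hypothetical.

```python
BASES = "ACGT"  # assumed digit order: A=0, C=1, G=2, T=3


def cint_to_codon(x):
    """Map an integer 1..64 to a codon string under the assumed ordering."""
    if not 1 <= x <= 64:
        raise ValueError("codon integer must be between 1 and 64")
    k = x - 1
    return BASES[k // 16] + BASES[(k // 4) % 4] + BASES[k % 4]


def codon_to_cint(codon):
    """Map a codon to 1..64; return 0 for an invalid base, X included."""
    codon = codon.upper().replace("U", "T")  # treat RNA U as T
    if len(codon) != 3 or any(b not in BASES for b in codon):
        return 0
    i, j, k = (BASES.index(b) for b in codon)
    return 16 * i + 4 * j + k + 1
```

Under this assumption the two mappings are inverses of each other on 1..64, and codon 37 is GCA, which agrees with CIntToA(37) returning A (alanine).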
Examples: > CIntToCodon(15); ATG See Also: ?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt ?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?BToInt ?CodonToA ?IntToAmino ?AToCInt ?CIntToA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase ?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB CIntToInt Function CIntToInt - Integer Codon Representation to Amino Acid Number Calling Sequence: CIntToInt(codon) Parameters: Name Type Description -------------------------------------- codon integer integer from 1 to 64 Returns: 1..22 Synopsis: This function converts the integer code for the Codons from 1 to 64 (see ?CodonCode) to the corresponding amino acid integers (1..20). The stop codon returns 22. Examples: > CIntToInt(37); 1 > CIntToInt(1); 12 See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?IntToA ?IntToCInt ?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon ?AminoToInt ?BToInt ?CodonToA ?IntToAmino ?AToCInt ?CIntToA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase ?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB CalculateScore Function CalculateScore - Score two sequences as is, without aligning them Calling Sequence: CalculateScore(seq1,seq2,DM) Parameters: Name Type Description --------------------------------------------------------- seq1 string first sequence seq2 string second sequence DM DayMatrix Dayhoff matrix to score the sequences Returns: numeric Synopsis: Calculate the score between two sequences, as is, no alignment done. Sequences may contain '_' indicating an indel. An '_' matched against another '_' will not be scored any value, like what happens as a result of a multiple alignment. 
Examples: > CalculateScore(CITKLWDGDQVLY,CLTKIFDGDQVIV,DM); 50.2066 See also: ?Align ?EstimatePam CallSystem Function CallSystem Option: builtin Calling Sequence: CallSystem(cmd) Parameters: Name Type ------------- cmd string Returns: integer Synopsis: The CallSystem command passes the argument cmd to the underlying operating system for execution. It returns the integer value returned by the operating system. If the results of the execution are to be returned as a string in Darwin, then the command TimedCallSystem will do this without the need of an intermediate file. Also the command OpenPipe allows the direct reading of the output of a system command Examples: > CallSystem('date'); Fri Apr 25 12:39:18 MEST 2003 0 See also: ?FileStat ?LockFile ?OpenPipe ?SystemCommand ?TimedCallSystem CaseSearchString Function CaseSearchString - case sensitive exact string searching Option: builtin Calling Sequence: CaseSearchString(pat,txt) Parameters: Name Type ------------- pat string txt string Returns: {-1,0,posint} Synopsis: This returns the offset before the character where pat matches with txt. If pat does not match txt, -1 is returned. Examples: > CaseSearchString('here', 'It is in here'); 9 > CaseSearchString('it', 'It is in here'); -1 See Also: ?BestSearchString ?SearchApproxString ?SearchString ?HammingSearchString ?SearchDelim ?MatchRegex ?SearchMultipleString CenterTreeRoot Function CenterTreeRoot - Place root in center of tree Calling Sequence: CenterTreeRoot(t) Parameters: Name Type Description ------------------------- t Tree a Tree Returns: Tree Synopsis: Place root of tree such that the number of leaves on each side is most equal. Useful when drawing circular trees when the root has been placed far from the center. 
Examples: > t := Tree(Leaf('1',3),0,Tree(Tree(Leaf('2',3),2,Leaf('3',3)),1,Leaf('4',3))): > CenterTreeRoot(t); Tree(Tree(Leaf(2,1.5000),0.5000,Leaf(3,1.5000)),0,Tree(Leaf(1,4.5000),0.5000,Leaf(4,2.5000),100)) See also: ?RotateTree ?TreeSize ChangeLeafLabels Function ChangeLeafLabels( t:Tree, Labels:list ) Replaces the number of the leaves (t[3]) by the name in the list Labels CheckAmbigTree Function CheckAmbigTree( t:Tree ) Tree t must contain species information at position 6 and 7. To get this, use AddSpecies. The function checks for violations of rules such as "a is closer to b than to c in one place but a is closer to c than to b in another place". The number of violations for each subtree are counted and added to the tree at position 8. If an additional argument is given, a list of rules, those rules are taken. (Function FindRules finds those rules :-) ChiSquare_Rand Function ChiSquare_Rand - Generate random Chi-square distributed reals Calling Sequence: Rand(ChiSquare(nu)) Parameters: Name Type ------------------ nu nonnegative Returns: nonnegative Synopsis: This function returns a random chi-square distributed number with average nu and variance 2*nu. When nu is an integer, the sum of the squares of nu Normal(0,1) variables is distributed as ChiSquare(nu). ChiSquare_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. 
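The relation stated above between Normal(0,1) variables and the chi-square distribution can be sketched in Python. This only illustrates the integer-nu case and is not how ChiSquare_Rand is implemented; the function name chi_square_sample is hypothetical.

```python
import random


def chi_square_sample(nu, rng=random):
    """Draw one ChiSquare(nu) value as a sum of nu squared N(0,1) draws.

    Works only for positive integer nu; rng may be the random module
    or a random.Random instance (useful for reproducible seeding).
    """
    if nu < 1 or int(nu) != nu:
        raise ValueError("this sketch needs a positive integer nu")
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(int(nu)))
```

Averaging many such draws should approach nu, and their sample variance should approach 2*nu, in line with the synopsis.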
References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.4 Examples: > Rand(ChiSquare(3)); 1.4350 > Rand(ChiSquare(100)); 123.9082 See Also: ?Beta_Rand ?FDist_Rand ?Normal_Rand ?StatTest ?Binomial_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score ?CreateRandSeq ?Geometric_Rand ?SetRand ?Student_Rand ?Cumulative ?Graph_Rand ?SetRandSeed ?Zscore ?Exponential_Rand ?Multinomial_Rand ?Shuffle Cholesky Function Cholesky - decomposition of a positive definite matrix A = R * R^t Option: builtin Calling Sequence: Cholesky(A) Parameters: Name Type Description ------------------------------------ A matrix(numeric) a matrix Returns: matrix Synopsis: R := Cholesky(A) computes the Cholesky decomposition of the matrix A. A is the input matrix, and must be a square, symmetric, positive definite matrix. If A does not satisfy these conditions, an error is returned. R is a square matrix, lower triangular, such that R*transpose(R) = A. Cholesky is used to check for positive-definiteness, and at the same time it allows to solve a system Ax=b (by doing two back-substitutions) if it is positive-definite. 
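The decomposition and the two substitutions mentioned in the synopsis can be sketched in plain Python. This is an illustration under stated assumptions, not Darwin's builtin: the names cholesky and solve_spd are hypothetical, and the sketch does not verify that the input is symmetric (it reads only the lower triangle).

```python
import math


def cholesky(a):
    """Return lower-triangular R with R * R^t = a, or raise ValueError."""
    n = len(a)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(r[i][k] * r[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                r[i][i] = math.sqrt(s)
            else:
                r[i][j] = s / r[j][j]
    return r


def solve_spd(a, b):
    """Solve a*x = b using the factor R and two triangular substitutions."""
    r = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: R * y = b
        y[i] = (b[i] - sum(r[i][k] * y[k] for k in range(i))) / r[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: R^t * x = y
        x[i] = (y[i] - sum(r[k][i] * x[k] for k in range(i + 1, n))) / r[i][i]
    return x
```

A failed square root step (s <= 0) is what signals that the matrix is not positive definite, which is why the same routine doubles as a positive-definiteness check.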
Examples: > A := [[3,1,2],[1,2,-1],[2,-1,5]]; A := [[3, 1, 2], [1, 2, -1], [2, -1, 5]] > R := Cholesky(A); R := [[1.7321, 0, 0], [0.5774, 1.2910, 0], [1.1547, -1.2910, 1.4142]] > R * R^t; [[3.0000, 1, 2], [1, 2, -1.0000], [2, -1.0000, 5.0000]] See Also: ?convolve ?GivensElim ?matrix ?Eigenvalues ?Identity ?matrix_inverse ?GaussElim ?LinearProgramming ?transpose CircularTour Function CircularTour - find a minimal cost Circular tour Calling Sequence: CircularTour(seqs) CircularTour(AllAll) CircularTour(Dist) Parameters: Name Type Description ------------------------------------------------------------------- seqs list(string) a list of Sequences (DNA or proteins) AllAll matrix(Alignment) all vs all Alignment matrix Dist matrix(numeric) all vs all distance matrix (symmetric) Returns: list(posint) Synopsis: This is a front-end to ComputeTSP where we give as input either a set of sequences or a distance matrix or an AllAll matrix and the result is a minimal cost tour broken at the most convenient place (highest cost). The input can be: List of sequences - n sequences. The sequences are aligned all against all using Global alignments with the default DM matrix. (the rest is as with AllAll matrix). AllAll matrix - an n x n symmetric matrix of Alignments. If the Alignments have a PamDistance, the minimal cost tour is based on PamDistances. If not it is based on maximizing the Score of the neighbouring alignments. Distance matrix - an n x n symmetric distance matrix. The tour is computed to minimize the sum of the distances. The output is the list of indices in the best tour of length n. 
Examples: > seqs := [SSSS, AAAA, AAAS, AASS, ASSS, SSSA, SSAA, SAAA]: > CircularTour(seqs); [5, 4, 3, 2, 8, 7, 6, 1] See also: ?Clusters ?ComputeTSP ?FindCircularOrder ?MAlign Clique Function Clique - Maximum clique exact/approximate algorithm Calling Sequence: Clique(A) Parameters: Name Type Description -------------------------- A Graph a Graph Returns: set Global Variables: CliqueUpperBound Synopsis: The input to this algorithm is an undirected graph. An undirected graph is represented as a Graph data structure which should accept two selectors: Nodes and Edges. The Maximum Clique problem is finding a set of completely connected vertices which is of maximum size. The output is a set of the Nodes in the clique. The algorithm computes an upper bound on the size of the maximum clique which is left in the global variable CliqueUpperBound. If this coincides with the size of the answer, it means that the answer is optimal (maximal). The global variable CliqueIterFactor may be assigned a non-negative number f. The algorithm will then run for f*n^2 iterations. If f=0 then only the greedy heuristic is run, and this is quite fast. The larger f, the more accurate the answers will be, and the more time the algorithm will consume. The Clique problem is closely related to the Vertex Cover problem. 
They can be related by the following formula: Clique(G) = NodeComplement(VertexCover(EdgeComplement(G))) Examples: > hex := HexahedronGraph(); hex := Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,6),Edge(0,3,4),Edge(0,3,7),Edge(0,4,8),Edge(0,5,6),Edge(0,5,8),Edge(0,6,7),Edge(0,7,8)),Nodes(1,2,3,4,5,6,7,8)) > Clique(hex); {7,8} See Also: ?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph ?DrawGraph ?Graph_XGMML ?Path ?Edge ?InduceGraph ?RegularGraph ?EdgeComplement ?MaxCut ?ShortestPath ?Edges ?MaxEdgeWeightClique ?TetrahedronGraph ?FindConnectedComponents ?MinCut ?VertexCover ?Graph ?MST ?Graph_minus ?Nodes Clustal Function Clustal( ma:array(string) ) Use clustal program to align sequences. ClustalMSA Function ClustalMSA - Multiple sequence alignment using clustalw2 Calling Sequence: ClustalMSA(seqs,{optional_args}) Parameters: Name Type Description -------------------------------------------------------------------------------------------------------------- GENERAL SETTINGS seqs list(string) sequences to align labels list(string) (opt) sequence labels bootstrap numeric (opt) nr. of bootstraps quicktree boolean (opt) FAST algo for guide tree? seqtype string (opt) type of sequence tmpdir string (opt) dir for tempfiles FAST PAIRWISE AL. ktuple numeric (opt) word size topdiags numeric (opt) nr. of best diag. window numeric (opt) window around best diag. pairgap numeric (opt) gap penalty SLOW PAIRWISE AL. pwmatrix {CodonMatrix,DayMatrix,string} (opt) protein weight matrix pwgapopen numeric (opt) gap open penalty pwgapext numeric (opt) gap ext. penalty MULTIPLE AL. msamatrix {CodonMatrix,DayMatrix,string} (opt) protein weight matrix gapopen numeric (opt) gap opening penalty gapext numeric (opt) gap ext. penalty endgaps boolean (opt) no end gap sep. penalty gapdist numeric (opt) gap sep. penalty range nogap boolean (opt) residue-spec. gaps off nohgap boolean (opt) hydrophilic gaps off maxdiv numeric (opt) % ident. 
for delay transweight numeric (opt) transitions weighting iteration string (opt) NONE, TREE or ALIGNMENT numiter numeric (opt) max nr of iterations STRUCTURE AL. helixgap numeric (opt) gap penalty for helix core residues strandgap numeric (opt) gap penalty for strand core residues loopgap numeric (opt) gap penalty for loop regions terminalgap numeric (opt) gap penalty for structure termini helixendin numeric (opt) nr of res. inside helix to be treated as terminal helixendout numeric (opt) nr of res. outside helix to be treated as terminal strandendin numeric (opt) nr of res. inside strand to be treated as terminal strandendout numeric (opt) nr of res. outside strand to be treated as terminal Returns: MAlignment Synopsis: ClustalMSA computes a multiple sequence alignment (MSA). If no Dayhoff or Codon matrix is passed, clustalw uses the Gonnet scoring matrix. The score and upperbound score in the MAlignment data structure is left undefined. The function works only in unix/linux, and assumes that clustalw is available (set environment variable $Clustalw to point to binary). More information and source of clustalw is available at 'http://www.clustal.org/ '. 
Optional arguments and their default values: seqs: true bootstrap: 1000 quicktree: false seqtype: guessed from seqs, {PROTEIN, DNA} tmpdir: /tmp ktuple: 1 topdiags: 5 window: 5 pairgap: 3 pwmatrix: GONNET, {DayMatrix, CodonMatrix, 'GONNET', 'BLOSUM', 'PAM', 'ID'} pwgapopen: 10 pwgapext: 0.1 msamatrix: GONNET, {DayMatrix, CodonMatrix, 'GONNET', 'BLOSUM', 'PAM', 'ID'} gapopen: 10 gapext: 0.2 endgaps: false gapdist: 4 nogap: false maxdiv: 30 transweight: 0.5 iteration: NONE, {'NONE', 'TREE', 'ALIGNMENT'} numiter: 0 helixgap: 4 strandgap: 4 loopgap: 1 terminalgap: 2 helixendin: 3 helixendout: 0 strandendin: 1 strandendout: 1 Examples: > msa := ClustalMSA(['ASDFAARA','ASDAVRA','ASFDAATA','ASGDAGTA']); > print(msa); Multiple sequence alignment: ---------------------------- Score of the alignment: 0 Maximum possible score: 1.7976931e+308 1 ASDFAARA 2 AS_DAVRA 3 ASFDAATA 4 ASGDAGTA See also: ?Align ?Alignment ?MafftMSA ?MAlign ?MAlignment ClusterRelPam Function ClusterRelPam( MinSquareTree:Tree, MaxPW:array ) returns an array of array of clusters for the Pam windows. Each sequence from SeqToMul can be addressed directly by [PAMwindow_no, Cluster_no, Sequence_no] Clusters Function Clusters - find Clusters of seqs or objects Calling Sequence: Clusters(seqs,lim) Clusters(AllAll,lim) Clusters(Dist,lim) Parameters: Name Type Description ------------------------------------------------------------------- seqs list(string) a list of Sequences (DNA or proteins) AllAll matrix(Alignment) all vs all Alignment matrix Dist matrix(numeric) all vs all distance matrix (symmetric) lim symbol = positive mode and value used to define clusters Returns: list(set(posint)) Synopsis: This function finds clusters in a set of sequences or any objects from their distance or similarity constraints. The input is either a set of sequences or a distance matrix or an AllAll matrix and the result is a list of sets of clusters. 
The components of the clusters are identified by the indices to the seqs or AllAll or Dist arrays. The parameters can be: List of sequences - n sequences. The sequences are aligned all against all using Global alignments with the default DM matrix. (the rest is as with AllAll matrix). AllAll matrix - an n x n symmetric matrix of Alignments. If the cluster definition is based on MaxDistance=ddd or AveDistance=ddd then the clusters are selected so that the PamDistance (or average) of the Alignments are less than ddd. If MinSimil=sss or AveSimil=sss is specified, then the clusters will be determined by the Score (or average) of the Alignments being larger than sss. Distance matrix - an n x n symmetric distance matrix. MaxDistance=ddd or AveDistance=ddd should be specified and the clusters are determined by this maximum/average distance. MaxDistance = ddd - The clusters are determined by the distance ddd. I.e. any two sequences or objects which are separated by less than ddd will be part of the same cluster. AveDistance = ddd - The clusters are determined by the distance ddd. The clusters are built one at a time, starting with the first sequence/object and adding one member at a time. The member added is the one whose average distance to the rest of the cluster is less than ddd. The clusters built this way may depend on the order of the input sequences. MinSimil = sss - Like MaxDistance, but the selection criterion is based on Similarity or Score being greater than sss. AveSimil = sss - Like AveDistance, but the selection criterion is based on the average Similarity or Score being greater than sss. The output is the list of sets of indices. Each set is a cluster. All indices are included, hence some clusters may be singletons. 
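The MaxDistance rule can be read as grouping the connected components of the graph that links every pair closer than ddd. The Python sketch below follows that reading; the transitive (single-linkage) interpretation is an assumption, the function name clusters_max_distance is hypothetical, and this is not Darwin's implementation.

```python
def clusters_max_distance(dist, ddd):
    """Group 1-based indices whose pairwise distance chain stays below ddd.

    dist is a symmetric n x n distance matrix given as a list of lists.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < ddd:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i + 1)
    # Every index appears exactly once; isolated objects become singletons.
    return sorted(groups.values(), key=min)
```

With this reading, two objects farther apart than ddd can still share a cluster when a chain of closer neighbours connects them.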
Examples: > seqs := [SSSSS, AAAAA, AAAAS, SASSS, SSSSA, ASAAA]: > Clusters(seqs,AveSimil=8); [{1,4,5}, {2,3,6}] See also: ?CircularTour ?ComputeTSP ?FindCircularOrder ?MAlign Code Class Code - placeholder for text that should be displayed "as is" Template: Code(string1,...) Fields: Name Type Description --------------------------------------------- string1 string text to be displayed as is Returns: Code Methods: Code_type HTMLC LaTeXC print string Synopsis: The Code data structure holds text that is to be displayed preserving all spaces, tabs, newlines, etc. This is what is expected to happen to a program. The content of Code will normally be displayed with constant-width font. Any newlines appearing in the argument strings will be displayed. Additionally, a newline is inserted at the end of every argument. Arguments of Code will be displayed in new lines. So if the insertion of code is desired within a sentence, the TT() structure should be used (constant width font). Examples: > Code( 'for i to 10 do lprint(i^2) od'); Code(for i to 10 do lprint(i^2) od) See Also: ?Block ?HTML ?List ?RunDarwinSession ?Color ?HyperLink ?Paragraph ?screenwidth ?Copyright ?Indent ?PostscriptFigure ?Table ?DocEl ?LastUpdatedBy ?print ?TT ?Document ?latex ?Roman ?View CodonAlign Function CodonAlign - align codon sequences using dynamic programming Calling Sequence: CodonAlign(seq1,seq2,method,cm) Parameters: Name Type Description ----------------------------------------------------------------------------- seq1 string codon sequence seq2 string codon sequence method string the mode of dynamic programming to use cm {DayMatrix,list(DayMatrix)} codon matrices used for alignment Returns: Alignment Global Variables: logPAM1 Synopsis: CodonAlign does an alignment of two codon sequences using the similarity scores given in the DayMatrix (of type 'Codon') and the given method. If a single DayMatrix is given, the alignment is done using it. 
If a list of DayMatrix is given, it is understood that the best CodonPAM matrix be used. Since the introduction of the generic dynamic programming, CodonAlign is only a wrapper function. It extracts the DNA sequence from an entry and converts the codon sequence to a character string for the generic Align function. Examples: > CodonAlign('AAACCCGGG','AAGCCGGGG', CM); Alignment('AVq','CWq',30.2765,DM,0,0) > CodonAlign('AAACCCGGG','AAGCCGGGG',CMS); Alignment('AVq','CWq',34.3914,DMS[345],79,30984.0898) See also: ?Align ?CodonDynProgStrings ?CreateCodonMatrices ?DayMatrix CodonCount Function CodonCount - Count the number of codons Calling Sequence: CodonCount() CodonCount(dna) Parameters: Name Type Description -------------------------------------- dna string a string of coding DNA Returns: list Global Variables: CodonCountsG DBmarkG Synopsis: The function CodonCount counts all codons in the loaded database (if no argument is given) or counts the codons in a DNA sequence coding for a protein (given as an argument). The function returns a list of codon occurrences. See also: ?CodonUsage CodonDynProgStrings Function CodonDynProgStrings - compute score and aligned strings from a codon alignment Calling Sequence: CodonDynProgStrings(al) Parameters: Name Type Description ---------------------------------- al Alignment Codon alignment Returns: [numeric, string, string] : [score,seq1,seq2] Synopsis: Returns a list with the similarity score, first sequence and second sequence suitable for printing the aligned DNA sequences (with '___' inserted at gap positions). 
Examples: > al := CodonAlign(AAACCCGGGTTT,AAACCTTTT,CMS,Global); al := Alignment('AVq#','AX#',10.7382,DMS[368],102,47328.8945,{Global}) > CodonDynProgStrings(al); [10.7382, AAACCCGGGTTT, AAACCT___TTT] See also: ?CodonAlign ?CreateCodonMatrices ?EstimateCodonPAM CodonMatrix Class CodonMatrix - a codon mutation matrix Template: CodonMatrix() CodonMatrix(Sim, Desc, CodonPam) CodonMatrix(Sim, Desc, CodonPam, AAPAM) CodonMatrix(Sim, Desc, CodonPam, AAPAM, FixedDel, IncDel) Fields: Name Type Description ------------------------------------------------------------------------------------------- Sim matrix(numeric,64) 64 x 64 codon similarity matrix Desc string a description CodonPam numeric CodonPam number of the matrix AAPam numeric the equivalent PAM distance FixedDel numeric the constant part of the deletion costs IncDel numeric the length-dependent part of the deletion costs PamDistance numeric synonym of CodonPam PamNumber numeric synonym of CodonPam MaxSim numeric the highest similarity score in the matrix MinSim numeric the lowest similarity score in the matrix MaxOffDiag numeric the highest similarity score that is not in the diagonal Type string synonym of Desc Description string synonym of Desc Methods: CodonMatrix_type lprint print Rand select string Synopsis: A CodonMatrix contains everything that is needed to score codon alignments. This is basically the 64x64 scoring matrix plus the deletion cost function. These costs are based on the PAM distance equivalent and are calculated automatically if they are not given as an argument. 
A CodonMatrix is now only used for storing SynPAM matrices. See also: ?CreateSynMatrices ?EstimateSynPAM CodonMutate Function CodonMutate - randomly evolve a codon sequence Calling Sequence: CodonMutate(seq1,cpam) CodonMutate(seq1,cpam,DelType,lnM1) Parameters: Name Type Description ------------------------------------------------------------- seq1 string codon sequence cpam positive CodonPAM distance to mutate DelType ExpGaps (optional) gap type lnM1 matrix(numeric) (optional) log. of a 1-PAM matrix Returns: string Synopsis: Mutates a sequence of codons over a certain CodonPAM distance. Stop codons always mutate to stop codons while sense codons always mutate to sense codons. When a gap type is given, the function returns not only the mutated string, but also the two aligned sequences, where the exact position of the gaps can be seen. lnM1 is by default assumed to be CodonLogPAM1 which must be created with CreateDayMatrices() first. Examples: > CodonMutate(CCCATCAACACTGAC,50); CCTATCGCCACCGAC See also: ?CreateCodonMatrices ?CreateRandSeq ?Mutate CodonPamToPam Function CodonPamToPam - Convert CodonPAM to PAM. Calling Sequence: CodonPamToPam(lnM1,CF,CodonPam) Parameters: Name Type Description --------------------------------------------------------------------------- lnM1 matrix(numeric,64) Logarithm of a 1-PAM codon mutation matrix. CF array(numeric,64) Codon frequencies CodonPam numeric CodonPAM to be converted Returns: numeric Synopsis: Converts CodonPAM to PAM. This conversion depends on the amount of synonymous mutations for a species or set of species, so the logarithm of the 1-CodonPAM matrix and the codon frequencies are required as arguments. The conversion is done by summing up the percentage of synonymous mutations in the codon matrix. This sum is the expected percentage of identical amino acids at this CodonPAM distance, which then can be converted to PAM using the PerIdentToPam function. 
Examples: > CodonPamToPam(CodonLogPAM1,CF,50); 23.3413 See also: ?CreateCodonMatrices ?PamToCodonPam ?PerIdentToPam CodonToA Function CodonToA Calling Sequence: CodonToA(triple) Parameters: Name Type ------------------------------------ triple a 3 letter DNA/RNA sequence Returns: one letter amino acid description Synopsis: This function converts a 3 letter DNA/RNA sequence into the amino acid specified by the genetic code. It returns $ when the given codon corresponds to a stop codon. Examples: > CodonToA('UUU'); F See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?IntToA ?IntToCInt ?aminoacids ?BBBToInt ?CIntToInt ?IntToAAA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAmino ?AToCInt ?CIntToA ?CodonToCInt ?IntToB ?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase ?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB CodonToCInt Function CodonToCInt - convert a 3-letter codon into an integer Calling Sequence: CodonToCInt(code) Parameters: Name Type Description ---------------------------------------------------------------- code string three nucleic (DNA, RNA) bases (one letter each) Returns: 0..64 Synopsis: The 64 different codons over the alphabet {A, C, G, T=U} are ordered from 1..64. This function converts a codon to a number between 1..64. If it contains an invalid base or an X, it returns 0. Examples: > CodonToCInt('TTT'); 64 See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?IntToA ?IntToCInt ?aminoacids ?BBBToInt ?CIntToInt ?IntToAAA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAmino ?AToCInt ?CIntToA ?CodonToA ?IntToB ?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase ?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB CodonToInt Function CodonToInt Option: builtin Calling Sequence: CodonToInt(UUU) Parameters: Name Type -------------------------------------------------- UUU a three RNA base sequence (one letter each) Returns: 1..22 Synopsis: This function converts a three RNA base sequence to the amino acid number it specifies according to the standard genetic code.
If the triplet is unknown, the value 21 is returned. If it is a stop codon, it returns 22. Examples: > CodonToInt('UUU'); 14 See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?IntToA ?IntToCInt ?aminoacids ?BBBToInt ?CIntToInt ?IntToAAA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAmino ?AToCInt ?CIntToA ?CodonToA ?IntToB ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase ?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB CodonUsage Function CodonUsage - Get Codon Usage for a particular amino acid Calling Sequence: CodonUsage() CodonUsage(dna) Parameters: Name Type Description --------------------------- dna string coding DNA Returns: list Synopsis: Get the codon usage for a sequence of coding DNA. If no argument is given, the function gets the codon usage for all entries in the loaded database. Examples: > CodonUsage(); [[[GCA, 0], [GCC, 0], [GCG, 0], [GCT, 0]], [[AGA, 0], [AGG, 0], [CGA, 0], [CGC, 0], [CGG, 0], [CGT, 0]], [[AAC, 0], [AAT, 0]], [[GAC, 0], [GAT, 0]], [[TGC, 0], [TGT, 0]], [[CAA, 0], [CAG, 0]], [[GAA, 0], [GAG, 0]], [[GGA, 0], [GGC, 0], [GGG, 0], [GGT, 0]], [[CAC, 0], [CAT, 0]], [[ATA, 0], [ATC, 0], [ATT, 0]], [[CTA, 0], [CTC, 0], [CTG, 0], [CTT, 0], [TTA, 0], [TTG, 0]], [[AAA, 0], [AAG, 0]], [[ATG, 0]], [[TTC, 0], [TTT, 0]], [[CCA, 0], [CCC, 0], [CCG, 0], [CCT, 0]], [[AGC, 0], [AGT, 0], [TCA, 0], [TCC, 0], [TCG, 0], [TCT, 0]], [[ACA, 0], [ACC, 0], [ACG, 0], [ACT, 0]], [[TGG, 0]], [[TAC, 0], [TAT, 0]], [[GTA, 0], [GTC, 0], [GTG, 0], [GTT, 0]], [[XXX, 1]], [[TAA, 0], [TAG, 0], [TGA, 0]]] See also: ?CodonCount ?RSCU Collapse Function Collapse( g:Graph ) Collapses cycles in g by removing edges.
CollapseNodes Function CollapseNodes Calling Sequence: CollapseNodes(tree,PAM = pam) CollapseNodes(tree,NodeCount = ncount) CollapseNodes(tree,Class = class) CollapseNodes(tree,Bootstrapping = boots) Parameters: Name Type Description ----------------------------------------------------------------- tree Tree pam positive PAM distance ncount posint Number of nodes class {string,list(string)} Lineage(s) boots posint Minimal bootstrapping percentage Returns: Tree Synopsis: Collapses subtrees to a single leaf. With the PAM option, all leaves that are at most the desired PAM distance from each other are collapsed. The NodeCount option collapses all subtrees with at most this number of leaves. The Class option is used for species trees to collapse leaves that are from the same class of species. Finally, Bootstrapping collapses all subtrees where all nodes are at least boots% supported by bootstrapping. Examples: > tree := Tree(Leaf(Mouse,-2.9000),0,Tree(Leaf(Human,-2.2000),-0.6000, Leaf(Dog,-2.7000))); tree := Tree(Leaf(Mouse,-2.9000),0,Tree(Leaf(Human,-2.2000),-0.6000,Leaf(Dog,-2.7000))) > CollapseNodes(tree,PAM=4); Tree(Leaf(Mouse,-2.9000),0,Leaf(Human/Dog,-2.4500)) See also: ?BootstrapTree ?PrintTreeSeq ?Tree CollectStat Function CollectStat - collect and summarize Stat structures Calling Sequence: CollectStat(data) Parameters: Name Type Description ------------------------------------------------------------------- data anything any structure/list/set containing Stat structures Returns: list(Stat) Synopsis: CollectStat will inspect any list/structure/set and collect all the Stat structures together. The Stat structures will be union-ed whenever they have the same description. This provides an easy way of adding together several simulation results, which have been obtained in different runs. See also: ?OutsideBounds ?Stat ?union ?UpdateStat Color Class Color - structure to define the color of some document part Template: Color(colcode,doc1,...)
Fields: Name Type Description ----------------------------------------------------- colcode string a color name or its hex RGB values Returns: Color Methods: Color_type HTMLC LaTeXC string Synopsis: The Color data structure holds document parts that are to be displayed in that color. The number of arguments is variable. Examples: > Color( red, 'Your balance is negative' ); Color(red,Your balance is negative) See Also: ?Block ?HyperLink ?PostscriptFigure ?string_RGB ?Code ?Indent ?print ?Table ?Copyright ?LastUpdatedBy ?RGB_string ?TT ?DocEl ?latex ?Roman ?View ?Document ?List ?RunDarwinSession ?HTML ?Paragraph ?screenwidth ColorPalette Function ColorPalette - creates a set of colors according to a colormap Calling Sequence: ColorPalette(n) ColorPalette(n,map) Parameters: Name Type Description ------------------------------------------------------------ n posint the number of different colors to be created map string (optional) a colormap Returns: list([nonnegative, nonnegative, nonnegative]) Synopsis: This function computes n different colors according to a colormap and returns their RGB values, each component in the range [0,1]. The colormap completely specifies the appearance of the colors. The map parameter can be one of the following colormaps: jet - ranges from blue to red, and passes through the colors cyan, yellow, and orange. It is a variation of the hsv colormap. hsv - varies the hue component of the hue-saturation-value color model. The colors begin with red, pass through yellow, green, cyan, blue, magenta, and return to red. The colormap is particularly appropriate for displaying periodic functions. heat - varies the color from a saturated blue through white into a saturated red. This map is useful for heatmaps, where negative and positive values are possible. stoplight - gives colors from red through yellow to green. lines - gives a list of distinct colors.
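As a rough illustration of how a colormap turns an index into an RGB triple, the following Python sketch interpolates linearly along the path blue, cyan, green, yellow, red. This path is an assumption chosen because it reproduces the values printed by the ColorPalette(10) example in this entry; it is not claimed to be how Darwin implements any of its colormaps, and the name `palette_sketch` is hypothetical.

```python
def palette_sketch(n):
    """Return n RGB triples in [0,1] along blue -> cyan -> green -> yellow -> red.

    Illustrative only: the anchor colors and the linear interpolation are
    assumptions, not Darwin's implementation.
    """
    anchors = [(0, 0, 1), (0, 1, 1), (0, 1, 0), (1, 1, 0), (1, 0, 0)]
    segments = len(anchors) - 1
    colors = []
    for i in range(n):
        # Position along the path, from 0 (first anchor) to `segments` (last).
        t = 0.0 if n == 1 else segments * i / (n - 1)
        seg = min(int(t), segments - 1)
        frac = t - seg
        lo, hi = anchors[seg], anchors[seg + 1]
        colors.append([round(a + (b - a) * frac, 4) for a, b in zip(lo, hi)])
    return colors


for rgb in palette_sketch(10):
    print(rgb)
```

Other colormaps would differ only in the list of anchor colors under this scheme.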
Examples: > colors := ColorPalette(10); colors := [[0, 0, 1], [0, 0.4444, 1], [0, 0.8889, 1], [0, 1, 0.6667], [0, 1, 0.2222], [0.2222, 1, 0], [0.6667, 1, 0], [1, 0.8889, 0], [1, 0.4444, 0], [1, 0, 0]] See Also: ?BrightenColor ?DrawPointDistribution ?Set ?DrawDistribution ?DrawStackedBar ?SmoothData ?DrawDotplot ?DrawTree ?StartOverlayPlot ?DrawGraph ?GetColorMap ?StopOverlayPlot ?DrawHistogram ?Plot2Gif ?ViewPlot ?DrawPlot ?PlotArguments Complement Function Complement - complement of a DNA sequence Calling Sequence: Complement(nuc) Parameters: Name Type Description ----------------------------------------- nuc string a string of DNA/RNA bases Returns: string Synopsis: Computes the complement DNA/RNA of the given sequence. For more clarity, the antiparallel of AACC is GGTT. The reverse of AACC is CCAA and the Complement of AACC is TTGG. The Complement of a DNA sequence does not form a double helix with the sequence. Examples: > Complement('ACTTACG'); TGAATGC See Also: ?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToCInt ?AminoToInt ?BBBToInt ?CIntToCodon ?GeneticCode ?IntToCodon ?antiparallel ?BToInt ?CIntToInt ?IntToB ?Reverse ?AToCInt ?CIntToA ?CodonToA ?IntToBase ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBBB ComplementSequence Function ComplementSequence Calling Sequence: ComplementSequence(offset) Parameters: Name Type ---------------- offset integer Returns: integer : integer Synopsis: Returns the numeric offset of the sequence ofs is pointing to and the negative offset of the original sequence passed to GetComplement. 
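The distinction drawn in the Complement entry between the complement, the reverse and the antiparallel of a sequence can be checked with a short Python sketch. It is not Darwin code; the function names are hypothetical and only upper-case DNA bases are handled, which is a simplifying assumption.

```python
# Base-pairing table for upper-case DNA only (assumption: no RNA, no ambiguity codes).
_PAIR = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Replace every base by its pairing partner, keeping the order."""
    return seq.translate(_PAIR)


def reverse(seq):
    """Reverse the order of the bases without changing them."""
    return seq[::-1]


def antiparallel(seq):
    """Reverse complement: the strand that pairs with seq in a double helix."""
    return reverse(complement(seq))


print(complement("AACC"), reverse("AACC"), antiparallel("AACC"))
```

The three results for AACC match the values quoted in the Complement synopsis (TTGG, CCAA and GGTT).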
See Also: ?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB ?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt ?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase Complex Function Complex( Re:numeric, Im:numeric ) Data structure Complex( Re, Im ) Representation of complex numbers by a pair of numerical arguments, the real part and the imaginary part. - Operations: Initialization: a := Complex(1,1); b := Complex(0,1); All arithmetic operations: a+b, a-b, a*b, a/b, a^b Special functions exp(a), ln(a), sin(a), cos(a), tan(a) Printing: print(a); printf( '%.3f', a ); Type testing: type(a,Complex); - Conversions: To string : string(a) - Selectors: a[Re] : real part a[Im] : imaginary part ComputeCAI Function ComputeCAI - Compute Codon Adaptation Index Calling Sequence: ComputeCAI(e) Parameters: Name Type Description --------------------------------------- e {Entry,string} dna information Returns: numeric Synopsis: Computes the CAI (codon adaptation index) for a DNA string or an entry (with DNA tag). The function requires that the Relative Adaptiveness RA be calculated prior to calling ComputeCAI. See also: ?SetupRA ComputeCAIVector Function ComputeCAIVector - Compute CAI for all AA individually Calling Sequence: ComputeCAI(e) Returns: list Synopsis: Computes the CAI for all codons in an entry.
See also: ?ComputeCAI ?SetupRA ComputeCubicTSP Function ComputeCubicTSP - compute Travelling Salesman Cycle (cubic time) Option: builtin Calling Sequence: ComputeCubicTSP(Dist,trials,p1..pk) Parameters: Name Type Description -------------------------------------------------------------------------- Dist matrix(nonnegative) symmetric, square, distance matrix trials posint number of random starting points (optional) p1..pk list(posint) optional good solutions Returns: list(posint) Synopsis: Compute a minimum distance cycle (symmetric travelling salesman problem) with a heuristic O(n^3) algorithm. The second argument is optional. If present it indicates the number of (random starting points) trials that will be computed; the best cycle/tour of these will be returned. The third ... kth arguments are also optional and are permutations of integers which are good solutions to the TSP problem. These will be used as seeds to build new (better) solutions. This is the default function used by ComputeTSP. It should be used only when you can provide initial good solutions or a different number of trials is desired. See also: ?ComputeTSP ComputeDimensionlessFit Function ComputeDimensionlessFit - dimensionless fitting of a distance tree Calling Sequence: ComputeDimensionlessFit(t,Dist,Var) Parameters: Name Type Description ------------------------------------------------------------------- t Tree the given tree t matrix(numeric) alternatively, the distances over the tree Dist matrix(numeric) distance matrix Var matrix(numeric) variances of the distances Returns: nonnegative Synopsis: This function computes the Dimensionless fitting index of a set of distances over a tree as in "A Dimensionless Fit Measure for Phylogenetic Distance Trees", J Bioinform Comput Biol, vol 3, pp 1429-1440. Trees built over the same set of species, even with radically different methods, can be ranked by the quality of their fit with this index. 
The input can be a tree (which is converted to an actual distance matrix with the Tree_matrix) or an actual distance matrix. The input matrices Dist and Var are the distance measured between the species and their variance. See Also: ?BootstrapTree ?LeastSquaresTree ?RBFS_Tree ?Synteny ?GapTree ?PhylogeneticTree ?SignedSynteny ?Tree_matrix ComputeQuarticTSP Function ComputeQuarticTSP - compute Travelling Salesman Cycle (quartic time) Option: builtin Calling Sequence: ComputeQuarticTSP(Dist,trials,p1..pk) Parameters: Name Type Description -------------------------------------------------------------------------- Dist matrix(nonnegative) symmetric, square, distance matrix trials posint number of random starting points (optional) p1..pk list(posint) optional good solutions Returns: list(posint) Synopsis: Compute a minimum distance cycle (symmetric travelling salesman problem) with a heuristic O(n^4) algorithm. The second argument is optional. If present it indicates the number of (random starting points) trials that will be computed; the best cycle/tour of these will be returned. The third ... kth arguments are also optional and are permutations of integers which are good solutions to the TSP problem. These will be used as seeds to build new (better) solutions. See also: ?ComputeTSP ComputeTPI Function ComputeTPI - TPI index of a DNA sequence Calling Sequence: ComputeTPI(e,mode) Parameters: Name Type Description ------------------------------------------------------ e string an Entry which contains a DNA sequence mode string (optional), the string AllAA Returns: list({list,numeric}) Synopsis: The TPI index measures how much correlation there exists among the consecutive tRNAs coding for each amino acid. This autocorrelation is measured in a way that it is insensitive to different frequencies of amino acids, different frequencies of tRNAs and different frequencies of bases. Two indices are computed, which are two representations of the same magnitude. 
In both cases, the TPI measures the cumulative distribution of the number of pairs of consecutive tRNAs coding for the same amino acids. If the actual number of pairs is too low, this means that the tRNAs are "rotated" around quite often. If the number of pairs is high, this means that the tRNAs are "reused" often. The first value returned is a normal deviate with the same cumulative probability of having the observed number of pairs. A negative value means less correlation than expected, a positive value higher correlation than expected. Since it is a normal deviate, N(0, 1), it is easy to estimate how rare the values are. E.g. 1.96 means that it is higher only 2.5% of the time, etc. The second value is the cumulative probability of having the observed number of pairs spread over the interval -1 .. 1. The function cannot compute the TPI unless the tRNA information is given. This is normally done with the function SetuptRNA. If a second argument, AllAA, is given, then both indices are computed for all the individual amino acids as well as for the ensemble. In this case, a list of lists is returned, where each component is a list with the two values and the name of the amino acid. See also: ?SetuptRNA ?TPIDistr ComputeTSP Function ComputeTSP Calling Sequence: ComputeTSP(D) Parameters: Name Type Description -------------------------------------------------- D matrix(numeric) symmetric distance matrix Returns: list(posint) Global Variables: ComputeTSP_table Synopsis: This function computes a minimum distance tour through the distance matrix D (this is the symmetric travelling salesperson problem).
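To make the notion of a minimum distance tour concrete, here is a minimal Python sketch that finds the best cycle by exhaustive search. This is deliberately not the heuristic that ComputeTSP or ComputeCubicTSP use; brute force is only practical for a handful of cities, and the function name `brute_force_tsp` is hypothetical. The matrix is the one used in the ComputeTSP example of this entry.

```python
from itertools import permutations


def brute_force_tsp(dist):
    """Return (tour, length) of a shortest cycle, cities numbered from 1.

    Exhaustive search over all cycles starting at city 1; an illustration
    of the problem definition, not of Darwin's heuristic algorithms.
    """
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for rest in permutations(range(2, n + 1)):
        tour = [1] + list(rest)
        # Sum consecutive distances and close the cycle back to the start.
        length = sum(dist[tour[k] - 1][tour[(k + 1) % n] - 1] for k in range(n))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len


D = [[0, 1, 1, 10], [1, 0, 10, 1], [1, 10, 0, 1], [10, 1, 1, 0]]
print(brute_force_tsp(D))
```

For this matrix the cycle 1, 2, 4, 3 avoids both edges of length 10 and has total length 4.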
Examples: > D := [[0,1,1,10],[1,0,10,1],[1,10,0,1],[10,1,1,0]]; D := [[0, 1, 1, 10], [1, 0, 10, 1], [1, 10, 0, 1], [10, 1, 1, 0]] > ComputeTSP(D); [1, 2, 4, 3] See also: ?ComputeCubicTSP ?ComputeQuarticTSP ConcatStrings Function ConcatStrings Calling Sequence: ConcatStrings(slist,sep) Parameters: Name Type Description -------------------------------------------- slist array(string) array of strings sep string (optional) separator Returns: string Synopsis: Concatenates a list of strings to one string. The optional second argument can be a separator character which is inserted between any two substrings. This method is much more efficient than repeatedly appending to a string. Examples: > ConcatStrings(['Hello ','World','!']); Hello World! > ConcatStrings(['A','B','C'],', '); A, B, C See also: ?RenderTemplate ?string ?trim ConnectTcp Function ConnectTcp Option: builtin Calling Sequence: ConnectTcp(path,slave) Parameters: Name Type Description ------------------------------------- path string path to a UNIX pipe slave boolean Returns: NULL Synopsis: Creates a connection to the IPC daemon at path (a UNIX pipe). slave must be false for all darwin processes not created by the daemon. Examples: > r := traperror(ConnectTcp('/tmp/.ipc/darwin', false)); > SendTcp('PING'); r := ReceiveTcp(3); r := PING OK > SendTcp('MSTAT linneus1'); r := ReceiveTcp(3); r := DATA linneus1 0:OK ALIVE > DisconnectTcp();; See Also: ?darwinipc ?ParExecuteIPC ?ReceiveDataTcp ?SendTcp ?DisconnectTcp ?ParExecuteSlave ?ReceiveTcp ?ipcsend ?ParExecuteTest ?SendDataTcp ConsistentGenome Class ConsistentGenome - check the consistency of a database file Template: ConsistentGenome(name) Fields: Name Type Description ------------------------------------------------- name string 5-letter name of a species/genome Returns: NULL Methods: ConsistentGenome_type Synopsis: This function checks various aspects of consistency of a database file which contains a single genome.
The database should have been loaded with ReadDb before calling this function. Various header fields should be present (SCINAME, KINGDOM, 5LETTERNAME and optionally ALTGENETICCODE). The entries should also contain a DNA entry, which is checked to be in accordance with the protein sequence. This function will print error messages for the inconsistencies found. For some errors, like identically duplicated sequences, it will print editor commands (vi) to correct the problems. See also: ?database ?DB ?Entry ?GenomeSummary ?ReadDb ?Sequence Counter Class Counter - accumulates values Template: Counter() Counter(title) Returns: Counter Fields: Name Type Description -------------------------------------------------- value numeric accumulated value of the counter title string user-defined description Methods: Counter_type plus printf Rand string times Synopsis: A Counter is an object which stores a number. It is understood that this number is incremented occasionally; that is the purpose of a counter. It is possible to have as many counters as we want, each one with its own description. The way to increment a Counter is by adding a value to the structure. E.g. c1+1 will increment the Counter c1 by 1. The counter is incremented as a side effect of the addition (or subtraction). The result of the expression is the total accumulated so far. A Counter can also be multiplied by a numerical factor; this has little use in practice, except for multiplying by zero, which erases the Counter. E.g.
0*c1 Examples: > c1 := Counter(iterations): c2 := Counter('Normal numbers'): > to 100 do c1+1; c2+Rand(Normal) od: > print(c1,c2); iterations: 100 Normal numbers: 11.532789 See also: ?LinearRegression ?objectorientation ?Stat Covariance Class Covariance - store and compute covariances and correlations Template: Covariance() Covariance(Description) Covariance(Description,VarNames) Fields: Name Type Description ------------------------------------------------------------------------ Description string descriptive name of the set VarNames list names of the variables Mean list(numeric) mean values of the variables Variance list(numeric) variances of the variables Minimum list(numeric) minimum values of the variables Maximum list(numeric) maximum values of the variables Number integer number of sample points recorded MaxVariance [numeric, list] largest eigenvalue/eigenvector Eigenvalues [list, matrix] eigenvalues/vectors of covariance matrix CovMatrix matrix(numeric) estimated covariance matrix CorrMatrix matrix(numeric) estimated correlation matrix Returns: Covariance Methods: Covariance_type plus print Rand select string update Synopsis: Covariance is a data structure which stores the values of vectors of variables, and upon demand selects/computes various results. A call to Covariance sets the space to record the information. Calls to Covariance_update, or adding a value to a Covariance variable, record additional results. At any point selections can be made, resulting in computations. Further data can be added and further selections can be made. Each data point should be a numerical vector of dimension m. The CovMatrix selector returns an unbiased estimator of the covariances of the variables. The diagonal of this matrix contains the estimates of the variances of the variables. The CorrMatrix selector returns an unbiased estimator of the correlation coefficients of the variables. Its diagonal is 1.
MaxVariance returns the largest eigenvalue of the covariance matrix and its corresponding eigenvector. This vector gives the linear combination of the variables that will show the largest variance. The Eigenvalues selector returns a list, [e,v], with the eigenvalues and the eigenvectors of the covariance matrix. e is sorted in increasing order ( e[1]<=e[2]<=...<=e[m] ) and v is the array of eigenvectors (each row is an eigenvector, v is an m x m matrix). Covariance analysis is useful to find which are the linear combinations of the data which give the maximum/minimum variances. If a is a data point (a vector of dimension m), then a*v[i] has variance e[i]. If the data has linear dependencies, then some linear combinations will have 0 variance. Then the smallest e value will be 0 (or roundoff error from 0). The number of 0 (or near 0) eigenvalues is the number of linear dependencies in the data. Examples: > c := Covariance('test two vars',[v1,v2]): > c+[0,1]: c+[0.1,1.1]: c+[0.2,1.2]: c+[0,1.3]: > print(c); Covariance analysis for test two vars, 4 data points v1 v2 Means 0.0750 1.1500 Covariance matrix v1 0.0092 v2 0.0017 0.0167 > c[Eigenvalues]; [[0.00881298, 0.01702036], [[0.9782, -0.2076], [0.2076, 0.9782]]] See also: ?Counter ?Eigenvalues ?OutsideBounds ?Stat ?SvdAnalysis CreateArray Function CreateArray - Creates an array of defined length and initialization Option: builtin Calling Sequence: CreateArray(1..n1,1..n2,1..nk) CreateArray(1..k,z) Parameters: Name Type Description --------------------------------------------------- ni integer integer dimensions of the array k integer integer dimension of the array z anything initialization value of the array Returns: list Synopsis: This function creates a new array of dimension specified by k. If the last argument to CreateArray is not of type range, this is the initial value assigned to each element of the array. 
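The behaviour described in the CreateArray synopsis, one range per dimension followed by an optional initial value, can be mimicked in Python as a minimal sketch. It is not Darwin code: the name `create_array`, passing dimensions as plain integers instead of ranges, and the default fill value of 0 are assumptions made for illustration.

```python
import copy


def create_array(dims, fill=0):
    """Build a nested list with the given dimensions, every cell set to fill.

    dims is a list of positive integers, one per dimension (standing in for
    the ranges 1..n1, 1..n2, ...). The default fill of 0 is an assumption.
    """
    if not dims:
        # Each cell gets its own copy so that mutable fills are not shared.
        return copy.deepcopy(fill)
    return [create_array(dims[1:], fill) for _ in range(dims[0])]


print(create_array([5], 4))
print(create_array([2, 2], [3, 4]))
```

Copying the fill value for every cell is a Python-specific design choice; it prevents a later change to one cell from silently affecting the others.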
Examples: > x := CreateArray(1..5, 4); x := [4, 4, 4, 4, 4] > y := CreateArray(1..2, 1..2, [3,4]); y := [[[3, 4], [3, 4]], [[3, 4], [3, 4]]] See also: ?CreateString CreateCodonMatrices Function CreateCodonMatrices - creates a global list of codon mutation matrices. Calling Sequence: CreateCodonMatrices() CreateCodonMatrices(setname) CreateCodonMatrices(counts) CreateCodonMatrices(rates,freqs) Parameters: Name Type Description ------------------------------------------------------------------ setname string Name of the desired set of species. counts matrix(numeric) Matrix with codon mutation counts. rates matrix(numeric) a rate matrix Q freqs array(nonnegative) codon frequencies Returns: NULL Global Variables: AF CF CM CMS CodonLogPAM1 DM DMS logPAM1 Synopsis: When called with a set name, the precomputed logarithms of the respective mutation matrices are loaded and used to create the global scoring matrices. When called with no argument, the matrices are created from the data from the OMA project. Alternatively, 'mt' can be used as setname to construct matrices for mitochondrial coding DNA. When a count matrix is given, the mutation matrices are derived from this matrix. When a rate matrix and the natural frequencies are given, then those are used to create the scoring matrices. The function creates the following global objects: CF - a vector of length 64 containing the codon frequencies, CodonLogPAM1 - the logarithm of a 1-CodonPAM mutation matrix, CM - the 250-CodonPAM similarity matrix and CMS - a list of 1266 similarity matrices. Examples: > CreateCodonMatrices(); > CreateCodonMatrices(hum); See Also: ?CodonAlign ?CreateDayMatrices ?EstimateCodonPAM ?CodonDynProgStrings ?CreateSynMatrices CreateCodonModelMatrices Function CreateCodonModelMatrices - Creates a set of CodonPAM1 matrices according to the M-series codon models.
Calling Sequence: CreateCodonModelMatrices(model,freq,kappa,w) CreateCodonModelMatrices(model,freq,kappa,w,props) CreateCodonModelMatrices(model,freq,kappa,w,props,p,q) Parameters: Name Type Description ----------------------------------------------------------------------------------- model {M0,M2,M3,M8} type of substitution model freq list(nonnegative) frequency vector kappa nonnegative transition/transversion ratio w {nonnegative,set(nonnegative)} dN/dS ratio(s) props {nonnegative,list(nonnegative)} (for model <> M0) proportion(s) p positive (for M8) p parameter of Beta distribution q positive (for M8) q parameter of Beta distribution Returns: list(matrix) Synopsis: The function CreateCodonModelMatrices creates a set of codon substitution matrices according to the M-series codon models M0, M1/2, M3, M7/8 by Yang. To create matrices for M1 using M2, set w to 1; to create matrices for M7 using M8, set props to 0 and only use elements 1..10 of the list returned by the function. Examples: See also: ?CreateParametricQMatrix CreateDayMatrices Function CreateDayMatrices - Create all the Dayhoff matrices needed Calling Sequence: CreateDayMatrices() CreateDayMatrices(Name) CreateDayMatrices(Counts) CreateDayMatrices(Q,freqs) Parameters: Name Type Description ----------------------------------------------------------------------------------------------------------- Counts matrix (optional) a symmetric aa mutation count matrix mapping procedure (optional) a mapping between symbols and posints type = anything (optional) matrices will be of the given type Q matrix (optional) a rate matrix freqs array (optional) frequencies (if called with Q) name string (optional) name of a substitution model (currently allowed are JTT, LG and WAG) Returns: NULL Global Variables: AF DM DMS logPAM1 Synopsis: This function creates all the Dayhoff matrices needed for other alignment functions to work. 
It performs the following four calculations: (1) It assigns a Dayhoff matrix computed at PAM distance 250 to the global variable DM. (2) It computes 1266 Dayhoff matrices for various PAM distances between 0.049 and 1000 and assigns the list of such matrices to the global variable DMS. (3) It computes the amino acid natural frequencies and assigns them to the global variable AF. (4) It assigns the global variable logPAM1 with the logarithm of the mutation matrix (at PAM distance 1) being used. By default, with no arguments, it uses the data derived from the entire SwissProt database in Nov 1991 (Benner, Gonnet and Cohen). This can be altered in four ways: (a) by assigning the global variable NewLogPAM1 with the logarithm of a PAM 1 mutation matrix, all the computations will be based on this mutation matrix. (b) by passing a count matrix as argument all the computations will be based on this count matrix. A count matrix has the counts of mutations (and non mutation on the diagonal) for a large sample of alignments. Normally if two amino acids X and Y are aligned, we will add 1/2 to Counts[X,Y] and 1/2 to Counts[Y,X]. (c) by calling the function with a rate matrix and a frequency vector, the computations will be based on these parameters (d) by calling the function with the name of a specific substitution model (currently, JTT, LG and WAG are allowed). The computations will then be based on that model. If the counts are only on the amino acids A, C, G and T, (and the rest of the counts are just 1 on the diagonal and 0 elsewhere), the Dayhoff matrices produced are suitable to align DNA sequences. Actually this is the standard and simplest way of aligning DNA sequences. 
The system knows about the following count matrices, which can be used as argument of CreateDayMatrices: name Description --------------------------------------------------------------------- HumanMtDNA Human mitochondrial DNA count matrix based on very short PAM evolution, taken from 86 full mtDNA genomes ViralRNA Counts matrix derived from 50 RNA viruses Examples: > CreateDayMatrices(); See Also: ?CreateCodonMatrices ?CreateOrigDayMatrix ?SearchDayMatrix ?CreateDayMatrix ?DayMatrix CreateDayMatrix Function CreateDayMatrix Option: builtin Calling Sequence: CreateDayMatrix(LogMutMatrix,PamNumber) Parameters: Name Type Description --------------------------------------------------------------------------- LogMutMatrix array(array(numeric)) logarithm of a 1-PAM mutation matrix PamNumber numeric desired PAM distance of the result posint..posint range of integer PAM distances Returns: {DayMatrix,list(DayMatrix)} Synopsis: Computes a similarity scoring matrix (usually called Dayhoff matrix) from the logarithm of a 1-PAM mutation matrix (LogMutMatrix) and a PAM distance PamNumber. CreateDayMatrices() assigns the global variable logPAM1 a logarithm of a 1-PAM mutation matrix. If the second argument is an integer range, a list of PAM matrices with all the PAM values in the range will be computed.
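The step from a mutation matrix to a similarity matrix can be sketched in Python on a two-symbol toy alphabet. Everything here is an assumption for illustration: the score formula 10*log10(M[i][j]/f[i]) is the usual textbook definition of a Dayhoff score, the matrix convention (M[i][j] is the probability that j mutates to i) is chosen for the sketch, and integer matrix powers stand in for the exponential of PamNumber times the log-matrix that allows fractional distances. The function names are hypothetical and deletion costs are not modelled.

```python
import math


def mat_mul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dayhoff_scores(m1, freqs, pam):
    """Similarity scores at an integer PAM distance (illustrative sketch).

    m1    : 1-PAM mutation matrix, m1[i][j] = P(j mutates to i)   (assumed)
    freqs : natural frequencies of the symbols
    pam   : positive integer number of 1-PAM steps
    """
    n = len(m1)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(pam):
        m = mat_mul(m, m1)
    # Log-odds of "related at this distance" versus "aligned by chance".
    return [[10.0 * math.log10(m[i][j] / freqs[i]) for j in range(n)]
            for i in range(n)]


toy_m1 = [[0.99, 0.01], [0.01, 0.99]]   # hypothetical two-symbol alphabet
toy_f = [0.5, 0.5]
for row in dayhoff_scores(toy_m1, toy_f, 10):
    print([round(x, 3) for x in row])
```

Identical symbols score positive and mismatches negative at short distances; as the distance grows, all scores drift toward zero because the mutation matrix approaches the background frequencies.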
Examples: > CreateDayMatrix( NewLogPAM1 , 250); DayMatrix(Peptide, pam=250, Sim: max=14.152, min=-5.161, del=-19.814-1.396*(k-1)) See Also: ?CreateDayMatrices ?CreateOrigDayMatrix ?DayMatrix ?SearchDayMatrix CreateMSAMethods Function CreateMSAMethods( ) Creates a list of several default MSA methods CreateOrigDayMatrix Function CreateOrigDayMatrix Option: builtin Calling Sequence: CreateOrigDayMatrix(Mutations,AaCounts,PamNumber) CreateOrigDayMatrix(Mutations,AaCounts,1..UpperPam) Parameters: Name Type -------------------------------- Mutations array(numeric,20,20) AaCounts array(numeric,20) PamNumber numeric Returns: {DayMatrix,list(DayMatrix)} Synopsis: This function computes a Dayhoff matrix (structured type DayMatrix) by the method first given by Dayhoff et al. [DayhoffOS78], given an observed mutation matrix Mutations, a frequency vector AaCounts and a PAM distance PamNumber (or a range of PAM distances beginning at 1). Examples: > OrigTot := [87, 41, 40, 47, 33, 38, 50, 89, 34, 37, 85, 81, 15, 40, 51, 70, 58, 10, 30, 65]; OrigTot := [87, 41, 40, 47, 33, 38, 50, 89, 34, 37, 85, 81, 15, 40, 51, 70, 58, 10, 30, 65] > OrigFreq := OrigTot/sum(OrigTot); OrigFreq := [0.08691309, 0.04095904, 0.03996004, 0.04695305, 0.03296703, 0.03796204, 0.04995005, 0.08891109, 0.03396603, 0.03696304, 0.08491508, 0.08091908, 0.01498501, 0.03996004, 0.05094905, 0.06993007, 0.05794206, 0.00999001, 0.02997003, 0.06493506] > OrigDM := CreateOrigDayMatrix(Mutations1978, OrigFreq, 250); OrigDM := DayMatrix(Peptide, pam=250, Sim: max=17.302, min=-7.510, del=-19.814-1.396*(k-1)) See also: ?CreateDayMatrices ?CreateDayMatrix ?DayMatrix ?SearchDayMatrix CreateParametricQMatrix Function CreateParametricQMatrix - Creates a rate matrix from a frequency vector, Ts/Tv and dN/dS.
Calling Sequence: CreateParametricQMatrix(f,k,w) Parameters: Name Type Description -------------------------------------------------------- f list(nonnegative) frequency vector k nonnegative transition/transversion ratio w nonnegative dN/dS ratio Returns: matrix Synopsis: The function CreateParametricQMatrix creates a rate matrix Q from the frequencies and given kappa and w (omega) parameters. Examples: See also: ?CreateCodonModelMatrices CreateRandMultAlign Function CreateRandMultAlign - Random multiple alignment following a phylogenetic tree Calling Sequence: CreateRandMultAlign(tree,len) CreateRandMultAlign(tree,len,method) CreateRandMultAlign(tree,len,DelType) Parameters: Name Type Description ---------------------------------------------------------------------------- tree Tree Phylogenetic tree len posint Length of root sequence method string (optional) MSA method, default: Probabilistic DelType {ExpGaps,ZipfGaps} (optional) mutation type, default: no gaps Returns: MAlignment Synopsis: Produces a random multiple alignment that is generated from a phylogenetic tree. The DelType is directly passed to the Mutate function, while the method is used for the MAlign function. 
Examples: > tree := Tree(Leaf(A,-7.5000,1),0,Tree(Tree(Leaf(D,-8.5000,4),-7.5000,Leaf(C,-8.5000,3)),-4.5000,Leaf(B,-5.5000,2))): > msa := CreateRandMultAlign(tree,200,ExpGaps); dimensionless fitting index 0.0337 > print(msa); Multiple sequence alignment: ---------------------------- Score of the alignment: 4498.2632 Maximum possible score: 4498.2632 A GTDQPFTNFNGINRFATPGFNPFGALLDNLSVGGVNHIAIEHSGEIEPSVRSNLVTYYVLEKKGFFPTGCVLAL D GTDQPFTNFNGVGMFATPGFNPFGAALDNLSVGGINHIAIEHSGEIEPSVRSNLVTYYVLEKKGFFPTGCVLAL C GTDQPFTNFNGVGMFATPGFNPFGAALDNLSVGGINHIAIEHSGEIEPSVRSNLVTYYVLEKKGFFPTGCVLAL B GTDQPFTNFNGVGRFATPGFNPFGAALDDLSVGGVNHVAIEHSGEIEPSVRSNLVTYYVLEKKGFFPTGCVLAL A LLDPLFLFVSPPECKVLNLFNAKTTVTDNNAPMPIMVSVGKEGADDYVFIHLSFHVPAWRAGDYRLCSSLEFTI D LIDPLFLFVSPPECKVVNLFKAKTTVTNENAPMPIMVPVGAEGVDDYVFIHLSFHVPPWRAGDYRLCSSLEFTN C LIDPLFLFVSPPECKVVNLFKAKTTVTNENAPMPIMVSVGAEGADDYVFIHLSFHVPPWRAGDYRLCSSLEFTN B LIDPLFLFVSPPECKVVNLFKAKTTVTNENAPMPIMVSVGAEGADDYVFIHLSFHVPPWRAGDYRLCSSLEFTN A FENTYWAPYIVTEIGRKRAETSANSQHGDRQSKEKGTRLMVLHTKGLTEPTA D FENTYWAHYIVTEAGRKRAETSANSQHGDRQSKEKGTRLMVLNAKGLTEPTA C FENTYWAHYIVTEAGRKRAETSANSQHGDRQSKEKGTRLMVLNAKGLTEPTA B FENTYWAHYIVTEAGRKRAETSANSQHGDRQSKEKGTRLMVLNAKGLTEPTA See Also: ?BootstrapTree ?MAlign ?Mutate ?CreateRandSeq ?MAlignment ?Tree CreateRandPermutation Function CreateRandPermutation Calling Sequence: CreateRandPermutation(n) Parameters: Name Type ------------- n posint Returns: list(integer) Synopsis: Returns a random permutation of the integers from 1 to n. 
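As an illustration of what "a random permutation of the integers from 1 to n" means, the following Python sketch uses the Fisher-Yates shuffle. It is not Darwin code, and the help text does not say which algorithm Darwin uses internally; the function name create_rand_permutation and the rng parameter are hypothetical.

```python
import random


def create_rand_permutation(n, rng=random):
    """Return a uniformly random permutation of 1..n (Fisher-Yates shuffle).

    Hypothetical stand-in for illustration; not the Darwin implementation.
    """
    perm = list(range(1, n + 1))
    # Walk from the last position down, swapping each element with a
    # randomly chosen element at or before it.
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        perm[i], perm[j] = perm[j], perm[i]
    return perm


if __name__ == "__main__":
    print(create_rand_permutation(5))
```

Every integer from 1 to n appears exactly once in the result, in a random order.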
Examples: > CreateRandPermutation(5); [2, 1, 5, 3, 4] See Also: ?CreateRandSeq ?Permutation ?SetRand ?Shuffle ?Mutate ?Rand ?SetRandSeed CreateRandSeq Function CreateRandSeq - create a random sequence Calling Sequence: CreateRandSeq(len,F) Parameters: Name Type Description --------------------------------------------- len posint sequence length F array(numeric) character frequencies Returns: string Synopsis: Given a list of frequencies of length 4, this function creates a random nucleotide sequence of length len. When given a list of 20 (amino acid) frequencies, it generates a random amino acid sequence. A list of length 64 or 65 produces a random codon sequence without stop codons. Examples: > CreateRandSeq(20, [0.2, 0.3, 0.4, 0.1]); AGGCCCCCGGACAAGCGGGA See also: ?CreateRandPermutation ?Rand ?SetRand ?SetRandSeed ?Shuffle CreateString Function CreateString - Creates a string of defined length and initialization Option: builtin Calling Sequence: CreateString(len) CreateString(len,z) Parameters: Name Type Description ------------------------------------------------------------------------ len {0,posint} integer length of the string z string initialization value of each character of the string Returns: string Synopsis: Create a new string of the given length and initialize it, setting each character to the initialization value (default: blank). Examples: > x := CreateString(6); x := > y := CreateString(10, d); y := dddddddddd See also: ?CreateArray CreateSynMatrices Function CreateSynMatrices - Creates a global list of SynPAM matrices. Calling Sequence: CreateSynMatrices() CreateSynMatrices(setname) Parameters: Name Type Description ------------------------------------------ setname string name of predefined set. Returns: NULL Global Variables: SynMS Synopsis: When called with a set name, the precomputed count matrices are loaded and used to create the global scoring matrices. By default, the count matrix from the OMA project is used.
The function then sets all non-synonymous mutation counts to zero and uses this matrix to create the global list SynMS with 1000 scoring matrices of various SynPAM distances. Examples: > CreateSynMatrices(); > CreateSynMatrices(mus); See Also: ?CodonDynProgStrings ?CreateCodonMatrices ?EstimateSynPAM ?CodonMatrix ?CreateDayMatrices CreateTreeConstruction Function CreateTreeConstruction( type:string ) Creates a reasonable TreeConstruction data structure for the given type. type may be one of the following: prob, phylip, linear, dynamic CreateTreeConstructions Function CreateTreeConstructions( ) Creates a selection of tree construction algorithms. An optional argument specifies the number of different algorithms: SMALL: 4 different methods MEDIUM: 14 methods LARGE: 40 methods CreateTreeStatistics Function CreateTreeStatistics( Constructions:array(TreeConstruction), Trees:array(Tree) ) Creates an array of TreeStatistics of TreeConstruction and Tree Cumulative Function Cumulative - compute the cumulative probability for x Option: polymorphic Calling Sequence: Cumulative(distr,x) Parameters: Name Type Description ------------------------------------------------------------ distr anything description of a probability distribution x numeric a number Returns: numeric Synopsis: This function computes the probability that a randomly distributed variable with distribution "distr" has a value less than or equal to x. This is normally called the cumulative probability distribution. The result is between 0 and 1 inclusive. The format describing the distribution is the same as the one used by Rand.
If x is continuously distributed, with density f(x), then the cumulative is: $\mathrm{Cumulative}(f, x) = \int_{-\infty}^{x} f(t)\,dt$. If the distribution is a discrete distribution, say over the integers, then the cumulative is defined as: $\mathrm{Cumulative}(f, x) = \frac{1}{2} f(x) + \sum_{t=-\infty}^{x-1} f(t)$. The system knows how to compute the Cumulative distributions of: {Binomial, ChiSquare, LogIndepEvents, Normal, U}. If the arguments are such that the value returned is too close to 1 or too close to 0 for accurate representation, consider using CumulativeStd which returns its result in equivalent standard deviations and will not suffer from precision problems. The relations between the Cumulative (c) and the CumulativeStd (s) are the following: $c = \frac{1}{2}\left(1 + \operatorname{erf}\left(s/\sqrt{2}\right)\right)$, $c = 1 - \frac{1}{2}\operatorname{erfc}\left(s/\sqrt{2}\right)$, $c = \frac{1}{2}\operatorname{erfc}\left(-s/\sqrt{2}\right)$. References: Erdelyi53, Handbook of Mathematical functions, Abramowitz and Stegun, 7.1 Examples: > Cumulative( Binomial(10,0.5), 5 ); 0.5000 > Cumulative( U(0,10), 7.5 ); 0.7500 See Also: ?CumulativeStd ?ProbBallsBoxes ?Rand ?Std_Score ?OutsideBounds ?ProbCloseMatches ?StatTest CumulativeStd Function CumulativeStd - cumulative probability in standard deviations Option: polymorphic Calling Sequence: CumulativeStd(distr,x) Parameters: Name Type Description ------------------------------------------------------------ distr anything description of a probability distribution x numeric a number Returns: numeric Synopsis: This function computes the probability that a randomly distributed variable with distribution "distr" has a value less than or equal to x. This is normally called the cumulative probability distribution. The result is returned in standard deviations of an equivalent Normal(0,1) distribution. This is useful when the result is exponentially close to 1 (or to 0) and returning the probability would cause large truncation errors. The format describing the distribution is the same as the one used by Rand.
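The link between a cumulative probability c and its equivalent in standard deviations s can be sketched in Python. This is not Darwin code: the function names are hypothetical, the discrete case uses the half-weight convention stated in this help file, and the conversion from c to s relies on Python's statistics.NormalDist rather than on whatever Darwin does internally.

```python
import math
from statistics import NormalDist


def cumulative_binomial(n, p, x):
    """Discrete cumulative with half weight on x: 1/2 f(x) + sum of f(t) for t < x."""
    def f(t):
        return math.comb(n, t) * p**t * (1 - p) ** (n - t)
    return 0.5 * f(x) + sum(f(t) for t in range(0, x))


def cumulative_uniform(a, b, x):
    """Continuous cumulative of the uniform distribution on [a, b]."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def std_to_cumulative(s):
    """c = 1/2 (1 + erf(s / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))


def cumulative_to_std(c):
    """Inverse of std_to_cumulative; requires 0 < c < 1."""
    return NormalDist().inv_cdf(c)


if __name__ == "__main__":
    # Same arguments as the Cumulative and CumulativeStd examples in this file.
    print(round(cumulative_binomial(10, 0.5, 5), 4))
    print(round(cumulative_uniform(0, 10, 7.5), 4))
    print(round(cumulative_to_std(cumulative_binomial(10, 0.5, 6)), 4))
    print(round(cumulative_to_std(cumulative_uniform(0, 10, 9.75)), 4))
```

Working directly in standard deviations avoids the loss of precision that occurs when c is so close to 0 or 1 that a floating-point probability can no longer distinguish it.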
If x is continuously distributed, with density f(x), then the cumulative is: $\mathrm{Cumulative}(f, x) = \int_{-\infty}^{x} f(t)\,dt$. The system knows how to compute the Cumulative distributions of: {Binomial, ChiSquare, LogIndepEvents, Normal, U}. If the distribution is a discrete distribution, say over the integers, then the cumulative is defined as: $\mathrm{Cumulative}(f, x) = \frac{1}{2} f(x) + \sum_{t=-\infty}^{x-1} f(t)$. Examples: > CumulativeStd( Binomial(10,0.5), 6 ); 0.5995 > CumulativeStd( U(0,10), 9.75 ); 1.9600 See Also: ?Cumulative ?ProbBallsBoxes ?Rand ?Std_Score ?OutsideBounds ?ProbCloseMatches ?StatTest CurrentOff Function CurrentOff Option: builtin Calling Sequence: CurrentOff() Returns: integer Synopsis: Returns the current file pointer offset when reading. This runs the C function ftell() on the current input descriptor. See also: DataMatrix Data structure DataMatrix( ) Function: creates a data structure to keep a DataMatrix. A DataMatrix can be: - an AllAll (array of matches) - a matrix of PAM distances or other metrics - a matrix of Scores The data structure keeps all three kinds of data types. If any of them is not specified, then the field is 0. If an AllAll is given, then both score and pam matrices are extracted automatically. If only scores or PAM distances are given, the other two fields are 0. Selectors: TYPE: string, describes the type of data used DISTANCE: array of (PAM or other positive) distances SCORE: array of scores (or other similar measures) The type is used for example for the calculation of TSP. If the data has a distance flavor, then shorter distances are better. But if the data is a score, then a higher score is better. If no type is specified, then PAM is assumed (a distance measure) TSP: returns optimal path in the form [a, b, c, .. , a] (the last element is repeated) if possible (i.e. if pam data is available) use this to calculate the TSP order The result is saved in the data structure.
So it will only compute the best order if the field is 0; otherwise the last result is returned. RAW: matrix returns the original data, i.e. an AllAll matrix DATA: matrix returns the distance or score matrix or 0 if none is there VAR: calculates variances of PAM distances of AllAll; if no AllAll is given, it returns the data matrix SEQ: associated sequences (optional) Constructors: d := DataMatrix(); d := DataMatrix("SCORE", AllAll); d := DataMatrix("PAM", some_distance_matrix); DayMatrix Class DayMatrix - similarity scoring matrix or Dayhoff matrix Template: DayMatrix(PAM) CreateDayMatrix(logPAM1,PAM) CreateDayMatrices() CreateOrigDayMatrix() Fields: Name Type Description ----------------------------------------------------------------------------- DelCost procedure proc(k,pam) gives cost of k-long indel Dimension posint dimension of the similarity matrix FixedDel numeric fixed cost (opening) for affine indels IncDel numeric incremental cost for affine indels logPAM1 matrix rate matrix used for this DayMatrix Mapping procedure proc to map symbols to matrix indices MaxOffDiag numeric maximum similarity for distinct residues MaxSim numeric maximum similarity score in the matrix MinSim numeric minimum similarity score in the matrix PamDistance numeric PAM distance of this matrix PamNumber numeric PAM distance of this matrix Sim matrix(numeric) Similarity matrix of scores (see below) StopSimil numeric cost of matching a stop codon type symbol type of scoring matrix, Peptide or Nucleotide Methods: DayMatrix_type print Synopsis: A DayMatrix is the data structure or class which holds similarity scores computed from mutation matrices. The matrices are used for alignment of sequences. The scores have a precise mathematical meaning: they are 10 times the log10 of the probability that the alignment comes from homology as opposed to a random coincidence. Hence alignment scores give a rough estimate of how rare the alignment is if it were produced by chance only.
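The "10 times log10" scale can be made concrete with a small Python sketch. It assumes the score is read as 10*log10 of the ratio between the probability under homology and the probability under chance, which is one reading of the sentence above; the function names are hypothetical and this is not Darwin code.

```python
import math


def score_to_odds(score):
    """Odds (homology : chance) implied by a score on the 10*log10 scale."""
    return 10.0 ** (score / 10.0)


def odds_to_score(odds):
    """Inverse conversion: 10 * log10 of the odds ratio."""
    return 10.0 * math.log10(odds)


if __name__ == "__main__":
    # Every additional 10 score units multiplies the odds by 10.
    for s in (0, 10, 30):
        print(s, score_to_odds(s))
```

Under this reading, a score of 0 means homology and chance are equally likely, and negative scores favor chance.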
The functions which create DayMatrices (CreateDayMatrices) normally assign a dense array of DayMatrix to the variable DMS (to allow estimation of distances between sequences) and a 250-PAM matrix to DM (the most commonly used matrix). Currently, DayMatrices are internal objects. The functions mentioned in the Template part above are used to create DayMatrices. Some other commonly used scoring matrices can be obtained by the command Matrices (). When DayMatrix is used as a constructor (first entry above), it searches the list of DayMatrix DMS for a matrix of the right PAM and returns it. If none is found, it calls CreateDayMatrix to build an appropriate one. When selecting the similarity matrix from a DayMatrix (selector Sim), a new matrix is constructed and returned. If the selection on Sim is immediately followed by two indices, then no matrix is constructed and the corresponding entry of the Dayhoff matrix is returned. For this special case, (e.g. DM[Sim,a,b]), the selectors a and b can be the one letter codes for the amino acids. This is more efficient and simpler than invoking the AToInt conversion. Examples: > CreateDayMatrices(); > DM[Sim,1,1]; 2.3562 > DMS[100,Sim,L,I]; -17.8435 > DayMatrix(316); DayMatrix(Peptide, pam=316, Sim: max=12.964, min=-3.989, del=-19.057-1.396*(k-1)) See Also: ?CreateDayMatrices ?CreateOrigDayMatrix ?SearchDayMatrix ?CreateDayMatrix ?Matrices DayMatrixScale Function DayMatrixScale Calling Sequence: DayMatrixScale(dm) Parameters: Name Type ---------------- dm DayMatrix Returns: numeric Synopsis: Computes the scaling factor lambda of dm such that sum (f[i]*f[j]* exp(lambda*dm[Sim,i,j])) = 1. For Dayhoff-like matrices DM, DayMatrixScale (DM) = ln (10) / 10. 
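Why lambda equals ln(10)/10 for scores on the 10*log10 scale can be checked numerically. The Python sketch below builds a toy two-letter model (the joint probabilities are invented purely for illustration), derives log-odds scores from it, and verifies the defining sum. It is not Darwin code and does not use any Darwin matrix.

```python
import math

LAMBDA = math.log(10) / 10  # scaling factor for 10*log10 scores

# Hypothetical joint probabilities of aligned symbol pairs (they sum to 1).
joint = [[0.40, 0.10],
         [0.10, 0.40]]
# Background frequencies are the marginals of the joint distribution.
freq = [sum(row) for row in joint]

# Dayhoff-like scores: 10 * log10( p[i][j] / (f[i] * f[j]) ).
sim = [[10 * math.log10(joint[i][j] / (freq[i] * freq[j])) for j in range(2)]
       for i in range(2)]


def scale_sum(lam):
    """sum over i, j of f[i] * f[j] * exp(lam * sim[i][j])."""
    return sum(freq[i] * freq[j] * math.exp(lam * sim[i][j])
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(round(LAMBDA, 4))
    print(scale_sum(LAMBDA))
```

With lambda = ln(10)/10, each term exp(lambda*sim[i][j]) equals p[i][j]/(f[i]*f[j]), so the weighted sum collapses to the sum of the joint probabilities, which is 1.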
Examples: > DayMatrixScale( DM ); 0.2303 See Also: ?CreateDayMatrices ?CreateOrigDayMatrix ?DayMatrix ?SearchDayMatrix DbToDarwin Function DbToDarwin - Make a darwin-readable version of SwissProt Calling Sequence: DbToDarwin(inp,outfile,descr,TagsToKeep) Parameters: Name Type Description ------------------------------------------------------------------- inp string the complete input database as a string outfile string name of the output file (database) descr string any commentary TagsToKeep list(string) tags to keep from SwissProt Returns: NULL Synopsis: Converts a SwissProt formatted text (inp) into a file (outfile) usable by Darwin. This program requires a lot of main memory (as much as the original input file). Make sure that you have enough memory by using (in unix) "unlimit datasize memoryuse". Once the new database is created, the first time the command "ReadDb (SwissProt40);" is executed, the index of the database will be built. Building the index can take quite a bit of CPU time. This time is spent only once; future uses of the database will not require any index building. You will find that Darwin creates a file named "SwissProt40.tree". This index file is the Pat tree for all the peptides and is needed for most of the basic operations of Darwin. You must have write permissions in the directory in which the database is stored to create the tree (only the first time the database is loaded). If an index is not needed (no fast searches will be possible), creating an empty SwissProt40.tree file will indicate to ReadDb that the user does not want an index.
Examples: > DbToDarwin( ReadRawFile('sprot40.dat'), 'SwissProt40', ReadRawFile('relnotes.txt'), ['AC','DE','OS','KW'] ); See also: ?ConsistentGenome ?DB ?GenomeSummary ?ReadDb Denormalize Function Denormalize Calling Sequence: Denormalize(m) Parameters: Name Type ------------------ m NucPepMatch Returns: NucPepMatch Synopsis: Denormalizes a match that references a sequence present in memory so that it refers to (the complement of) a NucDB database entry. Examples: See also: ?Normalize Description Class Description - contains structured information on a function Methods: Description_type Document error HTMLC latex select string Synopsis: This class contains structured information on a function suitable to build the "description" entry for a function. This structure establishes the "official" format for description of functions in Darwin. Description allows an arbitrary number of parameters. The first argument describes the function/class/variable/iterator being described. It is a structure, where the name of the structure is one of function/class/variable/iterator and the only field is the name of the object being described. E.g. function(sin), structure(Stat), variable('Pi'). Paragraph(), Indent(), Table() or any other structure valid in a Document can appear at any place and will insert a paragraph/table etc. of text at that point. The following arguments are optional, but must be given in this order. Summary( string ) Summary has a short English description of what the function does. It should fit in one line together with the name of the function. It is better if it does not start with a capital letter. CallingSequence( noeval( Func(Ver1) ), noeval( Func(Ver2) )) CallingSequence contains examples of how the function may be called. These are typically surrounded by noeval() to prevent their execution. They serve as a pattern for the one or many ways of using the object. The names of the arguments used will be described later.
Parameters( [param1, type1, description1], [param2, type2, description2], ...) Parameters holds the name of the parameters, their types, and a short description for each of them. Make sure that the 3 columns fit in the width of a normal page (80 columns). For a data structure/class, the names represent the fields of the structure. Selector( [selname1, type1, description1], [selname2, type2, description2], ... ) For data structures/classes, these provide the type and description of the explicit and computed selectors. Returns( type ) Returns( [type,description] ) Returns describes the value returned. When it is obvious what it is, the type information is enough; otherwise, a description may be added. Synopsis( string, string, ... ) Like a Paragraph, Synopsis contains the description of what the function does/computes. References( string, string, ... ) Provides a format for citations related to the object. Keywords( string, string, ... ) Keywords related to this help topic. Examples( ) Examples contains examples of how the function is used. They will appear sequentially and with their output. Examples( ) have five formats: Quoted string: the statement contained in quotes is executed and the statement and its output are printed out. If the string is terminated with a colon (":"), then its output will not be part of the help file (like in a Darwin session). No semicolon is needed at the end, one is added if necessary. Fake(commands,output): The first element is the input to Darwin, the second element is the desired output. Nothing is evaluated (e.g. an assignment is not executed). This is convenient when the action being described interacts with the system (show a Plot, write a file, etc.) Hide(command): This executes a statement but does not print out either the input or the output. It is useful when we want to prepare for the execution or undo some action. Unassign(string, string, ... 
): The arguments, which should be strings, are assumed to be names that were assigned in the example and need to be unassigned. Do not leave names assigned, as these are almost certain to cause trouble when we generate the entire set of help files. Print(command): The command is expected to print (which, unless precautions are taken, will end up printing in the wrong place). This command collects the printing output in a file and inserts it appropriately. Must be used for all the commands which print in one way or another. SeeAlso( token, [token,description], ... ) SeeAlso contains a series of tokens suitable for additional references (and an optional description if necessary) DigestAspN Function DigestAspN - return digestion fragments from AspN Calling Sequence: DigestAspN(seq) Parameters: Name Type Description ---------------------------------- seq string a protein sequence Returns: list(string) Synopsis: This function returns a set of fragment sequences of seq as though seq were digested by AspN. See Also: ?DigestionWeights ?DynProgMass ?ProbBallsBoxes ?DigestSeq ?DynProgMassDb ?ProbCloseMatches ?DigestTrypsin ?enzymes ?Protein ?DigestWeights ?MassProfileResults ?SearchMassDb DigestSeq Function DigestSeq - return digestion fragments Calling Sequence: DigestSeq(seq,enzyme) Parameters: Name Type --------------- seq string enzyme string Returns: list(string) Synopsis: Return the protein fragments that would result from a digestion with the given enzyme.
Examples: > DigestSeq('WWWWWWPCPLTTTTTTTTT', Armillaria ); [WWWWWWP, CPLTTTTTTTTT] See Also: ?DigestAspN ?DynProgMass ?ProbBallsBoxes ?DigestionWeights ?DynProgMassDb ?ProbCloseMatches ?DigestTrypsin ?enzymes ?Protein ?DigestWeights ?MassProfileResults ?SearchMassDb DigestTrypsin Function DigestTrypsin - return digestion fragments from Trypsin Calling Sequence: DigestTrypsin(seq) Parameters: Name Type Description ---------------------------------- seq string a protein sequence Returns: list(string) Synopsis: This function returns a set of fragment sequences of seq as though seq were digested by trypsin. See Also: ?DigestAspN ?DynProgMass ?ProbBallsBoxes ?DigestionWeights ?DynProgMassDb ?ProbCloseMatches ?DigestSeq ?enzymes ?Protein ?DigestWeights ?MassProfileResults ?SearchMassDb DigestWeights Function DigestWeights - return weights of digestion fragments Calling Sequence: DigestWeights(seq,enzyme) Parameters: Name Type Description --------------------------------------------- seq string a protein sequence enzyme matrix(boolean) Returns: list(numeric) Synopsis: Return the weights of the protein fragments that would result from a digestion with the given enzyme. Examples: > DigestWeights('WWWWWWPCPLTTTTTTTTT', Armillaria ); [1232.3950, 1241.3660] See Also: ?DigestAspN ?DynProgMass ?ProbBallsBoxes ?DigestionWeights ?DynProgMassDb ?ProbCloseMatches ?DigestSeq ?enzymes ?Protein ?DigestTrypsin ?MassProfileResults ?SearchMassDb DisconMinimize Function DisconMinimize Calling Sequence: DisconMinimize(f,iniguess,epsini,epsfinal) Parameters: Name Type ------------------------- f procedure iniguess array(numeric) epsini numeric epsfinal numeric Returns: x, f(x) Global Variables: DisconMinimize_feval Synopsis: Starting at iniguess, this function minimizes f until the argument accuracy in each dimension is less than or equal to epsfinal (for discontinuous function f). 
See Also: ?BFGSMinimize ?MaxLikelihoodSize ?MinimizeBrent ?MinimizeSD ?MaximizeFunc ?Minimize2DFunc ?MinimizeFunc ?NBody DisconnectTcp Function DisconnectTcp Option: builtin Calling Sequence: DisconnectTcp() Returns: NULL Synopsis: Closes the connection to the IPC daemon. Examples: > r := traperror(ConnectTcp('/tmp/.ipc/darwin', false)); > SendTcp('PING'); r := ReceiveTcp(3); r := PING OK > SendTcp('MSTAT linneus1'); r := ReceiveTcp(3); r := DATA linneus1 0:OK ALIVE > DisconnectTcp();; See Also: ?ConnectTcp ?ParExecuteIPC ?ReceiveDataTcp ?SendTcp ?darwinipc ?ParExecuteSlave ?ReceiveTcp ?ipcsend ?ParExecuteTest ?SendDataTcp DoGapHeuristic Function DoGapHeuristic( msa:MAlignment, gaph:GapHeuristic ) Heuristics for gap alignment. The following algorithms are implemented: a) gap fusion and shifting b) stacking of gap blocks c) left-right shifting of gap blocks d) random shifting of gap blocks Parameters: msa: MAlignment data structure. An alignment must exist. gaph: GapHeuristic data structure. See the description there. It holds all parameters needed for the gap heuristics, such as maxgaps (max. number of gaps to combine) etc. With "gh := GapHeuristic()", default values are used. As a third parameter the algorithm can be specified (it can also be specified in the GapHeuristic data structure). The following values are valid: ALL: all heuristics are used FUSION: gap fusion and shifting is used STACKING: gap block stacking is used SHIFTING: gap block left-right shifting is used RANDOM: random gap block shifting is used As a fourth parameter a flag can be specified (it can also be specified in the GapHeuristic data structure). The following values are valid: NORMAL: in each round the values stay the same INCREMENTAL: in each round the values are increased by one RANDOM: in each round the values are changed randomly; the maximum values are the ones initially used DocEl Class DocEl - Adds metainformation to some content Template: DocEl(tag,content1,...)
Returns: DocEl Fields: Name Type Description ------------------------------------------------------------- tag string the tag added to the content content_i {string,structure} the content of the element Methods: DocEl_type HTMLC LaTeXC string Synopsis: DocEl is only meaningful in the context of a structured output format such as LaTeX or (X)HTML. If used in a normal print statement, DocEl will just output the content parameters. If used in a LaTeXC statement, DocEl will wrap the content in a latex tag. Examples: > d := DocEl( 'author', 'John Doe' ); d := DocEl(author,John Doe) > print(d); John Doe > prints(LaTeXC(d)); \author{John Doe} See Also: ?Block ?HTML ?List ?RunDarwinSession ?Code ?HyperLink ?Paragraph ?screenwidth ?Color ?Indent ?PostscriptFigure ?Table ?Copyright ?LastUpdatedBy ?print ?TT ?Document ?latex ?Roman ?View Document Class Document - holds contents of a human-readable document Template: Document(content1,content2,...) Returns: Document Fields: Name Type Description ------------------------------------------------------------- content_i {string,structure} the contents of the Document Methods: Document_type HTMLC LaTeXC print string Synopsis: The Document structure holds text and other structures which are expected to be laid out as a Document. When a Document is converted, each content_i is converted to the same target. Normally a Document is converted to a string, HTML or Latex. Besides text, the following structures are valid inside Documents: Name/Use Description -------------------------------------------------------------------------- Alphabetical(int) Convert a number to alphabetical numerals Alphanumerical(int) Convert a number to alphanumerical numerals Bold(txt,...) Bold text Center(txt,...) Center the contents Code(txt,...) 
preformatted, equally spaced text Color(code,txt) Color the contents Copyright(who) Insert copyright symbol, year and argument. Font(font,txt,...) Set contents with a given font HyperLink(txt,URL) URL linked data Indent(txt,...) Indented data IT(txt,...) Italic text LastUpdatedBy(who) Convenient macro to end Document page. List(format,txt,...) List/Definitions/bullets MapleFormula(string) mathematical formula in Maple format Ordinal(int) Convert a number to its ordinal ending Paragraph(int,txt,...) A paragraph of text, lines adjusted PlusMin(string) Expand +- to proper plus-minus symbols PostscriptFigure(psfile,...) Figure from postscript source Roman(int) Convert a number to roman numerals SectionHeader(lev,txt) Section/subsection header Size(size,txt,...) Set contents to a given size Table(...) Tabular data TT(txt,...) tty format (equally spaced font) where txt means a string or any structure that will represent text. Examples: > d := Document( Paragraph(2,Hi), Indent(5,List(Roman,first,second)) ); d := Document(Paragraph(2,Hi),Indent(5,List(Roman,first,second))) > print(d); Hi I first II second See Also: ?Block ?HTML ?List ?RunDarwinSession ?Code ?HyperLink ?Paragraph ?screenwidth ?Color ?Indent ?PostscriptFigure ?Table ?Copyright ?LastUpdatedBy ?print ?TT ?DocEl ?latex ?Roman ?View DownloadURL Function DownloadURL Calling Sequence: DownloadURL(url,filename) Parameters: Name Type Description ---------------------------------------- url string a URL filename string filename to save URL Returns: string Synopsis: Downloads a URL and saves its content in a file.
See Also: ?OpenAppending ?OpenWriting ?ReadRawLine ?SearchDelim ?OpenReading ?ReadLine ?ReadURL ?SplitLines DrawDistribution Function DrawDistribution Calling Sequence: DrawDistribution(sample) Parameters: Name Type Description ------------------------------------------------------------------ sample array([numeric, numeric]) [mean,variance] values anything (optional) see ?PlotArguments Returns: NULL Synopsis: Draws a distribution curve as a superposition of normal distributions based on [mu,sigma^2] values. Each entry in the array sample is interpreted as one distribution given by the pair [average,variance]. The results are usually stored in a file as with DrawPlot. They can be seen with ViewPlot(). Examples: > DrawDistribution( [ [0,1], [10,1], [5,10] ] ); ViewPlot(); See Also: ?BrightenColor ?DrawPointDistribution ?Set ?ColorPalette ?DrawStackedBar ?SmoothData ?DrawDotplot ?DrawTree ?StartOverlayPlot ?DrawGraph ?GetColorMap ?StopOverlayPlot ?DrawHistogram ?Plot2Gif ?ViewPlot ?DrawPlot ?PlotArguments DrawDotplot Function DrawDotplot Calling Sequence: DrawDotplot(data,legend) Parameters: Name Type -------------------------------------------------------------------- data {array([numeric, numeric]),list(array([numeric, numeric]))} legend string Returns: NULL Synopsis: Plots data points as dots (circle, crosses, squares, triangles). 
Examples: See Also: ?BrightenColor ?DrawPointDistribution ?Set ?ColorPalette ?DrawStackedBar ?SmoothData ?DrawDistribution ?DrawTree ?StartOverlayPlot ?DrawGraph ?GetColorMap ?StopOverlayPlot ?DrawHistogram ?Plot2Gif ?ViewPlot ?DrawPlot ?PlotArguments DrawGraph Function DrawGraph - draw a graph in two dimensions Calling Sequence: DrawGraph(G) DrawGraph(G,modif) Parameters: Name Type Description ------------------------------------------------------------------------- G Graph an input Graph modif {string,symbol = anything} (optional) modifiers for the drawing Returns: NULL Global Variables: printlevel Synopsis: DrawGraph uses the plot facility to display a Graph in two dimensions. The first argument, G, should be a Graph data structure. The positioning of the nodes and other properties of appearance depend on the optional arguments: Mode of node positioning for NBody problem (default: equal): equal: All the edges have an equal initial distance and variance. distance: The Edges' labels correspond to distances between the adjacent nodes. The variances of the distances are assumed to be equal to the distances. Edges having a non-positive label are ignored for the fitting. weight: The Edges' labels correspond to weights or scores between the adjacent nodes. They are converted to distances by taking the inverse of the weights. Variances are assumed to be equal to the distances. Edges having a non-positive label are ignored for the fitting. procedure: A procedure Edge -> [dist, var] that assigns a distance and a variance to an edge. Edge drawing and labeling (default: unlabeled): EdgeDrawing=unlabeled: Edges are drawn without any label. EdgeDrawing=labeled: The label of each edge is drawn centered on the line and in the same color as the edge. EdgeDrawing= A procedure (x1,y1,x2,y2,label,ts,col) -> list( drawing commands ). x1,y1,x2,y2 are the starting and end points of the edge, label is its label, ts the desired textsize and col the color.
Node drawing: Nodes are represented with a circle and the node description NodeDrawing= A procedure (x,y,label,ts,col) -> list( drawing commands ), where the node with label 'label' is centered at (x,y). ts is the desired textsize and col the color. Size of Text: TextSize= Set point-size for all text Nodes and edges can be colored using the optional argument, which takes a list of arguments of the following form: Color( colorname, obj1, obj2, ... ). The objects are either Nodes(), Edges() or Edge() data structures. This means that those edges or nodes will be colored with colorname. The valid names for colorname are defined in lib/Color. The output is directed according to plotsetup. See Also: ?BrightenColor ?DrawPointDistribution ?Set ?ColorPalette ?DrawStackedBar ?SmoothData ?DrawDistribution ?DrawTree ?StartOverlayPlot ?DrawDotplot ?GetColorMap ?StopOverlayPlot ?DrawHistogram ?Plot2Gif ?string_RGB ?DrawPlot ?PlotArguments ?ViewPlot DrawHistogram Function DrawHistogram - single or multiple (side by side) histogram Calling Sequence: DrawHistogram(data,labels,legend) Parameters: Name Type Description -------------------------------------------------------------------------- data array(numeric) data values, dim n (single histogram) or data matrix(numeric) data values dim m x n (multiple histograms) labels array (optional) dim n labels of each vertical bar(s) legend array (optional) dim m description of each histogram anything (optional) see ?PlotArguments Returns: NULL Synopsis: DrawHistogram produces a plot of a histogram of the numerical values given in data. That is, when "data" is a single array (dim n), hence a single histogram, each numerical value of data is represented by a vertical bar with its height proportional to its data value. The data values are printed at the top of each bar. When "data" is a matrix (dim m x n), this means that m values will be plotted together; this will be done with proportional vertical bars, side by side.
To have the m values stacked on top of each other (instead of side by side), use ?DrawStackedBar. The results of DrawHistogram are placed in a file, following the same conventions as DrawPlot. The plot can be seen with ViewPlot(). Examples: > DrawHistogram( [1,2,3,4,3,2,1] ); ViewPlot(); > DrawHistogram( [ [ 38, 180, 42 ], [ 42, 40, 48] ], [ 'politicians', 'darwin users', 'boxers'], [ 'IQ', 'shoe size' ] ); > ViewPlot(); See Also: ?BrightenColor ?DrawPointDistribution ?Set ?ColorPalette ?DrawStackedBar ?SmoothData ?DrawDistribution ?DrawTree ?StartOverlayPlot ?DrawDotplot ?GetColorMap ?StopOverlayPlot ?DrawGraph ?Plot2Gif ?ViewPlot ?DrawPlot ?PlotArguments DrawPlot Function DrawPlot - produce a plot/drawing in a file Option: builtin Calling Sequence: DrawPlot(p,lo..hi) DrawPlot(numlist) DrawPlot(pairlist) DrawPlot(objlist) DrawPlot(plotset) DrawPlot(plotset,lo..hi) Parameters: Name Type Description --------------------------------------------------------------------------- p procedure a numerical procedure lo..hi numeric..numeric numerical range to plot numlist list(numeric) a list of values joined by lines, the values are interpreted as coordinates (i,y[i]) pairlist list([numeric, numeric]) a list of pairs (x[i],y[i]) joined by straight lines objlist list(object) a list of objects (described below) plotset set(plots) a list of any of the above plots The format of the objects in objlist is: ----------------------------------------------------------------------- left aligned text LTEXT(x,y,string,points,angle,color) centered text CTEXT(x,y,string,points,angle,color) right aligned text RTEXT(x,y,string,points,angle,color) line LINE(x1,y1,x2,y2,color,width) closed polygon POLYGON(x1,y1,x2,y2,..., fill,color,width) circle CIRCLE(x,y,radius,fill,color,width) ----------------------------------------------------------------------- x,x1,x2,y,y1,.. 
numeric values of coordinates points points=, size in points of the text angle angle=, angle of the text in degrees color color=[r,g,b], values of red/green/blue within 0..1 (color is incompatible with fill) fill fill=, fill in color (0-black, 1-white) (fill is incompatible with color) width width=, width of lines in points Returns: NULL Synopsis: Plot a set of objects creating PostScript output which is stored in a file. The name of the file can be set using Set(plotoutput). By default it is "temp.ps". It is assumed that all the objects being drawn are on the same x and y coordinates, that is all the x and y values are on the same units. Optional arguments are: keyword description ------------------------------------------------------------------ proportional causes identical scaling for x and y axis axis forces x and y axes to be drawn grid forces a grid of lines to be drawn topmargin=xx xx (user) units of space are added at the top botmargin=xx xx (user) units of space are added at the bottom leftmargin=xx xx (user) units of space are added as left margin rightmargin=xx xx (user) units of space are added as right margin See Also: ?BrightenColor ?DrawPointDistribution ?Set ?ColorPalette ?DrawStackedBar ?SmoothData ?DrawDistribution ?DrawTree ?StartOverlayPlot ?DrawDotplot ?GetColorMap ?StopOverlayPlot ?DrawGraph ?Plot2Gif ?ViewPlot ?DrawHistogram ?PlotArguments DrawPointDistribution Function DrawPointDistribution - histogram of point distribution Calling Sequence: DrawPointDistribution(data,Bars) Parameters: Name Type Description ------------------------------------------------------------------- data array(numeric) data values, not necessarily ordered Bars posint (optional) number of ranges, histogram bars anything (optional) see ?PlotArguments Returns: NULL Synopsis: DrawPointDistribution produces a plot of a histogram of the distribution of the given data points. The data values are sorted and classified in a number of equally spaced ranges. 
For each range a histogram (vertical bar) with the number of points in that range is drawn. This produces a discrete approximation of the density distribution of the points in data. The data values do not need to be in order. The number of vertical bars is automatically computed or it can be set with an optional second argument. The results of DrawPointDistribution are placed in a file, following the same conventions as DrawPlot. The plot can be seen with ViewPlot(). Examples: > DrawPointDistribution( [seq(Rand(Normal),i=1..500)] ); ViewPlot(); See Also: ?BrightenColor ?DrawHistogram ?Plot2Gif ?StopOverlayPlot ?ColorPalette ?DrawPlot ?PlotArguments ?ViewPlot ?DrawDistribution ?DrawStackedBar ?Set ?DrawDotplot ?DrawTree ?SmoothData ?DrawGraph ?GetColorMap ?StartOverlayPlot DrawSplitGraph Function DrawSplitGraph( g:Graph, angles:array(numeric), title:string ) Draws graph g with edge e at angle angles[e[1,2]]. DrawSplits Function DrawSplits( splits:list([numeric, set]), all:{posint,set} ) Draws a graph from a list of dSplits. all is the set of all taxa of the split or a posint if the set is 1..all. DrawStackedBar Function DrawStackedBar - histogram with multiple values on each bar Calling Sequence: DrawStackedBar(data,labels,legend) Parameters: Name Type Description ------------------------------------------------------------------------ data matrix(numeric) data values dim m x n labels array (optional) dim n, labels of each vertical bar legend array (optional) dim m, description of each stack anything (optional) see ?PlotArguments Returns: NULL Synopsis: DrawStackedBar produces a histogram of the numerical values given in data. Each vertical bar is composed of several segments, corresponding to the m lists of values, stacked on top of each other. The data values are printed inside each stacked segment of the bars. To have the m values side by side (instead of stacked), use ?DrawHistogram. 
The results of DrawStackedBar are placed in a file, following the same conventions as DrawPlot. The plot can be seen with ViewPlot(). Examples: > DrawStackedBar( [ [ 38, 180, 42 ], [ 42, 40, 48] ], [ 'politicians', 'darwin users', 'boxers'], [ 'IQ', 'shoe size' ] ); > ViewPlot(); See Also: ?BrightenColor ?DrawPlot ?Set ?ColorPalette ?DrawPointDistribution ?SmoothData ?DrawDistribution ?DrawTree ?StartOverlayPlot ?DrawDotplot ?GetColorMap ?StopOverlayPlot ?DrawGraph ?Plot2Gif ?ViewPlot ?DrawHistogram ?PlotArguments DrawTree Function DrawTree - general front-end for drawing phylogenetic trees Calling Sequence: DrawTree(tree,method,modif) Parameters: Name Type Description --------------------------------------------------------------------------- tree Tree input tree to draw method string (optional) method to display the tree modif {string,symbol = anything} optional modifiers for the drawing Returns: NULL Synopsis: DrawTree draws a phylogenetic tree and produces a file containing PostScript commands. This is a single interface for all the methods and variants that we could imagine for drawing phylogenetic trees. The tree must contain length information in its nodes, as is commonly the case for the functions which build the trees. The behaviour is classified according to the following phases: Mode of tree display: Vertical Vertical horizontally equally spaced leaves, vertical height preserved Unrooted planar representation, root is only identified by a small circle, branch distances are preserved.
Also called Splat trees Radial leaves are on equally spaced directions from the root, distances to the root preserved RadialLines like Radial, with arcs indicating distances Phylogram left to right horizontal branches, branch lengths preserved Cladogram left to right horizontal branches, branches to leaves stretched to align right Bisect like Radial, but parent is on bisector line BisectLines like Bisect, with arcs indicating distances ArcRadial a Cladogram drawn with polar coordinates Reordering of leaves: use the ordering in the Tree OrderLeaves= permute the left-right subtrees to make the clusters as contiguous as possible OrderLeaves=LeftHeavy permute the left-right subtrees to make the left subtrees the largest OrderLeaves=Random randomly permute the left-right subtrees to (possibly) obtain better looking trees Branch labelling: Adaptive, 2-digit precision, branch labelling LengthFormat= A string which is interpreted as a format of an sprintf call with the length of the branch. If set to the empty string, no branch labelling will happen. LengthFormat= A procedure: (Length) -> string which takes the branch length as an argument and produces the string to be placed on the branch. BranchDrawing= A procedure that will do all the branch drawing. (x1, y1, x2,y2, l) -> list( drawing commands). The branch spans from (x1,y1) to (x2,y2) and has a branch length l. Use ShowBootstrap to display bootstrapping values on the branches. Internal Nodes: no labelling happens for internal nodes InternalNodes= A procedure (Tree,x,y) -> list( drawing commands) which will be invoked every time that an internal node (identified by Tree), is drawn at position (x,y). ShowBootstrap would display the bootstrapping values for internal nodes if they are present in the fourth field of the Tree data structure. Leaf display information: circle with leaf[Label] written.
If the Leaf contains additional arguments of the form: Shape = sss or Color = ccc, then the Leaf is displayed using the shape sss and color ccc. Alternatively, if the Label is the structure Color(colorcode,xxx), then xxx will be taken as the Label and will be colored with the given colorcode. Legend leaf[Label] written (no circle) LeafDrawing= A procedure (Leaf,x,y) -> list( drawing commands) to display the Leaf centered at (x,y). Clusters= color and shape according to cluster RadialLabels leaf labels radial Cross referencing: no cross referencing, all labelling is done with leaf[Label] CrossReference all labelling is done with an alphanumeric character and leaf[Label] is cross referenced on the right Title: Title=anything Title to appear centered at the bottom Size of Text: TextSize= Set point-size for all text Minimum branch length: MinBranchLength=positive Force all branches to be of a minimum length. The labelling will be done with the original lengths, but the drawing will use this minimum value. This is a useful option when part of the tree is cramped together and difficult to see. The proportions will not be maintained, but the tree can be understood. It is recommended to display the edge lengths if this option is used. list of drawing commands: CTEXT(...) Centered text (as for DrawPlot) LTEXT(...) Left aligned text (as for DrawPlot) RTEXT(...) Right aligned text (as for DrawPlot) LINE(...) Line (as for DrawPlot) POLYGON(...) Closed polygon (as for DrawPlot) CIRCLE(...) Circle (as for DrawPlot) In all cases, provides the definition of the clusters, or groups of leaves. This can be done as: list(anything) the numbering in the leaves is used as an index in this list, and the value is the cluster name. Clustering will be done on equal values. procedure as above, but the value is obtained by running the procedure on the Leaf. Drawing of lateral gene transfer (LGT) arrows: in the ArcRadial tree display, arrows can be drawn to depict LGTs. 
Each LGT is characterized by its two endpoints, defined in a list placed in the 4th field of the relevant Tree() structure (or the 3rd field of a Leaf() structure), as follows: [ 'unique id', {'start','end'}, height, (optionally, an RGB color triplet)]. A list of drawing commands is composed of the objects (as defined in ?DrawPlot) LTEXT, CTEXT, RTEXT, LINE, POLYGON and CIRCLE. See Also: ?BootstrapTree ?Leaf ?SignedSynteny ?Tree ?DrawPlot ?LeastSquaresTree ?Synteny ?ViewPlot ?GapTree ?PhylogeneticTree ?SystemCommand DynProgGap Function DynProgGap( seq1:string, seq2:string ) Does dynamic programming between the two sequences, but the sequences may have gaps. Gap against gap is scored 0. Implementation of Gotoh's algorithm. An additional optional parameter window: integer can be passed. If window > 0, the pam variance along the sequence is estimated by sliding a window along a match and for each stretch the best pam distance is calculated. For this "normal" dynamic programming without alignment of gaps is used. For both sequences a list of pam distances is used. Then the dynamic programming is repeated, but this time using a different Dayhoff matrix at each position of the match that was determined. If there is a deletion in seq1, pam1 is used and vice versa. If there is a match, both pam distances are used and scored; the distance (and score) with the better score is then used. DynProgMass Function DynProgMass - matches digestion fragments with a sequence Option: builtin Calling Sequence: DynProgMass(p,seq,stddev,deb) Parameters: Name Type Description ---------------------------------------- p {array,structure} seq string stddev numeric deb numeric Returns: NULL Synopsis: Matches a Carboxypeptidase A digest (Fragment) with a sequence using dynamic programming. Data structure of fragment: [[-2.0023, P]], [[-1.2703, GV],[-1.2703, VG],[-0.9824, R]], ....
[[-1.8961, T]], [0, 104.0941]] Examples: See Also: ?DigestAspN ?DigestWeights ?ProbBallsBoxes ?DigestionWeights ?DynProgMassDb ?ProbCloseMatches ?DigestSeq ?enzymes ?Protein ?DigestTrypsin ?MassProfileResults ?SearchMassDb DynProgMassDb Function DynProgMassDb - matches digestion fragments with a database Option: builtin Calling Sequence: DynProgMassDb(p,m,term,df,stddev,ddb) Parameters: Name Type Description ------------------------------- p array m integer term string df database stddev numeric ddb numeric Returns: NULL Synopsis: Matches a Carboxypeptidase A digest (Fragment) against the whole database Examples: See Also: ?DigestAspN ?DigestWeights ?ProbBallsBoxes ?DigestionWeights ?DynProgMass ?ProbCloseMatches ?DigestSeq ?enzymes ?Protein ?DigestTrypsin ?MassProfileResults ?SearchMassDb DynProgNucPepString Function DynProgNucPepString Option: builtin Calling Sequence: DynProgNucPepString(npm) Parameters: Name Type ------------------ npm NucPepMatch Returns: NULL Synopsis: Return two texts defining the alignment of NucPepMatch suitable to print it. npm[NucGaps], npm[PepGaps] and npm[Introns] must be defined. 
Examples: See Also: ?AlignNucPepAll ?GetPosition ?NucPepDynProg ?AlignNucPepMatch ?GlobalNucPepAlign ?NucPepMatch ?Denormalize ?Intron ?NucPepRegions ?FindNucPepPam ?LocalNucPepAlign ?ParallelAllNucPepMatches ?Gene ?LocalNucPepAlignBestPam ?PepDB ?GetAllNucPepMatches ?Normalize ?ScoreIntron ?GetIntrons ?NucDB ?VisualizeGene ?GetPeptides ?NucPepBackDynProg ?VisualizeProtein DynProgScore Function DynProgScore - compute the forward phase of sequence alignment Option: builtin Calling Sequence: DynProgScore(seq1,seq2,dm,modif) Parameters: Name Type Description ----------------------------------------------------------------------------- seq1 {ProbSeq,string} first sequence to be aligned seq2 {ProbSeq,string} second sequence to be aligned dm {DayMatrix,list(DayMatrix)} Dayhoff matrix to use for the alignment modif {string,set(string)} specification of alignment Returns: {[Score:numeric],[score:numeric, from1..to1, from2..to2]} Synopsis: Computes the optimal cost of the alignment between seq1 and seq2 using the Dayhoff matrix dm, a specified alignment mode and a specified deletion cost model. It returns a triplet: [ Score, from1..to1, from2..to2 ] or [ Score ] where Score is the optimal score of the alignment and seq1[from1..to1] and seq2[from2..to2] are the selected portions of the sequences to align. seq1 and seq2 can be either peptide sequences, nucleotide sequences or probabilistic sequences, ProbSeq(). Modif is a set of strings which have the following meanings: For the alignment type, one of the following can be specified: Local - (default) a local alignment, the subsequences of seq1 and seq2 which give the highest score. Global - a global alignment, the entire seq1 is matched against the entire seq2. CFE - cost-free ends, the entire seq1 is matched against seq2, but one deletion at the ends is not penalized. CFEright - cost-free ends, the entire seq1 is matched against seq2, but one deletion at the right end is not penalized. 
Shake - align seq1 and seq2 up to the point where the maximum score happens. Then do the same backwards and forwards until no improvements of the score happen. MinLength(k) - align seq1 and seq2 as in a Local alignment (starting anyplace, ending anyplace) but at least k amino acids of each sequence are aligned. I.e. the minimum of the aligned lengths is k or larger. For the deletion cost model, one of the following can be specified: Affine - (default) deletion cost of a gap of length k is FixedDel + IncDel*(k-1). The values for FixedDel and IncDel are taken from the Dayhoff matrix dm. LogDel - logarithmic deletion cost, the cost of a gap of length k is DelFixedLog + DelLog*log(k). The values for DelFixedLog and DelLog are taken from the Dayhoff matrix dm. For the type of result, any combination of the following can be specified: JustScore - only the score is computed, and the locations of the match are not returned (this makes the algorithm run faster for Local and CFE). NoSelf - compute an alignment where matches of a position with itself are disallowed. This is relevant when aligning a sequence with itself with the purpose of discovering repeated motifs. 
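The two deletion cost models above reduce to simple formulas in the gap length k. The following Python sketch (not Darwin code) evaluates both. The function names and the numeric parameter values are hypothetical and chosen only for illustration; in Darwin the actual FixedDel, IncDel, DelFixedLog and DelLog values are taken from the Dayhoff matrix dm.

```python
import math

def affine_gap_cost(k, fixed_del, inc_del):
    """Affine model: FixedDel + IncDel*(k-1) for a gap of length k >= 1."""
    return fixed_del + inc_del * (k - 1)

def log_gap_cost(k, del_fixed_log, del_log):
    """LogDel model: DelFixedLog + DelLog*log(k) for a gap of length k >= 1."""
    return del_fixed_log + del_log * math.log(k)

# Hypothetical penalty values (negative, since they lower the alignment score).
FIXED_DEL, INC_DEL = -20.0, -1.5
DEL_FIXED_LOG, DEL_LOG = -20.0, -5.0

for k in (1, 2, 5, 20):
    print(k, affine_gap_cost(k, FIXED_DEL, INC_DEL),
          round(log_gap_cost(k, DEL_FIXED_LOG, DEL_LOG), 4))
```

With these illustrative values both models charge the same for a gap of length 1, and the logarithmic model penalizes long gaps less steeply than the affine one.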
Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > DynProgScore(AC(P00083),AC(P00091),DM,Local); [177.7799, 14..92, 19..97] > DynProgScore(AC(P00083),AC(P00091),DM,Global); [144.4751, 1..127, 1..139] > DynProgScore(AC(P00083),AC(P00091),DM,{CFE,JustScore}); [174.0188] > DynProgScore('ADEFGHIKSDEFGHLK','ADEFGHIKSDEFGHLK',DM,NoSelf); [75.0720, 1..16, 1..16] See also: ?Align ?Alignment ?CreateDayMatrices ?MAlign DynProgStrings Function DynProgStrings - compute score and aligned strings from a Match Option: builtin Calling Sequence: DynProgStrings(m,dm) DynProgStrings(m,dm,NoSelf) DynProgStrings(al) Parameters: Name Type Description ------------------------------------------------------------------- m Match input Match dm DayMatrix scoring matrix NoSelf string (optional), no self alignments will be allowed al Alignment input Alignment object Returns: [numeric, string, string] : [score,seq1,seq2] Synopsis: Returns a list with the similarity score, first sequence and second sequence suitable for printing the given match with the given similarity matrix. The sequences are the original sequences from the match with inserted '_' as needed to produce the desired alignment. If a third argument is provided, it must be the keyword 'NoSelf'. This is an indication that no position will be aligned with itself, a situation useful for the detection of repetitious patterns. If an Alignment is provided, all the information is contained in the object, and no additional arguments are needed. 
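Because the returned strings mark deletions with '_', the gaps of an alignment can be read directly off them. The Python sketch below (not Darwin code; the helper name gap_runs is hypothetical) lists the start position and length of every gap run in one aligned string, using the aligned pair from the DynProgStrings example.

```python
import re

def gap_runs(aligned):
    """Return (start, length) for each run of '_' in an aligned string.

    Positions are 0-based offsets into the aligned string.
    """
    return [(m.start(), len(m.group())) for m in re.finditer(r"_+", aligned)]

seq1 = "ADEFGHIKLMNNW"
seq2 = "ADEFG__KLMNNW"
print(gap_runs(seq1))  # no gaps in the first sequence
print(gap_runs(seq2))  # one gap of length 2
```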
Examples: > al := Align('ADEFGHIKLMNNW','ADEFGKLMNNW'); al := Alignment('ADEFGHIKLMNNW','ADEFGKLMNNW',36.4025,DM,0,0,{Local}) > DynProgStrings(Match(al),DM); [36.4025, ADEFGHIKLMNNW, ADEFG__KLMNNW] > seq1 := 'ADEFGHIKSDEFGHLK'; seq1 := ADEFGHIKSDEFGHLK > al := Align(seq1,seq1,NoSelf); al := Alignment('ADEFGHIK','SDEFGHLK',35.0800,DM,0,0,{Local,NoSelf}) > DynProgStrings(Match(al),DM,NoSelf); [35.0800, ADEFGHIK, SDEFGHLK] > DynProgStrings(al); [35.0800, ADEFGHIK, SDEFGHLK] See also: ?Align ?CodonDynProgStrings ?Match ?print Edge Class Edge - edge/arc description Template: Edge(Label,From,To) Returns: Edge Fields: Name Type Description ---------------------------------------------------- Label anything the label of the edge From anything the first end point of the edge. To anything the second end point of the edge. Methods: Edge_type select Synopsis: The Edge data structure stores the information associated with an edge. Some algorithms assume that the Label field stores a numeric value representing a weight. The Edges are always directed, but if the graph is meant to be undirected, then the From/To are exchangeable and only one entry per Edge is needed. Examples: > G := Graph( Edges( Edge(4,1,2), Edge(7,1,3), Edge(6,2,4), Edge(5,3,4) ), Nodes(1, 2, 3, 4) ); G := Graph(Edges(Edge(4,1,2),Edge(7,1,3),Edge(6,2,4),Edge(5,3,4)),Nodes(1,2,3,4)) > G[Edges, 1, Label]; 4 See Also: ?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph ?Clique ?Graph_XGMML ?Path ?DrawGraph ?InduceGraph ?RegularGraph ?EdgeComplement ?MaxCut ?ShortestPath ?Edges ?MaxEdgeWeightClique ?TetrahedronGraph ?FindConnectedComponents ?MinCut ?VertexCover ?Graph ?MST ?Graph_minus ?Nodes EdgeComplement Function EdgeComplement - construct the graph on the complementary edges Calling Sequence: EdgeComplement(Graph) Parameters: Name Type Description ------------------------------ Graph Graph an input graph Returns: Graph Synopsis: Computes the complement graph of the input. 
This is a graph over the same set of nodes, but with edges where there were no edges and vice- versa. The labels of the old edges are lost and the new edges are assigned a 0 label. Examples: > hex := HexahedronGraph(); hex := Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,6),Edge(0,3,4),Edge(0,3,7),Edge(0,4,8),Edge(0,5,6),Edge(0,5,8),Edge(0,6,7),Edge(0,7,8)),Nodes(1,2,3,4,5,6,7,8)) > EdgeComplement(hex); Graph(Edges(Edge(0,1,3),Edge(0,1,6),Edge(0,1,7),Edge(0,1,8),Edge(0,2,4),Edge(0,2,5),Edge(0,2,7),Edge(0,2,8),Edge(0,3,5),Edge(0,3,6),Edge(0,3,8),Edge(0,4,5),Edge(0,4,6),Edge(0,4,7),Edge(0,5,7),Edge(0,6,8)),Nodes(1,2,3,4,5,6,7,8)) See Also: ?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph ?Clique ?Graph_XGMML ?Path ?DrawGraph ?InduceGraph ?RegularGraph ?Edge ?MaxCut ?ShortestPath ?Edges ?MaxEdgeWeightClique ?TetrahedronGraph ?FindConnectedComponents ?MinCut ?VertexCover ?Graph ?MST ?Graph_minus ?Nodes Eigenvalues Function Eigenvalues - Eigenvalue/vector decomposition of a symmetric matrix Option: builtin Calling Sequence: Eigenvalues(A,eigenvects) Parameters: Name Type Description --------------------------------------------- A matrix a symmetric matrix eigenvects name an optional matrix name Returns: list(numeric) Synopsis: Compute an eigenvalue/eigenvector decomposition of A. A must be a symmetric matrix. The function returns the vector containing the eigenvalues in increasing order. The optional second argument, if present must be a name that will be assigned with the matrix of the eigenvectors. The eigenvectors have norm 1 and are stored columnwise and the ith column corresponds to the ith eigenvalue. 
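The properties stated in this synopsis can be checked numerically. The Python sketch below (not Darwin code) takes the matrix, eigenvalues and eigenvectors printed in the Eigenvalues example, rounded as shown there, and verifies that A*v is close to lambda*v for each pair and that the eigenvalues sum to the trace of A. The helper names are hypothetical.

```python
A = [[3, 1, 2], [1, 2, -1], [2, -1, 5]]
# Values copied from the documented Eigenvalues example (4-digit rounding).
alpha = [0.4921, 3.2444, 6.2635]
vectors = [
    [0.6041, -0.6782, -0.4185],
    [0.6191, 0.7301, -0.2894],
    [0.5018, -0.08423029, 0.8609],
]

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def max_residual(m, lam, v):
    """Largest absolute component of m*v - lam*v."""
    return max(abs(mv - lam * vi) for mv, vi in zip(mat_vec(m, v), v))

residuals = [max_residual(A, lam, v) for lam, v in zip(alpha, vectors)]
trace = sum(A[i][i] for i in range(len(A)))
print(residuals)          # all small, limited by the rounding of the inputs
print(trace, sum(alpha))  # the eigenvalues of a matrix sum to its trace
```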
Examples: > A := [[3,1,2],[1,2,-1],[2,-1,5]]; A := [[3, 1, 2], [1, 2, -1], [2, -1, 5]] > alpha := Eigenvalues(A,V); alpha := [0.4921, 3.2444, 6.2635] > Vt := V^t; Vt := [[0.6041, -0.6782, -0.4185], [0.6191, 0.7301, -0.2894], [0.5018, -0.08423029, 0.8609]] > A*Vt[1] = alpha[1]*Vt[1]; [0.2973, -0.3337, -0.2059] = [0.2973, -0.3337, -0.2059] > Vt[2]*Vt[2]; 1.0000 See Also: ?Cholesky ?GivensElim ?matrix ?transpose ?convolve ?Identity ?matrix_inverse ?GaussElim ?LinearProgramming ?SvdAnalysis EnterProfile Function EnterProfile Option: builtin Calling Sequence: EnterProfile(blockname) Parameters: Name Type ------------------ blockname string Returns: NULL Synopsis: This function is used to identify the beginning of a block to be profiled. An EnterProfile should always be matched to an ExitProfile which should be at the same level (in the same statement sequence) and should have the same blockname. The code surrounded by the EnterProfile and ExitProfile will be profiled under the name given by blockname. Many pairs of Enter/ ExitProfile may be used, with or without the same blockname. Run time statistics will be grouped by blockname. Enter/ExitProfile pairs cannot be nested within the same statement sequence. Examples: > EnterProfile(longloop); > s:=0: for i to 10^5 do s := s+1/i od; 12.0901 > ExitProfile(longloop); See also: ?ExitProfile ?profiling Entry Function Entry - return entries from the database DB Option: polymorphic Calling Sequence: Entry(a) Parameters: Name Type Description ------------------------------------------------------------------------------------------------- a {integer,string,structure,list(integer)} Entry number(s) or other description of entries Returns: {expseq(string),string} Synopsis: Entry returns the string(s) corresponding to the entries in the database DB described. This can be through entry numbers, PatEntry, Match, ID, AC or partial references to entries. 
Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > e1 := Entry(1); e1 := 104K_THEPAP15711;104 kDa ..(1255).. L > Entry(PatEntry(10000..10001)); SYP1_YEASTP25623; P25622; Q96VH0;< ..(1372).. A, SYP_CHLPNQ9Z851; Q9JSE4;P ..(1517).. A > Entry(AC('P11341')); VG9_SPV4P11341;Gene 9 pro ..(266).. R > Entry(ID('ID5B_PROJU')); ID5B_PROJUP32734;Kunitz-t ..(678).. G > s1 := Sequence(e1); s1 := MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLT ..(924).. ILVVSLIVGIL > Entry(e1); 104K_THEPAP15711;104 kDa ..(1255).. L > GetEntryNumber(e1); 1 See Also: ?AC ?GetEntryNumber ?Match ?SearchTag ?GetEntryInfo ?ID ?PatEntry ?Sequence EstimateCodonPAM Function EstimateCodonPAM - finds the best-scoring CodonPAM matrix for an alignment Calling Sequence: EstimateCodonPAM(dps1,dps2,cms) Parameters: Name Type Description --------------------------------------------------------- dps1 string First of the aligned sequences. dps2 string Second of the aligned sequences. cms list(DayMatrix) array of codon scoring matrices Returns: [Score, CodonPAM, CodonPAMVariance] Synopsis: Given two codon-wise aligned DNA sequences, this function finds the best-scoring CodonPAM matrix. Analogous to EstimatePam, it returns a list containing the score, the CodonPAM estimate and the CodonPAM variance. Examples: > EstimateCodonPAM(AAACCCGGGTTT,AAACCG___TTT,CMS); [11.0814, 91, 1653.3145] See Also: ?CodonAlign ?CreateCodonMatrices ?EstimateSynPAM ?CodonDynProgStrings ?EstimatePam EstimateNG86 Function EstimateNG86 Calling Sequence: EstimateNG86(seq1,seq2) Parameters: Name Type Description ------------------------------------ seq1 string aligned DNA sequence seq2 string aligned DNA sequence Returns: array(numeric) Synopsis: Computes dN and dS following the method by Nei and Gojobori (1986). The function returns four values, dN and dS as well as the number of nonsynonymous (N) and synonymous sites (S).
If either dN or dS cannot be computed (typically because of too much divergence), -1 is returned for the respective value. Examples: > EstimateNG86(AAAAAATTT,AAAAAGTTA); [0.1435, 3.5455, 7.6548, 1.3452] See also: ?CodonAlign ?EstimatePB93 ?EstimateSynPAM EstimatePB93 Function EstimatePB93 Calling Sequence: EstimatePB93(seq1,seq2) Parameters: Name Type Description ------------------------------------ seq1 string aligned DNA sequence seq2 string aligned DNA sequence Returns: array(numeric) Synopsis: Computes Ka and Ks following the method by Pamilo and Bianchi (1993). The function returns a list [Ka,Ks] with the two estimates. If these values cannot be computed (typically because of too much divergence), then [-1,-1] is returned. Examples: > EstimatePB93(AAAAAATTT,AAAAAGTTA); [0.1648, 0.7611] See also: ?CodonAlign ?EstimateNG86 ?EstimateSynPAM EstimatePam Function EstimatePam Option: builtin Calling Sequence: EstimatePam(s1,s2,days) Parameters: Name Type ----------------------- s1 string s2 string days array(DayMatrix) Returns: [Score,PamDistance,PamVariance] Synopsis: Calculates the similarity score, Pam distance and Pam variance for the alignment defined by s1 and s2. Notice that s1 and s2 are taken as aligned already, that is, they are not re-aligned. If s1 and s2 need to be aligned, use DynProgStrings first. The estimation of the Pam distance and variance is normally done by Align when given an array of Dayhoff matrices. If the estimated distance is lower than 0.1 pam, the estimate is also computed by expected values (the computation of distances by maximum likelihood becomes less accurate). This second estimate is stored in the global variable ExpectedPamDistance. The computation of the PamDistance by maximum likelihood (exactly, not just for an existing DM in days) is stored in the global variable MLPamDistance.
Examples: > EstimatePam('CITKLFDGDQVLY', Mutate('CITKLFDGDQVLY', 100), DMS); [73.1848, 61, 822.4780] See Also: ?Align ?DynProgStrings ?EstimateSynPAM ?CalculateScore ?EstimateCodonPAM EstimateSynPAM Function EstimateSynPAM - finds the best-scoring SynPAM matrix for an alignment Calling Sequence: EstimateSynPAM(dps1,dps2) Parameters: Name Type Description ------------------------------------------------ dps1 string First of the aligned sequences. dps2 string Second of the aligned sequences. Returns: [Score, SynPAM, SynPAMVariance] Synopsis: Given two codon-wise aligned DNA sequences, this function finds the best-scoring SynPAM matrix. Analogous to EstimatePam, it returns a list containing the score, the SynPAM estimate and the SynPAM variance. Examples: > EstimateSynPAM(AAACCCGGGTTT,AAACCG___TTT); [2.3328, 51.9870, 942.8518] See Also: ?CodonAlign ?CreateSynMatrices ?EstimatePam ?CodonDynProgStrings ?EstimateCodonPAM ?EstimatePB93 ?CodonMatrix ?EstimateNG86 EvolTree Data structure EvolTree( ) Function: creates an EvolTree data structure Selectors: Tree: Tree TC: TreeConstruction type (how was the tree constructed) Data: DataMatrix Index: Tree fitting index PAM: Total pam length of tree Score: Score of tree Order: TSP order Other selectors not contained directly in data structure n: number of leaves leaves: returns a list of leafnames of tree Constructors: EvolTree(Tree) EvolTree(Tree, TC) EvolTree(Tree, TC, Data, Index, PAM, Score, Order) ExitProfile Function ExitProfile Option: builtin Calling Sequence: ExitProfile(blockname) Parameters: Name Type ------------------ blockname string Returns: NULL Synopsis: This function is used to identify the ending of a block to be profiled. An ExitProfile should always be matched to a previous EnterProfile which should be at the same level (in the same statement sequence) and should have the same blockname. The code surrounded by the EnterProfile and ExitProfile will be profiled under the name given by blockname.
Many pairs of Enter/ExitProfile may be used, with or without the same blockname. Run time statistics will be grouped by blockname. Enter/ ExitProfile pairs cannot be nested within the same statement sequence. Examples: > EnterProfile(longloop); > s:=0: for i to 10^5 do s := s+1/i od; 12.0901 > ExitProfile(longloop); See also: ?EnterProfile ?profiling ExpFit Function ExpFit - Least squares exponential fit: y[i] ~ a + b * exp(c*x[i]) Calling Sequence: ExpFit(y,x) Parameters: Name Type Description -------------------------------------------- y array(numeric) dependent variable x array(numeric) independent variable Returns: [a,b,c,sumsq] Synopsis: Compute a least squares fit of the type: y[i] ~ a + b * exp(c*x[i]) where a,b and c are the parameters of the approximation and sumsq is the sum of the squares of the errors of the approximation. Examples: > x := [1,2,3,4,5]; x := [1, 2, 3, 4, 5] > y := [0.49, 1.02, 2.1, 4.01, 7.8]; y := [0.4900, 1.0200, 2.1000, 4.0100, 7.8000] > ExpFit(y,x); [-0.1014, 0.3113, 0.6467, 0.00204305] See also: ?ExpFit2 ?LinearRegression ?Stat ExpFit2 Function ExpFit2 - Least squares exponential fit: y[i] ~ a * exp(b*x[i]) Calling Sequence: ExpFit2(y,x) Parameters: Name Type Description -------------------------------------------- y array(numeric) dependent variable x array(numeric) independent variable Returns: [a, b, sumsq] Synopsis: Compute the least squares fit of the type y[i] ~ a * exp(b * x[i]). sumsq is the sum of squares of the approximation errors. Examples: > x := [1,2,3,4,5]; x := [1, 2, 3, 4, 5] > y := [0.49, 1.02, 2.1, 4.01, 7.8]; y := [0.4900, 1.0200, 2.1000, 4.0100, 7.8000] > ExpFit2(y,x); [0.2771, 0.6677, 0.00586200] See also: ?ExpFit ?LinearRegression ?Stat ExpandFileName Function ExpandFileName( dir:string, name:string ) Generate file name from directory and name.
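As a note on the ExpFit2 model y[i] ~ a * exp(b*x[i]) documented above: a common way to approximate such a fit is to take logarithms and solve the linear problem ln(y) ~ ln(a) + b*x. The Python sketch below (not Darwin code, and not necessarily how ExpFit2 is implemented) does exactly that with the data from the ExpFit2 example. Because it minimizes errors in ln(y) rather than in y, its estimates are close to, but not identical with, the documented ExpFit2 output. The function name is hypothetical and all y values must be positive.

```python
import math

def exp_fit2_loglinear(y, x):
    """Approximate fit of y ~ a*exp(b*x) via linear regression on ln(y).

    Returns [a, b, sumsq], where sumsq is the sum of squared errors
    measured on the original (not logarithmic) scale.
    """
    n = len(x)
    ln_y = [math.log(v) for v in y]
    x_mean = sum(x) / n
    l_mean = sum(ln_y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (li - l_mean) for xi, li in zip(x, ln_y))
    b = sxy / sxx                        # slope of the log-linear fit
    a = math.exp(l_mean - b * x_mean)    # back-transformed intercept
    sumsq = sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))
    return [a, b, sumsq]

x = [1, 2, 3, 4, 5]
y = [0.49, 1.02, 2.1, 4.01, 7.8]
a, b, sumsq = exp_fit2_loglinear(y, x)
print(a, b, sumsq)
```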
Exponential_Rand Function Exponential_Rand - Generate random exponentially distributed reals Calling Sequence: Rand(Exponential(a,b)) Returns: numeric Synopsis: This function returns a random exponentially distributed number with average a+b and variance b^2. In mathematical terms, the probability density at x is exp( -(x-a)/b ) / b for x >= a. The first parameter, a, can take any arbitrary value. The second parameter, b, has to be positive. Exponential_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.1.28 Examples: > Rand(Exponential(0.3,3)); 2.4395 See Also: ?Beta_Rand ?FDist_Rand ?Normal_Rand ?StatTest ?Binomial_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score ?ChiSquare_Rand ?Geometric_Rand ?SetRand ?Student_Rand ?CreateRandSeq ?Graph_Rand ?SetRandSeed ?Zscore ?Cumulative ?Multinomial_Rand ?Shuffle ExtendClass Function ExtendClass - Extend a class with additional fields Calling Sequence: ExtendClass(newclass,oldclass,addarg1,...) Parameters: Name Type Description ------------------------------------------------------------------------------ newclass symbol The new class, oldclass with more fields oldclass symbol The base class being extended addarg1 [symbol, type, anything] Description of additional arguments Returns: NULL Synopsis: ExtendClass creates a new class which has all the fields of the base class plus additionally defined ones. The result is a new class which automatically inherits all the methods of the oldclass and has additional fields described in the third and subsequent arguments of ExtendClass. The description of each additional argument is a list of three values: the name of the new field, its type and, optionally, its default value. The default value is used when creating an object without it or when converting an object from oldclass to newclass.
More precisely the following functions are created: Old method New Method Comment ------------------------------------------------------------------------------ oldclass newclass Constructor based on the oldclass constructor oldclass_xxx newclass_xxx same rules as Inherit yyy_oldclass yyy_newclass conversions from other classes to newclass oldclass_newclass widening conversion newclass_oldclass narrowing conversion If some methods are not expected to be inherited from oldclass, they should either be unevaluated after calling ExtendClass or defined before calling ExtendClass. ExtendClass does an implicit Inherit, so there is no point in doing an Inherit (newclass,oldclass). Any protection defined for the oldclass is inherited in the newclass. The newclass can Inherit other additional classes as usual. Examples: > ExtendClass( DistTree, Tree, [height,numeric,0] ); See also: ?CompleteClass ?Inherit ?objectorientation ?Protect FDist_Rand Function FDist_Rand - Generate random F-(variance-ratio) distributed reals Calling Sequence: Rand(FDist(nu1,nu2)) Parameters: Name Type ------------------ nu1 nonnegative nu2 nonnegative Returns: nonnegative Synopsis: This function returns a random F distributed or Variance-ratio distributed number with average nu1/(nu2-2). If X1 and X2 are Chi-square distributed variables with parameters nu1 and nu2, then X1/X2 is distributed as FDist(nu1,nu2). This distribution has a non-finite expected value for nu2<=2 and non-finite variance for nu2<=4. FDist_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. 
References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.6 Examples: > Rand(FDist(3,1)); 1.9703 > Rand(FDist(1,100)); 0.02513195 See Also: ?Beta_Rand ?Exponential_Rand ?Normal_Rand ?StatTest ?Binomial_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score ?ChiSquare_Rand ?Geometric_Rand ?SetRand ?Student_Rand ?CreateRandSeq ?Graph_Rand ?SetRandSeed ?Zscore ?Cumulative ?Multinomial_Rand ?Shuffle FileStat Class FileStat - the unix file status structure Template: FileStat(path) Fields: Name Type Description ---------------------------------------------------- path string a filename or a path st_dev posint device st_ino posint inode st_mode posint protection st_nlink posint number of hard links st_uid integer user ID of owner st_gid integer group ID of owner st_rdev integer device type (if inode device) st_size integer total size, in bytes st_blksize posint blocksize for filesystem I/O st_blocks integer number of blocks allocated st_atime posint time of last access st_mtime posint time of last modification st_ctime posint time of last change Returns: FileStat Methods: FileStat_type Synopsis: This class stores the unix stat structure, see "man 2 stat" in any unix system for details. When called with a single argument, it constructs the entire structure. The unix names have been retained for the fields. This operation is very efficient, it only requires reading the directory and completes without the execution of any system command. Hence it is the recommended way of finding any information about a file. When the file does not exist, an empty data structure is returned. 
Examples: > FileStat(libname)[st_size]; 163840 > FileStat('/dev/null')[st_mtime]; 1349038461 > FileStat(non_existing_file); FileStat() See Also: ?inputoutput ?OpenReading ?ReadLine ?SearchDelim ?LockFile ?OpenWriting ?ReadRawFile ?SplitLines ?OpenAppending ?ReadData ?ReadRawLine FindCircularOrder Function FindCircularOrder - list of Leaf labels in lexicographical order Calling Sequence: FindCircularOrder(t) Parameters: Name Type Description ------------------------- t Tree input tree Returns: list Synopsis: Find a circular order of a tree, in particular, a lexicographical order of a Tree. Examples: > tree := Tree(Tree(Tree(Leaf(f9,-90.4683,372),-89.6 ..(219).. 572,367))): > FindCircularOrder(tree); [f9, e8, e7, e6, e5, e4, f9] See also: ?CircularTour ?Clusters ?Leaf ?Leaves ?Tree FindConnectedComponents Function FindConnectedComponents - set of connected components of a Graph Calling Sequence: FindConnectedComponents(G) Parameters: Name Type Description ---------------------------- G Graph a given Graph Returns: set(Graph) Synopsis: This function computes the set of connected components of a Graph, which are returned as a set of Graphs. The Graph is assumed to be undirected. The disconnected nodes are returned as singleton Graphs, i.e. a Graph with a single node. Examples: > G1 := Graph(Edges(Edge('a',1,2),Edge('b',2,3)), Nodes(1,2,3,4)): > FindConnectedComponents(G1); {Graph(Edges(),Nodes(4)),Graph(Edges(Edge(a,1,2),Edge(b,2,3)),Nodes(1,2,3))} See Also: ?BipartiteGraph ?Graph ?MaxEdgeWeightClique ?RegularGraph ?Clique ?Graph_minus ?MinCut ?ShortestPath ?DrawGraph ?Graph_Rand ?MST ?TetrahedronGraph ?Edge ?Graph_XGMML ?Nodes ?VertexCover ?EdgeComplement ?InduceGraph ?ParseDimacsGraph ?Edges ?MaxCut ?Path FindEntropy Function FindEntropy Calling Sequence: FindEntropy(day) Parameters: Name Type ---------------- day DayMatrix Returns: numeric Synopsis: Computes the relative entropy H of day, i.e. how many bits of information are available per position of an alignment. 
See S.F. Altschul, "Amino Acid Substitution Matrices from an Information Theoretic Perspective", JMB 219(1991):555-565. Examples: > CreateDayMatrices(); > FindEntropy(DMS[1]); 4.1819 > FindEntropy(DMS[500]); 0.2829 > FindEntropy(DMS[1000]); 0.00618796 FindHighlyExpressedGenes Function FindHighlyExpressedGenes - Find genes with high expression Calling Sequence: FindHighlyExpressedGenes([e]) Returns: list Synopsis: Experimental expression data must be available in the entries. See also: ?ComputeCAI ?SetupRA FindLongestRep Function FindLongestRep Option: builtin Calling Sequence: FindLongestRep(db) FindLongestRep(db,len) FindLongestRep(db,len,eb) Parameters: Name Type --------------- db database len integer eb integer Returns: string Synopsis: Find the longest repetition(s) in the database db. If len is specified, then return only those repetitions longer than len. If len and eb are specified, then return repetitions longer than len - eb (the end bonus) when matching to the end of both sequences. This command requires that a pat index has been built for the database db. FindNucPepPam Function FindNucPepPam - Compute Pam estimate for a NucPepMatch Option: builtin Calling Sequence: FindNucPepPam(npm,DMS) Parameters: Name Type ----------------------- npm NucPepMatch DMS array(DayMatrix) Returns: NULL Synopsis: Computes the best pam estimate and its variance for the given NucPepMatch. Examples: See Also: ?AlignNucPepAll ?GlobalNucPepAlign ?NucPepBackDynProg ?AlignNucPepMatch ?LocalNucPepAlign ?NucPepDynProg ?DynProgNucPepString ?LocalNucPepAlignBestPam ?NucPepMatch FindRules Function FindRules( t:Tree ) Checks the tree for any rules in the form: a is closer to b than to c and returns a list of those rules. FindSpeciesViolations Function FindSpeciesViolations( arg:anything ) arg: a Tree or a list. If it is a tree, it must contain information (6, 7) about species. Use AddSpecies to get such a tree. From this tree a list of rules is generated (a closer to b than to c etc).
If it is a list of those rules ([a, {b, c}], [d, {e, f}], ...) a list of contradictions is returned. GOdefinition Function GOdefinition - returns the definition of a Gene Ontology Calling Sequence: GOdefinition(go) Parameters: Name Type Description ------------------------------------ go {posint,string} GO number Returns: string Synopsis: Returns a longer definition describing a GO number. The argument can either be a number or a string of the form 'GO:002354'. Examples: > GOdefinition(23); The chemical reactions and pathways involving the disaccharide maltose (4-O-alpha-D-glucopyranosyl-D-glucopyranose), an intermediate in the catabolism of glycogen and starch See Also: ?GOdownload ?GOnumber ?GOsubclassR ?GOsuperclassR ?GOname ?GOsubclass ?GOsuperclass GOdownload Function GOdownload - downloads the gene ontologies and converts them to a Darwin readable format Calling Sequence: GOdownload Returns: NULL Synopsis: Downloads the gene ontologies from http://www.geneontology.org/ontology/gene_ontology.obo and converts them to Darwin tables that are stored in the file GOdata.drw which is located in Darwin's data directory. See Also: ?GOdefinition ?GOnumber ?GOsubclassR ?GOsuperclassR ?GOname ?GOsubclass ?GOsuperclass GOname Function GOname - returns the name of a Gene Ontology Calling Sequence: GOname(go) Parameters: Name Type Description ------------------------------------ go {posint,string} GO number Returns: string Synopsis: Returns the name for a GO number. The argument can either be a number or a string of the form 'GO:001369'. Examples: > GOname(23); maltose metabolic process See Also: ?GOdefinition ?GOnumber ?GOsubclassR ?GOsuperclassR ?GOdownload ?GOsubclass ?GOsuperclass GOnumber Function GOnumber - returns the GO number of a Gene Ontology term Calling Sequence: GOnumber(desc) Parameters: Name Type Description --------------------------- desc string GO name Returns: integer Synopsis: Returns the GO number corresponding to a GO name.
This function is the inverse of GOname(). Examples: > GOnumber('metabolic process'); 8152 See Also: ?GOdefinition ?GOname ?GOsubclassR ?GOsuperclassR ?GOdownload ?GOsubclass ?GOsuperclass GOsubclass Function GOsubclass - returns all subclasses for a Gene Ontology Calling Sequence: GOsubclass(go) GOsubclass(go,links = {can_be,has_parts}) Parameters: Name Type Description ---------------------------------------------------------------------------- go {posint,string} GO number links set(string) types of links to follow (can_be and/or has_parts) Returns: list(integer) Synopsis: Returns all subclasses for a Gene Ontology. This is the inverse of the 'is_a' and 'part_of' relationship. The argument can either be a number or a string of the form 'GO:009594'. The optional named argument 'links' can be used to restrict the type of relationships using 'can_be' or 'has_parts'. Examples: > GOname(48311); mitochondrion distribution > GOsubclass(48311); [1, 48312] > for t in GOsubclass(48311) do print(GOname(t)) od; mitochondrion inheritance intracellular distribution of mitochondria See Also: ?GOdefinition ?GOname ?GOsubclassR ?GOsuperclassR ?GOdownload ?GOnumber ?GOsuperclass GOsubclassR Function GOsubclassR - recursive calls to GOsubclass Calling Sequence: GOsubclassR(go) GOsubclassR(go,links = {can_be}) Parameters: Name Type Description ---------------------------------------------------------------------------- go {posint,string} GO number links set(string) types of links to follow (can_be and/or has_parts) Returns: list(integer) Synopsis: Recursively calls GOsubclass to find all subclasses for a Gene Ontology. The argument can either be a number or a string of the form 'GO:001819'.
Examples: > GOname(7005); mitochondrion organization > GOsubclassR(7005); [1, 2, 266, 1836, 1844, 6264, 6390, 6391, 6392, 6393, 6626, 6627, 7006, 7007, 7008, 7287, 8053, 8637, 30150, 30382, 32042, 32043, 32543, 32976, 32979, 32981, 33108, 33615, 33617, 33955, 34551, 34553, 42407, 42792, 43504, 43653, 45039, 45040, 45041, 45042, 45043, 45044, 46902, 48311, 48312, 51204, 70096, 70124, 70125, 70126, 70127, 70143, 70144, 70145, 70146, 70147, 70148, 70149, 70150, 70151, 70152, 70153, 70154, 70155, 70156, 70157, 70158, 70159, 70183, 70184, 70185, 70584] See Also: ?GOdefinition ?GOname ?GOsubclass ?GOsuperclassR ?GOdownload ?GOnumber ?GOsuperclass GOsuperclass Function GOsuperclass - returns all superclasses for a Gene Ontology Calling Sequence: GOsuperclass(go) GOsuperclass(go,links = {is_a}) Parameters: Name Type Description ------------------------------------------------------------------------ go {posint,string} GO number links set(string) types of links to follow (is_a and/or part_of) Returns: list(integer) Synopsis: Returns all superclasses for a Gene Ontology. This represents in the default case both the 'is_a' and 'part_of' relationship. The argument can either be a number or a string of the form 'GO:005951'. The optional named argument 'links' can be used to restrict the type of relationships to one of them. 
Examples: > GOname(1); mitochondrion inheritance > GOsuperclass(1); [48308, 48311] > for t in GOsuperclass(1) do print(GOname(t)) od; organelle inheritance mitochondrion distribution See Also: ?GOdefinition ?GOname ?GOsubclass ?GOsuperclassR ?GOdownload ?GOnumber ?GOsubclassR GOsuperclassR Function GOsuperclassR - recursive calls to GOsuperclass Calling Sequence: GOsuperclassR(go) GOsuperclassR(go,links = {is_a}) Parameters: Name Type Description ----------------------------------------------------------------------- go {posint,string} GO number links set(string) types of links to follow (is_a and/or part_of) Returns: list(integer) Synopsis: Recursively calls GOsuperclass to find all superclasses for a Gene Ontology. The argument can either be a number or a string of the form 'GO:008085'. Examples: > GOname(1); mitochondrion inheritance > GOsuperclassR(1); [6996, 7005, 8150, 9987, 16043, 48308, 48311, 51179, 51640, 51641, 51646] > for t in GOsuperclassR(1) do print(GOname(t)) od; organelle organization mitochondrion organization biological_process cellular process cellular component organization organelle inheritance mitochondrion distribution localization organelle localization cellular localization mitochondrion localization See Also: ?GOdefinition ?GOname ?GOsubclass ?GOsuperclass ?GOdownload ?GOnumber ?GOsubclassR Gamma Function Gamma - the Gamma and Incomplete Gamma functions Calling Sequence: Gamma(a) Gamma(a,x) Parameters: Name Type Description ------------------------------------------------------------------------- a numeric a numerical value x numeric a nonnegative argument for the Incomplete Gamma function Returns: numeric Synopsis: For a positive integer a, Gamma(a) returns the product 1*2*3*...*(a-1) = (a-1)!.
Gamma satisfies the functional equation: Gamma(a+1) = a*Gamma(a) Gamma can be defined as a definite integral: Gamma(a) = integral from t=0 to infinity of t^(a-1) * exp(-t) dt For non-integer values it is also possible to define Gamma for negative arguments. When Gamma is used with two arguments, it is understood to be the Incomplete Gamma function, defined by the integral: Gamma(a,x) = integral from t=x to infinity of t^(a-1) * exp(-t) dt In this case, a must be positive. References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 6.1, 6.5.3 Examples: > Gamma(7); 720 > Gamma(100); 9.3326215443944096e+155 > Gamma(-1.5); 2.3633 > Gamma(3,2); 1.3534 See also: ?factorial ?LnGamma ?Lngamma GammaDist_Rand Function GammaDist_Rand - Generate random Gamma distributed reals Calling Sequence: Rand(GammaDist(p)) Parameters: Name Type ------------------ p nonnegative Returns: nonnegative Synopsis: This function returns a random Gamma distributed number with average p and variance p. The sum of two Gamma distributed random variables with parameters p and q is a Gamma distributed variable with parameter p+q. We have to call this function GammaDist to prevent the collision with the Gamma function. GammaDist_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.1.32 Examples: > Rand(GammaDist(3)); 4.2584 > Rand(GammaDist(100)); 104.3901 See Also: ?Beta_Rand ?Exponential_Rand ?Normal_Rand ?StatTest ?Binomial_Rand ?FDist_Rand ?Poisson_Rand ?Std_Score ?ChiSquare_Rand ?Geometric_Rand ?SetRand ?Student_Rand ?CreateRandSeq ?Graph_Rand ?SetRandSeed ?Zscore ?Cumulative ?Multinomial_Rand ?Shuffle Gap Data structure Gap( Pos:posint, Len:posint, Seq:integer, Flag:integer ) Function: creates a gap data structure The Gap starts at position Pos and is of length Len.
Selectors: Pos, - the position where the gap starts Len, - the length of the gap Seq, - the sequence number where the gap was found Flag - 1 if the gap appears identically in another sequence GapHeuristic Data structure GapHeuristic( ) Function: creates a gap heuristic data structure Selectors: Type: can be ALL, FUSION, ISLAND, STACKING, GAPSHIFT, STACKSHIFT, ISLANDSHIFT, RANDOM default value: ALL (type 1) GAP PARAMETERS: Mingaplen: Minimum length of gaps to process. default value: 1 Maxgaplen: Maximum length of gaps to process. Values < 0 mean unlimited. default value: -1 Maxgaps: maximum number of gaps that can be fused default value: 2 (should be from 1 - 5) Gapdelta: maximum allowed difference in gap sum. This should be small. If it is greater than zero, then the gap sum in a block can vary by this value. default value: 1 Stackdelta: maximum number of amino acids to left and right of a given block where other blocks should be sought. This is needed for the stacking of gap-blocks. default value: 20 ISLAND PARAMETERS: Minislandlen: minimum length of an island to group and shift around (aa between two gaps) default value: 1 Maxislandlen: maximum length of an island to group and shift around default value: 10 Islanddelta: maximum variation of the island length in number of amino acids. Should be zero. default value: 0 LEFT RIGHT SHIFTING AND RANDOM SHIFTING PARAMETERS: Window: This value is needed for the shifting of gap blocks to the left and right, AND for the random shifting (meandistance for shifting).
Each block in an alignment is shifted by this to left and right, and all positions in between are checked for a better score default value: 5 Times: How many times each gapblock is randomly shifted default value: 5 OTHER PARAMETERS: Maxaa: maximum sum of number of amino acids between gaps default value: 10 Extension: maximum number of amino acids to the left/right of a gaprow where the program should look for other gaps default value: 10 Flag: NORMAL: the values stay the same in each round INCREMENTAL: in each round the values are increased RANDOM: random values are used each round - the maximum values are the ones initially set default value: NORMAL (1) Counter: Maximum number of times the heuristics should be repeated. Values < 0 mean unlimited - i.e. until the score no longer increases default value: 10 MaxBlocks: Maximum number of blocks to process. If the number of blocks is very large ( > 100) then it can take very long to compute the alignment. In this case the parameters should be decreased. Values < 0 mean unlimited. default value: 100 GapMatch Data structure GapMatch( ) Function: creates a datastructure to keep a GapMatch Selectors: align1: alignment string of first sequence align2: " " 2nd " seq1: sequence 1 seq2: sequence 2 Pam: Pam distance len: length of alignment score: similarity score mid: middle string (match string with |, ! and : etc) iden: identity Constructors: GapMatch(seq1, seq2); GapTree Function GapTree - build a phylogenetic tree based on gaps Calling Sequence: GapTree(msa,...) Parameters: Name Type Description ----------------------------------------------------------------- msa MAlignment one or many MAlignments over the same species Returns: Tree Global Variables: GapTree_Title Synopsis: GapTree builds a phylogenetic tree based on the gaps of one/several multiple sequence alignments. The assumption is that gap creation is a sufficiently rare event, which allows us to build better trees for longer distances.
The gaps are extracted from MAlignments given as arguments. Only single gaps which are clearly delimited are used. Areas in which no sequence is gap-less are not considered. Areas where sequences have two gaps are also discarded. The existence/non-existence of gaps is then fed to a parsimony algorithm to produce a tree. The input MAlignments should be on the same set of labels. More specifically, we expect the MAlignments to be over different sets of sequences belonging to the same set of species, identified by the same list of labels. The global variable GapTree_Title is set to a short description of the details of the construction. See Also: ?BootstrapTree ?Entry ?LeastSquaresTree ?Sequence ?Synteny ?DrawTree ?Leaf ?MAlignment ?SignedSynteny ?Tree GaussElim Function GaussElim Option: builtin Calling Sequence: GaussElim(A,b) Parameters: Name Type ---------------------- A matrix(numeric) b array(numeric) Returns: a vector (one dimensional array) of numeric Synopsis: Given a matrix of numerical values A and vector b, this function computes x so that A * x = b by Gaussian elimination. A must be a square numerical matrix. Examples: > GaussElim([[2,4,6], [9,0,27],[17,23,5]], [8, 15, 17]); [-0.8225, 1.1667, 0.8297] See Also: ?Cholesky ?GivensElim ?matrix ?convolve ?Identity ?matrix_inverse ?Eigenvalues ?LinearProgramming ?transpose Gene Class Gene Template: Gene(Division,NucEntry,Exons,PepOffset,AlignErrors) Fields: Name Type Description ----------------------------------------------------------- Division string NucEntry integer Exons list(posint..posint) PepOffset PepLength AlignErrors integer Division NucEntry Exons list of exon locations Introns mRNA NucSequence PepOffset PepLength PepSequence AlignErrors Returns: Gene Methods: Gene_type NucPepMatch print select Synopsis: Data structure defining gene-peptide references. 
Examples: See also: ?NSubGene ?NucPepMatch ?PSubGene GenomeSummary Class GenomeSummary - summary information of a database file Template: GenomeSummary(DB) Fields: Name Type Description ---------------------------------------------------------------------------- DB database database structure to create a summary FileName string name of external file containing the database string string the entire header of the database as a string TotAA posint number of amino acids or bases in the database TotChars posint number of characters in the database TotEntries posint number of entries in the database type string dna, rna, mixed or peptide EntryLengths list(posint) length of each entry Id string 5-letter code (SwissProt) for species/genome Kingdom string either Bacteria, Archaea or Eukaryota Lineage list(string) Lineage as a list (from OS tags) Genus string First part of the scientific name Epithet string Second part of the scientific name sgml_tag string The contents of the tag in the database header Returns: GenomeSummary Methods: GenomeSummary_type print Rand select string Synopsis: GenomeSummary provides an alternative to loading a database when the sequences themselves are not needed. Typically, the database is loaded, then GenomeSummary is run and its results are stored in a file for later reading. In this way, all of the data except for the sequences themselves is available, and many genomes can be loaded into a darwin session. GenomeSummary has all the selectors which are available for a database (except for Entry and Pat, which can only be used if the sequences are present). It also provides a few additional selectors. The EntryLengths selector contains the length of the sequence of each entry. The string selector does not select the entire text of the database, just the text that is before the first entry. This is normally called the header of the database.
In the header there are several useful tags which describe the entire database, for example, 5-letter code, kingdom, lineage, etc. This information is available directly through selectors. Any other tagged information in the header can be selected with the name of the tag as a selector. Examples: > ReadDb('/home/darwin/DB/genomes/ECOLI/ECOLI.db'): > gs := GenomeSummary(DB): > gs[TotAA]; 1358990 > gs[Lineage]; [Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, Enterobacteriaceae, Escherichia, Escherichia coli] > print(gs); FileName: /home/darwin/DB/genomes/ECOLI/ECOLI.db string: Escherichia coli K-12 MG1655 complete genome. Geometric_Rand Function Geometric_Rand - Generate random geometrically distributed integers Calling Sequence: Rand(Geometric(p)) Returns: integer Synopsis: This function returns a random geometrically distributed integer with average (1-p)/p and variance (1-p)/p^2. In mathematical terms, the probability that the outcome is i is p*(1-p)^i (for 0 <= i). Notice that the distribution starts at 0. Geometric_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.1.24 Examples: > Rand(Geometric(0.3)); 4 > Rand(Geometric(0.01)); 51 See Also: ?Beta_Rand ?Exponential_Rand ?Normal_Rand ?StatTest ?Binomial_Rand ?FDist_Rand ?Poisson_Rand ?Std_Score ?ChiSquare_Rand ?GammaDist_Rand ?SetRand ?Student_Rand ?CreateRandSeq ?Graph_Rand ?SetRandSeed ?Zscore ?Cumulative ?Multinomial_Rand ?Shuffle GetAaCount Function GetAaCount Calling Sequence: GetAaCount(db) Parameters: Name Type --------------- db database Returns: list(numeric,20) Synopsis: This function counts the number of occurrences of each of the twenty amino acids. It returns a list in the standard amino acid order. This function requires that a patricia tree has been created for the database assigned to DB. 
Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > amino_acid_counts := GetAaCount(DB); amino_acid_counts := [4667613, 3174685, 2506004, 3164021, 932902, 2350027, 3935943, 4140146, 1358718, 3522464, 5740368, 3536945, 1416890, 2392860, 2893327, 4101839, 3256308, 692928, 1836010, 4004568] See also: ?GetAaFrequency GetAaFrequency Function GetAaFrequency Calling Sequence: GetAaFrequency(db) Parameters: Name Type --------------- db database Returns: NULL Synopsis: This procedure computes the percent amino acid frequencies of the database assigned to db. It prints out the results in a nice format. This function requires that a patricia tree has been created for the database assigned to db. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > GetAaFrequency(DB); Alanine 7.83 % Arginine 5.32 % Asparagine 4.20 % Aspartic acid 5.31 % Cysteine 1.56 % Glutamine 3.94 % Glutamic acid 6.60 % Glycine 6.94 % Histidine 2.28 % Isoleucine 5.91 % Leucine 9.63 % Lysine 5.93 % Methionine 2.38 % Phenylalanine 4.01 % Proline 4.85 % Serine 6.88 % Threonine 5.46 % Tryptophan 1.16 % Tyrosine 3.08 % Valine 6.72 % unknown 0.01 % GetAllNucPepMatches Function GetAllNucPepMatches Option: builtin Calling Sequence: GetAllNucPepMatches(npm,D,goal) Parameters: Name Type ------------------ npm NucPepMatch D DayMatrix goal numeric Returns: list Synopsis: Return the list of all NucPepMatch between the nucleotide and the peptide sequences in npm reaching goal. 
Examples: See also: ?GetAllMatches ?NucPepMatch GetComplement Function GetComplement Calling Sequence: GetComplement(nuc) Parameters: Name Type Description ----------------------------------------- nuc string a string of DNA/RNA bases Returns: string Global Variables: CO_Cache Synopsis: Computes the complementary DNA/RNA strand for the given sequence. Examples: > GetComplement('ACTTACG'); CGTAAGT See Also: ?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB ?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt ?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase GetEntryInfo Function GetEntryInfo - selected tag information from a database entry Calling Sequence: GetEntryInfo(EntryDescr,tag1,tag2) Parameters: Name Type Description ----------------------------------------------------------------------------- EntryDescr {integer,list,string} an entry, entry offset or a list of same tag1 string tag2 optional tags Returns: expseq(string) Synopsis: Return the information tags (tag1 and additional optional tags) for an entry given by offset or several entries given by an Entry data structure. The function returns an expression sequence of string, two elements for each entry and tag - the first being the tag and the second being the information for that tag. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > GetEntryInfo(100,'DE'); DE, 104 kDa microneme-rhoptry antigen. 
> GetEntryInfo([Entry(1,2)],'AC','ID' ); AC, P15711;, ID, 104K_THEPA, AC, Q43495;, ID, 108_LYCES See Also: ?Entry ?SearchAC ?SearchTag ?GetEntryNumber ?SearchID ?Species_Entry GetEntryNumber Function GetEntryNumber Option: builtin Calling Sequence: GetEntryNumber(offset,df) Parameters: Name Type Description --------------------------------------------------------------------- offset {integer,string} an offset of an entry or an entry df database optional - will default to DB if assigned Returns: integer Synopsis: Return the number of the entry which contains the given offset in df (default DB). If the argument is a string, it is assumed to be part of the database - in which case the entry number including the beginning of the string is returned. Examples: > a := Entry(1); a := 104K_THEPAP15711;104 kDa ..(1255).. L > GetEntryNumber(a); 1 > GetEntryNumber(34675449); 30873 See also: ?Entry ?GetEntryInfo ?GetOffset ?TextHead GetFileInfo Function GetFileInfo( CommentString:string ) Determines some information about where, when and by whom a file has been created. GetGramRegionScore Function GetGramRegionScore Option: builtin Calling Sequence: GetGramRegionScore(n,S) Parameters: Name Type ----------------- n string S GramRegion Returns: numeric Synopsis: Computes k-gram region scores over nucleotide sequence n according to S. See also: ?GetGramRegion ?GetMolWeight ?GetMostFrequentGrams GetGramSiteScore Function GetGramSiteScore Option: builtin Calling Sequence: GetGramSiteScore(n,S) Parameters: Name Type --------------- n string S GramSite Returns: NULL Synopsis: Computes k-gram site scores over nucleotide sequence n according to S. Examples: See also: ?GetGramRegionScore ?GetGramSite ?GramSite GetIntrons Function GetIntrons Calling Sequence: GetIntrons(m) Parameters: Name Type ------------------ m NucPepMatch Returns: list Synopsis: Returns the introns derived from m. m[Introns] must be defined.
Examples: See also: ?NucPepMatch GetLcaSubtree Function GetLcaSubtree( t ) Get all leaf numbers of tree t GetMATreeNew Function GetMATreeNew( MA:array(string) ) Estimates Dist and Var Matrices from an alignment GetMachineUsage Function GetMachineUsage Calling Sequence: GetMachineUsage(logfile) Parameters: Name Type ---------------- logfile string Returns: NULL Synopsis: Reads a log file created by ParExecuteIPC and produces a listing containing all machines and the work they did, sorted by machine usage. See Also: ?ConnectTcp ?ipcsend ?ParExecuteTest ?SendDataTcp ?darwinipc ?ParExecuteIPC ?ReceiveDataTcp ?SendTcp ?DisconnectTcp ?ParExecuteSlave ?ReceiveTcp GetMolWeight Function GetMolWeight Calling Sequence: GetMolWeight(s) Parameters: Name Type Description ------------------------------------------------------------------ s {string,list(string)} an (or list of) amino acid sequence Returns: {numeric,list(numeric)} Synopsis: This function computes the molecular weight of an amino acid sequence or list of amino acid sequences. Examples: > GetMolWeight('IHGGCA'); 556.6290 > GetMolWeight(['VTTWD', 'LIHAAG']); [620.6250, 580.6720] See also: ?GetMostFrequentGrams GetMostFrequentGrams Function GetMostFrequentGrams Option: builtin Calling Sequence: GetMostFrequentGrams(n,k) Parameters: Name Type ------------- n posint k posint Returns: NULL Synopsis: This function prints out the n most frequent k-grams (sequences of length k). It requires a database loaded at system variable DB. 
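The counting described above can be sketched outside Darwin. The following Python snippet is only an illustration under stated assumptions: it works on a plain string rather than on the database in DB, the helper name most_frequent_grams is hypothetical, and it does not claim to reproduce the builtin's ordering or output format. It reports, for each of the n most frequent k-grams, the number of occurrences with overlaps and without overlaps.

```python
from collections import Counter

def most_frequent_grams(text, n, k):
    """Return the n most frequent k-grams of text as tuples
    (gram, occurrences_with_overlaps, occurrences_without_overlaps).
    Hypothetical helper, not part of Darwin."""
    # Slide a window of width k over the text; this counts overlapping hits.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    result = []
    for gram, total in counts.most_common(n):
        # str.count scans left to right and skips past each hit,
        # so it yields the number of non-overlapping occurrences.
        result.append((gram, total, text.count(gram)))
    return result

print(most_frequent_grams("AAAAAACGACG", 2, 3))
```

Runs of a single repeated letter are where the two counts differ most, which is why both are reported.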
Examples: > GetMostFrequentGrams(5, 5); The 5 most frequent strings of length 5 or longer are: "GGGGG" occurs 4359 times (1997 without overlaps) "EEEEE" occurs 4718 times (2075 without overlaps) "SSSSS" occurs 4980 times (2320 without overlaps) "AAAAA" occurs 5924 times (2793 without overlaps) "QQQQQ" occurs 7032 times (2404 without overlaps) See Also: ?DB ?GetGramRegion ?GetGramRegionScore ?GetMolWeight ?GramRegion GetOffset Function GetOffset - Gets an offset in the database for a string Option: builtin Calling Sequence: GetOffset(seq) Parameters: Name Type Description --------------------------------------------------- seq string a string in or outside the database Returns: integer Synopsis: The GetOffset function finds the offset of a string whether it is in the database or outside. It is necessary when we want to pretend that a string is a sequence in the database to make it an argument of Match. GetOffset requires that the system variable DB be assigned a sequence database. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > CreateDayMatrices(); > s1 := 'MSRYEKMFARLNERNQGAFVPFVTVCDPNAEQSYKIMETLVESGADALELGIPFSDP': > s2 := 'MLLLSVNPPLFIPFIVAGDPSPEVTVDLALALEEAGADLLELGVPYSDP': > m3 := Match( GetOffset(s1), GetOffset(s2) ); m3 := Match(801680240,487437408) See also: ?MAlign ?NucPepMatch ?ReadDb ?TotalAlign GetPartitions Function GetPartitions( ) returns the splits or partitions of a data set or a tree. The resulting data structure is a list of sets GetPathDistance Function GetPathDistance( order:array ) order: order of tree or AllAll traversal If the second argument is a tree, then the tree is traversed in the given order and the length (only in PAM units!) of the path is returned. If the second argument is an array of Matches (AllAll) then the AllAll is "traversed" in the given order and the path length is returned.
The score is always divided by the length of the match The second argument may also be a distance matrix. If a third argument is given ("PAM" or "SCORE") the units of the distance can be chosen (for the AllAll) GetPeptides Function GetPeptides Calling Sequence: GetPeptides(m) Parameters: Name Type ------------------ m NucPepMatch Returns: NULL Synopsis: Returns the peptide derived from m. m[NucGaps] and m[Introns] must be defined. Note that amino acids derived from indels which are a multiple of 3 do not always correspond to the reading frame implied by the alignment. Examples: See also: ?NucPepMatch GetPosition Function GetPosition Calling Sequence: GetPosition(df,ofs) Parameters: Name Type --------------- df database ofs integer Returns: list Global Variables: DB Synopsis: Returns [pos, len] such that t is the complete sequence ofs is pointing to after execution of 't := ofs + df[string]; t := t[1..len];'. Examples: See also: GetSubTree_r Function GetSubTree_r( t:Tree, i, j ) Get the subtree that has both leaves i and j, one in the left and one in the right subtree GetTreeLabels Function GetTreeLabels Calling Sequence: GetTreeLabels(t) Parameters: Name Type ----------- t Tree Returns: list Synopsis: This function returns a list of all the leaf labels present in t. Examples: > T := Tree( Leaf(a, 2), 0.5, Leaf(e, 1) ); T := Tree(Leaf(a,2),0.5000,Leaf(e,1)) > GetTreeLabels( T ); a, e See also: ?Leaf ?Tree GivensElim Function GivensElim Calling Sequence: GivensElim(A) Parameters: Name Type ---------------------------------- A an m x n numerical matrix Returns: [matrix, matrix] : [Q,R] Synopsis: GivensElim factors an m x n matrix A into two factors, A = Q*R. This decomposition is done with individual Givens' rotations. The decomposition is commonly called the QR-decomposition. Q is an m x m square orthonormal matrix, that is Q*Q^t = I. R is an m x n upper triangular matrix. 
If the matrix is found to be singular, then R will have zeros in the diagonal and the decomposition is still correctly done. References: Computermathematik, Walter Gander, Birkhauser, Ch 5.3 Examples: > GivensElim( [[1,2], [-2,3]] ); [[[-0.4472, -0.8944], [0.8944, -0.4472]], [[-2.2361, 1.7889], [0, -3.1305]]] See Also: ?Cholesky ?GaussElim ?matrix ?convolve ?Identity ?matrix_inverse ?Eigenvalues ?LinearProgramming ?transpose GlobalNucPepAlign Function GlobalNucPepAlign Calling Sequence: GlobalNucPepAlign(m,DM) Parameters: Name Type ------------------ m NucPepMatch DM DayMatrix Returns: NucPepMatch Global Variables: DB Synopsis: Run the dynamic programming algorithm for the given Match with the given DM matrix. Examples: See Also: ?AlignNucPepAll ?GetPeptides ?VisualizeGene ?FindNucPepPam ?LocalNucPepAlign ?VisualizeProtein ?Gene ?LocalNucPepAlignBestPam ?GetIntrons ?NucPepMatch Globals Function Globals - returns all global variables set inside a function Calling Sequence: Globals(func) Parameters: Name Type Description ------------------------------- func procedure the function Returns: set(symbol) Synopsis: Globals returns all global variables that are set inside a function. For functions inside modules and inside other functions Globals returns exactly those global variables that are also visible to the user. Variables that are only global inside a module are not reported. Examples: > Globals(CreateDayMatrices); {AF,DM,DMS,logPAM1} See also: ?local ?UnassignGlobals GramRegion Class GramRegion Template: GramRegion(ProbI,ProbE,Extend,LogR0) GramRegion(intCounts,totCounts,Extend,LogR0) Fields: Name Type ----------------------- ProbI array(numeric) ProbE array(numeric) Extend integer LogR0 numeric Mean numeric Min numeric Max numeric Returns: structure(array,array,integer,numeric) Methods: GramRegion_type print select Synopsis: Structure to hold GramRegion scoring model data. 
If called with array(integer) as the first two arguments (holding Counting data), it returns the GramRegion data structure with ProbI and ProbE. Examples: See also: ?GetGramRegion ?GetMolWeight ?GetMostFrequentGrams GramSchmidt Function GramSchmidt Calling Sequence: GramSchmidt(A) Parameters: Name Type -------------------------------------------------------- A a list of linearly independent vectors (a matrix) Returns: matrix(numeric) Synopsis: The GramSchmidt function computes an orthonormal basis spanning the same subspace as the vectors in A. The input matrix A is interpreted as a list of vectors. The vectors have to be all of the same dimension and linearly independent. The result is a list of orthonormal vectors, with the same dimension as A. If the dimension of A is m x n, then m <= n. This is often called the Gram-Schmidt orthonormalization process. Examples: > GramSchmidt( [[1,2],[1,-1]] ); [[0.4472, 0.8944], [0.8944, -0.4472]] > GramSchmidt( [[0,1,-1],[1,-1,3]] ); [[0, 0.7071, -0.7071], [0.5774, 0.5774, 0.5774]] See Also: ?Cholesky ?GaussElim ?LinearProgramming ?transpose ?convolve ?GivensElim ?matrix ?Eigenvalues ?Identity ?matrix_inverse GramSite Class GramSite Template: GramSite(Scores,LeftLen,LogR0) GramSite(counts,totCounts,LeftLen,LogR0) Fields: Name Type -------------------------------- Scores array(array(numeric)) LeftLen posint LogR0 numeric RightLen posint Mean numeric Min numeric Max numeric Returns: structure(array,posint,numeric) Methods: GramSite_type print select Synopsis: Structure to hold k-gram scoring model data. 
When called with counts and totCounts, it returns the Gram Site Scores Examples: See also: ?GetGramSite ?GetGramSiteScore Graph Class Graph - Data structure for storing a graph Template: Graph(Edges,Nodes) Fields: Name Type Description ----------------------------------------------------------------------------- Edges Edges description of edges Nodes Nodes description of nodes (vertices) Degrees list(integer) a list containing the degree of each node Adjacencies list(list) an array with lists of adjacent nodes, indexed by node number Incidences list(list) an array with lists of incident edge numbers, indexed by node number. Distances matrix(numeric) a square matrix containing the distance between pairs of nodes (assuming that the edges label is a distance or a list with first element being a distance). Disconnected nodes are at dist DBL_MAX. Labels matrix a square matrix containing the labels of edges between pairs of nodes. Disconnected nodes get the label DBL_MAX. AdjacencyMatrix matrix a square matrix containing 1s for each edge and zeros otherwise. The matrix is symmetric. The diagonal is zeroed. Methods: dimacs display Graph_type minus plus Rand select Tree union XGMML Synopsis: A graph is represented by Graph( Edges( Edge(lab1,n1,n2), ... ), Nodes( lab1, lab2, ... ) ) Where Edges describes the set of edges and Nodes describes the set of nodes. Alternatively, and only as input, graphs can be represented with the standard notation of set of vertices and sets of edges. In this case an edge is represented as a set of two vertices. A node (or vertex) can be represented by any valid object in Darwin. Usually integers are used. Notice that the values of Edge must correspond to a node, hence if you use complicated objects as nodes, these have to be replicated every time you include them in an Edge. If Graph is used with only a set of Edges, it deduces which are the Nodes from the Edges. 
Examples: > Graph({a,b,c},{{a,b},{a,c},{b,c}}); Graph(Edges(Edge(0,a,b),Edge(0,a,c),Edge(0,b,c)),Nodes(a,b,c)) > Graph(Edges(Edge(0,10,20),Edge(0,10,35)), Nodes(0,10,20,35)); Graph(Edges(Edge(0,10,20),Edge(0,10,35)),Nodes(0,10,20,35)) > Graph(Edges(Edge(0,10,20),Edge(0,10,35))); Graph(Edges(Edge(0,10,20),Edge(0,10,35)),Nodes(10,20,35)) See Also: ?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph ?Clique ?Graph_XGMML ?Path ?DrawGraph ?InduceGraph ?RegularGraph ?Edge ?MaxCut ?ShortestPath ?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph ?Edges ?MinCut ?VertexCover ?FindConnectedComponents ?MST ?Graph_minus ?Nodes Graph_Rand Function Graph_Rand - generate a random graph Calling Sequence: Rand(Graph) Graph_Rand(n,m) Parameters: Name Type Description -------------------------------------------------- n integer optional number of nodes/vertices m integer optional number of edges Returns: Graph Synopsis: Generate a random undirected graph with n nodes and m edges. If m is not specified, then the number of edges is <= n*ln(n). If n is not specified, a random value between 5 and 20 is chosen. The Edges are all labelled with 0. 
Examples: > Rand(Graph); Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,9),Edge(0,1,10),Edge(0,1,13),Edge(0,1,15),Edge(0,2,4),Edge(0,2,7),Edge(0,2,8),Edge(0,2,9),Edge(0,2,10),Edge(0,2,13),Edge(0,2,14),Edge(0,2,15),Edge(0,2,16),Edge(0,3,5),Edge(0,3,10),Edge(0,3,11),Edge(0,3,15),Edge(0,3,16),Edge(0,4,8),Edge(0,4,9),Edge(0,4,11),Edge(0,4,13),Edge(0,4,14),Edge(0,5,7),Edge(0,5,8),Edge(0,5,11),Edge(0,6,8),Edge(0,6,10),Edge(0,6,12),Edge(0,7,8),Edge(0,7,12),Edge(0,7,15),Edge(0,8,14),Edge(0,9,12),Edge(0,9,14),Edge(0,9,16),Edge(0,10,11),Edge(0,11,12),Edge(0,11,13),Edge(0,11,15),Edge(0,12,14),Edge(0,13,15),Edge(0,14,15)),Nodes(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)) > Graph_Rand(3,4); Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,2,3)),Nodes(1,2,3)) See Also: ?BipartiteGraph ?Graph_minus ?ParseDimacsGraph ?Clique ?Graph_XGMML ?Path ?DrawGraph ?InduceGraph ?RegularGraph ?Edge ?MaxCut ?ShortestPath ?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph ?Edges ?MinCut ?VertexCover ?FindConnectedComponents ?MST ?Graph ?Nodes Graph_XGMML Function Graph_XGMML Examples: > G:=Graph(Edges(Edge(1,A,B),Edge(2,B,C)),Nodes(A,B,C)); G := Graph(Edges(Edge(1,A,B),Edge(2,B,C)),Nodes(A,B,C)) > print(Graph_XGMML(G)); See Also: ?BipartiteGraph ?Graph_minus ?ParseDimacsGraph ?Clique ?Graph_Rand ?Path ?DrawGraph ?InduceGraph ?RegularGraph ?Edge ?MaxCut ?ShortestPath ?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph ?Edges ?MinCut ?VertexCover ?FindConnectedComponents ?MST ?Graph ?Nodes Graph_minus Function Graph_minus Calling Sequence: Graph_minus(G,V) Graph_minus(G,E) Graph_minus(G,G1) Parameters: Name Type Description ------------------------------------- G Graph a given Graph V Nodes Nodes to be removed E Edges Edges to be removed G1 Graph Subgraph to be removed Returns: Graph Synopsis: This function removes a set of either edges or vertices from a graph and returns the updated graph. Note that the deletion of an edge does not remove the vertex end points of this edge. The deletion of a vertex removes all incident edges.
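The two deletion rules stated in the Graph_minus synopsis can be illustrated with a small Python sketch. This is not Darwin code and not the library's implementation: the tuple representation (label, n1, n2) for an edge and the helper names graph_minus_nodes and graph_minus_edges are assumptions made only for this illustration.

```python
def graph_minus_nodes(edges, nodes, removed):
    """Remove vertices; every edge incident to a removed vertex goes too.
    Hypothetical helper: edges are (label, n1, n2) tuples."""
    removed = set(removed)
    kept_nodes = [v for v in nodes if v not in removed]
    kept_edges = [(lab, a, b) for (lab, a, b) in edges
                  if a not in removed and b not in removed]
    return kept_edges, kept_nodes

def graph_minus_edges(edges, nodes, removed_edges):
    """Remove edges only; their end points stay in the node list."""
    removed = set(removed_edges)
    return [e for e in edges if e not in removed], list(nodes)

edges = [("a", 1, 2), ("b", 2, 3), ("c", 3, 4)]
nodes = [1, 2, 3, 4, 5]
print(graph_minus_edges(edges, nodes, [("b", 2, 3)]))
print(graph_minus_nodes(edges, nodes, [2, 3]))
```

Removing the nodes 2 and 3 leaves no edges at all, because each of the three edges touches one of them.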
Examples: > G1 := Graph(Edges(Edge('a',1,2), Edge('b',2,3), Edge('c',3,4)), Nodes(1,2,3,4,5)); G1 := Graph(Edges(Edge(a,1,2),Edge(b,2,3),Edge(c,3,4)),Nodes(1,2,3,4,5)) > Graph_minus(G1, Edges(Edge('b',2,3))); Graph(Edges(Edge(a,1,2),Edge(c,3,4)),Nodes(1,2,3,4,5)) > G1 minus Nodes(2,3); Graph(Edges(),Nodes(1,4,5)) > Graph_minus(Graph(Edges(), Nodes(1,2,3,4)), Nodes(1,2,3,4)); Graph(Edges(),Nodes()) See Also: ?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph ?Clique ?Graph_XGMML ?Path ?DrawGraph ?InduceGraph ?RegularGraph ?Edge ?MaxCut ?ShortestPath ?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph ?Edges ?MinCut ?VertexCover ?FindConnectedComponents ?MST ?Graph ?Nodes HTMLColor Function HTMLColor( what, color ) Converts any text into html format with color information what: any text color: color keyword, at least 2 (3 for blue and black) letters e.g. "ye" (yellow), "light blue", "pink", "li gr" etc HTMLColorprint Function HTMLColorprint( what:anything, Directory:string, Filename:string, Positions:array, Colors:array(string), index:integer, DP0 ) Prints what either into a file (if specified) or just returns the string in html format. what: MultiAlign similar to print(), but creates a postscript file for the tree in the same directory, same filename with .ps extension string adds to the text and prints it Filename: Use NO extension. ".html" is added automatically HTMLCols Function HTMLCols( what:array(string), border:integer ) Puts each element of the array into different columns. border: thickness of border (0 => invisible) HTMLRows Function HTMLRows( what:array(string), border:integer ) Puts each element of the array into different rows. border: thickness of border (0 => invisible) HTMLTitle Function HTMLTitle( what:string, how:string ) Converts any text into html headings, bold or italic text what: any text how: keyword, at least 1 letter e.g. "H1" (Heading 1), "Heading 4", "bold", "it" etc.
HTMLprint Function HTMLprint( what:anything, Directory:string, Filename:string ) Prints what either into a file (if specified) or just returns the string in html format. what: MultiAlign similar to print(), but creates a postscript file for the tree in the same directory, same filename with .ps extension what: string adds to the text and prints it Filename: Use NO extension. ".html" is added automatically HammingSearchAllString Function HammingSearchAllString - Find several approximate instances of a phrase in a text Calling Sequence: HammingSearchAllString(pat,txt) Parameters: Name Type Description ----------------------------------------- pat string a pattern that is sought txt string a text which is searched dist integer an (opt.) hamming dist Returns: list Synopsis: The function HammingSearchAllString returns the list of indices of all the occurrences of the pattern in the text within the given Hamming distance (default 1). If the pattern cannot be found, it returns an empty list. This function is case insensitive. Examples: > HammingSearchAllString('cat', 'acgcatcatgcatcagtca'); [4, 7, 11, 14] See Also: ?BestSearchString ?MatchRegex ?SearchMultipleString ?CaseSearchString ?SearchApproxString ?SearchString ?HammingSearchString ?SearchDelim HammingSearchString Function HammingSearchString Option: builtin Calling Sequence: HammingSearchString(pat,txt,tol) Parameters: Name Type ------------------ pat string txt string tol {0, posint} Returns: {-1,posint} Synopsis: This function is almost identical to ApproxSearchString. The only difference is that insertions and deletions are not allowed.
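Matching with substitutions only, as described above, amounts to comparing the pattern against every window of the text and counting mismatches. The Python sketch below illustrates the idea; it is not Darwin code, and the helper names hamming_search and hamming_search_all are hypothetical. It assumes case-insensitive comparison, that the single-match variant returns the offset of the match (the number of characters before it, or -1), and that the all-matches variant returns 1-based positions, which is how the examples in this help file read.

```python
def hamming_search(pat, txt, tol=1):
    """Offset of the first window of txt that differs from pat in at most
    tol positions (substitutions only, case-insensitive), or -1.
    Hypothetical helper, not the Darwin builtin."""
    p, t = pat.lower(), txt.lower()
    for offset in range(len(t) - len(p) + 1):
        window = t[offset:offset + len(p)]
        # Count positions where pattern and window disagree.
        if sum(a != b for a, b in zip(p, window)) <= tol:
            return offset
    return -1

def hamming_search_all(pat, txt, dist=1):
    """1-based positions of every window within Hamming distance dist."""
    p, t = pat.lower(), txt.lower()
    hits = []
    for offset in range(len(t) - len(p) + 1):
        window = t[offset:offset + len(p)]
        if sum(a != b for a, b in zip(p, window)) <= dist:
            hits.append(offset + 1)
    return hits

print(hamming_search("hallo", "AAAAAAAAAHeLLoBBBBB", 1))
print(hamming_search_all("cat", "acgcatcatgcatcagtca"))
```

Because insertions and deletions are excluded, the pattern 'aahllo' finds no window within one mismatch of the same text, although a single transposition would align it.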
Examples: > txt := 'AAAAAAAAAHeLLoBBBBB'; txt := AAAAAAAAAHeLLoBBBBB > j := HammingSearchString('hallo', txt, 1); j := 9 > j+txt; HeLLoBBBBB > HammingSearchString('aahllo', txt, 1); -1 See Also: ?BestSearchString ?SearchApproxString ?SearchString ?CaseSearchString ?SearchDelim ?MatchRegex ?SearchMultipleString History Data structure History( ) Function: creates a datastructure to keep a history of what happened Selectors: Show: Prints the whole history ID Class ID - Data structure for storing IDs of the database DB Template: ID(id) Fields: Name Type Description ------------------------------------------------------------------------ id {list,string,structure} ID(s) of Entries in the database DB PatEntry, Match or Entry data structure Returns: ID Methods: Entry ID_type Sequence Synopsis: ID is a data structure which holds database identification tags (IDs) contained in the and tags in a Darwin formatted database. IDs can be used as arguments to other functions, e.g. Entry, Sequence, to indicate that the Entry or sequence desired is the one with the given ID. ID will attempt to convert its arguments when they are other entry descriptions to IDs. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > id := ID('100K_RAT'); id := ID(100K_RAT) > Entry(id); > Sequence(id); > ID(Entry(2)); ID(108_LYCES) > ID(PatEntry(10000..10002)); ID(SYP1_YEAST,SYP_CHLPN,SYQ_DEIRA) > ID(Sequence(Entry(1))); ID(104K_THEPA) See Also: ?AC ?Match ?SearchAC ?Sequence ?Species_Entry ?Entry ?PatEntry ?SearchID ?SPCommonName ?SP_Species IdenticalTrees Function IdenticalTrees - test whether two trees have the same topology Calling Sequence: IdenticalTrees(t1,t2) Parameters: Name Type ----------- t1 Tree t2 Tree Returns: boolean Synopsis: IdenticalTrees tests whether the two given trees have the same topology (shape, relation between the leaves). The branch lengths are ignored. 
The trees must have leaves based on the same labels (first argument of Leaf). If the set of leaf labels differs, IdenticalTrees will return false. Examples: > t1 := Tree(Tree(Leaf(a,2),1,Leaf(b,2)),0,Tree(Leaf(c,2),1,Leaf(d,2))): > t2 := Tree(Tree(Leaf(a,2),1,Leaf(d,2)),0,Tree(Leaf(c,2),1,Leaf(b,2))): > IdenticalTrees(t1,t2); false See also: ?Leaf ?Tree Identity Function Identity - create an identity matrix Calling Sequence: Identity(n) Parameters: Name Type Description --------------------------------------- n posint dimension of the matrix Returns: matrix(integer) Synopsis: Creates a new identity matrix of dimension n x n. Examples: > Identity(3); [[1, 0, 0], [0, 1, 0], [0, 0, 1]] See Also: ?Cholesky ?GaussElim ?matrix ?convolve ?GivensElim ?matrix_inverse ?Eigenvalues ?LinearProgramming ?transpose If Function If Option: builtin Calling Sequence: If(cond,exptrue,expfalse) Parameters: Name Type ----------------------------- cond boolean expression exptrue expression expfalse expression Returns: {type(expfalse),type(exptrue)} Synopsis: The If construct provides a short hand version of the if-then-fi construct. Every If can be re-written as follows: > if cond then exptrue else expfalse fi; Note that the If function returns the result of exptrue or expfalse. Examples: > x:=5; x := 5 > If(mod(x,2)=0, x/2, (x-1)/2); 2 InduceGraph Function InduceGraph Calling Sequence: InduceGraph(G,V) InduceGraph(G,E) Parameters: Name Type Description -------------------------------------- G Graph a given Graph V Nodes Nodes inducing subgraph E Edges Edges inducing subgraph Returns: Graph Synopsis: This function computes a vertex- or edge-induced subgraph. A vertex-induced subgraph is one that consists of some of the vertices of the original graph and all of the edges that connect them in the original. An edge-induced subgraph consists of some of the edges of the original graph and the vertices that are at their endpoints. 
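The two kinds of induced subgraph described above can be sketched in Python. This is an illustration only, not Darwin code: edges are assumed to be (label, n1, n2) tuples, and the helper names induce_by_nodes and induce_by_edges are hypothetical.

```python
def induce_by_nodes(edges, keep):
    """Vertex-induced subgraph: the chosen vertices plus every original
    edge whose two end points are both chosen. Hypothetical helper."""
    keep_set = set(keep)
    sub_edges = [(lab, a, b) for (lab, a, b) in edges
                 if a in keep_set and b in keep_set]
    return sub_edges, sorted(keep_set)

def induce_by_edges(edges, chosen):
    """Edge-induced subgraph: the chosen edges (those present in the
    original graph) plus the vertices at their end points."""
    present = set(edges)
    sub_edges = [e for e in chosen if e in present]
    sub_nodes = sorted({v for (_, a, b) in sub_edges for v in (a, b)})
    return sub_edges, sub_nodes

edges = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (0, 2, 4)]
print(induce_by_nodes(edges, [1, 2, 3]))
print(induce_by_edges(edges, [(0, 1, 2), (0, 2, 3)]))
```

Both calls end up with the vertex set 1, 2, 3, but only the vertex-induced version picks up the edge between 1 and 3.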
Examples: > G := Graph( {{1,2},{2,3},{1,3},{2,4}},{1,2,3,4} ); G := Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,2,3),Edge(0,2,4)),Nodes(1,2,3,4)) > InduceGraph( G, Nodes(1,2,3) ); Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,2,3)),Nodes(1,2,3)) > InduceGraph( G, Edges( Edge(0,1,2),Edge(0,2,3)) ); Graph(Edges(Edge(0,1,2),Edge(0,2,3)),Nodes(1,2,3)) See Also: ?BipartiteGraph ?Graph_minus ?ParseDimacsGraph ?Clique ?Graph_Rand ?Path ?DrawGraph ?Graph_XGMML ?RegularGraph ?Edge ?MaxCut ?ShortestPath ?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph ?Edges ?MinCut ?VertexCover ?FindConnectedComponents ?MST ?Graph ?Nodes InfixNr Function InfixNr( t:Tree ) returns all numbers of the leaves in a tree (or a leaf) Inherit Function Inherit - Inherit all defined methods of the old into the new class Calling Sequence: Inherit(newclass,oldclass) Parameters: Name Type Description -------------------------------------------------- newclass symbol The new class being extended oldclass symbol The class donating the methods Returns: NULL Synopsis: All methods defined for the oldclass which are not defined in the newclass are converted to work with the newclass. Any method that should not be inherited must be defined for newclass before calling Inherit. Alternatively, an unwanted method can be removed by using noeval. Multiple inheritance is obtained by invoking Inherit more than once. Inherit benefits from the availability of newclass_Rand, if the objects have some special property. In general, it is a good idea to define all the methods which are particular to newclass before invoking Inherit. Note: Since the newclass is not a subclass of the oldclass (but only convertible) objects of type newclass are not of type oldclass and a corresponding test with the function "type" results in "false". Examples: > Polar := proc( Rho:numeric, Theta:numeric ) ...
end; > Polar_abs := proc( a:Polar ) a[Rho] end; Polar_abs := proc (a:Polar) a[Rho] end > Inherit(Polar,Complex); See also: ?CompleteClass ?ExtendClass ?objectorientation ?Protect IntOut Function IntOut( IntMatrix:array(array(array)), IntMatrixTot:array(array) ) Returns for each position the IntProb of being interior, the size of the largest APC subgroup at the specified MaxPW and IntAA used to determine IntProb IntToA Function IntToA - convert an integer into a 1 letter amino-acid name Option: builtin Calling Sequence: IntToA(x) Parameters: Name Type Description ---------------------------------------- x integer an integer from 1 to 20 Returns: string Synopsis: This function converts a posint into a one letter abbreviation of an amino acid. This follows the standard ordering of amino acids. (See ?aminoacids) Examples: > IntToA(20); V See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt ?aminoacids ?BBBToInt ?CIntToInt ?IntToAAA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAmino ?AToCInt ?CIntToA ?CodonToA ?IntToB ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase ?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB IntToAAA Function IntToAAA - convert an integer into a 3 letter amino-acid name Option: builtin Calling Sequence: IntToAAA(x) Parameters: Name Type Description ---------------------------------------- x integer an integer from 1 to 20 Returns: string Synopsis: This function converts a posint into a three letter abbreviation of an amino acid. This follows the standard ordering of amino acids. 
(See ?aminoacids) Examples: > IntToAAA(1); Ala See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt ?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAmino ?AToCInt ?CIntToA ?CodonToA ?IntToB ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase ?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB IntToAmino Function IntToAmino - convert an integer into an amino-acid name Option: builtin Calling Sequence: IntToAmino(x) Parameters: Name Type Description ---------------------------------------- x integer an integer from 1 to 20 Returns: string Synopsis: This function converts a posint into the full name for an amino acid following the standard ordering of amino acids. (See ?aminoacids) Examples: > IntToAmino(15); Proline See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt ?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAAA ?AToCInt ?CIntToA ?CodonToA ?IntToB ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase ?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB IntToAscii Function IntToAscii - convert an integer to its ascii ordinal character Option: builtin Calling Sequence: IntToAscii(i) Parameters: Name Type Description -------------------------------------------- i posint an integer between 1 and 255 Returns: string Synopsis: Converts an integer between 1 and 255 to its ascii ordinal character. The null character (octal 000) cannot be represented. This function allows an easy way to generate non-printable characters, or special (accentuated) characters. This is useful when encoding/decoding symbols for dynamic programming. It is also useful in general for the analysis of raw input. 
Examples: > IntToAscii(97); a > IntToAscii(126); ~ See Also: ?AsciiToInt ?HammingSearchString ?SearchDelim ?AToInt ?IntToA ?SearchMultipleString ?BestSearchString ?MatchRegex ?SearchString ?CaseSearchString ?SearchApproxString IntToB Function IntToB - Integer to One Letter Nucleic Option: builtin Calling Sequence: IntToB(x) Parameters: Name Type ------------- x {1..6} Returns: {A,C,G,T,U,X} Synopsis: This function converts an integer between 1..6 into the one letter code for nucleic acids A, C, G, T, U, X. Examples: > IntToB(1); A > IntToB(6); X See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt ?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAAA ?AToCInt ?CIntToA ?CodonToA ?IntToAmino ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase ?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB IntToBBB Function IntToBBB - Integer to Three Letter Nucleic Option: builtin Calling Sequence: IntToBBB(x) Parameters: Name Type ------------- x {1..5} Returns: {Ade,Cyt,Gua,Thy,Ura} Synopsis: This function converts an integer between 1..5 into the three letter code for nucleic acids Ade, Cyt, Gua, Thy, Ura respectively. Examples: > IntToBBB(1); Ade See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt ?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAAA ?AToCInt ?CIntToA ?CodonToA ?IntToAmino ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB ?AToInt ?CIntToAmino ?CodonToInt ?IntToBase IntToBase Function IntToBase - Integer to Nucleic Acid Name Option: builtin Calling Sequence: IntToBase(x) Parameters: Name Type ------------- x {1..5} Returns: {Adenine,Cytosine,Guanine,Thymine,Uracil} Synopsis: This function converts an integer between 1..5 into the full name for a nucleic acid Adenine, Cytosine, Guanine, Thymine, Uracil respectively. 
Examples: > IntToBase(1); Adenine See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt ?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAAA ?AToCInt ?CIntToA ?CodonToA ?IntToAmino ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB ?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB IntToCInt Function IntToCInt - Amino Acid Integer to List of Codon Integers Calling Sequence: IntToCInt(AA) Parameters: Name Type Description ---------------------------------- AA posint amino acid integer Returns: list Synopsis: This function converts an amino acid integer code into a list of the corresponding codon integers. It will convert the symbol for a stop codon '$' into a list of stop codons. Examples: > IntToCInt('$'); [49, 51, 57] > IntToCInt(4); [34, 36] See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB ?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon ?AminoToInt ?BToInt ?CodonCode ?IntToAAA ?AToCInt ?CIntToA ?CodonToA ?IntToAmino ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB ?AToInt ?CIntToAmino ?CodonToInt ?IntToBase IntToCodon Function IntToCodon - Integer Amino Acid Representation to List of Codons Calling Sequence: IntToCodon(AA) Parameters: Name Type Description ---------------------------------------- AA integer amino acid integer code Returns: list Synopsis: This function converts an amino acid integer code (see ?aminoacids) into a list of the corresponding codons. The amino acid integer code for the stop codons is 22. 
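The relation between amino acid integers, codons and codon integers can be sketched in Python. This is an illustration, not Darwin code. It assumes the standard genetic code, the amino acid ordering ARNDCQEGHILKMFPSTWYV for the integers 1..20, the integer 22 for stop codons as stated above, and a codon numbering of 16*b1 + 4*b2 + b3 + 1 with A=0, C=1, G=2, T=3. The ordering and the numbering are inferred from the examples in this help file and are not confirmed elsewhere; the helper names are hypothetical.

```python
from itertools import product

AMINO_ORDER = "ARNDCQEGHILKMFPSTWYV"   # assumed ordering for integers 1..20
STOP_INT = 22                          # stop codon code, per the synopsis

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
AA_TCAG = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_TCAG)}

def int_to_codon(aa_int):
    """List of codons for an amino acid integer (hypothetical helper)."""
    if aa_int == STOP_INT:
        symbol = "*"
    elif 1 <= aa_int <= 20:
        symbol = AMINO_ORDER[aa_int - 1]
    else:
        raise ValueError("unsupported amino acid integer: %r" % aa_int)
    return sorted(codon for codon, aa in CODE.items() if aa == symbol)

def codon_to_cint(codon):
    """Codon integer in 1..64 under the assumed base-4 numbering."""
    value = 0
    for base in codon:
        value = 4 * value + "ACGT".index(base)
    return value + 1

def int_to_cint(aa_int):
    """List of codon integers for an amino acid integer."""
    return [codon_to_cint(c) for c in int_to_codon(aa_int)]

print(int_to_codon(5), int_to_cint(4), int_to_cint(STOP_INT))
```

Under these assumptions the sketch reproduces the values shown in the IntToCodon and IntToCInt examples, which is the only evidence offered here for the inferred numbering.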
Examples: > IntToCodon(22); [TAA, TAG, TGA] > IntToCodon(5); [TGC, TGT] See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB ?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt ?AminoToInt ?BToInt ?CodonCode ?IntToAAA ?AToCInt ?CIntToA ?CodonToA ?IntToAmino ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB ?AToInt ?CIntToAmino ?CodonToInt ?IntToBase Interior Function Interior( Cluster:list(list(list)), MA:array(string), MaxPW:array, IntAA:array, ActMatrixOut:array ) Reports the length of the largest subgroup at defined PAM windows in which all amino acids are of the types defined in IntAA InteriorTot Function InteriorTot( IntMatrix:array(array(array)) ) Reports the sum of the length of all the largest subgroups at defined PAM windows and IntAAs counted over all positions IntraDistance Function IntraDistance - Computes the pairwise distances between trees in a list Calling Sequence: IntraDistance(Trees,DistanceFunction) Parameters: Name Type Description ---------------------------------------------------------------------- Trees list(Tree) list of trees DistanceFunction procedure (optional), distance between two trees Returns: table Synopsis: IntraDistance computes the distances between every pair of trees in the given list over the set of common leaves. That is, each pair of trees is first reduced to the subtrees of the common leaves and then the distance is computed. If there are less than 4 common leaves, the pair is ignored as the distance will be always 0. IntraDistance returns a table which contains the first three moments (0th, first and second) of the distance distribution per size of the intersecting leaves. That is to say, if r is the result, then r[4] will be a list of 3 values, which are the 0th, 1st and 2nd moment of the distribution of distances for trees which shared exactly 4 leaves. If no DistanceFunction is provided, the Robinson-Foulds distance will be used. 
If a branch length of a tree is less than or equal to MinLen, then it is assumed that this branch does not exist, i.e. this is a case of multifurcation rather than bifurcation and the corresponding edge will not be counted in the distance. This is a difference from the RobinsonFoulds distance and it allows computing distances to trees with partial information, like trees derived from taxonomic data. Examples: > st1 := Tree(Leaf(a,2),1,Leaf(b,2)): > st2 := Tree(Leaf(c,2),1,Leaf(d,2)): > st3 := Tree(Leaf(e,2),0.5,st2): > st4 := Tree(Leaf(a,2),0.5,Tree(Leaf(e,2),1,Leaf(b,2))): > r := IntraDistance( [Tree(st1,0,st2),Tree(st3,0,st1),Tree(st4,0,st2)] ): > print(r); 4 --> [2, 0, 0] 5 --> [1, 1, 1] See Also: ?BipartiteSquared ?LeastSquaresTree ?RobinsonFoulds ?BootstrapTree ?PhylogeneticTree ?SignedSynteny ?ComputeDimensionlessFit ?RBFS_Tree ?Synteny ?GapTree ?ReconcileTree Intron Class Intron Template: Intron(n,pam,div) Fields: Name Type Description ------------------------------------- n string nucleotide sequence pam numeric PAM distance div string code of the division Returns: Intron Methods: Intron_type Global Variables: IT_model IT_olddiv IT_oldn IT_oldres IT_scores Synopsis: Computes and stores the Bayesian probabilistic intron scoring model. Use Intron(div) to select the scoring model for division div. Divisions are fun, inv, mam, pln, pri, pro, rod, vrt, any. Examples: See also: IntronModel Class IntronModel Template: IntronModel(Donor,InIntron,Acceptor,MinLen) Fields: Name Type Description ----------------------------------- Donor GramSite InIntron GramRegion Acceptor GramSite MinLen posint Returns: IntronModel Methods: IntronModel_type print select Synopsis: Structure to hold intron scoring model data. See also: ?LinearIntron IsolationIndex Function IsolationIndex( d:matrix(numeric), I:set ) Computes isolation index for the split [I, {1..length(d)} minus I].
KHTest Function KHTest - Runs KH test on two tree topologies over a MAlignment. Calling Sequence: KHTest(msa,t1,t2) Parameters: Name Type Description ------------------------------------------------------------------------ msa MAlignment Multiple sequence alignment t1 Tree First tree t2 Tree Second tree method string (optional) BS; RELL; CONV (default) subst string (optional) Substitution model for PhyML (LG) nrOfBootraps posint (optional) Number of bootstraps (100) sigLevel numeric (optional) Significance level Returns: boolean Synopsis: Run KH test on two tree topologies over a MAlignment and return whether the null hypothesis is rejected or not. Tree topologies are kept fixed during resampling. KHTest returns true if null hypothesis is rejected, false otherwise. PhyML is employed to do likelihood maximization and must be installed in order to use this function. KHTest uses either a convolution (default), RELL or bootstrap. References: Goldman N., Anderson J.P., Rodrigo A.G. Likelihood-Based Tests of Topologies in Phylogenetics, Systematic Biology, 49:652-670, 2000 Examples: > ReadProgram('datasets/quartet1/trees.drw');; > msa := ReadFastaIntoMAlignment('datasets/quartet1/MSA_1.fa');; > lprint('BootStrap', KHTest(msa,tree1,tree2,method='BS'));; > lprint('RELL', KHTest(msa,tree1,tree2,method='RELL'));; KWIndex Function KWIndex - Compute the Kabat-Wu Variation Index Calling Sequence: KWIndex(ma) Parameters: Name Type Description -------------------------------------------------- ma array(string) multiple sequence alignment Returns: list(numeric) Synopsis: Computes the Kabat-Wu variation index for all positions of a multiple alignment. References: T.T. Wu, E.A. Kabat: An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 132(1970): 211-250. Examples: > ma := [ ' -------------------------FPE', ' - ..(295).. 
LQCVKYYYV']; ma := [ -------------------------FPE, -------------------IASAGFVRD, AKQVVLLIFGSWQLARERLANEMRKAVAY__T, AEPIVPLLFGMWRLKRKKANNKLLRCVKY__T, AEVIVPLLFGVWRLKREERTYTLLQCVKY__V, AEPIVPLLFGLWQLAREKASNTLLQCVKY__V, EPIVPLL__MWQLAIEKSSNTLLQCVK__KV, PIVPLLFGMWQLAREKASNTLLQCVKYYYV] > kwxd := KWIndex (ma); kwxd := [1, 2.5000, 4.5000, 2.4000, 1, 2.4000, 1, 2.4000, 1, 1, 8, 1, 3, 1, 3, 2.4000, 2.4000, 4.5000, 8, 8, 2.4000, 4.5000, 2.4000, 4.2000, 7, 4.2000, 2.3333, 4.2000, 2.4000, 9, 16, 8] See also: ?PlotIndex ?PrintIndex ?ProbIndex ?ScaleIndex LSBestDelete Function LSBestDelete( AtA:matrix(numeric), btA:list(numeric), btb:numeric ) Least Squares approximation removing the least significant variable. LSBestDelete finds the least significant independent variable to remove. This variable is least significant in the sense that increases the norm of the residuals by the least amount. This is the reverse process of Stepwise regression, where we start with all the independent variables and remove the one with the least norm increase at a time. Problem: Given a matrix of A (dim n x m) and a vector b (dim n), we want to find a vector x (dim m) such that Ax ~ b, where x has one entry which is zero. This approximation is in the least squares sense, i.e. ||Ax-b||^2 is minimum. The calling arguments are: AtA is a matrix (dim m x m) which is the product A^t * A btA is a vector (dim m) which is the product b^t * A btb is the norm squared of b, i.e. b^t * b Output: The output is a list with two values: [i,norm], where i is the index of the variable removed norm is the value of the norm of the residuals without this variable i.e. norm = ||Ax-b||^2 See Also: ?LSBestSum ?LSBestSumDelete LSBestSum Function LSBestSum( AtA:matrix(numeric), btA:list(numeric), btb:numeric ) Least Squares approximation using the best sum of independent variables. LSBestSum finds the best pair of variables which can be replaced by their sum. 
This pair is best in the sense of increasing the norm of the residuals by the least amount.

Problem: Given a matrix A (dim n x m) and a vector b (dim n), we want to find a vector x (dim m) such that Ax ~ b, where x has two values which are identical. This approximation is in the least squares sense, i.e. ||Ax-b||^2 is minimum.

The calling arguments are:
  AtA is a matrix (dim m x m) which is the product A^t * A
  btA is a vector (dim m) which is the product b^t * A
  btb is the norm squared of b, i.e. b^t * b

Output: The output is a list with three values: [i,j,norm], where
  i and j are integers and are the indices of the variables which are replaced by their sum.
  norm is the value of the norm of the residuals with this sum, i.e. norm = ||Ax-b||^2

See Also: ?LSBestSumDelete ?LSBestDelete

LSBestSumDelete

Function LSBestSumDelete( AtA:matrix(numeric), btA:list(numeric), btb:numeric )

Least Squares approximation using the best sum of independent variables or best deleted variable. LSBestSumDelete finds the best pair of variables which can be replaced by their sum, or the best variable that can be removed. This is best in the sense of increasing the norm of the residuals by the least amount. This function does the work of both LSBestSum and LSBestDelete in one pass.

Problem: Given a matrix A (dim n x m) and a vector b (dim n), we want to find a vector x (dim m) such that Ax ~ b, where x has two values which are identical or one value which is zero. This approximation is in the least squares sense, i.e. ||Ax-b||^2 is minimum.

The calling arguments are:
  AtA is a matrix (dim m x m) which is the product A^t * A
  btA is a vector (dim m) which is the product b^t * A
  btb is the norm squared of b, i.e. b^t * b

Output: The output is a list with three values: [i,j,norm], where
  i and j are integers and are the indices of the variables which are replaced by their sum. If i=0 then j is the variable to be removed.
  norm is the value of the resulting norm of the residuals, i.e.
norm = ||Ax-b||^2 See Also: ?LSBestSum ?LSBestDelete Leaf Class Leaf - external node for binary Tree Template: Leaf(Label) Leaf(Label,Height) Fields: Name Type Description ----------------------------------- Label anything optional label Height numeric optional height Returns: Leaf Methods: Leaf_type Synopsis: The Leaf structure holds the information associated with the leaf of a tree (Tree structure). The format is generally unspecified allowing Leaf structures containing anything. However, most phylogenetic tree construction algorithms in Darwin assume that a leaf label is contained in the first position and the height information is contained in the second position. Type testing for Tree will also yield true for a Leaf so that recursive trees with Leaf() nodes are easy to code. If additional information needs to be stored in the Leaf, the Leaf class can be extended with ExtendClass. Alternatively, extra arguments to Leaf will be left undisturbed. Examples: > t:=Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D))); t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D))) > t[Left, Left, Label]; A See also: ?DrawTree ?ExtendClass ?Infix ?Leaves ?Postfix ?Prefix ?Tree LeastSquaresTree Function LeastSquaresTree - compute a distance phylogenetic tree using least squares Option: builtin Calling Sequence: LeastSquaresTree(Dist,Var) LeastSquaresTree(Dist,Var,Labels) LeastSquaresTree(Dist,Var,Labels,IniTree,Keep) Parameters: Name Type Description --------------------------------------------------------------------------------- Dist matrix(numeric) Pairwise distances Var matrix(numeric) Variances Labels list Optional labels for the leaves IniTree Tree Initial tree to optimize its branch lengths IniTree 'Random' To start with a completely random tree IniTree 'NJRandom' To start with a random Neighbour-joining like tree IniTree 'Trials' = posint Run n trials with NJRandom and return the best tree Keep 'KeepTopology' (optional) Optimize branch lengths only Returns: 
Tree

Synopsis: This function computes a binary tree which approximates the given distances Dist by least squares. The distances are assumed to have a variance given by the matrix Var. If a list Labels is given, the leaves of the resulting tree are labelled with these values. The Leaf nodes produced have 3 fields: (1) the label given (or their integer index if no Labels are given), (2) the height of the Leaf and (3) their integer index.

If the global variable MinLen is assigned a positive value, it will determine the minimum branch length. If not set, 1/1000th of the average distance between leaves is used.

The quality of the fit is measured by the sum of the squares of the weighted deviations divided by (n-2)(n-3)/2. This value is stored in the global variable MST_Qual. A dimensionless fitting index is also computed as MST_Qual / variance(Dist) * harmonic_mean(Var). This value is printed and stored in the global variable DimensionlessFit. Trees built over the same set of species, even with radically different methods, can be ranked by the quality of their fit with this index.

If the fourth argument is a Tree, this tree is taken and optimized. If the fourth argument is the word "Random", the optimization is started from a random tree. For large trees it makes sense to try several random trees and choose the one with the best MST_Qual. When starting with random trees, the global variable MST_Prob can be set to any numerical value between 0 and 1. Values close to 1 select trees which are very close to the one given by Neighbour Joining. Values close to 0 select completely random trees. Leaving MST_Prob unassigned is equivalent to using NJRandom. When "NJRandom" is used, a Neighbour-joining like tree is made with a variable level of randomness at each step, which may produce better random trees.

When the word KeepTopology is used, the optimization is done only on the branch lengths. This is useful to optimize the branches of a given tree.
The function Tree_matrix extracts the distance matrix from a tree. It is sort of the inverse of LeastSquaresTree. Examples: > D := [[0, 3, 13, 10], [3, 0, 14, 11], [13, 14, 0, 9], [10, 11, 9, 0]]; D := [[0, 3, 13, 10], [3, 0, 14, 11], [13, 14, 0, 9], [10, 11, 9, 0]] > V := [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]; V := [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]] > LeastSquaresTree(D, V); dimensionless fitting index 0 > t := LeastSquaresTree(D, V, [AA, BB, CC, DD]); dimensionless fitting index 0 > print(Tree_matrix(t)); 0 3 13 10 3 0 14 11 13 14 0 9 10 11 9 0 See Also: ?BootstrapTree ?Leaf ?Synteny ?ComputeDimensionlessFit ?PhylogeneticTree ?Tree ?DrawTree ?RBFS_Tree ?Tree_matrix ?GapTree ?SignedSynteny ?ViewPlot LinearClassification Class LinearClassification - results of a linear classification Template: LinearClassification(X,X0,WeightPos,WeightNeg,NumberPos,NumberNeg, WeightedFalses,HighestNeg,LowestPos) Fields: Name Type Description ---------------------------------------------------------------------- X list(numeric) solution vector X0 numeric threshold value WeightPos numeric weight of the positives WeightNeg numeric weight of the negatives NumberPos posint number of positives NumberNeg posint number of negatives WeightedFalses numeric weighted misclassifications HighestNeg list([posint, numeric]) highest scoring negatives LowestPos list([posint, numeric]) lowest scoring positives Returns: LinearClassification Methods: DirOpt2 DirOpt3 DirOpt4 LinearClassification_type print refine Synopsis: Data structure which holds the result of a linear classification. A linear classification is defined by a vector X, such that the internal product of every data point A[i] with X can be compared against a threshold and decide whether the data point is a positive or a negative. I.e. A[i] X < X0 implies a negative and A[i] X >= X0 a positive. 
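
The decision rule described above can be sketched outside Darwin. The following Python fragment is an illustration only, not Darwin code: the function name classify is hypothetical, and the values of A, X and X0 are taken from the LinearClassify example in this help file (X as printed there, rounded to four decimals).

```python
def classify(A, X, X0):
    """Label each data point: True (positive) when the inner product
    of the row with X is at least X0, False (negative) otherwise."""
    return [sum(a * x for a, x in zip(row, X)) >= X0 for row in A]


# Data points and solution taken from the LinearClassify example.
A = [[0, 3], [8, 5], [10, 7], [5, 5], [7, 4], [7, 9]]
X = [0.1945, -0.1364]   # solution vector (rounded, as printed)
X0 = 0.553293           # threshold value

labels = classify(A, X, X0)
print(labels)  # [False, True, True, False, True, False]
```

The resulting labels agree with the accept list [0,1,1,0,1,0] used in that example, which is consistent with the reported count of 0 misclassifications.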
See also: ?LinearClassify LinearClassify Function LinearClassify - Linear form which does pos/neg classification Calling Sequence: LinearClassify(A,accept,mode,WeightNeg) Parameters: Name Type Description ----------------------------------------------------------------------------- A matrix(numeric) an n x m matrix of independent variables accept {list,procedure} positive/negative determination mode anything optional, algorithm selection, defaults to Svd WeightNeg positive optional, weight of negatives Returns: LinearClassification Global Variables: BestLinearClassifications ComputeSensitivity Synopsis: Computes a vector X such that the values A[i]*X >= X0 classify the positive/negatives. Such a vector is called a linear discriminant in statistics. A special way of computing such a vector is called the Fisher linear discriminant. LinearClassify normally produces results which are much better than the Fisher linear discriminant. LinearClassify returns a data structure called LinearClassification which contains the vector X and the splitting value, called X0, the weights of positive and negatives, the score obtained and the worst misclassifications. The third argument, if present, directs the function to use a particular method of computation. 
The methods are mode description ------------------------------------------------------------------------- BestBasis equivalent to BestBasis(10) BestBasis(k) Least Squares to 0-1 using SvdBestBasis of size k Svd (default mode) equivalent to Svd(1e-5) Svd(bound) Least Squares to 0-1 using Svd, with svmin=bound Svd(First(k)) LS to 0-1 using Svd so that k sing values are used CenterMass Direction between the pos/neg center of masses Variance Variance Discrimination of each variable Fisher Fisher linear discriminant Logistic Steepest descent optimization using the logistic function CrossEntropy Steepest descent optimization using cross entropy Best equivalent to Best(10) Best(n) A combination of methods found most effective by experimentation. n determines the amount of optimization. Svd(1e-5) In practice the best results are obtained with Best(n). Svd(1e-12) and CrossEntropy are also very effective. Once a LinearClassification has been computed, it can be improved or refined with the functions: function description -------------------------------------------------------------------- LinearClassification_refine find the center of the min in each dim LinearClassification_refine2 Svd applied to a hyperswath LinearClassification_refine3 Minimize in a random direction LinearClassification_refine4 Svd on progressively smaller swaths Unless you use the Best option, no matter how good the initial results are, it always pays to do some refinement steps. In particular refine2 and refine4 give very good refinements. 
Examples: > A := [[0,3], [8,5], [10,7], [5,5], [7,4], [7,9]]: > lc := LinearClassify( A, [0,1,1,0,1,0] ): > print(lc); solution vector is X = [0.1945, -0.1364] discriminator is A[i] * X > 0.553293 6 data points, 3 positive, 3 negative positives weigh 1, negatives 1, overall misclassifications 0 Highest negative scores: [] Lowest positive scores: [] See Also: ?LinearClassification ?Stat ?SvdBestBasis ?LinearRegression ?SvdAnalysis LinearIntron Class LinearIntron Template: LinearIntron(n,pam,minlen,F,I) Fields: Name Type ---------------------------- n nucleotide sequence pam numeric minlen integer F numeric I numeric Returns: NULL Methods: LinearIntron_type Global Variables: LI_oldF LI_oldI LI_oldlen LI_oldn LI_oldres Synopsis: Computes and stores the general linear intron scoring model. Use LinearIntron(minlen, F, I) to score F + (len - 1) * I for any subsequence of length len >= minlen fulfilling the GT-AG rule. See also: ?IntronModel LinearProgramming Function LinearProgramming - Solves a linear optimization problem Calling Sequence: LinearProgramming(A,b,c) Parameters: Name Type Description ---------------------------------------------------------------------------------- A matrix(numeric) Matrix of LHS coefficients b list(numeric) Vector of RHS coefficients c {Feasibility,list(numeric)} Vector of coefficients for objective function Returns: [list(numeric), set(posint)] : where the first element is the solution and the second is the set of indices to rows of A which define the corner x SimplexHasNoSolution : when there is no solution SimplexIsSingular : when it cannot find a subset of rows from A which is non-singular UnboundedSolution(x,d) : where x + h*d, is a solution for any h>=0 and c*(x+h*d) grows unboundedly Synopsis: LinearProgramming( A, b, c ) solves the problem of finding a vector x such that Ax >= b and c*x is maximum. 
This is the unconstrained problem: the variables in x can be positive or negative. For the classical problem, where x >= 0, these conditions have to be stated explicitly. If c is 'Feasibility', LinearProgramming only attempts to find a feasible solution, which is returned, and performs no optimization. This saves computation.

Examples:
> A := [[-1, -1.5000], [-2, -1], [1, 0], [0, 1]];
> b := [-750, -1000, 0, 0];
> c := [50, 40];
> LinearProgramming(A,b,c);;

See Also: ?Cholesky ?EvolutionaryOptimization ?Identity ?SvdAnalysis ?convolve ?GaussElim ?matrix ?transpose ?Eigenvalues ?GivensElim ?matrix_inverse

LinearRegression

Function LinearRegression - Compute a linear regression

Calling Sequence: LinearRegression(y,x1,...)

Parameters:
  Name  Type            Description
  ----------------------------------------------------------
  y     array(numeric)  dependent variable
  y     table           dependent data are values in table
  x1    array(numeric)  independent variable(s)

Returns: array(numeric)

Global Variables: SumSq

Synopsis: Computes a linear regression y = a0 + a1*x1 + a2*x2 + ... by least squares. The number of arguments is variable; it should be at least 2. LinearRegression returns the vector [a0,a1,a2,...]. The global variable SumSq is set to the sum of squares of errors in the regression. Alternatively, if only one argument is provided and it is a table, the regression is made as if the table values were the dependent variable and the table arguments were the independent variable(s). Hence the arguments of the table must be either numbers or lists of numbers, consistently.
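
As a cross-check of the least squares fit described above, here is a minimal Python sketch (not Darwin code). It assumes the fit is obtained from the normal equations; the function name linear_regression is hypothetical, and returning the sum of squares alongside the coefficients stands in for Darwin's global variable SumSq.

```python
def linear_regression(y, *xs):
    """Least-squares fit of y = a0 + a1*x1 + a2*x2 + ...

    Returns (coeffs, sum_sq): coeffs = [a0, a1, ...] and sum_sq is the
    sum of squared residuals.
    """
    n = len(y)
    # Design matrix rows: [1, x1[i], x2[i], ...]
    rows = [[1.0] + [float(x[i]) for x in xs] for i in range(n)]
    m = len(rows[0])
    # Normal equations (A^t A) a = A^t y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(m):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    coeffs = [aug[i][m] / aug[i][i] for i in range(m)]
    sum_sq = sum((yi - sum(c * v for c, v in zip(coeffs, r))) ** 2
                 for r, yi in zip(rows, y))
    return coeffs, sum_sq


# Same data as the LinearRegression example in this help file.
coeffs, sum_sq = linear_regression([2.1, 3.01, 3.9, 4.89], [0, 1, 2, 3])
print([round(c, 4) for c in coeffs], round(sum_sq, 5))  # [2.086, 0.926] 0.00232
```

The normal equations are adequate for small, well-conditioned problems like this one; for ill-conditioned data a decomposition-based solver would be a safer choice.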
Examples:
> LinearRegression( [2.1,3.01,3.9,4.89], [0,1,2,3] );
[2.0860, 0.9260]
> SumSq;
0.00232000

See also: ?ExpFit ?ExpFit2 ?Stat ?SvdAnalysis

LnGamma

Function LnGamma - logarithm of the Gamma and Incomplete Gamma functions

Calling Sequence: LnGamma(a)
                  LnGamma(a,x)

Parameters:
  Name  Type         Description
  -----------------------------------------------------------------------------
  a     numeric      a numerical value
  x     nonnegative  a nonnegative argument for the Incomplete Gamma function

Returns: numeric

Synopsis: For a positive integer a, LnGamma returns the logarithm of the product 1*2*3*...*(a-1), i.e. ln( (a-1)! ). LnGamma satisfies the functional equation:

  LnGamma(a+1) = ln(a) + LnGamma(a) = ln(Gamma(a+1))

For non-integer negative values, LnGamma returns the logarithm of the absolute value of Gamma. LnGamma is used to compute factorials or combinatorial numbers when the results are too large to be represented as floating point numbers. LnGamma will compute results for virtually all possible arguments.

When LnGamma is called with two arguments, it returns the logarithm of the Incomplete Gamma function, defined by the integral:

                         infinity
                        /
                       |     (a - 1)
  LnGamma(a, x) = ln(  |    t        exp(-t) dt )
                       |
                      /
                        x

References: Handbook of Mathematical Functions, M. Abramowitz and I.
Stegun, Ch 6.1, 6.5.3

Examples:
> LnGamma(2);
0
> LnGamma(-100.5);
-364.9010
> LnGamma(15000);
129233.1932
> LnGamma(100,100);
358.4141

See also: ?factorial ?Gamma ?Lngamma

Lngamma

Function Lngamma - logarithm of the complement of the Gamma function

Calling Sequence: Lngamma(a,x)

Parameters:
  Name  Type         Description
  -----------------------------------------------------------------------------
  a     positive     a numerical value
  x     nonnegative  a nonnegative argument for the Incomplete Gamma function

Returns: numeric

Synopsis: Lngamma is the logarithm of the complement with respect to Gamma(a) of the Incomplete Gamma function:

  Lngamma(a,x) = ln( Gamma(a) - Gamma(a,x) )

                         x
                        /
                       |     (a - 1)
  Lngamma(a, x) = ln(  |    t        exp(-t) dt )
                       |
                      /
                        0

References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 6.5.2

Examples:
> Lngamma(2,3);
-0.2221
> ln( Gamma(2) - Gamma(2,3) );
-0.2221

See also: ?factorial ?Gamma ?LnGamma

LoadMatrixFile

Function LoadMatrixFile - Loads a substitution rate matrix and character frequencies from a file.

Calling Sequence: LoadMatrixFile(f)

Parameters:
  Name  Type    Description
  ----------------------------
  f     string  path to file

Returns: Q : freq

Synopsis: The function LoadMatrixFile reads a matrix file in PAML compatible format. It computes the substitution rate matrix and returns it, together with the character frequency vector. It is assumed that the order of amino acids and codons is always the same and the matrix is re-ordered to correspond to the order used by Darwin.

Examples:
> LoadMatrixFile('matrices/wag.dat');

See also:

LocalNucPepAlign

Function LocalNucPepAlign

Calling Sequence: LocalNucPepAlign(npm,D)

Parameters:
  Name  Type
  ------------------
  npm   NucPepMatch
  D     DayMatrix

Returns: NucPepMatch

Synopsis: Return the NucPepMatch between the nucleotide and the peptide of npm with the highest score.
Examples:

See Also: ?AlignNucPepAll ?GetPeptides ?VisualizeGene ?FindNucPepPam ?GlobalNucPepAlign ?VisualizeProtein ?Gene ?LocalNucPepAlignBestPam ?GetIntrons ?NucPepMatch

LocalNucPepAlignBestPam

Function LocalNucPepAlignBestPam

Calling Sequence: LocalNucPepAlignBestPam(m)

Parameters:
  Name  Type
  ------------------
  m     NucPepMatch

Returns: NucPepMatch

Synopsis: Apply LocalNucPepAlign and FindNucPepPam until a maximum is found.

Examples:

See Also: ?AlignNucPepAll ?GetPeptides ?VisualizeGene ?FindNucPepPam ?GlobalNucPepAlign ?VisualizeProtein ?Gene ?LocalNucPepAlign ?GetIntrons ?NucPepMatch

LockFile

Function LockFile - creates an exclusive lock file

Option: builtin

Calling Sequence: LockFile(filename,message)

Parameters:
  Name      Type    Description
  ----------------------------------------------------------------
  filename  string  the name of the exclusive lock file
  message   string  (optional) a comment, added to the lock file

Returns: boolean

Synopsis: This command creates a file with the given name which contains some information about the process and, optionally, the given message. The file is created in such a way that only one process will succeed in creating it, even if several are competing for the same filename. This implements an exclusive lock mechanism or a semaphore. The command returns true when it was successful in securing the lock and false otherwise. The file will contain a single line with the hostname, process id number, date and any given message. It is guaranteed that only one process will be successful with a given lock file. This will work on file systems which implement the exclusive locking mechanism provided by fcntl (see "man 2 fcntl" in unix/linux).

Examples:

See Also: ?FileStat ?OpenWriting ?ReadRawLine ?SplitLines ?inputoutput ?ReadData ?ReadURL ?OpenAppending ?ReadLine ?SearchDelim ?OpenReading ?ReadRawFile ?ServerSocket

LongInteger

Function LongInteger( s )

Data structure LongInteger( ...
)

Representation of integers which could exceed the 53 bits of precision available with IEEE double precision floating point numbers. Operations with LongIntegers are contagious, that is to say that any arithmetic operation with at least one LongInteger argument will return a LongInteger result. This implementation is OO, and any program/function working correctly for integers should work correctly when the input contains LongIntegers (apart from the obvious differences due to the additional precision).

- Operations:

Initialization:
  a := LongInteger( )
  a := LongInteger( )
  a := LongInteger( , , ... )

The first case transforms the integer argument to the long precision format. The second format accepts a string which should contain an integer (possibly signed) of arbitrary length. The third case builds a long precision integer when its representation in base LongInteger_base is known.

LongIntegers are represented by a LongInteger structure having the following properties:

  a := LongInteger( i1, i2, i3, .... , i[k] );
  value: i1 + i2*LongInteger_base + i3*LongInteger_base^2 + ...
assertions: -LongInteger_base/2 <= i[j] <= LongInteger_base/2 i[k] <> 0 (except for the representation of 0) Arithmetic operations: a+b, a-b, a*b, iquo(a,b), a^b, mod(a,b), |a| (powering is only supported for positive exponents) Boolean operations: a = b, a <= b, a < b Special functions Rand(LongInteger) Printing: print(a); printf( '%d', a ); Type testing: type(a,LongInteger); - Conversions: To string : string(a) numeric : numeric(a) - Selectors: no selectors See also, ?Inherit ?integer ?LLL MAlign Function MAlign - multiple sequence alignment using various methods Calling Sequence: MAlign(seqs,method,labels,tree,allall) Parameters: Name Type Description ------------------------------------------------------------------------------ seqs list(string) sequences to align method string (optional) method(s) to compute the alignment labels list(string) (optional) labels for the sequences tree Tree (optional) Tree used by the prob method allall matrix({0,Alignment}) (optional) all-against-all Alignments Returns: MAlignment Global Variables: MSA_CircularTour Synopsis: MAlign does a multiple sequence alignment (MSA) using the given method(s). The valid methods are: prob Probabilistic method to build MSA circ Circular tour method to build MSA best Chooses the best of 4 methods (expensive) Global Global alignments between sequences Local Local alignments between sequences CFE Cost Free End alignments between sequences GapHeuristic Use gap heuristics to improve the result If a method is not specified, the probabilistic method will be used. The GapHeuristic can be specified in addition to the other method specification. If a tree is not provided, it will be calculated (for the Probabilistic method which needs a tree). If an all-against-all Alignment array is not provided, one will be calculated. With the method best, 4 different multiple sequence alignments will be computed (circular, and probabilistic with Local, CFE and Global) and the best scoring one will be returned. 
GapHeuristics are used for the 4 methods. This is naturally 4 times more expensive than a single alignment and should be used with care. Examples: > msa := MAlign(['ASDFAA','ASDAV','ASFDAA']):; dimensionless fitting index 73.14 > print(msa); Multiple sequence alignment: ---------------------------- Score of the alignment: 14.782993 Maximum possible score: 23.171372 Sequence 1 _ASDFAA Sequence 2 _ASDAV_ Sequence 3 ASFDAA_ > msa := MAlign(['ASDFAA','ASDAV','ASFDAA'], 'circ'):; > print(msa); Multiple sequence alignment: ---------------------------- Score of the alignment: 1.8851224 Maximum possible score: 15.639881 Sequence 1 ASDFAA Sequence 2 ASD_AV Sequence 3 ASFDAA See Also: ?Align ?Clusters ?DynProgStrings ?Alignment ?DynProgScore ?MAlignment MAlignment Class MAlignment - a protein or DNA multiple sequence alignment Template: MAlignment(InputSeqs,AlignedSeqs,labels,method,PrintOrder,Score, UpperBound,tree,AllAll) Fields: Name Type Description ----------------------------------------------------------------------- InputSeqs list(string) input sequences (before alignment) AlignedSeqs list(string) aligned sequences (in input order) labels list(string) labels for the sequences (in input order) method string method(s) that generated the MSA PrintOrder list(integer) order used for printing and scoring Score numeric score of the MSA (circular tour) UpperBound numeric upper bound score (circular tour) tree Tree tree used by the probabilistic method AllAll matrix all against all Alignment matrix Methods: MAlignment_type PartialOrderMSA print Rand select string Synopsis: An MAlignment stores the information of a multiple sequence alignment. The sequences may contain proteins or DNA. The Score and UpperBound (on the score) are calculated using the circular tour method. In order to force recalculation of the score, use the selector RecalcScore. A MAlignment is normally created by calling MAlign. 
See also: ?Align ?Alignment ?MAlign

MLTopoTest

Function MLTopoTest - Run KH test on a prespecified tree and the ML tree over a MAlignment and return whether the null hypothesis is rejected or not.

Calling Sequence: MLTopoTest(msa,t1)

Parameters:
  Name          Type        Description
  ------------------------------------------------------------------------
  msa           MAlignment  Multiple sequence alignment
  t1            Tree        Input tree
  subst         string      (optional) Substitution model for PhyML (LG)
  nrOfBootraps  posint      (optional) Number of bootstraps (100)
  sigLevel      numeric     (optional) Significance level

Returns: boolean

Synopsis: Run KH test on an a priori tree and the ML tree over a MAlignment and return whether the null hypothesis is rejected or not. MLTopoTest returns true if the null hypothesis is rejected, false otherwise. PhyML is employed to do likelihood maximization and must be installed in order to use this function.

References: Goldman N., Anderson J.P., Rodrigo A.G. Likelihood-Based Tests of Topologies in Phylogenetics, Systematic Biology, 49:652-670, 2000

Examples:
> ReadProgram('datasets/quartet1/trees.drw');;
> msa := ReadFastaIntoMAlignment('datasets/quartet1/MSA_1.fa');;
> lprint('ML KH test', MLTopoTest(msa,tree1));;

MSAMethod

Data structure MSAMethod( )

Function: creates a datastructure for MSA construction

Selectors:
  Method: String "PROB", "CLUSTAL", "MSA", "REPEATED" or any combination with "GAP", e.g.
"PROB GAP"
    Default: "PROB GAP"
  Gap: GapHeuristics() If GAP is used in Method, the GapHeuristics data structure is used

MSAStatistics

Data structure MSAStatistics( )

Data structure that keeps statistical data about MSA constructions and methods

Selectors:
  Type:          Tree              Information on the Tree that was used
  Construction:  TreeConstruction  Information about the TreeConstruction type that was used
  Method:        MSAMethod         Type of MSA Method that was used
  Real:          Integer           Number of best msa constructions
  Total:         Integer           Total number of msas constructed
  Score:         Stat()            Average Score of msa
  Deltascore:    Stat()            Difference of real score minus calculated score
  Name:          string            Name/Title of these statistics

MST

Function MST - Minimum-Spanning Tree algorithm

Calling Sequence: MST(A)

Parameters:
  Name  Type   Description
  --------------------------
  A     Graph  a Graph

Returns: Graph

Synopsis: The input to this algorithm is an undirected graph. It computes the minimum spanning tree according to Prim's algorithm. The implementation has a time complexity of O(|V|^2*log(|V|)), whereas the theoretical minimum is O(|E|). Therefore, this implementation is relatively good when working with dense graphs, in which case |E| is O(|V|^2).
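
For readers unfamiliar with Prim's algorithm, the following Python sketch shows the idea on a small weighted graph. It is an illustration only: it uses a binary heap, which is a common variant and not necessarily how Darwin implements MST, and the function name prim_mst, the (weight, u, v) edge encoding and the sample graph are all assumptions made for this example.

```python
import heapq


def prim_mst(nodes, edges):
    """Prim's algorithm on an undirected, connected graph.

    edges is a list of (weight, u, v) tuples; returns the list of
    (weight, u, v) edges that form a minimum spanning tree.
    """
    # Adjacency lists holding each edge in both directions.
    adj = {n: [] for n in nodes}
    for w, u, v in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))

    start = nodes[0]
    visited = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(nodes):
        w, u, v = heapq.heappop(heap)   # cheapest edge leaving the tree
        if v in visited:
            continue                    # would close a cycle, skip it
        visited.add(v)
        tree.append((w, u, v))
        for e in adj[v]:
            if e[2] not in visited:
                heapq.heappush(heap, e)
    return tree


# Hypothetical 4-node example graph.
nodes = [1, 2, 3, 4]
edges = [(1, 1, 2), (4, 1, 3), (2, 2, 3), (5, 2, 4), (3, 3, 4)]
mst = prim_mst(nodes, edges)
print(mst, sum(w for w, _, _ in mst))  # [(1, 1, 2), (2, 2, 3), (3, 3, 4)] 6
```

With a binary heap this variant runs in O(|E| log |V|) time; an array-based variant that scans all vertices at each step runs in O(|V|^2), which is competitive on dense graphs.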
Examples:
> hex := HexahedronGraph();
hex := Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,6),Edge(0,3,4),Edge(0,3,7),Edge(0,4,8),Edge(0,5,6),Edge(0,5,8),Edge(0,6,7),Edge(0,7,8)),Nodes(1,2,3,4,5,6,7,8))
> MST(hex);
Graph(Edges(Edge(0,1,2),Edge(0,2,3),Edge(0,1,4),Edge(0,1,5),Edge(0,2,6),Edge(0,3,7),Edge(0,4,8)),Nodes(1,2,3,4,5,6,7,8))

See Also: ?BipartiteGraph ?Graph_minus ?ParseDimacsGraph ?Clique ?Graph_Rand ?Path ?DrawGraph ?Graph_XGMML ?RegularGraph ?Edge ?InduceGraph ?ShortestPath ?EdgeComplement ?MaxCut ?TetrahedronGraph ?Edges ?MaxEdgeWeightClique ?VertexCover ?FindConnectedComponents ?MinCut ?Graph ?Nodes

Machine

Class Machine - structure to hold Machine references

Template: Machine(Name,User,Class,Processes,MaxProcesses,LoginControl, OffHours,LoadRange,ForcedRun,NiceValue,StartCycle,DownCount, LastProcess)

Fields:
  Name          Type
  -------------------------------
  Name          string
  User          string
  Class         integer
  Processes     list(Process)
  MaxProcesses  posint
  LoginControl  boolean
  OffHours      integer..integer
  LoadRange     numeric..numeric
  ForcedRun     boolean
  NiceValue     integer
  StartCycle    numeric
  DownCount     integer
  LastProcess   integer

Returns: Machine

Methods: Machine_type select

Synopsis: This data structure holds information about a particular machine (computer). The main application is for parallel processing and hence it contains all sorts of controlling information.

See also: ?darwinipc ?ParExec2 ?Process

MafftMSA

Function MafftMSA - Multiple sequence alignment using Mafft

Calling Sequence: MafftMSA(seqs,labels,dm)

Parameters:
  Name    Type          Description
  --------------------------------------------------------------------
  seqs    list(string)  sequences to align
  labels  list(string)  (optional) labels for the sequences
  dm      DayMatrix     (optional) Dayhoff matrix used for alignment

Returns: MAlignment

Synopsis: MafftMSA computes a multiple sequence alignment (MSA). If no Dayhoff matrix is passed, mafft uses the BLOSUM62 scoring matrix.
Since mafft does not return a score of the MSA, the score and upperbound score in the MAlignment data structure is left undefined. The function works only in unix/linux, and assumes that Mafft is available. Information and source of mafft is available from 'http://align.bmr.kyushu-u.ac.jp/mafft/software/'. Examples: > msa := MafftMSA(['ASDFAARA','ASDAVRA','ASFDAATA']); > print(msa); Multiple sequence alignment: ---------------------------- Score of the alignment: 0 Maximum possible score: 1.7976931e+308 1 ASDFAARA 2 AS_DAVRA 3 ASFDAATA See also: ?Align ?Alignment ?MAlign ?MAlignment MapleFormula Class MapleFormula - mathematical formula given in Maple format Template: MapleFormula(string) Fields: Name Type ------------------------------------- string math formula in maple format Returns: MapleFormula Methods: HTMLC MapleFormula_type Rand string Synopsis: A MapleFormula object is constructed with a single argument, the formula that is to be sent to Maple for "nice" text output formatting. 
Examples: > M := MapleFormula('sum(i,i=1..10)'): > print(M); 10 ----- \ ) i / ----- i = 1 See Also: ?Block ?HTML ?Paragraph ?Table ?Code ?HyperLink ?PostscriptFigure ?TT ?Color ?Indent ?print ?View ?Copyright ?LastUpdatedBy ?Roman ?DocEl ?latex ?RunDarwinSession ?Document ?List ?screenwidth Match Class Match - Structure data type to hold peptide/peptide matches Template: Match(Offset1,Offset2) Match(Sim,Offset1,Offset2) Match(Sim,Offset1,Offset2,Length1,Length2) Match(Sim,Offset1,Offset2,Length1,Length2,pam) Match(Sim,Offset1,Offset2,Length1,Length2,PamNumber,PamVariance) Fields: Name Type Description -------------------------------------------------------------------------- Sim numeric similarity score of the Match Offset1 posint offset of the first sequence in the database Offset2 posint offset of the second sequence in the database Length1 posint length of the match of the first sequence Length2 posint length of the match of the second sequence PamNumber numeric Estimate of the PAM distance between the sequences PamVariance numeric Estimate of the PAM variance between the sequences Returns: Match Methods: AC Alignment Entry ID Match_type print Sequence Synopsis: The Match structure holds all the necessary information for the alignment of two peptide sequences. The offsets are positions into a peptide database, hence Match requires that an appropriate database has been loaded. The offsets are relative to the system variable DB. Typically, Match structures are initialized by giving only the two offsets. The remaining fields are completed by one of several alignment algorithms. 
Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > m:=Match( Sequence(Entry(1)), Sequence(Entry(2)) ); m := Match(376,1836) > m2 := Match( GetOffset('UTTUWPC'), Sequence(Entry(20))); m2 := Match(377757968,19068) See also: ?GetOffset ?MAlign ?NucPepMatch ?ReadDb ?TotalAlign MatchRegex Function MatchRegex - matches a regex in a string Option: builtin Calling Sequence: MatchRegex(pat,txt) Parameters: Name Type Description --------------------------------------------- pat string a regex pattern to be matched txt string a text which is searched Returns: list(string) Synopsis: This function matches a regex pattern string in the POSIX Extended Regular Expression syntax in a query string. The matching is case sensitive. If the pattern cannot be matched, the empty list is returned. Examples: > MatchRegex('^a(b|c*)e', 'accceb'); [accce, ccc] > MatchRegex('([a-c]*)de(a.*)', 'xacccdeabbb'); [acccdeabbb, accc, abbb] > MatchRegex('[A-D]a', 'acccda'); [] See Also: ?BestSearchString ?SearchApproxString ?SearchString ?CaseSearchString ?SearchDelim ?HammingSearchString ?SearchMultipleString Matrices Function Matrices Calling Sequence: Matrices() Returns: NULL Synopsis: This function loads various peptide scoring matrices including the Gonnet/Benner PAM matrices, Blosum{50,60,62,70}, UNITARY, UNITARY2, RDDH250 (see `Amino Acid Substitutions in Structurally Related Proteins', JMB (1988) 204, 1019-1029. by Risler, Delorme, Delacroix and Henaut.), PIMA. 
See also: ?CreateDayMatrices ?CreateDayMatrix ?DayMatrix MaxCut Function MaxCut - Approximate max-cut algorithm Calling Sequence: MaxCut(G) MaxCut(G,weighted) Parameters: Name Type Description ------------------------------------------------------- G Graph a Graph weighted boolean (optional) compute weighted maxcut Returns: list : [set,set,numeric] Synopsis: MaxCut is the problem of computing the maximum cut of an undirected graph G(V,E), i.e., that of partitioning the vertex set V into two parts so that the number (resp. weights) of edges joining vertices in different parts is as large as possible. It is known to be NP-hard. This greedy approximation algorithm solves the unweighted MaxCut problem in O(e+n) (weighted O(e*log(e)+n)) and is a 1/2+1/(2n) approximation. The weighted form of the algorithm expects numeric Label fields in the graph data-structure. The algorithm returns the two disjoint vertex sets and the number (resp. weights) of the edges crossing the two sets. Examples: > G := Rand(Graph): > MaxCut(G); [{2,5,7,9,10}, {1,3,4,6,8}, 15] See Also: ?BipartiteGraph ?Graph_minus ?ParseDimacsGraph ?Clique ?Graph_Rand ?Path ?DrawGraph ?Graph_XGMML ?RegularGraph ?Edge ?InduceGraph ?ShortestPath ?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph ?Edges ?MinCut ?VertexCover ?FindConnectedComponents ?MST ?Graph ?Nodes MaxEdgeWeightClique Function MaxEdgeWeightClique - Maximum edge-weight clique approximate algorithm Option: builtin Calling Sequence: MaxEdgeWeightClique(A) Parameters: Name Type Description ------------------------------------------------- A Graph a Graph with positive edge weights Returns: set Synopsis: The input to this algorithm is an undirected graph. An undirected graph is represented as a Graph data structure which should accept three selectors: Nodes, Edges and Weight. An approximation algorithm is used to find the best Clique. The global variable CliqueIterFactor may be assigned a non-negative number f. 
The larger f, the more accurate the answers will be, and the more time the algorithm will consume. The default behaviour is identical to setting CliqueIterFactor to 1. The current version does part of the searching for the best solution in a random way, so for large problems, different runs may give different results. This allows the algorithm to be run in parallel if necessary. For convenience, the global variable TotalEdgeWeight is assigned the sum of edge-weights of the clique found. See Also: ?BipartiteGraph ?Graph_minus ?ParseDimacsGraph ?Clique ?Graph_Rand ?Path ?DrawGraph ?Graph_XGMML ?RegularGraph ?Edge ?InduceGraph ?ShortestPath ?EdgeComplement ?MaxCut ?TetrahedronGraph ?Edges ?MinCut ?VertexCover ?FindConnectedComponents ?MST ?Graph ?Nodes MaxLikelihoodSize Function MaxLikelihoodSize Calling Sequence: MaxLikelihoodSize(m,k) Parameters: Name Type ------------------------------------------- m posint, the number of balls k posint, the number of occupied boxes Returns: posint Synopsis: MaxLikelihoodSize determines the number of boxes from a "balls in boxes" experiment by maximum likelihood. When m balls are randomly distributed and they occupy k boxes this function returns the most likely total number of boxes. An application of this is the estimation of how many local minima a function may have, when m random searches find k different minima. If m=k, the determination is not possible (result is infinity) and the function returns DBL_MAX. 
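The maximum-likelihood estimate described above can be sketched outside Darwin. The following Python fragment is an illustration only, not Darwin's implementation; the function name max_likelihood_size is hypothetical. It assumes the likelihood of n boxes is proportional to n(n-1)...(n-k+1) / n^m (the Stirling-number factor does not depend on n) and walks n upward while the likelihood keeps growing.

```python
import math

def max_likelihood_size(m, k):
    """Most likely number of boxes when m balls occupy k boxes (sketch)."""
    if not 1 <= k <= m:
        raise ValueError("need 1 <= k <= m")
    if m == k:
        # Every ball landed in its own box: the likelihood grows without bound.
        return math.inf
    n = k
    while True:
        # log of L(n+1)/L(n), where L(n) = n(n-1)...(n-k+1) / n^m
        log_ratio = (math.log(n + 1) - math.log(n + 1 - k)
                     + m * (math.log(n) - math.log(n + 1)))
        if log_ratio <= 0:
            return n  # the likelihood stops increasing here
        n += 1

print(max_likelihood_size(10, 9))
print(max_likelihood_size(120, 100))
```

The sketch returns math.inf for m=k, whereas the help text states that Darwin returns DBL_MAX in that case.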
The probability of m balls randomly distributed among n boxes using k of them is: Pr{n,m,k} = stirling2(m,k) * n(n-1)...(n-k+1) / n^m Examples: > MaxLikelihoodSize(10,9); 42 > MaxLikelihoodSize(10,10); 1.7976931348623147e+308 > MaxLikelihoodSize(120,100); 316 See also: ?ProbBallsBoxes ?ProbCloseMatches MaximizeFunc Function MaximizeFunc Calling Sequence: MaximizeFunc(f,r,tol) MaximizeFunc(f,r) Parameters: Name Type ----------------------- f procedure r numeric..numeric tol numeric >= 0 Returns: [x, f(x)] Synopsis: This function finds the maximum [x, f(x)] of a convex function f over a range r within an absolute accuracy of tol. If tol is not given as a parameter, the result is of machine accuracy. Examples: > MaximizeFunc(x -> sin(x), 0..1); [1.0000, 0.8415] > MaximizeFunc(x -> x^2 - 3*x, -2..0); [-2.0000, 10.0000] See Also: ?BFGSMinimize ?MaxLikelihoodSize ?MinimizeBrent ?MinimizeSD ?DisconMinimize ?Minimize2DFunc ?MinimizeFunc ?NBody MaximizeRD Function MaximizeRD Calling Sequence: MaximizeRD(ini,f,ran,MaxHours) Parameters: Name Type Description ------------------------------------------------------------------- ini anything Initial solution f procedure Function to be optimized ran procedure Procedure that returns a new direction MaxHours positive Optional, limit of computation time in hour. Returns: point:type(ini) Synopsis: This function finds the point (in a potentially high dimensional space) that maximizes f using random directions. The input "ini" can be of any type that accepts linear operations (type(ini) could be numerical, list(numerical),matrix(numerical) or anything which accepts addition of similar objects and multiplication by numerical constants). The function f takes a single argument of type(ini) and returns a numerical value. f(ini) is the initial value. The function f does not need to be continuous. It is common to have f returning -DBL_MAX when the argument is out of the valid range. 
The procedure ran returns an object of type(ini) and provides a random direction in the space of the arguments. ran is called with an argument which is the most recent optimal point. This is useful when the generation of the random direction requires information about the point. Let d := ran( pt ); Then, f( pt + h*d ) are the points that will be explored, that is starting from pt following the direction d. It is clear that pt + h*d has to be computable or in other words (h is numeric) that type(ini) is an object which accepts linear operations. (type(ini) could be numerical, list(numerical), matrix (numerical) or anything which accepts addition of similar objects and multiplication by numerical constants). Examples: > MaximizeFunc(x -> sin(x), 0..1); [1.0000, 0.8415] > MaximizeFunc(x -> x^2 - 3*x, -2..0); [-2.0000, 10.0000] See Also: ?BFGSMinimize ?MaxLikelihoodSize ?MinimizeFunc ?DisconMinimize ?Minimize2DFunc ?MinimizeSD ?MaximizeFunc ?MinimizeBrent ?NBody MinCut Function MinCut - Approximate min-cut algorithm Calling Sequence: MinCut(G) MinCut(G,errbound) Parameters: Name Type Description ----------------------------------------------------------------------- G Graph a Graph errbound nonnegative (optional) error bound for not finding minimum Returns: list : [integer, Nodes, Nodes] Synopsis: MinCut is the problem of computing the minimal cut of an undirected graph G(V,E), i.e., that of partitioning the vertex set V into two parts so that the number of edges joining vertices in different parts is minimal. This randomized algorithm computes a MinCut in O(n^2*log^3(n)). The optional argument 'errbound' is used to set the number of trial runs and has been empirically found to be very conservative. The algorithm returns the number of edges which cross the cut and the two disjoint vertex sets. 
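The idea of a randomized min-cut with repeated trial runs can be sketched in Python. The help text does not say which randomized algorithm Darwin uses, so this sketch assumes plain random edge contraction (Karger's method) repeated over a fixed number of trials; the names contract_once and min_cut, the trials parameter and the seed are all hypothetical. The graph is assumed to be connected.

```python
import random

def contract_once(edges, nodes, rng):
    """One random-contraction run; returns (cut_size, side_a, side_b)."""
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    remaining = len(nodes)
    order = list(edges)
    rng.shuffle(order)          # contracting edges in random order
    for u, v in order:
        if remaining == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:            # merge the two super-nodes
            parent[ru] = rv
            remaining -= 1
    crossing = sum(1 for u, v in edges if find(u) != find(v))
    root = find(nodes[0])
    side_a = {v for v in nodes if find(v) == root}
    return crossing, side_a, set(nodes) - side_a

def min_cut(edges, nodes, trials=50, seed=0):
    """Best cut found over several independent contraction runs."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        result = contract_once(edges, nodes, rng)
        if best is None or result[0] < best[0]:
            best = result
    return best

# Triangle 1-2-3 with a pendant node 4 attached to node 1.
print(min_cut([(1, 2), (2, 3), (1, 3), (1, 4)], [1, 2, 3, 4]))
```

More trials lower the chance of missing the true minimum, which is the role the help text assigns to errbound.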
Examples: > G := Graph({{1,2},{2,3},{1,3},{1,4}},{1,2,3,4}); G := Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,4),Edge(0,2,3)),Nodes(1,2,3,4)) > MinCut(G); [1, Nodes(1,2,3), Nodes(4)] See Also: ?BipartiteGraph ?Graph_minus ?ParseDimacsGraph ?Clique ?Graph_Rand ?Path ?DrawGraph ?Graph_XGMML ?RegularGraph ?Edge ?InduceGraph ?ShortestPath ?EdgeComplement ?MaxCut ?TetrahedronGraph ?Edges ?MaxEdgeWeightClique ?VertexCover ?FindConnectedComponents ?MST ?Graph ?Nodes MinSquareTree Function LeastSquaresTree - compute a distance phylogenetic tree using least squares Option: builtin Calling Sequence: LeastSquaresTree(Dist,Var) LeastSquaresTree(Dist,Var,Labels) LeastSquaresTree(Dist,Var,Labels,IniTree,Keep) Parameters: Name Type Description --------------------------------------------------------------------------------- Dist matrix(numeric) Pairwise distances Var matrix(numeric) Variances Labels list Optional labels for the leaves IniTree Tree Initial tree to optimize its branch lengths IniTree 'Random' To start with a completely random tree IniTree 'NJRandom' To start with a random Neighbour-joining like tree IniTree 'Trials' = posint Run n trials with NJRandom and return the best tree Keep 'KeepTopology' (optional) Optimize branch lengths only Returns: Tree Synopsis: This function computes a binary tree which approximates the given distances Dist by least squares. The distances are assumed to have a variance given by the matrix Var. If a list Labels is given, the leaves of the resulting tree are labelled with these values. The Leaf nodes produced have 3 fields: (1) the label given (or their integer index if no Labels are given), (2) the height of the Leaf and (3) their integer index. If the global variable MinLen is assigned a positive value, it will determine the minimum branch length. If not set, 1/1000th of the average distance between leaves is used. The quality of the fit is measured by the sum of the squares of the weighted deviations divided by (n-2)(n-3)/2. 
This value is stored in the global variable MST_Qual. A dimensionless fitting index is also computed as MST_Qual / variance(Dist) * harmonic_mean(Var). This value is printed and stored in the global variable DimensionlessFit. Trees built over the same set of species, even with radically different methods, can be ranked by the quality of their fit with this index. If the fourth parameter is a Tree, then this tree is taken and optimized. If the fourth argument is the word "Random", then the optimization is started over a random tree. For large trees it makes sense to try several random trees and choose the one with the best MST_Qual. When starting with random trees, the global variable MST_Prob can be set to any numerical value between 0 and 1. Values close to 1 select trees which are very close to the one given by Neighbour Joining. Values close to 0 select completely random trees. Leaving MST_Prob unassigned is equivalent to using NJRandom. When "NJRandom" is used, a Neighbour-joining like tree is made with a variable level of randomness at each step which may produce better random trees. When the word KeepTopology is used, the optimization is done only on the branch lengths. This is useful to optimize the branches of a given tree. The function Tree_matrix extracts the distance matrix from a tree. It is sort of the inverse of LeastSquaresTree. 
Examples: > D := [[0, 3, 13, 10], [3, 0, 14, 11], [13, 14, 0, 9], [10, 11, 9, 0]]; D := [[0, 3, 13, 10], [3, 0, 14, 11], [13, 14, 0, 9], [10, 11, 9, 0]] > V := [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]; V := [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]] > LeastSquaresTree(D, V); dimensionless fitting index 0 > t := LeastSquaresTree(D, V, [AA, BB, CC, DD]); dimensionless fitting index 0 > print(Tree_matrix(t)); 0 3 13 10 3 0 14 11 13 14 0 9 10 11 9 0 See Also: ?BootstrapTree ?Leaf ?Synteny ?ComputeDimensionlessFit ?PhylogeneticTree ?Tree ?DrawTree ?RBFS_Tree ?Tree_matrix ?GapTree ?SignedSynteny ?ViewPlot Minimize2DFunc Function Minimize2DFunc Calling Sequence: Minimize2DFunc(f,x,y,prevpoints) Parameters: Name Type ------------------------------------------------------------------- f a function with two arguments to be minimized x an optional initial value for the first argument to f y an optional initial value for the second argument to f prevpoints an optional list of triplets, [x,y,f(x,y)] Returns: [numeric, numeric, numeric] : [x,y,f(x,y)], a triplet, where x,y is a local minimum of f Synopsis: Minimize2DFunc minimizes the function f in two variables. If x,y are given, the minimization starts at the point x,y. If additional points are known, they can be included in the list prevpoints and they will not be recomputed. To avoid a point (which, for example, has a singularity) the point should be included in the prevpoints list with a very high (faked) value. If no points are given, Minimize2DFunc starts at random values U(-1, 1) of x and y. Minimize2DFunc assumes that f(x,y) is very expensive to compute and tries to do a minimum number of evaluations. 
Examples: > Minimize2DFunc((x,y) -> sin(x*y)+cos(y)); [0.5000, -3.1416, -2] > Minimize2DFunc((x,y) -> x^4+3*y^2-x*y,1,2); [0.2041, 0.03402069, -0.00173611] See Also: ?BFGSMinimize ?MaximizeFunc ?MinimizeBrent ?MinimizeSD ?DisconMinimize ?MaxLikelihoodSize ?MinimizeFunc ?NBody MinimizeBrent Function MinimizeBrent - Univariate minimization using Brent's algorithm Calling Sequence: MinimizeBrent(f,iniguess,incr,relateps) Parameters: Name Type -------------------------------------------------------- f a function of one argument to be minimized iniguess an initial value for the argument of f incr an initial increment to probe around iniguess relateps a relative error goal in the argument of f Returns: [numeric, numeric] : [x,f(x)], a pair, where x is a local minimum of f Synopsis: MinimizeBrent minimizes the function f(x) in one variable. The minimization starts at the point "iniguess" and probes around with initial increment "incr". By giving a small increment, one can have some assurance that a local minimum close to the initial guess will be found. Additional arguments to MinimizeBrent are passed to the function f(x), so that f(x) can be written without using global variables. If the function cannot achieve the accuracy requested in 200 iterations it will stop. The algorithm uses a technique based on 3 points spaced by the golden ratio which was introduced by Richard Brent. 
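The probing step and the golden-ratio spacing can be illustrated with a short Python sketch. It is not Brent's full algorithm and not Darwin's code: it assumes a plain golden-section search after a simple expanding bracket, and the name golden_minimize is hypothetical. The arguments mirror iniguess, incr and relateps from the calling sequence, and the loop stops after 200 iterations as described above.

```python
import math

GOLD = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618

def golden_minimize(f, iniguess, incr, relateps, max_iter=200):
    """Bracket a local minimum near iniguess, then shrink the bracket."""
    a, b = iniguess, iniguess + incr
    fa, fb = f(a), f(b)
    if fb > fa:                       # make sure we step downhill from a to b
        a, b, fa, fb = b, a, fb, fa
    c = b + GOLD * (b - a)
    fc = f(c)
    while fc < fb:                    # expand until the function rises again
        a, b, fa, fb = b, c, fb, fc
        c = b + GOLD * (b - a)
        fc = f(c)
    lo, hi = min(a, c), max(a, c)     # the minimum now lies in [lo, hi]
    inv = 1 / GOLD
    x1 = hi - inv * (hi - lo)
    x2 = lo + inv * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(max_iter):
        if hi - lo <= relateps * max(abs(x1), abs(x2), 1e-300):
            break
        if f1 < f2:                   # keep the left part of the bracket
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv * (hi - lo)
            f1 = f(x1)
        else:                         # keep the right part of the bracket
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv * (hi - lo)
            f2 = f(x2)
    x = (lo + hi) / 2
    return [x, f(x)]

x, fx = golden_minimize(math.cos, 3, 0.01, 1e-7)
print(round(x, 4), round(fx, 4))
```

A small incr keeps the initial bracket tight around iniguess, which is why a small increment favours the nearest local minimum.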
Examples: > MinimizeBrent( cos, 3, 0.01, 1e-7 ); [3.1416, -1.0000] See Also: ?BFGSMinimize ?MaximizeFunc ?Minimize2DFunc ?MinimizeSD ?DisconMinimize ?MaxLikelihoodSize ?MinimizeFunc ?NBody MinimizeFunc Function MinimizeFunc - Minimize a multivariate function using hill descending Calling Sequence: MinimizeFunc(f,iniguess,epsini,epsfinal) Parameters: Name Type ------------------------- f procedure iniguess array(numeric) epsini numeric epsfinal numeric Returns: [x, f(x)] Global Variables: Minimize_args Synopsis: Starting at iniguess with error tolerance epsini, this function minimizes f until the accuracy in each dimension is less than or equal to epsfinal. The function f takes an array of arguments. It returns the argument and the value of the local minimum found. The dimension of the array is given by the dimension of iniguess. Examples: > MinimizeFunc(x -> 3*tan(x[1])+abs(x[2]), [0.33, 0.44], 0.2, 0.1); [[-1.5708, -0.01126905], -362492.3547] See Also: ?BFGSMinimize ?MaximizeFunc ?Minimize2DFunc ?MinimizeSD ?DisconMinimize ?MaxLikelihoodSize ?MinimizeBrent ?NBody MinimizeSD Function MinimizeSD - Minimize a multivariate function using steepest descent Calling Sequence: MinimizeSD(f,iniguess,relateps) Parameters: Name Type Description -------------------------------------------------------------------------- f procedure the function to minimize, returns [f(x),f'(x)] iniguess array(numeric) initial guess relateps numeric stop when |f'(x)| < |x| relateps Returns: [x, f(x),f'(x)] Synopsis: Starting at iniguess this function searches a local minimum in the direction of the steepest descent. The direction of the steepest descent is used as long as the actual function decrease agrees (up to 90%) with the predicted (from the gradient) decrease. This guarantees that the minimum found is the one in the direction of the initial steepest descent. 
The convergence of this function is fast when it is searching far from the minimum and then it becomes slow when it is close to the minimum. MinimizeSD returns the list [x,f(x),f'(x)] at a local minimum (when the convergence criteria is met) or when the number of iterations exceeds 200. The function f(x) should compute the functional and its gradient and these should be returned as a list of two values: [fx:numeric,f1x:list(numeric)] Additional arguments passed on to MinimizeSD (fourth, fifth, etc.) are passed as additional arguments to f(). In this way f() usually does not need to rely on global information. Examples: > f := x -> [sin(x[1])+x[2]^2,[cos(x[1]),2*x[2]]]; f := x -> [sin(x[1])+x[2]^2, [cos(x[1]), 2*x[2]]] > MinimizeSD(f, [0.33, 0.44], 0.001); [[-1.5695, 1.2084e-11], -1.0000, [0.00133735, 2.4168e-11]] See Also: ?BFGSMinimize ?MaximizeFunc ?Minimize2DFunc ?MinimizeFunc ?DisconMinimize ?MaxLikelihoodSize ?MinimizeBrent ?NBody Multinomial_Rand Function Multinomial_Rand - Generate random multinomially distributed integers Calling Sequence: Rand(Multinomial(n,ps)) Multinomial_Rand(n,ps) Parameters: Name Type Description -------------------------------------------- n integer number of experiments ps list(numeric) probabilities Returns: list(integer) Synopsis: Given k probabilities ps=[p_1,..., p_k], this function returns a list of k random integers multinomially distributed with averages n*p_i, variances n*p_i*(1-p_i) and covariances -n*p_i*p_j. The sum of all integers is n. Multinomial_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. 
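The stated properties (the counts sum to n and each count averages n*p_i) can be checked with a small Python sketch. This is not the algorithm cited in the references and not Darwin's code; it simply assumes n independent categorical draws, and the name multinomial_rand is hypothetical.

```python
import random

def multinomial_rand(n, ps, rng):
    """Return k counts for n independent draws with probabilities ps."""
    counts = [0] * len(ps)
    # random.choices picks n category indices with the given weights.
    for i in rng.choices(range(len(ps)), weights=ps, k=n):
        counts[i] += 1
    return counts

rng = random.Random(12345)           # fixed seed so runs are repeatable
sample = multinomial_rand(100, [0.3, 0.2, 0.5], rng)
print(sample, sum(sample))           # the counts always add up to n
```

Seeding the generator plays the same role here as SetRand or SetRandSeed in Darwin.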
References: MB Brown and J Bromberg (1984), The American Statistician Examples: > Rand(Multinomial(100,[0.3,0.2,0.5])); [34, 24, 42] > Rand(Multinomial(1000,[0.01,0.9,0.09])); [9, 898, 93] See Also: ?Beta_Rand ?Exponential_Rand ?Normal_Rand ?StatTest ?Binomial_Rand ?FDist_Rand ?Poisson_Rand ?Std_Score ?ChiSquare_Rand ?GammaDist_Rand ?SetRand ?Student_Rand ?CreateRandSeq ?Geometric_Rand ?SetRandSeed ?Zscore ?Cumulative ?Graph_Rand ?Shuffle MultipleSubTree Function MultipleSubTree( MinSquareTree:Tree, MaxPW:array ) generates an array of arrays of SubTrees for all Pam windows in MaxPW Mutate Function Mutate - randomly mutate an amino acid sequence Calling Sequence: Mutate(seq,PAM,DelType) Parameters: Name Type Description -------------------------------------------------------------- seq string original amino acid sequence PAM numeric desired PAM distance to mutate DelType {ExpGaps,ZipfGaps} optional, model for making gaps Returns: string Synopsis: This function simulates evolution by performing random mutations in an amino acid sequence. These random mutations respond to the PAM distance given. The mutations will respond to a mutation matrix at that distance. If the third argument is given, the gaps will be inserted, either with an exponential or zipfian distribution. If no third parameter is given, no gaps will be inserted. Mutate will use the mutation matrices available in logPAM1, which are normally set by CreateDayMatrices(). If these matrices are created for DNA (only A,C,G and T), then the function Mutate will mutate a DNA sequence. Examples: > Mutate(CreateString(40,A),100); AKAASVAAFGGTNRAGSAAHASEAARGFNTAAPPTAPADE See Also: ?CreateDayMatrices ?CreateRandPermutation ?CreateRandSeq ?Shuffle MySql Function MySql - Wrapper for MySQL client Calling Sequence: MySql(query) Parameters: Name Type Description ---------------------------------------------------------------------------------------------------------- query string The MySQL query to be executed. 
setParseColumns {list(posint),set(posint)} (optional) columns to parse host host=string (optional) URL of the MySQL server user user=string (optional) the MySQL username to use when connecting password password=string (optional) the password to use when connecting port port=integer (optional) the TCP/IP port number to use for the connection database database=string (optional) the name of the database to use Returns: MySqlResult Synopsis: The MySql function can be used to access any MySQL database. The passed query in sql format is executed on the (remote) server and the result is returned to the user. Optional arguments and their default values: setParseColumns A list/set of integers to indicate which columns should be parsed. Unparsed columns appear as strings in the result. By default, no columns are parsed. host=string The URL where the MySQL server is running. The default is 'linneus54.inf.ethz.ch'. user=string The username to be used when connecting to the server. The default username is 'darwin'. password=string The password to be used when connecting to the server. By default, no password is used. port=integer The TCP/IP port number of the server. If no port number is provided, the default MySQL port is used. database=string The name of the database to use. The default database is 'vpeikert' if the host is linneus54, otherwise no database is selected. 
Examples: > MySql('Select genome_5letter, entry_nr, entry_seq from genome, entry where entry_id IN (44,45) and entry_genome_id=genome_id'):; MySqlResult([genome_5letter, entry_nr, entry_seq],[[BACSU, 44, MAKTLSDIKRSLDGNLGKRLTLKANGGRRKTIERSGILAETYPSVFVIQLDQDENSFERVSYSYADILTETVELTFNDDAASSVAF], [BACSU, 45, MGRRRGVMSDEFKYELAKDLGFYDTVKNGGWGEIRARDAGNMVKRAIEIAEQQMAQNQNNR]]) > MySql('Select * from oma where oma_id=9233', database='oma_sep08'); MySqlResult([oma_id, oma_entry_id],[[9233, 1039323], [9233, 1107833], [9233, 2057091], [9233, 2201433]]) See also: ?MySqlResult ?OpenPipe ?OpenReading ?ReadRawFile MySqlResult Class MySqlResult - the result of a MySql function call Template: MySqlResult(ColumnLabels,Data) Fields: Name Type Description ------------------------------------------------------------- ColumnLabels list(string) the labels of each column Data matrix the two dimensional data matrix Methods: MySqlResult_type print Rand select string Synopsis: A MySqlResult structure stores the result of a MySql query. The data of a column can be retrieved using the column label as the selector on the MySqlResult structure. 
Examples: > x := MySql('Select * from oma where oma_id=9233', database='oma_sep08'):; MySqlResult([oma_id, oma_entry_id],[[9233, 1039323], [9233, 1107833], [9233, 2057091], [9233, 2201433]]) > x['Data']; [[9233, 1039323], [9233, 1107833], [9233, 2057091], [9233, 2201433]] > x['oma_entry_id']; [1039323, 1107833, 2057091, 2201433] See also: ?MySql NBody Function NBody Option: builtin Calling Sequence: NBody(dist,var,k1) NBody(dist,var,k1,k2,rho1,rho2,inipos) Parameters: Name Type --------------------------------------------------------------------- dist distance matrix(numeric > 0) (1..n x 1..n) var distance variance matrix(numeric >= 0) (1..n x 1..n) k1 posint, initial dimension, k2 <= k1 k2 posint, final dimension rho1 numeric >= 0, point separation force rho2 numeric >= 0, point sequencing force inipos matrix(numeric), initial guesses (1..n x 1..k), k=k1 or k=k2 Returns: matrix(numeric) : coordinates of points (1..n x 1..k2) Synopsis: NBody solves the n-body steady state problem for k1 dimensions, then squeezes the coordinates to k2 dimensions. The problem is defined as minimizing the sum ( |x[i]-x[j]| - dist[i,j] ) ^ 2 / var[i,j] In other words, it does a least squares approximation of the distances. Var[i,j]=0 is an indication that the distance between i,j should not be used (fitted). When the errors in the fitting are supposed to be relative to the values of the distances, it is typical to use var = dist. k2, rho1, rho2 and inipos are all optional. If used, they must appear in the given order. If k2 is not present it is assumed to be equal to k1. Rho1 is used to guarantee some separation of the points. The function to minimize is added the value rho1 / (x[i]-x[j]) ^ 4. If not present, rho1 defaults to 0. Rho2 is used to impose sequencing over the points. The function to minimize is added the value rho2 * |x[i]-x[i+1]|. This will guarantee, that if there are many choices, the selected one will have the original sequence preserved, like a chain. 
If not present, rho2 defaults to 0.01. Inipos is an initial guess of the positions of the bodies. It is a matrix of dimension 1..n x 1..k1 (or 1..n x 1..k2). If not present, the algorithm starts with random locations (it uses the function Rand()). The global variable NBodyPotential is set to the minimum value of the cost potential. Examples: > dist := [ [0,1,1], [1,0,1], [1,1,0] ]; dist := [[0, 1, 1], [1, 0, 1], [1, 1, 0]] > NBody(dist,dist,2,2,0,0); [[0, 0], [1.0000, 0], [0.5000, 0.8660]] > NBodyPotential; 1.3194e-18 See Also: ?BFGSMinimize ?MaximizeFunc ?Minimize2DFunc ?MinimizeFunc ?DisconMinimize ?MaxLikelihoodSize ?MinimizeBrent ?MinimizeSD NSubGene Function NSubGene Calling Sequence: NSubGene(g,baseRange) Parameters: Name Type ------------------------------- g Gene data structure baseRange posint..posint Returns: Gene Synopsis: Returns the modified Gene containing only bases in baseRange. Examples: See also: ?Gene ?PSubGene Normal_Rand Function Normal_Rand - Generate random normally distributed numbers Options: builtin and numeric Calling Sequence: Rand(Normal) Rand(Normal(m,s2)) Normal_Rand() Parameters: Name Type Description --------------------------------------------------- m numeric expected value of the variable s2 nonnegative variance of the variable Returns: numeric Synopsis: This function returns a random number normally distributed with average 0 and variance 1. Normal_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. A normal variable with average m and variance s2 is obtained with the expression sqrt(s2)*Rand(Normal) + m or with Rand(Normal(m,s2)). 
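The scaling rule sqrt(s2)*Rand(Normal) + m can be checked numerically. The Python sketch below is an illustration only: it assumes random.gauss(0, 1) as a stand-in for Rand(Normal), and the name normal_rand is hypothetical.

```python
import math
import random

def normal_rand(m, s2, rng):
    """Scale and shift a standard normal draw to mean m and variance s2."""
    return math.sqrt(s2) * rng.gauss(0.0, 1.0) + m

rng = random.Random(2013)            # fixed seed so runs are repeatable
xs = [normal_rand(10.0, 4.0, rng) for _ in range(20000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(round(mean, 2), round(var, 2))  # should be close to 10 and 4
```

Note that the second parameter is the variance, not the standard deviation, hence the square root.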
References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.1.26 and 26.2 Examples: > Normal_Rand(); 1.5093 > [Rand(Normal),Rand(Normal)]; [-0.9358, 0.5327] > Rand(Normal(10,0.001)); 9.9755 See Also: ?Beta_Rand ?Exponential_Rand ?Multinomial_Rand ?StatTest ?Binomial_Rand ?FDist_Rand ?Poisson_Rand ?Std_Score ?ChiSquare_Rand ?GammaDist_Rand ?SetRand ?Student_Rand ?CreateRandSeq ?Geometric_Rand ?SetRandSeed ?Zscore ?Cumulative ?Graph_Rand ?Shuffle Normalize Function Normalize Calling Sequence: Normalize(m) Parameters: Name Type ------------------ m NucPepMatch Returns: NucPepMatch Global Variables: DB Synopsis: Normalizes a match referencing (the complement of) an NucDB database entry to refer to a sequence being present in memory. Examples: See also: ?Denormalize NucPepBackDynProg Function NucPepBackDynProg - Backwards dynamic programming alignment for peptide and nucleotide sequences Option: builtin Calling Sequence: NucPepBackDynProg(nuc,pep,DM,len1,len2,IntronScoring) Parameters: Name Type Description --------------------------------------------------------------- nuc string a nucleotide sequence pep string a peptide sequence DM DayMatrix Dayhoff Matrix len1 integer optional length of the 1st sequence len2 integer optional length of the 2nd sequence IntronScoring list optional Intron Scoring list Returns: NULL Synopsis: Compute the similarity and lengths of the best alignment between nuc and pep using the Dayhoff matrix DM, the optional lengths len1 and len2 and the optional IntronScoring doing backwards dynamic programming. If the lengths are not given or -1, return the maximum similarity. 
Examples: See Also: ?AlignNucPepAll ?FindNucPepPam ?LocalNucPepAlignBestPam ?AlignNucPepMatch ?GlobalNucPepAlign ?NucPepDynProg ?DynProgNucPepString ?LocalNucPepAlign ?NucPepMatch NucPepDynProg Function NucPepDynProg - Compute a Nucleotide Peptide Alignment Option: builtin Calling Sequence: NucPepDynProg(nuc,pep,DM,len1,len2,IntronScoring) Parameters: Name Type Description ----------------------------------------------------------- nuc string Nucleotide Sequence pep string Peptide Sequence DM DayMatrix Dayhoff Matrix len1 integer optional length of 1st sequence len2 integer optional length of 2nd sequence IntronScoring list Intron scoring Returns: NULL Synopsis: Compute the similarity and lengths of the best alignment between nuc and pep using the Dayhoff matrix DM, the optional lengths len1 and len2 and the optional IntronScoring. If the lengths are not given or -1, return the maximum similarity. Examples: See Also: ?AlignNucPepAll ?FindNucPepPam ?LocalNucPepAlignBestPam ?AlignNucPepMatch ?GlobalNucPepAlign ?NucPepBackDynProg ?DynProgNucPepString ?LocalNucPepAlign ?NucPepMatch NucPepMatch Class NucPepMatch Template: NucPepMatch(NucEntries,PepEntries) NucPepMatch(NucOffset,PepOffset) NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen) NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen,PamNumber) NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen,PamNumber, PamVariance) NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen,PamNumber, PamVariance,IntronScoring) NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen,PamNumber, PamVariance,IntronScoring,NucGaps,PepGaps,Introns) Fields: Name Type Description ----------------------------------------------------------------------------------------- Sim numeric Similarity score NucOffset integer Offset of the nucleotide sequence in NucDB PepOffset integer Offset of the peptide sequence in PepDB NucLength integer Length of the nucleotide sequence PepLength integer Length of the peptide sequence PamNumber numeric Estimated PAM distance 
for the match PamVariance numeric Estimated PAM variance for the match IntronScoring {0,string,structure} Function for scoring introns NucGaps list Gaps in the nucleotide sequence from the alignment PepGaps list Gaps in the peptide sequence from the alignment Introns list List of suspected introns Returns: NULL Methods: Entry Gene ID NucPepMatch_type print select Global Variables: DB Synopsis: The NucPepMatch structure holds all the necessary information for the alignment of a peptide and a nucleotide sequence. The offsets are positions into a peptide and nucleotide database, hence NucPepMatch requires that appropriate databases have been loaded. Examples: See Also: ?AlignNucPepAll ?GetPosition ?NucPepDynProg ?AlignNucPepMatch ?GlobalNucPepAlign ?NucPepRegions ?Denormalize ?Intron ?ParallelAllNucPepMatches ?DynProgNucPepString ?LocalNucPepAlign ?PepDB ?FindNucPepPam ?LocalNucPepAlignBestPam ?ScoreIntron ?Gene ?Match ?VisualizeGene ?GetAllNucPepMatches ?Normalize ?VisualizeProtein ?GetIntrons ?NucDB ?GetPeptides ?NucPepBackDynProg NucPepRegions Function NucPepRegions Option: builtin Calling Sequence: NucPepRegions(npm) Parameters: Name Type Description ------------------------------------------------ npm NucPepMatch a nucleotide peptide Match Returns: list Synopsis: Converts a NucPepMatch into a list of alignment regions. Region formats are: - [ALIGN, Sim, nucLen, pepLen] - [NUCGAP, Sim, nucLen, 0] - [PEPGAP, Sim, 0, pepLen] - [INTRON, Sim, nucLen, 0]. After r := NucPepRegions (m), the following equations hold: sum (zip ((x->x[2])(r))) = m[Sim] sum (zip ((x->x[3])(r))) = m[NucLength] sum (zip ((x->x[4])(r))) = m[PepLength]. If either PepDB or NucDB is not loaded, Sim will be 0 in ALIGN regions. If no suitable Dayhoff matrix can be found, Sim will be 0 in ALIGN, NUCGAP and PEPGAP regions. 
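The three sum identities above can be checked mechanically. The following is a minimal Python sketch, not Darwin code: regions are modelled as plain lists [kind, Sim, nucLen, pepLen], and both the helper name region_totals and the region values are invented for illustration.

```python
def region_totals(regions):
    """Return (total Sim, total nucLen, total pepLen) over a list of regions.

    Each region is assumed to be [kind, Sim, nucLen, pepLen], mirroring the
    region formats listed in the synopsis.
    """
    total_sim = sum(r[1] for r in regions)
    total_nuc = sum(r[2] for r in regions)
    total_pep = sum(r[3] for r in regions)
    return total_sim, total_nuc, total_pep


# Hypothetical regions, invented for this example only.
regions = [
    ["ALIGN", 120.5, 30, 10],
    ["INTRON", -40.0, 85, 0],
    ["ALIGN", 95.0, 24, 8],
    ["PEPGAP", -22.5, 0, 2],
]
totals = region_totals(regions)
print(totals)
```

For a real match m, the three totals would be expected to equal m[Sim], m[NucLength] and m[PepLength] respectively.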
Examples: See also: ?NucPepMatch OpenAppending Function OpenAppending Option: builtin Calling Sequence: OpenAppending(fname) OpenAppending(terminal) OpenAppending(previous) Parameters: Name Type -------------------------- fname filename terminal system variable previous system variable Returns: NULL Synopsis: If the parameter is the system name "terminal", all subsequent output generated by Darwin is sent to the standard output. This is typically the terminal. Otherwise, all subsequent output generated will be appended to the file "fname". Fname can be a name or an entire path. If no file named "fname" exists, Darwin creates such a file. The options "terminal" and "previous" behave the same way as they do for OpenWriting. Examples: > OpenAppending('~hallett/bankaccount'); > print('Debit 100000 SFr.'); > OpenWriting(terminal); > See Also: ?FileStat ?OpenReading ?ReadLine ?ReadRawLine ?inputoutput ?OpenWriting ?ReadOffsetLine ?LockFile ?ReadData ?ReadRawFile OpenPipe Function OpenPipe - execute system command and pipe output to Darwin Option: builtin Calling Sequence: OpenPipe(cmd) Parameters: Name Type Description -------------------------------------------------------- cmd string a command for the underlying UNIX system Returns: NULL Synopsis: OpenPipe will execute the command described by the string cmd and directs its output to be the input for Darwin. This is called opening a pipe in the Unix terminology. This output is readable with ReadRawLine() commands (simply as text) or with ReadLine() commands (when the output is/are valid Darwin commands). When the output is exhausted, the string EOF will be returned by the read commands and the pipe will be closed. 
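The read-until-EOF behaviour described above can be illustrated outside Darwin. This is a minimal Python sketch using the standard subprocess module, not Darwin's implementation; the helper name read_pipe_lines is hypothetical, and yielding the string 'EOF' after the last line only mimics what the synopsis says the read commands return.

```python
import subprocess
import sys


def read_pipe_lines(cmd):
    """Run cmd (an argument list) and yield its output lines, then 'EOF'."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # The pipe is closed once the output is exhausted.
    yield "EOF"


# Use the running Python interpreter as a portable stand-in for a UNIX command.
lines = list(read_pipe_lines([sys.executable, "-c", "print('hello')"]))
print(lines)
```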
Examples: > OpenPipe(date); > ReadRawLine(); Thu Oct 12 08:01:39 MET DST 2000 > ReadRawLine(); EOF See Also: ?CallSystem ?OpenAppending ?ReadOffsetLine ?SystemCommand ?FileStat ?OpenReading ?ReadRawFile ?TimedCallSystem ?inputoutput ?OpenWriting ?ReadRawLine ?LockFile ?ReadLine ?SplitLines OpenReading Function OpenReading - open a file for future reading Option: builtin Calling Sequence: OpenReading(filename) Parameters: Name Type ----------------- filename string Returns: NULL Synopsis: This function opens the file given as argument for reading. Any future ReadRawLine or ReadLine commands will read data from the opened file. When the end of the file is reached, the read commands will return the token EOF. If the argument is the name 'terminal', then the standard input (stdin in Unix) is opened for input. A file that is opened this way can contain Darwin commands or any arbitrary text. If a ReadRawLine() command is used, then a textual line of the file will be read. If a ReadLine() command is used, then the line is expected to be a valid Darwin command. If filename ends in ".gz" or ".Z", then it is assumed to be a compressed file and it is decompressed before reading. Examples: > OpenReading( '/home/darwin/test' ); > t := ReadRawLine(); > OpenReading(terminal); See Also: ?FileStat ?OpenAppending ?ReadLine ?ReadURL ?inputoutput ?OpenPipe ?ReadOffsetLine ?ServerSocket ?LockFile ?OpenWriting ?ReadRawFile ?MySql ?ReadData ?ReadRawLine OpenWriting Function OpenWriting Option: builtin Calling Sequence: OpenWriting(fname) OpenWriting(terminal) OpenWriting(previous) Parameters: Name Type Description ----------------------------------- fname string filename terminal symbol system variable previous symbol system variable Returns: NULL Synopsis: If filename is given as the parameter, OpenWriting will open a file named filename and send all subsequent output directed towards the standard output into this file. If filename already exists, it is overwritten. 
If "terminal" is specified, all subsequent output is directed back towards the standard output (typically the monitor). If filename is "previous", then the current output stream is closed and subsequent output is reverted to the stream which was active before the previous OpenWriting or OpenAppending. Examples: > OpenWriting('~hallett/Book/mainfile'); > print('A quick way to create a lot of work for myself'); > OpenWriting(terminal); See Also: ?FileStat ?OpenAppending ?ReadOffsetLine ?inputoutput ?OpenReading ?ReadRawFile ?LockFile ?ReadLine ?ReadRawLine OrthologousGroup Class OrthologousGroup - information about an orthologous group of sequences Template: OrthologousGroup(Species,Seqs,AllAll) Fields: Name Type Description -------------------------------------------------------------------------- Species list(string) species of each sequence Seqs list(string) the amino acid sequence AllAll matrix({0,Alignment}) All-against-all alignments Length posint number of sequences in group Tree Tree phylogenetic distance tree for the group Methods: OrthologousGroup_type Rand select string Synopsis: This is the main result of the function Orthologues. It stores the information about a group (clique) of orthologous sequences belonging to various species. See also: ?Orthologues ?PhylogeneticTree ?Species_Entry ?SP_Species Orthologues Function Orthologues - find orthologous groups between various species Calling Sequence: Orthologues(SpeciesList,SampleSeq,...) 
Parameters: Name Type Description -------------------------------------------------------------------------- SpeciesList list(string) a list of strings identifying species SampleSeq string a sequence, find all homologous MinScore MinScore = positive minimum score for determining homology ScoreTol ScoreTol = positive score tolerance for stable pairs LenthTol LenthTol = positive length ratio tolerance for homology Returns: list(OrthologousGroup) Synopsis: Orthologues finds the orthologous groups between a set/list of species. All the parameters are optional, but one of SpeciesList or SampleSeq must be provided. An orthologous pair of sequences are homologous sequences which have diverged because of speciation alone. That is, the most recent ancestor of the two sequences resided in the most recent common ancestor of both species. The process follows four steps: (1) An all-against-all alignment of all sequences in the species (or all sequences homologous to the given sample) is done. The alignments with a score above MinScore (default 300) are refined to compute their distance. The alignment length has to be at least LengthTol (default 70%) of the length of the shorter sequence. (2) The stable pairs are found, that is, pairs which score highest among all pairs in both directions. This maximum score is accepted with a percentage tolerance given by ScoreTol (default 95%). (3) The stable pairs are compared against all other species to see if they are paralogous and not orthologous; the ones which survive the tests are called verified stable pairs. (4) Cliques of the verified stable pairs are extracted, one at a time, to form the orthologous groups. The alignments are done using the Dayhoff matrices stored in DM and DMS (normally built with CreateDayMatrices). The orthologous groups are returned in a list of OrthologousGroup data structures. 
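Step (2) above can be sketched as follows. This is a Python illustration of one plausible reading of "stable pair" (a pair whose score is within the tolerance of the best score for both of its members), not the Darwin implementation; the function name stable_pairs, the sequence labels and the scores are all invented for the example.

```python
def stable_pairs(scores, min_score=300.0, score_tol=0.95):
    """Return pairs (a, b) that score near-best in both directions.

    scores maps (a, b) to an alignment score, where a is a sequence of one
    species and b a sequence of the other. A pair is kept if its score is at
    least min_score and at least score_tol times the best score seen for a
    and for b (assumed interpretation of the ScoreTol tolerance).
    """
    best_a, best_b = {}, {}
    for (a, b), s in scores.items():
        best_a[a] = max(best_a.get(a, 0.0), s)
        best_b[b] = max(best_b.get(b, 0.0), s)
    return sorted(
        (a, b)
        for (a, b), s in scores.items()
        if s >= min_score
        and s >= score_tol * best_a[a]
        and s >= score_tol * best_b[b]
    )


# Hypothetical all-against-all scores between two species.
scores = {
    ("a1", "b1"): 900.0,
    ("a1", "b2"): 400.0,
    ("a2", "b1"): 880.0,
    ("a2", "b2"): 500.0,
    ("a3", "b3"): 250.0,  # below MinScore, never considered homologous
}
pairs = stable_pairs(scores)
print(pairs)
```

Here ("a2", "b1") is accepted alongside ("a1", "b1") because 880 lies within 95% of the best score for b1; with a tolerance of 100% only the strict best pair would remain.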
Examples: > Orthologues(['Picea abies', 'Pinus contorta', 'Pinus radiata']); [OrthologousGroup([Picea abies, Pinus radiata],[CWELYWLEHGIQPDGMMPSDTTVGVGDDAFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGAYRQLFHPEQLISGKEDAANNFARGHYTVGEEIVDLCLDRVRKLADNCTGL, MSPKTETKASVGFKAGVKDYRLTYYTPEYQTKDTDILAAFRVTPQPGVPP ..(475).. IKFEFDVIDRL],[[0, Alignment(Sequence(AC('O82035'))[1..356],Sequence(AC('Q40976'))[1..356],4045.0074,DMS[209],4.7818,1.3803,{Local})], [Alignment(Sequence(AC('O82035'))[1..356],Sequence(AC('Q40976'))[1..356],4045.0074,DMS[209],4.7818,1.3803,{Local}), 0]])] See Also: ?Align ?CreateDayMatrices ?Species_Entry ?Alignment ?OrthologousGroup ?SP_Species OutsideBounds Function OutsideBounds - test whether Stats could be the same Calling Sequence: OutsideBounds(a,b,Confidence = 0.9750) Parameters: Name Type Description ------------------------------------------------------------ a Stat Stat data structure to be compared b Stat second Stat to be compared or b numeric value to be compared Confidence positive confidence level (defaults to 0.975) Returns: boolean Synopsis: OutsideBounds checks whether two univariate statistics (Stat objects), or one univariate statistic and a value, represent different values with a certain confidence level. The confidence level is set to 0.975 by default, which gives the usual 2.5% error on one side, or 1.96 standard deviations away from the mean for a normal variable. If the second argument is a single value, the test is equivalent to determining whether the first distribution could have average b at the given confidence level. The Confidence level can be changed to any value c with 0.5 <= c < 1. 
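For the Stat-against-value case, the test can be sketched in Python under the assumption that it reduces to a normal-approximation interval around the sample mean: b is "outside" if it is more than z standard errors from the mean, where z is the normal quantile of the confidence level. This is an interpretation that happens to agree with the st example of this entry, not a statement of how Darwin implements it; the function name and signature are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def outside_bounds(mean, variance, n, b, confidence=0.975):
    """True if b lies outside mean +- z * standard error (assumed test)."""
    if not 0.5 <= confidence < 1:
        raise ValueError("confidence must satisfy 0.5 <= c < 1")
    z = NormalDist().inv_cdf(confidence)  # about 1.96 for 0.975
    std_err = sqrt(variance / n)
    return abs(mean - b) > z * std_err


# Summary values taken from the printed Stat in this entry's example:
# 10 sample points, mean 6.95, variance 0.105.
r1 = outside_bounds(6.95, 0.105, 10, 7.0)
r2 = outside_bounds(6.95, 0.105, 10, 6.5)
r3 = outside_bounds(6.95, 0.105, 10, 6.5, confidence=0.999999)
print(r1, r2, r3)
```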
Examples: > st := Stat('Near 7'): > to 10 do st+Rand(6.5..7.5) od: > print(st); Near 7: number of sample points=10 mean = 6.95 +- 0.20 variance = 0.105 +- 0.058 skewness=0.146037, excess=-1.35184 minimum=6.53935, maximum=7.48143 > OutsideBounds(st,7); false > OutsideBounds(st,6.5); true > OutsideBounds(st,6.5,Confidence=0.999999); false See also: ?ExpFit ?LinearRegression ?Stat ?StatTest ?UpdateStat PASfromMSA Function PASfromMSA Calling Sequence: PASfromMSA(msa) PASfromMSA(msa,lnM,freq) Parameters: Name Type Description ---------------------------------------------------------- msa MAlignment multiple sequence alignment lnM matrix(numeric) (optional) log. of a 1-PAM matrix freq array(numeric) (optional) character frequencies Returns: ProbSeq Synopsis: Computes the probabilistic ancestral sequence at the root of a phylogenetic tree over a multiple sequence alignment of probabilistic sequences. For protein sequences, the global variable NewLogPAM1 is assumed to describe the amino acid mutation probabilities. The global variable LogLikelihoods will be assigned to an array containing the ln of the likelihoods at each position. References: GM Cannarozzi, A Schneider and GH Gonnet (2007): Probabilistic Ancestral Sequences Based on the Markovian Model of Evolution - Algorithms and Applications, in: D Liberles (editor): Ancestral Sequence Reconstruction, Oxford University Press. 
Examples: > seqs := ['AAAR','AARR','VTAARRQQ']: > msa := MAlign(seqs):; dimensionless fitting index 1470 > print(msa);; Multiple sequence alignment: ---------------------------- Score of the alignment: 54.882333 Maximum possible score: 54.882333 Sequence 1 _AAAR___ Sequence 2 __AARR__ Sequence 3 VTAARRQQ > pas := PASfromMSA(msa): > print(pas);; pos Most probable chars 1 V 0.83 I 0.05 L 0.04 A 0.03 T 0.02 2 A 0.70 T 0.25 S 0.03 V 0.01 K 0.00 3 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00 4 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00 5 R 1.00 K 0.00 Q 0.00 A 0.00 S 0.00 6 R 0.97 K 0.01 Q 0.00 A 0.00 L 0.00 7 Q 0.70 E 0.06 K 0.05 R 0.03 A 0.03 8 Q 0.70 E 0.06 K 0.05 R 0.03 A 0.03 See Also: ?MAlign ?PASfromTree ?ProbSeq ?MAlignment ?ProbAncestor ?PSDynProg PASfromTree Function PASfromTree Calling Sequence: PASfromTree(seqs,tree) PASfromTree(seqs,tree,lnM,freq,gapcosts) Parameters: Name Type Description ---------------------------------------------------------------------- seqs array({ProbSeq,string}) (probabilistic) sequences tree Tree tree of the sequences lnM matrix(numeric) (optional) log. of a 1-PAM matrix freq array(numeric) (optional) freq. of characters gapcosts procedure (optional) gap cost function Synopsis: Computes the probabilistic ancestral sequence at the root of a phylogenetic tree over a list of probabilistic sequences. For each internal node, the prob. sequences at the roots of the two subtrees are aligned and then an ancestral vector is computed. The global variable LogLikelihoods will be assigned to an array containing the ln of the likelihoods at each position. The third field of the leaves must be integer numbers corresponding to the sequences in the list (as is automatically the case when the tree comes either from an MAlign or a PhylogeneticTree call). For protein sequences, the global variables NewLogPAM1, AF and gap costs derived from DMS are assumed. For other types of sequences, the log of a mutation matrix (e.g. 
CodonLogPAM1), a vector of natural character frequencies (e.g. CF) and a function to compute gap costs for a given gap length at a given PAM distance are needed. (Typically of the form (pam,len)->-37.64+7.434*log10 (pam)-(len-1)*1.3961). References: GM Cannarozzi, A Schneider and GH Gonnet (2007): Probabilistic Ancestral Sequences Based on the Markovian Model of Evolution - Algorithms and Applications, in: D Liberles (editor): Ancestral Sequence Reconstruction, Oxford University Press. Examples: > seqs := ['VAAAR','AARR','VTAARRQQ']: > ps := [seq(ProbSeq(s,IntToA),s=seqs)]: > tree := PhylogeneticTree(seqs,[seq(i,i=1..length(seqs))],DISTANCE); > pas := PASfromTree(ps,tree): > print(pas);; pos Most probable chars 1 V 1.00 I 0.00 L 0.00 A 0.00 T 0.00 2 A 0.71 T 0.25 S 0.02 V 0.01 K 0.00 3 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00 4 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00 5 R 1.00 K 0.00 Q 0.00 A 0.00 S 0.00 6 R 0.97 K 0.01 Q 0.00 A 0.00 L 0.00 7 Q 0.76 E 0.05 K 0.04 R 0.03 A 0.02 8 Q 0.76 E 0.05 K 0.04 R 0.03 A 0.02 See Also: ?CreateCodonMatrices ?PASfromMSA ?ProbSeq ?CreateDayMatrices ?ProbAncestor ?PSDynProg PSDynProg Function PSDynProg Calling Sequence: PSDynProg(ps1,ps2,dist,meth) PSDynProg(ps1,ps2,dist,lnM,freq,gapcosts,meth) Parameters: Name Type Description ----------------------------------------------------------------------------- ps1, ps2 ProbSeq Probabilistic sequences dist numeric Distance between the two sequences lnM matrix(numeric) (optional) log. of a 1-PAM matrix freq array(numeric) (optional) Natural frequencies of the characters gapcosts procedure (optional) Gapcosts as a function of gap length meth {Global,Local} (optional) alignment method Returns: numeric : ProbSeq Global Variables: DBGTMP Synopsis: Dynamic programming over two probabilistic sequences. In the standard case of proteins, the global variables NewLogPAM1, AF and gap costs according to the Dayhoff matrices are used. For other types of sequences (e.g. 
DNA or codons), the logarithm of a mutation matrix (e.g. CodonLogPAM1) and the natural frequencies of the characters (e.g. CF) are required. Also, a gap cost function is needed that returns the costs for a gap of a given size. This is usually k->FixedDel+(k-1)*IncDel with the coefficients taken from the CMS matrix for the given distance. The default alignment method is 'Local'. References: GM Cannarozzi, A Schneider and GH Gonnet (2007): Probabilistic Ancestral Sequences Based on the Markovian Model of Evolution - Algorithms and Applications, in: D Liberles (editor): Ancestral Sequence Reconstruction, Oxford University Press. Examples: > ps1 := ProbSeq('RAAVTGAAAQQQFT',IntToA): > ps2 := ProbSeq('VTGQQQ',IntToA): > dist := 10: > aps := PSDynProg(ps1,ps2,dist): > print(aps);; 41.6760 pos Most probable chars 1 V 1.00 2 T 1.00 3 G 1.00 4 A 1.00 5 A 1.00 6 A 1.00 7 Q 1.00 8 Q 1.00 9 Q 1.00 pos Most probable chars 1 V 1.00 2 T 1.00 3 G 1.00 4 5 6 7 Q 1.00 8 Q 1.00 9 Q 1.00 See Also: ?CreateCodonMatrices ?PASfromMSA ?ProbAncestor ?CreateDayMatrices ?PASfromTree ?ProbSeq PSubGene Function PSubGene Calling Sequence: PSubGene(g,new,newLength) Parameters: Name Type ------------------------------------ g Gene new {posint, posint..posint} newLength posint Returns: Gene Synopsis: Returns the modified Gene encoding the peptide at offset new with length newLength or with amino acid range new. Examples: See also: ?Gene ?NSubGene PamMax Function PamMax( MinSquareTree:Tree ) returns the largest pam distance of two sequences in a MinSquareTree PamToCodonPam Function PamToCodonPam - Convert PAM to CodonPAM. Calling Sequence: PamToCodonPam(lnM1,CF,Pam) Parameters: Name Type Description ----------------------------------------------------------------------- lnM1 matrix(numeric,64) Logarithm of a 1-PAM codon mutation matrix. CF array(numeric,64) Codon frequencies Pam numeric PAM distance to be converted Returns: numeric Synopsis: Converts PAM to CodonPAM. 
This conversion depends on the amount of synonymous mutations for a species or set of species, so the logarithm of the 1-CodonPAM matrix and the codon frequencies are required as arguments. The conversion is done by inverting the CodonPamToPam function using Brent's search. Examples: > PamToCodonPam(CodonLogPAM1,CF,50); 109.2499 See also: ?CodonPamToPam ?CreateCodonMatrices PamToPerIdent Function PamToPerIdent - Compute percentage identity from PAM Calling Sequence: PamToPerIdent(p) Parameters: Name Type Description ----------------------------- p numeric PAM distance Returns: numeric Synopsis: Compute the percentage identity that a pam distance will leave. Examples: > PamToPerIdent(250); 19.6841 See also: ?PerIdentToPam PamWindows Function PamWindows( MinSquareTree:Tree ) returns a vector containing all different PamWindows in a tree ParExecuteIPC Function ParExecuteIPC Calling Sequence: ParExecuteIPC(queue,ProgFileName,machines,handler,delay, controls) Parameters: Name Type Description -------------------------------------------------------------------------------------------- queue list({string,structure}) statements parameterizing each job ProgFileName string File name containing init and job procedures machines list(string) list of machines to be used handler {0,procedure} result handler delay posint delay (secs) between checking machines: default 10 controls string statements about how a job can be executed Returns: NULL Global Variables: Queue StartDate StartTime initCPU istodo killed logfile mach normal_termination nrCreated nrCycles nrVanished resultHandler send_mail startable_processes todo Synopsis: ParExecuteIPC runs the job described in ProgFileName with the parameters in queue on machines. Before executing a task in parallel on several machines, several areas must be prepared. 1) Find the machines to be used. The criteria for machines to be used are that a) they are accessible via the Internet. b) all machines have an account with the same name. 
c) All machines must be capable of running darwin and darwinipc (See ?darwinipc). It is possible to configure ParExecuteIPC to use certain machines at specific times of the day, only when they have a specific load or only when no one is logged in. Machine names in this list can have a suffix of the form ":class" where class is an integer (see example). If class suffixes are used, a machine with class greater than zero when becoming idle will start a job already running on a machine of lower class. This avoids waiting for termination of the last few jobs which are running on slow machines. 2) Determine what files are needed. All files that are needed must be available with the same path name on all machines (databases, darwin code, etc.). 3) Determine the smallest independent job. 4) Determine the variables that parameterize a single job and create a list (queue) of strings in which each string contains all Darwin statements required to parameterize a job. 5) Create a file (ProgFileName) containing two parameterless procedures- init and job. init does the initialization (loads databases, computes Dayhoff matrices) - its return value is ignored. job does the actual job and must return the results as a string. Inside job, the global variable PE_job is the number of the job being executed, and the global variable tmpfile can be used as the name of a temporary scratch file, for instance to write the results. The job procedure should be written in such a way that it can be executed several times within the same run (with different jobs). Note: do not forget to declare all variables being used in both procedures as global. 6) Optionally, a result handler procedure can be created. The result handler accepts a job number (an integer) and its result (a string) and handles the result. Note that handling a job result should only take negligible time, so this handler typically writes the result to a file. 
If you do not provide your own handler, the default handler (indicated by the number 0 as an argument to ParExecuteIPC) is used. The default result handler creates one output file per job: the results of job i are stored in ProgFileName.out.i. When ParExecuteIPC executes, it automatically creates two files- ProgFileName.log and ProgFileName.done. In ProgFileName.done, the job numbers of the completed jobs are listed. If the completion was not successful, then the job number is preceded by a minus sign. ProgFileName.log is a log of the process execution. It tells what the status of the machines was when ParExecuteIPC was started, tells which machine is running each job and the execution time. It also contains any error messages generated by the processes. When ParExecuteIPC completes, it sends a mail message containing some statistics unless the NoMail control statement is passed. When ParExecuteIPC is killed before completion, it creates a file named ParExecute.redo. If this file is renamed ParExecAction and the ParExecuteIPC command is restarted, it will automatically complete all jobs in the ParExecAction file. When restarted, ParExecuteIPC will also redo any jobs in the ProgFileName.done file that are preceded by a minus sign. At any time during the execution of ParExecuteIPC, control statements can be executed by placing them in a file called ParExecAction in the directory from which the command was run. ParExecuteIPC recognizes the following control statements. StartUsing m Adds m to the pool of machines being used. StopUsing m Removes m from the pool of machines to be used. Any job running on m is killed and its results discarded. Status Write the status of all machines to the log file. LoginControl m on/off Turn the login control on machine m on or off. 
ForcedRun m on/off On machine m, force process to run (ignore BUSY flag) NiceValue m n Run at nice n on machine m OffHours m from..to Jobs running on machine m are stopped between from and to hours (both in 24 hour notation). MaxJobs m n Run n jobs (as if having n processors) on machine m LoadThreshold m low hi A job on machine m is stopped when the load on its machine is greater than hi, and is continued when the load gets less than low. LoadThreshold low hi Set global thresholds RunAlso job Adds job command to the job queue. KillAll Kill all running jobs and end ParExecuteIPC. Send a mail message with execution statistics to the user. Interrupt Kill all running jobs and end ParExecuteIPC. Write the jobs to be finished into a file, ParExecAction.redo. NoMail Turns off the sending of execution statistics by email. There are four ways to send a control statement to a ParExecuteIPC job: 1) As a line of the optional controls parameter when ParExecuteIPC is invoked. 2) As data being sent to the ParExecuteIPC process via Darwin's IPC feature (see ?darwinipc). This is the most efficient way and response to the command is immediate. For example, (assuming that the ParExecuteIPC process has pid 8281 and runs on ru3), typing: ipcsend SEND ru2 8281:Status at the operating system prompt will cause the ParExecuteIPC status to be written to the log file. 3) As a line of the file ParExecAction in the current directory of the ParExecuteIPC process. Whenever this file is found, it is read, processed and deleted. 4) As a line of ParExecAction.pid in the current directory of the ParExecuteIPC process, where pid is the process id of the ParExecuteIPC process. Whenever this file is found, it is read, processed and deleted. If login control is on, jobs are stopped whenever an interactive user logs in to the machine, and are continued when no user is logged in. Logins of certain users can be excluded from this checking with the -u switch of the darwinipc daemon (see ?darwinipc). 
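The LoadThreshold rule described above is a hysteresis: a job stops when the load exceeds hi and resumes only once the load has dropped below low, so it does not flip state while the load sits between the two thresholds. A minimal Python sketch of that rule follows; the function names and the sample load values are invented, and the sketch says nothing about how ParExecuteIPC actually samples machine load.

```python
def next_state(running, load, low, hi):
    """Return whether the job should be running after observing load."""
    if running and load > hi:
        return False  # stop: load rose above the upper threshold
    if not running and load < low:
        return True  # continue: load fell below the lower threshold
    return running  # between thresholds: keep the current state


def simulate(loads, low, hi):
    """Apply next_state to a series of load readings, starting as running."""
    running = True
    states = []
    for load in loads:
        running = next_state(running, load, low, hi)
        states.append(running)
    return states


# Hypothetical load readings with thresholds low=1.0 and hi=2.0.
states = simulate([0.5, 2.5, 1.5, 0.8, 1.5], low=1.0, hi=2.0)
print(states)
```

Note that the reading 1.5 leaves a stopped job stopped and a running job running, which is the point of using two thresholds.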
For the following example, first create a file with the name "ParExample". In that file, define procedures with the names init and job. These procedure names are not optional. In this example, a result handler is also defined with procedure name "Handler". This procedure name is optional. Here is the contents of the file "ParExample": init := proc() ReadDb('/home/darwin/DB/SwissProt'): end: job := proc() sequ := SearchTag('SEQ',Entry(entry)): sequence := string(SearchSeqDb(sequ)): sequence; end: Handler := proc(job:integer,t:string) OpenAppending('Job.results'); printf('%s ',t): end: Before running the command, the queue which parameterizes each job, the machines to be used and the control strings must be defined. Examples: > Machines:=['linneus1:2','linneus2:3']; > Controls := 'OffHours linneus1 8..9 MaxJobs linneus2 4 '; > queue := [seq(sprintf('entry := %d:',i),i=1..10)]; > ParExecuteIPC(queue,'ParExample',Machines,Handler,10,Controls); See Also: ?ConnectTcp ?ipcsend ?ReceiveDataTcp ?SendTcp ?darwinipc ?ParExecuteSlave ?ReceiveTcp ?DisconnectTcp ?ParExecuteTest ?SendDataTcp ParExecuteTest Function ParExecuteTest Calling Sequence: ParExecuteTest(thisjob,ProgFileName,machine) Parameters: Name Type Description -------------------------------------------------------------------------------- thisjob {string,structure} statements parameterizing one job ProgFileName string File name containing init and job procedures machine string machine to be used Returns: string Global Variables: job Synopsis: ParExecuteTest tests thisjob using the prog in ProgFileName simulating a ParExecuteIPC on machine. It is designed to test the setup of a ParExecuteIPC before running it on multiple machines. For the following example, first create a file with the name "ParExample". In that file, define procedures with the names init and job. These procedure names are not optional. 
Here is the contents of the file "ParExample": init := proc() ReadDb('/home/darwin/DB/SwissProt'): end: job := proc() sequ := SearchTag('SEQ',Entry(entry)): sequence := string(SearchSeqDb(sequ)): sequence; end: Examples: > queue := [seq(sprintf('entry := %d:',i),i=1..10)]:; > ParExecuteTest(queue[2],'ParExample',linneus2):; Warning: procedure Handler reassigned May 20 13:16:10 2003: linneus2 creates parallel process May 20 13:16:10 2003: linneus2(19680) started May 20 13:16:10 2003: linneus2(19680) initialized (0.0 s CPU) May 20 13:16:10 2003: linneus2(19680) started job May 20 13:16:10 2003: linneus2(19680) completed job (0.0 s CPU), result: MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQPQQSCTASLTGLNVCAPFLVPGSPTASTECCNAVQSINHDCMC May 20 13:16:10 2003: linneus2(19680) ending May 20 13:16:10 2003: linneus2(19680) ended See Also: ?ConnectTcp ?ipcsend ?ReceiveDataTcp ?SendTcp ?darwinipc ?ParExecuteIPC ?ReceiveTcp ?DisconnectTcp ?ParExecuteSlave ?SendDataTcp Paragraph Class Paragraph - holds contents of a paragraph of text Template: Paragraph(content1,...) Paragraph(indent,content1,...) Returns: Paragraph Fields: Name Type Description ------------------------------------------------------------------ indent integer the integer indentation value content_i {string,structure} the text content of the Paragraph Methods: HTMLC LaTeXC Paragraph_type print string Synopsis: The Paragraph structure holds text that is expected to be laid out as a paragraph. The integer value indent specifies the number of blank positions to be added at the beginning of the first line. If the indent value is negative, then the first line is not indented, but the rest of the lines will be indented by -indent. Paragraphs are typically part of Documents or Descriptions or any place where text must be formatted. 
When a Paragraph is converted to a string, each content_i is converted to a string, all concatenated together and properly broken into lines not exceeding the value of the interface variable screenwidth. A newline character is always added at the end of the last line of the converted Paragraph. Any newlines or tab characters in the contents are changed into spaces. Examples: > p := Paragraph( 5, 'This text is indented 5 spaces' ); p := Paragraph(5,This text is indented 5 spaces) > print(p); This text is indented 5 spaces See Also: ?Block ?Document ?latex ?RunDarwinSession ?Code ?HTML ?List ?screenwidth ?Color ?HyperLink ?PostscriptFigure ?Table ?Copyright ?Indent ?print ?TT ?DocEl ?LastUpdatedBy ?Roman ?View ParallelAllNucPepMatches Function ParallelAllNucPepMatches Option: builtin Calling Sequence: ParallelAllNucPepMatches(npm,dm,goal) Parameters: Name Type Description ---------------------------------------------------------------- npm list(NucPepMatch) a Nucleotide Peptide Match dm {DayMatrix,list(DayMatrix)} Dayhoff matrix or matrices goal numeric threshold value Returns: NULL Synopsis: Does multiple GetAllNucPepMatches simultaneously. More efficient than single GetAllNucPepMatches calls on some parallel machines only. Examples: See also: ?GetAllNucPepMatches ?NucPepMatch ParseDimacsGraph Function ParseDimacsGraph Calling Sequence: ParseDimacsGraph(s) Parameters: Name Type Description -------------------------------------- s string graph in dimacs format Returns: Graph Synopsis: This function parses a graph in dimacs format and returns a Darwin graph structure. 
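To make the input format concrete, here is a minimal Python sketch of a DIMACS edge-format reader, assuming one record per line: a "p edge <nodes> <edges>" header, "e <u> <v> <weight>" edge records, and "c" comment lines. It is not the Darwin implementation; the function name, the tuple layout (weight, u, v), chosen to mirror Darwin's Edge(weight,u,v) ordering, and the default weight of 1 for unweighted edges are assumptions of this sketch.

```python
def parse_dimacs_graph(text):
    """Parse DIMACS edge-format text into (edges, nodes).

    edges is a list of (weight, u, v) tuples and nodes the list 1..n taken
    from the problem line.
    """
    nodes = []
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "c":
            continue  # skip blank lines and comments
        if fields[0] == "p":
            nodes = list(range(1, int(fields[2]) + 1))
        elif fields[0] == "e":
            u, v = int(fields[1]), int(fields[2])
            # Assumption: edges without an explicit weight get weight 1.
            weight = int(fields[3]) if len(fields) > 3 else 1
            edges.append((weight, u, v))
    return edges, nodes


edges, nodes = parse_dimacs_graph("p edge 5 2\ne 2 4 9\ne 1 5 2\n")
print(edges, nodes)
```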
Examples: > ParseDimacsGraph('p edge 5 2 e 2 4 9 e 1 5 2 '); Graph(Edges(Edge(9,2,4),Edge(2,1,5)),Nodes(1,2,3,4,5)) See Also: ?BipartiteGraph ?Graph_minus ?Nodes ?Clique ?Graph_Rand ?Path ?DrawGraph ?Graph_XGMML ?RegularGraph ?Edge ?InduceGraph ?ShortestPath ?EdgeComplement ?MaxCut ?TetrahedronGraph ?Edges ?MaxEdgeWeightClique ?VertexCover ?FindConnectedComponents ?MinCut ?Graph ?MST ParseNewickTree Function ParseNewickTree - Converts a tree from newick to darwin format Calling Sequence: ParseNewickTree(t) ParseNewickTree(t,modif) Parameters: Name Type Description ----------------------------------------------------------------------------- t string tree in newick format modif symbol = procedure (optional) modifier procedures for label parsing Returns: Tree Synopsis: The function converts a tree from Newick (and also New Hampshire eXtended) format to a Darwin tree. Multifurcated nodes will be resolved to a binary representation with in-between-branches of length 0. Possible modifiers for label parsing: 'InternalLabels'=procedure A function string->anything that is called with any label assigned to an internal node. The return value is stored in the 'xtra' field of the node. The default handler returns NULL for empty labels (such that no 'xtra' field is created), the content of the 'NHX'-Tag (see References) or else the label itself. 'LeafLabels'=procedure A function string->anything that is called with 'NHX'-Tags assigned to leaves of the tree. The return value is stored in the 3rd field of the Leaf data structure. The default handler returns the content of those tags. 'defaultBranchLength'=nonnegative If only the topology of the tree is given, one can set a default length. N-ary inner nodes can be preserved that way, as they all get the same height assigned. References: Newick format according to Olson Grammar: http://evolution.genetics.washington.edu/phylip/newicktree.html Description of New Hampshire extension (vers 2.0): http://www.phylosoft.org/forester/NHX.html 
Examples: > t := '(((A:0.2,B:0.3):0.3,(C:0.5,D:0.3):0.2):0.3,E:0.7):0.0;'; t := (((A:0.2,B:0.3):0.3,(C:0.5,D:0.3):0.2):0.3,E:0.7):0.0; > ParseNewickTree(t); Tree(Tree(Tree(Leaf(A,0.8000),0.6000,Leaf(B,0.9000)),0.3000,Tree(Leaf(C,1),0.5000,Leaf(D,0.8000))),0,Leaf(E,0.7000)) See also: ?Leaf ?LeastSquaresTree ?PhylogeneticTree ?Tree ParsePred Function ParsePred( MulAlign:array(string), tree ) Generates the prediction of parse regions in a multiple alignment PartialFraction Function PartialFraction Calling Sequence: PartialFraction(r) PartialFraction(r,eps) Parameters: Name Type ---------------------------------------------------------- r a numerical value eps optional, the desired accuracy of the approximation Returns: [integer, posint] : a rational number represented by two integers, p/q Synopsis: PartialFraction computes an approximation of the input value r as a rational number. The pair of integers returned, p,q, should be interpreted as a rational approximation of r, i.e. p/q is approximately equal to r. The second argument must be a positive number. The computed approximation will have an error of the same order of magnitude as eps or smaller. If eps is omitted the value 1e-5 is used. Examples: > PartialFraction(1.234567); [100, 81] > PartialFraction(-Pi,0.01); [-22, 7] Partitions Data structure Partitions( ) Function: creates a splits or partitions data structure Selectors: Tree: Creates a tree from the given partitions If the partitions cause conflicts, then VertexCover is used to remove the conflicts and then a tree is constructed. Conflicts: Returns a reduced Partitions set that is free of conflicts (VertexCover) MinSquare: Uses the probabilistic model to create a tree. This is useful if a tree should be constructed but there are still conflicts in the graph. 
If you do not want to use VertexCover to remove the conflicts then this is an alternative. This way a minimum square tree is produced. Partitions_GetConflicts Function Partitions_GetConflicts( ) returns a list of sets. If a set is not empty, it specifies the conflict with another set. The number in the set is the other conflicting set in the list. Partitions_GetTree Function Partitions_GetTree( ) Constructs a binary tree from a set of partitions. PRECONDITIONS: the partitions must be conflict free and there must be enough partitions (n-2, n = nr of leaves) to construct a complete tree Partitions_ResolveConflicts Function Partitions_ResolveConflicts( ) Data is a list of partitions (list of sets). The procedure finds the conflicts, creates a graph and uses VertexCover to resolve the conflicts. The result is a reduced list of sets that does not contain the conflicting sets. PatEntry Class PatEntry - Data structure for entries to the Pat index for the database DB Template: PatEntry(a) Fields: Name Type Description ----------------------------------------------------------------------------------- a {integer,range,string,list(integer)} PatEntry number(s) in the database DB or a string to be searched Returns: PatEntry Methods: AC Entry ID Match PatEntry_type print Sequence string Synopsis: When a Darwin database is read for the first time, Darwin will automatically create a Patricia tree data structure from the contents of the SEQ field for each entry. This is accessed via a Pat index. PatEntry is a data structure for entries to the Pat index for the database DB. If the argument is an integer, a list of integers or a range of integers, these are considered to be entries in the Pat index of the database. If a string is given, it is assumed to be a sequence, and the Pat index is searched for all the sequences which contain the string exactly.
The result is returned as a range, even in the case that it is not found (an empty range) which is useful as it points to the two closest neighbouring sequences in the database. Searching for exact identity of peptides using PatEntry is very fast. Examples: > PatEntry(1); PatEntry(1) > PatEntry(1..5); PatEntry(1..5) > PatEntry('HHHHHHHH'); PatEntry(19663305..19663628) > PatEntry('B'); PatEntry(4667614..4667613) > PatEntry('C'); PatEntry(4667614..5600515) > Sequence(PatEntry(CCCCCCCC)); CCCCCCCCCCCNFCCGKFKPPVNESHDQYSHLNRPDGNREGNDMPTHLGQPPRLEDVDLDDVNLGAGGAPVTSQPREQAGGQPVFAMPPPSGAVGVNPFTGAPVAANENTSLNTTEQTTYTPDMVNQKY, CCCCCCCCCCNFCCGKFKPPVNESHDQYSHLNRPDGNREGNDMPTHLGQPPRLEDVDLDDVNLGAGGAPVTSQPREQAGGQPVFAMPPPSGAVGVNPFTGAPVAANENTSLNTTEQTTYTPDMVNQKY, CCCCCCCCCNFCCGKFKPPVNESHDQYSHLNRPDGNREGNDMPTHLGQPPRLEDVDLDDVNLGAGGAPVTSQPREQAGGQPVFAMPPPSGAVGVNPFTGAPVAANENTSLNTTEQTTYTPDMVNQKY, CCCCCCCCLCRDSCVSTWTKNSVANAVATNASSEVSIYSGSFLAILCTFSTGNLGEHRGADAVSLPLVSLFIVLA, CCCCCCCCNFCCGKFKPPVNESHDQYSHLNRPDGNREGNDMPTHLGQPPRLEDVDLDDVNLGAGGAPVTSQPREQAGGQPVFAMPPPSGAVGVNPFTGAPVAANENTSLNTTEQTTYTPDMVNQKY See also: ?Entry ?ID ?Match ?Sequence ?string Path Function Path - find a path between two nodes of a graph Calling Sequence: Path(g,n1,n2) Parameters: Name Type Description ------------------------------- g Graph given graph n1 Node source node n2 Node destination node Returns: list(Edge) Synopsis: Find a path between n1 and n2, returning all the edges that need to be traversed in a list. If there is no path, it returns an empty list. 
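The behaviour described in the Path synopsis (a list of edges from n1 to n2, or an empty list when no path exists) can be sketched in Python as follows. This is a hedged illustration that uses breadth-first search on an undirected graph. The help text does not say which search strategy Darwin uses, so the choice of BFS, the name find_path and the (weight, u, v) edge tuples are all assumptions.

```python
from collections import deque


def find_path(edges, source, target):
    """Return the edges of one path from source to target, or [] if none.

    Illustrative sketch only; edges are (weight, u, v) tuples, undirected.
    """
    adjacency = {}
    for edge in edges:
        _, u, v = edge
        adjacency.setdefault(u, []).append((v, edge))
        adjacency.setdefault(v, []).append((u, edge))

    reached_by = {source: None}          # node -> (previous node, edge used)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour, edge in adjacency.get(node, []):
            if neighbour not in reached_by:
                reached_by[neighbour] = (node, edge)
                queue.append(neighbour)

    if target not in reached_by:
        return []                        # no path: empty list, as in the synopsis

    # Walk back from the target to the source, collecting the edges used.
    path, node = [], target
    while reached_by[node] is not None:
        node, edge = reached_by[node]
        path.append(edge)
    return path[::-1]


g_edges = [(1.2, 1, 2), (2, 1, 4), (3, 1, 5), (4, 2, 3), (5, 3, 4)]
route = find_path(g_edges, 3, 5)
```

Breadth-first search returns a path with the fewest edges, which is not necessarily the path of least total weight. For weighted shortest paths see ?ShortestPath.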
Examples: > g := Graph( Edges(Edge(1.2,1,2),Edge(2,1,4),Edge(3,1,5),Edge(4,2,3),Edge(5,3,4)),Nodes(1,2,3,4,5)); g := Graph(Edges(Edge(1.2000,1,2),Edge(2,1,4),Edge(3,1,5),Edge(4,2,3),Edge(5,3,4)),Nodes(1,2,3,4,5)) > Path(g,3,5); [Edge(4,2,3), Edge(1.2000,1,2), Edge(3,1,5)] See Also: ?BipartiteGraph ?Graph_minus ?Nodes ?Clique ?Graph_Rand ?ParseDimacsGraph ?DrawGraph ?Graph_XGMML ?RegularGraph ?Edge ?InduceGraph ?ShortestPath ?EdgeComplement ?MaxCut ?TetrahedronGraph ?Edges ?MaxEdgeWeightClique ?VertexCover ?FindConnectedComponents ?MinCut ?Graph ?MST PerIdentToPam Function PerIdentToPam - Compute PAM distance from percentage identity Calling Sequence: PerIdentToPam(p) Parameters: Name Type Description ------------------------------------ p numeric percentage identity Returns: numeric Synopsis: Compute the PAM distance which results in the given percentage identity. Examples: > PerIdentToPam(17); 289.5953 See also: ?PamToPerIdent Permutation Class Permutation - a mathematical permutation Template: Permutation(p) Permutation(n) Fields: Name Type Description --------------------------------------------------------------- p list(posint) list of integers from 1 to n n posint creates an identity permutation of size n Returns: Permutation Methods: Permutation_type power Rand string times Synopsis: A Permutation holds a list of consecutive positive integers which describe how to permute a set of size n. Permutations can be multiplied (the product of two permutations a * b is a permutation which is identical to applying b and then a). Permutations can also be powered, in particular an inverse permutation is obtained by 1/a. 
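The product and inverse described above can be sketched in Python as follows. This is a minimal illustration rather than Darwin's code. The composition convention (a*b)[i] = b[a[i]] is an assumption inferred from the example values given for this entry, and the names compose and inverse are hypothetical. Permutations are plain lists using 1-based values.

```python
def compose(a, b):
    """Product a*b under the assumed convention (a*b)[i] = b[a[i]]."""
    return [b[x - 1] for x in a]


def inverse(a):
    """Inverse permutation: if a maps position i to value v, map v back to i."""
    inv = [0] * len(a)
    for position, image in enumerate(a, start=1):
        inv[image - 1] = position
    return inv


a = [4, 5, 6, 2, 3, 1, 7]
b = [5, 6, 1, 2, 4, 7, 3]
product = compose(a, b)      # corresponds to a*b
a_inverse = inverse(a)       # corresponds to 1/a
```

Under this convention, rearranging a list with b and then rearranging the result with a gives the same outcome as rearranging once with the product, which matches the "applying b and then a" wording of the synopsis.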
Examples: > a := Rand(Permutation(7)); a := Permutation([4, 5, 6, 2, 3, 1, 7]) > b := Rand(Permutation(7)); b := Permutation([5, 6, 1, 2, 4, 7, 3]) > a*b; Permutation([2, 4, 7, 6, 1, 5, 3]) > 1/a; Permutation([6, 4, 5, 1, 2, 3, 7]) See also: ?CreateRandPermutation ?Mutate ?Rand ?Shuffle PhyML Function PhyML - Wrapper for PhyML, an ML tree reconstruction tool Calling Sequence: PhyML(msa) Parameters: Name Type Description ---------------------------------------------------------------------------------------------------------- msa {MAlignment,list(string)} Multiple Sequence Alignment labels list(string) (optional) Sequence Labels subst string (optional) substitution model inv_sites inv_sites=boolean (optional) Estimate invariant sites gamma_dist gamma_dist={'e',positive} (optional) Use or estimate gamma parameter rate_cats rate_cats={numeric} (optional) number of discrete rate categories start_tree start_tree={Tree,string} (optional) start tree for search nr_bootstrap nr_bootstrap={numeric} (optional) number of bootstrap samples opt_topo opt_topo=boolean (optional) optimize topology opt_branch opt_branch=boolean (optional) optimize branch lengths seqtype seqtype=string (optional) type of sequences (default AA) search_heuris search_heuris=string (optional) applied search heuristics LnLperSite LnLperSite=boolean (optional) Report log-likelihood values per site? (default NO) Returns: TreeResult Synopsis: PhyML is a tool to compute maximum likelihood trees from multiple sequence alignments. For details see manual (Reference section). Available substitution models: JTT subst=string HKY85,JC69,K80,F81,F84,TN93, GTR,LG,WAG,JTT,MtREV, Dayhoff,DCMut,RtREV,CpREV,VT, Blosum62,MtMam,MtArt,HIVw, HIVb seqtype={AA,DNA} specify type of sequence data. By default, Amino Acid is assumed. Available model modifiers: inv_sites={'e', 0..1} estimate (e) or set proportion of invariant sites to a fixed value.
gamma_dist={'e',positive} estimate (e) or set the gamma rate parameter. rate_cats=integer number of discrete rate categories Other parameters: nr_bootstrap=integer determines the number of bootstrap samples to be evaluated. Default=0 start_tree={Tree,'MP','BioNJ'} specifies the start topology for the ML search. 'MP' uses a maximum parsimony tree and 'BioNJ' starts with a Neighbor-Joining tree. Alternatively, you can pass a starting topology. Default='MP' search_heuris={NNI,SPR,BEST} specifies the applied search heuristics. By default, NNI is used. opt_topo=boolean specifies whether or not the topology is optimized. opt_branch=boolean specifies whether or not the branch lengths are optimized. References: Guindon S., Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood, Systematic Biology, 52(5):696-704, 2003. Examples: > msa := Rand(MAlignment):; > PhyML(msa, 'subst'='LG','inv_sites'='e'); TreeResult(Tree(Tree(Leaf(RandSeq7, 1e-10),0,Leaf(RandSeq9,0.2974)),0,Tree(Leaf( RandSeq6,0.2229),0.2173,Tree(Tree(Tree(Leaf(RandSeq5,1.0211),1.0162,Leaf( RandSeq8,1.3645)),0.7386,Tree(Tree(Leaf(RandSeq3,1.0698),1.0698,Leaf(RandSeq10, 1.5540)),0.7504,Leaf(RandSeq1,0.7580))),0.4718,Tree(Leaf(RandSeq2,0.4878),0.4878 ,Leaf(RandSeq4,0.6897))))),ML,table([{[Likelihood, [-3094.8008]]}, {}, {}, {[ InvSites, [0.00500000]]}, {}, {[SubstModel, [LG]]}, {[CPUtime, [12.4500]]}, {}, {[Method, [Phyml 3.0]]}, {[Alpha, [99.8640]]}, {}, {}, {}],unassigned)) See Also: ?LeastSquaresTree ?MAlign ?RellTree ?Tree ?MafftMSA ?PhylogeneticTree ?RobinsonFoulds ?TreeResult PhylogeneticTree Function PhylogeneticTree - Constructs Phylogenetic Trees Calling Sequence: PhylogeneticTree(Seqs,Ids,Mode) Parameters: Name Type Description --------------------------------------------------------------------------- Seqs list Sequences or Entries from which a tree is built Ids {list,procedure} list of id tags or procedure that produces tags Mode symbol method - DISTANCE,
PARSIMONY or LINEAGE msa MAlignment optional Multiple sequence alignment allall matrix optional all vs all matrix of Alignments Returns: Tree Global Variables: DimensionlessFit MST_Qual printlevel Synopsis: PhylogeneticTree is a method for constructing phylogenetic trees using either minimization of the least squares of the distances in the real data and computed tree or by minimizing the number of changes/mutations that would be required If the mode passed is DISTANCE, an all-against-all (each sequence aligned against each other sequence) is calculated and the distance and variance information is used to compute a binary tree which approximates via least squares the distance information. If an optional array of Alignment data structures is passed as an argument, this all-against-all will be used instead of recalculating it. Ten trees are constructed from random starting points and the best tree is returned. All trees are optimized using iterations of 4-optim and 5-optim which optimize all subtrees with 4 and 5 branches respectively. The quality of the fit is measured by the sum of the squares of the weighted deviations divided by (n-2)(n-3)/2. This value is stored in the global variable MST_Qual. If the global variable MinLen is assigned a positive value, it will determine the minimum length between internal or external nodes. If not set, 0.1 PAM is used. The distance of the branches are the approximate distances calculated by least squares in PAM units. Since the tree is made from alignments, the input sequences must be protein or DNA sequences. If the mode passed is PARSIMONY, random trees are constructed and then optimized with 4-and 5-optim using the parsimony criterion (the tree with the least amount of mutations is the best tree). This is sometimes also called character compatibility. Each position of the given sequences is treated as a character. 
The goal of the parsimony trees is to build a tree such that we can assign character changes on the branches of the tree and this total number of changes is minimized. Amino acids or DNA bases can be used as characters, but also any other arbitrary symbol (characters are restricted to be ASCII characters though). If a MAlignment data structure is passed as an optional argument, this alignment is used. If all the sequences are exactly the same length, it is assumed that they have been already aligned and they are taken as given. If not, the sequences in Seqs are aligned with the circular tour method (See ?MAlign). The global variable MST_Qual is assigned the number of changes that the returned tree requires. The distances in the tree are taken from the parsimony construction and indicate the minimum number of changes that must occur in that particular branch. The Parsimony method accepts an additional parameter which indicates which method to use to build the initial tree. This tree is later optimized. The methods to build the initial tree are: NJRandom Neighbour Joining with randomness in the selection of the best pair to join. CircularTour A circular tour of minimum cost is built at each step, and the pair of nodes with least cost is selected to be joined. NeighJoin Neighbour Joining. At each step the two subtrees with the least cost to join them are joined. DynProgr(k) Use a dynamic programming approach among the k best results of Neighbour Joining. DynProgr Identical to DynProgr(10) OptInsertion Insert each leaf/subtree in the best possible branch of the previously built subtrees. This is the default choice, it is a bit slow, but normally gives the best trees. Random Leaves/subtrees are joined randomly. Quite fast, but produces poor trees. LowerBound Do not build a tree, just compute a lower bound on the cost of the tree (minimum number of changes). SemiOptInsertion(t) Like OptInsertion, but limit the search of the best insertion to t seconds. 
SemiOptInsertion Synonym of SemiOptInsertion(10). If the mode passed is StrictCharacterCompatibility, then it is assumed that the Seqs are strings (all of the same lengths) of binary characters. Any symbols can be used for the characters. If the characters are not compatible, an error is given with the first pair of characters which are not compatible. The global variable MST_Qual will contain the minimum number of character changes, which is equal to the number of informative characters (and never greater than the length of the sequences of characters). If the mode passed is LINEAGE, then it is assumed that the Seqs are lists containing lineage descriptions. The lists are assumed to classify each sequence from the most general to the most specific class. The lineage descriptions have to be consistent, that is if a particular class is used, then it should always be preceded with the same sequence of classes. The classes are typically strings, but could be any valid Darwin object. Examples: > Ids := ['one','two','three','four']: > Seqs := ['RTHKLPEMNVC', 'KSHKLPEMNVC', 'SHKLMNVC', 'HKLPEMNVC']: > PhylogeneticTree(Seqs,Ids,DISTANCE); > MST_Qual; 0.01116240 > PhylogeneticTree(Seqs,Ids,PARSIMONY); Tree(Tree(Leaf(one,2.5000,1),0.5000,Leaf(four,1.5000,4)),0,Tree(Leaf(three,2.5000,3),0.5000,Leaf(two,1.5000,2))) > Seqs := [B1xj,B2zj,G2zi,G1xi,G2xi]: > PhylogeneticTree(Seqs,[seq(i,i=1..5)],parsimony); Tree(Tree(Tree(Leaf(1,3.5000,1),1.5000,Leaf(4,1.5100,4)),0.5000,Leaf(5,0.5100,5)),0,Tree(Leaf(2,2.5000,2),0.5000,Leaf(3,0.5100,3))) > MST_Qual; 6 See Also: ?BootstrapTree ?Leaf ?SignedSynteny ?ComputeDimensionlessFit ?LeastSquaresTree ?Synteny ?DrawTree ?MAlignment ?Tree ?Entry ?RBFS_Tree ?Tree_matrix ?GapTree ?Sequence Plot2Gif Function Plot2Gif - convert a plot output to a gif file Calling Sequence: Plot2Gif(opt) Parameters: Name Type Description ----------------------------------------------------------------------- opt 'landscape' (optional) produce the gif in landscape 
format opt 'portrait' (optional) produce the gif in portrait format opt output = string (optional) file name to place the result Returns: NULL Synopsis: Uses underlying unix/linux commands to convert the output of a Draw/Plot command to a xxx.gif file. The commands used are pstopnm and ppmtogif and may not exist in all versions of the operating systems. Examples: > Plot2Gif( landscape, output='figure1.gif' ); See Also: ?BrightenColor ?DrawPlot ?Set ?ColorPalette ?DrawPointDistribution ?SmoothData ?DrawDistribution ?DrawStackedBar ?StartOverlayPlot ?DrawDotplot ?DrawTree ?StopOverlayPlot ?DrawGraph ?GetColorMap ?ViewPlot ?DrawHistogram ?PlotArguments PlotArguments Class PlotArguments - structure to hold plotting/drawing options Template: PlotArguments(Title,TitleX,TitleY,TitlePts,Lines,Grid,LabelFormat, GridFormat,Colors,Axis) Fields: Name Type Description -------------------------------------------------------- Title string text to be displayed in the plot TitleX numeric x coordinate of the title TitleY numeric y coordinate of the title TitlePts numeric point size of the title Lines boolean Grid boolean LabelFormat string GridFormat string Colors string colour map Axis boolean axis will be drawn Returns: PlotArguments Methods: draw PlotArguments_type Title Synopsis: Structure to hold plot options. This structure is used internally by several drawing functions. 
The values are filled in the same way for all of these functions, which accept them in the following format: Title = string text to be displayed in the plot TitleX = numeric x coordinate of the title TitleY = numeric y coordinate of the title TitlePts = numeric point size of the title Lines = boolean draw horizontal lines Grid = boolean draw a grid (horizontal and vertical lines) LabelFormat = string printf-style format for labels GridFormat = string printf-style format for Lines or Grid values Colors = list list of colors suitable for GetColorMap Axis = boolean draw x and y axes See Also: ?BrightenColor ?DrawPlot ?Set ?ColorPalette ?DrawPointDistribution ?SmoothData ?DrawDistribution ?DrawStackedBar ?StartOverlayPlot ?DrawDotplot ?DrawTree ?StopOverlayPlot ?DrawGraph ?GetColorMap ?ViewPlot ?DrawHistogram ?Plot2Gif PlotIndex Function PlotIndex - Plot a Variation Index Calling Sequence: PlotIndex(ma) Parameters: Name Type Description ---------------------------------------------------- ma array(string) multiple sequence alignment index array(numeric) a variation index Returns: NULL Synopsis: Plots a histogram from the variation index. See also: ?KWIndex ?PrintIndex ?ProbIndex ?ScaleIndex Poisson_Rand Function Poisson_Rand - Generate random Poisson-distributed integers Calling Sequence: Rand(Poisson(m)) Returns: integer Synopsis: This function returns a random Poisson-distributed integer with average m and variance m. The Poisson distribution is the limiting case of the binomial distribution when n -> infinity and n*p=m remains bounded. In mathematical terms, the probability that the outcome is i is exp(-m) * m^i / i! (for 0 <= i). Poisson_Rand uses Rand(), which can be seeded by either the function SetRand or SetRandSeed.
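The probability formula in the Poisson_Rand synopsis, exp(-m) * m^i / i!, can be checked numerically with a short Python sketch. The sampler shown uses Knuth's multiplication method, which is an assumption made for illustration: the help text does not state which algorithm Darwin uses. The names poisson_pmf and poisson_sample are hypothetical. The sampler is only suitable for small m, because exp(-m) underflows to zero for very large m.

```python
import math
import random


def poisson_pmf(i, m):
    """Probability that a Poisson(m) variable equals i: exp(-m) * m^i / i!."""
    return math.exp(-m) * m ** i / math.factorial(i)


def poisson_sample(m, rng=random):
    """Draw one Poisson(m) integer with Knuth's method (small m only).

    Multiplies uniform random numbers until the running product drops
    to exp(-m) or below; the number of extra factors is the sample.
    """
    limit = math.exp(-m)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(12345)          # fixed seed so the run is repeatable
samples = [poisson_sample(4.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
total_probability = sum(poisson_pmf(i, 4.0) for i in range(60))
```

With enough samples, sample_mean should lie close to m, and the probabilities should sum to 1 up to rounding error.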
References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.1.22 Examples: > Rand(Poisson(20)); 12 > Rand(Poisson(1000)); 979 See Also: ?Beta_Rand ?Exponential_Rand ?Multinomial_Rand ?StatTest ?Binomial_Rand ?FDist_Rand ?Normal_Rand ?Std_Score ?ChiSquare_Rand ?GammaDist_Rand ?SetRand ?Student_Rand ?CreateRandSeq ?Geometric_Rand ?SetRandSeed ?Zscore ?Cumulative ?Graph_Rand ?Shuffle Polar Data structure Polar( Rho:numeric, Theta:numeric ) Data structure Polar( Rho, Theta ) Representation of complex numbers in polar form. The number is Rho * exp( i*Theta ). - Operations: Initialization: a := Polar(1,Pi/2); b := Polar(0,1); All arithmetic operations: a+b, a-b, a*b, a/b, a^b, |a| Special functions exp(a), ln(a), sin(a), cos(a), tan(a) Printing: print(a); printf( '%.3f', a ); Type testing: type(a,Polar); - Conversions: To string : string(a) Complex : Complex(a) Polar : Polar(Complex(...)) - Selectors: a[Re] : real part a[Im] : imaginary part a[Rho] : radius or absolute value a[Theta] : angle, (-Pi < a[Theta] <= Pi) PolishAngles Function PolishAngles( g:Graph, angles:array(numeric) ) Attempts to polish angles by collapsing g to a tree. PositionTree Function PositionTree( ma:array(string), t:Tree, pos:posint ) Creates a tree containing the amino acids of position pos in ma as labels. 
PostscriptFigure Class PostscriptFigure - figure given by a postscript file (Darwin or other) Template: PostscriptFigure() Fields: Name Type Description --------------------------------------------------------------------------- psfile string (opt) file name containing the postscript caption Caption = string (opt) caption to describe the figure convmeth Convert = string (opt) conversion method linkas LinkAs = string (opt) path of image source in HTML newfn PlaceUnder = string (opt) name of converted image file modif string = string (opt) pattern substitutions for input file Returns: PostscriptFigure Methods: HTMLC LaTeXC PostscriptFigure_type Rand string Synopsis: A PostscriptFigure object is constructed from a postscript file which could be generated by a Darwin Draw command or from some other source, e.g. xfig. This structure is normally held in a Document and is displayed as appropriate (as HTML, latex or a string). If no psfile is given, it is assumed that it comes from a Draw command and hence plotoutfile is used. When this structure is converted to HTML, a .gif or .jpg file has to be made. The default method is 'auto' which will use the UNIX tool 'convert' to automatically create a .jpg file without user interaction. If this does not lead to satisfactory results or some modifications (e.g. rotation) have to be performed, the method 'gimp' should be used. This will open the file in Gimp and give control to the user. Hence Gimp has to be available in the system. The LinkAs option allows linking the file under a different path when converting to HTML. With PlaceUnder a filename for the converted file can be given. This filename also determines the image format (.gif or .jpg). If it is converted to latex, the postscript is converted to encapsulated postscript with ps2eps, which should also be available. Conversion to a string just prints a box with a unix command suitable to display the contents.
The modifiers are a simple mechanism to modify previously created postscript files. Textual substitution is performed (differences in length are ignored, which works well most of the time). These substitutions should be based on a relatively unique pattern; short patterns that may coincide with other postscript commands are bound to be disastrous. Examples: > PostscriptFigure( 'PAMgraph.ps', Caption='Score vs PAM'); PostscriptFigure(PAMgraph.ps,Caption = Score vs PAM,Convert = auto,PlaceUnder = PAMgraph.jpg,LinkAs = PAMgraph.jpg) See Also: ?Block ?Document ?latex ?RunDarwinSession ?Code ?HTML ?List ?screenwidth ?Color ?HyperLink ?Paragraph ?Table ?Copyright ?Indent ?print ?TT ?DocEl ?LastUpdatedBy ?Roman ?View PredictGenes Function PredictGenes( ms:list(NucPepMatch) ) Predict the best disjoint genes implied by ms. All matches in ms must refer to the same nucleotide sequence. Returns genes: list([cds: list(posint..posint), simil: numeric, nr: set]), exons: list(Region), introns: list(Region). PrintIndex Function PrintIndex - Prints a Variation Index Calling Sequence: PrintIndex(ma,index) Parameters: Name Type Description ---------------------------------------------------- ma array(string) multiple sequence alignment index array(numeric) a variation index Returns: NULL Synopsis: Prints the multiple alignment, followed by the indices, one position per row.
Examples: > ma := [ 'AKQVVLLIFGSW', 'AEPIVPLLFGMW', 'AEVIVPLLFGVW', 'AEPIVPLLFGLW', ' EPIVPLL__MW', ' PIVPLLFGMW']: > tree := Tree(Tree(Leaf(3,-50.3881,c),-31.1550,Tree(Tree( Leaf(2,-52.2087,b),-50.4844,Tree(Leaf(6,-71.9795,f),-53.3023, Leaf(5,-92.0774,e))),-41.0671,Leaf(4,-48.3231,d))),0,Leaf(1,-62.9954,a)): > prxd := ProbIndex (ma, tree); prxd := [1.5749, 3.0664, 6.2335, 3.1332, 2.1029, 3.9343, 1.6915, 2.9950, 2.0193, 1.5307, 6.9936, 2.2708] > PrintIndex(ma,prxd); 1 AAAA 1.57 2 KEEEE 3.07 3 QPVPPP 6.23 4 VIIIII 3.13 5 VVVVVV 2.10 6 LPPPPP 3.93 7 LLLLLL 1.69 8 ILLLLL 2.99 9 FFFF_F 2.02 10 GGGG_G 1.53 11 SMVLMM 6.99 12 WWWWWW 2.27 See also: ?KWIndex ?PlotIndex ?ProbIndex ?ScaleIndex PrintInfo Function PrintInfo( entries:{integer,structure}, tag1:string ) Print the entry number and information tags (tag1 and additional optional tags) for an entry given by number or several entries given by a data structure. PrintMatrix Function PrintMatrix Calling Sequence: PrintMatrix(A,format) Parameters: Name Type ---------------------------------------------------- A a rectangular or square matrix format optional, a formatting string, as in printf Returns: NULL Synopsis: This function pretty-prints a square or rectangular matrix. It is normally used by the print() command. If called directly, the user can specify the format to be used. Without a printing format, it will calculate a reasonable format to fit on the screen width. Examples: > PrintMatrix( [[1,2], [3,4]] ); 1 2 3 4 > PrintMatrix( [[1/7,2/7], [3/7,4/7]], '%13.10f'); 0.1428571429 0.2857142857 0.4285714286 0.5714285714 See also: ?print ?printf (for the codes accepted as format) PrintStringMatch Function PrintStringMatch( pat:string, t:string ) Print the alignment of a string (pat) matched against a text (t). PrintTreeSeq Function PrintTreeSeq( t:Tree ) Print out sequences cross referenced in a tree. 
ProbAncestor Function ProbAncestor Calling Sequence: ProbAncestor(ps1,ps2,d1,d2) ProbAncestor(ps1,ps2,d1,d2,lnM,freq) Parameters: Name Type Description -------------------------------------------------------------- ps1, ps2 ProbSeq Probabilistic sequences d1, d2 numeric Distances to the common ancestor lnM matrix(numeric) (optional) log. of a 1-PAM matrix freq array(numeric) (optional) character frequencies Returns: ProbSeq Global Variables: LogLikelihoods Synopsis: Given two probabilistic sequences and the distances to their common ancestor, this function computes the probabilistic ancestral sequence (PAS). The logarithm of a 1-PAM matrix is needed to compute the mutation matrices for the two distances. The mutation matrix NewlogPAM1 is the default value and can be used for amino acid sequences. For codon sequences CodonLogPAM1 is recommended. The ancestral probabilities depend on the natural frequencies of the characters. By default, the amino acid frequencies AF are used. The global variable LogLikelihoods will be assigned an array containing the ln of the likelihoods at each position. References: GM Cannarozzi, A Schneider and GH Gonnet (2007): Probabilistic Ancestral Sequences Based on the Markovian Model of Evolution - Algorithms and Applications, in: D Liberles (editor): Ancestral Sequence Reconstruction, Oxford University Press.
Examples: > ps1 := ProbSeq('AARV',IntToA): > ps2 := ProbSeq('AVVV',IntToA): > pas := ProbAncestor(ps1,ps2,10,10): > print(pas);; pos Most probable chars 1 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00 2 A 0.56 V 0.43 L 0.00 T 0.00 I 0.00 3 V 0.56 R 0.34 A 0.02 L 0.02 K 0.02 4 V 1.00 I 0.00 L 0.00 A 0.00 T 0.00 See Also: ?CreateCodonMatrices ?PASfromMSA ?ProbSeq ?CreateDayMatrices ?PASfromTree ?PSDynProg ProbBallsBoxes Function ProbBallsBoxes - probability of hitting k eps-boxes with n balls Calling Sequence: ProbBallsBoxes(k,n,eps) Parameters: Name Type Description ---------------------------------------------------------- k posint number of boxes n posint number of balls randomly thrown in [0,1] eps positive 0 ProbBallsBoxes(3,10,0.0001); 7.1924e-10 See Also: ?Cumulative ?DigestWeights ?MassProfileResults ?StatTest ?DigestAspN ?DynProgMass ?OutsideBounds ?Std_Score ?DigestionWeights ?DynProgMassDb ?ProbCloseMatches ?DigestSeq ?enzymes ?SearchMassDb ?DigestTrypsin ?lnProbBallsBoxes ?Stat ProbCloseMatches Function ProbCloseMatches - prob of k eps-close matches among U(0,1) values Calling Sequence: ProbCloseMatches(k,n1,n2,eps) Parameters: Name Type Description ----------------------------------------------------------------- k posint number of matches n1 posint number of points randomly thrown in [0,1] n2 posint number of points randomly thrown in [0,1] eps positive 0 ProbCloseMatches(4,10,22,0.0001); 5.7379e-08 See Also: ?Cumulative ?OutsideBounds ?SearchMassDb ?StatTest ?DynProgMassDb ?ProbBallsBoxes ?Stat ?Std_Score ProbDynProg Function ProbDynProg - Probabilistic dynamic programming Option: builtin Calling Sequence: ProbDynProg(A,B,f,w,FixedDel,IncDel) Parameters: Name Type -------------------------------- A array(array(numeric)) B array(array(numeric)) f array(numeric) w posint FixedDel numeric IncDel numeric Returns: NULL Synopsis: Probabilistic dynamic programming. 
Examples: See also: ProbIndex Function ProbIndex - Compute the Probability Index Calling Sequence: ProbIndex(ma) Parameters: Name Type Description -------------------------------------------------- ma array(string) multiple sequence alignment t Tree a phylogenetic tree Returns: list(numeric) Synopsis: Computes a variation index defined as -log10( Probability{position} ) for all positions of a multiple alignment. Examples: > ma := [ ' -------------------------FPEVVGKTVDQA ..(535).. CSPRKGTKT']; ma := [ -------------------------FPEVVGKTVDQAREYFTLHYPQ , -------------------IASAGFVRDAQGNCIK--- , AKQVVLLIFGSWQLARERLANEMRKAVAY__TFL__NFDMGRQPLSMHYSDKVCSPRMSTET, AEPIVPLLFGMWRLKRKKANNKLLRCVKY__TLLARNTSDGREPVACRYSEKICSPRTGTKT, AEVIVPLLFGVWRLKREERTYTLLQCVKY__VFLARNTVAGNRPLSKKFSEKVCSPRK , AEPIVPLLFGLWQLAREKASNTLLQCVKY__VFLARNTVAGRRPLKMKYSDKVCSPRKGAKT, EPIVPLL__MWQLAIEKSSNTLLQCVK__KVFLARKTVAGRRPLSMKFSDKVCNPRKGTKT, PIVPLLFGMWQLAREKASNTLLQCVKYYYVFLARNTVAGRRPLSMKYSDKVCSPRKGTKT] > tree := Tree(Tree(Leaf(b,-250.0000,2),-2.8422e-14, ..(272).. 
00,3))))))); tree := Tree(Tree(Leaf(Permutation([5, 6, 1, 2, 4, 7, 3]),-250,2),-2.8422e-14,Leaf(Permutation([4, 5, 6, 2, 3, 1, 7]),-250,1)),0,Tree(Leaf(h,-250,8),-209.7583,Tree(Leaf(g,-260.8121,7),-227.6537,Tree(Leaf(f,-256.9830,6),-233.8701,Tree(Leaf(d,-240.9182,4),-235.7326,Tree(Leaf(e,-252.2867,5),-237.4908,Leaf(c,-239,3))))))) > prxd := ProbIndex (ma, tree); prxd := [1.6978, 2.8954, 5.5145, 2.4769, 1.8613, 3.2753, 1.5187, 2.4065, 1.8195, 1.4028, 6.9273, 2.1637, 4.8046, 1.5187, 5.2698, 4.5194, 3.1698, 4.0876, 6.5461, 5.0991, 4.2424, 4.5438, 2.7156, 4.4407, 5.2065, 4.4283, 2.9725, 4.3070, 2.9907, 3.7505, 6.7152, 6.2474, 5.2650, 4.0940, 3.4090, 4.0343, 6.7206, 6.0418, 7.7801, 6.6427, 2.4979, 4.9061, 5.6526, 3.3186, 4.0506, 6.6403, 7.4820, 5.9496, 5.6263, 3.3733, 5.2957, 1.9501, 2.6816, 2.0689, 3.7597, 1.7027, 1.8456, 5.0920, 2.7307, 3.9599, 2.8101, 1.8395] See also: ?KWIndex ?PlotIndex ?PrintIndex ?ScaleIndex ProbSeq Class ProbSeq - stores a generic probabilistic sequence Template: ProbSeq(ProbVec,CharMap) Fields: Name Type Description ------------------------------------------------------------ ProbVec {string,array(array)} Probability vectors CharMap procedure Character mapping function Methods: print ProbSeq_type Sequence Synopsis: ProbSeq stores a generic (i.e. any type of sequence - amino acid, nucleotides, codons or others) probabilistic sequence in the form of a probability vectors. Hence each position of the sequence is a vector giving the probability of each possible character. The sum of the probabilities at each position is 1 except vectors containing only zeros denoting a gap at this position. The ProbSeq can alternatively be constructed with a sequence as a string and a mapping function (typically one of IntToA, IntToB or CIntToCodon). It will then automatically construct a probabilistic sequence with a 1 for the known character and 0 otherwise. 
If only the probabilistic vectors are given, the constructor tries to find the appropriate mapping function based on the number of characters. Examples: > ps1 := ProbSeq('ADRIAN',IntToA); ps1 := ProbSeq([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],IntToA) > ps2 := ProbSeq([[.5,0,.5,0],[.3,.7,0,0]]); ps2 := ProbSeq([[0.5000, 0, 0.5000, 0], [0.3000, 0.7000, 0, 0]],IntToB) > print(ps2); pos Most probable chars 1 A 0.50 G 0.50 2 C 0.70 A 0.30 See Also: ?CIntToCodon ?IntToB ?PASfromTree ?PSDynProg ?IntToA ?PASfromMSA ?ProbAncestor Process Class Process - structure to hold Process information Template: Process(Pid,Job,Stopped,EventTime,JobTime) Fields: Name Type Description ----------------------------------- Pid integer Job integer Stopped boolean EventTime numeric JobTime numeric ElapsedTime string Returns: Process Methods: Process_type select Synopsis: This data structure holds information about a particular process in a machine. The main application is for parallel processing and hence it contains all sorts of status information. See also: ?darwinipc ?Machine ?ParExec2 Protect Function Protect - Protect fields from a class Calling Sequence: Protect(classname,field1,...) Parameters: Name Type Description ------------------------------------------------- classname symbol a class name to be protected field1 symbol a field name of classname Returns: NULL Global Variables: printlevel Synopsis: Protect sets up the appropriate mechanism so that the named fields of the class cannot be changed by any function other than the methods already defined at the time that Protect is called. 
If Protect is called without any field name, then all the fields of the data structure which have not been protected yet, are protected. Darwin does not support the concept of hiding at this point. That is, prevent the user from reading a value from a field. We do not see any advantages to hiding and we do see disadvantages to it. The protection operates at two levels. First, all indexing references are forbidden by setting the option "NoIndexing" in the class. Secondly, the fields mentioned are given a special name, identical in appearance to the defined one, but different from what a user can type. All methods referring to the class will have these names fixed appropriately. Additional calls to Protect can be used to Protect names not yet protected and hence create a hierarchy of protected names and functions that can use them. Examples: > Protect( Polar, Rho, Theta); See also: ?CompleteClass ?ExtendClass ?Inherit ?objectorientation ?option PruneTree Function PruneTree Calling Sequence: PruneTree(t,contains) Parameters: Name Type Description --------------------------------------------------------------------- t Tree contains {list,procedure,set} labels remaining in the pruned tree Returns: Tree Synopsis: This function returns a pruned version of the input tree containing only leaves whose labels are member of the 'contains'-set / -list or for which 'contains()' of a Leaf() structure evaluates to true respectively. 
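The pruning rule described above can be sketched in Python. This is only an illustration, not Darwin's implementation: the `Leaf` and `Tree` classes and the `prune_tree` name are hypothetical stand-ins, and the choice to replace an internal node that loses one child by its surviving child is an assumption inferred from the PruneTree example output.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    label: str
    height: float


@dataclass(frozen=True)
class Tree:
    left: "Node"
    height: float
    right: "Node"


Node = Union[Leaf, Tree]


def prune_tree(t, contains):
    """Keep only leaves accepted by `contains` (a collection of labels or a predicate on Leaf)."""
    if callable(contains):
        keep = contains
    else:
        wanted = set(contains)
        keep = lambda leaf: leaf.label in wanted  # membership test on labels

    if isinstance(t, Leaf):
        return t if keep(t) else None

    left = prune_tree(t.left, keep)
    right = prune_tree(t.right, keep)
    # Assumption: a node with a single surviving child is replaced by that child.
    if left is None:
        return right
    if right is None:
        return left
    return Tree(left, t.height, right)


T = Tree(Leaf("a", 2), 0.5, Tree(Leaf("b", 1.5), 0.7, Leaf("e", 1)))
pruned = prune_tree(T, ["a", "b"])
```

With the tree from the example, `pruned` has the same shape as the documented result: leaf `a` and leaf `b` joined directly under the root.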
Examples: > T := Tree( Leaf('a', 2), 0.5, Tree(Leaf('b',1.5),0.7,Leaf('e', 1)) ); T := Tree(Leaf(a,2),0.5000,Tree(Leaf(b,1.5000),0.7000,Leaf(e,1))) > PruneTree( T, ['a','b'] ); Tree(Leaf(a,2),0.5000,Leaf(b,1.5000)) See also: ?Leaf ?RotateTree ?Tree RAxML Function RAxML - Wrapper for RAxML, a ML tree reconstruction tool Calling Sequence: RAxML(msa) Parameters: Name Type Description ----------------------------------------------------------------------------------------- msa {MAlignment,list(string)} Multiple Sequence Alignment labels list(string) (optional) Sequence Labels subst string (optional) substitution model inv_sites inv_sites=boolean (optional) Estimate invariant sites estimate_basefreqs estimate_basefreqs=boolean (optional) Estimate base frequencies start_tree start_tree={Tree,string} (optional) start tree for search nr_runs nr_runs=posint (optional) number of ML tree searches bootstrap bootstrap={0,posint} (optional) number of bootstrap samples rates rates=string (optional) Rates model threaded threaded=integer (optional) number of threads to be used eps eps=positive (optional) stop criteria for ML search Returns: TreeResult Synopsis: RAxML is a tool to compute maximum likelihood trees from multiple sequence alignments. For details see manual (Reference section). Available substitution models: GONNET matrices subst=string GONNET, JTT, DAYHOFF, WAG, BLOSUM62, MTREV, RTREV, CPREV, MTMAM, VT (all Protein) or GTR (DNA) Available model modifiers: inv_sites=boolean estimate proportion of invariant sites. Default=false estimate_basefreqs=boolean estimate the base frequences from the data, otherwise use fixed frequencies from the model. Default=false rates=string choice of rates implementation. Available are 'CAT', 'GAMMA' and 'MIX'. 'CAT' classifies each site into a fixed rate category. Likelihoods between different topologies are not comparable and thus, the method is only available in combination with 'nr_runs'=1. 
'GAMMA' uses 4 discrete rate categories according to a gamma distribution and estimates the alpha parameter. The 'MIX' searches for a good topology using the 'CAT' model and switches afterwards to the 'GAMMA' model to compute stable likelihoods. default='MIX'. Parameters determine exhaustiveness of reconstruction: nr_runs=posint determines the number of ML tree searches on the original multiple sequence alignment. default=10 bootstrap={0,posint} determines the amount of bootstrap samples to be evaluated. Default=0 start_tree={Tree,'MP','random'} specifies the start topology for the ML search. 'MP' uses for each run a different maximum parsimony tree and 'random' starts with a random topology. Alternatively, you can pass a starting topology. Default='MP' eps=positive ML search will be stopped if the likelihood increased by less than 'eps'. Default=0.1 Other parameters: threaded=integer specifies the number of threads. If set to <= 1, the sequential program is used. Default=1. References: Alexandros Stamatakis. RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models, Bioinformatics 22(21):2688-2690, 2006 Source code and Manual: http://icwww. 
epfl.ch/~stamatak/index-Dateien/Page443.htm Examples: > msa := Rand(MAlignment):; > RAxML(msa, 'nr_runs'=2,'bootstrap'=100,'inv_sites'=true); TreeResult(Tree(Tree(Leaf(RandSeq9,0.2961),0,Tree(Leaf(RandSeq6,0.2228),0.2127, Tree(Tree(Leaf(RandSeq2,0.4692),0.4692,Leaf(RandSeq4,0.6664),73),0.4559,Tree( Tree(Leaf(RandSeq8,1.3377),0.9953,Leaf(RandSeq5,1.0007),100),0.7236,Tree(Leaf( RandSeq1,0.7442),0.7398,Tree(Leaf(RandSeq10,1.5340),1.0507,Leaf(RandSeq3,1.0612) ,100),95),100),100),100)),0,Leaf(RandSeq7,1.0473e-06)),ML,table([{[Likelihood, [ -3120.6126]]}, {}, {}, {[InvSites, [0.00011700]]}, {}, {[SubstModel, [GONNET]]}, {[CPUtime, [21.6900]]}, {}, {[Method, [RAxML]]}, {[Alpha, [1000.0997]]}, {}, {}, {}],unassigned)) See Also: ?LeastSquaresTree ?MAlign ?RobinsonFoulds ?TreeResult ?MafftMSA ?PhylogeneticTree ?Tree RBFS_Tree Function RBFS_Tree - apply heuristics to improve a distance tree Calling Sequence: RBFS_Tree(t,Dist,Var) Parameters: Name Type Description ------------------------------------------------------------------- t Tree input distance Tree Dist matrix(numeric) distance matrix used to build Tree Var matrix(numeric) variances of the distances 'Top' = posint (default=1) number of best trees to return Returns: set([numeric, Tree]) Synopsis: RBFS_Tree is a method for improving distance phylogenetic trees using heuristics. The first type of heuristics, called Reduce Best Fitting Subtree (RBFS) selects a set of subtrees which are highly consistent and their fit is of good quality, replaces them with a single leaf and attempts to optimize the reduced tree. The second heuristic chooses, from different trees, subtrees which are on the same set of leaves and tries to graft them together hoping that the resulting tree is better. RBFS_Tree returns a set of pairs: [DimensionlessFit,Tree]. The number of trees returned can be changed with the optional parameter Top=n. The trees returned are the ones which have the highest quality (lowest DimensionlessFit value). 
See Also: ?BootstrapTree ?Leaf ?SignedSynteny ?ComputeDimensionlessFit ?LeastSquaresTree ?Synteny ?GapTree ?PhylogeneticTree ?Tree RGB_string Function RGB_string - convert an RGB vector into a color name Calling Sequence: RGB_string(rgb) RGB_string(r,g,b) Parameters: Name Type Description ----------------------------------------------------- rgb list(nonnegative) an RGB vector of length 3 r nonnegative intensity for red (0..1) g nonnegative intensity for green (0..1) b nonnegative intensity for blue (0..1) Returns: string Synopsis: This function converts a three-value RGB vector into a color name. The vector contains the values for red, green and blue on a scale of 0 to 1. Black is [0,0,0] and white is [1,1,1]. The matching is approximate: the result is the table entry closest in Euclidean distance to the given vector. About 650 colours are known to this function. The full list can be found at lib/Color. Examples: > RGB_string([0,0,0]); black > RGB_string(0.5,1,0); chartreuse > RGB_string(.8,.4,.1); chocolate3 See also: ?Color ?DrawTree ?string_RGB RSCU Function RSCU - Relative synonymous codon usage Calling Sequence: RSCU() RSCU(dna) Parameters: Name Type Description --------------------------------------------- dna string optional string of coding DNA Returns: list Synopsis: The function RSCU returns the relative synonymous codon usage of an organism if no argument is given. If a string of coding DNA is given, the relative synonymous codon usage for the string is returned. Relative synonymous codon usage values are estimated as the ratio of the observed codon usage to the value expected if usage were uniform within synonymous groups. The RSCU for a codon i is RSCU_i = X_i / ( (1/n) * sum(X_j, j=1..n) ), where X_i is the number of times the ith codon has been used for a given amino acid, and n is the number of synonymous codons for that amino acid. References: Sharp PM, Tuohy TMF, and Mosurski KR. 
Codon usage in yeast: Cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Research 14:5125-5143 Rand Function Rand Options: builtin, numeric and polymorphic Calling Sequence: Rand() Returns: numeric Synopsis: This function returns a random number uniformly distributed between 0 and 1. The random number generator has the seed set by either the function SetRand or SetRandSeed. Any class which is completed with the command CompleteClass will have an automatically generated Rand function, i.e. random objects of the class can be generated. The following table describes the possible arguments of Rand and the object that will be generated. argument random structure ------------------------------------------------------------------------------ Alignment random alignment array(t,d1,...) array of dimensions d1,... with entries of type t Beta(a,b) Beta distributed number with average a/(a+b) Binomial(n,p) integer binomially distributed, ave n*p, var n*p*(1-p) Multinomial(n,ps) multinominally distributed integers ChiSquare(nu) chi-square distributed number with ave nu, var 2*nu CodingDNA(n) random DNA coding sequence (no stops) with n bases DNA(n) random DNA sequence with n bases. Uses the global vector AF, if suitable Entry a random entry from the database in DB Exponential(a,b) exponentially distributed number with ave a+b, var b^2 FDist(nu1,nu2) random F distributed or Variance-ratio number GammaDist(p) random Gamma distributed number with ave p and var p Geometric(p) geometrically distributed integer with ave (1-p)/p Graph(n,m) random graph with n vertices and m edges integer random integer [t1,t2,...] 
a list with random components of the given types LongInteger random extended precision integer MAlignment random multiple sequence alignment matrix(t) matrix with random dimensions and random entries of type t Normal(a,b) normally distributed variable with ave a and var b MNormal(a,b) multivariate normal with ave vector a and cov matrix b Poisson(m) Poisson distributed integer with average and variance m Polar complex number in Polar representation posint random positive integer Protein(n) a random sequence of amino acids of length n Sequence the sequence of a random entry from the database in DB a..b integers or numbers (depending on type of a,b) in the range {a,b,...} a random value from the set Stat results of univariate statistics string random (readable) string Student(nu) Student distributed variable with parameter nu SvdResult results of an Svd least squares approximation Tree random distance tree a random object of this type Examples: > SetRand(5); > Rand(); 0.8649 > Rand(); 0.6743 > Rand(Normal); 0.6467 > Rand(Binomial(20,0.2)); 2 > Rand(Poisson(55)); 43 > Rand(Geometric(0.2)); 3 > Rand(Exponential(1.2,3)); 3.8216 See Also: ?Beta_Rand ?Exponential_Rand ?Multinomial_Rand ?Shuffle ?Binomial_Rand ?FDist_Rand ?Normal_Rand ?StatTest ?ChiSquare_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score ?CreateRandSeq ?Geometric_Rand ?SetRand ?Student_Rand ?Cumulative ?Graph_Rand ?SetRandSeed ?Zscore Rank Function Rank - Computes sample ranks Calling Sequence: Rank(l) Rank(l,p) Parameters: Name Type Description -------------------------------------------------- l list a list of values p {procedure} (optional) ordering procedure Returns: list Synopsis: This function returns the sample ranks of a list of values. Tied values (i.e. equal values) each receive the average of the ranks they span. 
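The tie-averaging rule can be sketched in Python. This is only an illustration and not Darwin's code: the `rank` function name is hypothetical, and treating the optional ordering procedure as a key function applied to each value is an assumption based on the `x->-x` example.

```python
def rank(values, key=None):
    """Return 1-based sample ranks; tied values share the average of their positions."""
    key = key or (lambda x: x)
    keyed = [key(v) for v in values]
    # Indices of the input, ordered by their (possibly transformed) value.
    order = sorted(range(len(values)), key=lambda i: keyed[i])

    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of equal values starting at position i.
        j = i
        while j + 1 < len(order) and keyed[order[j + 1]] == keyed[order[i]]:
            j += 1
        average = (i + 1 + j + 1) / 2  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


ascending = rank([4, 6, 1, 5, 6, 9, 1, 3, 3])
descending = rank([4, 6, 1, 5, 6, 9, 1, 3, 3], key=lambda x: -x)
```

Both calls use the same input as the Rank examples; the second passes a negating key so that the largest value receives rank 1.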
Examples: > Rank( [4,6,1,5,6,9,1,3,3] ); [5, 7.5000, 1.5000, 6, 7.5000, 9, 1.5000, 3.5000, 3.5000] > Rank( [4,6,1,5,6,9,1,3,3], x->-x); [5, 2.5000, 8.5000, 4, 2.5000, 1, 8.5000, 6.5000, 6.5000] See also: ?avg ?cor ?sort ?std ?sum ?var ReadBrk Function ReadBrk Calling Sequence: ReadBrk(fname) ReadBrk(fname,tags = taglist) Parameters: Name Type Description --------------------------------------------------------------- fname string file name with the Brookhaven database taglist list(string) list of tags to be included Returns: NULL Global Variables: chains Synopsis: Read a Brookhaven database file into a Fold() data structure. Specify "compressed=true" as an argument if the file should be read by "zcat". The default taglist is HEADER, SOURCE, SEQRES, ATOM. ReadData Function ReadData - read a formatted file Calling Sequence: ReadData(filename,fmt) Parameters: Name Type Description ------------------------------------------- filename string name of file to be read fmt string a valid sscanf format Returns: list(list) Synopsis: ReadData opens and reads the file and scans each line with the format given. The result of the scan is stored in a list which is returned. Normally this will be a matrix (list of lists). If there are format errors, a message is printed and the process continues up to 100 errors. See Also: ?FileStat ?MySql ?ReadProgram ?ReadRawLine ?LockFile ?OpenReading ?ReadRawFile ?SearchDelim ReadDb Function ReadDb Calling Sequence: ReadDb(fname) Parameters: Name Type Description ---------------------------------- fname string sequence database Returns: database Global Variables: DB Synopsis: The function loads the sequence database located in the file fname. The contents of the file must be in the Darwin ISO-SGML format. By default, this sequence database is assigned to the system variable DB unless another variable is specified. This function allows fname to specify a path. 
If fname ends in ".gz" or ".Z", then it is assumed to be a compressed file and it is decompressed before reading. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) See also: ?ConsistentGenome ?DB ?Entry ?GenomeSummary ?MySql ReadDssp Function ReadDssp Calling Sequence: ReadDssp(fname) Parameters: Name Type ---------------- fname filename Returns: NULL Global Variables: chains Synopsis: Read a DSSP formatted database file into a Fold() data structure. Specify "compressed=true" as an argument if the file should be read by "zcat". Specify "tags=[taglist]" as an argument to read selected tags. The default taglist is HEADER, SOURCE. ReadFasta Function ReadFasta - load fasta sequence file Calling Sequence: ReadFasta(fn) Parameters: Name Type Description --------------------------- fn string filename Returns: list(string) : list(string) Synopsis: ReadFasta loads a file with fasta sequences and returns a list of sequences and a list of ids. See Also: ?FileStat ?OpenReading ?ReadLibrary ?ReadPima ?LockFile ?OpenWriting ?ReadLine ?ReadPir ?MySql ?ReadBrk ?ReadMap ?ReadProgram ?OpenAppending ?ReadDb ?ReadMsa ?ReadRawFile ?OpenPipe ?ReadDssp ?ReadOffsetLine ReadLibrary Function ReadLibrary Option: builtin Calling Sequence: ReadLibrary(filename) ReadLibrary(filename,funcname) Parameters: Name Type Description ------------------------------------------------------ filename string procedure name or library filename funcname symbol procedure name Returns: procedure Synopsis: If only filename is supplied as a parameter, this function loads the contents of filename located in the user's local Darwin library. The ReadLibrary returns the function with the supplied name. If there is a second parameter, the first one is used to load the file from the library and the second should be a procedure name which is loaded in that file. 
ReadLibrary returns the procedure named in the second argument. With two arguments, if the filename starts with a slash ("/"), then it is assumed to be an absolute path name and the library name (stored in libname) will not be prepended to it. The location of the Darwin library is set with the -l flag when initiating your Darwin session and is kept in the global variable "libname". One of the main uses of ReadLibrary is to provide a mechanism for automatic loading of functions from the library. By assigning a name with an unevaluated call to ReadLibrary (with the appropriate parameters), when the function is used (and its name is evaluated), it will produce the actual reading of the library. Since reading the library is likely to assign the function name with a proc (or something else), the unevaluated ReadLibrary will be obliterated and the reading of the library happens only once. This mechanism allows efficient reading of library functions from many points; the first read will be the only one executed. The file "darwinit" in the Darwin library provides the definitions of all system-defined functions and many different examples of its use. Examples: > ReadLibrary(MultiAlign); > ReadLibrary(MultiAlign, AnchorAlign); See also: ?libname ?ReadProgram ?ReadRawFile ReadLine Function ReadLine - reads a darwin command in a single line Option: builtin Calling Sequence: ReadLine() ReadLine(t) Parameters: Name Type Description -------------------------------- t string a prompt string Returns: anything Synopsis: Reads one statement from the current input stream, evaluates the statement and returns its value. The string t is a prompt which is sent to the standard output directly before reading from the standard input. This statement should only be used from within a procedure. 
Examples: > x := proc() t := ReadLine('prompt: '); lprint('The user entered: ',t); end; > x(); prompt: 1+3; The user entered: 4 See Also: ?FileStat ?OpenPipe ?ReadRawFile ?SplitLines ?inputoutput ?OpenReading ?ReadRawLine ?LockFile ?ReadData ?ReadURL ?MySql ?ReadOffsetLine ?SearchDelim ReadOffsetLine Function ReadOffsetLine - Reads one statement from a file at a given offset Option: builtin Calling Sequence: ReadOffsetLine(filename,ofs) Parameters: Name Type Description ------------------------------------------------------ filename filename a filename from which to be read ofs posint an offset into the file Returns: NULL Synopsis: Reads one statement from the file filename, starting at offset ofs. Examples: See Also: ?FileStat ?OpenAppending ?ReadData ?ReadRawLine ?SplitLines ?inputoutput ?OpenReading ?ReadLine ?ReadURL ?LockFile ?OpenWriting ?ReadRawFile ?SearchDelim ReadPhylip Function ReadPhylip Calling Sequence: ReadPhylip(fname) Parameters: Name Type ------------------- fname a file name Returns: list Synopsis: ReadPhylip opens the file indicated by fname, assumes that it is an MSA in PHYLIP format and parses its content. The return value consists of a list containing a list of sequences plus a list of corresponding labels. Examples: > ReadPhylip('myphylipfile.phy'); See Also: ?FileStat ?OpenReading ?ReadFasta ?ReadOffsetLine ?LockFile ?OpenWriting ?ReadLibrary ?ReadPima ?MySql ?ReadBrk ?ReadLine ?ReadPir ?OpenAppending ?ReadDb ?ReadMap ?ReadProgram ?OpenPipe ?ReadDssp ?ReadMsa ?ReadRawFile ReadProgram Function ReadProgram Option: builtin Calling Sequence: ReadProgram(fname) Parameters: Name Type ------------------------------------- fname a string which is a file name Returns: NULL Synopsis: ReadProgram opens the file indicated by fname. The file name should be readable from the directory where Darwin is being executed. The file is expected to contain valid Darwin statements. All statements in the file are read and are only echoed if printlevel is sufficiently high. 
The effect of the statements read is as if they were executed at the top level, even when ReadProgram is called inside a function. Examples: > ReadProgram(test); See Also: ?FileStat ?OpenReading ?ReadFasta ?ReadOffsetLine ?LockFile ?OpenWriting ?ReadLibrary ?ReadPhylip ?MySql ?ReadBrk ?ReadLine ?ReadPima ?OpenAppending ?ReadDb ?ReadMap ?ReadPir ?OpenPipe ?ReadDssp ?ReadMsa ?ReadRawFile ReadRawFile Function ReadRawFile Option: builtin Calling Sequence: ReadRawFile(filename) Parameters: Name Type Description -------------------------------------------------------------- filename string name of file to be read as a single string Returns: string Synopsis: Read an entire file (returned as a single string) given by its filename. Examples: See Also: ?FileStat ?OpenAppending ?ReadLine ?SearchDelim ?inputoutput ?OpenReading ?ReadOffsetLine ?SplitLines ?LockFile ?OpenWriting ?ReadRawLine ?MySql ?ReadData ?ReadURL ReadRawLine Function ReadRawLine - read a line as a string Option: builtin Calling Sequence: ReadRawLine() ReadRawLine(t) Parameters: Name Type Description -------------------------------- t string a prompt string Returns: string Synopsis: Reads one line from the current input stream and returns it as a string. When the input file is exhausted, the next call to ReadRawLine will return the string EOF. The string t, if provided, is a prompt which is sent to the standard output directly before reading from the standard input. This statement should not be used in interactive mode or in the middle of a program which is being read from the input stream, as there is bound to be confusion between the program and the data. It is recommended to use it inside a function/procedure. 
Examples: > OpenPipe(date); > ReadRawLine(); Thu Oct 12 08:01:39 MET DST 2000 > ReadRawLine(); EOF > x := proc() t := ReadRawLine('prompt: '); lprint('The user entered: ',t); end; > x(); prompt: 1+3; The user entered: 1+3; See Also: ?FileStat ?OpenReading ?ReadRawFile ?SplitLines ?inputoutput ?ReadData ?ReadURL ?LockFile ?ReadLine ?SearchDelim ?OpenPipe ?ReadOffsetLine ?ServerSocket ReadTable Function ReadTable( file:string ) Read utility similar to the Splus read.table() function. Optional arguments: Sep = string, Skip = integer, Format = string, Prog = string and Format2 = [string]. Note: Format bypasses the Sep=string mechanism. E.g. ReadTable (somefile.gz, Skip = 1, Format = '%s%d%d', Prog = gzcat). ReadTcp Function ReadTcp( timeout:{0,posint} ) Waits up to timeout seconds to receive data from another machine and executes it as Darwin commands. Returns (machine:string, pid:posint) or NULL if not successful. ReadURL Function ReadURL Calling Sequence: ReadURL(url) Parameters: Name Type Description --------------------------- url string a URL Returns: string Synopsis: Reads a URL and returns it as a string. Works the same way as ReadRawFile, just with URLs instead of filenames. See Also: ?DownloadURL ?OpenReading ?ReadLine ?SearchDelim ?OpenAppending ?OpenWriting ?ReadRawLine ?SplitLines Readability Function Readability - statistical index of readability Calling Sequence: Readability(s,Counts,Ctype) Parameters: Name Type Description ----------------------------------------------------------------------- s string input text to compute readability index Counts array(integer,26,26) optional statistical frequencies Ctype symbol optional name of frequencies Returns: numeric Synopsis: Readability computes an index based on how well the text follows a given set of probabilities of pairs of characters. The probabilities are computed from a 26 x 26 matrix of counts of occurrences of pairs of letters. Non-letters (including spaces) are ignored and the comparison is case-insensitive. 
The following names of frequencies are implemented: Ctype Description ----------------------------------------------------------- English di-graphs frequencies for Shakespeare VowCon (default) vowel-consonant pairs only VowConF vowel-consonant pairs including letter frequencies Spanish di-graphs frequencies for Spanish Examples: > Readability('To be or not to be that is the question',English); 38.0937 > Readability('En un lugar de la Mancha de cuyo nombre no me acuerdo',English); 22.8186 > Readability(ASILITE,VowCon); 11.1734 See also: ?Mutate ?Rand ?Sequence ReceiveDataTcp Function ReceiveDataTcp( timeout:{0,posint} ) Waits up to timeout seconds to receive data from another machine. Returns (machine:string, pid:posint, data:string) or NULL if not successful. ReceiveTcp Function ReceiveTcp Option: builtin Calling Sequence: ReceiveTcp(timeout) Parameters: Name Type Description -------------------------------------------------- timeout {0,posint} seconds to wait for timeout Returns: {string} Synopsis: Waits up to timeout seconds to receive data from the IPC daemon. This command is usually preceded by a SendTcp. Returns NULL if no data is received (i.e. timeout occurred). 
Examples: > r := traperror(ConnectTcp('/tmp/.ipc/darwin', false)); > SendTcp('PING'); r := ReceiveTcp(3); r := PING OK > SendTcp('MSTAT linneus1'); r := ReceiveTcp(3); r := DATA linneus1 0:OK ALIVE > DisconnectTcp();; See Also: ?ConnectTcp ?ipcsend ?ParExecuteTest ?SendTcp ?darwinipc ?ParExecuteIPC ?ReceiveDataTcp ?DisconnectTcp ?ParExecuteSlave ?SendDataTcp ReconcileTree Function ReconcileTree - Reconciles a gene tree with a species tree Calling Sequence: ReconcileTree(g,s,g2s) ReconcileTree(g,s,g2s,reroot) Parameters: Name Type Description ---------------------------------------------------------------- g Tree Gene Tree s {OVERLAP,Tree} Species Tree or Species Overlap method g2s procedure mapping function from gene to species reroot boolean (optional) reroot gene tree Returns: list Synopsis: The function ReconcileTree infers gene duplication and speciation events on a gene tree by comparing it to a TRUSTED species tree. Alternatively, if no trusted species tree exists, one can use the species overlap reconciliation method by passing 'OVERLAP' as the species tree. The function g2s is a mapping function from the gene name to its species. If reroot is set to 'true' (by default it is false), the function reroots the gene tree on every possible branch and reconciles all those trees. It returns the rooted gene tree that minimizes the number of duplication events. The function returns the reconciled gene tree and the number of duplication events on it. The events are stored in the 'XTRA' field of the tree: 'D=Y' and 'D=N' indicate whether the node represents a duplication or speciation event respectively. References: Zmasek CM and Eddy SR. A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics, 2001, 17(9):821-828 van der Heijden RT et al, Orthology prediction at scalable resolution by phylogenetic tree analysis. BMC Bioinformatics. 2007, 8:83. 
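The species overlap rule can be sketched in Python. This is an illustration of the general idea only (an internal node is labelled a duplication when its two subtrees share at least one species), not Darwin's implementation. The nested-tuple tree encoding and the `species_overlap` name are hypothetical, and branch lengths and rerooting are left out.

```python
def species_overlap(tree, g2s):
    """Annotate a binary gene tree with duplication/speciation labels.

    tree: a gene name (str) for a leaf, or a (left, right) tuple for an internal node.
    g2s:  function mapping a gene name to its species.
    Returns (species seen below this node, annotated tree, number of duplications).
    """
    if isinstance(tree, str):
        return {g2s(tree)}, tree, 0

    left_species, left_tree, left_dups = species_overlap(tree[0], g2s)
    right_species, right_tree, right_dups = species_overlap(tree[1], g2s)

    # Duplication if any species occurs on both sides of this node.
    is_duplication = bool(left_species & right_species)
    label = "D=Y" if is_duplication else "D=N"
    annotated = (left_tree, right_tree, label)
    return (left_species | right_species, annotated,
            left_dups + right_dups + int(is_duplication))


gene_tree = ((("a_HUMAN", "a_YEAST"), "b_BOVIN"), "c_HUMAN")
gene_to_species = lambda gene: gene.split("_", 1)[1]
_, annotated_tree, duplications = species_overlap(gene_tree, gene_to_species)
```

On the gene tree used in the ReconcileTree examples, only the root has HUMAN on both sides, so the sketch reports a single duplication, in line with the 'OVERLAP' example output.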
Examples: > GeneTree := Tree(Tree(Tree(Leaf(a_HUMAN, 3), 2, Leaf(a_YEAST,3)), 1, Leaf(b_BOVIN,3)), 0, Leaf(c_HUMAN,3)); GeneTree := Tree(Tree(Tree(Leaf(a_HUMAN,3),2,Leaf(a_YEAST,3)),1,Leaf(b_BOVIN,3)),0,Leaf(c_HUMAN,3)) > SpeciesTree := Tree(Tree(Leaf(HUMAN,2),1,Leaf(BOVIN,2)),0,Leaf(YEAST,2)); SpeciesTree := Tree(Tree(Leaf(HUMAN,2),1,Leaf(BOVIN,2)),0,Leaf(YEAST,2)) > SwissProtID := x -> x[SearchString('_',x)+2..-1]; SwissProtID := x -> x[SearchString(_,x)+2..-1] > tree := ReconcileTree(GeneTree, SpeciesTree, SwissProtID); tree := [Tree(Tree(Tree(Leaf(a_HUMAN,3),2,Leaf(a_YEAST,3),D=N),1,Leaf(b_BOVIN,3),D=Y),0,Leaf(c_HUMAN,3),D=Y), 2] > tree := ReconcileTree(GeneTree, 'OVERLAP', SwissProtID); tree := [Tree(Tree(Tree(Leaf(a_HUMAN,3),2,Leaf(a_YEAST,3),D=N),1,Leaf(b_BOVIN,3),D=N),0,Leaf(c_HUMAN,3),D=Y), 1] See Also: ?BipartiteSquared ?LeastSquaresTree ?RobinsonFoulds ?Tree ?IntraDistance ?PhylogeneticTree ?RotateTree RedoCompletion Function RedoCompletion - Rewrite the file listing commands for shell autocompletion Calling Sequence: RedoCompletion() Returns: NULL Synopsis: This function rewrites the file "cmds" in the library with the list of all function defined in the current session (for this purpose, it uses the function names()) See also: ?libname ?names Region Data structure Region( ) Structure to hold a gene region. - Selectors: Nr: set, Start: posint, End: posint, StartFrame: posint, EndFrame: posint, FloatStart: boolean, FloatEnd: boolean, Sim: numeric, BestNr: posint, MinShifts: integer, MaxShifts: integer - Format: Region(Nr,Start,End,StartFrame,EndFrame,FloatStart,FloatEnd,Sim,BestNr, MinShifts,MaxShifts). 
RegularGraph Function RegularGraph - generate a random regular graph Calling Sequence: RegularGraph(n,e) Parameters: Name Type Description -------------------------------------------------- n integer optional number of nodes/vertices e integer optional number of edges Returns: Graph Synopsis: Generate a random graph where each of the n vertices has the same degree e. The product n*e must be even. Examples: > RegularGraph(5,2); Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,2,5),Edge(0,3,4),Edge(0,3,5)),Nodes(1,2,3,4,5)) See Also: ?BipartiteGraph ?Graph_minus ?Nodes ?Clique ?Graph_Rand ?ParseDimacsGraph ?DrawGraph ?Graph_XGMML ?Path ?Edge ?InduceGraph ?ShortestPath ?EdgeComplement ?MaxCut ?TetrahedronGraph ?Edges ?MaxEdgeWeightClique ?VertexCover ?FindConnectedComponents ?MinCut ?Graph ?MST RelativeAdaptiveness Function RelativeAdaptiveness - Calculate the relative adaptiveness Calling Sequence: RelativeAdaptiveness([e]) Returns: list Synopsis: See also: ?ComputeCAI ?SetupRA RellTree Function RellTree - does RELL on a TreeResult Calling Sequence: RellTree(TreeResult,nrOfBootstraps) Parameters: Name Type Description ---------------------------------------------------------------------- TreeResult TreeResult the TreeResult object in question nrOfBootstraps posint (opt) desired number of bootstrap values Synopsis: Applies RELL (resampling of estimated log likelihood values) on a TreeResult object that contains log likelihoods per site (e.g. from a PhyML run). See Kishino et al., MBE 1990, for more information. See also: ?PhyML ?Tree RenderTemplate Function RenderTemplate - Substitutes placeholders in template file with user variables Calling Sequence: RenderTemplate(file,tab) Parameters: Name Type Description ------------------------------------ file string filename of template tab table substitution table Returns: string Synopsis: Return the content of the template file with substituted placeholders. 
Three different placeholders are supported in the template file: 1) '', where the whole tag gets replaced by the value in the substitution table. 2) ' ... ', where ... is omitted if the variable XXX is false and inserted if it is true. 3) '...' indicates a loop section, where '' occurrences are replaced with the appropriate values from tab[XXX,i,YYY], for all possible i. In this case, the value of tab[XXX] needs to be a list of tables. See Also: ?Block ?Document ?List ?screenwidth ?Code ?HTML ?Paragraph ?string ?Color ?HyperLink ?PostscriptFigure ?Table ?ConcatStrings ?Indent ?print ?trim ?Copyright ?LastUpdatedBy ?Roman ?TT ?DocEl ?latex ?RunDarwinSession ?View ReplaceString Function ReplaceString - Replace a phrase in a text Calling Sequence: ReplaceString(old,new,txt) Parameters: Name Type Description -------------------------------------- old string pattern to be replaced new string new pattern txt string text that will be changed Returns: string Synopsis: Replaces all occurrences of a string in a text with a new string. Examples: > ReplaceString('east', 'west', 'one flew east'); one flew west See also: ?SearchAllString ?SearchDelim ?SearchString Reverse Function Reverse - Reverse a string or a list Calling Sequence: Reverse(s) Parameters: Name Type Description ----------------------------------------- s {list,string} any string or list Returns: {list,string} Synopsis: Reverses a string or a list, i.e. the first character or element becomes the last, the second becomes the second-to-last, etc. 
Examples: > Reverse('ACTTACG'); GCATTCA See also: ?antiparallel ?Complement ?CreateString ?string RobinsonFoulds Function RobinsonFoulds - Computes the pairwise Robinson-Foulds distance between a set of trees Calling Sequence: RobinsonFoulds(trees) Parameters: Name Type Description ---------------------------------- trees list(Tree) list of trees Returns: matrix(numeric) Synopsis: The Robinson and Foulds (RF) distance between two trees is the number of non-trivial bipartitions present in one of the two trees but not the other, divided by the number of possible bipartitions. Thus, the smaller the RF distance between two trees, the closer their topologies. The algorithm runs in O(m^2*n), where m is the number of trees and n the number of leaves. References: Pattengale, Gottlieb and Moret, "Efficiently Computing the Robinson-Foulds Metric", J. Comp. Biol., 2007, 14(6), 724--735 Examples: > t1 := Tree(Tree(Leaf(a,2),1,Leaf(b,2)),0,Tree(Leaf(c,2),1,Leaf(d,2))): > t2 := Tree(Tree(Leaf(a,2),1,Leaf(d,2)),0,Tree(Leaf(c,2),1,Leaf(b,2))): > RobinsonFoulds([t1,t2]); [[0, 1], [1, 0]] See also: ?BipartiteSquared ?IdenticalTrees ?IntraDistance ?Tree Roman Function Roman - convert an integer to a roman numeral Calling Sequence: Roman(n) Parameters: Name Type ------------- n posint Returns: string Synopsis: Roman converts a positive integer into an uppercase roman numeral. The conversion cannot be done for n<=0. For very large numbers, the output string becomes linear in n/1000. 
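The conversion described in the Roman synopsis can be illustrated with a short Python sketch. This is not Darwin's implementation; the function name roman and the greedy subtraction table are assumptions made for illustration. It reproduces the documented behavior: uppercase output, rejection of n<=0, and one extra 'M' per additional thousand for large n.

```python
def roman(n: int) -> str:
    """Convert a positive integer to an uppercase roman numeral (illustrative sketch)."""
    if n <= 0:
        raise ValueError("roman numerals are only defined for positive integers")
    # Value/symbol pairs in descending order, including the subtractive forms.
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)   # how many times this symbol fits
        parts.append(symbol * count)  # for n >= 4000 this yields many 'M's
    return "".join(parts)


print(roman(73))
print(roman(1948))
print(roman(14).lower())
```

Because there is no symbol above 'M' in this table, the length of the result grows by one character per thousand, which matches the remark that the output becomes linear in n/1000.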
Examples: > Roman(73); LXXIII > Roman(1948); MCMXLVIII > lowercase(Roman(14)); xiv See Also: ?Block ?Document ?latex ?RunDarwinSession ?Code ?HTML ?List ?screenwidth ?Color ?HyperLink ?Paragraph ?Table ?Copyright ?Indent ?PostscriptFigure ?TT ?DocEl ?LastUpdatedBy ?print ?View Romberg Function Romberg - Integrates a function using Romberg's Schema Calling Sequence: Romberg(f,a..b,eps,n) Parameters: Name Type Description -------------------------------------------------------------------------------- f procedure function to integrate a..b range (optional, default -inf..+inf) range of the integration eps numeric (optional, default 1e-8) epsilon n posint (optional, default 20) maximum dimension of Romberg's tableau Returns: numeric Synopsis: Integrates the function numerically using Romberg's method. If the range is not given, it integrates between -infinity and +infinity by using the following substitution: int(f(x),x=-inf..+inf) = int(f(tan(x))*(1+tan(x)^2),x=-Pi/2..Pi/2). Examples: > Romberg(x -> sin(x), 0..2*Pi); 2.5649e-16 RotateTree Function RotateTree - Returns a new, rotated tree Calling Sequence: RotateTree(tree,side,sub_side) Parameters: Name Type Description ---------------------------------------------------------- tree Tree a tree to be rotated side {Left,Right} the first indication about side sub_side {Left,Right} the second indication about side Returns: Tree Synopsis: Returns a new, rotated tree rooted half-way through the edge that is indicated by the side and sub-side arguments. The leaves of the tree should have annotated heights, but this is not strictly enforced, unless the rotation is happening directly next to a leaf. 
Examples: > t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15))); t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15))) > newt := RotateTree(t,Left,Left); newt := Tree(Leaf(A,5),0,Tree(Leaf(B,15),5,Tree(Leaf(C,25),21,Leaf(D,25)),100)) See also: ?AllRootedTrees ?AllTernaryRoots ?Tree RunDarwinSession Function RunDarwinSession - run Darwin code inside a Document and insert results Calling Sequence: RunDarwinSession(doc) Parameters: Name Type Description ------------------------------------------------------ doc structure typically a document or part of one Returns: structure Synopsis: RunDarwinSession scans the input Document structure (or part of one) and collects all the structures of type DarwinCode(string), DarwinHideInput(string), DarwinExpression(string), DarwinHidden(string) and DarwinCodeHTML(string). These have the following effects: DarwinCode(string) - The string contents of this structure are interpreted as statements to a darwin session and are collected and executed by darwin. The output is separated into its components, and each original DarwinCode structure is replaced by a green Code structure containing the input and a red Code structure containing the output. DarwinHideInput(string) - The string contents of this structure are interpreted as statements to a darwin session and are collected and executed by darwin. The output is separated into its components, and each original DarwinHideInput structure is replaced by a red Code structure containing the output. DarwinExpression(string) - The contents of this structure are considered to be statements also to be merged in the darwin code and executed. Their values will replace the structure in the Document. DarwinExpressions serve as a mechanism to incorporate values computed in the darwin run, which may not be known, into the text of the Document. DarwinHidden(string) - The contents of this structure are executed, but no result is incorporated in the document. 
This is useful to set parameters appropriately while it is unwanted to reflect this in the resulting document. E.g. Set(gc=xxx). DarwinCodeHTML(string) - The contents of this structure are assumed to contain characters which are invalid in normal HTML, (like "<" and ">") and these characters are converted to their corresponding Entity Names. Typical uses of this structure are programs which have the special symbols or programs which will output HTML tags. E.g. "if a < b then ..." or printf( '%s' ). InvokeDarwin - This is a global variable which may be assigned with the name of a command to execute Darwin if the default ("darwin") is not suitable. This is needed when the Darwin command is special or it must be executed with special arguments. DarwinOutputUpperLimit - This is a global variable which if assigned with a positive integer (call it n) will limit the number of output lines of each single DarwinCode() set of statements. The value n is the number of lines to be displayed, and if the output has more lines than n lines, the top n/2 lines will be displayed followed by a line ". . . . (xxx output lines skipped) . . . ." followed by the last n/2 lines. This is very useful when the output is undesirably long, but necessary. DarwinTimeout - This is a global variable which if assigned a positive value will limit the execution time of the Darwin session to that value (in seconds). By default the session is allowed to run for 600 seconds. RunDarwinSession is relatively robust against errors, help files, etc. It cannot display objects which are shown with the command View. 
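The truncation rule described for DarwinOutputUpperLimit can be sketched in Python as follows. This is an illustration of the rule as stated above, not the code Darwin runs; the function name limit_output is hypothetical, and the use of integer division for n/2 (relevant when n is odd) is an assumption, since the help text does not say how odd values are handled.

```python
def limit_output(lines, n):
    """Keep at most about n lines: the top n/2, a marker line, and the last n/2.

    Illustrative sketch only. Rounding n/2 down for odd n is an assumption.
    """
    if n <= 0 or len(lines) <= n:
        return list(lines)            # nothing to cut
    half = n // 2                     # assumed rounding for odd n
    skipped = len(lines) - 2 * half   # number of lines dropped from the middle
    marker = f". . . . ({skipped} output lines skipped) . . . ."
    # Slice from an explicit index so that half == 0 yields an empty tail.
    return lines[:half] + [marker] + lines[len(lines) - half:]


output = [f"line {i}" for i in range(1, 11)]
for row in limit_output(output, 4):
    print(row)
```

With ten output lines and n=4, the sketch keeps the first two lines, reports six skipped lines, and keeps the last two.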
See Also: ?Block ?Document ?latex ?Roman ?Code ?HTML ?List ?screenwidth ?Color ?HyperLink ?Paragraph ?Table ?Copyright ?Indent ?PostscriptFigure ?TT ?DocEl ?LastUpdatedBy ?print ?View SPCommonName Function SPCommonName - common name of the species of the entry or scientific name Calling Sequence: SPCommonName(entry) Parameters: Name Type Description -------------------------------------------------------------- entry anything any description of an entry or entry number Returns: string Synopsis: SPCommonName finds the common name of the species of the given entry. If the input is the scientific name of a species, SPCommonName will try to locate an entry with that name to use it. The common name is found within parentheses in the OS entry in SwissProt databases. If the database in DB does not conform to this rule, the function may not work properly. If no common name is found, it returns the species name. If no species name is found, it returns the AC or ID or "no name". This function is useful to provide simple labels for plots. Examples: > SPCommonName(AC(P13475)); Slime mold > SPCommonName('Raphicerus campestris'); Steenbok > SPCommonName(AC(P00083)); Rhodopseudomonas viridis See Also: ?DbToDarwin ?SearchAC ?Species_Entry ?GetEntryInfo ?SearchID ?SP_Species SP_Species Function SP_Species - find all the names of species in the database Calling Sequence: SP_Species(taxon) Parameters: Name Type Description -------------------------------------------------- taxon string optional taxonomic classification Returns: set(string) Synopsis: SP_Species scans the database assigned to DB and returns the names of all the species (or all the species of the given taxonomic classification). This assumes that the database is a SwissProt database or that at least it has the OC tags with taxonomic information. The matching of the taxonomic information is done in a textual and case insensitive mode. 
If this results in an ambiguous selection, it is possible to include a longer portion of the taxonomic information (see examples). Examples: > SP_Species(Abies); {Abies alba,Abies bracteata,Abies firma,Abies grandis,Abies holophylla,Abies homolepis,Abies magnifica,Abies mariesii,Abies sachalinensis,Abies veitchii} > SP_Species(Pinus); {Carpinus betulus,Carpinus caroliana,Lupinus albescens, Lupinus aureonitens,Lupinus albifrons,Lupinus albus,Lupinus angustifolius,Lupinus arboreus,Lupinus atlanticus, Lupinus digitatus, Lupinus pilosus,Lupinus cosentinii,Lupinus densiflorus,Lupinus luteus,Lupinus microcarpus,Lupinus nanus,Lupinus polyphyllus,Pinus balfouriana,Pinus banksiana,Pinus contorta,Pinus edulis,Pinus griffithii,Pinus koraiensis,Pinus krempfii,Pinus longaeva,Pinus monticola,Pinus pinaster,Pinus pinea,Pinus radiata,Pinus strobus,Pinus sylvestris,Pinus taeda,Pinus thunbergii,Pinus virginiana} > SP_Species('Pinaceae; Pinus'); {Pinus balfouriana,Pinus banksiana,Pinus contorta,Pinus edulis,Pinus griffithii,Pinus koraiensis,Pinus krempfii,Pinus longaeva,Pinus monticola,Pinus pinaster,Pinus pinea,Pinus radiata,Pinus strobus,Pinus sylvestris,Pinus taeda,Pinus thunbergii,Pinus virginiana} See Also: ?DbToDarwin ?SearchAC ?SPCommonName ?GetEntryInfo ?SearchID ?Species_Entry SaveEntries Function SaveEntries( xs, descr:string ) Save all sequences from entries xs to files descr. ScaleIndex Function ScaleIndex - Compute the Scale Variation Index Calling Sequence: ScaleIndex(ma) Parameters: Name Type Description -------------------------------------------------- ma array(string) multiple sequence alignment t Tree a phylogenetic tree Returns: list(numeric) Global Variables: ScaleIndex_MA ScaleIndex_Tree Synopsis: Computes a variation index defined as the scale factor for pam distances that makes Probability{position} maximal for all positions of a multiple alignment. Examples: > ma := [ ' -------------------------FPEVVGKTVDQA ..(535).. 
CSPRKGTKT']; ma := [ -------------------------FPEVVGKTVDQAREYFTLHYPQ , -------------------IASAGFVRDAQGNCIK--- , AKQVVLLIFGSWQLARERLANEMRKAVAY__TFL__NFDMGRQPLSMHYSDKVCSPRMSTET, AEPIVPLLFGMWRLKRKKANNKLLRCVKY__TLLARNTSDGREPVACRYSEKICSPRTGTKT, AEVIVPLLFGVWRLKREERTYTLLQCVKY__VFLARNTVAGNRPLSKKFSEKVCSPRK , AEPIVPLLFGLWQLAREKASNTLLQCVKY__VFLARNTVAGRRPLKMKYSDKVCSPRKGAKT, EPIVPLL__MWQLAIEKSSNTLLQCVK__KVFLARKTVAGRRPLSMKFSDKVCNPRKGTKT, PIVPLLFGMWQLAREKASNTLLQCVKYYYVFLARNTVAGRRPLSMKYSDKVCSPRKGTKT] > tree := Tree(Tree(Leaf(b,-250.0000,2),-2.8422e-14, ..(271).. 00,3))))))); tree := Tree(Tree(Leaf(Permutation([5, 6, 1, 2, 4, 7, 3]),-250,2),-2.8422e-14,Leaf(Permutation([4, 5, 6, 2, 3, 1, 7]),-250,1)),0,Tree(Leaf(h,-250,8),-209.7583,Tree(Leaf(g,-260.8121,7),-227.6537,Tree(Leaf(f,-256.9830,6),-233.8701,Tree(Leaf(d,-240.9182,4),-235.7326,Tree(Leaf(e,-252.2867,5),-237.4908,Leaf(c,-239,3))))))) > scxd := ScaleIndex (ma, tree); scxd := [-2.9973, -0.1326, 0.5459, -0.2116, -2.9973, 0.2001, -2.9973, -0.03674823, -2.9973, -2.9973, 0.4785, -2.9973, 0.1069, -2.9973, 0.3251, -0.03946685, -0.1529, 0.1645, 0.6408, 0.3623, -0.1573, 0.2861, -0.07401567, -0.07673429, 0.1184, 0.06689822, -0.7055, -0.02355176, -0.6150, -0.1929, 2.6355, -0.1831, 0.06961684, -0.2861, -0.6337, -0.4572, -0.07401567, -0.2176, 0.4944, 0.3738, -0.4369, -0.6825, 0.2692, -0.09976684, -0.1157, 0.03234940, -0.00219941, 0.2089, -0.2363, -0.6194, 0.1672, -2.9973, -0.1831, -2.9973, -0.2746, -2.9973, -2.9973, 0.2878, 0.2461, -0.1414, -0.05978077, -2.9973] See also: ?KWIndex ?PlotIndex ?PrintIndex ?ProbIndex ScaleTree Function ScaleTree - Scales a tree to a specific height. Calling Sequence: ScaleTree(t,h) Parameters: Name Type Description -------------------------------------------------- t Tree tree to be scaled h positive new distance from root to leaves Returns: Tree Synopsis: The function ScaleTree scales a tree to a given height. If the tree is not ultrametric, the distance from the root to the deepest leaf is scaled. 
The root of the returned tree is always at height/time 0. Examples: > BDTree := BirthDeathTree(0.1, 0.01, 10, 50); BDTree := Tree(Tree(Leaf(S1,50),35.8040,Leaf(S2,50)),17.1528,Tree(Tree(Tree(Leaf(S3,50),32.0575,Tree(Tree(Leaf(S4,50),44.9293,Leaf(S5,50)),37.4626,Leaf(S6,50))),28.9295,Tree(Tree(Leaf(S7,50),47.4169,Leaf(S8,50)),45.1129,Leaf(S9,50))),21.3121,Leaf(S10,50))) > ScaledTree := ScaleTree(BDTree, 100); ScaledTree := Tree(Tree(Leaf(S1,100.0000),56.7819,Leaf(S2,100.0000)),0,Tree(Tree(Tree(Leaf(S3,100.0000),45.3760,Tree(Tree(Leaf(S4,100.0000),84.5627,Leaf(S5,100.0000)),61.8313,Leaf(S6,100.0000))),35.8531,Tree(Tree(Leaf(S7,100.0000),92.1360,Leaf(S8,100.0000)),85.1217,Leaf(S9,100.0000))),12.6625,Leaf(S10,100.0000))) See also: ?AddDeviation ?BirthDeathTree ?Tree ScoreAlignment Function ScoreAlignment - scores an existing codon or protein alignment Calling Sequence: ScoreAlignment(dps1,dps2,S) Parameters: Name Type Description ----------------------------------------------------------------- dps1 string First of the aligned sequences. dps2 string Second of the aligned sequences. S {CodonMatrix,DayMatrix} a scoring matrix Returns: numeric Synopsis: This function scores two aligned sequences with a given scoring matrix S. If S is a CodonPAM or SynPAM matrix, the sequences are interpreted as DNA, and if S is a Dayhoff matrix, the sequences are assumed to be proteins. The two input strings must be of the same length and can include gaps ('___' or '_'), which will be scored according to the gap cost formula as defined in the scoring matrix. Examples: > ScoreAlignment(AAACCCGGGTTT,AAACCG___TTT,cm); 13.7069 See Also: ?Align ?CodonMatrix ?CreateSynMatrices ?CodonAlign ?CreateCodonMatrices ?DynProgStrings ?CodonDynProgStrings ?CreateDayMatrices ScoreIntron Function ScoreIntron Calling Sequence: ScoreIntron(m,intron) Parameters: Name Type -------------------- m NucPepMatch intron posint Returns: NULL Synopsis: Computes the score [alpha, delta, omega] for a given intron. 
Examples: See also: ?Introns SearchAC Function SearchAC - find an entry with a given accession number Calling Sequence: SearchAC(pat) Parameters: Name Type ------------- pat string Returns: Entry Synopsis: The SearchAC function searches the sequence database currently assigned to system variable DB. It returns an entry data structure which contains at most one exact match of the given argument, pat, with the AC field of the entry. If no match can be found it returns NULL. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > SearchAC('Q62671;'); EDD_RATQ62671;Ubiquitin-- ..(1568).. V > SearchAC(ZZZZ); See also: ?DB ?SearchID ?SearchSeqDb ?SearchTag SearchAllArray Function SearchAllArray Calling Sequence: SearchAllArray(t,A) Parameters: Name Type --------------- t anything A array Returns: array Synopsis: The function SearchAllArray returns the array of indices of an element in an array if it is a member of the array. Otherwise it returns an empty list. Examples: > SearchAllArray(5, [1, 2, 7, 5, 8, 5, 7, 5]); [4, 6, 8] > SearchAllArray('hi', ['hello', 'hallo', 'hey', 'hoi']); [] See also: ?SearchArray ?SearchOrderedArray ?table SearchAllString Function SearchAllString - Find several instances of phrase in a text Calling Sequence: SearchAllString(pat,txt) Parameters: Name Type Description ---------------------------------------- pat string a pattern that is sought txt string a text which is searched Returns: list Synopsis: The function SearchAllString returns the array of indices of all the occurrences of the pattern in the text. If the pattern cannot be found, it returns an empty list. This function is case insensitive. 
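The behavior described in the SearchAllString synopsis can be mimicked in Python as a rough sketch. It is not Darwin's implementation; the function name search_all_string is hypothetical. Two details are inferred from the documented examples rather than stated in the synopsis: positions are reported 1-based, and overlapping occurrences are all reported (as in the pattern 'hehe' within 'hehehe').

```python
def search_all_string(pat: str, txt: str) -> list:
    """Return 1-based positions of all (possibly overlapping) matches of pat in txt.

    Illustrative sketch; matching ignores case, as the synopsis states.
    """
    if not pat:
        raise ValueError("empty pattern")  # assumption: an empty pattern is rejected
    p, t = pat.lower(), txt.lower()
    positions = []
    start = t.find(p)
    while start != -1:
        positions.append(start + 1)        # convert 0-based offset to 1-based index
        start = t.find(p, start + 1)       # advance by one so overlaps are found
    return positions


print(search_all_string("hehe", "hehehe"))
print(search_all_string("cat", "acgcagcatgcatcagtca"))
```

Advancing the search by a single character after each hit, instead of by the pattern length, is what allows overlapping matches to be reported.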
Examples: > SearchAllString('hehe', 'hehehe'); [1, 3] > SearchAllString('cat', 'acgcagcatgcatcagtca'); [7, 11] See Also: ?BestSearchString ?MatchRegex ?SearchMultipleString ?CaseSearchString ?SearchApproxString ?SearchString ?HammingSearchString ?SearchDelim SearchArray Function SearchArray Option: builtin Calling Sequence: SearchArray(t,A) Parameters: Name Type ----------------------- t {numeric,string} A array Returns: {0,posint} Synopsis: The function SearchArray returns the index of an element in an array if it is a member of the array. Otherwise it returns 0. Examples: > SearchArray(5, [1, 2, 7, 5, 8]); 4 > SearchArray('hi', ['hello', 'hallo', 'hey', 'hoi']); 0 See also: ?SearchAllArray ?SearchOrderedArray ?table SearchDayMatrix Function SearchDayMatrix - search an array of DayMatrix for a given PAM Option: builtin Calling Sequence: SearchDayMatrix(PAM,daymat) Parameters: Name Type Description -------------------------------------------------------------------- PAM numeric PAM distance for which matrix is sought daymat array(DayMatrix) an array of Dayhoff matrices Returns: DayMatrix Synopsis: This function searches the list of DayMatrix for the Dayhoff matrix calculated with PamNumber closest to PAM. This function assumes that daymat is in ascending order. Examples: > CreateDayMatrices(); > SearchDayMatrix(250, DMS); DayMatrix(Peptide, pam=250, Sim: max=14.152, min=-5.161, del=-19.814-1.396*(k-1)) See Also: ?CreateDayMatrices ?CreateDayMatrix ?CreateOrigDayMatrix ?DayMatrix SearchDb Function SearchDb Calling Sequence: SearchDb(pat_1..pat_k) Parameters: Name Type ---------------------------- pat_i {string,set(string)} Returns: an Entry structure Synopsis: The SearchDb function searches the sequence database currently assigned to system variable DB. When pat_i consists of a set of strings, the function returns the logical OR of the results (all entries containing at least one of the elements in the set pat_i). 
The comma symbol represents the logical AND of the arguments. In this case, SearchDb returns only those entries that contain all such patterns. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > SearchDb('platypus'); AMEL_ORNANO97646;Amelogen ..(595).. T, ATP6_ORNANQ36454;ATP synt ..(835).. T, ATP8_ORNANQ36453;ATP synt ..(569).. S, COX1_ORNANQ36452;Cytochro ..(1126).. A, COX2_ORNANQ37718;Cytochro ..(996).. S, COX3_ORNANQ36455;Cytochro ..(829).. S, CYB_ORNANQ36461;Cytochrom ..(968).. W, DLP1_ORNANP82172;Defensin ..(530).. 2, DLP2_ORNANP82140;Defensin ..(538).. 2, DLP3_ORNANP82141;Defensin ..(413).. F, HBA_ORNANP01979;Hemoglobi ..(754).. R, HBB_ORNANP02111;Hemoglobi ..(725).. H, HSP1_ORNANP35307;Sperm pr ..(603).. N, INS_ORNANQ9TQY7; Q9TQY8;I ..(614).. N, LCA_ORNANP30805;Alpha-lac ..(847).. C, MYG_ORNANP02196;Myoglobin ..(724).. G, NU1M_ORNANQ37717;NADH-ubi ..(878).. M, NU2M_ORNANQ36451;NADH-ubi ..(979).. S, NU3M_ORNANQ36456;NADH-ubi ..(593).. E, NU4M_ORNANQ36458;NADH-ubi ..(1137).. C, NU5M_ORNANQ36459;NADH-ubi ..(1343).. F, NU6M_ORNANQ36460;NADH-ubi ..(644).. H, NULM_ORNANQ36457;NADH-ubi ..(674).. C > SearchDb('alpha-lactalbumin'); LCAA_HORSEP08334;Alpha-la ..(794).. L, LCAB_HORSEP08896;Alpha-la ..(818).. L, LCA_BOSMUQ9TSR4;Alpha-lac ..(863).. L, LCA_BOVINP00711; Q95NE4;A ..(1467).. 3, LCA_BUBBUQ9TSN6;Alpha-lac ..(882).. L, LCA_CAMDRP00710;Alpha-lac ..(851).. W, LCA_CANFAQ9N2G9;Alpha-lac ..(825).. L, LCA_CAPHIP00712;Alpha-lac ..(1215).. 2, LCA_CAVPOP00713;Alpha-lac ..(1110).. 9, LCA_EQUASP28546;Alpha-lac ..(812).. L, LCA_FELCAP37154;Alpha-lac ..(562).. P, LCA_HUMANP00709;Alpha-lac ..(1557).. 5, LCA_MACEUQ06655;Alpha-lac ..(1002).. C, LCA_MACGIP19122;Alpha-lac ..(664).. V, LCA_MACRGP07458;Alpha-lac ..(839).. C, LCA_MOUSEP29752;Alpha-lac ..(1384).. 0, LCA_ORNANP30805;Alpha-lac ..(847).. C, LCA_PAPCYP12065;Alpha-lac ..(998).. 
7, LCA_PIGP18137;Alpha-lacta ..(859).. M, LCA_RABITP00716; Q9TQT7;A ..(907).. K, LCA_RATP00714; P00715;Alp ..(965).. P, LCA_SHEEPP09462; Q9GKS5;A ..(942).. L, LCA_TACACP81646;Alpha-lac ..(828).. C, LCA_TRIVUQ29145;Alpha-lac ..(889).. C > SearchDb('platypus', 'alpha-lactalbumin'); LCA_ORNANP30805;Alpha-lac ..(847).. C > SearchDb('alpha-lactalbumin', {'platypus', 'panda'}); LCA_ORNANP30805;Alpha-lac ..(847).. C See Also: ?DB ?SearchAC ?SearchSeqDb ?Species_Entry ?PatEntry ?SearchID ?SearchTag SearchDelim Function SearchDelim - break up a string at each occurrence of a delimiter Calling Sequence: SearchDelim(delim,txt) Parameters: Name Type Description ------------------------------------------------------------- delim string a pattern that delimits portions of a string txt string the text to be split Returns: list(string) Synopsis: SearchDelim returns a list of strings, where each string in the list is one of the parts of the txt delimited by occurrences of delim. SearchDelim is ideal to break up a string which contains many lines separated by newlines. If the string after the last occurrence of delim is empty, it is not added to the list. Delimiting with an empty string does not make sense and it is not allowed. Examples: > SearchDelim('a', 'abracadabra'); [, br, c, d, br] > SearchDelim('\n', 'file1\nfile2\nfile3\n'); [file1, file2, file3] See Also: ?BestSearchString ?Lines ?SearchMultipleString ?CaseSearchString ?MatchRegex ?SearchString ?HammingSearchString ?SearchApproxString ?SplitLines SearchFrag Function SearchFrag - Search database for a fragment Calling Sequence: SearchFrag(seq) Parameters: Name Type ------------- seq string Returns: list(Match) Synopsis: Return all matches of seq against the peptide database located in the system variable DB. 
Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > SearchFrag('SGPPRIP'); Searching the fragment SGPPRIP in /home/darwin/DB/SwissProt.Z, Tue Feb 19 10:54:39 2013 With goal 31.8 and PAM 250, 185 matches were found After refining with Align/DMS, 4 matches were selected with similarity not less than 70 See Also: ?AlignOneAll ?SearchAC ?SearchID ?Species_Entry ?PatEntry ?SearchDb ?SearchSeqDb SearchID Function SearchID Calling Sequence: SearchID(pat) Parameters: Name Type ------------- pat string Global Variables: SearchID_DBname SearchID_table Synopsis: The SearchID function searches the sequence database currently assigned to system variable DB. It returns an entry data structure which contains at most one exact match of the given argument, pat, with the ID field of the entry. If no match can be found it returns an empty data structure. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > SearchID(CY2_RHOVI); CY2_RHOVIP00083;Cytochrom ..(1021).. 6 > SearchID(ZZZZ); See also: ?DB ?SearchAC ?SearchSeqDb ?SearchTag ?Species_Entry SearchMassDb Function SearchMassDb - Searches digestion fragments against a database Option: builtin Calling Sequence: SearchMassDb(p,n) Parameters: Name Type Description ---------------------------------------------------------------- p Protein description of protein (weights, enzymes, etc.) n integer maximum number of returned matches Returns: MassProfileResults Synopsis: Searches the n most significant matches of weights of digested fragments. The search is done against the database which is currently loaded (with the command ReadDb). This could be a protein or a nucleotide database. The description of the protein to be searched is in terms of the (one or many) weights resulting from digesting the protein with an enzyme. 
This description can also hold other information, such as deuteration and modified amino acid weights. See Protein and DigestionWeights for details. The result is a data structure which contains the best n matches, ordered from best to worst. Each match is described by the similarity score, number of fragments in the protein, number of matched fragments, and description of the matching protein. See MassProfileResults for full details. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > print( SearchMassDb( Protein(DigestionWeights('Trypsin', 601.9438, 504.0904, 1512.4545, 480, 590)), 5 )); Score n k AC DE OS 60.4 21 4 P28519; DNA repair protein RAD14. Saccharomyces cerevisiae (Baker's yeast). Unmatched weights: [1512.5]. 60.0 7 3 Q43284; Oleosin 14.9 kDa. Arabidopsis thaliana (Mouse-ear cress). Unmatched weights: [480.0, 1512.5]. 59.8 17 4 P21908; Glucokinase (EC 2.7.1.2) (Glucose kinase). Zymomonas mobilis. Unmatched weights: [590.0]. 58.2 6 3 Q9FC39; Protein crcB homolog 1. Streptomyces coelicolor. Unmatched weights: [590.0, 1512.5]. 57.3 11 3 P06931; E6 protein. Bovine papillomavirus type 1. Unmatched weights: [590.0, 601.9]. See Also: ?DigestAspN ?DigestWeights ?MassProfileResults ?DigestionWeights ?DynProgMass ?ProbBallsBoxes ?DigestSeq ?DynProgMassDb ?ProbCloseMatches ?DigestTrypsin ?enzymes SearchMultipleString Function SearchMultipleString - search several sequential patterns in a string Calling Sequence: SearchMultipleString(pat1,pat2,...,text) Parameters: Name Type -------------- pat_i string txt string Returns: list(integer) Synopsis: The SearchMultipleString function returns a list with the offsets of all the matches of each of the patterns given as arguments. This is very useful when one wants to search for a portion of a string enclosed in some particular context. The individual patterns are matched as case insensitive. 
All the patterns have to match, in a non-overlapping way and in the given order. If there is no match of all the patterns, the function returns an empty list. Examples: > SearchMultipleString( '(', 'a', ')', '(),(bbb), (...a...)' ); [0, 14, 18] See Also: ?BestSearchString ?MatchRegex ?SearchString ?CaseSearchString ?SearchApproxString ?HammingSearchString ?SearchDelim SearchOrderedArray Function SearchOrderedArray Option: builtin Calling Sequence: SearchOrderedArray(target,L) Parameters: Name Type Description ----------------------------------------------------------- target {numeric,string} target to be searched for L {array,list} array or list to be searched in Returns: {0,posint} Synopsis: The SearchOrderedArray function returns the first index i such that L[i] <= target < L[i+1]. Examples: > SearchOrderedArray(5, [2, 4, 6, 8, 10]); 2 > SearchOrderedArray('mike', ['chantal', 'gaston', 'mike', 'ulrike', 'xianghong']); 3 > SearchOrderedArray(5, [10, 8, 6, 4, 2]); 0 See also: ?SearchAllArray ?SearchArray ?table SearchSeqDb Function SearchSeqDb Option: builtin Calling Sequence: SearchSeqDb(txt) Parameters: Name Type Description --------------------------------------------------------------- txt {string,string..string} sequence string to be searched Returns: PatEntry Synopsis: Find all the occurrences of txt in the amino acid sequence part of DB. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > SearchSeqDb('SGPPRIP'); PatEntry(46915583..46915583) See also: ?AlignOneAll ?SearchFrag SearchString Function SearchString - case insensitive exact string searching Option: builtin Calling Sequence: SearchString(pat,txt) Parameters: Name Type Description ---------------------------------------- pat string a pattern that is sought txt string a text which is searched Returns: {-1,0,posint} Synopsis: This returns the offset before the character where pat matches with txt. 
If pat does not match txt, -1 is returned. This function is case insensitive. Examples: > SearchString('HerE', 'It is in hERe'); 9 > SearchString('where', 'wear am i'); -1 See Also: ?BestSearchString ?MatchRegex ?SearchMultipleString ?CaseSearchString ?SearchApproxString ?HammingSearchString ?SearchDelim SearchTag Function SearchTag Option: builtin Calling Sequence: SearchTag(tg,txt) Parameters: Name Type Description ------------------------------------------------------------------------ tg string an SGML tag without the surrounding angle brackets txt string a string that is searched for in the field defined by tg Returns: string Synopsis: The SearchTag function extracts the information surrounded by SGML tag tg in the body of txt text. If tg is not found in txt, the empty string is returned. Examples: > SearchTag('AC', 'ABL1_CAEELP03949;'); P03949; See also: ?SearchAC ?SearchDb ?SearchID ?Species_Entry SendDataTcp Function SendDataTcp( machine:string, pid:posint, data:string ) Sends data to pid on machine. SendTcp Function SendTcp Option: builtin Calling Sequence: SendTcp(data) Parameters: Name Type Description ---------------------------------------- data string command to the IPC daemon Returns: NULL Synopsis: SendTcp sends data to the IPC daemon. This data is usually a command understood by darwinipc. See ?darwinipc. A SendTcp is followed by a ReceiveTcp to read out the response from the daemon. 
Examples: > r := traperror(ConnectTcp('/tmp/.ipc/darwin', false)); > SendTcp('PING'); r := ReceiveTcp(3); r := PING OK > SendTcp('MSTAT linneus1'); r := ReceiveTcp(3); r := DATA linneus1 0:OK ALIVE > DisconnectTcp();; See Also: ?ConnectTcp ?ipcsend ?ParExecuteTest ?SendDataTcp ?darwinipc ?ParExecuteIPC ?ReceiveDataTcp ?DisconnectTcp ?ParExecuteSlave ?ReceiveTcp Sequence Function Sequence - Searching and retrieving sequences in the database DB Option: polymorphic Calling Sequence: Sequence(off) Parameters: Name Type Description -------------------------------------------------------------------------------------- off {integer,list,string,structure} entries or list of entries in the database DB Data structure of type PatEntry, AC or ID Returns: Sequence Synopsis: Sequence will return the peptide or nucleotide sequence pointed by the argument(s). This normally consists of the field enclosed by the tags and . Sequence returns a string or an expression sequence of strings. When the argument is an ID or an AC structure, the database is searched for the corresponding ID or AC. If the argument is an integer, it is taken to be a database offset into a sequence. In this case the maximal sequence starting at that offset is returned. Otherwise, the arguments are treated as the arguments for Entry, and their sequences extracted. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) > s1 := Sequence(Entry(1)); s1 := MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLT ..(924).. 
ILVVSLIVGIL > Sequence(PatEntry(10000..10001)); A, A > Sequence(AC('P11341')); MAYRGFKTSRVVKHRVRRRWFNHRRRYR > Sequence(ID('ID5B_PROJU')); SDRCKDLGISIDEENNRRLVVKDGDPLAVRFVKANRRG > GetEntryNumber(s1); 1 See Also: ?AC ?ID ?PatEntry ?SearchID ?Entry ?Match ?SearchAC ?Species_Entry ServerSocket Function ServerSocket - Listen on a Unix domain socket, fork to process requests Calling Sequence: ServerSocket(socket_path) Parameters: Name Type Description -------------------------------------------------------------------------- socket_path string path where the server socket is created and listened on gc posint (optional) gc frequency of child process Returns: string Synopsis: ServerSocket is a function which creates a Unix domain socket and starts listening on it. Each line sent to the socket will fork the darwin process. The child process will get the received string as the return value of the function and everything sent to standard out will be sent back through the socket to the client program. The parent process will wait forever and has to be killed externally. Note that garbage collection will require a lot of data to be copied. Hence, gc frequency will by default be assigned a very high value and child processes should not run too long. Examples: > req := ServerSocket('/tmp/server_square'); req := 5 > print(req ^ 2); > quit; See Also: ?CallSystem ?LockFile ?OpenReading ?TimedCallSystem ?gc ?OpenPipe ?Set Set echo mapsize plotdevice plotoutput printgc profile prompt quiet server screenwidth TotalDPCells Function Set - Set system options and defaults Option: builtin Calling Sequence: Set(opt) Parameters: Name Type -------------------------------- opt {string, string=anything} Returns: anything : previous value of the system variable Synopsis: The Set command is used to assign to system variables. 
Name          Type      Description
--------------------------------------------------------------------------------
BytesAlloc    posint    Returns the number of allocated bytes
echo          posint    Sets the level of input/output information displayed.
                          0 - no echo under any circumstance
                          1 - (default) echo whenever the input or the output
                              are not from/to the terminal, but do not echo as
                              a result of a read statement.
                          2 - echo whenever the input or the output are not
                              from/to the terminal.
                          3 - echo only as a result of read statements
                          4 - echo everything.
                          n - (n > 4): echo only as a result of read
                              statements nested less than n-4
                        The echo option is superseded by quiet, i.e. if
                        quiet=true, no echo will occur.
gc            integer   Sets the frequency (in words allocated) for garbage
                        collection.
mapsize       integer   Sets the minimum size (in chars) required for Darwin
                        to build a .map file for a database.
plotdevice    string    Sets the protocol for subsequent Draw commands.
                        Options:
                          portrait    (8.5x11 with 1/2' margin)
                          landscape   (11x8.5 with 1/2' margin)
                          portraitA4  (210x297 with 1/2' margin)
                          landscapeA4 (297x210 with 1/2' margin)
plotoutput    filename  Name of the file to store the plotted code.
printgc       boolean   Toggles displaying garbage collection information.
printlevel    integer   Sets the amount of information which is printed out
                        during execution.
profile       boolean   Toggles printer/plotter profile mode
prompt        string    Sets the Darwin prompt.
quiet         boolean   Toggles the suppression of output.
screenwidth   posint    Sets the width of a line for all subsequent output.
server        boolean   Places Darwin in server mode.
TotalDPCells  posint    Returns/sets the number of cells computed for DynProgr

Examples:
> Set(printgc);
false
> Set(plotdevice=landscape);
landscape

SetRand
Function SetRand
Option: builtin
Calling Sequence: SetRand(seed)
Parameters:
  Name  Type
  --------------
  seed  integer
Returns: NULL
Synopsis: Sets the seed of the random number generator. The sequence of pseudo-random numbers generated depends uniquely on the seed, i.e.
the same seed will generate the same sequence. Examples: > SetRand(123); See also: ?Rand ?SetRandSeed SetRandSeed Function SetRandSeed Calling Sequence: SetRandSeed() Returns: NULL Global Variables: SetRandSeed_value Synopsis: Initialize the random number generator to produce a sequence depending on the date, time and process id. This is normally a guarantee that different processes end up with different random seeds. If printlevel is 3 or higher, SetRandSeed will print the value that it has used for SetRand() so that the same random sequence can be regenerated. Examples: > SetRandSeed(); > Rand(); 0.4405 See also: ?CreateRandSeq ?Rand ?SetRand ?Shuffle SetupRA Function SetupRA - setup of the relative adaptiveness for CAI Calling Sequence: SetupRA(mode) Global Variables: CodonProb RA Synopsis: Assigns the global variable RA needed by ComputeCAI. See also: ?ComputeCAI ?RelativeAdaptiveness SetuptRNA Function SetuptRNA - set up functions for tRNA translations Calling Sequence: SetuptRNA(d) Parameters: Name Type Description ------------------------------------------------------- d list(list) a list (by aa) of lists of codons or d string the name of a known table of tRNA Returns: NULL Global Variables: CIntTotInt_list IntTotInt_list ntRNA tIntToCInt_list tIntToInt_list tIntTotRNA_list Synopsis: This function sets up all the necessary functions to translate tRNAs. These are from tInt to A, AAA, Amino, Int, CInt and Codon and from Int and CInt to tRNA and tInt. Its input is either a string (which means a predefined name) or it is a list of 20 (one per amino acid) lists of tRNAs. The format is best given by an example, see the file lib/SetuptRNA.
Execution of SetuptRNA causes the following functions and values to be defined: Name Description ----------------------------------------------------------------- ntRNA integer, the number of tRNA molecules used ----------------------------------------------------------------- tIntToInt tInt (1..ntRNA) to Int (aa number, 1..20) tIntToA tInt (1..ntRNA) to A (aa one-letter code) tIntToAAA tInt (1..ntRNA) to AAA (aa 3-letter code) tIntToAmino tInt (1..ntRNA) to Amino (aa full name) ----------------------------------------------------------------- tIntToCInt tInt (1..ntRNA) to set of CInt (codon number, 1..64) tIntToCodon tInt (1..ntRNA) to set of Codon (3-letter codon) ----------------------------------------------------------------- tIntTotRNA tInt (1..ntRNA) to tRNA (tRNA name) tRNATotInt tRNA (tRNA name) to tInt (1..ntRNA) ----------------------------------------------------------------- IntTotInt Int (aa number, 1..20) to set of tInt (1..ntRNA) IntTotRNA Int (aa number, 1..20) to set of tRNA (tRNA name) ----------------------------------------------------------------- CIntTotInt CInt (codon number, 1..64) to tInt (1..ntRNA) CIntTotRNA CInt (codon number, 1..64) to tRNA (tRNA name) Currently the following names are recognized as arguments for SetuptRNA: [Archaea, Bacteria, Eukaryota, eukaryotes, prokaryotes, YEAST, yeast] Examples: > SetuptRNA(yeast); See also: ?ComputeTPI ?TPIDistr ShortestPath Function ShortestPath - shortest path from one node to all others Calling Sequence: ShortestPath(g,i,excl) Parameters: Name Type Description ---------------------------------------------- g Graph given graph i anything starting node excl set (optional) excluded node set Returns: list([posint, numeric]) Synopsis: Compute the shortest path from node i to every connected node in g. It is assumed that a non-negative numeric label on an Edge is the length of the edge, that is the distance between the corresponding nodes. 
"excl" is the set of nodes not to be considered and defaults to {}. Examples: > g := Graph( Edges(Edge(1.2,1,2),Edge(2,1,4),Edge(3,1,5),Edge(4,2,3),Edge(5,3,4)),Nodes(1,2,3,4,5)); g := Graph(Edges(Edge(1.2000,1,2),Edge(2,1,4),Edge(3,1,5),Edge(4,2,3),Edge(5,3,4)),Nodes(1,2,3,4,5)) > ShortestPath(g,1); [[1, 0], [2, 1.2000], [3, 5.2000], [4, 2], [5, 3]] See Also: ?BipartiteGraph ?Graph_minus ?Nodes ?Clique ?Graph_Rand ?ParseDimacsGraph ?DrawGraph ?Graph_XGMML ?Path ?Edge ?InduceGraph ?RegularGraph ?EdgeComplement ?MaxCut ?TetrahedronGraph ?Edges ?MaxEdgeWeightClique ?VertexCover ?FindConnectedComponents ?MinCut ?Graph ?MST Shuffle Function Shuffle Calling Sequence: Shuffle(t) Parameters: Name Type -------------------------------- t {string, list, structure} Returns: type(t) Synopsis: Randomly permute the characters (when t is a string) or components (when t is a list or a structure). A new object is created and the argument is left unchanged. Examples: > Shuffle('abcdefghijklmnopqrstuvwxyz'); boujaqhsgrwldpkziexvctymfn > Shuffle([1,2,3,4]); [4, 3, 2, 1] > Shuffle(ABC(a1,a2,a3,a4)); ABC(a1,a4,a2,a3) See also: ?CreateRandPermutation ?CreateRandSeq ?Mutate ?Permutation Signature Function Signature( ) Calculate the signature for a specific data type Trees: the signature is the same for isomorphic trees, and for trees with different roots. Only the graph topology is relevant. The function has the following form: to get the signature for two leaves a and b that are connected to the same node c, the signature value for node c is (x^a + x^b) modulo n. n is a large number, i.e. 2^32 x is a "generator" number, which means that x^1 mod n, x^2 mod n etc etc produces all numbers between 0 and n-1. 
SignedSynteny Function SignedSynteny - find the number of inversions of a signed permutation Calling Sequence: SignedSynteny(perm) Parameters: Name Type Description ------------------------------------ perm list(integer) a permutation Returns: integer Synopsis: SignedSynteny finds the minimum number of reversals needed to transform the input permutation into an ascending straight run of positive integers. The input permutation is a list of length n of the integers from 1 to n, where each number is also assigned a sign plus or minus (plus is implicit). A reversal operation modifies a signed permutation by swapping the order of a particular contiguous range and flipping the sign of the elements in the range. The problem of finding the synteny distance between two genomes, with known direction of every gene in the genomes, can be reduced to the problem of finding the number of reversals. SignedSynteny runs in O(n) and is an implementation of the algorithm described in "Kaplan et al., Faster and simpler algorithm for sorting signed permutations by reversals, SODA '97, ISBN:0-89871-390-0, 344-351, 1997.", except for a sub-algorithm to find connected components in a special graph. A faster algorithm to find the connected components is given in "Bader et al., A linear-time algorithm for computing inversion distance between signed permutations with an experimental study, WADS '01, ISBN:3-540-42423-7, 365-376, 2001." Examples: > SignedSynteny([8, 9, -6, -1, 3, 5, -7, 2, 4]); 8 > SignedSynteny([4, 5, 6, -3, -1, -2]); 4 See also: ?DrawTree ?GapTree ?LeastSquaresTree ?PhylogeneticTree ?Synteny SmallAllAll Function SmallAllAll - do an all-against-all matching of a small database Calling Sequence: SmallAllAll(MinSim) Parameters: Name Type Description ------------------------------------------------------------- MinSim numeric optional cutoff value for match similarity Returns: NULL Synopsis: This function does a complete match of all sequences in a database against each other.
A database must have been loaded previously with the ReadDb command. This function works more like a program and it prints all sorts of information about the all-all matching. A file named DB[Filename].AA is created with the darwin-readable results of the matrix of matches. Besides the matrix of matches, the file contains commands to build a phylogenetic tree, a probabilistic ancestral sequence and a multiple alignment of all the sequences. It is expected that the user will inspect this file, and choose which commands to run. Some of the less used commands are commented out in the output file. If the sequences of the database are disconnected in several groups, that is no significant match can be found between the sequences, these groups are placed in different files named DB[FileName].i for consecutive values of i. If MinSim is omitted it defaults to 100. See also: ?AlignOneAll ?Match ?ReadDb SortedMA Function SortedMA( mulAlign:array(string), tree:Tree ) Returns the sequences of the multiple alignment sorted in order of the original database SpToDarwin Function SpToDarwin( flatfile:string, darwinfile:string, descr:string, compressed:boolean ) Converts a SwissProt flat file (flatfile) into a Darwin loadable file (darwinfile). The actual data is prefixed by descr which should contain the database name (DBNAME tag) and release (DBRELEASE tag). If compressed is specified and true, the flat file is read using zcat. SpeciesCode Function SpeciesCode - NCBI TaxonId to SwissProt species code Calling Sequence: SpeciesCode(posint) Parameters: Name Type Description --------------------------------- tax posint NCBI taxonomic ID Returns: string Synopsis: Maps an NCBI taxonomic identifier to the SwissProt species code. If the ID is not known, the function returns an error.
Examples: > SpeciesCode(9606); HUMAN See also: ?TaxonId ?UpdateSpeciesCode Species_Entry Function Species_Entry - find all the entries for a given species Calling Sequence: Species_Entry(specname) Parameters: Name Type Description ----------------------------------- specname string species name(s) Returns: list(Entry) Global Variables: SearchOS_table Species_table Synopsis: Species_Entry returns all the entries in DB (which must be assigned a sequence database) which match the given specname. This assumes that the database has a field tagged with <OS>..</OS> where the species information is available. This is rather specific to SwissProt. The first time Species_Entry is called, it builds a table of species and it may require some time to compute. Following calls will be much more efficient. Examples: > Species_Entry('Abies firma'); [MATK_ABIFIQ9MV51;Maturase ..(1002).. S, RBL_ABIFIO78258;Ribulose ..(1081).. K] See Also: ?DbToDarwin ?SearchAC ?SPCommonName ?GetEntryInfo ?SearchID ?SP_Species SplitLines Function SplitLines - make a list of lines from a string Calling Sequence: SplitLines(s) Parameters: Name Type Description --------------------------------------------------- s string a string which may contain newlines Returns: list(string) Synopsis: SplitLines takes a string and breaks it after every newline character ('\n'). Each of these lines is placed in an output list. If the string does not end in a newline, the last string of the list will not end in a newline. In other words, SplitLines just splits the string; it does not introduce or remove any characters.
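The splitting rule described above can be sketched outside Darwin. The following Python function is an illustration only, not the Darwin implementation; the name split_lines is hypothetical, and returning an empty list for an empty string is an assumption, since the synopsis does not cover that case.

```python
def split_lines(s: str) -> list[str]:
    """Break s after every newline character, keeping the newline on each piece."""
    lines = []
    start = 0
    while start < len(s):
        end = s.find("\n", start)
        if end == -1:
            # No further newline: the remainder is the last, unterminated line.
            lines.append(s[start:])
            break
        # Include the newline itself, so no character is added or removed.
        lines.append(s[start:end + 1])
        start = end + 1
    return lines


# Concatenating the pieces always reproduces the input string.
text = "abc\nxyz"
pieces = split_lines(text)
assert "".join(pieces) == text
```

Because every piece keeps its trailing newline, joining the list gives back the original string, which is the property the synopsis states.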
Examples: > SplitLines('abc'); [abc] > SplitLines('abc\nxyz'); [abc , xyz] See Also: ?FileStat ?LockFile ?ReadLine ?ReadRawFile ?TimedCallSystem ?Lines ?OpenPipe ?ReadOffsetLine ?SearchDelim Stat Class Stat - Basic Univariate Statistics Package Template: Stat() Stat(Description) Returns: Stat Fields: Name Type Description ----------------------------------------------------------------------- Number integer number of observations recorded Mean numeric mean of the sample Average numeric mean of the sample (same as Mean) Variance numeric variance of the sample VarVariance numeric variance of the observed variance Skewness numeric coefficient of skewness (sidewise leaning) Excess numeric excess (flatness, or kurtosis) Min numeric the minimum of the sample Minimum numeric the minimum of the sample Max numeric the maximum of the sample Maximum numeric the maximum of the sample ShortForm string Description: MeanVar StdErr numeric 95% conf. interval of mean CV numeric coefficient of variation (std. dev/mean) Description string user-defined description MeanVar string form: xxx+-xx (mean and 95% conf. interval) VarVar string form: xxx+-xx (variance and 95% conf. interval) Methods: HTMLC plus print printf printpm Rand rawprint select Stat_type string times union Synopsis: Stat defines a new data structure to gather univariate statistical information. Methods exist for printing, adding and creating a union of two Stat data structures. The extraction of useful statistical data from the information collected in a Stat data structure is performed with the provided selectors. References: Handbook of Mathematical Functions, M. Abramowitz and I.
Stegun, Ch 26.1 Examples: > BooHoo := Stat('Stock Market Losses'); BooHoo := Stat(0,1.7797162035136915e+308,-1.7797162035136915e+308,0,0,0,0,0,Stock Market Losses) > BooHoo2 := Stat('More Losses'); BooHoo2 := Stat(0,1.7797162035136915e+308,-1.7797162035136915e+308,0,0,0,0,0,More Losses) > UpdateStat( BooHoo, 10000 ): > UpdateStat( BooHoo, 30000 ): > UpdateStat( BooHoo2, 50000 ): > UpdateStat( BooHoo2, 60000 ): > BooHoo[Mean]; 20000 > BooHoo[Number]; 2 > Akk := BooHoo union BooHoo2; Akk := Stat(4,10000,60000,30000,1700000000,27000000000000,1130000000000000000,30000,Stock Market Losses and More Losses) > print(BooHoo); Stock Market Losses: number of sample points=2 mean = 20000 +- 19600 variance = 200000000 +- 999999 skewness=999999, excess=999999 minimum=10000, maximum=30000 See Also: ?CollectStat ?ExpFit ?LinearRegression ?UpdateStat ?Counter ?ExpFit2 ?OutsideBounds StatTest Chi-Square G-test Independence Friedman-Rafsky Function StatTest - Test a statistical hypothesis Option: polymorphic Calling Sequence: StatTest(test,data) Parameters: Name Type Description ------------------------------------------------------------------------- test string Indicator which test should be done data anything data used to test the hypothesis (type depends on test) Returns: TestStatResult Synopsis: This function tests several statistical hypotheses. The type of hypothesis to be tested is indicated via the first argument. Tests implemented so far: ChiSquare One-dimensional Chi-square test of independence (cells are assumed equally probable). "data" is a one-dimensional array of counts (non-negative integers). The data can also be a table of counts which must be indexed over the integers. Every non-zero entry of the table will be assumed an entry in the data. ChiSquare Two-dimensional Chi-square test of independence (rows and columns are assumed independent). "data" is a two-dimensional array of counts (non-negative integers).
The data can also be a table of counts which must be indexed over pairs of integers (lists of two integers). Every non-zero entry of the table will be assumed an entry in the data. Independence Two arrays of (any type of) data are grouped to test their independence. The most significant Chi-square test is reported. FriedmanRafsky Tests whether two samples, usually multivariate, come from the same distribution. Each sample must be input as a matrix in which each column is a sample. G One-dimensional G test of independence (cells are assumed equally probable). This is an instance of the likelihood ratio test applied to a list of equiprobable events. "data" is a one-dimensional array of counts (non-negative integers). The data can also be a table of counts which must be indexed over the integers. Every non-zero entry of the table will be assumed an entry in the data. G Two-dimensional G test of independence (rows and columns are assumed independent). This is an instance of the likelihood ratio test applied to tableaux. "data" is a two-dimensional array of counts (non-negative integers). The data can also be a table of counts which must be indexed over pairs of integers (lists of two integers). Every non-zero entry of the table will be assumed an entry in the data. For each hypothesis an internal function will be called that computes the test statistic from the data, the p-value from Cumulative and the standardized deviation from CumulativeStd. References: Rice JA, Mathematical Statistics and Data Analysis, 2nd ed.
chapter 13.4, p.489 Friedman, Rafsky (1979) "Multivariate Generalizations of the Wald-Wolfowitz and Smirnov Two-Sample Tests" Examples: > StatTest( ChiSquare,[[1,2,3],[4,5,6],[7,8,9]] ); TestStatResult(ChiSquare,0.4688,0.9765,-1.9858,[[1, 2, 3], [4, 5, 6], [7, 8, 9]],Degrees_of_freedom = 4) > StatTest( Independence, [A,B,B,B,B,A], [-1,3,4,3,4,-3] ); TestStatResult(ChiSquare,1.5000,0.2207,0.7699,[[2, 0], [2, 2]],Degrees_of_freedom = 1) > StatTest( FriedmanRafsky, [[1,5],[2,-1],[1,3]], [[1,-1],[3,4]] ); TestStatResult(FriedmanRafsky,0.6547,0.5127,0.6547) See Also: ?Cumulative ?OutsideBounds ?ProbCloseMatches ?Std_Score ?CumulativeStd ?ProbBallsBoxes ?Rand ?TestStatResult Std_Score Function Std_Score - conversion from standard deviations to Score Calling Sequence: Std_Score(s) Parameters: Name Type Description ------------------------------------------------ s numeric a number of standard deviations Returns: numeric Synopsis: This function converts a probability expressed in terms of standard deviations to a Score (-10*log10(Prob)). This is done in such a way that very large values can be handled with precision and without causing overflow/underflow. Formally, a Score is defined as: Score = -10 * log10( Prob{ Normal(0,1) < s } ) Examples: > Std_Score( -30 ); 1973.0921 > Std_Score( +30 ); 2.131e-197 See Also: ?Cumulative ?OutsideBounds ?ProbCloseMatches ?StatTest ?CumulativeStd ?ProbBallsBoxes ?Rand Student_Rand Function Student_Rand - Generate random Student's-t distributed reals Calling Sequence: Rand(Student(nu)) Parameters: Name Type ------------------ nu nonnegative Returns: numeric Synopsis: This function returns a random Student's t distributed number with average 0 and variance nu/(nu-2). If X is a Normal(0,1) random variable and X1 is a Chi-square random variable with parameter nu, X/sqrt(X1/nu) is Student(nu) distributed. Student_Rand uses Rand() which can be seeded by either the function SetRand or SetRandSeed. 
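The construction in the synopsis, X/sqrt(X1/nu) with X Normal(0,1) and X1 Chi-square(nu), can be sketched as follows. This is a Python illustration under stated assumptions, not Darwin's implementation: the name student_sample is hypothetical, nu is required to be strictly positive here, and the Chi-square variate is drawn as a Gamma variate with shape nu/2 and scale 2.

```python
import math
import random


def student_sample(nu: float, rng: random.Random) -> float:
    """Draw one Student(nu) variate as X / sqrt(X1 / nu)."""
    if nu <= 0:
        raise ValueError("nu must be positive in this sketch")
    x = rng.gauss(0.0, 1.0)                 # X  ~ Normal(0,1)
    x1 = rng.gammavariate(nu / 2.0, 2.0)    # X1 ~ Chi-square(nu) = Gamma(nu/2, scale 2)
    return x / math.sqrt(x1 / nu)


# A fixed seed reproduces the same sequence, mirroring the role of SetRand.
rng = random.Random(123)
samples = [student_sample(100, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((v - mean) ** 2 for v in samples) / (len(samples) - 1)
```

For nu=100 the sample variance should come out close to nu/(nu-2), about 1.02; that formula only applies for nu > 2.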
References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.7 Examples: > Rand(Student(3)); -0.9813 > Rand(Student(100)); 0.00779824 See Also: ?Beta_Rand ?Exponential_Rand ?Multinomial_Rand ?Shuffle ?Binomial_Rand ?FDist_Rand ?Normal_Rand ?StatTest ?ChiSquare_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score ?CreateRandSeq ?Geometric_Rand ?SetRand ?Zscore ?Cumulative ?Graph_Rand ?SetRandSeed SubDist Function SubDist( t:Tree, i:integer, j:integer ) Get the distance in PAM units from leaf i to leaf j SubTree Function SubTree( MinSquareTree:Tree, pam ) generates an expression sequence of SubTrees from a given MinSquareTree at a specified pam distance SurfIntActPred Function SurfIntActPred( MulAlign:array(string), MinSquareTree ) Generates the prediction of surface, interior and active site positions in a multiple alignment. SurfOut Function SurfOut( SurfMatrix:array(array(array)), SurfMatrixTot:array(array) ) Returns for each position the SurfProb of being on the surface, the number of variable subgroups at the specified MaxPW and SurfAA used to determine SurfProb Surface Function Surface( Cluster:list(list(list)), MA:array(string), MaxPW:array, SurfAA:array, ActMatrixOut:array ) Reports the number of variable subgroups at defined PAM windows in which at least one amino acid is of the type defined in SurfAA SurfaceTot Function SurfaceTot( SurfMatrix:array(array(array)) ) Reports the sum of the number of variable subgroups at defined PAM windows and SurfAAs counted over all positions SvdAnalysis Function SvdAnalysis( AtA:matrix(numeric), btA:list(numeric), btb:numeric, NData:posint, names:list(string), svmin:{numeric,First(posint)} ) SvdAnalysis does a least squares approximation and returns various measures of quality of the fit. Problem: Given a matrix of A (dim n x m) and a vector b (dim n), we want to find a vector x (dim m) such that Ax ~ b. This approximation is in the least squares sense, i.e. 
||Ax-b||^2 is minimum. The calling arguments are: AtA is a matrix (dim m x m) which is the product A^t * A btA is a vector (dim m) which is the product b^t * A btb is the norm squared of b, i.e. ||b||^2 = b^t * b NData is the number of data points (A is dim n x m) names is a list (dim m) of the names associated with each column of A, or with each value of x. svmin is a positive numeric value. All singular values less than svmin will not be used. Making svmin=0, all singular values are used, and this is equivalent to pure least squares. Alternatively, svmin can be the structure First(k), where k is a positive integer not greater than the dimension of AtA. In this case, the largest k singular values will be used. If the global variable ComputeSensitivity is set to false, SvdAnalysis will not compute the sensitivity analysis and will compute more quickly. For m > 100 this is highly recommended. Output: The output is a darwin data structure SvdResult( Norm2Err, SensitivityAnalysis, SingularValuesUsed, SingularValuesDiscarded, Norm2Indep, MinNorm2Err, SolutionVector, NData ) where: Norm2Err is the norm squared of the resulting approximation, i.e. ||Ax-b||^2 SensitivityAnalysis is a list of 4-tuples with m entries, each corresponding to one variable. Each entry is [nnn,vvv,sss,ttt], where: nnn is the name of the variable, vvv is the result value (the x[i] value) sss is an estimate of the standard deviation of vvv ttt is the amount by which ||Ax-b||^2 will increase if nnn would not be used. To compute this difference, all singular values are used. The list is sorted by decreasing ttt The list is only produced if the global variable ComputeSensitivity is not set to false, otherwise it is empty.
SingularValuesUsed is a list of the singular values used ( > svmin ) SingularValuesDiscarded is a list of the singular values discarded ( <= svmin ) Norm2Indep is simply btb, the norm squared of the independent variables, the maximum norm that could be reached MinNorm2Err is the norm of ||Ax-b||^2 if all singular values were used, i.e. is the minimum norm that could be achieved with these m variables. SolutionVector is the solution vector x NData is the number of data points (A is of dimensions n x m) A good summary explanation of the Svd analysis can be found in many books, for example Forsythe, Malcolm and Moler, Computer Methods for Mathematical Computations. See Also: ?SvdBestBasis ?LSBestSum ?LSBestDelete ?LSBestSumDelete SvdBestBasis Function SvdBestBasis - Least squares by selecting best basis (subset) Calling Sequence: SvdBestBasis(AtA,btA,btb,NData,names,k,svmin,try,startset) Parameters: Name Type Description ------------------------------------------------------------------------- AtA matrix(m,m) the product of A^t * A btA vector(m) the product b^t * A btb numeric the norm squared of b, i.e. b*b NData posint number of data points (dim A is n x m) names list(string) names associated with each column of A k posint number of variables in the solution svmin numeric optional lower limit for using singular values try posint optional, trials after a new local minimum startset list(integer) optional, k column numbers to start Returns: SvdResult Global Variables: SvdBestHash SvdBest_A SvdBest_d SvdGoodBases SvdGoodPerms SvdHashSig Svd_svmin Synopsis: SvdBestBasis finds the best set of k variables to do a least squares fit. For k<=2 the result is the global minimum (and the variable "try" is ignored), for k>2 this is a heuristic, not an exact algorithm, and its precision depends on how many trials are performed. The problem of finding the best set of variables, when done incrementally, one variable at a time, is called Stepwise regression.
The results of SvdBestBasis are generally much better than those obtained by stepwise regression. The problem is formally defined as follows: Given a matrix of A (dim n x m) and a vector b (dim n), we want to find a vector x (dim m) such that Ax ~ b, where x has k non-zero components and m-k zero components. This approximation is in the least squares sense, i.e. |Ax-b|^2 is minimum. The output is a SvdResult data structure. The global variable SvdGoodBases is assigned a list of SvdResult data structures for all the other local minima that are found. The global variable SvdGoodPerms is assigned a list of the permutations of the variables which gave the good bases in SvdGoodBases. SvdBestBasis prints information as it computes. The amount of information printed can be regulated with printlevel. svmin is an optional positive numeric value. All singular values less than svmin will not be used. Making svmin=0, all singular values are used, and this is equivalent to pure least squares. The selection of singular values is used for the final computation of the SvdResult, not for the computation of the best basis. try is an optional integer. It indicates the number of trials that will be done after a new local minimum is found before stopping. If omitted, 15 trials are done after the lowest norm has been found. startset is an optional list of k integers. SvdBestBasis will start its search for an optimum from this set. If try is greater than 1, then other trials, starting at random sets, will also be tried.
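The problem statement above can be made concrete with a small sketch. The Python code below is an illustration only and is not the search used by SvdBestBasis: it tries every subset of k columns exhaustively (feasible only for small m), ignores svmin, and uses hypothetical names (solve, best_subset, normal_equations). It relies on the fact that, for the least squares solution x on a column subset S, |Ax-b|^2 = btb - btA[S] * x.

```python
from itertools import combinations


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("singular subset")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def best_subset(AtA, btA, btb, k):
    """Return (error, columns, x) for the k columns with the smallest |Ax-b|^2."""
    best = None
    for cols in combinations(range(len(btA)), k):
        sub = [[AtA[i][j] for j in cols] for i in cols]
        rhs = [btA[i] for i in cols]
        try:
            x = solve(sub, rhs)
        except ValueError:
            continue  # skip linearly dependent column sets
        err = btb - sum(r * xi for r, xi in zip(rhs, x))
        if best is None or err < best[0]:
            best = (err, cols, x)
    return best


def normal_equations(A, b):
    """Build AtA, btA and btb from the data matrix A (n x m) and vector b (n)."""
    m = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(m)] for i in range(m)]
    btA = [sum(bi * row[j] for bi, row in zip(b, A)) for j in range(m)]
    btb = sum(bi * bi for bi in b)
    return AtA, btA, btb


# Made-up data: b equals 2*column0 + 3*column2 exactly (0-based columns).
A = [[1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 0, 1]]
b = [8, 3, 2, 7]
err, cols, x = best_subset(*normal_equations(A, b), k=2)
```

In this made-up example the search selects columns 0 and 2 with coefficients close to 2 and 3 and an error close to zero. The number of subsets grows combinatorially with m, which is why a heuristic with repeated trials is the practical choice for larger problems.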
See Also: ?ExpFit ?LSBestSum ?Stat ?SvdReduceGood ?LSBestDelete ?LSBestSumDelete ?SvdAnalysis ?SvdResult SvdResult Class SvdResult - results of a least squares approximation, Ax=b Template: SvdResult(Norm2Err,SensitivityAnalysis,SingularValuesUsed, SingularValuesDiscarded,Norm2Indep,MinNorm2Err,SolutionVector, NData) Fields: Name Type Description -------------------------------------------------------------------------------- Norm2Err numeric norm of approximation |Ax-b|^2 SensitivityAnalysis list(list) results with sensitivity analysis SingularValuesUsed list(numeric) singular values used SingularValuesDiscarded list(numeric) singular values discarded Norm2Indep numeric norm of independent variables, |b|^2 MinNorm2Err numeric |Ax-b|^2 if all singular values were used SolutionVector list(numeric) least squares solution, x NData posint number of data points (dim A is n x m) Methods: HTMLC print Rand SvdResult_type Synopsis: An SvdResult holds the result of a linear least squares approximation. Such an approximation is normally generated by SvdAnalysis or SvdBestBasis. The list with the sensitivity results has 4 entries per variable. These are the name of the variable, the result value (the x[i] value), an estimate of the standard deviation and the amount by which |Ax-b|^2 will increase if this variable would not be used. To compute this difference, all singular values are used. This list is sorted in decreasing order of the last argument. The list is only produced if the global variable ComputeSensitivity is not set to false, otherwise it is empty.
See also: ?SvdAnalysis ?SvdBestBasis Synteny Function Synteny - find the number of inversions of a permutation Calling Sequence: Synteny(perm,k) Parameters: Name Type Description -------------------------------------------------- perm list(posint) a permutation k posint (optional) effort to be done Returns: integer Synopsis: Synteny finds an approximation to the minimum number of inversions needed to transform the input permutation into a straight run (ascending or descending). The input permutation is a list of the integers from 1 to n, where n is the length of the list. An inversion operation is a modification of a permutation which selects a particular contiguous range and swaps its order. The problem of finding the synteny distance between two genomes can be easily reduced to the problem of finding the number of inversions to straighten the permutation. The parameter k gives the function a hint on how much work should be done; it is the number of partial solutions that will be kept during the search. The problem is NP-complete, so this algorithm searches for a good approximate solution. The higher k, the more work will be done. For a particular problem, the amount of work is linear in k. By default k=10. Examples: > Synteny( [1,7,8,9,6,5,4,2,3] ); 3 > Synteny( [4,5,6,1,2,3,7,8,9] ); 3 See Also: ?DrawTree ?LeastSquaresTree ?SignedSynteny ?GapTree ?PhylogeneticTree SystemCommand Function SystemCommand - execute a system command Calling Sequence: SystemCommand(operation,addit_args) Parameters: Name Type Description ----------------------------------------------------------- operation string the name of the system operation addit_args string (optional) additional argument needed Returns: numeric Synopsis: This command is provided to isolate system dependencies for performing some operations which require execution of other, standard, programs in the system.
The optional additional arguments are dependent on the operation and are typically file names on which the commands should be run. The value returned is the integer value returned by the CallSystem command that will run this operation. This command also allows for simple customization for non-standard installations. In this case, the file lib/SystemCommand may have to be extended with particular commands for your system. The valid values for operation are: HTML HTML viewer -- one additional parameter, the name of the file which contains html source. The process should be detached to allow stand-alone perusal. postscript postscript viewer -- one additional parameter, the name of the postscript file. (Usually a file ending in ".ps"). The process should detach to allow stand-alone perusal. This is the command that will show all the darwin plots. darwin darwin -- two additional parameters, the name of a file with darwin input commands and the name of the file where the output will be placed. The input file should end with a "quit" command, else the spawned darwin will attempt to read from the user once all the commands are executed. gimp picture processing software (could be gimp, photoshop or something equivalent) -- one additional parameter, the name of the file (typically a jpg, gif, ps or pdf) rm remove file(s) -- one additional argument with the name(s) of the file(s) to be removed. The removing is forced and without questions asked. maple the maple computer algebra system -- two additional parameters, the name of a file with maple input commands and the name of the file where the output will be placed. Maple is run with option quiet to avoid unnecessary/confusing output.
See also: ?CallSystem ?date ?hostname ?TimedCallSystem TPIDistr Function TPIDistr - distribution of number of changes in a sequence Option: builtin Calling Sequence: TPIDistr(a1,a2,a3,a4) Parameters: Name Type Description ------------------------------------------------- a_i posint number of symbols of the ith type Returns: list(numeric) Synopsis: The arguments (any number from 1 to 4) are taken to be the number of symbols of each type. a1 is the number of symbols of type 1, a2 the number of symbols of type 2, etc. TPIDistr computes the probability distribution of the number of transitions in a random sequence with a1, a2, ... symbols of each type. This has a special application in computing the TPI index (tRNA Pairing Index) which measures how autocorrelated are the tRNAs that translate a given amino acid, independently of the frequencies of the tRNAs and codons. The distribution is returned in a list, and the first entry corresponds to 0 changes, the second to 1 change, etc. The number of changes can never exceed a1+a2+...-1, so the list returned is of length a1+a2+... For example, there are 3 ways of permuting 2 A's and one B. AAB, ABA and BAA. Two sequences have one transition and one sequence has two transitions. so the result in this case should be [0,2/3,1/3]. Examples: > TPIDistr(1,2); [0, 0.6667, 0.3333] > TPIDistr(1,2,3,4); [0, 0, 0, 0.00190476, 0.01714286, 0.08095238, 0.2167, 0.3310, 0.2671, 0.08523810] See also: ?ComputeTPI ?SetuptRNA TT Class TT - placeholder for text that should be displayed "as is" Template: TT(string1,...) 
Fields: Name Type Description --------------------------------------------- string1 string text to be displayed as is Returns: TT Methods: HTMLC LaTeXC print string TT_type Synopsis: The TT data structure holds text that is to be displayed using a constant width font (like in a typewriter). Examples: > TT( 'for i to 10 do lprint(i^2) od'); TT(for i to 10 do lprint(i^2) od) See Also: ?Block ?Document ?latex ?Roman ?Code ?HTML ?List ?RunDarwinSession ?Color ?HyperLink ?Paragraph ?screenwidth ?Copyright ?Indent ?PostscriptFigure ?Table ?DocEl ?LastUpdatedBy ?print ?View Table Class Table - structure to print/display tables Template: Table(arg1,...,argn) Fields: Name Type Description ------------------------------------------------------------------------------------------- arg1..n anything components of table in any order center the entire table is centered border the entire table is framed with a border gutter=posint set gutter between columns gutter=list(posint) set gutter for each individual column ColAlign({string,p(posint)}...) set alignment for each individual column RowAlign(string) set vertical alignment for following rows ('l', 'c' and 'r' for left, center, right) Row(args) a row of data, each argument in a column title=string title/caption to describe the table Values(args) args to be distributed columnwise rowwise uses Values(), but args are distributed rowwise width=posint width of the table in characters Rule draw a horizontal line SpanPrevious possible argument of Row Returns: Table Methods: HTMLC LaTeXC print string Table_type Synopsis: The Table structure holds information describing a table (or tabular information). This is expected to be laid out as a table either as text, latex, html or something else. If a Row structure has an element with the name 'SpanPrevious', then the previous entry will be expanded to occupy also the space of this entry (like \multicolumn in latex or colspan in html).
The alignment inside the cells is set with ColAlign - either l (left), r (right), c (center) or p(x) (paragraph with a fixed width of x characters). Examples: > t := Table( center, border, gutter=4, Row('abc','cde'),Row(1,1e9)): > print(t); ----------------------- | abc cde | | 1 1000000000 | ----------------------- See Also: ?Block ?Document ?latex ?Roman ?Code ?HTML ?List ?RunDarwinSession ?Color ?HyperLink ?Paragraph ?screenwidth ?Copyright ?Indent ?PostscriptFigure ?TT ?DocEl ?LastUpdatedBy ?print ?View TaxonId Function TaxonId - SwissProt species code to NCBI TaxonId Calling Sequence: TaxonId(org) Parameters: Name Type Description -------------------------------------- org string SwissProt species code Returns: integer Synopsis: Maps a SwissProt species code to the NCBI taxonomic identifier. If the species code is not known, the function returns an error. Examples: > TaxonId('HUMAN'); 9606 See also: ?SpeciesCode ?UpdateSpeciesCode TaxonomyDownload Function TaxonomyDownload - downloads the UniProt species taxonomy and converts it to a Darwin readable format Calling Sequence: TaxonomyDownload() Returns: NULL Synopsis: Downloads the UniProt species taxonomy hierarchy from the UniProt webpage and converts it to Darwin tables that are stored in the file UniProtTaxonomy.drw, which is located in Darwin's data directory.
See also: ?SpeciesCode ?TaxonId ?TaxonomyEntry TaxonomyEntry Class TaxonomyEntry - data structure holding TaxonomyEntry information Fields: Name Type Description ---------------------------------------------------------------------------------------- id {integer,string} the id/name of the taxonomic level Scientific Name string scientific name of level Common Name string common name of level (or empty string) Synonym string synonym name of level (or empty string) Other names list(string) list of other names of level Species code string the UniProt species identifier (or empty string) Parent TaxonomyEntry the direct parent node in the taxonomy Children list(TaxonomyEntry) the direct children nodes in the taxonomy Lineage list(string) the lineage tree Lineagestring string the lineage tree as one string ('; ' separated) Methods: print Rand select string TaxonomyEntry_type Synopsis: The TaxonomyEntry data structure allows easy access to the different names, IDs and parent/children entries. The selectors are all case insensitive. The constructor accepts a taxonomic identifier, a UniProt species identifier or a scientific species name and returns the instance of the TaxonomyEntry data structure for the desired taxonomic level. Examples: > t := TaxonomyEntry(9606); t := TaxonomyEntry(9606) > seq(z['sciname'], z= t['children']); Homo sapiens neanderthalensis, Homo sapiens ssp. Denisova > t['comname']; Human See also: ?SpeciesCode ?TaxonId ?TaxonomyDownload TempName Function TempName( ) Generate file names that can safely be used for a temporary file. Optional arguments are: Dir = string and Prefix = string which allow the user to control the choice of a directory and prefix.
TestGradHessian Function TestGradHessian Calling Sequence: TestGradHessian(f,f1,f2,point) Parameters: Name Type Description ---------------------------------------------------------------------- f procedure multivariate numerical function f1 procedure gradient of f, returns a vector f2 procedure hessian of f, returns a square matrix point list(numeric) (optional) value at which to test n posint (optional) dimension of argument of f Tol Tolerance = positive (optional, default=100) error tolerance Returns: boolean Synopsis: The TestGradHessian function is used to test whether the first and second derivatives of a function are computed correctly. This test is run at the given point (or at a random point instead). The arguments to f, f1 and f2 are vectors (lists) of dimension n. The output of f must be a number, the output of f1 must be a list of numbers of dimension n (the partial derivatives of f) and the output of f2 must be a matrix (n x n) with the second partial derivatives of f. TestGradHessian computes approximations to the gradient and the hessian by computing f and f1 at various points. If the results are within 100 times the minimal expected error, the function returns true, else it prints some information about the failure and returns false. 
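The kind of check described above can be illustrated with a minimal sketch. It is written in Python rather than Darwin and is not the library's implementation: the names check_grad_hessian, h and tol are assumptions made for illustration, and a fixed absolute tolerance stands in for Darwin's test against the minimal expected error. The gradient is approximated by central differences of f, and the Hessian by central differences of f1.

```python
import math

def check_grad_hessian(f, f1, f2, point, h=1e-5, tol=1e-6):
    """Compare f1 and f2 with central-difference approximations at point.

    h (step size) and tol (absolute tolerance) are illustrative choices,
    not the error model used by Darwin's TestGradHessian.
    """
    n = len(point)
    grad = f1(point)
    hess = f2(point)
    for j in range(n):
        xp = list(point)
        xm = list(point)
        xp[j] += h
        xm[j] -= h
        # First derivative with respect to x[j], from values of f.
        if abs((f(xp) - f(xm)) / (2 * h) - grad[j]) > tol:
            return False
        # Column j of the Hessian, from values of the gradient f1.
        gp, gm = f1(xp), f1(xm)
        for i in range(n):
            if abs((gp[i] - gm[i]) / (2 * h) - hess[i][j]) > tol:
                return False
    return True

# The function and derivatives used in the Examples of this entry.
def f(x):
    return math.cos(x[0]) * math.tan(x[1])

def f1(x):
    return [-math.sin(x[0]) * math.tan(x[1]),
            math.cos(x[0]) * (1 + math.tan(x[1]) ** 2)]

def f2(x):
    t = math.tan(x[1])
    return [[-math.cos(x[0]) * t, -math.sin(x[0]) * (1 + t * t)],
            [-math.sin(x[0]) * (1 + t * t),
             2 * math.cos(x[0]) * t * (1 + t * t)]]

print(check_grad_hessian(f, f1, f2, [0.3, 0.5]))  # True
```

A gradient with a sign error, for example, is rejected by the same call, which is the situation the test is meant to catch.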
The error tolerance can be changed from 100 to any desired number with the corresponding optional argument Examples: > f := proc(x) cos(x[1])*tan(x[2]) end: > f1 := proc(x) [-sin(x[1])*tan(x[2]), cos(x[1])*(1+tan(x[2])^2)] end: > f2 := proc(x) [[-cos(x[1])*tan(x[2]), -sin(x[1])*(1+tan(x[2])^2)], [-sin(x[1])*(1+tan(x[2])^2), 2*cos(x[1])*tan(x[2])*(1+tan(x[2])^2)]] end: > TestGradHessian(f,f1,f2,[0.3,0.5]); true > TestGradHessian(f,f1,f2,[0.3,0.5],Tolerance=0.4); (2,1) second derivative, error too large: -3.97238e-12, f1[1]p=-0.161445100295, f1[1]m=-0.16144174911, (f1[1]p-f1[1]m)/h=-0.38371715, f2[2,1]=-0.38371715, h=8.73348e-06 err / (DBL_EPSILON*(|gp[j]|+|gm[j]|)/h) = -0.483891 See Also: ?BFGSMinimize ?MaxLikelihoodSize ?MinimizeFunc ?DisconMinimize ?Minimize2DFunc ?MinimizeSD ?MaximizeFunc ?MinimizeBrent ?NBody TestStatResult Class TestStatResult - result of a statistical test Template: TestStatResult(name,TestStat,pvalue,pstd) Fields: Name Type Description ----------------------------------------------------------------------- name string name of the statistical test TestStat numeric test statistic computed from the data pvalue numeric p-value (probability value) pstd numeric p-value in standard deviations plog numeric natural logarithm of the p-value CountMatrix array(integer) count matrix, (optional, e.g. ChiSquare) Methods: print Rand select string Table TestStatResult_type Synopsis: A TestStatResult holds a result of a statistical test. It is normally generated by the StatTest function. The pvalue depends on the test, and in general it is the probability that such a result is obtained by chance. Extreme values (very close to 0 or very close to 1) are hence very rare. The pstd value measures the pvalue too. It is the number of standard deviations away that the pvalue would be if it were a normally distributed variable. It is useful to measure very extreme probabilities, where the p-values may be out of precision. 
For extremely small values of the p-value, the selector plog may be more practical; it records the natural logarithm of the p-value. For the ChiSquare test the count matrix is returned as the fifth field and is associated with the selector CountMatrix. Besides the first four fields and the CountMatrix, the structure can hold any number of additional arguments, which are test-dependent. These extra arguments are typically of the form string=anything. TestStatResult prints nicely via the print method. Any symbol of type string=anything will be printed using the format (%s = %a). Any occurrences of _ in the string will be replaced by a space. An example for this is Degrees_of_freedom=x in the ChiSquare test. See also: ?StatTest TetrahedronGraph Function TetrahedronGraph - generate graphs describing regular polyhedra Calling Sequence: TetrahedronGraph() HexahedronGraph() OctahedronGraph() IcosahedronGraph() DodecahedronGraph() Returns: Graph Synopsis: Generate a graph which corresponds to a regular polyhedron. That is, a graph whose vertices correspond to the vertices of the regular polyhedron and whose edges correspond to its edges.
Examples: > TetrahedronGraph(); Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,4),Edge(0,2,3),Edge(0,2,4),Edge(0,3,4)),Nodes(1,2,3,4)) > HexahedronGraph(); Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,6),Edge(0,3,4),Edge(0,3,7),Edge(0,4,8),Edge(0,5,6),Edge(0,5,8),Edge(0,6,7),Edge(0,7,8)),Nodes(1,2,3,4,5,6,7,8)) > OctahedronGraph(); Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,5),Edge(0,2,6),Edge(0,3,4),Edge(0,3,6),Edge(0,4,5),Edge(0,4,6),Edge(0,5,6)),Nodes(1,2,3,4,5,6)) > IcosahedronGraph(); Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,4),Edge(0,1,5),Edge(0,1,6),Edge(0,2,3),Edge(0,2,6),Edge(0,2,7),Edge(0,2,8),Edge(0,3,4),Edge(0,3,8),Edge(0,3,9),Edge(0,4,5),Edge(0,4,9),Edge(0,4,10),Edge(0,5,6),Edge(0,5,10),Edge(0,5,11),Edge(0,6,7),Edge(0,6,11),Edge(0,7,8),Edge(0,7,11),Edge(0,7,12),Edge(0,8,9),Edge(0,8,12),Edge(0,9,10),Edge(0,9,12),Edge(0,10,11),Edge(0,10,12),Edge(0,11,12)),Nodes(1,2,3,4,5,6,7,8,9,10,11,12)) > DodecahedronGraph(); Graph(Edges(Edge(0,1,2),Edge(0,1,5),Edge(0,1,6),Edge(0,2,3),Edge(0,2,8),Edge(0,3,4),Edge(0,3,10),Edge(0,4,5),Edge(0,4,12),Edge(0,5,14),Edge(0,6,7),Edge(0,6,15),Edge(0,7,8),Edge(0,7,16),Edge(0,8,9),Edge(0,9,10),Edge(0,9,17),Edge(0,10,11),Edge(0,11,12),Edge(0,11,18),Edge(0,12,13),Edge(0,13,14),Edge(0,13,19),Edge(0,14,15),Edge(0,15,20),Edge(0,16,17),Edge(0,16,20),Edge(0,17,18),Edge(0,18,19),Edge(0,19,20)),Nodes(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)) See Also: ?BipartiteGraph ?Graph_minus ?Nodes ?Clique ?Graph_Rand ?ParseDimacsGraph ?DrawGraph ?Graph_XGMML ?Path ?Edge ?InduceGraph ?RegularGraph ?EdgeComplement ?MaxCut ?ShortestPath ?Edges ?MaxEdgeWeightClique ?VertexCover ?FindConnectedComponents ?MinCut ?Graph ?MST TextBlock Class TextBlock - builds a named block around content Template: TextBlock(blockname,content1,...) 
Returns: TextBlock Fields: Name Type Description -------------------------------------------------------------- blockname string the name of the block content_i {string,structure} the text content of the block Methods: HTMLC LaTeXC string TextBlock_type Synopsis: A TextBlock is only meaningful in the context of a structured output format such as LaTeX or (X)HTML. If used in a normal print statement, TextBlock will just output the content parameters. If used in a LaTeXC statement, TextBlock will create an environment called 'blockname' around the content. Examples: > b := TextBlock( 'abstract', 'This is my funny abstract.' ); b := TextBlock(abstract,This is my funny abstract.) > print(b); This is my funny abstract. > prints(LaTeXC(b)); \begin{abstract}This is my funny abstract.\end{abstract} See Also: ?Block ?HTML ?Paragraph ?Table ?Code ?HyperLink ?PostscriptFigure ?TT ?Color ?Indent ?print ?View ?Copyright ?LastUpdatedBy ?Roman ?DocEl ?latex ?RunDarwinSession ?Document ?List ?screenwidth TextHead Function TextHead - Find the beginning of a string Option: builtin Calling Sequence: TextHead(x) Parameters: Name Type Description ----------------------------------- x string an arbitrary string Returns: integer Synopsis: Returns the offset to be added to x (on the left) to obtain the first character of the string containing x. Examples: > a := 'CYQQSVWPFMDYQQFQGFSWKMPLGNNH'; a := CYQQSVWPFMDYQQFQGFSWKMPLGNNH > a1 := a[10..20]; a1 := MDYQQFQGFSW > TextHead(a1); -9 > TextHead(a1)+a1; CYQQSVWPFMDYQQFQGFSW See also: ?GetOffset ?TextHandling TimedCallSystem Function TimedCallSystem Option: builtin Calling Sequence: TimedCallSystem(cmd) TimedCallSystem(cmd,timeout) Parameters: Name Type ----------------------------------------------- cmd a string containing a system command timeout an optional integer number of seconds Returns: [integer, string] : return code and result of command Synopsis: The "cmd" argument is passed to the underlying operating system. 
If the optional "timeout" argument is specified, Darwin allows for "timeout" seconds of execution. If the command does not terminate in the allocated time, it is killed and TimedCallSystem returns [-1, '(Timeout)'], otherwise it returns a list consisting of the execution return code value returned by the operating system and the output generated by cmd. The output is returned as a string. It will normally be ended with a newline character. Normally, a return code 0 indicates successful execution. Examples: > TimedCallSystem(date,10); [0, Tue Feb 19 10:54:49 CET 2013 ] > TimedCallSystem('sleep 5',3); [-1, (Timeout)] See also: ?CallSystem ?SystemCommand ?time ?UTCTime TotalAlign Function TotalAlign Option: builtin Calling Sequence: TotalAlign(m,DM,goal) Parameters: Name Type Description ------------------------------------ m Match a Match DM DayMatrix a Dayhoff Matrix goal numeric a threshold value Returns: list(Match) Synopsis: The TotalAlign function implements the Smith-Waterman algorithm SmithW81 with an extension to find all independent local alignments of the complete sequences of 'm' reaching a score of at least 'goal'. The alignments are computed at the PAM distance defined by the similarity matrix DM. Examples: See also: ?CreateDayMatrices ?MAlign TotalTreeWeight Function TotalTreeWeight( t:Tree ) Returns the sum of the length of all branches in PAM units Transcribe Function Transcribe - DNA to RNA Calling Sequence: Transcribe(dna) Parameters: Name Type Description ------------------------------- dna string string of bases Returns: string Synopsis: Replaces all T with U. Examples: > Transcribe('ATG'); AUG See also: ?BackTranscribe ?Translate Translate Function Translate - DNA to Protein Calling Sequence: Translate(dna) Parameters: Name Type Description ----------------------------------------- dna string sequence to be translated Returns: string Synopsis: Translate a DNA sequence into a protein sequence.
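The two conversions described in the Transcribe and Translate entries can be sketched as follows. This is a Python illustration, not Darwin code, and it rests on stated assumptions: the standard genetic code, upper-case input, translation that stops at the first stop codon, and a trailing incomplete codon that is ignored. The names transcribe, translate and CODON_TABLE are hypothetical; Darwin's own handling of stop codons and of other genetic codes may differ.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe(dna):
    """Replace every T with U (DNA to RNA)."""
    return dna.replace("T", "U")

def translate(dna):
    """Translate codon by codon, stopping at the first stop codon (assumption)."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[pos:pos + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

print(transcribe("ATG"))          # AUG
print(translate("ATGAAATTTTAA"))  # MKF
```

Building the table from the 64-character string keeps the sketch short; a production version would also need to decide how to treat ambiguous bases.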
Examples: > Translate('ATGAAATTTTAA'); MKF See also: ?BackTranslate ?Transcribe Tree Class Tree - Internal node of a binary Tree Template: Tree(Left,Height,Right,xtra) Fields: Name Type Description -------------------------------------------------------- Left {Leaf,Tree} recursive left subtree Right {Leaf,Tree} recursive right subtree Height anything any information, usually height xtra anything (optional) additional information Returns: Tree Methods: GetPartitions Graph GraphR matrix Newick Rand select Signature Tree_type Synopsis: The Tree data structure holds binary trees which may or may not be labelled and/or weighted. The Left and Right subtree of the tree are either (1) a Tree structure or (2) a Leaf structure. Many built-in Darwin routines for phylogenetic trees, assume that the Height field refers to the height of the node. These routines include DrawTree. The use of the xtra field varies significantly from algorithm to algorithm. Examples: > t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D))); t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D))) > t[Left]; Tree(Leaf(A),5,Leaf(B)) > t[Right]; Tree(Leaf(C),11,Leaf(D)) See Also: ?BipartiteSquared ?IntraDistance ?Prefix ?BootstrapTree ?Leaf ?RBFS_Tree ?ComputeDimensionlessFit ?LeastSquaresTree ?ReconcileTree ?DrawTree ?Leaves ?RobinsonFoulds ?GapTree ?PhylogeneticTree ?SignedSynteny ?Infix ?Postfix ?Synteny TreeAngles Function TreeAngles( g:Graph ) Find angles for edges of g (being a tree) in order to draw it. TreeConstruction Data structure TreeConstruction( ) Function: creates a gap heuristic data structure Selectors: Algorithm: Type string Method of tree construction. PROB: probabilistic model (MinSquareTree) TSP: TSP method PHYLIP: an algorithm from the phylip package default: PROB Method: Type string If method is TSP, then type describes what kind of TSP method to use. The type describes which leaves to connect in the connection step. 
Possible values are: - LINEAR: smallest "swapping" error (order n) The next three methods calculate ALL errors per step (order n^2). - MINSQUARE: choose leaves with minimum sum of square of errors - AVERAGE: smallest average error - MINMAX: minimal maximal error - TREE: minimal tree fitting index of subtree constructs a MinSquareTree of each subtree - DOUBLETSP: use TSP again to find *another* circular order and connect the leaves that were swapped. Is of order n^3 of course in each step ... so about n^4 in total Phylip package: - NEIGHBOR - KITSCH - FITCH default: LINEAR Relative: Type boolean true if relative error should be considered. false if absolute error should be used. default: true Simultan: Type real values < 0 mean only do one connection at a time Otherwise it is the maximal relative error up to which connections are made. range: -1, 0.0 - 1.0 default: 0.1 Dynamic: Type real values < 0 mean do NOT use dynamic programming. Otherwise it is the maximal relative error up to which connections are made. range: -1, 0.0 - 1.0 default: 0.2 AdjustEps: type boolean true if the maximal error (param. dynamic) should be adjusted, if the smallest error is larger than the error specified, or if all errors are smaller than the error specified. false if errors should not be changed. Default: true Maxbranch: Type real Maximum number relative to n (nr of leaves) up to which connections should be kept, rounded to the next bigger integer. Only used if Dynamic is > 0. range: -1, 0.0 - 1.0 default: 4 Minbranch: Type real Used if Dynamic > 0. Determines how many connections should be considered in one step in ANY case, even if the error is too large. Values between 0.0 and 1.0 are relative to n (nr of leaves). Values > 1 are absolute values. There is always at least ONE connection. range: 0.0 - 1.0, positive integer default value: 1 Limit: Type real Max. number of trees to keep in memory, if Dynamic is > 0 -1 means no limit. The number is relative to n, the number of leaves.
range: -1, positive integer default: 3 Data: anything could be an array of statistics or any other information MSAScores: boolean true: uses the scores calculated from the MSA to reconstruct tree false: uses scores from the allall to reconstruct the tree default: false Scoring: String Scoring of trees. Can be PAM, SCORE, ERROR, INDEX, COMBINED PAM: tree w. smallest PAM distance is best tree SCORE: tree w. largest SCORE is best tree ERROR: tree w. smallest turn-error is best tree INDEX: tree w. smallest fitting index is best tree MSA: tree w. best associated msa is best tree default: PAM Datatype: String Data used for tree construction. Can be PAM, SCORE PAM: use PAM distances instead of scores SCORE: use scores and not PAM distances default: PAM TreeResult Class TreeResult - the result of a tree reconstruction call Template: TreeResult(Tree,Type,Other) Fields: Name Type Description --------------------------------------------------------------------------------------- Tree Tree the maximum likelihood tree Type string type of reconstruction (ML/Distance/Parsimony/Other) Name string (opt) arbitrary name to identify the tree Likelihood numeric (opt) log(Likelihood) for ML trees Alpha numeric (opt) alpha parameter of Gamma correction InvSites numeric (opt) invariant sites BaseFreqs list(numeric) (opt) base frequencies SubstModel string (opt) substitution model Method string (opt) name of the function used to build the tree CPUtime numeric (opt) seconds used to build the tree LSError nonnegative (opt) Weighted branch length errors (Distance) CharChanges integer (opt) Number of character changes needed (Parsimony) LnLperSite list(numeric) (opt) List of loglikelihood values per site Methods: print Rand select string TreeResult_type Synopsis: A TreeResult stores the result of a maximum likelihood tree reconstruction. Parameters that have not been estimated are unassigned.
See also: ?PhyML ?RAxML ?RellTree ?Tree TreeSize Function TreeSize - Number of leaves in a tree Calling Sequence: TreeSize(t) Parameters: Name Type Description ------------------------- t Tree a Tree Returns: integer Synopsis: Traverses a tree and returns the number of leaves. Examples: > t := Rand(Tree): > TreeSize(t); 12 See also: ?CenterTreeRoot ?RotateTree TreeStatistics Data structure TreeStatistics( ) Data structure that keeps statistical data about tree constructions and methods Selectors: Type: Tree Information on the Tree that was used Construction: TreeConstruction Information about the TreeConstruction type that was used Real: Integer Number of exact tree constructions (in position 1) Prob: Integer Number of trees that were the same as the tree calculated by the probabilistic model Total: Integer Total number of trees constructed Time: Stat() Construction time Position: Stat() Position of the real tree in the list of constructed trees 1 is optimal Error: Stat() Average error for each connection step Number: Stat() Average number of trees at the end of construction Index: Stat() Tree fitting index Deltaindex: Stat() Difference of tree fitting index of real tree and constructed tree Topology: Stat() Average topology distance of trees Name: string Name/Title of these statistics Found: Integer How often was the tree found (anywhere) Notfound: Integer Goodindex: Integer If tree was not found: how often was index larger than that of real tree (-> good measure) Goodpam: Integer If tree was not found: how often was total pam distance larger than that of real tree (-> good measure) Goodscore: Integer If tree was not found: how often was score smaller than that of real tree (-> good measure) Goodmsa: Integer If tree was not found: how often was score of the msa smaller than that of real tree (-> good measure) Msa: Numeric Difference in Score of real msa minus score of calculated msa of constructed tree TreeToPam Function TreeToPam( tree ) returns an expression
sequence which contains the PAM distance of the leaves of a tree (or a leaf) Tree_Graph Function Tree_Graph( no:Tree ) Convert a binary tree into a graph (unrooted tree). Tree_matrix Function Tree_matrix - Distance Matrix induced from Tree Calling Sequence: Tree_matrix(t) Tree_matrix(t,leaves) Parameters: Name Type Description ------------------------------------------------------------------ t Tree the given tree leaves {list,procedure,table} (optional) leaf to index mapping Returns: matrix(nonnegative) Synopsis: This function extracts the pairwise distances between any two leaves on a tree and returns them in a distance matrix. If the optional 'leaves' argument is not provided, the 'Label' or 3rd field of the Leaf data structures has to contain the indices to the matrix. Otherwise, the leaves argument has to be either a list of leaf labels, a table pointing from labels to indices or a function returning the appropriate index for a Leaf data structure. Examples: > t := Tree(Leaf(A,1.2),0,Tree(Leaf(B,1.8),0.9,Leaf(C,1.4))); t := Tree(Leaf(A,1.2000),0,Tree(Leaf(B,1.8000),0.9000,Leaf(C,1.4000))) > Tree_matrix(t,[A,B,C]); [[0, 3, 2.6000], [3, 0, 1.4000], [2.6000, 1.4000, 0]] See also: ?CreateArray ?Leaf ?LeastSquaresTree ?PhylogeneticTree ?Tree UTCTime Function UTCTime - UTC time in seconds or wall-clock time of evaluation Option: builtin Calling Sequence: UTCTime() UTCTime(expr) Parameters: Name Type ----------------- expr expression Returns: numeric Synopsis: This function returns the total wall-clock time taken to evaluate the expression expr. When no expression is passed, it returns the number of seconds since 00:00:00 GMT, January 1, 1970. This is called UTC time or Coordinated Universal Time.
Examples: > UTCTime(); 1361267692.2392 > UTCTime(log10(factorial(100))); 5.0068e-06 > UTCTime( CallSystem('sleep 2') ); 2.0325 See also: ?date ?time ?TimedCallSystem UnassignGlobals Function UnassignGlobals - unassigns all global variables from a given function Calling Sequence: UnassignGlobals(func) UnassignGlobals(func,ex) Parameters: Name Type Description ---------------------------------------- func procedure the function ex set (optional) exceptions Returns: NULL Synopsis: UnassignGlobals unassigns all global variables that are set by a given function. The optional second argument allows the user to define a set of variables that should be excluded from this. Examples: > Clique(TetrahedronGraph()); {1,2,3,4} > CliqueUpperBound; 4 > UnassignGlobals(Clique); > CliqueUpperBound; CliqueUpperBound See also: ?Globals UnionFind Class UnionFind - Implementation of the Union-Find data structure and algorithm Template: UnionFind(Elements) UnionFind() Fields: Name Type Description ---------------------------------------------------------------------- Elements {list,list(set)} (optional) initial elements Clusters list(set) sets resulting from the union operations Returns: UnionFind Methods: plus print select string union UnionFind_type Synopsis: The Union-Find data structure allows one to repeatedly join two sets. The algorithm's performance, given m union/find operations of any ordering, on n elements takes O(log(n)*m*a(m,n)) where a(m,n) is the inverse Ackermann function, thus close to O(log(n)) per operation. Sets can be unified by performing a union operation on the UnionFind data structure and a list containing two elements, one from each of the two sets. New sets can be added to the data structure using the plus function. References: Algorithmen und Datenstrukturen, T. Ottmann and P.
Widmayer, Spektrum, Akad.Verl,,1996 Examples: > uf := UnionFind([{22,14,31},{12,41,23},{4},{99,25}]): > union(uf,[14,99]): > uf[Clusters]; [{4}, {12,23,41}, {14,22,25,31,99}] > uf + {33,2,6}: > union(uf, [2,4]): > uf[Clusters]; [{12,23,41}, {14,22,25,31,99}, {2,4,6,33}] UpdateSpeciesCode Function UpdateSpeciesCode - downloads the SwissProt-NCBI species mapping Calling Sequence: UpdateSpeciesCode() Synopsis: Downloads the mapping between the SwissProt species codes and the NCBI taxonomic identifiers from http://www.expasy.ch/cgi-bin/speclist and converts it into a Darwin readable file called speciescode.drw which is located in Darwin's data directory. See also: ?SpeciesCode ?TaxonId UpdateStat Function UpdateStat - Add sample point to Stat Data Structure Calling Sequence: UpdateStat(name,number) Parameters: Name Type Description ----------------------------------------------------------- name Stat Stat data structure to be updated number numeric value to be added to Stat data structure Returns: Stat Synopsis: UpdateStat is used to add a sample point to an existing Stat data structure. Examples: > BooHoo := Stat('Stock Market Losses'): > UpdateStat( BooHoo, 10000 ): > UpdateStat( BooHoo, 30000 ): > BooHoo[Mean]; 20000 > BooHoo[Number]; 2 > print(BooHoo); Stock Market Losses: number of sample points=2 mean = 20000 +- 19600 variance = 200000000 +- 999999 skewness=999999, excess=999999 minimum=10000, maximum=30000 See Also: ?CollectStat ?ExpFit ?LinearRegression ?Stat ?Counter ?ExpFit2 ?OutsideBounds VertexCover Function VertexCover - Vertex Cover exact/approximate algorithm Option: builtin Calling Sequence: VertexCover(A) Parameters: Name Type Description -------------------------- A Graph a Graph Returns: set Synopsis: The input to this algorithm is an undirected graph. An undirected graph is represented as a Graph data structure which should accept two selectors: Nodes and Edges. The Vertex Cover problem is finding the minimum set of vertices which "cover" all edges. 
That is a minimum size set of vertices such that each edge is incident to at least one of the vertices in this set. The output is a set of the Nodes in the vertex cover. The algorithm computes a lower bound on the size of the vertex cover which is left in the global variable VertexCoverLowerBound. If this coincides with the size of the answer, it means that the answer is optimal. The global variable VertexCoverIterFactor may be assigned a non-negative number f. The algorithm will then run for f*n^2 iterations. If f=0 then only the greedy heuristic is run, and this is quite fast. The larger f, the more accurate the answers will be, and the more time the algorithm will consume. The Vertex Cover problem is closely related to the Clique problem. They can be related by the following formula: VertexCover(G) = NodeComplement(Clique(EdgeComplement(G))) Examples: > VertexCover(PetersenGraph()); {1,2,3,6,8,10} > VertexCoverLowerBound; 6 See Also: ?BipartiteGraph ?Graph_minus ?Nodes ?Clique ?Graph_Rand ?ParseDimacsGraph ?DrawGraph ?Graph_XGMML ?Path ?Edge ?InduceGraph ?RegularGraph ?EdgeComplement ?MaxCut ?ShortestPath ?Edges ?MaxEdgeWeightClique ?TetrahedronGraph ?FindConnectedComponents ?MinCut ?Graph ?MST View Function View - show an object on the screen in a visual way Option: polymorphic Calling Sequence: View(t) Parameters: Name Type Description ------------------------------------------- t anything an object to be displayed Returns: NULL Synopsis: This function attempts to display an object in a visual way. If the object is an HTML file, a browser will be called. If it is a plot, a postscript viewer will be called, if it is a Latex file, the xdvi viewer will be called. This function is very system dependent, it works only in unix/linux, and assumes that the underlying programs are available. 
Examples: > View(Histogram(data)); > View(HTML(doc)); See Also: ?Block ?Document ?latex ?Roman ?Code ?HTML ?List ?RunDarwinSession ?Color ?HyperLink ?Paragraph ?screenwidth ?Copyright ?Indent ?PostscriptFigure ?Table ?DocEl ?LastUpdatedBy ?print ?TT ViewPlot Function ViewPlot - run a viewer on a plot just created Calling Sequence: ViewPlot() Returns: NULL Synopsis: Start ghostview showing the current output of DrawPlot() or most other plotting commands, in the proper orientation. See Also: ?BrightenColor ?DrawPlot ?PlotArguments ?ColorPalette ?DrawPointDistribution ?Set ?DrawDistribution ?DrawStackedBar ?SmoothData ?DrawDotplot ?DrawTree ?StartOverlayPlot ?DrawGraph ?GetColorMap ?StopOverlayPlot ?DrawHistogram ?Plot2Gif VisualizeProtein Function VisualizeProtein( ms:list(NucPepMatch) ) Visualize the alignment of a protein with all its homologue genes. WeightObservations Function WeightObservations - Weight data for least squares analysis Calling Sequence: WeightObservations(A,b,w) Parameters: Name Type Description --------------------------------------------------------------- A matrix(numeric) n rows of m-dimensional data vectors b array(numeric) n-dimensional vector of dependent data w array(numeric) n-dimensional vector of weights Returns: [ AtA:matrix(numeric), btA:array(numeric), btb:numeric ] Synopsis: Prepare matrices and vectors used for least squares approximations with given weights. Given the matrix A (dim n x m) and the vector b (dim n) a least squares solution searches a vector x, such that Ax ~ b, or |Ax-b| is minimal in some sense. A weighted least squares problem is equivalent to the above, except that every error is weighted by a (non-negative) factor w[i]. This is equivalent to minimizing | W*(Ax-b) | (where W is a diagonal matrix of weights). In simpler terms, if a weight w[i] is an integer, then considering the weight is equivalent to having w[i] equal observations of the data point i. 
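The effect of the weights can be illustrated with a minimal sketch in Python (not Darwin, and not the library's implementation). It assumes that each weight w[i] multiplies the contribution of observation i directly, as the repeated-observation interpretation suggests, and accumulates the three weighted quantities A^t*W*A, b^t*W*A and b^t*W*b with W the diagonal matrix of weights. The function name weight_observations is hypothetical. Under this assumption the sketch reproduces the values shown in the Examples of this entry.

```python
def weight_observations(A, b, w):
    """Accumulate the weighted normal-equation terms of a least squares problem.

    Assumption: observation i contributes w[i] times, exactly as if it
    were repeated w[i] times in the data.
    """
    m = len(A[0])
    AtA = [[0] * m for _ in range(m)]
    btA = [0] * m
    btb = 0
    for row, bi, wi in zip(A, b, w):
        for j in range(m):
            btA[j] += wi * bi * row[j]
            for k in range(m):
                AtA[j][k] += wi * row[j] * row[k]
        btb += wi * bi * bi
    return [AtA, btA, btb]

A = [[1, 2], [3, 3], [4, 7], [6, 2]]
print(weight_observations(A, [1, 1, 2, 2], [10, 5, 2, 0]))
# [[[87, 121], [121, 183]], [41, 63], 23]
```

The last observation has weight 0 and therefore contributes nothing to any of the three results.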
Setting a weight to 0 is equivalent to deleting the observation. WeightObservations prepares the matrix AtA = A^t * D * A, the vector btA = b^t * D * A and the scalar btb = b^t * D * b, where D is the diagonal matrix of the given weights w[i]. Usually, least squares approximating functions require these as input (SvdAnalysis, SvdBestBasis, etc.). Examples: > A := [[1,2],[3,3],[4,7],[6,2]]; A := [[1, 2], [3, 3], [4, 7], [6, 2]] > WeightObservations(A,[1,1,2,2],[10,5,2,0]); [[[87, 121], [121, 183]], [41, 63], 23] See also: ?SvdAnalysis ?SvdBestBasis WriteBlock Function WriteBlock( ali:array(string) ) Write a sequence alignment in Block format. Used notably by Geoff Barton's program alscript. WriteData Function WriteData - write data to a file Calling Sequence: WriteData(data,filename,separator) Parameters: Name Type Description ------------------------------------------------- data anything data to be saved filename string name of file to be written separator string string used as separator Returns: NULL Synopsis: The WriteData function writes data to a file in a simple format. Useful for exporting data to other applications. The filename defaults to temp.dat and the separator is by default the tab character. See also: ?FileStat ?LockFile ?OpenWriting ?WriteFasta ?WriteSeqXML WriteFasta Function WriteFasta Calling Sequence: WriteFasta(seq) WriteFasta(seq,labs,fname) Parameters: Name Type --------------------- seq array(string) labs array(string) fname filename Returns: NULL Synopsis: Writes an array of sequences to a file (default is temp.fasta). If no labels are given, the sequences are numbered in the order in which they appear. Examples: > WriteFasta(['ACCGTA', 'AC_GTA']); >1 ACCGTA >2 AC_GTA See also: ?OpenWriting ?WriteData ?WriteSeqXML WriteSeqXML Function WriteSeqXML - Writes a genome database into a SeqXML formatted file.
Calling Sequence: WriteSeqXML(f) Parameters: Name Type Description ----------------------------------------------------------------------------- f string path to output file db {database,string} (optional) path to database file / database handle Returns: NULL Global Variables: DB Synopsis: The function WriteSeqXML stores a genome database in SeqXML format. If no 'db' argument is passed, the database currently assigned to DB is used. See also: ?WriteFasta Zeta Function Zeta Calling Sequence: Zeta(s) Parameters: Name Type -------------- s numeric Returns: numeric Synopsis: This function computes the Riemann Zeta function defined by Zeta(s) = sum( 1/i^s, i=1..infinity ). The sum converges for s > 1. Zeta has a simple pole at s=1. For all other values it is defined as the complex-plane extension of the above sum. References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 23.2 Examples: > Zeta(2); 1.6449 > Zeta(3); 1.2021 > Zeta(-0.5); -0.2079 Zscore Function Zscore - Test a statistical hypothesis Calling Sequence: Zscore(data) ZscorePercent(data) Parameters: Name Type Description ------------------------------------------------------------ data list Counts of observations, assumed equiprobable data matrix Counts by two criteria, assumed independent Returns: {list,matrix} Synopsis: Zscore transforms a vector or matrix of counts into a vector/matrix of normalized variables (ones with expected value 0 and variance 1). This is done by subtracting the expected value and dividing by the standard deviation, i.e. Z = (X-E[X])/sqrt(Var(X)). In this way the observations can be measured in "standard deviations away from the mean", which is a simple and useful measure. This is sometimes called the Z-transform, but since the Z-transform has a well established use in power series, we use the name Zscore. If the input is a vector of integers, it is assumed that all the values are counts of events which are equally probable.
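For the vector case, the normalization can be sketched in Python (an illustration only, not Darwin code). It assumes that each of the k counts is binomial with n equal to the total count and p = 1/k, so that E[X] = n*p and Var(X) = n*p*(1-p). The function name `zscore_counts` is hypothetical. Under this assumption the sketch gives -1.3333, 0, 3, -1.6667 for the counts [8,12,21,7].

```python
import math


def zscore_counts(counts):
    """Z = (X - E[X]) / sqrt(Var(X)) for equiprobable binomial counts (assumed model)."""
    n = sum(counts)          # total number of events
    p = 1.0 / len(counts)    # every category is assumed equally probable
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return [(x - mean) / sd for x in counts]


print([round(z, 4) for z in zscore_counts([8, 12, 21, 7])])
# [-1.3333, 0.0, 3.0, -1.6667]
```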
If the input is a matrix it is assumed that the values are counts of two independent events (columns/rows). In both cases, a binomial distribution is assumed for the counts, i.e. the individual events counted are independent of each other. ZscorePercent is very similar, but instead of returning a normalized variable, it returns a percentage of the expected value, i.e. Z = 100 * (X-E[X])/E[X] Examples: > Zscore( [8,12,21,7] ); [-1.3333, 0, 3, -1.6667] > print(Zscore( [[3,7,21],[10,15,33]] )); -0.73710648 -0.25050450 0.56887407 0.55192433 0.19114995 -0.47500296 > ZscorePercent( [8,12,21,7] ); [-33.3333, 0, 75, -41.6667] See Also: ?Cumulative ?ProbBallsBoxes ?StatTest ?CumulativeStd ?ProbCloseMatches ?Std_Score ?OutsideBounds ?Rand ?TestStatResult abs Function abs - absolute value Options: builtin, numeric, polymorphic and zippable Calling Sequence: |x| abs(x) Parameters: Name Type Description ------------------------------ x numeric an expression Returns: numeric Synopsis: This function computes the absolute value of a number. Two syntaxes are available: the functional one, abs(x), and the mathematical one with vertical bars, |x|. Please note that |x|, when x is an array, is not the norm of the vector, but the vector of the absolute values. Examples: > |-0.3|; 0.3000 > abs(cos(3)); 0.9900 > |[-1,-2,-3]|; [1, 2, 3] antiparallel Function antiparallel - reverse complement of a DNA sequence Option: builtin Calling Sequence: antiparallel(seq) Parameters: Name Type Description ---------------------------------- seq string a DNA/RNA sequence Returns: string Synopsis: Computes the antiparallel sequence of a DNA/RNA sequence. This is the complement in reverse order. For example, the antiparallel of AACC is GGTT. The reverse of AACC is CCAA and the Complement of AACC is TTGG. The antiparallel of a DNA sequence describes a molecule that would form a double helix with the sequence.
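The reverse-complement operation can be illustrated with a short Python sketch. This is not Darwin code: the Python function below is a hypothetical analogue, and it assumes upper-case input and treats any sequence containing U as RNA.

```python
# Complement tables (A pairs with T or U, C pairs with G).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def antiparallel(seq):
    """Return the complement of seq, read in reverse order."""
    # Assumption: a sequence containing U is RNA, anything else is DNA.
    table = RNA_COMPLEMENT if "U" in seq else DNA_COMPLEMENT
    return "".join(table[base] for base in reversed(seq))


print(antiparallel("AACC"))    # GGTT
print(antiparallel("ACCUUC"))  # GAAGGU
```

Applying the operation twice returns the original sequence, which is a convenient sanity check.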
Examples: > antiparallel('ACCUUC'); GAAGGU See Also: ?AltGenCode ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt ?AminoToInt ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB append Function append - append to a list, set or structure Option: builtin Calling Sequence: append(L,e_1..e_k) Parameters: Name Type ------------------------------- L a list, set or structure e_i an arbitrary element Returns: {list,set,structure} Synopsis: This function appends e_1..e_k to the list or structure L. If the original list or set or structure has length less than 10, it appends on a new copy of L. Otherwise it appends it to L and hence (likely) modifies the original object. So if the first argument of append should not be destroyed, the appending should be done on a copy of L. Appending is written in a way that is efficient, even in case of appending thousands of elements, one at a time, to an empty list. Appending to sets, although efficient from the data enlargement point of view, is not efficient as every new set is reordered. If a large set is to be built by appending one element at a time, it is much more efficient to use a list and convert the list to a set once the appending is finished. This function accepts a variable number of additional arguments. Examples: > append( ABC(1,2,3), 4, 5 ); ABC(1,2,3,4,5) > append( CreateArray(1..11,7), 77 ); [7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 77] arcsin Function arcsin - the inverse trigonometric function Options: builtin, numeric, polymorphic and zippable Calling Sequence: arcsin(x) Parameters: Name Type Description -------------------------------------------- x numeric a numerical value, |x| <= 1 Returns: numeric Synopsis: This function computes the inverse of the trigonometric sine function. For all -1 <= x <= 1, sin(arcsin(x))=x. For all -Pi/2 <= y <= Pi/2, arcsin(sin(y))=y. 
The value returned by arcsin is a principal value, it is between (-Pi/2 and Pi/2). References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 4.4 Examples: > arcsin(0); 0 > arcsin(1/2); 0.5236 > arcsin(1); 1.5708 > arcsin(-1); -1.5708 See also: ?arctan ?cos ?sin ?tan arctan Function arctan - the inverse trigonometric function Options: builtin, numeric and polymorphic Calling Sequence: arctan(y) arctan(y,x) Parameters: Name Type Description -------------------------------------------- y numeric a numerical value x numeric an optional numerical value Returns: numeric Synopsis: This function, with a single argument, computes the inverse tangent function defined by: tan(arctan(y)) = y. The value returned by arctan is between -Pi/2 <= arctan(y) <= Pi/2. With two arguments, it computes the inverse tangent function defined by: tan(arctan(y,x)) = y/x when x <> 0. The value returned by arctan with two arguments is between -Pi < arctan(y,x) <= Pi. Arctan with two arguments computes the principal value of the argument of the complex number x+I*y References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 4.4 Examples: > arctan(0); 0 > arctan(1); 0.7854 > arctan(1,0); 1.5708 > arctan(-1,0); -1.5708 > arctan(1,1); 0.7854 > arctan(-1,-1); -2.3562 See also: ?arcsin ?cos ?sin ?tan assemble Function assemble - creates an internal structure Option: builtin Calling Sequence: assemble(s) Parameters: Name Type Description --------------------------------------------- s structure a structure of valid types Returns: anything : an arbitrary Darwin structure Synopsis: Assemble and disassemble are a pair of functions which allow the handling of procedures and expressions in Darwin. Disassemble transforms an internal structure into a Darwin data structure, where the names of the classes are the type names of the components. Assemble does exactly the reverse. 
This pair of functions exists so that the bodies of procedures can be inspected, modified and created. Although they both work for any structure, common structures can be manipulated directly. It is the body of procedures which cannot be manipulated directly without dis/assemble. Examples: > assemble(power(a,2)); a^2 > assemble(list(expseq(1,2))); [1, 2] See Also: ?disassemble (the reverse operation) ?size ?length ?type (with a single argument) assert Function assert - test that an assertion is true Option: builtin Calling Sequence: assert(cond) Parameters: Name Type Description ----------------------------------------- cond boolean a condition to be tested Returns: NULL Synopsis: This function evaluates its argument, which is expected to be true or false. If it evaluates to true, it does nothing. If it evaluates to false it produces an "assertion failed" error. The first argument of the error is the unevaluated expression that evaluated to false. It is then easy to write assertions which upon failure will automatically produce meaningful errors. Examples: > assert(1=2);; Error, 1 = 2, assertion failed > Probab := 1.001; Probab := 1.0010 > e := [ traperror(assert( Probab >=0 and Probab <= 1))]; e := [0 <= Probab and Probab <= 1, assertion failed] > length(e); 2 > e[1]; 0 <= Probab and Probab <= 1 See also: ?error ?lasterror ?traperror ?warning assign Function assign - assign a variable as a function call Calling Sequence: assign(a,v) Parameters: Name Type --------------- a name v anything Returns: NULL Synopsis: This function assigns the value v to the name a. The assign function ignores the built-in scoping rules. Therefore, an assignment made by an assign call inside a procedure persists after the procedure has finished executing. A variable name cannot be assigned a value from within a procedure if a global variable of the same name has already been assigned a value.
Examples: > z := proc() assign(t, 100); end: > z(); > t; 100 See also: ?assigned ?eval ?names ?parse ?symbol assigned Function assigned - check if a name is assigned Option: builtin Calling Sequence: assigned(a) Parameters: Name Type ----------- a name Returns: boolean Synopsis: This function tests whether name a has been assigned a value or symbol. It should not be used for tables, as the table is not a name and unassigned entries evaluate to the default value. For tables, testing should be done against the default value. Examples: > a:=5; a := 5 > assigned(a); true > b:=c; b := c > assigned(b); true atoi Function atoi - convert characters to integers Calling Sequence: atoi(t) Parameters: Name Type ------------- t string Returns: integer Synopsis: The parameter t should be a string value formed over the symbols 0..9 and the period symbol (.), optionally preceded by a minus sign as in the example below. This function returns an integer value equal to trunc(tt) where tt is the numeric value of t. Examples: > atoi('3993'); 3993 > atoi('-3.9'); -3 > type("); integer See also: ?sprintf ?trunc avg Function avg - average of numbers or list of numbers Calling Sequence: avg(L1,L2,...) Parameters: Name Type Description ------------------------------------------------------------ Li {numeric,list(numeric)} a number or list of numbers Returns: numeric Synopsis: Finds the average of all the values in the arguments. Examples: > avg(5, 97, 22, [14,15,16] ); 28.1667 > avg(2,3,5,7,11,13,17,19); 9.6250 See also: ?max ?median ?min ?std ?var ceil Function ceil Options: builtin, numeric and zippable Calling Sequence: ceil(x) Parameters: Name Type -------------- x numeric Returns: integer Synopsis: ceil returns the smallest integer greater than or equal to x. ceil(x) = -floor(-x) for all values of x.
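The identity ceil(x) = -floor(-x) can be checked numerically. The snippet below is a small Python illustration using the standard `math` module (not Darwin code). The sample values are the ones from the ceil examples plus two arbitrary fractions.

```python
import math

samples = (-2, -1.99999, 2.000001, 0.5, -0.5)

# ceil(x) and -floor(-x) agree for every sample value.
for x in samples:
    assert math.ceil(x) == -math.floor(-x)

print(math.ceil(-2), math.ceil(-1.99999), math.ceil(2.000001))  # -2 -1 3
```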
Examples: > ceil(-2); -2 > ceil(-1.99999); -1 > ceil(2.000001); 3 See also: ?floor ?iquo ?mod ?round ?trunc coeff Function coeff Calling Sequence: coeff(s,v) Parameters: Name Type ------------------------------- s an arithmetic expression v a symbol name Returns: algebraic : the coefficient multiplying v in s Synopsis: Coeff computes the linear coefficient in the variable v contained in the algebraic expression s. The algebraic expression s may be any mathematical expression which is not yet evaluated (in symbolic form, see noeval). Examples: > t1 := noeval(3*a+b*c); t1 := 3*a+b*c > coeff(t1,a); 3 > coeff(t1,c); b See also: ?has ?hastype ?indets ?lcoeff ?mselect ?noeval ?subs ?types compress Function compress - compress an arbitrary object Option: builtin Calling Sequence: compress(obj) Parameters: Name Type Description ------------------------------------ obj anything object to compress Returns: compressed Synopsis: Compress takes any structure and compresses it in a simple way. The function decompress restores the original expression. Normally this is used when many structures must be kept in main memory, storing them uncompressed would require too much memory, and they are used rarely enough that it pays to decompress them before each use. There are several internal structures which are not compressed, most notably Dayhoff matrices and databases. Consequently, structures that reference these (e.g. Alignment) will not be compressed. The compression factor is about 3:1 for general structures on a 32-bit word implementation, higher for 64-bit words. Examples: > t := compress([1,2,{3,4}]); t := [1, 2, {3,4}] > decompress(t); [1, 2, {3,4}] > size(t)/size([1,2,{3,4}]); 0.1818 See also: ?decompress ?length ?size ?system convolve Function convolve - convolution of two or more vectors Calling Sequence: convolve(v1,v2,...)
Parameters: Name Type Description ---------------------------------------------------------------- v_i list(numeric) a numerical vector of arbitrary dimension Returns: list(numeric) Synopsis: Compute the convolution of two or more numerical vectors. The convolution of two vectors v1 and v2 of dimensions d1 and d2 is the vector r with dimension d1+d2-1 with elements r[k] = sum( v1[i] * v2[k+1-i], i=1..k ) (references outside v1 or v2 are considered 0). The convolution of more than two vectors is computed in an optimal order. Convolution is associative and commutative, so the order of the operations does not affect the result. Examples: > v1:=[1,2,3,4]; v1 := [1, 2, 3, 4] > v2:=[1,1/2,1/3]; v2 := [1, 0.5000, 0.3333] > convolve(v1,v2); [1, 2.5000, 4.3333, 6.1667, 3, 1.3333] See Also: ?Cholesky ?GivensElim ?matrix ?Eigenvalues ?Identity ?matrix_inverse ?GaussElim ?LinearProgramming ?transpose copy Function copy - copy a modifiable data structure (at a desired depth) Option: builtin Calling Sequence: copy(x) copy(x,depth) Parameters: Name Type Description ----------------------------------------------- x anything the structure/object to copy depth posint optional depth of copying Returns: type(x) Synopsis: This function returns an exact copy of any object it is passed. This is useful when a modifiable object (strings, data structures, lists, etc.) is to be modified while the original must be preserved in its unmodified state. With a second argument, the copying will happen only for the given number of levels. Calling copy without a second argument is equivalent to copy(x,infinity).
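The difference between copying one level and copying all levels can be illustrated with Python's standard `copy` module. This is an analogy, not Darwin code: it assumes that copy(x,1) behaves like a shallow copy and copy(x) like a deep copy, which is consistent with the copy examples. Note that Python indexes from 0 where Darwin indexes from 1.

```python
import copy

a = [1, 2, [3, 4]]
a1 = copy.copy(a)       # one level only: the inner list is still shared with a
a2 = copy.deepcopy(a)   # all levels: nothing is shared with a

a1[0] = 5       # changes only a1, the top level was copied
a1[2][0] = 77   # changes a as well, the inner list is shared

print(a)   # [1, 2, [77, 4]]
print(a1)  # [5, 2, [77, 4]]
print(a2)  # [1, 2, [3, 4]]
```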
Examples: > a := [1,2,[3,4]]; a := [1, 2, [3, 4]] > a1 := copy(a,1); a1 := [1, 2, [3, 4]] > a2 := copy(a); a2 := [1, 2, [3, 4]] > a1[1] := 5; a1[1] := 5 > a1[3,1] := 77; a1[3,1] := 77 > a; [1, 2, [77, 4]] > a1; [5, 2, [77, 4]] > a2; [1, 2, [3, 4]] cor Function cor - an unbiased correlation estimate Calling Sequence: cor(x) cor(x,y,method) Parameters: Name Type Description ------------------------------------------------------------ x {list,matrix} a numeric matrix or list y {list,matrix} (optional) a numeric matrix or list method string (optional) choice of coefficient Returns: {numeric,matrix(numeric)} Synopsis: This function computes the correlation of 'x' and 'y' if these are lists. If 'x' and 'y' are a matrix, the correlations between the columns of 'x' and the columns of 'y' are computed. The default of y, (i.e. 'y=NULL') is equivalent to 'y=x', but more efficient. The optional string argument 'method' indicates which correlation coefficient is computed. Available correlation coefficients: pearson correlation coefficient pearson pearson correlation coefficient spearman spearman's rank correlation coefficient kendall kendall's tau correlation coefficient If method is 'kendall' or 'spearman', Kendall's tau or Spearman's rho statistic is used to estimate a rank-based measure of association. These are more robust and have been recommended if the data do not necessarily come from a bivariate normal distribution. Note that 'spearman' basically computes 'cor(R(x), R(y))' where 'R(u) := Rank(u)' Examples: > cor([1,5,8,4], [6,2,8,9]); 0.1306 > cor([[1,4],[2,4],[2,2],[6,1],[7,-5]], 'spearman'); [[1, -0.9211], [-0.9211, 1]] See also: ?avg ?Covariance ?Rank ?StatTest ?std ?sum ?var cos Function cos - the trigonometric function Options: builtin, numeric, polymorphic and zippable Calling Sequence: cos(x) Parameters: Name Type Description ---------------------------- x numeric a number Returns: numeric Synopsis: This function computes the trigonometric cosine function. 
cos(x) has simple zeros at x=Pi/2+n*Pi. References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 4.3 Examples: > cos(0); 1 > cos(Pi/4); 0.7071 > cos(Pi/2); 6.1232e-17 > cos(-Pi); -1 See also: ?arcsin ?arctan ?sin ?tan dSplitGraph Function dSplitGraph( splits:list([numeric, set]), all:{posint,set} ) Computes a graph from a list of dSplits. Edges will have labels of the format [length, splitnr] where splitnr is an index into splits which corresponds to this edge. The procedure returns an expression sequence g: Graph, angles: array(length(splits),numeric) where angles contains a list of angles to be used as hints when drawing the edges of the graph. all is the set of all taxa of the split or a posint if the set is 1..all. dSplitIndex Function dSplitIndex( d:matrix(numeric), splits:list([numeric, set]) ) Computes the splittable fraction rho. dSplitMetricSum Function dSplitMetricSum( splits:list([numeric, set]), n:posint ) Computes the split decomposable distances d1. n is the number of taxa. dSplits Function dSplits( d:matrix(numeric) ) Computes the d-splits and their isolation indices from the distance matrix. Returns list([index,set]). date Function date Option: builtin Calling Sequence: date() Returns: string Synopsis: Returns the current date and time as a string. See also: ?time ?UTCTime debug Function debug Option: builtin Calling Sequence: debug() debug(arg) Parameters: Name Type ----------------------------------- arg an optional symbol or string Returns: NULL Synopsis: This function starts or stops the Darwin interactive debugger. The argument is optional. Without an argument, a call to debug() will start the interactive debugger. A call such as debug(false) will stop the debugger. If the argument is a string, it is interpreted as a device which should be used to get the user's input (instead of stdin). E.g. if stdin is used for other purposes, then debug( '/dev/tty' ) will force the debugger to use /dev/tty for user input.
At each interaction, the user can enter commands to inspect, alter variables and continue the debugging process. The interactive debugger is called whenever: an assignment statement has been executed an expression statement has been executed an if boolean expression has been evaluated The interactive debugger is activated by: a Darwin level call by the function debug() or debug(true) an interrupt () an error (when the option -de is used) The interactive debugger polls the user and takes the following actions depending on the input (when the debugger is expecting input, it will prompt the user with a ">>") command action -------------------------------------------------------------------------------- , "l", continue, as currently set "u", "k", set up to debug only at a higher level "d", "j", set up to debug everything "?", "h" short help "o" quit the debugger, continue the computation "p" print the current line "q" quit darwin "t" quit the debugger and computation and go to the top "w" print the current stack and lines Set(debug): start kernel debugging xxx;, xxx: execute xxx as a darwin statement Inspecting a variable may be achieved by executing a statement with just the variable. Similarly, any expression can be computed/inspected. Changing a variable can be done with an assignment statement. See also: ?printlevel ?profiling ?Set decompress Function decompress - decompresses a compressed object Option: builtin Calling Sequence: decompress(compr) Parameters: Name Type Description -------------------------------------- compr compressed compressed object Returns: anything Synopsis: Compress takes any structure and compresses it in a simple way. The function decompress restores the original expression. Normally this is used in cases that lots of structures are stored in main memory and this would require too much memory and the structures are not used often enough, so that it pays to decompress them before using them. 
The compression factor is about 3:1 for general structures on a 32-bit word implementation, higher for 64-bit words. Examples: > t := compress([1,2,{3,4}]); t := [1, 2, {3,4}] > decompress(t); [1, 2, {3,4}] > size(t)/size([1,2,{3,4}]); 0.1818 See also: ?compress ?length ?size ?system disassemble Function disassemble - produces a data structure from an internal structure Option: builtin Calling Sequence: disassemble(s) Parameters: Name Type Description --------------------------------------------- s anything any valid Darwin expression Returns: structure Synopsis: Assemble and disassemble are a pair of functions which allow the handling of procedures and expressions in Darwin. Disassemble transforms an internal structure into a Darwin data structure, where the names of the classes are the type names of the components. Assemble does exactly the reverse. The existence of this pair of functions is to be able to inspect, modify and create new bodies of procedures. Although they both work for any structure, common structures can be manipulated directly. It is the body of procedures which cannot be manipulated directly without dis/assemble. Examples: > disassemble(x -> sin(x)); procedure(expseq(x),expseq(),expseq(operator,arrow),expseq(),expseq(),structure(sin,Param(1))) > disassemble([1,{2,3}]); list(1,set(2,3)) See Also: ?assemble (the reverse operation) ?size ?length ?type (with a single argument) dprint Function dprint - print so that it can be read back by Darwin Calling Sequence: dprint(e1,e2,...) Parameters: Name Type Description ----------------------------- ei anything expression Returns: NULL Synopsis: This function prints out any Darwin expression. Expressions are printed so that they could be read back by Darwin. In principle a structure dprint-ed should produce, when read back in, the same structure (except for numerical precision). If given multiple expressions, these will be separated by commas, so that they can be read as an expression sequence. 
Dprint will use only one newline character at the end of the printing, so large expressions may be hard to handle in some systems (will be very long lines). Floating point numbers are printed with 5 significant digits. The global variable NumberFormat can be assigned a format, as in the printf function, and all numbers will be printed accordingly. Inside a printf statement, the format "%A" achieves the same effect as dprint. Examples: > dprint('a b c',1/3,1e9); 'a b c',0.3333,1000000000 > printf( '%A\n', ['a b c',1/3,1e9] ); ['a b c',0.3333,1000000000] See Also: ?lprint ?printf (contains conversion patterns) ?prints ?sscanf ?print ?PrintMatrix ?sprintf enum Function enum - list of consecutive integers Calling Sequence: enum(n) enum(r) Parameters: Name Type -------------- n integer r range Returns: list Synopsis: This function returns a list of the integers from 1 to n, or from r_1 to r_2 when given a range r=r_1..r_2. Examples: > enum(5); [1, 2, 3, 4, 5] > enum(4..10); [4, 5, 6, 7, 8, 9, 10] See also: ?seq ?zip erf Function erf - error function - 2/sqrt(Pi)*int( exp(-t^2), t=0..x ) Options: builtin, numeric and zippable Calling Sequence: erf(x) Parameters: Name Type -------------- x numeric Returns: numeric Synopsis: This function returns the result of the following expression: erf(x) = 2/sqrt(Pi) * int( exp(-t^2), t=0..x ). The probability of a normally distributed variable with mean m and variance s^2 being less than x is 1/2+1/2*erf( (x-m)/sqrt(2*s^2) ).
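The relation between erf and the normal distribution can be checked with a few lines of Python, using `math.erf` from the standard library (an illustration, not Darwin code). The helper name `normal_cdf` is hypothetical.

```python
import math


def normal_cdf(x, m=0.0, s=1.0):
    """P(X < x) for X normal with mean m and variance s^2, written in terms of erf."""
    return 0.5 + 0.5 * math.erf((x - m) / math.sqrt(2 * s * s))


# About 97.5% of a standard normal lies below 1.96 ...
print(round(normal_cdf(1.96), 4))               # 0.975
# ... so about 95% lies between -1.96 and 1.96.
print(round(math.erf(1.96 / math.sqrt(2)), 4))  # 0.95
```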
References: Erdelyi53, Handbook of Mathematical functions, Abramowitz and Stegun, 7.1 Examples: > erf(0); 0 > erf(1.96/sqrt(2)); 0.9500 > erf(3); 1.0000 > erf(-2); -0.9953 See also: ?erfc ?erfcinv ?Normal_Rand erfc Function erfc - the complement of the error function Options: builtin, numeric and zippable Calling Sequence: erfc(x) Parameters: Name Type -------------- x numeric Returns: numeric Synopsis: This function returns the result of the following expression: erfc(x) = 1 - erf(x) = 2/sqrt(Pi) * int( exp(-t^2), t=x..infinity ). References: Erdelyi53 Examples: > erfc(3); 2.209e-05 > erfc(1.96/sqrt(2)); 0.04999579 See also: ?erf ?erfcinv erfcinv Function erfcinv Options: builtin, numeric and zippable Calling Sequence: erfcinv(x) Parameters: Name Type Description ---------------------------- x numeric a number Returns: numeric Synopsis: This function returns the inverse of erfc(x), that is, the value of y such that 2/sqrt(Pi) * int( exp(-t^2), t=y..infinity ) = x. Examples: > erfcinv(3); Error, 3 is an invalid argument for erfcinv > erfcinv(0.1); 1.1631 > sqrt(2)*erfcinv(0.05); 1.9600 See also: ?erf ?erfc error Function error - terminate execution and issue an error message Option: builtin Calling Sequence: error(msg,...) Parameters: Name Type Description ----------------------------------------------------------- msg anything usually an error message ... anything additional arguments to clarify the error Returns: NULL Synopsis: This function returns to the top level of execution and issues the error message msg. If an error happens while executing a traperror() function, then the flow will not return to the top level; instead, the traperror function will return with the value of the argument(s) of error(). When this happens, the global variable "lasterror" is set to the value of the error, else it is unassigned. Using error/traperror allows for a simple throw-catch mechanism.
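For readers more familiar with exceptions, the throw-catch pattern described above can be mimicked in Python. This is only an analogy under stated assumptions, not Darwin's implementation: the Python `traperror` helper and the `state` dictionary standing in for the global variable lasterror are hypothetical, and "unassigned" is modelled as None.

```python
state = {"lasterror": None}  # stands in for Darwin's global variable lasterror


def traperror(thunk):
    """Run thunk(); if it raises, return the error value instead of propagating it."""
    state["lasterror"] = None  # assumption: None models "unassigned"
    try:
        return thunk()
    except Exception as exc:
        # A single argument is returned as is, several arguments as a tuple.
        value = exc.args[0] if len(exc.args) == 1 else exc.args
        state["lasterror"] = value
        return value


def f(x):
    if x == 0:
        raise ValueError("oops, div by 0")  # plays the role of error(...)
    return 5 / x


print(traperror(lambda: f(0)))  # oops, div by 0
print(state["lasterror"])       # oops, div by 0
print(traperror(lambda: f(2)))  # 2.5
```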
Examples: > f := proc(x) if x=0 then error('oops, div by 0') fi; 5/x end; f := proc (x) if x = 0 then error(oops, div by 0) fi; 5/x end > f(0); Error, (in f) oops, div by 0 > traperror(f(0)); oops, div by 0 > lasterror; oops, div by 0 See also: ?assert ?lasterror ?traperror ?warning eval Function eval Option: builtin Calling Sequence: eval(exp) Parameters: Name Type ----------------- exp expression Returns: anything Synopsis: This function forces the immediate and complete evaluation of exp. Examples: > eval(5+5); 10 > eval(parse('2+3!')); 8 See also: ?noeval evalb Function evalb Option: builtin Calling Sequence: evalb(exp) Parameters: Name Type ----------------- exp expression Returns: boolean Synopsis: This function forces an immediate evaluation of the boolean expression exp. Examples: > evalb(5=5); true > evalb(true = (not(not(not false)))); true See also: ?eval exit Function exit Option: builtin Calling Sequence: exit(status) Parameters: Name Type Description -------------------------------------- status {0,posint} exit status code Synopsis: The exit function causes Darwin to immediately be terminated and the value of status is returned to the parent process. A non-zero exit code indicates an error whereas 0 indicates a successful termination. References: man 3 exit Examples: > exit(2); See also: ?return exp Function exp - exponential function Options: builtin and polymorphic Calling Sequence: exp(x) exp(A) Parameters: Name Type -------------------------------- x a numerical value A a square numerical matrix Returns: numeric matrix(numeric) Synopsis: This function computes the exponential e^x (e = 2.71828...) if the parameter is a single numerical value. Otherwise, it computes e^A = I + A + A^2/2 + A^3/6 + ... or the exponential of a square matrix. For all numerical values of x, ln(exp(x))=x. References: Handbook of Mathematical Functions, M. Abramowitz and I. 
Stegun, Ch 4.2 Examples: > exp(0); 1 > exp(5); 148.4132 > exp([[1, 2], [3, 4]]); [[51.9690, 74.7366], [112.1048, 164.0738]] See also: ?expx1 ?lg ?ln ?ln1x ?log ?log10 expx1 Function expx1 - compute exp(x)-1 accurately for small x Calling Sequence: expx1(x) Parameters: Name Type ------------------------ x a numerical value Returns: numeric Synopsis: This function computes the exponential e^x-1 (e = 2.71828...). This function is intended for very small values of x when exp(x) is too close to 1, and hence significant precision is lost. For all numerical values of x, ln1x(expx1(x))=x. References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 4.2 Examples: > expx1(0); 0 > expx1(5e-20); 5e-20 > expx1(ln1x(7e-30)); 7e-30 See also: ?exp ?lg ?ln ?ln1x ?log ?log10 factorial Function factorial Options: builtin, numeric and zippable Calling Sequence: factorial(n) Parameters: Name Type ------------------------------------ n an integer or numerical value Returns: numeric Synopsis: factorial returns the product of 1*2*3*...*n for integer values of n. For non-integer values it returns Gamma(n+1), the complex-plane extension of factorial. Gamma(z+1) = z*Gamma(z). For non-integer values it is also possible to define factorial for negative arguments. This function can be invoked with the standard postfix notation, that is n! or in functional form, factorial(n). References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 6.1 Examples: > 0!; 1 > 6!; 720 > factorial(-1.5); -3.5449 See also: ?Gamma ?LnGamma floor Function floor Options: builtin, numeric and zippable Calling Sequence: floor(x) Parameters: Name Type ------------------------ x a numerical value Returns: integer Synopsis: floor returns the largest integer less than or equal to x. floor (x) = -ceil(-x) for all values of x. 
Examples: > floor(-2); -2 > floor(-1.99999); -2 > floor(2.000001); 2 See also: ?ceil ?iquo ?mod ?round ?trunc gc Function gc - garbage collection Option: builtin Calling Sequence: gc() Returns: NULL Synopsis: This function forces Darwin to immediately coalesce all memory that is allocated but not in use. Unless the system variable printgc is set to false, this function prints the current number of bytes allocated and the total CPU time used so far. Examples: > gc(); See also: ?Set (the gc option) gcd Function gcd - greatest common divisor Calling Sequence: gcd(a1..ak) Parameters: Name Type ----------------------- ai an integer value Returns: integer Synopsis: Gcd computes the greatest common divisor of all the arguments given, that is, the largest number that exactly divides each one of the arguments. Gcd takes a variable number of arguments, but all of them must be integers. Examples: > gcd(91,21); 7 > gcd(999999,142857); 142857 > gcd(); 0 > gcd(20,25,-30,-40); 5 See also: ?iquo ?mod getpid Function getpid Option: builtin Calling Sequence: getpid() Returns: posint Synopsis: This function returns the process identification number assigned by the operating system to the current invocation of Darwin. Examples: > getpid(); 25033 gigahertz Function gigahertz - estimate the processor speed Calling Sequence: gigahertz() Returns: numeric Synopsis: This function estimates the computing power of the processor which is running. The value has been tuned so that a Pentium III processor rated at 750MHz gives 0.75 as a result. Hence, this is a measure equivalent to the clock rate, in GHz, of such a processor. There are many factors which affect the efficiency of Darwin running on a particular processor, e.g. compiler, system load, cache size, type of processor, memory speed, and many others. So this number should be taken with extreme care. The function executes alignments, some counting, random number generation and some linear algebra to obtain the timing estimate.
Examples: > gigahertz(); 4.1970 See also: ?time has Function has - test if a structure contains a value Option: builtin Calling Sequence: has(str,val) Parameters: Name Type Description ------------------------------------------ str anything an arbitrary structure val anything value to be found in str Returns: boolean Synopsis: The function tests whether the second argument is part of the first argument. Examples: > has([1,2,3],2); true > has(A(1,2,3),4); false > has({[A(77)]},77); true See Also: ?coeff ?indets ?mselect ?subs ?hastype ?lcoeff ?noeval ?types hash Function hash - hashing value of an arbitrary expression Option: builtin Calling Sequence: hash(expr) Parameters: Name Type Description --------------------------------------- expr anything any Darwin expression Returns: integer Synopsis: The hash function returns an arbitrary integer computed from the given expression. This hashing value is guaranteed to be the same for identical expressions, but it is not guaranteed to be unique. That is, there could be two different expressions which yield the same hash value. The hash value of a string with a single character is the numerical value of its ascii representation plus a constant. Hashing values are used internally for the remember function, and may be used by the user for similar purposes (detecting that two expressions are different without actually comparing them). The hashing values are not guaranteed to be the same across different systems, in particular they depend on the integer word size. 
Examples: > hash([1,2]); 7881299347950511 > hash('abc'); 3377699728949699 > hash(abc); 3377699728949699 > hash(a)-hash(A); 32 > hash(ASHYMY)-hash(YYYWYN); -927712935936 See also: ?remember ?sha2 ?table hastype Function hastype - test if a structure contains any object of a given type Calling Sequence: hastype(str,typ) Parameters: Name Type Description ------------------------------------------- str anything an arbitrary structure typ type a type to be found in str Returns: boolean Synopsis: The function hastype tests whether the first argument contains any value of the given type Examples: > hastype([1,2,3],posint); true > hastype(A(1,2,3),list); false > hastype({[A(77)]},list); true See also: ?coeff ?has ?indets ?lcoeff ?mselect ?noeval ?subs ?types help Function help Calling Sequence: help(topic) ? topic Parameters: Name Type -------------- topic string Returns: NULL Global Variables: HelpIndex HelpText Synopsis: The help and ? functions search (approximately) for the topic in the Darwin system and print out any description lines for these routines. The help function is case insensitive. Users should take note that print(topic) and help(topic) have different semantics. Firstly, no approximate search is performed with topic in the former and secondly, the description for topic is calculated dynamically for topic (any examples are run immediately). Examples: > help(Match); . . . > ? phylogenetic; . . . See also: ?print hostname Function hostname Option: builtin Calling Sequence: hostname() Returns: string Synopsis: This function returns the name of the current host on which the current session is running. Examples: > hostname(); linneus78 See also: ?CallSystem ?getpid ilogb Function ilogb Options: builtin, numeric and zippable Calling Sequence: ilogb(x) Parameters: Name Type ------------------------ x a numerical value Returns: integer Synopsis: ilogb returns the exponent of the floating point representation of x. 
This function is defined in the IEEE 754 floating point standard. It is the floor of the logarithm base 2 of |x|, for |x| >= 1, computed directly from the representation (very fast). For non-zero arguments, and for IEEE base 2 floating point numbers, 1 <= |x| / 2^ilogb(x) < 2. Examples: > ilogb(2); 1 > ilogb(1.0e-307); -1020 > ilogb(0); -2098 > ilogb(Pi); 1 See also: ?lg ?scalb indets Function indets - return all subexpressions of a given type Calling Sequence: indets(str,typ) Parameters: Name Type Description --------------------------------------------------------- str anything an arbitrary structure typ type (optional) a type to be searched in str Returns: set(typ) Synopsis: The function indets returns a set with all the subexpressions in str which are of type typ. If the type typ is omitted, it is assumed to be "symbol". Examples: > indets([1,-2,3.1,abc],posint); {1} > indets(A(1,[77],[[]]),list); {[],[77],[[]]} > t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D))); t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D))) > indets(t,Leaf); {Leaf(A),Leaf(B),Leaf(C),Leaf(D)} > indets(t); {A,B,C,D} See also: ?coeff ?has ?hastype ?lcoeff ?mselect ?noeval ?subs ?types intersect Function intersect Options: builtin and polymorphic Calling Sequence: a intersect b intersect(a,b) Parameters: Name Type ----------- a set b set Returns: set Synopsis: Computes the intersection of two sets, that is a set which has all the elements both in a and b. The value intersect() is understood to be the entire universe, and hence intersections including intersect() will simply return the other argument. In its functional form, any arbitrary number of sets can be intersected. In particular, intersect(a) = a. Examples: > {1,2,3} intersect {2,3,4}; {2,3} > {1,2,3} intersect {}; {} > {1,2,3} intersect intersect(); {1,2,3} See also: ?member ?minus ?subset ?union invlogit Function invlogit( l:numeric ) Convert 10 log10(p) to log10(p/(1-p)). 
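A worked reading of the invlogit conversion, as a sketch only: it assumes that "10 log10(p)" means ten times the base-10 logarithm of a probability p with 0 < p < 1, and it uses the parameter name l from the signature above. It restates the one-line description and is not taken from the implementation.

```latex
% Assumption: l = 10 \log_{10} p with 0 < p < 1, hence l < 0.
\begin{align*}
p &= 10^{l/10}, \\
\log_{10}\frac{p}{1-p} &= \frac{l}{10} - \log_{10}\!\left(1 - 10^{l/10}\right).
\end{align*}
% Worked value: l = -10 gives p = 0.1, so p/(1-p) = 1/9
% and \log_{10}(1/9) \approx -0.9542.
```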
iquo Function iquo Option: polymorphic Calling Sequence: iquo(a,b) Parameters: Name Type -------------- a integer b integer Returns: integer Synopsis: iquo returns the integer quotient of a divided by b. If b=0, a division by zero fault is generated. The result is truncated towards zero for both positive and negative results. Formally, iquo(a,b) = trunc(a/b). Examples: > iquo(7,3); 2 > iquo(-3,2); -1 > iquo(121,11); 11 See also: ?ceil ?floor ?mod ?round ?trunc islower Function islower( c:string ) Returns true if c is lower case, else returns false. isupper Function isupper( c:string ) Returns true if c is upper case, else returns false. iterate Function iterate - make available one value for an iterator Option: builtin Calling Sequence: iterate(v) Parameters: Name Type Description ------------------------------------------------------------ v anything a value that will be used by a for-in loop Returns: NULL Synopsis: iterate is used inside an iterator function to feed a value to the calling for-in loop. The argument(s) of iterate are evaluated, the for loop variable is assigned this value, and another iteration is performed. The body of the for loop is executed by the call to iterate. See Also: ?Entries ?iterator ?Lines ?Postfix ?Primes ?Infix ?Leaves ?objectorientation ?Prefix ?Sequences json Function json - serialize a Darwin structure as a JSON-compatible string Calling Sequence: json(obj) Parameters: Name Type Description ----------------------------------------- obj anything object to be serialized Returns: string Synopsis: This function serializes any Darwin object into a JSON-formatted string. Darwin objects are encoded as objects with a '_darwinType' and a 'data' field.
References: http://www.json.org Examples: > json( [1,2,'blue']); [1,2,"blue"] > json(Complex(5,2)); {"_darwinType":"Complex","data":[5,2]} See also: ?OpenWriting ?WriteSeqXML latex Function latex - convert a document or part of it to latex Option: polymorphic Calling Sequence: latex(a,titl,auth) LaTeX(a,titl,auth) LaTeXC(a) Parameters: Name Type Description ---------------------------------------------------------------- a {string,structure} object to convert to latex titl string (optional) title of the document auth string (optional) author(s) of the document Returns: string Synopsis: The latex function converts an object, typically a Document or a part thereof, to latex. LaTeX is a synonym of latex, much more difficult to type but according to Leslie Lamport. LaTeXC is used for a component, that is no headers/trailers will be produced. Examples: > t := Table( center, border, Row('abc','cde')): > prints(LaTeXC(t)); \begin{table}[!ht] \begin{center} \begin{tabular}{|c|c|} \hline abc & cde\\ \hline \end{tabular} \end{center} \end{table} > d := Document('Species evolve, that''s it.'): > prints(latex(d,'The origin of species','Charles Darwin')); % automatically generated by Darwin % prepared on Tue Feb 19 10:54:59 2013 % running on linneus78 % by user darwin \documentclass{article} \usepackage{html,color,epsfig} \setlength{\parindent}{5pt} \begin{document} \title{The origin of species} \author{Charles Darwin} \maketitle Species evolve, that's it. \end{document} See Also: ?Block ?Document ?List ?RunDarwinSession ?Code ?HTML ?Paragraph ?screenwidth ?Color ?HyperLink ?PostscriptFigure ?Table ?Copyright ?Indent ?print ?TT ?DocEl ?LastUpdatedBy ?Roman ?View lcoeff Function lcoeff - leading coefficient Calling Sequence: lcoeff(s) Parameters: Name Type ------------------------------- s an arithmetic expression Returns: algebraic : the leading coefficient in s Synopsis: lcoeff computes the leading numerical coefficient contained in the algebraic expression s. 
The algebraic expression s may be any mathematical expression which is not yet evaluated (in symbolic form, see noeval). In case of a sum, the leading coefficient is extracted from the first (positional) coefficient. Examples: > t1 := noeval(3*a+b*c); t1 := 3*a+b*c > lcoeff(t1); 3 See also: ?coeff ?has ?hastype ?indets ?mselect ?noeval ?subs ?types length Function length - length of an object Option: builtin Calling Sequence: length(obj) Parameters: Name Type Description -------------------------------------------- obj {array,list,set,string} any object Returns: {0,posint} Synopsis: Returns the length of the given object obj. Examples: > length(''); 0 > length({1,2,{a,b,c}}); 3 > length([1,2,3,4]); 4 > length('length'); 6 See also: ?assemble ?Class ?CreateArray ?disassemble ?size lg Function lg Calling Sequence: lg(x) Parameters: Name Type ------------------------------------------- x a positive number or a square matrix Returns: {numeric,matrix(numeric)} Synopsis: lg computes the logarithm base 2 of a number or a square matrix. For all arguments it is true that lg(2^x) = x. For positive arguments or for matrices for which the logarithm can be computed, it is always true that 2^lg(x) = x. Examples: > lg(7.5); 2.9069 > lg(16); 4 > lg( [[2,1],[0,3]]); [[1, 0.5850], [0, 1.5850]] > 2^lg( [[2,1],[0,3]]); [[2.0000, 1.0000], [0, 3.0000]] See also: ?exp ?ilogb ?ln ?ln1x ?log ?log10 ln Function ln Options: builtin and polymorphic Calling Sequence: ln(x) ln(A) Parameters: Name Type -------------------------------- x a numerical value > 0 A a square numerical matrix Returns: numeric matrix(numeric) Synopsis: This function computes the logarithm base e (e = 2.71828...) if the parameter is a single numerical value. This is usually called the natural logarithm. If the argument is a square matrix, it computes a square matrix B with the same dimensions as A such that e^B=A, or the natural logarithm of a square matrix. Not all matrices have a logarithm (which is real-valued).
For all numerical values of x, ln(exp(x))=x. References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 4.1 Examples: > ln(1); 0 > ln(5); 1.6094 > ln([[2, 1], [3, 4]]); [[0.4024, 0.4024], [1.2071, 1.2071]] See also: ?exp ?expx1 ?ilogb ?lg ?ln1x ?log ?log10 ln1x Function ln1x - compute ln(1+x) accurately for small x Calling Sequence: ln1x(x) Parameters: Name Type ----------------------------- x a numerical value > -1 Returns: numeric Synopsis: This function computes the logarithm base e (e = 2.71828...) of 1+x. This is necessary when the value of x is very small, and computing 1+x would produce a significant truncation. A typical such computation is when 1 - (1-eps)^n has to be computed, and eps is very small and n is very large. This can be done accurately with -expx1(n*ln1x(-eps)). References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 4.1 Examples: > ln1x(0.001)-ln(1.001); 1.0994e-16 > ln1x(1e-60); 1e-60 See also: ?exp ?expx1 ?ilogb ?lg ?log ?log10 lnProbBallsBoxes Function lnProbBallsBoxes - probability of hitting k eps-boxes with n balls Calling Sequence: lnProbBallsBoxes(k,n,eps) Parameters: Name Type Description ---------------------------------------------------------- k posint number of boxes n posint number of balls randomly thrown in [0,1] eps positive 0 lnProbBallsBoxes(3,10,0.0001); -21.0528 See Also: ?Cumulative ?DigestWeights ?OutsideBounds ?StatTest ?DigestAspN ?DynProgMass ?ProbBallsBoxes ?Std_Score ?DigestionWeights ?DynProgMassDb ?ProbCloseMatches ?DigestSeq ?enzymes ?SearchMassDb ?DigestTrypsin ?MassProfileResults ?Stat log Function log Options: builtin and polymorphic Calling Sequence: log(x) log(A) Parameters: Name Type -------------------------------- x a numerical value > 0 A a square numerical matrix Returns: numeric matrix(numeric) Synopsis: This function computes the logarithm base e (e = 2.71828...) if the parameter is a single numerical value. This is usually called the natural logarithm. 
If the argument is a square matrix, it computes a square matrix B with the same dimensions as A such that e^B=A, or the natural logarithm of a square matrix. Not all matrices have a logarithm (which is real-valued). For all numerical values of x, log(exp(x))=x. Log is an alias for ln. References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 4.1 Examples: > log(1); 0 > log(5); 1.6094 > log([[2, 1], [3, 4]]); [[0.4024, 0.4024], [1.2071, 1.2071]] See also: ?exp ?expx1 ?ilogb ?lg ?ln ?ln1x ?log10 log10 Function log10 Calling Sequence: log10(x) log10(A) Parameters: Name Type Description ---------------------------------------- x numeric numeric > 0 A matrix a square numeric matrix Returns: numeric matrix(numeric) Synopsis: This function computes the logarithm (base 10) if the parameter is a single numerical value. If the argument is a square matrix, it computes a square matrix B with the same dimensions of A such that 10^B = A. Not all matrices have a logarithm base 10 (which is real valued). Examples: > log10(7.5); 0.8751 > log10(10); 1 See also: ?lg ?ln ?ln1x ?log logit Function logit( L:numeric ) Convert log10(p/(1-p)) to 10 log10(p). lowercase Function lowercase Option: builtin Calling Sequence: lowercase(t) Parameters: Name Type ------------- t string Returns: string Synopsis: The string t is converted to lowercase letters. Examples: > lowercase('Not NEARLY SO BoLD'); not nearly so bold See also: ?uppercase lprint Function lprint - linear print of expression(s) Option: builtin Calling Sequence: lprint(e1,e2,...) Parameters: Name Type Description ----------------------------- ei anything expression Returns: NULL Synopsis: This function prints out any Darwin built-in type or structured type. If the expression is too long, newline characters will be inserted in a semi-intelligent way. Multiple expressions are separated by a single space. Floating point numbers are printed with 5 significant digits. 
The global variable NumberFormat can be assigned a format, as in the printf function, and all numbers will be printed accordingly. lprint is intended to provide a safe and quick way of printing expressions. In general, it is not possible to read them back into Darwin, use dprint for Darwin-readable output. Examples: > x:= [[1,2],[3,4]]: > lprint('A linear printing of a square matrix:', x); A linear printing of a square matrix: [[1, 2], [3, 4]] See Also: ?dprint ?printf (contains conversion patterns) ?prints ?sscanf ?print ?PrintMatrix ?sprintf matrix_inverse Function matrix_inverse - invert a square matrix Option: builtin Calling Sequence: matrix_inverse(A) Parameters: Name Type Description -------------------------------------------------------- A matrix a matrix for which the inverse is wanted Returns: matrix Synopsis: Compute the inverse of a square matrix. If A is a square matrix the same effect is obtained by computing A^(-1). To resolve a system of linear equations, GaussElim(A,b) is more efficient than A^(-1) * b. Examples: > A := [[3,1,2],[1,2,-1],[2,-1,5]]; A := [[3, 1, 2], [1, 2, -1], [2, -1, 5]] > A^(-1); [[0.9000, -0.7000, -0.5000], [-0.7000, 1.1000, 0.5000], [-0.5000, 0.5000, 0.5000]] See Also: ?Cholesky ?Eigenvalues ?GivensElim ?LinearProgramming ?transpose ?convolve ?GaussElim ?Identity ?matrix max Function max - maximum of numbers or list of numbers Options: builtin and numeric Calling Sequence: max(L1,L2,...) Parameters: Name Type Description ------------------------------------------------------------------------------------------ Li {numeric,list(numeric),list(list(numeric))} numbers or list (of lists) of numbers Returns: numeric Synopsis: Finds the maximum valued element in L if L is simply a list of numeric elements. If L is a list of lists of numeric, the function effectively flattens this list to a simple list and returns the maximum valued element. 
Examples: > max(5, 97, 22, [14,15,16] ); 97 > max(2,3,5,7,11,13,17,19); 19 See also: ?avg ?min ?std ?var median Function median - median of numbers or list of numbers Calling Sequence: median(L1,L2,...) Parameters: Name Type Description ------------------------------------------------------------ Li {numeric,list(numeric)} a number or list of numbers Returns: numeric Synopsis: Finds the median of all the values in the arguments. Examples: > median(5, 97, 22 ); 22 > median(2,3,5,7,11,13,17,19); 9 See also: ?avg ?max ?min ?std ?var member Function member Option: builtin Calling Sequence: member(a,b) Parameters: Name Type Description ---------------------------------------------------------------------- a anything element to be tested for membership in set or list b {list,set} a set or list Returns: boolean Synopsis: The member function returns true iff element a is in the set/list b Examples: > member(5, [1,2,5,7]); true See Also: ?intersect ?SearchArray ?subset ?union ?minus ?SearchOrderedArray ?table min Function min - minimum of numbers or list of numbers Options: builtin and numeric Calling Sequence: min(L1,L2,...) Parameters: Name Type Description ------------------------------------------------------------------------------------------ Li {numeric,list(numeric),list(list(numeric))} numbers or list (of lists) of numbers Returns: numeric Synopsis: Finds the minimum valued element in L if L is simply a list of numeric elements. If L is a list of lists of numeric, the function effectively flattens this list to a simple list and returns the minimum valued element. Examples: > min(5, 97, 22, [14,15,16] ); 5 > min(2,3,5,7,11,13,17,19); 2 See also: ?avg ?max ?median ?std ?var minus Function minus Options: builtin and polymorphic Calling Sequence: a minus b minus(a,b) Parameters: Name Type ------------ a a set b a set Returns: set Synopsis: Computes the set difference of two sets; that is a set consisting of all elements in a but not in b. 
The value intersect() is understood to be the entire universe, and hence subtracting intersect() will return the empty set and subtracting from intersect() is not allowed. Examples: > {1,2,3} minus {2,3,4}; {1} > {1,2,3} minus {}; {1,2,3} > {1,2,3} minus intersect(); {} See also: ?intersect ?member ?subset ?union mod Function mod Options: builtin, numeric and polymorphic Calling Sequence: mod(x,y) Parameters: Name Type Description ----------------------------- x numeric a number y numeric a number > 0 Returns: numeric Synopsis: This function computes x (mod y), i.e. it returns the integer remainder after dividing x by y. Note: if x or y are so large that they cannot be represented exactly as integers in a double precision number, the results may be wrong. Examples: > mod(5,2); 1 > mod(99,1); 0 > mod(-3,2); 1 See also: ?ceil ?floor ?round ?trunc mselect Function mselect Calling Sequence: mselect(fn,obj,[arg2,...]) Parameters: Name Type ---------------------------------------------------------------------------- fn the selection function, returns true/false obj a composed object (list, set, structure) whose parts will be selected arg2 additional arguments that are passed to fn Returns: anything : the result is of the same type as obj Synopsis: Mselect selects the parts of the second argument and builds a new object of the same type, containing only the parts for which the function fn is true. More precisely, for each i from 1 to length(obj), op(i,obj) is included in the result if and only if fn(op(i,obj),arg2,...) is true. The extra arguments, arg2, ... are passed as additional arguments to fn. Mselect is normally used on lists, sets or structures.
Examples: > mselect( type, [-1,0,1,1.2], posint ); [1] > mselect( x -> (x<1), [-1,0,1,1.2] ); [-1, 0] See also: ?op names Function names - find all assigned names Option: builtin Calling Sequence: names(typ) Parameters: Name Type ------------------------ typ {'assigned',type} Returns: an expression sequence Synopsis: When no arguments are specified (or with "anything" as typ), the names function returns all names, assigned or unassigned. When the argument typ is included, all names which are assigned a value of type typ are returned. The special typ value "assigned" will return only the names which are assigned. Be careful when using all the names: some names, such as break and next, may produce very unexpected results when evaluated. Examples: > names(numeric); LongInteger_log2base, DBmarkG, SumSq, NBody_Cost, ScaleIndex_I, MLPamDistance, AveNormSD_lim, DBL_EPSILON, DBL_MAX, MinIterBeforeNewton, ExpectedPamDistance, BINARY_IN_PATH, BINARY_IN_WRAPPER_FOLDER_32, DimensionlessFit, ntRNA, LongInteger_base, BINARY_HARDCODED, NumberErrors, VertexCoverLowerBound, SetRandSeed_value, MST_Qual, Pi, StepsForCG, LongInteger_base2, FollowLine_nmin, printlevel, BINARY_IN_WRAPPER_FOLDER, AveNormSD_Damp, NewNodeName_next, NoSpectralBeforeSD, LinearClassify_X0_i0, NBodyPotential, iii, Minimize_n, RepeatNewtonFactor, MinLen See also: ?assigned ?types noeval Function noeval Option: builtin Calling Sequence: noeval(exp) Parameters: Name Type ----------------- exp expression Returns: expression Synopsis: The noeval function delays evaluation of the expression exp. It simply returns exp.
Examples: > unevaluated := noeval(1+1); unevaluated := 1+1 > unevaluated_function := noeval(factorial(5)); unevaluated_function := factorial(5) > unevaluated; 1+1 > unevaluated_function; factorial(5) See also: ?eval op Function op - pick up operands of an expression Option: builtin Calling Sequence: op(obj) op(i,obj) op(i..j,obj) Parameters: Name Type Description -------------------------------------------------------------------- obj {array,equal,list,range,set,structure} an object with parts i posint j posint Returns: An expression sequence with the components of the object Synopsis: The op(obj) function strips off the outer-most square brackets [,] (list, array, matrix) or outer-most braces {,} (set). It returns an expression sequence with the components. One use of op is to change, for example, a list into a set. E.g. {op(x)}. When op is given two arguments, a posint i and an object obj, the function returns the i^th part of obj. If a range is given, it returns all the i^th through j^th parts of obj. Examples: > op([1, [a,b], 4]); 1, [a, b], 4 > op({1..2, {4, 5, {7}}}); 1..2, {4,5,{7}} > z := var = integer; z := var = integer > op(1,z); var > op(2, z); integer See also: ?selectorfunction (select operator a[i]) parse Function parse Option: builtin Calling Sequence: parse(s) Parameters: Name Type ------------------------------------------------------------- s a string with a correct Darwin expression or statement Returns: anything : an unevaluated Darwin expression/statement Synopsis: Parse does the same syntactic analysis that Darwin would do on a program or interactive command. It returns the object thus created without evaluation. If the string s has a syntax error, then the command will print an appropriate error and return an error condition. If more than one statement is provided in the string, then these are concatenated and a statement sequence is returned. A terminating semicolon is not necessary, the parser will add one. 
Any NULL statement will be ignored. Examples: > parse('a+b'); a+b > eval(parse('xyz := 1')); 1 > xyz; 1 See also: ?eval ?noeval print Function print - general pretty-printing Option: polymorphic Calling Sequence: print(e1,e2,...) Parameters: Name Type Description ----------------------------- ei anything expression Returns: NULL Synopsis: This function attempts to print out the contents of each ei in a pretty/readable manner. Any user-defined data structure/class named, for example, ClassName can make use of the print command by creating a procedure named ClassName_print. This routine should detail the manner in which the data structure is to be sent to the standard output. Any invocation of the print statement on an object of type ClassName will automatically invoke this routine. All built-in Darwin data structures have such a routine. Floating point numbers are printed with 5 significant digits. The global variable NumberFormat can be assigned a format, as in the printf function, and all numbers will be printed accordingly. To print procedures there are two options. Print on a procedure produces a short description based on the parameters, description field (if any) and return type. To print the body of the procedure (its code), the function disassemble should be used in conjunction with print. This produces a nice, albeit not perfect, formatting.
Examples: > x:= [[1,2],[3,4]]; x := [[1, 2], [3, 4]] > print(x); 1 2 3 4 > f := proc(x:positive) description 'test example'; for i to 20 do x+i od; i+sin(x) end: > print(f); f: Usage: f( x:positive ) test example > print(disassemble(op(PartialFraction))); proc( r:numeric, eps:numeric ) local t, t2; if nargs = 1 then procname(r, 1e-05) elif r < 0 then t := procname(-1*r,eps); [-1*t[1],t[2]] elif 1 < r*eps then [round(r),1] elif r < eps then [0,1] elif type(r,integer) then [r,1] else t2 := floor(r); if r-1*t2 < eps then [t2,1] else t := procname((r-1*t2)^(-1),r^2*eps); [t2*t[1]+t[2],t[1]] fi fi end: See also: ?dprint ?lprint ?printf ?prints printf Function printf Option: builtin Calling Sequence: printf(textpattern, e1, e2,...) Parameters: Name Type ------------------------ textpattern string ei expression Returns: NULL Synopsis: The printf statement behaves in a similar manner to C's printf statement. Conversion characters for the printf command: Character Description a prints any Darwin value including lists, sets and structures A same as a, but will quote strings (same as dprint) c prints a single character d prints an integer e prints a number in exponential notation f prints a number (decimal notation) g prints a number (general format, use f or e, whichever is shorter) e prints a number (explicit exponent) o prints the octal conversion of an integer s prints a string (symbol or string) u prints an unsigned integer x prints the hexadecimal conversion of an integer % prints a percent sign % The cursor control sequences for the printf command: Character Description b backspace n carriage return and newline t tab v newline \\ single backslash '' single quote Examples: > printf('%a, %a\n', ['L', 'I', 'S', 'T'], 'a means any structure'); [L, I, S, T], a means any structure > int := 1234; int := 1234 > printf('|%d|%10d|%-10d|\n', int, int, int, int); |1234| 1234|1234 | > t := 1234.567; t := 1234.5670 > printf('|%f|%12f|%12.5f|%-12.5f|\n', t, t, t, t); |1234.567000|
1234.567000| 1234.56700|1234.56700 | > printf('|%11s|%12s|%12s|%12s|\n', 'normal', 'field of 12', '5 decimal', 'left flush'); | normal| field of 12| 5 decimal| left flush| See also: ?lprint ?print ?prints ?sprintf ?sscanf prints Function prints - print strings in full length Calling Sequence: prints(string1,...) Parameters: Name Type Description ----------------------------------------- string1 string a string to be printed Returns: NULL Synopsis: Prints all the arguments as strings (format %s), each ended with a newline. See also: ?dprint ?lprint ?print ?printf product Function product Option: builtin Calling Sequence: product(a) product(p,i = lo..hi) product(p,i = s) Parameters: Name Type Description -------------------------------------------------------------------- a list a list of multipliable elements lo numeric lower bound of index hi numeric upper bound of index p anything expression to be multiplied for all index values s {list,set} set or list of index values Returns: numeric Synopsis: When product is called with a list, the product of all the elements of the list is computed. The formats with an index variable, i, multiply the expression p over all the values of the variable i. The expression p is evaluated each time that i is assigned a value. If a range of values is given, i is first assigned lo and is then incremented by 1 each time. The expression p is evaluated and multiplied as long as i <= hi. In the third format, i is assigned all the values of the set or list. The index variable may already be assigned a value outside the call; that value is not changed, nor does it disturb the multiplication. Examples: > product([1,3,6]); 18 > product(i,i=1..10); 3628800 > product(i^2,i={2,3,5,7}); 44100 > i:=nonsense; i := nonsense > product(10*i,i=1..2); 200 > i; nonsense > product([]); 1 See also: ?list ?op ?seq ?sum ?zip regexp Function regexp( r:string, s:string ) Returns the positions and lengths of regexp r in string s.
remember Function remember - evaluate a function and remember its result Option: builtin Calling Sequence: remember(func_call) Parameters: Name Type Description --------------------------------------- func_call structure a function call Returns: anything Synopsis: The remember function stores results of function evaluations in an internal table for the purpose of saving computation time. When remember is called, the system checks whether the argument function has been called previously with the same arguments, and if so, the previous result is returned. If it is not found, the function call is executed and its result is stored in the internal table as well as returned to the user. The internal table does not keep all the results forever: at garbage collection time, arguments that are no longer available cause the corresponding entries to be removed. Eventually, all unused entries will be removed. The user of remember should keep in mind that this is a heuristic saving of evaluations; it should not be counted on to happen every time. Remember should only be used when the argument function has no side effects (for example printing), since otherwise it is unpredictable when these side effects will happen. It should also be used only on functions which do a significant amount of computation; otherwise its overhead is not justified. The profiling tools are good for determining which functions will profit from remembering. Warning: When the returned value is a structure (e.g. a matrix or a class), changing the structure will also change the value stored in the remember table! This will lead to unexpected behaviour. To erase the remember table (for example, because the function to be remembered has changed its behavior in some way and old values should not be remembered), call remember with the argument "erase"; this erases all previously remembered values. For the example below we compute the Fibonacci numbers with their simple recurrence.
Without the remember function, this definition takes exponential time. Examples: > F := proc( n:integer ) if n < 2 then n else remember(F(n-1)) + remember(F(n-2)) fi end: > [ seq( F(i), i=0..10 )]; [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55] > F(50); 12586269025 See also: ?hash ?profiling ?table return Function return Option: builtin Calling Sequence: return(obj) Parameters: Name Type --------------- obj anything Returns: anything Synopsis: The return function causes Darwin to immediately exit a procedure and return obj to the point of calling. Examples: > sum_up := proc(x) total := 0: for i from 1 to x do total := total + i od; return(total); end; sum_up := proc (x) local total, i; total := 0; for i to x do total := total+i od; return(total) end > sum_up(1772); 1570878 See also: ?exit round Function round Options: builtin, numeric and zippable Calling Sequence: round(x) Parameters: Name Type -------------- x numeric Returns: integer Synopsis: This function rounds the argument x to the nearest integer. For exact integers + 1/2, the rounding is done according to the next higher significant bit (IEEE standard). Examples: > round(5.5555555); 6 > round(1.3); 1 > round(-7.8); -8 See also: ?ceil ?floor ?mod ?trunc scalb Function scalb Options: builtin and numeric Calling Sequence: scalb(x,n) Parameters: Name Type ------------------------ x a numerical value n an integer Returns: numeric Synopsis: scalb returns the value x multiplied by the base to the power n. This function is defined in the IEEE 754 floating point standard. For IEEE 754 floating point, the base is 2 and scalb(x,n) = x * 2^n, is computed by exponent manipulation directly from the representation. Hence it is very fast and exact. 
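The relation between scalb and ilogb can be written out as a short worked equation. This is a sketch that assumes IEEE 754 base-2 floating point and a normalized, nonzero x; it relies on the bound 1 <= |x| / 2^ilogb(x) < 2 given under ?ilogb, and the symbols m and e are introduced here purely for illustration.

```latex
\begin{align*}
x &= m \cdot 2^{e}, \qquad 1 \le |m| < 2, \quad e = \mathrm{ilogb}(x), \\
\mathrm{scalb}(x, n) &= x \cdot 2^{n} = m \cdot 2^{e+n}, \\
\mathrm{scalb}(x, -\mathrm{ilogb}(x)) &= m .
\end{align*}
% Worked value: ilogb(Pi) = 1, so scalb(Pi, -1) = Pi/2 \approx 1.5708,
% which lies in the interval [1, 2).
```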
Examples: > scalb(1,10); 1024 > scalb(1,-1023); 1.1125e-308 > scalb(0,1024); 0 See also: ?ilogb ?lg seq Function seq Option: builtin Calling Sequence: seq(e,n) seq(e,i = lo..hi) seq(e,i = SetOrList) Parameters: Name Type Description ----------------------------------------------------------- e an arbitrary expression n integer i symbol lo numeric hi numeric SetOrList {list,set} set or list of values Returns: expression sequence of the e objects Synopsis: In the first format, an expression sequence with e replicated n times is returned. This is useful, for example, to create arrays with initial values and to pad arrays. Normally, expression sequences will be enclosed in lists, sets or as arguments of functions or data structures. In the second format, an expression sequence is produced for all the values of e with the symbol i assigned consecutive values from lo to hi (inclusive). In both cases, a negative integer or hi < lo will generate an empty expression sequence. In the third format, the variable i will take all the values from the set or list. Examples: > [seq(7,3)]; [7, 7, 7] > {seq(2^i,i=0..10)}; {1,2,4,8,16,32,64,128,256,512,1024} > A(seq(i,i=1.5..2.8)); A(1.5000,2.5000) > seq(Rand(),4); 0.8632, 0.4194, 0.7952, 0.2781 > [seq(0,5),seq(i,i=1..5),seq(6,3)]; [0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 6, 6] See also: ?op ?sum ?zip sequal Function sequal Option: builtin Calling Sequence: sequal(a,b) Parameters: Name Type ------------------------------ a an arbitrary expression b an arbitrary expression Returns: true or false Synopsis: sequal tests for the structural equality of expressions. This means that if two expressions differ in their structure, but represent the same value, (e.g. LongInteger(1) and 1), sequal will just test for the structural equality, and hence sequal(LongInteger(1),1) will return false, whereas evalb(LongInteger(1)=1) will return true. 
Quoted strings and symbols representing the same character sequence compare equal under normal equality but will compare different with sequal. The primary use of sequal is to take advantage of the representation of an object, and test for an exact representation, rather than just a value. Indiscriminate use of sequal leads to non-polymorphic programs. Examples: > sequal(LongInteger(1),1); false > evalb(LongInteger(1)=1); true > sequal( {1,2,3}, {1,2} ); false > sequal('abc',abc); false See also: ?evalb ?If ?objectorientation sha2 Function sha2 - Computes SHA2 hash of a string Option: builtin Calling Sequence: sha2(s) Parameters: Name Type Description ----------------------------------- s string string to be hashed Returns: string Synopsis: This function computes the 512-bit SHA2 hash value of a given string. The result is represented as a hex-formatted string. Examples: > sha2('abc'); ddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f See also: ?hash sign Function sign - sign (-1,0,1) of a number or list of numbers Calling Sequence: sign(val) Parameters: Name Type Description --------------------------------------------------- val {list,numeric} any value or list of values Returns: {-1,0,1} Synopsis: Returns -1 (if val<0), 0 (if val=0) or 1 (if val>0). It also maps itself onto lists (and hence matrices). Examples: > sign(-5); -1 > sign( [-1,2,-3,4,0] ); [-1, 1, -1, 1, 0] See also: ?If ?max ?min ?zip sin Function sin Options: builtin, numeric, polymorphic and zippable Calling Sequence: sin(x) Parameters: Name Type ------------------------ x a numerical value Returns: numeric Synopsis: This function computes the trigonometric sine function. sin(x) has simple zeros at x=n*Pi. References: Handbook of Mathematical Functions, M. Abramowitz and I.
Stegun, Ch 4.3 Examples: > sin(0); 0 > sin(Pi/4); 0.7071 > sin(Pi/2); 1 > sin(-Pi); -1.2246e-16 See also: ?arcsin ?arctan ?cos ?tan size Function size - number of words used by the entire object Option: builtin Calling Sequence: size(obj) Parameters: Name Type Description ----------------------------- obj anything any object Returns: {0,posint} Synopsis: Returns the total number of words used by the representation of obj in memory. This is the number of words, which depending on the hardware will be 32- or 64-bit words. See version() for this information. This should be used mostly for comparative purposes, when two alternatives for representing some information have to be evaluated. Size will not count the name of the Class for data structure objects, under the assumption that this name is defined only once and used many times. Other objects, even if used repeatedly, will be counted entirely. Examples: > size(''); 3 > size({1,2,{a,b,c}}); 21 > size([1,2,3,4]); 7 > size([1.2,2.2,3.2,4.2]); 15 See also: ?assemble ?Class ?CreateArray ?disassemble ?length ?version sleep Function sleep Option: builtin Calling Sequence: sleep(t) Parameters: Name Type ------------- t posint Returns: NULL Synopsis: This function causes Darwin to sleep (delay execution) for t seconds. Only the keystroke will interrupt the sleep command. Examples: > sleep(1); See also: ?CallSystem ?getpid ?TimedCallSystem sort Function sort - sort a list Option: builtin Calling Sequence: sort(L) sort(L,orderproc) Parameters: Name Type Description ---------------------------------------------------------- L list(anything) a list of things to be sorted orderproc procedure an ordering procedure Returns: list(anything) Synopsis: The sort function can order a list (array) containing any type of elements as long as these elements are comparable, i.e. the operator <= is applicable and well-defined. When only supplied a list, sort places the elements in ascending order and returns a copy of the list.
The ordering it uses is ascending order and for other data structures it is the same order that sets use. In particular, if there are no duplicate elements in the input list, sorting without an orderproc or transforming the list into a set have the same effect. The optional second argument must specify an ordering procedure. This procedure may have a single argument, in which case it is understood to return a value on which to order the records, or may take two arguments, in which case it should return true or false depending on whether the arguments are in the desired order. In both cases the arguments will be the entries of the array to be sorted. Sort does not destroy/change its argument, it returns a new array of (sorted) data. Naturally, sort is most efficient when called with a single argument. Examples: > a := [521, -923, 1293, 521, -3342]; a := [521, -923, 1293, 521, -3342] > sort(a); [-3342, -923, 521, 521, 1293] > a; [521, -923, 1293, 521, -3342] > sort(a, x -> -x); [1293, 521, 521, -923, -3342] > neg := proc(a) return(-(abs(a))) end; neg := proc (a) return(-1*|a|) end > sort(a, neg); [-3342, 1293, -923, 521, 521] > b :=[[z, f], [w, e], [y, d]]; b := [[z, f], [w, e], [y, d]] > sort(b, b->b[2]); [[y, d], [w, e], [z, f]] See also: ?set sprintf Function sprintf - Storage print -return a string as if printed Option: builtin Calling Sequence: sprintf(p,a1..ak) Parameters: Name Type ----------------------------------------------- p pattern (same format as for printf) ai arguments to be formatted according to p Returns: string Synopsis: This function behaves similar to C's sprintf function. 
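Since sprintf and sscanf share the same conversion codes (see ?printf), a formatted string can be read back with the pattern that produced it. The following is a minimal sketch, not taken from the original help text; the variable name s is arbitrary. Going by the examples for the two functions, the second statement is expected to yield the list [5, 6].

```darwin
# Format two integers into a string, then parse the string back into a list.
s := sprintf('%d %d', 5, 6):
sscanf(s, '%d %d');
```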
Examples: > i:=5; i := 5 > j:=6; j := 6 > sprintf('i and j are: %d %d', i, j); i and j are: 5 6 See also: ?printf (for a complete list of all conversion codes) ?sscanf sqrt Function sqrt - Square Root Options: builtin, numeric and zippable Calling Sequence: sqrt(x) Parameters: Name Type ------------------- x numeric >= 0 Returns: numeric Synopsis: Computes the square root of x. Examples: > sqrt(5); 2.2361 See also: ?Complex ?Polar sscanf Function sscanf - String Format Scan Option: builtin Calling Sequence: sscanf(txt,pat) Parameters: Name Type Description --------------------------- txt string a string pat string a pattern Returns: list Synopsis: This function behaves similarly to C's sscanf function. Examples: > sscanf('hello 6 3', '%s %d %d'); [hello, 6, 3] See also: ?printf (for a complete list of all conversion codes) ?sprintf std Function std - unbiased estimate of standard deviation of (list of) numbers Calling Sequence: std(L1,L2,...) Parameters: Name Type Description ------------------------------------------------------------ Li {numeric,list(numeric)} a number or list of numbers Returns: numeric Synopsis: Finds the standard deviation of all the values in the arguments. This is the square root of the unbiased estimator of the variance, that is, it is computed with the formula: sqrt( (sum(x^2) - sum(x)^2/n) / (n-1) ), where n is the number of x values. This function needs at least two values to compute successfully. Examples: > std(5, 97, 22, [14,15,16] ); 34.1609 > std(2,3,5,7,11,13,17,19); 6.3906 See also: ?avg ?max ?median ?min ?var string Function string( a ) Converts argument to a string. Multiple arguments are concatenated. string_RGB Function string_RGB - convert a color name into an RGB vector Calling Sequence: string_RGB(s) Parameters: Name Type Description -------------------------------------------- s string a color name without spaces Returns: nonnegative : nonnegative Synopsis: This function converts a color name into a 3 value RGB vector.
The vector contains the values for red, green and blue in a scale of 0 to 1. Black is [0,0,0] and white is [1,1,1]. The name matching is case independent and it tolerates up to two errors. About 650 colours are known to this function. The full list can be found at lib/Color. Examples: > string_RGB(MidnightBlue); [0.09803922, 0.09803922, 0.4392] > string_RGB(midnightBLAU); [0.09803922, 0.09803922, 0.4392] > string_RGB(chocolate); [0.8235, 0.4118, 0.1176] See also: ?Color ?DrawTree ?RGB_string subs Function subs - substitute occurrences of subexpressions Calling Sequence: subs(val1 = repl1,val2 = repl2,...,s) Parameters: Name Type Description ------------------------------------------------- val.i anything an object to be replaced in s repl.i anything the replacement of val.i s anything an arbitrary object Returns: anything Synopsis: The function subs, creates a new expression, substituting every occurrence of the given values by the corresponding replacements. The substitutions happen left-to-right for the entire s. Examples: > subs(3=abc,[1,2,3]); [1, 2, abc] > subs(2=77,[77]=abc,A(1,[2],3)); A(1,abc,3) > subs(A=B,A(11,22)); B(11,22) See also: ?coeff ?has ?hastype ?indets ?lcoeff ?mselect ?noeval ?types subset Function subset Option: builtin Calling Sequence: subset(a,b) Parameters: Name Type ----------- a set b set Returns: boolean Synopsis: The subset function returns true if and only if every element in set a is in set b. 
Examples: > subset({1,2,3}, {1,2,3,4}); true See also: ?intersect ?member ?minus ?union sum Function sum Option: builtin Calling Sequence: sum(a) sum(p,i = lo..hi) sum(p,i = s) Parameters: Name Type Description ---------------------------------------------------------------- a list a list of summable elements lo numeric lower index of summation hi numeric upper bound of summation p anything expression to be summed for all index values s {list,set} set or list of index values Returns: numeric Synopsis: When sum is called with a list, the sum of all the elements of the list is computed. The formats with an index variable, i, sum the expression p for all the values of the variable i. The expression p is evaluated each time that i is assigned a value. If a range of values is given, i is first assigned lo, which is incremented by 1 every time. The expression p is evaluated and summed as long as i <= hi. In the third format, i is assigned all the values of the set or list. If sum is applied to a matrix, the rows are added together element-wise, which is an easy way of obtaining the column sums. If it is applied twice to a matrix, it will return the sum of all the elements of the matrix. The summation variable may already be assigned a value outside the sum; that value will not be changed, nor will it disturb the summation. Examples: > sum([1,3,19]); 23 > sum(1/i,i=1..1000); 7.4855 > sum(i^2,i={2,3,5,7}); 87 > A := [[1,2,3],[2,2,2]]: > sum(A); [3, 4, 5] > sum(sum(A)); 12 > i:=nonsense; i := nonsense > sum(10*i,i=1.53 .. 2); 15.3000 > i; nonsense See also: ?list ?matrix ?op ?product ?seq ?zip symbol Function symbol Option: builtin Calling Sequence: symbol(s) Parameters: Name Type --------------- s a string Returns: symbol Synopsis: Symbol transforms a string into a symbol (a Darwin variable that can hold values). This is typically needed when a name is formed by concatenation or as a result of an sprintf() command. The symbol obtained always refers to a global symbol, never to a local or to a parameter, when computed inside a procedure.
Examples: > symbol(a.b); ab > type(symbol(a.b)); symbol > type(a.b); string See also: ?names ?string table Class table - structure to store and retrieve elements by name Template: table() table(unassig) Fields: Name Type Description ---------------------------------------------------------------------------- unassig anything value to be returned for an unassigned entry procedure a procedure that will be invoked on unassigned entries key anything key value for accessing or storing in table Returns: table Methods: list plus print table_type Synopsis: A table stores arbitrary values or structures, which can be accessed by a key. The key can be any valid object in Darwin. The access to the table is done with normal indexing and the assignment of values is done with assignments. When an inexistent element is accessed a special value is returned. By default this value is the symbol "unassigned". It can be changed to any other value. If the default value is a procedure, it will be understood that the value to be returned on an inexistent entry is the result of computing the procedure over the argument. For sparse numerical tables, it is convenient to set the unassigned value to 0 so addition into the table can be done directly. To test if an entry is assigned or not, it is not possible to use the function assigned, as the table is not a name, and non-existent entries are automatically considered to have the default value. Instead, testing for the default value should be used. The iterator Indices() will operate on a table and iterate over all the existing (assigned) indices of the table. 
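The remark about sparse numerical tables can be illustrated with a small counting loop. This is a minimal sketch under the assumptions stated in the comments; the names counts, w and z and the keys are made up for illustration, and no output is shown because it has not been run here.

```darwin
# Use 0 as the value returned for unassigned entries, so that
# counts[w] + 1 is well-defined the first time a key is seen.
counts := table(0):
for w in [mouse, rat, mouse, ecoli, mouse] do
    counts[w] := counts[w] + 1
od:
# Iterate over the keys that were actually assigned.
for z in Indices(counts) do lprint(z, counts[z]) od;
# An entry is tested for existence by comparing against the default value.
if counts[yeast] = 0 then lprint('no entry for yeast') fi;
```

Note that with this convention an entry explicitly set to 0 cannot be told apart from a missing one, so a default value that never occurs as real data is the safer choice when that distinction matters.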
Examples: > Kingdom := table(unknown): > Kingdom[mouse] := Eukaryota: Kingdom[ecoli] := Bacterium: > [Kingdom[mouse], Kingdom[rat]]; [Eukaryota, unknown] > print(Kingdom); ecoli --> Bacterium mouse --> Eukaryota > Kingdom[ecoli] := Bacteria; Kingdom[ecoli] := Bacteria > for z in Indices(Kingdom) do lprint(z,Kingdom[z]) od; ecoli Bacteria mouse Eukaryota See Also: ?assigned ?SearchAllArray ?SearchOrderedArray ?subset ?member ?SearchArray ?set ?Table tan Function tan Options: builtin, numeric, polymorphic and zippable Calling Sequence: tan(x) Parameters: Name Type ------------------------ x a numerical value Returns: numeric Synopsis: This function computes the trigonometric tangent function defined by: tan(x) = sin(x)/cos(x). tan(x) has simple poles at x=Pi/2+n*Pi. References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun, Ch 4.3 Examples: > tan(0); 0 > tan(Pi/4); 1.0000 > tan(-Pi); 1.2246e-16 See also: ?arcsin ?arctan ?cos ?sin time Function time Option: builtin Calling Sequence: time() time(expr) Returns: expression Synopsis: This function returns the time needed to evaluate expr. If no expression is specified, it returns the total CPU time used by the current session of Darwin. If time is called with the string "all", then the total CPU time of the process and all its children is returned. This is useful to find the total time used when Darwin calls other programs. Examples: > time(); 37.5100 > time(exp(1.7 * 3.14)); 0 > time(all); 40.9000 See also: ?date ?gigahertz ?TimedCallSystem ?UTCTime transpose Function transpose - transpose a matrix Option: builtin Calling Sequence: transpose(A) A^T A^t Parameters: Name Type ----------------------- A matrix(anything) Returns: matrix(anything) Synopsis: Computes the transpose, A^T, of a matrix A. (The transpose of a matrix is produced by replacing entry A_ij with entry A_ji for all i, j.) Transposition can also be achieved through the use of the exponent T or t.
For this to work properly, T or t must not be assigned. Transpose will also work for higher order arrays. Examples: > A := transpose([[1, 2, 3], [4, 5, 6], [7, 8, 9]]); A := [[1, 4, 7], [2, 5, 8], [3, 6, 9]] > A^t; [[1, 2, 3], [4, 5, 6], [7, 8, 9]] See Also: ?Cholesky ?GaussElim ?LinearProgramming ?convolve ?GivensElim ?matrix ?Eigenvalues ?Identity ?matrix_inverse traperror Function traperror Option: builtin Calling Sequence: traperror(exp) Parameters: Name Type ----------------- exp expression Returns: The result of evaluating exp or a string. Synopsis: If an error occurs while evaluating exp, the traperror function returns a string consisting of the Darwin error message. Execution does not halt. If no error occurs, traperror simply returns the result of evaluating exp. Examples: > traperror( undefined_symbol/20 ); undefined_symbol, variable not assigned, invalid term in product See also: ?error ?lasterror trim Function trim - Removes leading and trailing whitespace from a string Calling Sequence: trim(s) trim(s,chars) Parameters: Name Type Description ------------------------------------------------------ s string string to be trimmed chars set (optional) set of chars to be removed Returns: string Synopsis: Returns a copy of the string s with leading and trailing whitespace removed. If chars is not specified, the following characters are considered to be whitespace: ' ','\t','\n','\r' and '\0'. Examples: > trim(' Hello '); Hello > trim('a World ',{' ','a'}); World See also: ?ConcatStrings ?RenderTemplate ?string trunc Function trunc Options: builtin, numeric and zippable Calling Sequence: trunc(x) Parameters: Name Type -------------- x numeric Returns: integer Synopsis: Returns the integer portion of the argument x.
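Taking the integer portion means discarding the fractional part, so for negative arguments trunc moves toward zero and differs from floor. The following minimal sketch, not part of the original help text, prints the related functions listed under See also side by side; it assumes that floor and ceil follow their usual mathematical definitions, and the variable name x is arbitrary.

```darwin
# For a negative non-integer, trunc and ceil are expected to agree,
# while floor and round give the next integer further from zero.
x := -7.8:
lprint( trunc(x), floor(x), ceil(x), round(x) );
```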
Examples: > trunc(9.9999999); 9 > trunc(-9.99999); -9 See also: ?ceil ?floor ?mod ?round type Function type - type testing Option: builtin Calling Sequence: type(exp) type(exp,typeeval) Parameters: Name Type Description ----------------------------------- exp anything an expression typeeval any type Returns: {boolean,type} Synopsis: The type function with two arguments returns true if the type of evaluated exp is typeeval. Otherwise, it returns false. With a single argument, it returns the type of expression exp. Examples: > type(a, anything); true > type(5, integer); true > type('hello', string); true > type('abc'); string See Also: ?types (This gives a full description of the valid types and their compositions) union Function union Options: builtin and polymorphic Calling Sequence: a union b union(a,b) Parameters: Name Type ----------- a set b set Returns: set Synopsis: Computes the union of two sets, that is a set which has all the elements of a and b. Repeated elements are removed from the resulting set. The value intersect() is understood to be the entire universe, and hence unions including intersect() will simply return this term. In its functional form, any arbitrary number of sets can be unioned. In particular, union(a) = a, and union() = {}. Examples: > {1,2,3} union {2,3,4}; {1,2,3,4} > {1,2,3} union {}; {1,2,3} > {1,2,3} union intersect(); intersect() See also: ?intersect ?member ?minus ?subset uppercase Function uppercase Option: builtin Calling Sequence: uppercase(t) Parameters: Name Type ------------- t string Returns: string Synopsis: This function returns the string t converted to uppercase. Examples: > uppercase('I have been converted'); I HAVE BEEN CONVERTED See also: ?lowercase var Function var - unbiased estimate of variance of (list of) numbers Calling Sequence: var(L1,L2,...) 
Parameters: Name Type Description ------------------------------------------------------------ Li {numeric,list(numeric)} a number or list of numbers Returns: numeric Synopsis: Finds the variance of all the values in the arguments. This is an unbiased estimator of the variance, that is, it is computed with the formula: (sum(x^2) - sum(x)^2/n) / (n-1), where n is the number of x values. This function needs at least two values to compute successfully. Examples: > var(5, 97, 22, [14,15,16] ); 1166.9667 > var(2,3,5,7,11,13,17,19); 40.8393 See also: ?avg ?max ?median ?min ?std version Function version Option: builtin Calling Sequence: version() Returns: expseq Synopsis: Returns an expression sequence with 8 components: 1 VersionType: string, Production or Debug 2 Architecture: string, encoded name of architecture 3 Version: number 4 CompiledWith: string, name of the compiler used 5 CompilerVersion: string 6 CompilerOptions: string 7 DateCompiled: string, result of system command date 8 CharactersPerWord: posint, number of characters per word Examples: > version(); RelWithDebInfo, Linux, 4, /usr/bin/gcc, 4.4.3, -static, Tue Feb 19 10:53:16 CET 2013, 8, ON warning Function warning - outputs warning string on STDERR Option: builtin Calling Sequence: warning(txt) Parameters: Name Type Description ------------------------------------------------- txt string the warning message to be printed Returns: NULL Synopsis: This function outputs a warning message on the error stream. Examples: > warning('This is a warning'); WARNING: This is a warning See also: ?error ?lasterror ?traperror zip Function zip - compute an expression for each component Option: builtin Calling Sequence: zip(expr) Parameters: Name Type ----------------- expr expression Returns: list(expression) Synopsis: Compute an expression over the components of a list element-wise.
Zip gets its name from an operation like a+b, where a and b are lists of the same length and the result is the component-wise sum of each element of a and b. It is like zipping the two vectors together. In general, if expr is an expression which contains vectors (or lists or sets), and all these lists or sets are of the same length, then zip will compute the expression for each value of the lists/sets and return a list with the results. The arguments or components of expr which are not lists/sets will be taken as constants. Notice that even if the arguments contain only sets, zip will still return a list. Examples: > zip( sin( [1,2,3] )); [0.8415, 0.9093, 0.1411] > f := proc(a,b,c) a*b+c end; f := proc (a, b, c) a*b+c end > zip( f( [1,2,3], 10, {0.1,0.2,0.3} )); [10.1000, 20.2000, 30.3000] > zip( f( 1, [2,3,4,5], Pi-3 )); [2.1416, 3.1416, 4.1416, 5.1416] See also: ?op ?seq ?sum DBL_EPSILON System variable DBL_EPSILON Synopsis: The system variable DBL_EPSILON has the property that it is the smallest number where 1+DBL_EPSILON <> 1. See also: ?DBL_MAX DBL_MAX System variable DBL_MAX Synopsis: The system variable holds the value of the maximum double numeric allowed in Darwin. This variable is set in the library file darwinit. The LongInteger() routines in Darwin allow for larger integers. Examples: > DBL_MAX; 1.7976931348623147e+308 See also: ?LongInteger DMS System variable DMS Synopsis: The DMS (Dayhoff matrices) system variable has type list(DayMatrix) and contains 1266 Dayhoff matrices for various PAM distances between 0.049 and 1000 after a call to the function CreateDayMatrices(). Some routines perform all operations under the assumption that the Dayhoff matrices currently contained in DMS are the correct Dayhoff matrices to use.
See also: ?CreateDayMatrices ?CreateDayMatrix ?DayMatrix ?DM DM System variable DM Synopsis: The DM (Dayhoff matrix) system variable has type DayMatrix and contains a Dayhoff matrix computed at PAM distance 250 after a call to the function CreateDayMatrices(). Some routines perform all operations under the assumption that the Dayhoff matrix currently contained in DM is the correct Dayhoff matrix to use. See also: ?CreateDayMatrices ?CreateDayMatrix ?DayMatrix ?DMS DigestionWeights Class DigestionWeights - data structure to hold digestion information Template: DigestionWeights(digestor,weights) Fields: Name Type Description ----------------------------------------------------------------- digestor string name of the digestion enzyme weights numeric molecular weights of the fragments {equation,symbol} amino acid weight modification Returns: DigestionWeights Methods: DigestionWeights_type Synopsis: DigestionWeights is a data structure used to hold the name of the digestion enzyme followed by the weights obtained from the digestion. See ?enzymes for a complete description of the enzymes being recognized and their properties. Additionally we can specify various conditions that result in weight modifications of the amino acids. The weight modifications can be placed anywhere in the list of weights and are all optional. Currently these are: C=208.29 An equation with a one-letter code on the lhs and a weight on the right indicates to the program that the given amino acid (due to some modification pre/post digestion) has the given weight. Deuterated This word will indicate that all the hydrogen atoms have been exchanged with Deuterium, and hence the weights of all aa should be adjusted accordingly. If the digestor is CNBr or TrypsinCysModified or NTCB, changes to the weights are made automatically. 
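The optional weight modifications described above are given alongside the fragment weights. The following is a sketch only, built from the synopsis rather than from a tested session: the fragment weights are arbitrary illustrative numbers, the variable name dw is made up, and it assumes that an equation and the word Deuterated are accepted at any position among the weights, as the text states.

```darwin
# C=208.29 assigns a modified weight to cysteine (one-letter code C);
# Deuterated indicates that all hydrogen atoms were exchanged with deuterium.
dw := DigestionWeights('Trypsin', 601.9438, C=208.29, 504.0904, 1512.4545, Deuterated):
print(dw);
```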
Examples: > DigestionWeights('Trypsin', 601.9438, 504.0904, 1512.4545, 480, 590); DigestionWeights(Trypsin,601.9438,504.0904,1512.4545,480,590) See Also: ?DigestAspN ?DigestWeights ?enzymes ?ProbCloseMatches ?DigestSeq ?DynProgMass ?MassProfileResults ?Protein ?DigestTrypsin ?DynProgMassDb ?ProbBallsBoxes ?SearchMassDb EOF System variable EOF Synopsis: The system variable EOF is used to mark the end of a file. See also: Edges Class Edges Template: Edges(L) Fields: Name Type --------------------------- L { list(Edge), NULL } Returns: Edges Methods: Edges_type set Synopsis: The Edges structure is the first field of a Graph data structure. It consists of a list of Edge structures. Examples: > G := Graph( Edges( Edge(4,1,2), Edge(7,1,3), Edge(6,2,4), Edge(5,3,4) ), Nodes(1, 2, 3, 4) ); G := Graph(Edges(Edge(4,1,2),Edge(7,1,3),Edge(6,2,4),Edge(5,3,4)),Nodes(1,2,3,4)) See Also: ?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph ?Clique ?Graph_XGMML ?Path ?DrawGraph ?InduceGraph ?RegularGraph ?Edge ?MaxCut ?ShortestPath ?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph ?FindConnectedComponents ?MinCut ?VertexCover ?Graph ?MST ?Graph_minus ?Nodes Entries Iterator Entries - iterates over all entries in a database Usage: for z in Entries() do ... od; Returns: Entry Synopsis: This is an iterator which returns all the entries from the default database (stored in DB). The entries are returned in order, ie. Entry(1), Entry(2), etc. See Also: ?AC ?GetEntryInfo ?ID ?PatEntry ?Sequences ?Entry ?GetEntryNumber ?iterator ?Sequence Infix Iterator Infix - walks over all the nodes of a tree in infix order Usage: for n in Infix(tree) do ... od; Parameters: Name Type Description ---------------------------- tree Tree a general tree Returns: Tree Synopsis: This is an iterator which returns all the nodes (internal nodes of type "Tree" or external nodes of type "Leaf") of a tree in infix order. 
Infix order means that the left subtree is visited first, then the node, then the right subtree, for every node recursively. See Also: ?iterate ?Leaf ?objectorientation ?Prefix ?iterator ?Leaves ?Postfix Leaves Iterator Leaves - walks over all the leaves of a tree Usage: for n in Leaves(tree) do ... od; Parameters: Name Type Description ---------------------------- tree Tree a general tree Returns: Leaf Synopsis: This is an iterator which returns all the leaves of a tree in infix order. See Also: ?Infix ?iterator ?objectorientation ?Prefix ?iterate ?Leaf ?Postfix Lines Iterator Lines - iterates over all lines in a string Usage: for z in Lines(s) do ... od; Parameters: Name Type Description --------------------------- s string any string Returns: string Synopsis: This is an iterator which returns all the lines of a string (separated by a '\n' character) in the original order. The newline character at the end of each line is also included in the return value. See Also: ?iterate ?objectorientation ?SplitLines ?iterator ?SearchDelim ?string List Class List - holds contents of a List of displayable items Template: List(labelling,item1,item2,...) Returns: List Fields: Name Type Description ----------------------------------------------------------------- labelling {procedure,string} labelling method item_i {string,structure} text or structure for each entry Methods: HTMLC LaTeXC List_type print string Synopsis: The List structure holds information which will be formatted as a simple list. The first argument is a procedure which should produce a string for each integer argument. This will be the label that is used for each entry in the list. If the first argument is a string with a "%" in it, it is interpreted as an argument for sprintf. This is an easy way to provide arbitrary formatting of numbers. If it is any other string, that string is used for all items in the list.
A list is normally part of a Document or some other structure intended for display or human-readable purposes. The following table shows some common labelling functions and their results for a few integers: procedure 1 2 10 20 30 -------------------------------------------------------------- Roman I II X XX XXX Alphabetical A B J T AD x->lowercase(Roman(x)) i ii x xx xxx x->sprintf('(%s)',Alphabetical(x)) (A) (B) (J) (T) (AD) '(%d)' (1) (2) (10) (20) (30) 'o' o o o o o Examples: > string( List('--%d--',First,Second)); --1-- First --2-- Second See Also: ?Block ?Document ?latex ?RunDarwinSession ?Code ?HTML ?Paragraph ?screenwidth ?Color ?HyperLink ?PostscriptFigure ?Table ?Copyright ?Indent ?print ?TT ?DocEl ?LastUpdatedBy ?Roman ?View NULL System variable NULL Synopsis: The NULL expression sequence. Nodes Class Nodes Template: Nodes(N) Fields: Name Type -------------------------------- N {list({posint, 0}), NULL} Returns: Nodes Methods: Nodes_type Synopsis: The Nodes structure holds the list of labels for nodes in a graph. Examples: > G := Graph( Edges( Edge(4,1,2), Edge(7,1,3), Edge(6,2,4), Edge(5,3,4) ), Nodes(1, 2, 3, 4) ); G := Graph(Edges(Edge(4,1,2),Edge(7,1,3),Edge(6,2,4),Edge(5,3,4)),Nodes(1,2,3,4)) See Also: ?BipartiteGraph ?Graph_minus ?ParseDimacsGraph ?Clique ?Graph_Rand ?Path ?DrawGraph ?Graph_XGMML ?RegularGraph ?Edge ?InduceGraph ?ShortestPath ?EdgeComplement ?MaxCut ?TetrahedronGraph ?Edges ?MaxEdgeWeightClique ?VertexCover ?FindConnectedComponents ?MinCut ?Graph ?MST NucDB System variable NucDB Synopsis: This system variable is used to point to a database containing nucleotide or ribonucleotide sequences. See also: ?DB ?PepDB PepDB System variable PepDB Synopsis: This system variable is used to point to a database containing amino acid sequences. See also: ?DB ?NucDB Pi System variable Pi Synopsis: Contains the value of Pi. Postfix Iterator Postfix - walks over all the nodes of a tree in postfix order Usage: for n in Postfix(tree) do ...
od; Parameters: Name Type Description ---------------------------- tree Tree a general tree Returns: Tree Synopsis: This is an iterator which returns all the nodes (internal nodes of type "Tree" or external nodes of type "Leaf") of a tree in postfix order. Postfix order means that the left subtree is visited first, then the right subtree, then the node, for every node recursively. See Also: ?Infix ?iterator ?Leaves ?Prefix ?iterate ?Leaf ?objectorientation Prefix Iterator Prefix - walks over all the nodes of a tree in prefix order Usage: for n in Prefix(tree) do ... od; Parameters: Name Type Description ---------------------------- tree Tree a general tree Returns: Tree Synopsis: This is an iterator which returns all the nodes (internal nodes of type "Tree" or external nodes of type "Leaf") of a tree in prefix order. Prefix order means that the node is visited first, then the left subtree, then the right subtree, for every node recursively. See Also: ?Infix ?iterator ?Leaves ?Postfix ?iterate ?Leaf ?objectorientation Primes Iterator Primes - generates the prime numbers Usage: for n in Primes() do ... od; Returns: posint Synopsis: This is an iterator which returns all the prime numbers in increasing order. See also: ?iterate ?iterator ?objectorientation Protein Class Protein - data structure to hold SearchMassDb data Template: Protein(ApproxMass,DigestionWeights()) Protein(ApproxMass,DigestionMono()) Fields: Name Type Description ----------------------------------------------------------------------- ApproxMass structure approximate mass in Daltons DigestionWeights structure weights obtained from using the digestor DigestionMono structure as above but using monoisotopic masses Returns: Protein Methods: Protein_type Rand Synopsis: Protein is a data structure that holds the approximate mass in an ApproxMass data structure and the digestion weights in either a DigestionWeights or a DigestionMono data structure. It is used as input to the SearchMassDb function.
Examples: > Protein(ApproxMass(65800),DigestionWeights('Trypsin',601.9438, 504.0904, 1512.4545, 480, 590, 700, 998)); Protein(ApproxMass(65800),DigestionWeights(Trypsin,601.9438,504.0904,1512.4545,480,590,700,998)) See Also: ?DigestAspN ?DigestWeights ?MassProfileResults ?DigestionWeights ?DynProgMass ?ProbBallsBoxes ?DigestSeq ?DynProgMassDb ?ProbCloseMatches ?DigestTrypsin ?enzymes ?SearchMassDb Sequences Iterator Sequences - iterates over all entries in a database Usage: for z in Sequences() do ... od; Returns: Sequence Synopsis: This is an iterator which returns all the sequences from the default database (stored in DB). The sequences are returned in order, ie. Sequence(Entry(1)), Sequence(Entry(2)), etc. See Also: ?AC ?Entry ?GetEntryNumber ?iterator ?Sequence ?Entries ?GetEntryInfo ?ID ?PatEntry database Class database - Peptide or Nucleotide database Template: ReadDb(dbname) Returns: database Fields: Name Type Description ---------------------------------------------------------------------------- Entry,i string the offset into the database of the ith entry. For programming convenience, the offset of the beyond last entry is defined as DB[TotChars] FileName string name of the external file containing the database Pat,i integer the ith entry of the Pat index on the data, an integer offset string string the entire database as a string TotAA posint number of amino acids or bases in the database TotChars posint number of characters in the database TotEntries posint number of entries in the database type string dna, rna, mixed or peptide Methods: database_type Synopsis: A database (DNA, RNA, mixed or peptide) is loaded with the command ReadDb. The database needs to be loaded for most operations involving sequences and alignments. The database is always available in the global variable DB. 
A database can be assigned to any other name, but certain operations, like finding an Entry, or using the Pat index, will operate on the database which is assigned to the global variable DB. All the selectors are read-only; they cannot be modified. The database consists of an SGML-formatted file which contains the information about entries and sequences. For a file to be successfully loaded as a database, there have to be entries (tagged between <E> and </E>). Within each entry there should be a sequence (tagged between <SEQ> and </SEQ>) of peptides, DNA or RNA. The first time that a database is loaded, two index files are constructed. One contains the Pat index and it is stored under the name dbname.tree and the other is a quick reference for entries and is stored under the name dbname.map. If the database under dbname is changed, these two files (dbname.tree and dbname.map) should be removed to force ReadDb to rebuild them. The Pat index maintains a total order among all the subsequences of the SEQ fields of the entries. There are as many entries in the Pat index as amino acids (or bases) in the entire database. If a Pat index is not desired, creating a null dbname.tree file will prevent ReadDb from building a Pat index. Examples: > DB := ReadDb('/home/darwin/DB/SwissProt.Z'):; Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235 entries, 59631787 aminoacids) See Also: ?AC ?GetOffset ?ReadDb ?SearchSeqDb ?ConsistentGenome ?ID ?SearchDb ?SearchTag ?Entry ?Offset ?SearchFrag ?Sequence ?GenomeSummary ?PatEntry ?SearchID lasterror System variable lasterror Synopsis: Contains the last error message generated by Darwin during the current session See also: ?error ?traperror libname System variable libname Synopsis: The libname system variable stores the path of the Darwin library. It is set by the -l flag when executing Darwin from the command line. list Class list - list or array of arbitrary elements Template: [] [a] [a,...]
Fields: Name Type Description --------------------------------------------------------------------- a anything the ith element in the list .. a sublist of elements from the list Returns: list Methods: HTMLC list_type power Rand Row Table Synopsis: A list holds arbitrary values or structures. Elements in the list are left in the order the list was created. A list is also an array. A list of lists (of the same length) is a matrix. Elements of a list can be replaced with an assignment statement. Arithmetic operations work on lists (arrays) and lists of lists (matrices) according to the normal rules of linear algebra. (See examples) As an array, the list has no interpretation of column or row. It will act as column or row depending on the operation performed on it. When selecting with an integer range, negative values are interpreted as counting from the right. I.e. -2..-1 select the last two elements of the list. Examples: > a := [b,1,2,2]; a := [b, 1, 2, 2] > a[1]; b > a[1..2]; [b, 1] > a[-1..-1]; [2] > a[-2..-1]; [2, 2] > a[3] := 77; a[3] := 77 > a; [b, 1, 77, 2] > A := [[1,2],[3,0]]; A := [[1, 2], [3, 0]] > V := [-2,3]; V := [-2, 3] > A*V; [4, -6] > V*A; [7, -4] > 2*A; [[2, 4], [6, 0]] > A/3; [[0.3333, 0.6667], [1, 0]] > 7*V; [-14, 21] > V/5; [-0.4000, 0.6000] > V*V; 13 > B := 1/A; B := [[0, 0.3333], [0.5000, -0.1667]] > A*B; [[1, 0], [0, 1]] > V+[0,1]; [-2, 4] See also: ?append ?CreateArray ?matrix ?member ?mselect ?set ?subset matrix Class matrix - a matrix of elements Template: CreateArray(1..m,1..n) [[...], [...], ...] Returns: matrix Methods: inverse matrix_type print Rand Synopsis: A matrix in darwin is a list of lists where all the internal lists have the same length. A matrix can be created with CreateArray, explicitly as a list of lists, with append or iteratively. Algebra between matrix and scalars or between matrix and vectors follows the normal rules of Linear Algebra. A matrix multiplied by a vector on the right assumes the vector is a column vector. 
A matrix multiplied by a vector on the left assumes the vector is a row vector. Examples: > [[1,2],[2,3]]; [[1, 2], [2, 3]] > CreateArray(1..3,1..4,777); [[777, 777, 777, 777], [777, 777, 777, 777], [777, 777, 777, 777]] See Also: ?Cholesky ?Eigenvalues ?LinearProgramming ?SvdAnalysis ?convolve ?GaussElim ?LinearRegression ?SvdBestBasis ?CovarianceAnalysis ?GivensElim ?list ?transpose ?CreateArray ?Identity ?matrix_inverse set Class set - (mathematical) set of arbitrary elements Template: {} {a} {a,...} Returns: set Fields: Name Type --------------------------------------------------------- the ith element in the set the ith element from the right .. an expseq of elements from the set Methods: power Rand set_type Synopsis: A set holds a set of arbitrary values or structures. Elements in the set are ordered according to a unique order, and repeated elements are removed. Elements of a set (when the user is sure where they are located), can be replaced with an assignment statement. When selecting with an integer range, negative values are interpreted as counting from the right. I.e. -2..-1 select the last two elements of the set. The sorting of sets is very efficient, so if order is desired, placing the information in sets may be more efficient. 
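The range rule described in the synopsis can be modelled outside Darwin. The following is a minimal Python sketch, not Darwin code: `select_range` is a hypothetical helper that maps a 1-based inclusive range, in which negative bounds count from the right, onto a Python slice. It assumes the elements are already held in the order Darwin reports for the set {b,1,2,[d,e]} used in this entry's examples.

```python
def select_range(items, lo, hi):
    """Return items[lo..hi] using 1-based, inclusive bounds.

    Negative bounds count from the right, so -1 is the last element.
    This models the rule described in the text; it is not Darwin's
    own implementation.
    """
    n = len(items)
    # Translate each bound to a 1-based position counted from the left.
    start = lo if lo > 0 else n + lo + 1
    stop = hi if hi > 0 else n + hi + 1
    if not (1 <= start <= stop <= n):
        raise IndexError("range out of bounds")
    return items[start - 1:stop]


# The example set, written as a Python list in the order Darwin reports it.
a = [1, 2, "b", ["d", "e"]]
print(select_range(a, 1, 2))    # the first two elements
print(select_range(a, -2, -1))  # the last two elements
```

The sketch returns a Python list, whereas Darwin returns an expression sequence when a range is selected from a set; only the index arithmetic is being illustrated.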
Examples: > a := {b,1,2,[d,e]}; a := {1,2,b,[d, e]} > a[1]; 1 > a[1..2]; 1, 2 > a[-1..-1]; [d, e] > a[-2..-1]; b, [d, e] > a[3] := 77; a[3] := 77 > a; {1,2,77,[d, e]} See Also: ?append ?list ?minus ?sort ?union ?intersect ?member ?mselect ?subset amino acidspeptides Amino acids, ordinal numbers, one letter codes, 3 letter codes, molecular weight and name 1 A Ala 89.079 Alanine 2 R Arg 174.188 Arginine 3 N Asn 132.104 Asparagine 4 D Asp 133.089 Aspartic acid 5 C Cys 121.144 Cysteine 6 Q Gln 146.131 Glutamine 7 E Glu 147.116 Glutamic acid 8 G Gly 75.052 Glycine 9 H His 155.142 Histidine 10 I Ile 131.160 Isoleucine 11 L Leu 131.160 Leucine 12 K Lys 146.174 Lysine 13 M Met 149.198 Methionine 14 F Phe 165.177 Phenylalanine 15 P Pro 115.117 Proline 16 S Ser 105.078 Serine 17 T Thr 119.105 Threonine 18 W Trp 204.213 Tryptophan 19 Y Tyr 181.170 Tyrosine 20 V Val 117.113 Valine See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?CodonToInt ?IntToBase ?aminoacids ?BBBToInt ?CIntToInt ?GeneticCode ?IntToBBB ?AminoToInt ?BToInt ?CodonCode ?IntToA ?IntToCInt ?AToCInt ?CIntToA ?CodonToA ?IntToAAA ?IntToCodon ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToAmino ?AToInt ?CIntToAmino ?CodonToInt ?IntToB basesnucleotides DNA/RNA bases, ordinal numbers, one letter codes, 3 letter codes and name 1 A Ade Adenine 2 C Cyt Cytosine 3 G Gua Guanine 4 T Thy Thymine 5 U Ura Uracil See Also: ?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB ?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt ?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase conversiontranslation Amino acid and genetic code conversion functions Amino acid translation functions ------------------------------------------------------------------------------------------------------ | To | | From 1-letter AA 3-letter AA full name AA AA indx 1-20 3-letter cod cod indx 1-64 | 
|----------------------------------------------------------------------------------------------------| | 1-letter AA --- AToInt AToCodon | | 3-letter AA --- AAAToInt | | full name AA --- AminoToInt | | AA indx 1-20 IntToA IntToAAA IntToAmino --- IntToCodon | | 3-letter cod CodonToA CodonToInt --- CodonToCInt | | cod indx 1-64 CIntToA CIntToAAA CIntToAmino CIntToInt CIntToCodon --- | ------------------------------------------------------------------------------------------------------ See Also: ?AAAToInt ?BaseToInt ?CIntToCodon ?CodonToInt ?IntToBase ?aminoacids ?BBBToInt ?CIntToInt ?GeneticCode ?IntToBBB ?AminoToInt ?BToInt ?CodonCode ?IntToA ?IntToCInt ?AToCInt ?CIntToA ?CodonToA ?IntToAAA ?IntToCodon ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToAmino ?AToInt ?CIntToAmino ?CodonToInt ?IntToB CodonCode Codon, Codon number, one letter aa code, integer aa representation AAA 1 K 12 AAC 2 N 3 AAG 3 K 12 AAT 4 N 3 ACA 5 T 17 ACC 6 T 17 ACG 7 T 17 ACT 8 T 17 AGA 9 R 2 AGC 10 S 16 AGG 11 R 2 AGT 12 S 16 ATA 13 I 10 ATC 14 I 10 ATG 15 M 13 ATT 16 I 10 CAA 17 Q 6 CAC 18 H 9 CAG 19 Q 6 CAT 20 H 9 CCA 21 P 15 CCC 22 P 15 CCG 23 P 15 CCT 24 P 15 CGA 25 R 2 CGC 26 R 2 CGG 27 R 2 CGT 28 R 2 CTA 29 L 11 CTC 30 L 11 CTG 31 L 11 CTT 32 L 11 GAA 33 E 7 GAC 34 D 4 GAG 35 E 7 GAT 36 D 4 GCA 37 A 1 GCC 38 A 1 GCG 39 A 1 GCT 40 A 1 GGA 41 G 8 GGC 42 G 8 GGG 43 G 8 GGT 44 G 8 GTA 45 V 20 GTC 46 V 20 GTG 47 V 20 GTT 48 V 20 TAA 49 $ 22 TAC 50 Y 19 TAG 51 $ 22 TAT 52 Y 19 TCA 53 S 16 TCC 54 S 16 TCG 55 S 16 TCT 56 S 16 TGA 57 $ 22 TGC 58 C 5 TGG 59 W 18 TGT 60 C 5 TTA 61 L 11 TTC 62 F 14 TTG 63 L 11 TTT 64 F 14 See Also: ?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB ?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt ?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase genetic code GGG G Gly AGG R Arg CGG R Arg UGG W Trp GGA G Gly AGA R Arg CGA R Arg UGA Stop GGC G Gly AGC S Ser CGC 
R Arg UGC C Cys GGU G Gly AGU S Ser CGU R Arg UGU C Cys GAG E Glu AAG K Lys CAG Q Gln UAG Stop GAA E Glu AAA K Lys CAA Q Gln UAA Stop GAC D Asp AAC N Asn CAC H His UAC Y Tyr GAU D Asp AAU N Asn CAU H His UAU Y Tyr GCG A Ala ACG T Thr CCG P Pro UCG S Ser GCA A Ala ACA T Thr CCA P Pro UCA S Ser GCC A Ala ACC T Thr CCC P Pro UCC S Ser GCU A Ala ACU T Thr CCU P Pro UCU S Ser GUG V Val AUG M Met CUG L Leu UUG L Leu GUA V Val AUA I Ile CUA L Leu UUA L Leu GUC V Val AUC I Ile CUC L Leu UUC F Phe GUU V Val AUU I Ile CUU L Leu UUU F Phe See Also: ?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB ?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt ?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase enzymes enzyme digestor digester For SearchMassDb the following enzymes are recognized (courtesy of Amos Bairoch): Enzyme name cuts between except for ########### ############ ########## Armillaria Xaa-Cys,Xaa-Lys ArmillariaMellea Xaa-Lys BNPS_NCS Trp-Xaa Chymotrypsin Trp-Xaa,Phe-Xaa,Tyr-Xaa, Trp-Pro,Phe-Pro,Tyr-Pro, Met-Xaa,Leu-Xaa, Met-Pro,Leu-Pro Clostripain Arg-Xaa CNBr_Cys Met-Xaa,Xaa-Cys CNBr Met-Xaa AspN Xaa-Asp LysC Lys-Xaa Hydroxylamine Asn-Gly MildAcidHydrolysis Asp-Pro NBS_long Trp-Xaa,Tyr-Xaa,His-Xaa NBS_short Trp-Xaa,Tyr-Xaa NTCB Xaa-Cys PancreaticElastase Ala-Xaa,Gly-Xaa,Ser-Xaa,Val-Xaa PapayaProteinaseIV Gly-Xaa PostProline Pro-Xaa Pro-Pro Thermolysin Xaa-Leu,Xaa-Ile,Xaa-Met, Xaa-Phe,Xaa-Trp,Xaa-Val TrypsinArgBlocked Lys-Xaa Lys-Pro TrypsinCysModified Arg-Xaa,Lys-Xaa,Cys-Xaa Arg-Pro,Lys-Pro,Cys-Pro TrypsinLysBlocked Arg-Xaa Arg-Pro Trypsin Arg-Xaa,Lys-Xaa Lys-Pro V8AmmoniumAcetate Glu-Xaa Glu-Pro V8PhosphateBuffer Asp-Xaa,Glu-Xaa Asp-Pro,Glu-Pro The following are double digestors (both acting simultaneously) CNBrTrypsin Met-Xaa Arg-Xaa,Lys-Xaa Lys-Pro CNBrAspN Met-Xaa Xaa-Asp CNBrLysC Met-Xaa Lys-Xaa CNBrV8AmmoniumAcetate Met-Xaa Glu-Xaa Glu-Pro 
CNBrV8PhosphateBuffer Met-Xaa Asp-Xaa,Glu-Xaa Asp-Pro,Glu-Pro Comments: CNBr_Cys - its chemistry is not well defined so modifications of other amino acids may occur. NBS_long NBS_short NTCB BNPS_NCS - these four digesters produce unpredictable chemical modifications of other residues which will adversely affect the search. Hydroxylamine MildAcidHydrolysis - both of these produce at most one or two fragments per protein and are therefore not useful for searching. Chymotrypsin PancreaticElastase Thermolysin - are not as specific (nor do they go to completion) as would be desired. PapayaProteinaseIV PostProline - these enzymes can only cleave small proteins, and hence are not of great practical use. CNBr - instead of methionine being left at the C-terminal, a homoserine (101.1054) or homoserine lactone (83.092) is produced. TrypsinCysModified - all the cysteines are transformed into aminoethyl-cysteine (146.2133). input/output input output io i/o Input/output is done in Darwin through function calls. The open commands cause no immediate input/output; they are expected to be followed by read or write commands. The open commands accept the name 'terminal', meaning the standard interactive input and output (stdin/stdout) of Darwin.
File input/output dprint - print a general expression so that it can be read back lprint - print a general expression OpenAppending - all future output will be appended to file OpenPipe - all future ReadRawLine commands will read from pipe OpenReading - all future ReadRawLine commands will read from file OpenWriting - all future output will go to file print - pretty print expressions printf - print according to format ReadRawFile - read an entire file as a string ReadRawLine - read a line as a string Darwin commands input/output OpenPipe - all future ReadLine commands will read from pipe OpenReading - all future ReadLine commands will read from file ReadLibrary - read a file/function from the darwin library ReadLine - reads a darwin command in a single line ReadOffsetLine - reads a darwin command from a file (offseted) ReadProgram - read an entire file of darwin commands WriteTree Databases input/output Protein/Nucleotide ReadDb ReadBrk ReadDomain ReadDssp ReadFasta ReadMap ReadMsa ReadPima ReadPir WriteDomainDB WriteFasta Grid files AddGrid CloseGrid CompressGrid CreateGrid FlushGrid GetNextGrid MapGrid OpenGrid QueryGrid UncompressGrid NDBM StoreKey FindKey Plotting output plotoutput DrawGraph DrawHistogram DrawDistribution DrawDotplot SmoothData DrawStackedBar ViewPlot StartOverlayPlot StopOverlayPlot GetColorMap DrawTree DrawTCount DrawBisectTree DrawUnrootedTree DrawSimPam System commands CallSystem TimedCallSystem date time rtime selectorfunction A selector function is a function which allows the user/programmer to define the rules and names for selection. For a data structure D, (whether this is internal, as defined by type, or user defined), if D_select is assigned a function, then for every selector which is not a positive integer or an integer range, the function D_select will be called to do the selection (or assignment to a selected part). Example: Let Imaginary be a user-defined data structure with two parts, the real part and the imaginary part. 
A selector which implements the common names Re and Im can be written as follows: Imaginary_select := proc( c:Imaginary, sel, val:numeric ) if sel = Im then if nargs=3 then c[2] := val else c[2] fi elif sel = Re then if nargs=3 then c[1] := val else c[1] fi else error(sel,'is an invalid selector for an Imaginary number') fi end: a := Imaginary(1.0,-1.0); a[Im]; a[Re] := 0; a; Here we assume that the definition of Imaginary is Imaginary := proc( realpart:numeric, imagpart:numeric ) ... end: Notice that when the selector function is called with two arguments, it is an indication that a value is to be selected. When it is called with 3 arguments, it is an indication that an assignment should be made. printlevel debugginginformation printlevel - controlling the amount of printed output from Darwin. Normally, the result of every statement executed at the top level is printed. This printing is controlled by a global variable named printlevel. By default this variable is assigned 1. At this level, expressions or assignments at the top level and nested one level will be printed. E.g. This is an assignment at the top level, it is printed. > a := 1; a := 1 This is an expression nested one level, it is printed. > if a=1 then 7 fi; 7 This is an assignment nested two levels deep, nothing is printed. > for i to 2 do if a=1 then c := 1 fi od; By increasing printlevel by 1, the printing will happen one level deeper. E.g. > printlevel := 2; printlevel := 2 > for i to 2 do if a=1 then c := 1 fi od; c := 1 c := 1 By increasing printlevel to 5, the execution of any function called at the top level will be printed. This becomes a very valuable tool for debugging and inspecting Darwin functions. E.g. > f := x -> x+1; f := x -> x+1 > printlevel := 5; printlevel := 5 > f(1); {--> enter f, args = 1 2 <-- exit f = 2} 2 By increasing printlevel to 10 the statements in a nested function call will be displayed. E.g.
> g := x->f(x): > printlevel := 10: > g(1); {--> enter g, args = 1 {--> enter f, args = 1 2 <-- exit f = 2} 2 <-- exit g = 2} 2 Some additional printing is also controlled by printlevel. If printlevel is higher than 2, then in case of an error, a complete traceback is printed with all the local variables, parameters and their values. Many functions use printlevel to print additional information about the problem they are solving. Users are encouraged to use printlevel for this purpose. In this case, a value of 1 should not print anything, and values greater than 4 are not recommended, since the user will be forced to see the trace of top level function calls. E.g. my_function := proc( x ) . . . . if printlevel > 2 then printf( 'hyperbolic cut method used\n' ) fi; . . . . end: Notice that if you want to modify printlevel inside a function, you should declare it in the global list, else by default it becomes a local variable. See also: ?debug ?trace profileprofiling callgraphcalltreescallingtree Profiling - Measuring how efficiently a program is executing. Darwin provides tools for profiling the execution of a program and then analyzing these results. Profiling is done at the level of Darwin functions; kernel functions normally cannot be profiled. The procedure is as follows. The program or session to be profiled is run with the addition of the option profile. This option is set by the command Set(profile); Darwin will then produce additional output, which is sent to the standard output, consisting of one short line for every entry to and exit from a Darwin function. This information can be analyzed by three external programs: profile (which provides a basic profile per function), callgraph (which provides a basic profile per caller-callee function pair) and calltrees (which analyzes the most resource-consuming complete call trees).
Let's assume that we want to profile the following LongInteger computation: > LLL( [[1,0,LongInteger(1000000000)], > [0,1,LongInteger(3141592654)]] ); [[LongInteger([-355]), LongInteger([113]), LongInteger([-30098])], [LongInteger([-104348]), LongInteger([33215]), LongInteger([2610])]] When this is run with option profile, the first few lines are: > Set(profile): > LLL( [[1,0,LongInteger(1000000000)], > [0,1,LongInteger(3141592654)]] ); ->LongInteger 21,81182,200 ->LongInteger_normal 50,81204,200 <-LongInteger_normal 50,81291,200 <-LongInteger 21,81296,200 . . . . . . The output contains the name of the function call, the recursion level, the number of words allocated and the number of clock ticks. All profilers report the time consumed and the storage requested. Ordering of the output is done based on time*space^2, a reasonable way of scoring the composite time/space resources. If this output is stored in a file, it can be later analyzed with the programs profile, callgraph and calltrees. In a Unix system, the output from Darwin can be piped into these programs directly, and the need for intermediate files, which may be very large, is avoided. The first few lines of the result of profile are: 12 different functions, using 1.017 secs and 107K words name #calls cpu words ==== ====== === ===== Main_Routine 1 0.667 ( 65.6%) 81244 ( 76.1%) LongInteger_times 78 0.100 ( 9.8%) 6426 ( 6.0%) LongInteger_normal 133 0.033 ( 3.3%) 6759 ( 6.3%) LongInteger 132 0.067 ( 6.6%) 3700 ( 3.5%) LongInteger_iquo 10 0.033 ( 3.3%) 4013 ( 3.8%) . . . . Each row gives the function name, the number of calls, the CPU time in seconds and the words allocated, the last two each followed by their percentage of the total.
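The time*space^2 ordering can be checked by hand against the rows of the profile listing. The following is a minimal Python sketch, not part of Darwin or of the profile program: the function name `score` is hypothetical, and it assumes that the "words" column is the space term and the "cpu" column is the time term.

```python
# Rows copied from the profile listing: (name, cpu seconds, words allocated).
# They are deliberately given out of order so that the sort does the work.
rows = [
    ("LongInteger", 0.067, 3700),
    ("Main_Routine", 0.667, 81244),
    ("LongInteger_iquo", 0.033, 4013),
    ("LongInteger_times", 0.100, 6426),
    ("LongInteger_normal", 0.033, 6759),
]


def score(cpu, words):
    """Composite cost: time multiplied by the square of the space."""
    return cpu * words ** 2


# Sort by decreasing composite cost.
ranked = sorted(rows, key=lambda r: score(r[1], r[2]), reverse=True)
for name, cpu, words in ranked:
    print(f"{name:20s} {score(cpu, words):.3e}")
```

Under these assumptions the ranking reproduces the order of the listing, which explains why LongInteger_normal (0.033 secs, 6759 words) is printed above LongInteger (0.067 secs, 3700 words) even though it used less time.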
The same output analyzed with callgraph produces: used by Callee Caller Callee #calls time words ====== ====== ====== ==== ===== #++ calls Main_Routine 1, 1.017, 106762 #++ Main_Routine calls LLL 1, 0.350, 25290 #++ LLL calls LongInteger_times 63, 0.133, 8750 #++ LongInteger_times calls LongInteger 64, 0.067, 7276 #++ LLL calls LongInteger_power 12, 0.067, 5523 #++ LLL calls LongInteger_iquo 10, 0.050, 5029 . . . . . In this case, for example, the function LLL calls the function LongInteger_times 63 times, and it and all its descendants consume 0.133 secs and allocate 8750 words. Main_Routine stands for all the commands executed at the top level. is a fictitious level, above the top level, used to be able to report the entire session. Finally the output of calltrees is: Calling sequence used by Callee time Kwords LLL uses 0.35 25.29 LLL calls LongInteger_iquo uses 0.02 0.65 . . . . LLL calls LongInteger_power calls LongInteger_times uses 0.02 0.49 . . . In this case, the complete calling trees and their resource consumption are shown in decreasing order of resources. This information is useful when a significant amount of resources is consumed in a single calling path. The user has additional control, and can identify any block of code in order to profile it and incorporate it into the rest of the profiling computations. This is done with the commands EnterProfile and ExitProfile.
For example, the following function identifies a for-loop as BigLoop: f := proc(n) s1 := sum(1/i,i=1..n); EnterProfile(BigLoop); s2 := 0; for i to n do s2 := s2+1/i od; ExitProfile(BigLoop); [s1,s2,s1-s2] end: Set(profile): f(10^5); which when profiled with callgraph produces: used by Callee Caller Callee #calls time words ====== ====== ====== ==== ===== #++ calls Main_Routine 1, 4.017, 467923 #++ Main_Routine calls f 1, 3.600, 400022 #++ f calls BigLoop 1, 2.983, 399996 #++ BigLoop calls gc 1, 0.067, 0 See Also: ?EnterProfile ?ExitProfile ?printlevel ?debug commandcommandline libnameexecuterundarwin Darwin is a program which can be executed interactively or in batch. In all cases it will read commands, execute them and write the output results. In Unix or Linux, the command name is darwin. Besides redirection, the command accepts the following options: darwin [-q] [-s] [-U] [-l lib_dir] [-i input_file] [-o output_file] -q - quiet option. Do not echo input statements (in batch mode), and do not print garbage collection messages or the final resources used message. This option can be changed with the Set command. -s - server option. Work as a server; this means that the system attempts to be immune to hostile programs that may be executing. It will not execute system commands, read or write files (except for reading from the library), use grid files, use tcp commands, use CallExternal or open pipes. This option can be set with the Set command. -B - batch option. Work in batch mode; this means the system will exit when it encounters end of input or CTRL-C. This option can be changed with the Set command. -E - errorexit option. In this mode the system will exit with a nonzero status when it encounters an untrapped error. This option can be changed with the Set command. -U - Unbuffered option. The standard output, when redirected to a file, is not buffered. So any output will be stored in the file immediately.
This is very useful for debugging to see the very last actions of the system in case of a crash. -l - Use the given directory as root for the darwin library. This value defaults to "lib". The global variable "libname" is set to this value. Darwin will always use the value in "libname" to load library functions. -S - Use the given file as initialization script instead of /darwinit. -i - Use the given file as standard input. This replaces the standard redirection available in Unix. -o - Use the given file as standard output. This replaces the standard redirection available in Unix. returntype return returntyping returnvalue Procedure declarations allow the definition of return-type after the parameter declarations. This definition is done with the -> operator and terminated by a colon or a semi-colon E. g. : my_function := proc () -> numeric; 42.1; end; If the function is written incorrectly i.e. it returns a string when a numeric is declared as the return type it will give an error E. g. my_function := proc () -> numeric; 'hello'; end; > my_function(); 'my_function should return numeric, returned: hello Error, (in my_function) invalid return value' Programs can test for the return type, by selecting the 5th component of a procedure body. For example, the expression op(5,op(my_function)); entered at the command prompt will return "numeric" when my_function is defined as above. Return types are used by the Inherit function to determine what data type should be returned by methods that are inherited from another class. See also: ?proc ?option ?Inherit hydrophobicity Fauchere FreeEnergy ChouFasman AtomicVolume Function hydrophobicity - define various measures of hydrophobicity and atomic volume Calling Sequence: No call needed as soon as library hydrophobicity is available Parameters: Returns: NULL Synopsis: This function assigns the global variables FauchereHydrophobicity, FreeEnergyHydrophobicity, ChouFasman and AtomicVolume. 
Each of those variables is assigned a vector of length 20. Each element of these vectors contains the value of the chemical property that the variable name refers to. Indexing of the amino acids is done according to AAAToInt. The following values are used as chemical properties: Amino acid Fauchere Free Energy Chou Fasman Atomic Volume ------------------------------------------------------------- Arg -1.01 19.92 1.04 225 Lys -0.99 -9.52 0.98 171 Asp -0.77 -10.95 1.20 125 Glu -0.64 -10.20 0.86 155 Asn -0.60 -9.68 1.35 135 Gln -0.22 -9.38 0.86 161 Ser -.004 -5.06 1.32 99 Gly 0.00 2.39 1.50 66 His 0.13 -10.27 1.06 167 Thr 0.26 -4.88 1.07 122 Ala 0.31 1.94 0.70 92 Pro 0.72 0.00 1.59 129 Tyr 0.96 -6.11 1.06 203 Val 1.20 1.99 0.62 142 Met 1.23 -1.48 0.58 171 Cys 1.54 -1.24 1.18 106 Leu 1.70 2.28 0.68 168 Phe 1.79 -0.76 0.71 203 Ile 1.80 2.15 0.66 169 Trp 2.25 -5.88 0.75 240 Examples: See also: AAAP The darwin 1.6 function AAAP has been renamed to AAAToInt in Darwin v2.2. ACS The darwin 1.6 function ACS has been renamed to AC in Darwin v2.2. AP The darwin 1.6 function AP has been renamed to AToInt in Darwin v2.2. AaCount The darwin 1.6 function AaCount has been renamed to GetAaCount in Darwin v2.2. AaFrequency The darwin 1.6 function AaFrequency has been renamed to GetAaFrequency in Darwin v2.2. AddGF The darwin 1.6 function AddGF has been renamed to AddGrid in Darwin v2.2. AlignGaps The darwin 1.6 function AlignGaps has been renamed to AdjustGaps in Darwin v2.2. AlignedIntrons The darwin 1.6 function AlignedIntrons has been renamed to GetIntrons in Darwin v2.2. AlignedPeptide The darwin 1.6 function AlignedPeptide has been renamed to GetPeptides in Darwin v2.2. AllMatches The darwin 1.6 function AllMatches has been renamed to GetAllMatches in Darwin v2.2. AminoP The darwin 1.6 function AminoP has been renamed to AminoToInt in Darwin v2.2. ApprTextSearch The darwin 1.6 function ApprTextSearch has been renamed to ApproxSearchString in Darwin v2.2.
BBBC The darwin 1.6 function BBBC has been renamed to CodonToCInt BBBP The darwin 1.6 function BBBP has been renamed to BBBToInt in Darwin v2.2. BP The darwin 1.6 function BP has been renamed to BToInt in Darwin v2.2. BackDynProgr The darwin 1.6 function BackDynProgr is obsolete, use Align. BaseP The darwin 1.6 function BaseP has been renamed to BaseToInt in Darwin v2.2. BestPamShake The darwin 1.6 function BestPamShake is obsolete, use Align. BestStringMatch The darwin 1.6 function BestStringMatch has been renamed to SearchString in Darwin v2.2. BisectTree The darwin 1.6 function BisectTree has been renamed to DrawBisectTree in Darwin v2.2. CBBB The darwin 1.6 function CBBB has been renamed to CIntToCodon CleanMSA The darwin 1.6 function CleanMSA has been renamed to RemoveGaps in Darwin v2.2. CT_Species The darwin 1.6 function CT_Species has been renamed to AddSpecies in Darwin v2.2. CloseGF The darwin 1.6 function CloseGF has been renamed to CloseGrid in Darwin v2.2. ColorMap The darwin 1.6 function ColorMap has been renamed to GetColorMap in Darwin v2.2. ColorTree The darwin 1.6 function ColorTree has been renamed to CreateColoredTree in Darwin v2.2. CompressGF The darwin 1.6 function CompressGF has been renamed to CompressGrid in Darwin v2.2. ConvertSP The darwin 1.6 function ConvertSP has been renamed to SpToDarwin in Darwin v2.2. ConvertToDF The darwin 1.6 function ConvertToDF has been renamed to DbToDarwin in Darwin v2.2. CreateGF The darwin 1.6 function CreateGF has been renamed to CreateGrid in Darwin v2.2. DF The darwin 1.6 function DF has been renamed to DB in Darwin v2.2. DMDMS The darwin 1.6 function DMDMS has been renamed to CreateDayMatrices in Darwin v2.2. DnaFile The darwin 1.6 function DnaFile has been renamed to database in Darwin v2.2. DNAPepDayhoffM The darwin 1.6 function DNAPepDayhoffM has been renamed to ApproxDnaDayMatrix in Darwin v2.2. Dayhoff The darwin 1.6 function Dayhoff has been renamed to CreateOrigDayMatrix in Darwin v2.2. 
DayhoffM The darwin 1.6 function DayhoffM has been renamed to CreateDayMatrix in Darwin v2.2. DelFixed The darwin 1.6 function DelFixed has been renamed to FixedDel in Darwin v2.2. DelIncr The darwin 1.6 function DelIncr has been renamed to IncDel in Darwin v2.2. DigestSeqs The darwin 1.6 function DigestSeqs has been renamed to DigestSeq in Darwin v2.2. Distribution The darwin 1.6 function Distribution has been renamed to DrawDistribution in Darwin v2.2. DotPlot The darwin 1.6 function DotPlot has been renamed to DrawDotplot in Darwin v2.2. DynProgr The darwin 1.6 function DynProgr has been renamed to DynProgScore in Darwin v2.2. ERROR The darwin 1.6 function ERROR has been renamed to error in Darwin v2.2. EndOverlayPlot The darwin 1.6 function EndOverlayPlot has been renamed to StopOverlayPlot in Darwin v2.2. Entries The darwin 1.6 function Entries has been renamed to Entry in Darwin v2.2. Entropy The darwin 1.6 function Entropy has been renamed to FindEntropy in Darwin v2.2. EntryInfo The darwin 1.6 function EntryInfo has been renamed to GetEntryInfo in Darwin v2.2. EntryNumber The darwin 1.6 function EntryNumber has been renamed to GetEntryNumber in Darwin v2.2. EqradTree The darwin 1.6 function EqradTree has been renamed to DrawBisectTree in Darwin v2.2. ExponFit The darwin 1.6 function ExponFit has been renamed to ExpFit in Darwin v2.2. ExponFit2 The darwin 1.6 function ExponFit2 has been renamed to ExpFit2 in Darwin v2.2. ExtCallFrame The darwin 1.6 function ExtCallFrame has been renamed to CreateCProgram in Darwin v2.2. FlushGF The darwin 1.6 function FlushGF has been renamed to FlushGrid in Darwin v2.2. FragSearch The darwin 1.6 function FragSearch has been renamed to SearchFrag in Darwin v2.2. GetBetween The darwin 1.6 function GetBetween has been renamed to GetLcaSubtree in Darwin v2.2. GetIndex The darwin 1.6 function GetIndex has been renamed to FindTreeFitIndex in Darwin v2.2.
GetLabels The darwin 1.6 function GetLabels has been renamed to GetTreeLabels in Darwin v2.2.
GetTreeLength The darwin 1.6 function GetTreeLength has been renamed to TotalTreeWeight in Darwin v2.2.
GetPath The darwin 1.6 function GetPath has been renamed to GetPathDistance in Darwin v2.2.
Histogram The darwin 1.6 function Histogram has been renamed to DrawHistogram in Darwin v2.2.
IDS The darwin 1.6 function IDS has been renamed to ID in Darwin v2.2.
IPCconnect The darwin 1.6 function IPCconnect has been renamed to ConnectTcp in Darwin v2.2.
IPCdisconnect The darwin 1.6 function IPCdisconnect has been renamed to DisconnectTcp in Darwin v2.2.
IPCread The darwin 1.6 function IPCread has been renamed to ReadTcp in Darwin v2.2.
IPCreceive The darwin 1.6 function IPCreceive has been renamed to ReceiveTcp in Darwin v2.2.
IPCsend The darwin 1.6 function IPCsend has been renamed to SendTcp in Darwin v2.2.
IPCreceiveDATA The darwin 1.6 function IPCreceiveDATA has been renamed to ReceiveDataTcp in Darwin v2.2.
IPCsendDATA The darwin 1.6 function IPCsendDATA has been renamed to SendDataTcp in Darwin v2.2.
LabelTree The darwin 1.6 function LabelTree has been renamed to ChangeLeafLabels in Darwin v2.2.
LarsonTree The darwin 1.6 function LarsonTree has been renamed to DrawUnrootedTree in Darwin v2.2.
LinRegr The darwin 1.6 function LinRegr has been renamed to LinearRegression in Darwin v2.2.
LoadFile The darwin 1.6 function LoadFile has been renamed to ReadDb in Darwin v2.2.
LongestRep The darwin 1.6 function LongestRep has been renamed to FindLongestRep in Darwin v2.2.
MultiAlign The darwin 2.2 function MultiAlign has been renamed to MAlignment in Darwin v3.0.
MachineUsage The darwin 1.6 function MachineUsage has been renamed to GetMachineUsage in Darwin v2.2.
MapGF The darwin 1.6 function MapGF has been renamed to MapGrid in Darwin v2.2.
MassDyn The darwin 1.6 function MassDyn has been renamed to DynProgMass in Darwin v2.2.
MassDynAll The darwin 1.6 function MassDynAll has been renamed to DynProgMassDb in Darwin v2.2.
MassProfile The darwin 1.6 function MassProfile has been renamed to SearchMassDb in Darwin v2.2.
Maximize The darwin 1.6 function Maximize has been renamed to MaximizeFunc in Darwin v2.2.
MinSqTree The darwin 1.6 function MinSqTree has been renamed to MinSquareTree in Darwin v2.2.
Minimize The darwin 1.6 function Minimize has been renamed to MinimizeFunc in Darwin v2.2.
Minimize2D The darwin 1.6 function Minimize2D has been renamed to Minimize2DFunc in Darwin v2.2.
Minimizex The darwin 1.6 function Minimizex has been renamed to DisconMinimize in Darwin v2.2.
MolWeight The darwin 1.6 function MolWeight has been renamed to GetMolWeight in Darwin v2.2.
MostFrequent The darwin 1.6 function MostFrequent has been renamed to GetMostFrequentGrams in Darwin v2.2.
MoveGap The darwin 1.6 function MoveGap has been renamed to MoveGap in Darwin v2.2.
MultAlign The darwin 1.6 function MultAlign has been renamed to CreateMultiAlign in Darwin v2.2.
NewArray The darwin 1.6 function NewArray has been renamed to CreateArray in Darwin v2.2.
NewString The darwin 1.6 function NewString has been renamed to CreateString in Darwin v2.2.
NextGF The darwin 1.6 function NextGF has been renamed to GetNextGrid in Darwin v2.2.
NPAlignMatch The darwin 1.6 function NPAlignMatch has been renamed to AlignNucPepMatch in Darwin v2.2.
NPAllMatches The darwin 1.6 function NPAllMatches has been renamed to GetAllNucPepMatches in Darwin v2.2.
NPBackDynProgr The darwin 1.6 function NPBackDynProgr has been renamed to NucPepBackDynProg in Darwin v2.2.
NPBestPamMatch The darwin 1.6 function NPBestPamMatch has been renamed to FindNucPepPam in Darwin v2.2.
NPBestPamShake The darwin 1.6 function NPBestPamShake has been renamed to LocalNucPepAlignBestPam in Darwin v2.2.
NPDynProgr The darwin 1.6 function NPDynProgr has been renamed to NucPepDynProg in Darwin v2.2.
NPMatch The darwin 1.6 function NPMatch has been renamed to NucPepMatch in Darwin v2.2.
NPMultiAllMatches The darwin 1.6 function NPMultiAllMatches has been renamed to ParallelAllNucPepMatches in Darwin v2.2.
NPOneAllMatch The darwin 1.6 function NPOneAllMatch has been renamed to AlignNucPepAll in Darwin v2.2.
NPRefine The darwin 1.6 function NPRefine has been renamed to GlobalNucPepAlign in Darwin v2.2.
NPRefineShake The darwin 1.6 function NPRefineShake has been renamed to LocalNucPepAlign in Darwin v2.2.
NPRegions The darwin 1.6 function NPRegions has been renamed to NucPepRegions in Darwin v2.2.
NPSprintMatch The darwin 1.6 function NPSprintMatch has been renamed to DynProgNucPepString in Darwin v2.2.
Offsets The darwin 1.6 function Offsets has been renamed to Offset in Darwin v2.2.
OneAllMatch The darwin 1.6 function OneAllMatch has been renamed to AlignOneAll in Darwin v2.2.
OpenGF The darwin 1.6 function OpenGF has been renamed to OpenGrid in Darwin v2.2.
OrderedSearch The darwin 1.6 function OrderedSearch has been renamed to SearchOrderedArray in Darwin v2.2.
PA The darwin 1.6 function PA has been renamed to IntToA in Darwin v2.2.
PAAA The darwin 1.6 function PAAA has been renamed to IntToAAA in Darwin v2.2.
PAmino The darwin 1.6 function PAmino has been renamed to IntToAmino in Darwin v2.2.
PB The darwin 1.6 function PB has been renamed to IntToN in Darwin v2.2.
PBBB The darwin 1.6 function PBBB has been renamed to IntToNuc in Darwin v2.2.
PBase The darwin 1.6 function PBase has been renamed to IntToNucleic in Darwin v2.2.
PItoPam The darwin 1.6 function PItoPam has been renamed to PerIdentToPam in Darwin v2.2.
PamtoPI The darwin 1.6 function PamtoPI has been renamed to PamToPerIdent in Darwin v2.2.
ParExec The darwin 1.6 function ParExec has been renamed to ParExecute in Darwin v2.2.
ParExec2 The darwin 1.6 function ParExec2 has been renamed to ParExecuteIPC in Darwin v2.2.
ParTest The darwin 1.6 function ParTest has been renamed to ParExecuteTest in Darwin v2.2.
PatEntries The darwin 1.6 function PatEntries has been renamed to PatEntry in Darwin v2.2.
PepPepSearch The darwin 1.6 function PepPepSearch is obsolete, use FragSearch.
SearchPepAll The darwin 1.6 function SearchPepAll is obsolete, use FragSearch.
PhyloTree The darwin 1.6 function PhyloTree has been renamed to PhylogeneticTree in Darwin v2.2.
PickTree The darwin 1.6 function PickTree has been renamed to FindLabeledSubtree in Darwin v2.2.
PlotPam The darwin 1.6 function PlotPam has been renamed to DrawSimPam in Darwin v2.2.
PlotOptions The darwin 1.6 function PlotOptions has been renamed to Plot in Darwin v2.2.
PosInfo The darwin 1.6 function PosInfo has been renamed to GetPosition in Darwin v2.2.
PositionDF The darwin 1.6 function PositionDF has been renamed to GetOffset in Darwin v2.2.
PrintSeqsInTree The darwin 1.6 function PrintSeqsInTree has been renamed to PrintTreeSeq in Darwin v2.2.
ProbDynProgr The darwin 1.6 function ProbDynProgr has been renamed to ProbDynProg in Darwin v2.2.
ProfileEnter The darwin 1.6 function ProfileEnter has been renamed to EnterProfile in Darwin v2.2.
ProfileExit The darwin 1.6 function ProfileExit has been renamed to ExitProfile in Darwin v2.2.
QueryAll The darwin 1.6 function QueryAll has been renamed to AllQueryGrid in Darwin v2.2.
QueryGF The darwin 1.6 function QueryGF has been renamed to QueryGrid in Darwin v2.2.
RETURN The darwin 1.6 function RETURN has been renamed to return in Darwin v2.2.
RandTree The darwin 1.6 function RandTree has been renamed to CreateRandMultAlign in Darwin v2.2.
RandomPermut The darwin 1.6 function RandomPermut has been renamed to CreateRandPermutation in Darwin v2.2.
RandomSeq The darwin 1.6 function RandomSeq has been renamed to CreateRandSeq in Darwin v2.2.
RandomTrees The darwin 1.6 function RandomTrees has been renamed to CreateRandTrees in Darwin v2.2.
Refine The darwin 1.6 function Refine is obsolete, use Align.
RefineLog The darwin 1.6 function RefineLog is not implemented in Darwin v3.0.
RefineShake The darwin 1.6 function RefineShake is obsolete, use Align.
SameTree The darwin 1.6 function SameTree has been renamed to IdenticalTrees in Darwin v2.2.
Scale The darwin 1.6 function Scale has been renamed to DayMatrixScale in Darwin v2.2.
SearchDF The darwin 1.6 function SearchDF has been renamed to SearchDb in Darwin v2.2.
SearchText The darwin 1.6 function SearchText has been renamed to CaseSearchString in Darwin v2.2.
Sequences The darwin 1.6 function Sequences has been renamed to Sequence in Darwin v2.2.
ShortestPath The darwin 1.6 function ShortestPath has been renamed to ConShortestPath in Darwin v2.2.
ShortestPath2 The darwin 1.6 function ShortestPath2 has been renamed to ShortestPath in Darwin v2.2.
Smooth The darwin 1.6 function Smooth has been renamed to SmoothData in Darwin v2.2.
SplatTree The darwin 1.6 function SplatTree has been renamed to DrawUnrootedTree in Darwin v2.2.
SplatTree The darwin 2.1 function DrawSplatTree has been renamed to DrawUnrootedTree in Darwin v2.2.
SprintMatch The darwin 1.6 function SprintMatch has been renamed to DynProgStrings in Darwin v2.2.
Ssystem The darwin 1.6 function Ssystem has been renamed to TimedCallSystem in Darwin v2.2.
StackedBar The darwin 1.6 function StackedBar has been renamed to DrawStackedBar in Darwin v2.2.
Stats The darwin 1.6 function Stats has been renamed to Stat in Darwin v2.2.
Strings The darwin 1.6 function Strings has been renamed to string in Darwin v2.2.
SummarizeTree The darwin 1.6 function SummarizeTree has been renamed to CollapseNodes in Darwin v2.2.
TSP The darwin 1.6 function TSP has been renamed to ComputeTSP in Darwin v2.2.
TSP3 The darwin 1.6 function TSP3 has been renamed to ComputeCubicTSP in Darwin v2.2.
TSP4 The darwin 1.6 function TSP4 has been renamed to ComputeQuadraticTSP in Darwin v2.2.
TreeOrder The darwin 1.6 function TreeOrder has been renamed to FindCircularOrder in Darwin v2.2.
TrulyRandom The darwin 1.6 function TrulyRandom has been renamed to SetRandSeed in Darwin v2.2.
UUUP The darwin 1.6 function UUUP has been renamed to CodonToInt in Darwin v2.2.
UnCompressGF The darwin 1.6 function UnCompressGF has been renamed to UncompressGrid in Darwin v2.2.
UnLabelTree The darwin 1.6 function UnLabelTree has been renamed to UnlabelLeaves in Darwin v2.2.
UnionStats The darwin 1.6 function UnionStats has been renamed to UnionStat in Darwin v2.2.
Violations The darwin 1.6 function Violations has been renamed to FindSpeciesViolations in Darwin v2.2.
WriteMSA The darwin 1.6 function WriteMSA has been renamed to WriteMsa in Darwin v2.2.
appendto The darwin 1.6 function appendto has been renamed to AppendFile in Darwin v2.2.
clearw The darwin 1.6 function clearw has been renamed to ClearStat in Darwin v2.2.
currentOfs The darwin 1.6 function currentOfs has been renamed to CurrentOff in Darwin v2.2.
dpuTime The darwin 1.6 function dpuTime has been renamed to DpuTime in Darwin v2.2.
eigenvalues The darwin 1.6 function eigenvalues has been renamed to Eigenvalues in Darwin v2.2.
externcall The darwin 1.6 function externcall has been renamed to CallExternal in Darwin v2.2.
findkey The darwin 1.6 function findkey has been renamed to FindKey in Darwin v2.2.
function The darwin 1.6 function function has been renamed to noeval in Darwin v2.2.
gausselim The darwin 1.6 function gausselim has been renamed to GaussElim in Darwin v2.2.
gcm The darwin 1.6 function gcm has been renamed to CodonToA in Darwin v2.2.
kGramRegion The darwin 1.6 function kGramRegion has been renamed to GramRegion in Darwin v2.2.
kGramRegionScore The darwin 1.6 function kGramRegionScore has been renamed to GetGramRegionScore in Darwin v2.2.
kGramSite The darwin 1.6 function kGramSite has been renamed to GramSite in Darwin v2.2.
kGramSiteScore The darwin 1.6 function kGramSiteScore has been renamed to GetGramSiteScore in Darwin v2.2.
load The darwin 1.6 function load has been renamed to ReadLibrary in Darwin v2.2.
numeric The darwin 1.6 function numeric has been renamed to real in Darwin v2.2.
plot The darwin 1.6 function plot has been renamed to DrawPlot in Darwin v2.2.
read The darwin 1.6 function read has been renamed to ReadProgram in Darwin v2.2.
readBRK The darwin 1.6 function readBRK has been renamed to ReadBrk in Darwin v2.2.
readDSSP The darwin 1.6 function readDSSP has been renamed to ReadDssp in Darwin v2.2.
readfile The darwin 1.6 function readfile has been renamed to ReadRawFile in Darwin v2.2.
readpipelines The darwin 1.6 function readpipelines has been renamed to OpenPipe in Darwin v2.2.
readstat The darwin 1.6 function readstat has been renamed to ReadLine in Darwin v2.2.
readstatAt The darwin 1.6 function readstatAt has been renamed to ReadOffsetLine in Darwin v2.2.
searchtext The darwin 1.6 function searchtext has been renamed to SearchString in Darwin v2.2.
specfunc The darwin 1.6 function specfunc has been renamed to specuneval in Darwin v2.2.
srand The darwin 1.6 function srand has been renamed to SetRand in Darwin v2.2.
system The darwin 1.6 function system has been renamed to CallSystem in Darwin v2.2.
text The darwin 1.6 function text has been renamed to string in Darwin v2.2.
update The darwin 1.6 function update has been renamed to UpdateStat in Darwin v2.2.
writeto The darwin 1.6 function writeto has been renamed to WriteFile in Darwin v2.2.
Predict The darwin 1.6 function Predict has been renamed to PredictSecStruct in Darwin v2.2.
NDF The darwin 1.6 function NDF has been renamed to NucDB in Darwin v2.2.
Simil The darwin 1.6 function Simil has been renamed to Sim in Darwin v2.2.
Text The darwin 1.6 function Text has been renamed to string in Darwin v2.2.
PDF The darwin 1.6 function PDF has been renamed to PepDB in Darwin v2.2.
MaxSimil The darwin 1.6 function MaxSimil has been renamed to MaxSim in Darwin v2.2.
MinSimil The darwin 1.6 function MinSimil has been renamed to MinSim in Darwin v2.2.
GetPam The darwin 1.6 function GetPam has been removed in Darwin v4.0, use Align.
WriteFile The darwin 2.0 function WriteFile has been renamed to OpenWriting in Darwin v2.2.
AppendFile The darwin 2.0 function AppendFile has been renamed to OpenAppending in Darwin v2.2.
SearchPepDF The darwin 1.6 function SearchPepDF has been renamed to SearchSeqDb in Darwin v2.2.
Scramble The darwin 1.6 function Scramble has been renamed to Shuffle in Darwin v2.2.
AToGenCode The darwin function AToGenCode has been renamed to AToCodon.
IntToGenCode IntToGenCode has been renamed to IntToCodon.
NucToCode NucToCode has been renamed to CodonToCInt.
CodeToNuc CodeToNuc has been renamed to CIntToCodon.
CodonToAAA CodonToAAA has been renamed to CIntToAAA.
CodonToAmino CodonToAmino has been renamed to CIntToAmino.
GenCode GenCode has been renamed to CodonToA.
NToInt NToInt has been renamed to BToInt.
NucToInt NucToInt has been renamed to BBBToInt.
NucleicToInt NucleicToInt has been renamed to BaseToInt.
GenCodeToInt GenCodeToInt has been renamed to CodonToInt.

classes, class, structures, data structures

Data structures in Darwin are represented by a name followed by the fields in parentheses. For example:

    Complex( 1.0, 2.0 )

Syntactically, data structures are identical to function calls, where the function name is the data structure name and the arguments of the call are the fields of the structure.

A data structure may have its name defined as a procedure. In this case, the procedure is normally used to check the validity of the arguments, to simplify the structure if needed and/or to put it in normal form.
For example:

    Complex := proc( realpart:numeric, imagpart:numeric )
      if imagpart=0 then realpart
      else noeval(Complex(args)) fi
    end:

The noeval around the returned value is needed to avoid an infinite recursion on the name of the data structure; we do not want this final structure to be evaluated again, since it has already been checked.

To construct a data structure, the functional syntax is used. To select a component, selection by an integer will always return the corresponding field. Particular data structures may define special named selectors. These are handled by the function StructureName_select.

The following are the data structures currently implemented in Darwin. Use ?xxx to find the particulars of the structure xxx.

  AlSumm          Fold           MSAMethod          Residue
  Alignment       Gap            MSAStatistics      SparsePFA
  AllAllResult    GapHeuristic   Machine            Stat
  Block           GapMatch       MySqlResult        TaxonomyEntry
  Chain           Gene           NucPepMatch        TestStatResult
  CoalescentNode  GenomeSummary  OrthologousGroup   Tree
  CodonMatrix     GramRegion     PartialOrder       TreeConstruction
  Covariance      GramSite       Partitions         TreeResult
  DataMatrix      Graph          Polar              TreeStatistics
  Dependency      History        ProbabilisticFA    UnionFind
  Description     IntronModel    Process
  Edge            LeafNode       RecombinationNode
  EvolTree        MAlignment     Region

See also: ?select

darwin, Darwin

Darwin (Data Analysis and Retrieval With Indexed Nucleotide/peptide sequences) is an interactive system for doing Bioinformatics, in particular sequence matching and sequence analysis. It is currently being developed at the E.T.H. in Zurich by the Computational Biochemistry Research Group. The development of the system and its use to solve real problems go in parallel; the more capabilities the system has, the more complicated the problems we can solve, which means more theory and more algorithms we want to implement.

Darwin resembles the Maple symbolic algebra system (Maple Reference Manual, Char et al., Fifth edition, 1988) more than just superficially. Darwin works in "calculator mode".
This means that Darwin will wait for the user to type in a command, execute the given command, print the answer (if any) and wait for more input from the user, repeating the above. Darwin indicates that it is waiting for input by printing a ">" character at the beginning of a line and waiting with the cursor positioned in that same line. A command to Darwin is called a "statement" and is always terminated by a semicolon (;) and a carriage return (typically the key labelled "return" or "enter"). Note that until Darwin reads a semicolon and a carriage return, it will not consider its input complete and will not do anything with it.

proc, procedure, functions, parameters

A procedure or function in Darwin is defined with the syntactic construct "proc" ... "end". Functions (returning one or more values), procedures (functions not returning any value) and object constructors (functions which return a data structure) are all defined by the same construct. A "proc" is the main way of defining procedures, but it is also possible to generate procedures with the arrow notation ("->") and with high-level functions like Inherit (see ?OO). Procs are also the main vehicle to define classes or data structures.

A procedure has the following components:

    proc( param1:type1, param2:type2, ... ) -> ReturnType;
      local var1, var2, .... ;
      global gvar1, .... ;
      option opt1, .... ;
      description '....';
      . . .
      . . .
    end:

The formal parameters, enclosed in parentheses right after the "proc" token, define the arguments which may be passed to the procedure. The actual number of arguments passed to the procedure in a call may differ from the number specified in the proc. The following rules apply:

(1) The formal parameters have an optional type specification.

(2) All parameters which have a type are type-checked if they are present when the procedure is called. If their type does not match, a suitable error is produced.
(3) Parameters which are not present are not type-checked. If a parameter that was not passed is used, a suitable error is produced.

(4) Parameters are passed by value/reference. However, if the value of a parameter is just a name, the procedure may further evaluate this name or assign values to it. Data structures or lists can be modified, and if they are passed as parameters and modified, they will remain modified for the caller.

(5) If additional parameters are passed, these parameters are neither checked nor accessible by name. They can be accessed with "args" (all the parameters) or with "args[i]".

(6) Passing more or fewer parameters does not by itself cause an error. An error occurs only when a missing parameter is used. The number of parameters actually passed is available in the body of the procedure under the name "nargs".

(7) When defining a class, the parameter names (and their types) become the field names (and field types) of the class.

(8) Optional parameters are defined in a slightly different way and are separated from the rest by a semicolon ";". For a full description of optional parameters see ?OptionalParameters.

The type following the arrow ("->") is optional and indicates the type of result that should be returned. If a type is specified, the procedure will check that the returned value is of this type. If the type does not match, a suitable error is produced. This makes it possible to write procedures which are completely type-safe. If the procedure returns an expression sequence, type checking is not possible.

To pretty-print an entire procedure, i.e. print all the statements reformatted according to darwin's standard indentation rules, use

    print( disassemble(xxx) )

where xxx is a procedure. If xxx is the name of a procedure, then

    print( disassemble(op(xxx)) )

should be used.
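The argument-handling rules above can also be modelled outside Darwin. The following is a minimal Python sketch, not Darwin code: the names bind_arguments, lookup and DarwinArgumentError are hypothetical and exist only for this illustration, Python classes stand in for Darwin types, and only rules (2), (3), (5) and (6) are covered.

```python
class DarwinArgumentError(Exception):
    """Raised by this model where Darwin would report an error (hypothetical name)."""


def bind_arguments(formals, actuals):
    """Model of binding actual arguments to formal parameters.

    formals: list of (name, python_type_or_None) pairs, in declaration order.
    actuals: list of values passed in the call.
    Returns a dict holding the bound names plus 'args' and 'nargs'.
    """
    bound = {}
    # Rule (2): typed parameters that are present are type-checked.
    for (name, expected), value in zip(formals, actuals):
        if expected is not None and not isinstance(value, expected):
            raise DarwinArgumentError(
                f"argument {name!r} should be of type {expected.__name__}")
        bound[name] = value
    # Rules (3), (5), (6): missing parameters stay unbound and extra ones are
    # reachable only through args; neither case is an error at call time.
    return {"bound": bound, "args": list(actuals), "nargs": len(actuals)}


def lookup(frame, name):
    """Rules (3)/(6): using a parameter that was not passed is an error."""
    if name not in frame["bound"]:
        raise DarwinArgumentError(f"parameter {name!r} was not passed")
    return frame["bound"][name]


formals = [("a", int), ("b", str)]
short_call = bind_arguments(formals, [3])           # fewer arguments: accepted
long_call = bind_arguments(formals, [3, "x", 2.5])  # extra argument: accepted
print(short_call["nargs"], long_call["nargs"], long_call["args"][2])  # 1 3 2.5
```

In this model, lookup(short_call, "b") raises an error, mirroring the rule that a missing parameter only causes trouble once it is used.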
See also: ?local ?global ?OO ?OptionalParameters

optional, optional parameters, defaults

The Optional Parameters mechanism allows a flexible, uniform, self-documenting and efficient way of passing optional parameters to functions, procedures or constructors. The syntax is as follows: the optional parameters are separated from the rest of the parameters by a semicolon.

    SomeProc := proc( parm1:type1, ... ; opt1, opt2, ... ) .... end:

The parameters defined before the semicolon are the regular parameters, and their behaviour is as usual, except that when the procedure is invoked, all the regular parameters must be present. I.e. in the example below, f has to be called with at least one parameter (a set). The parameters defined after the semicolon are the optional parameters. Two examples are given below:

    f := proc( a:set ; b:posint, (c=''):string ) ... end:
    g := proc( ; 'mode'=(m:string), d:anything ) ... end:

The definition of an optional parameter is as follows (when ambiguous, "actual parameter" stands for the use of a parameter in a function call, "formal parameter" stands for the definition of a parameter in the proc statement):

(1) Each optional parameter definition is a type definition.

(2) The definition or exactly one of the subexpressions in each definition must be a "colon" expression, e.g. b:posint in f, m:string in g. For type-matching purposes, a colon expression matches the type defined on its right part. E.g. b:posint matches a posint.

(3) The left part of a colon expression establishes the name of the variable that will hold (part of) the parameter. It has two possible formats: name:type or (name=value):type

(4) The name specified in the left part of a colon expression is the name of a local variable inside the function/procedure that will hold the value of what matches the right part of the colon expression. E.g. f({5},ACGT, 7) will result in the local variable b assigned 7 and the local variable c assigned ACGT.
(5) If the left part of the colon expression is of the form name=value, then if no parameters match the optional parameter, the given name will be assigned "value". This is the preferred mechanism to define default values for unspecified parameters. E.g. f({0},3) will result in b assigned 3 and c assigned '', the empty string.

(6) On calling a function/procedure with optional parameters, each actual optional parameter is paired with the first formal parameter that matches its type. The actual parameters are matched from left to right. E.g. g(mode=exact) will assign "exact" to the local variable m; g([1]) will assign [1] to the local variable d.

(7) Once a formal parameter has been matched with an actual parameter, its associated name is assigned, and this formal parameter cannot be paired with any other actual parameter. E.g. f({1},2,3) will assign 2 to b, and then will give an error, since 3 cannot be matched against any formal parameter (not yet matched). Notice that the number of actual parameters cannot be larger than the number of formal parameters when optional parameters are used.

(8) Once all the actual parameters are paired, any remaining formal parameters which are not yet paired and have a colon expression of the form (name=value):type will have their corresponding local variables assigned their default values. E.g. f({3}) will leave b unassigned and assign '' to c.

The following are some worked examples relating to some known functions or common situations:

(I) Align := proc( s1:string, s2:string ; (dm=DM):{Dayhoff,list(Dayhoff)}, ... )

The function Align always requires two sequences which are strings. Those will be required on each use and will be s1 and s2. The first optional argument is a Dayhoff matrix or a list of Dayhoff matrices. If none is supplied, the function will have dm assigned the variable DM (which is normally assigned to the default Dayhoff matrix).

(II) Align := proc( ..., (Method='Local'):{'Local','Global','CFE','Shake'} ...
The next optional argument defines the method to be used. The method can be given as a name/string. Only 4 strings are valid methods, and if none is provided, Method is assigned 'Local'. This also resolves the problem of incorrectly specifying more than one method: once the formal parameter is matched, it cannot be matched again.

(III) SomeClass := proc( ...., Comment:string )

Having an optional parameter at the end of the parameter list, so that it catches any leftover string, is a good way to allow optional informational data like comments.

(IV) Entry := proc( e:posint ; (db=DB):database )

For system-wide variables like DB, the default database, which are used with their default values 99% of the time, this definition provides added flexibility: nothing is required when the default is used, and if a database is passed as an argument, it will be used correctly (without any extra work inside the function).

(V) DrawTree := proc( ..., 'LengthFormat' = ((lf='%d'):{string,procedure})

DrawTree is a function which has a great many options, most of which have practical defaults. In this case, the format for displaying branch lengths is by default an integer. It could be some other printf format (which is a string) or some procedure which produces the display. The internal variable lf is assigned the right information; all error checking and defaults are handled automatically. No disassembling of the parameter is needed.

Finally, here is a more formal definition of the steps followed in the evaluation of parameters in the presence of optional parameters:

(i) The regular parameters (which must all be present) are assigned and type-checked if they have a type definition.

(ii) Each actual optional parameter (from left to right) is matched against the unmatched formal parameters (from left to right). An unmatched actual optional parameter gives an error.
(iii) The unmatched formal parameters that have a default value definition are evaluated and the corresponding local variables assigned. This evaluation is done with access to all regular parameters and optional parameters already assigned.

(iv) For further clarity, the types are never evaluated; they are as given in the proc statement. Only the "value" part of a (name=value):type is evaluated in full.

local, lexical scope, lexically scoped, access rules, temporary variables, lexical, scoping

The "local" statement in a procedure body defines the variables which are local to the procedure, that is, variables that exist only for each invocation of the procedure. The variables will not be assigned any value initially, nor will they retain any value after the procedure ends its execution. Recursive invocations of a procedure will have their own set of local variables, distinct for every invocation.

Normally it is not necessary to define any local variables, as any variable which is assigned in the body of the procedure (either explicitly or implicitly in a for-loop) will automatically be made local. To enforce that an assigned variable be global, it must be declared in the global statement (see ?global).

Local variables are accessible to any procedure which is defined inside the body of the procedure. This is normally called "lexically scoped variables". The following example clarifies the access rules for variables.

    outer := proc( a:numeric )
      local x;
      x := a+w;
      inner := proc( b:numeric ) y := x+b+z end:
      x+inner(7)
    end:

The above code defines a procedure called outer. This procedure has one formal parameter, "a". It also declares a local variable "x", but this declaration is redundant, as x is assigned inside outer and would automatically be made local. "inner" is also a local variable of "outer", as it is assigned a value inside it. "inner" is a procedure which takes one argument, "b". "inner" will have a local variable, "y", which is assigned inside its body.
The assignment inside "inner" illustrates all the types of access to variables: y is local, b is a parameter. Parameters and locals have the highest binding strength, that is, they dominate over other forms of reference. "x" is external to "inner" but local to "outer", the procedure in which "inner" is defined. So the "x" in "x+b+z" refers to the local "x" in outer. "x" is called a lexically scoped variable. It has the second binding strength. Finally, both "w" and "z" are neither parameters nor locals of any of the functions and hence are global.

See also: ?global, ?proc, ?OO

abbreviation, acronyms, syllables, hyphenation

English dictionaries, e.g. the Oxford English Dictionary (OED), provide the legal hyphenation pattern for a word, e.g. ap . prox . i . mate, usually in bold face. This does not necessarily correspond to the syllables of the word (these are typically given in the international pronunciation). We will use the syllables of a word to create abbreviations for names which are too long in Darwin. The convention is as follows:

When names are abbreviated in Darwin, we use the first syllable of a word according to the OED. If this abbreviation is either (1) too short for uniqueness, (2) unaesthetic or (3) extremely unpronounceable, the second syllable of the word is added. Subsequent syllables are added until problems (1) - (3) disappear.

There is a small number of computer and biological abbreviations common to both literatures. These abbreviations do not follow the above principle but may be used throughout the system, and the onus lies on the user to identify their meanings. In general, this list should be kept as small as possible. There is a delicate balance between the advantages of having short names in the system and the disadvantages of having too many abbreviations.
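The syllable convention described above amounts to a short loop. The following Python sketch is only an illustration under stated assumptions: the function name abbreviate and its parameters are hypothetical, the syllable split must be supplied by the caller, and the subjective criteria (2) and (3) are delegated to a caller-provided predicate rather than decided by the code.

```python
def abbreviate(syllables, taken, acceptable=lambda candidate: True):
    """Return the shortest run of leading syllables that is unique and acceptable.

    syllables:  the word split into syllables (supplied by the caller).
    taken:      abbreviations already in use; models problem (1), uniqueness.
    acceptable: caller-supplied judgement standing in for problems (2) and (3).
    """
    candidate = ""
    for syllable in syllables:
        candidate += syllable  # add one more syllable
        if candidate not in taken and acceptable(candidate):
            return candidate
    # Every syllable was needed: fall back to the full word.
    return candidate


# Sample input only: this split reuses the hyphenation pattern quoted in the
# text and is not claimed to be the OED syllabification.
SAMPLE = ["ap", "prox", "i", "mate"]

print(abbreviate(SAMPLE, set()))                          # ap
print(abbreviate(SAMPLE, {"ap"}))                         # approx
print(abbreviate(SAMPLE, set(), lambda c: len(c) >= 3))   # approx
```

The second call shows a clash with an existing abbreviation forcing a second syllable; the third shows a predicate (here a minimum length, chosen arbitrarily) doing the same.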
Abbreviations from Computer Science:

  abbreviation  description
  DB            database
  eval          evaluate
  int           Integer
  IPC           Inter-process communication
  LS            Least Squares
  Svd           Singular value decomposition
  TSP           Travelling salesman problem
  UTC           Universal time coordinated (Greenwich time)

Abbreviations from Biology:

  abbreviation  description
  A             Amino acid (single letter code)
  AAA           Amino acid (3-letter code)
  AC            Accession number (used by SwissProt database)
  Amino         Amino acid (fully spelled)
  B             Base part of nucleotide (one letter code)
  Base          Base part of nucleotide (fully spelled)
  BBB           Base part of nucleotide (3-letter code)
  CInt          an integer between 1 and 64 identifying a codon (3 bases)
  Codon         3 bases in a single string, e.g. "ACT"
  DM            Dayhoff matrix
  DNA           deoxyribonucleic acid (A, C, G or T)
  ID            Identification number (used by SwissProt database)
  MSA           Multiple Sequence Alignment
  NP            Nucleotide-peptide
  PAM           Point accepted mutations, a measure of distance
  Pep           Peptide (amino acid)
  RNA           ribonucleic acid (A, C, G or U)
  Sim           Similarity score
  tRNA          transfer-RNA, a molecule translating codons to peptides

ipcsend

ipcsend is a simple UNIX program, included in the darwin distribution package, that sends darwinipc messages directly to the darwinipc daemon. It is useful for testing the darwinipc daemon. For the complete set of messages that can be used with ipcsend see the darwinipc help file (?darwinipc).

Usage is as follows:

    ipcsend [timeout] message

  timeout   (optional) time in seconds to wait for a response from the daemon. Default is 3 seconds.
  message   Message to be sent. If this message contains any characters interpreted by the shell or a sequence of blanks, quote it.

Examples:

    > CallSystem('ipcsend MSTAT mendel;');
    DATA mendel 0:OK BUSY:
    > CallSystem('ipcsend PING;');
    PING OK

See Also: ConnectTcp darwinipc DisconnectTcp ParExecuteIPC ReceiveTcp SendTcp

darwinipc, IPC, interprocess communication

darwinipc is the interprocess communication program that is distributed with darwin.
It enables two or more darwin processes (on the same or different machines) to communicate with each other via TCP/IP. The darwinipc daemon establishes TCP/IP connections between machines and UNIX internal protocol connections to local processes which want to communicate via the daemon. It also starts and controls remote jobs. This daemon runs once on each machine.

For TCP/IP communication, darwinipc uses the port number defined by the Darwin entry in /etc/services. If no such port is defined, it will use the fixed port 12345. The /etc/services file on all machines you want to use should therefore contain the following line:

  darwin 12345/tcp Darwin DARWIN #Darwin IPC

Whenever a connection to the daemon is established, the correct password must be sent as the first data. The password is read from the file named by the IPC_PW environment variable, or from $HOME/.ipcpw if IPC_PW is not defined, or from .ipcpw if $HOME is undefined. Make sure this password file cannot be read by unauthorized users!

As of March 2003, darwinipc uses ssh instead of rsh (via a system call) to start the darwinipc daemon on remote machines. For this to work properly, ssh must be configured to run without asking for a password. This can be accomplished with the .ssh/known_hosts file. Please read the ssh documentation to configure it properly.

Usage is as follows:

  darwinipc [-l] [-L] [-U] [-u user] [-t timeout] port

The command accepts the following options:

  port        - The UNIX internal protocol port name (/tmp/.darwinipc).
  -l          - Causes activity log to be written to stdout (default: no log)
  -L          - Causes activity log to include data received and sent.
  -U          - Unbuffered option. The standard output, when redirected to a file, is not buffered (useful for debugging).
  -u user     - Adds user to the list of users that do not affect login control (default: all users affect login control)
  -t timeout  - Sets the time between login and machine load checks to timeout seconds (default: 10)

The darwinipc daemon understands the following set of commands:

Message:  EXIT
Purpose:  Exit the daemon.
Replies:  Nothing or ERROR message.
Example:  EXIT

Message:  JOBS mach
Purpose:  Returns the jobs controlled by mach and their status
Replies:  DATA mach 0:JOBS {pid RUNNING|STOPPED} or
          DATA mach 0:ERROR message or
          ERROR message
Example:  JOBS mendel, returns DATA mendel 0:JOBS 8281 RUNNING

Message:  LOADC mach low high
Purpose:  Sets load thresholds for mach to low and high (defaults are 0.7 and 2.0)
Replies:  nothing or DATA mach 0:ERROR message or ERROR message.
Example:  LOADC mendel 0.7 2.0

Message:  LOGINC mach ON|OFF
Purpose:  Turn login control for mach on or off (turned off by default).
Replies:  nothing or DATA mach 0:ERROR message or ERROR message
Example:  LOGINC mendel ON

Message:  MAXJB mach max
Purpose:  Sets maximum number of RUN jobs for mach to max (defaults to 1).
Replies:  nothing or DATA mach 0:ERROR message or ERROR message
Example:  MAXJB mendel 2

Message:  MSTAT mach
Purpose:  Returns status of mach
Replies:  DATA mach 0:OK ALIVE (machine is alive and maximum number of RUN jobs is not reached) or
          DATA mach 0:OK BUSY (machine is alive and further RUN jobs will be rejected or stopped immediately) or
          DATA mach 0:OK DOWN or
          DATA mach 0:OK STARTED or
          ERROR message
Example:  MSTAT mendel, returns DATA mendel 0:OK ALIVE

Message:  OFFHR mach start end
Purpose:  Sets off hours for mach to be from start to end
Replies:  nothing or DATA mach 0:ERROR message or ERROR message
Example:  OFFHR mendel 8 18

Message:  PING
Purpose:  Check whether daemon is running on machine from which command is issued
Replies:  PING OK or nothing if daemon is not running
Example:  PING, returns PING OK

Message:  PSTAT mach pid
Purpose:  Returns status of process pid on mach
Replies:  DATA mach 0:OK STOPPED or
          DATA mach 0:OK RUNNING or
          DATA mach 0:OK NONE or
          ERROR message.
Example:  PSTAT mendel 8281, returns DATA mendel 0:OK STOPPED

Message:  REXIT mach
Purpose:  Exit the remote daemon on mach.
Replies:  Nothing
Example:  REXIT mendel

Message:  RSH mach cmd
Purpose:  Run command cmd on mach (same as background rsh, but much faster). cmd is interpreted by csh.
Replies:  Nothing or DATA mach 0:ERROR message or ERROR message.
Example:  RSH mendel kill -STOP 8281

Message:  RUN mach cmd
Purpose:  Run controlled command cmd on mach (expects process group and id being sent back by cmd). cmd is interpreted by csh.
Replies:  DATA mach 0:OK pid or DATA mach 0:ERROR message or ERROR message.
Example:  RUN mendel darwin -q outfile, returns DATA mendel 0:OK 8281

Message:  SEND mach pid:data
Purpose:  Send data packet to PID on mach. mach will receive DATA srcmach srcpid:data, with srcmach and srcpid being the machine and pid of the sending process.
Replies:  nothing or DATA mach 0:ERROR message or ERROR message.
Example:  SEND mendel 8281:[20.3,14.8] sent by pid 1365 on vinci.
PID 8281 on mendel will receive the message DATA vinci 1365:[20.3,14.8]

See Also: ConnectTcp DisconnectTcp ipcsend ParExecuteIPC ReceiveTcp SendTcp

naming name conventions function names name rules

The following is a short document sketching the Darwin naming convention. We can group the different Darwin constructs into five categories:

  built-in types
  structured types
  commands
  built-in functions, and
  library functions.

We give a short but reasonably precise set of rules for naming types, structures, routines etc. for each of these categories. This document is primarily meant for Darwin developers.

Built-in Types Rules
  1) All built-in types should have names consisting of only lower case letters.
  2) Only very common computer science names should be abbreviated (see ?abbreviations). For example "uneval" (short for "unevaluated").

Structured Types Rules
  1) Only the first letter of each word should be capitalised.
  2) Structured type names should be kept reasonably short. When abbreviations seem appropriate, they should follow the rule above.
  3) Abbreviations used in selector names should correspond to abbreviations used throughout the system. It makes no sense to use the abbreviation "Sim" for "Similarity" throughout the Darwin system and then require users to select with Simil on DayMatrix structures.
  4) When the structure is used in tandem with a routine, the name of the structure will coincide with the name of the function which constructs such a structure. (See ?OO for more details)
  5) Selector names should reflect the type of data they return. If they return a simple type, they should have names formatted according to the naming conventions for simple types. If they return structured types, they should have names formed according to these rules.

Commands and Built-in function Rules:
  1) We follow computer science history for naming conventions as closely as possible.
  2) We always use lower case letters. Thus, "error" and "return".
     Both of these functions act as commands in Darwin.
  3) Mathematical functions are named according to Abramowitz and Stegun conventions.
  4) We stay with the conventions of the language "C" when the function is sufficiently similar to the "C" routine (e.g. printf, sprintf, sscanf).
  5) We stay with the conventions of the language "Maple" when the function is sufficiently similar to a function from that language (or an exact copy).
  6) If the routine has a common application in another field (such as the "NBody" function does in physics), this name can be chosen. It would be preferable to give it the more abstract mathematical name when such a name exists.
  7) If none of the above cases apply, we use the conventions of "Library Functions" below.

Library Function Rule:
  1) The name should consist of at most five parts: <domain>_<adverb><verb><adjective><noun>
  2) The "verb" should reflect the action in a meaningful way, i.e. Draw, Load, Save, Print. For performing string searches, the verb should be Search. If we are aligning sequences, it should be Align. If we are creating a graphics file, it should be Draw. If a new object is returned or created, it should be Create.
  3) The "noun" will typically be the object if you were to say the sentence completely. It will typically be a type or structured type. If the routine works on a particular type, then this type should be placed in the name as the noun. For example, CreateString. A noun should be chosen that represents the generic object and is mathematical in nature, i.e. avoid choosing names which are cute but little known.
  4) The "adjective" should only be included when the name without it would not distinguish between the objectives of two or more routines. For example, DrawTree and DrawUnrootedTree.
  5) The first and only the first letter of each word should be capitalised unless it is part of an abbreviation common in the biology/biochemistry literature (see ?abbreviations).
  6) The "adverb" indicates a qualified action. For example, ApproxSearchString.
     (Approximate (abbreviated) is the adverb, Search is the verb, String is the object type.)
  7) The "domain" is a special identifier used to indicate that the routine that follows applies to a special type of object. For example, Inter-Process Communication abbreviated to IPC and Nucleotide-Peptide abbreviated to NucPep. For example, if we have a function to align sequences, GlobalAlignBestPam, which works for amino acids, the same function working for nucleotide-peptides will be called NucPep_GlobalAlignBestPam.
  8) Abbreviations should be avoided. When function names are too long, the adverb and adjective should be the first to be abbreviated. All abbreviations should follow the rule above.
  9) Underscore characters should be avoided except to separate the <domain> from the rest of the name. Of course, underscore characters are needed for polymorphism but this poses no problem with our conventions.
 10) Nouns should be singular.
 11) Nouns in their plural form will be used to define iterators. So Entry defines a database entry, and Entries() is the iterator which goes over all the entries of the default database.
 12) Functions which perform "conversion" require a bit of extra attention. If the conversion is from a type to a type, and it is expected to be done (sometimes) automatically, then the name should be <source type>_<target type> (see ?OO for more details on converters). For example, PatEntry_string. For more general data converted into other data the name should be <source>To<target>. For example, IntToAmino.

option options builtin internal numeric polymorphic trace zippable

Procedure declarations allow the definition of options in their headers. This is done with the keyword "option". Options are simple identifiers, separated by commas and terminated with a semicolon. E.g.

  f := proc( x:numeric ) option internal; x+1 end:

Options can be added to procedures as desired. Programs can test for options by selecting the 3rd component of a procedure body.
For example, the expression op(3,op(f)) will return "internal" when f is defined as above. The system recognizes the following options:

  builtin
      This is interpreted as a function whose definition is in the kernel. The body of such a procedure should consist of a single integer. This integer is fixed and links directly to an internal function.

  internal
      This means that the function is not intended for use by general users. No help files will be generated for such functions.

  numeric
      Functions which are internal and are mathematical, in the sense of computing a numerical value, should have this option. A different, faster, evaluator is used to evaluate them.

  polymorphic
      This means that the given function is polymorphic. When an unknown data structure is passed as an argument, then the corresponding function will be called. For example:

        f := proc( x ) option polymorphic; ....

      Later calls like f( ABC(...) ) will attempt to execute ABC_f, if this name is defined as a procedure. Or f( 1..2 ) will attempt to execute range_f if this name is defined as a procedure. See ?polymorphic or ?OO for more details on Object Oriented programming in Darwin. When a data type/structure is intended to be also a converter, e.g. Complex, then the definition of the constructor should also carry option polymorphic.

  trace
      The corresponding function will have its current printing level turned high enough so that the execution of all its statements is printed out.

  zippable
      This means that if the function would not compute when given an array or a matrix, then before issuing an error an attempt will be made to compute it as zip(f(x)).

  NoIndexing
      This option is for a class or data structure. It means that integer indexing will be disallowed both for assigning and for selecting fields. Ranges of integers are also disallowed. Setting this option enforces all access to the class through the names of the fields or through the xxx_select function.
      If this option is set and a xxx_select function is defined, then any integer or range indexing will be passed to the xxx_select function. Since accessing fields by position usually prevents polymorphism, this option will help enforce object orientation.

  NormalizeOnAssign
      This option is for a class or data structure. It will force a normalization of the object every time that any of its components is assigned. This is normally done when there are many constraints between the components of a class, and it is not possible to check a field assignment without checking the entire object.

selector selectors selection fields indexing

Selectors are the most common and efficient way to select components of a structure in Darwin. The syntax for selectors is the same as the one for indexing arrays. That is, if s is a structure, s[x] is a selection or indexing of s. A selector may be used to return a part of an expression or to modify the corresponding component of the structure. In all cases, the behaviour of generalized selectors is identical to the behaviour of array selection (indexing).

The selector has several possible forms. Selectors can be integers, names, strings or arbitrary expressions. If the selectors are integers, or names (or strings) that coincide with the names of the parameters of the data structure, then they are handled by the kernel. Otherwise a function xxx_select will be invoked to resolve the selection/assignment.

  posint
      a positive integer i selects the ith element of the structure, array, list, set, range, etc. Over strings, selection will return the ith character.

  integer
      a negative integer i selects the ith element from the right of the structure, array, list, set, range, etc. Over strings, selection will return the ith character from the right. That is, a[-1] is equivalent to (and more efficient than) a[length(a)].

  range
      an expression of the form a..b, where a and b are integers.
      The selection will return the values from a to b as an expression sequence. If a or b are negative they are interpreted as counting from the right. So s[-2..-1] returns the last two components of s. This form cannot be used in an assignment. For example, if s is a structure with at least two elements, then s[1..2] will return an expression sequence with two elements (suitable for use in lists, sets and other structures). There is one exception to this rule: range selection of lists returns a list.

  name
      a name which coincides with a name of a parameter of the data structure definition selects that field. If a string is used instead of a name, it has the same effect. The use of strings is sometimes needed when the name has been used for a variable and hence has a value and cannot be used as a selector.

AC ac Alignment DayMatrix Identity Length1 Length2 Offset1 Offset2 PamDistance PamNumber PamVariance Score Seq1 Seq2 Sim modes Block DigestSeq CodonMatrix AAPam CodonPam Desc FixedDel IncDel Sim Color colcode Complex DigestSeq ConsistentGenome name Counter title value Covariance CorrMatrix CovMatrix Description Eigenvalues First MaxVariance Maximum Mean Minimum Number StorMatrix StorSum VarNames Variance DayMatrix DelCost Dimension FixedDel IncDel Mapping MaxOffDiag MaxSim MinSim PamDistance PamNumber Sim StopSimil logPAM1 pam type DocEl content content_i name tag Document content_i Edge From Label Node1 Node2 To FileStat path st_atime st_blksize st_blocks st_ctime st_dev st_gid st_ino st_mode st_mtime st_nlink st_rdev st_size st_uid Gap DigestSeq Gene AlignErrors Division Exons Introns NucEntry NucSequence PepLength PepOffset PepSequence mRNA GenomeSummary EntryLengths Epithet FileName FileNameOrig Genus Id Kingdom Lineage String TotAA TotChars TotEntries Type sgml_tag string type Graph Adjacencies AdjacencyMatrix Degrees Distances Incidences Labels a n ID id Intron div n pam IntronModel Acceptor
Donor InIntron MinLen Leaf Height Label LinearClassification HighestNeg LowestPos NumberNeg NumberPos WeightNeg WeightPos WeightedFalses X X0 LinearIntron F I minlen n pam LongInteger DigestSeq Machine Class DownCount ForcedRun LastProcess LoadRange LoginControl MaxProcesses Name NiceValue OffHours Processes StartCycle User MAlignment AlignedSeqs InputSeqs labels t MapleFormula expr Match MatchParams MySqlResult ColumnLabels Data OrthologousGroup AllAll Length Seqs Species Tree Paragraph content content_i indent PatEntry a Permutation p PlotArguments Axis Colors Grid GridFormat LabelFormat Lines Title TitlePts TitleX TitleY Polar DigestSeq ProbSeq CharMap ProbVec Process ElapsedTime EventTime Job JobTime Pid Stopped Stat Average CV Description Excess Max Maximum Mean MeanVar Min Minimum Number ShortForm Skewness StdErr VarVar VarVariance Variance SvdResult MinNorm2Err NData Norm2Err Norm2Indep SensitivityAnalysis SingularValuesDiscarded SingularValuesUsed SolutionVector table Table_Default Table_Values key TaxonomyEntry Children Common Name Lineage Lineagestring Other names Parent Scientific Name Species code Synonym id TestStatResult CountMatrix TestStat name plog pstd pvalue TextBlock blockname blocktype content content_i Tree Height Left Right TreeResult Other Tree Type UnionFind Clusters Col Elements ElmInd Sizes

Users and programmers can define their own selectors for a particular structure. See ?selector function for details.

See also: ?expseq, ?op, ?selector function

types Types

Names which can be used as arguments of the type function and, in general, as arguments when a type is required.
AC                Entry                 MapleFormula         Protein
AcceptCriteria    equal                 Match                RandomGeneratorPFA
algebraic         ETHMachine            matrix               range
Alignment         EvolTree              MatrixDecomposition  RecombinationNode
AllAllResult      expseq                MSAMethod            relation
AlSumm            float                 MSAStatistics        Residue
And               Fold                  MySqlResult          RNA
anything          Font                  name                 SectionHeader
ARG               ForLoop               ndimPoint            select
ARGNode           Gap                   negative             SeqThread
array             GapHeuristic          Nodes                set
Assign            GapMatch              nonnegative          Size
Block             Gene                  Not                  SparsePFA
Bold              GenomeSummary         NucPepMatch          Stat
boolean           GramRegion            numeric              StatSeq
catenate          GramSite              Or                   Stop
Center            Graph                 OrthologousGroup     string
Chain             HelpEntry             Pair                 structure
CoalescentNode    History               Paragraph            SvdResult
Code              HyperLink             Param                symbol
CodonMatrix       IfStat                Parsimonious         Table
Complex           Indent                PartialOrder         table
compressed        integer               PartialOrderMSA      Target
constant          IntronModel           Partitions           TaxonomyEntry
Copyright         Island                PatEntry             TestStatResult
Counter           IT                    Permutation          TextBlock
Covariance        LastUpdatedBy         PlotArguments        times
database          Leaf                  plus                 Tree
DataMatrix        LeafNode              PlusMin              TreeConstruction
DayMatrix         less                  Polar                TreeResult
Dependency        lessequal             posint               TreeStatistics
Description       LinearClassification  positive             Triple
DigestionWeights  List                  PostscriptFigure     TT
DNA               list                  power                type
DocEl             Local                 ProbabilisticFA      unequal
Document          LongInteger           ProbSeq              UnionFind
Edge              Machine               procedure            VectorDB
Edges             MAlignment            Process

In addition to the above names, the following are also valid types:

type                        description
--------------------------------------------------------------------------------
a numeric value             matches a numeric with the same value
a string or symbol          matches a string/symbol with the same characters
a list of types             matches a list with the same length and corresponding types.
                            Ditto for relational operators ( = <> < <= >= > ), ranges, ands, ors, nots, concatenation and selected names
{typ1,typ2,...}             matches if any of the types in the set are matched
identical(xxx)              matches xxx exactly
anyfunc(typ1,..,typn)       matches any structure which has n arguments and each argument matches the given subtype
structure(argtype,sname)    matches a structure named sname with each argument matching argtype. sname can be a set of names or can be absent (any name).
matrix(subtype)             matches a matrix of entries matching subtype
array(subtype,dim1,dim2..)  matches a multidimensional array of the given dimensions with entries matching subtype
(subtype)                   matches the named structured type when all the components match the subtype
StructName(typ1,...,typn)   matches a StructName structure with n arguments and each argument matches the given subtype

object oriented programming polymorphic object oriented C++ OO polymorphism inheritance

Darwin is an object oriented language. Object oriented programming is supported by several features. To illustrate these notions we will use the implementation of complex numbers in Darwin. The features supporting OO programming are:

Data types/Classes - Arbitrary data types can be created dynamically by using a functional notation, where the function name is the data type name and the arguments are the components. Complex( real_part, imaginary_part ) will be our data type to hold complex numbers. The number 1 is then represented as Complex(1,0). Complex(0,1) denotes the imaginary unit. See ?Complex for full details of this example.

Constructors - A constructor of the data type is any function/method or operation that will produce as a result a new object of the given type. It is customary to use the name of the data type as a constructor. This has several advantages: readability, simpler name space, and the possibility of having a checker/normalizer.
When the data type has type restrictions in its components, this type checking can be done automatically by defining the constructor function as a function with argument type checking. For example, if we want the arguments of the Complex type to be numeric, we can do this by defining:

  Complex := proc( Re:numeric, Im:numeric ) . . . . end:

The result of a call to Complex(a,b) (which is now a function too) should be the structure Complex(a,b). Technically, Complex(a,b) must evaluate first, to do parameter checking and other normalizations, and then return unevaluated. This is achieved with the noeval() function. Noeval assembles a data type without calling the function. The above example becomes:

  Complex := proc( Re:numeric, Im:numeric )
      noeval( Complex(Re,Im) )
  end:

Normalizers - The constructor function/method could perform extra checks or simplification of the data type if this is desired. In the case of complex numbers, it may be desirable to simplify Complex data types with a 0 imaginary part to a simple numerical value. E.g.

  Complex := proc( Re:numeric, Im:numeric )
      if Im=0 then Re else noeval( Complex(Re,Im) ) fi
  end:

Selectors - Selectors are used in two main modes: to select part of the data type or to modify part of the data type. Selectors are handled by the kernel or by user functions. Integer selectors or selectors with the names of the parameters of the data type are handled by the kernel. Other selectors are handled by a function/method named like the data type concatenated with the string '_select'. The selector function is passed the object and the selection argument and, optionally, the value to be assigned. Selectors which are positive integers or a range of positive integers are computed directly, and operate on the corresponding component of the data type.

  Complex := proc( Re:numeric, Im:numeric ) ... end:
  a := Complex(7,-3);

  a[xxx]        - Identical to Complex_select(a,xxx)
  a[1]          - Is 7, without any function calls.
  a[Re]         - Is 7, without any function calls.
  a[Im] := -1;  - Will change a so that the second component is changed to -1. This is done by the kernel.
  a[yyy] := 1;  - Will be handled by calling Complex_select(a,yyy,1). The return value is ignored in this case. Complex_select will normally modify the data type.
  a[2] := 3;    - The value 3 is assigned to the second component of a without any call.
  a[1..2]       - Is the expression sequence 7,-3, without any function calls.
  a[Im] := [];  - Gives an error, since the type of the second argument does not match the assigned value.

It is clear that using integer selectors will prevent the use of generic data types and object orientation, and should not be encouraged. Note that the function 'op' is equivalent to selecting with integers. Using the names of the parameters as the selectors provides type checking (on assignments) and is performed by the kernel, hence it is very efficient. If the integrity of the whole data structure needs to be checked, then the user must write a xxx_select function to run any desired check and/or the option NormalizeOnAssign should be specified in the constructor.

Converters - A converter is a function/method which converts one data type into another. For data types A and B, the function B_A should convert a B object into an A object. When the data type A is defined with option polymorphic, then this conversion (calling B_A) is done automatically for any use of A(B(..)). It is common, and very useful, to have converters to basic types in the system, like string. The function/method B_A will be called with the object B as argument. E.g.

  Complex_string := proc( C:Complex )
      sprintf( '%g+%gi', C[Re], C[Im] )
  end:

  Polar_Complex := proc( p:Polar )
      Complex( p[rho]*cos(p[theta]), p[rho]*sin(p[theta]) )
  end:

Operations - A function/method which is defined with option polymorphic is able to handle arbitrary objects. If the function is named f, then when f is called with a single object of type A, A_f will be called.
E.g.

  f := proc( x:numeric ) option polymorphic; x+1 end:
  Complex_f := proc( x:Complex ) Complex( x[Re]+1, x[Im] ) end:

Most system functions have option polymorphic, in particular all arithmetic operations. Complex_plus, Complex_times and Complex_power will handle all arithmetic operations with Complex data types. (Subtraction and division are handled by multiplication by -1 and powering to -1.) It is very useful to implement the following methods for a data type A:

  A_plus A_times A_power A_print A_printf A_string A_equal A_Rand A_type A_example A_Description

Type testing - Type testing can be done by a type-testing expression or by a type-testing procedure. In both cases, the symbol Complex_type is assigned a value. Type testing expressions are powerful enough for most uses. E.g.

  Complex_type := noeval( Complex(numeric,numeric) );

Notice that a noeval is needed, since the arguments of the Complex type are not valid as a complex number, and hence would give an error if evaluated. With this definition type testing can be done explicitly or implicitly. E.g.

  if type(a,Complex) then ....
  Complex_plus := proc( a:Complex, b:Complex ) ...

Inheritance - Inheritance is the ability to instruct the system that a certain data type is equivalent to, or a super-set of, another, and hence operations do not need to be redefined. More precisely, let A and B be data types and assume that A is either equivalent to or a super-set of B. The command

  Inherit( A, B );

is interpreted as: A will inherit any operation from B which is not defined for A. This operation will be appropriately modified so that it works with A objects. For example, we can define the data type Polar, which is also a complex number. So Polar is equivalent to Complex. Polar will have some special selectors, and some operations which can be performed more efficiently in this representation (e.g. multiplication, powering and absolute value). The rest of the operations can be inherited from Complex.
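The fallback rule behind Inherit can be modelled outside Darwin. The Python sketch below is only an analogy and makes several assumptions: the registries methods and parents, the functions inherit and dispatch, and the operation name "real" are hypothetical, objects are modelled as plain tuples, and an inherited operation is applied by first converting the object with a Polar_Complex style converter. How Darwin actually adapts inherited operations is not specified in this text.

```python
import math

methods = {}   # hypothetical registry: "Type_operation" -> function
parents = {}   # hypothetical record of inherit(child, parent) calls


def inherit(child, parent):
    """Record that child falls back to the operations of parent."""
    parents[child] = parent


def dispatch(op, obj):
    """Apply op to obj, an instance modelled as (type_name, field1, field2)."""
    type_name = obj[0]
    own = methods.get(f"{type_name}_{op}")
    if own is not None:
        return own(obj)                      # the type defines the operation itself
    parent = parents.get(type_name)
    if parent is None:
        raise LookupError(f"{type_name}_{op} is not defined")
    convert = methods[f"{type_name}_{parent}"]   # e.g. Polar_Complex
    return dispatch(op, convert(obj))        # inherited: retry on the parent type


methods["Complex_abs"] = lambda c: math.hypot(c[1], c[2])
methods["Complex_real"] = lambda c: c[1]
methods["Polar_abs"] = lambda p: p[1]
methods["Polar_Complex"] = lambda p: (
    "Complex", p[1] * math.cos(p[2]), p[1] * math.sin(p[2]))
inherit("Polar", "Complex")

p = ("Polar", 2.0, 0.0)
print(dispatch("abs", p))    # handled directly by Polar_abs
print(dispatch("real", p))   # not defined for Polar, falls back to Complex_real
```

The point of the sketch is the lookup order: an operation defined for the type itself always wins, and the inherited one is consulted only when it is missing.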
The definition of Polar could be:

  Polar := proc( rho:numeric, theta:numeric ) option polymorphic;
      ... normalizations, error checking, etc. ...
      noeval( Polar(args) )
  end:

  Polar_select := proc( a:Polar, s, val )
      # selectors must include Re and Im so that it
      # can work as a Complex
      . . . .
  end:

  Polar_times := proc( a:Polar, b:Polar )
      Polar( a[rho]*b[rho], a[theta]+b[theta] )
  end:

  Polar_abs := proc( p:Polar ) p[rho] end:

  Inherit( Polar, Complex );
  CompleteClass( Polar );

The function CompleteClass performs checking and some level of completion of a class. For example, it will find that there is no type definition for Complex, but there is enough information (from the types of the parameters) to construct a primitive checker. In this example it will also create a Complex_Rand function which creates random instances of Complex. It is recommended that CompleteClass is run after a class is defined.

Organization - All functions/methods related to a data type, i.e. all functions with names Complex_xxx, should be stored in the library in a single file, ideally named "Complex". The symbol Complex should be assigned an unevaluated ReadLibrary command. E.g.

  Complex := noeval( ReadLibrary(Complex) ):

or, if the functions are stored in 'mylibrary/Complex',

  Complex := noeval( ReadLibrary( 'mylibrary/Complex', Complex )):

See also: ?option ?Complex ?type ?Inherit ?selectors ?ReadLibrary

index

Index of topics available under this help system.
Type ?xxxx in a single line to obtain the help on xxxx AAAP AAAToInt AaCount AaFreqNoPat AaFrequency abbreviation abs AC ACS ActOut AddDeviation AddGF AddSpecies Align AlignedIntrons AlignedPeptide AlignedSeq AlignGaps Alignment AlignNucPepAll AlignNucPepMatc AlignOneAll AllIndices AllMatches AllRootedTrees AllTernaryRoots AltGenCode amino acids AminoP AminoToInt antiparallel AP APC append AppendFile appendto ApproxSearchStr ApprTextSearch arcsin arctan AsciiToInt assemble assert assign assigned AToCInt AToCodon AToGenCode atoi AToInt AtomicVolume avg BackDynProgr BackTranscribe BackTranslate BaseCount BaseP bases BaseToInt BBBC BBBP BBBToInt BestPamShake BestSearchStrin BestStringMatch Beta_Rand BFGSMinimize Binomial_Rand BinTree BipartiteGraph BipartiteSquare BirthDeathTree BisectTree Block BootstrapTree BP BrightenColor BToInt CalculateScore CallSystem CaseSearchStrin CBBB ceil CenterTreeRoot ChangeLeafLabel CheckAmbigTree ChiSquare_Rand Cholesky ChouFasman CIntToA CIntToAAA CIntToAmino CIntToCodon CIntToInt CircularTour CleanMSA clearw Clique CloseGF Clustal ClustalMSA ClusterRelPam Clusters Code CodeToNuc CodonAlign CodonCode CodonCount CodonDynProgStr CodonMatrix CodonMutate CodonPamToPam CodonToA CodonToAAA CodonToAmino CodonToCInt CodonToInt CodonUsage coeff Collapse CollapseNodes CollectStat Color ColorMap ColorPalette ColorTree command commandline Complement ComplementSeque Complex compress CompressGF ComputeCAI ComputeCAIVecto ComputeCubicTSP ComputeDimensio ComputeQuarticT ComputeTPI ComputeTSP ConcatStrings ConnectTcp ConsistentGenom conversion ConvertSP ConvertToDF convolve copy cor cos Counter Covariance CreateArray CreateCodonMatr CreateCodonMode CreateDayMatric CreateDayMatrix CreateGF CreateMSAMethod CreateOrigDayMa CreateParametri CreateRandMultA CreateRandPermu CreateRandSeq CreateString CreateSynMatric CreateTreeConst CreateTreeConst CreateTreeStati CT_Species Cumulative CumulativeStd CurrentOff currentOfs darwin darwinipc data structures 
database DataMatrix date Dayhoff DayhoffM DayMatrix DayMatrixScale DBL_EPSILON DBL_MAX DbToDarwin debug decompress defaults DelFixed DelIncr Denormalize Description DF DigestAspN digester DigestionWeight digestor DigestSeq DigestSeqs DigestTrypsin DigestWeights disassemble DisconMinimize DisconnectTcp Distribution DM DMDMS DMS DnaFile DNAPepDayhoffM DocEl Document DoGapHeuristic DotPlot DownloadURL dprint dpuTime DrawDistributio DrawDotplot DrawGraph DrawHistogram DrawPlot DrawPointDistri DrawSplitGraph DrawSplits DrawStackedBar DrawTree dSplitGraph dSplitIndex dSplitMetricSum dSplits DynProgGap DynProgMass DynProgMassDb DynProgNucPepSt DynProgr DynProgScore DynProgStrings Edge EdgeComplement Edges Eigenvalues eigenvalues EndOverlayPlot EnterProfile Entries Entropy Entry EntryInfo EntryNumber enum enzyme enzymes EOF EqradTree erf erfc erfcinv ERROR error EstimateCodonPA EstimateNG86 EstimatePam EstimatePB93 EstimateSynPAM eval evalb EvolTree exit ExitProfile exp ExpandFileName ExpFit ExpFit2 Exponential_Ran ExponFit ExponFit2 expx1 ExtCallFrame ExtendClass externcall factorial Fauchere FDist_Rand fields FileStat FindCircularOrd FindConnectedCo FindEntropy FindHighlyExpre findkey FindLongestRep FindNucPepPam FindRules FindSpeciesViol floor FlushGF FragSearch FreeEnergy function Gamma GammaDist_Rand Gap GapHeuristic GapMatch GapTree GaussElim gausselim gc gcd gcm GenCode GenCodeToInt Gene genetic code GenomeSummary Geometric_Rand GetAaCount GetAaFrequency GetAllNucPepMat GetBetween GetComplement GetEntryInfo GetEntryNumber GetFileInfo GetGramRegionSc GetGramSiteScor GetIndex GetIntrons GetLabels GetLcaSubtree GetMachineUsage GetMATreeNew GetMolWeight GetMostFrequent GetOffset GetPam GetPartitions GetPath GetPathDistance GetPeptides getpid GetPosition GetSubTree_r GetTreeLabels GetTreeLength gigahertz GivensElim GlobalNucPepAli Globals GOdefinition GOdownload GOname GOnumber GOsubclass GOsubclassR GOsuperclass GOsuperclassR GramRegion GramSchmidt GramSite Graph 
Graph_minus Graph_Rand Graph_XGMML HammingSearchAl HammingSearchSt has hash hastype help Histogram History hostname HTMLColor HTMLColorprint HTMLCols HTMLprint HTMLRows HTMLTitle hydrophobicity i/o ID IdenticalTrees Identity IDS If ilogb indets indexing InduceGraph Infix InfixNr Inherit input input/output Interior InteriorTot intersect IntOut IntraDistance Intron IntronModel IntToA IntToAAA IntToAmino IntToAscii IntToB IntToBase IntToBBB IntToCInt IntToCodon IntToGenCode invlogit io IPCconnect IPCdisconnect IPCread IPCreceive IPCreceiveDATA IPCsend ipcsend IPCsendDATA iquo islower IsolationIndex isupper iterate json kGramRegion kGramRegionScor kGramSite kGramSiteScore KHTest KWIndex LabelTree LarsonTree lasterror latex lcoeff Leaf LeastSquaresTre Leaves length lg libname LinearClassific LinearClassify LinearIntron LinearProgrammi LinearRegressio Lines LinRegr List list ln ln1x LnGamma Lngamma lnProbBallsBoxe load LoadFile LoadMatrixFile local LocalNucPepAlig LocalNucPepAlig LockFile log log10 logit LongestRep LongInteger lowercase lprint LSBestDelete LSBestSum LSBestSumDelete Machine MachineUsage MafftMSA MAlign MAlignment MapGF MapleFormula MassDyn MassDynAll MassProfile Match MatchRegex Matrices matrix matrix_inverse max MaxCut MaxEdgeWeightCl Maximize MaximizeFunc MaximizeRD MaxLikelihoodSi MaxSimil median member min MinCut Minimize Minimize2D Minimize2DFunc MinimizeBrent MinimizeFunc MinimizeSD Minimizex MinSimil MinSqTree MinSquareTree minus MLTopoTest mod MolWeight MostFrequent MoveGap MSAMethod MSAStatistics mselect MST MultAlign MultiAlign Multinomial_Ran MultipleSubTree Mutate MySql MySqlResult name convention names naming NBody NDF NewArray NewString NextGF Nodes noeval Normalize Normal_Rand NPAlignMatch NPAllMatches NPBackDynProgr NPBestPamMatch NPBestPamShake NPDynProgr NPMatch NPMultiAllMatch NPOneAllMatch NPRefine NPRefineShake NPRegions NPSprintMatch NSubGene NToInt NucDB NucleicToInt nucleotides NucPepBackDynPr NucPepDynProg NucPepMatch 
NucPepRegions NucToCode NucToInt NULL numeric Offsets OneAllMatch op OpenAppending OpenGF OpenPipe OpenReading OpenWriting option optional optional parame options OrderedSearch OrthologousGrou Orthologues output OutsideBounds PA PAAA PAmino PamMax PamToCodonPam PamToPerIdent PamtoPI PamWindows Paragraph ParallelAllNucP parameters ParExec ParExec2 ParExecuteIPC ParExecuteTest parse ParseDimacsGrap ParseNewickTree ParsePred ParTest PartialFraction Partitions Partitions_GetC Partitions_GetT Partitions_Reso PASfromMSA PASfromTree PatEntries PatEntry Path PB PBase PBBB PDF PepDB PepPepSearch peptides PerIdentToPam Permutation PhylogeneticTre PhyloTree PhyML Pi PickTree PItoPam plot Plot2Gif PlotArguments PlotIndex PlotOptions PlotPam Poisson_Rand Polar PolishAngles PosInfo PositionDF PositionTree Postfix PostscriptFigur Predict PredictGenes Prefix Primes print printf PrintIndex PrintInfo printlevel PrintMatrix prints PrintSeqsInTree PrintStringMatc PrintTreeSeq ProbAncestor ProbBallsBoxes ProbCloseMatche ProbDynProg ProbDynProgr ProbIndex ProbSeq proc procedure Process product profile ProfileEnter ProfileExit profiling Protect Protein PruneTree PSDynProg PSubGene QueryAll QueryGF Rand RandomPermut RandomSeq RandomTrees RandTree Rank RAxML RBFS_Tree read Readability ReadBrk readBRK ReadData ReadDb ReadDssp readDSSP ReadFasta readfile ReadLibrary ReadLine ReadOffsetLine ReadPhylip readpipelines ReadProgram ReadRawFile ReadRawLine readstat readstatAt ReadTable ReadTcp ReadURL ReceiveDataTcp ReceiveTcp ReconcileTree RedoCompletion Refine RefineLog RefineShake regexp Region RegularGraph RelativeAdaptiv RellTree remember RenderTemplate ReplaceString RETURN return returntype Reverse RGB_string RobinsonFoulds Roman Romberg RotateTree round RSCU RunDarwinSessio SameTree SaveEntries scalb Scale ScaleIndex ScaleTree ScoreAlignment ScoreIntron Scramble SearchAC SearchAllArray SearchAllString SearchArray SearchDayMatrix SearchDb SearchDelim SearchDF SearchFrag SearchID SearchMassDb 
SearchMultipleS SearchOrderedAr SearchPepAll SearchPepDF SearchSeqDb SearchString SearchTag SearchText searchtext selection selector selectorfunctio SendDataTcp SendTcp seq sequal Sequence Sequences ServerSocket Set set SetRand SetRandSeed SetupRA SetuptRNA sha2 ShortestPath ShortestPath2 Shuffle sign Signature SignedSynteny Simil sin size sleep SmallAllAll Smooth sort SortedMA SPCommonName specfunc SpeciesCode Species_Entry SplatTree SplitLines sprintf SprintMatch SpToDarwin SP_Species sqrt srand sscanf Ssystem StackedBar Stat Stats StatTest std Std_Score string Strings string_RGB structures Student_Rand SubDist subs subset SubTree sum SummarizeTree Surface SurfaceTot SurfIntActPred SurfOut SvdAnalysis SvdBestBasis SvdResult symbol Synteny system SystemCommand Table table tan TaxonId TaxonomyDownloa TaxonomyEntry TempName TestGradHessian TestStatResult TetrahedronGrap Text text TextBlock TextHead time TimedCallSystem TotalAlign TotalTreeWeight TPIDistr Transcribe Translate translation transpose traperror Tree TreeAngles TreeConstructio TreeOrder TreeResult TreeSize TreeStatistics TreeToPam Tree_Graph Tree_matrix trim TrulyRandom trunc TSP TSP3 TSP4 TT type UnassignGlobals UnCompressGF union UnionFind UnionStats UnLabelTree update UpdateSpeciesCo UpdateStat uppercase UTCTime UUUP var version VertexCover View ViewPlot Violations VisualizeProtei warning WeightObservati WriteBlock WriteData WriteFasta WriteFile WriteMSA WriteSeqXML writeto Zeta zip Zscore